rocm-systems

Author	SHA1	Message	Date
Konstantin Zhuravlyov	08c94463de	loader: allow but skip static relocations for code object v2+ Change-Id: I4ae14cb5e740d7d45810b75038b15a0b94d2bf0b	2024-04-09 11:39:18 -04:00
Konstantin Zhuravlyov	b983c19729	Switch to per-executable contexts in the loader - Per-executable contexts should be used from now on - Global contexts are left as is for now for backwards compatibility and will be phased out in follow up patches. Change-Id: I6291abf865c7ed24ee71f5065e539afc23f5ce64	2024-04-09 10:31:51 -04:00
Konstantin Zhuravlyov	9e8f185397	Add R_AMDGPU_ABS32 support Change-Id: I0ee0302d919ede44765adf02eab15015573efef2	2024-03-26 18:47:29 -04:00
Konstantin Zhuravlyov	c5e74b7d0a	Add dynamic relocation types (NFC) Change-Id: I1b443003077ba241f34444da293e362266c2ae92	2024-03-26 18:47:05 -04:00
Konstantin Zhuravlyov	b2c32ad6cb	Rename existing relocation types to legacy/v1 (NFC) Change-Id: Ided7f656c34131b8067a19c0d3b2955fc8823628	2024-03-26 18:46:50 -04:00
pvanhout	a93c18dc90	[libamdhsacode] Support COV6/Generic Targets Change-Id: I4680577eb56dc436fbc134b169f172dd476bff37	2024-03-12 07:37:32 -04:00
Lancelot SIX	5d3f6a63f1	trap_handler: Set status.skip_export when halting a wave When inspecting waves on architectures where SPI may not initialize TTMP registers, the debugger cannot reliably know if the trap handler was entered and if it saved valuable information in TTMP registers. This patch uses the status.skip_export bit (unused by the compute shaders) to indicate that it got executed before halting a wave. This is done except for gfx940, where ttmp11[31] can be used (as long as TTMP registers are always initialized by SPI for this architecture). It could be possible to be more selective as architectures always initializing TTMP registers do not require this step, but always doing it makes maintenance simpler. Change-Id: I5c4148c78062f7ffa049ac7856c2edc82dbc77d1	2024-03-05 09:28:33 -05:00
Joseph Huber	9e26cbac14	Add executable symbol info for the wavefront size The wavefront size is currently only exposed as an agent level attribute. This is not correct, because while the agent has a default wavefront size that is usually correct, it can easily be overridden via options like -mwavefrontsize64 on various ISAs. The wavefrontsize attribute is actually more of a calling convention that is consistent within a callgraph. Because the root of each call graph is a kernel in this architecture, we need to be able to query this on a per-kernel basis. This information is already available in the kernel descriptor packet, but it wasn't exported. This patch adds HSA_CODE_SYMBOL_INFO_KERNEL_WAVEFRONT_SIZE as a new option to query on the executable symbol. Change-Id: I744815c89cc9d4c82f25479bdd48ae1f32e859ff	2024-02-09 15:55:30 +00:00
Lancelot SIX	6f828d8609	Revert "trap_handler: Set status.skip_export when halting a wave" This reverts commit `c5db063b2f`. This change is required for the runtime to generate reliable core dump files, but this feature has been disabled for now by `5e3be9c28a`. Until it is needed, revert the ABI change in the trap handler to maintain compatibility with older debugger. Change-Id: I77a1562dc7962befe2bf88442df858e2d2b1c5ab	2024-01-16 15:55:59 +00:00
Lancelot SIX	c5db063b2f	trap_handler: Set status.skip_export when halting a wave When inspecting waves on architectures where SPI may not initialize TTMP registers, the debugger cannot reliably know if the trap handler was entered and if it saved valuable information in TTMP registers. This patch uses the status.skip_export bit (unused by the compute shaders) to indicate that it got executed before halting a wave. This is done except for gfx940, where ttmp11[31] can be used (as long as TTMP registers are always initialized by SPI for this architecture). It could be possible to be more selective as architectures always initializing TTMP registers do not require this step, but always doing it makes maintenance simpler. Change-Id: I314db6b37772f7daa8bd405e6662a86658d3f5e0	2023-12-06 21:20:03 -05:00
Lancelot SIX	2f2ba050f6	Park waves for gfx11 and bump abi version to 9 On gfx11, with a sequence such as s_trap 2 s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) s_endpgm the s_sendmsg does deallocate registers while the wave is supposed to be stopped. As a result, the wave cannot do the expected context save operations, and cannot context save. To avoid this problem, park the wave in the trap handler for gfx11. Note that gfx11 has implemented an instruction cache prefetch. When parked, the prefetch tries to access memory past the end of trap handler which causes memory violation exceptions to be reported. To avoid this, we need to add padding at the end of the trap handler. The padding consists of `s_code_end` instructions. Given that the trap handler is loaded at a 0x1000 aligned address, the maximum prefetch amount (in bytes) is given by `256 - (trap_handler_size % 64)`. Change-Id: I5446da54a965a64f21cb0fd3ce3caa4b6137a933	2023-07-15 09:44:50 -04:00
Laurent Morichetti	f31b312611	Update the trap handler for gfx940 gfx940 uses ttmp11 to hold the queue packet index so the first level trap handler uses ttmp13 instead to save ib_sts. Repurpose ttmp11[31] to mean that the ttmps are initialized. The issue was that the debugger could not tell whether ttmp6 was written by the trap handler when determining the stop reason. If ttmp11[31]=0, then the trap handler has not been executed and ttmp6 should be assumed to be 0. If ttmp11[31]=1, then ttmp6 holds the trap_id, if an s_trap instruction caused the exception. Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com> Signed-off-by: Lancelot Six <lancelot.six@amd.com> Change-Id: I9af903abae044b9ec530306229caf3b883f3ee46	2023-04-27 16:15:14 -04:00
Konstantin Zhuravlyov	d962fc39bb	Add support for the following kernel symbol query: - HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_DYNAMIC_CALLSTACK Change-Id: Idff5c1a2ce2a3e2d65bcc9cf1f66a68d37cd41ef	2022-07-29 15:15:24 -04:00
Konstantin Zhuravlyov	9265409f08	Add code object v5 support Change-Id: I03522765056e99ed49e6c5e213ee3753852de27b	2022-04-12 08:53:27 -04:00
Jay Cornwall	f3d942b67f	Report union of wave errors as a bitmask in trap handler Also fix incorrect PC increment on host trap. Change-Id: Ic8bbf2b90f9f879ba62b558b909d010a8939a663	2021-07-16 18:03:26 -05:00
Jay Cornwall	7e4088309d	Add new trap handler, bump debug API version Also fix hsaKmtRuntimeEnable error handling. Continue if ioctl fails. Change-Id: I754ccba5910ccfef6f1ada1415593ef89ce33aba	2021-07-16 18:03:26 -05:00
Laurent Morichetti	ea6ee0aa81	New trap handler ABI (v5) Park the wave, if it is stopped, to avoid halting it at an s_endpgm instruction if the architecture does not support it. Free ttmp6 by converting the dispatch_ptr into a queue packet index (25-bit) and storing it in ttmp7[24:0]. Save the exception PC in ttmp11[22:7] ttmp6[31:0]. Change-Id: Iaa3c5baf5b488c0b534044d338f12bffa63ddce2	2021-03-04 21:44:14 -05:00
Laurent Morichetti	9ca79d072a	New trap handler ABI (v4) Replace the stop reasons ttmp11.trap_raised and ttmp11.excp_raised with ttmp11.wave_stopped which indicates that the trap handler has halted the wave as the result of an event (trap, single-step or exception). If the wave is stopped because of a trap, also record the trap_id in ttmp11.saved_trap_id[7:0]. Save status.halt in ttmp11.saved_status_halt, so that it can be restored when resuming a wave (changing a wave's state from stopped to running or single-stepping). Change-Id: I7322f59b60e8cc1b92bf5f067dba606a3109ef49	2021-02-05 09:56:01 -08:00
Laurent Morichetti	8aec53969f	Don't terminate waves halted at s_endpgm To support single stepping the instruction preceding an s_endpgm, unwind the PC by 8 bytes and set ttmp11[9] to notify the debugger that the wave is halted with a modified PC. Bump the debug r_version for this new trap handler ABI. Change-Id: I55e4e0d65576f92da14a336266c31c513baab547	2021-01-21 20:51:38 -08:00
Tony	ef755e4c82	Update code object V3 kernarg queries Code object V2 had the ability to support the following queries: - HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE - HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE - HSA_CODE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT - HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT However code object V3 onwards cannot support these as the kernel descriptor changed. These queries need to be deprecated. Until then return more reasonable values: - For kernarg alignment return 16 which is the minimum alignment required by the HSA standard. - For kernarg size return the field from the kernel descriptor which is a hint. If it is 0 then the compiler is not specifying the kernarg size, or the kernel has no kernarg. Change-Id: I19ce6cd0f3658a2bf62277492f39100ea5ab4256	2020-11-20 21:39:18 -05:00
Konstantin Zhuravlyov	3a08d0964e	Implement Target ID Proposal Changes from Konstantin Zhuravlyov, Tony Tye Change-Id: I532801193afa9d5b8ac2a877b5497eab661f0597	2020-11-10 13:42:35 -05:00
Sean Keely	9c20f0e649	Correct memory release function. l_name is populated by strdup which requires using free rather than delete. Change-Id: I9d9bdcfaa3ef095502270f332b95a0ee5c0bbcfc	2020-08-26 18:22:59 -05:00
Ramesh Errabolu	fa13208698	Add rocr namespace to core header and impl files Change-Id: I1e1b33f9bba1078d049bc19797889988c3e43360	2020-06-19 22:34:21 -04:00
Sean Keely	ce19721c88	Update copyright date. Change-Id: If4bf4c20cf051878bfe759080bb7345d884dd53d	2020-06-19 22:34:01 -04:00
Konstantin Zhuravlyov	9eb735ec24	Add support for code object URI to ROCr Adds the following: - New factory method to create a code object reader from file with offset and size. - A pair of queries on a loaded code object to get the URI name/length. - A bump to the AMD vendor loader extension API and its associated table. Change-Id: I17c83e9c2447d29a43c438459395365f786a3611	2020-06-01 11:07:50 -04:00
Laurent Morichetti	00da82f951	Add debugger support for wave halted at launch New trap handler ABI: Record in ttmp11[8:7] the event that caused the trap handler to be entered. We currently record 2 events, trap_raised if an s_trap instruction was executed, or excp_raised if an exception (MEM_VIOL or ILLEGAL_INST) was raised. Change-Id: Ie278c8277437b3b67c2737dcd1a12fe6511df428	2020-04-29 19:29:56 -04:00
Laurent Morichetti	5f783494f1	Return a file URI for elf images in shared objects Iterate the loaded shared objects to see if the given elf image binary is part of a loaded segment. Change-Id: I074cacd99eb5b59f883f4ce2bd901e0e35a660b8	2020-04-14 15:22:43 -04:00
Ramesh Errabolu	627991b1c1	Update how code references publicly available ROCr headers Change-Id: I357c51eb713a23704d4fee71081be46a73a71806	2020-02-21 20:01:11 -05:00
Saleel Kudchadker	c57f3da1dc	Reset link_map map in the constructor Change-Id: I8a6ad3bc0fca790dec2992cacf9288068b3bcaa3	2020-02-19 15:29:35 -08:00
Sean Keely	3e9aca0f34	Support stripped binaries and remove unneeded attributes. Attribute optimize(0) doesn't appear to be helpful. This prevents optimization in the function but not at call sites to the function. The function may still be inlined since it has no side effect (in some cases that we currently don't support). Having a side effect prevents a call site optimization that allows removal of a noinline function call with no side effect. Call site optimization should only happen (in GCC at least) when using whole program optimization so this may be stronger than we strictly need. Also added _amdgpu_r_debug to the exported symbol list (global) and switched to the standard macro for an exported symbol (HSA_API). Without being in the global list the debugger will not find this symbol if the binary has been stripped. Change-Id: Ieb00175ccc55fda4491deee44711cd55b3f24aeb	2020-01-21 20:08:02 -05:00
Laurent Morichetti	19e1fb3a4e	Fix a build error when compiling with clang Check __clang__ before __GNUC__ as clang defines both. Change-Id: I9963f8e0665efb4cb08bd3886fb38fee42dd9861	2020-01-15 18:52:53 -08:00
Qingchuan Shi	d63886190f	fix optimize(0) for clang. Change-Id: I83bc57d42815f37445ae97bf6950147e3358ac45	2020-01-13 20:53:40 -05:00
Qingchuan Shi	16a20cfb8c	Adding code object list in loader. Change-Id: Iab3541287bd56276fd32615ee59fcd590de84ca0	2019-10-30 20:31:51 -04:00
Konstantin Zhuravlyov	2275c74695	Loader: add basic logging abilities - Enabled with env var LOADER_ENABLE_LOGGING=1 Change-Id: Ibdbb1b55ffddb7dc9c63e52fc9db3013409376a4	2019-08-21 13:29:15 -04:00
Sean Keely	465a8eb40b	PR from github user DiamondLovesYou. Allow user specified profiles if the HSAIL note is not found. Konstantin reviewed and approved. HSAIL note is not generated by LLVM. Change-Id: I40fbfbaedd6787b6a716507918f698d02007afe1	2019-07-16 13:55:38 -05:00
Konstantin Zhuravlyov	7001134757	Process symbols with 0 address Change-Id: I9ed943a8ccd3b103edd6aba8264c009d8cda29fa	2019-03-30 02:14:43 -04:00
Konstantin Zhuravlyov	8bee6e4976	Loader: update symbol processing for v2+ - Skip symbols that are STB_LOCAL and not STT_AMDGPU_HSA_KERNEL Change-Id: I68567f58de9bf3f07dbd8020ef63f47667c86367	2019-01-18 15:42:28 -05:00
Konstantin Zhuravlyov	c1ad82a6b7	Loader updates for code object v3 - Fix loading in some cases - Fix symbol kind Change-Id: I721b4a35972b6d2a6d0ac733ab770b096cc74e17	2019-01-18 15:41:01 -05:00
Konstantin Zhuravlyov	a447d79430	Fix dynamic relocations: - Process dynamic relocation even if there is no symbol associated to it. Change-Id: Iaefee682ee52f5acda8280e5764e6d5fd992774a	2018-11-14 15:25:41 -05:00
Konstantin Zhuravlyov	386874da55	Loader: Add support for v3 object code. Change-Id: I7215bd0c1277c2036bf0fadf5b23cb57fdf7f665	2018-10-06 14:01:59 -04:00
Scott Linder	47f0e6f7d3	Apply dynamic relocations for STT_FUNC symbols Required to support function calls through GOT table. Change-Id: I174a0269fdd67369d38fe41855b7bd01f350b839	2018-09-23 21:42:32 -04:00
Konstantin Zhuravlyov	7ef70f7eaa	Bring naming on par with the spec (hsa-runtime) Change-Id: Ie1903c90a195cf95b186eb5552131a20af408adf	2018-04-10 09:15:02 -04:00
Wilkin	8e3d26c617	ROCm Runtime Support for respecting target xnack setting This includes the changes provided by Konstantin, "Add xnack from elf header" (Change 136389). Change-Id: I95e51141caa0d7c21903b09212c02e4906ec54a3	2018-03-20 16:57:15 -04:00
Tony Tye	d472b24d05	Add support for R_AMDGPU_RELATIVE64 - Add support for R_AMDGPU_RELATIVE64 relocation record. - Return status error if any unsupported relocation record encountered. Change-Id: Icbb5dcb81109a70c1f2195412a0df58a11be9da1	2018-01-30 18:20:26 -05:00
Qingchuan Shi	ce6aee01ed	Add APIs to support debugging vm fault 1. Add hsa ext api hsa_amd_register_vmfault_handler for debugger to register callback in case of VM fault. 2. Extend hsa_ven_amd_loader API to: (1) iterate loaded code objects in executable: hsa_ven_amd_loader_executable_iterate_loaded_code_objects (2) get loaded code object info: hsa_ven_amd_loader_loaded_code_object_get_info 3. Make the id of hsa_queue the same as the one used in communication with thunk (for amd_aql_queue) Change-Id: I68910809e59e24297350d262606f00e96c14bcbd	2017-10-28 21:48:26 -04:00
Konstantin Zhuravlyov	9887c26113	Bring loader in sync with stg/sc Change-Id: Iccce07b8fa03d37c4267a2a9bd343e6614dc43e7	2017-02-10 11:21:15 -05:00
Konstantin Zhuravlyov	08aded148a	Revert "Bring loader in sync with stg/sc" This reverts commit `c798c60343`. Change-Id: If99e8cc9e2afb525f690e49eb6538d8e950a5615	2016-12-14 15:14:36 -05:00
Konstantin Zhuravlyov	c798c60343	Bring loader in sync with stg/sc Change-Id: I684522c442de0872007a7e4da8919067fc7b42b3	2016-12-13 16:30:25 -05:00
Ramesh Errabolu	eb2efb83d1	Initial set of changes for ThreadTrace Change-Id: I07ce31f9b4f508cef0fc9ca6dadcf26b6c90361e	2016-11-21 23:40:56 -06:00
Konstantin Zhuravlyov	4b86843409	Remove `load_legacy` parameter + change prefix for some loaded code object queries back to AMD Change-Id: I74e905abd77dab3a7a00b5ced94cd9b5130365c5	2016-11-20 13:46:17 -05:00

58 commits
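
Two of the trap handler commits in this log describe register layouts and arithmetic that are easier to follow with concrete numbers. The following is a minimal sketch, not code from the repository: the function names `decode_v5` and `max_prefetch_bytes` are hypothetical, and the decoding assumes that "ttmp11[22:7] ttmp6[31:0]" in commit ea6ee0aa81 means the upper 16 bits of the exception PC live in ttmp11[22:7] and the lower 32 bits in ttmp6. The prefetch formula is taken verbatim from commit 2f2ba050f6.

```python
def decode_v5(ttmp6: int, ttmp7: int, ttmp11: int) -> tuple[int, int]:
    """Decode the fields described for trap handler ABI v5 (commit ea6ee0aa81).

    Assumption: the PC is the concatenation ttmp11[22:7] (high 16 bits)
    followed by ttmp6[31:0] (low 32 bits).
    """
    # Queue packet index is the 25-bit field ttmp7[24:0].
    queue_packet_index = ttmp7 & 0x1FFFFFF
    # Extract the 16-bit field ttmp11[22:7].
    pc_hi = (ttmp11 >> 7) & 0xFFFF
    # Combine with the 32-bit value held in ttmp6.
    pc = (pc_hi << 32) | (ttmp6 & 0xFFFFFFFF)
    return queue_packet_index, pc


def max_prefetch_bytes(trap_handler_size: int) -> int:
    """Maximum prefetch amount from commit 2f2ba050f6:
    256 - (trap_handler_size % 64), for a handler loaded at a
    0x1000-aligned address."""
    return 256 - (trap_handler_size % 64)


if __name__ == "__main__":
    index, pc = decode_v5(0x89ABCDEF, 0xFE000005, 0x1234 << 7)
    print(index, hex(pc))
    print(max_prefetch_bytes(1000))
```

The register values in the usage lines are made up for illustration; real values would come from a debugger reading the wave state.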