rocm-systems

Author	SHA1	Message	Date
Jonathan R. Madsen	7ce263b0e4	Update rocprofiler-register support - add rocprofiler-register to CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS when found - add rocprofiler-register to CPACK_RPM_BINARY_PACKAGE_REQUIRES when found - remove report_tool_load_failures_explicit_ - add HSA_TOOLS_DISABLE_REGISTER flag - add HSA_TOOLS_REPORT_REGISTER_FAILURE - use HSA_TOOLS_REPORT_REGISTER_FAILURE instead of HSA_TOOLS_REPORT_LOAD_FAILURE - changed rocprofiler-register message to not include the word "error" Change-Id: Ib7fd7f14c42758a54c347874018281bb1b5477a6	2024-02-22 11:55:25 -05:00
Shweta Khatri	24633c7a85	Avoid releasing scratch for blit queues At hsa_shutdown(), scratch_lock_ may be gone. Blit queues don't need it. Change-Id: Ic132ac8a6be31fb2f0623137115608b0b222f077	2024-02-22 14:12:05 +00:00
Jonathan Kim	1f63ea3476	Fix export-close race during IPC attach request If two attach requests to the same piece of shared memory occur, a double export or premature dmabuf fd close can occur since the export and close on demand calls are not atomic. Use a reference counter on shared memory dmabuf FDs that have already been opened to avoid this problem. Change-Id: I14a59209c0385e32582af42a57b33b1c6838a9b1	2024-02-22 14:12:05 +00:00
David Yat Sin	5b28a1bc17	Fix compile error on certain gcc versions Change-Id: I8a4fab76d1dcc576eb7706ab45fc786c0cab274a	2024-02-13 15:25:34 -05:00
David Yat Sin	ae16b3e14e	Use sysconf pagesize for system pagesize Provided by user huanggyizhi on github https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/124 Change-Id: Ia03c45f7a869ae2c804accf8163f8ae36c20dd5a	2024-02-13 14:28:10 -05:00
Joseph Huber	9e26cbac14	Add executable symbol info for the wavefront size The wavefront size is currently only exposed as an agent-level attribute. This is not correct, because while the agent has a default wavefront size that is usually correct, it can easily be overridden via options like -mwavefrontsize64 on various ISAs. The wavefrontsize attribute is actually more of a calling convention that is consistent within a call graph. Because the root of each call graph is a kernel in this architecture, we need to be able to query this on a per-kernel basis. This information is already available in the kernel descriptor packet, but it wasn't exported. This patch adds HSA_CODE_SYMBOL_INFO_KERNEL_WAVEFRONT_SIZE as a new option to query on the executable symbol. Change-Id: I744815c89cc9d4c82f25479bdd48ae1f32e859ff	2024-02-09 15:55:30 +00:00
Jonathan Kim	e911335cee	Minimize FD creation on IPC Create Instead of caching shared memory fds for export on the exporter side, only export the FD in the async handler when requested. The importer should request export fd closure once import is done. Change-Id: I469e0cd1749beeb9c506c8a6461745fb039d9c3b	2024-02-07 18:50:54 -05:00
Mythreya	8e312471dc	Fix ToolsApiTable versioning ToolsApiTable's version was incorrectly default initialized to 0. Fixes error in commit fc889669 Change-Id: I41e9301a9c33b119ee50f6164d21ddf11dc188c4	2024-02-07 17:02:32 -05:00
David Yat Sin	f7de85082e	VMM: Allow non-contiguous memory maps Adjust code to allow the use of non-contiguous chunks of memory to be mapped within a single VA range. Change-Id: Ida21ba202927229347b3a32d9b7106df10819cf5	2024-02-07 16:56:52 +00:00
David Yat Sin	0f30da58a7	Improve documentation for set_async_scratch_limit API Change-Id: I03ca986cdd468c7b167e119bd2f25d5c79ff2142	2024-02-07 16:56:52 +00:00
Mythreya	a67af3807f	Initial support for scratch allocation tracking Add new tools table and functions to notify in case of an event Change-Id: I47f0c2f3c8e02d7bcb74d649903eb4f86721c154	2024-02-07 16:56:52 +00:00
Joseph Greathouse	1d6691e06b	Fix undefined behavior in definition of hsa_amd_memory_fault_reason_t Currently, the definition of hsa_amd_memory_fault_reason_t tries to set a constant of 0x8000_0000 by using the definition "1 << 31". However, the 1 in this definition is a signed integer by C++ rules. On our architectures, shifting a signed integer by 31 results in signed integer overflow. Signed integer overflow results in undefined behavior. Forcing the 1 to be unsigned avoids this. Change-Id: I860431eeede4eff29598f646abf3c1337b048d71	2024-02-07 16:56:52 +00:00
Jonathan Kim	1dd4a7dc18	Fix copy logic on devices with no xgmi SDMAs Fix gang factor overwrite of 0 if there are no xGMI SDMAs on the device and gang factor is 1. Change-Id: I041d4b4ae87fb68f224ee4dedb758c6f06c022a9	2024-02-07 16:56:52 +00:00
Jonathan Kim	a3efd13a2f	Fix IPC import on device memory with no requested nodes Users can import device memory without specifying the target node. DMA buf imports return a Thunk handle that's not useful for gpu mapping calls. Fix this by using the import node information to re-import and map with the correct target GPU. Also fix IPC detach calls by deregistering the Thunk handle import immediately during attach instead of failing to do it later on detach since Thunk handles aren't placed into ROCr allocation map. Finally refactor the IPC attach function for cleaner logic flow. Change-Id: Ib2bf178110b2be98bd6917c765f724e4e613f5f2	2024-02-06 23:15:29 +00:00
Jonathan Kim	15691ae460	Fix DMABuf FD closure for IPC attach client We should also close the client side dmabuf fd after importing for target nodes. Change-Id: I74f61dd65bebb03dc002f5df7301efd1ef8d9603	2024-02-06 23:15:29 +00:00
Jonathan Kim	62f3f250ce	Optimize and fix SDMA gang copies Optimizations include: - Gang greedily by placing gang leaders on the first D2D SDMA blit context to avoid deadlocking with other gang leaders and items. Note that this is fine since we can't avoid an oversubscription problem when there is only 1 xGMI link anyway, so treat all xGMI links as a single pipe for ganging. - Non-leader gang items don't have to poll on dependency signals, so this opens up more non-blocking SDMA channels. - Unlock the gang lock when gangs are not needed. - Change gang factor lookup from vector pair to map and register all GPUs in gang factor lookup regardless of link type so that we can take advantage of the O(log N) direct key/value lookup time. Fixes include: - HSA_PAGE_SIZE_4KB was an incorrect macro to use for the gang size limit. As a result, small copies ended up ganging and hitting the latency limit. Use hardcoded 4096 bytes instead. - Cap the auxiliary gang factor to the number of non-XGMI SDMA engines. Change-Id: Ic23fde131502906a807134a04599aa6d012e8cbb	2024-01-25 10:42:27 -05:00
Sam Wu	1c6ad56dc6	Apply doc standards for ReadtheDocs builds Applies the following changes: add version number to documentation left navigation bar and page title; add an "About" section with a license page; enable htmlzip, pdf, and epub formats when publishing on Read the Docs; set pdf title, author, copyright, and version; rename .sphinx/.doxygen to sphinx/doxygen; remove docBin from URL; update rocm-docs-core dependency. Change-Id: I947cf32cd42d9f4e55b1ddd324ad4a7e4ba3f3e3	2024-01-18 12:07:27 -05:00
David Yat Sin	32b3a3c299	VMM: Use emplace when adding entries Use emplace to prevent copying the MappedHandle objects when inserting entries into mapped_handle_map_. Change-Id: Id3f40f1eb73ce30e62da53c5aea4dd715e83ac59	2024-01-17 10:25:04 -05:00
David Yat Sin	29efd8eccd	VMM: Fix flags when allocating memory handle When allocating a memory handle, the NoAddress thunk flag should be set so that this allocation does not have a virtual address range. Also, skip mapping the memory when allocating a memory handle Change-Id: I1c168bc00ddbc158d447197c4dc25f96bad02b19	2024-01-17 10:24:58 -05:00
David Yat Sin	2f97049da5	VMM: Default access should be none After a memory handle is created, hsa_amd_vmem_get_access should return HSA_ACCESS_PERMISSION_NONE instead of reporting the allocation as invalid. Change-Id: I1a09d15c220d48497d09c89059493e538f82aeb9	2024-01-17 10:24:51 -05:00
David Yat Sin	8b85f9e668	VMM: Fix access for multi-GPU When using multi-GPU for each BO, a new dmabuf_fd needs to be imported into libdrm. Change-Id: Iaa2415c8f655a1ce8e92b0878517a11ff014a1d5	2024-01-17 10:24:35 -05:00
Jonathan R. Madsen	8f0ea44c09	Suppress reporting no tools were found with rocprofiler-register Change-Id: If853517d40e073202d12e2a6b16fb54be5529650	2024-01-17 01:01:19 -05:00
Jonathan Kim	e20f41df62	Enable IPC DMA buf Set HSA_ENABLE_IPC_MODE_LEGACY off (i.e. use DMA bufs implementation by default). Change-Id: I7b1c6cb7d19310adf6f0bfe060736f4adbf7adc2	2024-01-16 22:43:27 -05:00
Jonathan Kim	5dfebdbca9	Change IPC implementation to use DMA Bufs As the KFD IPC IOCTLs will not be upstreamed, change the runtime implementation to use DMA bufs. DMA buf fds will be passed over abstract Unix domain sockets. The exporter spins up a thread that creates a socket server. The importer connects to the server to fetch the fd. libDRM will be required to do a manual import and GPU map for memory that is not already imported and mapped. For now, use the legacy IPC implementation by default, as a follow-on patch will disable the HSA_ENABLE_IPC_MODE_LEGACY environment variable. Change-Id: Ifd8469e9adfc81f8a1ea78d6010fb10b515ba1b4	2024-01-16 22:43:00 -05:00
David Yat Sin	0e3f668e2c	Use HybridMutex for IPC locks Change-Id: I24ab4a96237612a7d32beda06cc20b25cb1f0b37	2024-01-16 21:29:39 +00:00
David Yat Sin	8d3fee5095	Use HybridMutex for signal mutexes Implement HybridMutex to improve latencies compared to KernelMutex when there is contention between several threads calling hsa_signal_create and hsa_amd_signal_async_handler. Change-Id: If53377033e749b0050727964c9303f09b02527cc	2024-01-16 21:29:39 +00:00
David Yat Sin	3d1563ee68	Force t1_ update when profiling is enabled Fixes issue where t1_ counters may not be updated when doing dispatch profiling, causing a divide by 0. Change-Id: I91060ac3f9fd2183d277e6e7cd810398a453a87f	2024-01-16 21:29:39 +00:00
David Yat Sin	d16c6db2ee	Increase min KFD version for Virtual mem support KFD had some fixes for handling of virtual memory APIs. These fixes are included in interface version 1.15. Change-Id: Ie701eccf6e032f9ec0a1f4e8a43718964eebddc6	2024-01-16 21:29:39 +00:00
Joseph Huber	4971150576	Improve endianness check Update the `hsa.h` header to use the gcc / clang `__BYTE_ORDER__` macros where available to more accurately autodetect endianness for the target. Change-Id: I7312f3badcba9287a30eb14882b91e2a247acc5f	2024-01-16 21:29:39 +00:00
Lancelot SIX	6f828d8609	Revert "trap_handler: Set status.skip_export when halting a wave" This reverts commit `c5db063b2f`. This change is required for the runtime to generate reliable core dump files, but this feature has been disabled for now by `5e3be9c28a`. Until it is needed, revert the ABI change in the trap handler to maintain compatibility with older debuggers. Change-Id: I77a1562dc7962befe2bf88442df858e2d2b1c5ab	2024-01-16 15:55:59 +00:00
Ruili Ji	4b69351394	Fix SDMA segmentation fault caused by an incorrect address: the pad_size address should start from command_addr, not (command_addr + total_command_size). Change-Id: I3d8491986caf2d4d5dc41b1d90286c21e7c0a457	2023-12-25 09:31:13 +08:00
Alex Sierra	5e3be9c28a	Revert "core dump: Generates a core dump from a fault event" This reverts commit `803e37ded5`. This commit disables the core dump feature. Apparently, gfx1101 SA1 waves cannot enter the trap handler because they receive an invalid address. In any case, core dump support in the debugger has been moved to ROCm 6.2. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Change-Id: I7915caf58118658e5e7f435f91a0a6216d2fdb42	2023-12-18 17:30:13 -06:00
David Yat Sin	6333fdecf3	Use pthread_setaffinity_np On some systems, pthread_attr_setaffinity_np does not exist, so we need to use pthread_setaffinity_np on the thread after pthread_create. Provided by Julian Samaroo on github https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/143 Change-Id: I4649f94333f2d7b0a5993b370a4bfc48d92acecb	2023-12-18 17:41:49 -05:00
David Yat Sin	9b2ed66609	Fix README for invalid command `-DCMAKE_INSTALL_PATH` is not valid; use `-DCMAKE_INSTALL_PREFIX` instead. https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/171/ Suggested-by: fjh1997 on github Change-Id: Ibb85da7fe755b662fa9a836d6fbe3394d34a0337	2023-12-18 09:15:05 -05:00
David Yat Sin	c86837d8d6	Add query for agent memory and aql ext properties Add query to return flags for GPU agent memory properties and AQL extensions. Implement flag to determine that GPU agent is an APU Change-Id: Ic04c51290b2b9763e14989c117f35a2e22297453	2023-12-07 14:41:37 -05:00
Lancelot SIX	c5db063b2f	trap_handler: Set status.skip_export when halting a wave When inspecting waves on architectures where SPI may not initialize TTMP registers, the debugger cannot reliably know if the trap handler was entered and if it saved valuable information in TTMP registers. This patch uses the status.skip_export bit (unused by the compute shaders) to indicate that the trap handler was executed before halting a wave. This is done except for gfx940, where ttmp11[31] can be used (as long as TTMP registers are always initialized by SPI for this architecture). It could be possible to be more selective, as architectures that always initialize TTMP registers do not require this step, but always doing it makes maintenance simpler. Change-Id: I314db6b37772f7daa8bd405e6662a86658d3f5e0	2023-12-06 21:20:03 -05:00
David Yat Sin	ed1b0b9b1a	Add queries for HSA Ext interface version Change-Id: I26860fb1364cd3a33cdc9b284ac807b2702bb241	2023-12-06 13:58:52 -05:00
Alex Sierra	803e37ded5	core dump: Generates a core dump from a fault event Extracts and creates a core dump ELF file from a fault event, using core dump front end. Signed-off-by: Alex Sierra <Alex.Sierra@amd.com> Change-Id: Ibbbe41b3d13dd3fcb90161e927d48c329cf513a9	2023-12-05 23:19:14 -05:00
Alex Sierra	54604654bd	reports KFD core dump support through hsakmt API Member added to KFDVersion to report if KFD supports the core dump mechanism. This is done through the hsaKmtRuntimeEnable API call while the topology is being built. It also determines whether the core dump is generated by KFD or by hsa-runtime. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Change-Id: I2e9d4166563402f78613d728446feb692c52d9d1	2023-12-05 23:19:14 -05:00
Alex Sierra	91f2a70817	core dump: ulimit check mechanism added Core dump generation considers ulimit to generate the proper size file. Signed-off-by: Alex Sierra <Alex.Sierra@amd.com> Change-Id: I61d991fc003b173f9075b66bff6a931447720695	2023-12-05 23:19:14 -05:00
Alex Sierra	514b222368	core dump: Front end core dump API This API consists of one function, to be called from a fault event in the hsa-runtime, to generate a core dump. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Change-Id: Ib1b90d5beb13f93c4e8ebd21fd61705ebb12ca5d	2023-12-05 23:19:14 -05:00
Alex Sierra	1083d5c35f	core dump: SegmentBuilder classes added SegmentBuilder classes are used to get core dump data from the GPUs. So far, they use thunk API calls and smaps to collect all data from the hardware. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Change-Id: I2ad70ca5a951885181d3142653b186b0f6be739e	2023-12-05 23:19:14 -05:00
Giovanni LB	71bc875ccd	Adding coordinate query to aqlprofile Change-Id: I9f2fee62a24cf2a4784ba9e8c813b7b7296d034b	2023-12-05 13:25:30 -05:00
Giovanni LB	e8920cacc8	Adding ATT API extension to aqlprofile Change-Id: Ic511cf871d5d98638d7041ca277f945ae8ced3a5	2023-12-05 13:25:10 -05:00
Jonathan R. Madsen	27eb0516bb	rocprofiler-register updates - fix logic for using HSA_TOOLS_LIB when rocprofiler-register support is enabled - report tool load failure for rocprofiler-register Change-Id: Ife23aa3e6ed19174376cd694764583b73f8976cd	2023-12-04 11:44:58 -06:00
David Yat Sin	251601b20b	Add RISC-V support Patch provided by user Xeonacid via github: https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/172/ Change-Id: I5f9086b536383093e7995b9cfdc19dab213f0265	2023-12-04 15:05:22 +00:00
David Yat Sin	f07b8f2250	Use CPU_SET_S instead of CPU_SET Fix incorrect use of CPU_SET on variable size cpu_set_t Suggested by Christopher E. Moore on github https://github.com/RadeonOpenCompute/ROCR-Runtime/issues/130 Change-Id: I710b56683ba07c08dcd83c851bf72e4f127a0ad4	2023-12-04 15:05:22 +00:00
Giovanni LB	e0c6c5e5bf	Extending AQLprofile API to include counter dimensions Change-Id: If59489a085959f3f765a30e3e445df5151e30350	2023-12-04 15:05:22 +00:00
David Yat Sin	a7a3358067	Implement alternate scratch The alternate scratch memory is used for dispatches that have a low number of waves but relatively large wave size. This allows us to keep the tmpring_size.bits.WAVES field of the main scratch to full occupancy. Change-Id: I32d240fac4b7d38200d1eebc1b0fdc8a823920d3	2023-12-04 15:05:22 +00:00
David Yat Sin	dca8f3a21d	Implement async scratch reclaim For devices where the CP FW supports asynchronous scratch reclaim, ROCr is able to claw-back scratch memory that was assigned to an AQL queue. With that ability, ROCr does not have to rely on using USO (use-scratch-once) when assigning large amounts of memory to a queue. If we reach a situation where we are running low on device memory, ROCr will attempt to claw-back the scratch memory. Change-Id: Iddf8ec84e37ab8b9fdc58bafbe2b61fe2acb6eb7	2023-12-04 15:05:22 +00:00

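Commit 1f63ea3476 guards already-opened dmabuf fds with a reference count so that two concurrent attach requests neither export twice nor close the fd too early. The following is a minimal sketch of that idea in plain C, not ROCr's code: the table, `fd_acquire`, `fd_release`, and the `demo_export` stand-in (which hands out a `/dev/null` fd instead of performing a real dmabuf export) are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical cache entry: one exported fd per shared allocation. */
typedef struct {
  const void *key; /* identifies the shared allocation */
  int fd;          /* exported fd, valid while refs > 0 */
  int refs;        /* number of outstanding attach requests */
} fd_entry;

#define MAX_ENTRIES 16
static fd_entry table[MAX_ENTRIES];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

typedef int (*export_fn)(const void *key); /* stand-in for the real export call */

/* Returns an fd for `key`, exporting only when no live entry exists.
   Lookup, export and insert all happen under one lock, so two concurrent
   attach requests cannot both export the same allocation. */
static int fd_acquire(const void *key, export_fn do_export) {
  int fd = -1;
  fd_entry *hit = NULL, *free_slot = NULL;
  pthread_mutex_lock(&table_lock);
  for (size_t i = 0; i < MAX_ENTRIES; ++i) {
    if (table[i].refs > 0 && table[i].key == key) { hit = &table[i]; break; }
    if (table[i].refs == 0 && free_slot == NULL) free_slot = &table[i];
  }
  if (hit != NULL) {
    hit->refs++;
    fd = hit->fd;
  } else if (free_slot != NULL && (fd = do_export(key)) >= 0) {
    free_slot->key = key;
    free_slot->fd = fd;
    free_slot->refs = 1;
  }
  pthread_mutex_unlock(&table_lock);
  return fd;
}

/* Drops one reference; the fd is closed only when the last user releases it. */
static void fd_release(const void *key) {
  pthread_mutex_lock(&table_lock);
  for (size_t i = 0; i < MAX_ENTRIES; ++i) {
    if (table[i].refs > 0 && table[i].key == key) {
      if (--table[i].refs == 0) {
        close(table[i].fd);
        table[i].fd = -1;
      }
      break;
    }
  }
  pthread_mutex_unlock(&table_lock);
}

/* Demo exporter (hypothetical): counts calls and returns a /dev/null fd. */
static int export_calls = 0;
static int demo_export(const void *key) {
  (void)key;
  export_calls++;
  return open("/dev/null", O_RDONLY);
}
```

Holding the lock across the export call is what makes "check, export, insert" atomic in this sketch; a real implementation might choose a finer-grained scheme. On glibc older than 2.34, compile with `-pthread`.
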
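Commit ae16b3e14e takes the system page size from `sysconf` instead of a fixed constant. A minimal sketch of that query using POSIX `sysconf(_SC_PAGESIZE)`, plus a rounding helper; the helper names and the 4096-byte fallback are assumptions for illustration, not taken from the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Query the page size at run time instead of hardcoding 4 KiB. */
static size_t system_page_size(void) {
  long ps = sysconf(_SC_PAGESIZE);
  return ps > 0 ? (size_t)ps : 4096; /* fallback value is an assumption */
}

/* Round `n` up to a whole number of pages (page size must be a power of two). */
static size_t round_up_to_page(size_t n) {
  size_t ps = system_page_size();
  assert((ps & (ps - 1)) == 0);
  return (n + ps - 1) & ~(ps - 1);
}
```

The mask-based rounding relies on the page size being a power of two, which holds for the page sizes Linux uses.
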
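Commit 1d6691e06b rests on one C rule: the literal `1` is a signed `int`, so `1 << 31` shifts into the sign bit of a 32-bit `int`, which is undefined behavior. A minimal sketch of the fix, assuming a 32-bit `unsigned int`; the macro and function names are hypothetical and are not the enumerators defined in the real header:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag names for illustration only.
   1 << 31 would shift a signed int into its sign bit (undefined behavior
   for a 32-bit int); an unsigned operand makes the shift well defined. */
#define EXAMPLE_FAULT_LOW_BIT  (1u << 0)
#define EXAMPLE_FAULT_HIGH_BIT (1u << 31)

/* Returns nonzero if the highest flag bit is set in `reason`. */
static int has_high_bit(uint32_t reason) {
  return (reason & EXAMPLE_FAULT_HIGH_BIT) != 0;
}
```

The `u` suffix is one way of forcing the `1` to be unsigned, as the commit message describes; writing the constant as `0x80000000u` would be equivalent.
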
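Commit 5dfebdbca9 passes DMA buf fds between processes over abstract Unix domain sockets. The usual Linux mechanism for moving an fd across a Unix socket is `SCM_RIGHTS` ancillary data. The sketch below shows only that generic mechanism, not ROCr's code: for brevity it uses `socketpair` in a single process rather than an abstract-namespace server, and the test substitutes a `/dev/null` fd for a dmabuf fd. `send_fd` and `recv_fd` are hypothetical helper names.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer sized and aligned for exactly one fd. */
typedef union {
  struct cmsghdr align;
  char buf[CMSG_SPACE(sizeof(int))];
} fd_cmsg_buf;

/* Sends `fd` over a connected AF_UNIX socket as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd) {
  char byte = 0; /* at least one byte of ordinary data accompanies the fd */
  struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
  fd_cmsg_buf ctrl;
  struct msghdr msg;
  memset(&ctrl, 0, sizeof(ctrl));
  memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = ctrl.buf;
  msg.msg_controllen = sizeof(ctrl.buf);
  struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
  c->cmsg_level = SOL_SOCKET;
  c->cmsg_type = SCM_RIGHTS;
  c->cmsg_len = CMSG_LEN(sizeof(int));
  memcpy(CMSG_DATA(c), &fd, sizeof(int));
  return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives one fd; the kernel installs a new descriptor in this process. */
static int recv_fd(int sock) {
  char byte;
  struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
  fd_cmsg_buf ctrl;
  struct msghdr msg;
  memset(&msg, 0, sizeof(msg));
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = ctrl.buf;
  msg.msg_controllen = sizeof(ctrl.buf);
  if (recvmsg(sock, &msg, 0) != 1) return -1;
  struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
  if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS) return -1;
  int fd;
  memcpy(&fd, CMSG_DATA(c), sizeof(int));
  return fd;
}
```

An abstract-namespace server would `bind` an `AF_UNIX` address whose `sun_path` begins with a NUL byte and then `accept` connections; the `send_fd`/`recv_fd` exchange on the connected socket stays the same.
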
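Commit 4971150576 makes `hsa.h` prefer the GCC/Clang predefined `__BYTE_ORDER__` macro for endianness detection. A sketch of that compile-time check, cross-checked against a run-time probe. `EXAMPLE_LITTLE_ENDIAN` and its little-endian fallback branch are hypothetical and do not reproduce what `hsa.h` does when the macro is missing:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compile-time detection via the GCC/Clang predefined macros. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
#define EXAMPLE_LITTLE_ENDIAN (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#else
/* Hypothetical fallback for other compilers: assume little-endian. */
#define EXAMPLE_LITTLE_ENDIAN 1
#endif

/* Run-time probe: inspect the lowest-addressed byte of a known value. */
static int runtime_is_little_endian(void) {
  const uint32_t value = 1;
  unsigned char first;
  memcpy(&first, &value, 1);
  return first == 1;
}
```
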
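Commit 6333fdecf3 pins a thread after it has been created, rather than through creation attributes. A minimal sketch of that pattern, assuming Linux with glibc (`pthread_setaffinity_np` and `pthread_getaffinity_np` are GNU extensions). `spawn_pinned`, `first_allowed_cpu`, and the barrier handshake are hypothetical helpers. The barrier keeps the new thread alive until pinning has been applied, so the affinity call cannot race with thread exit:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdint.h>

static pthread_barrier_t pinned; /* released once affinity has been applied */

static void *worker(void *arg) {
  (void)arg;
  pthread_barrier_wait(&pinned); /* wait until the creator has pinned us */
  cpu_set_t now;
  CPU_ZERO(&now);
  pthread_getaffinity_np(pthread_self(), sizeof(now), &now);
  return (void *)(intptr_t)CPU_COUNT(&now); /* how many CPUs we may run on */
}

/* Creates a thread, then pins it to `cpu` with pthread_setaffinity_np.
   Returns 0 or an error number; on success *cpus_seen holds the worker's
   CPU count as observed after pinning. */
static int spawn_pinned(int cpu, int *cpus_seen) {
  pthread_t t;
  int rc = pthread_barrier_init(&pinned, NULL, 2);
  if (rc != 0) return rc;
  rc = pthread_create(&t, NULL, worker, NULL);
  if (rc != 0) {
    pthread_barrier_destroy(&pinned);
    return rc;
  }
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(cpu, &set);
  int aff = pthread_setaffinity_np(t, sizeof(set), &set);
  pthread_barrier_wait(&pinned); /* release the worker even if pinning failed */
  void *ret = NULL;
  pthread_join(t, &ret);
  pthread_barrier_destroy(&pinned);
  if (aff != 0) return aff;
  *cpus_seen = (int)(intptr_t)ret;
  return 0;
}

/* Picks the lowest CPU the calling process is already allowed to use. */
static int first_allowed_cpu(void) {
  cpu_set_t set;
  CPU_ZERO(&set);
  if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
  for (int c = 0; c < CPU_SETSIZE; ++c)
    if (CPU_ISSET(c, &set)) return c;
  return -1;
}
```

Choosing a CPU from the process's current mask avoids `EINVAL` in containers or cpusets that exclude CPU 0. On glibc older than 2.34, compile with `-pthread`.
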
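Commit f07b8f2250 replaces `CPU_SET` with `CPU_SET_S` for variable-size CPU sets. In glibc, `CPU_SET` assumes the fixed `sizeof(cpu_set_t)` (1024 CPUs), while the `_S` variants take the allocated size explicitly. A minimal sketch, assuming Linux and glibc; `make_single_cpu_mask` is a hypothetical helper:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Builds a dynamically sized CPU mask containing only `cpu`.
   Returns NULL on allocation failure; the caller frees with CPU_FREE. */
static cpu_set_t *make_single_cpu_mask(int num_cpus, int cpu, size_t *size_out) {
  cpu_set_t *set = CPU_ALLOC(num_cpus);
  if (set == NULL) return NULL;
  size_t size = CPU_ALLOC_SIZE(num_cpus);
  CPU_ZERO_S(size, set);
  CPU_SET_S(cpu, size, set); /* CPU_SET would assume sizeof(cpu_set_t) */
  *size_out = size;
  return set;
}
```

With a set from `CPU_ALLOC(2048)`, CPU indices at or above 1024 are only reachable through the `_S` macros.
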

916 Commits