rocm-systems

Author	SHA1	Message	Commit Date
Amber Lin	9c6828647b	kfdtest: blacklist KFDSVMEvictTest.QueueTest Temporarily blacklist KFDSVMEvictTest.QueueTest on gfx950 Signed-off-by: Amber Lin <Amber.Lin@amd.com> [ROCm/ROCR-Runtime commit: `31d51acb26`]	2025-05-23 01:22:11 -04:00
David Yat Sin	342e478e7d	rocr: Perform memcpy for small code-object loads On large BAR systems, small code-objects load faster with a direct memcpy because of the latency of the blit copy. [ROCm/ROCR-Runtime commit: `da2607024b`]	2025-05-22 18:39:19 -04:00
David Yat Sin	9c5bb61708	rocr: Perform range based cache invalidates Invalidate only the address range that covers the newly copied code-object. This avoids invalidating I$ for old code objects and thus might increase I$ hit rate. [ROCm/ROCR-Runtime commit: `e969e01f54`]	2025-05-22 18:39:19 -04:00
Ramakrishnan, Ranjith	85cd72987f	CMake: Remove file reorganization backward compatibility code (#176) The feature has already been disabled, and the related source code is no longer required [ROCm/ROCR-Runtime commit: `1785cff6a5`]	2025-05-22 09:47:26 -07:00
Philip Yang	4ac71d1f5d	kfdtest: Add KFDQMTest UserQueueBufValidation Creating a CP queue or an SDMA queue should fail with an invalid queue ring buffer or ring buffer size. Test that unmapping or freeing queue buffers fails before the queue is destroyed. Use a child process to test that unmapping the CWSR buffer evicts the queue. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Change-Id: I5dcd51d6b43445d19a986f8b0b82063e20348a5f [ROCm/ROCR-Runtime commit: `bd86fb1e63`]	2025-05-22 10:06:42 -04:00
Philip Yang	50886316e9	libhsakmt: unmap from GPU error handling If unmapping from the GPU fails (for example, when unmapping a user queue buffer while the queue is active), we should not free obj->mapped_node_id_array; otherwise, a later unmap of the user queue buffer after the queue is destroyed still fails. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Change-Id: I32aeb18871c2e971d01900d92916c54680f5c9fa [ROCm/ROCR-Runtime commit: `3e6f51b715`]	2025-05-22 10:06:42 -04:00
Apurv Mishra	5c42a9f1bf	kfdtest: Disable tests that cause unwanted behavior disable KFDLocalMemoryTest.Fragmentation and KFDEventTest.MeasureInterruptConsumption as part of the KFD test suite improvement feature Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com> [ROCm/ROCR-Runtime commit: `f853dda9ba`]	2025-05-21 16:29:15 -04:00
Ben Vanik	ba02a7b1ca	kfdtest: Fix SVM profiler QUEUE_RESTORE parsing [ROCm/ROCR-Runtime commit: `d54124383f`]	2025-05-21 13:17:25 -04:00
Ben Vanik	62cd7e1f54	rocr: Fix SVM profiler QUEUE_RESTORE parsing [ROCm/ROCR-Runtime commit: `1a32392912`]	2025-05-21 13:17:25 -04:00
Flora Cui	89e5075ce0	rocr: try defaultSignal for intercept_queue if interrupt is not supported Signed-off-by: Flora Cui <flora.cui@amd.com> [ROCm/ROCR-Runtime commit: `8cf4b7fc05`]	2025-05-21 09:37:47 -04:00
Yiannis Papadopoulos	69505ab60c	Fix formatting [ROCm/ROCR-Runtime commit: `700078d335`]	2025-05-20 13:59:22 -05:00
Yiannis Papadopoulos	38c54b09ac	rocr/aie: Correct operand count [ROCm/ROCR-Runtime commit: `c80616d807`]	2025-05-20 13:59:22 -05:00
David Yat Sin	38ea4370c1	rocr: Fix doorbell ring When compiling with -O0, some compilers generate an xchg instruction for the __atomic_store(...) built-in. Using xchg on MMIO memory is undefined behavior and may be ignored on certain CPUs. [ROCm/ROCR-Runtime commit: `f011a9506d`]	2025-05-20 09:19:10 -04:00
Aaron Liu	ba372ca4a8	rocrtst/dtif: performance::memory_async_copy test fix on DTIF Signed-off-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Feifei Xu <feifxu@amd.com> Signed-off-by: Longlong Yao <Longlong.Yao@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `297ea78140`]	2025-05-13 16:44:31 -04:00
Jiadong Zhu	b99015f30a	rocr/dtif: use default signal for intercept queue for DTIF Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `0f9d2b836c`]	2025-05-13 16:44:31 -04:00
Aaron Liu	85a11c729c	rocr/dtif: disable interrupt signal for DTIF backend Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `8c1b1201b7`]	2025-05-13 16:44:31 -04:00
Jiadong Zhu	a0dc167541	rocr/dtif: add hsaKmtQueueRingDoorbell in thunk loader hsaKmtQueueRingDoorbell is specific to DTIF backend Signed-off-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `e2d767879d`]	2025-05-13 16:44:31 -04:00
Aaron Liu	008bbd94d5	rocr/dtif: add CreateThunkInstance/DestroyThunkInstance interfaces Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `e9088d6e47`]	2025-05-13 16:44:31 -04:00
Aaron Liu	c6ffc85a47	rocr/dtif: add DRM APIs wrapper in thunk loader Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `0cd4ddd62b`]	2025-05-13 16:44:31 -04:00
Aaron Liu	6cf184a0d4	rocr/dtif: replace hsakmt interfaces with HSAKMT_CALL(...) Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `1b79caa214`]	2025-05-13 16:44:31 -04:00
Aaron Liu	87dcbf1255	rocr/dtif: add thunk loader to wrap hsaKmt APIs For native and DTIF backends, unify to use HSAKMT_CALL(...) to call hsaKmt APIs. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `7ba77fb193`]	2025-05-13 16:44:31 -04:00
Aaron Liu	137b168b46	rocr/dtif: add dtif environment variable Using HSA_ENABLE_DTIF to control dtif/native thunk code path Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `166b0fa45a`]	2025-05-13 16:44:31 -04:00
Ma, Li	526db8bbaa	rocr: Expose all available DMA engines (#165) For inter-device copies, currently only XGMI is exposed. SDMA0/1 will now be exposed for inter-device copies as well, especially since they are among the recommended engines. Signed-off-by: Li Ma <li.ma@amd.com> [ROCm/ROCR-Runtime commit: `e38dd98914`]	2025-05-13 17:42:15 +08:00
Hila, Nino	24b8070788	Update palamida.yml (#158) * Update palamida.yml Signed-off-by: Hila, Nino <Nino.Hila@amd.com> * Add palamida.yml --------- Signed-off-by: Hila, Nino <Nino.Hila@amd.com> [ROCm/ROCR-Runtime commit: `f5daf75abf`]	2025-05-12 21:37:36 -07:00
Saleel Kudchadker	c0b0cb1788	rocr: Expose hsa_amd_memory_get_preferred_copy_engine api [ROCm/ROCR-Runtime commit: `1eb8694dd2`]	2025-05-09 17:13:27 -07:00
Shane Xiao	f8ac975cd2	rocr: Set rec_sdma_eng_override_ for all gpus Set rec_sdma_eng_override_ for the other GPUs as well; otherwise DmaCopyOnEngine will use SDMA for D<->D copies, which triggers an invalid argument error. [ROCm/ROCR-Runtime commit: `82a88f2e2b`]	2025-05-08 23:52:12 +08:00
christian-heusel	6c8a2da29a	rocr: Add missing cstdint include [ROCm/ROCR-Runtime commit: `5cc61b714d`]	2025-05-06 20:52:48 -04:00
Searles, Mark	f698518819	Update createMCObjectStreamer() to use new LLVM API (#156) (#157) * Update createMCObjectStreamer() to use new LLVM API Obsolete interfaces were removed via llvm-project's f2ff298867d7733122e32eead5a8c524b09dfdb1 * Fix typo: LLVM_VERSION -> LLVM_VERSION_MAJOR * Fix typo [ROCm/ROCR-Runtime commit: `ac1e6d59c2`]	2025-05-05 13:18:05 -07:00
Apurv Mishra	aa896090f8	kfdtest: Update ROCr homepage in CMakeLists.txt Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com> [ROCm/ROCR-Runtime commit: `aa0a32a166`]	2025-05-01 11:22:49 -04:00
David Yat Sin	b48b401a09	rocr: Fix logic for scratch reclaim Fix logic error that can cause scratch memory to be reclaimed while a dispatch is still using it. [ROCm/ROCR-Runtime commit: `4ed5950beb`]	2025-04-29 17:23:45 -04:00
Amber Lin	9d98d7479d	kfdtest: Skip SVMEvict with xnack=0 A random driver deadlock in svm_range_evict_svm_bo_worker() is observed in NPS2/DPX mode. It is seen with xnack off and happens more often on the partition with less VRAM because of TMR. Temporarily skip SVM Evict tests on Family AV when xnack is disabled. Signed-off-by: Amber Lin <Amber.Lin@amd.com> [ROCm/ROCR-Runtime commit: `5e28208cec`]	2025-04-25 12:45:36 -04:00
Tony Gutierrez	ce61e3301b	rocr: Add large_bar_enabled var to the GPU agent Adds a bool to the GPU agent and a public member method to check if the GPU supports large BAR. This is needed so we can check if large BAR is supported when a user tries to allocate an AQL queue in device memory on a given GPU agent. Also adds an exception to the AQL queue if device-side AQL queues are requested and the GPU that owns the AQL queue does not support large BAR. Otherwise, ROCr will currently allow device-side queues that can cause faults when the user tries to touch their ring buffers, and the user will not know why the faults are occurring. This relies on the fact that the KFD does not expose any links from the CPU to the GPU if large BAR is not enabled (though links from the GPU to the CPU may still be exposed by the KFD). [ROCm/ROCR-Runtime commit: `f2c482d923`]	2025-04-23 15:53:29 -04:00
Tony Gutierrez	6f37386eb2	rocr: Flags to alloc queue buf/struct in dev mem This builds on a prior change that allowed a user-mode queue's packet buffer to be allocated in device memory, extending it to also allocate the queue struct in device memory. This provides additional latency benefits, particularly when dispatches are performed from the GPU itself. Flags are added to support the various use cases. [ROCm/ROCR-Runtime commit: `6e3c375bf1`]	2025-04-23 15:53:29 -04:00
Tony Gutierrez	18404ba8a8	rocr: Remove empty shared.cpp [ROCm/ROCR-Runtime commit: `11d1d2cd25`]	2025-04-23 15:53:29 -04:00
Tony Gutierrez	3ebcf3020f	rocr/libhsakmt: Add coarse-grain allocator to GPU [ROCm/ROCR-Runtime commit: `adbc0495e2`]	2025-04-23 15:53:29 -04:00
Saleel Kudchadker	945d6da90b	rocr: return preferred SDMA engine mask - Add a new AMD extension API to return preferred SDMA engine mask. This can be used in conjunction with copy_on_engine API to get optimal bandwidth. [ROCm/ROCR-Runtime commit: `57c0c643ce`]	2025-04-22 13:28:38 -07:00
Amber Lin	bf3bb1f1a1	Revert "kfdtest: Temporarily blacklist KFDNegativeTest" This reverts commit `fffdffc3ce`. MEC v18 starts to support pipe reset [ROCm/ROCR-Runtime commit: `bdb6e43b54`]	2025-04-21 14:14:10 -04:00
Yiannis Papadopoulos	8246b54f1e	rocr/aie: Remove redundant cache flushes for already loaded PDIs [ROCm/ROCR-Runtime commit: `7c8fa87160`]	2025-04-17 09:48:41 -05:00
Shane Xiao	8d34f4e12d	rocr: Add rec sdma engines with limited XGMI SDMA engine This patch adds recommended SDMA engine support for configurations with limited XGMI SDMA engines. It uses one PCIe SDMA engine for GPU <-> GPU copies, which helps improve all-to-all copy performance. Signed-off-by: Shane Xiao <shane.xiao@amd.com> [ROCm/ROCR-Runtime commit: `6a63170b38`]	2025-04-11 23:54:15 +08:00
Jonathan Kim	a595c0bd25	kfdtest: fix trap on start for gfx 9 and 11 Similar to GFX 12, GFX 9 and 11 need to exit without forwarding the PC. [ROCm/ROCR-Runtime commit: `4c3a0698f8`]	2025-04-10 14:48:19 -04:00
David Yat Sin	309a1354ab	rocr: refactor PC Sampling PRED_EXEC op Refactor PRED_EXEC op command size calculation. Fix issue when copy size is less than 32MB. [ROCm/ROCR-Runtime commit: `c1b7aa39ed`]	2025-04-08 17:26:29 -04:00
Yiannis Papadopoulos	96b7e42776	rocr/aie: Increment write pointer upon packet submission [ROCm/ROCR-Runtime commit: `2d2c47bdef`]	2025-04-08 15:36:40 -05:00
Eric Huang	13cdca7fb3	kfdtest: fix max queues on multi-gpu mode The maximum number of queues per process in KFD is 1024. KFDQMTest.OverSubscribeCpQueues fails in multi-GPU mode with more than 15 GPUs because 65x16=1040 exceeds 1024; adjusting MAX_CP_QUEUES accordingly fixes the issue. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> [ROCm/ROCR-Runtime commit: `df6048429c`]	2025-04-08 12:57:00 -04:00
Eric Huang	9055cf8092	kfdtest: fix ptrace error on multi-gpu mode The parent process can only be ptraced by one process at a time; to avoid the error, a mutex is added to synchronize the ptrace calls. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> [ROCm/ROCR-Runtime commit: `d3265234e9`]	2025-04-08 09:58:28 -04:00
Choudhary, Rahul	480ca4d5e7	Update workflow to use mainline branch [ROCm/ROCR-Runtime commit: `5b4c717208`]	2025-04-07 09:36:52 -04:00
Choudhary, Rahul	d605e9256c	Update rocm_ci_caller.yml updating push trigger to amd-mainline Signed-off-by: Choudhary, Rahul <Rahul.Choudhary@amd.com> [ROCm/ROCR-Runtime commit: `a0b80c825c`]	2025-04-07 09:36:52 -04:00
Yiannis Papadopoulos	f53a9c72c4	rocr/aie: Using PDI address instead of cu_mask for dispatch. Automatic hw ctx reconfiguration upon new PDI addition. [ROCm/ROCR-Runtime commit: `c63e01724c`]	2025-04-03 15:13:20 -05:00
Lancelot SIX	c813d2c62d	rocr: Replace tabs with spaces in trap handler source codes Use spaces consistently to format the trap handler code. This patch does not introduce any change in the trap handler. Using `git show -w` on this patch shows an empty diff. Change-Id: Ic0244dd203347146ffde65460cd87ecbcc43732a [ROCm/ROCR-Runtime commit: `e0359e5d35`]	2025-04-03 09:44:23 +01:00
David Yat Sin	f46bc26cff	rocr: Fix PC Sampling PRED_EXEC num dwords count Fix incorrect value for number of dwords in the PRED_EXEC command. [ROCm/ROCR-Runtime commit: `2a433e2b96`]	2025-04-01 15:53:45 -04:00
Mallya, Ameya Keshava	7adfc15d58	Adding !verify features Signed-off-by: Mallya, Ameya Keshava <AmeyaKeshava.Mallya@amd.com> [ROCm/ROCR-Runtime commit: `39e8911fbc`]	2025-03-31 13:05:52 -07:00


2930 commits