Commit Graph

2930 Commits

Author SHA1 Message Date
Amber Lin 9c6828647b kfdtest: blacklist KFDSVMEvictTest.QueueTest
Temporarily blacklist KFDSVMEvictTest.QueueTest on gfx950

Signed-off-by: Amber Lin <Amber.Lin@amd.com>


[ROCm/ROCR-Runtime commit: 31d51acb26]
2025-05-23 01:22:11 -04:00
David Yat Sin 342e478e7d rocr: Perform memcpy for small code-object loads
On large BAR systems, for small code-objects, we get better performance
using a direct memcpy because of the latency of the blit-copy.


[ROCm/ROCR-Runtime commit: da2607024b]
2025-05-22 18:39:19 -04:00
David Yat Sin 9c5bb61708 rocr: Perform range based cache invalidates
Invalidate only the address range that covers the newly copied
code-object. This avoids invalidating I$ for old code objects and thus
might increase I$ hit rate.


[ROCm/ROCR-Runtime commit: e969e01f54]
2025-05-22 18:39:19 -04:00
Ramakrishnan, Ranjith 85cd72987f CMake: Remove file reorganization backward compatibility code (#176)
The feature has already been disabled, and the related source code is no longer required

[ROCm/ROCR-Runtime commit: 1785cff6a5]
2025-05-22 09:47:26 -07:00
Philip Yang 4ac71d1f5d kfdtest: Add KFDQMTest UserQueueBufValidation
Creating a CP queue or SDMA queue should fail with an invalid queue ring
buffer or ring buffer size.

Test that unmapping or freeing queue buffers fails before the queue is destroyed.

Use a child process to test that unmapping the CWSR buffer evicts the queue.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Change-Id: I5dcd51d6b43445d19a986f8b0b82063e20348a5f


[ROCm/ROCR-Runtime commit: bd86fb1e63]
2025-05-22 10:06:42 -04:00
Philip Yang 50886316e9 libhsakmt: unmap from GPU error handling
If unmapping from the GPU fails, for example when unmapping a user queue
buffer while the queue is active, we should not free
obj->mapped_node_id_array; otherwise, a later unmap of the user queue
buffer after the queue is destroyed still fails.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Change-Id: I32aeb18871c2e971d01900d92916c54680f5c9fa


[ROCm/ROCR-Runtime commit: 3e6f51b715]
2025-05-22 10:06:42 -04:00
Apurv Mishra 5c42a9f1bf kfdtest: Disable tests that cause unwanted behavior
Disable KFDLocalMemoryTest.Fragmentation and
KFDEventTest.MeasureInterruptConsumption as
part of the KFD test suite improvement feature.

Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>


[ROCm/ROCR-Runtime commit: f853dda9ba]
2025-05-21 16:29:15 -04:00
Ben Vanik ba02a7b1ca kfdtest: Fix SVM profiler QUEUE_RESTORE parsing
[ROCm/ROCR-Runtime commit: d54124383f]
2025-05-21 13:17:25 -04:00
Ben Vanik 62cd7e1f54 rocr: Fix SVM profiler QUEUE_RESTORE parsing
[ROCm/ROCR-Runtime commit: 1a32392912]
2025-05-21 13:17:25 -04:00
Flora Cui 89e5075ce0 rocr: try defaultSignal for intercept_queue
if interrupt is not supported

Signed-off-by: Flora Cui <flora.cui@amd.com>


[ROCm/ROCR-Runtime commit: 8cf4b7fc05]
2025-05-21 09:37:47 -04:00
Yiannis Papadopoulos 69505ab60c Fix formatting
[ROCm/ROCR-Runtime commit: 700078d335]
2025-05-20 13:59:22 -05:00
Yiannis Papadopoulos 38c54b09ac rocr/aie: Correct operand count
[ROCm/ROCR-Runtime commit: c80616d807]
2025-05-20 13:59:22 -05:00
David Yat Sin 38ea4370c1 rocr: Fix doorbell ring
When compiling with -O0, some compilers generate an xchg instruction for
the __atomic_store(...) built-in. Using xchg on MMIO memory is
undefined behavior and may be ignored on certain CPUs.


[ROCm/ROCR-Runtime commit: f011a9506d]
2025-05-20 09:19:10 -04:00
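The doorbell issue described above can be illustrated with a minimal sketch. It assumes one possible approach, a release fence followed by a volatile store, which mainstream compilers emit as a plain store rather than a read-modify-write instruction. `RingDoorbell` and its parameters are hypothetical names chosen for illustration; this is not the code from the commit.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not ROCr's actual implementation: publish a new
// write index through a doorbell location.
inline void RingDoorbell(volatile std::uint64_t* doorbell,
                         std::uint64_t write_index) {
  assert(doorbell != nullptr);
  // Order earlier ring-buffer writes before the doorbell write.
  std::atomic_thread_fence(std::memory_order_release);
  // A volatile store is a write-only access; it does not read the
  // location back, unlike an exchange-style instruction.
  *doorbell = write_index;
}
```

In this sketch the ordering guarantee comes from the fence and the store itself stays an ordinary write, so the instruction selected for it does not depend on how the compiler lowers an atomic built-in.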
Aaron Liu ba372ca4a8 rocrtst/dtif: performance::memory_async_copy test fix on DTIF
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Feifei Xu <feifxu@amd.com>
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 297ea78140]
2025-05-13 16:44:31 -04:00
Jiadong Zhu b99015f30a rocr/dtif: use default signal for intercept queue for DTIF
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 0f9d2b836c]
2025-05-13 16:44:31 -04:00
Aaron Liu 85a11c729c rocr/dtif: disable interrupt signal for DTIF backend
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 8c1b1201b7]
2025-05-13 16:44:31 -04:00
Jiadong Zhu a0dc167541 rocr/dtif: add hsaKmtQueueRingDoorbell in thunk loader
hsaKmtQueueRingDoorbell is specific to the DTIF backend

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: e2d767879d]
2025-05-13 16:44:31 -04:00
Aaron Liu 008bbd94d5 rocr/dtif: add CreateThunkInstance/DestroyThunkInstance interfaces
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: e9088d6e47]
2025-05-13 16:44:31 -04:00
Aaron Liu c6ffc85a47 rocr/dtif: add DRM APIs wrapper in thunk loader
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 0cd4ddd62b]
2025-05-13 16:44:31 -04:00
Aaron Liu 6cf184a0d4 rocr/dtif: replace hsakmt interfaces with HSAKMT_CALL(...)
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 1b79caa214]
2025-05-13 16:44:31 -04:00
Aaron Liu 87dcbf1255 rocr/dtif: add thunk loader to wrap hsaKmt APIs
For native and DTIF backends, unify to use HSAKMT_CALL(...) to call
hsaKmt APIs.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 7ba77fb193]
2025-05-13 16:44:31 -04:00
Aaron Liu 137b168b46 rocr/dtif: add dtif environment variable
Use HSA_ENABLE_DTIF to select the DTIF or native thunk code path

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 166b0fa45a]
2025-05-13 16:44:31 -04:00
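As a rough illustration of an environment-variable switch like the one this commit describes, the sketch below reads `HSA_ENABLE_DTIF` once and picks a code path. Only the variable name comes from the commit message. The parsing rule (unset, empty, or "0" means native) and the names `ParseEnableFlag` and `DtifEnabled` are assumptions made for this example, not ROCr's actual behavior.

```cpp
#include <cstdlib>

// Assumed rule for this sketch: unset, empty, or "0" disables the flag;
// any other value enables it.
constexpr bool ParseEnableFlag(const char* value) {
  return value != nullptr && value[0] != '\0' &&
         !(value[0] == '0' && value[1] == '\0');
}

// Hypothetical query used to choose between the DTIF and native thunk paths.
inline bool DtifEnabled() {
  return ParseEnableFlag(std::getenv("HSA_ENABLE_DTIF"));
}
```

Keeping the parsing in a pure function makes the rule easy to check without touching the process environment.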
Ma, Li 526db8bbaa rocr: Expose all available DMA engines (#165)
For inter-device copies, only XGMI engines are currently exposed. SDMA0/1
will now be exposed as well for inter-device copies, especially since
they are among the recommended engines.

Signed-off-by: Li Ma <li.ma@amd.com>

[ROCm/ROCR-Runtime commit: e38dd98914]
2025-05-13 17:42:15 +08:00
Hila, Nino 24b8070788 Update palamida.yml (#158)
* Update palamida.yml

Signed-off-by: Hila, Nino <Nino.Hila@amd.com>

* Add palamida.yml

---------

Signed-off-by: Hila, Nino <Nino.Hila@amd.com>

[ROCm/ROCR-Runtime commit: f5daf75abf]
2025-05-12 21:37:36 -07:00
Saleel Kudchadker c0b0cb1788 rocr: Expose hsa_amd_memory_get_preferred_copy_engine api
[ROCm/ROCR-Runtime commit: 1eb8694dd2]
2025-05-09 17:13:27 -07:00
Shane Xiao f8ac975cd2 rocr: Set rec_sdma_eng_override_ for all gpus
Set rec_sdma_eng_override_ for the other GPUs; otherwise DmaCopyOnEngine
will use SDMA for D<->D copies, which triggers an invalid argument error.


[ROCm/ROCR-Runtime commit: 82a88f2e2b]
2025-05-08 23:52:12 +08:00
christian-heusel 6c8a2da29a rocr: Add missing cstdint include
[ROCm/ROCR-Runtime commit: 5cc61b714d]
2025-05-06 20:52:48 -04:00
Searles, Mark f698518819 Update createMCObjectStreamer() to use new LLVM API (#156) (#157)
* Update createMCObjectStreamer() to use new LLVM API

Obsolete interfaces were removed via llvm-project's
f2ff298867d7733122e32eead5a8c524b09dfdb1

* Fix typo: LLVM_VERSION -> LLVM_VERSION_MAJOR

* Fix typo

[ROCm/ROCR-Runtime commit: ac1e6d59c2]
2025-05-05 13:18:05 -07:00
Apurv Mishra aa896090f8 kfdtest: Update ROCr homepage in CMakeLists.txt
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>


[ROCm/ROCR-Runtime commit: aa0a32a166]
2025-05-01 11:22:49 -04:00
David Yat Sin b48b401a09 rocr: Fix logic for scratch reclaim
Fix logic error that can cause scratch memory to be reclaimed while a
dispatch is still using it.


[ROCm/ROCR-Runtime commit: 4ed5950beb]
2025-04-29 17:23:45 -04:00
Amber Lin 9d98d7479d kfdtest: Skip SVMEvict with xnack=0
Random driver deadlock on svm_range_evict_svm_bo_worker() is observed on
NPS2/DPX mode. It's seen with xnack off and happens more often on the
partition with less VRAM because of TMR.

Temporarily skip SVM Evict tests on Family AV when xnack is disabled.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>


[ROCm/ROCR-Runtime commit: 5e28208cec]
2025-04-25 12:45:36 -04:00
Tony Gutierrez ce61e3301b rocr: Add large_bar_enabled var to the GPU agent
Adds a bool to the GPU agent and a public member method to
check if the GPU supports large BAR. This is needed so we can
check if large BAR is supported when a user tries to allocate
an AQL queue in device memory on a given GPU agent.

Also adds an exception to the AQL queue if device-side AQL queues
are requested and the GPU owner of the AQL doesn't support large
BAR. Otherwise, ROCr will currently allow device-side queues
that can cause faults when the user tries to touch their ring
buffers and the user will not know why the faults are occurring.

This relies on the fact that the KFD does not expose any links
from the CPU to the GPU if large BAR is not enabled (though
links from the GPU to the CPU may still be exposed by the KFD).


[ROCm/ROCR-Runtime commit: f2c482d923]
2025-04-23 15:53:29 -04:00
Tony Gutierrez 6f37386eb2 rocr: Flags to alloc queue buf/struct in dev mem
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.


[ROCm/ROCR-Runtime commit: 6e3c375bf1]
2025-04-23 15:53:29 -04:00
Tony Gutierrez 18404ba8a8 rocr: Remove empty shared.cpp
[ROCm/ROCR-Runtime commit: 11d1d2cd25]
2025-04-23 15:53:29 -04:00
Tony Gutierrez 3ebcf3020f rocr/libhsakmt: Add coarse-grain allocator to GPU
[ROCm/ROCR-Runtime commit: adbc0495e2]
2025-04-23 15:53:29 -04:00
Saleel Kudchadker 945d6da90b rocr: return preferred SDMA engine mask
- Add a new AMD extension API to return the preferred SDMA engine mask.
This can be used in conjunction with the copy_on_engine API to get
optimal bandwidth.


[ROCm/ROCR-Runtime commit: 57c0c643ce]
2025-04-22 13:28:38 -07:00
Amber Lin bf3bb1f1a1 Revert "kfdtest: Temporarily blacklist KFDNegativeTest"
This reverts commit fffdffc3ce.

MEC v18 starts to support pipe reset


[ROCm/ROCR-Runtime commit: bdb6e43b54]
2025-04-21 14:14:10 -04:00
Yiannis Papadopoulos 8246b54f1e rocr/aie: Remove redundant cache flushes for already loaded PDIs
[ROCm/ROCR-Runtime commit: 7c8fa87160]
2025-04-17 09:48:41 -05:00
Shane Xiao 8d34f4e12d rocr: Add rec sdma engines with limited XGMI SDMA engine
This patch adds recommended SDMA engine support for configurations
with limited XGMI SDMA engines. It uses one PCIe SDMA engine
for GPU <-> GPU copies, which helps improve all-to-all
copy performance.

Signed-off-by: Shane Xiao <shane.xiao@amd.com>


[ROCm/ROCR-Runtime commit: 6a63170b38]
2025-04-11 23:54:15 +08:00
Jonathan Kim a595c0bd25 kfdtest: fix trap on start for gfx 9 and 11
Similar to GFX 12, GFX 9 and 11 need to exit without forwarding
the PC.


[ROCm/ROCR-Runtime commit: 4c3a0698f8]
2025-04-10 14:48:19 -04:00
David Yat Sin 309a1354ab rocr: refactor PC Sampling PRED_EXEC op
Refactor PRED_EXEC op command size calculation.
Fix issue when copy size is less than 32MB.


[ROCm/ROCR-Runtime commit: c1b7aa39ed]
2025-04-08 17:26:29 -04:00
Yiannis Papadopoulos 96b7e42776 rocr/aie: Increment write pointer upon packet submission
[ROCm/ROCR-Runtime commit: 2d2c47bdef]
2025-04-08 15:36:40 -05:00
Eric Huang 13cdca7fb3 kfdtest: fix max queues on multi-gpu mode
The maximum number of queues per process is 1024 in KFD.
KFDQMTest.OverSubscribeCpQueues fails in multi-GPU mode
on more than 15 GPUs because 65x16=1040 exceeds 1024, so
adapting MAX_CP_QUEUES to the GPU count fixes the issue.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>


[ROCm/ROCR-Runtime commit: df6048429c]
2025-04-08 12:57:00 -04:00
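The arithmetic in this commit message can be worked through in a short sketch. The limit of 1024 and the per-GPU count of 65 are taken from the message above. The helper `CpQueuesPerGpu` and the idea of dividing the limit by the GPU count are assumptions for illustration, since the message does not say how MAX_CP_QUEUES was adapted.

```cpp
// Values quoted in the commit message.
constexpr unsigned kMaxQueuesPerProcess = 1024;  // KFD per-process limit
constexpr unsigned kDefaultCpQueues = 65;        // per-GPU count in "65x16=1040"

// Hypothetical helper: keep the default while the process total fits,
// otherwise fall back to an even share of the per-process limit.
constexpr unsigned CpQueuesPerGpu(unsigned num_gpus) {
  return (num_gpus == 0 ||
          kDefaultCpQueues * num_gpus <= kMaxQueuesPerProcess)
             ? kDefaultCpQueues
             : kMaxQueuesPerProcess / num_gpus;
}

// 15 GPUs: 65 * 15 = 975, which fits within 1024.
static_assert(CpQueuesPerGpu(15) == 65, "default fits on 15 GPUs");
// 16 GPUs: 65 * 16 = 1040 exceeds 1024, so the share drops to 1024 / 16.
static_assert(CpQueuesPerGpu(16) == 64, "16 GPUs need a smaller share");
```

With 16 GPUs the sketch yields 64 queues per GPU, for a total of exactly 1024.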
Eric Huang 9055cf8092 kfdtest: fix ptrace error on multi-gpu mode
The parent process can be ptraced by only one process
at a time. To avoid the error, we have to add a mutex to
synchronize the ptrace call.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>


[ROCm/ROCR-Runtime commit: d3265234e9]
2025-04-08 09:58:28 -04:00
Choudhary, Rahul 480ca4d5e7 Update workflow to use mainline branch
[ROCm/ROCR-Runtime commit: 5b4c717208]
2025-04-07 09:36:52 -04:00
Choudhary, Rahul d605e9256c Update rocm_ci_caller.yml updating push trigger to amd-mainline
Signed-off-by: Choudhary, Rahul <Rahul.Choudhary@amd.com>

[ROCm/ROCR-Runtime commit: a0b80c825c]
2025-04-07 09:36:52 -04:00
Yiannis Papadopoulos f53a9c72c4 rocr/aie: Using PDI address instead of cu_mask for dispatch. Automatic hw ctx reconfiguration upon new PDI addition.
[ROCm/ROCR-Runtime commit: c63e01724c]
2025-04-03 15:13:20 -05:00
Lancelot SIX c813d2c62d rocr: Replace tabs with spaces in trap handler source codes
Use spaces consistently to format the trap handler code.  This patch
does not introduce any change in the trap handler.  Using `git show -w`
on this patch shows an empty diff.

Change-Id: Ic0244dd203347146ffde65460cd87ecbcc43732a


[ROCm/ROCR-Runtime commit: e0359e5d35]
2025-04-03 09:44:23 +01:00
David Yat Sin f46bc26cff rocr: Fix PC Sampling PRED_EXEC num dwords count
Fix incorrect value for number of dwords in the PRED_EXEC command.


[ROCm/ROCR-Runtime commit: 2a433e2b96]
2025-04-01 15:53:45 -04:00
Mallya, Ameya Keshava 7adfc15d58 Adding !verify features
Signed-off-by: Mallya, Ameya Keshava <AmeyaKeshava.Mallya@amd.com>

[ROCm/ROCR-Runtime commit: 39e8911fbc]
2025-03-31 13:05:52 -07:00