Commit graph

2890 commits

Author SHA1 Message Date
David Yat Sin b83b8b4535 rocr: Remove deprecated queue doubleMap code
[ROCm/ROCR-Runtime commit: 4bae509296]
2025-05-28 16:12:02 -04:00
David Yat Sin e84a855c98 rocr: Remove queue_full_workaround code
Remove deprecated queue_full_workaround code as gfx7 and gfx8 GPUs are
EoL.


[ROCm/ROCR-Runtime commit: b8434529a5]
2025-05-28 16:12:02 -04:00
David Yat Sin a16f5380cd rocr: Remove addrlib files for EoL GPUs
[ROCm/ROCR-Runtime commit: 2b691c3d5f]
2025-05-28 16:12:02 -04:00
David Yat Sin 4ecd0382b7 rocr: update required CP FW version
Update the CP FW version required for async-scratch memory support
on gfx950.


[ROCm/ROCR-Runtime commit: 04dbf769f6]
2025-05-28 13:03:58 -04:00
David Yat Sin 5e7bd6145d rocr: Fix compile error when using clang
[ROCm/ROCR-Runtime commit: 9d38ca0d22]
2025-05-27 23:56:28 -04:00
Apurv Mishra 226d8126c9 kfdtest: Disable KFD RAS test case
Disable the KFD RAS test case because the tests cause a GPU reset
that affects the active kfdtest. The tests can only be
run successfully as separate processes.

Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>


[ROCm/ROCR-Runtime commit: d9a95605cc]
2025-05-27 19:04:04 -04:00
cfreeamd c20e30db93 rocr: Support unmap adjacent mem sections in 1 try
[ROCm/ROCR-Runtime commit: f0ce7a8e59]
2025-05-27 15:13:20 -04:00
Alysa Liu 296e60d882 rocr: Add check for 'value' pointer
Replaces the assert(value) check with an explicit null pointer check.
Returns HSA_STATUS_ERROR_INVALID_ARGUMENT on null values.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: 625425326d]
2025-05-27 12:18:04 -04:00
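
The change described above (an explicit null check in place of an assertion) can be illustrated with a minimal sketch. The `Status` enum and `GetInfoSketch` below are hypothetical stand-ins, not the ROCr code; only the idea of returning an invalid-argument status for a null `value` comes from the commit message.

```cpp
#include <cstdint>

// Hypothetical status type standing in for hsa_status_t.
enum class Status { kSuccess, kErrorInvalidArgument };

// Hypothetical query function. An assert(value) would be compiled out
// when NDEBUG is defined, so release builds would dereference null.
// The explicit check below behaves the same in every build type.
Status GetInfoSketch(void* value) {
  if (value == nullptr) {
    return Status::kErrorInvalidArgument;
  }
  *static_cast<uint32_t*>(value) = 42u;  // placeholder payload
  return Status::kSuccess;
}
```

A caller can then handle the error status instead of crashing on a bad argument.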
Alysa Liu 8cbabdbbe3 rocr: Unchecked return value as arg
v1: Add value pointer validation before
dereferencing in GetInfo method for MODULE_NAME case.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: f1f34da4f6]
2025-05-27 12:18:04 -04:00
cfreeamd 7fe67829ef rocr: Fix ISA generic's for gfx906 wrt sramecc
gfx9-generic cannot support sramecc- and sramecc+.
sramecc feature is only configurable on gfx906.

The code object produced for gfx9-generic can be loaded on gfx906 with
any sramecc setting; the compiler will produce ISA that works correctly
with both settings (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4).


[ROCm/ROCR-Runtime commit: b7361c5ee4]
2025-05-27 07:45:00 -05:00
cfreeamd b7d56427ec rocr: Fix ISA generic's for gfx906 wrt sramecc
gfx9-generic cannot support sramecc- and sramecc+.
sramecc feature is only configurable on gfx906.

The code object produced for gfx9-generic can be loaded on gfx906 with
any sramecc setting; the compiler will produce ISA that works correctly
with both settings (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4).


[ROCm/ROCR-Runtime commit: 3e99bb6150]
2025-05-27 07:45:00 -05:00
Yifan Zhang 3ab8b5a98b coredump: call KFD_IOC_DBG_TRAP_DISABLE in error path.
KFD assumes kfd_dbg_trap_enable/disable are called in pairs; otherwise
a kfd_process reference leaks in KFD.


[ROCm/ROCR-Runtime commit: ccd91bcd19]
2025-05-27 13:54:00 +08:00
Eric Huang 0d5e261f39 libhsakmt: optimize big system buffer allocation
Align the biggest single buffer to the huge page size,
plus other optimizations.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>


[ROCm/ROCR-Runtime commit: afe7965796]
2025-05-26 18:30:00 -04:00
Eric Huang 2c6f84b12c libhsakmt: add big system buffer allocation support
When allocating a userptr buffer in system RAM with a size of 512G or
more, TTM hits a limit and returns an error. Splitting one
big buffer into multiple smaller buffers in vm_object solves
this issue.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>


[ROCm/ROCR-Runtime commit: 8887d25304]
2025-05-26 11:04:30 -04:00
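
The splitting idea in the commit above can be sketched as follows. This is a minimal illustration, not the libhsakmt code: `Chunk`, `SplitBuffer`, and the choice of maximum chunk size are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical description of one sub-buffer of a large allocation.
struct Chunk {
  uint64_t offset;  // byte offset from the start of the big buffer
  uint64_t size;    // byte size of this sub-buffer
};

// Split a buffer of `total` bytes into consecutive chunks, none larger
// than `max_chunk` bytes. The last chunk holds the remainder.
std::vector<Chunk> SplitBuffer(uint64_t total, uint64_t max_chunk) {
  std::vector<Chunk> chunks;
  if (max_chunk == 0) return chunks;  // nothing sensible to do
  for (uint64_t off = 0; off < total;) {
    const uint64_t size = std::min(max_chunk, total - off);
    chunks.push_back({off, size});
    off += size;
  }
  return chunks;
}
```

Each chunk stays under the per-allocation limit, while the chunks together still cover the whole requested range.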
Flora Cui 4360679cb7 rocrtst: performance::memory_async_copy test fix on DXG
Signed-off-by: Flora Cui <flora.cui@amd.com>


[ROCm/ROCR-Runtime commit: e884650952]
2025-05-26 15:01:27 +08:00
Amber Lin 9c6828647b kfdtest: blacklist KFDSVMEvictTest.QueueTest
Temporarily blacklist KFDSVMEvictTest.QueueTest on gfx950

Signed-off-by: Amber Lin <Amber.Lin@amd.com>


[ROCm/ROCR-Runtime commit: 31d51acb26]
2025-05-23 01:22:11 -04:00
David Yat Sin 342e478e7d rocr: Perform memcpy for small code-object loads
On large BAR systems, a direct memcpy performs better for small
code-objects because of the latency of the blit-copy.


[ROCm/ROCR-Runtime commit: da2607024b]
2025-05-22 18:39:19 -04:00
David Yat Sin 9c5bb61708 rocr: Perform range based cache invalidates
Invalidate only the address range that covers the newly copied
code-object. This avoids invalidating I$ for old code objects and thus
might increase I$ hit rate.


[ROCm/ROCR-Runtime commit: e969e01f54]
2025-05-22 18:39:19 -04:00
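
A range-based invalidate needs the address range of the copied code-object, typically widened to cache-line boundaries. The sketch below only shows that arithmetic; `Range`, `AlignedRange`, and the line size passed in are assumptions for illustration and do not reflect how ROCr issues the invalidate.

```cpp
#include <cstdint>

// Hypothetical half-open address range [begin, end).
struct Range {
  uint64_t begin;
  uint64_t end;
};

// Widen [addr, addr + size) to cache-line boundaries.
// `line` must be a power of two (for example 64).
Range AlignedRange(uint64_t addr, uint64_t size, uint64_t line) {
  const uint64_t mask = line - 1;
  Range r;
  r.begin = addr & ~mask;                // round start down
  r.end = (addr + size + mask) & ~mask;  // round end up
  return r;
}
```

Only the lines inside the returned range would need invalidating, so cache lines that hold older code objects are left alone.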
Ramakrishnan, Ranjith 85cd72987f CMake: Remove file reorganization backward compatibility code (#176)
The feature has already been disabled, and the related source code is no longer required

[ROCm/ROCR-Runtime commit: 1785cff6a5]
2025-05-22 09:47:26 -07:00
Philip Yang 4ac71d1f5d kfdtest: Add KFDQMTest UserQueueBufValidation
Creating a CP queue or an SDMA queue should fail with an invalid queue
ring buffer or ring buffer size.

Unmapping or freeing queue buffers should fail before the queue is destroyed.

Use a child process to test that unmapping the CWSR buffer evicts the queue.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Change-Id: I5dcd51d6b43445d19a986f8b0b82063e20348a5f


[ROCm/ROCR-Runtime commit: bd86fb1e63]
2025-05-22 10:06:42 -04:00
Philip Yang 50886316e9 libhsakmt: unmap from GPU error handling
If unmapping from the GPU fails, for example when unmapping a user queue
buffer while the queue is active, we should not free
obj->mapped_node_id_array; otherwise, a later unmap of the user queue
buffer after the queue is destroyed will still fail.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Change-Id: I32aeb18871c2e971d01900d92916c54680f5c9fa


[ROCm/ROCR-Runtime commit: 3e6f51b715]
2025-05-22 10:06:42 -04:00
Apurv Mishra 5c42a9f1bf kfdtest: Disable tests that cause unwanted behavior
Disable KFDLocalMemoryTest.Fragmentation and
KFDEventTest.MeasureInterruptConsumption as
part of the KFD test suite improvement feature.

Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>


[ROCm/ROCR-Runtime commit: f853dda9ba]
2025-05-21 16:29:15 -04:00
Ben Vanik ba02a7b1ca kfdtest: Fix SVM profiler QUEUE_RESTORE parsing
[ROCm/ROCR-Runtime commit: d54124383f]
2025-05-21 13:17:25 -04:00
Ben Vanik 62cd7e1f54 rocr: Fix SVM profiler QUEUE_RESTORE parsing
[ROCm/ROCR-Runtime commit: 1a32392912]
2025-05-21 13:17:25 -04:00
Flora Cui 89e5075ce0 rocr: try defaultSignal for intercept_queue
if interrupt is not supported

Signed-off-by: Flora Cui <flora.cui@amd.com>


[ROCm/ROCR-Runtime commit: 8cf4b7fc05]
2025-05-21 09:37:47 -04:00
Yiannis Papadopoulos 69505ab60c Fix formatting
[ROCm/ROCR-Runtime commit: 700078d335]
2025-05-20 13:59:22 -05:00
Yiannis Papadopoulos 38c54b09ac rocr/aie: Correct operand count
[ROCm/ROCR-Runtime commit: c80616d807]
2025-05-20 13:59:22 -05:00
David Yat Sin 38ea4370c1 rocr: Fix doorbell ring
When compiling with -O0, some compilers generate an xchg instruction for
the __atomic_store(...) built-in. Using xchg on MMIO memory is
undefined behavior and may be ignored on certain CPUs.


[ROCm/ROCR-Runtime commit: f011a9506d]
2025-05-20 09:19:10 -04:00
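
One way to avoid an exchange instruction on a doorbell write is to separate the ordering from the store. The sketch below is an assumption about a possible approach, not the fix applied in this commit: `RingDoorbellSketch` is a hypothetical helper, and it relies on mainstream x86-64 compilers emitting an ordinary store for a volatile write, which is platform behavior rather than a guarantee of the C++ memory model.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: publish `write_index` to a doorbell location.
inline void RingDoorbellSketch(volatile uint64_t* doorbell,
                               uint64_t write_index) {
  // Order earlier packet writes before the doorbell store.
  std::atomic_thread_fence(std::memory_order_release);
  // Volatile write: the compiler must perform exactly this one store
  // and has no reason to turn it into a read-modify-write.
  *doorbell = write_index;
}
```

In a test, an ordinary `uint64_t` variable can stand in for the MMIO doorbell address.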
Aaron Liu ba372ca4a8 rocrtst/dtif: performance::memory_async_copy test fix on DTIF
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Feifei Xu <feifxu@amd.com>
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 297ea78140]
2025-05-13 16:44:31 -04:00
Jiadong Zhu b99015f30a rocr/dtif: use default signal for intercept queue for DTIF
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 0f9d2b836c]
2025-05-13 16:44:31 -04:00
Aaron Liu 85a11c729c rocr/dtif: disable interrupt signal for DTIF backend
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 8c1b1201b7]
2025-05-13 16:44:31 -04:00
Jiadong Zhu a0dc167541 rocr/dtif: add hsaKmtQueueRingDoorbell in thunk loader
hsaKmtQueueRingDoorbell is specific to the DTIF backend

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: e2d767879d]
2025-05-13 16:44:31 -04:00
Aaron Liu 008bbd94d5 rocr/dtif: add CreateThunkInstance/DestroyThunkInstance interfaces
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: e9088d6e47]
2025-05-13 16:44:31 -04:00
Aaron Liu c6ffc85a47 rocr/dtif: add DRM APIs wrapper in thunk loader
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 0cd4ddd62b]
2025-05-13 16:44:31 -04:00
Aaron Liu 6cf184a0d4 rocr/dtif: replace hsakmt interfaces with HSAKMT_CALL(...)
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 1b79caa214]
2025-05-13 16:44:31 -04:00
Aaron Liu 87dcbf1255 rocr/dtif: add thunk loader to wrap hsaKmt APIs
For native and DTIF backends, unify to use HSAKMT_CALL(...) to call
hsaKmt APIs.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 7ba77fb193]
2025-05-13 16:44:31 -04:00
Aaron Liu 137b168b46 rocr/dtif: add dtif environment variable
Using HSA_ENABLE_DTIF to control dtif/native thunk code path

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 166b0fa45a]
2025-05-13 16:44:31 -04:00
Ma, Li 526db8bbaa rocr: Expose all available DMA engines (#165)
For inter-device copies, currently only XGMI is exposed. Now
SDMA0/1 will be exposed as well for inter-device copies, especially since
they are among the recommended engines.

Signed-off-by: Li Ma <li.ma@amd.com>

[ROCm/ROCR-Runtime commit: e38dd98914]
2025-05-13 17:42:15 +08:00
Hila, Nino 24b8070788 Update palamida.yml (#158)
* Update palamida.yml

Signed-off-by: Hila, Nino <Nino.Hila@amd.com>

* Add palamida.yml

---------

Signed-off-by: Hila, Nino <Nino.Hila@amd.com>

[ROCm/ROCR-Runtime commit: f5daf75abf]
2025-05-12 21:37:36 -07:00
Saleel Kudchadker c0b0cb1788 rocr: Expose hsa_amd_memory_get_preferred_copy_engine api
[ROCm/ROCR-Runtime commit: 1eb8694dd2]
2025-05-09 17:13:27 -07:00
Shane Xiao f8ac975cd2 rocr: Set rec_sdma_eng_override_ for all gpus
Set rec_sdma_eng_override_ for the other GPUs; otherwise DmaCopyOnEngine
will use SDMA for D<->D copies, which triggers an invalid-argument error.


[ROCm/ROCR-Runtime commit: 82a88f2e2b]
2025-05-08 23:52:12 +08:00
christian-heusel 6c8a2da29a rocr:Add missing cstdint include
[ROCm/ROCR-Runtime commit: 5cc61b714d]
2025-05-06 20:52:48 -04:00
Searles, Mark f698518819 Update createMCObjectStreamer() to use new LLVM API (#156) (#157)
* Update createMCObjectStreamer() to use new LLVM API

Obsolete interfaces were removed via llvm-project's
f2ff298867d7733122e32eead5a8c524b09dfdb1

* Fix typo: LLVM_VERSION -> LLVM_VERSION_MAJOR

* Fix typo

[ROCm/ROCR-Runtime commit: ac1e6d59c2]
2025-05-05 13:18:05 -07:00
Apurv Mishra aa896090f8 kfdtest: Update ROCr homepage in CMakeLists.txt
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>


[ROCm/ROCR-Runtime commit: aa0a32a166]
2025-05-01 11:22:49 -04:00
David Yat Sin b48b401a09 rocr: Fix logic for scratch reclaim
Fix logic error that can cause scratch memory to be reclaimed while a
dispatch is still using it.


[ROCm/ROCR-Runtime commit: 4ed5950beb]
2025-04-29 17:23:45 -04:00
Amber Lin 9d98d7479d kfdtest: Skip SVMEvict with xnack=0
A random driver deadlock in svm_range_evict_svm_bo_worker() is observed in
NPS2/DPX mode. It's seen with xnack off and happens more often on the
partition with less VRAM because of TMR.

Temporarily skip SVM Evict tests on Family AV when xnack is disabled.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>


[ROCm/ROCR-Runtime commit: 5e28208cec]
2025-04-25 12:45:36 -04:00
Tony Gutierrez ce61e3301b rocr: Add large_bar_enabled var to the GPU agent
Adds a bool to the GPU agent and a public member method to
check if the GPU supports large BAR. This is needed so we can
check if large BAR is supported when a user tries to allocate
an AQL queue in device memory on a given GPU agent.

Also adds an exception to the AQL queue if device-side AQL queues
are requested and the GPU owner of the AQL doesn't support large
BAR. Otherwise, ROCr will currently allow device-side queues
that can cause faults when the user tries to touch their ring
buffers and the user will not know why the faults are occurring.

This relies on the fact that the KFD does not expose any links
from the CPU to the GPU if large BAR is not enabled (though
links from the GPU to the CPU may still be exposed by the KFD).


[ROCm/ROCR-Runtime commit: f2c482d923]
2025-04-23 15:53:29 -04:00
Tony Gutierrez 6f37386eb2 rocr: Flags to alloc queue buf/struct in dev mem
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.


[ROCm/ROCR-Runtime commit: 6e3c375bf1]
2025-04-23 15:53:29 -04:00
Tony Gutierrez 18404ba8a8 rocr: Remove empty shared.cpp
[ROCm/ROCR-Runtime commit: 11d1d2cd25]
2025-04-23 15:53:29 -04:00
Tony Gutierrez 3ebcf3020f rocr/libhsakmt: Add coarse-grain allocator to GPU
[ROCm/ROCR-Runtime commit: adbc0495e2]
2025-04-23 15:53:29 -04:00