1371 Commits

Author SHA1 Message Date
David Yat Sin ec4830eb5c rocr: document pseudo-code for scratch reclaim
Document CP FW and ROCr pseudo-code for asynchronous reclaim.
No code change.


[ROCm/ROCR-Runtime commit: df5d66eae5]
2025-06-11 16:19:59 -04:00
Chris Freehill 91268a6be9 rocr: Add hsa_amd_portable_export_dmabuf_v2
The original version of hsa_amd_portable_export_dmabuf() did not
consider the conditions under which a dmabuf could be shared.
In the new version (hsa_amd_portable_export_dmabuf_v2()), the caller
can specify the flag HSA_AMD_DMABUF_MAPPING_TYPE_PCIE, which means they
want to share the dmabuf over PCIe. In that case, the new code returns an
error if the device is a PCIe GPU, is not in an XGMI hive, and does not
support large-BAR.


[ROCm/ROCR-Runtime commit: a34604bddb]
2025-06-09 15:42:58 -05:00
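
As a rough illustration of the check described in the commit above, the following C++ sketch models the decision as a pure predicate. The names GpuLinkInfo and CanExportOverPcie are hypothetical and do not come from the ROCr sources; only the three conditions are taken from the commit message.

```cpp
// Hypothetical model of the device properties the commit message mentions.
// None of these names are taken from ROCr.
struct GpuLinkInfo {
  bool is_pcie_gpu;   // device is attached over PCIe
  bool in_xgmi_hive;  // device is part of an XGMI hive
  bool large_bar;     // large-BAR is supported
};

// Returns false for the one combination the commit message says is rejected
// when the caller asks for a PCIe mapping; true otherwise.
inline bool CanExportOverPcie(const GpuLinkInfo& gpu) {
  const bool rejected =
      gpu.is_pcie_gpu && !gpu.in_xgmi_hive && !gpu.large_bar;
  return !rejected;
}
```

In this sketch a PCIe GPU outside a hive can export only when large-BAR is available; every other combination passes.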
Chris Freehill 287986ab65 rocr: Add hsa_amd_portable_export_dmabuf_v2
The original version of hsa_amd_portable_export_dmabuf() did not
consider the conditions under which a dmabuf could be shared.
In the new version (hsa_amd_portable_export_dmabuf_v2()), the caller
can specify the flag HSA_AMD_DMABUF_MAPPING_TYPE_PCIE, which means they
want to share the dmabuf over PCIe. In that case, the new code returns an
error if the device is a PCIe GPU, is not in an XGMI hive, and does not
support large-BAR.


[ROCm/ROCR-Runtime commit: 3a9d14bb66]
2025-06-09 15:42:58 -05:00
Sunday Clement 5c7524ba3e rocr: Fix Unintentional Integer Overflow
It's safer to have the integer literal explicitly be an unsigned long
in this expression as that's what the type of the errorCode variable
resolves to, preventing any overflow errors.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>


[ROCm/ROCR-Runtime commit: dce52be686]
2025-06-09 15:16:10 -04:00
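
The log does not show the expression that was changed, so the following is only a minimal sketch of the general pattern, assuming the literal is the left operand of a shift. ErrorBit is a hypothetical helper, not ROCr code.

```cpp
// Hypothetical helper, not taken from ROCr. The 1UL literal makes the shift
// happen in unsigned long. With a plain `1`, the shift would be evaluated as
// a signed int, so bit 31 would overflow before the result is widened.
inline unsigned long ErrorBit(unsigned int bit) {
  return 1UL << bit;
}
```

The sketch is only meaningful for bit positions smaller than the width of unsigned long on the target platform.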
Sunday Clement 1eaee1649a rocr: Fix Unintended Sign Extension
ehdr->e_shentsize and ehdr->e_shnum are both 16-bit unsigned integers,
so they are implicitly promoted to signed int during the
multiplication. They must be explicitly cast to a larger unsigned
type; otherwise, if the signed product is large enough, the value is
sign-extended, resulting in incorrect values.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>


[ROCm/ROCR-Runtime commit: d00ca2e9b7]
2025-06-09 15:16:10 -04:00
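
The promotion issue described above can be sketched as follows. SectionTableBytes is a hypothetical helper written for illustration; the actual ROCr change is not shown in this log.

```cpp
#include <cstdint>

// Hypothetical helper, not ROCr code. Both operands are 16-bit, so without
// the cast they would be promoted to signed int and multiplied as int.
// Casting one operand first makes the whole multiplication unsigned 64-bit.
inline std::uint64_t SectionTableBytes(std::uint16_t entsize,
                                       std::uint16_t shnum) {
  return static_cast<std::uint64_t>(entsize) * shnum;
}
```

With both inputs at 0xFFFF the true product no longer fits in a 32-bit signed int, which is the case the cast protects against.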
Alysa Liu 03430838af rocr: Remove structurally dead code
Remove unreachable return statement.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: 9b3d15e68d]
2025-06-09 14:01:39 -04:00
Alysa Liu d1c3b7262d rocr: Add proper file descriptor cleanup
Ensure file descriptor 'in' is properly closed in error cases
when calling _lseek() during readFrom() operations.
Fix potential resource leak when errors occur during file operations.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: 167602edfb]
2025-06-04 22:37:21 -04:00
Sunday Clement 1da312af87 rocr: Fix Potential Deadlock
Moved the call to pthread_mutex_lock into an else statement for better
code readability.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>


[ROCm/ROCR-Runtime commit: 1635746a9c]
2025-06-04 10:18:09 -04:00
Sunday Clement 25886ecda8 rocr: Fix Potential Deadlock
Because eventDescrp->mutex is a non-recursive lock, attempting to
acquire the lock with pthread_mutex_lock can cause the system to hang
indefinitely if the lock was already acquired by the
preceding call to pthread_mutex_trylock.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>


[ROCm/ROCR-Runtime commit: a97b7df4b9]
2025-06-04 10:18:09 -04:00
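
A minimal sketch of the trylock/lock pattern described above, assuming a default (non-recursive) pthread mutex. AcquireOnce is a hypothetical helper and does not reproduce the ROCr code.

```cpp
#include <pthread.h>

// Hypothetical helper, not ROCr code. Acquires `m` exactly once and reports
// whether the caller had to block to get it.
inline bool AcquireOnce(pthread_mutex_t* m) {
  if (pthread_mutex_trylock(m) == 0) {
    // The lock is already held by this thread at this point. Calling
    // pthread_mutex_lock(m) again on a non-recursive mutex can hang forever.
    return false;
  } else {
    // trylock did not get the lock, so block until it becomes available.
    pthread_mutex_lock(m);
    return true;
  }
}
```

Keeping the blocking call in the else branch makes it evident that only one of the two acquisition paths can run.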
Alysa Liu 6de1c81b71 rocr: Fix inefficient copy operations
Refactor variable assignments to use std::move() where appropriate.
Update function headers to accept parameters by const& where appropriate.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: f6c8cbd293]
2025-06-02 11:18:36 -04:00
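
The two techniques named in the commit above can be sketched as follows. NameList is a hypothetical class written for illustration and is not part of ROCr.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical class, not ROCr code.
class NameList {
 public:
  // Takes ownership of the argument, so move it into the container instead
  // of copying it a second time.
  void Add(std::string name) { names_.push_back(std::move(name)); }

  // Only reads the argument, so accept it by const reference to avoid a copy.
  bool Contains(const std::string& name) const {
    for (const std::string& n : names_) {
      if (n == name) return true;
    }
    return false;
  }

  std::size_t Size() const { return names_.size(); }

 private:
  std::vector<std::string> names_;
};
```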
Alysa Liu 65f5ce6f0a rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate.
Changed function headers to pass string arguments by reference where appropriate.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: ae6851dbb4]
2025-06-02 11:18:36 -04:00
Alysa Liu b97f9ba6d5 rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate.
Revert change in amd_kfd_driver.cpp.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: a945b5d493]
2025-06-02 11:18:36 -04:00
Alysa Liu 88dd451c64 rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: 369d89ade3]
2025-06-02 11:18:36 -04:00
Sunday Clement 3d3cca8083 rocr: Fix Resource Leak
Allocated memory was previously not freed when rwlock initialization
failed.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>


[ROCm/ROCR-Runtime commit: 293092f32f]
2025-05-30 09:16:26 -04:00
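
The leak described above follows a common pattern: an object is allocated, a later initialization step fails, and the early return forgets the allocation. A minimal sketch, using a hypothetical Table type that is not taken from ROCr:

```cpp
#include <pthread.h>

#include <cstdlib>

// Hypothetical type, not ROCr code.
struct Table {
  pthread_rwlock_t lock;
  int entries;
};

// Allocates a Table and initializes its rwlock. On any failure the partially
// constructed object is released before returning nullptr.
inline Table* CreateTable() {
  Table* t = static_cast<Table*>(std::calloc(1, sizeof(Table)));
  if (t == nullptr) return nullptr;
  if (pthread_rwlock_init(&t->lock, nullptr) != 0) {
    std::free(t);  // without this, the allocation leaks on the error path
    return nullptr;
  }
  return t;
}

inline void DestroyTable(Table* t) {
  if (t == nullptr) return;
  pthread_rwlock_destroy(&t->lock);
  std::free(t);
}
```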
David Yat Sin d2982b797a rocr: Add all sysfs entries for L2 Cache
For L2 Cache and above, we report the total amount of cache for the
whole partition, so we add up the L2 Cache entry for each partition.


[ROCm/ROCR-Runtime commit: fc561ff37a]
2025-05-29 19:02:38 -04:00
David Yat Sin 8f7c7458aa rocr: Remove extra check for page-aligned
ROCr initially had a bug where memory allocations that were not 4K
aligned were internally 4K aligned but ROCr would not keep track
of user-requested size. This would cause some pointer_info queries
to fail, but HIP was already aligning the buffer sizes for IPC
requests. For backward compatibility across 2 minor versions,
we allowed IPC look-ups to be both aligned and un-aligned.
Removing this check, as 4 minor versions have been released
since then.


[ROCm/ROCR-Runtime commit: d52f1d0453]
2025-05-29 12:35:15 -04:00
David Yat Sin 4515a48355 rocr: Update async-scratch reclaim API doc
[ROCm/ROCR-Runtime commit: c3978d03a4]
2025-05-28 20:08:52 -04:00
David Yat Sin 1b1d4e017a rocr: Fix compile warnings
[ROCm/ROCR-Runtime commit: 11da1293de]
2025-05-28 16:12:02 -04:00
David Yat Sin 39ecc88315 rocr: Remove deprecated doorbell type 1 support
[ROCm/ROCR-Runtime commit: 0d70045817]
2025-05-28 16:12:02 -04:00
David Yat Sin b83b8b4535 rocr: Remove deprecated queue doubleMap code
[ROCm/ROCR-Runtime commit: 4bae509296]
2025-05-28 16:12:02 -04:00
David Yat Sin e84a855c98 rocr: Remove queue_full_workaround code
Remove deprecated queue_full_workaround code as gfx7 and gfx8 GPUs are
EoL.


[ROCm/ROCR-Runtime commit: b8434529a5]
2025-05-28 16:12:02 -04:00
David Yat Sin a16f5380cd rocr: Remove addrlib files for EoL GPUs
[ROCm/ROCR-Runtime commit: 2b691c3d5f]
2025-05-28 16:12:02 -04:00
David Yat Sin 4ecd0382b7 rocr: update required CP FW version
Update the CP FW version required for async-scratch memory support
on gfx950.


[ROCm/ROCR-Runtime commit: 04dbf769f6]
2025-05-28 13:03:58 -04:00
David Yat Sin 5e7bd6145d rocr: Fix compile error when using clang
[ROCm/ROCR-Runtime commit: 9d38ca0d22]
2025-05-27 23:56:28 -04:00
cfreeamd c20e30db93 rocr: Support unmap adjacent mem sections in 1 try
[ROCm/ROCR-Runtime commit: f0ce7a8e59]
2025-05-27 15:13:20 -04:00
Alysa Liu 296e60d882 rocr: Add check for 'value' pointer
Replaces the assertion check assert(value) with an explicit null pointer check.
Returns HSA_STATUS_ERROR_INVALID_ARGUMENT on null values.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: 625425326d]
2025-05-27 12:18:04 -04:00
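
A minimal sketch of the change described above. To stay self-contained it uses a stand-in Status enum rather than the real hsa_status_t, and GetInfo here is a hypothetical function, not the ROCr method.

```cpp
// Stand-in for hsa_status_t so the sketch compiles without the HSA headers.
// kInvalidArgument plays the role of HSA_STATUS_ERROR_INVALID_ARGUMENT.
enum class Status { kSuccess, kInvalidArgument };

// Hypothetical query function, not ROCr code. An assert(value) would vanish
// in release builds (NDEBUG), so the pointer is checked explicitly and an
// error is returned to the caller instead.
inline Status GetInfo(unsigned int* value) {
  if (value == nullptr) {
    return Status::kInvalidArgument;
  }
  *value = 42;  // placeholder result for illustration only
  return Status::kSuccess;
}
```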
Alysa Liu 8cbabdbbe3 rocr: Unchecked return value as arg
v1: Add value pointer validation before
dereferencing in GetInfo method for MODULE_NAME case.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: f1f34da4f6]
2025-05-27 12:18:04 -04:00
cfreeamd 7fe67829ef rocr: Fix ISA generic's for gfx906 wrt sramecc
gfx9-generic cannot support sramecc- and sramecc+.
sramecc feature is only configurable on gfx906.

The code object produced for gfx9-generic can be loaded on gfx906
with either sramecc setting; the compiler produces ISA that works
correctly on both (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4).


[ROCm/ROCR-Runtime commit: b7361c5ee4]
2025-05-27 07:45:00 -05:00
cfreeamd b7d56427ec rocr: Fix ISA generic's for gfx906 wrt sramecc
gfx9-generic cannot support sramecc- and sramecc+.
sramecc feature is only configurable on gfx906.

The code object produced for gfx9-generic can be loaded on gfx906
with either sramecc setting; the compiler produces ISA that works
correctly on both (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4).


[ROCm/ROCR-Runtime commit: 3e99bb6150]
2025-05-27 07:45:00 -05:00
Yifan Zhang 3ab8b5a98b coredump: call KFD_IOC_DBG_TRAP_DISABLE in error path.
KFD assumes kfd_dbg_trap_enable/disable are called in pairs; otherwise
a kfd_process reference leaks in KFD.


[ROCm/ROCR-Runtime commit: ccd91bcd19]
2025-05-27 13:54:00 +08:00
David Yat Sin 342e478e7d rocr: Perform memcpy for small code-object loads
On large BAR systems, for small code-objects, we get better performance
using a direct memcpy because of the latency of the blit-copy.


[ROCm/ROCR-Runtime commit: da2607024b]
2025-05-22 18:39:19 -04:00
David Yat Sin 9c5bb61708 rocr: Perform range based cache invalidates
Invalidate only the address range that covers the newly copied
code-object. This avoids invalidating I$ for old code objects and thus
might increase I$ hit rate.


[ROCm/ROCR-Runtime commit: e969e01f54]
2025-05-22 18:39:19 -04:00
Ramakrishnan, Ranjith 85cd72987f CMake: Remove file reorganization backward compatibility code (#176)
The feature has already been disabled, and the related source code is no longer required

[ROCm/ROCR-Runtime commit: 1785cff6a5]
2025-05-22 09:47:26 -07:00
Ben Vanik 62cd7e1f54 rocr: Fix SVM profiler QUEUE_RESTORE parsing
[ROCm/ROCR-Runtime commit: 1a32392912]
2025-05-21 13:17:25 -04:00
Flora Cui 89e5075ce0 rocr: try defaultSignal for intercept_queue
if interrupt is not supported

Signed-off-by: Flora Cui <flora.cui@amd.com>


[ROCm/ROCR-Runtime commit: 8cf4b7fc05]
2025-05-21 09:37:47 -04:00
Yiannis Papadopoulos 69505ab60c Fix formatting
[ROCm/ROCR-Runtime commit: 700078d335]
2025-05-20 13:59:22 -05:00
Yiannis Papadopoulos 38c54b09ac rocr/aie: Correct operand count
[ROCm/ROCR-Runtime commit: c80616d807]
2025-05-20 13:59:22 -05:00
David Yat Sin 38ea4370c1 rocr: Fix doorbell ring
When compiling with -O0, some compilers generate an xchg instruction for
the __atomic_store(...) built-in. Using xchg on MMIO memory is
undefined behavior and may be ignored on certain CPUs.


[ROCm/ROCR-Runtime commit: f011a9506d]
2025-05-20 09:19:10 -04:00
Jiadong Zhu b99015f30a rocr/dtif: use default signal for intercept queue for DTIF
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 0f9d2b836c]
2025-05-13 16:44:31 -04:00
Aaron Liu 85a11c729c rocr/dtif: disable interrupt signal for DTIF backend
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 8c1b1201b7]
2025-05-13 16:44:31 -04:00
Jiadong Zhu a0dc167541 rocr/dtif: add hsaKmtQueueRingDoorbell in thunk loader
hsaKmtQueueRingDoorbell is specific to the DTIF backend

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: e2d767879d]
2025-05-13 16:44:31 -04:00
Aaron Liu 008bbd94d5 rocr/dtif: add CreateThunkInstance/DestroyThunkInstance interfaces
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: e9088d6e47]
2025-05-13 16:44:31 -04:00
Aaron Liu c6ffc85a47 rocr/dtif: add DRM APIs wrapper in thunk loader
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 0cd4ddd62b]
2025-05-13 16:44:31 -04:00
Aaron Liu 6cf184a0d4 rocr/dtif: replace hsakmt interfaces with HSAKMT_CALL(...)
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 1b79caa214]
2025-05-13 16:44:31 -04:00
Aaron Liu 87dcbf1255 rocr/dtif: add thunk loader to wrap hsaKmt APIs
For native and DTIF backends, unify to use HSAKMT_CALL(...) to call
hsaKmt APIs.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 7ba77fb193]
2025-05-13 16:44:31 -04:00
Aaron Liu 137b168b46 rocr/dtif: add dtif environment variable
Using HSA_ENABLE_DTIF to control dtif/native thunk code path

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 166b0fa45a]
2025-05-13 16:44:31 -04:00
Ma, Li 526db8bbaa rocr: Expose all available DMA engines (#165)
For inter-device copies, currently only XGMI is exposed. Now
SDMA0/1 are exposed as well for inter-device copies, especially since
they are among the recommended engines.

Signed-off-by: Li Ma <li.ma@amd.com>

[ROCm/ROCR-Runtime commit: e38dd98914]
2025-05-13 17:42:15 +08:00
Saleel Kudchadker c0b0cb1788 rocr: Expose hsa_amd_memory_get_preferred_copy_engine api
[ROCm/ROCR-Runtime commit: 1eb8694dd2]
2025-05-09 17:13:27 -07:00
Shane Xiao f8ac975cd2 rocr: Set rec_sdma_eng_override_ for all gpus
Set rec_sdma_eng_override_ for the other GPUs as well; otherwise
DmaCopyOnEngine uses SDMA for D<->D copies, which triggers an
invalid-argument error.


[ROCm/ROCR-Runtime commit: 82a88f2e2b]
2025-05-08 23:52:12 +08:00
christian-heusel 6c8a2da29a rocr: Add missing cstdint include
[ROCm/ROCR-Runtime commit: 5cc61b714d]
2025-05-06 20:52:48 -04:00