Commit Graph

1218 Commits

Author SHA1 Message Date
Sunday Clement dce52be686 rocr: Fix Unintentional Integer Overflow
It's safer to make the integer literal explicitly an unsigned long
in this expression, as that is the type the errorCode variable
resolves to; this prevents overflow errors.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-09 15:16:10 -04:00
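A minimal C++ sketch of the kind of overflow this commit describes. The helper name `error_bit` and the shift expression are hypothetical; the log does not show the actual ROCr expression.

```cpp
// Hypothetical helper: returns a value with only bit `bit` set.
// The `UL` suffix makes the literal an unsigned long, so the shift is
// evaluated in the same type as the result instead of in signed int,
// where `1 << 31` would not fit in the positive range.
unsigned long error_bit(unsigned bit) {
    return 1UL << bit;  // valid for bit < number of bits in unsigned long
}
```

With a plain `1` literal the shift would be carried out in `int` and only widened afterwards, which is the pattern the commit message warns about.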
Sunday Clement d00ca2e9b7 rocr: Fix Unintended Sign Extension
ehdr->e_shentsize and ehdr->e_shnum are both 16-bit unsigned integers,
so they are implicitly promoted to signed int during the
multiplication. They must be explicitly cast to a larger unsigned
type; otherwise, if the signed product is large enough, the value is
sign-extended, resulting in incorrect values.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-09 15:16:10 -04:00
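The promotion issue can be sketched as follows. The function name `section_table_bytes` is hypothetical; only the two 16-bit ELF header fields come from the commit message.

```cpp
#include <cstdint>

// Hypothetical helper: size in bytes of an ELF section header table,
// computed from the entry size and entry count (both 16-bit fields).
std::uint64_t section_table_bytes(std::uint16_t entsize, std::uint16_t num) {
    // Widen one operand first so the multiplication is done in unsigned
    // 64-bit arithmetic rather than in promoted signed int.
    return static_cast<std::uint64_t>(entsize) * num;
}
```

Without the cast, `entsize * num` is computed as `int`; a product above `INT_MAX` overflows, and widening the result afterwards can propagate the sign bit.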
Alysa Liu 9b3d15e68d rocr: Remove structurally dead code
Remove unreachable return statement.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-09 14:01:39 -04:00
Alysa Liu 167602edfb rocr: Add proper file descriptor cleanup
Ensure file descriptor 'in' is properly closed in error cases
when calling _lseek() during readFrom() operations.
Fix potential resource leak when errors occur during file operations.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-04 22:37:21 -04:00
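A minimal sketch of closing a descriptor on the error path, using the POSIX `open`/`lseek`/`close` calls rather than the `_lseek` variant named in the commit. The helper `file_size` is hypothetical and is not the ROCr `readFrom()` code.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: returns the size of the file at `path`,
// or -1 if the file cannot be opened or seeking fails.
off_t file_size(const char* path) {
    int in = open(path, O_RDONLY);
    if (in < 0) return -1;  // nothing to clean up yet

    off_t end = lseek(in, 0, SEEK_END);
    if (end < 0) {
        close(in);  // error path: release the descriptor before returning
        return -1;
    }

    close(in);  // success path
    return end;
}
```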
Sunday Clement 1635746a9c rocr: Fix Potential Deadlock
Moved the call to pthread_mutex_lock to an else statement for better
code readability.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-04 10:18:09 -04:00
Sunday Clement a97b7df4b9 rocr: Fix Potential Deadlock
Because eventDescrp->mutex is a non-recursive lock, attempting to
acquire the lock with pthread_mutex_lock can cause the system to hang
indefinitely if the lock was already acquired by the
preceding call to pthread_mutex_trylock.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-04 10:18:09 -04:00
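The trylock/lock pattern from the two deadlock commits can be sketched like this. The helper `lock_once` is hypothetical; it only illustrates that the blocking call belongs in the branch where `pthread_mutex_trylock` failed.

```cpp
#include <pthread.h>

// Hypothetical helper: acquires `m` exactly once.
// Returns true if trylock succeeded, false if the caller had to block.
bool lock_once(pthread_mutex_t* m) {
    if (pthread_mutex_trylock(m) == 0) {
        return true;  // lock is already held by us; locking again would hang
    } else {
        pthread_mutex_lock(m);  // only reached when trylock did not acquire
        return false;
    }
}
```

Calling `pthread_mutex_lock` unconditionally after a successful `pthread_mutex_trylock` on a non-recursive mutex would block the thread on a lock it already owns.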
Alysa Liu f6c8cbd293 rocr: Fix inefficient copy operations
Refactor variable assignments to use std::move() where appropriate.
Update function headers to accept parameters by const& where appropriate.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-02 11:18:36 -04:00
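A small sketch of the two patterns named in this commit, `std::move()` for assignments and `const&` for read-only parameters. The `Agent` class and its members are hypothetical and are not taken from ROCr.

```cpp
#include <string>
#include <utility>

// Hypothetical class used only to illustrate the two patterns.
class Agent {
 public:
    // Sink parameter: taken by value and moved into the member,
    // so an rvalue argument is never copied.
    void set_name(std::string name) { name_ = std::move(name); }

    // Read-only parameter: passed by const reference to avoid a copy.
    bool has_name(const std::string& other) const { return name_ == other; }

 private:
    std::string name_;
};
```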
Alysa Liu ae6851dbb4 rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate.
Changed function headers to pass string arguments by reference where appropriate.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-02 11:18:36 -04:00
Alysa Liu a945b5d493 rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate.
Revert change in amd_kfd_driver.cpp.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-02 11:18:36 -04:00
Alysa Liu 369d89ade3 rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-02 11:18:36 -04:00
Sunday Clement 293092f32f rocr: Fix Resource Leak
Allocated memory was previously not freed in the event of an error
during rwlock initialization.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-05-30 09:16:26 -04:00
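A minimal sketch of freeing an allocation when rwlock initialization fails. The `Table` structure and the `table_create`/`table_destroy` helpers are hypothetical; the log does not show the affected ROCr code.

```cpp
#include <pthread.h>
#include <cstdlib>

// Hypothetical structure guarded by a read-write lock.
struct Table {
    pthread_rwlock_t lock;
    int entries;
};

// Returns a heap-allocated Table, or nullptr on failure.
Table* table_create() {
    Table* t = static_cast<Table*>(std::malloc(sizeof(Table)));
    if (t == nullptr) return nullptr;

    if (pthread_rwlock_init(&t->lock, nullptr) != 0) {
        std::free(t);  // error path: release the allocation before returning
        return nullptr;
    }
    t->entries = 0;
    return t;
}

// Destroys the lock and frees the Table; accepts nullptr.
void table_destroy(Table* t) {
    if (t == nullptr) return;
    pthread_rwlock_destroy(&t->lock);
    std::free(t);
}
```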
David Yat Sin fc561ff37a rocr: Add all sysfs entries for L2 Cache
For L2 Cache and above, we report the total amount of cache for the
whole partition, so we add up the L2 Cache entry for each partition.
2025-05-29 19:02:38 -04:00
David Yat Sin d52f1d0453 rocr: Remove extra check for page-aligned
ROCr initially had a bug where memory allocations that were not 4K
aligned were internally 4K aligned but ROCr would not keep track
of user-requested size. This would cause some pointer_info queries
to fail, but HIP was already aligning the buffer sizes for IPC
requests. For backward compatibility across 2 minor versions,
we allowed IPC look-ups to be both aligned and un-aligned.
Removing this check, as 4 minor versions have been released
since then.
2025-05-29 12:35:15 -04:00
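For reference, 4K alignment of a requested size can be sketched as below. The names `kPageSize`, `align_up`, and `is_page_aligned` are hypothetical and do not correspond to ROCr identifiers.

```cpp
#include <cstddef>

// Assumed page size of 4 KiB, matching the "4K aligned" wording above.
constexpr std::size_t kPageSize = 4096;

// Rounds `size` up to the next multiple of the page size.
constexpr std::size_t align_up(std::size_t size) {
    return (size + kPageSize - 1) & ~(kPageSize - 1);
}

// True when `size` is already a multiple of the page size.
constexpr bool is_page_aligned(std::size_t size) {
    return (size & (kPageSize - 1)) == 0;
}
```

A user-requested size of 100 bytes is therefore backed by a 4096-byte allocation, which is why the requested and the internal size can differ.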
David Yat Sin c3978d03a4 rocr: Update async-scratch reclaim API doc 2025-05-28 20:08:52 -04:00
David Yat Sin 11da1293de rocr: Fix compile warnings 2025-05-28 16:12:02 -04:00
David Yat Sin 0d70045817 rocr: Remove deprecated doorbell type 1 support 2025-05-28 16:12:02 -04:00
David Yat Sin 4bae509296 rocr: Remove deprecated queue doubleMap code 2025-05-28 16:12:02 -04:00
David Yat Sin b8434529a5 rocr: Remove queue_full_workaround code
Remove deprecated queue_full_workaround code as gfx7 and gfx8 GPUs are
EoL.
2025-05-28 16:12:02 -04:00
David Yat Sin 2b691c3d5f rocr: Remove addrlib files for EoL GPUs 2025-05-28 16:12:02 -04:00
David Yat Sin 04dbf769f6 rocr: update required CP FW version
Update the CP FW version required for async-scratch memory support
on gfx950.
2025-05-28 13:03:58 -04:00
David Yat Sin 9d38ca0d22 rocr: Fix compile error when using clang 2025-05-27 23:56:28 -04:00
cfreeamd f0ce7a8e59 rocr: Support unmap adjacent mem sections in 1 try 2025-05-27 15:13:20 -04:00
Alysa Liu 625425326d rocr: Add check for 'value' pointer
Replaces assertion check assert(value) with explicit null pointer check.
Returns HSA_STATUS_ERROR_INVALID_ARGUMENT on null values.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-05-27 12:18:04 -04:00
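A sketch of replacing an assertion with an explicit null check. The `Status` enum stands in for the real `hsa_status_t` values, and `get_info` is a hypothetical function, not the ROCr `GetInfo` method.

```cpp
#include <cstdint>

// Hypothetical status codes standing in for hsa_status_t values.
enum class Status { kSuccess, kInvalidArgument };

// Writes a placeholder result through `value` after validating it.
Status get_info(std::uint32_t* value) {
    // An assert(value) disappears when NDEBUG is defined, so release
    // builds would dereference a null pointer. An explicit check is
    // active in every build.
    if (value == nullptr) return Status::kInvalidArgument;

    *value = 42;  // placeholder payload for illustration only
    return Status::kSuccess;
}
```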
Alysa Liu f1f34da4f6 rocr: Unchecked return value as arg
v1: Add value pointer validation before
dereferencing in GetInfo method for MODULE_NAME case.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-05-27 12:18:04 -04:00
cfreeamd b7361c5ee4 rocr: Fix ISA generic's for gfx906 wrt sramecc
gfx9-generic cannot support sramecc- and sramecc+.
sramecc feature is only configurable on gfx906.

The code object produced for gfx9-generic can be loaded on
gfx906 with either sramecc setting; the compiler will produce ISA
that works correctly with both (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4).
2025-05-27 07:45:00 -05:00
cfreeamd 3e99bb6150 rocr: Fix ISA generic's for gfx906 wrt sramecc
gfx9-generic cannot support sramecc- and sramecc+.
sramecc feature is only configurable on gfx906.

The code object produced for gfx9-generic can be loaded on
gfx906 with either sramecc setting; the compiler will produce ISA
that works correctly with both (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4).
2025-05-27 07:45:00 -05:00
Yifan Zhang ccd91bcd19 coredump: call KFD_IOC_DBG_TRAP_DISABLE in error path.
KFD assumes kfd_dbg_trap_enable/disable are called in pairs; otherwise
a kfd_process reference is leaked in KFD.
2025-05-27 13:54:00 +08:00
David Yat Sin da2607024b rocr: Perform memcpy for small code-object loads
On large BAR systems, for small-sized code-objects, we get better
performance using direct memcpy because of the latency of the blit-copy.
2025-05-22 18:39:19 -04:00
David Yat Sin e969e01f54 rocr: Perform range based cache invalidates
Invalidate only the address range that covers the newly copied
code-object. This avoids invalidating I$ for old code objects and thus
might increase I$ hit rate.
2025-05-22 18:39:19 -04:00
Ramakrishnan, Ranjith 1785cff6a5 CMake: Remove file reorganization backward compatibility code (#176)
The feature has already been disabled, and the related source code is no longer required
2025-05-22 09:47:26 -07:00
Ben Vanik 1a32392912 rocr: Fix SVM profiler QUEUE_RESTORE parsing 2025-05-21 13:17:25 -04:00
Flora Cui 8cf4b7fc05 rocr: try defaultSignal for intercept_queue
if interrupt is not supported

Signed-off-by: Flora Cui <flora.cui@amd.com>
2025-05-21 09:37:47 -04:00
Yiannis Papadopoulos 700078d335 Fix formatting 2025-05-20 13:59:22 -05:00
Yiannis Papadopoulos c80616d807 rocr/aie: Correct operand count 2025-05-20 13:59:22 -05:00
David Yat Sin f011a9506d rocr: Fix doorbell ring
When compiling with -O0, some compilers generate an xchg instruction for
the __atomic_store(...) built-in. Using xchg on MMIO memory is
undefined behavior and may be ignored on certain CPUs.
2025-05-20 09:19:10 -04:00
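One way to avoid a read-modify-write instruction on a doorbell is a release fence followed by a plain volatile store. This is a sketch under that assumption and may differ from what the commit actually does; `ring_doorbell` is a hypothetical name.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: publishes `value` to a doorbell location.
void ring_doorbell(volatile std::uint64_t* doorbell, std::uint64_t value) {
    // Order earlier writes (for example, queue packets) before the
    // doorbell write becomes visible.
    std::atomic_thread_fence(std::memory_order_release);

    // Plain volatile store: a single write access, not an exchange.
    *doorbell = value;
}
```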
Jiadong Zhu 0f9d2b836c rocr/dtif: use default signal for intercept queue for DTIF
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
Aaron Liu 8c1b1201b7 rocr/dtif: disable interrupt signal for DTIF backend
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
Jiadong Zhu e2d767879d rocr/dtif: add hsaKmtQueueRingDoorbell in thunk loader
hsaKmtQueueRingDoorbell is specific to DTIF backend

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
Aaron Liu e9088d6e47 rocr/dtif: add CreateThunkInstance/DestroyThunkInstance interfaces
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
Aaron Liu 0cd4ddd62b rocr/dtif: add DRM APIs wrapper in thunk loader
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
Aaron Liu 1b79caa214 rocr/dtif: replace hsakmt interfaces with HSAKMT_CALL(...)
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
Aaron Liu 7ba77fb193 rocr/dtif: add thunk loader to wrap hsaKmt APIs
For native and DTIF backends, unify to use HSAKMT_CALL(...) to call
hsaKmt APIs.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
Aaron Liu 166b0fa45a rocr/dtif: add dtif environment variable
Using HSA_ENABLE_DTIF to control dtif/native thunk code path

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
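Selecting a code path from an environment variable can be sketched as below. Only the variable name `HSA_ENABLE_DTIF` comes from the commit; treating the value "1" as enabled is an assumption, and `flag_is_set` and `dtif_enabled` are hypothetical helpers.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical parser: a flag counts as set only when its value is "1".
// (Assumption: the accepted values in ROCr are not shown in the log.)
bool flag_is_set(const char* value) {
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Reads HSA_ENABLE_DTIF from the environment; unset means disabled.
bool dtif_enabled() {
    return flag_is_set(std::getenv("HSA_ENABLE_DTIF"));
}
```

Keeping the parsing separate from the `std::getenv` call makes the decision logic easy to exercise without modifying the process environment.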
Ma, Li e38dd98914 rocr: Expose all available DMA engines (#165)
For inter-device copies, currently only XGMI is exposed. Now
SDMA0/1 will be exposed as well for inter-device copies, especially since
they are among the recommended engines.

Signed-off-by: Li Ma <li.ma@amd.com>
2025-05-13 17:42:15 +08:00
Saleel Kudchadker 1eb8694dd2 rocr: Expose hsa_amd_memory_get_preferred_copy_engine api 2025-05-09 17:13:27 -07:00
Shane Xiao 82a88f2e2b rocr: Set rec_sdma_eng_override_ for all gpus
Set rec_sdma_eng_override_ for the other GPUs; otherwise DmaCopyOnEngine
will use SDMA for D<->D copies, which triggers an invalid argument error.
2025-05-08 23:52:12 +08:00
christian-heusel 5cc61b714d rocr: Add missing cstdint include 2025-05-06 20:52:48 -04:00
David Yat Sin 4ed5950beb rocr: Fix logic for scratch reclaim
Fix logic error that can cause scratch memory to be reclaimed while a
dispatch is still using it.
2025-04-29 17:23:45 -04:00
Tony Gutierrez f2c482d923 rocr: Add large_bar_enabled var to the GPU agent
Adds a bool to the GPU agent and a public member method to
check if the GPU supports large BAR. This is needed so we can
check if large BAR is supported when a user tries to allocate
an AQL queue in device memory on a given GPU agent.

Also adds an exception to the AQL queue if device-side AQL queues
are requested and the GPU owner of the AQL doesn't support large
BAR. Otherwise, ROCr will currently allow device-side queues
that can cause faults when the user tries to touch their ring
buffers and the user will not know why the faults are occurring.

This relies on the fact that the KFD does not expose any links
from the CPU to the GPU if large BAR is not enabled (though
links from the GPU to the CPU may still be exposed by the KFD).
2025-04-23 15:53:29 -04:00
Tony Gutierrez 6e3c375bf1 rocr: Flags to alloc queue buf/struct in dev mem
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.
2025-04-23 15:53:29 -04:00