169 Commits

Author SHA1 Message Date
pghoshamd bc20b51f40 SWDEV-561708 Counted queue size from env var (#2844)
* SWDEV-561708 Counted queue size from env var

* use counted_queue_size for test

* remove rocrtst changes; add a const for default queue size

* Remove env var from test; use queue->size

* Improve env var documentation

* Correct type
2026-01-29 10:00:37 -05:00
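The pattern this commit describes (a queue size read from an environment variable, with a named constant as the default) might look like the sketch below. The variable name `EXAMPLE_COUNTED_QUEUE_SIZE`, the constant `kDefaultCountedQueueSize`, its value, and the helper `QueueSizeFromEnv` are all hypothetical; the log does not show the real names or the default.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Hypothetical default; the actual constant and value are not shown in the log.
constexpr std::uint32_t kDefaultCountedQueueSize = 4096;

// Reads a positive decimal queue size from the named environment variable.
// Falls back to the default when the variable is unset, empty, not a plain
// decimal number, zero, or too large for a 32-bit value.
std::uint32_t QueueSizeFromEnv(const char* var_name) {
  const char* text = std::getenv(var_name);
  if (text == nullptr || !std::isdigit(static_cast<unsigned char>(*text))) {
    return kDefaultCountedQueueSize;
  }
  errno = 0;
  char* end = nullptr;
  const unsigned long value = std::strtoul(text, &end, 10);
  if (errno != 0 || *end != '\0' || value == 0 ||
      value > std::numeric_limits<std::uint32_t>::max()) {
    return kDefaultCountedQueueSize;
  }
  return static_cast<std::uint32_t>(value);
}
```

Rejecting malformed input in favor of the default keeps a mistyped variable from silently producing a zero-sized or enormous queue.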
German Andryeyev 196baa4321 rocr: Fix static build in Windows (#2660) 2026-01-21 18:44:51 -05:00
pghoshamd 793755532f SWDEV-561708 Initial shared queue pool apis (#1614)
* SWDEV-561708 Initial shared queue pool apis

* Validate params; some fixes in callback function (but still needs to be checked)

* Dtor cleanup

* minor

* Enable profiling; remove callback since aql_queue takes care of it

* setPriority and setCuMask APIs updated for counted queues

* Increasing step and minor version for rocprofiler

* Tests for CountedQueueManager

* tests

* Code refactored to make pool manager part of GpuAgent only (incomplete); unique handles issue pending

* Refactored code to support CQM inside GpuAgent and unique handles; multithreaded test added

* Changed to ASSERT_SUCCESS macros for all tests

* Ring buffer overflow test added

* tests fixed; cleanup added at hsa_shutdown

* priority conversion table changes

* Compiler warnings fixed

* Rewrite 1 test; add desc and improve SetUp() code

* Improvement

* Unified getinfo for both counted and non-counted queues

* Address PR feedback

* Addressing feedback: memleak, data type mismatch, documentation

* improve comment

* format

* Missing HSA_API macros for roctracer

* Revert "Addressing feedback: memleak, data type mismatch, documentation"

This reverts commit 5e498a55fb3640e00d06cec63dcec79293fb23de.

* Improving acquire api doc

* release api doc improved

* error codes for release api doc
2026-01-21 15:30:04 -05:00
Alysa Liu 9139f5a241 Revert "rocr: Switch back to legacy IPC (#1744)" (#2676)
This reverts commit 7e4b62290c.
2026-01-20 14:34:10 -05:00
hongkzha-amd b3c4e94e70 rocr: Improve memory protection and WSL compatibility (#2274)
* rocr: Add ProtectMemory API and use it in RemoveAccess
Replace munmap + mmap with mprotect when removing memory access.
This improves performance by 5-10x, ensures atomicity (no race
condition window), and prepares for WSL/DXG compatibility fixes.

Suggested-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

* rocr: Skip CPU mapping operations on WSL
On WSL, CPU cannot access GPU VRAM due to platform restrictions.
CPU access would fault-in system RAM instead, causing data corruption
and memory leaks. Return HSA_STATUS_ERROR to fail fast rather than
silently creating broken mappings. GPU-to-GPU mappings remain functional.

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

* rocr: reduce ifdef linux
v2: Fix IsDXG check logic

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

---------
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
2026-01-13 12:08:20 -06:00
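The first item in this commit replaces a munmap + mmap pair with a single mprotect call. A minimal sketch of that idea on Linux follows; `RevokeAccess` and `RestoreReadWrite` are hypothetical helper names, not the ProtectMemory API the commit adds.

```cpp
#include <sys/mman.h>

#include <cstddef>

// Removes all CPU access to [addr, addr + len) in place. The mapping itself
// stays reserved, so no other allocation can land in the range between an
// unmap and a remap. addr must be page-aligned.
bool RevokeAccess(void* addr, std::size_t len) {
  return mprotect(addr, len, PROT_NONE) == 0;
}

// Restores read/write access to the same range.
bool RestoreReadWrite(void* addr, std::size_t len) {
  return mprotect(addr, len, PROT_READ | PROT_WRITE) == 0;
}
```

Because the address range is never released, there is no window in which a concurrent mmap could reuse it, which is the atomicity point the commit message makes.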
pghoshamd 637b0d71f0 SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers (#2146)
* SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers

* Remove KernelMutex and KernelSharedMutex abstractions with std::mutex and std::shared_mutex

* Replaced unique_locks with lock_guards

* More changes

* Replace new and deletes with smart pointers

* Replaced some more with shared ptrs

* Replacements with smart pointers - pt 2

* missed change
2026-01-06 10:59:34 -05:00
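For readers unfamiliar with the standard wrappers this commit moves to, a small C++17 sketch is shown below. The `Registry` class is purely illustrative and does not come from the ROCr sources.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Illustrative class: writers take the mutex exclusively through
// std::lock_guard, readers share it through std::shared_lock.
class Registry {
 public:
  void Set(const std::string& key, int value) {
    std::lock_guard<std::shared_mutex> lock(mutex_);  // exclusive
    values_[key] = value;
  }

  bool Get(const std::string& key, int* out) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);  // shared
    const auto it = values_.find(key);
    if (it == values_.end()) return false;
    *out = it->second;
    return true;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::string, int> values_;
};
```

std::lock_guard is enough when the lock is held for the whole scope; std::unique_lock is only needed when the lock must be released early, deferred, or handed to a condition variable.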
Rahul Manocha dd4bee33ff SWDEV-558848 - Update thunk interface signature for vmm enablement (#2259)
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2025-12-11 08:43:28 -08:00
Rahul Manocha 0c1f87a7f6 SWDEV-558848 - vmm api support for rocr on windows (#1761)
* SWDEV-558848 - vmm api support for rocr on windows

* Fixes to VMM handle Map/Unmap Set/Get Access

* Fix GetShareableHandle to use pointer for shareable handle

* Update os specific map/unmap memory calls

* clang format update

* Minor syntax fixes from code review

Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>
2025-12-10 08:39:51 -08:00
Mario Limonciello bc5d48e76c Run pre-commit's whitespace related hooks on projects/rocr-runtime (#2130)
* Run pre-commit's whitespace related hooks on projects/rocr-runtime

In order for pre-commit to be useful, everything needs to meet a common
baseline.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Add missing semicolon which would block compilation on big endian CPUs

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-08 07:56:50 -06:00
cfreeamd 24c2a84e3f rocr: GPU core file location support (#1732)
* rocr: WIP Support dump of GPU core file

* WIP new core dump tests compile

* WIP: anony namespaces, test updates, progress

Added disabled Fault test. Other non-disabled coredump tests don't work.

* WIP: address code review feedback

* WIP: gpu core dump rocrtst works; combined

* WIP: remove rocrtst changes for this commit
2025-11-20 18:50:51 -08:00
David Yat Sin 7e4b62290c rocr: Switch back to legacy IPC (#1744)
Switch back to legacy IPC Implementation while we fix some race
conditions.
2025-11-13 09:41:55 -05:00
German Andryeyev ee1158b7b8 rocr: Fix Windows build and Ctz implementation (#1634) 2025-11-03 12:07:11 -05:00
Rahul Manocha 4f075902fc SWDEV-555347 - Remove lock contention in async events loop (#878)
* SWDEV-555347 - Remove lock contention in async events loop

* SWDEV-555347 - Introduce Pool of AsyncEventItems

* create generic mempool for AsyncEventItem

* Use BaseShared allocate and free for async event pool

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2025-10-24 08:43:00 -07:00
systems-assistant[bot] bebe65f104 rocr: fix nullptr dereference (#262)
* rocr: fix nullptr dereference

Return early in the case that malloc fails to avoid dereferencing of a
null pointer on eventDescrp.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix potential nullptr dereference

returns early if sym->section() fails to properly acquire the object.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

---------

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: Sunday Clement <Sunday.Clement@amd.com>
2025-10-21 13:49:01 -04:00
axie_amdeng dde482d224 rocr: uninitialized size variable caused huge memory/space allocation (#1232)
Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>
2025-10-14 16:57:10 -04:00
German Andryeyev 913743d433 Add windows build support into ROCr (#912)
Make sure ROCR can be compiled under windows. Extra setup for the windows build environment is required. The change should not have any functional changes under Linux.
2025-09-19 10:10:17 -04:00
systems-assistant[bot] f1fabcfd64 rocr: Error Handling Issues (#264)
* rocr: Fix Incorrect Assertion Check

The wrong variable is used in the assertion statement, should be error
checking for the value of paramEndLoc after it is modified by the call
to find().

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix Potential Undefined Behaviour

In the event that the SvmProfileControl destructor is called and
event == -1 is true then the call to close(event) is effectively
close(-1) which is undefined behaviour. This has been changed to only
call close() on valid file descriptors.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Add Error Check on Bytes Read

In the case that there is an incomplete read the call to copyTo() will
now return an error.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix Exception Error

Destructors are implicitly noexcept(true) by default, so unless the
destructor is explicitly marked noexcept(false), any exception thrown
by it or by the functions it calls will cause the program to
crash.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

---------

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: Sunday Clement <Sunday.Clement@amd.com>
2025-09-16 09:43:45 -04:00
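Two of the items above (only closing valid file descriptors, and keeping destructors from throwing) can be illustrated together. `ScopedFd` is a hypothetical wrapper written for this note, not the SvmProfileControl code the commit touches.

```cpp
#include <unistd.h>

// Hypothetical RAII wrapper around a POSIX file descriptor.
class ScopedFd {
 public:
  explicit ScopedFd(int fd = -1) noexcept : fd_(fd) {}
  ScopedFd(const ScopedFd&) = delete;
  ScopedFd& operator=(const ScopedFd&) = delete;

  // Implicitly noexcept: nothing called here may throw.
  ~ScopedFd() {
    if (fd_ >= 0) {  // never call close() on the -1 sentinel
      ::close(fd_);
    }
  }

  bool valid() const noexcept { return fd_ >= 0; }
  int get() const noexcept { return fd_; }

 private:
  int fd_;
};
```

Guarding on `fd_ >= 0` makes the sentinel case a no-op, and since `::close` reports failure through its return value rather than an exception, the destructor stays compatible with its implicit noexcept.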
Flora Cui e7cb108a5e [rocr-runtime] Add support for WSL DXG devices (#854)
* rocr/rocdxg: add rocdxg support

* rocr/dxg: set flags for dxg env

* rocr: ring doorbell for dtif/dxg

* rocr/dxg: sdma changes

1. align command size to 64
2. call hsaKmtQueueRingDoorbell
3. disable gcr && hdp flush


Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: tiancyin <tianci.yin@amd.com>
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-09-09 10:16:57 +08:00
systems-assistant[bot] 83a10986a4 SWDEV-539130 - Log blit copy duration (#258)
Co-authored-by: Pengda Xie <pengda.xie@amd.com>
2025-09-03 10:01:47 -07:00
jokim-amd 700afd2d17 Re-Enable IPC DMA Bufs by default
Let ROCr use the new IPC-DMA bufs path.
2025-08-14 18:49:09 -04:00
mat3ix c41050d01f rocr: SDMA improvements (#326)
- When SDMA queue gets full when copying 2GB or more it blocks async
copy api
- Improve/format logging
2025-08-13 10:25:29 -04:00
zichguan-amd 0c698557a0 rocr: check _SC_LEVEL1_DCACHE_LINESIZE before use
Support musl
Fixes ROCm/ROCR-Runtime#318

Signed-off-by: zichguan-amd <zichuan.guan@amd.com>


[ROCm/ROCR-Runtime commit: 7946ddb647]
2025-07-14 14:44:31 -04:00
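A sketch of the kind of guard this commit title describes: `_SC_LEVEL1_DCACHE_LINESIZE` is a glibc extension that musl does not define, so the query is compiled only when the macro exists. The helper name and the 64-byte fallback are assumptions made for illustration.

```cpp
#include <unistd.h>

#include <cstddef>

// Assumed fallback for platforms that cannot report the L1 line size.
constexpr std::size_t kFallbackCacheLineSize = 64;

std::size_t CacheLineSize() {
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
  // sysconf may return -1 or 0 when the value is unknown.
  const long reported = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
  if (reported > 0) return static_cast<std::size_t>(reported);
#endif
  return kFallbackCacheLineSize;
}
```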
Sunday Clement 90e35e8486 rocr: Remove Recursive Include
Removed an unnecessary header include to prevent a circular include.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>


[ROCm/ROCR-Runtime commit: 31b6474801]
2025-06-13 12:29:52 -04:00
Sunday Clement 1da312af87 rocr: Fix Potential Deadlock
Moved the call to pthread_mutex_lock into an else branch for better
code readability.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>


[ROCm/ROCR-Runtime commit: 1635746a9c]
2025-06-04 10:18:09 -04:00
Sunday Clement 25886ecda8 rocr: Fix Potential Deadlock
Because eventDescrp->mutex is a non-recursive lock, attempting to
acquire it with pthread_mutex_lock can cause the system to hang
indefinitely if the lock was already acquired by the preceding
call to pthread_mutex_trylock.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>


[ROCm/ROCR-Runtime commit: a97b7df4b9]
2025-06-04 10:18:09 -04:00
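The hazard described here, and the else-branch arrangement from the follow-up commit, can be sketched as follows. `LockReportingContention` is a hypothetical helper, not the function changed in the commit.

```cpp
#include <pthread.h>

// Acquires the mutex and reports whether it was free on the first attempt.
// The blocking lock is taken only when trylock failed; calling
// pthread_mutex_lock after a successful trylock on a non-recursive mutex
// would make the thread wait on itself.
bool LockReportingContention(pthread_mutex_t* mutex) {
  if (pthread_mutex_trylock(mutex) == 0) {
    return true;  // already held by this thread, do not lock again
  } else {
    pthread_mutex_lock(mutex);
    return false;
  }
}
```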
Alysa Liu 88dd451c64 rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: 369d89ade3]
2025-06-02 11:18:36 -04:00
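A short illustration of the kind of change this commit describes, using a made-up `Record` type: taking arguments by value and moving them into members transfers the underlying buffers instead of copying them.

```cpp
#include <string>
#include <utility>
#include <vector>

// Illustrative type only; not taken from the ROCr sources.
struct Record {
  std::string name;
  std::vector<int> data;

  // Sink parameters: callers may pass rvalues, and the members steal the
  // storage rather than duplicating it.
  Record(std::string n, std::vector<int> d)
      : name(std::move(n)), data(std::move(d)) {}
};
```

A caller that no longer needs its vector can write `Record r("x", std::move(v));` and no element is copied.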
Sunday Clement 3d3cca8083 rocr: Fix Resource Leak
Allocated memory was previously not freed if rwlock initialization
failed.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>


[ROCm/ROCR-Runtime commit: 293092f32f]
2025-05-30 09:16:26 -04:00
David Yat Sin 1b1d4e017a rocr:Fix compile warnings
[ROCm/ROCR-Runtime commit: 11da1293de]
2025-05-28 16:12:02 -04:00
David Yat Sin 342e478e7d rocr: Perform memcpy for small code-object loads
On large BAR systems, small code objects load faster with a direct
memcpy because of the latency of the blit copy.


[ROCm/ROCR-Runtime commit: da2607024b]
2025-05-22 18:39:19 -04:00
Aaron Liu 137b168b46 rocr/dtif: add dtif environment variable
Using HSA_ENABLE_DTIF to control dtif/native thunk code path

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>


[ROCm/ROCR-Runtime commit: 166b0fa45a]
2025-05-13 16:44:31 -04:00
Tony Gutierrez 6f37386eb2 rocr: Flags to alloc queue buf/struct in dev mem
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.


[ROCm/ROCR-Runtime commit: 6e3c375bf1]
2025-04-23 15:53:29 -04:00
lyndonli e9c934c116 rocr: Remove redundant Refresh() call
The initial call to Refresh() in the constructor is
unnecessary as it's handled in Runtime::Load().

Signed-off-by: lyndonli <Lyndon.Li@amd.com>


[ROCm/ROCR-Runtime commit: c34a2798ce]
2025-03-25 09:13:59 -04:00
David Yat Sin e130172218 rocr: Put back scratch_backing_memory_byte_size
The scratch_backing_memory_byte_size is not used by CP, but it is
currently used by rocgdb. Putting the field back, but we need to find a
solution for alt_scratch_backing_memory_byte_size.

Also, completely disabling alternate scratch as we need some changes to
support debugger.


[ROCm/ROCR-Runtime commit: 02b38d0614]
2025-03-06 16:23:38 -05:00
David Yat Sin d93d05bcf1 rocr: Temporarily disable alternate scratch memory
Temporarily disable alternate scratch memory usage by default due to
some stability issues.


[ROCm/ROCR-Runtime commit: 9a950ab788]
2025-03-03 09:27:29 -05:00
David Yat Sin 5905b82579 rocr: Update for new async scratch reclaim
Updating ROCr code to match new handshake protocol with CP FW for
asynchronous scratch reclaim.
Increase previous limits when scratch reclaim feature is available.


[ROCm/ROCR-Runtime commit: aa2f98e6f9]
2025-02-19 21:02:00 -05:00
Sv. Lockal d1507361ec Fix build issues for musl libc (#267)
Change-Id: Ia31330b0f96669966712b58986abeca754c2cbb9


[ROCm/ROCR-Runtime commit: 5d04bd42f3]
2025-01-29 14:31:05 +00:00
Yiannis Papadopoulos 428cc5b47c rocr/aie: Add dma-buf import support for AIEAgents via the Driver interface
Change-Id: I70f8d8772dda7c06944d75042cb3034ddd89aff4


[ROCm/ROCR-Runtime commit: 26bfa0b8f6]
2025-01-27 15:22:46 -05:00
Shweta Khatri 4325142db1 rocr: Use view3dAs2dArray flag, for thick/3D swizzle modes.
Added HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag to
enable/disable this. Default value is false (view3dAs2dArray = 1)
Enabling this flag will enable support for swizzles that do 3D
interleaving. Note that all features of 3D images are supported
with 2D swizzles, it's just that the access patterns are different
and therefore cache hit-rates may be better or worse, depending
on how it's used. Volumetric algorithms do better with 3D and apps
that tend to access a single slice at a time do better with 2D.

Change-Id: Id8574a6710fe4333a1ee331e5ce9195a81434198


[ROCm/ROCR-Runtime commit: 6361466baa]
2025-01-27 09:28:33 -05:00
David Yat Sin 922b61ddee rocr: Add thread priority for AsyncEventHandler
Set priority to maximum for signal event handler and minimum for
exceptions event handler.

Change-Id: I1b982d3c2e4c880fafc073fe1a542d01692a6fdc


[ROCm/ROCR-Runtime commit: 7ea25ebb85]
2025-01-24 10:08:12 -05:00
Eddie Richter 8ea388af92 rocr/aie: AIE Queue Processing
Change-Id: I681c971ba7229037ca85d5529838aa7bbe5820e2


[ROCm/ROCR-Runtime commit: e9cc839b2b]
2024-12-10 10:50:02 -05:00
Apurv Mishra baf737a3cb rocr: declare 'args' as class member in 'os_thread'
Removed 'args' as a unique pointer and deletion in
'ThreadTrampoline', then declared as a class member.

Change-Id: Ia52058392d0170e8b5e57cfdd2c587f47a6f93f0
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>


[ROCm/ROCR-Runtime commit: 89115369cc]
2024-11-27 10:27:40 -05:00
David Yat Sin ed5bbc1eeb rocr: Fix sem_post overflow errors
WaitSemaphore and PostSemaphore are used in the HybridMutex
implementation. If HybridMutex did not have to call WaitSemaphore when
acquired, then calling PostSemaphore would cause the internal count
inside sem_t to slowly grow to large values and eventually cause
overflow.

Change-Id: I173fc17c874b49926e56991405e9086ea8c138fc


[ROCm/ROCR-Runtime commit: f58aff630c]
2024-11-13 21:57:26 -05:00
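The overflow described above comes from posting a semaphore that nobody waited on. One well-known way to keep posts and waits paired is the "benaphore" pattern sketched below, where an atomic counter decides whether the semaphore is touched at all. This is a generic illustration under that assumption; it is not ROCr's HybridMutex, and the class and method names are invented.

```cpp
#include <semaphore.h>

#include <atomic>
#include <cerrno>

class Benaphore {
 public:
  Benaphore() { sem_init(&sem_, 0, 0); }
  ~Benaphore() { sem_destroy(&sem_); }
  Benaphore(const Benaphore&) = delete;
  Benaphore& operator=(const Benaphore&) = delete;

  void Lock() {
    // Only a contended acquire waits on the semaphore.
    if (count_.fetch_add(1) > 0) {
      while (sem_wait(&sem_) == -1 && errno == EINTR) {
      }
    }
  }

  void Unlock() {
    // Post only when another thread has registered itself in Lock(), so
    // every post is matched by exactly one wait.
    if (count_.fetch_sub(1) > 1) {
      sem_post(&sem_);
    }
  }

  // Current semaphore value; stays at zero when no thread is contending.
  int PendingPosts() {
    int value = 0;
    sem_getvalue(&sem_, &value);
    return value;
  }

 private:
  std::atomic<int> count_{0};
  sem_t sem_;
};
```

With an unconditional post in `Unlock()`, every uncontended lock/unlock cycle would add one to the semaphore's internal count, which is the slow growth the commit message describes.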
David Yat Sin 3e694d739a rocr: Add HSA_SIGNAL_WAIT_ABORT_TIMEOUT
Add support for abort timeout when hsa_signal_wait_relaxed is called and
signal does not clear within timeout.
timeout is in seconds

Change-Id: If1db5a8af33c82ddc4b48968c3d8eceb97d0ea6d


[ROCm/ROCR-Runtime commit: 4ec730f1dc]
2024-11-13 21:57:02 -05:00
German Andryeyev 6617af10e6 rocr: Disable WaitAny() in AsyncEventsLoop()
- Add a new path that avoids WaitAny() calls in AsyncEventsLoop(),
controlled by the HSA_WAIT_ANY_DEBUG key. The new path is selected by default.
The optimization combines all the logic of WaitAny() in a single processing loop
and avoids extra memory allocations and ref counting. It also won't spin
on the CPU if all events are busy.

Change-Id: I197ce60d0d023fbb672f700d6e87702686f1f55a


[ROCm/ROCR-Runtime commit: 0fc7369ba5]
2024-10-25 14:37:02 -04:00
Jonathan Kim ff4690de61 rocr: Fix IPC DMA Buf fragment handling and enable for development
Discarding blocks for reallocation on IPC export for better memory
performance triggers memory violations with DMA BUF exports, so bypass
this for now, as application performance drops haven't been observed
with the bypass.

The raw fragment should be passed to the DMA Buf export call as well
since offsets will be implicitly applied in the Thunk/KFD for
export/import calls.

Also, use the agent information directly from the pointer
information so that the export call doesn't have to scan memory to find
this.  Pass the node ID in the handle so that the import call doesn't
have to make two thunk imports to fetch the node ID for GPU memory
imports.

Finally, allow the user to use DMA Buf IPC via
HSA_ENABLE_IPC_MODE_LEGACY=0 for developer testing as legacy mode will
be applied by default.

Change-Id: Ie8fe267f8768fa5df37126078406f7065f69ff4e


[ROCm/ROCR-Runtime commit: 32bb0764b7]
2024-09-27 14:40:42 -04:00
Saleel Kudchadker 8d1fe1f7ea rocr: Allocate AQL queue on device memory
- Use HSA_ALLOCATE_QUEUE_DEV_MEM=1 to create AQL queue in device
memory.
- Before writing the AQL packet header to the queue, use an SFENCE to
ensure that the writes are not reordered over PCIe

Change-Id: I5eacdc35108c4a1e245c75ae349b7495451aa60d


[ROCm/ROCR-Runtime commit: 3baaa6e9c0]
2024-09-05 17:48:02 -04:00
David Yat Sin de85c5738e rocr: Handle pthread_create returning errors
Rewriting logic to fix issue where pthread_create would return errors
other than EINVAL, and these errors would be ignored.

Change-Id: I573958724dcf886c20e8c14e6a9182303b3ffa06


[ROCm/ROCR-Runtime commit: c8dd4d2b3b]
2024-08-22 12:15:10 -04:00
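The point of this fix is that pthread_create reports failure through its return value, and any nonzero value (EAGAIN and EPERM as well as EINVAL) means no thread was started. A minimal sketch with a hypothetical `StartThread` helper:

```cpp
#include <pthread.h>

#include <cstdio>
#include <cstring>

// Starts a thread and treats every nonzero return code as a failure.
// pthread_create returns the error number directly and does not set errno.
bool StartThread(pthread_t* thread, void* (*entry)(void*), void* arg) {
  const int rc = pthread_create(thread, nullptr, entry, arg);
  if (rc != 0) {
    std::fprintf(stderr, "pthread_create failed: %s\n", std::strerror(rc));
    return false;
  }
  return true;
}
```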
Jonathan Kim b6aa5a4c09 rocr: Memory copy based on recommended SDMA engines
Recommended SDMA engines for DMA copies are now exposed for better
GPU-GPU performance. ROCr can now select those DMA engines.

Also lock-in host-device copies to SDMA0 and device-host copies to
SDMA1 for better stability and performance.

Change-Id: Ideff2e13daf537104efecb8b837bd49ee5096cb5


[ROCm/ROCR-Runtime commit: eb30a5bbc7]
2024-08-20 16:22:32 -04:00
James Xu e5d7121245 Fix compile errors with musl>=1.2.3
Patch submitted on behalf of user AngryLoki:

The fix repeats common pattern, used for musl, 
e.g: https://github.com/void-linux/void-packages/blob/5ccf1c66a1df2d644e1a0db0a68fca321469c57e/srcpkgs/MangoHud/patches/0001-elfhacks-d_un.d_ptr-is-relative-on-non-glibc-systems.patch#L90.

Quoting:
d_un.d_ptr is relative on non glibc systems

elf(5) documents it this way, glibc diverts from this documentation

Change-Id: I815f88f127ef00c88ae827a8ad48df0d33c92467


[ROCm/ROCR-Runtime commit: a621bca303]
2024-08-19 11:02:29 -04:00
Jonathan Kim db44209c11 Disable DMABUF IPC implementation
Current DMABUF implementation is unstable. Switch back to legacy
support for now.

Change-Id: I3be871f38c6524b0bcc9225bab61de4e57771efb


[ROCm/ROCR-Runtime commit: ea646cf958]
2024-08-12 13:14:14 -04:00