rocm-systems

Author	SHA1	Message	Date
pghoshamd	bc20b51f40	SWDEV-561708 Counted queue size from env var (#2844) * SWDEV-561708 Counted queue size from env var * use counted_queue_size for test * remove rocrtst changes; add a const for default queue size * Remove env var from test; use queue->size * Improve env var documentation * Correct type	2026-01-29 10:00:37 -05:00
German Andryeyev	196baa4321	rocr: Fix static build in Windows (#2660)	2026-01-21 18:44:51 -05:00
pghoshamd	793755532f	SWDEV-561708 Initial shared queue pool apis (#1614) * SWDEV-561708 Initial shared queue pool apis * Validate params; some fixes in callback function (but still needs to be checked) * Dtor cleanup * minor * Enable profiling; remove callback since aql_queue takes care of it * setPriority and setCuMask APIs updated for counted queues * Increasing step and minor version for rocprofiler * Tests for CountedQueueManager * tests * Code refactored to make pool manager part of GpuAgent only (incomplete); unique handles issue pending * Refactored code to support CQM inside GpuAgent and unique handles; multithreaded test added * Changed to ASSERT_SUCCESS macros for all tests * RIng buffer overflow test added * tests fixed; cleanup added at hsa_shutdown * priority conversion table changes * Compiler warnings fixed * Rewrite 1 test; add desc and improve SetUp() code * Improvement * Unififed getinfo for both counted and non-counted queues * Address PR feedback * Addressing feedback: memleak, data type mismatch, documentation * improve comment * format * Missing HSA_API macros for roctracer * Revert "Addressing feedback: memleak, data type mismatch, documentation" This reverts commit 5e498a55fb3640e00d06cec63dcec79293fb23de. * Improving acquire api doc * release api doc improved * error codes for release api doc	2026-01-21 15:30:04 -05:00
Alysa Liu	9139f5a241	Revert "rocr: Switch back to legacy IPC (#1744)" (#2676) This reverts commit `7e4b62290c`.	2026-01-20 14:34:10 -05:00
hongkzha-amd	b3c4e94e70	rocr: Improve memory protection and WSL compatibility (#2274) * rocr: Add ProtectMemory API and use it in RemoveAccess Replace munmap + mmap with mprotect when removing memory access. This improves performance by 5-10x, ensures atomicity (no race condition window), and prepares for WSL/DXG compatibility fixes. Suggested-by: David Yat Sin <David.YatSin@amd.com> Signed-off-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> * rocr: Skip CPU mapping operations on WSL On WSL, CPU cannot access GPU VRAM due to platform restrictions. CPU access would fault-in system RAM instead, causing data corruption and memory leaks. Return HSA_STATUS_ERROR to fail fast rather than silently creating broken mappings. GPU-to-GPU mappings remain functional. Signed-off-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> * rocr: reduce ifdef linux v2: Fix IsDXG check logic Signed-off-by: David Yat Sin <David.YatSin@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> --------- Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Signed-off-by: David Yat Sin <David.YatSin@amd.com> Signed-off-by: Flora Cui <flora.cui@amd.com>	2026-01-13 12:08:20 -06:00
pghoshamd	637b0d71f0	SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers (#2146) * SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers * Remove KernelMutex and KernelSharedMutex abstractions with std::mutex and std::shared_mutex * Replaced unique_locks with lock_guards * More changes * Replace new and deletes with smart pointers * Replaced some more with shared ptrs * Replacements with smart pointers - pt 2 * missed change	2026-01-06 10:59:34 -05:00
Rahul Manocha	dd4bee33ff	SWDEV-558848 - Update thunk interface signature for vmm enablement (#2259) Co-authored-by: Rahul Manocha <rmanocha@amd.com>	2025-12-11 08:43:28 -08:00
Rahul Manocha	0c1f87a7f6	SWDEV-558848 - vmm api support for rocr on windows (#1761) * SWDEV-558848 - vmm api support for rocr on windows * Fixes to VMM handle Map/Unmap Set/Get Access * Fix GetShareableHandle to use pointer for shareable handle * Update os specific map/unmap memory calls * clang format update * Minor syntax fixes from code review Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com> --------- Co-authored-by: Rahul Manocha <rmanocha@amd.com> Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>	2025-12-10 08:39:51 -08:00
Mario Limonciello	bc5d48e76c	Run pre-commit's whitespace related hooks on projects/rocr-runtime (#2130) * Run pre-commit's whitespace related hooks on projects/rocr-runtime In order for pre-commit to be useful, everything needs to meet a common baseline. Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> * Add missing semicolon which would block compilation on big endian CPUs Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> --------- Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>	2025-12-08 07:56:50 -06:00
cfreeamd	24c2a84e3f	rocr: GPU core file location support (#1732) * rocr: WIP Support dump of GPU core file * WIP new core dump tests compile * WIP: anony namespaces, test updates, progress Added disabled Fault test. Other non-disabled coredump tests don't work. * WIP: address code review feedback * WIP: gpu core dump rocrtst works; combined * WIP: remove rocrtst changes for this commit	2025-11-20 18:50:51 -08:00
David Yat Sin	7e4b62290c	rocr: Switch back to legacy IPC (#1744) Switch back to legacy IPC Implementation while we fix some race conditions.	2025-11-13 09:41:55 -05:00
German Andryeyev	ee1158b7b8	rocr: Fix Windows build and Ctz implementation (#1634)	2025-11-03 12:07:11 -05:00
Rahul Manocha	4f075902fc	SWDEV-555347 - Remove lock contention in async events loop (#878) * SWDEV-555347 - Remove lock contention in async events loop * SWDEV-555347 - Introduce Pool of AsyncEventItems * create generic mempool for AsyncEventItem * Use BaseShared allocate and free for async event pool --------- Co-authored-by: Rahul Manocha <rmanocha@amd.com>	2025-10-24 08:43:00 -07:00
systems-assistant[bot]	bebe65f104	rocr: fix nullptr dereference (#262) * rocr: fix nullptr dereference Return early in the case that malloc fails to avoid dereferencing of a null pointer on eventDescrp. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> * rocr: Fix potential nullptr dereference returns early if sym->section() fails to properly acquire the object. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> --------- Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Co-authored-by: Sunday Clement <Sunday.Clement@amd.com>	2025-10-21 13:49:01 -04:00
axie_amdeng	dde482d224	rocr: unitialized size variable caused huge memory/space allocation (#1232) Signed-off-by: Alex Xie <AlexBin.Xie@amd.com>	2025-10-14 16:57:10 -04:00
German Andryeyev	913743d433	Add windows build support into ROCr (#912) Make sure ROCR can be compiled under windows. Extra setup for the windows build environment is required. The change should not have any functional changes under Linux.	2025-09-19 10:10:17 -04:00
systems-assistant[bot]	f1fabcfd64	rocr: Error Handling Issues (#264) * rocr: Fix Incorrect Assertion Check The wrong variable is used in the assertion statement, should be error checking for the value of paramEndLoc after it is modified by the call to find(). Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> * rocr: Fix Potential Undefined Behaviour In the event that the SvmProfileControl destructor is called and event == -1 is true then the call to close(event) is effectively close(-1) which is undefined behaviour. This has been changed to only call close() on valid file descriptors. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> * rocr: Add Error Check on Bytes Read In the case that there is an incomplete read the call to copyTo() will now return an error. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> * rocr: Fix Exception Error Destructors are implicitly marked with noexcept being true by default so if its not explicitly marked false in the destructor or the functions it calls, any thrown exceptions will cause the program to crash. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> --------- Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> Co-authored-by: Sunday Clement <Sunday.Clement@amd.com>	2025-09-16 09:43:45 -04:00
Flora Cui	e7cb108a5e	[rocr-runtime] Add support for WSL DXG devices (#854) * rocr/rocdxg: add rocdxg support * rocr/dxg: set flags for dxg env * rocr: ring doorbell for dtif/dxg * rocr/dxg: sdma changes 1. align command size to 64 2. call hsaKmtQueueRingDoorbell 3. disable gcr && hdp flush Signed-off-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Signed-off-by: tiancyin <tianci.yin@amd.com> Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>	2025-09-09 10:16:57 +08:00
systems-assistant[bot]	83a10986a4	SWDEV-539130 - Log blit copy duration (#258) Co-authored-by: Pengda Xie <pengda.xie@amd.com>	2025-09-03 10:01:47 -07:00
jokim-amd	700afd2d17	Re-Enable IPC DMA Bufs by default Let ROCr use the new IPC-DMA bufs path.	2025-08-14 18:49:09 -04:00
mat3ix	c41050d01f	rocr: SDMA improvements (#326) - When SDMA queue gets full when copying 2GB or more it blocks async copy api - Improve/format logging	2025-08-13 10:25:29 -04:00
zichguan-amd	0c698557a0	rocr: check _SC_LEVEL1_DCACHE_LINESIZE before use Support musl Fixes ROCm/ROCR-Runtime#318 Signed-off-by: zichguan-amd <zichuan.guan@amd.com> [ROCm/ROCR-Runtime commit: `7946ddb647`]	2025-07-14 14:44:31 -04:00
Sunday Clement	90e35e8486	rocr: Remove Recursive Include Removed unnecessary header inlude in file to prevent circular include. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> [ROCm/ROCR-Runtime commit: `31b6474801`]	2025-06-13 12:29:52 -04:00
Sunday Clement	1da312af87	rocr: Fix Potential Deadlock Moved the Call to pthread_mutex_lock to an else statement for better code readibility. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> [ROCm/ROCR-Runtime commit: `1635746a9c`]	2025-06-04 10:18:09 -04:00
Sunday Clement	25886ecda8	rocr: Fix Potential Deadlock Because eventDescrp->mutex is a non-recursive lock attempting to acquire the lock with pthread_mutex_lock can cause the system to hang indefinitely if the lock was already previously aquired with the preceeding call to pthread_mutex_trylock. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> [ROCm/ROCR-Runtime commit: `a97b7df4b9`]	2025-06-04 10:18:09 -04:00
Alysa Liu	88dd451c64	rocr: Fixed inefficient copy operations Changed variable assignments to use std::move() where appropriate Signed-off-by: Alysa Liu <Alysa.Liu@amd.com> [ROCm/ROCR-Runtime commit: `369d89ade3`]	2025-06-02 11:18:36 -04:00
Sunday Clement	3d3cca8083	rocr: Fix Resource Leak allocated memory was previously not freed in the event of an error with rwlock initialization. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com> [ROCm/ROCR-Runtime commit: `293092f32f`]	2025-05-30 09:16:26 -04:00
David Yat Sin	1b1d4e017a	rocr:Fix compile warnings [ROCm/ROCR-Runtime commit: `11da1293de`]	2025-05-28 16:12:02 -04:00
David Yat Sin	342e478e7d	rocr: Perform memcpy for small code-object loads On large BAR systems, for small-sized code-objects, we get performance using direct memcpy due to latencies when doing the blit-copy. [ROCm/ROCR-Runtime commit: `da2607024b`]	2025-05-22 18:39:19 -04:00
Aaron Liu	137b168b46	rocr/dtif: add dtif environment variable Using HSA_ENABLE_DTIF to control dtif/native thunk code path Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `166b0fa45a`]	2025-05-13 16:44:31 -04:00
Tony Gutierrez	6f37386eb2	rocr: Flags to alloc queue buf/struct in dev mem This builds on a prior change that allowed for allocating a user-mode queue's packet buffer in device memory to also allocate the queue struct in device memory. This provides additional latency benefits particularly for cases where dispatches are performed from the GPU itself. Flags are added to support the various use cases. [ROCm/ROCR-Runtime commit: `6e3c375bf1`]	2025-04-23 15:53:29 -04:00
lyndonli	e9c934c116	rocr: Remove redundant Refresh() call The initial call to Refresh() in the constructor is unnecessary as it's handled in Runtime::Load(). Signed-off-by: lyndonli <Lyndon.Li@amd.com> [ROCm/ROCR-Runtime commit: `c34a2798ce`]	2025-03-25 09:13:59 -04:00
David Yat Sin	e130172218	rocr: Put back scratch_backing_memory_byte_size The scratch_backing_memory_byte_size is not used by CP, but it is currently used by rocgdb. Putting the field back, but we need to find a solution for alt_scratch_backing_memory_byte_size. Also, completely disabling alternate scratch as we need some changes to support debugger. [ROCm/ROCR-Runtime commit: `02b38d0614`]	2025-03-06 16:23:38 -05:00
David Yat Sin	d93d05bcf1	rocr: Temporarily disable alternate scratch memory Temporarily disable alternate scratch memory usage by default due to some stability issues. [ROCm/ROCR-Runtime commit: `9a950ab788`]	2025-03-03 09:27:29 -05:00
David Yat Sin	5905b82579	rocr: Update for new async scratch reclaim Updating ROCr code to match new handshake protocol with CP FW for asynchronous scratch reclaim. Increase previous limits when scratch reclaim feature is available. [ROCm/ROCR-Runtime commit: `aa2f98e6f9`]	2025-02-19 21:02:00 -05:00
Sv. Lockal	d1507361ec	Fix build issues for musl libc (#267) Change-Id: Ia31330b0f96669966712b58986abeca754c2cbb9 [ROCm/ROCR-Runtime commit: `5d04bd42f3`]	2025-01-29 14:31:05 +00:00
Yiannis Papadopoulos	428cc5b47c	rocr/aie: Add dma-buf import support for AIEAgents via the Driver interface Change-Id: I70f8d8772dda7c06944d75042cb3034ddd89aff4 [ROCm/ROCR-Runtime commit: `26bfa0b8f6`]	2025-01-27 15:22:46 -05:00
Shweta Khatri	4325142db1	rocr: Use view3dAs2dArray flag, for thick/3D swizzle modes. Added HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag to enable/disable this. Default value is false (view3dAs2dArray = 1) Enabling this flag will enable support for swizzles that do 3D interleaving. Note that all features of 3D images are supported with 2D swizzles,it's just that the access patterns are different and therefore cache hit-rates may be better or worse, depending on how it's used. Volumetric algorithms do better with 3D and apps that tend to access a single slice at a time do better with 2D. Change-Id: Id8574a6710fe4333a1ee331e5ce9195a81434198 [ROCm/ROCR-Runtime commit: `6361466baa`]	2025-01-27 09:28:33 -05:00
David Yat Sin	922b61ddee	rocr: Add thread priority for AsyncEventHandler Set priority to maximum for signal event handler and minimum for exceptions event handler. Change-Id: I1b982d3c2e4c880fafc073fe1a542d01692a6fdc [ROCm/ROCR-Runtime commit: `7ea25ebb85`]	2025-01-24 10:08:12 -05:00
Eddie Richter	8ea388af92	rocr/aie: AIE Queue Processing Change-Id: I681c971ba7229037ca85d5529838aa7bbe5820e2 [ROCm/ROCR-Runtime commit: `e9cc839b2b`]	2024-12-10 10:50:02 -05:00
Apurv Mishra	baf737a3cb	rocr: declare 'args' as class member in 'os_thread' Removed 'args' as a unique pointer and deletion in 'ThreadTrampoline', then declared as a class member. Change-Id: Ia52058392d0170e8b5e57cfdd2c587f47a6f93f0 Signed-off-by: Apurv Mishra <apurv.mishra@amd.com> [ROCm/ROCR-Runtime commit: `89115369cc`]	2024-11-27 10:27:40 -05:00
David Yat Sin	ed5bbc1eeb	rocr: Fix sem_post overflow errors WaitSemaphore and PostSemaphore are used in the HybridMutex implementation. If HybridMutex did not have to call WaitSemaphore when acquired, then calling PostSemaphore would cause the internal count inside sem_t to slowly grow to large values and eventually cause overflow. Change-Id: I173fc17c874b49926e56991405e9086ea8c138fc [ROCm/ROCR-Runtime commit: `f58aff630c`]	2024-11-13 21:57:26 -05:00
David Yat Sin	3e694d739a	rocr: Add HSA_SIGNAL_WAIT_ABORT_TIMEOUT Add support for abort timeout when hsa_signal_wait_relaxed is called and signal does not clear within timeout. timeout is in seconds Change-Id: If1db5a8af33c82ddc4b48968c3d8eceb97d0ea6d [ROCm/ROCR-Runtime commit: `4ec730f1dc`]	2024-11-13 21:57:02 -05:00
German Andryeyev	6617af10e6	rocr: Disable WaitAny() in AsyncEventsLoop() - Add the new path to avoid WaitAny() calls in AsyncEventsLoopp() with HSA_WAIT_ANY_DEBUG key. The new path is selected by default. The optimizaiton combines all logic of WaitAny() in a single processing loop and avoids extra memory allocations or ref counting. Also it won't spin on the CPU if all events are busy. Change-Id: I197ce60d0d023fbb672f700d6e87702686f1f55a [ROCm/ROCR-Runtime commit: `0fc7369ba5`]	2024-10-25 14:37:02 -04:00
Jonathan Kim	ff4690de61	rocr: Fix IPC DMA Buf fragment handling and enable for development Discarding blocks for reallocation on IPC export for better memory performance trigger memory violations with DMA BUF exports so bypass this for now as application performance drops haven't been observed with the bypass. The raw fragment should be passed to the DMA Buf export call as well since offsets will be implicitly applied in the Thunk/KFD for export/import calls. Also, use the agent information directly from the pointer information so that the export call doesn't have to scan memory to find this. Pass the node ID in the handle so that the import call doesn't have to make two thunk imports to fetch the node ID for GPU memory imports. Finally, allow the user to use DMA Buf IPC via HSA_ENABLE_IPC_MODE_LEGACY=0 for developer testing as legacy mode will be applied by default. Change-Id: Ie8fe267f8768fa5df37126078406f7065f69ff4e [ROCm/ROCR-Runtime commit: `32bb0764b7`]	2024-09-27 14:40:42 -04:00
Saleel Kudchadker	8d1fe1f7ea	rocr: Allocate AQL queue on device memory - Use HSA_ALLOCATE_QUEUE_DEV_MEM=1 to create AQL queue in device memory. - Before writing AQL packet header to the queue use an SFENCE to ensure that there is no reodering of the writes over PCIE Change-Id: I5eacdc35108c4a1e245c75ae349b7495451aa60d [ROCm/ROCR-Runtime commit: `3baaa6e9c0`]	2024-09-05 17:48:02 -04:00
David Yat Sin	de85c5738e	rocr: Handle pthread_create returning errors Rewriting logic to fix issue where pthread_create would return errors other than EINVAL, and these errors would be ignored. Change-Id: I573958724dcf886c20e8c14e6a9182303b3ffa06 [ROCm/ROCR-Runtime commit: `c8dd4d2b3b`]	2024-08-22 12:15:10 -04:00
Jonathan Kim	b6aa5a4c09	rocr: Memory copy based on recommended SDMA engines Recommended SDMA engines for DMA copies are now exposed for better GPU-GPU performance. ROCr can now select those DMA engines. Also lock-in host-device copies to SDMA0 and device-host copies to SDMA1 for better stability and performance. Change-Id: Ideff2e13daf537104efecb8b837bd49ee5096cb5 [ROCm/ROCR-Runtime commit: `eb30a5bbc7`]	2024-08-20 16:22:32 -04:00
James Xu	e5d7121245	Fix compile errors with musl>=1.2.3 Patch submitted on behalf of user AngryLoki: The fix repeats common pattern, used for musl, e.g: https://github.com/void-linux/void-packages/blob/5ccf1c66a1df2d644e1a0db0a68fca321469c57e/srcpkgs/MangoHud/patches/0001-elfhacks-d_un.d_ptr-is-relative-on-non-glibc-systems.patch#L90. Quoting: d_un.d_ptr is relative on non glibc systems elf(5) documents it this way, glibc diverts from this documentation Change-Id: I815f88f127ef00c88ae827a8ad48df0d33c92467 [ROCm/ROCR-Runtime commit: `a621bca303`]	2024-08-19 11:02:29 -04:00
Jonathan Kim	db44209c11	Disable DMABUF IPC iplementation Current DMABUF implemenation is unstable. Switch back to legacy support for now. Change-Id: I3be871f38c6524b0bcc9225bab61de4e57771efb [ROCm/ROCR-Runtime commit: `ea646cf958`]	2024-08-12 13:14:14 -04:00

169 Commits