rocm-systems

Author	SHA1	Message	Date
David Yat Sin	b83b8b4535	rocr: Remove deprecated queue doubleMap code [ROCm/ROCR-Runtime commit: `4bae509296`]	2025-05-28 16:12:02 -04:00
David Yat Sin	e84a855c98	rocr: Remove queue_full_workaround code Remove deprecated queue_full_workaround code as gfx7 and gfx8 GPUs are EoL. [ROCm/ROCR-Runtime commit: `b8434529a5`]	2025-05-28 16:12:02 -04:00
David Yat Sin	a16f5380cd	rocr: Remove addrlib files for EoL GPUs [ROCm/ROCR-Runtime commit: `2b691c3d5f`]	2025-05-28 16:12:02 -04:00
David Yat Sin	4ecd0382b7	rocr: update required CP FW version Update required CP FW version required for async-scratch memory support on gfx950. [ROCm/ROCR-Runtime commit: `04dbf769f6`]	2025-05-28 13:03:58 -04:00
David Yat Sin	5e7bd6145d	rocr: Fix compile error when using clang [ROCm/ROCR-Runtime commit: `9d38ca0d22`]	2025-05-27 23:56:28 -04:00
Apurv Mishra	226d8126c9	kfdtest: Disable KFD RAS test case disable KFD RAS test case as the tests cause GPU reset which affects the active kfdtest, the tests can only be run successfully as separate processes Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com> [ROCm/ROCR-Runtime commit: `d9a95605cc`]	2025-05-27 19:04:04 -04:00
cfreeamd	c20e30db93	rocr: Support unmap adjacent mem sections in 1 try [ROCm/ROCR-Runtime commit: `f0ce7a8e59`]	2025-05-27 15:13:20 -04:00
Alysa Liu	296e60d882	rocr: Add check for 'value' pointer Replaces assertion check assert(value) with explicit null pointer check. Returns HSA_STATUS_ERROR_INVALID_ARGUMENT on null values. Signed-off-by: Alysa Liu <Alysa.Liu@amd.com> [ROCm/ROCR-Runtime commit: `625425326d`]	2025-05-27 12:18:04 -04:00
Alysa Liu	8cbabdbbe3	rocr: Unchecked return value as arg v1: Add value pointer validation before dereferencing in GetInfo method for MODULE_NAME case. Signed-off-by: Alysa Liu <Alysa.Liu@amd.com> [ROCm/ROCR-Runtime commit: `f1f34da4f6`]	2025-05-27 12:18:04 -04:00
cfreeamd	7fe67829ef	rocr: Fix ISA generic's for gfx906 wrt sramecc gfx9-generic cannot support sramecc- and sramecc+. sramecc feature is only configurable on gfx906. The code object produced for gfx9-generic can be loaded on both gfx906 with any sramecc setting, compiler will produce the isa that will correctly work on both (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4). [ROCm/ROCR-Runtime commit: `b7361c5ee4`]	2025-05-27 07:45:00 -05:00
cfreeamd	b7d56427ec	rocr: Fix ISA generic's for gfx906 wrt sramecc gfx9-generic cannot support sramecc- and sramecc+. sramecc feature is only configurable on gfx906. The code object produced for gfx9-generic can be loaded on both gfx906 with any sramecc setting, compiler will produce the isa that will correctly work on both (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4). [ROCm/ROCR-Runtime commit: `3e99bb6150`]	2025-05-27 07:45:00 -05:00
Yifan Zhang	3ab8b5a98b	coredump: call KFD_IOC_DBG_TRAP_DISABLE in error path. KFD assumes kfd_dbg_trap_enable/disable be called in pair, or there will be kfd_process ref leak in KFD. [ROCm/ROCR-Runtime commit: `ccd91bcd19`]	2025-05-27 13:54:00 +08:00
Eric Huang	0d5e261f39	libhsakmt: optimize big system buffer allocation To change biggest single buffer to be huge page aligned and other optimization. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> [ROCm/ROCR-Runtime commit: `afe7965796`]	2025-05-26 18:30:00 -04:00
Eric Huang	2c6f84b12c	libhsakmt: add big system buffer allocation support when allocating userptr buffer in system ram with size bigger than or equal 512G, TTM has limit and returns error, to split one big buffer into multiple small buffers in vm_object will solve this issue. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> [ROCm/ROCR-Runtime commit: `8887d25304`]	2025-05-26 11:04:30 -04:00
Flora Cui	4360679cb7	rocrtst: performance::memory_async_copy test fix on DXG Signed-off-by: Flora Cui <flora.cui@amd.com> [ROCm/ROCR-Runtime commit: `e884650952`]	2025-05-26 15:01:27 +08:00
Amber Lin	9c6828647b	kfdtest: blacklist KFDSVMEvictTest.QueueTest Temporarily blacklist KFDSVMEvictTest.QueueTest on gfx950 Signed-off-by: Amber Lin <Amber.Lin@amd.com> [ROCm/ROCR-Runtime commit: `31d51acb26`]	2025-05-23 01:22:11 -04:00
David Yat Sin	342e478e7d	rocr: Perform memcpy for small code-object loads On large BAR systems, for small-sized code-objects, we get performance using direct memcpy due to latencies when doing the blit-copy. [ROCm/ROCR-Runtime commit: `da2607024b`]	2025-05-22 18:39:19 -04:00
David Yat Sin	9c5bb61708	rocr: Perform range based cache invalidates Invalidate only the address range that covers the newly copied code-object. This avoids invalidating I$ for old code objects and thus might increase I$ hit rate. [ROCm/ROCR-Runtime commit: `e969e01f54`]	2025-05-22 18:39:19 -04:00
Ramakrishnan, Ranjith	85cd72987f	CMake: Remove file reorganization backward compatibility code (#176) The feature has already been disabled, and the related source code is no longer required [ROCm/ROCR-Runtime commit: `1785cff6a5`]	2025-05-22 09:47:26 -07:00
Philip Yang	4ac71d1f5d	kfdtest: Add KFDQMTest UserQueueBufValidation Create CP queue and SDMA queue should fail with invalid queue ring buffer or ring buffer size. Test unmap or free queue buffers should fail before queue is destroyed. Use child process to test unmap CWSR buffer will evict queue. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Change-Id: I5dcd51d6b43445d19a986f8b0b82063e20348a5f [ROCm/ROCR-Runtime commit: `bd86fb1e63`]	2025-05-22 10:06:42 -04:00
Philip Yang	50886316e9	libhsakmt: unmap from GPU error handling If unmap from GPU return failed, for example, unmap user queue buffer while queue is active, we should not free obj->mapped_node_id_array, otherwise, the following unmap user queue buffer after queue is destroyed still return failed. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Change-Id: I32aeb18871c2e971d01900d92916c54680f5c9fa [ROCm/ROCR-Runtime commit: `3e6f51b715`]	2025-05-22 10:06:42 -04:00
Apurv Mishra	5c42a9f1bf	kfdtest: Disable tests that cause unwanted behavior disable KFDLocalMemoryTest.Fragmentation and KFDEventTest.MeasureInterruptConsumption as part of the KFD test suite improvement feature Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com> [ROCm/ROCR-Runtime commit: `f853dda9ba`]	2025-05-21 16:29:15 -04:00
Ben Vanik	ba02a7b1ca	kfdtest: Fix SVM profiler QUEUE_RESTORE parsing [ROCm/ROCR-Runtime commit: `d54124383f`]	2025-05-21 13:17:25 -04:00
Ben Vanik	62cd7e1f54	rocr: Fix SVM profiler QUEUE_RESTORE parsing [ROCm/ROCR-Runtime commit: `1a32392912`]	2025-05-21 13:17:25 -04:00
Flora Cui	89e5075ce0	rocr: try defaultSignal for intercept_queue if interrupt is not supported Signed-off-by: Flora Cui <flora.cui@amd.com> [ROCm/ROCR-Runtime commit: `8cf4b7fc05`]	2025-05-21 09:37:47 -04:00
Yiannis Papadopoulos	69505ab60c	Fix formatting [ROCm/ROCR-Runtime commit: `700078d335`]	2025-05-20 13:59:22 -05:00
Yiannis Papadopoulos	38c54b09ac	rocr/aie: Correct operand count [ROCm/ROCR-Runtime commit: `c80616d807`]	2025-05-20 13:59:22 -05:00
David Yat Sin	38ea4370c1	rocr: Fix doorbell ring When compiling with -O0, some compilers generate a xchg instruction for the __atomic_store(...) built-in. Using xchg on MMIO memory is undefined-behavior and may be ignored on certain CPUs. [ROCm/ROCR-Runtime commit: `f011a9506d`]	2025-05-20 09:19:10 -04:00
Aaron Liu	ba372ca4a8	rocrtst/dtif: performance::memory_async_copy test fix on DTIF Signed-off-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Feifei Xu <feifxu@amd.com> Signed-off-by: Longlong Yao <Longlong.Yao@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `297ea78140`]	2025-05-13 16:44:31 -04:00
Jiadong Zhu	b99015f30a	rocr/dtif: use default signal for intercept queue for DTIF Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `0f9d2b836c`]	2025-05-13 16:44:31 -04:00
Aaron Liu	85a11c729c	rocr/dtif: disable interrupt signal for DTIF backend Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `8c1b1201b7`]	2025-05-13 16:44:31 -04:00
Jiadong Zhu	a0dc167541	rocr/dtif: add hsaKmtQueueRingDoorbell in thunk loader hsaKmtQueueRingDoorbell is specific to DTIF backend Signed-off-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `e2d767879d`]	2025-05-13 16:44:31 -04:00
Aaron Liu	008bbd94d5	rocr/dtif: add CreateThunkInstance/DestroyThunkInstance interfaces Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `e9088d6e47`]	2025-05-13 16:44:31 -04:00
Aaron Liu	c6ffc85a47	rocr/dtif: add DRM APIs wrapper in thunk loader Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `0cd4ddd62b`]	2025-05-13 16:44:31 -04:00
Aaron Liu	6cf184a0d4	rocr/dtif: replace hsakmt interfaces with HSAKMT_CALL(...) Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `1b79caa214`]	2025-05-13 16:44:31 -04:00
Aaron Liu	87dcbf1255	rocr/dtif: add thunk loader to wrap hsaKmt APIs For native and DTIF backends, unify to use HSAKMT_CALL(...) to call hsaKmt APIs. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `7ba77fb193`]	2025-05-13 16:44:31 -04:00
Aaron Liu	137b168b46	rocr/dtif: add dtif environment variable Using HSA_ENABLE_DTIF to control dtif/native thunk code path Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com> [ROCm/ROCR-Runtime commit: `166b0fa45a`]	2025-05-13 16:44:31 -04:00
Ma, Li	526db8bbaa	rocr: Expose all available DMA engines (#165) For inter-device copies, currently only XGMI is exposed. Now SDMA0/1 will be exposed as well for inter-device copies, especially since they are among the recommended engines. Signed-off-by: Li Ma <li.ma@amd.com> [ROCm/ROCR-Runtime commit: `e38dd98914`]	2025-05-13 17:42:15 +08:00
Hila, Nino	24b8070788	Update palamida.yml (#158) * Update palamida.yml Signed-off-by: Hila, Nino <Nino.Hila@amd.com> * Add palamida.yml --------- Signed-off-by: Hila, Nino <Nino.Hila@amd.com> [ROCm/ROCR-Runtime commit: `f5daf75abf`]	2025-05-12 21:37:36 -07:00
Saleel Kudchadker	c0b0cb1788	rocr: Expose hsa_amd_memory_get_preferred_copy_engine api [ROCm/ROCR-Runtime commit: `1eb8694dd2`]	2025-05-09 17:13:27 -07:00
Shane Xiao	f8ac975cd2	rocr: Set rec_sdma_eng_override_ for all gpus Set the rec_sdma_eng_override_ for other gpus, or DmaCopyOnEngine will use sdma for D<->D copy, which will trigger invalid argument. [ROCm/ROCR-Runtime commit: `82a88f2e2b`]	2025-05-08 23:52:12 +08:00
christian-heusel	6c8a2da29a	rocr:Add missing cstdint include [ROCm/ROCR-Runtime commit: `5cc61b714d`]	2025-05-06 20:52:48 -04:00
Searles, Mark	f698518819	Update createMCObjectStreamer() to use new LLVM API (#156) (#157) * Update createMCObjectStreamer() to use new LLVM API Obsolete interfaces were removed via llvm-project's f2ff298867d7733122e32eead5a8c524b09dfdb1 * Fix typo: LLVM_VERSION -> LLVM_VERSION_MAJOR * Fix typo [ROCm/ROCR-Runtime commit: `ac1e6d59c2`]	2025-05-05 13:18:05 -07:00
Apurv Mishra	aa896090f8	kfdtest: Update ROCr homepage in CMakeLists.txt Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com> [ROCm/ROCR-Runtime commit: `aa0a32a166`]	2025-05-01 11:22:49 -04:00
David Yat Sin	b48b401a09	rocr: Fix logic for scratch reclaim Fix logic error that can cause scratch memory to be reclaimed while a dispatch is still using it. [ROCm/ROCR-Runtime commit: `4ed5950beb`]	2025-04-29 17:23:45 -04:00
Amber Lin	9d98d7479d	kfdtest: Skip SVMEvict with xnack=0 Random driver deadlock on svm_range_evict_svm_bo_worker() is observed on NPS2/DPX mode. It's seen with xnack off and happens more often on the partition with less VRAM because of TMR. Temporarily skip SVM Evict tests on Family AV when xnack is disabled. Signed-off-by: Amber Lin <Amber.Lin@amd.com> [ROCm/ROCR-Runtime commit: `5e28208cec`]	2025-04-25 12:45:36 -04:00
Tony Gutierrez	ce61e3301b	rocr: Add large_bar_enabled var to the GPU agent Adds a bool to the GPU agent and a public member method to check if the GPU supports large BAR. This is needed so we can check if large BAR is supported when a user tries to allocate an AQL queue in device memory on a given GPU agent. Also adds an exception to the AQL queue if device-side AQL queues are requested and the GPU owner of the AQL doesn't support large BAR. Otherwise, ROCr will currently allow device-side queues that can cause faults when the user tries to touch their ring buffers and the user will not know why the faults are occurring. This relies on the fact that the KFD does not expose any links from the CPU to the GPU if large BAR is not enabled (though links from the GPU to the CPU may still be exposed by the KFD). [ROCm/ROCR-Runtime commit: `f2c482d923`]	2025-04-23 15:53:29 -04:00
Tony Gutierrez	6f37386eb2	rocr: Flags to alloc queue buf/struct in dev mem This builds on a prior change that allowed for allocating a user-mode queue's packet buffer in device memory to also allocate the queue struct in device memory. This provides additional latency benefits particularly for cases where dispatches are performed from the GPU itself. Flags are added to support the various use cases. [ROCm/ROCR-Runtime commit: `6e3c375bf1`]	2025-04-23 15:53:29 -04:00
Tony Gutierrez	18404ba8a8	rocr: Remove empty shared.cpp [ROCm/ROCR-Runtime commit: `11d1d2cd25`]	2025-04-23 15:53:29 -04:00
Tony Gutierrez	3ebcf3020f	rocr/libhsakmt: Add coarse-grain allocator to GPU [ROCm/ROCR-Runtime commit: `adbc0495e2`]	2025-04-23 15:53:29 -04:00
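
Commit 296e60d882 above describes replacing an `assert(value)` with an explicit null pointer check that returns HSA_STATUS_ERROR_INVALID_ARGUMENT. The sketch below only illustrates that general pattern and is not the ROCR-Runtime code: every name in it (`sketch_status_t`, `sketch_info_t`, `sketch_get_info`) and the constant values are hypothetical stand-ins, chosen so the example compiles without the HSA headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the runtime's status type; the numeric values
 * are placeholders for this sketch only. */
typedef enum {
  SKETCH_STATUS_SUCCESS = 0,
  SKETCH_STATUS_ERROR_INVALID_ARGUMENT = 1
} sketch_status_t;

/* Hypothetical attribute selector for a GetInfo-style query. */
typedef enum {
  SKETCH_INFO_QUEUE_COUNT = 0
} sketch_info_t;

/* GetInfo-style query that writes its result through 'value'.
 * An assert(value) is compiled out when NDEBUG is defined, so a release
 * build would dereference a null pointer. The explicit check below rejects
 * the bad argument in every build type and reports it to the caller. */
sketch_status_t sketch_get_info(sketch_info_t attribute, void *value) {
  if (value == NULL) {
    return SKETCH_STATUS_ERROR_INVALID_ARGUMENT;
  }

  switch (attribute) {
    case SKETCH_INFO_QUEUE_COUNT:
      *(uint32_t *)value = 4; /* arbitrary illustrative value */
      return SKETCH_STATUS_SUCCESS;
    default:
      /* Unknown attributes are also reported instead of asserted on. */
      return SKETCH_STATUS_ERROR_INVALID_ARGUMENT;
  }
}
```

With this shape, a caller that passes a null pointer gets an error status back instead of undefined behavior in builds where assertions are disabled.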

2890 commits