rocm-systems

Author	SHA1	Message	Date
SaleelK	5c7c549301	clr: Fix some nullptr checks and prints (#2825 )	2026-01-27 16:45:17 -08:00
SaleelK	340f3aa887	clr: Implement dynamic stream to HWq logic (#1958 ) * clr: Implement dynamic stream to HW queue assignment This change implements dynamic stream to hardware queue (HWq) mapping with the following features: * Queue depth heuristics with weights for optimal HWq assignment * Make last used queue sticky for better locality * Use pipe HWq to pipe mapping - gfx9 follows a round-robin queue to pipe mapping based on creation order (single process per device only, as pipe ID is statically assigned by runtime) * More aggressive heuristic usage for better queue distribution * Extend dynamic queues support for all stream priorities Environment variables: * DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - Depth heuristics 2 - Depth+Pipe heuristics * DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore priority stream creation * clr: Clean up last_used_queue_	2026-01-23 10:40:54 -08:00
Tao Sang	163e44d0a8	SWDEV-555889 - Support mipmap on rocr (#2082 ) * SWDEV-555889 - Support mipmap on rocr Support mipmap in hip-rt on rocr backend. Enable all mipmap tests in Windows. Some other minor improvement. Add some SRD logs that will be removed finally. * Add sampler.mipFilter to fix sampler issues on mipmap in rocr. Fix format issues of view of leveled image and mipmap image in blit kernel in rocr. Enabled disabled mipmap tests. * Rewrite view logic * Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues. Reset last error in negative tests. * Remove SRD dump log from hip-rt Let Rocr mipmap log be in condition. * minor format change * Exclude mipmap tests for mi200+ which don't support mipmap.	2026-01-21 09:10:29 -08:00
German Andryeyev	db792fac37	SWDEV-558849 - Add support for static linking with ROCR (#2659 )	2026-01-20 14:53:01 -05:00
Filip Jankovic	29cd25df66	Add hipDeviceAttributeExpertSchedMode (#2435 ) * Add hipDeviceAttributeExpertSchedMode --------- Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com> * Update hipDeviceAttributeExpertSchedMode unit test * Move check to ROCr from thunk interface * Revert unrelated whitespace changes * Revert version bump --------- Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>	2026-01-15 08:41:39 -08:00
David Yat Sin	a3b445118d	SWDEV-519413 - Ignore ROCr shutdown events (#1616 ) ROCr now reports a shutdown event, but this is not a fatal error. Ignore this event.	2026-01-14 11:28:03 -08:00
SaleelK	6b28faa532	clr: Implement per-stream SDMA engine affinity for improved copy performance (#2480 ) Problem: The existing SDMA engine selection logic had several issues: 1. Same VirtualGPU/stream could use different SDMA engines for consecutive async copies since copy_engine_status may report engines as busy 2. Busy and Preferred engine check for every copy 3. No global tracking of which VirtualGPU uses which engine, leading to suboptimal resource allocation Solution: Implemented a global SDMA engine allocator with per-stream affinity: - Added Device::SdmaEngineAllocator to manage VirtualGPU → engine assignments * Maintains global map of active assignments * Enforces exclusivity: different streams use different engines (except inter-GPU copies where preferred engines are prioritized for optimal hardware paths like XGMI links) * Thread-safe allocation/release with Monitor lock - Modified VirtualGPU to cache assigned engine locally (assigned_sdma_engine_) for fast lookup without map access on hot path - Refactored rocrCopyBuffer() to: 1. Check local cached engine first → use if assigned 2. Call AllocateSdmaEngine() if not assigned → cache result - Moved HSA API queries (memory_copy_engine_status, memory_get_preferred_copy_engine) into AllocateEngine() for cleaner separation of concerns - Engine release on HostQueue::finish() instead of only VirtualGPU destruction * Improves engine utilization by releasing earlier * Added virtual ReleaseSdmaEngines() method to device::VirtualDevice - Added future path for simple round-robin allocation (kUseSimpleRR) for next-gen GPUs with uniform SDMA bandwidth (disabled by default) Cleanup: - Removed selectSdmaEngine() helper (logic moved to allocator) - Removed getSdmaRWMasks() (allocator accesses maxSdmaReadMask_/WriteMask_ directly) - Removed unused sdmaEngineReadMask_/WriteMask_ member variables from DmaBlitManager Benefits: - Ensures consistent per-stream SDMA engine usage - Prevents cross-stream contention and engine thrashing - Prioritizes hardware-optimal paths for inter-GPU transfers - Better resource utilization through earlier release - Cleaner, more maintainable code structure	2026-01-07 19:37:45 -08:00
Shadi Dashmiz	2789ea429a	SWDEV-565300: Fix coherency range mode in mem pool pointers (#2296 ) Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>	2026-01-05 11:33:11 -05:00
Maneesh Gupta	4a9833e70e	Revert "Add HasExpertSchedMode device prop (#2241 )" (#2371 ) This reverts commit `c0b4aef5ad`.	2025-12-17 21:26:44 -08:00
Filip Jankovic	c0b4aef5ad	Add HasExpertSchedMode device prop (#2241 ) * Add HasExpertSchedMode device prop * Add unit tests for HasExpertSchedMode * Add gfx12 check for HasExpertSchedMode prop * Update gfx major version check and test for ExpertSchedMode * Minor fix and ROCr version bump * Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h * Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h * Apply suggestion from @dayatsin-amd * Apply suggestion from @dayatsin-amd --------- Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com> Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>	2025-12-17 17:06:08 +01:00
German Andryeyev	3895aadba6	SWDEV-558849 - Make ROCR path in Windows more stable (#2181 )	2025-12-10 12:37:10 -05:00
Jin Jung	deaf8ab38a	SWDEV-567119 - Windows GL Interop Support (#1892 )	2025-12-08 11:03:59 -05:00
AidanBeltonS	d849b88aef	SWDEV-558080 - Add recommended granularity (#1176 ) * Add recommended granularity * Improve granularity testing * Update based on feedback	2025-11-26 16:10:58 +00:00
Karthik Jayaprakash	740a06d567	SWDEV-559267 - Use CLPrint to DevLogPrintf with Log Level - detail debug. (#1160 )	2025-11-25 19:25:32 -05:00
Victor Zhang	7580052878	SWDEV-564318 - Add support for allocating uncached device memory (#1670 )	2025-11-09 12:51:41 -05:00
Sam Ruscica	757de39caa	Updated amdFileRead/Write in rocdevice to support windows build (#1435 ) * Updated amdFileRead in rocdevice to support windows build * Updated amdFileRead in rocdevice to support windows build	2025-11-04 10:03:03 -05:00
MachineTom	5f76cb916d	SWDEV-555888 - Refactor Numa code (#1191 ) 1. Create a set of mini numa interfaces. In Linux, the interface is based on system calls rather than libnuma. In Windows, the interface also works, but the policy class is a dummy. Unlike Linux, Windows doesn't provide a numactl tool or numa lib to set up numa policy, so the default policy is followed in Windows, that is, using the closest host numa node to allocate pinned host memory in hipHostMalloc(). To get the closest host numa node of a GPU device, you need to query the new attribute hipDeviceAttributeHostNumaId. Then you can create a thread with CPU affinity on the numa node. For an example, see the test in hip-tests/catch/perftests/memory/hipPerfHostNumaAllocWin.cc. 2. Remove pfnSetThreadGroupAffinity and pfnGetNumaNodeProcessorMaskEx as the functions have been exposed since Win7 and Win server 2008. 3. Other minor fixes.	2025-10-23 21:56:15 -04:00
Pengda Xie	a4bbd73dc6	SWDEV-556684 - Remove HSAIL support (#1183 )	2025-10-23 11:21:49 -07:00
Jimbo	37f2be9140	SWDEV-554608 - Add hipHostRegisterIoMemory for hipHostRegister (#962 ) * SWDEV-554608 - Add hipHostRegisterIoMemory for hipHostRegister * SWDEV-554608 - Add hipHostRegisterIoMemory for hipHostRegister * SWDEV-554174 Added hipHostRegisterIoMemory flag in test cases * SWDEV-554174 : Did formatting corrections * SWDEV-554608 - set HSA_AMD_MEMORY_POOL_UNCACHED_FLAG if IoMemory is set * SWDEV-554608 - set HSA_AMD_MEMORY_POOL_UNCACHED_FLAG if IoMemory is set * SWDEV-554608 - Add hipHostRegisterIoMemory for hipHostRegister --------- Co-authored-by: Anavena Venkatesh <Anavena.Venkatesh@amd.com> Co-authored-by: Rambabu Swargam <rambabu.swargam@amd.com>	2025-10-22 20:25:59 -04:00
Pengda Xie	29c2ca8cbc	SWDEV-559867 - Fix CU mask printing (#1328 )	2025-10-20 09:47:17 -07:00
Sam Ruscica	135c38b41c	SWDEV-553436 Created wrapper functions for file read and file write (#935 )	2025-10-07 09:42:22 -04:00
MachineTom	25922d08c3	SWDEV-539145 - Return error when ext_fine_grain_pool unavailable (#877 ) Return error when ext_fine_grain_pool is unavailable for hipHostMallocUncached, hipHostAllocUncached and hipExtHostRegisterUncached. Disable related tests on Navi4x where ext_fine_grain_pool is unavailable	2025-09-21 19:25:28 -04:00
German Andryeyev	ea89ddd589	SWDEV-547108 - Add dll loader for Windows build (#1004 ) The build of ROCR backend will be enabled by default in Windows. It requires the dll loader until ROCR dll will be always available in Windows for any configuration.	2025-09-19 11:25:30 -04:00
lancesix	45b48fb987	SWDEV-555043 - Do not wait on signal if gpu in error state (#1023 ) During a process tear-down we wait on all signals before releasing them: VirtualGPU::HwQueueTracker::~HwQueueTracker() { for (auto& signal : signal_list_) { CpuWaitForSignal(signal); signal->release(); } [...] } In the case where we exit the process after a GPU error that did not cause an abort (ulimit -c == 0), waiting for the signal can be skipped. With the device on the error state, no progress is made, and the signal is probably never going to be modified again: inline bool WaitForSignal(hsa_signal_t signal, bool active_wait = false, bool yield = false) { [...] if (HIP_SKIP_ABORT_ON_GPU_ERROR && amd::Device::IsGPUInError()) { ClPrint(amd::LOG_ERROR, amd::LOG_SIG, "Device not Stable, while waiting for Signal =" "(0x%lx) for %d ns", signal.handle, kTimeout4Secs); return true; } [...] } However, after calling CpuWaitForSignal, when calling "release", we can end-up on a signal dtor which also tries to wait on the signal. Because the GPU is the error state, we never receive the signal, and hang the process during tear down. This happens with the ProfilingSignal dtor: ProfilingSignal::~ProfilingSignal() { if (signal_.handle != 0) { if (hsa_signal_load_relaxed(signal_) > 0) { LogError("Runtime shouldn't destroy a signal that is still busy!"); if (hsa_signal_wait_scacquire(signal_, HSA_SIGNAL_CONDITION_LT, kInitSignalValueOne, kUnlimitedWait, HSA_WAIT_STATE_BLOCKED) != 0) { } } hsa_signal_destroy(signal_); } } This dtor should check that the GPU is not in the error state before trying to wait, which is what this patch implements. Bug: SWDEV-555043 Bug: SWDEV-553435 Bug: SWDEV-553679 Bug: SWDEV-555119	2025-09-18 14:32:04 +01:00
Ioannis Assiouras	5c1eebab84	SWDEV-543723 - Change agentInfo parameter in hostAlloc to void* (#995 )	2025-09-18 11:43:15 +01:00
Ioannis Assiouras	35629e433d	SWDEV-546146 - Added support for hipMemLocationTypeHost in hipMemSetAccess (#682 )	2025-09-10 23:06:20 +01:00
SaleelK	e197aa83ba	SWDEV-543723 - Execute permission for kernArg buf (#728 ) - Refactor deviceLocalAlloc arguments - Refactor hostAlloc code, have cleaner interface - Kern args buffer need to have execute flag set as CP enforces this on certain newer HW.	2025-09-08 12:21:30 -07:00
SaleelK	c4537e8050	SWDEV-553126 - Improve logging (#835 ) * Ability to mask COPY api usage in logs * Show total graph nodes in logs * Add another log level for detailed debug	2025-09-04 10:08:41 -07:00
Ajay GunaShekar	f2ad8d6d5e	SWDEV-553099 - remove WITHOUT_HSA_BACKEND usage (#831 )	2025-09-03 08:40:25 -07:00
Ioannis Assiouras	a1c30318fb	SWDEV-546223 - Get image support info from ISA meta (#773 )	2025-09-02 15:05:18 +01:00
Ioannis Assiouras	5f525ee934	SWDEV-550882 - Expect HSA_EXT_POINTER_TYPE_RESERVED_ADDR pointer type from hsa_amd_pointer_info for hmm (#733 )	2025-08-27 19:42:13 +01:00
Karthik Jayaprakash	89070536c0	SWDEV-552141 - Fix handle/fd type passed from app to align with spec. (#759 ) * SWDEV-552141 - Fix handle/fd type passed from app to align with spec. * SWDEV-552141 - Fix handle/fd type passed from app to align with spec.	2025-08-27 14:28:53 -04:00
Danylo Lytovchenko	2ff2316227	Adjust clang format to the new versions, revert broken macro layout (#714 )	2025-08-22 17:23:22 +02:00
Danylo Lytovchenko	f7338717ae	SWDEV-470698 - fix formatting, add format check workflow (#657 )	2025-08-20 19:58:06 +05:30
Andryeyev, German	72b9408fed	SWDEV-547108 - Fix compilation errors under Windows (#867 ) Interop and numa are not enabled. [ROCm/clr commit: `0ac913e64c`]	2025-08-17 02:33:31 -04:00
Betigeri, Sourabh	35e48d1eaf	SWDEV-546293 - hipMemPrefetchAsync_v2 and hipMemAdvise_v2 implementation (#869 ) [ROCm/clr commit: `cbee74a80e`]	2025-08-15 22:40:04 -07:00
Manocha, Rahul	b3ccf487da	SWDEV-545952 - API definitions for hipStreamSet/GetAttribute (#831 ) Co-authored-by: Rahul Manocha <rmanocha@amd.com> [ROCm/clr commit: `0f49c4a97f`]	2025-08-15 12:51:35 -07:00
Andryeyev, German	6df9a49437	SWDEV-465041 - Add support for user events with DD (#321 ) * SWDEV-465041 - Add support for user events with DD User events can be replaced with HSA signals. Add the interface to allocate HSA signal for user events and update the status on CL_COMPLETE. Force pinned path with DD to avoid blocking calls. Pinned memory can be released only when the command is complete. Simplify device enqueue path to use generic kernel arg buffer and signals * Fix notifyCmdQueue() logic for OCL * Avoid blocking calls in OCL with DD * Add event destruction in case of failure. [ROCm/clr commit: `2305f8ae56`]	2025-08-12 19:04:36 -04:00
Manocha, Rahul	4a93a614e5	SWDEV-539710 - Defer allocation of managed variable (#652 ) Co-authored-by: Rahul Manocha <rmanocha@amd.com> [ROCm/clr commit: `3f6f9d6081`]	2025-07-31 08:30:23 -07:00
Betigeri, Sourabh	680c7fbf64	SWDEV-483895 - Use ROCr to reserve memory for HMM (#590 ) [ROCm/clr commit: `a1f056bd11`]	2025-07-15 21:40:13 +05:30
Sang, Tao	a777d6eca0	SWDEV-539145 - Simplify host memory pool management (#668 ) * SWDEV-539145 - Simplify host memory pool management Remove unnecessary variables and functions. Make code simpler and clear. * Change cpu_agent_info_ into pointer. * Restore getPreferredNumaNode() [ROCm/clr commit: `1351cd7fa8`]	2025-07-11 10:38:40 -04:00
Sang, Tao	7fd322af90	SWDEV-508776 - support VGPRs validation (#274 ) Clarify some VGPRs terms description. Fix some wrong query logics of availableVGPRs_ and availableRegistersPerCU_ in device info. Add hipDeviceAttributeMaxAvailableVgprsPerThread attribute query. Remove hardcoding of following info_.vgprAllocGranularity_ info_.vgprsPerSimd_ [ROCm/clr commit: `397f303d97`]	2025-07-09 10:46:52 -04:00
Xie, Pengda	8e5921174f	SWDEV-540576 - Abort if user request a core dump (#653 ) * SWDEV-539414 - Return error status from runtime handler when HIP_SKIP_ABORT_ON_GPU_ERROR is false * SWDEV-539414 - default handler when GPU core file is generated * SWDEV-540576 - Abort if user request a core dump Change-Id: I9e2c640acf559880bd13641de9103e660ef822a3 --------- Co-authored-by: Assiouras, Ioannis <Ioannis.Assiouras@amd.com> Co-authored-by: agunashe <ajay.gunashekar@amd.com> [ROCm/clr commit: `9c1bff0ae7`]	2025-07-03 00:29:14 +05:30
Sang, Tao	287ec83f5c	SWDEV-539145 - Support extended fine grained system memory pool (#603 ) * Add hipHostMalloc() new flag hipHostMallocUncached which will force to allocate pinned host memory on extended fine grained system memory pool. * Add hipHostAlloc() new flag hipHostAllocUncached which will force to allocate pinned host memory on extended fine grained system memory pool. * Add hipHostRegister() new flag hipHostRegisterUncached which will force to map host memory onto extended fine grained system memory pool. [ROCm/clr commit: `a7d7687b8f`]	2025-07-01 10:10:10 +05:30
Sang, Tao	da7c6f57ef	SWDEV-539399 - Support ROCCLR_MEM_HSA_CONTIGUOUS (#610 ) Fix missing support of ROCCLR_MEM_HSA_CONTIGUOUS in hipExtMallocWithFlags(). [ROCm/clr commit: `f4d78d427c`]	2025-06-26 14:05:23 -04:00
Kudchadker, Saleel	3c9f80b4e4	SWDEV-535490 - Improve logging (#441 ) - Include HIP version/githash in the logs - Add a new method to print library path [ROCm/clr commit: `5d53c83806`]	2025-06-24 13:00:00 -07:00
Andryeyev, German	fc2f5aaf22	SWDEV-533074 - Expose hipDeviceAttributeNumberOfXccs attribute (#336 ) [ROCm/clr commit: `5cc172c99c`]	2025-05-21 22:35:42 +05:30
Jayaprakash, Karthik	a17e22a78c	SWDEV-529929 - hipMemGetHandleForAddressRange implementation. (#245 ) [ROCm/clr commit: `12131de4a9`]	2025-05-20 15:56:04 -04:00
Jayaprakash, Karthik	4ea2d9a5ee	SWDEV-531711 - Report correct error code based on device failure. (#286 ) [ROCm/clr commit: `f5b8db33f1`]	2025-05-17 06:33:13 -04:00
Assiouras, Ioannis	4efd624960	SWDEV-525593, SWDEV-527293 - Acquire active queue after xferQueue is created (#165 ) For xferQueue VirtualGPU::create is called after ProfilingBegin so the active queue needs to be acquired. [ROCm/clr commit: `d3fb8eda8b`]	2025-04-30 09:21:11 +01:00
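
The per-stream SDMA engine affinity described in commit 6b28faa532 can be illustrated with a minimal, self-contained sketch. Everything below is an assumption made for illustration: SdmaEngineAllocatorSketch, StreamSketch, and their methods are hypothetical names, not the real clr classes; std::mutex stands in for the Monitor lock; no HSA calls are made; and preferred engines are only picked among free ones, whereas the commit notes that inter-GPU copies may relax exclusivity.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <unordered_map>

// Hypothetical global allocator: maps a stream id to one SDMA engine bit.
class SdmaEngineAllocatorSketch {
 public:
  // engine_mask: bit i set means SDMA engine i exists on the device.
  explicit SdmaEngineAllocatorSketch(uint32_t engine_mask)
      : engine_mask_(engine_mask) {
    assert(engine_mask != 0 && "at least one engine is required");
  }

  // Returns the engine bit owned by stream_id, allocating one if needed.
  // preferred_mask models "preferred" engines for a copy; 0 means none.
  // Returns std::nullopt when all engines are owned by other streams.
  std::optional<uint32_t> Allocate(uint64_t stream_id,
                                   uint32_t preferred_mask = 0) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = assignments_.find(stream_id);
    if (it != assignments_.end()) return it->second;  // keep prior assignment

    const uint32_t free_mask = engine_mask_ & ~in_use_;
    const uint32_t preferred_free = free_mask & preferred_mask;
    const uint32_t candidates = preferred_free ? preferred_free : free_mask;
    if (candidates == 0) return std::nullopt;

    const uint32_t engine = candidates & (~candidates + 1u);  // lowest set bit
    in_use_ |= engine;
    assignments_.emplace(stream_id, engine);
    return engine;
  }

  // Gives the stream's engine back so another stream can take it.
  void Release(uint64_t stream_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = assignments_.find(stream_id);
    if (it == assignments_.end()) return;
    in_use_ &= ~it->second;
    assignments_.erase(it);
  }

 private:
  std::mutex mutex_;
  uint32_t engine_mask_;
  uint32_t in_use_ = 0;
  std::unordered_map<uint64_t, uint32_t> assignments_;
};

// Hypothetical stream: caches its engine so the hot path skips the map.
class StreamSketch {
 public:
  StreamSketch(uint64_t id, SdmaEngineAllocatorSketch& allocator)
      : id_(id), allocator_(allocator) {}

  // Uses the cached engine if present, otherwise asks the allocator.
  std::optional<uint32_t> EngineForCopy(uint32_t preferred_mask = 0) {
    if (!cached_) cached_ = allocator_.Allocate(id_, preferred_mask);
    return cached_;
  }

  // Models releasing the engine when the queue finishes.
  void Finish() {
    allocator_.Release(id_);
    cached_.reset();
  }

 private:
  uint64_t id_;
  SdmaEngineAllocatorSketch& allocator_;
  std::optional<uint32_t> cached_;
};
```

In this sketch, two streams sharing a two-engine device each keep a distinct engine across consecutive copies, a third stream gets no engine until one of them calls Finish(), and the lock is only taken when a stream has nothing cached.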


344 commits