344 Commits

Author SHA1 Message Date
SaleelK 5c7c549301 clr: Fix some nullptr checks and prints (#2825) 2026-01-27 16:45:17 -08:00
SaleelK 340f3aa887 clr: Implement dynamic stream to HWq logic (#1958)
* clr: Implement dynamic stream to HW queue assignment

This change implements dynamic stream to hardware queue (HWq) mapping
with the following features:

* Queue depth heuristics with weights for optimal HWq assignment
* Make last used queue sticky for better locality
* Use HWq to pipe mapping - gfx9 follows a round-robin queue to
  pipe mapping based on creation order (single process per device only,
  as pipe ID is statically assigned by runtime)
* More aggressive heuristic usage for better queue distribution
* Extend dynamic queues support for all stream priorities

Environment variables:
* DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - depth heuristics,
  2 - depth+pipe heuristics
* DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore priority stream creation

* clr: Clean up last_used_queue_
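
A minimal sketch of how a weighted depth heuristic with a sticky last-used queue could pick a HWq. It is an illustration only, not the runtime's implementation: `HwQueue`, `PickQueue`, `pipe_load` and the three weight constants are hypothetical names and values that do not come from this commit.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a hardware queue; not the runtime's actual type.
struct HwQueue {
  std::uint32_t depth;  // commands currently outstanding on this queue
  std::uint32_t pipe;   // index of the pipe this queue maps to
};

// Hypothetical weights; the commit does not state the values the runtime uses.
constexpr std::uint64_t kDepthWeight = 4;
constexpr std::uint64_t kPipeWeight = 1;
constexpr std::uint64_t kStickyBonus = 2;

// Returns the index of the queue with the lowest weighted cost.
// pipe_load[p] is the number of busy queues on pipe p and is only read when
// use_pipe is true (the "depth+pipe" mode). Ties go to the lowest index.
std::size_t PickQueue(const std::vector<HwQueue>& queues,
                      const std::vector<std::uint32_t>& pipe_load,
                      std::size_t last_used, bool use_pipe) {
  std::size_t best = 0;
  std::uint64_t best_cost = std::numeric_limits<std::uint64_t>::max();
  for (std::size_t i = 0; i < queues.size(); ++i) {
    std::uint64_t cost = queues[i].depth * kDepthWeight;
    if (use_pipe) cost += pipe_load[queues[i].pipe] * kPipeWeight;
    // Sticky queue: the last used queue gets a small discount for locality.
    if (i == last_used) cost = (cost > kStickyBonus) ? cost - kStickyBonus : 0;
    if (cost < best_cost) {
      best_cost = cost;
      best = i;
    }
  }
  return best;
}
```

With equal depths the discount keeps a stream on its previous queue; a clearly shallower queue still wins, which is the trade-off between locality and distribution described above.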
2026-01-23 10:40:54 -08:00
Tao Sang 163e44d0a8 SWDEV-555889 - Support mipmap on rocr (#2082)
* SWDEV-555889 - Support mipmap on rocr

Support mipmap in hip-rt on rocr backend.
Enable all mipmap tests in Windows.
Some other minor improvements.

Add some SRD logs that will be removed eventually.

* Add sampler.mipFilter to fix sampler issues on mipmap in rocr.
Fix format issues in views of leveled images and mipmap images in the blit kernel in rocr.
Re-enable the disabled mipmap tests.

* Rewrite view logic

* Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues.
Reset last error in negative tests.

* Remove SRD dump log from hip-rt
Make the ROCr mipmap log conditional.

* Minor format change

* Exclude mipmap tests for mi200+ which don't support mipmap.
2026-01-21 09:10:29 -08:00
German Andryeyev db792fac37 SWDEV-558849 - Add support for static linking with ROCR (#2659) 2026-01-20 14:53:01 -05:00
Filip Jankovic 29cd25df66 Add hipDeviceAttributeExpertSchedMode (#2435)
* Add hipDeviceAttributeExpertSchedMode

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>

* Update hipDeviceAttributeExpertSchedMode unit test

* Move check to ROCr from thunk interface

* Revert unrelated whitespace changes

* Revert version bump

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
2026-01-15 08:41:39 -08:00
David Yat Sin a3b445118d SWDEV-519413 - Ignore ROCr shutdown events (#1616)
ROCr now reports a shutdown event, but this is not a fatal error. Ignore
this event.
2026-01-14 11:28:03 -08:00
SaleelK 6b28faa532 clr: Implement per-stream SDMA engine affinity for improved copy performance (#2480)
Problem:
The existing SDMA engine selection logic had several issues:
1. Same VirtualGPU/stream could use different SDMA engines for consecutive
   async copies since copy_engine_status may report engines as busy
2. Busy and preferred engine checks were repeated for every copy
3. No global tracking of which VirtualGPU uses which engine, leading to
   suboptimal resource allocation

Solution:
Implemented a global SDMA engine allocator with per-stream affinity:

- Added Device::SdmaEngineAllocator to manage VirtualGPU → engine assignments
  * Maintains global map of active assignments
  * Enforces exclusivity: different streams use different engines (except
    inter-GPU copies where preferred engines are prioritized for optimal
    hardware paths like XGMI links)
  * Thread-safe allocation/release with Monitor lock

- Modified VirtualGPU to cache assigned engine locally (assigned_sdma_engine_)
  for fast lookup without map access on hot path

- Refactored rocrCopyBuffer() to:
  1. Check local cached engine first → use if assigned
  2. Call AllocateSdmaEngine() if not assigned → cache result

- Moved HSA API queries (memory_copy_engine_status, memory_get_preferred_copy_engine)
  into AllocateEngine() for cleaner separation of concerns

- Engine release on HostQueue::finish() instead of only VirtualGPU destruction
  * Improves engine utilization by releasing earlier
  * Added virtual ReleaseSdmaEngines() method to device::VirtualDevice

- Added future path for simple round-robin allocation (kUseSimpleRR) for
  next-gen GPUs with uniform SDMA bandwidth (disabled by default)

Cleanup:
- Removed selectSdmaEngine() helper (logic moved to allocator)
- Removed getSdmaRWMasks() (allocator accesses maxSdmaReadMask_/WriteMask_ directly)
- Removed unused sdmaEngineReadMask_/WriteMask_ member variables from DmaBlitManager

Benefits:
- Ensures consistent per-stream SDMA engine usage
- Prevents cross-stream contention and engine thrashing
- Prioritizes hardware-optimal paths for inter-GPU transfers
- Better resource utilization through earlier release
- Cleaner, more maintainable code structure
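
A simplified sketch of the per-stream affinity idea described above, assuming a bitmask of available engines. `EngineAllocator` and `Stream` are hypothetical stand-ins, not the actual Device::SdmaEngineAllocator or VirtualGPU; `std::mutex` replaces the Monitor lock, and the inter-GPU preferred-engine exception and the HSA status queries are left out.

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical model of a global allocator that pins each stream to one engine.
class EngineAllocator {
 public:
  explicit EngineAllocator(std::uint32_t engine_mask) : engine_mask_(engine_mask) {}

  // Returns a one-hot engine bit for stream_id, or 0 if no engine is free.
  std::uint32_t Allocate(std::uint64_t stream_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = assigned_.find(stream_id);
    if (it != assigned_.end()) return it->second;  // keep existing affinity
    const std::uint32_t free_mask = engine_mask_ & ~in_use_;
    if (free_mask == 0) return 0;
    const std::uint32_t engine = free_mask & (0u - free_mask);  // lowest set bit
    in_use_ |= engine;  // exclusivity: one stream per engine
    assigned_.emplace(stream_id, engine);
    return engine;
  }

  // Gives the stream's engine back so that another stream can take it.
  void Release(std::uint64_t stream_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = assigned_.find(stream_id);
    if (it == assigned_.end()) return;
    in_use_ &= ~it->second;
    assigned_.erase(it);
  }

 private:
  std::mutex mutex_;
  std::uint32_t engine_mask_;
  std::uint32_t in_use_ = 0;
  std::unordered_map<std::uint64_t, std::uint32_t> assigned_;
};

// Hypothetical stream that caches its engine to avoid the map on the hot path.
struct Stream {
  explicit Stream(std::uint64_t stream_id) : id(stream_id) {}

  std::uint32_t EngineForCopy(EngineAllocator& alloc) {
    if (cached_engine == 0) cached_engine = alloc.Allocate(id);
    return cached_engine;
  }

  // Mirrors the idea of releasing on finish rather than on destruction.
  void Finish(EngineAllocator& alloc) {
    alloc.Release(id);
    cached_engine = 0;
  }

  std::uint64_t id;
  std::uint32_t cached_engine = 0;
};
```

Consecutive copies on one stream return the same engine without touching the map, and an engine becomes available to other streams as soon as its owner finishes.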
2026-01-07 19:37:45 -08:00
Shadi Dashmiz 2789ea429a SWDEV-565300: Fix coherency range mode in mem pool pointers (#2296)
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
2026-01-05 11:33:11 -05:00
Maneesh Gupta 4a9833e70e Revert "Add HasExpertSchedMode device prop (#2241)" (#2371)
This reverts commit c0b4aef5ad.
2025-12-17 21:26:44 -08:00
Filip Jankovic c0b4aef5ad Add HasExpertSchedMode device prop (#2241)
* Add HasExpertSchedMode device prop

* Add unit tests for HasExpertSchedMode

* Add gfx12 check for HasExpertSchedMode prop

* Update gfx major version check and test for ExpertSchedMode

* Minor fix and ROCr version bump

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Apply suggestion from @dayatsin-amd

* Apply suggestion from @dayatsin-amd

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>
2025-12-17 17:06:08 +01:00
German Andryeyev 3895aadba6 SWDEV-558849 - Make ROCR path in Windows more stable (#2181) 2025-12-10 12:37:10 -05:00
Jin Jung deaf8ab38a SWDEV-567119 - Windows GL Interop Support (#1892) 2025-12-08 11:03:59 -05:00
AidanBeltonS d849b88aef SWDEV-558080 - Add recommended granularity (#1176)
* Add recommended granularity

* Improve granularity testing

* Update based on feedback
2025-11-26 16:10:58 +00:00
Karthik Jayaprakash 740a06d567 SWDEV-559267 - Use CLPrint to DevLogPrintf with Log Level - detail debug. (#1160) 2025-11-25 19:25:32 -05:00
Victor Zhang 7580052878 SWDEV-564318 - Add support for allocating uncached device memory (#1670) 2025-11-09 12:51:41 -05:00
Sam Ruscica 757de39caa Updated amdFileRead/Write in rocdevice to support windows build (#1435)
* Updated amdFileRead in rocdevice to support windows build

* Updated amdFileRead in rocdevice to support windows build
2025-11-04 10:03:03 -05:00
MachineTom 5f76cb916d SWDEV-555888 - Refactor Numa code (#1191)
1. Create a minimal set of NUMA interfaces.
On Linux, the interface is based on system calls rather than libnuma.
On Windows, the interface also works, but the policy class is a dummy.
Unlike Linux, Windows does not provide a numactl tool or a NUMA library to set up a NUMA policy, so
the default policy is followed on Windows: pinned host memory in hipHostMalloc() is allocated
on the closest host NUMA node.
To get the closest host NUMA node of a GPU device, query the new attribute
hipDeviceAttributeHostNumaId. You can then create a thread with CPU affinity on that NUMA node.
For an example, see the test in hip-tests/catch/perftests/memory/hipPerfHostNumaAllocWin.cc.

2. Remove pfnSetThreadGroupAffinity and pfnGetNumaNodeProcessorMaskEx as the functions have been exposed since Win7 and Win server 2008.

3. Other minor fixes.
2025-10-23 21:56:15 -04:00
Pengda Xie a4bbd73dc6 SWDEV-556684 - Remove HSAIL support (#1183) 2025-10-23 11:21:49 -07:00
Jimbo 37f2be9140 SWDEV-554608 - Add hipHostRegisterIoMemory for hipHostRegister (#962)
* SWDEV-554608 - Add hipHostRegisterIoMemory for hipHostRegister

* SWDEV-554608 - Add hipHostRegisterIoMemory for hipHostRegister

* SWDEV-554174 Added hipHostRegisterIoMemory flag in test cases

* SWDEV-554174 : Did formatting corrections

* SWDEV-554608 - set HSA_AMD_MEMORY_POOL_UNCACHED_FLAG if IoMemory is set

* SWDEV-554608 - set HSA_AMD_MEMORY_POOL_UNCACHED_FLAG if IoMemory is set

* SWDEV-554608 - Add hipHostRegisterIoMemory for hipHostRegister

---------

Co-authored-by: Anavena Venkatesh <Anavena.Venkatesh@amd.com>
Co-authored-by: Rambabu Swargam <rambabu.swargam@amd.com>
2025-10-22 20:25:59 -04:00
Pengda Xie 29c2ca8cbc SWDEV-559867 - Fix CU mask printing (#1328) 2025-10-20 09:47:17 -07:00
Sam Ruscica 135c38b41c SWDEV-553436 Created wrapper functions for file read and file write (#935) 2025-10-07 09:42:22 -04:00
MachineTom 25922d08c3 SWDEV-539145 - Return error when ext_fine_grain_pool unavailable (#877)
Return error when ext_fine_grain_pool is unavailable for
hipHostMallocUncached, hipHostAllocUncached and
hipExtHostRegisterUncached.
Disable related tests on Navi4x where
ext_fine_grain_pool is unavailable
2025-09-21 19:25:28 -04:00
German Andryeyev ea89ddd589 SWDEV-547108 - Add dll loader for Windows build (#1004)
The ROCR backend build will be enabled by default on Windows.
It requires the dll loader until the ROCR dll is always available on Windows for every configuration.
2025-09-19 11:25:30 -04:00
lancesix 45b48fb987 SWDEV-555043 - Do not wait on signal if gpu in error state (#1023)
During a process tear-down we wait on all signals before releasing them:

    VirtualGPU::HwQueueTracker::~HwQueueTracker() {
      for (auto& signal : signal_list_) {
        CpuWaitForSignal(signal);
        signal->release();
      }
      [...]
    }

In the case where we exit the process after a GPU error that did not
cause an abort (ulimit -c == 0), waiting for the signal can be skipped.
With the device in the error state, no progress is made, and the signal
is probably never going to be modified again:

    inline bool WaitForSignal(hsa_signal_t signal, bool active_wait = false, bool yield = false) {
          [...]
          if (HIP_SKIP_ABORT_ON_GPU_ERROR && amd::Device::IsGPUInError()) {
            ClPrint(amd::LOG_ERROR, amd::LOG_SIG,
                    "Device not Stable, while waiting for Signal ="
                    "(0x%lx) for %d ns",
                    signal.handle, kTimeout4Secs);
            return true;
          }
          [...]
    }

However, after calling CpuWaitForSignal, calling "release" can
end up in a signal dtor which also tries to wait on the signal.  Because
the GPU is in the error state, we never receive the signal, and the
process hangs during tear-down.  This happens with the ProfilingSignal dtor:

    ProfilingSignal::~ProfilingSignal() {
      if (signal_.handle != 0) {
        if (hsa_signal_load_relaxed(signal_) > 0) {
          LogError("Runtime shouldn't destroy a signal that is still busy!");
          if (hsa_signal_wait_scacquire(signal_, HSA_SIGNAL_CONDITION_LT, kInitSignalValueOne,
                                        kUnlimitedWait, HSA_WAIT_STATE_BLOCKED) != 0) {
          }
        }
        hsa_signal_destroy(signal_);
      }
    }

This dtor should check that the GPU is not in the error state before
trying to wait, which is what this patch implements.
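
A minimal sketch of the guard described above, using stand-in types so that it runs without HSA. `FakeSignal`, `DestroySignal` and `g_gpu_in_error` are hypothetical names that model the shape of the check; they are not the code in this patch.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for amd::Device::IsGPUInError().
std::atomic<bool> g_gpu_in_error{false};

// Hypothetical stand-in for an HSA signal and its bookkeeping.
struct FakeSignal {
  std::int64_t value = 1;  // > 0 means the signal is still busy
  int waits = 0;           // how many times teardown waited on it
  bool destroyed = false;
};

// Teardown waits on a busy signal only while the GPU can still make progress.
void DestroySignal(FakeSignal& s) {
  if (s.value > 0 && !g_gpu_in_error.load()) {
    // Healthy device: wait for completion (simulated here by clearing the value).
    ++s.waits;
    s.value = 0;
  }
  // In the error state the wait is skipped; the signal is destroyed either way.
  s.destroyed = true;
}
```

When the error flag is set, a busy signal is destroyed without waiting, so teardown cannot block on a signal that will never change.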

Bug: SWDEV-555043
Bug: SWDEV-553435
Bug: SWDEV-553679
Bug: SWDEV-555119
2025-09-18 14:32:04 +01:00
Ioannis Assiouras 5c1eebab84 SWDEV-543723 - Change agentInfo parameter in hostAlloc to void* (#995) 2025-09-18 11:43:15 +01:00
Ioannis Assiouras 35629e433d SWDEV-546146 - Added support for hipMemLocationTypeHost in hipMemSetAccess (#682) 2025-09-10 23:06:20 +01:00
SaleelK e197aa83ba SWDEV-543723 - Execute permission for kernArg buf (#728)
- Refactor deviceLocalAlloc arguments
- Refactor hostAlloc code, have cleaner interface
- The kern args buffer needs to have the execute flag set, as CP enforces this on
  certain newer HW.
2025-09-08 12:21:30 -07:00
SaleelK c4537e8050 SWDEV-553126 - Improve logging (#835)
* Ability to mask COPY api usage in logs
* Show total graph nodes in logs
* Add another log level for detailed debug
2025-09-04 10:08:41 -07:00
Ajay GunaShekar f2ad8d6d5e SWDEV-553099 - remove WITHOUT_HSA_BACKEND usage (#831) 2025-09-03 08:40:25 -07:00
Ioannis Assiouras a1c30318fb SWDEV-546223 - Get image support info from ISA meta (#773) 2025-09-02 15:05:18 +01:00
Ioannis Assiouras 5f525ee934 SWDEV-550882 - Expect HSA_EXT_POINTER_TYPE_RESERVED_ADDR pointer type from hsa_amd_pointer_info for hmm (#733) 2025-08-27 19:42:13 +01:00
Karthik Jayaprakash 89070536c0 SWDEV-552141 - Fix handle/fd type passed from app to align with spec. (#759)
* SWDEV-552141 - Fix handle/fd type passed from app to align with spec.

* SWDEV-552141 - Fix handle/fd type passed from app to align with spec.
2025-08-27 14:28:53 -04:00
Danylo Lytovchenko 2ff2316227 Adjust clang format to the new versions, revert broken macro layout (#714) 2025-08-22 17:23:22 +02:00
Danylo Lytovchenko f7338717ae SWDEV-470698 - fix formatting, add format check workflow (#657) 2025-08-20 19:58:06 +05:30
Andryeyev, German 72b9408fed SWDEV-547108 - Fix compilation errors under Windows (#867)
Interop and numa are not enabled.

[ROCm/clr commit: 0ac913e64c]
2025-08-17 02:33:31 -04:00
Betigeri, Sourabh 35e48d1eaf SWDEV-546293 - hipMemPrefetchAsync_v2 and hipMemAdvise_v2 implementation (#869)
SWDEV-546293 - hipMemPrefetchAsync hipMemAdvise_v2

[ROCm/clr commit: cbee74a80e]
2025-08-15 22:40:04 -07:00
Manocha, Rahul b3ccf487da SWDEV-545952 - API definitions for hipStreamSet/GetAttribute (#831)
Co-authored-by: Rahul Manocha <rmanocha@amd.com>

[ROCm/clr commit: 0f49c4a97f]
2025-08-15 12:51:35 -07:00
Andryeyev, German 6df9a49437 SWDEV-465041 - Add support for user events with DD (#321)
* SWDEV-465041 - Add support for user events with DD

User events can be replaced with HSA signals. Add the interface
to allocate HSA signal for user events and update the status on
CL_COMPLETE.
Force pinned path with DD to avoid blocking calls. Pinned memory
can be released only when the command is complete.
Simplify device enqueue path to use generic kernel arg buffer and
signals

* Fix notifyCmdQueue() logic for OCL

* Avoid blocking calls in OCL with DD

* Add event destruction in case of failure.

[ROCm/clr commit: 2305f8ae56]
2025-08-12 19:04:36 -04:00
Manocha, Rahul 4a93a614e5 SWDEV-539710 - Defer allocation of managed variable (#652)
Co-authored-by: Rahul Manocha <rmanocha@amd.com>

[ROCm/clr commit: 3f6f9d6081]
2025-07-31 08:30:23 -07:00
Betigeri, Sourabh 680c7fbf64 SWDEV-483895 - Use ROCr to reserve memory for HMM (#590)
[ROCm/clr commit: a1f056bd11]
2025-07-15 21:40:13 +05:30
Sang, Tao a777d6eca0 SWDEV-539145 - Simplify host memory pool management (#668)
* SWDEV-539145 - Simplify host memory pool management

Remove unnecessary variables and functions.
Make the code simpler and clearer.

* Change cpu_agent_info_ into pointer.

* Restore getPreferredNumaNode()

[ROCm/clr commit: 1351cd7fa8]
2025-07-11 10:38:40 -04:00
Sang, Tao 7fd322af90 SWDEV-508776 - support VGPRs validation (#274)
Clarify the descriptions of some VGPR terms.
Fix incorrect query logic for availableVGPRs_ and
availableRegistersPerCU_ in device info.
Add hipDeviceAttributeMaxAvailableVgprsPerThread
attribute query.
Remove hardcoding of the following:
 info_.vgprAllocGranularity_
 info_.vgprsPerSimd_

[ROCm/clr commit: 397f303d97]
2025-07-09 10:46:52 -04:00
Xie, Pengda 8e5921174f SWDEV-540576 - Abort if user request a core dump (#653)
* SWDEV-539414 - Return error status from runtime handler when HIP_SKIP_ABORT_ON_GPU_ERROR is false

* SWDEV-539414 - default handler when GPU core file is generated

* SWDEV-540576 - Abort if user request a core dump

Change-Id: I9e2c640acf559880bd13641de9103e660ef822a3

---------

Co-authored-by: Assiouras, Ioannis <Ioannis.Assiouras@amd.com>
Co-authored-by: agunashe <ajay.gunashekar@amd.com>

[ROCm/clr commit: 9c1bff0ae7]
2025-07-03 00:29:14 +05:30
Sang, Tao 287ec83f5c SWDEV-539145 - Support extended fine grained system memory pool (#603)
* Add the new hipHostMalloc() flag hipHostMallocUncached, which forces pinned
host memory to be allocated from the extended fine grained system memory pool.
* Add the new hipHostAlloc() flag hipHostAllocUncached, which forces pinned
host memory to be allocated from the extended fine grained system memory pool.
* Add the new hipHostRegister() flag hipHostRegisterUncached, which forces
host memory to be mapped onto the extended fine grained system memory pool.

[ROCm/clr commit: a7d7687b8f]
2025-07-01 10:10:10 +05:30
Sang, Tao da7c6f57ef SWDEV-539399 - Support ROCCLR_MEM_HSA_CONTIGUOUS (#610)
Fix missing support of ROCCLR_MEM_HSA_CONTIGUOUS in
hipExtMallocWithFlags().

[ROCm/clr commit: f4d78d427c]
2025-06-26 14:05:23 -04:00
Kudchadker, Saleel 3c9f80b4e4 SWDEV-535490 - Improve logging (#441)
- Include HIP version/githash in the logs
- Add a new method to print library path

[ROCm/clr commit: 5d53c83806]
2025-06-24 13:00:00 -07:00
Andryeyev, German fc2f5aaf22 SWDEV-533074 - Expose hipDeviceAttributeNumberOfXccs attribute (#336)
[ROCm/clr commit: 5cc172c99c]
2025-05-21 22:35:42 +05:30
Jayaprakash, Karthik a17e22a78c SWDEV-529929 - hipMemGetHandleForAddressRange implementation. (#245)
[ROCm/clr commit: 12131de4a9]
2025-05-20 15:56:04 -04:00
Jayaprakash, Karthik 4ea2d9a5ee SWDEV-531711 - Report correct error code based on device failure. (#286)
[ROCm/clr commit: f5b8db33f1]
2025-05-17 06:33:13 -04:00
Assiouras, Ioannis 4efd624960 SWDEV-525593, SWDEV-527293 - Acquire active queue after xferQueue is created (#165)
For xferQueue VirtualGPU::create is called after ProfilingBegin
so the active queue needs to be acquired.

[ROCm/clr commit: d3fb8eda8b]
2025-04-30 09:21:11 +01:00