1338 Melakukan

Penulis SHA1 Pesan Tanggal
Rahul Manocha c4f7593001 clr: Update signal count and pool size for staging buffer (#2889)
* clr: Update signal count and pool size for staging buffer

* Change to naming of variables etc

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-29 10:34:00 -08:00
SaleelK 5c7c549301 clr: Fix some nullptr checks and prints (#2825) 2026-01-27 16:45:17 -08:00
sluzynsk-amd f37b100c34 SWDEV-563777 - further reduce compilation warnings (#2331)
This change resolves some of the warnings generated during clr builds.
Quiet regular output of doxygen.
Disable non-documented warnings of doxygen.

Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>
2026-01-27 20:51:16 +01:00
marandje 5cda2a496e SWDEV-568260 - Validate sub-buffer coverage in hipMemSetAccess (#2451) 2026-01-26 23:09:46 +01:00
Shadi Dashmiz 71856ec239 SWDEV-465366 : Deadlock during stream wait opeartion (#2652)
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
2026-01-26 16:54:07 -05:00
Xie, AlexBin e22c9b457e SWDEV-576718 - provide option to limit memory cache usage (#2810)
* SWDEV-576718 - provide option to limit memory cache usage

* SWDEV-576718 - Use MiB instead of MB in description
2026-01-26 11:35:01 -05:00
SaleelK 340f3aa887 clr: Implement dynamic stream to HWq logic (#1958)
* clr: Implement dynamic stream to HW queue assignment

This change implements dynamic stream to hardware queue (HWq) mapping
with the following features:

* Queue depth heuristics with weights for optimal HWq assignment
* Make last used queue sticky for better locality
* Use pipe HWq to pipe mapping - gfx9 follows a round-robin queue to
  pipe mapping based on creation order (single process per device only,
  as pipe ID is statically assigned by runtime)
* More aggressive heuristic usage for better queue distribution
* Extend dynamic queues support for all stream priorities

Environment variables:
* DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - Depth heuristics 2 -
  Depth+Pipe heuristics
* DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore priority stream creation

* clr: Clean up last_used_queue_
2026-01-23 10:40:54 -08:00
Jin Jung 48347bc857 SWDEV-572679 - Fix hipGraphicsGLRegisterImage (#2475) 2026-01-23 06:25:10 -08:00
Tao Sang 163e44d0a8 SWDEV-555889 - Support mipmap on rocr (#2082)
* SWDEV-555889 - Support mipmap on rocr

Support mipmap in hip-rt on rocr backend.
Enable all mipmap tests in Windows.
Some other minor improvement.

Add some SRD logs that will be removed finally.

* Add sampler.mipFilter to fix sampler issues on mipmap in rocr.
Fix format issues of view of leveled image and  mipmap image in blit kernel in rocr.
Enabled disabled mipmap tests.

* Rewrite view logic

* Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues.
Reset last error in nagative tests.

* Remove SRD dump log from hip-rt
Let Rocr mipmap log be in condition.

* minor format chang

* Exclude mipmap tests for mi200+ which don't support mipmap.
2026-01-21 09:10:29 -08:00
German Andryeyev db792fac37 SWDEV-558849 - Add support for static linking with ROCR (#2659) 2026-01-20 14:53:01 -05:00
Filip Jankovic 29cd25df66 Add hipDeviceAttributeExpertSchedMode (#2435)
* Add hipDeviceAttributeExpertSchedMode

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>

* Update hipDeviceAttributeExpertSchedMode unit test

* Move check to ROCr from thunk interface

* Revert unrelated whitespace changes

* Revert version bump

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
2026-01-15 08:41:39 -08:00
David Yat Sin a3b445118d SWDEV-519413 - Ignore ROCr shutdown events (#1616)
ROCr now reports a shutdown event, but this is not a fatal error. Ignore
this event.
2026-01-14 11:28:03 -08:00
Luca Bruni d7ff927690 [clr] Fix device printf pointer advancement issue with string format specifiers (#1313) 2026-01-14 13:05:25 -05:00
Jin Jung d4758bc29e SWDEV-570501 - Add Windows support for hipGraphicsGLRegisterBuffer (#2323) 2026-01-12 13:10:46 -06:00
SaleelK e6e0378acd clr: Always query new engine for intergpu copies (#2559) 2026-01-12 11:01:02 -08:00
SaleelK 6b28faa532 clr: Implement per-stream SDMA engine affinity for improved copy performance (#2480)
Problem:
The existing SDMA engine selection logic had several issues:
1. Same VirtualGPU/stream could use different SDMA engines for consecutive
   async copies since copy_engine_status may report engines as busy
2. Busy and Preferred engine check for every copy
3. No global tracking of which VirtualGPU uses which engine, leading to
   suboptimal resource allocation

Solution:
Implemented a global SDMA engine allocator with per-stream affinity:

- Added Device::SdmaEngineAllocator to manage VirtualGPU → engine assignments
  * Maintains global map of active assignments
  * Enforces exclusivity: different streams use different engines (except
    inter-GPU copies where preferred engines are prioritized for optimal
    hardware paths like XGMI links)
  * Thread-safe allocation/release with Monitor lock

- Modified VirtualGPU to cache assigned engine locally (assigned_sdma_engine_)
  for fast lookup without map access on hot path

- Refactored rocrCopyBuffer() to:
  1. Check local cached engine first → use if assigned
  2. Call AllocateSdmaEngine() if not assigned → cache result

- Moved HSA API queries (memory_copy_engine_status, memory_get_preferred_copy_engine)
  into AllocateEngine() for cleaner separation of concerns

- Engine release on HostQueue::finish() instead of only VirtualGPU destruction
  * Improves engine utilization by releasing earlier
  * Added virtual ReleaseSdmaEngines() method to device::VirtualDevice

- Added future path for simple round-robin allocation (kUseSimpleRR) for
  next-gen GPUs with uniform SDMA bandwidth (disabled by default)

Cleanup:
- Removed selectSdmaEngine() helper (logic moved to allocator)
- Removed getSdmaRWMasks() (allocator accesses maxSdmaReadMask_/WriteMask_ directly)
- Removed unused sdmaEngineReadMask_/WriteMask_ member variables from DmaBlitManager

Benefits:
- Ensures consistent per-stream SDMA engine usage
- Prevents cross-stream contention and engine thrashing
- Prioritizes hardware-optimal paths for inter-GPU transfers
- Better resource utilization through earlier release
- Cleaner, more maintainable code structure
2026-01-07 19:37:45 -08:00
Shadi Dashmiz 2789ea429a SWDEV-565300: Fix coherency range mode in mem pool pointers (#2296)
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
2026-01-05 11:33:11 -05:00
Ioannis Assiouras aecc845456 SWDEV-573589 - Fixed performance regression due to the increase of the signal pool (#2470) 2026-01-02 12:50:56 +00:00
German Andryeyev 741b4b9fdf SWDEV-558849 - Fix Windows build for ROCR backend (#2368) 2025-12-29 08:35:22 -05:00
marandje 3e49440495 SWDEV-555178 - Calculate phys mem offset for remap range (#1879) 2025-12-23 10:27:42 +01:00
Sourabh U Betigeri d552491985 SWDEV-572329 - Remove barrier packet (#2304) 2025-12-19 13:37:48 -08:00
Maneesh Gupta 4a9833e70e Revert "Add HasExpertSchedMode device prop (#2241)" (#2371)
This reverts commit c0b4aef5ad.
2025-12-17 21:26:44 -08:00
Filip Jankovic c0b4aef5ad Add HasExpertSchedMode device prop (#2241)
* Add HasExpertSchedMode device prop

* Add unit tests for HasExpertSchedMode

* Add gfx12 check for HasExpertSchedMode prop

* Update gfx major version check and test for ExpertSchedMode

* Minor fix and ROCr version bump

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Apply suggestion from @dayatsin-amd

* Apply suggestion from @dayatsin-amd

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>
2025-12-17 17:06:08 +01:00
systems-assistant[bot] b002c6a739 SWDEV-538607 - Add SIMDe as a build dependency, remove naked intrinsic use. (#500)
Co-authored-by: Alex Voicu <alexandru.voicu@amd.com>
Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com>
2025-12-15 17:40:51 +00:00
Fábio Mestre 447beeb00b Replace usages of __ockl_gws_init with __builtin_amdgcn_ds_gws_init (#2235) 2025-12-15 16:56:14 +01:00
SaleelK 10635483ad clr: Fix packet batch write logic (#2236)
* When writing bulk packets always invalidate packet headers, Its
  possible that the CP fetcher can have multiple packets in flight. In
such cases we may end up with a malformed packet because the writes are
not complete yet CP finds a valid header.
2025-12-11 04:26:41 -08:00
German Andryeyev 3895aadba6 SWDEV-558849 - Make ROCR path in Windows more stable (#2181) 2025-12-10 12:37:10 -05:00
SaleelK acc236fd89 clr: Avoid saving all ProfilingSignals at once (#2108)
* While reusing signals, its possible we can come across a timestamp
  that can contain several signals, like when profiling a graph. Reading
timestamps from all signals can make the call severely CPU bound.
Instead cache only that signal so as to avoid the overhead for critical
path.
2025-12-08 11:32:16 -08:00
Jin Jung deaf8ab38a SWDEV-567119 - Windows GL Interop Support (#1892) 2025-12-08 11:03:59 -05:00
Lancelot Six 659737c824 clr: Bump _amdgpu_r_debug.r_version to 11 (#2063) 2025-12-05 16:01:08 -05:00
Rahul Manocha 9dd3c2fa70 SWDEV-563271 - return error when pal cmd submission fails (#1585) 2025-12-05 14:25:01 -05:00
harkgill-amd 8f622de972 Add gfx1152 support to PAL (#2077) 2025-12-03 10:39:22 -08:00
Ioannis Assiouras 65b769ee16 SWDEV-569101 - increase signal list size to at least DEBUG_HIP_GRAPH_BATCH_SIZE (#2084) 2025-12-01 18:52:51 -08:00
SaleelK c105dcd05b clr: Use graph segment scheduling to process HIP Graphs (#1372)
* clr: Use graph segment scheduling to process HIP Graphs

* Add a broader path to use capture packet capture for all topologies
* Refactor code
* Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING to toggle new vs classic path,
  Enabled by default

* clr: Few fixes and improvements

* clr: Detect complex graphs to take classic path

* Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING=2 to force segment scheduling
  path

* clr: Fix a cornercase stack corruption

* clr: Track commands of segments instead of snapshots

* clr: Fix Batch dispatch logic

* Track fence_dirty_ flag for command of other streams
* Dependency resolution markers can now accomodate dirty fence on cross
  streams

---------

Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com>
Co-authored-by: Godavarthy Surya, Anusha <agodavar@amd.com>
2025-12-01 12:49:26 -08:00
AidanBeltonS d849b88aef SWDEV-558080 - Add recommended granularity (#1176)
* Add recommended granularity

* Improve granularity testing

* Update based on feedback
2025-11-26 16:10:58 +00:00
Karthik Jayaprakash 740a06d567 SWDEV-559267 - Use CLPrint to DevLogPrintf with Log Level - detail debug. (#1160) 2025-11-25 19:25:32 -05:00
cadolphe-amd cce94f6ee0 SWDEV-557412 - Incorporate proper chunk offset when remapping virtual memory (#1848)
* SWDEV-557412 - Incorporate proper offset when remapping virtual memory

* Fix condition to check if VMHeap allocation address matches a chunk address

* Move offset calculation outside if/else block

---------

Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-25 18:05:25 -05:00
Victor Zhang ede71ca3b0 SWDEV-567829 - populateFormatStringHashMap: relax printf hash collisi… (#1944)
* SWDEV-567829 - populateFormatStringHashMap: relax printf hash collision check for duplicate format strings

* function optimized by ai
2025-11-25 17:19:27 -05:00
sluzynsk-amd 2cf9faa93f SWDEV-563777 - fix warnings related to inconsistent overrides (#1625)
This patch adds missing override keywords. Fixes this class of warnings.

Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>
2025-11-24 18:50:07 +01:00
Ioannis Assiouras 36029ea1a8 SWDEV-559166 - Fix race condition in getDemangledName (#1868) 2025-11-23 08:45:45 +00:00
German Andryeyev ff4782620e SWDEV-547108 - Fix PAL build with HSA backend (#1850)
When hip is built with HSA backend then the headers from ROCR will be used, but
scratch_backing_memory_byte_size is a part of amd_queue_v2_t structure
2025-11-14 12:28:03 -05:00
pcritchl-amd 60cd210dac Reapply "SWDEV-562996 - Build fix: Ubertrace callback calling convention mismatch on x86 (#1587)" (#1717) (#1754) 2025-11-12 13:47:24 -05:00
Ioannis Assiouras 4f91b68988 SWDEV-559166 - Remove obsolete member execInfoOffset from KernelParameters (#1790) 2025-11-12 17:20:36 +00:00
SaleelK 5e418ca256 clr: Allow all engines but prefer recommended engines (#1750)
* Also honor ROC_P2P_SDMA_SIZE for IPC, since IPC can also mean P2P
2025-11-10 13:10:46 -08:00
Victor Zhang 7580052878 SWDEV-564318 - Add support for allocating uncached device memory (#1670) 2025-11-09 12:51:41 -05:00
SaleelK 738bb19835 clr: Increase kernelArg/managedBuffer size (#1586)
* Increase the buffer to 4MB. That can help kernel launches limited by a deep kernel pipeline

Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-08 18:32:43 -08:00
Pengda Xie 93947241d0 SWDEV-556684 - HSAIL cleanup (#1657) 2025-11-08 02:22:03 -08:00
Pengda Xie 5dd15e22ca SWDEV-559514 - Add queue validation to submitMarker sync path (#1308) 2025-11-08 02:21:36 -08:00
lancesix f7ffcd1402 clr: SWDEV-547890 - Bump PAL API version to 954 (#1680)
* clr: Adjust call to ICmdBuffer::CmdCopyMemoryToImage for PAL >= 955

PAL starting versino 955 adds a new argument to
ICmdBuffer::CmdCopyMemoryToImage.  Adjust teh callsite to account
fort his.

* clr: Handle new GpuUtil::TraceSessionState cases for PAL >= 939

Starting PAL API version 939, GpuUtil::TraceSessionState changes its
possible values.  Adjust for it.

* clr: require PAL version 954

Bump the PAL required vesion to 954, as this is required for proper
debugger support.
2025-11-08 00:52:04 +00:00
Jin Jung 291ff6c468 SWDEV-558855 - Enable Interop Map Buffer on Windows (#1748)
* Support Windows HANDLE in interop_map_buffer

* Refactored Windows HANDLE in interop_map_buffer

* ROCr System Dependent Handle Type

* Fix for ROCr Handle Conversion Bug

* Remove Windows Header
2025-11-07 12:47:01 -08:00