rocm-systems

Author	SHA1	Message	Date
Rahul Manocha	c4f7593001	clr: Update signal count and pool size for staging buffer (#2889 ) * clr: Update signal count and pool size for staging buffer * Change to naming of variables etc --------- Co-authored-by: Rahul Manocha <rmanocha@amd.com>	2026-01-29 10:34:00 -08:00
SaleelK	5c7c549301	clr: Fix some nullptr checks and prints (#2825 )	2026-01-27 16:45:17 -08:00
sluzynsk-amd	f37b100c34	SWDEV-563777 - further reduce compilation warnings (#2331 ) This change resolves some of the warnings generated during clr builds. Quiet regular output of doxygen. Disable non-documented warnings of doxygen. Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>	2026-01-27 20:51:16 +01:00
marandje	5cda2a496e	SWDEV-568260 - Validate sub-buffer coverage in hipMemSetAccess (#2451 )	2026-01-26 23:09:46 +01:00
Shadi Dashmiz	71856ec239	SWDEV-465366 : Deadlock during stream wait operation (#2652 ) Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>	2026-01-26 16:54:07 -05:00
Xie, AlexBin	e22c9b457e	SWDEV-576718 - provide option to limit memory cache usage (#2810 ) * SWDEV-576718 - provide option to limit memory cache usage * SWDEV-576718 - Use MiB instead of MB in description	2026-01-26 11:35:01 -05:00
SaleelK	340f3aa887	clr: Implement dynamic stream to HWq logic (#1958 ) * clr: Implement dynamic stream to HW queue assignment This change implements dynamic stream to hardware queue (HWq) mapping with the following features: * Queue depth heuristics with weights for optimal HWq assignment * Make last used queue sticky for better locality * Use pipe HWq to pipe mapping - gfx9 follows a round-robin queue to pipe mapping based on creation order (single process per device only, as pipe ID is statically assigned by runtime) * More aggressive heuristic usage for better queue distribution * Extend dynamic queues support for all stream priorities Environment variables: * DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - Depth heuristics, 2 - Depth+Pipe heuristics * DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore priority stream creation * clr: Clean up last_used_queue_	2026-01-23 10:40:54 -08:00
Jin Jung	48347bc857	SWDEV-572679 - Fix hipGraphicsGLRegisterImage (#2475 )	2026-01-23 06:25:10 -08:00
Tao Sang	163e44d0a8	SWDEV-555889 - Support mipmap on rocr (#2082 ) * SWDEV-555889 - Support mipmap on rocr Support mipmap in hip-rt on rocr backend. Enable all mipmap tests in Windows. Some other minor improvements. Add some SRD logs that will be removed finally. * Add sampler.mipFilter to fix sampler issues on mipmap in rocr. Fix format issues of view of leveled image and mipmap image in blit kernel in rocr. Enabled disabled mipmap tests. * Rewrite view logic * Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues. Reset last error in negative tests. * Remove SRD dump log from hip-rt Let Rocr mipmap log be in condition. * minor format change * Exclude mipmap tests for mi200+ which don't support mipmap.	2026-01-21 09:10:29 -08:00
German Andryeyev	db792fac37	SWDEV-558849 - Add support for static linking with ROCR (#2659 )	2026-01-20 14:53:01 -05:00
Filip Jankovic	29cd25df66	Add hipDeviceAttributeExpertSchedMode (#2435 ) * Add hipDeviceAttributeExpertSchedMode --------- Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com> * Update hipDeviceAttributeExpertSchedMode unit test * Move check to ROCr from thunk interface * Revert unrelated whitespace changes * Revert version bump --------- Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>	2026-01-15 08:41:39 -08:00
David Yat Sin	a3b445118d	SWDEV-519413 - Ignore ROCr shutdown events (#1616 ) ROCr now reports a shutdown event, but this is not a fatal error. Ignore this event.	2026-01-14 11:28:03 -08:00
Luca Bruni	d7ff927690	[clr] Fix device printf pointer advancement issue with string format specifiers (#1313 )	2026-01-14 13:05:25 -05:00
Jin Jung	d4758bc29e	SWDEV-570501 - Add Windows support for hipGraphicsGLRegisterBuffer (#2323 )	2026-01-12 13:10:46 -06:00
SaleelK	e6e0378acd	clr: Always query new engine for intergpu copies (#2559 )	2026-01-12 11:01:02 -08:00
SaleelK	6b28faa532	clr: Implement per-stream SDMA engine affinity for improved copy performance (#2480 ) Problem: The existing SDMA engine selection logic had several issues: 1. Same VirtualGPU/stream could use different SDMA engines for consecutive async copies since copy_engine_status may report engines as busy 2. Busy and Preferred engine check for every copy 3. No global tracking of which VirtualGPU uses which engine, leading to suboptimal resource allocation Solution: Implemented a global SDMA engine allocator with per-stream affinity: - Added Device::SdmaEngineAllocator to manage VirtualGPU → engine assignments * Maintains global map of active assignments * Enforces exclusivity: different streams use different engines (except inter-GPU copies where preferred engines are prioritized for optimal hardware paths like XGMI links) * Thread-safe allocation/release with Monitor lock - Modified VirtualGPU to cache assigned engine locally (assigned_sdma_engine_) for fast lookup without map access on hot path - Refactored rocrCopyBuffer() to: 1. Check local cached engine first → use if assigned 2. Call AllocateSdmaEngine() if not assigned → cache result - Moved HSA API queries (memory_copy_engine_status, memory_get_preferred_copy_engine) into AllocateEngine() for cleaner separation of concerns - Engine release on HostQueue::finish() instead of only VirtualGPU destruction * Improves engine utilization by releasing earlier * Added virtual ReleaseSdmaEngines() method to device::VirtualDevice - Added future path for simple round-robin allocation (kUseSimpleRR) for next-gen GPUs with uniform SDMA bandwidth (disabled by default) Cleanup: - Removed selectSdmaEngine() helper (logic moved to allocator) - Removed getSdmaRWMasks() (allocator accesses maxSdmaReadMask_/WriteMask_ directly) - Removed unused sdmaEngineReadMask_/WriteMask_ member variables from DmaBlitManager Benefits: - Ensures consistent per-stream SDMA engine usage - Prevents cross-stream contention and engine thrashing - Prioritizes hardware-optimal paths for inter-GPU transfers - Better resource utilization through earlier release - Cleaner, more maintainable code structure	2026-01-07 19:37:45 -08:00
Shadi Dashmiz	2789ea429a	SWDEV-565300: Fix coherency range mode in mem pool pointers (#2296 ) Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>	2026-01-05 11:33:11 -05:00
Ioannis Assiouras	aecc845456	SWDEV-573589 - Fixed performance regression due to the increase of the signal pool (#2470 )	2026-01-02 12:50:56 +00:00
German Andryeyev	741b4b9fdf	SWDEV-558849 - Fix Windows build for ROCR backend (#2368 )	2025-12-29 08:35:22 -05:00
marandje	3e49440495	SWDEV-555178 - Calculate phys mem offset for remap range (#1879 )	2025-12-23 10:27:42 +01:00
Sourabh U Betigeri	d552491985	SWDEV-572329 - Remove barrier packet (#2304 )	2025-12-19 13:37:48 -08:00
Maneesh Gupta	4a9833e70e	Revert "Add HasExpertSchedMode device prop (#2241 )" (#2371 ) This reverts commit `c0b4aef5ad`.	2025-12-17 21:26:44 -08:00
Filip Jankovic	c0b4aef5ad	Add HasExpertSchedMode device prop (#2241 ) * Add HasExpertSchedMode device prop * Add unit tests for HasExpertSchedMode * Add gfx12 check for HasExpertSchedMode prop * Update gfx major version check and test for ExpertSchedMode * Minor fix and ROCr version bump * Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h * Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h * Apply suggestion from @dayatsin-amd * Apply suggestion from @dayatsin-amd --------- Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com> Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>	2025-12-17 17:06:08 +01:00
systems-assistant[bot]	b002c6a739	SWDEV-538607 - Add SIMDe as a build dependency, remove naked intrinsic use. (#500 ) Co-authored-by: Alex Voicu <alexandru.voicu@amd.com> Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com>	2025-12-15 17:40:51 +00:00
Fábio Mestre	447beeb00b	Replace usages of __ockl_gws_init with __builtin_amdgcn_ds_gws_init (#2235 )	2025-12-15 16:56:14 +01:00
SaleelK	10635483ad	clr: Fix packet batch write logic (#2236 ) * When writing bulk packets, always invalidate packet headers. It's possible that the CP fetcher can have multiple packets in flight. In such cases we may end up with a malformed packet because the writes are not complete, yet the CP finds a valid header.	2025-12-11 04:26:41 -08:00
German Andryeyev	3895aadba6	SWDEV-558849 - Make ROCR path in Windows more stable (#2181 )	2025-12-10 12:37:10 -05:00
SaleelK	acc236fd89	clr: Avoid saving all ProfilingSignals at once (#2108 ) * While reusing signals, it's possible we can come across a timestamp that can contain several signals, like when profiling a graph. Reading timestamps from all signals can make the call severely CPU bound. Instead, cache only that signal so as to avoid the overhead on the critical path.	2025-12-08 11:32:16 -08:00
Jin Jung	deaf8ab38a	SWDEV-567119 - Windows GL Interop Support (#1892 )	2025-12-08 11:03:59 -05:00
Lancelot Six	659737c824	clr: Bump _amdgpu_r_debug.r_version to 11 (#2063 )	2025-12-05 16:01:08 -05:00
Rahul Manocha	9dd3c2fa70	SWDEV-563271 - return error when pal cmd submission fails (#1585 )	2025-12-05 14:25:01 -05:00
harkgill-amd	8f622de972	Add gfx1152 support to PAL (#2077 )	2025-12-03 10:39:22 -08:00
Ioannis Assiouras	65b769ee16	SWDEV-569101 - increase signal list size to at least DEBUG_HIP_GRAPH_BATCH_SIZE (#2084 )	2025-12-01 18:52:51 -08:00
SaleelK	c105dcd05b	clr: Use graph segment scheduling to process HIP Graphs (#1372 ) * clr: Use graph segment scheduling to process HIP Graphs * Add a broader path to use capture packet capture for all topologies * Refactor code * Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING to toggle new vs classic path, Enabled by default * clr: Few fixes and improvements * clr: Detect complex graphs to take classic path * Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING=2 to force segment scheduling path * clr: Fix a corner-case stack corruption * clr: Track commands of segments instead of snapshots * clr: Fix Batch dispatch logic * Track fence_dirty_ flag for command of other streams * Dependency resolution markers can now accommodate dirty fence on cross streams --------- Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com> Co-authored-by: Godavarthy Surya, Anusha <agodavar@amd.com>	2025-12-01 12:49:26 -08:00
AidanBeltonS	d849b88aef	SWDEV-558080 - Add recommended granularity (#1176 ) * Add recommended granularity * Improve granularity testing * Update based on feedback	2025-11-26 16:10:58 +00:00
Karthik Jayaprakash	740a06d567	SWDEV-559267 - Use CLPrint to DevLogPrintf with Log Level - detail debug. (#1160 )	2025-11-25 19:25:32 -05:00
cadolphe-amd	cce94f6ee0	SWDEV-557412 - Incorporate proper chunk offset when remapping virtual memory (#1848 ) * SWDEV-557412 - Incorporate proper offset when remapping virtual memory * Fix condition to check if VMHeap allocation address matches a chunk address * Move offset calculation outside if/else block --------- Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>	2025-11-25 18:05:25 -05:00
Victor Zhang	ede71ca3b0	SWDEV-567829 - populateFormatStringHashMap: relax printf hash collisi… (#1944 ) * SWDEV-567829 - populateFormatStringHashMap: relax printf hash collision check for duplicate format strings * function optimized by ai	2025-11-25 17:19:27 -05:00
sluzynsk-amd	2cf9faa93f	SWDEV-563777 - fix warnings related to inconsistent overrides (#1625 ) This patch adds missing override keywords. Fixes this class of warnings. Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>	2025-11-24 18:50:07 +01:00
Ioannis Assiouras	36029ea1a8	SWDEV-559166 - Fix race condition in getDemangledName (#1868 )	2025-11-23 08:45:45 +00:00
German Andryeyev	ff4782620e	SWDEV-547108 - Fix PAL build with HSA backend (#1850 ) When hip is built with HSA backend then the headers from ROCR will be used, but scratch_backing_memory_byte_size is a part of amd_queue_v2_t structure	2025-11-14 12:28:03 -05:00
pcritchl-amd	60cd210dac	Reapply "SWDEV-562996 - Build fix: Ubertrace callback calling convention mismatch on x86 (#1587 )" (#1717 ) (#1754 )	2025-11-12 13:47:24 -05:00
Ioannis Assiouras	4f91b68988	SWDEV-559166 - Remove obsolete member execInfoOffset from KernelParameters (#1790 )	2025-11-12 17:20:36 +00:00
SaleelK	5e418ca256	clr: Allow all engines but prefer recommended engines (#1750 ) * Also honor ROC_P2P_SDMA_SIZE for IPC, since IPC can also mean P2P	2025-11-10 13:10:46 -08:00
Victor Zhang	7580052878	SWDEV-564318 - Add support for allocating uncached device memory (#1670 )	2025-11-09 12:51:41 -05:00
SaleelK	738bb19835	clr: Increase kernelArg/managedBuffer size (#1586 ) * Increase the buffer to 4MB. That can help kernel launches limited by a deep kernel pipeline Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>	2025-11-08 18:32:43 -08:00
Pengda Xie	93947241d0	SWDEV-556684 - HSAIL cleanup (#1657 )	2025-11-08 02:22:03 -08:00
Pengda Xie	5dd15e22ca	SWDEV-559514 - Add queue validation to submitMarker sync path (#1308 )	2025-11-08 02:21:36 -08:00
lancesix	f7ffcd1402	clr: SWDEV-547890 - Bump PAL API version to 954 (#1680 ) * clr: Adjust call to ICmdBuffer::CmdCopyMemoryToImage for PAL >= 955 PAL starting version 955 adds a new argument to ICmdBuffer::CmdCopyMemoryToImage. Adjust the call site to account for this. * clr: Handle new GpuUtil::TraceSessionState cases for PAL >= 939 Starting PAL API version 939, GpuUtil::TraceSessionState changes its possible values. Adjust for it. * clr: require PAL version 954 Bump the PAL required version to 954, as this is required for proper debugger support.	2025-11-08 00:52:04 +00:00
Jin Jung	291ff6c468	SWDEV-558855 - Enable Interop Map Buffer on Windows (#1748 ) * Support Windows HANDLE in interop_map_buffer * Refactored Windows HANDLE in interop_map_buffer * ROCr System Dependent Handle Type * Fix for ROCr Handle Conversion Bug * Remove Windows Header	2025-11-07 12:47:01 -08:00


1338 commits