rocm-systems

Author	SHA1	Message	Date
SaleelK	5c7c549301	clr: Fix some nullptr checks and prints (#2825 )	2026-01-27 16:45:17 -08:00
Shadi Dashmiz	b816d10802	Fix for pntr attri query from a peer device (#2722 ) * Fix for pntr attri query from a peer device Signed-off-by: sdashmiz <shadi.dashmiz@amd.com> * SWDEV-577116 : Fix qeury on peer device - if access is disabled query should return error. Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Signed-off-by: sdashmiz <shadi.dashmiz@amd.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-27 15:25:14 -05:00
sluzynsk-amd	f37b100c34	SWDEV-563777 - further reduce compilation warnings (#2331 ) This change resolves some of the warnings generated during clr builds. Quiet regular output of doxygen. Disable non-documented warnings of doxygen. Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>	2026-01-27 20:51:16 +01:00
Ioannis Assiouras	a66c6ca156	Removed extra marker when syncing graph streams back to the launch stream (#2823 )	2026-01-27 19:26:48 +00:00
vstojilj	9a8942a89c	SWDEV-558836, SWDEV-558837 - Add hipMemSetMemPool and hipMemGetMemPoo… (#1349 ) * SWDEV-558836, SWDEV-558837 - Add hipMemSetMemPool and hipMemGetMemPool implementation * Add managed allocation type for mem pools * Update rocprofiler-sdk with APis declaration	2026-01-27 18:45:28 +01:00
marandje	5cda2a496e	SWDEV-568260 - Validate sub-buffer coverage in hipMemSetAccess (#2451 )	2026-01-26 23:09:46 +01:00
Shadi Dashmiz	71856ec239	SWDEV-465366 : Deadlock during stream wait opeartion (#2652 ) Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>	2026-01-26 16:54:07 -05:00
Stella Laurenzo	e0dcba903e	Add kpack runtime integration for split device code artifacts (#2622 ) Integrates rocm-kpack runtime library for loading device code from external kpack archives at HIP initialization time. Changes: - Add kpack_params_ optional to FatBinaryInfo for HIPK metadata - Parse HIPK magic (0x4B504948) in digestFatBinary to detect kpack'd binaries - Add ExtractKpackBinary() to load code objects via kpack_load_code_object() - Wire up kpack cache lifecycle in hip_global.cpp - Track kpack allocations for proper cleanup - Support multi-TU binaries via bundle_index (co_index parameter) The ROCM_KPACK_ENABLED cmake flag controls whether kpack support is compiled in. When disabled, HIPK binaries return hipErrorNotSupported. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <noreply@anthropic.com>	2026-01-26 09:50:42 -08:00
Xie, AlexBin	e22c9b457e	SWDEV-576718 - provide option to limit memory cache usage (#2810 ) * SWDEV-576718 - provide option to limit memory cache usage * SWDEV-576718 - Use MiB instead of MB in description	2026-01-26 11:35:01 -05:00
German Andryeyev	2b1b41f4da	AIRUNTIME-32 - Add try/catch around all HIP API calls (#2822 )	2026-01-23 14:53:20 -05:00
SaleelK	340f3aa887	clr: Implement dynamic stream to HWq logic (#1958 ) * clr: Implement dynamic stream to HW queue assignment This change implements dynamic stream to hardware queue (HWq) mapping with the following features: * Queue depth heuristics with weights for optimal HWq assignment * Make last used queue sticky for better locality * Use pipe HWq to pipe mapping - gfx9 follows a round-robin queue to pipe mapping based on creation order (single process per device only, as pipe ID is statically assigned by runtime) * More aggressive heuristic usage for better queue distribution * Extend dynamic queues support for all stream priorities Environment variables: * DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - Depth heuristics 2 - Depth+Pipe heuristics * DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore priority stream creation * clr: Clean up last_used_queue_	2026-01-23 10:40:54 -08:00
Jin Jung	48347bc857	SWDEV-572679 - Fix hipGraphicsGLRegisterImage (#2475 )	2026-01-23 06:25:10 -08:00
Ioannis Assiouras	f05a33968f	SWDEV-570500 - Fixed graph node to stream scheduling in multistream path (#2596 )	2026-01-21 20:48:46 +00:00
Tao Sang	163e44d0a8	SWDEV-555889 - Support mipmap on rocr (#2082 ) * SWDEV-555889 - Support mipmap on rocr Support mipmap in hip-rt on rocr backend. Enable all mipmap tests in Windows. Some other minor improvement. Add some SRD logs that will be removed finally. * Add sampler.mipFilter to fix sampler issues on mipmap in rocr. Fix format issues of view of leveled image and mipmap image in blit kernel in rocr. Enabled disabled mipmap tests. * Rewrite view logic * Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues. Reset last error in nagative tests. * Remove SRD dump log from hip-rt Let Rocr mipmap log be in condition. * minor format chang * Exclude mipmap tests for mi200+ which don't support mipmap.	2026-01-21 09:10:29 -08:00
Sam Ruscica	5daeb14582	SWDEV-547291 - Interop for OpenGL (#2350 ) Updated to convert flags correctly Added ObjectRegistry to track registered and mapped resources and incorporated it into hip_gl. Added mip level check Made functions static in-line Reworked validation to be more clear.	2026-01-21 09:08:55 -08:00
Karthik Jayaprakash	6a84a00208	Use size_t datatype for global dimensions. (#2604 )	2026-01-20 20:39:07 -05:00
German Andryeyev	db792fac37	SWDEV-558849 - Add support for static linking with ROCR (#2659 )	2026-01-20 14:53:01 -05:00
Ioannis Assiouras	59aa56a340	hip-issue-3876 : Take into account thread-local capture mode in checks for valid capture (#2177 )	2026-01-20 18:42:27 +00:00
Karthik Jayaprakash	99c3a06f4e	SWDEV-549518 - Enable logging dynamically through HIP APIS. (#1079 ) * SWDEV-549518 - Enable logging dynamically through HIP APIS. * SWDEV-549518 - Adding ROCProfiler related new API changes. * rocprofiler-sdk changes for hip api additions. --------- Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com> Co-authored-by: jainprad <92369414+jainprad@users.noreply.github.com>	2026-01-19 16:16:14 -05:00
systems-assistant[bot]	88f07baa92	SWDEV-493792 - add split barriers for grid_group (#508 ) * SWDEV-493792 - add split barriers for grid_group * add tests * Update change log * Add Navi4 split barrier * Update docs * Use new Catch2 Approx macro * Update split_barrier.cc to check for coop groups --------- Co-authored-by: Jatin Chaudhary <jatchaud@amd.com> Co-authored-by: Jatin Chaudhary <51944368+cjatin@users.noreply.github.com>	2026-01-19 09:17:00 -08:00
Fábio Mestre	e6236417f7	SWDEV-571222 - Fix bf16 headers on gcc (#2260 ) GCC does not support anonymous structs with members that have non-trivial constructors. This commit changes the header to remove the union when compiling with gcc. This should be a non-breaking change for other compilers.	2026-01-16 15:02:48 +00:00
Filip Jankovic	29cd25df66	Add hipDeviceAttributeExpertSchedMode (#2435 ) * Add hipDeviceAttributeExpertSchedMode --------- Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com> * Update hipDeviceAttributeExpertSchedMode unit test * Move check to ROCr from thunk interface * Revert unrelated whitespace changes * Revert version bump --------- Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>	2026-01-15 08:41:39 -08:00
David Yat Sin	a3b445118d	SWDEV-519413 - Ignore ROCr shutdown events (#1616 ) ROCr now reports a shutdown event, but this is not a fatal error. Ignore this event.	2026-01-14 11:28:03 -08:00
Luca Bruni	d7ff927690	[clr] Fix device printf pointer advancement issue with string format specifiers (#1313 )	2026-01-14 13:05:25 -05:00
AidanBeltonS	607d66e87c	Add messages to static asserts to prevent warnings (#1011 )	2026-01-13 14:02:36 +00:00
Fábio Mestre	09a01ee11c	Replace usages of __ockl_clz with builtins (#2234 )	2026-01-13 11:15:46 +01:00
Fábio Mestre	61325db1c8	Fix AMD_LOG_LEVEL_SIZE env variable (#2463 ) AMD_LOG_LEVEL_SIZE is being used in a global variable. This always uses the default value of 2048 because the HIP runtime doesn't have the opportunity to load environment variables at the point where global variables are initialized. The solution is to use AMD_LOG_LEVEL_SIZE inside truncate_log_file() function.	2026-01-13 09:57:49 +00:00
Jin Jung	d4758bc29e	SWDEV-570501 - Add Windows support for hipGraphicsGLRegisterBuffer (#2323 )	2026-01-12 13:10:46 -06:00
SaleelK	e6e0378acd	clr: Always query new engine for intergpu copies (#2559 )	2026-01-12 11:01:02 -08:00
Andrei Kochin	5e15839611	Revert "SWDEV-566854 - Improve memory object handling (#1939 )" (#2572 ) This reverts commit `39d8432893`. rocprim failures were introduced with the commit. Based on the @erman-gurses investigation: Based on the list here: 2789ea4...050e88e https://github.com/ROCm/TheRock/actions/runs/20864279671 -> `e005f84` (FAILED) https://github.com/ROCm/TheRock/actions/runs/20867580342 -> `39d8432` (FAILED) https://github.com/ROCm/TheRock/actions/runs/20870979894 -> `88f4bb1` (PASSED) https://github.com/ROCm/TheRock/actions/runs/20872795557 -> `11d9472` (PASSED) So the issue comes from this commit SWDEV-566854 - Improve memory object handling (#1939) SHA: `39d8432`	2026-01-12 12:09:16 -05:00
AidanBeltonS	3309d7176b	SWDEV-557148 - Set primary context when device set (#1161 ) * SWDEV-557148 - Set activate context when device set * clang-format * Check for active status * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-09 17:09:40 +00:00
Alexandra Sidorova	38a359f5f3	[CLR] prevent compilation errors for non-HIP compilers in amd_hip_mx_common.h and amd_hip_ocp_types.h (#2448 ) Co-authored-by: Andrei Kochin <andrei.kochin@amd.com>	2026-01-08 17:49:13 +04:00
SaleelK	6b28faa532	clr: Implement per-stream SDMA engine affinity for improved copy performance (#2480 ) Problem: The existing SDMA engine selection logic had several issues: 1. Same VirtualGPU/stream could use different SDMA engines for consecutive async copies since copy_engine_status may report engines as busy 2. Busy and Preferred engine check for every copy 3. No global tracking of which VirtualGPU uses which engine, leading to suboptimal resource allocation Solution: Implemented a global SDMA engine allocator with per-stream affinity: - Added Device::SdmaEngineAllocator to manage VirtualGPU → engine assignments * Maintains global map of active assignments * Enforces exclusivity: different streams use different engines (except inter-GPU copies where preferred engines are prioritized for optimal hardware paths like XGMI links) * Thread-safe allocation/release with Monitor lock - Modified VirtualGPU to cache assigned engine locally (assigned_sdma_engine_) for fast lookup without map access on hot path - Refactored rocrCopyBuffer() to: 1. Check local cached engine first → use if assigned 2. Call AllocateSdmaEngine() if not assigned → cache result - Moved HSA API queries (memory_copy_engine_status, memory_get_preferred_copy_engine) into AllocateEngine() for cleaner separation of concerns - Engine release on HostQueue::finish() instead of only VirtualGPU destruction * Improves engine utilization by releasing earlier * Added virtual ReleaseSdmaEngines() method to device::VirtualDevice - Added future path for simple round-robin allocation (kUseSimpleRR) for next-gen GPUs with uniform SDMA bandwidth (disabled by default) Cleanup: - Removed selectSdmaEngine() helper (logic moved to allocator) - Removed getSdmaRWMasks() (allocator accesses maxSdmaReadMask_/WriteMask_ directly) - Removed unused sdmaEngineReadMask_/WriteMask_ member variables from DmaBlitManager Benefits: - Ensures consistent per-stream SDMA engine usage - Prevents cross-stream contention and engine thrashing - Prioritizes hardware-optimal paths for inter-GPU transfers - Better resource utilization through earlier release - Cleaner, more maintainable code structure	2026-01-07 19:37:45 -08:00
Godavarthy Surya, Anusha	1ef6a86ee3	SWDEV-549711 - Improve graph DEBUG dot print for segments (#2205 ) Co-authored-by: Anusha GodavarthySurya<agodavar@amd.com>	2026-01-07 14:07:49 +05:30
Gerardo Hernandez	50644f5aef	SWDEV-508225 remove assertions when loading fat binary (#2013 ) * SWDEV-508225 - do not assert() after calling digestFatBinary() if it fails. Otherwise this causes assertions to trigger easily in systems that have an APU and a discrete GPU and the code was compiled for the discrete one * SWDEV-508225 - fix that when using a non-existent ordinal in HIP_VISIBLE_DEVICES, getCurrentArch() would crash	2026-01-06 21:53:32 +00:00
AidanBeltonS	39d8432893	SWDEV-566854 - Improve memory object handling (#1939 ) * Improve memory object handling for memcpy * update * Pass offsets and make hip_graph changes * Update projects/clr/hipamd/src/hip_memory.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Remove unnecessary command overload * Update based on feedback * Fix failing hipGraphTests * Fix graph bugs * Fix failing memcpy tests --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-05 18:05:56 +00:00
Shadi Dashmiz	2789ea429a	SWDEV-565300: Fix coherency range mode in mem pool pointers (#2296 ) Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>	2026-01-05 11:33:11 -05:00
Ioannis Assiouras	aecc845456	SWDEV-573589 - Fixed performance regression due to the increase of the signal pool (#2470 )	2026-01-02 12:50:56 +00:00
Jimbo	a59d46ffbf	SWDEV-567545 - Implement block_rank in co-op grid groups (#2182 ) * SWDEV-567545 - Implement block_rank in co-op grid groups	2025-12-29 11:39:23 -05:00
German Andryeyev	741b4b9fdf	SWDEV-558849 - Fix Windows build for ROCR backend (#2368 )	2025-12-29 08:35:22 -05:00
Ioannis Assiouras	49b8900158	SWDEV-558849 - keep the lastEnqueueCommand_ when PAL backend is enabled (#2320 )	2025-12-23 21:24:09 +00:00
marandje	3e49440495	SWDEV-555178 - Calculate phys mem offset for remap range (#1879 )	2025-12-23 10:27:42 +01:00
Sourabh U Betigeri	d552491985	SWDEV-572329 - Remove barrier packet (#2304 )	2025-12-19 13:37:48 -08:00
Sourabh U Betigeri	fdc1660dfa	SWDEV-565304 - Pass numa node to migrate pages correctly (#1729 ) * SWDEV-565304 - Pass cpuId of the the thread currently running * SWDEV-565304 - Numa id to be returned * SWDEV-565304 - Numa id to be returned	2025-12-19 13:36:53 -08:00
Matt Arsenault	0c0d8dc974	SWDEV-548892 - Stop using __ockl_lane_id (#2186 ) __lane_id already exists and is identical.	2025-12-19 20:34:55 +01:00
Sourabh U Betigeri	883fdfb820	Revert "clr: Minor fixes for error return" (#2399 ) - This reverts commit 8dd8436e43c7f0d062fd73252bf61c35615d181d. - Resolve MIOpen test failures observed in TheRock - TheRock Issue: ROCm/TheRock#2642 - room-systems issue: #2400	2025-12-18 18:40:13 -05:00
Jatin Chaudhary	fdf73116d5	Do not allocate code objects when we map a static code object (#2332 )	2025-12-18 09:22:02 +00:00
Maneesh Gupta	4a9833e70e	Revert "Add HasExpertSchedMode device prop (#2241 )" (#2371 ) This reverts commit `c0b4aef5ad`.	2025-12-17 21:26:44 -08:00
Shadi Dashmiz	96f6b6e251	SWDEV-571304 : Fix the constructor for __half (#2240 ) - comply with cuda - Fix usecase for constexpr Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>	2025-12-17 11:15:20 -05:00
Filip Jankovic	c0b4aef5ad	Add HasExpertSchedMode device prop (#2241 ) * Add HasExpertSchedMode device prop * Add unit tests for HasExpertSchedMode * Add gfx12 check for HasExpertSchedMode prop * Update gfx major version check and test for ExpertSchedMode * Minor fix and ROCr version bump * Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h * Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h * Apply suggestion from @dayatsin-amd * Apply suggestion from @dayatsin-amd --------- Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com> Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>	2025-12-17 17:06:08 +01:00

13244 Commits