rocm-systems

Author	SHA1	Message	Date
vedithal-amd	769d3dd67a	[rocprofiler-compute] Data imputation strategy for iteration multiplexing (#2468 ) * Data imputation strategy for iteration multiplexing * Implement data imputation methodology to handle missing counter values in case of iteration multiplexing * Enable dispatch filtering with iteration multiplexing since we are no longer merging dispatches * Bugfix to prevent check for missing counter values when using csv format when profiling with iteration multiplexing * Move warning and info message in case of iteration multiplexing to sanitize function which comes earlier in analyze mode * Address review comments * Fix typo in documentation * Move profiling config init. after path check in sanitize() * Graceful handling of dispatches with all counters empty within data imputation logic * Improve info message for iteration multiplexing based analysis * Ensure proper error message when trying to run iteration multiplexing with attach/detach * fix test case	2026-01-08 12:01:51 -05:00
systems-assistant[bot]	53c56fca5f	[SWDEV-558534] AMD-SMI bad pages add flag to convert to hex (#1900 ) * Simplify hex flag check for bad page info * moved the hex help text up with the other help text --------- Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com> Authored-by: Koushik Billakanti <Koushik.Billakanti@amd.com> Co-authored-by: Koushik Billakanti <Koushik.Billakanti@amd.com>	2026-01-08 10:21:10 -06:00
Bindhiya Kanangot Balakrishnan	8326c33d33	[SWDEV-573540] Add DRM-based wake for suspended AMD GPUs (#2510 ) Implements automatic device wake using getDRMDeviceId() DRM call when GPUs are detected in low-power state. This ensures rocm-smi can access device information on suspended GPUs. Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>	2026-01-08 10:19:45 -06:00
koushikbillakanti-amd	ac1fa8dccb	[SWDEV-567284] AMDSMI conceptual documentation for setting perf determinism (#2529 ) Authored-by: Koushik Billakanti <kbillaka@amd.com>	2026-01-08 08:04:23 -06:00
Alexandra Sidorova	38a359f5f3	[CLR] prevent compilation errors for non-HIP compilers in amd_hip_mx_common.h and amd_hip_ocp_types.h (#2448 ) Co-authored-by: Andrei Kochin <andrei.kochin@amd.com>	2026-01-08 17:49:13 +04:00
SaleelK	6b28faa532	clr: Implement per-stream SDMA engine affinity for improved copy performance (#2480 ) Problem: The existing SDMA engine selection logic had several issues: 1. Same VirtualGPU/stream could use different SDMA engines for consecutive async copies since copy_engine_status may report engines as busy 2. Busy and Preferred engine check for every copy 3. No global tracking of which VirtualGPU uses which engine, leading to suboptimal resource allocation Solution: Implemented a global SDMA engine allocator with per-stream affinity: - Added Device::SdmaEngineAllocator to manage VirtualGPU → engine assignments * Maintains global map of active assignments * Enforces exclusivity: different streams use different engines (except inter-GPU copies where preferred engines are prioritized for optimal hardware paths like XGMI links) * Thread-safe allocation/release with Monitor lock - Modified VirtualGPU to cache assigned engine locally (assigned_sdma_engine_) for fast lookup without map access on hot path - Refactored rocrCopyBuffer() to: 1. Check local cached engine first → use if assigned 2. Call AllocateSdmaEngine() if not assigned → cache result - Moved HSA API queries (memory_copy_engine_status, memory_get_preferred_copy_engine) into AllocateEngine() for cleaner separation of concerns - Engine release on HostQueue::finish() instead of only VirtualGPU destruction * Improves engine utilization by releasing earlier * Added virtual ReleaseSdmaEngines() method to device::VirtualDevice - Added future path for simple round-robin allocation (kUseSimpleRR) for next-gen GPUs with uniform SDMA bandwidth (disabled by default) Cleanup: - Removed selectSdmaEngine() helper (logic moved to allocator) - Removed getSdmaRWMasks() (allocator accesses maxSdmaReadMask_/WriteMask_ directly) - Removed unused sdmaEngineReadMask_/WriteMask_ member variables from DmaBlitManager Benefits: - Ensures consistent per-stream SDMA engine usage - Prevents cross-stream contention and engine thrashing - Prioritizes hardware-optimal paths for inter-GPU transfers - Better resource utilization through earlier release - Cleaner, more maintainable code structure	2026-01-07 19:37:45 -08:00
Flora Cui	be04fa8250	rocr: reorder HsaNodeProperties to improve compatibility (#2447 ) Signed-off-by: Flora Cui <flora.cui@amd.com>	2026-01-08 09:56:39 +08:00
David Galiffi	cb17e59a57	[rocprofiler-systems] Improve build time by refactoring RCCL test cmake (#1656 ) Improve cmake configuration time by making sure the rccl-tests are built during the build phase rather than the configuration phase.	2026-01-07 19:51:54 -05:00
anujshuk-amd	c35a7dd8cb	[rocprofiler-systems] Update timemory submodule (#2440 ) - Fixes SWDEV-559349 - Fix build failure caused by correct libunwind not being found in some environments. - Updated the `timemory` submodule to commit `24407d37ab85c46ba6c18fba9498320f825ee4e4 `.	2026-01-07 19:35:23 -05:00
Ajay GunaShekar	95ab459a4c	Use static catch2.lib instead of catch2.dll (#2419 ) * Use static catch2.lib instead of catch2.dll Using catch2.dll increases execution time by 12x * handle debug option for static catch2 * SWDEV-573539 - skip atomics on windows since it's taking a very long time to execute mlsejenkins needs newer cmake but compiler breaks with newer versions so skipping on windows can be a workaround for now --------- Co-authored-by: Joseph Macaranas <145489236+jayhawk-commits@users.noreply.github.com>	2026-01-07 14:35:25 -08:00
Alysa Liu	5be4fddf06	kfdtest: Support blit kernel copy (#677 ) Add support for blit kernel copy. Add GpuMemCopyTest test for KFDQMTest.	2026-01-07 16:48:11 -05:00
David Yat Sin	7178747ebc	Update CODEOWNERS for ROCR-Runtime (#2521 )	2026-01-07 14:22:11 -05:00
Aleksandar Djordjevic	aecea25a61	[rocprofiler-systems] CMake Cleanup (#2455 ) ## Technical Details - Removed `configure_file()` call that was generating `defines.hpp` from `defines.hpp.in` and update CMake file to reference renamed file. - Remove duplicate `find_library(pthread_LIBRARY NAMES pthread pthreads)`	2026-01-07 14:07:37 -05:00
anujshuk-amd	596ffce5fe	[rocprof-sys] Fix segfault from thread ID array overflow (#2172 ) Thread limit configuration and enforcement: * Added a check in `CMakeLists.txt` to ensure `ROCPROFSYS_MAX_THREADS` is at least 128, automatically setting it to 128 with a warning if a lower value is provided. * Replaced hardcoded thread limit (`allowed_max_threads`) in `pthread_create_gotcha.cpp` with the configurable `ROCPROFSYS_MAX_THREADS` value, ensuring all runtime checks and warnings use the actual configured limit. Documentation improvements: * Updated the development guide to explain the new thread limit behavior, including how exceeding the limit is handled gracefully, how to configure it, and the build-time validation rules. Test updates: * Modified thread limit tests to use the configurable `ROCPROFSYS_MAX_THREADS` value instead of a hardcoded limit and expanded the range of tested thread values. * Increased test timeouts to accommodate larger thread counts and ensure reliability with higher limits.	2026-01-07 14:03:37 -05:00
vedithal-amd	050e88ee71	Remove unused python packages (#2437 ) * Remove dependency on following unused python packages by updating requirements.txt, LICENSE, standalone binary requirements, cmake and docker requirements * matplotlib * kaleido * pymongo * colorlover * tqdm * Remove unused code from src/utils/gui.py * Reformat python using ruff	2026-01-07 09:03:49 -05:00
Godavarthy Surya, Anusha	1ef6a86ee3	SWDEV-549711 - Improve graph DEBUG dot print for segments (#2205 ) Co-authored-by: Anusha GodavarthySurya<agodavar@amd.com>	2026-01-07 14:07:49 +05:30
Stella Laurenzo	81eed26ec6	[amdsmi] Add include dirs for libdrm. (#2504 ) This has started failing on various developer build systems. Looking at it, it is not precisely clear how this ever worked given that nothing appears to be adding the DRM include dirs. I'd prefer that we remove this delay loading (at least for TheRock builds where it is never needed), but in the meantime, this does fix the issue and is verified on an affected system. Fixes https://github.com/ROCm/TheRock/issues/2744	2026-01-06 15:18:20 -08:00
Yazen AL Musaffar	cb372748f8	[ROCM-SMI] [SWDEV-569731] rsmi tests failing on Frequency/Power/GpuMetrics ReadOnly Fix (#2303 ) * Updated unsupported metric version file for rocm_smi_tests Frequency/Power/GpuMetrics ReadOnly tests Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>	2026-01-06 16:46:38 -06:00
Gerardo Hernandez	50644f5aef	SWDEV-508225 remove assertions when loading fat binary (#2013 ) * SWDEV-508225 - do not assert() after calling digestFatBinary() if it fails. Otherwise this causes assertions to trigger easily in systems that have an APU and a discrete GPU and the code was compiled for the discrete one * SWDEV-508225 - fix that when using a non-existent ordinal in HIP_VISIBLE_DEVICES, getCurrentArch() would crash	2026-01-06 21:53:32 +00:00
Daniel Oliveira	32fde0f73d	[SWDEV-568613] Add gpu_metrics 1.0 support for older GPUs (#2444 ) fix: Add gpu_metrics 1.0 support which is still used by some hardware Code changes related to the following: * APIs * Unit tests Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>	2026-01-06 14:25:13 -06:00
systems-assistant[bot]	c6b7448227	Add support for get and set APIs for CPUISOFreqPolicy and DFCState Co… (#1901 ) * Add support for get and set APIs for CPUISOFreqPolicy and DFCState Control - Add support for get and set APIs for CPUISOFreqPolicy and DFCState Control in AMD SMI and also in the CLI tool * CHANGELOG.md file updated * SWDEV-562837: Update amdsmi-py-api.md as per the new APIs Updated amdsmi-py-api.md as per the new APIs added. --------- Signed-off-by: Soumya <sranjanr@amd.com> Signed-off-by: gabrpham <Gabriel.Pham@amd.com> Co-authored-by: Saka Sitharammurthy <SitharamMurthy.Saka@amd.com>	2026-01-06 10:37:07 -06:00
SakaSitharammurthy	6c98c49362	[SWDEV-568731] Updated example code in amdsmi-py-api.md file (#2311 ) Addresses: - SWDEV-568731 - SWDEV-568724 - SWDEV-568695 Signed-off-by: Saka, SitharamMurthy <SitharamMurthy.Saka@amd.com>	2026-01-06 10:34:36 -06:00
pghoshamd	637b0d71f0	SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers (#2146 ) * SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers * Remove KernelMutex and KernelSharedMutex abstractions with std::mutex and std::shared_mutex * Replaced unique_locks with lock_guards * More changes * Replace new and deletes with smart pointers * Replaced some more with shared ptrs * Replacements with smart pointers - pt 2 * missed change	2026-01-06 10:59:34 -05:00
vedithal-amd	e005f8487b	[rocprofiler-compute] Add gfx arch. based pre-processor guards and runtime checks in rocflop.cpp (#2487 ) * Remove MFMA functionality in rocflop sample since it's not supported in MI50 * Add gfx arch based support for MFMA and SMFMAC in rocflop.cpp * Add --int32 usage doc * Address review comments	2026-01-06 10:17:54 -05:00
Jonathan R. Madsen	7fcea905f3	[rocprofiler-sdk] Fix double-buffering emplace and flush synchronization (#2334 ) * Fix buffer tracing synchronization lock - PR #529 (in rocprofiler-sdk-internal) introduced waiting on the syncer flag when emplacing in a buffer to prevent overwriting buffer records currently being processed in a buffer flush callback - The above fix introduced a block on both buffers when a buffer flush callback was being executed instead of a block on the buffer being flushed. * Add rocpd tests for duplicate records * Address code review comments	2026-01-06 06:06:18 -06:00
habajpai-amd	9e4d1c31c7	fix: prevent static initialization deadlock in thread_data (#2474 ) * fix: prevent static initialization deadlock in thread_data * update comment	2026-01-06 16:39:32 +05:30
Jason Bonnell	1d5a6e9bfe	Update rocprofiler workflows to use new mi325 runner names (#2467 ) * Update rocprofiler workflows to use new runner naming for mi325 * Add input options to workflow_dispatch for rocprofiler-systems CI workflow * Update runner name on therock-ci-linux.yml as well	2026-01-05 15:41:01 -05:00
AidanBeltonS	39d8432893	SWDEV-566854 - Improve memory object handling (#1939 ) * Improve memory object handling for memcpy * update * Pass offsets and make hip_graph changes * Update projects/clr/hipamd/src/hip_memory.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Remove unnecessary command overload * Update based on feedback * Fix failing hipGraphTests * Fix graph bugs * Fix failing memcpy tests --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-05 18:05:56 +00:00
Joseph Macaranas	11d9472e5f	Bump TheRock SHA for CI 20251230 (#2466 ) * Bump TheRock SHA for CI 20251230 * Remove patch and align workflows between OS	2026-01-05 13:00:37 -05:00
Benjamin Welton	7871f53563	Add gfx950 support to ValuPipeIssueUtil counter (#2396 ) Add gfx950 (MI350) to the ValuPipeIssueUtil counter definition to enable RDC_FI_PROF_VALU_PIPE_ISSUE_UTIL telemetry field support on MI350 hardware.	2026-01-05 09:37:34 -08:00
Julia Jiang	88f4bb1988	SWDEV-564412 - fix test failure on hipSetValidDevices_with_hipMemcpyPeer (#2150 )	2026-01-05 12:36:31 -05:00
Julia Jiang	0f0504d79d	SWDEV-564412-Fix soft hang in HIP sub-test hipMemVmm_Uncached (#2223 )	2026-01-05 12:36:08 -05:00
Julia Jiang	3568e0df02	SWDEV-563487 - Fix catch tests failures on Windows (#2097 )	2026-01-05 12:35:41 -05:00
Shadi Dashmiz	2789ea429a	SWDEV-565300: Fix coherency range mode in mem pool pointers (#2296 ) Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>	2026-01-05 11:33:11 -05:00
jamessiddeley-amd	53fd27c0ed	[rocprofiler-compute] Improve roofline logging for roofline.csv (#2390 ) * enhanced roofline log output for graceful exit * addressed comment, added block filtering * ruff format	2026-01-02 14:41:28 -05:00
Swati Rawat	3f004c9237	Update using-rocprofv3-with-openmp.rst (#2473 )	2026-01-02 22:29:39 +05:30
Sv. Lockal	afaa412d9d	[rocprofiler-register] Fix compilation with libc++ (#1241 ) `tests/rocprofiler/rocprofiler.cpp` uses `std::string` without including `<string>` directly. This works with libstdc++ due to transitive includes, but fails with libc++. Closes #1240	2026-01-02 22:26:56 +05:30
Ioannis Assiouras	aecc845456	SWDEV-573589 - Fixed performance regression due to the increase of the signal pool (#2470 )	2026-01-02 12:50:56 +00:00
Joseph Narlo	03f714dd25	[SWDEV-567254] Sync Unified and Linux header (#2220 ) * [SWDEV-567254] Sync Unified and Linux header Signed-off-by: Joseph Narlo <joseph.narlo@amd.com> * Latest sync changes * Sync * Add back guest_windows tag * Sync --------- Signed-off-by: Joseph Narlo <joseph.narlo@amd.com> Co-authored-by: amd-josnarlo <josnarlo.amd.com>	2025-12-30 13:27:55 -06:00
vedithal-amd	ca32193c84	Fix test cases (#2462 )	2025-12-30 11:39:20 -05:00
Jimbo	a59d46ffbf	SWDEV-567545 - Implement block_rank in co-op grid groups (#2182 ) * SWDEV-567545 - Implement block_rank in co-op grid groups	2025-12-29 11:39:23 -05:00
Adam Pryor	5bf6e366dd	[SWDEV-548460] Add RDC Policy Reset Message (#2180 ) * [SWDEV-548460] Add RDC Policy Reset Message * [rdc] Bump version to 1.3.0 Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com> * chore: [rdc] Format CMakeLists.txt Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com> --------- Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com> Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>	2025-12-29 08:31:13 -08:00
German Andryeyev	741b4b9fdf	SWDEV-558849 - Fix Windows build for ROCR backend (#2368 )	2025-12-29 08:35:22 -05:00
vedithal-amd	ea3fb1b810	Remove SMFMAC functionality in rocflop sample since it's not supported in MI100 (#2456 )	2025-12-27 09:47:54 -05:00
vedithal-amd	9c1560b8bb	[rocprofiler-compute] Fix merging logic for multi process (#2445 ) * Fix merging logic for multi process * Fix dispatch id reset logic in case of rocpd format * Fix kernel id reset logic in case of csv format * Revert correlation logic change in csv format * Do inner join instead of left join	2025-12-27 09:47:42 -05:00
abchoudh-amd	983386e40b	[rocprofiler-compute] Write raw counter and metric values (#2314 ) * Added tool for dumping counter and metric values * Skip Linting * Added support for iteration multiplexing * Remove subparser and suppress compute options * Specify output dir * Add kernel info * csv name change * Added comments * Support dispatch id-less dataframes * Formatting fix * Add default for path * Print help with no args * Support only single workload	2025-12-26 14:06:57 +05:30
marantic-amd	bb83791b17	Remove redundant ROCPROFSYS_TRACE_CACHED variable from the code (#2434 )	2025-12-25 13:36:04 +01:00
marantic-amd	c3132773c8	Fix agent device ID in the cached kernel_dispatch trace (#2452 )	2025-12-25 10:23:16 +01:00
Bindhiya Kanangot Balakrishnan	641fa27699	[SWDEV-566543] Fix param validation in FrequenciesRead test (#2430 ) Fixed incorrect error code expectation in FrequenciesRead test when calling amdsmi_get_gpu_pci_bandwidth() with nullptr parameter. Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>	2025-12-23 15:38:25 -08:00
Ioannis Assiouras	49b8900158	SWDEV-558849 - keep the lastEnqueueCommand_ when PAL backend is enabled (#2320 )	2025-12-23 21:24:09 +00:00


71068 Commits