vedithal-amd
e005f8487b
[rocprofiler-compute] Add gfx arch. based pre-processor guards and runtime checks in rocflop.cpp ( #2487 )
...
* Remove MFMA functionality in rocflop sample since it's not supported on MI50
* Add gfx arch based support for MFMA and SMFMAC in rocflop.cpp
* Add --int32 usage doc
* Address review comments
2026-01-06 10:17:54 -05:00
Nusrat Islam
49d9f8cc27
use memcpy for local copies ( #2121 )
...
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu >
[ROCm/rccl commit: b4a86ef680 ]
2026-01-06 09:00:57 -06:00
Nusrat Islam
b4a86ef680
use memcpy for local copies ( #2121 )
...
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu >
2026-01-06 09:00:57 -06:00
Edgar Gabriel
cc727261de
disable the putmem_signal_on_stream on RO ( #376 )
...
it fails in about 50% of the cases. Will revisit later why it fails,
but RO is at the moment lower priority, so disabling the test for now.
[ROCm/rocshmem commit: ed2f75f1de ]
2026-01-06 08:10:46 -06:00
Edgar Gabriel
ed2f75f1de
disable the putmem_signal_on_stream on RO ( #376 )
...
it fails in about 50% of the cases. Will revisit later why it fails,
but RO is at the moment lower priority, so disabling the test for now.
2026-01-06 08:10:46 -06:00
Jonathan R. Madsen
7fcea905f3
[rocprofiler-sdk] Fix double-buffering emplace and flush synchronization ( #2334 )
...
* Fix buffer tracing synchronization lock
- PR #529 (in rocprofiler-sdk-internal) introduced waiting on the syncer flag when emplacing in a buffer, to prevent overwriting buffer records that are currently being processed in a buffer flush callback
- That fix blocked both buffers while a buffer flush callback was executing, instead of blocking only the buffer being flushed.
* Add rocpd tests for duplicate records
* Address code review comments
2026-01-06 06:06:18 -06:00
habajpai-amd
9e4d1c31c7
fix: prevent static initialization deadlock in thread_data ( #2474 )
...
* fix: prevent static initialization deadlock in thread_data
* update comment
2026-01-06 16:39:32 +05:30
Longlong Yao
c34ec1e52f
wsl/librocdxg: Change scratch memory allocation
...
Calculate the actual scratch memory size required based on the
packet information for kernel dispatch.
If the required size exceeds the total allocated memory, scratch
memory must be reallocated. Otherwise, no action is needed.
miopen_gtest: Full/GPU_MIOpenDriverRegressionTest_FP16.MIOpenDriverRegressionHalf/0
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com >
Reviewed-by: Flora Cui <flora.cui@amd.com >
Reviewed-by: Horatio Zhang <Hongkun.Zhang@amd.com >
2026-01-06 10:12:04 +08:00
Longlong Yao
c3f55c8e59
wsl/librocdxg: Change scratch memory allocation
...
Calculate the actual scratch memory size required based on the
packet information for kernel dispatch.
If the required size exceeds the total allocated memory, scratch
memory must be reallocated. Otherwise, no action is needed.
miopen_gtest: Full/GPU_MIOpenDriverRegressionTest_FP16.MIOpenDriverRegressionHalf/0
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com >
Reviewed-by: Flora Cui <flora.cui@amd.com >
Reviewed-by: Horatio Zhang <Hongkun.Zhang@amd.com >
2026-01-06 10:12:04 +08:00
Aurelien Bouteiller
abb1e0684a
Do not hardcode wf_size==64 in ionic provider ( #367 )
...
* Do not hardcode wf_size==64 in ionic provider
* Simpler same_qp_mask in ionic
[ROCm/rocshmem commit: 0c496d83d6 ]
2026-01-05 18:36:58 -05:00
Aurelien Bouteiller
0c496d83d6
Do not hardcode wf_size==64 in ionic provider ( #367 )
...
* Do not hardcode wf_size==64 in ionic provider
* Simpler same_qp_mask in ionic
2026-01-05 18:36:58 -05:00
Jeff Jiang
bb47dd2537
Utils - Fix for the incorrect MD5 results when user crop is enabled. ( #687 )
...
* rocDecode: Fixed the incorrect MD5 results when user crop is enabled.
- When user cropping is enabled, the cropping rect needs to be passed to the output surface info struct, which is used by the MD5 calculation.
* rocDecode: Minor change based on review comment.
[ROCm/rocdecode commit: 8fbb195a72 ]
2026-01-05 13:11:59 -08:00
Jeff Jiang
8fbb195a72
Utils - Fix for the incorrect MD5 results when user crop is enabled. ( #687 )
...
* rocDecode: Fixed the incorrect MD5 results when user crop is enabled.
- When user cropping is enabled, the cropping rect needs to be passed to the output surface info struct, which is used by the MD5 calculation.
* rocDecode: Minor change based on review comment.
2026-01-05 13:11:59 -08:00
Jason Bonnell
1d5a6e9bfe
Update rocprofiler workflows to use new mi325 runner names ( #2467 )
...
* Update rocprofiler workflows to use new runner naming for mi325
* Add input options to workflow_dispatch for rocprofiler-systems CI workflow
* Update runner name on therock-ci-linux.yml as well
2026-01-05 15:41:01 -05:00
AidanBeltonS
39d8432893
SWDEV-566854 - Improve memory object handling ( #1939 )
...
* Improve memory object handling for memcpy
* update
* Pass offsets and make hip_graph changes
* Update projects/clr/hipamd/src/hip_memory.cpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
* Remove unnecessary command overload
* Update based on feedback
* Fix failing hipGraphTests
* Fix graph bugs
* Fix failing memcpy tests
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com >
2026-01-05 18:05:56 +00:00
Joseph Macaranas
11d9472e5f
Bump TheRock SHA for CI 20251230 ( #2466 )
...
* Bump TheRock SHA for CI 20251230
* Remove patch and align workflows between OS
2026-01-05 13:00:37 -05:00
Benjamin Welton
7871f53563
Add gfx950 support to ValuPipeIssueUtil counter ( #2396 )
...
Add gfx950 (MI350) to the ValuPipeIssueUtil counter definition to
enable RDC_FI_PROF_VALU_PIPE_ISSUE_UTIL telemetry field support on
MI350 hardware.
2026-01-05 09:37:34 -08:00
Julia Jiang
88f4bb1988
SWDEV-564412 - fix test failure on hipSetValidDevices_with_hipMemcpyPeer ( #2150 )
2026-01-05 12:36:31 -05:00
Julia Jiang
0f0504d79d
SWDEV-564412-Fix soft hang in HIP sub-test hipMemVmm_Uncached ( #2223 )
2026-01-05 12:36:08 -05:00
Julia Jiang
3568e0df02
SWDEV-563487 - Fix catch tests failures on Windows ( #2097 )
2026-01-05 12:35:41 -05:00
Shadi Dashmiz
2789ea429a
SWDEV-565300: Fix coherency range mode in mem pool pointers ( #2296 )
...
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com >
2026-01-05 11:33:11 -05:00
Avinash
de23e1db6d
Navi4 LL enablement and tuning ( #2095 )
...
* LL enablement for gfx1201
* Single node LL/Simple tuning
* multinode algo/proto default choice
* First iteration of Table tuning
* gfx924 tuning table correction
* Addressing PR comments and prefix match fix
[ROCm/rccl commit: 9545ae04b2 ]
2026-01-05 10:17:12 -06:00
Avinash
9545ae04b2
Navi4 LL enablement and tuning ( #2095 )
...
* LL enablement for gfx1201
* Single node LL/Simple tuning
* multinode algo/proto default choice
* First iteration of Table tuning
* gfx924 tuning table correction
* Addressing PR comments and prefix match fix
2026-01-05 10:17:12 -06:00
Flora Cui
7d501366cb
wsl/librocdxg: fix wgp count calc
...
Signed-off-by: Flora Cui <flora.cui@amd.com >
Reviewed-by: Longlong Yao <Longlong.Yao@amd.com >
2026-01-05 16:04:18 +08:00
jamessiddeley-amd
53fd27c0ed
[rocprofiler-compute] Improve roofline logging for roofline.csv ( #2390 )
...
* enhanced roofline log output for graceful exit
* addressed comment, added block filtering
* ruff format
2026-01-02 14:41:28 -05:00
Swati Rawat
3f004c9237
Update using-rocprofv3-with-openmp.rst ( #2473 )
2026-01-02 22:29:39 +05:30
Sv. Lockal
afaa412d9d
[rocprofiler-register] Fix compilation with libc++ ( #1241 )
...
`tests/rocprofiler/rocprofiler.cpp` uses `std::string` without including `<string>` directly.
This works with libstdc++ due to transitive includes, but fails with libc++.
Closes #1240
2026-01-02 22:26:56 +05:30
Ioannis Assiouras
aecc845456
SWDEV-573589 - Fixed performance regression due to the increase of the signal pool ( #2470 )
2026-01-02 12:50:56 +00:00
Nusrat Islam
57f81914d8
gfx950: restrict maxChannels to 48 for multi-node collectives ( #2116 )
...
* gfx950: restrict maxChannels to 48 for multi-node collectives
* change env name for reduced CU config
---------
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu >
Co-authored-by: Islam <nusislam@amd.com >
[ROCm/rccl commit: f756aa9add ]
2025-12-31 09:28:19 -06:00
Nusrat Islam
f756aa9add
gfx950: restrict maxChannels to 48 for multi-node collectives ( #2116 )
...
* gfx950: restrict maxChannels to 48 for multi-node collectives
* change env name for reduced CU config
---------
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu >
Co-authored-by: Islam <nusislam@amd.com >
2025-12-31 09:28:19 -06:00
Joseph Narlo
03f714dd25
[SWDEV-567254] Sync Unified and Linux header ( #2220 )
...
* [SWDEV-567254] Sync Unified and Linux header
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com >
* Latest sync changes
* Sync
* Add back guest_windows tag
* Sync
---------
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com >
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
2025-12-30 13:27:55 -06:00
vedithal-amd
ca32193c84
Fix test cases ( #2462 )
2025-12-30 11:39:20 -05:00
amd-jiali
7d25ecc65c
Add an environment variable to allow the user to explicitly turn off direct AllGather ( #2119 )
...
Co-authored-by: Jiali Li <jialili@amd.com >
[ROCm/rccl commit: 935208ad09 ]
2025-12-29 16:43:40 -08:00
amd-jiali
935208ad09
Add an environment variable to allow the user to explicitly turn off direct AllGather ( #2119 )
...
Co-authored-by: Jiali Li <jialili@amd.com >
2025-12-29 16:43:40 -08:00
Jimbo
a59d46ffbf
SWDEV-567545 - Implement block_rank in co-op grid groups ( #2182 )
...
* SWDEV-567545 - Implement block_rank in co-op grid groups
2025-12-29 11:39:23 -05:00
Adam Pryor
5bf6e366dd
[SWDEV-548460] Add RDC Policy Reset Message ( #2180 )
...
* [SWDEV-548460] Add RDC Policy Reset Message
* [rdc] Bump version to 1.3.0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
* chore: [rdc] Format CMakeLists.txt
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
---------
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
2025-12-29 08:31:13 -08:00
German Andryeyev
741b4b9fdf
SWDEV-558849 - Fix Windows build for ROCR backend ( #2368 )
2025-12-29 08:35:22 -05:00
vedithal-amd
ea3fb1b810
Remove SMFMAC functionality in rocflop sample since it's not supported on MI100 ( #2456 )
2025-12-27 09:47:54 -05:00
vedithal-amd
9c1560b8bb
[rocprofiler-compute] Fix merging logic for multi process ( #2445 )
...
* Fix merging logic for multi process
* Fix dispatch id reset logic in case of rocpd format
* Fix kernel id reset logic in case of csv format
* Revert correlation logic change in csv format
* Do inner join instead of left join
2025-12-27 09:47:42 -05:00
abchoudh-amd
983386e40b
[rocprofiler-compute] Write raw counter and metric values ( #2314 )
...
* Added tool for dumping counter and metric values
* Skip Linting
* Added support for iteration multiplexing
* Remove subparser and suppress compute options
* Specify output dir
* Add kernel info
* csv name change
* Added comments
* Support dispatch id-less dataframes
* Formatting fix
* Add default for path
* Print help with no args
* Support only single workload
2025-12-26 14:06:57 +05:30
Avinash
2585ae8815
Virtual device enablement ( Minimal changes ) ( #2110 )
...
* minimal changes
* Setting Default tuning table
* Add warnings for NIC merge across PCIe root complexes, NUMA
---------
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com >
[ROCm/rccl commit: 6f62165369 ]
2025-12-25 15:06:33 -06:00
Avinash
6f62165369
Virtual device enablement ( Minimal changes ) ( #2110 )
...
* minimal changes
* Setting Default tuning table
* Add warnings for NIC merge across PCIe root complexes, NUMA
---------
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com >
2025-12-25 15:06:33 -06:00
marantic-amd
bb83791b17
Remove redundant ROCPROFSYS_TRACE_CACHED variable from the code ( #2434 )
2025-12-25 13:36:04 +01:00
marantic-amd
c3132773c8
Fix agent device ID in the cached kernel_dispatch trace ( #2452 )
2025-12-25 10:23:16 +01:00
Flora Cui
c33dcd2d07
wsl/libhsakmt: fix reserved local heap size calc
...
Signed-off-by: Flora Cui <flora.cui@amd.com >
Reviewed-by: Longlong Yao <Longlong.Yao@amd.com >
Part-of: <http://10.67.69.192/wsl/rocr-runtime/-/merge_requests/114 >
2025-12-24 13:30:50 +08:00
Flora Cui
91df8f84da
wsl/libhsakmt: implement hsaKmtSetMemoryUserData
...
Signed-off-by: Flora Cui <flora.cui@amd.com >
Reviewed-by: Longlong Yao <Longlong.Yao@amd.com >
Part-of: <http://10.67.69.192/wsl/rocr-runtime/-/merge_requests/114 >
2025-12-24 13:30:50 +08:00
Flora Cui
4fb2ed2c5a
librocdxg: fix vgpr count
...
refer to https://github.com/ROCm/rocm-systems/pull/1807
Signed-off-by: Flora Cui <flora.cui@amd.com >
2025-12-24 13:30:45 +08:00
Longlong Yao
9bf8eb8c1e
librocdxg: correct atomic info for APU
...
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com >
2025-12-24 13:26:17 +08:00
Longlong Yao
e616b3e65e
librocdxg: use shared GPU memory as vram on small APU
...
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com >
Signed-off-by: Flora Cui <flora.cui@amd.com >
2025-12-24 13:23:07 +08:00
Longlong Yao
56eeaf26f8
librocdxg: query total shared GPU memory
...
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com >
2025-12-24 13:14:55 +08:00