76333 Commits

Author SHA1 Message Date
vedithal-amd e005f8487b [rocprofiler-compute] Add gfx arch. based pre-processor guards and runtime checks in rocflop.cpp (#2487)
* Remove MFMA functionality in rocflop sample since it's not supported on MI50

* Add gfx arch based support for MFMA and SMFMAC in rocflop.cpp

* Add --int32 usage doc

* Address review comments
2026-01-06 10:17:54 -05:00
Nusrat Islam 49d9f8cc27 use memcpy for local copies (#2121)
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>

[ROCm/rccl commit: b4a86ef680]
2026-01-06 09:00:57 -06:00
Nusrat Islam b4a86ef680 use memcpy for local copies (#2121)
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>
2026-01-06 09:00:57 -06:00
Edgar Gabriel cc727261de disable the putmem_signal_on_stream on RO (#376)
it fails in about 50% of the cases. Will revisit later why it fails,
but RO is at the moment lower priority, so disabling the test for now.

[ROCm/rocshmem commit: ed2f75f1de]
2026-01-06 08:10:46 -06:00
Edgar Gabriel ed2f75f1de disable the putmem_signal_on_stream on RO (#376)
it fails in about 50% of the cases. Will revisit later why it fails,
but RO is at the moment lower priority, so disabling the test for now.
2026-01-06 08:10:46 -06:00
Jonathan R. Madsen 7fcea905f3 [rocprofiler-sdk] Fix double-buffering emplace and flush synchronization (#2334)
* Fix buffer tracing synchronization lock

- PR #529 (in rocprofiler-sdk-internal) introduced waiting on the syncer flag when emplacing in a buffer, to prevent overwriting buffer records that are currently being processed in a buffer flush callback.
- That fix blocked both buffers while a buffer flush callback was executing, instead of blocking only the buffer being flushed.

* Add rocpd tests for duplicate records

* Address code review comments
2026-01-06 06:06:18 -06:00
habajpai-amd 9e4d1c31c7 fix: prevent static initialization deadlock in thread_data (#2474)
* fix: prevent static initialization deadlock in thread_data

* update comment
2026-01-06 16:39:32 +05:30
Longlong Yao c34ec1e52f wsl/librocdxg: Change scratch memory allocation
Calculate the actual scratch memory size required based on the
packet information for kernel dispatch.

If the required size exceeds the total allocated memory, scratch
memory must be reallocated. Otherwise, no action is needed.

miopen_gtest: Full/GPU_MIOpenDriverRegressionTest_FP16.MIOpenDriverRegressionHalf/0

Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
Reviewed-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Horatio Zhang <Hongkun.Zhang@amd.com>
2026-01-06 10:12:04 +08:00
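
As a rough illustration of the sizing rule described in the commit above, the following C++ sketch computes the scratch size a dispatch needs and grows the pool only when that size exceeds what is already allocated. `DispatchInfo`, `ScratchPool`, `required_scratch_bytes`, and `ensure_scratch` are hypothetical names invented for this example, and the per-work-item size formula is an assumption; none of this is taken from librocdxg.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the dispatch packet fields that matter here;
// these are not the real librocdxg or HSA structure names.
struct DispatchInfo {
    std::uint32_t private_segment_size;  // assumed: scratch bytes per work-item
    std::uint64_t total_work_items;      // assumed: work-items needing scratch
};

// Hypothetical scratch pool that only tracks its capacity.
struct ScratchPool {
    std::size_t allocated_bytes = 0;
    unsigned reallocations = 0;
};

// Scratch bytes this dispatch needs under the assumed formula.
inline std::size_t required_scratch_bytes(const DispatchInfo& d) {
    return static_cast<std::size_t>(d.private_segment_size) * d.total_work_items;
}

// Grow the pool only when the dispatch needs more than is allocated.
// Returns true if a reallocation happened, false if no action was needed.
inline bool ensure_scratch(ScratchPool& pool, const DispatchInfo& d) {
    const std::size_t needed = required_scratch_bytes(d);
    if (needed <= pool.allocated_bytes) {
        return false;  // existing allocation is large enough
    }
    pool.allocated_bytes = needed;  // a real runtime would free and reallocate here
    ++pool.reallocations;
    assert(pool.allocated_bytes >= needed);  // postcondition of the growth path
    return true;
}
```

With this sketch, a second dispatch of the same or smaller size leaves the pool untouched, and only a larger dispatch triggers another reallocation.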
Longlong Yao c3f55c8e59 wsl/librocdxg: Change scratch memory allocation
Calculate the actual scratch memory size required based on the
packet information for kernel dispatch.

If the required size exceeds the total allocated memory, scratch
memory must be reallocated. Otherwise, no action is needed.

miopen_gtest: Full/GPU_MIOpenDriverRegressionTest_FP16.MIOpenDriverRegressionHalf/0

Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
Reviewed-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Horatio Zhang <Hongkun.Zhang@amd.com>
2026-01-06 10:12:04 +08:00
Aurelien Bouteiller abb1e0684a Do not hardcode wf_size==64 in ionic provider (#367)
* Do not hardcode wf_size==64 in ionic provider

* Simpler same_qp_mask in ionic


[ROCm/rocshmem commit: 0c496d83d6]
2026-01-05 18:36:58 -05:00
Aurelien Bouteiller 0c496d83d6 Do not hardcode wf_size==64 in ionic provider (#367)
* Do not hardcode wf_size==64 in ionic provider

* Simpler same_qp_mask in ionic
2026-01-05 18:36:58 -05:00
Jeff Jiang bb47dd2537 Utils - Fix for the incorrect MD5 results when user crop is enabled. (#687)
* rocDecode: Fixed the incorrect MD5 results when user crop is enabled.
 - When user cropping is enabled, the cropping rect needs to be sent to the output surface info struct, which is used by the MD5 calculation.

* rocDecode: Minor change based on review comment.

[ROCm/rocdecode commit: 8fbb195a72]
2026-01-05 13:11:59 -08:00
Jeff Jiang 8fbb195a72 Utils - Fix for the incorrect MD5 results when user crop is enabled. (#687)
* rocDecode: Fixed the incorrect MD5 results when user crop is enabled.
 - When user cropping is enabled, the cropping rect needs to be sent to the output surface info struct, which is used by the MD5 calculation.

* rocDecode: Minor change based on review comment.
2026-01-05 13:11:59 -08:00
Jason Bonnell 1d5a6e9bfe Update rocprofiler workflows to use new mi325 runner names (#2467)
* Update rocprofiler workflows to use new runner naming for mi325

* Add input options to workflow_dispatch for rocprofiler-systems CI workflow

* Update runner name on therock-ci-linux.yml as well
2026-01-05 15:41:01 -05:00
AidanBeltonS 39d8432893 SWDEV-566854 - Improve memory object handling (#1939)
* Improve memory object handling for memcpy

* update

* Pass offsets and make hip_graph changes

* Update projects/clr/hipamd/src/hip_memory.cpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Remove unnecessary command overload

* Update based on feedback

* Fix failing hipGraphTests

* Fix graph bugs

* Fix failing memcpy tests

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-05 18:05:56 +00:00
Joseph Macaranas 11d9472e5f Bump TheRock SHA for CI 20251230 (#2466)
* Bump TheRock SHA for CI 20251230
* Remove patch and align workflows between OS
2026-01-05 13:00:37 -05:00
Benjamin Welton 7871f53563 Add gfx950 support to ValuPipeIssueUtil counter (#2396)
Add gfx950 (MI350) to the ValuPipeIssueUtil counter definition to
enable RDC_FI_PROF_VALU_PIPE_ISSUE_UTIL telemetry field support on
MI350 hardware.
2026-01-05 09:37:34 -08:00
Julia Jiang 88f4bb1988 SWDEV-564412 - fix test failure on hipSetValidDevices_with_hipMemcpyPeer (#2150) 2026-01-05 12:36:31 -05:00
Julia Jiang 0f0504d79d SWDEV-564412-Fix soft hang in HIP sub-test hipMemVmm_Uncached (#2223) 2026-01-05 12:36:08 -05:00
Julia Jiang 3568e0df02 SWDEV-563487 - Fix catch tests failures on Windows (#2097) 2026-01-05 12:35:41 -05:00
Shadi Dashmiz 2789ea429a SWDEV-565300: Fix coherency range mode in mem pool pointers (#2296)
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
2026-01-05 11:33:11 -05:00
Avinash de23e1db6d Navi4 LL enablement and tuning (#2095)
* LL enablement for gfx1201

* Single node LL/Simple tuning

* multinode algo/proto default choice

* First iteration of Table tuning

* gfx924 tuning table correction

* Addressing PR comments and prefix match fix


[ROCm/rccl commit: 9545ae04b2]
2026-01-05 10:17:12 -06:00
Avinash 9545ae04b2 Navi4 LL enablement and tuning (#2095)
* LL enablement for gfx1201

* Single node LL/Simple tuning

* multinode algo/proto default choice

* First iteration of Table tuning

* gfx924 tuning table correction

* Addressing PR comments and prefix match fix
2026-01-05 10:17:12 -06:00
Flora Cui 7d501366cb wsl/librocdxg: fix wgp count calc
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Longlong Yao <Longlong.Yao@amd.com>
2026-01-05 16:04:18 +08:00
jamessiddeley-amd 53fd27c0ed [rocprofiler-compute] Improve roofline logging for roofline.csv (#2390)
* enhanced roofline log output for graceful exit

* addressed comment, added block filtering

* ruff format
2026-01-02 14:41:28 -05:00
Swati Rawat 3f004c9237 Update using-rocprofv3-with-openmp.rst (#2473) 2026-01-02 22:29:39 +05:30
Sv. Lockal afaa412d9d [rocprofiler-register] Fix compilation with libc++ (#1241)
`tests/rocprofiler/rocprofiler.cpp` uses `std::string` without including `<string>` directly.
This works with libstdc++ due to transitive includes, but fails with libc++.

Closes #1240
2026-01-02 22:26:56 +05:30
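
The failure mode described in the commit above can be sketched as follows: a translation unit that uses `std::string` should include `<string>` itself rather than rely on another header pulling it in, because libstdc++ and libc++ differ in which headers include which. `make_label` is a hypothetical helper written for this example and is not taken from `tests/rocprofiler/rocprofiler.cpp`.

```cpp
#include <cassert>
#include <string>  // include what you use: std::string and std::to_string live here

// Hypothetical helper; ids are assumed to be non-negative in this sketch.
inline std::string make_label(const std::string& name, int id) {
    assert(id >= 0);
    return name + "#" + std::to_string(id);
}
```

Including the header directly keeps the file portable across standard library implementations, whatever their internal include graph looks like.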
Ioannis Assiouras aecc845456 SWDEV-573589 - Fixed performance regression due to the increase of the signal pool (#2470) 2026-01-02 12:50:56 +00:00
Nusrat Islam 57f81914d8 gfx950: restrict maxChannels to 48 for multi-node collectives (#2116)
* gfx950: restrict maxChannels to 48 for multi-node collectives

* change env name for reduced CU config

---------

Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>
Co-authored-by: Islam <nusislam@amd.com>

[ROCm/rccl commit: f756aa9add]
2025-12-31 09:28:19 -06:00
Nusrat Islam f756aa9add gfx950: restrict maxChannels to 48 for multi-node collectives (#2116)
* gfx950: restrict maxChannels to 48 for multi-node collectives

* change env name for reduced CU config

---------

Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>
Co-authored-by: Islam <nusislam@amd.com>
2025-12-31 09:28:19 -06:00
Joseph Narlo 03f714dd25 [SWDEV-567254] Sync Unified and Linux header (#2220)
* [SWDEV-567254] Sync Unified and Linux header

Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>

* Latest sync changes

* Sync

* Add back guest_windows tag

* Sync

---------

Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
2025-12-30 13:27:55 -06:00
vedithal-amd ca32193c84 Fix test cases (#2462) 2025-12-30 11:39:20 -05:00
amd-jiali 7d25ecc65c Add an environment variable to allow users to explicitly turn off direct AllGather (#2119)
Co-authored-by: Jiali Li <jialili@amd.com>

[ROCm/rccl commit: 935208ad09]
2025-12-29 16:43:40 -08:00
amd-jiali 935208ad09 Add an environment variable to allow users to explicitly turn off direct AllGather (#2119)
Co-authored-by: Jiali Li <jialili@amd.com>
2025-12-29 16:43:40 -08:00
Jimbo a59d46ffbf SWDEV-567545 - Implement block_rank in co-op grid groups (#2182)
* SWDEV-567545 - Implement block_rank in co-op grid groups
2025-12-29 11:39:23 -05:00
Adam Pryor 5bf6e366dd [SWDEV-548460] Add RDC Policy Reset Message (#2180)
* [SWDEV-548460] Add RDC Policy Reset Message

* [rdc] Bump version to 1.3.0

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

* chore: [rdc] Format CMakeLists.txt

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

---------

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-12-29 08:31:13 -08:00
German Andryeyev 741b4b9fdf SWDEV-558849 - Fix Windows build for ROCR backend (#2368) 2025-12-29 08:35:22 -05:00
vedithal-amd ea3fb1b810 Remove SMFMAC functionality in rocflop sample since it's not supported on MI100 (#2456) 2025-12-27 09:47:54 -05:00
vedithal-amd 9c1560b8bb [rocprofiler-compute] Fix merging logic for multi process (#2445)
* Fix merging logic for multi process

* Fix dispatch id reset logic in case of rocpd format

* Fix kernel id reset logic in case of csv format

* Revert correlation logic change in csv format

* Do inner join instead of left join
2025-12-27 09:47:42 -05:00
abchoudh-amd 983386e40b [rocprofiler-compute] Write raw counter and metric values (#2314)
* Added tool for dumping counter and metric values

* Skip Linting

* Added support for iteration multiplexing

* Remove subparser and suppress compute options

* Specify output dir

* Add kernel info

* csv name change

* Added comments

* Support dispatch id-less dataframes

* Formatting fix

* Add default for path

* Print help with no args

* Support only single workload
2025-12-26 14:06:57 +05:30
Avinash 2585ae8815 Virtual device enablement ( Minimal changes ) (#2110)
* minimal changes

* Setting Default tuning table

* Add warnings for NIC merge across PCIe root complexes, NUMA

---------

Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 6f62165369]
2025-12-25 15:06:33 -06:00
Avinash 6f62165369 Virtual device enablement ( Minimal changes ) (#2110)
* minimal changes

* Setting Default tuning table

* Add warnings for NIC merge across PCIe root complexes, NUMA

---------

Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2025-12-25 15:06:33 -06:00
marantic-amd bb83791b17 Remove redundant ROCPROFSYS_TRACE_CACHED variable from the code (#2434) 2025-12-25 13:36:04 +01:00
marantic-amd c3132773c8 Fix agent device ID in the cached kernel_dispatch trace (#2452) 2025-12-25 10:23:16 +01:00
Flora Cui c33dcd2d07 wsl/libhsakmt: fix reserved local help size calc
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Longlong Yao <Longlong.Yao@amd.com>
Part-of: <http://10.67.69.192/wsl/rocr-runtime/-/merge_requests/114>
2025-12-24 13:30:50 +08:00
Flora Cui 91df8f84da wsl/libhsakmt: implement hsaKmtSetMemoryUserData
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Longlong Yao <Longlong.Yao@amd.com>
Part-of: <http://10.67.69.192/wsl/rocr-runtime/-/merge_requests/114>
2025-12-24 13:30:50 +08:00
Flora Cui 4fb2ed2c5a librocdxg: fix vgpr count
refer to https://github.com/ROCm/rocm-systems/pull/1807

Signed-off-by: Flora Cui <flora.cui@amd.com>
2025-12-24 13:30:45 +08:00
Longlong Yao 9bf8eb8c1e librocdxg: correct atomic info for APU
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-12-24 13:26:17 +08:00
Longlong Yao e616b3e65e librocdxg: use shared GPU memory as vram on small APU
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
2025-12-24 13:23:07 +08:00
Longlong Yao 56eeaf26f8 librocdxg: query total shared GPU memory
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-12-24 13:14:55 +08:00