Commit Graph

74766 Commits

Author SHA1 Message Date
Joseph Macaranas 11d9472e5f Bump TheRock SHA for CI 20251230 (#2466)
* Bump TheRock SHA for CI 20251230
* Remove patch and align workflows between OS
2026-01-05 13:00:37 -05:00
Benjamin Welton 7871f53563 Add gfx950 support to ValuPipeIssueUtil counter (#2396)
Add gfx950 (MI350) to the ValuPipeIssueUtil counter definition to
enable RDC_FI_PROF_VALU_PIPE_ISSUE_UTIL telemetry field support on
MI350 hardware.
2026-01-05 09:37:34 -08:00
Julia Jiang 88f4bb1988 SWDEV-564412 - fix test failure on hipSetValidDevices_with_hipMemcpyPeer (#2150) 2026-01-05 12:36:31 -05:00
Julia Jiang 0f0504d79d SWDEV-564412-Fix soft hang in HIP sub-test hipMemVmm_Uncached (#2223) 2026-01-05 12:36:08 -05:00
Julia Jiang 3568e0df02 SWDEV-563487 - Fix catch tests failures on Windows (#2097) 2026-01-05 12:35:41 -05:00
Shadi Dashmiz 2789ea429a SWDEV-565300: Fix coherency range mode in mem pool pointers (#2296)
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
2026-01-05 11:33:11 -05:00
Avinash de23e1db6d Navi4 LL enablement and tuning (#2095)
* LL enablement for gfx1201

* Single node LL/Simple tuning

* multinode algo/proto default choice

* First iteration of Table tuning

* gfx924 tuning table correction

* Addressing PR comments and prefix match fix


[ROCm/rccl commit: 9545ae04b2]
2026-01-05 10:17:12 -06:00
Avinash 9545ae04b2 Navi4 LL enablement and tuning (#2095)
* LL enablement for gfx1201

* Single node LL/Simple tuning

* multinode algo/proto default choice

* First iteration of Table tuning

* gfx924 tuning table correction

* Addressing PR comments and prefix match fix
2026-01-05 10:17:12 -06:00
Flora Cui 7d501366cb wsl/librocdxg: fix wgp count calc
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Longlong Yao <Longlong.Yao@amd.com>
2026-01-05 16:04:18 +08:00
jamessiddeley-amd 53fd27c0ed [rocprofiler-compute] Improve roofline logging for roofline.csv (#2390)
* enhanced roofline log output for graceful exit

* addressed comment, added block filtering

* ruff format
2026-01-02 14:41:28 -05:00
Swati Rawat 3f004c9237 Update using-rocprofv3-with-openmp.rst (#2473) 2026-01-02 22:29:39 +05:30
Sv. Lockal afaa412d9d [rocprofiler-register] Fix compilation with libc++ (#1241)
`tests/rocprofiler/rocprofiler.cpp` uses `std::string` without including `<string>` directly.
This works with libstdc++ due to transitive includes, but fails with libc++.

Closes #1240
2026-01-02 22:26:56 +05:30
Ioannis Assiouras aecc845456 SWDEV-573589 - Fixed performance regression due to the increase of the signal pool (#2470) 2026-01-02 12:50:56 +00:00
Nusrat Islam 57f81914d8 gfx950: restrict maxChannels to 48 for multi-node collectives (#2116)
* gfx950: restrict maxChannels to 48 for multi-node collectives

* change env name for reduced CU config

---------

Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>
Co-authored-by: Islam <nusislam@amd.com>

[ROCm/rccl commit: f756aa9add]
2025-12-31 09:28:19 -06:00
Nusrat Islam f756aa9add gfx950: restrict maxChannels to 48 for multi-node collectives (#2116)
* gfx950: restrict maxChannels to 48 for multi-node collectives

* change env name for reduced CU config

---------

Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>
Co-authored-by: Islam <nusislam@amd.com>
2025-12-31 09:28:19 -06:00
Joseph Narlo 03f714dd25 [SWDEV-567254] Sync Unified and Linux header (#2220)
* [SWDEV-567254] Sync Unified and Linux header

Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>

* Latest sync changes

* Sync

* Add back guest_windows tag

* Sync

---------

Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
2025-12-30 13:27:55 -06:00
vedithal-amd ca32193c84 Fix test cases (#2462) 2025-12-30 11:39:20 -05:00
amd-jiali 7d25ecc65c Add an environment variable to allow user explicitly turn off direct AllGather (#2119)
Co-authored-by: Jiali Li <jialili@amd.com>

[ROCm/rccl commit: 935208ad09]
2025-12-29 16:43:40 -08:00
amd-jiali 935208ad09 Add an environment variable to allow user explicitly turn off direct AllGather (#2119)
Co-authored-by: Jiali Li <jialili@amd.com>
2025-12-29 16:43:40 -08:00
Jimbo a59d46ffbf SWDEV-567545 - Implement block_rank in co-op grid groups (#2182)
* SWDEV-567545 - Implement block_rank in co-op grid groups
2025-12-29 11:39:23 -05:00
Adam Pryor 5bf6e366dd [SWDEV-548460] Add RDC Policy Reset Message (#2180)
* [SWDEV-548460] Add RDC Policy Reset Message

* [rdc] Bump version to 1.3.0

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

* chore: [rdc] Format CMakeLists.txt

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

---------

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-12-29 08:31:13 -08:00
German Andryeyev 741b4b9fdf SWDEV-558849 - Fix Windows build for ROCR backend (#2368) 2025-12-29 08:35:22 -05:00
vedithal-amd ea3fb1b810 Remove SMFMAC functionality in rocflop sample since it's not supported in MI100 (#2456) 2025-12-27 09:47:54 -05:00
vedithal-amd 9c1560b8bb [rocprofiler-compute] Fix merging logic for multi process (#2445)
* Fix merging logic for multi process

* Fix dispatch id reset logic in case of rocpd format

* Fix kernel id reset logic in case of csv format

* Revert correlation logic change in csv format

* Do inner join instead of left join
2025-12-27 09:47:42 -05:00
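The "inner join instead of left join" item in the entry above can be illustrated with a minimal, dependency-free sketch. It is not the rocprofiler-compute implementation: the helper `join_rows`, the row layout, and the sample values are all hypothetical, and the sketch assumes each key appears at most once on the right-hand side. It only shows the behavioral difference: an inner join drops rows with no match, while a left join keeps them without the extra columns.

```python
def join_rows(left, right, key, how="inner"):
    """Join two lists of dict rows on `key` (hypothetical helper).

    Assumes `key` values are unique within `right`.
    """
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Matching rows are merged in both join modes.
            joined.append({**row, **match})
        elif how == "left":
            # A left join keeps unmatched left rows as-is.
            joined.append(dict(row))
        # An inner join silently drops unmatched left rows.
    return joined


# Illustrative data only: two dispatches, counters for just one of them.
dispatches = [
    {"kernel_id": 1, "name": "gemm"},
    {"kernel_id": 2, "name": "copy"},
]
counters = [{"kernel_id": 1, "SQ_WAVES": 64}]

inner = join_rows(dispatches, counters, "kernel_id", how="inner")
left = join_rows(dispatches, counters, "kernel_id", how="left")
```

With this data, `inner` contains only the `gemm` row, while `left` also keeps the `copy` row without a counter value.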
abchoudh-amd 983386e40b [rocprofiler-compute] Write raw counter and metric values (#2314)
* Added tool for dumping counter and metric values

* Skip Linting

* Added support for iteration multiplexing

* Remove subparser and suppress compute options

* Specify output dir

* Add kernel info

* csv name change

* Added comments

* Support dispatch id-less dataframes

* Formatting fix

* Add default for path

* Print help with no args

* Support only single workload
2025-12-26 14:06:57 +05:30
Avinash 2585ae8815 Virtual device enablement ( Minimal changes ) (#2110)
* minimal changes

* Setting Default tuning table

* Add warnings NIC merge across PCIe Root complexes, NUMA

---------

Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 6f62165369]
2025-12-25 15:06:33 -06:00
Avinash 6f62165369 Virtual device enablement ( Minimal changes ) (#2110)
* minimal changes

* Setting Default tuning table

* Add warnings NIC merge across PCIe Root complexes, NUMA

---------

Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2025-12-25 15:06:33 -06:00
marantic-amd bb83791b17 Remove redundant ROCPROFSYS_TRACE_CACHED variable from the code (#2434) 2025-12-25 13:36:04 +01:00
marantic-amd c3132773c8 Fix agent device ID in the cached kernel_dispatch trace (#2452) 2025-12-25 10:23:16 +01:00
Flora Cui c33dcd2d07 wsl/libhsakmt: fix reserved local help size calc
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Longlong Yao <Longlong.Yao@amd.com>
Part-of: <http://10.67.69.192/wsl/rocr-runtime/-/merge_requests/114>
2025-12-24 13:30:50 +08:00
Flora Cui 91df8f84da wsl/libhsakmt: implement hsaKmtSetMemoryUserData
Signed-off-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Longlong Yao <Longlong.Yao@amd.com>
Part-of: <http://10.67.69.192/wsl/rocr-runtime/-/merge_requests/114>
2025-12-24 13:30:50 +08:00
Flora Cui 4fb2ed2c5a librocdxg: fix vgpr count
refer to https://github.com/ROCm/rocm-systems/pull/1807

Signed-off-by: Flora Cui <flora.cui@amd.com>
2025-12-24 13:30:45 +08:00
Longlong Yao 9bf8eb8c1e librocdxg: correct atomic info for APU
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-12-24 13:26:17 +08:00
Longlong Yao e616b3e65e librocdxg: use shared GPU memory as vram on small APU
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
2025-12-24 13:23:07 +08:00
Longlong Yao 56eeaf26f8 librocdxg: query total shared GPU memory
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-12-24 13:14:55 +08:00
Longlong Yao 5ebe95d5b2 librocdxg: query total shared GPU memory
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-12-24 13:14:55 +08:00
Longlong Yao 6652313128 librocdxg: Add Strix and Strix Halo support
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-12-24 13:13:22 +08:00
Longlong Yao a2c5e19624 librocdxg: add interface to query segment info
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-12-24 13:08:12 +08:00
Longlong Yao 26cf8c8298 librocdxg: add interface to query segment info
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-12-24 13:08:12 +08:00
Bindhiya Kanangot Balakrishnan 641fa27699 [SWDEV-566543] Fix param validation in FrequenciesRead test (#2430)
Fixed incorrect error code expectation in FrequenciesRead
test when calling amdsmi_get_gpu_pci_bandwidth() with nullptr
parameter.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-12-23 15:38:25 -08:00
Ioannis Assiouras 49b8900158 SWDEV-558849 - keep the lastEnqueueCommand_ when PAL backend is enabled (#2320) 2025-12-23 21:24:09 +00:00
ammallya c2c4d4c1f5 Revert "Adding full build capability to theROCK for HIP changes (#2003)" (#2441)
This reverts commit 0a52f5c101.

Reverts #2003

MIOpen build failures on windows causing blockers on unrelated file changes.
2025-12-23 13:01:08 -08:00
vedithal-amd 61fd728fdb [rocprofiler-compute] Faster counter accuracy testing (#2420)
* Faster counter accuracy testing

* Better handle SPI_CSN_* metrics for lesser than MI350 series

* Use metric filtering to collect only relevant counters for comparison

* Ensure all workload folders are deleted after testing is completed

* Don't use clean_existing=False

* Add manual test for all counter accuracy
2025-12-23 13:13:53 -05:00
vedithal-amd d7302d6c1c [rocprofiler-compute] Test env. vars. in rocprofiler-sdk backend (#2414)
* Test env. vars. in rocprofiler-sdk backend

* Improve rocprofiler-sdk backend test case to check for env. vars. and
  ensure we do not overwrite irrelevant env. vars.

* Remove unnecessary usage of ROCPROF_INDIVIDUAL_XCC_MODE env. var.

* Formatting fixes

* Test fixes

* Remove redundant code in tests

* Remove usage of utils_mod and use utils instead, this prevents
  duplicate imports
2025-12-23 13:13:28 -05:00
vedithal-amd 588773f9bf [rocprofiler-compute] Fix for multi process workload profiling (#2418)
* Fix for multi process workload profiling

Native counter collection tool updates:
    * Do not dump empty counter data for a process
    * Use PID instead of UUID for dumped csv files to facilitate correlation
    * Handle merging multiple pairs of rocpd (from sdk tool) and csv (from
      native tool) files
    * Handle merging multiple pairs of csv (from sdk tool) and csv (from
      native tool) files

Rocpd output format updates:
    * Merge multiple rocpd databases into a single csv
    * Reset dispatch id and kernel id for unique dispatches and unique
      kernels respectively
    * Retain multiple rocpd databases per run for multi process workloads

* Add test case for multiprocess profiling using rocflop workload

* Add rocflop

* Fix native counter csv to rocprofv3 csv conversion

* Use kernel_id instead of dispatch_id to correlate native counter csv
  and kernel trace csv

* python formatting using ruff 0.14 instead of 0.13
2025-12-23 13:12:18 -05:00
Corey Derochie f221a1ae08 Updated troubleshooting-rccl.rst to change rocm-smi to amd-smi (#2028)
* Updated troubleshooting-rccl.rst to change rocm-smi to amd-smi

* Added `amd-smi static --driver`

* Update docs/how-to/troubleshooting-rccl.rst

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

---------

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: f942810959]
2025-12-23 21:22:11 +05:30
Corey Derochie f942810959 Updated troubleshooting-rccl.rst to change rocm-smi to amd-smi (#2028)
* Updated troubleshooting-rccl.rst to change rocm-smi to amd-smi

* Added `amd-smi static --driver`

* Update docs/how-to/troubleshooting-rccl.rst

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

---------

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
2025-12-23 21:22:11 +05:30
Karthikeyan Arumugam bb599d8ed7 Add support for AMD AINIC within RCCL default internal network plugin. (#2078)
* Added support for AMD ROCm net-ib alongside vanilla net-ib, with auto-generation to detect conflicts early during NCCL sync and enable future customizations.
* Integrated AMD AINIC support in RCCL for out-of-the-box usage, leveraging performance improvements by default, channel pinning for optimal pipeline performance, and extended support for 32B in-line CTS messages.
* Implemented internal derivation of AINIC-specific flags when RCCL AINIC environment parameter is set, and checks before initializing AINIC net-ib methods.
* Included snapshot of auto-generated ROCm net-ib file (src/transport/net_ib_rocm.cc) for reference.
* Fixed typos in RCCL param API (RCCL_AINIC_ROCE) and dlclose.
* Updated plugin loading logic:
* Load internal ROCmIB plugin only when NCCL_NET_PLUGIN is not set.
* Load default internal net-ib only when not AINIC and no external plugin env is set.

[ROCm/rccl commit: 9f4651f20f]
2025-12-23 10:33:10 -05:00
Karthikeyan Arumugam 9f4651f20f Add support for AMD AINIC within RCCL default internal network plugin. (#2078)
* Added support for AMD ROCm net-ib alongside vanilla net-ib, with auto-generation to detect conflicts early during NCCL sync and enable future customizations.
* Integrated AMD AINIC support in RCCL for out-of-the-box usage, leveraging performance improvements by default, channel pinning for optimal pipeline performance, and extended support for 32B in-line CTS messages.
* Implemented internal derivation of AINIC-specific flags when RCCL AINIC environment parameter is set, and checks before initializing AINIC net-ib methods.
* Included snapshot of auto-generated ROCm net-ib file (src/transport/net_ib_rocm.cc) for reference.
* Fixed typos in RCCL param API (RCCL_AINIC_ROCE) and dlclose.
* Updated plugin loading logic:
* Load internal ROCmIB plugin only when NCCL_NET_PLUGIN is not set.
* Load default internal net-ib only when not AINIC and no external plugin env is set.
2025-12-23 10:33:10 -05:00
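The two plugin loading rules in the entry above can be read as a small decision function. The sketch below is one possible reading, not RCCL's code: the function `select_net_plugins`, the `ainic_enabled` flag, and the returned labels are hypothetical, and it additionally assumes that the internal ROCmIB plugin is chosen only when AINIC is enabled, which the commit message does not state outright. Only the `NCCL_NET_PLUGIN` variable name comes from the message itself.

```python
def select_net_plugins(env, ainic_enabled):
    """Return labels of the network plugins to load (hypothetical sketch).

    env: mapping of environment variables, e.g. os.environ.
    ainic_enabled: assumed boolean standing in for the RCCL AINIC parameter.
    """
    external = env.get("NCCL_NET_PLUGIN")
    if external:
        # An external plugin was requested, so neither internal
        # plugin is loaded.
        return ["external:" + external]
    if ainic_enabled:
        # Internal ROCmIB: only reachable when NCCL_NET_PLUGIN is not set.
        # Tying this branch to AINIC is an assumption of this sketch.
        return ["internal-rocm-ib"]
    # Default internal net-ib: not AINIC and no external plugin env set.
    return ["internal-net-ib"]
```

For example, `select_net_plugins({}, False)` falls through to the default internal net-ib label, and any non-empty `NCCL_NET_PLUGIN` value bypasses both internal plugins.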
marandje 3e49440495 SWDEV-555178 - Calculate phys mem offset for remap range (#1879) 2025-12-23 10:27:42 +01:00