Commit Graph

2061 Commits

Author SHA1 Message Date
dependabot[bot] c2fd82c02d Bump rocm-docs-core from 1.26.0 to 1.29.0 in /docs/sphinx (#2051)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.26.0 to 1.29.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.26.0...v1.29.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.29.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 131900c264]
2026-01-20 14:28:59 -07:00
dependabot[bot] a1bb4108c1 Bump urllib3 from 2.5.0 to 2.6.3 in /docs/sphinx (#2130)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.5.0 to 2.6.3.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.5.0...2.6.3)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.6.3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: d94ecb7772]
2026-01-20 14:27:31 -07:00
Mythreya Kuricheti 73df3f12b3 use message instead of warning for nccl.h C++ check (#2128)
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 0dc31b1a4a]
2026-01-20 14:21:38 -07:00
Nusrat Islam 96f6029a1b revert memcpy use for direct AG (#2146)
Co-authored-by: Islam <nusislam@amd.com>

[ROCm/rccl commit: f3c5156bbf]
2026-01-20 13:58:28 -06:00
mberenjk 9ee8fb0aa9 Merge pull request #2136 from mberenjk/mberenjk/nccl-sync-2.28.3
Merge remote-tracking branch 'nccl/master' 2.28.3 into develop

[ROCm/rccl commit: 2fdcceaabb]
2026-01-20 11:38:11 -08:00
Marzieh Berenjkoub d7293281f3 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 858b4e76eb]
2026-01-20 13:04:02 -06:00
Aravind Ravikumar f336ad5133 Enable Robust Multi-Node RCCL Testing with cvs-sbatch and Improve CI Reliability (#2123)
* sbatch changes and TheRock SHA update

* Move tests location from /home to /apps/cvs_tests

* Add comments and move credential.ini file to /apps/cvs_tests

* Changed salloc reservation to rccl reservation

---------

Co-authored-by: Aravind Ravikumar <arravikum@amd.com>

[ROCm/rccl commit: 239d62f545]
2026-01-16 23:13:06 -05:00
Geo Min dfdb64572c [TheRock CI] Adding working single node tests (#2142)
* Adding working single node tests

* Revert to old docker sha

* adding back no perf tests

---------

Co-authored-by: Aravind Ravikumar <arravikum@amd.com>

[ROCm/rccl commit: 4b295c9893]
2026-01-13 08:35:58 -08:00
Deeksha Goplani ea1f021496 Added new unit test for register.cc (#1712)
* new unit test for register.cc

Signed-off-by: Deeksha Goplani <deeksha.goplani@amd.com>

* Add new register API tests

* Fix debug message ordering issue

---------

Signed-off-by: Deeksha Goplani <deeksha.goplani@amd.com>
Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>

[ROCm/rccl commit: 420b3b840e]
2026-01-09 17:04:01 -06:00
Nusrat Islam eb347a0dd3 GDA support for alltoall via rocshmem integration (#2099)
* ROCSHMEM linking/building to match MSCCL++ style

* add rocSHMEM as a submodule

* Move rocSHMEM submodule to ext-src/rocSHMEM

* Adding submodule support proper, as well as a patch for rocshmem

* Cleaning up INCLUDE_DIR vs INCLUDE_DIRS mixup

* updating patch file

* Pointing rocshmem submodule to Edgar's fixup patch

* Adding IBVERBS link to the submodule build

* More IBVERBS patching

* pin rocshmem submodule to b534423

* Adding IPC support in rocSHMEM build

* updating rocshmem submodule to resolve CQ errors

* Updating submodule to include recent a2a optimizations

* invoke rocshmem alltoall from rccl

* Updating submodule to CQ error number hang

* Updating submodule to include a2a improvements and bug fixes

* Updating submodule to point to Yiltan's fork and doorbell ring removal commit

* Updating hash to correspond with submodule change

* Updating to no-ctx wg call and updating submodule

* copy-in/copy-out using multiple CUs

* Updating rocSHMEM submodule to include doorbell improvs

* updating gitmodule to point to upstream

* code cleanup and adjust threshold

* guard rocshmem a2a invocation

* Only build with rocshmem when specified

* code cleanup

* address review comments

* Removing debugging failure case

Signed-off-by: Thomas Huber <thomas.huber@amd.com>

* whitespace fix

* Adding rocshmem compile guard

* Removing unnecessary comment

Signed-off-by: Thomas Huber <thomas.huber@amd.com>

* remove commented lines

* address review comments

* cleanup

---------

Signed-off-by: Thomas Huber <thomas.huber@amd.com>
Co-authored-by: Thomas Huber <thomas.huber@amd.com>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k12-27.cs-aus.dcgpu>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>
Co-authored-by: Islam <nusislam@amd.com>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-03.cs-aus.dcgpu>

[ROCm/rccl commit: 27648b0900]
2026-01-09 14:04:54 -06:00
Wenkai Du 87eec6427e Fix broken build due to ncclCudaCalloc change (#2135)
[ROCm/rccl commit: 11e0f4445e]
2026-01-09 09:22:00 -08:00
Dingming Wu 4e15dc142c Update device.h for hip_bfloat16 inclusion guard (#2107)
* Update device.h for hip_bfloat16 inclusion guard

Prevents other files in ROCm from including the old hip/hip_bfloat16.h, which is guarded by _HIP_INCLUDE_HIP_AMD_DETAIL_HIP_BFLOAT16_H_ and _HIP_BFLOAT16_H_

* Update device.h to handle old hip_bfloat16.h

Added a workaround for old hip_bfloat16.h header usage.

[ROCm/rccl commit: 8e4dbfdf37]
2026-01-09 09:45:47 -05:00
Karthikeyan Arumugam 94499918b3 Add check for P2pPolicy for rocm-ib (#2122)
[ROCm/rccl commit: d0d00c33ee]
2026-01-09 11:33:05 +00:00
Wenkai Du 07453ebfaf Improve RCCL kernel coll trace (#2061)
[ROCm/rccl commit: 1d22c87167]
2026-01-08 16:07:18 -08:00
Wenkai Du 721c624de8 Remove iommu warning in KVM env (#2112)
* Remove iommu warning in KVM env

* Fix for review comments

[ROCm/rccl commit: de931f4c53]
2026-01-08 13:55:40 -08:00
Atul Kulkarni 30d36661c2 Adds Python-based test runner for RCCL (#2034)
* Added python test runner to execute rccl tests

* Disabled capture output to avoid hangs

* Add RCCL_TEST_MPI_HOSTFILE env var to get the hostfile

* Converted test_type to boolean gtest flag

* Removed unused return values

* Added custom rccl library usage

* Removed json output

* Updates to test_runner: added num_gpus field

* Address review comments

* Prepend env vars for single node, single process executions

* Added separate enums for exit and result codes

* Update configuration files

* Moved configurations to their own dir

* Address review comments

* Update tools/scripts/test_runner/README.md

Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 0c2c61d2f1]
2026-01-08 10:04:41 -06:00
Kapil S. Pawar 868f40c49d [NAVI3X] [MI308X] Fix UT hangs and failures for ROCm RCCL builds (#2124)
* Update toolchain with compiler flags for RelWithDebInfo

[ROCm/rccl commit: e905d52fc0]
2026-01-08 08:58:19 -06:00
Mustafa Abduljabbar 5bba932529 [WarpSpeed] Improve handling for auto and manual modes (#2125)
* Force ring in WarpSpeed manual mode and log event

* Skip usage for non-ring in WarpSpeed auto mode

* Enable WarpSpeed when its CU count is set

[ROCm/rccl commit: 93fdcb160c]
2026-01-06 10:21:49 -05:00
Nusrat Islam 49d9f8cc27 use memcpy for local copies (#2121)
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>

[ROCm/rccl commit: b4a86ef680]
2026-01-06 09:00:57 -06:00
Avinash de23e1db6d Navi4 LL enablement and tuning (#2095)
* LL enablement for gfx1201

* Single node LL/Simple tuning

* multinode algo/proto default choice

* First iteration of Table tuning

* gfx924 tuning table correction

* Addressing PR comments and prefix match fix


[ROCm/rccl commit: 9545ae04b2]
2026-01-05 10:17:12 -06:00
Nusrat Islam 57f81914d8 gfx950: restrict maxChannels to 48 for multi-node collectives (#2116)
* gfx950: restrict maxChannels to 48 for multi-node collectives

* change env name for reduced CU config

---------

Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>
Co-authored-by: Islam <nusislam@amd.com>

[ROCm/rccl commit: f756aa9add]
2025-12-31 09:28:19 -06:00
amd-jiali 7d25ecc65c Add an environment variable to allow user explicitly turn off direct AllGather (#2119)
Co-authored-by: Jiali Li <jialili@amd.com>

[ROCm/rccl commit: 935208ad09]
2025-12-29 16:43:40 -08:00
Avinash 2585ae8815 Virtual device enablement ( Minimal changes ) (#2110)
* minimal changes

* Setting Default tuning table

* Add warnings for NIC merge across PCIe root complexes, NUMA

---------

Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 6f62165369]
2025-12-25 15:06:33 -06:00
Corey Derochie f221a1ae08 Updated troubleshooting-rccl.rst to change rocm-smi to amd-smi (#2028)
* Updated troubleshooting-rccl.rst to change rocm-smi to amd-smi

* Added `amd-smi static --driver`

* Update docs/how-to/troubleshooting-rccl.rst

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

---------

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: f942810959]
2025-12-23 21:22:11 +05:30
Karthikeyan Arumugam bb599d8ed7 Add support for AMD AINIC within RCCL default internal network plugin. (#2078)
* Added support for AMD ROCm net-ib alongside vanilla net-ib, with auto-generation to detect conflicts early during NCCL sync and enable future customizations.
* Integrated AMD AINIC support in RCCL for out-of-the-box usage, leveraging performance improvements by default, channel pinning for optimal pipeline performance, and extended support for 32B in-line CTS messages.
* Implemented internal derivation of AINIC-specific flags when RCCL AINIC environment parameter is set, and checks before initializing AINIC net-ib methods.
* Included snapshot of auto-generated ROCm net-ib file (src/transport/net_ib_rocm.cc) for reference.
* Fixed typos in RCCL param API (RCCL_AINIC_ROCE) and dlclose.
* Updated plugin loading logic:
  * Load internal ROCmIB plugin only when NCCL_NET_PLUGIN is not set.
  * Load default internal net-ib only when not AINIC and no external plugin env is set.

[ROCm/rccl commit: 9f4651f20f]
2025-12-23 10:33:10 -05:00
Geo Min c199df6b96 Revert "Adding org var and dynamic runner selection (#2106)" (#2114)
This reverts commit 4f7698c27e.

[ROCm/rccl commit: 4f474a7389]
2025-12-19 12:53:09 -08:00
alexander-sannikov 8bc2e81e9a Tuning: use constant value for CorrectionFactor tables
[ROCm/rccl commit: 50568dc93d]
2025-12-18 18:55:03 +00:00
alexander-sannikov 1b00f1a895 Tuning: fixed out-of-bound access
[ROCm/rccl commit: dea50b5e11]
2025-12-18 18:55:03 +00:00
Atul Kulkarni 4ef22f973e Revert: Restore default symbol visibility for tests in debug mode (#2111)
[ROCm/rccl commit: 313b98281c]
2025-12-18 11:20:12 -06:00
Pedram Alizadeh bed6070e12 Adding tuning conf file for CU reduction for AR, AG, and RS with under-subscribed number of GPUs per node (#2102)
[ROCm/rccl commit: f0e7e8745f]
2025-12-17 16:58:54 -05:00
Atul Kulkarni c64c23fbee Removes default visibility in debug mode and updates unit tests for alt_rsmi impl (#2091)
* Update unit tests for alt_rsmi impl

- Create distinct test executable for alt_rsmi testing
- Updated alt_rsmi tests to use public methods
- Compiles alt_rsmi.cc with ARSMI_TEST_BUILD
- Enables external linkage of internal variables
- Only for AltRsmiTests.cpp that manipulates internals
- Clean separation for test behavior

* Address review comments

* restore hidden symbol visibility

[ROCm/rccl commit: 74690ea705]
2025-12-17 10:27:00 -08:00
Geo Min 4f7698c27e Adding org var and dynamic runner selection (#2106)
[ROCm/rccl commit: 2e193aed68]
2025-12-16 10:41:57 -08:00
Mustafa Abduljabbar d15a2c6b65 Keep P2P self-copy for batched ops to prevent >32N hang. (#2108)
[ROCm/rccl commit: 596567ff95]
2025-12-16 11:56:39 -05:00
isaki001 ddfff6b705 Remove node-count and threshold restrictions from p2p-batching (#2077)
* remove node-count and threshold restrictions from p2p-batching

* remove batching threshold usage, fix typo for using batching-enablement flag

---------

Co-authored-by: Mustafa Abduljabbar <mustafa.abduljabbar@amd.com>

[ROCm/rccl commit: 7c1049d2a4]
2025-12-15 19:55:46 -05:00
Mustafa Abduljabbar 88652b53d0 Add fix for WarpSpeed auto mode (#2104)
[ROCm/rccl commit: 5787c960fc]
2025-12-12 17:56:52 -05:00
Mustafa Abduljabbar 2621e0254e [Device] WarpSpeed enablement and single node CU and perf opt for MI350 (#2073)
[ROCm/rccl commit: d009ab144e]
2025-12-11 19:04:35 -05:00
Ahmed Khan f17357d0d4 Add ncclCommDump API (#2068)
* Add ncclCommDump API

* remove trailing whitespace changes

* Add more proxy trace timestamps

* Add facebook_rccl namespace before proxyTrace timestamp call

* Clean up ProxyTrace construction

* Move updateProxyOpCounter to member function

* Move setProxyOpTimestamp to member function

* Move addNewProxyOp to member function

* Make internal methods private

* Make ProxyTrace thread safe

* Fix unit tests

* Fix overwritten ProxyTrace DONE setting in net.cc

[ROCm/rccl commit: 08dd75712f]
2025-12-11 15:02:35 -07:00
Thomas Huber e5c20187ed Update gfx950 tuner conf to include broadcast (#2065)
Signed-off-by: Thomas Huber <thomas.huber@amd.com>

[ROCm/rccl commit: 1f2f9f33ba]
2025-12-11 14:36:03 -05:00
Mustafa Abduljabbar 085752d6e5 Add WAIT_PEER NPKIT event (#2100)
[ROCm/rccl commit: 2cf6a9bb19]
2025-12-11 11:18:41 -05:00
Geo Min 1b4eef8f86 Correct runner name (#2098)
[ROCm/rccl commit: 5384a8abb2]
2025-12-10 11:44:48 -08:00
corey-derochie-amd de82a18790 Fixed unit-test env var list parsing and improved filtered test run speed (#1626)
* Fixed parsing of env var lists which were overwriting the mutable env var string and polluting future parses.

* Fixed all tests to obey UT_DATATYPES and UT_REDOPS filters.

* Allow tests to bail early via `GTEST_SKIP` if UT_DATATYPES or UT_REDOPS filters give a test size of zero. This allows tests to run much faster with filters on.

* Wrapped the support checks in helper functions on `TestBed`.

[ROCm/rccl commit: 18e9ad913b]
2025-12-10 10:06:44 -07:00
Geo Min 2e0abab81a [ci] Bumping TheRock CI commit hash (#2097)
* Bumping TheRock CI commit hash

* fixing artifact group

[ROCm/rccl commit: 6af9087b0c]
2025-12-09 16:25:57 -08:00
Atul Kulkarni 11ffeda52f Added a Process Isolated Test Runner (#1993)
* Added single process isolation support to execute tests

* Address review comments

* Update README

* Removed requirement of explicit call to clear method

* Added macros for simplified usage

* Updated tests to use process isolation framework

* Adjust summary output format for isolated tests

* Updated rccl_wrap tests

* Used process isolation in AllocTests

* Used process isolation and fixed failing tests

* Modified test output, added signal handling

Updated macros to handle lambdas

* Convert argcheck tests to isolated tests

* Convert proxy tests to isolated tests

* Remove non-supported test

* Fixed file descriptor handling and clearing env vars for tests

[ROCm/rccl commit: 7e10267dfd]
2025-12-08 10:36:05 -06:00
Atul Kulkarni 142860442a Enable MPI support to execute MPI specific unit/functional tests (#1996)
* Added MPI support to execute unit/functional tests

Update node and process validation
Updated node detection count and modified validation method
Update validation logic to include max procs and nodes

* Address review comments

* Fix warnings

* Added a new NET transport test and clean up

* Added MPI test logging mechanism

* Decoupled GTest framework

* Added Net IB functional tests

* Updated with resource guards

* Added NET IB tests and refactored code

* Update P2pWorkflow test

* Update documentation

* Add MPI_TESTS_ENABLED guard to the file

* Fix Shm and NetIB tests

* Applied refactoring and cleanup

* Replaced BufferGuard with AutoGuard

* Modified test debug logging

* Use macro to reduce NcclTypeTraits code duplication

- Replace repetitive template specializations with a single
  DEFINE_NCCL_TYPE_TRAIT macro
- Use stringification operator (#) to auto-generate type name strings
- Add #undef to keep macro from polluting namespace
- Makes adding new type mappings trivial

* Unify buffer initialization with generic pattern function

- Remove initializeBufferWithCustomPattern
- Make initializeBufferWithPattern generic with PatternFunc template param
- Now single function handles all patterns via lambda injection
- Updated all test files to use lambdas for pattern generation
- Pattern logic now visible at call site (self-documenting)

* Unify buffer verification with pluggable pattern function

- Remove verifyBufferWithCustomCheck
- Make verifyBufferData generic with PatternFunc template param
- Single function handles all verification patterns via lambda injection
- Updated all test files to use lambdas
- Better defaults: num_samples=0 means verify all elements
- Pattern logic now visible at call site (self-documenting)

* Docs: Add DeviceBufferHelpers section to MPITestRunner.md

- Document new refactored buffer initialization/verification API
- Explain pluggable pattern functions with lambda examples
- Show type mapping and automatic float/int comparison
- Include migration guide from old API to new unified functions
- Demonstrate best practices with real-world examples
- Reference recent refactoring commits (macro-based type traits)

* Docs: Update documentation and examples

- Update on DeviceBufferHelpers
- Update examples using DeviceBufferHelpers methods, e.g. data verification

* Address review comment.

- Replace manual pattern generation loop with initializeBufferWithPattern call
- Use downloadBuffer to get host copy instead of manual hipMemcpy

* Remove non-existent dependency

* Remove duplicate testcase

* Code cleanup in test files

* Moved common constants to base class

[ROCm/rccl commit: 29e1567b95]
2025-12-06 16:05:37 -06:00
Atul Kulkarni 1a986dc190 Remove legacy AltRsmi tests (#2090)
These tests will be replaced by new tests.

[ROCm/rccl commit: 8ad446b271]
2025-12-05 16:53:55 -06:00
Atul Kulkarni 63aa3bb537 Remove legacy Shm and P2p tests (#2089)
These tests will be replaced by MPI tests.

[ROCm/rccl commit: 0d797d1f6c]
2025-12-05 16:53:28 -06:00
Atul Kulkarni 86a4dd95f6 Remove static to non-static conversion used in tests (#2084)
* Remove coll_reg tests which are unsupported

* removed static to non-static conversion feature

[ROCm/rccl commit: 7ec8e73e12]
2025-12-04 18:03:14 -06:00
Atul Kulkarni a364ada6e7 Add missing header in alloc.h (#2086)
[ROCm/rccl commit: 892d258319]
2025-12-04 11:26:19 -06:00
Atul Kulkarni 0ced7aede8 Fix rccl test suite to use hip_bf16.h instead of hip_bfloat16.h for the __bf16 intrinsic (#2082)
[ROCm/rccl commit: cc6e259a02]
2025-12-04 10:02:06 -06:00
Atul Kulkarni e4aef19511 Added new unit tests for AllReduce with Bias API (#2036)
* Added new unit tests for AllReduce with Bias API

* Address review comments

[ROCm/rccl commit: 7c12b0b76b]
2025-12-03 17:37:34 -06:00