1883 commits

Author SHA1 Message Date
Avinash 832c5b1f13 [build] Disable MSCCL++ compilation by default (#1879)
* Enable MSCCLPP on request

* Updating docs and README

* Updates to CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Updates to CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

* Update CHANGELOG.md

GitHub didn't apply the edit to my suggestion properly.

---------

Co-authored-by: amd <amd@super3.amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: a0ec15bafe]
2025-08-28 08:52:12 -06:00
Nilesh M Negi ddb1c9bd8b [AzureCI] Switch to ROCm 6.4.1 and add rccl-tests (#1782)
* Use ROCm 6.4.1 for testing
* Extend RCCL-Tests to multi-node
* Add HSA_NO_SCRATCH_RECLAIM to UT runs
* Limit to single-node rccl-tests for now

[ROCm/rccl commit: d73cee7588]
2025-08-27 21:07:53 -05:00
jonatluu 8526a86978 fix lintian warning package-contains-timestamped-gzip (#1865)
* fix lintian warning package-contains-timestamped-gzip

* fix lintian warning

[ROCm/rccl commit: 4699bff790]
2025-08-27 13:29:07 -04:00
Geo Min 6db483845d [TheRock CI] Adding single node tests for RCCL (#1876)
* Add single-node testing

* Adding single node test

* Adding quotes

* fix typo

* Adding test flag

* No MPI

* Adding openmpi install

* Adding comment

* PR comments

* Missing proj

* Adding half

* Adding rocr runtime

* Adding them all

* new sha

* Fixing script

* Removing confusing skip test case

* Adding docs

* Update .github/workflows/therock-test-packages-single-node.yml

Co-authored-by: Marius Brehler <marius.brehler@amd.com>

---------

Co-authored-by: Marius Brehler <marius.brehler@amd.com>

[ROCm/rccl commit: f404624d9e]
2025-08-27 08:13:10 -07:00
Nusrat Islam fde5d7a8be Device allocation tracker (#1878)
* alloc: add memory allocation tracker

* alloc: add tracker for ncclCuMemAlloc() APIs

* alloc: add null pointer check during free
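A minimal C++ sketch of what an allocation tracker with a null-pointer check on free might look like, assuming a simple pointer-to-size map guarded by a mutex. The `AllocTracker` class and its methods are hypothetical illustrations, not RCCL's actual tracker or its `ncclCuMemAlloc()` hooks.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Hypothetical tracker: maps each live allocation to its size in bytes.
class AllocTracker {
 public:
  // Record an allocation; a null result (failed allocation) is ignored.
  void onAlloc(void* ptr, std::size_t bytes) {
    if (ptr == nullptr) return;
    std::lock_guard<std::mutex> lock(mu_);
    live_[ptr] = bytes;
  }

  // Null pointer check: freeing nullptr is treated as a no-op and reported as success.
  // Returns false if the pointer was never recorded (e.g. a double free).
  bool onFree(void* ptr) {
    if (ptr == nullptr) return true;
    std::lock_guard<std::mutex> lock(mu_);
    return live_.erase(ptr) == 1;
  }

  // Total bytes currently recorded as live.
  std::size_t liveBytes() const {
    std::lock_guard<std::mutex> lock(mu_);
    std::size_t total = 0;
    for (const auto& entry : live_) total += entry.second;
    return total;
  }

 private:
  mutable std::mutex mu_;
  std::unordered_map<void*, std::size_t> live_;
};
```

Keeping the null check inside `onFree` means callers can forward every free through the tracker without guarding against null themselves.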

[ROCm/rccl commit: df448862c3]
2025-08-27 09:30:51 -05:00
Kapil S. Pawar 3d889cc189 Code coverage tests for param.cc (#1872)
* Added code coverage unit tests for param.cc

* Updated ParamTests.cpp and removed ParamTestsConfFile.txt

* Updated ParamTests.cpp

* Removed NCCL_LOG_INFO and added sample config file

---------

Co-authored-by: Pawar <kpawar@ctr2-alola-ctrl-01.amd.com>

[ROCm/rccl commit: c9becd89cd]
2025-08-27 09:30:37 -05:00
ishkool f500628ef2 Code coverage tests for net_socket.cc (#1840)
* Code coverage UTs for net_socket.cc

* Addressed review comments

---------

Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>

[ROCm/rccl commit: c288fbf1b2]
2025-08-27 09:24:21 -05:00
Marius Brehler 5277457f21 Bump TheRock version used for testing (#1885)
[ROCm/rccl commit: 221205ebd4]
2025-08-27 16:22:27 +02:00
Mustafa Abduljabbar f37f290134 [Device] Add dynamic fetch/reduce pipelining for reduction collectives - Simple protocol (#1861)
* Support pipelining codegen and template specialization

* Support ReduceCopy pipelining for AllReduce, ReduceScatter, and Reduce (currently enabled for bfloat16)

* Remove need for FUNC_INDEX_TOTAL

* Add pipeline field to device function key construction logic

* Avoid unneeded codegen for LL/LL64 kernels

* Modify conditions and add pipeline dtypes env

* Optimize selection for both gfx942 and gfx950

* Increase pipeline bitfield width

* Use __forceinline__ for all device functions

* Realign reduceCopy with original form

* Add opt-out option to enable perf debugs

* Remove force-reduce-pipelining option from README

* Update CHANGELOG.md

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 277747c199]
2025-08-26 15:03:54 -04:00
Nusrat Islam c7fce9b0eb fixup: remove extra semicolon (#1881)
[ROCm/rccl commit: b882af9ffd]
2025-08-26 10:57:25 -05:00
Jeffrey Novotny 9d8f953085 Docs: Fix formatting for Docker guide (#1882)
* Docs: Fix formatting for Docker guide

* Incorporate feedback

[ROCm/rccl commit: 64f8e01b76]
2025-08-26 10:18:32 -04:00
Mustafa Abduljabbar b33b5755f6 Support gfx950 in topo_expl and resolve dependency on FMT (#1829)
* Support gfx950 in topo_expl

* Fix dependencies and fetch fmt from sources

* Remove third_party folder in make clean

* Add empty target when fmt is found

* Add MI350 example

* Update README.md

---------

Co-authored-by: isaki001 <ioannissakiotis@gmail.com>

[ROCm/rccl commit: dfad51e3c9]
2025-08-26 10:11:38 -04:00
Nusrat Islam 1af94eee8d Add direct allgather algorithm (#1868)
* add direct allgather algorithm

* minor fix

* add debug print for memory allocation tracker

* add message size threshold for direct allgather

* scatter transfers across ranks

* update changelog

* minor fix

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* enable direct AG when pxn is ON on MI300X or MI350

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 5e7937effb]
2025-08-25 07:55:10 -05:00
corey-derochie-amd af1c448ed1 Changed TestBedChild to avoid hang if the call fails (#1875)
Changed `TestBedChild` protocol to send the result code before the return value to avoid hanging if the call fails. Switched `TestBedChild::GetUniqueId` to use this.
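A minimal sketch of the message ordering described above, assuming the parent and child communicate over a POSIX pipe and that a result code of 0 means success (as `ncclSuccess` does). `sendReply`, `recvReply`, `writeAll`, and `readAll` are hypothetical helpers, not the actual `TestBedChild` code: the child always writes the result code first and writes the return value only on success, so the parent never blocks waiting for a payload that a failed call will not send.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: write exactly len bytes, retrying on EINTR.
static bool writeAll(int fd, const void* buf, std::size_t len) {
  const char* p = static_cast<const char*>(buf);
  while (len > 0) {
    ssize_t n = write(fd, p, len);
    if (n < 0) { if (errno == EINTR) continue; return false; }
    p += n;
    len -= static_cast<std::size_t>(n);
  }
  return true;
}

// Hypothetical helper: read exactly len bytes; fails if the peer closes early.
static bool readAll(int fd, void* buf, std::size_t len) {
  char* p = static_cast<char*>(buf);
  while (len > 0) {
    ssize_t n = read(fd, p, len);
    if (n < 0) { if (errno == EINTR) continue; return false; }
    if (n == 0) return false;  // peer closed the pipe
    p += n;
    len -= static_cast<std::size_t>(n);
  }
  return true;
}

// Child side: send the result code first; send the return value only on success.
template <typename T>
bool sendReply(int fd, int result, const T& value) {
  if (!writeAll(fd, &result, sizeof(result))) return false;
  return result != 0 || writeAll(fd, &value, sizeof(value));
}

// Parent side: read the result code first, so a failure never waits on a payload.
// Returns the child's result code, or -1 if the pipe itself failed.
template <typename T>
int recvReply(int fd, T* value) {
  int result = -1;
  if (!readAll(fd, &result, sizeof(result))) return -1;
  if (result != 0) return result;
  return readAll(fd, value, sizeof(*value)) ? 0 : -1;
}
```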

[ROCm/rccl commit: b88c134874]
2025-08-23 00:17:34 -05:00
Nilesh M Negi fbe014c870 [BUILD] Populate host_table entries only for 1 unroll (#1871)
[ROCm/rccl commit: bf6660ee4e]
2025-08-23 00:15:38 -05:00
awelling2801 40462cc845 Added new tests for rccl_wrap - rcclUpdateThreadThreshold (#1855)
* Added tests for rccl_wrap - rcclUpdateThreadThreshold

* Skipped tests gtest_skip added

* Added tests for new functions rcclSetP2pNetChunkSize and rcclSetPxn

---------

Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>

[ROCm/rccl commit: a1a65c65c4]
2025-08-21 16:39:53 -05:00
Marius Brehler b8daad3068 Add a badge for TheRock CI (#1874)
Adds a badge for TheRock CI and moves the existing badge to the top.

[ROCm/rccl commit: 5ae5eb9440]
2025-08-21 21:54:37 +02:00
Geo Min bec7d58b04 [TheRock CI] Adding TheRock RCCL tests (#1873)
* First commit for rccl multi node test workflow

* Adding workflow dispatch

* Added branch based pull trigger

* Fixed typo in branch name

* Add input variables to push

* Removed input variables to push

* Added self hosted runner for Vultr cloud

* Skipping build and only running test

* Changed test runner label name

* Made changes to executable paths in test script

* Made changes to run

* Made changes to cd into cvs dir

* This is a dummy commit

* Added cmake options

* Modified build options

* Committing build changes

* Adding rccl and rccl-tests

* Re-ordering rccl and rccl-tests

* adding --global command

* modified cmake command

* modified script paths

* Testing OIDC for rccl repo

* Testing OIDC for rccl repo

* Testing build and upload workflow

* use default env variable for AMDGPU families on push workflow trigger

* Adding cleanup and correct role

* Adding additional yml files

* Fixing typo

* Adding new sha

* Adding correct gpu target

* Adding back venv bin activate

* Adding workflow dispatch for tests

* Testing

* Adding cat

* Adding cat

* Adding rocm dir change

* Adding checkout

* cat with sudo

* rccl checkout

* correcting branch

* removing sudo

* trying to adjust correct path

* Adding output dir path

* Use docker container with pre-installed MPI

* Adding back build steps

* Fixing SHA

* Adding exclusion logic

* Adding test

* Adding CI check

* Removing testing

* Limit to build only rccl, rccl-tests and required dependencies

* Adding test

* Removing test

* Removing quote

* Reverting test

* PR comments

---------

Co-authored-by: arravikum <arravikum@amd.com>
Co-authored-by: Marius Brehler <marius.brehler@amd.com>

[ROCm/rccl commit: f9a957bbab]
2025-08-20 15:07:23 -07:00
Arm Patinyasakdikul 3a544ed5f4 Removing "Could not find any local path from gpu X to net." warning (#1866)
* Removing "Could not find any local path from gpu X to net." warning to avoid confusion.

[ROCm/rccl commit: 28a83c3ea6]
2025-08-20 16:52:35 -05:00
Arm Patinyasakdikul 8557ea33ad Test: delete child object to address memory leak. (#1863)
[ROCm/rccl commit: 9d3acffa5f]
2025-08-20 10:15:03 -05:00
Arm Patinyasakdikul d4fecfb0be Remove noinline attribute from reduceCopyPacks and reduceCopyPacksWithBias (#1864)

[ROCm/rccl commit: fb882e80f6]
2025-08-19 20:24:31 -05:00
Atul Kulkarni 8c5095dd94 Added new code owners (#1869)
[ROCm/rccl commit: 231449c896]
2025-08-19 16:32:25 -05:00
Mustafa Abduljabbar 5025a9aab9 Have ncclDevFuncId use 64-Bit keyed map with field packing (#1857)
- Updated ncclDevFuncId to use a hash-based lookup with std::unordered_map.
- Keys are now 64-bit integers, which pack coll, algo, proto, devRedOp, and type fields.
- Improved flexibility and maintainability by moving away from row-based indexing.
- Added error handling for missing keys in the hash map.
- Aligned key generation logic with generate.py and updated generate.py.
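A minimal C++ sketch of the keyed lookup described above, assuming 8 bits per field in the bit order shown. `packFuncKey`, `lookupFuncId`, and this layout are hypothetical illustrations, not the actual `ncclDevFuncId` or `generate.py` encoding. Packing the fields into one integer yields a single hashable key, so adding or widening a field changes the key function rather than a row-indexing scheme.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <unordered_map>

// Hypothetical layout: 8 bits per field, type in the lowest byte.
// The real field widths and order may differ.
constexpr uint64_t kFieldMask = 0xFF;
constexpr unsigned kTypeShift = 0, kRedOpShift = 8, kProtoShift = 16,
                   kAlgoShift = 24, kCollShift = 32;

// Pack coll, algo, proto, devRedOp, and type into a single 64-bit key.
inline uint64_t packFuncKey(uint64_t coll, uint64_t algo, uint64_t proto,
                            uint64_t redOp, uint64_t type) {
  // Each field must fit in its slot, otherwise distinct combinations could collide.
  assert(coll <= kFieldMask && algo <= kFieldMask && proto <= kFieldMask &&
         redOp <= kFieldMask && type <= kFieldMask);
  return (coll << kCollShift) | (algo << kAlgoShift) | (proto << kProtoShift) |
         (redOp << kRedOpShift) | (type << kTypeShift);
}

// Look up the device function index; report and return -1 for a missing key.
inline int lookupFuncId(const std::unordered_map<uint64_t, int>& table,
                        uint64_t coll, uint64_t algo, uint64_t proto,
                        uint64_t redOp, uint64_t type) {
  auto it = table.find(packFuncKey(coll, algo, proto, redOp, type));
  if (it == table.end()) {
    std::fprintf(stderr, "no device function for this combination\n");
    return -1;
  }
  return it->second;
}
```

For example, with a table holding only `packFuncKey(1, 0, 2, 0, 7) -> 42`, looking up those same fields returns 42, while any other combination takes the missing-key path and returns -1.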

[ROCm/rccl commit: c1b3cd8911]
2025-08-19 16:41:19 -04:00
Nusrat Islam e4c025e5cd device: optimize threadfence for ll64 protocol (#1858)
* device: optimize threadfence for ll64 protocol

* device: use __atomic_signal_fence()

---------

Co-authored-by: Nusrat Islam <nusislam@useocpslog-003.amd.com>

[ROCm/rccl commit: 6ade5065b4]
2025-08-18 09:16:41 -05:00
ishkool 377160e0c9 Code Coverage: Proxy.cc tests (#1818)
* Proxy.cc tests

* Update ProxyTest.cpp

Cleaned up the code.

* Update ProxyTests.cpp

Bring back deleting dynamically allocated memory

[ROCm/rccl commit: 876f985e0f]
2025-08-15 19:06:32 -05:00
Atul Kulkarni 38e88ba87e Added new unit tests for src/enqueue.cc (#1853)
[ROCm/rccl commit: 84f3cc6a02]
2025-08-15 18:26:26 -05:00
ishkool 61a189bc84 Code Coverage Unit Tests for comm.h (#1783)
* File containing test for comm.h

* Update CommTest.cpp

Added gtest API for assert

* Update CommTest.cpp

Adding copyright

* Update CommTest.cpp

Removing info and tested as not required.

* Update and rename CommTest.cpp to CommTests.cpp

* Update CMakeLists.txt

[ROCm/rccl commit: 6453273aa6]
2025-08-15 17:44:24 -05:00
Nilesh M Negi ed4abedf7b [DEVICE] Use noinline for LLGenericOp only on gfx950 (#1849)
[ROCm/rccl commit: c3b8de4ec8]
2025-08-15 15:15:02 -05:00
isaki001 2e9a2d1762 [TUNING] gfx950 16N tuning (#1835)
* change gfx950 algo/proto selection for multinode allreduce, allgather, reduceScatter
* gfx950 tuning: enable tuning for broadcast, allreduce starts LL128 earlier and switches to ring earlier, change LL128 start for allgather and reduceScatter
* lower LL128 threshold
* update reduceScatter LL128 min to match LL max for consistency
* enable multinode PXN and increase chunksize for gfx950
* change LL128 start to 128KB, adjust ring-start according to node-count
* disable code-path for fused-AR on LL128 for gfx950
* use LL128 starting from 1KB for multinode allgather on gfx950
* start LL128 earlier for multinode reduceScatter on gfx950
* start LL128 earlier for multinode broadcast on gfx950
* set multinode allreduce to start simple on 64MB for gfx950
* start LL128 from 1KB for multinode broadcast on gfx950
* setting multinode AR to use tree instead of ring at 16MB, 64MB, 128MB
* set multinode broadcast to use LL for up to 256KB depending on node-count for gfx950
* adjust algo for 32MB multinode allreduce on gfx950
* make 32MB tree LL128 for multinode AR on gfx950
* make sure ring is not picked on 2N allreduce on small sizes

[ROCm/rccl commit: 44121db890]
2025-08-15 15:12:45 -05:00
alex-breslow-amd dc3a0c5242 Disable the __threadfence on the sender side of the simple protocol when possible. (#1830)
Leverages the traits of extended-scope fine-grained memory to remove a device-scope acquire-release fence. This improves throughput for single-node workloads on gfx942 and gfx950 for some message sizes (e.g., ~32 MiB to ~256 MiB) when using the Simple protocol. Multi-node workloads on MI300X see a smaller but statistically significant uplift for some message sizes. The optimization can be disabled at runtime by setting the environment variable RCCL_GFX942_CHEAP_FENCE_ON to 0.
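A minimal sketch of how such a runtime opt-out could be read, assuming the variable is parsed as an integer and the optimization stays enabled when it is unset. `cheapFenceEnabled` is a hypothetical helper; RCCL's own parameter-parsing code is not shown in this log.

```cpp
#include <cstdlib>

// Hypothetical helper: the cheaper fence stays on unless
// RCCL_GFX942_CHEAP_FENCE_ON is set to 0.
bool cheapFenceEnabled() {
  const char* value = std::getenv("RCCL_GFX942_CHEAP_FENCE_ON");
  if (value == nullptr || *value == '\0') return true;  // unset or empty: keep the default
  return std::strtol(value, nullptr, 10) != 0;
}
```

Under this sketch, launching a workload as `RCCL_GFX942_CHEAP_FENCE_ON=0 ./my_app` (where `./my_app` is a placeholder) would select the original full fence.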

[ROCm/rccl commit: 1aa2570b48]
2025-08-15 07:54:54 -07:00
mberenjk c76a4492f1 Added useAcc as a template parameter to address the performance regression (#1856)
* Added useAcc as a template parameter to address the 2% performance regression in allreduceWithBias

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: c61152baa4]
2025-08-14 15:58:54 -05:00
Adel Johar d3e9db9432 Docs: Add environment variables reference page
[ROCm/rccl commit: aaf8613b76]
2025-08-14 09:55:28 +02:00
Karthikeyan Arumugam 16d0871985 Add cstring header explicitly as it is removed from HIP (#1859)
[ROCm/rccl commit: 6d41e5ba99]
2025-08-13 15:14:22 -07:00
Rahul Vaidya baa6a61535 [BUILD] Fix UT packaging on Debian family OS (#1854)
* Fix UT packaging on Debian family OSes

Signed-off-by: ravaidya <ravaidya@amd.com>

* Split OR condition when performing Debian checks

Signed-off-by: ravaidya <ravaidya@amd.com>

---------

Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl commit: ee9ed3ef87]
2025-08-11 17:03:16 -05:00
Chris Sosa 584413b2cb Add CI Badge for tracking CI status in prep for gating changes (#1851)
This PR is intended to move RCCL toward gating changes on CI failures. Right now, only build and unit tests run consistently per PR. We should eventually add status badges for all single-node and multi-node tests once those tests are running in presubmit and continuously on develop.

[ROCm/rccl commit: 53977821b5]
2025-08-11 14:02:46 -07:00
Nilesh M Negi 74adb64dfb [BUILD] Fix UT packaging on Debian OS (#1848)
[ROCm/rccl commit: 5036d0e713]
2025-08-11 09:43:26 -05:00
Rahul Vaidya 70a5f2f317 Fix rccl-UnitTests packaging on Debian systems (#1846)
Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl commit: cbbc713b03]
2025-08-08 12:28:56 -05:00
isaki001 52d33058bb enable more events for LL128 NPKIT trace collection (#1827)
[ROCm/rccl commit: 74d82a8145]
2025-08-07 11:19:36 -05:00
awelling2801 c5b4e1bc78 Created coverage tests for rccl_wrap (#1694)
* Created coverage tests for rccl_wrap

RCCL_EXPOSE_STATIC off by default

Coverage tests for rccl_wrap.cc

* Remove RCCL_EXPOSE_STATIC dependency

* Removed Rcclwrap.RcclGetAlgoInfoTest

* Remove comments

* Corrected RCCL_EXPOSE_STATIC definition logic

---------

Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>
Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>

[ROCm/rccl commit: 82bea39280]
2025-08-06 14:48:00 -05:00
Avinash f34d760613 Compiler warnings fix 2 (#1801)
* Changes to device code

* Changes to src/misc

* Changes to graph

* src/include changes

* src/transport changes

* changes in init, enqueue, proxy

* Changes to CMakeLists.txt

* Additional changes to device code

* Additional changes to net.cc

* adding 'compiler warning' tag to ease upstream merge

* typo correction

* Addressing comments

* Additional changes for new commits

[ROCm/rccl commit: 3f8cac388e]
2025-08-05 17:36:23 -05:00
Arm Patinyasakdikul df3b7e477f Disable context tracking for the current version. (#1839)
[ROCm/rccl commit: 6fc228e247]
2025-08-04 10:48:00 -05:00
Atul Kulkarni 35283394ed Add unit tests for graph/xml.cc & graph/xml.h (#1833)
* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.

* Added new unit tests for src/transport/shm.cc

* Added new unit tests for graph/xml.cc

[ROCm/rccl commit: 0e7d7da55d]
2025-08-01 14:20:27 -05:00
Atul Kulkarni e550ba1e3b Update help text in README (#1837)
[ROCm/rccl commit: e2c9f2feab]
2025-08-01 14:19:27 -05:00
awelling2801 0d34963b35 Added tests for coll_reg (#1700)
Changes to coll_reg

Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>

[ROCm/rccl commit: 5ecc1b7ede]
2025-07-31 13:49:23 -05:00
dependabot[bot] b6639c85f4 Bump urllib3 from 2.2.2 to 2.5.0 in /docs/sphinx (#1751)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.2.2 to 2.5.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.2.2...2.5.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.5.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: 32e95963dc]
2025-07-31 11:25:45 -06:00
dependabot[bot] e31001e378 Bump rocm-docs-core from 1.18.2 to 1.22.0 in /docs/sphinx (#1836)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.18.2 to 1.22.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.2...v1.22.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: 1acc3eb6c1]
2025-07-31 11:15:01 -06:00
awelling2801 839fcb54b5 Added tests for transport.cc (#1725)
Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>

[ROCm/rccl commit: 7320752bf3]
2025-07-31 11:04:28 -05:00
Rahul Vaidya d65eb0b021 Fix RHEL10 packaging for rcclras and rccl-UnitTests (#1831)
Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl commit: 0adc5edc74]
2025-07-31 11:00:49 -05:00
Nilesh M Negi be810f10f3 [DEVICE] Add unroll=2 for gfx950 multi-node (#1824)
[ROCm/rccl commit: bd55f876e9]
2025-07-31 02:35:26 -05:00
ycui1984 39c508b80d Add collective latency profiler (#1785)
* [LatencyProfiler] Initial commit

* [LatencyProfiler] Add unit tests

* [LatencyProfiler] add more

* [LatencyProfiler] Pass unit tests

* [LatencyProfiler] Add hooks to integrate with meta internal tools

* [LatencyProfiler] Restore install.sh

* [LatencyProfiler] Resolved comments: 1. add proper license, 2. use proper namespace

* [LatencyProfiler] Add header

[ROCm/rccl commit: 874cd657ef]
2025-07-30 14:59:28 -07:00