Commit graph

1899 Commits

Author SHA1 Message Date
Karthik Ganesan 740dfd1efd Update prims_simple.h to keep header file access to rccl_metadata.h uniform (#1906)
Header files in device/ folder are directly referenced in the code base except here.
2025-09-16 08:58:50 -05:00
Kapil S. Pawar 86a6d06e40 Added new tests for rccl_wrap - rcclOverrideProtocol, rcclOverrideAlgorithm (#1895)
* Added new unit tests for rccl_wrap
2025-09-15 18:00:26 -05:00
Bertan Dogancay 93d86dd8e3 [BUILD] Stop generating sym kernels by default (#1907)
* Stop generating sym kernels by default
2025-09-15 12:19:35 -04:00
ycui1984 da8abb2651 [MIT] Add MIT license file (#1908) 2025-09-12 13:37:44 -05:00
Arm Patinyasakdikul f21fbdfc18 Fix issue where staging/mainline build commit hash doesn't match the actual RCCL commit. (#1910) 2025-09-11 16:13:21 -05:00
mberenjk ada4e12360 disabling msccl for fp8 datatype (#1888)
* disabling msccl for fp8 datatype

---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-09-11 13:09:34 -05:00
Wenkai Du de9ebd8a8b Treat PIX and PXB as same GDR distance (#1894) 2025-09-11 10:44:10 -05:00
isaki001 9c36439354 add reduce/broadcast algo/proto selection table for multi-node gfx940 (#1889) 2025-09-10 14:25:23 -05:00
Wenkai Du c2bccf9156 Enable LL128 and use same tuning table for gfx942 4 NICs (#1898) 2025-09-10 11:11:15 -04:00
Kapil S. Pawar f418a4c6d0 Added new tests for rccl_wrap - rcclSetPipelining (#1890)
* Added tests for rcclSetPipelining

* Added conditions to skip the test

* Updated message size
2025-09-05 09:29:11 -05:00
Mustafa Abduljabbar 6e45eaf75e Use add_unroll.sh in topo_expl makefile (#1886) 2025-09-03 09:43:16 -04:00
Mustafa Abduljabbar 7ccc6f268f Force enable proto and/or algo after model selection (#1799)
* Force enable proto or algo

* Remove inc nccl_common.h

* Move logic and add error checks

* Fix topo_expl compatibility

* Allow algo/proto overrides

* Remove extra function decl

* Clarify warning message

* Move algo/proto overrides into separate functions

* Update CHANGELOG.md
2025-09-03 08:54:13 -04:00
ycui1984 361d596229 [rocm_regression] Return errors when HSA_NO_SCRATCH_RECLAIM=1 even for rocm>=6.4.0 (#1867)
* [rocm_regression] Return errors when HSA_NO_SCRATCH_RECLAIM=1 even for rocm >= 6.4.0
* [rocm_regression] Check firmware version
* [rocm_regression] Resolve review comments
* [rocm_regression] Move hsa env checking into init once func
* [rocm_regression] Prevent hot fix version in firmware
* [rocm_regression] Improve unit tests
2025-08-29 11:18:23 -05:00
Bertan Dogancay 9afc15625f Merge pull request #1880 from rahulvaidya20/2.27.3-1
[SYNC] 2.27.3-1
2025-08-29 12:10:12 -04:00
BertanDogancay 08a7be231b Merge remote-tracking branch 'nccl/master' into develop 2025-08-28 15:46:28 -05:00
Avinash a0ec15bafe [build] Disable MSCCL++ compilation by default (#1879)
* Enable MSCCLPP on request

* Updating docs and README

* Updates to CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Updates to CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

* Update CHANGELOG.md

Github didn't take the edit to my suggestion properly.

---------

Co-authored-by: amd <amd@super3.amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
2025-08-28 08:52:12 -06:00
Nilesh M Negi d73cee7588 [AzureCI] Switch to ROCm 6.4.1 and add rccl-tests (#1782)
* Use ROCm 6.4.1 for testing
* Extend RCCL-Tests to multi-node
* Add HSA_NO_SCRATCH_RECLAIM to UT runs
* Limit to single-node rccl-tests for now
2025-08-27 21:07:53 -05:00
jonatluu 4699bff790 fix lintian warning package-contains-timestamped-gzip (#1865)
* fix lintian warning package-contains-timestamped-gzip

* fix lintian warning
2025-08-27 13:29:07 -04:00
Geo Min f404624d9e [TheRock CI] Adding single node tests for RCCL (#1876)
* Add single-node testing

* Adding single node test

* Adding quotes

* fix typo

* Adding test flag

* No MPI

* Adding openmpi install

* Adding comment

* PR comments

* Missing proj

* Adding half

* Adding rocr runtime

* Adding them all

* new sha

* Fixing script

* Removing confusing skip test case

* Adding docs

* Update .github/workflows/therock-test-packages-single-node.yml

Co-authored-by: Marius Brehler <marius.brehler@amd.com>

---------

Co-authored-by: Marius Brehler <marius.brehler@amd.com>
2025-08-27 08:13:10 -07:00
Nusrat Islam df448862c3 Device allocation tracker (#1878)
* alloc: add memory allocation tracker

* alloc: add tracker for ncclCuMemAlloc() APIs

* alloc: add null pointer check during free
2025-08-27 09:30:51 -05:00
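The allocation-tracker idea in the commit above (record each allocation, check for a null pointer on free) can be sketched roughly as follows. This is only an illustration under stated assumptions: `AllocTracker` and its methods are hypothetical names, and host `std::malloc` stands in for the device allocation calls that the real change wraps.

```cpp
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical tracker; not the RCCL implementation.
class AllocTracker {
 public:
  // Allocate and record the size so live usage can be reported later.
  void* allocate(std::size_t bytes) {
    void* p = std::malloc(bytes);  // stand-in for a device allocation
    if (p != nullptr) {
      live_[p] = bytes;
      total_ += bytes;
    }
    return p;
  }

  // Free a tracked pointer. Returns false for a null or unknown pointer
  // instead of touching the bookkeeping.
  bool release(void* p) {
    if (p == nullptr) return false;  // null pointer check during free
    auto it = live_.find(p);
    if (it == live_.end()) return false;
    total_ -= it->second;
    live_.erase(it);
    std::free(p);
    return true;
  }

  // Bytes currently allocated through this tracker.
  std::size_t liveBytes() const { return total_; }

 private:
  std::unordered_map<void*, std::size_t> live_;
  std::size_t total_ = 0;
};
```

A caller would allocate through `allocate`, free through `release`, and read `liveBytes()` when printing debug statistics.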
Kapil S. Pawar c9becd89cd Code coverage tests for param.cc (#1872)
* Added code coverage unit tests for param.cc

* Updated ParamTests.cpp and removed ParamTestsConfFile.txt

* Updated ParamTests.cpp

* Removed NCCL_LOG_INFO and added sample config file

---------

Co-authored-by: Pawar <kpawar@ctr2-alola-ctrl-01.amd.com>
2025-08-27 09:30:37 -05:00
ishkool c288fbf1b2 Code coverage tests for net_socket.cc (#1840)
* Code coverage UTs for net_socket.cc

* Addressed review comments

---------

Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>
2025-08-27 09:24:21 -05:00
Marius Brehler 221205ebd4 Bump TheRock version used for testing (#1885) 2025-08-27 16:22:27 +02:00
Mustafa Abduljabbar 277747c199 [Device] Add dynamic fetch/reduce pipelining for reduction collectives - Simple protocol (#1861)
* Support pipelining codegen and template specialization

* Support ReduceCopy pipelining for AllReduce, ReduceScatter, and Reduce (currently enabled for bfloat16)

* Remove need for FUNC_INDEX_TOTAL

* Add pipeline field to device function key construction logic

* Avoid unneeded codegen for LL/LL64 kernels

* Modify conditions and add pipeline dtypes env

* Optimize selection for both gfx942 and gfx950

* Increase pipeline bitfield width

* Use __forceinline__ for all device functions

* Realign reduceCopy with original form

* Add opt-out option to enable perf debugs

* Remove force-reduce-pipelining option from README

* Update CHANGELOG.md

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-08-26 15:03:54 -04:00
Nusrat Islam b882af9ffd fixup: remove extra semicolon (#1881) 2025-08-26 10:57:25 -05:00
Jeffrey Novotny 64f8e01b76 Docs: Fix formatting for Docker guide (#1882)
* Docs: Fix formatting for Docker guide

* Incorporate feedback
2025-08-26 10:18:32 -04:00
Mustafa Abduljabbar dfad51e3c9 Support gfx950 in topo_expl and resolve dependency on FMT (#1829)
* Support gfx950 in topo_expl

* Fix dependencies and fetch fmt from sources

* Remove third_party folder in make clean

* Add empty target when fmt is found

* Add MI350 example

* Update README.md

---------

Co-authored-by: isaki001 <ioannissakiotis@gmail.com>
2025-08-26 10:11:38 -04:00
Nusrat Islam 5e7937effb Add direct allgather algorithm (#1868)
* add direct allgather algorithm

* minor fix

* add debug print for memory allocation tracker

* add message size threshold for direct allgather

* scatter transfers across ranks

* update changelog

* minor fix

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* enable direct AG when pxn is ON on MI300X or MI350

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-08-25 07:55:10 -05:00
corey-derochie-amd b88c134874 Changed TestBedChild to avoid hang if the call fails (#1875)
Changed `TestBedChild` protocol to send the result code before the return value to avoid hanging if the call fails. Switched `TestBedChild::GetUniqueId` to use this.
2025-08-23 00:17:34 -05:00
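The ordering described in the commit above (result code first, return value second) can be illustrated with a small in-memory sketch. Everything here is hypothetical: `Channel`, `sendReply`, and `recvReply` are illustrative names, a `std::deque` stands in for the real pipe, and omitting the payload on failure is an assumption rather than something the commit message states.

```cpp
#include <cstdint>
#include <deque>

// Stand-in for the parent/child pipe; hypothetical, not the TestBedChild API.
using Channel = std::deque<int32_t>;

// Child side: always send the status first, and the payload only on success
// (assumption: a failed call has no meaningful return value to send).
inline void sendReply(Channel& ch, int32_t status, int32_t payload) {
  ch.push_back(status);
  if (status == 0) ch.push_back(payload);
}

// Parent side: read the status first and read the payload only when the
// status reports success, so a failed call never leaves the reader waiting
// for data that will not arrive. Returns false if the channel ran dry.
inline bool recvReply(Channel& ch, int32_t* status, int32_t* payload) {
  if (ch.empty()) return false;
  *status = ch.front();
  ch.pop_front();
  if (*status != 0) return true;  // failure: nothing more to read
  if (ch.empty()) return false;
  *payload = ch.front();
  ch.pop_front();
  return true;
}
```

With the opposite order, a reader that blocks on the payload before seeing the status has no way to learn that the call failed.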
Nilesh M Negi bf6660ee4e [BUILD] Populate host_table entries only for 1 unroll (#1871) 2025-08-23 00:15:38 -05:00
awelling2801 a1a65c65c4 Added new tests for rccl_wrap - rcclUpdateThreadThreshold (#1855)
* Added tests for rccl_wrap - rcclUpdateThreadThreshold

* Skipped tests gtest_skip added

* Added tests for new functions rcclSetP2pNetChunkSize and rcclSetPxn

---------

Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>
2025-08-21 16:39:53 -05:00
Marius Brehler 5ae5eb9440 Add a badge for TheRock CI (#1874)
Adds a badge for TheRock CI and moves the existing badge to the top.
2025-08-21 21:54:37 +02:00
Geo Min f9a957bbab [TheRock CI] Adding TheRock RCCL tests (#1873)
* First commit for rccl multi node test workflow

* Adding workflow dispatch

* Added branch based pull trigger

* Changed typo in branch name

* Add input variables to push

* Removed input variables to push

* Added self hosted runner for Vultr cloud

* Skipping build and only running test

* Changed test runner label name

* Made changes to executable paths in test script

* Made changes to run

* Made changes to cd into cvs dir

* This is a dummy commit

* Added cmake options

* Modified build options

* Committing build changes

* Adding rccl and rccl-tests

* Re-ordering rccl and rccl-tests

* adding --global command

* modified cmake command

* modified script paths

* Testing OIDC for rccl repo

* Testing OIDC for rccl repo

* Testing build and upload workflow

* use default env variable for AMDGPU families on push workflow trigger

* Adding cleanup and correct role

* Adding additional yml files

* Fixing typo

* Adding new sha

* Adding correct gpu target

* Adding back venv bin activate

* Adding workflow dispatch for tests

* Testing

* Adding cat

* Adding cat

* Adding rocm dir change

* Adding checkout

* cat with sudo

* rccl checkout

* correcting branch

* removing sudo

* trying to adjust correct path

* Adding output dir path

* Use docker container with pre-installed MPI

* Adding back build steps

* Fixing SHA

* Adding exclusion logic:

* Adding test

* Adding CI check

* Removing testing

* Limit to build only rccl, rccl-tests and required dependencies

* Adding test

* Removing test

* Removing quote

* Reverting test

* PR comments

---------

Co-authored-by: arravikum <arravikum@amd.com>
Co-authored-by: Marius Brehler <marius.brehler@amd.com>
2025-08-20 15:07:23 -07:00
Arm Patinyasakdikul 28a83c3ea6 Removing "Could not find any local path from gpu X to net." warning (#1866)
* Removing "Could not find any local path from gpu X to net." warning to avoid confusion.
2025-08-20 16:52:35 -05:00
Arm Patinyasakdikul 9d3acffa5f Test: delete child object to address memory leak. (#1863) 2025-08-20 10:15:03 -05:00
Arm Patinyasakdikul fb882e80f6 Remove noinline attribute from reduceCopyPacks and reduceCopyPacksWithBias (#1864)
2025-08-19 20:24:31 -05:00
Atul Kulkarni 231449c896 Added new code owners (#1869) 2025-08-19 16:32:25 -05:00
Mustafa Abduljabbar c1b3cd8911 Have ncclDevFuncId use 64-Bit keyed map with field packing (#1857)
- Updated ncclDevFuncId to use a hash-based lookup with std::unordered_map.
- Keys are now 64-bit integers, which pack coll, algo, proto, devRedOp, and type fields.
- Improved flexibility and maintainability by moving away from row-based indexing.
- Added error handling for missing keys in the hash map.
- Aligned key generation logic with generate.py and updated generate.py.
2025-08-19 16:41:19 -04:00
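The field-packing scheme described in the commit above might look roughly like the sketch below. The 8-bit field widths, the function names `packKey` and `lookupFuncId`, and the `FuncTable` alias are assumptions made for illustration; the actual layout is defined by RCCL and its generate.py.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical width: every field gets 8 bits, so 40 of the 64 bits are used.
constexpr unsigned kFieldBits = 8;
constexpr uint64_t kFieldMask = (1ULL << kFieldBits) - 1;

// Pack coll, algo, proto, devRedOp, and type into one 64-bit key.
// Each value is masked so an oversized field cannot spill into its neighbor.
inline uint64_t packKey(uint64_t coll, uint64_t algo, uint64_t proto,
                        uint64_t devRedOp, uint64_t type) {
  uint64_t key = coll & kFieldMask;
  key = (key << kFieldBits) | (algo & kFieldMask);
  key = (key << kFieldBits) | (proto & kFieldMask);
  key = (key << kFieldBits) | (devRedOp & kFieldMask);
  key = (key << kFieldBits) | (type & kFieldMask);
  return key;
}

// Hash-based table from packed key to device function id.
using FuncTable = std::unordered_map<uint64_t, int>;

// Look up a function id. Returns false when the key is missing so the
// caller can report an error instead of indexing past a table.
inline bool lookupFuncId(const FuncTable& table, uint64_t key, int* funcId) {
  auto it = table.find(key);
  if (it == table.end()) return false;
  *funcId = it->second;
  return true;
}
```

Compared with row-based indexing, a keyed map lets new fields or sparse combinations be added without recomputing offsets, at the cost of a hash lookup.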
Nusrat Islam 6ade5065b4 device: optimize threadfence for ll64 protocol (#1858)
* device: optimize threadfence for ll64 protocol

* device: use __atomic_signal_fence()

---------

Co-authored-by: Nusrat Islam <nusislam@useocpslog-003.amd.com>
2025-08-18 09:16:41 -05:00
ishkool 876f985e0f Code Coverage: Proxy.cc tests (#1818)
* Proxy.cc tests

* Update ProxyTest.cpp

Cleaned up the code.

* Update ProxyTests.cpp

Bring back deleting dynamically allocated memory
2025-08-15 19:06:32 -05:00
Atul Kulkarni 84f3cc6a02 Added new unit tests for src/enqueue.cc (#1853) 2025-08-15 18:26:26 -05:00
ishkool 6453273aa6 Code Coverage Unit Tests for comm.h (#1783)
* File containing test for comm.h

* Update CommTest.cpp

Added gtest API for assert

* Update CommTest.cpp

Adding copyright

* Update CommTest.cpp

Removing info and tested as not required.

* Update and rename CommTest.cpp to CommTests.cpp

* Update CMakeLists.txt
2025-08-15 17:44:24 -05:00
Nilesh M Negi c3b8de4ec8 [DEVICE] Use noinline for LLGenericOp only on gfx950 (#1849) 2025-08-15 15:15:02 -05:00
isaki001 44121db890 [TUNING] gfx950 16N tuning (#1835)
* change gfx950 algo/proto selection for multinode allreduce, allgather, reduceScatter
* gfx950 tuning: enable tuning for broadcast, allreduce starts LL128 earlier and switches to ring earlier, change LL128 start for allgather and reduceScatter
* lower LL128 threshold
* update reduceScatter LL128 min to match LL max for consistency
* enable multinode PXN and increase chunksize for gfx950
* change LL128 start to 128KB, adjust ring-start according to node-count
* disable code-path for fused-AR on LL128 for gfx950
* use LL128 starting from 1KB for multinode allgather on gfx950
* start LL128 earlier for multinode reduceScatter on gfx950
* start LL128 earlier for multinode broadcast on gfx950
* set multinode allreduce to start simple on 64MB for gfx950
* start LL128 from 1KB for multinode broadcast on gfx950
* setting multinode AR to use tree instead of ring at 16MB, 64MB, 128MB
* set multinode broadcast to use LL for up to 256KB depending on node-count for gfx950
* adjust algo for 32MB multinode allreduce on gfx950
* make 32MB tree LL128 for multinode AR on gfx950
* make sure ring is not picked on 2N allreduce on small sizes
2025-08-15 15:12:45 -05:00
alex-breslow-amd 1aa2570b48 Disable the __threadfence on the sender side of the simple protocol when possible. (#1830)
Leverages the traits of extended-scope fine-grain memory to get rid of a device-scope acquire-release fence.  This improves throughput for single node workloads on gfx942 and gfx950 for some input sizes (e.g., ~32 MiB to about 256 MiB) when using the simple protocol.  Multinode workloads on MI300X see a smaller but statistically significant uplift for some message sizes.  Runtime disablement is supported via setting the environment variable RCCL_GFX942_CHEAP_FENCE_ON to 0.
2025-08-15 07:54:54 -07:00
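The runtime switch mentioned in the commit above can be sketched as a simple environment check. Only the variable name RCCL_GFX942_CHEAP_FENCE_ON and the meaning of the value 0 come from the commit message; the helper names are hypothetical, treating an unset variable as enabled is an assumption, and RCCL itself may read the variable through its own parameter machinery.

```cpp
#include <cstdlib>
#include <cstring>

// Interpret a raw environment value. Assumption: an unset variable leaves
// the optimization on, and only the literal string "0" turns it off.
inline bool cheapFenceEnabledFrom(const char* value) {
  if (value == nullptr) return true;
  return std::strcmp(value, "0") != 0;
}

// Read the switch named in the commit message from the process environment.
inline bool cheapFenceEnabled() {
  return cheapFenceEnabledFrom(std::getenv("RCCL_GFX942_CHEAP_FENCE_ON"));
}
```

Launching a job with `RCCL_GFX942_CHEAP_FENCE_ON=0` in the environment would then make `cheapFenceEnabled()` return false in this sketch.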
mberenjk c61152baa4 Added useAcc as a template parameter to address the performance regression (#1856)
* Added useAcc as a template parameter to address the 2% performance regression in allreduceWithBias
---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-08-14 15:58:54 -05:00
Adel Johar aaf8613b76 Docs: Add environment variables reference page 2025-08-14 09:55:28 +02:00
Karthikeyan Arumugam 6d41e5ba99 Add cstring header explicitly as it is removed from HIP (#1859) 2025-08-13 15:14:22 -07:00
Rahul Vaidya ee9ed3ef87 [BUILD] Fix UT packaging on Debian family OS (#1854)
* Fix UT packaging on Debian family OSes

Signed-off-by: ravaidya <ravaidya@amd.com>

* Split OR condition when performing Debian checks

Signed-off-by: ravaidya <ravaidya@amd.com>

---------

Signed-off-by: ravaidya <ravaidya@amd.com>
2025-08-11 17:03:16 -05:00
Chris Sosa 53977821b5 Add CI Badge for tracking CI status in prep for gating changes (#1851)
This PR is intended to move RCCL to gating changes on CI failures. Right now, only build/unittests run per PR consistently. We should eventually add all single and multi-node test status badges once those tests are running in presubmit and continuously on develop
2025-08-11 14:02:46 -07:00