72 Commits

Author SHA1 Message Date
Marzieh Berenjkoub d7293281f3 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 858b4e76eb]
2026-01-20 13:04:02 -06:00
Avinash 832c5b1f13 [build] Disable MSCCL++ compilation by default (#1879)
* Enable MSCCLPP on request

* Updating docs and README

* Updates to CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Updates to CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

* Update CHANGELOG.md

Github didn't take the edit to my suggestion properly.

---------

Co-authored-by: amd <amd@super3.amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: a0ec15bafe]
2025-08-28 08:52:12 -06:00
Mustafa Abduljabbar f37f290134 [Device] Add dynamic fetch/reduce pipelining for reduction collectives - Simple protocol (#1861)
* Support pipelining codegen and template specialization

* Support ReduceCopy pipelining for AllReduce, ReduceScatter, and Reduce (currently enabled for bfloat16)

* Remove need for FUNC_INDEX_TOTAL

* Add pipeline field to device function key construction logic

* Avoid unneeded codegen for LL/LL64 kernels

* Modify conditions and add pipeline dtypes env

* Optimize selection for both gfx942 and gfx950

* Increase pipeline bitfield width

* Use __forceinline__ for all device functions

* Realign reduceCopy with original form

* Add opt-out option to enable perf debugs

* Remove force-reduce-pipelining option from README

* Update CHANGELOG.md

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 277747c199]
2025-08-26 15:03:54 -04:00
Marius Brehler b8daad3068 Add a badge for TheRock CI (#1874)
Adds a badge for TheRock CI and moves the existing badge to the top.

[ROCm/rccl commit: 5ae5eb9440]
2025-08-21 21:54:37 +02:00
Chris Sosa 584413b2cb Add CI Badge for tracking CI status in prep for gating changes (#1851)
This PR is intended to move RCCL to gating changes on CI failures. Right now, only build/unittests run per PR consistently. We should eventually add all single and multi-node test status badges once those tests are running in presubmit and continuously on develop

[ROCm/rccl commit: 53977821b5]
2025-08-11 14:02:46 -07:00
Atul Kulkarni e550ba1e3b Update help text in README (#1837)
[ROCm/rccl commit: e2c9f2feab]
2025-08-01 14:19:27 -05:00
Nilesh M Negi 708c053b21 Update Dockerfile to use CMake-based build (#1630)
* [DOCKER] Update Dockerfile to switch to CMake build

* Fix typo in Dockerfile.ubuntu

* Add README to docker sub-dir

* Update Dockerfile and README

* Modify markdown headings in docker/README

* Update docs

* Fix typo in docs

* Update docs/install/docker-install.rst

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update docs/install/docker-install.rst

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update docs/install/docker-install.rst

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update docker/README

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: bd1a5b38b6]
2025-04-10 11:40:10 -05:00
Jeffrey Novotny 0f4558ea59 Modify cmake instruction in build from source (#1445)
[ROCm/rccl commit: 28594b26b3]
2024-12-03 11:26:02 -05:00
Jeffrey Novotny 1d1e17b3c9 Refactor RCCL install guide into several pages (#1427)
* Refactor RCCL install guide into several pages

* Changes from code review and new docker guide

* Add missing entries to ToC

* Minor fixes

* Fix help strings

* Edits after review and remove extra white space

[ROCm/rccl commit: bf7c130631]
2024-11-27 15:34:26 -05:00
corey-derochie-amd d5a2245a40 Checkout submodules with shallow depth (#1353)
* Make submodules shallow

* Updated README for the shallow checkout changes.

[ROCm/rccl commit: 7231808c58]
2024-09-27 11:07:16 -06:00
Nilesh M Negi 60ee54839c Add Dockerfile to build rccl and rccl-tests (#1011)
* [BUILD] Add Dockerfile for RCCL and RCCL-Tests

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Update docker/Dockerfile.ubuntu

Typo for LD_LIBRARY_PATH

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

* Update docker/Dockerfile.ubuntu

use `-b` for `git clone` instead of additional `git checkout`

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

* Update docker/Dockerfile.ubuntu

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 707377b3cd]
2024-09-22 03:53:16 -05:00
corey-derochie-amd 9ffd893c5a Re-enabled MSCCL++ (#1325)
* Added restrictions around calling MSCCL++ collectives (#1281)

* Added restriction to non-zero 32-byte multiple message sizes to MSCCL++ AllGather.

* Renamed and refactored some mscclpp types.

* Only transmit the MSCCL++ unique id for non-split comm init. For splitting comm, it has already been transmitted. Instead, save the MSCCL++ communicator in child communicators when calling `ncclCommSplit`. Only destroy MSCCL++ communicators when no RCCL communicators remain that use it. Also improved trace logging.

* Disable MSCCL++ when using managed memory buffers as it isn't supported.

* Added datatype and op constraints for MSCCL++ AllReduce.

* Added documentation on MSCCL++ restrictions to the README.

* [BUILD] Support custom CMake flags in MSCCLPP (#1275)

* [BUILD] Support custom CMAKE_PREFIX_PATH in MSCCLPP

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* [BUILD] CMake flags to support build-id in MSCCLPP

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* [BUILD] Fix CMake warnings in MSCCLPP build

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Wrapped all cmake arguments passed to mscclpp to remove empty arguments and properly format them.

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Corey Derochie <corey.derochie@amd.com>

* Link to libmscclpp_nccl statically (#1282)

* Switched mscclpp_nccl to static linking. Added a build step to rename the NCCL API functions.

* Undid separation of building libmscclpp_nccl from building librccl with MSCCL++ integration. With a static build, it's either fully enabled or fully disabled.

* `nm` isn't always available in docker containers due to being stripped down. Removed use of `nm` in `cmake` and hard-coded the output into mscclpp_nccl_syms.txt.

* Removed IBVerbs dependency for integrating with MSCCL++ (#1313)

* Renamed `RCCL_ENABLE_MSCCLPP` to `RCCL_MSCCLPP_ENABLE` to conform to MSCCL. Set `RCCL_MSCCLPP_ENABLE` to 1 by default if `ENABLE_MSCCLPP` is defined, or 0 otherwise. Added a log warning if `RCCL_MSCCLPP_ENABLE` is set to 1 but `ENABLE_MSCCLPP` is not defined. (#1294)

* Include mscclpp as a git submodule (#1314)

* Added the desired mscclpp commit as a git submodule.

* Added step to automatically checkout the mscclpp submodule if it isn't already present, in case the user forgot to clone recursively.

* Added instruction to README to clone using --recurse-submodules to get the mscclpp submodule.

* Enabled MSCCL++ feature build.

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 736a705875]
2024-09-11 09:55:16 -06:00
Ziyue Yang 7830806b4b Revise MSCCL link in README to Azure repo (#1311)
[ROCm/rccl commit: 8282baae7f]
2024-09-05 17:10:49 -05:00
randyh62 e2d093cc3a Update README.md (#1321)
update note formatting

[ROCm/rccl commit: 4e2eeafdf6]
2024-09-05 14:23:36 -07:00
randyh62 0f98c58804 what-is-rccl (#1312)
* what-is-rccl

* create Installation instructions from README

* update README link

* Add using-nccl

* Add note about docs

* correct doc path

* sources to source

* correct docs link

[ROCm/rccl commit: 391c7ea070]
2024-09-05 06:54:48 -07:00
corey-derochie-amd dc04844405 Disable MSCCL for the non-multi-process case by default (#1307)
* Added `RCCL_MSCCL_ENABLE_SINGLE_PROCESS` runtime flag to return to the original MSCCL enablement behaviour except when explicitly enabling for multi-thread.

* Added documentation for the new `RCCL_MSCCL_ENABLE_SINGLE_PROCESS` runtime env var.

[ROCm/rccl commit: e056fe8f7e]
2024-09-04 11:11:50 -06:00
Nilesh M Negi 3e52d15ced [README] Tips on using less than 8 MI300 GPUs (#1270)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: a2474846f5]
2024-08-06 11:12:09 -05:00
Nilesh M Negi 35f4a405f0 [BUILD] Update gfxTargets for ASAN build (#1242)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 4f31ab85ea]
2024-08-06 10:53:51 -05:00
corey-derochie-amd b8542c2477 Integrated RCCL with MSCCL++ for small message sizes (#1231)
[ROCm/rccl commit: 6dc47eecd7]
2024-07-12 15:32:58 -06:00
Nilesh M Negi 7ca67f1cb9 [BUILD] Update install.sh for RCCL build (#1191)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 5aaf7121d9]
2024-05-31 17:58:34 -05:00
corey-derochie-amd 62a6a07d49 Replaced ROCmSoftwarePlatform and RadeonOpenCompute links with ROCm links. (#1125)
[ROCm/rccl commit: 503a472a25]
2024-03-25 16:29:13 -06:00
Bertan Dogancay cee279fd99 Implement ROCTX (#1094)
* Implement roctx

[ROCm/rccl commit: b617aecc31]
2024-02-27 15:46:15 -07:00
Bertan Dogancay 11674674fc [DEV] Configure functions in RCCL (#986)
* configure functions in rccl

[ROCm/rccl commit: 28d9b170c9]
2024-01-18 15:07:16 -07:00
searlmc1 b5642f39ed Update README.md (#955)
Remove references to HCC, which was removed from ROCm ~2yrs ago

[ROCm/rccl commit: 15fa77bb57]
2023-11-15 18:01:45 -08:00
Bertan Dogancay 1a538d0218 Update install.sh --fast and README (#924)
[ROCm/rccl commit: 3807c203fc]
2023-10-19 16:35:10 -06:00
gilbertlee-amd 2bda28cf7e Limiting # parallel jobs in install script to 16 by default, and new -j/--jobs flag (#785)
[ROCm/rccl commit: bb55848450]
2023-06-22 14:30:44 -06:00
Sam Wu 5168be1867 Update Read the Docs, documentation, and dependabot (#772)
* update documentation

add version number to documentation

rename .sphinx/.doxygen to sphinx/doxygen

enable htmlzip, pdf, epub formats when publishing on Read the Docs

* add noCI label for dependabot PRs

since RTD CI is separate from math lib CI

* update rocm-docs-core to v0.13.4

* update README with link to rocm.docs.amd.com

[ROCm/rccl commit: c3f47853bd]
2023-06-07 15:31:58 -06:00
gilbertlee-amd d2c1295f79 Refactoring CMakeFiles (#755)
[ROCm/rccl commit: 777d8747a5]
2023-05-25 16:08:54 -06:00
akolliasAMD 7c15eeb38a added npkit_enable on CI tests (#698)
[ROCm/rccl commit: 9fe5a349f1]
2023-04-05 08:05:23 -06:00
Sam Wu 1e1be0c808 Fix Docs static analysis (#708)
* remove ref to old script in static analysis

* update README with doc build instructions

update gitignore with doc artifacts

[ROCm/rccl commit: 8c56e6c892]
2023-03-16 13:12:43 -06:00
Saad Rahim 8fdc4795fd Standard template implementation (#703)
[ROCm/rccl commit: 6e48e518d9]
2023-03-13 11:00:57 -06:00
PedramAlizadeh 1fe26823f5 Changed the name of UnitTests to rccl-UnitTests (wrapper executable included).
[ROCm/rccl commit: 45872d170f]
2022-12-13 21:45:57 +00:00
akolliasAMD 2a1d472a20 updated readme to reflect the newer tests
[ROCm/rccl commit: 5950942738]
2022-07-13 16:08:28 +00:00
Wenkai Du 9a9d9cb29b README.md: add CMAKE_PREFIX_PATH to build steps (#581)
[ROCm/rccl commit: 314da5a485]
2022-07-12 11:32:07 -07:00
Ziyue Yang 2b418b5dee Add Feature - Add NPKit Support in RCCL (#564)
* apply npkit

* fix bug

* add npkit in readme

[ROCm/rccl commit: 6e93fafdc3]
2022-06-20 14:30:19 -07:00
gilbertlee-amd a2a4888497 Moving opt-in custom signal handler from UnitTests into RCCL (#550)
* Enable via RCCL_ENABLE_SIGNALHANDLER=1

[ROCm/rccl commit: 700b473211]
2022-05-20 09:56:38 -06:00
gilbertlee-amd e2bf842e85 Update README.md (#364)
- Remove outdated HIP Direct call requirements
- Remove outdated chrpath requirement
- Adding section about HSA_FORCE_FINE_GRAIN_PCIE

[ROCm/rccl commit: b122dcd991]
2021-05-11 13:41:41 -06:00
Cory Bloor 6c28ab176f Add Jenkins docs build (#18)
* Fix typo in copyright

* Minor README improvements

- Prevent underscores from being interpreted as italics in test name format.
- Switch URL to HTTPS.

* Update docs scripts config

- Allow run_doc.sh and run_doxygen.sh to be called from any directory.

* Add docs build to Jenkins

[ROCm/rccl commit: 8aea5edb29]
2021-02-18 16:37:37 -07:00
Wenkai Du ab71643c99 Merge remote-tracking branch 'nccl/master' into 2.8.3
[ROCm/rccl commit: c985358e11]
2021-02-15 18:44:47 -05:00
Sylvain Jeaugey fc7bdb38a5 2.8.4-1
Fix hang in corner cases of alltoallv using point to point send/recv.
Harmonize error messages.
Fix missing NVTX section in the license.
Update README.


[ROCm/rccl commit: 911d61f214]
2021-02-09 15:36:48 -08:00
Stanley Tsang 209133fadf Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos (#265)
* Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos

* Removing potentially unnecessary dependencies from install script

[ROCm/rccl commit: 8c90aefb6d]
2020-09-10 17:27:22 -06:00
Stanley Tsang bbc4b72ebe Adding static library building option. (#244)
* Adding static library building option.

* Disabling running tests for static build

* Removing static packaging in CI

Co-authored-by: Saad Rahim <saad.rahim@amd.com>

[ROCm/rccl commit: c5d4d9eb76]
2020-08-06 11:19:43 -06:00
Stanley Tsang dafe176570 Documentation updates for NCCL 2.7.0 (#219)
* Making hip-clang the default compiler; documentation update

* Adding back --hip-clang to install.sh as a silent option for CI

* Documentation updates for NCCL 2.7

* Restoring deleted line in install script

[ROCm/rccl commit: 8d21adb5e3]
2020-06-16 16:48:11 -06:00
Stanley Tsang 8e43b854f1 Making hip-clang the default compiler; documentation update (#216)
* Making hip-clang the default compiler; documentation update

* Adding back --hip-clang to install.sh as a silent option for CI

[ROCm/rccl commit: dc403e0ca2]
2020-06-04 11:58:27 -06:00
Stanley Tsang e35e4d3401 Updating README and readthedocs documentation.
[ROCm/rccl commit: b59b9d328b]
2020-05-12 20:11:49 +00:00
Stanley Tsang e5419407c4 Updating copyright notices for 2020.
[ROCm/rccl commit: 20fa04d9b6]
2020-01-29 15:28:08 -08:00
Wenkai Du fedce64117 Change manual build instructions to fit most common usage
[ROCm/rccl commit: 00a910c2da]
2019-11-26 12:40:26 -08:00
Stanley Tsang 6aa817d768 Fixing install script to actually install library when requested (#88)
* Fixing install script to actually install library when requested.  Cleaning up unused code.

Removing unused arguments from install script.

Fixing weird whitespacing

* Fixing install script to install to correct location /opt/rocm, now creates symlink in /opt/rocm/lib

* Updates and corrections to README and install script


[ROCm/rccl commit: 329a62a01f]
2019-06-25 17:25:21 -06:00
Saad Rahim 07d0f15687 Fixing whitespace
[ROCm/rccl commit: 02ef2d27e6]
2019-05-24 14:49:12 -07:00
Saad Rahim 7d340ae2a2 Adding link to readthedocs
[ROCm/rccl commit: fac7ef9370]
2019-05-24 14:48:24 -07:00