74 Commits

Author SHA1 Message Date
Marzieh Berenjkoub d7293281f3 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 858b4e76eb]
2026-01-20 13:04:02 -06:00
Nusrat Islam eb347a0dd3 GDA support for alltoall via rocshmem integration (#2099)
* ROCSHMEM linking/building to match MSCCL++ style

* add rocSHMEM as a submodule

* Move rocSHMEM submodule to ext-src/rocSHMEM

* Adding submodule support proper, as well as a patch for rocshmem

* Cleaning up INCLUDE_DIR vs INCLUDE_DIRS mixup

* updating patch file

* Pointing rocshmem submodule to edgars fixup patch

* Adding IBVERBS link to the submodule build

* More IBVERBS patching

* pin rocshmem submodule to b534423

* Adding IPC support in rocSHMEM build

* updating rocshmem submodule to resolve CQ errors

* Updating submodule to include recent a2a optimizations

* invoke rocshmem alltoall from rccl

* Updating submodule to CQ error number hang

* Updating submodule to include a2a improvements and bug fixes

* Updating submodule to point to Yiltan's fork and doorbell ring removal commit

* Updating hash to correspond with submodule change

* Updating to no-ctx wg call and updating submodule

* copy-in/copy-out using multiple CUs

* Updating rocSHMEM submodule to include doorbell improvements

* updating gitmodule to point to upstream

* code cleanup and adjust threshold

* guard rocshmem a2a invocation

* Only build with rocshmem when specified

* code cleanup

* address review comments

* Removing debugging failure case

Signed-off-by: Thomas Huber <thomas.huber@amd.com>

* whitespace fix

* Adding rocshmem compile guard

* Removing unnecessary comment

Signed-off-by: Thomas Huber <thomas.huber@amd.com>

* remove commented lines

* address review comments

* cleanup

---------

Signed-off-by: Thomas Huber <thomas.huber@amd.com>
Co-authored-by: Thomas Huber <thomas.huber@amd.com>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k12-27.cs-aus.dcgpu>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>
Co-authored-by: Islam <nusislam@amd.com>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-03.cs-aus.dcgpu>

[ROCm/rccl commit: 27648b0900]
2026-01-09 14:04:54 -06:00
Mustafa Abduljabbar 2621e0254e [Device] WarpSpeed enablement and single node CU and perf opt for MI350 (#2073)
[ROCm/rccl commit: d009ab144e]
2025-12-11 19:04:35 -05:00
Arm Patinyasakdikul f81bb04bff Added install.sh flag to suppress warnings. (#2054)
[ROCm/rccl commit: 461e61d10e]
2025-11-17 00:35:06 -06:00
alex-breslow-amd d51ed2fdfd Dump compiler-determined GPU kernel resource usage (#1965)
Adds --kernel-resource-use flag to install.sh to allow dumping per-GPU kernel resource use at compile time (e.g., VGPRs, LDS, SGPRs, scratch, etc.)


[ROCm/rccl commit: ff209e5b19]
2025-10-13 11:24:42 -05:00
alex-breslow-amd 8c8c6886bc Implement disassembling library into assembly with source code (#1714)
- Add --dump-asm to install.sh to dump assembly from the RCCL library

[ROCm/rccl commit: 8d6e21285c]
2025-09-23 10:11:32 -07:00
Bertan Dogancay 546b37e35a [BUILD] Stop generating sym kernels by default (#1907)
* Stop generating sym kernels by default

[ROCm/rccl commit: 93d86dd8e3]
2025-09-15 12:19:35 -04:00
Avinash 832c5b1f13 [build] Disable MSCCL++ compilation by default (#1879)
* Enable MSCCLPP on request

* Updating docs and README

* Updates to CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Updates to CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

* Update CHANGELOG.md

GitHub didn't take the edit to my suggestion properly.

---------

Co-authored-by: amd <amd@super3.amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: a0ec15bafe]
2025-08-28 08:52:12 -06:00
Atul Kulkarni e550ba1e3b Update help text in README (#1837)
[ROCm/rccl commit: e2c9f2feab]
2025-08-01 14:19:27 -05:00
Rahul Vaidya d65eb0b021 Fix RHEL10 packaging for rcclras and rccl-UnitTests (#1831)
Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl commit: 0adc5edc74]
2025-07-31 11:00:49 -05:00
Mustafa Abduljabbar b3a0cc5e96 Add optional bf16 software-triggered pipelining for reduceCopyPacks (#1758)
- Introduced double-buffering to reduce copy overhead and overlap BF16 arithmetic with data prefetching.
- Aimed to improve performance of reduction-based collectives by up to 10%.
- Implemented based on recommendations from Guennadi Riguer (AMD)
- Added --force-reduce-pipeline option to install.sh to activate this optimization for BF16 reductions.
- Feature is disabled by default to prevent regressions with large messages until auto-tuning logic is upstreamed.
---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: Pedram Alizadeh <pmohamma@amd.com>

[ROCm/rccl commit: 0ce20e7e07]
2025-07-25 10:57:05 -04:00
Atul Kulkarni c94fb7c58e Code coverage improvements (#1665)
* Increased max stack size to 640

* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.

[ROCm/rccl commit: 275fdd43c1]
2025-07-17 11:20:49 -05:00
Nilesh M Negi 0c41f27b10 [BUILD] Move NPKit flags from install.sh to CMakeLists.txt (#1741)
[ROCm/rccl commit: 568777a9bf]
2025-06-23 21:51:49 -05:00
jonatluu 590fb2798b Remove File reorganization backward compatibility (rccl) (#1753)
[ROCm/rccl commit: 709140204a]
2025-06-22 17:18:26 -05:00
Atul Kulkarni 4cd71722f2 Added new ENABLE_CODE_COVERAGE option. (#1664)
Modified install.sh script to add this new option

[ROCm/rccl commit: 682ed36fe6]
2025-06-10 12:12:36 -05:00
isaki001 de76d7f649 Add Compilation Flag for enabling/disabling clipping, and tune number of blocks for mscclpp allreduce8 (#1607)
* mscclpp patch apply clip patch and set allreduce8 blocks from 512 to 1024

* add compilation flag for enabling/disabling clipping in mscclpp

* change flag name for consistency, set flag to OFF

* add compilation flag in rccl for enabling clipping in mscclpp

* set 1024 threads for mscclpp allreduce8 only for bfloat16

* fix improper description for ENABLE_MSCCLPP_CLIP flag

* Revert "Merge branch 'clip-patch' of https://github.com/isaki001/rccl into clip-patch"

This reverts commit 6e31857a9db98314b8a748eb024f2c3699ebe2d5, reversing
changes made to 193f4caa8ffa78b4e056893212fd8344aa14e937.

* update clip remove-clip.patch for rebase

[ROCm/rccl commit: 8145c4f3b8]
2025-04-30 16:42:28 -05:00
Wenkai Du 086fa823db NPKit: enable reduce scatter profiling (#1580)
[ROCm/rccl commit: f957c4fe22]
2025-03-04 10:03:56 -08:00
Mustafa Abduljabbar f58025185e Add IB verbs logging and enable traces through install.sh (#1511)
* Add IB Verbs logging

* Simplify tracing and undo debug.h changes

* Update debug.h

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Exchange remote comm device index

[ROCm/rccl commit: dc75209dd7]
2025-01-31 12:35:39 -05:00
Bertan Dogancay a781a3033b [Profiler] Enable ROCTX during build by default (#1506)
* Enable ROCTX during build by default

* Check for roctx support in cmake

[ROCm/rccl commit: 35fe9e06f3]
2025-01-29 11:29:46 -05:00
akolliasAMD c65d4ab18f changed the CMake option from AMDGPU_TARGETS to GPU_TARGETS (#1440)
[ROCm/rccl commit: 45c1c1a781]
2024-12-12 12:09:30 -07:00
Jeffrey Novotny 1d1e17b3c9 Refactor RCCL install guide into several pages (#1427)
* Refactor RCCL install guide into several pages

* Changes from code review and new docker guide

* Add missing entries to ToC

* Minor fixes

* Fix help strings

* Edits after review and remove extra white space

[ROCm/rccl commit: bf7c130631]
2024-11-27 15:34:26 -05:00
gilbertlee-amd d4094525c8 Fixing install.sh to properly accept spaces in ONLY_FUNCS (#1339)
[ROCm/rccl commit: 575afee5de]
2024-09-18 17:25:36 -06:00
corey-derochie-amd 9ffd893c5a Re-enabled MSCCL++ (#1325)
* Added restrictions around calling MSCCL++ collectives (#1281)

* Added restriction to non-zero 32-byte multiple message sizes to MSCCL++ AllGather.

* Renamed and refactored some mscclpp types.

* Only transmit the MSCCL++ unique id for non-split comm init. For splitting comm, it has already been transmitted. Instead, save the MSCCL++ communicator in child communicators when calling `ncclCommSplit`. Only destroy MSCCL++ communicators when no RCCL communicators remain that use it. Also improved trace logging.

* Disable MSCCL++ when using managed memory buffers as it isn't supported.

* Added datatype and op constraints for MSCCL++ AllReduce.

* Added documentation on MSCCL++ restrictions to the README.

* [BUILD] Support custom CMake flags in MSCCLPP (#1275)

* [BUILD] Support custom CMAKE_PREFIX_PATH in MSCCLPP

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* [BUILD] CMake flags to support build-id in MSCCLPP

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* [BUILD] Fix CMake warnings in MSCCLPP build

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Wrapped all cmake arguments passed to mscclpp to remove empty arguments and properly format them.

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Corey Derochie <corey.derochie@amd.com>

* Link to libmscclpp_nccl statically (#1282)

* Switched mscclpp_nccl to static linking. Added a build step to rename the NCCL API functions.

* Undid separation of building libmscclpp_nccl from building librccl with MSCCL++ integration. With a static build, it's either fully enabled or fully disabled.

* `nm` isn't always available in docker containers due to being stripped down. Removed use of `nm` in `cmake` and hard-coded the output into mscclpp_nccl_syms.txt.

* Removed IBVerbs dependency for integrating with MSCCL++ (#1313)

* Renamed `RCCL_ENABLE_MSCCLPP` to `RCCL_MSCCLPP_ENABLE` to conform to MSCCL. Set `RCCL_MSCCLPP_ENABLE` to 1 by default if `ENABLE_MSCCLPP` is defined, or 0 otherwise. Added a log warning if `RCCL_MSCCLPP_ENABLE` is set to 1 but `ENABLE_MSCCLPP` is not defined. (#1294)

* Include mscclpp as a git submodule (#1314)

* Added the desired mscclpp commit as a git submodule.

* Added step to automatically checkout the mscclpp submodule if it isn't already present, in case the user forgot to clone recursively.

* Added instruction to README to clone using --recurse-submodules to get the mscclpp submodule.

* Enabled MSCCL++ feature build.

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 736a705875]
2024-09-11 09:55:16 -06:00
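
Two of the rules described in the "Re-enabled MSCCL++ (#1325)" entry above can be sketched in a few lines: the default for `RCCL_MSCCLPP_ENABLE` and the AllGather message-size restriction. This is only an illustration of the logic as the commit message words it, not RCCL's code. The function names, the `built_with_mscclpp` parameter (standing in for the `ENABLE_MSCCLPP` compile definition), the warning text, and the choice to treat the feature as off when it was not compiled in are all assumptions.

```python
import os


def mscclpp_enabled(built_with_mscclpp, env=None):
    """Hypothetical helper: resolve RCCL_MSCCLPP_ENABLE.

    Default is "1" when the build defined ENABLE_MSCCLPP, "0" otherwise.
    """
    env = os.environ if env is None else env
    default = "1" if built_with_mscclpp else "0"
    requested = env.get("RCCL_MSCCLPP_ENABLE", default) == "1"
    if requested and not built_with_mscclpp:
        # The commit message only mentions a log warning; returning False
        # here is an assumption made for this sketch.
        print("WARN: RCCL_MSCCLPP_ENABLE=1 but ENABLE_MSCCLPP is not defined")
        return False
    return requested


def allgather_size_ok(nbytes):
    """Hypothetical check: size must be a non-zero multiple of 32 bytes."""
    return nbytes > 0 and nbytes % 32 == 0
```

Passing a plain dictionary as `env` keeps the sketch independent of the real process environment, for example `mscclpp_enabled(True, {"RCCL_MSCCLPP_ENABLE": "0"})`.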
Nilesh M Negi 713ed3341d [BUILD] Disable MSCCLPP build by default (#1283)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: cb2e0615d7]
2024-08-02 23:17:51 -05:00
corey-derochie-amd b8542c2477 Integrated RCCL with MSCCL++ for small message sizes (#1231)
[ROCm/rccl commit: 6dc47eecd7]
2024-07-12 15:32:58 -06:00
corey-derochie-amd 37bf54b8f8 Enable multi-threading for MSCCL (#1203)
MSCCL can now run in a multi-threaded configuration. To test in the unit tests, added the ENABLE_OPENMP compile definition flag and the --openmp-test-enable flag to the unit test build script. To activate, set the environment variables UT_MULTITHREADED=1 and UT_PROCESS_MASK=1. Set Jenkins to use this mode.

[ROCm/rccl commit: 0c36d571ea]
2024-07-04 09:34:38 -06:00
Nilesh M Negi 7ca67f1cb9 [BUILD] Update install.sh for RCCL build (#1191)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 5aaf7121d9]
2024-05-31 17:58:34 -05:00
Wenkai Du 0ff5fc0bad npkit: add broadcast trace (#1166)
[ROCm/rccl commit: a0cef69110]
2024-05-07 14:00:16 -07:00
Bertan Dogancay cee279fd99 Implement ROCTX (#1094)
* Implement roctx

[ROCm/rccl commit: b617aecc31]
2024-02-27 15:46:15 -07:00
Bertan Dogancay 45ed3ef4e7 Nvtx support (#1076)
* NVTX support

[ROCm/rccl commit: 8a442faa12]
2024-02-08 14:08:24 -07:00
Bertan Dogancay 11674674fc [DEV] Configure functions in RCCL (#986)
* configure functions in rccl

[ROCm/rccl commit: 28d9b170c9]
2024-01-18 15:07:16 -07:00
Nilesh M Negi c1acf97c05 Remove FORCE from AMDGPU_TARGETS and add support in install script (#989)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 414884c6cb]
2024-01-09 13:29:47 -06:00
Wenkai Du b736b506c0 NPKit: misc fixes for MSCCL (#936)
* msccl: add xcc_id to timestamp sync

* NPKit: add timestamp for rrc operator

* NPKit: add timestamp for MSCCL init

[ROCm/rccl commit: a497722894]
2023-10-30 10:00:12 -07:00
Bertan Dogancay 1a538d0218 Update install.sh --fast and README (#924)
[ROCm/rccl commit: 3807c203fc]
2023-10-19 16:35:10 -06:00
Edgar Gabriel e6c3e9fd8e turn bfd compilation off by default
revert the logic to ensure that we are not accidentally creating
a dependency on the bfd libraries when deploying rccl binaries.


[ROCm/rccl commit: 88a55cef83]
2023-09-29 20:25:33 +00:00
akolliasAMD 12b2fc9774 install.sh fix (#903)
[ROCm/rccl commit: a773def279]
2023-09-29 07:42:17 -06:00
Cen Zhao d3c20a1210 Update install.sh to take "--static" option (#894)
* Update install.sh to take "--static" option

* Fix static build errors

---------

Co-authored-by: BertanDogancay <bertan.dogancay@gmail.com>

[ROCm/rccl commit: fb57a438d7]
2023-09-27 12:45:21 -04:00
Nusrat Islam ffbfe43500 msccl: add NPKIT profiling for MSCCL send-recv
[ROCm/rccl commit: a283f55f12]
2023-09-08 13:11:16 -05:00
arvindcheru 5e60fb93d5 366827 - Disable file reorg backward compatibility support by default (#849)
* Disable file reorg backward compatibility support by default

- File Reorg backward compatibility option set to OFF

* Update install.sh

[ROCm/rccl commit: 6ee758382e]
2023-08-22 09:14:49 -04:00
Ziyue Yang 18811f6159 NPKit update (#844)
* NPKit update

1. Enable NPKit for MSCCL kernels
2. Fix NPKit context index calculation for sendrecv kernels

* Update build script for npkit

[ROCm/rccl commit: d33a70e620]
2023-08-08 17:30:40 -07:00
Bertan Dogancay 74dd9c4807 Disable MSCCL kernels at compile time (#834)
* Disable MSCCL kernels at compile time

[ROCm/rccl commit: 64c32d1c5b]
2023-08-02 09:45:18 -06:00
Wenkai Du 3ce8711153 npkit: separate network timing between send and test (#798)
[ROCm/rccl commit: 0f14e5a640]
2023-07-10 09:31:49 -07:00
akolliasAMD e1ac484d4e added npkit support into the all_gather run ring algorithm (#790)
[ROCm/rccl commit: 9bba4a2f2a]
2023-06-29 13:59:54 -06:00
gilbertlee-amd 2bda28cf7e Limiting # parallel jobs in install script to 16 by default, and new -j/--jobs flag (#785)
[ROCm/rccl commit: bb55848450]
2023-06-22 14:30:44 -06:00
Bertan Dogancay d411d52b19 Disable Colltrace for --fast option (#778)
* Disable Colltrace for --fast option

* Limit nprocs for CI

[ROCm/rccl commit: 0c77c66221]
2023-06-21 14:16:09 -06:00
Bertan Dogancay 1ae071944e improve compilation time and create timetrace plot (#773)
* improve compilation time and create time-trace plot

* set default value for nproc

[ROCm/rccl commit: f35777e9b0]
2023-06-14 09:17:51 -06:00
gilbertlee-amd d2c1295f79 Refactoring CMakeFiles (#755)
[ROCm/rccl commit: 777d8747a5]
2023-05-25 16:08:54 -06:00
akolliasAMD f371afe6fd updated install script to enable all of npkit (#754)
[ROCm/rccl commit: 58db1cb96d]
2023-05-24 14:44:01 -06:00
akolliasAMD 7c15eeb38a added npkit_enable on CI tests (#698)
[ROCm/rccl commit: 9fe5a349f1]
2023-04-05 08:05:23 -06:00
PedramAlizadeh 1fe26823f5 Changed the name of UnitTests to rccl-UnitTests (wrapper executable included).
[ROCm/rccl commit: 45872d170f]
2022-12-13 21:45:57 +00:00