Commit graph

74 commits

Author SHA1 Message Date
Marzieh Berenjkoub 858b4e76eb Merge remote-tracking branch 'nccl/master' into develop 2026-01-20 13:04:02 -06:00
Nusrat Islam 27648b0900 GDA support for alltoall via rocshmem integration (#2099)
* ROCSHMEM linking/building to match MSCCL++ style

* add rocSHMEM as a submodule

* Move rocSHMEM submodule to ext-src/rocSHMEM

* Adding submodule support proper, as well as a patch for rocshmem

* Cleaning up INCLUDE_DIR vs INCLUDE_DIRS mixup

* updating patch file

* Pointing rocshmem submodule to edgars fixup patch

* Adding IBVERBS link to the submodule build

* More IBVERBS patching

* pin rocshmem submodule to b534423

* Adding IPC support in rocSHMEM build

* updating rocshmem submodule to resolve CQ errors

* Updating submodule to include recent a2a optimizations

* invoke rocshmem alltoall from rccl

* Updating submodule to CQ error number hang

* Updating submodule to include a2a improvements and bug fixes

* Updating submodule to point to Yiltan's fork and doorbell ring removal commit

* Updating hash to correspond with submodule change

* Updating to no-ctx wg call and updating submodule

* copy-in/copy-out using multiple CUs

* Updating rocSHMEM submodule to include doorbell improvs

* updating gitmodule to point to upstream

* code cleanup and adjust threshold

* guard rocshmem a2a invocation

* Only build with rocshmem when specified

* code cleanup

* address review comments

* Removing debugging failure case

Signed-off-by: Thomas Huber <thomas.huber@amd.com>

* whitespace fix

* Adding rocshmem compile guard

* Removing unnecessary comment

Signed-off-by: Thomas Huber <thomas.huber@amd.com>

* remove commented lines

* address review comments

* cleanup

---------

Signed-off-by: Thomas Huber <thomas.huber@amd.com>
Co-authored-by: Thomas Huber <thomas.huber@amd.com>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k12-27.cs-aus.dcgpu>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>
Co-authored-by: Islam <nusislam@amd.com>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-03.cs-aus.dcgpu>
2026-01-09 14:04:54 -06:00
Mustafa Abduljabbar d009ab144e [Device] WarpSpeed enablement and single node CU and perf opt for MI350 (#2073) 2025-12-11 19:04:35 -05:00
Arm Patinyasakdikul 461e61d10e Added install.sh flag to suppress warnings. (#2054) 2025-11-17 00:35:06 -06:00
alex-breslow-amd ff209e5b19 Dump compiler-determined GPU kernel resource usage (#1965)
Adds --kernel-resource-use flag to install.sh to allow dumping per-GPU kernel resource use at compile time (e.g., VGPRs, LDS, SGPRs, scratch, etc.)
2025-10-13 11:24:42 -05:00
alex-breslow-amd 8d6e21285c Implement disassembling library into assembly with source code (#1714)
- Add --dump-asm to install.sh to dump assembly from RCCL library
2025-09-23 10:11:32 -07:00
Bertan Dogancay 93d86dd8e3 [BUILD] Stop generating sym kernels by default (#1907)
* Stop generating sym kernels by default
2025-09-15 12:19:35 -04:00
Avinash a0ec15bafe [build] Disable MSCCL++ compilation by default (#1879)
* Enable MSCCLPP on request

* Updating docs and README

* Updates to CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Updates to CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

* Update CHANGELOG.md

Github didn't take the edit to my suggestion properly.

---------

Co-authored-by: amd <amd@super3.amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
2025-08-28 08:52:12 -06:00
Atul Kulkarni e2c9f2feab Update help text in README (#1837) 2025-08-01 14:19:27 -05:00
Rahul Vaidya 0adc5edc74 Fix RHEL10 packaging for rcclras and rccl-UnitTests (#1831)
Signed-off-by: ravaidya <ravaidya@amd.com>
2025-07-31 11:00:49 -05:00
Mustafa Abduljabbar 0ce20e7e07 Add optional bf16 software-triggered pipelining for reduceCopyPacks (#1758)
- Introduced double-buffering to reduce copy overhead and overlap BF16 arithmetic with data prefetching.
- Aimed to improve performance of reduction-based collectives by up to 10%.
- Implemented based on recommendations from Guennadi Riguer (AMD)
- Added --force-reduce-pipeline option to install.sh to activate this optimization for BF16 reductions.
- Feature is disabled by default to prevent regressions with large messages until auto-tuning logic is upstreamed.
---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: Pedram Alizadeh <pmohamma@amd.com>
2025-07-25 10:57:05 -04:00
Atul Kulkarni 275fdd43c1 Code coverage improvements (#1665)
* Increased max stack size to 640

* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.
2025-07-17 11:20:49 -05:00
Nilesh M Negi 568777a9bf [BUILD] Move NPKit flags from install.sh to CMakeLists.txt (#1741) 2025-06-23 21:51:49 -05:00
jonatluu 709140204a Remove File reorganization backward compatibility (rccl) (#1753) 2025-06-22 17:18:26 -05:00
Atul Kulkarni 682ed36fe6 Added new ENABLE_CODE_COVERAGE option. (#1664)
Modified install.sh script to add this new option
2025-06-10 12:12:36 -05:00
isaki001 8145c4f3b8 Add Compilation Flag for enabling/disabling clipping, and tune number of blocks for mscclpp allreduce8 (#1607)
* mscclpp patch apply clip patch and set allreduce8 blocks from 512 to 1024

* add compilation flag for enabling/disabling clipping in mscclpp

* change flag name for consistency, set flag to OFF

* add compilation flag in rccl for enabling clipping in mscclpp

* set 1024 threads for mscclpp allreduce8 only for bfloat16

* fix improper description for ENABLE_MSCCLPP_CLIP flag

* Revert "Merge branch 'clip-patch' of https://github.com/isaki001/rccl into clip-patch"

This reverts commit 6e31857a9db98314b8a748eb024f2c3699ebe2d5, reversing
changes made to 193f4caa8ffa78b4e056893212fd8344aa14e937.

* update clip remove-clip.patch for rebase
2025-04-30 16:42:28 -05:00
Wenkai Du f957c4fe22 NPKit: enable reduce scatter profiling (#1580) 2025-03-04 10:03:56 -08:00
Mustafa Abduljabbar dc75209dd7 Add IB verbs logging and enable traces through install.sh (#1511)
* Add IB Verbs logging

* Simplify tracing and undo debug.h changes

* Update debug.h

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

* Exchange remote comm device index
2025-01-31 12:35:39 -05:00
Bertan Dogancay 35fe9e06f3 [Profiler] Enable ROCTX during build by default (#1506)
* Enable ROCTX during build by default

* Check for roctx support in cmake
2025-01-29 11:29:46 -05:00
akolliasAMD 45c1c1a781 changed the CMake option from AMDGPU_TARGETS to GPU_TARGETS (#1440) 2024-12-12 12:09:30 -07:00
Jeffrey Novotny bf7c130631 Refactor RCCL install guide into several pages (#1427)
* Refactor RCCL install guide into several pages

* Changes from code review and new docker guide

* Add missing entries to ToC

* Minor fixes

* Fix help strings

* Edits after review and remove extra white space
2024-11-27 15:34:26 -05:00
gilbertlee-amd 575afee5de Fixing install.sh to properly accept spaces in ONLY_FUNCS (#1339) 2024-09-18 17:25:36 -06:00
corey-derochie-amd 736a705875 Re-enabled MSCCL++ (#1325)
* Added restrictions around calling MSCCL++ collectives (#1281)

* Added restriction to non-zero 32-byte multiple message sizes to MSCCL++ AllGather.

* Renamed and refactored some mscclpp types.

* Only transmit the MSCCL++ unique id for non-split comm init. For splitting comm, it has already been transmitted. Instead, save the MSCCL++ communicator in child communicators when calling `ncclCommSplit`. Only destroy MSCCL++ communicators when no RCCL communicators remain that use it. Also improved trace logging.

* Disable MSCCL++ when using managed memory buffers as it isn't supported.

* Added datatype and op constraints for MSCCL++ AllReduce.

* Added documentation on MSCCL++ restrictions to the README.

* [BUILD] Support custom CMake flags in MSCCLPP (#1275)

* [BUILD] Support custom CMAKE_PREFIX_PATH in MSCCLPP

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* [BUILD] CMake flags to support build-id in MSCCLPP

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* [BUILD] Fix CMake warnings in MSCCLPP build

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Wrapped all cmake arguments passed to mscclpp to remove empty arguments and properly format them.

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Corey Derochie <corey.derochie@amd.com>

* Link to libmscclpp_nccl statically (#1282)

* Switched mscclpp_nccl to static linking. Added a build step to rename the NCCL API functions.

* Undid separation of building libmscclpp_nccl from building librccl with MSCCL++ integration. With a static build, it's either fully enabled or fully disabled.

* `nm` isn't always available in docker containers due to being stripped down. Removed use of `nm` in `cmake` and hard-coded the output into mscclpp_nccl_syms.txt.

* Removed IBVerbs dependency for integrating with MSCCL++ (#1313)

* Renamed `RCCL_ENABLE_MSCCLPP` to `RCCL_MSCCLPP_ENABLE` to conform to MSCCL. Set `RCCL_MSCCLPP_ENABLE` to 1 by default if `ENABLE_MSCCLPP` is defined, or 0 otherwise. Added a log warning if `RCCL_MSCCLPP_ENABLE` is set to 1 but `ENABLE_MSCCLPP` is not defined. (#1294)

* Include mscclpp as a git submodule (#1314)

* Added the desired mscclpp commit as a git submodule.

* Added step to automatically checkout the mscclpp submodule if it isn't already present, in case the user forgot to clone recursively.

* Added instruction to README to clone using --recurse-submodules to get the mscclpp submodule.

* Enabled MSCCL++ feature build.

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
2024-09-11 09:55:16 -06:00
Nilesh M Negi cb2e0615d7 [BUILD] Disable MSCCLPP build by default (#1283)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-08-02 23:17:51 -05:00
corey-derochie-amd 6dc47eecd7 Integrated RCCL with MSCCL++ for small message sizes (#1231) 2024-07-12 15:32:58 -06:00
corey-derochie-amd 0c36d571ea Enable multi-threading for MSCCL (#1203)
MSCCL can now run in a multi-threaded configuration. To test in the unit tests, added the ENABLE_OPENMP compile definition flag and the --openmp-test-enable flag to the unit test build script. To activate, set the environment variables UT_MULTITHREADED=1 and UT_PROCESS_MASK=1. Set Jenkins to use this mode.
2024-07-04 09:34:38 -06:00
Nilesh M Negi 5aaf7121d9 [BUILD] Update install.sh for RCCL build (#1191)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-05-31 17:58:34 -05:00
Wenkai Du a0cef69110 npkit: add broadcast trace (#1166) 2024-05-07 14:00:16 -07:00
Bertan Dogancay b617aecc31 Implement ROCTX (#1094)
* Implement roctx
2024-02-27 15:46:15 -07:00
Bertan Dogancay 8a442faa12 Nvtx support (#1076)
* NVTX support
2024-02-08 14:08:24 -07:00
Bertan Dogancay 28d9b170c9 [DEV] Configure functions in RCCL (#986)
* configure functions in rccl
2024-01-18 15:07:16 -07:00
Nilesh M Negi 414884c6cb Remove FORCE from AMDGPU_TARGETS and add support in install script (#989)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-01-09 13:29:47 -06:00
Wenkai Du a497722894 NPKit: misc fixes for MSCCL (#936)
* msccl: add xcc_id to timestamp sync

* NPKit: add timestamp for rrc operator

* NPKit: add timestamp for MSCCL init
2023-10-30 10:00:12 -07:00
Bertan Dogancay 3807c203fc Update install.sh --fast and README (#924) 2023-10-19 16:35:10 -06:00
Edgar Gabriel 88a55cef83 turn bfd compilation off by default
revert the logic to ensure that we are not accidentally creating
a dependency on the bfd libraries when deploying rccl binaries.
2023-09-29 20:25:33 +00:00
akolliasAMD a773def279 install.sh fix (#903) 2023-09-29 07:42:17 -06:00
Cen Zhao fb57a438d7 Update install.sh to take "--static" option (#894)
* Update install.sh to take "--static" option

* Fix static build errors

---------

Co-authored-by: BertanDogancay <bertan.dogancay@gmail.com>
2023-09-27 12:45:21 -04:00
Nusrat Islam a283f55f12 msccl: add NPKIT profiling for MSCCL send-recv 2023-09-08 13:11:16 -05:00
arvindcheru 6ee758382e 366827 - Disable file reorg backward compatibility support by default (#849)
* Disable file reorg backward compatibility support by default

- File Reorg backward compatibility option set to OFF

* Update install.sh
2023-08-22 09:14:49 -04:00
Ziyue Yang d33a70e620 NPKit update (#844)
* NPKit update

1. Enable NPKit for MSCCL kernels
2. Fix NPKit context index calculation for sendrecv kernels

* Update build script for npkit
2023-08-08 17:30:40 -07:00
Bertan Dogancay 64c32d1c5b Disable MSCCL kernels at compile time (#834)
* Disable MSCCL kernels at compile time
2023-08-02 09:45:18 -06:00
Wenkai Du 0f14e5a640 npkit: separate network timing between send and test (#798) 2023-07-10 09:31:49 -07:00
akolliasAMD 9bba4a2f2a added npkit support into the all_gather run ring algorithm (#790) 2023-06-29 13:59:54 -06:00
gilbertlee-amd bb55848450 Limiting # parallel jobs in install script to 16 by default, and new -j/--jobs flag (#785) 2023-06-22 14:30:44 -06:00
Bertan Dogancay 0c77c66221 Disable Colltrace for --fast option (#778)
* Disable Colltrace for --fast option

* Limit nprocs for CI
2023-06-21 14:16:09 -06:00
Bertan Dogancay f35777e9b0 improve compilation time and create timetrace plot (#773)
* improve compilation time and create time-trace plot

* set default value for nproc
2023-06-14 09:17:51 -06:00
gilbertlee-amd 777d8747a5 Refactoring CMakeFiles (#755) 2023-05-25 16:08:54 -06:00
akolliasAMD 58db1cb96d updated install script to enable all of npkit (#754) 2023-05-24 14:44:01 -06:00
akolliasAMD 9fe5a349f1 added npkit_enable on CI tests (#698) 2023-04-05 08:05:23 -06:00
PedramAlizadeh 45872d170f Changed the name of UnitTests to rccl-UnitTests (wrapper executable included). 2022-12-13 21:45:57 +00:00