# RCCL
ROCm Communication Collectives Library

[![RCCL](https://dev.azure.com/ROCm-CI/ROCm-CI/_apis/build/status%2Frccl?repoName=ROCm%2Frccl&branchName=develop)](https://dev.azure.com/ROCm-CI/ROCm-CI/_build/latest?definitionId=107&repoName=ROCm%2Frccl&branchName=develop)
[![TheRock CI](https://github.com/ROCm/rccl/actions/workflows/therock-ci.yml/badge.svg?branch=develop&event=push)](https://github.com/ROCm/rccl/actions/workflows/therock-ci.yml)
> **Note:** The published documentation is available at [RCCL](https://rocm.docs.amd.com/projects/rccl/en/latest/index.html) in an organized, easy-to-read format that includes a table of contents and search functionality. The documentation source files reside in the [rccl/docs](https://github.com/ROCm/rccl/tree/develop/docs) folder in this repository. As with all ROCm projects, the documentation is open source. For more information, see [Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html).
## Introduction
RCCL (pronounced "Rickle") is a stand-alone library of standard collective communication routines for GPUs, implementing all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather, scatter, and all-to-all. There is also initial support for direct GPU-to-GPU send and receive operations. It has been optimized to achieve high bandwidth on platforms using PCIe and xGMI, as well as networking over InfiniBand Verbs or TCP/IP sockets. RCCL supports an arbitrary number of GPUs installed in a single node or across multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications.

The collective operations are implemented using ring and tree algorithms and have been optimized for throughput and latency. For best performance, small operations can be either batched into larger operations or aggregated through the API.
## Requirements
1. ROCm-supported GPUs
2. ROCm stack installed on the system (HIP runtime & HIP-Clang)
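
As a rough preflight check for the requirements above, the following sketch looks for a few tools on `PATH`. The tool list (`hipcc`, `rocminfo`, `cmake`) is an assumption chosen for illustration, not an official RCCL requirement list, and a default ROCm install may place its binaries under `/opt/rocm/bin` rather than on `PATH`.

```shell
# Preflight sketch: report which build tools are visible on PATH.
# Assumption: hipcc, rocminfo, and cmake are reasonable stand-ins for
# "ROCm stack installed"; adjust the list for your system.
missing=0
for tool in hipcc rocminfo cmake; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
        missing=$((missing + 1))
    fi
done
echo "missing tools: $missing"
```

If a tool is reported missing on a machine where ROCm is installed, adding the ROCm `bin` directory to `PATH` may be all that is needed.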
## Quickstart RCCL Build
RCCL depends directly on the HIP runtime and the HIP-Clang compiler, both of which are part of the ROCm software stack.
For ROCm installation instructions, see https://github.com/ROCm/ROCm.

The root of this repository contains a helper script, `install.sh`, that builds and installs RCCL with a single command. It hard-codes configuration options that you could otherwise set by invoking CMake directly, but it is a quick way to get started and serves as an example of how to build and install RCCL.
### To build the library using the install script:
```shell
./install.sh
```
For more information on the build options and flags that the install script accepts, run `./install.sh --help`:
```shell
./install.sh --help
RCCL build & installation helper script
 Options:
       --address-sanitizer     Build with address sanitizer enabled
    -c|--enable-code-coverage  Enable code coverage
    -d|--dependencies          Install RCCL dependencies
       --debug                 Build debug library
       --enable_backtrace      Build with custom backtrace support
       --disable-colltrace     Build without collective trace
       --enable-msccl-kernel   Build with MSCCL kernels
       --enable-mscclpp        Build with MSCCL++ support
       --enable-mscclpp-clip   Build MSCCL++ with clip wrapper on bfloat16 and half addition routines
       --disable-roctx         Build without ROCTX logging
    -f|--fast                  Quick-build RCCL (local gpu arch only, no backtrace, and collective trace support)
    -h|--help                  Prints this help message
    -i|--install               Install RCCL library (see --prefix argument below)
    -j|--jobs                  Specify how many parallel compilation jobs to run ($nproc by default)
    -l|--local_gpu_only        Only compile for local GPU architecture
       --amdgpu_targets        Only compile for specified GPU architecture(s). For multiple targets, separate by ';' (builds for all supported GPU architectures by default)
       --no_clean              Don't delete files if they already exist
       --npkit-enable          Compile with npkit enabled
       --log-trace             Build with log trace enabled (i.e. NCCL_DEBUG=TRACE)
       --openmp-test-enable    Enable OpenMP in rccl unit tests
    -p|--package_build         Build RCCL package
       --prefix                Specify custom directory to install RCCL to (default: `/opt/rocm`)
       --run_tests_all         Run all rccl unit tests (must be built already)
    -r|--run_tests_quick       Run small subset of rccl unit tests (must be built already)
       --static                Build RCCL as a static library instead of shared library
    -t|--tests_build           Build rccl unit tests, but do not run
       --time-trace            Plot the build time of RCCL (requires `ninja-build` package installed on the system)
       --verbose               Show compile commands
```
By default, RCCL builds for all GPU targets defined in `DEFAULT_GPUS` in `CMakeLists.txt`. To build for specific GPU(s) only, which can reduce build time, pass `--amdgpu_targets` a `;`-separated list of the target GPU architectures.
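
Because `;` is also the shell's command separator, the target list needs to be quoted when it is passed on the command line. The sketch below builds such a list and prints the resulting command instead of running it. The architecture names `gfx90a` and `gfx942` are illustrative assumptions, and the exact argument syntax should be checked against `./install.sh --help`.

```shell
# Sketch: join a space-separated list of GPU architectures with ';'.
# Assumption: gfx90a and gfx942 are example targets only; use the
# architectures that match your hardware.
targets="gfx90a gfx942"
joined=$(printf '%s' "$targets" | tr ' ' ';')

# Print the command rather than executing it. The quotes keep the shell
# from treating ';' as a command separator.
printf './install.sh --amdgpu_targets "%s"\n' "$joined"
```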
## Manual build
### To build the library using CMake:
```shell
$ git clone --recursive https://github.com/ROCm/rccl.git
$ cd rccl
$ mkdir build
$ cd build
$ cmake ..
$ make -j 16 # Or some other suitable number of parallel jobs
```
If you have already cloned the repository, you can check out the external submodules manually:
```shell
$ git submodule update --init --recursive --depth=1
```
You may substitute an installation path of your own choosing by passing `CMAKE_INSTALL_PREFIX`. For example:
```shell
$ cmake -DCMAKE_INSTALL_PREFIX=$PWD/rccl-install -DCMAKE_BUILD_TYPE=Release ..
```
Note: Ensure that rocm-cmake is installed, for example with `apt install rocm-cmake`.

### To build and install the RCCL package:
Assuming you have already cloned this repository and built the library as shown in the previous section:
```shell
$ cd rccl/build
$ make package
$ sudo dpkg -i *.deb
```
Installing the RCCL package requires sudo/root access because it installs under `/opt/rocm/`. This step is optional: RCCL can instead be used directly by adding the directory that contains `librccl.so` to your library search path.
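
One way to use a locally built library without installing the package is to prepend its directory to `LD_LIBRARY_PATH`. This is a minimal sketch: the variable `RCCL_LIB_DIR` is a hypothetical name introduced here, and its default of `$PWD/build` assumes that `librccl.so` ends up directly in the build folder, which may differ depending on how you built RCCL.

```shell
# Sketch: make a locally built librccl.so visible to the dynamic loader.
# Assumption: RCCL_LIB_DIR (hypothetical variable) points at the directory
# containing librccl.so; the default below is only a guess.
RCCL_LIB_DIR="${RCCL_LIB_DIR:-$PWD/build}"

# Prepend the directory, keeping any existing search path after it.
export LD_LIBRARY_PATH="$RCCL_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

if [ -e "$RCCL_LIB_DIR/librccl.so" ]; then
    echo "using librccl.so from $RCCL_LIB_DIR"
else
    echo "warning: librccl.so not found in $RCCL_LIB_DIR"
fi
```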
## Docker build
Refer to [docker/README.md](docker/README.md "docker/README.md")
## Tests
RCCL includes unit tests implemented with the GoogleTest framework. Building and running them requires GoogleTest 1.10 or higher, which the `-d` option to `install.sh` installs.

To invoke the unit tests, go to the `test` subfolder of the build folder and run the appropriate unit test executable(s).

Unit test names have the format:

    CollectiveCall.[Type of test]

Unit tests can be filtered with environment variables and with the `--gtest_filter` command-line flag, for example:
```shell
UT_DATATYPES=ncclBfloat16 UT_REDOPS=prod ./rccl-UnitTests --gtest_filter="AllReduce.C*"
```
This runs only the AllReduce correctness tests with the bfloat16 datatype and the `prod` reduction operation. A list of the available filtering environment variables appears at the top of every run. For more information on forming advanced filters, see "Running a Subset of the Tests" at https://google.github.io/googletest/advanced.html#running-a-subset-of-the-tests.

Other performance and error-checking tests for RCCL are maintained separately at https://github.com/ROCm/rccl-tests.
See the rccl-tests README for more information on how to build and run those tests.
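
GoogleTest accepts several `:`-separated patterns in a single `--gtest_filter` value, so the unit test filtering described above can be scripted. The sketch below builds such a filter from a list of collectives and runs the tests only if the executable is present. The `Broadcast` suite name is a hypothetical example following the `CollectiveCall.[Type of test]` format; check the names your build actually provides before relying on it.

```shell
# Sketch: build a ':'-separated GoogleTest filter from a list of collectives.
# Assumption: "Broadcast" is a hypothetical suite name used for illustration.
collectives="AllReduce Broadcast"
filter=""
for c in $collectives; do
    # Append "<Collective>.C*", inserting ':' between patterns.
    filter="${filter:+$filter:}${c}.C*"
done
printf 'filter: %s\n' "$filter"

# Run only when the test binary exists in the current directory.
if [ -x ./rccl-UnitTests ]; then
    UT_DATATYPES=ncclBfloat16 UT_REDOPS=prod ./rccl-UnitTests --gtest_filter="$filter"
else
    echo "rccl-UnitTests not found in $PWD; build the unit tests first"
fi
```

Running the executable with GoogleTest's `--gtest_list_tests` flag lists the available test names without executing them, which helps when writing a filter.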
## Library and API Documentation
Please refer to the [RCCL Documentation Site](https://rocm.docs.amd.com/projects/rccl/en/latest/) for current documentation.
### How to build documentation
Run the steps below to build documentation locally.
```shell
cd docs
pip3 install -r sphinx/requirements.txt
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
```
## Copyright
All source code and accompanying documentation is copyright (c) 2015-2025, NVIDIA CORPORATION. All rights reserved.

All modifications are copyright (c) 2019-2025 Advanced Micro Devices, Inc. All rights reserved.