Re-enabled MSCCL++ (#1325)

* Added restrictions around calling MSCCL++ collectives (#1281)
* Restricted MSCCL++ AllGather to message sizes that are non-zero multiples of 32 bytes.
* Renamed and refactored some mscclpp types.
* Only transmit the MSCCL++ unique ID for non-split communicator initialization; when splitting a communicator, it has already been transmitted. Instead, save the MSCCL++ communicator in child communicators when calling `ncclCommSplit`. Only destroy an MSCCL++ communicator when no remaining RCCL communicator uses it. Also improved trace logging.
* Disabled MSCCL++ when using managed memory buffers, since MSCCL++ doesn't support them.
* Added datatype and op constraints for MSCCL++ AllReduce.
* Added documentation on MSCCL++ restrictions to the README.
* [BUILD] Support custom CMake flags in MSCCLPP (#1275)
* [BUILD] Support custom CMAKE_PREFIX_PATH in MSCCLPP

  Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
* [BUILD] CMake flags to support build-id in MSCCLPP

  Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
* [BUILD] Fix CMake warnings in MSCCLPP build

  Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
* Wrapped all CMake arguments passed to mscclpp to remove empty arguments and format them properly.

  ---------

  Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
  Co-authored-by: Corey Derochie <corey.derochie@amd.com>
* Link to libmscclpp_nccl statically (#1282)
* Switched mscclpp_nccl to static linking. Added a build step to rename the NCCL API functions.
* Undid the separation of building libmscclpp_nccl from building librccl with MSCCL++ integration. With a static build, it's either fully enabled or fully disabled.
* `nm` isn't always available in stripped-down Docker containers. Removed the use of `nm` in CMake and hard-coded its output into `mscclpp_nccl_syms.txt`.
* Removed IBVerbs dependency for integrating with MSCCL++ (#1313)
* Renamed `RCCL_ENABLE_MSCCLPP` to `RCCL_MSCCLPP_ENABLE` to conform to MSCCL. Set `RCCL_MSCCLPP_ENABLE` to 1 by default if `ENABLE_MSCCLPP` is defined, or 0 otherwise. Added a log warning if `RCCL_MSCCLPP_ENABLE` is set to 1 but `ENABLE_MSCCLPP` is not defined. (#1294)
* Include mscclpp as a git submodule (#1314)
* Added the desired mscclpp commit as a git submodule.
* Added a step to automatically check out the mscclpp submodule if it isn't already present, in case the user forgot to clone recursively.
* Added an instruction to the README to clone using `--recurse-submodules` to get the mscclpp submodule.
* Enabled MSCCL++ feature build.

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

Committed by GitHub. Parent: 4856309413. Commit: 736a705875.

@@ -68,13 +68,18 @@ By default, RCCL builds for all GPU targets defined in `DEFAULT_GPUS` in `CMakeL
### To build the library using CMake:

```shell
$ git clone https://github.com/ROCm/rccl.git --recurse-submodules
$ cd rccl
$ mkdir build
$ cd build
$ cmake ..
$ make -j 16 # Or some other suitable number of parallel jobs
```

If you have already cloned the repository without `--recurse-submodules`, you can check out the `mscclpp` submodule manually:

```shell
$ cd ext-src/mscclpp
$ git submodule update --init --recursive
```

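Before checking out the submodule by hand, it can help to confirm that it is actually missing. `git submodule status` prefixes an uninitialized submodule with `-`. The sketch below wraps that check in two small functions; both names are hypothetical helpers, not part of RCCL, and the path `ext-src/mscclpp` is assumed from the manual checkout commands above.

```shell
# Hypothetical helpers, not part of RCCL.
# `git submodule status` prefixes an uninitialized submodule with '-'.
submodule_line_uninitialized() {
  case "$1" in
    -*) return 0 ;;  # not initialized yet
    *) return 1 ;;   # initialized (' '), out of date ('+'), or conflicted ('U')
  esac
}

# Run from the root of the rccl repository; assumes the submodule path
# is ext-src/mscclpp.
mscclpp_submodule_missing() {
  status_line=$(git submodule status ext-src/mscclpp) || return 1
  submodule_line_uninitialized "$status_line"
}
```

For example, `mscclpp_submodule_missing && git submodule update --init --recursive`, run from the repository root, initializes submodules only when the `mscclpp` checkout is absent.
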
You may substitute an installation path of your own choosing by passing `CMAKE_INSTALL_PREFIX`. For example:

```shell
$ cmake -DCMAKE_INSTALL_PREFIX=$PWD/rccl-install ..
```

@@ -134,7 +139,13 @@ RCCL integrates [MSCCL](https://github.com/Azure/msccl) and [MSCCL++](https://gi
MSCCL uses XML files to describe collective algorithms for different architectures. RCCL collectives can leverage those algorithms once the user has provided the corresponding XML. The XML files contain the sequence of send-recv and reduction operations to be executed by the kernel. On MI300X, MSCCL is enabled by default. On other platforms, users may have to enable it by setting `RCCL_MSCCL_FORCE_ENABLE=1`. By default, MSCCL is only used if every rank belongs to a unique process; to disable this restriction for multi-threaded or single-threaded configurations, set `RCCL_MSCCL_ENABLE_SINGLE_PROCESS=1`.

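As a minimal sketch, a run on a platform other than MI300X that also places several ranks in one process could export both variables before launching. `my_rccl_app` is a hypothetical placeholder for your own RCCL program, and the algorithm XML files still have to be provided separately.

```shell
# Sketch: enable MSCCL on a platform where it is not on by default (non-MI300X).
export RCCL_MSCCL_FORCE_ENABLE=1
# Also allow MSCCL when several ranks share one process
# (multi-threaded or single-threaded configurations).
export RCCL_MSCCL_ENABLE_SINGLE_PROCESS=1
# Then launch the workload as usual, e.g. ./my_rccl_app (hypothetical name).
```
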
On the other hand, RCCL allreduce and allgather collectives can leverage the efficient MSCCL++ communication kernels for certain message sizes. MSCCL++ support is available whenever MSCCL support is available. Users need to set the RCCL environment variable `RCCL_MSCCLPP_ENABLE=1` to run RCCL workloads with MSCCL++ support. The message size threshold for using MSCCL++ can be set with the environment variable `RCCL_MSCCLPP_THRESHOLD` (default: 1MB); RCCL invokes MSCCL++ kernels for all message sizes less than or equal to this threshold.

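The sketch below enables MSCCL++, raises the threshold, and restates the size rule as a small check. It assumes `RCCL_MSCCLPP_THRESHOLD` is given in bytes and treats the 1MB default as 1048576 bytes; both are assumptions to verify against your RCCL version. `within_mscclpp_threshold` is a hypothetical helper that only mirrors the documented rule and does not query RCCL.

```shell
# Enable MSCCL++ and raise the threshold to 2 MiB.
# Assumption: RCCL_MSCCLPP_THRESHOLD is in bytes (default 1MB, taken as 1048576).
export RCCL_MSCCLPP_ENABLE=1
export RCCL_MSCCLPP_THRESHOLD=$((2 * 1024 * 1024))

# Hypothetical helper, not part of RCCL: succeeds when a message of $1 bytes
# is at or below the threshold, i.e. eligible for MSCCL++ by size alone.
within_mscclpp_threshold() {
  [ "$1" -le "${RCCL_MSCCLPP_THRESHOLD:-1048576}" ]
}
```

Passing the size check alone does not guarantee that MSCCL++ is used; the restrictions listed next also apply.
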
MSCCL++ is subject to the following restrictions; if any of them is not met, RCCL falls back to MSCCL or RCCL:

- Message size must be a non-zero multiple of 32 bytes
- Does not support `hipMallocManaged` buffers
- Allreduce only supports `float16`, `int32`, `uint32`, `float32`, and `bfloat16` data types
- Allreduce only supports the `sum` op

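The restrictions above can be restated as a single predicate. This is an illustrative sketch that encodes only the documented rules; it does not query RCCL, and the function name and argument order are hypothetical.

```shell
# Hypothetical helper, not part of RCCL: succeeds if the documented MSCCL++
# restrictions are met.
# Args: collective (allreduce|allgather), size in bytes, datatype,
#       reduction op ("-" if none), managed (1 if hipMallocManaged buffer, else 0)
mscclpp_restrictions_met() {
  local coll=$1 bytes=$2 dtype=$3 op=$4 managed=$5
  # Message size must be a non-zero multiple of 32 bytes.
  [ "$bytes" -gt 0 ] && [ $((bytes % 32)) -eq 0 ] || return 1
  # hipMallocManaged buffers are not supported.
  [ "$managed" -eq 0 ] || return 1
  if [ "$coll" = allreduce ]; then
    # Allreduce: restricted data types and the sum op only.
    case "$dtype" in
      float16|int32|uint32|float32|bfloat16) ;;
      *) return 1 ;;
    esac
    [ "$op" = sum ] || return 1
  fi
  return 0
}
```

For instance, `mscclpp_restrictions_met allreduce 2002 float16 sum 0` returns a non-zero status because 2002 bytes is not a multiple of 32, so RCCL would fall back.
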
## Library and API Documentation