Integrated RCCL with MSCCL++ for small message sizes (#1231)
Committed by GitHub
Parent: c755b9cf93
Commit: 6dc47eecd7
@@ -37,6 +37,7 @@ RCCL build & installation helper script
--enable_backtrace Build with custom backtrace support
--disable-colltrace Build without collective trace
--disable-msccl-kernel Build without MSCCL kernels
--disable-mscclpp Build without MSCCL++ support
-f|--fast Quick-build RCCL (local GPU arch only, without backtrace or collective trace support)
-h|--help Prints this help message
-i|--install Install RCCL library (see --prefix argument below)
@@ -45,6 +46,7 @@ RCCL build & installation helper script
--amdgpu_targets Only compile for specified GPU architecture(s). For multiple targets, separate by ';' (builds for all supported GPU architectures by default)
--no_clean Don't delete files if they already exist
--npkit-enable Compile with NPKit enabled
--openmp-test-enable Enable OpenMP in RCCL unit tests
--roctx-enable Compile with ROCTx enabled (example usage: rocprof --roctx-trace ./rccl-program)
-p|--package_build Build RCCL package
--prefix Specify custom directory to install RCCL to (default: `/opt/rocm`)
@@ -123,6 +125,13 @@ To manually run RCCL with NPKit enabled, environment variable `NPKIT_DUMP_DIR` n
To manually analyze NPKit dump results, use [npkit_trace_generator.py](https://github.com/microsoft/NPKit/blob/main/rccl_samples/npkit_trace_generator.py).

## MSCCL/MSCCL++

RCCL integrates [MSCCL](https://github.com/microsoft/msccl) and [MSCCL++](https://github.com/microsoft/mscclpp) to leverage their highly efficient GPU-to-GPU communication primitives for collective operations. Thanks to Microsoft Corporation for collaborating with us on this project.

MSCCL describes collective algorithms for different architectures in XML files. Each XML file contains the sequence of send/recv and reduction operations to be executed by the kernel, and RCCL collectives can use an algorithm once the user provides the corresponding XML file. MSCCL is enabled by default on MI300X; on other platforms, users may need to enable it by setting `RCCL_MSCCL_FORCE_ENABLE=1`.
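
The enablement rule above can be summarized as a small decision function. This is a minimal, hypothetical sketch, not RCCL's implementation: the function names are illustrative, and detecting MI300X by the `gfx942` architecture prefix is an assumption (that prefix also covers other MI300-series GPUs). It does not model loading the XML algorithm files, which a collective still needs before it can use an MSCCL algorithm.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical sketch of the README's rule, not RCCL's actual code.
// forceEnableValue is the raw value of RCCL_MSCCL_FORCE_ENABLE (nullptr if unset).
bool isMscclEnabled(const std::string& gcnArchName, const char* forceEnableValue) {
  // Assumption: MI300X is identified by the "gfx942" architecture prefix;
  // arch names may carry feature suffixes such as ":sramecc+:xnack-".
  const bool isMi300x = gcnArchName.rfind("gfx942", 0) == 0;  // prefix match
  // Assumption: only the exact value "1" counts as force-enabled.
  const bool forced = forceEnableValue != nullptr && std::strcmp(forceEnableValue, "1") == 0;
  return isMi300x || forced;
}

// Convenience wrapper that reads the force-enable flag from the process environment.
bool isMscclEnabledFromEnv(const std::string& gcnArchName) {
  return isMscclEnabled(gcnArchName, std::getenv("RCCL_MSCCL_FORCE_ENABLE"));
}
```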

RCCL allreduce and allgather collectives can also use the efficient MSCCL++ communication kernels for small message sizes. MSCCL++ support is available whenever MSCCL support is available. To run an RCCL workload with MSCCL++ support, set the environment variable `RCCL_ENABLE_MSCCLPP=1`. The environment variable `RCCL_MSCCLPP_THRESHOLD` sets the message size threshold for using MSCCL++ (default: 1MB); RCCL invokes MSCCL++ kernels for all message sizes less than or equal to this threshold.
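
The MSCCL++ selection rule described above can be sketched as a single predicate. This is a hypothetical illustration, not RCCL's dispatch code. The `Collective` enum and the function names are invented for the example. It assumes that message sizes are in bytes, that the "1MB" default means `1024 * 1024` bytes, that only the value `1` enables `RCCL_ENABLE_MSCCLPP`, and that MSCCL++ is used only where MSCCL is available.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical collective identifiers, invented for this sketch.
enum class Collective { AllReduce, AllGather, Broadcast, ReduceScatter };

// README default of "1MB", assumed here to mean 1024 * 1024 bytes.
constexpr std::size_t kDefaultMscclppThreshold = 1024 * 1024;

// Parses a RCCL_MSCCLPP_THRESHOLD value; falls back to the default when the
// value is unset, empty, or not a plain decimal integer.
std::size_t parseMscclppThreshold(const char* value) {
  if (value == nullptr || !std::isdigit(static_cast<unsigned char>(value[0]))) {
    return kDefaultMscclppThreshold;
  }
  char* end = nullptr;
  const unsigned long long parsed = std::strtoull(value, &end, 10);
  if (*end != '\0') return kDefaultMscclppThreshold;  // reject trailing text such as "1MB"
  return static_cast<std::size_t>(parsed);
}

// Hypothetical dispatch predicate mirroring the README's description.
// enableValue / thresholdValue are the raw values of RCCL_ENABLE_MSCCLPP and
// RCCL_MSCCLPP_THRESHOLD (nullptr if unset).
bool shouldUseMscclpp(Collective coll, std::size_t messageBytes, bool mscclAvailable,
                      const char* enableValue, const char* thresholdValue) {
  if (!mscclAvailable) return false;  // assumption: MSCCL++ is gated on MSCCL availability
  const bool enabled = enableValue != nullptr && std::strcmp(enableValue, "1") == 0;
  if (!enabled) return false;  // opt-in via RCCL_ENABLE_MSCCLPP=1
  if (coll != Collective::AllReduce && coll != Collective::AllGather) return false;
  return messageBytes <= parseMscclppThreshold(thresholdValue);  // threshold is inclusive
}

// Thin wrapper that reads both variables from the process environment.
bool shouldUseMscclppFromEnv(Collective coll, std::size_t messageBytes, bool mscclAvailable) {
  return shouldUseMscclpp(coll, messageBytes, mscclAvailable,
                          std::getenv("RCCL_ENABLE_MSCCLPP"),
                          std::getenv("RCCL_MSCCLPP_THRESHOLD"));
}
```

Keeping the environment lookups in a thin wrapper leaves the decision logic as a pure function. This makes the inclusive boundary easy to test: a message exactly at the threshold still selects MSCCL++.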

## Library and API Documentation

Please refer to the [RCCL Documentation Site](https://rocm.docs.amd.com/projects/rccl/en/latest/) for current documentation.