diff --git a/projects/rccl/CHANGELOG.md b/projects/rccl/CHANGELOG.md
index 0db0cc936b..a9f103d527 100644
--- a/projects/rccl/CHANGELOG.md
+++ b/projects/rccl/CHANGELOG.md
@@ -1,15 +1,34 @@
 # Change Log for RCCL
 
 Full documentation for RCCL is available at [https://rccl.readthedocs.io](https://rccl.readthedocs.io)
+## [UNRELEASED]
+### Added
+- Compatibility with NCCL 2.9.9
 
-## [Unreleased]
+## [RCCL-2.8.4 for ROCm 4.3.0]
+### Added
+- Ability to select the number of channels to use for clique-based all reduce (RCCL_CLIQUE_ALLREDUCE_NCHANNELS). This can be adjusted to tune for performance when computation kernels are being executed in parallel.
 ### Optimizations
 - Additional tuning for clique-based kernel AllReduce performance (still requires opt in with RCCL_ENABLE_CLIQUE=1)
-
+- Modification of default values for number of channels / byte limits for clique-based all reduce based on device architecture
 ### Changed
 - Replaced RCCL_FORCE_ENABLE_CLIQUE to RCCL_CLIQUE_IGNORE_TOPO
 - Clique-based kernels can now be enabled on topologies where all active GPUs are XGMI-connected
 - Topologies not normally supported by clique-based kernels require RCCL_CLIQUE_IGNORE_TOPO=1
 
+### Known issues
+- Managed memory is not currently supported for clique-based kernels
+
+## [RCCL-2.8.4 for ROCm 4.2.0]
+### Added
+- Compatibility with NCCL 2.8.4
+
+### Optimizations
+- Additional tuning for clique-based kernels
+- Enabling GPU direct RDMA read from GPU
+- Fixing potential memory leak issue when re-creating multiple communicators within same process
+- Improved topology detection
+### Known issues
+- None
 ## [RCCL-2.7.8 for ROCm 4.1.0]
 ### Added