diff --git a/projects/rccl/CHANGELOG.md b/projects/rccl/CHANGELOG.md
index 93180498c8..448c973bec 100644
--- a/projects/rccl/CHANGELOG.md
+++ b/projects/rccl/CHANGELOG.md
@@ -2,28 +2,44 @@
 
 Full documentation for RCCL is available at [https://rccl.readthedocs.io](https://rccl.readthedocs.io)
 
-## (Unreleased) RCCL-2.12.10
+## RCCL-2.13.4 for ROCm 5.4.0
+### Changed
+- Compatibility with NCCL 2.13.4
+- Improvements to RCCL when running with hipGraphs
+- RCCL_ENABLE_HIPGRAPH environment variable is no longer necessary to enable hipGraph support
+- Minor latency improvements
+### Fixed
+- Resolved potential memory access error due to asynchronous memset
+
+## RCCL-2.12.10 for ROCm 5.3.0
+### Changed
+- Improvements to LL128 algorithms
+### Added
+- Adding initial hipGraph support via opt-in environment variable RCCL_ENABLE_HIPGRAPH
+- Integrating with NPKit (https://github.com/microsoft/NPKit) profiling code
+
+## RCCL-2.12.10 for ROCm 5.2.3
 ### Added
 - Compatibility with NCCL 2.12.10
 - Packages for test and benchmark executables on all supported OSes using CPack.
 - Adding custom signal handler - opt-in with RCCL_ENABLE_SIGNALHANDLER=1
   - Additional details provided if Binary File Descriptor library (BFD) is pre-installed
-- Adding experimental support for using multiple ranks per device
-  - Requires using a new interface to create communicator (ncclCommInitRankMulti), please
-    refer to the interface documentation for details.
-  - To avoid potential deadlocks, user might have to set an environment variables increasing
-    the number of hardware queues (e.g. export GPU_MAX_HW_QUEUES=16)
 - Adding support for reusing ports in NET/IB channels
   - Opt-in with NCCL_IB_SOCK_CLIENT_PORT_REUSE=1 and NCCL_IB_SOCK_SERVER_PORT_REUSE=1
   - When "Call to bind failed : Address already in use" error happens in large-scale AlltoAll (e.g., >=64 MI200 nodes), users are suggested to opt-in either one or both of the options to resolve the massive port usage issue
-- Avoid using NCCL_IB_SOCK_SERVER_PORT_REUSE when NCCL_NCHANNELS_PER_NET_PEER is tuned >1
-- Adding initial hipGraph support via opt-in environment variable RCCL_ENABLE_HIPGRAPH
-
+  - Avoid using NCCL_IB_SOCK_SERVER_PORT_REUSE when NCCL_NCHANNELS_PER_NET_PEER is tuned >1
 ### Removed
 - Removed experimental clique-based kernels
 
+## RCCL-2.11.4 for ROCm 5.2.0
+### Changed
+- Unit testing framework rework
+- Minor bug fixes
+### Known issues
+- Managed memory is not currently supported for clique-based kernels
+
 ## RCCL-2.11.4 for ROCm 5.1.0
 ### Added
 - Compatibility with NCCL 2.11.4