ROCm 4.3 changelog update (#379)

* Update CHANGELOG.md (#378)

* Updating CHANGELOG.md for ROCm 4.3

[ROCm/rccl commit: 903c84050d]
gilbertlee-amd
2021-06-03 10:56:02 -06:00
Committed by GitHub
Parent f7024c67c2
Commit fd94c55afe
+21 -2
@@ -1,15 +1,34 @@
# Change Log for RCCL
Full documentation for RCCL is available at [https://rccl.readthedocs.io](https://rccl.readthedocs.io)
## [UNRELEASED]
### Added
- Compatibility with NCCL 2.9.9
## [Unreleased]
## [RCCL-2.8.4 for ROCm 4.3.0]
### Added
- Ability to select the number of channels to use for clique-based all reduce (RCCL_CLIQUE_ALLREDUCE_NCHANNELS). This can be adjusted to tune for performance when computation kernels are being executed in parallel.
### Optimizations
- Additional tuning for clique-based kernel AllReduce performance (still requires opt-in with RCCL_ENABLE_CLIQUE=1)
- Modification of default values for number of channels / byte limits for clique-based all reduce based on device architecture
### Changed
- Replaced RCCL_FORCE_ENABLE_CLIQUE with RCCL_CLIQUE_IGNORE_TOPO
- Clique-based kernels can now be enabled on topologies where all active GPUs are XGMI-connected
- Topologies not normally supported by clique-based kernels require RCCL_CLIQUE_IGNORE_TOPO=1
### Known issues
- Managed memory is not currently supported for clique-based kernels
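The environment variables named in this release can be set before launching an RCCL application. The following is a minimal sketch, not part of RCCL: the helper `clique_env` and the binary name in the usage comment are hypothetical, and the channel count is an arbitrary illustration, since the changelog does not state valid or default values.

```python
import os


def clique_env(nchannels=None, ignore_topo=False, base=None):
    """Return a copy of an environment with clique-based kernels opted in.

    Hypothetical helper for illustration; only the variable names come
    from the changelog entries above.
    """
    env = dict(os.environ if base is None else base)

    # Clique-based kernels still require explicit opt-in.
    env["RCCL_ENABLE_CLIQUE"] = "1"

    # Optional: number of channels for clique-based all reduce.
    if nchannels is not None:
        if nchannels < 1:
            raise ValueError("nchannels must be a positive integer")
        env["RCCL_CLIQUE_ALLREDUCE_NCHANNELS"] = str(nchannels)

    # Needed only on topologies not normally supported by clique kernels.
    if ignore_topo:
        env["RCCL_CLIQUE_IGNORE_TOPO"] = "1"

    # The older variable name was replaced in this release, so drop it.
    env.pop("RCCL_FORCE_ENABLE_CLIQUE", None)
    return env


# Usage (hypothetical binary name):
#   import subprocess
#   subprocess.run(["./my_rccl_app"], env=clique_env(nchannels=4))
```

Building a fresh dictionary rather than mutating `os.environ` keeps the settings scoped to the launched process, which may be convenient when comparing runs with different channel counts.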
## [RCCL-2.8.4 for ROCm 4.2.0]
### Added
- Compatibility with NCCL 2.8.4
### Optimizations
- Additional tuning for clique-based kernels
- Enabling GPU direct RDMA read from GPU
- Fixing potential memory leak issue when re-creating multiple communicators within same process
- Improved topology detection
### Known issues
- None
## [RCCL-2.7.8 for ROCm 4.1.0]
### Added