Wenkai Du 4f7d5f85ec Sync up to NCCL 2.9.6 (#363)
* 2.9.6-1

Add support for CUDA graphs.
Fuse BCM Gen4 switches to avoid suboptimal performance on some platforms. Issue #439.
Fix bootstrap issue caused by connection reordering.
Fix CPU locking block.
Improve CollNet algorithm.
Improve performance on DGX A100 for communicators with only one GPU per node.

* Clique tuning upgrade (#352) (#19)

* Enabling clique for any XGMI-connected topology, adding tuning
* Updating CHANGELOG for clique tuning
* Re-working clique barrier system to work on multi-process / multi-GPU

Co-authored-by: Sylvain Jeaugey <sjeaugey@nvidia.com>
Co-authored-by: gilbertlee-amd <44450918+gilbertlee-amd@users.noreply.github.com>

[ROCm/rccl commit: 6021329af0]
2021-05-11 19:40:34 -07:00
Description
No description provided
282 MiB
Languages
C++ 67.5%
C 20.6%
Python 6.6%
CMake 3.4%
Shell 0.6%
Other 1.1%