Fix data corruption with Tree/LL128 on systems with 1GPU:1NIC.
Fix hang with Collnet on bfloat16 on systems with less than one NIC
per GPU.
Fix long initialization time.
Fix data corruption with Collnet when mixing multi-process and
multi-GPU per process.
Fix crash when shared memory creation fails.
Fix Avg operation with Collnet/Chain.
Fix performance of alltoall at scale with more than one NIC per GPU.
Fix performance for DGX H800.
Fix race condition in connection progress causing a crash.
Fix network flush with Collnet.
Fix performance of aggregated allGather/reduceScatter operations.
Fix PXN operation when CUDA_VISIBLE_DEVICES is set.
Fix NVTX3 compilation issues on Debian 10.
This commit is contained in:
Sylvain Jeaugey
2023-06-13 00:19:57 -07:00
parent d97a32fac8
commit ea38312273
26 changed files with 331 additions and 225 deletions
+1 -1
View File
@@ -1,6 +1,6 @@
##### version
NCCL_MAJOR := 2
NCCL_MINOR := 18
NCCL_PATCH := 1
NCCL_PATCH := 3
NCCL_SUFFIX :=
PKG_REVISION := 1