2dc2c86ec1078e20c57c5b4c6cbb98080e1192cd
Fix data corruption with Tree/LL128 on systems with 1GPU:1NIC.
Fix hang with Collnet on bfloat16 on systems with less than one NIC
per GPU.
Fix long initialization time.
Fix data corruption with Collnet when mixing multi-process and
multi-GPU per process.
Fix crash when shared memory creation fails.
Fix Avg operation with Collnet/Chain.
Fix performance of alltoall at scale with more than one NIC per GPU.
Fix performance for DGX H800.
Fix race condition in connection progress causing a crash.
Fix network flush with Collnet.
Fix performance of aggregated allGather/reduceScatter operations.
Fix PXN operation when CUDA_VISIBLE_DEVICES is set.
Fix NVTX3 compilation issues on Debian 10.
[ROCm/rccl commit: ea38312273]
Popis
Nebyl uveden žádný popis
Jazyky
C++
67.5%
C
20.6%
Python
6.6%
CMake
3.4%
Shell
0.6%
Jiný
1.1%