2ce89466227dfe8ebc3483fe3b35b471758e85db
Add support for CUDA 12.0, drop Kepler (sm_35).
Add support for H100 features.
Make socket code more robust and protected. Solves #555.
Improve performance on large CUDA graphs, reducing dependencies.
Reduce inter-socket bandwidth on AMD CPUs to favor better paths.
Various fixes to ncclCommAbort.
Make service thread polling resistant to EINTR.
Compile with profiling API by default.
Extend NVTX instrumentation with call arguments.
[ROCm/rccl commit: 28189e2df8]