Add support for CUDA 12.0, drop Kepler (sm_35).
Support for H100 features.
Make socket code more robust and protected. Solves #555.
Improve performance on large CUDA graphs, reducing dependencies.
Reduce inter-socket bandwidth on AMD CPUs to favor better paths.
Various fixes to ncclCommAbort.
Make service thread polling resistant to EINTR.
Compile with profiling API by default.
Extend NVTX instrumentation with call arguments.
Sylvain Jeaugey
2022-11-29 04:27:46 -08:00
Parent 614b49f0de
Commit 28189e2df8
46 changed files with 3325 additions and 1037 deletions
+2 -2
@@ -1,6 +1,6 @@
 ##### version
 NCCL_MAJOR   := 2
-NCCL_MINOR   := 15
-NCCL_PATCH   := 5
+NCCL_MINOR   := 16
+NCCL_PATCH   := 2
 NCCL_SUFFIX  :=
 PKG_REVISION := 1