Add support for IB SHARP to NVLS (NVLink SHARP algorithm).
Add NVLS+Tree algorithm.
Add support for memory management using cuMem* functions.
Use all NICs for Send/Receive operations on systems with more than
one NIC per GPU (#804).
Add ncclCommSplit primitive, with resource sharing option in config.
Fix alltoallv hang (#788).
Increase number of channels on H100 when we're not limited by NVLink.
Improve error reporting in case of IB failure, printing local and
remote ID (#779).
Add build option to allow compilation against RDMA includes instead
of dynamically loading IB verbs symbols (#802).
Fix context creation for progress thread (#803).
NET/IB: add option to use multiple QPs in round-robin mode.
Fix tree performance issue when NVB is disabled on HCM topologies.
Sylvain Jeaugey
2023-04-03 05:32:07 -07:00
parent 9b7d5edbfc
commit d97a32fac8
64 changed files with 4758 additions and 3131 deletions
+5
@@ -12,6 +12,7 @@ DEBUG ?= 0
TRACE ?= 0
PROFAPI ?= 1
NVTX ?= 1
+RDMA_CORE ?= 0
NVCC = $(CUDA_HOME)/bin/nvcc
@@ -106,3 +107,7 @@ endif
ifneq ($(PROFAPI), 0)
CXXFLAGS += -DPROFAPI
endif
+ifneq ($(RDMA_CORE), 0)
+CXXFLAGS += -DNCCL_BUILD_RDMA_CORE=1
+endif
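
The hunk above only defines the preprocessor flag. As a minimal sketch of how such a flag can be consumed, the C++ fragment below picks a code path at compile time depending on whether `NCCL_BUILD_RDMA_CORE` is set. The helper `ibvSymbolSource()` and the strings it returns are hypothetical and exist only for illustration. They are not part of NCCL, and the actual sources changed by this commit may use the define differently.

```cpp
// Sketch only: shows how a -DNCCL_BUILD_RDMA_CORE=1 flag can select a code
// path at compile time. ibvSymbolSource() is a hypothetical helper, not an
// NCCL function.
#include <cstdio>
#include <cstring>

// Mirror the Makefile default (RDMA_CORE ?= 0) when the flag is not passed.
#ifndef NCCL_BUILD_RDMA_CORE
#define NCCL_BUILD_RDMA_CORE 0
#endif

// Report which mechanism this build would use to obtain IB verbs symbols.
static const char* ibvSymbolSource() {
#if NCCL_BUILD_RDMA_CORE
  // Built with RDMA_CORE=1: compile against the system RDMA includes.
  return "rdma-core headers";
#else
  // Default build: resolve IB verbs symbols by dynamic loading at runtime.
  return "dynamic loading";
#endif
}

// Print the selected mechanism, e.g. from a debug or startup path.
static void printIbvSymbolSource() {
  std::printf("IB verbs symbols: %s\n", ibvSymbolSource());
}
```

Going by the Makefile change, building with `make RDMA_CORE=1` adds the define and would select the first branch. Leaving the variable at its default of 0 keeps the dynamic-loading path.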