2.18.1-1
Add support for IB SHARP to NVLS (NVLink SHARP algorithm).
Add NVLS+Tree algorithm.
Add support for memory management using cuMem* functions.
Use all NICs for Send/Receive operations on systems with more than one NIC per GPU (#804).
Add ncclCommSplit primitive, with resource sharing option in config.
Fix alltoallv hang (#788).
Increase number of channels on H100 when we're not limited by NVLink.
Improve error reporting in case of IB failure, printing local and remote ID (#779).
Add build option to allow compilation against RDMA includes instead of dynamically loading IB verbs symbols (#802).
Fix context creation for progress thread (#803).
NET/IB: add option to use multiple QPs in round-robin mode.
Fix tree performance issue when NVB is disabled on HCM topologies.
@@ -12,6 +12,7 @@ DEBUG ?= 0
 TRACE ?= 0
 PROFAPI ?= 1
 NVTX ?= 1
+RDMA_CORE ?= 0
 
 NVCC = $(CUDA_HOME)/bin/nvcc
 
@@ -106,3 +107,7 @@ endif
 ifneq ($(PROFAPI), 0)
 CXXFLAGS += -DPROFAPI
 endif
+
+ifneq ($(RDMA_CORE), 0)
+CXXFLAGS += -DNCCL_BUILD_RDMA_CORE=1
+endif
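The hunks above only pass `-DNCCL_BUILD_RDMA_CORE=1` to the compiler when `RDMA_CORE` is non-zero; they do not show how the sources consume the macro. The following is a minimal sketch of how such a flag can select a code path at compile time. It is not taken from the NCCL sources: the function name `ib_uses_direct_linking` and the default of 0 when the macro is undefined are assumptions made for illustration.

```c
/* Hypothetical illustration, not NCCL source code.
 * The build system defines NCCL_BUILD_RDMA_CORE=1 when RDMA_CORE != 0.
 * Assumption: treat an undefined macro as 0 (dynamic loading). */
#ifndef NCCL_BUILD_RDMA_CORE
#define NCCL_BUILD_RDMA_CORE 0
#endif

/* Returns 1 if this translation unit was compiled to use the RDMA
 * headers directly, 0 if it would load IB verbs symbols at run time. */
static int ib_uses_direct_linking(void) {
#if NCCL_BUILD_RDMA_CORE
  /* A real implementation would include the RDMA headers and call
   * the verbs functions directly on this path. */
  return 1;
#else
  /* A real implementation would resolve the verbs symbols dynamically
   * (for example with dlopen/dlsym) on this path. */
  return 0;
#endif
}
```

Compiled without the flag, `ib_uses_direct_linking()` returns 0; compiled with `-DNCCL_BUILD_RDMA_CORE=1`, which is what the Makefile change does for `RDMA_CORE=1`, it returns 1.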