2.18.1-1

Add support for IB SHARP to NVLS (NVLink SHARP algorithm).
Add NVLS+Tree algorithm.
Add support for memory management using cuMem* functions.
Use all NICs for Send/Receive operations on systems with more than
one NIC per GPU (#804).
Add ncclCommSplit primitive, with resource sharing option in config.
Fix alltoallv hang (#788).
Increase number of channels on H100 when we're not limited by NVLink.
Improve error reporting in case of IB failure, printing local and
remote ID (#779).
Add build option to allow compilation against RDMA includes instead
of dynamically loading IB verbs symbols (#802).
Fix context creation for progress thread (#803).
NET/IB: add option to use multiple QPs in round-robin mode.
Fix tree performance issue when NVB is disabled on HCM topologies.
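
To make the ncclCommSplit entry concrete, here is a minimal sketch of the color/key grouping a split performs. It assumes MPI_Comm_split-like semantics as described in the NCCL documentation: ranks passing the same color share a new communicator, smaller keys get smaller new ranks, and ties fall back to the original rank order. It is a pure simulation that does not call NCCL; `sketch_split` and `SKETCH_NOCOLOR` are hypothetical names used only for illustration.

```c
/* Hypothetical stand-in for "this rank joins no group" (assumption:
 * mirrors the role of NCCL_SPLIT_NOCOLOR, not taken from this commit). */
#define SKETCH_NOCOLOR (-1)

/* Simulate a communicator split for n ranks.
 * color[i], key[i]: values rank i would pass to the split call.
 * new_rank[i]: resulting rank inside rank i's new group,
 *              or -1 if rank i opted out with SKETCH_NOCOLOR. */
void sketch_split(int n, const int *color, const int *key, int *new_rank) {
  for (int i = 0; i < n; i++) {
    if (color[i] == SKETCH_NOCOLOR) {
      new_rank[i] = -1;
      continue;
    }
    int r = 0;
    for (int j = 0; j < n; j++) {
      if (j == i || color[j] != color[i]) continue;
      /* Rank j precedes rank i if its key is smaller, or if the keys
       * are equal and j came first in the original communicator. */
      if (key[j] < key[i] || (key[j] == key[i] && j < i)) r++;
    }
    new_rank[i] = r;
  }
}
```

With colors {0, 1, 0, -1} and keys {5, 0, 1, 0}, ranks 0 and 2 land in the same group with rank 2 ordered first, rank 1 forms a group of its own, and rank 3 joins none. The resource sharing option mentioned above is a separate config setting that this sketch does not model.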

Sylvain Jeaugey

2023-04-03 05:32:07 -07:00

parent 9b7d5edbfc

commit d97a32fac8

64 changed files with 4758 additions and 3131 deletions

makefiles/common.mk +5
@@ -12,6 +12,7 @@ DEBUG ?= 0
 TRACE ?= 0
 PROFAPI ?= 1
 NVTX ?= 1
+RDMA_CORE ?= 0
 NVCC = $(CUDA_HOME)/bin/nvcc
@@ -106,3 +107,7 @@ endif
 ifneq ($(PROFAPI), 0)
 CXXFLAGS += -DPROFAPI
 endif
+ifneq ($(RDMA_CORE), 0)
+CXXFLAGS += -DNCCL_BUILD_RDMA_CORE=1
+endif
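
The second hunk turns the `RDMA_CORE` make variable into the preprocessor define `NCCL_BUILD_RDMA_CORE`. The sources that consume the define are not part of this excerpt, so the following is only a sketch of how code might branch on it; `ib_symbols_linked_at_build` is a hypothetical helper, not a function from NCCL.

```c
/* Sketch only: ib_symbols_linked_at_build() is a hypothetical helper
 * and does not appear in the NCCL sources. */

/* makefiles/common.mk adds -DNCCL_BUILD_RDMA_CORE=1 when RDMA_CORE != 0.
 * Default the macro to 0 so this file also compiles without the flag. */
#ifndef NCCL_BUILD_RDMA_CORE
#define NCCL_BUILD_RDMA_CORE 0
#endif

/* Returns 1 when built against the RDMA includes (symbols resolved at
 * link time), 0 when the IB verbs symbols would instead be loaded
 * dynamically at run time. */
int ib_symbols_linked_at_build(void) {
#if NCCL_BUILD_RDMA_CORE
  return 1;
#else
  return 0;
#endif
}
```

Because the variable is assigned with `?=`, the default of 0 can be overridden from the command line or the environment, for example `make RDMA_CORE=1`; any non-zero value satisfies the `ifneq` test.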