Optimize CUDA graph launch; avoid launching a CPU callback for
intra-node operations.
Simplify kernel common code to improve the latency of send/recv
operations.
Strengthen CUDA streams semantics.
Change NET API to v6, to add dmabuf support.
Add ncclGetLastError() function.
Add ncclRemoteError code and use it for remote network errors.
Support the use of a different NCCL_NET parameter per communicator.
Add support for SHM and P2P transfers using cudaMemcpy.
Sylvain Jeaugey
2022-05-24 02:02:31 -07:00
parent 7aa1c46fd5
commit 19ab67d172
62 files changed with 4787 additions and 2496 deletions
+2 -2
@@ -1,6 +1,6 @@
 ##### version
 NCCL_MAJOR := 2
-NCCL_MINOR := 12
-NCCL_PATCH := 12
+NCCL_MINOR := 13
+NCCL_PATCH := 4
 NCCL_SUFFIX :=
 PKG_REVISION := 1
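
As an illustration of what this version bump means for code that checks the library version, the sketch below combines the three numbers into a single integer. It assumes the encoding used by the NCCL_VERSION macro in nccl.h (major * 10000 + minor * 100 + patch for releases after 2.8); the helper name nccl_version_code is hypothetical and is not part of NCCL.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the NCCL API.
 * Assumption: mirrors the NCCL_VERSION(X,Y,Z) encoding from nccl.h,
 * where releases up to 2.8 use X*1000 + Y*100 + Z and later
 * releases use X*10000 + Y*100 + Z. */
static int nccl_version_code(int major, int minor, int patch) {
    if (major <= 2 && minor <= 8) {
        return major * 1000 + minor * 100 + patch;
    }
    return major * 10000 + minor * 100 + patch;
}

/* Prints the old and new version codes for the values in the diff above. */
static void print_version_bump(void) {
    printf("old: %d\n", nccl_version_code(2, 12, 12));
    printf("new: %d\n", nccl_version_code(2, 13, 4));
}
```

Under that assumption, the change above moves the version code from 21212 to 21304, so a compile-time guard such as `#if NCCL_VERSION_CODE >= 21304` could be used before calling functions introduced in this commit, for example ncclGetLastError().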