Commit graph

14 commits

Author SHA1 Message Date
Wenkai Du a4ea1fed5b Merge remote-tracking branch 'nccl/master' into develop 2021-05-05 16:01:01 -07:00
Sylvain Jeaugey a46ea10583 2.9.6-1
Add support for CUDA graphs.
Fuse BCM Gen4 switches to avoid suboptimal performance on some platforms. Issue #439.
Fix bootstrap issue caused by connection reordering.
Fix CPU locking block.
Improve CollNet algorithm.
Improve performance on DGX A100 for communicators with only one GPU per node.
2021-04-12 16:00:46 -07:00
gilbertlee-amd f4a9b9acba Adding pthread_join / pthread_detach to clean up pthreads to avoid leaks (#322) 2021-02-26 16:29:55 -07:00
Wenkai Du c985358e11 Merge remote-tracking branch 'nccl/master' into 2.8.3 2021-02-15 18:44:47 -05:00
Sylvain Jeaugey 911d61f214 2.8.4-1
Fix hang in corner cases of alltoallv using point to point send/recv.
Harmonize error messages.
Fix missing NVTX section in the license.
Update README.
2021-02-09 15:36:48 -08:00
gilbertlee-amd 3e62ceddc5 Clique kernel support (#295) (#15)
* Adding experimental clique-based kernels (opt-in only)

Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>
2021-01-28 09:45:01 -07:00
Wenkai Du d469947641 Merge remote-tracking branch 'nccl/master' into no-target-id 2021-01-14 19:27:53 -05:00
Sylvain Jeaugey 920dbe5b35 2.8.3-1
Optimization for Tree allreduce on A100.
Improve aggregation performance.
Use shared buffers for inter-node send/recv.
Add NVTX profiling hooks.
Accelerate alltoall connections by merging communication for all
channels.
Add support for one hop communication through NVLink, for faster
send/recv communication on cubemesh topologies like DGX-1.
Improve alltoall scheduling to better balance intra/inter node
communication.
Increase send/recv parallelism by 8x, each warp sending or
receiving to a different peer.
Net: move to v4.
Net: make flush operation asynchronous to accelerate alltoall.
Net: define maximum number of requests.
Fix hang when using LL128 protocol after 2^31 steps.
Fix #379 : topology injection failing when using fewer GPUs than
described in the XML.
Fix #394 : protocol mismatch causing hangs or crashes when using
one GPU per node.
2020-11-17 11:08:52 -08:00
gilbertlee-amd 84a2541e01 Revert "Initial support for clique-based kernels (#276)" (#280)
This reverts commit 2b8184808d.
2020-10-15 11:30:18 -07:00
gilbertlee-amd 2b8184808d Initial support for clique-based kernels (#276)
* Initial support for clique-based kernels
2020-10-13 11:22:04 -06:00
Sylvain Jeaugey 5949d96f36 2.7.3-1
Add support for A100 GPU and related platforms.
Add support for CUDA 11.
Add support for send/receive operations (beta).
2020-06-08 09:31:44 -07:00
Sylvain Jeaugey 299c554dcc 2.5.6-1 (#255)
Add LL128 Protocol.

Rewrite the topology detection and tree/ring creation (#179). Improve
tree performance by sending/receiving from different GPUs. Add
model-based tuning to switch between the different algorithms and
protocols.

Rework P2P/SHM detection in containers (#155, #248).

Detect duplicated devices and return an error (#231).

Add tuning for GCP
2019-11-19 14:57:39 -08:00
Ke Wen 7c72dee660 2.4.8-1
Fix #209: improve socket transport performance
  Split transfers over multiple sockets
  Launch multiple threads to drive sockets
  Detect AWS NICs and set nsockets/nthreads accordingly
2019-06-25 13:22:47 -07:00
David Addison f40ce73e89 NCCL 2.4.6-1
Added detection of IBM/Power NVLink bridge device.
Add NUMA support to PCI distance calculations.
Added NCCL_IGNORE_CPU_AFFINITY env var.
Fix memory leaks; GithubIssue#180
Compiler warning fix; GithubIssue#178
Replace non-standard variable length arrays. GithubIssue#171
Fix Tree+Shared Memory crash. GithubPR#185
Fix LL cleanup hang during long running DL jobs.
Fix NCCL_RINGS environment variable handling.
Added extra checks to catch repeat calls to ncclCommDestroy() GithubIssue#191
Improve bootstrap socket connection reliability at scale.
Fix hostname hashing issue. GithubIssue#187
Code cleanup to rename all non device files from *.cu to *.cc
2019-04-05 13:05:45 -07:00