925 Commits

Author SHA1 Message Date
Wenkai Du ffecb74b1e Update tuning table and fix topo_expl
[ROCm/rccl commit: 94ad7f6f51]
2022-11-07 18:24:24 +00:00
Wenkai Du 4c9c1d41ee Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 9a077e6947]
2022-11-03 21:17:42 +00:00
Wenkai Du 826b93d98e Merge pull request #641 from ROCmSoftwarePlatform/2.14.3
Sync up with NCCL 2.14.3

[ROCm/rccl commit: effc4b255b]
2022-11-02 08:31:12 -07:00
Wenkai Du 4630365f2d Fix P2P scheduling
[ROCm/rccl commit: 72ef100050]
2022-10-31 08:54:34 -07:00
Sylvain Jeaugey 775f1b59ba Merge tag 'v2.15.5-1'
[ROCm/rccl commit: 2f4cb874ba]
2022-10-25 01:15:22 -07:00
Sylvain Jeaugey 0b20e8b7e9 2.15.5-1
Fix crash with CollnetChain on some node topologies
Fix hang when interleaving the capture of different graphs
Fix hang during init in multi-threaded mode
Fix potential data corruption with LL128 protocol on unaligned buffers.
Fix CPU usage during preconnect
Fixes double-free in the error path for ncclCommInitAll
Workaround hang on H100 with Ring/LL128 on 2 GPUs.


[ROCm/rccl commit: cb111f764a]
2022-10-25 00:55:55 -07:00
Wenkai Du 36e5e02e46 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 4f0e223db4]
2022-10-20 15:41:29 +00:00
Wenkai Du cf1d4a62e8 Fix missing initialization due to merge error (#640)
[ROCm/rccl commit: bc8ef779df]
2022-10-19 21:20:11 -07:00
Wenkai Du 7fe0b0161f topo_expl: fix compilation error (#639)
[ROCm/rccl commit: fc554a2428]
2022-10-19 14:19:50 -07:00
gilbertlee-amd 5459f1c3f6 Fixing formatting for copywrite (#638)
[ROCm/rccl commit: 10dbd2a452]
2022-10-19 13:43:21 -06:00
Wenkai Du 76414d3230 Support P2P with invisible devices (#636)
* Support P2P with invisible devices

* Update copyright year

[ROCm/rccl commit: 9ddf0e0649]
2022-10-17 10:24:59 -07:00
Wenkai Du 7cb9d872ab Merge pull request #634 from yzygitzh/ziyyang/npkit-fix
Apply several fixes to NPKit

[ROCm/rccl commit: 9916a09818]
2022-10-17 08:01:24 -07:00
raramakr 65bd12b94d Merge pull request #635 from raramakr/swdev
Correct include and library path for new directory layout

[ROCm/rccl commit: b32f38126d]
2022-10-14 15:48:44 -07:00
gilbertlee-amd 0ca30fb88a Updating files for missing licenses (#637)
[ROCm/rccl commit: ebb8b5bf63]
2022-10-14 13:49:16 -06:00
Ranjith Ramakrishnan 1cfca9f8e1 Correct include and library path for new directory layout
Use actual header files and libraries, rather than using wrapper header files and library softlinks


[ROCm/rccl commit: cf4e963aaf]
2022-10-14 01:32:04 -07:00
Ziyue Yang d54574b0eb apply npkit
[ROCm/rccl commit: 7d6bbc19d4]
2022-10-14 01:28:17 +00:00
Sylvain Jeaugey 37dc333d42 Merge tag 'v2.15.1-1'
[ROCm/rccl commit: d128d62238]
2022-10-07 11:00:26 -07:00
Edgar Gabriel 3fd090c5ba Merge pull request #633 from edgargabriel/topic/topo-binary-tree
introduce a hw topology aware bintree

[ROCm/rccl commit: 4972c129e3]
2022-10-05 17:06:54 -05:00
John Bachan c9cd7243ed Fixes a double-free in the error path of ncclCommInitAll.
Fixes https://github.com/NVIDIA/nccl/issues/726


[ROCm/rccl commit: 2401f4a918]
2022-10-03 17:12:32 -07:00
Edgar Gabriel f2736a4fb3 introduce a hw topology aware bintree
for hayabusa architecture.


[ROCm/rccl commit: e645b02cd8]
2022-10-03 15:26:21 +00:00
gilbertlee-amd 9225ea766e Removing TransferBench from tools (#632)
Point to new TransferBench repo

[ROCm/rccl commit: bd7d589446]
2022-09-30 11:53:32 -06:00
akolliasAMD dbbdf65020 Added new gpu targets (#631)
[ROCm/rccl commit: ef71550738]
2022-09-29 14:53:55 -06:00
Wenkai Du 07a0adf1d6 Another threadfence and flags rework (#629)
[ROCm/rccl commit: a523b37ac7]
2022-09-28 16:49:29 -07:00
Wenkai Du f6da79844a Add LL128 tuning (#630)
[ROCm/rccl commit: 021932b3c8]
2022-09-27 09:39:09 -07:00
Sylvain Jeaugey b4bac0d15a 2.15.1-1
Add support for H100 (sm90).
Make sure NCCL kernel honor user stream priorities.


[ROCm/rccl commit: da8152e57a]
2022-09-27 02:31:13 -07:00
Sylvain Jeaugey 8761a6c2fc Merge remote-tracking branch 'origin/master'
[ROCm/rccl commit: 99c28f2e75]
2022-09-27 02:24:41 -07:00
Cliff Woolley 37ccaf1f82 Use compatibility shim only with static cudart
Closes issue 658


[ROCm/rccl commit: 78313a6d21]
2022-09-27 02:22:48 -07:00
Wen-Heng (Jack) Chung 27d27e971b Merge pull request #619 from whchung/exp_reduce_code_size
Only use split tree algorithm to reduce kernel code size.

[ROCm/rccl commit: e8af0716c4]
2022-09-26 10:06:27 -05:00
Sylvain Jeaugey fda4362c9e Fix potential deadlock during init in multi-thread mode.
Make sure all calls calling cudaMalloc (including devCommSetup) are
called before the last bootstrapBarrier. That way, we avoid calls to
cudaMalloc be blocked by a NCCL kernel launched on another GPU by
another thread which completed init faster.

Resolve #623.


[ROCm/rccl commit: ecab28a7c9]
2022-09-26 02:13:10 -07:00
Wen-Heng (Jack) Chung bd19566413 Merge pull request #621 from whchung/exp_reduce_sleep_cycles
Reduce s_sleep cycles

[ROCm/rccl commit: 35f1fe3434]
2022-09-23 15:31:16 -05:00
Wen-Heng (Jack) Chung 183f1e6b32 Merge pull request #624 from whchung/exp_tweak_unroll_factors
Tweak unroll factors.

[ROCm/rccl commit: a80cc7e6e1]
2022-09-23 11:30:05 -05:00
Wen-Heng (Jack) Chung dcf3946826 Merge pull request #625 from whchung/exp_sync_lds
Abolish syncthreads and only wait on LDS traffic.

[ROCm/rccl commit: a08a24e042]
2022-09-23 11:29:48 -05:00
Wen-Heng (Jack) Chung 7cde92deff Tweak unroll factors.
[ROCm/rccl commit: 84054c3b30]
2022-09-22 13:03:04 -05:00
Wenkai Du abdc365a05 Only use split tree algorithm to reduce kernel code size.
[ROCm/rccl commit: 02929cffb6]
2022-09-22 12:01:53 -05:00
Wenkai Du 0b56e397cc Reduce s_sleep cycles
[ROCm/rccl commit: a3c8ef8c03]
2022-09-22 12:01:12 -05:00
Wen-Heng (Jack) Chung 975642e7ee Abolish syncthreads and only wait on LDS traffic.
[ROCm/rccl commit: b9ae02d4ad]
2022-09-22 12:00:37 -05:00
Wenkai Du e4d46a0f64 Rework threadfence and flag setting (#627)
[ROCm/rccl commit: 49c811ecf9]
2022-09-22 08:35:42 -07:00
Wenkai Du 81c71aeb67 Revert changes to gfx1030 (#622)
[ROCm/rccl commit: d9216af48b]
2022-09-20 20:06:17 -07:00
Wenkai Du 98609a7b92 Define ncclShmem as global shared (#618)
* Use global defined shared memory

* Add --hipcc-func-supp to compile option

* Force inline some device functions

* Add back threadfence

[ROCm/rccl commit: 9e6c87a2bf]
2022-09-20 09:00:20 -07:00
Jane Xu 554b03bed1 address review comments
[ROCm/rccl commit: f89fd4777d]
2022-09-20 11:58:33 +02:00
Jane Xu e742734b82 Fix intermittent 11.6 builds: generate unique .cu file for each object file
[ROCm/rccl commit: 79fb0326ac]
2022-09-20 11:58:33 +02:00
Edgar Gabriel 1a8709086d Merge pull request #617 from edgargabriel/binary-tree-2.13.4
make binary tree work on 2.13.4

[ROCm/rccl commit: 05cc7bd850]
2022-09-14 20:30:11 -05:00
Edgar Gabriel 95d6ed2154 make binary tree work on 2.13.4
[ROCm/rccl commit: 8f3219dbd4]
2022-09-15 00:01:54 +00:00
Wenkai Du 069af6f7c3 Merge pull request #612 from ROCmSoftwarePlatform/2.13.4
2.13.4

[ROCm/rccl commit: 8f5507e047]
2022-09-13 21:33:29 -07:00
Wenkai Du ba6e2db70d Misc fixes and disable binTree
[ROCm/rccl commit: a06e14e39b]
2022-09-14 00:26:19 +00:00
Edgar Gabriel 4a86adcaba Update init.cc
[ROCm/rccl commit: e5d2dfed34]
2022-09-13 17:29:32 -05:00
Edgar Gabriel 4c17f4dcc1 Merge branch 'develop' into 2.13.4
[ROCm/rccl commit: be935d7ce7]
2022-09-13 17:19:04 -05:00
Edgar Gabriel 3225ee7cd0 Merge pull request #615 from edgargabriel/topic/two-trees
add binary tree

[ROCm/rccl commit: ea8120a346]
2022-09-13 16:50:45 -05:00
Edgar Gabriel 7148c0aa7b add binary tree
In addition, introduce the ability to have 2 trees at the same time.
Only for allreduce at the moment.


[ROCm/rccl commit: 65e2ae20e5]
2022-09-13 20:52:32 +00:00
Gilbert Lee 1d24c476f4 Merge branch 'develop' into 2.13.4
[ROCm/rccl commit: 009e79623f]
2022-09-09 23:07:04 +00:00