Wenkai Du
ffecb74b1e
Update tuning table and fix topo_expl
...
[ROCm/rccl commit: 94ad7f6f51 ]
2022-11-07 18:24:24 +00:00
Wenkai Du
4c9c1d41ee
Merge remote-tracking branch 'nccl/master' into develop
...
[ROCm/rccl commit: 9a077e6947 ]
2022-11-03 21:17:42 +00:00
Wenkai Du
826b93d98e
Merge pull request #641 from ROCmSoftwarePlatform/2.14.3
...
Sync up with NCCL 2.14.3
[ROCm/rccl commit: effc4b255b ]
2022-11-02 08:31:12 -07:00
Wenkai Du
4630365f2d
Fix P2P scheduling
...
[ROCm/rccl commit: 72ef100050 ]
2022-10-31 08:54:34 -07:00
Sylvain Jeaugey
775f1b59ba
Merge tag 'v2.15.5-1'
...
[ROCm/rccl commit: 2f4cb874ba ]
2022-10-25 01:15:22 -07:00
Sylvain Jeaugey
0b20e8b7e9
2.15.5-1
...
Fix crash with CollnetChain on some node topologies
Fix hang when interleaving the capture of different graphs
Fix hang during init in multi-threaded mode
Fix potential data corruption with LL128 protocol on unaligned buffers.
Fix CPU usage during preconnect
Fixes double-free in the error path for ncclCommInitAll
Workaround hang on H100 with Ring/LL128 on 2 GPUs.
[ROCm/rccl commit: cb111f764a ]
2022-10-25 00:55:55 -07:00
Wenkai Du
36e5e02e46
Merge remote-tracking branch 'nccl/master' into develop
...
[ROCm/rccl commit: 4f0e223db4 ]
2022-10-20 15:41:29 +00:00
Wenkai Du
cf1d4a62e8
Fix missing initialization due to merge error (#640)
...
[ROCm/rccl commit: bc8ef779df ]
2022-10-19 21:20:11 -07:00
Wenkai Du
7fe0b0161f
topo_expl: fix compilation error (#639)
...
[ROCm/rccl commit: fc554a2428 ]
2022-10-19 14:19:50 -07:00
gilbertlee-amd
5459f1c3f6
Fixing formatting for copyright (#638)
...
[ROCm/rccl commit: 10dbd2a452 ]
2022-10-19 13:43:21 -06:00
Wenkai Du
76414d3230
Support P2P with invisible devices (#636)
...
* Support P2P with invisible devices
* Update copyright year
[ROCm/rccl commit: 9ddf0e0649 ]
2022-10-17 10:24:59 -07:00
Wenkai Du
7cb9d872ab
Merge pull request #634 from yzygitzh/ziyyang/npkit-fix
...
Apply several fixes to NPKit
[ROCm/rccl commit: 9916a09818 ]
2022-10-17 08:01:24 -07:00
raramakr
65bd12b94d
Merge pull request #635 from raramakr/swdev
...
Correct include and library path for new directory layout
[ROCm/rccl commit: b32f38126d ]
2022-10-14 15:48:44 -07:00
gilbertlee-amd
0ca30fb88a
Updating files for missing licenses (#637)
...
[ROCm/rccl commit: ebb8b5bf63 ]
2022-10-14 13:49:16 -06:00
Ranjith Ramakrishnan
1cfca9f8e1
Correct include and library path for new directory layout
...
Use actual header files and libraries, rather than using wrapper header files and library softlinks
[ROCm/rccl commit: cf4e963aaf ]
2022-10-14 01:32:04 -07:00
Ziyue Yang
d54574b0eb
apply npkit
...
[ROCm/rccl commit: 7d6bbc19d4 ]
2022-10-14 01:28:17 +00:00
Sylvain Jeaugey
37dc333d42
Merge tag 'v2.15.1-1'
...
[ROCm/rccl commit: d128d62238 ]
2022-10-07 11:00:26 -07:00
Edgar Gabriel
3fd090c5ba
Merge pull request #633 from edgargabriel/topic/topo-binary-tree
...
introduce a hw topology aware bintree
[ROCm/rccl commit: 4972c129e3 ]
2022-10-05 17:06:54 -05:00
John Bachan
c9cd7243ed
Fixes a double-free in the error path of ncclCommInitAll.
...
Fixes https://github.com/NVIDIA/nccl/issues/726
[ROCm/rccl commit: 2401f4a918 ]
2022-10-03 17:12:32 -07:00
Edgar Gabriel
f2736a4fb3
introduce a hw topology aware bintree
...
for hayabusa architecture.
[ROCm/rccl commit: e645b02cd8 ]
2022-10-03 15:26:21 +00:00
gilbertlee-amd
9225ea766e
Removing TransferBench from tools (#632)
...
Point to new TransferBench repo
[ROCm/rccl commit: bd7d589446 ]
2022-09-30 11:53:32 -06:00
akolliasAMD
dbbdf65020
Added new gpu targets (#631)
...
[ROCm/rccl commit: ef71550738 ]
2022-09-29 14:53:55 -06:00
Wenkai Du
07a0adf1d6
Another threadfence and flags rework (#629)
...
[ROCm/rccl commit: a523b37ac7 ]
2022-09-28 16:49:29 -07:00
Wenkai Du
f6da79844a
Add LL128 tuning (#630)
...
[ROCm/rccl commit: 021932b3c8 ]
2022-09-27 09:39:09 -07:00
Sylvain Jeaugey
b4bac0d15a
2.15.1-1
...
Add support for H100 (sm90).
Make sure NCCL kernels honor user stream priorities.
[ROCm/rccl commit: da8152e57a ]
2022-09-27 02:31:13 -07:00
Sylvain Jeaugey
8761a6c2fc
Merge remote-tracking branch 'origin/master'
...
[ROCm/rccl commit: 99c28f2e75 ]
2022-09-27 02:24:41 -07:00
Cliff Woolley
37ccaf1f82
Use compatibility shim only with static cudart
...
Closes issue 658
[ROCm/rccl commit: 78313a6d21 ]
2022-09-27 02:22:48 -07:00
Wen-Heng (Jack) Chung
27d27e971b
Merge pull request #619 from whchung/exp_reduce_code_size
...
Only use split tree algorithm to reduce kernel code size.
[ROCm/rccl commit: e8af0716c4 ]
2022-09-26 10:06:27 -05:00
Sylvain Jeaugey
fda4362c9e
Fix potential deadlock during init in multi-thread mode.
...
Make sure all calls that invoke cudaMalloc (including devCommSetup) are
made before the last bootstrapBarrier. That way, we avoid calls to
cudaMalloc being blocked by an NCCL kernel launched on another GPU by
another thread which completed init faster.
Resolve #623.
[ROCm/rccl commit: ecab28a7c9 ]
2022-09-26 02:13:10 -07:00
Wen-Heng (Jack) Chung
bd19566413
Merge pull request #621 from whchung/exp_reduce_sleep_cycles
...
Reduce s_sleep cycles
[ROCm/rccl commit: 35f1fe3434 ]
2022-09-23 15:31:16 -05:00
Wen-Heng (Jack) Chung
183f1e6b32
Merge pull request #624 from whchung/exp_tweak_unroll_factors
...
Tweak unroll factors.
[ROCm/rccl commit: a80cc7e6e1 ]
2022-09-23 11:30:05 -05:00
Wen-Heng (Jack) Chung
dcf3946826
Merge pull request #625 from whchung/exp_sync_lds
...
Abolish syncthreads and only wait on LDS traffic.
[ROCm/rccl commit: a08a24e042 ]
2022-09-23 11:29:48 -05:00
Wen-Heng (Jack) Chung
7cde92deff
Tweak unroll factors.
...
[ROCm/rccl commit: 84054c3b30 ]
2022-09-22 13:03:04 -05:00
Wenkai Du
abdc365a05
Only use split tree algorithm to reduce kernel code size.
...
[ROCm/rccl commit: 02929cffb6 ]
2022-09-22 12:01:53 -05:00
Wenkai Du
0b56e397cc
Reduce s_sleep cycles
...
[ROCm/rccl commit: a3c8ef8c03 ]
2022-09-22 12:01:12 -05:00
Wen-Heng (Jack) Chung
975642e7ee
Abolish syncthreads and only wait on LDS traffic.
...
[ROCm/rccl commit: b9ae02d4ad ]
2022-09-22 12:00:37 -05:00
Wenkai Du
e4d46a0f64
Rework threadfence and flag setting (#627)
...
[ROCm/rccl commit: 49c811ecf9 ]
2022-09-22 08:35:42 -07:00
Wenkai Du
81c71aeb67
Revert changes to gfx1030 (#622)
...
[ROCm/rccl commit: d9216af48b ]
2022-09-20 20:06:17 -07:00
Wenkai Du
98609a7b92
Define ncclShmem as global shared (#618)
...
* Use global defined shared memory
* Add --hipcc-func-supp to compile option
* Force inline some device functions
* Add back threadfence
[ROCm/rccl commit: 9e6c87a2bf ]
2022-09-20 09:00:20 -07:00
Jane Xu
554b03bed1
address review comments
...
[ROCm/rccl commit: f89fd4777d ]
2022-09-20 11:58:33 +02:00
Jane Xu
e742734b82
Fix intermittent 11.6 builds: generate unique .cu file for each object file
...
[ROCm/rccl commit: 79fb0326ac ]
2022-09-20 11:58:33 +02:00
Edgar Gabriel
1a8709086d
Merge pull request #617 from edgargabriel/binary-tree-2.13.4
...
make binary tree work on 2.13.4
[ROCm/rccl commit: 05cc7bd850 ]
2022-09-14 20:30:11 -05:00
Edgar Gabriel
95d6ed2154
make binary tree work on 2.13.4
...
[ROCm/rccl commit: 8f3219dbd4 ]
2022-09-15 00:01:54 +00:00
Wenkai Du
069af6f7c3
Merge pull request #612 from ROCmSoftwarePlatform/2.13.4
...
2.13.4
[ROCm/rccl commit: 8f5507e047 ]
2022-09-13 21:33:29 -07:00
Wenkai Du
ba6e2db70d
Misc fixes and disable binTree
...
[ROCm/rccl commit: a06e14e39b ]
2022-09-14 00:26:19 +00:00
Edgar Gabriel
4a86adcaba
Update init.cc
...
[ROCm/rccl commit: e5d2dfed34 ]
2022-09-13 17:29:32 -05:00
Edgar Gabriel
4c17f4dcc1
Merge branch 'develop' into 2.13.4
...
[ROCm/rccl commit: be935d7ce7 ]
2022-09-13 17:19:04 -05:00
Edgar Gabriel
3225ee7cd0
Merge pull request #615 from edgargabriel/topic/two-trees
...
add binary tree
[ROCm/rccl commit: ea8120a346 ]
2022-09-13 16:50:45 -05:00
Edgar Gabriel
7148c0aa7b
add binary tree
...
In addition, introduce the ability to have 2 trees at the same time.
Only for allreduce at the moment.
[ROCm/rccl commit: 65e2ae20e5 ]
2022-09-13 20:52:32 +00:00
Gilbert Lee
1d24c476f4
Merge branch 'develop' into 2.13.4
...
[ROCm/rccl commit: 009e79623f ]
2022-09-09 23:07:04 +00:00