Ranjith Ramakrishnan
b397cb16ea
Correct hsa header path for new directory layout
2022-11-04 09:52:16 -07:00
Wenkai Du
effc4b255b
Merge pull request #641 from ROCmSoftwarePlatform/2.14.3
...
Sync up with NCCL 2.14.3
2022-11-02 08:31:12 -07:00
Wenkai Du
72ef100050
Fix P2P scheduling
2022-10-31 08:54:34 -07:00
Wenkai Du
4f0e223db4
Merge remote-tracking branch 'nccl/master' into develop
2022-10-20 15:41:29 +00:00
Wenkai Du
bc8ef779df
Fix missing initialization due to merge error (#640)
2022-10-19 21:20:11 -07:00
Wenkai Du
fc554a2428
topo_expl: fix compilation error (#639)
2022-10-19 14:19:50 -07:00
gilbertlee-amd
10dbd2a452
Fixing formatting for copyright (#638)
2022-10-19 13:43:21 -06:00
Wenkai Du
9ddf0e0649
Support P2P with invisible devices (#636)
...
* Support P2P with invisible devices
* Update copyright year
2022-10-17 10:24:59 -07:00
Wenkai Du
9916a09818
Merge pull request #634 from yzygitzh/ziyyang/npkit-fix
...
Apply several fixes to NPKit
2022-10-17 08:01:24 -07:00
raramakr
b32f38126d
Merge pull request #635 from raramakr/swdev
...
Correct include and library path for new directory layout
2022-10-14 15:48:44 -07:00
gilbertlee-amd
ebb8b5bf63
Updating files for missing licenses (#637)
2022-10-14 13:49:16 -06:00
Ranjith Ramakrishnan
cf4e963aaf
Correct include and library path for new directory layout
...
Use actual header files and libraries, rather than using wrapper header files and library softlinks
2022-10-14 01:32:04 -07:00
Ziyue Yang
7d6bbc19d4
apply npkit
2022-10-14 01:28:17 +00:00
Edgar Gabriel
4972c129e3
Merge pull request #633 from edgargabriel/topic/topo-binary-tree
...
introduce a hw topology aware bintree
2022-10-05 17:06:54 -05:00
Edgar Gabriel
e645b02cd8
introduce a hw topology aware bintree
...
for hayabusa architecture.
2022-10-03 15:26:21 +00:00
gilbertlee-amd
bd7d589446
Removing TransferBench from tools (#632)
...
Point to new TransferBench repo
2022-09-30 11:53:32 -06:00
akolliasAMD
ef71550738
Added new gpu targets (#631)
2022-09-29 14:53:55 -06:00
Wenkai Du
a523b37ac7
Another threadfence and flags rework (#629)
2022-09-28 16:49:29 -07:00
Wenkai Du
021932b3c8
Add LL128 tuning (#630)
2022-09-27 09:39:09 -07:00
Sylvain Jeaugey
99c28f2e75
Merge remote-tracking branch 'origin/master'
2022-09-27 02:24:41 -07:00
Cliff Woolley
78313a6d21
Use compatibility shim only with static cudart
...
Closes issue 658
2022-09-27 02:22:48 -07:00
Wen-Heng (Jack) Chung
e8af0716c4
Merge pull request #619 from whchung/exp_reduce_code_size
...
Only use split tree algorithm to reduce kernel code size.
2022-09-26 10:06:27 -05:00
Sylvain Jeaugey
ecab28a7c9
Fix potential deadlock during init in multi-thread mode.
...
Make sure all calls that invoke cudaMalloc (including devCommSetup) are
made before the last bootstrapBarrier. That way, we avoid calls to
cudaMalloc being blocked by an NCCL kernel launched on another GPU by
another thread that completed init faster.
Resolve #623.
2022-09-26 02:13:10 -07:00
Wen-Heng (Jack) Chung
35f1fe3434
Merge pull request #621 from whchung/exp_reduce_sleep_cycles
...
Reduce s_sleep cycles
2022-09-23 15:31:16 -05:00
Wen-Heng (Jack) Chung
a80cc7e6e1
Merge pull request #624 from whchung/exp_tweak_unroll_factors
...
Tweak unroll factors.
2022-09-23 11:30:05 -05:00
Wen-Heng (Jack) Chung
a08a24e042
Merge pull request #625 from whchung/exp_sync_lds
...
Abolish syncthreads and only wait on LDS traffic.
2022-09-23 11:29:48 -05:00
Wen-Heng (Jack) Chung
84054c3b30
Tweak unroll factors.
2022-09-22 13:03:04 -05:00
Wenkai Du
02929cffb6
Only use split tree algorithm to reduce kernel code size.
2022-09-22 12:01:53 -05:00
Wenkai Du
a3c8ef8c03
Reduce s_sleep cycles
2022-09-22 12:01:12 -05:00
Wen-Heng (Jack) Chung
b9ae02d4ad
Abolish syncthreads and only wait on LDS traffic.
2022-09-22 12:00:37 -05:00
Wenkai Du
49c811ecf9
Rework threadfence and flag setting (#627)
2022-09-22 08:35:42 -07:00
Wenkai Du
d9216af48b
Revert changes to gfx1030 (#622)
2022-09-20 20:06:17 -07:00
Wenkai Du
9e6c87a2bf
Define ncclShmem as global shared (#618)
...
* Use global defined shared memory
* Add --hipcc-func-supp to compile option
* Force inline some device functions
* Add back threadfence
2022-09-20 09:00:20 -07:00
Jane Xu
f89fd4777d
address review comments
2022-09-20 11:58:33 +02:00
Jane Xu
79fb0326ac
Fix intermittent 11.6 builds: generate unique .cu file for each object file
2022-09-20 11:58:33 +02:00
Edgar Gabriel
05cc7bd850
Merge pull request #617 from edgargabriel/binary-tree-2.13.4
...
make binary tree work on 2.13.4
2022-09-14 20:30:11 -05:00
Edgar Gabriel
8f3219dbd4
make binary tree work on 2.13.4
2022-09-15 00:01:54 +00:00
Wenkai Du
8f5507e047
Merge pull request #612 from ROCmSoftwarePlatform/2.13.4
...
2.13.4
2022-09-13 21:33:29 -07:00
Wenkai Du
a06e14e39b
Misc fixes and disable binTree
2022-09-14 00:26:19 +00:00
Edgar Gabriel
e5d2dfed34
Update init.cc
2022-09-13 17:29:32 -05:00
Edgar Gabriel
be935d7ce7
Merge branch 'develop' into 2.13.4
2022-09-13 17:19:04 -05:00
Edgar Gabriel
ea8120a346
Merge pull request #615 from edgargabriel/topic/two-trees
...
add binary tree
2022-09-13 16:50:45 -05:00
Edgar Gabriel
65e2ae20e5
add binary tree
...
In addition, introduce the ability to have 2 trees at the same time.
Only for allreduce at the moment.
2022-09-13 20:52:32 +00:00
Gilbert Lee
009e79623f
Merge branch 'develop' into 2.13.4
2022-09-09 23:07:04 +00:00
gilbertlee-amd
dd56135a9a
Updating stream caching (#614)
...
- Adding non-captured hipStream for use in setup
2022-09-09 16:30:15 -06:00
gilbertlee-amd
65d78e9a1d
GraphBench (#613)
...
Adding simple GraphBench tool for comparing RCCL hipGraph performance
2022-09-09 12:12:25 -06:00
Wenkai Du
a79d9e3586
Merge remote-tracking branch 'nccl/master' into develop
2022-09-09 16:05:38 +00:00
Wenkai Du
7bbce085cc
Enable LL128 protocol support (#605)
...
* Enable LL128 protocol support
* Use shared memory object directly when possible
2022-09-08 14:45:27 -07:00
Lauren Wrubleski
d700a94918
Update ubuntu18 to ubuntu20 (#611)
2022-09-07 16:02:37 -06:00
Min Si
2b57751abb
Fix compilation issues with buck (#610)
...
* Fix compilation warning with -Wmisleading-indentation
When compiling with -Wmisleading-indentation, the compiler reports the warning:
misleading indentation; statement is not part of the previous 'if'
This patch fixes it.
* Avoid relative include file paths
We don't need relative include file paths for src/graph/*.h,
since src/ is already in the CMake include_directories.
2022-09-07 09:56:05 -06:00