Edgar Gabriel
3225ee7cd0
Merge pull request #615 from edgargabriel/topic/two-trees
...
add binary tree
[ROCm/rccl commit: ea8120a346 ]
2022-09-13 16:50:45 -05:00
Edgar Gabriel
7148c0aa7b
add binary tree
...
In addition, introduce the ability to have 2 trees at the same time.
Only for allreduce at the moment.
[ROCm/rccl commit: 65e2ae20e5 ]
2022-09-13 20:52:32 +00:00
gilbertlee-amd
35872115f8
Updating stream caching ( #614 )
...
- Adding non-captured hipStream for use in setup
[ROCm/rccl commit: dd56135a9a ]
2022-09-09 16:30:15 -06:00
gilbertlee-amd
af71be44f1
GraphBench ( #613 )
...
Adding simple GraphBench tool for comparing RCCL hipGraph performance
[ROCm/rccl commit: 65d78e9a1d ]
2022-09-09 12:12:25 -06:00
Wenkai Du
fe99249cde
Enable LL128 protocol support ( #605 )
...
* Enable LL128 protocol support
* Use shared memory object directly when possible
[ROCm/rccl commit: 7bbce085cc ]
2022-09-08 14:45:27 -07:00
Lauren Wrubleski
3da06e4704
Update ubuntu18 to ubuntu20 ( #611 )
...
[ROCm/rccl commit: d700a94918 ]
2022-09-07 16:02:37 -06:00
Min Si
25ba51fe83
Fix compilation issues with buck ( #610 )
...
* Fix compilation warning with -Wmisleading-indentation
When compiling with -Wmisleading-indentation, the compiler reports the warning:
misleading indentation; statement is not part of the previous 'if'
This patch fixes it.
* Avoid relative include file path
We don't need relative include file paths for src/graph/*.h
since src/ is already in CMake include_directories
[ROCm/rccl commit: 2b57751abb ]
2022-09-07 09:56:05 -06:00
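For readers unfamiliar with the -Wmisleading-indentation warning addressed in the entry above, here is a minimal sketch of the pattern and one common way to fix it. The function name `bumpClamped` and its logic are hypothetical and are not taken from the RCCL sources; the offending form is shown only in a comment so that the snippet compiles cleanly.

```cpp
// Hypothetical illustration of -Wmisleading-indentation; not RCCL code.
//
// Form that triggers the warning (second statement is indented like the
// 'if' body, but it is NOT guarded by the 'if'):
//
//   if (value > limit)
//     result = limit;
//     result += 1;   // warning: misleading indentation
//
// Fixed form: braces and indentation now match the actual control flow.
int bumpClamped(int value, int limit) {
  int result = value;
  if (value > limit) {
    result = limit;  // only runs when value exceeds limit
  }
  result += 1;       // always runs, and is now visibly unconditional
  return result;
}
```

The behavior is unchanged by the fix; only the layout is corrected so that it no longer suggests the increment is conditional.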
gilbertlee-amd
616cb39a0b
Adding opt-in hipGraph support for RCCL via RCCL_ENABLE_HIPGRAPH ( #608 )
...
Adding opt-in hipGraph support via RCCL_ENABLE_HIPGRAPH
[ROCm/rccl commit: 47b2fc3a30 ]
2022-09-06 10:29:46 -06:00
akolliasAMD
2cd63dac42
added stream synch after hipMemset ( #609 )
...
[ROCm/rccl commit: 06bce9d0c9 ]
2022-08-30 16:18:37 -06:00
Wenkai Du
f18868f439
Use hipExtLaunchKernel when not using graph and not in group mode ( #606 )
...
[ROCm/rccl commit: c9f2fe1f65 ]
2022-08-26 13:40:37 -07:00
akolliasAMD
151a8ef56a
git_version cmake consistency changes ( #604 )
...
* git_version cmake variable consistency changes
[ROCm/rccl commit: 6670dc95ab ]
2022-08-25 15:11:28 -06:00
Edgar Gabriel
22dcbed61b
Merge pull request #603 from edgargabriel/topic/float16_unit_tests
...
introduce support for ncclFloat16/half in UT
[ROCm/rccl commit: 8a311583e0 ]
2022-08-25 07:40:20 -05:00
Edgar Gabriel
b32b819151
introduce support for ncclFloat16/half in UT
...
[ROCm/rccl commit: f6e00dec13 ]
2022-08-24 15:28:24 +00:00
Edgar Gabriel
6bb871c986
Merge pull request #598 from edgargabriel/topic/tree-multirank
...
Expand ncclTreeBasePostset for multi-rank
[ROCm/rccl commit: e739c62a53 ]
2022-08-24 08:28:34 -05:00
Wenkai Du
56ea2c4be5
Use non-temporal access for slow path ( #602 )
...
[ROCm/rccl commit: 88487a62bb ]
2022-08-23 08:21:51 -07:00
Edgar Gabriel
aa6d450f35
fix channelcount for multi-rank scenario
...
[ROCm/rccl commit: 4141ec1151 ]
2022-08-22 19:09:22 +00:00
akolliasAMD
1d55fe756c
Simple tree changes ( #599 )
...
changed treebase to create basic balanced tree
[ROCm/rccl commit: 3c1b1ec8c8 ]
2022-08-19 13:51:49 -06:00
Edgar Gabriel
27cb7d2b20
Merge pull request #601 from CosmicFusion/patch-2
...
fix error: use of undeclared identifier 'free'
[ROCm/rccl commit: 6fba80208c ]
2022-08-19 14:10:49 -05:00
Cosmic Fusion
1dcf1da5ca
fix error: use of undeclared identifier 'free'
...
Include stdlib.h to fix a compilation error in rccl:
[39/58] Building CXX object CMakeFiles/rccl.dir/src/misc/signals.cc.o
FAILED: CMakeFiles/rccl.dir/src/misc/signals.cc.o
/opt/rocm/bin/hipcc -DENABLE_COLLTRACE -DHAVE_BFD -DHAVE_CPLUS_DEMANGLE -DUSE_ROCM_SMI64CONFIG -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Drccl_EXPORTS -I/home/cosmo/build/flgrwqa/build/include -I/home/cosmo/build/flgrwqa/build/include/rccl -I/home/cosmo/build/flgrwqa/rccl/src -I/home/cosmo/build/flgrwqa/rccl/src/include -I/home/cosmo/build/flgrwqa/rccl/src/collectives -I/home/cosmo/build/flgrwqa/rccl/src/collectives/device -I/opt/hsa/include -fPIC -fvisibility=hidden -fgpu-rdc -parallel-jobs=8 -Wno-format-nonliteral -x hip --offload-arch=gfx803 --offload-arch=gfx900:xnack- --offload-arch=gfx906:xnack- --offload-arch=gfx908:xnack- --offload-arch=gfx90a:xnack- --offload-arch=gfx90a:xnack+ --offload-arch=gfx1030 -std=c++14 -MD -MT CMakeFiles/rccl.dir/src/misc/signals.cc.o -MF CMakeFiles/rccl.dir/src/misc/signals.cc.o.d -o CMakeFiles/rccl.dir/src/misc/signals.cc.o -c /home/cosmo/build/flgrwqa/rccl/src/misc/signals.cc
In file included from /home/cosmo/build/flgrwqa/rccl/src/misc/signals.cc:8:
/home/cosmo/build/flgrwqa/rccl/src/include/BfdBacktrace.hpp:138:9: error: use of undeclared identifier 'free'
free(file->syms);
^
/home/cosmo/build/flgrwqa/rccl/src/include/BfdBacktrace.hpp:155:5: error: use of undeclared identifier 'free'
free(file->syms);
^
[ROCm/rccl commit: 080fc2d9d6 ]
2022-08-19 20:25:06 +03:00
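The "undeclared identifier 'free'" error in the entry above is the usual symptom of calling free() in a header that does not itself include stdlib.h. A minimal sketch of the fixed pattern follows; `SymbolFile`, `makeSymbolFile`, and `releaseSymbols` are hypothetical stand-ins and do not reproduce the contents of BfdBacktrace.hpp.

```cpp
#include <stdlib.h>  // declares malloc() and free(); omitting it can leave 'free' undeclared

// Hypothetical stand-in for a structure that owns a malloc'ed symbol table.
struct SymbolFile {
  void** syms;
};

// Allocate a table with room for 'count' entries (may return syms == nullptr).
SymbolFile makeSymbolFile(size_t count) {
  SymbolFile file;
  file.syms = static_cast<void**>(malloc(count * sizeof(void*)));
  return file;
}

// Release the table; returns false if there was nothing to free.
bool releaseSymbols(SymbolFile* file) {
  if (file == nullptr || file->syms == nullptr) return false;
  free(file->syms);      // resolves because stdlib.h is included above
  file->syms = nullptr;  // guard against a double free
  return true;
}
```

Including the header directly, rather than relying on it being pulled in transitively by another include, keeps the code portable across toolchain versions.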
Wenkai Du
c2e9ada40b
Repurpose profiling implementation to simple timestamps tracing ( #600 )
...
[ROCm/rccl commit: 14b8ff153f ]
2022-08-18 15:34:46 -07:00
Wenkai Du
6c3f1366e8
Add XGMI sys type and clean up detection code ( #597 )
...
[ROCm/rccl commit: f5c0b243a8 ]
2022-08-12 09:52:29 -07:00
Ziyue Yang
478d8312b8
Improve alignment and tuning for Pivot A2A algorithm ( #593 )
...
* Improve alignment and tuning for Pivot A2A algorithm
* enable pivot a2a by default
[ROCm/rccl commit: f6b9686482 ]
2022-08-05 19:40:19 -07:00
gilbertlee-amd
e3b832f4ce
Disable clique AllReduce UnitTest ( #595 )
...
[ROCm/rccl commit: dae11c2aca ]
2022-08-04 18:30:00 -06:00
gilbertlee-amd
b350916a6e
Fixing CMake to avoid unnecessary git_version relinking ( #594 )
...
[ROCm/rccl commit: 9ed9cd0e31 ]
2022-08-04 18:03:59 -06:00
arvindcheru
a44be6655d
HIP Path default updated to ROCM_PATH (reorg path) ( #592 )
...
Updated default path for hip to ROCM_PATH (/opt/rocm instead of /opt/rocm/hip) as per new/current structure.
[ROCm/rccl commit: 2cb2f9493a ]
2022-08-04 13:38:41 -04:00
akolliasAMD
6fb5c5d5e3
minor latency tuning ( #591 )
...
* minor tuning for tree ll
[ROCm/rccl commit: 4cecdc9be5 ]
2022-08-03 15:07:44 -06:00
Wenkai Du
f70830d629
Revert "Use nontemporal in slow path and add XGMI sys type ( #575 )" ( #590 )
...
This reverts commit e04bba619a .
[ROCm/rccl commit: 9089e68a99 ]
2022-08-02 09:31:53 -07:00
Wenkai Du
7e124d5b83
Add nccl_net.h to librccl-dev package ( #589 )
...
[ROCm/rccl commit: e2cb95a390 ]
2022-07-29 13:39:49 -07:00
akolliasAMD
d5ca0be51f
Fixed issue with atomicExch creating errors on multi node runs ( #587 )
...
[ROCm/rccl commit: 254208e7dd ]
2022-07-22 11:32:49 -06:00
akolliasAMD
fd99ca19f5
updated alltoallV test to reflect how send counts are done in perf tests ( #586 )
...
[ROCm/rccl commit: 686dbc8bc6 ]
2022-07-21 14:59:34 -06:00
akolliasAMD
18d9fd1b8f
Removing redundant LOAD and STORE on primitives plus adding some atomics ( #585 )
...
[ROCm/rccl commit: 451c287aa6 ]
2022-07-21 13:04:57 -06:00
Hubert Lu
088d62ff58
Merge pull request #580 from hubertlu-tw/develop
...
Enhancement of RCCL logging information for topology-aware utilities
[ROCm/rccl commit: 6dd090917a ]
2022-07-15 15:16:37 -07:00
Edgar Gabriel
24f2071206
Merge pull request #584 from edgargabriel/topic/signal-backtrace
...
intercept SIGUSR2 in RCCL
[ROCm/rccl commit: 58437544f8 ]
2022-07-15 11:31:19 -05:00
Edgar Gabriel
a9e0333dba
intercept SIGUSR2 in RCCL
...
Add support for intercepting SIGUSR2 in RCCL. This signal does not
terminate the application; instead, the stack trace of the process
that received the signal is printed.
[ROCm/rccl commit: 2b1d5d3bc1 ]
2022-07-15 16:28:46 +00:00
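The SIGUSR2 behavior described in the entry above (print a stack trace, keep running) can be sketched roughly as follows. This is not RCCL's implementation: it assumes Linux with glibc and uses `backtrace` / `backtrace_symbols_fd` from execinfo.h, and the names `printStackOnSigusr2`, `installSigusr2Handler`, and `g_sigusr2Count` are hypothetical. Note that `backtrace` is not formally async-signal-safe, so a production handler may need more care.

```cpp
#include <signal.h>     // sigaction, SIGUSR2, sig_atomic_t
#include <execinfo.h>   // backtrace, backtrace_symbols_fd (glibc-specific)
#include <unistd.h>     // STDERR_FILENO

// Counts handled signals so callers can observe that the handler ran.
static volatile sig_atomic_t g_sigusr2Count = 0;

// Hypothetical handler: write the current stack frames to stderr and
// return normally, so the process continues executing.
extern "C" void printStackOnSigusr2(int /*signum*/) {
  void* frames[64];
  int depth = backtrace(frames, 64);
  backtrace_symbols_fd(frames, depth, STDERR_FILENO);
  g_sigusr2Count = g_sigusr2Count + 1;
}

// Install the handler for SIGUSR2; returns true on success.
bool installSigusr2Handler() {
  struct sigaction action = {};
  action.sa_handler = printStackOnSigusr2;
  sigemptyset(&action.sa_mask);
  action.sa_flags = SA_RESTART;  // restart interrupted system calls
  return sigaction(SIGUSR2, &action, nullptr) == 0;
}
```

With such a handler installed, sending the signal from a shell (for example `kill -USR2 <pid>`) would dump the stack of that process to stderr without stopping it.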
akolliasAMD
24d9d1c37a
Merge pull request #583 from yzygitzh/ziyyang/ll-fix
...
Remove redundant LOAD/STORE usage in LL initialization
[ROCm/rccl commit: da31537ec7 ]
2022-07-14 08:51:39 -06:00
Ziyue Yang
e1aae026bf
Remove redundant LOAD/STORE usage in LL initialization
...
[ROCm/rccl commit: 77c2bef952 ]
2022-07-14 00:40:36 +00:00
akolliasAMD
d2df866925
Merge pull request #582 from akolliasAMD/readmeUpdate
...
updated readme to reflect the newer tests
[ROCm/rccl commit: 873c13b47a ]
2022-07-13 12:28:30 -06:00
akolliasAMD
2a1d472a20
updated readme to reflect the newer tests
...
[ROCm/rccl commit: 5950942738 ]
2022-07-13 16:08:28 +00:00
Wenkai Du
9a9d9cb29b
README.md: add CMAKE_PREFIX_PATH to build steps ( #581 )
...
[ROCm/rccl commit: 314da5a485 ]
2022-07-12 11:32:07 -07:00
hubertlu-tw
e13eb2eab9
Enhancement of RCCL logging information for topology-aware utilities
...
[ROCm/rccl commit: a1842df858 ]
2022-07-11 19:01:10 +00:00
Wenkai Du
c129677fe0
Skip HDP cache flush for gfx90a ( #578 )
...
* Skip HDP cache flush for gfx90a
* Remove extra debug print
[ROCm/rccl commit: 8c3c8b78c0 ]
2022-07-08 10:13:32 -07:00
Wenkai Du
659cd52d5c
Add more constraints to enable GDR ( #579 )
...
* Add more constraints to enable GDR
* Revert deleted line
[ROCm/rccl commit: aa0d7ca882 ]
2022-07-08 09:52:27 -07:00
Yifan Xiong
bf15ad1d72
Reduce AlltoAll port usage in send/recv proxy ( #577 )
...
* Reduce AlltoAll port usage when connecting proxy
Reuse socket ports when connecting proxies in AlltoAll.
Existing port usage in AlltoAll is O(n) for recv and O(n) for send.
Reusing socket ports on either the server or the client side makes one of them
O(1); reusing both reduces the total port usage to O(1) and enables
AlltoAll on more than 64 MI200 nodes.
* Update changelog accordingly
[ROCm/rccl commit: 80f53cc171 ]
2022-07-07 16:15:52 -07:00
Wenkai Du
4b99cef680
Revert "Adding the missing roc:: namespace ( #570 )" ( #576 )
...
This reverts commit fc340decf4 .
[ROCm/rccl commit: 2e65881a79 ]
2022-07-06 10:07:35 -07:00
Wenkai Du
e04bba619a
Use nontemporal in slow path and add XGMI sys type ( #575 )
...
* Use nontemporal in slow path and add XGMI sys type
* Clean up XGMI detection
[ROCm/rccl commit: b250c01cbe ]
2022-07-06 07:58:41 -07:00
Wenkai Du
2f4aea93e0
Fix GPU to NIC mapping in tree ( #573 )
...
* Fix GPU to NIC mapping in tree
* Update tuning table
[ROCm/rccl commit: 00af1f64e9 ]
2022-07-03 20:52:52 -07:00
gilbertlee-amd
cb5ae7224e
Adding git hash info to version output line ( #572 )
...
[ROCm/rccl commit: a89a9966aa ]
2022-06-28 16:42:51 -06:00
Dmitry Mikushin
fc340decf4
Adding the missing roc:: namespace ( #570 )
...
* Adding the missing roc:: namespace, effectively changing the value of RCCL_LIBRARY from rccl to roc::rccl.
The important difference is that the linker treats rccl as a symbolic "-lrccl" (and linking fails
due to a missing library search path), while roc::rccl is a target name, which can resolve to an absolute
library path.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com >
* Adding a changelog entry
* minor updates to wording
* missing period
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com >
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com >
[ROCm/rccl commit: d5bea2cfaa ]
2022-06-27 11:44:43 -06:00
Wenkai Du
915a9d3934
Do not set NET GDR level automatically ( #571 )
...
[ROCm/rccl commit: 9a285b5e1d ]
2022-06-23 16:28:28 -07:00
Wenkai Du
784b12bf75
Use different atomics to check flags in kernel ( #568 )
...
[ROCm/rccl commit: c3bb9e70d0 ]
2022-06-23 09:16:41 -07:00