Wenkai Du
7bbce085cc
Enable LL128 protocol support (#605)
...
* Enable LL128 protocol support
* Use shared memory object directly when possible
2022-09-08 14:45:27 -07:00
Lauren Wrubleski
d700a94918
Update ubuntu18 to ubuntu20 (#611)
2022-09-07 16:02:37 -06:00
Min Si
2b57751abb
Fix compilation issues with buck (#610)
...
* Fix compilation warning with -Wmisleading-indentation
When compiling with -Wmisleading-indentation, the compiler reports the warning:
misleading indentation; statement is not part of the previous 'if'
This patch fixes it.
* Avoid relative include file path
We don't need relative include file paths for src/graph/*.h
since src/ is already in CMake include_directories
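For illustration only, the kind of fix that silences -Wmisleading-indentation can be sketched as below. The function name and logic are hypothetical and are not taken from the RCCL sources; the flagged pattern is shown in a comment so the snippet itself compiles cleanly.

```cpp
// Hypothetical example; not taken from the RCCL sources.
//
// Pattern that triggers the warning (shown as a comment only):
//
//   if (enabled)
//     n += 1;
//     n += 10;   // indented like the 'if' body, but always executed
//
// Fixed version: braces and indentation match the real control flow.
int countAfterFix(bool enabled) {
  int n = 0;
  if (enabled) {
    n += 1;
  }
  n += 10;  // unconditional, and now visibly so
  return n;
}
```

The behavior is unchanged by such a fix; only the layout is corrected so that it no longer suggests the second statement is guarded by the 'if'.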
2022-09-07 09:56:05 -06:00
gilbertlee-amd
47b2fc3a30
Adding opt-in hipGraph support for RCCL via RCCL_ENABLE_HIPGRAPH (#608)
...
Adding opt-in hipGraph support via RCCL_ENABLE_HIPGRAPH
2022-09-06 10:29:46 -06:00
akolliasAMD
06bce9d0c9
added stream synch after hipMemset (#609)
2022-08-30 16:18:37 -06:00
Wenkai Du
c9f2fe1f65
Use hipExtLaunchKernel when not using graph and not in group mode (#606)
2022-08-26 13:40:37 -07:00
akolliasAMD
6670dc95ab
git_version cmake consistency changes (#604)
...
* git_version cmake variable consistency changes
2022-08-25 15:11:28 -06:00
Edgar Gabriel
8a311583e0
Merge pull request #603 from edgargabriel/topic/float16_unit_tests
...
introduce support for ncclFloat16/half in UT
2022-08-25 07:40:20 -05:00
Edgar Gabriel
f6e00dec13
introduce support for ncclFloat16/half in UT
2022-08-24 15:28:24 +00:00
Edgar Gabriel
e739c62a53
Merge pull request #598 from edgargabriel/topic/tree-multirank
...
Expand ncclTreeBasePostset for multi-rank
2022-08-24 08:28:34 -05:00
Wenkai Du
88487a62bb
Use non-temporal access for slow path (#602)
2022-08-23 08:21:51 -07:00
Edgar Gabriel
4141ec1151
fix channelcount for multi-rank scenario
2022-08-22 19:09:22 +00:00
akolliasAMD
3c1b1ec8c8
Simple tree changes (#599)
...
changed treebase to create basic balanced tree
2022-08-19 13:51:49 -06:00
Edgar Gabriel
6fba80208c
Merge pull request #601 from CosmicFusion/patch-2
...
fix error: use of undeclared identifier 'free'
2022-08-19 14:10:49 -05:00
Cosmic Fusion
080fc2d9d6
fix error: use of undeclared identifier 'free'
...
Include stdlib.h to fix a compilation error in rccl:
[39/58] Building CXX object CMakeFiles/rccl.dir/src/misc/signals.cc.o
FAILED: CMakeFiles/rccl.dir/src/misc/signals.cc.o
/opt/rocm/bin/hipcc -DENABLE_COLLTRACE -DHAVE_BFD -DHAVE_CPLUS_DEMANGLE -DUSE_ROCM_SMI64CONFIG -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Drccl_EXPORTS -I/home/cosmo/build/flgrwqa/build/include -I/home/cosmo/build/flgrwqa/build/include/rccl -I/home/cosmo/build/flgrwqa/rccl/src -I/home/cosmo/build/flgrwqa/rccl/src/include -I/home/cosmo/build/flgrwqa/rccl/src/collectives -I/home/cosmo/build/flgrwqa/rccl/src/collectives/device -I/opt/hsa/include -fPIC -fvisibility=hidden -fgpu-rdc -parallel-jobs=8 -Wno-format-nonliteral -x hip --offload-arch=gfx803 --offload-arch=gfx900:xnack- --offload-arch=gfx906:xnack- --offload-arch=gfx908:xnack- --offload-arch=gfx90a:xnack- --offload-arch=gfx90a:xnack+ --offload-arch=gfx1030 -std=c++14 -MD -MT CMakeFiles/rccl.dir/src/misc/signals.cc.o -MF CMakeFiles/rccl.dir/src/misc/signals.cc.o.d -o CMakeFiles/rccl.dir/src/misc/signals.cc.o -c /home/cosmo/build/flgrwqa/rccl/src/misc/signals.cc
In file included from /home/cosmo/build/flgrwqa/rccl/src/misc/signals.cc:8:
/home/cosmo/build/flgrwqa/rccl/src/include/BfdBacktrace.hpp:138:9: error: use of undeclared identifier 'free'
free(file->syms);
^
/home/cosmo/build/flgrwqa/rccl/src/include/BfdBacktrace.hpp:155:5: error: use of undeclared identifier 'free'
free(file->syms);
^
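As a minimal sketch of the fix described above, the header that declares `free` is included explicitly rather than relied on transitively. The `SymbolFile` type and `releaseSymbols` function are hypothetical stand-ins, not the actual contents of BfdBacktrace.hpp.

```cpp
#include <cstdlib>  // declares std::malloc and std::free (<stdlib.h> declares ::free)

// Hypothetical stand-in for a structure that owns a malloc'ed symbol table.
struct SymbolFile {
  void** syms = nullptr;
};

// Releases the table and resets the pointer so that a second call is harmless.
void releaseSymbols(SymbolFile* file) {
  std::free(file->syms);  // calling free on a null pointer is a no-op
  file->syms = nullptr;
}
```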
2022-08-19 20:25:06 +03:00
Wenkai Du
14b8ff153f
Repurpose profiling implementation to simple timestamps tracing (#600)
2022-08-18 15:34:46 -07:00
Wenkai Du
f5c0b243a8
Add XGMI sys type and clean up detection code (#597)
2022-08-12 09:52:29 -07:00
Ziyue Yang
f6b9686482
Improve alignment and tuning for Pivot A2A algorithm (#593)
...
* Improve alignment and tuning for Pivot A2A algorithm
* enable pivot a2a by default
2022-08-05 19:40:19 -07:00
gilbertlee-amd
dae11c2aca
Disable clique AllReduce UnitTest (#595)
2022-08-04 18:30:00 -06:00
gilbertlee-amd
9ed9cd0e31
Fixing CMake to avoid unnecessary git_version relinking (#594)
2022-08-04 18:03:59 -06:00
arvindcheru
2cb2f9493a
HIP Path default updated to ROCM_PATH (reorg path) (#592)
...
Updated default path for hip to ROCM_PATH (/opt/rocm instead of /opt/rocm/hip) as per new/current structure.
2022-08-04 13:38:41 -04:00
akolliasAMD
4cecdc9be5
minor latency tuning (#591)
...
* minor tuning for tree ll
2022-08-03 15:07:44 -06:00
Wenkai Du
9089e68a99
Revert "Use nontemporal in slow path and add XGMI sys type (#575)" (#590)
...
This reverts commit b250c01cbe.
2022-08-02 09:31:53 -07:00
Wenkai Du
e2cb95a390
Add nccl_net.h to librccl-dev package (#589)
2022-07-29 13:39:49 -07:00
akolliasAMD
254208e7dd
Fixed issue with atomicExch creating errors on multi-node runs (#587)
2022-07-22 11:32:49 -06:00
akolliasAMD
686dbc8bc6
updated alltoallV test to reflect how send counts are done in perf tests (#586)
2022-07-21 14:59:34 -06:00
akolliasAMD
451c287aa6
Removing redundant LOAD and STORE on primitives plus adding some atomics (#585)
2022-07-21 13:04:57 -06:00
Hubert Lu
6dd090917a
Merge pull request #580 from hubertlu-tw/develop
...
Enhancement of RCCL logging information for topology-aware utilities
2022-07-15 15:16:37 -07:00
Edgar Gabriel
58437544f8
Merge pull request #584 from edgargabriel/topic/signal-backtrace
...
intercept SIGUSR2 in RCCL
2022-07-15 11:31:19 -05:00
Edgar Gabriel
2b1d5d3bc1
intercept SIGUSR2 in RCCL
...
Add support for intercepting SIGUSR2 in RCCL. Instead of terminating
the application, this signal prints the stack trace of the process
that received it.
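The behavior described above can be sketched roughly as follows. This is not the RCCL implementation: the handler and installer names are hypothetical, and the sketch assumes a glibc system where `backtrace` and `backtrace_symbols_fd` from `<execinfo.h>` are available.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc)
#include <signal.h>    // sigaction, SIGUSR2
#include <unistd.h>    // STDERR_FILENO

// Counts handled signals so that callers can observe that the handler ran.
static volatile sig_atomic_t g_sigusr2Count = 0;

// Hypothetical handler: dump the current stack to stderr and return,
// so the process keeps running instead of terminating.
static void onSigusr2(int /*signo*/) {
  void* frames[64];
  int depth = backtrace(frames, 64);
  // Writes directly to the file descriptor instead of allocating strings.
  backtrace_symbols_fd(frames, depth, STDERR_FILENO);
  g_sigusr2Count = g_sigusr2Count + 1;
}

// Installs the handler for SIGUSR2; returns true on success.
bool installSigusr2Handler() {
  struct sigaction sa = {};
  sa.sa_handler = onSigusr2;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;  // restart interrupted system calls
  return sigaction(SIGUSR2, &sa, nullptr) == 0;
}
```

With such a handler installed, sending the signal from a shell (`kill -USR2 <pid>`) would print the stack of the targeted process while leaving it running.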
2022-07-15 16:28:46 +00:00
akolliasAMD
da31537ec7
Merge pull request #583 from yzygitzh/ziyyang/ll-fix
...
Remove redundant LOAD/STORE usage in LL initialization
2022-07-14 08:51:39 -06:00
Ziyue Yang
77c2bef952
Remove redundant LOAD/STORE usage in LL initialization
2022-07-14 00:40:36 +00:00
akolliasAMD
873c13b47a
Merge pull request #582 from akolliasAMD/readmeUpdate
...
updated readme to reflect the newer tests
2022-07-13 12:28:30 -06:00
akolliasAMD
5950942738
updated readme to reflect the newer tests
2022-07-13 16:08:28 +00:00
Wenkai Du
314da5a485
README.md: add CMAKE_PREFIX_PATH to build steps (#581)
2022-07-12 11:32:07 -07:00
hubertlu-tw
a1842df858
Enhancement of RCCL logging information for topology-aware utilities
2022-07-11 19:01:10 +00:00
Wenkai Du
8c3c8b78c0
Skip HDP cache flush for gfx90a (#578)
...
* Skip HDP cache flush for gfx90a
* Remove extra debug print
2022-07-08 10:13:32 -07:00
Wenkai Du
aa0d7ca882
Add more constraints to enable GDR (#579)
...
* Add more constraints to enable GDR
* Revert deleted line
2022-07-08 09:52:27 -07:00
Yifan Xiong
80f53cc171
Reduce AlltoAll port usage in send/recv proxy (#577)
...
* Reduce AlltoAll port usage when connecting proxy
Reuse socket ports when connecting proxies in AlltoAll.
Existing port usage in AlltoAll is O(n) for recv and O(n) for send.
Reusing socket ports on either the server or the client side makes one of them
O(1); reusing both reduces the total port usage to O(1) and enables
AlltoAll on >64 MI200 nodes.
* Update changelog accordingly
Update changelog accordingly.
2022-07-07 16:15:52 -07:00
Wenkai Du
2e65881a79
Revert "Adding the missing roc:: namespace (#570)" (#576)
...
This reverts commit d5bea2cfaa.
2022-07-06 10:07:35 -07:00
Wenkai Du
b250c01cbe
Use nontemporal in slow path and add XGMI sys type (#575)
...
* Use nontemporal in slow path and add XGMI sys type
* Clean up XGMI detection
2022-07-06 07:58:41 -07:00
Wenkai Du
00af1f64e9
Fix GPU to NIC mapping in tree (#573)
...
* Fix GPU to NIC mapping in tree
* Update tuning table
2022-07-03 20:52:52 -07:00
gilbertlee-amd
a89a9966aa
Adding git hash info to version output line (#572)
2022-06-28 16:42:51 -06:00
Dmitry Mikushin
d5bea2cfaa
Adding the missing roc:: namespace (#570)
...
* Adding the missing roc:: namespace, effectively changing the value of RCCL_LIBRARY from rccl to roc::rccl.
The important difference is that the linker treats rccl as a symbolic "-lrccl" (and fails to link
because of a missing library search path), while roc::rccl is a target name, which can resolve to an absolute
library path.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
* Adding a changelog entry
* minor updates to wording
* missing period
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2022-06-27 11:44:43 -06:00
Wenkai Du
9a285b5e1d
Do not set NET GDR level automatically (#571)
2022-06-23 16:28:28 -07:00
Wenkai Du
c3bb9e70d0
Use different atomics to check flags in kernel (#568)
2022-06-23 09:16:41 -07:00
akolliasAMD
06f05300fe
Merge pull request #569 from akolliasAMD/disableMultiRankTest
...
moved default number of max ranks per gpu to 1
2022-06-22 15:52:06 -04:00
akolliasAMD
8b9291eb47
moved default number of max ranks per gpu to 1
2022-06-22 17:37:49 +00:00
Ziyue Yang
6e93fafdc3
Add Feature - Add NPKit Support in RCCL (#564)
...
* apply npkit
* fix bug
* add npkit in readme
2022-06-20 14:30:19 -07:00
Wenkai Du
f274c865c1
Change default nchannels per peer (#563)
2022-06-13 06:39:05 -07:00