Wenkai Du
c2e9ada40b
Repurpose profiling implementation to simple timestamps tracing ( #600 )
...
[ROCm/rccl commit: 14b8ff153f ]
2022-08-18 15:34:46 -07:00
Wenkai Du
6c3f1366e8
Add XGMI sys type and clean up detection code ( #597 )
...
[ROCm/rccl commit: f5c0b243a8 ]
2022-08-12 09:52:29 -07:00
Ziyue Yang
478d8312b8
Improve alignment and tuning for Pivot A2A algorithm ( #593 )
...
* Improve alignment and tuning for Pivot A2A algorithm
* enable pivot a2a by default
[ROCm/rccl commit: f6b9686482 ]
2022-08-05 19:40:19 -07:00
gilbertlee-amd
e3b832f4ce
Disable clique AllReduce UnitTest ( #595 )
...
[ROCm/rccl commit: dae11c2aca ]
2022-08-04 18:30:00 -06:00
gilbertlee-amd
b350916a6e
Fixing CMake to avoid unnecessary git_version relinking ( #594 )
...
[ROCm/rccl commit: 9ed9cd0e31 ]
2022-08-04 18:03:59 -06:00
arvindcheru
a44be6655d
HIP Path default updated to ROCM_PATH (reorg path) ( #592 )
...
Updated default path for hip to ROCM_PATH (/opt/rocm instead of /opt/rocm/hip) as per new/current structure.
[ROCm/rccl commit: 2cb2f9493a ]
2022-08-04 13:38:41 -04:00
akolliasAMD
6fb5c5d5e3
minor latency tuning ( #591 )
...
* minor tuning for tree ll
[ROCm/rccl commit: 4cecdc9be5 ]
2022-08-03 15:07:44 -06:00
Wenkai Du
f70830d629
Revert "Use nontemporal in slow path and add XGMI sys type ( #575 )" ( #590 )
...
This reverts commit e04bba619a .
[ROCm/rccl commit: 9089e68a99 ]
2022-08-02 09:31:53 -07:00
Wenkai Du
7e124d5b83
Add nccl_net.h to librccl-dev package ( #589 )
...
[ROCm/rccl commit: e2cb95a390 ]
2022-07-29 13:39:49 -07:00
akolliasAMD
d5ca0be51f
Fixed issue with atomicExch creating errors on multi-node runs ( #587 )
...
[ROCm/rccl commit: 254208e7dd ]
2022-07-22 11:32:49 -06:00
akolliasAMD
fd99ca19f5
updated alltoallV test to reflect how send counts are done in perf tests ( #586 )
...
[ROCm/rccl commit: 686dbc8bc6 ]
2022-07-21 14:59:34 -06:00
akolliasAMD
18d9fd1b8f
Removing redundant LOAD and STORE on primitives plus adding some atomics ( #585 )
...
[ROCm/rccl commit: 451c287aa6 ]
2022-07-21 13:04:57 -06:00
Hubert Lu
088d62ff58
Merge pull request #580 from hubertlu-tw/develop
...
Enhancement of RCCL logging information for topology-aware utilities
[ROCm/rccl commit: 6dd090917a ]
2022-07-15 15:16:37 -07:00
Edgar Gabriel
24f2071206
Merge pull request #584 from edgargabriel/topic/signal-backtrace
...
intercept SIGUSR2 in RCCL
[ROCm/rccl commit: 58437544f8 ]
2022-07-15 11:31:19 -05:00
Edgar Gabriel
a9e0333dba
intercept SIGUSR2 in RCCL
...
Add support for intercepting SIGUSR2 in RCCL. The signal does
not terminate the application; instead, the process that receives it
prints its stack trace.
[ROCm/rccl commit: 2b1d5d3bc1 ]
2022-07-15 16:28:46 +00:00
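The behavior described in this commit (catch SIGUSR2, print a stack trace, keep running) can be illustrated with a minimal sketch. It is written in Python for brevity and is not RCCL's C++ implementation; the names `dump_stack`, `captured`, and `send_sigusr2_to_self` are hypothetical and exist only for this example. It assumes a POSIX system and that the code runs in the main thread.

```python
import os
import signal
import traceback

# Hypothetical buffer so the captured trace can be inspected afterwards.
captured = []

def dump_stack(signum, frame):
    """Print the stack of the interrupted code and return, so execution continues."""
    trace = "".join(traceback.format_stack(frame))
    captured.append(trace)
    print(f"signal {signum} received in pid {os.getpid()}:\n{trace}")

# Replace the default SIGUSR2 action (terminate the process) with the tracing handler.
signal.signal(signal.SIGUSR2, dump_stack)

def send_sigusr2_to_self():
    """Deliver SIGUSR2 to this process; the handler runs and the process keeps going."""
    os.kill(os.getpid(), signal.SIGUSR2)
    return "still running"
```

From a shell, the equivalent trigger would be `kill -USR2 <pid>` against the running process.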
akolliasAMD
24d9d1c37a
Merge pull request #583 from yzygitzh/ziyyang/ll-fix
...
Remove redundant LOAD/STORE usage in LL initialization
[ROCm/rccl commit: da31537ec7 ]
2022-07-14 08:51:39 -06:00
Ziyue Yang
e1aae026bf
Remove redundant LOAD/STORE usage in LL initialization
...
[ROCm/rccl commit: 77c2bef952 ]
2022-07-14 00:40:36 +00:00
akolliasAMD
d2df866925
Merge pull request #582 from akolliasAMD/readmeUpdate
...
updated readme to reflect the newer tests
[ROCm/rccl commit: 873c13b47a ]
2022-07-13 12:28:30 -06:00
akolliasAMD
2a1d472a20
updated readme to reflect the newer tests
...
[ROCm/rccl commit: 5950942738 ]
2022-07-13 16:08:28 +00:00
Wenkai Du
9a9d9cb29b
README.md: add CMAKE_PREFIX_PATH to build steps ( #581 )
...
[ROCm/rccl commit: 314da5a485 ]
2022-07-12 11:32:07 -07:00
hubertlu-tw
e13eb2eab9
Enhancement of RCCL logging information for topology-aware utilities
...
[ROCm/rccl commit: a1842df858 ]
2022-07-11 19:01:10 +00:00
Wenkai Du
c129677fe0
Skip HDP cache flush for gfx90a ( #578 )
...
* Skip HDP cache flush for gfx90a
* Remove extra debug print
[ROCm/rccl commit: 8c3c8b78c0 ]
2022-07-08 10:13:32 -07:00
Wenkai Du
659cd52d5c
Add more constraints to enable GDR ( #579 )
...
* Add more constraints to enable GDR
* Revert deleted line
[ROCm/rccl commit: aa0d7ca882 ]
2022-07-08 09:52:27 -07:00
Yifan Xiong
bf15ad1d72
Reduce AlltoAll port usage in send/recv proxy ( #577 )
...
* Reduce AlltoAll port usage when connecting proxy
Reuse socket ports when connecting proxies in AlltoAll.
Existing port usage in AlltoAll is O(n) for recv and O(n) for send.
Reusing socket ports on either the server or the client side makes one of them
O(1); reusing both reduces the total port usage to O(1) and enables
AlltoAll on more than 64 MI200 nodes.
* Update changelog accordingly
[ROCm/rccl commit: 80f53cc171 ]
2022-07-07 16:15:52 -07:00
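The port-count argument in this commit can be made concrete with a simplified counting model. This is only a sketch under stated assumptions: it supposes that, without reuse, a rank needs one port per peer in each direction, and that reuse collapses a direction to a single port. The function name `ports_per_rank` and its parameters are hypothetical and do not come from RCCL.

```python
def ports_per_rank(n_peers, reuse_recv, reuse_send):
    """Count ports one rank uses under the simplified model.

    Assumption: without reuse, each direction costs one port per peer (O(n));
    with reuse, that direction costs a single shared port (O(1)).
    """
    recv_ports = 1 if reuse_recv else n_peers
    send_ports = 1 if reuse_send else n_peers
    return recv_ports + send_ports

# Compare the three cases named in the commit message for 64 peers.
for reuse_recv, reuse_send in [(False, False), (True, False), (True, True)]:
    total = ports_per_rank(64, reuse_recv, reuse_send)
    print(f"reuse_recv={reuse_recv}, reuse_send={reuse_send}: {total} ports")
```

Under this model, reusing only one side still leaves a term that grows with the number of peers, which is why the commit reuses both.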
Wenkai Du
4b99cef680
Revert "Adding the missing roc:: namespace ( #570 )" ( #576 )
...
This reverts commit fc340decf4 .
[ROCm/rccl commit: 2e65881a79 ]
2022-07-06 10:07:35 -07:00
Wenkai Du
e04bba619a
Use nontemporal in slow path and add XGMI sys type ( #575 )
...
* Use nontemporal in slow path and add XGMI sys type
* Clean up XGMI detection
[ROCm/rccl commit: b250c01cbe ]
2022-07-06 07:58:41 -07:00
Wenkai Du
2f4aea93e0
Fix GPU to NIC mapping in tree ( #573 )
...
* Fix GPU to NIC mapping in tree
* Update tuning table
[ROCm/rccl commit: 00af1f64e9 ]
2022-07-03 20:52:52 -07:00
gilbertlee-amd
cb5ae7224e
Adding git hash info to version output line ( #572 )
...
[ROCm/rccl commit: a89a9966aa ]
2022-06-28 16:42:51 -06:00
Dmitry Mikushin
fc340decf4
Adding the missing roc:: namespace ( #570 )
...
* Adding the missing roc:: namespace, effectively changing the value of RCCL_LIBRARY from rccl to roc::rccl.
The important difference is that the linker treats rccl as a symbolic "-lrccl" (and linking fails
because of a missing library search path), while roc::rccl is a target name that can resolve to an absolute
library path.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com >
* Adding a changelog entry
* minor updates to wording
* missing period
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com >
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com >
[ROCm/rccl commit: d5bea2cfaa ]
2022-06-27 11:44:43 -06:00
Wenkai Du
915a9d3934
Do not set NET GDR level automatically ( #571 )
...
[ROCm/rccl commit: 9a285b5e1d ]
2022-06-23 16:28:28 -07:00
Wenkai Du
784b12bf75
Use different atomics to check flags in kernel ( #568 )
...
[ROCm/rccl commit: c3bb9e70d0 ]
2022-06-23 09:16:41 -07:00
akolliasAMD
f5b5160820
Merge pull request #569 from akolliasAMD/disableMultiRankTest
...
moved default number of max ranks per gpu to 1
[ROCm/rccl commit: 06f05300fe ]
2022-06-22 15:52:06 -04:00
akolliasAMD
dcf46e84e0
moved default number of max ranks per gpu to 1
...
[ROCm/rccl commit: 8b9291eb47 ]
2022-06-22 17:37:49 +00:00
Ziyue Yang
2b418b5dee
Add Feature - Add NPKit Support in RCCL ( #564 )
...
* apply npkit
* fix bug
* add npkit in readme
[ROCm/rccl commit: 6e93fafdc3 ]
2022-06-20 14:30:19 -07:00
Wenkai Du
0fb000932f
Change default nchannels per peer ( #563 )
...
[ROCm/rccl commit: f274c865c1 ]
2022-06-13 06:39:05 -07:00
arvindcheru
9c0e790eb5
[CMake] GNU Install Dir Enhancements ( #557 )
...
* sd321110 (GNUInstall Dir) enhancements
[ROCm/rccl commit: a1fe1adf1c ]
2022-06-10 18:51:51 -04:00
Edgar Gabriel
5099922936
Merge pull request #561 from edgargabriel/multi-rank-devel
...
Multi rank devel
[ROCm/rccl commit: 45e611dffd ]
2022-06-10 11:19:20 -05:00
Edgar
f7ef619ba7
extending the unit-tests for multi-rank support
...
[ROCm/rccl commit: a87d61db2b ]
2022-06-10 14:23:19 +00:00
Edgar
8953f5b5ca
Introduce multi-rank support per device.
...
This is a single commit of the source code changes required to
introduce support for multiple ranks per device.
A new interface (ncclCommRankInitMulti) has to be used to make use of
this new feature.
[ROCm/rccl commit: 0336ffdf70 ]
2022-06-10 14:23:12 +00:00
Wenkai Du
11a6cdd52f
Fix P2P scheduling ( #560 )
...
[ROCm/rccl commit: 5cb2aca3d9 ]
2022-06-06 13:32:28 -07:00
Wenkai Du
f2dbc77afe
Enable timing profile option ( #558 )
...
[ROCm/rccl commit: 7a6c6927ae ]
2022-06-03 07:05:13 -07:00
akolliasAMD
49adbcc5c7
Merge pull request #556 from akolliasAMD/ROCmSoftwarePlatform/2.12.12
...
Sync up with NCCL 2.12.12
[ROCm/rccl commit: 2f9663379d ]
2022-06-02 12:09:49 -04:00
Aristotelis
0b55e01ef3
Merge remote-tracking branch 'ncclRepo/master' into develop
...
[ROCm/rccl commit: e0864e7093 ]
2022-06-02 15:27:24 +00:00
Wenkai Du
1e36b432f1
Revert chunksteps changes ( #555 )
...
[ROCm/rccl commit: eef812bed7 ]
2022-05-31 14:45:51 -07:00
Wenkai Du
5becf1669f
Add another Rome model ( #553 )
...
* Add another Rome model
* Add option to force enable intranet on single node
* Limit p2p channels to number of ranks
* Refine p2p channels handling
[ROCm/rccl commit: ef499c4810 ]
2022-05-31 11:31:30 -07:00
akolliasAMD
a03ab8e752
code cleanup ( #554 )
...
[ROCm/rccl commit: a0a686e74c ]
2022-05-31 09:59:36 -04:00
Wenkai Du
2c125ce6ed
Update Rome model ( #552 )
...
[ROCm/rccl commit: c5b77121f0 ]
2022-05-26 09:59:23 -07:00
akolliasAMD
22dc8bd246
Added creation of new tree and added switch for using treesplit for specific cases ( #551 )
...
[ROCm/rccl commit: 98f0809a39 ]
2022-05-25 18:55:14 -04:00
gilbertlee-amd
a2a4888497
Moving opt-in custom signal handler from UnitTests into RCCL ( #550 )
...
* Enable via RCCL_ENABLE_SIGNALHANDLER=1
[ROCm/rccl commit: 700b473211 ]
2022-05-20 09:56:38 -06:00
Wenkai Du
86e8797602
Add switch for pivot alltoall kernel ( #549 )
...
[ROCm/rccl commit: 6707a270b1 ]
2022-05-17 18:14:04 -07:00