Commit graph

844 commits

Author SHA1 Message Date
akolliasAMD 686dbc8bc6 updated alltoallV test to reflect how send counts are done in perf tests (#586) 2022-07-21 14:59:34 -06:00
akolliasAMD 451c287aa6 Removing redundant LOAD and STORE on primitives plus adding some atomics (#585) 2022-07-21 13:04:57 -06:00
Hubert Lu 6dd090917a Merge pull request #580 from hubertlu-tw/develop
Enhancement of RCCL logging information for topology-aware utilities
2022-07-15 15:16:37 -07:00
Edgar Gabriel 58437544f8 Merge pull request #584 from edgargabriel/topic/signal-backtrace
intercept SIGUSR2 in RCCL
2022-07-15 11:31:19 -05:00
Edgar Gabriel 2b1d5d3bc1 intercept SIGUSR2 in RCCL
Add support for intercepting SIGUSR2 in RCCL. Rather than terminating
the application, this signal prints the stack trace of the process
it was sent to.
2022-07-15 16:28:46 +00:00
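The commit above describes a SIGUSR2 handler that prints a stack trace and lets the process keep running. Below is a minimal sketch of that general technique, using glibc's `backtrace()` and `backtrace_symbols_fd()` from `<execinfo.h>` together with POSIX `sigaction()`. The names `install_usr2_backtrace` and `usr2_count` are hypothetical. This is not RCCL's actual handler; the commits in #530 further down describe bfd-based symbol resolution instead.

```c
#define _GNU_SOURCE
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical counter, only so callers can check that the handler ran. */
static volatile sig_atomic_t usr2_count = 0;

/* Print the current call stack to stderr, then return so the
 * interrupted code keeps running instead of terminating. */
static void usr2_backtrace_handler(int sig) {
    (void)sig;
    void *frames[64];
    int depth = backtrace(frames, 64);
    /* Writes straight to the file descriptor; unlike backtrace_symbols(),
     * it does not malloc() a result array. */
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);
    usr2_count++;
}

/* Install the handler; returns 0 on success, -1 on error (see errno). */
int install_usr2_backtrace(void) {
    /* Call backtrace() once outside the handler so any lazy loading it
     * performs happens here rather than inside the signal handler. */
    void *warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = usr2_backtrace_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* restart syscalls the signal interrupted */
    return sigaction(SIGUSR2, &sa, NULL);
}
```

Once the handler is installed, running `kill -USR2 <pid>` from a shell prints the target process's stack to stderr, and the process continues executing.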
akolliasAMD da31537ec7 Merge pull request #583 from yzygitzh/ziyyang/ll-fix
Remove redundant LOAD/STORE usage in LL initialization
2022-07-14 08:51:39 -06:00
Ziyue Yang 77c2bef952 Remove redundant LOAD/STORE usage in LL initialization 2022-07-14 00:40:36 +00:00
akolliasAMD 873c13b47a Merge pull request #582 from akolliasAMD/readmeUpdate
updated readme to reflect the newer tests
2022-07-13 12:28:30 -06:00
akolliasAMD 5950942738 updated readme to reflect the newer tests 2022-07-13 16:08:28 +00:00
Wenkai Du 314da5a485 README.md: add CMAKE_PREFIX_PATH to build steps (#581) 2022-07-12 11:32:07 -07:00
hubertlu-tw a1842df858 Enhancement of RCCL logging information for topology-aware utilities 2022-07-11 19:01:10 +00:00
Wenkai Du 8c3c8b78c0 Skip HDP cache flush for gfx90a (#578)
* Skip HDP cache flush for gfx90a

* Remove extra debug print
2022-07-08 10:13:32 -07:00
Wenkai Du aa0d7ca882 Add more constraints to enable GDR (#579)
* Add more constraints to enable GDR

* Revert deleted line
2022-07-08 09:52:27 -07:00
Yifan Xiong 80f53cc171 Reduce AlltoAll port usage in send/recv proxy (#577)
* Reduce AlltoAll port usage when connecting proxy

Reuse socket ports when connecting proxies in AlltoAll.

Existing port usage in AlltoAll is O(n) for recv and O(n) for send.
Reusing socket ports on either the server or the client side makes one
of them O(1); reusing both reduces the total port usage to O(1) and
enables AlltoAll on more than 64 MI200 nodes.

* Update changelog accordingly

Update changelog accordingly.
2022-07-07 16:15:52 -07:00
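Below is a rough model of the port accounting described in #577. It assumes that each rank in an n-rank AlltoAll connects to n - 1 peers and that, without reuse, it consumes one server-side and one client-side socket port per peer. The function `ports_per_rank` is hypothetical. It only illustrates the O(n) versus O(1) arithmetic and is not RCCL code.

```c
/* Hypothetical model of per-rank socket-port usage in an n-rank AlltoAll.
 * Assumes one server-side (accepting) and one client-side (connecting)
 * port per peer unless that side reuses a single port for all peers. */
static int ports_per_rank(int n, int reuse_server, int reuse_client) {
    int peers = n - 1;                            /* every other rank */
    int server_ports = reuse_server ? 1 : peers;  /* O(1) vs O(n) */
    int client_ports = reuse_client ? 1 : peers;  /* O(1) vs O(n) */
    return server_ports + client_ports;
}
```

For 512 ranks, this model gives 1022 ports per rank with no reuse, 512 when one side reuses its port, and 2 when both sides do.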
Wenkai Du 2e65881a79 Revert "Adding the missing roc:: namespace (#570)" (#576)
This reverts commit d5bea2cfaa.
2022-07-06 10:07:35 -07:00
Wenkai Du b250c01cbe Use nontemporal in slow path and add XGMI sys type (#575)
* Use nontemporal in slow path and add XGMI sys type

* Clean up XGMI detection
2022-07-06 07:58:41 -07:00
Wenkai Du 00af1f64e9 Fix GPU to NIC mapping in tree (#573)
* Fix GPU to NIC mapping in tree

* Update tuning table
2022-07-03 20:52:52 -07:00
gilbertlee-amd a89a9966aa Adding git hash info to version output line (#572) 2022-06-28 16:42:51 -06:00
Dmitry Mikushin d5bea2cfaa Adding the missing roc:: namespace (#570)
* Adding the missing roc:: namespace, effectively changing the value of RCCL_LIBRARY from rccl to roc::rccl.
The important difference is that the linker treats rccl as a symbolic "-lrccl" (and linking fails
due to a missing library search path), while roc::rccl is a target name, which can resolve to an absolute
library path.

Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

* Adding a changelog entry

* minor updates to wording

* missing period

Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2022-06-27 11:44:43 -06:00
Wenkai Du 9a285b5e1d Do not set NET GDR level automatically (#571) 2022-06-23 16:28:28 -07:00
Wenkai Du c3bb9e70d0 Use different atomics to check flags in kernel (#568) 2022-06-23 09:16:41 -07:00
akolliasAMD 06f05300fe Merge pull request #569 from akolliasAMD/disableMultiRankTest
moved default number of max ranks per gpu to 1
2022-06-22 15:52:06 -04:00
akolliasAMD 8b9291eb47 moved default number of max ranks per gpu to 1 2022-06-22 17:37:49 +00:00
Ziyue Yang 6e93fafdc3 Add Feature - Add NPKit Support in RCCL (#564)
* apply npkit

* fix bug

* add npkit in readme
2022-06-20 14:30:19 -07:00
Wenkai Du f274c865c1 Change default nchannels per peer (#563) 2022-06-13 06:39:05 -07:00
arvindcheru a1fe1adf1c [CMake] GNU Install Dir Enhancements (#557)
* sd321110 (GNUInstall Dir) enhancements
2022-06-10 18:51:51 -04:00
Edgar Gabriel 45e611dffd Merge pull request #561 from edgargabriel/multi-rank-devel
Multi rank devel
2022-06-10 11:19:20 -05:00
Edgar a87d61db2b extending the unit-tests for multi-rank support 2022-06-10 14:23:19 +00:00
Edgar 0336ffdf70 Introduce multi-rank support per device.
This is a single commit of the source code changes required to
introduce support for multiple ranks per device.
A new interface (ncclCommRankInitMulti) has to be used to make use of
this new feature.
2022-06-10 14:23:12 +00:00
Wenkai Du 5cb2aca3d9 Fix P2P scheduling (#560) 2022-06-06 13:32:28 -07:00
Wenkai Du 7a6c6927ae Enable timing profile option (#558) 2022-06-03 07:05:13 -07:00
akolliasAMD 2f9663379d Merge pull request #556 from akolliasAMD/ROCmSoftwarePlatform/2.12.12
Sync up with NCCL 2.12.12
2022-06-02 12:09:49 -04:00
Aristotelis e0864e7093 Merge remote-tracking branch 'ncclRepo/master' into develop 2022-06-02 15:27:24 +00:00
Wenkai Du eef812bed7 Revert chunksteps changes (#555) 2022-05-31 14:45:51 -07:00
Wenkai Du ef499c4810 Add another Rome model (#553)
* Add another Rome model

* Add option to force enable intranet on single node

* Limit p2p channels to number of ranks

* Refine p2p channels handling
2022-05-31 11:31:30 -07:00
akolliasAMD a0a686e74c code cleanup (#554) 2022-05-31 09:59:36 -04:00
Wenkai Du c5b77121f0 Update Rome model (#552) 2022-05-26 09:59:23 -07:00
akolliasAMD 98f0809a39 Added creation of new tree and added switch for using treesplit for specific cases (#551) 2022-05-25 18:55:14 -04:00
gilbertlee-amd 700b473211 Moving opt-in custom signal handler from UnitTests into RCCL (#550)
* Enable via RCCL_ENABLE_SIGNALHANDLER=1
2022-05-20 09:56:38 -06:00
Wenkai Du 6707a270b1 Add switch for pivot alltoall kernel (#549) 2022-05-17 18:14:04 -07:00
Wenkai Du 283dc86a73 Refine and add new Rome models (#548) 2022-05-17 08:23:59 -07:00
Sylvain Jeaugey 7aa1c46fd5 2.12.12-1
Improve allreduce performance when we have more than one network interface per
GPU and we need to use PXN to close rings.
Add support for PCI Gen5 on 5.4 kernels.
Fix crash when setting NCCL_SET_THREAD_NAME.
Fix random crash in init due to uninitialized struct.
Fix hang on cubemesh topologies.
Add P2P_DIRECT_DISABLE parameter to disable direct access to pointers within a
process.
2022-05-13 00:26:57 -07:00
Wenkai Du c9919e0e35 Improve LL performance (#546)
* Improve LL performance

* Add split barriers for LL
2022-05-10 13:32:10 -07:00
Edgar Gabriel 46b30c5f9b Merge pull request #544 from edgargabriel/topic/header-file-include
fix cmake logic to handle old and new include dirs
2022-04-28 16:29:08 -05:00
Edgar 4c4a7cb696 fix cmake logic to handle old and new include dirs
Starting with ROCm 5.2, the include directories have been
reorganized. This PR allows RCCL to compile with
both the old and the new directory layout.
This solution uses find_package() to identify the correct
settings for rocm_smi starting with rocm-5.2, and keeps the original (manual)
settings for all previous releases.

Tested with rocm-5.2, 5.1.1, 5.0.2, and 4.5.2.
2022-04-28 14:33:46 -04:00
gilbertlee-amd 685bcea127 [TransferBench] Syncing with TransferBench v1.02 (#541) 2022-04-27 20:43:24 -06:00
Wenkai Du 063da25563 topo_expl: fix build and add tuning support (#539) 2022-04-26 15:40:07 -07:00
Wenkai Du 379940dfac Merge pull request #533 from ROCmSoftwarePlatform/2.12.10
Sync up with NCCL 2.12.10
2022-04-26 10:09:37 -07:00
Edgar Gabriel 39e3002e19 Merge pull request #530 from edgargabriel/topic/signal-intercept
Topic/signal intercept
2022-04-25 10:44:26 -05:00
Edgar 2bf6d254b6 add a signal handler and backtrace
Tweak the signal handler and force non-release build
Increase ulimit locked memory value
Update the signal handler to use bfd symbol resolution.
Include configure logic to find bfd functions.
Optionally add C++ function name demangling
2022-04-25 10:48:17 -04:00