Commit Graph

633 Commits

Author SHA1 Message Date
Eiden Yoshida fb267ea333 Move address-sanitizer build above addition of rccl library in CMakeLists (#392) 2021-06-11 14:43:54 -06:00
Wenkai Du 6dcae8a459 Select sendrecv path based on collective data size (#391)
* Select sendrecv path based on collective data size

* Add comments on packing and unpacking group field

* Toggling RCCL_P2P_NET_DISABLE in combined calls unit tests
2021-06-10 17:51:04 -07:00
Stanley Tsang f6f5e16fe6 Fixing bug with ExtractSubDataset function not fully initializing subdataset (#390) 2021-06-10 14:35:39 -06:00
Eiden Yoshida eea7b24058 Add address sanitizer build option (#389) 2021-06-10 09:14:54 -06:00
Wenkai Du b815a2800f Setup collectives threshold for enabling intranet (#387)
* Setup collectives threshold for enabling intranet

* Use separate operation counters for coll and p2p
2021-06-09 13:24:26 -07:00
Wenkai Du c2064adcc7 Add support for another Rome model (#385) 2021-06-08 13:58:20 -07:00
Stanley Tsang 6842429a14 Updating changelog to show install script fix (#384) 2021-06-08 13:00:40 -06:00
Wenkai Du a3a8c2d56b Allow intranode use of network connection (#383)
* Allow intranode use of network connection

* Checking for graph for null pointer
2021-06-08 07:37:59 -07:00
Stanley Tsang 820a53287f Fixing install script so that invoking -r alone does not trigger rebuild (#382) 2021-06-04 09:46:04 -06:00
Wenkai Du 961922ea02 Add option to enable multiple SAT in SHARP (#380)
* Add option to enable multiple SAT in SHARP

* Extend number of NICs to 16
2021-06-03 19:45:18 -07:00
gilbertlee-amd 903c84050d ROCm 4.3 changelog update (#379)
* Update CHANGELOG.md (#378)

* Updating CHANGELOG.md for ROCm 4.3
2021-06-03 10:56:02 -06:00
Wenkai Du 03ac898825 Merge pull request #377 from wenkaidu/2.9.9
Sync up with NCCL 2.9.9
2021-05-26 11:38:19 -07:00
Wenkai Du 13dc80ee14 topo_expl: update to 2.9.9 2021-05-26 09:24:34 -07:00
Wenkai Du e3abf1c2ec Merge remote-tracking branch 'nccl/master' into develop 2021-05-25 20:52:15 -07:00
Stanley Tsang 256403d4f0 Adding support for hipMallocManaged() in unit tests (#375)
* Adding HMM support for unit tests

* Fixing HMM opt-in check
2021-05-25 17:07:12 -06:00
Wenkai Du 4c83adb75c Update Rome models matching (#376) 2021-05-25 10:12:40 -07:00
gilbertlee-amd 8e817ecd6d Tweak clique channel usage for gfx908 (#374) 2021-05-21 15:36:21 -06:00
Wenkai Du 50da1b48af Correction on max number of groups (#373) 2021-05-20 08:58:45 -07:00
Wenkai Du 8cde34be51 Use fixed segment size for sendrecv (#369) 2021-05-19 08:25:26 -07:00
Wenkai Du 42b080867e Running only sum for CI quick test (#370) 2021-05-19 08:25:13 -07:00
gilbertlee-amd ddceadc313 Tune clique-based AllReduce for device type 908 (#372)
* Changing switch-over point for clique-based AllReduce
2021-05-18 15:36:07 -06:00
gilbertlee-amd 2daadcc834 Disabling env var caching for all unit tests (#371)
* Disabling env var caching for all unit tests
2021-05-18 12:56:30 -06:00
Wenkai Du 2f31289fe6 Merge pull request #368 from ROCmSoftwarePlatform/2.9.8
Merge NCCL 2.9.8
2021-05-18 08:38:59 -07:00
Wenkai Du 87727383fe Merge remote-tracking branch 'nccl/master' into 2.9.8 2021-05-17 10:15:16 -07:00
Stanley Tsang 0b2bfdd6d8 Multiprocess unit test various fixes (#367)
* Re-enabling mp unit tests

* Fixing shared memory leak and other bugs related to shared mem for MP unit tests

* Revert 43bfbfc97bf9edbae1f386d461439091618ff8ed

* Further tightening up unlinks

* Moving test check macros to separate header file

* Tightening up shared memory unlinking for clique kernels, add munmap for host barrier for MP unit tests

* Updating new MP unit test

* Fixing mqueue bug

* Fixing memory leak in MP unit tests
2021-05-14 09:38:49 -06:00
Wenkai Du caf5c9992a Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)
To skip Infiniband, set RCCL_IB_HCA_SKIP_LINK_LAYER=1.
To skip Ethernet, RCCL_IB_HCA_SKIP_LINK_LAYER=2.
2021-05-12 14:14:53 -07:00
Sylvain Jeaugey 3fec2fa5ee 2.9.9-1
Fix crash when setting NCCL_MAX_P2P_NCHANNELS below nchannels.
Fix hang during sendrecv dynamic NVB connection establishment on
cubemesh topologies.
Add environment variable to only use SHARP on communicators beyond
a given number of ranks.
Add debug subsystem to trace memory allocations.
Fix compilation with TRACE=1. (Issue #505)
2021-05-12 11:09:31 -07:00
Wenkai Du abde40197a Merge pull request #366 from ROCmSoftwarePlatform/2.9.6
Sync up to NCCL 2.9.6
2021-05-11 20:20:42 -07:00
Wenkai Du 330b82df3b Revert "Sync up to NCCL 2.9.6 (#363)" (#365)
This reverts commit 6021329af0.
2021-05-11 20:18:17 -07:00
Wenkai Du 6021329af0 Sync up to NCCL 2.9.6 (#363)
* 2.9.6-1

Add support for CUDA graphs.
Fuse BCM Gen4 switches to avoid suboptimal performance on some platforms. Issue #439.
Fix bootstrap issue caused by connection reordering.
Fix CPU locking block.
Improve CollNet algorithm.
Improve performance on DGX A100 for communicators with only one GPU per node.

* Clique tuning upgrade (#352) (#19)

* Enabling clique for any XGMI-connected topology, adding tuning
* Updating CHANGELOG for clique tuning
* Re-working clique barrier system to work on multi-process / multi-gpu

Co-authored-by: Sylvain Jeaugey <sjeaugey@nvidia.com>
Co-authored-by: gilbertlee-amd <44450918+gilbertlee-amd@users.noreply.github.com>
2021-05-11 19:40:34 -07:00
gilbertlee-amd b122dcd991 Update README.md (#364)
- Remove outdated HIP Direct call requirements
- Remove outdated chrpath requirement
- Adding section about HSA_FORCE_FINE_GRAIN_PCIE
2021-05-11 13:41:41 -06:00
Wenkai Du d0d5d4d921 Merge remote-tracking branch 'rccl/develop' into 2.9.6 2021-05-11 09:14:16 -07:00
gilbertlee-amd e796b1645c Clique tuning upgrade (#352) (#19)
* Enabling clique for any XGMI-connected topology, adding tuning
* Updating CHANGELOG for clique tuning
* Re-working clique barrier system to work on multi-process / multi-gpu
2021-05-11 08:44:59 -06:00
Sylvain Jeaugey ca8485b0d0 2.9.8-1
Fix memory leaks.
Fix crash in bootstrap error case.
Fix Collnet clean-up issue.
Make PCI switch vendor/device optional for XML injection.
Add support for nvidia-peermem module.
2021-05-10 14:00:03 -07:00
gilbertlee-amd 9d7232c091 Clique tuning upgrade (#352)
* Enabling clique for any XGMI-connected topology, adding tuning
* Updating CHANGELOG for clique tuning
* Re-working clique barrier system to work on multi-process / multi-gpu
2021-05-06 09:50:07 -06:00
Wenkai Du a4ea1fed5b Merge remote-tracking branch 'nccl/master' into develop 2021-05-05 16:01:01 -07:00
gilbertlee-amd 4f8e788a61 Fixing potential race-condition in env var parameter macro (#359) 2021-04-28 12:04:41 -06:00
saadrahim 96782191cf Expanding CI coverage for 8GPU configurations plus extended tests (#350) 2021-04-27 09:57:00 -06:00
Wenkai Du ad54a14a5c Add libdl linking option (#358) 2021-04-26 15:24:58 -07:00
Wenkai Du ed237dcaa7 Use better name for kernel collective trace enable (#357)
"NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,COLL" enables collectives API
trace. Adding "RCCL_KERNEL_COLL_TRACE_ENABLE=1" enables kernel traces.
2021-04-26 08:35:53 -07:00
Wenkai Du 9cc9c3360b Control collective trace from kernel separately (#356) 2021-04-23 16:36:19 -07:00
Stanley Tsang 70597789d0 Message queue refactor to POSIX implementation and leak fix (#355)
* Fixing message queue leak.

* Using POSIX implementation of Message Queues

* Adding unlink to msgqueue

* MsgQueue update

* Adding timeout check to msgqueue broadcast; tightening up system checks

* Removing unnecessary code

* Removing extra argument from print

* Adding explicit msg queue close call to all other ranks
2021-04-23 11:33:20 -06:00
Wenkai Du 415c7cd3d1 Tune number of channels for gfx90a (#349) 2021-04-19 15:27:01 -07:00
Wenkai Du 9c718ce6d6 Use correct WARP_SIZE for gfx1030 (#348) 2021-04-14 14:09:52 -07:00
Wenkai Du a79f74082e Limit max channels for ring graph on single node Rome (#347)
* Limit max channels for ring graph on single node Rome
* Partially revert "Use non-temporal access for streaming data (#341)"
2021-04-14 10:14:54 -07:00
Wenkai Du 1fe031402a Add gfx90a target (#344)
* Add gfx90a target

* Support gfx90a topology

Co-authored-by: Eiden Yoshida <eiden.yoshida@amd.com>
2021-04-14 09:29:00 -06:00
Sylvain Jeaugey a46ea10583 2.9.6-1
Add support for CUDA graphs.
Fuse BCM Gen4 switches to avoid suboptimal performance on some platforms. Issue #439.
Fix bootstrap issue caused by connection reordering.
Fix CPU locking block.
Improve CollNet algorithm.
Improve performance on DGX A100 for communicators with only one GPU per node.
2021-04-12 16:00:46 -07:00
Wenkai Du 3f18540f50 Remove link to NUMA lib as it is no longer needed (#346) 2021-04-12 09:53:17 -07:00
TomSang 87f12cbb86 Add detection of cooperative multi device launch attribute (#345) 2021-04-11 13:29:24 -07:00
Wenkai Du def8b4ca0d Move RCCL changelog and Copyright out of /usr/share (#343) 2021-04-09 14:08:40 -07:00