Commit Graph

873 Commits

Author SHA1 Message Date
Edgar Gabriel 3225ee7cd0 Merge pull request #615 from edgargabriel/topic/two-trees
add binary tree

[ROCm/rccl commit: ea8120a346]
2022-09-13 16:50:45 -05:00
Edgar Gabriel 7148c0aa7b add binary tree
In addition, introduce the ability to have 2 trees at the same time.
Only for allreduce at the moment.


[ROCm/rccl commit: 65e2ae20e5]
2022-09-13 20:52:32 +00:00
gilbertlee-amd 35872115f8 Updating stream caching (#614)
- Adding non-captured hipStream for use in setup

[ROCm/rccl commit: dd56135a9a]
2022-09-09 16:30:15 -06:00
gilbertlee-amd af71be44f1 GraphBench (#613)
Adding simple GraphBench tool for comparing RCCL hipGraph performance

[ROCm/rccl commit: 65d78e9a1d]
2022-09-09 12:12:25 -06:00
Wenkai Du fe99249cde Enable LL128 protocol support (#605)
* Enable LL128 protocol support

* Use shared memory object directly when possible

[ROCm/rccl commit: 7bbce085cc]
2022-09-08 14:45:27 -07:00
Lauren Wrubleski 3da06e4704 Update ubuntu18 to ubuntu20 (#611)
[ROCm/rccl commit: d700a94918]
2022-09-07 16:02:37 -06:00
Min Si 25ba51fe83 Fix compilation issues with buck (#610)
* Fix compilation warning with -Wmisleading-indentation

When compiling with -Wmisleading-indentation, the compiler reports this warning:
misleading indentation; statement is not part of the previous 'if'

This patch fixes it.

* Avoid relative include file path

We don't need relative include file paths for src/graph/*.h
since src/ is already in CMake include_directories

[ROCm/rccl commit: 2b57751abb]
2022-09-07 09:56:05 -06:00
gilbertlee-amd 616cb39a0b Adding opt-in hipGraph support for RCCL via RCCL_ENABLE_HIPGRAPH (#608)
Adding opt-in hipGraph support via RCCL_ENABLE_HIPGRAPH

[ROCm/rccl commit: 47b2fc3a30]
2022-09-06 10:29:46 -06:00
akolliasAMD 2cd63dac42 added stream synch after hipMemset (#609)
[ROCm/rccl commit: 06bce9d0c9]
2022-08-30 16:18:37 -06:00
Wenkai Du f18868f439 Use hipExtLaunchKernel when not using graph and not in group mode (#606)
[ROCm/rccl commit: c9f2fe1f65]
2022-08-26 13:40:37 -07:00
akolliasAMD 151a8ef56a git_version cmake consistency changes (#604)
* git_version cmake variable consistency changes

[ROCm/rccl commit: 6670dc95ab]
2022-08-25 15:11:28 -06:00
Edgar Gabriel 22dcbed61b Merge pull request #603 from edgargabriel/topic/float16_unit_tests
introduce support for ncclFloat16/half in UT

[ROCm/rccl commit: 8a311583e0]
2022-08-25 07:40:20 -05:00
Edgar Gabriel b32b819151 introduce support for ncclFloat16/half in UT
[ROCm/rccl commit: f6e00dec13]
2022-08-24 15:28:24 +00:00
Edgar Gabriel 6bb871c986 Merge pull request #598 from edgargabriel/topic/tree-multirank
Expand ncclTreeBasePostset for multi-rank

[ROCm/rccl commit: e739c62a53]
2022-08-24 08:28:34 -05:00
Wenkai Du 56ea2c4be5 Use non-temporal access for slow path (#602)
[ROCm/rccl commit: 88487a62bb]
2022-08-23 08:21:51 -07:00
Edgar Gabriel aa6d450f35 fix channelcount for multi-rank scenario
[ROCm/rccl commit: 4141ec1151]
2022-08-22 19:09:22 +00:00
akolliasAMD 1d55fe756c Simple tree changes (#599)
changed treebase to create basic balanced tree

[ROCm/rccl commit: 3c1b1ec8c8]
2022-08-19 13:51:49 -06:00
Edgar Gabriel 27cb7d2b20 Merge pull request #601 from CosmicFusion/patch-2
fix error: use of undeclared identifier 'free'

[ROCm/rccl commit: 6fba80208c]
2022-08-19 14:10:49 -05:00
Cosmic Fusion 1dcf1da5ca fix error: use of undeclared identifier 'free'
Include stdlib.h to fix a compilation error in rccl:

[39/58] Building CXX object CMakeFiles/rccl.dir/src/misc/signals.cc.o
FAILED: CMakeFiles/rccl.dir/src/misc/signals.cc.o 
/opt/rocm/bin/hipcc -DENABLE_COLLTRACE -DHAVE_BFD -DHAVE_CPLUS_DEMANGLE -DUSE_ROCM_SMI64CONFIG -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Drccl_EXPORTS -I/home/cosmo/build/flgrwqa/build/include -I/home/cosmo/build/flgrwqa/build/include/rccl -I/home/cosmo/build/flgrwqa/rccl/src -I/home/cosmo/build/flgrwqa/rccl/src/include -I/home/cosmo/build/flgrwqa/rccl/src/collectives -I/home/cosmo/build/flgrwqa/rccl/src/collectives/device -I/opt/hsa/include -fPIC -fvisibility=hidden -fgpu-rdc -parallel-jobs=8 -Wno-format-nonliteral -x hip --offload-arch=gfx803 --offload-arch=gfx900:xnack- --offload-arch=gfx906:xnack- --offload-arch=gfx908:xnack- --offload-arch=gfx90a:xnack- --offload-arch=gfx90a:xnack+ --offload-arch=gfx1030 -std=c++14 -MD -MT CMakeFiles/rccl.dir/src/misc/signals.cc.o -MF CMakeFiles/rccl.dir/src/misc/signals.cc.o.d -o CMakeFiles/rccl.dir/src/misc/signals.cc.o -c /home/cosmo/build/flgrwqa/rccl/src/misc/signals.cc
In file included from /home/cosmo/build/flgrwqa/rccl/src/misc/signals.cc:8:
/home/cosmo/build/flgrwqa/rccl/src/include/BfdBacktrace.hpp:138:9: error: use of undeclared identifier 'free'
        free(file->syms);
        ^
/home/cosmo/build/flgrwqa/rccl/src/include/BfdBacktrace.hpp:155:5: error: use of undeclared identifier 'free'
    free(file->syms);
    ^

[ROCm/rccl commit: 080fc2d9d6]
2022-08-19 20:25:06 +03:00
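
The fix described in the commit above can be illustrated with a minimal sketch. The names `SymbolFile` and `releaseSymbols` are hypothetical stand-ins, not the actual contents of BfdBacktrace.hpp. The point is only that `free` is declared by `<stdlib.h>`, so a header that calls it should include `<stdlib.h>` itself rather than rely on some other header pulling it in.

```cpp
#include <stdlib.h>  // declares malloc and free; without it 'free' may be undeclared

// Hypothetical stand-in for a record that owns a malloc'ed symbol table.
struct SymbolFile {
  void** syms;
};

// Releases the symbol table and clears the pointer so a repeated call is harmless.
inline void releaseSymbols(SymbolFile* file) {
  free(file->syms);  // free(nullptr) is a no-op
  file->syms = nullptr;
}
```

Including the header where the call is made keeps the file buildable regardless of which other headers the toolchain happens to include transitively.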
Wenkai Du c2e9ada40b Repurpose profiling implementation to simple timestamps tracing (#600)
[ROCm/rccl commit: 14b8ff153f]
2022-08-18 15:34:46 -07:00
Wenkai Du 6c3f1366e8 Add XGMI sys type and clean up detection code (#597)
[ROCm/rccl commit: f5c0b243a8]
2022-08-12 09:52:29 -07:00
Ziyue Yang 478d8312b8 Improve alignment and tuning for Pivot A2A algorithm (#593)
* Improve alignment and tuning for Pivot A2A algorithm

* enable pivot a2a by default

[ROCm/rccl commit: f6b9686482]
2022-08-05 19:40:19 -07:00
gilbertlee-amd e3b832f4ce Disable clique AllReduce UnitTest (#595)
[ROCm/rccl commit: dae11c2aca]
2022-08-04 18:30:00 -06:00
gilbertlee-amd b350916a6e Fixing CMake to avoid unnecessary git_version relinking (#594)
[ROCm/rccl commit: 9ed9cd0e31]
2022-08-04 18:03:59 -06:00
arvindcheru a44be6655d HIP Path default updated to ROCM_PATH (reorg path) (#592)
Updated default path for hip to ROCM_PATH (/opt/rocm instead of /opt/rocm/hip) as per new/current structure.

[ROCm/rccl commit: 2cb2f9493a]
2022-08-04 13:38:41 -04:00
akolliasAMD 6fb5c5d5e3 minor latency tuning (#591)
* minor tuning for tree ll

[ROCm/rccl commit: 4cecdc9be5]
2022-08-03 15:07:44 -06:00
Wenkai Du f70830d629 Revert "Use nontemporal in slow path and add XGMI sys type (#575)" (#590)
This reverts commit e04bba619a.

[ROCm/rccl commit: 9089e68a99]
2022-08-02 09:31:53 -07:00
Wenkai Du 7e124d5b83 Add nccl_net.h to librccl-dev package (#589)
[ROCm/rccl commit: e2cb95a390]
2022-07-29 13:39:49 -07:00
akolliasAMD d5ca0be51f Fixed issue with atomicEXCH creating errors on multi-node runs (#587)
[ROCm/rccl commit: 254208e7dd]
2022-07-22 11:32:49 -06:00
akolliasAMD fd99ca19f5 updated alltoallV test to reflect how send counts are done in perf tests (#586)
[ROCm/rccl commit: 686dbc8bc6]
2022-07-21 14:59:34 -06:00
akolliasAMD 18d9fd1b8f Removing redundant LOAD and STORE on primitives plus adding some atomics (#585)
[ROCm/rccl commit: 451c287aa6]
2022-07-21 13:04:57 -06:00
Hubert Lu 088d62ff58 Merge pull request #580 from hubertlu-tw/develop
Enhancement of RCCL logging information for topology-aware utilities

[ROCm/rccl commit: 6dd090917a]
2022-07-15 15:16:37 -07:00
Edgar Gabriel 24f2071206 Merge pull request #584 from edgargabriel/topic/signal-backtrace
intercept SIGUSR2 in RCCL

[ROCm/rccl commit: 58437544f8]
2022-07-15 11:31:19 -05:00
Edgar Gabriel a9e0333dba intercept SIGUSR2 in RCCL
Add support for intercepting SIGUSR2 in RCCL. This signal does not
terminate the application; instead, it prints the stack trace
of the process that the signal was sent to.


[ROCm/rccl commit: 2b1d5d3bc1]
2022-07-15 16:28:46 +00:00
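
The behaviour described in the commit above (print a stack trace on SIGUSR2, then keep running) can be sketched as follows. This is not RCCL's implementation: the function names and the counter are hypothetical, and the sketch assumes a glibc-style `<execinfo.h>` providing `backtrace` and `backtrace_symbols_fd`.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc extension)
#include <signal.h>    // sigaction, SIGUSR2, sig_atomic_t
#include <unistd.h>    // write, STDERR_FILENO

// Hypothetical counter so callers can observe that the handler ran.
static volatile sig_atomic_t g_sigusr2_count = 0;

// Hypothetical handler: dump the current stack to stderr and return,
// so the process continues after the signal is delivered.
extern "C" void printStackOnSigusr2(int /*signo*/) {
  void* frames[64];
  int depth = backtrace(frames, 64);
  static const char header[] = "SIGUSR2 received, stack trace:\n";
  ssize_t written = write(STDERR_FILENO, header, sizeof(header) - 1);
  (void)written;  // best effort; nothing useful to do on failure here
  backtrace_symbols_fd(frames, depth, STDERR_FILENO);  // writes without malloc
  g_sigusr2_count = g_sigusr2_count + 1;
}

// Installs the handler for SIGUSR2; returns true on success.
bool installSigusr2Handler() {
  void* warmup[1];
  backtrace(warmup, 1);  // first call may load libgcc; do it outside the handler
  struct sigaction sa = {};
  sa.sa_handler = printStackOnSigusr2;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;  // restart interrupted system calls
  return sigaction(SIGUSR2, &sa, nullptr) == 0;
}
```

With a handler like this installed, sending `kill -USR2 <pid>` to the process prints the stack of the thread that receives the signal, and execution then resumes where it was interrupted.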
akolliasAMD 24d9d1c37a Merge pull request #583 from yzygitzh/ziyyang/ll-fix
Remove redundant LOAD/STORE usage in LL initialization

[ROCm/rccl commit: da31537ec7]
2022-07-14 08:51:39 -06:00
Ziyue Yang e1aae026bf Remove redundant LOAD/STORE usage in LL initialization
[ROCm/rccl commit: 77c2bef952]
2022-07-14 00:40:36 +00:00
akolliasAMD d2df866925 Merge pull request #582 from akolliasAMD/readmeUpdate
updated readme to reflect the newer tests

[ROCm/rccl commit: 873c13b47a]
2022-07-13 12:28:30 -06:00
akolliasAMD 2a1d472a20 updated readme to reflect the newer tests
[ROCm/rccl commit: 5950942738]
2022-07-13 16:08:28 +00:00
Wenkai Du 9a9d9cb29b README.md: add CMAKE_PREFIX_PATH to build steps (#581)
[ROCm/rccl commit: 314da5a485]
2022-07-12 11:32:07 -07:00
hubertlu-tw e13eb2eab9 Enhancement of RCCL logging information for topology-aware utilities
[ROCm/rccl commit: a1842df858]
2022-07-11 19:01:10 +00:00
Wenkai Du c129677fe0 Skip HDP cache flush for gfx90a (#578)
* Skip HDP cache flush for gfx90a

* Remove extra debug print

[ROCm/rccl commit: 8c3c8b78c0]
2022-07-08 10:13:32 -07:00
Wenkai Du 659cd52d5c Add more constraints to enable GDR (#579)
* Add more constraints to enable GDR

* Revert deleted line

[ROCm/rccl commit: aa0d7ca882]
2022-07-08 09:52:27 -07:00
Yifan Xiong bf15ad1d72 Reduce AlltoAll port usage in send/recv proxy (#577)
* Reduce AlltoAll port usage when connecting proxy

Reuse socket ports when connecting proxies in AlltoAll.

Existing port usage in AlltoAll is O(n) for recv and O(n) for send.
Reusing socket ports on either the server or the client side makes one of
them O(1); reusing both reduces the total port usage to O(1) and enables
AlltoAll on more than 64 MI200 nodes.

* Update changelog accordingly

Update changelog accordingly.

[ROCm/rccl commit: 80f53cc171]
2022-07-07 16:15:52 -07:00
Wenkai Du 4b99cef680 Revert "Adding the missing roc:: namespace (#570)" (#576)
This reverts commit fc340decf4.

[ROCm/rccl commit: 2e65881a79]
2022-07-06 10:07:35 -07:00
Wenkai Du e04bba619a Use nontemporal in slow path and add XGMI sys type (#575)
* Use nontemporal in slow path and add XGMI sys type

* Clean up XGMI detection

[ROCm/rccl commit: b250c01cbe]
2022-07-06 07:58:41 -07:00
Wenkai Du 2f4aea93e0 Fix GPU to NIC mapping in tree (#573)
* Fix GPU to NIC mapping in tree

* Update tuning table

[ROCm/rccl commit: 00af1f64e9]
2022-07-03 20:52:52 -07:00
gilbertlee-amd cb5ae7224e Adding git hash info to version output line (#572)
[ROCm/rccl commit: a89a9966aa]
2022-06-28 16:42:51 -06:00
Dmitry Mikushin fc340decf4 Adding the missing roc:: namespace (#570)
* Adding the missing roc:: namespace, effectively changing the value of RCCL_LIBRARY from rccl to roc::rccl.
The important difference is that rccl is treated as a symbolic "-lrccl" by the linker (and linking fails
due to a missing library search path), while roc::rccl is a target name, which can resolve to an absolute
library path.

Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

* Adding a changelog entry

* minor updates to wording

* missing period

Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>

[ROCm/rccl commit: d5bea2cfaa]
2022-06-27 11:44:43 -06:00
Wenkai Du 915a9d3934 Do not set NET GDR level automatically (#571)
[ROCm/rccl commit: 9a285b5e1d]
2022-06-23 16:28:28 -07:00
Wenkai Du 784b12bf75 Use different atomics to check flags in kernel (#568)
[ROCm/rccl commit: c3bb9e70d0]
2022-06-23 09:16:41 -07:00