Commit graph

869 Commits

Author SHA1 Message Date
Wenkai Du 7bbce085cc Enable LL128 protocol support (#605)
* Enable LL128 protocol support

* Use shared memory object directly when possible
2022-09-08 14:45:27 -07:00
Lauren Wrubleski d700a94918 Update ubuntu18 to ubuntu20 (#611) 2022-09-07 16:02:37 -06:00
Min Si 2b57751abb Fix compilation issues with buck (#610)
* Fix compilation warning with -Wmisleading-indentation

When compiling with -Wmisleading-indentation, the compiler reports this warning:
misleading indentation; statement is not part of the previous 'if'

This patch fixes it.

* Avoid relative include file path

We don't need relative include file paths for src/graph/*.h
since src/ is already in CMake include_directories
2022-09-07 09:56:05 -06:00
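The patch itself is not reproduced in this log, so the following is only a generic sketch of the kind of code that triggers the warning quoted above and one common way to silence it. The function name `clampToZero` and its parameters are hypothetical and do not come from the RCCL sources.

```cpp
// Hypothetical illustration; clampToZero is not an RCCL function.
//
// A body written like this draws the warning, because the second statement
// is indented as if it belonged to the 'if' but always executes:
//
//   if (value < 0)
//     value = 0;
//     ++adjustments;
//
// Adding braces makes the intended scope explicit and removes the warning.
inline int clampToZero(int value, int& adjustments) {
  if (value < 0) {
    value = 0;
    ++adjustments;  // counted only when a clamp actually happened
  }
  return value;
}
```

Whether the real fix added braces or simply re-indented the stray statement depends on the intended behavior, which this log does not state.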
gilbertlee-amd 47b2fc3a30 Adding opt-in hipGraph support for RCCL via RCCL_ENABLE_HIPGRAPH (#608)
Adding opt-in hipGraph support via RCCL_ENABLE_HIPGRAPH
2022-09-06 10:29:46 -06:00
akolliasAMD 06bce9d0c9 added stream synch after hipMemset (#609) 2022-08-30 16:18:37 -06:00
Wenkai Du c9f2fe1f65 Use hipExtLaunchKernel when not using graph and not in group mode (#606) 2022-08-26 13:40:37 -07:00
akolliasAMD 6670dc95ab git_version cmake consistency changes (#604)
* git_version cmake variable consistency changes
2022-08-25 15:11:28 -06:00
Edgar Gabriel 8a311583e0 Merge pull request #603 from edgargabriel/topic/float16_unit_tests
introduce support for ncclFloat16/half in UT
2022-08-25 07:40:20 -05:00
Edgar Gabriel f6e00dec13 introduce support for ncclFloat16/half in UT 2022-08-24 15:28:24 +00:00
Edgar Gabriel e739c62a53 Merge pull request #598 from edgargabriel/topic/tree-multirank
Expand ncclTreeBasePostset for multi-rank
2022-08-24 08:28:34 -05:00
Wenkai Du 88487a62bb Use non-temporal access for slow path (#602) 2022-08-23 08:21:51 -07:00
Edgar Gabriel 4141ec1151 fix channelcount for multi-rank scenario 2022-08-22 19:09:22 +00:00
akolliasAMD 3c1b1ec8c8 Simple tree changes (#599)
changed treebase to create basic balanced tree
2022-08-19 13:51:49 -06:00
Edgar Gabriel 6fba80208c Merge pull request #601 from CosmicFusion/patch-2
fix error: use of undeclared identifier 'free'
2022-08-19 14:10:49 -05:00
Cosmic Fusion 080fc2d9d6 fix error: use of undeclared identifier 'free'
Include stdlib.h to fix a compilation error in rccl:

[39/58] Building CXX object CMakeFiles/rccl.dir/src/misc/signals.cc.o
FAILED: CMakeFiles/rccl.dir/src/misc/signals.cc.o 
/opt/rocm/bin/hipcc -DENABLE_COLLTRACE -DHAVE_BFD -DHAVE_CPLUS_DEMANGLE -DUSE_ROCM_SMI64CONFIG -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Drccl_EXPORTS -I/home/cosmo/build/flgrwqa/build/include -I/home/cosmo/build/flgrwqa/build/include/rccl -I/home/cosmo/build/flgrwqa/rccl/src -I/home/cosmo/build/flgrwqa/rccl/src/include -I/home/cosmo/build/flgrwqa/rccl/src/collectives -I/home/cosmo/build/flgrwqa/rccl/src/collectives/device -I/opt/hsa/include -fPIC -fvisibility=hidden -fgpu-rdc -parallel-jobs=8 -Wno-format-nonliteral -x hip --offload-arch=gfx803 --offload-arch=gfx900:xnack- --offload-arch=gfx906:xnack- --offload-arch=gfx908:xnack- --offload-arch=gfx90a:xnack- --offload-arch=gfx90a:xnack+ --offload-arch=gfx1030 -std=c++14 -MD -MT CMakeFiles/rccl.dir/src/misc/signals.cc.o -MF CMakeFiles/rccl.dir/src/misc/signals.cc.o.d -o CMakeFiles/rccl.dir/src/misc/signals.cc.o -c /home/cosmo/build/flgrwqa/rccl/src/misc/signals.cc
In file included from /home/cosmo/build/flgrwqa/rccl/src/misc/signals.cc:8:
/home/cosmo/build/flgrwqa/rccl/src/include/BfdBacktrace.hpp:138:9: error: use of undeclared identifier 'free'
        free(file->syms);
        ^
/home/cosmo/build/flgrwqa/rccl/src/include/BfdBacktrace.hpp:155:5: error: use of undeclared identifier 'free'
    free(file->syms);
    ^
2022-08-19 20:25:06 +03:00
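The error above arises when a header calls `free` without including the header that declares it and no other include happens to pull it in. A minimal sketch of the pattern follows. The names `SymbolTable` and `releaseSymbols` are hypothetical stand-ins, not the contents of BfdBacktrace.hpp.

```cpp
// Hypothetical stand-in for a header that frees a buffer it owns.
#include <stdlib.h>  // declares malloc and free; without it 'free' may be undeclared

struct SymbolTable {
  void* syms;  // buffer allocated with malloc
};

// Releases the buffer and reports whether anything was freed.
inline bool releaseSymbols(SymbolTable* table) {
  if (table == nullptr || table->syms == nullptr) return false;
  free(table->syms);
  table->syms = nullptr;  // guard against a double free on a second call
  return true;
}
```

Including the declaring header directly keeps the file from depending on whatever other headers include transitively, which can differ between toolchains.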
Wenkai Du 14b8ff153f Repurpose profiling implementation to simple timestamps tracing (#600) 2022-08-18 15:34:46 -07:00
Wenkai Du f5c0b243a8 Add XGMI sys type and clean up detection code (#597) 2022-08-12 09:52:29 -07:00
Ziyue Yang f6b9686482 Improve alignment and tuning for Pivot A2A algorithm (#593)
* Improve alignment and tuning for Pivot A2A algorithm

* enable pivot a2a by default
2022-08-05 19:40:19 -07:00
gilbertlee-amd dae11c2aca Disable clique AllReduce UnitTest (#595) 2022-08-04 18:30:00 -06:00
gilbertlee-amd 9ed9cd0e31 Fixing CMake to avoid unnecessary git_version relinking (#594) 2022-08-04 18:03:59 -06:00
arvindcheru 2cb2f9493a HIP Path default updated to ROCM_PATH (reorg path) (#592)
Updated default path for hip to ROCM_PATH (/opt/rocm instead of /opt/rocm/hip) as per new/current structure.
2022-08-04 13:38:41 -04:00
akolliasAMD 4cecdc9be5 minor latency tuning (#591)
* minor tuning for tree ll
2022-08-03 15:07:44 -06:00
Wenkai Du 9089e68a99 Revert "Use nontemporal in slow path and add XGMI sys type (#575)" (#590)
This reverts commit b250c01cbe.
2022-08-02 09:31:53 -07:00
Wenkai Du e2cb95a390 Add nccl_net.h to librccl-dev package (#589) 2022-07-29 13:39:49 -07:00
akolliasAMD 254208e7dd Fixed issue with atomicExch creating errors on multi-node runs (#587) 2022-07-22 11:32:49 -06:00
akolliasAMD 686dbc8bc6 updated alltoallV test to reflect how send counts are done in perf tests (#586) 2022-07-21 14:59:34 -06:00
akolliasAMD 451c287aa6 Removing redundant LOAD and STORE on primitives plus adding some atomics (#585) 2022-07-21 13:04:57 -06:00
Hubert Lu 6dd090917a Merge pull request #580 from hubertlu-tw/develop
Enhancement of RCCL logging information for topology-aware utilities
2022-07-15 15:16:37 -07:00
Edgar Gabriel 58437544f8 Merge pull request #584 from edgargabriel/topic/signal-backtrace
intercept SIGUSR2 in RCCL
2022-07-15 11:31:19 -05:00
Edgar Gabriel 2b1d5d3bc1 intercept SIGUSR2 in RCCL
Add support for intercepting SIGUSR2 in RCCL. The signal does not
terminate the application; instead, it prints the stack trace
of the process that the signal was sent to.
2022-07-15 16:28:46 +00:00
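The behavior described in this commit can be sketched with POSIX `sigaction` and the glibc `backtrace` helpers. This is an assumption about one possible approach, not the RCCL implementation (the build line elsewhere in this log suggests RCCL uses BFD for its backtraces). The names `onSigusr2`, `installSigusr2Backtrace`, and `g_sigusr2Count` are hypothetical.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc-specific)
#include <signal.h>    // sigaction, SIGUSR2, sig_atomic_t
#include <unistd.h>    // STDERR_FILENO

// Counts handled signals so a caller can confirm the handler ran.
static volatile sig_atomic_t g_sigusr2Count = 0;

// Prints the current stack to stderr and returns, so the process keeps running.
static void onSigusr2(int) {
  void* frames[64];
  int depth = backtrace(frames, 64);
  // Writes straight to the file descriptor without calling malloc.
  backtrace_symbols_fd(frames, depth, STDERR_FILENO);
  g_sigusr2Count = g_sigusr2Count + 1;
}

// Installs the handler; returns true on success.
inline bool installSigusr2Backtrace() {
  struct sigaction action = {};
  action.sa_handler = onSigusr2;
  sigemptyset(&action.sa_mask);
  action.sa_flags = SA_RESTART;  // restart interrupted system calls
  return sigaction(SIGUSR2, &action, nullptr) == 0;
}
```

With a handler like this installed, sending `kill -USR2 <pid>` to a running process would print its stack and let it continue, which is useful for inspecting a job that appears to hang.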
akolliasAMD da31537ec7 Merge pull request #583 from yzygitzh/ziyyang/ll-fix
Remove redundant LOAD/STORE usage in LL initialization
2022-07-14 08:51:39 -06:00
Ziyue Yang 77c2bef952 Remove redundant LOAD/STORE usage in LL initialization 2022-07-14 00:40:36 +00:00
akolliasAMD 873c13b47a Merge pull request #582 from akolliasAMD/readmeUpdate
updated readme to reflect the newer tests
2022-07-13 12:28:30 -06:00
akolliasAMD 5950942738 updated readme to reflect the newer tests 2022-07-13 16:08:28 +00:00
Wenkai Du 314da5a485 README.md: add CMAKE_PREFIX_PATH to build steps (#581) 2022-07-12 11:32:07 -07:00
hubertlu-tw a1842df858 Enhancement of RCCL logging information for topology-aware utilities 2022-07-11 19:01:10 +00:00
Wenkai Du 8c3c8b78c0 Skip HDP cache flush for gfx90a (#578)
* Skip HDP cache flush for gfx90a

* Remove extra debug print
2022-07-08 10:13:32 -07:00
Wenkai Du aa0d7ca882 Add more constraints to enable GDR (#579)
* Add more constraints to enable GDR

* Revert deleted line
2022-07-08 09:52:27 -07:00
Yifan Xiong 80f53cc171 Reduce AlltoAll port usage in send/recv proxy (#577)
* Reduce AlltoAll port usage when connecting proxy

Reuse socket ports when connecting proxies in AlltoAll.

Existing port usage in AlltoAll is O(n) for recv and O(n) for send.
Reusing socket ports on either the server or the client side makes one
of them O(1); reusing both reduces the total port usage to O(1) and
enables AlltoAll on more than 64 MI200 nodes.

* Update changelog accordingly

Update changelog accordingly.
2022-07-07 16:15:52 -07:00
Wenkai Du 2e65881a79 Revert "Adding the missing roc:: namespace (#570)" (#576)
This reverts commit d5bea2cfaa.
2022-07-06 10:07:35 -07:00
Wenkai Du b250c01cbe Use nontemporal in slow path and add XGMI sys type (#575)
* Use nontemporal in slow path and add XGMI sys type

* Clean up XGMI detection
2022-07-06 07:58:41 -07:00
Wenkai Du 00af1f64e9 Fix GPU to NIC mapping in tree (#573)
* Fix GPU to NIC mapping in tree

* Update tuning table
2022-07-03 20:52:52 -07:00
gilbertlee-amd a89a9966aa Adding git hash info to version output line (#572) 2022-06-28 16:42:51 -06:00
Dmitry Mikushin d5bea2cfaa Adding the missing roc:: namespace (#570)
* Adding the missing roc:: namespace, effectively changing the value of RCCL_LIBRARY from rccl to roc::rccl.
The important difference is that the linker treats rccl as a symbolic "-lrccl" (and linking fails
due to a missing library search path), while roc::rccl is a target name, which can resolve to an absolute
library path.

Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>

* Adding a changelog entry

* minor updates to wording

* missing period

Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
2022-06-27 11:44:43 -06:00
Wenkai Du 9a285b5e1d Do not set NET GDR level automatically (#571) 2022-06-23 16:28:28 -07:00
Wenkai Du c3bb9e70d0 Use different atomics to check flags in kernel (#568) 2022-06-23 09:16:41 -07:00
akolliasAMD 06f05300fe Merge pull request #569 from akolliasAMD/disableMultiRankTest
moved default number of max ranks per gpu to 1
2022-06-22 15:52:06 -04:00
akolliasAMD 8b9291eb47 moved default number of max ranks per gpu to 1 2022-06-22 17:37:49 +00:00
Ziyue Yang 6e93fafdc3 Add Feature - Add NPKit Support in RCCL (#564)
* apply npkit

* fix bug

* add npkit in readme
2022-06-20 14:30:19 -07:00
Wenkai Du f274c865c1 Change default nchannels per peer (#563) 2022-06-13 06:39:05 -07:00