
900 Commits

Author SHA1 Message Date
Edgar Gabriel 3fd090c5ba Merge pull request #633 from edgargabriel/topic/topo-binary-tree
introduce a hw topology aware bintree

[ROCm/rccl commit: 4972c129e3]
2022-10-05 17:06:54 -05:00
Edgar Gabriel f2736a4fb3 introduce a hw topology aware bintree
for hayabusa architecture.


[ROCm/rccl commit: e645b02cd8]
2022-10-03 15:26:21 +00:00
gilbertlee-amd 9225ea766e Removing TransferBench from tools (#632)
Point to new TransferBench repo

[ROCm/rccl commit: bd7d589446]
2022-09-30 11:53:32 -06:00
akolliasAMD dbbdf65020 Added new gpu targets (#631)
[ROCm/rccl commit: ef71550738]
2022-09-29 14:53:55 -06:00
Wenkai Du 07a0adf1d6 Another threadfence and flags rework (#629)
[ROCm/rccl commit: a523b37ac7]
2022-09-28 16:49:29 -07:00
Wenkai Du f6da79844a Add LL128 tuning (#630)
[ROCm/rccl commit: 021932b3c8]
2022-09-27 09:39:09 -07:00
Wen-Heng (Jack) Chung 27d27e971b Merge pull request #619 from whchung/exp_reduce_code_size
Only use split tree algorithm to reduce kernel code size.

[ROCm/rccl commit: e8af0716c4]
2022-09-26 10:06:27 -05:00
Wen-Heng (Jack) Chung bd19566413 Merge pull request #621 from whchung/exp_reduce_sleep_cycles
Reduce s_sleep cycles

[ROCm/rccl commit: 35f1fe3434]
2022-09-23 15:31:16 -05:00
Wen-Heng (Jack) Chung 183f1e6b32 Merge pull request #624 from whchung/exp_tweak_unroll_factors
Tweak unroll factors.

[ROCm/rccl commit: a80cc7e6e1]
2022-09-23 11:30:05 -05:00
Wen-Heng (Jack) Chung dcf3946826 Merge pull request #625 from whchung/exp_sync_lds
Abolish syncthreads and only wait on LDS traffic.

[ROCm/rccl commit: a08a24e042]
2022-09-23 11:29:48 -05:00
Wen-Heng (Jack) Chung 7cde92deff Tweak unroll factors.
[ROCm/rccl commit: 84054c3b30]
2022-09-22 13:03:04 -05:00
Wenkai Du abdc365a05 Only use split tree algorithm to reduce kernel code size.
[ROCm/rccl commit: 02929cffb6]
2022-09-22 12:01:53 -05:00
Wenkai Du 0b56e397cc Reduce s_sleep cycles
[ROCm/rccl commit: a3c8ef8c03]
2022-09-22 12:01:12 -05:00
Wen-Heng (Jack) Chung 975642e7ee Abolish syncthreads and only wait on LDS traffic.
[ROCm/rccl commit: b9ae02d4ad]
2022-09-22 12:00:37 -05:00
Wenkai Du e4d46a0f64 Rework threadfence and flag setting (#627)
[ROCm/rccl commit: 49c811ecf9]
2022-09-22 08:35:42 -07:00
Wenkai Du 81c71aeb67 Revert changes to gfx1030 (#622)
[ROCm/rccl commit: d9216af48b]
2022-09-20 20:06:17 -07:00
Wenkai Du 98609a7b92 Define ncclShmem as global shared (#618)
* Use global defined shared memory

* Add --hipcc-func-supp to compile option

* Force inline some device functions

* Add back threadfence

[ROCm/rccl commit: 9e6c87a2bf]
2022-09-20 09:00:20 -07:00
Edgar Gabriel 1a8709086d Merge pull request #617 from edgargabriel/binary-tree-2.13.4
make binary tree work on 2.13.4

[ROCm/rccl commit: 05cc7bd850]
2022-09-14 20:30:11 -05:00
Edgar Gabriel 95d6ed2154 make binary tree work on 2.13.4
[ROCm/rccl commit: 8f3219dbd4]
2022-09-15 00:01:54 +00:00
Wenkai Du 069af6f7c3 Merge pull request #612 from ROCmSoftwarePlatform/2.13.4
2.13.4

[ROCm/rccl commit: 8f5507e047]
2022-09-13 21:33:29 -07:00
Wenkai Du ba6e2db70d Misc fixes and disable binTree
[ROCm/rccl commit: a06e14e39b]
2022-09-14 00:26:19 +00:00
Edgar Gabriel 4a86adcaba Update init.cc
[ROCm/rccl commit: e5d2dfed34]
2022-09-13 17:29:32 -05:00
Edgar Gabriel 4c17f4dcc1 Merge branch 'develop' into 2.13.4
[ROCm/rccl commit: be935d7ce7]
2022-09-13 17:19:04 -05:00
Edgar Gabriel 3225ee7cd0 Merge pull request #615 from edgargabriel/topic/two-trees
add binary tree

[ROCm/rccl commit: ea8120a346]
2022-09-13 16:50:45 -05:00
Edgar Gabriel 7148c0aa7b add binary tree
In addition, introduce the ability to have 2 trees at the same time.
Only for allreduce at the moment.


[ROCm/rccl commit: 65e2ae20e5]
2022-09-13 20:52:32 +00:00
Gilbert Lee 1d24c476f4 Merge branch 'develop' into 2.13.4
[ROCm/rccl commit: 009e79623f]
2022-09-09 23:07:04 +00:00
gilbertlee-amd 35872115f8 Updating stream caching (#614)
- Adding non-captured hipStream for use in setup

[ROCm/rccl commit: dd56135a9a]
2022-09-09 16:30:15 -06:00
gilbertlee-amd af71be44f1 GraphBench (#613)
Adding simple GraphBench tool for comparing RCCL hipGraph performance

[ROCm/rccl commit: 65d78e9a1d]
2022-09-09 12:12:25 -06:00
Wenkai Du 7874a99c75 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: a79d9e3586]
2022-09-09 16:05:38 +00:00
Wenkai Du fe99249cde Enable LL128 protocol support (#605)
* Enable LL128 protocol support

* Use shared memory object directly when possible

[ROCm/rccl commit: 7bbce085cc]
2022-09-08 14:45:27 -07:00
Lauren Wrubleski 3da06e4704 Update ubuntu18 to ubuntu20 (#611)
[ROCm/rccl commit: d700a94918]
2022-09-07 16:02:37 -06:00
Min Si 25ba51fe83 Fix compilation issues with buck (#610)
* Fix compilation warning with -Wmisleading-indentation

When compiled with -Wmisleading-indentation, the compiler reports the warning:
misleading indentation; statement is not part of the previous 'if'

This patch fixes it

* Avoid relative include file path

We don't need relative include file paths for src/graph/*.h
since src/ is already in CMake include_directories

[ROCm/rccl commit: 2b57751abb]
2022-09-07 09:56:05 -06:00
gilbertlee-amd 616cb39a0b Adding opt-in hipGraph support for RCCL via RCCL_ENABLE_HIPGRAPH (#608)
Adding opt-in hipGraph support via RCCL_ENABLE_HIPGRAPH

[ROCm/rccl commit: 47b2fc3a30]
2022-09-06 10:29:46 -06:00
akolliasAMD 2cd63dac42 added stream synch after hipMemset (#609)
[ROCm/rccl commit: 06bce9d0c9]
2022-08-30 16:18:37 -06:00
Wenkai Du f18868f439 Use hipExtLaunchKernel when not using graph and not in group mode (#606)
[ROCm/rccl commit: c9f2fe1f65]
2022-08-26 13:40:37 -07:00
akolliasAMD 151a8ef56a git_version cmake consistency changes (#604)
* git_version cmake variable consistency changes

[ROCm/rccl commit: 6670dc95ab]
2022-08-25 15:11:28 -06:00
Edgar Gabriel 22dcbed61b Merge pull request #603 from edgargabriel/topic/float16_unit_tests
introduce support for ncclFloat16/half in UT

[ROCm/rccl commit: 8a311583e0]
2022-08-25 07:40:20 -05:00
Edgar Gabriel b32b819151 introduce support for ncclFloat16/half in UT
[ROCm/rccl commit: f6e00dec13]
2022-08-24 15:28:24 +00:00
Edgar Gabriel 6bb871c986 Merge pull request #598 from edgargabriel/topic/tree-multirank
Expand ncclTreeBasePostset for multi-rank

[ROCm/rccl commit: e739c62a53]
2022-08-24 08:28:34 -05:00
Wenkai Du 56ea2c4be5 Use non-temporal access for slow path (#602)
[ROCm/rccl commit: 88487a62bb]
2022-08-23 08:21:51 -07:00
Edgar Gabriel aa6d450f35 fix channelcount for multi-rank scenario
[ROCm/rccl commit: 4141ec1151]
2022-08-22 19:09:22 +00:00
akolliasAMD 1d55fe756c Simple tree changes (#599)
changed treebase to create basic balanced tree

[ROCm/rccl commit: 3c1b1ec8c8]
2022-08-19 13:51:49 -06:00
Edgar Gabriel 27cb7d2b20 Merge pull request #601 from CosmicFusion/patch-2
fix error: use of undeclared identifier 'free'

[ROCm/rccl commit: 6fba80208c]
2022-08-19 14:10:49 -05:00
Cosmic Fusion 1dcf1da5ca fix error: use of undeclared identifier 'free'
Include stdlib.h to fix a compilation error in rccl:

[39/58] Building CXX object CMakeFiles/rccl.dir/src/misc/signals.cc.o
FAILED: CMakeFiles/rccl.dir/src/misc/signals.cc.o 
/opt/rocm/bin/hipcc -DENABLE_COLLTRACE -DHAVE_BFD -DHAVE_CPLUS_DEMANGLE -DUSE_ROCM_SMI64CONFIG -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Drccl_EXPORTS -I/home/cosmo/build/flgrwqa/build/include -I/home/cosmo/build/flgrwqa/build/include/rccl -I/home/cosmo/build/flgrwqa/rccl/src -I/home/cosmo/build/flgrwqa/rccl/src/include -I/home/cosmo/build/flgrwqa/rccl/src/collectives -I/home/cosmo/build/flgrwqa/rccl/src/collectives/device -I/opt/hsa/include -fPIC -fvisibility=hidden -fgpu-rdc -parallel-jobs=8 -Wno-format-nonliteral -x hip --offload-arch=gfx803 --offload-arch=gfx900:xnack- --offload-arch=gfx906:xnack- --offload-arch=gfx908:xnack- --offload-arch=gfx90a:xnack- --offload-arch=gfx90a:xnack+ --offload-arch=gfx1030 -std=c++14 -MD -MT CMakeFiles/rccl.dir/src/misc/signals.cc.o -MF CMakeFiles/rccl.dir/src/misc/signals.cc.o.d -o CMakeFiles/rccl.dir/src/misc/signals.cc.o -c /home/cosmo/build/flgrwqa/rccl/src/misc/signals.cc
In file included from /home/cosmo/build/flgrwqa/rccl/src/misc/signals.cc:8:
/home/cosmo/build/flgrwqa/rccl/src/include/BfdBacktrace.hpp:138:9: error: use of undeclared identifier 'free'
        free(file->syms);
        ^
/home/cosmo/build/flgrwqa/rccl/src/include/BfdBacktrace.hpp:155:5: error: use of undeclared identifier 'free'
    free(file->syms);
    ^

[ROCm/rccl commit: 080fc2d9d6]
2022-08-19 20:25:06 +03:00
Wenkai Du c2e9ada40b Repurpose profiling implementation to simple timestamps tracing (#600)
[ROCm/rccl commit: 14b8ff153f]
2022-08-18 15:34:46 -07:00
Ching-Hsiang Chu c9a50a9ec5 fix NCCL_DEBUG_FILE
Summary: NCCL_DEBUG_FILE has not worked properly since the recent v2.13.4 updates (https://github.com/NVIDIA/nccl/pull/682) because the code now sets `ncclDebugLevel` after parsing `NCCL_DEBUG_FILE`. This patch moves the parsing of `tempNcclDebugLevel` before the processing of `NCCL_DEBUG_FILE` to ensure `NCCL_DEBUG_FILE` is parsed only when `NCCL_DEBUG > NCCL_LOG_VERSION` (same as the previous behavior).

Differential Revision: D38415208

fbshipit-source-id: 5689bbb798e73efb9e8594557666987f07e89a30


[ROCm/rccl commit: e1d9b273b0]
2022-08-18 11:50:42 +02:00
Wenkai Du 6c3f1366e8 Add XGMI sys type and clean up detection code (#597)
[ROCm/rccl commit: f5c0b243a8]
2022-08-12 09:52:29 -07:00
Ziyue Yang 478d8312b8 Improve alignment and tuning for Pivot A2A algorithm (#593)
* Improve alignment and tuning for Pivot A2A algorithm

* enable pivot a2a by default

[ROCm/rccl commit: f6b9686482]
2022-08-05 19:40:19 -07:00
gilbertlee-amd e3b832f4ce Disable clique AllReduce UnitTest (#595)
[ROCm/rccl commit: dae11c2aca]
2022-08-04 18:30:00 -06:00
gilbertlee-amd b350916a6e Fixing CMake to avoid unnecessary git_version relinking (#594)
[ROCm/rccl commit: 9ed9cd0e31]
2022-08-04 18:03:59 -06:00