Commit graph

1547 Commits

Author SHA1 Message Date
Abhishek Kulkarni 595cda2ab9 GDR enablement logic fix for kernel 6.4.0+ (#1378)
[ROCm/rccl commit: 6178556853]
2024-11-03 01:20:07 -05:00
Bertan Dogancay 251df02d42 Increase MAX_STACK_SIZE for UT (#1398)
[ROCm/rccl commit: 984f1e4343]
2024-11-01 13:07:45 -04:00
corey-derochie-amd 8444e5fe7f Set minimum ROCm version for MSCCLPP to 6.2 (#1401)
* Added ROCm version check around setting `ENABLE_MSCCLPP` flag.

[ROCm/rccl commit: 6db2644766]
2024-10-30 16:48:54 -06:00
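The version gate described in this commit amounts to a simple comparison. The sketch below only illustrates that comparison in Python; the real check lives in the project's build configuration, and the names `parse_version`, `MIN_ROCM_FOR_MSCCLPP` and `mscclpp_allowed` are hypothetical.

```python
# Hypothetical illustration of a "minimum ROCm version" gate.
# Only the 6.2 threshold comes from the commit title; everything else is assumed.
MIN_ROCM_FOR_MSCCLPP = (6, 2)


def parse_version(text):
    """Turn a dotted version such as '6.2.0' into a tuple of ints.

    Suffixes such as '-rc1' are not handled in this sketch.
    """
    return tuple(int(part) for part in text.split("."))


def mscclpp_allowed(rocm_version):
    """Return True when the given ROCm version meets the stated minimum."""
    return parse_version(rocm_version) >= MIN_ROCM_FOR_MSCCLPP
```

Tuple comparison is used so that a value such as `6.10` sorts after `6.2`, which a plain string comparison would get wrong.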
Avinash da3887bafb Memory leak fixes in hostside functions (#1388)
memory leak fixes for parseRome4P2H and ncclTopoAddGPU

[ROCm/rccl commit: d6006f0425]
2024-10-30 14:25:56 -05:00
Tim e346e19065 Adjustment for UT Sendrecv (#1400)
Enabled UT sendrecv to the same rank and refactored the UBR call

[ROCm/rccl commit: fd9924cfe7]
2024-10-30 15:13:53 -04:00
Nusrat Islam e1c20e7f24 ext-src: Improved allreduce performance in cpx mode for MI308 (#1393)
To get the improved performance for TP=4, the user needs to use
RCCL_MSCCL_FORCE_ENABLE=1 and MSCCLPP_READ_ALLRED=1. For TP=8, the
user should use MSCCLPP_HIERARCHICAL_ALLRED=1.

[ROCm/rccl commit: 0fb3b5eba9]
2024-10-30 08:30:15 -05:00
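The variables named in this commit message can be gathered per tensor-parallel (TP) size before a job is launched. This is a minimal sketch: the helper `allreduce_env_for_tp` and the lookup table are hypothetical, and only the variable names and values are taken from the commit message. Whether they have any effect depends on the RCCL build and the hardware in use.

```python
import os

# Variable names and values as stated in the commit message; the table layout
# and the helper below are illustrative assumptions, not part of RCCL.
_ALLREDUCE_ENV_BY_TP = {
    4: {"RCCL_MSCCL_FORCE_ENABLE": "1", "MSCCLPP_READ_ALLRED": "1"},
    8: {"MSCCLPP_HIERARCHICAL_ALLRED": "1"},
}


def allreduce_env_for_tp(tp, base_env=None):
    """Return a copy of base_env with the variables for the given TP size added.

    TP sizes not mentioned in the commit message leave the environment unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(_ALLREDUCE_ENV_BY_TP.get(tp, {}))
    return env
```

The returned dictionary could then be passed as the `env` argument of `subprocess.run` when starting the workload.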
corey-derochie-amd af1e36a7ee Remove MSCCL switch case fall-through by adding break statement. (#1342)
[ROCm/rccl commit: ea20af698e]
2024-10-29 15:47:59 -06:00
corey-derochie-amd f9d38d8858 6.2 final documentation fixes updated for 6.3 (#1252) (#1399)
* Update CHANGELOG.md

* Update NOTICES.txt

* [DOCS] Note on using less than 8 MI300 GPUs

* Update README.md

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 8ac63e7e70]
2024-10-29 15:23:45 -06:00
gilbertlee-amd 02bf3a3bf8 Adding support for odd nodes for model_87 (#1309)
[ROCm/rccl commit: 0cbce2a757]
2024-10-24 08:38:12 -06:00
corey-derochie-amd 1c700083b2 Update CHANGELOG to match release branches 6.2 and 6.3 (#1391)
* [CHANGELOG] Add Known issues for ROCm 6.2.1

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Updated 6.2.1 known issues to match the content in develop.

* Updated CHANGELOG for ROCm 6.3 release. (#1380)

* Updated CHANGELOG for ROCm 6.3 release.

* Update CHANGELOG to new format.

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 6ed513e1b9]
2024-10-23 13:49:40 -06:00
Arm Patinyasakdikul 928414ac06 Increased maximum number of XML nodes to support CPX mode. (#1386)
[ROCm/rccl commit: 29f87c7191]
2024-10-23 11:15:11 -05:00
Wenkai Du 075381ee2e Fix topology discovery in container with subset of GPUs (#1384)
* Fix topology discovery in container with subset of GPUs

* Move links counting out of loop

[ROCm/rccl commit: e0780ba4d4]
2024-10-22 13:50:23 -07:00
Bertan Dogancay fcb0b2da3f [Replayer] Add validation (#1387)
* Add validation to rccl_replayer

[ROCm/rccl commit: cfecce790f]
2024-10-22 10:41:08 -04:00
dependabot[bot] 64aead445c Bump rocm-docs-core from 1.8.2 to 1.8.3 in /docs/sphinx (#1385)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.8.2 to 1.8.3.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.8.2...v1.8.3)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: 4685d3c546]
2024-10-21 10:05:58 -06:00
Bertan Dogancay 57710c1183 Dynamically select unroll factor to build for when targeting local arch (#1371)
* Dynamically select unroll factor to build for when targeting local arch only

[ROCm/rccl commit: 373f113524]
2024-10-21 10:53:11 -04:00
Wenkai Du 5ee84e0353 Increase CQ size to 3*MAX_REQUESTS (#1374)
* Increase CQ size to 3*MAX_REQUESTS

Suggested by Rukhsana Ansari <rukhsana.ansari@broadcom.com>

* Reword comments based on feedback from Rukhsana

[ROCm/rccl commit: 7c077db307]
2024-10-18 11:01:03 -07:00
akolliasAMD ad2c8c3eb8 added atomic acquire for gfx12 on prims_simple (#1382)
[ROCm/rccl commit: af5678641d]
2024-10-18 11:26:38 -06:00
Jeffrey Novotny a5cc8edd9b Add missing metadata information (#1381)
[ROCm/rccl commit: 4822fd47ca]
2024-10-16 13:26:12 -04:00
Sean Karlage 3eda60a031 static: Enable true rccl static library build (#1379)
* static: Enable true rccl static library build

RCCL uses `-fgpu-rdc` to compile, which requires a specialized link command in order to produce a true static library.

When "linking" with `amdclang++`, you need to use `--emit-static-lib` and `--hip-link` to get a static library with all GPU code generated. Subsequent links with binaries do not need any special flags to generate GPU code.

Building a static library:
```
$ cmake -DROCM_PATH=$ROCM_PATH -DCMAKE_PREFIX_PATH=$ROCM_PATH -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=off -DCMAKE_POSITION_INDEPENDENT_CODE=on -DAMDGPU_TARGETS=gfx942 -DCMAKE_CXX_COMPILER=$ROCM_PATH/lib/llvm/bin/amdclang++ -DCMAKE_C_COMPILER=$ROCM_PATH/lib/llvm/bin/amdclang .. 2>&1 | tee -a /tmp/build.txt
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11")
-- Checking for ROCm support for GPU targets: gfx942
-- Compiling for gfx942
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11")
-- ROCM_PATH found: /opt/rocm
-- Compiling with amdclang++
-- HIP compiler:     clang
-- HIP runtime:      rocclr
-- amdclang++ executable: /opt/rocm/llvm/bin/amdclang++
-- amdclang++ version:    18.0.0git
-- hipconfig executable: /opt/rocm/bin/hipconfig
-- amdclang++ HIP version:    6.2.41133
-- ROCm version: 6.2.0
...
$ make -j 32
[  0%] Updating git_version.cpp if necessary
-- Updating git_version.cpp
[  0%] Built target git_version_check
[  0%] Hipifying src/transport/shm.cc -> /home/skarlage/local/rccl/build/hipify/src/transport/shm.cc
[  0%] Hipifying src/bootstrap.cc -> /home/skarlage/local/rccl/build/hipify/src/bootstrap.cc
[  0%] Hipifying src/channel.cc -> /home/skarlage/local/rccl/build/hipify/src/channel.cc
[  1%] Hipifying src/device/all_reduce.h -> /home/skarlage/local/rccl/build/hipify/src/device/all_reduce.h
[  1%] Hipifying src/device/broadcast.h -> /home/skarlage/local/rccl/build/hipify/src/device/broadcast.h
[  1%] Hipifying src/device/all_gather.h -> /home/skarlage/local/rccl/build/hipify/src/device/all_gather.h
[  1%] Hipifying src/device/common.cu -> /home/skarlage/local/rccl/build/hipify/src/device/common.cu.cpp
[  1%] Hipifying src/debug.cc -> /home/skarlage/local/rccl/build/hipify/src/debug.cc
[  1%] Hipifying src/device/alltoall_pivot.h -> /home/skarlage/local/rccl/build/hipify/src/device/alltoall_pivot.h
[  1%] Hipifying src/device/network/unpack/unpack.h -> /home/skarlage/local/rccl/build/hipify/src/device/network/unpack/unpack.h
[  4%] Hipifying src/collectives.cc -> /home/skarlage/local/rccl/build/hipify/src/collectives.cc
[  4%] Hipifying src/device/msccl_kernel_impl.h -> /home/skarlage/local/rccl/build/hipify/src/device/msccl_kernel_impl.h
[  4%] Hipifying src/device/network/unpack/unpack_defs.h -> /home/skarlage/local/rccl/build/hipify/src/device/network/unpack/unpack_defs.h
[  4%] Hipifying src/device/op128.h -> /home/skarlage/local/rccl/build/hipify/src/device/op128.h
[  4%] Hipifying src/device/onerank.cu -> /home/skarlage/local/rccl/build/hipify/src/device/onerank.cu.cpp
[  4%] Hipifying src/device/common.h -> /home/skarlage/local/rccl/build/hipify/src/device/common.h
[  6%] Hipifying src/device/prims_ll.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_ll.h
[  6%] Hipifying src/device/primitives.h -> /home/skarlage/local/rccl/build/hipify/src/device/primitives.h
[  6%] Hipifying src/device/prims_ll128.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_ll128.h
[  6%] Hipifying src/device/reduce.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce.h
[  7%] Hipifying src/device/common_kernel.h -> /home/skarlage/local/rccl/build/hipify/src/device/common_kernel.h
[  7%] Hipifying src/device/reduce_scatter.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce_scatter.h
[  7%] Hipifying src/device/sendrecv.h -> /home/skarlage/local/rccl/build/hipify/src/device/sendrecv.h
[  7%] Hipifying src/device/prims_simple.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_simple.h
[  7%] Hipifying src/enqueue.cc -> /home/skarlage/local/rccl/build/hipify/src/enqueue.cc
[  7%] Hipifying src/device/reduce_kernel.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce_kernel.h
[  7%] Hipifying src/graph/connect.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/connect.cc
[  7%] Hipifying src/graph/rings.h -> /home/skarlage/local/rccl/build/hipify/src/graph/rings.h
[  8%] Hipifying src/graph/rings.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/rings.cc
[  8%] Hipifying src/graph/rome_models.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/rome_models.cc
[  8%] Hipifying src/graph/rome_models.h -> /home/skarlage/local/rccl/build/hipify/src/graph/rome_models.h
[  8%] Hipifying src/graph/paths.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/paths.cc
[  9%] Hipifying src/graph/search.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/search.cc
[  9%] Hipifying src/graph/topo.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/topo.cc
...
[100%] Linking CXX static library librccl.a
Elapsed time: 270 s. (time), 0.00046 s. (clock)
Elapsed time: 0 s. (time), 0.000342 s. (clock)
[100%] Built target rccl
```
Static rccl exists:
```
$ file librccl.a
librccl.a: current ar archive
```

* Fix up tests CMake for static builds

We also need to fix up the tests CMakeLists.txt to:
* Remove the unused `BUILD_STATIC` option
* Use `SHARED_LIBS` as a definition of whether we're building static or
  not.

[ROCm/rccl commit: bdf9544c81]
2024-10-16 06:58:50 -07:00
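The `file librccl.a` check in the commit message above can also be scripted. The sketch below assumes a hypothetical helper, `looks_like_ar_archive`, which tests only for the 8-byte magic string of the common `ar` format. It does not confirm that the archive contains GPU code.

```python
# 8-byte global header of the common ar archive format.
# Thin archives use a different magic ("!<thin>\n") and are not matched here.
AR_MAGIC = b"!<arch>\n"


def looks_like_ar_archive(path):
    """Return True when the file at `path` starts with the ar magic string."""
    with open(path, "rb") as handle:
        return handle.read(len(AR_MAGIC)) == AR_MAGIC
```

A build script could call `looks_like_ar_archive("librccl.a")` after the link step as a quick sanity check.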
Wenkai Du bd0cdf5a50 Add back missing net flush (#1376)
[ROCm/rccl commit: c8d3543d3f]
2024-10-15 08:12:26 -07:00
Wenkai Du 5f8571dcbc msccl: disable 1-shot xmls (#1375)
MSCCL 1-shot xmls may cause different output values on different ranks.
Disabling them for now to avoid undefined behavior in applications.

[ROCm/rccl commit: 62d10fdc25]
2024-10-14 15:10:53 -07:00
Wenkai Du 9ad1fe571b Temporarily disable MSCCL all gather XMLs due to UT failure (#1373)
[ROCm/rccl commit: a680e329e6]
2024-10-12 08:43:16 -07:00
Wenkai Du 09acdb6b49 Allow zero byte sendrecv in alltoallv (#1349)
* Allow zero byte sendrecv in alltoallv

* Fix previous merge error

[ROCm/rccl commit: 821d2e1f30]
2024-10-11 10:40:32 -07:00
Wenkai Du 4cd1b3a9f5 Improve model matching for GPUs with alltoall XGMI connection (#1372)
[ROCm/rccl commit: 5c367a21d0]
2024-10-11 09:53:14 -07:00
Arm Patinyasakdikul ef54dd7cbc Increase default number of channels for MI300A in multi-node scenario. (#1366)
This commit changes the default number of channels for MI300A from 8 up to 24.
This helps bring multi-node performance up to the expected level.

[ROCm/rccl commit: 133ea201cf]
2024-10-11 11:37:48 -05:00
Wenkai Du 1b988c1b31 Fix crash when PXN is enabled on some platforms (#1369)
[ROCm/rccl commit: b55b6be0cb]
2024-10-11 09:02:59 -07:00
Nusrat Islam 5545392913 ext-src: Fix compiler warnings for MSCCLPP integration (#1368)
[ROCm/rccl commit: 6160603d4c]
2024-10-10 08:20:02 -05:00
Nilesh M Negi 912e9f4b61 [BUILD] Simplify CMake args for building MSCCLPP (#1363)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 364a6c2130]
2024-10-09 23:52:04 -05:00
Nilesh M Negi 04d9a98c8e [BUILD] Require use of Python3 interpreter (#1367)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 41a2c02773]
2024-10-09 22:36:50 -05:00
Nusrat Islam f61053dcba Add a custom allreduce algorithm in MSCCLPP for cpx mode (#1362)
* cmake: remove mscclpp patch after build is complete

Enabling mscclpp in cpx mode requires applying a patch, cpx.patch.
The patch can be removed once the build is done, which helps the
build process the next time it runs.

* Use read-based mscclpp allreduce from rccl

MSCCLPP by default uses remote write in the allreduce kernel for
large (> 1MB) messages. This PR adds an allreduce kernel that uses
remote read. Users enable it by setting the environment variable
MSCCLPP_READ_ALLRED=1.

[ROCm/rccl commit: 4d68751ce1]
2024-10-08 14:42:12 -05:00
corey-derochie-amd 35d98330f2 Only set minNchannels if we are actually using MSCCL, checked using comm->mscclCompatible. (#1337)
[ROCm/rccl commit: c11f6b1531]
2024-10-08 10:20:55 -06:00
akolliasAMD 949fdd027b disabled wbinvl1 for gfx9x on ll128 (#1365)
[ROCm/rccl commit: bc519fd733]
2024-10-08 08:43:29 -06:00
Nilesh M Negi cd29f1e22f [TRANSPORT] Add RCCL_FORCE_ENABLE_GDRDMA for debugging (#1356)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 8ad76f8d10]
2024-10-06 18:43:49 -05:00
akolliasAMD 9c4ac4cae5 Regression timing fix (#1361)
* Removed testbed initialization on standalone tests
* .jenkins re-enabled all tests

[ROCm/rccl commit: 7fb9189760]
2024-10-03 10:41:26 -06:00
Bertan Dogancay 974c13cd62 [BUILD] Move code generation to python from CMake (#1360)
* Use generate.py for func generation

* Convert AddUnroll.cmake to bash

[ROCm/rccl commit: 2dd10c8f17]
2024-10-03 10:21:19 -04:00
dependabot[bot] 152738dcc9 Bump rocm-docs-core from 1.7.2 to 1.8.2 in /docs/sphinx (#1348)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.7.2 to 1.8.2.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.8.2/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.7.2...v1.8.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: 038517b169]
2024-10-02 16:33:26 -06:00
Bertan Dogancay b915c3c154 Merge pull request #1358 from BertanDogancay/nccl-2.21-sync
[ROCm/rccl commit: 833b185a2d]
2024-10-02 18:21:06 -04:00
Nusrat Islam 1f7945286c Enable MSCCLPP use in CPX mode (#1355)
This PR enables the use of MSCCLPP in CPX mode for 8 GPUs.

[ROCm/rccl commit: d13f9c44f5]
2024-10-02 11:52:04 -05:00
BertanDogancay 9059445acb Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 84081064a0]
2024-10-02 09:31:25 -05:00
Wenkai Du 74aa13afbe Add another Rome model (#1354)
[ROCm/rccl commit: e453f1ced9]
2024-10-01 17:41:27 -05:00
Ziyue Yang cf980e9b9c Fix size matching in MSCCL (#1318)
[ROCm/rccl commit: 7830af5844]
2024-10-01 13:32:41 -07:00
Nilesh M Negi efc500d2ff [CI] Temporarily disable RCCL UT Standalone.RegressionTiming in CI (#1350)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 8b3ed8f104]
2024-09-27 14:08:36 -05:00
corey-derochie-amd d5a2245a40 Checkout submodules with shallow depth (#1353)
* Make submodules shallow

* Updated README for the shallow checkout changes.

[ROCm/rccl commit: 7231808c58]
2024-09-27 11:07:16 -06:00
spolifroni-amd dd884f00c0 Merge pull request #1345 from ROCm/spolifroni-amd/update-changelog
Updated the 6.2.1 changelog so that it reflects what's in the 6.2.1 release notes

[ROCm/rccl commit: 06a0ddb3b4]
2024-09-27 10:15:30 -04:00
Mustafa Abduljabbar ef6d75b3ee MSCCL Multithreaded regression root cause fix (#1347)
* Make sure the target device is used for MSCCL

* Enable single process mode by default to use MSCCL in MT

* Create a per-rank state when GPUs share a thread

[ROCm/rccl commit: 03a3ef3c34]
2024-09-25 15:24:25 -04:00
Nilesh M Negi 21a3b242bf [TRANSPORT] GDRDMA enablement for linux kernel 6.4.0 or newer (#1328)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 105ff1611f]
2024-09-25 11:29:52 -05:00
Tim 94ac752578 Remove 0 size UBR (#1346)
ncclCommRegister, required for UBR, calls IB dmabuf regMr directly, which forbids 0-size messages

[ROCm/rccl commit: 40e93ebc29]
2024-09-24 18:16:51 -04:00
Nilesh M Negi 56bc01cb83 [BUILD] Enable MSCCL++ for gfx942 variants (#1344)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 3c61e934f2]
2024-09-23 19:05:49 -05:00
Sandra Polifroni 53478f138e Updated the information for 6.2.1 in the changelog so that it reflects what's in the 6.2.1 release notes
[ROCm/rccl commit: 7f87b0cd85]
2024-09-23 14:27:58 -04:00
Nilesh M Negi 60ee54839c Add Dockerfile to build rccl and rccl-tests (#1011)
* [BUILD] Add Dockerfile for RCCL and RCCL-Tests

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Update docker/Dockerfile.ubuntu

Typo for LD_LIBRARY_PATH

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

* Update docker/Dockerfile.ubuntu

use `-b` for `git clone` instead of additional `git checkout`

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

* Update docker/Dockerfile.ubuntu

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 707377b3cd]
2024-09-22 03:53:16 -05:00