Wenkai Du
446c8cbf66
msccl: reduce debug output when using NCCL_DEBUG=INFO ( #932 )
...
[ROCm/rccl commit: fb0eccb57b ]
2023-10-25 08:05:19 -07:00
Wen-Heng (Jack) Chung
769f00db5c
Introduce allgather for MSCCL on 8 sockets up to 320KB. ( #931 )
...
[ROCm/rccl commit: bfb8642450 ]
2023-10-24 18:41:12 -05:00
Wen-Heng (Jack) Chung
89a8493ef8
Introduce allgather MSCCL XML specification for MI250X up to 320KB. ( #930 )
...
[ROCm/rccl commit: 3f9ffe4788 ]
2023-10-24 18:35:55 -05:00
Wen-Heng (Jack) Chung
fc2a13c077
Introduce 1-shot allreduce for MI250X Hayabusa. ( #929 )
...
[ROCm/rccl commit: 72d5fbddfd ]
2023-10-24 16:31:18 -05:00
Wenkai Du
cc4de02a86
Add missing gfx942 support ( #927 )
...
[ROCm/rccl commit: c4e65fd382 ]
2023-10-23 12:04:37 -07:00
akolliasAMD
bc7df769a2
AllReduceTests: fixed the number of roots ( #925 )
...
[ROCm/rccl commit: d8dc282eeb ]
2023-10-20 10:25:11 -06:00
dependabot[bot]
187e9c1958
Bump urllib3 from 1.26.17 to 1.26.18 in /docs/sphinx ( #921 )
...
Bumps [urllib3](https://github.com/urllib3/urllib3 ) from 1.26.17 to 1.26.18.
- [Release notes](https://github.com/urllib3/urllib3/releases )
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst )
- [Commits](https://github.com/urllib3/urllib3/compare/1.26.17...1.26.18 )
---
updated-dependencies:
- dependency-name: urllib3
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/rccl commit: b173c13831 ]
2023-10-20 10:15:42 -06:00
searlmc1
40b869acca
Merge pull request #926 from ROCmSoftwarePlatform/searlmc1-patch-1
...
Remove quotes causing asan build breakage
[ROCm/rccl commit: dd5f01aeaf ]
2023-10-20 07:52:39 -07:00
searlmc1
212453b2fb
Remove quotes causing asan build breakage
...
The quotes around "-fsanitize=address -shared-libasan" cause the BUILD_ADDRESS_SANITIZER build to fail; remove the quotes
[ROCm/rccl commit: f59de10524 ]
2023-10-19 16:13:39 -07:00
Bertan Dogancay
1a538d0218
Update install.sh --fast and README ( #924 )
...
[ROCm/rccl commit: 3807c203fc ]
2023-10-19 16:35:10 -06:00
Wenkai Du
6f0f614d0b
Remove LDS based software barriers from MSCCL ( #923 )
...
[ROCm/rccl commit: dbb5611a3a ]
2023-10-19 16:39:41 -05:00
Wenkai Du
edeea499b5
Update rome models ( #922 )
...
[ROCm/rccl commit: 4278a9918b ]
2023-10-18 17:28:01 -07:00
Wen-Heng (Jack) Chung
49e52e7269
Introduce 1pass allreduce. Tailor it for very small message sizes <= 20KB. ( #919 )
...
[ROCm/rccl commit: 341926c60a ]
2023-10-16 16:31:08 -05:00
Wenkai Du
e0cc7de446
NPKit: add xcc_id field ( #918 )
...
[ROCm/rccl commit: 39812ce757 ]
2023-10-13 15:24:59 -07:00
Wenkai Du
c0bd012e6c
Fix incorrect arch name parsing ( #916 )
...
[ROCm/rccl commit: 1b80d041cb ]
2023-10-13 10:01:11 -07:00
Wenkai Du
102f0165d6
Port init_once fix from NCCL ( #915 )
...
[ROCm/rccl commit: 6d0b5c1e89 ]
2023-10-13 08:01:12 -07:00
dependabot[bot]
376de87fa9
Bump rocm-docs-core from 0.25.0 to 0.26.0 in /docs/sphinx ( #917 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 0.25.0 to 0.26.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.25.0...v0.26.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/rccl commit: f7e530259d ]
2023-10-13 08:27:04 -06:00
Wen-Heng (Jack) Chung
dfa0d98f9e
Change MSCCL kernel signature to allow kernel arguments to be preloaded via SGPR ( #911 )
...
* Adding a script that will download/compile/run TransferBench/RCCL/UCX/RCCL-tests/RCCL-Unittests/hip-mpi-testsuite (#895 )
Co-authored-by: Pedram Alizadeh <pmohamma@banff-pla-r27-05.pla.dcgpu>
* Only build gfx941
* demo
* fine tune malloc
* Fix merge errors
* Fix merge errors
* Disable parallel build
* Adopt --amdgpu-kernarg-preload-count
* Revert "Adding a script that will download/compile/run TransferBench/RCCL/UCX/RCCL-tests/RCCL-Unittests/hip-mpi-testsuite (#895 )"
This reverts commit f5e252dddf02a41b4d1bc512f306f45f97166304.
* Revert CMake changes.
* NPKIT changes.
* Remove some license declarations.
* Address code review feedback on msccl_kernel_impl.h
* Update CMakeLists.txt
* Add CMake logic to check the existence of --amdgpu-kernarg-preload-count
* Fix NPKIT trace logic.
---------
Co-authored-by: Pedram Alizadeh <pmohamma@amd.com>
Co-authored-by: Pedram Alizadeh <pmohamma@banff-pla-r27-05.pla.dcgpu>
Co-authored-by: Ziyue Yang <ziyyang@microsoft.com>
[ROCm/rccl commit: 7ee5c1c28b ]
2023-10-12 20:17:08 -05:00
mberenjk
9a0c9ba3e9
adding cuda support for EmptyKernelTest ( #913 )
...
[ROCm/rccl commit: 7e2d905376 ]
2023-10-11 14:11:12 -05:00
dependabot[bot]
5096358a70
Bump gitpython from 3.1.35 to 3.1.37 in /docs/sphinx ( #912 )
...
Bumps [gitpython](https://github.com/gitpython-developers/GitPython ) from 3.1.35 to 3.1.37.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases )
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES )
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.35...3.1.37 )
---
updated-dependencies:
- dependency-name: gitpython
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/rccl commit: 01d9da8046 ]
2023-10-10 15:43:49 -06:00
gilbertlee-amd
c1a7b56b9b
Adding a simple EmptyKernelTest to measure launch latency ( #910 )
...
[ROCm/rccl commit: 7dbf47e07b ]
2023-10-04 17:22:48 -06:00
Bertan Dogancay
6f7965796f
Revert "Remove 2H4P condition from P2P channels adjustment ( #890 )" ( #904 )
...
This reverts commit 057e30e705.
[ROCm/rccl commit: a6ff4618c7 ]
2023-10-04 09:46:11 -06:00
dependabot[bot]
c0a707ea50
Bump rocm-docs-core from 0.24.2 to 0.25.0 in /docs/sphinx ( #909 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 0.24.2 to 0.25.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.24.2...v0.25.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/rccl commit: f2600af812 ]
2023-10-04 09:14:59 -06:00
dependabot[bot]
928cf93c4b
Bump urllib3 from 1.26.15 to 1.26.17 in /docs/sphinx ( #906 )
...
Bumps [urllib3](https://github.com/urllib3/urllib3 ) from 1.26.15 to 1.26.17.
- [Release notes](https://github.com/urllib3/urllib3/releases )
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst )
- [Commits](https://github.com/urllib3/urllib3/compare/1.26.15...1.26.17 )
---
updated-dependencies:
- dependency-name: urllib3
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/rccl commit: 4b4e7ecdf9 ]
2023-10-03 15:54:37 -06:00
akolliasAMD
1ffd3eff31
DMA-BUF support opt-in ( #905 )
...
* Added the dmaBufSupport opt-in to every part of the code that should invoke it
[ROCm/rccl commit: 28d7fe5629 ]
2023-10-03 03:17:48 -06:00
Edgar Gabriel
267faf9d45
Merge pull request #899 from edgargabriel/topic/disable-bfd-by-default
...
turn bfd compilation off by default
[ROCm/rccl commit: c90ef5f035 ]
2023-10-01 09:40:05 -05:00
Edgar Gabriel
e6c3e9fd8e
turn bfd compilation off by default
...
revert the logic to ensure that we are not accidentally creating
a dependency on the bfd libraries when deploying rccl binaries.
[ROCm/rccl commit: 88a55cef83 ]
2023-09-29 20:25:33 +00:00
akolliasAMD
12b2fc9774
install.sh fix ( #903 )
...
[ROCm/rccl commit: a773def279 ]
2023-09-29 07:42:17 -06:00
Cen Zhao
d3c20a1210
Update install.sh to take "--static" option ( #894 )
...
* Update install.sh to take "--static" option
* Fix static build errors
---------
Co-authored-by: BertanDogancay <bertan.dogancay@gmail.com>
[ROCm/rccl commit: fb57a438d7 ]
2023-09-27 12:45:21 -04:00
Bertan Dogancay
b35ea4bd78
Modify All-To-All doc ( #896 )
...
* Modify All-To-All doc
* Update nccl.h.in
* update unit-tests
---------
Co-authored-by: gilbertlee-amd <44450918+gilbertlee-amd@users.noreply.github.com>
[ROCm/rccl commit: c1f57a7041 ]
2023-09-27 12:45:21 -04:00
dependabot[bot]
01c72d16d5
Bump gitpython from 3.1.34 to 3.1.35 in /docs/sphinx ( #898 )
...
Bumps [gitpython](https://github.com/gitpython-developers/GitPython ) from 3.1.34 to 3.1.35.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases )
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES )
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.34...3.1.35 )
---
updated-dependencies:
- dependency-name: gitpython
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/rccl commit: 50bc92f1d5 ]
2023-09-27 12:45:21 -04:00
dependabot[bot]
2c5a37a6b1
Bump cryptography from 41.0.3 to 41.0.4 in /docs/sphinx ( #897 )
...
Bumps [cryptography](https://github.com/pyca/cryptography ) from 41.0.3 to 41.0.4.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst )
- [Commits](https://github.com/pyca/cryptography/compare/41.0.3...41.0.4 )
---
updated-dependencies:
- dependency-name: cryptography
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/rccl commit: 1bbc3742b0 ]
2023-09-27 12:45:21 -04:00
Pedram Alizadeh
279da575be
Adding a script that will download/compile/run TransferBench/RCCL/UCX/RCCL-tests/RCCL-Unittests/hip-mpi-testsuite ( #895 )
...
[ROCm/rccl commit: 3f6c2b9b32 ]
2023-09-27 12:44:36 -04:00
akolliasAMD
6f7eb65308
changed the form that RCCL_TREE uses ( #888 )
...
* changed the form that RCCL_TREE uses
[ROCm/rccl commit: b85d73c02e ]
2023-09-15 15:01:33 -06:00
Wenkai Du
3cc41809dd
Reduce NPKit latency overhead in MSCCL kernel ( #893 )
...
* Reduce NPKit latency overhead in MSCCL kernel
* Fix build error without NPKit enable
[ROCm/rccl commit: 26e982d913 ]
2023-09-15 13:28:26 -07:00
Wenkai Du
057e30e705
Remove 2H4P condition from P2P channels adjustment ( #890 )
...
[ROCm/rccl commit: 16dd05a58a ]
2023-09-13 12:54:21 -07:00
Ziyue Yang
6d593761dc
Add single-node MI300X topology ( #889 )
...
[ROCm/rccl commit: c1bfd5f0d8 ]
2023-09-13 11:07:17 -07:00
akolliasAMD
8685535346
Fixed topo_expl ( #891 )
...
[ROCm/rccl commit: 762a42859e ]
2023-09-13 12:05:35 -06:00
Wenkai Du
b0a16d80ff
Fix crash when NPKit is enabled ( #887 )
...
[ROCm/rccl commit: 6a4d5ec089 ]
2023-09-13 11:00:12 -07:00
Audrey MP
2e3d45a53a
Gcn arch name ( #886 )
...
We use CMake to determine whether we're compiling against a version of ROCm that supports gcnArchName, and handle architecture checking accordingly. This change includes a few helper functions as drop-in replacements for the functionality we previously used gcnArch for: sometimes to enable flags, and sometimes to set frequencies.
[ROCm/rccl commit: e58ec78d35 ]
2023-09-12 15:34:40 -04:00
Andy li
43a9fd00ee
enable hip graph on multi-node ( #884 )
...
* initial checkin
* enable msccl when hip graph is on
* remove the commented out code of msccl enable check
* clean up the code
* remove the msccl HighestTransportType check logic
[ROCm/rccl commit: e1dc4d5e42 ]
2023-09-11 15:30:04 -07:00
Nusrat Islam
e0ddc8f549
Merge pull request #880 from nusislam/msccl-npkit
...
msccl: add NPKIT profiling for MSCCL send-recv
[ROCm/rccl commit: e46602e44a ]
2023-09-08 14:13:14 -05:00
Nusrat Islam
ffbfe43500
msccl: add NPKIT profiling for MSCCL send-recv
...
[ROCm/rccl commit: a283f55f12 ]
2023-09-08 13:11:16 -05:00
dependabot[bot]
ae27ee7108
Bump rocm-docs-core from 0.22.0 to 0.24.0 in /docs/sphinx ( #882 )
...
* Bump rocm-docs-core from 0.22.0 to 0.24.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 0.22.0 to 0.24.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.22.0...v0.24.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* Update requirements.in
* Update requirements.txt
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
[ROCm/rccl commit: a893e8a4ab ]
2023-09-07 11:27:53 -06:00
dependabot[bot]
ecd3fb42b0
Bump gitpython from 3.1.32 to 3.1.34 in /docs/sphinx ( #879 )
...
Bumps [gitpython](https://github.com/gitpython-developers/GitPython ) from 3.1.32 to 3.1.34.
- [Release notes](https://github.com/gitpython-developers/GitPython/releases )
- [Changelog](https://github.com/gitpython-developers/GitPython/blob/main/CHANGES )
- [Commits](https://github.com/gitpython-developers/GitPython/compare/3.1.32...3.1.34 )
---
updated-dependencies:
- dependency-name: gitpython
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/rccl commit: 62a09100a6 ]
2023-09-06 14:08:45 -06:00
Bertan Dogancay
2aa31c89df
RCCL should use hipPointerAttribute_t.type ( #872 )
...
[ROCm/rccl commit: 6230b5f6b3 ]
2023-09-05 09:44:12 -06:00
Wenkai Du
009990efca
Remove --hipcc-func-supp with recent compilers ( #874 )
...
* Remove --hipcc-func-supp with recent compilers
* Remove HIP_UNCACHED_MEMORY detection from header file
[ROCm/rccl commit: 2baca3a55a ]
2023-09-01 07:53:18 -07:00
dependabot[bot]
6ec15d550d
Bump rocm-docs-core from 0.21.0 to 0.22.0 in /docs/sphinx ( #875 )
...
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 0.21.0 to 0.22.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/v0.22.0/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.21.0...v0.22.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
[ROCm/rccl commit: e642681fd3 ]
2023-09-01 08:46:43 -06:00
Wenkai Du
be412b848b
Update ll_latency_test and add CUDA version ( #873 )
...
[ROCm/rccl commit: c6dd6f6237 ]
2023-08-30 16:29:42 -07:00
dependabot[bot]
29b01e4b3b
Bump rocm-docs-core from 0.20.0 to 0.21.0 in /docs/sphinx ( #870 )
...
* Bump rocm-docs-core from 0.20.0 to 0.21.0 in /docs/sphinx
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core ) from 0.20.0 to 0.21.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases )
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md )
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.20.0...v0.21.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* replace noCI with ci:docs-only label
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
[ROCm/rccl commit: a433fcc726 ]
2023-08-30 08:56:38 -06:00