Commit graph

1338 Commits

Author SHA1 Message Date
Wenkai Du 43bbee4dcc Remove hipEventDisableSystemFence (#1122)
There is no indication that disabling system fence has any latency improvement.
Removing it per recommendation from HIP.

[ROCm/rccl commit: 5976f757dd]
2024-03-25 08:01:57 -07:00
Pedram Alizadeh 61f89d680d msccl algorithms tuning for alltoall on MI300 (#1120)
Co-authored-by: PedramAlizadeh <amd@pmohamma.com>

[ROCm/rccl commit: c2fc1d6809]
2024-03-21 20:35:29 -04:00
corey-derochie-amd 9c2a57259d Added @corey-derochie-amd as a code owner (to rocm-documentation) (#1119)
[ROCm/rccl commit: 606d3e6b6e]
2024-03-21 14:56:05 -06:00
dependabot[bot] d956fe9cbd Bump rocm-docs-core from 0.36.0 to 0.37.0 in /docs/sphinx (#1117)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.36.0 to 0.37.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.36.0...v0.37.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: cb80586fb9]
2024-03-20 09:25:14 -06:00
Nilesh M Negi f93831cf6a BUILD: Enable RCCL static build (#1114)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 53fad75001]
2024-03-15 12:18:18 -05:00
srawat 7c8cf72d35 refactor RCCL (#1112)
* refactor RCCL

* rccl updates

* Update index.rst

* refactor

* Update what-is-rccl.rst

[ROCm/rccl commit: 45ee5734dd]
2024-03-15 14:14:47 +05:30
Pedram Alizadeh 17b9546da9 msccl algorithms tuning for allgather on MI300 (#1110)
[ROCm/rccl commit: 50f22e8317]
2024-03-14 12:18:26 -04:00
dependabot[bot] 7e22922051 Bump rocm-docs-core from 0.35.1 to 0.36.0 in /docs/sphinx (#1109)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.35.1 to 0.36.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.1...v0.36.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: 0867562b18]
2024-03-12 09:38:20 -06:00
Andy li e373bd44bf Enable fp8 support (#1101)
* initial checkin

* resolve cr comments

* resolve the build issue

* fix the data correctness issue

* update fp8 header file and update the unit test for fp8 support

* remove fp16 from fp8 headers

* fix ut issue and catch up the latest code from develop

* update according to cr comments

* update ut according to cr comments

* update num floats for each SumPostDiv from 4 to 6

* update fp8 header file name

* fix the typo

[ROCm/rccl commit: 6777e65c1d]
2024-03-08 15:17:53 -08:00
Wenkai Du 2354601589 Improve debug messages of memory allocations (#1107)
[ROCm/rccl commit: ff951e607d]
2024-03-08 10:55:10 -08:00
Wenkai Du c2eff3ecd9 topo_expl: 2.19.4 update and fix build error (#1098)
[ROCm/rccl commit: d2224fd3e1]
2024-03-07 08:52:50 -08:00
Wenkai Du 6dd45024f8 msccl: fix scratch memory allocation after API change (#1103)
[ROCm/rccl commit: 77615cce28]
2024-03-06 11:11:04 -08:00
dependabot[bot] 64e4e20da5 Bump rocm-docs-core from 0.35.0 to 0.35.1 in /docs/sphinx (#1100)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.35.0 to 0.35.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.0...v0.35.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: 1f7b6e18d7]
2024-03-06 11:15:33 -07:00
yhuiYH 45c166554d Merge pull request #1099 from ROCm/LisaDelaney-patch-1
link fix

[ROCm/rccl commit: 12441e8f6c]
2024-03-05 13:54:04 -05:00
Lisa d067682641 link fix
[ROCm/rccl commit: a032cb9eeb]
2024-03-05 09:01:10 -07:00
Bertan Dogancay 1dfe5cca64 Fix bug when configuring for only LL128 (#1097)
[ROCm/rccl commit: a279e7f32d]
2024-03-01 18:09:39 -07:00
Wenkai Du e5aedb153e Add support for using contiguous memory for GPU direct RDMA (#1096)
Enabled by env var RCCL_NET_CONTIGUOUS_MEM=1

[ROCm/rccl commit: cbd955627e]
2024-02-29 10:06:43 -08:00
Wenkai Du 058886cb20 Add another Rome model (#1095)
[ROCm/rccl commit: df98a6957d]
2024-02-28 10:46:05 -08:00
Bertan Dogancay cee279fd99 Implement ROCTX (#1094)
* Implement roctx

[ROCm/rccl commit: b617aecc31]
2024-02-27 15:46:15 -07:00
dependabot[bot] d0a346a738 Bump rocm-docs-core from 0.34.2 to 0.35.0 in /docs/sphinx (#1092)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.34.2 to 0.35.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.34.2...v0.35.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: dae6df6d16]
2024-02-26 16:57:14 -07:00
dependabot[bot] 0272742733 Bump cryptography from 42.0.2 to 42.0.4 in /docs/sphinx (#1090)
Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.2 to 42.0.4.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/42.0.2...42.0.4)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: beb1e487ad]
2024-02-26 16:47:14 -07:00
Tim 826d20495f Adding FP16 cases to unit tests (#1093)
Signed-off-by: Tim Hu <timhu102@amd.com>

[ROCm/rccl commit: 0d06b0f1de]
2024-02-26 12:08:04 -05:00
Wenkai Du 874998033f Add new GPU model (#1080)
[ROCm/rccl commit: 74f9e5db64]
2024-02-23 12:19:42 -08:00
Wenkai Du 4b31894d70 Update RCCL/MSCCL work FIFO depth to 256K (#1091)
[ROCm/rccl commit: c5ab37211b]
2024-02-21 17:15:11 -08:00
Bertan Dogancay 4b4bdd904e LL128 check if all XGMI (#1089)
[ROCm/rccl commit: b275ed0b56]
2024-02-21 09:41:40 -07:00
Pedram Alizadeh bf48d1bc4d msccl algorithms tuning for allreduce on MI300 (#1088)
[ROCm/rccl commit: 5a0f9990a9]
2024-02-21 11:31:56 -05:00
dependabot[bot] 3cd03179cb Bump cryptography from 42.0.0 to 42.0.2 in /docs/sphinx (#1087)
Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.0 to 42.0.2.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/42.0.0...42.0.2)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: b7e3f1da14]
2024-02-20 15:03:10 -07:00
dependabot[bot] 46ada18646 Bump rocm-docs-core from 0.34.0 to 0.34.2 in /docs/sphinx (#1086)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.34.0 to 0.34.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.34.0...v0.34.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: 7e47a77339]
2024-02-16 11:21:27 -07:00
Bertan Dogancay 32e1c8cba0 Merge pull request #1079 from BertanDogancay/2.19.4-sync
2.19.4 Sync

[ROCm/rccl commit: 2fb12a9358]
2024-02-16 09:50:11 -07:00
BertanDogancay 24d9e1c36b Increase max stack size when ll128 enabled
[ROCm/rccl commit: b098120c40]
2024-02-15 15:56:59 -08:00
akolliasAMD e0dd21028f Allow bus id to be null (#1085)
* Allow bus id to be null


[ROCm/rccl commit: bac57421c7]
2024-02-15 16:36:51 -07:00
BertanDogancay ef72944015 Disable unsupported ld/st instructions
[ROCm/rccl commit: 6f3310605c]
2024-02-15 13:58:16 -08:00
BertanDogancay 7842411fb3 Merge remote-tracking branch 'rccl/develop' into 2.19.4
[ROCm/rccl commit: 76f83f95ab]
2024-02-15 13:37:14 -08:00
akolliasAMD 5d44815d95 Npkit updates (#1084)
* changed warmup runs to be opt-in

[ROCm/rccl commit: 16d7f372b7]
2024-02-15 07:48:45 -07:00
Wenkai Du c4e9e2b18a Use native half without conversion (#1083)
[ROCm/rccl commit: 51003c9980]
2024-02-13 16:57:34 -08:00
Wenkai Du 2f14acf770 Fix undefined symbol when nvtx is not enabled (#1082)
[ROCm/rccl commit: 1f0af90206]
2024-02-13 14:03:43 -08:00
Bertan Dogancay bee47d9e91 Add stack size UT (#1081)
* Add stack size UT

[ROCm/rccl commit: dc2d486ba0]
2024-02-12 17:56:15 -07:00
BertanDogancay de6f20b7ae Fix docs
[ROCm/rccl commit: 32cca51894]
2024-02-11 22:32:55 -08:00
Wenkai Du d5f5091e5d Merge remote-tracking branch 'rccl/develop' into 2.19.4
[ROCm/rccl commit: d999d9ad21]
2024-02-09 11:31:03 -06:00
Wenkai Du 6775a75906 2.18.5 fix (#1077)
* Revert "Revert "2.18.5-1""

This reverts commit 7cc572ecf9.

* Fix initial net device value

[ROCm/rccl commit: 5669b0d7b6]
2024-02-09 09:18:38 -08:00
dependabot[bot] 59eac59cea Bump rocm-docs-core from 0.33.2 to 0.34.0 in /docs/sphinx (#1078)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.2 to 0.34.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.2...v0.34.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: 3e505a991c]
2024-02-09 10:12:07 -07:00
Bertan Dogancay 45ed3ef4e7 Nvtx support (#1076)
* NVTX support

[ROCm/rccl commit: 8a442faa12]
2024-02-08 14:08:24 -07:00
Wenkai Du 1538b908ac msccl: use relaxed atomics on scratch buffer (#1075)
[ROCm/rccl commit: 5257c753c5]
2024-02-08 12:09:56 -08:00
dependabot[bot] ff2be03272 Bump rocm-docs-core from 0.33.1 to 0.33.2 in /docs/sphinx (#1073)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.1 to 0.33.2.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.1...v0.33.2)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: be45f0effd]
2024-02-08 09:26:47 -07:00
Wenkai Du ce39eefe65 Doubling P2P channels per peer on single node gfx94x only (#1074)
[ROCm/rccl commit: 704c9ef0d1]
2024-02-07 14:05:57 -08:00
dependabot[bot] c0745fe0b8 Bump rocm-docs-core from 0.33.0 to 0.33.1 in /docs/sphinx (#1071)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.33.0 to 0.33.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.33.0...v0.33.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: a9214032fc]
2024-02-06 16:00:30 -07:00
dependabot[bot] b6868a1573 Bump cryptography from 41.0.6 to 42.0.0 in /docs/sphinx (#1070)
Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.6 to 42.0.0.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/41.0.6...42.0.0)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: ca007ddad3]
2024-02-06 15:59:52 -07:00
Wenkai Du 57e508f2e4 Doubling P2P channels per peer on single node only (#1069)
[ROCm/rccl commit: 1d989f6524]
2024-02-02 12:41:00 -08:00
Wenkai Du e319d0a49d Merge remote-tracking branch 'rccl/develop' into HEAD
[ROCm/rccl commit: e64324a64a]
2024-02-01 12:17:09 -06:00
Nilesh M Negi f23716de80 Enable kernarg preloading for ROCm 6.1 (#1068)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 2458f158b1]
2024-02-01 12:14:04 -06:00