Commit graph

1383 commits

Author SHA1 Message Date
Pedram Alizadeh 73acf3eeec modifying the tuning table to improve the performance of broadcast for 1MB to 64MB for single-node MI300X (#1172) 2024-05-08 15:49:33 -04:00
mberenjk 408278209d Adding ASAN changes to address memory leak issue (#1170)
Co-authored-by: akolliasAMD <akollias@amd.com>
2024-05-08 09:16:00 -05:00
Wenkai Du b18784d8b8 Add compiler warning for uninitialized variable and fix (#1163)
* Add compiler warning for uninitialized variable and fix

* Add -Wsometimes-uninitialized

* Convert warning to error
2024-05-08 07:00:25 -07:00
Wenkai Du f679db6ff6 Use normal permute path when one NIC per GPU (#1171) 2024-05-08 06:59:57 -07:00
Wenkai Du a0cef69110 npkit: add broadcast trace (#1166) 2024-05-07 14:00:16 -07:00
Pak Nin Lui 92a4fc6204 Merge pull request #1167 from paklui/dmabuf
fix typo for DMABUF_ENABLE
2024-05-07 08:48:44 -07:00
dependabot[bot] eb562e7b22 Bump jinja2 from 3.1.3 to 3.1.4 in /docs/sphinx (#1168)
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.3 to 3.1.4.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.3...3.1.4)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-06 15:35:34 -06:00
paklui 140b7dd40f fix typo for DMABUF_ENABLE 2024-05-06 13:27:50 -07:00
Wenkai Du b513c3970a Bypass NVIDIA Ampere related tuning (#1165) 2024-05-03 17:57:16 -07:00
Wenkai Du bb58b1c258 Fix ignore NUMA not being observed for NICs during model matching (#1164) 2024-05-03 16:42:07 -07:00
Wenkai Du 6f5a8ce1fb Fix build error when roctracer-dev package is not installed (#1161) 2024-05-01 13:55:09 -07:00
Wenkai Du 4e1b8c1cbb MSCCL: add support for out-of-place all reduce (#1156) 2024-04-28 19:49:09 -07:00
Wenkai Du cd6e840e0b Add back tree simple chunk size tuning (#1157) 2024-04-28 19:48:53 -07:00
Nilesh M Negi b90436d292 [GRAPH] Reduce NCCL_TOPO_MAX_NODES to 64 (#1153)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-04-27 23:41:11 -05:00
Tim cc39e91c6f Merge pull request #1158 from AtlantaPepsi/NPKit_fix
Prevent segfault from npkit-enabled rccl build
2024-04-26 12:44:04 -04:00
AtlantaPepsi 67246649ac prevent segfault from npkit-enabled rccl build
Signed-off-by: AtlantaPepsi <timhu102@amd.com>
2024-04-26 10:54:27 -05:00
Wenkai Du f330b82985 Revert "Use relaxed atomics for LL on GFX11 (#859)" (#1148)
This reverts commit 6a0a6a37d9.

Use inline asm for 128b load on GFX11 for better performance.
2024-04-26 07:49:55 -07:00
Bertan Dogancay 0ec41f1386 [UT] Start supporting multiple group calls and graphs (#1151)
* Start supporting multiple group calls UT
2024-04-25 11:11:16 -06:00
Shilei Tian efe99057b0 SWDEV-455705: Fix an UB that could lead to miscompilation (#1155) 2024-04-25 10:10:01 -07:00
Wenkai Du 9e0c9b4ed8 Replace __HIP_PLATFORM_HCC__ with __HIP_PLATFORM_AMD__ (#1154) 2024-04-25 07:19:18 -07:00
Bertan Dogancay dcc75797a1 Update CHANGELOG.md for RCCL 2.20.5 (#1150) 2024-04-24 09:07:49 -06:00
Bertan Dogancay 8753bec3ea Merge pull request #1111 from BertanDogancay/2.20
2.20.5 Sync
2024-04-24 09:05:41 -06:00
BertanDogancay e1a835910e Merge remote-tracking branch 'nccl/master' into develop 2024-04-23 13:34:00 -07:00
Wenkai Du 220066197a Use hipExtMallocWithFlags to allocate host memory on APU (#1149)
Also use SM60 as CUDA compatibility level.
2024-04-17 16:56:38 -07:00
corey-derochie-amd a14137c062 Updated CHANGELOG for next release (#1146)
* Updated CHANGELOG to release for ROCm 6.1.0 (#1142)

* Fixed missing CHANGELOG notes from ROCm 5.5 through unreleased 6.1 (#1141)

* Update CHANGELOG.md for ROCm release 5.5

(cherry picked from commit 975327be45f2313dc7249f9c54ad90870e833a4a)

* Update CHANGELOG.md for ROCm 5.7.0

(cherry picked from commit ac8db8d8e0853f1783c10e2858f6c3b86e4d27cb)

* Added ROCm 6.0 and 6.1 CHANGELOG notes.

---------

Co-authored-by: gilbertlee-amd <44450918+gilbertlee-amd@users.noreply.github.com>
(cherry picked from commit 3361abe786)

* Updated CHANGELOG to release for ROCm 6.1.0

* Removed empty sections from CHANGELOG in latest releases.

(cherry picked from commit 164c9553717f2c3bce86a372764ea73030dd5f72)

* Reverted ROCm 6.1.0 block to "Unreleased"
2024-04-15 16:29:40 -06:00
corey-derochie-amd 8f471ba537 Created PR template for the rccl repo (#1118) 2024-04-15 15:34:42 -06:00
gilbertlee-amd 4cb62f999a Rail optimization for rings (#1140)
- Modifies the ring creation algorithm to be friendlier to rail-optimized topologies (should not affect classic fabric topologies)
2024-04-15 12:03:57 -06:00
Bertan Dogancay 3caad91f32 Add unique files to source list (#1144) 2024-04-15 09:46:53 -06:00
dependabot[bot] c50eaddc28 Bump idna from 3.4 to 3.7 in /docs/sphinx (#1143)
Bumps [idna](https://github.com/kjd/idna) from 3.4 to 3.7.
- [Release notes](https://github.com/kjd/idna/releases)
- [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst)
- [Commits](https://github.com/kjd/idna/compare/v3.4...v3.7)

---
updated-dependencies:
- dependency-name: idna
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-12 09:28:39 -06:00
corey-derochie-amd 3361abe786 Fixed missing CHANGELOG notes from ROCm 5.5 through unreleased 6.1 (#1141)
* Update CHANGELOG.md for ROCm release 5.5

(cherry picked from commit 975327be45f2313dc7249f9c54ad90870e833a4a)

* Update CHANGELOG.md for ROCm 5.7.0

(cherry picked from commit ac8db8d8e0853f1783c10e2858f6c3b86e4d27cb)

* Added ROCm 6.0 and 6.1 CHANGELOG notes.

---------

Co-authored-by: gilbertlee-amd <44450918+gilbertlee-amd@users.noreply.github.com>
2024-04-11 15:04:40 -06:00
mberenjk 428837ffe4 replacing rccl_bfloat16 with hip_bfloat16 (#1126)
Co-authored-by: mberenjk <mberenjk@amd.com>
2024-04-11 11:30:37 -05:00
dependabot[bot] d3899c0581 Bump rocm-docs-core from 0.38.0 to 0.38.1 in /docs/sphinx (#1139)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.38.0 to 0.38.1.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.38.0...v0.38.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-11 09:32:54 -06:00
arvindcheru c1b8eab8e1 Update Depends with correct HIP Runtime package name (#1130) 2024-04-09 19:27:07 -04:00
Wenkai Du 0ce68f21d4 NPKit: doubling size of event buffers following MAXCHANNELS change (#1135) 2024-04-09 08:02:58 -07:00
Wenkai Du 137571fa01 Fix buffer overflow when parsing kernel cmdline (#1133) 2024-04-08 11:12:20 -07:00
gilbertlee-amd 93982533d7 [topo_expl] Adding -n option to override number of nodes (#1134) 2024-04-04 15:11:47 -06:00
Wenkai Du e8c76fd806 rccl_prim_test: increase max number of workgroups and test iterations (#1132) 2024-04-03 11:29:21 -07:00
dependabot[bot] d0d1bfdeda Bump rocm-docs-core from 0.37.0 to 0.38.0 in /docs/sphinx (#1127)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.37.0 to 0.38.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.37.0...v0.38.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-27 11:24:30 -06:00
arvindcheru c0a51dc84b Static Build update - Moved all cmake install() to rocm-cmake APIs, static build update (#1123) 2024-03-26 11:11:09 -04:00
corey-derochie-amd 503a472a25 Replaced ROCmSoftwarePlatform and RadeonOpenCompute links with ROCm links. (#1125) 2024-03-25 16:29:13 -06:00
corey-derochie-amd 9eefc68cb5 Fixes the copyright comment block on each of topo_expl/models/*.xml. The format was not valid XML. (#1124) 2024-03-25 16:21:17 -06:00
Wenkai Du 5976f757dd Remove hipEventDisableSystemFence (#1122)
There is no indication that disabling system fence has any latency improvement.
Removing it per recommendation from HIP.
2024-03-25 08:01:57 -07:00
Pedram Alizadeh c2fc1d6809 msccl algorithms tuning for alltoall on MI300 (#1120)
Co-authored-by: PedramAlizadeh <amd@pmohamma.com>
2024-03-21 20:35:29 -04:00
corey-derochie-amd 606d3e6b6e Added @corey-derochie-amd as a code owner (to rocm-documentation) (#1119) 2024-03-21 14:56:05 -06:00
dependabot[bot] cb80586fb9 Bump rocm-docs-core from 0.36.0 to 0.37.0 in /docs/sphinx (#1117)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.36.0 to 0.37.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.36.0...v0.37.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-20 09:25:14 -06:00
jbachan 6dd51f15bf Merge pull request #1217 from crazy-JiangDongHua/bugfix_undo_plan
Bug in plan enqueue logic where plans could be silently not launched for some communicators. Triggered when all of the following are true:
1. Multiple communicators per ncclGroup.
2. Communicators within a group have different plan counts.
3. Intra-process launch barrier disabled.
2024-03-18 10:12:26 -07:00
Nilesh M Negi 53fad75001 BUILD: Enable RCCL static build (#1114)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-03-15 12:18:18 -05:00
srawat 45ee5734dd refactor RCCL (#1112)
* refactor RCCL

* rccl updates

* Update index.rst

* refactor

* Update what-is-rccl.rst
2024-03-15 14:14:47 +05:30
Pedram Alizadeh 50f22e8317 msccl algorithms tuning for allgather on MI300 (#1110) 2024-03-14 12:18:26 -04:00
dependabot[bot] 0867562b18 Bump rocm-docs-core from 0.35.1 to 0.36.0 in /docs/sphinx (#1109)
Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.35.1 to 0.36.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.35.1...v0.36.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-03-12 09:38:20 -06:00