Commit graph

1967 Commits

Author SHA1 Message Date
corey-derochie-amd c5cdee4fa5 Updated Changelog with 7.1.1 and 7.2.0 stub sections (#2008)
* Missing ROCm 7.0 & 7.1.0 Changelog entries (#1976)

* Update CHANGELOG.md

* Update CHANGELOG.md

* Apply suggestions from code review

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Added ROCm 7.2.0 section.

* Update CHANGELOG.md

* Apply suggestion from @corey-derochie-amd

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 561ad2fe05]
2025-10-28 13:41:22 -06:00
Atul Kulkarni f2287e8f97 Removed RCCL_EXPOSE_STATIC duplicate definition. (#1988)
[ROCm/rccl commit: cc867dbaf2]
2025-10-28 13:01:48 -05:00
Atul Kulkarni 884138205d Added ROCM_VERSION restriction to alloc unit tests (#1989)
[ROCm/rccl commit: 26dc7abb32]
2025-10-28 12:54:34 -05:00
alex-breslow-amd f7405b8739 Remove nontemporality from stores, put in casts to global address space (#1982)
* Implements casting key loads and stores to address_space(1) so that vector global load and store instructions are emitted by the compiler instead of more costly flat loads and stores
* Removes nontemporality from some key stores for gfx950.

[ROCm/rccl commit: e69b11eba5]
2025-10-28 10:34:48 -07:00
corey-derochie-amd 44160d34a4 Updated CODEOWNERS to instead use RCCL-Reviewers team (#2010)
* Updated CODEOWNERS to instead use RCCL-Reviewers team

* Apply suggestion from @nileshnegi

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

---------

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: f290e302d3]
2025-10-28 09:27:26 -06:00
Kapil S. Pawar be249ae356 Fix segmentation fault related to ext-profiler plugin (#1986)
[ROCm/rccl commit: 912d53caba]
2025-10-23 09:26:35 -05:00
Joseph Macaranas 4ba8c94aab [External CI] Add references to rocm-systems super repo (#1935)
- In order to trigger downstream jobs to verify projects that consume rccl, references to those repos are required.

[ROCm/rccl commit: c2e71e83d1]
2025-10-22 16:07:05 -04:00
Aravind Ravikumar a7a1647926 Adding reservation time for salloc in CI (#1992)
Co-authored-by: arravikum <arravikum@amd.com>

[ROCm/rccl commit: 506c2e9878]
2025-10-22 10:00:01 -04:00
ehsanhosseinzadehKhaligh f5b45c549d Updating npkit_trace_generator.py to check npkit directory (#1891)
* create dir regardless of default or user-provided path if it doesn't exist
* Fix npkit_dump_dir on npkit_trace_generator.py

---------

Co-authored-by: BertanDogancay <bertan.dogancay@gmail.com>

[ROCm/rccl commit: aec4f0a659]
2025-10-22 02:51:16 -05:00
Afzal Patel 1efa14da5b add roctracer and rocm-core include directories (#1970)
[ROCm/rccl commit: 724680f87c]
2025-10-21 13:53:57 -04:00
Sourav Chakraborty a3a5631f53 Fix incorrect benchmark name in JitterBench script (#1983)
[ROCm/rccl commit: 57286d5df3]
2025-10-21 12:52:20 -05:00
Sourav Chakraborty 046af13751 Fix build failure in rccl_prim_test (#1984)
Added missing header in rccl_prim_test

[ROCm/rccl commit: 5b345d105c]
2025-10-21 12:51:14 -05:00
mberenjk 96c62b091d Add support for additional paths in RCCL DMABUF kernel configuration loading (#1825)
* Adding more path to the kernel load and an environment variable to force enable DMABUF

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: b58f234539]
2025-10-20 13:35:22 -07:00
Mythreya Kuricheti ef1ed44e93 [rocprofiler-sdk] Update codeowner for api-trace.h (#1974)
Feedback from #1933

[ROCm/rccl commit: 9ae5956ca5]
2025-10-20 10:43:42 -06:00
Nilesh M Negi ab1bd9d87f [FORMAT] Add .clang-format for C++ code (#1404)
[ROCm/rccl commit: 34d469864b]
2025-10-20 10:54:03 -05:00
JC 08d93e763e [CI] Enable ccache w/ namespace for external use (#1966)
* Enable ccache w/ namespace for external use

* Remove TheRock from setup_tools.py command line

* Bump TheRock commit to use health_status.py

Resolves https://github.com/ROCm/rccl/pull/1966/files/f9d6d76440b88ecf67d08765ee0e9bac00b55b40#r2420734710

* Bump TheRock to older commit with health_status.py

* Add git safe directory for working directory

* Move install python deps

* Remove pip freeze

[ROCm/rccl commit: b1589a5786]
2025-10-20 08:44:42 -07:00
Nilesh M Negi 0aa56fb0a5 Fix ncclDevFuncId for AllReduceWithBias (#1980)
[ROCm/rccl commit: c35bc721ad]
2025-10-17 09:28:57 -05:00
Arm Patinyasakdikul fca120343f Disable graph mode memory registration and UBR as unsupported feature. (#1977)
[ROCm/rccl commit: 58eca5d7f8]
2025-10-17 09:18:39 -05:00
Arm Patinyasakdikul b14fec8dbc Fix git version fetching logic. (#1981)
[ROCm/rccl commit: 9806f5e9dd]
2025-10-17 09:17:49 -05:00
Rahul Vaidya 307f787244 [Profiler plugin] Fix segfault issue with profiler plugin (#1973)
* Fix profiler plugin segfault by correctly setting p2p->func

* Look for librccl-profiler.so instead of libnccl-profiler.so

Signed-off-by: rahulvaidya20 <ravaidya@amd.com>

---------

Signed-off-by: rahulvaidya20 <ravaidya@amd.com>
Co-authored-by: Yongjie Qiu <Yongjie.Qiu@amd.com>

[ROCm/rccl commit: 624f68b2b2]
2025-10-16 16:33:18 -05:00
alex-breslow-amd a5256e6219 MSCCL: Unland PR1788 + Fix for MSCCL Data Corruption (#1960)
- Earlier fix PR1788 is no longer necessary after ROCr fix and pre-ROCr fix workaround
- Inserts an s_waitcnt vmcnt(0), which fixes a data corruption issue in MSCCL

[ROCm/rccl commit: 154350baaf]
2025-10-15 10:32:25 -07:00
gilbertlee-amd bb85692891 Enabling gdrcopy option for gfx950 (#1955)
[ROCm/rccl commit: fedddb452c]
2025-10-15 10:55:25 -06:00
alex-breslow-amd 455d516dc4 [gfx950] Make bypassing __threadfence the default for multinode. (#1947)
* Gate based on ROCM version, safe for ROCm 7.0.2 and beyond.
* Updates naming to gfx9CheapFenceOff since we use this for gfx942 and gfx950.  Thanks Nilesh.
* Add info logging statement to NCCL_INIT to print whether enabled when INFO logging is enabled.

[ROCm/rccl commit: c70f5b4621]
2025-10-15 09:15:36 -07:00
isaki001 6d151d4e21 gfx950 channel tuning for ReduceScatter and AllGather (#1940)
* add channel thresholds to override channel-count adjustments

[ROCm/rccl commit: 0f99fd84a3]
2025-10-14 09:50:44 -05:00
mberenjk 433251272b fixing the ar_with_bias test issue when running rccl-tests (#1912)
* fixing the AR_With_Bias issue when running rccl-tests

[ROCm/rccl commit: e738c03e39]
2025-10-13 13:58:21 -07:00
alex-breslow-amd d51ed2fdfd Dump compiler-determined GPU kernel resource usage (#1965)
Adds --kernel-resource-use flag to install.sh to allow dumping per-GPU kernel resource use at compile time (e.g., VGPRs, LDS, SGPRs, scratch, etc.)


[ROCm/rccl commit: ff209e5b19]
2025-10-13 11:24:42 -05:00
Geo Min 3ead4ca4a1 fixing group id (#1975)
[ROCm/rccl commit: 97f2665da2]
2025-10-10 16:40:44 -07:00
Mythreya Kuricheti 24a62a2ab3 [rocprofiler-sdk] Add codeowner for api-trace.h (#1933)
[ROCm/rccl commit: 3000f0e837]
2025-10-10 16:29:17 -05:00
Arm Patinyasakdikul 0407f294e9 Fix unroll factor display bug. (#1969)
[ROCm/rccl commit: ff75860d73]
2025-10-10 15:35:06 -05:00
Surya Periaswamy 014fae1b51 MSCCL++ fix split path null deref (#1959)
* Add speriaswamy-amd to CODEOWNERS
* MSCCL++: fix split path null deref; key maps by parent ncclUniqueId
* removed no-op

[ROCm/rccl commit: 5bd5079de1]
2025-10-09 14:08:38 -05:00
Rahul Vaidya 8e5016ebfd Fix LL128 proto selection to respect user setting (#1822)
[ROCm/rccl commit: 6b200ee6c5]
2025-10-09 14:08:03 -05:00
Nusrat Islam d6d5fac152 Update direct AG and single node LL threshold (#1944)
* update AG direct and single node LL threshold

* update thresholds based on MI350 experimental results

* disable using LL for direct AG

* enable direct AG for lower GPU counts

* direct AG single node tuning

* fix in-place buffer allocation for AG unit test

* whitespace fix

* gate direct AG for gfx950 and gfx942

---------

Co-authored-by: Nusrat Islam <nusislam@nova-login-gtu2.prov.gtu.zts.cpe.ice.amd.com>

[ROCm/rccl commit: d22a39e954]
2025-10-09 10:48:50 -05:00
Artem Kuzmitckii 0c7e116b31 Reverse logic of context tracking enablement from #1927 (#1971)
In this commit, context tracking is disabled by default and can be enabled via
`RCCL_ENABLE_CONTEXT_TRACKING=1` on both CDNA and RDNA.
Original PR https://github.com/ROCm/rccl/pull/1927

[ROCm/rccl commit: 00a42c80f3]
2025-10-09 10:24:09 +02:00
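
As a rough illustration of the opt-in described in entry 0c7e116b31, the Python sketch below builds a process environment with `RCCL_ENABLE_CONTEXT_TRACKING=1` set. The helper name `env_with_context_tracking` is hypothetical and not part of RCCL. Only the variable name and the value `1` come from the commit message. Treating an unset variable as "disabled" is an assumption based on the stated default.

```python
import os


def env_with_context_tracking(base_env=None, enabled=True):
    """Return a copy of an environment with context tracking toggled.

    Hypothetical helper for illustration only. The commit message states that
    context tracking is off by default and is enabled with
    RCCL_ENABLE_CONTEXT_TRACKING=1; removing the variable is assumed here to
    fall back to that default.
    """
    # Copy so the caller's mapping (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    if enabled:
        env["RCCL_ENABLE_CONTEXT_TRACKING"] = "1"
    else:
        env.pop("RCCL_ENABLE_CONTEXT_TRACKING", None)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching an RCCL-based program, which keeps the setting scoped to that one process instead of the whole shell session.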
Arm Patinyasakdikul edd1c72741 Revert "Change to use -O0 instead of -O1 in debug build. (#1949)" (#1957)
This reverts commit 5f16e69d8e.

[ROCm/rccl commit: cede6d0134]
2025-10-08 10:01:45 -05:00
Aravind Ravikumar 45abdcfe62 Enable Presubmit CI Gating for develop Branch (TheRock CI for RCCL) (#1954)
* Trigger CI run on pull request

* Enabling CI run on different PR types

---------

Co-authored-by: arravikum <arravikum@amd.com>

[ROCm/rccl commit: 1858a31c41]
2025-10-07 09:11:50 -04:00
corey-derochie-amd fc8ec5ea9c [SYNC] 2.27.7 (#1928)
Merge pull request #1928 from corey-derochie-amd/2.27.7-sync

[ROCm/rccl commit: b1fbf535da]
2025-10-06 16:47:50 -06:00
BertanDogancay 2a4e4308b0 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 3f94267f21]
2025-10-06 18:36:49 -04:00
Arm Patinyasakdikul 5f16e69d8e Change to use -O0 instead of -O1 in debug build. (#1949)
* Change to use -O0 instead of -O1 in debug build.

* Use -O1 for device code to avoid linking issue in debug build.

[ROCm/rccl commit: feee02ca61]
2025-10-03 16:05:01 -05:00
Nilesh M Negi 6ade586fdc Revert "changes for hugepages backed host buffer for larger allocations (#1841)" (#1951)
This reverts commit 3169352cad.

[ROCm/rccl commit: 342ec086e3]
2025-10-02 23:43:09 -05:00
amd-jiali 917973d9e9 Print out the hipRuntimeVersion message from WARN to always show up (#1911)
Authored-by: Jiali Li <jialili@amd.com>


[ROCm/rccl commit: 5978d2f9ab]
2025-10-02 11:32:32 -05:00
dependabot[bot] dfd4f19978 Bump rocm-docs-core from 1.22.0 to 1.26.0 in /docs/sphinx (#1952)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.22.0 to 1.26.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.26.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.22.0...v1.26.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.26.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: 42ce371e3d]
2025-10-02 11:33:14 -04:00
Istvan Kiss 118dc600ca Add reference to supported data types section (#1893)
[ROCm/rccl commit: 3776129011]
2025-10-01 12:36:14 +02:00
David DeBonis 32b3a82956 Adding usage tip for ignore cpu affinity (#1948)
* Adding usage tip for ignore cpu affinity

* Update docs/how-to/rccl-usage-tips.rst

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update docs/how-to/rccl-usage-tips.rst

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: d23d18f423]
2025-09-29 10:11:21 -06:00
Bhuvan Mital 3169352cad changes for hugepages backed host buffer for larger allocations (#1841)
[ROCm/rccl commit: 65b69bf318]
2025-09-28 00:40:22 -05:00
Artem Kuzmitckii 722b0cd579 Revert disabling of context tracking for Radeon (#1927)
* Revert disabling of context tracking for Radeon

Original commit df3b7e47
 `Disable context tracking for the current version. (#1839)`

* Add env variable for disabling of context tracking for Radeon

`export NCCL_DISABLE_CONTEXT_TRACKING=1` to force disable of context tracking

* Update docs/how-to/rccl-usage-tips.rst

Fix grammar, thanks @amd-jnovotny

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Rename NCCL_DISABLE_CONTEXT_TRACKING -> RCCL_DISABLE_CONTEXT_TRACKING

* Revert changes in includes and rename util function

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 07925ec027]
2025-09-27 15:19:50 -04:00
alex-breslow-amd 6423f5b024 Gate code by rocm_version (#1945)
[ROCm/rccl commit: 45166f6586]
2025-09-26 13:28:41 -07:00
Mustafa Abduljabbar 22a24cc61a Fix extra token typo (#1943)
[ROCm/rccl commit: 0dd2b2f65e]
2025-09-26 11:18:43 -04:00
Mustafa Abduljabbar d646b0e49b Expose symbols for RCCL algo/proto/channels selection functions (#1923)
* Unhide symbols for algo/proto functions

* Add all_gather direct usage detection

[ROCm/rccl commit: 7a329bbd94]
2025-09-25 18:58:30 -04:00
Larry Meadows a8bf65a298 - LL Protocol: Add missing fences for gfx950, this fixes the hang issue (#1932)
- Remove asm flat_store_dwordx4, not needed

[ROCm/rccl commit: cb14fccdcc]
2025-09-25 14:07:07 -07:00
Sai Enduri 15628819e2 Enable multi node rccl tests on MI350x slurm cluster. (#1900)
* Add tests on slurm cluster

* Integrate slurm.

* Add flags.

* Added dynamic selection of runners for tests and cleanup for slurm reservation

* Revert "Added dynamic selection of runners for tests and cleanup for slurm reservation"

This reverts commit fdd5a6cc968c764d3d1039f0897fb11f11422928.

* Refactor so tests run on both architectures.

* continue on error

* fail fast false on matrix

* remove scancel

* skip all single node tests

* fix pattern matching for pytest

* switch to always skip github job

* Update to latest allocation.

* Clean up workflows and update docker image.

* Updated container image published from PR #1517

* Switch back to TheRock main branch sha.

---------

Co-authored-by: arravikum <arravikum@amd.com>

[ROCm/rccl commit: 01d16d4139]
2025-09-23 22:00:26 -07:00