Commit graph

1971 commits

Author SHA1 Message Date
Bertan Dogancay bed7cdf863 [GEN/BUILD] Refactor generate.py and reduce build time for older archs (#2006) 2025-10-30 11:45:53 -04:00
Nilesh M Negi 8444b3c6e9 Fix gfx950 gating conditions to match ROCm 7.0.2 (#2003) 2025-10-29 23:27:04 -05:00
Mustafa Abduljabbar 12f51ba8bf [Device] Adjust threadblock size for gfx950 to increase LL64/Simple performance for AR, RS and AG (#1978)
* Add initial commit to increase tb size to 512
* Fix LL perf issue when subset of NCCL_MAX_NTHREADS is used
Adding a constant to barrier_generic logic to keep it from using fallback logic when nthreads < NCCL_MAX_NTHREADS and nthreads == blockDim.X
* Adjust nthreads for LL
* Opt threads for reduce_scatter upper small range
* Add macro for single node
* Restrict MSCCL to 256 threads to prevent mem access fault
* Support pre-MI350 compatibility
* Partially refactor threadblock size override
* Use const macros instead of numerals
* opt out of unused function
2025-10-29 23:24:32 -05:00
Bertan Dogancay b703ffdfa4 [Tools/Replayer] Fix prohibited calls during capture mode (#1938) 2025-10-29 12:19:32 -04:00
corey-derochie-amd 561ad2fe05 Updated Changelog with 7.1.1 and 7.2.0 stub sections (#2008)
* Missing ROCm 7.0 & 7.1.0 Changelog entries (#1976)

* Update CHANGELOG.md

* Update CHANGELOG.md

* Apply suggestions from code review

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update CHANGELOG.md

* Update CHANGELOG.md

* Update CHANGELOG.md

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Added ROCm 7.2.0 section.

* Update CHANGELOG.md

* Apply suggestion from @corey-derochie-amd

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-10-28 13:41:22 -06:00
Atul Kulkarni cc867dbaf2 Removed RCCL_EXPOSE_STATIC duplicate definition. (#1988) 2025-10-28 13:01:48 -05:00
Atul Kulkarni 26dc7abb32 Added ROCM_VERSION restriction to alloc unit tests (#1989) 2025-10-28 12:54:34 -05:00
alex-breslow-amd e69b11eba5 Remove nontemporality from stores, put in casts to global address space (#1982)
* Implements casting key loads and stores to address_space(1) so that vector global load and store instructions are emitted by the compiler instead of more costly flat loads and stores
* Removes nontemporality from some key stores for gfx950.
2025-10-28 10:34:48 -07:00
corey-derochie-amd f290e302d3 Updated CODEOWNERS to instead use RCCL-Reviewers team (#2010)
* Updated CODEOWNERS to instead use RCCL-Reviewers team

* Apply suggestion from @nileshnegi

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

---------

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
2025-10-28 09:27:26 -06:00
Kapil S. Pawar 912d53caba Fix segmentation fault related to ext-profiler plugin (#1986) 2025-10-23 09:26:35 -05:00
Joseph Macaranas c2e71e83d1 [External CI] Add references to rocm-systems super repo (#1935)
- In order to trigger downstream jobs to verify projects that consume rccl, references to those repos are required.
2025-10-22 16:07:05 -04:00
Aravind Ravikumar 506c2e9878 Adding reservation time for salloc in CI (#1992)
Co-authored-by: arravikum <arravikum@amd.com>
2025-10-22 10:00:01 -04:00
ehsanhosseinzadehKhaligh aec4f0a659 Updating npkit_trace_generator.py to check npkit directory (#1891)
* create dir regardless of default or user-provided path if it doesn't exist
* Fix npkit_dump_dir on npkit_trace_generator.py

---------

Co-authored-by: BertanDogancay <bertan.dogancay@gmail.com>
2025-10-22 02:51:16 -05:00
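
The "create dir regardless of default or user-provided path" behavior in the commit above can be illustrated with a small sketch. This is not the code from npkit_trace_generator.py; the function name `ensure_dump_dir` and the default directory name are hypothetical and only show the general pattern.

```python
import os

def ensure_dump_dir(path=None, default="npkit_dump"):
    """Hypothetical helper: create the dump directory if it is missing.

    Works the same way whether the caller passed a path or the
    (assumed) default is used.
    """
    target = path or default
    # exist_ok=True makes the call a no-op when the directory already exists
    os.makedirs(target, exist_ok=True)
    return target
```

Calling the helper twice with the same path is harmless, since the second call finds the directory already in place.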
Afzal Patel 724680f87c add roctracer and rocm-core include directories (#1970) 2025-10-21 13:53:57 -04:00
Sourav Chakraborty 57286d5df3 Fix incorrect benchmark name in JitterBench script (#1983) 2025-10-21 12:52:20 -05:00
Sourav Chakraborty 5b345d105c Fix build failure in rccl_prim_test (#1984)
Added missing header in rccl_prim_test
2025-10-21 12:51:14 -05:00
mberenjk b58f234539 Add support for additional paths in RCCL DMABUF kernel configuration loading (#1825)
* Adding more paths to the kernel load and an environment variable to force enable DMABUF

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-10-20 13:35:22 -07:00
Mythreya Kuricheti 9ae5956ca5 [rocprofiler-sdk] Update codeowner for api-trace.h (#1974)
Feedback from #1933
2025-10-20 10:43:42 -06:00
Nilesh M Negi 34d469864b [FORMAT] Add .clang-format for C++ code (#1404) 2025-10-20 10:54:03 -05:00
JC b1589a5786 [CI] Enable ccache w/ namespace for external use (#1966)
* Enable ccache w/ namespace for external use

* Remove TheRock from setup_tools.py command line

* Bump TheRock commit to use health_status.py

Resolves https://github.com/ROCm/rccl/pull/1966/files/c6d2e8ce5c14a2c94bfb47e21d3e2d466f25c9b4#r2420734710

* Bump TheRock to older commit with health_status.py

* Add git safe directory for working directory

* Move install python deps

* Remove pip freeze
2025-10-20 08:44:42 -07:00
Nilesh M Negi c35bc721ad Fix ncclDevFuncId for AllReduceWithBias (#1980) 2025-10-17 09:28:57 -05:00
Arm Patinyasakdikul 58eca5d7f8 Disable graph mode memory registration and UBR as unsupported feature. (#1977) 2025-10-17 09:18:39 -05:00
Arm Patinyasakdikul 9806f5e9dd Fix git version fetching logic. (#1981) 2025-10-17 09:17:49 -05:00
Rahul Vaidya 624f68b2b2 [Profiler plugin] Fix segfault issue with profiler plugin (#1973)
* Fix profiler plugin segfault by correctly setting p2p->func

* Look for librccl-profiler.so instead of libnccl-profiler.so

Signed-off-by: rahulvaidya20 <ravaidya@amd.com>

---------

Signed-off-by: rahulvaidya20 <ravaidya@amd.com>
Co-authored-by: Yongjie Qiu <Yongjie.Qiu@amd.com>
2025-10-16 16:33:18 -05:00
alex-breslow-amd 154350baaf MSCCL: Unland PR1788 + Fix for MSCCL Data Corruption (#1960)
- Earlier fix PR1788 is no longer necessary after ROCr fix and pre-ROCr fix workaround
- Inserts an s_waitcnt vmcnt(0), which fixes a data corruption issue in MSCCL
2025-10-15 10:32:25 -07:00
gilbertlee-amd fedddb452c Enabling gdrcopy option for gfx950 (#1955) 2025-10-15 10:55:25 -06:00
alex-breslow-amd c70f5b4621 [gfx950] Make bypassing __threadfence the default for multinode. (#1947)
* Gate based on ROCM version, safe for ROCm 7.0.2 and beyond.
* Updates naming to gfx9CheapFenceOff since we use this for gfx942 and gfx950.  Thanks Nilesh.
* Add info logging statement to NCCL_INIT to print whether enabled when INFO logging is enabled.
2025-10-15 09:15:36 -07:00
isaki001 0f99fd84a3 gfx950 channel tuning for ReduceScatter and AllGather (#1940)
* add channel thresholds to override channel-count adjustments
2025-10-14 09:50:44 -05:00
mberenjk e738c03e39 fixing the ar_with_bias test issue when running rccl-tests (#1912)
* fixing the AR_With_Bias issue when running rccl-tests
2025-10-13 13:58:21 -07:00
alex-breslow-amd ff209e5b19 Dump compiler-determined GPU kernel resource usage (#1965)
Adds --kernel-resource-use flag to install.sh to allow dumping per-GPU kernel resource use at compile time (e.g., VGPRs, LDS, SGPRs, scratch, etc.)
2025-10-13 11:24:42 -05:00
Geo Min 97f2665da2 fixing group id (#1975) 2025-10-10 16:40:44 -07:00
Mythreya Kuricheti 3000f0e837 [rocprofiler-sdk] Add codeowner for api-trace.h (#1933) 2025-10-10 16:29:17 -05:00
Arm Patinyasakdikul ff75860d73 Fix unroll factor display bug. (#1969) 2025-10-10 15:35:06 -05:00
Surya Periaswamy 5bd5079de1 MSCCL++ fix split path null deref (#1959)
* Add speriaswamy-amd to CODEOWNERS
* MSCCL++: fix split path null deref; key maps by parent ncclUniqueId
* removed no-op
2025-10-09 14:08:38 -05:00
Rahul Vaidya 6b200ee6c5 Fix LL128 proto selection to respect user setting (#1822) 2025-10-09 14:08:03 -05:00
Nusrat Islam d22a39e954 Update direct AG and single node LL threshold (#1944)
* update AG direct and single node LL threshold

* update thresholds based on MI350 experimental results

* disable using LL for direct AG

* enable direct AG for lower GPU counts

* direct AG single node tuning

* fix in-place buffer allocation for AG unit test

* whitespace fix

* gate direct AG for gfx950 and gfx942

---------

Co-authored-by: Nusrat Islam <nusislam@nova-login-gtu2.prov.gtu.zts.cpe.ice.amd.com>
2025-10-09 10:48:50 -05:00
Artem Kuzmitckii 00a42c80f3 Reverse logic of context tracking enablement from #1927 (#1971)
In this commit, context tracking is disabled by default and can be enabled via
`RCCL_ENABLE_CONTEXT_TRACKING=1` for both (CDNA, RDNA)
Original PR https://github.com/ROCm/rccl/pull/1927
2025-10-09 10:24:09 +02:00
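
The opt-in behavior described in the commit above (off by default, on with `RCCL_ENABLE_CONTEXT_TRACKING=1`) can be sketched as a default-off flag. This is an illustration only: the helper name `context_tracking_enabled` and the rule that only the value `1` enables the feature are assumptions, not RCCL's actual implementation, which lives in the library's native code.

```python
import os

def context_tracking_enabled(env=None):
    """Hypothetical helper: True only if RCCL_ENABLE_CONTEXT_TRACKING is "1".

    `env` defaults to the process environment; passing a dict makes
    the check easy to exercise without touching real variables.
    """
    env = os.environ if env is None else env
    # An unset variable falls back to "0", i.e. disabled by default
    return env.get("RCCL_ENABLE_CONTEXT_TRACKING", "0").strip() == "1"
```

With this shape, an unset variable and an explicit `0` both leave the feature disabled, which matches the default stated in the commit message.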
Arm Patinyasakdikul cede6d0134 Revert "Change to use -O0 instead of -O1 in debug build. (#1949)" (#1957)
This reverts commit feee02ca61.
2025-10-08 10:01:45 -05:00
Aravind Ravikumar 1858a31c41 Enable Presubmit CI Gating for develop Branch (TheRock CI for RCCL) (#1954)
* Trigger CI run on pull request

* Enabling CI run on different PR types

---------

Co-authored-by: arravikum <arravikum@amd.com>
2025-10-07 09:11:50 -04:00
corey-derochie-amd b1fbf535da [SYNC] 2.27.7 (#1928)
Merge pull request #1928 from corey-derochie-amd/2.27.7-sync
2025-10-06 16:47:50 -06:00
BertanDogancay 3f94267f21 Merge remote-tracking branch 'nccl/master' into develop 2025-10-06 18:36:49 -04:00
Arm Patinyasakdikul feee02ca61 Change to use -O0 instead of -O1 in debug build. (#1949)
* Change to use -O0 instead of -O1 in debug build.

* Use -O1 for device code to avoid linking issue in debug build.
2025-10-03 16:05:01 -05:00
Nilesh M Negi 342ec086e3 Revert "changes for hugepages backed host buffer for larger allocations (#1841)" (#1951)
This reverts commit 65b69bf318.
2025-10-02 23:43:09 -05:00
amd-jiali 5978d2f9ab Print out the hipRuntimeVersion message from WARN to always show up (#1911)
Authored-by: Jiali Li <jialili@amd.com>
2025-10-02 11:32:32 -05:00
dependabot[bot] 42ce371e3d Bump rocm-docs-core from 1.22.0 to 1.26.0 in /docs/sphinx (#1952)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.22.0 to 1.26.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.26.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.22.0...v1.26.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.26.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-02 11:33:14 -04:00
Istvan Kiss 3776129011 Add reference to supported data types section (#1893) 2025-10-01 12:36:14 +02:00
David DeBonis d23d18f423 Adding usage tip for ignore cpu affinity (#1948)
* Adding usage tip for ignore cpu affinity

* Update docs/how-to/rccl-usage-tips.rst

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update docs/how-to/rccl-usage-tips.rst

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-09-29 10:11:21 -06:00
Bhuvan Mital 65b69bf318 changes for hugepages backed host buffer for larger allocations (#1841) 2025-09-28 00:40:22 -05:00
Artem Kuzmitckii 07925ec027 Revert disabling of context tracking for Radeon (#1927)
* Revert disabling of context tracking for Radeon

Original commit 6fc228e2
 `Disable context tracking for the current version. (#1839)`

* Add env variable for disabling of context tracking for Radeon

`export NCCL_DISABLE_CONTEXT_TRACKING=1` to force disable of context tracking

* Update docs/how-to/rccl-usage-tips.rst

Fix grammar, thanks @amd-jnovotny

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Rename NCCL_DISABLE_CONTEXT_TRACKING -> RCCL_DISABLE_CONTEXT_TRACKING

* Revert changes in includes and rename util function

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-09-27 15:19:50 -04:00
alex-breslow-amd 45166f6586 Gate code by rocm_version (#1945) 2025-09-26 13:28:41 -07:00