alex-breslow-amd
ff209e5b19
Dump compiler-determined GPU kernel resource usage ( #1965 )
...
Adds --kernel-resource-use flag to install.sh to allow dumping per-GPU kernel resource use at compile time (e.g., VGPRs, LDS, SGPRs, scratch, etc.)
2025-10-13 11:24:42 -05:00
Geo Min
97f2665da2
fixing group id ( #1975 )
2025-10-10 16:40:44 -07:00
Mythreya Kuricheti
3000f0e837
[rocprofiler-sdk] Add codeowner for api-trace.h ( #1933 )
2025-10-10 16:29:17 -05:00
Arm Patinyasakdikul
ff75860d73
Fix unroll factor display bug. ( #1969 )
2025-10-10 15:35:06 -05:00
Surya Periaswamy
5bd5079de1
MSCCL++ fix split path null deref ( #1959 )
...
* Add speriaswamy-amd to CODEOWNERS
* MSCCL++: fix split path null deref; key maps by parent ncclUniqueId
* removed no-op
2025-10-09 14:08:38 -05:00
Rahul Vaidya
6b200ee6c5
Fix LL128 proto selection to respect user setting ( #1822 )
2025-10-09 14:08:03 -05:00
Nusrat Islam
d22a39e954
Update direct AG and single node LL threshold ( #1944 )
...
* update AG direct and single node LL threshold
* update thresholds based on MI350 experimental results
* disable using LL for direct AG
* enable direct AG for lower GPU counts
* direct AG single node tuning
* fix in-place buffer allocation for AG unit test
* whitespace fix
* gate direct AG for gfx950 and gfx942
---------
Co-authored-by: Nusrat Islam <nusislam@nova-login-gtu2.prov.gtu.zts.cpe.ice.amd.com >
2025-10-09 10:48:50 -05:00
Artem Kuzmitckii
00a42c80f3
Reverse logic of context tracking enablement from #1927 ( #1971 )
...
In this commit it is disabled by default and can be enabled via
`RCCL_ENABLE_CONTEXT_TRACKING=1` for both (CDNA, RDNA)
Original PR https://github.com/ROCm/rccl/pull/1927
2025-10-09 10:24:09 +02:00
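The opt-in behaviour described in #1971 (off unless `RCCL_ENABLE_CONTEXT_TRACKING=1` is set) can be sketched as below. This is a minimal illustration only: the helper name `envFlagEnabled` is hypothetical, and treating only the exact string `1` as "enabled" is an assumption, not a description of how RCCL parses the variable.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not part of RCCL): returns true only when the
// environment variable `name` is set to exactly "1". An unset variable or
// any other value leaves the feature off, which models "disabled by default".
bool envFlagEnabled(const char* name) {
  const char* value = std::getenv(name);
  return value != nullptr && std::strcmp(value, "1") == 0;
}
```

A caller would gate the feature with something like `if (envFlagEnabled("RCCL_ENABLE_CONTEXT_TRACKING")) { ... }`, so that nothing changes for users who do not set the variable.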
Arm Patinyasakdikul
cede6d0134
Revert "Change to use -O0 instead of -O1 in debug build. ( #1949 )" ( #1957 )
...
This reverts commit feee02ca61 .
2025-10-08 10:01:45 -05:00
Aravind Ravikumar
1858a31c41
Enable Presubmit CI Gating for develop Branch (TheRock CI for RCCL) ( #1954 )
...
* Trigger CI run on pull request
* Enabling CI run on different PR types
---------
Co-authored-by: arravikum <arravikum@amd.com >
2025-10-07 09:11:50 -04:00
corey-derochie-amd
b1fbf535da
[SYNC] 2.27.7 ( #1928 )
...
Merge pull request #1928 from corey-derochie-amd/2.27.7-sync
2025-10-06 16:47:50 -06:00
BertanDogancay
3f94267f21
Merge remote-tracking branch 'nccl/master' into develop
2025-10-06 18:36:49 -04:00
Arm Patinyasakdikul
feee02ca61
Change to use -O0 instead of -O1 in debug build. ( #1949 )
...
* Change to use -O0 instead of -O1 in debug build.
* Use -O1 for device code to avoid linking issue in debug build.
2025-10-03 16:05:01 -05:00
Nilesh M Negi
342ec086e3
Revert "changes for hugepages backed host buffer for larger allocations ( #1841 )" ( #1951 )
...
This reverts commit 65b69bf318 .
2025-10-02 23:43:09 -05:00
amd-jiali
5978d2f9ab
Change the hipRuntimeVersion message from WARN to always show up ( #1911 )
...
Authored-by: Jiali Li <jialili@amd.com >
2025-10-02 11:32:32 -05:00
dependabot[bot]
42ce371e3d
Bump rocm-docs-core from 1.22.0 to 1.26.0 in /docs/sphinx ( #1952 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core ) from 1.22.0 to 1.26.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.26.0/CHANGELOG.md )
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.22.0...v1.26.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-version: 1.26.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-02 11:33:14 -04:00
Istvan Kiss
3776129011
Add reference to supported data types section ( #1893 )
2025-10-01 12:36:14 +02:00
David DeBonis
d23d18f423
Adding usage tip for ignore cpu affinity ( #1948 )
...
* Adding usage tip for ignore cpu affinity
* Update docs/how-to/rccl-usage-tips.rst
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
* Update docs/how-to/rccl-usage-tips.rst
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
---------
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
2025-09-29 10:11:21 -06:00
Bhuvan Mital
65b69bf318
changes for hugepages backed host buffer for larger allocations ( #1841 )
2025-09-28 00:40:22 -05:00
Artem Kuzmitckii
07925ec027
Revert disabling of context tracking for Radeon ( #1927 )
...
* Revert disabling of context tracking for Radeon
Original commit 6fc228e2
`Disable context tracking for the current version. (#1839 )`
* Add env variable for disabling of context tracking for Radeon
`export NCCL_DISABLE_CONTEXT_TRACKING=1` to force-disable context tracking
* Update docs/how-to/rccl-usage-tips.rst
Fix grammar, thanks @amd-jnovotny
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
* Rename NCCL_DISABLE_CONTEXT_TRACKING -> RCCL_DISABLE_CONTEXT_TRACKING
* Revert changes in includes and rename util function
---------
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
2025-09-27 15:19:50 -04:00
alex-breslow-amd
45166f6586
Gate code by rocm_version ( #1945 )
2025-09-26 13:28:41 -07:00
Mustafa Abduljabbar
0dd2b2f65e
Fix extra token typo ( #1943 )
2025-09-26 11:18:43 -04:00
Mustafa Abduljabbar
7a329bbd94
Expose symbols for RCCL algo/proto/channels selection functions ( #1923 )
...
* Unhide symbols for algo/proto functions
* Add all_gather direct usage detection
2025-09-25 18:58:30 -04:00
Larry Meadows
cb14fccdcc
- LL Protocol: Add missing fences for gfx950, this fixes the hang issue ( #1932 )
...
- Remove asm flat_store_dwordx4, not needed
2025-09-25 14:07:07 -07:00
Sai Enduri
01d16d4139
Enable multi node rccl tests on MI350x slurm cluster. ( #1900 )
...
* Add tests on slurm cluster
* Integrate slurm.
* Add flags.
* Added dynamic selection of runners for tests and cleanup for slurm reservation
* Revert "Added dynamic selection of runners for tests and cleanup for slurm reservation"
This reverts commit d5350ff6e4f563ddd56ad81e4bc2a393ed55ba00.
* Refactor so tests run on both architectures.
* continue on error
* fail fast false on matrix
* remove scancel
* skip all single node tests
* fix pattern matching for pytest
* switch to always skip github job
* Update to latest allocation.
* Clean up workflows and update docker image.
* Updated container image published from PR #1517
* Switch back to TheRock main branch sha.
---------
Co-authored-by: arravikum <arravikum@amd.com >
2025-09-23 22:00:26 -07:00
corey-derochie-amd
d86cf78810
Moved new functions to the bottom of the function table to maintain backward compatibility ( #1931 )
...
* Moved new functions to the bottom of the function table to maintain backward compatibility
* Added ordering fixes to api_trace.cc
2025-09-23 13:30:27 -06:00
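Why appending to the end of a function table (as in #1931) preserves backward compatibility can be shown with a small layout check. The struct and member names below are hypothetical and are not taken from `api_trace.cc`; the sketch only demonstrates the general point that appending keeps existing member offsets stable, while inserting in the middle shifts them.

```cpp
#include <cstddef>

// Hypothetical function-pointer type and tables, for illustration only.
using Fn = int (*)(int);

// Original table layout that already-built clients were compiled against.
struct TableV1 {
  Fn allReduce;
  Fn broadcast;
};

// New entry appended at the bottom: existing members keep their offsets.
struct TableV2Appended {
  Fn allReduce;
  Fn broadcast;
  Fn newOp;
};

// New entry inserted in the middle: `broadcast` moves to a new offset.
struct TableV2Inserted {
  Fn allReduce;
  Fn newOp;
  Fn broadcast;
};

// Old clients that read `broadcast` at its V1 offset still find it here.
static_assert(offsetof(TableV2Appended, broadcast) == offsetof(TableV1, broadcast),
              "appending keeps existing offsets stable");

// With mid-table insertion, the same old offset now points at `newOp`.
static_assert(offsetof(TableV2Inserted, broadcast) != offsetof(TableV1, broadcast),
              "inserting shifts existing offsets");
```

Both checks are evaluated at compile time, so the sketch compiling cleanly is itself the demonstration.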
alex-breslow-amd
8d6e21285c
Implement disassembling library into assembly with source code ( #1714 )
...
- Add --dump-asm to install.sh to dump assembly from the RCCL library
2025-09-23 10:11:32 -07:00
Mustafa Abduljabbar
c1e1f2faeb
Use batched P2P to enhance alltoall small message performance ( #1902 )
...
* Batch P2P operations (2 per CU/channel) and update channel-part mapping
- Revert bitreversal and fix channel mapping to be compatible with P2P batching and avoid hangs
- P2P batching is only used for more than 2 nodes, to avoid aggregating intra-node traffic when it is dominant for fewer than 2 nodes
* Address single node regression and channel per net peer
* Add batching threshold
* Add enable switch for batching
* Update CHANGELOG.md
* Add minor comment change
* Update CHANGELOG.md
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
* Update CHANGELOG.md
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
2025-09-22 16:25:10 -04:00
Tim
ba44f170ad
Update RCCL Replayer README.md ( #1870 )
...
* Update Replayer README.md
2025-09-19 17:57:48 -04:00
corey-derochie-amd
9b04b2a42f
Added an implementation of ncclSymGetKernelPtr for when GENERATE_SYM_KERNELS is not defined, as it is normally generated code. ( #1925 )
2025-09-19 07:52:33 -06:00
corey-derochie-amd
ed095cad35
Moved latency_profiler license into subdirs and updated NOTICES. ( #1918 )
2025-09-18 12:54:39 -06:00
Atul Kulkarni
9839d1c7c8
Updated tests based on NCCL 2.27.3-1 sync ( #1892 )
2025-09-18 09:56:09 -05:00
Venkateshwar Reddy Kandula
0cc896910e
Update RCCL_API_TRACE_VERSION_PATCH to 2 due to NCCL API sync ( #1916 )
2025-09-18 07:36:50 -06:00
Surya Periaswamy
389f794d9a
Add speriaswamy-amd to CODEOWNERS ( #1921 )
2025-09-18 07:15:21 -05:00
Nilesh M Negi
da06c69cb8
[INIT] Use rocm-smi API instead of CLI for querying FW version ( #1920 )
2025-09-17 19:17:19 -05:00
nawrinsu
0b03bb718a
Add nawrinsu to CODEOWNERS ( #1917 )
2025-09-16 23:40:51 -05:00
Laura Promberger
0f6fec1553
Bump minimum cmake version to 3.16 to enable cmake 4 ( #1909 )
...
The minimum required cmake version of test/CMakeLists.txt is bumped from 2.8
to 3.16. This aligns with the version used in CMakeLists.txt and will
enable building with cmake 4.
2025-09-16 23:10:22 -05:00
Weile
f64b1f409f
add weilewei to CODEOWNERS ( #1915 )
2025-09-16 10:14:18 -07:00
Karthik Ganesan
740dfd1efd
Update prims_simple.h to keep header file access to rccl_metadata.h uniform ( #1906 )
...
Header files in device/ folder are directly referenced in the code base except here.
2025-09-16 08:58:50 -05:00
Kapil S. Pawar
86a6d06e40
Added new tests for rccl_wrap - rcclOverrideProtocol, rcclOverrideAlgorithm ( #1895 )
...
* Added new unit tests for rccl_wrap
2025-09-15 18:00:26 -05:00
Bertan Dogancay
93d86dd8e3
[BUILD] Stop generating sym kernels by default ( #1907 )
...
* Stop generating sym kernels by default
2025-09-15 12:19:35 -04:00
ycui1984
da8abb2651
[MIT] Add MIT license file ( #1908 )
2025-09-12 13:37:44 -05:00
Arm Patinyasakdikul
f21fbdfc18
Fix issue where staging/mainline build commit hash doesn't match the actual RCCL commit. ( #1910 )
2025-09-11 16:13:21 -05:00
mberenjk
ada4e12360
disabling msccl for fp8 datatype ( #1888 )
...
* disabling msccl for fp8 datatype
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com >
2025-09-11 13:09:34 -05:00
Wenkai Du
de9ebd8a8b
Treat PIX and PXB as same GDR distance ( #1894 )
2025-09-11 10:44:10 -05:00
isaki001
9c36439354
add reduce/broadcast algo/proto selection table for multi-node gfx940 ( #1889 )
2025-09-10 14:25:23 -05:00
Wenkai Du
c2bccf9156
Enable LL128 and use same tuning table for gfx942 4 NICs ( #1898 )
2025-09-10 11:11:15 -04:00
Kapil S. Pawar
f418a4c6d0
Added new tests for rccl_wrap - rcclSetPipelining ( #1890 )
...
* Added tests for rcclSetPipelining
* Added conditions to skip the test
* Updated message size
2025-09-05 09:29:11 -05:00
Mustafa Abduljabbar
6e45eaf75e
Use add_unroll.sh in topo_expl makefile ( #1886 )
2025-09-03 09:43:16 -04:00
Mustafa Abduljabbar
7ccc6f268f
Force enable proto and/or algo after model selection ( #1799 )
...
* Force enable proto or algo
* Remove inc nccl_common.h
* Move logic and add error checks
* Fix topo_expl compatibility
* Allow algo/proto overrides
* Remove extra function decl
* Clarify warning message
* Move algo/proto overrides into separate functions
* Update CHANGELOG.md
2025-09-03 08:54:13 -04:00