Commit graph

1942 commits

Author SHA1 Message Date
alex-breslow-amd ff209e5b19 Dump compiler-determined GPU kernel resource usage (#1965)
Adds --kernel-resource-use flag to install.sh to allow dumping per-GPU kernel resource use at compile time (e.g., VGPRs, LDS, SGPRs, scratch, etc.)
2025-10-13 11:24:42 -05:00
Geo Min 97f2665da2 fixing group id (#1975) 2025-10-10 16:40:44 -07:00
Mythreya Kuricheti 3000f0e837 [rocprofiler-sdk] Add codeowner for api-trace.h (#1933) 2025-10-10 16:29:17 -05:00
Arm Patinyasakdikul ff75860d73 Fix unroll factor display bug. (#1969) 2025-10-10 15:35:06 -05:00
Surya Periaswamy 5bd5079de1 MSCCL++ fix split path null deref (#1959)
* Add speriaswamy-amd to CODEOWNERS
* MSCCL++: fix split path null deref; key maps by parent ncclUniqueId
* removed no-op
2025-10-09 14:08:38 -05:00
Rahul Vaidya 6b200ee6c5 Fix LL128 proto selection to respect user setting (#1822) 2025-10-09 14:08:03 -05:00
Nusrat Islam d22a39e954 Update direct AG and single node LL threshold (#1944)
* update AG direct and single node LL threshold

* update thresholds based on MI350 experimental results

* disable using LL for direct AG

* enable direct AG for lower GPU counts

* direct AG single node tuning

* fix in-place buffer allocation for AG unit test

* whitespace fix

* gate direct AG for gfx950 and gfx942

---------

Co-authored-by: Nusrat Islam <nusislam@nova-login-gtu2.prov.gtu.zts.cpe.ice.amd.com>
2025-10-09 10:48:50 -05:00
Artem Kuzmitckii 00a42c80f3 Reverse logic of context tracking enablement from #1927 (#1971)
In this commit, context tracking is disabled by default and can be enabled via
`RCCL_ENABLE_CONTEXT_TRACKING=1` for both CDNA and RDNA.
Original PR https://github.com/ROCm/rccl/pull/1927
2025-10-09 10:24:09 +02:00
Arm Patinyasakdikul cede6d0134 Revert "Change to use -O0 instead of -O1 in debug build. (#1949)" (#1957)
This reverts commit feee02ca61.
2025-10-08 10:01:45 -05:00
Aravind Ravikumar 1858a31c41 Enable Presubmit CI Gating for develop Branch (TheRock CI for RCCL) (#1954)
* Trigger CI run on pull request

* Enabling CI run on different PR types

---------

Co-authored-by: arravikum <arravikum@amd.com>
2025-10-07 09:11:50 -04:00
corey-derochie-amd b1fbf535da [SYNC] 2.27.7 (#1928)
Merge pull request #1928 from corey-derochie-amd/2.27.7-sync
2025-10-06 16:47:50 -06:00
BertanDogancay 3f94267f21 Merge remote-tracking branch 'nccl/master' into develop 2025-10-06 18:36:49 -04:00
Arm Patinyasakdikul feee02ca61 Change to use -O0 instead of -O1 in debug build. (#1949)
* Change to use -O0 instead of -O1 in debug build.

* Use -O1 for device code to avoid linking issue in debug build.
2025-10-03 16:05:01 -05:00
Nilesh M Negi 342ec086e3 Revert "changes for hugepages backed host buffer for larger allocations (#1841)" (#1951)
This reverts commit 65b69bf318.
2025-10-02 23:43:09 -05:00
amd-jiali 5978d2f9ab Print out the hipRuntimeVersion message from WARN to always show up (#1911)
Authored-by: Jiali Li <jialili@amd.com>
2025-10-02 11:32:32 -05:00
dependabot[bot] 42ce371e3d Bump rocm-docs-core from 1.22.0 to 1.26.0 in /docs/sphinx (#1952)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.22.0 to 1.26.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.26.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.22.0...v1.26.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.26.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-02 11:33:14 -04:00
Istvan Kiss 3776129011 Add reference to supported data types section (#1893) 2025-10-01 12:36:14 +02:00
David DeBonis d23d18f423 Adding usage tip for ignore cpu affinity (#1948)
* Adding usage tip for ignore cpu affinity

* Update docs/how-to/rccl-usage-tips.rst

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update docs/how-to/rccl-usage-tips.rst

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-09-29 10:11:21 -06:00
Bhuvan Mital 65b69bf318 changes for hugepages backed host buffer for larger allocations (#1841) 2025-09-28 00:40:22 -05:00
Artem Kuzmitckii 07925ec027 Revert disabling of context tracking for Radeon (#1927)
* Revert disabling of context tracking for Radeon

Original commit 6fc228e2
 `Disable context tracking for the current version. (#1839)`

* Add env variable for disabling of context tracking for Radeon

`export NCCL_DISABLE_CONTEXT_TRACKING=1` to force disable of context tracking

* Update docs/how-to/rccl-usage-tips.rst

Fix grammar, thanks @amd-jnovotny

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Rename NCCL_DISABLE_CONTEXT_TRACKING -> RCCL_DISABLE_CONTEXT_TRACKING

* Revert changes in includes and rename util function

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-09-27 15:19:50 -04:00
alex-breslow-amd 45166f6586 Gate code by rocm_version (#1945) 2025-09-26 13:28:41 -07:00
Mustafa Abduljabbar 0dd2b2f65e Fix extra token typo (#1943) 2025-09-26 11:18:43 -04:00
Mustafa Abduljabbar 7a329bbd94 Expose symbols for RCCL algo/proto/channels selection functions (#1923)
* Unhide symbols for algo/proto functions

* Add all_gather direct usage detection
2025-09-25 18:58:30 -04:00
Larry Meadows cb14fccdcc - LL Protocol: Add missing fences for gfx950, this fixes the hang issue (#1932)
- Remove asm flat_store_dwordx4, not needed
2025-09-25 14:07:07 -07:00
Sai Enduri 01d16d4139 Enable multi node rccl tests on MI350x slurm cluster. (#1900)
* Add tests on slurm cluster

* Integrate slurm.

* Add flags.

* Added dynamic selection of runners for tests and cleanup for slurm reservation

* Revert "Added dynamic selection of runners for tests and cleanup for slurm reservation"

This reverts commit d5350ff6e4f563ddd56ad81e4bc2a393ed55ba00.

* Refactor so tests run on both architectures.

* continue on error

* fail fast false on matrix

* remove scancel

* skip all single node tests

* fix pattern matching for pytest

* switch to always skip github job

* Update to latest allocation.

* Clean up workflows and update docker image.

* Updated container image published from PR #1517

* Switch back to TheRock main branch sha.

---------

Co-authored-by: arravikum <arravikum@amd.com>
2025-09-23 22:00:26 -07:00
corey-derochie-amd d86cf78810 Moved new functions to the bottom of the function table to maintain backward compatibility (#1931)
* Moved new functions to the bottom of the function table to maintain backward compatibility

* Added ordering fixes to api_trace.cc
2025-09-23 13:30:27 -06:00
alex-breslow-amd 8d6e21285c Implement disassembling library into assembly with source code (#1714)
- Add --dump-asm to install.sh to dump assembly from the RCCL library
2025-09-23 10:11:32 -07:00
Mustafa Abduljabbar c1e1f2faeb Use batched P2P to enhance alltoall small message performance (#1902)
* Batch P2P operations (2 per CU/channel) and update channel-part mapping

- Revert bitreversal and fix channel mapping to be compatible with P2P batching and avoid hangs

- P2P batching is only used for more than 2 nodes to avoid aggregating intra-node traffic when it is dominant for less than 2 nodes

* Address single node regression and channel per net peer

* Add batching threshold

* Add enable switch for batching

* Update CHANGELOG.md

* Add minor comment change

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-09-22 16:25:10 -04:00
Tim ba44f170ad Update RCCL Replayer README.md (#1870)
* Update Replayer README.md
2025-09-19 17:57:48 -04:00
corey-derochie-amd 9b04b2a42f Added an implementation of ncclSymGetKernelPtr for when GENERATE_SYM_KERNELS is not defined, as it is normally generated code. (#1925) 2025-09-19 07:52:33 -06:00
corey-derochie-amd ed095cad35 Moved latency_profiler license into subdirs and updated NOTICES. (#1918) 2025-09-18 12:54:39 -06:00
Atul Kulkarni 9839d1c7c8 Updated tests based on NCCL 2.27.3-1 sync (#1892) 2025-09-18 09:56:09 -05:00
Venkateshwar Reddy Kandula 0cc896910e Update RCCL_API_TRACE_VERSION_PATCH to 2 due to NCCL API sync (#1916) 2025-09-18 07:36:50 -06:00
Surya Periaswamy 389f794d9a Add speriaswamy-amd to CODEOWNERS (#1921) 2025-09-18 07:15:21 -05:00
Nilesh M Negi da06c69cb8 [INIT] Use rocm-smi API instead of CLI for querying FW version (#1920) 2025-09-17 19:17:19 -05:00
nawrinsu 0b03bb718a Add nawrinsu to CODEOWNERS (#1917) 2025-09-16 23:40:51 -05:00
Laura Promberger 0f6fec1553 Bump minimum cmake version to 3.16 to enable cmake 4 (#1909)
The minimum required CMake version in test/CMakeLists.txt is bumped from 2.8
to 3.16. This aligns with the version used in CMakeLists.txt and will
enable building with CMake 4.
2025-09-16 23:10:22 -05:00
Weile f64b1f409f add weilewei to CODEOWNERS (#1915) 2025-09-16 10:14:18 -07:00
Karthik Ganesan 740dfd1efd Update prims_simple.h to keep header file access to rccl_metadata.h uniform (#1906)
Header files in device/ folder are directly referenced in the code base except here.
2025-09-16 08:58:50 -05:00
Kapil S. Pawar 86a6d06e40 Added new tests for rccl_wrap - rcclOverrideProtocol, rcclOverrideAlgorithm (#1895)
* Added new unit tests for rccl_wrap
2025-09-15 18:00:26 -05:00
Bertan Dogancay 93d86dd8e3 [BUILD] Stop generating sym kernels by default (#1907)
* Stop generating sym kernels by default
2025-09-15 12:19:35 -04:00
ycui1984 da8abb2651 [MIT] Add MIT license file (#1908) 2025-09-12 13:37:44 -05:00
Arm Patinyasakdikul f21fbdfc18 Fix issue where staging/mainline build commit hash doesn't match the actual RCCL commit. (#1910) 2025-09-11 16:13:21 -05:00
mberenjk ada4e12360 disabling msccl for fp8 datatype (#1888)
* disabling msccl for fp8 datatype

---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-09-11 13:09:34 -05:00
Wenkai Du de9ebd8a8b Treat PIX and PXB as same GDR distance (#1894) 2025-09-11 10:44:10 -05:00
isaki001 9c36439354 add reduce/broadcast algo/proto selection table for multi-node gfx940 (#1889) 2025-09-10 14:25:23 -05:00
Wenkai Du c2bccf9156 Enable LL128 and use same tuning table for gfx942 4 NICs (#1898) 2025-09-10 11:11:15 -04:00
Kapil S. Pawar f418a4c6d0 Added new tests for rccl_wrap - rcclSetPipelining (#1890)
* Added tests for rcclSetPipelining

* Added conditions to skip the test

* Updated message size
2025-09-05 09:29:11 -05:00
Mustafa Abduljabbar 6e45eaf75e Use add_unroll.sh in topo_expl makefile (#1886) 2025-09-03 09:43:16 -04:00
Mustafa Abduljabbar 7ccc6f268f Force enable proto and/or algo after model selection (#1799)
* Force enable proto or algo

* Remove inc nccl_common.h

* Move logic and add error checks

* Fix topo_expl compatibility

* Allow algo/proto overrides

* Remove extra function decl

* Clarify warning message

* Move algo/proto overrides into separate functions

* Update CHANGELOG.md
2025-09-03 08:54:13 -04:00