1925 Commits

Author SHA1 Message Date
Arm Patinyasakdikul feee02ca61 Change to use -O0 instead of -O1 in debug build. (#1949)
* Change to use -O0 instead of -O1 in debug build.

* Use -O1 for device code to avoid linking issue in debug build.
2025-10-03 16:05:01 -05:00
Nilesh M Negi 342ec086e3 Revert "changes for hugepages backed host buffer for larger allocations (#1841)" (#1951)
This reverts commit 65b69bf318.
2025-10-02 23:43:09 -05:00
amd-jiali 5978d2f9ab Print out the hipRuntimeVersion message from WARN to always show up (#1911)
Authored-by: Jiali Li <jialili@amd.com>
2025-10-02 11:32:32 -05:00
dependabot[bot] 42ce371e3d Bump rocm-docs-core from 1.22.0 to 1.26.0 in /docs/sphinx (#1952)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.22.0 to 1.26.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.26.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.22.0...v1.26.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.26.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-02 11:33:14 -04:00
Istvan Kiss 3776129011 Add reference to supported data types section (#1893) 2025-10-01 12:36:14 +02:00
David DeBonis d23d18f423 Adding usage tip for ignore cpu affinity (#1948)
* Adding usage tip for ignore cpu affinity

* Update docs/how-to/rccl-usage-tips.rst

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update docs/how-to/rccl-usage-tips.rst

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-09-29 10:11:21 -06:00
Bhuvan Mital 65b69bf318 changes for hugepages backed host buffer for larger allocations (#1841) 2025-09-28 00:40:22 -05:00
Artem Kuzmitckii 07925ec027 Revert disabling of context tracking for Radeon (#1927)
* Revert disabling of context tracking for Radeon

Original commit 6fc228e2
 `Disable context tracking for the current version. (#1839)`

* Add env variable for disabling of context tracking for Radeon

`export NCCL_DISABLE_CONTEXT_TRACKING=1` to force-disable context tracking

* Update docs/how-to/rccl-usage-tips.rst

Fix grammar, thanks @amd-jnovotny

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Rename NCCL_DISABLE_CONTEXT_TRACKING -> RCCL_DISABLE_CONTEXT_TRACKING

* Revert changes in includes and rename util function

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-09-27 15:19:50 -04:00
alex-breslow-amd 45166f6586 Gate code by rocm_version (#1945) 2025-09-26 13:28:41 -07:00
Mustafa Abduljabbar 0dd2b2f65e Fix extra token typo (#1943) 2025-09-26 11:18:43 -04:00
Mustafa Abduljabbar 7a329bbd94 Expose symbols for RCCL algo/proto/channels selection functions (#1923)
* Unhide symbols for algo/proto functions

* Add all_gather direct usage detection
2025-09-25 18:58:30 -04:00
Larry Meadows cb14fccdcc - LL Protocol: Add missing fences for gfx950, this fixes the hang issue (#1932)
- Remove asm flat_store_dwordx4, not needed
2025-09-25 14:07:07 -07:00
Sai Enduri 01d16d4139 Enable multi node rccl tests on MI350x slurm cluster. (#1900)
* Add tests on slurm cluster

* Integrate slurm.

* Add flags.

* Added dynamic selection of runners for tests and cleanup for slurm reservation

* Revert "Added dynamic selection of runners for tests and cleanup for slurm reservation"

This reverts commit d5350ff6e4f563ddd56ad81e4bc2a393ed55ba00.

* Refactor so tests run on both architectures.

* continue on error

* fail fast false on matrix

* remove scancel

* skip all single node tests

* fix pattern matching for pytest

* switch to always skip github job

* Update to latest allocation.

* Clean up workflows and update docker image.

* Updated container image published from PR #1517

* Switch back to TheRock main branch sha.

---------

Co-authored-by: arravikum <arravikum@amd.com>
2025-09-23 22:00:26 -07:00
corey-derochie-amd d86cf78810 Moved new functions to the bottom of the function table to maintain backward compatibility (#1931)
* Moved new functions to the bottom of the function table to maintain backward compatibility

* Added ordering fixes to api_trace.cc
2025-09-23 13:30:27 -06:00
alex-breslow-amd 8d6e21285c Implement disassembling library into assembly with source code (#1714)
- Add --dump-asm to install.sh to dump assembly from the RCCL library
2025-09-23 10:11:32 -07:00
Mustafa Abduljabbar c1e1f2faeb Use batched P2P to enhance alltoall small message performance (#1902)
* Batch P2P operations (2 per CU/channel) and update channel-part mapping

- Revert bitreversal and fix channel mapping to be compatible with P2P batching and avoid hangs

- P2P batching is only used for more than 2 nodes to avoid aggregating intra-node traffic when it is dominant for less than 2 nodes

* Address single node regression and channel per net peer

* Add batching threshold

* Add enable switch for batching

* Update CHANGELOG.md

* Add minor comment change

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-09-22 16:25:10 -04:00
Tim ba44f170ad Update RCCL Replayer README.md (#1870)
* Update Replayer README.md
2025-09-19 17:57:48 -04:00
corey-derochie-amd 9b04b2a42f Added an implementation of ncclSymGetKernelPtr for when GENERATE_SYM_KERNELS is not defined, as it is normally generated code. (#1925) 2025-09-19 07:52:33 -06:00
corey-derochie-amd ed095cad35 Moved latency_profiler license into subdirs and updated NOTICES. (#1918) 2025-09-18 12:54:39 -06:00
Atul Kulkarni 9839d1c7c8 Updated tests based on NCCL 2.27.3-1 sync (#1892) 2025-09-18 09:56:09 -05:00
Venkateshwar Reddy Kandula 0cc896910e Due to NCCL API sync, update RCCL_API_TRACE_VERSION_PATCH to 2 (#1916) 2025-09-18 07:36:50 -06:00
Surya Periaswamy 389f794d9a Add speriaswamy-amd to CODEOWNERS (#1921) 2025-09-18 07:15:21 -05:00
Nilesh M Negi da06c69cb8 [INIT] Use rocm-smi API instead of CLI for querying FW version (#1920) 2025-09-17 19:17:19 -05:00
nawrinsu 0b03bb718a Add nawrinsu to CODEOWNERS (#1917) 2025-09-16 23:40:51 -05:00
Laura Promberger 0f6fec1553 Bump minimum cmake version to 3.16 to enable cmake 4 (#1909)
The minimum required cmake version of test/CMakeLists.txt is bumped from 2.8
to 3.16. This aligns with the version used in CMakeLists.txt and will
enable building with cmake 4.
2025-09-16 23:10:22 -05:00
Weile f64b1f409f add weilewei to CODEOWNERS (#1915) 2025-09-16 10:14:18 -07:00
Karthik Ganesan 740dfd1efd Update prims_simple.h to keep header file access to rccl_metadata.h uniform (#1906)
Header files in device/ folder are directly referenced in the code base except here.
2025-09-16 08:58:50 -05:00
Kapil S. Pawar 86a6d06e40 Added new tests for rccl_wrap - rcclOverrideProtocol, rcclOverrideAlgorithm (#1895)
* Added new unit tests for rccl_wrap
2025-09-15 18:00:26 -05:00
Bertan Dogancay 93d86dd8e3 [BUILD] Stop generating sym kernels by default (#1907)
* Stop generating sym kernels by default
2025-09-15 12:19:35 -04:00
ycui1984 da8abb2651 [MIT] Add MIT license file (#1908) 2025-09-12 13:37:44 -05:00
Arm Patinyasakdikul f21fbdfc18 Fix issue where staging/mainline build commit hash doesn't match the actual RCCL commit. (#1910) 2025-09-11 16:13:21 -05:00
mberenjk ada4e12360 disabling msccl for fp8 datatype (#1888)
* disabling msccl for fp8 datatype

---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-09-11 13:09:34 -05:00
Wenkai Du de9ebd8a8b Treat PIX and PXB as same GDR distance (#1894) 2025-09-11 10:44:10 -05:00
isaki001 9c36439354 add reduce/broadcast algo/proto selection table for multi-node gfx940 (#1889) 2025-09-10 14:25:23 -05:00
Wenkai Du c2bccf9156 Enable LL128 and use same tuning table for gfx942 4 NICs (#1898) 2025-09-10 11:11:15 -04:00
Kapil S. Pawar f418a4c6d0 Added new tests for rccl_wrap - rcclSetPipelining (#1890)
* Added tests for rcclSetPipelining

* Added conditions to skip the test

* Updated message size
2025-09-05 09:29:11 -05:00
Mustafa Abduljabbar 6e45eaf75e Use add_unroll.sh in topo_expl makefile (#1886) 2025-09-03 09:43:16 -04:00
Mustafa Abduljabbar 7ccc6f268f Force enable proto and/or algo after model selection (#1799)
* Force enable proto or algo

* Remove inc nccl_common.h

* Move logic and add error checks

* Fix topo_expl compatibility

* Allow algo/proto overrides

* Remove extra function decl

* Clarify warning message

* Move algo/proto overrides into separate functions

* Update CHANGELOG.md
2025-09-03 08:54:13 -04:00
ycui1984 361d596229 [rocm_regression] Return errors when HSA_NO_SCRATCH_RECLAIM=1 even for rocm>=6.4.0 (#1867)
* [rocm_regression] Return errors when HSA_NO_SCRATCH_RECLAIM=1 even for rocm >= 6.4.0
* [rocm_regression] Check firmware version
* [rocm_regression] Resolve review comments
* [rocm_regression] Move hsa env checking into init once func
* [rocm_regression] Prevent hot fix version in firmware
* [rocm_regression] Improve unit tests
2025-08-29 11:18:23 -05:00
Bertan Dogancay 9afc15625f Merge pull request #1880 from rahulvaidya20/2.27.3-1
[SYNC] 2.27.3-1
2025-08-29 12:10:12 -04:00
BertanDogancay 08a7be231b Merge remote-tracking branch 'nccl/master' into develop 2025-08-28 15:46:28 -05:00
Avinash a0ec15bafe [build] Disable MSCCL++ compilation by default (#1879)
* Enable MSCCLPP on request

* Updating docs and README

* Updates to CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Updates to CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

* Update CHANGELOG.md

GitHub didn't take the edit to my suggestion properly.

---------

Co-authored-by: amd <amd@super3.amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
2025-08-28 08:52:12 -06:00
Nilesh M Negi d73cee7588 [AzureCI] Switch to ROCm 6.4.1 and add rccl-tests (#1782)
* Use ROCm 6.4.1 for testing
* Extend RCCL-Tests to multi-node
* Add HSA_NO_SCRATCH_RECLAIM to UT runs
* Limit to single-node rccl-tests for now
2025-08-27 21:07:53 -05:00
jonatluu 4699bff790 fix lintian warning package-contains-timestamped-gzip (#1865)
* fix lintian warning package-contains-timestamped-gzip

* fix lintian warning
2025-08-27 13:29:07 -04:00
Geo Min f404624d9e [TheRock CI] Adding single node tests for RCCL (#1876)
* Add single-node testing

* Adding single node test

* Adding quotes

* fix typo

* Adding test flag

* No MPI

* Adding openmpi install

* Adding comment

* PR comments

* Missing proj

* Adding half

* Adding rocr runtime

* Adding them all

* new sha

* Fixing script

* Removing confusing skip test case

* Adding docs

* Update .github/workflows/therock-test-packages-single-node.yml

Co-authored-by: Marius Brehler <marius.brehler@amd.com>

---------

Co-authored-by: Marius Brehler <marius.brehler@amd.com>
2025-08-27 08:13:10 -07:00
Nusrat Islam df448862c3 Device allocation tracker (#1878)
* alloc: add memory allocation tracker

* alloc: add tracker for ncclCuMemAlloc() APIs

* alloc: add null pointer check during free
2025-08-27 09:30:51 -05:00
Kapil S. Pawar c9becd89cd Code coverage tests for param.cc (#1872)
* Added code coverage unit tests for param.cc

* Updated ParamTests.cpp and removed ParamTestsConfFile.txt

* Updated ParamTests.cpp

* Removed NCCL_LOG_INFO and added sample config file

---------

Co-authored-by: Pawar <kpawar@ctr2-alola-ctrl-01.amd.com>
2025-08-27 09:30:37 -05:00
ishkool c288fbf1b2 Code coverage tests for net_socket.cc (#1840)
* Code coverage UTs for net_socket.cc

* Addressed review comments

---------

Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>
2025-08-27 09:24:21 -05:00
Marius Brehler 221205ebd4 Bump TheRock version used for testing (#1885) 2025-08-27 16:22:27 +02:00
Mustafa Abduljabbar 277747c199 [Device] Add dynamic fetch/reduce pipelining for reduction collectives - Simple protocol (#1861)
* Support pipelining codegen and template specialization

* Support ReduceCopy pipelining for AllReduce, ReduceScatter, and Reduce (currently enabled for bfloat16)

* Remove need for FUNC_INDEX_TOTAL

* Add pipeline field to device function key construction logic

* Avoid unneeded codegen for LL/LL64 kernels

* Modify conditions and add pipeline dtypes env

* Optimize selection for both gfx942 and gfx950

* Increase pipeline bitfield width

* Use __forceinline__ for all device functions

* Realign reduceCopy with original form

* Add opt-out option to enable perf debugs

* Remove force-reduce-pipelining option from README

* Update CHANGELOG.md

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-08-26 15:03:54 -04:00