Arm Patinyasakdikul
feee02ca61
Change to use -O0 instead of -O1 in debug build. ( #1949 )
...
* Change to use -O0 instead of -O1 in debug build.
* Use -O1 for device code to avoid linking issue in debug build.
2025-10-03 16:05:01 -05:00
Nilesh M Negi
342ec086e3
Revert "changes for hugepages backed host buffer for larger allocations ( #1841 )" ( #1951 )
...
This reverts commit 65b69bf318 .
2025-10-02 23:43:09 -05:00
amd-jiali
5978d2f9ab
Print out the hipRuntimeVersion message from WARN to always show up ( #1911 )
...
Authored-by: Jiali Li <jialili@amd.com >
2025-10-02 11:32:32 -05:00
dependabot[bot]
42ce371e3d
Bump rocm-docs-core from 1.22.0 to 1.26.0 in /docs/sphinx ( #1952 )
...
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core ) from 1.22.0 to 1.26.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases )
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.26.0/CHANGELOG.md )
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.22.0...v1.26.0 )
---
updated-dependencies:
- dependency-name: rocm-docs-core
dependency-version: 1.26.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-02 11:33:14 -04:00
Istvan Kiss
3776129011
Add reference to supported data types section ( #1893 )
2025-10-01 12:36:14 +02:00
David DeBonis
d23d18f423
Adding usage tip for ignore cpu affinity ( #1948 )
...
* Adding usage tip for ignore cpu affinity
* Update docs/how-to/rccl-usage-tips.rst
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
* Update docs/how-to/rccl-usage-tips.rst
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
---------
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
2025-09-29 10:11:21 -06:00
Bhuvan Mital
65b69bf318
changes for hugepages backed host buffer for larger allocations ( #1841 )
2025-09-28 00:40:22 -05:00
Artem Kuzmitckii
07925ec027
Revert disabling of context tracking for Radeon ( #1927 )
...
* Revert disabling of context tracking for Radeon
Original commit 6fc228e2
`Disable context tracking for the current version. (#1839 )`
* Add env variable for disabling of context tracking for Radeon
Set `export NCCL_DISABLE_CONTEXT_TRACKING=1` to force-disable context tracking
* Update docs/how-to/rccl-usage-tips.rst
Fix grammar, thanks @amd-jnovotny
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
* Rename NCCL_DISABLE_CONTEXT_TRACKING -> RCCL_DISABLE_CONTEXT_TRACKING
* Revert changes in includes and rename util function
---------
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
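A minimal sketch of how the switch described in this commit might be set, using the final variable name after the rename from NCCL_DISABLE_CONTEXT_TRACKING to RCCL_DISABLE_CONTEXT_TRACKING. It assumes the variable is read from the process environment, so it has to be exported before the application starts; the echo line is only an illustrative check and is not part of RCCL.

```shell
# Assumption: RCCL reads this variable from the environment of the launching
# process, so it is exported before the (not shown) application is started.
export RCCL_DISABLE_CONTEXT_TRACKING=1

# Illustrative check only: confirm the value that child processes will inherit.
echo "RCCL_DISABLE_CONTEXT_TRACKING=${RCCL_DISABLE_CONTEXT_TRACKING}"
```

Leaving the variable unset keeps the reverted default, with context tracking enabled.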
2025-09-27 15:19:50 -04:00
alex-breslow-amd
45166f6586
Gate code by rocm_version ( #1945 )
2025-09-26 13:28:41 -07:00
Mustafa Abduljabbar
0dd2b2f65e
Fix extra token typo ( #1943 )
2025-09-26 11:18:43 -04:00
Mustafa Abduljabbar
7a329bbd94
Expose symbols for RCCL algo/proto/channels selection functions ( #1923 )
...
* Unhide symbols for algo/proto functions
* Add all_gather direct usage detection
2025-09-25 18:58:30 -04:00
Larry Meadows
cb14fccdcc
- LL Protocol: Add missing fences for gfx950, this fixes the hang issue ( #1932 )
...
- Remove asm flat_store_dwordx4, not needed
2025-09-25 14:07:07 -07:00
Sai Enduri
01d16d4139
Enable multi node rccl tests on MI350x slurm cluster. ( #1900 )
...
* Add tests on slurm cluster
* Integrate slurm.
* Add flags.
* Added dynamic selection of runners for tests and cleanup for slurm reservation
* Revert "Added dynamic selection of runners for tests and cleanup for slurm reservation"
This reverts commit d5350ff6e4f563ddd56ad81e4bc2a393ed55ba00.
* Refactor so tests run on both architectures.
* continue on error
* fail fast false on matrix
* remove scancel
* skip all single node tests
* fix pattern matching for pytest
* switch to always skip github job
* Update to latest allocation.
* Clean up workflows and update docker image.
* Updated container image published from PR #1517
* Switch back to TheRock main branch sha.
---------
Co-authored-by: arravikum <arravikum@amd.com >
2025-09-23 22:00:26 -07:00
corey-derochie-amd
d86cf78810
Moved new functions to the bottom of the function table to maintain backward compatibility ( #1931 )
...
* Moved new functions to the bottom of the function table to maintain backward compatibility
* Added ordering fixes to api_trace.cc
2025-09-23 13:30:27 -06:00
alex-breslow-amd
8d6e21285c
Implement disassembling library into assembly with source code ( #1714 )
...
- Add --dump-asm to install.sh to dump assembly from the RCCL library
2025-09-23 10:11:32 -07:00
Mustafa Abduljabbar
c1e1f2faeb
Use batched P2P to enhance alltoall small message performance ( #1902 )
...
* Batch P2P operations (2 per CU/channel) and update channel-part mapping
- Revert bitreversal and fix channel mapping to be compatible with P2P batching and avoid hangs
- P2P batching is only used for more than 2 nodes to avoid aggregating intra-node traffic when it is dominant for fewer than 2 nodes
* Address single node regression and channel per net peer
* Add batching threshold
* Add enable switch for batching
* Update CHANGELOG.md
* Add minor comment change
* Update CHANGELOG.md
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
* Update CHANGELOG.md
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
2025-09-22 16:25:10 -04:00
Tim
ba44f170ad
Update RCCL Replayer README.md ( #1870 )
...
* Update Replayer README.md
2025-09-19 17:57:48 -04:00
corey-derochie-amd
9b04b2a42f
Added an implementation of ncclSymGetKernelPtr for when GENERATE_SYM_KERNELS is not defined, as it is normally generated code. ( #1925 )
2025-09-19 07:52:33 -06:00
corey-derochie-amd
ed095cad35
Moved latency_profiler license into subdirs and updated NOTICES. ( #1918 )
2025-09-18 12:54:39 -06:00
Atul Kulkarni
9839d1c7c8
Updated tests based on NCCL 2.27.3-1 sync ( #1892 )
2025-09-18 09:56:09 -05:00
Venkateshwar Reddy Kandula
0cc896910e
Update RCCL_API_TRACE_VERSION_PATCH to 2 due to NCCL API sync ( #1916 )
2025-09-18 07:36:50 -06:00
Surya Periaswamy
389f794d9a
Add speriaswamy-amd to CODEOWNERS ( #1921 )
2025-09-18 07:15:21 -05:00
Nilesh M Negi
da06c69cb8
[INIT] Use rocm-smi API instead of CLI for querying FW version ( #1920 )
2025-09-17 19:17:19 -05:00
nawrinsu
0b03bb718a
Add nawrinsu to CODEOWNERS ( #1917 )
2025-09-16 23:40:51 -05:00
Laura Promberger
0f6fec1553
Bump minimum cmake version to 3.16 to enable cmake 4 ( #1909 )
...
The minimum required cmake version of test/CMakeLists.txt is bumped from 2.8
to 3.16. This aligns with the version used in CMakeLists.txt and will
enable building with cmake 4.
2025-09-16 23:10:22 -05:00
Weile
f64b1f409f
add weilewei to CODEOWNERS ( #1915 )
2025-09-16 10:14:18 -07:00
Karthik Ganesan
740dfd1efd
Update prims_simple.h to keep header file access to rccl_metadata.h uniform ( #1906 )
...
Header files in device/ folder are directly referenced in the code base except here.
2025-09-16 08:58:50 -05:00
Kapil S. Pawar
86a6d06e40
Added new tests for rccl_wrap - rcclOverrideProtocol, rcclOverrideAlgorithm ( #1895 )
...
* Added new unit tests for rccl_wrap
2025-09-15 18:00:26 -05:00
Bertan Dogancay
93d86dd8e3
[BUILD] Stop generating sym kernels by default ( #1907 )
...
* Stop generating sym kernels by default
2025-09-15 12:19:35 -04:00
ycui1984
da8abb2651
[MIT] Add MIT license file ( #1908 )
2025-09-12 13:37:44 -05:00
Arm Patinyasakdikul
f21fbdfc18
Fix issue where staging/mainline build commit hash doesn't match the actual RCCL commit. ( #1910 )
2025-09-11 16:13:21 -05:00
mberenjk
ada4e12360
disabling msccl for fp8 datatype ( #1888 )
...
* disabling msccl for fp8 datatype
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com >
2025-09-11 13:09:34 -05:00
Wenkai Du
de9ebd8a8b
Treat PIX and PXB as same GDR distance ( #1894 )
2025-09-11 10:44:10 -05:00
isaki001
9c36439354
add reduce/broadcast algo/proto selection table for multi-node gfx940 ( #1889 )
2025-09-10 14:25:23 -05:00
Wenkai Du
c2bccf9156
Enable LL128 and use same tuning table for gfx942 4 NICs ( #1898 )
2025-09-10 11:11:15 -04:00
Kapil S. Pawar
f418a4c6d0
Added new tests for rccl_wrap - rcclSetPipelining ( #1890 )
...
* Added tests for rcclSetPipelining
* Added conditions to skip the test
* Updated message size
2025-09-05 09:29:11 -05:00
Mustafa Abduljabbar
6e45eaf75e
Use add_unroll.sh in topo_expl makefile ( #1886 )
2025-09-03 09:43:16 -04:00
Mustafa Abduljabbar
7ccc6f268f
Force enable proto and/or algo after model selection ( #1799 )
...
* Force enable proto or algo
* Remove inc nccl_common.h
* Move logic and add error checks
* Fix topo_expl compatibility
* Allow algo/proto overrides
* Remove extra function decl
* Clarify warning message
* Move algo/proto overrides into separate functions
* Update CHANGELOG.md
2025-09-03 08:54:13 -04:00
ycui1984
361d596229
[rocm_regression] Return errors when HSA_NO_SCRATCH_RECLAIM=1 even for rocm>=6.4.0 ( #1867 )
...
* [rocm_regression] Return errors when HSA_NO_SCRATCH_RECLAIM=1 even for rocm >= 6.4.0
* [rocm_regression] Check firmware version
* [rocm_regression] Resolve review comments
* [rocm_regression] Move hsa env checking into init once func
* [rocm_regression] Prevent hot fix version in firmware
* [rocm_regression] Improve unit tests
2025-08-29 11:18:23 -05:00
Bertan Dogancay
9afc15625f
Merge pull request #1880 from rahulvaidya20/2.27.3-1
...
[SYNC] 2.27.3-1
2025-08-29 12:10:12 -04:00
BertanDogancay
08a7be231b
Merge remote-tracking branch 'nccl/master' into develop
2025-08-28 15:46:28 -05:00
Avinash
a0ec15bafe
[build] Disable MSCCL++ compilation by default ( #1879 )
...
* Enable MSCCLPP on request
* Updating docs and README
* Updates to CHANGELOG.md
* Update CHANGELOG.md
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
* Updates to CHANGELOG.md
* Update CHANGELOG.md
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
* Update CHANGELOG.md
Github didn't take the edit to my suggestion properly.
---------
Co-authored-by: amd <amd@super3.amd.com >
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
2025-08-28 08:52:12 -06:00
Nilesh M Negi
d73cee7588
[AzureCI] Switch to ROCm 6.4.1 and add rccl-tests ( #1782 )
...
* Use ROCm 6.4.1 for testing
* Extend RCCL-Tests to multi-node
* Add HSA_NO_SCRATCH_RECLAIM to UT runs
* Limit to single-node rccl-tests for now
2025-08-27 21:07:53 -05:00
jonatluu
4699bff790
fix lintian warning package-contains-timestamped-gzip ( #1865 )
...
* fix lintian warning package-contains-timestamped-gzip
* fix lintian warning
2025-08-27 13:29:07 -04:00
Geo Min
f404624d9e
[TheRock CI] Adding single node tests for RCCL ( #1876 )
...
* Add single-node testing
* Adding single node test
* Adding quotes
* fix typo
* Adding test flag
* No MPI
* Adding openmpi install
* Adding comment
* PR comments
* Missing proj
* Adding half
* Adding rocr runtime
* Adding them all
* new sha
* Fixing script
* Removing confusing skip test case
* Adding docs
* Update .github/workflows/therock-test-packages-single-node.yml
Co-authored-by: Marius Brehler <marius.brehler@amd.com >
---------
Co-authored-by: Marius Brehler <marius.brehler@amd.com >
2025-08-27 08:13:10 -07:00
Nusrat Islam
df448862c3
Device allocation tracker ( #1878 )
...
* alloc: add memory allocation tracker
* alloc: add tracker for ncclCuMemAlloc() APIs
* alloc: add null pointer check during free
2025-08-27 09:30:51 -05:00
Kapil S. Pawar
c9becd89cd
Code coverage tests for param.cc ( #1872 )
...
* Added code coverage unit tests for param.cc
* Updated ParamTests.cpp and removed ParamTestsConfFile.txt
* Updated ParamTests.cpp
* Removed NCCL_LOG_INFO and added sample config file
---------
Co-authored-by: Pawar <kpawar@ctr2-alola-ctrl-01.amd.com >
2025-08-27 09:30:37 -05:00
ishkool
c288fbf1b2
Code coverage tests for net_socket.cc ( #1840 )
...
* Code coverage UTs for net_socket.cc
* Addressed review comments
---------
Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com >
2025-08-27 09:24:21 -05:00
Marius Brehler
221205ebd4
Bump TheRock version used for testing ( #1885 )
2025-08-27 16:22:27 +02:00
Mustafa Abduljabbar
277747c199
[Device] Add dynamic fetch/reduce pipelining for reduction collectives - Simple protocol ( #1861 )
...
* Support pipelining codegen and template specialization
* Support ReduceCopy pipelining for AllReduce, ReduceScatter, and Reduce (currently enabled for bfloat16)
* Remove need for FUNC_INDEX_TOTAL
* Add pipeline field to device function key construction logic
* Avoid unneeded codegen for LL/LL64 kernels
* Modify conditions and add pipeline dtypes env
* Optimize selection for both gfx942 and gfx950
* Increase pipeline bitfield width
* Use __forceinline__ for all device functions
* Realign reduceCopy with original form
* Add opt-out option to enable perf debugs
* Remove force-reduce-pipelining option from README
* Update CHANGELOG.md
---------
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
2025-08-26 15:03:54 -04:00