Nilesh M Negi
7c422271a8
[MSCCLPP] Disable MSCCLPP Executor ( #1744 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
[ROCm/rccl commit: 92a5d225d9 ]
2025-06-17 01:29:55 -05:00
Sarat Kamisetty
e359e834f5
generic net plugin ctxt that is extensible for use in multiple APIs ( #1735 )
...
Co-authored-by: Sarat Kamisetty <sakamiset@amd.com >
[ROCm/rccl commit: fa0422f174 ]
2025-06-16 14:48:08 -07:00
Bertan Dogancay
9fa53cc454
[NPKit] Use default output directory when env var is not set ( #1747 )
...
[ROCm/rccl commit: 39211c6b41 ]
2025-06-16 15:26:53 -04:00
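A minimal sketch of the fallback described in the NPKit entry above: use the directory from the environment when it is set, otherwise a default. The function name `resolveDumpDir` and the default path are hypothetical and are not taken from the commit; which environment variable NPKit reads is also an assumption left to the caller.

```cpp
#include <string>

// Hypothetical helper: picks the output directory for profiling dumps.
// envValue is the result of std::getenv(...) for whichever variable the
// tool uses (an assumption here); it may be nullptr or empty.
std::string resolveDumpDir(const char* envValue, const std::string& fallbackDir) {
  if (envValue == nullptr || *envValue == '\0') {
    // Variable not set (or set to an empty string): use the default.
    return fallbackDir;
  }
  return std::string(envValue);
}
```

A caller would pass something like `std::getenv("NPKIT_DUMP_DIR")` and a default such as `"/tmp/npkit_dump"`; both values are illustrative only.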
Mustafa Abduljabbar
3e5dc99aa6
Fix topo_explorer compatibility and capture WarpSize ( #1743 )
...
[ROCm/rccl commit: fb4ad82d0d ]
2025-06-16 08:18:35 -04:00
Tim
7051f217a7
replayer update v0 ( #1733 )
...
* First version of new replayer, with comments on future TODOs
* plus minor fixes for UT
* Updated format of recorder, especially in binary department, according to replayer's need
[ROCm/rccl commit: ba97c9c18b ]
2025-06-13 15:05:34 -04:00
Richard Barnes
2c0cc20a76
Enable -Wdeprecated-copy-with-user-provided-copy ( #1643 )
...
[ROCm/rccl commit: 4486d091b8 ]
2025-06-13 08:23:31 -07:00
Arm Patinyasakdikul
7f7f1cede3
Added missing copyright message. ( #1742 )
...
* Added missing copyright message.
* addressed comments.
[ROCm/rccl commit: 6c37ae9470 ]
2025-06-12 09:58:01 -05:00
corey-derochie-amd
2e7aa3556e
Deprecated MSCCL API functions ( #1740 )
...
[ROCm/rccl commit: 03fba66e71 ]
2025-06-11 17:52:09 -06:00
Nusrat Islam
99813a3288
msccl: adjust msccl threshold for bf16 ( #1736 )
...
* msccl: adjust msccl threshold for bf16
* Update src/collectives.cc
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
---------
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
[ROCm/rccl commit: 75c3c8215c ]
2025-06-11 09:09:57 -05:00
Arm Patinyasakdikul
69f7167b74
Fixed erroneous parenthesis. ( #1739 )
...
[ROCm/rccl commit: 600ace7f19 ]
2025-06-11 09:08:00 -05:00
Nilesh M Negi
4cadf3597c
[DEVICE] Adding ability to choose unroll factor at runtime ( #1734 )
...
* Adding runtime unroll factor selection via RCCL_UNROLL_FACTOR
* [BUILD] Add support for user-defined UNROLL for debugging
* Update CHANGELOG.md
* Fix COLLTRACE errors in CI
* Add debug statements for unroll and resolve warnings
* Incorporate UNROLL into ONLY_FUNCS for debugging
---------
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
Co-authored-by: gilbertlee-amd <44450918+gilbertlee-amd@users.noreply.github.com >
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
[ROCm/rccl commit: 9d72be7b2f ]
2025-06-11 00:07:59 -05:00
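The runtime unroll selection in the entry above can be pictured as follows: a compile-time unroll factor is a template parameter, so choosing it at run time means parsing a value (for example from `RCCL_UNROLL_FACTOR`) and dispatching to one of several pre-built instantiations. This is only a host-side sketch under stated assumptions. The names `parseUnrollFactor`, `copySketch`, and `runCopy`, and the accepted set {1, 2, 4}, are hypothetical and do not describe RCCL's actual code.

```cpp
#include <cstdlib>

// Hypothetical parser: accepts only 1, 2, or 4 (an assumed set) and
// returns the fallback for anything missing or malformed.
int parseUnrollFactor(const char* value, int fallback) {
  if (value == nullptr || *value == '\0') return fallback;
  char* end = nullptr;
  long parsed = std::strtol(value, &end, 10);
  if (*end != '\0') return fallback;  // trailing garbage, e.g. "2x"
  if (parsed == 1 || parsed == 2 || parsed == 4) return static_cast<int>(parsed);
  return fallback;
}

// Stand-in for a kernel body: the inner loop has a constant bound, so the
// compiler is free to unroll it fully for each instantiation.
template <int Unroll>
int copySketch(const int* src, int* dst, int n) {
  int i = 0;
  for (; i + Unroll <= n; i += Unroll) {
    for (int u = 0; u < Unroll; ++u) dst[i + u] = src[i + u];
  }
  for (; i < n; ++i) dst[i] = src[i];  // remainder elements
  return Unroll;                        // report which variant ran
}

// Runtime dispatch to a compile-time instantiation.
int runCopy(int unroll, const int* src, int* dst, int n) {
  switch (unroll) {
    case 1: return copySketch<1>(src, dst, n);
    case 2: return copySketch<2>(src, dst, n);
    default: return copySketch<4>(src, dst, n);
  }
}
```

A caller would combine the two steps, for example `runCopy(parseUnrollFactor(std::getenv("RCCL_UNROLL_FACTOR"), 4), src, dst, n)`. The cost of this design is that every selectable factor must be compiled in ahead of time.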
Atul Kulkarni
4cd71722f2
Added new ENABLE_CODE_COVERAGE option. ( #1664 )
...
Modified install.sh script to add this new option
[ROCm/rccl commit: 682ed36fe6 ]
2025-06-10 12:12:36 -05:00
Nilesh M Negi
b797b62f6b
[DEVICE] Use threadfence on gfx950 for LL protocol ( #1686 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
[ROCm/rccl commit: b926203c05 ]
2025-06-09 01:26:07 -05:00
Nilesh M Negi
7abc3160e7
[BUILD] Enable LL128 on gfx950 ( #1731 )
...
* [BUILD] Enable LL128 on gfx950
* Modify comment in src/rccl_wrap.cc
* Update CHANGELOG
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
[ROCm/rccl commit: ef5b4ff630 ]
2025-06-09 00:25:54 -05:00
vstojilj
bbe7422279
SWDEV-536040 - Include <thread> header ( #1724 )
...
[ROCm/rccl commit: 2ac44cfe4e ]
2025-06-06 10:28:11 -06:00
Arm Patinyasakdikul
f65777536f
Remove 'warpSize' compiler constant as it is deprecated in ROCm 7.0. ( #1720 )
...
* Remove 'warpSize' compiler constant as it is deprecated in ROCm 7.0.
* Create ncclShmemScratchWarpSize on host side for enqueue.cc.
* Update src/enqueue.cc
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
* address comments
* fix number of threads
---------
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
[ROCm/rccl commit: ec6efa9b26 ]
2025-06-06 07:34:43 -05:00
Arm Patinyasakdikul
8dd9747504
Increase default WORK_FIFO size to accommodate larger alltoall. ( #1722 )
...
[ROCm/rccl commit: d5b5f6b159 ]
2025-06-05 09:02:45 -05:00
Pedram Alizadeh
1ace5d05ed
Reapplying PR #1641 [AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 ( #1713 )
...
* Reapply "[AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1641 )"
This reverts commit 943ad6f7820739385a0b54e81f823d0df1dbf71c.
* Decreasing NCCL_LL128_SHMEM_ELEMS_PER_THREAD from 16 to 8
[ROCm/rccl commit: 3f7c08648f ]
2025-06-04 13:22:11 -04:00
Avinash
a50ff2c3d3
SPLITCOMM design fix in src/misc/msccl ( #1715 )
...
* Fix TOC-TOU in mcclInit
* Improving vector resize thread safety
* Initial commit rank to comm change
* Removing unwanted include header changes
* Updated CHANGELOG.md
* Update CHANGELOG.md
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
---------
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
[ROCm/rccl commit: e94b360246 ]
2025-06-01 21:00:38 -05:00
alex-breslow-amd
4277b5aa88
Use One Slice per Basic Primitive for AllReduce, ReduceScatter, AllGather ( #1681 ) for Single Node on Some GFX9 Systems
...
Using a single slice rather than the typical two provides about 5% speedup (sometimes more or less) on some GFX9 systems for single node.
[ROCm/rccl commit: 2f6b20c00a ]
2025-05-29 16:17:35 -07:00
Nilesh M Negi
19ed482121
Re-apply unroll=1 and 112 channels for gfx950 ( #1706 )
...
* Reapply "[SRC] Enable unroll=1 for gfx950 (#1602 )" (#1667 )
This reverts commit a6972c0d09 .
* Reapply "[GRAPH] Increase default nChannels to 112 for gfx950 (#1596 )" (#1620 )
This reverts commit 1a2eca1756 .
[ROCm/rccl commit: 12517a957e ]
2025-05-28 14:58:10 -05:00
corey-derochie-amd
22120c6303
Fixed errors in the CHANGELOG for ROCm 7.0 ( #1702 )
...
* Updated 6.5 release to be 7.0
* Corrected the RCCL version for 6.4.1
* Moved items to the correct releases
* Added NCCL 2.25.1 compatibility item
* Fixed wording
* Added entry for `ManagedMem` and `ManagedMemGraph` test fix
[ROCm/rccl commit: 7b633d5844 ]
2025-05-23 15:47:59 -05:00
akolliasAMD
6e2f75d424
remove user from code owner file ( #1709 )
...
[ROCm/rccl commit: aabd181fe4 ]
2025-05-23 15:45:15 -05:00
Arm Patinyasakdikul
59597ad8a7
Test: bump max stacksize once again to match current expectation.
...
[ROCm/rccl commit: c07445d5b4 ]
2025-05-23 11:18:25 -05:00
alex-breslow-amd
056ca0edfa
Make offload-compress the default ( #1704 )
...
* Make offload-compress the default
* Add guard for --offload-compress since it was introduced in ROCm 6.2
* Address some of Nilesh's feedback.
* Reorganize for code cleanliness
* Improve comment
* Compress gpu code at link and compile time
[ROCm/rccl commit: f5b44acb1b ]
2025-05-22 22:33:25 -05:00
Nilesh M Negi
7803531f46
[DOCKER] Fix RCCL and RCCL-Tests build for stg1 base images ( #1699 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
[ROCm/rccl commit: 948d2b6a68 ]
2025-05-22 20:46:01 -05:00
Arm Patinyasakdikul
2cb65ba466
Test: Change max stack size to 520 to accommodate new ROCm changes.
...
[ROCm/rccl commit: 523e0893e4 ]
2025-05-21 20:21:27 -05:00
PedramAlizadeh
a99f960742
Revert "[AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 ( #1641 )"
...
This reverts commit 951ed9cde1 .
[ROCm/rccl commit: 7f878baef0 ]
2025-05-21 20:21:27 -05:00
isaki001
89bc9131aa
fix improper patch reverse order ( #1696 )
...
[ROCm/rccl commit: 66ef428714 ]
2025-05-19 12:29:21 -05:00
Arm Patinyasakdikul
1313bccaca
CHANGELOG.md: Add UT failures as known issue for 6.4.1. ( #1698 )
...
* CHANGELOG.md: Add UT failures as known issue for 6.4.1.
---------
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
[ROCm/rccl commit: 1710c27e77 ]
2025-05-19 10:40:50 -05:00
Arm Patinyasakdikul
3e16753c71
Added known issue for 6.4.1 release to CHANGELOG.md. ( #1697 )
...
[ROCm/rccl commit: e602497789 ]
2025-05-16 08:17:48 -05:00
Sam Wu
0db42fb854
Remove call to junit from math ci ( #1691 )
...
[ROCm/rccl commit: e5bf7bc5b1 ]
2025-05-15 14:45:49 -06:00
Arm Patinyasakdikul
4b5ff98d65
Change GPU references to gfx950. ( #1695 )
...
[ROCm/rccl commit: f306c00671 ]
2025-05-15 10:32:46 -05:00
corey-derochie-amd
65d67dce7a
Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues. ( #1546 )
...
* Revert "Revert "replacing rccl_float8 with hip_fp8 and address compatibility …"
This reverts commit 30eecfdb25 .
* [UT] Modify max stack size to 496
* adding a check for OCP type and replacing ROCM_VERSION with HIP_VERSION
* addressing the ci failure
* Adding the device tag
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com >
[ROCm/rccl commit: 170acf3bda ]
2025-05-14 15:33:03 -05:00
Mustafa Abduljabbar
951ed9cde1
[AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 ( #1641 )
...
* Update LL128 elems per thread
* Precompute ix[g] in LL128 prim
* Make Threadthreshold part of tuning models
* Ignore channel tuning when channels are env controlled
* Tune LL128 max limit for AG
* Tune LL128 max limit for RS
* Retune AR LL128 limits due to changes
* Update CHANGELOG.md
---------
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
[ROCm/rccl commit: 00c1eb098c ]
2025-05-14 14:35:54 -05:00
Dingming Wu
3731cae1b7
Detect if HSA_NO_SCRATCH_RECLAIM is set after initEnv() ( #1683 )
...
* Detect if HSA_NO_SCRATCH_RECLAIM is set after initEnv()
For ROCm versions older than 6.4, HSA_NO_SCRATCH_RECLAIM=1 must be set to use the LL128 protocol.
This environment variable is set outside of RCCL, so add logging to detect whether it is set at runtime.
* check hip runtime ver via hipRuntimeGetVersion
* move the detection to ncclinit func
* correct rocm version integer
* update warning message
* avoid unnecessary info msg on hsa_no_scratch_reclaim detection
[ROCm/rccl commit: 51f87fbb43 ]
2025-05-14 10:12:45 -05:00
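The condition described in the entry above (ROCm older than 6.4 needs `HSA_NO_SCRATCH_RECLAIM=1` for LL128) can be sketched as a small predicate. The function name and the choice to pass the version as separate major and minor numbers are assumptions made for illustration; the commit itself obtains the version through `hipRuntimeGetVersion`, and how it decodes that integer is not shown here.

```cpp
#include <cstring>

// Hypothetical predicate: returns true when a warning should be logged,
// i.e. the ROCm version is older than 6.4 and the environment variable
// is not set to "1". envValue is the result of
// std::getenv("HSA_NO_SCRATCH_RECLAIM") and may be nullptr.
bool shouldWarnAboutScratchReclaim(int rocmMajor, int rocmMinor, const char* envValue) {
  const bool olderThan64 = rocmMajor < 6 || (rocmMajor == 6 && rocmMinor < 4);
  const bool envEnabled = envValue != nullptr && std::strcmp(envValue, "1") == 0;
  return olderThan64 && !envEnabled;
}
```

Keeping the check as a pure function of its inputs makes it testable without a GPU runtime; the caller supplies the version and the environment value.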
mberenjk
08c0b8b0fc
moving the thread_fence to apply before atomic fetch ( #1672 )
...
* applying thread_fence only on warp 0 before atomic fetch
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com >
[ROCm/rccl commit: 1cefcee51f ]
2025-05-14 10:10:05 -05:00
Mustafa Abduljabbar
128b0e7074
Remove MSCCL single node AllGather XMLs ( #1693 )
...
* Remove MSCCL single node XMLs
* Remove comment on MSCCL AG single node support
[ROCm/rccl commit: d665547eef ]
2025-05-13 17:07:03 -05:00
Nikhil-Nunna
ad657d957a
Updated Codeowners ( #1692 )
...
[ROCm/rccl commit: a72a1939d1 ]
2025-05-12 18:58:39 -05:00
gilbertlee-amd
6e57154001
Fix when more than 64 channels are used for multi-collective group calls ( #1688 )
...
* Fix when more than 64 channels are used for multi-collective group calls
* Update CHANGELOG.md
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
---------
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com >
[ROCm/rccl commit: 9ef45df8f7 ]
2025-05-12 18:05:57 -05:00
Avinash
6d6dd8434a
RCCL Multinode DMA Buffer crash fix ( #1682 )
...
This commit handles DMABUF initialization and calls the appropriate handling function. This fixes a crash on operating systems that have no peermem support and rely only on DMABUF.
* Initial test commit
* Handling Dmabuf_fd opening and closing
* Cleanup
* Use DMABUF or peermem as needed
* Using user input for ibDmaBufSupportInitOnce
* Revert all changes to rocmwrap.cc
* Revert all changes to rocmwrap.cc
* Changing to func definition braces
* Reverting line removal in utils.h
* useDmaBuf to calculate flushEnabled
[ROCm/rccl commit: 5f6805b4f4 ]
2025-05-08 19:17:39 -05:00
mberenjk
743cc971d3
Write JSON file to /tmp directory to avoid incorrect write access in recorderTest ( #1680 )
...
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com >
[ROCm/rccl commit: e70003736e ]
2025-05-07 13:58:27 -05:00
Avinash
c81ea25407
collective trace improvements for debugging ( #1661 )
...
[ROCm/rccl commit: c54a0c085a ]
2025-05-07 13:37:31 -05:00
Bertan Dogancay
c75ebd9147
Merge pull request #1662 from BertanDogancay/2.25
...
[SYNC] 2.25.1-1
[ROCm/rccl commit: 590ad6acc2 ]
2025-05-06 09:39:09 -04:00
Mustafa Abduljabbar
750bd73047
Add missing MACRO to topo_expl ( #1677 )
...
* Fix header compatibility
[ROCm/rccl commit: fdad89690b ]
2025-05-05 15:58:57 -04:00
Mustafa Abduljabbar
ab4a3eb0c1
Fix topo explorer's compatibility with NCCL 2.24 ( #1671 )
...
* Fix build issues
* Fix failure to find path remote rank
[ROCm/rccl commit: f3f3336468 ]
2025-05-05 15:26:29 -04:00
Siu Chi Chan
be0761502d
rccl-UnitTests - link to dl library ( #1673 )
...
[ROCm/rccl commit: 9525c5b2ef ]
2025-05-02 21:20:22 -05:00
Bertan Dogancay
b435c75068
[Graph] Try using P2P by default ( #1670 )
...
[ROCm/rccl commit: acfac55516 ]
2025-05-02 11:54:30 -04:00
Nilesh M Negi
a6972c0d09
Revert "[SRC] Enable unroll=1 for gfx950 ( #1602 )" ( #1667 )
...
* Revert "[SRC] Enable unroll=1 for gfx950 (#1602 )"
This reverts commit 210f90ae0f .
* Update Changelog
---------
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
[ROCm/rccl commit: 329e13efff ]
2025-04-30 23:33:08 -05:00
deeksha-amd
5580cb7574
Added new tests for improving the code coverage ( #1656 )
...
Signed-off-by: Deeksha Goplani <deeksha.goplani@amd.com >
[ROCm/rccl commit: 2486838465 ]
2025-04-30 18:01:11 -05:00