1770 Commits

Author SHA1 Message Date
Nilesh M Negi 7c422271a8 [MSCCLPP] Disable MSCCLPP Executor (#1744)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 92a5d225d9]
2025-06-17 01:29:55 -05:00
Sarat Kamisetty e359e834f5 generic net plugin ctxt that is extensible for use in multiple APIs (#1735)
Co-authored-by: Sarat Kamisetty <sakamiset@amd.com>

[ROCm/rccl commit: fa0422f174]
2025-06-16 14:48:08 -07:00
Bertan Dogancay 9fa53cc454 [NPKit] Use default output directory when env var is not set (#1747)
[ROCm/rccl commit: 39211c6b41]
2025-06-16 15:26:53 -04:00
Mustafa Abduljabbar 3e5dc99aa6 Fix topo_explorer compatibility and capture WarpSize (#1743)
[ROCm/rccl commit: fb4ad82d0d]
2025-06-16 08:18:35 -04:00
Tim 7051f217a7 replayer update v0 (#1733)
* First version of new replayer, with comments on future TODOs

* plus minor fixes for UT

* Updated format of recorder, especially in binary department, according to replayer's need

[ROCm/rccl commit: ba97c9c18b]
2025-06-13 15:05:34 -04:00
Richard Barnes 2c0cc20a76 Enable -Wdeprecated-copy-with-user-provided-copy (#1643)
[ROCm/rccl commit: 4486d091b8]
2025-06-13 08:23:31 -07:00
Arm Patinyasakdikul 7f7f1cede3 Added missing copyright message. (#1742)
* Added missing copyright message.

* addressed comments.

[ROCm/rccl commit: 6c37ae9470]
2025-06-12 09:58:01 -05:00
corey-derochie-amd 2e7aa3556e Deprecated MSCCL API functions (#1740)
[ROCm/rccl commit: 03fba66e71]
2025-06-11 17:52:09 -06:00
Nusrat Islam 99813a3288 msccl: adjust msccl threshold for bf16 (#1736)
* msccl: adjust msccl threshold for bf16

* Update src/collectives.cc

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 75c3c8215c]
2025-06-11 09:09:57 -05:00
Arm Patinyasakdikul 69f7167b74 Fixed erroneous parenthesis. (#1739)
[ROCm/rccl commit: 600ace7f19]
2025-06-11 09:08:00 -05:00
Nilesh M Negi 4cadf3597c [DEVICE] Adding ability to choose unroll factor at runtime (#1734)
* Adding runtime unroll factor selection via RCCL_UNROLL_FACTOR
* [BUILD] Add support for user-defined UNROLL for debugging
* Update CHANGELOG.md
* Fix COLLTRACE errors in CI
* Add debug statements for unroll and resolve warnings
* Incorporate UNROLL into ONLY_FUNCS for debugging

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: gilbertlee-amd <44450918+gilbertlee-amd@users.noreply.github.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 9d72be7b2f]
2025-06-11 00:07:59 -05:00
Atul Kulkarni 4cd71722f2 Added new ENABLE_CODE_COVERAGE option. (#1664)
Modified install.sh script to add this new option

[ROCm/rccl commit: 682ed36fe6]
2025-06-10 12:12:36 -05:00
Nilesh M Negi b797b62f6b [DEVICE] Use threadfence on gfx950 for LL protocol (#1686)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: b926203c05]
2025-06-09 01:26:07 -05:00
Nilesh M Negi 7abc3160e7 [BUILD] Enable LL128 on gfx950 (#1731)
* [BUILD] Enable LL128 on gfx950
* Modify comment in src/rccl_wrap.cc
* Update CHANGELOG

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: ef5b4ff630]
2025-06-09 00:25:54 -05:00
vstojilj bbe7422279 SWDEV-536040 - Include <thread> header (#1724)
[ROCm/rccl commit: 2ac44cfe4e]
2025-06-06 10:28:11 -06:00
Arm Patinyasakdikul f65777536f Remove 'warpSize' compiler constant as it is deprecated in ROCm 7.0. (#1720)
* Remove 'warpSize' compiler constant as it is deprecated in ROCm 7.0.

* Create ncclShmemScratchWarpSize on host side for enqueue.cc.

* Update src/enqueue.cc

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

* address comments

* fix number of threads

---------

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: ec6efa9b26]
2025-06-06 07:34:43 -05:00
Arm Patinyasakdikul 8dd9747504 Increase default WORK_FIFO size to accommodate larger alltoall. (#1722)
[ROCm/rccl commit: d5b5f6b159]
2025-06-05 09:02:45 -05:00
Pedram Alizadeh 1ace5d05ed Reapplying PR #1641 [AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1713)
* Reapply "[AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1641)"

This reverts commit 943ad6f7820739385a0b54e81f823d0df1dbf71c.

* Decreasing NCCL_LL128_SHMEM_ELEMS_PER_THREAD from 16 to 8

[ROCm/rccl commit: 3f7c08648f]
2025-06-04 13:22:11 -04:00
Avinash a50ff2c3d3 SPLITCOMM design fix in src/misc/msccl (#1715)
* Fix TOC-TOU in mcclInit

* Improving vector resize thread safety

* Initial commit rank to comm change

* Removing unwanted include header changes

* Updated CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: e94b360246]
2025-06-01 21:00:38 -05:00
alex-breslow-amd 4277b5aa88 Use One Slice per Basic Primitive for AllReduce, ReduceScatter, AllGather (#1681) for Single Node on Some GFX9 Systems
Using a single slice rather than the typical two provides about 5% speedup (sometimes more or less) on some GFX9 systems for single node.

[ROCm/rccl commit: 2f6b20c00a]
2025-05-29 16:17:35 -07:00
Nilesh M Negi 19ed482121 Re-apply unroll=1 and 112 channels for gfx950 (#1706)
* Reapply "[SRC] Enable unroll=1 for gfx950 (#1602)" (#1667)
This reverts commit a6972c0d09.

* Reapply "[GRAPH] Increase default nChannels to 112 for gfx950 (#1596)" (#1620)
This reverts commit 1a2eca1756.

[ROCm/rccl commit: 12517a957e]
2025-05-28 14:58:10 -05:00
corey-derochie-amd 22120c6303 Fixed errors in the CHANGELOG for ROCm 7.0 (#1702)
* Updated 6.5 release to be 7.0
* Corrected the RCCL version for 6.4.1
* Moved items to the correct releases
* Added NCCL 2.25.1 compatibility item
* Fixed wording
* Added entry for `ManagedMem` and `ManagedMemGraph` test fix

[ROCm/rccl commit: 7b633d5844]
2025-05-23 15:47:59 -05:00
akolliasAMD 6e2f75d424 remove user from code owner file (#1709)
[ROCm/rccl commit: aabd181fe4]
2025-05-23 15:45:15 -05:00
Arm Patinyasakdikul 59597ad8a7 Test: bump max stacksize once again to match current expectation.
[ROCm/rccl commit: c07445d5b4]
2025-05-23 11:18:25 -05:00
alex-breslow-amd 056ca0edfa Make offload-compress the default (#1704)
* Make offload-compress the default
* Add guard for --offload-compress since it was introduced in ROCm 6.2
* Address some of Nilesh's feedback.
* Reorganize for code cleanliness
* Improve comment
* Compress gpu code at link and compile time

[ROCm/rccl commit: f5b44acb1b]
2025-05-22 22:33:25 -05:00
Nilesh M Negi 7803531f46 [DOCKER] Fix RCCL and RCCL-Tests build for stg1 base images (#1699)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 948d2b6a68]
2025-05-22 20:46:01 -05:00
Arm Patinyasakdikul 2cb65ba466 Test: Change max stack size to 520 to accommodate new ROCm changes.
[ROCm/rccl commit: 523e0893e4]
2025-05-21 20:21:27 -05:00
PedramAlizadeh a99f960742 Revert "[AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1641)"
This reverts commit 951ed9cde1.


[ROCm/rccl commit: 7f878baef0]
2025-05-21 20:21:27 -05:00
isaki001 89bc9131aa fix improper patch reverse order (#1696)
[ROCm/rccl commit: 66ef428714]
2025-05-19 12:29:21 -05:00
Arm Patinyasakdikul 1313bccaca CHANGELOG.md: Add UT failures as known issue for 6.4.1. (#1698)
* CHANGELOG.md: Add UT failures as known issue for 6.4.1.

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 1710c27e77]
2025-05-19 10:40:50 -05:00
Arm Patinyasakdikul 3e16753c71 Added known issue for 6.4.1 release to CHANGELOG.md. (#1697)
[ROCm/rccl commit: e602497789]
2025-05-16 08:17:48 -05:00
Sam Wu 0db42fb854 Remove call to junit from math ci (#1691)
[ROCm/rccl commit: e5bf7bc5b1]
2025-05-15 14:45:49 -06:00
Arm Patinyasakdikul 4b5ff98d65 Change GPU references to gfx950. (#1695)
[ROCm/rccl commit: f306c00671]
2025-05-15 10:32:46 -05:00
corey-derochie-amd 65d67dce7a Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues. (#1546)
* Revert "Revert "replacing rccl_float8 with hip_fp8 and address compatibility …"

This reverts commit 30eecfdb25.

* [UT] Modify max stack size to 496

* adding a check for OCP type and replacing ROCM_VERSION with HIP_VERSION

* addressing the ci failure

* Adding the device tag

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 170acf3bda]
2025-05-14 15:33:03 -05:00
Mustafa Abduljabbar 951ed9cde1 [AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1641)
* Update LL128 elems per thread

* Precompute ix[g] in LL128 prim

* Make Threadthreshold part of tuning models

* Ignore channel tuning when channels are env controlled

* Tune LL128 max limit for AG

* Tune LL128 max limit for RS

* Retune AR LL128 limits due to changes

* Update CHANGELOG.md

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 00c1eb098c]
2025-05-14 14:35:54 -05:00
Dingming Wu 3731cae1b7 Detect if HSA_NO_SCRATCH_RECLAIM is set after initEnv() (#1683)
* Detect if HSA_NO_SCRATCH_RECLAIM is set after initEnv()

For ROCm older than 6.4, HSA_NO_SCRATCH_RECLAIM=1 must be set to use the LL128 protocol.
This env var is set outside of RCCL, so add logging to detect whether it's set at runtime.

* check hip runtime ver via hipRuntimeGetVersion

* move the detection to ncclinit func

* correct rocm version integer

* update warning message

* avoid unnecessary info msg on hsa_no_scratch_reclaim detection

[ROCm/rccl commit: 51f87fbb43]
2025-05-14 10:12:45 -05:00
mberenjk 08c0b8b0fc moving the thread_fence to apply before atomic fetch (#1672)
* applying thread_fence only on warp 0 before atomic fetch

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 1cefcee51f]
2025-05-14 10:10:05 -05:00
Mustafa Abduljabbar 128b0e7074 Remove MSCCL single node AllGather XMLs (#1693)
* Remove MSCCL single node XMLs

* Remove comment on MSCCL AG single node support

[ROCm/rccl commit: d665547eef]
2025-05-13 17:07:03 -05:00
Nikhil-Nunna ad657d957a Updated Codeowners (#1692)
[ROCm/rccl commit: a72a1939d1]
2025-05-12 18:58:39 -05:00
gilbertlee-amd 6e57154001 Fix when more than 64 channels are used for multi-collective group calls (#1688)
* Fix when more than 64 channels are used for multi-collective group calls

* Update CHANGELOG.md

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 9ef45df8f7]
2025-05-12 18:05:57 -05:00
Avinash 6d6dd8434a RCCL Multinode DMA Buffer crash fix (#1682)
This commit handles DMABUF initialization and calls the appropriate handling function. This fixes a crash on OSes that have no peermem support and rely only on DMABUF.

* Initial test commit
* Handling Dmabuf_fd opening and closing
* Cleanup
* Use DMABuff or Peermem as needed
* Using user input for ibDmaBufSupportInitOnce
* Revert all changes to rocmwrap.cc
* Revert all changes to rocmwrap.cc
* Changing to func definition braces
* Reverting line removal in utils.h
* useDmaBuf to calculate flushEnabled

[ROCm/rccl commit: 5f6805b4f4]
2025-05-08 19:17:39 -05:00
mberenjk 743cc971d3 Write JSON file to /tmp directory to avoid incorrect write access in recorderTest (#1680)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: e70003736e]
2025-05-07 13:58:27 -05:00
Avinash c81ea25407 collective trace improvements for debugging (#1661)
[ROCm/rccl commit: c54a0c085a]
2025-05-07 13:37:31 -05:00
Bertan Dogancay c75ebd9147 Merge pull request #1662 from BertanDogancay/2.25
[SYNC] 2.25.1-1

[ROCm/rccl commit: 590ad6acc2]
2025-05-06 09:39:09 -04:00
Mustafa Abduljabbar 750bd73047 Add missing MACRO to topo_expl (#1677)
* Fix header compatibility

[ROCm/rccl commit: fdad89690b]
2025-05-05 15:58:57 -04:00
Mustafa Abduljabbar ab4a3eb0c1 Fix topo explorer's compatibility with NCCL 2.24 (#1671)
* Fix build issues

* Fix failure to find path remote rank


[ROCm/rccl commit: f3f3336468]
2025-05-05 15:26:29 -04:00
Siu Chi Chan be0761502d rccl-UnitTests - link to dl library (#1673)
[ROCm/rccl commit: 9525c5b2ef]
2025-05-02 21:20:22 -05:00
Bertan Dogancay b435c75068 [Graph] Try using P2P by default (#1670)
[ROCm/rccl commit: acfac55516]
2025-05-02 11:54:30 -04:00
Nilesh M Negi a6972c0d09 Revert "[SRC] Enable unroll=1 for gfx950 (#1602)" (#1667)
* Revert "[SRC] Enable unroll=1 for gfx950 (#1602)"
This reverts commit 210f90ae0f.

* Update Changelog

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 329e13efff]
2025-04-30 23:33:08 -05:00
deeksha-amd 5580cb7574 Added new tests for improving the code coverage (#1656)
Signed-off-by: Deeksha Goplani <deeksha.goplani@amd.com>

[ROCm/rccl commit: 2486838465]
2025-04-30 18:01:11 -05:00