Commit Graph

1740 commits

Author SHA1 Message Date
Arm Patinyasakdikul 8557ea33ad Test: delete child object to address memory leak. (#1863)
[ROCm/rccl commit: 9d3acffa5f]
2025-08-20 10:15:03 -05:00
Arm Patinyasakdikul d4fecfb0be Remove noinline attribute from reduceCopyPacks and (#1864)
reduceCopyPacksWithBias.

[ROCm/rccl commit: fb882e80f6]
2025-08-19 20:24:31 -05:00
Atul Kulkarni 8c5095dd94 Added new code owners (#1869)
[ROCm/rccl commit: 231449c896]
2025-08-19 16:32:25 -05:00
Mustafa Abduljabbar 5025a9aab9 Have ncclDevFuncId use 64-Bit keyed map with field packing (#1857)
- Updated ncclDevFuncId to use a hash-based lookup with std::unordered_map.
- Keys are now 64-bit integers, which pack coll, algo, proto, devRedOp, and type fields.
- Improved flexibility and maintainability by moving away from row-based indexing.
- Added error handling for missing keys in the hash map.
- Aligned key generation logic with generate.py and updated generate.py.

[ROCm/rccl commit: c1b3cd8911]
2025-08-19 16:41:19 -04:00
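
The packed-key lookup described in the ncclDevFuncId commit above (#1857) can be illustrated with a minimal C++ sketch. The 8-bit field widths, the field order, and the names `packKey` and `FuncIdTable` are assumptions made for illustration; they are not taken from the RCCL sources or from generate.py.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Assumed layout: each field occupies 8 bits of a 64-bit key.
// The real RCCL layout may use different widths and ordering.
constexpr std::uint64_t packKey(std::uint8_t coll, std::uint8_t algo,
                                std::uint8_t proto, std::uint8_t devRedOp,
                                std::uint8_t type) {
  return (static_cast<std::uint64_t>(coll) << 32) |
         (static_cast<std::uint64_t>(algo) << 24) |
         (static_cast<std::uint64_t>(proto) << 16) |
         (static_cast<std::uint64_t>(devRedOp) << 8) |
         static_cast<std::uint64_t>(type);
}

// Compile-time check that the fields land where the layout says they do.
static_assert(packKey(1, 2, 0, 3, 4) == 0x0102000304ULL,
              "unexpected key layout");

// Hypothetical table mapping packed keys to device function ids.
class FuncIdTable {
 public:
  void add(std::uint64_t key, int funcId) { table_.emplace(key, funcId); }

  // Returns std::nullopt for a missing key so the caller can report an error
  // instead of silently indexing out of range.
  std::optional<int> find(std::uint64_t key) const {
    auto it = table_.find(key);
    if (it == table_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::unordered_map<std::uint64_t, int> table_;
};
```

A caller would build the key from the five fields and treat an empty result as the "missing key" error case mentioned in the commit message.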
Nusrat Islam e4c025e5cd device: optimize threadfence for ll64 protocol (#1858)
* device: optimize threadfence for ll64 protocol

* device: use __atomic_signal_fence()

---------

Co-authored-by: Nusrat Islam <nusislam@useocpslog-003.amd.com>

[ROCm/rccl commit: 6ade5065b4]
2025-08-18 09:16:41 -05:00
ishkool 377160e0c9 Code Coverage: Proxy.cc tests (#1818)
* Proxy.cc tests

* Update ProxyTest.cpp

Cleaned up the code.

* Update ProxyTests.cpp

Bring back deleting dynamically allocated memory

[ROCm/rccl commit: 876f985e0f]
2025-08-15 19:06:32 -05:00
Atul Kulkarni 38e88ba87e Added new unit tests for src/enqueue.cc (#1853)
[ROCm/rccl commit: 84f3cc6a02]
2025-08-15 18:26:26 -05:00
ishkool 61a189bc84 Code Coverage Unit Tests for comm.h (#1783)
* File containing test for comm.h

* Update CommTest.cpp

Added gtest API for assert

* Update CommTest.cpp

Adding copyright

* Update CommTest.cpp

Removing info and tested as not required.

* Update and rename CommTest.cpp to CommTests.cpp

* Update CMakeLists.txt

[ROCm/rccl commit: 6453273aa6]
2025-08-15 17:44:24 -05:00
Nilesh M Negi ed4abedf7b [DEVICE] Use noinline for LLGenericOp only on gfx950 (#1849)
[ROCm/rccl commit: c3b8de4ec8]
2025-08-15 15:15:02 -05:00
isaki001 2e9a2d1762 [TUNING] gfx950 16N tuning (#1835)
* change gfx950 algo/proto selection for multinode allreduce, allgather, reduceScatter
* gfx950 tuning: enable tuning for broadcast, allreduce starts LL128 earlier and switches to ring earlier, change LL128 start for allgather and reduceScatter
* lower LL128 threshold
* update reduceScatter LL128 min to match LL max for consistency
* enable multinode PXN and increase chunksize for gfx950
* change LL128 start to 128KB, adjust ring-start according to node-count
* disable code-path for fused-AR on LL128 for gfx950
* use LL128 starting from 1KB for multinode allgather on gfx950
* start LL128 earlier for multinode reduceScatter on gfx950
* start LL128 earlier for multinode broadcast on gfx950
* set multinode allreduce to start simple on 64MB for gfx950
* start LL128 from 1KB for multinode broadcast on gfx950
* setting multinode AR to use tree instead of ring at 16MB, 64MB, 128MB
* set multinode broadcast to use LL for up to 256KB depending on node-count for gfx950
* adjust algo for 32MB multinode allreduce on gfx950
* make 32MB tree LL128 for multinode AR on gfx950
* make sure ring is not picked on 2N allreduce on small sizes

[ROCm/rccl commit: 44121db890]
2025-08-15 15:12:45 -05:00
alex-breslow-amd dc3a0c5242 Disable the __threadfence on the sender side of the simple protocol when possible. (#1830)
Leverages the traits of extended-scope fine-grain memory to get rid of a device-scope acquire-release fence.  This improves throughput for single node workloads on gfx942 and gfx950 for some input sizes (e.g., ~32 MiB to about 256 MiB) when using the simple protocol.  Multinode workloads on MI300X see a smaller but statistically significant uplift for some message sizes.  Runtime disablement is supported via setting the environment variable RCCL_GFX942_CHEAP_FENCE_ON to 0.

[ROCm/rccl commit: 1aa2570b48]
2025-08-15 07:54:54 -07:00
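
As a rough illustration of the runtime switch mentioned in the commit above (#1830), the following C++ sketch reads the environment variable and disables the optimization only when it is set to 0. Only the variable name RCCL_GFX942_CHEAP_FENCE_ON and the meaning of the value 0 come from the commit message; the helper names and the choice to treat an unset variable or any other value as "enabled" are assumptions.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: the optimization stays on unless the value is exactly "0".
bool cheapFenceEnabled(const char* value) {
  return value == nullptr || std::strcmp(value, "0") != 0;
}

// Hypothetical wrapper that consults the process environment.
bool cheapFenceEnabledFromEnv() {
  return cheapFenceEnabled(std::getenv("RCCL_GFX942_CHEAP_FENCE_ON"));
}
```

Splitting the parsing from the `std::getenv` call keeps the decision logic testable without modifying the environment.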
mberenjk c76a4492f1 Added useAcc as a template parameter to address the performance regression (#1856)
* Added useAcc as a template parameter to address the 2% performance regression in allreduceWithBias
---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>


[ROCm/rccl commit: c61152baa4]
2025-08-14 15:58:54 -05:00
Adel Johar d3e9db9432 Docs: Add environment variables reference page
[ROCm/rccl commit: aaf8613b76]
2025-08-14 09:55:28 +02:00
Karthikeyan Arumugam 16d0871985 Add cstring header explicitly as it is removed from HIP (#1859)
[ROCm/rccl commit: 6d41e5ba99]
2025-08-13 15:14:22 -07:00
Rahul Vaidya baa6a61535 [BUILD] Fix UT packaging on Debian family OS (#1854)
* Fix UT packaging on Debian family OSes

Signed-off-by: ravaidya <ravaidya@amd.com>

* Split OR condition when performing Debian checks

Signed-off-by: ravaidya <ravaidya@amd.com>

---------

Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl commit: ee9ed3ef87]
2025-08-11 17:03:16 -05:00
Chris Sosa 584413b2cb Add CI Badge for tracking CI status in prep for gating changes (#1851)
This PR is intended to move RCCL to gating changes on CI failures. Right now, only build/unittests run per PR consistently. We should eventually add all single and multi-node test status badges once those tests are running in presubmit and continuously on develop

[ROCm/rccl commit: 53977821b5]
2025-08-11 14:02:46 -07:00
Nilesh M Negi 74adb64dfb [BUILD] Fix UT packaging on Debian OS (#1848)
[ROCm/rccl commit: 5036d0e713]
2025-08-11 09:43:26 -05:00
Rahul Vaidya 70a5f2f317 Fix rccl-UnitTests packaging on Debian systems (#1846)
Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl commit: cbbc713b03]
2025-08-08 12:28:56 -05:00
isaki001 52d33058bb enable more events for LL128 NPKIT trace collection (#1827)
[ROCm/rccl commit: 74d82a8145]
2025-08-07 11:19:36 -05:00
awelling2801 c5b4e1bc78 Created coverage tests for rccl_wrap (#1694)
* Created coverage tests for rccl_wrap

RCCL_EXPOSE_STATIC off by default

Coverage tests for rccl_wrap.cc

* Remove RCCL_EXPOSE_STATIC dependency

* Removed Rcclwrap.RcclGetAlgoInfoTest

* Remove comments

* Corrected RCCL_EXPOSE_STATIC definition logic

---------

Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>
Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>

[ROCm/rccl commit: 82bea39280]
2025-08-06 14:48:00 -05:00
Avinash f34d760613 Compiler warnings fix 2 (#1801)
* Changes to device code

* Changes to src/misc

* Changes to graph

* src/include changes

* src/transport changes

* changes in init, enqueue, proxy

* Changes to CMakeLists.txt

* Additional changes to device code

* Additional changes to net.cc

* adding 'compiler warning' tag to ease upstream merge

* typo correction

* Addressing comments

* Additional changes for new commits

[ROCm/rccl commit: 3f8cac388e]
2025-08-05 17:36:23 -05:00
Arm Patinyasakdikul df3b7e477f Disable context tracking for the current version. (#1839)
[ROCm/rccl commit: 6fc228e247]
2025-08-04 10:48:00 -05:00
Atul Kulkarni 35283394ed Add unit tests for graph/xml.cc & graph/xml.h (#1833)
* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.

* Added new unit tests for src/transport/shm.cc

* Added new unit tests for graph/xml.cc

[ROCm/rccl commit: 0e7d7da55d]
2025-08-01 14:20:27 -05:00
Atul Kulkarni e550ba1e3b Update help text in README (#1837)
[ROCm/rccl commit: e2c9f2feab]
2025-08-01 14:19:27 -05:00
awelling2801 0d34963b35 Added tests for coll_reg (#1700)
Changes to coll_reg

Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>

[ROCm/rccl commit: 5ecc1b7ede]
2025-07-31 13:49:23 -05:00
dependabot[bot] b6639c85f4 Bump urllib3 from 2.2.2 to 2.5.0 in /docs/sphinx (#1751)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.2.2 to 2.5.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.2.2...2.5.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.5.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: 32e95963dc]
2025-07-31 11:25:45 -06:00
dependabot[bot] e31001e378 Bump rocm-docs-core from 1.18.2 to 1.22.0 in /docs/sphinx (#1836)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.18.2 to 1.22.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.2...v1.22.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-version: 1.22.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: 1acc3eb6c1]
2025-07-31 11:15:01 -06:00
awelling2801 839fcb54b5 Added tests for transport.cc (#1725)
Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>

[ROCm/rccl commit: 7320752bf3]
2025-07-31 11:04:28 -05:00
Rahul Vaidya d65eb0b021 Fix RHEL10 packaging for rcclras and rccl-UnitTests (#1831)
Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl commit: 0adc5edc74]
2025-07-31 11:00:49 -05:00
Nilesh M Negi be810f10f3 [DEVICE] Add unroll=2 for gfx950 multi-node (#1824)
[ROCm/rccl commit: bd55f876e9]
2025-07-31 02:35:26 -05:00
ycui1984 39c508b80d Add collective latency profiler (#1785)
* [LatencyProfiler] Initial commit

* [LatencyProfiler] Add unit tests

* [LatencyProfiler] add more

* [LatencyProfiler] Pass unit tests

* [LatencyProfiler] Add hooks to integrate with meta internal tools

* [LatencyProfiler] Restore install.sh

* [LatencyProfiler] Resolved comments 1. add proper license 2. use proper namespace

* [LatencyProfiler] Add header

[ROCm/rccl commit: 874cd657ef]
2025-07-30 14:59:28 -07:00
Mustafa Abduljabbar cafd7a5126 Optimize alltoall for 64 GPUs and above for gfx942 (#1828)
Add pxn and p2p net chunksize mi300x tuning

[ROCm/rccl commit: 4ce3df8d3a]
2025-07-30 15:14:43 -04:00
mberenjk cca5172260 Upcast FP8 to Half (FP16) for Sum Operation (#1775)
* adding hadd and hadd2 support using builtin functions.

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: c84ee3d298]
2025-07-29 11:33:06 -05:00
awelling2801 da2bb8a578 Added tests for Ipcsocket (#1690)
Co-authored-by: Welling <awelling@ctr2-alola-ctrl-01.amd.com>

[ROCm/rccl commit: 9843adaab2]
2025-07-29 10:03:28 -05:00
awelling2801 88dcaaddc5 Code coverage improvements for alloc.h (#1676)
* Added tests for alloc.h

* Added tests for ZeroElementCopy and MemcpyNullSrcOrDstPointer

---------

Co-authored-by: Welling <awelling@ctr2-alola-ctrl-01.amd.com>

[ROCm/rccl commit: e118aadc14]
2025-07-29 09:19:57 -05:00
peizhang56 5c02be7b51 Add Unit Test for bitops.h (#1821)
* Add Unit Test for bitops.h

* Change the style

* Fix the code review comments

* Add more test cases

[ROCm/rccl commit: fe182d6546]
2025-07-28 11:25:15 -05:00
Atul Kulkarni de0d446e03 Added new unit tests for src/transport/p2p.cc (#1774)
[ROCm/rccl commit: 81ec6bff4c]
2025-07-25 12:57:57 -05:00
Sarat Kamisetty 1719aa67be passing down NET_OPTIONAL_RECV_COMPLETION hint to n/w plugin to enable optimizations (#1752)
Co-authored-by: Sarat Kamisetty <sakamiset@amd.com>

[ROCm/rccl commit: 783c073a03]
2025-07-25 10:26:58 -05:00
Mustafa Abduljabbar b3a0cc5e96 Add optional bf16 software-triggered pipelining for reduceCopyPacks (#1758)
- Introduced double-buffering to reduce copy overhead and overlap BF16 arithmetic with data prefetching.
- Aimed to improve performance of reduction-based collectives by up to 10%.
- Implemented based on recommendations from Guennadi Riguer (AMD)
- Added --force-reduce-pipeline option to install.sh to activate this optimization for BF16 reductions.
- Feature is disabled by default to prevent regressions with large messages until auto-tuning logic is upstreamed.
---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: Pedram Alizadeh <pmohamma@amd.com>

[ROCm/rccl commit: 0ce20e7e07]
2025-07-25 10:57:05 -04:00
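
The double-buffering idea in the reduceCopyPacks commit above (#1758) can be sketched on the host in plain C++: the operands for the next element are loaded before the current sum is computed, so loads and arithmetic can overlap. This is only a scalar illustration using `float`; the function name `pipelinedSum` is hypothetical, and the actual device code operates on BF16 packs in GPU kernels.

```cpp
#include <cstddef>
#include <vector>

// Element-wise sum of two inputs, written as a software pipeline:
// prologue (first load), steady state (prefetch next, compute current),
// epilogue (last compute).
std::vector<float> pipelinedSum(const std::vector<float>& a,
                                const std::vector<float>& b) {
  const std::size_t n = a.size() < b.size() ? a.size() : b.size();
  std::vector<float> out(n);
  if (n == 0) return out;

  // Prologue: fill the "current" buffer.
  float curA = a[0];
  float curB = b[0];

  for (std::size_t i = 0; i + 1 < n; ++i) {
    // Prefetch the next operands into the second buffer.
    const float nextA = a[i + 1];
    const float nextB = b[i + 1];
    // Compute on the current buffer while the next one is in flight.
    out[i] = curA + curB;
    // Swap roles for the next iteration.
    curA = nextA;
    curB = nextB;
  }

  // Epilogue: reduce the last prefetched pair.
  out[n - 1] = curA + curB;
  return out;
}
```

The result is identical to a straightforward loop; the restructuring only changes when loads are issued relative to the arithmetic.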
Atul Kulkarni bd53bdf447 Added new unit tests for src/transport/shm.cc (#1689)
[ROCm/rccl commit: 1c3d1b3842]
2025-07-25 05:54:42 -05:00
Arm Patinyasakdikul 866058c6d9 Fix segfault when libibverbs returns 0 device. (#1820)
Fix: SWDEV-543816

[ROCm/rccl commit: 3c9c22bb52]
2025-07-23 15:18:52 -05:00
Wenkai Du caff9764d3 Support fused all reduce and elementwise operations (#1729)
* Support fused all reduce and elementwise operations

Add additional "acc" parameter to RCCL Replayer logs

Add flag which indicates availability of new API

* Fix Recorder json parsing

* Remove unreachable code

* Remove extra acc pointer check

* .

* Revert "[DEVICE] Adding ability to choose unroll factor at runtime (#1734)"

This reverts commit 4cadf3597c.

* Use noinline to reduce kernels linking time

* Don't use noinline for gfx942 and gfx950 to avoid perf regression

---------

Co-authored-by: AtlantaPepsi <timhu102@amd.com>
Co-authored-by: BertanDogancay <bertan.dogancay@gmail.com>

[ROCm/rccl commit: 9a4213356d]
2025-07-23 09:04:17 -07:00
alex-breslow-amd cbb648505a Cheaper threadfence for gfx942 in postPeer [1/N]: enable for single node allreduce (#1766)
Boosts single node bfloat16 allreduce performance by up to 20% for some data sizes and provides gating with the RCCL_GFX942_CHEAP_FENCE_OFF environment variable

[ROCm/rccl commit: 11fabf1de1]
2025-07-22 07:15:15 -07:00
Rahul Vaidya bd63518944 Add datatype validation for MSCCLPP AllGather (#1816)
Signed-off-by: rahulvaidya20 <ravaidya@amd.com>

[ROCm/rccl commit: c28d3d26a3]
2025-07-21 11:50:45 -05:00
Atul Kulkarni c94fb7c58e Code coverage improvements (#1665)
* Increased max stack size to 640

* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.

[ROCm/rccl commit: 275fdd43c1]
2025-07-17 11:20:49 -05:00
isaki001 af4ce678b5 Fix typo in NPKit build that prevents NET_TEST event (#1807)
[ROCm/rccl commit: ef6a54ba34]
2025-07-16 09:08:06 -05:00
Nilesh M Negi 2c0c02b211 [GRAPH] Match maxChannels for gfx942 CUs (#1302)
[ROCm/rccl commit: 6632183efe]
2025-07-16 09:07:02 -05:00
Wenkai Du 670966f86b Fix inline compilation issue with LL (#1806)
[ROCm/rccl commit: 106024b0db]
2025-07-15 08:39:18 -07:00
isaki001 a20e65cfc0 gfx950 updated on LL thresholds for allreduce/allgather, update treeCorrection (#1803)
* change LL thresholds for allreduce/allgather and update treeCorrectionFactor

* update allGather LL cutoff

* adjust allgather LL/LL128 thresholds

[ROCm/rccl commit: 8d0f1a1cef]
2025-07-15 09:10:19 -05:00
dependabot[bot] c447d779b9 Bump requests from 2.32.2 to 2.32.4 in /docs/sphinx (#1738)
Bumps [requests](https://github.com/psf/requests) from 2.32.2 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.2...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: aafbdad2ab]
2025-07-14 10:30:37 -06:00