Commit graph

1834 commits

Author SHA1 Message Date
ycui1984 39c508b80d Add collective latency profiler (#1785)
* [LatencyProfiler] Initial commit

* [LatencyProfiler] Add unit tests

* [LatencyProfiler] add more

* [LatencyProfiler] Pass unit tests

* [LatencyProfiler] Add hooks to integrate with meta internal tools

* [LatencyProfiler] Restore install.sh

* [LatencyProfiler] Resolved comments 1. add proper license 2. use proper namespace

* [LatencyProfiler] Add header

[ROCm/rccl commit: 874cd657ef]
2025-07-30 14:59:28 -07:00
Mustafa Abduljabbar cafd7a5126 Optimize alltoall for 64 GPUs and above for gfx942 (#1828)
Add pxn and p2p net chunksize mi300x tuning

[ROCm/rccl commit: 4ce3df8d3a]
2025-07-30 15:14:43 -04:00
mberenjk cca5172260 Upcast FP8 to Half (FP16) for Sum Operation (#1775)
* adding hadd and hadd2 support using builtin functions.

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: c84ee3d298]
2025-07-29 11:33:06 -05:00
awelling2801 da2bb8a578 Added tests for Ipcsocket (#1690)
Co-authored-by: Welling <awelling@ctr2-alola-ctrl-01.amd.com>

[ROCm/rccl commit: 9843adaab2]
2025-07-29 10:03:28 -05:00
awelling2801 88dcaaddc5 Code coverage improvements for alloc.h (#1676)
* Added tests for alloc.h

* Added tests for ZeroElementCopy and MemcpyNullSrcOrDstPointer

---------

Co-authored-by: Welling <awelling@ctr2-alola-ctrl-01.amd.com>

[ROCm/rccl commit: e118aadc14]
2025-07-29 09:19:57 -05:00
peizhang56 5c02be7b51 Add Unit Test for bitops.h (#1821)
* Add Unit Test for bitops.h

* Change the style

* Fix the code review comments

* Add more test cases

[ROCm/rccl commit: fe182d6546]
2025-07-28 11:25:15 -05:00
Atul Kulkarni de0d446e03 Added new unit tests for src/transport/p2p.cc (#1774)
[ROCm/rccl commit: 81ec6bff4c]
2025-07-25 12:57:57 -05:00
Sarat Kamisetty 1719aa67be passing down NET_OPTIONAL_RECV_COMPLETION hint to n/w plugin to enable optimizations (#1752)
Co-authored-by: Sarat Kamisetty <sakamiset@amd.com>

[ROCm/rccl commit: 783c073a03]
2025-07-25 10:26:58 -05:00
Mustafa Abduljabbar b3a0cc5e96 Add optional bf16 software-triggered pipelining for reduceCopyPacks (#1758)
- Introduced double-buffering to reduce copy overhead and overlap BF16 arithmetic with data prefetching.
- Aimed to improve performance of reduction-based collectives by up to 10%.
- Implemented based on recommendations from Guennadi Riguer (AMD)
- Added --force-reduce-pipeline option to install.sh to activate this optimization for BF16 reductions.
- Feature is disabled by default to prevent regressions with large messages until auto-tuning logic is upstreamed.
---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
Co-authored-by: Pedram Alizadeh <pmohamma@amd.com>

[ROCm/rccl commit: 0ce20e7e07]
2025-07-25 10:57:05 -04:00
Atul Kulkarni bd53bdf447 Added new unit tests for src/transport/shm.cc (#1689)
[ROCm/rccl commit: 1c3d1b3842]
2025-07-25 05:54:42 -05:00
Arm Patinyasakdikul 866058c6d9 Fix segfault when libibverbs returns 0 device. (#1820)
Fix: SWDEV-543816

[ROCm/rccl commit: 3c9c22bb52]
2025-07-23 15:18:52 -05:00
Wenkai Du caff9764d3 Support fused all reduce and elementwise operations (#1729)
* Support fused all reduce and elementwise operations

Add additional "acc" parameter to RCCL Replayer logs

Add flag which indicates availability of new API

* Fix Recorder json parsing

* Remove unreachable code

* Remove extra acc pointer check

* .

* Revert "[DEVICE] Adding ability to choose unroll factor at runtime (#1734)"

This reverts commit 4cadf3597c.

* Use noinline to reduce kernels linking time

* Don't use noinline for gfx942 and gfx950 to avoid perf regression

---------

Co-authored-by: AtlantaPepsi <timhu102@amd.com>
Co-authored-by: BertanDogancay <bertan.dogancay@gmail.com>

[ROCm/rccl commit: 9a4213356d]
2025-07-23 09:04:17 -07:00
alex-breslow-amd cbb648505a Cheaper threadfence for gfx942 in postPeer [1/N]: enable for single node allreduce (#1766)
Boosts single node bfloat16 allreduce performance by up to 20% for some data sizes and provides gating with the RCCL_GFX942_CHEAP_FENCE_OFF environment variable

[ROCm/rccl commit: 11fabf1de1]
2025-07-22 07:15:15 -07:00
Rahul Vaidya bd63518944 Add datatype validation for MSCCLPP AllGather (#1816)
Signed-off-by: rahulvaidya20 <ravaidya@amd.com>

[ROCm/rccl commit: c28d3d26a3]
2025-07-21 11:50:45 -05:00
Atul Kulkarni c94fb7c58e Code coverage improvements (#1665)
* Increased max stack size to 640

* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.

[ROCm/rccl commit: 275fdd43c1]
2025-07-17 11:20:49 -05:00
isaki001 af4ce678b5 Fix typo in NPKit build that prevents NET_TEST event (#1807)
[ROCm/rccl commit: ef6a54ba34]
2025-07-16 09:08:06 -05:00
Nilesh M Negi 2c0c02b211 [GRAPH] Match maxChannels for gfx942 CUs (#1302)
[ROCm/rccl commit: 6632183efe]
2025-07-16 09:07:02 -05:00
Wenkai Du 670966f86b Fix inline compilation issue with LL (#1806)
[ROCm/rccl commit: 106024b0db]
2025-07-15 08:39:18 -07:00
isaki001 a20e65cfc0 gfx950 updated on LL thresholds for allreduce/allgather, update treeCorrection (#1803)
* change LL thresholds for allreduce/allgather and update treeCorrectionFactor

* update allGather LL cutoff

* adjust allgather LL/LL128 thresholds

[ROCm/rccl commit: 8d0f1a1cef]
2025-07-15 09:10:19 -05:00
dependabot[bot] c447d779b9 Bump requests from 2.32.2 to 2.32.4 in /docs/sphinx (#1738)
Bumps [requests](https://github.com/psf/requests) from 2.32.2 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.2...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: aafbdad2ab]
2025-07-14 10:30:37 -06:00
dependabot[bot] 01b3922075 Bump tornado from 6.4.2 to 6.5.1 in /docs/sphinx (#1710)
Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.4.2 to 6.5.1.
- [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst)
- [Commits](https://github.com/tornadoweb/tornado/compare/v6.4.2...v6.5.1)

---
updated-dependencies:
- dependency-name: tornado
  dependency-version: 6.5.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rccl commit: d4d021c726]
2025-07-14 10:25:01 -06:00
Wenkai Du e2ad96bb96 Disable P2P net option by default (#1793)
[ROCm/rccl commit: 708ad75f7a]
2025-07-14 08:55:39 -07:00
Nikhil-Nunna bf4031276c topo_explorer initial readme (#1797)
* topo_explorer initial readme

* topo_explorer readme update

* topo_explorer readme update

* Added sample output to README

* Update README.md

* Update README.md

---------

Co-authored-by: Mustafa Abduljabbar <mustafa.abduljabbar@amd.com>

[ROCm/rccl commit: 7abc7538ea]
2025-07-11 11:28:20 -05:00
Jobbins be9a573cb0 [rccl] Remove .jenkins folder (#1754)
[ROCm/rccl commit: 7ebd31097c]
2025-07-11 11:24:06 -05:00
Bertan Dogancay d4aafe31fa [GEN] Fix typo in IFC code gen (#1796)
[ROCm/rccl commit: 7158adb57f]
2025-07-11 09:19:39 -04:00
Nilesh M Negi 41c985462c [BUILD] Use fmt-header instead of libfmt (#1791)
[ROCm/rccl commit: 6b4ad0fd74]
2025-07-10 17:19:53 -05:00
Nilesh M Negi 86dd6f262b [TOOLS] Update p2p-latency-test for gfx950 (#1730)
[ROCm/rccl commit: f839e4edef]
2025-07-10 12:13:29 -05:00
Nilesh M Negi ba31e4e846 [INIT] Fix fallback for unsupported user-specified runtime unroll factor (#1780)
* [INIT] Fix fallback for unsupported user-specified runtime unroll factor
* Add CollTrace guard
* Move `commSetUnrollFactor()` to rccl_wrap.cc
* Modify comments in the device-code generator script

[ROCm/rccl commit: 2c099fe29a]
2025-07-10 10:56:18 -05:00
Nilesh M Negi 1050eb13ac [DEVICE] Fix validation errors for multi-node LL with gfx950 non-coherent system memory (#1795)
[ROCm/rccl commit: 68d6f99e0f]
2025-07-10 09:05:46 -05:00
Mustafa Abduljabbar caeaaa284c Fix AllReduce regression due to previous max range increase for LL64/LL128 (#1787)
* Adjust tuning factor impacting more than 2 nodes
* Scale max LL128 size for > 2 nodes
* Retune max LL128 range for N > 2

[ROCm/rccl commit: 058264b3f3]
2025-07-09 19:17:10 -05:00
Atul Kulkarni 16aadd67cf Enable Google Test's GMOCK feature (#1773)
[ROCm/rccl commit: a28d5cb986]
2025-07-09 17:25:44 -05:00
mberenjk 1623fcc7a1 Improving build time by removing the gfx11xx and host code from rccl_float8.h (#1789)
* removing extra build time by removing the gfx11xx arch from using hip_fp8

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 697bee4ee8]
2025-07-09 14:03:47 -05:00
Bertan Dogancay b1470b4e50 [GRAPH] Pass rank instead of busId due to a change in an internal function signature (#1792)
[ROCm/rccl commit: 9c89573580]
2025-07-08 08:45:54 -04:00
Marius Brehler 5d753cb871 Set GTEST_BOTH_LIBRARIES appropriately (#1669)
If `find_package()` succeeds in finding GTest and `INSTALL_DEPENDENCIES`
is set to OFF, `GTEST_BOTH_LIBRARIES` is not set and thus
`rccl-UnitTests` fails while trying to link unknown symbols.

[ROCm/rccl commit: dac0e528a0]
2025-07-05 20:38:31 -05:00
Bertan Dogancay 471fc6bff2 [DEVICE] Enable PAT algo for RCCL 1ppn (#1756)
* Enable PAT algo for RCCL 1ppn


[ROCm/rccl commit: e96c8473a1]
2025-07-04 13:45:18 -04:00
Rakesh Roy 82a822b646 Fix chrono build error (#1790)
[ROCm/rccl commit: dd3b1d816c]
2025-07-04 08:27:30 -05:00
Wenkai Du e031a4a2f4 msccl: use special send for LL on gfx950 (#1788)
[ROCm/rccl commit: ae9642d4bc]
2025-07-03 04:16:18 -05:00
ryanhankins f910d25563 Adding #include <dlfcn.h> in nccl_net.h to pass build (#1786)
[ROCm/rccl commit: 9d35581d5e]
2025-07-02 19:21:53 -05:00
Nilesh M Negi 23618f9e65 [MSCCLPP] Disable format checks in MSCCLPP by default (#1781)
[ROCm/rccl commit: 9e99c18f6e]
2025-07-02 09:11:42 -05:00
Wenkai Du 6db3b4cd4f Add support for extended fine grained system memory pool (#1770)
* Add support for extended fine-grained system memory pool
* Use hipHostRegisterUncached
* Add "sc0 sc1" flags for LL store on gfx950
* Update after HIP flag is changed to hipExtHostRegisterUncached

[ROCm/rccl commit: 4640ab19b3]
2025-07-01 16:38:49 -05:00
Nilesh M Negi fd0d9ac44c [BUILD] Fix packaging for RAS (#1784)
[ROCm/rccl commit: 3e51c41dcb]
2025-07-01 16:37:14 -05:00
Nilesh M Negi d88d033aba [RAS] Add support for RAS client (#1748)
Enable RAS client binary `rcclras`

[ROCm/rccl commit: 8d3a5542fb]
2025-06-29 18:53:16 -05:00
isaki001 79473681e5 added tuning table for gfx950 (#1779)
* added tuning table for mi350

* remove erroneous string

[ROCm/rccl commit: 75d22b47cb]
2025-06-29 15:45:39 -05:00
Bertan Dogancay ac5dad287a Switch to linear channel mapping for 2 nodes (#1777)
[ROCm/rccl commit: 358dc1bc84]
2025-06-28 09:10:18 -05:00
Arm Patinyasakdikul 4d71cae249 [topo-expl] update header file location. (#1769)
[ROCm/rccl commit: 35024ca1cb]
2025-06-27 15:29:37 -05:00
gilbertlee-amd 23e5680038 Fixing HelloRccl include path to RCCL, fixing some warnings (#1778)
[ROCm/rccl commit: 16101e654f]
2025-06-27 09:12:59 -06:00
Arm Patinyasakdikul c3b110f9e9 add warning if workFIFO is not available after multiple retries. (#1772)
* add warning if workFIFO is not available after multiple retries.

* Update src/enqueue.cc

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 265b1b3775]
2025-06-26 19:49:52 -05:00
Arm Patinyasakdikul 32e80aedc0 Update plugin to look for librccl-net.so. (#1768)
[ROCm/rccl commit: 71c788d4d7]
2025-06-26 16:59:38 -05:00
mberenjk 2c02ee0a99 changing the HIP-VERSION to 6.3 to avoid using hip_fp8 for older ROCm versions (#1764)
Co-authored-by: Marzieh Berenjkoub <mberenjk@.amd.com>

[ROCm/rccl commit: 5fb9d8f828]
2025-06-26 11:15:01 -05:00
Mustafa Abduljabbar ffb17bd9d7 Revert LL64 cutoff points based on internal tuning (#1771)
[ROCm/rccl commit: 7e2ac00980]
2025-06-26 11:59:42 -04:00