rocm-systems

Author	SHA1	Message	Date
Arm Patinyasakdikul	8557ea33ad	Test: delete child object to address memory leak. (#1863 ) [ROCm/rccl commit: `9d3acffa5f`]	2025-08-20 10:15:03 -05:00
Arm Patinyasakdikul	d4fecfb0be	Remove noinline attribute from reduceCopyPacks and reduceCopyPacksWithBias. (#1864 ) [ROCm/rccl commit: `fb882e80f6`]	2025-08-19 20:24:31 -05:00
Atul Kulkarni	8c5095dd94	Added new code owners (#1869 ) [ROCm/rccl commit: `231449c896`]	2025-08-19 16:32:25 -05:00
Mustafa Abduljabbar	5025a9aab9	Have ncclDevFuncId use 64-Bit keyed map with field packing (#1857 ) - Updated ncclDevFuncId to use a hash-based lookup with std::unordered_map. - Keys are now 64-bit integers, which pack coll, algo, proto, devRedOp, and type fields. - Improved flexibility and maintainability by moving away from row-based indexing. - Added error handling for missing keys in the hash map. - Aligned key generation logic with generate.py and updated generate.py. [ROCm/rccl commit: `c1b3cd8911`]	2025-08-19 16:41:19 -04:00
Nusrat Islam	e4c025e5cd	device: optimize threadfence for ll64 protocol (#1858 ) * device: optimize threadfence for ll64 protocol * device: use __atomic_signal_fence() --------- Co-authored-by: Nusrat Islam <nusislam@useocpslog-003.amd.com> [ROCm/rccl commit: `6ade5065b4`]	2025-08-18 09:16:41 -05:00
ishkool	377160e0c9	Code Coverage: Proxy.cc tests (#1818 ) * Proxy.cc tests * Update ProxyTest.cpp Cleaned up the code. * Update ProxyTests.cpp Bring back deleting dynamically allocated memory [ROCm/rccl commit: `876f985e0f`]	2025-08-15 19:06:32 -05:00
Atul Kulkarni	38e88ba87e	Added new unit tests for src/enqueue.cc (#1853 ) [ROCm/rccl commit: `84f3cc6a02`]	2025-08-15 18:26:26 -05:00
ishkool	61a189bc84	Code Coverage Unit Tests for comm.h (#1783 ) * File containing test for comm.h * Update CommTest.cpp Added gtest API for assert * Update CommTest.cpp Adding copyright * Update CommTest.cpp Removing info and tested as not required. * Update and rename CommTest.cpp to CommTests.cpp * Update CMakeLists.txt [ROCm/rccl commit: `6453273aa6`]	2025-08-15 17:44:24 -05:00
Nilesh M Negi	ed4abedf7b	[DEVICE] Use noinline for LLGenericOp only on gfx950 (#1849 ) [ROCm/rccl commit: `c3b8de4ec8`]	2025-08-15 15:15:02 -05:00
isaki001	2e9a2d1762	[TUNING] gfx950 16N tuning (#1835 ) * change gfx950 algo/proto selection for multinode allreduce, allgather, reduceScatter * gfx950 tuning: enable tuning for broadcast, allreduce starts LL128 earlier and switches to ring earlier, change LL128 start for allgather and reduceScatter * lower LL128 threshold * update reduceScatter LL128 min to match LL max for consistency * enable multinode PXN and increase chunksize for gfx950 * change LL128 start to 128KB, adjust ring-start according to node-count * disable code-path for fused-AR on LL128 for gfx950 * use LL128 starting from 1KB for multinode allgather on gfx950 * start LL128 earlier for multinode reduceScatter on gfx950 * start LL128 earlier for multinode broadcast on gfx950 * set multinode allreduce to start simple on 64MB for gfx950 * start LL128 from 1KB for multinode broadcast on gfx950 * setting multinode AR to use tree instead of ring at 16MB, 64MB, 128MB * set multinode broadcast to use LL for up to 256KB depending on node-count for gfx950 * adjust algo for 32MB multinode allreduce on gfx950 * make 32MB tree LL128 for multinode AR on gfx950 * make sure ring is not picked on 2N allreduce on small sizes [ROCm/rccl commit: `44121db890`]	2025-08-15 15:12:45 -05:00
alex-breslow-amd	dc3a0c5242	Disable the __threadfence on the sender side of the simple protocol when possible. (#1830 ) Leverages the traits of extended-scope fine-grain memory to get rid of a device-scope acquire-release fence. This improves throughput for single node workloads on gfx942 and gfx950 for some input sizes (e.g., ~32 MiB to about 256 MiB) when using the simple protocol. Multinode workloads on MI300X see a smaller but statistically significant uplift for some message sizes. Runtime disablement is supported via setting the environment variable RCCL_GFX942_CHEAP_FENCE_ON to 0. [ROCm/rccl commit: `1aa2570b48`]	2025-08-15 07:54:54 -07:00
mberenjk	c76a4492f1	Added useAcc as a template parameter to address the performance regression (#1856 ) * Added useAcc as a template parameter to address the 2% performance regression in allreduceWithBias --------- Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com> [ROCm/rccl commit: `c61152baa4`]	2025-08-14 15:58:54 -05:00
Adel Johar	d3e9db9432	Docs: Add environment variables reference page [ROCm/rccl commit: `aaf8613b76`]	2025-08-14 09:55:28 +02:00
Karthikeyan Arumugam	16d0871985	Add cstring header explicitly as it is removed from HIP (#1859 ) [ROCm/rccl commit: `6d41e5ba99`]	2025-08-13 15:14:22 -07:00
Rahul Vaidya	baa6a61535	[BUILD] Fix UT packaging on Debian family OS (#1854 ) * Fix UT packaging on Debian family OSes Signed-off-by: ravaidya <ravaidya@amd.com> * Split OR condition when performing Debian checks Signed-off-by: ravaidya <ravaidya@amd.com> --------- Signed-off-by: ravaidya <ravaidya@amd.com> [ROCm/rccl commit: `ee9ed3ef87`]	2025-08-11 17:03:16 -05:00
Chris Sosa	584413b2cb	Add CI Badge for tracking CI status in prep for gating changes (#1851 ) This PR is intended to move RCCL to gating changes on CI failures. Right now, only build/unittests run per PR consistently. We should eventually add all single and multi-node test status badges once those tests are running in presubmit and continuously on develop [ROCm/rccl commit: `53977821b5`]	2025-08-11 14:02:46 -07:00
Nilesh M Negi	74adb64dfb	[BUILD] Fix UT packaging on Debian OS (#1848 ) [ROCm/rccl commit: `5036d0e713`]	2025-08-11 09:43:26 -05:00
Rahul Vaidya	70a5f2f317	Fix rccl-UnitTests packaging on Debian systems (#1846 ) Signed-off-by: ravaidya <ravaidya@amd.com> [ROCm/rccl commit: `cbbc713b03`]	2025-08-08 12:28:56 -05:00
isaki001	52d33058bb	enable more events for LL128 NPKIT trace collection (#1827 ) [ROCm/rccl commit: `74d82a8145`]	2025-08-07 11:19:36 -05:00
awelling2801	c5b4e1bc78	Created coverage tests for rccl_wrap (#1694 ) * Created coverage tests for rccl_wrap RCCL_EXPOSE_STATIC off by default Coverage tests for rccl_wrap.cc * Remove RCCL_EXPOSE_STATIC dependency * Removed Rcclwrap.RcclGetAlgoInfoTest * Remove comments * Corrected RCCL_EXPOSE_STATIC definition logic --------- Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com> Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com> [ROCm/rccl commit: `82bea39280`]	2025-08-06 14:48:00 -05:00
Avinash	f34d760613	Compiler warnings fix 2 (#1801 ) * Changes to device code * Changes to src/misc * Changes to graph * src/include changes * src/transport changes * changes in init, enqueue, proxy * Changes to CMakeLists.txt * Additional changes to device code * Additional changes to net.cc * adding 'compiler warning' tag to ease upstream merge * typo correction * Addressing comments * Additional changes for new commits [ROCm/rccl commit: `3f8cac388e`]	2025-08-05 17:36:23 -05:00
Arm Patinyasakdikul	df3b7e477f	Disable context tracking for the current version. (#1839 ) [ROCm/rccl commit: `6fc228e247`]	2025-08-04 10:48:00 -05:00
Atul Kulkarni	35283394ed	Add unit tests for graph/xml.cc & graph/xml.h (#1833 ) * Added new binary for executing unit tests Added new unit tests for argcheck.cc and alt_rsmi.cc files Modified the method to execute unit tests to cover static methods by using a bash script to convert static to non-static functions and variables on the fly restricted to debug build type. * Added new unit tests for src/transport/shm.cc * Added new unit tests for graph/xml.cc [ROCm/rccl commit: `0e7d7da55d`]	2025-08-01 14:20:27 -05:00
Atul Kulkarni	e550ba1e3b	Update help text in README (#1837 ) [ROCm/rccl commit: `e2c9f2feab`]	2025-08-01 14:19:27 -05:00
awelling2801	0d34963b35	Added tests for coll_reg (#1700 ) Changes to coll_reg Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com> [ROCm/rccl commit: `5ecc1b7ede`]	2025-07-31 13:49:23 -05:00
dependabot[bot]	b6639c85f4	Bump urllib3 from 2.2.2 to 2.5.0 in /docs/sphinx (#1751 ) Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.2.2 to 2.5.0. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](https://github.com/urllib3/urllib3/compare/2.2.2...2.5.0) --- updated-dependencies: - dependency-name: urllib3 dependency-version: 2.5.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> [ROCm/rccl commit: `32e95963dc`]	2025-07-31 11:25:45 -06:00
dependabot[bot]	e31001e378	Bump rocm-docs-core from 1.18.2 to 1.22.0 in /docs/sphinx (#1836 ) Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.18.2 to 1.22.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.18.2...v1.22.0) --- updated-dependencies: - dependency-name: rocm-docs-core dependency-version: 1.22.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> [ROCm/rccl commit: `1acc3eb6c1`]	2025-07-31 11:15:01 -06:00
awelling2801	839fcb54b5	Added tests for transport.cc (#1725 ) Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com> [ROCm/rccl commit: `7320752bf3`]	2025-07-31 11:04:28 -05:00
Rahul Vaidya	d65eb0b021	Fix RHEL10 packaging for rcclras and rccl-UnitTests (#1831 ) Signed-off-by: ravaidya <ravaidya@amd.com> [ROCm/rccl commit: `0adc5edc74`]	2025-07-31 11:00:49 -05:00
Nilesh M Negi	be810f10f3	[DEVICE] Add unroll=2 for gfx950 multi-node (#1824 ) [ROCm/rccl commit: `bd55f876e9`]	2025-07-31 02:35:26 -05:00
ycui1984	39c508b80d	Add collective latency profiler (#1785 ) * [LatencyProfiler] Initial commit * [LatencyProfiler] Add unit tests * [LatencyProfiler] add more * [LatencyProfiler] Pass unit tests * [LatencyProfiler] Add hooks to integrate with meta internal tools * [LatencyProfiler] Restore install.sh * [LatencyProfiler] Resolved comments 1. add proper license 2. use proper namespace * [LatencyProfiler] Add header [ROCm/rccl commit: `874cd657ef`]	2025-07-30 14:59:28 -07:00
Mustafa Abduljabbar	cafd7a5126	Optimize alltoall for 64 GPUs and above for gfx942 (#1828 ) Add pxn and p2p net chunksize mi300x tuning [ROCm/rccl commit: `4ce3df8d3a`]	2025-07-30 15:14:43 -04:00
mberenjk	cca5172260	Upcast FP8 to Half (FP16) for Sum Operation (#1775 ) * adding hadd and hadd2 support using builtin functions. --------- Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com> [ROCm/rccl commit: `c84ee3d298`]	2025-07-29 11:33:06 -05:00
awelling2801	da2bb8a578	Added tests for Ipcsocket (#1690 ) Co-authored-by: Welling <awelling@ctr2-alola-ctrl-01.amd.com> [ROCm/rccl commit: `9843adaab2`]	2025-07-29 10:03:28 -05:00
awelling2801	88dcaaddc5	Code coverage improvements for alloc.h (#1676 ) * Added tests for alloc.h * Added tests for ZeroElementCopy and MemcpyNullSrcOrDstPointer --------- Co-authored-by: Welling <awelling@ctr2-alola-ctrl-01.amd.com> [ROCm/rccl commit: `e118aadc14`]	2025-07-29 09:19:57 -05:00
peizhang56	5c02be7b51	Add Unit Test for bitops.h (#1821 ) * Add Unit Test for bitops.h * Change the style * Fix the code review comments * Add more test cases [ROCm/rccl commit: `fe182d6546`]	2025-07-28 11:25:15 -05:00
Atul Kulkarni	de0d446e03	Added new unit tests for src/transport/p2p.cc (#1774 ) [ROCm/rccl commit: `81ec6bff4c`]	2025-07-25 12:57:57 -05:00
Sarat Kamisetty	1719aa67be	passing down NET_OPTIONAL_RECV_COMPLETION hint to n/w plugin to enable optimizations (#1752 ) Co-authored-by: Sarat Kamisetty <sakamiset@amd.com> [ROCm/rccl commit: `783c073a03`]	2025-07-25 10:26:58 -05:00
Mustafa Abduljabbar	b3a0cc5e96	Add optional bf16 software-triggered pipelining for reduceCopyPacks (#1758 ) - Introduced double-buffering to reduce copy overhead and overlap BF16 arithmetic with data prefetching. - Aimed to improve performance of reduction-based collectives by up to 10%. - Implemented based on recommendations from Guennadi Riguer (AMD) - Added --force-reduce-pipeline option to install.sh to activate this optimization for BF16 reductions. - Feature is disabled by default to prevent regressions with large messages until auto-tuning logic is upstreamed. --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> Co-authored-by: Pedram Alizadeh <pmohamma@amd.com> [ROCm/rccl commit: `0ce20e7e07`]	2025-07-25 10:57:05 -04:00
Atul Kulkarni	bd53bdf447	Added new unit tests for src/transport/shm.cc (#1689 ) [ROCm/rccl commit: `1c3d1b3842`]	2025-07-25 05:54:42 -05:00
Arm Patinyasakdikul	866058c6d9	Fix segfault when libibverbs returns 0 device. (#1820 ) Fix: SWDEV-543816 [ROCm/rccl commit: `3c9c22bb52`]	2025-07-23 15:18:52 -05:00
Wenkai Du	caff9764d3	Support fused all reduce and elementwise operations (#1729 ) * Support fused all reduce and elementwise operations Add additional "acc" parameter to RCCL Replayer logs Add flag which indicates availability of new API * Fix Recorder json parsing * Remove unreachable code * Remove extra acc pointer check * . * Revert "[DEVICE] Adding ability to choose unroll factor at runtime (#1734)" This reverts commit `4cadf3597c`. * Use noinline to reduce kernels linking time * Don't use noinline for gfx942 and gfx950 to avoid perf regression --------- Co-authored-by: AtlantaPepsi <timhu102@amd.com> Co-authored-by: BertanDogancay <bertan.dogancay@gmail.com> [ROCm/rccl commit: `9a4213356d`]	2025-07-23 09:04:17 -07:00
alex-breslow-amd	cbb648505a	Cheaper threadfence for gfx942 in postPeer [1/N]: enable for single node allreduce (#1766 ) Boosts single node bfloat16 allreduce performance by up to 20% for some data sizes and provides gating with the RCCL_GFX942_CHEAP_FENCE_OFF environment variable [ROCm/rccl commit: `11fabf1de1`]	2025-07-22 07:15:15 -07:00
Rahul Vaidya	bd63518944	Add datatype validation for MSCCLPP AllGather (#1816 ) Signed-off-by: rahulvaidya20 <ravaidya@amd.com> [ROCm/rccl commit: `c28d3d26a3`]	2025-07-21 11:50:45 -05:00
Atul Kulkarni	c94fb7c58e	Code coverage improvements (#1665 ) * Increased max stack size to 640 * Added new binary for executing unit tests Added new unit tests for argcheck.cc and alt_rsmi.cc files Modified the method to execute unit tests to cover static methods by using a bash script to convert static to non-static functions and variables on the fly restricted to debug build type. [ROCm/rccl commit: `275fdd43c1`]	2025-07-17 11:20:49 -05:00
isaki001	af4ce678b5	Fix typo in NPKit build that prevents NET_TEST event (#1807 ) [ROCm/rccl commit: `ef6a54ba34`]	2025-07-16 09:08:06 -05:00
Nilesh M Negi	2c0c02b211	[GRAPH] Match maxChannels for gfx942 CUs (#1302 ) [ROCm/rccl commit: `6632183efe`]	2025-07-16 09:07:02 -05:00
Wenkai Du	670966f86b	Fix inline compilation issue with LL (#1806 ) [ROCm/rccl commit: `106024b0db`]	2025-07-15 08:39:18 -07:00
isaki001	a20e65cfc0	gfx950 updated on LL thresholds for allreduce/allgather, update treeCorrection (#1803 ) * change LL thresholds for allreduce/allgather and update treeCorrectionFactor * update allGather LL cutoff * adjust allgather LL/LL128 thresholds [ROCm/rccl commit: `8d0f1a1cef`]	2025-07-15 09:10:19 -05:00
dependabot[bot]	c447d779b9	Bump requests from 2.32.2 to 2.32.4 in /docs/sphinx (#1738 ) Bumps [requests](https://github.com/psf/requests) from 2.32.2 to 2.32.4. - [Release notes](https://github.com/psf/requests/releases) - [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md) - [Commits](https://github.com/psf/requests/compare/v2.32.2...v2.32.4) --- updated-dependencies: - dependency-name: requests dependency-version: 2.32.4 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> [ROCm/rccl commit: `aafbdad2ab`]	2025-07-14 10:30:37 -06:00

1740 commits
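
Commit 5025a9aab9 in the log above describes ncclDevFuncId moving to a std::unordered_map lookup keyed by 64-bit integers that pack the coll, algo, proto, devRedOp, and type fields, with error handling for missing keys. The following is a minimal sketch of that general technique, not the actual RCCL code. The 8-bit field width, the field order, and the names `packKey`, `FuncTable`, and `lookupFuncId` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical layout: each field gets 8 bits. The real widths and
// ordering used by RCCL and generate.py may differ.
constexpr unsigned kFieldBits = 8;
constexpr std::uint64_t kFieldMask = (std::uint64_t{1} << kFieldBits) - 1;

// Pack five small integer fields into a single 64-bit key.
// Layout (most to least significant): coll | algo | proto | devRedOp | type.
inline std::uint64_t packKey(std::uint32_t coll, std::uint32_t algo,
                             std::uint32_t proto, std::uint32_t devRedOp,
                             std::uint32_t type) {
  // Every field must fit in its slot, otherwise two keys could collide.
  assert(coll <= kFieldMask && algo <= kFieldMask && proto <= kFieldMask &&
         devRedOp <= kFieldMask && type <= kFieldMask);
  return (std::uint64_t{coll} << (4 * kFieldBits)) |
         (std::uint64_t{algo} << (3 * kFieldBits)) |
         (std::uint64_t{proto} << (2 * kFieldBits)) |
         (std::uint64_t{devRedOp} << (1 * kFieldBits)) |
         std::uint64_t{type};
}

// Hypothetical table type: packed key -> device function id.
using FuncTable = std::unordered_map<std::uint64_t, int>;

// Look up a function id. Returns -1 when the key is absent, so the
// caller can report an error instead of indexing out of range.
inline int lookupFuncId(const FuncTable& table, std::uint64_t key) {
  auto it = table.find(key);
  return it == table.end() ? -1 : it->second;
}
```

Compared with row-based indexing, a keyed lookup like this does not depend on the order in which entries are generated, and a combination that was never registered yields an explicit miss (here `-1`) that the caller can turn into an error.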