rocm-systems

Author	SHA1	Message	Date
Marzieh Berenjkoub	d7293281f3	Merge remote-tracking branch 'nccl/master' into develop [ROCm/rccl commit: `858b4e76eb`]	2026-01-20 13:04:02 -06:00
Mustafa Abduljabbar	2621e0254e	[Device] WarpSpeed enablement and single node CU and perf opt for MI350 (#2073 ) [ROCm/rccl commit: `d009ab144e`]	2025-12-11 19:04:35 -05:00
AbandiGa	7f7c8d14f6	Disable Bfloatf16 pipelining for reduction collectives for gfx950 (#2047 ) * disable bf16 reduce_copy pipelining for gfx950 * edit CHANGELOG * Combine unroll and pipeline local arch calculation into single function * fix multi-node error and disbale for gfx950 even if it's not a local build * removed has_gfx950 * disable pipelining for gfx950 in rcclSetPipelining --------- Co-authored-by: Ghadeer Alabandi <galaband@cv350-zts-gtu-h30-08.prov.gtu.zts.cpe.ice.amd.com> Co-authored-by: Ghadeer Alabandi <galaband@cv350-zts-gtu-h30-18.prov.gtu.zts.cpe.ice.amd.com> Co-authored-by: Ghadeer Alabandi <galaband@cv350-zts-gtu-h28a-08.prov.gtu.zts.cpe.ice.amd.com> [ROCm/rccl commit: `277b6e9bac`]	2025-11-13 14:55:09 -06:00
Ghadeer Ahmed H Alabandi	5b66480595	[NET] Enable capping the number of QPs created for send/recv colls (#1998 ) [ROCm/rccl commit: `45991fadad`]	2025-11-07 00:47:01 +00:00
Arm Patinyasakdikul	54194a17c3	Added ERROR message class to handle fatal error messages. (#2002 ) * Added ERROR message class to handle fatal error messages. New ERROR message class will print the message in all debug level, including none. Change some of the fatal error message to be in ERROR instead of WARN. Added new error handler function to print out more meaningful error message in the future. * Added CHANGELOG entry. * Update CHANGELOG.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * Change to no longer reuse NONE as ERROR. ERROR is now a separated class. * Update CHANGELOG.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `1ce83d5cc0`]	2025-10-30 16:14:20 -05:00
isaki001	9bccbcd619	P2p batching hang-fix (#2011 ) * prevent batching when send/recv bytes dont match, restore bit reversal for channel to part mapping, prevent batching beyond 32-nodes * correct computation for channel to part mapping * update changelog * disabling p2p-batching by default [ROCm/rccl commit: `641c0eb51c`]	2025-10-30 13:32:01 -05:00
corey-derochie-amd	c5cdee4fa5	Updated Changelog with 7.1.1 and 7.2.0 stub sections (#2008 ) * Missing ROCm 7.0 & 7.1.0 Changelog entries (#1976) * Update CHANGELOG.md * Update CHANGELOG.md * Apply suggestions from code review Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * Added ROCm 7.2.0 section. * Update CHANGELOG.md * Apply suggestion from @corey-derochie-amd --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `561ad2fe05`]	2025-10-28 13:41:22 -06:00
mberenjk	96c62b091d	Add support for additional paths in RCCL DMABUF kernel configuration loading (#1825 ) * Adding more path to the kernel load and an environment variable to force enable DMABUF --------- Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com> [ROCm/rccl commit: `b58f234539`]	2025-10-20 13:35:22 -07:00
isaki001	6d151d4e21	gfx950 channel tuning for ReduceScatter and AllGather (#1940 ) * add channel thresholds to override channel-count adjustments [ROCm/rccl commit: `0f99fd84a3`]	2025-10-14 09:50:44 -05:00
BertanDogancay	2a4e4308b0	Merge remote-tracking branch 'nccl/master' into develop [ROCm/rccl commit: `3f94267f21`]	2025-10-06 18:36:49 -04:00
Mustafa Abduljabbar	a075779dcd	Use batched P2P to enhance alltoall small message performance (#1902 ) * Batch P2P operations (2 per CU/channel) and update channel-part mapping - Revert bitreversal and fix channel mapping to be compatible with P2P batching and avoid hangs - P2P batching is only used for more than 2 nodes to avoid aggregating intra-node traffic when it is dominant for less than 2 nodes * Address single node regression and channel per net peer * Add batching threshold * Add enable switch for batching * Update CHANGELOG.md * Add minor comment change * Update CHANGELOG.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * Update CHANGELOG.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `c1e1f2faeb`]	2025-09-22 16:25:10 -04:00
Mustafa Abduljabbar	1a7ab8dfc8	Force enable proto and/or algo after model selection (#1799 ) * Force enable proto or algo * Remove inc nccl_common.h * Move logic and add error checks * Fix topo_expl compatibility * Allow algo/proto overrides * Remove extra function decl * Clarify warning message * Move algo/proto overrides into separate functions * Update CHANGELOG.md [ROCm/rccl commit: `7ccc6f268f`]	2025-09-03 08:54:13 -04:00
BertanDogancay	881327184e	Merge remote-tracking branch 'nccl/master' into develop [ROCm/rccl commit: `08a7be231b`]	2025-08-28 15:46:28 -05:00
Avinash	832c5b1f13	[build] Disable MSCCL++ compilation by default (#1879 ) * Enable MSCCLPP on request * Updating docs and README * Updates to CHANGELOG.md * Update CHANGELOG.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * Updates to CHANGELOG.md * Update CHANGELOG.md Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> * Update CHANGELOG.md Github didn't take the edit to my suggestion properly. --------- Co-authored-by: amd <amd@super3.amd.com> Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> [ROCm/rccl commit: `a0ec15bafe`]	2025-08-28 08:52:12 -06:00
Mustafa Abduljabbar	f37f290134	[Device] Add dynamic fetch/reduce pipelining for reduction collectives - Simple protocol (#1861 ) * Support pipelining codegen and template specialization * Support ReduceCopy pipelining for AllReduce, ReduceScatter, and Reduce (currently enabled for bfloat16) * Remove need for FUNC_INDEX_TOTAL * Add pipeline field to device function key construction logic * Avoid unneeded codegen for LL/LL64 kernels * Modify conditions and add pipeline dtypes env * Optimize selection for both gfx942 and gfx950 * Increase pipeline bitfield width * Use __forceinline__ for all device functions * Realign reduceCopy with original form * Add opt-out option to enable perf debugs * Remove force-reduce-pipelining option from README * Update CHANGELOG.md --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `277747c199`]	2025-08-26 15:03:54 -04:00
Nusrat Islam	1af94eee8d	Add direct allgather algorithm (#1868 ) * add direct allgather algorithm * minor fix * add debug print for memory allocation tracker * add message size threshold for direct allgather * scatter transfers across ranks * update changelog * minor fix * Update CHANGELOG.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> * enable direct AG when pxn is ON on MI300X or MI350 --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `5e7937effb`]	2025-08-25 07:55:10 -05:00
Mustafa Abduljabbar	b3a0cc5e96	Add optional bf16 software-triggered pipelining for reduceCopyPacks (#1758 ) - Introduced double-buffering to reduce copy overhead and overlap BF16 arithmetic with data prefetching. - Aimed to improve performance of reduction-based collectives by up to 10%. - Implemented based on recommendations from Guennadi Riguer (AMD) - Added --force-reduce-pipeline option to install.sh to activate this optimization for BF16 reductions. - Feature is disabled by default to prevent regressions with large messages until auto-tuning logic is upstreamed. --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> Co-authored-by: Pedram Alizadeh <pmohamma@amd.com> [ROCm/rccl commit: `0ce20e7e07`]	2025-07-25 10:57:05 -04:00
Wenkai Du	caff9764d3	Support fused all reduce and elementwise operations (#1729 ) * Support fused all reduce and elementwise operations Add additional "acc" parameter to RCCL Replayer logs Add flag which indicates availability of new API * Fix Recorder json parsing * Remove unreachable code * Remove extra acc pointer check * . * Revert "[DEVICE] Adding ability to choose unroll factor at runtime (#1734)" This reverts commit `4cadf3597c`. * Use noinline to reduce kernels linking time * Don't use noinline for gfx942 and gfx950 to avoid perf regression --------- Co-authored-by: AtlantaPepsi <timhu102@amd.com> Co-authored-by: BertanDogancay <bertan.dogancay@gmail.com> [ROCm/rccl commit: `9a4213356d`]	2025-07-23 09:04:17 -07:00
corey-derochie-amd	37ab47fab4	Updated CHANGELOG for LL128 support for gfx942 in 7.0 (#1719 ) * Updated CHANGELOG for LL128 support for gfx942 in 7.0 Also ported 6.4.2 section * Removed unnecessary note from 7.0 [ROCm/rccl commit: `e73db11819`]	2025-06-23 08:50:12 -06:00
BertanDogancay	c0c9312e38	Merge remote-tracking branch 'nccl/master' into develop [ROCm/rccl commit: `aaf023976a`]	2025-06-20 07:54:49 -05:00
Nilesh M Negi	4cadf3597c	[DEVICE] Adding ability to choose unroll factor at runtime (#1734 ) * Adding runtime unroll factor selection via RCCL_UNROLL_FACTOR * [BUILD] Add support for user-defined UNROLL for debugging * Update CHANGELOG.md * Fix COLLTRACE errors in CI * Add debug statements for unroll and resolve warnings * Incorporate UNROLL into ONLY_FUNCS for debugging --------- Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> Co-authored-by: gilbertlee-amd <44450918+gilbertlee-amd@users.noreply.github.com> Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `9d72be7b2f`]	2025-06-11 00:07:59 -05:00
Nilesh M Negi	7abc3160e7	[BUILD] Enable LL128 on gfx950 (#1731 ) * [BUILD] Enable LL128 on gfx950 * Modify comment in src/rccl_wrap.cc * Update CHANGELOG Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> [ROCm/rccl commit: `ef5b4ff630`]	2025-06-09 00:25:54 -05:00
Pedram Alizadeh	1ace5d05ed	Reapplying PR #1641 [AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1713 ) * Reapply "[AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1641)" This reverts commit 943ad6f7820739385a0b54e81f823d0df1dbf71c. * Decreasing NCCL_LL128_SHMEM_ELEMS_PER_THREAD from 16 to 8 [ROCm/rccl commit: `3f7c08648f`]	2025-06-04 13:22:11 -04:00
Avinash	a50ff2c3d3	SPLITCOMM design fix in src/misc/msccl (#1715 ) * Fix TOC-TOU in mcclInit * Improving vector resize thread safety * Initial commit rank to comm change * Removing unwanted include header changes * Updated CHANGELOG.md * Update CHANGELOG.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `e94b360246`]	2025-06-01 21:00:38 -05:00
Nilesh M Negi	19ed482121	Re-apply unroll=1 and 112 channels for gfx950 (#1706 ) * Reapply "[SRC] Enable unroll=1 for gfx950 (#1602)" (#1667) This reverts commit `a6972c0d09`. * Reapply "[GRAPH] Increase default nChannels to 112 for gfx950 (#1596)" (#1620) This reverts commit `1a2eca1756`. [ROCm/rccl commit: `12517a957e`]	2025-05-28 14:58:10 -05:00
corey-derochie-amd	22120c6303	Fixed errors in the CHANGELOG for ROCm 7.0 (#1702 ) * Updated 6.5 release to be 7.0 * Corrected the RCCL version for 6.4.1 * Moved items to the correct releases * Added NCCL 2.25.1 compatibility item * Fixed wording * Added entry for `ManagedMem` and `ManagedMemGraph` test fix [ROCm/rccl commit: `7b633d5844`]	2025-05-23 15:47:59 -05:00
PedramAlizadeh	a99f960742	Revert "[AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1641 )" This reverts commit `951ed9cde1`. [ROCm/rccl commit: `7f878baef0`]	2025-05-21 20:21:27 -05:00
Arm Patinyasakdikul	1313bccaca	CHANGELOG.md: Add UT failures as known issue for 6.4.1. (#1698 ) * CHANGELOG.md: Add UT failures as known issue for 6.4.1. --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `1710c27e77`]	2025-05-19 10:40:50 -05:00
Arm Patinyasakdikul	3e16753c71	Added known issue for 6.4.1 release to CHANGELOG.md. (#1697 ) [ROCm/rccl commit: `e602497789`]	2025-05-16 08:17:48 -05:00
Arm Patinyasakdikul	4b5ff98d65	Change GPU references to gfx950. (#1695 ) [ROCm/rccl commit: `f306c00671`]	2025-05-15 10:32:46 -05:00
Mustafa Abduljabbar	951ed9cde1	[AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1641 ) * Update LL128 elems per thread * Precompute ix[g] in LL128 prim * Make Threadthreshold part of tuning models * Ignore channel tuning when channels are env controlled * Tune LL128 max limit for AG * Tune LL128 max limit for RS * Retune AR LL128 limits due to changes * Update CHANGELOG.md --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `00c1eb098c`]	2025-05-14 14:35:54 -05:00
Mustafa Abduljabbar	128b0e7074	Remove MSCCL single node AllGather XMLs (#1693 ) * Remove MSCCL single node XMLs * Remove comment on MSCCL AG single node support [ROCm/rccl commit: `d665547eef`]	2025-05-13 17:07:03 -05:00
gilbertlee-amd	6e57154001	Fix when more than 64 channels are used for multi-collective group calls (#1688 ) * Fix when more than 64 channels are used for multi-collective group calls * Update CHANGELOG.md Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `9ef45df8f7`]	2025-05-12 18:05:57 -05:00
Nilesh M Negi	a6972c0d09	Revert "[SRC] Enable unroll=1 for gfx950 (#1602 )" (#1667 ) * Revert "[SRC] Enable unroll=1 for gfx950 (#1602)" This reverts commit `210f90ae0f`. * Update Changelog --------- Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> [ROCm/rccl commit: `329e13efff`]	2025-04-30 23:33:08 -05:00
Mustafa Abduljabbar	a85cfaa680	[AllGather MSCCL] Multinode and single node support up to certain send count (#1650 ) * Add multinode and singlenode allgather XML [ROCm/rccl commit: `aa7991dfc8`]	2025-04-24 09:02:03 -04:00
Istvan Kiss	858fa4e65d	Add documentation for NPS4 and CPX partition modes (#1555 ) [ROCm/rccl commit: `28ab8603d2`]	2025-03-31 09:25:25 -06:00
Nilesh M Negi	1a2eca1756	Revert "[GRAPH] Increase default nChannels to 112 for gfx950 (#1596 )" (#1620 ) * Revert "[GRAPH] Increase default nChannels to 112 for gfx950 (#1596)" This reverts commit `cf17cff5b6`. * [DOC] Update Changelog * [DOC] Update CHANGELOG [ROCm/rccl commit: `b17338d164`]	2025-03-28 17:57:06 -05:00
gilbertlee-amd	4ca7e6873e	Rail optimized trees (#1540 ) * Allow disabling rail-optimized trees via RCCL_DISABLE_RAIL_TREES, Graphviz-friendly output via RCCL_OUTPUT_TREES [ROCm/rccl commit: `ddc5d58b93`]	2025-02-20 15:18:29 -07:00
Mustafa Abduljabbar	f58025185e	Add IB verbs logging and enable traces through install.sh (#1511 ) * Add IB Verbs logging * Simplify tracing and undo debug.h changes * Update debug.h * Update CHANGELOG.md * Update CHANGELOG.md * Update CHANGELOG.md * Exchange remote comm device index [ROCm/rccl commit: `dc75209dd7`]	2025-01-31 12:35:39 -05:00
BertanDogancay	1b000665df	Merge remote-tracking branch 'nccl/master' into develop [ROCm/rccl commit: `36343be84f`]	2025-01-23 12:08:46 -06:00
corey-derochie-amd	ebacc24598	Added RCCL env params to control setting the SO_REUSEADDR and SO_LINGER socket options (#1418 ) * Added RCCL env params to control setting the SO_REUSEADDR and SO_LINGER socket options. This can allow control over the number of file descriptors created during bootstrapping. * Casted the linger value to `int` sooner to avoid a scope of unknown typed-ness. * Added CHANGELOG entry for this feature. [ROCm/rccl commit: `2e35417fe5`]	2025-01-14 10:26:04 -07:00
Jeffrey Novotny	cc9209f770	Update rccl changelog for 6.3.1 (#1433 ) * Update rccl changelog for 6.3.1 * Fix version number * Correct RCCL release version * Added details to 6.3.0 changelog --------- Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> [ROCm/rccl commit: `e42f10a361`]	2024-11-26 08:46:37 -05:00
corey-derochie-amd	1c700083b2	Update CHANGELOG to match release branches 6.2 and 6.3 (#1391 ) * [CHANGELOG] Add Known issues for ROCm 6.2.1 Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> * Updated 6.2.1 known issues to match the content in develop. * Updated CHANGELOG for ROCm 6.3 release. (#1380) * Updated CHANGELOG for ROCm 6.3 release. * Update CHANGELOG to new format. Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> --------- Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> Co-authored-by: nileshnegi <Nilesh.Negi@amd.com> Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `6ed513e1b9`]	2024-10-23 13:49:40 -06:00
Sandra Polifroni	53478f138e	Updated the information for 6.2.1 in the changelog so that it reflects what's in the 6.2.1 release notes [ROCm/rccl commit: `7f87b0cd85`]	2024-09-23 14:27:58 -04:00
Bertan Dogancay	ed152c5b89	Update CHANGELOG.md for RCCL 2.20.5 (#1150 ) [ROCm/rccl commit: `dcc75797a1`]	2024-04-24 09:07:49 -06:00
corey-derochie-amd	34fb1007a7	Updated CHANGELOG for next release (#1146 ) * Updated CHANGELOG to release for ROCm 6.1.0 (#1142) * Fixed missing CHANGELOG notes from ROCm 5.5 through unreleased 6.1 (#1141) * Update CHANGELOG.md for ROCm release 5.5 (cherry picked from commit 83342e865445b233319466d4a620c1166ecaf181) * Update CHANGELOG.md for ROCm 5.7.0 (cherry picked from commit a7c3b8dcb5cd0654f0a39cb3be4fdf7e8c820577) * Added ROCm 6.0 and 6.1 CHANGELOG notes. --------- Co-authored-by: gilbertlee-amd <44450918+gilbertlee-amd@users.noreply.github.com> (cherry picked from commit `28a2b09304`) * Updated CHANGELOG to release for ROCm 6.1.0 * Removed empty sections from CHANGELOG in latest releases. (cherry picked from commit 164c9553717f2c3bce86a372764ea73030dd5f72) * Reverted ROCm 6.1.0 block to "Unreleased" [ROCm/rccl commit: `a14137c062`]	2024-04-15 16:29:40 -06:00
gilbertlee-amd	422a7ffcbb	Rail optimization for rings (#1140 ) - Modifies the ring creation algorithm to be friendlier to rail-optimized topologies (should not affect classic fabric topologies) [ROCm/rccl commit: `4cb62f999a`]	2024-04-15 12:03:57 -06:00
corey-derochie-amd	28a2b09304	Fixed missing CHANGELOG notes from ROCm 5.5 through unreleased 6.1 (#1141 ) * Update CHANGELOG.md for ROCm release 5.5 (cherry picked from commit 83342e865445b233319466d4a620c1166ecaf181) * Update CHANGELOG.md for ROCm 5.7.0 (cherry picked from commit a7c3b8dcb5cd0654f0a39cb3be4fdf7e8c820577) * Added ROCm 6.0 and 6.1 CHANGELOG notes. --------- Co-authored-by: gilbertlee-amd <44450918+gilbertlee-amd@users.noreply.github.com> [ROCm/rccl commit: `3361abe786`]	2024-04-11 15:04:40 -06:00
corey-derochie-amd	62a6a07d49	Replaced ROCmSoftwarePlatform and RadeonOpenCompute links with ROCm links. (#1125 ) [ROCm/rccl commit: `503a472a25`]	2024-03-25 16:29:13 -06:00
Wenkai Du	393d0ba7f8	Add back __syncthreads() in barrier and adjust stack size (#688 ) [ROCm/rccl commit: `1c166046a2`]	2023-02-18 08:50:31 -08:00

68 commits