rocm-systems

Author	SHA1	Message	Date
Mustafa Abduljabbar	f37f290134	[Device] Add dynamic fetch/reduce pipelining for reduction collectives - Simple protocol (#1861) * Support pipelining codegen and template specialization * Support ReduceCopy pipelining for AllReduce, ReduceScatter, and Reduce (currently enabled for bfloat16) * Remove need for FUNC_INDEX_TOTAL * Add pipeline field to device function key construction logic * Avoid unneeded codegen for LL/LL64 kernels * Modify conditions and add pipeline dtypes env * Optimize selection for both gfx942 and gfx950 * Increase pipeline bitfield width * Use __forceinline__ for all device functions * Realign reduceCopy with original form * Add opt-out option to enable perf debugs * Remove force-reduce-pipelining option from README * Update CHANGELOG.md --------- Co-authored-by: Jeffrey Novotny <jnovotny@amd.com> [ROCm/rccl commit: `277747c199`]	2025-08-26 15:03:54 -04:00
mberenjk	c76a4492f1	Added useAcc as a template parameter to address the performance regression (#1856) * Added useAcc as a template parameter to address the 2% performance regression in allreduceWithBias --------- Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com> [ROCm/rccl commit: `c61152baa4`]	2025-08-14 15:58:54 -05:00
BertanDogancay	d045d0ca23	Merge remote-tracking branch 'nccl/master' into develop [ROCm/rccl commit: `a6bf9bfc9e`]	2025-04-23 20:47:43 -07:00
BertanDogancay	1b000665df	Merge remote-tracking branch 'nccl/master' into develop [ROCm/rccl commit: `36343be84f`]	2025-01-23 12:08:46 -06:00
Bertan Dogancay	974c13cd62	[BUILD] Move code generation to python from CMake (#1360) * Use generate.py for func generation * Convert AddUnroll.cmake to bash [ROCm/rccl commit: `2dd10c8f17`]	2024-10-03 10:21:19 -04:00

5 commits