rocm-systems

Mustafa Abduljabbar 277747c199 [Device] Add dynamic fetch/reduce pipelining for reduction collectives - Simple protocol (#1861)

* Support pipelining codegen and template specialization

* Support ReduceCopy pipelining for AllReduce, ReduceScatter, and Reduce (currently enabled for bfloat16)

* Remove need for FUNC_INDEX_TOTAL

* Add pipeline field to device function key construction logic

* Avoid unneeded codegen for LL/LL64 kernels

* Modify conditions and add pipeline dtypes env

* Optimize selection for both gfx942 and gfx950

* Increase pipeline bitfield width

* Use __forceinline__ for all device functions

* Realign reduceCopy with original form

* Add opt-out option to enable perf debugs

* Remove force-reduce-pipelining option from README

* Update CHANGELOG.md

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

2025-08-26 15:03:54 -04:00

scripts

[Device] Add dynamic fetch/reduce pipelining for reduction collectives - Simple protocol (#1861)

2025-08-26 15:03:54 -04:00

CheckSymbolExistsNoWarn.cmake

Hide or fix all build warnings (#1331)

2024-11-04 09:46:42 -07:00

Dependencies.cmake

[BUILD] Use fmt-header instead of libfmt (#1791)

2025-07-10 17:19:53 -05:00

DownloadProject.cmake

Only initialize MSCCL++ when runtime-enabled. (#1266)