Files
rocm-systems/src/include
Mustafa Abduljabbar 277747c199 [Device] Add dynamic fetch/reduce pipelining for reduction collectives - Simple protocol (#1861)
* Support pipelining codegen and template specialization

* Support ReduceCopy pipelining for AllReduce, ReduceScatter, and Reduce (currently enabled for bfloat16)

* Remove need for FUNC_INDEX_TOTAL

* Add pipeline field to device function key construction logic

* Avoid unneeded codegen for LL/LL64 kernels

* Modify conditions and add pipeline dtypes env

* Optimize selection for both gfx942 and gfx950

* Increase pipeline bitfield width

* Use __forceinline__ for all device functions

* Realign reduceCopy with original form

* Add opt-out option to enable perf debugs

* Remove force-reduce-pipelining option from README

* Update CHANGELOG.md

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
2025-08-26 15:03:54 -04:00