Mustafa Abduljabbar 951ed9cde1 [AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1641)
* Update LL128 elems per thread

* Precompute ix[g] in LL128 prim

* Make thread threshold part of tuning models

* Ignore channel tuning when channels are env controlled

* Tune LL128 max limit for AG

* Tune LL128 max limit for RS

* Retune AR LL128 limits due to changes

* Update CHANGELOG.md

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 00c1eb098c]
2025-05-14 14:35:54 -05:00
Description
No description provided
282 MiB
Languages
C++ 67.5%
C 20.6%
Python 6.6%
CMake 3.4%
Shell 0.6%
Other 1.1%