951ed9cde1570b471b2ab66dccab0f439f2e45f8
* Update LL128 elems per thread
* Precompute ix[g] in LL128 prim
* Make thread threshold part of tuning models
* Ignore channel tuning when channels are env controlled
* Tune LL128 max limit for AG
* Tune LL128 max limit for RS
* Retune AR LL128 limits due to changes
* Update CHANGELOG.md
---------
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
[ROCm/rccl commit: 00c1eb098c]