951ed9cde1
* Update LL128 elems per thread
* Precompute ix[g] in LL128 prim
* Make Threadthreshold part of tuning models
* Ignore channel tuning when channels are env controlled
* Tune LL128 max limit for AG
* Tune LL128 max limit for RS
* Retune AR LL128 limits due to changes
* Update CHANGELOG.md
---------
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>
[ROCm/rccl commit: 00c1eb098c]