93ac2ea61e
* Enabling LL128 by default on MI300
* Add missing CUDACHECK
* Adjust BW correction factors to fix the Tree->Ring switching point
* Refactor and add ll128 AR logarithmic factor to tuning models
* Move RCCL tuning changes to a separate file
* Use enum for tunable indexing
* Use explicit indexing in tuning models to avoid mismatch issues
* Place rcclGetSizePerRank in a function
* Remove HIP ifdef for rccl-only call
---------
Co-authored-by: Mustafa Abduljabbar <mustafa.abduljabbar@amd.com>
[ROCm/rccl commit: e40ff4f84a]