93ac2ea61e4caa3e226840e3ebcc47e3889a315d
* Enable LL128 by default on MI300
* Add missing CUDACHECK
* Adjust BW correction factors to fix the Tree->Ring switching point
* Refactor tuning models and add LL128 AR logarithmic factor
* Move RCCL tuning changes to a separate file
* Use enum for tunable indexing
* Use explicit indexing in tuning models to avoid mismatch issues
* Place rcclGetSizePerRank in a function
* Remove HIP ifdef for RCCL-only call
---------
Co-authored-by: Mustafa Abduljabbar <mustafa.abduljabbar@amd.com>
[ROCm/rccl commit: e40ff4f84a]