a724f1ebb79f8d0d1bb37630bce3c028dcefe6b6
* Add ring simple chunk size tuning
* modifying the tuning table to improve the performance of broadcast for 8MB to 32MB for single-node MI300X after ring simple chunk size tuning
* modifying the tuning table to improve the performance of reduce for 1MB to 4MB for single-node MI300X after ring simple chunk size tuning
---------
Co-authored-by: PedramAlizadeh <pmohamma@amd.com>
[ROCm/rccl commit: 73221b4230]
Описание
No description provided
Languages
C++
67.5%
C
20.6%
Python
6.6%
CMake
3.4%
Shell
0.6%
Разное
1.1%