1ace5d05eda412447eb54bf09eef38da5ca6d9b4
* Reapply "[AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1641)"
This reverts commit 943ad6f7820739385a0b54e81f823d0df1dbf71c.
* Decreasing NCCL_LL128_SHMEM_ELEMS_PER_THREAD from 16 to 8
[ROCm/rccl commit: 3f7c08648f]
Languages
C++ 67.5%
C 20.6%
Python 6.6%
CMake 3.4%
Shell 0.6%
Other 1.1%