Pedram Alizadeh 1ace5d05ed Reapplying PR #1641 [AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1713)
* Reapply "[AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1641)"

This reverts commit 943ad6f7820739385a0b54e81f823d0df1dbf71c.

* Decreasing NCCL_LL128_SHMEM_ELEMS_PER_THREAD from 16 to 8

[ROCm/rccl commit: 3f7c08648f]
2025-06-04 13:22:11 -04:00
Description
No description provided
282 MiB
Languages
C++ 67.5%
C 20.6%
Python 6.6%
CMake 3.4%
Shell 0.6%
Other 1.1%