rocm-systems/projects
Pedram Alizadeh 1ace5d05ed Reapplying PR #1641 [AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1713)
* Reapply "[AG and RS channel tuning] Add thread work threshold to tuning models and precompute reg index in LL128 (#1641)"

This reverts commit 943ad6f7820739385a0b54e81f823d0df1dbf71c.

* Decreasing NCCL_LL128_SHMEM_ELEMS_PER_THREAD from 16 to 8

[ROCm/rccl commit: 3f7c08648f]
2025-06-04 13:22:11 -04:00