caff9764d35f7528e18870b19d53e9cb2ba9534c
* Support fused all reduce and elementwise operations
  Add additional "acc" parameter to RCCL Replayer logs
  Add flag which indicates availability of new API
* Fix Recorder json parsing
* Remove unreachable code
* Remove extra acc pointer check
* .
* Revert "[DEVICE] Adding ability to choose unroll factor at runtime (#1734)"
  This reverts commit 4cadf3597c.
* Use noinline to reduce kernels linking time
* Don't use noinline for gfx942 and gfx950 to avoid perf regression

---------

Co-authored-by: AtlantaPepsi <timhu102@amd.com>
Co-authored-by: BertanDogancay <bertan.dogancay@gmail.com>

[ROCm/rccl commit: 9a4213356d]
Description
No description provided
Languages
C++ 67.5%
C 20.6%
Python 6.6%
CMake 3.4%
Shell 0.6%
Other 1.1%