* applying thread_fence only on warp 0 before atomic fetch

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 1cefcee51f]