08c0b8b0fc
* Apply thread_fence only on warp 0 before the atomic fetch
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl commit: 1cefcee51f]