mberenjk 08c0b8b0fc moving the thread_fence to apply before atomic fetch (#1672)
* applying thread_fence only on warp 0 before atomic fetch

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 1cefcee51f]
2025-05-14 10:10:05 -05:00
Description
No description provided
282 MiB
Languages
C++ 67.5%
C 20.6%
Python 6.6%
CMake 3.4%
Shell 0.6%
Other 1.1%