e4c025e5cd
* device: optimize threadfence for ll64 protocol
* device: use __atomic_signal_fence()
---------
Co-authored-by: Nusrat Islam <nusislam@useocpslog-003.amd.com>
[ROCm/rccl commit: 6ade5065b4]