Wenkai Du 481a35bc59 Fix memory fence and use non-temporal store (#1007)
* Fix memory fence and use non-temporal store

* Use amdgcn builtin instead of inline asm

* Move threadfence location

* Revert changes to gfx90a

* Rework gfx90a change

* Apply changes to gfx94x

[ROCm/rccl commit: 7965c8b53c]
2023-12-09 12:16:08 -08:00
Languages
C++ 67.5%
C 20.6%
Python 6.6%
CMake 3.4%
Shell 0.6%
Other 1.1%