Kaiming Ouyang d03ae00bac Fix cudaMemcpyAsync bug
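We were using the result of the first cudaMemcpyAsync as an argument to
the second cudaMemcpyAsync without synchronizing in between. Because
cudaMemcpyAsync can return before the copy completes, the second copy
could pick up a value that the first copy had not yet written. This
patch fixes it by allocating a CPU-side array that caches the device-side
addresses, so that we can avoid these consecutive CUDA memory copies.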

Fixes #957


[ROCm/rccl commit: 4365458757]
2023-09-20 05:51:14 -07:00
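The commit message does not show the code it changes. The following is therefore only a host-only C++ model of the hazard it describes, not RCCL's implementation.

Assumptions:
- `FakeStream`, `DeviceState`, `buggyUpdate` and `fixedUpdate` are hypothetical names, not CUDA or RCCL APIs.
- The stream is modeled as deferring every copy until `synchronize()`, which stands in for cudaMemcpyAsync returning before the copy has happened.

The model shows why an address produced by one asynchronous copy cannot safely be passed to the next one. It also shows how a host-side cache of device addresses removes that dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical stand-in for a GPU stream: each async copy captures its
// arguments when it is enqueued but only runs at synchronize(). This models
// the fact that cudaMemcpyAsync may return before the copy has happened.
struct FakeStream {
  std::vector<std::function<void()>> pending;

  void memcpyAsync(void* dst, const void* src, std::size_t n) {
    pending.push_back([dst, src, n] { std::memcpy(dst, src, n); });
  }

  void synchronize() {
    for (auto& op : pending) op();
    pending.clear();
  }
};

// Hypothetical "device" state: a data buffer plus a device-side table
// that stores the buffer's address.
struct DeviceState {
  int buffer[4] = {0, 0, 0, 0};
  int* addrTable[1] = {buffer};
};

// Buggy pattern: read the address back with one async copy, then use it
// as the destination of a second async copy without synchronizing first.
void buggyUpdate(FakeStream& s, DeviceState& dev, const int* src, int* stale) {
  int* hostAddr = stale;  // only correct once the first copy has run
  s.memcpyAsync(&hostAddr, &dev.addrTable[0], sizeof(int*));
  // hostAddr is read here, at enqueue time, so this copy targets `stale`.
  s.memcpyAsync(hostAddr, src, 4 * sizeof(int));
  s.synchronize();
}

// Fixed pattern: the host keeps its own cache of the device addresses, so
// a single copy suffices and no copy depends on another copy's result.
void fixedUpdate(FakeStream& s, int* const* hostAddrCache, const int* src) {
  assert(hostAddrCache[0] != nullptr);
  s.memcpyAsync(hostAddrCache[0], src, 4 * sizeof(int));
  s.synchronize();
}
```

When `stale` points at a scratch array, `buggyUpdate` leaves `dev.buffer` untouched and writes the data into the scratch array instead. `fixedUpdate` writes the data into `dev.buffer`. In this model the failure is deterministic. With a real stream it depends on timing, which is part of why this kind of bug is easy to miss.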
Description
No description provided
282 MiB
Languages
C++ 67.5%
C 20.6%
Python 6.6%
CMake 3.4%
Shell 0.6%
Other 1.1%