With this change, the CPU memcpy waits until all commands in the current stream have completed. Note that it waits only on the current stream, not on other streams.
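The ordering described above can be illustrated with a small host-only model. This is a toy sketch, not the code in this change: `ToyStream` and `cpu_memcpy` are hypothetical names, and a stream is assumed to behave like an in-order queue of pending commands.

```cpp
#include <cstddef>
#include <cstring>
#include <deque>
#include <functional>
#include <utility>

// Toy stand-in for a device stream: an ordered queue of pending commands.
// Hypothetical type for illustration; not the API of any real runtime.
struct ToyStream {
    std::deque<std::function<void()>> pending;

    // Queue a command without running it (models an asynchronous launch).
    void enqueue(std::function<void()> cmd) { pending.push_back(std::move(cmd)); }

    // Run every queued command in order (models waiting for the stream to finish).
    void synchronize() {
        while (!pending.empty()) {
            auto cmd = std::move(pending.front());
            pending.pop_front();
            cmd();
        }
    }
};

// Host-side copy that first waits on the current stream only.
// Commands queued on any other stream are left untouched.
void cpu_memcpy(ToyStream& current, void* dst, const void* src, std::size_t n) {
    current.synchronize();
    std::memcpy(dst, src, n);
}
```

In this model, a command queued on the current stream that produces the source buffer is guaranteed to have run before the copy. A command queued on a different stream is not, so a caller that depends on work in another stream would presumably still need to synchronize with that stream explicitly.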