b70968d769
1) Currently cpu wait is set to true, which makes the host wait for the
last command in the queue to finish even if kernel execution has
already finished, causing a delay in the device sync call.
2) Device sync only needs to await completion when the hw event is not
ready.
Change-Id: I91e3e89d39a1193ae06abac822cea8ae651493a5
[ROCm/clr commit: eb1089593e]