Eliminate host-side synchronization for the null stream. Instead, the null stream uses GPU-side events to wait for other streams. The new behavior is OFF by default pending additional testing. Add an enhanced null-stream test. Also refine HIP_TRACE_API.