Reduce the size of the queueLock and lastCmdLock critical sections
to improve lock contention performance. The smaller the critical
sections are the better.
lasCmdLock is still needed to guarantee that getLastEnqueueCommand_
can retain the command before it is swapped out and released.
Change-Id: Id35d4a77c035b2da0de4c15568b153d49e958bb7
[ROCm/clr commit: 080dcfe857]
Two threads can enqueue to the same HostQueue (HostQueue::enqueue)
and result in last queued command being the first one reachine queue_.enqueue
NOTE: Temporarly make setLastQueuedCommand empty function to pass the build
Change-Id: Id09c3a28d184986f52b2ec86a2f6a18c40df1f0b
[ROCm/clr commit: 3d15a1e291]
~45% to 50% of Performance drop on rocBLAS_int8 test
Use the last command in the queue for a wait.
Add extra print information about processed commands.
Add an option to disable file location printing.
Change-Id: I4187883e1a90e571fde3128af98368108fda8785
[ROCm/clr commit: a66d09f5a3]