- Add a global cache state for a device to indicate scopes of submitted
AQL packets
- Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set
env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records.
It would flush caches by default with system scope release.
- Calling finish() should ensure if caches are flushed, if not queue a
marker
Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0
If AMD event contains a reference to a HW event, then runtime
could check/wait for HW event. CPU status update will occur later
after HSA signal callback, but it's not important for the result.
Change-Id: I591391a953bbdba6a25ac07e2cd98aeb17cd4596
- Avoid GPU wait on the marker submission and update the command
batch after HSA signal callback upon HSA barrier completion.
Change-Id: I5c1c97212aefc2ae4b99aa9e2a81627ee9a38c1c
Make sure the logic updates the command status when it's done in
HW, but not on submission.
Add the last command tracking, otherwise queue sync logic in the HIP
upper layer may skip synchronization, assuming the queue is empty.
Change-Id: I2d046792553e74df090a10f7d7a78914610f6df2
Reduce the size of the queueLock and lastCmdLock critical sections
to improve lock contention performance. The smaller the critical
sections are the better.
lasCmdLock is still needed to guarantee that getLastEnqueueCommand_
can retain the command before it is swapped out and released.
Change-Id: Id35d4a77c035b2da0de4c15568b153d49e958bb7
Two threads can enqueue to the same HostQueue (HostQueue::enqueue)
and result in last queued command being the first one reachine queue_.enqueue
NOTE: Temporarly make setLastQueuedCommand empty function to pass the build
Change-Id: Id09c3a28d184986f52b2ec86a2f6a18c40df1f0b
~45% to 50% of Performance drop on rocBLAS_int8 test
Use the last command in the queue for a wait.
Add extra print information about processed commands.
Add an option to disable file location printing.
Change-Id: I4187883e1a90e571fde3128af98368108fda8785