Maintain status of handler callback. For event records we no longer
submit callbacks to reduce the load on the async handler thread. However
without a callback we leak command memory/decrement refcounts. Indicate
status of the handler which we can use to queue a callback when
finish is called.
Change-Id: I89fd02f3d047a0e8162664ee17581a14795f1928
[ROCm/clr commit: 5df34a2f7a]
The heap must be cleared once per device, but ROCclr doesn't
create a queue per device in HIP. Hence, the clear operation will
be performed during the first queue creation.
Change-Id: I52ceb06d67d11cde6d019c5ab510059f426a9bfb
[ROCm/clr commit: 04bfd93569]
Enqueue a handler callback for hipEventRecords(aka marker_ts_) for every
64 submits, This recycles the memory if we dont end up calling
synchronize for the longest time.
Change-Id: I3d39fe76d52a5d81387927edd85b5663b563682c
[ROCm/clr commit: fa76f03654]
- Add a global cache state for a device to indicate scopes of submitted
AQL packets
- Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set
env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records.
It would flush caches by default with system scope release.
- Calling finish() should ensure if caches are flushed, if not queue a
marker
Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0
[ROCm/clr commit: 8eeaa998c0]
We do not want to release resources during setStatus in HIP because of Graphs
Change-Id: Idc7b188ab5f8be6975ea91005dd2bbf177401f8c
[ROCm/clr commit: 133287f31f]
If AMD event contains a reference to a HW event, then runtime
could check/wait for HW event. CPU status update will occur later
after HSA signal callback, but it's not important for the result.
Change-Id: I591391a953bbdba6a25ac07e2cd98aeb17cd4596
[ROCm/clr commit: 85c70a7495]
With direct dispatch enabled make sure the queue is done before
destruction.
Change-Id: Ib80af3efb97dfb93e2dce60a11db34fb5c45f5cd
[ROCm/clr commit: a81756bba3]
- Avoid GPU wait on the marker submission and update the command
batch after HSA signal callback upon HSA barrier completion.
Change-Id: I5c1c97212aefc2ae4b99aa9e2a81627ee9a38c1c
[ROCm/clr commit: 6966d8098e]
Make sure the logic updates the command status when it's done in
HW, but not on submission.
Add the last command tracking, otherwise queue sync logic in the HIP
upper layer may skip synchronization, assuming the queue is empty.
Change-Id: I2d046792553e74df090a10f7d7a78914610f6df2
[ROCm/clr commit: 5b31c69a95]
Reduce the size of the queueLock and lastCmdLock critical sections
to improve lock contention performance. The smaller the critical
sections are the better.
lasCmdLock is still needed to guarantee that getLastEnqueueCommand_
can retain the command before it is swapped out and released.
Change-Id: Id35d4a77c035b2da0de4c15568b153d49e958bb7
[ROCm/clr commit: 080dcfe857]
Two threads can enqueue to the same HostQueue (HostQueue::enqueue)
and result in last queued command being the first one reachine queue_.enqueue
NOTE: Temporarly make setLastQueuedCommand empty function to pass the build
Change-Id: Id09c3a28d184986f52b2ec86a2f6a18c40df1f0b
[ROCm/clr commit: 3d15a1e291]
~45% to 50% of Performance drop on rocBLAS_int8 test
Use the last command in the queue for a wait.
Add extra print information about processed commands.
Add an option to disable file location printing.
Change-Id: I4187883e1a90e571fde3128af98368108fda8785
[ROCm/clr commit: a66d09f5a3]