Enqueue a handler callback for hipEventRecords(aka marker_ts_) for every
64 submits, This recycles the memory if we dont end up calling
synchronize for the longest time.
Change-Id: I3d39fe76d52a5d81387927edd85b5663b563682c
- Add a global cache state for a device to indicate scopes of submitted
AQL packets
- Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set
env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records.
It would flush caches by default with system scope release.
- Calling finish() should ensure if caches are flushed, if not queue a
marker
Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0
- Queue handler for hipEventRecord(aka marker_ts_) only if there is a
callback associated with it.
Change-Id: I8a9877ae0e342556053abbaacc9510744a8e772a
The optimization is controlled with ROCR_SKIP_KERNEL_ARG_COPY.
This is initial check-in for experiments. Extra changes are
necessary for full support:
- handle graph capture with the original sysmem alloc
- avoid memobject references, otherwise there is a race condition with
reusage of the arg buffer
- Remove arg setup from hip
Change-Id: Ib0af710f93e79834711fa4049a7c66093711e68b
std::mem_fun() and std::bind2nd() are removed in c++17. Switch to
simpler logic that does not require those functions.
Change-Id: I19a31f076e1813e367615bd377b424046ce144c7
Add ref counting to ProfilingSignal class to track the last release.
If a signal was used in the marker, then don't reuse it,
but create a new one for internal usage.
Don't rely on HSA callback for the command status update if there
are no pending dispatches.
Change-Id: I19f14ed9d80acfe79993b343b2187635f8428a20
HSA signal calback may occur during the actual marker submit. That
may cause a deadlock, because shared lock_ object. Create the new
notify_lock_ field to protect the notification.
Change-Id: I9752af84e59895530620fac3932c6fc276de8658
Runtime can't assign internal HSA signals for HIP events, because
HIP application can destroy the HIP stream or signal reuse may
occur internally. Switch to global HSA signals for HIP events.
Change-Id: Ieaea2d6b039e492b2e7c5112782a8f4e601e50a1
If AMD event contains a reference to a HW event, then runtime
could check/wait for HW event. CPU status update will occur later
after HSA signal callback, but it's not important for the result.
Change-Id: I591391a953bbdba6a25ac07e2cd98aeb17cd4596
HIP tests require HIP callbacks to be processed in another thread.
This change will use a thread from HSA signal callbacks to make
sure a HIP callback was done asynchronously.
Also process the callback before changing the status of command
Change-Id: Icef85d0e0f808663882cf6881ff1be3e5eca29ac
- Don't notify if the batch is empty, because that means
the current command was processed already.
- Disable pinning optimization to avoid a race condition on stall.
- TS marker submition requires extra AQL barrier
to track the status.
Change-Id: I17eff4ad12ac66cfe1bb44048bebb1891805279d
Direct disaptch doesn't insert extra barriers for Markers if
AQL barrier was the last issued command already.
Change-Id: I00fbc658547d83dd3ee64ec391ed50e5f8a08e30
- Avoid GPU wait on the marker submission and update the command
batch after HSA signal callback upon HSA barrier completion.
Change-Id: I5c1c97212aefc2ae4b99aa9e2a81627ee9a38c1c
Make sure the logic updates the command status when it's done in
HW, but not on submission.
Add the last command tracking, otherwise queue sync logic in the HIP
upper layer may skip synchronization, assuming the queue is empty.
Change-Id: I2d046792553e74df090a10f7d7a78914610f6df2
The hack dosn't really track the commands status. It may be not
necessary for HIP, but will cause early resource release.
Change-Id: I791ad36dd8abd3b6b3d2c9b16a210a555c08ca64
OCL can't distinguish different copy types, but ROC profiler
expects SDMA transfer visibility. Add extra code to detect
a transfer with the host memory and substitute OCL command
Change-Id: I5290acd0e10bc082e00c1d4ae1474a075de7f165
Replace amd::Atomic with std::atomic. Remove make_atomic uses by
converting the variable to std::atomic and making sure the memory
order is relaxed when synchronizes-with is not needed.
Delete utils/atomic.hpp.
Change-Id: I0b36db8d604a8510ac6e36b32885fd16a1b8ccfa
Two threads can enqueue to the same HostQueue (HostQueue::enqueue)
and result in last queued command being the first one reachine queue_.enqueue
NOTE: Temporarly make setLastQueuedCommand empty function to pass the build
Change-Id: Id09c3a28d184986f52b2ec86a2f6a18c40df1f0b
Some apps use P2P transfer without any validation for peer access.
Report an error if runtime has found such a request.
Change-Id: I3bf728f1fc3969697ade97bb1d2f1dce294078e2
- Expose ROCclr interfaces for HIP usage
- ROCr interfaces aren't available in staging, thus control the
build with AMD_HMM_SUPPORT define
Change-Id: Iadc2bcc230e78d3b0dc22b235189c8cc80843446