The runtime can't assign internal HSA signals to HIP events, because
the HIP application can destroy the HIP stream, or signal reuse may
occur internally. Switch to global HSA signals for HIP events.
Change-Id: Ieaea2d6b039e492b2e7c5112782a8f4e601e50a1
[ROCm/clr commit: ce8dad2ecc]
We do not want to release resources during setStatus in HIP because of Graphs.
Change-Id: Idc7b188ab5f8be6975ea91005dd2bbf177401f8c
[ROCm/clr commit: 133287f31f]
If an AMD event contains a reference to a HW event, then the runtime
can check/wait for the HW event. The CPU status update will occur later,
after the HSA signal callback, but that doesn't affect the result.
Change-Id: I591391a953bbdba6a25ac07e2cd98aeb17cd4596
[ROCm/clr commit: 85c70a7495]
For direct dispatch (DD), send a NOP packet so that we leverage the
handler to indicate completion.
Change-Id: Ie57ea0124a8497d39cc49da1c4575c2cd86b9319
[ROCm/clr commit: 9d0846e732]
HIP tests require HIP callbacks to be processed in another thread.
This change will use a thread from HSA signal callbacks to make
sure a HIP callback was done asynchronously.
Also process the callback before changing the status of the command.
Change-Id: Icef85d0e0f808663882cf6881ff1be3e5eca29ac
[ROCm/clr commit: 7f32d0b425]
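The ordering described in the commit above can be sketched as follows. This is a minimal illustration, not the runtime's code: `CommandSketch`, `Status`, and `completeAsync` are hypothetical names, and a plain `std::thread` stands in for the thread that services HSA signal callbacks.

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Hypothetical command states, reduced to the two needed here.
enum class Status { Queued, Complete };

// Hypothetical stand-in for a runtime command with a user callback.
struct CommandSketch {
  std::atomic<Status> status{Status::Queued};
  std::function<void()> callback;
};

// Runs the completion path on a separate thread and returns that thread's id.
// The callback is invoked first; the status changes only afterwards.
std::thread::id completeAsync(CommandSketch& cmd) {
  std::thread::id workerId;
  std::thread worker([&] {
    workerId = std::this_thread::get_id();
    if (cmd.callback) cmd.callback();  // 1. process the callback
    cmd.status.store(Status::Complete, std::memory_order_release);  // 2. update status
  });
  // Joined here only to keep the sketch deterministic; a real handler
  // thread would keep running independently of the caller.
  worker.join();
  return workerId;
}
```

With this ordering, a callback that inspects the command still observes the pre-completion status, and it never runs on the thread that submitted the work.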
- Don't notify if the batch is empty, because that means
the current command was already processed.
- Disable the pinning optimization to avoid a race condition on stall.
- TS marker submission requires an extra AQL barrier
to track the status.
Change-Id: I17eff4ad12ac66cfe1bb44048bebb1891805279d
[ROCm/clr commit: 24299e25bd]
Skip notification for markers with direct dispatch only,
since they are always blocking.
Change-Id: I6bb17650f73371dae6e29c59fd6bb2012cc062fd
[ROCm/clr commit: a9b0e20d26]
Direct dispatch doesn't insert extra barriers for markers if
an AQL barrier was already the last issued command.
Change-Id: I00fbc658547d83dd3ee64ec391ed50e5f8a08e30
[ROCm/clr commit: 0587fb7450]
- Avoid GPU wait on the marker submission and update the command
batch after HSA signal callback upon HSA barrier completion.
Change-Id: I5c1c97212aefc2ae4b99aa9e2a81627ee9a38c1c
[ROCm/clr commit: 6966d8098e]
Make sure the logic updates the command status when the command
completes in HW, not on submission.
Add the last command tracking, otherwise queue sync logic in the HIP
upper layer may skip synchronization, assuming the queue is empty.
Change-Id: I2d046792553e74df090a10f7d7a78914610f6df2
[ROCm/clr commit: 5b31c69a95]
The hack doesn't really track the command status. It may not be
necessary for HIP, but it will cause early resource release.
Change-Id: I791ad36dd8abd3b6b3d2c9b16a210a555c08ca64
[ROCm/clr commit: 532f0ae951]
OCL can't distinguish different copy types, but the ROC profiler
expects SDMA transfer visibility. Add extra code to detect
a transfer with the host memory and substitute the OCL command.
Change-Id: I5290acd0e10bc082e00c1d4ae1474a075de7f165
[ROCm/clr commit: bd340d8cbf]
Replace amd::Atomic with std::atomic. Remove make_atomic uses by
converting the variable to std::atomic and making sure the memory
order is relaxed when synchronizes-with is not needed.
Delete utils/atomic.hpp.
Change-Id: I0b36db8d604a8510ac6e36b32885fd16a1b8ccfa
[ROCm/clr commit: 5d4b6f74d3]
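The distinction the commit above draws between relaxed and synchronizing accesses can be illustrated with standard `std::atomic` only. The types below (`CommandCounter`, `Mailbox`) and the helper `countFromThreads` are hypothetical examples, not code from the runtime.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical statistic counter. No other data is published through it,
// so no synchronizes-with relation is needed and relaxed ordering suffices.
class CommandCounter {
 public:
  void increment() { count_.fetch_add(1, std::memory_order_relaxed); }
  std::uint64_t load() const { return count_.load(std::memory_order_relaxed); }

 private:
  std::atomic<std::uint64_t> count_{0};
};

// Hypothetical mailbox. The flag publishes `payload`, so the store must be a
// release and the load an acquire; relaxed would not be safe here.
struct Mailbox {
  int payload = 0;
  std::atomic<bool> ready{false};

  void publish(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
  }

  bool tryRead(int& out) const {
    if (!ready.load(std::memory_order_acquire)) return false;
    out = payload;
    return true;
  }
};

// Increments the counter from several threads and returns the final value.
// Joining the threads orders their increments before the final load.
std::uint64_t countFromThreads(unsigned threads, unsigned perThread) {
  CommandCounter counter;
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < threads; ++t) {
    pool.emplace_back([&counter, perThread] {
      for (unsigned i = 0; i < perThread; ++i) counter.increment();
    });
  }
  for (auto& th : pool) th.join();
  return counter.load();
}
```

Relaxed operations remain atomic, so no increments are lost; they only drop the ordering guarantees with respect to other memory.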
Two threads can enqueue to the same HostQueue (HostQueue::enqueue),
which can result in the last queued command being the first one reaching queue_.enqueue.
NOTE: Temporarily make setLastQueuedCommand an empty function to pass the build.
Change-Id: Id09c3a28d184986f52b2ec86a2f6a18c40df1f0b
[ROCm/clr commit: 3d15a1e291]
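The race described above arises when recording the last queued command and pushing into the queue are two separate steps. One common way to close such a race is to perform both under a single lock. The sketch below shows that idea only; it is not the fix made in the commit (which stubs out setLastQueuedCommand), and `HostQueueSketch`, `Command`, and `enqueueFromTwoThreads` are hypothetical names.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical command type.
struct Command {
  int id = 0;
};

// Hypothetical queue that keeps "last queued" consistent with queue order.
class HostQueueSketch {
 public:
  void enqueue(Command* cmd) {
    // Both updates happen under one lock, so no other thread can slip in
    // between the push and the "last queued" update.
    std::lock_guard<std::mutex> lock(lock_);
    queue_.push_back(cmd);
    lastQueued_ = cmd;
  }

  Command* lastQueued() const {
    std::lock_guard<std::mutex> lock(lock_);
    return lastQueued_;
  }

  Command* back() const {
    std::lock_guard<std::mutex> lock(lock_);
    return queue_.empty() ? nullptr : queue_.back();
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(lock_);
    return queue_.size();
  }

 private:
  mutable std::mutex lock_;
  std::deque<Command*> queue_;
  Command* lastQueued_ = nullptr;
};

// Enqueues from two threads and reports whether the recorded last command
// matches the command that is actually at the tail of the queue.
bool enqueueFromTwoThreads(std::size_t perThread) {
  HostQueueSketch queue;
  std::vector<Command> a(perThread), b(perThread);
  std::thread t1([&] { for (auto& c : a) queue.enqueue(&c); });
  std::thread t2([&] { for (auto& c : b) queue.enqueue(&c); });
  t1.join();
  t2.join();
  return queue.size() == 2 * perThread && queue.lastQueued() == queue.back();
}
```

The trade-off is that every enqueue serializes on the lock, which may or may not be acceptable on a hot submission path.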
Some apps use P2P transfers without validating peer access first.
Report an error if the runtime finds such a request.
Change-Id: I3bf728f1fc3969697ade97bb1d2f1dce294078e2
[ROCm/clr commit: 01c2727a3a]
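A check of this kind can be sketched as a lookup performed before the copy is accepted. Everything below is hypothetical: `PeerAccessSketch`, `CopyStatus`, and the rule that access is directional (the source device must have enabled access to the destination) are assumptions made for illustration, and the runtime's actual rule may differ.

```cpp
#include <set>
#include <utility>

// Hypothetical result codes for the validation step.
enum class CopyStatus { Ok, PeerAccessNotEnabled };

// Hypothetical table of (device, peer) pairs for which access was enabled.
class PeerAccessSketch {
 public:
  void enablePeerAccess(int device, int peer) { enabled_.emplace(device, peer); }

  // Rejects a cross-device copy unless peer access was enabled beforehand.
  CopyStatus validateCopy(int srcDevice, int dstDevice) const {
    if (srcDevice == dstDevice) return CopyStatus::Ok;  // not a P2P transfer
    const bool allowed =
        enabled_.count(std::make_pair(srcDevice, dstDevice)) != 0;
    return allowed ? CopyStatus::Ok : CopyStatus::PeerAccessNotEnabled;
  }

 private:
  std::set<std::pair<int, int>> enabled_;
};
```

Returning an error at submission time surfaces the application bug immediately instead of letting the transfer fail or fault later.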
- Expose ROCclr interfaces for HIP usage
- ROCr interfaces aren't available in staging, so control the
build with the AMD_HMM_SUPPORT define
Change-Id: Iadc2bcc230e78d3b0dc22b235189c8cc80843446
[ROCm/clr commit: c5afd5d412]
Bottom layers don't error check this value, so we might end up writing
a bad value to a register and cause the SPI to hang.
Change-Id: I6da4ae71c66a25c63ebb804da4afe4ca7fb831b7
[ROCm/clr commit: 6e985845b3]
- Check the queue for nullptr, since user events may not have
a queue associated with them
Change-Id: Ib969a052acc9108ca3fd0c063157fe4d47c5b244
[ROCm/clr commit: 288967eff4]
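The guard described above amounts to checking the queue pointer before dereferencing it. A minimal sketch, with hypothetical `QueueSketch`, `EventSketch`, and `flushQueueOf` names that do not come from the runtime:

```cpp
// Hypothetical queue with a count of pending commands.
struct QueueSketch {
  int pending = 0;
};

// Hypothetical event. A user event is modeled as one with no queue.
struct EventSketch {
  QueueSketch* queue = nullptr;
};

// Flushes the queue that owns the event, if there is one.
// Returns false for events without a queue instead of dereferencing nullptr.
bool flushQueueOf(const EventSketch& event) {
  if (event.queue == nullptr) return false;  // user event: nothing to flush
  event.queue->pending = 0;
  return true;
}
```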
~45% to 50% performance drop on the rocBLAS_int8 test
Add support for active waits without blocking the host thread.
Change-Id: Ie7bb48dcafcb4c93d448bf74749b829b626c3578
[ROCm/clr commit: 0fc433e076]
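One reading of "active wait" is polling a completion flag instead of parking the thread in a blocking OS wait. The sketch below illustrates that reading only; `SignalSketch` and `activeWait` are hypothetical, and the time budget is an assumption added so the loop always terminates.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a completion signal.
struct SignalSketch {
  std::atomic<bool> done{false};
};

// Polls the signal until it completes or the time budget runs out.
// The thread never enters a blocking wait; it only yields between polls.
// Returns true if the signal completed within the budget.
bool activeWait(const SignalSketch& signal, std::chrono::milliseconds budget) {
  const auto deadline = std::chrono::steady_clock::now() + budget;
  while (!signal.done.load(std::memory_order_acquire)) {
    if (std::chrono::steady_clock::now() >= deadline) return false;
    std::this_thread::yield();
  }
  return true;
}
```

Polling trades CPU time for lower wake-up latency, which matters most for short-running work.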
~45% to 50% performance drop on the rocBLAS_int8 test
Use the last command in the queue for a wait.
Add extra print information about processed commands.
Add an option to disable file location printing.
Change-Id: I4187883e1a90e571fde3128af98368108fda8785
[ROCm/clr commit: a66d09f5a3]