The change reuses HSA signals for dispatches as a wait signal.
Skipping the barrier requires to disable L2 cache for sysmem
allocations and extra tracking for HDP access with the large bar.
ROC_BARRIER_SYNC=0 activates the new logic. Barrier sync is
still used by default.
ROC_ACTIVE_WAIT=1 enables unconditional active wait in ROCr.
The change also consolidated ROCr wait logic under single function.
Change-Id: I6bd1be30aa88258da1b1f9de319ef5a45852afd8
Fix a typo with the name define, when compilation wasn't enabled.
Force CPU prefetch if system was forced in runtime
Change-Id: Id4b578f9fa44a45426fdb5d8ecb1da803aa42313
- Expose ROCclr interfaces for HIP usage
- ROCr interfaces aren't available in staging, thus control the
build with AMD_HMM_SUPPORT define
Change-Id: Iadc2bcc230e78d3b0dc22b235189c8cc80843446
Remove queue limitation since we loop through HW queues now.
Add a DevLogError if we fail to create the hsa_queue. A ticket showed a regression there.
Change-Id: I4f58e405f88e75600a762f6d6352838c969cdb5e
This workaround is to avoid performance penalty of SDMA engine
taking a while to clock up from a lower DPM state. Add env var
GPU_FORCE_BLIT_COPY_SIZE (1024 by default for HIP in KB). Forcing
Src and Dst agent to be amdgpu makes ROCr take blit copy path for
what otherwise should have been SDMA copy
Change-Id: I222f687155f86000d17d66d25182e490b6710463
1. Enable pitch workaround
2. When we use copy image, we don't need to create the custom pitch image
3. wrtBackImageBuffer_ stores device memory object, not amd image object.
Tests:
conformance kernel read / write test pass with this code change.
Change-Id: I7dca3127adde6ac83e78dd270a2256ebed55c60d