Sync between compute and SDMA engines can be very expensive under Windows.
Use CP DMA for tiny transfers (< 1KiB) to avoid syncs and improve performance.
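
A minimal sketch of the size-based engine choice described above. The names CopyEngine, kCpDmaThresholdBytes and selectCopyEngine are hypothetical and do not come from ROCclr; only the 1 KiB cutoff is taken from this message.

```cpp
#include <cstddef>

// Hypothetical engine identifiers; these are not ROCclr names.
enum class CopyEngine { CpDma, Sdma };

// Cutoff from the commit message: transfers smaller than 1 KiB use CP DMA.
constexpr std::size_t kCpDmaThresholdBytes = 1024;

// Picks the copy engine for a transfer of `bytes` bytes.
// Tiny transfers stay on the compute queue (CP DMA), so no cross-engine
// sync is needed; everything else goes to SDMA.
constexpr CopyEngine selectCopyEngine(std::size_t bytes) {
  return bytes < kCpDmaThresholdBytes ? CopyEngine::CpDma : CopyEngine::Sdma;
}
```

With this sketch, a 512-byte copy selects CopyEngine::CpDma and a copy of exactly 1024 bytes selects CopyEngine::Sdma.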
Change-Id: I9db39a2199f7b9e337ed08fd36d9cbc150502f1f
HIP can't rely on the resource tracking used in OCL and requires a different, explicit sync.
Make sure ROCCLR syncs compute only when SDMA is used and vice versa.
The new logic allows CPDMA to be enabled without unnecessary waits.
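
One way to picture the "sync only when the engine changes" rule is sketched below. Engine, needsSync, EngineTracker and countSyncs are hypothetical names for illustration, not the ROCclr implementation.

```cpp
// Hypothetical engine identifiers; these are not ROCclr names.
enum class Engine { None, Compute, Sdma };

// A wait is needed only if earlier work ran on a different engine.
constexpr bool needsSync(Engine last, Engine next) {
  return last != Engine::None && last != next;
}

// Remembers which engine the most recent submission used.
struct EngineTracker {
  Engine last = Engine::None;

  // Records a submission and returns true if the caller has to wait
  // for the previously used engine before running on `next`.
  constexpr bool submit(Engine next) {
    const bool sync = needsSync(last, next);
    last = next;
    return sync;
  }
};

// Counts how many syncs a sequence of submissions would trigger.
constexpr int countSyncs(const Engine* seq, int n) {
  EngineTracker tracker{};
  int syncs = 0;
  for (int i = 0; i < n; ++i) {
    if (tracker.submit(seq[i])) ++syncs;
  }
  return syncs;
}
```

Under these assumptions, the sequence Compute, Compute, Sdma, Sdma, Compute needs two syncs, one at each engine switch, while back-to-back work on the same engine needs none.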
Change-Id: Ib9d1788cfd5afa5ea2fec4c96a37d8b9c4d0059d
Blender creates and destroys big allocations during the benchmark.
That causes big delays, because vidmm has to page-in/page-out memory.
Change-Id: I2baf4545807127406e3d2870a7581ff9ae7bcdb5
Adding virtual memory management APIs to rocclr.
The HIP layer will handle virtual allocs on devices.
Change-Id: Ia978f105c2c3fed3959c77580ba228e845105754
Some chunk memory is not guaranteed to be resident during the
initial allocation. Use CPDMA to force residency.
Change-Id: If1a2da3e75f136caaa4c7a29d8f604d6af2639fa
PAL may internally align up the allocation size to the page size
reported by KMD. This will cause a mismatch in size between OCL and PAL.
To avoid this, use PAL size when updating the free memory counter on
both alloc and free.
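
The accounting rule above can be sketched as follows. alignUp, FreeMemoryCounter and the two helper functions are hypothetical names, and the 4096-byte page size in the usage note is an assumed value; the real size is whatever PAL reports for the allocation.

```cpp
#include <cstdint>

// Rounds `size` up to a multiple of `alignment` (must be a power of two).
// Stands in for the internal alignment that PAL may apply.
constexpr std::uint64_t alignUp(std::uint64_t size, std::uint64_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical free memory counter. Both updates take the actual
// (aligned) size, so an alloc followed by a free restores the counter.
struct FreeMemoryCounter {
  std::uint64_t freeBytes;

  constexpr void onAlloc(std::uint64_t actualSize) { freeBytes -= actualSize; }
  constexpr void onFree(std::uint64_t actualSize) { freeBytes += actualSize; }
};

// Free bytes left after one allocation of `requested` bytes.
constexpr std::uint64_t freeAfterAlloc(std::uint64_t total, std::uint64_t requested,
                                       std::uint64_t pageSize) {
  FreeMemoryCounter counter{total};
  counter.onAlloc(alignUp(requested, pageSize));
  return counter.freeBytes;
}

// Free bytes after the same allocation has been released again.
constexpr std::uint64_t freeAfterAllocAndFree(std::uint64_t total, std::uint64_t requested,
                                              std::uint64_t pageSize) {
  FreeMemoryCounter counter{total};
  const std::uint64_t actual = alignUp(requested, pageSize);
  counter.onAlloc(actual);
  counter.onFree(actual);
  return counter.freeBytes;
}
```

For example, a 5000-byte request with an assumed 4096-byte page size is charged as 8192 bytes on alloc and credited as 8192 bytes on free, so the counter returns to its starting value.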
Change-Id: Ic6e8c861a52170476474fb70a769eef93be3261f
On ReBar systems the invisible heap is not present, so in theory
creating the suballocation chunk should fail; however, PAL doesn't
report any errors.
To make sure we never fail, allow creating the allocation in the visible
heap and system memory.
Change-Id: Iea9cc68d98b9cb396a2b7a37398b98b66274083b
Replace amd::Atomic with std::atomic. Remove make_atomic uses by
converting the variables to std::atomic and making sure the memory
order is relaxed when a synchronizes-with relationship is not needed.
Delete utils/atomic.hpp.
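
A small sketch of the memory order choice, using a reference counter as the example. RefCounted is a hypothetical class and not a ROCclr type; which variables actually need synchronizes-with is not stated in this message, so the split below is an assumption based on the common reference counting idiom.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical reference counted object; not a ROCclr class.
class RefCounted {
 public:
  // Incrementing publishes no data to other threads, so relaxed is enough.
  void retain() { refs_.fetch_add(1, std::memory_order_relaxed); }

  // The final decrement must observe all writes made by other owners
  // before the object is destroyed, so it uses acquire-release ordering.
  // Returns true when the caller dropped the last reference.
  bool release() {
    const std::uint32_t previous = refs_.fetch_sub(1, std::memory_order_acq_rel);
    assert(previous > 0 && "release() without a matching reference");
    return previous == 1;
  }

  // Diagnostic read only; no ordering is required.
  std::uint32_t count() const { return refs_.load(std::memory_order_relaxed); }

 private:
  std::atomic<std::uint32_t> refs_{1};
};
```

A new object starts with one reference; after one retain() and two release() calls, the second release() returns true.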
Change-Id: I0b36db8d604a8510ac6e36b32885fd16a1b8ccfa
- Add cache free on OCL context destroy
- Remove std::mem_fun() usage, since it was removed in C++17
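
For illustration, one common replacement for std::mem_fun() is std::mem_fn(), which is still available in C++17; a lambda works as well. Resource and releaseAll are hypothetical names, and the message does not say which replacement was actually used.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical cached resource; not a ROCclr type.
struct Resource {
  bool released = false;
  void release() { released = true; }
};

// Calls release() on every resource in the list.
void releaseAll(const std::vector<Resource*>& resources) {
  // Pre-C++17 code could pass std::mem_fun(&Resource::release) here.
  // std::mem_fn accepts both pointers and references to the object.
  std::for_each(resources.begin(), resources.end(), std::mem_fn(&Resource::release));
}
```

An equivalent lambda would be [](Resource* r) { r->release(); }.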
Change-Id: If6acd08f13a2298912ecd78fc025dcf0b32aee54