The alwaysResident setting doesn't require per-queue residency tracking.
Thus, the tracking logic can be skipped to avoid locking the queues.
Change-Id: Ib5cff5b79d3ecb8c2f2eb2565cf069f9a69438b0
PAL optimized its barrier logic, which caused failures with CP DMA on Navi4x.
Change the barrier code to match the most recent PAL optimizations.
Change-Id: I55eeab20f51eb8e920bcbb4b55fbe3c7f77fd3fa
Some unused compiler options for the HSAIL path were removed recently,
which affected blit kernel compilation. Hence, remove those options.
Also delete the assert for device-to-device copy in the SDMA path for now.
Change-Id: Ib5d7f063af2ab4a3fc5d73d426e39c391b1011ac
- Make sure persistent memory from resource cache is properly adjusted
in free memory calculation.
Change-Id: I74ef68975ccde4694fb1cb904617c418e85dfc9f
Add support for HIP_FORCE_DEV_KERNARG under PAL.
Fix persistent memory detection for a resource view.
Change-Id: Ifb7db2db14e0c2205a9661cfa53887ec61ab26a4
Fix wrong logic to get the layer index.
Make the layered image's layout match the CUDA spec.
Fix wrong comparison of element size.
Remove amd::BufferRect from ihipMemcpyAtoHCommand()
and ihipMemcpyHtoACommand().
Change-Id: Icc6a4233fbce2e9b2dc6feb79e6bfbd761684c7d
Support hipExternalMemoryGetMappedMipmappedArray().
Add ImageExternalBuffer to differentiate it from ImageBuffer.
Currently we only support tiling_optimal mode because the
Vulkan driver doesn't provide tiling information.
Change-Id: I7e3524cdde53e4df9f728894bcebf4bd3f58d4d9
- disable deprecated function use warning
- disable size_t to 'type' warning
- disable conversion from 'type1' to 'type2' warning
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: I64161fd37cf56de3d132102103267ae8da40193a
Initial implementation for hipMemPoolExportToShareableHandle,
hipMemPoolImportFromShareableHandle,
hipMemPoolExportPointer and hipMemPoolImportPointer
Change-Id: I0ebdc48e9163b394ded560adca6c38bbc5aee7d1
Sync between compute and SDMA engines can be very expensive under Windows.
Use CP DMA for tiny transfers (< 1KiB) to avoid syncs and improve performance.
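The size-based choice above can be sketched as follows. This is only an illustration: CopyEngine, kCpDmaThresholdBytes and selectCopyEngine are hypothetical names, not the actual ROCclr or PAL identifiers, and the only fact taken from this message is the 1 KiB cutoff.

```cpp
#include <cstddef>

// Hypothetical engine identifiers; the real ROCclr/PAL types are not shown here.
enum class CopyEngine { CpDma, Sdma };

// Cutoff taken from the commit text: transfers strictly below 1 KiB use CP DMA.
constexpr std::size_t kCpDmaThresholdBytes = 1024;

// Pick the engine for a transfer of `bytes` bytes.
// Tiny copies stay on the compute queue (CP DMA), so no cross-engine sync is needed;
// everything else goes to SDMA.
constexpr CopyEngine selectCopyEngine(std::size_t bytes) {
  return bytes < kCpDmaThresholdBytes ? CopyEngine::CpDma : CopyEngine::Sdma;
}
```

With this sketch a 1023-byte copy selects CopyEngine::CpDma, while a copy of exactly 1024 bytes selects CopyEngine::Sdma.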
Change-Id: I9db39a2199f7b9e337ed08fd36d9cbc150502f1f
HIP can't rely on the resource tracking used in OCL and requires a different, explicit sync.
Make sure ROCCLR syncs compute only when SDMA is used and vice versa.
The new logic allows enabling CPDMA without unnecessary waits.
Change-Id: Ib9d1788cfd5afa5ea2fec4c96a37d8b9c4d0059d
Blender creates and destroys big allocations during the benchmark.
That causes big delays, because vidmm has to page-in/page-out memory.
Change-Id: I2baf4545807127406e3d2870a7581ff9ae7bcdb5
Adding virtual memory management APIs to rocclr.
The HIP layer will handle virtual allocs on devices.
Change-Id: Ia978f105c2c3fed3959c77580ba228e845105754
Some chunk memory is not guaranteed to be resident during
initial allocation. Use CPDMA to force residency.
Change-Id: If1a2da3e75f136caaa4c7a29d8f604d6af2639fa
PAL may internally align up the allocation size to the page size
reported by KMD. This will cause a mismatch in size between OCL and PAL.
To avoid this, use PAL size when updating the free memory counter on
both alloc and free.
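A minimal sketch of the accounting idea, assuming the aligned size is a simple round-up to a power-of-two page size. alignUp and FreeMemoryCounter are hypothetical names for illustration; in the real code the size would come from PAL rather than being recomputed.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the size PAL actually reserves:
// round the request up to a power-of-two page size.
inline std::size_t alignUp(std::size_t size, std::size_t pageSize) {
  assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);  // must be a power of two
  return (size + pageSize - 1) & ~(pageSize - 1);
}

// Hypothetical free memory counter. Both paths take the aligned size,
// so an alloc followed by a free always restores the original value.
class FreeMemoryCounter {
 public:
  explicit FreeMemoryCounter(std::size_t total) : free_(total) {}

  void onAlloc(std::size_t alignedSize) {
    free_.fetch_sub(alignedSize, std::memory_order_relaxed);
  }
  void onFree(std::size_t alignedSize) {
    free_.fetch_add(alignedSize, std::memory_order_relaxed);
  }
  std::size_t available() const { return free_.load(std::memory_order_relaxed); }

 private:
  std::atomic<std::size_t> free_;
};
```

If the alloc path subtracted the requested size (say 5000 bytes) but the free path added back the aligned size (8192 bytes with 4 KiB pages), the counter would drift on every allocation; using one size on both paths avoids that.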
Change-Id: Ic6e8c861a52170476474fb70a769eef93be3261f
On ReBar systems the invisible heap is not present, so in theory we should
fail to create the suballocation chunk; however, PAL doesn't report any
errors.
To make sure we never fail, allow creating the allocation in the visible
heap and system memory.
Change-Id: Iea9cc68d98b9cb396a2b7a37398b98b66274083b
Replace amd::Atomic with std::atomic. Remove make_atomic uses by
converting the variable to std::atomic and making sure the memory
order is relaxed when synchronizes-with is not needed.
Delete utils/atomic.hpp.
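A sketch of the two cases mentioned above, using only standard std::atomic calls. CommandCounter and Mailbox are hypothetical types made up for illustration and are not taken from the rocclr sources.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical statistics counter. No other data is published through it,
// so no synchronizes-with relationship is needed and relaxed ordering suffices.
class CommandCounter {
 public:
  std::uint64_t increment() {
    return count_.fetch_add(1, std::memory_order_relaxed) + 1;
  }
  std::uint64_t value() const { return count_.load(std::memory_order_relaxed); }

 private:
  std::atomic<std::uint64_t> count_{0};
};

// Hypothetical one-shot mailbox. Here the flag guards `payload`, so the
// writer uses release and the reader uses acquire; relaxed would not be enough.
struct Mailbox {
  int payload = 0;
  std::atomic<bool> ready{false};

  void publish(int v) {
    assert(!ready.load(std::memory_order_relaxed));  // one-shot: publish only once
    payload = v;
    ready.store(true, std::memory_order_release);
  }
  bool tryRead(int& out) {
    if (!ready.load(std::memory_order_acquire)) return false;
    out = payload;
    return true;
  }
};
```

The counter is the kind of variable where relaxed ordering is safe; the mailbox shows the contrasting case where the default or an acquire/release pair must be kept.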
Change-Id: I0b36db8d604a8510ac6e36b32885fd16a1b8ccfa