* clr: Adjust call to ICmdBuffer::CmdCopyMemoryToImage for PAL >= 955
PAL starting versino 955 adds a new argument to
ICmdBuffer::CmdCopyMemoryToImage. Adjust teh callsite to account
fort his.
* clr: Handle new GpuUtil::TraceSessionState cases for PAL >= 939
Starting PAL API version 939, GpuUtil::TraceSessionState changes its
possible values. Adjust for it.
* clr: require PAL version 954
Bump the PAL required vesion to 954, as this is required for proper
debugger support.
* SWDEV-534207 - fix 'Unit_hipFreeMipmappedArrayImplicitSyncArray - float' out of memory error with extent (1024, 1024, 1024) and 1 levels on 740M iGPUs. totalGlobalMem is not really the amount of device memory available for compute
* SWDEV-534207 - compare expected available memory within a range in Unit_hipMalloc3D_Basic; to take into account some bookkeeping overhead (instead of in exact 64MB chunks)
* SWDEV-534207 - fix missing setting of SvmGpuMemoryCreateInfo::interprocess in the 'fine' and 'fine uncached' memory and 'MemorySubAllocator' cases. Coarse allocation was added first; the flag was missed when the other three cases were added
* SWDEV-534207 - allow more room for the check of available memory after hipFree() in Unit_hipMalloc3D_Basic; it was till failing on 740M
---------
Co-authored-by: Gerardo Hernandez <gerardo.hernandez@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Add VmHeapArray class to reduce the pressure on VA reservation, since
multiple memory pools can be active at the same time.
[ROCm/clr commit: e974f7fde1]
Add initial implementation of virtual memory heap with
dynamic virtual memory mapping support for memory pools.
DEBUG_HIP_MEM_POOL_VMHEAP controls the new method.
Change-Id: I8dc5be2e0f34ab472f1800f43bb6243639a5e500
[ROCm/clr commit: 296dce5570]
The "optimized" version of memcpy is outdated and
was used in win32 only.
Change-Id: I7f2e0e9051e37cec95438266824b5b0025c324c6
[ROCm/clr commit: 7448113cfc]
The new logic has a lock for PAL call and doesn't require the lock for queues.
Change-Id: I61b67c3c4abd2ede44809de1d6beed756766032e
[ROCm/clr commit: 2f3ad43c4a]
alwaysResident setting doesn't require per queue residency tracking.
Thus, the logic can be skipped to avoid the lock of queues.
Change-Id: Ib5cff5b79d3ecb8c2f2eb2565cf069f9a69438b0
[ROCm/clr commit: 95e3958748]
PAL optimized the logic for the barriers, which caused failures with CP DMA on Navi4x.
Change barrier's code to match the most recent PAL optimizations.
Change-Id: I55eeab20f51eb8e920bcbb4b55fbe3c7f77fd3fa
[ROCm/clr commit: 1239309c90]
Recently some unused compiler options for HSAIL path were removed,
but it affected blit kernels compilation. Hence, remove those options.
Also delete assert for device to device copy in SDMA path for now.
Change-Id: Ib5d7f063af2ab4a3fc5d73d426e39c391b1011ac
[ROCm/clr commit: eaa61fc740]
- Make sure persistent memory from resource cache is properly adjusted
in free memory calculation.
Change-Id: I74ef68975ccde4694fb1cb904617c418e85dfc9f
[ROCm/clr commit: 85c15d720d]
Persistent memory should use direct access for write map and
indirect for read map.
Change-Id: I9fc84836d60088b24012ed25f7ef8c16e33796a3
[ROCm/clr commit: c8b3253a24]
Add support of HIP_FORCE_DEV_KERNARG under PAL.
Fix persistent memory detection for a resource view.
Change-Id: Ifb7db2db14e0c2205a9661cfa53887ec61ab26a4
[ROCm/clr commit: 5f297d75d9]
Fix wrong logic to get layer index;
Make layered image's layout match cuda spec;
Fix wrong comparision of element size.
Remove amd::BufferRect from ihipMemcpyAtoHCommand()
and ihipMemcpyHtoACommand().
Change-Id: Icc6a4233fbce2e9b2dc6feb79e6bfbd761684c7d
[ROCm/clr commit: 5a0085e516]
Support hipExternalMemoryGetMappedMipmappedArray().
Add ImageExternalBuffer to differiate ImageBuffer.
Currently we only support tiling_optimal mode as
vulkan driver doesn't provide tiling information.
Change-Id: I7e3524cdde53e4df9f728894bcebf4bd3f58d4d9
[ROCm/clr commit: 6398f604b0]
- Make IPC interfaces generic between devices and rely on the IPC buffer
for attach/detach logic
Change-Id: Id3c18d122030329b7ee532bbb6317de9dd6a0bbe
[ROCm/clr commit: 9aa6f25f1c]
Sync between compute and SDMA engines can be very expensive under Windows.
Use CP DMA for tiny transfers (< 1KiB) to avoid syncs and improve performance.
Change-Id: I9db39a2199f7b9e337ed08fd36d9cbc150502f1f
[ROCm/clr commit: 473621c008]
HIP can't rely on the resource tracking, used in OCL and requires different explicit sync.
Make sure ROCCLR syncs compute only when SDMA is used and vise versa.
The new logic will allow to enable CPDMA without unnecessary waits.
Change-Id: Ib9d1788cfd5afa5ea2fec4c96a37d8b9c4d0059d
[ROCm/clr commit: ff6b4db70b]
Blender creates and destroys big allocations during the benchmark.
That causes big delays, because vidmm has to page-in/page-out memory.
Change-Id: I2baf4545807127406e3d2870a7581ff9ae7bcdb5
[ROCm/clr commit: dc4ad8c99c]
Adding virtual memory management APIs to rocclr.
The HIP layer will handle virtual allocs on devices.
Change-Id: Ia978f105c2c3fed3959c77580ba228e845105754
[ROCm/clr commit: b5f555f9ec]