Add initial implementation of virtual memory heap with
dynamic virtual memory mapping support for memory pools.
DEBUG_HIP_MEM_POOL_VMHEAP controls the new method.
Change-Id: I8dc5be2e0f34ab472f1800f43bb6243639a5e500
[ROCm/clr commit: 296dce5570]
This reverts commit 84fb57e7f9.
Reason for revert: Even though the change is valid, it would break backward compatibility.
Change-Id: I9c7cab83198c8d5c8485b11194099162e3e7a874
[ROCm/clr commit: ec6f83b544]
Support zero width and height for hipTexObjectCreate to align
with CUDA.
Change-Id: I5d4c48625faf5f060ed2a7e634ec65e4ecac9da5
[ROCm/clr commit: efce2f77c4]
Support hipExternalMemoryGetMappedMipmappedArray().
Add ImageExternalBuffer to differentiate it from ImageBuffer.
Currently we only support tiling_optimal mode, as the
Vulkan driver doesn't provide tiling information.
Change-Id: I7e3524cdde53e4df9f728894bcebf4bd3f58d4d9
[ROCm/clr commit: 6398f604b0]
Add a view bit to avoid destroying the original resource when the
parent dependency doesn't exist with the image view cache.
Change-Id: I8277afd575af8f29951c5d1a9f7d94d784251657
[ROCm/clr commit: b49e8e78e1]
Add implementations of the missing mipmap APIs.
Fix some bugs in the mipmap APIs.
Use hipmipmappedArray to differentiate CUDA
and driver APIs on Nvidia.
Change-Id: I6079d9f3b2ddf4e42b9a6f7f3902322cfca02cfd
[ROCm/clr commit: f03c11491b]
The change enables VM support in graphs on Windows. This avoids
caching all allocations, at the cost of map/unmap overhead
during memory creation and destruction.
Change-Id: I792be00fba099e5e5d3cd44a963e1dfd6976a86d
[ROCm/clr commit: 04b696abee]
- The implementation in mempool graphs requires refcounting the VA
object, so release() must update the map only on actual destruction.
- Add GPU event tracking for paging operations. Otherwise, the runtime
may not always flush the IB.
Change-Id: Idf99ffb894321a38e04b490116a7ca435635918d
[ROCm/clr commit: 7ef2da5aba]
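The refcounting rule in the first bullet above can be sketched as follows. This is a minimal illustration, not the runtime's code: VaObject, VaMap, and all of their members are hypothetical names, and the sketch ignores thread safety and the GPU event tracking from the second bullet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a refcounted virtual-address (VA) object.
struct VaObject {
  std::uintptr_t va;  // start of the VA range
  std::size_t size;   // size of the range in bytes
  int refCount;       // number of outstanding references
};

// Hypothetical VA lookup table (the "map" mentioned in the message).
class VaMap {
 public:
  // Creates an object with one reference and registers it in the map.
  VaObject* create(std::uintptr_t va, std::size_t size) {
    VaObject* obj = new VaObject{va, size, 1};
    map_[va] = obj;
    return obj;
  }

  void retain(VaObject* obj) { ++obj->refCount; }

  // Drops one reference. The map is updated only when the last
  // reference goes away; returns true if the object was destroyed.
  bool release(VaObject* obj) {
    if (--obj->refCount > 0) {
      return false;  // still referenced: leave the map entry alone
    }
    map_.erase(obj->va);  // actual destruction: update the map first
    delete obj;
    return true;
  }

  // Returns the object registered at va, or nullptr if none exists.
  VaObject* find(std::uintptr_t va) const {
    auto it = map_.find(va);
    return it == map_.end() ? nullptr : it->second;
  }

 private:
  std::unordered_map<std::uintptr_t, VaObject*> map_;
};
```

With this shape, an early release() from one owner cannot remove a mapping that another owner still relies on.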
On large-BAR platforms, it's not necessary to mmap
memory allocated in VRAM to the CPU again.
Change-Id: I0701680476829d4058b3e7b643e8df657d0c6168
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
[ROCm/clr commit: 25fe45bb2a]
Pass active queue for transfers in the cache coherency layer.
That allows the device transfer queue to be used only in
cases when the active queue isn't available, because using the device
transfer queue from another active queue may cause a deadlock.
Change-Id: Ifbe7e0303b77dbf6eeda3939ffbc25a3df7472de
[ROCm/clr commit: 95d55fdfa8]
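The queue selection described in the message above can be sketched as follows. This is only an illustration under assumptions: Queue and selectTransferQueue are hypothetical names, and the actual cache coherency layer is not shown in the message.

```cpp
#include <cassert>

// Hypothetical queue handle; the real runtime queue types differ.
struct Queue {
  const char* name;
};

// Prefer the caller's active queue for the transfer. Fall back to the
// device transfer queue only when no active queue was passed in, since
// using it from another active queue may cause a deadlock.
inline Queue* selectTransferQueue(Queue* activeQueue,
                                  Queue* deviceTransferQueue) {
  return activeQueue != nullptr ? activeQueue : deviceTransferQueue;
}
```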
The cache coherency layer is an OCL feature that supports multiple
devices in a single OCL context.
Change-Id: Ic66df9551fad5b0c4df95ab3e1db1da259919f25
[ROCm/clr commit: 6da9d18140]
The logic below causes a crash in CL-GL interop. As a workaround,
limit it to HIP.
Change-Id: I12e81d035ebd80a4a9a09eb6eea2fae7040d90c9
[ROCm/clr commit: 74ccf71d53]
When the OCL ROCr backend performs CL_MEM_COPY_HOST_PTR, it may attempt
to access the amd::Memory object it is currently creating,
which is not ready yet. The logic creates a temporary dummy object
to perform the copy transfer. This change makes sure the runtime
skips allocating the same device::Memory object a second time.
Change-Id: I14c6a00a3941fdcaa6aea299e9f096e4c3f5cadf
[ROCm/clr commit: 1fde842703]
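The "skip the second allocation" behavior described in the message above can be sketched as follows. The classes below are simplified, hypothetical stand-ins for amd::Memory and device::Memory, and the allocation counter exists only to make the behavior observable; none of this is the runtime's actual code.

```cpp
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for device::Memory.
struct DeviceMemory {
  int deviceIndex;
};

// Hypothetical stand-in for amd::Memory.
class Memory {
 public:
  // Returns the device memory for a device, allocating it only on the
  // first request. Later requests reuse the existing object.
  DeviceMemory* getDeviceMemory(int deviceIndex) {
    auto it = deviceMemories_.find(deviceIndex);
    if (it != deviceMemories_.end()) {
      return it->second.get();  // already allocated: skip allocation
    }
    ++allocations_;
    auto inserted = deviceMemories_.emplace(
        deviceIndex,
        std::unique_ptr<DeviceMemory>(new DeviceMemory{deviceIndex}));
    return inserted.first->second.get();
  }

  // Number of device allocations performed so far.
  int allocations() const { return allocations_; }

 private:
  std::map<int, std::unique_ptr<DeviceMemory>> deviceMemories_;
  int allocations_ = 0;
};
```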