Add an initial implementation of a virtual memory heap with
dynamic virtual memory mapping support for memory pools.
DEBUG_HIP_MEM_POOL_VMHEAP controls the new method.
Change-Id: I8dc5be2e0f34ab472f1800f43bb6243639a5e500
Compute doesn't support IB chaining, but RGP may collect
perf counters, which require more space in CB.
Increase CB size if RGP is enabled.
Change-Id: Iaa0a620ead8541a679b0dfe5e5711af5afdba545
- Use the correct header in device_library_decl
- Use std:: instead of __hip_internal:: for host compilation
- Hide device-specific code behind a __clang__ and __HIP__ check
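The host/device split described in the last two items can be sketched
roughly as below. This is an illustration only: the alias hip_traits and
the helper isIntegral are hypothetical names, and the device branch
assumes that the HIP runtime header declares __hip_internal::is_integral.

```cpp
#include <cassert>
#include <type_traits>

#if defined(__clang__) && defined(__HIP__)
// HIP compilation with clang. Assumption: this header declares
// __hip_internal::is_integral for device-capable builds.
#include <hip/hip_runtime.h>
namespace hip_traits = __hip_internal;  // hypothetical alias
#else
// Plain host compilation: fall back to the standard library.
namespace hip_traits = std;
#endif

// Hypothetical helper that resolves to whichever trait is available.
template <typename T>
constexpr bool isIntegral() {
  return hip_traits::is_integral<T>::value;
}

static_assert(isIntegral<int>(), "int is an integral type");
static_assert(!isIntegral<float>(), "float is not an integral type");
```

With a host-only compiler neither macro is defined, so only the std::
branch is seen and no HIP header is required.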
Change-Id: I2f3647e00555ed0e79f9954a459c41394c3cd49b
- Also add a cache that lets compiled code objects be reused instead
of being compiled again. This should improve performance on
multi-GPU systems.
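A minimal sketch of such a cache, not the code in this patch: every name
below (CodeObject, CodeObjectCache, getOrCompile) is hypothetical, and
keying on source, options and target is an assumption about what makes
two compilations interchangeable.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiled code object; not a CLR type.
using CodeObject = std::vector<char>;

class CodeObjectCache {
 public:
  using Compiler = std::function<CodeObject(
      const std::string& source, const std::string& options,
      const std::string& target)>;

  explicit CodeObjectCache(Compiler compile) : compile_(std::move(compile)) {}

  // Returns the cached object for these inputs, compiling on first use.
  std::shared_ptr<const CodeObject> getOrCompile(const std::string& source,
                                                 const std::string& options,
                                                 const std::string& target) {
    // The full strings, joined with NUL separators, form the key.
    const std::string key = target + '\0' + options + '\0' + source;

    // Holding the lock while compiling keeps the sketch simple but
    // serializes compilations; a real cache would likely avoid that.
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = entries_.find(key);
    if (it == entries_.end()) {
      auto object =
          std::make_shared<const CodeObject>(compile_(source, options, target));
      it = entries_.emplace(key, std::move(object)).first;
    }
    return it->second;
  }

 private:
  Compiler compile_;
  std::mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<const CodeObject>> entries_;
};
```

In this sketch, devices that report the same target string share one
compiled object, which is where a multi-GPU system would save time.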
Change-Id: Ib135d616c076b77f8aaf28de275d408b38021d89
This patch makes two functional changes:
* Use GPU timing for internal markers for HIP.
* Measure CPU time closer to the GPU timer to reduce the delta between
GPU and CPU timestamp measurements.
It also includes smaller non-functional updates:
* Fix typo: waifForFence -> waitForFence
* Remove unused drmProfiling
Change-Id: I4c5fa600a842ab60e454888779edcac8449a902a
Resolved an issue where hipEventSynchronize and hipStreamWaitEvent APIs
did not function correctly for events created with the hipEventInterprocess flag.
The bug caused the event to be incorrectly marked as "recorded,"
leading to these APIs failing to wait for the event as expected.
Change-Id: Ic9fdfaab2393beb93d6e0b83661545e902a63499
- Fix a regression for D2H pinned copies, which adds a system-scope
release.
- Skip the CPU wait for D2H unpinned copies, since the signal of the
barrier can be passed to the ROCr copy.
- Fix an old bug in the sdmaEngineRetainCount_ logic
- Improve logging
Change-Id: If074bddb05564b15949b0d5f9bf12acd3692174e
Make ocltst -m tests/ocltst/liboclruntime.so -t OCLMemoryInfo
pass in emu where GPU memory is very big.
Cherry-picked from
https://gerrit-git.amd.com/c/compute/ec/clr/+/1014858
Change-Id: I0228c5e87ce7c366983fd4af71c25e7f8161c2c7
hipGetLastError should return the error from any of the previous API
calls in the same host thread to match the CUDA behavior, whereas
hipExtGetLastError returns the error from the immediately preceding API
call. The Ext API was added earlier to support existing HIP apps that
rely on the current behavior of hipGetLastError.
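The difference between the two queries can be modelled without the HIP
runtime. The sketch below is a toy model, not the runtime code: Status,
recordResult, getLastError and extGetLastError are hypothetical names,
and resetting the stored value on each query is an assumption borrowed
from how cudaGetLastError is documented.

```cpp
#include <cassert>

// Hypothetical error codes for the model.
enum class Status { Success = 0, InvalidValue = 1, OutOfMemory = 2 };

struct ThreadErrorState {
  Status sticky = Status::Success;    // last failure since the last query
  Status previous = Status::Success;  // result of the most recent call
};

// One error state per host thread.
inline ThreadErrorState& errorState() {
  thread_local ThreadErrorState state;
  return state;
}

// Every modelled API call funnels its result through here.
inline Status recordResult(Status result) {
  ThreadErrorState& state = errorState();
  state.previous = result;
  if (result != Status::Success) state.sticky = result;
  return result;
}

// Models hipGetLastError: reports a failure from any earlier call.
inline Status getLastError() {
  ThreadErrorState& state = errorState();
  const Status result = state.sticky;
  state.sticky = Status::Success;  // assumed reset, as in CUDA
  return result;
}

// Models hipExtGetLastError: reports only the immediately preceding call.
inline Status extGetLastError() {
  ThreadErrorState& state = errorState();
  const Status result = state.previous;
  state.previous = Status::Success;  // assumed reset
  return result;
}

// A failing call followed by a successful one separates the two queries.
inline void exampleUsage() {
  recordResult(Status::InvalidValue);
  recordResult(Status::Success);
  assert(extGetLastError() == Status::Success);
  assert(getLastError() == Status::InvalidValue);
}
```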
Change-Id: I61e95b1fc136cc761e2434e02187b7ed2598b733
BatchMemop should be positioned before the image support kernels
because, when there is no image support on the device, the total number
of kernels is determined by BlitLinearTotal.
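The ordering constraint can be illustrated with a small enum. Only
BatchMemop and BlitLinearTotal come from this message; the other
enumerators, BlitTotal as used here, and both helper functions are
hypothetical placeholders rather than the runtime's actual list.

```cpp
#include <cassert>

enum BlitKernel {
  FillBuffer = 0,   // hypothetical linear kernel
  CopyBuffer,       // hypothetical linear kernel
  BatchMemop,       // must sit before BlitLinearTotal
  BlitLinearTotal,  // kernel count when the device has no image support
  CopyImage = BlitLinearTotal,  // hypothetical image kernel
  FillImage,                    // hypothetical image kernel
  BlitTotal                     // kernel count with image support
};

// Number of kernels created for a device.
constexpr int kernelCount(bool imageSupport) {
  return imageSupport ? BlitTotal : BlitLinearTotal;
}

// A kernel exists only if its index is below the count for the device.
constexpr bool isBuilt(BlitKernel kernel, bool imageSupport) {
  return kernel < kernelCount(imageSupport);
}

static_assert(isBuilt(BatchMemop, false), "available without image support");
static_assert(!isBuilt(CopyImage, false), "image kernels are skipped");
```

Had BatchMemop been listed after the image kernels, its index would be
at or above BlitLinearTotal and it would be left out on devices without
image support.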
Change-Id: I8e53caf744ba54259ac04bad1762eef21806f3f2
The macro associated with cl_khr_depth_images is defined twice in the
compiler: once in opencl-c.h and once automatically, deduced by the
compiler from the cl-ext list. The two definitions co-exist, so there is
no need to remove cl_khr_depth_images from the cl-ext list.
If we remove cl_khr_depth_images from the cl-ext list and do not
include opencl-c.h, the macro is not defined.
This fixes conformance test ./test_compiler compiler_defines_for_extensions
when using Comgr with -include opencl-c-base.h -fdeclare-opencl-builtins
without including opencl-c.h.
Previously the test failed with `ERROR: Supported extension
cl_khr_depth_images not defined in kernel`.
This change is needed to eventually get rid of the opencl-c.pch that is
embedded in comgr and that makes implementing a compilation cache in
comgr hard.
Change-Id: I76497874ebe7163966420d4ac23a0788b93a36fd
- Resolve signal dependencies for the barrier value packet if there is
more than one dependent signal. The barrier value packet accounts for
only one dependent signal.
- Improve logging
Change-Id: Ia506ad5d80b91d598f92e7b539f41756e9b4b64b
wait() is redesigned with two paths:
fast path: Use a spinlock to wait for the notify signal. If the
signal hasn't been received after some loops, go to the slow path.
slow path: Use condition_variable's wait().
Improve the monitor wrapper for better performance.
Fix some bugs left over from the name removal patch.
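The two paths can be sketched with standard C++ primitives. This is not
the monitor code from the patch: TwoPathEvent and the maxSpins parameter
are hypothetical, and the fast path here spins on an atomic flag rather
than on the runtime's own lock.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

class TwoPathEvent {
 public:
  // Sets the flag under the mutex so a waiter on the slow path cannot
  // miss the wake-up, then wakes any sleeping waiters.
  void notify() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      signaled_.store(true, std::memory_order_release);
    }
    cv_.notify_all();
  }

  // Returns true if the fast path observed the signal, false if the
  // wait had to fall back to the slow path.
  bool wait(int maxSpins = 1000) {
    // Fast path: poll the flag for a bounded number of loops.
    for (int i = 0; i < maxSpins; ++i) {
      if (signaled_.load(std::memory_order_acquire)) return true;
      std::this_thread::yield();
    }
    // Slow path: block on the condition variable until signaled.
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] {
      return signaled_.load(std::memory_order_acquire);
    });
    return false;
  }

 private:
  std::atomic<bool> signaled_{false};
  std::mutex mutex_;
  std::condition_variable cv_;
};
```

The spin bound trades CPU time for latency: short waits finish without
touching the mutex, while long waits stop spinning and sleep.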
Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24