There are 2 functional changes to this patch:
* Use GPU timing for internal markers for HIP.
* Measure CPU time closer to GPU timer, to reduce delta between GPU/CPU timestamp measurements.
There are some smaller non-functional updates:
* waifForFence -> waitForFence typo
* Remove unused drmProfiling
Change-Id: I4c5fa600a842ab60e454888779edcac8449a902a
Resolved an issue where hipEventSynchronize and hipStreamWaitEvent APIs
did not function correctly for events created with the hipEventInterprocess flag.
The bug caused the event to be incorrectly marked as "recorded,"
leading to these APIs failing to wait for the event as expected.
Change-Id: Ic9fdfaab2393beb93d6e0b83661545e902a63499
- Fix regression for D2H pinned copies which adds systemscope release.
- Skip cpu wait for D2H unpinned copies as we can pass the signal of the
barrier to rocr copy.
- Fix an old bug in sdmaEngineRetainCount_ logic
- Improve logging
Change-Id: If074bddb05564b15949b0d5f9bf12acd3692174e
Make ocltst -m tests/ocltst/liboclruntime.so -t OCLMemoryInfo
pass in emu where GPU memory is very big.
Cherry pick
https://gerrit-git.amd.com/c/compute/ec/clr/+/1014858
Change-Id: I0228c5e87ce7c366983fd4af71c25e7f8161c2c7
hipGetLastError should return the error by any of the previous APIs
in the same host thread to match the CUDA behavior, whereas
hipExtGetLastError will return the error by the immediate previous API.
This Ext API was added earlier to facilitate the existing HIP apps which
are following the current behavior of hipGetLastError
Change-Id: I61e95b1fc136cc761e2434e02187b7ed2598b733
BatchMemop should be positioned before the image support kernels
because the total number of kernels is determined by BlitLinearTotal,
when there is no image support on the device.
Change-Id: I8e53caf744ba54259ac04bad1762eef21806f3f2
The cl_khr_depth_images associated macro definition is defined twice in
the compiler: in opencl-c.h and automatically by the compiler deduced
from the cl-ext list. These two co-exist and there is no need to remove
cl_khr_depth_images from the cl-ext list.
If we remove cl_khr_depth_images from the cl-ext list, and we do not
include opencl-c.h the macro is not defined.
This fixes conformance test ./test_compiler compiler_defines_for_extensions
when using Comgr with -include opencl-c-base.h -fdeclare-opencl-builtins
without including opencl-c.h.
Before we got the error `ERROR: Supported extension cl_khr_depth_images
not defined in kernel`
This change is needed to eventually get rid of the opencl-c.pch that is embedded in comgr, and that makes implementing a compilation cache in comgr hard.
Change-Id: I76497874ebe7163966420d4ac23a0788b93a36fd
- Resolve signal dependencies for barrier value packet if there are > 1
depenent signals. Barrier Value packet accounts for only 1 dep signal
- Better log
Change-Id: Ia506ad5d80b91d598f92e7b539f41756e9b4b64b
wait() is redesigned with two pathes:
fast path: Use spinlock to wait for notify signal. If the
signal hasn't been received for some loops, go to slow path.
slow path: Use condition_variable's wait().
Improve monitor wrapper for better performance.
Fix some bugs left from name removing patch.
Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24
This change fixes random segfaults in graph tests that
are seen after the change make internal callbacks non-blocking.
The callback thread that decreases the GraphExec ref count
may now run after the runtime shutdown. This can cause a segfault
because the hip::device that is accessed in GraphExec destructor
is already destroyed during runtime shutdown. This patch ensures
that the hip::device object stays alive until after the
callback thread completes.
Change-Id: I75a6ac01f27a0b2250bbd10ed389ebfb322927af
Fixes#123. find_program doesn't follow CMP0074 and thus ignores LLVM_ROOT and Clang_ROOT. This change adds LLVM_ROOT and Clang_ROOT to the search path of find_program for llvm-mc and clang in hiprtc to mimics previous add_package behaviour.
Caveat: cmake-specific variables like CMAKE_PREFIX_PATH will take precedence over paths specified with HINTS for find_program, there's no way to change the ordering unless we skip cmake-specific variables all together using NO_CMAKE_PATH and NO_CMAKE_ENVIRONMENT_PATH.
Change-Id: I1fedb60cda09744416e19b3c6e3e0c5c9045f8e7