- awaitCompletion() may spin-wait endlessly in cases where we don't
submit a handler. One such case is the hipExt*Launch APIs, which
take a stop event. There we optimize the stop event by attaching
a signal to the dispatch packet, but we don't submit a handler when
attaching the signal. So if awaitCompletion() is called afterwards, it
keeps waiting on the command status on the host rather than simply
checking the signal value.
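A minimal sketch of the idea, assuming a simplified model: CompletionSignal, Command, handlerSubmitted, statusComplete and isComplete() are hypothetical names, not the runtime's types, and a plain std::atomic stands in for the HSA signal attached to the dispatch packet. When no handler was submitted, the host-side status never changes, so completion is read from the signal instead.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical completion signal attached to a dispatch packet. The GPU
// (simulated by the caller here) stores 0 when the dispatch finishes.
struct CompletionSignal {
  std::atomic<std::int64_t> value{1};  // 1 = pending, 0 = complete
};

// Hypothetical command: host-side status is only updated by a submitted handler.
struct Command {
  bool handlerSubmitted = false;
  std::atomic<bool> statusComplete{false};
  CompletionSignal* signal = nullptr;  // e.g. the optimized stop-event signal
};

// Without a handler, statusComplete never changes, so read the signal instead.
bool isComplete(const Command& cmd) {
  if (!cmd.handlerSubmitted && cmd.signal != nullptr) {
    return cmd.signal->value.load(std::memory_order_acquire) == 0;
  }
  return cmd.statusComplete.load(std::memory_order_acquire);
}

// Poll isComplete(), yielding between checks; terminates once the signal fires.
void awaitCompletion(const Command& cmd) {
  while (!isComplete(cmd)) {
    std::this_thread::yield();
  }
}
```

In the real runtime the wait would more likely block on the signal through the HSA API than poll; the sketch only shows which state has to be consulted.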
Change-Id: Ie8bf175aeefa3f9e4299b1ae7ae9108dad67e283
This reverts commit d240b03969.
Reason for revert: the rocm-llvm package name change is not required for static builds.
Change-Id: Ib2214a74162e5b015b096dc286151ecbd3ca0a80
We must get the last command in a protected way when calling
awaitCompletion(), where lastCommand may be released and
possibly destroyed.
This can fix the crash in the scoped lock(notify_lock_) in
Event::notifyCmdQueue() with AMD_DIRECT_DISPATCH = true.
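A minimal sketch of "getting the last command in a protected way", assuming std::shared_ptr as a stand-in for the runtime's own retain/release reference counting; CommandQueue, setLastCommand() and getLastCommand() are hypothetical names. The reference is taken while the lock is held, so a concurrent release cannot destroy the command while a caller such as awaitCompletion() is still using it.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

struct Command {};  // hypothetical placeholder for a queued command

// Hypothetical queue: lastCommand_ can be replaced (and the old command freed)
// by another thread at any time, so reads and writes share one mutex.
class CommandQueue {
 public:
  void setLastCommand(std::shared_ptr<Command> cmd) {
    std::lock_guard<std::mutex> guard(lastCmdLock_);
    lastCommand_ = std::move(cmd);
  }

  // Copy the pointer (taking a reference) while the lock is held. The caller
  // can then wait on the command even if the queue drops its own reference.
  std::shared_ptr<Command> getLastCommand() {
    std::lock_guard<std::mutex> guard(lastCmdLock_);
    return lastCommand_;
  }

 private:
  std::mutex lastCmdLock_;
  std::shared_ptr<Command> lastCommand_;
};
```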
Change-Id: I4297166f912a71112f4a8945d993160ba9afdc34
If a graph has multiple branches, an End command is enqueued on the launch
stream to make sure all the internal parallel streams have finished.
When a node was removed from the graph, the indegree and outdegree of its parent and
child nodes were not updated correctly, so endNode did not have dependencies on the
parallel commands. This resulted in graph synchronization issues.
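A minimal sketch of consistent node removal, assuming a simplified graph where each node keeps explicit parent and child lists; Node, addEdge(), removeNode() and leafNodes() are hypothetical names. Deriving the degrees from the edge lists means removing a node updates both sides of every edge, so a parent that loses its last child correctly becomes a leaf that an End command would need to depend on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical graph node: degrees are derived from the edge lists, so they
// cannot drift out of sync with the actual edges.
struct Node {
  std::vector<Node*> parents;
  std::vector<Node*> children;
  std::size_t inDegree() const { return parents.size(); }
  std::size_t outDegree() const { return children.size(); }
};

// Remove every occurrence of `n` from `v`.
void eraseNode(std::vector<Node*>& v, Node* n) {
  v.erase(std::remove(v.begin(), v.end(), n), v.end());
}

void addEdge(Node* from, Node* to) {
  from->children.push_back(to);
  to->parents.push_back(from);
}

// Detach `node`: parents lose an out-edge and children lose an in-edge.
void removeNode(Node* node) {
  for (Node* p : node->parents) eraseNode(p->children, node);
  for (Node* c : node->children) eraseNode(c->parents, node);
  node->parents.clear();
  node->children.clear();
}

// Nodes with no children: in this sketch, the nodes an End command must wait on.
std::vector<Node*> leafNodes(const std::vector<Node*>& nodes) {
  std::vector<Node*> leaves;
  for (Node* n : nodes) {
    if (n->outDegree() == 0) leaves.push_back(n);
  }
  return leaves;
}
```

For example, with edges r->x, r->y and x->z, removing z must drop x's outdegree to 0; otherwise x is missed as a leaf and the End command never waits for x's branch.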
Change-Id: I33cc2f21220e1c017d88099b29b542e05b683f73
Resolved an issue where a freed virtual buffer was incorrectly
added to the global mapping, causing an assertion error during
teardown.
Change-Id: I4801157a28603ce9be1ca0131982b700ff884f7a
Changed the find_package call to prioritize packages found
under the ROCm installation over those in other system locations.
Change-Id: Ice93c94bbb9cdebd467d3e88bb2e4bfb7a1e76d9
1. Move the global amd::monitor listenerLock before the global
runtime_tear_down object, as it is referenced in
~RuntimeTearDown() after main() returns. It must be destroyed
later than runtime_tear_down.
2. Update Device::~Device() to SVM-free coopHostcallBuffer_
before context_ is released and freed.
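Item 1 relies on the C++ rule that objects are destroyed in reverse order of construction; for globals in one translation unit, that is reverse order of definition. A minimal sketch, assuming hypothetical ListenerLock and TearDown types in place of listenerLock and RuntimeTearDown, and using local objects in a function so the order can be observed:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the global listenerLock.
struct ListenerLock {
  std::vector<std::string>& log;
  bool alive = true;
  explicit ListenerLock(std::vector<std::string>& l) : log(l) {}
  ~ListenerLock() {
    alive = false;
    log.push_back("~ListenerLock");
  }
};

// Hypothetical stand-in for RuntimeTearDown: its destructor uses the lock.
struct TearDown {
  std::vector<std::string>& log;
  ListenerLock& lock;
  TearDown(std::vector<std::string>& l, ListenerLock& k) : log(l), lock(k) {}
  ~TearDown() {
    // Only valid because the lock is constructed first, hence destroyed later.
    log.push_back(lock.alive ? "~TearDown(lock alive)" : "~TearDown(lock dead)");
  }
};

// Record the destruction order when the lock is defined before the teardown object.
std::vector<std::string> destructionOrder() {
  std::vector<std::string> log;
  {
    ListenerLock lock(log);  // defined first -> destroyed last
    TearDown tearDown(log, lock);
  }
  return log;
}
```

Swapping the two definitions would make ~TearDown() touch an already destroyed lock, which is the after-main() hazard item 1 avoids.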
Change-Id: I1d21378ff463477d3238d71e5e2a1a7d6b9147ad
If the graph has kernels that perform device-side allocation, the heap is allocated
during packet capture, because the heap pointer has to be added to the AQL packet,
and it is initialized during graph launch.
Handle a race with the wait when two kernels that use the device heap are enqueued on multiple streams.
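A minimal sketch of race-free lazy heap initialization, assuming a simplified model in which the heap buffer (and so its address) exists from capture time and launches on several streams may each try to initialize it; DeviceHeap, ensureInitialized() and the 1024-byte size are hypothetical, and a host mutex stands in for whatever synchronization the runtime actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical device heap: the buffer, and therefore the address that goes
// into the AQL packet, exists from capture time; its contents are
// initialized lazily by the first launch that needs them.
class DeviceHeap {
 public:
  const void* address() const { return storage_.data(); }

  // May be called by launches on several streams at once. The mutex makes
  // check-and-initialize atomic: a caller arriving during initialization
  // waits for it to finish instead of using a half-initialized heap.
  void ensureInitialized() {
    std::lock_guard<std::mutex> guard(initLock_);
    if (initialized_) return;
    std::fill(storage_.begin(), storage_.end(), static_cast<unsigned char>(0));
    ++initCount_;
    initialized_ = true;
  }

  int initCount() const {
    std::lock_guard<std::mutex> guard(initLock_);
    return initCount_;
  }

  // Call only after ensureInitialized() has returned.
  unsigned char byteAt(std::size_t i) const { return storage_.at(i); }

 private:
  std::vector<unsigned char> storage_ = std::vector<unsigned char>(1024, 0xFF);
  mutable std::mutex initLock_;
  bool initialized_ = false;
  int initCount_ = 0;
};
```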
Change-Id: I45933b77fcaf7bc8fdf1bc906462e32b5d8d3688
The HIP libraries depend on multiple standard shared libraries.
The corresponding packages are added to the dependency list.
Removed hipcc from the hip-runtime package dependency list.
Rearranged the package dependencies as well.
Base (Docker) images usually come with the standard packages installed,
so the installation goes through even without them in the dependency list.
But it's good to have all the required packages in the dependency list.
Change-Id: Iacab7993dffaa00efd1344d3eafb55f6317d1509
We more clearly define what happens when amdgpu-arch fails,
and instruct developers on how to work around the failure.
Change-Id: Ifff569a7d688b6545ca457116e534a1830fa03c0
Add a safety check to prevent an invalid memory access
if the timestamps and kernelNames vectors differ in size.
The patch also moves the addKernelNames call for the accumulate command
into the dispatchAqlPacket function.
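A minimal sketch of the kind of safety check described, assuming timestamps are stored as 64-bit integers; pairTimestampsWithNames() is a hypothetical helper, not the runtime's function. It iterates only up to the shorter vector's length, so a size mismatch can never index past either end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: pair each timestamp with its kernel name without
// reading past the end of the shorter vector.
std::vector<std::pair<std::uint64_t, std::string>> pairTimestampsWithNames(
    const std::vector<std::uint64_t>& timestamps,
    const std::vector<std::string>& kernelNames) {
  const std::size_t n = std::min(timestamps.size(), kernelNames.size());
  std::vector<std::pair<std::uint64_t, std::string>> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    out.emplace_back(timestamps[i], kernelNames[i]);
  }
  return out;
}
```

Truncating is only one possible policy; the runtime could equally skip the record or log the mismatch.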
Change-Id: Iea0927e1253800403a1ae3f3d72de1e7d96476c3
The CMake config files were part of the HIP runtime package rather than the dev/devel package.
Corrected this by moving them to the dev/devel package.
Change-Id: I5e52658f28c551c830294d3b1525907f1b7cbc50
fork() duplicates all system memory resources, but the runtime can't duplicate
GPU resources. Thus, avoid tearDown() calls in child processes.
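A minimal POSIX sketch of one way to detect a forked child, assuming the runtime records the initializing process ID; TearDownGuard and shouldTearDown() are hypothetical names, and pthread_atfork() with a child handler would be an alternative technique. A child created by fork() gets a new PID, so the comparison fails there and tearDown() is skipped.

```cpp
#include <cassert>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical guard: remember which process initialized the runtime and
// allow GPU teardown only in that process, not in fork()ed children.
class TearDownGuard {
 public:
  TearDownGuard() : ownerPid_(getpid()) {}

  // True only in the process that created the guard.
  bool shouldTearDown() const { return shouldTearDown(getpid()); }
  bool shouldTearDown(pid_t current) const { return current == ownerPid_; }

 private:
  pid_t ownerPid_;
};
```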
Change-Id: Id6b12bacd5112b9ad3747c218e09cba98ea1b42c
- Update the intra-socket weight for partitions within a single socket, as
the driver has changed it to 13.
- Use the PCIe function to distinguish partitions of the same device,
such as in TPX mode on gfx942.
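A minimal sketch of both points, assuming partitions of one physical device share the PCIe domain, bus and device numbers and differ only in the function number; PciLocation, samePhysicalDevice(), samePartition() and isIntraSocketLink() are hypothetical names, and only the weight value 13 comes from the message above.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical PCIe location (domain:bus:device.function) of a GPU partition.
struct PciLocation {
  std::uint16_t domain;
  std::uint8_t bus;
  std::uint8_t device;
  std::uint8_t function;
};

// Assumed: partitions of one physical device share domain, bus and device...
bool samePhysicalDevice(const PciLocation& a, const PciLocation& b) {
  return std::tie(a.domain, a.bus, a.device) == std::tie(b.domain, b.bus, b.device);
}

// ...and are told apart by the PCIe function number.
bool samePartition(const PciLocation& a, const PciLocation& b) {
  return samePhysicalDevice(a, b) && a.function == b.function;
}

// Intra-socket link weight reported by the driver, per the message above.
constexpr int kIntraSocketWeight = 13;
bool isIntraSocketLink(int linkWeight) { return linkWeight == kIntraSocketWeight; }
```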
Change-Id: I8e64023d44e37c2dbb105cbb343441a48021ba7b
When CL-GL interop is used, a GL context may be used by two or more threads at the same time, which causes a race condition.
Solution:
Add a lock around GL function calls during CL-GL interop.
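A minimal sketch of the locking pattern, assuming a hypothetical SharedGLContext wrapper; no real GL calls are made, and a lambda stands in for the GL functions issued during interop. Every access to the shared context goes through one mutex, so only one thread at a time touches the context.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper that serializes every access to one shared GL context.
class SharedGLContext {
 public:
  // Run `glCalls` with the context lock held and return its result.
  template <typename F>
  decltype(auto) withContext(F&& glCalls) {
    std::lock_guard<std::mutex> guard(lock_);  // one thread in the GL context at a time
    return glCalls();
  }

 private:
  std::mutex lock_;
};
```

In the test below, four threads increment a plain int through the wrapper; without the lock the increments would race and the final count could come out short.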
Change-Id: I3a34da3cbdf74c401111cc4e3a04ad84cc52709e
The warpSize variable is set to the value of the __AMDGCN_WAVEFRONT_SIZE macro,
which only has a meaningless default value in host code.
The resolution for SWDEV-449015 will introduce diagnostics for uses of this
macro in host code, including the current definition of the warpSize
variable. With the __device__ specifier, the definition of the warpSize
variable no longer triggers these diagnostics.
This change does not stop the variable from being used in host code since clang
intentionally does not diagnose uses of __device__ constexpr variables in host
code.
Change-Id: I0317217affe94fdf2dfd9ad0f134e68f5173245f