- awaitCompletion() may spin-wait endlessly in cases where we don't submit a
handler. One such case is the hipExt*Launch API, which takes a stop event. There
we optimize the stop event by attaching a signal to the dispatch packet, but we
don't submit a handler when we attach the signal. So if awaitCompletion() is
called afterwards, we keep waiting on the command status on the host instead of
simply checking the signal value.
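The awaitCompletion() problem above can be sketched as follows. This is a minimal illustration, not the clr code: `Command`, `status`, `signal`, and `handlerSubmitted` are hypothetical stand-ins. It assumes the host-side status is only ever updated by a submitted handler, so without one the waiter must poll the signal value instead.

```cpp
#include <atomic>
#include <thread>

// Hypothetical simplified command; not the actual clr class.
struct Command {
  std::atomic<int> status{1};     // 1 = pending, 0 = complete; updated only by a handler
  std::atomic<long> signal{1};    // completion signal attached to the dispatch packet
  bool handlerSubmitted = false;  // false when the signal was attached without a handler

  // Returns once the command is known to be complete.
  void awaitCompletion() const {
    if (!handlerSubmitted) {
      // No handler will ever update `status`, so waiting on it would spin
      // forever. Check the signal value directly instead.
      while (signal.load(std::memory_order_acquire) != 0) {
        std::this_thread::yield();
      }
      return;
    }
    while (status.load(std::memory_order_acquire) != 0) {
      std::this_thread::yield();
    }
  }
};
```

Here the completing agent is simulated by another thread storing 0 to `signal`; on real hardware that store would come from the GPU when the dispatch finishes.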
Change-Id: Ie8bf175aeefa3f9e4299b1ae7ae9108dad67e283
[ROCm/clr commit: 561fb8a459]
This reverts commit 7d021f8196.
Reason for revert: The rocm-llvm package name change is not required for static builds.
Change-Id: Ib2214a74162e5b015b096dc286151ecbd3ca0a80
[ROCm/clr commit: 0670697ab4]
We must get the last command in a protected way when calling
awaitCompletion(), because lastCommand may be released and
possibly destroyed there.
This should fix the crash at the scoped lock (notify_lock_) in
Event::notifyCmdQueue() with AMD_DIRECT_DISPATCH=true.
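The protected retrieval described above can be sketched like this, assuming a queue that keeps its last command behind a mutex. `Queue`, `Cmd`, and `setLastCommand` are hypothetical names, and `std::shared_ptr` stands in for whatever reference counting the runtime actually uses. The key point is that the reference is taken while holding the lock, so another thread releasing the last command cannot destroy it while we wait.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for a command with a completion flag.
struct Cmd {
  bool done = false;
  // A real command would block until completion; this stand-in just marks it done.
  void wait() { done = true; }
};

class Queue {
 public:
  void setLastCommand(std::shared_ptr<Cmd> cmd) {
    std::lock_guard<std::mutex> guard(lock_);
    lastCommand_ = std::move(cmd);
  }

  // Takes a reference to the last command while holding the lock, then waits
  // outside the lock. Our reference keeps the command alive even if another
  // thread replaces or releases lastCommand_ in the meantime.
  std::shared_ptr<Cmd> awaitCompletion() {
    std::shared_ptr<Cmd> cmd;
    {
      std::lock_guard<std::mutex> guard(lock_);
      cmd = lastCommand_;  // retain under protection
    }
    if (cmd) cmd->wait();
    return cmd;
  }

 private:
  std::mutex lock_;
  std::shared_ptr<Cmd> lastCommand_;
};
```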
Change-Id: I4297166f912a71112f4a8945d993160ba9afdc34
[ROCm/clr commit: 749385155a]
If the graph has multiple branches, an End command is enqueued on the launch
stream, which makes sure all the internal parallel streams are finished.
When a node was removed from the graph, the indegree and outdegree of its parent
and child nodes were not updated correctly, so the end node lost its dependencies
on the parallel commands, resulting in graph synchronization issues.
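The degree bookkeeping described above can be sketched with a minimal graph where indegree and outdegree are the sizes of each node's parent and child lists. `Node`, `removeNode`, and `leafNodes` are hypothetical. For illustration only, it assumes the end command should depend on every node left with no children; the source does not say exactly how the end node's dependencies are computed.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical minimal graph node; not the actual hipGraph node type.
struct Node {
  std::vector<Node*> parents;   // indegree = parents.size()
  std::vector<Node*> children;  // outdegree = children.size()
};

// Removes `node` from the graph, detaching it from every parent and child so
// that their indegree and outdegree stay consistent.
void removeNode(Node* node) {
  for (Node* parent : node->parents) {
    auto& out = parent->children;
    out.erase(std::remove(out.begin(), out.end(), node), out.end());
  }
  for (Node* child : node->children) {
    auto& in = child->parents;
    in.erase(std::remove(in.begin(), in.end(), node), in.end());
  }
  node->parents.clear();
  node->children.clear();
}

// Nodes with outdegree 0; an end node that depends on all of them waits for
// every parallel branch.
std::vector<Node*> leafNodes(const std::vector<Node*>& nodes) {
  std::vector<Node*> leaves;
  for (Node* n : nodes) {
    if (n->children.empty()) leaves.push_back(n);
  }
  return leaves;
}
```

If `removeNode` skipped the parent update, a parent whose only child was removed would still report outdegree 1 and would be missed as a leaf.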
Change-Id: I33cc2f21220e1c017d88099b29b542e05b683f73
[ROCm/clr commit: 9ad7e79e50]
Resolved an issue where a freed virtual buffer was incorrectly
added to the global mapping, causing an assertion failure during
teardown.
Change-Id: I4801157a28603ce9be1ca0131982b700ff884f7a
[ROCm/clr commit: 1c6b92627d]
Changed the find_package call to prioritize the package that is
found under the ROCm installation over other system locations.
Change-Id: Ice93c94bbb9cdebd467d3e88bb2e4bfb7a1e76d9
[ROCm/clr commit: 6b9e89fe0c]
1. Move the global amd::monitor listenerLock before the global
runtime_tear_down object, since it is referenced in
~RuntimeTearDown() after main(). It must be freed
later than runtime_tear_down.
2. Update Device::~Device() to SVM-free coopHostcallBuffer_
before context_ is released and freed.
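Item 1 above relies on the C++ rule that objects are destroyed in reverse order of construction, which also holds for namespace-scope globals defined in a single translation unit. The sketch below demonstrates the rule with block-scope objects so it can be checked; `ListenerLock`, `RuntimeTearDownSketch`, and `demo` are hypothetical names, not the clr types.

```cpp
#include <mutex>

// Hypothetical stand-in for the global listener lock.
struct ListenerLock {
  std::mutex m;
  bool alive = true;
  ~ListenerLock() { alive = false; }
};

bool lockAliveDuringTeardown = false;  // set by ~RuntimeTearDownSketch

// Hypothetical stand-in for the teardown object that uses the lock.
struct RuntimeTearDownSketch {
  ListenerLock& lock;
  explicit RuntimeTearDownSketch(ListenerLock& l) : lock(l) {}
  ~RuntimeTearDownSketch() {
    // Safe only because `lock` is destroyed after this object.
    lockAliveDuringTeardown = lock.alive;
    std::lock_guard<std::mutex> guard(lock.m);
  }
};

void demo() {
  ListenerLock listenerLock;                     // constructed first, destroyed last
  RuntimeTearDownSketch tearDown(listenerLock);  // constructed second, destroyed first
}
```

Swapping the two declarations would make the destructor touch an already-destroyed lock, which is the use-after-free the reordering avoids.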
Change-Id: I1d21378ff463477d3238d71e5e2a1a7d6b9147ad
[ROCm/clr commit: 544c45364f]
If the graph has kernels that do device-side allocation, the heap is allocated
during packet capture, because the heap pointer has to be added to the AQL packet,
and it is initialized during graph launch.
Handle the race with the wait when two kernels using the device heap are enqueued
on multiple streams.
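One way to picture the race above: two streams launch heap-using kernels at the same time, and exactly one of them must initialize the shared heap while the other waits for it. The sketch uses `std::call_once` as a swapped-in technique for illustration; the source only says the race with the wait is handled, not how. `DeviceHeap`, `ensureInitialized`, and `launchOnTwoStreams` are hypothetical, and threads stand in for streams.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical device heap shared by several graph kernels.
class DeviceHeap {
 public:
  // The first caller initializes the heap; concurrent callers block until
  // initialization finishes, so no kernel starts on an uninitialized heap.
  void ensureInitialized() {
    std::call_once(initFlag_, [this] {
      initCount_.fetch_add(1);  // stand-in for the real heap initialization work
      ready_.store(true, std::memory_order_release);
    });
  }

  bool ready() const { return ready_.load(std::memory_order_acquire); }
  int initCount() const { return initCount_.load(); }

 private:
  std::once_flag initFlag_;
  std::atomic<bool> ready_{false};
  std::atomic<int> initCount_{0};
};

// Simulates two streams launching heap-using kernels concurrently.
void launchOnTwoStreams(DeviceHeap& heap) {
  std::thread s1([&] { heap.ensureInitialized(); });
  std::thread s2([&] { heap.ensureInitialized(); });
  s1.join();
  s2.join();
}
```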
Change-Id: I45933b77fcaf7bc8fdf1bc906462e32b5d8d3688
[ROCm/clr commit: 57156c524d]
The HIP libraries depend on multiple standard shared libraries.
The corresponding packages are added to the dependency list.
Removed hipcc from the hip-runtime package dependency list.
Rearranged the package dependencies as well.
The base (Docker) images usually come with the standard packages installed,
so the installation will go through even without the standard packages in the
dependency list. But it's good to have all the required packages in the dependency list.
Change-Id: Iacab7993dffaa00efd1344d3eafb55f6317d1509
[ROCm/clr commit: 7dc71710dd]
We more clearly define what happens in the case of amdgpu-arch
failure, and instruct developers on how to work around the failure.
Change-Id: Ifff569a7d688b6545ca457116e534a1830fa03c0
[ROCm/clr commit: 3c5b304992]
Adding a safety check prevents an invalid memory access
if the timestamps and kernelNames vectors have different sizes.
The patch also moves addKernelNames for the accumulate command
into the dispatchAqlPacket function.
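A minimal sketch of such a safety check, assuming names and timestamps are paired by index. `KernelRecord` and `pairTimestamps` are hypothetical, and truncating to the shorter vector is one possible policy; the source does not say whether the patch truncates, skips, or reports the mismatch.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical profiling record pairing a kernel name with its timestamp.
struct KernelRecord {
  std::string name;
  uint64_t timestamp;
};

// Reads only indices valid in both vectors, so mismatched sizes cannot cause
// an out-of-bounds access.
std::vector<KernelRecord> pairTimestamps(const std::vector<uint64_t>& timestamps,
                                         const std::vector<std::string>& kernelNames) {
  const std::size_t count = std::min(timestamps.size(), kernelNames.size());
  std::vector<KernelRecord> records;
  records.reserve(count);
  for (std::size_t i = 0; i < count; ++i) {
    records.push_back({kernelNames[i], timestamps[i]});
  }
  return records;
}
```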
Change-Id: Iea0927e1253800403a1ae3f3d72de1e7d96476c3
[ROCm/clr commit: d44f44a5b1]
The CMake config files were part of the HIP runtime package rather than the dev/devel package.
Moved them to the dev/devel package.
Change-Id: I5e52658f28c551c830294d3b1525907f1b7cbc50
[ROCm/clr commit: b7d203edf6]
fork() duplicates the process's system memory resources, but the runtime can't duplicate
GPU resources. Thus, avoid tearDown() calls in child processes.
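A common way to detect a forked child is to record the PID at initialization and compare it with getpid() at teardown; pthread_atfork is another option. The source does not say which technique the runtime uses, so this is only a POSIX sketch, and `RuntimeSketch`, `shouldTearDown`, and `tornDown` are hypothetical names.

```cpp
#include <sys/types.h>
#include <unistd.h>

// Hypothetical runtime state that remembers which process initialized it.
class RuntimeSketch {
 public:
  RuntimeSketch() : ownerPid_(getpid()) {}

  // A forked child inherits a copy of this object but not the GPU resources
  // it refers to, so only the initializing process tears it down.
  bool shouldTearDown() const { return getpid() == ownerPid_; }

  void tearDown() {
    if (!shouldTearDown()) return;  // child process: skip GPU cleanup
    tornDown_ = true;               // stand-in for releasing GPU resources
  }

  bool tornDown() const { return tornDown_; }

 private:
  pid_t ownerPid_;
  bool tornDown_ = false;
};
```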
Change-Id: Id6b12bacd5112b9ad3747c218e09cba98ea1b42c
[ROCm/clr commit: ae2992ea43]
- Update the intra-socket weight for partitions within a single socket, as
the driver has changed it to 13.
- Use the PCIe function to distinguish partitions of the same device,
such as in TPX mode on gfx942.
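The second bullet above can be illustrated by comparing PCIe locations. This sketch assumes, as the bullet implies, that partitions of one device can share domain, bus, and device numbers and differ only in the PCIe function. `PciLocation`, `isSamePhysicalDevice`, and `isSamePartition` are hypothetical helpers, and the bus values in the usage are made up.

```cpp
#include <cstdint>

// Hypothetical PCIe location of a GPU partition.
struct PciLocation {
  uint32_t domain;
  uint32_t bus;
  uint32_t device;
  uint32_t function;
};

// Partitions of the same physical device share domain, bus, and device.
bool isSamePhysicalDevice(const PciLocation& a, const PciLocation& b) {
  return a.domain == b.domain && a.bus == b.bus && a.device == b.device;
}

// The PCIe function is also compared, so two partitions of one device are
// not mistaken for the same agent.
bool isSamePartition(const PciLocation& a, const PciLocation& b) {
  return isSamePhysicalDevice(a, b) && a.function == b.function;
}
```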
Change-Id: I8e64023d44e37c2dbb105cbb343441a48021ba7b
[ROCm/clr commit: 1815fc808d]
When CL-GL interop is used, a GL context may be used by two or more threads at the same time, which causes a race condition.
Solution:
Add a lock when accessing GL functions during CL-GL interop.
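The fix above amounts to serializing every GL access behind one lock. In this sketch, `GLInteropGuard` and `runInteropWorkers` are hypothetical, and a plain counter stands in for GL state that is not safe to touch from several threads at once.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper that serializes all GL calls made during CL-GL interop.
class GLInteropGuard {
 public:
  template <typename F>
  void call(F&& glWork) {
    std::lock_guard<std::mutex> guard(glLock_);
    glWork();  // stand-in for the real GL calls
  }

 private:
  std::mutex glLock_;
};

// Runs `threads` workers that each make `iterations` guarded calls.
// `counter` stands in for GL state that is not thread-safe.
int runInteropWorkers(int threads, int iterations) {
  GLInteropGuard guard;
  int counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < iterations; ++i) {
        guard.call([&] { ++counter; });
      }
    });
  }
  for (auto& w : workers) w.join();
  return counter;
}
```

Without the lock, the concurrent increments would be a data race and the final count could come out lower than expected.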
Change-Id: I3a34da3cbdf74c401111cc4e3a04ad84cc52709e
[ROCm/clr commit: 0c6a952a90]
The warpSize variable is set to the value of the __AMDGCN_WAVEFRONT_SIZE macro,
which has a meaningless default value in host code.
The resolution for SWDEV-449015 will introduce diagnostics for uses of this
macro in host code, which include the current definition of the warpSize
variable. With the __device__ specifier, the definition of the warpSize
variable will no longer trigger these diagnostics.
This change does not stop the variable from being used in host code, since clang
intentionally does not diagnose uses of __device__ constexpr variables in host
code.
Change-Id: I0317217affe94fdf2dfd9ad0f134e68f5173245f
[ROCm/clr commit: 819e537dc5]