Creation of ReferenceCountedObject will increase reference count by 1.
Clear the commands from the Node after capture so that they won't be referenced later.
Change-Id: I1cc4085939cf65218ec2aa2e25ab6d737f7cacd3
Capture AQL packets during GraphInstantiation and enqueue AQL packets during graph launch.
Added support for capturing a single graph memset node.
Capture support for the memset node is currently disabled.
Memset capture will be enabled once capture of multiple packets is supported.
Change-Id: I14dfbc41731025cc3a548a730558915def3fa384
- HIP path doesn't support resource tracking. Thus, double copy can't be enabled,
because it requires resource tracking.
Change-Id: I0f9c4e185b5b2d2b1abde041fca21bb099db9ccd
Since the members are now public, some operations can be optimized to
avoid redundant conversions to half_raw types.
Change-Id: I31555ef18e695d8e24b89f0418187fa4e932a38a
On some platforms, a user can request extended shared memory for a
particular kernel. HIP does not support this feature at the moment, so
the value is set to sharedMemPerBlock, which is the maximum a user can
expect for their kernels.
Change-Id: I81005cf0d1c9fb941e77d34fb8385241ffe5bdd0
Fixes the memory leak in the hipExtStreamCreateWithCUMask API.
HSA queues with a CU mask set are not reused; a new queue is created
every time the API is called. However, these queues were not being
destroyed during hipStreamDestroy, causing a memory leak.
Change-Id: Ibfbe019bbd73604e98eca80461efe53fa64bb701
To refactor childGraph so that it has its own graphExec,
kernelArgs needs to be separated from the graphExec object.
All the childNodes that are part of a graph should share the same
kernelArg pool. Otherwise we end up creating multiple device kernel arg
memory chunks for a single graphExec.
Change-Id: I4029a46ebc1fa112d87df64ab1fecbf288fabe5e
When the tile size is greater than the number of active threads,
the coalesced group size should equal the number of active threads.
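
The rule above can be sketched as a simple clamp. This is only an illustration: coalescedGroupSize is a hypothetical helper, not a HIP API, and keeping the tile size unchanged when it fits within the active threads is an assumption that ignores partial trailing tiles.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper, not part of the HIP runtime or headers.
// Returns the size a tiled coalesced group would report.
unsigned coalescedGroupSize(unsigned tileSize, unsigned activeThreads) {
  assert(tileSize > 0 && activeThreads > 0);
  // A tile cannot hold more threads than are currently active, so the
  // requested tile size is clamped to the active thread count.
  return std::min(tileSize, activeThreads);
}
```

For example, a requested tile size of 32 with 20 active threads would give a group size of 20 under this sketch.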
Change-Id: I1d41322f2428a07862a590cb5d34b01243383b7c
Remove the redundant copies inside subfolders. They were useful when
these projects were independent, but now that the projects are merged
there should be a single .clang-format file.
Change-Id: I60510d7b78b129c761e84f13403492bd0c5d941a
This change modifies the readback mechanism to use a pointer to volatile
instead of a volatile pointer. This ensures that the compiler does not
optimize away the read operation.
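
The difference between the two declarations can be shown with a minimal sketch. The function names and the slot parameter are hypothetical and are not taken from the changed code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical readback helper; `slot` stands in for a location that
// hardware or another agent may have written.
uint32_t readbackPointerToVolatile(uint32_t* slot) {
  assert(slot != nullptr);
  // Pointer to volatile: the pointee is volatile, so the load through
  // `p` is an observable access and has to be emitted.
  volatile uint32_t* p = slot;
  return *p;
}

// For contrast, a volatile pointer: only the pointer variable itself is
// volatile. The pointee is a plain uint32_t, so the compiler is not
// required to perform the load from it.
uint32_t readbackVolatilePointer(uint32_t* slot) {
  assert(slot != nullptr);
  uint32_t* volatile p = slot;
  return *p;
}
```

Both functions return the same value in a single-threaded test; they differ only in what the compiler is obliged to keep.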
Change-Id: I79ff925d615aa8cc4f950e8ff4b7e608fcb179a4
Implement a std::mutex based monitor that has much
simpler logic than the legacy monitor.
Add DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to
toggle between them.
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false
(the default), use the legacy monitor;
if DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true,
use the std::mutex based monitor.
If the std::mutex based monitor shows no perf drop,
the legacy one will be removed later.
Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad
- Introduce a lock when checking isUserObjectValid. We need a lock
here because another thread (T2) can remove the userObject, leading to
a buffer overflow while T1 is checking ranges.
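
A minimal sketch of the idea, assuming a container guarded by a single mutex. UserObjectRegistry and its members are hypothetical names and do not reflect the runtime's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical registry of user objects, for illustration only.
class UserObjectRegistry {
 public:
  void add(const void* obj) {
    assert(obj != nullptr);
    std::lock_guard<std::mutex> guard(lock_);
    objects_.push_back(obj);
  }

  void remove(const void* obj) {
    std::lock_guard<std::mutex> guard(lock_);
    auto it = std::find(objects_.begin(), objects_.end(), obj);
    if (it != objects_.end()) objects_.erase(it);
  }

  // The validity check takes the same lock as remove(), so the
  // container cannot shrink while it is being scanned.
  bool isValid(const void* obj) const {
    std::lock_guard<std::mutex> guard(lock_);
    return std::find(objects_.begin(), objects_.end(), obj) != objects_.end();
  }

 private:
  mutable std::mutex lock_;
  std::vector<const void*> objects_;
};
```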
Change-Id: I058144b8cc463c90ab6bf5cf96bf937897742917
- awaitCompletion code may do an endless spin wait in cases where we
don't submit a handler. One such case is the hipExt*Launch API, which
takes a stop event. In that case we optimize the stop event by attaching
a signal to the dispatch packet, but we don't submit a handler when we
attach the signal. That means if awaitCompletion() is called after that,
we would keep waiting on the command status on the host rather than
simply checking the signal value.
Change-Id: Ie8bf175aeefa3f9e4299b1ae7ae9108dad67e283
This reverts commit d240b03969.
Reason for revert: <rocm-llvm package name change not required for static builds>
Change-Id: Ib2214a74162e5b015b096dc286151ecbd3ca0a80
The last command must be fetched under protection when calling
awaitCompletion(), where lastCommand will be released and
possibly destroyed.
This fixes the scoped lock (notify_lock_) crash in
Event::notifyCmdQueue() with AMD_DIRECT_DISPATCH = true.
Change-Id: I4297166f912a71112f4a8945d993160ba9afdc34
If a graph has multiple branches, an End command is enqueued on the launch
stream, which makes sure all the internal parallel streams are finished.
When a node is removed from the graph, the indegree and outdegree are not
updated correctly for its parent and child nodes, so endNode ends up with
no dependencies on the parallel commands. This results in graph sync issues.
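
The bookkeeping described above can be sketched as follows. Node, addEdge, removeNode, and isLeaf are hypothetical names for illustration and are not the runtime's graph classes; degrees are represented implicitly by the sizes of the edge lists.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical minimal graph node, for illustration only.
struct Node {
  std::vector<Node*> parents;   // indegree  == parents.size()
  std::vector<Node*> children;  // outdegree == children.size()
};

void addEdge(Node& from, Node& to) {
  assert(&from != &to);
  from.children.push_back(&to);
  to.parents.push_back(&from);
}

static void eraseAll(std::vector<Node*>& list, Node* node) {
  list.erase(std::remove(list.begin(), list.end(), node), list.end());
}

// Detach `node`, updating both ends of every edge so that the degrees
// of its parents and children stay consistent.
void removeNode(Node& node) {
  for (Node* parent : node.parents) eraseAll(parent->children, &node);
  for (Node* child : node.children) eraseAll(child->parents, &node);
  node.parents.clear();
  node.children.clear();
}

// A node with outdegree zero is a leaf; in this sketch, leaves are the
// nodes an end node would need to depend on.
bool isLeaf(const Node& node) { return node.children.empty(); }
```

With edges a -> b -> c, removing b leaves a with outdegree zero, so a is again reported as a leaf and would be picked up as a dependency.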
Change-Id: I33cc2f21220e1c017d88099b29b542e05b683f73
Resolved an issue where a freed virtual buffer was incorrectly
added to the global mapping, causing an assertion error during
the teardown process.
Change-Id: I4801157a28603ce9be1ca0131982b700ff884f7a
Changed the find_package call to prioritize the package that is
found under the ROCm installation over other system locations.
Change-Id: Ice93c94bbb9cdebd467d3e88bb2e4bfb7a1e76d9