hipDeviceSynchronize called from __hipUnregisterFatBinary
accesses static maps and monitors. This change ensures these objects
are not destroyed before __hipUnregisterFatBinary is called.
Additionally, it disables the teardown process for static builds.
Change-Id: I46b58641d60efcf6637a8e99cdd786ffe9e2c77d
[ROCm/clr commit: 9b33db9b24]
This issue was caused by incorrect use of the getStream call. If we
fetch the null stream first, typecast it, and then call getStream
again, we lose the advantage of simply passing "nullptr" to indicate
the NULL stream. As a result, we enter the waitActiveStream call and
add barriers to sync across streams.
Change-Id: I94dc4e3ec927295b9e1ab6dee4b37d7d3e00b0cc
[ROCm/clr commit: cda4b7db1c]
If only external signals were provided, then just process them
without adding internal signals.
Change-Id: Iaefd65d0f8b0a64b9f6a864a9bd73de20a29dfa4
[ROCm/clr commit: 18187cd8fe]
Updated the num_mip_levels field to better align with the OpenCL
specification, which states that mip-mapped images cannot be created
for CL_MEM_OBJECT_IMAGE1D_BUFFER images. Added a check for the
miplevels value used in the clCreateImage call.
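
The check described above could look roughly like the sketch below. It
is not the actual clr code: ImageType, mipLevelsValid and
mipLevelExamples are hypothetical names, and the sketch assumes that a
num_mip_levels value of 0 or 1 means "not mip-mapped".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the CL_MEM_OBJECT_* image types.
enum class ImageType { Image1D, Image1DBuffer, Image2D, Image3D };

// Returns true if the requested mip level count is acceptable for the
// given image type. Assumption: values of 0 or 1 mean "no mip-mapping".
inline bool mipLevelsValid(ImageType type, std::uint32_t num_mip_levels) {
  if (type == ImageType::Image1DBuffer) {
    // Mip-mapped images cannot be created for 1D buffer images.
    return num_mip_levels <= 1;
  }
  return true;
}

// Small usage example.
inline void mipLevelExamples() {
  assert(mipLevelsValid(ImageType::Image1DBuffer, 0));
  assert(!mipLevelsValid(ImageType::Image1DBuffer, 4));
  assert(mipLevelsValid(ImageType::Image2D, 4));
}
```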
Change-Id: I82a25b83ef0637a877409572b7976d9e4413dfac
[ROCm/clr commit: 21a1c9075a]
Also in the scope of SWDEV-467540.
Fix sporadic crash in Unit_hipStreamAddCallback_MultipleThreads by
deferring release() of block_command.
The test invokes 1000 threads on the same stream, so there is a
chance that block_command is freed too early in the original code.
By deferring release() of block_command, we make sure block_command
is always valid while calling block_command->notifyCmdQueue().
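
A minimal sketch of the ordering, assuming a simple intrusive reference
count. RefCountedCommand and notifyThenRelease are hypothetical
stand-ins, not the real clr classes; only the order of the two calls is
the point.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted command; the count starts at 1.
class RefCountedCommand {
 public:
  explicit RefCountedCommand(bool* destroyed) : destroyed_(destroyed) {
    assert(destroyed != nullptr);
  }
  void retain() { refs_.fetch_add(1, std::memory_order_relaxed); }
  void release() {
    // The last owner to release the object destroys it.
    if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      *destroyed_ = true;
      delete this;
    }
  }
  void notifyCmdQueue() { notified_ = true; }

 private:
  ~RefCountedCommand() = default;  // destroyed only through release()
  std::atomic<int> refs_{1};
  bool* destroyed_;
  bool notified_ = false;
};

// Deferred release: use the object first, drop our reference last, so
// the object cannot be freed while notifyCmdQueue() is still running.
inline void notifyThenRelease(RefCountedCommand* cmd) {
  cmd->notifyCmdQueue();
  cmd->release();
}
```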
Change-Id: I31555ee18e6958e34b89f04181867fa4e932a38c
[ROCm/clr commit: e3ef19e22a]
Creating a ReferenceCountedObject increases the reference count by 1.
Clear the commands from the Node after capture so that they won't be
referenced later.
Change-Id: I1cc4085939cf65218ec2aa2e25ab6d737f7cacd3
[ROCm/clr commit: 6ae5d6896c]
Capture AQL packets during GraphInstantiation and enqueue AQL packets during graph launch.
Added support to capture a single graph memset node.
Capture support for memset nodes is currently disabled.
Memset capture will be enabled when capture of multiple packets is supported.
Change-Id: I14dfbc41731025cc3a548a730558915def3fa384
[ROCm/clr commit: 346da4bb40]
- HIP path doesn't support resource tracking. Thus, double copy can't be enabled,
because it requires resource tracking.
Change-Id: I0f9c4e185b5b2d2b1abde041fca21bb099db9ccd
[ROCm/clr commit: 4c763e45a1]
Since the members are now public, we can optimize some operations by
avoiding redundant conversions to half_raw types.
Change-Id: I31555ef18e695d8e24b89f0418187fa4e932a38a
[ROCm/clr commit: 6a655a77e7]
On some platforms, a user can request extended shared memory for a
particular kernel in some cases. This feature does not exist in HIP at
the moment, so we set it to sharedMemPerBlock, which is the maximum a
user can expect for their kernels.
Change-Id: I81005cf0d1c9fb941e77d34fb8385241ffe5bdd0
[ROCm/clr commit: 4b95e7bc87]
Fixes the memory leak with the hipExtStreamCreateWithCUMask API.
HSA queues with a CU mask set are not reused and are created
every time the API is called, but these queues were not being
destroyed during hipStreamDestroy, causing a memory leak.
Change-Id: Ibfbe019bbd73604e98eca80461efe53fa64bb701
[ROCm/clr commit: 191869b252]
To refactor childGraph so that it has its own graphExec,
kernelArgs needs to be separated from the graphExec object.
All the childNodes that are part of a graph should share the same
kernelArg pool. Otherwise we end up creating multiple device kernel
arg memory chunks for a single graphExec.
Change-Id: I4029a46ebc1fa112d87df64ab1fecbf288fabe5e
[ROCm/clr commit: 35079e834e]
When the tile size is greater than the number of active threads,
the coalesced group size should be equal to the number of active threads.
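
The rule can be sketched as a clamp. coalescedTileSize is a
hypothetical helper, not the HIP cooperative groups implementation, and
returning the requested tile size in the other case is an assumption
made for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: size of a coalesced tile given the requested
// tile size and the number of currently active threads.
inline unsigned coalescedTileSize(unsigned tile_size, unsigned active_threads) {
  assert(tile_size > 0);
  // If the tile is larger than the active thread count, the group can
  // only contain the active threads.
  return std::min(tile_size, active_threads);
}
```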
Change-Id: I1d41322f2428a07862a590cb5d34b01243383b7c
[ROCm/clr commit: 152f343124]
Remove the redundant copies inside sub folders. These were useful when
the projects were independent, but now that they are merged they
should have one single .clang-format file.
Change-Id: I60510d7b78b129c761e84f13403492bd0c5d941a
[ROCm/clr commit: b5b1f639c0]
This change modifies the readback mechanism to use a pointer to volatile
instead of a volatile pointer. This ensures that the compiler does not
optimize away the read operation.
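
The difference between the two declarations, as a small sketch.
readback is a hypothetical helper, not the actual clr readback code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical readback helper.
inline std::uint32_t readback(const std::uint32_t* addr) {
  assert(addr != nullptr);
  // Pointer to volatile: the pointee is volatile-qualified, so the
  // load through p is a volatile access that must be performed.
  const volatile std::uint32_t* p = addr;
  return *p;
}

// By contrast, `const std::uint32_t* volatile q = addr;` makes only the
// pointer variable q volatile. The load `*q` is then an ordinary
// access, which the compiler may drop if its result is unused.
```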
Change-Id: I79ff925d615aa8cc4f950e8ff4b7e608fcb179a4
[ROCm/clr commit: ea50d2c0c2]
Implement a std::mutex based monitor that has much
simpler logic than the legacy monitor.
Create DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to
toggle between them.
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false
(the default), use the legacy monitor;
if DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true,
use the std::mutex based monitor.
If the std::mutex based monitor shows no perf drop,
the legacy one will be removed later.
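
A rough sketch of what a std::mutex based monitor can look like.
SimpleMonitor and its method names are illustrative assumptions, not
the amd::Monitor interface, and the sketch is non-recursive.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical monitor built only on std::mutex and
// std::condition_variable.
class SimpleMonitor {
 public:
  void lock() { mutex_.lock(); }
  void unlock() { mutex_.unlock(); }
  bool tryLock() { return mutex_.try_lock(); }

  // The caller must already hold the lock. The lock is released while
  // waiting and held again on return. Spurious wakeups are possible,
  // so callers should wait in a loop on their own condition.
  void wait() {
    std::unique_lock<std::mutex> lk(mutex_, std::adopt_lock);
    cv_.wait(lk);
    lk.release();  // keep the mutex locked for the caller
  }

  void notify() { cv_.notify_one(); }
  void notifyAll() { cv_.notify_all(); }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
};
```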
Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad
[ROCm/clr commit: 73c02041e1]
- Introduce a lock when checking isUserObjectValid. We need a lock
here because another thread (T2) can remove the userObject, leading
to a buffer overflow when checking ranges in T1.
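
A minimal sketch of the idea, assuming the user objects live in a
shared container. UserObjectRegistry and its members are hypothetical
names; the real check compares ranges, which is simplified here to a
set lookup.

```cpp
#include <cassert>
#include <mutex>
#include <unordered_set>

// Hypothetical registry of live user objects.
class UserObjectRegistry {
 public:
  void add(const void* obj) {
    assert(obj != nullptr);
    std::lock_guard<std::mutex> guard(lock_);
    objects_.insert(obj);
  }
  void remove(const void* obj) {
    std::lock_guard<std::mutex> guard(lock_);
    objects_.erase(obj);
  }
  // Takes the same lock as add() and remove(), so a concurrent removal
  // cannot modify the container in the middle of the lookup.
  bool isValid(const void* obj) const {
    std::lock_guard<std::mutex> guard(lock_);
    return objects_.count(obj) != 0;
  }

 private:
  mutable std::mutex lock_;
  std::unordered_set<const void*> objects_;
};
```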
Change-Id: I058144b8cc463c90ab6bf5cf96bf937897742917
[ROCm/clr commit: 6ac67afdd5]