hiprtc and hip APIs use the same file.
Append to the file instead of writing at the start of the file.
Change-Id: I2703f9bb67f0c51b557a058daab129679a0b5dd9
[ROCm/clr commit: e07172ff57]
This patch fixes a potential issue caused by filling the AQL header
before filling the AQL body. The HSA spec specifies "Packet processors may
process AQL packets after the packet format field is updated, but
before the doorbell is signaled."
However, the hipGraph AQL packet was given a valid header before its
body was filled, so the CP could receive an invalid AQL body.
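The ordering this fix relies on can be sketched on the host as follows. This is a minimal illustration, not the clr code: PacketSlot, publishPacket, tryConsume and the header constants are hypothetical names, and the slot layout is simplified. The body is written first and the header is stored last with release ordering, so a consumer that sees a valid header also sees the finished body.

```cpp
#include <atomic>
#include <cstdint>

// Assumed header encodings for this sketch only.
constexpr uint16_t kInvalidHeader = 1;
constexpr uint16_t kDispatchHeader = 2;

// Hypothetical, simplified stand-in for one AQL packet slot.
struct PacketSlot {
  std::atomic<uint16_t> header{kInvalidHeader};
  uint16_t setup = 0;
  uint32_t body[15];
};

// Fill the body first, then publish the header last. The release store
// makes the body writes visible to any reader that observes the new header.
inline void publishPacket(PacketSlot& slot, const uint32_t (&body)[15],
                          uint16_t setup, uint16_t header) {
  slot.setup = setup;
  for (int i = 0; i < 15; ++i) slot.body[i] = body[i];
  slot.header.store(header, std::memory_order_release);
}

// Consumer side: copy the body out only once the header is valid.
inline bool tryConsume(const PacketSlot& slot, uint32_t (&out)[15]) {
  if (slot.header.load(std::memory_order_acquire) == kInvalidHeader) {
    return false;
  }
  for (int i = 0; i < 15; ++i) out[i] = slot.body[i];
  return true;
}
```

Writing the header first would let tryConsume() succeed while body[] still holds stale data, which is the hazard described above.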
Change-Id: I84af798c19ee2b8805ba19732b0eabdea2958a96
[ROCm/clr commit: 3959b5be1e]
SPT is destroyed with hipDeviceReset(). If a
stream is created right after the reset, the same
object id could be reused. Later, the SPT destructor
incorrectly verifies that the stream is valid by
referring to the reused object id, causing the
corruption.
Change-Id: I3b1f7ffdf8bab874dca7b8fde22318162997b8f6
[ROCm/clr commit: f6a68b3c2e]
This change adds fixes to the optimized multistream path for childGraph use cases.
1) For childgraph nodes, rely on runNodes() only to process
the childgraph and skip calls to createCommand and enqueueCommands.
This ensures that the start/end markers are enqueued correctly
with respect to the childGraph commands.
In addition, the runNodes() for the childgraph should be called after
the dependency walkthrough to make sure that the subgraph is executed once.
2) Nodes with no outgoing edges should be marked
as leaves regardless of which stream they are assigned to.
This is to ensure that marker dependencies from nodes
that run on a non-zero stream to subgraph leaves that run on the zero stream
are still set up correctly.
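Point 2 can be sketched as below. GraphNode and findLeaves are hypothetical names for illustration and do not mirror the clr graph classes. The leaf test looks only at outgoing edges and deliberately ignores the stream assignment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node record: the stream it was assigned to, plus the
// indices of the nodes it points to.
struct GraphNode {
  int streamId;
  std::vector<std::size_t> edges;
};

// A node is a leaf when it has no outgoing edges. streamId is not
// consulted, so nodes on non-zero streams are reported as well.
inline std::vector<std::size_t> findLeaves(const std::vector<GraphNode>& nodes) {
  std::vector<std::size_t> leaves;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (nodes[i].edges.empty()) leaves.push_back(i);
  }
  return leaves;
}
```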
Change-Id: I4a5f4f3b0e0d01e515cdcb045b46c2798f291255
[ROCm/clr commit: 464b99373b]
Fix a random language string that leads to a compile failure
of the trap handler and a TDR in hipMemset() on VM in release
mode of hip-rt.
Change-Id: Ie1d874742b804f62ceda68064fa54f5d39c092b8
[ROCm/clr commit: 857d0d60b9]
When the source or destination pitch is set to zero in the hip_Memcpy2D struct,
it should default to WidthInBytes + [src/dst]XInBytes.
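The defaulting rule can be written as a small helper. This is a sketch only: effectivePitch is a hypothetical name, and the parameters stand in for the corresponding struct fields.

```cpp
#include <cstddef>

// Returns the pitch to use for one side (source or destination) of a 2D
// copy. A pitch of zero means "not set" and falls back to
// widthInBytes + xInBytes, as described in this commit.
inline std::size_t effectivePitch(std::size_t pitch, std::size_t widthInBytes,
                                  std::size_t xInBytes) {
  return pitch != 0 ? pitch : widthInBytes + xInBytes;
}
```

For example, a copy with WidthInBytes = 256 and srcXInBytes = 16 and a zero source pitch would use 272 bytes per row, while an explicit non-zero pitch is left untouched.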
Change-Id: Id57b53cab40ba72ced231258da9356554c4868c3
[ROCm/clr commit: 7a1e818c82]
- Fixes the -0.0 and +0.0 comparison. For atomicMax, if the value at
address is -0.0 and val is +0.0, gfx90a's unsafe atomics will swap
them. The CAS loop should behave consistently with this.
- The _system variants of atomicMax and atomicMin produce
incorrect output. Updated them to use an implementation similar to
atomicMax and atomicMin.
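A host-side sketch of a CAS loop that orders +0.0 above -0.0 is shown below. It uses std::atomic<float> rather than the device intrinsics, and atomicMaxFloat and shouldReplace are hypothetical names, so this illustrates the comparison rule rather than the actual header code. NaN handling is left out.

```cpp
#include <atomic>
#include <cmath>

// True when val should replace old under a max that treats +0.0 as
// greater than -0.0. Plain old < val is false for (-0.0, +0.0) because
// the two compare equal, so the sign bit is checked explicitly.
inline bool shouldReplace(float old, float val) {
  return old < val ||
         (old == val && std::signbit(old) && !std::signbit(val));
}

// CAS-loop maximum. Returns the value observed before any update.
inline float atomicMaxFloat(std::atomic<float>& addr, float val) {
  float old = addr.load();
  while (shouldReplace(old, val)) {
    // On failure, old is refreshed with the current value and the
    // condition is re-evaluated.
    if (addr.compare_exchange_weak(old, val)) break;
  }
  return old;
}
```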
Change-Id: I20df36ee29ae0434a6b564f2ba71193fe41cfa59
[ROCm/clr commit: d69cc35750]
This is the first step to remove rocm-ocl-icd.
We don't build amd icd after this commit.
We still need to remove header files usage in future steps.
Change-Id: Ic4ac5476180f9ef2ce87b62891c08b28d6c9bfd2
[ROCm/clr commit: 5f775b8b7f]
Release the graph exec after the wait completes and before deleting the hip::stream object
during stream destroy.
Change-Id: I1d68aa8d844f7d3af330c6d09c44af07f8553551
[ROCm/clr commit: 8e80429b87]
- Added the optimized multi-stream path in graph execution. It uses a fixed number of async streams for execution
- Optimized the launch latency, where command
creation and execution are done at the same time
- Optimized the scheduling to use fewer barriers and waiting signals if
the same queue can be detected
- The new path is controlled by the DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable, where 0 will use the original path and any other
value will force the number of asynchronous queues for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single-queue async
execution in graphs (applicable to Navi families only)
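The DEBUG_HIP_FORCE_GRAPH_QUEUES semantics listed above can be sketched as follows. Only the variable name comes from this commit; GraphQueueChoice, interpretGraphQueues and readGraphQueuesFromEnv are hypothetical, and the runtime's real parsing and its default when the variable is unset are not described here.

```cpp
#include <cstdlib>

// Hypothetical result type for this sketch.
struct GraphQueueChoice {
  bool specified;           // false when the variable is not set
  bool optimizedPath;       // false selects the original path
  unsigned long numQueues;  // forced number of async queues when non-zero
};

// 0 selects the original path; any other value forces that many queues.
inline GraphQueueChoice interpretGraphQueues(const char* value) {
  if (value == nullptr) return {false, false, 0};
  unsigned long n = std::strtoul(value, nullptr, 10);
  return {true, n != 0, n};
}

// Reads the variable from the process environment.
inline GraphQueueChoice readGraphQueuesFromEnv() {
  return interpretGraphQueues(std::getenv("DEBUG_HIP_FORCE_GRAPH_QUEUES"));
}
```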
Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e
[ROCm/clr commit: 9db52f9a46]
=> Added support for capturing multiple AQL packets.
=> Added an interface for rocclr to call back into the hip runtime to allocate
kernel args from the graph kernel arg pool.
=> Enabled support for capturing memset nodes.
Change-Id: I7e1c2ba06927459e024653058af142bd82192c43
[ROCm/clr commit: bd3a35bde1]
hipDeviceSynchronize called from __hipUnregisterFatBinary
accesses static maps and monitors. This change ensures these objects
are not destroyed before __hipUnregisterFatBinary is called.
Additionally, it disables the teardown process for static builds.
Change-Id: I46b58641d60efcf6637a8e99cdd786ffe9e2c77d
[ROCm/clr commit: 9b33db9b24]
This issue was caused by incorrect usage of the getStream call.
If we get the null stream first, typecast it, and then call
getStream again, we lose the advantage of simply passing "nullptr" to
indicate the NULL stream. As a result, we enter the waitActiveStream call and add
barriers to sync across streams.
Change-Id: I94dc4e3ec927295b9e1ab6dee4b37d7d3e00b0cc
[ROCm/clr commit: cda4b7db1c]
If only external signals were provided, then just process them
without adding internal signals.
Change-Id: Iaefd65d0f8b0a64b9f6a864a9bd73de20a29dfa4
[ROCm/clr commit: 18187cd8fe]
Updated the num_mip_levels field to better align with the OpenCL specification, which states that mip-mapped images cannot be created for CL_MEM_OBJECT_IMAGE1D_BUFFER images. Added a check for the miplevels value used in the clCreateImage call.
Change-Id: I82a25b83ef0637a877409572b7976d9e4413dfac
[ROCm/clr commit: 21a1c9075a]
Also in the scope of SWDEV-467540.
Fix sporadic crash in Unit_hipStreamAddCallback_MultipleThreads by
deferring release() of block_command.
The test invokes 1000 threads on the same stream, so there
is a chance of freeing block_command too early in the original code.
By deferring release() of block_command we can make sure block_command
is always valid while calling block_command->notifyCmdQueue().
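The deferral pattern can be sketched with a minimal reference-counted object. RefCountedCommand and notifyThenRelease are hypothetical stand-ins, not the rocclr command classes. The caller keeps its own reference until after the last use of the pointer, so another owner dropping its reference cannot free the object in between.

```cpp
#include <atomic>

// Hypothetical reference-counted command; starts with one reference.
class RefCountedCommand {
 public:
  explicit RefCountedCommand(bool* destroyedFlag) : destroyed_(destroyedFlag) {}

  void retain() { refs_.fetch_add(1, std::memory_order_relaxed); }

  // Drops one reference and deletes the object when it was the last one.
  void release() {
    if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
  }

  void notifyCmdQueue() { ++notifications_; }

 private:
  ~RefCountedCommand() { *destroyed_ = true; }

  std::atomic<int> refs_{1};
  int notifications_ = 0;
  bool* destroyed_;  // lets the caller observe destruction in this sketch
};

// Deferred release: the reference is dropped only after notifyCmdQueue()
// has returned, so cmd is guaranteed valid for the call.
inline void notifyThenRelease(RefCountedCommand* cmd) {
  cmd->notifyCmdQueue();
  cmd->release();
}
```

Calling release() before notifyCmdQueue() would reproduce the hazard: if that was the last reference, the notify call would run on a freed object.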
Change-Id: I31555ee18e6958e34b89f04181867fa4e932a38c
[ROCm/clr commit: e3ef19e22a]