After setting the new params in hipDrvGraphExecMemsetNodeSetParams, we
also need to update the AQL packet. Otherwise the graph launch still
dispatches the packet built from the original params instead of the
updated ones.
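
The fix can be pictured with a small host-side model. This is only a sketch: `MemsetParams`, `AqlPacket`, and `ExecMemsetNode` are hypothetical stand-ins, not the runtime's actual types, and the packet layout is reduced to the fields the example needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins; the real runtime types are not shown in this log.
struct MemsetParams {
  std::uintptr_t dst;
  std::uint32_t value;
  std::size_t width;
};

struct AqlPacket {  // simplified: only the fields this sketch needs
  std::uintptr_t dst;
  std::uint32_t value;
  std::size_t width;
};

class ExecMemsetNode {
 public:
  explicit ExecMemsetNode(const MemsetParams& p) : params_(p) { RebuildPacket(); }

  // Updating the params alone is not enough: the cached packet is what the
  // launch dispatches, so it has to be refreshed as well.
  void SetParams(const MemsetParams& p) {
    params_ = p;
    RebuildPacket();
  }

  const AqlPacket& PacketForLaunch() const { return packet_; }

 private:
  void RebuildPacket() {
    packet_ = AqlPacket{params_.dst, params_.value, params_.width};
  }

  MemsetParams params_;
  AqlPacket packet_;
};
```

Dropping the `RebuildPacket()` call from `SetParams` reproduces the bug in this model: `PacketForLaunch()` keeps returning the values from instantiation.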
Change-Id: Ie49a641ba3f66c8085a29f92d88ac6ea6a1c0534
* When hipMemset3DAsync is captured, a 3D extent (depth > 1) can be set as a parameter. That worked on NVIDIA, but on AMD the wrong portion of the array was filled: when creating the Memset3D command, the extent dimensions were used to create the pitchedPtr instead of the original array width and height.
* Also, when capturing hipMemset3DAsync, NVIDIA allows any of the extent dimensions to be 0; in that case, no work should be done.
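
Both points can be illustrated with a host-side model of a 3D fill. This is a sketch under assumptions, not the runtime code: `Extent3D` and `Memset3DModel` are hypothetical names, and the buffer is a plain byte vector. The row and slice strides come from the allocation (`pitch`, `allocHeight`), never from the extent being filled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Extent3D {
  std::size_t width;   // in bytes
  std::size_t height;  // in rows
  std::size_t depth;   // in slices
};

// Fills a width x height x depth box at the origin of a 3D allocation whose
// rows are `pitch` bytes apart and whose slices are `pitch * allocHeight`
// bytes apart. Returns the number of bytes written.
std::size_t Memset3DModel(std::vector<unsigned char>& buf, std::size_t pitch,
                          std::size_t allocHeight, unsigned char value,
                          const Extent3D& e) {
  // Any zero dimension means there is nothing to fill.
  if (e.width == 0 || e.height == 0 || e.depth == 0) return 0;
  assert(e.width <= pitch && e.height <= allocHeight);
  assert(buf.size() >= pitch * allocHeight * e.depth);

  std::size_t written = 0;
  for (std::size_t z = 0; z < e.depth; ++z) {
    for (std::size_t y = 0; y < e.height; ++y) {
      for (std::size_t x = 0; x < e.width; ++x) {
        // Strides are taken from the allocation, not from the extent.
        buf[z * pitch * allocHeight + y * pitch + x] = value;
        ++written;
      }
    }
  }
  return written;
}
```

If the slice stride were computed from the extent (`e.width * e.height`) instead, the second slice of a 2x2x2 fill in a 4x4x2 allocation would land inside the first slice, which is the kind of misplaced fill described above.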
Change-Id: I46a605bf9ae801cd3348e98d528c21263a8eefce
=> GraphExec instance could be destroyed before an async launch completes;
destroy it only after all pending graph launches have finished
=> Remove GraphExec destroy during the next sync point (hipStreamSync,
hipDeviceSync, etc.)
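
One way to picture the deferred destroy is a pending-launch counter. The sketch below is a single-threaded model with hypothetical names (`GraphExecModel` and its methods); the actual runtime mechanism is not described in this message and may differ.

```cpp
#include <cassert>

// Hypothetical model of a graph exec object whose release is deferred until
// every pending launch has completed.
class GraphExecModel {
 public:
  void OnLaunchEnqueued() { ++pendingLaunches_; }

  // Called when an asynchronous launch finishes. Returns true if this
  // completion released the object.
  bool OnLaunchCompleted() {
    assert(pendingLaunches_ > 0);
    --pendingLaunches_;
    return TryRelease();
  }

  // Models the user's destroy call. Returns true if released immediately.
  bool RequestDestroy() {
    destroyRequested_ = true;
    return TryRelease();
  }

  bool released() const { return released_; }

 private:
  bool TryRelease() {
    if (destroyRequested_ && pendingLaunches_ == 0 && !released_) {
      released_ = true;  // a real runtime would free its resources here
      return true;
    }
    return false;
  }

  int pendingLaunches_ = 0;
  bool destroyRequested_ = false;
  bool released_ = false;
};
```

In this model a destroy request made while a launch is in flight returns false, and the later completion performs the release. A destroy without any launch releases immediately.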
Change-Id: I4df682aae5787fd6e5240a7be936ce50361345d0
- Added the optimized multi-stream path in graph execution. It uses a fixed number of async streams in the execution
- Optimize the launch latency: command
creation and execution are done at the same time
- Optimize the scheduling to use fewer barriers and waiting signals if
the same queue can be detected
- The new path is controlled by DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable, where 0 will use the original path and any other
value will force the number of asynchronous queues for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single-queue async
execution in graphs (applicable to Navi families only)
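
The value semantics of DEBUG_HIP_FORCE_GRAPH_QUEUES described above can be sketched as a small parser. `GraphQueueConfig` and `ParseGraphQueues` are hypothetical names, not runtime symbols. The behavior for an unset or non-numeric value is not stated in this message, so the sketch reports it as invalid and leaves the default to the caller.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

struct GraphQueueConfig {
  bool valid;               // false when the value is unset or not a number
  bool useOriginalPath;     // true when the value is 0
  unsigned long numQueues;  // forced queue count when useOriginalPath is false
};

// `raw` is the text of DEBUG_HIP_FORCE_GRAPH_QUEUES, or nullptr if unset.
GraphQueueConfig ParseGraphQueues(const char* raw) {
  if (raw == nullptr || !std::isdigit(static_cast<unsigned char>(raw[0]))) {
    return GraphQueueConfig{false, false, 0};
  }
  char* end = nullptr;
  const unsigned long n = std::strtoul(raw, &end, 10);
  if (*end != '\0') return GraphQueueConfig{false, false, 0};
  // 0 selects the original path; any other value is the number of queues.
  return GraphQueueConfig{true, n == 0, n};
}
```

A caller would typically pass `std::getenv("DEBUG_HIP_FORCE_GRAPH_QUEUES")` as the argument.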
Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e
=> Added support to capture multiple AQL Packets.
=> Added an interface for rocclr to call back into the HIP runtime to allocate
kernel args from the graph kernel arg pool.
=> Enabled Support to capture memset node.
Change-Id: I7e1c2ba06927459e024653058af142bd82192c43
To refactor childGraph so that it has its own graphExec,
kernelArgs need to be separated from the graphExec object.
All child nodes that are part of a graph should share the same kernelArg pool.
Otherwise we end up creating multiple device kernel arg memory chunks
for a single graphExec.
Change-Id: I4029a46ebc1fa112d87df64ab1fecbf288fabe5e
If a graph has multiple branches, an End command is enqueued on the launch stream, which
makes sure all the internal parallel streams are finished.
When a node is removed from the graph, indegree and outdegree were not updated correctly for its parent and child nodes,
so the endNode had no deps on the parallel commands, resulting in graph sync issues.
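
The bookkeeping involved can be sketched with a minimal graph model. The names (`Node`, `AddEdge`, `RemoveNode`, `LeafNodes`) are hypothetical, and degrees are derived from edge lists here rather than stored as counters as the runtime may do. The point is that removal must detach the node from both its parents and its children, so that the set of leaf nodes the end command waits on stays correct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Node {
  std::vector<Node*> parents;   // incoming edges
  std::vector<Node*> children;  // outgoing edges
  std::size_t inDegree() const { return parents.size(); }
  std::size_t outDegree() const { return children.size(); }
};

void EraseNode(std::vector<Node*>& v, Node* n) {
  v.erase(std::remove(v.begin(), v.end(), n), v.end());
}

void AddEdge(Node& from, Node& to) {
  from.children.push_back(&to);
  to.parents.push_back(&from);
}

// Detaches `n` in both directions so its neighbours' degrees stay accurate.
void RemoveNode(Node& n) {
  for (Node* p : n.parents) EraseNode(p->children, &n);
  for (Node* c : n.children) EraseNode(c->parents, &n);
  n.parents.clear();
  n.children.clear();
}

// Nodes with no outgoing edges are the ones a final end command depends on.
std::vector<Node*> LeafNodes(const std::vector<Node*>& nodes) {
  std::vector<Node*> leaves;
  for (Node* n : nodes) {
    if (n->outDegree() == 0) leaves.push_back(n);
  }
  return leaves;
}
```

With edges A->B, A->C, B->D, removing D must leave B with outdegree 0. If B's outdegree stayed stale at 1, B would not be reported as a leaf and the end command would miss that branch.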
Change-Id: I33cc2f21220e1c017d88099b29b542e05b683f73
A free node should be added in the same graph, and only once.
Cloning a graph that contains a mem alloc/mem free node is not supported.
Destroying a mem alloc/mem free node is not supported once it has been added to a graph.
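
The first rule can be sketched as a small validity check. This is an illustration only: `MemNodeTracker`, `GraphId`, and `AllocId` are hypothetical, and the sketch assumes "same graph" means the graph that holds the matching alloc node.

```cpp
#include <cassert>
#include <map>

using GraphId = int;  // hypothetical handle types for the sketch
using AllocId = int;

class MemNodeTracker {
 public:
  void AddAllocNode(GraphId graph, AllocId alloc) {
    owner_[alloc] = graph;
    freed_[alloc] = false;
  }

  // A free node is accepted only in the graph that owns the allocation
  // (an assumption of this sketch), and only once per allocation.
  bool AddFreeNode(GraphId graph, AllocId alloc) {
    auto it = owner_.find(alloc);
    if (it == owner_.end() || it->second != graph) return false;
    if (freed_[alloc]) return false;
    freed_[alloc] = true;
    return true;
  }

 private:
  std::map<AllocId, GraphId> owner_;
  std::map<AllocId, bool> freed_;
};
```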
Change-Id: I40459e66d7dd84f3b5298617990313b41458c804
Handle the case where a GraphExec instance is destroyed before async launch completes
GraphExec instance is destroyed after async launch completes
GraphExec instance is destroyed without a launch
Change-Id: I45a7c82295fea916c7559bd8f796df710513aea1
=> hipDeviceSynchronize is not allowed during capture.
=> hipEventSynchronize during capture should return hipErrorCapturedEvent error
=> hipEventQuery during capture should return hipErrorCapturedEvent error
hipStreamSynchronize, hipEventSynchronize, hipStreamWaitEvent, hipStreamQuery:
For a side stream (a stream that is not currently under capture):
=> If the current thread is capturing in relaxed mode, calls are allowed
=> If any stream in the current or a concurrent thread is capturing in global mode, calls are not allowed
=> If any stream in the current thread is capturing in ThreadLocal mode, calls are not allowed
For a stream that is currently under capture:
=> calls are not allowed
=> Any call that is not allowed during capture invalidates the capture sequence
=> It is invalid to call synchronous APIs during capture. Synchronous APIs,
such as hipMemcpy(), enqueue work to the legacy stream and synchronize it before returning.
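
The stream-related rules above can be condensed into one predicate. This is one reading of the list, written as a sketch with hypothetical names (`CaptureState`, `IsStreamCallAllowed`). It does not model the event-specific hipErrorCapturedEvent cases or the invalidation of the capture sequence.

```cpp
#include <cassert>

// Hypothetical snapshot of the capture state seen by the calling thread.
struct CaptureState {
  bool targetStreamCapturing;         // the stream passed to the call is under capture
  bool globalCaptureAnyThread;        // some stream in any thread captures in global mode
  bool threadLocalCaptureThisThread;  // some stream in this thread captures in ThreadLocal mode
};

// Returns true if a call such as hipStreamSynchronize or hipStreamQuery is
// allowed under the rules listed above.
bool IsStreamCallAllowed(const CaptureState& s) {
  if (s.targetStreamCapturing) return false;         // stream under capture
  if (s.globalCaptureAnyThread) return false;        // global mode blocks all threads
  if (s.threadLocalCaptureThisThread) return false;  // ThreadLocal blocks this thread
  return true;  // no capture, or only relaxed-mode capture
}
```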
Change-Id: I201c6e63e1a5d93fd416a3b520264c0fdbe31237
When a graph is instantiated on device 0 and launched on device 1, switch to command creation and enqueue during launch.
Change-Id: Ied34dc99b2a776130d1354ed3830c6ccab9068e4
hipGraphExecKernelNodeSetParams can also update the kernel function.
Hence the size required for kernel parameters may differ from what was allocated during graph instantiation.
So, create a new 128KB kernel pool and allocate kernel args from that pool.
If the pool is full, create another 128KB pool. Release the kernel pools when the graph exec object is destroyed.
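
The pool behavior can be sketched as a chained bump allocator. This is a host-memory model with a hypothetical class name (`KernArgPools`); the runtime allocates device-visible kernel arg memory, and its alignment and bookkeeping may differ. Pools are released when the object goes out of scope, mirroring the release on graph exec destruction.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

class KernArgPools {
 public:
  static constexpr std::size_t kPoolSize = 128 * 1024;  // 128KB per pool

  // Returns `size` bytes at an offset aligned to `align` (a power of two).
  // Opens a new pool when the current one cannot fit the request.
  void* Allocate(std::size_t size, std::size_t align = 16) {
    assert(size <= kPoolSize);
    assert(align != 0 && (align & (align - 1)) == 0);
    std::size_t start = (offset_ + align - 1) & ~(align - 1);
    if (pools_.empty() || start + size > kPoolSize) {
      pools_.emplace_back(new unsigned char[kPoolSize]);
      start = 0;
    }
    offset_ = start + size;
    return pools_.back().get() + start;
  }

  std::size_t poolCount() const { return pools_.size(); }

 private:
  std::vector<std::unique_ptr<unsigned char[]>> pools_;  // freed on destruction
  std::size_t offset_ = 0;  // next free byte in the newest pool
};
```

Two 100KB requests do not fit in one pool, so the second one opens a new pool. Small follow-up requests are then served from that second pool until it fills up.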
Change-Id: I9567946d63400c79cbfd4c5439c654c92557ceae