If a graph has multiple branches, an End command is enqueued on the launch stream to
make sure all the internal parallel streams have finished.
When a node is removed from the graph, the indegree and outdegree of its parent and child nodes were not
updated correctly, so endNode had no dependencies on the parallel commands, resulting in graph sync issues.
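
The degree bookkeeping described above can be sketched as follows. This is a minimal standalone model, not the runtime's code: the `Node` struct and the `addEdge`, `removeNode` and `isLeaf` helpers are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical toy node; not the HIP runtime's internal graph node type.
struct Node {
  std::vector<Node*> parents;   // incoming edges
  std::vector<Node*> children;  // outgoing edges
  std::size_t inDegree() const { return parents.size(); }
  std::size_t outDegree() const { return children.size(); }
};

// Adds the edge from -> to and updates both endpoints.
inline void addEdge(Node& from, Node& to) {
  from.children.push_back(&to);
  to.parents.push_back(&from);
}

// Detaches `n` from every neighbour so their degrees stay consistent.
inline void removeNode(Node& n) {
  for (Node* p : n.parents) {
    auto& c = p->children;
    c.erase(std::remove(c.begin(), c.end(), &n), c.end());
  }
  for (Node* ch : n.children) {
    auto& p = ch->parents;
    p.erase(std::remove(p.begin(), p.end(), &n), p.end());
  }
  n.parents.clear();
  n.children.clear();
}

// A node with no outgoing edges is a leaf. An end-of-graph marker has to
// depend on every leaf, so a stale outdegree would hide a leaf from it.
inline bool isLeaf(const Node& n) { return n.outDegree() == 0; }
```

With a chain a -> b -> c, removing b leaves a with outdegree 0, so a is again seen as a leaf that the end marker must wait on.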
Change-Id: I33cc2f21220e1c017d88099b29b542e05b683f73
A free node should be added to the same graph, and only once.
Cloning a graph that contains mem alloc/mem free nodes is not supported.
Destroying a mem alloc/mem free node is not supported once it has been added to a graph.
Change-Id: I40459e66d7dd84f3b5298617990313b41458c804
Handle the following cases:
- GraphExec instance is destroyed before the async launch completes
- GraphExec instance is destroyed after the async launch completes
- GraphExec instance is destroyed without a launch
Change-Id: I45a7c82295fea916c7559bd8f796df710513aea1
=> hipDeviceSynchronize is not allowed during capture.
=> hipEventSynchronize during capture should return hipErrorCapturedEvent error
=> hipEventQuery during capture should return hipErrorCapturedEvent error
hipStreamSynchronize, hipEventSynchronize, hipStreamWaitEvent, hipStreamQuery:
For a side stream (a stream that is not currently under capture):
=> If the current thread is capturing in relaxed mode, calls are allowed
=> If any stream in the current or a concurrent thread is capturing in global mode, calls are not allowed
=> If any stream in the current thread is capturing in ThreadLocal mode, calls are not allowed
For a stream that is currently under capture:
=> calls are not allowed
=> Any call that is not allowed during capture invalidates the capture sequence
=> It is invalid to call synchronous APIs during capture. Synchronous APIs,
such as hipMemcpy(), enqueue work to the legacy stream and synchronize it before returning.
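
The side-stream and captured-stream rules above can be written as a small decision function. This is only a sketch of the rules as listed in this message; it does not call HIP, and `CaptureState` and `syncCallAllowed` are hypothetical names, not runtime APIs.

```cpp
// Hypothetical summary of the capture state seen by one API call.
struct CaptureState {
  bool targetStreamCapturing;         // the stream passed to the call is itself under capture
  bool globalCaptureAnyThread;        // some stream in this or a concurrent thread captures in global mode
  bool threadLocalCaptureThisThread;  // some stream in this thread captures in ThreadLocal mode
};

// Returns true when a call such as a stream synchronize or query is allowed
// under the rules listed above.
inline bool syncCallAllowed(const CaptureState& s) {
  if (s.targetStreamCapturing) return false;         // stream under capture: not allowed
  if (s.globalCaptureAnyThread) return false;        // global mode blocks all threads
  if (s.threadLocalCaptureThisThread) return false;  // ThreadLocal mode blocks this thread
  return true;  // no capture, or only relaxed-mode capture in this thread
}
```

A relaxed-mode capture sets none of the three flags for a side stream, which is why it falls through to the allowed case.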
Change-Id: I201c6e63e1a5d93fd416a3b520264c0fdbe31237
When a graph is instantiated on device 0 and launched on device 1, switch to command creation and enqueue during launch.
Change-Id: Ied34dc99b2a776130d1354ed3830c6ccab9068e4
During hipGraphExecKernelNodeSetParams, the kernel function can also be updated.
Hence the size required for kernel parameters can differ from what was allocated during graph instantiation.
So, create a new 128KB kernel pool and allocate kernel args from the pool.
If the pool is full, create a new 128KB pool. Release the kernel pools when the graph exec object is destroyed.
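
The pooling scheme above can be sketched as a simple bump allocator. This is an assumption-laden illustration, not the runtime's implementation: the `KernArgPools` class, its 16-byte default alignment, and the use of host memory are all hypothetical choices made for the example.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical bump allocator over fixed-size pools.
class KernArgPools {
 public:
  static constexpr std::size_t kPoolSize = 128 * 1024;  // 128KB per pool

  // Returns `size` bytes at an offset aligned to `align` (a power of two,
  // measured from the pool start), or nullptr if the request cannot fit
  // in a single pool.
  void* allocate(std::size_t size, std::size_t align = 16) {
    if (size > kPoolSize) return nullptr;
    if (!pools_.empty()) {
      std::size_t start = (offset_ + align - 1) & ~(align - 1);
      if (start + size <= kPoolSize) {
        offset_ = start + size;
        return pools_.back().get() + start;
      }
    }
    // No pool yet, or the current pool is full: open a new 128KB pool.
    pools_.push_back(std::make_unique<unsigned char[]>(kPoolSize));
    offset_ = size;
    return pools_.back().get();
  }

  std::size_t poolCount() const { return pools_.size(); }

  // All pools are released when the owning object is destroyed.
 private:
  std::vector<std::unique_ptr<unsigned char[]>> pools_;
  std::size_t offset_ = 0;  // next free byte in the newest pool
};
```

Tying the pools' lifetime to the owning object mirrors the release-on-destroy behavior described above; space left at the tail of a full pool is simply not reused in this sketch.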
Change-Id: I9567946d63400c79cbfd4c5439c654c92557ceae
- hipArray will be an internal struct starting with ROCm 6.0
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: Icf97fe96b87be8532098cd7f9ceaad099f99c9b9
Addresses scenarios such as a parameter mismatch
in a memcpy node, a difference in the node count, and a difference
in the dependencies of the nodes.
Change-Id: I31c6516fb27cc1007809f1b50306fdb0c2356ccc
When a kernel function expects no parameters, no error should be returned
if both the kernelParams and extra arguments are set to null.
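
The null-argument rule above can be sketched as a standalone check. `Status`, `validateKernelArgs` and `expectedParamCount` are hypothetical names that stand in for whatever the runtime knows about the kernel signature; only the null/null case is modeled here.

```cpp
#include <cstddef>

enum class Status { Success, InvalidValue };

// Hypothetical validation of the two ways to pass kernel arguments.
inline Status validateKernelArgs(std::size_t expectedParamCount,
                                 void** kernelParams, void** extra) {
  if (kernelParams == nullptr && extra == nullptr) {
    // No arguments supplied: fine only if the kernel takes none.
    return expectedParamCount == 0 ? Status::Success : Status::InvalidValue;
  }
  // Other combinations are outside the scope of this sketch.
  return Status::Success;
}
```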
Change-Id: I5941bcc400b6fb380e623bdae0233ae3e4f73815
hipMemcpyArrayToArray, hipMemcpyFromArrayAsync, and hipMemcpyToArrayAsync
are deprecated in CUDA and have no declarations in hip_runtime_api.h.
Removed their implementations from hip_memory.cpp.
Change-Id: I9720aec6241515c56c66b7e90a37b2ed53347eb2
This change enables VM support in graphs on Windows. It avoids
caching all allocations, at the cost of map/unmap
overhead during memory create/destroy.
Change-Id: I792be00fba099e5e5d3cd44a963e1dfd6976a86d
GraphMemcpyNodeSetParamsFrom/ToSymbol APIs need to check that the device id of the
original src/dst is the same as the one passed in on set.
Change-Id: If0b610808223dce9115562bb5e9b31c8eaa2df22
- Before removing a node from the graph, all of its edges should be removed and the rest
of the graph updated
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: Ide0afcc964f87f13cf407c971e22497433e3b1ed