- Report kernel names for optimized graph path
- Refactor code so that we store profiling info in Accumulate command
Change-Id: Ib97735a0239aeb9fc3a50a4bb7126dd0bcadc8af
- Do not use an extra barrier to detect graph end. If it's a kernel node, we
can use a completion signal for the last packet. Saves roughly 6us per
graph launch for the Phantom testcase.
Change-Id: I5e0c2479d9964fbeda86ed97533f6718f49a7f91
- Track all captured commands under a new AccumulateCommand
- Add begin() and end() methods to capture commands
- Explicit TS object now passed to certain methods because
profilingBegin() and profilingEnd() now happen separately and thus can
run into threading issues
Change-Id: I171106bdcad72b057836cb2f3fc398db3533119f
- Support graphs with different node types and a single
branch when the DEBUG_CLR_GRAPH_PACKET_CAPTURE flag is enabled
Change-Id: I149a8629769cd0d5849ffefb04f1352668a685b6
Fix wrong logic to get layer index;
Make layered image's layout match the CUDA spec;
Fix wrong comparison of element size.
Remove amd::BufferRect from ihipMemcpyAtoHCommand()
and ihipMemcpyHtoACommand().
Change-Id: Icc6a4233fbce2e9b2dc6feb79e6bfbd761684c7d
Using three separate for loops to iterate over all graph nodes for UpdateStream,
FillCommands and EnqueueCommands causes a performance drop for large graphs.
Change-Id: I077accf3a4680d5d944b73200fd6498a7a48f25c
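The commit above does not spell out how the traversal was changed. As a minimal sketch, assuming the three steps have no cross-node ordering requirement, they could be fused into a single pass so each node is visited once. The `Node` struct and both `Launch*` functions are hypothetical stand-ins for illustration, not the real graph classes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a graph node; not the real runtime type.
struct Node {
  int stream = -1;        // set by the "UpdateStream" step
  bool filled = false;    // set by the "FillCommands" step
  bool enqueued = false;  // set by the "EnqueueCommands" step
};

// Baseline: three full traversals of the node list.
void LaunchThreePasses(std::vector<Node>& nodes, int stream) {
  for (auto& n : nodes) n.stream = stream;   // UpdateStream
  for (auto& n : nodes) n.filled = true;     // FillCommands
  for (auto& n : nodes) n.enqueued = true;   // EnqueueCommands
}

// Fused: one traversal that performs all three steps per node.
// Assumption: a node's steps do not depend on later nodes being updated.
void LaunchSinglePass(std::vector<Node>& nodes, int stream) {
  for (auto& n : nodes) {
    n.stream = stream;
    n.filled = true;
    n.enqueued = true;
  }
}
```

Both functions leave the nodes in the same final state; the fused version touches each node once instead of three times, which matters mainly when the node list no longer fits in cache.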
This change enables VM support in graphs on Windows. That avoids
caching all allocations, at the cost of map/unmap
overhead during memory create/destroy.
Change-Id: I792be00fba099e5e5d3cd44a963e1dfd6976a86d
Avoid syncing blocking streams with the default stream,
since that introduces extra command dependencies and
prevents memory from being destroyed after the last submission
Change-Id: I618e9bd2091c4cf9157125612d8c4759030c5a80
- Intra device memcpy does not need to perform host side synchronization
- Check alloc flags when determining memory type
Change-Id: Ieff28bd8d62756ffe82905354c4a91e9717e6bd4
- params should be valid when used for default flag since we support
unified virtual address space
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: I75d40e437b12ee58e72e423bb4818b484ce35b66
MemPool was designed to use hip::Stream, but graph implementation uses
amd::HostQueue. Hence switch graph to hip::Stream management.
Change-Id: Ia319389de45e4c3c6043d17473279a6f27a13140
Add memory allocation support in graph. Current implementation uses
cache from mempool to hold the allocations which belong to the graph.
Also the resource tracking is disabled at this moment because mempool
operates with hip::Stream objects, but graph has execution with
amd::HostQueue objects.
Change-Id: I54fe3250126d24f5a26ada975f37d429bb4ef17b
For hipGraphKernelNode, remove func_
and reorganize functions to naturally support mGPU;
For hipGraphMemcpyNode, make EnqueueCommands() support syncing across
different queues
Change-Id: I22708923f454adf4456ff99d25559daffed8c20d
Remove the api_callbacks_table_t that was holding the API activities and
user callbacks. Instead, use a single roctracer callback (TracerCallback)
to report both API activities and callbacks.
Remove the hipInitActivityCallback that was setting the ROCtracer
callback and memory pool for asynchronous activities, as it did not
allow distinct pools to be used for each activity. Instead, use
hipRegisterTracerCallback to set the single roctracer callback.
Change-Id: I4c10f04f29a6e4cce8caf15db3016c3f72c86b04
- Add user object APIs for creating, retaining and releasing user objects
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: I0bf2999c77e44269565b27c31c7c1461f8a160a2
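The create/retain/release semantics of the user object commit above can be illustrated with a plain reference count that runs a destroy callback when the last reference is dropped. This is a minimal sketch only: the `UserObject` class and its method names are hypothetical, it does not call the HIP API, and it is not thread-safe (a real implementation would likely need an atomic counter):

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical reference-counted user object; illustrative only.
class UserObject {
 public:
  UserObject(std::function<void()> destroy, unsigned initial_refs)
      : destroy_(std::move(destroy)), refs_(initial_refs) {}

  // Add `count` references.
  void Retain(unsigned count = 1) { refs_ += count; }

  // Drop `count` references. Returns true if this call dropped the last
  // reference and ran the destroy callback.
  bool Release(unsigned count = 1) {
    if (count == 0 || count > refs_) return false;  // reject invalid counts
    refs_ -= count;
    if (refs_ == 0) {
      destroy_();
      return true;
    }
    return false;
  }

  unsigned refs() const { return refs_; }

 private:
  std::function<void()> destroy_;
  unsigned refs_;
};
```

A caller would create the object with an initial count, retain it for every additional owner (for example a graph that keeps the resource alive), and release once per owner; the callback runs exactly once, on the final release.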