Commit graph

98 commits

Author SHA1 Message Date
Jaydeep Patel 8e80429b87 SWDEV-457316 - Release graph exec before stream gets deleted.
Release the graph exec after the wait completes and before deleting the hip::stream object
during stream destruction.

Change-Id: I1d68aa8d844f7d3af330c6d09c44af07f8553551
2024-08-06 00:39:37 -04:00
Jaydeep Patel d954eb64db SWDEV-457316 - Multiple graph execs can exist for a given stream.
Change-Id: I0f1b184eb63e0432119d62f094637d375a3d4e55
2024-08-06 00:31:04 -04:00
German Andryeyev 9db52f9a46 SWDEV-470612 - Add the optimized multistream path
- Added the optimized multistream path in graph execution. It uses a fixed number of async streams in the execution
- Optimize the launch latency, where command
creation and execution are done at the same time
- Optimize the scheduling to use fewer barriers and waiting signals if
the same queue can be detected
- The new path is controlled by the DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable, where 0 will use the original path and any other
value will force the number of asynchronous queues for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single-queue async
execution in graphs (applicable to Navi families only)

Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e
2024-08-02 14:19:44 -04:00
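The scheduling point above can be illustrated with a small C++ sketch: fewer barriers and waiting signals are needed when the same queue can be detected. This is not the HIP implementation. The `Node` type, `countCrossQueueWaits`, and the queue-assignment policy are hypothetical. Under that policy a node inherits its first dependency's queue, and root nodes are spread round-robin. The sketch assumes each queue executes its commands in order, so only dependencies on *other* queues need a wait.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical graph node: indices of the nodes it depends on.
struct Node {
  std::vector<std::size_t> deps;
};

// Nodes must be given in topological order. Each node inherits the queue of
// its first dependency (keeping chains on one queue); root nodes are spread
// round-robin over `numQueues` queues. Assuming in-order queues, a dependency
// on the node's own queue is already satisfied, so only dependencies on other
// queues are counted as waits. `queueOf` receives the chosen queue per node.
std::size_t countCrossQueueWaits(const std::vector<Node>& nodes,
                                 std::size_t numQueues,
                                 std::vector<std::size_t>& queueOf) {
  assert(numQueues > 0);
  queueOf.assign(nodes.size(), 0);
  std::size_t nextQueue = 0;
  std::size_t waits = 0;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (nodes[i].deps.empty()) {
      queueOf[i] = nextQueue++ % numQueues;  // root: next queue round-robin
    } else {
      queueOf[i] = queueOf[nodes[i].deps.front()];  // follow first dependency
    }
    for (std::size_t d : nodes[i].deps) {
      if (queueOf[d] != queueOf[i]) ++waits;  // cross-queue: needs a signal wait
    }
  }
  return waits;
}
```

With this simple policy, siblings that share a parent all land on the parent's queue and run serially. A real scheduler would weigh that loss of concurrency against the cost of cross-queue waits.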
Anusha GodavarthySurya bd3a35bde1 SWDEV-468424 - Add support to capture multiple AQL Packets
=> Added support to capture multiple AQL packets.
=> Added an interface for ROCclr to call back into the HIP runtime to allocate
kernel args from the graph kernel arg pool.
=> Enabled support to capture memset nodes.

Change-Id: I7e1c2ba06927459e024653058af142bd82192c43
2024-08-01 23:55:51 -04:00
Ioannis Assiouras 9b33db9b24 SWDEV-472309 - Ensure static maps are destroyed after __hipUnregisterFatBinary
hipDeviceSynchronize, called from __hipUnregisterFatBinary,
accesses static maps and monitors. This change ensures these objects
are not destroyed before __hipUnregisterFatBinary is called.
Additionally, it disables the teardown process for static builds.

Change-Id: I46b58641d60efcf6637a8e99cdd786ffe9e2c77d
2024-07-30 10:26:59 -04:00
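One common C++ way to keep such objects alive during teardown is to allocate them on first use and never delete them. Then no static destructor runs before late callers such as __hipUnregisterFatBinary. The sketch below shows that general pattern only; the commit does not say whether it uses it. `moduleRegistry` and `registryLock` are hypothetical names, and `std::mutex` stands in for the runtime's monitors.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical registry, constructed on first use and intentionally never
// deleted: it has no static destructor, so it stays valid for code that runs
// late in process teardown (e.g. an unregistration hook).
std::map<std::string, int>& moduleRegistry() {
  static auto* registry = new std::map<std::string, int>();
  return *registry;
}

// Same pattern for the lock guarding the registry.
std::mutex& registryLock() {
  static auto* lock = new std::mutex();
  return *lock;
}
```

The trade-off is a deliberate, bounded leak at exit in exchange for not depending on the unspecified destruction order of statics across translation units.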
Saleel Kudchadker cda4b7db1c SWDEV-475341 - Fix stream resolution for graph launches
This issue was caused by incorrect usage of the getStream call:
if we get the null stream first, typecast it, and call
getStream on it again, we lose the advantage of simply passing "nullptr" to
indicate the NULL stream. As a result, we enter the waitActiveStream call and add
barriers to sync across streams.

Change-Id: I94dc4e3ec927295b9e1ab6dee4b37d7d3e00b0cc
2024-07-25 19:38:23 -04:00
Anusha GodavarthySurya 346da4bb40 SWDEV-468424 - hipgraph capture memset node
Capture AQL packets during GraphInstantiation and enqueue AQL packets during graph launch.

Added support to capture a single graph memset node.
Capture support for memset nodes is currently disabled.
Memset capture will be enabled once capture of multiple packets is supported.

Change-Id: I14dfbc41731025cc3a548a730558915def3fa384
2024-07-19 23:52:50 -04:00
Anusha GodavarthySurya 35079e834e SWDEV-468424 - Refactor kernel arg
To refactor childGraph so that it has its own graphExec,
kernelArgs need to be separated from the graphExec object.
All child nodes that are part of the graph should share the same kernelArg pool.
Otherwise we end up creating multiple device kernel arg memory chunks
for a single graphExec.

Change-Id: I4029a46ebc1fa112d87df64ab1fecbf288fabe5e
2024-07-16 08:38:44 -04:00
Ioannis Assiouras ea50d2c0c2 SWDEV-469825 - Modified the kernel argument readback to use a pointer to volatile
This change modifies the readback mechanism to use a pointer to volatile
instead of a volatile pointer. This ensures that the compiler does not
optimize away the read operation.

Change-Id: I79ff925d615aa8cc4f950e8ff4b7e608fcb179a4
2024-07-09 17:28:47 -04:00
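The difference between the two declarations is easy to miss. `const volatile std::uint32_t*` is a pointer to volatile, so each read through it is a volatile access that the compiler must emit. `std::uint32_t* volatile` makes only the pointer variable itself volatile, so reads of the pointee can still be optimized away. In the sketch below, the `readBack` helper is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads back the last 32-bit word of a kernel-argument
// buffer. Casting to a pointer to volatile makes the load a volatile access,
// so the compiler cannot drop it even if the result looks unused.
// (Declaring `std::uint32_t* volatile p` would make only `p` volatile.)
std::uint32_t readBack(const void* buffer, std::size_t sizeBytes) {
  assert(sizeBytes >= sizeof(std::uint32_t));
  auto* words = static_cast<const volatile std::uint32_t*>(buffer);
  return words[sizeBytes / sizeof(std::uint32_t) - 1];
}
```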
Saleel Kudchadker 17313ec99d SWDEV-465602 - Refactor kernel arg pool allocation for graphs
- Allocate additional argument space to accommodate kernel node
param updates

Change-Id: I2d4ea8bddd716f1191f3cbea807920d0248f8c4e
2024-06-25 18:28:03 -04:00
Anusha GodavarthySurya 57156c524d SWDEV-467102 - Hidden heap init for graph capture
If the graph has kernels that do device-side allocation, the heap is allocated during packet capture,
because the heap pointer has to be added to the AQL packet; the heap is then initialized during
graph launch.

Handle a race with the wait when two kernels with a device heap are enqueued on multiple streams.

Change-Id: I45933b77fcaf7bc8fdf1bc906462e32b5d8d3688
2024-06-17 02:07:25 -04:00
Ioannis Assiouras d44f44a5b1 SWDEV-467069 - Added safety check in activity prof for accumulate command
Adding a safety check prevents an invalid memory access
if timestamps and kernelNames vectors are of different size.

The patch also moves addKernelNames for the accumulate command
into the dispatchAqlPacket function.

Change-Id: Iea0927e1253800403a1ae3f3d72de1e7d96476c3
2024-06-12 21:53:03 +01:00
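Here is a minimal sketch of such a check, assuming the profiler pairs the two vectors by index. Iterating only up to the shorter length means a size mismatch can never index past the end of either vector. `KernelRecord` and `pairKernelTimestamps` are hypothetical names. The commit does not say whether the real code truncates or skips the mismatched set entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical profiling record: kernel name plus start/end timestamps.
struct KernelRecord {
  std::string name;
  std::uint64_t start;
  std::uint64_t end;
};

// Pairs timestamps with kernel names by index. Only the first
// min(timestamps.size(), kernelNames.size()) entries are used, so a size
// mismatch cannot cause an out-of-bounds access.
std::vector<KernelRecord> pairKernelTimestamps(
    const std::vector<std::pair<std::uint64_t, std::uint64_t>>& timestamps,
    const std::vector<std::string>& kernelNames) {
  const std::size_t n = std::min(timestamps.size(), kernelNames.size());
  std::vector<KernelRecord> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    out.push_back({kernelNames[i], timestamps[i].first, timestamps[i].second});
  }
  return out;
}
```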
Ioannis Assiouras 3edf1501cc SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds
Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1
2024-06-12 16:22:27 -04:00
Anusha GodavarthySurya 3a5cbb91b9 SWDEV-461072 - Add reference to function parameter
Change-Id: I9ad5dafc6d697d12fbd1675f19f88f83ad2d7b9c
2024-06-12 01:20:28 -04:00
Ioannis Assiouras 055e05a12a SWDEV-466601 - Fix invalid mem access in kernarg readback path
Change-Id: I4654ae592adc8cf9c687136d45eb1b28d99c7ae1
2024-06-10 15:13:05 +01:00
Anusha GodavarthySurya 243dad92c9 SWDEV-461072 - Extend AQL Optimization for child graph nodes
Change-Id: I6baf906add7240b29ea653020a9a0b56206ee2a7
2024-05-28 06:31:10 +00:00
Saleel Kudchadker 72d23a02c5 SWDEV-301667 - Better log
- Print the kernel name for graph launches; it's hard to correlate packets
otherwise
- Print the correlation_id, if any

Change-Id: Ib8db7a00e4e7c98f570e71029e61d86f5dccc2ed
2024-05-28 06:31:10 +00:00
Saleel Kudchadker 1ba74c3ce3 SWDEV-451594 - Fix HDP reg readback
Change-Id: I478a968330f85c3b60ff39fb40bf3cd91acd610e
2024-05-28 06:31:10 +00:00
Saleel Kudchadker badf2b0880 SWDEV-301667 - Refactor graph code
- Remove the last graph node optimization and always submit a barrier NOP
packet instead. This simplifies the code.

Change-Id: Ied443173ba47a08b6df148ac7e3ead712acda11c
2024-05-28 06:28:17 +00:00
Anusha GodavarthySurya bf4d10ff61 SWDEV-460770 - Handle Graph Exec release
Handle the cases where the GraphExec instance is destroyed:
- before the async launch completes
- after the async launch completes
- without a launch

Change-Id: I45a7c82295fea916c7559bd8f796df710513aea1
2024-05-28 06:28:17 +00:00
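The three cases above amount to shared ownership between the application's handle and each in-flight launch. The sketch uses `std::shared_ptr` as a stand-in for whatever reference counting the runtime actually uses. `GraphExecState` and `Stream` are hypothetical types, not the HIP classes.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the state owned by a graph exec
// (kernel arg pools, captured packets, ...).
struct GraphExecState {};

// Hypothetical stream: every pending launch holds a reference, so the exec
// state outlives the application's handle until the launch completes.
class Stream {
 public:
  void launch(std::shared_ptr<GraphExecState> exec) {
    inFlight_.push_back(std::move(exec));
  }
  // Models all pending launches completing: their references are dropped.
  void synchronize() { inFlight_.clear(); }

 private:
  std::vector<std::shared_ptr<GraphExecState>> inFlight_;
};
```

Suppose the handle is destroyed before the launch completes. The state then stays alive until `synchronize()` drops the last reference. If the handle is destroyed after completion, or without any launch, the state is freed immediately.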
Ioannis Assiouras 6cb7b6ec6b SWDEV-451594 - Change device kernel args to use HDP flush by default
The Readback and Avoid HDP Flush memory ordering workaround is
used as a fallback solution only when the HDP flush register is invalid.

Change-Id: Ic284eba1f95ed22b0270d3abeb904fb902015b1a
2024-05-02 19:35:13 +00:00
Ioannis Assiouras bf74ef4025 SWDEV-451594 - Implement Readback and Avoid HDP Flush workaround for device kernel args
Change-Id: I6d41a089a17f55306e7ff402588a1e831b20a7a7
2024-04-19 09:29:20 -04:00
Ioannis Assiouras 96f5c44851 SWDEV-451166 - Disable kernel args for non-XGMI if HDP flush register is invalid
Change-Id: I227e046e2b9cb25476a50240f5d070adbd558f21
2024-03-15 05:27:52 -04:00
Anusha GodavarthySurya e0e63eb04d SWDEV-447545 - Fix Enable/Disable node with hipGraph
Only kernel, memcpy and memset nodes can be enabled/disabled.
A disabled node becomes an empty node.
To maintain ordering, just enqueue a marker with the node's dependencies.

Change-Id: I710f3e88ab4e76c81f6f86a40a7dc61fd4c7e440
2024-02-28 17:34:03 -05:00
Sourabh Betigeri 3fdd46ae59 SWDEV-425640 - An instantiated graphExec should retain a copy of every reference in the source graph
Change-Id: Idf6b224449ca642af2860b33dc739f51a6248e4c
2024-02-28 12:04:53 -05:00
Anusha GodavarthySurya 2dc6ec68a5 SWDEV-444988 - Fix __amd_rocclr_initHeap sync with DEBUG_CLR_GRAPH_PACKET_CAPTURE
When a kernel does device-side malloc, the initial heap is allocated with __amd_rocclr_initHeap.
During graph launch, the __amd_rocclr_initHeap kernel is enqueued, followed by the actual kernel, so the kernel executes after the initHeap kernel.

But with graph optimizations, during capture initHeap gets enqueued on the device null stream and the actual kernel on the graph launch stream,
so there is no proper synchronization. Switch to command creation and enqueue during launch for kernel nodes with a hidden heap.

Change-Id: Iaf600251faef9a448853f19429023c118aa760b9
2024-02-27 13:11:31 -05:00
Saleel Kudchadker f138e0d113 SWDEV-443760 - Enable device kern args
- Implement a workaround to ensure HDP writes are done by writing and
reading the HDP MMIO register.
- Implement the same workaround for graphs; we no longer need the sentinel
write/readback

Change-Id: I0d3027b46a1f61131ec62e3c8c669ff5184fa6b2
2024-02-20 02:03:14 -05:00
Saleel Kudchadker 81b8598af9 SWDEV-301667 - Cleanup code and better log
Change-Id: Ie2345264e84026156a9f81b421eed3cf4aeeeffc
2024-02-19 05:42:47 -05:00
Anusha GodavarthySurya 7d09e1abed SWDEV-444767 - Fix graph tests for context change between Inst & launch with DEBUG_CLR_GRAPH_PACKET_CAPTURE
When a graph is instantiated on device 0 and launched on device 1, switch to command creation and enqueue during launch.

Change-Id: Ied34dc99b2a776130d1354ed3830c6ccab9068e4
2024-02-14 17:02:36 +00:00
Anusha GodavarthySurya 853abeb75e SWDEV-445013 - During CaptureAQLPackets, correct the sentinel value to copy integer-size bytes
Read and write an int-sized sentinel value to dev_ptr for PCIe-connected devices at the tail end of the kernarg surface.

Change-Id: I993d552ac872b3cd56aef4746c4d1d92c58d38b4
2024-02-13 07:05:57 +00:00
Anusha GodavarthySurya d6bc40e822 SWDEV-445084 - Add DEBUG_CLR_GRAPH_PACKET_CAPTURE support for hipGraphInstantiateWithFlags/Params
Change-Id: I5096b4c8d73d1faf972dfd23ab86a53d888946c4
2024-02-08 04:55:53 -05:00
Anusha GodavarthySurya ca0b50c9ca SWDEV-444558 - SWDEV-444418 - Fix capturing of AQL packets when kernel arg size is 0
This happens when the graph doesn't have kernel nodes.

Change-Id: I6b3b476654d7eedc9ff0cec4b7269168aa115360
2024-02-08 06:12:16 +00:00
Anusha GodavarthySurya ae0368d12d SWDEV-422207 - Enable DEBUG_CLR_GRAPH_PACKET_CAPTURE environment variable
Change-Id: I9bf72b9c1a56980352109bd4d42b54ecb2d1b8f9
2024-02-05 05:08:11 +00:00
Anusha GodavarthySurya e9957151f3 SWDEV-439628 - hipGraphExecKernelNodeSetParams to update graph kernel node params with graph performance optimizations.
The kernel function can also be updated during hipGraphExecKernelNodeSetParams.
Hence the size required for kernel parameters can differ from what was allocated during graph instantiation.
So, create a new 128KB kernel pool and allocate kernel args from the pool.
If the pool is full, create a new 128KB pool. Release the kernel pools when the graph exec object is destroyed.

Change-Id: I9567946d63400c79cbfd4c5439c654c92557ceae
2024-02-05 05:08:11 +00:00
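The pool scheme described above can be sketched as a bump allocator over fixed 128 KB chunks. This is a host-memory illustration only; the real pools hold kernel arguments in device-visible memory. `KernelArgPool`, its method names, and the alignment handling are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical host-side stand-in for a graph kernel-argument pool.
// Args are bump-allocated from fixed 128 KB chunks; when the current chunk
// cannot fit a request, a new chunk is created. All chunks are released
// together when the pool (i.e. the owning graph exec) is destroyed.
class KernelArgPool {
 public:
  static constexpr std::size_t kChunkSize = 128 * 1024;

  // `alignment` must be a power of two; requests larger than one chunk are
  // not supported in this sketch. Offsets are aligned relative to the chunk
  // start (new[] storage is itself suitably aligned for fundamental types).
  void* allocate(std::size_t size, std::size_t alignment) {
    assert(size <= kChunkSize);
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    std::size_t offset = (used_ + alignment - 1) & ~(alignment - 1);
    if (chunks_.empty() || offset + size > kChunkSize) {
      chunks_.push_back(std::make_unique<std::byte[]>(kChunkSize));  // pool full
      offset = 0;
    }
    used_ = offset + size;
    return chunks_.back().get() + offset;
  }

  std::size_t chunkCount() const { return chunks_.size(); }

 private:
  std::vector<std::unique_ptr<std::byte[]>> chunks_;
  std::size_t used_ = 0;  // bytes used in the current (last) chunk
};
```

Bump allocation keeps per-update cost to a few arithmetic operations. The price is that space is reclaimed only when the whole pool goes away.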
Anusha GodavarthySurya 2bb2446d8f SWDEV-422207 - Fix graph catch tests with graph optimizations (DEBUG_CLR_GRAPH_PACKET_CAPTURE enabled)
Change-Id: I16297e0ddde286bf1798c90f2bf846e69819010d
2023-12-14 01:27:08 -05:00
Saleel Kudchadker 058b2702db SWDEV-301667 - Logging refactor
- Remove newlines from logging, as the log function internally inserts a new
line

Change-Id: I25eb2242a1f1e87cf811bcc373d1d485b2e027a8
2023-12-07 12:12:57 -05:00
Saleel Kudchadker b056686607 SWDEV-422207 - Report kernel names for activity profiling
- Report kernel names for optimized graph path
- Refactor code so that we store profiling info in Accumulate command

Change-Id: Ib97735a0239aeb9fc3a50a4bb7126dd0bcadc8af
2023-11-15 14:38:07 -05:00
Saleel Kudchadker c3bd229f4f SWDEV-422207 - Optimize graph end detection
- Do not use an extra barrier to detect the graph end. If it's a kernel node, we
can use a completion signal for the last packet. This saves roughly 6 µs per graph
launch for the Phantom test case.

Change-Id: I5e0c2479d9964fbeda86ed97533f6718f49a7f91
2023-11-10 11:57:02 -05:00
Saleel Kudchadker 9fdee05aee SWDEV-422207 - Workaround HDP register query bug
Change-Id: Ib886a3166b555fbd6b8e5a249f993f47afd00166
2023-11-08 12:12:15 -05:00
Saleel Kudchadker 40f41f4d0b SWDEV-422207 - Track commands for capture
- Track all captured commands under a new AccumulateCommand
- Add begin() and end() methods to capture commands
- An explicit TS object is now passed to certain methods because
profilingBegin() and profilingEnd() now happen separately and thus can
run into threading issues

Change-Id: I171106bdcad72b057836cb2f3fc398db3533119f
2023-11-03 05:09:04 +00:00
sdashmiz 9b567e1799 SWDEV-417075 - add hipDrvAddMemCpyNode
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: Ie631d7b1788f10171a29d463759a3cba3b2b2007

SWDEV-417075 - add hipDrvGraphAddMemcpyNode

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: I6bab3310919643e119cd0004276907e223641cfb
2023-10-31 09:55:42 -04:00
Anusha GodavarthySurya 5fb7536586 SWDEV-422207 - Remove L2 flush when kernelArgs are in device memory
Change-Id: I7b5625cb6d55e83689bff7bbb45be9c517ec4a8d
2023-10-26 19:14:58 +00:00
Anusha GodavarthySurya 38d2c56784 SWDEV-422207 - Handle nonkernel nodes for graph opt
- Support graphs with different types of nodes in a single
branch when the DEBUG_CLR_GRAPH_PACKET_CAPTURE flag is enabled

Change-Id: I149a8629769cd0d5849ffefb04f1352668a685b6
2023-10-24 18:36:06 +00:00
taosang2 5a0085e516 SWDEV-364236 - Fix layered Image issue
Fix wrong logic to get the layer index;
Make the layered image layout match the CUDA spec;
Fix wrong comparison of element size.
Remove amd::BufferRect from ihipMemcpyAtoHCommand()
and ihipMemcpyHtoACommand().
Change-Id: Icc6a4233fbce2e9b2dc6feb79e6bfbd761684c7d
2023-10-19 16:06:20 -04:00
Anusha GodavarthySurya e63c280d4d SWDEV-422207 - Capture AQL packets for graph kernel nodes during graph instantiation and enqueue the AQL packets during launch
Change-Id: I1e5f7f9e2a70bd500d190193cb6ba0867f5a63e7
2023-10-05 00:34:29 -04:00
Anusha GodavarthySurya 530dc6de2a SWDEV-301667 - Optimize performance when graph has single branch
The three for loops that iterate over all graph nodes for UpdateStream, FillCommands and
EnqueueCommands cause a performance drop for large graphs.

Change-Id: I077accf3a4680d5d944b73200fd6498a7a48f25c
2023-09-07 23:35:36 -04:00
Saleel Kudchadker e1e5d071ba SWDEV-301667 - Port optimization to save extra packet to graphs
Change-Id: Ibaf64a4efe070c42620e6e153c1862a4a0b15664
2023-08-23 16:58:21 -04:00
Anusha GodavarthySurya f76a40c26d SWDEV-415772, SWDEV-414682 - Fix childgraph node execution
Change-Id: If9ffc08d98a57b8daa5f131f72ef1bf2317f29e1
2023-08-18 00:45:00 -04:00
Anusha GodavarthySurya fd97dde1e6 SWDEV-407568 - Move graph implementation to hip namespace
Change-Id: I7023f202a7e3eb25b17db6d3e361205594ae81a5
2023-07-26 06:52:45 +00:00
Anusha GodavarthySurya b0e6f99ad7 SWDEV-392732 - Initial commit for graph doorbell optimization (AQL Buffering)
Change-Id: I451725006c54c249dc530c55d2af2a31594bf49b
2023-07-16 07:56:00 -04:00