rocm-systems

Author	SHA1	Message	Date
Anusha GodavarthySurya	28cbf2bc4f	SWDEV-469422 - Refactor childgraph node Remove static functions in graph Change-Id: I4df94915f81f250acaea60398aea32ef0ed658e2	2024-12-12 12:38:24 -05:00
Branislav Brzak	89dfdc4dbe	SWDEV-490860 - Do signal_is_required detection post graph schedule Change-Id: Iaf1067a811aeac3d16c08de954036e219b545e07	2024-12-09 03:57:44 -05:00
Anusha GodavarthySurya	c47f9dda58	SWDEV-469422 - Cleanup graph code remove parallellists and nodewaitlists Change-Id: I00c7b2894333bd13d47b913d3fcdd6e1ffcb741f	2024-11-30 04:40:51 -05:00
Anusha GodavarthySurya	fb7ad8361c	SWDEV-489084 - Update max streams for graph Change-Id: I6d0992b2e80ebf3184911593a4f3574327b2e9c3	2024-11-29 08:16:16 -05:00
Anusha GodavarthySurya	06e6561eb5	SWDEV-489084 - Avoid using queue colliding with the graph launch stream Change-Id: I3ecaf8836c8e0883441275139041c702aba0937e	2024-11-29 08:15:58 -05:00
Jaydeep Patel	d997f78be4	SWDEV-498077 - Check topoOrder_ before accessing it. Change-Id: I10e3c24ca8dc1009b8ac8ac27b3e9a6296f9a7ee	2024-11-19 04:50:47 -05:00
Jaydeep Patel	24c57cb984	SWDEV-496544 - Reset mem alloc node count for AutoFreeOnLaunch. Change-Id: Ib32b04584548a46632606ecd85b58c6ce4a5894d	2024-11-11 11:03:32 +00:00
Rahul Manocha	314d4a2c22	SWDEV-490864 - Optimize Alloc Node detection in graph Change-Id: I6ac32f9abd0b44864071a0a9396463cb13f6941f	2024-11-01 11:45:49 -04:00
Anusha GodavarthySurya	f9f995c6d0	SWDEV-480209 - Handle GraphExec object release => GraphExec instance is destroyed before async launch completes, destroy after all pending graph launches => Remove GraphExec destroy during next sync point(hipStreamSync, hipDeviceSync etc..) Change-Id: I4df682aae5787fd6e5240a7be936ce50361345d0	2024-10-22 12:30:46 -04:00
Jaydeep Patel	e74ac6f580	SWDEV-482692, SWDEV-485802, SWDEV-485489 - Handle refcounts owned by graph for user objects. Change-Id: Ic739ab1ec5d3dc3143e3ae70f9591922bc0e3d9f	2024-10-08 03:44:44 -04:00
kjayapra-amd	12a39fbf22	SWDEV-480772 - Remove name variable from amd::Monitor class. Change-Id: Ie2a4fa44f485786227230f8a892e090e718aa30e	2024-09-19 11:55:01 -04:00
Rahul Manocha	4d1ded9eaf	SWDEV-479575 - Add marker to parent graph dependencies in childgraph node 1) Child Graph nodes need to have parent graph dependencies in waitlist. 2) Marker is placed on base stream with parent graph waitlist Change-Id: Iec65a0171ea387be05b0733abcc708fb630e4be4	2024-09-18 15:12:50 -04:00
Rahul Manocha	dbf00966b9	SWDEV-479575 - Graph clone root size check Change-Id: I34dd43ea36ce1e2623198e6ce1179318b9f7e277	2024-09-04 11:54:15 -04:00
Ioannis Assiouras	464b99373b	SWDEV-470612 - Added fixes in optimized multistream path for graph execution This change adds fixes in optimized multistream path for childGraph uses cases. 1) For childgraph nodes, rely on runNodes() only to process the childgraph and skip calls to createCommand and enqueueCommands. This ensures that the start/end markers are enqueued correctly with respect to the childGraph commands. In addition, the runNodes() for the childgraph should be called after the dependency walkthrough to make sure that the subgraph is executed once. 2) Nodes with no outgoing edges should be marked as a leafs regardless of which stream they are assigned to. This is to ensure that marker dependencies from nodes that run on non-zero stream to subgraph leafs that run on zero stream are still set up correctly. Change-Id: I4a5f4f3b0e0d01e515cdcb045b46c2798f291255	2024-08-21 10:11:24 -04:00
Jaydeep Patel	912de7ab44	SWDEV-474937 - Fix race condition between main and work thread on windows. Change-Id: I4d6b9de41d0e5a39094eb3babe47dffde72e0587	2024-08-07 14:29:14 -04:00
Jaydeep Patel	8e80429b87	SWDEV-457316 - Release graph exec before stream gets deleted. Releasing graph exec after wait completes and before delete hip::stream obj during stream destroy. Change-Id: I1d68aa8d844f7d3af330c6d09c44af07f8553551	2024-08-06 00:39:37 -04:00
Jaydeep Patel	d954eb64db	SWDEV-457316 - Multiple graph exec can be for given stream. Change-Id: I0f1b184eb63e0432119d62f094637d375a3d4e55	2024-08-06 00:31:04 -04:00
German Andryeyev	9db52f9a46	SWDEV-470612 - Add the optimized multistream path - Added the optimized multi stream path in graph execution. It uses a fixed number of async streams in the execution - Optimize the launch latency, where commands creation and execution is done at the same time - Optimize the scheduling to use less barriers and waiting signals if the same queue can be detected - The new path is controlled by DEBUG_HIP_FORCE_GRAPH_QUEUES environment variable, where 0 will use the original path and any other value will force the number of asynchronous queues for execution - DEBUG_HIP_FORCE_ASYNC_QUEUE can force single queue async execution in graphs(applicable for Navi families only) Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e	2024-08-02 14:19:44 -04:00
Anusha GodavarthySurya	bd3a35bde1	SWDEV-468424 - Add support to capture multiple AQL Packets => Added support to capture multiple AQL Packets. => Added Interface to callback to hip runtime from rocclr to allocate kernel args from the graph kernel arg pool. => Enabled Support to capture memset node. Change-Id: I7e1c2ba06927459e024653058af142bd82192c43	2024-08-01 23:55:51 -04:00
Ioannis Assiouras	9b33db9b24	SWDEV-472309 - Ensure static maps are destroyed after __hipUnregisterFatBinary hipDeviceSynchronize called from __hipUnregisterFatBinary accesses static maps and monitors. This change ensures these objects are not destroyed before __hipUnregisterFatBinary is called. Additionally it disables the teardown process for static build. Change-Id: I46b58641d60efcf6637a8e99cdd786ffe9e2c77d	2024-07-30 10:26:59 -04:00
Saleel Kudchadker	cda4b7db1c	SWDEV-475341 - Fix stream resolution for graphs launches This issue was happening because of incorrect usage of getStream call, if we get the null stream first and then typecast it, and call on getStream again, we lose the advantage of simply passing "nullptr" to indicate NULL stream. Thus we enter the waitActiveStream call and add barriers to sync across streams. Change-Id: I94dc4e3ec927295b9e1ab6dee4b37d7d3e00b0cc	2024-07-25 19:38:23 -04:00
Anusha GodavarthySurya	346da4bb40	SWDEV-468424 - hipgraph capture memset node Capture AQL packets during GraphInstantiation and enqueue AQL packets during graph launch. Added support to capture single graph memset node. Capture support for memset node is currently disabled. Memset capture will be enabled when capture for multiple packets are supported.. Change-Id: I14dfbc41731025cc3a548a730558915def3fa384	2024-07-19 23:52:50 -04:00
Anusha GodavarthySurya	35079e834e	SWDEV-468424 - Refactor kernel arg For refactoring of childGraph to have its own graphExec, kernelArgs needs to be separated from the graphExec object. All the childNodes part of graph should share same kernelArg pool. Otherwise we end up creating multiple device kernel arg memory chunks for single graphExec. Change-Id: I4029a46ebc1fa112d87df64ab1fecbf288fabe5e	2024-07-16 08:38:44 -04:00
Ioannis Assiouras	ea50d2c0c2	SWDEV-469825 - Modified the kernel argument readback to use a pointer to volatile This change modifies the readback mechanism to use a pointer to volatile instead of a volatile pointer. This ensures that the compiler does not optimize away the read operation. Change-Id: I79ff925d615aa8cc4f950e8ff4b7e608fcb179a4	2024-07-09 17:28:47 -04:00
Saleel Kudchadker	17313ec99d	SWDEV-465602 - Refactor kernel arg pool allocation for graphs - Allocate additional argument space to accommodate for kernel node param updates Change-Id: I2d4ea8bddd716f1191f3cbea807920d0248f8c4e	2024-06-25 18:28:03 -04:00
Anusha GodavarthySurya	57156c524d	SWDEV-467102 - Hidden heap init for graph capture If the graph has kernels that does device side allocation, during packet capture, heap is allocated because heap pointer has to be added to the AQL packet, and initialized during graph launch. Handle race with wait when 2 kernels with device heap are enqueued on multiple streams. Change-Id: I45933b77fcaf7bc8fdf1bc906462e32b5d8d3688	2024-06-17 02:07:25 -04:00
Ioannis Assiouras	d44f44a5b1	SWDEV-467069 - Added safety check in activity prof for accumulate command Adding a safety check prevents an invalid memory access if timestamps and kernelNames vectors are of different size. The patch also moves the addKernelNames for the accumulate command into dispatchAqlPacket function. Change-Id: Iea0927e1253800403a1ae3f3d72de1e7d96476c3	2024-06-12 21:53:03 +01:00
Ioannis Assiouras	3edf1501cc	SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1	2024-06-12 16:22:27 -04:00
Anusha GodavarthySurya	3a5cbb91b9	SWDEV-461072 - Add reference to function parameter Change-Id: I9ad5dafc6d697d12fbd1675f19f88f83ad2d7b9c	2024-06-12 01:20:28 -04:00
Ioannis Assiouras	055e05a12a	SWDEV-466601 - Fix invalid mem access in kernarg readback path Change-Id: I4654ae592adc8cf9c687136d45eb1b28d99c7ae1	2024-06-10 15:13:05 +01:00
Anusha GodavarthySurya	243dad92c9	SWDEV-461072 - Extend AQL Optimization for child graph nodes Change-Id: I6baf906add7240b29ea653020a9a0b56206ee2a7	2024-05-28 06:31:10 +00:00
Saleel Kudchadker	72d23a02c5	SWDEV-301667 - Better log - Print kernelname for graph launches, its hard to correlate packets otherwise - Print correlation_id if any Change-Id: Ib8db7a00e4e7c98f570e71029e61d86f5dccc2ed	2024-05-28 06:31:10 +00:00
Saleel Kudchadker	1ba74c3ce3	SWDEV-451594 - Fix HDP reg readback Change-Id: I478a968330f85c3b60ff39fb40bf3cd91acd610e	2024-05-28 06:31:10 +00:00
Saleel Kudchadker	badf2b0880	SWDEV-301667 - Refactor graph code - Remove Last graph node optimization and instead submit a barrier NOP packet always. This simplifies the code. Change-Id: Ied443173ba47a08b6df148ac7e3ead712acda11c	2024-05-28 06:28:17 +00:00
Anusha GodavarthySurya	bf4d10ff61	SWDEV-460770 - Handle Graph Exec release Handle GraphExec instance is destroyed before async launch completes GraphExec instance is destroyed after async launch completes GraphExec instance is destroyed without a launch Change-Id: I45a7c82295fea916c7559bd8f796df710513aea1	2024-05-28 06:28:17 +00:00
Ioannis Assiouras	6cb7b6ec6b	SWDEV-451594 - Change device kernel args to use HDP flush by default The Readback and Avoid HDP Flush memory ordering workaround is used as a fallback solution only when HDP flush register is invalid Change-Id: Ic284eba1f95ed22b0270d3abeb904fb902015b1a	2024-05-02 19:35:13 +00:00
Ioannis Assiouras	bf74ef4025	SWDEV-451594 - Implement Readback and Avoid HDP Flush workaround for device kernel args Change-Id: I6d41a089a17f55306e7ff402588a1e831b20a7a7	2024-04-19 09:29:20 -04:00
Ioannis Assiouras	96f5c44851	SWDEV-451166 - Disable kernel args for non-XGMI if HDP flush register is invalid Change-Id: I227e046e2b9cb25476a50240f5d070adbd558f21	2024-03-15 05:27:52 -04:00
Anusha GodavarthySurya	e0e63eb04d	SWDEV-447545 - Fix Enable/Disable node with hipGraph Node can be enabled/disabled only for kernel, memcpy and memset nodes. If the node is disabled it becomes empty node. To maintain ordering just enqueue marker with respective node dependencies. Change-Id: I710f3e88ab4e76c81f6f86a40a7dc61fd4c7e440	2024-02-28 17:34:03 -05:00
Sourabh Betigeri	3fdd46ae59	SWDEV-425640 - An instantiated graphExec should retain a copy of every reference in the source graph Change-Id: Idf6b224449ca642af2860b33dc739f51a6248e4c	2024-02-28 12:04:53 -05:00
Anusha GodavarthySurya	2dc6ec68a5	SWDEV-444988 - Fix __amd_rocclr_initHeap sync with DEBUG_CLR_GRAPH_PACKET_CAPTURE When kernel does device side malloc, initial heap is allocated with __amd_rocclr_initHeap. During graph launch kernel __amd_rocclr_initHeap is enqueued followed by actual kernel . So kernel will execute after initHeap kernel. But with graph optimizations during capture initHeap gets enqueued on device null stream and actual kernel on graph launch stream. So no proper synchronization. Switch to command creation and enqueue during launch for kernel node with hidden heap. Change-Id: Iaf600251faef9a448853f19429023c118aa760b9	2024-02-27 13:11:31 -05:00
Saleel Kudchadker	f138e0d113	SWDEV-443760 - Enable device kern args - Implement workaround to ensure HDP writes are done by writing and reading the HDP MMIO register. - Implement the same workaround for graphs, we no longer need sentinel write/readback Change-Id: I0d3027b46a1f61131ec62e3c8c669ff5184fa6b2	2024-02-20 02:03:14 -05:00
Saleel Kudchadker	81b8598af9	SWDEV-301667 - Cleanup code and better log Change-Id: Ie2345264e84026156a9f81b421eed3cf4aeeeffc	2024-02-19 05:42:47 -05:00
Anusha GodavarthySurya	7d09e1abed	SWDEV-444767 - Fix graph tests for context change between Inst & launch with DEBUG_CLR_GRAPH_PACKET_CAPTURE When graph is Instantiate on device 0 graph and launch on device1 switch to command creation and enqueue during launch. Change-Id: Ied34dc99b2a776130d1354ed3830c6ccab9068e4	2024-02-14 17:02:36 +00:00
Anusha GodavarthySurya	853abeb75e	SWDEV-445013 - During CaptureAQLPackets correct sentinel value to copy integer size bytes Read and write int bytes sentinel value to dev_ptr or PCIE connected devices at the tail end of the kernarg surface. Change-Id: I993d552ac872b3cd56aef4746c4d1d92c58d38b4	2024-02-13 07:05:57 +00:00
Anusha GodavarthySurya	d6bc40e822	SWDEV-445084 - Add DEBUG_CLR_GRAPH_PACKET_CAPTURE support for hipGraphInstantiateWithFlags/Params Change-Id: I5096b4c8d73d1faf972dfd23ab86a53d888946c4	2024-02-08 04:55:53 -05:00
Anusha GodavarthySurya	ca0b50c9ca	SWDEV-444558 - SWDEV-444418 - Fix capturing of AQL packets when kernel arg size is 0 When graph doesn't have kernel nodes. Change-Id: I6b3b476654d7eedc9ff0cec4b7269168aa115360	2024-02-08 06:12:16 +00:00
Anusha GodavarthySurya	ae0368d12d	SWDEV-422207 - Enable DEBUG_CLR_GRAPH_PACKET_CAPTURE environment variable Change-Id: I9bf72b9c1a56980352109bd4d42b54ecb2d1b8f9	2024-02-05 05:08:11 +00:00
Anusha GodavarthySurya	e9957151f3	SWDEV-439628 - hipGraphExecKernelNodeSetParams to update graph kernel node params with graph performance optimizations. During hipGraphExecKernelNodeSetParams kernel function can also be updated. Hence size required for kernel parameters differs from what is allocated during graphInstantiation. So, create new 128KB kernel pool and allocate kernel args from the pool. If the pool is full create new 128KB pool. Release kernel pools when graph exec object is destroyed. Change-Id: I9567946d63400c79cbfd4c5439c654c92557ceae	2024-02-05 05:08:11 +00:00
Anusha GodavarthySurya	2bb2446d8f	SWDEV-422207 - Fix graph catch tests with graph optimizations(DEBUG_CLR_GRAPH_PACKET_CAPTURE enabled) Change-Id: I16297e0ddde286bf1798c90f2bf846e69819010d	2023-12-14 01:27:08 -05:00

113 commits