rocm-systems

Author	SHA1	Message	Date
SaleelK	c105dcd05b	clr: Use graph segment scheduling to process HIP Graphs (#1372 ) * clr: Use graph segment scheduling to process HIP Graphs * Add a broader path to use capture packet capture for all topologies * Refactor code * Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING to toggle new vs classic path, enabled by default * clr: Few fixes and improvements * clr: Detect complex graphs to take classic path * Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING=2 to force segment scheduling path * clr: Fix a corner-case stack corruption * clr: Track commands of segments instead of snapshots * clr: Fix Batch dispatch logic * Track fence_dirty_ flag for command of other streams * Dependency resolution markers can now accommodate dirty fence on cross streams --------- Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com> Co-authored-by: Godavarthy Surya, Anusha <agodavar@amd.com>	2025-12-01 12:49:26 -08:00
sluzynsk-amd	2cf9faa93f	SWDEV-563777 - fix warnings related to inconsistent overrides (#1625 ) This patch adds missing override keywords. Fixes this class of warnings. Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>	2025-11-24 18:50:07 +01:00
Ioannis Assiouras	36029ea1a8	SWDEV-559166 - Fix race condition in getDemangledName (#1868 )	2025-11-23 08:45:45 +00:00
Ioannis Assiouras	6d6b136374	SWDEV-559166 - Fix data races in GetSubmissionBatch, CaptureAndSet and SetQueueStatus (#1441 )	2025-10-23 12:18:31 +01:00
Godavarthy Surya, Anusha	fb72d7f851	SWDEV-524746 - Part-II Add multi device support for hip graph. Updated kernel arg manager for each device (#813 ) - Updated kernel arg manager to support allocating kernel args on multiple devices for single graph. - Updated AQL path to capture on the device where graph node is added. Co-authored-by: Anusha GodavarthySurya <Anusha.GodavarthySurya@amd.com>	2025-09-25 20:38:18 +05:30
SaleelK	34b9184686	clr: Fix memory corruption for memset nodes (#1068 ) * Detect graph capture and use graph kernelarg memory for FillBuffer pattern	2025-09-23 17:17:33 -07:00
SaleelK	149dc17c90	clr: Optimize doorbell ring (#1030 ) Lay foundation to batch packets efficiently for graphs Dynamically copy packets with max threshold set with DEBUG_HIP_GRAPH_BATCH_SIZE, if not stagger packet copy with pow2 Default threshold for DEBUG_HIP_GRAPH_BATCH_SIZE is 256 If TS are not collected for a signal for reuse, create a new signal. This can potentially increase signal footprint if the handler doesn't run fast enough.	2025-09-18 15:02:10 -07:00
German Andryeyev	7a1a6682e2	SWDEV-552846 - Unpin memory for hip before exit the copy (#851 )	2025-09-04 20:04:01 +05:30
Danylo Lytovchenko	f7338717ae	SWDEV-470698 - fix formatting, add format check workflow (#657 )	2025-08-20 19:58:06 +05:30
Betigeri, Sourabh	35e48d1eaf	SWDEV-546293 - hipMemPrefetchAsync_v2 and hipMemAdvise_v2 implementation (#869 ) SWDEV-546293 - hipMemPrefetchAsync hipMemAdvise_v2 Please enter the commit message for your changes. Lines starting [ROCm/clr commit: `cbee74a80e`]	2025-08-15 22:40:04 -07:00
Andryeyev, German	6df9a49437	SWDEV-465041 - Add support for user events with DD (#321 ) * SWDEV-465041 - Add support for user events with DD User events can be replaced with HSA signals. Add the interface to allocate HSA signal for user events and update the status on CL_COMPLETE. Force pinned path with DD to avoid blocking calls. Pinned memory can be released only when the command is complete. Simplify device enqueue path to use generic kernel arg buffer and signals * Fix notifyCmdQueue() logic for OCL * Avoid blocking calls in OCL with DD * Add event destruction in case of failure. [ROCm/clr commit: `2305f8ae56`]	2025-08-12 19:04:36 -04:00
Betigeri, Sourabh	40999496c1	SWDEV-545273 - Respect HIP_LAUNCH_PARAM_BUFFER_SIZE (#770 ) [ROCm/clr commit: `2a02d2c2f3`]	2025-08-03 17:32:52 -07:00
Kudchadker, Saleel	cd14def193	SWDEV-521647 - Fix tracking of hw_event (#206 ) - When a command may possibly have two packets(like device heap initializer), and if there is no signal on the main kernel packet the tracking was broken as it marked HW event of the command as the first packet signal. - Make sure if no completion signal is attached to the second packet then clear the HW event for the command. [ROCm/clr commit: `072fb0804e`]	2025-04-25 08:46:44 -07:00
Saleel Kudchadker	c8f39ec2b0	SWDEV-502365 - Track last used command - This change tries to save extra synchronization packets we may insert as we didn't track the completion signals for every command. We track the current enqueued command until it exits the enqueue stage. We also record the exit scope to know if we flushed the caches - Handle correct release scopes and store completion signal as HW events - Use a new finishCommand implementation to only wait for the command passed as the argument Change-Id: Ie4350c5dd24f5d48dfa6ccbabd892f0544caadcc [ROCm/clr commit: `e03e4f3b5d`]	2025-03-04 16:05:02 -05:00
Anusha GodavarthySurya	08c92f4793	SWDEV-480209 - Make internal callbacks non-blocking Change-Id: Ic918d08f341abfd9a7c167d09f9c723cdc43157f [ROCm/clr commit: `683a942364`]	2025-01-10 02:16:11 -05:00
Sourabh Betigeri	7261404002	SWDEV-440866 - [hip-roclr] Adds support to batch memory operations APIs Change-Id: I5ac63a6626af8c2b4ac382c52dfe1aaf0b3716b8 [ROCm/clr commit: `03dbcd8ca7`]	2024-12-12 19:29:24 -05:00
Sourabh Betigeri	1712acdd2e	Revert "SWDEV-440866 - [hip-roclr] Adds support to batch memory operations APIs" This reverts commit `ab0ff9163d`. Reason for revert: hipInfo fails on windows. Updating llvm amd-mainline-closed Change-Id: I57e1fa1945188b0bc0a799c4f3d540f2b7713003 [ROCm/clr commit: `2ca644cf22`]	2024-12-02 16:46:12 -05:00
Sourabh Betigeri	ab0ff9163d	SWDEV-440866 - [hip-roclr] Adds support to batch memory operations APIs Change-Id: I449ffca44bbb04d13348d112e896d603c70fd485 [ROCm/clr commit: `bd5d8e9baf`]	2024-11-30 17:54:32 -05:00
German Andryeyev	faea40cbb3	SWDEV-486602 - Optimize HSA callback performance - Don't generate callbacks for HIP events - Don't process profiling info in the callback for HIP events - Wait for CPU status update of the submitted commands every 50 calls. That will allow to drain the commands and destroy HSA signals. Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9 [ROCm/clr commit: `364dfb0ed1`]	2024-10-11 14:50:25 -04:00
Todd tiantuo Li	170e45b879	SWDEV-472357 - support Rect copy with staging buffer for 2D & 3D memcpy in PAL Change-Id: Ie32f3e5a6fa077f6b2db20fc1ab1e2e0da8344cb [ROCm/clr commit: `41dc4545fc`]	2024-10-10 18:00:19 -04:00
Anusha GodavarthySurya	c0ceb1cf12	SWDEV-477324 - Capture Memcpy1D pinned H2D D2H Change-Id: I1f4744f20a9caeed005ec68da44e5fde737e09f7 [ROCm/clr commit: `742b0210d3`]	2024-09-30 01:01:30 -04:00
Vladana Stojiljkovic	887b11894b	SWDEV-482086 - Fix hipGraphInstantiate leak * In a scenario where kernel is launched with hipExtLaunchKernelGGL and stop event is used, hipGraphInstantiate leaks. Since stop event is used, profiling is enabled and Timestamp (ReferencedCountedObject) is created, but it doesn't get released. * The idea behind this solution is that profiling should be disabled when command is captured, hence the timestamp should not be created. Because information about capturing isn't available when kernel command is created, packet capturing state is used to determine whether to create a timestamp or not. Change-Id: Ia23adac4592ded4fb5e236acf99e12e729f63692 [ROCm/clr commit: `da5f1a6146`]	2024-09-29 11:36:53 -04:00
Chong Li	4979c2f206	SWDEV-478929 - Benchmark ReallyQuickPureX Failed Ensure the member function Alloc() and Free() of command_pool_ will not be accessed after command_pool_ be destructed. Signed-off-by: Chong Li <chongli2@amd.com> Change-Id: Ic2d36423302518a030bd61fa399290ebe2ed8194 [ROCm/clr commit: `e6a5c81221`]	2024-09-10 22:08:18 -04:00
German Andryeyev	35c7a87014	SWDEV-470612 - Add the optimized multistream path - Added the optimized multi stream path in graph execution. It uses a fixed number of async streams in the execution - Optimize the launch latency, where commands creation and execution is done at the same time - Optimize the scheduling to use less barriers and waiting signals if the same queue can be detected - The new path is controlled by DEBUG_HIP_FORCE_GRAPH_QUEUES environment variable, where 0 will use the original path and any other value will force the number of asynchronous queues for execution - DEBUG_HIP_FORCE_ASYNC_QUEUE can force single queue async execution in graphs(applicable for Navi families only) Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e [ROCm/clr commit: `9db52f9a46`]	2024-08-02 14:19:44 -04:00
Anusha GodavarthySurya	31927fefd6	SWDEV-468424 - Add support to capture multiple AQL Packets => Added support to capture multiple AQL Packets. => Added Interface to callback to hip runtime from rocclr to allocate kernel args from the graph kernel arg pool. => Enabled Support to capture memset node. Change-Id: I7e1c2ba06927459e024653058af142bd82192c43 [ROCm/clr commit: `bd3a35bde1`]	2024-08-01 23:55:51 -04:00
Anusha GodavarthySurya	7985a72073	SWDEV-468424 - hipgraph capture memset node Capture AQL packets during GraphInstantiation and enqueue AQL packets during graph launch. Added support to capture single graph memset node. Capture support for memset node is currently disabled. Memset capture will be enabled when capture for multiple packets is supported. Change-Id: I14dfbc41731025cc3a548a730558915def3fa384 [ROCm/clr commit: `346da4bb40`]	2024-07-19 23:52:50 -04:00
Tao Sang	b8cf863eaa	SWDEV-458943 - Implement std::mutex based monitor Implement std::mutex based monitor that has much simpler logic than legacy monitor. Create DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to toggle them. If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false (by default), use legacy monitor; If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true, use std::mutex based monitor. If no perf drop of std::mutex based monitor, legacy one will be removed later. Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad [ROCm/clr commit: `73c02041e1`]	2024-07-04 11:50:46 -04:00
kjayapra-amd	41cb6dadf9	SWDEV-460948 - Changes to alloc, set, capture under single function. Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543 [ROCm/clr commit: `892071aeb2`]	2024-06-06 16:57:53 -04:00
Ioannis Assiouras	0e023d1a0a	SWDEV-463865 - symbol renamings to prevent conflicts in static build Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb [ROCm/clr commit: `b8c2ac4de4`]	2024-06-06 04:05:55 -04:00
Saleel Kudchadker	0b3e421451	SWDEV-301667 - Refactor graph code - Remove Last graph node optimization and instead submit a barrier NOP packet always. This simplifies the code. Change-Id: Ied443173ba47a08b6df148ac7e3ead712acda11c [ROCm/clr commit: `badf2b0880`]	2024-05-28 06:28:17 +00:00
German Andryeyev	68344576d3	SWDEV-460242 - Add system memory suballocator Switch commands creation to the new suballocator to avoid frequent expensive OS calls Change-Id: I3597c811820e577c15708bad8b8a41aa53acc400 [ROCm/clr commit: `5b0bfdcbad`]	2024-05-28 06:28:17 +00:00
Saleel Kudchadker	588e870000	SWDEV-301667 - Pass reference to kernel name Change-Id: I21abe109ddfabfe7640bf78a96c81a1317d31952 [ROCm/clr commit: `4a9d24a211`]	2024-05-05 16:38:20 -04:00
Saleel Kudchadker	f3aedfbec0	SWDEV-301667 - Create TS for each node recorded in graph - Create a vector to allow multiple TS to be stored in Command. - This would mean we don't wait for entire batch in Accumulate command to finish when we exhaust signals. - Reduce the number of signals created at init to 64. This min value may still need to be tuned but the KFD allows max of 4094 interrupt signals per device. - Store kernel names whenever they are available and not just when profiling. If we dynamically enable profiling like for Torch, a crash can happen if hipGraphInstantiate wasn't included in Torch profile scope because we previously entered kernel names only when profiler is attached. Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006 [ROCm/clr commit: `c157bfb202`]	2024-03-26 14:47:24 -04:00
Saleel Kudchadker	cc0b04cc60	SWDEV-301667 - Reset profiler correlation_id_ - The correlation_id had random junk values which we were inserting in the dispatch AQL packet even when no profiler was attached but if we had a valid timestamp. - Also make sure we don't even write the reserved2 field in the AQL packet if no profiler attached. Change-Id: Icdb7493198c1bb5e2d786a97e027288660854cd7 [ROCm/clr commit: `9a6ddae7b2`]	2024-02-05 05:08:11 +00:00
Saleel Kudchadker	dfb1087c3e	SWDEV-422207 - Tag captured kernel names for graphs Change-Id: I9540daa4abf9c340541a681037e2dca4eec821ed [ROCm/clr commit: `dfd4635f91`]	2024-01-03 11:50:05 -05:00
Saleel Kudchadker	cb9a715e04	SWDEV-422207 - Report kernel names for activity profiling - Report kernel names for optimized graph path - Refactor code so that we store profiling info in Accumulate command Change-Id: Ib97735a0239aeb9fc3a50a4bb7126dd0bcadc8af [ROCm/clr commit: `b056686607`]	2023-11-15 14:38:07 -05:00
Saleel Kudchadker	be743bcd59	SWDEV-422207 - Optimize graph end detection - Do not use extra barrier to detect graph end. If it's a kernel node we can use a completion signal for the last packet. Saves roughly 6us for Phantom testcase per graph launch. Change-Id: I5e0c2479d9964fbeda86ed97533f6718f49a7f91 [ROCm/clr commit: `c3bd229f4f`]	2023-11-10 11:57:02 -05:00
Saleel Kudchadker	19ea94729c	SWDEV-422207 - Report TS for Accumulate command Change-Id: Iba193a6068c1a2d25c2136643faee2c1e2591a07 [ROCm/clr commit: `f5c6fc4dfa`]	2023-11-07 18:19:40 +00:00
Saleel Kudchadker	5f009b7cb1	SWDEV-422207 - Track commands for capture - Track all captured commands under a new AccumulateCommand - Add begin() and end() methods to capture commands - Explicit TS object now passed to certain methods because profilingBegin() and profilingEnd() now happen separately and thus can run into threading issues Change-Id: I171106bdcad72b057836cb2f3fc398db3533119f [ROCm/clr commit: `40f41f4d0b`]	2023-11-03 05:09:04 +00:00
Anusha GodavarthySurya	0404df28ef	SWDEV-422207 - Capture AQL Packets for graph Kernel nodes during graph Inst. And enqueue AQL packet during launch Change-Id: I1e5f7f9e2a70bd500d190193cb6ba0867f5a63e7 [ROCm/clr commit: `e63c280d4d`]	2023-10-05 00:34:29 -04:00
German	5d9912f48b	SWDEV-407533 - [ABI Break]Remove Wavelimiter Change-Id: I6a2f6fb5a0c3acea93fa0200a69679783e76f5bd [ROCm/clr commit: `7be3a5e33e`]	2023-09-07 09:58:41 -04:00
Tao Sang	3fdd346cf2	SWDEV-417727 - Fix hipSignalExternalSemaphoresAsync() This reverts commit `cab71e6e00`. Implement the right way to make ExternalSemaphores be signalled only after prior works on the stream have been finished. Change-Id: I9d5974e05d5f229170b928db4566c14e40e3cbaa [ROCm/clr commit: `d433df4761`]	2023-08-23 22:31:27 -04:00
taosang2	cab71e6e00	SWDEV-417727 - Fix hipSignalExternalSemaphoresAsync() Let ExternalSemaphores be signalled only after prior works on the stream have been finished. Change-Id: I856917db905f68f55fdf484f5267f7fe8ea3117f [ROCm/clr commit: `44a3935cda`]	2023-08-23 14:58:37 -04:00
Anusha GodavarthySurya	57467ef2c7	SWDEV-392732 - Initial commit for graph doorbell optimization(AQL Buffering) Change-Id: I451725006c54c249dc530c55d2af2a31594bf49b [ROCm/clr commit: `b0e6f99ad7`]	2023-07-16 07:56:00 -04:00
sdashmiz	2216908962	SWDEV-403638 - Fix warnings - disable deprecated function use warning - disable size_t to 'type' warning - disable conversion from 'type1' to 'type2' warning Signed-off-by: sdashmiz <shadi.dashmiz@amd.com> Change-Id: I64161fd37cf56de3d132102103267ae8da40193a [ROCm/clr commit: `38a67df312`]	2023-06-15 12:17:22 -04:00
German	8d97827417	SWDEV-353281 - VM support in mempool for graphs The change enables VM support in graphs on Windows. That allows to avoid caching of all allocations at the cost of map/unmap overhead during memory create/destroy. Change-Id: I792be00fba099e5e5d3cd44a963e1dfd6976a86d [ROCm/clr commit: `04b696abee`]	2023-05-05 15:31:26 -04:00
Saleel Kudchadker	cb09d962ba	SWDEV-384557 - Leverage SDMA engine status query Change-Id: I5f386f2965de24a229ea43b6c4da82099692f91f [ROCm/clr commit: `20ca8b8116`]	2023-04-05 07:50:53 +00:00
German	b5b078e036	SWDEV-377991 - Remove liquidflash support Change-Id: Iba6455e5c0210c3223a06fec332404cd9f489154 [ROCm/clr commit: `53a10c9039`]	2023-01-20 09:57:06 -05:00
Ioannis Assiouras	733c8d1d1c	SWDEV-369581 - Convey copy API metadata to ROCclr Change-Id: I569462d6d268700d419510255e201bf7d80d6714 [ROCm/clr commit: `72b45e2a1f`]	2022-12-09 00:27:15 -05:00
Sourabh Betigeri	7aa958a8f7	SWDEV-305894 - Cooperative groups grid and multi grid sync support for gfx940+ Change-Id: I35d72f1cb50c3a96eee56a612b72d641852b145f [ROCm/clr commit: `5d7f3f9f3c`]	2022-12-05 16:30:30 -05:00


98 Commits