rocm-systems

Author	SHA1	Message	Date
SaleelK	340f3aa887	clr: Implement dynamic stream to HWq logic (#1958) * clr: Implement dynamic stream to HW queue assignment This change implements dynamic stream to hardware queue (HWq) mapping with the following features: * Queue depth heuristics with weights for optimal HWq assignment * Make last used queue sticky for better locality * Use pipe HWq to pipe mapping - gfx9 follows a round-robin queue to pipe mapping based on creation order (single process per device only, as pipe ID is statically assigned by runtime) * More aggressive heuristic usage for better queue distribution * Extend dynamic queues support for all stream priorities Environment variables: * DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - Depth heuristics, 2 - Depth+Pipe heuristics * DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore priority stream creation * clr: Clean up last_used_queue_	2026-01-23 10:40:54 -08:00
Ioannis Assiouras	602ea0be1e	SWDEV-558078 - Fix use-after-free in graph tests due to AsyncEventHandler (#1502)	2025-10-23 22:49:24 +01:00
Ioannis Assiouras	6d6b136374	SWDEV-559166 - Fix data races in GetSubmissionBatch, CaptureAndSet and SetQueueStatus (#1441)	2025-10-23 12:18:31 +01:00
Danylo Lytovchenko	f7338717ae	SWDEV-470698 - fix formatting, add format check workflow (#657)	2025-08-20 19:58:06 +05:30
Manocha, Rahul	b3ccf487da	SWDEV-545952 - API definitions for hipStreamSet/GetAttribute (#831) Co-authored-by: Rahul Manocha <rmanocha@amd.com> [ROCm/clr commit: `0f49c4a97f`]	2025-08-15 12:51:35 -07:00
Kudchadker, Saleel	3a849c6962	SWDEV-538195 - Introduce threshold for handler submission (#723) - When doing device/stream sync, we can submit a handler which may introduce some host side delays. Use DEBUG_CLR_BATCH_CPU_SYNC_SIZE to batch commands for host wait. Default for HIP is 8 commands. - Investigation is underway in ROCr but need to address this for now in HIP runtime. [ROCm/clr commit: `9b045922a8`]	2025-08-06 20:34:42 -07:00
Saleel Kudchadker	c8f39ec2b0	SWDEV-502365 - Track last used command - This change tries to save extra synchronization packets we may insert as we didn't track the completion signals for every command. We track the current enqueued command until it exits the enqueue stage. We also record the exit scope to know if we flushed the caches - Handle correct release scopes and store completion signal as HW events - Use a new finishCommand implementation to only wait for the command passed as the argument Change-Id: Ie4350c5dd24f5d48dfa6ccbabd892f0544caadcc [ROCm/clr commit: `e03e4f3b5d`]	2025-03-04 16:05:02 -05:00
Tao Sang	7803594aea	SWDEV-458943 - Add fast path in wait() wait() is redesigned with two paths: fast path: Use spinlock to wait for notify signal. If the signal hasn't been received for some loops, go to slow path. slow path: Use condition_variable's wait(). Improve monitor wrapper for better performance. Fix some bugs left from name removing patch. Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24 [ROCm/clr commit: `f2ff56af9c`]	2025-01-28 12:19:55 -05:00
Anusha GodavarthySurya	08c92f4793	SWDEV-480209 - Make internal callbacks non-blocking Change-Id: Ic918d08f341abfd9a7c167d09f9c723cdc43157f [ROCm/clr commit: `683a942364`]	2025-01-10 02:16:11 -05:00
Anusha GodavarthySurya	c34f55babb	SWDEV-489084 - Avoid using queue colliding with the graph launch stream Change-Id: I3ecaf8836c8e0883441275139041c702aba0937e [ROCm/clr commit: `06e6561eb5`]	2024-11-29 08:15:58 -05:00
German Andryeyev	0a03665a3f	SWDEV-491375 - Limit the SW batch size Applications may submit commands without waits for GPU. That causes a growth of SW unreleased commands. Make sure runtime flushes SW queue, if it grows over some threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE. Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396 [ROCm/clr commit: `8657a77029`]	2024-10-17 10:53:57 -04:00
Ioannis Assiouras	b5a8d775d6	SWDEV-476929 - Introduce an activeQueues set The new set tracks only the queues that have a command submitted to them. This allows for fast iteration in waitActiveStreams. Change-Id: I2c832eefa01280d9a87a5f57874d36d2e9441de7 [ROCm/clr commit: `bcc545e6b8`]	2024-09-16 15:53:49 -04:00
Saleel Kudchadker	1d4bd084b8	SWDEV-301667 - Cleanup unused paths - Refactor code and cleanup logic for callback saving for event records Change-Id: I5c56aa8e9c968a5bca70fb07ad1796da318e9e89 [ROCm/clr commit: `1338ff37e8`]	2023-11-02 11:43:41 -04:00
German Andryeyev	2d492a201b	SWDEV-423317 - Enable GPU wait for hip sync calls hipStreamSynchronize and hipDeviceSynchronize will no longer wait for CPU commands in DD mode Change-Id: I079c8bbfc34ddc6d3e2d74c92a34665877e512a5 [ROCm/clr commit: `fbea58ba11`]	2023-09-22 13:04:27 -04:00
German	73f02aa6dc	SWDEV-382397 - Move VirtualGPU destruction back to the thread exit OS can terminate unfinished queue thread from default stream at any time. Potentially leaving the queue lock in a bad state and causing a deadlock if runtime destroys VirtualGPU later from the host thread. Change-Id: I247f102ee84e6b4dba947504933395071945c85d [ROCm/clr commit: `28daf98f1f`]	2023-02-17 10:05:49 -05:00
German	f857dcc48d	SWDEV-352197 - Destroy virtual device in thread destructor Windows kills threads on exit without any notification. However, runtime can still destroy VirtualGPU object from the host thread with HostQueue destruction. This change also forces RGP trace transfer on the last capture without any delays. Change-Id: I768e87e99e1d23a021e63c12f36e450817743759 [ROCm/clr commit: `ad33a021cb`]	2023-01-31 10:53:48 -05:00
German Andryeyev	0ecf22bb53	SWDEV-336024 - Clear device heap to 0 This reverts commit `8624574866`. Reason for revert: Fix regressions Change-Id: I7d883e1c3cbd27bb64b581ec800243ad7dfe24fd [ROCm/clr commit: `07c1b9a998`]	2022-05-19 09:10:08 -04:00
German Andryeyev	8624574866	SWDEV-336024 - Clear device heap to 0 The heap must be cleared once per device, but ROCclr doesn't create a queue per device in HIP. Hence, the clear operation will be performed during the first queue creation. Change-Id: I52ceb06d67d11cde6d019c5ab510059f426a9bfb [ROCm/clr commit: `04bfd93569`]	2022-05-11 11:03:56 -04:00
Saleel Kudchadker	29752a2bbc	SWDEV-334150 - Force callback to cycle commands Enqueue a handler callback for hipEventRecords (aka marker_ts_) for every 64 submits. This recycles the memory if we don't end up calling synchronize for the longest time. Change-Id: I3d39fe76d52a5d81387927edd85b5663b563682c [ROCm/clr commit: `fa76f03654`]	2022-04-28 12:30:23 -04:00
haoyuan2	248a738674	SWDEV-290298 - add a flag to indicate the primary context active status Change-Id: Ia31790706d3f855bc1eedf5ef874e471 [ROCm/clr commit: `439af94dd9`]	2021-12-09 23:28:54 -05:00
agunashe	49f0546637	SWDEV-293742 - Update copyright end year VDI repo Change-Id: I69d2fea4a7a43adf96ccea794270e4af991c5261 [ROCm/clr commit: `d96481fb36`]	2021-08-22 23:56:07 -07:00
German Andryeyev	2813579db6	Add batch tracking for direct dispatch Make sure the logic updates the command status when it's done in HW, but not on submission. Add the last command tracking, otherwise queue sync logic in the HIP upper layer may skip synchronization, assuming the queue is empty. Change-Id: I2d046792553e74df090a10f7d7a78914610f6df2 [ROCm/clr commit: `5b31c69a95`]	2020-12-04 10:16:17 -05:00
German Andryeyev	9c462f9a6d	Disable worker thread creation for direct dispatch Change-Id: I28f08ab9352310c9bf843fcb803a48f95ddf4676 [ROCm/clr commit: `e4f51e063b`]	2020-11-30 17:50:12 -05:00
German Andryeyev	8014e4c7bc	Remove obsolete terminate() method Change-Id: I66b4a74f17977f1af320f402402a2f1b602e9911 [ROCm/clr commit: `08b846ae12`]	2020-11-30 11:46:09 -05:00
Laurent Morichetti	d0b6c2b538	Improve queueLock and lastCmdLock Reduce the size of the queueLock and lastCmdLock critical sections to improve lock contention performance. The smaller the critical sections are the better. lastCmdLock is still needed to guarantee that getLastEnqueueCommand_ can retain the command before it is swapped out and released. Change-Id: Id35d4a77c035b2da0de4c15568b153d49e958bb7 [ROCm/clr commit: `080dcfe857`]	2020-09-01 18:09:31 -04:00
Laurent Morichetti	5f5f1a3a84	Fix indentation with clang-format Change-Id: I7aeadef3c613d5efc31a98e666bfb819ae34bdf5 [ROCm/clr commit: `c95c613edc`]	2020-09-01 18:09:19 -04:00
Jason Tang	e1b0edf35c	SWDEV-246687 - Do not use std::vector reference as class member cuMask_ The current implementation creates default reference in the stack and assigns it to class member cuMasks_, so whenever the content of the stack changes, cuMask_ would change. Change-Id: Iefab63c335d504b83c4ae90bd34ae76c6afb8f3c [ROCm/clr commit: `8ef5da00c7`]	2020-08-05 16:57:36 -04:00
Tao Sang	44eb207f8d	Apply constexpr on global constant variables When HIP_ENABLE_DEFERRED_LOADING=0, many global variables will be referenced but they are not initialized in that early time. The patch will use constexpr to initialize global constant variables in compile time. Change-Id: I9d538b7abc6a0ce700ec3332b97fc144db5fc1ef [ROCm/clr commit: `fdef6f722f`]	2020-07-22 22:14:13 -04:00
Christophe Paquot	f14d79c587	Make append and setLastQueuedCommand atomic Two threads can enqueue to the same HostQueue (HostQueue::enqueue) and result in last queued command being the first one reaching queue_.enqueue NOTE: Temporarily make setLastQueuedCommand empty function to pass the build Change-Id: Id09c3a28d184986f52b2ec86a2f6a18c40df1f0b [ROCm/clr commit: `3d15a1e291`]	2020-07-14 18:22:45 -04:00
Aryan Salmanpour	55c58ebfaa	Add support for setting queue priority for ROCm backend Change-Id: I67ed5a6868af79538f7f4522d8d11c043cdf3c1e [ROCm/clr commit: `b5552aa97f`]	2020-06-04 20:16:32 -04:00
German Andryeyev	3d2182f8ba	Revert "Avoid lock for last queued command" This reverts commit `88c3f77bed`. Reason for revert: <INSERT REASONING HERE> Change-Id: Ie10442c9447f010bb90c679b6cffca5b48b8d054 [ROCm/clr commit: `44bc0cb35d`]	2020-06-04 18:08:17 -04:00
German Andryeyev	88c3f77bed	Avoid lock for last queued command Use atomics for last queued command update Change-Id: I759e9d78ea72f23c0d45dbede6250b231e122276 [ROCm/clr commit: `dc4e09a63a`]	2020-05-29 11:06:55 -04:00
Christophe Paquot	992fbe8215	Use a dedicated lock for last queued command set/get Change-Id: If3d2144841c7863cf7afe2ca85aea62e0a3a33c7 [ROCm/clr commit: `0782acabb5`]	2020-05-28 12:49:39 -07:00
Aryan Salmanpour	dee687d2d7	Add support for setting CU mask on ROCclr for ROCm backend Change-Id: I0dbe2eeb33467fc0f24b26929119c10e9b455da7 [ROCm/clr commit: `fed94b8604`]	2020-05-15 14:23:43 -04:00
Payam	17f6a41982	removing AMD emails per palamida scan Change-Id: If7307f5b1f81a43f2725ec5abd3b8989cbddbcc5 [ROCm/clr commit: `1b6f21ad9a`]	2020-03-11 21:26:55 -04:00
Laurent Morichetti	e284923583	Update copyright info Change-Id: Ia4f9ff0f5f873b4223a8cca154188bb0d2f1abba [ROCm/clr commit: `b4c6143a2f`]	2020-02-04 09:26:14 -08:00
Laurent Morichetti	011f3e945b	Merge branch 'origin/pghafari/vdi-prototype' into lmoriche/amd-master Change-Id: Id3b833d405596735becb3346f3b08c6da57033fe [ROCm/clr commit: `20c7173849`]	2020-01-30 20:12:13 -08:00

37 Commits