37 Commits

Author SHA1 Message Date
SaleelK 340f3aa887 clr: Implement dynamic stream to HWq logic (#1958)
* clr: Implement dynamic stream to HW queue assignment

This change implements dynamic stream to hardware queue (HWq) mapping
with the following features:

* Queue depth heuristics with weights for optimal HWq assignment
* Make last used queue sticky for better locality
* Use pipe HWq to pipe mapping - gfx9 follows a round-robin queue to
  pipe mapping based on creation order (single process per device only,
  as pipe ID is statically assigned by runtime)
* More aggressive heuristic usage for better queue distribution
* Extend dynamic queues support for all stream priorities

Environment variables:
* DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - Depth heuristics,
  2 - Depth+Pipe heuristics
* DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore priority stream creation

* clr: Clean up last_used_queue_
2026-01-23 10:40:54 -08:00
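
The depth heuristic with a sticky last used queue, as described in the commit above, might look roughly like the sketch below. This is an illustration only: `HwQueue`, `pickQueue`, `depthWeight` and `stickyBonus` are hypothetical names and values, not taken from the clr sources, and the pipe-aware mode is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one hardware queue; not the runtime's actual type.
struct HwQueue {
  std::size_t depth;  // commands currently in flight on this queue
};

// Returns the index of the queue with the lowest weighted depth. The last used
// queue gets a small discount (stickyBonus), so a stream tends to stay on it
// unless another queue is clearly less loaded. Weights are assumed values.
inline std::size_t pickQueue(const std::vector<HwQueue>& queues, std::size_t lastUsed,
                             std::size_t depthWeight = 2, std::size_t stickyBonus = 1) {
  assert(!queues.empty() && lastUsed < queues.size());
  auto score = [&](std::size_t i) {
    std::size_t s = queues[i].depth * depthWeight;
    return (i == lastUsed && s >= stickyBonus) ? s - stickyBonus : s;
  };
  std::size_t best = lastUsed;  // ties are resolved in favor of the sticky queue
  for (std::size_t i = 0; i < queues.size(); ++i) {
    if (score(i) < score(best)) best = i;
  }
  return best;
}
```

With depths 3, 1 and 2 and queue 0 as the last used one, the sketch moves the stream to queue 1; with equal depths it stays on the last used queue.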
Ioannis Assiouras 602ea0be1e SWDEV-558078 - Fix use-after-free in graph tests due to AsyncEventHandler (#1502) 2025-10-23 22:49:24 +01:00
Ioannis Assiouras 6d6b136374 SWDEV-559166 - Fix data races in GetSubmissionBatch, CaptureAndSet and SetQueueStatus (#1441) 2025-10-23 12:18:31 +01:00
Danylo Lytovchenko f7338717ae SWDEV-470698 - fix formatting, add format check workflow (#657) 2025-08-20 19:58:06 +05:30
Manocha, Rahul b3ccf487da SWDEV-545952 - API definitions for hipStreamSet/GetAttribute (#831)
Co-authored-by: Rahul Manocha <rmanocha@amd.com>

[ROCm/clr commit: 0f49c4a97f]
2025-08-15 12:51:35 -07:00
Kudchadker, Saleel 3a849c6962 SWDEV-538195 - Introduce threshold for handler submission (#723)
- When doing device/stream sync, we can submit a handler which may
  introduce some host side delays. Use DEBUG_CLR_BATCH_CPU_SYNC_SIZE to
  batch commands for host wait. Default for HIP is 8 commands.
- Investigation is underway in ROCr, but we need to address this for now
  in the HIP runtime.

[ROCm/clr commit: 9b045922a8]
2025-08-06 20:34:42 -07:00
Saleel Kudchadker c8f39ec2b0 SWDEV-502365 - Track last used command
- This change tries to save the extra synchronization packets we may insert
  because we didn't track the completion signals for every command. We track
  the current enqueued command until it exits the enqueue stage. We also
  record the exit scope to know if we flushed the caches
- Handle correct release scopes and store completion signal as HW events
- Use a new finishCommand implementation to only wait for the command
  passed as the argument

Change-Id: Ie4350c5dd24f5d48dfa6ccbabd892f0544caadcc


[ROCm/clr commit: e03e4f3b5d]
2025-03-04 16:05:02 -05:00
Tao Sang 7803594aea SWDEV-458943 - Add fast path in wait()
wait() is redesigned with two paths:
fast path: Use a spinlock to wait for the notify signal. If the
 signal hasn't been received after some loops, go to the slow path.
slow path: Use condition_variable's wait().

Improve monitor wrapper for better performance.

Fix some bugs left from name removing patch.

Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24


[ROCm/clr commit: f2ff56af9c]
2025-01-28 12:19:55 -05:00
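
The two-path wait() described in the commit above can be sketched as a bounded spin followed by a blocking wait. This is a minimal sketch under assumptions: the class name `Notifier`, the `spinLimit` parameter and its default are hypothetical and do not reflect the actual monitor implementation in the runtime.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical notifier; illustrates the fast/slow path split only.
class Notifier {
 public:
  void notify() {
    {
      // Set the flag under the mutex so a waiter cannot miss the wakeup.
      std::lock_guard<std::mutex> lock(mutex_);
      signaled_.store(true, std::memory_order_release);
    }
    cv_.notify_all();
  }

  // Returns true if the signal was observed while spinning (fast path),
  // false if the caller had to fall back to the condition variable.
  bool wait(int spinLimit = 1000) {
    assert(spinLimit >= 0);
    for (int i = 0; i < spinLimit; ++i) {  // fast path: bounded spin
      if (signaled_.load(std::memory_order_acquire)) return true;
      std::this_thread::yield();
    }
    std::unique_lock<std::mutex> lock(mutex_);  // slow path: block
    cv_.wait(lock, [this] { return signaled_.load(std::memory_order_acquire); });
    return false;
  }

 private:
  std::atomic<bool> signaled_{false};
  std::mutex mutex_;
  std::condition_variable cv_;
};
```

The spin avoids the cost of blocking when the signal arrives quickly, while the bounded loop count keeps a long wait from burning a CPU core.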
Anusha GodavarthySurya 08c92f4793 SWDEV-480209 - Make internal callbacks non-blocking
Change-Id: Ic918d08f341abfd9a7c167d09f9c723cdc43157f


[ROCm/clr commit: 683a942364]
2025-01-10 02:16:11 -05:00
Anusha GodavarthySurya c34f55babb SWDEV-489084 - Avoid using queue colliding with the graph launch stream
Change-Id: I3ecaf8836c8e0883441275139041c702aba0937e


[ROCm/clr commit: 06e6561eb5]
2024-11-29 08:15:58 -05:00
German Andryeyev 0a03665a3f SWDEV-491375 - Limit the SW batch size
Applications may submit commands without waits
for GPU. That causes a growth of SW unreleased commands.
Make sure runtime flushes SW queue, if it grows over some
threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE.

Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396


[ROCm/clr commit: 8657a77029]
2024-10-17 10:53:57 -04:00
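
The threshold-based flush from the commit above can be illustrated with a small model. The sketch assumes a hypothetical `SoftwareQueue` class whose `maxBatch` argument plays the role of DEBUG_CLR_MAX_BATCH_SIZE; the real runtime flushes by waiting on GPU work, which is reduced here to clearing a list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software queue that bounds the number of unreleased commands.
class SoftwareQueue {
 public:
  explicit SoftwareQueue(std::size_t maxBatch) : maxBatch_(maxBatch) {
    assert(maxBatch > 0);
  }

  void submit(int commandId) {
    pending_.push_back(commandId);
    // Flush once the unreleased batch grows over the threshold.
    if (pending_.size() > maxBatch_) flush();
  }

  // Stand-in for releasing completed commands.
  void flush() {
    if (pending_.empty()) return;
    pending_.clear();
    ++flushCount_;
  }

  std::size_t pending() const { return pending_.size(); }
  std::size_t flushCount() const { return flushCount_; }

 private:
  std::size_t maxBatch_;
  std::vector<int> pending_;
  std::size_t flushCount_ = 0;
};
```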
Ioannis Assiouras b5a8d775d6 SWDEV-476929 - Introduce an activeQueues set
The new set tracks only the queues that have a command
submitted to them. This allows for fast iteration
in waitActiveStreams.

Change-Id: I2c832eefa01280d9a87a5f57874d36d2e9441de7


[ROCm/clr commit: bcc545e6b8]
2024-09-16 15:53:49 -04:00
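
The idea behind the activeQueues set in the commit above is to iterate only over queues that have work, not over every queue on the device. A minimal sketch, assuming hypothetical `Queue` and `Device` types and a single-threaded caller (the real runtime needs locking):

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical queue; pending counts submitted but unfinished commands.
struct Queue {
  int pending = 0;
  void finish() { pending = 0; }
};

class Device {
 public:
  explicit Device(std::size_t queueCount) : queues_(queueCount) {}

  void submit(std::size_t index) {
    assert(index < queues_.size());
    ++queues_[index].pending;
    active_.insert(&queues_[index]);  // track only queues that received work
  }

  // Waits on the active queues only and returns how many were visited.
  std::size_t waitActive() {
    std::size_t visited = active_.size();
    for (Queue* q : active_) q->finish();
    active_.clear();
    return visited;
  }

 private:
  std::vector<Queue> queues_;  // never resized, so pointers stay valid
  std::unordered_set<Queue*> active_;
};
```

With 64 queues and submissions to only two of them, `waitActive()` visits two entries instead of scanning all 64.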
Saleel Kudchadker 1d4bd084b8 SWDEV-301667 - Cleanup unused paths
- Refactor code and cleanup logic for callback saving for event records

Change-Id: I5c56aa8e9c968a5bca70fb07ad1796da318e9e89


[ROCm/clr commit: 1338ff37e8]
2023-11-02 11:43:41 -04:00
German Andryeyev 2d492a201b SWDEV-423317 - Enable GPU wait for hip sync calls
hipStreamSynchronize and hipDeviceSynchronize will no longer wait
for CPU commands in DD mode

Change-Id: I079c8bbfc34ddc6d3e2d74c92a34665877e512a5


[ROCm/clr commit: fbea58ba11]
2023-09-22 13:04:27 -04:00
German 73f02aa6dc SWDEV-382397 - Move VirtualGPU destruction back to the thread exit
The OS can terminate an unfinished queue thread from the default stream at
any time, potentially leaving the queue lock in a bad state and causing a
deadlock if the runtime destroys VirtualGPU later from the host thread.

Change-Id: I247f102ee84e6b4dba947504933395071945c85d


[ROCm/clr commit: 28daf98f1f]
2023-02-17 10:05:49 -05:00
German f857dcc48d SWDEV-352197 - Destroy virtual device in thread destructor
Windows kills threads on exit without any notification. However,
runtime can still destroy VirtualGPU object from the host thread with
HostQueue destruction.
This change also forces RGP trace transfer on the last capture without
any delays.

Change-Id: I768e87e99e1d23a021e63c12f36e450817743759


[ROCm/clr commit: ad33a021cb]
2023-01-31 10:53:48 -05:00
German Andryeyev 0ecf22bb53 SWDEV-336024 - Clear device heap to 0
This reverts commit 8624574866.

Reason for revert: Fix regressions

Change-Id: I7d883e1c3cbd27bb64b581ec800243ad7dfe24fd


[ROCm/clr commit: 07c1b9a998]
2022-05-19 09:10:08 -04:00
German Andryeyev 8624574866 SWDEV-336024 - Clear device heap to 0
The heap must be cleared once per device, but ROCclr doesn't
create a queue per device in HIP. Hence, the clear operation will
be performed during the first queue creation.

Change-Id: I52ceb06d67d11cde6d019c5ab510059f426a9bfb


[ROCm/clr commit: 04bfd93569]
2022-05-11 11:03:56 -04:00
Saleel Kudchadker 29752a2bbc SWDEV-334150 - Force callback to cycle commands
Enqueue a handler callback for hipEventRecords (aka marker_ts_) every
64 submits. This recycles the memory if we don't end up calling
synchronize for a long time.

Change-Id: I3d39fe76d52a5d81387927edd85b5663b563682c


[ROCm/clr commit: fa76f03654]
2022-04-28 12:30:23 -04:00
haoyuan2 248a738674 SWDEV-290298 - add a flag to indicate the primary context active status
Change-Id: Ia31790706d3f855bc1eedf5ef874e471


[ROCm/clr commit: 439af94dd9]
2021-12-09 23:28:54 -05:00
agunashe 49f0546637 SWDEV-293742 - Update copyright end year VDI repo
Change-Id: I69d2fea4a7a43adf96ccea794270e4af991c5261


[ROCm/clr commit: d96481fb36]
2021-08-22 23:56:07 -07:00
German Andryeyev 2813579db6 Add batch tracking for direct dispatch
Make sure the logic updates the command status when it's done in
HW, but not on submission.
Add the last command tracking, otherwise queue sync logic in the HIP
upper layer may skip synchronization, assuming the queue is empty.

Change-Id: I2d046792553e74df090a10f7d7a78914610f6df2


[ROCm/clr commit: 5b31c69a95]
2020-12-04 10:16:17 -05:00
German Andryeyev 9c462f9a6d Disable worker thread creation for direct dispatch
Change-Id: I28f08ab9352310c9bf843fcb803a48f95ddf4676


[ROCm/clr commit: e4f51e063b]
2020-11-30 17:50:12 -05:00
German Andryeyev 8014e4c7bc Remove obsolete terminate() method
Change-Id: I66b4a74f17977f1af320f402402a2f1b602e9911


[ROCm/clr commit: 08b846ae12]
2020-11-30 11:46:09 -05:00
Laurent Morichetti d0b6c2b538 Improve queueLock and lastCmdLock
Reduce the size of the queueLock and lastCmdLock critical sections
to improve lock contention performance. The smaller the critical
sections are the better.

lastCmdLock is still needed to guarantee that getLastEnqueueCommand_
can retain the command before it is swapped out and released.

Change-Id: Id35d4a77c035b2da0de4c15568b153d49e958bb7


[ROCm/clr commit: 080dcfe857]
2020-09-01 18:09:31 -04:00
Laurent Morichetti 5f5f1a3a84 Fix indentation with clang-format
Change-Id: I7aeadef3c613d5efc31a98e666bfb819ae34bdf5


[ROCm/clr commit: c95c613edc]
2020-09-01 18:09:19 -04:00
Jason Tang e1b0edf35c SWDEV-246687 - Do not use std::vector reference as class member cuMask_
The current implementation creates a default reference on the stack and assigns it to class member cuMask_, so whenever the content of the stack changes, cuMask_ would change.

Change-Id: Iefab63c335d504b83c4ae90bd34ae76c6afb8f3c


[ROCm/clr commit: 8ef5da00c7]
2020-08-05 16:57:36 -04:00
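
The fix described in the commit above amounts to owning a copy of the vector instead of holding a reference to an object that may live on the stack. A minimal sketch, assuming a hypothetical `QueueConfig` class and `makeConfig` helper that are not part of the runtime:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

class QueueConfig {
 public:
  // Problematic pattern (not used here): a member declared as
  //   const std::vector<std::uint32_t>& cuMask_;
  // and bound to a stack object dangles once that object goes out of scope.
  explicit QueueConfig(const std::vector<std::uint32_t>& cuMask) : cuMask_(cuMask) {}

  const std::vector<std::uint32_t>& cuMask() const { return cuMask_; }

 private:
  std::vector<std::uint32_t> cuMask_;  // owned copy, independent of the caller's storage
};

// The local vector is destroyed on return; the config keeps its own copy.
inline QueueConfig makeConfig() {
  std::vector<std::uint32_t> local{0xFFu, 0x0Fu};
  return QueueConfig(local);
}
```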
Tao Sang 44eb207f8d Apply constexpr on global constant variables
When HIP_ENABLE_DEFERRED_LOADING=0, many global variables will be
referenced before they have been initialized. The patch uses
constexpr to initialize global constant variables at compile
time.

Change-Id: I9d538b7abc6a0ce700ec3332b97fc144db5fc1ef


[ROCm/clr commit: fdef6f722f]
2020-07-22 22:14:13 -04:00
Christophe Paquot f14d79c587 Make append and setLastQueuedCommand atomic
Two threads can enqueue to the same HostQueue (HostQueue::enqueue)
and result in the last queued command being the first one reaching queue_.enqueue

NOTE: Temporarily make setLastQueuedCommand an empty function to pass the build

Change-Id: Id09c3a28d184986f52b2ec86a2f6a18c40df1f0b


[ROCm/clr commit: 3d15a1e291]
2020-07-14 18:22:45 -04:00
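
The race described in the commit above goes away when the append and the last-command update happen under one lock. A minimal sketch, assuming a hypothetical `HostQueueSketch` class with integer command IDs in place of the runtime's command objects:

```cpp
#include <cassert>
#include <deque>
#include <mutex>

class HostQueueSketch {
 public:
  // Append and last-command update form one critical section, so the
  // recorded last command always matches the tail of the queue.
  void enqueue(int commandId) {
    assert(commandId >= 0);
    std::lock_guard<std::mutex> lock(mutex_);
    queue_.push_back(commandId);
    lastQueued_ = commandId;
  }

  int lastQueued() {
    std::lock_guard<std::mutex> lock(mutex_);
    return lastQueued_;
  }

  // True when the recorded last command is the newest entry in the queue.
  bool lastMatchesTail() {
    std::lock_guard<std::mutex> lock(mutex_);
    return queue_.empty() || queue_.back() == lastQueued_;
  }

 private:
  std::mutex mutex_;
  std::deque<int> queue_;
  int lastQueued_ = -1;  // -1 means nothing has been queued yet
};
```

If the two steps took the lock separately, a second thread could append between them and the recorded last command would point at an older entry.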
Aryan Salmanpour 55c58ebfaa Add support for setting queue priority for ROCm backend
Change-Id: I67ed5a6868af79538f7f4522d8d11c043cdf3c1e


[ROCm/clr commit: b5552aa97f]
2020-06-04 20:16:32 -04:00
German Andryeyev 3d2182f8ba Revert "Avoid lock for last queued command"
This reverts commit 88c3f77bed.

Reason for revert: <INSERT REASONING HERE>

Change-Id: Ie10442c9447f010bb90c679b6cffca5b48b8d054


[ROCm/clr commit: 44bc0cb35d]
2020-06-04 18:08:17 -04:00
German Andryeyev 88c3f77bed Avoid lock for last queued command
Use atomics for last queued command update

Change-Id: I759e9d78ea72f23c0d45dbede6250b231e122276


[ROCm/clr commit: dc4e09a63a]
2020-05-29 11:06:55 -04:00
Christophe Paquot 992fbe8215 Use a dedicated lock for last queued command set/get
Change-Id: If3d2144841c7863cf7afe2ca85aea62e0a3a33c7


[ROCm/clr commit: 0782acabb5]
2020-05-28 12:49:39 -07:00
Aryan Salmanpour dee687d2d7 Add support for setting CU mask on ROCclr for ROCm backend
Change-Id: I0dbe2eeb33467fc0f24b26929119c10e9b455da7


[ROCm/clr commit: fed94b8604]
2020-05-15 14:23:43 -04:00
Payam 17f6a41982 removing AMD emails per palamida scan
Change-Id: If7307f5b1f81a43f2725ec5abd3b8989cbddbcc5


[ROCm/clr commit: 1b6f21ad9a]
2020-03-11 21:26:55 -04:00
Laurent Morichetti e284923583 Update copyright info
Change-Id: Ia4f9ff0f5f873b4223a8cca154188bb0d2f1abba


[ROCm/clr commit: b4c6143a2f]
2020-02-04 09:26:14 -08:00
Laurent Morichetti 011f3e945b Merge branch 'origin/pghafari/vdi-prototype' into lmoriche/amd-master
Change-Id: Id3b833d405596735becb3346f3b08c6da57033fe


[ROCm/clr commit: 20c7173849]
2020-01-30 20:12:13 -08:00