Commit Graph

70 Commits

Author SHA1 Message Date
Andryeyev, German 28967982b2 SWDEV-517481 - Add dynamic queue management (#37)
Enabled by default. DEBUG_HIP_DYNAMIC_QUEUES controls the feature.
2025-03-19 11:22:50 -04:00
Saleel Kudchadker e03e4f3b5d SWDEV-502365 - Track last used command
- This change aims to avoid the extra synchronization packets we may insert
  because we didn't track the completion signal of every command. We track
  the currently enqueued command until it exits the enqueue stage. We also
  record the exit scope to know whether the caches were flushed
- Handle release scopes correctly and store completion signals as HW events
- Use a new finishCommand implementation that waits only for the command
  passed as the argument

Change-Id: Ie4350c5dd24f5d48dfa6ccbabd892f0544caadcc
2025-03-04 16:05:02 -05:00
Aidan Belton-Schure 179801a750 SWDEV-508279 - Improve HIP event profiling
There are two functional changes in this patch:
* Use GPU timing for internal HIP markers.
* Measure CPU time closer to the GPU timer, to reduce the delta between GPU and CPU timestamp measurements.

There are some smaller non-functional updates:
* waifForFence -> waitForFence typo
* Remove unused drmProfiling

Change-Id: I4c5fa600a842ab60e454888779edcac8449a902a
2025-02-13 04:15:40 -05:00
Saleel Kudchadker 2d450e8b06 SWDEV-504494 - Resolve signal dependencies
- Resolve signal dependencies for the barrier value packet if there is more
  than one dependent signal. A barrier value packet accounts for only one dependent signal
- Improve logging

Change-Id: Ia506ad5d80b91d598f92e7b539f41756e9b4b64b
2025-01-29 19:49:02 +00:00
Anusha GodavarthySurya 683a942364 SWDEV-480209 - Make internal callbacks non-blocking
Change-Id: Ic918d08f341abfd9a7c167d09f9c723cdc43157f
2025-01-10 02:16:11 -05:00
German Andryeyev 403f624bf8 SWDEV-486602 - Add tracking of HSA handlers
Add an atomic counter to track the outstanding HSA handlers.
Wait on the CPU for the callbacks if the count exceeds the value
of the DEBUG_HIP_BLOCK_SYNC environment variable.

Change-Id: I95dc8c4bf0258c7e59411b7504220709ed6898c5
2024-10-25 15:20:50 -04:00
German Andryeyev 8657a77029 SWDEV-491375 - Limit the SW batch size
Applications may submit commands without waiting
for the GPU, which causes the number of unreleased SW commands to grow.
Make sure the runtime flushes the SW queue if it grows beyond a
threshold controlled by DEBUG_CLR_MAX_BATCH_SIZE.

Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396
2024-10-17 10:53:57 -04:00
German Andryeyev 364dfb0ed1 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for the CPU status update of the submitted commands
  every 50 calls. This allows the commands to drain and the
  HSA signals to be destroyed.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9
2024-10-11 14:50:25 -04:00
Ioannis Assiouras 07bcc283f9 SWDEV-488851 - Correctly remove the queue from the active set on Windows
Change-Id: I4d21743ecf7a44636121f85566f898e62ff61e97
2024-10-02 12:06:59 +01:00
Ioannis Assiouras bcc545e6b8 SWDEV-476929 - Introduce an activeQueues set
The new set tracks only the queues that have a command
submitted to them. This allows for fast iteration
in waitActiveStreams.

Change-Id: I2c832eefa01280d9a87a5f57874d36d2e9441de7
2024-09-16 15:53:49 -04:00
taosang2 749385155a SWDEV-467540 - Get lastCommand safely
The last command must be retrieved in a protected way when calling
awaitCompletion(), where lastCommand will be released and
possibly destroyed.
This addresses a crash at the scoped lock(notify_lock_) in
Event::notifyCmdQueue() with AMD_DIRECT_DISPATCH = true.

Change-Id: I4297166f912a71112f4a8945d993160ba9afdc34
2024-06-28 21:18:22 -04:00
Ioannis Assiouras 3edf1501cc SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds
Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1
2024-06-12 16:22:27 -04:00
Ioannis Assiouras d6eaf49033 SWDEV-460925 - Do awaitCompletion before releasing the lastEnqueueCommand
Change-Id: I210399dd1bced13c0923fdb1c215e044920c5a4b
2024-05-28 06:31:10 +00:00
Saleel Kudchadker 51e4368723 SWDEV-459778 - Remove CPU wait for profiler
- No CPU wait is needed when a profiler is attached. Doing the wait changes
  the application's profile when roctracer is attached.

Change-Id: I2b9cfc48d697cf5ed54bb6a240d8c12bdb079171
2024-05-28 06:28:17 +00:00
German Andryeyev 0ccdb3e160 SWDEV-440746 - Release last command on terminate
Change-Id: Ib6a9b8fc9a8692eb17b39b854cefd92c6b59733f
2024-04-22 09:57:38 -04:00
Jaydeep Patel eecbc2e436 SWDEV-431879 - Introduce IsHandlerPending back.
It seems that, due to the removal of vdev()->isHandlerPending(),
the marker queued to ensure finish is not enqueued, which causes
a hang while waiting on the event for the kernel enqueue command.

Change-Id: I364abb2dcb4897b11a7eb61b5d85013b69292792
2023-11-23 08:45:19 -05:00
Saleel Kudchadker 1338ff37e8 SWDEV-301667 - Cleanup unused paths
- Refactor code and clean up the callback-saving logic for event records

Change-Id: I5c56aa8e9c968a5bca70fb07ad1796da318e9e89
2023-11-02 11:43:41 -04:00
German Andryeyev fe7b36f3cb SWDEV-424603 - Use OR for CPU wait request
Make sure rocclr doesn't overwrite the client's request
for a wait.

Change-Id: I0addf18ea408b7f4ecaa1e04b2877cc0bbbfcc0d
2023-10-06 16:51:44 -04:00
German Andryeyev 5438b6362e SWDEV-424603 - Force CPU wait if profiling
Some PyTorch tests use a tracer plugin and rely on profiling information
being reported right after hipDeviceSynchronize().

Change-Id: Ib021a1e7b1a30b3c24de72627c471810f7f7878d
2023-10-06 11:33:06 -04:00
German Andryeyev 596b496c16 SWDEV-424249 - Check if HwEvent is available
Allocate marker only if HW event doesn't exist for the last command.

Change-Id: I3e7284202365a9c75313fb5403f0c1908ab51d1e
2023-10-02 11:27:16 -04:00
German Andryeyev fbea58ba11 SWDEV-423317 - Enable GPU wait for hip sync calls
hipStreamSynchronize and hipDeviceSynchronize will no longer wait
for CPU commands in DD (direct dispatch) mode.

Change-Id: I079c8bbfc34ddc6d3e2d74c92a34665877e512a5
2023-09-22 13:04:27 -04:00
Saleel Kudchadker 1ec0ba3537 SWDEV-301667 - Use large signal pool
Use a large signal pool if a profiler is connected or profiling is
force-enabled. This mitigates signal creation overhead when profiling,
because signals are attached to every packet and a deeper batch may
expose the cost of signal allocation.

Change-Id: I8034b8a20b55328b87d593bf044f59672f9653e8
2023-08-24 19:17:05 -04:00
Rakesh Roy 8c1232124e SWDEV-405329 - Fix cuMask issue for WGP mode
- Enable adjacent CUs in pairs for WGP mode
- In HostQueue::terminate(), avoid a segfault if the virtual device hasn't been created

Change-Id: I94402ff333308af5824878086cc238b3993d534d
2023-06-30 01:09:01 -04:00
Saleel Kudchadker 3e603d986a SWDEV-364604 - Add ROCclr support for hipEventDisableSystemFence
Change-Id: I6127b432a8759359359a1890fda85bc401be6a56
2023-02-21 19:07:35 -05:00
German 28daf98f1f SWDEV-382397 - Move VirtualGPU destruction back to the thread exit
The OS can terminate the unfinished queue thread of the default stream at
any time, potentially leaving the queue lock in a bad state and causing a
deadlock if the runtime later destroys the VirtualGPU from the host thread.

Change-Id: I247f102ee84e6b4dba947504933395071945c85d
2023-02-17 10:05:49 -05:00
German ad33a021cb SWDEV-352197 - Destroy virtual device in thread destructor
Windows kills threads on exit without any notification. However, the
runtime can still destroy the VirtualGPU object from the host thread when
the HostQueue is destroyed.
This change also forces the RGP trace transfer on the last capture without
any delay.

Change-Id: I768e87e99e1d23a021e63c12f36e450817743759
2023-01-31 10:53:48 -05:00
Ajay ecea27eb2d SWDEV-372757 - thread check workaround for Windows hang
Change-Id: Ie9f87b88dd0f3078ad1919edc336f297f6b40373
2023-01-13 04:05:35 -05:00
German e223b0f678 SWDEV-352487 - Don't add notifications as the last command
Change-Id: Ifed34485839ef2c9491e8e8f6bb3569932160b1c
2022-10-24 09:39:03 -04:00
Saleel Kudchadker 9b5cbd37a2 SWDEV-352001 - Store last scopes for dispatch
- Store the last fence scopes and use the last value to determine whether another cache flush is needed. This helps cases where the
  hipExtLaunchKernel API is used.
- Remove code for ROC_EVENT_NO_FLUSH

Change-Id: I531cf9c9c60d5e2b3a9e265d0f52f79ed2fa8a8c
2022-09-22 11:34:10 -04:00
Joseph Greathouse 6b956f7627 SWDEV-330307 - Avoid releasing command before last use
The fix for SWDEV-329789 moved down the last use of the
command object pointer in order to prevent a race condition.
However, the previous patch did not move down the release of
that command. By releasing the command early, another thread
could get a command with the same pointer. That second thread
could later submit work to the queue using that new command.
The first thread could then perform a comparison against the
queue's last command using its own now-stale pointer. This
could eventually allow the second thread to skip synchronizing
on the queue. This would result in host synchronizations
completing before their device work was actually complete.

Change-Id: I292b7b369743251ceafe453a4c5cae14a6d01046
2022-08-31 16:07:49 -04:00
Jason Tang d92b3a2d90 SWDEV-333471 - Add GPU_FORCE_QUEUE_PROFILING
Supports both HIP and OpenCL. HIP_FORCE_QUEUE_PROFILING will be replaced with this later on.

Change-Id: I6d3514b1568ff049584ed9fd74bbdb3e4f4bf0c3
2022-08-19 10:51:41 -04:00
German Andryeyev 9e74f1c7f8 SWDEV-329789 - Avoid a race condition with the last command
The runtime can reset the last command only if it hasn't changed
since the query at the beginning of finish().

Change-Id: I629f2d788e9bbaa17ca4e96b1a753f8131e32463
2022-07-07 10:17:07 -04:00
Ajay 236178d0d4 SWDEV-337331 - command queue logs for debugging option
Change-Id: I198aecc5fd12369d87d4acc9910acc9435c1967a
2022-06-22 19:41:38 +00:00
Saleel Kudchadker 5df34a2f7a SWDEV-335780 - Indicate if handler is queued
Maintain the status of the handler callback. For event records we no longer
submit callbacks, to reduce the load on the async handler thread. However,
without a callback we leak command memory because refcounts are not
decremented. Indicate the status of the handler, which we can use to queue
a callback when finish() is called.

Change-Id: I89fd02f3d047a0e8162664ee17581a14795f1928
2022-06-14 20:55:06 -04:00
German Andryeyev 07c1b9a998 SWDEV-336024 - Clear device heap to 0
This reverts commit 04bfd93569.

Reason for revert: Fix regressions

Change-Id: I7d883e1c3cbd27bb64b581ec800243ad7dfe24fd
2022-05-19 09:10:08 -04:00
German Andryeyev 04bfd93569 SWDEV-336024 - Clear device heap to 0
The heap must be cleared once per device, but ROCclr doesn't
create a queue per device in HIP. Hence, the clear operation will
be performed during the first queue creation.

Change-Id: I52ceb06d67d11cde6d019c5ab510059f426a9bfb
2022-05-11 11:03:56 -04:00
Saleel Kudchadker fa76f03654 SWDEV-334150 - Force callback to cycle commands
Enqueue a handler callback for hipEventRecord calls (aka marker_ts_) every
64 submits. This recycles the memory if we don't call synchronize
for a long time.

Change-Id: I3d39fe76d52a5d81387927edd85b5663b563682c
2022-04-28 12:30:23 -04:00
Saleel Kudchadker ddfd919a62 SWDEV-333237 - Release command before queuing a marker
Change-Id: I5343c4b7ade2dc68efa7454a919a6657726c45d3
2022-04-22 12:58:58 -04:00
Saleel Kudchadker 8eeaa998c0 SWDEV-301667 - Add cache state for a device
- Add a global cache state for a device to indicate the scopes of submitted
  AQL packets
- Remove scopes for the TS marker if hipEventReleaseToDevice is passed. Set
  env ROC_EVENT_NO_FLUSH=1 to use a NOP AQL packet for event records.
  By default, caches are flushed with a system-scope release.
- Calling finish() should check whether caches are flushed and, if not,
  queue a marker

Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0
2022-04-12 12:27:31 -04:00
Satyanvesh Dittakavi e20dd61932 SWDEV-306939 - Fix VDI errors/warnings reported by CppCheck
Change-Id: I56d910f8363787f1050d5d7e8064ed553c5827fd
2022-01-12 00:22:16 -05:00
haoyuan2 439af94dd9 SWDEV-290298 - add a flag to indicate the primary context active status
Change-Id: Ia31790706d3f855bc1eedf5ef874e471
2021-12-09 23:28:54 -05:00
anusha GodavarthySurya 682151f39d SWDEV-295251 - Avoid marker if queue is empty for DD to fix MT issue
Change-Id: I80be39ace9d93347f81ef8acd7858d43bc4a3f1e
2021-08-22 23:56:08 -07:00
anusha GodavarthySurya de5168fdef SWDEV-295251 - Remove waitEvent check in append
Change-Id: I994f3e7c67ed29c4ee46229c8bcd1448fc7f59ec
2021-08-22 23:56:08 -07:00
agunashe d96481fb36 SWDEV-293742 - Update copyright end year in VDI repo
Change-Id: I69d2fea4a7a43adf96ccea794270e4af991c5261
2021-08-22 23:56:07 -07:00
German Andryeyev 6ab8dcc682 SWDEV-292018 - Avoid marker if queue is empty
Change-Id: I40a42d67d2c911d2c9a0bf425f36bc795f9539c0
2021-08-22 23:56:07 -07:00
Saleel Kudchadker 8e08880cc3 SWDEV-247372 - Add logging for debug
Change-Id: Id5a27034005a7deba37072d8a4c6f250104a96c8
2021-08-22 23:56:07 -07:00
Christophe Paquot 133287f31f SWDEV-240806 - Release resources in Command::terminate for HIP
We do not want to release resources during setStatus in HIP because of Graphs

Change-Id: Idc7b188ab5f8be6975ea91005dd2bbf177401f8c
2021-08-22 23:56:07 -07:00
German Andryeyev 85c70a7495 SWDEV-284671 - Add HW event wait to improve hipDeviceSynchronize
If an AMD event contains a reference to a HW event, the runtime
can check/wait for the HW event directly. The CPU status update occurs
later, after the HSA signal callback, but that doesn't affect the result.

Change-Id: I591391a953bbdba6a25ac07e2cd98aeb17cd4596
2021-08-22 23:56:07 -07:00
German Andryeyev a81756bba3 SWDEV-285318 - Wait for the queue before destruction
With direct dispatch enabled, make sure the queue has finished before
destruction.

Change-Id: Ib80af3efb97dfb93e2dce60a11db34fb5c45f5cd
2021-05-20 10:28:24 -04:00
Satyanvesh Dittakavi a711a49881 SWDEV-264244 - Hide Notifications from HIP
This fixes hipStreamQuery returning hipErrorNotReady when idle.

Change-Id: I3f77666a00bc6a7162b6c660d79e76c09669d94f
2021-03-16 06:30:55 -04:00