Commit graph

78 commits

Author SHA1 Message Date
Andryeyev, German 6df9a49437 SWDEV-465041 - Add support for user events with DD (#321)
* SWDEV-465041 - Add support for user events with DD

User events can be replaced with HSA signals. Add the interface
to allocate HSA signal for user events and update the status on
CL_COMPLETE.
Force pinned path with DD to avoid blocking calls. Pinned memory
can be released only when the command is complete.
Simplify device enqueue path to use generic kernel arg buffer and
signals

* Fix notifyCmdQueue() logic for OCL

* Avoid blocking calls in OCL with DD

* Add event destruction in case of failure.

[ROCm/clr commit: 2305f8ae56]
2025-08-12 19:04:36 -04:00
Kudchadker, Saleel 3a849c6962 SWDEV-538195 - Introduce threshold for handler submission (#723)
- When doing device/stream sync, we can submit a handler which may
  introduce some host side delays. Use DEBUG_CLR_BATCH_CPU_SYNC_SIZE to
  batch commands for host wait. Default for HIP is 8 commands.
- Investigation is underway in ROCr but need to address this for now in
  HIP runtime.

[ROCm/clr commit: 9b045922a8]
2025-08-06 20:34:42 -07:00
Patel, Jaydeepkumar 821a1d89b0 SWDEV-536226 - Avoid waiting for lastCommand completion if the GPU has already reported an error; otherwise it causes a hang because the command status never becomes CL_COMPLETE. (#478)
[ROCm/clr commit: a60212b9b4]
2025-06-25 20:59:17 +05:30
Jayaprakash, Karthik 4ea2d9a5ee SWDEV-531711 - Report correct error code based on device failure. (#286)
[ROCm/clr commit: f5b8db33f1]
2025-05-17 06:33:13 -04:00
Andryeyev, German 3ea758a2d4 SWDEV-528808 - Release all HW queues even if only one is idle (#240)
PyTorch may not explicitly idle each queue. Thus, some queues can be considered busy
while they are actually idle.


[ROCm/clr commit: 65a0181a7c]
2025-05-05 19:09:01 -04:00
Sang, Tao 68deb3d10a SWDEV-520352 - Remove HostThread and legacy monitor (#230)
* SWDEV-520352 - Remove HostThread and legacy monitor

Remove HostThread, semaphore and legacy monitor.
Make the original logic of thread and command queue stricter.
Add more comments to make the logic clearer.
Some other minor improvements.

Also part of SWDEV-458943.

[ROCm/clr commit: 96cadbc9e9]
2025-04-29 09:55:24 -04:00
Sang, Tao 60a1e6dbc1 SWDEV-523824 - Fix data validation issue of rocFFT (#154)
Fix data validation issue of rocFFT when dynamic queues are on.
ReleaseHwQueue() can be called only when there is no command in HostQueue.
The condition check needs to be protected by a lock.

[ROCm/clr commit: 18d191fd1d]
2025-04-08 20:30:06 -04:00
Arandjelovic, Marko 1c83314659 SWDEV-517867 - Remove invalid assert (#55)
* Remove invalid assert

* Retrigger CI

* Rebase

[ROCm/clr commit: 8fcaa1ca93]
2025-04-03 11:14:32 +02:00
Andryeyev, German 5c7c86f66d SWDEV-517481 - Add dynamic queue management (#37)
Enabled by default. DEBUG_HIP_DYNAMIC_QUEUES controls the feature.

[ROCm/clr commit: 28967982b2]
2025-03-19 11:22:50 -04:00
Saleel Kudchadker c8f39ec2b0 SWDEV-502365 - Track last used command
- This change tries to save the extra synchronization packets we may insert
  because we didn't track the completion signals for every command. We track
  the currently enqueued command until it exits the enqueue stage. We also
  record the exit scope to know whether we flushed the caches
- Handle correct release scopes and store completion signal as HW events
- Use a new finishCommand implementation to only wait for the command
  passed as the argument

Change-Id: Ie4350c5dd24f5d48dfa6ccbabd892f0544caadcc


[ROCm/clr commit: e03e4f3b5d]
2025-03-04 16:05:02 -05:00
Aidan Belton-Schure 4b4a35b86b SWDEV-508279 - Improve HIP event profiling
There are 2 functional changes to this patch:
* Use GPU timing for internal markers for HIP.
* Measure CPU time closer to GPU timer, to reduce delta between GPU/CPU timestamp measurements.

There are some smaller non-functional updates:
* waifForFence -> waitForFence typo
* Remove unused drmProfiling

Change-Id: I4c5fa600a842ab60e454888779edcac8449a902a


[ROCm/clr commit: 179801a750]
2025-02-13 04:15:40 -05:00
Saleel Kudchadker d0656c944b SWDEV-504494 - Resolve signal dependencies
- Resolve signal dependencies for the barrier value packet if there is more
  than 1 dependent signal. A barrier value packet accounts for only 1 dependent signal
- Better log

Change-Id: Ia506ad5d80b91d598f92e7b539f41756e9b4b64b


[ROCm/clr commit: 2d450e8b06]
2025-01-29 19:49:02 +00:00
Anusha GodavarthySurya 08c92f4793 SWDEV-480209 - Make internal callbacks non-blocking
Change-Id: Ic918d08f341abfd9a7c167d09f9c723cdc43157f


[ROCm/clr commit: 683a942364]
2025-01-10 02:16:11 -05:00
German Andryeyev 3191f8e942 SWDEV-486602 - Add tracking of HSA handlers
Add an atomic counter to track the outstanding HSA handlers.
Wait on CPU for the callbacks if the number exceeds the value
in DEBUG_HIP_BLOCK_SYNC env variable.

Change-Id: I95dc8c4bf0258c7e59411b7504220709ed6898c5


[ROCm/clr commit: 403f624bf8]
2024-10-25 15:20:50 -04:00
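
The handler tracking described in this commit can be pictured with a small model. This is only a sketch in Python, not the ROCclr C++ code: the class `HandlerTracker`, its methods, and the `block_threshold` parameter are hypothetical names, and `block_threshold` merely stands in for the value the runtime reads from DEBUG_HIP_BLOCK_SYNC. The real change uses an atomic counter; a condition variable is used here to keep the model self-contained.

```python
import threading


class HandlerTracker:
    """Toy model: counts handlers submitted but not yet called back."""

    def __init__(self, block_threshold: int) -> None:
        # Assumption: block_threshold plays the role of DEBUG_HIP_BLOCK_SYNC.
        self._threshold = block_threshold
        self._outstanding = 0
        self._cond = threading.Condition()

    def on_submit(self) -> None:
        """Register a new handler; wait on the CPU while too many are pending."""
        with self._cond:
            self._outstanding += 1
            while self._outstanding > self._threshold:
                self._cond.wait()

    def on_callback(self) -> None:
        """Called when a handler completes; wakes any blocked submitter."""
        with self._cond:
            self._outstanding -= 1
            self._cond.notify_all()

    @property
    def outstanding(self) -> int:
        with self._cond:
            return self._outstanding
```

In this model a submitter only stalls once the number of pending callbacks exceeds the threshold, which bounds how far the host can run ahead of the handler thread.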
German Andryeyev 0a03665a3f SWDEV-491375 - Limit the SW batch size
Applications may submit commands without waiting
for the GPU. That causes growth of unreleased SW commands.
Make sure the runtime flushes the SW queue if it grows over some
threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE.

Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396


[ROCm/clr commit: 8657a77029]
2024-10-17 10:53:57 -04:00
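
A rough model of the batch-size limit from this commit, written as a Python sketch rather than the actual ROCclr implementation. `SoftwareQueue`, `max_batch_size` and the `flush_fn` callback are hypothetical names; `max_batch_size` stands in for the threshold controlled by DEBUG_CLR_MAX_BATCH_SIZE.

```python
from typing import Any, Callable, List


class SoftwareQueue:
    """Toy model: flushes pending commands once the batch grows too large."""

    def __init__(self, max_batch_size: int,
                 flush_fn: Callable[[List[Any]], None]) -> None:
        # Assumption: max_batch_size mirrors DEBUG_CLR_MAX_BATCH_SIZE.
        self._max_batch_size = max_batch_size
        self._flush_fn = flush_fn
        self._pending: List[Any] = []

    @property
    def pending_count(self) -> int:
        return len(self._pending)

    def submit(self, command: Any) -> None:
        """Queue a command and flush if the batch grew over the threshold."""
        self._pending.append(command)
        if len(self._pending) > self._max_batch_size:
            self.flush()

    def flush(self) -> None:
        """Hand all pending commands to flush_fn and release them."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._flush_fn(batch)
```

Without such a cap, an application that never waits for the GPU would let the list of unreleased commands grow without bound.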
German Andryeyev faea40cbb3 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for the CPU status update of the submitted commands
  every 50 calls. That allows the runtime to drain the commands and
  destroy HSA signals.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9


[ROCm/clr commit: 364dfb0ed1]
2024-10-11 14:50:25 -04:00
Ioannis Assiouras 00cb623a67 SWDEV-488851 - Correctly remove the queue from the active set on windows
Change-Id: I4d21743ecf7a44636121f85566f898e62ff61e97


[ROCm/clr commit: 07bcc283f9]
2024-10-02 12:06:59 +01:00
Ioannis Assiouras b5a8d775d6 SWDEV-476929 - Introduce an activeQueues set
The new set tracks only the queues that have a command
submitted to them. This allows for fast iteration
in waitActiveStreams.

Change-Id: I2c832eefa01280d9a87a5f57874d36d2e9441de7


[ROCm/clr commit: bcc545e6b8]
2024-09-16 15:53:49 -04:00
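
The idea behind the active-queue set can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the ROCclr code: `Queue`, `Device`, `submit` and `wait_active_streams` are hypothetical names loosely modelled on the commit message.

```python
from typing import List, Set


class Queue:
    """Toy queue holding commands that have not completed yet."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pending: List[str] = []

    def wait_idle(self) -> None:
        # Stand-in for waiting on the GPU; here the commands just complete.
        self.pending.clear()


class Device:
    """Toy device that tracks which queues actually received work."""

    def __init__(self) -> None:
        self.queues: List[Queue] = []
        self.active_queues: Set[Queue] = set()

    def create_queue(self, name: str) -> Queue:
        queue = Queue(name)
        self.queues.append(queue)
        return queue

    def submit(self, queue: Queue, command: str) -> None:
        queue.pending.append(command)
        self.active_queues.add(queue)  # only queues with work are tracked

    def wait_active_streams(self) -> List[str]:
        """Wait only on queues with submitted work; return their names."""
        waited = sorted(queue.name for queue in self.active_queues)
        for queue in list(self.active_queues):
            queue.wait_idle()
        self.active_queues.clear()
        return waited
```

With many queues and only a few in use, the wait touches just the active ones instead of walking every queue on the device.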
taosang2 881ffd6650 SWDEV-467540 - Get lastCommand safely
The last command must be fetched under a lock when calling
awaitCompletion(), where lastCommand will be released and
possibly destroyed.
This fixes the scoped lock (notify_lock_) crash in
Event::notifyCmdQueue() with AMD_DIRECT_DISPATCH = true.

Change-Id: I4297166f912a71112f4a8945d993160ba9afdc34


[ROCm/clr commit: 749385155a]
2024-06-28 21:18:22 -04:00
Ioannis Assiouras af089a2171 SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds
Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1


[ROCm/clr commit: 3edf1501cc]
2024-06-12 16:22:27 -04:00
Ioannis Assiouras 60ba0874fa SWDEV-460925 - Do awaitCompletion before releasing the lastEnqueueCommand
Change-Id: I210399dd1bced13c0923fdb1c215e044920c5a4b


[ROCm/clr commit: d6eaf49033]
2024-05-28 06:31:10 +00:00
Saleel Kudchadker 3a67addd48 SWDEV-459778 - Remove CPU wait for profiler
- No CPU wait is needed when a profiler is attached. Waiting changes
  the application profile when roctracer is attached.

Change-Id: I2b9cfc48d697cf5ed54bb6a240d8c12bdb079171


[ROCm/clr commit: 51e4368723]
2024-05-28 06:28:17 +00:00
German Andryeyev a2ffb2ad40 SWDEV-440746 - Release last command on terminate
Change-Id: Ib6a9b8fc9a8692eb17b39b854cefd92c6b59733f


[ROCm/clr commit: 0ccdb3e160]
2024-04-22 09:57:38 -04:00
Jaydeep Patel 7933b88d7c SWDEV-431879 - Introduce IsHandlerPending back.
It seems that, due to the removal of vdev()->isHandlerPending(),
the marker queued to ensure finish is not enqueued, which causes
a hang while waiting on the event for a kernel enqueue command.

Change-Id: I364abb2dcb4897b11a7eb61b5d85013b69292792


[ROCm/clr commit: eecbc2e436]
2023-11-23 08:45:19 -05:00
Saleel Kudchadker 1d4bd084b8 SWDEV-301667 - Cleanup unused paths
- Refactor code and cleanup logic for callback saving for event records

Change-Id: I5c56aa8e9c968a5bca70fb07ad1796da318e9e89


[ROCm/clr commit: 1338ff37e8]
2023-11-02 11:43:41 -04:00
German Andryeyev bd63f3f614 SWDEV-424603 - Use OR for CPU wait request
Make sure rocclr doesn't overwrite the client's request
for a wait.

Change-Id: I0addf18ea408b7f4ecaa1e04b2877cc0bbbfcc0d


[ROCm/clr commit: fe7b36f3cb]
2023-10-06 16:51:44 -04:00
German Andryeyev d593231137 SWDEV-424603 - Force CPU wait if profiling
Some pytorch tests use a tracer plugin and rely on profiling information
to be reported right after hipDeviceSynchronize()

Change-Id: Ib021a1e7b1a30b3c24de72627c471810f7f7878d


[ROCm/clr commit: 5438b6362e]
2023-10-06 11:33:06 -04:00
German Andryeyev ee34d05add SWDEV-424249 - Check if HwEvent is available
Allocate marker only if HW event doesn't exist for the last command.

Change-Id: I3e7284202365a9c75313fb5403f0c1908ab51d1e


[ROCm/clr commit: 596b496c16]
2023-10-02 11:27:16 -04:00
German Andryeyev 2d492a201b SWDEV-423317 - Enable GPU wait for hip sync calls
hipStreamSynchronize and hipDeviceSynchronize will no longer wait
for CPU commands in DD mode

Change-Id: I079c8bbfc34ddc6d3e2d74c92a34665877e512a5


[ROCm/clr commit: fbea58ba11]
2023-09-22 13:04:27 -04:00
Saleel Kudchadker 0a26b75238 SWDEV-301667 - Use large signal pool
Use a large signal pool if a profiler is connected or profiling is
force-enabled. This is needed to mitigate signal creation overhead when
profiling, as signals are attached to every packet and a deeper batch may
show the overhead of signal allocation.

Change-Id: I8034b8a20b55328b87d593bf044f59672f9653e8


[ROCm/clr commit: 1ec0ba3537]
2023-08-24 19:17:05 -04:00
Rakesh Roy f887f2fc6f SWDEV-405329 - Fix cuMask issue for WGP mode
- Enable CUs adjacent pairwise for WGP mode
- In HostQueue::terminate() do not segfault if virtual device hasn't been created

Change-Id: I94402ff333308af5824878086cc238b3993d534d


[ROCm/clr commit: 8c1232124e]
2023-06-30 01:09:01 -04:00
Saleel Kudchadker 858e311f34 SWDEV-364604 - Add ROCclr support for hipEventDisableSystemFence
Change-Id: I6127b432a8759359359a1890fda85bc401be6a56


[ROCm/clr commit: 3e603d986a]
2023-02-21 19:07:35 -05:00
German 73f02aa6dc SWDEV-382397 - Move VirtualGPU destruction back to the thread exit
The OS can terminate an unfinished queue thread from the default stream at
any time, potentially leaving the queue lock in a bad state and causing a
deadlock if the runtime destroys VirtualGPU later from the host thread.

Change-Id: I247f102ee84e6b4dba947504933395071945c85d


[ROCm/clr commit: 28daf98f1f]
2023-02-17 10:05:49 -05:00
German f857dcc48d SWDEV-352197 - Destroy virtual device in thread destructor
Windows kills threads on exit without any notification. However,
runtime can still destroy VirtualGPU object from the host thread with
HostQueue destruction.
This change also forces RGP trace transfer on the last capture without
any delays.

Change-Id: I768e87e99e1d23a021e63c12f36e450817743759


[ROCm/clr commit: ad33a021cb]
2023-01-31 10:53:48 -05:00
Ajay 3d12929eb8 SWDEV-372757 - thread check workaround for windows hang
Change-Id: Ie9f87b88dd0f3078ad1919edc336f297f6b40373


[ROCm/clr commit: ecea27eb2d]
2023-01-13 04:05:35 -05:00
German f5f0a6c618 SWDEV-352487 - Don't add notifications as the last command
Change-Id: Ifed34485839ef2c9491e8e8f6bb3569932160b1c


[ROCm/clr commit: e223b0f678]
2022-10-24 09:39:03 -04:00
Saleel Kudchadker 0dd9add8e1 SWDEV-352001 - Store last scopes for dispatch
- Store last fence scopes and use the last value to determine if we need a cache flush again. This helps cases where hipExtLaunchKernel API is
used.
- Purge code for ROC_EVENT_NO_FLUSH

Change-Id: I531cf9c9c60d5e2b3a9e265d0f52f79ed2fa8a8c


[ROCm/clr commit: 9b5cbd37a2]
2022-09-22 11:34:10 -04:00
Joseph Greathouse b995ea06e8 SWDEV-330307 - Avoid releasing command before last use
The fix for SWDEV-329789 moved down the last use of a
command object pointer in order to prevent a race condition.
However, the previous patch did not move down the release of
that command. By releasing the command early, another thread
could get a command with the same pointer. That second thread
could later submit work to the queue using that new command.
The first thread could then perform a comparison against the
queue's last command using its own now-stale pointer. This
could eventually allow the second thread to skip synchronizing
on the queue. This would result in host synchronizations
completing before their device work was actually complete.

Change-Id: I292b7b369743251ceafe453a4c5cae14a6d01046


[ROCm/clr commit: 6b956f7627]
2022-08-31 16:07:49 -04:00
Jason Tang fb753e489d SWDEV-333471 - Add GPU_FORCE_QUEUE_PROFILING
To support both hip and ocl. HIP_FORCE_QUEUE_PROFILING will be replaced with this later on.

Change-Id: I6d3514b1568ff049584ed9fd74bbdb3e4f4bf0c3


[ROCm/clr commit: d92b3a2d90]
2022-08-19 10:51:41 -04:00
German Andryeyev 685104cefc SWDEV-329789 - Avoid a race condition with the last command
Runtime can reset the last command only if it didn't change
since the query at the beginning of finish()

Change-Id: I629f2d788e9bbaa17ca4e96b1a753f8131e32463


[ROCm/clr commit: 9e74f1c7f8]
2022-07-07 10:17:07 -04:00
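
The rule stated in this commit (reset the last command only if it has not changed since finish() first looked at it) can be modelled as a compare-then-reset under a lock. The Python below is an illustrative sketch only; `HostQueue` here is a toy class, and the `wait` parameter is a hypothetical hook standing in for the blocking wait on the GPU.

```python
import threading
from typing import Callable, Optional


class HostQueue:
    """Toy model of a queue that remembers its last enqueued command."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_command: Optional[object] = None

    @property
    def last_command(self) -> Optional[object]:
        with self._lock:
            return self._last_command

    def enqueue(self, command: object) -> None:
        with self._lock:
            self._last_command = command

    def finish(self, wait: Callable[[object], None] = lambda cmd: None) -> None:
        # Query the last command at the beginning of finish().
        with self._lock:
            snapshot = self._last_command
        if snapshot is None:
            return
        # The lock is not held here, so other threads may enqueue meanwhile.
        wait(snapshot)
        # Reset only if nobody replaced the last command in the meantime.
        with self._lock:
            if self._last_command is snapshot:
                self._last_command = None
```

If another thread enqueues during the wait, the newer command stays recorded, so a later finish() still synchronizes on it.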
Ajay 6596275caf SWDEV-337331 - command queue logs for debugging option
Change-Id: I198aecc5fd12369d87d4acc9910acc9435c1967a


[ROCm/clr commit: 236178d0d4]
2022-06-22 19:41:38 +00:00
Saleel Kudchadker b3ad41f6e4 SWDEV-335780 - Indicate if handler is queued
Maintain status of handler callback. For event records we no longer
submit callbacks to reduce the load on the async handler thread. However
without a callback we leak command memory/decrement refcounts. Indicate
status of the handler which we can use to queue a callback when
finish is called.

Change-Id: I89fd02f3d047a0e8162664ee17581a14795f1928


[ROCm/clr commit: 5df34a2f7a]
2022-06-14 20:55:06 -04:00
German Andryeyev 0ecf22bb53 SWDEV-336024 - Clear device heap to 0
This reverts commit 8624574866.

Reason for revert: Fix regressions

Change-Id: I7d883e1c3cbd27bb64b581ec800243ad7dfe24fd


[ROCm/clr commit: 07c1b9a998]
2022-05-19 09:10:08 -04:00
German Andryeyev 8624574866 SWDEV-336024 - Clear device heap to 0
The heap must be cleared once per device, but ROCclr doesn't
create a queue per device in HIP. Hence, the clear operation will
be performed during the first queue creation.

Change-Id: I52ceb06d67d11cde6d019c5ab510059f426a9bfb


[ROCm/clr commit: 04bfd93569]
2022-05-11 11:03:56 -04:00
Saleel Kudchadker 29752a2bbc SWDEV-334150 - Force callback to cycle commands
Enqueue a handler callback for hipEventRecords (aka marker_ts_) every
64 submits. This recycles the memory if we don't end up calling
synchronize for a long time.

Change-Id: I3d39fe76d52a5d81387927edd85b5663b563682c


[ROCm/clr commit: fa76f03654]
2022-04-28 12:30:23 -04:00
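
The periodic callback from this commit amounts to a simple submit counter. The following Python sketch shows the idea under stated assumptions: `MarkerSubmitter`, `submit_marker` and `callback_interval` are hypothetical names, and only the interval of 64 comes from the commit message.

```python
class MarkerSubmitter:
    """Toy model: requests a handler callback once every N marker submits."""

    def __init__(self, callback_interval: int = 64) -> None:
        self._interval = callback_interval
        self._since_callback = 0

    def submit_marker(self) -> bool:
        """Return True if this submit should carry a handler callback."""
        self._since_callback += 1
        if self._since_callback >= self._interval:
            self._since_callback = 0
            return True
        return False
```

Most event records then avoid loading the handler thread, while the occasional callback still lets completed commands be released even if the application never synchronizes.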
Saleel Kudchadker f464cdacf4 SWDEV-333237 - Release command before queuing a marker
Change-Id: I5343c4b7ade2dc68efa7454a919a6657726c45d3


[ROCm/clr commit: ddfd919a62]
2022-04-22 12:58:58 -04:00
Saleel Kudchadker 3d0100c5ab SWDEV-301667 - Add cache state for a device
- Add a global cache state for a device to indicate scopes of submitted
AQL packets
- Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set
env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records.
It would flush caches by default with system scope release.
- Calling finish() should check whether caches are flushed and, if not,
queue a marker

Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0


[ROCm/clr commit: 8eeaa998c0]
2022-04-12 12:27:31 -04:00
Satyanvesh Dittakavi 85c2cac111 SWDEV-306939 - Fix vdi errors/warnings by CppCheck
Change-Id: I56d910f8363787f1050d5d7e8064ed553c5827fd


[ROCm/clr commit: e20dd61932]
2022-01-12 00:22:16 -05:00
haoyuan2 248a738674 SWDEV-290298 - add a flag to indicate the primary context active status
Change-Id: Ia31790706d3f855bc1eedf5ef874e471


[ROCm/clr commit: 439af94dd9]
2021-12-09 23:28:54 -05:00
anusha GodavarthySurya fce3b20213 SWDEV-295251 - Avoid marker if queue is empty for DD to fix MT issue
Change-Id: I80be39ace9d93347f81ef8acd7858d43bc4a3f1e


[ROCm/clr commit: 682151f39d]
2021-08-22 23:56:08 -07:00