Commit graph

103 commits

Author SHA1 Message Date
Ioannis Assiouras 36029ea1a8 SWDEV-559166 - Fix race condition in getDemangledName (#1868) 2025-11-23 08:45:45 +00:00
Ioannis Assiouras 6d6b136374 SWDEV-559166 - Fix data races in GetSubmissionBatch, CaptureAndSet and SetQueueStatus (#1441) 2025-10-23 12:18:31 +01:00
SaleelK c4537e8050 SWDEV-553126 - Improve logging (#835)
* Ability to mask COPY api usage in logs
* Show total graph nodes in logs
* Add another log level for detailed debug
2025-09-04 10:08:41 -07:00
Danylo Lytovchenko 2ff2316227 Adjust clang format to the new versions, revert broken macro layout (#714) 2025-08-22 17:23:22 +02:00
Danylo Lytovchenko f7338717ae SWDEV-470698 - fix formatting, add format check workflow (#657) 2025-08-20 19:58:06 +05:30
Arandjelovic, Marko 208d124f54 SWDEV-547453 Release the kernel command if the operation returns an error (#807)
* SWDEV-547453 Release the kernel command if the operation returns an error

* SWDEV-547453 - Initialize parameters_ to default value

* SWDEV-547453 - Run clang-format

[ROCm/clr commit: a15957fee9]
2025-08-14 20:08:53 +02:00
Andryeyev, German 6df9a49437 SWDEV-465041 - Add support for user events with DD (#321)
* SWDEV-465041 - Add support for user events with DD

User events can be replaced with HSA signals. Add the interface
to allocate HSA signal for user events and update the status on
CL_COMPLETE.
Force pinned path with DD to avoid blocking calls. Pinned memory
can be released only when the command is complete.
Simplify device enqueue path to use generic kernel arg buffer and
signals

* Fix notifyCmdQueue() logic for OCL

* Avoid blocking calls in OCL with DD

* Add event destruction in case of failure.

[ROCm/clr commit: 2305f8ae56]
2025-08-12 19:04:36 -04:00
Betigeri, Sourabh 40999496c1 SWDEV-545273 - Respect HIP_LAUNCH_PARAM_BUFFER_SIZE (#770)
[ROCm/clr commit: 2a02d2c2f3]
2025-08-03 17:32:52 -07:00
Kudchadker, Saleel cd14def193 SWDEV-521647 - Fix tracking of hw_event (#206)
- When a command may have two packets (like the device heap
  initializer) and there is no signal on the main kernel packet, the
  tracking was broken, as it marked the HW event of the command as the
  first packet signal.
- Make sure that if no completion signal is attached to the second
  packet, the HW event for the command is cleared.

[ROCm/clr commit: 072fb0804e]
2025-04-25 08:46:44 -07:00
Saleel Kudchadker c8f39ec2b0 SWDEV-502365 - Track last used command
- This change tries to save extra synchronization packets we may insert
  because we didn't track the completion signals for every command. We
  track the current enqueued command until it exits the enqueue stage. We
  also record the exit scope to know if we flushed the caches.
- Handle correct release scopes and store completion signal as HW events
- Use a new finishCommand implementation to only wait for the command
  passed as the argument

Change-Id: Ie4350c5dd24f5d48dfa6ccbabd892f0544caadcc


[ROCm/clr commit: e03e4f3b5d]
2025-03-04 16:05:02 -05:00
Anusha GodavarthySurya 08c92f4793 SWDEV-480209 - Make internal callbacks non-blocking
Change-Id: Ic918d08f341abfd9a7c167d09f9c723cdc43157f


[ROCm/clr commit: 683a942364]
2025-01-10 02:16:11 -05:00
German Andryeyev 4a2687a450 SWDEV-486602 - Fix Windows 32 bit build
Windows aligns fields to 8 bytes even in 32-bit builds.
Add BUG_CLR_SYSMEM_POOL to control sysmempool.

Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293


[ROCm/clr commit: 6bb7d1afdc]
2024-10-18 11:35:54 -04:00
German Andryeyev 0a03665a3f SWDEV-491375 - Limit the SW batch size
Applications may submit commands without waiting
for the GPU. That causes growth of unreleased SW commands.
Make sure runtime flushes SW queue, if it grows over some
threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE.

Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396


[ROCm/clr commit: 8657a77029]
2024-10-17 10:53:57 -04:00
German Andryeyev faea40cbb3 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for CPU status update of the submitted commands
every 50 calls. That allows the runtime to drain the commands and
destroy HSA signals.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9


[ROCm/clr commit: 364dfb0ed1]
2024-10-11 14:50:25 -04:00
Chong Li 4979c2f206 SWDEV-478929 - Benchmark ReallyQuickPureX Failed
Ensure the member functions Alloc() and Free() of command_pool_ are not
accessed after command_pool_ is destroyed.

Signed-off-by: Chong Li <chongli2@amd.com>
Change-Id: Ic2d36423302518a030bd61fa399290ebe2ed8194


[ROCm/clr commit: e6a5c81221]
2024-09-10 22:08:18 -04:00
Ioannis Assiouras 19d16561a4 SWDEV-472309 - Ensure static maps are destroyed after __hipUnregisterFatBinary
hipDeviceSynchronize called from __hipUnregisterFatBinary
accesses static maps and monitors. This change ensures these objects
are not destroyed before __hipUnregisterFatBinary is called.
Additionally, it disables the teardown process for static builds.

Change-Id: I46b58641d60efcf6637a8e99cdd786ffe9e2c77d


[ROCm/clr commit: 9b33db9b24]
2024-07-30 10:26:59 -04:00
Saleel Kudchadker bbef85714e SWDEV-470008 - Fix AMD_SERIALIZE_KERNEL
- awaitCompletion code may do an endless spin wait in cases where we
don't submit a handler. One such case is the hipExt*Launch API, which
takes a stop event. In that case we optimize the stop event by attaching
a signal to the dispatch packet, but we don't submit a handler when we
attach the signal. That means if awaitCompletion() is called after that,
we would keep waiting on the command status on the host rather than
simply checking the signal value.

Change-Id: Ie8bf175aeefa3f9e4299b1ae7ae9108dad67e283


[ROCm/clr commit: 561fb8a459]
2024-07-02 19:05:05 -04:00
Ioannis Assiouras af089a2171 SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds
Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1


[ROCm/clr commit: 3edf1501cc]
2024-06-12 16:22:27 -04:00
kjayapra-amd 41cb6dadf9 SWDEV-460948 - Changes to alloc, set, capture under single function.
Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543


[ROCm/clr commit: 892071aeb2]
2024-06-06 16:57:53 -04:00
Ioannis Assiouras 0e023d1a0a SWDEV-463865 - symbol renamings to prevent conflicts in static build
Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb


[ROCm/clr commit: b8c2ac4de4]
2024-06-06 04:05:55 -04:00
Saleel Kudchadker 1c94521c1c SWDEV-463428 - Acquire correlation ID after clear
Change-Id: I472085178d5751f5e2c8a6dfe190b6b3249317f0


[ROCm/clr commit: ecff928284]
2024-06-06 03:49:01 -04:00
German Andryeyev 68344576d3 SWDEV-460242 - Add system memory suballocator
Switch commands creation to the new suballocator to avoid
frequent expensive OS calls

Change-Id: I3597c811820e577c15708bad8b8a41aa53acc400


[ROCm/clr commit: 5b0bfdcbad]
2024-05-28 06:28:17 +00:00
German Andryeyev 7eaba0bd33 SWDEV-440746 - Don't set CL_SUBMITTED twice
Change-Id: I9ba34454f7487d6bc0d398b322a147cbac6c6443


[ROCm/clr commit: fd81490bb8]
2024-04-19 17:36:51 -04:00
Saleel Kudchadker f3aedfbec0 SWDEV-301667 - Create TS for each node recorded in graph
- Create a vector to allow multiple TS to be stored in Command.
- This would mean we don't wait for the entire batch in the Accumulate
command to finish when we exhaust signals.
- Reduce the number of signals created at init to 64. This min value
may still need to be tuned but the KFD allows max of 4094 interrupt
signals per device.
- Store kernel names whenever they are available and not just when
profiling. If we dynamically enable profiling, as Torch does, a crash
can happen if hipGraphInstantiate wasn't included in the Torch profile
scope, because we previously stored kernel names only when a profiler
was attached.

Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006


[ROCm/clr commit: c157bfb202]
2024-03-26 14:47:24 -04:00
Saleel Kudchadker cc0b04cc60 SWDEV-301667 - Reset profiler correlation_id_
- The correlation_id had random junk values, which we were inserting in
the dispatch AQL packet even when no profiler was attached, as long as
we had a valid timestamp.
- Also make sure we don't even write the reserved2 field in the AQL
packet if no profiler is attached.

Change-Id: Icdb7493198c1bb5e2d786a97e027288660854cd7


[ROCm/clr commit: 9a6ddae7b2]
2024-02-05 05:08:11 +00:00
Saleel Kudchadker 19ea94729c SWDEV-422207 - Report TS for Accumulate command
Change-Id: Iba193a6068c1a2d25c2136643faee2c1e2591a07


[ROCm/clr commit: f5c6fc4dfa]
2023-11-07 18:19:40 +00:00
Saleel Kudchadker 5f009b7cb1 SWDEV-422207 - Track commands for capture
- Track all captured commands under a new AccumulateCommand
- Add begin() and end() methods to capture commands
- Explicit TS object now passed to certain methods because
profilingBegin() and profilingEnd() now happen separately and thus can
run into threading issues

Change-Id: I171106bdcad72b057836cb2f3fc398db3533119f


[ROCm/clr commit: 40f41f4d0b]
2023-11-03 05:09:04 +00:00
Saleel Kudchadker 1d4bd084b8 SWDEV-301667 - Cleanup unused paths
- Refactor code and cleanup logic for callback saving for event records

Change-Id: I5c56aa8e9c968a5bca70fb07ad1796da318e9e89


[ROCm/clr commit: 1338ff37e8]
2023-11-02 11:43:41 -04:00
jiabaxie f25e5e01f3 SWDEV-405983 - adding in HIP_LAUNCH_BLOCKING
Change-Id: I3f9c8a745099aab05155ebe910e727693961a02f


[ROCm/clr commit: 28f0daa34f]
2023-10-10 21:11:13 -04:00
Anusha GodavarthySurya 0404df28ef SWDEV-422207 - Capture AQL Packets for graph Kernel nodes during graph Inst. And enqueue AQL packet during launch
Change-Id: I1e5f7f9e2a70bd500d190193cb6ba0867f5a63e7


[ROCm/clr commit: e63c280d4d]
2023-10-05 00:34:29 -04:00
German 5d9912f48b SWDEV-407533 - [ABI Break]Remove Wavelimiter
Change-Id: I6a2f6fb5a0c3acea93fa0200a69679783e76f5bd


[ROCm/clr commit: 7be3a5e33e]
2023-09-07 09:58:41 -04:00
Anusha GodavarthySurya 57467ef2c7 SWDEV-392732 - Initial commit for graph doorbell optimization(AQL Buffering)
Change-Id: I451725006c54c249dc530c55d2af2a31594bf49b


[ROCm/clr commit: b0e6f99ad7]
2023-07-16 07:56:00 -04:00
Ioannis Assiouras 1ad1fccdfa SWDEV-385050 - Fixed possible invalid queue access from kernelCommand::releaseResources
Change-Id: I7c5d99987cb7ab4fa0aa634f2bb6a4d60331b3af


[ROCm/clr commit: 2e9f6fb49b]
2023-02-23 16:39:27 +00:00
Saleel Kudchadker 858e311f34 SWDEV-364604 - Add ROCclr support for hipEventDisableSystemFence
Change-Id: I6127b432a8759359359a1890fda85bc401be6a56


[ROCm/clr commit: 3e603d986a]
2023-02-21 19:07:35 -05:00
German b5b078e036 SWDEV-377991 - Remove liquidflash support
Change-Id: Iba6455e5c0210c3223a06fec332404cd9f489154


[ROCm/clr commit: 53a10c9039]
2023-01-20 09:57:06 -05:00
German 2143e64c23 SWDEV-377991 - Remove Liquidflash extension
Initial check-in to untie dependencies with HIP and OCL repos

Change-Id: I363b63954c3f118f40a6ed893545d6a4ac44144c


[ROCm/clr commit: c8927cd84e]
2023-01-18 13:16:20 -05:00
Sourabh Betigeri 7aa958a8f7 SWDEV-305894 - Cooperative groups grid and multi grid sync support for gfx940+
Change-Id: I35d72f1cb50c3a96eee56a612b72d641852b145f


[ROCm/clr commit: 5d7f3f9f3c]
2022-12-05 16:30:30 -05:00
Laurent Morichetti 0cd3ec5056 SWDEV-351980 - Consolidate registration tables in the roctracer library
Remove the activity_prof::CallbacksTable. The table was redundant with
the information already stored in the roctracer library. Instead use a
single callback into the roctracer library to query whether the activity
is enabled, and to report it.

Change-Id: I2e05b0881bb4a1953c14361d00ea310d02eb6e0c


[ROCm/clr commit: 52eb28930a]
2022-09-21 05:54:09 -04:00
Laurent Morichetti 353f9bc86c SWDEV-351980 - Enable profiling for commands reporting activities
Profiling should be enabled for any command reporting activities as the
activity record captures the profilingInfo's start and end timestamps.

Since IS_PROFILER_ON is only used to determine whether API tracing is
enabled, there is no need to expose it globally, it should be a property
of the activity_prof::CallbacksTable.

Change-Id: I44a0d19ed2862606cfbc9a98c1a07a336ab7e26c


[ROCm/clr commit: e713b5c7d0]
2022-09-21 05:53:59 -04:00
Laurent Morichetti cbcc94b9e3 SWDEV-351980 - Move activity_ to the ProfilingInfo
The activity_ is only instantiated if profiling is enabled.

Remove the HIP private global record ID. Instead, use the correlation ID
stored in the hip_api_data_t by the profiler while the last HIP function
is in scope.

For NDRange and Copy commands, store the kernel name and byte size
(respectively) in the record.

General cleanups to improve the code's readability.

Change-Id: I01907484b0d9611eb9440c3a7c4865479dc42289


[ROCm/clr commit: 4fbae91468]
2022-09-21 05:53:47 -04:00
Anusha Godavarthy Surya b66ec1a031 SWDEV-345683 - Fix HIP out of memory
If a handler is not submitted for every eventRecord,
memory is not released during hipFree, which leads to OOM.

Change-Id: I19b61a0c523502e9e1a3564ce8b791f3e2cea02c


[ROCm/clr commit: 7b1c6d06d5]
2022-07-28 07:36:38 -04:00
Ajay 6596275caf SWDEV-337331 - command queue logs for debugging option
Change-Id: I198aecc5fd12369d87d4acc9910acc9435c1967a


[ROCm/clr commit: 236178d0d4]
2022-06-22 19:41:38 +00:00
Sarbojit Sarkar 8f863abe02 SWDEV-325379 - Fix for remote copy crash
Change-Id: I22152c0b3538cf7cfc80f82505bc255c01d98f7b


[ROCm/clr commit: 356e22f910]
2022-06-16 23:59:11 -04:00
Saleel Kudchadker d9c2aee526 SWDEV-334152 - Set release as systemscope
Set release scope as system for dispatch AQL when events are passed to
hip*LaunchKernelGGL*

Change-Id: I93b91591e0ab023f1ecc5247f7905eca26147358


[ROCm/clr commit: 02566677cf]
2022-04-29 13:19:29 -04:00
Saleel Kudchadker 29752a2bbc SWDEV-334150 - Force callback to cycle commands
Enqueue a handler callback for hipEventRecords (aka marker_ts_) every
64 submits. This recycles the memory if we don't end up calling
synchronize for a long time.

Change-Id: I3d39fe76d52a5d81387927edd85b5663b563682c


[ROCm/clr commit: fa76f03654]
2022-04-28 12:30:23 -04:00
Saleel Kudchadker cad3dfe4ec SWDEV-301667 - Separate scope from marker_ts_
Change-Id: I19f4d394e898bfb8c9d9a2c2edf9d5bf5def3b08


[ROCm/clr commit: b6cbfaf499]
2022-04-16 19:26:31 -04:00
Saleel Kudchadker 3d0100c5ab SWDEV-301667 - Add cache state for a device
- Add a global cache state for a device to indicate scopes of submitted
AQL packets
- Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set
env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records.
It would flush caches by default with system scope release.
- Calling finish() should check whether caches are flushed and, if not,
queue a marker

Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0


[ROCm/clr commit: 8eeaa998c0]
2022-04-12 12:27:31 -04:00
Saleel Kudchadker 4dbec887a2 SWDEV-301667 - Selectively queue handler
- Queue handler for hipEventRecord(aka marker_ts_) only if there is a
callback associated with it.

Change-Id: I8a9877ae0e342556053abbaacc9510744a8e772a


[ROCm/clr commit: 3c3c0ca4c5]
2022-03-24 19:46:28 -04:00
haoyuan2 248a738674 SWDEV-290298 - add a flag to indicate the primary context active status
Change-Id: Ia31790706d3f855bc1eedf5ef874e471


[ROCm/clr commit: 439af94dd9]
2021-12-09 23:28:54 -05:00
Sarbojit Sarkar 4630f3ade0 SWDEV-314254 - Fix for hipMemcpy3D test crash
Change-Id: Iac70bfe0d351cfb5b56fefc9a6487d3f26f2b4ef


[ROCm/clr commit: aedbad0109]
2021-12-09 11:46:52 -05:00