Γράφημα Υποβολών

254 Υποβολές

Συγγραφέας SHA1 Μήνυμα Ημερομηνία
German Andryeyev 403f624bf8 SWDEV-486602 - Add tracking of HSA handlers
Add an atomic counter to track the outstanding HSA handlers.
Wait on CPU for the callbacks if the number exceeds the value
in DEBUG_HIP_BLOCK_SYNC env variable.

Change-Id: I95dc8c4bf0258c7e59411b7504220709ed6898c5
2024-10-25 15:20:50 -04:00
German Andryeyev 364dfb0ed1 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for CPU status update of the submitted commands
every 50 calls. That will allow to drain the commands and
destroy HSA signals.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9
2024-10-11 14:50:25 -04:00
Todd tiantuo Li 41dc4545fc SWDEV-472357 - support Rect copy with staging buffer for 2D & 3D memcpy in PAL
Change-Id: Ie32f3e5a6fa077f6b2db20fc1ab1e2e0da8344cb
2024-10-10 18:00:19 -04:00
Jaydeep Patel 5ccc140e1b SWDEV-485866 - Return OOM if stream creation fails due to insufficient memory.
Change-Id: I4e57ecc81921bde274bb6a4e0890f0fc6a17955a
2024-10-10 00:44:54 -04:00
Saleel Kudchadker d3d0ca5fc6 SWDEV-478065 - Revert "SWDEV-478065 - Embed host thread in shared_ptr"
This reverts commit 4b03017e8a.

Reason for revert: This blocks multithreaded callbacks

Change-Id: I9944417e4fb63c9eea2b286c828c7dfa621c4fe8
2024-10-04 19:19:28 -04:00
Vladana Stojiljkovic da5f1a6146 SWDEV-482086 - Fix hipGraphInstantiate leak
* In a scenario where kernel is launched with hipExtLaunchKernelGGL and stop event is used, hipGraphInstantiate leaks. Since stop event is used, profiling is enabled and Timestamp (ReferencedCountedObject) is created, but it doesn't get released.
* The idea behind this solution is that profiling should be disabled when command is captured, hence the timestamp should not be created. Because information about capturing isn't available when kernel command is created, packet capturing state is used to determine whether to create a timestamp or not.

Change-Id: Ia23adac4592ded4fb5e236acf99e12e729f63692
2024-09-29 11:36:53 -04:00
German Andryeyev 29cc678d8d SWDEV-483586 - Unblock staging H2D transfers
Although unpinned copies require synchronizations
in HIP, runtime can avoid syncs for H2D copies with
a staging buffer

Change-Id: If2203c6bc0cbd89742823688dc8e89e9acd873b2
2024-09-21 10:25:27 -04:00
Jatin Chaudhary 4b03017e8a SWDEV-478065 - Embed host thread in shared_ptr
This shows up in some valgrind runs. Make sure the resources are
released.

Change-Id: I34c25c00370a221585895655744831215136d5f4
2024-09-17 09:53:51 -04:00
Saleel Kudchadker abc80fcc2f SWDEV-301667 - Improve kernel logging
Change-Id: I4b2b1950e3ab7124fd41af9a92a677c48d6da5eb
2024-09-10 13:43:58 -04:00
Saleel Kudchadker 62a7fed90d SWDEV-481974 - Clear dependent signal bit for barrier value
Change-Id: I3ffda051fa8538970fbb1964beb1f538fce0782c
2024-09-10 13:43:04 -04:00
Rahul Manocha ddbd7039b0 SWDEV-478921 - Destroy Queue created by Coop Launch
Change-Id: I7f31ce05421479ff1de138cae26aafa071e956e2
2024-09-02 02:35:08 -04:00
Julia Jiang 417d3279f9 SWDEV-476623 - correct the format on the fix for clCopyImage
Change-Id: I3a3fb2eaa338ff4e298a43e583fcf94ec7cabdf6
2024-08-28 16:16:24 -04:00
Julia Jiang c3c41dae0d SWDEV-476623 - Fix test failures for clCopyImage
Change-Id: I971c5be98304bdbef0feec73e15ebd61a131b12f
2024-08-27 11:43:12 -04:00
kjayapra-amd 00eb038eec SWDEV-479620 - Change argument type to size_t from uint64_t in nonTemporalMemcpy function.
Change-Id: I31f8a2b00685789b027d78be40a9f82c235f51b9
2024-08-24 07:42:37 -04:00
Shane Xiao 3959b5be1e [SWDEV-479204] Fix the hipGraph AQL package fill issue
This patch fixes this potential issue that filling AQL header before
filling the AQL body. The hsa spec specifies "Packet processors may
process AQL packets after the packet format field is updated, but
before the doorbell is signaled."
However, the hipGraph AQL package with valid header will be filled
before fill the body, which may have the potential issue that CP
receive invalid AQL body.

Change-Id: I84af798c19ee2b8805ba19732b0eabdea2958a96
2024-08-21 21:49:11 -04:00
German Andryeyev 9db52f9a46 SWDEV-470612 - Add the optimized multistream path
- Added the optimized multi stream path in graph execution. It uses a fixed number of async streams in the execution
- Optimize the launch latency, where commands
creation and execution is done at the same time
- Optimize the scheduling to use less barriers and waiting signals if
the same queue  can be detected
- The new path is controlled by  DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable, where 0 will use the original path and any other
value will force the number of asynchronous queues for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single queue async
execution in graphs(applicable for Navi families only)

Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e
2024-08-02 14:19:44 -04:00
Anusha GodavarthySurya bd3a35bde1 SWDEV-468424 - Add support to capture multiple AQL Packets
=> Added support to capture multiple AQL Packets.
=> Added Interface to callback to hip runtime from rocclr to allocate
kernel args from the graph kernel arg pool.
=> Enabled Support to capture memset node.

Change-Id: I7e1c2ba06927459e024653058af142bd82192c43
2024-08-01 23:55:51 -04:00
German Andryeyev 18187cd8fe SWDEV-470612 - Avoid processing internal signals
If only external signals were provided, then just process it
without adding internal signals

Change-Id: Iaefd65d0f8b0a64b9f6a864a9bd73de20a29dfa4
2024-07-25 10:08:16 -04:00
Anusha GodavarthySurya 346da4bb40 SWDEV-468424 - hipgraph capture memset node
Capture AQL packets during GraphInstantiation and enqueue AQL packets during graph launch.

Added support to capture single graph memset node.
Capture support for memset node is currently disabled.
Memset capture will be enabled when capture for multiple packets are supported..

Change-Id: I14dfbc41731025cc3a548a730558915def3fa384
2024-07-19 23:52:50 -04:00
Ioannis Assiouras ea50d2c0c2 SWDEV-469825 - Modified the kernel argument readback to use a pointer to volatile
This change modifies the readback mechanism to use a pointer to volatile
instead of a volatile pointer. This ensures that the compiler does not
optimize away the read operation.

Change-Id: I79ff925d615aa8cc4f950e8ff4b7e608fcb179a4
2024-07-09 17:28:47 -04:00
Sourabh Betigeri 9d628a4a3d SWDEV-470703 - Avoids a potential segfault caused by nullptr dereferencing
Change-Id: If80b00b41869076c18651995c46f89095e7266f9
2024-07-02 12:22:29 -04:00
Anusha GodavarthySurya 57156c524d SWDEV-467102 - Hidden heap init for graph capture
If the graph has kernels that does device side allocation,  during packet capture, heap is
allocated because heap pointer has to be added to the AQL packet, and initialized during
graph launch.

Handle race with wait when 2 kernels with device heap are enqueued on multiple streams.

Change-Id: I45933b77fcaf7bc8fdf1bc906462e32b5d8d3688
2024-06-17 02:07:25 -04:00
Ioannis Assiouras d44f44a5b1 SWDEV-467069 - Added safety check in activity prof for accumulate command
Adding a safety check prevents an invalid memory access
if timestamps and kernelNames vectors are of different size.

The patch also moves the addKernelNames for the accumulate command
into dispatchAqlPacket function.

Change-Id: Iea0927e1253800403a1ae3f3d72de1e7d96476c3
2024-06-12 21:53:03 +01:00
Jacob Lambert 4e77806ee5 SWDEV-1 - [NFC] Fix typo in Log output
Change-Id: Ibcd779fa5ed1d8eda0241c7cd2531af4b60cf33f
2024-06-11 15:08:52 -04:00
Ioannis Assiouras 055e05a12a SWDEV-466601 - Fix invalid mem acccess in kernarg readback path
Change-Id: I4654ae592adc8cf9c687136d45eb1b28d99c7ae1
2024-06-10 15:13:05 +01:00
Ioannis Assiouras 775dc204aa SWDEV-463865 - changed device,roc and pal namespaces to be nested under amd
Change-Id: Icad342843c039c634e249a13a7aa31400730b1dd
2024-06-07 12:23:06 -04:00
Ioannis Assiouras b8c2ac4de4 SWDEV-463865 - symbol renamings to prevent conflicts in static build
Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb
2024-06-06 04:05:55 -04:00
Sourabh Betigeri a9f05e22db Revert "SWDEV-301667 - Disable HostBlit copy for HIP"
This reverts commit 5447cf8872.

Reason for revert: SWDEV-455075, SWDEV-461507 - This change forces to
use ROCr's copy path. Reintroducing hostBlit copy path for
host-to-host copies.


Change-Id: Ic3c45b49e481c9dcdaa7611f61071778790b7e6c
2024-05-28 06:31:10 +00:00
Saleel Kudchadker 72d23a02c5 SWDEV-301667 - Better log
- Print kernelname for graph launches, its hard to correlate packets
otherwise
- Print correlation_id if any

Change-Id: Ib8db7a00e4e7c98f570e71029e61d86f5dccc2ed
2024-05-28 06:31:10 +00:00
Saleel Kudchadker 1ba74c3ce3 SWDEV-451594 - Fix HDP reg readback
Change-Id: I478a968330f85c3b60ff39fb40bf3cd91acd610e
2024-05-28 06:31:10 +00:00
Saleel Kudchadker badf2b0880 SWDEV-301667 - Refactor graph code
- Remove Last graph node optimization and instead submit a barrier NOP
packet always. This simplifies the code.

Change-Id: Ied443173ba47a08b6df148ac7e3ead712acda11c
2024-05-28 06:28:17 +00:00
Satyanvesh Dittakavi 3d540ec113 SWDEV-457755 - Add TS only for kernel packets in the Accumulate command
Change-Id: I1b2f01c5763761808f49802fa117abc6306a22aa
2024-05-28 06:28:17 +00:00
Ioannis Assiouras 6cb7b6ec6b SWDEV-451594 - Change device kernel args to use HDP flush by default
The Readback and Avoid HDP Flush memory ordering workaround is
used as a fallback solution only when HDP flush register is invalid

Change-Id: Ic284eba1f95ed22b0270d3abeb904fb902015b1a
2024-05-02 19:35:13 +00:00
Saleel Kudchadker 948ca5a931 SWDEV-301667 - Add LOG_TS mask
- Add LOG_TS mask for printing signal times
- Read raw ticks from signals

Change-Id: Ibdd0bf06c790729f6c65083a4784c97a3c3219e0
2024-04-30 12:24:48 -04:00
German Andryeyev 5c1804aa14 SWDEV-353281 - Corret VA unmap
Make sure graph mempool unmaps VA on release

Change-Id: Id3f1bd8d0115b533ae60aa5ba3676b8bf7e5b961
2024-04-26 09:37:01 -04:00
kjayapra-amd 74ffc5f0d5 SWDEV-413997 - Cleanup fixes for Virtual Memory Management.
Change-Id: I9a4a4d9087b5daf15e3ba31e786d34db431212a1
2024-04-22 10:58:06 -04:00
German Andryeyev 7448113cfc SWDEV-440746 - Remove obsolete code
The "optimized" version of memcpy is outdated and
was used in win32 only.

Change-Id: I7f2e0e9051e37cec95438266824b5b0025c324c6
2024-04-22 09:56:42 -04:00
German Andryeyev 329ba271fa SWDEV-440746 - Wait for signal before release
Change-Id: I9e2aefdbcbba153c7f1080d80aab7a345eaf1eb4
2024-04-19 18:33:28 -04:00
Ioannis Assiouras bf74ef4025 SWDEV-451594 - Implement Readback and Avoid HDP Flush workaround for device kernel args
Change-Id: I6d41a089a17f55306e7ff402588a1e831b20a7a7
2024-04-19 09:29:20 -04:00
kjayapra-amd d52d16c8e6 SWDEV-413997 - Fixing multiple device cases.
Change-Id: I10ad3fbfca887e92cd81f68392fa1acf753cbd2b
2024-04-13 06:14:03 -04:00
Ioannis Assiouras d7f352dbed SWDEV-453301 - Remove the option to write multiple packets in dispatchGenericAqlPacket
Dispatching multiple packets with ring the doorbell once is not supported by the lower layers

Change-Id: I7665a2dcdd4ef9e47dadfe410180fed64c5a4ee0
2024-04-05 05:28:10 -04:00
Saleel Kudchadker c157bfb202 SWDEV-301667 - Create TS for each node recorded in graph
- Create a vector to allow multiple TS to be stored in Command.
- This would mean we dont wait for entire batch in Accumulate command
to finish when we exhaust signals.
- Reduce the number of signals created at init to 64. This min value
may still need to be tuned but the KFD allows max of 4094 interrupt
signals per device.
- Store kernel names whenever they are available and not just when
profiling. If we dynamically enable profiling like for Torch, a crash
can happen if hipGraphInstantiate wasnt included in Torch profile scope
beacuse we previously entered kernel names only when profiler is
attached.

Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006
2024-03-26 14:47:24 -04:00
Rakesh Roy d1fff7cea2 SWDEV-445096 - Fix -O0 crash in OpenCL tests
- With https://gerrit-git.amd.com/c/lightning/ec/llvm-project/+/1002628 applied, at -O0 Kernel::dynamicParallelism() returns true but virtual queue isn't created
- This causes segfault inside VirtualGPU::submitKernelInternal() when getVQVirtualAddress() is called

Change-Id: Ia7af042adad2329e870c142caaac3e8fa886f8b8
2024-03-26 11:42:33 -04:00
German Andryeyev f296159f62 SWDEV-353281 - Change pool type for graphs
Under ROCr physical allocations don't have initial VA and require extra
flag in ROCclr. Add an option to have a mempool of physical allocations.

Change-Id: I4d062fe0dd8113d4eaf6e8b51749ed56d8701d1e
2024-03-25 10:21:05 -04:00
Ioannis Assiouras 96f5c44851 SWDEV-451166 - Disable kernel args for non-XGMI if HDP flush register is invalid
Change-Id: I227e046e2b9cb25476a50240f5d070adbd558f21
2024-03-15 05:27:52 -04:00
Saleel Kudchadker f1adecd186 SWDEV-301667 - Use right macros and level for logging
- Sometimes we want to mask out kernel names, use right level for kernel
logging

Change-Id: Ideae9647c57b86ae390ff2f4131f6d8c6df5c086
2024-03-12 19:00:03 -04:00
kjayapra-amd f5ca620baa SWDEV-423835 - Fixing kernel launch issues on Virtual Memory Management path.
Change-Id: I9f5e8a3d83af3809b2c50b21a10697e26113dd23
2024-03-12 17:22:07 -04:00
Saleel Kudchadker 984c86f407 SWDEV-301667 - Better log
- Print SWq for AQL packets, this helps correlating a stream to the HWq
mapped

Change-Id: I610430c0872a1abc6636027c00163ec46983cd65
2024-03-01 16:43:06 -05:00
Saleel Kudchadker 68f40f78dd SWDEV-443760 - Enable device kernel args for MI300
- Enable Device kernel args for MI300* for now.
- Fix a perf issue which impacts graph instantiate when dev kernel args
are enabled.

Change-Id: I962e58fd9d8dd1a8db95e601cb03a8e9c7bac97f
2024-02-28 19:10:04 -05:00
Anusha GodavarthySurya e0e63eb04d SWDEV-447545 - Fix Enable/Disable node with hipGraph
Node can be enabled/disabled only for kernel, memcpy and memset nodes.
If the node is disabled it becomes empty node.
To maintain ordering just enqueue marker with respective node dependencies.

Change-Id: I710f3e88ab4e76c81f6f86a40a7dc61fd4c7e440
2024-02-28 17:34:03 -05:00