Commit Graph

284 Commits

Author SHA1 Message Date
Kudchadker, Saleel 072fb0804e SWDEV-521647 - Fix tracking of hw_event (#206)
- When a command may have two packets (like the device heap
  initializer) and there is no signal on the main kernel packet, the
  tracking was broken because it marked the first packet's signal as
  the HW event of the command.
- If no completion signal is attached to the second packet, clear the
  HW event for the command.
2025-04-25 08:46:44 -07:00
Kudchadker, Saleel ce24936970 SWDEV-510186 - Improve logging (#220)
- Print all arguments in logs; this is useful for debugging
2025-04-25 08:40:31 -07:00
Andryeyev, German 4c363df3bf SWDEV-517481 - Add more restrictions to the queue management (#168) 2025-04-10 21:51:45 +05:30
Patel, Jaydeepkumar 9e7248aa36 SWDEV-521011 - Allow max stack size as per ISA. (#73) 2025-04-08 10:15:38 +05:30
Arandjelovic, Marko e7ada4effe Revert SWDEV-512344 - Unmap all subbuffers (#26)
This reverts commit 0b69120cfcb5b4689d9f2037b1a01e274d85c20f.
2025-03-19 21:17:36 +05:30
Andryeyev, German 28967982b2 SWDEV-517481 - Add dynamic queue management (#37)
Enabled by default. DEBUG_HIP_DYNAMIC_QUEUES controls the feature.
2025-03-19 11:22:50 -04:00
Saleel Kudchadker e03e4f3b5d SWDEV-502365 - Track last used command
- This change tries to save the extra synchronization packets we may
  insert because we didn't track the completion signals for every
  command. We track the current enqueued command until it exits the
  enqueue stage. We also record the exit scope to know if we flushed
  the caches
- Handle correct release scopes and store completion signal as HW events
- Use a new finishCommand implementation to only wait for the command
  passed as the argument

Change-Id: Ie4350c5dd24f5d48dfa6ccbabd892f0544caadcc
2025-03-04 16:05:02 -05:00
Marko Arandjelovic 3ec1d2d2f1 SWDEV-512344 - Unmap all subbuffers
Since hipMemMap can be called for multiple device handles on the same virtual memory, the same is true for hipMemUnmap, meaning that virtual memory can be "partially unmapped".

This means that the unmap function can be called for a specific part of the reserved address, meaning that only the designated subbuffer should be released. If unmap is called on the entire reserved memory, then all subbuffers should be released.

The main point is that for every hsa_amd_vmem_map, there should be a corresponding hsa_amd_vmem_unmap. Otherwise, if entire memory is unmapped by a single unmap call, then HSA will report the memory as "in use" if an attempt is made to delete it.

Change-Id: I039308eafb820decfb1c09f60347f26cdad1a362
2025-03-02 13:41:48 -05:00
Saleel Kudchadker ca530c660b SWDEV-513197 - Improve launch perf for Device Heap kernels
- If any kernel uses the device heap, the launch needs to be preceded by
  an init kernel. Save the extra barrier packet launch/flush between
  the init heap kernel and the user kernel

Change-Id: I8ebc6246188200e5f673dc464bc76a53bcb8b7c6
2025-02-27 19:17:51 -05:00
Jimbo Xie 4872b420c9 SWDEV-504383 - Cleaned up kForcedTimeout10us and removed IsHwEventReadyForcedWait
Also removed active_wait_timeout

Change-Id: I7a429f003c09a4df267b5c0983050704260094c6
2025-01-31 14:40:18 -05:00
kjayapra-amd 0324014710 SWDEV-509280 - Combine multiple definitions of callbackQueue into a single function.
Change-Id: Ibbb56136bec2beed71c202d75e8aec9e82640a4e
2025-01-30 15:58:11 -05:00
Saleel Kudchadker 2d450e8b06 SWDEV-504494 - Resolve signal dependencies
- Resolve signal dependencies for barrier value packet if there are > 1
  dependent signals. Barrier Value packet accounts for only 1 dep signal
- Better log

Change-Id: Ia506ad5d80b91d598f92e7b539f41756e9b4b64b
2025-01-29 19:49:02 +00:00
German Andryeyev ea0b092af8 SWDEV-459826 - Add a crash dump for a failed queue
The logic can analyze the AQL queue state and
find a failed AQL packet with the kernel's name

Change-Id: I1a478fa2c25462cd07a194784958bdf22454b897
2025-01-28 14:27:46 -05:00
Ioannis Assiouras 21c223f8df SWDEV-510319 - Fixed random segfaults in graph tests
This change fixes random segfaults in graph tests that
are seen after the change that made internal callbacks non-blocking.
The callback thread that decreases the GraphExec ref count
may now run after the runtime shutdown. This can cause a segfault
because the hip::device that is accessed in the GraphExec destructor
is already destroyed during runtime shutdown. This patch ensures
that the hip::device object stays alive until after the
callback thread completes.

Change-Id: I75a6ac01f27a0b2250bbd10ed389ebfb322927af
2025-01-25 09:54:15 -05:00
Saleel Kudchadker 9b7e0ad48a SWDEV-510186 - Improve logging of kernel names
- Demangle kernel names in logs

Change-Id: I9aa58e8c109becb45ef7fc747d991bd657c4190a
2025-01-24 11:43:02 -05:00
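The demangling mentioned in the commit above can be illustrated with a minimal sketch. It assumes a GCC or Clang toolchain, where `abi::__cxa_demangle` from `<cxxabi.h>` is available; the helper name `DemangleOrKeep` is hypothetical and is not taken from the runtime's sources.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper (not the runtime's code): returns a readable name for
// logging. Names that do not look like Itanium-ABI mangled symbols are
// returned unchanged.
inline std::string DemangleOrKeep(const char* name) {
  // Only attempt demangling for names with the "_Z" prefix, so plain C names
  // are never misread as encoded types.
  if (std::strncmp(name, "_Z", 2) != 0) {
    return std::string(name);
  }
  int status = 0;
  // __cxa_demangle returns a malloc'ed buffer that the caller must free.
  std::unique_ptr<char, decltype(&std::free)> out(
      abi::__cxa_demangle(name, nullptr, nullptr, &status), &std::free);
  // status == 0 means success; otherwise fall back to the raw name.
  return (status == 0 && out) ? std::string(out.get()) : std::string(name);
}
```

For example, `DemangleOrKeep("_Z3foov")` yields `foo()`, while a name without the `_Z` prefix such as `vector_add` is passed through as-is.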
Anusha GodavarthySurya 683a942364 SWDEV-480209 - Make internal callbacks non-blocking
Change-Id: Ic918d08f341abfd9a7c167d09f9c723cdc43157f
2025-01-10 02:16:11 -05:00
Pengda Xie 8155943c5f SWDEV-505833 - Provide functionality to avoid L2 flush for CPX mode for dispatch packets
- Added DEBUG_CLR_SKIP_RELEASE_SCOPE flag to force release scope to
   SCOPE_NONE in AQL packet header

Change-Id: Ife02cddb9d5cd4749103ce585d3d5fe9024c6868
2025-01-03 17:28:21 -05:00
Pengda Xie 078fe7e5de SWDEV-503764 - Add wptr and rptr to ClPrint for dispatch barrier methods
- added wptr and rptr to ClPrint in dispatchBarrierPacket and dispatchBarrierValuePacket

Change-Id: I8a62289deb23c9f657a9b0ac6138bb55eafecba2
2024-12-16 16:45:30 -05:00
Ioannis Assiouras a808c4b23a SWDEV-489255 - Update stack size limit in rocvirtual
Change-Id: I2aac9d211f64b3d6c121d8b010d215dcbdeac3aa
2024-12-16 09:30:39 -05:00
Saleel Kudchadker 93f1e8ff60 SWDEV-301667 - Clear dispatch indicator signal flag
Change-Id: I9028df0bb73289791d169e7f064a1d0f615236a5
2024-12-12 21:20:05 -05:00
Sourabh Betigeri 03dbcd8ca7 SWDEV-440866 - [hip-roclr] Adds support to batch memory operations APIs
Change-Id: I5ac63a6626af8c2b4ac382c52dfe1aaf0b3716b8
2024-12-12 19:29:24 -05:00
Michael Xie cfcc743824 SWDEV-499997 - Unify ManagedBuffer and KernelArg buffer implementation
Change-Id: I95421c87904dd62d7ee214539a57c7bda1097ff4
2024-12-12 12:56:23 -05:00
German Andryeyev 14f58fc74d SWDEV-501757 - Clean-up signal creation
Use hsa_amd_signal_create() if settings.system_scope_signal_ is true.

Change-Id: I6d440155dfbcd5bf03658583a93827cb1c56537c
2024-12-11 09:57:50 -05:00
German Andryeyev f4b9d3b7bd SWDEV-501757 - Use signals without interrupts
In active wait mode use signals without interrupts by default and switch
to the interrupts only if a callback is required.

Change-Id: Ibcde8f7d44c70f8fb8fa5e0a7fdd8b08a2982a8e
2024-12-09 15:16:15 -05:00
Saleel Kudchadker 7863eb92dc SWDEV-497145 - Use rocr copyOnEngine API for staged copies
- Refactor blit code and clean ASAN instrumentation
- Use unified function for rocr copy
- Enable shader copy path for unpinned writeBuffer/readBuffer paths
- Set GPU_FORCE_BLIT_COPY_SIZE=16 which means we will use BLIT copy for
  pinned copies or unpinned H2D/D2H copies < 16KB

Change-Id: I42045cca79234b340dbf53dafb93044199736ae4
2024-12-04 13:38:13 -05:00
Sourabh Betigeri 2ca644cf22 Revert "SWDEV-440866 - [hip-roclr] Adds support to batch memory operations APIs"
This reverts commit bd5d8e9baf.

Reason for revert: hipInfo fails on windows. Updating llvm amd-mainline-closed

Change-Id: I57e1fa1945188b0bc0a799c4f3d540f2b7713003
2024-12-02 16:46:12 -05:00
Sourabh Betigeri bd5d8e9baf SWDEV-440866 - [hip-roclr] Adds support to batch memory operations APIs
Change-Id: I449ffca44bbb04d13348d112e896d603c70fd485
2024-11-30 17:54:32 -05:00
Anusha GodavarthySurya 9820480cbd SWDEV-491643 - AQL packets are captured for kernels disable sdma profiling
hsa_amd_profiling_async_copy_enable is taking 45us for the first call. Disable sdma profiling for enqueuing captured kernel packets and for accumulate command.
Change-Id: I80b51a58c46bccc9c1025e9331515f57c97b5a2a
2024-11-26 08:37:31 -05:00
Saleel Kudchadker 2273a1dbdc SWDEV-497886 - Fix unaligned size copy for kernel args
Change-Id: If6675b98178aeb35f376d6994555cbf941b048c3
2024-11-21 14:30:04 -05:00
German Andryeyev e2eeb20c00 SWDEV-494231 - Revert TS optimization
Runtime may use checkGpuTime() for the wait and not just for the GPU time queries. Hence, the call can't be skipped if profiling isn't enabled.
More changes are required for this optimization.

Change-Id: I79e8918312e755d75f0d26685f2fdc604a8ffb18
2024-11-19 10:17:38 -05:00
German Andryeyev 403f624bf8 SWDEV-486602 - Add tracking of HSA handlers
Add an atomic counter to track the outstanding HSA handlers.
Wait on CPU for the callbacks if the number exceeds the value
in DEBUG_HIP_BLOCK_SYNC env variable.

Change-Id: I95dc8c4bf0258c7e59411b7504220709ed6898c5
2024-10-25 15:20:50 -04:00
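The idea of an atomic counter for outstanding handlers, as described in the commit above, might look roughly like the sketch below. The class `HandlerTracker` and its methods are hypothetical names chosen for illustration; the limit is passed in directly, and reading it from DEBUG_HIP_BLOCK_SYNC is left out.

```cpp
#include <atomic>
#include <thread>

// Hypothetical sketch, not the runtime's implementation: counts handlers that
// were registered but have not run yet, and makes the submitting thread wait
// while the count exceeds a limit.
class HandlerTracker {
 public:
  explicit HandlerTracker(int limit) : limit_(limit) {}

  // Called by the submitting thread when a handler is registered.
  // Returns true if the caller had to wait for handlers to drain.
  bool OnSubmit() {
    outstanding_.fetch_add(1, std::memory_order_acq_rel);
    bool waited = false;
    // Wait on the CPU while the number of outstanding handlers exceeds the limit.
    while (outstanding_.load(std::memory_order_acquire) > limit_) {
      waited = true;
      std::this_thread::yield();
    }
    return waited;
  }

  // Called from the callback thread once a handler has finished.
  void OnHandlerDone() { outstanding_.fetch_sub(1, std::memory_order_acq_rel); }

  int outstanding() const { return outstanding_.load(std::memory_order_acquire); }

 private:
  std::atomic<int> outstanding_{0};
  const int limit_;
};
```

A submission that pushes the count above the limit spins until another thread calls `OnHandlerDone()`, so this sketch only makes sense when callbacks run on a separate thread.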
German Andryeyev 364dfb0ed1 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for the CPU status update of the submitted commands
every 50 calls. That allows the runtime to drain the commands and
destroy HSA signals.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9
2024-10-11 14:50:25 -04:00
Todd tiantuo Li 41dc4545fc SWDEV-472357 - support Rect copy with staging buffer for 2D & 3D memcpy in PAL
Change-Id: Ie32f3e5a6fa077f6b2db20fc1ab1e2e0da8344cb
2024-10-10 18:00:19 -04:00
Jaydeep Patel 5ccc140e1b SWDEV-485866 - Return OOM if stream creation fails due to insufficient memory.
Change-Id: I4e57ecc81921bde274bb6a4e0890f0fc6a17955a
2024-10-10 00:44:54 -04:00
Saleel Kudchadker d3d0ca5fc6 SWDEV-478065 - Revert "SWDEV-478065 - Embed host thread in shared_ptr"
This reverts commit 4b03017e8a.

Reason for revert: This blocks multithreaded callbacks

Change-Id: I9944417e4fb63c9eea2b286c828c7dfa621c4fe8
2024-10-04 19:19:28 -04:00
Vladana Stojiljkovic da5f1a6146 SWDEV-482086 - Fix hipGraphInstantiate leak
* In a scenario where kernel is launched with hipExtLaunchKernelGGL and stop event is used, hipGraphInstantiate leaks. Since stop event is used, profiling is enabled and Timestamp (ReferencedCountedObject) is created, but it doesn't get released.
* The idea behind this solution is that profiling should be disabled when command is captured, hence the timestamp should not be created. Because information about capturing isn't available when kernel command is created, packet capturing state is used to determine whether to create a timestamp or not.

Change-Id: Ia23adac4592ded4fb5e236acf99e12e729f63692
2024-09-29 11:36:53 -04:00
German Andryeyev 29cc678d8d SWDEV-483586 - Unblock staging H2D transfers
Although unpinned copies require synchronizations
in HIP, runtime can avoid syncs for H2D copies with
a staging buffer

Change-Id: If2203c6bc0cbd89742823688dc8e89e9acd873b2
2024-09-21 10:25:27 -04:00
Jatin Chaudhary 4b03017e8a SWDEV-478065 - Embed host thread in shared_ptr
This shows up in some valgrind runs. Make sure the resources are
released.

Change-Id: I34c25c00370a221585895655744831215136d5f4
2024-09-17 09:53:51 -04:00
Saleel Kudchadker abc80fcc2f SWDEV-301667 - Improve kernel logging
Change-Id: I4b2b1950e3ab7124fd41af9a92a677c48d6da5eb
2024-09-10 13:43:58 -04:00
Saleel Kudchadker 62a7fed90d SWDEV-481974 - Clear dependent signal bit for barrier value
Change-Id: I3ffda051fa8538970fbb1964beb1f538fce0782c
2024-09-10 13:43:04 -04:00
Rahul Manocha ddbd7039b0 SWDEV-478921 - Destroy Queue created by Coop Launch
Change-Id: I7f31ce05421479ff1de138cae26aafa071e956e2
2024-09-02 02:35:08 -04:00
Julia Jiang 417d3279f9 SWDEV-476623 - correct the format on the fix for clCopyImage
Change-Id: I3a3fb2eaa338ff4e298a43e583fcf94ec7cabdf6
2024-08-28 16:16:24 -04:00
Julia Jiang c3c41dae0d SWDEV-476623 - Fix test failures for clCopyImage
Change-Id: I971c5be98304bdbef0feec73e15ebd61a131b12f
2024-08-27 11:43:12 -04:00
kjayapra-amd 00eb038eec SWDEV-479620 - Change argument type to size_t from uint64_t in nonTemporalMemcpy function.
Change-Id: I31f8a2b00685789b027d78be40a9f82c235f51b9
2024-08-24 07:42:37 -04:00
Shane Xiao 3959b5be1e [SWDEV-479204] Fix the hipGraph AQL package fill issue
This patch fixes a potential issue where the AQL header is filled
before the AQL body. The HSA spec specifies "Packet processors may
process AQL packets after the packet format field is updated, but
before the doorbell is signaled."
However, the hipGraph AQL packet was given a valid header before
its body was filled, so the CP could receive an invalid AQL body.

Change-Id: I84af798c19ee2b8805ba19732b0eabdea2958a96
2024-08-21 21:49:11 -04:00
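The ordering rule from the commit above (fill the body first, publish the header last) can be sketched as follows. The packet layout, the constants, and `PublishPacket` are all hypothetical simplifications for illustration; the real AQL packet types come from the HSA headers and are not reproduced here.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical header values for this sketch only.
constexpr std::uint16_t kDemoInvalid = 1;
constexpr std::uint16_t kDemoKernelDispatch = 2;

// Hypothetical, simplified packet slot. The header starts out "invalid" so a
// packet processor that peeks at the slot early will not consume it.
struct DemoPacket {
  std::atomic<std::uint16_t> header{kDemoInvalid};
  std::uint16_t setup = 0;
  std::uint64_t kernel_object = 0;
  std::uint64_t kernarg_address = 0;
};

// Fills a queue slot in the safe order.
inline void PublishPacket(DemoPacket& slot, std::uint64_t kernel_object,
                          std::uint64_t kernarg_address) {
  // 1. Write the body while the header still marks the slot as invalid.
  slot.setup = 1;
  slot.kernel_object = kernel_object;
  slot.kernarg_address = kernarg_address;
  // 2. Publish the header last. The release store orders the body writes
  //    before the header becomes visible as a valid packet type.
  slot.header.store(kDemoKernelDispatch, std::memory_order_release);
}
```

A consumer that loads the header with acquire ordering and sees the dispatch type is then guaranteed to see the completed body as well.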
German Andryeyev 9db52f9a46 SWDEV-470612 - Add the optimized multistream path
- Added the optimized multi-stream path in graph execution. It uses a fixed number of async streams in the execution
- Optimize the launch latency, where command
creation and execution are done at the same time
- Optimize the scheduling to use fewer barriers and waiting signals if
the same queue can be detected
- The new path is controlled by the DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable, where 0 will use the original path and any other
value will force the number of asynchronous queues for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single-queue async
execution in graphs (applicable for Navi families only)

Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e
2024-08-02 14:19:44 -04:00
Anusha GodavarthySurya bd3a35bde1 SWDEV-468424 - Add support to capture multiple AQL Packets
=> Added support to capture multiple AQL Packets.
=> Added Interface to callback to hip runtime from rocclr to allocate
kernel args from the graph kernel arg pool.
=> Enabled Support to capture memset node.

Change-Id: I7e1c2ba06927459e024653058af142bd82192c43
2024-08-01 23:55:51 -04:00
German Andryeyev 18187cd8fe SWDEV-470612 - Avoid processing internal signals
If only external signals were provided, then just process it
without adding internal signals

Change-Id: Iaefd65d0f8b0a64b9f6a864a9bd73de20a29dfa4
2024-07-25 10:08:16 -04:00
Anusha GodavarthySurya 346da4bb40 SWDEV-468424 - hipgraph capture memset node
Capture AQL packets during GraphInstantiation and enqueue AQL packets during graph launch.

Added support to capture single graph memset node.
Capture support for memset node is currently disabled.
Memset capture will be enabled when capture for multiple packets is supported.

Change-Id: I14dfbc41731025cc3a548a730558915def3fa384
2024-07-19 23:52:50 -04:00
Ioannis Assiouras ea50d2c0c2 SWDEV-469825 - Modified the kernel argument readback to use a pointer to volatile
This change modifies the readback mechanism to use a pointer to volatile
instead of a volatile pointer. This ensures that the compiler does not
optimize away the read operation.

Change-Id: I79ff925d615aa8cc4f950e8ff4b7e608fcb179a4
2024-07-09 17:28:47 -04:00
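The distinction drawn in the commit above, a pointer to volatile versus a volatile pointer, is easy to get backwards, so a small sketch may help. The function `ReadBackLastWord` is a hypothetical stand-in and is not the runtime's readback code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads back the last word of an argument buffer.
// `count` must be at least 1.
inline std::uint32_t ReadBackLastWord(const std::uint32_t* args, std::size_t count) {
  // Pointer to volatile: the pointed-to word is accessed as volatile, so the
  // load is an observable access that the compiler has to perform.
  const volatile std::uint32_t* p = args + (count - 1);
  return *p;
}

// For contrast, a volatile pointer would be declared as
//   const std::uint32_t* volatile q = args;
// Here only the pointer variable q is volatile; a read through *q is an
// ordinary load that the optimizer may remove if its result is unused.
```

The position of `volatile` relative to `*` decides which object is qualified: before the `*` it applies to the pointee, after the `*` it applies to the pointer itself.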