Since the majority of the Hostcall implementation now sits in the
common layer, the PAL backend only needs to invoke it. The one missing
piece is HSA signal support.
The newly added pal::Signal class is a lightweight emulation of what HSA
signals provide. The current implementation is just enough to get
Hostcall working, but it can be extended in the future to fully
emulate HSA signals if needed.
For now, the major difference between the PAL and ROCm Hostcall
implementations is that PAL doesn't support blocking signals. Blocking
support will be enabled in the near future; until then, PAL uses an
active wait.
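A minimal sketch of what "light emulation with an active wait" can look like, assuming a signal is just a 64-bit value that one side stores and the other side polls. The class name EmulatedSignal and its methods are hypothetical and are not the pal::Signal interface.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical signal emulation (NOT the real pal::Signal): a 64-bit value
// updated by a producer and polled by a consumer.
class EmulatedSignal {
 public:
  explicit EmulatedSignal(std::int64_t initial) : value_(initial) {}

  std::int64_t load() const { return value_.load(std::memory_order_acquire); }

  void store(std::int64_t v) { value_.store(v, std::memory_order_release); }

  // Active (busy) wait: spin until the value differs from `expected`.
  // There is no blocking/sleeping path, matching the PAL limitation above.
  std::int64_t waitWhileEqual(std::int64_t expected) const {
    std::int64_t v;
    while ((v = load()) == expected) {
      std::this_thread::yield();  // Let other threads run while spinning.
    }
    return v;
  }

 private:
  std::atomic<std::int64_t> value_;
};
```

In practice the producer would be another thread or the device; a blocking implementation would replace the spin loop with an OS wait primitive.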
Change-Id: I746557354ab9d71a7d4a31f9320fcc2fee5aee7f
[ROCm/clr commit: 99e8ac55cd]
HIP requires the AccessedBy query to return results for all devices,
but ROCr can process only one device per query. Hence, send a query for
each available device and accumulate the results in the runtime.
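The fan-out/accumulate pattern can be sketched as follows, assuming the per-device ROCr query is modeled as a callback that answers for a single device. PerDeviceQuery and queryAccessedByAll are hypothetical names, not ROCr or HIP APIs.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a query that can answer for one device only:
// returns true if `device` has AccessedBy set for the memory range.
using PerDeviceQuery = std::function<bool(int device)>;

// Issue one query per available device and collect every device that
// matches, so the caller gets an answer covering all devices.
std::vector<int> queryAccessedByAll(const std::vector<int>& devices,
                                    const PerDeviceQuery& query) {
  std::vector<int> accessedBy;
  for (int dev : devices) {
    if (query(dev)) {
      accessedBy.push_back(dev);
    }
  }
  return accessedBy;
}
```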
Change-Id: I082f9adb8e31c775a8ad1bf7a5af37440ef4bd16
[ROCm/clr commit: e9c484d1ce]
Enabling DebugVMID requires a specific call order during PAL
initialization: StartLateDeviceInit() must be called before
CommitSettingsAndInit().
Change-Id: I7385a8cc89e7a8ad97a6b56ad6acbd2cf2f29728
[ROCm/clr commit: dee99ca807]
HIP tests require HIP callbacks to be processed in a separate thread.
This change uses the HSA signal callback thread to make sure a HIP
callback is executed asynchronously.
Also, process the callback before changing the status of the command.
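The ordering requirement (run the callback off the caller's thread, then publish the new status) can be sketched like this. A plain std::thread stands in for the HSA signal callback thread, and Command, Status, and completeAsync are hypothetical names for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>

// Hypothetical status values; the real runtime has its own command states.
enum class Status { Submitted, Complete };

struct Command {
  std::atomic<Status> status{Status::Submitted};
  std::function<void()> callback;
};

// Run the user callback on a separate thread and only mark the command
// complete after the callback has returned, so a waiter that sees Complete
// knows the callback already ran.
std::thread completeAsync(Command& cmd) {
  return std::thread([&cmd] {
    if (cmd.callback) {
      cmd.callback();
    }
    cmd.status.store(Status::Complete, std::memory_order_release);
  });
}
```

Publishing the status last is the point: if the status changed first, a host thread waiting on the command could return before the callback finished.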
Change-Id: Icef85d0e0f808663882cf6881ff1be3e5eca29ac
[ROCm/clr commit: 7f32d0b425]
- Don't notify if the batch is empty, because that means
the current command was already processed.
- Disable the pinning optimization to avoid a race condition on stall.
- TS marker submission requires an extra AQL barrier
to track the status.
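The first and third items can be sketched as two small rules, assuming a batch is a list of pending command IDs and the queue is a list of packet kinds. PacketType, shouldNotify, and submitTsMarker are hypothetical names, not runtime code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical AQL packet kinds for illustration.
enum class PacketType { KernelDispatch, Barrier };

// An empty batch means the current command was already processed,
// so no completion notification is issued.
bool shouldNotify(const std::vector<int>& batch) { return !batch.empty(); }

// A timestamp (TS) marker appends an extra AQL barrier packet; that
// barrier's completion is what lets the marker's status be tracked.
void submitTsMarker(std::vector<PacketType>& queue) {
  queue.push_back(PacketType::Barrier);
}
```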
Change-Id: I17eff4ad12ac66cfe1bb44048bebb1891805279d
[ROCm/clr commit: 24299e25bd]
1. Fix the memory size used when releasing.
2. Make sure only device memory is counted.
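Both fixes can be illustrated with a small accounting sketch, assuming the size is recorded at allocation time and reused on release, and host allocations are skipped. DeviceMemoryTracker and MemKind are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical memory kinds for illustration.
enum class MemKind { Device, Host };

// Tracks bytes of device memory in use.
class DeviceMemoryTracker {
 public:
  void onAlloc(const void* ptr, std::size_t size, MemKind kind) {
    if (kind != MemKind::Device) return;  // Only count device memory.
    sizes_[ptr] = size;
    used_ += size;
  }

  void onRelease(const void* ptr) {
    auto it = sizes_.find(ptr);
    if (it == sizes_.end()) return;  // Not a tracked device allocation.
    used_ -= it->second;             // Subtract the size recorded at allocation.
    sizes_.erase(it);
  }

  std::size_t used() const { return used_; }

 private:
  std::unordered_map<const void*, std::size_t> sizes_;
  std::size_t used_ = 0;
};
```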
Change-Id: Ib4dcda79f313c4ee9cc1c7bab53f8076bce5f583
[ROCm/clr commit: 639d67866c]
Skip notification for markers only with direct dispatch,
since those markers are always blocking.
Change-Id: I6bb17650f73371dae6e29c59fd6bb2012cc062fd
[ROCm/clr commit: a9b0e20d26]
Since the allocation can be a suballocation, we should print both the
VA range for the allocation and the underlying memory object.
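A sketch of such a log line, assuming each allocation knows its own VA range and the range of the memory object backing it. The Allocation struct, describe(), and the output format are hypothetical, not the runtime's actual log format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical description of a (possibly sub-)allocation.
struct Allocation {
  std::uintptr_t va;         // Start of this allocation's VA range.
  std::size_t size;          // Size of this allocation in bytes.
  std::uintptr_t backingVa;  // Start of the underlying memory object.
  std::size_t backingSize;   // Size of the underlying memory object.
};

// Format both ranges so a suballocation can be related to its parent object.
std::string describe(const Allocation& a) {
  char buf[160];
  std::snprintf(buf, sizeof(buf),
                "alloc [0x%llx, 0x%llx) in mem object [0x%llx, 0x%llx)",
                static_cast<unsigned long long>(a.va),
                static_cast<unsigned long long>(a.va + a.size),
                static_cast<unsigned long long>(a.backingVa),
                static_cast<unsigned long long>(a.backingVa + a.backingSize));
  return buf;
}
```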
Change-Id: Ic9c707bbb78113b366d1b2c688e6fd33bdc8fd94
[ROCm/clr commit: 9e8a2f3266]
Direct dispatch doesn't insert extra barriers for markers if an
AQL barrier was already the last issued command.
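The rule can be sketched by remembering the last issued packet, assuming a marker only needs some barrier at the tail of the queue to track completion. DirectDispatchQueue and its methods are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical AQL packet kinds for illustration.
enum class PacketType { KernelDispatch, Barrier };

// Hypothetical queue that records every packet it issues.
class DirectDispatchQueue {
 public:
  void dispatchKernel() { issue(PacketType::KernelDispatch); }

  // A marker needs a barrier to track completion, but if the last issued
  // packet is already an AQL barrier, no extra barrier is inserted.
  void enqueueMarker() {
    if (packets_.empty() || packets_.back() != PacketType::Barrier) {
      issue(PacketType::Barrier);
    }
  }

  const std::vector<PacketType>& packets() const { return packets_; }

 private:
  void issue(PacketType p) { packets_.push_back(p); }
  std::vector<PacketType> packets_;
};
```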
Change-Id: I00fbc658547d83dd3ee64ec391ed50e5f8a08e30
[ROCm/clr commit: 0587fb7450]
The settings need to be populated as early as possible; otherwise, the
dummy context is not created properly.
Change-Id: Iede0066308bb601dc68164e894775a646a0372f1
[ROCm/clr commit: 263173914f]
GPU waits have noticeable overhead: on compute because of the extra
AQL barrier packet, and on SDMA because of power-saving features. This
change introduces a 30 us wait on the CPU for the case where the
application issues tiny operations.
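A minimal sketch of a bounded CPU wait, assuming completion is visible to the host as an atomic flag and that the caller falls back to a GPU-side wait when the budget expires. Only the 30 us value comes from the text above; cpuWaitBriefly and the fallback policy are assumptions.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

// Spin on the CPU for up to `budget` waiting for `done`. Returns true if the
// operation finished within the budget, so the caller can skip the GPU wait;
// on false, the caller would fall back to a GPU-side wait (assumption).
bool cpuWaitBriefly(const std::atomic<bool>& done,
                    std::chrono::microseconds budget = std::chrono::microseconds(30)) {
  const auto deadline = std::chrono::steady_clock::now() + budget;
  while (std::chrono::steady_clock::now() < deadline) {
    if (done.load(std::memory_order_acquire)) return true;
  }
  return done.load(std::memory_order_acquire);  // One last check at the deadline.
}
```

Keeping the budget short bounds the CPU time burned when an operation turns out not to be tiny.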
Change-Id: I761ba3af595f3f48544980058a9077dda15aa5f9
[ROCm/clr commit: ac387f9b03]