Windows kills threads on exit without any notification. However, the
runtime can still destroy the VirtualGPU object from the host thread as
part of HostQueue destruction.
This change also forces RGP trace transfer on the last capture without
any delays.
Change-Id: I768e87e99e1d23a021e63c12f36e450817743759
The OCL runtime uses WGP mode and reports the total CU count in WGPs.
Realtime values are still in CUs, which can make test results misleading.
Report realtime values in WGPs and convert them to CUs for KMD.
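
A minimal sketch of the unit conversion, assuming one WGP groups two CUs.
The names kCusPerWgp, wgpToCu and cuToWgp are hypothetical and are not
taken from the runtime:

```cpp
#include <cstdint>

// Assumption: one WGP (workgroup processor) groups two CUs.
constexpr uint32_t kCusPerWgp = 2;

// Convert a realtime WGP count (the unit the runtime reports) to CUs (the unit KMD expects).
constexpr uint32_t wgpToCu(uint32_t wgps) { return wgps * kCusPerWgp; }

// Convert a CU count to WGPs, rounding up so that a partial WGP is still counted.
constexpr uint32_t cuToWgp(uint32_t cus) { return (cus + kCusPerWgp - 1) / kCusPerWgp; }
```

Keeping the conversion in one place means every reported value stays in WGPs
and only the KMD-facing call sees CUs.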
Change-Id: I90b82615640734dd655be2b613ccac3cb8483239
HIP can't rely on the resource tracking used in OCL and requires a different, explicit sync.
Make sure ROCCLR syncs compute only when SDMA is used, and vice versa.
The new logic allows enabling CPDMA without unnecessary waits.
Change-Id: Ib9d1788cfd5afa5ea2fec4c96a37d8b9c4d0059d
Introduce a VirtualMemObj map, which is needed to differentiate
between virtual address ranges and actual physical memory.
A whole VA range can be backed by several physical memory
chunks.
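
A rough sketch of the idea, not the runtime's actual data structure:
PhysicalChunk, VaChunkMap and memId are hypothetical names. It shows how
one VA range can resolve to different physical chunks depending on the
address:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for one physical memory chunk mapped into a VA range.
struct PhysicalChunk {
  uint64_t va;   // start virtual address of this chunk
  size_t size;   // chunk size in bytes
  int memId;     // hypothetical handle of the backing physical memory
};

// Hypothetical map keyed by chunk start address; a single reserved VA range
// can hold several physical chunks.
class VaChunkMap {
 public:
  void add(const PhysicalChunk& chunk) { chunks_[chunk.va] = chunk; }

  // Returns the chunk that contains addr, or nullptr if addr is unmapped.
  const PhysicalChunk* find(uint64_t addr) const {
    auto it = chunks_.upper_bound(addr);  // first chunk starting after addr
    if (it == chunks_.begin()) return nullptr;
    --it;                                 // last chunk starting at or before addr
    return (addr - it->second.va < it->second.size) ? &it->second : nullptr;
  }

 private:
  std::map<uint64_t, PhysicalChunk> chunks_;
};
```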
Change-Id: Ie2a972b4faf3f7d552cfa53e77898f80ad75740a
This code change improves error handling.
It does not fix the issue itself.
Before this change, the hostcallBuffer_ pointer was initialized at the end of
the create() function. If create() fails and returns early, the
hostcallBuffer_ pointer is never initialized. The uninitialized pointer can
cause an access violation when the object is destroyed.
This change moves the initialization of the pointer into the constructor.
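
The pattern can be sketched as follows. ExampleDevice, the failEarly
parameter and the malloc-backed buffer are hypothetical stand-ins and do
not reflect the runtime's real class:

```cpp
#include <cstdlib>

// Hypothetical reduction of the pattern; not the runtime's class.
class ExampleDevice {
 public:
  // Initialize the pointer up front so the destructor is safe
  // even if create() returns early.
  ExampleDevice() : hostcallBuffer_(nullptr) {}

  ~ExampleDevice() { std::free(hostcallBuffer_); }  // free(nullptr) is a no-op

  bool create(bool failEarly) {
    if (failEarly) return false;  // early return: hostcallBuffer_ stays nullptr
    hostcallBuffer_ = std::malloc(64);
    return hostcallBuffer_ != nullptr;
  }

  bool hasHostcallBuffer() const { return hostcallBuffer_ != nullptr; }

 private:
  void* hostcallBuffer_;
};
```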
Change-Id: I7fb6e764eb0547196dca03db237e49d3ff0fd06a
Pass the active queue for transfers in the cache coherency layer.
That allows using the device transfer queue only in
cases where an active queue isn't available, because using the device
transfer queue from another active queue may cause a deadlock.
Change-Id: Ifbe7e0303b77dbf6eeda3939ffbc25a3df7472de
Metadata in Codeobject version 5 is an extension of CO3 and CO4.
Add the detection of the new fields and program them in
the setup of the kernel arguments.
Change-Id: I27e58df77320ad00f4f16d35912668db803826af
HIP should be built with HSAIL support disabled.
Currently HSAILProgram::info() and VirtualGPU::buildKernelInfo() expose
ACL interfaces directly. This should not be allowed.
Change-Id: Iae15d4f19be16806826f2f6cb600752c11f97fc1
This is part 2 of the change. This is for PAL backend.
The parent buffer sometimes has newer data than the sub buffer or image.
We always need to copy the data into copybuffer in pitch workaround.
Tests:
clinfo
Conformance tests: all images test, info, API, basic.
Internal runtime tests
Change-Id: I97d876ac75b240e69b48244be4c9e522db24f8ac
This is part 2 of the code change for PAL.
The copy image workaround could be used recursively by the ROCclr blit kernel.
Avoid this situation by using a stack variable.
Tests:
clinfo.
Conformance tests - basic, API, info, and all images tests.
Internal runtime tests - all passed.
Change-Id: I3c822e55398cdf35c2c4a46ed9fc20fbee7cc908
Since the majority of the Hostcall implementation now sits in the
common layer, the PAL backend simply needs to invoke it. One thing
that is missing though is HSA signal support.
The newly added pal::Signal class is a light emulation of what HSA
signals provide. The current implementation is just enough to get
Hostcall working, but it can be expanded in the future if needed to
fully emulate HSA signals.
The major difference for now between the PAL and ROCm hostcall
implementations is that PAL doesn't support blocking signals. This will
be enabled in the near future. For now, use an active wait for PAL.
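
To illustrate what an active wait means here, a minimal sketch assuming a
signal is just an atomic value that the waiter polls. PollingSignal and
waitNotEqual are hypothetical names and do not mirror the pal::Signal
interface:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical light signal: holds a value and supports only polling waits.
class PollingSignal {
 public:
  explicit PollingSignal(int64_t initial) : value_(initial) {}

  void store(int64_t v) { value_.store(v, std::memory_order_release); }
  int64_t load() const { return value_.load(std::memory_order_acquire); }

  // Active wait: poll until the value differs from `old`, yielding between polls.
  int64_t waitNotEqual(int64_t old) const {
    int64_t v;
    while ((v = load()) == old) {
      std::this_thread::yield();
    }
    return v;
  }

 private:
  std::atomic<int64_t> value_;
};
```

A blocking signal would put the waiter to sleep until it is woken, whereas
this version keeps the waiting thread busy polling.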
Change-Id: I746557354ab9d71a7d4a31f9320fcc2fee5aee7f
The existing workgroup calculation logic for GWS initialization is
incorrect. It tries to add together workgroups across dimensions,
leading to major under-count in 2D and 3D kernels. An (x,y,z) kernel
uses x * y * z blocks, not x + y + z.
In addition, the previous logic was incorrect for the case of launching
a single-threaded kernel. It calculated 0 workgroups, leading to
initializing GWS to -1.
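
A sketch of the corrected count, assuming the number of workgroups per
dimension is the global size divided by the local size, rounded up.
workgroupCount and its parameter layout are hypothetical:

```cpp
#include <cstdint>

// Number of workgroups in a launch: the product over all dimensions of
// ceil(global / local). Hypothetical helper; the parameter layout is an assumption.
inline uint64_t workgroupCount(const uint32_t global[3], const uint32_t local[3], int dims) {
  uint64_t count = 1;
  for (int i = 0; i < dims; ++i) {
    // Ceiling division in 64 bits so the addition cannot overflow.
    count *= (static_cast<uint64_t>(global[i]) + local[i] - 1) / local[i];
  }
  return count;
}
```

With this form a single work-item launch yields one workgroup rather than
zero, so a value derived as the count minus one stays non-negative.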
Change-Id: I1bb20a0d5b6e0cc10ac55901c28d8f93aac61c09
With the PAL_ALWAYS_RESIDENT flag, memory objects are resident at allocation time, so there is no need to make them resident again before submit.
Also, we should never evict anything with this setting, or we'll generate a VM fault.
Change-Id: Ieacc6af88ab4e09c20efd94100e148b2502e1d70
Add MS HWS support. PAL reports just one compute engine
in that mode, and the runtime needs extra logic to detect RT queues.
Change-Id: I011f1f1b18dec6a7195a4f1fe939f8029bc269ae
Remove a workaround for CS_PARTIAL_FLUSH added in CL#1495187,
since PAL no longer uses CS_PARTIAL_FLUSH.
Change-Id: I03edc7595459e19aad33b2b0901f0ebe4754d310