Introduce a VirtualMemObj map to differentiate between virtual address
ranges and the actual physical memory backing them.
This is needed because a single VA range can be backed by several
physical memory chunks.
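A minimal sketch of the idea, assuming a map keyed by the base address of each VA range. The names VaRange, PhysicalChunk and VirtualMemObjMap, and the integer physicalId handle, are hypothetical and are not the ROCclr definitions:

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical types for illustration only.
struct PhysicalChunk {
  std::uint64_t offset;  // offset of the chunk inside the VA range
  std::size_t size;      // size of the chunk in bytes
  int physicalId;        // stand-in for a handle to physical memory
};

struct VaRange {
  std::size_t size;                   // total size of the reserved VA range
  std::vector<PhysicalChunk> chunks;  // physical memory mapped into the range
};

class VirtualMemObjMap {
 public:
  // Reserve a VA range starting at base; no physical memory is attached yet.
  void reserve(std::uint64_t base, std::size_t size) {
    ranges_[base] = VaRange{size, {}};
  }

  // Map a physical chunk at va. Fails if va is not inside a reserved range
  // or the chunk does not fit. Overlap between chunks is not checked here.
  bool mapChunk(std::uint64_t va, std::size_t size, int physicalId) {
    auto it = findRange(va);
    if (it == ranges_.end()) return false;
    const std::uint64_t offset = va - it->first;
    if (offset + size > it->second.size) return false;
    ranges_[it->first].chunks.push_back({offset, size, physicalId});
    return true;
  }

  // Return the id of the physical memory backing va, or -1 if unmapped.
  int physicalAt(std::uint64_t va) const {
    auto it = findRange(va);
    if (it == ranges_.end()) return -1;
    const std::uint64_t offset = va - it->first;
    for (const PhysicalChunk& c : it->second.chunks) {
      if (offset >= c.offset && offset - c.offset < c.size) return c.physicalId;
    }
    return -1;
  }

 private:
  // Find the reserved range that contains va, if any.
  std::map<std::uint64_t, VaRange>::const_iterator findRange(std::uint64_t va) const {
    auto it = ranges_.upper_bound(va);
    if (it == ranges_.begin()) return ranges_.end();
    --it;
    return (va - it->first < it->second.size) ? it : ranges_.end();
  }

  std::map<std::uint64_t, VaRange> ranges_;
};
```

In this sketch one reserved range can hold any number of chunks, so a lookup by virtual address resolves first to the range and then to the physical memory behind it.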
Change-Id: Ie2a972b4faf3f7d552cfa53e77898f80ad75740a
[ROCm/clr commit: 905088e4e7]
Implement map/unmap for the PAL backend.
Create commands, since PAL uses the IQueue to map/unmap.
Change-Id: I97e26a7d28ae5e10774c9ca65307153100945621
[ROCm/clr commit: 67657d6099]
This change improves error handling; it does not fix the underlying issue.
Before this change, the hostcallBuffer_ pointer was initialized at the end
of the create() function. If create() failed and returned early,
hostcallBuffer_ was left uninitialized, and the uninitialized pointer could
cause an access violation when the object was destroyed.
This change moves the initialization of the pointer into the constructor.
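The pattern can be sketched as follows. Only the member name hostcallBuffer_ comes from the message above; the class name, the failEarly parameter and the char buffer type are assumptions made for illustration:

```cpp
#include <cstddef>

// Hypothetical class; it only illustrates the initialization order.
class VirtualDevice {
 public:
  // Initialize the pointer in the constructor so that it is null on every
  // path, including an early return from create().
  VirtualDevice() : hostcallBuffer_(nullptr) {}

  ~VirtualDevice() {
    // Safe even if create() failed: delete[] on a null pointer is a no-op.
    delete[] hostcallBuffer_;
  }

  VirtualDevice(const VirtualDevice&) = delete;
  VirtualDevice& operator=(const VirtualDevice&) = delete;

  // failEarly simulates an error that makes create() return before the
  // buffer is allocated. create() is expected to be called at most once.
  bool create(bool failEarly, std::size_t bufferSize) {
    if (failEarly) return false;
    hostcallBuffer_ = new char[bufferSize];
    return true;
  }

  bool hasHostcallBuffer() const { return hostcallBuffer_ != nullptr; }

 private:
  char* hostcallBuffer_;
};
```

With the initialization in the constructor, destroying an object whose create() failed never touches an indeterminate pointer.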
Change-Id: I7fb6e764eb0547196dca03db237e49d3ff0fd06a
[ROCm/clr commit: 5528812aa9]
Pass the active queue for transfers in the cache coherency layer.
This allows the device transfer queue to be used only when an active
queue isn't available, because using the device transfer queue from
another active queue may cause a deadlock.
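The selection rule can be sketched as a small helper. Queue and selectTransferQueue are hypothetical names, not the runtime's own types or functions:

```cpp
// Hypothetical stand-in for a runtime queue object.
struct Queue {
  const char* name;
};

// Prefer the caller's active queue; fall back to the device transfer queue
// only when no active queue is available.
inline Queue* selectTransferQueue(Queue* activeQueue, Queue* deviceTransferQueue) {
  return (activeQueue != nullptr) ? activeQueue : deviceTransferQueue;
}
```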
Change-Id: Ifbe7e0303b77dbf6eeda3939ffbc25a3df7472de
[ROCm/clr commit: 95d55fdfa8]
The metadata in code object version 5 (CO5) is an extension of CO3 and CO4.
Add detection of the new fields and program them during
the setup of the kernel arguments.
Change-Id: I27e58df77320ad00f4f16d35912668db803826af
[ROCm/clr commit: be6a06384e]
Reuse the FillMemory function, which should fix the cache syncs from the host.
Change-Id: Ieebec5fc3ed3a322b88d5187c8dca4805ec6f84b
[ROCm/clr commit: 24442be35a]
HIP should be built with HSAIL support disabled.
Currently HSAILProgram::info() and VirtualGPU::buildKernelInfo() expose
ACL interfaces directly. This should not be allowed.
Change-Id: Iae15d4f19be16806826f2f6cb600752c11f97fc1
[ROCm/clr commit: bbe6246f19]
This is part 2 of the change, for the PAL backend.
The parent buffer sometimes has newer data than the sub buffer or image.
We always need to copy the data into copybuffer in the pitch workaround.
Tests:
clinfo
Conformance tests: all images test, info, API, basic.
Internal runtime tests
Change-Id: I97d876ac75b240e69b48244be4c9e522db24f8ac
[ROCm/clr commit: 0de4b2962c]
This is part 2 of the code change for PAL.
The copy image workaround could be used recursively by the ROCclr blit kernel.
Avoid this situation by using a stack variable.
Tests:
clinfo.
Conformance tests - basic, API, info, and all images tests.
Internal runtime tests - all passed.
Change-Id: I3c822e55398cdf35c2c4a46ed9fc20fbee7cc908
[ROCm/clr commit: 090cf6c6d3]
Since the majority of the Hostcall implementation now sits in the
common layer, the PAL backend only needs to invoke it. One thing
that is still missing, though, is HSA signal support.
The newly added pal::Signal class is a light emulation of what HSA
signals provide. The current implementation is just enough to get
Hostcall working, but it can be expanded in the future if needed to
fully emulate HSA signals.
The major difference for now between the PAL and ROCm hostcall
implementations is that PAL doesn't support blocking signals. This will
be enabled in the near future. For now, use an active wait for PAL.
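A minimal sketch of a signal emulated with an atomic value and an active wait. This is not the pal::Signal implementation; the sketch namespace, the method names and the choice of memory orderings are assumptions:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

namespace sketch {

// Hypothetical light-weight signal: one 64-bit value, no blocking wait.
class Signal {
 public:
  explicit Signal(std::int64_t initial = 0) : value_(initial) {}

  void store(std::int64_t v) { value_.store(v, std::memory_order_release); }

  std::int64_t load() const { return value_.load(std::memory_order_acquire); }

  // Active wait: poll until the value differs from `current`, then return
  // the new value. The thread never sleeps on the signal; it only yields
  // its time slice between polls.
  std::int64_t waitWhileEqual(std::int64_t current) const {
    std::int64_t v;
    while ((v = load()) == current) {
      std::this_thread::yield();
    }
    return v;
  }

 private:
  std::atomic<std::int64_t> value_;
};

}  // namespace sketch
```

A blocking signal would let the waiting thread sleep until it is woken; the polling loop here trades CPU time for simplicity.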
Change-Id: I746557354ab9d71a7d4a31f9320fcc2fee5aee7f
[ROCm/clr commit: 99e8ac55cd]
The existing workgroup calculation logic for GWS initialization is
incorrect. It tries to add together workgroups across dimensions,
leading to major under-count in 2D and 3D kernels. An (x,y,z) kernel
uses x * y * z blocks, not x + y + z.
In addition, the previous logic was incorrect for the case of launching
a single-threaded kernel. It calculated 0 workgroups, leading to
initializing GWS to -1.
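The corrected arithmetic can be sketched as below. The function names are hypothetical, and gwsInitValue assumes that GWS is initialized to the workgroup count minus one, which is inferred from the "-1" case described above:

```cpp
#include <cassert>
#include <cstdint>

// Workgroups along one dimension, rounding up so that a partial (or
// single-threaded) group still counts as one workgroup.
inline std::uint64_t groupsInDim(std::uint64_t globalSize, std::uint64_t localSize) {
  assert(localSize > 0);
  return (globalSize + localSize - 1) / localSize;
}

// Total workgroups is the product over the three dimensions, not the sum.
inline std::uint64_t totalWorkgroups(const std::uint64_t global[3],
                                     const std::uint64_t local[3]) {
  std::uint64_t total = 1;
  for (int i = 0; i < 3; ++i) {
    total *= groupsInDim(global[i], local[i]);
  }
  return total;
}

// Assumption: the GWS initial value is (workgroups - 1).
inline std::int64_t gwsInitValue(const std::uint64_t global[3],
                                 const std::uint64_t local[3]) {
  return static_cast<std::int64_t>(totalWorkgroups(global, local)) - 1;
}
```

For a 4x4x4 launch with 2x2x2 workgroups the product gives 8 workgroups, whereas summing the per-dimension counts would give 6.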
Change-Id: I1bb20a0d5b6e0cc10ac55901c28d8f93aac61c09
[ROCm/clr commit: 54d1d69c0a]
With the PAL_ALWAYS_RESIDENT flag, memory objects are resident at
allocation time, so there is no need to make them resident again before
submit. Also, we should never evict anything with this setting, or we'll
generate a VM fault.
Change-Id: Ieacc6af88ab4e09c20efd94100e148b2502e1d70
[ROCm/clr commit: fd09a7a23c]
- Make sure only one GPU barrier is issued per dispatch
when memory tracking is disabled
Change-Id: I974569ab42a8835304a2930eef87b561a3750327
[ROCm/clr commit: 481cecec78]
Add MS HWS support. PAL reports just one compute engine
in that mode, and the runtime needs extra logic to detect RT queues.
Change-Id: I011f1f1b18dec6a7195a4f1fe939f8029bc269ae
[ROCm/clr commit: 622c714165]
Remove a workaround for CS_PARTIAL_FLUSH added in CL#1495187,
since PAL no longer uses CS_PARTIAL_FLUSH.
Change-Id: I03edc7595459e19aad33b2b0901f0ebe4754d310
[ROCm/clr commit: 1d25343af8]