HIP can't rely on the resource tracking used in OCL and requires separate explicit sync.
Make sure ROCCLR syncs compute only when SDMA is used, and vice versa.
The new logic allows enabling CPDMA without unnecessary waits.
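
The rule above can be sketched as a small tracker that requests a wait only when consecutive submissions switch engines. This is an illustration under assumptions, not the ROCclr code: the names Engine, EngineSync and needsWait are hypothetical.

```cpp
// Hypothetical engine identifiers; not taken from the ROCclr sources.
enum class Engine { Compute, Sdma };

// Hypothetical tracker: remembers the engine used by the previous
// submission and reports whether the next one has to wait for it.
class EngineSync {
 public:
  // Returns true only when earlier work exists and it ran on a
  // different engine than `next`.
  bool needsWait(Engine next) {
    const bool wait = hasPending_ && last_ != next;
    last_ = next;
    hasPending_ = true;
    return wait;
  }

 private:
  Engine last_ = Engine::Compute;
  bool hasPending_ = false;
};
```

Back-to-back submissions on the same engine return false, so no extra wait is inserted between them.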
Change-Id: Ib9d1788cfd5afa5ea2fec4c96a37d8b9c4d0059d
If we don't create the __amd_rocclr_gwsInit kernel, we still want
to create the rest of the image-related blit kernels.
Change-Id: I8bc4645f9f9116eeecbb8b22e981ac4d520f3121
For the fillBuffer shader, if there are two 32-bit writes to an MMIO
register, a write can get dropped. It has to be a single 64-bit write.
Add an optimization to fillBuffer to use 64-bit and 16-bit writes.
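
A host-side sketch of the width selection, assuming the widest store is chosen from the alignment of the destination offset and the fill size. The helpers fillWriteWidth and widenPattern32 are hypothetical and do not reproduce the actual shader.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pick the widest store, in bytes, that both the
// destination offset and the fill size are multiples of.
std::size_t fillWriteWidth(std::uint64_t dstOffset, std::uint64_t size) {
  const std::uint64_t bits = dstOffset | size;
  if ((bits & 7) == 0) return 8;  // one 64-bit write per element
  if ((bits & 3) == 0) return 4;
  if ((bits & 1) == 0) return 2;  // 16-bit writes
  return 1;
}

// Hypothetical helper: replicate a 32-bit pattern into both halves of a
// 64-bit value so it can be stored with a single 64-bit write.
std::uint64_t widenPattern32(std::uint32_t pattern) {
  return (static_cast<std::uint64_t>(pattern) << 32) | pattern;
}
```

With an 8-byte aligned offset and size, the fill uses one 64-bit store per element instead of two 32-bit stores.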
Change-Id: I3aa78e027898f8ae01e9c8f09004615673720c2b
When HIP_ENABLE_DEFERRED_LOADING=0, many global variables are
referenced before they have been initialized. This patch
uses constexpr to initialize global constant variables at compile
time.
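
A minimal sketch of the technique, with hypothetical constant names that are not taken from the sources. A constexpr global is constant-initialized, so it already holds its value when code runs during early static initialization.

```cpp
#include <cstddef>

// Hypothetical constants for illustration only. constexpr forces
// constant initialization, so no dynamic initializer has to run first.
constexpr std::size_t kKiB = 1024;
constexpr std::size_t kMiB = 1024 * kKiB;
constexpr std::size_t kDefaultStagingSize = 4 * kMiB;

// Checked by the compiler; the build fails if the value is not a
// compile-time constant.
static_assert(kDefaultStagingSize == 4u * 1024u * 1024u,
              "kDefaultStagingSize must be constant-initialized");

// Safe to call from initializers of other globals, because the
// constants above never depend on initialization order.
std::size_t stagingChunks(std::size_t bytes) {
  return (bytes + kDefaultStagingSize - 1) / kDefaultStagingSize;
}
```

A plain const global with a non-constant initializer would instead be set at run time, in an order that is unspecified across translation units.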
Change-Id: I9d538b7abc6a0ce700ec3332b97fc144db5fc1ef
The last commit to replace the cl_* types with standard types
failed to correct issues introduced in the PAL and GPU backend.
Change-Id: I926997234dfbe346fc165a7bc4e1b8aabab7bac5