The __amd_streamOpsWrite blit kernel in device-libs takes only 3 arguments,
so remove the unused 4th argument (sizeBytes).
Change-Id: I81cc1107f8b424bf58558c93a2495a1b878aef91
[ROCm/clr commit: e643406caa]
The kernel accepts uint64_t, but in a 32-bit OCL build size_t is only 32 bits wide.
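
A minimal sketch of the width mismatch described above. The helper name `packSizeArg` and the raw argument buffer are assumptions made for illustration, not the ROCclr interface; the point is only that widening to a fixed 64-bit type before the copy keeps the argument 8 bytes on every host ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not the ROCclr API): copies a byte count into a raw
// kernel-argument buffer as a fixed 8-byte value.
inline std::size_t packSizeArg(unsigned char* argBuffer, std::size_t hostSize) {
  assert(argBuffer != nullptr);
  // Widen explicitly: on a 32-bit host sizeof(std::size_t) is 4, while the
  // kernel parameter is declared as a 64-bit integer.
  const std::uint64_t kernelSize = static_cast<std::uint64_t>(hostSize);
  std::memcpy(argBuffer, &kernelSize, sizeof(kernelSize));
  return sizeof(kernelSize);  // always 8, independent of the host ABI
}
```

Passing `sizeof(size_t)` bytes instead would hand the kernel only 4 of the 8 bytes it reads in a 32-bit build.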
Change-Id: I6fe37d2e5e69c7bd62d7b1bd4cace758758b3482
[ROCm/clr commit: b3171d08e6]
The new copy kernel can limit the number of launched workgroups.
It can copy in chunks of 16 bytes or 4 bytes.
The workgroup size is increased to 512 or 1024.
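
One way the launch parameters above could be derived, as a sketch under stated assumptions: `CopyPlan` and `planCopy` are hypothetical names, the 16-byte path is assumed to require 16-byte-aligned addresses, the 4-byte path is assumed to be used otherwise, and the caller is assumed to pick the workgroup size and the workgroup cap. The actual selection logic in the change may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical launch description for the copy kernel.
struct CopyPlan {
  std::uint32_t chunkBytes;     // 16 or 4 bytes moved per loop iteration
  std::uint32_t workgroupSize;  // 512 or 1024 work-items
  std::uint64_t numWorkgroups;  // capped by maxWorkgroups
  std::uint64_t tailBytes;      // bytes left over after whole chunks
};

inline CopyPlan planCopy(std::uint64_t srcAddr, std::uint64_t dstAddr,
                         std::uint64_t sizeBytes, std::uint32_t workgroupSize,
                         std::uint64_t maxWorkgroups) {
  assert(workgroupSize == 512 || workgroupSize == 1024);
  assert(maxWorkgroups > 0);
  // Assumption: use 16-byte chunks only when both addresses are 16-byte aligned.
  const bool aligned16 = ((srcAddr | dstAddr) % 16) == 0 && sizeBytes >= 16;
  const std::uint32_t chunk = aligned16 ? 16u : 4u;
  const std::uint64_t chunks = sizeBytes / chunk;
  // Workgroups needed for one chunk per work-item, then apply the cap; with
  // fewer workgroups each work-item loops over several chunks.
  const std::uint64_t needed = (chunks + workgroupSize - 1) / workgroupSize;
  return {chunk, workgroupSize, std::min(needed, maxWorkgroups), sizeBytes % chunk};
}
```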
Change-Id: Ic3fefa2d5bda6afebd1acc4d41ad310b138af6df
[ROCm/clr commit: ed4e1fec98]
- Add the new fillBuffer kernel, which can launch a limited
number of workgroups for a memory fill operation
- Switch memory fill to 16-byte writes by default
- Allow limiting the number of workgroups with DEBUG_CLR_LIMIT_BLIT_WG
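
The effect of a workgroup limit on a fill can be simulated on the host. This is a sketch, not the fillBuffer kernel itself: `Chunk16`, `fillLimited`, and the way the limit is passed in are assumptions, and how DEBUG_CLR_LIMIT_BLIT_WG is actually read is not shown. With fewer work-items than elements, each work-item strides over the buffer until everything is written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 16-byte element, matching the default 16-byte write width.
struct Chunk16 { std::uint32_t w[4]; };

// Host-side simulation; returns the number of workgroups "launched".
inline std::uint64_t fillLimited(std::vector<Chunk16>& dst, const Chunk16& pattern,
                                 std::uint32_t workgroupSize,
                                 std::uint64_t workgroupLimit) {
  assert(workgroupSize > 0 && workgroupLimit > 0);
  const std::uint64_t count = dst.size();
  const std::uint64_t needed = (count + workgroupSize - 1) / workgroupSize;
  const std::uint64_t launched = std::min(needed, workgroupLimit);
  const std::uint64_t totalItems = launched * workgroupSize;
  // Each simulated work-item starts at its global id and advances by the
  // total number of launched work-items (a grid-stride loop).
  for (std::uint64_t gid = 0; gid < totalItems; ++gid) {
    for (std::uint64_t i = gid; i < count; i += totalItems) {
      dst[i] = pattern;
    }
  }
  return launched;
}
```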
Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e
[ROCm/clr commit: f1dc81f427]
Add format updating for dstMemory.
Update the formats of srcMemory and dstMemory separately.
Change-Id: I1692b92d417bbd742d562679f218ebf8ca532e92
[ROCm/clr commit: 7624a48de9]
HIP can't rely on the resource tracking used in OCL and requires a different, explicit sync.
Make sure ROCCLR syncs compute only when SDMA is used, and vice versa.
The new logic allows enabling CPDMA without unnecessary waits.
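
The cross-engine rule can be pictured with a small state tracker. This is an illustrative sketch only: `Engine`, `EngineSyncTracker`, and `submit` are hypothetical names, and the real ROCclr bookkeeping is assumed to be more involved. The idea shown is that a wait is requested only when the other engine has outstanding work.

```cpp
#include <cassert>

enum class Engine { Compute = 0, Sdma = 1 };

// Hypothetical tracker: remembers which engine has unsynchronized work.
class EngineSyncTracker {
 public:
  // Records a submission and returns true when the caller must first wait
  // for the other engine to finish.
  bool submit(Engine engine) {
    const int self = static_cast<int>(engine);
    assert(self == 0 || self == 1);
    const int other = 1 - self;
    const bool mustWait = pending_[other];
    pending_[other] = false;  // the wait drains the other engine
    pending_[self] = true;    // this engine now has outstanding work
    return mustWait;
  }

 private:
  bool pending_[2] = {false, false};
};
```

Back-to-back submissions to the same engine never report a wait in this model; only a switch between engines with pending work does.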
Change-Id: Ib9d1788cfd5afa5ea2fec4c96a37d8b9c4d0059d
[ROCm/clr commit: ff6b4db70b]
If we don't create the __amd_rocclr_gwsInit kernel, we still want
to create the rest of the image-related blit kernels.
Change-Id: I8bc4645f9f9116eeecbb8b22e981ac4d520f3121
[ROCm/clr commit: 55a0cf0b0c]
In the fillBuffer shader, two 32-bit writes to an MMIO
register can get dropped. The register has to be written with a single 64-bit write.
Add an optimization to fillBuffer to issue 64-bit and 16-bit writes.
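
A sketch of how a store width might be chosen so that a pattern is never split across two narrower stores. The function `selectWriteWidth` and its alignment rules are assumptions for illustration, not the logic in the change: it returns the widest store that is at least as wide as the pattern and evenly divides both the destination address and the fill size, or 0 when no such width exists.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical selector: bytes written per store, or 0 if the pattern would
// have to be split into narrower stores.
inline std::uint32_t selectWriteWidth(std::uint64_t dstAddr, std::uint64_t sizeBytes,
                                      std::uint32_t patternSize) {
  // Assumption: pattern sizes are powers of two up to 16 bytes.
  assert(patternSize != 0 && (patternSize & (patternSize - 1)) == 0 &&
         patternSize <= 16);
  for (std::uint32_t width : {16u, 8u, 4u, 2u, 1u}) {
    if (width < patternSize) break;  // never split one pattern across stores
    if (dstAddr % width == 0 && sizeBytes % width == 0) return width;
  }
  return 0;
}
```

For an 8-byte pattern written to an 8-byte-aligned register, this yields a single 64-bit store rather than two 32-bit ones.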
Change-Id: I3aa78e027898f8ae01e9c8f09004615673720c2b
[ROCm/clr commit: 21ba34d0fe]
When HIP_ENABLE_DEFERRED_LOADING=0, many global variables are
referenced before they have been initialized. This patch
uses constexpr to initialize global constant variables at compile
time.
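
The initialization-order point can be shown in a few lines. All names here (`kMaxBlitArgs`, `kBlitNames`, `EarlyConsumer`) are hypothetical and do not come from the patch. A constexpr global is constant-initialized, so code that runs during dynamic initialization of other globals already sees its final value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants, fixed at compile time.
constexpr std::size_t kMaxBlitArgs = 8;
constexpr const char* kBlitNames[] = {"copyBuffer", "fillBuffer"};
constexpr std::size_t kBlitCount = sizeof(kBlitNames) / sizeof(kBlitNames[0]);
static_assert(kBlitCount == 2, "table size is known to the compiler");

inline const char* blitName(std::size_t index) {
  assert(index < kBlitCount);
  return kBlitNames[index];
}

// An early consumer: its constructor runs during dynamic initialization and
// reads the constants, which are guaranteed to be ready by then.
struct EarlyConsumer {
  std::size_t seen;
  EarlyConsumer() : seen(kBlitCount * kMaxBlitArgs) {}
};
static const EarlyConsumer gEarly;
```

Had the constants been ordinary dynamically initialized globals in another translation unit, the value read by `EarlyConsumer` would depend on initialization order.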
Change-Id: I9d538b7abc6a0ce700ec3332b97fc144db5fc1ef
[ROCm/clr commit: fdef6f722f]
The last commit to replace the cl_* types with standard types
failed to correct issues introduced in the PAL and GPU backend.
Change-Id: I926997234dfbe346fc165a7bc4e1b8aabab7bac5
[ROCm/clr commit: b81816f482]