__amd_streamOpsWrite blitkernel in device-libs has only 3 args.
so getting rid of the 4th unused arg (sizeBytes)
Change-Id: I81cc1107f8b424bf58558c93a2495a1b878aef91
The new copy kernel can limit the number of launched workgoups.
It can copy in chunks of 16 bytes or 4 bytes.
Workgoup size is increased to 512 or 1024
Change-Id: Ic3fefa2d5bda6afebd1acc4d41ad310b138af6df
- Add the new fillBuffer kernel, which allows to launch a limited
number of workgroups for memory fill operation
- Switch fill memory to 16 bytes write by default
- Allow to limit the workgroups with DEBUG_CLR_LIMIT_BLIT_WG
Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e
For the fillBuffer shader, if there are two 32bit writes to a MMIO
register, it can get dropped. It has to be a single 64bit write.
Add optimization to fillBuffer to write 64bit and 16bit writes.
Change-Id: I3aa78e027898f8ae01e9c8f09004615673720c2b