In the fillBuffer shader, if an MMIO register is written with two 32-bit
writes, the write can get dropped. It has to be a single 64-bit write.

Add an optimization to fillBuffer so that it issues 64-bit and 16-bit
writes.

Change-Id: I3aa78e027898f8ae01e9c8f09004615673720c2b