1. Enable the pitch workaround.
2. When copy image is used, we don't need to create the custom pitch image.
3. wrtBackImageBuffer_ stores a device memory object, not an amd image object.
Tests:
Conformance kernel read/write tests pass with this code change.
Change-Id: I7dca3127adde6ac83e78dd270a2256ebed55c60d
When we're aligning rowPitch to imagePitchAlignment_, rowPitch is in pixels,
but imagePitchAlignment_ is in bytes, so we end up overaligning the pitch.
Convert imagePitchAlignment_ to pixels before doing any logic.
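
A minimal sketch of the unit conversion described above, not the actual
runtime code. The helper names (alignUp, alignRowPitchPixels) and the
elementSizeBytes parameter are hypothetical, and the sketch assumes the
byte alignment is a whole multiple of the pixel size:

```cpp
#include <cassert>
#include <cstddef>

// Round value up to the next multiple of alignment (alignment must be > 0).
static std::size_t alignUp(std::size_t value, std::size_t alignment) {
  return ((value + alignment - 1) / alignment) * alignment;
}

// Hypothetical helper: rowPitchPixels is in pixels, pitchAlignBytes is in
// bytes (like imagePitchAlignment_), elementSizeBytes is the size of one pixel.
static std::size_t alignRowPitchPixels(std::size_t rowPitchPixels,
                                       std::size_t pitchAlignBytes,
                                       std::size_t elementSizeBytes) {
  assert(elementSizeBytes > 0 && pitchAlignBytes > 0);
  // Assumption: the byte alignment divides evenly into whole pixels.
  assert(pitchAlignBytes % elementSizeBytes == 0);
  // Convert the alignment to pixels first, so both operands share one unit.
  const std::size_t pitchAlignPixels = pitchAlignBytes / elementSizeBytes;
  return alignUp(rowPitchPixels, pitchAlignPixels);
}
```

With a 256-byte alignment and 4-byte pixels, a 100-pixel row aligns to 128
pixels, whereas aligning the pixel count directly to 256 would give 256.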
Change-Id: Ia5ab9d54bed150fe974e86b060dbadc196165b29
[hipclang-vdi-rocm][perf] ~45% to 50% performance drop on
rocBLAS_int8 test
- Enable AMD_OPT_FLUSH optimization by default to match HCC
- Disable CPU writes to GPU memory on boards with large BAR,
because it requires HDP flush tracking.
- Enable L2 cache on kernel arguments, because L2 will be
invalidated on memory reuse.
Change-Id: I124cf250bdd4d19c523ce542c163813828f8fbdc