62fee66ff2
The existing workgroup calculation logic for GWS initialization is
incorrect. It adds together the workgroup counts of each dimension,
leading to a major undercount in 2D and 3D kernels. An (x,y,z) kernel
uses x * y * z blocks, not x + y + z.
In addition, the previous logic was incorrect for the case of launching
a single-threaded kernel. It calculated 0 workgroups, which caused GWS
to be initialized to -1.
Change-Id: I1bb20a0d5b6e0cc10ac55901c28d8f93aac61c09
[ROCm/clr commit: 54d1d69c0a]