bd63e5045c
If the required scratch allocation is too large, ROCr will attempt to reduce it by lowering the dispatch's targeted occupancy. The reduction loop however was prone to overflow if waves_per_cu was not a multiple of waves_per_group. Ensure no overflow by aligning waves_per_cu to waves_per_group. On GC 9.4.3 dGPU, dispatches with a large grid size and a waves_per_group of e.g. 16 may require to reduce occupancy such that waves_per_cu is less than waves_per_group to ensure the allocation size is small enough. Allow this while also ensuring the tmpring scratch wave count is kept divisible by the number of SEs per XCC. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Change-Id: Ie4016dcd8166a9ae69e9decc26a3eec882b49480