53b5692d07cf3975df9f6efea3370c23b24da4d7
If the required scratch allocation is too large, ROCr will attempt to
reduce it by lowering the dispatch's targeted occupancy. The reduction
loop however was prone to overflow if waves_per_cu was not a multiple of
waves_per_group. Ensure no overflow by aligning waves_per_cu to
waves_per_group.
On GC 9.4.3 dGPU, dispatches with a large grid size and a
waves_per_group of e.g. 16 may require to reduce occupancy such that
waves_per_cu is less than waves_per_group to ensure the allocation size
is small enough. Allow this while also ensuring the tmpring scratch wave
count is kept divisible by the number of SEs per XCC.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ie4016dcd8166a9ae69e9decc26a3eec882b49480
[ROCm/ROCR-Runtime commit: bd63e5045c]
توضیحات
No description provided
Languages
C++
67.5%
C
20.6%
Python
6.6%
CMake
3.4%
Shell
0.6%
دیگر
1.1%