1fb66c3e1e
thread_rank() returns the thread's index within the block, so it can
exceed the warp size. Limit the range to the current warp size.
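
For illustration only, a minimal host-side sketch of the idea, not the
code touched by this commit. The helper name lane_from_block_rank and
its parameters are hypothetical, and reducing with a modulo is an
assumption about how the range is limited:

```cpp
// Hypothetical helper (assumption, not part of the HIP API): maps a
// block-wide thread rank onto a lane index in the range [0, warp_size).
constexpr unsigned lane_from_block_rank(unsigned block_rank,
                                        unsigned warp_size) {
    // A block-wide rank can be as large as the block size minus one;
    // the modulo keeps only the position inside the current warp.
    return block_rank % warp_size;
}
```

With a warp size of 64, block rank 70 maps to lane 6; with a warp size
of 32, the same rank maps to lane 6 as well (70 = 2 * 32 + 6).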
Change-Id: Ib5c9831236096485cf99ba7ab0b911a3b10de31c
[ROCm/clr commit: bd7d40a4d8]