ba5a9a5395
Do not use __ockl_activelane_u32() to calculate the index of the lane within the mask: it gives the wrong index for divergent masks that have other bits set before the associated lane.
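
A host-side sketch of the difference, not the code from this change. It assumes a 64-bit lane mask and assumes the active-lane query counts only the currently executing lanes below the caller. The helper names (popcount64, lanes_below, index_in_mask, active_lane_rank) are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (number of set bits).
inline unsigned popcount64(std::uint64_t v) {
    unsigned n = 0;
    while (v != 0) {
        v &= v - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Bit pattern covering all lanes strictly below `lane` (lane must be 0..63).
inline std::uint64_t lanes_below(unsigned lane) {
    return (std::uint64_t{1} << lane) - 1;
}

// Index of `lane` within `mask`: the number of mask bits set below it.
inline unsigned index_in_mask(std::uint64_t mask, unsigned lane) {
    return popcount64(mask & lanes_below(lane));
}

// Hypothetical model of an active-lane query (assumption): it only sees the
// lanes that are executing right now (`exec`), not the caller-supplied mask.
inline unsigned active_lane_rank(std::uint64_t exec, unsigned lane) {
    return popcount64(exec & lanes_below(lane));
}

// Lanes 0..3 are in the mask, but after divergence only lanes 2 and 3 run.
inline void divergent_mask_example() {
    const std::uint64_t mask = 0xF;  // bits 0..3 set
    const std::uint64_t exec = 0xC;  // bits 2..3 set
    assert(index_in_mask(mask, 2) == 2);     // two mask bits sit below lane 2
    assert(active_lane_rank(exec, 2) == 0);  // no executing lane sits below lane 2
}
```

Under these assumptions the two values agree only when every mask bit below the lane belongs to a lane that is also executing, which is why counting the mask bits directly is the safer choice.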
[ROCm/clr commit: 1a8d766836]