Revert collective chunk and slice steps to avoid drop in throughput [ROCm/rccl commit: 998ab83675]