b70cac2b33
Improve MPI example to avoid confusing the number of processes with the total number of GPUs.
https://github.com/NVIDIA/nccl-tests/issues/54#issuecomment-1212023369
[ROCm/rccl-tests commit: 2cbb968101]