ef6d75b3ee
* Make sure the target device is used for MSCCL
* Enable single-process mode by default so MSCCL can be used in multi-threaded (MT) runs
* Create a per-rank state when GPUs share a thread
[ROCm/rccl commit: 03a3ef3c34]