* Use one side stream per process
* Handle multiple GPUs per process
* Reset stream when not found
* Address review comments
* Fix missing mutex initializer

[ROCm/rccl commit: 185e78a8f0]