3e650467fa
* Use one side stream per process
* Handle multiple GPUs per process
* Reset stream when not found
* Address review comments
* Fix missing mutex initializer
[ROCm/rccl commit: 185e78a8f0]