ebcac26530
* Add gpt-fast PyTorch all-reduce benchmark script
* Update README instructions
* Minor changes
[ROCm/rccl commit: 5f2b88bc28]