* Add gpt-fast PyTorch all-reduce benchmark script
* Update README instructions
* Minor changes

[ROCm/rccl commit: 5f2b88bc28]