Improve throughput by about 20%. Also remove P2P over PCIe, which was left enabled at the initial release.

Signed-off-by: Wenkai Du <wenkai.du@amd.com>

[ROCm/rccl commit: f45566a8bd]