* mscclpp patch apply clip patch and set allreduce8 blocks from 512 to 1024
* add compilation flag for enabling/disabling clipping in mscclpp
* change flag name for consistency, set flag to OFF
* add compilation flag in rccl for enabling clipping in mscclpp
* set 1024 threads for mscclpp allreduce8 only for bfloat16
* fix improper description for ENABLE_MSCCLPP_CLIP flag
* Revert "Merge branch 'clip-patch' of https://github.com/isaki001/rccl into clip-patch"
This reverts commit 6e31857a9db98314b8a748eb024f2c3699ebe2d5, reversing
changes made to 193f4caa8ffa78b4e056893212fd8344aa14e937.
* update clip remove-clip.patch for rebase
[ROCm/rccl commit: 8145c4f3b8]