8145c4f3b8
* mscclpp patch apply clip patch and set allreduce8 blocks from 512 to 1024
* add compilation flag for enabling/disabling clipping in mscclpp
* change flag name for consistency, set flag to OFF
* add compilation flag in rccl for enabling clipping in mscclpp
* set 1024 threads for mscclpp allreduce8 only for bfloat16
* fix improper description for ENABLE_MSCCLPP_CLIP flag
* Revert "Merge branch 'clip-patch' of https://github.com/isaki001/rccl into clip-patch" This reverts commit 6e31857a9db98314b8a748eb024f2c3699ebe2d5, reversing changes made to 193f4caa8ffa78b4e056893212fd8344aa14e937.
* update clip remove-clip.patch for rebase