1211790607
* Add new implementation of direct send/recv reduce scatter
* Resolved conflicts
* Add multi-channel support to the reduction kernel of direct reduce scatter and adjust the offset into the buffer to use multiple channels.
* Resolve validation issue when the number of elements is not divisible by the number of channels, which left some elements unaccounted for in the reduction.
* Fix proxy hang
* Set maxSrcs to 64 in reduceCopy
* Optimize multi-channel code
* Fix validation issue on single-node MI300
* Tune the message size range for 2, 4, and 8 nodes
* Move Direct RS into separate kernel
* Add copyright
* Resolve review comments
* Resolve review comments
* Fix merge build issue
* Revert "Move Direct RS into separate kernel"
* Address review comments
* Address review comments

---------

Co-authored-by: KawtharShafie <kawtharshafie@gmail.com>
Co-authored-by: Ghadeer Alabandi <abandiga@gmail.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
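The validation fix for element counts that are not divisible by the number of channels can be illustrated with a small sketch. It assumes a contiguous block split in which the first `count % nChannels` channels each take one extra element; `ChannelSlice`, `channelSlice`, and `coversAll` are hypothetical names used only for illustration, not RCCL's actual code, and the real reduction kernel may partition its buffer differently.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-channel slice of a reduce-scatter chunk (illustration only,
// not RCCL's real data structure).
struct ChannelSlice {
  size_t offset;  // index of the first element this channel reduces
  size_t size;    // number of elements this channel reduces
};

// Split `count` elements across `nChannels` contiguous slices. The first
// (count % nChannels) channels each take one extra element, so no element is
// dropped when count is not divisible by nChannels.
ChannelSlice channelSlice(size_t count, int nChannels, int channelId) {
  assert(nChannels > 0 && channelId >= 0 && channelId < nChannels);
  const size_t n = static_cast<size_t>(nChannels);
  const size_t id = static_cast<size_t>(channelId);
  const size_t base = count / n;  // elements every channel gets
  const size_t rem = count % n;   // leftover elements to spread out
  const size_t size = base + (id < rem ? 1 : 0);
  const size_t offset = id * base + (id < rem ? id : rem);
  return {offset, size};
}

// Check that the slices tile [0, count) exactly: each slice starts where the
// previous one ended and the last one ends at count.
bool coversAll(size_t count, int nChannels) {
  size_t next = 0;
  for (int c = 0; c < nChannels; ++c) {
    ChannelSlice s = channelSlice(count, nChannels, c);
    if (s.offset != next) return false;
    next += s.size;
  }
  return next == count;
}
```

With `count = 10` and 4 channels, the slices hold 3, 3, 2, and 2 elements starting at offsets 0, 3, 6, and 8. Spreading the remainder one element at a time keeps per-channel work within one element of each other; a simpler alternative is to hand the whole remainder to the last channel.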