* Initial commit of all_reduce_only support
* Working AllReduce only build
* Removing printfs and restoring release build
* Restore P2P index
* Updates to build_allreduce_only mode
* Cleaning up macro ifdefs
* Enabling clique for any XGMI-connected topology, adding tuning
* Updating CHANGELOG for clique tuning
* Re-working clique barrier system to work on multi-process / multi-gpu