* Enabling clique for any XGMI-connected topology, adding tuning
* Updating CHANGELOG for clique tuning
* Re-working clique barrier system to work on multi-process / multi-gpu
* Fixing temp file creation/deletion for Clique kernel mode.
* Refactoring of MP unit tests; include bugfixes and general support for any number of GPUs
* GroupCall MP UT properly quits when too many devices specified
* MP UT will programmatically set NCCL_COMM_ID if not specified; updated install script