* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info
* Adding ability to switch between memset/memcpy
* Adding standalone TransferBench tool