* working tests for a single message size
* move call_RCCL routine StandaloneUtils, create .cpp file for StandaloneUtils so that it can be included in several tests
* simplify test invocation
* remove unecessary logs and exit from ncclCommRegister
* set expected results for allGather
* skip test if nranks doesn't match number of gpus, call getAndDistributeNCCLid only from parent process
* fix improper size of expected-results vector
* Removing unused changes.
* Refactored to create a new file for the forked collectives call, as StandaloneUtils is for the Standalone tests. Renamed the functions to be slightly more accurate and follow existing naming conventions.
* Apply suggestions from code review
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
---------
Co-authored-by: isaki001 <isakioti@banff-pla-r27-38.pla.dcgpu>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
Co-authored-by: Corey Derochie <corey.derochie@amd.com>
[ROCm/rccl commit: 3398fa78fe]