* working tests for a single message size
* move call_RCCL routine to StandaloneUtils, create .cpp file for StandaloneUtils so that it can be included in several tests
* simplify test invocation
* remove unnecessary logs and exit from ncclCommRegister
* set expected results for allGather
* skip test if nranks doesn't match number of gpus, call getAndDistributeNCCLid only from parent process
* fix improper size of expected-results vector
* Removing unused changes.
* Refactored to create a new file for the forked collectives call, as StandaloneUtils is for the Standalone tests. Renamed the functions to be slightly more accurate and follow existing naming conventions.
* Apply suggestions from code review
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
---------
Co-authored-by: isaki001 <isakioti@banff-pla-r27-38.pla.dcgpu>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
Co-authored-by: Corey Derochie <corey.derochie@amd.com>
* initial checkin
* resolve cr comments
* resolve the build issue
* fix the data correctness issue
* update fp8 header file and update the unit test for fp8 support
* remove fp16 from fp8 headers
* fix ut issue and catch up with the latest code from develop
* update according to cr comments
* update ut according to cr comments
* update num floats for each SumPostDiv from 4 to 6
* update fp8 header file name
* fix the typo
* Refactoring unit tests to improve performance
* Spawning child processes during InitComms instead of on TestBed construction
* Temporarily disabling graph unit tests