* Added Python test runner to execute RCCL tests
* Disabled output capture to avoid hangs
* Add RCCL_TEST_MPI_HOSTFILE env var to get the hostfile
* Converted test_type to boolean gtest flag
* Removed unused return values
* Added support for using a custom RCCL library
* Removed json output
* Updates to test_runner: added num_gpus field
* Address review comments
* Prepend env vars for single-node, single-process executions
* Added separate enums for exit and result codes
* Update configuration files
* Moved configurations to their own dir
* Address review comments
* Update tools/scripts/test_runner/README.md
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
---------
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
[ROCm/rccl commit: 0c2c61d2f1]
* Create the output dir if it doesn't exist, whether the path is the default or user-provided
* Fix npkit_dump_dir on npkit_trace_generator.py
---------
Co-authored-by: BertanDogancay <bertan.dogancay@gmail.com>
[ROCm/rccl commit: aec4f0a659]
* Support gfx950 in topo_expl
* Fix dependencies and fetch fmt from sources
* Remove third_party folder in make clean
* Add empty target when fmt is found
* Add MI350 example
* Update README.md
---------
Co-authored-by: isaki001 <ioannissakiotis@gmail.com>
[ROCm/rccl commit: dfad51e3c9]
* Support fused all reduce and elementwise operations
Add "acc" parameter to RCCL Replayer logs
Add flag indicating availability of the new API
* Fix Recorder json parsing
* Remove unreachable code
* Remove extra acc pointer check
* Revert "[DEVICE] Adding ability to choose unroll factor at runtime (#1734)"
This reverts commit 4cadf3597c.
* Use noinline to reduce kernel linking time
* Don't use noinline for gfx942 and gfx950 to avoid perf regression
---------
Co-authored-by: AtlantaPepsi <timhu102@amd.com>
Co-authored-by: BertanDogancay <bertan.dogancay@gmail.com>
[ROCm/rccl commit: 9a4213356d]
* Increased max stack size to 640
* Added new binary for executing unit tests
Added new unit tests for argcheck.cc and alt_rsmi.cc files
Changed how unit tests are executed so that they also cover static methods:
a bash script converts static functions and variables to non-static on the
fly, restricted to the debug build type.
[ROCm/rccl commit: 275fdd43c1]
* First version of new replayer, with comments on future TODOs
* plus minor fixes for UT
* Updated the recorder's output format, especially the binary format, to meet the replayer's needs
[ROCm/rccl commit: ba97c9c18b]
* Internal RCCL/NCCL functionality exposed when RCCL_EXPOSE_STATIC is enabled
* Algo/protocol/max channels can be obtained with the new RCCL API
* Introduce rccl_static and rccl_static_inline macros to work around functions that are not externally visible in core source files such as enqueue.cc
* Add usage example in topo-explorer tool
[ROCm/rccl commit: 82afb2bcfe]
* Added RCCL version using rccl-tests
* Added function to get RCCL version from rccl-tests
* Removed whitespace
* Added RCCL version
* Updated readme and fixed formatting
* Removed debug prints
[ROCm/rccl commit: 3dc0478722]
* Fix compiler issues due to broken compatibility
* Fix segfault, pass rank instead of busid, and add a pointer to cover a new algorithm
[ROCm/rccl commit: aace4e27f8]
* Removed gfx940 and gfx941
* Update "gfx94" to "gfx942" in init.cc
* Updated remaining "gfx94" references to "gfx942"
* Update filenames and variables from gfx940 to gfx942
---------
Co-authored-by: akolliasAMD <akollias@amd.com>
[ROCm/rccl commit: 6505639cf4]
* Initial script ready for review
* Added RCCL-tests and RCCL versions
* Added output folder and README
* Base format built
* Added ROCm version
* Added function to center titles, and added VRAM information
* Added HIP version
* Cleaned formatting
* Added UCX and MPI versions
* Added NUMA balancing
* Added rocminfo
* Removed notes
* Changed regex for Broadcom NIC
* Removed note by the ACS info
* Added hostname to summary and details
* Print summary to terminal
* Added argparse
* Added flags and readme
* Added GPU ID
* Fixed spelling
* Renamed script again
* Added file descriptor and locked mem checks
* Removed extra spaces from summary table
* Print output file location
* Removed sudo usage from code and removed the ACS flag
[ROCm/rccl commit: 4ba94d6662]
* Add topologies for 16-GPU gfx942 SuperNode
- Add GigaIO topologies to tools/topo_expl for dev and testing
- Add GigaIO Columba 16-GPU romeModel and adjust topology
matching algorithm in rome_models for 16-GPU systems
- Fix bug that caused Rome model matching to fail when using a subset
of system resources (i.e., when ROCR_VISIBLE_DEVICES is set)
- Fixes for topo_expl
* Fix bug with 1H16P
[ROCm/rccl commit: a05329bd0d]
MSCCL 1-shot XMLs may produce different output values on different ranks.
Disable them for now to avoid undefined behavior in applications.
[ROCm/rccl commit: 62d10fdc25]