* Added RCCL version using rccl-tests
* Added function to get RCCL version from rccl-tests
* Removed whitespace
* Added RCCL version
* Updated README and fixed formatting
* Removed debug prints
[ROCm/rccl commit: 3dc0478722]
* Initial Script ready for review
* Added RCCL-tests and RCCL versions
* Added output folder and README
* Base format built
* Added ROCm version
* Added function to center titles and VRAM information
* Added HIP version
* Cleaned formatting
* UCX version and MPI version
* Added NUMA balancing
* Added rocminfo
* Removed notes
* Changed regex for Broadcom NIC
* Removed note by the ACS info
* Added Hostname to summary and details
* Print summary to terminal
* Added argparse
* Added flags and README
* Added GPU ID
* Fixed spelling
* Renamed script again
* Added file descriptor and locked mem checks
* Added file descriptor and locked mem checks
* Removed extra spaces from summary table
* Print output file location
* Removed sudo in code and ACS flag
[ROCm/rccl commit: 4ba94d6662]
* Add another Rome model and override
* Fix bug
* Fix typo
* Add ring
* Update ring
* Fix model matching
* Clean up
* Clean up
* Reverse rings for NCCL_RINGS input
* Only reverse NCCL_RINGS for ring graph
* Fix mapping issue when using NCCL_RINGS
* Add NCCL_RINGS_REMAP to handle inconsistent net names
[ROCm/rccl commit: 532b70afb6]
* adding rocprof parser script
* adding support for multiple JSON files
* adding PyTorch profiler script
* removing filtering from PyTorch log
* addressing the comments and adding the feature to parse all kernels
* completing the report for torch profiler
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl commit: 519843d2cf]
- Modifies the ring creation algorithm to be friendlier to rail-optimized topologies (should not affect classic fabric topologies)
[ROCm/rccl commit: 4cb62f999a]
* Add another Rome model
* Add option to force enable intranet on single node
* Limit p2p channels to number of ranks
* Refine p2p channels handling
[ROCm/rccl commit: ef499c4810]
* Add 1H16P GPU model
* Implement NIC identification and remapping
* Revert "Sort IB devices based on device name (#413)"
This reverts commit de0c586bad.
* Fix permute and check order
* Correction on IB speed reporting
* Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)"
This reverts commit fa690c47a0.
[ROCm/rccl commit: 5c8380ff5b]
* Add another Rome model
* Add gfx908 4P3L models and support
* Revert "Use cached value for detecting GDR support only once"
This reverts commit 0108a1219d.
* Skip using ibverb for GPU direct RDMA detection
* Fine tune one Rome model
[ROCm/rccl commit: a51e4071e3]