4 Коммитов

Автор SHA1 Сообщение Дата
AbandiGa acf0bc1c6e added copyright (#1635)
[ROCm/rccl commit: 7a84c5dbb0]
2025-04-14 09:46:18 -05:00
Nikhil-Nunna ee9da06c80 Fetching RCCL version from shared object file. (#1569)
* added rccl version using rccl-tests

* Added function to get rccl version from rccl-tests

* removed whitespace

* Added rccl version

* Updated readme and fixed formatting

* removed debug prints

[ROCm/rccl commit: 3dc0478722]
2025-04-08 14:09:39 -05:00
Nilesh M Negi 5f3134db50 [TOOLS] Update rcclDiagnostics script (#1557)
* [TOOLS] Update rcclDiagnostics script

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Fix typo in valid_marketing_names list

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 159587be5c]
2025-02-20 16:11:05 -06:00
Nikhil-Nunna f228a50646 Env conf debug (#1534)
* Initial Script ready for review

* Added RCCL-tests and RCCL versions

* Added output folder and README

* Base format built

* Added ROCm version

* Added function to center titles and Vram information

* Added HIP version

* Cleaned formatting

* UCX version and MPI version

* Added NUMA balancing

* Added rocminfo

* Removed notes

* Changed regex for broadcom Nic

* Removed note by the ACS info

* Added Hostname to summary and details

* Print summary to terminal

* Added argparse

* Added flags and readme

* Added GPU ID

* fixed spelling

* renamed script again

* Added file descriptor and locked mem checks

* Added file descriptor and locked mem checks

* Removed extra spaces from summary table

* printing output file location

* Removed sudo in code and ACS flag

[ROCm/rccl commit: 4ba94d6662]
2025-02-14 17:31:18 -06:00