rocm-systems/.github
Sai Enduri 01d16d4139 Enable multi-node RCCL tests on MI350X Slurm cluster. (#1900)
* Add tests on slurm cluster

* Integrate Slurm.

* Add flags.

* Added dynamic selection of runners for tests and cleanup for slurm reservation

* Revert "Added dynamic selection of runners for tests and cleanup for slurm reservation"

This reverts commit d5350ff6e4f563ddd56ad81e4bc2a393ed55ba00.

* Refactor so tests run on both architectures.

* Continue on error.

* Set fail-fast to false on the matrix.

* Remove scancel.

* Skip all single-node tests.

* Fix pattern matching for pytest.

* Switch to always skipping the GitHub job.

* Update to latest allocation.

* Clean up workflows and update docker image.

* Updated container image published from PR #1517

* Switch back to TheRock main branch sha.

---------

Co-authored-by: arravikum <arravikum@amd.com>
2025-09-23 22:00:26 -07:00