Commit graph

42 commits

Author SHA1 Message Date
Aravind Ravikumar 239d62f545 Enable Robust Multi-Node RCCL Testing with cvs-sbatch and Improve CI Reliability (#2123)
* sbatch changes and TheRock SHA update

* Move tests location from /home to /apps/cvs_tests

* Add comments and move credential.ini file to /apps/cvs_tests

* Changed salloc reservation to rccl reservation

---------

Co-authored-by: Aravind Ravikumar <arravikum@amd.com>
2026-01-16 23:13:06 -05:00
Geo Min 4b295c9893 [TheRock CI] Adding working single node tests (#2142)
* Adding working single node tests

* Revert to old docker sha

* adding back no perf tests

---------

Co-authored-by: Aravind Ravikumar <arravikum@amd.com>
2026-01-13 08:35:58 -08:00
Geo Min 4f474a7389 Revert "Adding org var and dynamic runner selection (#2106)" (#2114)
This reverts commit 2e193aed68.
2025-12-19 12:53:09 -08:00
Geo Min 2e193aed68 Adding org var and dynamic runner selection (#2106) 2025-12-16 10:41:57 -08:00
Geo Min 5384a8abb2 Correct runner name (#2098) 2025-12-10 11:44:48 -08:00
Geo Min 6af9087b0c [ci] Bumping TheRock CI commit hash (#2097)
* Bumping TheRock CI commit hasH

* fixing artifact group
2025-12-09 16:25:57 -08:00
Aravind Ravikumar 07f8f6d6c6 Add S3 upload support for Perf and test reports by run ID and architecture (#2020)
* Commits to enable scp report copy

* Added Post report upload step

* Added extra arg for fetch artifacts

* Moved to a specific commit

* Add write permissions to s3

* Added comment for TheRock sha commit date

---------

Co-authored-by: arravikum <arravikum@amd.com>
2025-11-03 19:09:34 -05:00
Arm Patinyasakdikul 84fdcab68a Added copyrights for Palamida scan 7.2. (#2018) 2025-10-30 13:33:20 -05:00
corey-derochie-amd f290e302d3 Updated CODEOWNERS to instead use RCCL-Reviewers team (#2010)
* Updated CODEOWNERS to instead use RCCL-Reviewers team

* Apply suggestion from @nileshnegi

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

---------

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
2025-10-28 09:27:26 -06:00
Aravind Ravikumar 506c2e9878 Adding reservation time for salloc in CI (#1992)
Co-authored-by: arravikum <arravikum@amd.com>
2025-10-22 10:00:01 -04:00
Mythreya Kuricheti 9ae5956ca5 [rocprofiler-sdk] Update codeowner for api-trace.h (#1974)
Feedback from #1933
2025-10-20 10:43:42 -06:00
JC b1589a5786 [CI] Enable ccache w/ namespace for external use (#1966)
* Enable ccache w/ namespace for external use

* Remove TheRock from setup_tools.py command line

* Bump TheRock commit to use health_status.py

Resolves https://github.com/ROCm/rccl/pull/1966/files/c6d2e8ce5c14a2c94bfb47e21d3e2d466f25c9b4#r2420734710

* Bump TheRock to older commit with health_status.py

* Add git safe directory for working directory

* Move install python deps

* Remove pip freeze
2025-10-20 08:44:42 -07:00
Geo Min 97f2665da2 fixing group id (#1975) 2025-10-10 16:40:44 -07:00
Mythreya Kuricheti 3000f0e837 [rocprofiler-sdk] Add codeowner for api-trace.h (#1933) 2025-10-10 16:29:17 -05:00
Aravind Ravikumar 1858a31c41 Enable Presubmit CI Gating for develop Branch (TheRock CI for RCCL) (#1954)
* Trigger CI run on pull request

* Enabling CI run on different PR types

---------

Co-authored-by: arravikum <arravikum@amd.com>
2025-10-07 09:11:50 -04:00
Sai Enduri 01d16d4139 Enable multi node rccl tests on MI350x slurm cluster. (#1900)
* Add tests on slurm cluster

* Integrate slurm.

* Add flags.

* Added dynamic selection of runners for tests and cleanup for slurm reservation

* Revert "Added dynamic selection of runners for tests and cleanup for slurm reservation"

This reverts commit d5350ff6e4f563ddd56ad81e4bc2a393ed55ba00.

* Refactor so tests run on both architectures.

* continue on error

* fail fast false on matrix

* remove scancel

* skip all single node tests

* fix pattern matching for pytest

* switch to always skip github job

* Update to latest allocation.

* Clean up workflows and update docker image.

* Updated container image published from PR #1517

* Switch back to TheRock main branch sha.

---------

Co-authored-by: arravikum <arravikum@amd.com>
2025-09-23 22:00:26 -07:00
Surya Periaswamy 389f794d9a Add speriaswamy-amd to CODEOWNERS (#1921) 2025-09-18 07:15:21 -05:00
nawrinsu 0b03bb718a Add nawrinsu to CODEOWNERS (#1917) 2025-09-16 23:40:51 -05:00
Weile f64b1f409f add weilewei to CODEOWNERS (#1915) 2025-09-16 10:14:18 -07:00
Geo Min f404624d9e [TheRock CI] Adding single node tests for RCCL (#1876)
* Add single-node testing

* Adding single node test

* Adding quotes

* fix typo

* Adding test flag

* No MPI

* Adding openmpi install

* Adding comment

* PR comments

* Missing proj

* Adding half

* Adding rocr runtime

* Adding them all'

* new sha

* Fixing script

* Removing confusing skip test case

* Adding docs

* Update .github/workflows/therock-test-packages-single-node.yml

Co-authored-by: Marius Brehler <marius.brehler@amd.com>

---------

Co-authored-by: Marius Brehler <marius.brehler@amd.com>
2025-08-27 08:13:10 -07:00
Marius Brehler 221205ebd4 Bump TheRock version used for testing (#1885) 2025-08-27 16:22:27 +02:00
Geo Min f9a957bbab [TheRock CI] Adding TheRock RCCL tests (#1873)
* First commit for rccl multi node test workflow

* Adding workflow dispatch

* Added branch based pull trigger

* Changed typo in branch name

* Add input variables to push

* Removed input variables to push

* Added self hosted runner for Vultr cloud

* Skipping build and only running test

* Changed test runner label name

* Made changes to executable paths in test script

* Made changes to run

* Made changes to cd into cvs dir

* This is a dummy commit

* Added cmake options

* Modified build options

* Commiting build changes

* Adding rccl and rccl-tests

* Re-ordering rccl and rccl-tests

* adding --global command

* modified cmake command

* modified script paths

* Testing OIDC for rccl repo

* Testing OIDC for rccl repo

* Testing build and upload workflow

* use default env variable for AMDGPU families on push workflow trigger

* Adding cleanup and correct role

* Adding additional yml files

* Fixing typo';

* Adding new sha

* Adding correct gpu target

* Adding back venv bin activate

* Adding workflow dispatch for tests

* Testing

* Adding cat

* Adding cat

* Adding rocm dir change

* Adding checkout

* cat with sudo

* rccl checkout

* correcting branch

* removing sudo

* trying to adjust correct path'

* Adding output dir path

* Use docker container with pre-installed MPI

* Adding back build steps

* Fixing SHA

* Adding exclusion logic:

* Adding test

* Adding CI check

* Removing testing

* Limit to build only rccl, rccl-tests and required dependencies

* Adding test

* Removing test

* Removing quote

* Reverting test

* PR comments

---------

Co-authored-by: arravikum <arravikum@amd.com>
Co-authored-by: Marius Brehler <marius.brehler@amd.com>
2025-08-20 15:07:23 -07:00
Atul Kulkarni 231449c896 Added new code owners (#1869) 2025-08-19 16:32:25 -05:00
akolliasAMD aabd181fe4 remove user from code owner file (#1709) 2025-05-23 15:45:15 -05:00
Nikhil-Nunna a72a1939d1 Updated Codeowners (#1692) 2025-05-12 18:58:39 -05:00
Nikhil-Nunna fd3422afdb Added Nikhil-Nunna to codeowners 2025-02-05 14:28:00 -06:00
AbandiGa e92a103bad Adding @AbandiGa (myself) as code owner (#1532)
Signed-off-by: AbandiGa <galaband@amd.com>
2025-02-05 13:23:25 -06:00
Edgar Gabriel 3646b1de43 update CODEOWNERS (#1529)
* update CODEOWNERS
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
2025-02-04 11:54:42 -07:00
JhaShweta1 f60fac76e6 Update CODEOWNERS
Added  a new user: Shweta Jha
2025-01-03 11:47:22 -06:00
corey-derochie-amd 4336a0f3a3 Added latest users to CODEOWNERS. (#1422) 2024-11-14 16:55:18 -07:00
akolliasAMD 7e78641dc1 cleaned codeowners file (#1247) 2024-07-09 10:31:23 -06:00
Sam Wu 9f01acc030 Update Read the Docs configuration to use Python 3.10 and latest rocm-docs-core (#1190)
* Add doc team as owners of RTD config

* Update Read the Docs configuration to use Python 3.10 and latest rocm-docs-core
2024-06-14 12:12:22 -06:00
corey-derochie-amd 8f471ba537 Created PR template for the rccl repo (#1118) 2024-04-15 15:34:42 -06:00
corey-derochie-amd 606d3e6b6e Added @corey-derochie-amd as a code owner (to rocm-documentation) (#1119) 2024-03-21 14:56:05 -06:00
Sam Wu 7d6da4c66b Add codeowners for documentation (#1061)
* Add codeowners for documentation

* Update CODEOWNERS

---------

Co-authored-by: samjwu <samjwu@users.noreply.github.com>
2024-01-25 09:33:28 -07:00
Bertan Dogancay ff7c9c4050 Add codeowners (#1041) 2024-01-11 15:41:08 -07:00
dependabot[bot] a433fcc726 Bump rocm-docs-core from 0.20.0 to 0.21.0 in /docs/sphinx (#870)
* Bump rocm-docs-core from 0.20.0 to 0.21.0 in /docs/sphinx

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.20.0 to 0.21.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* replace noCI with ci:docs-only label

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>
2023-08-30 08:56:38 -06:00
akolliasAMD 59db7b8722 removed codeowners file (#815) 2023-07-20 12:50:54 -06:00
akolliasAMD 1ff1bb3397 added codeowners file (#813) 2023-07-20 12:16:50 -06:00
Sam Wu c3f47853bd Update Read the Docs, documentation, and dependabot (#772)
* update documentation

add version number to documentation

rename .sphinx/.doxygen to sphinx/doxygen

enable htmlzip, pdf, epub formats when publishing on Read the Docs

* add noCI label for dependabot PRs

since RTD CI is separate from math lib CI

* update rocm-docs-core to v0.13.4

* update README with link to rocm.docs.amd.com
2023-06-07 15:31:58 -06:00
Saad Rahim a78ff46861 Standardizing documentation homepage message (#726) 2023-04-16 18:14:56 -06:00
Sam Wu dc149a9fbd pin rocm-docs-core and add dependabot config (#722) 2023-04-11 10:01:24 -06:00