42 commits

Author SHA1 Message Date
Aravind Ravikumar f336ad5133 Enable Robust Multi-Node RCCL Testing with cvs-sbatch and Improve CI Reliability (#2123)
* sbatch changes and TheRock SHA update

* Move tests location from /home to /apps/cvs_tests

* Add comments and move credential.ini file to /apps/cvs_tests

* Changed salloc reservation to rccl reservation

---------

Co-authored-by: Aravind Ravikumar <arravikum@amd.com>

[ROCm/rccl commit: 239d62f545]
2026-01-16 23:13:06 -05:00
Geo Min dfdb64572c [TheRock CI] Adding working single node tests (#2142)
* Adding working single node tests

* Revert to old docker sha

* adding back no perf tests

---------

Co-authored-by: Aravind Ravikumar <arravikum@amd.com>

[ROCm/rccl commit: 4b295c9893]
2026-01-13 08:35:58 -08:00
Geo Min c199df6b96 Revert "Adding org var and dynamic runner selection (#2106)" (#2114)
This reverts commit 4f7698c27e.

[ROCm/rccl commit: 4f474a7389]
2025-12-19 12:53:09 -08:00
Geo Min 4f7698c27e Adding org var and dynamic runner selection (#2106)
[ROCm/rccl commit: 2e193aed68]
2025-12-16 10:41:57 -08:00
Geo Min 1b4eef8f86 Correct runner name (#2098)
[ROCm/rccl commit: 5384a8abb2]
2025-12-10 11:44:48 -08:00
Geo Min 2e0abab81a [ci] Bumping TheRock CI commit hash (#2097)
* Bumping TheRock CI commit hash

* fixing artifact group

[ROCm/rccl commit: 6af9087b0c]
2025-12-09 16:25:57 -08:00
Aravind Ravikumar 4babb01f4d Add S3 upload support for Perf and test reports by run ID and architecture (#2020)
* Commits to enable scp report copy

* Added Post report upload step

* Added extra arg for fetch artifacts

* Moved to a specific commit

* Add write permissions to s3

* Added comment for TheRock sha commit date

---------

Co-authored-by: arravikum <arravikum@amd.com>

[ROCm/rccl commit: 07f8f6d6c6]
2025-11-03 19:09:34 -05:00
Arm Patinyasakdikul 03e92dc942 Added copyrights for Palamida scan 7.2. (#2018)
[ROCm/rccl commit: 84fdcab68a]
2025-10-30 13:33:20 -05:00
corey-derochie-amd 44160d34a4 Updated CODEOWNERS to instead use RCCL-Reviewers team (#2010)
* Updated CODEOWNERS to instead use RCCL-Reviewers team

* Apply suggestion from @nileshnegi

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

---------

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: f290e302d3]
2025-10-28 09:27:26 -06:00
Aravind Ravikumar a7a1647926 Adding reservation time for salloc in CI (#1992)
Co-authored-by: arravikum <arravikum@amd.com>

[ROCm/rccl commit: 506c2e9878]
2025-10-22 10:00:01 -04:00
Mythreya Kuricheti ef1ed44e93 [rocprofiler-sdk] Update codeowner for api-trace.h (#1974)
Feedback from #1933

[ROCm/rccl commit: 9ae5956ca5]
2025-10-20 10:43:42 -06:00
JC 08d93e763e [CI] Enable ccache w/ namespace for external use (#1966)
* Enable ccache w/ namespace for external use

* Remove TheRock from setup_tools.py command line

* Bump TheRock commit to use health_status.py

Resolves https://github.com/ROCm/rccl/pull/1966/files/f9d6d76440b88ecf67d08765ee0e9bac00b55b40#r2420734710

* Bump TheRock to older commit with health_status.py

* Add git safe directory for working directory

* Move install python deps

* Remove pip freeze

[ROCm/rccl commit: b1589a5786]
2025-10-20 08:44:42 -07:00
Geo Min 3ead4ca4a1 fixing group id (#1975)
[ROCm/rccl commit: 97f2665da2]
2025-10-10 16:40:44 -07:00
Mythreya Kuricheti 24a62a2ab3 [rocprofiler-sdk] Add codeowner for api-trace.h (#1933)
[ROCm/rccl commit: 3000f0e837]
2025-10-10 16:29:17 -05:00
Aravind Ravikumar 45abdcfe62 Enable Presubmit CI Gating for develop Branch (TheRock CI for RCCL) (#1954)
* Trigger CI run on pull request

* Enabling CI run on different PR types

---------

Co-authored-by: arravikum <arravikum@amd.com>

[ROCm/rccl commit: 1858a31c41]
2025-10-07 09:11:50 -04:00
Sai Enduri 15628819e2 Enable multi node rccl tests on MI350x slurm cluster. (#1900)
* Add tests on slurm cluster

* Integrate slurm.

* Add flags.

* Added dynamic selection of runners for tests and cleanup for slurm reservation

* Revert "Added dynamic selection of runners for tests and cleanup for slurm reservation"

This reverts commit fdd5a6cc968c764d3d1039f0897fb11f11422928.

* Refactor so tests run on both architectures.

* continue on error

* fail fast false on matrix

* remove scancel

* skip all single node tests

* fix pattern matching for pytest

* switch to always skip github job

* Update to latest allocation.

* Clean up workflows and update docker image.

* Updated container image published from PR #1517

* Switch back to TheRock main branch sha.

---------

Co-authored-by: arravikum <arravikum@amd.com>

[ROCm/rccl commit: 01d16d4139]
2025-09-23 22:00:26 -07:00
Surya Periaswamy ebbcb16cca Add speriaswamy-amd to CODEOWNERS (#1921)
[ROCm/rccl commit: 389f794d9a]
2025-09-18 07:15:21 -05:00
nawrinsu 266067920f Add nawrinsu to CODEOWNERS (#1917)
[ROCm/rccl commit: 0b03bb718a]
2025-09-16 23:40:51 -05:00
Weile 6ddae6ec42 add weilewei to CODEOWNERS (#1915)
[ROCm/rccl commit: f64b1f409f]
2025-09-16 10:14:18 -07:00
Geo Min 6db483845d [TheRock CI] Adding single node tests for RCCL (#1876)
* Add single-node testing

* Adding single node test

* Adding quotes

* fix typo

* Adding test flag

* No MPI

* Adding openmpi install

* Adding comment

* PR comments

* Missing proj

* Adding half

* Adding rocr runtime

* Adding them all'

* new sha

* Fixing script

* Removing confusing skip test case

* Adding docs

* Update .github/workflows/therock-test-packages-single-node.yml

Co-authored-by: Marius Brehler <marius.brehler@amd.com>

---------

Co-authored-by: Marius Brehler <marius.brehler@amd.com>

[ROCm/rccl commit: f404624d9e]
2025-08-27 08:13:10 -07:00
Marius Brehler 5277457f21 Bump TheRock version used for testing (#1885)
[ROCm/rccl commit: 221205ebd4]
2025-08-27 16:22:27 +02:00
Geo Min bec7d58b04 [TheRock CI] Adding TheRock RCCL tests (#1873)
* First commit for rccl multi node test workflow

* Adding workflow dispatch

* Added branch based pull trigger

* Changed typo in branch name

* Add input variables to push

* Removed input variables to push

* Added self hosted runner for Vultr cloud

* Skipping build and only running test

* Changed test runner label name

* Made changes to executable paths in test script

* Made changes to run

* Made changes to cd into cvs dir

* This is a dummy commit

* Added cmake options

* Modified build options

* Committing build changes

* Adding rccl and rccl-tests

* Re-ordering rccl and rccl-tests

* adding --global command

* modified cmake command

* modified script paths

* Testing OIDC for rccl repo

* Testing OIDC for rccl repo

* Testing build and upload workflow

* use default env variable for AMDGPU families on push workflow trigger

* Adding cleanup and correct role

* Adding additional yml files

* Fixing typo';

* Adding new sha

* Adding correct gpu target

* Adding back venv bin activate

* Adding workflow dispatch for tests

* Testing

* Adding cat

* Adding cat

* Adding rocm dir change

* Adding checkout

* cat with sudo

* rccl checkout

* correcting branch

* removing sudo

* trying to adjust correct path'

* Adding output dir path

* Use docker container with pre-installed MPI

* Adding back build steps

* Fixing SHA

* Adding exclusion logic:

* Adding test

* Adding CI check

* Removing testing

* Limit to build only rccl, rccl-tests and required dependencies

* Adding test

* Removing test

* Removing quote

* Reverting test

* PR comments

---------

Co-authored-by: arravikum <arravikum@amd.com>
Co-authored-by: Marius Brehler <marius.brehler@amd.com>

[ROCm/rccl commit: f9a957bbab]
2025-08-20 15:07:23 -07:00
Atul Kulkarni 8c5095dd94 Added new code owners (#1869)
[ROCm/rccl commit: 231449c896]
2025-08-19 16:32:25 -05:00
akolliasAMD 6e2f75d424 remove user from code owner file (#1709)
[ROCm/rccl commit: aabd181fe4]
2025-05-23 15:45:15 -05:00
Nikhil-Nunna ad657d957a Updated Codeowners (#1692)
[ROCm/rccl commit: a72a1939d1]
2025-05-12 18:58:39 -05:00
Nikhil-Nunna 60a86a65a1 Added Nikhil-Nunna to codeowners
[ROCm/rccl commit: fd3422afdb]
2025-02-05 14:28:00 -06:00
AbandiGa 236cc66797 Adding @AbandiGa (myself) as code owner (#1532)
Signed-off-by: AbandiGa <galaband@amd.com>

[ROCm/rccl commit: e92a103bad]
2025-02-05 13:23:25 -06:00
Edgar Gabriel cac16e2c96 update CODEOWNERS (#1529)
* update CODEOWNERS
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>


[ROCm/rccl commit: 3646b1de43]
2025-02-04 11:54:42 -07:00
JhaShweta1 d9d60404fd Update CODEOWNERS
Added a new user: Shweta Jha

[ROCm/rccl commit: f60fac76e6]
2025-01-03 11:47:22 -06:00
corey-derochie-amd 6d61a4e21f Added latest users to CODEOWNERS. (#1422)
[ROCm/rccl commit: 4336a0f3a3]
2024-11-14 16:55:18 -07:00
akolliasAMD 9644767ead cleaned codeowners file (#1247)
[ROCm/rccl commit: 7e78641dc1]
2024-07-09 10:31:23 -06:00
Sam Wu 17eb7a3c6b Update Read the Docs configuration to use Python 3.10 and latest rocm-docs-core (#1190)
* Add doc team as owners of RTD config

* Update Read the Docs configuration to use Python 3.10 and latest rocm-docs-core

[ROCm/rccl commit: 9f01acc030]
2024-06-14 12:12:22 -06:00
corey-derochie-amd fa5d8d7a6b Created PR template for the rccl repo (#1118)
[ROCm/rccl commit: 8f471ba537]
2024-04-15 15:34:42 -06:00
corey-derochie-amd 9c2a57259d Added @corey-derochie-amd as a code owner (to rocm-documentation) (#1119)
[ROCm/rccl commit: 606d3e6b6e]
2024-03-21 14:56:05 -06:00
Sam Wu b229abc692 Add codeowners for documentation (#1061)
* Add codeowners for documentation

* Update CODEOWNERS

---------

Co-authored-by: samjwu <samjwu@users.noreply.github.com>

[ROCm/rccl commit: 7d6da4c66b]
2024-01-25 09:33:28 -07:00
Bertan Dogancay 3d54c3fe5c Add codeowners (#1041)
[ROCm/rccl commit: ff7c9c4050]
2024-01-11 15:41:08 -07:00
dependabot[bot] 29b01e4b3b Bump rocm-docs-core from 0.20.0 to 0.21.0 in /docs/sphinx (#870)
* Bump rocm-docs-core from 0.20.0 to 0.21.0 in /docs/sphinx

Bumps [rocm-docs-core](https://github.com/RadeonOpenCompute/rocm-docs-core) from 0.20.0 to 0.21.0.
- [Release notes](https://github.com/RadeonOpenCompute/rocm-docs-core/releases)
- [Changelog](https://github.com/RadeonOpenCompute/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/RadeonOpenCompute/rocm-docs-core/compare/v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* replace noCI with ci:docs-only label

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Sam Wu <sam.wu2@amd.com>

[ROCm/rccl commit: a433fcc726]
2023-08-30 08:56:38 -06:00
akolliasAMD f4b106ec94 removed codeowners file (#815)
[ROCm/rccl commit: 59db7b8722]
2023-07-20 12:50:54 -06:00
akolliasAMD f4dd02554c added codeowners file (#813)
[ROCm/rccl commit: 1ff1bb3397]
2023-07-20 12:16:50 -06:00
Sam Wu 5168be1867 Update Read the Docs, documentation, and dependabot (#772)
* update documentation

add version number to documentation

rename .sphinx/.doxygen to sphinx/doxygen

enable htmlzip, pdf, epub formats when publishing on Read the Docs

* add noCI label for dependabot PRs

since RTD CI is separate from math lib CI

* update rocm-docs-core to v0.13.4

* update README with link to rocm.docs.amd.com

[ROCm/rccl commit: c3f47853bd]
2023-06-07 15:31:58 -06:00
Saad Rahim 99e407ba76 Standardizing documentation homepage message (#726)
[ROCm/rccl commit: a78ff46861]
2023-04-16 18:14:56 -06:00
Sam Wu 382306e2e8 pin rocm-docs-core and add dependabot config (#722)
[ROCm/rccl commit: dc149a9fbd]
2023-04-11 10:01:24 -06:00