Commit Graph

228 Commits

Author SHA1 Message Date
Jason Bonnell 66ea1cdff2 Add workflow to remove old untagged rocprofiler GHCR Docker Images (#1959)
* Add WIP workflow step to delete untagged images older than 1 week

* Formatting fix for rocprofiler-systems-ghcr.yml

* Move step to new workflow

* Remove needs parameter from cleanup-rocprofiler-images

* Remove expand-packages option

* Expand cleanup for every OS

* Revert spacing change to rocprofiler-systems-ghcr.yml

* Turn off dry-run to do an initial clean

* Switch dry-run to be only on PR

* Added comment about schedule
2025-11-21 08:49:29 -05:00
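A minimal sketch of the selection rule this commit describes: untagged images older than one week are removed, with a dry-run mode. The record shape (`tags`, `created_at`) and the names `select_stale_untagged` and `cleanup` are assumptions for illustration, not the workflow's actual implementation:

```python
from datetime import datetime, timedelta, timezone


def select_stale_untagged(versions, now, max_age=timedelta(weeks=1)):
    """Return the versions that carry no tags and are older than max_age."""
    cutoff = now - max_age
    return [v for v in versions if not v["tags"] and v["created_at"] < cutoff]


def cleanup(versions, now, dry_run=True):
    """Return (stale, deleted); in dry-run mode nothing is marked deleted."""
    stale = select_stale_untagged(versions, now)
    deleted = [] if dry_run else list(stale)
    return stale, deleted


# Hypothetical image versions, used only to exercise the rule.
now = datetime(2025, 11, 21, tzinfo=timezone.utc)
versions = [
    {"id": 1, "tags": [], "created_at": now - timedelta(days=10)},
    {"id": 2, "tags": ["latest"], "created_at": now - timedelta(days=30)},
    {"id": 3, "tags": [], "created_at": now - timedelta(days=2)},
]

stale, deleted = cleanup(versions, now, dry_run=True)
print([v["id"] for v in stale], deleted)
```

Keeping the dry-run switch as a plain parameter mirrors the commit's choice to enable dry-run only on pull requests.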
amd-hsivasun adf6a5ec3b [Ex CI] amdsmi monorepo enablement (#1943)
* [Ex CI] amdsmi monorepo enablement

* [Ex CI] Add amdsmi pipeline to monorepo
2025-11-20 14:19:02 -05:00
vedithal-amd ae8f72fa79 [rocprofiler-compute] Use native tool for counter collection (#1212)
* Use native tool for counter collection

* Add native counter collection tool which uses rocprofiler-sdk C++
  library public API to get counter collection data
    * This is enabled by default, unless --no-native-tool option is
      provided or ROCPROF=rocprofv3 env. var. is provided
    * This tool is only supported for ROCm version >=7.x.x
    * This tool is not supported for attach/detach scenario
* Build native tool shared object during build time
* If using rocprof-compute without building then runtime compilation of
  native tool shared object is performed
* rocprofiler-sdk tools is still used for services other than counter
  collection and data collected by native tool is merged into the
  rocpd/csv output of rocprofiler-sdk tool

* Make `rocpd` choice the default choice for `--format-rocprof-output`
  option
    * If `rocpd` public API from rocprofiler-sdk library is not present,
      then fallback to `csv` choice
    * In this case only `pmc_perf.csv` is written in workload folder
      instead of multiple `csv` files for each profiling run
* Remove `json` choice from `--format-rocprof-output` option since it
  functions identical to `csv` option

* Rename option `--rocprofiler-sdk-library-path` to
  `--rocprofiler-sdk-tool-path` since we LD_PRELOAD the
  rocprofiler-sdk tool shared object and not the rocprofiler-sdk library
  shared object

* Fix the meaning of `--dispatch` option in `profile` mode to mention
  dispatch iteration filtering instead of dispatch id filtering
    * --dispatch option in analyze mode does dispatch id filtering

* Move standalone binary creation logic from cmake file to docker file

* fix native counter collection tool during attach/detach

* improve logging

* fix attach detach with native tool

* fix attach detach with native tool

* do not support attach/detach in native tool

* Update changelog

* add standalone binary creation functionality in cmake

* address review comments

* address review comments

* fix formatting

* address review comments

* Adding paths for cmake to search. Also updated min. cmake requirement to 3.21 as this was when hip was supported.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update hip compiler ID check, sometimes comes up as Clang, sometimes ROCMClang- depends on setup.
Updated formatting.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* RHEL8.10 unable to compile due to defaulting to old c++ version, need to force c++17

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Updating changelog per docs team recommendations

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Apply suggestions from code review to changelog

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Do not require HIP compiler to build native counter collection tool

* fix cmake

* gersemi formatting on latest cmake change

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* ex ci updated dependencies to include rocprofiler-sdk, but cmake was still not capturing the path- there was a commit that added to the cmake_prefix_path entry that specified rocprof-sdk's cmake location but was too specific for the search paths in find_package's config mode.
removing the cmake_prefix_path var and adding hints to find_package call instead, and specifying config mode so it knows how to construct the search paths

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* gersemi run for formatting

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Still need prefix path, should not have been removed in last commit but does need to be shortened to just the rocm path to allow for find_package config mode to do the job

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* include cstdint for uint32_t

* Run formatting on helper.cpp

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Remove rocm 7.2 release stuff from version and changelog and handle it in separate pr

* fix version

* fix changelog

* fix changelog

* run ruff formatter

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* fix rocprofiler-sdk attach so path

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2025-11-18 23:34:38 -05:00
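The selection rules listed in this commit can be sketched as two small decision functions. This is an illustrative reading of the commit message only; the function names and parameters are hypothetical and do not come from rocprofiler-compute:

```python
def use_native_tool(no_native_tool, env, rocm_major, attach_mode):
    """Decide whether the native counter collection tool applies.

    Per the commit message: enabled by default, disabled by the
    --no-native-tool option or ROCPROF=rocprofv3, supported only for
    ROCm >= 7, and not supported for attach/detach.
    """
    if no_native_tool or env.get("ROCPROF") == "rocprofv3":
        return False
    if rocm_major < 7:
        return False
    if attach_mode:
        return False
    return True


def resolve_output_format(requested, rocpd_api_available):
    """Fall back from `rocpd` to `csv` when the rocpd public API is absent."""
    if requested == "rocpd" and not rocpd_api_available:
        return "csv"
    return requested


print(use_native_tool(False, {}, 7, False))
print(use_native_tool(False, {"ROCPROF": "rocprofv3"}, 7, False))
print(resolve_output_format("rocpd", rocpd_api_available=False))
```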
Scott Todd fa772be675 Reapply amdgpu-windows-interop revert. (#1893)
## Overview and rationale

This reverts https://github.com/ROCm/rocm-systems/pull/1886, which...
* Re-applies https://github.com/ROCm/rocm-systems/pull/1866
* Reverts https://github.com/ROCm/rocm-systems/pull/1728

(So it restores the [`amdgpu-windows-interop/`](https://github.com/ROCm/rocm-systems/tree/develop/shared/amdgpu-windows-interop) folder back to the state from a few weeks ago)

The rationale for this change is at https://github.com/ROCm/rocm-systems/pull/1866:
> Last PAL update broke applications on gfx12 Windows.

## Cross-repository change details

That PR failed to build but was merged with this explanation:

> TheRock CI Windows build fails as expected with this revert.
> 
> References to these PAL members need to be stripped out in a patch on TheRock.
> 
> ```
> 11.3	C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp(152): error C2039: 'RegisterTraceStateChangeCallback': is not a member of 'GpuUtil::TraceSession'
> 11.4	C:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil\palTraceSession.h(372): note: see declaration of 'GpuUtil::TraceSession'
> 11.4	C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp(195): error C2039: 'UnregisterTraceStateChangeCallback': is not a member of 'GpuUtil::TraceSession'
> 11.4	C:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil\palTraceSession.h(372): note: see declaration of 'GpuUtil::TraceSession'
> ```

The patch in TheRock was updated in https://github.com/ROCm/TheRock/pull/2154. This rolls forward by updating the ref for TheRock.

That original PR could have been sequenced differently to avoid a build break - perhaps by
* Pointing to a branch in TheRock with the patch rebased
* Deleting the patch in the workflows here but holding a local copy of the patch to be applied in workflows
* Landing the patch as a normal commit instead of carrying it at all

## Test plan

1. Watch TheRock CI here (https://github.com/ROCm/rocm-systems/actions/runs/19447202693/job/55644411119?pr=1893)
2. Build locally:
    
    ```bash
    # In rocm-systems
    git am --whitespace=nowarn D:\projects\TheRock\patches\amd-mainline\rocm-systems\0001-Revert-SWDEV-543498-Some-compute-Ubertrace-profiles-.patch
    git am --whitespace=nowarn D:\projects\TheRock\patches\amd-mainline\rocm-systems\0003-Use-is_versioned-true-consistently-in-both-Comgr-Loa.patch
    git am --whitespace=nowarn D:\projects\TheRock\patches\amd-mainline\rocm-systems\0006-Explicitly-load-libamdhip64.so.7.patch
    # Note: the build fails with the observed errors if patch 0001 is not applied!
    
    # In TheRock
    # IMPORTANT: THEROCK_ROCM_SYSTEMS_SOURCE_DIR points at the local rocm-systems checkout
    cmake -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER=cl.exe -DCMAKE_CXX_COMPILER=cl.exe \
      -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
      -DPython3_EXECUTABLE=d:/projects/TheRock/.venv/Scripts/python \
      -DTHEROCK_ROCM_SYSTEMS_SOURCE_DIR=d:/projects/TheRock/../rocm-systems \
      -DTHEROCK_AMDGPU_FAMILIES=gfx110X-all \
      -DBUILD_TESTING=ON \
      -DTHEROCK_ENABLE_ALL=ON \
      -Damd-llvm_BUILD_TYPE=RelWithDebInfo \
      -S D:/projects/TheRock \
      -B D:/projects/TheRock/build \
      -G Ninja
    
    cmake --build D:/projects/TheRock/build --target hip-clr
    # [build] Build finished with exit code 0
    cmake --build D:/projects/TheRock/build --target ocl-clr+dist
    # [build] Build finished with exit code 0
    ```
2025-11-18 07:17:06 -08:00
ammallya 53c9b9655d Adding amdsmi label (#1894) 2025-11-17 14:53:43 -08:00
jamessiddeley-amd d49e2e35fd [rocprof-compute] Automate ctest coverage and test cases on runners with CDash (#1481)
* Add nightly coverage workflow

* ruff formatting

* temp workflow testing

* restore workflow file

* add workflow condition

* update workflow file

* update workflow file

* fix typo in run-ci.py

* edit run-ci.py

* add python deps install

* add python deps install

* add python deps install

* add python deps install

* check if enable coverage is on when using workflow

* remove github CI breakdown and fix enable coverage

* set cache variables must be set before dashboard starts

* Update run-ci.py

* Update run-ci.py to fix ctest cache

* Update rocprofiler-compute-code-coverage.yml to install tests

* Update rocprofiler-compute-code-coverage.yml

* Restore workflow file

* Update run-ci.py

* Simplify workflow build command

* Update run-ci.py to build tests

* edited run-ci script

* edit ctest configure commands

* edit ctest configure commands to be on one line

* edit ctest configure command to include path to amdclang++

* update clang check in tests/cmakelists.txt

* update rocm

* update rocm

* update rocm version 7.0.2

* update tests/CMakeLists.txt

* use tarball instead for rocm install

* apt install rocm-dev instead for 7.0.0 release

* workflow tweaks

* update to use new 'tools' dir

* install rocm-dev

* add CMAKE_CXX_COMPILER as clang

* update tests/cmakelists.txt

* update cdash site and build names

* remove run automatically on pull requests

* ruff format

* increased timeouts for tests

* add back reruns for workflow testing

* fix typo

* rename workflow "nightly" -> "code"

* added tracks to keep track of gpu (325 vs 355)

* remove test_db_connector.py

* revert build names and tracking

* update workflow pushes

* CMake format

* changed parallel level back to 1
2025-11-17 09:24:24 -05:00
JC c9dd49c48a Fix SignatureDoesNotMatch by aws credential option (#1867)
Replicating https://github.com/ROCm/TheRock/pull/2147#discussion_r2528008441
## Motivation

Fixes https://github.com/ROCm/TheRock/issues/875 which is the issue where Windows builds would fail randomly when uploading to s3 with the `SignatureDoesNotMatch` error as a result of special characters existing in the AWS Access Keys generated by the `configure-aws-credentials` action that is passed through Windows environment variables to `aws-cli`. More details below.

## Technical Details

https://github.com/ROCm/TheRock/issues/875#issuecomment-3530851762
In summary, in Windows workflows, the `special-characters-workaround` option is set to true for the `configure-aws-credentials` action which will regenerate access keys until there are no special characters that may not be passable through windows environment variables correctly.

## Test Plan

Observe CI.

## Test Result

TBD.

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2025-11-14 14:04:54 -08:00
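The workaround described above amounts to a regenerate-until-clean loop. A minimal sketch, assuming that "special" means any non-alphanumeric character and using a hypothetical `generate` callable in place of the real credential request; this is not the `configure-aws-credentials` implementation:

```python
def has_special_characters(secret):
    """Assumption: any non-alphanumeric character counts as special."""
    return not secret.isalnum()


def fetch_clean_credentials(generate, max_attempts=12):
    """Call generate() until it yields a secret without special characters."""
    for attempt in range(1, max_attempts + 1):
        secret = generate()
        if not has_special_characters(secret):
            return secret, attempt
    raise RuntimeError(f"no clean secret after {max_attempts} attempts")


# Stand-in generator that yields two unusable secrets, then a usable one.
candidates = iter(["ab/cd+ef", "gh=ij", "klMN0123"])
secret, attempts = fetch_clean_credentials(lambda: next(candidates))
print(secret, attempts)
```

Bounding the loop with `max_attempts` keeps a persistently failing generator from stalling the job indefinitely.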
cfallows-amd 683a63d9ec Update rocprofiler-compute workflows (#1788)
* Update workflow files to use general public rocm dev build images from dockerhub.
Old method was to borrow rocprofiler-systems images but they do not contain rocm install anymore, so we cannot rely on them.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Add workflow files to paths on push and PR

* Revert change of image for red hat variant because the image offered in official rocm image release is too large for runners.
Going back to using systems team images and installing rocm on them (as they do) as a workaround until we can get a smaller package size docker image with ROCm included.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

Adjusted python3-devel install line with an if else determined by distro version.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
2025-11-10 20:48:39 -05:00
amd-hsivasun 946eacdd4a [Ex CI] Disable hip-tests pipeline (#1785) 2025-11-10 17:33:42 -05:00
Aleksandar Djordjevic f39a60ac25 [rocprofiler-systems] Apply new CMake formatting for the latest gersemi version (#1778)
* Fix cmake formatting

* Updated rev. in `.pre-commit-config.yaml`

* Pin the gersemi used in CI to v0.23.1, matching the pre-commit

---------

Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-10 13:08:44 -05:00
Milan Radosavljevic d9b00da102 Add clean up of buffered_storage files (#1738)
* Add clean up of buffered_storage files

* Add step to workflows to test for remaining temp files after tests

* Applied suggestions from code review

* add deletion of all cache files

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-07 11:51:09 -05:00
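A check for leftover temporary files after a test run, as the second bullet describes, might look like the sketch below. The `buffered_storage*` glob pattern, the file name, and the helper name `leftover_files` are assumptions; the commit does not state the actual file naming:

```python
import tempfile
from pathlib import Path


def leftover_files(directory, pattern="buffered_storage*"):
    """Return the sorted names of files in directory matching pattern."""
    return sorted(p.name for p in Path(directory).glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Simulate one stale cache file next to an unrelated file.
    (Path(tmp) / "buffered_storage_1234.bin").write_bytes(b"")
    (Path(tmp) / "unrelated.log").write_text("")

    before = leftover_files(tmp)

    # Clean up, then verify that nothing matching the pattern remains.
    for name in before:
        (Path(tmp) / name).unlink()
    after = leftover_files(tmp)

print(before, after)
```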
Jason Bonnell 6e195ded9b Update rocprofiler_config_interfaces.cmake to use different elf naming (#1722)
* Update rocprofiler_config_interfaces.cmake to use different elf naming

* try out conditional for libelf

* run cmake-format to fix formatting issue

* Remove libelf.patch file from therock-ci-windows.yml

* Remove libelf patch from therock-ci-linux.yml as well
2025-11-06 23:50:02 -05:00
David Galiffi 89cf46eb55 Removing jlumbroso/free-disk-space action from workflows (#1700) 2025-11-06 18:11:09 -05:00
Joseph Macaranas 524f62ae67 TheRock CI Workflow Updates 20251106 (#1743)
- Update the pinned SHA for TheRock in CI workflows.
- Update the version for actions in those same workflows.
- Comment out the rm .patch line and provide details on its use.
2025-11-06 12:06:44 -05:00
alexxu-amd a330fb6b91 fix latest docs doesn't get synchronized issue (#1714) 2025-11-05 17:08:19 -05:00
David Galiffi 1e501dd89a Free runner disk space (#1693)
Motivation:
Basic runners are frequently running out of space

Technical Details:
Running autoclean after package installations.
Use the jlumbroso/free-disk-space action.
2025-11-04 17:31:23 -05:00
Joseph Macaranas b19cf0aadf Revert "Disable therock summary check, make it always positive (#1675)" (#1686)
This reverts commit 0c32b90130.
2025-11-04 14:17:37 -05:00
Danylo Lytovchenko 0c32b90130 Disable therock summary check, make it always positive (#1675) 2025-11-04 12:58:53 +01:00
MachineTom fb006546d0 SWDEV-1 - Fix a typo (#1615)
* SWDEV-1 - Fix a typo

Fix a typo.
Remove unnecessary log.

* Removing patch

---------

Co-authored-by: geomin12 <geomin12@amd.com>
Co-authored-by: Scott Todd <scott.todd0@gmail.com>
2025-11-03 12:59:00 -08:00
Ammar ELWazir da297d46e8 [ROCProfiler-sdk] [Docs CI] Refactor Git setup and CMake commands in workflow (#1662) 2025-11-03 12:12:35 -06:00
Ammar ELWazir 9fa1d1b97e [ROCProfiler-SDK] Remove 'gfx900' and 'gfx940' from GPU targets (#1661)
* [ROCProfiler-SDK] Remove 'gfx900' and 'gfx940' from GPU targets

* Remove unsupported GPU targets from workflow

* Remove gfx900 and gfx940 from GPU targets
2025-11-03 11:09:29 -05:00
sluzynsk-amd 9f940c7265 Add missing API calls to rocprofiler (#1599)
Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>
2025-11-03 09:40:16 -06:00
Ammar ELWazir fee5bd9a4e Fixing ROCProfiler Register CI & ROCProfiler-SDK Docs CI (#1570)
---------

Co-authored-by: bgopesh <gopesh.bhardwaj@amd.com>
2025-11-03 09:24:32 -06:00
Geo Min 8e98b80deb [TheRock CI] Fixing patches for rocm-systems (#1460)
* Fixing patches for rocm-systems

* Adding all

* Adding remaining projects

* Submodule bump

* adding compiler

* adding test commit hash

* Adding artifact group

* adding update for artifact group

* Adding new commit hash
2025-10-28 19:47:17 -07:00
Venkateshwar Reddy Kandula c5bd693478 [rocprofiler-sdk] Disable HIP/CLR build in rocprofiler-sdk CI jobs (#1574)
* disable HIP/CLR build

* misc. fix
2025-10-28 11:42:11 -05:00
Milan Radosavljevic 8806be162c Change how cache manager handles child process trace cache for rocpd (#1033)
* Change how cache manager handles child process trace cache

* Sampling and backtrace metrics to cache

* Apply cmake formatting

* Fix parsing of metadata json

* Code clean up

* Fix build nlohmann json from source

* Fix storage parsed finished callback

* Revert sampling for child process

* Change cache file name generating

* Fix thread start stop

* Fix process start end timestamp

* Applied suggestions from code review

* Try with late start of flushing task thread

* Change dockerfiles for ci

* Revert changes on github workflows

* Remove json_fwd.hpp include

* fix dump

* Build nlohmann/json by default

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update location of build artifacts for nlohmann/json

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Revert use_output_suffix

* Remove unused logs

* Fix cache store inside counter due to structure change

* Remove decode tests from debian ci

* Fix issue where all databases have the same UUID (#1499)

Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>

* Removing the cpack and install steps to save space

* Revert "Remove decode tests from debian ci"

This reverts commit ddabf6dd142dcf438e6b8997b8abe86f2c868468.

* Revert "Removing the cpack and install steps to save space"

This reverts commit 973da3a1ba99d99d529af5269d30e177092f9bfa.

* Add prepare-runner job as dependency to clean up the space

* Fix formatting

* Free up even more space

* Remove verbose for workflows

* remove hw_counters from ext_data

* move space clean up inside container

* try to remove external folder to free up space

* Check space

* Refactor Cleanup to its own step

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Aleksandar Djordjevic <aleksandar.djordjevic@amd.com>
Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>
2025-10-24 11:47:15 -04:00
amd-hsivasun 43687b24f8 [Github Actions] Added monorepo_source_of_truth flag (#1525) 2025-10-23 16:37:12 -04:00
Venkateshwar Reddy Kandula 8c89ed8ab1 [rocprofiler-sdk][CI] Use rock infra for rocprofiler-sdk build docs jobs (#1518)
* Initial changes to move build docs job to rock infra

* misc. fix

* clean up code.
2025-10-23 11:17:13 -05:00
Venkateshwar Reddy Kandula 4f590499c6 [rocprofiler-sdk] Fix rocm-release compatibility latest (#1479)
* Update rocprofiler-sdk-rocm_release_compatibility.yml

* apply Copilot

* addr comments

* remove 6.2 requirements. 6.2 now can use normal Install requirements step
2025-10-21 21:45:18 -05:00
Mythreya Kuricheti 65d4ff9d04 [CI][rocprofiler-compute] Fix rhel python deps (#1370)
Install `python39-devel` dependency for pandas. Fixes build on RHEL 8.10.
2025-10-21 08:28:57 -07:00
Fei Zheng 2c59a82fe1 Fix rocprof-compute TUI build err with python 39 (#303)
* Upgrade min python version from 3.8 to 3.9

* Set min version for textual-fspicker for TUI support

* Update workflows to use python 3.9 instead of 3.8

* fix formatting

* fix bug

---------

Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
2025-10-21 00:27:35 -04:00
David Galiffi 32f9fa6ca5 Enable some simple ROCpd testing (#834)
* Add for rocpd testing and output validation

Add for transpose, video-decode, jpeg-decode, roctx, and openmp-target
Add JSON check to pre-commit-config

Co-authored-by: Marjan Antic <Marjan.Antic@amd.com>

* Remove redundant environment variable

* Fix spelling typo

* Fix typo in error message

* Fix memory_allocation query

* Incorporate feedback from review. Handle case where there are multiple matching "name_prefix" tables.

* Fix environment settings in `rocprof-sys-testing.cmake`

Accidentally removed in previous refactoring.

* Formatting python file

---------

Co-authored-by: Marjan Antic <Marjan.Antic@amd.com>
2025-10-20 17:40:10 -04:00
alexxu-amd 55baf27627 [CI] Copy over docs update workflow from rocm-libraries (#1400)
* migrate docs update workflow from rocm-libraries

* add test branch to the trigger condition

* modify docs to test workflow

* temporarily rename project folder name to match the test project

* add more content for testing

* test successful, restore test modifications
2025-10-17 13:47:28 -04:00
Jason Bonnell 9664f1dc91 [rocprofiler-systems] Add retries to RHEL install steps (#1384)
* Add GHCR retry logic

* Add retries to Install ROCm Packages step in rocprofiler-systems-redhat.yml

* Update containers-ci.yml file to use latest RHEL9/10 releases

* Use build-docker-ci script in rocprofiler-systems-containers

* Remove working-directory from step in rocprofiler-systems-redhat.yml

* Remove shell bash from Install ROCm Packages step

* Revert RHEL version change in rocprofiler-systems-redhat.yml
2025-10-17 10:20:54 -04:00
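Retry logic of the kind added to these install steps can be sketched generically as follows. The helper name `retry`, its parameters, and the simulated flaky step are illustrative assumptions, not the workflow's actual commands:

```python
import time


def retry(action, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Run action(); on failure wait and try again, up to `attempts` times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # broad on purpose: any failure retries
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)
    raise last_error


# Simulated install step that fails twice before succeeding.
calls = {"count": 0}


def flaky_install():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient download failure")
    return "installed"


# A no-op sleep keeps the demonstration instant.
result = retry(flaky_install, attempts=5, sleep=lambda seconds: None)
print(result, calls["count"])
```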
Venkateshwar Reddy Kandula 9404178ea5 [rocprofiler-sdk][CI] rhel sles workflow fix (#1373)
* bug fix.

* add backslash

* add export for path, bug
2025-10-15 11:48:59 -05:00
Mythreya Kuricheti ac8adbacff [CI][rocprofiler-sdk] Fix codeql jobs (#1366) 2025-10-15 10:34:29 -05:00
Mythreya Kuricheti 765d9026c7 [CI][rocprofiler-sdk] Workflow improvements (#1341) 2025-10-14 15:21:55 -05:00
Mythreya Kuricheti fd82a185c2 [CI][rocprofiler-sdk] Add HIP build to CI (#1311) 2025-10-08 21:37:42 -05:00
Geo Min 388edb1b57 [TheRock CI] Adding profiler builds (#1301)
* Adding profiler for TheRock CI

* adding temp test for rocprofiler

* Removing subtrees

* PR comment
2025-10-08 14:38:38 -07:00
Jason Bonnell cccc350dc6 [rocprofiler-systems] Add different test coverage for CI/Nightly, add better logging for failures (#1272)
* Try outputting LastTest.log

* Update if condition for outputting log

* Another attempt

* Only run Ubuntu Noble on MI355 in push/PR

* Try exclude matrix

* Move conditional statement in matrix exclusion

* Create ci-matrix.yml file

* Add needs parameter to ubuntu job

* Fix typo in matrix output variable

* Add back pull_request_template.md

* Add back pull_request_template.md
2025-10-08 15:18:56 -04:00
David Galiffi d6bdc53f1a Update rocprofiler-systems-continuous-integration.yml (#1271)
Disabling network test from CI while we investigate its instability.
2025-10-07 18:55:30 -04:00
Venkateshwar Reddy Kandula 952d1dabe2 [ROCProfiler-SDK][ROCR] HSA New API changes for HSA_AMD_EXT_API_TABLE_STEP_VERSION 8 (#1182)
* add new hsa ext api for version 8.

* use fmt instead of ostream.

* override rccl from therock

* Update rocprofiler-sdk-continuous_integration.yml

* Update rocprofiler-sdk-continuous_integration.yml

* Update rocprofiler-sdk-continuous_integration.yml

* enable rocr-build

* format

* disable att consecutive-kernels tests.

* Enable ROCR build in code coverage workflow

---------

Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com>
2025-10-06 13:09:39 -05:00
Jason Bonnell ad78611674 rocprofiler-systems Nightly and CI on Ubuntu Jammy/Noble on MI355 and MI325 (#997)
* Initial steps added for rocprofiler-systems-continuous-integration.yml

* Add new line to end of rocprofiler-systems-continuous-integration.yml

* Fix matrix issue in rocprofiler-systems CI workflow

* Update runner to use mi355

* Remove sudo from ROCm download step

* Add Python venv

* Try to install python venv

* Add -y to pip venv install commands

* Add shell: bash to download ROCm step

* Fix issue in if statement

* Fix typo in mv command

* Fix mv command

* Update paths

* add directory in install step

* Use default runner for now while debugging setup

* Add set -e to steps

* debug build step

* Add amdgpu install step

* remove working-directory from amdgpu install step

* add path/ld lib path, add -S argument to run-ci.py

* Fix typo in DCMAKE_PREFIX_PATH

* Add DGPU_TARGETS to run-ci.py command

* add Docker options, remove GPU_TARGETS

* Install amd-smi-lib

* Add DCMAKE_BUILD_TYPE, update path

* Remove mkdir

* Add build dyninst cmake arguments

* Update cmake arguments again

* Add missing \ to run-ci.py command

* add libdw dependency

* Add later install step

* Increase timeout of configure/build/test step

* use 16 jobs to try and speed up pipeline time

* Add GHCR image, remove TheRock tarball download step, minor changes for debugging

* Add credentials to container portion of step

* Add package read permissions to ubuntu step

* Update tarball name

* Increase jobs to 16, disable some tests for now due to timeouts

* Modify to only include gpu tests

* Fix configuration

* Enable MPI on run-ci.py run

* Add install MPI step, changed tests to be run

* Enable OMPI flags, enable network counter access

* Use new Docker image names, add privileged option to Docker

* Change cmake build type

* Add fail-fast false option for CI

* Update ROCM_VERSION variable to reflect docker changes

* Specify TARBALL_ROCM_VERSION as separate

* Add MI325 to debug pipeline errors

* Move location of env variables

* Only test on jammy for now, run all tests to assess other issues

* test with branch that contains fix for openmp

* Exclude "ompvv"

We will re-add once ticket is fixed.

* Test: Disable USE_MPI

* Replace TheRock ROCm install with rocm-dev for now

* Try out MI355 noble and MI325 for jammy/noble

* Update amdgpu step to support different ROCm versions

* Remove unused env variables

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-10-06 11:40:58 -04:00
Jason Bonnell f0fd2797b6 Add rocm-version 7.0 to rocprofiler-systems workflows (#1139)
* Adding rocm 7.0 to Ubuntu, Red Hat, and Debian workflows

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-10-03 13:16:21 -04:00
Geo Min 36a1fd87af Removing landed patch (#1184) 2025-09-30 16:51:41 -07:00
Geo Min b0a9a2386f [ci] Adding TheRock CI coverage for rocm-core (#868)
* TheRock CI points to rocm systems

* Fixing depth

* Fixing cache path

* Adding core components

* Adding more packages

* try this for windows building

* Add math libs

* Adding core only

* Attempt with no ccache

* adding patching

* Adding ls test

* adding this

* removing ls test

* changing dir name

* Adding cleanup for patch

* Adding ref

* adding correct no include

* Adding new temp branch for testing

* empty commit

* empty commit

* Adding commit hash bump

* Adding new hash for removed patches

* Adding TheRock submodule bump

* trying with compiler removed test

* Try dvc pull windows

* Update .github/workflows/therock-ci-linux.yml

Co-authored-by: Marius Brehler <marius.brehler@gmail.com>

* Adding correct env

* revert to ../

* Adding path

* try new var

* Adding new branch

* Adding correct hash

* Update .github/workflows/therock-ci-linux.yml

Co-authored-by: Marius Brehler <marius.brehler@gmail.com>

* Update .github/workflows/therock-ci-windows.yml

Co-authored-by: Marius Brehler <marius.brehler@gmail.com>

---------

Co-authored-by: Marius Brehler <marius.brehler@gmail.com>
2025-09-30 16:08:50 -07:00
Jason Bonnell 953fd60e9b rocprofiler GHCR Rename (#1112)
- Rename the GHCR packages for rocprofiler Docker images to reduce the number of packages that will be released on the repository
- Changed package name to only include the OS instead of OS+Version - version moved to the tag instead.
- Updated Dockerfile.*.ci files to specify target ROCm version from tarball in name.
2025-09-30 15:15:12 -04:00
Jason Bonnell cec7ce77d6 Add sudo apt-get update command to workflow (#1177)
- 404 Not Found errors when trying to download dependencies in the Get the latest therock build step. Adding `sudo apt-get update` command first to avoid this.
- Added `sudo apt-get update` to the rocprofiler-sdk-build-ci-docker-images.yml workflow.
2025-09-30 14:09:36 -04:00
Laura Promberger fb3677cad6 fetch_sources: replace flags that the newer version recognizes (#1148)
* fetch_sources: replace flags that the newer version recognizes

* fetch_sources: remove --no-include-rocm-libraries
2025-09-26 11:36:28 -07:00
amd-hsivasun c16b06a7d7 [Ex CI] Enable aqlprofile (#1002)
* [Ex CI] Enable aqlprofile

* [Ex CI] Added PipelineID for aqlprofile
2025-09-26 14:00:41 -04:00