Sets heavy GitHub CI workflows to not trigger on docs-only changes.
Specifically, sets azure-ci-dispatcher.yml and therock-ci.yml, as well as many rocprofiler workflows, to not trigger when the change consists entirely of docs-only files.
* Fix typo in matrix definition for aqlprofile-continuous_integration.yml
* Update ROCM_VERSION to 7.1.1
* Minor changes to core-rpm step
* Add working-directory to test steps
* Revert changes
* Add set -v to rpm test step
* Remove Python venv line from rpm test step
* [rocprofiler-sdk] Fix fmt::join build errors
- remedy use of fmt::join without include <fmt/ranges.h>
* include memory header
* Disable FMT build for SDK CI
* Add -DROCPROFILER_BUILD_FMT=OFF to sanitizer steps
* Add temporary workaround for rccl.h issue
* Add ROCPROFILER_INTERNAL_RCCL_API_TRACE to SDK CI builds
* disable clang-tidy for vendored includes
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
* Add verbose output for submods step
* Remove git config setting
* Determine git version
* Try different git install
* Update Dockerfile.ci
* Revert git location in Ubuntu jobs
* Update RHEL and SLES sections to use 2.52 as well
* Add git --version to each step, fix typo in SLES Docker
* Update rocprofiler workflows to use new runner naming for mi325
* Add input options to workflow_dispatch for rocprofiler-systems CI workflow
* Update runner name on therock-ci-linux.yml as well
## Motivation
Missing CODEOWNERS for ROCProfiler-SDK
## Technical Details
Add CODEOWNERS for rocprofiler-sdk project
## Submission Checklist
- [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
## Add Full Build Capability to theROCK for HIP
### Summary
This PR adds full build support to **theROCK** for HIP-related changes, ensuring that all components are built.
### Changes
- Enabled full build coverage for the following projects:
- `projects/clr`
- `projects/hip`
- `projects/hip-tests`
- `projects/rocr-runtime`
- Updated build configuration to include all targets for the above projects.
- Ensured rocm-libraries is pulled to build optional components.
### Motivation
These changes are required to support HIP development and testing within theROCK by ensuring all components are built together. This improves reliability and integration testing.
* Install rocm-dev in rocprofiler-compute-tarball.yml workflow
* Update paths for push and PR for rocprofiler-compute-tarball.yml
* Add ROCm dependencies to disttest job
* cmake fix binary link creation and fix format
* Use python3 instead of python3.9 in RHEL 8 and RHEL 9 workflows
* set default python3 to python3.9 in rhel8
* Try alternatives setup for python3 in RHEL8 env
* Add pip install cmake to debug RHEL8 issue
* Remove python3.11 in RHEL8 workflow
* Add back comment regarding RHEL8
---------
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
Following the pattern from ROCm/rocm-libraries#2679, add logic to skip
CI builds when only documentation files are modified.
Changes:
- Add SKIPPABLE_PATH_PATTERNS for docs, markdown, and .gitignore files
- Return empty projects list when only skippable paths are modified
- No workflow changes needed - existing projects != '[]' check handles it
- Add unit tests for doc-filtering logic
- Fix existing tests with proper subprocess mocking
Reference: https://github.com/ROCm/rocm-libraries/pull/2679
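
A minimal sketch of the doc-filtering idea described above. Only `SKIPPABLE_PATH_PATTERNS` is a name taken from this change; the pattern values, the helper names `is_skippable` and `select_projects`, and the `projects/<name>/` path layout are assumptions for illustration, not the actual script.

```python
import fnmatch

# Assumed patterns; the real list lives in the CI configuration script.
SKIPPABLE_PATH_PATTERNS = [
    "docs/*",
    "*/docs/*",
    "*.md",
    ".gitignore",
    "*/.gitignore",
]


def is_skippable(path: str) -> bool:
    """Return True if the path matches any docs-only pattern."""
    return any(
        fnmatch.fnmatchcase(path, pattern) for pattern in SKIPPABLE_PATH_PATTERNS
    )


def select_projects(changed_paths: list[str]) -> list[str]:
    """Return the projects to build, or an empty list for docs-only changes."""
    if changed_paths and all(is_skippable(p) for p in changed_paths):
        # Docs-only change: an empty list lets a `projects != '[]'` check skip CI.
        return []
    projects = set()
    for path in changed_paths:
        parts = path.split("/")
        # Hypothetical layout: projects/<name>/...
        if len(parts) >= 2 and parts[0] == "projects":
            projects.add(parts[1])
    return sorted(projects)
```

With this shape, a change touching only `docs/index.md` and `.gitignore` yields an empty list, while any non-documentation file in the change set keeps the normal project detection.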
Although the value is correct, there is no source of truth between
kernel and userspace. This leads to problems if the kernel has strict
restrictions (such as kernel 6.17 or earlier). The restrictions were
lifted in 6.17.9 and 6.18, but there is no guarantee userspace is
using this.
So short term this value will be wrong. But on newer kernels the kernel
will communicate the right size and rocr-runtime will be adjusted to
use that.
Link: https://github.com/ROCm/TheRock/pull/2505
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Add ls statement for debugging /opt directory file naming
* Update ROCM_VERSION from 7.0.0 to 7.1.1 in SDK CI
* Update amdgpu debian package for Ubuntu in Dockerfile.ci
* disable HIP/CLR build in codeql (#2242)
---------
Co-authored-by: Venkateshwar Reddy Kandula <Venkateshwarreddy.Kandula@amd.com>
* Initial work in progress for compute CI workflow
* Update run-ci.py script location, enable test creation
* Add new lines to files
* Add coverage file argument to run-ci.py
* Remove run-ci.py script usage from rocprofiler-compute-continuous-integration.yml workflow
* Add --break-system-packages parameter
* Add --ignore-installed to pip install
* Checkout specific branch until amdclang issue fixed in develop
* Add missing slash to path for cxx compiler
* Remove specific branch from checkout action
* Use run-ci.py in rocprofiler-compute-continuous-integration.yml
* Update install python requirements step
* Fix typo in build-name
* Update run-ci.py to have toggle for code coverage
* Apply ruff formatting
* Ruff again
* Exclude live attach detach and roofline tests in CI
* Add ctest args
* Revert run-ci.py changes
* Try new run-ci-2.py
* Update type of pytest-numprocs argument
* Try casting arg to str
* Fix typo in arg reference
* upgrade pip before running python installs
* Use jammy instead of noble for CI
* Remove python nproc arg from run-ci-2.py
* Switch to MI325 runners for CI
* Fix spacing issue
* Rename run-ci.py to run-code-coverage.py, add new run-ci.py
* Update to ROCm version 7.1.0 to debug sdk issues
* Testing out tarball install again
* Update regex on tarball version
* Update tarball regex on compute
* ruff formatting
* Revert change to systems CI file
* Switch back to rocm-dev install
* ruff formatting again
* Add ld_lib_path for rocm_sysdeps
* Remove excluded tests temporarily
* Add back excluded tests, add timeout for test step
* Address PR feedback
* Add git safe directory lines
* Revert dependencies change to debug new failures
* Exclude roofline again, rework dependencies
* Add in hip-runtime-amd dependency
* Install hip dev package
* Add TEST_FROM_INSTALL cmake arg to compute CI workflow
* Remove test_from_install for now
* Enable roofline tests again
## Motivation
- __Reduced Code Duplication__: Version parsing logic moved from individual Dockerfiles to the central build script
- __Improved Edge Case Handling__: Better handling of ROCm versions with and without patch numbers (e.g., `6.2` vs `6.2.0`)
- __Easier Maintenance__: Future version-related changes only need to be made in one place
- __Cleaner Dockerfiles__: Simplified Dockerfiles focus on package installation rather than complex shell logic
- __Updated Platform Support__: Refreshed container matrix to reflect current platform/ROCm version combinations
- __Fix OpenSUSE Docker Generation__: OpenSUSE container generation fails due to a change to the `binutils-gold` package
- __Error Handling__: Fix bug where errors in docker image build were being masked, allowing workflow to pass anyway.
## Technical Details
- Updated `Dockerfile.opensuse` and `Dockerfile.opensuse.ci` docker files to remove `binutils-gold`
- Not needed since we build `binutils` with systems anyways
- Updated `rocprofiler-systems-containers.yml` to remove `pushd/popd` commands and just run the shell scripts
- There was a silent failure observed here, which I verified in this PR before adding the fix for openSUSE
- Refactor ROCm version parsing. Move this logic to the `build-docker.sh` script to reduce duplication.
- Fix bug that caused ROCm 7.0 to fail installation. The trailing `.0` was being trimmed.
- Fixed inconsistencies in `containers.yml` that led to invalid ROCm-OS_VERSION combinations.
- Formatting fixes
- Removed trailing whitespace
- Fix docker build warnings. Use an `=` rather than ` ` when assigning an environment variable.
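
The version handling described above can be sketched as follows. The actual logic lives in `build-docker.sh` (a shell script); this Python version is only an illustration, and the function names and the choice of which form downstream package steps expect are assumptions.

```python
def parse_rocm_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR[.PATCH]' into integers, defaulting PATCH to 0."""
    parts = version.strip().split(".")
    if len(parts) not in (2, 3) or not all(p.isdigit() for p in parts):
        raise ValueError(f"unsupported ROCm version string: {version!r}")
    major, minor = int(parts[0]), int(parts[1])
    patch = int(parts[2]) if len(parts) == 3 else 0
    return major, minor, patch


def short_version(version: str) -> str:
    """Return 'MAJOR.MINOR' when PATCH is 0, else 'MAJOR.MINOR.PATCH'.

    Only a zero PATCH component is dropped. The MINOR component is always
    kept, so '7.0' and '7.0.0' both yield '7.0' rather than '7'.
    """
    major, minor, patch = parse_rocm_version(version)
    if patch == 0:
        return f"{major}.{minor}"
    return f"{major}.{minor}.{patch}"
```

Parsing into numeric components before formatting avoids string trimming of a trailing `.0`, which is the kind of operation that can turn `7.0` into `7`.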
* Add WIP workflow step to delete untagged images older than 1 week
* Formatting fix for rocprofiler-systems-ghcr.yml
* Move step to new workflow
* Remove needs parameter from cleanup-rocprofiler-images
* Remove expand-packages option
* Expand cleanup for every OS
* Revert spacing change to rocprofiler-systems-ghcr.yml
* Turn off dry-run to do an initial clean
* Switch dry-run to be only on PR
* Added comment about schedule
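
A rough sketch of the cleanup policy in the commits above: select untagged images older than one week, and only report them (dry run) when triggered from a pull request. The image record layout, the function names, and the `delete` callback are hypothetical; the real workflow relies on a GitHub Action rather than code like this.

```python
from datetime import datetime, timedelta, timezone


def images_to_delete(images, now, max_age=timedelta(weeks=1)):
    """Return ids of untagged images created before the cutoff."""
    cutoff = now - max_age
    return [
        image["id"]
        for image in images
        if not image["tags"] and image["created"] < cutoff
    ]


def cleanup(images, now, event_name, delete):
    """Delete selected images unless the run was triggered by a pull request."""
    dry_run = event_name == "pull_request"
    selected = images_to_delete(images, now)
    if not dry_run:
        for image_id in selected:
            delete(image_id)
    return selected, dry_run
```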
* Use native tool for counter collection
* Add native counter collection tool which uses rocprofiler-sdk C++
library public API to get counter collection data
* This is enabled by default, unless --no-native-tool option is
provided or ROCPROF=rocprofv3 env. var. is provided
* This tool is only supported for ROCm version >=7.x.x
* This tool is not supported for attach/detach scenario
* Build native tool shared object during build time
* If using rocprof-compute without building then runtime compilation of
the native tool shared object is performed
* rocprofiler-sdk tools is still used for services other than counter
collection and data collected by native tool is merged into the
rocpd/csv output of rocprofiler-sdk tool
* Make `rocpd` choice the default choice for `--format-rocprof-output`
option
* If `rocpd` public API from rocprofiler-sdk library is not present,
then fallback to `csv` choice
* In this case only `pmc_perf.csv` is written in workload folder
instead of multiple `csv` files for each profiling run
* Remove `json` choice from `--format-rocprof-output` option since it
functions identically to `csv` option
* Rename option `--rocprofiler-sdk-library-path` to
`--rocprofiler-sdk-tool-path` since we LD_PRELOAD the
rocprofiler-sdk tool shared object and not the rocprofiler-sdk library
shared object
* Fix the meaning of `--dispatch` option in `profile` mode to mention
dispatch iteration filtering instead of dispatch id filtering
* --dispatch option in analyze mode does dispatch id filtering
* Move standalone binary creation logic from cmake file to docker file
* fix native counter collection tool during attach/detach
* improve logging
* fix attach detach with native tool
* fix attach detach with native tool
* do not support attach/detach in native tool
* Update changelog
* add standalone binary creation functionality in cmake
* address review comments
* address review comments
* fix formatting
* address review comments
* Adding paths for cmake to search. Also updated min. cmake requirement to 3.21 as this was when hip was supported.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Update hip compiler ID check, sometimes comes up as Clang, sometimes ROCMClang- depends on setup.
Updated formatting.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* RHEL8.10 unable to compile due to defaulting to old c++ version, need to force c++17
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Updating changelog per docs team recommendations
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Apply suggestions from code review to changelog
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
* Do not require HIP compiler to build native counter collection tool
* fix cmake
* gersemi formatting on latest cmake change
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* ex ci updated dependencies to include rocprofiler-sdk, but cmake was still not capturing the path. There was a commit that added an entry to cmake_prefix_path specifying rocprof-sdk's cmake location, but it was too specific for the search paths in find_package's config mode.
Removing the cmake_prefix_path var and adding hints to the find_package call instead, and specifying config mode so it knows how to construct the search paths
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* gersemi run for formatting
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Still need prefix path, should not have been removed in last commit but does need to be shortened to just the rocm path to allow for find_package config mode to do the job
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* include cstdint for uint32_t
* Run formatting on helper.cpp
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Remove rocm 7.2 release stuff from version and changelog and handle it in separate pr
* fix version
* fix changelog
* fix changelog
* run ruff formatter
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* fix rocprofiler-sdk attach so path
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
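
The selection rules stated in the commit message above (native tool on by default, disabled by `--no-native-tool` or `ROCPROF=rocprofv3`, ROCm 7 or newer only, no attach/detach, and `rocpd` falling back to `csv`) can be summarized in a small sketch. The function names and parameters are hypothetical and do not mirror the rocprof-compute source.

```python
from typing import Mapping, Optional


def use_native_tool(
    no_native_tool: bool,
    env: Mapping[str, str],
    rocm_major: int,
    attach: bool,
) -> bool:
    """Decide whether the native counter collection tool should be used."""
    if no_native_tool:
        return False  # --no-native-tool was passed
    if env.get("ROCPROF") == "rocprofv3":
        return False  # explicit opt-out through the environment
    if rocm_major < 7:
        return False  # only supported for ROCm 7.x.x and newer
    if attach:
        return False  # attach/detach is not supported by the native tool
    return True


def resolve_output_format(requested: Optional[str], rocpd_available: bool) -> str:
    """Default to 'rocpd', falling back to 'csv' when the rocpd API is absent."""
    chosen = requested or "rocpd"
    if chosen not in ("rocpd", "csv"):
        raise ValueError(f"unsupported format: {chosen!r}")
    if chosen == "rocpd" and not rocpd_available:
        return "csv"
    return chosen
```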
## Overview and rationale
This reverts https://github.com/ROCm/rocm-systems/pull/1886, which...
* Re-applies https://github.com/ROCm/rocm-systems/pull/1866
* Reverts https://github.com/ROCm/rocm-systems/pull/1728
(So it restores the [`amdgpu-windows-interop/`](https://github.com/ROCm/rocm-systems/tree/develop/shared/amdgpu-windows-interop) folder back to the state from a few weeks ago)
The rationale for this change is at https://github.com/ROCm/rocm-systems/pull/1866:
> Last PAL update broke applications on gfx12 Windows.
## Cross-repository change details
That PR failed to build but was merged with this explanation:
> TheRock CI Windows build fails as expected with this revert.
>
> References to these PAL members need to be stripped out in a patch on TheRock.
>
> ```
> 11.3 C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp(152): error C2039: 'RegisterTraceStateChangeCallback': is not a member of 'GpuUtil::TraceSession'
> 11.4 C:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil\palTraceSession.h(372): note: see declaration of 'GpuUtil::TraceSession'
> 11.4 C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp(195): error C2039: 'UnregisterTraceStateChangeCallback': is not a member of 'GpuUtil::TraceSession'
> 11.4 C:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil\palTraceSession.h(372): note: see declaration of 'GpuUtil::TraceSession'
> ```
The patch in TheRock was updated in https://github.com/ROCm/TheRock/pull/2154. This rolls forward by updating the ref for TheRock.
That original PR could have been sequenced differently to avoid a build break - perhaps by
* Pointing to a branch in TheRock with the patch rebased
* Deleting the patch in the workflows here but holding a local copy of the patch to be applied in workflows
* Landing the patch as a normal commit instead of carrying it at all
## Test plan
1. Watch TheRock CI here (https://github.com/ROCm/rocm-systems/actions/runs/19447202693/job/55644411119?pr=1893)
2. Build locally:
```bash
# In rocm-systems
# Paths are single-quoted so that bash does not strip the backslashes.
git am --whitespace=nowarn 'D:\projects\TheRock\patches\amd-mainline\rocm-systems\0001-Revert-SWDEV-543498-Some-compute-Ubertrace-profiles-.patch'
git am --whitespace=nowarn 'D:\projects\TheRock\patches\amd-mainline\rocm-systems\0003-Use-is_versioned-true-consistently-in-both-Comgr-Loa.patch'
git am --whitespace=nowarn 'D:\projects\TheRock\patches\amd-mainline\rocm-systems\0006-Explicitly-load-libamdhip64.so.7.patch'
# Note: the build fails with the observed errors if patch 0001 is not applied!
# In TheRock
# IMPORTANT: THEROCK_ROCM_SYSTEMS_SOURCE_DIR must point at the local rocm-systems checkout.
cmake -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_C_COMPILER=cl.exe -DCMAKE_CXX_COMPILER=cl.exe \
-DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
-DPython3_EXECUTABLE=d:/projects/TheRock/.venv/Scripts/python \
-DTHEROCK_ROCM_SYSTEMS_SOURCE_DIR=d:/projects/TheRock/../rocm-systems \
-DTHEROCK_AMDGPU_FAMILIES=gfx110X-all \
-DBUILD_TESTING=ON \
-DTHEROCK_ENABLE_ALL=ON \
-Damd-llvm_BUILD_TYPE=RelWithDebInfo \
-S D:/projects/TheRock \
-B D:/projects/TheRock/build \
-G Ninja
cmake --build D:/projects/TheRock/build --target hip-clr
# [build] Build finished with exit code 0
cmake --build D:/projects/TheRock/build --target ocl-clr+dist
# [build] Build finished with exit code 0
```
Replicating https://github.com/ROCm/TheRock/pull/2147#discussion_r2528008441
## Motivation
Fixes https://github.com/ROCm/TheRock/issues/875, where Windows builds would randomly fail with a `SignatureDoesNotMatch` error when uploading to S3. The cause is special characters in the AWS access keys generated by the `configure-aws-credentials` action, which are passed to `aws-cli` through Windows environment variables. More details below.
## Technical Details
https://github.com/ROCm/TheRock/issues/875#issuecomment-3530851762
In summary, Windows workflows now set the `special-characters-workaround` option to true for the `configure-aws-credentials` action, which regenerates access keys until they contain no special characters that might not pass correctly through Windows environment variables.
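
Conceptually, the workaround behaves like the retry loop sketched here. This is not the action's implementation: which characters count as "special", the attempt limit, and the `fetch_credentials` callback are all assumptions made for illustration.

```python
import string

# Assumption: only ASCII letters and digits are treated as safe.
SAFE_CHARS = set(string.ascii_letters + string.digits)


def has_special_characters(secret: str) -> bool:
    """Return True if the secret contains any character outside SAFE_CHARS."""
    return any(ch not in SAFE_CHARS for ch in secret)


def fetch_until_safe(fetch_credentials, max_attempts: int = 10):
    """Request credentials repeatedly until the secret key has no special characters."""
    for attempt in range(1, max_attempts + 1):
        creds = fetch_credentials()
        if not has_special_characters(creds["secret_access_key"]):
            return creds, attempt
    raise RuntimeError(f"no safe credentials after {max_attempts} attempts")
```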
## Test Plan
Observe CI.
## Test Result
TBD.
## Submission Checklist
- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
* Update workflow files to use general public rocm dev build images from dockerhub.
Old method was to borrow rocprofiler-systems images but they do not contain rocm install anymore, so we cannot rely on them.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Add workflow files to paths on push and PR
* Revert change of image for red hat variant because the image offered in official rocm image release is too large for runners.
Going back to using systems team images and installing rocm on them (as they do) as a workaround until we can get a smaller package size docker image with ROCm included.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Adjusted python3-devel install line with an if else determined by distro version.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
* Fix cmake formatting
* Updated rev. in `.pre-commit-config.yaml`
* Pin the gersemi used in CI to v0.23.1, matching the pre-commit
---------
Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
* Add clean up of buffered_storage files
* Add step to workflows to test for remaining temp files after tests
* Applied suggestions from code review
* add deletion of all cache files
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
* Update rocprofiler_config_interfaces.cmake to use different elf naming
* try out conditional for libelf
* run cmake-format to fix formatting issue
* Remove libelf.patch file from therock-ci-windows.yml
* Remove libelf patch from therock-ci-linux.yml as well