Motivation:
Basic runners are frequently running out of space
Technical Details:
Running autoclean after package installations.
Use the jlumbroso/free-disk-space action.
* [ROCProfiler-SDK] Remove 'gfx900' and 'gfx940' from GPU targets
* Remove unsupported GPU targets from workflow
* Remove gfx900 and gfx940 from GPU targets
* Change how cache manager handles child process trace cache
* Sampling and backtrace metrics to cache
* Apply cmake formatting
* Fix parsing of metadata json
* Code clean up
* Fix build nlohmann json from source
* Fix storage parsed finished callback
* Revert sampling for child process
* Change cache file name generating
* Fix thread start stop
* Fix process start end timestamp
* Applied suggestions from code review
* Try with late start of flushing task thread
* Change dockerfiles for ci
* Revert changes on github workflows
* Remove json_fwd.hpp include
* fix dump
* Build nlohmann/json by default
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Update location of build artifacts for nlohmann/json
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Revert use_output_suffix
* Remove unused logs
* Fix cache store inside counter due to structure change
* Remove decode tests from debian ci
* Fix issue where all databases have the same UUID (#1499)
Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>
* Removing the cpack and install steps to save space
* Revert "Remove decode tests from debian ci"
This reverts commit ddabf6dd142dcf438e6b8997b8abe86f2c868468.
* Revert "Removing the cpack and install steps to save space"
This reverts commit 973da3a1ba99d99d529af5269d30e177092f9bfa.
* Add prepare-runner job as dependency to clean up the space
* Fix formatting
* Free up even more space
* Remove verbose for workflows
* remove hw_counters from ext_data
* move space clean up inside container
* try to remove external folder to free up space
* Check space
* Refactor cleanup into its own step
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Aleksandar Djordjevic <aleksandar.djordjevic@amd.com>
Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>
* Upgrade min python version from 3.8 to 3.9
* Set min version for textual-fspicker for TUI support
* Update workflows to use python 3.9 instead of 3.8
* fix formatting
* fix bug
---------
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
* migrate docs update workflow from rocm-libraries
* add test branch to the trigger condition
* modify docs to test workflow
* temporarily rename project folder name to match the test project
* add more content for testing
* test successful, restore test modifications
* Add GHCR retry logic
* Add retries to Install ROCm Packages step in rocprofiler-systems-redhat.yml
* Update containers-ci.yml file to use latest RHEL9/10 releases
* Use build-docker-ci script in rocprofiler-systems-containers
* Remove working-directory from step in rocprofiler-systems-redhat.yml
* Remove shell bash from Install ROCm Packages step
* Revert RHEL version change in rocprofiler-systems-redhat.yml
* Try outputting LastTest.log
* Update if condition for outputting log
* Another attempt
* Only run Ubuntu Noble on MI355 in push/PR
* Try exclude matrix
* Move conditional statement in matrix exclusion
* Create ci-matrix.yml file
* Add needs parameter to ubuntu job
* Fix typo in matrix output variable
* Add back pull_request_template.md
* Add back pull_request_template.md
* Initial steps added for rocprofiler-systems-continuous-integration.yml
* Add new line to end of rocprofiler-systems-continuous-integration.yml
* Fix matrix issue in rocprofiler-systems CI workflow
* Update runner to use mi355
* Remove sudo from ROCm download step
* Add Python venv
* Try to install python venv
* Add -y to pip venv install commands
* Add shell: bash to download ROCm step
* Fix issue in if statement
* Fix typo in mv command
* Fix mv command
* Update paths
* add directory in install step
* Use default runner for now while debugging setup
* Add set -e to steps
* debug build step
* Add amdgpu install step
* remove working-directory from amdgpu install step
* add path/ld lib path, add -S argument to run-ci.py
* Fix typo in DCMAKE_PREFIX_PATH
* Add DGPU_TARGETS to run-ci.py command
* add Docker options, remove GPU_TARGETS
* Install amd-smi-lib
* Add DCMAKE_BUILD_TYPE, update path
* Remove mkdir
* Add build dyninst cmake arguments
* Update cmake arguments again
* Add missing \ to run-ci.py command
* add libdw dependency
* Add later install step
* Increase timeout of configure/build/test step
* use 16 jobs to try and speed up pipeline time
* Add GHCR image, remove TheRock tarball download step, minor changes for debugging
* Add credentials to container portion of step
* Add package read permissions to ubuntu step
* Update tarball name
* Increase jobs to 16, disable some tests for now due to timeouts
* Modify to only include gpu tests
* Fix configuration
* Enable MPI on run-ci.py run
* Add install MPI step, changed tests to be run
* Enable OMPI flags, enable network counter access
* Use new Docker image names, add privileged option to Docker
* Change cmake build type
* Add fail-fast false option for CI
* Update ROCM_VERSION variable to reflect docker changes
* Specify TARBALL_ROCM_VERSION as separate
* Add MI325 to debug pipeline errors
* Move location of env variables
* Only test on jammy for now, run all tests to assess other issues
* test with branch that contains fix for openmp
* Exclude "ompvv"
We will re-add it once the ticket is fixed.
* Test: Disable USE_MPI
* Replace TheRock ROCm install with rocm-dev for now
* Try out MI355 noble and MI325 for jammy/noble
* Update amdgpu step to support different ROCm versions
* Remove unused env variables
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
* TheRock CI points to rocm systems
* Fixing depth
* Fixing cache path
* Adding core components
* Adding more packages
* try this for windows building
* Add math libs
* Adding core only
* Attempt with no ccache
* adding patching
* Adding ls test
* adding this
* removing ls test
* changing dir name
* Adding cleanup for patch
* Adding ref
* adding correct no include
* Adding new temp branch for testing
* empty commit
* empty commit
* Adding commit hash bump
* Adding new hash for removed patches
* Adding TheRock submodule bump
* trying with compiler removed test
* Try dvc pull windows
* Update .github/workflows/therock-ci-linux.yml
Co-authored-by: Marius Brehler <marius.brehler@gmail.com>
* Adding correct env
* revert to ../
* Adding path
* try new var
* Adding new branch
* Adding correct hash
* Update .github/workflows/therock-ci-linux.yml
Co-authored-by: Marius Brehler <marius.brehler@gmail.com>
* Update .github/workflows/therock-ci-windows.yml
Co-authored-by: Marius Brehler <marius.brehler@gmail.com>
---------
Co-authored-by: Marius Brehler <marius.brehler@gmail.com>
- Rename the GHCR packages for rocprofiler Docker images to reduce the number of packages that will be released on the repository
- Changed the package name to include only the OS instead of OS+Version; the version moved to the tag instead.
- Updated Dockerfile.*.ci files to specify target ROCm version from tarball in name.
- Fixed 404 Not Found errors when downloading dependencies in the "Get the latest therock build" step by running `sudo apt-get update` first.
- Added `sudo apt-get update` to the rocprofiler-sdk-build-ci-docker-images.yml workflow.
This updates the CI to match TheRock's CI by introducing a
Python script that reports health status. Commands that were in the
section that modified the status were moved to separate sections
(ccache, git config, ...).
Related PR:
https://github.com/ROCm/TheRock/pull/1516
* Initial skeleton code for rocprofiler-systems-continuous-integration.yml
* Add python3-devel to opensuse and rhel ci images
* Update rocprofiler-systems-containers.yml to include TheRock tarballs
* Update pip install command for Dockerfile.ubuntu.ci
* Fix pip install again for Dockerfile.ubuntu.ci
* Remove skeleton workflow for CI
* Add new ci-gfx containers for TheRock installs
* Add set -e and pipefail to ci Dockerfiles to detect errors
* Upgrade pip in Dockerfile.ubuntu.ci
* revert pipefail set -e change
* Replace build-docker-ci.sh script with Docker step for ci-base
* Add support for gfx950, add containers-ci-gfx.yml
* Add working-directory to matrix setup steps
* Try changing containers-ci-gfx.yml
* make more changes to containers-ci-gfx.yml
* Remove build-docker-ci.sh script from gfx step, fix typo in Dockerfile
* Remove gfx110X and gfx120X for now
* Update ci-gfx docker workflow to use ghcr.io
* Temporary change to test one image
* Enable push to test out ghcr package
* Add labels to debug oauth issue
* add packages permissions to step
* add rocprofiler-systems-ghcr.yml workflow
* Remove cache from Docker push action step
* Add prefix to tag
* Add back gfx94X and gfx950 support, add back no push on PR
* Remove gfx container creation from rocprofiler-systems-containers.yml
* Add a gfx950 image for now
* Revert change
* Replace cmake-format with gersemi in rocprofiler-compute-formatting.yml
* Run gersemi formatting on CMakeLists.txt files
* Remove .cmake-format.yaml, add .gersemirc file
* Add more options to .gersemirc
* Add new line to .gersemirc
* Add new line to CMakeLists.txt
* Run gersemi again with new options
* Added Fortran (amdflang) openmp tests using the openmp-vv project
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
ROCProfiler-Register/Systems/Compute: The license file was renamed from LICENSE to LICENSE.md, so the CMake install module and all other locations that reference it must be updated to the new name.