## Motivation
<!-- Explain the purpose of this PR and the goals it aims to achieve. -->
- __Reduced Code Duplication__: Version parsing logic moved from individual Dockerfiles to the central build script
- __Improved Edge Case Handling__: Better handling of ROCm versions with and without patch numbers (e.g., `6.2` vs `6.2.0`)
- __Easier Maintenance__: Future version-related changes only need to be made in one place
- __Cleaner Dockerfiles__: Simplified Dockerfiles focus on package installation rather than complex shell logic
- __Updated Platform Support__: Refreshed container matrix to reflect current platform/ROCm version combinations
- __Fix OpenSUSE Docker Generation__: OpenSUSE container generation fails due to a change to the `binutils-gold` package
- __Error Handling__: Fix bug where errors in docker image build were being masked, allowing workflow to pass anyway.
## Technical Details
<!-- Explain the changes along with any relevant GitHub links. -->
- Updated `Dockerfile.opensuse` and `Dockerfile.opensuse.ci` docker files to remove `binutils-gold`
- Not needed since we build `binutils` with systems anyways
- Updated `rocprofiler-systems-containers.yml` to remove `pushd/popd` commands and just run the shell scripts
- There was a silent failure observed here, which I verified in this PR before adding the fix for openSUSE
- Refactor ROCm version parsing. Move this logic to the `build-docker.sh` script to reduce duplication.
- Fix bug that caused ROCm 7.0 to fail installation. The trailing `.0` was being trimmed.
- Fixed inconsistencies in `containers.yml` that lead to invalid ROCm-OS_VERSION combinations.
- Formatting fixes
- Removed trailing whitespace
- Fix docker build warnings. Use an `=` rather than ` ` when assigning an environment variable.
Update Dyninst submodule
Refactoring of build scripts to build TBB, Boost, ElfUtils, and LibIberty, since Dyninst build scripts no longer do.
Workflows are now building Dyninst and its dependencies.
---------
Co-authored-by: marantic-amd <marantic@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: 96df9b6d3e]
- Bringing in recent changes from rocm-6.4 branch (https://github.com/ROCm/rocprofiler-systems/pull/171)
- Add libdrm-devel to rhel and suse files
- Update ROCm installation method in Ubuntu file
- Add additional output to `test-release.sh` to catch failures due to a Python version not included
- Add Python 3.13 to Dockers
[ROCm/rocprofiler-systems commit: 83a9eb3d7c]
- Renames the CMake option "ROCPROFSYS_USE_HIP" to "ROCPROFSYS_USE_ROCM"
- Remove the "ROCPROFSYS_USE_ROCM_SMI option. Controlled with the "ROCPROFSYS_USE_ROCM" option, instead.
- Runtime configuration can still toggle ROCPROFSYS_USE_ROCM_SMI to disable the sampling.
- Rename ROCPROFSYS_HIP_VERSION macro to ROCPROFSYS_ROCM_VERSION and remove blocks for `ROCPROFSYS_ROCM_VERSION < 60000`
- Remove ROCPROFSYS_USE_ROCTRACER and ROCPROFSYS_USE_ROCPROFILER
- Update test cases
- Update docker files and workflows to install cmake 3.21, which is required for the rocprofiler-sdk findPackage script.
- Removed rocm-6.2 from workflows due to a rocprofiler-sdk API change.
[ROCm/rocprofiler-systems commit: 88aa2d3cbe]
Updated OS test matrix to match ROCm 6.2.
Update build and CI docker files
Remove the "docs" workflow, because "read-the-docs" is now being used for ROCm documentation
[ROCm/rocprofiler-systems commit: b15c9e94fc]
The Omnitrace program is being renamed.
Full name: "ROCm Systems Profiler"
Package name: "rocprofiler-systems"
Binary / Library names: "rocprof-sys-*"
---------
Co-authored-by: Xuan Chen <xuchen@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: d07bf508a9]
* Update perfetto args.gn.in
- remove enable_perfetto_tools_trace_to_text (unused)
* core timeout implementation
- requires OMNITRACE_CI=ON
- requires OMNITRACE_CI_TIMEOUT=<sec>
- adds pthread_self and std::this_thread::get_id to thread info
- pthread_create_gotcha stores native handles (pthread_self)
* Testing updates
- improve detection of segfault/failures with PASS_REGEX exists
- add OMNITRACE_CI_TIMEOUT env variable to all tests
* Line-info in releases
- e.g. -g1 + more options to minimize size of debug info
* Fix typo in config exit action message
* OMNITRACE_UNLIKELY around debug/verbose messages
* format fixes
* Overflow tests + capability check
* transpose example update
- link to threads library
* roctracer/rocprofiler update
- in ROCm 5.5.0, cannot include rocprofiler.h and roctracer.h in same file due to conflicting enum defs
- Moved HSA tracing setup/shutdown to component::roctracer
* roctracer update
- fix definition of roctracer::setup when disabled
* Update fork example
- detach threads on main PID
- flush io outputs when printing info
* Update overflow tests
- pass regular expressions
- overflow on PERF_COUNT_SW_CPU_CLOCK event
* fork gotcha update
- use getpid() instead of getppid()
* update fork example
- wait on threads calling fork
* timeout update
- wait on timeout thread to launch before proceeding
[ROCm/rocprofiler-systems commit: 3e2fa69a14]
* Fixes for Python 3.11
* Add python 3.11 to scripts
- also tweak to to{upper,lower} bash functions
* Fix PAPI RPM packaging in RedHat
- fix error from #!/usr/bin/python in papi_hl_output_writer.py
- requires either python2 or python3 instead of python
* cpack updates
- only generate STGZ for RedHat
- support `--generators` arg in build-release.sh
- support 7z, zip, and other zip generators
- fix build-release.sh with `--mpi`
- support setting CONDA_ROOT
* Support rhel/fedora/centos in omnitrace-install.py
* RedHat status badge
* Fix support for Python 3.11 + tweak ubuntu ci
- Remove installing clang and mpich in Ubuntu CI container
- Fallback on conda-forge for Python 3.11
- Enable entrypoint-rhel.sh for RHEL CI
- Pull latest container by default
* Update ElfUtils and PAPI builds
- quieter build output
- disable-nls for ElfUtils
- use -s flag for make
* Development Guide Docs
[ROCm/rocprofiler-systems commit: 83f9ed8696]
* Testing and CI support for Ubuntu 22.04
* Fixes for ROCm
- Jammy does not have ROCm installers
* Name, timeout, and python updates
- renamed ubuntu-jammy-external.yml to ubuntu-jammy.yml
- increased all 5 minute timeouts to 10 minutes
- include python 3.10 in testing
* Update dyninst to remove interposed definition of _r_debug
* Rebuild Dyninst + test install script
* Revert container change
* git safe directory
* pushd -> cd
* fix MPI include
* Fix testing step
* OMPI_ALLOW_RUN_AS_ROOT
* Test script changes
* Fix mismatched malloc / delete[]
* Jammy workflow tweaks
* CPack tweak for boost deb deps
* pthread_mutex_gotcha config returns when not enabled
* fix echoing config in CI
* USE_CLANG_OMP
- option to disable using LLVM OpenMP when building OpenMP test executables
- Jammy workflow sets USE_CLANG_OMP=OFF
* Dyninst submodule boost download
- updated containers workflow to include jammy
- updated workflow to use ci
* Updates to workflows + replace test-install.sh
- test-install.sh in this branch was replaced with one in main branch
* Expand jammy test-install.sh args
* Fix openmp-cg-sampling-duration test
* update timemory submodule
- use-after-free violation in popen::pclose
* revert some tweaks to sampling-duration test
* Fix env of test-install.sh
* cmake format
* jammy bash
* CPack install for jammy
* formatting workflow action version bump
* Update timemory submodule
- libunwind submodule via timemory sets SOVERSION to 99 to avoid ABI conflicts with v8
* Fix help menu for omnitrace-sample
* Support other boolean forms in test-install.sh
* Update docker files and build-docker.sh
- consolidated cases in build-docker.sh
- support rocm version of 0.0 (no rocm install)
- support rocm v5.3
- updated centos handling
* update opensuse actions/checkout version
* Tweaks to ubuntu-focal testing
- actions/checkout@v3
- use test-install script
* update cpack
- ubuntu 22.04
- rocm 5.3
- rename os matrix field to os-version
- remove CI_ROCM_VERSION (no longer necessary)
- remove default-rocm-version matrix field (no longer necessary)
- CentOS packaging
* fix argparsing and omnitrace-sample tests in install-tests.sh
* focal rocm test install workflow fix
* Fix omnitrace-sample build
* Dockerfile.centos + build-docker.sh updates
* Update actions/upload-artifact version
* Dockerfile.ubuntu: install rocm-device-libs
* Refactor cpack
* fix cpack if quotes
* Dockerfile.ubuntu rocm < 5 installs rocm-dev
* build-release.sh defaults to boost version 1.79.0
[ROCm/rocprofiler-systems commit: ede6007f9b]
- More to come in later commit, below is just tidying some stuff up
- clang-tidy
- mpi_gotcha quiet about not finding funcs
- update to new papi config
- sampling block_samples / unblock_samples
- disable calling component's sample functions within sampler
- release doesn't strip library
- remove HSA and ROCP env variables from modulefile / setup-env
- preliminary support for LD_PRELOAD usage
- default sampling rate is 300 interrupts / second
- fixes various deadlock issues at startup
[ROCm/rocprofiler-systems commit: 8f36620e29]
* Support multiple Python versions in single build
* RPATH + Split up config into config and runtime
* pybind11 submodule
* Docker build updates
[ROCm/rocprofiler-systems commit: 4db6ba3d28]
* omnitrace namespace
* Kokkos + Lulesh example/tests
* Sampling support + more
- OMNITRACE_BUILD_TESTING option
- sampling support
- pthread_gotcha
- fixes to labels for mpi_gotcha, fork_gotcha, omnitrace_component
- tasking::block_signals, tasking::unblock_signals
- instrumentation mode option in omnitrace exe
- argument option groups in omnitrace exe
- categories in omnitrace settings
- remove TIMEMORY_ prefixed options
* Release workflow updates
* Updated settings printing
* Fixed defaults in README
* Tweak setting defaults in README
* CMake fixes
* cmake-format
* clang-format
* LULESH_USE_MPI OFF
* LULESH_USE_MPI fix
* timemory add_secondary fix
* timemory ambiguous internal namespace fix
* Update timemory submodule
* Handle output path/prefix in omnitrace
- updated timemory
- updated test environment
* sampling + papi fix
* Fix to sampling without PAPI
* Fix for using too many processors in CI
* formatting
* Updated CI
- minor cmake tweaks
- updated timemory submodule
* Updated CI
* Updated CI
* CI + timemory updates
- data race fixes
* CI updates + debug for sampling
* Sampling updates
- moved tasking::{block,unblock}_signals to sampling namespace
- improvements to sampling w.r.t. thread-locality
* Minimum OMNITRACE_THREAD_COUNT of 128
* Handle multiple dims in sampler data
* Configure libunwind support for timemory
* Improved safeguards for sampling
- updated CI
- lulesh runtime-instrument test tweak
* formatting
* CI updates + sampler updates + misc
- fixed stack-buffer-overflow in omnitrace (get_*file_line_info)
- test labels
- steady_clock instead of system_clock in sampler
- update dyninst submodule with upgradePlaceholder fix
- disable OMNITRACE_BUILD_TESTING by default
* Updated timemory submodule
- hidden visibility for timemory
- storage finalizers do not capture this
* Update timemory submodule
- component visibility updates
* Reworked header includes
- use <...> for timemory headers
- always include <library/defines.hpp>
* Rename some config options
* Update PTL submodule
* Update kokkos submodule
* Updated sampling
* Updated CI
* Reworked instrumentation exe
- lowered min-address-range threshold to 256
- extended whole function exclude
* CI fix + timemory submodule update
- TIMEMORY_VISIBLE on component base
- RelWithDebugInfo -> RelWithDebInfo
- Info output for parallel-overhead
* Sampling flags + transpose update + CI update
- disable critical trace for parallel-overhead in CI
- SA_RESTART only in sampler
- reworked transpose example to use fewer threads
* CI update
- removed ubuntu-focal-external-debug
- reduced data artifacts upload
* CI timeouts
- updated timemory submodule
- minor tweaks to omnitrace exe logging
* LICENSE updates (partial)
* CI Test stage timeout extension
* Docker and Packaging updates
* Miscellaneous fixes/tweaks
- gpu.hpp / gpu.cpp
- disable roctracer component if no devices
- re-enable InstrStackFrames by default
- disable sampling by default
- pthread_gotcha::m_enable_sampling is false by default
- timemory submodule update w/ sampler and pop(tid) updates
- fix minor bug in sampler logic
- CMake: OMNITRACE_USE_HIP option
- roctracer + timemory fix
* Replaced OMNITRACE_USE_ROCTRACER with OMNITRACE_USE_HIP where appropriate
* cmake format
* Sampler deadlock fixes
* Removed debug messages from sampler
* Fix for MPI detection + test tweaks + misc
* Sampler deadlock fixes + misc
- removed papi_tot_ins
- pthread_gotcha blocks signals globally until sampler is setup
- metadata specialization for sampling components
- OMNITRACE_INSTRUMENTATION_MODE -> OMNITRACE_MODE
- default sampling delay increased to 0.05 from 1.0e-6
- removed {block,unblock}_signals from critical_trace and ptl
- no longer necessary to use
- sampling delay minimum is 1.0e-3
- OMNITRACE_BUILD_HIDDEN_VISIBILITY
* omnitrace-avail + libunwind update + restructure
- restructured omnitrace components
- build custom omnitrace-avail executable
- updated libunwind to avoid malloc in get_unw_backtrace
* Fix remaining reorganization issues
- removed some duplicate code
- fixed some trait specializations after implicit instatiation
- formatting
* ensure_storage fix + avail improvements
- fix ensure_storage when component not avail
- suppress irrelevant info in omnitrace-avail
* Delay settings initialization
- slight tweak to tests w/ MPI
* Disable OpenMPI testing w/ ubuntu-bionic
- MPI testing is hanging bc of network interface issue on system:
> [[20462,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> Module: OpenFabrics (openib)
> Host: fv-az19-371
> Another transport will be used instead, although this may result in
> lower performance.
> NOTE: You can disable this warning by setting the MCA parameter
> btl_base_warn_component_unused to 0.
[ROCm/rocprofiler-systems commit: 778af2a760]