## Motivation
Enable UCX communication tracing and communication metadata
## Technical Details
Implement UCX API wrappers to trace transport-layer communication. This adds communication data tracking and exposes “UCX Comm Send/Recv” timelines, enabling detailed analysis of MPI, OpenSHMEM, and other UCX-based runtime communication patterns.
- Implements function interception for UCX functions across multiple categories using gotcha component.
- Extended comm_data component to track UCX send/recv operations - Added ucx_send and ucx_recv labels for Perfetto counter tracks. Integrated UCX data tracking with existing MPI/RCCL tracking infrastructure.
- Added ROCPROFSYS_USE_UCX configuration option (enabled by default).
- Created FindUCX.cmake module for UCX header detection. Falls back to internal UCX headers if system headers not found.
- Updated all Dockerfiles to include UCX dependencies.
## Motivation
<!-- Explain the purpose of this PR and the goals it aims to achieve. -->
- __Reduced Code Duplication__: Version parsing logic moved from individual Dockerfiles to the central build script
- __Improved Edge Case Handling__: Better handling of ROCm versions with and without patch numbers (e.g., `6.2` vs `6.2.0`)
- __Easier Maintenance__: Future version-related changes only need to be made in one place
- __Cleaner Dockerfiles__: Simplified Dockerfiles focus on package installation rather than complex shell logic
- __Updated Platform Support__: Refreshed container matrix to reflect current platform/ROCm version combinations
- __Fix OpenSUSE Docker Generation__: OpenSUSE container generation fails due to a change to the `binutils-gold` package
- __Error Handling__: Fix bug where errors in docker image build were being masked, allowing workflow to pass anyway.
## Technical Details
<!-- Explain the changes along with any relevant GitHub links. -->
- Updated `Dockerfile.opensuse` and `Dockerfile.opensuse.ci` docker files to remove `binutils-gold`
- Not needed since we build `binutils` with systems anyways
- Updated `rocprofiler-systems-containers.yml` to remove `pushd/popd` commands and just run the shell scripts
- There was a silent failure observed here, which I verified in this PR before adding the fix for openSUSE
- Refactor ROCm version parsing. Move this logic to the `build-docker.sh` script to reduce duplication.
- Fix bug that caused ROCm 7.0 to fail installation. The trailing `.0` was being trimmed.
- Fixed inconsistencies in `containers.yml` that lead to invalid ROCm-OS_VERSION combinations.
- Formatting fixes
- Removed trailing whitespace
- Fix docker build warnings. Use an `=` rather than ` ` when assigning an environment variable.
- Add `nlohmann-json-dev` (or equivalent) to CI Docker images for RHEL, SUSE, and Ubuntu.
- Add `gmock-dev` and `gtest-dev` (or equivalent) to CI Docker images for RHEL, SUSE, and Ubuntu.
- Add `--set solver classic` to conda config to resolve an issue setting up the conda environment
- Fix Perfetto package installation on ubuntu noble image.
- Add a check and log error if pip installation fail
---------
Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
- Rename the GHCR packages for rocprofiler Docker images to reduce the number of packages that will be released on the repository
- Changed package name to only include the OS instead of OS+Version - version moved to the tag instead.
- Updated Dockerfile.*.ci files to specify target ROCm version from tarball in name.
* Initial skeleton code for rocprofiler-systems-continuous-integration.yml
* Add python3-devel to opensuse and rhel ci images
* Update rocprofiler-systems-containers.yml to include TheRock tarballs
* Update pip install command for Dockerfile.ubuntu.ci
* Fix pip install again for Dockerfile.ubuntu.ci
* Remove skeleton workflow for CI
* Add new ci-gfx containers for TheRock installs
* Add set -e and pipefail to ci Dockerfiles to detect errors
* Upgrade pip in Dockerfile.ubuntu.ci
* revert pipefail set -e change
* Replace build-docker-ci.sh script with Docker step for ci-base
* Add support for gfx950, add containers-ci-gfx.yml
* Add working-directory to matrix setup steps
* Try changing containers-ci-gfx.yml
* make more changes to containers-ci-gfx.yml
* Remove build-docker-ci.sh script from gfx step, fix typo in Dockerfile
* Remove gfx110X and gfx120X for now
* Update ci-gfx docker workflow to use ghcr.io
* Temporary change to test one image
* Enable push to test out ghcr package
* Add labels to debug oauth issue
* add pacakages permissions to step
* add rocprofiler-systems-ghcr.yml workflow
* Remove cache from Docker push action step
* Add prefix to tag
* Add back gfx94X and gfx950 support, add back no push on PR
* Remove gfx container creation from rocprofiler-systems-containers.yml
* Add a gfx950 image for now
* Revert change
- Bringing in recent changes from rocm-6.4 branch (https://github.com/ROCm/rocprofiler-systems/pull/171)
- Add libdrm-devel to rhel and suse files
- Update ROCm installation method in Ubuntu file
- Add additional output to `test-release.sh` to catch failures due to a Python version not included
- Add Python 3.13 to Dockers
[ROCm/rocprofiler-systems commit: 83a9eb3d7c]
* Update runners to `ubuntu-latest`.
The `ubuntu-20.04` runner is deprecated and will be removed.
* Add 'vim' and 'perfetto' to CI docker images
For convenience when using the images locally.
[ROCm/rocprofiler-systems commit: a25554359b]
- Renames the CMake option "ROCPROFSYS_USE_HIP" to "ROCPROFSYS_USE_ROCM"
- Remove the "ROCPROFSYS_USE_ROCM_SMI option. Controlled with the "ROCPROFSYS_USE_ROCM" option, instead.
- Runtime configuration can still toggle ROCPROFSYS_USE_ROCM_SMI to disable the sampling.
- Rename ROCPROFSYS_HIP_VERSION macro to ROCPROFSYS_ROCM_VERSION and remove blocks for `ROCPROFSYS_ROCM_VERSION < 60000`
- Remove ROCPROFSYS_USE_ROCTRACER and ROCPROFSYS_USE_ROCPROFILER
- Update test cases
- Update docker files and workflows to install cmake 3.21, which is required for the rocprofiler-sdk findPackage script.
- Removed rocm-6.2 from workflows due to a rocprofiler-sdk API change.
[ROCm/rocprofiler-systems commit: 88aa2d3cbe]
* Update cmake version installed in dockerfiles
* Standardize the cmake_minimum_required to 3.18.4 across dockerfiles
* Fix link to perl repo in opensuse docker.
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: cef228bfbd]
Updated OS test matrix to match ROCm 6.2.
Update build and CI docker files
Remove the "docs" workflow, because "read-the-docs" is now being used for ROCm documentation
[ROCm/rocprofiler-systems commit: b15c9e94fc]
* Update build-docker.sh
- support rocm 6.0
* Update cpack workflow
- support rocm 6.0
* Update CI testing workflow paths-ignore
- changes to {docs,cpack,containers,formatting}.yml and docker do not require testing
* Update docker for OpenSUSE
- always use --non-interactive with zypper
- tweak to PERL_REPO when OS version >= 15.4
[ROCm/rocprofiler-systems commit: cfaace38a8]
* Fixes for Python 3.11
* Add python 3.11 to scripts
- also tweak to to{upper,lower} bash functions
* Fix PAPI RPM packaging in RedHat
- fix error from #!/usr/bin/python in papi_hl_output_writer.py
- requires either python2 or python3 instead of python
* cpack updates
- only generate STGZ for RedHat
- support `--generators` arg in build-release.sh
- support 7z, zip, and other zip generators
- fix build-release.sh with `--mpi`
- support setting CONDA_ROOT
* Support rhel/fedora/centos in omnitrace-install.py
* RedHat status badge
* Fix support for Python 3.11 + tweak ubuntu ci
- Remove installing clang and mpich in Ubuntu CI container
- Fallback on conda-forge for Python 3.11
- Enable entrypoint-rhel.sh for RHEL CI
- Pull latest container by default
* Update ElfUtils and PAPI builds
- quieter build output
- disable-nls for ElfUtils
- use -s flag for make
* Development Guide Docs
[ROCm/rocprofiler-systems commit: 83f9ed8696]
- additional miscellaneous tweaks to workflows and docker scripts, e.g. install perfetto python bindings
- improves the stability of MPI finalization
- reduces some debug messages within timemory when `OMNITRACE_DEBUG=ON`
- fixes issue found in RHEL where libunwind is using mutex and omnitrace was not treating this as an internal mutex call
- this may have been affecting the causal profiling slightly (tests seem a bit more stable now)
- fix data race in timemory
* Add RedHat CI and release packaging
- additional miscellaneous tweaks to workflows and docker scripts, e.g. install perfetto python bindings
* Fix URL for ROCm packages in redhat workflow
* Fix dnf --enable-repo for ROCm perl packages
* Dockerfile.rhel and redhat.yml updates
- Fix dnf repo for ROCm PERL packages
- Disable python in CI (interpreter segfaults)
- Exclude parallel-overhead-locks tests due to inclusion of internal locks
- This needs to be remedied in the future
* Exclude _dl_relocate_static_pie from instrumentation
* Testing updates
- OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF for parallel-overhead-locks
* Fix redhat workflow
* redhat.yml update
- remove if condition on config/build/test step
* Update timemory submodule
- tweaks to verbosity messages
* Set thread state before unw_step
- on Redhat, unw_step calls mutex
* Update timemory submodule
- verbosity changes
- gotcha uses spin_lock/spin_mutex
* Remove using gsplit-dwarf unless OMNITRACE_BUILD_NUMBER > 2
* Re-enable parallel-overhead-locks tests in redhat workflow
* Always disable timemory manager metadata auto output
* testing updates
- tweak parallel-overhead-locks-timemory to higher instruction count min
- OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF for parallel-overhead-locks-perfetto
* Update timemory submodule
- quiet realpath queries
* omnitrace exe updates
- detect text files
- improved bin/lib locating
* cmake format
* test-install.sh and redhat workflow updates
- handle testing when ls is script
- re-enable python testing on redhat workflow
- invoke test-install.sh in redhat workflow
* Misc guards for finalization
* omnitrace-exe, testing updates
- test-install.sh: LS_EXEC -> LS_NAME
- handle /usr/bin/ls being script in source/bin/tests
- improve locating the binary
* Fix mpi_gotcha compile error
* omnitrace-exe updates
- improve file locating
* formatting
* Misc fixes
- remove -static-libstdc++ for RHEL packaging (rocky-linux doesn't distribute static lib)
* omnitrace-exe paths
* Replace realpath with absolute
- using absolute path to symlink fixes issues with locating libdyninstAPI_RT at runtime
* omnitrace exe updates
- judicious use of realpath
* Update timemory submodule
- fix update main hash ids/aliases data race in merge
* bin tests update
- change working directory of omnitrace-exe-simulate-lib-basename
* omnitrace exe updates
- Update resolved exe/lib messaging
* bin tests update
- change working directory of omnitrace-exe-simulate-lib-basename
[ROCm/rocprofiler-systems commit: 1688a027d8]
* Submitting jobs to cdash
* Fail on submit
* submit url env
* submit url env
* try passing submit url as arg
* fix submit url
* Updated default URL
* Add submissions for remaining ubuntu focal workflow jobs
* Replace g++ with gcc in dashboard build name
* Add --ctest-args to run-ci.sh
* Add cdash support for bionic, jammy, and opensuse workflows
* Decrease CTEST_CUSTOM_MAXIMUM_PASSED_TEST_OUTPUT_SIZE
* OMNITRACE_BUILD_CODECOV option
* Support code coverage in CDash script
* CI dyninst built with debug info
* Update ci-containers
- cron schedule moved 4 hours later to UTC+5
* Update implementation of config::configure_signal_handler
- using lambdas failed to compile with codecov flags
* Add codecov job to ubuntu focal workflow
* Fix support for --ctest-args in run-ci script
* Fix ubuntu workflows
* Fix quotation handling in run-ci script
* git safe directory for codecov
* New MPI examples
* Remove --stop-on-failure
* dynamic_library update
- find_library_path checks procfs maps
- invoke find_library_path with no additional args to resolve to mapped file
* RCCLP uses dynamic_library
* check if file exists for memory_map_files metadata
* Testing updates
- include new mpi examples in tests
- fix test labels
- test critical-trace exe
* Update MPI C examples tests (needed arg)
* Remove try/catch block from critical-trace
* Fix sampling max wait when shutting down
* Fix test env for critical-trace
* Fix settings for critical-trace
- disable time output: data is deterministic
- disable PID suffixes: not multiprocess
* Update critical-trace ctest
* Update critical-trace exe
- throw error if input cannot be opened
- throw error if input has no data
* Update lulesh example with more kokkos tools usage
* Fix tasking issue with critical_trace and roctracer
- were not setting pools to active
- also sync before critical_trace::get_entries
* Increase verbosity of critical-trace tests
* Update code coverage tests
- skip code coverage + preload
- code-coverage python example and test
* Remove duplication omnitrace.initialize function
* Skip python3.6 for ubuntu jammy
* Update MPI examples
- use MPI_Isend and MPI_Irecv
- explicitly use MPI_Bcast
* Update Formatting.cmake
- include C files in examples
* run-ci script does not check return of coverage
* mpi-allreduce link to libm
* Update ctest args in run-ci script
* Update dyninst submodule
- safety improvements in BinaryEdit::openResolvedLibraryName
* capture cmake error for ctest_coverage
[ROCm/rocprofiler-systems commit: 46b6db1a4c]
* CI for OpenSUSE
* OpenSUSE MPI --allow-run-as-root
- when testing in OpenSUSE docker container, tests using OpenMPI fail due to running as root
- This commit toggles OMNITRACE_CI_MPI_RUN_AS_ROOT to ON and tests whether --allow-run-as-root is supported
* Tweak to OMNITRACE_CI_MPI_RUN_AS_ROOT handling
[ROCm/rocprofiler-systems commit: 20e7b573d4]
* omnitrace find_package support
- Fix to INSTALL_DESTINATION for configure_package_config_file
- Fixes to ConfigInstall.cmake and omnitrace-config.cmake.in
* Test find_package
[ROCm/rocprofiler-systems commit: d2e635ed3c]