## Motivation
Enable UCX communication tracing and communication metadata
## Technical Details
Implement UCX API wrappers to trace transport-layer communication. This adds communication data tracking and exposes “UCX Comm Send/Recv” timelines, enabling detailed analysis of MPI, OpenSHMEM, and other UCX-based runtime communication patterns.
- Implements function interception for UCX functions across multiple categories using gotcha component.
- Extended comm_data component to track UCX send/recv operations - Added ucx_send and ucx_recv labels for Perfetto counter tracks. Integrated UCX data tracking with existing MPI/RCCL tracking infrastructure.
- Added ROCPROFSYS_USE_UCX configuration option (enabled by default).
- Created FindUCX.cmake module for UCX header detection. Falls back to internal UCX headers if system headers not found.
- Updated all Dockerfiles to include UCX dependencies.
## Motivation
<!-- Explain the purpose of this PR and the goals it aims to achieve. -->
- __Reduced Code Duplication__: Version parsing logic moved from individual Dockerfiles to the central build script
- __Improved Edge Case Handling__: Better handling of ROCm versions with and without patch numbers (e.g., `6.2` vs `6.2.0`)
- __Easier Maintenance__: Future version-related changes only need to be made in one place
- __Cleaner Dockerfiles__: Simplified Dockerfiles focus on package installation rather than complex shell logic
- __Updated Platform Support__: Refreshed container matrix to reflect current platform/ROCm version combinations
- __Fix OpenSUSE Docker Generation__: OpenSUSE container generation fails due to a change to the `binutils-gold` package
- __Error Handling__: Fix bug where errors in docker image build were being masked, allowing workflow to pass anyway.
## Technical Details
<!-- Explain the changes along with any relevant GitHub links. -->
- Updated `Dockerfile.opensuse` and `Dockerfile.opensuse.ci` docker files to remove `binutils-gold`
- Not needed since we build `binutils` with systems anyways
- Updated `rocprofiler-systems-containers.yml` to remove `pushd/popd` commands and just run the shell scripts
- There was a silent failure observed here, which I verified in this PR before adding the fix for openSUSE
- Refactor ROCm version parsing. Move this logic to the `build-docker.sh` script to reduce duplication.
- Fix bug that caused ROCm 7.0 to fail installation. The trailing `.0` was being trimmed.
- Fixed inconsistencies in `containers.yml` that lead to invalid ROCm-OS_VERSION combinations.
- Formatting fixes
- Removed trailing whitespace
- Fix docker build warnings. Use an `=` rather than ` ` when assigning an environment variable.
- Add `nlohmann-json-dev` (or equivalent) to CI Docker images for RHEL, SUSE, and Ubuntu.
- Add `gmock-dev` and `gtest-dev` (or equivalent) to CI Docker images for RHEL, SUSE, and Ubuntu.
- Add `--set solver classic` to conda config to resolve an issue setting up the conda environment
- Fix Perfetto package installation on ubuntu noble image.
- Add a check and log error if pip installation fail
---------
Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
- matrix -m argument for build-docker.sh that lists compatible OS + ROCm combinations.
- ${DISTRO} is now case-insensitive.
- Added note to README.md to mention this flag.
- Removed --build-arg AMDGPU_RPM=${ROCM_RPM}, which is no longer used
[ROCm/rocprofiler-systems commit: 67bc147780]
- Bringing in recent changes from rocm-6.4 branch (https://github.com/ROCm/rocprofiler-systems/pull/171)
- Add libdrm-devel to rhel and suse files
- Update ROCm installation method in Ubuntu file
- Add additional output to `test-release.sh` to catch failures due to a Python version not included
- Add Python 3.13 to Dockers
[ROCm/rocprofiler-systems commit: 83a9eb3d7c]
* Integrating amd-smi into rocprofiler-systems due to rocm-smi deprecation.
* No functionality changes to users other than naming conventions.
* New tracks available in perfetto- gpu busy percentage metrics now splits gfx busy into separate gfx, umc, and mm engine measurements.
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: 0c32dfd6bc]
- Fixed type in script name to fix CTest. Should be: `run-rocprof-sys-pid.sh`
- Add `/usr/lib64/openmpi/bin` to Docker Image PATH to help ctests to discover `mpiexec`
- `pip install perfetto` in the CI Docker file, for convenience.
[ROCm/rocprofiler-systems commit: 4c4d0dea85]
- Renames the CMake option "ROCPROFSYS_USE_HIP" to "ROCPROFSYS_USE_ROCM"
- Remove the "ROCPROFSYS_USE_ROCM_SMI option. Controlled with the "ROCPROFSYS_USE_ROCM" option, instead.
- Runtime configuration can still toggle ROCPROFSYS_USE_ROCM_SMI to disable the sampling.
- Rename ROCPROFSYS_HIP_VERSION macro to ROCPROFSYS_ROCM_VERSION and remove blocks for `ROCPROFSYS_ROCM_VERSION < 60000`
- Remove ROCPROFSYS_USE_ROCTRACER and ROCPROFSYS_USE_ROCPROFILER
- Update test cases
- Update docker files and workflows to install cmake 3.21, which is required for the rocprofiler-sdk findPackage script.
- Removed rocm-6.2 from workflows due to a rocprofiler-sdk API change.
[ROCm/rocprofiler-systems commit: 88aa2d3cbe]
* Update cmake version installed in dockerfiles
* Standardize the cmake_minimum_required to 3.18.4 across dockerfiles
* Fix link to perl repo in opensuse docker.
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: cef228bfbd]
Updated OS test matrix to match ROCm 6.2.
Update build and CI docker files
Remove the "docs" workflow, because "read-the-docs" is now being used for ROCm documentation
[ROCm/rocprofiler-systems commit: b15c9e94fc]
* Add texinfo to Ubuntu and RHEL docker files
* Add the `bison` package to Ubuntu dockerfile
* Update the CI docker files too.
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: d0d97490b4]
* Support ROCm 5.5 in docker
* Update containers workflow
- add Ubuntu and OpenSUSE container builds for ROCm 5.4 and 5.5
- add RHEL builds
* Update cpack workflow
- build on PR against main when cpack.yml or docker files updated
- removed packaging for ROCm < 5.2 for many OSes
- added packaging for ROCm 5.5
* Update OpenSUSE workflow
- add python 3.11 to OMNITRACE_PYTHON_ENVS
- upload-artifacts name includes strategy.job-index (prevent overwrite)
- only upload artifacts on failure
- continue on error if upload artifacts fails
* Update RedHat workflow
- provide run-name
- add python 3.11 to OMNITRACE_PYTHON_ENVS
- upload-artifacts name includes strategy.job-index (prevent overwrite)
- only upload artifacts on failure
- continue on error if upload artifacts fails
* Update Ubuntu (Bionic) workflow
- add python 3.11 to OMNITRACE_PYTHON_ENVS
- upload-artifacts name includes strategy.job-index (prevent overwrite)
- only upload artifacts on failure
- continue on error if upload artifacts fails
* Update Ubuntu (Focal) workflow
- add python 3.11 to OMNITRACE_PYTHON_ENVS
- upload-artifacts name includes strategy.job-index (prevent overwrite)
- only upload artifacts on failure
- continue on error if upload artifacts fails
- remove testing of ROCm 4.3, 5.0, 5.1
- add testing of ROCm 5.5
* Update Ubuntu (Jammy) workflow
- add python 3.11 to OMNITRACE_PYTHON_ENVS
- upload-artifacts name includes strategy.job-index (prevent overwrite)
- only upload artifacts on failure
- continue on error if upload artifacts fails
- add testing of ROCm latest
* Dockerfile.{rhel,opensuse} update
- remove use of amdgpu-install in favor of installing rocm-dev package
- In ROCm 5.5, amdgpu-install changed meaning of --usecase=rocm (added rocmdev use case)
* redhat workflow update
- remove use of amdgpu-install in favor of installing rocm-dev package
- In ROCm 5.5, amdgpu-install changed meaning of --usecase=rocm (added rocmdev use case)
* build-docker.sh update
- add '--progress plain' to docker build commands
* Ubuntu (jammy) workflow update
- fix rocm installation
* Update Dockerfile.rhel
- add LIBRARY_PATH for /opt/amdgpu/lib64 for redhat
* Update Dockerfile.rhel
- install libpciaccess for rocm
[ROCm/rocprofiler-systems commit: 693f753a9e]
Remove docker hiplibsdk from amdgpu-install
- amdgpu-install use case hiplibsdk is not necessary and bloats the install
- same as above for package rocm-hip-sdk
[ROCm/rocprofiler-systems commit: 2ce0cb4a19]
* Fixes for Python 3.11
* Add python 3.11 to scripts
- also tweak to to{upper,lower} bash functions
* Fix PAPI RPM packaging in RedHat
- fix error from #!/usr/bin/python in papi_hl_output_writer.py
- requires either python2 or python3 instead of python
* cpack updates
- only generate STGZ for RedHat
- support `--generators` arg in build-release.sh
- support 7z, zip, and other zip generators
- fix build-release.sh with `--mpi`
- support setting CONDA_ROOT
* Support rhel/fedora/centos in omnitrace-install.py
* RedHat status badge
* Fix support for Python 3.11 + tweak ubuntu ci
- Remove installing clang and mpich in Ubuntu CI container
- Fallback on conda-forge for Python 3.11
- Enable entrypoint-rhel.sh for RHEL CI
- Pull latest container by default
* Update ElfUtils and PAPI builds
- quieter build output
- disable-nls for ElfUtils
- use -s flag for make
* Development Guide Docs
[ROCm/rocprofiler-systems commit: 83f9ed8696]
- additional miscellaneous tweaks to workflows and docker scripts, e.g. install perfetto python bindings
- improves the stability of MPI finalization
- reduces some debug messages within timemory when `OMNITRACE_DEBUG=ON`
- fixes issue found in RHEL where libunwind is using mutex and omnitrace was not treating this as an internal mutex call
- this may have been affecting the causal profiling slightly (tests seem a bit more stable now)
- fix data race in timemory
* Add RedHat CI and release packaging
- additional miscellaneous tweaks to workflows and docker scripts, e.g. install perfetto python bindings
* Fix URL for ROCm packages in redhat workflow
* Fix dnf --enable-repo for ROCm perl packages
* Dockerfile.rhel and redhat.yml updates
- Fix dnf repo for ROCm PERL packages
- Disable python in CI (interpreter segfaults)
- Exclude parallel-overhead-locks tests due to inclusion of internal locks
- This needs to be remedied in the future
* Exclude _dl_relocate_static_pie from instrumentation
* Testing updates
- OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF for parallel-overhead-locks
* Fix redhat workflow
* redhat.yml update
- remove if condition on config/build/test step
* Update timemory submodule
- tweaks to verbosity messages
* Set thread state before unw_step
- on Redhat, unw_step calls mutex
* Update timemory submodule
- verbosity changes
- gotcha uses spin_lock/spin_mutex
* Remove using gsplit-dwarf unless OMNITRACE_BUILD_NUMBER > 2
* Re-enable parallel-overhead-locks tests in redhat workflow
* Always disable timemory manager metadata auto output
* testing updates
- tweak parallel-overhead-locks-timemory to higher instruction count min
- OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF for parallel-overhead-locks-perfetto
* Update timemory submodule
- quiet realpath queries
* omnitrace exe updates
- detect text files
- improved bin/lib locating
* cmake format
* test-install.sh and redhat workflow updates
- handle testing when ls is script
- re-enable python testing on redhat workflow
- invoke test-install.sh in redhat workflow
* Misc guards for finalization
* omnitrace-exe, testing updates
- test-install.sh: LS_EXEC -> LS_NAME
- handle /usr/bin/ls being script in source/bin/tests
- improve locating the binary
* Fix mpi_gotcha compile error
* omnitrace-exe updates
- improve file locating
* formatting
* Misc fixes
- remove -static-libstdc++ for RHEL packaging (rocky-linux doesn't distribute static lib)
* omnitrace-exe paths
* Replace realpath with absolute
- using absolute path to symlink fixes issues with locating libdyninstAPI_RT at runtime
* omnitrace exe updates
- judicious use of realpath
* Update timemory submodule
- fix update main hash ids/aliases data race in merge
* bin tests update
- change working directory of omnitrace-exe-simulate-lib-basename
* omnitrace exe updates
- Update resolved exe/lib messaging
* bin tests update
- change working directory of omnitrace-exe-simulate-lib-basename
[ROCm/rocprofiler-systems commit: 1688a027d8]