## Motivation
- Added `check_rocminfo` function that returns true if the provided regex was found, false otherwise. Can also use `GET_OUTPUT` to get the raw output filtered with or without a regex.
- Moved `rocprofiler_systems_get_gfx_archs()` to `MacroUtilities.cmake`
- Added `rocprofiler_systems_lookup_gfx()`, which detects whether a given `gfx` is from the `instinct`, `radeon` or `apu` family.
- Added `ROCPROFSYS_GFX_TARGETS` as a build argument. Used to specify the offloading architectures that GPU examples should compile for. If empty, defaults to whatever your system has.
- GPU examples now check if the given `gfx` targets (from `ROCPROFSYS_GFX_TARGETS`) are supported.
- OMPVV offload tests now only compile if `amdflang` version is `>= 20`
- Improve link time by reducing the number of GFX targets that binaries need to support.
- RCCL is now passed a `GPU_TARGETS` var specifying the architectures to build/link against.
* Put cached perfetto traces as default one
* Improve cached data and perfetto traces in order to be more aligned with E2E tests
* Addressing PR comments and findings
* Force early instrumentation bundle instantiation
* Sync-up insturumented containers with thread growth data
* Revert ompvv number of host threads to default 8
* Fixed counter track namings for amd-smi
* AIPROFSYST-34 [rocprof-sys] Update documentation describing newly introduced changes to default tracing mechanism
This PR fixes a segmentation fault seen when running rocprof-sys-sample with multi-process OpenMP/HIP applications.
The crash was caused by missing libomptarget.so on the runtime loader path or incorrect LD_PRELOAD settings.
Fixes SWDEV-552804
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
* Added Fortran (amdflang) openmp tests using the openmp-vv project
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
- Update minimum_cmake_required to match version used in CI
- We should match the minimum version that we test against
- Ensure ".S" files are treated as assembly.
- openmp-target: add runtime rpath for libomptarget and update tests
- Handle events not associated with a HIP Stream
- Kernels from OpenMP target offload are not associated with a HIP stream. Fix handling with the callback record's stream_id is 0
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: c424dac261]
Test is missing from rocm-7.0 stack because of a HIP version check.
In these builds, hip_version.h is still reporting 6.5.0.
This check was originally put in to skip the test on older versions
of ROCm, which should no longer be required
- For SWDEV-537718
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: 28bee27253]
- Add cmake formatting rule for `rocprofiler-systems-custom-compilation`
- Resolve build failure observed with the `openmp-target` example when building in some environments
- Observed in our docker images
- Ensure `libomptarget-amdgpu-gfx*.bc` files are found
[ROCm/rocprofiler-systems commit: 96b31d92c3]
- Porting from https://github.com/ROCm/omnitrace/pull/411
- Improve OMPT support
- Add OpenMP target example to testing
- Update Timemory submodule to use ROCm/Timemory rather than NERSC/Timemory
- Update `actions/upload-artifacts` to v4
- Standardize the `cmake_minimum_required` to 3.18.4 across workflows, project, and examples
- Updated Ubuntu 20.04 workflows
[ROCm/rocprofiler-systems commit: 7dce5926a7]
The Omnitrace program is being renamed.
Full name: "ROCm Systems Profiler"
Package name: "rocprofiler-systems"
Binary / Library names: "rocprof-sys-*"
---------
Co-authored-by: Xuan Chen <xuchen@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: d07bf508a9]
* Testing and CI support for Ubuntu 22.04
* Fixes for ROCm
- Jammy does not have ROCm installers
* Name, timeout, and python updates
- renamed ubuntu-jammy-external.yml to ubuntu-jammy.yml
- increased all 5 minute timeouts to 10 minutes
- include python 3.10 in testing
* Update dyninst to remove interposed definition of _r_debug
* Rebuild Dyninst + test install script
* Revert container change
* git safe directory
* pushd -> cd
* fix MPI include
* Fix testing step
* OMPI_ALLOW_RUN_AS_ROOT
* Test script changes
* Fix mismatched malloc / delete[]
* Jammy workflow tweaks
* CPack tweak for boost deb deps
* pthread_mutex_gotcha config returns when not enabled
* fix echoing config in CI
* USE_CLANG_OMP
- option to disable using LLVM OpenMP when building OpenMP test executables
- Jammy workflow sets USE_CLANG_OMP=OFF
* Dyninst submodule boost download
- updated containers workflow to include jammy
- updated workflow to use ci
* Updates to workflows + replace test-install.sh
- test-install.sh in this branch was replaced with one in main branch
* Expand jammy test-install.sh args
* Fix openmp-cg-sampling-duration test
* update timemory submodule
- use-after-free violation in popen::pclose
* revert some tweaks to sampling-duration test
* Fix env of test-install.sh
* cmake format
* jammy bash
* CPack install for jammy
* formatting workflow action version bump
* Update timemory submodule
- libunwind submodule via timemory sets SOVERSION to 99 to avoid ABI conflicts with v8
* Fix help menu for omnitrace-sample
* Support other boolean forms in test-install.sh
* Update docker files and build-docker.sh
- consolidated cases in build-docker.sh
- support rocm version of 0.0 (no rocm install)
- support rocm v5.3
- updated centos handling
* update opensuse actions/checkout version
* Tweaks to ubuntu-focal testing
- actions/checkout@v3
- use test-install script
* update cpack
- ubuntu 22.04
- rocm 5.3
- rename os matrix field to os-version
- remove CI_ROCM_VERSION (no longer necessary)
- remove default-rocm-version matrix field (no longer necessary)
- CentOS packaging
* fix argparsing and omnitrace-sample tests in install-tests.sh
* focal rocm test install workflow fix
* Fix omnitrace-sample build
* Dockerfile.centos + build-docker.sh updates
* Update actions/upload-artifact version
* Dockerfile.ubuntu: install rocm-device-libs
* Refactor cpack
* fix cpack if quotes
* Dockerfile.ubuntu rocm < 5 installs rocm-dev
* build-release.sh defaults to boost version 1.79.0
[ROCm/rocprofiler-systems commit: ede6007f9b]
## Overview
This is a significant PR which has 3 very notable characteristics:
1. Omnitrace colorizes most of it's logging
2. Completely reworked the sampling
- Samples now record the current instruction pointers instead of strings
- This _dramatically_ decreases the overhead of taking a sample
- The collection of metrics during a sample are split out into another component, enabling that data collection to be disabled -- which decreases the sampling overhead even further
- When both `OMNITRACE_SAMPLING_CPUTIME` and `OMNITRACE_SAMPLING_REALTIME` are ON:
- `OMNITRACE_SAMPLING_CPUTIME_FREQ` and `OMNITRACE_SAMPLING_REALTIME_FREQ` can be used to individually control the sampling frequency
- `OMNITRACE_SAMPLING_CPUTIME_DELAY` and `OMNITRACE_SAMPLING_REALTIME_DELAY` can be used to individually control the delay time before starting
- Now, omnitrace does not start a real-time sampler on the main thread unless `OMNITRACE_SAMPLING_REALTIME` is ON
- In the future, an `OMNITRACE_SAMPLING_TIDS` (and real-time, cpu-time variants) configuration variable(s) will allow you to select which threads will be sampled
3. Files produced by `omnitrace` exe -- `available-instr.txt`, `instrumented-instr.txt`, etc. -- now no longer has `-instr` suffix and are placed in `instrumentation/` subfolder, i.e. `available-instr.txt` -> instrumentation/available.txt`
- This helped de-clutter the output folder
Most of the other edits were reorganization (e.g. internal namespace changes), cleanup, and splitting up functionality.
## Bug Fixes
There is a bug fix with respect to the HSA callbacks which disabled sampling on child threads when an HSA API call was made
## Details
- created thread_info struct for mapping different thread IDs
- reorganized file structure significantly
- added categories.hpp, concepts.hpp
- moved around name trait definitions
- moved all omnitrace components into `omnitrace::component` namespace
- there was a lot of inconsistency b/t using `tim::component` in some places and `omnitrace::component`
- added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
- OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
- roctracer and critical_trace use same thread pool
- critical_trace functions do not lock anymore bc of thread-local TaskGroup
- added `component::local_category_region` to support using `component::category_region` without explicitly passing in name
- removed `component::omnitrace` (unused)
- migrated KokkosP and OMPT to use `component::local_category_region`
- removed `component::user_region` as a result
- migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
- removed `component::functors` as a result
- migrated some ppdefs
- `api::omnitrace` -> `project::omnitrace`
- `api::(...)` -> `category::(...)`
- improved recording the execution time of threads
- migrated this functionality out of pthread_create_gotcha and into thread_info
- moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
- split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
- sampling.cpp handles setup and post-processing that was formerly in backtrace
- updated logging to use colors
- `OMNITRACE_COLORIZED_LOG` config variable
- updated docs on JSON output from timemory
- instrumentation info in instrumentation subfolder
- added testing for KokkosP entries
- added testing for ompt entries
- add_critical_trace function defined in critical_trace.hpp
- disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
- add comp::page_rss to main bundle
- thread_data supports std::optional instead of std::unique_ptr
- thread_data supports tim::identity<T> to avoid unique_ptr or optional
- tracing::record_thread_start_time()
- tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
- removed anonymous namespace from omnitrace::utility
- sampling backtrace stores instruction pointers instead of strings
- component::category_region updates
- handle disabled thread state
- handle finalized state
- fewer debug messages
- invoke thread_init()
- invoke thread_init_sampling()
- handle push/pop count based on category
- push/pop count only modified when used
- component::cpu_freq
- components/ensure_storage.hpp
- reworked the pthread_create replacement function
- updated parallel-overhead example to report # of times locked
- OMNITRACE_MAX_UNWIND_DEPTH build option
- update timemory submodule
[ROCm/rocprofiler-systems commit: 808ea7dfa7]
* Enable building PAPI via submodule
* Miscellaneous fixes
- Use TIMEMORY_PAPI_ARRAY_SIZE in backtrace
- remove pthread_gotcha init from fork_gotcha::configure
- fix HSA OnLoad called during before tooling init
* PAPI array size + PAPI.cmake updates
- updated timemory submodule with PAPI updates
- fix for backtrace _hw_cnt_labels
* Disable OMPT for focal
* format
[ROCm/rocprofiler-systems commit: d98e60a17f]