* causal backtrace updates
- fix initial causal sampling period value
* causal delay updates
- tweak handling of sleep_for_overhead
* Fix experiment global scaling for prog pts
- results in drastically improved predictions
* pthread_mutex_gotcha updates
- disable all wrappers during causal profiling
* validate-causal-json.py updates
- support decimal stddev
- fix setting stddev from command-line
* causal perform_experiment_impl update
- handle start failing because finalizing
* deprecate causal::component::sample_rate
- appears to not help at all
* Rework sample info
* Increase causal unwind_depth
- use OMNITRACE_MAX_UNWIND_DEPTH
* validate-causal-json updates
- min experiments
- exclude reporting predictions with fewer than X experiments at a given speedup
- percent samples
- only print samples within X% of the peak (default: 95%)
* Update timemory submodule
- extensions to sampling for signals delivered via non-timer method
- e.g. via HW counter overflow
* dwarf_entry::operator< updates
- sort via file
* causal profiling docs updates
- info about backends
- info about installing/enabling perf
* config updates: causal backend
- CausalBackend enum
- OMNITRACE_CAUSAL_BACKEND: perf, timer, auto
- omnitrace-causal option: --backend
* debug update
- use spin_mutex instead of std::mutex
* address_range::contains update
- a range of 0-100 contains the range 10-100, but contains() was returning false because the inner high bound was == 100 rather than < 100
* symbol::operator< update
- handle load address differences
* sampling updates (non-causal)
- update get_timer to get_trigger + dynamic_cast
* container::static_vector updates
- support construction from container::c_array
- update_size private member func for handling atomic m_size
* Move perf files
- moved library/causal/perf.{hpp,cpp} to library/perf.{hpp,cpp}
* causal example update
- created impl.hpp (forward decls)
- renamed {cpu,rng}_func_impl to {cpu,rng}_impl_func
- only create two threads which run N iterations instead of two threads each iteration
* Update timemory submodule
- updates to unwind::processed_entry
- updates to procfs::maps
* Updated causal documentation
- fixed line numbers changed by modifications to causal example
* omnitrace-causal exe updates
- set OMNITRACE_THREAD_POOL_SIZE to zero by default
* core/containers updates
- static_vector: provide data() member function
- c_array pop_front() and pop_back() member functions
* core: config and argparse updates + perf
- core/perf.{hpp,cpp}
- forward decl of enums
- config-related capabilities
- argparse: --sample-overflow
- renamed some config functions
- e.g. get_sampling_cpu_freq -> get_sampling_cputime_freq
- added config settings related to overflow sampling via perf
- added timer_sampling and overflow_sampling categories
* Update timemory submodule
- sampling allocator flushing
* binary updates
- lookup_ipaddr_entry
- use bfd_find_nearest_line instead of bfd_find_nearest_line_discriminator
- discriminators are not used
- explicit instantiations of inlined_symbol::serialize
* Bump VERSION to 1.10.0
* sampling and perf updates
- support overflow sampling via Linux Perf
- update perf namespace
- update perf::perf_event
- update record ctor: pointer instead of const ref
- update open member func: return optional string
- add m_batch_size member variable
- sampling updates
- support overflow sampling
- flush allocators
- increase buffer size from 1024 to 2048
- restructure post-processing in light of perf overflow support
- improve offload memory usage: only load buffers for the given thread
- load_offload_buffer(tid) uses thread-specific filepos
- component updates
- backtrace_metrics::operator-=
- backtrace_metrics::operator-
- backtrace::sample does not record for overflow signal
- callchain: perf overflow sample
* core updates
- component::sampling_percent does not report self + uses_percent_units
* causal updates
- tweak get_line_info
- overloads for set_current_selection (uint64_t, c_array, std::array)
- delay
- use sampling::pause/sampling::resume
- experiment
- experiment::sample derives from unwind::processed_entry
- experiment::samples is vector instead of set
- fixed samples
- overloads for is_selected (uint64_t, c_array, std::array)
- scaling factor defaults to 100 instead of 50
- serialize updates follow change to experiment::sample
- modify algorithm for increasing/decreasing experiment length
- sample_data
- use map<uintptr, uint64_t> instead of set<sample_data>
- get_samples returns vector<sample_data> instead of set<sample_data>
- sampling
- support overflow via Linux Perf
- update causal_offload_buffer
- flush sampling allocator
- backtrace
- overflow component
* libomnitrace-dl updates
- handle dl::InstrumentMode::PythonProfile
* testing updates (causal)
- causal line 155 -> causal line 100
- causal line 165 -> causal line 110
* formatting
* exit_gotcha updates
- exit_info for abort()
- message about non-zero exit code
* testing updates
- fail regex for causal tests
- validate-causal-json: >= min_experiments instead of > min_experiments
- handle OMNITRACE_DEBUG_SETTINGS in omnitrace_write_test_config
* causal sampling updates
- add new lines where appropriate
* causal data updates
- reorder diagnostic info when experiment fails to start
* binary updates
- symbol address range from address to address + symsize + 1
- add 1 based on debug info
* causal data updates
- sample_selection wait_ns defaults to 1,000 instead of 10,000
- sample_selection wait scaled by iteration number
- save_line_info_impl verbosity
- print latest_eligible_pc when experiment does not start
* causal sampling + component updates
- perf backend disables component::backtrace
- ensure get_sampling_(realtime|cputime|overflow)_signal do not malloc
* causal: remove period stats
* validate-causal-json update
- fix --help
* causal data updates
- improve eligible pc history reporting when experiment fails to start
* causal data updates
- fix compute_eligible_lines_impl
- eligible address ranges returning too many ranges
- occasionally, overwrite all *true* eligible address ranges
* causal data updates
- reduce scoped ranges to symbol ranges
- is_eligible_address() returns the result of a true contains check (not just the coarse range)
- revert some sample_selection behavior
* binary address_multirange updates
- make coarse_range private
- fix operator+=(pair<coarse, uintptr_t>)
* causal example update
- fix nsync to default to once per iteration
* binary analysis updates
- tweak header file includes
* causal updates
- remove factoring in sleep_for_overhead
- invoke delay::process() even if experiment is not active
* causal data updates
- update latest_eligible_pc structure
* update omnitrace-install.py.in
- fix support for fedora
- /etc/os-release does not have ID_LIKE
- fallback to RHEL 8.7 if version not specified
* update omnitrace-install.py.in
- fix support for debian
- /etc/os-release does not have ID_LIKE
- version mapping
* Update documentation
- update docs on installation
* causal data and experiment updates
- data: reset_sample_selection
* causal set_current_selection debugging
- debug messages for failed e2e runs
* causal data and backtrace component updates
- data: set_current_selection returns the number of eligible addresses added
- backtrace: if the cputime signal has selected zero IPs more than 5 times, the realtime signal starts contributing call-stacks
* core library updates
- move config::parse_numeric_range to utility namespace
- add core/utility.cpp
- support range:increment, e.g. 5-25:10 expands to '5 15 25' instead of '5 10 15 20 25'
* omnitrace-causal update
- end-to-end expands all speedups
- support range:increment in speedups
* causal backtrace updates
- remove select_ival (realtime signal always contributes when select_count == 0)
* containers: static_vector update
- explicit c_array constructor
- explicit std::array constructor
* causal data updates
- remove set_current_selection(uint64_t)
- remove set_current_selection(std::array)
- sample_selection increase default wait time
- report eligible PC candidates
- move reset_sample_selection to perform_experiment_impl
- decrease latest_eligible_pc array size
- set_current_selection does not guard for experiment::active
* core debug updates
- OMNITRACE_PRINT_COLOR macros
* causal data updates
- tweak to experiment never started message
* causal gotcha updates
- remove unused code
* critical trace updates
- remove unused code
* omnitrace-causal
- OMNITRACE_LAUNCHER
* causal data updates
- don't fail on end-to-end + omnitrace-causal
* causal backtrace updates
- reintroduce select_ival behavior
* causal data updates
- tweak verbose messages about number of PC candidates
* core mproc updates
- utilities for waiting on child PID and diagnosing status
- omnitrace::mproc::wait_pid
- omnitrace::mproc::diagnose_status
* omnitrace-run updates
- support --fork argument: fork in the current process and call execvpe in the child, instead of calling execvpe in the current process
* omnitrace-causal updates
- wait_pid and diagnose_status just call equivalent functions in omnitrace::mproc
* ubuntu-focal workflow update
- attempt to launch ubuntu-focal-codecov job with CAP_SYS_ADMIN and use perf backend
* tests reorg and updates
- remove binary-rewrite-sampling and runtime-instrument-sampling tests
- rename *-preload tests (which use omnitrace-sample exe) to *-sampling
- split tests/CMakeLists.txt into several tests/omnitrace-<category>-tests.cmake files
- tweak to causal-both-omni-func test
- add args: -n 2 -b timer
* update validate-causal-json.py
- better reasoning info for adjusting tolerance
- always apply tolerance adjustments in CI mode
* causal e2e tests update
- add "causal-e2e" label
- tweak params
- old: 80 12 432525 500000000
- new: 80 50 432525 100000000
- disable processor affinity for slow-func/line-100 tests
- artificially inflates some speedups with perf
* unblocking_gotcha updates
- overload operator() according to gotcha function index
* blocking_gotcha updates
- overload operator() according to gotcha function index
- fix bug where post-block functors (e.g. for pthread_mutex_trylock) could throw an error if the lock is not acquired
* parse_numeric_range update
- support unordered_set
* config update
- OMNITRACE_DEBUG_{TIDS,PIDS} use parse_numeric_range
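The range:increment syntax noted in the core library updates above (5-25:10 expanding to '5 15 25') can be illustrated with a short Python sketch. This is not the C++ parse_numeric_range implementation; the function name `expand_numeric_range`, the default step of 1 when no increment is given, and the restriction to non-negative integers are all assumptions made for illustration.

```python
def expand_numeric_range(spec: str) -> list:
    """Expand 'N', 'A-B', or 'A-B:STEP' into an explicit, inclusive list of ints.

    Hypothetical helper: only non-negative integers are handled, and the step
    defaults to 1 when no ':STEP' suffix is present (an assumption).
    """
    bounds, _, step_text = spec.partition(":")
    step = int(step_text) if step_text else 1
    if step < 1:
        raise ValueError(f"increment must be positive: {spec!r}")
    low_text, dash, high_text = bounds.partition("-")
    low = int(low_text)
    # A bare number such as '7' is treated as the one-element range 7-7.
    high = int(high_text) if dash else low
    return list(range(low, high + 1, step))


print(expand_numeric_range("5-25:10"))  # [5, 15, 25]
print(expand_numeric_range("5-25:5"))   # [5, 10, 15, 20, 25]
```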
Installation
.. toctree::
:glob:
:maxdepth: 4
Quick Start (Latest Release, Binary Installer)
Download omnitrace-install.py and run it with --prefix <install-directory>. The script
will attempt to auto-detect the appropriate OS distribution and OS version.
If ROCm support is desired, specify --rocm X.Y where X is the ROCm major version and Y
is the ROCm minor version, e.g. --rocm 5.4.
wget https://github.com/AMDResearch/omnitrace/releases/latest/download/omnitrace-install.py
python3 ./omnitrace-install.py --prefix /opt/omnitrace --rocm 5.4
This script supports installation on Ubuntu, OpenSUSE, RedHat, Debian, CentOS, and Fedora.
If the target OS is compatible with one of the operating system versions below,
specify -d <DISTRO> -v <VERSION>, e.g. if the OS is compatible with Ubuntu 18.04, pass
-d ubuntu -v 18.04 to the script.
Operating System
OmniTrace is only supported on Linux. The following distributions are tested:
- Ubuntu 18.04
- Ubuntu 20.04
- Ubuntu 22.04
- OpenSUSE 15.2
- OpenSUSE 15.3
- OpenSUSE 15.4
- RedHat 8.7
- RedHat 9.0
- RedHat 9.1
Other OS distributions may be supported but are not tested.
Identifying the Operating System
If you are unsure of the operating system and version, the /etc/os-release and /usr/lib/os-release files contain
operating system identification data for Linux systems.
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
...
VERSION_ID="20.04"
...
The relevant fields are ID and VERSION_ID.
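As a rough illustration of how those two fields can be read programmatically, the following Python sketch parses os-release style text. The helper name `parse_os_release` is hypothetical and is not taken from omnitrace-install.py; the sample text mirrors the output shown above.

```python
import shlex


def parse_os_release(text: str) -> dict:
    """Parse os-release style KEY=VALUE lines into a dictionary."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split removes the optional shell-style quotes around the value.
        parts = shlex.split(value)
        fields[key] = parts[0] if parts else ""
    return fields


sample = '''NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
VERSION_ID="20.04"
'''
info = parse_os_release(sample)
print(info["ID"], info["VERSION_ID"])  # ubuntu 20.04
```

On a real system, the text would come from reading /etc/os-release (or /usr/lib/os-release), and the resulting values correspond to the -d and -v arguments of the install script.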
Architecture
With regard to instrumentation, only amd64 (x86_64) architectures are tested at present. However, Dyninst supports several other architectures, so omnitrace instrumentation may work on other CPU architectures such as aarch64, ppc64, etc. Other modes of use, such as sampling and causal profiling, do not depend on Dyninst and may therefore be more portable.
Installing omnitrace from binary distributions
Every omnitrace release provides binary installer scripts of the form:
omnitrace-{VERSION}-{OS_DISTRIB}-{OS_VERSION}[-ROCm-{ROCM_VERSION}[-{EXTRA}]].sh
E.g.:
omnitrace-1.0.0-ubuntu-18.04-OMPT-PAPI-Python3.sh
omnitrace-1.0.0-ubuntu-18.04-ROCm-405000-OMPT-PAPI-Python3.sh
...
omnitrace-1.0.0-ubuntu-20.04-ROCm-50000-OMPT-PAPI-Python3.sh
Any of the EXTRA fields with a cmake build option (e.g. PAPI, see below) or no link requirements (e.g. OMPT) have self-contained support for these packages.
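The naming pattern above can be sketched as a small Python helper, which may be handy when scripting downloads. The function `installer_name` is hypothetical. The ROCm field is passed through as an already-encoded string (e.g. "405000") because the encoding of the ROCm version is not specified here.

```python
def installer_name(version, distro, os_version, rocm=None, extras=()):
    """Compose an installer script name from its fields (illustrative only)."""
    parts = ["omnitrace", version, distro, os_version]
    if rocm is not None:
        # The ROCm field is optional and is passed through verbatim.
        parts += ["ROCm", rocm]
    # EXTRA fields (e.g. OMPT, PAPI, Python3) are appended in the order given.
    parts += list(extras)
    return "-".join(parts) + ".sh"


print(installer_name("1.0.0", "ubuntu", "18.04", extras=("OMPT", "PAPI", "Python3")))
print(installer_name("1.0.0", "ubuntu", "18.04", rocm="405000",
                     extras=("OMPT", "PAPI", "Python3")))
```

These two calls reproduce the first two example file names listed above.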
Download the appropriate binary distribution
wget https://github.com/AMDResearch/omnitrace/releases/download/v<VERSION>/<SCRIPT>
Create the target installation directory
mkdir /opt/omnitrace
Run the installer script
./omnitrace-1.0.0-ubuntu-18.04-ROCm-405000-OMPT-PAPI.sh --prefix=/opt/omnitrace --exclude-subdir
Installing OmniTrace from source
Build Requirements
OmniTrace needs a GCC compiler with full support for C++17 and CMake v3.16 or higher. The Clang compiler may be used in lieu of the GCC compiler if Dyninst is already installed.
- GCC compiler v7+
- Older GCC compilers may be supported but are not tested
- Clang compilers are generally supported for OmniTrace but not Dyninst
- CMake v3.16+
If the system-installed cmake is too old, a newer version can be installed through several methods. One of the easiest options is to use PyPI (i.e. python's pip):
pip install --user 'cmake==3.18.4'
export PATH=${HOME}/.local/bin:${PATH}
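To decide whether the system cmake is too old, its reported version can be compared against the v3.16 minimum stated above. The Python sketch below works on the text printed by `cmake --version`; the helper names are hypothetical, and the sample strings stand in for real command output.

```python
import re

MINIMUM_CMAKE = (3, 16)  # minimum version stated in the build requirements


def parse_cmake_version(output: str) -> tuple:
    """Extract (major, minor, patch) from the text printed by `cmake --version`."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("unrecognized cmake --version output")
    # A missing patch component is treated as 0.
    return tuple(int(group) for group in match.groups(default="0"))


def cmake_is_new_enough(output: str, minimum=MINIMUM_CMAKE) -> bool:
    """Compare only the major and minor components against the minimum."""
    return parse_cmake_version(output)[:2] >= tuple(minimum)


print(cmake_is_new_enough("cmake version 3.10.2"))  # False
print(cmake_is_new_enough("cmake version 3.18.4"))  # True
```

In practice the input string could be obtained with `subprocess.run(["cmake", "--version"], capture_output=True, text=True).stdout`.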
Required Third-Party Packages
All of the third-party packages required by DynInst and DynInst itself can be built and installed during the build of omnitrace itself. In the list below, we list the package, the version, which package requires the package (i.e. omnitrace requires Dyninst and Dyninst requires TBB), and the CMake option to build the package alongside omnitrace:
| Third-Party Library | Minimum Version | Required By | CMake Option |
|---|---|---|---|
| Dyninst | 12.0 | OmniTrace | OMNITRACE_BUILD_DYNINST (default: OFF) |
| Libunwind | | OmniTrace | OMNITRACE_BUILD_LIBUNWIND (default: ON) |
| TBB | 2018.6 | Dyninst | DYNINST_BUILD_TBB (default: OFF) |
| ElfUtils | 0.178 | Dyninst | DYNINST_BUILD_ELFUTILS (default: OFF) |
| LibIberty | | Dyninst | DYNINST_BUILD_LIBIBERTY (default: OFF) |
| Boost | 1.67.0 | Dyninst | DYNINST_BUILD_BOOST (default: OFF) |
| OpenMP | 4.x | Dyninst | |
Optional Third-Party Packages
- ROCm
- HIP
- Roctracer for HIP API and kernel tracing
- ROCM-SMI for GPU monitoring
- Rocprofiler for GPU hardware counters
- PAPI
- MPI
- OMNITRACE_USE_MPI will enable full MPI support
- OMNITRACE_USE_MPI_HEADERS will enable wrapping of the dynamically-linked MPI C function calls
- By default, if an OpenMPI MPI distribution cannot be found, omnitrace will use a local copy of the OpenMPI mpi.h
- Several optional third-party profiling tools supported by timemory (e.g. Caliper, TAU, CrayPAT, etc.)
| Third-Party Library | CMake Enable Option | CMake Build Option |
|---|---|---|
| PAPI | OMNITRACE_USE_PAPI (default: ON) | OMNITRACE_BUILD_PAPI (default: ON) |
| MPI | OMNITRACE_USE_MPI (default: OFF) | |
| MPI (header-only) | OMNITRACE_USE_MPI_HEADERS (default: ON) | |
Installing DynInst
Building Dyninst alongside OmniTrace
The easiest way to install Dyninst is to configure omnitrace with OMNITRACE_BUILD_DYNINST=ON. Depending on the version of Ubuntu, the apt package manager may have sufficiently recent
versions of Dyninst's Boost, TBB, and LibIberty dependencies (i.e. apt-get install libtbb-dev libiberty-dev libboost-dev); however, it is also possible to have Dyninst build and install
its dependencies via DYNINST_BUILD_<DEP>=ON, e.g.:
git clone https://github.com/AMDResearch/omnitrace.git omnitrace-source
cmake -B omnitrace-build -DOMNITRACE_BUILD_DYNINST=ON -DDYNINST_BUILD_{TBB,ELFUTILS,BOOST,LIBIBERTY}=ON omnitrace-source
where -DDYNINST_BUILD_{TBB,ELFUTILS,BOOST,LIBIBERTY}=ON is expanded by the shell to -DDYNINST_BUILD_TBB=ON -DDYNINST_BUILD_ELFUTILS=ON ...
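For readers unfamiliar with brace expansion, the following Python sketch mimics what the shell does with that single argument. It handles only one non-nested brace group and is an illustration, not a reimplementation of the shell's rules. Note that brace expansion is a feature of shells such as bash and zsh; a strictly POSIX sh (e.g. dash) passes the braces through literally, in which case each option must be written out.

```python
import re


def expand_braces(word: str) -> list:
    """Expand a single, non-nested {a,b,c} group the way bash would."""
    match = re.search(r"\{([^{}]*)\}", word)
    if match is None:
        # No brace group: the word is passed through unchanged.
        return [word]
    prefix, suffix = word[:match.start()], word[match.end():]
    return [prefix + alternative + suffix for alternative in match.group(1).split(",")]


for flag in expand_braces("-DDYNINST_BUILD_{TBB,ELFUTILS,BOOST,LIBIBERTY}=ON"):
    print(flag)
```

This prints one `-DDYNINST_BUILD_<DEP>=ON` flag per dependency, in the order listed inside the braces.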
Installing Dyninst via Spack
Spack is another option to install Dyninst and its dependencies:
git clone https://github.com/spack/spack.git
source ./spack/share/spack/setup-env.sh
spack compiler find
spack external find --all --not-buildable
spack spec -I --reuse dyninst
spack install --reuse dyninst
spack load -r dyninst
Installing omnitrace
OmniTrace has cmake configuration options for supporting MPI (OMNITRACE_USE_MPI or OMNITRACE_USE_MPI_HEADERS), HIP kernel tracing (OMNITRACE_USE_ROCTRACER),
sampling ROCm devices (OMNITRACE_USE_ROCM_SMI), OpenMP-Tools (OMNITRACE_USE_OMPT), hardware counters via PAPI (OMNITRACE_USE_PAPI), among others.
Various additional features can be enabled via the TIMEMORY_USE_* CMake options.
Any OMNITRACE_USE_<VAL> option which has a corresponding TIMEMORY_USE_<VAL> option means that the support within timemory for this feature has been integrated
into omnitrace's perfetto support, e.g. OMNITRACE_USE_PAPI=<VAL> forces TIMEMORY_USE_PAPI=<VAL> and the data that timemory is able to collect via this package
is passed along to perfetto and will be displayed when the .proto file is visualized in ui.perfetto.dev.
git clone https://github.com/AMDResearch/omnitrace.git omnitrace-source
cmake \
-B omnitrace-build \
-D CMAKE_INSTALL_PREFIX=/opt/omnitrace \
-D OMNITRACE_USE_HIP=ON \
-D OMNITRACE_USE_ROCM_SMI=ON \
-D OMNITRACE_USE_ROCTRACER=ON \
-D OMNITRACE_USE_PYTHON=ON \
-D OMNITRACE_USE_OMPT=ON \
-D OMNITRACE_USE_MPI_HEADERS=ON \
-D OMNITRACE_BUILD_PAPI=ON \
-D OMNITRACE_BUILD_LIBUNWIND=ON \
-D OMNITRACE_BUILD_DYNINST=ON \
-D DYNINST_BUILD_TBB=ON \
-D DYNINST_BUILD_BOOST=ON \
-D DYNINST_BUILD_ELFUTILS=ON \
-D DYNINST_BUILD_LIBIBERTY=ON \
omnitrace-source
cmake --build omnitrace-build --target all --parallel 8
cmake --build omnitrace-build --target install
source /opt/omnitrace/share/omnitrace/setup-env.sh
MPI Support within OmniTrace
OmniTrace can have full (OMNITRACE_USE_MPI=ON) or partial (OMNITRACE_USE_MPI_HEADERS=ON) MPI support.
The only difference between these two modes is whether or not the results collected via timemory and/or perfetto can be aggregated into a single
output file during finalization. When full MPI support is enabled, combining the timemory results always occurs whereas combining the perfetto
results is configurable via the OMNITRACE_PERFETTO_COMBINE_TRACES setting.
The primary benefits of partial or full MPI support are the automatic wrapping of MPI functions and the ability
to label output with suffixes which correspond to the MPI_COMM_WORLD rank ID instead of using the system process identifier (i.e. PID).
In general, it is recommended to use partial MPI support with the OpenMPI headers as this is the most portable configuration.
If full MPI support is selected, make sure your target application is built against the same MPI distribution as omnitrace,
i.e. do not build omnitrace with MPICH and use it on a target application built against OpenMPI.
If partial support is selected, the OpenMPI headers are recommended over the MPICH headers because
MPI_COMM_WORLD in OpenMPI is a pointer to ompi_communicator_t (8 bytes), whereas MPI_COMM_WORLD in MPICH
is an int (4 bytes). Building omnitrace with partial MPI support and the MPICH headers and then using
omnitrace on an application built against OpenMPI will cause a segmentation fault because the value of MPI_COMM_WORLD is narrowed
during the function wrapping before being passed along to the underlying MPI function.
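The narrowing described above can be illustrated with plain integer arithmetic. In this Python sketch the handle value is a made-up 64-bit address, not a real OpenMPI pointer, and the helper `narrow_to_int` simply keeps the low 32 bits, which is what typically remains when an 8-byte value is squeezed through a 4-byte int.

```python
INT_BITS = 32  # width of the MPICH-style int handle (4 bytes)


def narrow_to_int(handle: int) -> int:
    """Keep only the low 32 bits of a 64-bit value."""
    return handle & ((1 << INT_BITS) - 1)


# Hypothetical 8-byte pointer value standing in for OpenMPI's MPI_COMM_WORLD.
hypothetical_handle = 0x00007F3A12345678
narrowed = narrow_to_int(hypothetical_handle)

print(hex(hypothetical_handle))  # 0x7f3a12345678
print(hex(narrowed))             # 0x12345678
```

The upper bits (0x7f3a in this example) are lost, so the value that reaches the underlying MPI function no longer refers to the original communicator object, which is consistent with the segmentation fault described above.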
Post-Installation Steps
Configure the environment
If environment modules are available and preferred:
module use /opt/omnitrace/share/modulefiles
module load omnitrace/1.0.0
Alternatively, one can directly source the setup-env.sh script:
source /opt/omnitrace/share/omnitrace/setup-env.sh
Test the executables
Successful execution of these commands indicates that the installation does not have any issues locating the installed libraries:
omnitrace-instrument --help
omnitrace-avail --help
NOTE: If ROCm support was enabled, you may have to add the path to the ROCm libraries to
LD_LIBRARY_PATH, e.g. export LD_LIBRARY_PATH=/opt/rocm/lib:${LD_LIBRARY_PATH}