파일
rocm-systems/source/docs/installation.md
T
Jonathan R. Madsen 9618ddefba Causal profiling (#229)
* Addition of basic structure

* Reworked categories

* More causal integration additions

* Causal implementation

* Update examples

* delete virtual_speedup files

* Update perfetto submodule to v31.0

* Update dyninst submodule

* Update timemory submodule

* ElfUtils build for libdw

* OMNITRACE_LIKELY and OMNITRACE_UNLIKELY

* Update common lib join

* Examples updates for causal profiling

* config updates with causal options

- OMNITRACE_CAUSAL_FIXED_LINE
- OMNITRACE_CAUSAL_FIXED_SPEEDUP
- OMNITRACE_CAUSAL_FILE
- OMNITRACE_CAUSAL_BINARY_SCOPE
- OMNITRACE_CAUSAL_SOURCE_SCOPE
- version info in banner
- support increments in parse_numeric_range
- fix occasional deadlock in first call to get_config

* PTL general task group

* Always include PID in debug/verbose messages

* Add blocking/unblocking gotchas to runtime init bundle

* CausalState

* thread_data updates

- generic component_bundle_cache

* Improve handling of causal in category_region

* components updates

- backtrace_causal component
- backtrace::get_data member func
- decrease ignore_depth in backtrace::sample(int)
- handle "omnitrace_main" in backtrace::filter_and_patch(...)
- tweak internal thread state scope for pthread_mutex_gotcha wrappers

* simplify tracing get_instrumentation_bundles usage

* sampling updates

- include backtrace_causal component
- disable backtrace_metrics if using causal and not using perfetto
- disable backtrace and backtrace_timestamp when using causal
- post_process_causal

* causal updates

- more checks in blocking_gotcha and unblocking_gotcha start/stop
- miscellaneous overhaul of data
- experiment update

* Remove virtual speedup

* libomnitrace code_object

* causal-profiling test

* libomnitrace library.cpp updates

- handle causal profiling
- fini_bundle

* Disable causal profiling by default

* Updated causal code and example

- example: three execution variants: cpu + rng, cpu, rng
- example: three instrumentation variants: none, omni, coz
- fix blocking gotcha credit
- rework perform_experiment_impl
- get_eligible_address_ranges
- compute_eligible_lines
- support fixed lines/speedups/functions
- update selected_entry to support function mode
- fix causal::delay
- experiment updates

* omnitrace_progress / omnitrace_user_progress

- with accompanying omnitrace_annotated_progress / omnitrace_user_annotated_progress

* Update timemory submodule

* CausalMode

- mode indicated whether causal predictions source be at line-level or function-level

* code_object, config, runtime, sampling, thread_data

- code_object: address_range
- code_object: basic::line_info serialize(), name(), hash()
- config updates
- two signals for causal sampling
- thread_data init fixes

* pthread updates

- pthread_create_gotcha processes delays
- pthread_mutex_gotcha does not wrap pthread_join in causal mode

* backtrace_causal update

- dynamic delay period stats

* main wrapper uses basename of argv[0]

* update elfio submodule

* perf support (currently unused)

* Fix experiment JSON serialization

- static_vector.hpp (unused)

* causal executable + config options updates

- omnitrace-causal exe simplifies running multiple causal configs
- changed the causal config option names

* Support both throughput and latency points

* process-causal-json.py script

- will be used later for testing

* stable_vector

* Rework thread_data

* Improve omnitrace-causal exe

- better verbosity handling
- correct diagnosis of status for child process
- execvpe when only one iteration (debugging)

* Update timemory submodule

* exe --version

- omnitrace, omnitrace-avail, and omnitrace-sample all support --version on command-line

* OMNITRACE_INTERNAL_API + OMNITRACE_{LIKELY,UNLIKELY}

* omnitrace-causal cmake format

* omnitrace config update

- OMNITRACE_CAUSAL_FILE_CLOBBER

* custom exception

- wraps STL exception and gets stacktrace during construction

* exit_gotcha supports _Exit

* use global construct_on_init + max threads

- add some safety when exceeding max # of threads

* update code_object binary filter

- exclude dyninst and tbbmalloc library

* containers: c_array, static_vector, stable_vector

- moved utility::c_array to container::c_array
- created static_vector: std::vector bound to std::array
- created stable_vector: vector with stable references

* grow thread_data when new thread created

* causal updates

- data: improve compute_eligible_lines to ignore lambdas
- data: use new thread_data
- delay: use new thread_data
- experiment: properly support latency points
- experiment: support file clobber
- experiment: ensure non-zero experiment time
- progress_point: use new thread_data
- backtrace_causal: use new thread_data

* Update causal-profiling tests

* fix omnitrace-causal backslash escaping

* process-causal-json script

* restructure causal implementation

- update verbose messages for omnitrace-causal diagnose_status
- migrated causal implementation in sampling.cpp to causal/sampling.cpp
- OMNITRACE_USE_CAUSAL does not require OMNITRACE_USE_SAMPLING
- added Mode::Causal
- causal sampling uses same signals as regular sampling
- moved tracing::thread_init to implementation file
- combined tracing::thread_init and tracing::thread_init_sampling
- added causal/components folder
- pthread_create_gotcha::wrapper_config
- omnitrace_preload checks OMNITRACE_USE_CAUSAL
  - updates mode accordingly

* update timemory submodule

* update timemory submodule

* causal example updates

- causal for lulesh

* perf code + utility - helpers

- relocated causal perf code
- placement new when generating unique ptr trait for potentially allocating during sampling
- additions to utility header
- removed previously added helpers.hpp

* update timemory submodule

* Default env variables for omnitrace-causal

- activate OMNITRACE_USE_KOKKOSP, etc.

* update stable_vector and static_vector

- static vector can use atomic for size tracking for thread-safe situations

* update causal example header

- CAUSAL_PROGRESS_NAMED
- use CAUSAL_ prefix for some macros

* Tweak lulesh example

- use CAUSAL_PROGRESS instead of CAUSAL_BEGIN and CAUSAL_END

* omnitrace-sample support for causal mode

- set OMNITRACE_USE_SAMPLING to off when OMNITRACE_MODE=causal

* refactor and cleanup code_object

- scope filter
- fixes to address_range

* overhaul causal data + causal config options

- full support for function and line mode
- support static vector of instruction pointers
- improve line info mapping resolution
- remove thread-locality from miscellanous functions where unnecessary
- causal options for {binary,source,function,fileline} exclusion

* causal experiment, sampling, and backtrace updates

- is_selected + unwind address array
- experiment warning about progress points
- increased buffer size for backtrace_casual sampler
- backtrace_causal only stores IP addresses instead of full unwind info

* category_region updates

- minor refactor
- local_category_region::mark

* Update causal tests

* Bump version to 1.8.0

* omnitrace-causal args + CLOBBER -> RESET

- renamed OMNITRACE_CAUSAL_FILE_CLOBBER to OMNITRACE_CAUSAL_FILE_RESET
- updated omnitrace-causal exe to support recently added configuration options
- other miscellaneous tweaks to data.cpp, experiment.cpp, and sampling.cpp

* Refactor causal and code_object

- code_object.hpp and code_object.cpp moved into binary folder
- causal components namespaced into omnitrace::causal::component
- moved sample_data out of backtrace_causal and into own file
- renamed backtrace_causal to causal::component::backtrace

* preload omnitrace_init + OMNITRACE_DEBUG_MARK

- env OMNITRACE_DEBUG_MARK
- fix omnitrace_init call when LD_PRELOAD-ing omnitrace

* Fix fileline support + line-info output names + experiment log

- line-info log files are prefixed with experiment name
- don't print experiment duration when E2E
- account for fileline scope in analysis

* KokkosP: OMNITRACE_KOKKOSP_NAME_LENGTH_MAX

- config option to limit the name of kokkos tool callbacks
- remove [kokkos] from KokkosP names

* Update causal example

- minor tweaks to decrease probability of overlapping regions in binary

* omnitrace-causal update

- prefix N / Ntot in environment printout

* Miscellaneous updates

- causal::finish_experimenting()
- OMNITRACE_CAUSAL_RANDOM_SEED
- KokkosP causal updates
  - exclude some callbacks, make some callbacks unique, etc.
- address_range::operator+=(address_range)
- combine contiguous ranges in binary/analysis.cpp when file, func, line is same and address range is contiguous
- bfd_line_info reads inline info
- wait for perform_experiment_impl to complete
- causal::delay updates
  - delay::process checks if experiment is active
  - uses threading::get_id()
- experiment scales duration up for larger speedup experiments
- line info samples includes excluded lines
- sampler uses CLOCK_REALTIME
- blocking_gotcha updates
  - is no longer fully static
  - adds audit routine which sets the postblock value to zero if try/timed routine fails
- category::host was added to causal_throughput_categories_t
- pthread_create_gotcha sets new threads local parent delay
  - was using internal value, now uses sequent value

* Causal improvements to KokkosP

* Updates to experiment time scaling

- use stats instead of just max

* binary/link_map.{hpp,cpp}

* update process-causal-json.py

* Folded fileline scope into source scope

* Update documentation

- Add documentation for causal profiling
- Replace 'Omnitrace' with 'OmniTrace' everywhere

* Update causal-helpers.cmake + omnitrace-testing.cmake

- split tests/CMakeLists.txt partially into omnitrace-testing.cmake

* omnitrace/causal.h

- OMNITRACE_CAUSAL_PROGRESS
- OMNITRACE_CAUSAL_PROGRESS_NAMED
- OMNITRACE_CAUSAL_BEGIN
- OMNITRACE_CAUSAL_END

* selected_entry + remove default filters for lambdas and operator()

- selected entry stores range and binary load address

* update process-causal-json.py

* format examples/lulesh/CMakeLists.txt

* causal-helpers find_package(Threads)

* OMNITRACE_KOKKOSP_KERNEL_LOGGER

- was OMNITRACE_KOKKOS_KERNEL_LOGGER

* quiet find of coz-profiler

* Fix rocm_smi exception handling

* Update timemory submodule (binutils)

- fix binutls compile error on some systems
- bump binutils to v2.40

* Fix miscellaneous tests

* OMNITRACE_KOKKOSP_PREFIX

* revert rocm_smi handling

* ElfUtils updates

- default to download version 0.188
- add -Wno-error=null-dereference due to GCC 12 compiler error

* Update causal example

* Remove OMNITRACE_VERBOSE from global workflow envs

* Reliable causal test

* disable compilation of causal perf files

* Remove set_current_selection with unwind stack

* update timemory submodule

* fix for segfault on bionic

- locking in TLS dtor was causing segfault

* remove experiment::is_selected(unwind_stack_t)

* update default init of selected_entry

* Fix for when IP is not offset by load address

* Update CMakeLists.txt

* Miscellaneous updates

- OMNITRACE_WARNING_OR_CI_THROW
- OMNITRACE_REQUIRE
- OMNITRACE_PREFER
- fixed issues with no ASLR
-  added load address variable and ipaddr() func to basic/bfd line info
- removed get_basic() from dwarf_line_info
- TIMEMORY_PREFER -> OMNITRACE_PREFER
- removed previously added binary_address and range variables from selected_entry

* Removed superfluous CausalState

* Additional causal tests (lulesh + kokkos)

* filter, prefer, analysis ASLR handling

- removed default filter on cold functions
- fixed OMNITRACE_PREFER
- fixed analysis ASLR handling

* Tweak line-info output

* Removed some superfluous code

- causal/delay
- causal/selected_entry

* Exclude main.cold in function mode

* Update validate-perfetto-proto.py

- account for occasional http errors

* Add sampling test disabling tmp files

* argparser for process-causal-json

- support validation
- support filtering

* Avoid pthread_{lock,unlock} in sampling offload

- use homemade atomic_mutex/atomic_lock since contention will be low and using pthread tools might trigger our wrappers

* Rename process-causal-json.py

- validate-causal-json.py

* rework omnitrace_add_causal_test

- capable of performing validation
- added validation tests

* Fix kokkosp_begin_deep_copy + causal

* Tweak address range in bfd_line_info::read_pc

* Tweak analysis and data IP handling

- look for gaps

* Disable scaling experiment time by speedup

* Revert change in max threads during CI

* binary updates

- significant overhaul of binary analysis implementation
- removed "basic_line_info" and "bfd_line_info" in lieu of "symbol" class
  - symbol class has basic BFD info + vector of inlines + vector of dwarf info

* Updated causal to use new binary analysis

- Fix symbol.cpp includes

* Updated formatting target

- include *.cmake files

* Updated causal tests

- causal tests should be stable now

* Update timemory and dyninst submodules

- TPLs are stripped + built w/o debug info

* Increase tolerance for causal validation speedups

- higher speedups have more variance (increased to +/- 5 from 3)

* Support causal output for MPI

- i.e. tag with MPI rank

* omnitrace-causal launcher argument

* improve experiment sampling output

* causal data updates

- call compute lines once
- fixed filtered cached binary info
- debugging info when experiment fails to start

* Tweaked causal validation tests

* dwarf_entry ranges

* CI updates

- increase max threads to 64

* Tweak causal E2E validation tests

- more threads
- shorter thread runtime
- more iterations

* Fix shadowed variable

* fix symbol read_bfd last PC calculation

* fix maybe-uninitialized warning

* omnitrace-causal launcher update

- only inject "omnitrace-causal --" once
- throw error if no matches found

* Update causal profiling docs for launcher

* fix address range boundaries
2023-01-24 18:53:23 -06:00

11 KiB

Installation

.. toctree::
   :glob:
   :maxdepth: 4

Operating System

OmniTrace is only supported on Linux.

  • Ubuntu 18.04
  • Ubuntu 20.04
  • OpenSUSE 15.2
  • OpenSUSE 15.3
  • Other OS distributions may be supported but are not tested

Identifying the Operating System

If you are unsure of the operating system and version, the /etc/os-release and /usr/lib/os-release files contain operating system identification data for Linux systems.

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
...
VERSION_ID="20.04"
...

The relevent fields are ID and the VERSION_ID.

Architecture

At present, only amd64 (x86_64) architectures are tested but Dyninst supports several more architectures. Thus, omnitrace should support other CPU architectures such as aarch64, ppc64, etc.

Installing omnitrace from binary distributions

Every omnitrace release provides binary installer scripts of the form:

omnitrace-{VERSION}-{OS_DISTRIB}-{OS_VERSION}[-ROCm-{ROCM_VERSION}[-{EXTRA}]].sh

E.g.:

omnitrace-1.0.0-ubuntu-18.04-OMPT-PAPI-Python3.sh
omnitrace-1.0.0-ubuntu-18.04-ROCm-405000-OMPT-PAPI-Python3.sh
...
omnitrace-1.0.0-ubuntu-20.04-ROCm-50000-OMPT-PAPI-Python3.sh

Any of the EXTRA fields with a cmake build option (e.g. PAPI, see below) or no link requirements (e.g. OMPT) have self-contained support for these packages.

Download the appropriate binary distribution

wget https://github.com/AMDResearch/omnitrace/releases/download/v<VERSION>/<SCRIPT>

Create the target installation directory

mkdir /opt/omnitrace

Run the installer script

./omnitrace-1.0.0-ubuntu-18.04-ROCm-405000-OMPT-PAPI.sh --prefix=/opt/omnitrace --exclude-subdir

Installing OmniTrace from source

Build Requirements

OmniTrace needs a GCC compiler with full support for C++17 and CMake v3.16 or higher. The Clang compiler may be used in lieu of the GCC compiler if Dyninst is already installed.

  • GCC compiler v7+
    • Older GCC compilers may be supported but are not tested
    • Clang compilers are generally supported for OmniTrace but not Dyninst
  • CMake v3.16+

If the system installed cmake is too old, installing a new version of cmake can be done through several methods. One of the easiest options is to use PyPi (i.e. python's pip):

pip install --user 'cmake==3.18.4'
export PATH=${HOME}/.local/bin:${PATH}

Required Third-Party Packages

All of the third-party packages required by DynInst and DynInst itself can be built and installed during the build of omnitrace itself. In the list below, we list the package, the version, which package requires the package (i.e. omnitrace requires Dyninst and Dyninst requires TBB), and the CMake option to build the package alongside omnitrace:

Third-Party Library Minimum Version Required By CMake Option
Dyninst 10.0 OmniTrace OMNITRACE_BUILD_DYNINST (default: OFF)
Libunwind OmniTrace OMNITRACE_BUILD_LIBUNWIND (default: ON)
TBB 2018.6 Dyninst DYNINST_BUILD_TBB (default: OFF)
ElfUtils 0.178 Dyninst DYNINST_BUILD_ELFUTILS (default: OFF)
LibIberty Dyninst DYNINST_BUILD_LIBIBERTY (default: OFF)
Boost 1.67.0 Dyninst DYNINST_BUILD_BOOST (default: OFF)
OpenMP 4.x Dyninst

Optional Third-Party Packages

  • ROCm
    • HIP
    • Roctracer for HIP API and kernel tracing
    • ROCM-SMI for GPU monitoring
    • Rocprofiler for GPU hardware counters
  • PAPI
  • MPI
    • OMNITRACE_USE_MPI will enable full MPI support
    • OMNITRACE_USE_MPI_HEADERS will enable wrapping of the dynamically-linked MPI C function calls
      • By default, if an OpenMPI MPI distribution cannot be found, omnitrace will use a local copy of the OpenMPI mpi.h
  • Several optional third-party profiling tools supported by timemory (e.g. Caliper, TAU, CrayPAT, etc.)
Third-Party Library CMake Enable Option CMake Build Option
PAPI OMNITRACE_USE_PAPI (default: ON) OMNITRACE_BUILD_PAPI (default: ON)
MPI OMNITRACE_USE_MPI (default: OFF)
MPI (header-only) OMNITRACE_USE_MPI_HEADERS (default: ON)

Installing DynInst

Building Dyninst alongside OmniTrace

The easiest way to install Dyninst is to configure omnitrace with OMNITRACE_BUILD_DYNINST=ON. Depending on the version of Ubuntu, the apt package manager may have current enough versions of Dyninst's Boost, TBB, and LibIberty dependencies (i.e. apt-get install libtbb-dev libiberty-dev libboost-dev); however, it is possible to request Dyninst to install it's dependencies via DYNINST_BUILD_<DEP>=ON, e.g.:

git clone https://github.com/AMDResearch/omnitrace.git omnitrace-source
cmake -B omnitrace-build -DOMNITRACE_BUILD_DYNINST=ON -DDYNINST_BUILD_{TBB,ELFUTILS,BOOST,LIBIBERTY}=ON omnitrace-source

where -DDYNINST_BUILD_{TBB,BOOST,ELFUTILS,LIBIBERTY}=ON is expanded by the shell to -DDYNINST_BUILD_TBB=ON -DDYNINST_BUILD_BOOST=ON ...

Installing Dyninst via Spack

Spack is another option to install Dyninst and it's dependencies:

git clone https://github.com/spack/spack.git
source ./spack/share/spack/setup-env.sh
spack compiler find
spack external find --all --not-buildable
spack spec -I --reuse dyninst
spack install --reuse dyninst
spack load -r dyninst

Installing omnitrace

OmniTrace has cmake configuration options for supporting MPI (OMNITRACE_USE_MPI or OMNITRACE_USE_MPI_HEADERS), HIP kernel tracing (OMNITRACE_USE_ROCTRACER), sampling ROCm devices (OMNITRACE_USE_ROCM_SMI), OpenMP-Tools (OMNITRACE_USE_OMPT), hardware counters via PAPI (OMNITRACE_USE_PAPI), among others. Various additional features can be enabled via the TIMEMORY_USE_* CMake options. Any OMNITRACE_USE_<VAL> option which has a corresponding TIMEMORY_USE_<VAL> option means that the support within timemory for this feature has been integrated into omnitrace's perfetto support, e.g. OMNITRACE_USE_PAPI=<VAL> forces TIMEMORY_USE_PAPI=<VAL> and the data that timemory is able to collect via this package is passed along to perfetto and will be displayed when the .proto file is visualized in ui.perfetto.dev.

git clone https://github.com/AMDResearch/omnitrace.git omnitrace-source
cmake                                       \
    -B omnitrace-build                      \
    -D CMAKE_INSTALL_PREFIX=/opt/omnitrace  \
    -D OMNITRACE_USE_HIP=ON                 \
    -D OMNITRACE_USE_ROCM_SMI=ON            \
    -D OMNITRACE_USE_ROCTRACER=ON           \
    -D OMNITRACE_USE_PYTHON=ON              \
    -D OMNITRACE_USE_OMPT=ON                \
    -D OMNITRACE_USE_MPI_HEADERS=ON         \
    -D OMNITRACE_BUILD_PAPI=ON              \
    -D OMNITRACE_BUILD_LIBUNWIND=ON         \
    -D OMNITRACE_BUILD_DYNINST=ON           \
    -D DYNINST_BUILD_TBB=ON                 \
    -D DYNINST_BUILD_BOOST=ON               \
    -D DYNINST_BUILD_ELFUTILS=ON            \
    -D DYNINST_BUILD_LIBIBERTY=ON           \
    omnitrace-source
cmake --build omnitrace-build --target all --parallel 8
cmake --build omnitrace-build --target install
source /opt/omnitrace/share/omnitrace/setup-env.sh

MPI Support within OmniTrace

OmniTrace can have full (OMNITRACE_USE_MPI=ON) or partial (OMNITRACE_USE_MPI_HEADERS=ON) MPI support. The only difference between these two modes is whether or not the results collected via timemory and/or perfetto can be aggregated into a single output file during finalization. When full MPI support is enabled, combining the timemory results always occurs whereas combining the perfetto results is configurable via the OMNITRACE_PERFETTO_COMBINE_TRACES setting.

The primary benefits of partial or full MPI support are the automatic wrapping of MPI functions and the ability to label output with suffixes which correspond to the MPI_COMM_WORLD rank ID instead of using the system process identifier (i.e. PID). In general, it is recommended to use partial MPI support with the OpenMPI headers as this is the most portable configuration. If full MPI support is selected, make sure your target application is built against the same MPI distribution as omnitrace, i.e. do not build omnitrace with MPICH and use it on a target application built against OpenMPI. If partial support is selected, the reason the OpenMPI headers are recommended instead of the MPICH headers is because the MPI_COMM_WORLD in OpenMPI is a pointer to ompi_communicator_t (8 bytes), whereas MPI_COMM_WORLD in MPICH, it is an int (4 bytes). Building omnitrace with partial MPI support and the MPICH headers and then using omnitrace on an application built against OpenMPI will cause a segmentation fault due to the value of the MPI_COMM_WORLD being narrowed during the function wrapping before being passed along to the underlying MPI function.

Post-Installation Steps

Configure the environment

If environment modules are available and preferred:

module use /opt/omnitrace/share/modulefiles
module load omnitrace/1.0.0

Alternatively, once can directly source the setup-env.sh script:

source /opt/omnitrace/share/omnitrace/setup-env.sh

Test the executables

Successful execution of these commands indicates that the installation does not have any issues locating the installed libraries:

omnitrace --help
omnitrace-avail --help

NOTE: If ROCm support was enabled, you may have to add the path to the ROCm libraries to LD_LIBRARY_PATH, e.g. export LD_LIBRARY_PATH=/opt/rocm/lib:${LD_LIBRARY_PATH}