rocm-systems/projects/rocprofiler-systems/README.md
Jeffrey Novotny dfaa4dc9c5 Omnitrace docs refactoring (#353)

[ROCm/rocprofiler-systems commit: 0689797736]
2024-07-29 17:23:36 -04:00

Omnitrace: Application Profiling, Tracing, and Analysis

Overview

AMD Research is seeking to improve observability and performance analysis for software running on AMD heterogeneous systems. If you are familiar with rocprof and/or uProf, you will find many of the capabilities of these tools available via Omnitrace in addition to many new capabilities.

Omnitrace is a comprehensive profiling and tracing tool for parallel applications written in C, C++, Fortran, HIP, OpenCL, and Python which execute on the CPU or CPU+GPU. It can gather performance information for functions through any combination of binary instrumentation, call-stack sampling, user-defined regions, and Python interpreter hooks. Omnitrace supports interactive visualization of comprehensive traces in the web browser in addition to high-level summary profiles with mean/min/max/stddev statistics. Beyond function runtimes, Omnitrace can collect metrics at three levels: system-level metrics such as CPU frequency, GPU temperature, and GPU utilization; process-level metrics such as memory usage, page faults, and context switches; and thread-level metrics such as memory usage, CPU time, and numerous hardware counters.

Data Collection Modes

  • Dynamic instrumentation
    • Runtime instrumentation
      • Instrument executable and shared libraries at runtime
    • Binary rewriting
      • Generate a new executable and/or library with instrumentation built-in
  • Statistical sampling
    • Periodic software interrupts per-thread
  • Process-level sampling
    • Background thread records process-, system- and device-level metrics while the application executes
  • Causal profiling
    • Quantifies the potential impact of optimizations in parallel codes

Data Analysis

  • High-level summary profiles with mean/min/max/stddev statistics
    • Low overhead, memory efficient
    • Ideal for running at scale
  • Comprehensive traces
    • Every individual event/measurement
  • Application speedup predictions resulting from potential optimizations in functions and lines of code (causal profiling)

Parallelism API Support

  • HIP
  • HSA
  • Pthreads
  • MPI
  • Kokkos-Tools (KokkosP)
  • OpenMP-Tools (OMPT)

GPU Metrics

  • GPU hardware counters
  • HIP API tracing
  • HIP kernel tracing
  • HSA API tracing
  • HSA operation tracing
  • System-level sampling (via rocm-smi)
    • Memory usage
    • Power usage
    • Temperature
    • Utilization

CPU Metrics

  • CPU hardware counters sampling and profiles
  • CPU frequency sampling
  • Various timing metrics
    • Wall time
    • CPU time (process and/or thread)
    • CPU utilization (process and/or thread)
    • User CPU time
    • Kernel CPU time
  • Various memory metrics
    • High-water mark (sampling and profiles)
    • Memory page allocation
    • Virtual memory usage
  • Network statistics
  • I/O metrics
  • ... many more

Documentation

The full documentation for omnitrace is available at the ROCm Omnitrace documentation repository. See the Getting Started documentation for general tips and a detailed discussion about sampling vs. binary instrumentation.

Quick Start

Installation

  • Visit the Releases page
  • Select the appropriate installer (recommendation: use the .sh scripts, which, unlike the DEB/RPM installers, do not require super-user privileges)
    • If targeting a ROCm application, find the installer script with the matching ROCm version
    • If you are unsure about your Linux distro, check /etc/os-release or use the omnitrace-install.py script

Alternatively, download the omnitrace-install.py script and specify --prefix <install-directory> when executing it. This script will attempt to auto-detect a compatible OS distribution and version. If ROCm support is desired, specify --rocm X.Y, where X is the ROCm major version and Y is the ROCm minor version, e.g. --rocm 5.4.

wget https://github.com/ROCm/omnitrace/releases/latest/download/omnitrace-install.py
python3 ./omnitrace-install.py --prefix /opt/omnitrace/rocm-5.4 --rocm 5.4

See the Installation Documentation for detailed information.

Setup

NOTE: Replace /opt/omnitrace below with your installation prefix as necessary.

  • Option 1: Source setup-env.sh script
source /opt/omnitrace/share/omnitrace/setup-env.sh
  • Option 2: Load modulefile
module use /opt/omnitrace/share/modulefiles
module load omnitrace
  • Option 3: Manual
export PATH=/opt/omnitrace/bin:${PATH}
export LD_LIBRARY_PATH=/opt/omnitrace/lib:${LD_LIBRARY_PATH}

Omnitrace Settings

Generate an omnitrace configuration file using omnitrace-avail -G omnitrace.cfg. Optionally, use omnitrace-avail -G omnitrace.cfg --all for a verbose configuration file with descriptions, categories, etc. Modify the configuration file as desired, e.g. enable tracing (perfetto), profiling (timemory), sampling, and process-level sampling by default and tweak some sampling default values:

# ...
OMNITRACE_TRACE                = true
OMNITRACE_PROFILE              = true
OMNITRACE_USE_SAMPLING         = true
OMNITRACE_USE_PROCESS_SAMPLING = true
# ...
OMNITRACE_SAMPLING_FREQ        = 50
OMNITRACE_SAMPLING_CPUS        = all
OMNITRACE_SAMPLING_GPUS        = $env:HIP_VISIBLE_DEVICES

Once the configuration file is adjusted to your preferences, either export the path to this file via OMNITRACE_CONFIG_FILE=/path/to/omnitrace.cfg or save it as ${HOME}/.omnitrace.cfg to ensure these values are always read as the defaults. To change any of these settings later, override them via environment variables or specify an alternative OMNITRACE_CONFIG_FILE.
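
The precedence between the configuration file and the environment can be sketched in Python. This is only an illustration, not Omnitrace's actual parser: the KEY = VALUE syntax with # comments is inferred from the sample above, load_settings is a hypothetical helper, and the sketch neither expands $env: references nor considers variables that are absent from the file.

```python
import os


def load_settings(cfg_text, environ=None):
    """Parse KEY = VALUE lines, then let environment variables override them."""
    environ = os.environ if environ is None else environ
    settings = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines without an assignment
        settings[key.strip()] = value.strip()
    # Environment variables take precedence over values read from the file.
    for key in settings:
        if key in environ:
            settings[key] = environ[key]
    return settings


example = """
# ...
OMNITRACE_TRACE         = true
OMNITRACE_SAMPLING_FREQ = 50
"""
# The sampling frequency from the file is replaced by the environment value.
print(load_settings(example, environ={"OMNITRACE_SAMPLING_FREQ": "100"}))
```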

Call-Stack Sampling

The omnitrace-sample executable is used to execute call-stack sampling on a target application without binary instrumentation. Use a double-hyphen (--) to separate the command-line arguments for omnitrace-sample from the target application and its arguments.

omnitrace-sample --help
omnitrace-sample <omnitrace-options> -- <exe> <exe-options>
omnitrace-sample -f 1000 -- ls -la
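
The double-hyphen convention can be illustrated with a short Python sketch. It shows the general idea of splitting a command line at the first --; how omnitrace-sample parses its arguments internally is not described here, and split_command is a hypothetical name.

```python
def split_command(argv):
    """Split argv at the first '--' into (tool options, target command)."""
    if "--" not in argv:
        return list(argv), []
    idx = argv.index("--")
    return list(argv[:idx]), list(argv[idx + 1:])


# Corresponds to: omnitrace-sample -f 1000 -- ls -la
opts, target = split_command(["-f", "1000", "--", "ls", "-la"])
print(opts)    # ['-f', '1000']
print(target)  # ['ls', '-la']
```

Everything before the separator belongs to the tool; everything after it is passed through untouched, so target options such as -la are never mistaken for tool options.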

Binary Instrumentation

The omnitrace-instrument executable is used to instrument an existing binary. Call-stack sampling can be enabled alongside the execution of an instrumented binary, to help "fill in the gaps" between the instrumentation, by setting the OMNITRACE_USE_SAMPLING configuration variable to ON. Similar to omnitrace-sample, use a double-hyphen (--) to separate the command-line arguments for omnitrace-instrument from the target application and its arguments.

omnitrace-instrument --help
omnitrace-instrument <omnitrace-options> -- <exe-or-library> <exe-options>

Binary Rewrite

Rewrite the text section of an executable or library with instrumentation:

omnitrace-instrument -o app.inst -- /path/to/app

In binary rewrite mode, if you also want instrumentation in the linked libraries, you must also rewrite those libraries. Example of rewriting the functions starting with "hip" with instrumentation in the amdhip64 library:

mkdir -p ./lib
omnitrace-instrument -R '^hip' -o ./lib/libamdhip64.so.4 -- /opt/rocm/lib/libamdhip64.so.4
export LD_LIBRARY_PATH=${PWD}/lib:${LD_LIBRARY_PATH}

Verify via ldd that your executable will load the instrumented library -- if you built your executable with an RPATH to the original library's directory, then prefixing LD_LIBRARY_PATH will have no effect.
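
The ldd check can be scripted. The following Python sketch assumes the usual glibc ldd output format ("name => /path (address)"); resolved_libraries and loads_from are hypothetical helper names, and the paths in the usage note are placeholders.

```python
import subprocess


def resolved_libraries(ldd_output):
    """Map each library name in ldd output to the path it resolves to."""
    resolved = {}
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. the vDSO or dynamic loader line
        name, _, rest = line.partition("=>")
        fields = rest.split()
        # "not found" entries and entries without a path map to None
        path = fields[0] if fields and fields[0].startswith("/") else None
        resolved[name.strip()] = path
    return resolved


def loads_from(executable, library, directory):
    """Return True if `library` resolves to a path under `directory`."""
    out = subprocess.run(["ldd", executable], capture_output=True,
                         text=True, check=True).stdout
    path = resolved_libraries(out).get(library)
    return path is not None and path.startswith(directory.rstrip("/") + "/")
```

For example, loads_from("./app.inst", "libamdhip64.so.4", "/path/to/cwd/lib") should return True when the instrumented copy is picked up, and False when an RPATH still points at the original library.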

Once you have rewritten your executable and/or libraries with instrumentation, you can run the instrumented executable, or the executable which loads the instrumented libraries, as you normally would, e.g.:

omnitrace-run -- ./app.inst

If you want to redefine the default value of certain settings in a binary rewrite, use the --env option. This omnitrace-instrument option sets the environment variable to the given value only if it is not already set, so a value in the runtime environment still takes precedence. E.g. the default value of OMNITRACE_PERFETTO_BUFFER_SIZE_KB is 1024000 KB (approximately 1 GiB):

# buffer size defaults to 1024000
omnitrace-instrument -o app.inst -- /path/to/app
omnitrace-run -- ./app.inst

Passing --env OMNITRACE_PERFETTO_BUFFER_SIZE_KB=5120000 will change the default value in app.inst to 5120000 KB (approximately 5 GiB):

# defaults to 5 GiB buffer size
omnitrace-instrument -o app.inst --env OMNITRACE_PERFETTO_BUFFER_SIZE_KB=5120000 -- /path/to/app
omnitrace-run -- ./app.inst
# override default 5 GiB buffer size to 200 MB via command-line
omnitrace-run --trace-buffer-size=200000 -- ./app.inst
# override default 5 GiB buffer size to 200 MB via environment
export OMNITRACE_PERFETTO_BUFFER_SIZE_KB=200000
omnitrace-run -- ./app.inst
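
The "set but do not override" behavior of --env resembles a set-if-unset operation. The Python sketch below is an analogy only, not Omnitrace's implementation; apply_embedded_defaults is a hypothetical name.

```python
def apply_embedded_defaults(environ, embedded):
    """Apply defaults stored at rewrite time without replacing existing values."""
    result = dict(environ)
    for key, value in embedded.items():
        result.setdefault(key, value)  # only set when the variable is absent
    return result


embedded = {"OMNITRACE_PERFETTO_BUFFER_SIZE_KB": "5120000"}

# Variable not set by the user: the embedded default applies.
print(apply_embedded_defaults({}, embedded))

# Variable exported by the user: the user's value wins.
print(apply_embedded_defaults(
    {"OMNITRACE_PERFETTO_BUFFER_SIZE_KB": "200000"}, embedded))
```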

Runtime Instrumentation

Runtime instrumentation will not only instrument the text section of the executable but also the text sections of the linked libraries. Thus, it may be useful to exclude those libraries via the -ME (module exclude) regex option or exclude specific functions with the -E regex option.

omnitrace-instrument -- /path/to/app
omnitrace-instrument -ME '^(libhsa-runtime64|libz\.so)' -- /path/to/app
omnitrace-instrument -E 'rocr::atomic|rocr::core|rocr::HSA' -- /path/to/app
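
Before instrumenting, it can help to preview what a pattern selects. The sketch below uses Python's re module and rests on two assumptions that this README does not confirm: that these simple patterns behave the same in Omnitrace's regex engine, and that a pattern may match anywhere in the name unless it is anchored. The pattern is written with a single backslash so that \. matches a literal dot. The module and function names are illustrative.

```python
import re


def excluded(names, pattern):
    """Return the names that a -ME / -E style pattern would select."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


modules = ["libhsa-runtime64.so.1", "libz.so.1", "libamdhip64.so.4", "app"]
print(excluded(modules, r"^(libhsa-runtime64|libz\.so)"))
# ['libhsa-runtime64.so.1', 'libz.so.1']

functions = ["rocr::core::Runtime::Acquire", "hipMalloc", "main"]
print(excluded(functions, r"rocr::atomic|rocr::core|rocr::HSA"))
# ['rocr::core::Runtime::Acquire']
```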

Python Profiling and Tracing

Use the omnitrace-python script to profile/trace Python interpreter function calls. Use a double-hyphen (--) to separate the command-line arguments for omnitrace-python from the target script and its arguments.

omnitrace-python --help
omnitrace-python <omnitrace-options> -- <python-script> <script-args>
omnitrace-python -- ./script.py

Please note, the first argument after the double-hyphen must be a Python script, e.g. omnitrace-python -- ./script.py.

If you need a specific Python interpreter version, use omnitrace-python-X.Y where X.Y is the Python major and minor version:

omnitrace-python-3.8 -- ./script.py

If you need to specify the full path to a Python interpreter, set the PYTHON_EXECUTABLE environment variable:

PYTHON_EXECUTABLE=/opt/conda/bin/python omnitrace-python -- ./script.py

If you want to restrict the data collection to specific function(s) and their callees, pass the -b / --builtin option after decorating the function(s) with @profile. Use the @noprofile decorator to exclude function(s) and their callees:

def foo():
    pass

@noprofile
def bar():
    foo()

@profile
def spam():
    foo()
    bar()

Each time spam is called during profiling, the profiling results will include 1 entry for spam and 1 entry for foo via the direct call within spam. There will be no entries for bar or the foo invocation within it.

Trace Visualization

  • Visit ui.perfetto.dev in a web browser
  • Select "Open trace file" from the panel on the left
  • Locate the omnitrace perfetto output (extension: .proto)

[Screenshots: omnitrace-perfetto, omnitrace-rocm, omnitrace-rocm-flow, omnitrace-user-api]

Using Perfetto tracing with System Backend

Perfetto tracing with the system backend supports multiple processes writing to the same output file. Thus, it is a useful technique if Omnitrace is built with partial MPI support because all the perfetto output will be coalesced into a single file. The installation docs for perfetto can be found here. If you are building omnitrace from source, you can configure CMake with OMNITRACE_INSTALL_PERFETTO_TOOLS=ON and the perfetto and traced applications will be installed as part of the build process. To prevent this option from accidentally overwriting an existing perfetto install, all the perfetto executables installed by omnitrace are prefixed with omnitrace-perfetto-, except for the perfetto executable itself, which is renamed to omnitrace-perfetto.

Enable traced and perfetto in the background:

pkill traced
traced --background
perfetto --out ./omnitrace-perfetto.proto --txt -c ${OMNITRACE_ROOT}/share/perfetto.cfg --background

NOTE: if the perfetto tools were installed by omnitrace, replace traced with omnitrace-perfetto-traced and perfetto with omnitrace-perfetto.
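
The renaming rule for perfetto tools installed by omnitrace can be written down as a small Python helper. This is only a restatement of the rule described in this section; installed_tool_name is a hypothetical name.

```python
def installed_tool_name(tool, installed_by_omnitrace=True):
    """Return the executable name for a perfetto tool such as 'traced'."""
    if not installed_by_omnitrace:
        return tool
    if tool == "perfetto":
        return "omnitrace-perfetto"  # renamed, not prefixed
    return "omnitrace-perfetto-" + tool


print(installed_tool_name("traced"))    # omnitrace-perfetto-traced
print(installed_tool_name("perfetto"))  # omnitrace-perfetto
```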

Configure omnitrace to use the perfetto system backend via the --perfetto-backend option of omnitrace-run:

# enable sampling on the uninstrumented binary
omnitrace-run --sample --trace --perfetto-backend=system -- ./myapp
# instrument the binary, then trace it
omnitrace-instrument -o ./myapp.inst -- ./myapp
omnitrace-run --trace --perfetto-backend=system -- ./myapp.inst

or via the --env option of omnitrace-instrument + runtime instrumentation:

omnitrace-instrument --env OMNITRACE_PERFETTO_BACKEND=system -- ./myapp