Jonathan R. Madsen 45be03906a RCCL support (#93)
* Initial support for RCCL

* OMNITRACE_USE_RCCLP + sampling tweaks

- also OMNITRACE_SAMPLING_KEEP_INTERNAL option
- minor modifications to sampling to use keep internal option + discard funlockfile

* Update docker and workflows to download RCCL

* Update CPack DEB with rocprofiler dependency

* Rework rccl into library and library/components folder

- add tpls/rccl/rccl/rccl.h

* Fix timemory includes

* rcclp inline definitions when disabled

* Tweaks to ubuntu-focal-external-rocm

- disable ompt
- enable building testing

* Tweaks to ubuntu-focal-external-rocm

- ctest exclude

* Tweak ubuntu-focal.yml

- remove source /.../setup-env.sh, replace with $GITHUB_ENV

* Fix ubuntu-focal-rocm + OMPI + root

* Improved rocm-smi error handling

- Recover from rocm-smi errors
- Disabling rocm-smi after recovering from errors
- Werror in developer mode
- Remove State::DelayedInit
- Add State::Disabled

* formatting

* Fix merge of OMNITRACE_SAMPLING_KEEP_INTERNAL

* Update RCCL include directory

- based on ROCm version we need with <rccl/rccl.h> or <rccl.h>

* RCCL Testing

- updated tests to use configuration files
- many tests generate a configuration file
- tests how have GPU option
- enable ncclCommCount, disable ncclGetVersion
- add testing for RCCLP via rccl-tests
- working directory of tests is PROJECT_BINARY_DIR
- add nccl/rccl functions to get_whole_function_names
- some clang compiler fixes

* Handle RCCL include w/o HIP

* RCCL requires HIP

* Update OMNITRACE_SAMPLING_CPUS for testing

* Update tests/CMakeLists.txt

* Debug settings

* Install MPI even when USE_MPI=OFF

* exclude printf

* skip mpi tests w/o USE_MPI or USE_MPI_HEADERS

* update ubuntu rocm workflow

* Fix configure env step for ubuntu rocm
2022-07-25 12:16:11 -05:00
2022-07-25 12:16:11 -05:00
2022-07-25 12:16:11 -05:00
2022-07-25 12:16:11 -05:00
2022-07-25 12:16:11 -05:00
2022-07-25 05:03:51 -05:00
2022-07-25 12:16:11 -05:00
2022-07-25 12:16:11 -05:00
2022-07-25 12:16:11 -05:00
2021-11-24 04:59:59 -06:00
2022-04-21 21:36:07 -05:00
2022-07-25 12:16:11 -05:00
2022-07-23 03:02:31 -05:00
2022-07-23 03:02:31 -05:00

Omnitrace: Application Profiling, Tracing, and Analysis

Ubuntu 18.04 with GCC and MPICH Ubuntu 20.04 with GCC, ROCm, and MPI OpenSUSE 15.x with GCC

Omnitrace is an AMD open source research project and is not supported as part of the ROCm software stack.

Overview

AMD Research is seeking to improve observability and performance analysis for software running on AMD heterogeneous systems. If you are familiar with rocprof and/or uProf, you will find many of the capabilities of these tools available via Omnitrace in addition to many new capabilities.

Omnitrace is a comprehensive profiling and tracing tool for parallel applications written in C, C++, Fortran, HIP, OpenCL, and Python which execute on the CPU or CPU+GPU. It is capable of gathering the performance information of functions through any combination of binary instrumentation, call-stack sampling, user-defined regions, and Python interpreter hooks. Omnitrace supports interactive visualization of comprehensive traces in the web browser in addition to high-level summary profiles with mean/min/max/stddev statistics. In addition to runtimes, omnitrace supports the collection of system-level metrics such as the CPU frequency, GPU temperature, and GPU utilization, process-level metrics such as the memory usage, page-faults, and context-switches, and thread-level metrics such as memory usage, CPU time, and numerous hardware counters.

Data Collection Modes

  • Dynamic instrumentation
    • Runtime instrumentation
      • Instrument executable and shared libraries at runtime
    • Binary rewriting
      • Generate a new executable and/or library with instrumentation built-in
  • Statistical sampling
    • Periodic software interrupts per-thread
  • Process-level sampling
    • Background thread records process-, system- and device-level metrics while the application executes
  • Critical trace generation

Data Analysis

  • High-level summary profiles with mean/min/max/stddev statistics
    • Low overhead, memory efficient
    • Ideal for running at scale
  • Comprehensive traces
    • Every individual event/measurement
  • Critical trace analysis (alpha)

Parallelism API Support

  • HIP
  • HSA
  • Pthreads
  • MPI
  • Kokkos-Tools (KokkosP)
  • OpenMP-Tools (OMPT)

GPU Metrics

  • GPU hardware counters
  • HIP API tracing
  • HIP kernel tracing
  • HSA API tracing
  • HSA operation tracing
  • System-level sampling (via rocm-smi)
    • Memory usage
    • Power usage
    • Temperature
    • Utilization

CPU Metrics

  • CPU hardware counters sampling and profiles
  • CPU frequency sampling
  • Various timing metrics
    • Wall time
    • CPU time (process and/or thread)
    • CPU utilization (process and/or thread)
    • User CPU time
    • Kernel CPU time
  • Various memory metrics
    • High-water mark (sampling and profiles)
    • Memory page allocation
    • Virtual memory usage
  • Network statistics
  • I/O metrics
  • ... many more

Documentation

The full documentation for omnitrace is available at amdresearch.github.io/omnitrace.

Quick Start

Omnitrace Settings

Generate an omnitrace configuration file using omnitrace-avail -G omnitrace.cfg. Optionally, use omnitrace-avail -G omnitrace.cfg --all for a verbose configuration file with descriptions, categories, etc. Modify the configuration file as desired, e.g. enable perfetto, timemory, sampling, and process-level sampling by default and tweak some sampling default values:

# ...
OMNITRACE_USE_PERFETTO         = true
OMNITRACE_USE_TIMEMORY         = true
OMNITRACE_USE_SAMPLING         = true
OMNITRACE_USE_PROCESS_SAMPLING = true
# ...
OMNITRACE_SAMPLING_FREQ        = 50
OMNITRACE_SAMPLING_CPUS        = all
OMNITRACE_SAMPLING_GPUS        = $env:HIP_VISIBLE_DEVICES

Once the configuration file is adjusted to your preferences, either export the path to this file via OMNITRACE_CONFIG_FILE=/path/to/omnitrace.cfg or place this file in ${HOME}/.omnitrace.cfg to ensure these values are always read as the default. If you wish to change any of these settings, you can override them via environment variables or by specifying an alternative OMNITRACE_CONFIG_FILE.

Omnitrace Executable

The omnitrace executable is used to instrument an existing binary.

omnitrace --help
omnitrace <omnitrace-options> -- <exe-or-library> <exe-options>

Binary Rewrite

Rewrite the text section of an executable or library with instrumentation:

omnitrace -o app.inst -- /path/to/app

In binary rewrite mode, if you also want instrumentation in the linked libraries, you must also rewrite those libraries. Example of rewriting the functions starting with "hip" with instrumentation in the amdhip64 library:

mkdir -p ./lib
omnitrace -R '^hip' -o ./lib/libamdhip64.so.4 -- /opt/rocm/lib/libamdhip64.so.4
export LD_LIBRARY_PATH=${PWD}/lib:${LD_LIBRARY_PATH}

Verify via ldd that your executable will load the instrumented library -- if you built your executable with an RPATH to the original library's directory, then prefixing LD_LIBRARY_PATH will have no effect.

Once you have rewritten your executable and/or libraries with instrumentation, you can just run the (instrumented) executable or exectuable which loads the instrumented libraries normally, e.g.:

./app.inst

If you want to re-define certain settings to new default in a binary rewrite, use the --env option. This omnitrace option will set the environment variable to the given value but will not override it. E.g. the default value of OMNITRACE_PERFETTO_BUFFER_SIZE_KB is 1024000 KB (1 GiB):

# buffer size defaults to 1024000
omnitrace -o app.inst -- /path/to/app
./app.inst

Passing --env OMNITRACE_PERFETTO_BUFFER_SIZE_KB=5120000 will change the default value in app.inst to 5120000 KiB (5 GiB):

# defaults to 5 GiB buffer size
omnitrace -o app.inst --env OMNITRACE_PERFETTO_BUFFER_SIZE_KB=5120000 -- /path/to/app
./app.inst
# override default 5 GiB buffer size to 200 MB
export OMNITRACE_PERFETTO_BUFFER_SIZE_KB=200000
./app.inst

Runtime Instrumentation

Runtime instrumentation will not only instrument the text section of the executable but also the text sections of the linked libraries. Thus, it may be useful to exclude those libraries via the -ME (module exclude) regex option or exclude specific functions with the -E regex option.

omnitrace -- /path/to/app
omnitrace -ME '^(libhsa-runtime64|libz\\.so)' -- /path/to/app
omnitrace -E 'rocr::atomic|rocr::core|rocr::HSA' --  /path/to/app

Visualizing Perfetto Results

Visit ui.perfetto.dev in your browser and open up the .proto file(s) created by omnitrace.

omnitrace-perfetto

omnitrace-rocm

omnitrace-rocm-flow

omnitrace-user-api

Using Perfetto tracing with System Backend

Perfetto tracing with the system backend supports multiple processes writing to the same output file. Thus, it is a useful technique if Omnitrace is built with partial MPI support because all the perfetto output will be coalesced into a single file. The installation docs for perfetto can be found here. If you are building omnitrace from source, you can configure CMake with OMNITRACE_INSTALL_PERFETTO_TOOLS=ON and the perfetto and traced applications will be installed as part of the build process. However, it should be noted that to prevent this option from accidentally overwriting an existing perfetto install, all the perfetto executables installed by omnitrace are prefixed with omnitrace-perfetto-, except for the perfetto executable, which is just renamed omnitrace-perfetto.

Enable traced and perfetto in the background:

pkill traced
traced --background
perfetto --out ./omnitrace-perfetto.proto --txt -c ${OMNITRACE_ROOT}/share/omnitrace.cfg --background

NOTE: if the perfetto tools were installed by omnitrace, replace traced with omnitrace-perfetto-traced and perfetto with omnitrace-perfetto.

Configure omnitrace to use the perfetto system backend:

export OMNITRACE_PERFETTO_BACKEND=system

And finally, execute your instrumented application. Either the binary rewritten application:

omnitrace -o ./myapp.inst -- ./myapp
./myapp.inst

Or with runtime instrumentation:

omnitrace -- ./myapp
S
توضیحات
No description provided
Readme 282 MiB
Languages
C++ 67.5%
C 20.6%
Python 6.6%
CMake 3.4%
Shell 0.6%
دیگر 1.1%