diff --git a/projects/rocprofiler-systems/README.md b/projects/rocprofiler-systems/README.md
index 7f8c7cd4bb..e2826e2bb3 100755
--- a/projects/rocprofiler-systems/README.md
+++ b/projects/rocprofiler-systems/README.md
@@ -1,81 +1,32 @@
-# omnitrace: application tracing with static/dynamic binary instrumentation
+# Omnitrace: Application Profiling, Tracing, and Analysis
 
 [![Ubuntu 18.04 (GCC 7, 8, MPICH)](https://github.com/AMDResearch/omnitrace/actions/workflows/ubuntu-bionic.yml/badge.svg)](https://github.com/AMDResearch/omnitrace/actions/workflows/ubuntu-bionic.yml)
 [![Ubuntu 20.04 (GCC 7, 8, 9, 10)](https://github.com/AMDResearch/omnitrace/actions/workflows/ubuntu-focal-external.yml/badge.svg)](https://github.com/AMDResearch/omnitrace/actions/workflows/ubuntu-focal-external.yml)
 [![Ubuntu 20.04 (GCC 9, MPICH, OpenMPI)](https://github.com/AMDResearch/omnitrace/actions/workflows/ubuntu-focal.yml/badge.svg)](https://github.com/AMDResearch/omnitrace/actions/workflows/ubuntu-focal.yml)
 [![Ubuntu 20.04 (GCC 9, MPICH, OpenMPI, ROCm 4.3, 4.5, 5.0)](https://github.com/AMDResearch/omnitrace/actions/workflows/ubuntu-focal-external-rocm.yml/badge.svg)](https://github.com/AMDResearch/omnitrace/actions/workflows/ubuntu-focal-external-rocm.yml)
 
-> ***[Omnitrace](https://github.com/AMDResearch/omnitrace) is an AMD research project and should***
-> ***not be treated as an offical part of the ROCm software stack.***
+> ***[Omnitrace](https://github.com/AMDResearch/omnitrace) is an AMD open source research project and is not supported as part of the ROCm software stack.***
 
-The documentation for omnitrace is available at [amdresearch.github.io/omnitrace](https://amdresearch.github.io/omnitrace/).
+## Documentation
 
-## Using Omnitrace Executable
+The full documentation for [omnitrace](https://github.com/AMDResearch/omnitrace) is available at [amdresearch.github.io/omnitrace](https://amdresearch.github.io/omnitrace/).
+
+## Quick Start
+
+### Omnitrace Settings
+
+`omnitrace-avail -Sd` will provide a list of all the possible omnitrace settings, their current value, and a description of the setting
+when running an instrumented binary.
+
+### Omnitrace Executable
+
+The `omnitrace` executable is used to instrument an existing binary.
 
 ```shell
 omnitrace --help
 omnitrace --
 ```
 
-## Omnitrace Settings
-
-`omnitrace-avail -Sd` will provide a list of all the possible omnitrace settings, their current value, and a description of the setting.
-
-> ***Some settings may only affect the timemory backend.***
-
-These settings can be set via environment variables or placed in a config file and specified via `OMNITRACE_CONFIG_FILE=/path/to/config/file`. The config file
-can be a text, JSON, or XML file. Some of the most relevant settings are provided below:
-
-| Environment Variable | Default Value | Description |
-|--------------------------------------------|--------------------------|------------------------------------------------------------------------------------------------------------------|
-| `OMNITRACE_USE_PERFETTO` | `false` | Enable perfetto backend |
-| `OMNITRACE_USE_PID` | `true` | Enable tagging filenames with process identifier (either MPI rank or pid) |
-| `OMNITRACE_USE_ROCTRACER` | `true` | Enable ROCM tracing |
-| `OMNITRACE_USE_SAMPLING` | `true` | Enable statistical sampling of call-stack |
-| `OMNITRACE_USE_TIMEMORY` | `false` | Enable timemory backend |
-| `OMNITRACE_BACKEND` | `inprocess` | Specify the perfetto backend to activate. Options are: 'inprocess', 'system', or 'all' |
-| `OMNITRACE_BUFFER_SIZE_KB` | `1024000` | Size of perfetto buffer (in KB) |
-| `OMNITRACE_COUT_OUTPUT` | `false` | Write output to stdout |
-| `OMNITRACE_CRITICAL_TRACE` | `false` | Enable generation of the critical trace |
-| `OMNITRACE_CRITICAL_TRACE_BUFFER_COUNT` | `2000` | Number of critical trace records to store in thread-local memory before submitting to shared buffer |
-| `OMNITRACE_CRITICAL_TRACE_COUNT` | `0` | Number of critical trace to export (0 == all) |
-| `OMNITRACE_CRITICAL_TRACE_DEBUG` | `false` | Enable debugging for critical trace |
-| `OMNITRACE_CRITICAL_TRACE_NUM_THREADS` | `8` | Number of threads to use when generating the critical trace |
-| `OMNITRACE_CRITICAL_TRACE_PER_ROW` | `0` | How many critical traces per row in perfetto (0 == all in one row) |
-| `OMNITRACE_CRITICAL_TRACE_SERIALIZE_NAMES` | `false` | Include names in serialization of critical trace (mainly for debugging) |
-| `OMNITRACE_DIFF_OUTPUT` | `false` | Generate a difference output vs. a pre-existing output (see also: TIMEMORY_INPUT_PATH and TIMEMORY_INPUT_PREFIX) |
-| `OMNITRACE_FLAT_SAMPLING` | `false` | Ignore hierarchy in all statistical sampling entries |
-| `OMNITRACE_INSTRUMENTATION_INTERVAL` | `1` | Instrumentation only takes measurements once every N function calls (not statistical) |
-| `OMNITRACE_JSON_OUTPUT` | `true` | Write json output files |
-| `OMNITRACE_MEMORY_PRECISION` | `-1` | Set the precision for components with 'is_memory_category' type-trait |
-| `OMNITRACE_MEMORY_SCIENTIFIC` | `false` | Set the numerical reporting format for components with 'is_memory_category' type-trait |
-| `OMNITRACE_MEMORY_UNITS` | `""` | Set the units for components with 'uses_memory_units' type-trait |
-| `OMNITRACE_OUTPUT_FILE` | `""` | Perfetto filename |
-| `OMNITRACE_OUTPUT_PATH` | `omnitrace-{EXE}-output` | Explicitly specify the output folder for results |
-| `OMNITRACE_OUTPUT_PREFIX` | `""` | Explicitly specify a prefix for all output files |
-| `OMNITRACE_PRECISION` | `-1` | Set the global output precision for components |
-| `OMNITRACE_ROCTRACER_FLAT_PROFILE` | `false` | Ignore hierarchy in all kernels entries with timemory backend |
-| `OMNITRACE_ROCTRACER_HSA_ACTIVITY` | `false` | Enable HSA activity tracing support |
-| `OMNITRACE_ROCTRACER_HSA_API` | `false` | Enable HSA API tracing support |
-| `OMNITRACE_ROCTRACER_HSA_API_TYPES` | `""` | HSA API type to collect |
-| `OMNITRACE_ROCTRACER_TIMELINE_PROFILE` | `false` | Create unique entries for every kernel with timemory backend |
-| `OMNITRACE_SAMPLING_DELAY` | `1e-06` | Number of seconds to delay activating the statistical sampling |
-| `OMNITRACE_SAMPLING_FREQ` | `10` | Number of software interrupts per second when OMNITTRACE_USE_SAMPLING=ON |
-| `OMNITRACE_SCIENTIFIC` | `false` | Set the global numerical reporting to scientific format |
-| `OMNITRACE_SETTINGS_DESC` | `false` | Provide descriptions when printing settings |
-| `OMNITRACE_SHMEM_SIZE_HINT_KB` | `40960` | Hint for shared-memory buffer size in perfetto (in KB) |
-| `OMNITRACE_TEXT_OUTPUT` | `true` | Write text output files |
-| `OMNITRACE_TIMELINE_SAMPLING` | `false` | Create unique entries for every sample when statistical sampling is enabled |
-| `OMNITRACE_TIMEMORY_COMPONENTS` | `wall_clock` | List of components to collect via timemory (see omnitrace-avail) |
-| `OMNITRACE_TIME_FORMAT` | `%F_%I.%M_%p` | Customize the folder generation when TIMEMORY_TIME_OUTPUT is enabled (see also: strftime) |
-| `OMNITRACE_TIME_OUTPUT` | `true` | Output data to subfolder w/ a timestamp (see also: TIMEMORY_TIME_FORMAT) |
-| `OMNITRACE_TIMING_PRECISION` | `6` | Set the precision for components with 'is_timing_category' type-trait |
-| `OMNITRACE_TIMING_SCIENTIFIC` | `false` | Set the numerical reporting format for components with 'is_timing_category' type-trait |
-| `OMNITRACE_TIMING_UNITS` | `""` | Set the units for components with 'uses_timing_units' type-trait |
-| `OMNITRACE_TREE_OUTPUT` | `true` | Write hierarchical json output files |
-
-### Example Omnitrace Instrumentation
-
 #### Binary Rewrite
 
 Rewrite the text section of an executable or library with instrumentation:
@@ -130,7 +81,8 @@ export OMNITRACE_BUFFER_SIZE_KB=200000
 #### Runtime Instrumentation
 
 Runtime instrumentation will not only instrument the text section of the executable but also the text sections of the
-linked libraries. Thus, it may be useful to exclude those libraries via the `-ME` (module exclude) regex option.
+linked libraries. Thus, it may be useful to exclude those libraries via the `-ME` (module exclude) regex option
+or exclude specific functions with the `-E` regex option.
 
 ```shell
 omnitrace -- /path/to/app
@@ -138,37 +90,17 @@ omnitrace -ME '^(libhsa-runtime64|libz\\.so)' -- /path/to/app
 omnitrace -E 'rocr::atomic|rocr::core|rocr::HSA' -- /path/to/app
 ```
 
-## Miscellaneous Features and Caveats
+### Visualizing Perfetto Results
 
-- You may need to increase the default perfetto buffer size (1 GiB) to capture all the information
-  - E.g. `export OMNITRACE_BUFFER_SIZE_KB=10240000` increases the buffer size to 10 GiB
-- The omnitrace library has various setting which can be configured via environment variables, you can
-  configure these settings to custom defaults with the omnitrace command-line tool via the `--env` option
-  - E.g. to default to a buffer size of 5 GB, use `--env OMNITRACE_BUFFER_SIZE_KB=5120000`
-  - This is particularly useful in binary rewrite mode
-- Perfetto tooling is enabled by default
-- Timemory tooling is disabled by default
-- Enabling/disabling one of the aformentioned tools but not specifying enabling/disable the other will assume the inverse of the other's enabled state, e.g.
-  - `OMNITRACE_USE_PERFETTO=OFF` yields the same result `OMNITRACE_USE_TIMEMORY=ON`
-  - `OMNITRACE_USE_PERFETTO=ON` yields the same result as `OMNITRACE_USE_TIMEMORY=OFF`
-  - In order to enable _both_ timemory and perfetto, set both `OMNITRACE_USE_TIMEMORY=ON` and `OMNITRACE_USE_PERFETTO=ON`
-  - Setting `OMNITRACE_USE_TIMEMORY=OFF` and `OMNITRACE_USE_PERFETTO=OFF` will disable all instrumentation but call-stack sampling (`OMNITRACE_USE_SAMPLING=ON`) is still available.
-- Use `omnitrace-avail -S` to view the various settings for timemory
-- Set `OMNITRACE_COMPONENTS=""` to control which components timemory collects
-  - The list of components and their descriptions can be viewed via `omnitrace-avail -Cd`
-  - The list of components and their string identifiers can be view via `omnitrace-avail -Cbs`
-- You can filter any `omnitrace-avail` results via `-r -hl`
+Visit [ui.perfetto.dev](https://ui.perfetto.dev) in your browser and open up the `.proto` file(s) created by omnitrace.
 
-## Omnitrace Output
+![omnitrace-perfetto](source/docs/images/omnitrace-perfetto.png)
 
-`omnitrace` will create an output directory named `omnitrace--output`, e.g. if your executable
-is named `app.inst`, the output directory will be `omnitrace-app.inst-output`. Depending on whether
-`OMNITRACE_TIME_OUTPUT=ON` (the default when perfetto is enabled), there will be a subdirectory with the date and time,
-e.g. `2021-09-02_01.03_PM`. Within this directory, all perfetto files will be named `perfetto-trace..proto` or
-when `OMNITRACE_USE_MPI=ON`, `perfetto-trace..proto` (assuming omnitrace was built with MPI support).
+![omnitrace-rocm](source/docs/images/omnitrace-rocm.png)
 
-You can explicitly control the output path and naming scheme of the files via the `OMNITRACE_OUTPUT_FILE` environment
-variable. The special character sequences `%pid%` and `%rank%` will be replaced with the PID or MPI rank, respectively.
+![omnitrace-rocm-flow](source/docs/images/omnitrace-rocm-flow.png)
+
+![omnitrace-user-api](source/docs/images/omnitrace-user-api.png)
 
 ## Merging the traces from rocprof and omnitrace
 
@@ -196,7 +128,7 @@ julia -e 'using Pkg; for name in ["JSON", "DataFrames", "Dates", "CSV", "Chain",
 Use the `omnitrace-merge.jl` Julia script to merge rocprof and perfetto traces.
 
 ```shell
-export OMNITRACE_ROCTRACER_ENABLED=OFF
+export OMNITRACE_USE_ROCTRACER=OFF
 rocprof --hip-trace --roctx-trace --stats ./app.inst
 omnitrace-merge.jl results.json omnitrace-app.inst-output/2021-09-02_01.03_PM/*.proto
 ```
@@ -214,7 +146,7 @@ perfetto --out ./htrace.out --txt -c ${OMNITRACE_ROOT}/share/roctrace.cfg
 then in the window running the application, configure the omnitrace instrumentation to use the system backend:
 
 ```shell
-export OMNITRACE_BACKEND_SYSTEM=1
+export OMNITRACE_BACKEND=system
 ```
 
 for the merge use the `htrace.out`:
diff --git a/projects/rocprofiler-systems/source/docs/about.md b/projects/rocprofiler-systems/source/docs/about.md
index d85ef8be4c..771c769286 100644
--- a/projects/rocprofiler-systems/source/docs/about.md
+++ b/projects/rocprofiler-systems/source/docs/about.md
@@ -6,8 +6,7 @@
 :maxdepth: 4
 ```
 
-> ***[Omnitrace](https://github.com/AMDResearch/omnitrace) is an AMD research project and should***
-> ***not be treated as an offical part of the ROCm software stack.***
+> ***[Omnitrace](https://github.com/AMDResearch/omnitrace) is an AMD open source research project and is not supported as part of the ROCm software stack.***
 
 [Browse Omnitrace source code on Github](https://github.com/AMDResearch/omnitrace)
 
diff --git a/projects/rocprofiler-systems/source/docs/images/omnitrace-perfetto.png b/projects/rocprofiler-systems/source/docs/images/omnitrace-perfetto.png
new file mode 100644
index 0000000000..5bd8da7279
Binary files /dev/null and b/projects/rocprofiler-systems/source/docs/images/omnitrace-perfetto.png differ
diff --git a/projects/rocprofiler-systems/source/docs/images/omnitrace-rocm-flow.png b/projects/rocprofiler-systems/source/docs/images/omnitrace-rocm-flow.png
new file mode 100644
index 0000000000..ee188b455a
Binary files /dev/null and b/projects/rocprofiler-systems/source/docs/images/omnitrace-rocm-flow.png differ
diff --git a/projects/rocprofiler-systems/source/docs/images/omnitrace-rocm.png b/projects/rocprofiler-systems/source/docs/images/omnitrace-rocm.png
new file mode 100644
index 0000000000..8f80ae6a8a
Binary files /dev/null and b/projects/rocprofiler-systems/source/docs/images/omnitrace-rocm.png differ
diff --git a/projects/rocprofiler-systems/source/docs/images/omnitrace-user-api.png b/projects/rocprofiler-systems/source/docs/images/omnitrace-user-api.png
new file mode 100644
index 0000000000..e1d748a5fd
Binary files /dev/null and b/projects/rocprofiler-systems/source/docs/images/omnitrace-user-api.png differ
diff --git a/projects/rocprofiler-systems/source/docs/output.md b/projects/rocprofiler-systems/source/docs/output.md
index cf2c21f161..2337d7108e 100644
--- a/projects/rocprofiler-systems/source/docs/output.md
+++ b/projects/rocprofiler-systems/source/docs/output.md
@@ -220,7 +220,15 @@ set `OMNITRACE_OUTPUT_PREFIX="%argt%-"` and let omnitrace cleanly organize the o
 ## Perfetto Output
 
 Use the `OMNITRACE_OUTPUT_FILE` to specify a specific location. If this is an absolute path, then all `OMNITRACE_OUTPUT_PATH`, etc.
-settings will be ignored.
+settings will be ignored. Visit [ui.perfetto.dev](https://ui.perfetto.dev) and open this file.
+
+![omnitrace-perfetto](images/omnitrace-perfetto.png)
+
+![omnitrace-rocm](images/omnitrace-rocm.png)
+
+![omnitrace-rocm-flow](images/omnitrace-rocm-flow.png)
+
+![omnitrace-user-api](images/omnitrace-user-api.png)
 
 ## Timemory Output