# ROCm Systems Profiler: Application profiling, tracing, and analysis
> [!NOTE]
> If you are using a version of ROCm prior to ROCm 6.3.1 and are experiencing problems viewing your trace in the latest version of [Perfetto](http://ui.perfetto.dev), then try using [Perfetto UI v46.0](https://ui.perfetto.dev/v46.0-35b3d9845/#!/).
## Overview
ROCm Systems Profiler (rocprofiler-systems), formerly Omnitrace, is a comprehensive profiling and tracing tool for parallel applications written in C, C++, Fortran, HIP, OpenCL, and Python that execute on the CPU or CPU+GPU.
It gathers performance information for functions through any combination of binary instrumentation, call-stack sampling, user-defined regions, and Python interpreter hooks.
ROCm Systems Profiler supports interactive visualization of comprehensive traces in the web browser in addition to high-level summary profiles with mean/min/max/stddev statistics.
In addition to runtimes, ROCm Systems Profiler supports the collection of system-level metrics such as CPU frequency, GPU temperature, and GPU utilization; process-level metrics
such as memory usage, page faults, and context switches; and thread-level metrics such as memory usage, CPU time, and numerous hardware counters.

> [!NOTE]
> Full documentation is available at [ROCm Systems Profiler documentation](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/index.html) in an organized, easy-to-read, searchable format.

The documentation source files reside in the [`/docs`](/docs) folder of this repository. For information on contributing to the documentation, see
[Contribute to ROCm documentation](https://rocm.docs.amd.com/en/latest/contribute/contributing.html).

### Data collection modes

- Dynamic instrumentation
  - Runtime instrumentation
    - Instrument executable and shared libraries at runtime
  - Binary rewriting
    - Generate a new executable and/or library with instrumentation built-in
- Statistical sampling
  - Periodic software interrupts per-thread
- Process-level sampling
  - Background thread records process-, system- and device-level metrics while the application executes
- Causal profiling
  - Quantifies the potential impact of optimizations in parallel codes

### Data analysis

- High-level summary profiles with mean/min/max/stddev statistics
  - Low overhead, memory efficient
  - Ideal for running at scale
- Comprehensive traces
  - Every individual event/measurement
- Application speedup predictions resulting from potential optimizations in functions and lines of code (causal profiling)

### Parallelism API support
- HIP
- HSA
- Pthreads
- MPI
- Kokkos-Tools (KokkosP)
- OpenMP-Tools (OMPT)
### GPU metrics

- GPU hardware counters
- HIP API tracing
- HIP kernel tracing
- HSA API tracing
- HSA operation tracing
- rocDecode API tracing
- rocJPEG API tracing
- System-level sampling (via AMD-SMI)
  - Memory usage
  - Power usage
  - Temperature
  - Utilization
  - VCN utilization
  - JPEG utilization
  - XGMI interconnect metrics (link width, link speed, read/write data)
  - PCIe metrics (link width, link speed, bandwidth)

> [!NOTE]
> The availability of VCN, JPEG, XGMI, and PCIe metrics depends on device support, system topology, and GPU architecture. If unsupported, all values will be reported as N/A in the output of `amd-smi metric --usage`.

### CPU metrics

- CPU hardware counters sampling and profiles
- CPU frequency sampling
- Various timing metrics
  - Wall time
  - CPU time (process and/or thread)
  - CPU utilization (process and/or thread)
  - User CPU time
  - Kernel CPU time
- Various memory metrics
  - High-water mark (sampling and profiles)
  - Memory page allocation
  - Virtual memory usage
- Network statistics
- I/O metrics
- ... many more

## Quick start

### Installation

See the [ROCm Systems Profiler installation guide](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/install/install.html) for detailed information.

### Setup

> [!NOTE]
> Replace `/opt/rocprofiler-systems` below with your installation prefix as necessary.

- **Option 1**: Source the `setup-env.sh` script

  ```bash
  source /opt/rocprofiler-systems/share/rocprofiler-systems/setup-env.sh
  ```

- **Option 2**: Load the modulefile

  ```bash
  module use /opt/rocprofiler-systems/share/modulefiles
  module load rocprofiler-systems
  ```

- **Option 3**: Manual

  ```bash
  export PATH=/opt/rocprofiler-systems/bin:${PATH}
  export LD_LIBRARY_PATH=/opt/rocprofiler-systems/lib:${LD_LIBRARY_PATH}
  ```
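
The manual option appends the existing value unconditionally, so an unset or empty `LD_LIBRARY_PATH` ends up with a trailing colon, which the dynamic loader treats as the current working directory. The following is a minimal sketch of a helper that avoids the empty entry. `prepend_path` is a hypothetical name used only for illustration (it is not part of rocprofiler-systems), and `/opt/rocprofiler-systems` is assumed as the installation prefix.

```shell
# Prepend a directory to a colon-separated path variable without leaving
# an empty entry when the variable is unset or empty.
# NOTE: prepend_path is a hypothetical helper, not shipped with the tool.
prepend_path() {
    # $1: variable name (e.g. PATH), $2: directory to prepend
    eval "_current=\${$1:-}"
    if [ -n "${_current}" ]; then
        eval "export $1=\"\$2:\${_current}\""
    else
        eval "export $1=\"\$2\""
    fi
    unset _current
}

# Assumed installation prefix; adjust as necessary.
prepend_path PATH /opt/rocprofiler-systems/bin
prepend_path LD_LIBRARY_PATH /opt/rocprofiler-systems/lib
```

Sourcing `setup-env.sh` or loading the modulefile remains the simpler route. A helper like this is only worth having if the manual option is used in scripts.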
### Testing environment
The `build-docker` script can be used to create a testing environment. To see the available options, use the following commands:
``` shell
cd docker
./build-docker.sh --help
```

> [!NOTE]
> The `-m` argument can be used to show supported OS + ROCm combinations.

**Example:** To set up an Ubuntu 24.04 + ROCm 6.4 + Python 3.12 environment for building and testing, run the following commands:

```shell
cd docker
./build-docker.sh --distro ubuntu --versions 24.04 \
    --rocm-versions 6.4 --python-versions 12 --retry 1
docker run -v "$(cd .. && pwd)":/home/development \
    -it -w /home/development \
    --device /dev/kfd --device /dev/dri \
    $(whoami)/rocprofiler-systems:release-base-ubuntu-24.04-rocm-6.4
```
Inside the container, clean, build, and install the project with testing enabled using the following commands:
``` shell
rm -rf rocprof-sys-build
cmake -B rocprof-sys-build -S . \
    -D CMAKE_INSTALL_PREFIX=/opt/rocprofiler-systems \
    -D ROCPROFSYS_USE_PYTHON=ON -D ROCPROFSYS_BUILD_DYNINST=ON \
    -D ROCPROFSYS_BUILD_TBB=ON -D ROCPROFSYS_BUILD_BOOST=ON \
    -D ROCPROFSYS_BUILD_ELFUTILS=ON -D ROCPROFSYS_BUILD_LIBIBERTY=ON \
    -D ROCPROFSYS_BUILD_TESTING=ON
cmake --build rocprof-sys-build --target all --parallel 8
cmake --build rocprof-sys-build --target install
source /opt/rocprofiler-systems/share/rocprofiler-systems/setup-env.sh
```
> [!NOTE]
> If you see "dubious ownership" Git errors when working in the container, run:
>
> ```shell
> git config --global --add safe.directory /home/development
> ```
>
> and
>
> ```shell
> git config --global --add safe.directory /home/development/external/timemory
> ```
Then, use the following command to start automated testing:
``` shell
ctest --test-dir rocprof-sys-build --output-on-failure
```
To enable MPI testing inside the container, set the following environment variables:
``` shell
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1
```
For manual testing, you can find the executables in `rocprof-sys-build/bin`.

### ROCm Systems Profiler settings

Generate a rocprofiler-systems configuration file using `rocprof-sys-avail -G rocprof-sys.cfg`. Optionally, use `rocprof-sys-avail -G rocprof-sys.cfg --all` for
a verbose configuration file with descriptions, categories, etc. Modify the configuration file as desired, e.g. enable
[perfetto](https://perfetto.dev/), [timemory](https://github.com/ROCm/timemory), sampling, and process-level sampling by default
and tweak some sampling default values:

```console
# ...
ROCPROFSYS_TRACE                = true
ROCPROFSYS_PROFILE              = true
ROCPROFSYS_USE_SAMPLING         = true
ROCPROFSYS_USE_PROCESS_SAMPLING = true
# ...
ROCPROFSYS_SAMPLING_FREQ        = 50
ROCPROFSYS_SAMPLING_CPUS        = all
ROCPROFSYS_SAMPLING_GPUS        = $env:HIP_VISIBLE_DEVICES
```

Once the configuration file is adjusted to your preferences, either export the path to this file via `ROCPROFSYS_CONFIG_FILE=/path/to/rocprof-sys.cfg`
or place this file in `${HOME}/.rocprof-sys.cfg` to ensure these values are always read as the default. To change any of these settings for a particular run,
override them via environment variables or specify an alternative `ROCPROFSYS_CONFIG_FILE`.
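
The workflow above can be sketched as follows. This is an illustration under assumptions, not a recommended configuration: it writes a small hand-made file containing only settings shown above (assuming a partial file is accepted), stores it in a scratch directory chosen with `mktemp` (a hypothetical location), and exports `ROCPROFSYS_CONFIG_FILE` to point at it.

```shell
# Write a small configuration file to a scratch directory (hypothetical
# location) and point ROCPROFSYS_CONFIG_FILE at it.
cfg_dir="$(mktemp -d)"
cat > "${cfg_dir}/rocprof-sys.cfg" <<'EOF'
ROCPROFSYS_TRACE                = true
ROCPROFSYS_PROFILE              = true
ROCPROFSYS_USE_SAMPLING         = true
ROCPROFSYS_USE_PROCESS_SAMPLING = true
ROCPROFSYS_SAMPLING_FREQ        = 50
EOF
export ROCPROFSYS_CONFIG_FILE="${cfg_dir}/rocprof-sys.cfg"
echo "using configuration file: ${ROCPROFSYS_CONFIG_FILE}"

# A per-run override leaves the file untouched, for example:
#   ROCPROFSYS_SAMPLING_FREQ=100 rocprof-sys-sample -- <exe> <exe-options>
```

Generating the file with `rocprof-sys-avail -G` is the safer starting point, since it emits the settings and defaults of the installed version.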
### Call-stack sampling

The `rocprof-sys-sample` executable is used to execute call-stack sampling on a target application without binary instrumentation.
Use a double-hyphen (`--`) to separate the command-line arguments for `rocprof-sys-sample` from the target application and its arguments.

```shell
rocprof-sys-sample --help
rocprof-sys-sample <rocprof-sys-options> -- <exe> <exe-options>
rocprof-sys-sample -f 1000 -- ls -la
```

### Binary instrumentation

The `rocprof-sys-instrument` executable is used to instrument an existing binary. Call-stack sampling can be enabled alongside
the execution of an instrumented binary, to help "fill in the gaps" between the instrumentation, by setting the `ROCPROFSYS_USE_SAMPLING`
configuration variable to `ON`.
Similar to `rocprof-sys-sample`, use a double-hyphen (`--`) to separate the command-line arguments for `rocprof-sys-instrument` from the target application and its arguments.

```shell
rocprof-sys-instrument --help
rocprof-sys-instrument <rocprof-sys-options> -- <exe-or-library> <exe-options>
```

#### Binary rewrite

Rewrite the text section of an executable or library with instrumentation:

```shell
rocprof-sys-instrument -o app.inst -- /path/to/app
```

In binary rewrite mode, if you also want instrumentation in the linked libraries, you must rewrite those libraries as well.
For example, to rewrite the functions starting with `"hip"` with instrumentation in the amdhip64 library:

```shell
mkdir -p ./lib
rocprof-sys-instrument -R '^hip' -o ./lib/libamdhip64.so.4 -- /opt/rocm/lib/libamdhip64.so.4
export LD_LIBRARY_PATH=${PWD}/lib:${LD_LIBRARY_PATH}
```

> [!NOTE]
> Verify via `ldd` that your executable will load the instrumented library. If you built your executable with an RPATH to the original library's directory, then prefixing `LD_LIBRARY_PATH` will have no effect.

Once you have rewritten your executable and/or libraries with instrumentation, you can run the (instrumented) executable,
or the executable which loads the instrumented libraries, as usual, e.g.:

```shell
rocprof-sys-run -- ./app.inst
```
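
The `ldd` check recommended in the note above can be scripted. The sketch below assumes the usual `name => path (address)` layout of `ldd` output on Linux. `resolved_library` is a hypothetical helper name, and it reads the `ldd` output from standard input so it can be tried on saved output as well.

```shell
# Print the path the dynamic loader resolves for a library, given `ldd`
# output on stdin. Assumes lines of the form: "name => /path (0x...)".
# NOTE: resolved_library is a hypothetical helper for illustration.
resolved_library() {
    # $1: library name prefix to look for, e.g. libamdhip64
    awk -v lib="$1" 'index($1, lib) == 1 && $2 == "=>" {
        print ($3 == "not" ? "not found" : $3)
    }'
}

# Usage, with the rewritten executable from the example above:
#   ldd ./app.inst | resolved_library libamdhip64
```

If the printed path is not the copy under `./lib`, the executable is still resolving the original library, for example because of an RPATH.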
If you want to redefine certain settings to a new default in a binary rewrite, use the `--env` option. This `rocprof-sys-instrument` option
sets the environment variable to the given value but does not override a value that is already set in the environment. For example, the default value of `ROCPROFSYS_PERFETTO_BUFFER_SIZE_KB`
is 1024000 KB (1 GiB):

```shell
# buffer size defaults to 1024000
rocprof-sys-instrument -o app.inst -- /path/to/app
rocprof-sys-run -- ./app.inst
```

Passing `--env ROCPROFSYS_PERFETTO_BUFFER_SIZE_KB=5120000` will change the default value in `app.inst` to 5120000 KB (5 GiB):

```shell
# defaults to 5 GiB buffer size
rocprof-sys-instrument -o app.inst --env ROCPROFSYS_PERFETTO_BUFFER_SIZE_KB=5120000 -- /path/to/app
rocprof-sys-run -- ./app.inst
```

```shell
# override default 5 GiB buffer size to 200 MB via command-line
rocprof-sys-run --trace-buffer-size=200000 -- ./app.inst
# override default 5 GiB buffer size to 200 MB via environment
export ROCPROFSYS_PERFETTO_BUFFER_SIZE_KB=200000
rocprof-sys-run -- ./app.inst
```

#### Runtime instrumentation

Runtime instrumentation will not only instrument the text section of the executable but also the text sections of the
linked libraries. Thus, it may be useful to exclude those libraries via the `-ME` (module exclude) regex option
or exclude specific functions with the `-E` regex option.

```shell
rocprof-sys-instrument -- /path/to/app
rocprof-sys-instrument -ME '^(libhsa-runtime64|libz\.so)' -- /path/to/app
rocprof-sys-instrument -E 'rocr::atomic|rocr::core|rocr::HSA' -- /path/to/app
```
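
Before passing a pattern to `-ME`, it can be useful to preview what it matches. The sketch below uses `grep -E` as a stand-in. This is an assumption: the regex dialect used by `rocprof-sys-instrument`, and whether it matches against the file name or the full path, may differ. `preview_regex` and the sample library names are hypothetical.

```shell
# Preview which names an exclusion pattern would match, using grep -E
# as an approximation of the tool's regex matching (assumption).
# NOTE: preview_regex is a hypothetical helper for illustration.
preview_regex() {
    # $1: extended regular expression; names are read from stdin, one per line
    grep -E -- "$1" || true
}

# Sample names (hypothetical); prints the two that the pattern matches.
printf '%s\n' libhsa-runtime64.so.1 libz.so.1 libamdhip64.so.4 \
    | preview_regex '^(libhsa-runtime64|libz\.so)'
```

Inside single quotes the shell passes the backslash through unchanged, so `\.` reaches the regex engine as an escaped literal dot.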
### Python profiling and tracing

Use the `rocprof-sys-python` script to profile/trace Python interpreter function calls.
Use a double-hyphen (`--`) to separate the command-line arguments for `rocprof-sys-python` from the target script and its arguments.

```shell
rocprof-sys-python --help
rocprof-sys-python <rocprof-sys-options> -- <python-script> <script-args>
rocprof-sys-python -- ./script.py
```

> [!NOTE]
> The first argument after the double-hyphen must be a Python script, e.g. `rocprof-sys-python -- ./script.py`.

If you need a specific Python interpreter version, use `rocprof-sys-python-X.Y` where `X.Y` is the Python
major and minor version:

```shell
rocprof-sys-python-3.8 -- ./script.py
```

If you need to specify the full path to a Python interpreter, set the `PYTHON_EXECUTABLE` environment variable:

```shell
PYTHON_EXECUTABLE=/opt/conda/bin/python rocprof-sys-python -- ./script.py
```

If you want to restrict the data collection to specific function(s) and their callees, pass the `-b` / `--builtin` option after decorating the
function(s) with `@profile`. Use the `@noprofile` decorator to exclude function(s) and their callees:

```python
def foo():
    pass

@noprofile
def bar():
    foo()

@profile
def spam():
    foo()
    bar()
```

Each time `spam` is called during profiling, the profiling results will include 1 entry for `spam` and 1 entry
for `foo` via the direct call within `spam` . There will be no entries for `bar` or the `foo` invocation within it.
### Trace visualization

- Visit [ui.perfetto.dev](https://ui.perfetto.dev) in a web browser
- Select "Open trace file" from the panel on the left
- Locate the rocprofiler-systems perfetto output (extension: `.proto`)
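
To find the trace file for the last step, a small search by extension can help. The sketch below makes no assumption about where rocprofiler-systems writes its output: the directory is passed as an argument, and `newest_proto` is a hypothetical helper name.

```shell
# Print the most recently modified .proto file under a directory.
# NOTE: newest_proto is a hypothetical helper; pass the directory that
# holds your profiling output.
newest_proto() {
    # $1: directory to search
    find "$1" -type f -name '*.proto' -exec ls -t {} + 2>/dev/null | head -n 1
}

# Usage:
#   newest_proto .
```

The printed path is the file to select in the "Open trace file" dialog.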
## Using Perfetto tracing with system backend

Perfetto tracing with the system backend supports multiple processes writing to the same
output file. Thus, it is a useful technique if rocprofiler-systems is built with partial MPI support
because all the Perfetto output will be coalesced into a single file. The
installation docs for Perfetto can be found [here](https://perfetto.dev/docs/contributing/build-instructions).
If you are building rocprofiler-systems from source, you can configure CMake with `ROCPROFSYS_INSTALL_PERFETTO_TOOLS=ON`
and the `perfetto` and `traced` applications will be installed as part of the build process. However,
to prevent this option from accidentally overwriting an existing Perfetto install,
all the Perfetto executables installed by ROCm Systems Profiler are prefixed with `rocprof-sys-perfetto-`, except
for the `perfetto` executable, which is simply renamed `rocprof-sys-perfetto`.

Enable `traced` and `perfetto` in the background:

```shell
pkill traced
traced --background
perfetto --out ./rocprof-sys-perfetto.proto --txt -c ${ROCPROFSYS_ROOT}/share/perfetto.cfg --background
```

> [!NOTE]
> If the perfetto tools were installed by rocprofiler-systems, replace `traced` with `rocprof-sys-perfetto-traced` and `perfetto` with `rocprof-sys-perfetto`.

Configure rocprofiler-systems to use the perfetto system backend via the `--perfetto-backend` option of `rocprof-sys-run`:

2021-09-06 22:23:24 -05:00
``` shell
2023-03-14 19:48:29 -05:00
# enable sampling on the uninstrumented binary
2024-09-27 17:18:21 -04:00
rocprof-sys-run --sample --trace --perfetto-backend= system -- ./myapp
2024-10-17 15:19:19 -04:00
2023-03-14 19:48:29 -05:00
# trace the instrument the binary
2024-09-27 17:18:21 -04:00
rocprof-sys-instrument -o ./myapp.inst -- ./myapp
rocprof-sys-run --trace --perfetto-backend= system -- ./myapp.inst
2022-05-30 18:25:12 -05:00
```
Alternatively, set it via the `--env` option of `rocprof-sys-instrument` with runtime instrumentation:

```shell
rocprof-sys-instrument --env ROCPROFSYS_PERFETTO_BACKEND=system -- ./myapp
```
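
Because the Perfetto executables may be installed under either name (see the note above on the `rocprof-sys-perfetto-` prefix), a script can check which one is present before starting the daemons. This is a minimal sketch. `first_on_path`, `TRACED_BIN`, and `PERFETTO_BIN` are hypothetical names, and preferring the prefixed executables is an assumption made only for illustration.

```shell
# Print the first of the given command names that is found on PATH.
# NOTE: first_on_path is a hypothetical helper for illustration.
first_on_path() {
    for candidate in "$@"; do
        if command -v "${candidate}" >/dev/null 2>&1; then
            printf '%s\n' "${candidate}"
            return 0
        fi
    done
    return 1
}

# Prefer the executables installed by rocprofiler-systems (assumption),
# falling back to a standalone Perfetto install.
TRACED_BIN="$(first_on_path rocprof-sys-perfetto-traced traced)" \
    || echo "no traced executable found on PATH" >&2
PERFETTO_BIN="$(first_on_path rocprof-sys-perfetto perfetto)" \
    || echo "no perfetto executable found on PATH" >&2
```

The resulting names can then stand in for `traced` and `perfetto` in the background commands shown earlier.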