.. meta::
  :description: ROCm Systems Profiler call stack sampling documentation and reference
  :keywords: rocprofiler-systems, rocprofsys, ROCm, profiler, sampling, call stack, tracking, visualization, tool, Instinct, accelerator, AMD

****************************************************
Sampling the call stack
****************************************************

`ROCm Systems Profiler <https://github.com/ROCm/rocm-systems/tree/develop/projects/rocprofiler-systems>`_ can use call-stack sampling
on a binary instrumented with either the ``rocprof-sys`` executable
or the ``rocprof-sys-sample`` executable.
For example, all of the following commands are effectively equivalent:

* Binary rewrite with only the instrumentation necessary to start and stop sampling

  .. code-block:: shell

     rocprof-sys-instrument -M sampling -o foo.inst -- foo
     rocprof-sys-run -- ./foo.inst

* Runtime instrumentation with only the instrumentation necessary to start and stop sampling

  .. code-block:: shell

     rocprof-sys-instrument -M sampling -- foo

* No instrumentation required

  .. code-block:: shell

     rocprof-sys-sample -- foo

.. note::

   Set ``ROCPROFSYS_USE_SAMPLING=ON`` to activate call-stack sampling when executing an instrumented binary.

The only thing ``rocprof-sys-instrument -M sampling`` (subsequently referred to as "instrumented-sampling")
does is wrap the ``main`` of the executable with initialization
before ``main`` starts and finalization after ``main`` ends.
This can be accomplished without instrumentation through an ``LD_PRELOAD``
of a library containing a dynamic symbol wrapper around ``__libc_start_main``.

Using ``rocprof-sys-sample`` is **recommended** over
``rocprof-sys-instrument -M sampling`` when binary instrumentation
is not necessary, for the following reasons:

* ``rocprof-sys-sample`` provides command-line options for controlling the ROCm Systems Profiler feature set instead of
  requiring configuration files or environment variables.
* Although instrumented-sampling only requires inserting snippets
  around one function (``main``), Dyninst
  does not have a feature for specifying that parsing and processing all the
  other symbols in the binary is unnecessary.
  In the best-case scenario, when the target binary is relatively small,
  instrumented-sampling has a slightly slower launch time,
  but in the worst-case scenarios it requires a significant amount of time and memory to launch.
* ``rocprof-sys-sample`` is fully compatible with MPI. For example,
  the command ``mpirun -n 2 rocprof-sys-sample -- foo`` is valid,
  whereas ``mpirun -n 2 rocprof-sys-instrument -M sampling -- foo``
  is incompatible with some MPI distributions (particularly OpenMPI). This is because
  MPI prohibits forking within an MPI rank.
* When MPI and binary instrumentation are both involved, two steps are required:
  performing a binary rewrite of the executable and then using the instrumented executable
  in lieu of the original executable. ``rocprof-sys-sample`` is therefore much easier to use with MPI.
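
The difference between the two MPI workflows described in the list above can be sketched as a small shell script. This is only an illustration built from the commands already shown on this page: ``run`` is a hypothetical helper, ``foo`` is a placeholder application, and launching the rewritten binary through ``mpirun -n 2 rocprof-sys-run`` is an assumption that combines the binary-rewrite example with the MPI launch line. The script defaults to a dry run that only prints the commands.

```shell
# Dry run by default so the sketch is safe to paste; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}

# run: hypothetical helper that prints a command and, unless DRY_RUN is 1,
# executes it.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

# Instrumented-sampling with MPI needs two steps:
# 1. rewrite the binary outside of any MPI rank
run rocprof-sys-instrument -M sampling -o foo.inst -- foo
# 2. launch the rewritten binary in place of the original (assumed launch line)
run mpirun -n 2 rocprof-sys-run -- ./foo.inst

# rocprof-sys-sample needs a single step:
run mpirun -n 2 rocprof-sys-sample -- foo
```

With the default ``DRY_RUN=1``, the script prints the three commands prefixed with ``+`` and runs nothing.
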

The rocprof-sys-sample executable
========================================

View the help menu of ``rocprof-sys-sample`` with the ``-h`` / ``--help`` option:

.. code-block:: shell

$ rocprof-sys-sample --help
[ rocprof-sys-sample] Usage: rocprof-sys-sample [ --help ( count: 0, dtype: bool)
--version ( count: 0, dtype: bool)
--monochrome ( max: 1, dtype: bool)
--debug ( max: 1, dtype: bool)
--verbose ( count: 1)
--config ( min: 0, dtype: filepath)
--output ( min: 1)
--trace ( max: 1, dtype: bool)
--profile ( max: 1, dtype: bool)
--flat-profile ( max: 1, dtype: bool)
--host ( max: 1, dtype: bool)
--device ( max: 1, dtype: bool)
--wait ( count: 1)
--duration ( count: 1)
--trace-file ( count: 1, dtype: filepath)
--trace-buffer-size ( count: 1, dtype: KB)
--trace-fill-policy ( count: 1)
--trace-wait ( count: 1)
--trace-duration ( count: 1)
--trace-periods ( min: 1)
--trace-clock-id ( count: 1)
--profile-format ( min: 1)
--profile-diff ( min: 1)
--process-freq ( count: 1)
--process-wait ( count: 1)
--process-duration ( count: 1)
--cpus ( count: unlimited, dtype: int or range)
--gpus ( count: unlimited, dtype: int or range)
--freq ( count: 1)
--sampling-wait ( count: 1)
--sampling-duration ( count: 1)
--tids ( min: 1)
--cputime ( min: 0)
--realtime ( min: 0)
--include ( count: unlimited)
--exclude ( count: unlimited)
--cpu-events ( count: unlimited)
--gpu-events ( count: unlimited)
--inlines ( max: 1, dtype: bool)
--hsa-interrupt ( count: 1, dtype: int)
]
Options:
-h, -?, --help Shows this page ( count: 0, dtype: bool)
--version Prints the version and exit ( count: 0, dtype: bool)
[ DEBUG OPTIONS]
--monochrome Disable colorized output ( max: 1, dtype: bool)
--debug Debug output ( max: 1, dtype: bool)
-v, --verbose Verbose output ( count: 1)
[ GENERAL OPTIONS] These are options which are ubiquitously applied
-c, --config Configuration file ( min: 0, dtype: filepath)
-o, --output Output path. Accepts 1-2 parameters corresponding to the output path and the output prefix ( min: 1)
-T, --trace Generate a detailed trace ( perfetto output) ( max: 1, dtype: bool)
-P, --profile Generate a call-stack-based profile ( conflicts with --flat-profile) ( max: 1, dtype: bool)
-F, --flat-profile Generate a flat profile ( conflicts with --profile) ( max: 1, dtype: bool)
-H, --host Enable sampling host-based metrics for the process. E.g. CPU frequency, memory usage, etc. ( max: 1, dtype: bool)
-D, --device Enable sampling device-based metrics for the process. E.g. GPU temperature, memory usage, etc. ( max: 1, dtype: bool)
-w, --wait This option is a combination of '--trace-wait' and '--sampling-wait' . See the descriptions for those two options.
( count: 1)
-d, --duration This option is a combination of '--trace-duration' and '--sampling-duration' . See the descriptions for those two
options. ( count: 1)
[ TRACING OPTIONS] Specific options controlling tracing ( i.e. deterministic measurements of every event)
--trace-file Specify the trace output filename. Relative filepath will be with respect to output path and output prefix. ( count: 1,
dtype: filepath)
--trace-buffer-size Size limit for the trace output ( in KB) ( count: 1, dtype: KB)
--trace-fill-policy [ discard | ring_buffer ]
Policy for new data when the buffer size limit is reached:
- discard : new data is ignored
- ring_buffer : new data overwrites oldest data ( count: 1)
--trace-wait Set the wait time ( in seconds) before collecting trace and/or profiling data( in seconds) . By default, the duration is
in seconds of realtime but that can changed via --trace-clock-id. ( count: 1)
--trace-duration Set the duration of the trace and/or profile data collection ( in seconds) . By default, the duration is in seconds of
realtime but that can changed via --trace-clock-id. ( count: 1)
--trace-periods More powerful version of specifying trace delay and/or duration. Format is one or more groups of: <DELAY>:<DURATION>,
<DELAY>:<DURATION>:<REPEAT>, and/or <DELAY>:<DURATION>:<REPEAT>:<CLOCK_ID>. ( min: 1)
--trace-clock-id [ 0 ( realtime| CLOCK_REALTIME)
1 ( monotonic| CLOCK_MONOTONIC)
2 ( cputime| CLOCK_PROCESS_CPUTIME_ID)
4 ( monotonic_raw| CLOCK_MONOTONIC_RAW)
5 ( realtime_coarse| CLOCK_REALTIME_COARSE)
6 ( monotonic_coarse| CLOCK_MONOTONIC_COARSE)
7 ( boottime| CLOCK_BOOTTIME) ]
Set the default clock ID for for trace delay/duration. Note: "cputime" is the *process* CPU time and might need to be
scaled based on the number of threads, i.e. 4 seconds of CPU-time for an application with 4 fully active threads would
equate to ~1 second of realtime. If this proves to be difficult to handle in practice, please file a feature request
for rocprof-sys to auto-scale based on the number of threads. ( count: 1)
[ PROFILE OPTIONS] Specific options controlling profiling ( i.e. deterministic measurements which are aggregated into a summary)
--profile-format [ console | json | text ]
Data formats for profiling results ( min: 1)
--profile-diff Generate a diff output b/t the profile collected and an existing profile from another run Accepts 1-2 parameters
corresponding to the input path and the input prefix ( min: 1)
[ HOST/DEVICE ( PROCESS SAMPLING) OPTIONS]
Process sampling is background measurements for resources available to the entire process. These samples are not tied
to specific lines/regions of code
--process-freq Set the default host/device sampling frequency ( number of interrupts per second) ( count: 1)
--process-wait Set the default wait time ( i.e. delay) before taking first host/device sample ( in seconds of realtime) ( count: 1)
--process-duration Set the duration of the host/device sampling ( in seconds of realtime) ( count: 1)
--cpus CPU IDs for frequency sampling. Supports integers and/or ranges ( count: unlimited, dtype: int or range)
--gpus GPU IDs for SMI queries. Supports integers and/or ranges ( count: unlimited, dtype: int or range)
[ GENERAL SAMPLING OPTIONS] General options for timer-based sampling per-thread
-f, --freq Set the default sampling frequency ( number of interrupts per second) ( count: 1)
--sampling-wait Set the default wait time ( i.e. delay) before taking first sample ( in seconds) . This delay time is based on the clock
of the sampler, i.e., a delay of 1 second for CPU-clock sampler may not equal 1 second of realtime ( count: 1)
--sampling-duration Set the duration of the sampling ( in seconds of realtime) . I.e., it is possible ( currently) to set a CPU-clock time
delay that exceeds the real-time duration... resulting in zero samples being taken ( count: 1)
-t, --tids Specify the default thread IDs for sampling, where 0 ( zero) is the main thread and each thread created by the target
application is assigned an atomically incrementing value. ( min: 1)
[ SAMPLING TIMER OPTIONS] These options determine the heuristic for deciding when to take a sample
--cputime Sample based on a CPU-clock timer ( default) . Accepts zero or more arguments:
0. Enables sampling based on CPU-clock timer.
1. Interrupts per second. E.g., 100 = = sample every 10 milliseconds of CPU-time.
2. Delay ( in seconds of CPU-clock time ) . I.e., how long each thread should wait before taking first sample.
3+ Thread IDs to target for sampling, starting at 0 ( the main thread) .
May be specified as index or range, e.g., '0 2-4' will be interpreted as:
sample the main thread ( 0) , do not sample the first child thread but sample the 2nd, 3rd, and 4th child threads ( min: 0)
--realtime Sample based on a real-clock timer. Accepts zero or more arguments:
0. Enables sampling based on real-clock timer.
1. Interrupts per second. E.g., 100 = = sample every 10 milliseconds of realtime.
2. Delay ( in seconds of real-clock time ) . I.e., how long each thread should wait before taking first sample.
3+ Thread IDs to target for sampling, starting at 0 ( the main thread) .
May be specified as index or range, e.g., '0 2-4' will be interpreted as:
sample the main thread ( 0) , do not sample the first child thread but sample the 2nd, 3rd, and 4th child threads
When sampling with a real-clock timer, please note that enabling this will cause threads which are typically "idle"
to consume more resources since, while idle, the real-clock time increases ( and therefore triggers taking samples)
whereas the CPU-clock time does not. ( min: 0)
[ BACKEND OPTIONS] These options control region information captured w/o sampling or instrumentation
-I, --include [ all | kokkosp | mpip | mutex-locks | ompt | rcclp | amd-smi | rocprofiler-sdk | rw-locks | spin-locks ]
Include data from these backends ( count: unlimited)
-E, --exclude [ all | kokkosp | mpip | mutex-locks | ompt | rcclp | amd-smi | rocprofiler-sdk | rw-locks | spin-locks ]
Exclude data from these backends ( count: unlimited)
[ HARDWARE COUNTER OPTIONS] See also: rocprof-sys-avail -H
-C, --cpu-events Set the CPU hardware counter events to record ( ref: ` rocprof-sys-avail -H -c CPU` ) ( count: unlimited)
-G, --gpu-events Set the GPU hardware counter events to record ( ref: ` rocprof-sys-avail -H -c GPU` ) ( count: unlimited)
[ MISCELLANEOUS OPTIONS]
-i, --inlines Include inline info in output when available ( max: 1, dtype: bool)
--hsa-interrupt [ 0 | 1 ] Set the value of the HSA_ENABLE_INTERRUPT environment variable.
ROCm version 5.2 and older have a bug which will cause a deadlock if a sample is taken while waiting for the signal
that a kernel completed -- which happens when sampling with a real-clock timer. We require this option to be set to
when --realtime is specified to make users aware that, while this may fix the bug, it can have a negative impact on
performance.
Values:
0 avoid triggering the bug, potentially at the cost of reduced performance
1 do not modify how ROCm is notified about kernel completion ( count: 1, dtype: int)

The general syntax for separating ROCm Systems Profiler command-line arguments from the
following application arguments
is consistent with the LLVM style of using a stand-alone double hyphen (``--``).
All arguments preceding the double hyphen
are interpreted as belonging to ROCm Systems Profiler and all arguments following it
are interpreted as the
application and its arguments. The double hyphen is only necessary when passing
command-line arguments to a target
which also uses hyphens. For example, you can run ``rocprof-sys-sample ls``, but
to run ``ls -la``, use ``rocprof-sys-sample -- ls -la``.
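
The splitting rule described above can be sketched in plain shell. The function below is purely illustrative: ``split_at_double_hyphen`` is a hypothetical name, it is not how ``rocprof-sys-sample`` parses its arguments, and it only covers the case where a stand-alone ``--`` is present. It shows which arguments fall on each side of the double hyphen.

```shell
# Illustrative only: split an argument list at the first stand-alone "--".
# split_at_double_hyphen is a hypothetical helper, not part of rocprof-sys.
# Results are stored in the variables tool_args and app_args.
split_at_double_hyphen() {
    tool_args=""
    app_args=""
    seen=0
    for arg in "$@"; do
        if [ "$seen" -eq 0 ] && [ "$arg" = "--" ]; then
            # The first stand-alone "--" marks the boundary and is dropped.
            seen=1
        elif [ "$seen" -eq 0 ]; then
            # Everything before the boundary belongs to the profiler.
            tool_args="${tool_args:+$tool_args }$arg"
        else
            # Everything after the boundary is the application and its arguments.
            app_args="${app_args:+$app_args }$arg"
        fi
    done
}

split_at_double_hyphen -P -T -- ls -la
echo "profiler arguments:  $tool_args"
echo "application command: $app_args"
```

Here ``-P -T`` is treated as profiler arguments and ``ls -la`` as the application command, so the ``-la`` option is never interpreted by the profiler.
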

:doc:`Configuring the ROCm Systems Profiler runtime options <./configuring-runtime-options>`
establishes the precedence of environment variable values over values specified
in the configuration files. This enables
you to configure the ROCm Systems Profiler runtime to your preferred default behavior
in a file such as ``~/.rocprof-sys.cfg`` and then easily override
those settings in the command line, for example, ``ROCPROFSYS_ENABLED=OFF rocprof-sys-sample -- foo``.
Similarly, the command-line arguments passed to ``rocprof-sys-sample`` take precedence
over environment variables.
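
The resulting precedence order (command line, then environment, then configuration file) can be summarized with a short shell sketch. ``resolve_setting`` is a hypothetical helper written for this illustration; it does not reflect the actual implementation and simply returns the first non-empty value in precedence order.

```shell
# Illustrative only: pick the effective value of a single setting.
# resolve_setting is a hypothetical helper. Its arguments are the value given
# on the command line, in the environment, and in the configuration file,
# in that order; an empty string means "not set at that level".
resolve_setting() {
    cli_value=$1
    env_value=$2
    cfg_value=$3
    if [ -n "$cli_value" ]; then
        echo "$cli_value"
    elif [ -n "$env_value" ]; then
        echo "$env_value"
    else
        echo "$cfg_value"
    fi
}

# The environment overrides the configuration file.
resolve_setting "" "OFF" "ON"
# The command line overrides the environment.
resolve_setting "ON" "OFF" "OFF"
# Only the configuration file sets a value.
resolve_setting "" "" "ON"
```
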

All of the command-line options above correlate to one or more configuration
settings. For example, ``--cpu-events`` correlates to the ``ROCPROFSYS_PAPI_EVENTS`` configuration variable.
``rocprof-sys-sample`` processes the arguments and outputs a summary of its configuration
before running the target application.

The following snippets show how ``rocprof-sys-sample`` runs with various environment updates.

* This snippet shows the environment updates when ``rocprof-sys-sample`` is invoked with no arguments:

  .. code-block:: shell

     $ rocprof-sys-sample -- ./parallel-overhead-locks 30 4 100

     LD_PRELOAD=/opt/rocprofiler-systems/lib/librocprof-sys-dl.so.1.7.1
     ROCPROFSYS_USE_PROCESS_SAMPLING=false
     ROCPROFSYS_USE_SAMPLING=true
     OMP_TOOL_LIBRARIES=/opt/rocprofiler-systems/lib/librocprof-sys-dl.so.1.7.1
     ROCP_TOOL_LIB=/opt/rocprofiler-systems/lib/librocprof-sys.so.1.7.1

* The next snippet shows the environment updates when ``rocprof-sys-sample`` enables
  profiling, tracing, and device process-sampling, but not host process-sampling:

  .. code-block:: shell

     $ rocprof-sys-sample -PTD -- ./parallel-overhead-locks 30 4 100

     LD_PRELOAD=/opt/rocprofiler-systems/lib/librocprof-sys-dl.so.1.7.1
     ROCPROFSYS_CPU_FREQ_ENABLED=false
     ROCPROFSYS_PROFILE=true
     ROCPROFSYS_TRACE=true
     ROCPROFSYS_USE_AMD_SMI=true
     ROCPROFSYS_USE_PROCESS_SAMPLING=true
     ROCPROFSYS_USE_SAMPLING=true
     OMP_TOOL_LIBRARIES=/opt/rocprofiler-systems/lib/librocprof-sys-dl.so.1.7.1
     ROCP_TOOL_LIB=/opt/rocprofiler-systems/lib/librocprof-sys.so.1.7.1

* The next snippet shows the environment updates when ``rocprof-sys-sample`` enables
  profiling, tracing, device process-sampling, host process-sampling, and all the available backends:

  .. code-block:: shell

     $ rocprof-sys-sample -PTDH -I all -- ./parallel-overhead-locks 30 4 100

     KOKKOS_TOOLS_LIBS=/opt/rocprofiler-systems/lib/librocprof-sys.so.1.7.1
     LD_PRELOAD=/opt/rocprofiler-systems/lib/librocprof-sys-dl.so.1.7.1
     ROCPROFSYS_CPU_FREQ_ENABLED=true
     ROCPROFSYS_TRACE_THREAD_LOCKS=true
     ROCPROFSYS_TRACE_THREAD_RW_LOCKS=true
     ROCPROFSYS_TRACE_THREAD_SPIN_LOCKS=true
     ROCPROFSYS_USE_KOKKOSP=true
     ROCPROFSYS_USE_MPIP=true
     ROCPROFSYS_USE_OMPT=true
     ROCPROFSYS_TRACE=true
     ROCPROFSYS_USE_PROCESS_SAMPLING=true
     ROCPROFSYS_USE_RCCLP=true
     ROCPROFSYS_USE_AMD_SMI=true
     ROCPROFSYS_USE_ROCM=true
     ROCPROFSYS_USE_SAMPLING=true
     ROCPROFSYS_PROFILE=true
     OMP_TOOL_LIBRARIES=/opt/rocprofiler-systems/lib/librocprof-sys-dl.so.1.7.1
     ROCP_TOOL_LIB=/opt/rocprofiler-systems/lib/librocprof-sys.so.1.7.1
     ...

* The final snippet shows the environment updates when ``rocprof-sys-sample`` enables
  profiling, tracing, device process-sampling, and host process-sampling,
  sets the output path to ``rocprof-sys-output`` and the output prefix to ``%tag%``, and disables
  all the available backends:

  .. code-block:: shell

     $ rocprof-sys-sample -PTDH -E all -o rocprof-sys-output %tag% -- ./parallel-overhead-locks 30 4 100

     LD_PRELOAD=/opt/rocprofiler-systems/lib/librocprof-sys-dl.so.1.7.1
     ROCPROFSYS_CPU_FREQ_ENABLED=true
     ROCPROFSYS_OUTPUT_PATH=rocprof-sys-output
     ROCPROFSYS_OUTPUT_PREFIX=%tag%
     ROCPROFSYS_TRACE_THREAD_LOCKS=false
     ROCPROFSYS_TRACE_THREAD_RW_LOCKS=false
     ROCPROFSYS_TRACE_THREAD_SPIN_LOCKS=false
     ROCPROFSYS_USE_KOKKOSP=false
     ROCPROFSYS_USE_MPIP=false
     ROCPROFSYS_USE_OMPT=false
     ROCPROFSYS_TRACE=true
     ROCPROFSYS_USE_PROCESS_SAMPLING=true
     ROCPROFSYS_USE_RCCLP=false
     ROCPROFSYS_USE_AMD_SMI=false
     ROCPROFSYS_USE_ROCM=false
     ROCPROFSYS_USE_SAMPLING=true
     ROCPROFSYS_PROFILE=true
     ...

A rocprof-sys-sample example
========================================

Here is the full output from the previous
``rocprof-sys-sample -PTDH -E all -o rocprof-sys-output %tag% -- ./parallel-overhead-locks 30 4 100`` command:

.. code-block:: shell-session

$ rocprof-sys-sample -PTDH -E all -o rocprof-sys-output %tag% -c -- ./parallel-overhead-locks 30 4 100
LD_PRELOAD=/opt/rocprofiler-systems/lib/librocprof-sys-dl.so.1.11.3
ROCPROFSYS_CONFIG_FILE=
ROCPROFSYS_CPU_FREQ_ENABLED=true
ROCPROFSYS_OUTPUT_PATH=rocprof-sys-output
ROCPROFSYS_OUTPUT_PREFIX=%tag%
ROCPROFSYS_PROFILE=true
ROCPROFSYS_TRACE=true
ROCPROFSYS_TRACE_THREAD_LOCKS=false
ROCPROFSYS_TRACE_THREAD_RW_LOCKS=false
ROCPROFSYS_TRACE_THREAD_SPIN_LOCKS=false
ROCPROFSYS_USE_KOKKOSP=false
ROCPROFSYS_USE_MPIP=false
ROCPROFSYS_USE_OMPT=false
ROCPROFSYS_USE_PROCESS_SAMPLING=true
ROCPROFSYS_USE_RCCLP=false
ROCPROFSYS_USE_AMD_SMI=false
ROCPROFSYS_USE_ROCM=false
ROCPROFSYS_USE_SAMPLING=true
[rocprof-sys][dl][1785877] rocprofsys_main
[rocprof-sys][1785877][rocprofsys_init_tooling] Instrumentation mode: Sampling
__
_ __ ___ ___ _ __ _ __ ___ / _| ___ _ _ ___
| '__| / _ \ / __| | '_ \ | '__| / _ \ | |_ _____ / __| | | | | / __|
| | | (_) | | (__ | |_) | | | | (_) | | _| |_____| \__ \ | |_| | \__ \
|_| \___/ \___| | .__/ |_| \___/ |_| |___/ \__, | |___/
|_| |___/
rocprof-sys v1.11.2 (rev: 2586b74db8bf335742600010b8d9f1ce8da9cf89, compiler: GNU v11.4.1, rocm: v6.1.x)
[988.958] perfetto.cc:58649 Configured tracing session 1, #sources:1, duration:0 ms, #buffers:1, total buffer size:1024000 KB, total sessions:1, uid:0 session name: ""
[parallel-overhead-locks] Threads: 4
[parallel-overhead-locks] Iterations: 100
[parallel-overhead-locks] fibonacci(30)...
[1] number of iterations: 100
[2] number of iterations: 100
[3] number of iterations: 100
[4] number of iterations: 100
[parallel-overhead-locks] fibonacci(30) x 4 = 409221992
[parallel-overhead-locks] number of mutex locks = 400
[rocprof-sys][1785877][0][rocprofsys_finalize] finalizing...
[rocprof-sys][1785877][0][rocprofsys_finalize]
[rocprof-sys][1785877][0][rocprofsys_finalize] rocprof-sys/process/1785877 : 0.294342 sec wall_clock, 4.776 MB peak_rss, 3.170 MB page_rss, 0.990000 sec cpu_clock, 336.3 % cpu_util [laps: 1]
[rocprof-sys][1785877][0][rocprofsys_finalize] rocprof-sys/process/1785877/thread/0 : 0.291535 sec wall_clock, 0.002619 sec thread_cpu_clock, 0.9 % thread_cpu_util, 4.776 MB peak_rss [laps: 1]
[rocprof-sys][1785877][0][rocprofsys_finalize] rocprof-sys/process/1785877/thread/1 : 0.271353 sec wall_clock, 0.222572 sec thread_cpu_clock, 82.0 % thread_cpu_util, 4.200 MB peak_rss [laps: 1]
[rocprof-sys][1785877][0][rocprofsys_finalize] rocprof-sys/process/1785877/thread/2 : 0.238218 sec wall_clock, 0.206405 sec thread_cpu_clock, 86.6 % thread_cpu_util, 3.432 MB peak_rss [laps: 1]
[rocprof-sys][1785877][0][rocprofsys_finalize] rocprof-sys/process/1785877/thread/3 : 0.209459 sec wall_clock, 0.193415 sec thread_cpu_clock, 92.3 % thread_cpu_util, 2.472 MB peak_rss [laps: 1]
[rocprof-sys][1785877][0][rocprofsys_finalize] rocprof-sys/process/1785877/thread/4 : 0.212029 sec wall_clock, 0.211694 sec thread_cpu_clock, 99.8 % thread_cpu_util, 1.152 MB peak_rss [laps: 1]
[rocprof-sys][1785877][0][rocprofsys_finalize]
[rocprof-sys][1785877][0][rocprofsys_finalize] Finalizing perfetto...
[rocprof-sys][1785877][perfetto]> Outputting '/home/user/code/rocprofiler-systems/build-release/rocprofiler-systems-output/2024-07-15_16.21/parallel-overhead-locksperfetto-trace-1785877.proto' (39.12 KB / 0.04 MB / 0.00 GB)... Done
[rocprof-sys][1785877][wall_clock]> Outputting 'rocprof-sys-output/2024-07-15_16.21/parallel-overhead-lockswall_clock-1785877.json'
[rocprof-sys][1785877][wall_clock]> Outputting 'rocprof-sys-output/2024-07-15_16.21/parallel-overhead-lockswall_clock-1785877.txt'
[rocprof-sys][1785877][metadata]> Outputting 'rocprof-sys-output/2024-07-15_16.21/parallel-overhead-locksmetadata-1785877.json' and 'rocprof-sys-output/2024-07-15_16.21/parallel-overhead-locksfunctions-1785877.json'
[rocprof-sys][1785877][0][rocprofsys_finalize] Finalized: 0.054582 sec wall_clock, 0.000 MB peak_rss, -1.798 MB page_rss, 0.040000 sec cpu_clock, 73.3 % cpu_util
[989.312] perfetto.cc:60128 Tracing session 1 ended, total sessions:0