.. meta::
  :description: ROCm Systems Profiler glossary and reference
  :keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, glossary, terminology, profiler, tracking, visualization, tool, Instinct, accelerator, AMD

********
Glossary
********

This topic explains the terminology necessary to use ROCm Systems Profiler.
The list below provides a basic glossary for those who
are new to binary instrumentation. It also clarifies ambiguities
when certain terms have different
contextual meanings, for example, the ROCm Systems Profiler meaning of the term "module"
when instrumenting Python.


Binary
  A file written in the Executable and Linkable Format (ELF). This is the standard file
  format for executable files, shared libraries, etc.

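
As a rough illustration of what "written in ELF" means in practice, an ELF file
starts with the four-byte magic number ``0x7F 'E' 'L' 'F'``. The sketch below checks
only that prefix. The helper name ``is_elf`` and the scratch files are hypothetical and
exist purely for illustration; a real tool would parse the full ELF header.

```python
import os
import tempfile

# Every ELF file begins with these four bytes.
ELF_MAGIC = b"\x7fELF"


def is_elf(path):
    """Return True if the file at `path` starts with the ELF magic number."""
    with open(path, "rb") as handle:
        return handle.read(len(ELF_MAGIC)) == ELF_MAGIC


# Demonstration on two scratch files (made-up content, not real binaries).
with tempfile.TemporaryDirectory() as scratch:
    fake_binary = os.path.join(scratch, "fake_binary")
    shell_script = os.path.join(scratch, "script.sh")
    with open(fake_binary, "wb") as handle:
        handle.write(ELF_MAGIC + b"\x02\x01\x01")
    with open(shell_script, "wb") as handle:
        handle.write(b"#!/bin/sh\n")
    fake_is_elf = is_elf(fake_binary)
    script_is_elf = is_elf(shell_script)
```

A prefix check like this is enough to tell a binary from a script, but it says nothing
about whether the file is an executable or a shared library.
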

Binary instrumentation
  Inserting callbacks to instrumentation into an existing binary. This can be performed
  statically or dynamically.

Static binary instrumentation
  Loads an existing binary, determines instrumentation points, and generates a new binary
  with instrumentation directly embedded. It is applicable to executables and libraries but
  limited to only the functions defined in the binary. This is also known as **Binary rewrite**.

Dynamic binary instrumentation
  Loads an existing binary into memory, inserts instrumentation, and runs the binary.
  It is limited to executables but is capable of instrumenting linked libraries.
  This is also known as **Runtime instrumentation**.


Statistical sampling
  At periodic intervals, the application is paused and the current call stack of the CPU
  is recorded along with various other metrics. It uses timers that measure either
  (A) real clock time or (B) the CPU time used by the current thread and the CPU time
  expended on behalf of the thread by the system. This is also known simply as **sampling**.

  Sampling rate
    * The frequency at which timer (A) or (B) is triggered (in units of ``# interrupts / second``)
    * Higher values increase the number of samples

  Sampling delay
    * How long to wait before (A) and (B) begin triggering at their designated rate

  Sampling duration
    * The amount of time (in real-time) after the start of the application to record samples.
    * After this time limit has been reached, no more samples are recorded.

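
The interaction between sampling rate, delay, and duration can be sketched with a small
calculation. This is only an illustrative model, not the ROCm Systems Profiler
implementation: the function name ``sample_times`` is hypothetical, and it assumes the first
sample fires one period after the delay has elapsed.

```python
def sample_times(rate_hz, delay_s, duration_s, run_time_s):
    """Return the times (seconds since application start) at which samples fire.

    Assumptions of this sketch: the timer starts after `delay_s`, fires every
    1 / rate_hz seconds, and stops once `duration_s` (measured from application
    start) or the end of the run is reached, whichever comes first.
    """
    period_s = 1.0 / rate_hz
    cutoff_s = min(duration_s, run_time_s)
    times = []
    tick = 1
    while delay_s + tick * period_s <= cutoff_s:
        times.append(delay_s + tick * period_s)
        tick += 1
    return times


# 4 interrupts/second, 0.5 s delay, 2 s duration, 10 s application run.
slow = sample_times(rate_hz=4, delay_s=0.5, duration_s=2.0, run_time_s=10.0)
# Doubling the rate doubles the number of samples in the same window.
fast = sample_times(rate_hz=8, delay_s=0.5, duration_s=2.0, run_time_s=10.0)
```

Under these assumptions, the delay shifts the first sample, the rate sets the spacing
between samples, and the duration caps the window regardless of how long the application runs.
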

Process sampling
  At periodic (real-time) intervals, a background thread records global metrics without
  interrupting the current process. These metrics include, but are not limited to:
  CPU frequency, CPU memory high-water mark (that is, peak memory usage), GPU temperature,
  and GPU power usage.

  Sampling rate
    * The real-time frequency at which metrics are recorded (in units of ``# measurements / second``)
    * Higher values increase the number of samples

  Sampling delay
    * How long to wait (in real-time) before recording samples

  Sampling duration
    * The amount of time (in real-time) after the start of the application to record samples.
    * After this time limit has been reached, no more samples are recorded.

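
The idea of a background thread recording a process-wide metric without pausing the
application can be sketched as follows. This is a minimal illustration and not how
ROCm Systems Profiler is implemented: the function name ``sample_process`` and all
parameter values are hypothetical, and the only metric recorded is the peak resident
set size reported by ``resource.getrusage`` (Unix only).

```python
import resource
import threading
import time


def sample_process(rate_hz, delay_s, duration_s, stop_event, samples):
    """Append (elapsed seconds, peak RSS) pairs to `samples` from a background thread."""
    start = time.monotonic()
    period_s = 1.0 / rate_hz
    # Sampling delay: wait before the first measurement (or exit if asked to stop).
    if stop_event.wait(delay_s):
        return
    # Sampling duration: stop recording once the time limit has passed.
    while time.monotonic() - start <= duration_s:
        # ru_maxrss is the memory high-water mark (kilobytes on Linux).
        peak_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        samples.append((time.monotonic() - start, peak_rss))
        # Sampling rate: sleep one period, waking early if asked to stop.
        if stop_event.wait(period_s):
            break


samples = []
stop = threading.Event()
worker = threading.Thread(
    target=sample_process, args=(50, 0.05, 0.5, stop, samples), daemon=True
)
worker.start()

# The "application" keeps running on the main thread while metrics are recorded.
data = [i * i for i in range(200_000)]
time.sleep(0.3)

stop.set()
worker.join()
```

Unlike statistical sampling, nothing here interrupts the main thread; the sampler
simply wakes up on its own schedule and reads a global counter.
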

Module
  With respect to binary instrumentation, a module is defined as either the filename
  (such as ``foo.c``) or library name (such as ``libfoo.so``) that contains the definition
  of one or more functions.

  With respect to Python instrumentation, a module is defined as the **file** that contains
  the definition of one or more functions. The full path to this file typically contains the
  name of the "Python module".

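
The Python sense of "module" can be seen by asking a function which file defines it.
The sketch below uses the standard-library ``json.decoder`` module as an arbitrary
example; the helper name ``defining_file`` is hypothetical.

```python
import json.decoder
import os


def defining_file(func):
    """Return the path of the file that contains the definition of `func`."""
    return func.__code__.co_filename


path = defining_file(json.decoder.JSONDecoder.decode)
file_name = os.path.basename(path)
```

Here the file is ``decoder.py`` inside a ``json`` directory, so the path contains the
pieces of the dotted Python module name ``json.decoder``.
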

Basic block
  A straight-line code sequence with no branches in (except for the entry) and
  no branches out (except for the exit).

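
To make the definition concrete, the sketch below splits a toy instruction list into
basic blocks using the textbook "leader" approach: the first instruction, every branch
target, and every instruction following a branch each start a new block. The instruction
encoding, the opcode names, and the function ``basic_blocks`` are all made up for this
illustration and do not reflect how Dyninst represents code.

```python
# Opcodes that end a basic block in this toy instruction set (an assumption).
BRANCH_OPCODES = {"jmp", "jle", "ret"}


def basic_blocks(instructions):
    """Split a list of (opcode, target_index) pairs into lists of instruction indices."""
    if not instructions:
        return []
    leaders = {0}  # the first instruction always starts a block
    for index, (opcode, target) in enumerate(instructions):
        if opcode in BRANCH_OPCODES:
            if target is not None:
                leaders.add(target)  # a branch target starts a block
            if index + 1 < len(instructions):
                leaders.add(index + 1)  # so does the instruction after a branch
    bounds = sorted(leaders) + [len(instructions)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(len(bounds) - 1)]


program = [
    ("mov", None),  # 0
    ("cmp", None),  # 1  loop header, target of the jump at index 4
    ("jle", 5),     # 2  conditional exit from the loop
    ("add", None),  # 3
    ("jmp", 1),     # 4  back edge
    ("ret", None),  # 5
]
blocks = basic_blocks(program)
```

Each resulting block can only be entered at its first instruction and only left at its last.
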

Address range
  The instructions for a function in a binary start at a certain address within the ELF file
  and end at a certain address. The range is ``end - start``.
  The address range is a reasonable approximation for the "cost" of a function.
  For example, a larger address range approximately equates to more instructions.

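
The ``end - start`` calculation can be sketched with a hand-written symbol table. The
function names and addresses below are invented for illustration; in a real binary they
would come from the ELF symbol table.

```python
# Hypothetical (start, end) addresses for two functions.
symbols = {
    "foo": (0x401000, 0x401040),
    "bar": (0x401040, 0x401200),
}


def address_range(start, end):
    """Return the size in bytes of the address span occupied by a function."""
    return end - start


sizes = {name: address_range(*bounds) for name, bounds in symbols.items()}
# Rank functions from largest to smallest as a rough proxy for "cost".
ranked = sorted(sizes, key=sizes.get, reverse=True)
```

In this example ``bar`` spans more bytes than ``foo``, so it would be treated as the
more expensive function under this approximation.
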

Instrumentation traps
  On the x86 architecture, because instructions are of variable size, an instruction
  might be too small for Dyninst to replace it with the normal code sequence
  used to call instrumentation. When instrumentation is placed at points other
  than subroutine entry, exit, or call points, traps may be used to ensure
  the instrumentation fits. (By default, ``rocprof-sys-instrument`` avoids instrumentation
  which requires a trap.)


Overlapping functions
  Due to language constructs or compiler optimizations, it might be possible for
  multiple functions to overlap (that is, share part of the same function body)
  or for a single function to have multiple entry points. In practice, it is
  impossible to distinguish between multiple overlapping functions
  and a single function with multiple entry points. (By default, ``rocprof-sys-instrument``
  avoids instrumenting overlapping functions.)

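
One simple way to think about overlap is in terms of address ranges: two functions
overlap when their ``[start, end)`` spans intersect. The sketch below applies that test
to a made-up symbol table. It is only an illustration of the concept; the names and
addresses are hypothetical, and this is not a description of how Dyninst or
``rocprof-sys-instrument`` detects overlapping functions.

```python
from itertools import combinations

# Hypothetical half-open (start, end) address ranges.
functions = {
    "parse": (0x1000, 0x1080),
    "parse_fast": (0x1040, 0x10C0),  # shares the tail of "parse"
    "emit": (0x10C0, 0x1100),
}


def ranges_overlap(first, second):
    """Return True if two half-open [start, end) ranges share at least one address."""
    return first[0] < second[1] and second[0] < first[1]


overlapping = [
    (a, b)
    for a, b in combinations(sorted(functions), 2)
    if ranges_overlap(functions[a], functions[b])
]
```

Note that ``parse_fast`` and ``emit`` are adjacent but do not overlap, because the ranges
are treated as half-open.
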