docs: Fix docutils warnings (#59)

* fix typo

* fix `Lexing literal_block` docutils warning

* fix `Title underline too short` docutils warning

* use consistent file type

* fix `Malformed table` error

* improve index.rst and front-load TOC

[ROCm/rocprofiler-systems commit: 39468e8867]
Peter Park
2024-12-13 15:59:07 -05:00
committed by GitHub
parent ab379457a1
commit 95b8f8fdd9
16 changed files with 102 additions and 97 deletions
@@ -2,9 +2,9 @@
:description: ROCm Systems Profiler feature set documentation and reference
:keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, profiler, feature set, use cases, tracking, visualization, tool, Instinct, accelerator, AMD
***************************************
The ROCm Systems Profiler feature set and use cases
***************************************
********************************************
ROCm Systems Profiler features and use cases
********************************************
`ROCm Systems Profiler <https://github.com/ROCm/rocprofiler-systems>`_ is designed to be highly extensible.
Internally, it leverages the `Timemory performance analysis toolkit <https://github.com/ROCm/timemory>`_
@@ -129,4 +129,4 @@ broad picture.
In terms of CPU analysis, ROCm Systems Profiler does not target any specific vendor.
It works just as well on AMD and non-AMD CPUs.
With regard to the GPU, ROCm Systems Profiler is currently restricted to HIP and HSA APIs
and kernels running on AMD GPUs.
and kernels running on AMD GPUs.
@@ -173,7 +173,7 @@ PAPI components from different namespaces:
about the PAPI library used by ROCm Systems Profiler
(because ROCm Systems Profiler statically links to ``libpapi``). However, all of these tools are
installed with the prefix ``rocprof-sys-`` with
underscores replaced with hypens, for example ``papi_avail`` becomes ``rocprof-sys-papi-avail``.
underscores replaced with hyphens, for example ``papi_avail`` becomes ``rocprof-sys-papi-avail``.
ROCPROFSYS_ROCM_EVENTS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -2,9 +2,9 @@
:description: ROCm Systems Profiler general tips and usage documentation and reference
:keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, tips, how to, profiler, tracking, visualization, tool, Instinct, accelerator, AMD
**********************************
********************************************
General tips for using ROCm Systems Profiler
**********************************
********************************************
Follow these general guidelines when using ROCm Systems Profiler. For an explanation of the terms used in this topic, see
the :doc:`ROCm Systems Profiler glossary <../reference/rocprof-sys-glossary>`.
@@ -97,32 +97,32 @@ This can happen in three different ways:
Key concepts
-----------------------------------
+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
| Concept | Setting | Options | Description |
+==================+=====================================+==================================+============================================+
+------------------+--------------------------------------+----------------------------------+--------------------------------------------+
| Concept | Setting | Options | Description |
+==================+======================================+==================================+============================================+
| Backend | ``ROCPROFSYS_CAUSAL_BACKEND`` | ``perf``, ``timer`` | Backend for recording samples required |
| | | | to calculate the virtual speed-up |
+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
| | | | to calculate the virtual speed-up |
+------------------+--------------------------------------+----------------------------------+--------------------------------------------+
| Mode | ``ROCPROFSYS_CAUSAL_MODE`` | ``function``, ``line`` | Select an entire function or individual |
| | | | line of code for causal experiments |
+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
| | | | line of code for causal experiments |
+------------------+--------------------------------------+----------------------------------+--------------------------------------------+
| End-to-end | ``ROCPROFSYS_CAUSAL_END_TO_END`` | Boolean | Perform a single experiment during the |
| | | | entire run (does not require |
| | | | progress points) |
+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
| | | | entire run (does not require |
| | | | progress points) |
+------------------+--------------------------------------+----------------------------------+--------------------------------------------+
| Fixed speed-up | ``ROCPROFSYS_CAUSAL_FIXED_SPEEDUP`` | one or more values from [0, 100] | Virtual speed-up or pool of virtual |
| | | | speed-ups to randomly select |
+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
| | | | speed-ups to randomly select |
+------------------+--------------------------------------+----------------------------------+--------------------------------------------+
| Binary scope | ``ROCPROFSYS_CAUSAL_BINARY_SCOPE`` | regular expression(s) | Dynamic binaries containing code for |
| | | | experiments |
+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
| | | | experiments |
+------------------+--------------------------------------+----------------------------------+--------------------------------------------+
| Source scope | ``ROCPROFSYS_CAUSAL_SOURCE_SCOPE`` | regular expression(s) | ``<file>`` and/or ``<file>:<line>`` |
| | | | containing code to include in experiments |
+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
| | | | containing code to include in experiments |
+------------------+--------------------------------------+----------------------------------+--------------------------------------------+
| Function scope | ``ROCPROFSYS_CAUSAL_FUNCTION_SCOPE`` | regular expression(s) | Restricts experiments to matching |
| | | | functions (function mode) or lines of |
| | | | code within matching functions (line mode) |
+------------------+-------------------------------------+----------------------------------+--------------------------------------------+
| | | | functions (function mode) or lines of |
| | | | code within matching functions (line mode) |
+------------------+--------------------------------------+----------------------------------+--------------------------------------------+
.. note::
@@ -28,7 +28,7 @@ be the same size.
``OS`` is the operating system, and ``ABI`` is the application binary interface,
for example, ``libpyrocprofsys.cpython-38-x86_64-linux-gnu.so``.
Getting Started
Getting started
========================================
The ROCm Systems Profiler Python package is installed in ``lib/pythonX.Y/site-packages/rocprofsys``.
@@ -44,7 +44,7 @@ Both the ``share/rocprofiler-systems/setup-env.sh`` script and the module file i
environment variable.
Running ROCm Systems Profiler on a Python script
========================================
================================================
ROCm Systems Profiler provides an ``rocprof-sys-python`` helper bash script which
ensures ``PYTHONPATH`` is properly set and the correct Python interpreter is used.
@@ -200,7 +200,7 @@ And then run using the command ``rocprof-sys-python -b -- ./example.py``, ROCm S
|-----------------------------------------------------------|
ROCm Systems Profiler Python source instrumentation
========================================
===================================================
Starting with the unmodified ``example.py`` script above, import the ``rocprofsys`` module:
@@ -268,7 +268,7 @@ original ``rocprofsys-python ./example.py`` results:
numerous functions called when more complex modules are imported, such as ``import numpy``.
ROCm Systems Profiler Python source instrumentation configuration
-------------------------------------------------------------
-----------------------------------------------------------------
Within the Python source code, the profiler can be configured by directly
modifying the ``rocprof-sys.profiler.config`` data fields.
@@ -343,7 +343,7 @@ An rocprof-sys-sample example
Here is the full output from the previous
``rocprof-sys-sample -PTDH -E all -o rocprof-sys-output %tag% -- ./parallel-overhead-locks 30 4 100`` command:
.. code-block:: shell
.. code-block:: shell-session
$ rocprof-sys-sample -PTDH -E all -o rocprof-sys-output %tag% -c -- ./parallel-overhead-locks 30 4 100
@@ -403,3 +403,4 @@ Here is the full output from the previous
[rocprof-sys][1785877][metadata]> Outputting 'rocprof-sys-output/2024-07-15_16.21/parallel-overhead-locksmetadata-1785877.json' and 'rocprof-sys-output/2024-07-15_16.21/parallel-overhead-locksfunctions-1785877.json'
[rocprof-sys][1785877][0][rocprofsys_finalize] Finalized: 0.054582 sec wall_clock, 0.000 MB peak_rss, -1.798 MB page_rss, 0.040000 sec cpu_clock, 73.3 % cpu_util
[989.312] perfetto.cc:60128 Tracing session 1 ended, total sessions:0
@@ -238,7 +238,7 @@ Metadata JSON Sample
}
Configuring the ROCm Systems Profiler output
========================================
============================================
ROCm Systems Profiler includes a core set of options for controlling the format
and contents of the output files. For additional information, see the guide on
@@ -10,7 +10,7 @@ The following example shows how a program can use the ROCm Systems Profiler API
for run-time analysis.
ROCm Systems Profiler user API example program
========================================
==============================================
You can use the ROCm Systems Profiler API to define custom regions to profile and trace.
The following C++ program demonstrates this technique by calling several functions from the
@@ -157,7 +157,7 @@ ROCm Systems Profiler API, such as ``rocprofsys_user_push_region`` and
}
Linking the ROCm Systems Profiler libraries to another program
=======================================================
==============================================================
To link the ``rocprofiler-systems-user-library`` to another program,
use the following CMake and ``g++`` directives.
@@ -186,7 +186,7 @@ Output from the API example program
First, instrument and run the program.
.. code-block:: shell
.. code-block:: shell-session
$ rocprof-sys-instrument -l --min-instructions=8 -E custom_push_region -o -- ./user-api
...
@@ -2,17 +2,17 @@
:description: ROCm Systems Profiler documentation and reference
:keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, profiler, tracking, visualization, tool, Instinct, accelerator, AMD
***********************
***********************************
ROCm Systems Profiler documentation
***********************
***********************************
ROCm Systems Profiler, formerly known as "Omnitrace", is designed for the high-level profiling and comprehensive tracing
ROCm Systems Profiler is designed for the high-level profiling and comprehensive tracing
of applications running on the CPU or the CPU and GPU. It supports dynamic binary
instrumentation, call-stack sampling, and various other features for determining
which function and line number are currently executing. To learn more, see :doc:`what-is-rocprof-sys`
The code is open and hosted at `<https://github.com/ROCm/rocprofiler-systems>`_.
ROCm Systems Profiler is open source and hosted at `<https://github.com/ROCm/rocprofiler-systems>`__.
It is the successor to `<https://github.com/ROCm/omnitrace>`__.
.. grid:: 2
:gutter: 3
@@ -22,17 +22,12 @@ The code is open and hosted at `<https://github.com/ROCm/rocprofiler-systems>`_.
* :doc:`Quick start <./install/quick-start>`
* :doc:`ROCm Systems Profiler installation <./install/install>`
The documentation is structured as follows:
Use the following topics to learn more about the advantages of ROCm Systems Profiler in application
profiling, how it supports performance analysis, and how to leverage its capabilities in practice:
.. grid:: 2
:gutter: 3
.. grid-item-card:: Tutorials
* `GitHub examples <https://github.com/ROCm/rocprofiler-systems/tree/amd-mainline/examples>`_
* :doc:`Video tutorials <./tutorials/video-tutorials>`
.. grid-item-card:: How to
* :doc:`Configuring and validating the ROCm Systems Profiler environment <./how-to/configuring-validating-environment>`
@@ -48,19 +43,24 @@ The documentation is structured as follows:
.. grid-item-card:: Conceptual
* :doc:`Data collection modes <./conceptual/data-collection-modes>`
* :doc:`The ROCm Systems Profiler feature set <./conceptual/rocprof-sys-feature-set>`
* :doc:`Features and use cases <./conceptual/rocprof-sys-feature-set>`
.. grid-item-card:: Reference
* :doc:`Development guide <./reference/development-guide>`
* :doc:`ROCm Systems Profiler glossary <./reference/rocprof-sys-glossary>`
* :doc:`Glossary <./reference/rocprof-sys-glossary>`
* :doc:`API library <./doxygen/html/files>`
* :doc:`Class member functions <./doxygen/html/functions>`
* :doc:`Globals <./doxygen/html/globals>`
* :doc:`Classes, structures, and interfaces <./doxygen/html/annotated>`
.. grid-item-card:: Tutorials
* `GitHub examples <https://github.com/ROCm/rocprofiler-systems/tree/amd-mainline/examples>`_
* :doc:`Video tutorials <./tutorials/video-tutorials>`
To contribute to the documentation, refer to
`Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.
You can find licensing information on the
`Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.
`Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.
@@ -1,4 +0,0 @@
# License
```{include} ../LICENSE
```
@@ -0,0 +1,8 @@
.. meta::
:description: ROCm Systems Profiler license
*******
License
*******
.. include:: ../LICENSE
@@ -16,7 +16,7 @@ Executables
This section lists the ROCm Systems Profiler executables.
rocprof-sys-avail: `source/bin/rocprof-sys-avail <https://github.com/ROCm/rocprofiler-systems/tree/amd-mainline/source/bin/rocprof-sys-avail>`_
-------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------
The ``main`` routine of ``rocprof-sys-avail`` has three important sections:
@@ -25,7 +25,7 @@ The ``main`` routine of ``rocprof-sys-avail`` has three important sections:
* Printing hardware counters
rocprof-sys-sample: `source/bin/rocprof-sys-sample <https://github.com/ROCm/rocprofiler-systems/tree/amd-mainline/source/bin/rocprof-sys-sample>`_
----------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------
* Requires a command-line format of ``rocprof-sys-sample <options> -- <command> <command-args>``
* Translates command-line options into environment variables
@@ -33,7 +33,7 @@ rocprof-sys-sample: `source/bin/rocprof-sys-sample <https://github.com/ROCm/rocp
* Is launched by using ``execvpe`` with ``<command> <command-args>`` and a modified environment
rocprof-sys-causal: `source/bin/rocprof-sys-causal <https://github.com/ROCm/rocprofiler-systems/tree/amd-mainline/source/bin/rocprof-sys-causal>`_
----------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------
When there is exactly one causal profiling configuration variant (which enables debugging),
``rocprof-sys-causal`` has a nearly identical design to ``rocprof-sys-sample``
@@ -46,7 +46,7 @@ the following actions take place for each variant:
* the parent process waits for the child process to finish
rocprof-sys-instrument: `source/bin/rocprof-sys-instrument <https://github.com/ROCm/rocprofiler-systems/tree/amd-mainline/source/bin/rocprof-sys-instrument>`_
----------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------
* Requires a command-line format of ``rocprof-sys-instrument <options> -- <command> <command-args>``
* Allows the user to provide options specifying whether to perform runtime instrumentation, use binary rewrite, or
@@ -95,7 +95,7 @@ librocprof-sys: `source/lib/rocprof-sys <https://github.com/ROCm/rocprofiler-sys
This is the main library encapsulating all the capabilities.
librocprof-sys-dl: `source/lib/rocprof-sys-dl <https://github.com/ROCm/rocprofiler-systems/tree/amd-mainline/source/lib/rocprof-sys-dl>`_
--------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------
This is a lightweight, front-end library for ``librocprof-sys`` which serves three primary purposes:
@@ -106,7 +106,7 @@ This is a lightweight, front-end library for ``librocprof-sys`` which serves thr
* Coordinates communication between ``librocprof-sys-user`` and ``librocprof-sys``
librocprof-sys-user: `source/lib/rocprof-sys-user <https://github.com/ROCm/rocprofiler-systems/tree/amd-mainline/source/lib/rocprof-sys-user>`_
--------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------------
* Provides a set of functions and types for the users to add to their code, for example,
disabling data collection globally or on a specific thread or
@@ -2,9 +2,9 @@
:description: ROCm Systems Profiler glossary and reference
:keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, glossary, terminology, profiler, tracking, visualization, tool, Instinct, accelerator, AMD
*******************
ROCm Systems Profiler Glossary
*******************
********
Glossary
********
This topic explains the terminology necessary to use ROCm Systems Profiler.
The list below provides a basic glossary for those who
@@ -13,59 +13,59 @@ when certain terms have different
contextual meanings, for example, the ROCm Systems Profiler meaning of the term "module"
when instrumenting Python.
**Binary**
Binary
A file written in the Executable and Linkable Format (ELF). This is the standard file
format for executable files, shared libraries, etc.
**Binary instrumentation**
Binary instrumentation
Inserting callbacks to instrumentation into an existing binary. This can be performed
statically or dynamically.
**Static binary instrumentation**
Static binary instrumentation
Loads an existing binary, determines instrumentation points, and generates a new binary
with instrumentation directly embedded. It is applicable to executables and libraries but
limited to only the functions defined in the binary. This is also known as **Binary rewrite**.
**Dynamic binary instrumentation**
Dynamic binary instrumentation
Loads an existing binary into memory, inserts instrumentation, and runs the binary.
It is limited to executables but is capable of instrumenting linked libraries.
This is also known as **Runtime instrumentation**.
**Statistical sampling**
Statistical sampling
At periodic intervals, the application is paused and the current call-stack of the CPU
is recorded along with various other metrics. It uses timers that measure either
(A) real clock time or (B) the CPU time used by the current thread and the CPU time
expended on behalf of the thread by the system. This is also known as simply **sampling**.
**Sampling rate**
Sampling rate
* The period at which (A) or (B) are triggered (in units of ``# interrupts / second``)
* Higher values increase the number of samples
**Sampling delay**
Sampling delay
* How long to wait before (A) and (B) begin triggering at their designated rate
**Sampling duration**
Sampling duration
* The amount of time (in real-time) after the start of the application to record samples.
* After this time limit has been reached, no more samples are recorded.
**Process sampling**
Process sampling
At periodic (real-time) intervals, a background thread records global metrics without
interrupting the current process. These metrics include, but are not limited to:
CPU frequency, CPU memory high-water mark (i.e. peak memory usage), GPU temperature,
and GPU power usage.
**Sampling rate**
Sampling rate
* The real-time period for recording metrics (in units of ``# measurements / second``)
* Higher values increase the number of samples
**Sampling delay**
Sampling delay
* How long to wait (in real-time) before recording samples
**Sampling duration**
Sampling duration
* The amount of time (in real-time) after the start of the application to record samples.
* After this time limit has been reached, no more samples are recorded.
**Module**
Module
With respect to binary instrumentation, a module is defined as either the filename
(such as ``foo.c``) or library name (``libfoo.so``) which contains the definition
of one or more functions.
@@ -74,18 +74,18 @@ when instrumenting Python.
the definition of one or more functions. The full path to this file typically contains the
name of the "Python module".
**Basic block**
Basic block
A straight-line code sequence with no branches in (except for the entry) and
no branches out (except for the exit).
**Address range**
Address range
The instructions for a function in a binary start at certain address with the ELF file
and end at a certain address. The range is ``end - start``.
The address range is a decent approximation for the "cost" of a function.
For example, a larger address range approximately equates to more instructions.
**Instrumentation traps**
Instrumentation traps
On the x86 architecture, because instructions are of variable size, an instruction
might be too small for Dyninst to replace it with the normal code sequence
used to call instrumentation. When instrumentation is placed at points other
@@ -93,10 +93,10 @@ when instrumenting Python.
the instrumentation fits. (By default, ``rocprof-sys-instrument`` avoids instrumentation
which requires a trap.)
**Overlapping functions**
Overlapping functions
Due to language constructs or compiler optimizations, it might be possible for
multiple functions to overlap (that is, share part of the same function body)
or for a single function to have multiple entry points. In practice, it's
impossible to determine the difference between multiple overlapping functions
and a single function with multiple entry points. (By default, ``rocprof-sys-instrument``
avoids instrumenting overlapping functions.)
avoids instrumenting overlapping functions.)
@@ -15,13 +15,6 @@ subtrees:
- file: install/install.rst
title: ROCm Systems Profiler installation guide
- caption: Tutorials
entries:
- url: https://github.com/ROCm/rocprofiler-systems/tree/amd-mainline/examples
title: GitHub examples
- file: tutorials/video-tutorials.rst
title: Video tutorials
- caption: How to
entries:
- file: how-to/configuring-validating-environment.rst
@@ -45,17 +38,17 @@ subtrees:
- caption: Conceptual
entries:
- file: conceptual/rocprof-sys-feature-set.rst
title: Features and use cases
- file: conceptual/data-collection-modes.rst
title: Data collection modes
- file: conceptual/rocprof-sys-feature-set.rst
title: The ROCm Systems Profiler feature set and use cases
- caption: Reference
entries:
- file: reference/development-guide.rst
title: Development guide
- file: reference/rocprof-sys-glossary.rst
title: ROCm Systems Profiler glossary
title: Glossary
- file: doxygen/html/files
title: API library
- file: doxygen/html/functions
@@ -65,6 +58,13 @@ subtrees:
- file: doxygen/html/annotated
title: Classes, structures, and interfaces
- caption: Tutorials
entries:
- url: https://github.com/ROCm/rocprofiler-systems/tree/amd-mainline/examples
title: GitHub examples
- file: tutorials/video-tutorials.rst
title: Video tutorials
- caption: About
entries:
- file: license.md
- file: license.rst
@@ -23,8 +23,8 @@ Instrumenting a binary
<p align="center"><iframe width="560" height="315" src="https://www.youtube.com/embed/2B0gRr3FygQ?modestbranding=1" title="YouTube video player" frameborder="0" allow="accelerometer; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
Writing an ROCm Systems Profiler configuration file
========================================
Writing a ROCm Systems Profiler configuration file
==================================================
.. raw:: html
@@ -35,4 +35,4 @@ Visualization and features of Perfetto traces
.. raw:: html
<p align="center"><iframe width="560" height="315" src="https://www.youtube.com/embed/7WN3N1hnCbI?modestbranding=1" title="YouTube video player" frameborder="0" allow="accelerometer; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
<p align="center"><iframe width="560" height="315" src="https://www.youtube.com/embed/7WN3N1hnCbI?modestbranding=1" title="YouTube video player" frameborder="0" allow="accelerometer; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p>
@@ -2,9 +2,9 @@
:description: ROCm Systems Profiler introduction, explanation, and reference
:keywords: rocprof-sys, rocprofiler-systems, Omnitrace, ROCm, profiler, explanation, introduction, what is, tracking, visualization, tool, Instinct, accelerator, AMD
******************
******************************
What is ROCm Systems Profiler?
******************
******************************
ROCm Systems Profiler is designed for the high-level profiling and comprehensive tracing
of applications running on the CPU or the CPU and GPU. It supports dynamic binary