"How-To" document describing network performance profiling (#145)

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

[ROCm/rocprofiler-systems commit: de84a277f2]
ajanicijamd
2025-04-09 16:11:52 -04:00
committed by GitHub
parent e004775878
commit 01c1cbe57f
6 changed files: 137 additions and 0 deletions
+2
@@ -6,6 +6,7 @@ CrayPAT
dl
durations
Dyninst
enp
Kokkos
KokkosP
librocprof
@@ -27,6 +28,7 @@ polymorphism
POSIX
ppc
proc
proto
Pthreads
rocDecode
ROCprofiler
+1
@@ -7,6 +7,7 @@ Full documentation for ROCm Systems Profiler is available at [https://rocm.docs.
### Added
- Added profiling and metric collection capabilities for VCN engine activity, JPEG engine activity and API tracing for rocDecode, rocJPEG and VA-APIs.
- Added a "how-to" document for network performance profiling for standard Network Interface Cards (NICs).
### Changed
Binary file not shown.

Size: 905 KiB

+131
@@ -0,0 +1,131 @@
.. meta::
   :description: ROCm Systems Profiler network performance profiling
   :keywords: rocprof-sys, rocprofiler-systems, ROCm, tips, how to, profiler, tracking, NIC, network, AMD

********************************************
Network performance profiling
********************************************

`ROCm Systems Profiler <https://github.com/ROCm/rocprofiler-systems>`_ supports network profiling.
To list all network events that can be traced on the system, run:

.. code-block:: shell

   rocprof-sys-avail -H -r net

For example, if the system's NIC is ``enp7s0``, then the output of this command looks like:

.. code-block:: shell

   |-------------------------------|---------|-----------|-------------------------------|
   |       HARDWARE COUNTER        | DEVICE  | AVAILABLE |            SUMMARY            |
   |-------------------------------|---------|-----------|-------------------------------|
   | net:::enp7s0:rx:byte          |   CPU   |   true    | enp7s0 receive byte           |
   | net:::enp7s0:rx:packet        |   CPU   |   true    | enp7s0 receive packet         |
   | net:::enp7s0:rx:error         |   CPU   |   true    | enp7s0 receive error          |
   | net:::enp7s0:rx:droppe        |   CPU   |   true    | enp7s0 receive droppe         |
   | net:::enp7s0:rx:fif           |   CPU   |   true    | enp7s0 receive fif            |
   | net:::enp7s0:rx:fram          |   CPU   |   true    | enp7s0 receive fram           |
   | net:::enp7s0:rx:compresse     |   CPU   |   true    | enp7s0 receive compresse      |
   | net:::enp7s0:rx:multicas      |   CPU   |   true    | enp7s0 receive multicas       |
   | net:::enp7s0:tx:byte          |   CPU   |   true    | enp7s0 transmit byte          |
   | net:::enp7s0:tx:packet        |   CPU   |   true    | enp7s0 transmit packet        |
   | net:::enp7s0:tx:error         |   CPU   |   true    | enp7s0 transmit error         |
   | net:::enp7s0:tx:droppe        |   CPU   |   true    | enp7s0 transmit droppe        |
   | net:::enp7s0:tx:fif           |   CPU   |   true    | enp7s0 transmit fif           |
   | net:::enp7s0:tx:coll          |   CPU   |   true    | enp7s0 transmit coll          |
   | net:::enp7s0:tx:carrie        |   CPU   |   true    | enp7s0 transmit carrie        |
   | net:::enp7s0:tx:compresse     |   CPU   |   true    | enp7s0 transmit compresse     |
   |-------------------------------|---------|-----------|-------------------------------|

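
The interface name embedded in each event corresponds to a NIC present on the system. As a minimal sketch, assuming a Linux system that exposes ``/proc/net/dev``, the interface names can be listed with a small helper. The name ``list_nics`` is hypothetical and is not part of ROCm Systems Profiler:

.. code-block:: shell

   # list_nics [FILE]: print the interface names found in a
   # /proc/net/dev-style file. FILE defaults to /proc/net/dev (Linux only).
   list_nics() {
       # The first two lines are column headers; every later line starts
       # with "<name>:" padded with leading spaces.
       awk -F: 'NR > 2 { gsub(/[ \t]/, "", $1); print $1 }' "${1:-/proc/net/dev}"
   }

Calling ``list_nics`` with no argument reads the live kernel table. A name it prints, such as ``enp7s0``, can then be matched against the ``net:::`` events in the listing above.
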
To track bytes and packets sent and received by the NIC ``enp7s0``, set ``ROCPROFSYS_PAPI_EVENTS`` as in the following example:

.. code-block:: shell

   ROCPROFSYS_PAPI_EVENTS = net:::enp7s0:tx:byte net:::enp7s0:rx:byte net:::enp7s0:tx:packet net:::enp7s0:rx:packet

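
For a different NIC, the same event list can be generated rather than typed by hand. The following is a sketch only: it assumes the event names follow the ``net:::<nic>:<direction>:<counter>`` pattern shown in the listing above, and the variable names are illustrative.

.. code-block:: shell

   nic=enp7s0    # assumed interface name; replace with the NIC on your system
   events=""
   for direction in tx rx; do
       for counter in byte packet; do
           # Append each event, separated from the previous one by a space.
           events="${events:+$events }net:::${nic}:${direction}:${counter}"
       done
   done
   echo "ROCPROFSYS_PAPI_EVENTS = ${events}"

The printed line can be pasted into a configuration file. The events appear in a different order than in the example above, but the same four counters are selected.
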
Configuration
=============
A sample set of configuration parameters looks like this:

.. code-block:: shell

   ROCPROFSYS_SAMPLING_FREQ=10
   ROCPROFSYS_USE_SAMPLING=ON
   ROCPROFSYS_TIMEMORY_COMPONENTS=wall_clock papi_array network_stats
   ROCPROFSYS_NETWORK_INTERFACE=enp7s0
   ROCPROFSYS_PAPI_EVENTS=net:::enp7s0:tx:byte net:::enp7s0:rx:byte net:::enp7s0:rx:packet net:::enp7s0:tx:packet

The settings in this example are:

* **Sampling Frequency**: 10 samples per second.
* **TIMEMORY**: Outputs the summaries for the ``wall_clock``, ``papi_array``, and ``network_stats`` components.
* **Network Interface**: ``enp7s0`` is the predictable network interface device name.
* **Events for the network device to be sampled**: Bytes transmitted, bytes received, packets transmitted, and packets received.

The configuration parameter settings can be saved in a configuration file. Here is an example of a complete configuration file, ``rocprofsys.cfg``:

.. code-block:: shell

   ROCPROFSYS_VERBOSE=1
   ROCPROFSYS_DL_VERBOSE=1
   ROCPROFSYS_SAMPLING_FREQ=10
   ROCPROFSYS_SAMPLING_DELAY=0.05
   ROCPROFSYS_SAMPLING_CPUS=0-9
   ROCPROFSYS_SAMPLING_GPUS=$env:HIP_VISIBLE_DEVICES
   ROCPROFSYS_TRACE=ON
   ROCPROFSYS_PROFILE=ON
   ROCPROFSYS_USE_SAMPLING=ON
   ROCPROFSYS_USE_PROCESS_SAMPLING=OFF
   ROCPROFSYS_TIME_OUTPUT=OFF
   ROCPROFSYS_FILE_OUTPUT=ON
   ROCPROFSYS_TIMEMORY_COMPONENTS=wall_clock papi_array network_stats
   ROCPROFSYS_USE_PID=OFF
   ROCPROFSYS_OUTPUT_PREFIX=foo/
   ROCPROFSYS_NETWORK_INTERFACE=enp7s0
   ROCPROFSYS_PAPI_EVENTS = net:::enp7s0:tx:byte net:::enp7s0:rx:byte net:::enp7s0:rx:packet net:::enp7s0:tx:packet

To specify the configuration file, use the ``ROCPROFSYS_CONFIG_FILE`` setting:

.. code-block:: shell

   ROCPROFSYS_CONFIG_FILE=/path/to/rocprofsys.cfg

This setting defines the location of the ROCm Systems Profiler configuration file.

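
As a minimal sketch of how these pieces fit together, the snippet below writes a short configuration file and points ``ROCPROFSYS_CONFIG_FILE`` at it. It assumes a POSIX shell and that the setting is passed as an exported environment variable; the file location under ``/tmp`` is a hypothetical choice.

.. code-block:: shell

   # Hypothetical location; any writable path works.
   cfg="${TMPDIR:-/tmp}/rocprofsys.cfg"

   # Write the sample network profiling settings from this page.
   cat > "$cfg" <<'EOF'
   ROCPROFSYS_SAMPLING_FREQ=10
   ROCPROFSYS_USE_SAMPLING=ON
   ROCPROFSYS_TIMEMORY_COMPONENTS=wall_clock papi_array network_stats
   ROCPROFSYS_NETWORK_INTERFACE=enp7s0
   ROCPROFSYS_PAPI_EVENTS=net:::enp7s0:tx:byte net:::enp7s0:rx:byte net:::enp7s0:rx:packet net:::enp7s0:tx:packet
   EOF

   # Export the variable so that commands started from this shell inherit it.
   export ROCPROFSYS_CONFIG_FILE="$cfg"

Quoting the heredoc delimiter (``'EOF'``) keeps the shell from expanding anything inside the file body, so entries such as ``$env:HIP_VISIBLE_DEVICES`` would be written literally if they were added.
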
.. note::

   To collect network counters using the Performance Application Programming Interface (PAPI), ensure that
   ``/proc/sys/kernel/perf_event_paranoid`` has a value <= 2. See
   :ref:`rocprof-sys_papi_events`
   for details.

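
The requirement in the note can be checked before profiling. This is a sketch under the assumption of a Linux system with a POSIX shell; the helper name ``paranoid_ok`` is hypothetical.

.. code-block:: shell

   # paranoid_ok [FILE]: succeed when the integer stored in FILE is at most 2.
   # FILE defaults to /proc/sys/kernel/perf_event_paranoid.
   paranoid_ok() {
       # Return 2 if the file cannot be read at all.
       level=$(cat "${1:-/proc/sys/kernel/perf_event_paranoid}") || return 2
       [ "$level" -le 2 ]
   }

If ``paranoid_ok`` fails, an administrator can lower the value, for example with ``sudo sysctl -w kernel.perf_event_paranoid=2``. That change lasts until the next reboot.
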
Instrumenting and running a program
===================================
An example ``rocprof-sys-instrument`` command is:

.. code-block:: shell

   rocprof-sys-instrument -o foo.inst \
       --log-file mylog.log --verbose --debug \
       --print-instrumented functions -e -v 2 --caller-include inner \
       -i 4096 -- ./foo

This command generates an instrumented binary, ``foo.inst``. Then, run
it with the following command:

.. code-block:: shell

   rocprof-sys-run -- ./foo.inst

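
Where the trace is written depends on settings such as ``ROCPROFSYS_OUTPUT_PREFIX``, so the exact path is not assumed here. As a sketch, a generic search for ``.proto`` files below a directory can locate it; the helper name ``find_traces`` is hypothetical.

.. code-block:: shell

   # find_traces [DIR]: list every .proto file below DIR
   # (default: the current directory).
   find_traces() {
       find "${1:-.}" -type f -name '*.proto'
   }

Run ``find_traces`` from the directory where the instrumented binary was launched, or pass the output directory as the argument.
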
To view the generated ``.proto`` file in the browser, open the
`Perfetto UI page <https://ui.perfetto.dev/>`_. Then, click
``Open trace file`` and select the ``.proto`` file. In the browser, it looks
like this:

.. image:: ../data/rocprof-sys-perfetto-nic-trace.png
   :alt: Visualization of a performance graph in Perfetto with network tracks
   :width: 800
+1
@@ -40,6 +40,7 @@ profiling, how it supports performance analysis, and how to leverage its capabil
* :doc:`Instrumenting and rewriting a binary application <./how-to/instrumenting-rewriting-binary-application>`
* :doc:`Performing causal profiling <./how-to/performing-causal-profiling>`
* :doc:`Profiling Python scripts <./how-to/profiling-python-scripts>`
* :doc:`Network performance profiling <./how-to/nic-profiling>`
* :doc:`Understanding the output <./how-to/understanding-rocprof-sys-output>`
* :doc:`Using the ROCm Systems Profiler API <./how-to/using-rocprof-sys-api>`
+2
@@ -35,6 +35,8 @@ subtrees:
title: Performing causal profiling
- file: how-to/profiling-python-scripts.rst
title: Profiling Python scripts
- file: how-to/nic-profiling.rst
title: Network performance profiling
- file: how-to/understanding-rocprof-sys-output.rst
title: Understanding the output
- file: how-to/using-rocprof-sys-api.rst