diff --git a/projects/rocprofiler-systems/.wordlist.txt b/projects/rocprofiler-systems/.wordlist.txt index 94995602e9..f237d3901b 100644 --- a/projects/rocprofiler-systems/.wordlist.txt +++ b/projects/rocprofiler-systems/.wordlist.txt @@ -6,6 +6,7 @@ CrayPAT dl durations Dyninst +enp Kokkos KokkosP librocprof @@ -27,6 +28,7 @@ polymorphism POSIX ppc proc +proto Pthreads rocDecode ROCprofiler diff --git a/projects/rocprofiler-systems/CHANGELOG.md b/projects/rocprofiler-systems/CHANGELOG.md index bee1eac71d..75bafe2328 100644 --- a/projects/rocprofiler-systems/CHANGELOG.md +++ b/projects/rocprofiler-systems/CHANGELOG.md @@ -7,6 +7,7 @@ Full documentation for ROCm Systems Profiler is available at [https://rocm.docs. ### Added - Added profiling and metric collection capabilities for VCN engine activity, JPEG engine activity and API tracing for rocDecode, rocJPEG and VA-APIs. +- Added a "how-to" document for network performance profiling for standard Network Interface Cards (NICs). ### Changed diff --git a/projects/rocprofiler-systems/docs/data/rocprof-sys-perfetto-nic-trace.png b/projects/rocprofiler-systems/docs/data/rocprof-sys-perfetto-nic-trace.png new file mode 100644 index 0000000000..1e252d5ee9 Binary files /dev/null and b/projects/rocprofiler-systems/docs/data/rocprof-sys-perfetto-nic-trace.png differ diff --git a/projects/rocprofiler-systems/docs/how-to/nic-profiling.rst b/projects/rocprofiler-systems/docs/how-to/nic-profiling.rst new file mode 100644 index 0000000000..25b231af06 --- /dev/null +++ b/projects/rocprofiler-systems/docs/how-to/nic-profiling.rst @@ -0,0 +1,131 @@ +.. meta:: + :description: ROCm Systems Profiler network performance profiling + :keywords: rocprof-sys, rocprofiler-systems, ROCm, tips, how to, profiler, tracking, NIC, network, AMD + +******************************************** +Network performance profiling +******************************************** + +`ROCm Systems Profiler `_ supports network profiling. + +All network events that can be traced on the system can be listed by running the command: + +.. code-block:: shell + + rocprof-sys-avail -H -r net + +For example, if the system's NIC is enp7s0, then the output of this command looks like: + +.. code-block:: shell + + |-------------------------------|---------|-----------|-------------------------------| + | HARDWARE COUNTER | DEVICE | AVAILABLE | SUMMARY | + |-------------------------------|---------|-----------|-------------------------------| + | net:::enp7s0:rx:byte | CPU | true | enp7s0 receive byte | + | net:::enp7s0:rx:packet | CPU | true | enp7s0 receive packet | + | net:::enp7s0:rx:error | CPU | true | enp7s0 receive error | + | net:::enp7s0:rx:droppe | CPU | true | enp7s0 receive droppe | + | net:::enp7s0:rx:fif | CPU | true | enp7s0 receive fif | + | net:::enp7s0:rx:fram | CPU | true | enp7s0 receive fram | + | net:::enp7s0:rx:compresse | CPU | true | enp7s0 receive compresse | + | net:::enp7s0:rx:multicas | CPU | true | enp7s0 receive multicas | + | net:::enp7s0:tx:byte | CPU | true | enp7s0 transmit byte | + | net:::enp7s0:tx:packet | CPU | true | enp7s0 transmit packet | + | net:::enp7s0:tx:error | CPU | true | enp7s0 transmit error | + | net:::enp7s0:tx:droppe | CPU | true | enp7s0 transmit droppe | + | net:::enp7s0:tx:fif | CPU | true | enp7s0 transmit fif | + | net:::enp7s0:tx:coll | CPU | true | enp7s0 transmit coll | + | net:::enp7s0:tx:carrie | CPU | true | enp7s0 transmit carrie | + | net:::enp7s0:tx:compresse | CPU | true | enp7s0 transmit compresse | + |-------------------------------|---------|-----------|-------------------------------| + +To track bytes and packets sent and received by the NIC ``enp7s0``, the configuration parameters should be configured as the following example: + +.. code-block:: shell + + ROCPROFSYS_PAPI_EVENTS = net:::enp7s0:tx:byte net:::enp7s0:rx:byte net:::enp7s0:tx:packet net:::enp7s0:rx:packet + +Configuration +============= + +A sample configuration parameter settings looks like: + +.. code-block:: shell + + ROCPROFSYS_SAMPLING_FREQ=10 + ROCPROFSYS_USE_SAMPLING=ON + ROCPROFSYS_TIMEMORY_COMPONENTS=wall_clock papi_array network_stats + ROCPROFSYS_NETWORK_INTERFACE=enp7s0 + ROCPROFSYS_PAPI_EVENTS=net:::enp7s0:tx:byte net:::enp7s0:rx:byte net:::enp7s0:rx:packet net:::enp7s0:tx:packet + +Details of the configuration parameter settings configured in the example are: + +* **Sampling Frequency**: 10 samples per second +* **TIMEMORY**: Outputs the summaries for the ``wall_clock``, ``papi_array``, and ``network_stats`` components. +* **Network Interface**: ``enp7s0`` is the predictable network interface device name. +* **Events for the network device to be sampled**: Bytes transmitted, bytes received, packets transmitted, and packets received. + +The configuration parameter settings can be saved in a configuration file. Here is an example of a complete configuration file, ``rocprofsys.cfg``: + +.. code-block:: shell + + ROCPROFSYS_VERBOSE=1 + ROCPROFSYS_DL_VERBOSE=1 + ROCPROFSYS_SAMPLING_FREQ=10 + ROCPROFSYS_SAMPLING_DELAY=0.05 + ROCPROFSYS_SAMPLING_CPUS=0-9 + ROCPROFSYS_SAMPLING_GPUS=$env:HIP_VISIBLE_DEVICES + ROCPROFSYS_TRACE=ON + ROCPROFSYS_PROFILE=ON + ROCPROFSYS_USE_SAMPLING=ON + ROCPROFSYS_USE_PROCESS_SAMPLING=OFF + ROCPROFSYS_TIME_OUTPUT=OFF + ROCPROFSYS_FILE_OUTPUT=ON + ROCPROFSYS_TIMEMORY_COMPONENTS=wall_clock papi_array network_stats + ROCPROFSYS_USE_PID=OFF + ROCPROFSYS_OUTPUT_PREFIX=foo/ + ROCPROFSYS_NETWORK_INTERFACE=enp7s0 + ROCPROFSYS_PAPI_EVENTS = net:::enp7s0:tx:byte net:::enp7s0:rx:byte net:::enp7s0:rx:packet net:::enp7s0:tx:packet + +To specify the configuration file, use the ``ROCPROFSYS_CONFIG_FILE`` setting: + +.. code-block:: shell + + ROCPROFSYS_CONFIG_FILE=/path/to/rocprofsys.cfg + +This setting defines the location of the ROCm Systems Profiler configuration file. + +.. note:: + + To collect network counters using Process Application Program Interface (PAPI), ensure that + `/proc/sys/kernel/perf_event_paranoid` has a value <= 2. See + :ref:`rocprof-sys_papi_events` + for details. + +Instrumenting and running a program +=================================== + +An example rocprof-sys-instrument command is: + +.. code-block:: shell + + rocprof-sys-instrument -o foo.inst \ + --log-file mylog.log --verbose --debug \ + "--print-instrumented" "functions" "-e" "-v" "2" "--caller-include" \ + "inner" "-i" "4096" "--" ./foo + +This command generates an instrumented binary ``foo.inst``. Then, run +it with the following command: + +.. code-block:: shell + + rocprof-sys-run -- ./foo.inst + +To view the generated ``.proto`` file in the browser, open the +`Perfetto UI page `_. Then, click on +``Open trace file`` and select the ``.proto`` file. In the browser, it looks +like this: + +.. image:: ../data/rocprof-sys-perfetto-nic-trace.png + :alt: Visualization of a performance graph in Perfetto with network tracks + :width: 800 diff --git a/projects/rocprofiler-systems/docs/index.rst b/projects/rocprofiler-systems/docs/index.rst index bdf9cc1008..57dc081837 100644 --- a/projects/rocprofiler-systems/docs/index.rst +++ b/projects/rocprofiler-systems/docs/index.rst @@ -40,6 +40,7 @@ profiling, how it supports performance analysis, and how to leverage its capabil * :doc:`Instrumenting and rewriting a binary application <./how-to/instrumenting-rewriting-binary-application>` * :doc:`Performing causal profiling <./how-to/performing-causal-profiling>` * :doc:`Profiling Python scripts <./how-to/profiling-python-scripts>` + * :doc:`Network performance profiling <./how-to/nic-profiling>` * :doc:`Understanding the output <./how-to/understanding-rocprof-sys-output>` * :doc:`Using the ROCm Systems Profiler API <./how-to/using-rocprof-sys-api>` diff --git a/projects/rocprofiler-systems/docs/sphinx/_toc.yml.in b/projects/rocprofiler-systems/docs/sphinx/_toc.yml.in index 57fbf67997..711317e8bf 100644 --- a/projects/rocprofiler-systems/docs/sphinx/_toc.yml.in +++ b/projects/rocprofiler-systems/docs/sphinx/_toc.yml.in @@ -35,6 +35,8 @@ subtrees: title: Performing causal profiling - file: how-to/profiling-python-scripts.rst title: Profiling Python scripts + - file: how-to/nic-profiling.rst + title: Network performance profiling - file: how-to/understanding-rocprof-sys-output.rst title: Understanding the output - file: how-to/using-rocprof-sys-api.rst