[Rocprofiler-systems]: Documentation addition for xgmi and pcie metrics feature (#1798)

* Documentation addition for xgmi and pcie metrics feature

Add documentation describing how to collect XGMI and PCIe interconnect metrics.

* Apply suggestions from code review

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Update projects/rocprofiler-systems/CHANGELOG.md

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Update projects/rocprofiler-systems/CHANGELOG.md

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

---------

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
This commit is contained in:
Sajina PK
2025-11-17 18:34:28 -05:00
Committed by GitHub
Parent 53c9b9655d
Current commit f6183e3563
9 files changed, with 188 additions and 3 deletions
+2
@@ -8,6 +8,8 @@ Full documentation for ROCm Systems Profiler is available at [https://rocm.docs.
### Added
- Profiling and metric collection capabilities for XGMI and PCIe data.
- How-to document for XGMI and PCIe sampling and monitoring.
- Added a `ROCPROFSYS_PERFETTO_FLUSH_PERIOD_MS` configuration setting to set the flush period for Perfetto traces. The default value is 10000 ms (10 seconds).
- Added fetching of the `rocpd` schema from rocprofiler-sdk-rocpd
+3 -1
@@ -64,9 +64,11 @@ The documentation source files reside in the [`/docs`](/docs) folder of this rep
- Utilization
- VCN Utilization
- JPEG Utilization
- XGMI interconnect metrics (link width, link speed, read/write data)
- PCIe metrics (link width, link speed, bandwidth)
> [!NOTE]
> The availability of VCN and JPEG engine utilization depends on device support for different ASICs. If unsupported, all values for VCN_ACTIVITY and JPEG_ACTIVITY will be reported as N/A in the output of `amd-smi metric --usage`.
> The availability of VCN, JPEG, XGMI, and PCIe metrics depends on device support, system topology, and GPU architecture. If unsupported, all values will be reported as N/A in the output of `amd-smi metric --usage`.
### CPU metrics
+6 -1
@@ -62,7 +62,12 @@ GPU metrics
* Utilization
* VCN activity
* JPEG activity
Note: The availability of VCN and JPEG engine activity depends on device support for different ASICs. If unsupported, all values for VCN_ACTIVITY and JPEG_ACTIVITY will be reported as N/A in the output of amd-smi metric--usage.
* XGMI interconnect metrics (link width, link speed, read/write data)
* PCIe metrics (link width, link speed, bandwidth)
.. note::
The availability of VCN, JPEG, XGMI, and PCIe metrics depends on device support and system topology. If unsupported, values will be reported as ``N/A`` in the output of ``amd-smi metric --usage``.
CPU metrics
========================================
Binary file not shown (size after change: 232 KiB).

Binary file not shown (size after change: 241 KiB).

+1 -1
@@ -252,7 +252,7 @@ For example, the following is a valid configuration:
ROCPROFSYS_AMD_SMI_METRICS=busy,temp,power,vcn_activity,mem_usage
Supported values for ``ROCPROFSYS_AMD_SMI_METRICS`` are: ``busy``, ``temp``, ``power``, ``vcn_activity``, ``mem_usage``, ``jpeg_activity``.
Supported values for ``ROCPROFSYS_AMD_SMI_METRICS`` are: ``busy``, ``temp``, ``power``, ``vcn_activity``, ``mem_usage``, ``jpeg_activity``, ``xgmi``, ``pcie``.
API tracing is configured with the ``ROCPROFSYS_ROCM_DOMAINS`` setting. The domains are used to filter the events that are captured during profiling.
Supported values for this setting are those supported by ROCprofiler-SDK, which are returned by the API ``get_callback_tracing_names()`` and ``get_buffer_tracing_names()``. See the `ROCprofiler-SDK developer API documentation <https://rocm.docs.amd.com/projects/rocprofiler-sdk/en/latest/_doxygen/rocprofiler-sdk/html/>`_ to learn more about ROCprofiler-SDK APIs.
+173
@@ -0,0 +1,173 @@
.. meta::
:description: ROCm Systems Profiler XGMI and PCIe metrics sampling and monitoring
:keywords: rocprof-sys, rocprofiler-systems, ROCm, tips, how to, profiler, tracking, XGMI, PCIe, GPU connectivity, AMD
***********************************************
XGMI and PCIe metrics sampling and monitoring
***********************************************
`ROCm Systems Profiler <https://github.com/ROCm/rocm-systems/tree/develop/projects/rocprofiler-systems>`_ supports
sampling of XGMI and PCIe interconnect metrics. It allows you to gather key performance metrics for
GPU-to-GPU communication via XGMI links, and CPU-to-GPU communication via PCIe links. This information can be used
to optimize multi-GPU workloads, identify communication bottlenecks, and analyze data transfer efficiency
in high-performance computing applications.
Sampling support
=================
Sampling of XGMI and PCIe interconnect metrics relies on `AMD SMI <https://rocm.docs.amd.com/projects/amdsmi/en/latest/>`_, which provides the interface for GPU metric collection. To enable it, follow these steps:
1. Set the ``ROCPROFSYS_USE_AMD_SMI`` environment variable to enable GPU metric collection:
.. code-block:: shell
export ROCPROFSYS_USE_AMD_SMI=true
2. Update the ``ROCPROFSYS_AMD_SMI_METRICS`` variable to collect the XGMI and PCIe metrics. The default value is:
.. code-block:: shell
ROCPROFSYS_AMD_SMI_METRICS=busy,temp,power,mem_usage
To include XGMI and PCIe metrics, update it to:
.. code-block:: shell
ROCPROFSYS_AMD_SMI_METRICS=busy,temp,power,mem_usage,xgmi,pcie
Alternatively, you can use the following to collect all available GPU metrics:
.. code-block:: shell
ROCPROFSYS_AMD_SMI_METRICS=all
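If ``ROCPROFSYS_AMD_SMI_METRICS`` is already set in your environment, the two steps above can be scripted so that ``xgmi`` and ``pcie`` are appended without duplicating entries. The following is a minimal sketch; ``add_smi_metric`` is a hypothetical helper written for this example and is not part of ROCm Systems Profiler.

```shell
# Append a metric to ROCPROFSYS_AMD_SMI_METRICS unless it is already listed.
# add_smi_metric is a hypothetical helper, not a rocprofiler-systems command.
add_smi_metric() {
    case ",${ROCPROFSYS_AMD_SMI_METRICS:-}," in
        *",$1,"*) ;;  # already present, nothing to do
        *) ROCPROFSYS_AMD_SMI_METRICS="${ROCPROFSYS_AMD_SMI_METRICS:+${ROCPROFSYS_AMD_SMI_METRICS},}$1" ;;
    esac
    export ROCPROFSYS_AMD_SMI_METRICS
}

export ROCPROFSYS_USE_AMD_SMI=true
# Start from the documented default when the variable is unset or empty.
: "${ROCPROFSYS_AMD_SMI_METRICS:=busy,temp,power,mem_usage}"
add_smi_metric xgmi
add_smi_metric pcie
echo "ROCPROFSYS_AMD_SMI_METRICS=${ROCPROFSYS_AMD_SMI_METRICS}"
```

The helper does not special-case the value ``all``; if you use ``all``, no additions are needed.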
XGMI metrics
------------
XGMI (AMD Infinity Fabric™ XGMI) provides high-bandwidth, low-latency GPU-to-GPU interconnects in multi-GPU systems. The following XGMI metrics are collected:
- **XGMI Link Width**: The number of active XGMI links between GPUs
- **XGMI Link Speed**: The speed of XGMI links (in GT/s)
- **XGMI Read Data**: Accumulated data read through each XGMI link (in KB)
- **XGMI Write Data**: Accumulated data written through each XGMI link (in KB)
These metrics help identify GPU-to-GPU communication patterns and bandwidth utilization in multi-GPU workloads.
.. note::
XGMI metrics are only available on systems with multiple GPUs connected via XGMI links.
The availability depends on the system topology and GPU architecture. If unsupported or not
available, the values will be reported as N/A in the output.
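Because the XGMI read and write values are accumulated counters, an average throughput has to be derived from the difference between two samples. The sketch below illustrates that arithmetic; ``xgmi_rate_mb_s`` is a hypothetical helper, the sample values are made up, and the conversion assumes 1 MB = 1024 KB.

```shell
# xgmi_rate_mb_s is a hypothetical helper: average throughput in MB/s between
# two samples of an accumulated XGMI counter (in KB) taken dt seconds apart.
xgmi_rate_mb_s() {
    # $1 = earlier counter (KB), $2 = later counter (KB), $3 = elapsed seconds
    awk -v a="$1" -v b="$2" -v dt="$3" \
        'BEGIN { printf "%.2f\n", (b - a) / 1024 / dt }'
}

# Made-up sample values, for illustration only:
# 2097152 KB transferred over 2 seconds.
xgmi_rate_mb_s 1048576 3145728 2
```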
PCIe metrics
------------
PCIe (PCI Express) provides the connection between the CPU and GPU. The following PCIe metrics are collected:
- **PCIe Link Width**: The number of PCIe lanes currently active
- **PCIe Link Speed**: The current PCIe link generation and speed (e.g., Gen3, Gen4, Gen5)
- **PCIe Bandwidth Accumulated**: Total bandwidth accumulated over time (in MB)
- **PCIe Bandwidth Instantaneous**: Instantaneous bandwidth at the time of sampling (in MB/s)
These metrics help analyze CPU-to-GPU data transfer efficiency and identify PCIe bottlenecks.
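To judge whether a measured PCIe bandwidth is close to a bottleneck, it can help to compare it with the theoretical peak for the reported link width and generation. The following sketch assumes the standard per-lane rates of 8, 16, and 32 GT/s for Gen3, Gen4, and Gen5 with 128b/130b encoding; ``pcie_peak_gb_s`` is a hypothetical helper, and it ignores protocol overhead, so real transfers are expected to come in lower.

```shell
# pcie_peak_gb_s is a hypothetical helper: theoretical one-direction peak
# bandwidth in GB/s for a PCIe Gen3, Gen4, or Gen5 link.
pcie_peak_gb_s() {
    # $1 = generation (3, 4, or 5), $2 = lane count (for example 16)
    awk -v gen="$1" -v lanes="$2" 'BEGIN {
        rate[3] = 8; rate[4] = 16; rate[5] = 32   # GT/s per lane
        if (!(gen in rate)) exit 1                # unsupported generation
        # 128b/130b line encoding, 8 bits per byte
        printf "%.2f\n", rate[gen] * lanes * 128 / 130 / 8
    }'
}

# Example: a Gen4 x16 link.
pcie_peak_gb_s 4 16
```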
Using TransferBench for testing
================================
For testing and benchmarking GPU connectivity, you can use `TransferBench <https://rocm.docs.amd.com/projects/TransferBench/en/latest/index.html>`_.
TransferBench is a benchmarking utility designed to measure the performance of simultaneous data transfers between user-specified devices, such as CPUs and GPUs.
For this example, TransferBench is used to profile XGMI and PCIe traffic for analysis.
1. Source the ROCm Systems Profiler environment using:
.. code-block:: shell
source /opt/rocprofiler-systems/share/rocprofiler-systems/setup-env.sh
Alternatively, if you are using modules, use:
.. code-block:: shell
module use /opt/rocprofiler-systems/share/modulefiles
2. Generate and configure the profiler config file.
.. code-block:: shell
rocprof-sys-avail -G $HOME/.rocprofsys.cfg -F txt
export ROCPROFSYS_CONFIG_FILE=$HOME/.rocprofsys.cfg
Edit ``.rocprofsys.cfg`` with the following settings:
.. code-block:: shell
ROCPROFSYS_USE_AMD_SMI = true
ROCPROFSYS_AMD_SMI_METRICS = busy,temp,power,mem_usage,xgmi,pcie
ROCPROFSYS_ROCM_DOMAINS = hip_runtime_api,memory_copy,hsa_api
3. Profile the TransferBench application.
.. code-block:: shell
rocprof-sys-sample -PTHD -- ./TransferBench a2a
.. note::
Refer to these steps to `Install and build TransferBench <https://rocm.docs.amd.com/projects/TransferBench/en/latest/install/install.html#install-transferbench>`_.
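The manual edit in step 2 can also be scripted. The sketch below assumes the generated text config uses ``KEY = value`` lines, as in the settings shown in step 2; ``set_cfg`` is a hypothetical helper, and the demonstration runs against a scratch file rather than ``$HOME/.rocprofsys.cfg``.

```shell
# set_cfg is a hypothetical helper: remove any existing "KEY = ..." line from
# a config file and append the new assignment at the end of the file.
set_cfg() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp)
    # grep -v exits non-zero when it prints nothing, hence the "|| true".
    grep -v "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$tmp" || true
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    mv "$tmp" "$file"
}

# Demonstrate on a scratch file instead of the real config file.
cfg=$(mktemp)
printf 'ROCPROFSYS_USE_AMD_SMI = false\n' > "$cfg"
set_cfg "$cfg" ROCPROFSYS_USE_AMD_SMI true
set_cfg "$cfg" ROCPROFSYS_AMD_SMI_METRICS busy,temp,power,mem_usage,xgmi,pcie
set_cfg "$cfg" ROCPROFSYS_ROCM_DOMAINS hip_runtime_api,memory_copy,hsa_api
cat "$cfg"
```

To apply it to a real configuration, pass the path stored in ``ROCPROFSYS_CONFIG_FILE`` as the first argument.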
At the end of the run, a message similar to the following appears::
[rocprofiler-systems][964294][perfetto]> Outputting '/home/demo/rocprofsys-transferBench-output/2025-04-25_15.52/perfetto-trace-964294.proto'
(3124.52 KB / 3.12 MB / 0.00 GB)... Done
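Since each run writes its trace into a timestamped directory, it can be convenient to locate the newest trace from the shell. This is a minimal sketch that assumes trace files follow the ``perfetto-trace-<pid>.proto`` naming seen in the message above; ``latest_trace`` is a hypothetical helper and the search directory is a placeholder.

```shell
# latest_trace is a hypothetical helper: print the most recently modified
# perfetto-trace-*.proto file found under the given directory.
latest_trace() {
    find "$1" -type f -name 'perfetto-trace-*.proto' -exec ls -t {} + 2>/dev/null | head -n 1
}

# Placeholder: replace "." with your profiler output directory.
output_dir="."
latest_trace "$output_dir"
```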
To view the generated ``.proto`` file in the browser, open the
`Perfetto UI page <https://ui.perfetto.dev/>`_. Then, click on
``Open trace file`` and select the ``.proto`` file. In the browser, you can visualize the XGMI and PCIe metrics.
.. image:: ../data/rocprof-sys-xgmi.png
:alt: Visualization of a performance graph in Perfetto with XGMI tracks
.. image:: ../data/rocprof-sys-pcie.png
:alt: Visualization of a performance graph in Perfetto with PCIe tracks
The visualization will show:
- **XGMI Read Data** and **XGMI Write Data** tracks showing data transfer through XGMI links over time
- **XGMI Link Width** and **XGMI Link Speed** tracks showing link configuration
- **PCIe Bandwidth** tracks showing CPU-to-GPU data transfer rates
- **PCIe Link Width** and **PCIe Link Speed** tracks showing PCIe link configuration
Tips for effective profiling
=============================
1. **Multi-GPU workloads**: XGMI metrics are most useful when profiling applications that use multiple GPUs and transfer data between them.
2. **Sampling frequency**: Adjust the sampling frequency using ``ROCPROFSYS_PROCESS_SAMPLING_FREQ`` (the default is 50 Hz) to capture more or fewer samples based on your analysis needs.
3. **Focus on specific metrics**: If you only need XGMI or PCIe metrics, you can specify just those:
.. code-block:: shell
ROCPROFSYS_AMD_SMI_METRICS=xgmi # Only XGMI metrics
ROCPROFSYS_AMD_SMI_METRICS=pcie # Only PCIe metrics
4. **Combine with API tracing**: For detailed analysis, combine XGMI/PCIe metrics with HIP/HSA API tracing to correlate data transfers with application behavior:
.. code-block:: shell
ROCPROFSYS_ROCM_DOMAINS=hip_runtime_api,memory_copy,kernel_dispatch,hsa_api
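As an illustration of the third tip, the metric selection can be set per run by prefixing the command with the variable, which avoids changing the exported environment. The sketch below assumes the settings are read from the environment at launch, as in the ``export`` steps earlier, and reuses the ``./TransferBench a2a`` workload from the example. The guard only keeps the loop from failing on machines without the profiler.

```shell
# Run the same workload once per metric group, with per-run settings.
for metrics in xgmi pcie; do
    if command -v rocprof-sys-sample >/dev/null 2>&1; then
        ROCPROFSYS_USE_AMD_SMI=true ROCPROFSYS_AMD_SMI_METRICS="$metrics" \
            rocprof-sys-sample -PTHD -- ./TransferBench a2a
    else
        echo "rocprof-sys-sample not found; skipping $metrics run"
    fi
done
```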
Exploring available metrics
============================
To explore all supported metrics and domains, use the following commands:
.. code-block:: shell
rocprof-sys-avail --all # Show all available options
rocprof-sys-avail -bd -r AMD_SMI_METRICS # Show AMD SMI metrics
rocprof-sys-avail -bd -r ROCM_DOMAINS # Show ROCm tracing domains
For more details on ROCm Systems Profiler configuration, refer to the `configuration guide <configuring-runtime-options.html>`_.
+1
@@ -41,6 +41,7 @@ profiling, how it supports performance analysis, and how to leverage its capabil
* :doc:`Profiling Python scripts <./how-to/profiling-python-scripts>`
* :doc:`Network performance profiling <./how-to/nic-profiling>`
* :doc:`VCN and JPEG sampling and tracing <./how-to/vcn-jpeg-sampling>`
* :doc:`XGMI and PCIe metrics monitoring <./how-to/xgmi-pcie-sampling>`
* :doc:`Understanding the output <./how-to/understanding-rocprof-sys-output>`
* :doc:`Using the ROCm Systems Profiler API <./how-to/using-rocprof-sys-api>`
+2
@@ -37,6 +37,8 @@ subtrees:
title: Network performance profiling
- file: how-to/vcn-jpeg-sampling.rst
title: VCN and JPEG sampling and tracing
- file: how-to/xgmi-pcie-sampling.rst
title: XGMI and PCIe metrics monitoring
- file: how-to/understanding-rocprof-sys-output.rst
title: Understanding the output
- file: how-to/using-rocprof-sys-api.rst