diff --git a/projects/rocprofiler-systems/CHANGELOG.md b/projects/rocprofiler-systems/CHANGELOG.md
index 2fabba4301..7cdbb61f5b 100644
--- a/projects/rocprofiler-systems/CHANGELOG.md
+++ b/projects/rocprofiler-systems/CHANGELOG.md
@@ -8,6 +8,8 @@ Full documentation for ROCm Systems Profiler is available at [https://rocm.docs.
### Added
+- Profiling and metric collection capabilities for XGMI and PCIe data.
+- How-to document for XGMI and PCIe sampling and monitoring.
- Added a `ROCPROFSYS_PERFETTO_FLUSH_PERIOD_MS` configuration setting to set the flush period for Perfetto traces. The default value is 10000 ms (10 seconds).
- Added fetching of the `rocpd` schema from rocprofiler-sdk-rocpd
diff --git a/projects/rocprofiler-systems/README.md b/projects/rocprofiler-systems/README.md
index 29cd6a4eee..38cfa564c1 100755
--- a/projects/rocprofiler-systems/README.md
+++ b/projects/rocprofiler-systems/README.md
@@ -64,9 +64,11 @@ The documentation source files reside in the [`/docs`](/docs) folder of this rep
- Utilization
- VCN Utilization
- JPEG Utilization
+ - XGMI interconnect metrics (link width, link speed, read/write data)
+ - PCIe metrics (link width, link speed, bandwidth)
> [!NOTE]
-> The availability of VCN and JPEG engine utilization depends on device support for different ASICs. If unsupported, all values for VCN_ACTIVITY and JPEG_ACTIVITY will be reported as N/A in the output of `amd-smi metric --usage`.
+> The availability of VCN, JPEG, XGMI, and PCIe metrics depends on device support, system topology, and GPU architecture. If unsupported, all values will be reported as N/A in the output of `amd-smi metric --usage`.
### CPU metrics
diff --git a/projects/rocprofiler-systems/docs/conceptual/rocprof-sys-feature-set.rst b/projects/rocprofiler-systems/docs/conceptual/rocprof-sys-feature-set.rst
index 6f7b2247bf..6601d89d56 100644
--- a/projects/rocprofiler-systems/docs/conceptual/rocprof-sys-feature-set.rst
+++ b/projects/rocprofiler-systems/docs/conceptual/rocprof-sys-feature-set.rst
@@ -62,7 +62,12 @@ GPU metrics
* Utilization
* VCN activity
* JPEG activity
- Note: The availability of VCN and JPEG engine activity depends on device support for different ASICs. If unsupported, all values for VCN_ACTIVITY and JPEG_ACTIVITY will be reported as N/A in the output of amd-smi metric--usage.
+ * XGMI interconnect metrics (link width, link speed, read/write data)
+ * PCIe metrics (link width, link speed, bandwidth)
+
+ .. note::
+
+ The availability of VCN, JPEG, XGMI, and PCIe metrics depends on device support and system topology. If unsupported, values will be reported as ``N/A`` in the output of ``amd-smi metric --usage``.
CPU metrics
========================================
diff --git a/projects/rocprofiler-systems/docs/data/rocprof-sys-pcie.png b/projects/rocprofiler-systems/docs/data/rocprof-sys-pcie.png
new file mode 100644
index 0000000000..37094b601f
Binary files /dev/null and b/projects/rocprofiler-systems/docs/data/rocprof-sys-pcie.png differ
diff --git a/projects/rocprofiler-systems/docs/data/rocprof-sys-xgmi.png b/projects/rocprofiler-systems/docs/data/rocprof-sys-xgmi.png
new file mode 100644
index 0000000000..628eee43ab
Binary files /dev/null and b/projects/rocprofiler-systems/docs/data/rocprof-sys-xgmi.png differ
diff --git a/projects/rocprofiler-systems/docs/how-to/configuring-runtime-options.rst b/projects/rocprofiler-systems/docs/how-to/configuring-runtime-options.rst
index 89a0017671..8b5f53f8a7 100644
--- a/projects/rocprofiler-systems/docs/how-to/configuring-runtime-options.rst
+++ b/projects/rocprofiler-systems/docs/how-to/configuring-runtime-options.rst
@@ -252,7 +252,7 @@ For example, the following is a valid configuration:
ROCPROFSYS_AMD_SMI_METRICS=busy,temp,power,vcn_activity,mem_usage
-Supported values for ``ROCPROFSYS_AMD_SMI_METRICS`` are: ``busy``, ``temp``, ``power``, ``vcn_activity``, ``mem_usage``, ``jpeg_activity``.
+Supported values for ``ROCPROFSYS_AMD_SMI_METRICS`` are: ``busy``, ``temp``, ``power``, ``vcn_activity``, ``mem_usage``, ``jpeg_activity``, ``xgmi``, ``pcie``.
API tracing is configured with the ``ROCPROFSYS_ROCM_DOMAINS`` setting. The domains are used to filter the events that are captured during profiling.
Supported values for this setting are those supported by ROCprofiler-SDK, which are returned by the API ``get_callback_tracing_names()`` and ``get_buffer_tracing_names()``. See the `ROCprofiler-SDK developer API documentation `_ to learn more about ROCprofiler-SDK APIs.
diff --git a/projects/rocprofiler-systems/docs/how-to/xgmi-pcie-sampling.rst b/projects/rocprofiler-systems/docs/how-to/xgmi-pcie-sampling.rst
new file mode 100644
index 0000000000..2c0033488c
--- /dev/null
+++ b/projects/rocprofiler-systems/docs/how-to/xgmi-pcie-sampling.rst
@@ -0,0 +1,173 @@
+.. meta::
+ :description: ROCm Systems Profiler XGMI and PCIe metrics sampling and monitoring
+ :keywords: rocprof-sys, rocprofiler-systems, ROCm, tips, how to, profiler, tracking, XGMI, PCIe, GPU connectivity, AMD
+
+***********************************************
+XGMI and PCIe metrics sampling and monitoring
+***********************************************
+
+`ROCm Systems Profiler `_ supports
+sampling of XGMI and PCIe interconnect metrics. It allows you to gather key performance metrics for
+GPU-to-GPU communication via XGMI links, and CPU-to-GPU communication via PCIe links. This information can be used
+to optimize multi-GPU workloads, identify communication bottlenecks, and analyze data transfer efficiency
+in high-performance computing applications.
+
+Sampling support
+=================
+
+Sampling of XGMI and PCIe interconnect metrics relies on `AMD SMI `_, which provides the interface for GPU metric collection. To enable it, follow these steps:
+
+1. Set the ``ROCPROFSYS_USE_AMD_SMI`` environment variable to enable GPU metric collection:
+
+.. code-block:: shell
+
+ export ROCPROFSYS_USE_AMD_SMI=true
+
+2. Update the ``ROCPROFSYS_AMD_SMI_METRICS`` variable to collect the XGMI and PCIe metrics. The default value is:
+
+.. code-block:: shell
+
+ ROCPROFSYS_AMD_SMI_METRICS=busy,temp,power,mem_usage
+
+To include XGMI and PCIe metrics, update it to:
+
+.. code-block:: shell
+
+ ROCPROFSYS_AMD_SMI_METRICS=busy,temp,power,mem_usage,xgmi,pcie
+
+Alternatively, you can use the following to collect all available GPU metrics:
+
+.. code-block:: shell
+
+ ROCPROFSYS_AMD_SMI_METRICS=all
+
+XGMI metrics
+------------
+
+XGMI (AMD Infinity Fabric™ XGMI) provides high-bandwidth, low-latency GPU-to-GPU interconnects in multi-GPU systems. The following XGMI metrics are collected:
+
+- **XGMI Link Width**: The number of active XGMI links between GPUs
+- **XGMI Link Speed**: The speed of XGMI links (in GT/s)
+- **XGMI Read Data**: Accumulated data read through each XGMI link (in KB)
+- **XGMI Write Data**: Accumulated data written through each XGMI link (in KB)
+
+These metrics help identify GPU-to-GPU communication patterns and bandwidth utilization in multi-GPU workloads.
+
+.. note::
+
+ XGMI metrics are only available on systems with multiple GPUs connected via XGMI links.
+ The availability depends on the system topology and GPU architecture. If unsupported or not
+ available, the values will be reported as N/A in the output.
+
+PCIe metrics
+------------
+
+PCIe (PCI Express) provides the connection between the CPU and GPU. The following PCIe metrics are collected:
+
+- **PCIe Link Width**: The number of PCIe lanes currently active
+- **PCIe Link Speed**: The current PCIe link generation and speed (e.g., Gen3, Gen4, Gen5)
+- **PCIe Bandwidth Accumulated**: Total data transferred over the link, accumulated over time (in MB)
+- **PCIe Bandwidth Instantaneous**: Instantaneous bandwidth at the time of sampling (in MB/s)
+
+These metrics help analyze CPU-to-GPU data transfer efficiency and identify PCIe bottlenecks.
+
+Using TransferBench for testing
+================================
+
+For testing and benchmarking GPU connectivity, you can use `TransferBench `_.
+TransferBench is a benchmarking utility designed to measure the performance of simultaneous data transfers between user-specified devices, such as CPUs and GPUs.
+In this example, TransferBench generates the XGMI and PCIe traffic that the profiler samples for analysis.
+
+1. Source the ROCm Systems Profiler environment using:
+
+.. code-block:: shell
+
+ source /opt/rocprofiler-systems/share/rocprofiler-systems/setup-env.sh
+
+Alternatively, if you are using modules, use:
+
+.. code-block:: shell
+
+ module use /opt/rocprofiler-systems/share/modulefiles
+
+2. Generate and configure the profiler config file.
+
+.. code-block:: shell
+
+ rocprof-sys-avail -G $HOME/.rocprofsys.cfg -F txt
+ export ROCPROFSYS_CONFIG_FILE=$HOME/.rocprofsys.cfg
+
+Edit ``.rocprofsys.cfg`` with the following settings:
+
+.. code-block:: shell
+
+ ROCPROFSYS_USE_AMD_SMI = true
+ ROCPROFSYS_AMD_SMI_METRICS = busy,temp,power,mem_usage,xgmi,pcie
+ ROCPROFSYS_ROCM_DOMAINS = hip_runtime_api,memory_copy,hsa_api
+
+3. Profile the TransferBench application.
+
+.. code-block:: shell
+
+ rocprof-sys-sample -PTHD -- ./TransferBench a2a
+
+.. note::
+
+ Refer to these steps to `Install and build TransferBench `_.
+
+At the end of the run, a message similar to the following appears::
+
+ [rocprofiler-systems][964294][perfetto]> Outputting '/home/demo/rocprofsys-transferBench-output/2025-04-25_15.52/perfetto-trace-964294.proto'
+ (3124.52 KB / 3.12 MB / 0.00 GB)... Done
+
+
+To view the generated ``.proto`` file in the browser, open the
+`Perfetto UI page `_. Then click
+``Open trace file`` and select the ``.proto`` file. The XGMI and PCIe metrics appear as tracks in the trace view.
+
+.. image:: ../data/rocprof-sys-xgmi.png
+ :alt: Visualization of a performance graph in Perfetto with XGMI tracks
+
+.. image:: ../data/rocprof-sys-pcie.png
+ :alt: Visualization of a performance graph in Perfetto with PCIe tracks
+
+The visualization will show:
+
+- **XGMI Read Data** and **XGMI Write Data** tracks showing data transfer through XGMI links over time
+- **XGMI Link Width** and **XGMI Link Speed** tracks showing link configuration
+- **PCIe Bandwidth** tracks showing CPU-to-GPU data transfer rates
+- **PCIe Link Width** and **PCIe Link Speed** tracks showing PCIe link configuration
+
+
+Tips for effective profiling
+=============================
+
+1. **Multi-GPU workloads**: XGMI metrics are most useful when profiling applications that use multiple GPUs and transfer data between them.
+
+2. **Sampling frequency**: Adjust the sampling frequency using ``ROCPROFSYS_PROCESS_SAMPLING_FREQ`` (default is 50 Hz) to capture more or fewer samples based on your analysis needs.
+
+3. **Focus on specific metrics**: If you only need XGMI or PCIe metrics, you can specify just those:
+
+ .. code-block:: shell
+
+ ROCPROFSYS_AMD_SMI_METRICS=xgmi # Only XGMI metrics
+ ROCPROFSYS_AMD_SMI_METRICS=pcie # Only PCIe metrics
+
+4. **Combine with API tracing**: For detailed analysis, combine XGMI/PCIe metrics with HIP/HSA API tracing to correlate data transfers with application behavior:
+
+ .. code-block:: shell
+
+ ROCPROFSYS_ROCM_DOMAINS=hip_runtime_api,memory_copy,kernel_dispatch,hsa_api
+
+Exploring available metrics
+============================
+
+To explore all supported metrics and domains, use the following commands:
+
+.. code-block:: shell
+
+ rocprof-sys-avail --all # Show all available options
+ rocprof-sys-avail -bd -r AMD_SMI_METRICS # Show AMD SMI metrics
+ rocprof-sys-avail -bd -r ROCM_DOMAINS # Show ROCm tracing domains
+
+For more details on ROCm Systems Profiler configuration, refer to the `configuration guide `_.
diff --git a/projects/rocprofiler-systems/docs/index.rst b/projects/rocprofiler-systems/docs/index.rst
index 3fe02eb55b..1e9e733fb2 100644
--- a/projects/rocprofiler-systems/docs/index.rst
+++ b/projects/rocprofiler-systems/docs/index.rst
@@ -41,6 +41,7 @@ profiling, how it supports performance analysis, and how to leverage its capabil
* :doc:`Profiling Python scripts <./how-to/profiling-python-scripts>`
* :doc:`Network performance profiling <./how-to/nic-profiling>`
* :doc:`VCN and JPEG sampling and tracing <./how-to/vcn-jpeg-sampling>`
+ * :doc:`XGMI and PCIe metrics monitoring <./how-to/xgmi-pcie-sampling>`
* :doc:`Understanding the output <./how-to/understanding-rocprof-sys-output>`
* :doc:`Using the ROCm Systems Profiler API <./how-to/using-rocprof-sys-api>`
diff --git a/projects/rocprofiler-systems/docs/sphinx/_toc.yml.in b/projects/rocprofiler-systems/docs/sphinx/_toc.yml.in
index c754e2d5be..bf9a02d3ad 100644
--- a/projects/rocprofiler-systems/docs/sphinx/_toc.yml.in
+++ b/projects/rocprofiler-systems/docs/sphinx/_toc.yml.in
@@ -37,6 +37,8 @@ subtrees:
title: Network performance profiling
- file: how-to/vcn-jpeg-sampling.rst
title: VCN and JPEG sampling and tracing
+ - file: how-to/xgmi-pcie-sampling.rst
+ title: XGMI and PCIe metrics monitoring
- file: how-to/understanding-rocprof-sys-output.rst
title: Understanding the output
- file: how-to/using-rocprof-sys-api.rst