Bhardwaj, Gopesh cdf22eba7d Adding pc sampling how to guide (#160)
Co-authored-by: vlaindic_amdeng <vladimir.indic@amd.com>
2025-02-10 20:33:05 -06:00

.. meta::
    :description: Documentation of the usage of pc-sampling with rocprofv3 command-line tool
    :keywords: ROCprofiler-SDK tool, ROCprofiler-SDK library, rocprofv3, rocprofv3 tool usage, Using rocprofv3, ROCprofiler-SDK command line tool, PC sampling

.. _using-pc-sampling:

======================
Using ``pc-sampling``
======================

PC (program counter) sampling is a GPU profiling technique that periodically samples the program counter during GPU kernel execution to reveal code execution patterns and hotspots.

This helps in:

- Identifying performance bottlenecks
- Understanding kernel execution behavior
- Analyzing code coverage
- Finding heavily executed code paths

To try out the PC sampling feature, use the ``rocprofv3`` command-line tool or the ROCprofiler-SDK library on ROCm 6.4 or later.

.. note::

    PC sampling is supported on AMD GPUs with gfx90a and later architectures. Before using the PC sampling feature, ensure that the GPU supports it.

PC sampling availability and configuration
==========================================

To check if the GPU supports PC sampling, use the following command:

.. code-block:: bash

    rocprofv3 -L

or

.. code-block:: bash

    rocprofv3 --list-avail

The output lists whether ``rocprofv3`` supports PC sampling on the GPU and which configurations are supported:

.. code-block:: bash

    List available PC Sample Configurations for node_id 11
    Method: ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP
    Unit: ROCPROFILER_PC_SAMPLING_UNIT_TIME
    Minimum_Interval: 1
    Maximum_Interval: 18446744073709551615

The preceding output shows that the GPU supports PC sampling with the ``ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP`` method and the ``ROCPROFILER_PC_SAMPLING_UNIT_TIME`` unit. It also reports the minimum and maximum sampling intervals that can be requested.

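
To make the listing easier to consume from a script, it can be parsed into a small dictionary. The following is a minimal sketch, not part of ``rocprofv3``: the helper name ``parse_pc_sampling_configs`` is hypothetical, and the code assumes the text contains only the PC sampling section in exactly the ``Key: value`` layout shown above, with one configuration per ``node_id`` header.

.. code-block:: python

    import re

    # Listing copied from the example output above.
    SAMPLE_LISTING = """\
    List available PC Sample Configurations for node_id 11
    Method: ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP
    Unit: ROCPROFILER_PC_SAMPLING_UNIT_TIME
    Minimum_Interval: 1
    Maximum_Interval: 18446744073709551615
    """

    HEADER = re.compile(r"List available PC Sample Configurations for node_id (\d+)")


    def parse_pc_sampling_configs(listing):
        """Return one dict per node header found in the listing (hypothetical helper)."""
        configs = []
        current = None
        for raw_line in listing.splitlines():
            line = raw_line.strip()
            header = HEADER.match(line)
            if header:
                # A new node section starts here.
                current = {"node_id": int(header.group(1))}
                configs.append(current)
            elif current is not None and ":" in line:
                # Assumed layout: "Key: value"; numeric values become integers.
                key, value = (part.strip() for part in line.split(":", 1))
                current[key] = int(value) if value.isdigit() else value
        return configs


    configs = parse_pc_sampling_configs(SAMPLE_LISTING)
    print(configs[0]["Method"], configs[0]["Minimum_Interval"])

The parsed ``Minimum_Interval`` and ``Maximum_Interval`` values give the range from which a sampling interval can be chosen.
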
Based on this configuration, use the following command to profile the application with PC sampling:

.. code-block:: bash

    rocprofv3 --pc-sampling-beta-enabled --pc-sampling-method host_trap --pc-sampling-unit time --pc-sampling-interval 1 -- <application_path>

The preceding command enables PC sampling with the ``host_trap`` method, the ``time`` unit, and an interval of ``1`` microsecond (µs). Replace ``<application_path>`` with the path to the application you want to profile.

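
When the profiler is launched from a script, the same command line can be assembled programmatically. The sketch below only builds the argument list and checks the interval against the reported limits. The function name ``build_pc_sampling_command`` and the application path ``./MatrixTranspose`` are hypothetical; the flags are the ones used in the command above.

.. code-block:: python

    import shlex


    def build_pc_sampling_command(app_args, method="host_trap", unit="time",
                                  interval=1, min_interval=1,
                                  max_interval=2**64 - 1):
        """Build a rocprofv3 PC sampling command line (hypothetical helper)."""
        # Reject intervals outside the range reported by `rocprofv3 -L`.
        if not min_interval <= interval <= max_interval:
            raise ValueError(
                f"interval {interval} is outside [{min_interval}, {max_interval}]"
            )
        return [
            "rocprofv3",
            "--pc-sampling-beta-enabled",
            "--pc-sampling-method", method,
            "--pc-sampling-unit", unit,
            "--pc-sampling-interval", str(interval),
            "--",
            *app_args,
        ]


    # "./MatrixTranspose" is a placeholder path used for illustration only.
    cmd = build_pc_sampling_command(["./MatrixTranspose"])
    print(shlex.join(cmd))

The resulting list can be passed to ``subprocess.run`` on a system where ``rocprofv3`` is installed.
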
This generates two files: ``agent_info.csv`` and ``pc_sampling_host_trap.csv``. Both file names are prefixed with the process ID.

The following table shows the contents of ``pc_sampling_host_trap.csv`` generated for the ``MatrixTranspose`` sample application:

.. csv-table:: PC sampling host trap
    :file: /data/pc_sampling_host_trap.csv
    :widths: 20,10,10,10,10,20
    :header-rows: 1

For a description of the fields in the output file, see :ref:`pc-sampling-fields`.

If the ``Instruction_Comment`` field in the output file is empty, compile your application with debug symbols to populate it. With debug symbols, the field maps each sampled instruction back to its source line, which helps in understanding code execution patterns and hotspots.

.. csv-table:: PC sampling host trap with debug symbols
    :file: /data/pc_sampling_host_trap_debug.csv
    :widths: 20,10,10,10,10,20
    :header-rows: 1

The preceding output shows the ``Instruction_Comment`` field populated with the source line information.

.. _pc-sampling-fields:

PC sampling fields
===================

The output file generated by PC sampling contains the following fields:

- ``Sample_Timestamp``: Timestamp at which the sample was generated
- ``Exec_Mask``: SIMD lanes that were active when the sample was taken
- ``Dispatch_Id``: ID of the kernel dispatch that the sample originates from
- ``Instruction``: Assembly instruction, for example ``s_load_dword s8, s[1:2], 0x10``
- ``Instruction_Comment``: Instruction comment (maps back to the source line if the application was compiled with debug symbols)
- ``Correlation_Id``: ID of the API launch call that matches the dispatch ID

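
A common first step with these fields is to count how often each instruction was sampled, since frequently sampled instructions point to hotspots. The following sketch assumes that the CSV header uses the field names listed above. The rows are made-up values for illustration, not real profiler output, and ``hottest_instructions`` is a hypothetical helper.

.. code-block:: python

    import csv
    import io
    from collections import Counter

    # Made-up rows for illustration only; column names follow the field list above.
    SAMPLE_CSV = """\
    Sample_Timestamp,Exec_Mask,Dispatch_Id,Instruction,Instruction_Comment,Correlation_Id
    100,18446744073709551615,1,"s_load_dword s8, s[1:2], 0x10",,1
    110,18446744073709551615,1,"v_add_u32 v0, v0, v1",,1
    120,18446744073709551615,1,"s_load_dword s8, s[1:2], 0x10",,1
    130,18446744073709551615,1,"s_waitcnt lgkmcnt(0)",,1
    140,18446744073709551615,1,"s_load_dword s8, s[1:2], 0x10",,1
    150,18446744073709551615,1,"v_add_u32 v0, v0, v1",,1
    """


    def hottest_instructions(csv_file, top=3):
        """Return the `top` most frequently sampled instructions with their counts."""
        reader = csv.DictReader(csv_file)
        counts = Counter(row["Instruction"] for row in reader)
        return counts.most_common(top)


    # For a real run, replace io.StringIO(...) with
    # open("<pid>_pc_sampling_host_trap.csv", newline="").
    hotspots = hottest_instructions(io.StringIO(SAMPLE_CSV), top=2)
    for instruction, samples in hotspots:
        print(f"{samples:>4}  {instruction}")

Grouping by ``Instruction_Comment`` instead of ``Instruction`` gives the same ranking per source line when debug symbols are available.
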
By default, the output file is in CSV format. To dump samples in a more comprehensive format, use JSON by specifying ``--output-format json``:

.. code-block:: bash

    rocprofv3 --pc-sampling-beta-enabled --pc-sampling-method host_trap --pc-sampling-unit time --pc-sampling-interval 1 --output-format json -- <application_path>

This generates a JSON file with the comprehensive output. Here is a trimmed-down output with multiple records:

.. code-block:: text

    {
      "pc_sample_host_trap": [
        {
          "record": {
            "hw_id": {
              "chiplet": 0,
              "wave_id": 0,
              "simd_id": 2,
              "pipe_id": 0,
              "cu_or_wgp_id": 1,
              "shader_array_id": 0,
              "shader_engine_id": 2,
              "workgroup_id": 0,
              "vm_id": 3,
              "queue_id": 2,
              "microengine_id": 1
            },
            "pc": {
              "code_object_id": 1,
              "code_object_offset": 20228
            },
            "exec_mask": 18446744073709551615,
            "timestamp": 51040126667689,
            "dispatch_id": 1,
            "corr_id": {
              "internal": 1,
              "external": 0
            },
            "wrkgrp_id": {
              "x": 182,
              "y": 0,
              "z": 0
            },
            "wave_in_grp": 1
          },
          "inst_index": 0
        },
        {
          "record": {
            "hw_id": {
              "chiplet": 0,
              "wave_id": 0,
              "simd_id": 2,
              "pipe_id": 0,
              "cu_or_wgp_id": 0,
              "shader_array_id": 0,
              "shader_engine_id": 2,
              "workgroup_id": 0,
              "vm_id": 3,
              "queue_id": 2,
              "microengine_id": 1
            },
            "pc": {
              "code_object_id": 1,
              "code_object_offset": 20236
            },
            "exec_mask": 18446744073709551615,
            "timestamp": 51040126667689,
            "dispatch_id": 1,
            "corr_id": {
              "internal": 1,
              "external": 0
            },
            "wrkgrp_id": {
              "x": 158,
              "y": 0,
              "z": 0
            },
            "wave_in_grp": 2
          },
          "inst_index": 1
        }
      ]
    }

For a description of the fields in the JSON output, see :ref:`output-file-fields`.
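
The JSON records can be summarized in a few lines of Python. The following sketch counts samples per program counter location and derives the number of active lanes from ``exec_mask``. It rests on two assumptions: the file has the top-level layout of the trimmed output above (a complete output file may nest the ``pc_sample_host_trap`` array under additional keys), and each set bit in ``exec_mask`` corresponds to one active lane. The function name ``summarize_samples`` is hypothetical, and the embedded records keep only the keys that the sketch reads.

.. code-block:: python

    import json
    from collections import Counter

    # The two records from the trimmed output above, reduced to the keys used here.
    TRIMMED_JSON = """
    {
      "pc_sample_host_trap": [
        {"record": {"pc": {"code_object_id": 1, "code_object_offset": 20228},
                    "exec_mask": 18446744073709551615, "dispatch_id": 1},
         "inst_index": 0},
        {"record": {"pc": {"code_object_id": 1, "code_object_offset": 20236},
                    "exec_mask": 18446744073709551615, "dispatch_id": 1},
         "inst_index": 1}
      ]
    }
    """


    def summarize_samples(document):
        """Return (samples per PC location, active-lane count per sample)."""
        samples = document["pc_sample_host_trap"]
        per_pc = Counter(
            (s["record"]["pc"]["code_object_id"],
             s["record"]["pc"]["code_object_offset"])
            for s in samples
        )
        # Assumption: one bit per lane, so the popcount is the active-lane count.
        active_lanes = [bin(s["record"]["exec_mask"]).count("1") for s in samples]
        return per_pc, active_lanes


    per_pc, active_lanes = summarize_samples(json.loads(TRIMMED_JSON))
    for (code_object_id, offset), count in sorted(per_pc.items()):
        print(f"code object {code_object_id} offset {offset}: {count} sample(s)")
    print("active lanes per sample:", active_lanes)

For a file on disk, replace ``json.loads(TRIMMED_JSON)`` with ``json.load`` on the opened output file.
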