.. meta::
  :description: Documentation of the usage of pc-sampling with rocprofv3 command-line tool
  :keywords: ROCprofiler-SDK tool, ROCprofiler-SDK library, rocprofv3, rocprofv3 tool usage, Using rocprofv3, ROCprofiler-SDK command line tool, PC sampling

.. _using-pc-sampling:

======================
Using ``pc-sampling``
======================

PC (Program Counter) sampling is a GPU profiling technique that periodically samples the program counter during kernel execution to reveal code execution patterns and hotspots.
This helps in:

- Identifying performance bottlenecks
- Understanding kernel execution behavior
- Analyzing code coverage
- Finding heavily executed code paths

To try out the PC sampling feature, use the ``rocprofv3`` command-line tool or the ROCprofiler-SDK library on ROCm 6.4 or later.

.. note::

  PC sampling is supported on AMD GPUs with gfx90a and later architectures. Before using the PC sampling feature, ensure that the GPU supports it.

PC Sampling availability and Configuration
==========================================

To check if the GPU supports PC sampling, use the following command:

.. code-block:: bash

    rocprofv3 -L

OR

.. code-block:: bash

    rocprofv3 --list-avail

The output lists whether ``rocprofv3`` supports PC sampling on the GPU and which configurations are supported.

.. code-block:: bash

    List available PC Sample Configurations for node_id 11
    Method: ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP
    Unit: ROCPROFILER_PC_SAMPLING_UNIT_TIME
    Minimum_Interval: 1
    Maximum_Interval: 18446744073709551615

The above output shows that the GPU supports PC sampling with the ``ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP`` method and the ``ROCPROFILER_PC_SAMPLING_UNIT_TIME`` unit. The minimum and maximum intervals are also displayed.

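If you want to check the listing from a script, the configuration block can be read as plain ``Key: value`` lines. The following is a minimal sketch, assuming the listing has exactly the layout shown above. ``parse_pc_sampling_configs`` and ``interval_supported`` are hypothetical helper names and not part of ROCprofiler-SDK, and the real ``rocprofv3 -L`` output may contain other sections that need extra filtering.

.. code-block:: python

    HEADER = "List available PC Sample Configurations for node_id"


    def parse_pc_sampling_configs(listing):
        """Collect one dict per 'Method:' entry found under a node_id header."""
        configs = []
        node_id = None
        for raw in listing.splitlines():
            line = raw.strip()
            if line.startswith(HEADER):
                # The node ID is the trailing integer on the header line.
                node_id = int(line[len(HEADER):])
            elif ":" in line and node_id is not None:
                key, value = (part.strip() for part in line.split(":", 1))
                if key == "Method":
                    # Each method starts a new configuration for the current node.
                    configs.append({"node_id": node_id})
                if configs:
                    configs[-1][key] = value
        return configs


    def interval_supported(config, interval):
        """Check that a requested interval lies within the advertised range."""
        low = int(config["Minimum_Interval"])
        high = int(config["Maximum_Interval"])
        return low <= interval <= high


    # Listing copied from the example output above.
    sample = """\
    List available PC Sample Configurations for node_id 11
    Method: ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP
    Unit: ROCPROFILER_PC_SAMPLING_UNIT_TIME
    Minimum_Interval: 1
    Maximum_Interval: 18446744073709551615
    """

    configs = parse_pc_sampling_configs(sample)
    print(configs[0]["Method"], interval_supported(configs[0], 1))

Checking the interval against the advertised range before launching a run avoids passing a value the GPU does not accept.
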
Based on the above configuration, you can use the following command to profile the application using PC sampling:

.. code-block:: bash

    rocprofv3 --pc-sampling-beta-enabled --pc-sampling-method host_trap --pc-sampling-unit time --pc-sampling-interval 1 -- <application_path>

The above command enables PC sampling with the ``host_trap`` method, the ``time`` unit, and an interval of ``1`` µs (microsecond). Replace ``<application_path>`` with the path to the application you want to profile.

This generates two files, ``agent_info.csv`` and ``pc_sampling_host_trap.csv``, each prefixed with the process ID.
The following is the PC sampling output for the ``MatrixTranspose`` sample application.

Here are the contents of the ``pc_sampling_host_trap.csv`` file:

.. csv-table:: PC sampling host trap
   :file: /data/pc_sampling_host_trap.csv
   :widths: 20,10,10,10,10,20
   :header-rows: 1

For the description of the fields in the output file, see :ref:`pc-sampling-fields`.

If the ``Instruction_Comment`` field in the output file is empty, compile your application with debug symbols to populate it.
The field maps back to the source line when debug symbols are enabled at compile time, which helps in understanding the code execution pattern and hotspots.

.. csv-table:: PC sampling host trap with debug symbols
   :file: /data/pc_sampling_host_trap_debug.csv
   :widths: 20,10,10,10,10,20
   :header-rows: 1

The above output shows the ``Instruction_Comment`` field populated with the source line information.

.. _pc-sampling-fields:

PC Sampling Fields
==================

The output file generated by PC sampling contains the following fields:

- ``Sample_Timestamp``: Timestamp when the sample is generated
- ``Exec_Mask``: Active SIMD lanes when sampled
- ``Dispatch_Id``: Originating kernel dispatch ID
- ``Instruction``: Assembly instruction, for example ``s_load_dword s8, s[1:2], 0x10``
- ``Instruction_Comment``: Instruction comment (maps back to the source line if debug symbols were enabled when the application was compiled)
- ``Correlation_Id``: ID of the API launch call that matches the dispatch ID

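One way to use these fields is to count how many samples landed on each instruction, which gives a rough hotspot ranking. The following is a minimal sketch, assuming the CSV header uses the field names listed above. ``hottest_instructions`` is a hypothetical helper name, and the rows in ``sample_text`` are made up for illustration and are not real profiler output.

.. code-block:: python

    import csv
    import io
    from collections import Counter


    def hottest_instructions(csv_file, top=5):
        """Count samples per (Instruction, Instruction_Comment) pair."""
        reader = csv.DictReader(csv_file)
        counts = Counter(
            (row["Instruction"], row["Instruction_Comment"]) for row in reader
        )
        return counts.most_common(top)


    # Made-up rows for illustration. Instructions contain commas, so they are quoted.
    sample_text = (
        "Sample_Timestamp,Exec_Mask,Dispatch_Id,Instruction,Instruction_Comment,Correlation_Id\n"
        '51040126667689,18446744073709551615,1,"s_load_dword s8, s[1:2], 0x10",,1\n'
        '51040126667690,18446744073709551615,1,"s_load_dword s8, s[1:2], 0x10",,1\n'
        "51040126667691,18446744073709551615,1,s_waitcnt lgkmcnt(0),,1\n"
    )

    for (instruction, comment), count in hottest_instructions(io.StringIO(sample_text)):
        print(count, instruction, comment)

For a real run, pass an open handle to the generated ``pc_sampling_host_trap.csv`` file instead of the in-memory sample. Grouping on ``Instruction_Comment`` as well keeps identical instructions from different source lines apart when debug symbols are available.
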
By default, the output file is in CSV format. To dump samples in a more comprehensive format, use JSON through ``--output-format json``.

.. code-block:: bash

    rocprofv3 --pc-sampling-beta-enabled --pc-sampling-method host_trap --pc-sampling-unit time --pc-sampling-interval 1 --output-format json -- <application_path>

This generates a JSON file with the comprehensive output. Here is a trimmed-down output with multiple records:

.. code-block:: text

    {
      "pc_sample_host_trap": [
        {
          "record": {
            "hw_id": {
              "chiplet": 0,
              "wave_id": 0,
              "simd_id": 2,
              "pipe_id": 0,
              "cu_or_wgp_id": 1,
              "shader_array_id": 0,
              "shader_engine_id": 2,
              "workgroup_id": 0,
              "vm_id": 3,
              "queue_id": 2,
              "microengine_id": 1
            },
            "pc": {
              "code_object_id": 1,
              "code_object_offset": 20228
            },
            "exec_mask": 18446744073709551615,
            "timestamp": 51040126667689,
            "dispatch_id": 1,
            "corr_id": {
              "internal": 1,
              "external": 0
            },
            "wrkgrp_id": {
              "x": 182,
              "y": 0,
              "z": 0
            },
            "wave_in_grp": 1
          },
          "inst_index": 0
        },
        {
          "record": {
            "hw_id": {
              "chiplet": 0,
              "wave_id": 0,
              "simd_id": 2,
              "pipe_id": 0,
              "cu_or_wgp_id": 0,
              "shader_array_id": 0,
              "shader_engine_id": 2,
              "workgroup_id": 0,
              "vm_id": 3,
              "queue_id": 2,
              "microengine_id": 1
            },
            "pc": {
              "code_object_id": 1,
              "code_object_offset": 20236
            },
            "exec_mask": 18446744073709551615,
            "timestamp": 51040126667689,
            "dispatch_id": 1,
            "corr_id": {
              "internal": 1,
              "external": 0
            },
            "wrkgrp_id": {
              "x": 158,
              "y": 0,
              "z": 0
            },
            "wave_in_grp": 2
          },
          "inst_index": 1
        }
      ]
    }

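The JSON records can be summarized in a few lines of Python. The following is a minimal sketch, assuming the trimmed layout shown above, where ``pc_sample_host_trap`` is a top-level key. In a full output file the list may be nested under other keys, so adjust the lookup accordingly. ``summarize_samples`` is a hypothetical helper name. The sketch counts samples per program counter location and derives the number of active lanes from ``exec_mask`` by counting its set bits.

.. code-block:: python

    import json
    from collections import Counter


    def summarize_samples(samples):
        """Return (sample count per PC location, active-lane count per sample)."""
        per_pc = Counter()
        lanes = []
        for sample in samples:
            record = sample["record"]
            pc = record["pc"]
            per_pc[(pc["code_object_id"], pc["code_object_offset"])] += 1
            # Each set bit in exec_mask marks one active lane.
            lanes.append(bin(record["exec_mask"]).count("1"))
        return per_pc, lanes


    # Reduced copy of the two records shown above, keeping only the fields used here.
    doc = json.loads("""
    {
      "pc_sample_host_trap": [
        {"record": {"pc": {"code_object_id": 1, "code_object_offset": 20228},
                    "exec_mask": 18446744073709551615},
         "inst_index": 0},
        {"record": {"pc": {"code_object_id": 1, "code_object_offset": 20236},
                    "exec_mask": 18446744073709551615},
         "inst_index": 1}
      ]
    }
    """)

    per_pc, lanes = summarize_samples(doc["pc_sample_host_trap"])
    print(per_pc.most_common())
    print(lanes)  # both samples have all 64 bits set

Locations with the highest counts are the most frequently sampled, and a low lane count points at samples taken while most lanes were inactive.
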
For the description of the fields in the JSON output, see :ref:`output-file-fields`.