[Docs] Update using-pc-sampling (#157)
@@ -42,5 +42,4 @@

# VSCode Workspaces
*.code-workspace
rocprofiler-sdk-build/CMakeCache.txt
/rocprofiler-sdk-build

@@ -63,7 +63,7 @@ To check the firmware versions, use:

.. code-block:: bash

# To check PSP TOS Firmware:
# To check PSP TOS Firmware:
sudo cat /sys/kernel/debug/dri/0/amdgpu_firmware_info | grep SOS

# To check MEC Firmware:
@@ -81,6 +81,8 @@ This generates two files, ``agent_info.csv`` and ``pc_sampling_host_trap.csv``.

Here are the contents of ``pc_sampling_host_trap.csv`` file generated for MatrixTranspose sample application:

.. _pc_sampling_host_trap:

.. csv-table:: PC sampling host trap
:file: /data/pc_sampling_host_trap.csv
:widths: 20,10,10,10,10,20
@@ -201,50 +203,53 @@ The preceding command generates a JSON file with the comprehensive output. Here

For description of the fields in the JSON output, see :ref:`output-file-fields`.

An Arbitrary Host-Trap PC Sampling Skid
===============================================
Host-trap PC sampling and arbitrary sampling skid
==================================================

Host-Trap PC sampling is a software-based technique that utilizes a background kernel thread
to periodically interrupt running waves in order to capture the program counter (PC).
Host-trap PC sampling is a software-based technique that utilizes a background kernel thread
to periodically interrupt running waves to capture the program counter (PC).
This method is effective for gathering performance data without requiring specialized hardware
to snapshot the waves. However, it has limitations due to the potential delay between
when a wave receives an interrupt and when it processes the interrupt to capture the PC.
This delay can lead to a sampling skid, where the PC samples may be attributed to instructions
to snapshot the waves. However, this method has certain limitations due to the potential delay between
receiving and processing the interrupt by the wave to capture the PC.
This delay can lead to a sampling skid, where the PC samples might be attributed to the instructions
that are up to two instructions away from the actual source of latency.
This results in a non-precise intra-kernel sampling method.

When analyzing an application profile generated by host-trap PC sampling,
developers should consider not only the reported most costly instruction but
it is important to consider not only the costliest reported instruction but
also the instructions immediately preceding or following it.
If the costly instruction is near a branch instruction, it is important
to also consider the instruction targeted by the branch and the one immediately following it.
If the costly instruction is near a branch instruction, it is important
to consider the instruction targeted by the branch and the one immediately following it as well.
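
The skid advice above can be illustrated with a small Python sketch. It assumes a skid of at most two instructions, as stated in the text, and simply lists the neighbors of the hottest instruction that are worth inspecting. The helper name ``skid_window``, the instruction list, and the sample counts are all hypothetical, and branch targets are not modeled here.

```python
def skid_window(instructions, hot_index, skid=2):
    """Return the instructions within `skid` positions of the hottest one."""
    lo = max(0, hot_index - skid)
    hi = min(len(instructions), hot_index + skid + 1)
    return instructions[lo:hi]


# Hypothetical (mnemonic, sample count) pairs, listed in program order.
samples = [
    ("s_load_dwordx2", 3),
    ("s_waitcnt", 5),
    ("global_load_dword", 12),
    ("v_add_u32", 40),
    ("s_waitcnt", 9),
    ("global_store_dword", 4),
]

# Index of the instruction with the most samples attributed to it.
hot_index = max(range(len(samples)), key=lambda i: samples[i][1])

# Instructions that may be the real source of latency under a two-instruction skid.
candidates = skid_window(samples, hot_index)
for mnemonic, count in candidates:
    print(f"{mnemonic}: {count}")
```

If the hot instruction sits near a branch, the branch target and the instruction following it would need to be added to the candidate list by hand.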

To address the limitations of host-trap sampling, the hardware-based stochastic PC sampling method
has been developed. This method provides precise intra-kernel sampling with zero sampling skid,
offering more accurate performance insights.

It is important to note that the skid issue inherent in host-trap PC sampling will not be resolved
in its current form. Therefore, users are encouraged to adopt stochastic PC sampling,
starting with the GFX942 architecture, to achieve more precise performance profiling.
It is important to note that the skid issue inherent in host-trap PC sampling is not likely to be resolved
in its current form. Therefore, to achieve more precise performance profiling, it is recommended to adopt stochastic PC sampling starting with the gfx942 architecture.

Hardware-Based (Stochastic) PC Sampling Method
.. note::

Host-trap PC sampling is supported on AMD Instinct MI200, MI300, MI325, MI350, and MI355.

Hardware-based (stochastic) PC sampling method
===============================================

The new ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` has been introduced for gfx942 architecture.
It employs a specific hardware for probing waves actively running on GPU.
Beside information already provided with ``ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP`` useful for determining hot-spots within the kernel,
it delivers additional information that tells whether a sampled wave issued an instruction represented with particular PC.
If not, it provides the reason for not issuing the instruction (stall reason).
This type of information is particularly useful for understanding stalls during the kernel execution.
The ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` method has been introduced for the gfx942 architecture.
It employs a specific hardware for probing waves actively running on the GPU.
Besides the information already provided with ``ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP`` useful for
determining hotspots within the kernel, this method delivers additional information, which helps to determine
whether a sampled wave issued an instruction represented with the specified PC.
If not, this method provides the reason for not issuing the instruction (stall reason).
Such information is particularly useful for understanding stalls during kernel execution.

To use this method on gfx942, we recommend listing available PC sampling configurations to verify if the latest ROCm stack is installed
on the system by running:
To use this method on gfx942, it is recommended to list available PC sampling configurations to verify if the latest ROCm stack is installed on the system using:

.. code-block:: bash

rocprofv3 -L

Output similar to the following indicates that the ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` method is available:
An output similar to the following indicates that the ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` method is available:

.. code-block:: bash

@@ -257,22 +262,30 @@ Output similar to the following indicates that the ``ROCPROFILER_PC_SAMPLING_MET
Max_Interval :2147483648
Flags :interval pow2

Please note that on gfx942, `ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC` requires intervals to be specified in cycles, whose values are powers of 2
.. note::

To profile a gfx942 accelerated application with ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` PC sampling, one can use the following command:
On gfx942, ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` requires intervals to be specified in cycles with values as powers of 2.
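
The power-of-2 requirement can be checked before launching a profiling run. The following Python sketch is not part of the ROCm tooling; the function name is hypothetical, the upper bound is the ``Max_Interval`` value shown in the listing above, and the lower bound is an assumed placeholder that should be replaced with the minimum interval reported on your system.

```python
def is_valid_stochastic_interval(interval, min_interval, max_interval):
    """Check that `interval` is a power of 2 within [min_interval, max_interval]."""
    # A positive integer is a power of 2 exactly when it has a single bit set.
    is_pow2 = interval > 0 and (interval & (interval - 1)) == 0
    return is_pow2 and min_interval <= interval <= max_interval


MAX_INTERVAL = 2147483648  # Max_Interval from the listing above (2**31)
MIN_INTERVAL = 1           # assumed placeholder, not taken from the listing

print(is_valid_stochastic_interval(1048576, MIN_INTERVAL, MAX_INTERVAL))
print(is_valid_stochastic_interval(1000000, MIN_INTERVAL, MAX_INTERVAL))
```

The first call uses the interval from the profiling command in this section (2**20 cycles); the second uses a value that is not a power of 2.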

To profile an application with ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` PC sampling enabled on gfx942, use:

.. code-block:: bash

rocprofv3 --pc-sampling-beta-enabled --pc-sampling-method stochastic --pc-sampling-unit cycles --pc-sampling-interval 1048576 --output-format csv, json -- <application_path>

The previous command serializes samples in both CSV and JSON output formats in the ``pc_sampling_stochastic.csv`` and ``out_results.json`` files, respectively.
The preceding command serializes samples in both CSV and JSON output formats in the ``pc_sampling_stochastic.csv`` and ``out_results.json`` files, respectively.

Comparing the ``pc_sampling_stochastic.csv`` to ``pc_sampling_host_trap`` from previous section, one can notice that the ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` method
generates additional fields:
- ``Wave_Issued_Instruction``: Indicates whether the wave issued an instruction (value 1) represented with particular PC or not (value 0)
- ``Instruction_Type``: If the value of ``Wave_Issued_Instruction`` is 1, this fields indicates the type of the issued instruction. Otherwise, this fields irrelevant.
- ``Stall_Reason``: If the value of ``Wave_Issued_Instruction`` is 0, this fields indicates the reason for not issuing the instruction (stall reason). Otherwise, this field is irrelevant.
- ``Wave_Count``: Total number of waves actively running on a compute unit when the sample was generated.
On comparing the :ref:`pc_sampling_stochastic.csv <pc_sampling_stochastic>` to :ref:`pc_sampling_host_trap.csv <pc_sampling_host_trap>`, you can notice that the ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` method
generates the following additional fields:

- ``Wave_Issued_Instruction``: Indicates whether the wave issued an instruction represented with the specified PC. Value = 1 for yes and 0 for no.

- ``Instruction_Type``: If the value of ``Wave_Issued_Instruction`` is 1, this field indicates the type of the issued instruction. Otherwise, this field remains irrelevant.

- ``Stall_Reason``: If the value of ``Wave_Issued_Instruction`` is 0, this field indicates the reason for not issuing the instruction (stall reason). Otherwise, this field remains irrelevant.

- ``Wave_Count``: Total number of waves actively running on a compute unit when the sample is generated.
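
One way to put the fields listed above to use is to tally issued instruction types and stall reasons across all samples. The Python sketch below relies only on the column names described in the list; the inline CSV rows, including the instruction type and stall reason values, are hypothetical placeholders rather than real ``rocprofv3`` output, and a real ``pc_sampling_stochastic.csv`` contains additional columns.

```python
import csv
import io
from collections import Counter

# Hypothetical rows; the values are placeholders, not real profiler output.
SAMPLE_CSV = """Wave_Issued_Instruction,Instruction_Type,Stall_Reason,Wave_Count
1,VALU,NONE,8
0,NONE,WAITCNT,8
0,NONE,WAITCNT,6
0,NONE,ALU_DEPENDENCY,7
1,SCALAR,NONE,8
"""


def summarize(csv_text):
    """Count issued instruction types and stall reasons over all samples."""
    issued = Counter()
    stalled = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Wave_Issued_Instruction"] == "1":
            # Instruction_Type is only meaningful when the wave issued.
            issued[row["Instruction_Type"]] += 1
        else:
            # Stall_Reason is only meaningful when the wave did not issue.
            stalled[row["Stall_Reason"]] += 1
    return issued, stalled


issued, stalled = summarize(SAMPLE_CSV)
print("issued:", dict(issued))
print("stalled:", dict(stalled))
```

To run this against a real profile, the contents of the generated CSV file could be read and passed to ``summarize`` in place of ``SAMPLE_CSV``.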

.. _pc_sampling_stochastic:

.. csv-table:: PC sampling stochastic with debug symbols
:file: /data/pc_sampling_stochastic_debug.csv
@@ -280,7 +293,8 @@ generates additional fields:
:header-rows: 1

Similarly, ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` method delivers additional information to every sample in the JSON output.
The following snippet shows one sample from ``out_results.json`` file.

Here is a ``out_results.json`` file sample:

.. code-block:: text

@@ -351,7 +365,10 @@ The following snippet shows one sample from ``out_results.json`` file.
},

Fields starting with ``arb_state_`` are of particular interest as they indicate the state of the arbiter at the time of sampling.
Namely, ``arb_state_issue_`` fields indicate what type of instructions arbiter issued at the time of sampling.
On the other hand, ``arb_state_stall_`` fields indicate what type of instructions were stalled at the time of sampling.
For example, ``arb_state_issue_`` fields indicate the type of instructions issued by the arbiter at the time of sampling.
On the other hand, ``arb_state_stall_`` fields indicate the type of instructions stalled at the time of sampling.
This information is useful for understanding how many instructions per cycle (IPC) are issued.

.. note::

The stochastic PC sampling is supported on AMD Instinct MI300, MI325, MI350, and MI355.