[Host-Trap PC Sampling] Host-Trap PC sampling can introduce an arbitrary sampling skid of [0, 2] instructions (#515)
* Arbitrary host-trap sampling skid (doc)
Host-trap PC sampling might introduce a skid of [0, 2]
instructions. We documented this behavior and provided
some advice to application developers on how to find
hot-spots in profiles generated by host-trap sampling.
[ROCm/rocprofiler-sdk commit: 650d35bdaa]
Committed by GitHub
Parent: f96cafaa60
Commit: d5aba741f3
@@ -201,6 +201,31 @@ The preceding command generates a JSON file with the comprehensive output. Here
For a description of the fields in the JSON output, see :ref:`output-file-fields`.

An Arbitrary Host-Trap PC Sampling Skid
===============================================

Host-trap PC sampling is a software-based technique that uses a background kernel thread
to periodically interrupt running waves and capture their program counter (PC).
This method gathers performance data without requiring specialized hardware
to snapshot the waves. However, there can be a delay between
when a wave receives the interrupt and when it processes the interrupt to capture the PC.
This delay introduces a sampling skid: a PC sample may be attributed to an instruction
up to two instructions away from the instruction that actually caused the latency.
As a result, host-trap sampling is not a precise intra-kernel sampling method.

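To make the effect of the skid concrete, the following toy model assumes that all samples come from a single hot instruction and that each captured PC lands 0 to 2 instructions after it, matching the [0, 2] skid range, with every offset equally likely. The uniform distribution and the function name ``simulate_host_trap_samples`` are assumptions for illustration only, not a description of how the driver actually behaves.

```python
import random
from collections import Counter


def simulate_host_trap_samples(true_index, num_samples, num_instructions,
                               max_skid=2, seed=0):
    """Toy model of host-trap sampling skid (illustrative assumption only).

    Every sample is assumed to originate from the same instruction
    (``true_index``), but the PC is captured after a skid of 0..max_skid
    instructions, chosen uniformly at random. The reported index is clipped
    to the last instruction of the kernel.
    Returns a Counter mapping reported instruction index -> sample count.
    """
    rng = random.Random(seed)
    histogram = Counter()
    for _ in range(num_samples):
        skid = rng.randint(0, max_skid)  # inclusive on both ends
        reported = min(true_index + skid, num_instructions - 1)
        histogram[reported] += 1
    return histogram


if __name__ == "__main__":
    hist = simulate_host_trap_samples(true_index=10, num_samples=900,
                                      num_instructions=32)
    for index in sorted(hist):
        print(f"instruction {index}: {hist[index]} samples")
```

In this model, the samples of one hot instruction are spread over up to three consecutive instructions, which is why the instruction with the highest reported count is not necessarily the true source of latency.
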
When analyzing an application profile generated by host-trap PC sampling,
developers should consider not only the instruction reported as most costly but
also the instructions immediately preceding and following it.
If the costly instruction is near a branch instruction, also consider
the instruction targeted by the branch and the one immediately following it.

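The advice above can be sketched as a small helper that, given the instruction with the most samples, returns the instructions worth inspecting. It works on instruction indices in program order rather than byte addresses, because instruction encodings differ in size. The helper name, the ``branch_targets`` mapping (assumed to be built from a disassembly of the kernel), the sample counts, and the reading of "the one immediately following it" as the instruction after the branch target are all assumptions of this sketch.

```python
from collections import Counter


def instructions_to_inspect(hot_index, num_instructions, branch_targets, window=2):
    """Return instruction indices worth inspecting around a reported hot-spot.

    hot_index: index (in program order) of the instruction with the most samples.
    branch_targets: hypothetical dict {branch instruction index: target index}.
    window: neighbors to include on each side; 2 matches the maximum skid.
    """
    lo = max(0, hot_index - window)
    hi = min(num_instructions - 1, hot_index + window)
    candidates = set(range(lo, hi + 1))
    # For every branch near the hot-spot, also inspect its target and the
    # instruction right after the target (an interpretation, see lead-in).
    for branch in range(lo, hi + 1):
        target = branch_targets.get(branch)
        if target is not None:
            candidates.add(target)
            if target + 1 < num_instructions:
                candidates.add(target + 1)
    return sorted(candidates)


if __name__ == "__main__":
    # Hypothetical per-instruction sample counts (index -> samples).
    samples = Counter({3: 12, 7: 40, 8: 95, 9: 31, 20: 5})
    hot_index, _ = samples.most_common(1)[0]
    # Assume instruction 9 is a branch that jumps back to instruction 2.
    print(instructions_to_inspect(hot_index, num_instructions=24,
                                  branch_targets={9: 2}))
```

With these made-up counts, instruction 8 is the reported hot-spot, so the helper returns its neighbors 6 to 10 plus the branch target 2 and the instruction after it, 3.
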
To address the limitations of host-trap sampling, a hardware-based stochastic PC sampling method
has been developed. It provides precise intra-kernel sampling with zero sampling skid,
offering more accurate performance insights.

Note that the skid inherent in host-trap PC sampling will not be resolved
in its current form. Therefore, users are encouraged to adopt stochastic PC sampling,
starting with the GFX942 architecture, to achieve more precise performance profiling.

Hardware-Based (Stochastic) PC Sampling Method
===============================================
@@ -329,3 +354,4 @@ Fields starting with ``arb_state_`` are of particular interest as they indicate
Namely, ``arb_state_issue_`` fields indicate which types of instructions the arbiter issued at the time of sampling.
Conversely, ``arb_state_stall_`` fields indicate which types of instructions were stalled at the time of sampling.
This information is useful for understanding how many instructions are issued per cycle (IPC).

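As a rough sketch, assume the stochastic samples have been loaded into Python dictionaries keyed by field name, that every ``arb_state_issue_`` and ``arb_state_stall_`` field is 0 or 1, and that each set issue field stands for one instruction issued in the sampled cycle. Under those assumptions, the average number of set issue fields per sample estimates IPC, and counting set stall fields shows which instruction types stall most often. The concrete field names in the example (such as ``arb_state_issue_valu``) are hypothetical placeholders; check the actual names in your output file.

```python
def estimate_issue_rate(samples):
    """Average number of set ``arb_state_issue_*`` fields per sample.

    Assumes each set issue field means one instruction was issued in the
    sampled cycle, so the average approximates instructions per cycle.
    """
    if not samples:
        return 0.0
    total = 0
    for sample in samples:
        total += sum(1 for key, value in sample.items()
                     if key.startswith("arb_state_issue_") and value)
    return total / len(samples)


def stall_breakdown(samples):
    """Count, per ``arb_state_stall_*`` field, how many samples had it set."""
    counts = {}
    for sample in samples:
        for key, value in sample.items():
            if key.startswith("arb_state_stall_") and value:
                counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical samples; real field names and layout may differ.
    samples = [
        {"arb_state_issue_valu": 1, "arb_state_issue_scalar": 1, "arb_state_stall_lds": 0},
        {"arb_state_issue_valu": 0, "arb_state_issue_scalar": 0, "arb_state_stall_lds": 1},
        {"arb_state_issue_valu": 1, "arb_state_issue_scalar": 0, "arb_state_stall_lds": 1},
    ]
    print(estimate_issue_rate(samples))
    print(stall_breakdown(samples))
```

Keys that do not start with either prefix (for example, a PC or timestamp field) are ignored, so the same functions can be applied to full sample records.
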