diff --git a/source/docs/how-to/using-pc-sampling.rst b/source/docs/how-to/using-pc-sampling.rst
index 203ad72dfb..42c95d7734 100644
--- a/source/docs/how-to/using-pc-sampling.rst
+++ b/source/docs/how-to/using-pc-sampling.rst
@@ -201,6 +201,31 @@ The preceding command generates a JSON file with the comprehensive output. Here
 
 For description of the fields in the JSON output, see :ref:`output-file-fields`.
 
+An Arbitrary Host-Trap PC Sampling Skid
+===============================================
+
+Host-Trap PC sampling is a software-based technique that utilizes a background kernel thread
+to periodically interrupt running waves in order to capture the program counter (PC).
+This method is effective for gathering performance data without requiring specialized hardware
+to snapshot the waves. However, it has limitations due to the potential delay between
+when a wave receives an interrupt and when it processes the interrupt to capture the PC.
+This delay can lead to a sampling skid, where the PC samples may be attributed to instructions
+that are up to two instructions away from the actual source of latency.
+This results in a non-precise intra-kernel sampling method.
+
+When analyzing an application profile generated by host-trap PC sampling,
+developers should consider not only the reported most costly instruction but
+also the instructions immediately preceding or following it.
+If the costly instruction is near a branch instruction, it is important
+to also consider the instruction targeted by the branch and the one immediately following it.
+
+To address the limitations of host-trap sampling, the hardware-based stochastic PC sampling method
+has been developed. This method provides precise intra-kernel sampling with zero sampling skid,
+offering more accurate performance insights.
+
+It is important to note that the skid issue inherent in host-trap PC sampling will not be resolved
+in its current form. Therefore, users are encouraged to adopt stochastic PC sampling,
+starting with the GFX942 architecture, to achieve more precise performance profiling.
 
 Hardware-Based (Stochastic) PC Sampling Method
 ===============================================
@@ -329,3 +354,4 @@ Fields starting with ``arb_state_`` are of particular interest as they indicate
 Namely, ``arb_state_issue_`` fields indicate what type of instructions arbiter issued at the time of sampling.
 On the other hand, ``arb_state_stall_`` fields indicate what type of instructions were stalled at the time of sampling.
 This information is useful for understanding how many instructions per cycle (IPC) are issued.
+
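
The skid guidance in the added section can be illustrated with a small sketch. It assumes the host-trap samples have already been reduced to a list of per-instruction sample counts in program order. That list, the function name ``skid_candidates``, and the ``skid`` parameter are hypothetical and are not part of the profiler's output format or API. The sketch only covers straight-line code, so branch targets near the hot instruction still need to be checked by hand.

```python
def skid_candidates(counts, skid=2):
    """Return (hottest_index, candidate_indices).

    counts: hypothetical per-instruction sample counts in program order.
    skid:   assumed maximum skid in instructions (the section states up to two).
    """
    # Instruction with the most samples attributed to it (first one on ties).
    hottest = max(range(len(counts)), key=counts.__getitem__)
    # Every instruction within `skid` positions may be the real source of latency.
    lo = max(0, hottest - skid)
    hi = min(len(counts) - 1, hottest + skid)
    return hottest, list(range(lo, hi + 1))


if __name__ == "__main__":
    hottest, candidates = skid_candidates([3, 1, 40, 2, 0, 5])
    print(hottest, candidates)
```

With these made-up counts, the hottest instruction is at index 2, and indices 0 through 4 are all worth inspecting rather than index 2 alone.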