[Docs][rocprofv3]Add Consecutive Kernels Parameter Description to Docs (#1111)

* Add consecutive kernels parameter description

* remove space

* Updated docs and CHANGELOG
itrowbri
2025-09-29 11:21:13 -05:00
committed by GitHub
parent 81775169cc
commit 956daca743
2 changed files with 21 additions and 0 deletions
@@ -105,6 +105,14 @@ The following table lists the parameters relevant to thread tracing:
| att-gpu-index            | Integer |         |           | Comma-separated list of integers. If enabled, only the GPU   |
|                          | (List)  |         |           | indexes in the list will be profiled by thread trace.        |
+--------------------------+---------+---------+-----------+--------------------------------------------------------------+
| att-consecutive-kernels  | Integer | >=0     |           | Starting at the targeted kernel, enables thread trace for the|
|                          |         |         |           | next N kernel dispatches, sharing a single ATT file,         |
|                          |         |         |           | stats.csv and UI dir. See --kernel-include-regex and         |
|                          |         |         |           | --kernel-iteration-range. If multiple targeted kernels       |
|                          |         |         |           | overlap, the count for N next dispatches starts again from 0.|
|                          |         |         |           | Recommended use with --att-gpu-index due to thread trace     |
|                          |         |         |           | being enabled for all GPUs.                                  |
+--------------------------+---------+---------+-----------+--------------------------------------------------------------+
For AMD Instinct accelerators, enable perfmon streaming using:
@@ -145,6 +153,18 @@ By default, ``rocprofv3`` enables thread trace only once per kernel instance. Th
To enable thread trace for multiple kernel instances, use the ``kernel-iteration-range`` parameter.
It's recommended to use the ``kernel-include-regex`` parameter to filter for the desired kernel names instead of tracing everything.
Typically, each kernel profile has its own ATT output file.
To combine multiple kernel profiles into a single output file, use the ``att-consecutive-kernels`` parameter.
With this parameter, the ``rocprofv3`` tool begins profiling when it encounters a targeted kernel.
The tool then continues profiling subsequent kernels until a total of ``n`` kernels, including the initial targeted kernel, are profiled,
where ``n`` is the non-negative integer passed to ``att-consecutive-kernels``.
Note that the subsequent kernels encountered after the initial targeted kernel do not themselves have to be targeted.
If one of the subsequent kernels is itself a targeted kernel, the count restarts: the profiler profiles another ``n - 1`` kernels after
this new targeted kernel, so a generated ATT file can contain more than ``n`` profiled kernels.
All the kernels profiled in one batch are then written to a single ATT file.
If a new targeted kernel is encountered after the ``rocprofv3`` tool has finished profiling a batch of kernels,
the profiler restarts profiling at this new targeted kernel and creates another ATT file with multiple kernels.
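The batching rules described above can be illustrated with a small simulation. This is only a sketch of the documented behavior, not the ``rocprofv3`` implementation: the function name ``group_into_att_files``, the kernel names, and the use of ``re.search`` to decide whether a kernel is targeted are all assumptions made for illustration. The sketch ignores ``kernel-iteration-range`` and does not model ``n = 0``.

```python
import re


def group_into_att_files(dispatches, include_regex, n):
    """Group kernel dispatches into batches, one batch per simulated ATT file.

    Hypothetical helper: it mimics the documented counting rules only.
    """
    if n < 1:
        raise ValueError("this sketch only models n >= 1")
    pattern = re.compile(include_regex)
    batches = []   # each inner list stands for one ATT file
    remaining = 0  # dispatches still to be profiled in the current batch
    for name in dispatches:
        # Assumption: a kernel is "targeted" if the regex matches its name.
        if pattern.search(name) is not None:
            if remaining == 0:
                batches.append([])  # previous batch finished: new ATT file
            remaining = n           # the count (re)starts at a targeted kernel
        if remaining > 0:
            batches[-1].append(name)
            remaining -= 1
    return batches


dispatches = ["init", "gemm", "relu", "copy", "gemm", "gemm",
              "relu", "copy", "copy", "gemm"]
for index, batch in enumerate(group_into_att_files(dispatches, "gemm", 2)):
    print(index, batch)
```

With ``n = 2``, the first ``gemm`` is batched with the untargeted ``relu`` that follows it. The two back-to-back ``gemm`` dispatches land in one batch of three kernels, because the second one restarts the count. The final ``gemm`` starts a third batch on its own.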
.. _output-files:
rocprofv3 output files