[Docs][rocprofv3]Add Consecutive Kernels Parameter Description to Docs (#1111)

* Add consecutive kernels parameter description

* remove space

* Updated docs and CHANGELOG
itrowbri
2025-09-29 11:21:13 -05:00
committed by GitHub
parent 81775169cc
commit 956daca743
2 changed files with 21 additions and 0 deletions
+1
@@ -195,6 +195,7 @@ Full documentation for ROCprofiler-SDK is available at [rocm.docs.amd.com/projec
- Perfetto support for scratch memory.
- Support in the `rocprofv3` avail tool for command-line arguments.
- Documentation for `rocprofv3` advanced options.
- Support for multi-dispatch ATT files.
### Changed
+20
@@ -105,6 +105,14 @@ The following table lists the parameters relevant to thread tracing:
| att-gpu-index            | Integer |         |           | Comma-separated list of integers. If enabled, only the GPU   |
|                          | (List)  |         |           | indexes in the list will be profiled by thread trace.        |
+--------------------------+---------+---------+-----------+--------------------------------------------------------------+
| att-consecutive-kernels  | Integer | >=0     |           | Starting at the targeted kernel, enables thread trace for the|
|                          |         |         |           | next N kernel dispatches, sharing a single ATT file,         |
|                          |         |         |           | stats.csv and UI dir. See --kernel-include-regex and         |
|                          |         |         |           | --kernel-iteration-range. If multiple targeted kernels       |
|                          |         |         |           | overlap, the count for N next dispatches starts again from 0.|
|                          |         |         |           | Recommended use with --att-gpu-index due to thread trace     |
|                          |         |         |           | being enabled for all GPUs.                                  |
+--------------------------+---------+---------+-----------+--------------------------------------------------------------+
For AMD Instinct accelerators, enable perfmon streaming using:
@@ -145,6 +153,18 @@ By default, ``rocprofv3`` enables thread trace only once per kernel instance. Th
To enable thread trace for multiple kernel instances, use the ``kernel-iteration-range`` parameter.
It's recommended to use the ``kernel-include-regex`` parameter to filter the desired kernel names instead of tracing everything.
Typically, each kernel profile has its own ATT file output.
To combine multiple kernel profiles into a single output file, use the ``att-consecutive-kernels`` parameter.
With this parameter, ``rocprofv3`` starts profiling when it encounters a targeted kernel
and continues with the subsequent kernel dispatches until a total of ``n`` kernels are profiled, including the initial targeted kernel,
where ``n`` is the non-negative integer passed to ``att-consecutive-kernels``.
The kernels that follow the initial targeted kernel don't have to be targeted themselves.
If one of them is targeted, the count restarts: the profiler profiles that kernel and another ``n - 1`` kernels after it,
so a generated ATT file can contain more than ``n`` profiled kernels.
All kernels profiled in one such batch are written to a single ATT file.
If a targeted kernel is encountered after ``rocprofv3`` has finished profiling a batch,
the profiler starts a new batch at that kernel and creates another ATT file.
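The batching rule described above can be sketched as a small simulation. This is only an illustrative model, not ``rocprofv3`` code: the function name ``plan_att_batches`` is hypothetical, targeted kernels are assumed to be matched with a regular-expression search on the kernel name, ``n = 0`` is rejected because its behavior is not described here, and a batch still open at the end of the dispatch list is assumed to be written out as is.

```python
import re


def plan_att_batches(kernel_names, include_regex, n):
    """Group dispatch indices into batches, one batch per ATT file.

    Hypothetical model of the documented rule: a targeted kernel opens
    (or extends) a batch and sets the number of kernels still to be
    profiled to n, counting the targeted kernel itself.
    """
    if n < 1:
        # Assumption: the text does not describe n = 0, so it is rejected.
        raise ValueError("this sketch only models n >= 1")

    pattern = re.compile(include_regex)
    batches = []      # finished batches, each a list of dispatch indices
    current = []      # batch currently being filled
    remaining = 0     # kernels still to profile in the current batch

    for index, name in enumerate(kernel_names):
        if pattern.search(name):
            # Targeted kernel: the count starts again, also mid-batch.
            remaining = n
        if remaining == 0:
            # Not inside a batch and not targeted: the dispatch is skipped.
            continue
        current.append(index)
        remaining -= 1
        if remaining == 0:
            batches.append(current)
            current = []

    if current:
        # Assumption: an unfinished batch is still emitted at the end.
        batches.append(current)
    return batches
```

With the dispatch sequence ``gemm, relu, gemm, relu, copy, copy``, the pattern ``gemm`` and ``n = 3``, the sketch returns ``[[0, 1, 2, 3, 4]]``: the second ``gemm`` restarts the count inside the open batch, so the single batch holds five kernels, more than ``n``.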
.. _output-files:
rocprofv3 output files