From 956daca743203ea75fe67ba48cc3b3d5e5124ffb Mon Sep 17 00:00:00 2001
From: itrowbri
Date: Mon, 29 Sep 2025 11:21:13 -0500
Subject: [PATCH] [Docs][rocprofv3]Add Consecutive Kernels Parameter Description to Docs (#1111)

* Add consecutive kernels parameter description

* remove space

* Updated docs and CHANGELOG
---
 projects/rocprofiler-sdk/CHANGELOG.md         |  1 +
 .../source/docs/how-to/using-thread-trace.rst | 20 +++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/projects/rocprofiler-sdk/CHANGELOG.md b/projects/rocprofiler-sdk/CHANGELOG.md
index 05e5cf7046..af0f98614d 100644
--- a/projects/rocprofiler-sdk/CHANGELOG.md
+++ b/projects/rocprofiler-sdk/CHANGELOG.md
@@ -195,6 +195,7 @@ Full documentation for ROCprofiler-SDK is available at [rocm.docs.amd.com/projec
 - Perfetto support for scratch memory.
 - Support in the `rocprofv3` avail tool for command-line arguments.
 - Documentation for `rocprofv3` advanced options.
+- Support for multi dispatch ATT file added
 
 ### Changed
 
diff --git a/projects/rocprofiler-sdk/source/docs/how-to/using-thread-trace.rst b/projects/rocprofiler-sdk/source/docs/how-to/using-thread-trace.rst
index c78c0e0c2d..aa729c6de3 100644
--- a/projects/rocprofiler-sdk/source/docs/how-to/using-thread-trace.rst
+++ b/projects/rocprofiler-sdk/source/docs/how-to/using-thread-trace.rst
@@ -105,6 +105,14 @@ The following table lists the parameters relevant to thread tracing:
 | att-gpu-index            | Integer |         |           | Comma-separated list of integers. If enabled, only the GPU   |
 |                          | (List)  |         |           | indexes in the list will be profiled by thread trace.        |
 +--------------------------+---------+---------+-----------+--------------------------------------------------------------+
+| att-consecutive-kernels  | Integer | >=0     |           | Starting at the targeted kernel, enables thread trace for the|
+|                          |         |         |           | next N kernel dispatches, sharing a single ATT file,         |
+|                          |         |         |           | stats.csv and UI dir. See --kernel-include-regex and         |
+|                          |         |         |           | --kernel-iteration-range. If multiple targeted kernels       |
+|                          |         |         |           | overlap, the count for N next dispatches starts again from 0.|
+|                          |         |         |           | Recommended use with --att-gpu-index due to thread trace     |
+|                          |         |         |           | being enabled for all GPUs.                                  |
++--------------------------+---------+---------+-----------+--------------------------------------------------------------+
 
 For AMD Instinct accelerators, enable perfmon streaming using:
 
@@ -145,6 +153,18 @@ By default, ``rocprofv3`` enables thread trace only once per kernel instance. Th
 To enable thread trace for multiple kernel instances, use the ``kernel-iteration-range`` parameter.
 It's recommended to use ``kernel-include-regex`` parameter to filter the desired kernel names instead of tracing everything.
 
+Typically, each kernel profile has its own ATT file output.
+To compile multiple kernel profiles into a single output file, use the ``att-consecutive-kernels`` parameter.
+When using this parameter, the ``rocprofv3`` tool begins profiling kernels after encountering a targeted kernel.
+The tool then continues profiling subsequent kernels until a total of ``n`` kernels are profiled including the initial targeted kernel
+where ``n`` is the non-negative integer passed to ``att-consecutive-kernels``.
+Note that the subsequent kernels encountered after the initial targeted kernel do not themselves have to be targeted.
+If the subsequent kernels are targeted kernels, the profiler will then profile another ``n - 1`` kernels after encountering this
+new targeted kernel, so it is possible for a generated ATT file to have more than ``n`` kernels profiled.
+All the profiled kernels are then compiled into a single ATT file.
+If a new targeted kernel is encountered after the ``rocprofv3`` tool has finished profiling a batch of kernels,
+the profiler will restart profiling when encountering this new targeted kernel and create another ATT file with multiple kernels.
+
 .. _output-files:
 
 rocprofv3 output files
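
Note on the ``att-consecutive-kernels`` text added by this patch: the batching rule it describes (start at a targeted kernel, trace ``n`` dispatches including that one, restart the count whenever another targeted kernel arrives inside the window, and open a new ATT file once a batch has finished) can be modelled with a small Python sketch. This is a toy model for checking one's reading of the description, not ``rocprofv3`` code. The function name ``group_att_batches``, the ``(name, targeted)`` tuples, and the sample dispatch list are hypothetical, and the sketch assumes serialized dispatches and ``n >= 1``, because the patch does not say what a value of 0 does.

```python
from typing import List, Tuple


def group_att_batches(dispatches: List[Tuple[str, bool]], n: int) -> List[List[str]]:
    """Toy model: return one list of kernel names per ATT file.

    `dispatches` is a hypothetical ordered list of (kernel_name, is_targeted).
    `n` plays the role of the value passed to att-consecutive-kernels.
    """
    if n < 1:
        # Assumption: the behaviour for n == 0 is not described in the patch.
        raise ValueError("this sketch assumes n >= 1")

    batches: List[List[str]] = []
    current: List[str] = []
    remaining = 0  # dispatches still to be traced in the open batch

    for name, targeted in dispatches:
        if targeted:
            if remaining == 0:
                # No batch is open: start a new one (a new ATT file).
                current = []
                batches.append(current)
            # A targeted kernel (re)starts the count; it is itself one of the n.
            remaining = n
        if remaining > 0:
            current.append(name)
            remaining -= 1

    return batches


# Hypothetical dispatch stream: "A" matches the kernel filter, the rest do not.
stream = [("A", True), ("b", False), ("c", False), ("d", False),
          ("A", True), ("A", True), ("e", False), ("f", False)]
result = group_att_batches(stream, n=2)
```

With ``n=2`` the model yields two batches. The first holds ``A`` and ``b``. The second holds ``A``, ``A`` and ``e``, because the second targeted ``A`` restarts the count, which is how a single ATT file can end up with more than ``n`` kernels.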