.. meta::
   :description: ROCm Compute Profiler basic usage
   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, AMD,
              basics, usage, operations

***********
Basic usage
***********
The following section outlines basic ROCm Compute Profiler workflows, modes, options, and
operations.
Command line profiler
=====================
Launch and profile the target application using the command line profiler.
The command line profiler launches the target application, calls the
ROCProfiler API via the ``rocprof`` binary, and collects profile results for
the specified kernels, dispatches, and hardware components. If not
specified, ROCm Compute Profiler defaults to collecting all available counters for all
kernels and dispatches launched by your executable.
To collect the default set of data for all kernels in the target
application, run a command such as:

.. code-block:: shell

   $ rocprof-compute profile -n vcopy_data -- ./vcopy -n 1048576 -b 256

This runs the application, profiles each kernel it launches, and generates
profiling results. By default, results are written to a subdirectory named
after your accelerator, for example ``./workloads/vcopy_data/MI200/``, where
the workload name (``vcopy_data``) is set with the ``-n`` argument.
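
As a minimal sketch of where to look after the run above, the following snippet
rebuilds the default output path from the workload name and the accelerator
name. The ``MI200`` value is taken from the example path and is an assumption;
substitute the subdirectory name created on your system.

.. code-block:: shell

   # Workload name passed to -n in the profile command above.
   name="vcopy_data"
   # Accelerator subdirectory; MI200 is only the example value, yours may differ.
   arch="MI200"
   workload_dir="./workloads/${name}/${arch}"

   if [ -d "${workload_dir}" ]; then
       # List the files written by the profiling run.
       ls "${workload_dir}"
   else
       echo "No profiling results found at ${workload_dir}" >&2
   fi
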

.. note::

   To collect all requested profile information, ROCm Compute Profiler might replay kernels
   multiple times.

.. _basic-filter-data-collection:
Customize data collection
-------------------------
Options are available to specify the kernels and metrics for which data is
collected. Note that you can apply filtering in either the profiling or
analysis stage. Filtering at the profiling stage often reduces your
overall profiling run time.
Common filters to customize data collection include:

``-k``, ``--kernel``
   Enables filtering kernels by name.

``-d``, ``--dispatch``
   Enables filtering based on dispatch iteration.

``-b``, ``--block``
   Enables collection of metrics for only the specified analysis report blocks.

See :ref:`Filtering <filtering>` for an in-depth walkthrough.
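
The filters above can be combined in a single profiling run. The following is
a sketch only: the kernel name ``vecCopy`` and the dispatch iteration ``0`` are
placeholder assumptions, so replace them with values that match your
application. A ``-b`` filter can be appended in the same way.

.. code-block:: shell

   # Placeholder values: replace with a kernel name and dispatch iteration
   # from your own application.
   kernel="vecCopy"
   dispatch="0"

   # Build the command as positional parameters so it can be inspected first.
   set -- rocprof-compute profile -n vcopy_data \
       -k "${kernel}" -d "${dispatch}" \
       -- ./vcopy -n 1048576 -b 256
   cmd_line="$*"
   echo "${cmd_line}"

   # Run it only when the profiler is actually installed.
   if command -v rocprof-compute >/dev/null 2>&1; then
       "$@"
   fi

Printing the command before running it makes it easy to confirm which filters
are in effect.
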
To view available metrics by hardware block, use the ``--list-metrics``
option with a system architecture argument, or use ``--list-available-metrics``
to view the metrics for the current system architecture:

.. code-block:: shell

   $ rocprof-compute --list-metrics <sys_arch>
   $ rocprof-compute profile --list-available-metrics

To view available aliases by hardware block, use the ``--list-blocks``
option with a system architecture argument:

.. code-block:: shell

   $ rocprof-compute --list-blocks <sys_arch>

.. _basic-analyze-cli:
Analyze in the command line
---------------------------
After generating a local output folder (for example,
``./workloads/vcopy_data/MI200``), use the command line tool to quickly
interface with profiling results. View different metrics derived from your
profiled results and get immediate access to all metrics organized by hardware
blocks.
If you don't apply kernel, dispatch, or analysis report block filters at this stage,
the analysis reflects the entirety of the profiling data.
To interact with profiling results from a different session, provide the
workload path.

``-p``, ``--path``
   Enables you to analyze existing profiling data in the ROCm Compute Profiler CLI.

See :doc:`analyze/cli` for more detailed information.
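
For example, to analyze the results of the earlier ``vcopy_data`` run, a
command along these lines can be used. The path is the example path from this
page and is assumed to exist on your system; adjust it to your own workload
folder.

.. code-block:: shell

   # Example workload folder produced by the profile step; adjust as needed.
   workload_dir="./workloads/vcopy_data/MI200"

   if [ -d "${workload_dir}" ]; then
       # Load the existing profiling data into the CLI analyzer.
       rocprof-compute analyze -p "${workload_dir}"
   else
       echo "Workload path not found: ${workload_dir}" >&2
   fi
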
.. _modes:
Modes
=====
Modes change the fundamental behavior of the ROCm Compute Profiler command line tool.
Depending on which mode you choose, different command line options become
available.
.. _modes-profile:
Profile mode
------------

``profile``
   Launches the target application on the local system using
   :doc:`ROCProfiler <rocprofiler:index>`. Depending on the profiling options
   chosen, selected kernels, dispatches, and/or hardware components used by the
   application are profiled. It stores results locally in an output folder:
   ``./workloads/<name>``.

.. code-block:: shell

   $ rocprof-compute profile --help

See :doc:`profile/mode` to learn about this mode in depth and to get started
profiling with ROCm Compute Profiler.
.. _modes-analyze:
Analyze mode
------------

``analyze``
   Loads profiling data from the ``--path`` (``-p``) directory into the ROCm Compute Profiler
   CLI analyzer where you have immediate access to profiling results and
   generated metrics. It generates metrics from the entirety of your profiled
   application or a subset identified through the ROCm Compute Profiler CLI analysis filters.

   To generate a lightweight GUI, you can add the ``--gui`` flag to your
   analysis command.

.. code-block:: shell

   $ rocprof-compute analyze --help

Analyze mode also supports a lightweight text-based user interface (TUI) that
provides an interactive terminal experience. To enable TUI mode,
use the ``--tui`` flag when running the analyze command:

.. code-block:: shell

   $ rocprof-compute analyze --tui

See :doc:`analyze/mode` to learn about this mode in depth and to get started
with analysis using ROCm Compute Profiler.
.. _global-options:
Global options
==============
The ROCm Compute Profiler command line tool has a set of *global* utility options that are
available across all modes.

``-v``, ``--version``
   Prints the ROCm Compute Profiler version and exits.

``-V``, ``--verbose``
   Increases output verbosity. Use multiple times for higher levels of
   verbosity.

``-q``, ``--quiet``
   Reduces output verbosity and runs quietly.

``-s``, ``--specs``
   Prints system specs and exits.


.. note::

   ROCm Compute Profiler also recognizes the ``ROCPROFCOMPUTE_COLOR``
   environment variable. To disable the default colorful output, set
   this variable to ``0``.

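
A minimal sketch of disabling colorful output for the current shell session,
assuming ``ROCPROFCOMPUTE_COLOR`` is read from the process environment as
described in the note above:

.. code-block:: shell

   # Disable colorful output for every rocprof-compute command in this session.
   export ROCPROFCOMPUTE_COLOR=0

   # Example: print the version without colored output, if the tool is installed.
   if command -v rocprof-compute >/dev/null 2>&1; then
       rocprof-compute --version
   fi
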
.. _basic-operations:
Basic operations
================
The following table lists ROCm Compute Profiler's basic operations, their
:ref:`modes <modes>`, and required arguments.

.. list-table::
   :header-rows: 1

   * - Operation description
     - Mode
     - Required arguments

   * - :doc:`Profile a workload </how-to/profile/mode>`
     - ``profile``
     - ``--name``, ``-- <profile_cmd>``

   * - :ref:`Standalone roofline analysis <standalone-roofline>`
     - ``profile``
     - ``--name``, ``--roof-only``, ``--roofline-data-type <data_type>``, ``-- <profile_cmd>``

   * - :doc:`Launch standalone GUI from CLI </how-to/analyze/standalone-gui>`
     - ``analyze``
     - ``--path``, ``--gui``

   * - :doc:`Interact with profiling results from CLI </how-to/analyze/cli>`
     - ``analyze``
     - ``--path``