.. meta::
   :description: ROCm Compute Profiler basic usage
   :keywords: ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, AMD,
              basics, usage, operations

***********
Basic usage
***********

The following section outlines basic ROCm Compute Profiler workflows, modes, options, and
operations.

Command line profiler
=====================

Launch and profile the target application using the command line profiler.

The command line profiler launches the target application, calls the
ROCProfiler API via the ``rocprof`` binary, and collects profile results for
the specified kernels, dispatches, and hardware components. If none are
specified, ROCm Compute Profiler defaults to collecting all available counters for all
kernels and dispatches launched by your executable.

To collect the default set of data for all kernels in the target
application, run a command such as:

.. code-block:: shell

   $ rocprof-compute profile -n vcopy_data -- ./vcopy -n 1048576 -b 256

This runs the app, launches each kernel, and generates profiling results. By
default, results are written to a subdirectory named after your accelerator;
for example, ``./workloads/vcopy_data/MI200/``, where the workload name
(``vcopy_data``) is configurable via the ``-n`` argument.

.. note::

   To collect all requested profile information, ROCm Compute Profiler might replay kernels
   multiple times.

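As a minimal sketch of where to look after the run above, the following snippet
builds the expected output path from the workload name and the accelerator name,
then lists its contents if the directory exists. The ``MI200`` value is taken from
the example above and is an assumption; substitute your own accelerator's name.

```shell
# Assumed values, taken from the example above; adjust for your system.
name="vcopy_data"   # workload name passed via -n
accel="MI200"       # accelerator name (assumption: yours may differ)
workload_dir="./workloads/${name}/${accel}"

if [ -d "${workload_dir}" ]; then
    # List whatever the profiler wrote for this workload.
    ls "${workload_dir}"
else
    echo "No results at ${workload_dir}; run the profile command first."
fi
```
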
.. _basic-filter-data-collection:

Customize data collection
-------------------------

Options are available to specify the kernels and metrics for which data should be
collected. Note that you can apply filtering in either the profiling or
analysis stage. Filtering at the profiling stage often shortens your
overall profiling run time.

Common filters to customize data collection include:

``-k``, ``--kernel``
   Enables filtering kernels by name.

``-d``, ``--dispatch``
   Enables filtering based on dispatch iteration.

``-b``, ``--block``
   Enables collecting metrics for only the specified analysis report blocks.

See :ref:`Filtering <filtering>` for an in-depth walkthrough.

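For illustration, a profiling run restricted to one kernel and one dispatch
iteration might look like the following sketch. The kernel name ``vecCopy``, the
dispatch value ``0``, and the workload name ``vcopy_filtered`` are hypothetical
placeholders, not values confirmed by this guide. The snippet skips the run if
``rocprof-compute`` is not on your ``PATH``.

```shell
# Hypothetical filter values; replace them with names from your own application.
kernel="vecCopy"   # kernel name to match (assumption)
dispatch="0"       # dispatch iteration to keep (assumption)

if command -v rocprof-compute >/dev/null 2>&1; then
    # Everything after "--" is the target application's own command line.
    rocprof-compute profile -n vcopy_filtered -k "${kernel}" -d "${dispatch}" \
        -- ./vcopy -n 1048576 -b 256
else
    echo "rocprof-compute not found on PATH; skipping profiling run."
fi
```

Note that the ``-b 256`` after ``--`` belongs to the ``vcopy`` application and is
unrelated to the profiler's ``--block`` filter.
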
To view available metrics by hardware block, use the ``--list-metrics``
option with a system architecture argument, or use ``--list-available-metrics``
to view the metrics for the current system architecture:

.. code-block:: shell

   $ rocprof-compute --list-metrics <sys_arch>
   $ rocprof-compute profile --list-available-metrics

To view available aliases by hardware block, use the ``--list-blocks``
option with a system architecture argument:

.. code-block:: shell

   $ rocprof-compute --list-blocks <sys_arch>

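As a hedged example of filling in ``<sys_arch>``, the sketch below assumes
``gfx90a``, the architecture name of MI200-series accelerators. Whether your
installation accepts this exact spelling is an assumption; check the tool's help
output if the command rejects it.

```shell
# Assumed architecture string for MI200-series accelerators.
sys_arch="gfx90a"

if command -v rocprof-compute >/dev/null 2>&1; then
    # Show the metrics, then the block aliases, for the chosen architecture.
    rocprof-compute --list-metrics "${sys_arch}"
    rocprof-compute --list-blocks "${sys_arch}"
else
    echo "rocprof-compute not found on PATH; skipping."
fi
```
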
.. _basic-analyze-cli:

Analyze in the command line
---------------------------

After generating a local output folder (for example,
``./workloads/vcopy_data/MI200``), use the command line tool to quickly
interface with profiling results. View different metrics derived from your
profiled results and get immediate access to all metrics organized by hardware
blocks.

If you don't apply kernel, dispatch, or analysis report block filters at this stage,
the analysis reflects the entirety of the profiling data.

To interact with profiling results from a different session, provide the
workload path.

``-p``, ``--path``
   Enables you to analyze existing profiling data in the ROCm Compute Profiler CLI.

See :doc:`analyze/cli` for more detailed information.

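A minimal sketch of analyzing a previously collected workload, assuming the
``./workloads/vcopy_data/MI200`` folder from the earlier example exists on your
system:

```shell
# Path assumed from the earlier example; point this at your own workload folder.
workload_dir="./workloads/vcopy_data/MI200"

if command -v rocprof-compute >/dev/null 2>&1 && [ -d "${workload_dir}" ]; then
    # Load the existing profiling data into the CLI analyzer.
    rocprof-compute analyze -p "${workload_dir}"
else
    echo "Tool or workload folder missing; nothing to analyze."
fi
```
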
.. _modes:

Modes
=====

Modes change the fundamental behavior of the ROCm Compute Profiler command line tool.
Depending on which mode you choose, different command line options become
available.

.. _modes-profile:

Profile mode
------------

``profile``
   Launches the target application on the local system using
   :doc:`ROCProfiler <rocprofiler:index>`. Depending on the profiling options
   chosen, selected kernels, dispatches, and/or hardware components used by the
   application are profiled. It stores results locally in an output folder:
   ``./workloads/<name>``.

.. code-block:: shell

   $ rocprof-compute profile --help

See :doc:`profile/mode` to learn about this mode in depth and to get started
profiling with ROCm Compute Profiler.

.. _modes-analyze:

Analyze mode
------------

``analyze``
   Loads profiling data from the ``--path`` (``-p``) directory into the ROCm Compute Profiler
   CLI analyzer where you have immediate access to profiling results and
   generated metrics. It generates metrics from the entirety of your profiled
   application or a subset identified through the ROCm Compute Profiler CLI analysis filters.

   To generate a lightweight GUI, you can add the ``--gui`` flag to your
   analysis command.

.. code-block:: shell

   $ rocprof-compute analyze --help

Analyze mode now supports a lightweight Text-based User Interface (TUI) that
provides an interactive terminal experience for enhanced usability. To enable TUI mode,
use the ``--tui`` flag when running the analyze command:

.. code-block:: shell

   $ rocprof-compute analyze --tui

See :doc:`analyze/mode` to learn about these modes in depth and to get started
with analysis using ROCm Compute Profiler.

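As a sketch of the ``--gui`` variant, the snippet below passes ``--gui`` together
with ``--path``. The workload folder is assumed from the earlier example, and the
variable name ``gui_workload`` is only illustrative.

```shell
# Workload folder assumed from the earlier profiling example.
gui_workload="./workloads/vcopy_data/MI200"

if command -v rocprof-compute >/dev/null 2>&1 && [ -d "${gui_workload}" ]; then
    # Same analysis input as the CLI, presented through the lightweight GUI.
    rocprof-compute analyze --path "${gui_workload}" --gui
else
    echo "Tool or workload folder missing; GUI not started."
fi
```
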
.. _global-options:

Global options
==============

The ROCm Compute Profiler command line tool has a set of *global* utility options that are
available across all modes.

``-v``, ``--version``
   Prints the ROCm Compute Profiler version and exits.

``-V``, ``--verbose``
   Increases output verbosity. Use multiple times for higher levels of
   verbosity.

``-q``, ``--quiet``
   Reduces output verbosity and runs quietly.

``-s``, ``--specs``
   Prints system specs and exits.

.. note::

   ROCm Compute Profiler also recognizes the project variable ``ROCPROFCOMPUTE_COLOR``,
   should you choose to disable colorful output. To disable the default colorful
   behavior, set this variable to ``0``.

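The following sketch shows one way to disable colorful output for a shell
session. It assumes ``ROCPROFCOMPUTE_COLOR`` is read from the process environment,
which the text above does not state explicitly.

```shell
# Assumption: the tool reads this variable from the environment.
export ROCPROFCOMPUTE_COLOR=0

if command -v rocprof-compute >/dev/null 2>&1; then
    # Print system specs; the output should no longer be colorized.
    rocprof-compute --specs
else
    echo "rocprof-compute not found on PATH; variable set for later runs."
fi
```
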
.. _basic-operations:

Basic operations
================

The following table lists ROCm Compute Profiler's basic operations, their
:ref:`modes <modes>`, and required arguments.

.. list-table::
   :header-rows: 1

   * - Operation description
     - Mode
     - Required arguments

   * - :doc:`Profile a workload </how-to/profile/mode>`
     - ``profile``
     - ``--name``, ``-- <profile_cmd>``

   * - :ref:`Standalone roofline analysis <standalone-roofline>`
     - ``profile``
     - ``--name``, ``--roof-only``, ``--roofline-data-type <data_type>``, ``-- <profile_cmd>``

   * - :doc:`Launch standalone GUI from CLI </how-to/analyze/standalone-gui>`
     - ``analyze``
     - ``--path``, ``--gui``

   * - :doc:`Interact with profiling results from CLI </how-to/analyze/cli>`
     - ``analyze``
     - ``--path``

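To make the roofline row concrete, here is a hedged sketch that combines the
required arguments listed in the table. The workload name ``vcopy_roof`` and the
data type ``FP32`` are assumed placeholder values; consult the roofline
documentation for the data types your version accepts.

```shell
# Placeholder values (assumptions), not confirmed by the table above.
roof_name="vcopy_roof"
data_type="FP32"

if command -v rocprof-compute >/dev/null 2>&1; then
    # Roofline-only profiling run for the chosen data type.
    rocprof-compute profile --name "${roof_name}" --roof-only \
        --roofline-data-type "${data_type}" -- ./vcopy -n 1048576 -b 256
else
    echo "rocprof-compute not found on PATH; skipping roofline run."
fi
```
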