Getting Started

.. toctree::
   :glob:
   :maxdepth: 4

Quickstart

  1. Launch & Profile the target application with the command line profiler

    The command line profiler launches the target application, calls the rocProfiler API, and collects profile results for the specified kernels, dispatches, and/or IP blocks. If no filters are specified, Omniperf collects all available counters for all kernels and dispatches launched by the user's executable.

    To collect the default set of data for all kernels in the target application, run, for example:

    $ omniperf profile -n vcopy_data -- ./vcopy 1048576 256
    

    The application runs, each kernel is launched, and profiling results are generated. By default, results are written to ./workloads/<name>, where <name> is the value of the -n argument (./workloads/vcopy_data in this example). Collecting all requested profile information may require replaying kernels multiple times.
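
As a small illustration of how the -n argument maps to the output location, the sketch below derives the results folder from the workload name and checks whether it exists yet. The variable names are placeholders chosen for this example, and the files that end up inside the folder are not covered here.

```shell
# Workload name passed to `omniperf profile -n` (value from the example above).
WORKLOAD_NAME="vcopy_data"

# Default output location described above: ./workloads/<name>
WORKLOAD_DIR="./workloads/${WORKLOAD_NAME}"

if [ -d "$WORKLOAD_DIR" ]; then
    echo "Profiling results found in $WORKLOAD_DIR:"
    ls "$WORKLOAD_DIR"
else
    echo "No results in $WORKLOAD_DIR yet; run 'omniperf profile' first."
fi
```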

  2. Customize data collection

    Options are available to specify the kernels and metrics for which data should be collected. Filtering can be applied at either the profiling or the analysis stage; however, filtering during profiling often shortens the overall profiling run time.

    Some common filters include:

    • -k/--kernel filters kernels by name.
    • -d/--dispatch filters by dispatch iteration.
    • -b/--ipblocks collects metrics for only the specified IP block(s).

    To view a list of all available metrics organized by IP block, use the --list-metrics argument:

    $ omniperf analyze --list-metrics <sys_arch>
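
The filters above can be combined in a single profile command. The following is a minimal sketch, not a recommended configuration: the kernel name vecCopy, the dispatch iteration 0, and the IP block names SQ and TCC are assumptions chosen for illustration and may not match your application or GPU. The command is built as a string and printed so that it can be reviewed before it is run.

```shell
# Assumed values for illustration only; substitute your own kernel name,
# dispatch iteration, and IP blocks.
KERNEL="vecCopy"
DISPATCH=0
IPBLOCKS="SQ TCC"

# Combine the -k, -d, and -b filters described above with the quickstart
# profile command. The command is kept as a string so it can be inspected.
FILTERED_CMD="omniperf profile -n vcopy_data -k $KERNEL -d $DISPATCH -b $IPBLOCKS -- ./vcopy 1048576 256"

# Print the command; copy the printed line into your shell to run it.
echo "$FILTERED_CMD"
```

Restricting collection in this way should reduce the number of counters gathered, which is where the shorter profiling run time mentioned above comes from.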
    
  3. Analyze at the command line

    After generating a local output folder (./workloads/<name>), the command line tool can also be used to quickly explore profiling results. View different metrics derived from your profiled results and get immediate access to all metrics organized by IP block.

    If no kernel, dispatch, or IP block filters are applied at this stage, the analysis reflects the entire profiling data set.

    To interact with profiling results from a different session, provide the workload path: -p/--path lets users analyze existing profiling data in the Omniperf CLI.
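
For instance, analyzing the quickstart workload from a later session might look like the sketch below. It assumes the results sit directly in ./workloads/vcopy_data; if your Omniperf version nests results in a subdirectory of the workload folder, point -p at that subdirectory instead. The guard only runs the analyzer when both the tool and the folder are present, and otherwise prints the command that would be used.

```shell
# Path to an existing workload folder (assumed location, from the quickstart).
WORKLOAD_PATH="./workloads/vcopy_data"

if command -v omniperf >/dev/null 2>&1 && [ -d "$WORKLOAD_PATH" ]; then
    # Load the stored profiling data into the CLI analyzer.
    omniperf analyze -p "$WORKLOAD_PATH"
else
    echo "Would run: omniperf analyze -p $WORKLOAD_PATH"
fi
```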

  4. Analyze in the Grafana GUI

    For a more in-depth analysis of profiling results, we recommend the Omniperf Grafana GUI. To interact with profiling results there, users must first import their data into the MongoDB instance included in the Omniperf dockerfile.

    To import data into the Omniperf DB for use in the Grafana GUI, enter database mode. For example:

     $ omniperf database --import [CONNECTION OPTIONS]
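
A filled-in import command could look like the sketch below. The connection options used are the ones listed as required under Basic Operations (--host, --username, --workload, --team); every value is a placeholder invented for this example, not a default shipped with Omniperf.

```shell
# Placeholder connection details; replace with those of your own MongoDB instance.
DB_HOST="db.example.com"
DB_USER="alice"
DB_TEAM="perfteam"
WORKLOAD="./workloads/vcopy_data"

# Build the import command from the required database-mode arguments.
IMPORT_CMD="omniperf database --import --host $DB_HOST --username $DB_USER --team $DB_TEAM --workload $WORKLOAD"

# Print the command; copy the printed line into your shell to run it.
echo "$IMPORT_CMD"
```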
    

Usage

Modes

Modes change the fundamental behavior of the Omniperf command line tool. Depending on which mode is chosen, different command line options become available.

  • Profile: The target application is launched on the local system using AMD's ROC Profiler. Depending on the profiling options chosen, selected kernels, dispatches, and/or IP blocks in the application are profiled and results are stored locally in an output folder (./workloads/<name>).

    $ omniperf profile --help
    
  • Analyze: Profiling data from the -p/--path directory is loaded into the Omniperf CLI analyzer, where users have immediate access to profiling results and generated metrics. Metrics are quickly generated from the entirety of your profiled application or from a subset you've identified through the Omniperf CLI analysis filters.

    To generate a lightweight GUI, add the --gui flag to the analysis command.

    This mode is designed as a middle ground between the CLI and the highly detailed Omniperf Grafana GUI, and is well suited to users who want immediate access to an IP block they're already familiar with.

    $ omniperf analyze --help
    
  • Database: Our detailed Grafana GUI is built on a MongoDB database. Use --import to load profiling results into the DB and interact with the workload in Grafana, or --remove to delete the workload from the DB.

    Connection options must be specified. See the Grafana Analysis import section for more details.

    $ omniperf database --help
    

Basic Operations

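========================================  ========  ================================================
Operation                                 Mode      Required Arguments
========================================  ========  ================================================
Profile a workload                        profile   --name, -- <profile_cmd>
Standalone roofline analysis              profile   --name, --roof-only, -- <profile_cmd>
Import a workload to database             database  --import, --host, --username, --workload, --team
Remove a workload from database           database  --remove, --host, --username, --workload, --team
Launch standalone GUI from CLI            analyze   --path, --gui
Interact with profiling results from CLI  analyze   --path
========================================  ========  ================================================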
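
Two of the operations listed above are sketched here with their required arguments filled in. The workload name and the vcopy invocation are reused from the quickstart, and the workload path assumes the default ./workloads/<name> location. The commands are printed rather than executed so that they can be reviewed first.

```shell
# Workload name reused from the quickstart example.
NAME="vcopy_data"

# Standalone roofline analysis: profile mode with --name and --roof-only.
ROOF_CMD="omniperf profile --name $NAME --roof-only -- ./vcopy 1048576 256"

# Standalone GUI from the CLI: analyze mode with --path and --gui.
GUI_CMD="omniperf analyze --path ./workloads/$NAME --gui"

# Print both commands; copy a printed line into your shell to run it.
echo "$ROOF_CMD"
echo "$GUI_CMD"
```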