rocprofv3 doc updates (#982)
* updating rocprofv3
* using rocprofv3
* review updates
* naming standardization
* Update source/docs/how-to/using-rocprofv3.rst
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
* review comments
* adding API references
* kernel filtering
* Remove Sphinx warn as error
To bypass false warning for linking between rst and md
* remove unused (duplicate) refs in _toc.yml.in
---------
Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Peter Jun Park <peter.park@amd.com>
[ROCm/rocprofiler-sdk commit: 69caa62b60]
@@ -6,14 +6,6 @@ defaults:
|
||||
|
||||
root: index
|
||||
subtrees:
|
||||
- entries:
|
||||
- file: what-is-rocprof-sdk
|
||||
- file: buffered_services.md
|
||||
- file: callback_services.md
|
||||
- file: counter_collection_services.md
|
||||
- file: intercept_table.md
|
||||
- file: pc_sampling.md
|
||||
- file: tool_library_overview.md
|
||||
- caption: Install
|
||||
entries:
|
||||
- file: install/installation
|
||||
@@ -23,8 +15,17 @@ subtrees:
|
||||
- file: how-to/samples
|
||||
- caption: API reference
|
||||
entries:
|
||||
- file: api-reference/buffered_services
|
||||
- file: api-reference/callback_services
|
||||
- file: api-reference/counter_collection_services
|
||||
- file: api-reference/intercept_table
|
||||
- file: api-reference/pc_sampling
|
||||
- file: api-reference/tool_library
|
||||
- file: _doxygen/html/index
|
||||
title: API library
|
||||
- caption: Conceptual
|
||||
entries:
|
||||
- file: conceptual/comparing-with-legacy-tools
|
||||
- caption: License
|
||||
entries:
|
||||
- file: license
|
||||
|
||||
+1
-1
@@ -1,4 +1,4 @@
|
||||
# Buffered Services
|
||||
# Buffered services
|
||||
|
||||
For the buffered approach, supported buffer record categories are enumerated in the `rocprofiler_buffer_category_t` category field.
|
||||
|
||||
+1
-1
@@ -1,4 +1,4 @@
|
||||
# Callback Tracing Services
|
||||
# Callback tracing services
|
||||
|
||||
## Overview
|
||||
|
||||
+1
-1
@@ -1,4 +1,4 @@
|
||||
# Counter Collection Services
|
||||
# Counter collection services
|
||||
|
||||
## Definitions
|
||||
|
||||
+1
-1
@@ -1,4 +1,4 @@
|
||||
# Runtime Intercept Tables
|
||||
# Runtime intercept tables
|
||||
|
||||
Although most tools will want to leverage the callback or buffer tracing services for tracing the HIP, HSA, and ROCTx
|
||||
APIs, rocprofiler-sdk does provide access to the raw API dispatch tables. Each of the aforementioned APIs are
|
||||
+1
-1
@@ -1,4 +1,4 @@
|
||||
# PC Sampling Method
|
||||
# PC sampling method
|
||||
|
||||
PC sampling is a profiling method that uses statistical approximation of the kernel execution by sampling GPU program counters. The method periodically chooses an active wave (in a round-robin manner) and snapshots its program counter (PC). The process takes place on every compute unit simultaneously, which makes it device-wide PC sampling. The outcome is a histogram of samples that shows how many times each kernel instruction was sampled.
|
||||
|
||||
-12
@@ -143,18 +143,6 @@ tool_init(rocprofiler_client_finalize_t fini_func,
|
||||
|
||||
Otherwise, ROCprofiler-SDK invokes the `finalize` callback via an `atexit` handler.
|
||||
|
||||
## Agent Information
|
||||
|
||||
## Contexts
|
||||
|
||||
## Configuring Services
|
||||
|
||||
## Synchronous Callbacks
|
||||
|
||||
## Asynchronous Callbacks for Buffers
|
||||
|
||||
## Recommendations
|
||||
|
||||
## Full `rocprofiler_configure` Sample
|
||||
|
||||
All of the snippets from the previous sections have been combined here for convenience.
|
||||
+9
-19
@@ -1,22 +1,15 @@
|
||||
.. meta::
|
||||
:description: Documentation of the installation, configuration, use of the ROCProfiler SDK, and rocprofv3 command-line tool
|
||||
:keywords: ROCProfiler SDK tool, ROCProfiler SDK library, rocprofv3, ROCm, API, reference
|
||||
:description: Documentation of the installation, configuration, use of the ROCprofiler-SDK, and rocprofv3 command-line tool
|
||||
:keywords: ROCprofiler-SDK tool, ROCprofiler-SDK library, rocprofv3, ROCm, API, reference
|
||||
|
||||
.. _what-is-rocprof-sdk:
|
||||
.. _comparing-with-legacy-tools:
|
||||
|
||||
==========================
|
||||
What is ROCprofiler-SDK?
|
||||
==========================
|
||||
========================================================
|
||||
Comparing ROCprofiler-SDK to other ROCm profiling tools
|
||||
========================================================
|
||||
|
||||
ROCprofiler-SDK is a tooling infrastructure for profiling general-purpose GPU compute applications running on the ROCm software.
|
||||
It supports application tracing to provide a big picture of the GPU application execution and kernel profiling to provide low-level hardware details from the performance counters.
|
||||
The ROCprofiler-SDK library provides runtime-independent APIs for tracing runtime calls and asynchronous activities such as GPU kernel dispatches and memory moves. The tracing includes callback APIs for runtime API tracing and activity APIs for asynchronous activity records logging.
|
||||
|
||||
In summary, ROCprofiler-SDK combines `ROCProfiler <https://rocm.docs.amd.com/projects/rocprofiler/en/latest/index.html>`_ and `ROCTracer <https://rocm.docs.amd.com/projects/roctracer/en/latest/index.html>`_.
|
||||
You can utilize the ROCprofiler-SDK to develop a tool for profiling and tracing HIP applications on ROCm software.
|
||||
|
||||
ROCprofiler-SDK is an improved version that enables more efficient implementations and better thread safety while avoiding problems that plague the former implementations of ROCProfiler and ROCTracer.
|
||||
Here are the distinct ROCprofiler-SDK features:
|
||||
ROCprofiler-SDK is an improved version of ROCm profiling tools that enables more efficient implementations and better thread safety while avoiding problems that plague the former implementations of ROCProfiler and ROCTracer.
|
||||
Here are the distinct ROCprofiler-SDK features, which also highlight the improvements over ROCProfiler and ROCTracer:
|
||||
|
||||
- Improved tool initialization
|
||||
- Support for simultaneous use of the same services by multiple tools
|
||||
@@ -25,10 +18,7 @@ Here are the distinct ROCprofiler-SDK features:
|
||||
- Backward ABI compatibility
|
||||
- PC sampling (beta implementation)
|
||||
|
||||
Improvements over ROCProfiler and ROCTracer
|
||||
----------------------------------------------------
|
||||
|
||||
The former implementations allow a tool to access any of the services provided by ROCProfiler or ROCTracer such as API tracing, kernel tracing, etc., by calling ``roctracer_init()`` when a ROCm runtime is initially loaded.
|
||||
The former implementations allow a tool to access any of the services provided by ROCProfiler or ROCTracer, such as API tracing and kernel tracing, by calling ``roctracer_init()`` when an ROCm runtime is initially loaded.
|
||||
Because the calling tool is not required to specify, during initialization, which services it needs, the libraries must effectively be prepared for any service to be available at any time.
|
||||
This behavior introduces unnecessary overhead and makes thread-safe data management difficult, as tools generally don't use all the available services.
|
||||
For example, ROCTracer always installs wrappers around every runtime API and adds indirection overhead through the ROCTracer library to check for the current service configuration in a thread-safe manner.
|
||||
@@ -0,0 +1,2 @@
|
||||
"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
|
||||
0,1,1,139892123975680,5619,5619,1048576,"matrixTranspose(float*, float*, int)",16,0,0,8,16,"SQ_WAVES",65536
|
||||
|
@@ -0,0 +1,5 @@
|
||||
"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
|
||||
4,4,1,1,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
|
||||
8,8,1,2,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
|
||||
12,12,1,3,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
|
||||
16,16,1,4,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
|
||||
|
@@ -4,7 +4,7 @@ The samples are provided to help you see the profiler in action.
|
||||
|
||||
## Finding samples
|
||||
|
||||
After the ROCm build is installed:
|
||||
The ROCm installation provides sample programs and the `rocprofv3` tool.
|
||||
|
||||
- Sample programs are installed here:
|
||||
|
||||
@@ -35,7 +35,7 @@ ctest -V
|
||||
```
|
||||
|
||||
:::{note}
|
||||
Running a few of these tests requires you to install Pandas and pytest first.
|
||||
Running a few of these tests requires you to install [pandas](https://pandas.pydata.org/) and [pytest](https://docs.pytest.org/en/stable/) first.
|
||||
:::
|
||||
|
||||
```bash
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
.. meta::
|
||||
:description: Documentation of the installation, configuration, use of the ROCProfiler SDK, and rocprofv3 command-line tool
|
||||
:keywords: ROCProfiler SDK tool, ROCProfiler SDK library, rocprofv3, ROCm, API, reference
|
||||
:description: Documentation of the installation, configuration, use of the ROCprofiler-SDK, and rocprofv3 command-line tool
|
||||
:keywords: ROCprofiler-SDK tool, ROCprofiler-SDK library, rocprofv3, ROCm, API, reference
|
||||
|
||||
.. _using-rocprofv3:
|
||||
|
||||
@@ -8,8 +8,8 @@
|
||||
Using rocprofv3
|
||||
======================
|
||||
|
||||
``rocprofv3`` is a CLI tool that helps you quickly optimize applications and understand the low-level kernel details without requiring any modification in the source code.
|
||||
It is being developed to be backward compatible with its predecessor, ``rocprof``, and to provide more features for application profiling with better accuracy.
|
||||
``rocprofv3`` is a CLI tool that helps you quickly optimize applications and understand the low-level kernel details without requiring any modification in the source code.
|
||||
It's backward compatible with its predecessor, ``rocprof``, and provides more features for application profiling with better accuracy.
|
||||
|
||||
The following sections demonstrate the use of ``rocprofv3`` for application tracing and kernel profiling using various command-line options.
|
||||
|
||||
@@ -37,7 +37,7 @@ Here is the list of ``rocprofv3`` command-line options. Some options are used fo
|
||||
* - Option
|
||||
- Description
|
||||
- Use
|
||||
|
||||
|
||||
* - ``--hip-trace``
|
||||
- Collects HIP runtime traces.
|
||||
- Application tracing
|
||||
@@ -113,7 +113,7 @@ Here is the list of ``rocprofv3`` command-line options. Some options are used fo
|
||||
* - ``-o`` \| ``--output-file``
|
||||
- Specifies the name of the output file. Note that this name is appended to the default names (_api_trace or counter_collection.csv) of the generated files.
|
||||
- Output control
|
||||
|
||||
|
||||
* - ``-M`` \| ``--mangled-kernels``
|
||||
- Overrides the default demangling of kernel names.
|
||||
- Output control
|
||||
@@ -125,7 +125,7 @@ Here is the list of ``rocprofv3`` command-line options. Some options are used fo
|
||||
* - ``--output-format``
|
||||
- For adding output format (supported formats: csv, json, pftrace)
|
||||
- Output control
|
||||
|
||||
|
||||
* - ``--preload``
|
||||
- Libraries to prepend to LD_PRELOAD (usually for sanitizers)
|
||||
- Extension
|
||||
@@ -158,9 +158,6 @@ To trace HIP runtime APIs, use:
|
||||
|
||||
rocprofv3 --hip-trace -- < app_relative_path >
|
||||
|
||||
.. note::
|
||||
The tracing and counter collection options generate an additional `agent info` file.
|
||||
|
||||
The above command generates a `hip_api_trace.csv` file prefixed with the process ID.
|
||||
|
||||
.. code-block:: shell
|
||||
@@ -170,9 +167,9 @@ The above command generates a `hip_api_trace.csv` file prefixed with the process
|
||||
Here are the contents of `hip_api_trace.csv` file:
|
||||
|
||||
.. csv-table:: HIP runtime api trace
|
||||
:file: /data/hip_compile_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:header-rows: 1
|
||||
:file: /data/hip_compile_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:header-rows: 1
|
||||
|
||||
To trace HIP compile time APIs, use:
|
||||
|
||||
@@ -189,23 +186,12 @@ The above command generates a `hip_api_trace.csv` file prefixed with the process
|
||||
Here are the contents of `hip_api_trace.csv` file:
|
||||
|
||||
.. csv-table:: HIP compile time api trace
|
||||
:file: /data/hip_compile_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:header-rows: 1
|
||||
:file: /data/hip_compile_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:header-rows: 1
|
||||
|
||||
For the description of the fields in the output file, see :ref:`output-file-fields`.
|
||||
|
||||
Agent Info
|
||||
''''''''''''''
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat 238_agent_info.csv
|
||||
|
||||
"Node_Id","Logical_Node_Id","Agent_Type","Cpu_Cores_Count","Simd_Count","Cpu_Core_Id_Base","Simd_Id_Base","Max_Waves_Per_Simd","Lds_Size_In_Kb","Gds_Size_In_Kb","Num_Gws","Wave_Front_Size","Num_Xcc","Cu_Count","Array_Count","Num_Shader_Banks","Simd_Arrays_Per_Engine","Cu_Per_Simd_Array","Simd_Per_Cu","Max_Slots_Scratch_Cu","Gfx_Target_Version","Vendor_Id","Device_Id","Location_Id","Domain","Drm_Render_Minor","Num_Sdma_Engines","Num_Sdma_Xgmi_Engines","Num_Sdma_Queues_Per_Engine","Num_Cp_Queues","Max_Engine_Clk_Ccompute","Max_Engine_Clk_Fcompute","Sdma_Fw_Version","Fw_Version","Capability","Cu_Per_Engine","Max_Waves_Per_Cu","Family_Id","Workgroup_Max_Size","Grid_Max_Size","Local_Mem_Size","Hive_Id","Gpu_Id","Workgroup_Max_Dim_X","Workgroup_Max_Dim_Y","Workgroup_Max_Dim_Z","Grid_Max_Dim_X","Grid_Max_Dim_Y","Grid_Max_Dim_Z","Name","Vendor_Name","Product_Name","Model_Name"
|
||||
0,0,"CPU",24,0,0,0,0,0,0,0,0,1,24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3800,0,0,0,0,0,0,23,0,0,0,0,0,0,0,0,0,0,0,"AMD Ryzen 9 3900X 12-Core Processor","CPU","AMD Ryzen 9 3900X 12-Core Processor",""
|
||||
1,1,"GPU",0,256,0,2147487744,10,64,0,64,64,1,64,4,4,1,16,4,32,90000,4098,26751,12032,0,128,2,0,2,24,3800,1630,432,440,138420864,16,40,141,1024,4294967295,0,0,64700,1024,1024,1024,4294967295,4294967295,4294967295,"gfx900","AMD","Radeon RX Vega","vega10"
|
||||
|
||||
HSA trace
|
||||
+++++++++++++
|
||||
|
||||
@@ -214,7 +200,7 @@ The HIP runtime library is implemented with the low-level HSA runtime. HSA API t
|
||||
HSA trace contains the start and end time of HSA runtime API calls and their asynchronous activities.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
|
||||
rocprofv3 --hsa-trace -- < app_relative_path >
|
||||
|
||||
The above command generates a `hsa_api_trace.csv` file prefixed with the process ID. Note that the contents of this file have been truncated for demonstration purposes.
|
||||
@@ -226,9 +212,9 @@ The above command generates a `hsa_api_trace.csv` file prefixed with process ID.
|
||||
Here are the contents of `hsa_api_trace.csv` file:
|
||||
|
||||
.. csv-table:: HSA api trace
|
||||
:file: /data/hsa_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:header-rows: 1
|
||||
:file: /data/hsa_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:header-rows: 1
|
||||
|
||||
For the description of the fields in the output file, see :ref:`output-file-fields`.
|
||||
|
||||
@@ -284,9 +270,9 @@ Running the preceding command generates a `marker_api_trace.csv` file prefixed w
|
||||
Here are the contents of `marker_api_trace.csv` file:
|
||||
|
||||
.. csv-table:: Marker api trace
|
||||
:file: /data/marker_api_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:header-rows: 1
|
||||
:file: /data/marker_api_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:header-rows: 1
|
||||
|
||||
For the description of the fields in the output file, see :ref:`output-file-fields`.
|
||||
|
||||
@@ -308,10 +294,10 @@ The above command generates a `kernel_trace.csv` file prefixed with the process
|
||||
Here are the contents of `kernel_trace.csv` file:
|
||||
|
||||
.. csv-table:: Kernel trace
|
||||
:file: /data/kernel_trace.csv
|
||||
:widths: 10,10,10,10,10,10,20,20,10,10,10,10,10,10,10,10
|
||||
:file: /data/kernel_trace.csv
|
||||
:widths: 10,10,10,10,10,10,20,20,10,10,10,10,10,10,10,10
|
||||
:header-rows: 1
|
||||
|
||||
|
||||
For the description of the fields in the output file, see :ref:`output-file-fields`.
|
||||
|
||||
Memory copy trace
|
||||
@@ -332,8 +318,8 @@ The above command generates a `memory_copy_trace.csv` file prefixed with the pro
|
||||
Here are the contents of `memory_copy_trace.csv` file:
|
||||
|
||||
.. csv-table:: Memory copy trace
|
||||
:file: /data/memory_copy_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:file: /data/memory_copy_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:header-rows: 1
|
||||
|
||||
For the description of the fields in the output file, see :ref:`output-file-fields`.
|
||||
@@ -377,10 +363,11 @@ The above command generates a `hip_stats.csv` and `hip_api_trace` file prefixed
|
||||
Here are the contents of `hip_stats.csv` file:
|
||||
|
||||
.. csv-table:: HIP stats
|
||||
:file: /data/hip_stats.csv
|
||||
:widths: 10,10,20,20,10,10,10,10
|
||||
:file: /data/hip_stats.csv
|
||||
:widths: 10,10,20,20,10,10,10,10
|
||||
:header-rows: 1
|
||||
|
||||
For the description of the fields in the output file, see :ref:`output-file-fields`.
|
||||
|
||||
Kernel profiling
|
||||
-------------------
|
||||
@@ -392,140 +379,46 @@ For a comprehensive list of counters available on MI200, see `MI200 performance
|
||||
Input file
|
||||
++++++++++++
|
||||
|
||||
``rocprofv3`` supports three input file formats: text (.txt), YAML (.yaml/.yml), and JSON (.json).
|
||||
|
||||
Text input is used to collect the desired basic counters or derived metrics. In the input file, the line consisting of the counter or metric names must begin with ``pmc``.
|
||||
Input files in JSON or YAML format support all command-line options, so each run can be configured with a different set of options.
|
||||
The schema supported by JSON and YAML input files is given below:
|
||||
|
||||
*Schema for the rocprofv3 JSON/YAML input*
|
||||
|
||||
Properties
|
||||
++++++++++++
|
||||
|
||||
- **``jobs``** *(array)*: rocprofv3 input data per application run.
|
||||
|
||||
- **Items** *(object)*: data for rocprofv3.
|
||||
|
||||
- **``pmc``** *(array)*: list of counters to collect.
|
||||
- **``kernel_include_regex``** *(string)*: regex string.
|
||||
- **``kernel_exclude_regex``** *(string)*: regex string.
|
||||
- **``kernel_iteration_range``** *(string)*: iteration range for each kernel that matches the filter [start-stop].
|
||||
- **``hip_trace``** *(boolean)*: For Collecting HIP Traces
|
||||
(runtime + compiler).
|
||||
- **``hip_runtime_trace``** *(boolean)*: For Collecting HIP
|
||||
Runtime API Traces.
|
||||
- **``hip_compiler_trace``** *(boolean)*: For Collecting HIP
|
||||
Compiler generated code Traces.
|
||||
- **``marker_trace``** *(boolean)*: For Collecting Marker (ROCTx)
|
||||
Traces.
|
||||
- **``kernel_trace``** *(boolean)*: For Collecting Kernel
|
||||
Dispatch Traces.
|
||||
- **``memory_copy_trace``** *(boolean)*: For Collecting Memory
|
||||
Copy Traces.
|
||||
- **``scratch_memory_trace``** *(boolean)*: For Collecting
|
||||
Scratch Memory operations Traces.
|
||||
- **``stats``** *(boolean)*: For Collecting statistics of enabled
|
||||
tracing types.
|
||||
- **``hsa_trace``** *(boolean)*: For Collecting HSA Traces (core
|
||||
+ amd + image + finalizer).
|
||||
- **``hsa_core_trace``** *(boolean)*: For Collecting HSA API
|
||||
Traces (core API).
|
||||
- **``hsa_amd_trace``** *(boolean)*: For Collecting HSA API
|
||||
Traces (AMD-extension API).
|
||||
- **``hsa_finalize_trace``** *(boolean)*: For Collecting HSA API
|
||||
Traces (Finalizer-extension API).
|
||||
- **``hsa_image_trace``** *(boolean)*: For Collecting HSA API
|
||||
Traces (Image-extension API).
|
||||
- **``sys_trace``** *(boolean)*: For Collecting HIP, HSA, Marker
|
||||
(ROCTx), Memory copy, Scratch memory, and Kernel dispatch
|
||||
traces.
|
||||
- **``mangled-kernels``** *(boolean)*: Do not demangle the kernel
|
||||
names.
|
||||
- **``truncate-kernels``** *(boolean)*: Truncate the demangled
|
||||
kernel names.
|
||||
- **``output_file``** *(string)*: For the output file name.
|
||||
- **``output_directory``** *(string)*: For adding output path
|
||||
where the output files will be saved.
|
||||
- **``output_format``** *(array)*: For adding output format
|
||||
(supported formats: csv, json, pftrace).
|
||||
- **``list_metrics``** *(boolean)*: List the metrics.
|
||||
- **``log_level``** *(string)*: fatal, error, warning, info,
|
||||
trace.
|
||||
- **``preload``** *(array)*: Libraries to prepend to LD_PRELOAD
|
||||
(usually for sanitizers).
|
||||
|
||||
The number of basic counters or derived metrics that can be collected in one profiling run is limited by the GPU hardware resources. If too many counters or metrics are selected, the kernels need to be executed multiple times to collect them.
|
||||
For multi-pass execution, include multiple ``pmc`` rows in the input text file; the counters or metrics in each ``pmc`` row are collected in a separate kernel run. JSON and YAML input files instead contain a list of jobs, where each job corresponds to one pass (run).
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat input.json
|
||||
|
||||
{
|
||||
"jobs": [
|
||||
{
|
||||
"hsa_trace": true,
|
||||
"kernel_trace": true,
|
||||
"memory_copy_trace": true,
|
||||
"marker_trace": true,
|
||||
"output_file": "out",
|
||||
"output_format": [
|
||||
"csv",
|
||||
"json",
|
||||
"pftrace"
|
||||
]
|
||||
},
|
||||
{
|
||||
"pmc": [
|
||||
"SQ_WAVES"
|
||||
],
|
||||
"kernel_include_regex": ".*_kernel",
|
||||
"kernel_exclude_regex": "multiply",
|
||||
"kernel_iteration_range": "[1-2]",
|
||||
"output_file": "out",
|
||||
"output_format": [
|
||||
"csv",
|
||||
"json"
|
||||
],
|
||||
"truncate_kernels": true
|
||||
}
|
||||
]
|
||||
}
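A job list like this one can also be generated programmatically. The following is a minimal Python sketch that assumes only the key names listed in the schema; the helper name ``write_jobs`` and the output location are illustrative and are not part of ``rocprofv3``.

```python
import json
import tempfile
from pathlib import Path


def write_jobs(path, jobs):
    """Write an input file with one entry per job (hypothetical helper)."""
    Path(path).write_text(json.dumps({"jobs": jobs}, indent=2))
    return path


# Two jobs: a tracing run and a counter-collection run.
# Every key used here appears in the schema listed in this section.
jobs = [
    {"kernel_trace": True, "memory_copy_trace": True, "output_format": ["csv"]},
    {
        "pmc": ["SQ_WAVES"],
        "kernel_include_regex": ".*_kernel",
        "output_format": ["csv", "json"],
    },
]

input_path = write_jobs(Path(tempfile.gettempdir()) / "input.json", jobs)

# Read the file back to confirm it is valid JSON with the expected shape.
loaded = json.loads(Path(input_path).read_text())
print(len(loaded["jobs"]), "jobs written to", input_path)
```

The resulting file can then be passed to ``rocprofv3`` with the ``-i`` option in the same way as a hand-written input file.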
|
||||
To collect the desired basic counters or derived metrics, mention them in an input file. In the input file, the line consisting of the counter or metric names must begin with ``pmc``. The input file could be in text (.txt), yaml (.yaml/.yml), or JSON (.json) format.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat input.txt
|
||||
|
||||
pmc: GPUBusy SQ_WAVES
|
||||
pmc: GRBM_GUI_ACTIVE
|
||||
pmc: GPUBusy SQ_WAVES
|
||||
pmc: GRBM_GUI_ACTIVE
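To see how ``pmc`` rows map to passes, here is a small Python sketch that splits a text input file into one counter list per row. It only mirrors the text format shown here and is not how ``rocprofv3`` itself parses the file; ``parse_pmc_rows`` is a hypothetical helper.

```python
def parse_pmc_rows(text):
    """Return one list of counter names per line that starts with 'pmc:'."""
    passes = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("pmc:"):
            # Everything after the prefix is a whitespace-separated counter list.
            passes.append(line[len("pmc:"):].split())
    return passes


sample = """pmc: GPUBusy SQ_WAVES
pmc: GRBM_GUI_ACTIVE
"""

passes = parse_pmc_rows(sample)
for n, counters in enumerate(passes, start=1):
    # Each row corresponds to one pass, matching the pmc_n output directories.
    print(f"pmc_{n}: {counters}")
```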
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat input.yml
|
||||
$ cat input.json
|
||||
|
||||
jobs:
|
||||
{
|
||||
"metrics": [
|
||||
{
|
||||
"pmc": ["SQ_WAVES", "GRBM_COUNT", "GUI_ACTIVE"]
|
||||
},
|
||||
{
|
||||
"pmc": ["FETCH_SIZE", "WRITE_SIZE"]
|
||||
}
|
||||
]
|
||||
}
|
||||
|
||||
- "hsa_trace": true
|
||||
"kernel_trace": true
|
||||
"memory_copy_trace": true
|
||||
"marker_trace": true
|
||||
"output_file": "out"
|
||||
"output_format"
|
||||
- "csv",
|
||||
- "json",
|
||||
- "pftrace"
|
||||
.. code-block:: shell
|
||||
|
||||
- pmc:
|
||||
- SQ_WAVES
|
||||
kernel_include_regex: "addition"
|
||||
kernel_exclude_regex: "multiply"
|
||||
kernel_iteration_range:
|
||||
- "[1-2]"
|
||||
- "[3-4]"
|
||||
- "[5-6]"
|
||||
$ cat input.yaml
|
||||
|
||||
metrics:
|
||||
- pmc:
|
||||
- SQ_WAVES
|
||||
- GRBM_COUNT
|
||||
- GUI_ACTIVE
|
||||
- 'TCC_HIT[1]'
|
||||
- 'TCC_HIT[2]'
|
||||
- pmc:
|
||||
- FETCH_SIZE
|
||||
- WRITE_SIZE
|
||||
|
||||
The number of basic counters or derived metrics that can be collected in one profiling run is limited by the GPU hardware resources. If too many counters or metrics are selected, the kernels need to be executed multiple times to collect them. For multi-pass execution, include multiple ``pmc`` rows in the input file. Counters or metrics in each ``pmc`` row can be collected in each kernel run.
|
||||
|
||||
Kernel profiling output
|
||||
+++++++++++++++++++++++++
|
||||
@@ -538,14 +431,89 @@ To supply the input file for kernel profiling, use:
|
||||
|
||||
Running the above command generates a `./pmc_n/counter_collection.csv` file prefixed with the process ID. For each ``pmc`` row, a directory ``pmc_n`` containing a `counter_collection.csv` file is generated, where n = 1 for the first row and so on.
|
||||
|
||||
Each row of the CSV file is an instance of kernel execution. Here is a truncated version of the output file from ``pmc_1``.
|
||||
Each row of the CSV file is an instance of kernel execution. Here is a truncated version of the output file from ``pmc_1``:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat pmc_1/218_counter_collection.csv
|
||||
|
||||
Here are the contents of `counter_collection.csv` file:
|
||||
|
||||
.. csv-table:: Counter collection
|
||||
:file: /data/counter_collection.csv
|
||||
:widths: 10,10,10,10,10,10,10,10,10,10,10,10,10,10,10
|
||||
:header-rows: 1
|
||||
|
||||
For the description of the fields in the output file, see :ref:`output-file-fields`.
|
||||
|
||||
Kernel names
|
||||
++++++++++++++
|
||||
|
||||
To target a specific kernel for counter collection when multiple kernels are present, use the ``--kernel-names`` option:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 -i input.txt --kernel-names divide_kernel -- <app_relative_path>
|
||||
|
||||
Running the above command generates a `./pmc_n/counter_collection.csv` file prefixed with the process ID. For each ``pmc`` row, a directory ``pmc_n`` containing a `counter_collection.csv` file is generated, where n = 1 for the first row and so on.
|
||||
|
||||
Each row of the CSV file is an instance of kernel execution. Here is a truncated version of the output file from ``pmc_1``:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat pmc_1/312_counter_collection.csv
|
||||
|
||||
Here are the contents of `counter_collection.csv` file:
|
||||
|
||||
.. csv-table:: Targeted kernel counter collection
|
||||
:file: /data/kernel_names.csv
|
||||
:widths: 10,10,10,10,10,10,10,10,10,10,10,10,10,10,10
|
||||
:header-rows: 1
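Because each CSV row is one kernel dispatch, a common follow-up step is to aggregate counter values per kernel. The following Python sketch does this with the standard ``csv`` module. The two data rows are copied from the ``kernel_names.csv`` sample data added in this change; the inline string stands in for a real ``counter_collection.csv`` file, and the aggregation itself is an illustration rather than a ``rocprofv3`` feature.

```python
import csv
import io
from collections import defaultdict

# Header and rows copied from the sample data; a real script would open the
# generated counter_collection.csv file instead of using an inline string.
sample = '''\
"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
4,4,1,1,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
8,8,1,2,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
'''

# Sum Counter_Value for every (kernel, counter) pair across dispatches.
totals = defaultdict(float)
for row in csv.DictReader(io.StringIO(sample)):
    totals[(row["Kernel_Name"], row["Counter_Name"])] += float(row["Counter_Value"])

for (kernel, counter), total in totals.items():
    print(f"{kernel} {counter} total={total:.0f}")
```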
|
||||
|
||||
Agent info
|
||||
++++++++++++
|
||||
|
||||
.. note::
|
||||
All tracing and counter collection options generate an additional `agent_info.csv` file prefixed with the process ID.
|
||||
|
||||
The `agent_info.csv` file contains information about the CPU or GPU the kernel runs on.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat 238_agent_info.csv
|
||||
|
||||
"Node_Id","Logical_Node_Id","Agent_Type","Cpu_Cores_Count","Simd_Count","Cpu_Core_Id_Base","Simd_Id_Base","Max_Waves_Per_Simd","Lds_Size_In_Kb","Gds_Size_In_Kb","Num_Gws","Wave_Front_Size","Num_Xcc","Cu_Count","Array_Count","Num_Shader_Banks","Simd_Arrays_Per_Engine","Cu_Per_Simd_Array","Simd_Per_Cu","Max_Slots_Scratch_Cu","Gfx_Target_Version","Vendor_Id","Device_Id","Location_Id","Domain","Drm_Render_Minor","Num_Sdma_Engines","Num_Sdma_Xgmi_Engines","Num_Sdma_Queues_Per_Engine","Num_Cp_Queues","Max_Engine_Clk_Ccompute","Max_Engine_Clk_Fcompute","Sdma_Fw_Version","Fw_Version","Capability","Cu_Per_Engine","Max_Waves_Per_Cu","Family_Id","Workgroup_Max_Size","Grid_Max_Size","Local_Mem_Size","Hive_Id","Gpu_Id","Workgroup_Max_Dim_X","Workgroup_Max_Dim_Y","Workgroup_Max_Dim_Z","Grid_Max_Dim_X","Grid_Max_Dim_Y","Grid_Max_Dim_Z","Name","Vendor_Name","Product_Name","Model_Name"
|
||||
0,0,"CPU",24,0,0,0,0,0,0,0,0,1,24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3800,0,0,0,0,0,0,23,0,0,0,0,0,0,0,0,0,0,0,"AMD Ryzen 9 3900X 12-Core Processor","CPU","AMD Ryzen 9 3900X 12-Core Processor",""
|
||||
1,1,"GPU",0,256,0,2147487744,10,64,0,64,64,1,64,4,4,1,16,4,32,90000,4098,26751,12032,0,128,2,0,2,24,3800,1630,432,440,138420864,16,40,141,1024,4294967295,0,0,64700,1024,1024,1024,4294967295,4294967295,4294967295,"gfx900","AMD","Radeon RX Vega","vega10"
|
||||
|
||||
Kernel filtering
|
||||
+++++++++++++++++
|
||||
|
||||
Kernel filtering allows you to filter the kernel profiling output based on the kernel name by specifying regex strings in the input file. To include kernel names matching the regex string in the kernel profiling output, use ``kernel_include_regex``. To exclude the kernel names matching the regex string from the kernel profiling output, use ``kernel_exclude_regex``.
|
||||
You can also specify an iteration range to profile only a subset of iterations of the included kernels. If the iteration range is not specified, all iterations of the included kernels are profiled.
|
||||
|
||||
Here is an input file with kernel filters:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat input.yml
|
||||
jobs:
|
||||
- pmc: [SQ_WAVES]
|
||||
kernel_include_regex: "divide"
|
||||
kernel_exclude_regex: ""
|
||||
|
||||
To collect counters for the kernels matching the filters specified in the preceding input file, run:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 -i input.yml -- <app_relative_path>
|
||||
|
||||
$ cat pass_1/312_counter_collection.csv
|
||||
"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
|
||||
0,1,1,139892123975680,5619,5619,1048576,"matrixTranspose(float*, float*, int)",16,0,0,8,16,"SQ_WAVES",65536
|
||||
4,4,1,1,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
|
||||
8,8,1,2,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
|
||||
12,12,1,3,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
|
||||
16,16,1,4,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
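The include and exclude logic described in this section can be approximated in a few lines of Python. This is only a sketch of the documented behavior, not the ``rocprofv3`` implementation: it assumes ``re.search`` semantics (a match anywhere in the kernel name) and treats an empty pattern as "no filter". ``kernel_selected`` is a hypothetical helper, and ``multiply_kernel`` and ``addition_kernel`` are made-up kernel names used for illustration.

```python
import re


def kernel_selected(name, include_regex="", exclude_regex=""):
    """Return True if a kernel name passes both filters (assumed semantics)."""
    if include_regex and not re.search(include_regex, name):
        return False
    if exclude_regex and re.search(exclude_regex, name):
        return False
    return True


kernels = [
    "addition_kernel(float*, float const*, float const*, int, int)",  # hypothetical
    "multiply_kernel(float*, float const*, float const*, int, int)",  # hypothetical
    "divide_kernel(float*, float const*, float const*, int, int)",
]

# Same filters as the input.yml example: include "divide", exclude nothing.
selected = [
    k for k in kernels
    if kernel_selected(k, include_regex="divide", exclude_regex="")
]
print(selected)
```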
|
||||
|
||||
.. _output-file-fields:
|
||||
|
||||
@@ -605,32 +573,6 @@ The following table lists the various fields or the columns in the output CSV fi
|
||||
* - VGPR_Count
|
||||
- Kernel's Vector General Purpose Register (VGPR) count.
|
||||
|
||||
Kernel Filtering
|
||||
+++++++++++++++++
|
||||
|
||||
rocprofv3 supports kernel filtering for profiling. A kernel filter is a set of a regex string (to include the kernels matching this filter), a regex string (to exclude the kernels matching this filter),
|
||||
and an iteration range (set of iterations of the included kernels). If the iteration range is not provided then all iterations of the included kernels are profiled.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat input.yml
|
||||
jobs:
|
||||
- pmc: [SQ_WAVES]
|
||||
kernel_include_regex: "divide"
|
||||
kernel_exclude_regex: ""
|
||||
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 -i input.yml -- <app_relative_path>
|
||||
|
||||
$ cat pass_1/312_counter_collection.csv
|
||||
"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
|
||||
4,4,1,1,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
|
||||
8,8,1,2,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
|
||||
12,12,1,3,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
|
||||
16,16,1,4,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
|
||||
|
||||
Output formats
|
||||
----------------
|
||||
|
||||
|
||||
@@ -1,16 +1,24 @@
|
||||
.. meta::
|
||||
:description: Documentation of the installation, configuration, use of the ROCProfiler SDK, and rocprofv3 command-line tool
|
||||
:keywords: ROCProfiler SDK tool, ROCProfiler SDK library, rocprofv3, ROCm, API, reference
|
||||
:description: Documentation of the installation, configuration, use of the ROCprofiler SDK, and rocprofv3 command-line tool
|
||||
:keywords: ROCprofiler-SDK tool, ROCprofiler-SDK library, rocprofv3, ROCm, API, reference
|
||||
|
||||
.. _index:
|
||||
|
||||
******************************************
|
||||
ROCProfiler SDK documentation
|
||||
ROCprofiler-SDK documentation
|
||||
******************************************
|
||||
|
||||
ROCProfiler SDK is a comprehensive library that provides APIs for profiling and tracing HIP applications on AMD ROCm Software. To learn more, see :ref:`what-is-rocprof-sdk`
|
||||
ROCprofiler-SDK is a tooling infrastructure for profiling general-purpose GPU compute applications running on the ROCm software.
|
||||
It supports application tracing to provide a big picture of the GPU application execution and kernel profiling to provide low-level hardware details from the performance counters.
|
||||
The ROCprofiler-SDK library provides runtime-independent APIs for tracing runtime calls and asynchronous activities such as GPU kernel dispatches and memory moves. The tracing includes callback APIs for runtime API tracing and activity APIs for asynchronous activity records logging.
|
||||
|
||||
You can access ROCProfiler SDK on our `GitHub repository <https://github.com/ROCm/rocprofiler-sdk>`_.
|
||||
In summary, ROCprofiler-SDK combines `ROCProfiler <https://rocm.docs.amd.com/projects/rocprofiler/en/latest/index.html>`_ and `ROCTracer <https://rocm.docs.amd.com/projects/roctracer/en/latest/index.html>`_.
|
||||
You can utilize the ROCprofiler-SDK to develop a tool for profiling and tracing HIP applications on ROCm software.
|
||||
|
||||
The code is open and hosted at `<https://github.com/ROCm/rocprofiler-sdk>`_.
|
||||
|
||||
.. note::
|
||||
ROCprofiler-SDK is in beta and subject to change in future releases.
|
||||
|
||||
The documentation is structured as follows:
|
||||
|
||||
@@ -23,12 +31,22 @@ The documentation is structured as follows:
|
||||
|
||||
.. grid-item-card:: How to
|
||||
|
||||
* :doc:`Using rocprofv3 <how-to/using-rocprofv3>`
|
||||
* :ref:`using-rocprofv3`
|
||||
* :doc:`Samples <how-to/samples>`
|
||||
|
||||
.. grid-item-card:: API reference
|
||||
|
||||
* :doc:`Buffered services <api-reference/buffered_services>`
|
||||
* :doc:`Callback services <api-reference/callback_services>`
|
||||
* :doc:`Counter collection services <api-reference/counter_collection_services>`
|
||||
* :doc:`Intercept table <api-reference/intercept_table>`
|
||||
* :doc:`PC sampling <api-reference/pc_sampling>`
|
||||
* :doc:`Tool library <api-reference/tool_library>`
|
||||
* :doc:`API library <_doxygen/html/index>`
|
||||
|
||||
.. grid-item-card:: Conceptual
|
||||
|
||||
* :ref:`comparing-with-legacy-tools`
|
||||
|
||||
To contribute to the documentation, refer to
|
||||
`Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.
|
||||
|
||||
@@ -11,7 +11,7 @@ ROCprofiler-SDK is supported only on Linux. The following distributions are test
|
||||
- OpenSUSE 15.4
|
||||
- RedHat 8.8
|
||||
|
||||
Other [Linux distributions](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-operating-systems) might be supported but not tested yet.
|
||||
ROCprofiler-SDK might operate as expected on other [Linux distributions](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-operating-systems), but has not been tested.
|
||||
|
||||
### Identifying the operating system
|
||||
|
||||
@@ -31,9 +31,11 @@ The relevant fields are `ID` and the `VERSION_ID`.
|
||||
|
||||
## Build requirements
|
||||
|
||||
Install [CMake](https://cmake.org/) version 3.21 or higher.
|
||||
Install [CMake](https://cmake.org/) version 3.21 (or later).
|
||||
|
||||
**Note:** If the `CMake` installed on the system is too old, you can install a new version using various methods. One of the easiest options is to use PyPi (Python’s pip).
|
||||
:::{note}
|
||||
If the `CMake` installed on the system is too old, you can install a new version using various methods. One of the easiest options is to use PyPi (Python’s pip).
|
||||
:::
|
||||
|
||||
```bash
|
||||
pip install --user 'cmake==3.22.0'
|
||||
|
||||
@@ -31,7 +31,7 @@ message "Running doxysphinx"
|
||||
doxysphinx build ${WORK_DIR} ${WORK_DIR}/_build/html ${WORK_DIR}/_doxygen/html
|
||||
|
||||
message "Building html documentation"
|
||||
make html SPHINXOPTS="-W --keep-going -n"
|
||||
make html SPHINXOPTS="--keep-going -n"
|
||||
|
||||
if [ -d ${SOURCE_DIR}/docs ]; then
|
||||
message "Removing stale documentation in ${SOURCE_DIR}/docs/"
|
||||
|
||||