* updating rocprofv3

* using rocprofv3

* review updates

* naming standardization

* Update source/docs/how-to/using-rocprofv3.rst

Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>

* review comments

* adding API references

* kernel filtering

* Remove Sphinx warn as error

To bypass false warning for linking between rst and md

* remove unused (duplicate) refs in _toc.yml.in

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
Co-authored-by: Leo Paoletti <164940351+lpaoletti@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Peter Jun Park <peter.park@amd.com>
This commit is contained in:
srawat
2024-08-03 00:38:04 +05:30
committed by GitHub
parent cfbac19640
commit 69caa62b60
15 changed files with 195 additions and 247 deletions
+9 -8
@@ -6,14 +6,6 @@ defaults:
root: index
subtrees:
- entries:
- file: what-is-rocprof-sdk
- file: buffered_services.md
- file: callback_services.md
- file: counter_collection_services.md
- file: intercept_table.md
- file: pc_sampling.md
- file: tool_library_overview.md
- caption: Install
entries:
- file: install/installation
@@ -23,8 +15,17 @@ subtrees:
- file: how-to/samples
- caption: API reference
entries:
- file: api-reference/buffered_services
- file: api-reference/callback_services
- file: api-reference/counter_collection_services
- file: api-reference/intercept_table
- file: api-reference/pc_sampling
- file: api-reference/tool_library
- file: _doxygen/html/index
title: API library
- caption: Conceptual
entries:
- file: conceptual/comparing-with-legacy-tools
- caption: License
entries:
- file: license
+1 -1
@@ -1,4 +1,4 @@
# Buffered Services
# Buffered services
For the buffered approach, supported buffer record categories are enumerated in `rocprofiler_buffer_category_t` category field.
+1 -1
@@ -1,4 +1,4 @@
# Callback Tracing Services
# Callback tracing services
## Overview
@@ -1,4 +1,4 @@
# Counter Collection Services
# Counter collection services
## Definitions
+1 -1
@@ -1,4 +1,4 @@
# Runtime Intercept Tables
# Runtime intercept tables
Although most tools will want to leverage the callback or buffer tracing services for tracing the HIP, HSA, and ROCTx
APIs, rocprofiler-sdk does provide access to the raw API dispatch tables. Each of the aforementioned APIs are
+1 -1
@@ -1,4 +1,4 @@
# PC Sampling Method
# PC sampling method
PC Sampling is a profiling method that uses statistical approximation of the kernel execution by sampling GPU program counters. Furthermore, the method periodically chooses an active wave (in a round robin manner) and snapshot it's program counter (PC). The process takes place on every compute unit simultaneously which makes it device-wide PC sampling. The outcome is the histogram of samples that says how many times each kernel instruction was sampled.
-12
@@ -143,18 +143,6 @@ tool_init(rocprofiler_client_finalize_t fini_func,
Otherwise, ROCprofiler-SDK invokes the `finalize` callback via an `atexit` handler.
## Agent Information
## Contexts
## Configuring Services
## Synchronous Callbacks
## Asynchronous Callbacks for Buffers
## Recommendations
## Full `rocprofiler_configure` Sample
All of the snippets from the previous sections have been combined here for convenience.
+9 -19
@@ -1,22 +1,15 @@
.. meta::
:description: Documentation of the installation, configuration, use of the ROCProfiler SDK, and rocprofv3 command-line tool
:keywords: ROCProfiler SDK tool, ROCProfiler SDK library, rocprofv3, ROCm, API, reference
:description: Documentation of the installation, configuration, use of the ROCprofiler-SDK, and rocprofv3 command-line tool
:keywords: ROCprofiler-SDK tool, ROCprofiler-SDK library, rocprofv3, ROCm, API, reference
.. _what-is-rocprof-sdk:
.. _comparing-with-legacy-tools:
==========================
What is ROCprofiler-SDK?
==========================
========================================================
Comparing ROCprofiler-SDK to other ROCm profiling tools
========================================================
ROCprofiler-SDK is a tooling infrastructure for profiling general-purpose GPU compute applications running on the ROCm software.
It supports application tracing to provide a big picture of the GPU application execution and kernel profiling to provide low-level hardware details from the performance counters.
The ROCprofiler-SDK library provides runtime-independent APIs for tracing runtime calls and asynchronous activities such as GPU kernel dispatches and memory moves. The tracing includes callback APIs for runtime API tracing and activity APIs for asynchronous activity records logging.
In summary, ROCprofiler-SDK combines `ROCProfiler <https://rocm.docs.amd.com/projects/rocprofiler/en/latest/index.html>`_ and `ROCTracer <https://rocm.docs.amd.com/projects/roctracer/en/latest/index.html>`_.
You can utilize the ROCprofiler-SDK to develop a tool for profiling and tracing HIP applications on ROCm software.
ROCprofiler-SDK is an improved version that enables more efficient implementations and better thread safety while avoiding problems that plague the former implementations of ROCProfiler and ROCTracer.
Here are the distinct ROCprofiler-SDK features:
ROCprofiler-SDK is an improved version of ROCm profiling tools that enables more efficient implementations and better thread safety while avoiding problems that plague the former implementations of ROCProfiler and ROCTracer.
Here are the distinct ROCprofiler-SDK features, which also highlight the improvements over ROCProfiler and ROCTracer:
- Improved tool initialization
- Support for simultaneous use of the same services by multiple tools
@@ -25,10 +18,7 @@ Here are the distinct ROCprofiler-SDK features:
- Backward ABI compatibility
- PC sampling (beta implementation)
Improvements over ROCProfiler and ROCTracer
----------------------------------------------------
The former implementations allow a tool to access any of the services provided by ROCProfiler or ROCTracer such as API tracing, kernel tracing, etc., by calling ``roctracer_init()`` when a ROCm runtime is initially loaded.
The former implementations allow a tool to access any of the services provided by ROCProfiler or ROCTracer, such as API tracing and kernel tracing, by calling ``roctracer_init()`` when an ROCm runtime is initially loaded.
As the calling tool is not required to specify during initialization, the services it needs to use, the libraries must be effectively prepared for any service to be available anytime.
This behavior introduces unnecessary overhead and makes thread-safe data management difficult, as tools generally don't use all the available services.
For example, ROCTracer always installs wrappers around every runtime API and adds indirection overhead through the ROCTracer library to check for the current service configuration in a thread-safe manner.
+2
@@ -0,0 +1,2 @@
"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
0,1,1,139892123975680,5619,5619,1048576,"matrixTranspose(float*, float*, int)",16,0,0,8,16,"SQ_WAVES",65536
+5
@@ -0,0 +1,5 @@
"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
4,4,1,1,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
8,8,1,2,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
12,12,1,3,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
16,16,1,4,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
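As a rough illustration of how the counter collection rows added above might be consumed, the following sketch totals `Counter_Value` per kernel and counter. It embeds the sample rows from this data file verbatim. The helper name `total_counters` is hypothetical and is not part of rocprofv3.

```python
import csv
import io
from collections import defaultdict

# Header and rows copied from the sample counter collection data above.
SAMPLE = '''"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
4,4,1,1,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
8,8,1,2,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
12,12,1,3,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
16,16,1,4,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
'''


def total_counters(text):
    """Sum Counter_Value for every (kernel name, counter name) pair.

    Hypothetical helper for illustration only.
    """
    totals = defaultdict(int)
    # csv handles the quoted kernel signatures, which contain commas.
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["Kernel_Name"], row["Counter_Name"])
        totals[key] += int(row["Counter_Value"])
    return dict(totals)


totals = total_counters(SAMPLE)
for (kernel, counter), value in totals.items():
    print(f"{kernel}: {counter} = {value}")
```

Because each CSV row is one kernel dispatch, summing per kernel gives a per-counter total across all dispatches of that kernel in the pass.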
+2 -2
@@ -4,7 +4,7 @@ The samples are provided to help you see the profiler in action.
## Finding samples
After the ROCm build is installed:
The ROCm installation provides sample programs and `rocprofv3` tool.
- Sample programs are installed here:
@@ -35,7 +35,7 @@ ctest -V
```
:::{note}
Running a few of these tests require you to install Pandas and pytest first.
Running a few of these tests require you to install [pandas](https://pandas.pydata.org/) and [pytest](https://docs.pytest.org/en/stable/) first.
:::
```bash
+133 -191
@@ -1,6 +1,6 @@
.. meta::
:description: Documentation of the installation, configuration, use of the ROCProfiler SDK, and rocprofv3 command-line tool
:keywords: ROCProfiler SDK tool, ROCProfiler SDK library, rocprofv3, ROCm, API, reference
:description: Documentation of the installation, configuration, use of the ROCprofiler-SDK, and rocprofv3 command-line tool
:keywords: ROCprofiler-SDK tool, ROCprofiler-SDK library, rocprofv3, ROCm, API, reference
.. _using-rocprofv3:
@@ -8,8 +8,8 @@
Using rocprofv3
======================
``rocprofv3`` is a CLI tool that helps you quickly optimize applications and understand the low-level kernel details without requiring any modification in the source code.
It is being developed to be backward compatible with its predecessor, ``rocprof``, and to provide more features for application profiling with better accuracy.
``rocprofv3`` is a CLI tool that helps you quickly optimize applications and understand the low-level kernel details without requiring any modification in the source code.
It's backward compatible with its predecessor, ``rocprof``, and provides more features for application profiling with better accuracy.
The following sections demonstrate the use of ``rocprofv3`` for application tracing and kernel profiling using various command-line options.
@@ -37,7 +37,7 @@ Here is the list of ``rocprofv3`` command-line options. Some options are used fo
* - Option
- Description
- Use
* - ``--hip-trace``
- Collects HIP runtime traces.
- Application tracing
@@ -113,7 +113,7 @@ Here is the list of ``rocprofv3`` command-line options. Some options are used fo
* - ``-o`` \| ``--output-file``
- Specifies the name of the output file. Note that this name is appended to the default names (_api_trace or counter_collection.csv) of the generated files'.
- Output control
* - ``-M`` \| ``--mangled-kernels``
- Overrides the default demangling of kernel names.
- Output control
@@ -125,7 +125,7 @@ Here is the list of ``rocprofv3`` command-line options. Some options are used fo
* - ``--output-format``
- For adding output format (supported formats: csv, json, pftrace)
- Output control
* - ``--preload``
- Libraries to prepend to LD_PRELOAD (usually for sanitizers)
- Extension
@@ -158,9 +158,6 @@ To trace HIP runtime APIs, use:
rocprofv3 --hip-trace -- < app_relative_path >
.. note::
The tracing and counter collection options generate an additional `agent info` file.
The above command generates a `hip_api_trace.csv` file prefixed with the process ID.
.. code-block:: shell
@@ -170,9 +167,9 @@ The above command generates a `hip_api_trace.csv` file prefixed with the process
Here are the contents of `hip_api_trace.csv` file:
.. csv-table:: HIP runtime api trace
:file: /data/hip_compile_trace.csv
:widths: 10,10,10,10,10,20,20
:header-rows: 1
:file: /data/hip_compile_trace.csv
:widths: 10,10,10,10,10,20,20
:header-rows: 1
To trace HIP compile time APIs, use:
@@ -189,23 +186,12 @@ The above command generates a `hip_api_trace.csv` file prefixed with the process
Here are the contents of `hip_api_trace.csv` file:
.. csv-table:: HIP compile time api trace
:file: /data/hip_compile_trace.csv
:widths: 10,10,10,10,10,20,20
:header-rows: 1
:file: /data/hip_compile_trace.csv
:widths: 10,10,10,10,10,20,20
:header-rows: 1
For the description of the fields in the output file, see :ref:`output-file-fields`.
Agent Info
''''''''''''''
.. code-block:: shell
$ cat 238_agent_info.csv
"Node_Id","Logical_Node_Id","Agent_Type","Cpu_Cores_Count","Simd_Count","Cpu_Core_Id_Base","Simd_Id_Base","Max_Waves_Per_Simd","Lds_Size_In_Kb","Gds_Size_In_Kb","Num_Gws","Wave_Front_Size","Num_Xcc","Cu_Count","Array_Count","Num_Shader_Banks","Simd_Arrays_Per_Engine","Cu_Per_Simd_Array","Simd_Per_Cu","Max_Slots_Scratch_Cu","Gfx_Target_Version","Vendor_Id","Device_Id","Location_Id","Domain","Drm_Render_Minor","Num_Sdma_Engines","Num_Sdma_Xgmi_Engines","Num_Sdma_Queues_Per_Engine","Num_Cp_Queues","Max_Engine_Clk_Ccompute","Max_Engine_Clk_Fcompute","Sdma_Fw_Version","Fw_Version","Capability","Cu_Per_Engine","Max_Waves_Per_Cu","Family_Id","Workgroup_Max_Size","Grid_Max_Size","Local_Mem_Size","Hive_Id","Gpu_Id","Workgroup_Max_Dim_X","Workgroup_Max_Dim_Y","Workgroup_Max_Dim_Z","Grid_Max_Dim_X","Grid_Max_Dim_Y","Grid_Max_Dim_Z","Name","Vendor_Name","Product_Name","Model_Name"
0,0,"CPU",24,0,0,0,0,0,0,0,0,1,24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3800,0,0,0,0,0,0,23,0,0,0,0,0,0,0,0,0,0,0,"AMD Ryzen 9 3900X 12-Core Processor","CPU","AMD Ryzen 9 3900X 12-Core Processor",""
1,1,"GPU",0,256,0,2147487744,10,64,0,64,64,1,64,4,4,1,16,4,32,90000,4098,26751,12032,0,128,2,0,2,24,3800,1630,432,440,138420864,16,40,141,1024,4294967295,0,0,64700,1024,1024,1024,4294967295,4294967295,4294967295,"gfx900","AMD","Radeon RX Vega","vega10"
HSA trace
+++++++++++++
@@ -214,7 +200,7 @@ The HIP runtime library is implemented with the low-level HSA runtime. HSA API t
HSA trace contains the start and end time of HSA runtime API calls and their asynchronous activities.
.. code-block:: bash
rocprofv3 --hsa-trace -- < app_relative_path >
The above command generates a `hsa_api_trace.csv` file prefixed with process ID. Note that the contents of this file have been truncated for demonstration purposes.
@@ -226,9 +212,9 @@ The above command generates a `hsa_api_trace.csv` file prefixed with process ID.
Here are the contents of `hsa_api_trace.csv` file:
.. csv-table:: HSA api trace
:file: /data/hsa_trace.csv
:widths: 10,10,10,10,10,20,20
:header-rows: 1
:file: /data/hsa_trace.csv
:widths: 10,10,10,10,10,20,20
:header-rows: 1
For the description of the fields in the output file, see :ref:`output-file-fields`.
@@ -284,9 +270,9 @@ Running the preceding command generates a `marker_api_trace.csv` file prefixed w
Here are the contents of `marker_api_trace.csv` file:
.. csv-table:: Marker api trace
:file: /data/marker_api_trace.csv
:widths: 10,10,10,10,10,20,20
:header-rows: 1
:file: /data/marker_api_trace.csv
:widths: 10,10,10,10,10,20,20
:header-rows: 1
For the description of the fields in the output file, see :ref:`output-file-fields`.
@@ -308,10 +294,10 @@ The above command generates a `kernel_trace.csv` file prefixed with the process
Here are the contents of `kernel_trace.csv` file:
.. csv-table:: Kernel trace
:file: /data/kernel_trace.csv
:widths: 10,10,10,10,10,10,20,20,10,10,10,10,10,10,10,10
:file: /data/kernel_trace.csv
:widths: 10,10,10,10,10,10,20,20,10,10,10,10,10,10,10,10
:header-rows: 1
For the description of the fields in the output file, see :ref:`output-file-fields`.
Memory copy trace
@@ -332,8 +318,8 @@ The above command generates a `memory_copy_trace.csv` file prefixed with the pro
Here are the contents of `memory_copy_trace.csv` file:
.. csv-table:: Memory copy trace
:file: /data/memory_copy_trace.csv
:widths: 10,10,10,10,10,20,20
:file: /data/memory_copy_trace.csv
:widths: 10,10,10,10,10,20,20
:header-rows: 1
For the description of the fields in the output file, see :ref:`output-file-fields`.
@@ -377,10 +363,11 @@ The above command generates a `hip_stats.csv` and `hip_api_trace` file prefixed
Here are the contents of `hip_stats.csv` file:
.. csv-table:: HIP stats
:file: /data/hip_stats.csv
:widths: 10,10,20,20,10,10,10,10
:file: /data/hip_stats.csv
:widths: 10,10,20,20,10,10,10,10
:header-rows: 1
For the description of the fields in the output file, see :ref:`output-file-fields`.
Kernel profiling
-------------------
@@ -392,140 +379,46 @@ For a comprehensive list of counters available on MI200, see `MI200 performance
Input file
++++++++++++
Rocprofv3 supports three input file formats: text (.txt), yaml (.yaml/.yml), or JSON (.json) format.
Text input is used collect the desired basic counters or derived metrics. In the input file, the line consisting of the counter or metric names must begin with ``pmc``.
The input files in JSON/YAML support all commandline options. Using these files each run can be configured with different set of options.
The schema supported by input json and yaml is as given below:
*Schema for the rocprofv3 JSON/YAML input*
Properties
++++++++++++
- **``jobs``** *(array)*: rocprofv3 input data per application run.
- **Items** *(object)*: data for rocprofv3.
- **``pmc``** *(array)*: list of counters to collect.
- **``kernel_include_regex``** *(string)*: regex string.
- **``kernel_exclude_regex``** *(string)*: regex string.
- **``kernel_iteration_range``** *(string)*: range for range for
each kernel that match the filter [start-stop].
- **``hip_trace``** *(boolean)*: For Collecting HIP Traces
(runtime + compiler).
- **``hip_runtime_trace``** *(boolean)*: For Collecting HIP
Runtime API Traces.
- **``hip_compiler_trace``** *(boolean)*: For Collecting HIP
Compiler generated code Traces.
- **``marker_trace``** *(boolean)*: For Collecting Marker (ROCTx)
Traces.
- **``kernel_trace``** *(boolean)*: For Collecting Kernel
Dispatch Traces.
- **``memory_copy_trace``** *(boolean)*: For Collecting Memory
Copy Traces.
- **``scratch_memory_trace``** *(boolean)*: For Collecting
Scratch Memory operations Traces.
- **``stats``** *(boolean)*: For Collecting statistics of enabled
tracing types.
- **``hsa_trace``** *(boolean)*: For Collecting HSA Traces (core
+ amd + image + finalizer).
- **``hsa_core_trace``** *(boolean)*: For Collecting HSA API
Traces (core API).
- **``hsa_amd_trace``** *(boolean)*: For Collecting HSA API
Traces (AMD-extension API).
- **``hsa_finalize_trace``** *(boolean)*: For Collecting HSA API
Traces (Finalizer-extension API).
- **``hsa_image_trace``** *(boolean)*: For Collecting HSA API
Traces (Image-extenson API).
- **``sys_trace``** *(boolean)*: For Collecting HIP, HSA, Marker
(ROCTx), Memory copy, Scratch memory, and Kernel dispatch
traces.
- **``mangled-kernels``** *(boolean)*: Do not demangle the kernel
names.
- **``truncate-kernels``** *(boolean)*: Truncate the demangled
kernel names.
- **``output_file``** *(string)*: For the output file name.
- **``output_directory``** *(string)*: For adding output path
where the output files will be saved.
- **``output_format``** *(array)*: For adding output format
(supported formats: csv, json, pftrace).
- **``list_metrics``** *(boolean)*: List the metrics.
- **``log_level``** *(string)*: fatal, error, warning, info,
trace.
- **``preload``** *(array)*: Libraries to prepend to LD_PRELOAD
(usually for sanitizers).
The number of basic counters or derived metrics that can be collected in one run of profiling are limited by the GPU hardware resources. If too many counters or metrics are selected, the kernels need to be executed multiple times to collect them.
For multi-pass execution, in the input text file include multiple ``pmc`` rows and counters or metrics in each ``pmc`` row can be collected in each kernel run. Whereas Json/Yaml input files have a list of jobs and each job corresponds to a pass/run.
.. code-block:: shell
$ cat input.json
{
"jobs": [
{
"hsa_trace": true,
"kernel_trace": true,
"memory_copy_trace": true,
"marker_trace": true,
"output_file": "out",
"output_format": [
"csv",
"json",
"pftrace"
]
},
{
"pmc": [
"SQ_WAVES"
],
"kernel_include_regex": ".*_kernel",
"kernel_exclude_regex": "multiply",
"kernel_iteration_range": "[1-2]",
"output_file": "out",
"output_format": [
"csv",
"json"
],
"truncate_kernels": true
}
]
}
To collect the desired basic counters or derived metrics, mention them in an input file. In the input file, the line consisting of the counter or metric names must begin with ``pmc``. The input file could be in text (.txt), yaml (.yaml/.yml), or JSON (.json) format.
.. code-block:: shell
$ cat input.txt
pmc: GPUBusy SQ_WAVES
pmc: GRBM_GUI_ACTIVE
pmc: GPUBusy SQ_WAVES
pmc: GRBM_GUI_ACTIVE
.. code-block:: shell
$ cat input.yml
$ cat input.json
jobs:
{
"metrics": [
{
"pmc": ["SQ_WAVES", "GRBM_COUNT", "GUI_ACTIVE"]
},
{
"pmc": ["FETCH_SIZE", "WRITE_SIZE"]
}
]
}
- "hsa_trace": true
"kernel_trace": true
"memory_copy_trace": true
"marker_trace": true
"output_file": "out"
"output_format"
- "csv",
- "json",
- "pftrace"
.. code-block:: shell
- pmc:
- SQ_WAVES
kernel_include_regex: "addition"
kernel_exclude_regex: "multiply"
kernel_iteration_range:
- "[1-2]"
- "[3-4]"
- "[5-6]"
$ cat input.yaml
metrics:
- pmc:
- SQ_WAVES
- GRBM_COUNT
- GUI_ACTIVE
- 'TCC_HIT[1]'
- 'TCC_HIT[2]'
- pmc:
- FETCH_SIZE
- WRITE_SIZE
The number of basic counters or derived metrics that can be collected in one run of profiling are limited by the GPU hardware resources. If too many counters or metrics are selected, the kernels need to be executed multiple times to collect them. For multi-pass execution, include multiple ``pmc`` rows in the input file. Counters or metrics in each ``pmc`` row can be collected in each kernel run.
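The relationship between ``pmc`` rows and passes can be sketched as follows. This is a minimal illustration, assuming counters on a row are separated by whitespace as in the `input.txt` example; `parse_pmc_rows` is a hypothetical helper and does not reflect how rocprofv3 itself parses the file.

```python
def parse_pmc_rows(text):
    """Return one list of counter names per pmc row, in file order.

    Hypothetical helper: each returned list corresponds to one pass.
    """
    passes = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("pmc:"):
            continue  # skip blank lines and anything that is not a pmc row
        passes.append(line[len("pmc:"):].split())
    return passes


# Contents of the input.txt example from the documentation.
INPUT_TXT = """pmc: GPUBusy SQ_WAVES
pmc: GRBM_GUI_ACTIVE
"""

passes = parse_pmc_rows(INPUT_TXT)
for n, counters in enumerate(passes, start=1):
    # The documentation describes one pmc_n output directory per row.
    print(f"pmc_{n}: {counters}")
```

With two ``pmc`` rows, the sketch yields two counter groups, matching the `pmc_1` and `pmc_2` output directories described in the kernel profiling output section.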
Kernel profiling output
+++++++++++++++++++++++++
@@ -538,14 +431,89 @@ To supply the input file for kernel profiling, use:
Running the above command generates a `./pmc_n/counter_collection.csv` file prefixed with the process ID. For each ``pmc`` row, a directory ``pmc_n`` containing a `counter_collection.csv` file is generated, where n = 1 for the first row and so on.
Each row of the CSV file is an instance of kernel execution. Here is a truncated version of the output file from ``pmc_1``.
Each row of the CSV file is an instance of kernel execution. Here is a truncated version of the output file from ``pmc_1``:
.. code-block:: shell
$ cat pmc_1/218_counter_collection.csv
Here are the contents of `counter_collection.csv` file:
.. csv-table:: Counter collection
:file: /data/counter_collection.csv
:widths: 10,10,10,10,10,10,10,10,10,10,10,10,10,10,10
:header-rows: 1
For the description of the fields in the output file, see :ref:`output-file-fields`.
Kernel names
++++++++++++++
To target a specific kernel for counter collection when multiple kernels are present, use the ``--kernel-names`` option:
.. code-block:: shell
rocprofv3 -i input.txt --kernel-names divide_kernel -- <app_relative_path>
Running the above command generates a `./pmc_n/counter_collection.csv` file prefixed with the process ID. For each ``pmc`` row, a directory ``pmc_n`` containing a `counter_collection.csv` file is generated, where n = 1 for the first row and so on.
Each row of the CSV file is an instance of kernel execution. Here is a truncated version of the output file from ``pmc_1``:
.. code-block:: shell
$ cat pmc_1/312_counter_collection.csv
Here are the contents of `counter_collection.csv` file:
.. csv-table:: Targeted kernel counter collection
:file: /data/kernel_names.csv
:widths: 10,10,10,10,10,10,10,10,10,10,10,10,10,10,10
:header-rows: 1
Agent info
++++++++++++
.. note::
All tracing and counter collection options generate an additional `agent_info.csv` file prefixed with the process ID.
The `agent_info.csv` file contains information about the CPU or GPU the kernel runs on.
.. code-block:: shell
$ cat 238_agent_info.csv
"Node_Id","Logical_Node_Id","Agent_Type","Cpu_Cores_Count","Simd_Count","Cpu_Core_Id_Base","Simd_Id_Base","Max_Waves_Per_Simd","Lds_Size_In_Kb","Gds_Size_In_Kb","Num_Gws","Wave_Front_Size","Num_Xcc","Cu_Count","Array_Count","Num_Shader_Banks","Simd_Arrays_Per_Engine","Cu_Per_Simd_Array","Simd_Per_Cu","Max_Slots_Scratch_Cu","Gfx_Target_Version","Vendor_Id","Device_Id","Location_Id","Domain","Drm_Render_Minor","Num_Sdma_Engines","Num_Sdma_Xgmi_Engines","Num_Sdma_Queues_Per_Engine","Num_Cp_Queues","Max_Engine_Clk_Ccompute","Max_Engine_Clk_Fcompute","Sdma_Fw_Version","Fw_Version","Capability","Cu_Per_Engine","Max_Waves_Per_Cu","Family_Id","Workgroup_Max_Size","Grid_Max_Size","Local_Mem_Size","Hive_Id","Gpu_Id","Workgroup_Max_Dim_X","Workgroup_Max_Dim_Y","Workgroup_Max_Dim_Z","Grid_Max_Dim_X","Grid_Max_Dim_Y","Grid_Max_Dim_Z","Name","Vendor_Name","Product_Name","Model_Name"
0,0,"CPU",24,0,0,0,0,0,0,0,0,1,24,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3800,0,0,0,0,0,0,23,0,0,0,0,0,0,0,0,0,0,0,"AMD Ryzen 9 3900X 12-Core Processor","CPU","AMD Ryzen 9 3900X 12-Core Processor",""
1,1,"GPU",0,256,0,2147487744,10,64,0,64,64,1,64,4,4,1,16,4,32,90000,4098,26751,12032,0,128,2,0,2,24,3800,1630,432,440,138420864,16,40,141,1024,4294967295,0,0,64700,1024,1024,1024,4294967295,4294967295,4294967295,"gfx900","AMD","Radeon RX Vega","vega10"
Kernel filtering
+++++++++++++++++
Kernel filtering allows you to filter the kernel profiling output based on the kernel name by specifying regex strings in the input file. To include kernel names matching the regex string in the kernel profiling output, use ``kernel_include_regex``. To exclude the kernel names matching the regex string from the kernel profiling output, use ``kernel_exclude_regex``.
You can also specify an iteration range for set of iterations of the included kernels. If the iteration range is not specified, then all iterations of the included kernels are profiled.
Here is an input file with kernel filters:
.. code-block:: shell
$ cat input.yml
jobs:
- pmc: [SQ_WAVES]
kernel_include_regex: "divide"
kernel_exclude_regex: ""
To collect counters for the kernels matching the filters specified in the preceding input file, run:
.. code-block:: shell
rocprofv3 -i input.yml -- <app_relative_path>
$ cat pass_1/312_counter_collection.csv
"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
0,1,1,139892123975680,5619,5619,1048576,"matrixTranspose(float*, float*, int)",16,0,0,8,16,"SQ_WAVES",65536
4,4,1,1,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
8,8,1,2,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
12,12,1,3,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
16,16,1,4,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
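The include and exclude semantics described above can be sketched in a few lines. This is only an approximation under stated assumptions: an empty pattern is treated as "no constraint", and matching uses search semantics (a match anywhere in the kernel name). Neither assumption is confirmed by the documentation, and `is_profiled` is a hypothetical name.

```python
import re


def is_profiled(kernel_name, include_regex="", exclude_regex=""):
    """Decide whether a kernel passes the include/exclude filters.

    Assumptions (not confirmed by the rocprofv3 documentation):
    an empty pattern imposes no constraint, and patterns may match
    anywhere in the kernel name.
    """
    if include_regex and not re.search(include_regex, kernel_name):
        return False
    if exclude_regex and re.search(exclude_regex, kernel_name):
        return False
    return True


kernels = [
    "matrixTranspose(float*, float*, int)",
    "divide_kernel(float*, float const*, float const*, int, int)",
]

# Same filters as the input.yml example: include "divide", exclude nothing.
selected = [
    k for k in kernels
    if is_profiled(k, include_regex="divide", exclude_regex="")
]
print(selected)
```

Under these assumptions, the exclude filter is applied after the include filter, so a kernel that matches both patterns is dropped.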
.. _output-file-fields:
@@ -605,32 +573,6 @@ The following table lists the various fields or the columns in the output CSV fi
* - VGPR_Count
- Kernel's Vector General Purpose Register (VGPR) count.
Kernel Filtering
+++++++++++++++++
rocprofv3 supports kernel filtering for profiling. A kernel filter is a set of a regex string (to include the kernels matching this filter), a regex string (to exclude the kernels matching this filter),
and an iteration range (set of iterations of the included kernels). If the iteration range is not provided then all iterations of the included kernels are profiled.
.. code-block:: shell
$ cat input.yml
jobs:
- pmc: [SQ_WAVES]
kernel_include_regex: "divide"
kernel_exclude_regex: ""
.. code-block:: shell
rocprofv3 -i input.yml -- <app_relative_path>
$ cat pass_1/312_counter_collection.csv
"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value"
4,4,1,1,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
8,8,1,2,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
12,12,1,3,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
16,16,1,4,36499,36499,1048576,"divide_kernel(float*, float const*, float const*, int, int)",64,0,0,12,16,"SQ_WAVES",16384
Output formats
----------------
+24 -6
@@ -1,16 +1,24 @@
.. meta::
:description: Documentation of the installation, configuration, use of the ROCProfiler SDK, and rocprofv3 command-line tool
:keywords: ROCProfiler SDK tool, ROCProfiler SDK library, rocprofv3, ROCm, API, reference
:description: Documentation of the installation, configuration, use of the ROCprofiler SDK, and rocprofv3 command-line tool
:keywords: ROCprofiler-SDK tool, ROCprofiler-SDK library, rocprofv3, ROCm, API, reference
.. _index:
******************************************
ROCProfiler SDK documentation
ROCprofiler-SDK documentation
******************************************
ROCProfiler SDK is a comprehensive library that provides APIs for profiling and tracing HIP applications on AMD ROCm Software. To learn more, see :ref:`what-is-rocprof-sdk`
ROCprofiler-SDK is a tooling infrastructure for profiling general-purpose GPU compute applications running on the ROCm software.
It supports application tracing to provide a big picture of the GPU application execution and kernel profiling to provide low-level hardware details from the performance counters.
The ROCprofiler-SDK library provides runtime-independent APIs for tracing runtime calls and asynchronous activities such as GPU kernel dispatches and memory moves. The tracing includes callback APIs for runtime API tracing and activity APIs for asynchronous activity records logging.
You can access ROCProfiler SDK on our `GitHub repository <https://github.com/ROCm/rocprofiler-sdk>`_.
In summary, ROCprofiler-SDK combines `ROCProfiler <https://rocm.docs.amd.com/projects/rocprofiler/en/latest/index.html>`_ and `ROCTracer <https://rocm.docs.amd.com/projects/roctracer/en/latest/index.html>`_.
You can utilize the ROCprofiler-SDK to develop a tool for profiling and tracing HIP applications on ROCm software.
The code is open and hosted at `<https://github.com/ROCm/rocprofiler-sdk>`_.
.. note::
ROCprofiler-SDK is in beta and subject to change in future releases.
The documentation is structured as follows:
@@ -23,12 +31,22 @@ The documentation is structured as follows:
.. grid-item-card:: How to
* :doc:`Using rocprofv3 <how-to/using-rocprofv3>`
* :ref:`using-rocprofv3`
* :doc:`Samples <how-to/samples>`
.. grid-item-card:: API reference
* :doc:`Buffered services <api-reference/buffered_services>`
* :doc:`Callback services <api-reference/callback_services>`
* :doc:`Counter collection services <api-reference/counter_collection_services>`
* :doc:`Intercept table <api-reference/intercept_table>`
* :doc:`PC sampling <api-reference/pc_sampling>`
* :doc:`Tool library <api-reference/tool_library>`
* :doc:`API library <_doxygen/html/index>`
.. grid-item-card:: Conceptual
* :ref:`comparing-with-legacy-tools`
To contribute to the documentation, refer to
`Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.
+5 -3
@@ -11,7 +11,7 @@ ROCprofiler-SDK is supported only on Linux. The following distributions are test
- OpenSUSE 15.4
- RedHat 8.8
Other [Linux distributions](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-operating-systems) might be supported but not tested yet.
ROCprofiler-SDK might operate as expected on other [Linux distributions](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html#supported-operating-systems), but has not been tested.
### Identifying the operating system
@@ -31,9 +31,11 @@ The relevant fields are `ID` and the `VERSION_ID`.
## Build requirements
Install [CMake](https://cmake.org/) version 3.21 or higher.
Install [CMake](https://cmake.org/) version 3.21 (or later).
**Note:** If the `CMake` installed on the system is too old, you can install a new version using various methods. One of the easiest options is to use PyPi (Pythons pip).
:::{note}
If the `CMake` installed on the system is too old, you can install a new version using various methods. One of the easiest options is to use PyPi (Pythons pip).
:::
```bash
pip install --user 'cmake==3.22.0'
+1 -1
@@ -31,7 +31,7 @@ message "Running doxysphinx"
doxysphinx build ${WORK_DIR} ${WORK_DIR}/_build/html ${WORK_DIR}/_doxygen/html
message "Building html documentation"
make html SPHINXOPTS="-W --keep-going -n"
make html SPHINXOPTS="--keep-going -n"
if [ -d ${SOURCE_DIR}/docs ]; then
message "Removing stale documentation in ${SOURCE_DIR}/docs/"