Files
Gopesh Bhardwaj 43eaa1d127 [Documentation] Quick reference guide for rocprofv3 (#953)
* quick reference guide for rocprofv3

* Addressed feedback and updated with rocpd information

* rocpd docs update

* rocpd query option

* Addressing feedback

* Fixed misssing newline

* Addressing feedback

* Addressing feedback

* Addressing feedback

* Addressing feedback

* Adding process attachment
2025-10-10 09:36:28 +05:30

324 γραμμές
8.3 KiB
ReStructuredText

.. meta::
:description: Quick reference guide for rocprofv3 commands and rocprofiler-sdk tools
:keywords: rocprofv3 quick guide, rocprofiler-sdk quick reference, rocprofv3 commands, ROCprofiler-SDK CLI, GPU profiling quick start
.. _quick-guide:
==============================================
ROCprofiler-SDK Quick Reference Guide
==============================================
This quick reference guide provides an overview of the most commonly used ``rocprofv3`` commands and links to detailed documentation sections.
Getting Started
===============
Export the ROCm binary path:
.. code-block:: bash
source /opt/rocm/share/rocprofiler-sdk/setup-env.sh
Check rocprofv3 version and help:
.. code-block:: bash
rocprofv3 --version
rocprofv3 --help
Essential Commands
==================
Querying System Capabilities
-----------------------------
List available counters and capabilities:
.. code-block:: bash
# List all available features
rocprofv3 --list-avail
# Using the dedicated tool for detailed queries
rocprofv3-avail list
rocprofv3-avail info
**Documentation:** :ref:`using-rocprofv3-avail`
Basic Tracing
-------------
Application tracing (HIP API + kernel dispatches + memory operations):
.. code-block:: bash
# Runtime tracing (recommended for most use cases)
rocprofv3 --runtime-trace -- ./your_app
# System-level tracing (includes HSA API)
rocprofv3 --sys-trace -- ./your_app
**Documentation:** :ref:`using-rocprofv3`
Granular Tracing Options
------------------------
.. code-block:: bash
# HIP API, kernel dispatches, and memory operations tracing
rocprofv3 --hip-trace --kernel-trace --memory-copy-trace -- ./your_app
**Documentation:** :ref:`using-rocprofv3` (Basic tracing section)
Performance Counter Collection
------------------------------
.. code-block:: bash
# List available counters
rocprofv3-avail list --pmc
# Check if counters can be collected together
rocprofv3-avail pmc-check SQ_WAVES SQ_INSTS_VALU
# Collect specific counters
rocprofv3 --pmc SQ_WAVES,SQ_INSTS_VALU -- ./your_app
**Documentation:** :ref:`using-rocprofv3` (Counter collection section)
Advanced Profiling Features
============================
PC Sampling (Beta)
------------------
.. code-block:: bash
# Check PC sampling support
rocprofv3-avail list --pc-sampling
# Enable PC sampling
rocprofv3 --pc-sampling-beta-enabled --pc-sampling-interval 1000 -- ./your_app
**Documentation:** :ref:`using-pc-sampling`
Thread Trace
------------
.. code-block:: bash
# Collect thread trace data
rocprofv3 --att --output-format csv -- ./your_app
**Documentation:** :ref:`using-thread-trace`
Process Attachment
------------------
.. code-block:: bash
# Attach to a running process by PID
rocprofv3 --pid 12345 --runtime-trace -d ./results
# or
# Attach for a specific duration (10 seconds)
rocprofv3 --pid 12345 --runtime-trace --attach-duration-msec 1000
**Documentation:** :ref:`using-rocprofv3-process-attachment`
Output Formats and Post-processing
===================================
rocprofv3 supports multiple output formats for different analysis needs. The default format is ``rocpd``, which stores data in a structured SQLite3 database.
Working with rocpd Database Format
-----------------------------------
.. code-block:: bash
# Generate rocpd database (default format)
rocprofv3 --runtime-trace -- ./your_app
# Creates: hostname/pid_results.db
# Query the database directly with SQL
sqlite3 hostname/12345_results.db "SELECT * FROM regions;"
# Convert rocpd database to other formats
rocpd convert -i *.db -f csv pftrace otf2 --start 20% --end 80%
Collecting and converting to Other Formats
-------------------------------------------
.. code-block:: bash
# Multiple output formats in one run
rocprofv3 --runtime-trace --output-format csv json pftrace otf2 -- ./your_app
**Documentation:** :ref:`using-rocpd-output-format`
Summary and Statistics
----------------------
.. code-block:: bash
# Overall summary statistics per domain grouped by kernel and memory operations
rocprofv3 --runtime-trace --summary-per-domain --summary-groups "KERNEL_DISPATCH|MEMORY_COPY" -- ./your_app
**Documentation:** :ref:`using-rocprofv3` (Post-processing tracing section)
Filtering and Selection
=======================
Kernel Filtering
----------------
.. code-block:: bash
# Include specific kernels by regex
rocprofv3 --kernel-trace --kernel-iteration-range 10-20 --kernel-include-regex "matmul.*" --kernel-exclude-regex ".*copy.*" -- ./your_app
**Documentation:** :ref:`using-rocprofv3` (Filtering section)
Time-based Collection
---------------------
.. code-block:: bash
# Collect for specific time periods (start_delay:collection_time:repeat)
rocprofv3 --runtime-trace --collection-period 500:2000:0 --collection-period-unit msec -- ./your_app
**Documentation:** :ref:`using-rocprofv3` (Filtering section)
Kernel Naming and Display
=========================
.. code-block:: bash
# Keep mangled kernel names
rocprofv3 --kernel-trace --mangled-kernels -- ./your_app
# Truncate kernel names for readability
rocprofv3 --kernel-trace --truncate-kernels -- ./your_app
# Use ROCTx regions to rename kernels
rocprofv3 --kernel-trace --kernel-rename -- ./your_app
**Documentation:** :ref:`using-rocprofv3` (Kernel naming section)
Code Annotation with ROCTx
===========================
.. code-block:: bash
# Trace ROCTx markers and ranges
rocprofv3 --marker-trace -- ./your_app
**Documentation:** :ref:`using-rocprofiler-sdk-roctx`
Parallel and Distributed Applications
======================================
MPI Applications
----------------
.. code-block:: bash
# Profile MPI applications
mpirun -n 4 rocprofv3 --runtime-trace --output-format csv -- ./your_mpi_app
**Documentation:** :ref:`using-rocprofv3-with-mpi`
OpenMP Applications
-------------------
.. code-block:: bash
# Profile OpenMP applications
rocprofv3 --runtime-trace --output-format csv -- ./your_openmp_app
**Documentation:** :ref:`using-rocprofv3-with-openmp`
Output Management
=================
File Organization
-----------------
.. code-block:: bash
# Specify output directory
rocprofv3 --runtime-trace --output-directory ./results --output-file my_trace -- ./your_app
# Generate configuration file
rocprofv3 --runtime-trace --output-config -- ./your_app
**Documentation:** :ref:`using-rocprofv3` (I/O options section)
Common Use Cases
================
Basic Performance Analysis
--------------------------
.. code-block:: bash
# Quick performance overview
rocprofv3 --runtime-trace --summary -- ./your_app
**Use case:** Get a high-level view of application performance
Detailed Kernel Analysis
-------------------------
.. code-block:: bash
# Detailed kernel profiling with counters
rocprofv3 --kernel-trace --pmc SQ_WAVES,SQ_INSTS_VALU,TCP_PERF_SEL_TOTAL_CACHE_ACCESSES -- ./your_app
**Use case:** Analyze specific kernel performance bottlenecks
Memory Transfer Analysis
------------------------
.. code-block:: bash
# Focus on memory operations
rocprofv3 --memory-copy-trace --memory-allocation-trace -- ./your_app
**Use case:** Optimize data movement between CPU and GPU
Timeline Visualization
----------------------
.. code-block:: bash
# Generate timeline for visualization tools
rocprofv3 --runtime-trace -- ./your_app
# Convert to Perfetto format
rocpd2pftrace -i hostname/pid_results.db -o perfetto_trace
**Use case:** Visualize execution timeline in Perfetto or similar tools
Installation and Setup
======================
**Installation Documentation:** :ref:`installing-rocprofiler-sdk`
**API Reference:** :doc:`Tool library <api-reference/tool_library>`
**Samples and Examples:** :doc:`Samples <how-to/samples>`
Troubleshooting Quick Tips
==========================
1. **Permission Issues:** Ensure proper access to GPU devices and ``/dev/kfd``
2. **Counter Collection Fails:** Use ``rocprofv3-avail pmc-check`` to verify counter compatibility
3. **Large Output Files:** Use ``--minimum-output-data`` to set file size thresholds
4. **Signal Handling:** Use ``--disable-signal-handlers`` if conflicts with application handlers
5. **ROCm Path Issues:** Use ``--rocm-root`` to specify custom ROCm installation paths
For comprehensive documentation on each feature, refer to the detailed sections linked throughout this guide.