[Documentation] Quick reference guide for rocprofv3 (#953)
* Quick reference guide for rocprofv3
* Addressed feedback and updated with rocpd information
* rocpd docs update
* rocpd query option
* Addressing feedback
* Fixed missing newline
* Addressing feedback
* Addressing feedback
* Addressing feedback
* Addressing feedback
* Adding process attachment
Committed by GitHub
Parent: 182a750c08
Commit: 43eaa1d127
@@ -9,6 +9,10 @@ subtrees:
|
||||
- caption: Install
|
||||
entries:
|
||||
- file: install/installation
|
||||
- caption: Quick Reference
|
||||
entries:
|
||||
- file: quick_guide
|
||||
title: ROCprofiler-SDK Quick Reference Guide
|
||||
- caption: How to
|
||||
entries:
|
||||
- file: how-to/samples
|
||||
|
||||
@@ -49,6 +49,22 @@ The ``rocpd`` database format supports conversion to alternative output formats
|
||||
|
||||
The ``rocpd`` conversion utility is distributed as part of the ROCm installation package, located in ``/opt/rocm-<version>/bin``, and provides both executable and Python module interfaces for programmatic integration.
|
||||
|
||||
**Available rocpd Commands**
|
||||
|
||||
The ``rocpd`` tool provides three main subcommands for different analysis workflows. To see all available options:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocpd --help
|
||||
|
||||
This will display the available subcommands: ``{convert, query, summary}``
|
||||
|
||||
- **convert** - Transform rocpd databases to alternative formats (CSV, OTF2, PFTrace)
|
||||
- **query** - Execute SQL queries against rocpd databases with flexible output options
|
||||
- **summary** - Generate statistical analysis reports equivalent to rocprofv3 summary functionality
|
||||
|
||||
**Format Conversion**
|
||||
|
||||
Invoke the ``rocpd convert`` command with appropriate parameters to transform database files into target formats.
|
||||
|
||||
**CSV Format Conversion:**
|
||||
@@ -143,7 +159,7 @@ Options
|
||||
Specifies shared memory allocation hint for Perfetto inter-process communication in kilobytes (default: 64 KB).
|
||||
|
||||
- ``--group-by-queue``
|
||||
Organizes trace data by HIP stream abstractions rather than low-level HSA queue identifiers, providing higher-level application context for kernel and memory transfer operations.
|
||||
Displays the HSA queues to which these kernel and memory operations were submitted. By default, ``rocprofv3`` shows the HIP streams to which the kernel and memory copy operations were submitted.
|
||||
|
||||
**Temporal Filtering Configuration:**
|
||||
|
||||
@@ -200,3 +216,885 @@ Convert multiple databases to all supported formats (CSV, OTF2, and Perfetto tra
|
||||
|
||||
/opt/rocm/bin/rocpd convert -i db{3,4}.db --output-format csv otf2 pftrace
|
||||
|
||||
Dedicated Conversion Tools
|
||||
++++++++++++++++++++++++++
|
||||
|
||||
ROCprofiler-SDK provides specialized conversion utilities for efficient format-specific operations. These tools offer streamlined interfaces for single-format conversions and are particularly useful in automated workflows and scripts.
|
||||
|
||||
rocpd2csv - CSV Export Tool
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Purpose:** Converts rocpd SQLite3 databases to Comma-Separated Values (CSV) format for spreadsheet analysis and data processing workflows.
|
||||
|
||||
**Location:** ``/opt/rocm/bin/rocpd2csv``
|
||||
|
||||
**Syntax:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocpd2csv -i INPUT [INPUT ...] [OPTIONS]
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- **Structured Data Export:** Converts hierarchical database content to tabular CSV format
|
||||
- **Multi-Database Support:** Aggregates data from multiple database files into unified CSV output
|
||||
- **Time Window Filtering:** Apply temporal filters to limit exported data range
|
||||
- **Configurable Output:** Customize output file naming and directory structure
|
||||
|
||||
**Usage Examples:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Basic CSV conversion
|
||||
rocpd2csv -i profile_data.db
|
||||
|
||||
# Convert multiple databases with custom output path
|
||||
rocpd2csv -i db1.db db2.db db3.db -d ~/analysis_output/ -o combined_profile
|
||||
|
||||
# Apply time window filtering (export middle 70% of execution)
|
||||
rocpd2csv -i large_profile.db --start 15% --end 85%
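
The percentage form of ``--start`` and ``--end`` can be read as a position within the recorded timeline. The sketch below illustrates that mapping under the assumption that percentages are measured between the first and last timestamps in the database. ``percent_window`` is a hypothetical helper for illustration and is not part of rocpd.

.. code-block:: python

   def percent_window(t_min, t_max, start_pct, end_pct):
       """Map percentage bounds onto absolute timestamps (assumed semantics)."""
       if not 0 <= start_pct <= end_pct <= 100:
           raise ValueError("expected 0 <= start_pct <= end_pct <= 100")
       span = t_max - t_min
       # Integer arithmetic keeps nanosecond timestamps exact.
       start = t_min + span * start_pct // 100
       end = t_min + span * end_pct // 100
       return start, end

   # Hypothetical trace spanning 1,000,000 ns; keep the middle 70%.
   window = percent_window(1_000_000, 2_000_000, 15, 85)
   print(window)

With these inputs the window covers the middle 70% of the hypothetical timeline.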
|
||||
|
||||
**Common Output Files:**
|
||||
- ``out_hip_api_trace.csv`` - HIP API call trace data
|
||||
- ``out_kernel_trace.csv`` - GPU kernel execution information
|
||||
- ``out_counter_collection.csv`` - Hardware performance counter data
|
||||
|
||||
rocpd2otf2 - Open Trace Format 2 Export
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Purpose:** Generates OTF2 (Open Trace Format 2) files for high-performance trace analysis with tools such as Vampir, TAU, and Score-P.
|
||||
|
||||
**Location:** ``/opt/rocm/bin/rocpd2otf2``
|
||||
|
||||
**Syntax:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocpd2otf2 -i INPUT [INPUT ...] [OPTIONS]
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- **HPC-Standard Format:** Produces traces compatible with scientific computing analysis tools
|
||||
- **Hierarchical Timeline:** Preserves process/thread/queue relationships in trace structure
|
||||
- **Scalable Storage:** Efficient binary format for large-scale profiling data
|
||||
- **Agent Indexing:** Configurable GPU agent indexing strategies (absolute, relative, type-relative)
|
||||
|
||||
**Usage Examples:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Generate OTF2 trace archive
|
||||
rocpd2otf2 -i gpu_workload.db
|
||||
|
||||
# Multi-process trace with custom indexing
|
||||
rocpd2otf2 -i mpi_rank_*.db --agent-index-value type-relative -o mpi_trace
|
||||
|
||||
# Time-windowed trace export
|
||||
rocpd2otf2 -i long_execution.db --start-marker "computation_begin" --end-marker "computation_end"
|
||||
|
||||
**Output Structure:**
|
||||
- ``trace.otf2`` - Main trace archive containing timeline data
|
||||
- ``trace.def`` - Trace definition file with metadata
|
||||
- Supporting files for multi-stream trace data
|
||||
|
||||
rocpd2pftrace - Perfetto Trace Export
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Purpose:** Converts rocpd databases to Perfetto protocol buffer format for interactive visualization using the Perfetto UI (ui.perfetto.dev).
|
||||
|
||||
**Location:** ``/opt/rocm/bin/rocpd2pftrace``
|
||||
|
||||
**Syntax:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocpd2pftrace -i INPUT [INPUT ...] [OPTIONS]
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- **Interactive Visualization:** Optimized for modern web-based trace viewers
|
||||
- **Real-time Analysis:** Supports streaming analysis workflows
|
||||
- **GPU Timeline Integration:** Specialized visualization of GPU execution patterns
|
||||
- **Configurable Backend:** Supports both in-process and system-wide tracing backends
|
||||
|
||||
**Backend Configuration Options:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# In-process backend (default)
|
||||
rocpd2pftrace -i profile.db --perfetto-backend inprocess
|
||||
|
||||
# System-wide tracing backend
|
||||
rocpd2pftrace -i system_profile.db --perfetto-backend system \
|
||||
--perfetto-buffer-size 64MB --perfetto-shmem-size-hint 32MB
|
||||
|
||||
**Buffer Management:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Ring buffer mode (overwrites old data)
|
||||
rocpd2pftrace -i continuous_profile.db --perfetto-buffer-fill-policy ring_buffer
|
||||
|
||||
# Discard mode (stops recording when full)
|
||||
rocpd2pftrace -i bounded_profile.db --perfetto-buffer-fill-policy discard
|
||||
|
||||
**Usage Examples:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Basic Perfetto trace generation
|
||||
rocpd2pftrace -i application.db
|
||||
|
||||
# High-throughput configuration
|
||||
rocpd2pftrace -i heavy_workload.db --perfetto-buffer-size 128MB \
|
||||
--perfetto-buffer-fill-policy ring_buffer
|
||||
|
||||
# Multi-queue analysis
|
||||
rocpd2pftrace -i multi_stream.db --group-by-queue -o queue_analysis
|
||||
|
||||
**Visualization Workflow:**
|
||||
1. Generate a ``.perfetto-trace`` file using ``rocpd2pftrace``.
2. Open https://ui.perfetto.dev in a web browser.
3. Load the generated trace file for interactive analysis.
|
||||
|
||||
rocpd2summary - Statistical Analysis Tool
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Purpose:** Generates comprehensive statistical summaries and performance analysis reports from rocpd profiling data.
|
||||
|
||||
**Location:** ``/opt/rocm/bin/rocpd2summary``
|
||||
|
||||
**Syntax:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocpd2summary -i INPUT [INPUT ...] [OPTIONS]
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- **Multi-Format Output:** Supports console, CSV, HTML, JSON, Markdown, and PDF report generation
|
||||
- **Comprehensive Statistics:** Kernel execution times, API call frequencies, memory transfer analysis
|
||||
- **Domain-Specific Analysis:** Separate summaries for HIP, ROCr, Markers, and other trace domains
|
||||
- **Rank-Based Analysis:** Per-process and per-rank performance breakdowns for MPI applications
|
||||
- **Configurable Scope:** Selective inclusion/exclusion of analysis categories
|
||||
|
||||
**Output Format Options:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Console output (default)
|
||||
rocpd2summary -i profile.db
|
||||
|
||||
# CSV format for data analysis
|
||||
rocpd2summary -i profile.db --format csv -o performance_metrics
|
||||
|
||||
# HTML report with visualization
|
||||
rocpd2summary -i profile.db --format html -d ~/reports/
|
||||
|
||||
# Multiple output formats
|
||||
rocpd2summary -i profile.db --format csv html json
|
||||
|
||||
**Analysis Categories:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Include all available domains
|
||||
rocpd2summary -i profile.db --region-categories HIP HSA MARKERS KERNEL
|
||||
|
||||
# Focus on GPU kernel analysis only
|
||||
rocpd2summary -i profile.db --region-categories KERNEL
|
||||
|
||||
# Exclude markers to speed up processing
|
||||
rocpd2summary -i profile.db --region-categories HIP HSA KERNEL
|
||||
|
||||
**Advanced Analysis Options:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Include domain-specific statistics
|
||||
rocpd2summary -i multi_gpu.db --domain-summary
|
||||
|
||||
# Per-rank analysis for MPI applications
|
||||
rocpd2summary -i mpi_profile_*.db --summary-by-rank --format html
|
||||
|
||||
# Time-windowed summary analysis
|
||||
rocpd2summary -i long_run.db --start 25% --end 75% --format csv
|
||||
|
||||
**Report Content:**
|
||||
- **Kernel Statistics:** Execution time distributions, call frequencies, grid/block sizes
|
||||
- **API Timing:** HIP/HSA function call durations and frequencies
|
||||
- **Memory Analysis:** Transfer patterns, bandwidth utilization, allocation statistics
|
||||
- **Device Utilization:** GPU occupancy patterns and idle time analysis
|
||||
- **Synchronization Overhead:** Barrier and synchronization point analysis
|
||||
|
||||
**Output Files:**
|
||||
- ``kernels_summary.{format}`` - GPU kernel execution summary
|
||||
- ``hip_summary.{format}`` - HIP API call statistics
|
||||
- ``hsa_summary.{format}`` - HSA runtime API analysis
|
||||
- ``memory_summary.{format}`` - Memory operation statistics
|
||||
- ``markers_summary.{format}`` - Marker event analysis
|
||||
|
||||
Summary
|
||||
+++++++
|
||||
|
||||
The ``rocpd summary`` command provides statistical analysis and performance summaries equivalent to the summary functionality available in ``rocprofv3``. This command generates comprehensive reports from rocpd database files, offering the same analytical capabilities that were previously available through ``rocprofv3 --summary`` but now operating on the structured database format.
|
||||
|
||||
**Purpose:** Generate statistical summaries and performance reports from rocpd database files, providing equivalent functionality to rocprofv3's built-in summary capabilities.
|
||||
|
||||
**Location:** ``/opt/rocm/bin/rocpd summary``
|
||||
|
||||
**Syntax:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocpd summary -i INPUT [INPUT ...] [OPTIONS]
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- **Compatible Analysis:** Provides the same summary statistics and reports as ``rocprofv3 --summary``
|
||||
- **Database-Driven:** Operates on structured rocpd database files for consistent, reproducible analysis
|
||||
- **Multi-Database Aggregation:** Combine and analyze data from multiple profiling sessions, ranks, or nodes in a single operation
|
||||
- **Comparative Analysis:** Use ``--summary-by-rank`` to compare performance across different ranks, nodes, or execution contexts
|
||||
- **Flexible Output:** Generate summaries in multiple formats (console, CSV, HTML, JSON)
|
||||
- **Selective Reporting:** Focus on specific performance domains and categories
|
||||
|
||||
**Multi-Database Analysis Benefits**
|
||||
|
||||
The ``rocpd summary`` command excels at aggregating multiple database files, providing capabilities not available with single-session analysis:
|
||||
|
||||
**Unified Summary Reports:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Aggregate multiple databases into single comprehensive summary
|
||||
rocpd summary -i session1.db session2.db session3.db --format html -o unified_summary
|
||||
|
||||
# Combine all MPI rank databases for overall application analysis
|
||||
rocpd summary -i rank_*.db --format csv -o mpi_application_summary
|
||||
|
||||
# Time-series aggregation across multiple profiling runs
|
||||
rocpd summary -i daily_profile_*.db --format json -o weekly_performance_trends
|
||||
|
||||
**Rank-by-Rank Comparative Analysis:**
|
||||
|
||||
The ``--summary-by-rank`` option enables detailed comparative analysis, allowing you to identify performance variations, load balancing issues, and optimization opportunities across different execution contexts:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Compare performance across MPI ranks
|
||||
rocpd summary -i rank_0.db rank_1.db rank_2.db rank_3.db --summary-by-rank --format html -o rank_comparison
|
||||
|
||||
# Analyze multi-node performance characteristics
|
||||
rocpd summary -i node_*.db --summary-by-rank --format csv -o node_performance_analysis
|
||||
|
||||
# Compare GPU device performance in multi-GPU applications
|
||||
rocpd summary -i gpu_0.db gpu_1.db gpu_2.db gpu_3.db --summary-by-rank --format json -o gpu_scaling_analysis
|
||||
|
||||
**Use Cases for Multi-Database Summary Analysis:**
|
||||
|
||||
**1. MPI Application Performance Analysis:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Profile distributed MPI application
|
||||
mpirun -np 8 rocprofv3 --hip-trace --output-format rocpd -- mpi_simulation
|
||||
|
||||
# Generate unified summary for overall application performance
|
||||
rocpd summary -i results_rank_*.db --format html -o application_overview
|
||||
|
||||
# Identify load balancing issues with rank-by-rank comparison
|
||||
rocpd summary -i results_rank_*.db --summary-by-rank --format csv -o load_balance_analysis
|
||||
|
||||
**2. Multi-GPU Scaling Studies:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Profile scaling from 1 to 4 GPUs
|
||||
for gpus in 1 2 4; do
|
||||
rocprofv3 --hip-trace --device 0:$((gpus-1)) --output-format rocpd -o "scaling_${gpus}gpu.db" -- gpu_benchmark
|
||||
done
|
||||
|
||||
# Aggregate scaling analysis
|
||||
rocpd summary -i scaling_*gpu.db --format html -o gpu_scaling_summary
|
||||
|
||||
# Compare efficiency across different GPU counts
|
||||
rocpd summary -i scaling_*gpu.db --summary-by-rank --format json -o scaling_efficiency
|
||||
|
||||
**3. Performance Regression Testing:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Profile baseline and optimized versions
|
||||
rocprofv3 --hip-trace --output-format rocpd -o baseline.db -- application_v1
|
||||
rocprofv3 --hip-trace --output-format rocpd -o optimized.db -- application_v2
|
||||
|
||||
# Generate unified performance comparison
|
||||
rocpd summary -i baseline.db optimized.db --summary-by-rank --format html -o regression_analysis
|
||||
|
||||
**4. Cross-Platform Performance Comparison:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Profile on different hardware platforms
|
||||
rocprofv3 --hip-trace --output-format rocpd -o platform_A.db -- benchmark
|
||||
rocprofv3 --hip-trace --output-format rocpd -o platform_B.db -- benchmark
|
||||
|
||||
# Compare platform performance characteristics
|
||||
rocpd summary -i platform_*.db --summary-by-rank --format csv -o platform_comparison
|
||||
|
||||
**Advanced Summary Analysis:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Cross-rank summary for MPI applications with domain focus
|
||||
rocpd summary -i rank_*.db --summary-by-rank --region-categories KERNEL HIP --format html
|
||||
|
||||
# Time-windowed multi-database analysis
|
||||
rocpd summary -i profile_*.db --start 25% --end 75% --summary-by-rank
|
||||
|
||||
# Domain-specific comparative analysis
|
||||
rocpd summary -i node_*.db --domain-summary --summary-by-rank --region-categories HIP ROCR
|
||||
|
||||
**Output Interpretation:**
|
||||
|
||||
- **Unified Summaries:** Provide aggregate statistics across all input databases, showing combined performance metrics
|
||||
- **Rank-by-Rank Summaries:** Generate separate statistical reports for each input database, enabling direct comparison of performance characteristics
|
||||
- **Comparative Metrics:** Highlight performance variations, identify outliers, and reveal load balancing opportunities
|
||||
|
||||
**Integration with rocprofv3 Workflow:**
|
||||
|
||||
The ``rocpd summary`` command maintains full compatibility with ``rocprofv3`` summary analysis while extending capabilities to multi-database scenarios. Users familiar with ``rocprofv3 --summary`` will find identical statistical outputs and report formats when using ``rocpd summary`` on database files, with the added benefit of cross-session analysis capabilities.
|
||||
|
||||
For detailed information about summary statistics and report interpretation, see :ref:`using-rocprofv3-summary`.
|
||||
|
||||
Aggregating rocpd Data
|
||||
++++++++++++++++++++++
|
||||
|
||||
One of the key advantages of the ``rocpd`` format is its ability to aggregate and analyze data from multiple profiling sessions, ranks, or nodes within a unified framework. This capability enables comprehensive analysis workflows that were not possible with previous output formats.
|
||||
|
||||
**Multi-Database Analysis Capabilities**
|
||||
|
||||
Unlike the Perfetto output format used in earlier versions, ``rocpd`` databases can be seamlessly combined for cross-session analysis:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Aggregate analysis across multiple profiling sessions
|
||||
rocpd query -i session1.db session2.db session3.db \
|
||||
--query "SELECT name, AVG(duration) FROM kernels GROUP BY name"
|
||||
|
||||
# Cross-rank performance comparison for MPI applications
|
||||
rocpd summary -i rank_0.db rank_1.db rank_2.db rank_3.db --summary-by-rank
|
||||
|
||||
# Multi-node scaling analysis
|
||||
rocpd query -i node_*.db \
|
||||
--query "SELECT COUNT(*) as total_kernels, SUM(duration) as total_time FROM kernels"
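
Conceptually, querying several files at once amounts to treating their rows as one table. A minimal sketch of that idea using Python's standard ``sqlite3`` module follows. It is not a description of how ``rocpd query`` is implemented: the ``kernels`` table here is a simplified stand-in with only ``name`` and ``duration`` columns, and ``make_db`` and ``average_durations`` are hypothetical helpers.

.. code-block:: python

   import os
   import sqlite3
   import tempfile

   def make_db(path, rows):
       """Create a toy database with a simplified stand-in for the kernels view."""
       con = sqlite3.connect(path)
       con.execute("CREATE TABLE kernels (name TEXT, duration INTEGER)")
       con.executemany("INSERT INTO kernels VALUES (?, ?)", rows)
       con.commit()
       con.close()

   def average_durations(paths):
       """Average kernel duration per name across several database files."""
       con = sqlite3.connect(":memory:")
       selects = []
       for i, path in enumerate(paths):
           # Attach each file under its own schema name (db0, db1, ...).
           con.execute(f"ATTACH DATABASE ? AS db{i}", (path,))
           selects.append(f"SELECT name, duration FROM db{i}.kernels")
       union = " UNION ALL ".join(selects)
       rows = con.execute(
           f"SELECT name, AVG(duration) FROM ({union}) GROUP BY name ORDER BY name"
       ).fetchall()
       con.close()
       return rows

   with tempfile.TemporaryDirectory() as tmp:
       a = os.path.join(tmp, "session1.db")
       b = os.path.join(tmp, "session2.db")
       make_db(a, [("gemm", 100), ("copy", 10)])
       make_db(b, [("gemm", 300)])
       result = average_durations([a, b])
   print(result)

The ``UNION ALL`` keeps duplicate rows, so every dispatch from every file contributes to the average.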
|
||||
|
||||
**Distributed Computing Workflows**
|
||||
|
||||
**MPI Application Analysis:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Profile MPI application across multiple ranks
|
||||
mpirun -np 4 rocprofv3 --hip-trace --output-format rocpd -- mpi_application
|
||||
|
||||
# Generate aggregated performance summary
|
||||
rocpd summary -i results_rank_*.db --summary-by-rank --format html -o mpi_performance_report
|
||||
|
||||
# Analyze load balancing across ranks
|
||||
rocpd query -i results_rank_*.db \
|
||||
--query "SELECT pid, COUNT(*) as kernel_count, AVG(duration) as avg_duration FROM kernels GROUP BY pid"
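
The per-process rows returned by a query like the last one can be reduced to a single imbalance figure. The sketch below assumes you have already collected the total kernel time per process (for example ``SUM(duration)`` grouped by ``pid``). ``load_imbalance`` is a hypothetical helper, and the ratio of the busiest process to the mean is only one common way to express imbalance.

.. code-block:: python

   def load_imbalance(total_time_by_pid):
       """Return max/mean of per-process totals; 1.0 means perfectly balanced."""
       totals = list(total_time_by_pid.values())
       if not totals:
           raise ValueError("no processes given")
       mean = sum(totals) / len(totals)
       if mean == 0:
           return 1.0  # nothing ran anywhere, so nothing is imbalanced
       return max(totals) / mean

   # Hypothetical per-process kernel time in nanoseconds, keyed by pid.
   ratio = load_imbalance({101: 900, 102: 1000, 103: 1100, 104: 1000})
   print(round(ratio, 3))

A ratio well above 1.0 suggests that one rank carries noticeably more GPU work than the others.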
|
||||
|
||||
**Multi-GPU Scaling Analysis:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Profile application with multiple GPU devices
|
||||
rocprofv3 --hip-trace --device 0,1,2,3 --output-format rocpd -- multi_gpu_app
|
||||
|
||||
# Aggregate device utilization analysis
|
||||
rocpd query -i multi_gpu_results.db \
|
||||
--query "SELECT agent_abs_index as device_id, COUNT(*) as operations, SUM(duration) as total_time FROM kernels GROUP BY device_id"
|
||||
|
||||
# Cross-device performance comparison
|
||||
rocpd summary -i multi_gpu_results.db --domain-summary
|
||||
|
||||
**Temporal Aggregation**
|
||||
|
||||
**Time-Series Analysis:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Collect profiles over time for performance monitoring
|
||||
for hour in {1..24}; do
|
||||
rocprofv3 --hip-trace --output-format rocpd -o "profile_hour_$hour.db" -- application
|
||||
done
|
||||
|
||||
# Analyze performance trends over time
|
||||
rocpd query -i profile_hour_*.db \
|
||||
--query "SELECT AVG(duration) as avg_kernel_time, COUNT(*) as kernel_count FROM kernels" \
|
||||
--format csv -o performance_trends
|
||||
|
||||
**Comparative Analysis:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Compare baseline vs optimized performance
|
||||
rocpd query -i baseline.db optimized.db \
|
||||
--query "SELECT name AS kernel, AVG(duration) as avg_time FROM kernels GROUP BY name ORDER BY avg_time DESC"
|
||||
|
||||
# Generate comparative summary reports
|
||||
rocpd summary -i baseline.db optimized.db --format html -o comparison_report
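
If both files are merged into one virtual table, an aggregate grouped only by kernel name averages the two runs together instead of contrasting them. One way to get a side-by-side comparison is to run the per-kernel average query on each file separately and combine the results afterwards. The sketch below assumes the two result sets are already available as dictionaries; ``speedups`` is a hypothetical helper.

.. code-block:: python

   def speedups(baseline, optimized):
       """Per-kernel speedup (baseline avg / optimized avg) for kernels in both runs."""
       common = sorted(baseline.keys() & optimized.keys())
       return {
           name: baseline[name] / optimized[name]
           for name in common
           if optimized[name] > 0
       }

   # Hypothetical average durations in nanoseconds from two separate queries.
   baseline_avg = {"gemm": 200.0, "copy": 10.0}
   optimized_avg = {"gemm": 100.0, "scan": 5.0}
   comparison = speedups(baseline_avg, optimized_avg)
   print(comparison)

Kernels that appear in only one run are left out, since no ratio can be formed for them.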
|
||||
|
||||
**Data Aggregation Benefits**
|
||||
|
||||
- **Unified Analysis:** Combine data from different execution contexts, hardware configurations, and time periods
|
||||
- **Scalability Insights:** Analyze performance scaling across multiple nodes, ranks, or GPU devices
|
||||
- **Trend Analysis:** Track performance evolution over time or across different software versions
|
||||
- **Load Balancing:** Identify performance bottlenecks and load distribution issues in distributed applications
|
||||
- **Cross-Platform Comparison:** Compare performance across different hardware platforms using unified database schema
|
||||
|
||||
The aggregation capabilities of ``rocpd`` format enable sophisticated analysis workflows that provide deeper insights into application performance characteristics across diverse computing environments.
|
||||
|
||||
Tool Integration and Workflow Examples
|
||||
+++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
**Multi-Format Analysis Pipeline:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Generate all analysis formats for comprehensive review
|
||||
rocpd2csv -i profile.db -o analysis_data
|
||||
rocpd2summary -i profile.db --format html -o performance_report
|
||||
rocpd2pftrace -i profile.db -o interactive_trace
|
||||
|
||||
**Automated Performance Monitoring:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
#!/bin/bash
|
||||
# Performance analysis automation script
|
||||
|
||||
PROFILE_DB="$1"
|
||||
OUTPUT_DIR="analysis_$(date +%Y%m%d_%H%M%S)"
|
||||
|
||||
mkdir -p "$OUTPUT_DIR"
|
||||
|
||||
# Generate CSV data for automated analysis
|
||||
rocpd2csv -i "$PROFILE_DB" -d "$OUTPUT_DIR" -o raw_data
|
||||
|
||||
# Create summary reports
|
||||
rocpd2summary -i "$PROFILE_DB" --format csv html \
|
||||
-d "$OUTPUT_DIR" -o performance_summary
|
||||
|
||||
# Generate interactive trace for detailed investigation
|
||||
rocpd2pftrace -i "$PROFILE_DB" -d "$OUTPUT_DIR" -o interactive_trace
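
The same pipeline can be driven from Python, which is convenient when the commands need to be logged or checked before they run. The sketch below only builds the argument lists and uses just the ``-i``, ``-d``, ``-o``, and ``--format`` options shown above. ``build_pipeline`` is a hypothetical helper; pass each list to ``subprocess.run`` to execute it on a system where the tools are installed.

.. code-block:: python

   import shlex

   def build_pipeline(profile_db, output_dir):
       """Return the three conversion commands from the script above as argv lists."""
       return [
           ["rocpd2csv", "-i", profile_db, "-d", output_dir, "-o", "raw_data"],
           ["rocpd2summary", "-i", profile_db, "--format", "csv", "html",
            "-d", output_dir, "-o", "performance_summary"],
           ["rocpd2pftrace", "-i", profile_db, "-d", output_dir,
            "-o", "interactive_trace"],
       ]

   commands = build_pipeline("profile.db", "analysis_out")
   for argv in commands:
       # shlex.join quotes arguments so each line can be pasted into a shell.
       print(shlex.join(argv))

Keeping the commands as lists avoids shell quoting problems when a database path contains spaces.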
|
||||
|
||||
|
||||
Query
|
||||
+++++
|
||||
|
||||
The ``rocpd query`` command provides powerful SQL-based analysis capabilities for exploring and extracting data from rocpd databases. This tool enables custom analysis workflows, automated reporting, and integration with external analysis pipelines.
|
||||
|
||||
rocpd query - SQL Query Engine
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Purpose:** Execute custom SQL queries against rocpd databases with support for multiple output formats, automated reporting, and email delivery.
|
||||
|
||||
**Location:** ``/opt/rocm/bin/rocpd query``
|
||||
|
||||
**Syntax:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocpd query -i INPUT [INPUT ...] --query "SQL_STATEMENT" [OPTIONS]
|
||||
|
||||
**Key Features:**
|
||||
|
||||
- **Standard SQL Support:** Full SQLite3 SQL syntax including JOINs, aggregate functions, and complex WHERE clauses
|
||||
- **Multi-Database Aggregation:** Query across multiple database files as unified virtual database
|
||||
- **Multiple Output Formats:** Console, CSV, HTML, JSON, Markdown, PDF, and interactive dashboards
|
||||
- **Script Execution:** Execute complex SQL scripts with view definitions and custom functions
|
||||
- **Automated Reporting:** Email delivery with SMTP configuration and attachment management
|
||||
- **Time Window Integration:** Apply temporal filtering before query execution
|
||||
|
||||
Database Schema and Views
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
rocpd databases provide comprehensive views for analysis. In general, build queries against the ``data_views`` listed below:
|
||||
|
||||
**Core Data Views:**
|
||||
|
||||
.. code-block:: sql
|
||||
|
||||
-- System and hardware information
|
||||
SELECT * FROM rocpd_info_agents;
|
||||
SELECT * FROM rocpd_info_node;
|
||||
|
||||
-- Kernel execution data
|
||||
SELECT * FROM kernels;
|
||||
SELECT * FROM top_kernels;
|
||||
|
||||
-- API trace information
|
||||
SELECT * FROM regions_and_samples WHERE category LIKE 'HIP_%';
|
||||
SELECT * FROM regions_and_samples WHERE category LIKE 'RCCL_%';
|
||||
|
||||
-- Performance counters
|
||||
SELECT * FROM counters_collection;
|
||||
|
||||
-- Memory operations
|
||||
SELECT * FROM memory_copies;
|
||||
SELECT * FROM memory_allocations;
|
||||
|
||||
-- Process and thread information
|
||||
SELECT * FROM processes;
|
||||
SELECT * FROM threads;
|
||||
|
||||
-- Marker and region data
|
||||
SELECT * FROM regions;
|
||||
SELECT * FROM regions_and_samples WHERE category LIKE 'MARKERS_%';
|
||||
|
||||
**Summary and Analysis Views:**
|
||||
|
||||
.. code-block:: sql
|
||||
|
||||
-- Top performing kernels by execution time
|
||||
SELECT * FROM top_kernels LIMIT 10;
|
||||
|
||||
-- Top Analysis
|
||||
SELECT * FROM top;
|
||||
|
||||
-- Busy Analysis
|
||||
SELECT * FROM busy;
|
||||
|
||||
Basic Query Examples
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Simple Data Exploration:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# List available GPU agents
|
||||
rocpd query -i profile.db --query "SELECT * FROM rocpd_info_agents"
|
||||
|
||||
# Show top 10 longest-running kernels
|
||||
rocpd query -i profile.db --query "SELECT name, duration FROM kernels ORDER BY duration DESC LIMIT 10"
|
||||
|
||||
# Count total number of kernel dispatches
|
||||
rocpd query -i profile.db --query "SELECT COUNT(*) as total_kernels FROM kernels"
|
||||
|
||||
**Multi-Database Aggregation:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Combine data from multiple profiling sessions
|
||||
rocpd query -i session1.db session2.db session3.db \
|
||||
--query "SELECT pid, COUNT(*) as kernel_count FROM kernels GROUP BY pid"
|
||||
|
||||
# Cross-session performance comparison
|
||||
rocpd query -i baseline.db optimized.db \
|
||||
--query "SELECT name as kernel_name, AVG(duration) as avg_duration FROM kernels GROUP BY kernel_name"
|
||||
|
||||
**Advanced Analytics:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Kernel performance analysis with statistics
|
||||
rocpd query -i profile.db --query "
|
||||
SELECT
|
||||
name as kernel_name,
|
||||
COUNT(*) as dispatch_count,
|
||||
MIN(duration) as min_duration,
|
||||
AVG(duration) as avg_duration,
|
||||
MAX(duration) as max_duration,
|
||||
SUM(duration) as total_duration
|
||||
FROM kernels
|
||||
GROUP BY kernel_name
|
||||
ORDER BY total_duration DESC"
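
To show the shape of the result, the sketch below runs the same statement with Python's standard ``sqlite3`` module against a toy in-memory table. The ``kernels`` table here is a simplified stand-in with made-up rows; a real rocpd database exposes ``kernels`` as a view with many more columns.

.. code-block:: python

   import sqlite3

   QUERY = """
   SELECT
       name AS kernel_name,
       COUNT(*) AS dispatch_count,
       MIN(duration) AS min_duration,
       AVG(duration) AS avg_duration,
       MAX(duration) AS max_duration,
       SUM(duration) AS total_duration
   FROM kernels
   GROUP BY kernel_name
   ORDER BY total_duration DESC
   """

   con = sqlite3.connect(":memory:")
   # Simplified stand-in for the kernels view: name and duration only.
   con.execute("CREATE TABLE kernels (name TEXT, duration INTEGER)")
   con.executemany(
       "INSERT INTO kernels VALUES (?, ?)",
       [("gemm", 100), ("gemm", 300), ("copy", 50)],
   )
   rows = con.execute(QUERY).fetchall()
   con.close()
   for row in rows:
       print(row)

Each output row holds one kernel name followed by its dispatch count and its minimum, average, maximum, and total duration.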
|
||||
|
||||
**Memory Transfer Analysis:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Memory copy analysis by direction
|
||||
rocpd query -i profile.db --query "
|
||||
SELECT
|
||||
name as copy_name,
|
||||
src_agent_type,
|
||||
src_agent_abs_index,
|
||||
dst_agent_type,
|
||||
dst_agent_abs_index,
|
||||
COUNT(*) as transfer_count,
|
||||
SUM(size) as total_bytes,
|
||||
SUM(duration) as total_duration
|
||||
FROM memory_copies
|
||||
GROUP BY name, src_agent_type, src_agent_abs_index, dst_agent_type, dst_agent_abs_index
|
||||
ORDER BY total_bytes DESC"
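
The ``total_bytes`` and ``total_duration`` columns can be combined into an effective bandwidth. The sketch below assumes durations are reported in nanoseconds, in which case bytes per nanosecond equals gigabytes per second. ``bandwidth_gb_per_s`` is a hypothetical helper, not part of rocpd.

.. code-block:: python

   def bandwidth_gb_per_s(total_bytes, total_duration_ns):
       """Effective bandwidth in GB/s (1 byte/ns equals 10^9 bytes/s)."""
       if total_duration_ns <= 0:
           raise ValueError("duration must be positive")
       return total_bytes / total_duration_ns

   # Hypothetical row: 4 GB moved in 250 ms of accumulated copy time.
   gbps = bandwidth_gb_per_s(total_bytes=4_000_000_000, total_duration_ns=250_000_000)
   print(gbps)

Because the durations of individual copies are summed, overlapping transfers make this figure a lower bound on the bandwidth actually achieved.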
|
||||
|
||||
Output Format Options
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Console Output (Default):**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Display results in terminal
|
||||
rocpd query -i profile.db --query "SELECT * FROM top_kernels LIMIT 5"
|
||||
|
||||
**CSV Export for Data Analysis:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Export to CSV file
|
||||
rocpd query -i profile.db --query "SELECT * FROM kernels" --format csv -o kernel_analysis
|
||||
|
||||
# Specify custom output directory
|
||||
rocpd query -i profile.db --query "SELECT * FROM kernels" --format csv -d ~/analysis/ -o kernel_data
|
||||
|
||||
**HTML Reports:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Generate HTML table
|
||||
rocpd query -i profile.db --query "SELECT * FROM top_kernels" --format html -o performance_report
|
||||
|
||||
**Interactive Dashboard:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Create interactive HTML dashboard
|
||||
rocpd query -i profile.db --query "SELECT * FROM device_utilization" --format dashboard -o utilization_dashboard
|
||||
|
||||
# Use custom dashboard template
|
||||
rocpd query -i profile.db --query "SELECT * FROM kernels" --format dashboard \
|
||||
--template-path ~/templates/custom_dashboard.html -o custom_report
|
||||
|
||||
**JSON for Programmatic Integration:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Export structured JSON data
|
||||
rocpd query -i profile.db --query "SELECT * FROM counters_collection" --format json -o counter_data
|
||||
|
||||
**PDF Reports:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Generate PDF report with monospace formatting
|
||||
rocpd query -i profile.db --query "SELECT name, duration FROM top_kernels" --format pdf -o kernel_report
|
||||
|
||||
Script-Based Analysis
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Execute complex SQL scripts with view definitions and custom analysis logic:
|
||||
|
||||
**SQL Script Example (analysis.sql):**
|
||||
|
||||
.. code-block:: sql
|
||||
|
||||
-- Create temporary views for complex analysis
|
||||
CREATE TEMP VIEW kernel_stats AS
|
||||
SELECT
|
||||
name as kernel_name,
|
||||
COUNT(*) as dispatch_count,
|
||||
AVG(duration) as avg_duration,
|
||||
STDDEV(duration) as duration_stddev
|
||||
FROM kernels
|
||||
GROUP BY kernel_name;
|
||||
|
||||
CREATE TEMP VIEW performance_outliers AS
|
||||
SELECT k.*, ks.avg_duration, ks.duration_stddev
|
||||
FROM kernels k
|
||||
JOIN kernel_stats ks ON k.name = ks.kernel_name
|
||||
WHERE ABS(k.duration - ks.avg_duration) > 2 * ks.duration_stddev;
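
Stock SQLite has no built-in ``STDDEV`` aggregate, so whether the script above runs unchanged depends on the functions the tool registers. If you run the same SQL against the database directly, for example through Python's standard ``sqlite3`` module, one option is to register the aggregate yourself. The sketch below is an assumption-laden illustration: the ``StdDev`` class is hypothetical, it computes the population standard deviation, and the ``kernels`` table is a toy stand-in.

.. code-block:: python

   import math
   import sqlite3

   class StdDev:
       """Population standard deviation as a SQLite aggregate (Welford's method)."""

       def __init__(self):
           self.n = 0
           self.mean = 0.0
           self.m2 = 0.0

       def step(self, value):
           if value is None:
               return  # ignore NULLs, like the built-in aggregates do
           self.n += 1
           delta = value - self.mean
           self.mean += delta / self.n
           self.m2 += delta * (value - self.mean)

       def finalize(self):
           if self.n == 0:
               return None
           return math.sqrt(self.m2 / self.n)

   con = sqlite3.connect(":memory:")
   con.create_aggregate("STDDEV", 1, StdDev)
   con.execute("CREATE TABLE kernels (name TEXT, duration REAL)")
   con.executemany(
       "INSERT INTO kernels VALUES (?, ?)",
       [("gemm", d) for d in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)],
   )
   stddev = con.execute("SELECT STDDEV(duration) FROM kernels").fetchone()[0]
   con.close()
   print(stddev)

Once the aggregate is registered on a connection, the ``kernel_stats`` and ``performance_outliers`` views can be created on that same connection.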
|
||||
|
||||
**Execute Script with Query:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Run script then execute query
|
||||
rocpd query -i profile.db --script analysis.sql \
|
||||
--query "SELECT * FROM performance_outliers" --format html -o outlier_analysis
|
||||
|
||||
Time Window Integration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Apply temporal filtering before query execution:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Query only middle 50% of execution timeline
|
||||
rocpd query -i profile.db --start 25% --end 75% \
|
||||
--query "SELECT COUNT(*) as kernel_count FROM kernels"
|
||||
|
||||
# Use marker-based time windows
|
||||
rocpd query -i profile.db --start-marker "computation_begin" --end-marker "computation_end" \
|
||||
--query "SELECT * FROM kernels ORDER BY start_time"
|
||||
|
||||
# Absolute timestamp filtering
|
||||
rocpd query -i profile.db --start 1000000000 --end 2000000000 \
|
||||
--query "SELECT * FROM kernels WHERE start_time BETWEEN 1000000000 AND 2000000000"
|
||||
|
||||
Automated Email Reporting
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Basic Email Delivery:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Send CSV report via email
|
||||
rocpd query -i profile.db --query "SELECT * FROM top_kernels" --format csv \
|
||||
--email-to analyst@company.com --email-from profiler@company.com \
|
||||
--email-subject "Weekly Performance Report"
|
||||
|
||||
**Advanced Email Configuration:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Multiple recipients with SMTP authentication
|
||||
rocpd query -i profile.db --query "SELECT * FROM device_utilization" --format html \
|
||||
--email-to "team@company.com,manager@company.com" \
|
||||
--email-from profiler@company.com \
|
||||
--email-subject "GPU Utilization Analysis" \
|
||||
--smtp-server smtp.company.com --smtp-port 587 \
|
||||
--smtp-user profiler@company.com --smtp-password "$(cat ~/.smtp_pass)" \
|
||||
--inline-preview --zip-attachments
|
||||
|
||||
**Dashboard Email Reports:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Send interactive dashboard via email
|
||||
rocpd query -i profile.db --query "SELECT * FROM kernels" --format dashboard \
|
||||
--template-path ~/templates/executive_summary.html \
|
||||
--email-to executives@company.com --email-from profiler@company.com \
|
||||
--email-subject "Executive Performance Dashboard" \
|
||||
--inline-preview
|
||||
|
||||
Integration Workflows
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
**Automated Analysis Pipeline:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
#!/bin/bash
|
||||
# Automated reporting script
|
||||
|
||||
DB_FILE="$1"
|
||||
REPORT_DATE=$(date +%Y-%m-%d)
|
||||
|
||||
# Generate multiple analysis reports
|
||||
rocpd query -i "$DB_FILE" --query "SELECT * FROM top_kernels LIMIT 20" \
|
||||
--format html -o "top_kernels_$REPORT_DATE"
|
||||
|
||||
rocpd query -i "$DB_FILE" --query "SELECT * FROM memory_copy_summary" \
|
||||
--format csv -o "memory_analysis_$REPORT_DATE"
|
||||
|
||||
rocpd query -i "$DB_FILE" --query "SELECT * FROM device_utilization" \
|
||||
--format dashboard -o "utilization_dashboard_$REPORT_DATE" \
|
||||
--email-to team@company.com --email-from automation@company.com
|
||||
|
||||
**Performance Regression Detection:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Compare current performance against baseline
|
||||
rocpd query -i baseline.db current.db --script performance_comparison.sql \
|
||||
--query "SELECT * FROM performance_regression_analysis" \
|
||||
--format html -o regression_report \
|
||||
--email-to devteam@company.com --email-from ci@company.com \
|
||||
--email-subject "Performance Regression Analysis"
|
||||
|
||||
**Custom Analysis Functions:**
|
||||
|
||||
rocpd databases support custom SQL functions for advanced analysis:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Use built-in rocpd functions
|
||||
rocpd query -i profile.db --query "
|
||||
SELECT
|
||||
name,
|
||||
rocpd_get_string(name_id, 0, nid, pid) as full_kernel_name,
|
||||
duration
|
||||
FROM kernels
|
||||
WHERE rocpd_get_string(name_id, 0, nid, pid) LIKE '%gemm%'"
|
||||
|
||||
rocpd query Command-Line Reference
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
usage: rocpd query [-h] -i INPUT [INPUT ...] --query QUERY [--script SCRIPT]
|
||||
[--format {console,csv,html,json,md,pdf,dashboard,clipboard}]
|
||||
[-o OUTPUT_FILE] [-d OUTPUT_PATH]
|
||||
[--email-to EMAIL_TO] [--email-from EMAIL_FROM]
|
||||
[--email-subject EMAIL_SUBJECT] [--smtp-server SMTP_SERVER]
|
||||
[--smtp-port SMTP_PORT] [--smtp-user SMTP_USER]
|
||||
[--smtp-password SMTP_PASSWORD] [--zip-attachments]
|
||||
[--inline-preview] [--template-path TEMPLATE_PATH]
|
||||
[--start START | --start-marker START_MARKER]
|
||||
[--end END | --end-marker END_MARKER]
|
||||
|
||||
**Required Arguments:**
|
||||
|
||||
- ``-i INPUT [INPUT ...]``, ``--input INPUT [INPUT ...]``
|
||||
Input database file paths. Multiple databases are merged into a unified view.
|
||||
|
||||
- ``--query QUERY``
|
||||
SQL SELECT statement to execute. Enclose the query in quotes so that the shell passes it as a single argument.
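One way to picture what a unified view over several input databases can look like is SQLite's ``ATTACH DATABASE`` combined with a ``UNION ALL`` view. The Python sketch below is only an illustration under that assumption: the ``kernels`` table with ``name`` and ``duration`` columns, the ``all_kernels`` view name, and the ``db0``, ``db1`` aliases are hypothetical, and this is not a statement of how ``rocpd`` merges its inputs.

```python
import sqlite3


def open_unified_view(db_paths):
    """Attach each database and expose their kernels tables through one temp view."""
    conn = sqlite3.connect(":memory:")
    selects = []
    for index, path in enumerate(db_paths):
        alias = f"db{index}"  # generated locally, so safe to interpolate
        conn.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
        selects.append(
            f"SELECT '{alias}' AS source, name, duration FROM {alias}.kernels"
        )
    # A TEMP view may reference tables in any attached database.
    conn.execute("CREATE TEMP VIEW all_kernels AS " + " UNION ALL ".join(selects))
    return conn
```

A query such as ``SELECT source, COUNT(*) FROM all_kernels GROUP BY source`` then reports per-file row counts through a single connection.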
|
||||
|
||||
**Query Options:**
|
||||
|
||||
- ``--script SCRIPT``
|
||||
SQL script file to execute before running the main query. Useful for creating views and functions.
|
||||
|
||||
- ``--format {console,csv,html,json,md,pdf,dashboard,clipboard}``
|
||||
Output format (default: console). Dashboard format creates interactive HTML reports.
|
||||
|
||||
**Output Configuration:**
|
||||
|
||||
- ``-o OUTPUT_FILE``, ``--output-file OUTPUT_FILE``
|
||||
Base filename for exported files.
|
||||
|
||||
- ``-d OUTPUT_PATH``, ``--output-path OUTPUT_PATH``
|
||||
Output directory path.
|
||||
|
||||
- ``--template-path TEMPLATE_PATH``
|
||||
Jinja2 template file for dashboard format customization.
|
||||
|
||||
**Email Reporting:**
|
||||
|
||||
- ``--email-to EMAIL_TO``
|
||||
Recipient email addresses (comma-separated for multiple recipients).
|
||||
|
||||
- ``--email-from EMAIL_FROM``
|
||||
Sender email address (required when using email delivery).
|
||||
|
||||
- ``--email-subject EMAIL_SUBJECT``
|
||||
Email subject line.
|
||||
|
||||
- ``--smtp-server SMTP_SERVER``, ``--smtp-port SMTP_PORT``
|
||||
SMTP server configuration (default: localhost:25).
|
||||
|
||||
- ``--smtp-user SMTP_USER``, ``--smtp-password SMTP_PASSWORD``
|
||||
SMTP authentication credentials.
|
||||
|
||||
- ``--zip-attachments``
|
||||
Bundle all attachments into a single ZIP file.
|
||||
|
||||
- ``--inline-preview``
|
||||
Embed HTML reports as email body content.
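For readers who want to see roughly what the email options correspond to, the sketch below assembles a comparable message with Python's standard ``email`` and ``smtplib`` modules. It is an illustration only: the function names are hypothetical, the use of STARTTLS when credentials are supplied is an assumption, and nothing here describes the actual ``rocpd`` implementation.

```python
import smtplib
from email.message import EmailMessage


def build_report_email(sender, recipients, subject, body, attachment_name, attachment_bytes):
    """Build a message with one attachment; recipients is a comma-separated string."""
    message = EmailMessage()
    message["From"] = sender
    addresses = [addr.strip() for addr in recipients.split(",") if addr.strip()]
    message["To"] = ", ".join(addresses)
    message["Subject"] = subject
    message.set_content(body)
    # Attach the report as an opaque binary payload.
    message.add_attachment(
        attachment_bytes,
        maintype="application",
        subtype="octet-stream",
        filename=attachment_name,
    )
    return message


def send_report(message, server="localhost", port=25, user=None, password=None):
    """Deliver the message; STARTTLS plus login is assumed when a user is given."""
    with smtplib.SMTP(server, port) as smtp:
        if user:
            smtp.starttls()
            smtp.login(user, password)
        smtp.send_message(message)
```

Separating message construction from delivery keeps the first step checkable without a mail server.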
|
||||
|
||||
**Time Window Filtering:**
|
||||
|
||||
- ``--start START``, ``--end END``
|
||||
Time window boundaries, given either as a percentage of the execution timeline (for example, ``25%``) or as absolute timestamps.
|
||||
|
||||
- ``--start-marker START_MARKER``, ``--end-marker END_MARKER``
|
||||
Named marker events defining time window boundaries.
|
||||
|
||||
The ``rocpd query`` tool runs SQL against rocpd profiling databases, which supports custom analysis workflows and automated reporting.
|
||||
|
||||
**Documentation:** :ref:`using-rocpd-output-format` (SQL Schema Reference), :ref:`using-rocprofv3` (Marker Integration)
|
||||
|
||||
@@ -28,6 +28,10 @@ The documentation is structured as follows:
|
||||
|
||||
* :ref:`installing-rocprofiler-sdk`
|
||||
|
||||
.. grid-item-card:: Quick Reference
|
||||
|
||||
* :ref:`quick-guide`
|
||||
|
||||
.. grid-item-card:: How to
|
||||
|
||||
* :doc:`Samples <how-to/samples>`
|
||||
|
||||
@@ -0,0 +1,323 @@
|
||||
.. meta::
|
||||
:description: Quick reference guide for rocprofv3 commands and rocprofiler-sdk tools
|
||||
:keywords: rocprofv3 quick guide, rocprofiler-sdk quick reference, rocprofv3 commands, ROCprofiler-SDK CLI, GPU profiling quick start
|
||||
|
||||
.. _quick-guide:
|
||||
|
||||
==============================================
|
||||
ROCprofiler-SDK Quick Reference Guide
|
||||
==============================================
|
||||
|
||||
This quick reference guide provides an overview of the most commonly used ``rocprofv3`` commands and links to detailed documentation sections.
|
||||
|
||||
Getting Started
|
||||
===============
|
||||
|
||||
Set up the environment by sourcing the ROCprofiler-SDK setup script:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
source /opt/rocm/share/rocprofiler-sdk/setup-env.sh
|
||||
|
||||
Check the ``rocprofv3`` version and view the help text:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --version
|
||||
rocprofv3 --help
|
||||
|
||||
Essential Commands
|
||||
==================
|
||||
|
||||
Querying System Capabilities
|
||||
-----------------------------
|
||||
|
||||
List available counters and capabilities:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# List all available features
|
||||
rocprofv3 --list-avail
|
||||
|
||||
# Using the dedicated tool for detailed queries
|
||||
rocprofv3-avail list
|
||||
rocprofv3-avail info
|
||||
|
||||
**Documentation:** :ref:`using-rocprofv3-avail`
|
||||
|
||||
Basic Tracing
|
||||
-------------
|
||||
|
||||
Application tracing (HIP API + kernel dispatches + memory operations):
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Runtime tracing (recommended for most use cases)
|
||||
rocprofv3 --runtime-trace -- ./your_app
|
||||
|
||||
# System-level tracing (includes HSA API)
|
||||
rocprofv3 --sys-trace -- ./your_app
|
||||
|
||||
**Documentation:** :ref:`using-rocprofv3`
|
||||
|
||||
Granular Tracing Options
|
||||
------------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Trace HIP API calls, kernel dispatches, and memory copies
|
||||
rocprofv3 --hip-trace --kernel-trace --memory-copy-trace -- ./your_app
|
||||
|
||||
|
||||
**Documentation:** :ref:`using-rocprofv3` (Basic tracing section)
|
||||
|
||||
Performance Counter Collection
|
||||
------------------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# List available counters
|
||||
rocprofv3-avail list --pmc
|
||||
|
||||
# Check if counters can be collected together
|
||||
rocprofv3-avail pmc-check SQ_WAVES SQ_INSTS_VALU
|
||||
|
||||
# Collect specific counters
|
||||
rocprofv3 --pmc SQ_WAVES,SQ_INSTS_VALU -- ./your_app
|
||||
|
||||
**Documentation:** :ref:`using-rocprofv3` (Counter collection section)
|
||||
|
||||
Advanced Profiling Features
|
||||
============================
|
||||
|
||||
PC Sampling (Beta)
|
||||
------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Check PC sampling support
|
||||
rocprofv3-avail list --pc-sampling
|
||||
|
||||
# Enable PC sampling
|
||||
rocprofv3 --pc-sampling-beta-enabled --pc-sampling-interval 1000 -- ./your_app
|
||||
|
||||
**Documentation:** :ref:`using-pc-sampling`
|
||||
|
||||
Thread Trace
|
||||
------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Collect thread trace data
|
||||
rocprofv3 --att --output-format csv -- ./your_app
|
||||
|
||||
**Documentation:** :ref:`using-thread-trace`
|
||||
|
||||
Process Attachment
|
||||
------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Attach to a running process by PID
|
||||
rocprofv3 --pid 12345 --runtime-trace -d ./results
|
||||
# or
|
||||
|
||||
# Attach for a specific duration (1000 ms, that is, 1 second)
|
||||
rocprofv3 --pid 12345 --runtime-trace --attach-duration-msec 1000
|
||||
|
||||
**Documentation:** :ref:`using-rocprofv3-process-attachment`
|
||||
|
||||
Output Formats and Post-processing
|
||||
===================================
|
||||
|
||||
``rocprofv3`` supports multiple output formats for different analysis needs. The default format is ``rocpd``, which stores data in a structured SQLite3 database.
|
||||
|
||||
Working with rocpd Database Format
|
||||
-----------------------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Generate rocpd database (default format)
|
||||
rocprofv3 --runtime-trace -- ./your_app
|
||||
# Creates: hostname/pid_results.db
|
||||
|
||||
# Query the database directly with SQL
|
||||
sqlite3 hostname/12345_results.db "SELECT * FROM regions;"
|
||||
|
||||
# Convert rocpd database to other formats
|
||||
rocpd convert -i *.db -f csv pftrace otf2 --start 20% --end 80%
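Because the ``rocpd`` output is a SQLite3 database, it can also be inspected from Python's standard ``sqlite3`` module. The sketch below lists the tables and views in a database through SQLite's ``sqlite_master`` catalog; the helper name is hypothetical, and no particular rocpd table names are assumed.

```python
import sqlite3


def list_tables_and_views(conn):
    """Return (type, name) pairs for every table and view visible on the connection."""
    return conn.execute(
        "SELECT type, name FROM sqlite_master "
        "WHERE type IN ('table', 'view') "
        "ORDER BY type, name"
    ).fetchall()
```

For example, pass ``sqlite3.connect("hostname/12345_results.db")`` to see which tables and views are available before writing queries against them.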
|
||||
|
||||
Collecting and Converting to Other Formats
|
||||
-------------------------------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Multiple output formats in one run
|
||||
rocprofv3 --runtime-trace --output-format csv json pftrace otf2 -- ./your_app
|
||||
|
||||
|
||||
**Documentation:** :ref:`using-rocpd-output-format`
|
||||
|
||||
Summary and Statistics
|
||||
----------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Overall summary statistics per domain grouped by kernel and memory operations
|
||||
rocprofv3 --runtime-trace --summary-per-domain --summary-groups "KERNEL_DISPATCH|MEMORY_COPY" -- ./your_app
|
||||
|
||||
**Documentation:** :ref:`using-rocprofv3` (Post-processing tracing section)
|
||||
|
||||
Filtering and Selection
|
||||
=======================
|
||||
|
||||
Kernel Filtering
|
||||
----------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Trace kernels matching "matmul.*" but not ".*copy.*", limited to iterations 10-20
|
||||
rocprofv3 --kernel-trace --kernel-iteration-range 10-20 --kernel-include-regex "matmul.*" --kernel-exclude-regex ".*copy.*" -- ./your_app
|
||||
|
||||
**Documentation:** :ref:`using-rocprofv3` (Filtering section)
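The combined effect of an include pattern and an exclude pattern can be sketched in a few lines of Python. This is a simplified model: it assumes a kernel is kept when its name matches the include pattern and does not match the exclude pattern, and it uses ``re.search`` semantics. The exact matching rules used by ``rocprofv3`` are not stated here, and the function name is hypothetical.

```python
import re


def select_kernels(names, include_regex=None, exclude_regex=None):
    """Keep names that match include_regex (if given) and do not match exclude_regex."""
    include = re.compile(include_regex) if include_regex else None
    exclude = re.compile(exclude_regex) if exclude_regex else None
    selected = []
    for name in names:
        if include and not include.search(name):
            continue  # not requested
        if exclude and exclude.search(name):
            continue  # explicitly filtered out
        selected.append(name)
    return selected
```

With the patterns from the command above, ``matmul_f32`` would be kept, while ``matmul_copy`` and ``vector_add`` would be dropped.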
|
||||
|
||||
Time-based Collection
|
||||
---------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Collect for specific time periods (start_delay:collection_time:repeat)
|
||||
rocprofv3 --runtime-trace --collection-period 500:2000:0 --collection-period-unit msec -- ./your_app
|
||||
|
||||
**Documentation:** :ref:`using-rocprofv3` (Filtering section)
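The ``start_delay:collection_time:repeat`` triple can be unpacked as follows. This Python sketch only splits the string into named integer fields; the class and function names are hypothetical, and the sketch deliberately does not interpret what a particular ``repeat`` value means to ``rocprofv3``.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionPeriod:
    start_delay: int      # time to wait before collecting
    collection_time: int  # length of each collection window
    repeat: int           # repeat field, passed through uninterpreted


def parse_collection_period(spec):
    """Parse a "start_delay:collection_time:repeat" string into named fields."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three ':'-separated fields, got {spec!r}")
    start_delay, collection_time, repeat = (int(part) for part in parts)
    return CollectionPeriod(start_delay, collection_time, repeat)
```

For the command above, ``parse_collection_period("500:2000:0")`` gives a start delay of 500 and a collection time of 2000, both in the unit selected by ``--collection-period-unit``.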
|
||||
|
||||
Kernel Naming and Display
|
||||
=========================
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Keep mangled kernel names
|
||||
rocprofv3 --kernel-trace --mangled-kernels -- ./your_app
|
||||
|
||||
# Truncate kernel names for readability
|
||||
rocprofv3 --kernel-trace --truncate-kernels -- ./your_app
|
||||
|
||||
# Use ROCTx regions to rename kernels
|
||||
rocprofv3 --kernel-trace --kernel-rename -- ./your_app
|
||||
|
||||
**Documentation:** :ref:`using-rocprofv3` (Kernel naming section)
|
||||
|
||||
Code Annotation with ROCTx
|
||||
===========================
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Trace ROCTx markers and ranges
|
||||
rocprofv3 --marker-trace -- ./your_app
|
||||
|
||||
**Documentation:** :ref:`using-rocprofiler-sdk-roctx`
|
||||
|
||||
Parallel and Distributed Applications
|
||||
======================================
|
||||
|
||||
MPI Applications
|
||||
----------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Profile MPI applications
|
||||
mpirun -n 4 rocprofv3 --runtime-trace --output-format csv -- ./your_mpi_app
|
||||
|
||||
**Documentation:** :ref:`using-rocprofv3-with-mpi`
|
||||
|
||||
OpenMP Applications
|
||||
-------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Profile OpenMP applications
|
||||
rocprofv3 --runtime-trace --output-format csv -- ./your_openmp_app
|
||||
|
||||
**Documentation:** :ref:`using-rocprofv3-with-openmp`
|
||||
|
||||
Output Management
|
||||
=================
|
||||
|
||||
File Organization
|
||||
-----------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Specify output directory
|
||||
rocprofv3 --runtime-trace --output-directory ./results --output-file my_trace -- ./your_app
|
||||
|
||||
# Generate configuration file
|
||||
rocprofv3 --runtime-trace --output-config -- ./your_app
|
||||
|
||||
**Documentation:** :ref:`using-rocprofv3` (I/O options section)
|
||||
|
||||
Common Use Cases
|
||||
================
|
||||
|
||||
Basic Performance Analysis
|
||||
--------------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Quick performance overview
|
||||
rocprofv3 --runtime-trace --summary -- ./your_app
|
||||
|
||||
**Use case:** Get a high-level view of application performance
|
||||
|
||||
Detailed Kernel Analysis
|
||||
-------------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Detailed kernel profiling with counters
|
||||
rocprofv3 --kernel-trace --pmc SQ_WAVES,SQ_INSTS_VALU,TCP_PERF_SEL_TOTAL_CACHE_ACCESSES -- ./your_app
|
||||
|
||||
**Use case:** Analyze specific kernel performance bottlenecks
|
||||
|
||||
Memory Transfer Analysis
|
||||
------------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Focus on memory operations
|
||||
rocprofv3 --memory-copy-trace --memory-allocation-trace -- ./your_app
|
||||
|
||||
**Use case:** Optimize data movement between CPU and GPU
|
||||
|
||||
Timeline Visualization
|
||||
----------------------
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Generate timeline for visualization tools
|
||||
rocprofv3 --runtime-trace -- ./your_app
|
||||
|
||||
# Convert to Perfetto format
|
||||
rocpd2pftrace -i hostname/pid_results.db -o perfetto_trace
|
||||
|
||||
**Use case:** Visualize execution timeline in Perfetto or similar tools
|
||||
|
||||
Installation and Setup
|
||||
======================
|
||||
|
||||
**Installation Documentation:** :ref:`installing-rocprofiler-sdk`
|
||||
|
||||
**API Reference:** :doc:`Tool library <api-reference/tool_library>`
|
||||
|
||||
**Samples and Examples:** :doc:`Samples <how-to/samples>`
|
||||
|
||||
Troubleshooting Quick Tips
|
||||
==========================
|
||||
|
||||
1. **Permission Issues:** Ensure your user can access ``/dev/kfd`` and the GPU device nodes (on many systems this means membership in the ``render`` and ``video`` groups)
|
||||
2. **Counter Collection Fails:** Use ``rocprofv3-avail pmc-check`` to verify counter compatibility
|
||||
3. **Large Output Files:** Use ``--minimum-output-data`` to set file size thresholds
|
||||
4. **Signal Handling:** Use ``--disable-signal-handlers`` if the profiler's signal handlers conflict with the application's own handlers
|
||||
5. **ROCm Path Issues:** Use ``--rocm-root`` to specify custom ROCm installation paths
|
||||
|
||||
For comprehensive documentation on each feature, refer to the detailed sections linked throughout this guide.
|
||||