diff --git a/projects/rocprofiler-sdk/source/docs/_toc.yml.in b/projects/rocprofiler-sdk/source/docs/_toc.yml.in index 1019b61f6a..69f6b086a5 100644 --- a/projects/rocprofiler-sdk/source/docs/_toc.yml.in +++ b/projects/rocprofiler-sdk/source/docs/_toc.yml.in @@ -9,6 +9,10 @@ subtrees: - caption: Install entries: - file: install/installation + - caption: Quick Reference + entries: + - file: quick_guide + title: ROCprofiler-SDK Quick Reference Guide - caption: How to entries: - file: how-to/samples diff --git a/projects/rocprofiler-sdk/source/docs/how-to/using-rocpd-output-format.rst b/projects/rocprofiler-sdk/source/docs/how-to/using-rocpd-output-format.rst index b0c98e153b..2fe591b0a5 100644 --- a/projects/rocprofiler-sdk/source/docs/how-to/using-rocpd-output-format.rst +++ b/projects/rocprofiler-sdk/source/docs/how-to/using-rocpd-output-format.rst @@ -49,6 +49,22 @@ The ``rocpd`` database format supports conversion to alternative output formats The ``rocpd`` conversion utility is distributed as part of the ROCm installation package, located in ``/opt/rocm-/bin``, and provides both executable and Python module interfaces for programmatic integration. +**Available rocpd Commands** + +The ``rocpd`` tool provides three main subcommands for different analysis workflows. To see all available options: + +.. code-block:: bash + + rocpd --help + +This will display the available subcommands: ``{convert, query, summary}`` + +- **convert** - Transform rocpd databases to alternative formats (CSV, OTF2, PFTrace) +- **query** - Execute SQL queries against rocpd databases with flexible output options +- **summary** - Generate statistical analysis reports equivalent to rocprofv3 summary functionality + +**Format Conversion** + Invoke the ``rocpd convert`` command with appropriate parameters to transform database files into target formats. 
**CSV Format Conversion:** @@ -143,7 +159,7 @@ Options Specifies shared memory allocation hint for Perfetto inter-process communication in kilobytes (default: 64 KB). - ``--group-by-queue`` - Organizes trace data by HIP stream abstractions rather than low-level HSA queue identifiers, providing higher-level application context for kernel and memory transfer operations. + Displays the HSA queues to which kernel and memory operations were submitted. By default, ``rocprofv3`` shows the HIP streams to which kernel and memory copy operations were submitted. **Temporal Filtering Configuration:** @@ -200,3 +216,885 @@ Convert multiple databases to all supported formats (CSV, OTF2, and Perfetto tra /opt/rocm/bin/rocpd convert -i db{3,4}.db --output-format csv otf2 pftrace +Dedicated Conversion Tools +++++++++++++++++++++++++++ + +ROCprofiler-SDK provides specialized conversion utilities for efficient format-specific operations. These tools offer streamlined interfaces for single-format conversions and are particularly useful in automated workflows and scripts. + +rocpd2csv - CSV Export Tool +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Purpose:** Converts rocpd SQLite3 databases to Comma-Separated Values (CSV) format for spreadsheet analysis and data processing workflows. + +**Location:** ``/opt/rocm/bin/rocpd2csv`` + +**Syntax:** + +.. code-block:: bash + + rocpd2csv -i INPUT [INPUT ...] [OPTIONS] + +**Key Features:** + +- **Structured Data Export:** Converts hierarchical database content to tabular CSV format +- **Multi-Database Support:** Aggregates data from multiple database files into a unified CSV output +- **Time Window Filtering:** Applies temporal filters to limit the exported data range +- **Configurable Output:** Customizes output file naming and directory structure + +**Usage Examples:** + +.. 
code-block:: bash + + # Basic CSV conversion + rocpd2csv -i profile_data.db + + # Convert multiple databases with custom output path + rocpd2csv -i db1.db db2.db db3.db -d ~/analysis_output/ -o combined_profile + + # Apply time window filtering (export the middle 70% of execution) + rocpd2csv -i large_profile.db --start 15% --end 85% + +**Common Output Files:** + +- ``out_hip_api_trace.csv`` - HIP API call trace data +- ``out_kernel_trace.csv`` - GPU kernel execution information +- ``out_counter_collection.csv`` - Hardware performance counter data + +rocpd2otf2 - Open Trace Format 2 Export +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Purpose:** Generates OTF2 (Open Trace Format 2) files for high-performance trace analysis using OTF2-compatible tools such as Vampir and TAU. + +**Location:** ``/opt/rocm/bin/rocpd2otf2`` + +**Syntax:** + +.. code-block:: bash + + rocpd2otf2 -i INPUT [INPUT ...] [OPTIONS] + +**Key Features:** + +- **HPC-Standard Format:** Produces traces compatible with scientific computing analysis tools +- **Hierarchical Timeline:** Preserves process/thread/queue relationships in the trace structure +- **Scalable Storage:** Efficient binary format for large-scale profiling data +- **Agent Indexing:** Configurable GPU agent indexing strategies (absolute, relative, type-relative) + +**Usage Examples:** + +.. 
code-block:: bash + + # Generate OTF2 trace archive + rocpd2otf2 -i gpu_workload.db + + # Multi-process trace with custom indexing + rocpd2otf2 -i mpi_rank_*.db --agent-index-value type-relative -o mpi_trace + + # Time-windowed trace export + rocpd2otf2 -i long_execution.db --start-marker "computation_begin" --end-marker "computation_end" + +**Output Structure:** + +- ``trace.otf2`` - Main trace archive containing timeline data +- ``trace.def`` - Trace definition file with metadata +- Supporting files for multi-stream trace data + +rocpd2pftrace - Perfetto Trace Export +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Purpose:** Converts rocpd databases to Perfetto protocol buffer format for interactive visualization using the Perfetto UI (ui.perfetto.dev). + +**Location:** ``/opt/rocm/bin/rocpd2pftrace`` + +**Syntax:** + +.. code-block:: bash + + rocpd2pftrace -i INPUT [INPUT ...] [OPTIONS] + +**Key Features:** + +- **Interactive Visualization:** Optimized for modern web-based trace viewers +- **Real-time Analysis:** Supports streaming analysis workflows +- **GPU Timeline Integration:** Specialized visualization of GPU execution patterns +- **Configurable Backend:** Supports both in-process and system-wide tracing backends + +**Backend Configuration Options:** + +.. code-block:: bash + + # In-process backend (default) + rocpd2pftrace -i profile.db --perfetto-backend inprocess + + # System-wide tracing backend (buffer sizes are given in KB) + rocpd2pftrace -i system_profile.db --perfetto-backend system \ + --perfetto-buffer-size 65536 --perfetto-shmem-size-hint 32768 + +**Buffer Management:** + +.. code-block:: bash + + # Ring buffer mode (overwrites old data) + rocpd2pftrace -i continuous_profile.db --perfetto-buffer-fill-policy ring_buffer + + # Discard mode (stops recording when full) + rocpd2pftrace -i bounded_profile.db --perfetto-buffer-fill-policy discard + +**Usage Examples:** + +.. 
code-block:: bash + + # Basic Perfetto trace generation + rocpd2pftrace -i application.db + + # High-throughput configuration (buffer size is given in KB) + rocpd2pftrace -i heavy_workload.db --perfetto-buffer-size 131072 \ + --perfetto-buffer-fill-policy ring_buffer + + # Multi-queue analysis + rocpd2pftrace -i multi_stream.db --group-by-queue -o queue_analysis + +**Visualization Workflow:** + +1. Generate a Perfetto trace file using ``rocpd2pftrace`` +2. Open https://ui.perfetto.dev in a web browser +3. Load the generated trace file for interactive analysis + +rocpd2summary - Statistical Analysis Tool +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Purpose:** Generates comprehensive statistical summaries and performance analysis reports from rocpd profiling data. + +**Location:** ``/opt/rocm/bin/rocpd2summary`` + +**Syntax:** + +.. code-block:: bash + + rocpd2summary -i INPUT [INPUT ...] [OPTIONS] + +**Key Features:** + +- **Multi-Format Output:** Supports console, CSV, HTML, JSON, Markdown, and PDF report generation +- **Comprehensive Statistics:** Kernel execution times, API call frequencies, memory transfer analysis +- **Domain-Specific Analysis:** Separate summaries for HIP, ROCr, Markers, and other trace domains +- **Rank-Based Analysis:** Per-process and per-rank performance breakdowns for MPI applications +- **Configurable Scope:** Selective inclusion/exclusion of analysis categories + +**Output Format Options:** + +.. code-block:: bash + + # Console output (default) + rocpd2summary -i profile.db + + # CSV format for data analysis + rocpd2summary -i profile.db --format csv -o performance_metrics + + # HTML report with visualization + rocpd2summary -i profile.db --format html -d ~/reports/ + + # Multiple output formats + rocpd2summary -i profile.db --format csv html json + +**Analysis Categories:** + +.. 
code-block:: bash + + # Include all available domains + rocpd2summary -i profile.db --region-categories HIP HSA MARKERS KERNEL + + # Focus on GPU kernel analysis only + rocpd2summary -i profile.db --region-categories KERNEL + + # Exclude markers to speed up processing + rocpd2summary -i profile.db --region-categories HIP HSA KERNEL + +**Advanced Analysis Options:** + +.. code-block:: bash + + # Include domain-specific statistics + rocpd2summary -i multi_gpu.db --domain-summary + + # Per-rank analysis for MPI applications + rocpd2summary -i mpi_profile_*.db --summary-by-rank --format html + + # Time-windowed summary analysis + rocpd2summary -i long_run.db --start 25% --end 75% --format csv + +**Report Content:** + +- **Kernel Statistics:** Execution time distributions, call frequencies, grid/block sizes +- **API Timing:** HIP/HSA function call durations and frequencies +- **Memory Analysis:** Transfer patterns, bandwidth utilization, allocation statistics +- **Device Utilization:** GPU occupancy patterns and idle time analysis +- **Synchronization Overhead:** Barrier and synchronization point analysis + +**Output Files:** + +- ``kernels_summary.{format}`` - GPU kernel execution summary +- ``hip_summary.{format}`` - HIP API call statistics +- ``hsa_summary.{format}`` - HSA runtime API analysis +- ``memory_summary.{format}`` - Memory operation statistics +- ``markers_summary.{format}`` - Marker event analysis + +Summary ++++++++ + +The ``rocpd summary`` command provides statistical analysis and performance summaries equivalent to the summary functionality available in ``rocprofv3``. This command generates comprehensive reports from rocpd database files, offering the same analytical capabilities that were previously available through ``rocprofv3 --summary`` but now operating on the structured database format.
+ +**Purpose:** Generate statistical summaries and performance reports from rocpd database files, providing equivalent functionality to rocprofv3's built-in summary capabilities. + +**Location:** ``/opt/rocm/bin/rocpd summary`` + +**Syntax:** + +.. code-block:: bash + + rocpd summary -i INPUT [INPUT ...] [OPTIONS] + +**Key Features:** + +- **Compatible Analysis:** Provides the same summary statistics and reports as ``rocprofv3 --summary`` +- **Database-Driven:** Operates on structured rocpd database files for consistent, reproducible analysis +- **Multi-Database Aggregation:** Combine and analyze data from multiple profiling sessions, ranks, or nodes in a single operation +- **Comparative Analysis:** Use ``--summary-by-rank`` to compare performance across different ranks, nodes, or execution contexts +- **Flexible Output:** Generate summaries in multiple formats (console, CSV, HTML, JSON) +- **Selective Reporting:** Focus on specific performance domains and categories + +**Multi-Database Analysis Benefits** + +The ``rocpd summary`` command excels at aggregating multiple database files, providing capabilities not available with single-session analysis: + +**Unified Summary Reports:** + +.. code-block:: bash + + # Aggregate multiple databases into single comprehensive summary + rocpd summary -i session1.db session2.db session3.db --format html -o unified_summary + + # Combine all MPI rank databases for overall application analysis + rocpd summary -i rank_*.db --format csv -o mpi_application_summary + + # Time-series aggregation across multiple profiling runs + rocpd summary -i daily_profile_*.db --format json -o weekly_performance_trends + +**Rank-by-Rank Comparative Analysis:** + +The ``--summary-by-rank`` option enables detailed comparative analysis, allowing you to identify performance variations, load balancing issues, and optimization opportunities across different execution contexts: + +.. 
code-block:: bash + + # Compare performance across MPI ranks + rocpd summary -i rank_0.db rank_1.db rank_2.db rank_3.db --summary-by-rank --format html -o rank_comparison + + # Analyze multi-node performance characteristics + rocpd summary -i node_*.db --summary-by-rank --format csv -o node_performance_analysis + + # Compare GPU device performance in multi-GPU applications + rocpd summary -i gpu_0.db gpu_1.db gpu_2.db gpu_3.db --summary-by-rank --format json -o gpu_scaling_analysis + +**Use Cases for Multi-Database Summary Analysis:** + +**1. MPI Application Performance Analysis:** + +.. code-block:: bash + + # Profile distributed MPI application + mpirun -np 8 rocprofv3 --hip-trace --output-format rocpd -- mpi_simulation + + # Generate unified summary for overall application performance + rocpd summary -i results_rank_*.db --format html -o application_overview + + # Identify load balancing issues with rank-by-rank comparison + rocpd summary -i results_rank_*.db --summary-by-rank --format csv -o load_balance_analysis + +**2. Multi-GPU Scaling Studies:** + +.. code-block:: bash + + # Profile scaling from 1 to 4 GPUs + for gpus in 1 2 4; do + rocprofv3 --hip-trace --device 0:$((gpus-1)) --output-format rocpd -o "scaling_${gpus}gpu.db" -- gpu_benchmark + done + + # Aggregate scaling analysis + rocpd summary -i scaling_*gpu.db --format html -o gpu_scaling_summary + + # Compare efficiency across different GPU counts + rocpd summary -i scaling_*gpu.db --summary-by-rank --format json -o scaling_efficiency + +**3. Performance Regression Testing:** + +.. code-block:: bash + + # Profile baseline and optimized versions + rocprofv3 --hip-trace --output-format rocpd -o baseline.db -- application_v1 + rocprofv3 --hip-trace --output-format rocpd -o optimized.db -- application_v2 + + # Generate unified performance comparison + rocpd summary -i baseline.db optimized.db --summary-by-rank --format html -o regression_analysis + +**4. Cross-Platform Performance Comparison:** + +.. 
code-block:: bash + + # Profile on different hardware platforms + rocprofv3 --hip-trace --output-format rocpd -o platform_A.db -- benchmark + rocprofv3 --hip-trace --output-format rocpd -o platform_B.db -- benchmark + + # Compare platform performance characteristics + rocpd summary -i platform_*.db --summary-by-rank --format csv -o platform_comparison + +**Advanced Summary Analysis:** + +.. code-block:: bash + + # Cross-rank summary for MPI applications with domain focus + rocpd summary -i rank_*.db --summary-by-rank --region-categories KERNEL HIP --format html + + # Time-windowed multi-database analysis + rocpd summary -i profile_*.db --start 25% --end 75% --summary-by-rank + + # Domain-specific comparative analysis + rocpd summary -i node_*.db --domain-summary --summary-by-rank --region-categories HIP ROCR + +**Output Interpretation:** + +- **Unified Summaries:** Provide aggregate statistics across all input databases, showing combined performance metrics +- **Rank-by-Rank Summaries:** Generate separate statistical reports for each input database, enabling direct comparison of performance characteristics +- **Comparative Metrics:** Highlight performance variations, identify outliers, and reveal load balancing opportunities + +**Integration with rocprofv3 Workflow:** + +The ``rocpd summary`` command maintains full compatibility with ``rocprofv3`` summary analysis while extending capabilities to multi-database scenarios. Users familiar with ``rocprofv3 --summary`` will find identical statistical outputs and report formats when using ``rocpd summary`` on database files, with the added benefit of cross-session analysis capabilities. + +For detailed information about summary statistics and report interpretation, see :ref:`using-rocprofv3-summary`. 
+ +Aggregating rocpd Data +++++++++++++++++++++++ + +One of the key advantages of the ``rocpd`` format is its ability to aggregate and analyze data from multiple profiling sessions, ranks, or nodes within a unified framework. This capability enables comprehensive analysis workflows that were not possible with previous output formats. + +**Multi-Database Analysis Capabilities** + +Unlike the Perfetto output format used in earlier versions, ``rocpd`` databases can be seamlessly combined for cross-session analysis: + +.. code-block:: bash + + # Aggregate analysis across multiple profiling sessions + rocpd query -i session1.db session2.db session3.db \ + --query "SELECT name, AVG(duration) FROM kernels GROUP BY name" + + # Cross-rank performance comparison for MPI applications + rocpd summary -i rank_0.db rank_1.db rank_2.db rank_3.db --summary-by-rank + + # Multi-node scaling analysis + rocpd query -i node_*.db \ + --query "SELECT COUNT(*) as total_kernels, SUM(duration) as total_time FROM kernels" + +**Distributed Computing Workflows** + +**MPI Application Analysis:** + +.. code-block:: bash + + # Profile MPI application across multiple ranks + mpirun -np 4 rocprofv3 --hip-trace --output-format rocpd -- mpi_application + + # Generate aggregated performance summary + rocpd summary -i results_rank_*.db --summary-by-rank --format html -o mpi_performance_report + + # Analyze load balancing across ranks + rocpd query -i results_rank_*.db \ + --query "SELECT pid, COUNT(*) as kernel_count, AVG(duration) as avg_duration FROM kernels GROUP BY pid" + +**Multi-GPU Scaling Analysis:** + +.. 
code-block:: bash + + # Profile application with multiple GPU devices + rocprofv3 --hip-trace --device 0,1,2,3 --output-format rocpd -- multi_gpu_app + + # Aggregate device utilization analysis + rocpd query -i multi_gpu_results.db \ + --query "SELECT agent_abs_index as device_id, COUNT(*) as operations, SUM(duration) as total_time FROM kernels GROUP BY device_id" + + # Cross-device performance comparison + rocpd summary -i multi_gpu_results.db --domain-summary + +**Temporal Aggregation** + +**Time-Series Analysis:** + +.. code-block:: bash + + # Collect profiles over time for performance monitoring + for hour in {1..24}; do + rocprofv3 --hip-trace --output-format rocpd -o "profile_hour_$hour.db" -- application + done + + # Analyze performance trends over time + rocpd query -i profile_hour_*.db \ + --query "SELECT AVG(duration) as avg_kernel_time, COUNT(*) as kernel_count FROM kernels" \ + --format csv -o performance_trends + +**Comparative Analysis:** + +.. code-block:: bash + + # Compare baseline vs optimized performance + rocpd query -i baseline.db optimized.db \ + --query "SELECT name, AVG(duration) as avg_time FROM kernels GROUP BY name ORDER BY avg_time DESC" + + # Generate comparative summary reports + rocpd summary -i baseline.db optimized.db --format html -o comparison_report + +**Data Aggregation Benefits** + +- **Unified Analysis:** Combine data from different execution contexts, hardware configurations, and time periods +- **Scalability Insights:** Analyze performance scaling across multiple nodes, ranks, or GPU devices +- **Trend Analysis:** Track performance evolution over time or across different software versions +- **Load Balancing:** Identify performance bottlenecks and load distribution issues in distributed applications +- **Cross-Platform Comparison:** Compare performance across different hardware platforms using a unified database schema + +The aggregation capabilities of the ``rocpd`` format enable sophisticated analysis workflows that provide
deeper insights into application performance characteristics across diverse computing environments. + +Tool Integration and Workflow Examples ++++++++++++++++++++++++++++++++++++++++ + +**Multi-Format Analysis Pipeline:** + +.. code-block:: bash + + # Generate all analysis formats for comprehensive review + rocpd2csv -i profile.db -o analysis_data + rocpd2summary -i profile.db --format html -o performance_report + rocpd2pftrace -i profile.db -o interactive_trace + +**Automated Performance Monitoring:** + +.. code-block:: bash + + #!/bin/bash + # Performance analysis automation script + + PROFILE_DB="$1" + OUTPUT_DIR="analysis_$(date +%Y%m%d_%H%M%S)" + + mkdir -p "$OUTPUT_DIR" + + # Generate CSV data for automated analysis + rocpd2csv -i "$PROFILE_DB" -d "$OUTPUT_DIR" -o raw_data + + # Create summary reports + rocpd2summary -i "$PROFILE_DB" --format csv html \ + -d "$OUTPUT_DIR" -o performance_summary + + # Generate interactive trace for detailed investigation + rocpd2pftrace -i "$PROFILE_DB" -d "$OUTPUT_DIR" -o interactive_trace + + +Query ++++++ + +The ``rocpd query`` command provides powerful SQL-based analysis capabilities for exploring and extracting data from rocpd databases. This tool enables custom analysis workflows, automated reporting, and integration with external analysis pipelines. + +rocpd query - SQL Query Engine +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Purpose:** Execute custom SQL queries against rocpd databases with support for multiple output formats, automated reporting, and email delivery. + +**Location:** ``/opt/rocm/bin/rocpd query`` + +**Syntax:** + +.. code-block:: bash + + rocpd query -i INPUT [INPUT ...] 
--query "SQL_STATEMENT" [OPTIONS] + +**Key Features:** + +- **Standard SQL Support:** Full SQLite3 SQL syntax including JOINs, aggregate functions, and complex WHERE clauses +- **Multi-Database Aggregation:** Query across multiple database files as a unified virtual database +- **Multiple Output Formats:** Console, CSV, HTML, JSON, Markdown, PDF, and interactive dashboards +- **Script Execution:** Execute complex SQL scripts with view definitions and custom functions +- **Automated Reporting:** Email delivery with SMTP configuration and attachment management +- **Time Window Integration:** Apply temporal filtering before query execution + +Database Schema and Views +~~~~~~~~~~~~~~~~~~~~~~~~~ + +rocpd databases provide comprehensive views for analysis. In general, build queries on the ``data_views`` listed below: + +**Core Data Views:** + +.. code-block:: sql + + -- System and hardware information + SELECT * FROM rocpd_info_agents; + SELECT * FROM rocpd_info_node; + + -- Kernel execution data + SELECT * FROM kernels; + SELECT * FROM top_kernels; + + -- API trace information + SELECT * FROM regions_and_samples WHERE category LIKE 'HIP_%'; + SELECT * FROM regions_and_samples WHERE category LIKE 'RCCL_%'; + + -- Performance counters + SELECT * FROM counters_collection; + + -- Memory operations + SELECT * FROM memory_copies; + SELECT * FROM memory_allocations; + + -- Process and thread information + SELECT * FROM processes; + SELECT * FROM threads; + + -- Marker and region data + SELECT * FROM regions; + SELECT * FROM regions_and_samples WHERE category LIKE 'MARKERS_%'; + +**Summary and Analysis Views:** + +.. code-block:: sql + + -- Top performing kernels by execution time + SELECT * FROM top_kernels LIMIT 10; + + -- Top Analysis + SELECT * FROM top; + + -- Busy Analysis + SELECT * FROM busy; + +Basic Query Examples +~~~~~~~~~~~~~~~~~~~~ + +**Simple Data Exploration:** + +.. 
code-block:: bash + + # List available GPU agents + rocpd query -i profile.db --query "SELECT * FROM rocpd_info_agents" + + # Show top 10 longest-running kernels + rocpd query -i profile.db --query "SELECT name, duration FROM kernels ORDER BY duration DESC LIMIT 10" + + # Count total number of kernel dispatches + rocpd query -i profile.db --query "SELECT COUNT(*) as total_kernels FROM kernels" + +**Multi-Database Aggregation:** + +.. code-block:: bash + + # Combine data from multiple profiling sessions + rocpd query -i session1.db session2.db session3.db \ + --query "SELECT pid, COUNT(*) as kernel_count FROM kernels GROUP BY pid" + + # Cross-session performance comparison + rocpd query -i baseline.db optimized.db \ + --query "SELECT name as kernel_name, AVG(duration) as avg_duration FROM kernels GROUP BY kernel_name" + +**Advanced Analytics:** + +.. code-block:: bash + + # Kernel performance analysis with statistics + rocpd query -i profile.db --query " + SELECT + name as kernel_name, + COUNT(*) as dispatch_count, + MIN(duration) as min_duration, + AVG(duration) as avg_duration, + MAX(duration) as max_duration, + SUM(duration) as total_duration + FROM kernels + GROUP BY kernel_name + ORDER BY total_duration DESC" + +**Memory Transfer Analysis:** + +.. code-block:: bash + + # Memory copy analysis by direction (source and destination agent) + rocpd query -i profile.db --query " + SELECT + name, + src_agent_type, + src_agent_abs_index, + dst_agent_type, + dst_agent_abs_index, + COUNT(*) as transfer_count, + SUM(size) as total_bytes, + SUM(duration) as total_duration + FROM memory_copies + GROUP BY name, src_agent_type, src_agent_abs_index, dst_agent_type, dst_agent_abs_index + ORDER BY total_bytes DESC" + +Output Format Options +~~~~~~~~~~~~~~~~~~~~~ + +**Console Output (Default):** + +.. code-block:: bash + + # Display results in terminal + rocpd query -i profile.db --query "SELECT * FROM top_kernels LIMIT 5" + +**CSV Export for Data Analysis:** + +.. 
code-block:: bash + + # Export to CSV file + rocpd query -i profile.db --query "SELECT * FROM kernels" --format csv -o kernel_analysis + + # Specify custom output directory + rocpd query -i profile.db --query "SELECT * FROM kernels" --format csv -d ~/analysis/ -o kernel_data + +**HTML Reports:** + +.. code-block:: bash + + # Generate HTML table + rocpd query -i profile.db --query "SELECT * FROM top_kernels" --format html -o performance_report + +**Interactive Dashboard:** + +.. code-block:: bash + + # Create interactive HTML dashboard + rocpd query -i profile.db --query "SELECT * FROM device_utilization" --format dashboard -o utilization_dashboard + + # Use custom dashboard template + rocpd query -i profile.db --query "SELECT * FROM kernels" --format dashboard \ + --template-path ~/templates/custom_dashboard.html -o custom_report + +**JSON for Programmatic Integration:** + +.. code-block:: bash + + # Export structured JSON data + rocpd query -i profile.db --query "SELECT * FROM counters_collection" --format json -o counter_data + +**PDF Reports:** + +.. code-block:: bash + + # Generate PDF report with monospace formatting + rocpd query -i profile.db --query "SELECT name, duration FROM top_kernels" --format pdf -o kernel_report + +Script-Based Analysis +~~~~~~~~~~~~~~~~~~~~~ + +Execute complex SQL scripts with view definitions and custom analysis logic: + +**SQL Script Example (analysis.sql):** + +.. code-block:: sql + + -- Create temporary views for complex analysis + -- SQLite has no built-in STDDEV aggregate, so compute the population variance instead + CREATE TEMP VIEW kernel_stats AS + SELECT + name as kernel_name, + COUNT(*) as dispatch_count, + AVG(duration) as avg_duration, + AVG(1.0 * duration * duration) - AVG(duration) * AVG(duration) as duration_variance + FROM kernels + GROUP BY kernel_name; + + CREATE TEMP VIEW performance_outliers AS + SELECT k.*, ks.avg_duration, ks.duration_variance + FROM kernels k + JOIN kernel_stats ks ON k.name = ks.kernel_name + -- More than 2 standard deviations from the mean, compared in squared form to avoid SQRT + WHERE (k.duration - ks.avg_duration) * (k.duration - ks.avg_duration) > 4 * ks.duration_variance; + +**Execute Script with Query:** + +.. 
code-block:: bash + + # Run script then execute query + rocpd query -i profile.db --script analysis.sql \ + --query "SELECT * FROM performance_outliers" --format html -o outlier_analysis + +Time Window Integration +~~~~~~~~~~~~~~~~~~~~~~~ + +Apply temporal filtering before query execution: + +.. code-block:: bash + + # Query only the middle 50% of the execution timeline + rocpd query -i profile.db --start 25% --end 75% \ + --query "SELECT COUNT(*) as kernel_count FROM kernels" + + # Use marker-based time windows + rocpd query -i profile.db --start-marker "computation_begin" --end-marker "computation_end" \ + --query "SELECT * FROM kernels ORDER BY start_time" + + # Absolute timestamp filtering + rocpd query -i profile.db --start 1000000000 --end 2000000000 \ + --query "SELECT * FROM kernels WHERE start_time BETWEEN 1000000000 AND 2000000000" + +Automated Email Reporting +~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Basic Email Delivery:** + +.. code-block:: bash + + # Send CSV report via email + rocpd query -i profile.db --query "SELECT * FROM top_kernels" --format csv \ + --email-to analyst@company.com --email-from profiler@company.com \ + --email-subject "Weekly Performance Report" + +**Advanced Email Configuration:** + +.. code-block:: bash + + # Multiple recipients with SMTP authentication + rocpd query -i profile.db --query "SELECT * FROM device_utilization" --format html \ + --email-to "team@company.com,manager@company.com" \ + --email-from profiler@company.com \ + --email-subject "GPU Utilization Analysis" \ + --smtp-server smtp.company.com --smtp-port 587 \ + --smtp-user profiler@company.com --smtp-password $(cat ~/.smtp_pass) \ + --inline-preview --zip-attachments + +**Dashboard Email Reports:** + +.. 
code-block:: bash + + # Send interactive dashboard via email + rocpd query -i profile.db --query "SELECT * FROM kernels" --format dashboard \ + --template-path ~/templates/executive_summary.html \ + --email-to executives@company.com --email-from profiler@company.com \ + --email-subject "Executive Performance Dashboard" \ + --inline-preview + +Integration Workflows +~~~~~~~~~~~~~~~~~~~~~ + +**Automated Analysis Pipeline:** + +.. code-block:: bash + + #!/bin/bash + # Automated reporting script + + DB_FILE="$1" + REPORT_DATE=$(date +%Y-%m-%d) + + # Generate multiple analysis reports + rocpd query -i "$DB_FILE" --query "SELECT * FROM top_kernels LIMIT 20" \ + --format html -o "top_kernels_$REPORT_DATE" + + rocpd query -i "$DB_FILE" --query "SELECT * FROM memory_copy_summary" \ + --format csv -o "memory_analysis_$REPORT_DATE" + + rocpd query -i "$DB_FILE" --query "SELECT * FROM device_utilization" \ + --format dashboard -o "utilization_dashboard_$REPORT_DATE" \ + --email-to team@company.com --email-from automation@company.com + +**Performance Regression Detection:** + +.. code-block:: bash + + # Compare current performance against baseline + rocpd query -i baseline.db current.db --script performance_comparison.sql \ + --query "SELECT * FROM performance_regression_analysis" \ + --format html -o regression_report \ + --email-to devteam@company.com --email-from ci@company.com \ + --email-subject "Performance Regression Analysis" + +**Custom Analysis Functions:** + +rocpd databases support custom SQL functions for advanced analysis: + +.. code-block:: bash + + # Use built-in rocpd functions + rocpd query -i profile.db --query " + SELECT + name, + rocpd_get_string(name_id, 0, nid, pid) as full_kernel_name, + duration + FROM kernels + WHERE rocpd_get_string(name_id, 0, nid, pid) LIKE '%gemm%'" + +rocpd query Command-Line Reference +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: none + + usage: rocpd query [-h] -i INPUT [INPUT ...] 
--query QUERY [--script SCRIPT] + [--format {console,csv,html,json,md,pdf,dashboard,clipboard}] + [-o OUTPUT_FILE] [-d OUTPUT_PATH] + [--email-to EMAIL_TO] [--email-from EMAIL_FROM] + [--email-subject EMAIL_SUBJECT] [--smtp-server SMTP_SERVER] + [--smtp-port SMTP_PORT] [--smtp-user SMTP_USER] + [--smtp-password SMTP_PASSWORD] [--zip-attachments] + [--inline-preview] [--template-path TEMPLATE_PATH] + [--start START | --start-marker START_MARKER] + [--end END | --end-marker END_MARKER] + +**Required Arguments:** + +- ``-i INPUT [INPUT ...]``, ``--input INPUT [INPUT ...]`` + Input database file paths. Multiple databases are merged into unified view. + +- ``--query QUERY`` + SQL SELECT statement to execute. Enclose complex queries in quotes. + +**Query Options:** + +- ``--script SCRIPT`` + SQL script file to execute before running the main query. Useful for creating views and functions. + +- ``--format {console,csv,html,json,md,pdf,dashboard,clipboard}`` + Output format (default: console). Dashboard format creates interactive HTML reports. + +**Output Configuration:** + +- ``-o OUTPUT_FILE``, ``--output-file OUTPUT_FILE`` + Base filename for exported files. + +- ``-d OUTPUT_PATH``, ``--output-path OUTPUT_PATH`` + Output directory path. + +- ``--template-path TEMPLATE_PATH`` + Jinja2 template file for dashboard format customization. + +**Email Reporting:** + +- ``--email-to EMAIL_TO`` + Recipient email addresses (comma-separated for multiple recipients). + +- ``--email-from EMAIL_FROM`` + Sender email address (required when using email delivery). + +- ``--email-subject EMAIL_SUBJECT`` + Email subject line. + +- ``--smtp-server SMTP_SERVER``, ``--smtp-port SMTP_PORT`` + SMTP server configuration (default: localhost:25). + +- ``--smtp-user SMTP_USER``, ``--smtp-password SMTP_PASSWORD`` + SMTP authentication credentials. + +- ``--zip-attachments`` + Bundle all attachments into single ZIP file. + +- ``--inline-preview`` + Embed HTML reports as email body content. 
+ +**Time Window Filtering:** + +- ``--start START``, ``--end END`` + Temporal boundaries using percentage (e.g., 25%) or absolute timestamps. + +- ``--start-marker START_MARKER``, ``--end-marker END_MARKER`` + Named marker events defining time window boundaries. + +The ``rocpd query`` tool runs SQL directly against profiling databases, enabling custom analysis workflows and automated reporting of GPU profiling data. + +**Documentation:** :ref:`using-rocpd-output-format` (SQL Schema Reference), :ref:`using-rocprofv3` (Marker Integration) diff --git a/projects/rocprofiler-sdk/source/docs/index.rst b/projects/rocprofiler-sdk/source/docs/index.rst index 1c4c566ad8..50457e0201 100644 --- a/projects/rocprofiler-sdk/source/docs/index.rst +++ b/projects/rocprofiler-sdk/source/docs/index.rst @@ -28,6 +28,10 @@ The documentation is structured as follows: * :ref:`installing-rocprofiler-sdk` + .. grid-item-card:: Quick Reference + + * :ref:`quick-guide` + .. grid-item-card:: How to * :doc:`Samples ` diff --git a/projects/rocprofiler-sdk/source/docs/quick_guide.rst b/projects/rocprofiler-sdk/source/docs/quick_guide.rst new file mode 100644 index 0000000000..33bdd4ccb1 --- /dev/null +++ b/projects/rocprofiler-sdk/source/docs/quick_guide.rst @@ -0,0 +1,323 @@ +.. meta:: + :description: Quick reference guide for rocprofv3 commands and rocprofiler-sdk tools + :keywords: rocprofv3 quick guide, rocprofiler-sdk quick reference, rocprofv3 commands, ROCprofiler-SDK CLI, GPU profiling quick start + +.. _quick-guide: + +============================================== +ROCprofiler-SDK Quick Reference Guide +============================================== + +This quick reference guide provides an overview of the most commonly used ``rocprofv3`` commands and links to detailed documentation sections. + +Getting Started +=============== + +Source the environment setup script: + +.. 
code-block:: bash + + source /opt/rocm/share/rocprofiler-sdk/setup-env.sh + +Check the rocprofv3 version and display the help: + +.. code-block:: bash + + rocprofv3 --version + rocprofv3 --help + +Essential Commands +================== + +Querying System Capabilities +----------------------------- + +List available counters and capabilities: + +.. code-block:: bash + + # List all available features + rocprofv3 --list-avail + + # Using the dedicated tool for detailed queries + rocprofv3-avail list + rocprofv3-avail info + +**Documentation:** :ref:`using-rocprofv3-avail` + +Basic Tracing +------------- + +Application tracing (HIP API + kernel dispatches + memory operations): + +.. code-block:: bash + + # Runtime tracing (recommended for most use cases) + rocprofv3 --runtime-trace -- ./your_app + + # System-level tracing (includes HSA API) + rocprofv3 --sys-trace -- ./your_app + +**Documentation:** :ref:`using-rocprofv3` + +Granular Tracing Options +------------------------ + +.. code-block:: bash + + # HIP API, kernel dispatches, and memory operations tracing + rocprofv3 --hip-trace --kernel-trace --memory-copy-trace -- ./your_app + + + +**Documentation:** :ref:`using-rocprofv3` (Basic tracing section) + +Performance Counter Collection +------------------------------ + +.. code-block:: bash + + # List available counters + rocprofv3-avail list --pmc + + # Check if counters can be collected together + rocprofv3-avail pmc-check SQ_WAVES SQ_INSTS_VALU + + # Collect specific counters + rocprofv3 --pmc SQ_WAVES,SQ_INSTS_VALU -- ./your_app + +**Documentation:** :ref:`using-rocprofv3` (Counter collection section) + +Advanced Profiling Features +============================ + +PC Sampling (Beta) +------------------ + +.. code-block:: bash + + # Check PC sampling support + rocprofv3-avail list --pc-sampling + + # Enable PC sampling + rocprofv3 --pc-sampling-beta-enabled --pc-sampling-interval 1000 -- ./your_app + +**Documentation:** :ref:`using-pc-sampling` + +Thread Trace +------------ + +.. 
code-block:: bash + + # Collect thread trace data + rocprofv3 --att --output-format csv -- ./your_app + +**Documentation:** :ref:`using-thread-trace` + +Process Attachment +------------------ + +.. code-block:: bash + + # Attach to a running process by PID + rocprofv3 --pid 12345 --runtime-trace -d ./results + # or + + # Attach for a specific duration (10 seconds) + rocprofv3 --pid 12345 --runtime-trace --attach-duration-msec 10000 + +**Documentation:** :ref:`using-rocprofv3-process-attachment` + +Output Formats and Post-processing +=================================== + +rocprofv3 supports multiple output formats for different analysis needs. The default format is ``rocpd``, which stores data in a structured SQLite3 database. + +Working with rocpd Database Format +----------------------------------- + +.. code-block:: bash + + # Generate rocpd database (default format) + rocprofv3 --runtime-trace -- ./your_app + # Creates: hostname/pid_results.db + + # Query the database directly with SQL + sqlite3 hostname/12345_results.db "SELECT * FROM regions;" + + # Convert rocpd database to other formats + rocpd convert -i *.db -f csv pftrace otf2 --start 20% --end 80% + +Collecting and Converting to Other Formats +------------------------------------------- + +.. code-block:: bash + + # Multiple output formats in one run + rocprofv3 --runtime-trace --output-format csv json pftrace otf2 -- ./your_app + + + +**Documentation:** :ref:`using-rocpd-output-format` + +Summary and Statistics +---------------------- + +.. code-block:: bash + + # Overall summary statistics per domain grouped by kernel and memory operations + rocprofv3 --runtime-trace --summary-per-domain --summary-groups "KERNEL_DISPATCH|MEMORY_COPY" -- ./your_app + +**Documentation:** :ref:`using-rocprofv3` (Post-processing tracing section) + +Filtering and Selection +======================= + +Kernel Filtering +---------------- + +.. 
code-block:: bash + + # Trace kernels matching "matmul.*" but not ".*copy.*", limited to iterations 10-20 + rocprofv3 --kernel-trace --kernel-iteration-range 10-20 --kernel-include-regex "matmul.*" --kernel-exclude-regex ".*copy.*" -- ./your_app + +**Documentation:** :ref:`using-rocprofv3` (Filtering section) + +Time-based Collection +--------------------- + +.. code-block:: bash + + # Collect for specific time periods (start_delay:collection_time:repeat) + rocprofv3 --runtime-trace --collection-period 500:2000:0 --collection-period-unit msec -- ./your_app + +**Documentation:** :ref:`using-rocprofv3` (Filtering section) + +Kernel Naming and Display +========================= + +.. code-block:: bash + + # Keep mangled kernel names + rocprofv3 --kernel-trace --mangled-kernels -- ./your_app + + # Truncate kernel names for readability + rocprofv3 --kernel-trace --truncate-kernels -- ./your_app + + # Use ROCTx regions to rename kernels + rocprofv3 --kernel-trace --kernel-rename -- ./your_app + +**Documentation:** :ref:`using-rocprofv3` (Kernel naming section) + +Code Annotation with ROCTx +=========================== + +.. code-block:: bash + + # Trace ROCTx markers and ranges + rocprofv3 --marker-trace -- ./your_app + +**Documentation:** :ref:`using-rocprofiler-sdk-roctx` + +Parallel and Distributed Applications +====================================== + +MPI Applications +---------------- + +.. code-block:: bash + + # Profile MPI applications + mpirun -n 4 rocprofv3 --runtime-trace --output-format csv -- ./your_mpi_app + +**Documentation:** :ref:`using-rocprofv3-with-mpi` + +OpenMP Applications +------------------- + +.. code-block:: bash + + # Profile OpenMP applications + rocprofv3 --runtime-trace --output-format csv -- ./your_openmp_app + +**Documentation:** :ref:`using-rocprofv3-with-openmp` + +Output Management +================= + +File Organization +----------------- + +.. 
code-block:: bash + + # Specify the output directory and base file name + rocprofv3 --runtime-trace --output-directory ./results --output-file my_trace -- ./your_app + + # Generate a configuration file + rocprofv3 --runtime-trace --output-config -- ./your_app + +**Documentation:** :ref:`using-rocprofv3` (I/O options section) + +Common Use Cases +================ + +Basic Performance Analysis +-------------------------- + +.. code-block:: bash + + # Quick performance overview + rocprofv3 --runtime-trace --summary -- ./your_app + +**Use case:** Get a high-level view of application performance + +Detailed Kernel Analysis +------------------------- + +.. code-block:: bash + + # Detailed kernel profiling with counters + rocprofv3 --kernel-trace --pmc SQ_WAVES,SQ_INSTS_VALU,TCP_PERF_SEL_TOTAL_CACHE_ACCESSES -- ./your_app + +**Use case:** Analyze specific kernel performance bottlenecks + +Memory Transfer Analysis +------------------------ + +.. code-block:: bash + + # Focus on memory operations + rocprofv3 --memory-copy-trace --memory-allocation-trace -- ./your_app + +**Use case:** Optimize data movement between CPU and GPU + +Timeline Visualization +---------------------- + +.. code-block:: bash + + # Generate timeline for visualization tools + rocprofv3 --runtime-trace -- ./your_app + + # Convert to Perfetto format + rocpd2pftrace -i hostname/pid_results.db -o perfetto_trace + +**Use case:** Visualize execution timeline in Perfetto or similar tools + +Installation and Setup +====================== + +**Installation Documentation:** :ref:`installing-rocprofiler-sdk` + +**API Reference:** :doc:`Tool library ` + +**Samples and Examples:** :doc:`Samples ` + +Troubleshooting Quick Tips +========================== + +1. **Permission Issues:** Ensure proper access to GPU devices and ``/dev/kfd`` +2. **Counter Collection Fails:** Use ``rocprofv3-avail pmc-check`` to verify counter compatibility +3. **Empty or Small Output Files:** Use ``--minimum-output-data`` to skip writing output files below a size threshold (in KB) +4. 
**Signal Handling:** Use ``--disable-signal-handlers`` if the profiler's signal handlers conflict with the application's own handlers + +5. **ROCm Path Issues:** Use ``--rocm-root`` to specify custom ROCm installation paths + +For comprehensive documentation on each feature, refer to the detailed sections linked throughout this guide.