SWDEV-544115 Adding documentation for rocprofv3 advanced options (#516)
* SWDEV-544115 Adding documentation for rocprofv3 advanced options
* minor changes
* updating rocpd documentation
* updated changelog
* addressed feedback
[ROCm/rocprofiler-sdk commit: 4120c12ed5]
This commit is contained in:
committed by GitHub
parent b30a084da2
commit f625253208
@@ -193,6 +193,7 @@ Full documentation for ROCprofiler-SDK is available at [rocm.docs.amd.com/projec
- Added `rocpd` output format documentation
- Requires the ROCprof Trace Decoder plugin installed (see above)
- Added Perfetto support for scratch memory.
- Added documentation for rocprofv3 advanced options

### Changed
@@ -1,6 +1,6 @@
|
||||
.. meta::
|
||||
:description: "ROCprofiler-SDK is a tooling infrastructure for profiling general-purpose GPU compute applications running on the ROCm software."
|
||||
:keywords: "ROCprofiler-SDK, ROCProfiler-SDK output formats, rocpd, SQLite3, CSV, JSON, PFTrace, OTF2"
|
||||
:description: "ROCprofiler-SDK rocpd output format documentation - comprehensive guide for SQLite3 database storage, format conversion utilities, and multi-format export capabilities for GPU profiling data analysis."
|
||||
:keywords: "ROCprofiler-SDK, rocpd, SQLite3, profiling database, format conversion, CSV export, JSON export, PFTrace, OTF2, GPU profiling, trace analysis"
|
||||
|
||||
.. _using-rocpd-output-format:
|
||||
|
||||
@@ -8,73 +8,76 @@
|
||||
Using rocpd Output Format
|
||||
=========================
|
||||
|
||||
``rocprofv3`` supports the following output formats:
|
||||
``rocprofv3`` provides comprehensive support for multiple output formats to accommodate diverse analysis workflows:
|
||||
|
||||
- **rocpd** (SQLite3 Database, Default)
|
||||
- **CSV**
|
||||
- **JSON** (Custom format for programmatic analysis only)
|
||||
- **PFTrace** (Perfetto trace for visualization with Perfetto)
|
||||
- **OTF2** (Open Trace Format for visualization with compatible third-party tools)
|
||||
- **rocpd** (SQLite3 Database) - Default format providing structured data storage
|
||||
- **CSV** (Comma-Separated Values) - Tabular format for spreadsheet applications and data analysis tools
|
||||
- **JSON** (JavaScript Object Notation) - Structured format optimized for programmatic analysis and integration
|
||||
- **PFTrace** (Perfetto Protocol Buffers) - Binary trace format for high-performance visualization using Perfetto
|
||||
- **OTF2** (Open Trace Format 2) - Standardized trace format for interoperability with third-party analysis tools
|
||||
|
||||
The ``rocpd`` output format is the default for ``rocprofv3``. It stores profiling results in a SQLite3 database, providing a structured and efficient way to analyze and post-process profiling data. This format allows users to query and manipulate profiling data using SQL, making it easy to extract specific information or perform complex analyses.
|
||||
The ``rocpd`` output format serves as the primary data repository for ``rocprofv3`` profiling sessions. This format leverages SQLite3's ACID-compliant database engine to provide robust, structured storage of comprehensive profiling datasets. The relational schema enables efficient querying and manipulation of profiling data through standard SQL interfaces, facilitating complex analytical operations and custom reporting workflows.
|
||||
|
||||
Features
|
||||
++++++++
|
||||
|
||||
- **Rich Data Model**: Stores all collected profiling data, including traces, counters, and metadata, in a single `.db` (SQLite3) file.
|
||||
- **Programmatic Access**: Can be queried using standard SQL tools or libraries (e.g., `sqlite3` CLI, Python's `sqlite3` module).
|
||||
- **Post-Processing**: Enables advanced analysis and visualization using custom scripts or third-party tools that support SQLite3.
|
||||
- **Comprehensive Data Model**: Consolidates all profiling artifacts including execution traces, performance counters, hardware metrics, and contextual metadata within a single SQLite3 database file (`.db` extension).
|
||||
- **Standards-Compliant Access**: Supports querying through industry-standard SQL interfaces including command-line tools (``sqlite3`` CLI), programming language bindings (Python ``sqlite3`` module, C/C++ SQLite API), and database management applications.
|
||||
- **Advanced Analytics Integration**: Facilitates sophisticated post-processing workflows through custom analytical scripts, automated reporting systems, and integration with third-party visualization and analysis frameworks that provide SQLite3 connectivity.
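As a minimal sketch of the programmatic access described above, the following uses Python's standard ``sqlite3`` module to discover what a ``rocpd`` database contains. It deliberately assumes nothing about the ``rocpd`` schema: table names are read from SQLite's built-in ``sqlite_master`` catalog rather than hard-coded. The helper names ``list_tables`` and ``row_count`` are hypothetical and not part of any ROCm package.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables and views in a SQLite3 database, sorted."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type IN ('table', 'view') ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


def row_count(db_path, table):
    """Count the rows in one table; the name is checked against the catalog first."""
    if table not in list_tables(db_path):
        raise ValueError(f"unknown table: {table}")
    with closing(sqlite3.connect(db_path)) as conn:
        # Safe to interpolate: the name was validated against sqlite_master above.
        (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
    return count
```

Calling ``list_tables("<input-file>.db")`` on a generated database is a reasonable first step before writing any schema-specific SQL.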
|
||||
|
||||
Generating rocpd Output
|
||||
+++++++++++++++++++++++
|
||||
|
||||
To generate output in rocpd format, simply use:
|
||||
To generate profiling data in the default rocpd format:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --hip-trace -- <application>
|
||||
|
||||
Or use the ``--output-format`` option with ``rocpd``:
|
||||
Alternatively, explicitly specify the rocpd output format using the ``--output-format`` parameter:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --hip-trace --output-format rocpd -- <application>
|
||||
|
||||
The output will be saved as ``%hostname%/%pid%_results.db``, where ``%hostname%`` is the name of the host machine and ``%pid%`` is the process ID of the application being profiled.
|
||||
The profiling session generates output files following the naming convention ``%hostname%/%pid%_results.db``, where ``%hostname%`` represents the system hostname and ``%pid%`` corresponds to the process identifier of the profiled application.
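For scripts that need to locate the database after a run, the naming convention can be sketched as follows. This is an illustration only: the helper name ``expected_db_path`` is hypothetical, and it assumes that ``%hostname%`` matches what ``socket.gethostname()`` reports and that the path is relative to the output directory (the current directory by default).

```python
import socket
from pathlib import Path


def expected_db_path(pid, hostname=None, output_dir="."):
    """Build <output_dir>/<hostname>/<pid>_results.db for a profiled process."""
    # Assumption: the profiler's %hostname% equals socket.gethostname().
    host = hostname if hostname is not None else socket.gethostname()
    return Path(output_dir) / host / f"{pid}_results.db"
```

If the file is not where this sketch predicts, check the output directory and hostname actually used by the profiling run.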
|
||||
|
||||
Converting rocpd to Other Formats
|
||||
+++++++++++++++++++++++++++++++++
|
||||
Converting rocpd to Alternative Formats
|
||||
+++++++++++++++++++++++++++++++++++++++
|
||||
|
||||
The ``rocpd`` output format can be converted to other formats for further analysis or visualization.
|
||||
First, ensure the ``rocpd`` Python module is available in your environment:
|
||||
The ``rocpd`` database format supports conversion to alternative output formats for specialized analysis and visualization workflows.
|
||||
|
||||
The ``rocpd`` conversion utility is distributed as part of the ROCm installation package, located in ``/opt/rocm-<version>/bin``, and provides both executable and Python module interfaces for programmatic integration.
|
||||
|
||||
Invoke the ``rocpd convert`` command with appropriate parameters to transform database files into target formats.
|
||||
|
||||
**CSV Format Conversion:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
export PYTHONPATH=<install-path>/lib/pythonX.Y/site-packages:$PYTHONPATH
|
||||
/opt/rocm/bin/rocpd convert -i <input-file>.db --output-format csv
|
||||
|
||||
where ``<install-path>`` is the ROCm installation path (usually ``/opt/rocm-<major.minor.patch>``), and ``X.Y`` is your Python version.
|
||||
**Python Interpreter Compatibility:**
|
||||
|
||||
Once the ``rocpd`` module is available, use the ``rocpd convert`` command to convert the output to other formats.
|
||||
|
||||
Convert to CSV format:
|
||||
When encountering Python interpreter version conflicts, specify the appropriate Python executable explicitly:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python3 -m rocpd convert -i <input-file>.db --output-format csv
|
||||
python3.10 $(which rocpd) convert -f csv -i <input-file>.db
|
||||
|
||||
The converted CSV will be saved as ``rocpd-output-data/out_hip_api_trace.csv`` in the current working directory.
|
||||
The CSV conversion process generates output files in the ``rocpd-output-data/out_hip_api_trace.csv`` path relative to the current working directory.
|
||||
|
||||
Convert to OTF2 format:
|
||||
**OTF2 Format Conversion:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python3 -m rocpd convert -i <input-file>.db --output-format otf2
|
||||
/opt/rocm/bin/rocpd convert -i <input-file>.db --output-format otf2
|
||||
|
||||
Convert to PFTrace format:
|
||||
**Perfetto Trace Format Conversion:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python3 -m rocpd convert -i <input-file>.db --output-format pftrace
|
||||
/opt/rocm/bin/rocpd convert -i <input-file>.db --output-format pftrace
|
||||
|
||||
rocpd convert Command-Line Options
|
||||
++++++++++++++++++++++++++++++++++
|
||||
@@ -98,96 +101,102 @@ Options
|
||||
**Required Arguments:**
|
||||
|
||||
- ``-i INPUT [INPUT ...]``, ``--input INPUT [INPUT ...]``
|
||||
Input path and filename to one or more database(s), separated by spaces.
|
||||
Specifies input database file paths. Accepts multiple SQLite3 database files separated by whitespace for batch processing operations.
|
||||
|
||||
- ``-f {csv,pftrace,otf2} [{csv,pftrace,otf2} ...]``, ``--output-format {csv,pftrace,otf2} [{csv,pftrace,otf2} ...]``
|
||||
Specify one or more output formats. Supported: ``csv``, ``pftrace``, ``otf2``.
|
||||
Defines target output format(s). Supports concurrent conversion to multiple formats: ``csv`` (Comma-Separated Values), ``pftrace`` (Perfetto Protocol Buffers), ``otf2`` (Open Trace Format 2).
|
||||
|
||||
**I/O Options:**
|
||||
**I/O Configuration:**
|
||||
|
||||
- ``-o OUTPUT_FILE``, ``--output-file OUTPUT_FILE``
|
||||
Sets the base output file name (default: ``out``).
|
||||
Configures the base filename for generated output files (default: ``out``).
|
||||
|
||||
- ``-d OUTPUT_PATH``, ``--output-path OUTPUT_PATH``
|
||||
Sets the output directory (default: ``./rocpd-output-data``).
|
||||
Specifies the target directory for output file generation (default: ``./rocpd-output-data``).
|
||||
|
||||
**Kernel Naming Options:**
|
||||
**Kernel Identification Options:**
|
||||
|
||||
- ``--kernel-rename``
|
||||
Use ROCTx marker names instead of kernel names.
|
||||
Substitutes kernel function names with corresponding ROCTx marker annotations for enhanced semantic context.
|
||||
|
||||
**Generic Options:**
|
||||
**Device Identification Configuration:**
|
||||
|
||||
- ``--agent-index-value {absolute,relative,type-relative}``
|
||||
Device identification format in output:
|
||||
Controls device identification methodology in converted output:
|
||||
|
||||
- ``absolute``: Uses node_id (e.g., Agent-0, Agent-2, Agent-4), ignoring cgroups.
|
||||
- ``relative``: Uses logical_node_id (e.g., Agent-0, Agent-1, Agent-2), considering cgroups. *(Default)*
|
||||
- ``type-relative``: Uses logical_node_type_id (e.g., CPU-0, GPU-0, GPU-1), numbering resets for each device type.
|
||||
- ``absolute``: Utilizes hardware node identifiers (e.g., Agent-0, Agent-2, Agent-4), bypassing container group abstractions.
|
||||
- ``relative``: Employs logical node identifiers (e.g., Agent-0, Agent-1, Agent-2), incorporating container group context. *(Default)*
|
||||
- ``type-relative``: Applies device-type-specific logical identifiers (e.g., CPU-0, GPU-0, GPU-1), with independent numbering sequences per device class.
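The three ``--agent-index-value`` modes can be illustrated with a small sketch. The ``Agent`` class and ``agent_label`` function are hypothetical; only the field names ``node_id``, ``logical_node_id``, and ``logical_node_type_id`` and the label shapes come from the option description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    kind: str                  # device class, "CPU" or "GPU"
    node_id: int               # hardware node identifier
    logical_node_id: int       # identifier after cgroup restrictions
    logical_node_type_id: int  # per-device-class logical identifier


def agent_label(agent, mode="relative"):
    """Return the device label for one of the three documented modes."""
    if mode == "absolute":
        return f"Agent-{agent.node_id}"
    if mode == "relative":
        return f"Agent-{agent.logical_node_id}"
    if mode == "type-relative":
        return f"{agent.kind}-{agent.logical_node_type_id}"
    raise ValueError(f"unsupported mode: {mode}")
```

For a system where cgroups expose only hardware nodes 0, 2, and 4, the same three devices would be labeled differently in each mode, which matters when comparing outputs produced with different settings.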
|
||||
|
||||
**Perfetto Trace (pftrace) Options:**
|
||||
**Perfetto Trace Configuration:**
|
||||
|
||||
- ``--perfetto-backend {inprocess,system}``
|
||||
Perfetto data collection backend. ``system`` mode requires running ``traced`` and ``perfetto`` daemons (default: ``inprocess``).
|
||||
Configures Perfetto data collection architecture. The ``system`` backend requires active ``traced`` and ``perfetto`` daemon processes, while ``inprocess`` operates autonomously (default: ``inprocess``).
|
||||
|
||||
- ``--perfetto-buffer-fill-policy {discard,ring_buffer}``
|
||||
Policy for handling new records when buffer is full (default: ``discard``).
|
||||
Defines buffer overflow handling strategy: ``discard`` drops new records when capacity is exceeded, ``ring_buffer`` overwrites oldest records (default: ``discard``).
|
||||
|
||||
- ``--perfetto-buffer-size KB``
|
||||
Buffer size for perfetto output in KB (default: 1 GB).
|
||||
Sets the trace buffer capacity in kilobytes for Perfetto output generation (default: 1,048,576 KB / 1 GB).
|
||||
|
||||
- ``--perfetto-shmem-size-hint KB``
|
||||
Perfetto shared memory size hint in KB (default: 64 KB).
|
||||
Specifies shared memory allocation hint for Perfetto inter-process communication in kilobytes (default: 64 KB).
|
||||
|
||||
- ``--group-by-queue``
|
||||
Display HIP streams that kernels and memory copy operations are submitted to, rather than HSA queues.
|
||||
Organizes trace data by HIP stream abstractions rather than low-level HSA queue identifiers, providing higher-level application context for kernel and memory transfer operations.
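The difference between the two ``--perfetto-buffer-fill-policy`` values is easiest to see on a toy buffer. The sketch below is not Perfetto's implementation: it models the buffer as a list with a fixed record count (real buffers are sized in KB), and ``fill_buffer`` is a hypothetical name.

```python
from collections import deque


def fill_buffer(records, capacity, policy="discard"):
    """Return the records that survive in a buffer holding `capacity` records."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if policy == "ring_buffer":
        # A bounded deque drops the oldest record whenever a new one arrives.
        return list(deque(records, maxlen=capacity))
    if policy == "discard":
        kept = []
        for record in records:
            # Once the buffer is full, new records are dropped.
            if len(kept) < capacity:
                kept.append(record)
        return kept
    raise ValueError(f"unsupported policy: {policy}")
```

In practice, ``discard`` preserves the beginning of a trace and ``ring_buffer`` preserves the end, so the choice depends on which part of the run you need to inspect.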
|
||||
|
||||
**Time Window Options:**
|
||||
**Temporal Filtering Configuration:**
|
||||
|
||||
- ``--start START``
|
||||
Start time as percentage or nanoseconds from trace file (e.g., ``50%`` or ``781470909013049``).
|
||||
Defines trace window start boundary using percentage notation (e.g., ``50%``) or absolute nanosecond timestamps (e.g., ``781470909013049``).
|
||||
|
||||
- ``--start-marker START_MARKER``
|
||||
Named marker event to use as window start point.
|
||||
Specifies named marker event identifier to establish trace window start boundary.
|
||||
|
||||
- ``--end END``
|
||||
End time as percentage or nanoseconds from trace file (e.g., ``75%`` or ``3543724246381057``).
|
||||
Defines trace window end boundary using percentage notation (e.g., ``75%``) or absolute nanosecond timestamps (e.g., ``3543724246381057``).
|
||||
|
||||
- ``--end-marker END_MARKER``
|
||||
Named marker event to use as window end point.
|
||||
Specifies named marker event identifier to establish trace window end boundary.
|
||||
|
||||
- ``--inclusive INCLUSIVE``
|
||||
``True``: include events if START or END in window; ``False``: only if BOTH in window (default: ``True``).
|
||||
Controls event inclusion criteria: ``True`` includes events with either start or end timestamps within the specified window; ``False`` requires both timestamps within the window (default: ``True``).
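The window semantics described above can be sketched in a few lines. This is an illustration of the documented behavior, not the ``rocpd`` implementation: ``resolve_bound`` and ``in_window`` are hypothetical names, and the sketch assumes that a percentage is measured across the span from the first to the last timestamp in the trace.

```python
def resolve_bound(value, trace_start_ns, trace_end_ns):
    """Turn '50%' or an absolute value such as '781470909013049' into nanoseconds."""
    text = str(value).strip()
    if text.endswith("%"):
        # Assumption: percentages are relative to the full trace duration.
        fraction = float(text[:-1]) / 100.0
        return trace_start_ns + round(fraction * (trace_end_ns - trace_start_ns))
    return int(text)


def in_window(event_start, event_end, window_start, window_end, inclusive=True):
    """Apply the --inclusive rule to one event."""
    start_inside = window_start <= event_start <= window_end
    end_inside = window_start <= event_end <= window_end
    if inclusive:
        # True: keep the event if either endpoint falls inside the window.
        return start_inside or end_inside
    # False: keep the event only if both endpoints fall inside the window.
    return start_inside and end_inside
```

Read literally, the rule means an event that begins before the window and ends after it has neither endpoint inside, so this sketch excludes it in both modes; verify against real output if such long-running events matter to your analysis.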
|
||||
|
||||
**Help:**
|
||||
**Command-Line Help:**
|
||||
|
||||
- ``-h``, ``--help``
|
||||
Show help message and exit.
|
||||
Displays comprehensive command syntax, parameter descriptions, and usage examples.
|
||||
|
||||
Examples
|
||||
++++++++
|
||||
|
||||
Convert one database to Perfetto trace:
|
||||
**Single Database Conversion to Perfetto Format:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python3 -m rocpd convert -i db1.db --output-format pftrace
|
||||
/opt/rocm/bin/rocpd convert -i db1.db --output-format pftrace
|
||||
|
||||
Convert two databases to Perfetto trace, set output path and filename, and limit to last 70% of trace:
|
||||
**Multi-Database Conversion with Temporal Filtering:**
|
||||
|
||||
Convert multiple databases to Perfetto format, specifying custom output directory and filename, with temporal window constraint to the final 70% of the trace duration:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python3 -m rocpd convert -i db1.db db2.db --output-format pftrace -d "./output/" -o "twoFileTraces" --start 30% --end 100%
|
||||
/opt/rocm/bin/rocpd convert -i db1.db db2.db --output-format pftrace -d "./output/" -o "twoFileTraces" --start 30% --end 100%
|
||||
|
||||
Convert six databases to CSV and Perfetto trace formats:
|
||||
**Batch Conversion to Multiple Formats:**
|
||||
|
||||
Process six database files simultaneously, generating both CSV and Perfetto trace outputs with custom output configuration:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python3 -m rocpd convert -i db{0..5}.db --output-format csv pftrace -d "$HOME/output_folder/" -o "sixFileTraces"
/opt/rocm/bin/rocpd convert -i db{0..5}.db --output-format csv pftrace -d "$HOME/output_folder/" -o "sixFileTraces"
|
||||
|
||||
Convert two databases to CSV, OTF2, and Perfetto trace formats:
|
||||
**Comprehensive Format Conversion:**
|
||||
|
||||
Convert multiple databases to all supported formats (CSV, OTF2, and Perfetto trace) in a single operation:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python3 -m rocpd convert -i db{3,4}.db --output-format csv otf2 pftrace
|
||||
/opt/rocm/bin/rocpd convert -i db{3,4}.db --output-format csv otf2 pftrace
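When conversions are driven from a script, it can help to assemble the argument list in one place. The sketch below builds a ``rocpd convert`` command using only the options documented in this section (``-i``, ``--output-format``, ``-d``, ``-o``, ``--start``, ``--end``). The function name ``build_convert_command`` is hypothetical, and the sketch assumes ``rocpd`` is reachable under the given executable name.

```python
SUPPORTED_FORMATS = ("csv", "pftrace", "otf2")


def build_convert_command(inputs, formats, output_path=None, output_file=None,
                          start=None, end=None, executable="rocpd"):
    """Assemble an argument list for `rocpd convert` without running it."""
    inputs = [str(path) for path in inputs]
    formats = list(formats)
    if not inputs:
        raise ValueError("at least one input database is required")
    unsupported = [fmt for fmt in formats if fmt not in SUPPORTED_FORMATS]
    if not formats or unsupported:
        raise ValueError(f"formats must be drawn from {SUPPORTED_FORMATS}")
    command = [executable, "convert", "-i", *inputs, "--output-format", *formats]
    if output_path is not None:
        command += ["-d", str(output_path)]
    if output_file is not None:
        command += ["-o", str(output_file)]
    if start is not None:
        command += ["--start", str(start)]
    if end is not None:
        command += ["--end", str(end)]
    return command
```

The returned list can be passed to ``subprocess.run(command, check=True)``. Because no shell is involved, brace patterns such as ``db{0..5}.db`` are not expanded, so pass the file names explicitly.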
|
||||
|
||||
|
||||
@@ -169,11 +169,17 @@ The following table lists the commonly used ``rocprofv3`` command-line options c
|
||||
|
||||
* - Other
|
||||
- | ``--preload`` PRELOAD |br| |br|
|
||||
| ``--minimum-output-data`` |br| |br|
|
||||
| ``--disable-signal-handlers``
|
||||
- | Specifies libraries to prepend to ``LD_PRELOAD``. It is useful for sanitizer libraries. |br| |br|
|
||||
| Output files are generated only if the output data size is greater than the minimum output data size. This can be used to control output generation so that users don't receive empty files. The input is in KB units. |br| |br|
|
||||
| Disables prioritization of the rocprofv3 signal handlers over application-installed signal handlers. When ``--disable-signal-handlers`` is set to true and the application has installed its own handler for SIGSEGV or a similar signal, the application's handler is used instead of the rocprofv3 handler. Note: glog still installs signal handlers that provide backtraces.
|
||||
| ``--minimum-output-data`` KB |br| |br|
|
||||
| ``--disable-signal-handlers`` [BOOL] |br| |br|
|
||||
| ``--rocm-root`` PATH |br| |br|
|
||||
| ``--sdk-soversion`` SDK_SOVERSION |br| |br|
|
||||
| ``--sdk-version`` SDK_VERSION
|
||||
- | Specifies libraries to prepend to ``LD_PRELOAD``. Useful for sanitizer libraries and custom instrumentation tools. Multiple libraries can be specified. |br| |br|
|
||||
| Specifies the minimum output data size threshold in KB. Output files are generated only if the collected profiling data exceeds this threshold. This prevents creation of empty or very small output files. Default is 0 (no threshold). |br| |br|
|
||||
| Controls signal handler prioritization. When set to true, disables rocprofv3 signal handler prioritization, allowing application signal handlers to take precedence. Useful for applications with custom crash handling or when integrating with testing frameworks. Default is false (rocprofv3 handlers have priority). |br| |br|
|
||||
| Specifies custom ROCm installation directory instead of automatic detection. Useful for multiple ROCm installations, custom builds, or non-standard locations. |br| |br|
|
||||
| Specifies the shared object version number for ROCProfiler SDK library resolution. Controls which major version of librocprofiler-sdk.so.X to use. |br| |br|
|
||||
| Specifies the exact version number for ROCProfiler SDK library resolution. Controls library selection with full semantic versioning (X.Y.Z format).
|
||||
|
||||
To see the exhaustive list of ``rocprofv3`` options:
|
||||
|
||||
@@ -1124,6 +1130,565 @@ Advanced options
|
||||
|
||||
``rocprofv3`` provides the following miscellaneous functionalities for improved control and flexibility.
|
||||
|
||||
Minimum output data threshold
|
||||
+++++++++++++++++++++++++++++
|
||||
|
||||
The ``--minimum-output-data`` option allows you to control the generation of output files by setting a minimum data size threshold. This prevents the creation of empty or very small output files that contain no meaningful profiling data.
|
||||
|
||||
When this option is specified, ``rocprofv3`` only generates output files if the collected data size exceeds the specified threshold. This is particularly useful in scenarios where:
|
||||
|
||||
- You're profiling applications that may have sporadic GPU activity
|
||||
- You want to avoid processing empty trace files in automated workflows
|
||||
- You're running batch jobs and only want meaningful results
|
||||
|
||||
To specify the minimum output data threshold, use the ``--minimum-output-data`` option followed by the size in KB:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --minimum-output-data 100 --hip-trace --output-format csv -- <application_path>
|
||||
|
||||
The preceding command only generates output files if the HIP trace data is larger than 100 KB.
|
||||
|
||||
**Example scenarios:**
|
||||
|
||||
**Scenario 1: Filtering out applications with minimal GPU activity**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Only generate output if kernel trace data > 50 KB
|
||||
rocprofv3 --minimum-output-data 50 --kernel-trace --output-format csv -- <application_path>
|
||||
|
||||
**Scenario 2: Batch profiling with meaningful data collection**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# For system tracing, only output files if data > 1 MB
|
||||
rocprofv3 --minimum-output-data 1024 --sys-trace --output-format pftrace -- <application_path>
|
||||
|
||||
**Using with input files:**
|
||||
|
||||
You can also specify this option in YAML or JSON input files:
|
||||
|
||||
.. code-block:: yaml

    jobs:
      - hip_trace: true
        kernel_trace: true
        minimum_output_data: 100
        output_format: ["csv", "json"]
        output_directory: "filtered_results"
|
||||
|
||||
.. code-block:: json

    {
      "jobs": [
        {
          "hip_trace": true,
          "kernel_trace": true,
          "minimum_output_data": 100,
          "output_format": ["csv", "json"],
          "output_directory": "filtered_results"
        }
      ]
    }
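For batch workflows that vary the threshold per run, the JSON input file can be generated rather than edited by hand. The following sketch reproduces the structure shown above with Python's standard ``json`` module. The function name ``make_input_config`` is hypothetical; the keys are the ones used in the example input files.

```python
import json


def make_input_config(minimum_output_data_kb, output_directory,
                      output_formats=("csv", "json")):
    """Serialize a single-job rocprofv3 input file as a JSON string."""
    job = {
        "hip_trace": True,
        "kernel_trace": True,
        "minimum_output_data": minimum_output_data_kb,
        "output_format": list(output_formats),
        "output_directory": output_directory,
    }
    return json.dumps({"jobs": [job]}, indent=2)
```

Write the returned string to a file and supply that file to ``rocprofv3`` as its input file.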
|
||||
|
||||
**Important notes:**
|
||||
|
||||
- The threshold applies to the raw profiling data size, not the final output file size
|
||||
- If multiple output formats are specified, the threshold check applies to each format independently
|
||||
- A value of 0 (default) means all output files are generated regardless of size
|
||||
- This option works with all tracing and counter collection modes
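The threshold rule in the notes above amounts to a single comparison. The sketch below is an illustration, not the ``rocprofv3`` implementation: ``should_write_output`` is a hypothetical name, and it assumes 1 KB means 1024 bytes and that the comparison is strict ("exceeds").

```python
def should_write_output(data_size_bytes, minimum_output_data_kb=0):
    """Decide whether output is generated for a given raw data size."""
    if minimum_output_data_kb < 0:
        raise ValueError("threshold must not be negative")
    if minimum_output_data_kb == 0:
        # Default: no threshold, so output is always generated.
        return True
    # Assumption: 1 KB = 1024 bytes, and the data size must exceed the threshold.
    return data_size_bytes > minimum_output_data_kb * 1024
```

Under these assumptions, a run that collects exactly 100 KB with ``--minimum-output-data 100`` produces no output, which is worth keeping in mind when choosing a threshold close to typical data sizes.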
|
||||
|
||||
This feature is especially valuable in automated testing environments where you want to ensure that only applications with meaningful GPU activity generate profiling outputs, reducing storage overhead and simplifying result analysis.
|
||||
|
||||
Signal handler control
|
||||
++++++++++++++++++++++
|
||||
|
||||
The ``--disable-signal-handlers`` option provides control over signal handling behavior in ``rocprofv3``, allowing you to manage how the profiler responds to system signals like SIGSEGV, SIGTERM, and others.
|
||||
|
||||
By default, ``rocprofv3`` installs its own signal handlers to ensure proper cleanup and data collection when the application encounters errors or is terminated. However, in some scenarios, you may want the application's own signal handlers to take precedence.
|
||||
|
||||
When ``--disable-signal-handlers`` is set to ``true``, ``rocprofv3`` disables the prioritization of its signal handlers over application-installed signal handlers. This means:
|
||||
|
||||
- If your application has custom signal handlers for SIGSEGV, SIGTERM, or similar signals, those handlers will be executed instead of ``rocprofv3``'s handlers
|
||||
- The application maintains full control over signal handling behavior
|
||||
- ``rocprofv3`` will still attempt to collect and save profiling data when possible
|
||||
|
||||
**Important note**: Even with this option enabled, the underlying ``glog`` library may still install signal handlers that provide stack backtraces for debugging purposes.
|
||||
|
||||
**Basic usage:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --disable-signal-handlers --hip-trace --output-format csv -- <application_path>
|
||||
|
||||
The preceding command disables ``rocprofv3`` signal handler prioritization, allowing the application's signal handlers to take precedence.
|
||||
|
||||
**Example scenarios:**
|
||||
|
||||
**Scenario 1: Application with custom crash handling**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# For applications that implement custom crash reporting or recovery
|
||||
rocprofv3 --disable-signal-handlers --sys-trace --output-format pftrace -- ./my_app_with_custom_handlers
|
||||
|
||||
**Scenario 2: Debugging applications with existing signal handlers**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# When debugging applications that rely on specific signal handling behavior
|
||||
rocprofv3 --disable-signal-handlers --kernel-trace --pmc SQ_WAVES -- ./debug_application
|
||||
|
||||
**Scenario 3: Integration with testing frameworks**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# For test frameworks that need to handle signals for test orchestration
|
||||
rocprofv3 --disable-signal-handlers --runtime-trace --output-directory test_results -- ./test_suite
|
||||
|
||||
**Using with input files:**
|
||||
|
||||
You can also specify this option in YAML or JSON input files:
|
||||
|
||||
.. code-block:: yaml

    jobs:
      - hip_trace: true
        kernel_trace: true
        disable_signal_handlers: true
        output_format: ["csv", "json"]
        output_directory: "custom_signal_handling"
|
||||
|
||||
.. code-block:: json

    {
      "jobs": [
        {
          "hip_trace": true,
          "kernel_trace": true,
          "disable_signal_handlers": true,
          "output_format": ["csv", "json"],
          "output_directory": "custom_signal_handling"
        }
      ]
    }
|
||||
|
||||
**When to use this option:**
|
||||
|
||||
**Use when:**
|
||||
- Your application has custom signal handlers that must execute
|
||||
- You're integrating with testing frameworks that manage signals
|
||||
- Debugging applications where signal handling behavior is critical
|
||||
- Working with applications that implement custom crash reporting
|
||||
|
||||
**Avoid when:**
|
||||
- You want ``rocprofv3`` to provide maximum protection against data loss
|
||||
- Your application doesn't have custom signal handlers
|
||||
- You're doing standard profiling where signal handling isn't a concern
|
||||
|
||||
**Example: Application with custom SIGSEGV handler**
|
||||
|
||||
If your application has a custom segmentation fault handler:
|
||||
|
||||
.. code-block:: cpp

    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>  // for exit()

    // Invoked on SIGSEGV in place of the default action.
    void custom_sigsegv_handler(int sig) {
        // printf() and exit() are not async-signal-safe; they keep this example
        // short, but production handlers should prefer write() and _exit().
        printf("Custom SIGSEGV handler called (signal %d)\n", sig);
        // Custom crash reporting logic
        exit(1);
    }

    int main() {
        // Register the handler before running any code that might fault.
        signal(SIGSEGV, custom_sigsegv_handler);

        // Application code that might trigger SIGSEGV
        return 0;
    }
|
||||
|
||||
Use ``--disable-signal-handlers`` to ensure your custom handler executes:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --disable-signal-handlers --hip-trace -- ./app_with_custom_handler
|
||||
|
||||
**Troubleshooting:**
|
||||
|
||||
- If profiling data appears incomplete with this option enabled, check if your application's signal handlers are properly saving or flushing data
|
||||
- Consider implementing explicit ``rocprofv3`` cleanup calls in your application's signal handlers if data integrity is important
|
||||
- Monitor application behavior to ensure custom signal handling doesn't interfere with profiling data collection
|
||||
|
||||
This option provides the flexibility needed for complex applications and testing environments while maintaining ``rocprofv3``'s core profiling functionality.
|
||||
|
||||
Library preloading
|
||||
+++++++++++++++++++
|
||||
|
||||
The ``--preload`` option allows you to specify additional libraries to prepend to the ``LD_PRELOAD`` environment variable. This is particularly useful when working with sanitizer libraries, debugging tools, or other instrumentation libraries that need to be loaded before the application starts.
|
||||
|
||||
``LD_PRELOAD`` is a powerful mechanism in Linux that allows you to load shared libraries before any other libraries, effectively intercepting and overriding function calls. The ``--preload`` option in ``rocprofv3`` provides a convenient way to manage this without manually setting environment variables.
|
||||
|
||||
**Basic usage:**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --preload /path/to/library.so --hip-trace --output-format csv -- <application_path>
|
||||
|
||||
The preceding command preloads the specified library and enables HIP tracing.
|
||||
|
||||
**Example scenarios:**
|
||||
|
||||
**Scenario 1: Using AddressSanitizer (ASan)**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Preload AddressSanitizer for memory error detection
|
||||
rocprofv3 --preload /usr/lib/x86_64-linux-gnu/libasan.so.5 --sys-trace -- ./my_application
|
||||
|
||||
**Scenario 2: Using ThreadSanitizer (TSan)**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Preload ThreadSanitizer for race condition detection
|
||||
rocprofv3 --preload /usr/lib/x86_64-linux-gnu/libtsan.so.0 --kernel-trace --pmc SQ_WAVES -- ./threaded_app
|
||||
|
||||
**Scenario 3: Multiple preloaded libraries**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Preload multiple libraries (custom profiler and sanitizer)
|
||||
rocprofv3 --preload /opt/custom/libprofiler.so /usr/lib/libasan.so --runtime-trace -- ./complex_app
|
||||
|
||||
**Scenario 4: Using MemorySanitizer (MSan)**
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Preload MemorySanitizer for uninitialized memory detection
|
||||
rocprofv3 --preload /usr/lib/x86_64-linux-gnu/libmsan.so.0 --hip-trace -- ./memory_intensive_app
|
||||
|
||||
**Using with input files:**
|
||||
|
||||
You can also specify this option in YAML or JSON input files:
|
||||
|
||||
.. code-block:: yaml

    jobs:
      - hip_trace: true
        kernel_trace: true
        preload:
          - "/usr/lib/x86_64-linux-gnu/libasan.so.5"
          - "/opt/custom/libprofiler.so"
        output_format: ["csv"]
|
||||
|
||||
.. code-block:: json

    {
      "jobs": [
        {
          "hip_trace": true,
          "kernel_trace": true,
          "preload": [
            "/usr/lib/x86_64-linux-gnu/libasan.so.5",
            "/opt/custom/libprofiler.so"
          ],
          "output_format": ["csv"]
        }
      ]
    }
|
||||
|
||||
**Common use cases:**
|
||||
|
||||
**Sanitizer libraries:**
|
||||
- AddressSanitizer (``libasan.so``) for memory error detection
|
||||
- ThreadSanitizer (``libtsan.so``) for race condition detection
|
||||
- MemorySanitizer (``libmsan.so``) for uninitialized memory detection
|
||||
- UndefinedBehaviorSanitizer (``libubsan.so``) for undefined behavior detection
|
||||
|
||||
**Debugging and profiling tools:**
|
||||
- Custom memory allocators (``jemalloc``, ``tcmalloc``)
|
||||
- Performance profiling libraries
|
||||
- Custom instrumentation libraries
|
||||
- Mock libraries for testing
|
||||
|
||||
**Third-party analysis tools:**
|
||||
- Valgrind replacement libraries
|
||||
- Custom logging frameworks
|
||||
- Security analysis tools
|
||||
|
||||
**Library order considerations:**
|
||||
|
||||
The order of libraries in ``--preload`` matters as they are processed in the order specified:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Library1 will be loaded before Library2
|
||||
rocprofv3 --preload /path/to/library1.so /path/to/library2.so --hip-trace -- ./app
|
||||
|
||||
**Environment variable interaction:**
|
||||
|
||||
The ``--preload`` option works alongside existing ``LD_PRELOAD`` settings:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# If LD_PRELOAD is already set, --preload libraries are prepended
|
||||
export LD_PRELOAD="/existing/library.so"
|
||||
rocprofv3 --preload /new/library.so --hip-trace -- ./app
|
||||
# Effective LD_PRELOAD: "/new/library.so:/existing/library.so"
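The prepend behavior shown above can be expressed as a short sketch. It illustrates the resulting value only and is not the ``rocprofv3`` implementation; ``prepend_preload`` is a hypothetical name, and the colon separator follows the example above.

```python
import os


def prepend_preload(libraries, existing=None):
    """Return the LD_PRELOAD value with `libraries` placed before existing entries."""
    if existing is None:
        # Fall back to whatever the current environment already defines.
        existing = os.environ.get("LD_PRELOAD", "")
    # Skip empty entries so an unset variable does not yield a stray colon.
    parts = list(libraries) + [entry for entry in existing.split(":") if entry]
    return ":".join(parts)
```

Because the dynamic loader resolves symbols from preloaded libraries in the order listed, the libraries given to ``--preload`` take precedence over those already present in ``LD_PRELOAD``.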
|
||||
|
||||
**Troubleshooting:**
|
||||
|
||||
- **Library not found**: Ensure the library path is correct and the library exists
|
||||
- **Symbol conflicts**: Check for conflicting symbols between preloaded libraries
|
||||
- **Performance impact**: Sanitizers can significantly slow down execution
|
||||
- **Memory usage**: Some tools like AddressSanitizer increase memory consumption substantially
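The "library not found" case can be caught before launching a long profiling run. The following sketch checks each path up front; ``missing_libraries`` is a hypothetical helper, and it only verifies that the files exist, not that they are loadable shared objects.

```python
from pathlib import Path


def missing_libraries(libraries):
    """Return the preload paths that do not point to an existing file."""
    return [lib for lib in libraries if not Path(lib).is_file()]
```

An empty result means every path passed to ``--preload`` exists; any returned path should be corrected before profiling.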

ROCm root path configuration
++++++++++++++++++++++++++++

The ``--rocm-root`` option lets you specify a custom ROCm installation directory instead of relying on the default relative path detection. This is useful when working with multiple ROCm installations, custom builds, or non-standard installation locations.

By default, ``rocprofv3`` automatically detects the ROCm installation path relative to its own location. However, in some environments, you may need to explicitly specify which ROCm installation to use.

**Basic usage:**

.. code-block:: bash

    rocprofv3 --rocm-root /opt/custom-rocm --hip-trace --output-format csv -- <application_path>

The preceding command uses the ROCm installation located at ``/opt/custom-rocm``.

**Example scenarios:**

**Scenario 1: Multiple ROCm versions**

.. code-block:: bash

    # Use ROCm 5.7.0 specifically
    rocprofv3 --rocm-root /opt/rocm-5.7.0 --sys-trace -- ./app_for_rocm_5_7

    # Use ROCm 6.0.0 for comparison
    rocprofv3 --rocm-root /opt/rocm-6.0.0 --sys-trace -- ./app_for_rocm_6_0

**Scenario 2: Custom ROCm build**

.. code-block:: bash

    # Use custom ROCm build with debugging symbols
    rocprofv3 --rocm-root /home/developer/rocm-debug-build --kernel-trace --pmc SQ_WAVES -- ./debug_app

**Scenario 3: Development environment**

.. code-block:: bash

    # Use locally built ROCm for development
    rocprofv3 --rocm-root /workspace/rocm-dev --runtime-trace -- ./test_application

**Scenario 4: Container environments**

.. code-block:: bash

    # Use ROCm mounted at custom location in container
    rocprofv3 --rocm-root /usr/local/rocm --hip-trace -- ./containerized_app

**Directory structure requirements:**

The specified ROCm root path should contain the standard ROCm directory structure:

.. code-block:: text

    /opt/custom-rocm/
    ├── bin/          # ROCm executables
    ├── lib/          # ROCm libraries
    ├── include/      # ROCm headers
    ├── share/        # Shared resources
    └── ...

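
The layout above can be checked with a short shell function before invoking the profiler. This is a sketch under the assumption that the four directories listed are a sufficient sanity check; ``validate_rocm_root`` is a hypothetical name, and the list is not an official requirement of ``rocprofv3``.

.. code-block:: bash

    # Hypothetical check: verify that a ROCm root contains the expected subdirectories.
    # The directory list mirrors the layout shown above and is an assumption.
    validate_rocm_root() {
        local root="$1"
        local dir
        for dir in bin lib include share; do
            if [ ! -d "$root/$dir" ]; then
                echo "missing directory: $root/$dir" >&2
                return 1
            fi
        done
        return 0
    }

    # Example usage:
    # validate_rocm_root /opt/custom-rocm &&
    #     rocprofv3 --rocm-root /opt/custom-rocm --hip-trace -- ./app
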

**Using with wrapper scripts:**

This option is typically passed on the command line, but it can also be set in a wrapper script:

.. code-block:: bash

    #!/bin/bash
    # profile_with_custom_rocm.sh
    ROCM_PATH="/opt/rocm-custom"
    rocprofv3 --rocm-root "$ROCM_PATH" -i input.yaml -- "$@"

**Environment variable interaction:**

The ``--rocm-root`` option overrides automatic path detection and environment variables like ``ROCM_PATH``:

.. code-block:: bash

    # --rocm-root takes precedence over environment variables
    export ROCM_PATH="/opt/rocm-default"
    rocprofv3 --rocm-root /opt/rocm-override --hip-trace -- ./app
    # Uses /opt/rocm-override, not /opt/rocm-default

**Validation and troubleshooting:**

- **Path validation**: Ensure the specified path contains a valid ROCm installation
- **Library compatibility**: Verify that the ROCm version is compatible with your application
- **Permission issues**: Check read permissions for the ROCm directory
- **Path format**: Use absolute paths to avoid ambiguity

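
One way to follow the absolute-path advice is to normalize the directory in the shell before passing it to ``--rocm-root``. The helper below is a minimal sketch; ``to_absolute_dir`` is a hypothetical name, and the relative path in the usage comment is illustrative only.

.. code-block:: bash

    # Hypothetical helper: convert a possibly relative directory into an absolute path.
    # Runs in a subshell so the caller's working directory is unchanged;
    # fails (non-zero status, no output) if the directory does not exist.
    to_absolute_dir() {
        (cd "$1" 2>/dev/null && pwd -P)
    }

    # Example usage:
    # rocprofv3 --rocm-root "$(to_absolute_dir ../rocm-dev)" --hip-trace -- ./app
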

SDK shared object version control
++++++++++++++++++++++++++++++++++

The ``--sdk-soversion`` option lets you specify the shared object version number for the ROCprofiler-SDK library. This provides precise control over which version of the library is loaded, which is useful for testing, compatibility verification, or working with specific library versions.

Shared object versioning follows the Linux convention in which libraries carry version suffixes such as ``.so.X``, where ``X`` is the major version number. This option helps resolve library paths when multiple versions are installed.

**Basic usage:**

.. code-block:: bash

    rocprofv3 --sdk-soversion 2 --hip-trace --output-format csv -- <application_path>

The preceding command uses ``librocprofiler-sdk.so.2`` instead of the default version.

**Example scenarios:**

**Scenario 1: Testing with specific library version**

.. code-block:: bash

    # Test application with SDK version 1
    rocprofv3 --sdk-soversion 1 --kernel-trace --pmc SQ_WAVES -- ./app_v1_test

    # Test same application with SDK version 2
    rocprofv3 --sdk-soversion 2 --kernel-trace --pmc SQ_WAVES -- ./app_v2_test

**Scenario 2: Compatibility verification**

.. code-block:: bash

    # Verify backward compatibility with older SDK
    rocprofv3 --sdk-soversion 0 --sys-trace -- ./legacy_application

**Scenario 3: Development and testing**

.. code-block:: bash

    # Use specific version for regression testing
    rocprofv3 --sdk-soversion 3 --runtime-trace --output-directory regression_test -- ./test_suite

**Scenario 4: Production environment pinning**

.. code-block:: bash

    # Pin to specific version for production consistency
    rocprofv3 --sdk-soversion 1 --hip-trace --minimum-output-data 100 -- ./production_app

**Library resolution behavior:**

With this option, the library is resolved in the following order:

1. ``librocprofiler-sdk.so.X`` (where ``X`` is the specified soversion)
2. Fall back to the default library if the specified version is not found

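
Before choosing a value for ``--sdk-soversion``, it can help to see which major versions are actually installed. The sketch below lists them from a library directory; ``list_sdk_soversions`` is a hypothetical helper, and the assumption that the libraries live in a directory such as ``<rocm-root>/lib`` should be checked against your installation.

.. code-block:: bash

    # Hypothetical helper: list the soversion suffixes found in a library directory.
    # Only names of the form librocprofiler-sdk.so.X (X an integer) are reported.
    list_sdk_soversions() {
        local libdir="$1"
        local f suffix
        for f in "$libdir"/librocprofiler-sdk.so.*; do
            [ -e "$f" ] || continue          # glob matched nothing
            suffix="${f##*.so.}"             # text after the final ".so."
            case "$suffix" in
                ''|*[!0-9]*) ;;              # skip X.Y.Z style names
                *) echo "$suffix" ;;
            esac
        done
    }

    # Example usage (the directory is an assumption):
    # list_sdk_soversions /opt/rocm/lib
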

**Using with scripts:**

.. code-block:: bash

    #!/bin/bash
    # test_matrix.sh - Test with multiple SDK versions
    for version in 0 1 2; do
        echo "Testing with SDK SO version $version"
        rocprofv3 --sdk-soversion "$version" --hip-trace -- ./test_app
    done

**Troubleshooting:**

- **Library not found**: Verify the specified soversion exists in the library path
- **ABI compatibility**: Ensure the SDK version is compatible with your ROCm installation
- **Symbol mismatches**: Check for symbol compatibility between versions
- **Performance differences**: Different versions may have different performance characteristics

SDK version specification
+++++++++++++++++++++++++

The ``--sdk-version`` option lets you specify the exact version number used to resolve the ROCprofiler-SDK library. This provides the finest level of control over library selection, which is useful for testing specific versions, development workflows, or ensuring reproducible profiling environments.

This option helps resolve library paths for version-specific libraries such as ``librocprofiler-sdk.so.X.Y.Z``, where ``X.Y.Z`` is the full semantic version.

**Basic usage:**

.. code-block:: bash

    rocprofv3 --sdk-version 1.2.3 --hip-trace --output-format csv -- <application_path>

The preceding command uses ``librocprofiler-sdk.so.1.2.3`` if available.

**Example scenarios:**

**Scenario 1: Exact version testing**

.. code-block:: bash

    # Test with specific patch version for bug verification
    rocprofv3 --sdk-version 2.1.5 --kernel-trace -- ./bug_reproduction_case

    # Test with fixed version
    rocprofv3 --sdk-version 2.1.6 --kernel-trace -- ./bug_verification_case

**Scenario 2: Reproducible profiling**

.. code-block:: bash

    # Ensure exact same SDK version for reproducible results
    rocprofv3 --sdk-version 2.2.1 --pmc SQ_WAVES GRBM_COUNT --output-format pftrace -- ./benchmark_app

**Version format support:**

The option accepts semantic version strings:

- **Semantic versioning**: ``1.2.3``, ``2.0.0``, ``1.5.10``

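
A script that passes user-supplied values to ``--sdk-version`` can reject malformed strings up front. The check below is a sketch that accepts only the ``MAJOR.MINOR.PATCH`` form shown in the examples; ``is_semantic_version`` is a hypothetical helper, and ``rocprofv3`` itself may accept or reject inputs differently.

.. code-block:: bash

    # Hypothetical check: accept only MAJOR.MINOR.PATCH strings made of digits.
    is_semantic_version() {
        printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'
    }

    # Example usage:
    # is_semantic_version "$version" || { echo "bad version: $version" >&2; exit 1; }
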

**Library resolution priority:**

When ``--sdk-version`` is specified, the library resolution follows this order:

1. ``librocprofiler-sdk.so.X.Y.Z`` (exact version match)
2. ``librocprofiler-sdk.so.X.Y`` (major.minor match)
3. ``librocprofiler-sdk.so.X`` (major version match)
4. Default library (``librocprofiler-sdk.so``)

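
The resolution order listed above can be made concrete by generating the candidate file names for a given version. The function below is only an illustration of that ordering using shell parameter expansion; ``sdk_library_candidates`` is a hypothetical name and does not reflect the internal implementation of ``rocprofv3``.

.. code-block:: bash

    # Hypothetical helper: print candidate library names for a version X.Y.Z,
    # most specific first, following the resolution order listed above.
    sdk_library_candidates() {
        local version="$1"
        local base="librocprofiler-sdk.so"
        local major="${version%%.*}"        # X   (drop everything from the first dot)
        local major_minor="${version%.*}"   # X.Y (drop the last dot-separated field)
        printf '%s\n' \
            "$base.$version" \
            "$base.$major_minor" \
            "$base.$major" \
            "$base"
    }

    sdk_library_candidates 2.1.5

For ``2.1.5`` this prints the four names from most to least specific, ending with the unversioned ``librocprofiler-sdk.so``.
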

**Using with scripts:**

Although the option is typically passed on the command line, it can also be scripted:

.. code-block:: bash

    #!/bin/bash
    # version_matrix_test.sh
    VERSIONS=("2.1.0" "2.1.1" "2.1.2" "2.2.0")

    for version in "${VERSIONS[@]}"; do
        echo "Testing SDK version $version"
        rocprofv3 --sdk-version "$version" --hip-trace --output-directory "results_$version" -- ./test_app
    done

**Combined with other version options:**

.. code-block:: bash

    # Combine with soversion for maximum control
    rocprofv3 --sdk-version 2.1.5 --sdk-soversion 2 --hip-trace -- ./app

    # Combine with custom ROCm root
    rocprofv3 --rocm-root /opt/rocm-6.0 --sdk-version 2.2.0 --sys-trace -- ./app

**Environment integration:**

.. code-block:: bash

    # Store the version in a user-defined environment variable
    export ROCPROF_SDK_VERSION="2.1.3"
    rocprofv3 --sdk-version "$ROCPROF_SDK_VERSION" --kernel-trace -- ./app

Agent index
++++++++++++++
