ROCm Profiling Data (RocPD)

The RocPD Python package provides a scriptable API for analyzing, summarizing, filtering, and merging tracing data collected with the ROCm profiling tools suite.

Background

In the past, the ROCm profiling tools (e.g. rocprofv3, rocprofiler-systems, etc.) have directly written data to various output formats such as CSV, JSON, Perfetto, OTF2, etc. This approach has a significant number of flaws:

No standardization in the CSV and JSON output formats

The ROCm profiling group considers standardizing the CSV and JSON output formats across all the tools to be a waste of time. Neither of these formats scales well when large amounts of profiling data are collected, due to the inherent overhead of parsing textual data as opposed to binary, the arcane simplicity of the CSV format, and the (general) requirement to parse/load the entire JSON file in order to perform any meaningful data processing.

Inability to unify output collected across multiple processes and nodes

Supporting the unification of output collected across multiple processes and nodes is a difficult endeavor. Communicating profiling information between processes, especially when the processes exist on separate nodes connected through a network, requires, at best, integration with or explicit support for the job launchers. The general expectation for profiling tools is that they work regardless of the user application's choice of process-level parallelism (e.g. MPI, fork, spawn, Python multi-processing, UPC, etc.) and job scheduler (e.g. SLURM, Flux, PBS/Torque, LSF, etc.). Adding explicit integration/support for this many flavors of parallelism and job schedulers is untenable. The most consistent aspect of multi-node jobs is a shared filesystem: it is considered a necessity for the user experience. Without a shared filesystem, the user would be responsible for transferring the application's input and output to/from the specific nodes the job scheduler decided to give them. Thus, the most reliable approach for in-process profiling tools is to generate (at least) one output file per process.

In order to unify the output collected across multiple processes, the one-output-per-process approach requires either (A) a post-processing step which combines the various outputs into a single output, (B) an output format which utilizes a single "metadata" file which links together the individual outputs, or (C) a visualizer which supports opening multiple files at once. The ROCm profiling group considers Option A to be the most flexible and reliable approach, since Option B requires a small amount of inter-process communication to write the "metadata" file and Option C imposes a rigid restriction on the choice of visualizer.

Data filtering at the data collection stage

With the direct-output-to-Perfetto approach in rocprofiler-systems and rocprofv3, if the tool collects 2 GB of tracing data per process in a multi-node job with 16 processes, Perfetto will struggle to visualize each individual 2 GB trace and will fail to load a combined 32 GB trace. In this situation, the user must re-run the application and collect less data; all of the tracing data from the previous run is effectively lost. However, if rocprofiler-systems and rocprofv3 instead wrote an intermediate output format and the Perfetto visualization were generated from it, the user would have a multitude of options to remedy this issue. For example, the user could filter out data (e.g. drop HSA functions from the trace), instruct the Perfetto generator to skip adding Perfetto debug annotations on the trace events, combine the 32 GB of data and split it into 32 separate visualizations based on time instead of processes, etc.
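To make the filtering and time-based splitting described above concrete, here is a minimal sketch using only Python's standard sqlite3 module. The `events` table, its columns, and the `split_by_time` helper are hypothetical and exist only for this illustration; they are not the RocPD schema or API.

```python
import sqlite3


def split_by_time(conn, n_chunks, drop_categories=()):
    """Bucket events into n_chunks equal-width time windows, skipping dropped categories."""
    placeholders = ",".join("?" for _ in drop_categories)
    where = f"WHERE category NOT IN ({placeholders})" if drop_categories else ""
    rows = conn.execute(
        f"SELECT name, category, start_ns, end_ns FROM events {where} ORDER BY start_ns",
        tuple(drop_categories),
    ).fetchall()
    chunks = [[] for _ in range(n_chunks)]
    if not rows:
        return chunks
    t0 = rows[0][2]                        # earliest start time
    t1 = max(row[2] for row in rows)       # latest start time
    span = t1 - t0 + 1
    width = max(1, -(-span // n_chunks))   # ceiling division
    for row in rows:
        chunks[(row[2] - t0) // width].append(row)
    return chunks


# Hypothetical intermediate data: (name, category, start_ns, end_ns)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (name TEXT, category TEXT, start_ns INTEGER, end_ns INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("hipMalloc", "HIP", 0, 10),
        ("hsa_queue_create", "HSA", 5, 8),
        ("kernel_a", "KERNEL", 20, 60),
        ("hipMemcpy", "HIP", 100, 130),
        ("kernel_b", "KERNEL", 150, 199),
    ],
)

# Drop HSA functions and split the remaining events into two time windows
chunks = split_by_time(conn, n_chunks=2, drop_categories=("HSA",))
for index, chunk in enumerate(chunks):
    print(index, [name for name, *_ in chunk])
```

Each chunk could then be handed to a separate visualization generator, without re-running the application.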

Absence of automated analysis

Certain formats such as Perfetto are great for visualization. However, they lack any automated analysis of the data. For example, a flat profile is an extremely useful companion when visually analyzing a trace, and other forms of automated analysis can quickly and easily detect anomalies.
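As an illustration of what a flat profile looks like once the data lives in a SQL database, the following sketch aggregates call counts and durations per function name. The `events` table and its columns are assumptions made for this example only and do not reflect the actual RocPD schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name TEXT, start_ns INTEGER, end_ns INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("kernel_a", 0, 40),
        ("kernel_a", 50, 90),
        ("kernel_b", 100, 110),
        ("hipMemcpy", 120, 130),
    ],
)

# Flat profile: one row per name with call count, total/average duration,
# and the share of the total measured time
FLAT_PROFILE = """
SELECT name,
       COUNT(*)                AS calls,
       SUM(end_ns - start_ns)  AS total_ns,
       AVG(end_ns - start_ns)  AS avg_ns,
       100.0 * SUM(end_ns - start_ns)
             / (SELECT SUM(end_ns - start_ns) FROM events) AS pct
FROM events
GROUP BY name
ORDER BY total_ns DESC, name
"""

profile = conn.execute(FLAT_PROFILE).fetchall()
for name, calls, total_ns, avg_ns, pct in profile:
    print(f"{name:12s} {calls:5d} {total_ns:8d} {avg_ns:8.1f} {pct:6.1f}%")
```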

Overview

RocPD is essentially a Python package that understands a standardized SQLite3 schema. The package is intended to provide a centralized place for a multitude of post-processing analysis capabilities. These capabilities include, but are not limited to, analyzing, summarizing, filtering, merging, and generating visualizations of tracing data. This design allows tools such as rocprofv3, rocprofiler-systems, rocprofiler-compute, etc. to focus on minimizing overhead during data collection and adding new data collection features. These tools simply need to write one SQL database per process that adheres to the agreed-upon RocPD SQL schema, and RocPD will handle the analysis and visualization of the data.

RocPD uses a unique approach to view multiple on-disk databases as a single database when performing queries. Python applications using RocPD must load the on-disk databases either by constructing a rocpd.importer.RocpdImportData object with a list of the database filepaths or by using the rocpd.connect function, which returns a rocpd.importer.RocpdImportData object.

Loading Databases Example

import rocpd

db_paths = ["A.db", "B.db"]  # one database per process
rpd_data = rocpd.connect(db_paths)
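For readers curious how several on-disk SQLite databases can be queried as if they were one, the sketch below shows one possible technique using only the standard sqlite3 module: attach each file to a single connection and expose a temporary view that unions the per-file tables. This is an illustration under stated assumptions, not a description of how RocPD implements it; the `make_db` and `connect_all` helpers and the two-column `kernels` table are hypothetical.

```python
import os
import sqlite3
import tempfile


def make_db(path, rows):
    """Create a small per-process database (hypothetical two-column schema)."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE kernels (name TEXT, duration_ns INTEGER)")
    db.executemany("INSERT INTO kernels VALUES (?, ?)", rows)
    db.commit()
    db.close()


def connect_all(paths):
    """Attach every file to one connection and union their kernels tables."""
    conn = sqlite3.connect(":memory:")
    selects = []
    for index, path in enumerate(paths):
        alias = f"db{index}"
        # Note: SQLite limits the number of attached databases (10 by default)
        conn.execute(f"ATTACH DATABASE ? AS {alias}", (path,))
        selects.append(f"SELECT '{alias}' AS source, * FROM {alias}.kernels")
    # A TEMP view may reference tables in attached databases
    conn.execute("CREATE TEMP VIEW kernels AS " + " UNION ALL ".join(selects))
    return conn


with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, "A.db"), os.path.join(tmp, "B.db")]
    make_db(paths[0], [("kernel_a", 40)])
    make_db(paths[1], [("kernel_a", 60), ("kernel_b", 10)])

    conn = connect_all(paths)
    # The query sees rows from both files through the single "kernels" view
    totals = conn.execute(
        "SELECT name, SUM(duration_ns) FROM kernels GROUP BY name ORDER BY name"
    ).fetchall()
    conn.close()

print(totals)
```

The extra `source` column in the view records which file each row came from, which keeps per-process attribution available after the data is unified.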

Executing Queries

The rocpd.importer.RocpdImportData object supports all of the same functions as sqlite3.Connection:

# Iterate directly over the rows returned by a query
for row in rpd_data.execute("SELECT * FROM kernels"):
    print(row)

# Or use an explicit cursor, as with sqlite3
cursor = rpd_data.cursor()
for row in cursor.execute("SELECT * FROM top").fetchall():
    print(row)