ROCm Profiling Data (RocPD)
The RocPD Python package provides a scriptable API for analyzing, summarizing, filtering, and merging tracing data collected with the ROCm profiling tools suite.
Background
In the past, the ROCm profiling tools (e.g. rocprofv3, rocprofiler-systems, etc.) have directly written data to various output formats such as CSV, JSON, Perfetto, OTF2, etc. This approach has a significant number of flaws:
No standardization in the CSV and JSON output formats
The ROCm profiling groups considers the standardization of the CSV and JSON output formats for all the tools as a waste of time. Neither of these data formats scale well when large amounts of profiling data is collected Due to the inherent overhead of parsing textual data as opposed to binary, the archane simplicity of the CSV format, and the (general) requirement to parse/load the entire JSON file in order to perform any meaningful data processing.
Inability to unify output collected across multiple processes and nodes
Supporting the unification of output collected across multiple processes and nodes is a difficult endeavor. The complexity of communicating profiling information between processes, especially when the processes exist on separate nodes connected through a network, at best, requires integration with the job launchers and/or explicit support for the job launchers. The general expectation for profiling tools is for them to work regardless of the user application's choice of process-level parallelism (e.g. MPI, fork, spawn, Python multi-processing, UPC, etc.) and job scheduler (e.g. SLURM, flux, PBS/Torque, LSD, etc.). Adding explicit integration/support for this many flavors of parallelism and jobs schedulers is untenable. The most consistent aspect of multi-node jobs is a shared filesystem: it is considered a necessity for the user experience. Without a shared filesystem, the user would be responsible for transferring the application's input and output to/from the specific nodes the job scheduler decided to give them. Thus, the most reliable output for in-process profiling tools is adopting the approach of generating (at least) one output file per process.
In order to unify the output colleted across multiple processes, the one-output-per-process approach requires either (A) a post-processing step which combines the various outputs into a single output, (B) an output format which utilizes a single "metadata" file which links together the individual outputs, or (C) a visualizer which supports opening multiple files at once. The ROCm profiling group considers Option A are the most flexible and reliable approach since Option B does require a small amount of inter-process communication to write the "metadata" file and Option C imposes a rigid restriction on the choice of visualizer.
Data filtering at the data collection stage
In rocprofiler-systems and rocprofv3 with the direct output to Perfetto approach, if the tool collects 2 GB of tracing data per-process in a multi-node job with 16 processes, Perfetto will struggle to visualize each individual 2 GB trace and fail to load a combined 32 GB trace. In this situation, the user must re-run the application and collect less data -- all of that tracing data from the previous run is effectively lost. However, if rocprofiler-systems and rocprofv3 were to adopt an intermediate output format approach and the Perfetto visualization is generated from this intermediate output format, the user would have a multitude of options to remedy this issue. For example, the user could filter out data (e.g. drop HSA functions from the trace), instruct the Perfetto generator to skip adding Perfetto debug annotations on the trace events, combine the 32 GB of data and split it into 32 separate visualizations based on time instead of processes, etc.
Absence of automated analysis
Certain formats such as Perfetto are great for visualization. However, they lack any automated analysis of the data. For example, a flat profile is an extremely useful companion when visually analyzing a trace and other forms of automated analysis can quickly and easily do anomaly detection.
Overview
RocPD is essentially a Python package which understands a standardized SQLite3 schema. This Python package intends to provide a centralized place for a multitude of post-process analysis capabilities. The capabilities include, but are not limited to, analyzing, summarizing, filtering, merging, and generating visualizations of tracing data. This design allows tools such as rocprofv3, rocprofiler-systems, rocprofiler-compute, etc. to focus on minimizing overhead during data collection and adding new data collection features. These tools simply need to write one SQL database per process which adheres to the agreed upon RocPD SQL schema and RocPD will handle the analysis and visualization of the data.
RocPD uses a unique approach to view multiple on-disk databases as a single-database when performing queries.
Python applications using RocPD must load the on-disk databases by constructing a rocpd.importer.RocpdImportData
object with a list of the database filepaths or by using the rocpd.connect function which returns a
rocpd.importer.RocpdImportData object.
Loading Databases Example
input = ["A.db", "B.db"]
rpd_data = rocpd.connect(input)
Executing Queries
The rocpd.importer.RocpdImportData object supports all of the same functions as sqlite3.Connection:
for itr in rpd_data.execute("SELECT * FROM kernels"):
print(f"{itr}")
cursor = rpd_data.cursor()
for itr in cursor.execute("SELECT * FROM top").fetchall():
print(f"{itr}")