marantic-amd 7af2dba741 [rocprof-sys] Align rocpd symbols of the same counter types (#2675)
## Motivation

So that Optiq can detect when counter tracks are of the same type, `info_pmc` symbol naming is now aligned across tracks of the same type. This is useful for grouping and categorizing similar counter tracks and for setting a consistent y-axis scale when plotting their values on charts.

## Technical Details

Replace unique and/or ordered symbol names with a counter-common symbol name that is the same for all counters of the same type; the counter track name remains the unique identifier for each counter track. For example, the "symbol" field was "JpegAct_0" but is now "JpegAct".
2026-01-23 00:36:08 -05:00

Changelog for ROCm Systems Profiler

Full documentation for ROCm Systems Profiler is available at https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/.

ROCm Systems Profiler 1.5.0 for ROCm x.y.z (unreleased)

Changed

  • Simplified categorizing of like pmc_info events by removing the trailing index (such as _0) from the "symbol" field; for example, "JpegAct_0" is now "JpegAct".
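
As an illustration of why a shared symbol helps, the following minimal sketch groups counter tracks by symbol. It assumes the old per-instance symbols ended in an underscore followed by digits; the track names and the helper functions are hypothetical and are not part of the profiler.

```python
import re
from collections import defaultdict

# Assumption: old per-instance symbols end in "_<digits>", as in "JpegAct_0".
_INDEX_SUFFIX = re.compile(r"_\d+$")


def common_symbol(symbol: str) -> str:
    """Strip a trailing numeric index, e.g. "JpegAct_0" -> "JpegAct"."""
    return _INDEX_SUFFIX.sub("", symbol)


def group_tracks_by_symbol(tracks):
    """Group (track_name, symbol) pairs by their counter-common symbol."""
    groups = defaultdict(list)
    for track_name, symbol in tracks:
        groups[common_symbol(symbol)].append(track_name)
    return dict(groups)


# Hypothetical track names; only the "JpegAct" symbols come from the changelog.
tracks = [
    ("GPU 0 JPEG engine 0", "JpegAct_0"),
    ("GPU 0 JPEG engine 1", "JpegAct_1"),
    ("GPU 0 JPEG engine 2", "JpegAct"),
]
grouped = group_tracks_by_symbol(tracks)
```

With the aligned naming, a consumer no longer needs the suffix-stripping step: tracks that share a symbol can be grouped directly and plotted on a common y-axis scale, while the track name stays unique.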

ROCm Systems Profiler 1.4.0 for ROCm 7.11.0

Added

  • Support for UCX (Unified Communication X) API tracing.
  • Profiling and metric collection capabilities for XGMI and PCIe data.
  • How-to document for XGMI and PCIe sampling and monitoring.
  • Documentation for --trace-legacy / -L CLI flag for direct tracing mode.
  • Added a dependency on the spdlog library.
  • Added the environment variable ROCPROFSYS_LOG_LEVEL, which controls the logging level.
    • Available log levels: critical, error, warning, info (default), debug, trace, and off.
  • Added the CMake option ROCPROFSYS_GFX_TARGETS, which controls the GFX targets used to build example binaries.

Changed

  • ROCPROFSYS_TRACE now controls whether perfetto tracing is enabled (default: true when tracing mode).
  • ROCPROFSYS_TRACE_LEGACY controls whether to use legacy direct mode (true) or cached mode (false, default).
  • By default, tracing uses deferred trace generation (cached mode) for improved performance and minimal runtime overhead.
  • --trace / -T CLI flag enables tracing with cached mode by default.
  • --trace-legacy / -L CLI flag enables legacy direct mode for tracing.
  • Changed thread storage allocation from a hard-coded 4096-element array to a compile-time computed size derived from the ROCPROFSYS_MAX_THREADS configuration flag.
  • Changed the logging module to use the spdlog library.
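
The tracing and logging settings above can be combined into a process environment. The sketch below is one hedged way to do that from Python: the variable names and log levels come from these notes, while the helper function and the "true"/"false" spelling of boolean values are assumptions.

```python
import os

# Log levels as listed in these notes; "info" is the documented default.
LOG_LEVELS = ("critical", "error", "warning", "info", "debug", "trace", "off")


def tracing_env(legacy=False, log_level="info", base=None):
    """Return a copy of the environment with tracing and logging configured.

    legacy=False selects cached mode (the default); legacy=True selects
    legacy direct mode. The "true"/"false" value spelling is an assumption.
    """
    if log_level not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {log_level!r}")
    env = dict(os.environ if base is None else base)
    env["ROCPROFSYS_TRACE"] = "true"
    env["ROCPROFSYS_TRACE_LEGACY"] = "true" if legacy else "false"
    env["ROCPROFSYS_LOG_LEVEL"] = log_level
    return env


# Cached-mode tracing with debug logging, starting from an empty environment.
env = tracing_env(legacy=False, log_level="debug", base={})
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the profiler.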

Resolved issues

  • Fixed a segmentation fault that terminated the application when the number of created threads exceeded the ROCPROFSYS_MAX_THREADS configuration.
  • Fixed how roctxRange markers are handled in the rocpd output. The "push" and "pop" markers are now shown as a single event.
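
To illustrate what showing "push" and "pop" markers as a single event means, here is a minimal stack-based sketch that pairs markers into spans. It is not the profiler's implementation; the tuple layout, the `Span` type, and the handling of unmatched pops are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Span:
    name: str
    start_ns: int
    end_ns: int


def pair_ranges(markers):
    """Pair push/pop markers from one thread into single spanning events.

    markers: iterable of (kind, name, timestamp_ns) in time order, where
    kind is "push" or "pop" and name is None for pops.
    """
    stack = []
    spans = []
    for kind, name, timestamp_ns in markers:
        if kind == "push":
            stack.append((name, timestamp_ns))
        elif kind == "pop":
            if not stack:
                continue  # unmatched pop: skipped in this sketch
            open_name, start_ns = stack.pop()
            spans.append(Span(open_name, start_ns, timestamp_ns))
        else:
            raise ValueError(f"unknown marker kind: {kind!r}")
    return spans


markers = [
    ("push", "outer", 100),
    ("push", "inner", 150),
    ("pop", None, 300),
    ("pop", None, 400),
]
spans = pair_ranges(markers)
```

Spans are emitted in the order their ranges close, so the nested "inner" range appears before "outer".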

Removed

  • ROCPROFSYS_TRACE_CACHED environment variable (tracing now uses cached mode by default when ROCPROFSYS_TRACE_LEGACY=false).

Deprecated

  • ROCPROFSYS_USE_PERFETTO environment variable (use ROCPROFSYS_TRACE).
  • ROCPROFSYS_VERBOSE and ROCPROFSYS_DEBUG environment variables (use ROCPROFSYS_LOG_LEVEL).
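
When migrating existing job scripts, the removed and deprecated variables can be flagged with a small check. This is a sketch only: the variable names and replacements come from these notes, and the `audit_env` helper is hypothetical.

```python
# Deprecated variables and their replacements, as listed in these notes.
DEPRECATED = {
    "ROCPROFSYS_USE_PERFETTO": "ROCPROFSYS_TRACE",
    "ROCPROFSYS_VERBOSE": "ROCPROFSYS_LOG_LEVEL",
    "ROCPROFSYS_DEBUG": "ROCPROFSYS_LOG_LEVEL",
}

# Removed variables and a short note on what replaces them.
REMOVED = {
    "ROCPROFSYS_TRACE_CACHED": (
        "cached mode is the default when ROCPROFSYS_TRACE_LEGACY=false"
    ),
}


def audit_env(env):
    """Return one message per deprecated or removed variable found in env."""
    findings = []
    for name in sorted(env):
        if name in DEPRECATED:
            findings.append(f"{name} is deprecated; use {DEPRECATED[name]}")
        elif name in REMOVED:
            findings.append(f"{name} was removed; {REMOVED[name]}")
    return findings


findings = audit_env({
    "ROCPROFSYS_USE_PERFETTO": "ON",
    "ROCPROFSYS_TRACE_CACHED": "ON",
    "HOME": "/home/user",
})
```

Passing `os.environ` instead of a literal dictionary checks the current shell environment.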

ROCm Systems Profiler 1.3.0 for ROCm 7.2.0

Added

  • Added a ROCPROFSYS_PERFETTO_FLUSH_PERIOD_MS configuration setting to set the flush period for Perfetto traces. The default value is 10000 ms (10 seconds).
  • Added fetching of the rocpd schema from rocprofiler-sdk-rocpd.

Changed

  • Improved Fortran main function detection to ensure rocprof-sys-instrument uses the Fortran program main function instead of the C wrapper.

Resolved issues

  • Fixed a crash when running rocprof-sys-python with ROCPROFSYS_USE_ROCPD enabled.
  • Fixed an issue where kernel/memory-copy events could appear on the wrong Perfetto track (e.g., queue track when stream grouping was requested) because _group_by_queue state leaked between records.
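
The class of bug behind the last fix, a flag that is set while handling one record and then silently applies to later ones, can be shown with a generic sketch. This is not the profiler's code: the record layout and the fallback rule (records without a stream are grouped by queue) are hypothetical and exist only to make the leak visible.

```python
def track_for(record, group_by_queue):
    """Pick the track label for one record."""
    if group_by_queue:
        return f"queue {record['queue']}"
    return f"stream {record['stream']}"


def assign_tracks_leaky(records):
    """Buggy variant: the flag lives outside the loop and is never reset."""
    group_by_queue = False
    tracks = []
    for record in records:
        if record["stream"] is None:  # hypothetical fallback rule
            group_by_queue = True     # leaks into every later record
        tracks.append(track_for(record, group_by_queue))
    return tracks


def assign_tracks_fixed(records):
    """Fixed variant: the flag is recomputed for each record."""
    tracks = []
    for record in records:
        group_by_queue = record["stream"] is None
        tracks.append(track_for(record, group_by_queue))
    return tracks


records = [
    {"queue": 0, "stream": 1},
    {"queue": 0, "stream": None},
    {"queue": 1, "stream": 2},
]
leaky = assign_tracks_leaky(records)
fixed = assign_tracks_fixed(records)
```

In the leaky variant the third record lands on a queue track even though it has a stream; recomputing the flag per record keeps it on its stream track.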

ROCm Systems Profiler 1.2.1 for ROCm 7.1.1

Resolved issues

  • Fixed an issue where OpenMP Tools (OMPT) events, GPU performance counters, VA-API, MPI, and host events were not collected in the rocpd output.

ROCm Systems Profiler 1.2.0 for ROCm 7.1.0

Added

  • ROCPROFSYS_ROCM_GROUP_BY_QUEUE configuration setting to allow grouping of events by hardware queue, instead of the default grouping.
  • Support for rocpd database output with the ROCPROFSYS_USE_ROCPD configuration setting.
  • Support for profiling PyTorch workloads using the rocpd output database.
  • Support for tracing OpenMP API in Fortran applications.
  • A warning that is issued if the profiler application fails because SELinux enforcement is enabled. The warning includes steps to disable SELinux enforcement.
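
A quick way to inspect a rocpd output database is to list its tables. The sketch below assumes the rocpd output is a SQLite database file; the file name in the usage note is hypothetical, and no particular table names are assumed.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in a SQLite database, sorted by name."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

For example, `list_tables("results.db")` returns the table names of a database produced with ROCPROFSYS_USE_ROCPD enabled, which is a reasonable first step before writing queries against it. Note that `sqlite3.connect` creates an empty file if the path does not exist, so check the path first.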

Changed

  • Updated the grouping of "kernel dispatch" and "memory copy" events in Perfetto traces. They are now grouped together by HIP stream rather than separately by hardware queue.
  • Updated PAPI module to v7.2.0b2.
  • ROCprofiler-SDK is now used for tracing OMPT API calls.

ROCm Systems Profiler 1.1.1 for ROCm 7.0.2

Resolved issues

  • Fixed an issue where ROC-TX ranges were displayed as two separate events instead of a single spanning event.

ROCm Systems Profiler 1.1.0 for ROCm 7.0

Added

  • Profiling and metric collection capabilities for VCN engine activity, JPEG engine activity, and API tracing for rocDecode, rocJPEG, and VA-APIs.
  • How-to document for VCN and JPEG activity sampling and tracing.
  • Support for tracing Fortran applications.
  • Support for tracing MPI API in Fortran.

Changed

  • Replaced ROCm SMI backend with AMD SMI backend for collecting GPU metrics.
  • ROCprofiler-SDK is now used to trace RCCL API and collect communication counters.
    • Set ROCPROFSYS_USE_RCCLP=ON to enable profiling and tracing of RCCL application data.
  • Updated the Dyninst submodule to v13.0.
  • Set the default value of ROCPROFSYS_SAMPLING_CPUS to none.

Resolved issues

  • Fixed GPU metric collection settings with ROCPROFSYS_AMD_SMI_METRICS.
  • Fixed a build issue with CMake 4.
  • Fixed incorrect kernel names shown for kernel dispatch tracks in Perfetto.
  • Fixed formatting of some output logs.
  • Fixed an issue where ROC-TX ranges were displayed as two separate events instead of a single spanning event.

ROCm Systems Profiler 1.0.2 for ROCm 6.4.2

Optimized

  • Improved readability of the OpenMP target offload traces by showing them on a single Perfetto track.

Resolved issues

  • Fixed the file path to the script that merges Perfetto files from multi-process MPI runs. The script has also been renamed from merge-multiprocess-output.sh to rocprof-sys-merge-output.sh.

ROCm Systems Profiler 1.0.1 for ROCm 6.4.1

Resolved issues

  • Fixed a build issue with Dyninst on GCC 13.

ROCm Systems Profiler 1.0.0 for ROCm 6.4.0

Added

  • Support for VA-API and rocDecode tracing.

  • Aggregation of MPI data collected across distributed nodes and ranks. The data is concatenated into a single proto file.

Changed

  • Backend refactored to use ROCprofiler-SDK rather than ROCProfiler and ROCTracer.

Resolved issues

  • Fixed hardware counter summary files not being generated after profiling.

  • Fixed an application crash when collecting performance counters with ROCProfiler.

  • Fixed interruption in config file generation.

  • Fixed segmentation fault while running rocprof-sys-instrument.

  • Fixed an issue where running rocprof-sys-causal or using the -I all option with rocprof-sys-sample caused the system to become non-responsive.

  • Fixed an issue where sampling multi-GPU Python workloads caused the system to stop responding.

ROCm Systems Profiler 0.1.1 for ROCm 6.3.2

Resolved issues

  • Fixed an error when building from source on some SUSE and RHEL systems when using the ROCPROFSYS_BUILD_DYNINST option.

ROCm Systems Profiler 0.1.0 for ROCm 6.3.1

Added

  • Improvements to support OMPT target offload.

Resolved issues

  • Fixed an issue with generated Perfetto files.

  • Fixed an issue with merging multiple .proto files.

  • Fixed an issue causing GPU resource data to be missing from traces of Instinct MI300A systems.

  • Fixed a minor issue affecting users upgrading from ROCm 6.2 to 6.3 after the rename from Omnitrace.

ROCm Systems Profiler 0.1.0 for ROCm 6.3.0

Changed

  • Renamed Omnitrace to ROCm Systems Profiler.

Omnitrace 1.11.2 for ROCm 6.2.1

Known issues

  • Perfetto can no longer open Omnitrace proto files. Loading the Perfetto trace output .proto file in ui.perfetto.dev can result in a dialog with the message, "Oops, something went wrong! Please file a bug." The information in the dialog will refer to an "Unknown field type." The workaround is to open the files with the previous version of the Perfetto UI found at https://ui.perfetto.dev/v46.0-35b3d9845/#!/.