Files
rocm-systems/projects/rocprofiler-systems/CHANGELOG.md
T
Kian Cossettini 698ac6b8bc [rocprofiler-systems] Add build option for "examples" to specify gfx-arch (#2626)
## Motivation
 - Added `check_rocminfo` function that returns true if the provided regex was found, false otherwise. Can also use `GET_OUTPUT` to get the raw output filtered with or without a regex.
 - Moved `rocprofiler_systems_get_gfx_archs()` to `MacroUtilities.cmake` 
 - Added `rocprofiler_systems_lookup_gfx()`, which detects whether a given `gfx` is from the `instinct`, `radeon` or `apu` family.
 - Added `ROCPROFSYS_GFX_TARGETS` as a build argument. Used to specify the offloading architectures that GPU examples should compile for. If empty, defaults to whatever your system has.
 - GPU examples now check if the given `gfx` targets (from `ROCPROFSYS_GFX_TARGETS`) are supported.
 - OMPVV offload tests now only compile if `amdflang` version is `>= 20`
 - Improve link time by reducing the number of GFX targets that binaries need to support.
   - RCCL is now passed a `GPU_TARGETS` var specifying the architectures to build/link against.
2026-01-20 12:13:21 -05:00

7.8 KiB

Changelog for ROCm Systems Profiler

Full documentation for ROCm Systems Profiler is available at https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/.

ROCm Systems Profiler 1.4.0 for ROCm x.y.z (unreleased)

Added

  • Documentation for --trace-legacy / -L CLI flag for direct tracing mode.
  • Added dependency to spdlog library.
  • Added environment variable ROCPROFSYS_LOG_LEVEL which control level of logging.
    • Available log levels: critical, error, warning, info(default), debug, trace and off.
  • Added cmake option ROCPROFSYS_GFX_TARGETS which controls GFX targets used to build example binaries.

Changed

  • ROCPROFSYS_TRACE now controls whether perfetto tracing is enabled (default: true when tracing mode).
  • ROCPROFSYS_TRACE_LEGACY controls whether to use legacy direct mode (true) or cached mode (false, default).
  • By default, tracing uses deferred trace generation (cached mode) for improved performance and minimal runtime overhead.
  • --trace / -T CLI flag enables tracing with cached mode by default.
  • --trace-legacy / -L CLI flag enables legacy direct mode for tracing.
  • Changed thread storage allocation from a hard-coded 4096-element array to a compile-time computed size derived from the ROCPROFSYS_MAX_THREADS configuration flag.
  • Changed logging module to use spdlog library.

Resolved issues

  • Fixed application termination with segfault when thread creation surpasses ROCPROFSYS_MAX_THREADS configuration.

Removed

  • ROCPROFSYS_TRACE_CACHED environment variable (tracing now uses cached mode by default when ROCPROFSYS_TRACE_LEGACY=false).

Deprecated

  • ROCPROFSYS_USE_PERFETTO environment variable (use ROCPROFSYS_TRACE).
  • ROCPROFSYS_VERBOSE and ROCPROFSYS_DEBUG environment variables (use ROCPROFSYS_LOG_LEVEL).

ROCm Systems Profiler 1.3.0 for ROCm 7.2.0

Added

  • Profiling and metric collection capabilities for XGMI and PCIe data.
  • How-to document for XGMI and PCIe sampling and monitoring.
  • Added a ROCPROFSYS_PERFETTO_FLUSH_PERIOD_MS configuration setting to set the flush period for Perfetto traces. The default value is 10000 ms (10 seconds).
  • Added fetching of the rocpd schema from rocprofiler-sdk-rocpd

Changed

  • Improved Fortran main function detection to ensure rocprof-sys-instrument uses the Fortran program main function instead of the C wrapper.

Resolved issues

  • Fixed a crash when running rocprof-sys-python with ROCPROFSYS_USE_ROCPD enabled.
  • Fixed an issue where kernel/memory-copy events could appear on the wrong Perfetto track (e.g., queue track when stream grouping was requested) because _group_by_queue state leaked between records.

ROCm Systems Profiler 1.2.1 for ROCm 7.1.1

Resolved issues

  • Fixed an issue of OpenMP Tools (OMPT) events, GPU performance counters, VA-API, MPI, and host events failing to be collected in the rocpd output.

ROCm Systems Profiler 1.2.0 for ROCm 7.1.0

Added

  • ROCPROFSYS_ROCM_GROUP_BY_QUEUE configuration setting to allow grouping of events by hardware queue, instead of the default grouping.
  • Support for rocpd database output with the ROCPROFSYS_USE_ROCPD configuration setting.
  • Support for profiling PyTorch workloads using the rocpd output database.
  • Support for tracing OpenMP API in Fortran applications.
  • An error warning that is triggered if the profiler application fails due to SELinux enforcement being enabled. The warning includes steps to disable SELinux enforcement.

Changed

  • Updated the grouping of "kernel dispatch" and "memory copy" events in Perfetto traces. They are now grouped together by HIP Stream rather than separately and by hardware queue.
  • Updated PAPI module to v7.2.0b2.
  • ROCprofiler-SDK is now used for tracing OMPT API calls.

ROCm Systems Profiler 1.1.1 for ROCm 7.0.2

Resolved issues

  • Fixed an issue where ROC-TX ranges were displayed as two separate events instead of a single spanning event.

ROCm Systems Profiler 1.1.0 for ROCm 7.0

Added

  • Profiling and metric collection capabilities for VCN engine activity, JPEG engine activity, and API tracing for rocDecode, rocJPEG, and VA-APIs.
  • How-to document for VCN and JPEG activity sampling and tracing.
  • Support for tracing Fortran applications.
  • Support for tracing MPI API in Fortran.

Changed

  • Replaced ROCm SMI backend with AMD SMI backend for collecting GPU metrics.
  • ROCprofiler-SDK is now used to trace RCCL API and collect communication counters.
    • Use the setting ROCPROFSYS_USE_RCCLP = ON to enable profiling and tracing of RCCL application data.
  • Updated the Dyninst submodule to v13.0.
  • Set the default value of ROCPROFSYS_SAMPLING_CPUS to none.

Resolved issues

  • Fixed GPU metric collection settings with ROCPROFSYS_AMD_SMI_METRICS.
  • Fixed a build issue with CMake 4.
  • Fixed incorrect kernel names shown for kernel dispatch tracks in Perfetto.
  • Fixed formatting of some output logs.
  • Fixed an issue where ROC-TX ranges were displayed as two separate events instead of a single spanning event.

ROCm Systems Profiler 1.0.2 for ROCm 6.4.2

Optimized

  • Improved readability of the OpenMP target offload traces by showing on a single Perfetto track.

Resolved issues

  • Fixed the file path to the script that merges Perfetto files from multi-process MPI runs. The script has also been renamed from merge-multiprocess-output.sh to rocprof-sys-merge-output.sh.

ROCm Systems Profiler 1.0.1 for ROCm 6.4.1

Added

Resolved issues

  • Fixed a build issue with Dyninst on GCC 13.

ROCm Systems Profiler 1.0.0 for ROCm 6.4.0

Added

  • Support for VA-API and rocDecode tracing.

  • Aggregation of MPI data collected across distributed nodes and ranks. The data is concatenated into a single proto file.

Changed

  • Backend refactored to use ROCprofiler-SDK rather than ROCProfiler and ROCTracer.

Resolved issues

  • Fixed hardware counter summary files not being generated after profiling.

  • Fixed an application crash when collecting performance counters with ROCProfiler.

  • Fixed interruption in config file generation.

  • Fixed segmentation fault while running rocprof-sys-instrument.

  • Fixed an issue where running rocprof-sys-causal or using the -I all option with rocprof-sys-sample caused the system to become non-responsive.

  • Fixed an issue where sampling multi-GPU Python workloads caused the system to stop responding.

ROCm Systems Profiler 0.1.1 for ROCm 6.3.2

Resolved issues

  • Fixed an error when building from source on some SUSE and RHEL systems when using the ROCPROFSYS_BUILD_DYNINST option.

ROCm Systems Profiler 0.1.0 for ROCm 6.3.1

Added

  • Improvements to support OMPT target offload.

Resolved issues

  • Fixed an issue with generated Perfetto files.

  • Fixed an issue with merging multiple .proto files.

  • Fixed an issue causing GPU resource data to be missing from traces of Instinct MI300A systems.

  • Fixed a minor issue for users upgrading to ROCm 6.3 from 6.2 post-rename from omnitrace.

ROCm Systems Profiler 0.1.0 for ROCm 6.3.0

Changed

  • Renamed Omnitrace to ROCm Systems Profiler.

Omnitrace 1.11.2 for ROCm 6.2.1

Known issues

  • Perfetto can no longer open Omnitrace proto files. Loading the Perfetto trace output .proto file in ui.perfetto.dev can result in a dialog with the message, "Oops, something went wrong! Please file a bug." The information in the dialog will refer to an "Unknown field type." The workaround is to open the files with the previous version of the Perfetto UI found at https://ui.perfetto.dev/v46.0-35b3d9845/#!/.