<!-- markdownlint-disable MD024 -->
# Changelog for ROCm Systems Profiler
Full documentation for ROCm Systems Profiler is available at [https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/).
## ROCm Systems Profiler 1.5.0 for ROCm x.y.z (unreleased)
### Changed
- Simplified categorization of similar `pmc_info` events by removing the `_<idx>` suffix from the "symbol" field. For example, "JpegAct_0" -> "JpegAct".
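
For scripts that post-process older output and want the same grouping, the renaming can be sketched as below. This is a minimal illustration, not the profiler's implementation: the helper name `strip_index_suffix` is hypothetical, and it assumes the index is an underscore followed by decimal digits at the end of the symbol.

```python
import re

# Assumption: the index suffix is "_" followed by decimal digits at the end
# of the symbol, as in the "JpegAct_0" example from the entry above.
_INDEX_SUFFIX = re.compile(r"_\d+$")


def strip_index_suffix(symbol: str) -> str:
    """Return the symbol without a trailing _<idx> suffix, if it has one."""
    return _INDEX_SUFFIX.sub("", symbol)
```

With this helper, `strip_index_suffix("JpegAct_0")` gives `"JpegAct"`, and a symbol with no numeric suffix is returned unchanged.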
## ROCm Systems Profiler 1.4.0 for ROCm 7.11.0
### Added
- Support for UCX (Unified Communication X) API tracing.
- Profiling and metric collection capabilities for XGMI and PCIe data.
- How-to document for XGMI and PCIe sampling and monitoring.
- Documentation for `--trace-legacy` / `-L` CLI flag for direct tracing mode.
- Added a dependency on the `spdlog` library.
- Added the environment variable `ROCPROFSYS_LOG_LEVEL`, which controls the logging level.
- Available log levels: `critical`, `error`, `warning`, `info` (default), `debug`, `trace`, and `off`.
- Added the CMake option `ROCPROFSYS_GFX_TARGETS`, which controls the GFX targets used to build example binaries.
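
As an illustration of the `ROCPROFSYS_LOG_LEVEL` entry above, a launcher script could validate the level before setting it. This is a sketch only: the helper name `env_with_log_level` is hypothetical, and the strict lowercase matching is an assumption, since this changelog does not say which spellings the profiler accepts.

```python
import os

# Level names exactly as listed in this changelog.
LOG_LEVELS = ("critical", "error", "warning", "info", "debug", "trace", "off")


def env_with_log_level(level, base_env=None):
    """Return a copy of base_env (default: os.environ) with the log level set.

    Assumption: only the lowercase names above are treated as valid here.
    """
    if level not in LOG_LEVELS:
        raise ValueError(f"unsupported log level: {level!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["ROCPROFSYS_LOG_LEVEL"] = level
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching a profiler command.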
### Changed
- `ROCPROFSYS_TRACE` now controls whether Perfetto tracing is enabled (default: true when in tracing mode).
- `ROCPROFSYS_TRACE_LEGACY` controls whether to use legacy direct mode (true) or cached mode (false, default).
- By default, tracing uses deferred trace generation (cached mode) for improved performance and minimal runtime overhead.
- `--trace` / `-T` CLI flag enables tracing with cached mode by default.
- `--trace-legacy` / `-L` CLI flag enables legacy direct mode for tracing.
- Changed thread storage allocation from a hard-coded 4096-element array to a compile-time computed size derived from the `ROCPROFSYS_MAX_THREADS` configuration flag.
- Changed the logging module to use the `spdlog` library.
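
The interaction between `ROCPROFSYS_TRACE` and `ROCPROFSYS_TRACE_LEGACY` described in this list can be summarized with a small sketch. It is a reading of the entries above, not the profiler's own logic: the function names, the accepted boolean spellings, and treating an unset `ROCPROFSYS_TRACE` as disabled are all assumptions.

```python
def _is_true(value):
    # Assumption: these spellings count as "true"; the profiler may accept others.
    return value.strip().lower() in ("1", "true", "on", "yes")


def trace_mode(env):
    """Classify tracing as 'disabled', 'legacy' (direct), or 'cached'."""
    # Assumption: an unset ROCPROFSYS_TRACE is treated as disabled in this sketch.
    if not _is_true(env.get("ROCPROFSYS_TRACE", "false")):
        return "disabled"
    # Legacy direct mode only when explicitly requested; cached mode otherwise.
    if _is_true(env.get("ROCPROFSYS_TRACE_LEGACY", "false")):
        return "legacy"
    return "cached"
```

Under these assumptions, an environment with only `ROCPROFSYS_TRACE=true` is classified as cached mode, and adding `ROCPROFSYS_TRACE_LEGACY=true` switches it to legacy direct mode.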
### Resolved issues
- Fixed a segmentation fault that terminated the application when the number of threads created exceeded the `ROCPROFSYS_MAX_THREADS` configuration.
- Fixed how `roctxRange` markers are handled in the `rocpd` output. The "push" and "pop" markers are now shown as a single event.
### Removed
- `ROCPROFSYS_TRACE_CACHED` environment variable (tracing now uses cached mode by default when `ROCPROFSYS_TRACE_LEGACY=false`).
### Deprecated
- `ROCPROFSYS_USE_PERFETTO` environment variable (use `ROCPROFSYS_TRACE`).
- `ROCPROFSYS_VERBOSE` and `ROCPROFSYS_DEBUG` environment variables (use `ROCPROFSYS_LOG_LEVEL`).
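
When updating job scripts for this release, a quick check for the removed and deprecated variables listed above may help. The sketch below only reports names and the replacements stated in this changelog; it does not translate values, because no value mapping is given here. The names `OUTDATED` and `find_outdated_settings` are hypothetical.

```python
import os

# Notes taken from the Removed and Deprecated lists for this release.
OUTDATED = {
    "ROCPROFSYS_TRACE_CACHED": "removed; cached mode is the default when ROCPROFSYS_TRACE_LEGACY=false",
    "ROCPROFSYS_USE_PERFETTO": "deprecated; use ROCPROFSYS_TRACE",
    "ROCPROFSYS_VERBOSE": "deprecated; use ROCPROFSYS_LOG_LEVEL",
    "ROCPROFSYS_DEBUG": "deprecated; use ROCPROFSYS_LOG_LEVEL",
}


def find_outdated_settings(env=None):
    """Return one message per outdated variable present in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [f"{name}: {note}" for name, note in OUTDATED.items() if name in env]
```

Calling `find_outdated_settings()` with no argument inspects the current process environment; an empty list means none of the listed variables are set.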
## ROCm Systems Profiler 1.3.0 for ROCm 7.2.0
### Added
- Added a `ROCPROFSYS_PERFETTO_FLUSH_PERIOD_MS` configuration setting to set the flush period for Perfetto traces. The default value is 10000 ms (10 seconds).
- Added fetching of the `rocpd` schema from `rocprofiler-sdk-rocpd`.
### Changed
- Improved Fortran main function detection to ensure `rocprof-sys-instrument` uses the Fortran program main function instead of the C wrapper.
### Resolved issues
- Fixed a crash when running `rocprof-sys-python` with `ROCPROFSYS_USE_ROCPD` enabled.
- Fixed an issue where kernel and memory-copy events could appear on the wrong Perfetto track (for example, on a queue track when stream grouping was requested) because the `_group_by_queue` state leaked between records.
## ROCm Systems Profiler 1.2.1 for ROCm 7.1.1
### Resolved issues
- Fixed an issue where OpenMP Tools (OMPT) events, GPU performance counters, VA-API, MPI, and host events were not collected in the `rocpd` output.
## ROCm Systems Profiler 1.2.0 for ROCm 7.1.0
### Added
- `ROCPROFSYS_ROCM_GROUP_BY_QUEUE` configuration setting to allow grouping of events by hardware queue, instead of the default grouping.
- Support for `rocpd` database output with the `ROCPROFSYS_USE_ROCPD` configuration setting.
- Support for profiling PyTorch workloads using the `rocpd` output database.
- Support for tracing OpenMP API in Fortran applications.
- A warning that is triggered if the profiler application fails because SELinux enforcement is enabled. The warning includes steps to disable SELinux enforcement.
### Changed
- Updated the grouping of "kernel dispatch" and "memory copy" events in Perfetto traces. They are now grouped together by HIP Stream rather than separately and by hardware queue.
- Updated PAPI module to v7.2.0b2.
- ROCprofiler-SDK is now used for tracing OMPT API calls.
## ROCm Systems Profiler 1.1.1 for ROCm 7.0.2
### Resolved issues
- Fixed an issue where ROC-TX ranges were displayed as two separate events instead of a single spanning event.
## ROCm Systems Profiler 1.1.0 for ROCm 7.0
### Added
- Profiling and metric collection capabilities for VCN engine activity, JPEG engine activity, and API tracing for rocDecode, rocJPEG, and VA-APIs.
- How-to document for VCN and JPEG activity sampling and tracing.
- Support for tracing Fortran applications.
- Support for tracing MPI API in Fortran.
### Changed
- Replaced ROCm SMI backend with AMD SMI backend for collecting GPU metrics.
- ROCprofiler-SDK is now used to trace RCCL API and collect communication counters.
- Use the setting `ROCPROFSYS_USE_RCCLP = ON` to enable profiling and tracing of RCCL application data.
- Updated the Dyninst submodule to v13.0.
- Set the default value of `ROCPROFSYS_SAMPLING_CPUS` to `none`.
### Resolved issues
- Fixed GPU metric collection settings with `ROCPROFSYS_AMD_SMI_METRICS`.
- Fixed a build issue with CMake 4.
- Fixed incorrect kernel names shown for kernel dispatch tracks in Perfetto.
- Fixed formatting of some output logs.
- Fixed an issue where ROC-TX ranges were displayed as two separate events instead of a single spanning event.
## ROCm Systems Profiler 1.0.2 for ROCm 6.4.2
### Optimized
- Improved readability of the OpenMP target offload traces by showing them on a single Perfetto track.
### Resolved issues
- Fixed the file path to the script that merges Perfetto files from multi-process MPI runs. The script has also been renamed from `merge-multiprocess-output.sh` to `rocprof-sys-merge-output.sh`.
## ROCm Systems Profiler 1.0.1 for ROCm 6.4.1
### Added
- How-to document for [network performance profiling](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/amd-staging/how-to/nic-profiling.html) for standard Network Interface Cards (NICs).
### Resolved issues
- Fixed a build issue with Dyninst on GCC 13.
## ROCm Systems Profiler 1.0.0 for ROCm 6.4.0
### Added
- Support for VA-API and rocDecode tracing.
- Aggregation of MPI data collected across distributed nodes and ranks. The data is concatenated into a single proto file.
### Changed
- Backend refactored to use ROCprofiler-SDK rather than ROCProfiler and ROCTracer.
### Resolved issues
- Fixed hardware counter summary files not being generated after profiling.
- Fixed an application crash when collecting performance counters with ROCProfiler.
- Fixed interruption in config file generation.
- Fixed a segmentation fault while running `rocprof-sys-instrument`.
- Fixed an issue where running `rocprof-sys-causal` or using the `-I all` option with `rocprof-sys-sample` caused the system to become non-responsive.
- Fixed an issue where sampling multi-GPU Python workloads caused the system to stop responding.
## ROCm Systems Profiler 0.1.1 for ROCm 6.3.2
### Resolved issues
- Fixed an error when building from source on some SUSE and RHEL systems when using the `ROCPROFSYS_BUILD_DYNINST` option.
## ROCm Systems Profiler 0.1.0 for ROCm 6.3.1
### Added
- Improvements to support OMPT target offload.
### Resolved issues
- Fixed an issue with generated Perfetto files.
- Fixed an issue with merging multiple `.proto` files.
- Fixed an issue causing GPU resource data to be missing from traces of Instinct MI300A systems.
- Fixed a minor issue for users upgrading to ROCm 6.3 from 6.2 post-rename from `omnitrace`.
## ROCm Systems Profiler 0.1.0 for ROCm 6.3.0
### Changed
- Renamed Omnitrace to ROCm Systems Profiler.
## Omnitrace 1.11.2 for ROCm 6.2.1
### Known issues
- Perfetto can no longer open Omnitrace proto files. Loading the Perfetto trace output `.proto` file in `ui.perfetto.dev` can
result in a dialog with the message, "Oops, something went wrong! Please file a bug." The information in the dialog will
refer to an "Unknown field type." The workaround is to open the files with the previous version of the Perfetto UI found
at <https://ui.perfetto.dev/v46.0-35b3d9845/#!/>.