
<!-- markdownlint-disable MD024 -->
# Changelog for ROCm Systems Profiler
Full documentation for ROCm Systems Profiler is available at [https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/).
## ROCm Systems Profiler 1.5.0 for ROCm x.y.z (unreleased)
### Changed
- Simplified the categorization of similar `pmc_info` events by removing the `_<idx>` suffix from the "symbol" field, e.g., "JpegAct_0" -> "JpegAct".
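
To illustrate the effect of this change, here is a minimal sketch of stripping a trailing `_<idx>` from a symbol and grouping by the result. It is not the profiler's implementation: the helper names `strip_index_suffix` and `group_by_symbol` are hypothetical, and the sketch assumes the index is always a run of decimal digits.

```python
import re

# Hypothetical helpers for illustration only; not part of rocprofiler-systems.
# Assumption: the index suffix is an underscore followed by decimal digits.
_INDEX_SUFFIX = re.compile(r"_\d+$")


def strip_index_suffix(symbol: str) -> str:
    """Return the symbol without a trailing "_<idx>", if one is present."""
    return _INDEX_SUFFIX.sub("", symbol)


def group_by_symbol(symbols):
    """Group the original symbols under their suffix-free name."""
    groups = {}
    for symbol in symbols:
        groups.setdefault(strip_index_suffix(symbol), []).append(symbol)
    return groups
```

With this sketch, "JpegAct_0" and "JpegAct_1" both fall under the single key "JpegAct", which is what makes categorizing them simpler.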
## ROCm Systems Profiler 1.4.0 for ROCm 7.11.0
### Added
- Support for UCX (Unified Communication X) API tracing.
- Profiling and metric collection capabilities for XGMI and PCIe data.
- How-to document for XGMI and PCIe sampling and monitoring.
- Documentation for `--trace-legacy` / `-L` CLI flag for direct tracing mode.
- Added a dependency on the `spdlog` library.
- Added environment variable `ROCPROFSYS_LOG_LEVEL`, which controls the logging level.
  - Available log levels: `critical`, `error`, `warning`, `info` (default), `debug`, `trace`, and `off`.
- Added CMake option `ROCPROFSYS_GFX_TARGETS`, which controls the GFX targets used to build example binaries.
### Changed
- `ROCPROFSYS_TRACE` now controls whether perfetto tracing is enabled (default: true when tracing mode).
- `ROCPROFSYS_TRACE_LEGACY` controls whether to use legacy direct mode (true) or cached mode (false, default).
- By default, tracing uses deferred trace generation (cached mode) for improved performance and minimal runtime overhead.
- `--trace` / `-T` CLI flag enables tracing with cached mode by default.
- `--trace-legacy` / `-L` CLI flag enables legacy direct mode for tracing.
- Changed thread storage allocation from a hard-coded 4096-element array to a compile-time computed size derived from the `ROCPROFSYS_MAX_THREADS` configuration flag.
- Changed logging module to use `spdlog` library.
### Resolved issues
- Fixed a segmentation fault that terminated the application when the number of created threads exceeded the `ROCPROFSYS_MAX_THREADS` configuration.
- Fixed how `roctxRange` markers are handled in the `rocpd` output. The "push" and "pop" markers are now shown as a single event.
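
The idea of showing "push" and "pop" markers as a single event can be sketched with a stack that pairs each pop with the most recent unmatched push. This is only an illustration under stated assumptions, not the code used for the `rocpd` output: the function name `merge_range_markers` and the `(kind, name, timestamp)` tuple layout are hypothetical.

```python
def merge_range_markers(markers):
    """Pair "push"/"pop" markers into single spanning events.

    Hypothetical input format (an assumption for this sketch): an iterable of
    (kind, name, timestamp) tuples in time order, where kind is "push" or
    "pop". The name of a "pop" marker is ignored.
    """
    stack = []   # unmatched pushes, innermost last
    events = []  # completed spans, in the order they close
    for kind, name, timestamp in markers:
        if kind == "push":
            stack.append((name, timestamp))
        elif kind == "pop":
            if not stack:
                continue  # unmatched pop: skipped in this sketch
            start_name, start = stack.pop()
            events.append({"name": start_name, "start": start, "end": timestamp})
        else:
            raise ValueError(f"unknown marker kind: {kind!r}")
    return events
```

Nested ranges close innermost first, so the inner span is emitted before the outer one.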
### Removed
- `ROCPROFSYS_TRACE_CACHED` environment variable (tracing now uses cached mode by default when `ROCPROFSYS_TRACE_LEGACY=false`).
### Deprecated
- `ROCPROFSYS_USE_PERFETTO` environment variable (use `ROCPROFSYS_TRACE`).
- `ROCPROFSYS_VERBOSE` and `ROCPROFSYS_DEBUG` environment variables (use `ROCPROFSYS_LOG_LEVEL`).
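
When migrating an existing setup, a small check like the following can flag variables affected by this release. It is a hedged sketch: the helper `check_environment` is hypothetical and not shipped with the profiler. Only the variable names and their replacements come from the entries above.

```python
# Variable names and replacements are taken from the 1.4.0 entries above.
# The helper itself is hypothetical and not part of rocprofiler-systems.
DEPRECATED = {
    "ROCPROFSYS_USE_PERFETTO": "ROCPROFSYS_TRACE",
    "ROCPROFSYS_VERBOSE": "ROCPROFSYS_LOG_LEVEL",
    "ROCPROFSYS_DEBUG": "ROCPROFSYS_LOG_LEVEL",
}
REMOVED = {"ROCPROFSYS_TRACE_CACHED"}


def check_environment(env):
    """Return one message per removed or deprecated variable found in env."""
    messages = []
    for name in sorted(env):
        if name in REMOVED:
            messages.append(f"{name} was removed in 1.4.0")
        elif name in DEPRECATED:
            messages.append(f"{name} is deprecated; use {DEPRECATED[name]}")
    return messages
```

Passing `os.environ` (after `import os`) checks the current process environment. The sketch deliberately does not translate values between the old and new variables, since that mapping is not described here.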
## ROCm Systems Profiler 1.3.0 for ROCm 7.2.0
### Added
- Added a `ROCPROFSYS_PERFETTO_FLUSH_PERIOD_MS` configuration setting to set the flush period for Perfetto traces. The default value is 10000 ms (10 seconds).
- Added fetching of the `rocpd` schema from rocprofiler-sdk-rocpd.
### Changed
- Improved Fortran main function detection to ensure `rocprof-sys-instrument` uses the Fortran program main function instead of the C wrapper.
### Resolved issues
- Fixed a crash when running `rocprof-sys-python` with `ROCPROFSYS_USE_ROCPD` enabled.
- Fixed an issue where kernel/memory-copy events could appear on the wrong Perfetto track (e.g., queue track when stream grouping was requested) because `_group_by_queue` state leaked between records.
## ROCm Systems Profiler 1.2.1 for ROCm 7.1.1
### Resolved issues
- Fixed an issue where OpenMP Tools (OMPT) events, GPU performance counters, VA-API, MPI, and host events were not collected in the `rocpd` output.
## ROCm Systems Profiler 1.2.0 for ROCm 7.1.0
### Added
- `ROCPROFSYS_ROCM_GROUP_BY_QUEUE` configuration setting to allow grouping of events by hardware queue, instead of the default grouping.
- Support for `rocpd` database output with the `ROCPROFSYS_USE_ROCPD` configuration setting.
- Support for profiling PyTorch workloads using the `rocpd` output database.
- Support for tracing OpenMP API in Fortran applications.
- An error warning that is triggered if the profiler application fails due to SELinux enforcement being enabled. The warning includes steps to disable SELinux enforcement.
### Changed
- Updated the grouping of "kernel dispatch" and "memory copy" events in Perfetto traces. They are now grouped together by HIP stream, rather than shown separately and grouped by hardware queue.
- Updated PAPI module to v7.2.0b2.
- ROCprofiler-SDK is now used for tracing OMPT API calls.
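
The difference between the default stream grouping and the `ROCPROFSYS_ROCM_GROUP_BY_QUEUE` option can be pictured with the sketch below. It is an illustration only: the function `group_into_tracks` and the event fields `name`, `stream`, and `queue` are assumptions made for this example, not the profiler's data model.

```python
from collections import defaultdict


def group_into_tracks(events, group_by_queue=False):
    """Group event names into tracks keyed by HIP stream or hardware queue.

    Hypothetical event format (an assumption for this sketch): dicts with
    "name", "stream", and "queue" keys. Kernel dispatch and memory copy
    events share a track when they share the grouping key.
    """
    key = "queue" if group_by_queue else "stream"
    tracks = defaultdict(list)
    for event in events:
        tracks[event[key]].append(event["name"])
    return dict(tracks)
```

Passing the grouping choice as an argument on every call, rather than keeping it in shared state, avoids one record's setting carrying over to the next.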
## ROCm Systems Profiler 1.1.1 for ROCm 7.0.2
### Resolved issues
- Fixed an issue where ROC-TX ranges were displayed as two separate events instead of a single spanning event.
## ROCm Systems Profiler 1.1.0 for ROCm 7.0
### Added
- Profiling and metric collection capabilities for VCN engine activity, JPEG engine activity, and API tracing for rocDecode, rocJPEG, and VA-APIs.
- How-to document for VCN and JPEG activity sampling and tracing.
- Support for tracing Fortran applications.
- Support for tracing MPI API in Fortran.
### Changed
- Replaced ROCm SMI backend with AMD SMI backend for collecting GPU metrics.
- ROCprofiler-SDK is now used to trace RCCL API and collect communication counters.
- Use the setting `ROCPROFSYS_USE_RCCLP = ON` to enable profiling and tracing of RCCL application data.
- Updated the Dyninst submodule to v13.0.
- Set the default value of `ROCPROFSYS_SAMPLING_CPUS` to `none`.
### Resolved issues
- Fixed GPU metric collection settings with `ROCPROFSYS_AMD_SMI_METRICS`.
- Fixed a build issue with CMake 4.
- Fixed incorrect kernel names shown for kernel dispatch tracks in Perfetto.
- Fixed formatting of some output logs.
- Fixed an issue where ROC-TX ranges were displayed as two separate events instead of a single spanning event.
## ROCm Systems Profiler 1.0.2 for ROCm 6.4.2
### Optimized
- Improved readability of the OpenMP target offload traces by showing them on a single Perfetto track.
### Resolved issues
- Fixed the file path to the script that merges Perfetto files from multi-process MPI runs. The script has also been renamed from `merge-multiprocess-output.sh` to `rocprof-sys-merge-output.sh`.
## ROCm Systems Profiler 1.0.1 for ROCm 6.4.1
### Added
- How-to document for [network performance profiling](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/amd-staging/how-to/nic-profiling.html) for standard Network Interface Cards (NICs).
### Resolved issues
- Fixed a build issue with Dyninst on GCC 13.
## ROCm Systems Profiler 1.0.0 for ROCm 6.4.0
### Added
- Support for VA-API and rocDecode tracing.
- Aggregation of MPI data collected across distributed nodes and ranks. The data is concatenated into a single proto file.
### Changed
- Backend refactored to use ROCprofiler-SDK rather than ROCProfiler and ROCTracer.
### Resolved issues
- Fixed hardware counter summary files not being generated after profiling.
- Fixed an application crash when collecting performance counters with ROCProfiler.
- Fixed interruption in config file generation.
- Fixed segmentation fault while running `rocprof-sys-instrument`.
- Fixed an issue where running `rocprof-sys-causal` or using the `-I all` option with `rocprof-sys-sample` caused the system to become non-responsive.
- Fixed an issue where sampling multi-GPU Python workloads caused the system to stop responding.
## ROCm Systems Profiler 0.1.1 for ROCm 6.3.2
### Resolved issues
- Fixed an error when building from source on some SUSE and RHEL systems when using the `ROCPROFSYS_BUILD_DYNINST` option.
## ROCm Systems Profiler 0.1.0 for ROCm 6.3.1
### Added
- Improvements to support OMPT target offload.
### Resolved issues
- Fixed an issue with generated Perfetto files.
- Fixed an issue with merging multiple `.proto` files.
- Fixed an issue causing GPU resource data to be missing from traces of Instinct MI300A systems.
- Fixed a minor issue for users upgrading to ROCm 6.3 from 6.2 post-rename from `omnitrace`.
## ROCm Systems Profiler 0.1.0 for ROCm 6.3.0
### Changed
- Renamed Omnitrace to ROCm Systems Profiler.
## Omnitrace 1.11.2 for ROCm 6.2.1
### Known issues
- Perfetto can no longer open Omnitrace proto files. Loading the Perfetto trace output `.proto` file in `ui.perfetto.dev` can
result in a dialog with the message, "Oops, something went wrong! Please file a bug." The information in the dialog will
refer to an "Unknown field type." The workaround is to open the files with the previous version of the Perfetto UI found
at <https://ui.perfetto.dev/v46.0-35b3d9845/#!/>.