## Motivation

To let Optiq detect that counter tracks are of the same type, we aligned the `info_pmc` symbol naming across tracks of the same type. This makes it possible to group and categorize similar counter tracks and to use a consistent y-axis scale when plotting their values on charts.

## Technical Details

Replace unique or ordered symbol names with a common symbol name that is shared by all counters of the same type. The counter track name remains the unique identifier for each counter track. For example, the "symbol" field was "JpegAct_0" and is now "JpegAct".
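The following Python sketch illustrates the grouping this change enables. It buckets counter tracks by their common symbol and derives one shared y-axis range per bucket. It is a minimal sketch, not Optiq or rocprofiler-systems code, and several parts of it are hypothetical:

- the `(track_name, symbol, values)` tuple layout;
- the helper names `common_symbol` and `shared_y_ranges`;
- the track names.

The suffix-stripping step is also an assumption. It only lets traces written with the older ordered symbols (such as "JpegAct_0") be grouped the same way.

```python
import re
from collections import defaultdict

# Assumption: older ordered symbols end in "_<digits>" (e.g. "JpegAct_0").
# A real counter whose name legitimately ends in "_<digits>" would be
# mis-grouped by this heuristic.
_ORDINAL_SUFFIX = re.compile(r"_\d+$")


def common_symbol(symbol: str) -> str:
    """Return the counter-common symbol for a possibly ordered symbol name."""
    return _ORDINAL_SUFFIX.sub("", symbol)


def shared_y_ranges(tracks):
    """Group tracks by common symbol and compute one (min, max) range per group.

    `tracks` is an iterable of (track_name, symbol, values) tuples. The track
    name stays the unique identifier; the symbol identifies the counter type.
    """
    groups = defaultdict(list)
    for track_name, symbol, values in tracks:
        groups[common_symbol(symbol)].append((track_name, values))

    ranges = {}
    for symbol, members in groups.items():
        all_values = [v for _, values in members for v in values]
        if all_values:  # skip groups whose tracks have no samples
            ranges[symbol] = (min(all_values), max(all_values))
    return dict(groups), ranges


if __name__ == "__main__":
    # Illustrative data only; the track names are hypothetical.
    tracks = [
        ("track-0", "JpegAct", [0, 40, 75]),
        ("track-1", "JpegAct_1", [5, 90]),
    ]
    groups, ranges = shared_y_ranges(tracks)
    print(ranges)  # {'JpegAct': (0, 90)}
```

Because each track keeps its unique name inside its group, per-track identity is preserved. Only the axis scale is shared.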
# Changelog for ROCm Systems Profiler
Full documentation for ROCm Systems Profiler is available at https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/.
## ROCm Systems Profiler 1.5.0 for ROCm x.y.z (unreleased)
### Changed
- Simplified categorizing of similar `info_pmc` events by removing the ordinal suffix from the "symbol" field, for example, "JpegAct_0" -> "JpegAct".
## ROCm Systems Profiler 1.4.0 for ROCm 7.11.0
### Added
- Support for UCX (Unified Communication X) API tracing.
- Profiling and metric collection capabilities for XGMI and PCIe data.
- How-to document for XGMI and PCIe sampling and monitoring.
- Documentation for the `--trace-legacy`/`-L` CLI flag for direct tracing mode.
- Dependency on the `spdlog` library.
- `ROCPROFSYS_LOG_LEVEL` environment variable, which controls the logging level.
  - Available log levels: `critical`, `error`, `warning`, `info` (default), `debug`, `trace`, and `off`.
- CMake option `ROCPROFSYS_GFX_TARGETS`, which controls the GFX targets used to build the example binaries.
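The following Python sketch shows how a threshold-style log level typically filters messages. It assumes the `spdlog` convention: a configured level admits messages at that level and above, and `off` suppresses everything. The ordering list and the function `is_emitted` are hypothetical. They model the expected behavior of `ROCPROFSYS_LOG_LEVEL` and are not the profiler's code.

```python
# Assumed ordering, least to most severe, mirroring spdlog's level order.
# "off" sits above every message level, so it suppresses all output.
LEVELS = ["trace", "debug", "info", "warning", "error", "critical", "off"]


def is_emitted(message_level: str, configured_level: str = "info") -> bool:
    """Return True if a message at `message_level` passes the configured threshold.

    `configured_level` defaults to "info", the documented default level.
    Unknown level names raise ValueError via list.index.
    """
    if message_level == "off":
        raise ValueError("'off' is a threshold, not a message level")
    return LEVELS.index(message_level) >= LEVELS.index(configured_level)
```

Under this model, `warning` shows warnings, errors, and critical messages, and `debug` additionally shows info and debug output.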
### Changed
- `ROCPROFSYS_TRACE` now controls whether Perfetto tracing is enabled (default: `true` in tracing mode).
- `ROCPROFSYS_TRACE_LEGACY` controls whether to use legacy direct mode (`true`) or cached mode (`false`, default).
  - By default, tracing uses deferred trace generation (cached mode) for improved performance and minimal runtime overhead.
- The `--trace`/`-T` CLI flag enables tracing with cached mode by default.
- The `--trace-legacy`/`-L` CLI flag enables legacy direct mode for tracing.
- Changed thread storage allocation from a hard-coded 4096-element array to a size computed at compile time from the `ROCPROFSYS_MAX_THREADS` configuration flag.
- Changed the logging module to use the `spdlog` library.
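The following Python sketch summarizes how `ROCPROFSYS_TRACE` and `ROCPROFSYS_TRACE_LEGACY` interact. It models only what the entries above state:

- `ROCPROFSYS_TRACE` turns Perfetto tracing on or off.
- `ROCPROFSYS_TRACE_LEGACY` selects legacy direct mode (`true`) or the default cached mode (`false`).

The function name `resolve_trace_mode` and the accepted boolean spellings are assumptions for illustration, not the profiler's actual parsing logic.

```python
import os
from typing import Mapping, Optional

# Assumed boolean spellings; the real profiler may accept a different set.
_TRUE = {"1", "true", "on", "yes"}
_FALSE = {"0", "false", "off", "no"}


def _as_bool(value: Optional[str], default: bool) -> bool:
    """Parse an environment value as a boolean, falling back to a default."""
    if value is None:
        return default
    text = value.strip().lower()
    if text in _TRUE:
        return True
    if text in _FALSE:
        return False
    raise ValueError(f"not a boolean setting: {value!r}")


def resolve_trace_mode(env: Mapping[str, str]) -> str:
    """Hypothetical helper: map the two settings to 'off', 'cached', or 'legacy'."""
    # ROCPROFSYS_TRACE enables Perfetto tracing (default: true in tracing mode).
    if not _as_bool(env.get("ROCPROFSYS_TRACE"), default=True):
        return "off"
    # ROCPROFSYS_TRACE_LEGACY=true selects legacy direct mode;
    # false (the default) selects deferred, cached trace generation.
    if _as_bool(env.get("ROCPROFSYS_TRACE_LEGACY"), default=False):
        return "legacy"
    return "cached"


if __name__ == "__main__":
    print(resolve_trace_mode(os.environ))
```

With neither variable set, the sketch resolves to cached mode, which matches the documented default.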
### Resolved issues
- Fixed a segmentation fault that terminated the application when the number of created threads exceeded the `ROCPROFSYS_MAX_THREADS` configuration value.
- Fixed how `roctxRange` markers are handled in the `rocpd` output. The "push" and "pop" markers are now shown as a single event.
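The following Python sketch illustrates what showing a push/pop pair as a single event means. It matches nested push and pop markers per thread with a stack, then emits one event that spans both timestamps. The marker tuple layout `(kind, thread_id, timestamp_ns, message)` and the function `pair_ranges` are hypothetical. The sketch models the resulting output only and does not reflect how the `rocpd` writer is implemented.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical marker record: (kind, thread_id, timestamp_ns, message),
# where kind is "push" or "pop"; a pop carries no message of its own.
Marker = Tuple[str, int, int, str]
Event = Tuple[int, str, int, int]  # (thread_id, message, start_ns, end_ns)


def pair_ranges(markers: Iterable[Marker]) -> List[Event]:
    """Combine each push with its matching pop into a single spanning event."""
    stacks: Dict[int, List[Tuple[str, int]]] = {}
    events: List[Event] = []
    # Ranges nest per thread, so process markers in time order with one stack per thread.
    for kind, tid, ts, message in sorted(markers, key=lambda m: m[2]):
        stack = stacks.setdefault(tid, [])
        if kind == "push":
            stack.append((message, ts))
        elif kind == "pop":
            if not stack:
                continue  # unmatched pop: nothing to close, so skip it
            start_message, start_ts = stack.pop()
            events.append((tid, start_message, start_ts, ts))
    return events
```

For a nested pair, the inner range closes first, so the sketch emits the inner event before the outer one.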
### Removed
- `ROCPROFSYS_TRACE_CACHED` environment variable (tracing now uses cached mode by default when `ROCPROFSYS_TRACE_LEGACY=false`).
### Deprecated
- `ROCPROFSYS_USE_PERFETTO` environment variable (use `ROCPROFSYS_TRACE`).
- `ROCPROFSYS_VERBOSE` and `ROCPROFSYS_DEBUG` environment variables (use `ROCPROFSYS_LOG_LEVEL`).
## ROCm Systems Profiler 1.3.0 for ROCm 7.2.0
### Added
- `ROCPROFSYS_PERFETTO_FLUSH_PERIOD_MS` configuration setting to set the flush period for Perfetto traces. The default value is 10000 ms (10 seconds).
- Fetching of the `rocpd` schema from rocprofiler-sdk-rocpd.
### Changed
- Improved Fortran main function detection to ensure `rocprof-sys-instrument` uses the Fortran program main function instead of the C wrapper.
### Resolved issues
- Fixed a crash when running `rocprof-sys-python` with `ROCPROFSYS_USE_ROCPD` enabled.
- Fixed an issue where kernel and memory-copy events could appear on the wrong Perfetto track (for example, a queue track when stream grouping was requested) because `_group_by_queue` state leaked between records.
## ROCm Systems Profiler 1.2.1 for ROCm 7.1.1
### Resolved issues
- Fixed an issue where OpenMP Tools (OMPT) events, GPU performance counters, VA-API, MPI, and host events failed to be collected in the `rocpd` output.
## ROCm Systems Profiler 1.2.0 for ROCm 7.1.0
### Added
- `ROCPROFSYS_ROCM_GROUP_BY_QUEUE` configuration setting to allow grouping of events by hardware queue instead of the default grouping.
- Support for `rocpd` database output with the `ROCPROFSYS_USE_ROCPD` configuration setting.
- Support for profiling PyTorch workloads using the `rocpd` output database.
- Support for tracing OpenMP API in Fortran applications.
- A warning that is triggered if the profiler application fails because SELinux enforcement is enabled. The warning includes steps to disable SELinux enforcement.
### Changed
- Updated the grouping of "kernel dispatch" and "memory copy" events in Perfetto traces. They are now grouped together by HIP Stream rather than separately and by hardware queue.
- Updated PAPI module to v7.2.0b2.
- ROCprofiler-SDK is now used for tracing OMPT API calls.
## ROCm Systems Profiler 1.1.1 for ROCm 7.0.2
### Resolved issues
- Fixed an issue where ROC-TX ranges were displayed as two separate events instead of a single spanning event.
## ROCm Systems Profiler 1.1.0 for ROCm 7.0
### Added
- Profiling and metric collection capabilities for VCN engine activity, JPEG engine activity, and API tracing for rocDecode, rocJPEG, and VA-APIs.
- How-to document for VCN and JPEG activity sampling and tracing.
- Support for tracing Fortran applications.
- Support for tracing MPI API in Fortran.
### Changed
- Replaced ROCm SMI backend with AMD SMI backend for collecting GPU metrics.
- ROCprofiler-SDK is now used to trace RCCL API and collect communication counters.
  - Use the setting `ROCPROFSYS_USE_RCCLP = ON` to enable profiling and tracing of RCCL application data.
- Updated the Dyninst submodule to v13.0.
- Set the default value of `ROCPROFSYS_SAMPLING_CPUS` to `none`.
### Resolved issues
- Fixed GPU metric collection settings with `ROCPROFSYS_AMD_SMI_METRICS`.
- Fixed a build issue with CMake 4.
- Fixed incorrect kernel names shown for kernel dispatch tracks in Perfetto.
- Fixed formatting of some output logs.
- Fixed an issue where ROC-TX ranges were displayed as two separate events instead of a single spanning event.
## ROCm Systems Profiler 1.0.2 for ROCm 6.4.2
### Optimized
- Improved readability of the OpenMP target offload traces by showing them on a single Perfetto track.
### Resolved issues
- Fixed the file path to the script that merges Perfetto files from multi-process MPI runs. The script has also been renamed from `merge-multiprocess-output.sh` to `rocprof-sys-merge-output.sh`.
## ROCm Systems Profiler 1.0.1 for ROCm 6.4.1
### Added
- How-to document for network performance profiling for standard Network Interface Cards (NICs).
### Resolved issues
- Fixed a build issue with Dyninst on GCC 13.
## ROCm Systems Profiler 1.0.0 for ROCm 6.4.0
### Added
- Support for VA-API and rocDecode tracing.
- Aggregation of MPI data collected across distributed nodes and ranks. The data is concatenated into a single proto file.
### Changed
- Backend refactored to use ROCprofiler-SDK rather than ROCProfiler and ROCTracer.
### Resolved issues
- Fixed hardware counter summary files not being generated after profiling.
- Fixed an application crash when collecting performance counters with ROCProfiler.
- Fixed an interruption in config file generation.
- Fixed a segmentation fault while running `rocprof-sys-instrument`.
- Fixed an issue where running `rocprof-sys-causal` or using the `-I all` option with `rocprof-sys-sample` caused the system to become non-responsive.
- Fixed an issue where sampling multi-GPU Python workloads caused the system to stop responding.
## ROCm Systems Profiler 0.1.1 for ROCm 6.3.2
### Resolved issues
- Fixed an error when building from source on some SUSE and RHEL systems when using the `ROCPROFSYS_BUILD_DYNINST` option.
## ROCm Systems Profiler 0.1.0 for ROCm 6.3.1
### Added
- Improvements to support OMPT target offload.
### Resolved issues
- Fixed an issue with generated Perfetto files.
- Fixed an issue with merging multiple `.proto` files.
- Fixed an issue causing GPU resource data to be missing from traces of Instinct MI300A systems.
- Fixed a minor issue for users upgrading to ROCm 6.3 from 6.2 after the rename from `omnitrace`.
## ROCm Systems Profiler 0.1.0 for ROCm 6.3.0
### Changed
- Renamed Omnitrace to ROCm Systems Profiler.
## Omnitrace 1.11.2 for ROCm 6.2.1
### Known issues
- Perfetto can no longer open Omnitrace proto files. Loading the Perfetto trace output `.proto` file in `ui.perfetto.dev` can result in a dialog with the message, "Oops, something went wrong! Please file a bug." The information in the dialog refers to an "Unknown field type." The workaround is to open the files with the previous version of the Perfetto UI found at https://ui.perfetto.dev/v46.0-35b3d9845/#!/.