# Changelog for ROCm Compute Profiler

Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/).

## Unreleased

### Added

* Add `rocpd` choice for `--format-rocprof-output` option in profile mode
* Add `--retain-rocpd-output` option in profile mode to save large raw rocpd databases in the workload directory
* Show description of metrics during analysis
  * Use `--include-cols Description` to show the Description column, which is excluded by default from the ROCm Compute Profiler CLI output.
* Add missing counters based on the register specification, which enables previously missing metrics
  * Enable SQC_DCACHE_INFLIGHT_LEVEL counter and associated metrics
  * Enable TCP_TCP_LATENCY counter and associated counter for all GPUs except MI300

### Changed

* Add notice for change in default output format to `rocpd` in a future release
  * This is displayed when `--format-rocprof-output rocpd` is not used in profile mode
  * When `--format-rocprof-output rocpd` is used, only `pmc_perf.csv` will be written to the workload directory instead of multiple CSV files.
* Improve analysis block based filtering to accept metric ID level filtering
  * This can be used to collect individual metrics from various sections of the analysis config
* CLI analysis mode baseline comparison will now only compare common metrics across workloads and will not show Metric ID
* Remove metrics from analysis configuration files which are explicitly marked as empty or None
* Change the basic view of TUI from aggregated analysis data to individual kernel analysis data
* Update `Unit` of the following `Bandwidth` related metrics to `Gbps` instead of `Bytes per Normalization Unit`
  * Theoretical Bandwidth (section 1202)
  * L1I-L2 Bandwidth (section 1303)
  * sL1D-L2 BW (section 1403)
  * Cache BW (section 1603)
  * L1-L2 BW (section 1603)
  * Read BW (section 1702)
  * Write and Atomic BW (section 1702)
  * Bandwidth (section 1703)
  * Atomic/Read/Write Bandwidth (section 1703)
  * Atomic/Read/Write Bandwidth - (HBM/PCIe/Infinity Fabric) (section 1706)
* Add `Utilization` to metric name for the following `Bandwidth` related metrics whose `Unit` is `Percent`
  * Theoretical Bandwidth Utilization (section 1201)
  * L1I-L2 Bandwidth Utilization (section 1301)
  * Bandwidth Utilization (section 1301)
  * Bandwidth Utilization (section 1401)
  * sL1D-L2 BW Utilization (section 1401)
  * Bandwidth Utilization (section 1601)
* Update `System Speed-of-Light` panel to `GPU Speed-of-Light` in TUI with the following metrics:
  * Theoretical LDS Bandwidth
  * vL1D Cache BW
  * L2 Cache BW
  * L2-Fabric Read BW
  * L2-Fabric Write BW
  * Kernel Time
  * Kernel Time (Cycles)
  * SIMD Utilization
  * Clock Rate
* Add `Compute Throughput` panel to TUI with the following metrics:
  * VALU FLOPs
  * VALU IOPs
  * MFMA FLOPs (F8)
  * MFMA FLOPs (BF16)
  * MFMA FLOPs (F16)
  * MFMA FLOPs (F32)
  * MFMA FLOPs (F64)
  * MFMA FLOPs (F6F4) (in gfx950)
  * MFMA IOPs (Int8)
  * SALU Utilization
  * VALU Utilization
  * MFMA Utilization
  * VMEM Utilization
  * Branch Utilization
  * IPC
* Add `Memory Throughput` panel to TUI with the following metrics:
  * vL1D Cache BW
  * vL1D Cache Utilization
  * Theoretical LDS Bandwidth
  * LDS Utilization
  * L2 Cache BW
  * L2 Cache Utilization
  * L2-Fabric Read BW
  * L2-Fabric Write BW
  * sL1D Cache BW
  * L1I BW
  * Address Processing Unit Busy
  * Data-Return Busy
  * L1I-L2 Bandwidth
  * sL1D-L2 BW

### Resolved issues

* Fixed an issue where the memory clock was not detected when using amd-smi
* Fixed standalone GUI crashing
* Fixed L2 read/write/atomic bandwidths on MI350
* Updated metric names for better alignment between analysis configuration and documentation
* Fixed an issue where accumulation counters could not be collected on AMD Instinct MI100

### Known issues

### Optimized

* Improved `--time-unit` option in analyze mode to apply time unit conversion across all analysis sections, not just kernel top stats.
* Improved the logic for obtaining rocprof-supported counters, which prevents unnecessary warnings

### Removed

* Usage of rocm-smi
* Hardware IP block based filtering has been removed in favor of analysis report block based filtering
* Remove aggregated analysis view from TUI mode

## ROCm Compute Profiler 3.2.3 for ROCm 7.0.0

### Added

#### CDNA4 (AMD Instinct MI350/MI355) support

* Support for AMD Instinct MI350 series GPUs with the addition of the following counters:
  * VALU co-issue (two VALUs are issued instructions) efficiency
  * Stream Processor Instruction (SPI) Wave Occupancy
  * Scheduler-Pipe Wave Utilization
  * Scheduler FIFO Full Rate
  * CPC ADC Utilization
  * F6F4 data type metrics
  * Update formula for total FLOPs to take F6F4 ops into account
  * LDS STORE, LDS LOAD, LDS ATOMIC instruction count metrics
  * LDS STORE, LDS LOAD, LDS ATOMIC bandwidth metrics
  * LDS FIFO full rate
  * Sequencer -> TA ADDR Stall rates
  * Sequencer -> TA CMD Stall rates
  * Sequencer -> TA DATA Stall rates
  * L1 latencies
  * L2 latencies
  * L2 to EA stalls
  * L2 to EA stalls per channel
* Roofline support for AMD Instinct MI350 series architecture.

#### Textual User Interface (TUI) (beta version)

* Text User Interface (TUI) support for analyze mode
  * A command line based user interface to support interactive single-run analysis
  * To launch, use `--tui` option in analyze mode. For example, `rocprof-compute analyze --tui`.

#### PC Sampling (beta version)

* Stochastic (hardware-based) PC sampling has been enabled for AMD Instinct MI300X series and later accelerators.
* Host-trap PC sampling has been enabled for AMD Instinct MI200 series and later accelerators.
* Support for sorting of PC sampling by type: offset or count.
* PC sampling support in CLI and TUI analysis.

#### Roofline

* Support for Roofline plot on CLI (single run) analysis.
* Roofline support for RHEL 10 OS.
* FP4 and FP6 data types have been added for roofline profiling on AMD Instinct MI350 series.

#### rocprofv3 support

* `rocprofv3` is supported as the default backend for profiling.
* Support to obtain performance information for all channels for TCC counters.
* Support for profiling on AMD Instinct MI100 using `rocprofv3`.
* Deprecation warning for `rocprofv3` interface in favor of the ROCprofiler-SDK interface, which directly accesses the `rocprofv3` C++ tool.

#### Others

* Docker files to package the application and dependencies into a single portable and executable standalone binary file.
* Analysis report based filtering
  * `-b` option in profile mode now also accepts metric ID(s) for analysis report based filtering.
  * `-b` option in profile mode also accepts hardware IP block for filtering; however, this filter support will be deprecated soon.
  * `--list-metrics` option added in profile mode to list possible metric ID(s), similar to analyze mode.
* Support for MEM chart on CLI (single run)
* `--specs-correction` option to provide missing system specifications for analysis.

### Changed

* Changed the default `rocprof` version to `rocprofv3`. This is used when the environment variable `ROCPROF` is not set.
* Changed `normal_unit` default to `per_kernel`.
* Decreased profiling time by not collecting unused counters in post-analysis.
* Updated Dash to >=3.0.0 (for web UI).
* Changed the condition when Roofline PDFs are generated during general profiling and `--roof-only` profiling (skip only when `--no-roof` option is present).
* Updated Roofline binaries:
  * Rebuilt using the latest ROCm stack
  * Minimum supported OS distributions for the roofline feature are now Ubuntu 22.04, RHEL 8, and SLES 15 SP6.

### Optimized

* ROCm Compute Profiler CLI has been improved to better display the GPU architecture analytics.

### Resolved issues

* Fixed kernel name and kernel dispatch filtering when using `rocprofv3`.
* Fixed an issue with TCC channel counter collection in `rocprofv3`.
* Fixed peak FLOPS of F8, I8, F16, and BF16 on AMD Instinct MI300.
* Fixed an issue where the memory clock was not detected when using amd-smi.
* Fixed standalone GUI crashing.
* Fixed L2 read/write/atomic bandwidths on MI350.

### Known issues

* On AMD Instinct MI100, accumulation counters are not collected, resulting in the following metrics failing to show up in the analysis: Instruction Fetch Latency, Wavefront Occupancy, and LDS Latency.
  * As a workaround, set the environment variable `ROCPROF=rocprof` to use `rocprof v1` for profiling on AMD Instinct MI100.
* GPU ID filtering is not supported when using `rocprofv3`.
* Analysis of previously collected workload data will not work due to a `sysinfo.csv` schema change.
  * As a workaround, re-run the profiling operation for the workload, interrupt the process after 10 seconds, and then copy the `sysinfo.csv` file from the new data folder to the old one. This assumes your system specification hasn't changed since the creation of the previous workload data.
* Analysis of new workloads might require providing shader/memory clock speeds using the `--specs-correction` option if amd-smi or rocminfo does not provide them.
* Memory chart on ROCm Compute Profiler CLI might look corrupted if the CLI width is too narrow.
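
The `sysinfo.csv` workaround listed above for previously collected workload data can be scripted. The following is a minimal Python sketch, not part of ROCm Compute Profiler: the helper name `copy_sysinfo` is hypothetical, and it assumes both arguments are workload folders produced by profile mode on the same, unchanged system. Keeping a `.bak` copy of the old file is an added precaution, not part of the documented workaround.

```python
import shutil
from pathlib import Path


def copy_sysinfo(new_workload_dir, old_workload_dir):
    """Copy sysinfo.csv from a fresh (interrupted) profiling run into an older workload folder.

    Hypothetical helper: assumes both directories are ROCm Compute Profiler
    workload folders and that the system specification has not changed.
    """
    src = Path(new_workload_dir) / "sysinfo.csv"
    dst = Path(old_workload_dir) / "sysinfo.csv"
    if not src.is_file():
        raise FileNotFoundError(f"{src} does not exist")
    if not Path(old_workload_dir).is_dir():
        raise NotADirectoryError(f"{old_workload_dir} is not a directory")
    # Precaution (not part of the documented workaround): keep the old file as sysinfo.csv.bak.
    if dst.exists():
        shutil.copy2(dst, dst.with_name(dst.name + ".bak"))
    # Overwrite the old-schema sysinfo.csv with the one written by the new run.
    shutil.copy2(src, dst)
    return dst
```

For example, `copy_sysinfo("workloads/new_run/MI300", "workloads/old_run/MI300")`, where both paths are placeholders for your own workload directories.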

### Removed

* Roofline support for Ubuntu 20.04 and SLES below 15.6
* Removed support for AMD Instinct MI50 and MI60.

### Upcoming changes

* `rocprof v1/v2/v3` interfaces will be removed in favor of the ROCprofiler-SDK interface, which directly accesses the `rocprofv3` C++ tool. Using `rocprof v1/v2/v3` interfaces will trigger a deprecation warning.
  * To use the ROCprofiler-SDK interface, set the environment variable `ROCPROF=rocprofiler-sdk` and optionally provide the profile mode option `--rocprofiler-sdk-library-path /path/to/librocprofiler-sdk.so`. The `--rocprofiler-sdk-library-path` runtime option chooses the path to the ROCprofiler-SDK library to be used.
* Hardware IP block based filtering using the `-b` option in profile mode will be removed in favor of analysis report block based filtering using the `-b` option in profile mode.
* MongoDB database support will be removed, and a deprecation warning has been added to the application interface.
* Usage of `rocm-smi` is deprecated in favor of `amd-smi`, and a deprecation warning has been added to the application interface.

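The `ROCPROF` environment variable and the `--rocprofiler-sdk-library-path` option mentioned in the upcoming changes combine on a single profile-mode invocation. As a hedged illustration, the hypothetical Python helper below only assembles the argument list and environment for such a run and executes nothing. The `--name` flag and the `--` separator before the application command are assumptions about the profile-mode command line that this changelog does not state, and `vcopy` is a placeholder workload.

```python
import os


def build_profile_invocation(workload_name, app_argv, rocprof=None, sdk_library_path=None):
    """Return (argv, env) for a `rocprof-compute profile` run.

    Hypothetical helper. `rocprof` is the value placed in the ROCPROF
    environment variable (for example "rocprof" or "rocprofiler-sdk");
    `sdk_library_path` is forwarded via --rocprofiler-sdk-library-path.
    The --name flag and the "--" separator are assumed, not taken from this changelog.
    """
    # Copy the current environment so the caller's process is not modified.
    env = dict(os.environ)
    if rocprof is not None:
        env["ROCPROF"] = rocprof
    argv = ["rocprof-compute", "profile", "--name", workload_name]
    if sdk_library_path is not None:
        argv += ["--rocprofiler-sdk-library-path", sdk_library_path]
    # Everything after "--" is treated as the application command line.
    argv += ["--", *app_argv]
    return argv, env
```

To actually profile, pass the returned values to `subprocess.run(argv, env=env, check=True)`. Passing `rocprof="rocprof"` instead reproduces the `rocprof v1` workaround noted for AMD Instinct MI100 under Known issues.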
## ROCm Compute Profiler 3.1.1 for ROCm 6.4.2

### Added

* 8-bit floating point (FP8) metrics support for AMD Instinct MI300 GPUs.
* Additional data types for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on the GPU architecture).
* Data type selection option `--roofline-data-type` / `-R` for roofline profiling. The default data type is FP32.

### Changed

* Changed dependency from `rocm-smi` to `amd-smi`.

### Resolved issues

* Fixed a crash related to Agent ID caused by the new format of the `rocprofv3` output CSV file.

## ROCm Compute Profiler 3.1.0 for ROCm 6.4.0

### Added

* Roofline support for Ubuntu 24.04
* Experimental support for rocprofv3 (not enabled by default)

### Resolved issues

* Fixed PoP of VALU Active Threads
* Workaround for broken mclk in older versions of rocm-smi

## ROCm Compute Profiler 3.0.0 for ROCm 6.3.0

### Changed

* Renamed Omniperf to ROCm Compute Profiler (#475)

## Omniperf 2.0.1 for ROCm 6.2.1

### Changed

* enable rocprofv1 for MI300 hardware (#391)
* refactoring and updating documentation (#362, #394, #398, #414, #420)
* branch renaming and workflow updates (#389, #404, #409)
* bug fix for analysis output
* add dependency checks on application launch (#393)
* patch for profiling multi-process/multi-GPU applications (#376, #396)
* packaging updates (#386)
* rename CHANGES to CHANGELOG.md (#410)
* rollback Grafana version in Dockerfile for Angular plugin compatibility (#416)
* enable CI triggers for Azure CI (#426)
* add GPU model distinction for MI300 systems (#423)
* new MAINTAINERS.md guide for omniperf publishing procedures (#402)

### Optimized

* reduced running time of Omniperf when profiling (#384)
* console logging improvements

## Omniperf 2.0.1 for ROCm 6.2.0

### Added

* new option to force hardware target via `OMNIPERF_ARCH_OVERRIDE` global (#370)
* CI/CD support for MI300 hardware (#373)
* support for MI308X hardware (#375)

### Optimized

* cmake build improvements (#374)

## Omniperf 2.0.0 (17 May 2024)

* improved logging that spans all modes (#177) (#317) (#335) (#341)
* overhauled CI/CD that spans all modes (#179)
* extensible SoC classes to better support adding new hardware configs (#180)
* `--kernel-verbose` no longer overwrites kernel names (#193)
* general cleanup and improved organization of source code (#200) (#210)
* separate requirement files for docs and testing dependencies (#205) (#262) (#358)
* add support for MI300 hardware (#231)
* upgrade Grafana assets and build script to latest release (#235)
* update minimum ROCm and Python requirements (#277)
* sort rocprofiler input files prior to profiling (#304)
* new `--quiet` option will suppress verbose output and show a progress bar (#308)
* roofline support for Ubuntu 22.04 (#319)

## Omniperf 1.1.0-PR1 (13 Oct 2023)

* standardize headers to use 'avg' instead of 'mean'
* add color code thresholds to standalone gui to match grafana
* modify kernel name shortener to use cpp_filt (#168)
* enable stochastic kernel dispatch selection (#183)
* patch grafana plugin module to address a known issue in the latest version (#186)
* enhanced communication between analyze mode kernel flags (#187)

## Omniperf 1.0.10 (22 Aug 2023)

* critical patch for detection of llvm in rocm installs on SLURM systems

## Omniperf 1.0.9 (17 Aug 2023)

* add units to L2 per-channel panel (#133)
* new quickstart guide for Grafana setup in docs (#135)
* more detail on kernel and dispatch filtering in docs (#136, #137)
* patch manual join utility for ROCm >5.2.x (#139)
* add % of peak values to low level speed-of-light panels (#140)
* patch critical bug in Grafana by removing a deprecated plugin (#141)
* enhancements to KernelName demangler (#142)
* general metric updates and enhancements (#144, #155, #159)
* add min/max/avg breakdown to instruction mix panel (#154)

## Omniperf 1.0.8 (30 May 2023)

* add `--kernel-names` option to toggle kernelName overlay in standalone roofline plot (#93)
* remove unused python modules (#96)
* fix empirical roofline calculation for single dispatch workloads (#97)
* match color of arithmetic intensity points to corresponding bw lines
* ux improvements in standalone GUI (#101)
* enhanced readability for filtering dropdowns in standalone GUI (#102)
* new logfile to capture rocprofiler output (#106)
* roofline support for sles15 sp4 and future service packs (#109)
* adding dockerfiles for all supported Linux distros
* new examples for `--roof-only` and `--kernel` options added to documentation
* enable cli analysis in Windows (#110)
* optional random port number in standalone GUI (#111)
* limit length of visible kernelName in `--kernel-names` option (#115)
* adjust metric definitions (#117, #130)
* manually merge rocprof runs, overriding default rocprofiler implementation (#125)
* fixed compatibility issues with Python 3.11 (#131)

## Omniperf 1.0.8-PR2 (17 Apr 2023)

* ux improvements in standalone GUI (#101)
* enhanced readability for filtering dropdowns in standalone GUI (#102)
* new logfile to capture rocprofiler output (#106)
* roofline support for sles15 sp4 and future service packs (#109)
* adding dockerfiles for all supported Linux distros
* new examples for `--roof-only` and `--kernel` options added to documentation

## Omniperf 1.0.8-PR1 (13 Mar 2023)

* add `--kernel-names` option to toggle kernelName overlay in standalone roofline plot (#93)
* remove unused python modules (#96)
* fix empirical roofline calculation for single dispatch workloads (#97)
* match color of arithmetic intensity points to corresponding bw lines

## Omniperf 1.0.7 (21 Feb 2023)

* update documentation (#52, #64)
* improved detection of invalid command line arguments (#58, #76)
* enhancements to standalone roofline (#61)
* enable Omniperf on systems with X-server (#62)
* raise minimum version requirement for rocm (#64)
* enable baseline comparison in CLI analysis (#65)
* add multi-normalization to new metrics (#68, #81)
* support alternative profilers (#70)
* add MI100 configs to override rocprofiler's incomplete default (#75)
* improve error message when no GPU(s) detected (#85)
* separate CI tests by Linux distro and add status badges

## Omniperf 1.0.6 (21 Dec 2022)

* CI update: documentation now published via github action (#22)
* better error detection for incomplete ROCm installs (#56)

## Omniperf 1.0.5 (13 Dec 2022)

* store application command-line parameters in profiling output (#27)
* enable additional normalizations in CLI mode (#30)
* add missing ubuntu 20.04 roofline binary to packaging (#34)
* update L1 bandwidth metric calculations (#36)
* add L1 <-> L2 bandwidth calculation (#37)
* documentation updates (#38, #41)
* enhanced subprocess logging to identify critical errors in rocprofiler (#50)
* maintain git sha in production installs from tarball (#53)

## Omniperf 1.0.4 (11 Nov 2022)

* update python requirements.txt with minimum versions for numpy and pandas
* addition of progress bar indicator in web-based GUI (#8)
* reduced default content for web-based GUI to reduce load times (#9)
* minor packaging and CI updates
* variety of documentation updates
* added an optional argument to vcopy.cpp workload example to specify device id

## Omniperf 1.0.3 (07 Nov 2022)

* initial Omniperf release