Fichiers
rocm-systems/projects/rocprofiler-compute/CHANGELOG.md
T

285 lignes
12 KiB
Markdown
Brut Vue normale Historique

# Changelog for ROCm Compute Profiler
Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/).
2024-08-13 12:29:32 -04:00
2025-02-20 17:51:57 -05:00
## Unreleased
2025-02-27 12:46:47 -07:00
### Added
2025-06-18 13:19:58 -04:00
* Support Roofline plot on CLI (single run)
2025-06-06 12:43:52 -06:00
* Stochastic (hardware-based) PC sampling has been enabled for AMD Instinct MI300X series and later accelerators.
* Sorting of PC sampling by type: offset or count.
2025-06-18 13:58:42 -04:00
* Add rocprof-compute Text User Interface (TUI) support for analyze mode (beta version)
* A command line based user interface to support interactive single-run analysis
* launch with `--tui` option in analyze mode. i.e., `rocprof-compute analyze --tui`
* Add support to be able to acquire from rocprofv3 every single channle on each XCD of TCC counters
2025-02-27 12:46:47 -07:00
* Add Docker files to package the application and dependencies into a single portable and executable standalone binary file
* Analysis report based filtering
* -b option in profile mode now additionally accepts metric id(s) for analysis report based filtering
* -b option in profile mode also accept hardware IP block for filtering, however, this support will be deprecated soon
* --list-metrics option added in profile mode to list possible metric id(s), similar to analyze mode
* Datatype selection option for roofline profiling
* --roofline-data-type / -R option added to specify which datatypes the user wants to capture in the roofline PDF plot outputs
* Default is FP32, but user can specify as many types as desired to overlay on the same plot output
2025-03-26 21:07:48 -04:00
* Additional datatypes for roofline profiling
* Now supports FP4, FP6, FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on gpu architecture)
2025-03-26 21:07:48 -04:00
* Support host-trap PC Sampling on CLI (beta version)
2025-04-03 02:21:18 -04:00
* Support for AMD Instinct MI350 series GPUs with the addition of the following counters:
* VALU co-issue (Two VALUs are issued instructions) efficiency
* Stream Processor Instruction (SPI) Wave Occupancy
* Scheduler-Pipe Wave Utilization
* Scheduler FIFO Full Rate
* CPC ADC Utilization
* F6F4 datatype metrics
* Update formula for total FLOPs while taking into account F6F4 ops
* LDS STORE, LDS LOAD, LDS ATOMIC instruction count metrics
* LDS STORE, LDS LOAD, LDS ATOMIC bandwidth metrics
* LDS FIFO full rate
* Sequencer -> TA ADDR Stall rates
* Sequencer -> TA CMD Stall rates
* Sequencer -> TA DATA Stall rates
* L1 latencies
* L2 latencies
* L2 to EA stalls
* L2 to EA stalls per channel
2025-04-11 17:45:53 -04:00
* Roofline support for RHEL 10
2025-04-28 16:08:23 -04:00
* Roofline support for MI350 series architecture
* Interface to rocprofiler-sdk
* Setting ROCPROF=rocprofiler-sdk environment variable will use rocprofiler-sdk C++ library instead of rocprofv3 python script
* Add --rocprofiler-sdk-library-path runtime option to choose the path to rocprofiler-sdk library to be used
* Using rocprof v1 / v2 / v3 interfaces will trigger a deprecation warning to use rocprofiler-sdk interface
2025-06-18 13:19:58 -04:00
* Support MEM chart on CLI (single run)
2025-06-06 16:15:56 -06:00
* Add deprecation warning for database update mode.
2025-02-27 12:46:47 -07:00
### Changed
2025-06-12 15:45:11 -04:00
* Change the default rocprof version to rocprofv3, this is used when environment variable "ROCPROF" is not set
* Change the rocprof version for unit tests to rocprofv3 on all SoCs except MI100
2025-02-27 12:46:47 -07:00
* Change normal_unit default to per_kernel
2025-03-24 17:43:29 -04:00
* Change dependency from rocm-smi to amd-smi
* Decrease profiling time by not collecting counters not used in post analysis
2025-04-03 02:21:18 -04:00
* Update definition of following metrics for MI 350:
* VGPR Writes
* Total FLOPs (consider fp6 and fp4 ops)
2025-05-29 14:36:03 -04:00
* Update Dash to >=3.0.0 (for web UI)
* Change when Roofline PDFs are generated- during general profiling and --roof-only profiling (skip only when --no-roof option is present)
* Update Roofline binaries
* Rebuild using latest ROCm stack
* OS distribution support minimum for roofline feature is now Ubuntu22.04, RHEL9, and SLES15SP6
2025-02-27 12:46:47 -07:00
2025-03-06 12:52:54 -07:00
### Resolved issues
* Fixed MI 100 counters not being collected when rocprofv3 is used
2025-03-06 12:52:54 -07:00
* Fixed option specs-correction
* Fixed kernel name and kernel dispatch filtering when using rocprof v3
2025-03-24 17:43:29 -04:00
* Fixed not collecting TCC channel counters in rocprof v3
* Fixed peak FLOPS of F8 I8 F16 and BF16 on MI300
### Known issues
* On MI 100, accumulation counters will not be collected and the following metrics will not show up in analysis: Instruction Fetch Latency, Wavefront Occupancy, LDS Latency
* As a workaround, use ROCPROF=rocprof environement variable, to use rocprofv1 for profiling on MI 100
2025-03-24 17:43:29 -04:00
* GPU id filtering is not supported when using rocprof v3
2025-02-20 17:51:57 -05:00
2025-04-03 02:21:18 -04:00
* Analysis of previously collected workload data will not work due to sysinfo.csv schema change
* As a workaround, run the profiling operation again for the workload and interrupt the process after ten seconds.
Followed by copying the `sysinfo.csv` file from the new data folder to the old one.
This assumes your system specification hasn't changed since the creation of the previous workload data.
* Analysis of new workloads might require providing shader/memory clock speed using
--specs-correction operation if `amd-smi` or `rocminfo` does not provide clock speeds.
2025-07-08 17:15:09 -04:00
* Memory chart on CLI might look corrupted if CLI width is too narrow
2025-04-28 16:08:23 -04:00
### Removed
* Roofline support for Ubuntu 20.04 and SLES below 15.6
2025-03-24 17:43:29 -04:00
## ROCm Compute Profiler 3.1.0 for ROCm 6.4.0
2025-02-11 12:46:33 -05:00
### Added
2025-02-13 18:05:49 -07:00
* Roofline support for Ubuntu 24.04
* Experimental support rocprofv3 (not enabled as default)
### Resolved issues
* Fixed PoP of VALU Active Threads
* Workaround broken mclk for old version of rocm-smi
2025-02-11 12:46:33 -05:00
## ROCm Compute Profiler 3.0.0 for ROCm 6.3.0
### Changed
* Renamed Omniperf to ROCm Compute Profiler (#475)
2024-11-04 16:45:16 -05:00
## Omniperf 2.0.1 for ROCm 6.2.1
2024-11-04 16:45:16 -05:00
### Changed
2024-11-04 16:45:16 -05:00
* enable rocprofv1 for MI300 hardware (#391)
* refactoring and updating documemtation (#362, #394, #398, #414, #420)
* branch renaming and workflow updates (#389, #404, #409)
* bug fix for analysis output
* add dependency checks on application launch (#393)
* patch for profiling multi-process/multi-GPU applications (#376, #396)
* packaging updates (#386)
* rename CHANGES to CHANGELOG.md (#410)
* rollback Grafana version in Dockerfile for Angular plugin compatibility (#416)
* enable CI triggers for Azure CI (#426)
* add GPU model distinction for MI300 systems (#423)
* new MAINTAINERS.md guide for omniperf publishing procedures (#402)
2024-11-04 16:45:16 -05:00
### Optimized
2025-01-02 13:29:47 -08:00
* reduced running time of Omniperf when profiling (#384)
2024-11-04 16:45:16 -05:00
* console logging improvements
## Omniperf 2.0.1 for ROCm 6.2.0
2024-08-13 12:29:32 -04:00
2024-11-04 16:45:16 -05:00
### Added
2024-08-13 12:29:32 -04:00
* new option to force hardware target via `OMNIPERF_ARCH_OVERRIDE` global (#370)
* CI/CD support for MI300 hardware (#373)
* support for MI308X hardware (#375)
2024-11-04 16:45:16 -05:00
### Optimized
2024-08-13 12:29:32 -04:00
* cmake build improvements (#374)
## Omniperf 2.0.0 (17 May 2024)
2024-05-17 18:36:11 +00:00
* improved logging than spans all modes (#177) (#317) (#335) (#341)
* overhauled CI/CD that spans all modes (#179)
* extensible SoC classes to better support adding new hardware configs (#180)
* --kernel-verbose no longer overwrites kernel names (#193)
2025-01-02 13:29:47 -08:00
* general cleanup and improved organization of source code (#200) (#210)
2024-05-17 18:36:11 +00:00
* separate requirement files for docs and testing dependencies (#205) (#262) (#358)
* add support for MI300 hardware (#231)
* upgrade Grafana assets and build script to latest release (#235)
* update minimum ROCm and Python requirements (#277)
* sort rocprofiler input files prior to profiling (#304)
* new --quiet option will suppress verbose output and show a progress bar (#308)
* roofline support for Ubuntu 22.04 (#319)
2024-08-13 12:29:32 -04:00
## Omniperf 1.1.0-PR1 (13 Oct 2023)
2024-03-05 12:16:23 -05:00
* standardize headers to use 'avg' instead of 'mean'
* add color code thresholds to standalone gui to match grafana
* modify kernel name shortener to use cpp_filt (#168)
* enable stochastic kernel dispatch selection (#183)
* patch grafana plugin module to address a known issue in the latest version (#186)
* enhanced communication between analyze mode kernel flags (#187)
2024-08-13 12:29:32 -04:00
## Omniperf 1.0.10 (22 Aug 2023)
2023-08-22 12:45:36 -05:00
* critical patch for detection of llvm in rocm installs on SLURM systems
2024-08-13 12:29:32 -04:00
## Omniperf 1.0.9 (17 Aug 2023)
2023-08-17 11:04:43 -05:00
* add units to L2 per-channel panel (#133)
* new quickstart guide for Grafana setup in docs (#135)
* more detail on kernel and dispatch filtering in docs (#136, #137)
* patch manual join utility for ROCm >5.2.x (#139)
* add % of peak values to low level speed-of-light panels (#140)
* patch critical bug in Grafana by removing a deprecated plugin (#141)
* enhancements to KernelName demangeler (#142)
* general metric updates and enhancements (#144, #155, #159)
* add min/max/avg breakdown to instruction mix panel (#154)
2024-08-13 12:29:32 -04:00
## Omniperf 1.0.8 (30 May 2023)
2023-05-30 11:28:44 -05:00
* add `--kernel-names` option to toggle kernelName overlay in standalone roofline plot (#93)
* remove unused python modules (#96)
* fix empirical roofline calculation for single dispatch workloads (#97)
* match color of arithmetic intensity points to corresponding bw lines
* ux improvements in standalone GUI (#101)
* enhanced readability for filtering dropdowns in standalone GUI (#102)
* new logfile to capture rocprofiler output (#106)
* roofline support for sles15 sp4 and future service packs (#109)
2023-05-30 11:34:33 -05:00
* adding dockerfiles for all supported Linux distros
2023-05-30 11:28:44 -05:00
* new examples for `--roof-only` and `--kernel` options added to documentation
2025-01-02 13:29:47 -08:00
2023-05-30 11:28:44 -05:00
* enable cli analysis in Windows (#110)
* optional random port number in standalone GUI (#111)
2023-05-30 11:34:33 -05:00
* limit length of visible kernelName in `--kernel-names` option (#115)
2023-05-30 11:28:44 -05:00
* adjust metric definitions (#117, #130)
* manually merge rocprof runs, overriding default rocprofiler implementation (#125)
* fixed compatibility issues with Python 3.11 (#131)
2025-01-02 13:29:47 -08:00
2024-08-13 12:29:32 -04:00
## Omniperf 1.0.8-PR2 (17 Apr 2023)
2023-05-30 11:28:44 -05:00
2023-04-17 14:14:51 -05:00
* ux improvements in standalone GUI (#101)
* enhanced readability for filtering dropdowns in standalone GUI (#102)
* new logfile to capture rocprofiler output (#106)
* roofline support for sles15 sp4 and future service packs (#109)
2023-05-30 11:34:33 -05:00
* adding dockerfiles for all supported Linux distros
2023-04-17 14:14:51 -05:00
* new examples for `--roof-only` and `--kernel` options added to documentation
2024-08-13 12:29:32 -04:00
## Omniperf 1.0.8-PR1 (13 Mar 2023)
2023-03-13 15:52:11 -05:00
* add `--kernel-names` option to toggle kernelName overlay in standalone roofline plot (#93)
* remove unused python modules (#96)
* fix empirical roofline calculation for single dispatch workloads (#97)
* match color of arithmetic intensity points to corresponding bw lines
2025-01-02 13:29:47 -08:00
2024-08-13 12:29:32 -04:00
## Omniperf 1.0.7 (21 Feb 2023)
2023-02-21 15:46:35 -06:00
* update documentation (#52, #64)
* improved detection of invalid command line arguments (#58, #76)
* enhancements to standalone roofline (#61)
* enable Omniperf on systems with X-server (#62)
* raise minimum version requirement for rocm (#64)
* enable baseline comparison in CLI analysis (#65)
* add multi-normalization to new metrics (#68, #81)
* support alternative profilers (#70)
* add MI100 configs to override rocprofiler's incomplete default (#75)
* improve error message when no GPU(s) detected (#85)
2023-02-21 15:53:35 -06:00
* separate CI tests by Linux distro and add status badges
2025-01-02 13:29:47 -08:00
2024-08-13 12:29:32 -04:00
## Omniperf 1.0.6 (21 Dec 2022)
2022-12-21 12:39:05 -06:00
* CI update: documentation now published via github action (#22)
* better error detection for incomplete ROCm installs (#56)
2024-08-13 12:29:32 -04:00
## Omniperf 1.0.5 (13 Dec 2022)
2022-12-12 15:01:35 -05:00
* store application command-line parameters in profiling output (#27)
* enable additional normalizations in CLI mode (#30)
2022-12-13 09:28:26 -05:00
* add missing ubuntu 20.04 roofline binary to packaging (#34)
2022-12-12 15:01:35 -05:00
* update L1 bandwidth metric calculations (#36)
* add L1 <-> L2 bandwidth calculation (#37)
* documentation updates (#38, #41)
2022-12-12 17:53:55 -06:00
* enhanced subprocess logging to identify critical errors in rocprofiler (#50)
2022-12-13 09:28:26 -05:00
* maintain git sha in production installs from tarball (#53)
2022-12-12 15:01:35 -05:00
2024-08-13 12:29:32 -04:00
## Omniperf 1.0.4 (11 Nov 2022)
2022-11-11 15:15:07 -06:00
* update python requirements.txt with minimum versions for numpy and pandas
* addition of progress bar indicator in web-based GUI (#8)
* reduced default content for web-based GUI to reduce load times (#9)
2025-01-02 13:29:47 -08:00
* minor packaging and CI updates
* variety of documentation updates
2022-11-11 15:52:00 -06:00
* added an optional argument to vcopy.cpp workload example to specify device id
2022-11-11 15:15:07 -06:00
2024-08-13 12:29:32 -04:00
## Omniperf 1.0.3 (07 Nov 2022)
2022-11-11 15:15:07 -06:00
2022-12-13 09:29:27 -05:00
* initial Omniperf release