Backport CHANGELOG changes from 7.0 release (#845)
* Backport CHANGELOG changes from 7.0 release
* Backport CHANGELOG changes from https://github.com/ROCm/rocprofiler-compute/pull/815
+117
-73
@@ -8,6 +8,9 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.

* Add `rocpd` choice for `--format-rocprof-output` option in profile mode
* Add `--retain-rocpd-output` option in profile mode to save large raw rocpd databases in workload directory
* Show description of metrics during analysis
  * Use `--include-cols Description` to show the Description column, which is excluded by default from the
    ROCm Compute Profiler CLI output.
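
The options listed above could be combined roughly as follows. This is a minimal sketch rather than anything taken from the changelog: the workload name `demo`, the application `./my_app`, and the workload path `workloads/demo/MI300X` are hypothetical placeholders. The command lines are only assembled and printed, so the snippet can be read and run without a GPU.

```shell
#!/bin/sh
# Hypothetical placeholders (not from the changelog): adjust to your setup.
workload="demo"
app="./my_app"
workload_dir="workloads/$workload/MI300X"

# Profile mode: request rocpd output and keep the raw rocpd database.
profile_cmd="rocprof-compute profile --name $workload --format-rocprof-output rocpd --retain-rocpd-output -- $app"

# Analyze mode: include the Description column, which is hidden by default.
analyze_cmd="rocprof-compute analyze --path $workload_dir --include-cols Description"

# Print the assembled command lines instead of executing them.
printf '%s\n' "$profile_cmd" "$analyze_cmd"
```

To actually run them, paste the printed lines into a shell on a machine where ROCm Compute Profiler is installed.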

### Changed

@@ -16,43 +19,36 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.

* When `--format-rocprof-output rocpd` is used, only pmc_perf.csv will be written to workload directory instead of multiple csv files.

* Improve analysis block based filtering to accept metric id level filtering
  * This can be used to collect individual metrics from various sections of analysis config

* CLI analysis mode baseline comparison will now only compare common metrics across workloads and will not show Metric ID
* Remove metrics from analysis configuration files which are explicitly marked as empty or None

### Resolved issues

* Fixed not detecting memory clock issue when using amd-smi
* Fixed standalone GUI crashing
* Fixed L2 read/write/atomic bandwidths on MI350
* Update metric names for better alignment between analysis configuration and documentation

### Known issues

### Optimized

* Improved `--time-unit` option in analyze mode to apply time unit conversion across all analysis sections, not just kernel top stats.

### Removed

## ROCm Compute Profiler 3.2.0 for ROCm 7.0.0
* Usage of rocm-smi
* Hardware IP block based filtering has been removed in favor of analysis report block based filtering


## ROCm Compute Profiler 3.2.1 for ROCm 7.0.0

### Added

* Support Roofline plot on CLI (single run)

* Stochastic (hardware-based) PC sampling has been enabled for AMD Instinct MI300X series and later accelerators.

* Sorting of PC sampling by type: offset or count.

* Add rocprof-compute Text User Interface (TUI) support for analyze mode (beta version)
  * A command line based user interface to support interactive single-run analysis
  * launch with `--tui` option in analyze mode. i.e., `rocprof-compute analyze --tui`

* Add support to be able to acquire from rocprofv3 every single channel on each XCD of TCC counters

* Add Docker files to package the application and dependencies into a single portable and executable standalone binary file

* Analysis report based filtering
  * -b option in profile mode now additionally accepts metric id(s) for analysis report based filtering
  * -b option in profile mode also accepts hardware IP block for filtering, however, this support will be deprecated soon
  * --list-metrics option added in profile mode to list possible metric id(s), similar to analyze mode

* Data type selection option for roofline profiling
  * --roofline-data-type / -R option added to specify which data types the user wants to capture in the roofline PDF plot outputs
  * Default is FP32, but user can specify as many types as desired to overlay on the same plot output

* Additional data types for roofline profiling
  * Now supports FP4, FP6, FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on gpu architecture)

* Support host-trap PC Sampling on CLI (beta version)
#### CDNA4 (AMD Instinct MI350/MI355) support

* Support for AMD Instinct MI350 series GPUs with the addition of the following counters:
  * VALU co-issue (Two VALUs are issued instructions) efficiency
@@ -73,82 +69,130 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
  * L2 to EA stalls
  * L2 to EA stalls per channel

* Roofline support for RHEL 10
* Roofline support for AMD Instinct MI350 series architecture.

* Roofline support for MI350 series architecture
#### Textual User Interface (TUI) (beta version)

* Interface to rocprofiler-sdk
  * Setting ROCPROF=rocprofiler-sdk environment variable will use rocprofiler-sdk C++ library instead of rocprofv3 python script
* Text User Interface (TUI) support for analyze mode
  * A command line based user interface to support interactive single-run analysis
  * To launch, use `--tui` option in analyze mode. For example, ``rocprof-compute analyze --tui``.

#### PC Sampling (beta version)

* Stochastic (hardware-based) PC sampling has been enabled for AMD Instinct MI300X series and later accelerators.

* Host-trap PC Sampling has been enabled for AMD Instinct MI200 series and later accelerators.

* Support for sorting of PC sampling by type: offset or count.

* PC Sampling Support on CLI and TUI analysis.

#### Roofline

* Support for Roofline plot on CLI (single run) analysis.

* Roofline support for RHEL 10 OS.

* FP4 and FP6 data types have been added for roofline profiling on AMD Instinct MI350 series.

#### rocprofv3 support

* ``rocprofv3`` is supported as the default backend for profiling.
* Support to obtain performance information for all channels for TCC counters.
* Support for profiling on AMD Instinct MI 100 using ``rocprofv3``.
* Deprecation warning for ``rocprofv3`` interface in favor of the ROCprofiler-SDK interface, which directly accesses ``rocprofv3`` C++ tool.

#### Others

* Docker files to package the application and dependencies into a single portable and executable standalone binary file.

* Analysis report based filtering
  * ``-b`` option in profile mode now also accepts metric id(s) for analysis report based filtering.
  * ``-b`` option in profile mode also accepts hardware IP block for filtering; however, this filter support will be deprecated soon.
  * ``--list-metrics`` option added in profile mode to list possible metric id(s), similar to analyze mode.

* Interface to ROCprofiler-SDK.
  * Setting the environment variable ``ROCPROF=rocprofiler-sdk`` will use ROCprofiler-SDK C++ library instead of ``rocprofv3`` python script.
  * Add --rocprofiler-sdk-library-path runtime option to choose the path to rocprofiler-sdk library to be used
  * Using rocprof v1 / v2 / v3 interfaces will trigger a deprecation warning to use rocprofiler-sdk interface

* Support MEM chart on CLI (single run)

* Add deprecation warning for database update mode.
* Deprecation warning for MongoDB database update mode.

* Show description of metrics during analysis
  * Use `--include-cols Description` to show `Description` column which is excluded by default from cli output
* Deprecation warning for ``rocm-smi``

* ``--specs-correction`` option to provide missing system specifications for analysis.

### Changed

* Change the default rocprof version to rocprofv3, this is used when environment variable "ROCPROF" is not set
* Change the rocprof version for unit tests to rocprofv3 on all SoCs except MI100
* Change normal_unit default to per_kernel
* Change dependency from rocm-smi to amd-smi
* Decrease profiling time by not collecting counters not used in post analysis
* Update definition of following metrics for MI 350:
  * VGPR Writes
  * Total FLOPs (consider fp6 and fp4 ops)
* Update Dash to >=3.0.0 (for web UI)
* Change when Roofline PDFs are generated: during general profiling and --roof-only profiling (skip only when --no-roof option is present)
* Update Roofline binaries
* Changed the default ``rocprof`` version to ``rocprofv3``. This is used when environment variable ``ROCPROF`` is not set.
* Changed ``normal_unit`` default to ``per_kernel``.
* Decreased profiling time by not collecting unused counters in post-analysis.
* Updated Dash to >=3.0.0 (for web UI).
* Changed the condition when Roofline PDFs are generated during general profiling and ``--roof-only`` profiling (skip only when ``--no-roof`` option is present).
* Updated Roofline binaries:
  * Rebuild using latest ROCm stack
  * OS distribution support minimum for roofline feature is now Ubuntu22.04, RHEL9, and SLES15SP6
* Improve analysis block based filtering to accept metric id level filtering
  * This can be used to collect individual metrics from various sections of analysis config
* CLI analysis mode baseline comparison will now only compare common metrics across workloads and will not show Metric ID
* Remove metrics from analysis configuration files which are explicitly marked as empty or None
* Minimum OS distribution support for roofline feature is now Ubuntu 22.04, RHEL 9, and SLES 15 SP6.

### Optimized

* ROCm Compute Profiler CLI has been improved to better display the GPU architecture analytics
* Improved `--time-unit` option in analyze mode to apply time unit conversion across all analysis sections, not just kernel top stats.

### Resolved issues

* Fixed MI 100 counters not being collected when rocprofv3 is used
* Fixed option specs-correction
* Fixed kernel name and kernel dispatch filtering when using rocprof v3
* Fixed not collecting TCC channel counters in rocprof v3
* Fixed peak FLOPS of F8 I8 F16 and BF16 on MI300
* Fixed not detecting memory clock issue when using amd-smi
* Fixed standalone GUI crashing
* Fixed L2 read/write/atomic bandwidths on MI350
* Update metric names for better alignment between analysis configuration and documentation
* Fixed kernel name and kernel dispatch filtering when using ``rocprofv3``.
* Fixed an issue of TCC channel counters collection in ``rocprofv3``.
* Fixed peak FLOPS of F8, I8, F16, and BF16 on AMD Instinct MI 300.

### Known issues

* On MI 100, accumulation counters will not be collected and the following metrics will not show up in analysis: Instruction Fetch Latency, Wavefront Occupancy, LDS Latency
  * As a workaround, use ROCPROF=rocprof environment variable, to use rocprofv1 for profiling on MI 100
* On AMD Instinct MI100, accumulation counters are not collected, resulting in the following metrics failing to show up in the analysis: Instruction Fetch Latency, Wavefront Occupancy, LDS Latency
  * As a workaround, use the environment variable ``ROCPROF=rocprof``, to use ``rocprof v1`` for profiling on AMD Instinct MI100.

* GPU id filtering is not supported when using rocprof v3
* GPU id filtering is not supported when using ``rocprofv3``.

* Analysis of previously collected workload data will not work due to sysinfo.csv schema change
  * As a workaround, run the profiling operation again for the workload and interrupt the process after ten seconds.
    Followed by copying the `sysinfo.csv` file from the new data folder to the old one.
    This assumes your system specification hasn't changed since the creation of the previous workload data.
* Analysis of previously collected workload data will not work due to sysinfo.csv schema change.
  * As a workaround, re-run the profiling operation for the workload and interrupt the process after 10 seconds.
    Followed by copying the ``sysinfo.csv`` file from the new data folder to the old one.
    This assumes your system specification hasn't changed since the creation of the previous workload data.

* Analysis of new workloads might require providing shader/memory clock speed using
  --specs-correction operation if `amd-smi` or `rocminfo` does not provide clock speeds.
  ``--specs-correction`` operation if amd-smi or rocminfo does not provide clock speeds.

* Memory chart on CLI might look corrupted if CLI width is too narrow
* Memory chart on ROCm Compute Profiler CLI might look corrupted if the CLI width is too narrow.
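
The `sysinfo.csv` workaround described in the known issues above comes down to a single file copy once the short re-profiling run has produced a fresh workload folder. A minimal sketch follows; the helper name `copy_sysinfo` and the `.bak` backup step are illustrative additions, not part of ROCm Compute Profiler, and both directory arguments are supplied by the caller.

```shell
#!/bin/sh
# copy_sysinfo NEW_DIR OLD_DIR
# Copies sysinfo.csv from a freshly profiled workload folder (NEW_DIR)
# into a previously collected one (OLD_DIR). Hypothetical helper.
copy_sysinfo() {
    new_dir=$1
    old_dir=$2

    # Refuse to continue if the new run did not produce sysinfo.csv.
    if [ ! -f "$new_dir/sysinfo.csv" ]; then
        echo "no sysinfo.csv found in $new_dir" >&2
        return 1
    fi

    # Keep a backup of the old file so the change can be undone.
    if [ -f "$old_dir/sysinfo.csv" ]; then
        cp "$old_dir/sysinfo.csv" "$old_dir/sysinfo.csv.bak"
    fi

    cp "$new_dir/sysinfo.csv" "$old_dir/sysinfo.csv"
}
```

A call might look like `copy_sysinfo workloads/app_new/MI300X workloads/app_old/MI300X`, where both paths are placeholders. As the changelog notes, this only makes sense if the system specification has not changed since the old data was collected.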

### Removed

* Roofline support for Ubuntu 20.04 and SLES below 15.6
* Usage of rocm-smi
* Remove support for MI50/MI60 in accordance with the documentation
* Hardware IP block based filtering has been removed in favor of analysis report block based filtering
* Removed support for AMD Instinct MI50 and MI60.

### Upcoming changes

* ``rocprof v1/v2/v3`` interfaces will be removed in favor of the ROCprofiler-SDK interface, which directly accesses ``rocprofv3`` C++ tool.
  * To use ROCprofiler-SDK interface, set environment variable `ROCPROF=rocprofiler-sdk` and optionally provide profile mode option ``--rocprofiler-sdk-library-path /path/to/librocprofiler-sdk.so``
* Hardware IP block based filtering using ``-b`` option in profile mode will be removed in favor of analysis report block based filtering using ``-b`` option in profile mode.
* Using rocprof v1 / v2 / v3 interfaces will trigger a deprecation warning to use rocprofiler-sdk interface
* MongoDB database support will be removed.
* Usage of ``rocm-smi`` will be removed in favor of ``amd-smi``.
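
Switching to the ROCprofiler-SDK interface, as described in the upcoming changes above, might look like the following. This is a sketch under assumptions: the workload name `demo` and the application `./my_app` are hypothetical, and the library path is the generic placeholder quoted in the changelog, not a real location. The command line is assembled and printed, not executed.

```shell
#!/bin/sh
# Placeholder path exactly as written in the changelog; replace with the real library location.
sdk_lib="/path/to/librocprofiler-sdk.so"
# Hypothetical workload name and application.
workload="demo"
app="./my_app"

# ROCPROF selects the interface; the library path option is optional per the changelog.
sdk_cmd="ROCPROF=rocprofiler-sdk rocprof-compute profile --name $workload --rocprofiler-sdk-library-path $sdk_lib -- $app"

# Print the assembled command line instead of executing it.
printf '%s\n' "$sdk_cmd"
```

Setting the variable inline limits it to that one invocation; exporting it instead would affect every later `rocprof-compute` run in the same shell.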


## ROCm Compute Profiler 3.1.1 for ROCm 6.4.2

### Added

* 8-bit floating point (FP8) metrics support for AMD Instinct MI300 GPUs.
* Additional data types for roofline: FP8, FP16, BF16, FP32, FP64, I8, I32, I64 (dependent on the GPU architecture).
* Data type selection option ``--roofline-data-type / -R`` for roofline profiling. The default data type is FP32.

### Changed

* Change dependency from `rocm-smi` to `amd-smi`.

### Resolved issues

* Fixed a crash related to Agent ID caused by the new format of the `rocprofv3` output CSV file.


## ROCm Compute Profiler 3.1.0 for ROCm 6.4.0

