[DOC] TUI kernel selection (#94)
This commit is contained in:
Committed by:
GitHub
Parent
e84f93ea3b
Commit
d58adf96da
@@ -23,6 +23,39 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.

* Add support for analysis report output as a sqlite database using ``--output-format db`` analysis mode option

* `Compute Throughput` panel to TUI's `High Level Analysis` category with the following metrics:
  * VALU FLOPs
  * VALU IOPs
  * MFMA FLOPs (F8)
  * MFMA FLOPs (BF16)
  * MFMA FLOPs (F16)
  * MFMA FLOPs (F32)
  * MFMA FLOPs (F64)
  * MFMA FLOPs (F6F4) (in gfx950)
  * MFMA IOPs (Int8)
  * SALU Utilization
  * VALU Utilization
  * MFMA Utilization
  * VMEM Utilization
  * Branch Utilization
  * IPC

* `Memory Throughput` panel to TUI's `High Level Analysis` category with the following metrics:
  * vL1D Cache BW
  * vL1D Cache Utilization
  * Theoretical LDS Bandwidth
  * LDS Utilization
  * L2 Cache BW
  * L2 Cache Utilization
  * L2-Fabric Read BW
  * L2-Fabric Write BW
  * sL1D Cache BW
  * L1I BW
  * Address Processing Unit Busy
  * Data-Return Busy
  * L1I-L2 Bandwidth
  * sL1D-L2 BW

### Changed

* On the memory chart, long numbers are now displayed in scientific notation, which also resolves the overflow that occurred when displaying long numbers.
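
The memory chart entry in this hunk describes switching long numbers to scientific notation to avoid overflow. A minimal sketch of that idea follows; `format_chart_value` and its `max_width` threshold are hypothetical and are not taken from the ROCm Compute Profiler source.

```python
def format_chart_value(value: float, max_width: int = 8) -> str:
    """Format a number for a fixed-width chart cell.

    Hypothetical helper: keeps the plain decimal form when it fits in
    max_width characters, otherwise falls back to scientific notation.
    """
    # Plain form with up to two decimals, trailing zeros removed.
    plain = f"{value:.2f}".rstrip("0").rstrip(".")
    if len(plain) <= max_width:
        return plain
    # Too wide for the cell: use scientific notation with two decimals.
    return f"{value:.2e}"


print(format_chart_value(1234.5))
print(format_chart_value(123456789012.0))
```

The width threshold is the only tunable here; a real chart would presumably derive it from the available cell width.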
@@ -38,7 +71,9 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
* CLI analysis mode baseline comparison will now only compare common metrics across workloads and will not show Metric ID
* Remove metrics from analysis configuration files which are explicitly marked as empty or None

* Change the basic view of TUI from aggregated analysis data to individual kernel analysis data
* Changed the basic (default) view of TUI from aggregated analysis data to individual kernel analysis data.

* Updated Roofline plots to handle and apply kernel filtering.

* Update `Unit` of the following `Bandwidth` related metrics to `Gbps` instead of `Bytes per Normalization Unit`
  * Theoretical Bandwidth (section 1202)
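
The baseline comparison entry in this hunk says that only metrics common to all workloads are compared. As a rough illustration of that behavior, and not the profiler's actual implementation, the sketch below intersects the metric names of two workloads; the function name and the dictionary layout are assumptions made for the example.

```python
def compare_common_metrics(
    baseline: dict[str, float], current: dict[str, float]
) -> dict[str, tuple[float, float]]:
    """Pair up values for metrics present in both workloads.

    Hypothetical sketch: metrics that appear in only one workload are
    skipped rather than reported with a missing value.
    """
    # Set intersection on the keys keeps only the shared metric names.
    common = sorted(baseline.keys() & current.keys())
    return {name: (baseline[name], current[name]) for name in common}


baseline = {"VALU Utilization": 40.0, "IPC": 1.2, "LDS Utilization": 5.0}
current = {"VALU Utilization": 55.0, "IPC": 1.5, "L2 Cache BW": 300.0}
print(compare_common_metrics(baseline, current))
```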
@@ -71,39 +106,6 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
* SIMD Utilization
* Clock Rate

* Add `Compute Throughput` panel to TUI with the following metrics:
  * VALU FLOPs
  * VALU IOPs
  * MFMA FLOPs (F8)
  * MFMA FLOPs (BF16)
  * MFMA FLOPs (F16)
  * MFMA FLOPs (F32)
  * MFMA FLOPs (F64)
  * MFMA FLOPs (F6F4) (in gfx950)
  * MFMA IOPs (Int8)
  * SALU Utilization
  * VALU Utilization
  * MFMA Utilization
  * VMEM Utilization
  * Branch Utilization
  * IPC

* Add `Memory Throughput` panel to TUI with the following metrics:
  * vL1D Cache BW
  * vL1D Cache Utilization
  * Theoretical LDS Bandwidth
  * LDS Utilization
  * L2 Cache BW
  * L2 Cache Utilization
  * L2-Fabric Read BW
  * L2-Fabric Write BW
  * sL1D Cache BW
  * L1I BW
  * Address Processing Unit Busy
  * Data-Return Busy
  * L1I-L2 Bandwidth
  * sL1D-L2 BW

* Analysis output:
  * Replace `-o / --output` analyze mode option with `--output-format` and `--output-name`
  * Add ``--output-format`` analysis mode option to select the output format of the analysis report.
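
The changelog mentions that `--output-format db` writes the analysis report as a sqlite database. The schema of that database is not described here, so the sketch below only lists the tables in a database file using Python's standard `sqlite3` module; the file name in the usage comment is a placeholder, not a documented output path.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database file."""
    # closing() ensures the connection is closed when the block exits.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Usage (hypothetical file name; substitute the file the tool produced):
# print(list_tables("analysis.db"))
```

Listing the tables first is a safe way to discover what the report contains before writing any queries against it.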
@@ -118,32 +120,30 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
* `--list-available-metrics` analyze mode option to display the metrics available for analysis.
* `--block` option cannot be used with `--list-metrics` and `--list-available-metrics` options.

### Resolved issues
### Removed

* Fixed not detecting memory clock issue when using amd-smi
* Fixed standalone GUI crashing
* Fixed L2 read/write/atomic bandwidths on MI350
* Update metric names for better alignment between analysis configuration and documentation
* Fixed an issue where accumulation counters could not be collected on AMD Instinct MI100
* Updated Roofline plots to handle and apply kernel filtering.


### Known issues
* Usage of `rocm-smi` in favor of `amd-smi`.
* Hardware IP block-based filtering has been removed in favor of analysis report block-based filtering.
* Removed aggregated analysis view from TUI analyze mode.

### Optimized

* Improved `--time-unit` option in analyze mode to apply time unit conversion across all analysis sections, not just kernel top stats.
* Improved logic to obtain rocprof supported counters which prevents unnecessary warnings.
* Improved post-analysis runtime performance by caching and multi-processing.

* Improve logic to obtain rocprof supported counters which prevents unnecessary warnings
### Resolved issues

* Improve post-analysis runtime performance by caching and multi-processing
* Fixed an issue of not detecting the memory clock when using `amd-smi`.
* Fixed standalone GUI crashing.
* Fixed L2 read/write/atomic bandwidths on AMD Instinct MI350 series accelerators.
* Update metric names for better alignment between analysis configuration and documentation
* Fixed an issue where accumulation counters could not be collected on AMD Instinct MI100.
* Updated Roofline plots to handle and apply kernel filtering.

### Removed

* Usage of rocm-smi
* Hardware IP block based filtering has been removed in favor of analysis report block based filtering
* Remove aggregated analysis view from TUI mode
### Known issues

### Upcoming changes

## ROCm Compute Profiler 3.2.3 for ROCm 7.0.0

Binary file not shown.
Before Width: | Height: | Size: 79 KiB
@@ -8,10 +8,10 @@ Text-based User Interface (TUI) analysis

ROCm Compute Profiler's analyze mode now supports a lightweight Text-based User Interface (TUI)
that provides an interactive terminal experience for enhanced usability. You can use the TUI
interface as a more visually engaging and interactive alternative to explore analysis results
compared to the standard :doc:`cli`. It provides enhanced visual feedback and easy navigation without
needing the extra setup of a full graphical interface. This analysis option is implemented as a
terminal-based interface that offers real-time visual feedback, keyboard shortcuts for common
interface as a more visually engaging and interactive alternative to explore individual kernel analysis
results compared to the standard :doc:`cli`. It provides enhanced visual feedback and easy navigation
without needing the extra setup of a full graphical interface. This analysis option is implemented as
a terminal-based interface that offers real-time visual feedback, keyboard shortcuts for common
actions, and improved readability with formatted output.

.. note::
@@ -30,19 +30,24 @@ For example:

   $ rocprof-compute analyze --tui

2. To start the analysis, use the dropdown menu at the top left of the screen to select a single
   workload from ``rocprof-compute profile`` generated output directories.
2. To start the individual kernel analysis, use the drop-down menu at the top left of the screen to select
   a single workload from ``rocprof-compute profile`` generated output directories.

.. image:: ../../data/analyze/tui.png
.. image:: ../../data/analyze/tui_home.png
   :align: center
   :alt: ROCm Compute Profiler TUI home screen
   :width: 800

3. You can see the center window update with collapsed contents. Uncollapse to view tables, charts,
   and graphs visualizing the analysis data.
3. You can see the center window update with a top header for kernel selection and collapsed contents beneath.
   Select a kernel of interest to load the corresponding analysis results. The top kernel is selected by default.

4. After the analysis results are loaded, you can start interactive analysis with detailed metrics.
   You can left click on any metric cell to view detailed descriptions in the dedicated `METRIC DESCRIPTION` tab.
.. image:: ../../data/analyze/tui_kernel_selection.png
   :align: center
   :alt: ROCm Compute Profiler TUI home screen
   :width: 800

4. After the analysis results are loaded, you can start interactive analysis with the detailed metrics by
   expanding the collapsed contents to view tables, charts, and graphs, and visualizing the analysis data.
   The TUI supports basic keyboard shortcuts, including quit application commands for easy navigation.

TUI analysis structure
@@ -51,10 +56,28 @@ TUI analysis structure
Unlike the :doc:`cli` plain style interfaces, the TUI restructures the analysis workflow into four
hierarchical categories to provide a more organized, top-down analysis approach:

1. Top Stat
2. High Level analysis
3. Detailed block analysis
4. Source Level analysis
#. Kernel Selection Header with Top Stats:

   Supports interactive kernel selection so that you can switch between kernels and view the
   analysis results for each individual kernel.

#. High Level Analysis:

   Provides an experimental layout that regroups the performance metrics into the new
   GPU Speed-of-Light, Compute Throughput, and Memory Throughput sections.

#. Detailed Block Analysis

   Displays analysis results grouped by metric blocks, similar to the CLI output.
   When applicable, performance metrics are shown as charts instead of only tables,
   providing a more visual representation.

#. Source Level analysis

   Displays the PC Sampling section.
   PC sampling is not enabled by default during the profiling stage, so this section
   is empty unless you enable it. Refer to :doc:`../pc_sampling` for details on how to
   build and enable PC sampling manually.

Follow this top-down hierarchical structure to conduct a thorough performance
analysis, starting with the broad overview and progressively drilling down to specific details.

@@ -153,14 +153,6 @@ Analyze mode
To generate a lightweight GUI interface, you can add the ``--gui`` flag to your
analysis command.

Analyze mode now supports a lightweight Text-based User Interface (TUI) that
provides an interactive terminal experience for enhanced usability. To enable TUI mode,
use the ``--tui`` flag when running the analyze command:

.. code-block:: shell

   $ rocprof-compute analyze --tui

This mode is a middle ground to the highly detailed ROCm Compute Profiler Grafana GUI and
is great if you want immediate access to a hardware component you’re already
familiar with.
@@ -169,7 +161,15 @@ Analyze mode

   $ rocprof-compute analyze --help

See :doc:`analyze/mode` to learn about this mode in depth and to get started
Analyze mode now supports a lightweight Text-based User Interface (TUI) that
provides an interactive terminal experience for enhanced usability. To enable TUI mode,
use the ``--tui`` flag when running the analyze command:

.. code-block:: shell

   $ rocprof-compute analyze --tui

See :doc:`analyze/mode` to learn about these modes in depth and to get started
with analysis using ROCm Compute Profiler.

.. _modes-database:
