[DOC] TUI kernel selection (#94)

This commit is contained in:
systems-assistant[bot]
2025-09-08 13:52:39 -04:00
committed by GitHub
parent e84f93ea3b
commit d58adf96da
4 changed files with 98 additions and 75 deletions
+51 -51
@@ -23,6 +23,39 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
* Add support for analysis report output as a sqlite database using ``--output-format db`` analysis mode option
* `Compute Throughput` panel to TUI's `High Level Analysis` category with the following metrics:
* VALU FLOPs
* VALU IOPs
* MFMA FLOPs (F8)
* MFMA FLOPs (BF16)
* MFMA FLOPs (F16)
* MFMA FLOPs (F32)
* MFMA FLOPs (F64)
* MFMA FLOPs (F6F4) (in gfx950)
* MFMA IOPs (Int8)
* SALU Utilization
* VALU Utilization
* MFMA Utilization
* VMEM Utilization
* Branch Utilization
* IPC
* `Memory Throughput` panel to TUI's `High Level Analysis` category with the following metrics:
* vL1D Cache BW
* vL1D Cache Utilization
* Theoretical LDS Bandwidth
* LDS Utilization
* L2 Cache BW
* L2 Cache Utilization
* L2-Fabric Read BW
* L2-Fabric Write BW
* sL1D Cache BW
* L1I BW
* Address Processing Unit Busy
* Data-Return Busy
* L1I-L2 Bandwidth
* sL1D-L2 BW
### Changed
* On memory chart, long string of numbers are displayed as scientific notation. It also solves the issue of overflow of displaying long number
@@ -38,7 +71,9 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
* CLI analysis mode baseline comparison will now only compare common metrics across workloads and will not show Metric ID
* Remove metrics from analysis configuration files which are explicitly marked as empty or None
* Change the basic view of TUI from aggregated analysis data to individual kernel analysis data
* Changed the basic (default) view of TUI from aggregated analysis data to individual kernel analysis data.
* Updated Roofline plots to handle and apply kernel filtering.
* Update `Unit` of the following `Bandwidth` related metrics to `Gbps` instead of `Bytes per Normalization Unit`
* Theoretical Bandwidth (section 1202)
@@ -71,39 +106,6 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
* SIMD Utilization
* Clock Rate
* Add `Compute Throughput` panel to TUI with the following metrics:
* VALU FLOPs
* VALU IOPs
* MFMA FLOPs (F8)
* MFMA FLOPs (BF16)
* MFMA FLOPs (F16)
* MFMA FLOPs (F32)
* MFMA FLOPs (F64)
* MFMA FLOPs (F6F4) (in gfx950)
* MFMA IOPs (Int8)
* SALU Utilization
* VALU Utilization
* MFMA Utilization
* VMEM Utilization
* Branch Utilization
* IPC
* Add `Memory Throughput` panel to TUI with the following metrics:
* vL1D Cache BW
* vL1D Cache Utilization
* Theoretical LDS Bandwidth
* LDS Utilization
* L2 Cache BW
* L2 Cache Utilization
* L2-Fabric Read BW
* L2-Fabric Write BW
* sL1D Cache BW
* L1I BW
* Address Processing Unit Busy
* Data-Return Busy
* L1I-L2 Bandwidth
* sL1D-L2 BW
* Analysis output:
* Replace `-o / --output` analyze mode option with `--output-format` and `--output-name`
* Add ``--output-format`` analysis mode option to select the output format of the analysis report.
@@ -118,32 +120,30 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
* `--list-available-metrics` analyze mode option to display the metrics available for analysis.
* `--block` option cannot be used with `--list-metrics` and `--list-available-metrics`options.
### Resolved issues
### Removed
* Fixed not detecting memory clock issue when using amd-smi
* Fixed standalone GUI crashing
* Fixed L2 read/write/atomic bandwidths on MI350
* Update metric names for better alignment between analysis configuration and documentation
* Fixed an issue where accumulation counters could not be collected on AMD Instinct MI100
* Updated Roofline plots to handle and apply kernel filtering.
### Known issues
* Usage of `rocm-smi` in favor of `amd-smi`.
* Hardware IP block-based filtering has been removed in favor of analysis report block-based filtering.
* Removed aggregated analysis view from TUI analyze mode.
### Optimized
* Improved `--time-unit` option in analyze mode to apply time unit conversion across all analysis sections, not just kernel top stats.
* Improved logic to obtain rocprof supported counters which prevents unnecessary warnings.
* Improved post-analysis runtime performance by caching and multi-processing.
* Improve logic to obtain rocprof supported counters which prevents unnecessary warnings
### Resolved issues
* Improve post-analysis runtime performance by caching and multi-processing
* Fixed an issue of not detecting the memory clock when using `amd-smi`.
* Fixed standalone GUI crashing.
* Fixed L2 read/write/atomic bandwidths on AMD Instinct MI350 series accelerators.
* Update metric names for better alignment between analysis configuration and documentation
* Fixed an issue where accumulation counters could not be collected on AMD Instinct MI100.
* Updated Roofline plots to handle and apply kernel filtering.
### Removed
* Usage of rocm-smi
* Hardware IP block based filtering has been removed in favor of analysis report block based filtering
* Remove aggregated analysis view from TUI mode
### Known issues
### Upcoming changes
## ROCm Compute Profiler 3.2.3 for ROCm 7.0.0
Binary file not shown.

+38 -15
@@ -8,10 +8,10 @@ Text-based User Interface (TUI) analysis
ROCm Compute Profiler's analyze mode now supports a lightweight Text-based User Interface (TUI)
that provides an interactive terminal experience for enhanced usability. You can use the TUI
interface as a more visually engaging and interactive alternative to explore analysis results
compared to the standard :doc:`cli`. It provides enhanced visual feedback and easy navigation without
needing the extra setup of a full graphical interface. This analysis option is implemented as a
terminal-based interface that offers real-time visual feedback, keyboard shortcuts for common
interface as a more visually engaging and interactive alternative to explore individual kernel analysis
results compared to the standard :doc:`cli`. It provides enhanced visual feedback and easy navigation
without needing the extra setup of a full graphical interface. This analysis option is implemented as
a terminal-based interface that offers real-time visual feedback, keyboard shortcuts for common
actions, and improved readability with formatted output.
.. note::
@@ -30,19 +30,24 @@ For example:
$ rocprof-compute analyze --tui
2. To start the analysis, use the dropdown menu at the top left of the screen to select a single
workload from ``rocprof-compute profile`` generated output directories.
2. To start the individual kernel analysis, use the drop-down menu at the top left of the screen to select
a single workload from ``rocprof-compute profile`` generated output directories.
.. image:: ../../data/analyze/tui.png
.. image:: ../../data/analyze/tui_home.png
:align: center
:alt: ROCm Compute Profiler TUI home screen
:width: 800
3. You can see the center window update with collapsed contents. Uncollapse to view tables, charts,
and graphs visualizing the analysis data.
3. You can see the center window update with a top header for kernel selection and collapsed contents beneath.
Select a kernel of interest to load the corresponding analysis results. The top kernel is selected by default.
4. After the analysis results are loaded, you can start interactive analysis with detailed metrics.
You can left click on any metric cell to view detailed descriptions in the dedicated `METRIC DESCRIPTION` tab.
.. image:: ../../data/analyze/tui_kernel_selection.png
:align: center
:alt: ROCm Compute Profiler TUI home screen
:width: 800
4. After the analysis results are loaded, you can start interactive analysis with the detailed metrics by
expanding the collapsed contents to view tables, charts, and graphs, and visualizing the analysis data.
The TUI supports basic keyboard shortcuts, including quit application commands for easy navigation.
TUI analysis structure
@@ -51,10 +56,28 @@ TUI analysis structure
Unlike the :doc:`cli` plain style interfaces, the TUI restructures the analysis workflow into four
hierarchical categories to provide a more organized, top-down analysis approach:
1. Top Stat
2. High Level analysis
3. Detailed block analysis
4. Source Level analysis
#. Kernel Selection Header with Top Stats:
Supports interactive kernel selection to toggle between kernel(s) to view individual kernel
analysis results.
#. High Level Analysis:
Experimental performance metrics layout, reorganized performance metrics grouping to display the new
GPU Speed-of-Light section, Compute Throughput section, and Memory Throughput section.
#. Detailed Block Analysis
Displays analysis results grouped by metric blocks, similar to the CLI output.
When applicable, performance metrics are shown as charts instead of only tables,
providing a more visual representation.
#. Source Level analysis
Displays the PC Sampling section.
Source Level analysis does not have PC sampling enabled by default during the
profiling stage. Refer to :doc:`../pc_sampling` for details on how to build and enable PC sampling
manually.
You are recommended to follow this top-down hierarchical structure to conduct a thorough performance
analysis, starting with the broad overview and progressively drilling down to specific details.
+9 -9
@@ -153,14 +153,6 @@ Analyze mode
To generate a lightweight GUI interface, you can add the ``--gui`` flag to your
analysis command.
Analyze mode now supports a lightweight Text-based User Interface (TUI) that
provides an interactive terminal experience for enhanced usability. To enable TUI mode,
use the ``--tui`` flag when running the analyze command:
.. code-block:: shell
$ rocprof-compute analyze --tui
This mode is a middle ground to the highly detailed ROCm Compute Profiler Grafana GUI and
is great if you want immediate access to a hardware component you’re already
familiar with.
@@ -169,7 +161,15 @@ Analyze mode
$ rocprof-compute analyze --help
See :doc:`analyze/mode` to learn about this mode in depth and to get started
Analyze mode now supports a lightweight Text-based User Interface (TUI) that
provides an interactive terminal experience for enhanced usability. To enable TUI mode,
use the ``--tui`` flag when running the analyze command:
.. code-block:: shell
$ rocprof-compute analyze --tui
See :doc:`analyze/mode` to learn about these modes in depth and to get started
with analysis using ROCm Compute Profiler.
.. _modes-database: