diff --git a/projects/rocprofiler-compute/CHANGELOG.md b/projects/rocprofiler-compute/CHANGELOG.md index 33addf4b52..dee38fcd7e 100644 --- a/projects/rocprofiler-compute/CHANGELOG.md +++ b/projects/rocprofiler-compute/CHANGELOG.md @@ -23,6 +23,39 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs. * Add support for analysis report output as a sqlite database using ``--output-format db`` analysis mode option +* `Compute Throughput` panel to the TUI's `High Level Analysis` category with the following metrics: + * VALU FLOPs + * VALU IOPs + * MFMA FLOPs (F8) + * MFMA FLOPs (BF16) + * MFMA FLOPs (F16) + * MFMA FLOPs (F32) + * MFMA FLOPs (F64) + * MFMA FLOPs (F6F4) (on gfx950) + * MFMA IOPs (Int8) + * SALU Utilization + * VALU Utilization + * MFMA Utilization + * VMEM Utilization + * Branch Utilization + * IPC + +* `Memory Throughput` panel to the TUI's `High Level Analysis` category with the following metrics: + * vL1D Cache BW + * vL1D Cache Utilization + * Theoretical LDS Bandwidth + * LDS Utilization + * L2 Cache BW + * L2 Cache Utilization + * L2-Fabric Read BW + * L2-Fabric Write BW + * sL1D Cache BW + * L1I BW + * Address Processing Unit Busy + * Data-Return Busy + * L1I-L2 Bandwidth + * sL1D-L2 BW + ### Changed * On memory chart, long string of numbers are displayed as scientific notation. It also solves the issue of overflow of displaying long number @@ -38,7 +71,9 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs. * CLI analysis mode baseline comparison will now only compare common metrics across workloads and will not show Metric ID * Remove metrics from analysis configuration files which are explicitly marked as empty or None -* Change the basic view of TUI from aggregated analysis data to individual kernel analysis data +* Changed the basic (default) view of the TUI from aggregated analysis data to individual kernel analysis data. + +* Updated Roofline plots to handle and apply kernel filtering.
* Update `Unit` of the following `Bandwidth` related metrics to `Gbps` instead of `Bytes per Normalization Unit` * Theoretical Bandwidth (section 1202) @@ -71,39 +106,6 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs. * SIMD Utilization * Clock Rate -* Add `Compute Throughput` panel to TUI with the following metrics: - * VALU FLOPs - * VALU IOPs - * MFMA FLOPs (F8) - * MFMA FLOPs (BF16) - * MFMA FLOPs (F16) - * MFMA FLOPs (F32) - * MFMA FLOPs (F64) - * MFMA FLOPs (F6F4) (in gfx950) - * MFMA IOPs (Int8) - * SALU Utilization - * VALU Utilization - * MFMA Utilization - * VMEM Utilization - * Branch Utilization - * IPC - -* Add `Memory Throughput` panel to TUI with the following metrics: - * vL1D Cache BW - * vL1D Cache Utilization - * Theoretical LDS Bandwidth - * LDS Utilization - * L2 Cache BW - * L2 Cache Utilization - * L2-Fabric Read BW - * L2-Fabric Write BW - * sL1D Cache BW - * L1I BW - * Address Processing Unit Busy - * Data-Return Busy - * L1I-L2 Bandwidth - * sL1D-L2 BW - * Analysis output: * Replace `-o / --output` analyze mode option with `--output-format` and `--output-name` * Add ``--output-format`` analysis mode option to select the output format of the analysis report. @@ -118,32 +120,30 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs. * `--list-available-metrics` analyze mode option to display the metrics available for analysis. * `--block` option cannot be used with `--list-metrics` and `--list-available-metrics`options. -### Resolved issues +### Removed -* Fixed not detecting memory clock issue when using amd-smi -* Fixed standalone GUI crashing -* Fixed L2 read/write/atomic bandwidths on MI350 -* Update metric names for better alignment between analysis configuration and documentation -* Fixed an issue where accumulation counters could not be collected on AMD Instinct MI100 -* Updated Roofline plots to handle and apply kernel filtering. 
- - -### Known issues +* Usage of `rocm-smi` in favor of `amd-smi`. +* Hardware IP block-based filtering has been removed in favor of analysis report block-based filtering. +* Removed the aggregated analysis view from TUI analyze mode. ### Optimized * Improved `--time-unit` option in analyze mode to apply time unit conversion across all analysis sections, not just kernel top stats. +* Improved the logic for obtaining rocprof-supported counters, which prevents unnecessary warnings. +* Improved post-analysis runtime performance by caching and multi-processing. -* Improve logic to obtain rocprof supported counters which prevents unnecessary warnings +### Resolved issues -* Improve post-analysis runtime performance by caching and multi-processing +* Fixed an issue where the memory clock was not detected when using `amd-smi`. +* Fixed the standalone GUI crashing. +* Fixed L2 read/write/atomic bandwidths on AMD Instinct MI350 series accelerators. +* Updated metric names for better alignment between analysis configuration and documentation. +* Fixed an issue where accumulation counters could not be collected on AMD Instinct MI100. +* Updated Roofline plots to handle and apply kernel filtering.
-### Removed - -* Usage of rocm-smi -* Hardware IP block based filtering has been removed in favor of analysis report block based filtering -* Remove aggregated analysis view from TUI mode +### Known issues +### Upcoming changes ## ROCm Compute Profiler 3.2.3 for ROCm 7.0.0 diff --git a/projects/rocprofiler-compute/docs/data/analyze/tui.png b/projects/rocprofiler-compute/docs/data/analyze/tui.png deleted file mode 100644 index 60f7c2b6f0..0000000000 Binary files a/projects/rocprofiler-compute/docs/data/analyze/tui.png and /dev/null differ diff --git a/projects/rocprofiler-compute/docs/how-to/analyze/tui.rst b/projects/rocprofiler-compute/docs/how-to/analyze/tui.rst index aa4438fb48..95d3c9902c 100644 --- a/projects/rocprofiler-compute/docs/how-to/analyze/tui.rst +++ b/projects/rocprofiler-compute/docs/how-to/analyze/tui.rst @@ -8,10 +8,10 @@ Text-based User Interface (TUI) analysis ROCm Compute Profiler's analyze mode now supports a lightweight Text-based User Interface (TUI) that provides an interactive terminal experience for enhanced usability. You can use the TUI -interface as a more visually engaging and interactive alternative to explore analysis results -compared to the standard :doc:`cli`. It provides enhanced visual feedback and easy navigation without -needing the extra setup of a full graphical interface. This analysis option is implemented as a -terminal-based interface that offers real-time visual feedback, keyboard shortcuts for common +interface as a more visually engaging and interactive alternative to the standard :doc:`cli` for +exploring individual kernel analysis results. It provides enhanced visual feedback and easy navigation +without needing the extra setup of a full graphical interface. This analysis option is implemented as +a terminal-based interface that offers real-time visual feedback, keyboard shortcuts for common actions, and improved readability with formatted output. ..
note:: @@ -30,19 +30,24 @@ For example: $ rocprof-compute analyze --tui -2. To start the analysis, use the dropdown menu at the top left of the screen to select a single -workload from ``rocprof-compute profile`` generated output directories. +2. To start the individual kernel analysis, use the drop-down menu at the top left of the screen to select +a single workload from the output directories generated by ``rocprof-compute profile``. -.. image:: ../../data/analyze/tui.png +.. image:: ../../data/analyze/tui_home.png :align: center :alt: ROCm Compute Profiler TUI home screen :width: 800 -3. You can see the center window update with collapsed contents. Uncollapse to view tables, charts, -and graphs visualizing the analysis data. +3. The center window updates with a kernel selection header at the top and collapsed contents beneath it. +Select a kernel of interest to load the corresponding analysis results. The top kernel is selected by default. -4. After the analysis results are loaded, you can start interactive analysis with detailed metrics. -You can left click on any metric cell to view detailed descriptions in the dedicated `METRIC DESCRIPTION` tab. +.. image:: ../../data/analyze/tui_kernel_selection.png + :align: center + :alt: ROCm Compute Profiler TUI kernel selection header + :width: 800 + +4. After the analysis results are loaded, start interactive analysis of the detailed metrics by +expanding the collapsed contents to view the tables, charts, and graphs that visualize the analysis data. The TUI supports basic keyboard shortcuts, including quit application commands for easy navigation. TUI analysis structure @@ -51,10 +56,28 @@ TUI analysis structure Unlike the :doc:`cli` plain style interfaces, the TUI restructures the analysis workflow into four hierarchical categories to provide a more organized, top-down analysis approach: -1. Top Stat -2. High Level analysis -3. Detailed block analysis -4. Source Level analysis +#.
Kernel Selection Header with Top Stats: + + Supports interactive kernel selection so you can switch between kernels and view the individual kernel + analysis results. + +#. High Level Analysis: + + An experimental layout that regroups performance metrics into the new GPU Speed-of-Light, + Compute Throughput, and Memory Throughput sections. + +#. Detailed Block Analysis: + + Displays analysis results grouped by metric blocks, similar to the CLI output. + When applicable, performance metrics are shown as charts instead of only tables, + providing a more visual representation. + +#. Source Level Analysis: + + Displays the PC Sampling section. + PC sampling is not enabled by default during the + profiling stage. Refer to :doc:`../pc_sampling` for details on how to build and enable PC sampling + manually. You are recommended to follow this top-down hierarchical structure to conduct a thorough performance analysis, starting with the broad overview and progressively drilling down to specific details. diff --git a/projects/rocprofiler-compute/docs/how-to/use.rst b/projects/rocprofiler-compute/docs/how-to/use.rst index 2c43a111ce..614be6a53b 100644 --- a/projects/rocprofiler-compute/docs/how-to/use.rst +++ b/projects/rocprofiler-compute/docs/how-to/use.rst @@ -153,14 +153,6 @@ Analyze mode To generate a lightweight GUI interface, you can add the ``--gui`` flag to your analysis command. - Analyze mode now supports a lightweight Text-based User Interface (TUI) that - provides an interactive terminal experience for enhanced usability. To enable TUI mode, - use the ``--tui`` flag when running the analyze command: - - .. code-block:: shell - - $ rocprof-compute analyze --tui - This mode is a middle ground to the highly detailed ROCm Compute Profiler Grafana GUI and is great if you want immediate access to a hardware component you’re already familiar with.
@@ -169,7 +161,15 @@ Analyze mode $ rocprof-compute analyze --help -See :doc:`analyze/mode` to learn about this mode in depth and to get started + Analyze mode now supports a lightweight Text-based User Interface (TUI) that + provides an interactive terminal experience for enhanced usability. To enable TUI mode, + use the ``--tui`` flag when running the analyze command: + + .. code-block:: shell + + $ rocprof-compute analyze --tui + +See :doc:`analyze/mode` to learn about these modes in depth and to get started with analysis using ROCm Compute Profiler. .. _modes-database: