Support MI 350 profiling (#632)

* Add MI 350 hardware information
* Refactor MI GPU YAML file and corresponding interface
* Add SoC file for gfx950 architecture
* Add analysis report configs for MI 350 containing existing metrics
* Add placeholder None valued metrics for previous architectures to make baseline comparison work
* Enable testing on MI 350
* Analysis config metric changes
  - SPI changes
    - Update metric formula for default SPI pipe counter
    - Use efficiently collected pipe wise SPI counters
    - Add SPI Wave Occupancy
    - Add Scheduler-Pipe Wave Utilization
    - Update formula for VGPR Writes
    - Add Scheduler-Pipe FIFO Full Rate
  - CPC changes
    - Add CPC SYNC FIFO Full Rate
    - Add CPC CANE Stall Rate
    - Add CPC ADC Utilization
  - SQ changes
    - Add VALU co-issue efficiency
    - Add F6F4 datatype metrics
    - Update formula for total FLOPs by adding F6F4 counters
    - Add LDS STORE / LOAD / ATOMIC metrics
    - Add LDS STORE / LOAD / ATOMIC bandwidth
    - Add LDS FIFO and TA ADDR / CMD / DATA FIFO full rates
* Collect TCP_TCP_LATENCY_sum only for gfx950 (MI 350)
* Do not inject SQ_ACCUM_PREV_HIRES unnecessarily
* Do not hardcode memory and shader clock speeds
* Write num_hbm_channels to sysinfo.csv instead of hbm_bw while profiling
* Move generate sysinfo.csv to pre processing step of profiling
* Add warnings to use --specs-correction for missing sysinfo.csv values during analysis phase
* Update CHANGELOG
* Analysis phase warning to use --specs-correction when needed

[ROCm/rocprofiler-compute commit: f9aa7be97c]
2025-04-03 02:21:18 -04:00
commit 27585a8a2b
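
The updated `--specs-correction` help text in this commit gives the value format as `specname1:specvalue1,specname2:specvalue2`. As a rough illustration of that format only, the string could be split into name/value pairs as sketched here. The function name `parse_specs_correction` and its error handling are hypothetical and are not taken from the project's code, which may parse and validate the option differently.

```python
def parse_specs_correction(text: str) -> dict[str, str]:
    """Split 'name1:value1,name2:value2' into a {name: value} dict.

    Hypothetical helper for illustration; not the project's implementation.
    """
    corrections: dict[str, str] = {}
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty entries such as a trailing comma
        # partition() splits on the first ':' only, so values may contain ':'
        name, sep, value = item.partition(":")
        if not sep or not name.strip() or not value.strip():
            raise ValueError(f"expected 'specname:specvalue', got {item!r}")
        corrections[name.strip()] = value.strip()
    return corrections


print(parse_specs_correction("specname1:specvalue1,specname2:specvalue2"))
```

Values are kept as strings in this sketch; converting them to numbers is left out because the expected type depends on the spec being corrected.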
@@ -22,17 +22,33 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.

 * Support host-trap PC Sampling on CLI (beta version)

-* Add support for tuned performance counters for gfx950 GPUs
-  * Add L1 latencies
-  * Add L2 latencies
-  * Add L2 to EA stalls
-  * Add L2 to EA stalls per channel
+* Support for AMD Instinct MI350 series GPUs with the addition of the following counters:
+  * VALU co-issue (Two VALUs are issued instructions) efficiency
+  * Stream Processor Instruction (SPI) Wave Occupancy
+  * Scheduler-Pipe Wave Utilization
+  * Scheduler FIFO Full Rate
+  * CPC ADC Utilization
+  * F6F4 datatype metrics
+  * Update formula for total FLOPs while taking into account F6F4 ops
+  * LDS STORE, LDS LOAD, LDS ATOMIC instruction count metrics
+  * LDS STORE, LDS LOAD, LDS ATOMIC bandwidth metrics
+  * LDS FIFO full rate
+  * Sequencer -> TA ADDR Stall rates
+  * Sequencer -> TA CMD Stall rates
+  * Sequencer -> TA DATA Stall rates
+  * L1 latencies
+  * L2 latencies
+  * L2 to EA stalls
+  * L2 to EA stalls per channel

 ### Changed

 * Change normal_unit default to per_kernel
 * Change dependency from rocm-smi to amd-smi
 * Decrease profiling time by not collecting counters not used in post analysis
+* Update definition of following metrics for MI 350:
+  * VGPR Writes
+  * Total FLOPs (consider fp6 and fp4 ops)

 ### Resolved issues

@@ -44,6 +60,14 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.

 * GPU id filtering is not supported when using rocprof v3

+* Analysis of previously collected workload data will not work due to sysinfo.csv schema change
+  * As a workaround, run the profiling operation again for the workload and interrupt the process after ten seconds.
+    Then copy the `sysinfo.csv` file from the new data folder to the old one.
+    This assumes your system specification hasn't changed since the creation of the previous workload data.
+
+* Analysis of new workloads might require providing shader/memory clock speed using the
+`--specs-correction` option if `amd-smi` or `rocminfo` does not provide clock speeds.
+
 ## ROCm Compute Profiler 3.1.0 for ROCm 6.4.0

 ### Added
@@ -292,9 +292,8 @@ add_test(
 add_test(
    NAME test_L1_cache_counters
    COMMAND
-        ${Python3_EXECUTABLE} -m pytest -m L1_cache
-        --junitxml=tests/test_TCP_counters.xml ${COV_OPTION}
-        ${PROJECT_SOURCE_DIR}/tests/test_TCP_counters.py
+        ${Python3_EXECUTABLE} -m pytest -m L1_cache --junitxml=tests/test_TCP_counters.xml
+        ${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_TCP_counters.py
    WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})

 # ---------
@@ -673,7 +673,7 @@ Examples:
        "--specs-correction",
        type=str,
        metavar="",
-        help="\t\tSpecify the specs to correct.",
+        help="\t\tSpecify the specs to correct. e.g. --specs-correction='specname1:specvalue1,specname2:specvalue2'",
    )
    analyze_advanced_group.add_argument(
        "--list-nodes",
@@ -107,7 +107,6 @@ class webui_analysis(OmniAnalyze_Base):
            console_debug("analysis", "gui normalization is %s" % norm_filt)

            base_data = self.initalize_runs()  # Re-initalizes everything
-            hbm_bw = base_data[base_run].sys_info["hbm_bw"][0]
            panel_configs = copy.deepcopy(arch_configs.panel_configs)
            # Generate original raw df
            base_data[base_run].raw_pmc = file_io.create_df_pmc(
@@ -231,7 +230,6 @@ class webui_analysis(OmniAnalyze_Base):
                                norm_filt=norm_filt,
                                comparable_columns=comparable_columns,
                                decimal=self.get_args().decimal,
-                                hbm_bw=base_data[base_run].sys_info["hbm_bw"][0],
                            )

                            # Update content for this section
@@ -358,7 +356,6 @@ def determine_chart_type(
    norm_filt,
    comparable_columns,
    decimal,
-    hbm_bw,
 ):
    content = []

@@ -372,9 +369,7 @@ def determine_chart_type(
    # Determine chart type:
    # a) Barchart
    if table_config["id"] in [x for i in barchart_elements.values() for x in i]:
-        d_figs = build_bar_chart(
-            display_df, table_config, barchart_elements, norm_filt, hbm_bw
-        )
+        d_figs = build_bar_chart(display_df, table_config, barchart_elements, norm_filt)
        # Smaller formatting if barchart yeilds several graphs
        if (
            len(d_figs)
@@ -311,6 +311,21 @@ class RocProfCompute_Base:
        if self.__args.name.find(".") != -1 or self.__args.name.find("-") != -1:
            console_error("'-' and '.' are not permitted in -n/--name")

+        gen_sysinfo(
+            workload_name=self.__args.name,
+            workload_dir=self.get_args().path,
+            ip_blocks=[
+                name
+                for name, type in self.__args.filter_blocks.items()
+                if type == "hardware_block"
+            ],
+            app_cmd=self.__args.remaining,
+            skip_roof=self.__args.no_roof,
+            roof_only=self.__args.roof_only,
+            mspec=self._soc._mspec,
+            soc=self._soc,
+        )
+
    @abstractmethod
    def run_profiling(self, version: str, prog: str):
        """Run profiling."""
@@ -446,21 +461,6 @@ class RocProfCompute_Base:
            "performing post-processing using %s profiler" % self.__profiler,
        )

-        gen_sysinfo(
-            workload_name=self.__args.name,
-            workload_dir=self.get_args().path,
-            ip_blocks=[
-                name
-                for name, type in self.__args.filter_blocks.items()
-                if type == "hardware_block"
-            ],
-            app_cmd=self.__args.remaining,
-            skip_roof=self.__args.no_roof,
-            roof_only=self.__args.roof_only,
-            mspec=self._soc._mspec,
-            soc=self._soc,
-        )
-

 def test_df_column_equality(df):
    return df.eq(df.iloc[:, 0], axis=0).all(1).all()
@@ -62,6 +62,13 @@ Panel Config:
            peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
            pop: None # No perf counter
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
+          MFMA FLOPs (F6F4):
+            value: None
+            unit: GFLOP
+            peak: None
+            pop: None
+            tips:
          MFMA IOPs (Int8):
            value: None # No perf counter
            unit: GOPs
@@ -179,17 +186,17 @@ Panel Config:
            value: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
              * 64)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
-            peak: $hbm_bw
+            peak: $hbmBandwidth
            pop: ((100 * AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-              * 64)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
+              * 64)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
            tips:
          L2-Fabric Write BW:
            value: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
-            peak: $hbm_bw
+            peak: $hbmBandwidth
            pop: ((100 * AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
+              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
            tips:
          L2-Fabric Read Latency:
            value: AVG(((TCC_EA_RDREQ_LEVEL_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum
@@ -19,6 +19,24 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          CPC SYNC FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
+          CPC CANE Stall Rate:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
+          CPC ADC Utilization:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
          CPF Utilization:
            avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
              if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
@@ -19,6 +19,13 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Schedule-Pipe Wave Occupancy:
+            avg: None
+            min: None
+            max: None
+            unit: Wave
+            tips:
          Accelerator Utilization:
            avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
            min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
            max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
            unit: Pct
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Scheduler-Pipe Wave Utilization:
+            avg: None
+            min: None
+            max: None
+            unit: Pct
+            tips:
          Workgroup Manager Utilization:
            avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
            min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
              0) else None)
            unit: Pct
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Scheduler-Pipe FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: Pct
+            tips:
          Scheduler-Pipe Stall Rate:
            avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
              0) else None))
@@ -181,6 +181,13 @@ Panel Config:
            max: MAX((TA_FLAT_WAVEFRONTS_sum / $denom))
            unit: (instr + $normUnit)
            tips:
+         # TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
+          Spill/Stack Coalesceable Instr:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
          Global/Generic Coalesceable Instr:
            avg: None # No perf counter
            min: None # No perf counter
@@ -283,3 +290,10 @@ Panel Config:
            max: None # No HW module
            unit: (instr + $normUnit)
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          MFMA-F6F4:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
@@ -61,6 +61,13 @@ Panel Config:
            peak: None
            pop: None
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          MFMA FLOPs (F6F4):
+            value: None
+            unit: GFLOP
+            peak: None
+            pop: None
+            tips:
          MFMA IOPs (INT8):
            value: None # No perf counter
            unit: None
@@ -109,6 +116,13 @@ Panel Config:
            max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
            unit: pct
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          VALU Co-Issue Efficiency:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
          VMEM Utilization:
            avg: None # No HW module
            min: None # No HW module
@@ -210,6 +224,13 @@ Panel Config:
            max: None # No perf counter
            unit: (OPs  + $normUnit)
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          F6F4 OPs:
+            avg: None
+            min: None
+            max: None
+            unit: (OPs  + $normUnit)
+            tips:
          INT8 OPs:
            avg: None # No perf counter
            min: None # No perf counter
@@ -55,6 +55,48 @@ Panel Config:
            max: MAX((SQ_INSTS_LDS / $denom))
            unit: (Instr  + $normUnit)
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS LOAD:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS STORE:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS ATOMIC:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS LOAD Bandwidth:
+            avg: None
+            min: None
+            max: None
+            units: Gbps
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS STORE Bandwidth:
+            avg: None
+            min: None
+            max: None
+            units: Gbps
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS ATOMIC Bandwidth:
+            avg: None
+            min: None
+            max: None
+            units: Gbps
+            tips:
          Theoretical Bandwidth:
            avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
              / $denom))
@@ -116,3 +158,17 @@ Panel Config:
            max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
            unit: (Accesses + $normUnit)
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS Command FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles  + $normUnit)
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS Data FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles  + $normUnit)
+            tips:
@@ -43,6 +43,27 @@ Panel Config:
            max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
            unit: pct
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Sequencer → TA Address Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Sequencer → TA Command Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Sequencer → TA Data Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
          Total Instructions:
            avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
            min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
            tips:
+          HBM Bandwidth:
+            value: $hbmBandwidth
+            unit: GB/s
+            tips:

    - metric_table:
        id: 1702
@@ -62,6 +62,13 @@ Panel Config:
            peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
            pop: None # No perf counter
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
+          MFMA FLOPs (F6F4):
+            value: None
+            unit: GFLOP
+            peak: None
+            pop: None
+            tips:
          MFMA IOPs (Int8):
            value: None # No perf counter
            unit: GOPs
@@ -179,17 +186,17 @@ Panel Config:
            value: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
              * 64)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
-            peak: $hbm_bw
+            peak: $hbmBandwidth
            pop: ((100 * AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-              * 64)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
+              * 64)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
            tips:
          L2-Fabric Write BW:
            value: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
-            peak: $hbm_bw
+            peak: $hbmBandwidth
            pop: ((100 * AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
+              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
            tips:
          L2-Fabric Read Latency:
            value: AVG(((TCC_EA_RDREQ_LEVEL_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum
@@ -19,6 +19,27 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
+          CPC SYNC FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
+          CPC CANE Stall Rate:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
+          CPC ADC Utilization:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
          CPF Utilization:
            avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
              if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
@@ -19,6 +19,13 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Schedule-Pipe Wave Occupancy:
+            avg: None
+            min: None
+            max: None
+            unit: Wave
+            tips:
          Accelerator Utilization:
            avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
            min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
            max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
            unit: Pct
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Scheduler-Pipe Wave Utilization:
+            avg: None
+            min: None
+            max: None
+            unit: Pct
+            tips:
          Workgroup Manager Utilization:
            avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
            min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
              0) else None)
            unit: Pct
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Scheduler-Pipe FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: Pct
+            tips:
          Scheduler-Pipe Stall Rate:
            avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
              0) else None))
@@ -181,6 +181,13 @@ Panel Config:
            max: MAX((TA_FLAT_WAVEFRONTS_sum / $denom))
            unit: (instr + $normUnit)
            tips:
+         # TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
+          Spill/Stack Coalesceable Instr:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
          Global/Generic Read:
            avg: AVG((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
            min: MIN((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
@@ -271,3 +278,10 @@ Panel Config:
            max: None # No HW module
            unit: (instr + $normUnit)
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          MFMA-F6F4:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
@@ -61,6 +61,13 @@ Panel Config:
            peak: None
            pop: None
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          MFMA FLOPs (F6F4):
+            value: None
+            unit: GFLOP
+            peak: None
+            pop: None
+            tips:
          MFMA IOPs (INT8):
            value: None # No perf counter
            unit: None
@@ -109,6 +116,13 @@ Panel Config:
            max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
            unit: pct
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          VALU Co-Issue Efficiency:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
          VMEM Utilization:
            avg: None # No HW module
            min: None # No HW module
@@ -210,6 +224,13 @@ Panel Config:
            max: None # No perf counter
            unit: (OPs  + $normUnit)
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          F6F4 OPs:
+            avg: None
+            min: None
+            max: None
+            unit: (OPs  + $normUnit)
+            tips:
          INT8 OPs:
            avg: None # No perf counter
            min: None # No perf counter
@@ -55,6 +55,48 @@ Panel Config:
            max: MAX((SQ_INSTS_LDS / $denom))
            unit: (Instr  + $normUnit)
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS LOAD:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS STORE:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS ATOMIC:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS LOAD Bandwidth:
+            avg: None
+            min: None
+            max: None
+            units: Gbps
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS STORE Bandwidth:
+            avg: None
+            min: None
+            max: None
+            units: Gbps
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS ATOMIC Bandwidth:
+            avg: None
+            min: None
+            max: None
+            units: Gbps
+            tips:
          Theoretical Bandwidth:
            avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
              / $denom))
@@ -116,3 +158,17 @@ Panel Config:
            max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
            unit: (Accesses + $normUnit)
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS Command FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles  + $normUnit)
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          LDS Data FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles  + $normUnit)
+            tips:
@@ -43,6 +43,27 @@ Panel Config:
            max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
            unit: pct
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Sequencer → TA Address Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Sequencer → TA Command Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Sequencer → TA Data Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
          Total Instructions:
            avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
            min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
            tips:
+          HBM Bandwidth:
+            value: $hbmBandwidth
+            unit: GB/s
+            tips:

    - metric_table:
        id: 1702
@@ -76,6 +76,13 @@ Panel Config:
            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
+          MFMA FLOPs (F6F4):
+            value: None
+            unit: GFLOP
+            peak: None
+            pop: None
+            tips:
          MFMA IOPs (Int8):
            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
            unit: GIOP
@@ -197,17 +204,17 @@ Panel Config:
            value: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
              * 64)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
-            peak: $hbm_bw
+            peak: $hbmBandwidth
            pop: ((100 * AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-              * 64)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
+              * 64)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
            tips:
          L2-Fabric Write BW:
            value: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
-            peak: $hbm_bw
+            peak: $hbmBandwidth
            pop: ((100 * AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
+              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
            tips:
          L2-Fabric Read Latency:
            value: AVG(((TCC_EA_RDREQ_LEVEL_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum
@@ -19,6 +19,24 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          CPC SYNC FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
+          CPC CANE Stall Rate:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
+          CPC ADC Utilization:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
          CPF Utilization:
            avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
              if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
@@ -19,6 +19,13 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Schedule-Pipe Wave Occupancy:
+            avg: None
+            min: None
+            max: None
+            unit: Wave
+            tips:
          Accelerator Utilization:
            avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
            min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
            max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
            unit: Pct
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Scheduler-Pipe Wave Utilization:
+            avg: None
+            min: None
+            max: None
+            unit: Pct
+            tips:
          Workgroup Manager Utilization:
            avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
            min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
              0) else None)
            unit: Pct
            tips:
+          # TODO: Fix baseline comparision logic to handle non existent metrics, then 
+          Scheduler-Pipe FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: Pct
+            tips:
          Scheduler-Pipe Stall Rate:
            avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
              0) else None))
@@ -181,6 +181,13 @@ Panel Config:
            max: MAX((TA_FLAT_WAVEFRONTS_sum / $denom))
            unit: (instr + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Spill/Stack Coalesceable Instr:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
          Global/Generic Read:
            avg: AVG((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
            min: MIN((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
@@ -271,3 +278,10 @@ Panel Config:
            max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
            unit: (instr + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          MFMA-F6F4:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
@@ -75,6 +75,13 @@ Panel Config:
            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          MFMA FLOPs (F6F4):
+            value: None
+            unit: GFLOP
+            peak: None
+            pop: None
+            tips:
          MFMA IOPs (INT8):
            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
            unit: GIOP
@@ -124,6 +131,13 @@ Panel Config:
            max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
            unit: pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          VALU Co-Issue Efficiency:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
          VMEM Utilization:
            avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
            min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
@@ -264,6 +278,13 @@ Panel Config:
              + (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
            unit: (OPs  + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          F6F4 OPs:
+            avg: None
+            min: None
+            max: None
+            unit: (OPs  + $normUnit)
+            tips:
          INT8 OPs:
            avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
            min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
@@ -55,6 +55,48 @@ Panel Config:
            max: MAX((SQ_INSTS_LDS / $denom))
            unit: (Instr  + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS LOAD:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS STORE:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS ATOMIC:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS LOAD Bandwidth:
+            avg: None
+            min: None
+            max: None
+            unit: Gbps
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS STORE Bandwidth:
+            avg: None
+            min: None
+            max: None
+            unit: Gbps
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS ATOMIC Bandwidth:
+            avg: None
+            min: None
+            max: None
+            unit: Gbps
+            tips:
          Theoretical Bandwidth:
            avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
              / $denom))
@@ -116,3 +158,17 @@ Panel Config:
            max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
            unit: (Accesses + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS Command FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles  + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS Data FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles  + $normUnit)
+            tips:
@@ -43,6 +43,27 @@ Panel Config:
            max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
            unit: pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Sequencer → TA Address Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Sequencer → TA Command Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Sequencer → TA Data Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
          Total Instructions:
            avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
            min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
            tips:
+          HBM Bandwidth:
+            value: $hbmBandwidth
+            unit: GB/s
+            tips:

    - metric_table:
        id: 1702
@@ -77,6 +77,13 @@ Panel Config:
            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          MFMA FLOPs (F6F4):
+            value: None
+            unit: GFLOP
+            peak: None
+            pop: None
+            tips:
          MFMA IOPs (Int8):
            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
            unit: GIOP
@@ -198,18 +205,18 @@ Panel Config:
                        64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
                        32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
            unit: GB/s
-            peak: $hbm_bw
+            peak: $hbmBandwidth
            pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
                        64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
-                        32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
+                        32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
            tips:
          L2-Fabric Write BW:
            value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
-            peak: $hbm_bw
+            peak: $hbmBandwidth
            pop: ((100 * AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
+              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
            tips:
          L2-Fabric Read Latency:
            value: AVG(((TCC_EA0_RDREQ_LEVEL_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
@@ -19,6 +19,27 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          CPC SYNC FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          CPC CANE Stall Rate:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          CPC ADC Utilization:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
          CPF Utilization:
            avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
              if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
@@ -19,6 +19,13 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Schedule-Pipe Wave Occupancy:
+            avg: None
+            min: None
+            max: None
+            unit: Wave
+            tips:
          Accelerator Utilization:
            avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
            min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
            max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
            unit: Pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Scheduler-Pipe Wave Utilization:
+            avg: None
+            min: None
+            max: None
+            unit: Pct
+            tips:
          Workgroup Manager Utilization:
            avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
            min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
              0) else None)
            unit: Pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Scheduler-Pipe FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: Pct
+            tips:
          Scheduler-Pipe Stall Rate:
            avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
              0) else None))
@@ -209,6 +209,13 @@ Panel Config:
            max: MAX((TA_BUFFER_WAVEFRONTS_sum / $denom))
            unit: (instr + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Spill/Stack Coalesceable Instr:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
          Spill/Stack Read:
            avg: AVG((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
            min: MIN((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
@@ -274,4 +281,11 @@ Panel Config:
            min: MIN((SQ_INSTS_VALU_MFMA_F64 / $denom))
            max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
            unit: (instr + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          MFMA-F6F4:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
@@ -76,6 +76,13 @@ Panel Config:
            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          MFMA FLOPs (F6F4):
+            value: None
+            unit: GFLOP
+            peak: None
+            pop: None
+            tips:
          MFMA IOPs (INT8):
            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
            unit: GIOP
@@ -125,6 +132,13 @@ Panel Config:
            max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
            unit: pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          VALU Co-Issue Efficiency:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
          VMEM Utilization:
            avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
            min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
@@ -265,6 +279,13 @@ Panel Config:
              + (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
            unit: (OPs  + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          F6F4 OPs:
+            avg: None
+            min: None
+            max: None
+            unit: (OPs  + $normUnit)
+            tips:
          INT8 OPs:
            avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
            min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
@@ -55,6 +55,48 @@ Panel Config:
            max: MAX((SQ_INSTS_LDS / $denom))
            unit: (Instr  + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS LOAD:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS STORE:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS ATOMIC:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS LOAD Bandwidth:
+            avg: None
+            min: None
+            max: None
+            unit: Gbps
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS STORE Bandwidth:
+            avg: None
+            min: None
+            max: None
+            unit: Gbps
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS ATOMIC Bandwidth:
+            avg: None
+            min: None
+            max: None
+            unit: Gbps
+            tips:
          Theoretical Bandwidth:
            avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
              / $denom))
@@ -116,3 +158,17 @@ Panel Config:
            max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
            unit: (Accesses + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS Command FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles  + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS Data FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles  + $normUnit)
+            tips:
@@ -43,6 +43,27 @@ Panel Config:
            max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
            unit: pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Sequencer → TA Address Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Sequencer → TA Command Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Sequencer → TA Data Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
          Total Instructions:
            avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
            min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
            tips:
+          HBM Bandwidth:
+            value: $hbmBandwidth
+            unit: GB/s
+            tips:

    - metric_table:
        id: 1702
@@ -77,6 +77,13 @@ Panel Config:
            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          MFMA FLOPs (F6F4):
+            value: None
+            unit: GFLOP
+            peak: None
+            pop: None
+            tips:
          MFMA IOPs (Int8):
            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
            unit: GIOP
@@ -198,18 +205,18 @@ Panel Config:
                        64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
                        32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
            unit: GB/s
-            peak: $hbm_bw
+            peak: $hbmBandwidth
            pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
                        64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
-                        32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
+                        32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
            tips:
          L2-Fabric Write BW:
            value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
-            peak: $hbm_bw
+            peak: $hbmBandwidth
            pop: ((100 * AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
+              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
            tips:
          L2-Fabric Read Latency:
            value: AVG(((TCC_EA0_RDREQ_LEVEL_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
@@ -19,6 +19,27 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          CPC SYNC FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          CPC CANE Stall Rate:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          CPC ADC Utilization:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
          CPF Utilization:
            avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
              if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
@@ -19,6 +19,13 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Schedule-Pipe Wave Occupancy:
+            avg: None
+            min: None
+            max: None
+            unit: Wave
+            tips:
          Accelerator Utilization:
            avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
            min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
            max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
            unit: Pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Scheduler-Pipe Wave Utilization:
+            avg: None
+            min: None
+            max: None
+            unit: Pct
+            tips:
          Workgroup Manager Utilization:
            avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
            min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
              0) else None)
            unit: Pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Scheduler-Pipe FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: Pct
+            tips:
          Scheduler-Pipe Stall Rate:
            avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
              0) else None))
@@ -209,6 +209,13 @@ Panel Config:
            max: MAX((TA_BUFFER_WAVEFRONTS_sum / $denom))
            unit: (instr + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Spill/Stack Coalesceable Instr:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
          Spill/Stack Read:
            avg: AVG((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
            min: MIN((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
@@ -275,3 +282,10 @@ Panel Config:
            max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
            unit: (instr + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          MFMA-F6F4:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
@@ -76,6 +76,13 @@ Panel Config:
            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          MFMA FLOPs (F6F4):
+            value: None
+            unit: GFLOP
+            peak: None
+            pop: None
+            tips:
          MFMA IOPs (INT8):
            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
            unit: GIOP
@@ -125,6 +132,13 @@ Panel Config:
            max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
            unit: pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          VALU Co-Issue Efficiency:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
          VMEM Utilization:
            avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
            min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
@@ -265,6 +279,13 @@ Panel Config:
              + (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
            unit: (OPs  + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          F6F4 OPs:
+            avg: None
+            min: None
+            max: None
+            unit: (OPs  + $normUnit)
+            tips:
          INT8 OPs:
            avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
            min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
@@ -55,6 +55,48 @@ Panel Config:
            max: MAX((SQ_INSTS_LDS / $denom))
            unit: (Instr  + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS LOAD:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS STORE:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS ATOMIC:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS LOAD Bandwidth:
+            avg: None
+            min: None
+            max: None
+            unit: Gbps
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS STORE Bandwidth:
+            avg: None
+            min: None
+            max: None
+            unit: Gbps
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS ATOMIC Bandwidth:
+            avg: None
+            min: None
+            max: None
+            unit: Gbps
+            tips:
          Theoretical Bandwidth:
            avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
              / $denom))
@@ -116,3 +158,17 @@ Panel Config:
            max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
            unit: (Accesses + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS Command FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles  + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS Data FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles  + $normUnit)
+            tips:
@@ -43,6 +43,27 @@ Panel Config:
            max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
            unit: pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Sequencer → TA Address Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Sequencer → TA Command Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Sequencer → TA Data Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
          Total Instructions:
            avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
            min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
            tips:
+          HBM Bandwidth:
+            value: $hbmBandwidth
+            unit: GB/s
+            tips:

    - metric_table:
        id: 1702
@@ -77,6 +77,13 @@ Panel Config:
            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          MFMA FLOPs (F6F4):
+            value: None
+            unit: GFLOP
+            peak: None
+            pop: None
+            tips:
          MFMA IOPs (Int8):
            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
            unit: GIOP
@@ -198,18 +205,18 @@ Panel Config:
                        64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
                        32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
            unit: GB/s
-            peak: $hbm_bw
+            peak: $hbmBandwidth
            pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
                        64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
-                        32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
+                        32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
            tips:
          L2-Fabric Write BW:
            value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
-            peak: $hbm_bw
+            peak: $hbmBandwidth
            pop: ((100 * AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
+              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
            tips:
          L2-Fabric Read Latency:
            value: AVG(((TCC_EA0_RDREQ_LEVEL_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
@@ -76,6 +76,27 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          CPC SYNC FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          CPC CANE Stall Rate:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          CPC ADC Utilization:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
          CPC Utilization:
            avg: AVG((((100 * CPC_CPC_STAT_BUSY) / (CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE))
              if ((CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE) != 0) else None))
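Reviewer note: the TODO comments above say the `None` placeholders exist only because baseline comparison cannot yet handle a metric that is missing on one side. A minimal sketch of a comparison that tolerates missing metrics, assuming each run is reduced to a name-to-value mapping. The `compare` function and the sample values are hypothetical and are not taken from the tool's code.

```python
from typing import Dict, Optional

Metrics = Dict[str, Optional[float]]


def compare(current: Metrics, baseline: Metrics) -> Dict[str, Optional[float]]:
    """Return current - baseline per metric over the union of metric names.

    A metric that is absent (or None) on either side yields None instead of
    raising, so no placeholder entries are needed in the configs.
    """
    deltas: Dict[str, Optional[float]] = {}
    for name in sorted(set(current) | set(baseline)):
        cur = current.get(name)
        base = baseline.get(name)
        deltas[name] = None if cur is None or base is None else cur - base
    return deltas


# Hypothetical values: the newer run has a metric the older one lacks.
newer = {"CPC Utilization": 80.0, "CPC ADC Utilization": 12.5}
older = {"CPC Utilization": 75.0}
print(compare(newer, older))
```

With a comparison shaped like this, the placeholder entries could be dropped from the older architectures' configs.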
@@ -19,6 +19,13 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Schedule-Pipe Wave Occupancy:
+            avg: None
+            min: None
+            max: None
+            unit: Wave
+            tips:
          Accelerator Utilization:
            avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
            min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
            max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
            unit: Pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Scheduler-Pipe Wave Utilization:
+            avg: None
+            min: None
+            max: None
+            unit: Pct
+            tips:
          Workgroup Manager Utilization:
            avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
            min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
              0) else None)
            unit: Pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Scheduler-Pipe FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: Pct
+            tips:
          Scheduler-Pipe Stall Rate:
            avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
              0) else None))
@@ -209,6 +209,13 @@ Panel Config:
            max: MAX((TA_BUFFER_WAVEFRONTS_sum / $denom))
            unit: (instr + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Spill/Stack Coalesceable Instr:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
          Spill/Stack Read:
            avg: AVG((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
            min: MIN((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
@@ -275,3 +282,10 @@ Panel Config:
            max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
            unit: (instr + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          MFMA-F6F4:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
@@ -76,6 +76,13 @@ Panel Config:
            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          MFMA FLOPs (F6F4):
+            value: None
+            unit: GFLOP
+            peak: None
+            pop: None
+            tips:
          MFMA IOPs (INT8):
            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
            unit: GIOP
@@ -125,6 +132,13 @@ Panel Config:
            max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
            unit: pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          VALU Co-Issue Efficiency:
+            avg: None
+            min: None
+            max: None
+            unit: pct
+            tips:
          VMEM Utilization:
            avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
            min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
@@ -265,6 +279,13 @@ Panel Config:
              + (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
            unit: (OPs  + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          F6F4 OPs:
+            avg: None
+            min: None
+            max: None
+            unit: (OPs  + $normUnit)
+            tips:
          INT8 OPs:
            avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
            min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
@@ -55,6 +55,48 @@ Panel Config:
            max: MAX((SQ_INSTS_LDS / $denom))
            unit: (Instr  + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS LOAD:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS STORE:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS ATOMIC:
+            avg: None
+            min: None
+            max: None
+            unit: (instr + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS LOAD Bandwidth:
+            avg: None
+            min: None
+            max: None
+            unit: Gbps
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS STORE Bandwidth:
+            avg: None
+            min: None
+            max: None
+            unit: Gbps
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS ATOMIC Bandwidth:
+            avg: None
+            min: None
+            max: None
+            unit: Gbps
+            tips:
          Theoretical Bandwidth:
            avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
              / $denom))
@@ -116,3 +158,17 @@ Panel Config:
            max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
            unit: (Accesses + $normUnit)
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS Command FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles  + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          LDS Data FIFO Full Rate:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles  + $normUnit)
+            tips:
@@ -43,6 +43,27 @@ Panel Config:
            max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
            unit: pct
            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Sequencer → TA Address Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Sequencer → TA Command Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
+          # TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
+          Sequencer → TA Data Stall:
+            avg: None
+            min: None
+            max: None
+            unit: (Cycles + $normUnit)
+            tips:
          Total Instructions:
            avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
            min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -41,6 +41,10 @@ Panel Config:
              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
            tips:
+          HBM Bandwidth:
+            value: $hbmBandwidth
+            unit: GB/s
+            tips:

    - metric_table:
        id: 1702
@@ -0,0 +1,14 @@
+---
+Panel Config:
+  id: 000
+  title: Top Stats
+  data source:
+    - raw_csv_table:
+        id: 001
+        title: Top Kernels
+        source: pmc_kernel_top.csv
+
+    - raw_csv_table:
+        id: 002
+        title: Dispatch List
+        source: pmc_dispatch_info.csv
@@ -0,0 +1,9 @@
+---
+Panel Config:
+  id: 100
+  title: System Info
+  data source:
+    - raw_csv_table:
+        id: 101
+        source: sysinfo.csv
+        columnwise: True
@@ -0,0 +1,269 @@
+---
+# Add a description/tip for each metric in this section
+# so that it can be shown on hover.
+Metric Description:
+  SALU: &SALU_anchor Scalar Arithmetic Logic Unit
+
+# Define the panel properties and properties of each metric in the panel.
+Panel Config:
+  id: 200
+  title: System Speed-of-Light
+  data source:
+    - metric_table:
+        id: 201
+        title: Speed-of-Light
+        header:
+          metric: Metric
+          value: Avg
+          unit: Unit
+          peak: Peak
+          pop: Pct of Peak
+          tips: Tips
+        metric:
+          VALU FLOPs:
+            value: AVG(((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
+              + (2 * SQ_INSTS_VALU_FMA_F16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
+              + SQ_INSTS_VALU_TRANS_F32) + (2 * SQ_INSTS_VALU_FMA_F32)))) + (64 * (((SQ_INSTS_VALU_ADD_F64
+              + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64) + (2 * SQ_INSTS_VALU_FMA_F64))))
+              / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
+            pop: ((100 * AVG(((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16)
+              + SQ_INSTS_VALU_TRANS_F16) + (2 * SQ_INSTS_VALU_FMA_F16))) + (64 * (((SQ_INSTS_VALU_ADD_F32
+              + SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32) + (2 * SQ_INSTS_VALU_FMA_F32))))
+              + (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+              + (2 * SQ_INSTS_VALU_FMA_F64)))) / (End_Timestamp - Start_Timestamp)))) / (((($max_sclk
+              * $cu_per_gpu) * 64) * 2) / 1000))
+            tips:
+          VALU IOPs:
+            value: AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp - Start_Timestamp)))
+            unit: GIOP
+            peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
+            pop: ((100 * AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp
+              - Start_Timestamp)))) / (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000))
+            tips:
+          MFMA FLOPs (F8):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: ((($max_sclk * $cu_per_gpu) * 8192) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
+            tips:
+          MFMA FLOPs (BF16):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 4096) / 1000))
+            tips:
+          MFMA FLOPs (F16):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F16 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F16 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 4096) / 1000))
+            tips:
+          MFMA FLOPs (F32):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F32 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F32 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
+            tips:
+          MFMA FLOPs (F64):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
+            tips:
+          MFMA FLOPs (F6F4):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F6F4 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F6F4 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
+            tips:
+          MFMA IOPs (Int8):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GIOP
+            peak: ((($max_sclk * $cu_per_gpu) * 8192) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
+            tips:
+          Active CUs:
+            value: $numActiveCUs
+            unit: CUs
+            peak: $cu_per_gpu
+            pop: ((100 * $numActiveCUs) / $cu_per_gpu)
+            tips:
+          SALU Utilization:
+            value: AVG(((100 * SQ_ACTIVE_INST_SCA) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
+            unit: pct
+            peak: 100
+            pop: AVG(((100 * SQ_ACTIVE_INST_SCA) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
+            tips:
+          VALU Utilization:
+            value: AVG(((100 * SQ_ACTIVE_INST_VALU) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
+            unit: pct
+            peak: 100
+            pop: AVG(((100 * SQ_ACTIVE_INST_VALU) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
+            tips:
+          MFMA Utilization:
+            value: AVG(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / (($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)
+              * 4)))
+            unit: pct
+            peak: 100
+            pop: AVG(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / (($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)
+              * 4)))
+            tips:
+          VMEM Utilization:
+            value: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            unit: pct
+            peak: 100
+            pop: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            tips:
+          Branch Utilization:
+            value: AVG((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            unit: pct
+            peak: 100
+            pop: AVG((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            tips:
+          VALU Active Threads:
+            value: AVG(((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU) if (SQ_ACTIVE_INST_VALU
+              != 0) else None))
+            unit: Threads
+            peak: $wave_size
+            pop: (100 * AVG((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU / $wave_size) if (SQ_ACTIVE_INST_VALU != 0) else None))
+            tips:
+          IPC:
+            value: AVG((SQ_INSTS / SQ_BUSY_CU_CYCLES))
+            unit: Instr/cycle
+            peak: 5
+            pop: ((100 * AVG((SQ_INSTS / SQ_BUSY_CU_CYCLES))) / 5)
+            tips:
+          Wavefront Occupancy:
+            value: AVG((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD))
+            unit: Wavefronts
+            peak: ($max_waves_per_cu * $cu_per_gpu)
+            pop: (100 * AVG(((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD) / ($max_waves_per_cu
+              * $cu_per_gpu))))
+            coll_level: SQ_LEVEL_WAVES
+            tips:
+          Theoretical LDS Bandwidth:
+            value: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
+              / (End_Timestamp - Start_Timestamp)))
+            unit: GB/s
+            peak: (($max_sclk * $cu_per_gpu) * 0.128)
+            pop: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
+              / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) * 0.00128)))
+            tips:
+          LDS Bank Conflicts/Access:
+            value: AVG(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
+              if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
+            unit: Conflicts/access
+            peak: 32
+            pop: ((100 * AVG(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
+              if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))) / 32)
+            tips:
+          vL1D Cache Hit Rate:
+            value: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
+              + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
+              / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum != 0) else
+              None))
+            unit: pct
+            peak: 100
+            pop: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) +
+              TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) /
+              TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum != 0) else
+              None))
+            tips:
+          vL1D Cache BW:
+            value: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            unit: GB/s
+            peak: ((($max_sclk / 1000) * 128) * $cu_per_gpu)
+            pop: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk / 1000) * 128) * $cu_per_gpu))
+            tips:
+          L2 Cache Hit Rate:
+            value: AVG((((100 * TCC_HIT_sum) / (TCC_HIT_sum + TCC_MISS_sum)) if ((TCC_HIT_sum
+              + TCC_MISS_sum) != 0) else None))
+            unit: pct
+            peak: 100
+            pop: AVG((((100 * TCC_HIT_sum) / (TCC_HIT_sum + TCC_MISS_sum)) if ((TCC_HIT_sum
+              + TCC_MISS_sum) != 0) else None))
+            tips:
+          L2 Cache BW:
+            value: AVG(((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            unit: GB/s
+            peak: ((($max_sclk / 1000) * 128) * TO_INT($total_l2_chan))
+            pop: ((100 * AVG(((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk / 1000) * 128) * TO_INT($total_l2_chan)))
+            tips:
+          L2-Fabric Read BW:
+            value: AVG((128 * TCC_BUBBLE_sum +
+                        64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
+                        32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
+            unit: GB/s
+            peak: $hbmBandwidth
+            pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
+                        64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
+                        32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
+            tips:
+          L2-Fabric Write BW:
+            value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
+              * 32)) / (End_Timestamp - Start_Timestamp)))
+            unit: GB/s
+            peak: $hbmBandwidth
+            pop: ((100 * AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
+              * 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
+            tips:
+          L2-Fabric Read Latency:
+            value: AVG(((TCC_EA0_RDREQ_LEVEL_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
+              != 0) else None))
+            unit: Cycles
+            peak: None
+            pop: None
+            tips:
+          L2-Fabric Write Latency:
+            value: AVG(((TCC_EA0_WRREQ_LEVEL_sum / TCC_EA0_WRREQ_sum) if (TCC_EA0_WRREQ_sum
+              != 0) else None))
+            unit: Cycles
+            peak: None
+            pop: None
+            tips:
+          sL1D Cache Hit Rate:
+            value: AVG((((100 * SQC_DCACHE_HITS) / (SQC_DCACHE_HITS + SQC_DCACHE_MISSES))
+              if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES) != 0) else None))
+            unit: pct
+            peak: 100
+            pop: AVG((((100 * SQC_DCACHE_HITS) / (SQC_DCACHE_HITS + SQC_DCACHE_MISSES))
+              if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES) != 0) else None))
+            tips:
+          sL1D Cache BW:
+            value: AVG(((SQC_DCACHE_REQ / (End_Timestamp - Start_Timestamp)) * 64))
+            unit: GB/s
+            peak: ((($max_sclk / 1000) * 64) * $sqc_per_gpu)
+            pop: ((100 * AVG(((SQC_DCACHE_REQ / (End_Timestamp - Start_Timestamp)) * 64))) / ((($max_sclk
+              / 1000) * 64) * $sqc_per_gpu))
+            tips:
+          L1I Hit Rate:
+            value: AVG(((100 * SQC_ICACHE_HITS) / (SQC_ICACHE_HITS + SQC_ICACHE_MISSES)))
+            unit: pct
+            peak: 100
+            pop: AVG(((100 * SQC_ICACHE_HITS) / (SQC_ICACHE_HITS + SQC_ICACHE_MISSES)))
+            tips:
+          L1I BW:
+            value: AVG(((SQC_ICACHE_REQ / (End_Timestamp - Start_Timestamp)) * 64))
+            unit: GB/s
+            peak: ((($max_sclk / 1000) * 64) * $sqc_per_gpu)
+            pop: ((100 * AVG(((SQC_ICACHE_REQ / (End_Timestamp - Start_Timestamp)) * 64))) / ((($max_sclk
+              / 1000) * 64) * $sqc_per_gpu))
+            tips:
+          L1I Fetch Latency:
+            value: AVG((SQ_ACCUM_PREV_HIRES / SQ_IFETCH))
+            unit: Cycles
+            peak: None
+            pop: None
+            coll_level: SQ_IFETCH_LEVEL
+            tips:
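Reviewer note: each Speed-of-Light entry above pairs a measured `value` with a theoretical `peak`, and `pop` is `100 * value / peak`. A minimal Python sketch of that relationship for the BF16 MFMA row, assuming `$max_sclk` is in MHz and the timestamps are in nanoseconds. Both are assumptions, and the function names and sample numbers are hypothetical.

```python
def mfma_gflops(mops_counter: int, start_ns: int, end_ns: int) -> float:
    """Achieved rate: each counted MFMA MOP stands for 512 operations.

    Operations per nanosecond equals GFLOP/s (assumes ns timestamps).
    """
    return (mops_counter * 512) / (end_ns - start_ns)


def peak_gflops(max_sclk_mhz: float, cu_per_gpu: int, ops_per_cu_per_cycle: int) -> float:
    """Theoretical rate: MHz * CUs * ops-per-CU-per-cycle / 1000 gives GFLOP/s."""
    return (max_sclk_mhz * cu_per_gpu * ops_per_cu_per_cycle) / 1000


def pct_of_peak(value: float, peak: float) -> float:
    """The `pop` column: achieved rate as a percentage of the peak."""
    return 100 * value / peak


# Hypothetical device and kernel numbers, for illustration only.
peak = peak_gflops(max_sclk_mhz=2000, cu_per_gpu=100, ops_per_cu_per_cycle=4096)
value = mfma_gflops(mops_counter=400_000, start_ns=0, end_ns=1_000)
print(value, peak, pct_of_peak(value, peak))
```

Because `pop` repeats the `peak` expression inline, the two constants in a row have to be kept in sync by hand whenever a rate changes.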
@@ -0,0 +1,153 @@
+---
+# Add a description/tip for each metric in this section
+# so that it can be shown on hover.
+Metric Description:
+
+# Define the panel properties and properties of each metric in the panel.
+Panel Config:
+  id: 500
+  title: Command Processor (CPC/CPF)
+  data source:
+    - metric_table:
+        id: 501
+        title: Command Processor Fetcher
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          CPF Utilization:
+            avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
+              if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
+            min: MIN((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
+              if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
+            max: MAX((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
+              if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
+            unit: pct
+            tips:
+          CPF Stall:
+            avg: AVG((((100 * CPF_CPF_STAT_STALL) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
+              != 0) else None))
+            min: MIN((((100 * CPF_CPF_STAT_STALL) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
+              != 0) else None))
+            max: MAX((((100 * CPF_CPF_STAT_STALL) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
+              != 0) else None))
+            unit: pct
+            tips:
+          CPF-L2 Utilization:
+            avg: AVG((((100 * CPF_CPF_TCIU_BUSY) / (CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE))
+              if ((CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE) != 0) else None))
+            min: MIN((((100 * CPF_CPF_TCIU_BUSY) / (CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE))
+              if ((CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE) != 0) else None))
+            max: MAX((((100 * CPF_CPF_TCIU_BUSY) / (CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE))
+              if ((CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE) != 0) else None))
+            unit: pct
+            tips:
+          CPF-L2 Stall:
+            avg: AVG((((100 * CPF_CPF_TCIU_STALL) / CPF_CPF_TCIU_BUSY) if (CPF_CPF_TCIU_BUSY
+              != 0) else None))
+            min: MIN((((100 * CPF_CPF_TCIU_STALL) / CPF_CPF_TCIU_BUSY) if (CPF_CPF_TCIU_BUSY
+              != 0) else None))
+            max: MAX((((100 * CPF_CPF_TCIU_STALL) / CPF_CPF_TCIU_BUSY) if (CPF_CPF_TCIU_BUSY
+              != 0) else None))
+            unit: pct
+            tips:
+          CPF-UTCL1 Stall:
+            avg: AVG(((100 * CPF_CMP_UTCL1_STALL_ON_TRANSLATION) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
+              != 0) else None)
+            min: MIN(((100 * CPF_CMP_UTCL1_STALL_ON_TRANSLATION) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
+              != 0) else None)
+            max: MAX(((100 * CPF_CMP_UTCL1_STALL_ON_TRANSLATION) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
+              != 0) else None)
+            unit: pct
+            tips:
+
+    - metric_table:
+        id: 502
+        title: Packet Processor
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          CPC SYNC FIFO Full Rate:
+            avg: AVG((100 * CPC_SYNC_FIFO_FULL) / CPC_SYNC_WRREQ_FIFO_BUSY if (CPC_SYNC_WRREQ_FIFO_BUSY != 0) else None)
+            min: MIN((100 * CPC_SYNC_FIFO_FULL) / CPC_SYNC_WRREQ_FIFO_BUSY if (CPC_SYNC_WRREQ_FIFO_BUSY != 0) else None)
+            max: MAX((100 * CPC_SYNC_FIFO_FULL) / CPC_SYNC_WRREQ_FIFO_BUSY if (CPC_SYNC_WRREQ_FIFO_BUSY != 0) else None)
+            unit: pct
+            tips:
+          CPC CANE Stall Rate:
+            avg: AVG((100 * CPC_CANE_STALL) / CPC_CANE_BUSY if (CPC_CANE_BUSY != 0) else None)
+            min: MIN((100 * CPC_CANE_STALL) / CPC_CANE_BUSY if (CPC_CANE_BUSY != 0) else None)
+            max: MAX((100 * CPC_CANE_STALL) / CPC_CANE_BUSY if (CPC_CANE_BUSY != 0) else None)
+            unit: pct
+            tips:
+          CPC ADC Utilization:
+            avg: AVG((100 * CPC_TG_SEND) / CPC_GD_BUSY if (CPC_GD_BUSY != 0) else None)
+            min: MIN((100 * CPC_TG_SEND) / CPC_GD_BUSY if (CPC_GD_BUSY != 0) else None)
+            max: MAX((100 * CPC_TG_SEND) / CPC_GD_BUSY if (CPC_GD_BUSY != 0) else None)
+            unit: pct
+            tips:
+          CPC Utilization:
+            avg: AVG((((100 * CPC_CPC_STAT_BUSY) / (CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE))
+              if ((CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE) != 0) else None))
+            min: MIN((((100 * CPC_CPC_STAT_BUSY) / (CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE))
+              if ((CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE) != 0) else None))
+            max: MAX((((100 * CPC_CPC_STAT_BUSY) / (CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE))
+              if ((CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE) != 0) else None))
+            unit: pct
+            tips:
+          CPC Stall Rate:
+            avg: AVG((((100 * CPC_CPC_STAT_STALL) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
+              != 0) else None))
+            min: MIN((((100 * CPC_CPC_STAT_STALL) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
+              != 0) else None))
+            max: MAX((((100 * CPC_CPC_STAT_STALL) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
+              != 0) else None))
+            unit: pct
+            tips:
+          CPC Packet Decoding Utilization:
+            avg: AVG((100 * CPC_ME1_BUSY_FOR_PACKET_DECODE) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
+            min: MIN((100 * CPC_ME1_BUSY_FOR_PACKET_DECODE) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
+            max: MAX((100 * CPC_ME1_BUSY_FOR_PACKET_DECODE) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
+            unit: pct
+            tips:
+          CPC-Workgroup Manager Utilization:
+            avg: AVG((100 * CPC_ME1_DC0_SPI_BUSY) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
+            min: MIN((100 * CPC_ME1_DC0_SPI_BUSY) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
+            max: MAX((100 * CPC_ME1_DC0_SPI_BUSY) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
+            unit: pct
+            tips:
+          CPC-L2 Utilization:
+            avg: AVG((((100 * CPC_CPC_TCIU_BUSY) / (CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE))
+              if ((CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE) != 0) else None))
+            min: MIN((((100 * CPC_CPC_TCIU_BUSY) / (CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE))
+              if ((CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE) != 0) else None))
+            max: MAX((((100 * CPC_CPC_TCIU_BUSY) / (CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE))
+              if ((CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE) != 0) else None))
+            unit: pct
+            tips:
+          CPC-UTCL1 Stall:
+            avg: AVG(((100 * CPC_UTCL1_STALL_ON_TRANSLATION) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
+              != 0) else None)
+            min: MIN(((100 * CPC_UTCL1_STALL_ON_TRANSLATION) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
+              != 0) else None)
+            max: MAX(((100 * CPC_UTCL1_STALL_ON_TRANSLATION) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
+              != 0) else None)
+            unit: pct
+            tips:
+          CPC-UTCL2 Utilization:
+            avg: AVG((((100 * CPC_CPC_UTCL2IU_BUSY) / (CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE))
+              if ((CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE) != 0) else None))
+            min: MIN((((100 * CPC_CPC_UTCL2IU_BUSY) / (CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE))
+              if ((CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE) != 0) else None))
+            max: MAX((((100 * CPC_CPC_UTCL2IU_BUSY) / (CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE))
+              if ((CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE) != 0) else None))
+            unit: pct
+            tips:
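Reviewer note: nearly every rate in this panel uses the same guard, `100 * numerator / denominator if denominator != 0 else None`, wrapped in `AVG`, `MIN`, and `MAX`. A minimal Python sketch of that pattern follows. Skipping `None` samples during aggregation is an assumption about how the analysis step behaves, and the helper names and sample counter values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


def guarded_pct(numerator: float, denominator: float) -> Optional[float]:
    """Percentage that yields None instead of dividing by zero."""
    return (100 * numerator) / denominator if denominator != 0 else None


def summarize(values: Sequence[Optional[float]]) -> Optional[Dict[str, float]]:
    """avg/min/max over the samples that are not None (assumed behaviour)."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return {"avg": mean(present), "min": min(present), "max": max(present)}


# Hypothetical (stall, busy) counter pairs from three dispatches;
# the second dispatch never made the unit busy.
samples: List[Tuple[int, int]] = [(50, 200), (0, 0), (30, 40)]
rates = [guarded_pct(stall, busy) for stall, busy in samples]
print(rates, summarize(rates))
```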
@@ -0,0 +1,188 @@
+---
+# Add a description/tip for each metric in this section
+# so that it can be shown on hover.
+Metric Description:
+
+# Define the panel properties and properties of each metric in the panel.
+Panel Config:
+  id: 600
+  title: Workgroup Manager (SPI)
+  data source:
+    - metric_table:
+        id: 601
+        title: Workgroup Manager Utilizations
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          Schedule-Pipe Wave Occupancy:
+            avg: AVG(SPI_CSQ_P0_OCCUPANCY + SPI_CSQ_P1_OCCUPANCY + SPI_CSQ_P2_OCCUPANCY + SPI_CSQ_P3_OCCUPANCY)
+            min: MIN(SPI_CSQ_P0_OCCUPANCY + SPI_CSQ_P1_OCCUPANCY + SPI_CSQ_P2_OCCUPANCY + SPI_CSQ_P3_OCCUPANCY)
+            max: MAX(SPI_CSQ_P0_OCCUPANCY + SPI_CSQ_P1_OCCUPANCY + SPI_CSQ_P2_OCCUPANCY + SPI_CSQ_P3_OCCUPANCY)
+            unit: Wave
+            tips:
+          Accelerator Utilization:
+            avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
+            min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
+            max: MAX(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
+            unit: Pct
+            tips:
+          Scheduler-Pipe Utilization:
+            avg: AVG(100 * (SPI_CS0_BUSY + SPI_CS1_BUSY + SPI_CS2_BUSY + SPI_CS3_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
+            min: MIN(100 * (SPI_CS0_BUSY + SPI_CS1_BUSY + SPI_CS2_BUSY + SPI_CS3_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
+            max: MAX(100 * (SPI_CS0_BUSY + SPI_CS1_BUSY + SPI_CS2_BUSY + SPI_CS3_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
+            unit: Pct
+            tips:
+          Scheduler-Pipe Wave Utilization:
+            avg: AVG(100 * (SPI_CSC_WAVE_CNT_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
+            min: MIN(100 * (SPI_CSC_WAVE_CNT_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
+            max: MAX(100 * (SPI_CSC_WAVE_CNT_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
+            unit: Pct
+            tips:
+          Workgroup Manager Utilization:
+            avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
+            min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
+            max: MAX(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
+            unit: Pct
+            tips:
+          Shader Engine Utilization:
+            avg: AVG(100 * SQ_BUSY_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $se_per_gpu))
+            min: MIN(100 * SQ_BUSY_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $se_per_gpu))
+            max: MAX(100 * SQ_BUSY_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $se_per_gpu))
+            unit: Pct
+            tips:
+          SIMD Utilization:
+            avg: AVG(100 * SQ_BUSY_CU_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            min: MIN(100 * SQ_BUSY_CU_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            max: MAX(100 * SQ_BUSY_CU_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            unit: Pct
+            tips:
+          Dispatched Workgroups:
+            avg: AVG(SPI_CS0_NUM_THREADGROUPS + SPI_CS1_NUM_THREADGROUPS + SPI_CS2_NUM_THREADGROUPS + SPI_CS3_NUM_THREADGROUPS)
+            min: MIN(SPI_CS0_NUM_THREADGROUPS + SPI_CS1_NUM_THREADGROUPS + SPI_CS2_NUM_THREADGROUPS + SPI_CS3_NUM_THREADGROUPS)
+            max: MAX(SPI_CS0_NUM_THREADGROUPS + SPI_CS1_NUM_THREADGROUPS + SPI_CS2_NUM_THREADGROUPS + SPI_CS3_NUM_THREADGROUPS)
+            unit: Workgroups
+            tips:
+          Dispatched Wavefronts:
+            avg: AVG(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
+            min: MIN(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
+            max: MAX(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
+            unit: Wavefronts
+            tips:
+          VGPR Writes:
+            avg: AVG((((SPI_VWC0_VDATA_VALID_WR + SPI_VWC1_VDATA_VALID_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
+              None))
+            min: MIN((((SPI_VWC0_VDATA_VALID_WR + SPI_VWC1_VDATA_VALID_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
+              None))
+            max: MAX((((SPI_VWC0_VDATA_VALID_WR + SPI_VWC1_VDATA_VALID_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
+              None))
+            unit: Cycles/wave
+            tips:
+          SGPR Writes:
+            avg: AVG((((1 * SPI_SWC_CSC_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
+              None))
+            min: MIN((((1 * SPI_SWC_CSC_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
+              None))
+            max: MAX((((1 * SPI_SWC_CSC_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
+              None))
+            unit: Cycles/wave
+            tips:
+    - metric_table:
+        id: 602
+        title: Workgroup Manager - Resource Allocation
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          Not-scheduled Rate (Workgroup Manager):
+            avg: AVG((100 * SPI_RA_REQ_NO_ALLOC_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
+              0) else None)
+            min: MIN((100 * SPI_RA_REQ_NO_ALLOC_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
+              0) else None)
+            max: MAX((100 * SPI_RA_REQ_NO_ALLOC_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
+              0) else None)
+            unit: Pct
+            tips:
+          Not-scheduled Rate (Scheduler-Pipe):
+            avg: AVG((100 * SPI_RA_REQ_NO_ALLOC / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
+              0) else None)
+            min: MIN((100 * SPI_RA_REQ_NO_ALLOC / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
+              0) else None)
+            max: MAX((100 * SPI_RA_REQ_NO_ALLOC / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
+              0) else None)
+            unit: Pct
+            tips:
+          Scheduler-Pipe FIFO Full Rate:
+            avg: AVG((100 * (SPI_CS0_CRAWLER_STALL + SPI_CS1_CRAWLER_STALL + SPI_CS2_CRAWLER_STALL + SPI_CS3_CRAWLER_STALL) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
+              0) else None)
+            min: MIN((100 * (SPI_CS0_CRAWLER_STALL + SPI_CS1_CRAWLER_STALL + SPI_CS2_CRAWLER_STALL + SPI_CS3_CRAWLER_STALL) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
+              0) else None)
+            max: MAX((100 * (SPI_CS0_CRAWLER_STALL + SPI_CS1_CRAWLER_STALL + SPI_CS2_CRAWLER_STALL + SPI_CS3_CRAWLER_STALL) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
+              0) else None)
+            unit: Pct
+            tips:
+          Scheduler-Pipe Stall Rate:
+            avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
+              0) else None))
+            min: MIN((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
+              0) else None))
+            max: MAX((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
+              0) else None))
+            unit: Pct
+            tips:
+          Scratch Stall Rate:
+            avg: AVG((100 * SPI_RA_TMP_STALL_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD != 0) else None)
+            min: MIN((100 * SPI_RA_TMP_STALL_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD != 0) else None)
+            max: MAX((100 * SPI_RA_TMP_STALL_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD != 0) else None)
+            unit: Pct
+            tips:
+          Insufficient SIMD Waveslots:
+            avg: AVG(100 * SPI_RA_WAVE_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            min: MIN(100 * SPI_RA_WAVE_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            max: MAX(100 * SPI_RA_WAVE_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            unit: Pct
+            tips:
+          Insufficient SIMD VGPRs:
+            avg: AVG(100 * SPI_RA_VGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            min: MIN(100 * SPI_RA_VGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            max: MAX(100 * SPI_RA_VGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            unit: Pct
+            tips:
+          Insufficient SIMD SGPRs:
+            avg: AVG(100 * SPI_RA_SGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            min: MIN(100 * SPI_RA_SGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            max: MAX(100 * SPI_RA_SGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            unit: Pct
+            tips:
+          Insufficient CU LDS:
+            avg: AVG(400 * SPI_RA_LDS_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            min: MIN(400 * SPI_RA_LDS_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            max: MAX(400 * SPI_RA_LDS_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            unit: Pct
+            tips:
+          Insufficient CU Barriers:
+            avg: AVG(400 * SPI_RA_BAR_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            min: MIN(400 * SPI_RA_BAR_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            max: MAX(400 * SPI_RA_BAR_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            unit: Pct
+            tips:
+          Reached CU Workgroup Limit:
+            avg: AVG(400 * SPI_RA_TGLIM_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            min: MIN(400 * SPI_RA_TGLIM_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            max: MAX(400 * SPI_RA_TGLIM_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            unit: Pct
+            tips:
+          Reached CU Wavefront Limit:
+            avg: AVG(400 * SPI_RA_WVLIM_STALL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            min: MIN(400 * SPI_RA_WVLIM_STALL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            max: MAX(400 * SPI_RA_WVLIM_STALL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
+            unit: Pct
+            tips:
@@ -0,0 +1,142 @@
+---
+# Add a description/tips for each metric in this section
+# so that they can be shown on hover.
+Metric Description:
+
+# Define the panel properties and properties of each metric in the panel.
+Panel Config:
+  id: 700
+  title: Wavefront
+  data source:
+    - metric_table:
+        id: 701
+        title: Wavefront Launch Stats
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          Grid Size:
+            avg: AVG(Grid_Size)
+            min: MIN(Grid_Size)
+            max: MAX(Grid_Size)
+            unit: Work Items
+            tips:
+          Workgroup Size:
+            avg: AVG(Workgroup_Size)
+            min: MIN(Workgroup_Size)
+            max: MAX(Workgroup_Size)
+            unit: Work Items
+            tips:
+          Total Wavefronts:
+            avg: AVG(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
+            min: MIN(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
+            max: MAX(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
+            unit: Wavefronts
+            tips:
+          Saved Wavefronts:
+            avg: AVG(SQ_WAVES_SAVED)
+            min: MIN(SQ_WAVES_SAVED)
+            max: MAX(SQ_WAVES_SAVED)
+            unit: Wavefronts
+            tips:
+          Restored Wavefronts:
+            avg: AVG(SQ_WAVES_RESTORED)
+            min: MIN(SQ_WAVES_RESTORED)
+            max: MAX(SQ_WAVES_RESTORED)
+            unit: Wavefronts
+            tips:
+          VGPRs:
+            avg: AVG(Arch_VGPR)
+            min: MIN(Arch_VGPR)
+            max: MAX(Arch_VGPR)
+            unit: Registers
+            tips:
+          AGPRs:
+            avg: AVG(Accum_VGPR)
+            min: MIN(Accum_VGPR)
+            max: MAX(Accum_VGPR)
+            unit: Registers
+            tips:
+          SGPRs:
+            avg: AVG(SGPR)
+            min: MIN(SGPR)
+            max: MAX(SGPR)
+            unit: Registers
+            tips:
+          LDS Allocation:
+            avg: AVG(LDS_Per_Workgroup)
+            min: MIN(LDS_Per_Workgroup)
+            max: MAX(LDS_Per_Workgroup)
+            unit: Bytes
+            tips:
+          Scratch Allocation:
+            avg: AVG(Scratch_Per_Workitem)
+            min: MIN(Scratch_Per_Workitem)
+            max: MAX(Scratch_Per_Workitem)
+            unit: Bytes/Workitem
+            tips:
+
+    - metric_table:
+        id: 702
+        title: Wavefront Runtime Stats
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          Kernel Time (Nanosec):
+            avg: AVG((End_Timestamp - Start_Timestamp))
+            min: MIN((End_Timestamp - Start_Timestamp))
+            max: MAX((End_Timestamp - Start_Timestamp))
+            unit: ns
+            tips:
+          Kernel Time (Cycles):
+            avg: AVG($GRBM_GUI_ACTIVE_PER_XCD)
+            min: MIN($GRBM_GUI_ACTIVE_PER_XCD)
+            max: MAX($GRBM_GUI_ACTIVE_PER_XCD)
+            unit: Cycle
+            tips:
+          Instructions per wavefront:
+            avg: AVG((SQ_INSTS / SQ_WAVES))
+            min: MIN((SQ_INSTS / SQ_WAVES))
+            max: MAX((SQ_INSTS / SQ_WAVES))
+            unit: Instr/wavefront
+            tips:
+          Wave Cycles:
+            avg: AVG(((4 * SQ_WAVE_CYCLES) / $denom))
+            min: MIN(((4 * SQ_WAVE_CYCLES) / $denom))
+            max: MAX(((4 * SQ_WAVE_CYCLES) / $denom))
+            unit: (Cycles + $normUnit)
+            tips:
+          Dependency Wait Cycles:
+            avg: AVG(((4 * SQ_WAIT_ANY) / $denom))
+            min: MIN(((4 * SQ_WAIT_ANY) / $denom))
+            max: MAX(((4 * SQ_WAIT_ANY) / $denom))
+            unit: (Cycles + $normUnit)
+            tips:
+          Issue Wait Cycles:
+            avg: AVG(((4 * SQ_WAIT_INST_ANY) / $denom))
+            min: MIN(((4 * SQ_WAIT_INST_ANY) / $denom))
+            max: MAX(((4 * SQ_WAIT_INST_ANY) / $denom))
+            unit: (Cycles + $normUnit)
+            tips:
+          Active Cycles:
+            avg: AVG(((4 * SQ_ACTIVE_INST_ANY) / $denom))
+            min: MIN(((4 * SQ_ACTIVE_INST_ANY) / $denom))
+            max: MAX(((4 * SQ_ACTIVE_INST_ANY) / $denom))
+            unit: (Cycles + $normUnit)
+            tips:
+          Wavefront Occupancy:
+            avg: AVG((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD))
+            min: MIN((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD))
+            max: MAX((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD))
+            unit: Wavefronts
+            coll_level: SQ_LEVEL_WAVES
+            tips:
@@ -185,15 +185,6 @@ Panel Config:
            max: MAX((TA_FLAT_WAVEFRONTS_sum / $denom))
            unit: (instr + $normUnit)
            tips:
-          Global/Generic Coalesceable Instr:
-            avg: None
-            # AVG((TA_FLAT_COALESCEABLE_WAVEFRONTS_sum / $denom))
-            min: None
-            # MIN((TA_FLAT_COALESCEABLE_WAVEFRONTS_sum / $denom))
-            max: None
-            # MAX((TA_FLAT_COALESCEABLE_WAVEFRONTS_sum / $denom))
-            unit: (instr + $normUnit)
-            tips:
          Global/Generic Read:
            avg: AVG((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
            min: MIN((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
@@ -290,3 +281,9 @@ Panel Config:
            max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
            unit: (instr + $normUnit)
            tips:
+          MFMA-F6F4:
+            avg: AVG((SQ_INSTS_VALU_MFMA_F6F4 / $denom))
+            min: MIN((SQ_INSTS_VALU_MFMA_F6F4 / $denom))
+            max: MAX((SQ_INSTS_VALU_MFMA_F6F4 / $denom))
+            unit: (instr + $normUnit)
+            tips:
@@ -0,0 +1,293 @@
+---
+# Add a description/tips for each metric in this section
+# so that they can be shown on hover.
+Metric Description:
+
+# Define the panel properties and properties of each metric in the panel.
+Panel Config:
+  id: 1100
+  title: Compute Units - Compute Pipeline
+  data source:
+    - metric_table:
+        id: 1101
+        title: Speed-of-Light
+        header:
+          metric: Metric
+          value: Avg
+          unit: Unit
+          peak: Peak
+          pop: Pct of Peak
+          tips: Tips
+        metric:
+          VALU FLOPs:
+            value: AVG(((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
+              + (2 * SQ_INSTS_VALU_FMA_F16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
+              + SQ_INSTS_VALU_TRANS_F32) + (2 * SQ_INSTS_VALU_FMA_F32)))) + (64 * (((SQ_INSTS_VALU_ADD_F64
+              + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64) + (2 * SQ_INSTS_VALU_FMA_F64))))
+              / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
+            pop: ((100 * AVG(((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16)
+              + SQ_INSTS_VALU_TRANS_F16) + (2 * SQ_INSTS_VALU_FMA_F16))) + (64 * (((SQ_INSTS_VALU_ADD_F32
+              + SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32) + (2 * SQ_INSTS_VALU_FMA_F32))))
+              + (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+              + (2 * SQ_INSTS_VALU_FMA_F64)))) / (End_Timestamp - Start_Timestamp)))) / (((($max_sclk
+              * $cu_per_gpu) * 64) * 2) / 1000))
+            tips:
+          VALU IOPs:
+            value: AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp - Start_Timestamp)))
+            unit: GIOP
+            peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
+            pop: ((100 * AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp
+              - Start_Timestamp)))) / (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000))
+            tips:
+          MFMA FLOPs (F8):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: ((($max_sclk * $cu_per_gpu) * 8192) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
+            tips:
+          MFMA FLOPs (BF16):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 4096) / 1000))
+            tips:
+          MFMA FLOPs (F16):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F16 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F16 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 4096) / 1000))
+            tips:
+          MFMA FLOPs (F32):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F32 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F32 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
+            tips:
+          MFMA FLOPs (F64):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
+            tips:
+          MFMA FLOPs (F6F4):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F6F4 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GFLOP
+            peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F6F4 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 256) / 1000))
+            tips:
+          MFMA IOPs (INT8):
+            value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
+            unit: GIOP
+            peak: ((($max_sclk * $cu_per_gpu) * 8192) / 1000)
+            pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp))))
+              / ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
+            tips:
+
+    - metric_table:
+        id: 1102
+        title: Pipeline Stats
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          IPC:
+            avg: AVG((SQ_INSTS / SQ_BUSY_CU_CYCLES))
+            min: MIN((SQ_INSTS / SQ_BUSY_CU_CYCLES))
+            max: MAX((SQ_INSTS / SQ_BUSY_CU_CYCLES))
+            unit: Instr/cycle
+            tips:
+          IPC (Issued):
+            avg: AVG(((((((((SQ_INSTS_VALU + SQ_INSTS_VMEM) + SQ_INSTS_SALU) + SQ_INSTS_SMEM))
+              + SQ_INSTS_BRANCH) + SQ_INSTS_SENDMSG) + SQ_INSTS_VSKIPPED + SQ_INSTS_LDS)
+              / SQ_ACTIVE_INST_ANY))
+            min: MIN(((((((((SQ_INSTS_VALU + SQ_INSTS_VMEM) + SQ_INSTS_SALU) + SQ_INSTS_SMEM))
+              + SQ_INSTS_BRANCH) + SQ_INSTS_SENDMSG) + SQ_INSTS_VSKIPPED + SQ_INSTS_LDS)
+              / SQ_ACTIVE_INST_ANY))
+            max: MAX(((((((((SQ_INSTS_VALU + SQ_INSTS_VMEM) + SQ_INSTS_SALU) + SQ_INSTS_SMEM))
+              + SQ_INSTS_BRANCH) + SQ_INSTS_SENDMSG) + SQ_INSTS_VSKIPPED + SQ_INSTS_LDS)
+              / SQ_ACTIVE_INST_ANY))
+            unit: Instr/cycle
+            tips:
+          SALU Utilization:
+            avg: AVG((((100 * SQ_ACTIVE_INST_SCA) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            min: MIN((((100 * SQ_ACTIVE_INST_SCA) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            max: MAX((((100 * SQ_ACTIVE_INST_SCA) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            unit: pct
+            tips:
+          VALU Utilization:
+            avg: AVG((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            min: MIN((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            unit: pct
+            tips:
+          # Percentage of VALU instructions that are issued to two VALUs at a time
+          VALU Co-Issue Efficiency:
+            avg: AVG((100 * SQ_ACTIVE_INST_VALU2) / (SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2))
+            min: MIN((100 * SQ_ACTIVE_INST_VALU2) / (SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2))
+            max: MAX((100 * SQ_ACTIVE_INST_VALU2) / (SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2))
+            unit: pct
+            tips:
+          VMEM Utilization:
+            avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            max: MAX((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            unit: pct
+            tips:
+          Branch Utilization:
+            avg: AVG((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            min: MIN((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            max: MAX((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
+            unit: pct
+            tips:
+          VALU Active Threads:
+            avg: AVG(((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU) if (SQ_ACTIVE_INST_VALU
+              != 0) else None))
+            min: MIN(((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU) if (SQ_ACTIVE_INST_VALU
+              != 0) else None))
+            max: MAX(((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU) if (SQ_ACTIVE_INST_VALU
+              != 0) else None))
+            unit: Threads
+            tips:
+          MFMA Utilization:
+            avg: AVG(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / ((4 * $cu_per_gpu) * $GRBM_GUI_ACTIVE_PER_XCD)))
+            min: MIN(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / ((4 * $cu_per_gpu) * $GRBM_GUI_ACTIVE_PER_XCD)))
+            max: MAX(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / ((4 * $cu_per_gpu) * $GRBM_GUI_ACTIVE_PER_XCD)))
+            unit: pct
+            tips:
+          MFMA Instr Cycles:
+            avg: AVG(((SQ_VALU_MFMA_BUSY_CYCLES / SQ_INSTS_MFMA) if (SQ_INSTS_MFMA != 0)
+              else None))
+            min: MIN(((SQ_VALU_MFMA_BUSY_CYCLES / SQ_INSTS_MFMA) if (SQ_INSTS_MFMA != 0)
+              else None))
+            max: MAX(((SQ_VALU_MFMA_BUSY_CYCLES / SQ_INSTS_MFMA) if (SQ_INSTS_MFMA != 0)
+              else None))
+            unit: cycles/instr
+            tips:
+          VMEM Latency:
+            avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_VMEM) if (SQ_INSTS_VMEM != 0)
+              else None))
+            min: MIN(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_VMEM) if (SQ_INSTS_VMEM != 0)
+              else None))
+            max: MAX(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_VMEM) if (SQ_INSTS_VMEM != 0)
+              else None))
+            unit: Cycles
+            coll_level: SQ_INST_LEVEL_VMEM
+            tips:
+          SMEM Latency:
+            avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_SMEM) if (SQ_INSTS_SMEM != 0)
+              else None))
+            min: MIN(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_SMEM) if (SQ_INSTS_SMEM != 0)
+              else None))
+            max: MAX(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_SMEM) if (SQ_INSTS_SMEM != 0)
+              else None))
+            unit: Cycles
+            coll_level: SQ_INST_LEVEL_SMEM
+            tips:
+
+    - metric_table:
+        id: 1103
+        title: Arithmetic Operations
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          FLOPs (Total):
+            avg: AVG((((((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
+              + (SQ_INSTS_VALU_FMA_F16 * 2))) + ((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F16) + (512
+              * SQ_INSTS_VALU_MFMA_MOPS_BF16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
+              + SQ_INSTS_VALU_TRANS_F32) + (SQ_INSTS_VALU_FMA_F32 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32))
+              + (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+              + (SQ_INSTS_VALU_FMA_F64 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4)) /
+              $denom))
+            min: MIN((((((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
+              + (SQ_INSTS_VALU_FMA_F16 * 2))) + ((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F16) + (512
+              * SQ_INSTS_VALU_MFMA_MOPS_BF16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
+              + SQ_INSTS_VALU_TRANS_F32) + (SQ_INSTS_VALU_FMA_F32 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32))
+              + (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+              + (SQ_INSTS_VALU_FMA_F64 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4)) /
+              $denom))
+            max: MAX((((((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
+              + (SQ_INSTS_VALU_FMA_F16 * 2))) + ((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F16) + (512
+              * SQ_INSTS_VALU_MFMA_MOPS_BF16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
+              + SQ_INSTS_VALU_TRANS_F32) + (SQ_INSTS_VALU_FMA_F32 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32))
+              + (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+              + (SQ_INSTS_VALU_FMA_F64 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4)) /
+              $denom))
+            unit: (OPs  + $normUnit)
+            tips:
+          IOPs (Total):
+            avg: AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) + (SQ_INSTS_VALU_MFMA_MOPS_I8 * 512)) / $denom)
+            min: MIN(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) + (SQ_INSTS_VALU_MFMA_MOPS_I8 * 512)) / $denom)
+            max: MAX(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) + (SQ_INSTS_VALU_MFMA_MOPS_I8 * 512)) / $denom)
+            unit: (OPs  + $normUnit)
+            tips:
+          F8 OPs:
+            avg: AVG(((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) / $denom))
+            min: MIN(((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) / $denom))
+            max: MAX(((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) / $denom))
+            unit: (OPs  + $normUnit)
+            tips:
+          F16 OPs:
+            avg: AVG(((((((64 * SQ_INSTS_VALU_ADD_F16) + (64 * SQ_INSTS_VALU_MUL_F16)) +
+              (64 * SQ_INSTS_VALU_TRANS_F16)) + (128 * SQ_INSTS_VALU_FMA_F16)) + (512 *
+              SQ_INSTS_VALU_MFMA_MOPS_F16)) / $denom))
+            min: MIN(((((((64 * SQ_INSTS_VALU_ADD_F16) + (64 * SQ_INSTS_VALU_MUL_F16)) +
+              (64 * SQ_INSTS_VALU_TRANS_F16)) + (128 * SQ_INSTS_VALU_FMA_F16)) + (512 *
+              SQ_INSTS_VALU_MFMA_MOPS_F16)) / $denom))
+            max: MAX(((((((64 * SQ_INSTS_VALU_ADD_F16) + (64 * SQ_INSTS_VALU_MUL_F16)) +
+              (64 * SQ_INSTS_VALU_TRANS_F16)) + (128 * SQ_INSTS_VALU_FMA_F16)) + (512 *
+              SQ_INSTS_VALU_MFMA_MOPS_F16)) / $denom))
+            unit: (OPs  + $normUnit)
+            tips:
+          BF16 OPs:
+            avg: AVG(((512 * SQ_INSTS_VALU_MFMA_MOPS_BF16) / $denom))
+            min: MIN(((512 * SQ_INSTS_VALU_MFMA_MOPS_BF16) / $denom))
+            max: MAX(((512 * SQ_INSTS_VALU_MFMA_MOPS_BF16) / $denom))
+            unit: (OPs  + $normUnit)
+            tips:
+          F32 OPs:
+            avg: AVG((((64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32)
+              + (SQ_INSTS_VALU_FMA_F32 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32)) / $denom))
+            min: MIN((((64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32)
+              + (SQ_INSTS_VALU_FMA_F32 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32)) / $denom))
+            max: MAX((((64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32)
+              + (SQ_INSTS_VALU_FMA_F32 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32)) / $denom))
+            unit: (OPs  + $normUnit)
+            tips:
+          F64 OPs:
+            avg: AVG((((64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+              + (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
+            min: MIN((((64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+              + (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
+            max: MAX((((64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+              + (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
+            unit: (OPs  + $normUnit)
+            tips:
+          F6F4 OPs:
+            avg: AVG((512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4) / $denom)
+            min: MIN((512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4) / $denom)
+            max: MAX((512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4) / $denom)
+            unit: (OPs  + $normUnit)
+            tips:
+          INT8 OPs:
+            avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
+            min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
+            max: MAX(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
+            unit: (OPs  + $normUnit)
+            tips:
@@ -0,0 +1,166 @@
+---
+# Add a description/tips for each metric in this section
+# so they can be shown on hover.
+Metric Description:
+
+# Define the panel properties and properties of each metric in the panel.
+Panel Config:
+  id: 1200
+  title: Local Data Share (LDS)
+  data source:
+    - metric_table:
+        id: 1201
+        title: Speed-of-Light
+        header:
+          metric: Metric
+          value: Avg
+          unit: Unit
+          tips: Tips
+        metric:
+          Utilization:
+            value: AVG(((100 * SQ_LDS_IDX_ACTIVE) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
+            unit: Pct of Peak
+            tips:
+          Access Rate:
+            value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
+            unit: Pct of Peak
+            tips:
+          Theoretical Bandwidth (% of Peak):
+            value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
+              / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) * 0.00128)))
+            unit: Pct of Peak
+            tips:
+          Bank Conflict Rate:
+            value: AVG((((SQ_LDS_BANK_CONFLICT * 3.125) / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
+              if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
+            unit: Pct of Peak
+            tips:
+        comparable: false # for now
+        cli_style: simple_bar
+
+    - metric_table:
+        id: 1202
+        title: LDS Stats
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          LDS Instrs:
+            avg: AVG((SQ_INSTS_LDS / $denom))
+            min: MIN((SQ_INSTS_LDS / $denom))
+            max: MAX((SQ_INSTS_LDS / $denom))
+            unit: (Instr  + $normUnit)
+            tips:
+          LDS LOAD:
+            avg: AVG((SQ_INSTS_LDS_LOAD / $denom))
+            min: MIN((SQ_INSTS_LDS_LOAD / $denom))
+            max: MAX((SQ_INSTS_LDS_LOAD / $denom))
+            unit: (Instr  + $normUnit)
+            tips:
+          LDS STORE:
+            avg: AVG((SQ_INSTS_LDS_STORE / $denom))
+            min: MIN((SQ_INSTS_LDS_STORE / $denom))
+            max: MAX((SQ_INSTS_LDS_STORE / $denom))
+            unit: (Instr  + $normUnit)
+            tips:
+          LDS ATOMIC:
+            avg: AVG((SQ_INSTS_LDS_ATOMIC / $denom))
+            min: MIN((SQ_INSTS_LDS_ATOMIC / $denom))
+            max: MAX((SQ_INSTS_LDS_ATOMIC / $denom))
+            unit: (Instr  + $normUnit)
+            tips:
+          LDS LOAD Bandwidth:
+            avg: AVG(64 * SQ_INSTS_LDS_LOAD_BANDWIDTH / (End_Timestamp - Start_Timestamp))
+            min: MIN(64 * SQ_INSTS_LDS_LOAD_BANDWIDTH / (End_Timestamp - Start_Timestamp))
+            max: MAX(64 * SQ_INSTS_LDS_LOAD_BANDWIDTH / (End_Timestamp - Start_Timestamp))
+            unit: Gbps
+            tips:
+          LDS STORE Bandwidth:
+            avg: AVG(64 * SQ_INSTS_LDS_STORE_BANDWIDTH / (End_Timestamp - Start_Timestamp))
+            min: MIN(64 * SQ_INSTS_LDS_STORE_BANDWIDTH / (End_Timestamp - Start_Timestamp))
+            max: MAX(64 * SQ_INSTS_LDS_STORE_BANDWIDTH / (End_Timestamp - Start_Timestamp))
+            unit: Gbps
+            tips:
+          LDS ATOMIC Bandwidth:
+            avg: AVG(64 * SQ_INSTS_LDS_ATOMIC_BANDWIDTH / (End_Timestamp - Start_Timestamp))
+            min: MIN(64 * SQ_INSTS_LDS_ATOMIC_BANDWIDTH / (End_Timestamp - Start_Timestamp))
+            max: MAX(64 * SQ_INSTS_LDS_ATOMIC_BANDWIDTH / (End_Timestamp - Start_Timestamp))
+            unit: Gbps
+            tips:
+          Theoretical Bandwidth:
+            avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
+              / $denom))
+            min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
+              / $denom))
+            max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
+              / $denom))
+            unit: (Bytes  + $normUnit)
+            tips:
+          LDS Latency:
+            avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else None))
+            min: MIN(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else None))
+            max: MAX(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else None))
+            unit: Cycles
+            coll_level: SQ_INST_LEVEL_LDS
+            tips:
+          Bank Conflicts/Access:
+            avg: AVG(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
+              if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
+            min: MIN(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
+              if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
+            max: MAX(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
+              if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
+            unit: Conflicts/Access
+            tips:
+          Index Accesses:
+            avg: AVG((SQ_LDS_IDX_ACTIVE / $denom))
+            min: MIN((SQ_LDS_IDX_ACTIVE / $denom))
+            max: MAX((SQ_LDS_IDX_ACTIVE / $denom))
+            unit: (Cycles  + $normUnit)
+            tips:
+          Atomic Return Cycles:
+            avg: AVG((SQ_LDS_ATOMIC_RETURN / $denom))
+            min: MIN((SQ_LDS_ATOMIC_RETURN / $denom))
+            max: MAX((SQ_LDS_ATOMIC_RETURN / $denom))
+            unit: (Cycles  + $normUnit)
+            tips:
+          Bank Conflict:
+            avg: AVG((SQ_LDS_BANK_CONFLICT / $denom))
+            min: MIN((SQ_LDS_BANK_CONFLICT / $denom))
+            max: MAX((SQ_LDS_BANK_CONFLICT / $denom))
+            unit: (Cycles  + $normUnit)
+            tips:
+          Addr Conflict:
+            avg: AVG((SQ_LDS_ADDR_CONFLICT / $denom))
+            min: MIN((SQ_LDS_ADDR_CONFLICT / $denom))
+            max: MAX((SQ_LDS_ADDR_CONFLICT / $denom))
+            unit: (Cycles  + $normUnit)
+            tips:
+          Unaligned Stall:
+            avg: AVG((SQ_LDS_UNALIGNED_STALL / $denom))
+            min: MIN((SQ_LDS_UNALIGNED_STALL / $denom))
+            max: MAX((SQ_LDS_UNALIGNED_STALL / $denom))
+            unit: (Cycles  + $normUnit)
+            tips:
+          Mem Violations:
+            avg: AVG((SQ_LDS_MEM_VIOLATIONS / $denom))
+            min: MIN((SQ_LDS_MEM_VIOLATIONS / $denom))
+            max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
+            unit: (Accesses + $normUnit)
+            tips:
+          LDS Command FIFO Full Rate:
+            avg: AVG((SQ_LDS_CMD_FIFO_FULL / $denom))
+            min: MIN((SQ_LDS_CMD_FIFO_FULL / $denom))
+            max: MAX((SQ_LDS_CMD_FIFO_FULL / $denom))
+            unit: (Cycles  + $normUnit)
+            tips:
+          LDS Data FIFO Full Rate:
+            avg: AVG((SQ_LDS_DATA_FIFO_FULL / $denom))
+            min: MIN((SQ_LDS_DATA_FIFO_FULL / $denom))
+            max: MAX((SQ_LDS_DATA_FIFO_FULL / $denom))
+            unit: (Cycles  + $normUnit)
+            tips:
@@ -0,0 +1,105 @@
+---
+# Add a description/tips for each metric in this section
+# so they can be shown on hover.
+Metric Description:
+
+# Define the panel properties and properties of each metric in the panel.
+Panel Config:
+  id: 1300
+  title: Instruction Cache
+  data source:
+    - metric_table:
+        id: 1301
+        title: Speed-of-Light
+        header:
+          metric: Metric
+          value: Avg
+          unit: Unit
+          tips: Tips
+        metric:
+          Bandwidth:
+            value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu)
+              * (End_Timestamp - Start_Timestamp))))
+            unit: Pct of Peak
+            tips:
+          Cache Hit Rate:
+            value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
+              + SQC_ICACHE_MISSES_DUPLICATE)))
+            unit: Pct of Peak
+            tips:
+          L1I-L2 Bandwidth:
+            value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
+              * (End_Timestamp - Start_Timestamp))))
+            unit: Pct of Peak
+            tips:
+        comparable: false # for now
+        cli_style: simple_bar
+
+    - metric_table:
+        id: 1302
+        title: Instruction Cache Accesses
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          Req:
+            avg: AVG((SQC_ICACHE_REQ / $denom))
+            min: MIN((SQC_ICACHE_REQ / $denom))
+            max: MAX((SQC_ICACHE_REQ / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Hits:
+            avg: AVG((SQC_ICACHE_HITS / $denom))
+            min: MIN((SQC_ICACHE_HITS / $denom))
+            max: MAX((SQC_ICACHE_HITS / $denom))
+            unit: (Hits  + $normUnit)
+            tips:
+          Misses - Non Duplicated:
+            avg: AVG((SQC_ICACHE_MISSES / $denom))
+            min: MIN((SQC_ICACHE_MISSES / $denom))
+            max: MAX((SQC_ICACHE_MISSES / $denom))
+            unit: (Misses  + $normUnit)
+            tips:
+          Misses - Duplicated:
+            avg: AVG((SQC_ICACHE_MISSES_DUPLICATE / $denom))
+            min: MIN((SQC_ICACHE_MISSES_DUPLICATE / $denom))
+            max: MAX((SQC_ICACHE_MISSES_DUPLICATE / $denom))
+            unit: (Misses  + $normUnit)
+            tips:
+          Cache Hit Rate:
+            avg: AVG(((100 * SQC_ICACHE_HITS) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
+              + SQC_ICACHE_MISSES_DUPLICATE)))
+            min: MIN(((100 * SQC_ICACHE_HITS) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
+              + SQC_ICACHE_MISSES_DUPLICATE)))
+            max: MAX(((100 * SQC_ICACHE_HITS) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
+              + SQC_ICACHE_MISSES_DUPLICATE)))
+            unit: pct
+            tips:
+          Instruction Fetch Latency:
+            avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_IFETCH) if (SQ_IFETCH != 0) else None))
+            min: MIN(((SQ_ACCUM_PREV_HIRES / SQ_IFETCH) if (SQ_IFETCH != 0) else None))
+            max: MAX(((SQ_ACCUM_PREV_HIRES / SQ_IFETCH) if (SQ_IFETCH != 0) else None))
+            unit: Cycles
+            coll_level: SQ_IFETCH_LEVEL
+            tips:
+    - metric_table:
+        id: 1303
+        title: Instruction Cache - L2 Interface
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          L1I-L2 Bandwidth:
+            avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
+            min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
+            max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
+            unit: (Bytes + $normUnit)
+            tips:
@@ -0,0 +1,171 @@
+---
+# Add a description/tips for each metric in this section
+# so they can be shown on hover.
+Metric Description:
+
+# Define the panel properties and properties of each metric in the panel.
+Panel Config:
+  id: 1400
+  title: Scalar L1 Data Cache
+  data source:
+    - metric_table:
+        id: 1401
+        title: Speed-of-Light
+        header:
+          metric: Metric
+          value: Avg
+          unit: Unit
+          tips: Tips
+        metric:
+          Bandwidth:
+            value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu)
+              * (End_Timestamp - Start_Timestamp))))
+            unit: Pct of Peak
+            tips:
+          Cache Hit Rate:
+            value: AVG((((SQC_DCACHE_HITS * 100) / (SQC_DCACHE_HITS + SQC_DCACHE_MISSES + SQC_DCACHE_MISSES_DUPLICATE))
+              if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
+            unit: Pct of Peak
+            tips:
+          sL1D-L2 BW:
+            value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ) * 100000)
+                        / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
+            unit: Pct of Peak
+            tips:
+        comparable: false # for now
+        cli_style: simple_bar
+
+    - metric_table:
+        id: 1402
+        title: Scalar L1D Cache Accesses
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          Req:
+            avg: AVG((SQC_DCACHE_REQ / $denom))
+            min: MIN((SQC_DCACHE_REQ / $denom))
+            max: MAX((SQC_DCACHE_REQ / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Hits:
+            avg: AVG((SQC_DCACHE_HITS / $denom))
+            min: MIN((SQC_DCACHE_HITS / $denom))
+            max: MAX((SQC_DCACHE_HITS / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Misses - Non Duplicated:
+            avg: AVG((SQC_DCACHE_MISSES / $denom))
+            min: MIN((SQC_DCACHE_MISSES / $denom))
+            max: MAX((SQC_DCACHE_MISSES / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Misses- Duplicated:
+            avg: AVG((SQC_DCACHE_MISSES_DUPLICATE / $denom))
+            min: MIN((SQC_DCACHE_MISSES_DUPLICATE / $denom))
+            max: MAX((SQC_DCACHE_MISSES_DUPLICATE / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Cache Hit Rate:
+            avg: AVG((((100 * SQC_DCACHE_HITS) / ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
+              + SQC_DCACHE_MISSES_DUPLICATE)) if (((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
+              + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
+            min: MIN((((100 * SQC_DCACHE_HITS) / ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
+              + SQC_DCACHE_MISSES_DUPLICATE)) if (((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
+              + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
+            max: MAX((((100 * SQC_DCACHE_HITS) / ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
+              + SQC_DCACHE_MISSES_DUPLICATE)) if (((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
+              + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
+            unit: pct
+            tips:
+          Read Req (Total):
+            avg: AVG((((((SQC_DCACHE_REQ_READ_1 + SQC_DCACHE_REQ_READ_2) + SQC_DCACHE_REQ_READ_4)
+              + SQC_DCACHE_REQ_READ_8) + SQC_DCACHE_REQ_READ_16) / $denom))
+            min: MIN((((((SQC_DCACHE_REQ_READ_1 + SQC_DCACHE_REQ_READ_2) + SQC_DCACHE_REQ_READ_4)
+              + SQC_DCACHE_REQ_READ_8) + SQC_DCACHE_REQ_READ_16) / $denom))
+            max: MAX((((((SQC_DCACHE_REQ_READ_1 + SQC_DCACHE_REQ_READ_2) + SQC_DCACHE_REQ_READ_4)
+              + SQC_DCACHE_REQ_READ_8) + SQC_DCACHE_REQ_READ_16) / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Atomic Req:
+            avg: AVG((SQC_DCACHE_ATOMIC / $denom))
+            min: MIN((SQC_DCACHE_ATOMIC / $denom))
+            max: MAX((SQC_DCACHE_ATOMIC / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Read Req (1 DWord):
+            avg: AVG((SQC_DCACHE_REQ_READ_1 / $denom))
+            min: MIN((SQC_DCACHE_REQ_READ_1 / $denom))
+            max: MAX((SQC_DCACHE_REQ_READ_1 / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Read Req (2 DWord):
+            avg: AVG((SQC_DCACHE_REQ_READ_2 / $denom))
+            min: MIN((SQC_DCACHE_REQ_READ_2 / $denom))
+            max: MAX((SQC_DCACHE_REQ_READ_2 / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Read Req (4 DWord):
+            avg: AVG((SQC_DCACHE_REQ_READ_4 / $denom))
+            min: MIN((SQC_DCACHE_REQ_READ_4 / $denom))
+            max: MAX((SQC_DCACHE_REQ_READ_4 / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Read Req (8 DWord):
+            avg: AVG((SQC_DCACHE_REQ_READ_8 / $denom))
+            min: MIN((SQC_DCACHE_REQ_READ_8 / $denom))
+            max: MAX((SQC_DCACHE_REQ_READ_8 / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Read Req (16 DWord):
+            avg: AVG((SQC_DCACHE_REQ_READ_16 / $denom))
+            min: MIN((SQC_DCACHE_REQ_READ_16 / $denom))
+            max: MAX((SQC_DCACHE_REQ_READ_16 / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+
+    - metric_table:
+        id: 1403
+        title: Scalar L1D Cache - L2 Interface
+        header:
+          metric: Metric
+          avg: Avg
+          min: Min
+          max: Max
+          unit: Unit
+          tips: Tips
+        metric:
+          sL1D-L2 BW:
+            avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ) * 64)) / $denom))
+            min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ) * 64)) / $denom))
+            max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ) * 64)) / $denom))
+            unit: (Bytes + $normUnit)
+            tips:
+          Read Req:
+            avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
+            min: MIN((SQC_TC_DATA_READ_REQ / $denom))
+            max: MAX((SQC_TC_DATA_READ_REQ / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Write Req:
+            avg: AVG((SQC_TC_DATA_WRITE_REQ / $denom))
+            min: MIN((SQC_TC_DATA_WRITE_REQ / $denom))
+            max: MAX((SQC_TC_DATA_WRITE_REQ / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Atomic Req:
+            avg: AVG((SQC_TC_DATA_ATOMIC_REQ / $denom))
+            min: MIN((SQC_TC_DATA_ATOMIC_REQ / $denom))
+            max: MAX((SQC_TC_DATA_ATOMIC_REQ / $denom))
+            unit: (Req  + $normUnit)
+            tips:
+          Stall Cycles:
+            avg: AVG((SQC_TC_STALL / $denom))
+            min: MIN((SQC_TC_STALL / $denom))
+            max: MAX((SQC_TC_STALL / $denom))
+            unit: (Cycles  + $normUnit)
+            tips:
@@ -43,6 +43,24 @@ Panel Config:
            max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
            unit: pct
            tips:
+          Sequencer → TA Address Stall:
+            avg: AVG((SQ_VMEM_TA_ADDR_FIFO_FULL / $denom))
+            min: MIN((SQ_VMEM_TA_ADDR_FIFO_FULL / $denom))
+            max: MAX((SQ_VMEM_TA_ADDR_FIFO_FULL / $denom))
+            unit: (Cycles + $normUnit)
+            tips:
+          Sequencer → TA Command Stall:
+            avg: AVG((SQ_VMEM_TA_CMD_FIFO_FULL / $denom))
+            min: MIN((SQ_VMEM_TA_CMD_FIFO_FULL / $denom))
+            max: MAX((SQ_VMEM_TA_CMD_FIFO_FULL / $denom))
+            unit: (Cycles + $normUnit)
+            tips:
+          Sequencer → TA Data Stall:
+            avg: AVG((SQ_VMEM_WR_TA_DATA_FIFO_FULL / $denom))
+            min: MIN((SQ_VMEM_WR_TA_DATA_FIFO_FULL / $denom))
+            max: MAX((SQ_VMEM_WR_TA_DATA_FIFO_FULL / $denom))
+            unit: (Cycles + $normUnit)
+            tips:
          Total Instructions:
            avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
            min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -32,12 +32,12 @@ Panel Config:
            tips:
          L2-Fabric Read BW:
            value: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64) + (TCC_EA0_RDREQ_128B_sum
-              * 128)) / (End_Timestamp - Start_Timestamp))
+              * 128)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
            tips:
          L2-Fabric Write and Atomic BW:
            value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / (End_Timestamp - Start_Timestamp))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
            unit: GB/s
            tips:

@@ -52,6 +52,15 @@ Panel Config:
          unit: Unit
          tips: Tips
        metric:
+          Read BW:
+            avg: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
+              * 64)) / $denom))
+            min: MIN((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
+              * 64)) / $denom))
+            max: MAX((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
+              * 64)) / $denom))
+            unit: (Bytes  + $normUnit)
+            tips:
          Read BW:
            avg: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64) + (TCC_EA0_RDREQ_128B_sum * 128)) / $denom))
            min: MIN((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64) + (TCC_EA0_RDREQ_128B_sum * 128)) / $denom))
@@ -457,13 +466,13 @@ Panel Config:
            max: MAX((TCC_EA0_RD_UNCACHED_32B_sum / $denom))
            unit: (Req  + $normUnit)
            tips:
-          Read - HBM:
+          HBM Read:
            avg: AVG((TCC_EA0_RDREQ_DRAM_sum / $denom))
            min: MIN((TCC_EA0_RDREQ_DRAM_sum / $denom))
            max: MAX((TCC_EA0_RDREQ_DRAM_sum / $denom))
            unit: (Req  + $normUnit)
            tips:
-          Read - Remote:
+          Remote Read:
            avg: AVG((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
            min: MIN((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
            max: MAX((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
@@ -505,13 +514,13 @@ Panel Config:
            max: MAX((TCC_EA0_WRREQ_64B_sum / $denom))
            unit: (Req  + $normUnit)
            tips:
-          Write - HBM:
+          HBM Write and Atomic:
            avg: AVG((TCC_EA0_WRREQ_WRITE_DRAM_sum / $denom))
            min: MIN((TCC_EA0_WRREQ_WRITE_DRAM_sum / $denom))
            max: MAX((TCC_EA0_WRREQ_WRITE_DRAM_sum / $denom))
            unit: (Req  + $normUnit)
            tips:
-          Write and Atomic - Remote:
+          Remote Write and Atomic:
            avg: AVG((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
            min: MIN((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
            max: MAX((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
@@ -0,0 +1,9 @@
+---
+Panel Config:
+  id: 2100
+  title: PC Sampling
+  data source:
+    - pc_sampling_table:
+        id: 2101
+        source: ps_file
+        comparable: false # enable it later
@@ -42,15 +42,16 @@ from utils.logger import (
    console_warning,
    demarcate,
 )
-from utils.mi_gpu_spec import get_gpu_model, get_gpu_series
+from utils.mi_gpu_spec import get_gpu_model, get_gpu_series, get_num_xcds
 from utils.parser import build_in_vars, supported_denom
 from utils.utils import (
    capture_subprocess_output,
    convert_metric_id_to_panel_idx,
    detect_rocprof,
+    get_base_spi_pipe_counter,
    get_submodules,
+    is_spi_pipe_counter,
    is_tcc_channel_counter,
-    total_xcds,
    using_v3,
 )

@@ -186,7 +187,7 @@ class OmniSoC_Base:
            self._mspec.gpu_arch, self._mspec.gpu_chip_id
        )
        self._mspec.num_xcd = str(
-            total_xcds(self._mspec.gpu_model, self._mspec.compute_partition)
+            get_num_xcds(self._mspec.gpu_model, self._mspec.compute_partition)
        )

    @demarcate
@@ -316,10 +317,10 @@ class OmniSoC_Base:
            counters = counters - {"SQ_INSTS_VALU_MFMA_F8", "SQ_INSTS_VALU_MFMA_MOPS_F8"}

        # Following counters are not supported
-        # TCP_TCP_LATENCY_sum (except for gfx908 and gfx90a)
+        # TCP_TCP_LATENCY_sum (except for gfx950)
        # SQC_DCACHE_INFLIGHT_LEVEL
        counters = counters - {"SQC_DCACHE_INFLIGHT_LEVEL"}
-        if self.__arch not in ("gfx908", "gfx90a"):
+        if self.__arch != "gfx950":
            counters = counters - {"TCP_TCP_LATENCY_sum"}

        # SQ_ACCUM_PREV_HIRES will be injected for level counters later on
@@ -510,6 +511,8 @@ class OmniSoC_Base:
        file_count = 0
        # Store all channels for a TCC channel counter in the same file
        tcc_channel_counter_file_map = dict()
+        # Store all pipes for SPI pipe counters in the same file
+        spi_pipe_counter_file_map = dict()
        for ctr in counters:
            # Store all channels for a TCC channel counter in the same file
            if is_tcc_channel_counter(ctr):
@@ -517,13 +520,27 @@ class OmniSoC_Base:
                if output_file:
                    output_file.add(ctr)
                    continue
+            # Store all pipes for SPI pipe counters in the same file
+            if is_spi_pipe_counter(ctr):
+                output_file = spi_pipe_counter_file_map.get(
+                    get_base_spi_pipe_counter(ctr)
+                )
+                if output_file:
+                    output_file.add(ctr)
+                    continue
            # Add counter to first file that has room
            added = False
            for i in range(len(output_files)):
                if output_files[i].add(ctr):
                    added = True
+                    # Store all channels for a TCC channel counter in the same file
                    if is_tcc_channel_counter(ctr):
                        tcc_channel_counter_file_map[ctr.split("[")[0]] = output_files[i]
+                    # Store all pipes for SPI pipe counters in the same file
+                    if is_spi_pipe_counter(ctr):
+                        spi_pipe_counter_file_map[get_base_spi_pipe_counter(ctr)] = (
+                            output_files[i]
+                        )
                    break

            # All files are full, create a new file
@@ -711,8 +728,18 @@ class LimitedSet:
        if e.split("[")[0] in {element.split("[")[0] for element in self.elements}:
            self.elements.append(e)
            return True
+        # Store all pipes for SPI pipe counters in the same file
+        if is_spi_pipe_counter(e) and get_base_spi_pipe_counter(e) in {
+            get_base_spi_pipe_counter(element) for element in self.elements
+        }:
+            self.elements.append(e)
+            return True
        if self.avail > 0:
-            self.avail -= 1
+            # An SPI pipe counter occupies two counter slots
+            if is_spi_pipe_counter(e):
+                self.avail -= 2
+            else:
+                self.avail -= 1
            self.elements.append(e)
            return True
        return False
@@ -54,10 +54,6 @@ class gfx908_soc(OmniSoC_Base):
        self._mspec._l2_banks = 32
        self._mspec.lds_banks_per_cu = 32
        self._mspec.pipes_per_gpu = 4
-        # --showmclkrange is broken in Mi100, hardcode freq
-        if self._mspec.max_mclk is None or self._mspec.cur_mclk is None:
-            self._mspec.max_mclk = 1200
-            self._mspec.cur_mclk = 1200

    # -----------------------
    # Required child methods
@@ -64,12 +64,6 @@ class gfx90a_soc(OmniSoC_Base):
        )
        self.roofline_obj = Roofline(args, self._mspec)

-        # Workaround for broken --showmclkrange
-        # MI210/MI250/MI250X have 1600MHz mclk
-        if self._mspec.max_mclk is None or self._mspec.cur_mclk is None:
-            self._mspec.max_mclk = 1600
-            self._mspec.cur_mclk = 1600
-
        # Set arch specific specs
        self._mspec._l2_banks = 32
        self._mspec.lds_banks_per_cu = 32
@@ -64,12 +64,6 @@ class gfx942_soc(OmniSoC_Base):
        )
        self.roofline_obj = Roofline(args, self._mspec)

-        # Workaround for broken --showmclkrange
-        # MI300X/MI300A/MI308X have 1300MHz mclk
-        if self._mspec.max_mclk is None or self._mspec.cur_mclk is None:
-            self._mspec.max_mclk = 1300
-            self._mspec.cur_mclk = 1300
-
        # Set arch specific specs
        self._mspec._l2_banks = 16
        self._mspec.lds_banks_per_cu = 32
@@ -0,0 +1,117 @@
+##############################################################################bl
+# MIT License
+#
+# Copyright (c) 2021 - 2025 Advanced Micro Devices, Inc. All Rights Reserved.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+##############################################################################el
+
+from pathlib import Path
+
+import config
+from rocprof_compute_soc.soc_base import OmniSoC_Base
+from roofline import Roofline
+from utils.logger import demarcate
+from utils.utils import console_error, console_log, mibench
+
+
+class gfx950_soc(OmniSoC_Base):
+    def __init__(self, args, mspec):
+        super().__init__(args, mspec)
+        self.set_arch("gfx950")
+        if hasattr(self.get_args(), "roof_only") and self.get_args().roof_only:
+            self.set_perfmon_dir(
+                str(
+                    Path(str(config.rocprof_compute_home)).joinpath(
+                        "rocprof_compute_soc",
+                        "profile_configs",
+                        "gfx950",
+                        "roofline",
+                    )
+                )
+            )
+        else:
+            # NB: Load the perfmon configs from the gfx950 directory
+            self.set_perfmon_dir(
+                str(
+                    Path(str(config.rocprof_compute_home)).joinpath(
+                        "rocprof_compute_soc",
+                        "profile_configs",
+                        "gfx950",
+                    )
+                )
+            )
+        self.set_compatible_profilers(["rocprofv3"])
+        # Maximum number of simultaneous counters per GFX IP block
+        self.set_perfmon_config(
+            {
+                "SQ": 8,
+                "TA": 2,
+                "TD": 2,
+                "TCP": 4,
+                "TCC": 4,
+                "CPC": 2,
+                "CPF": 2,
+                "SPI": 2,
+                "GRBM": 2,
+                "GDS": 4,
+                "TCC_channels": 16,
+            }
+        )
+        self.roofline_obj = Roofline(args, self._mspec)
+
+        # Set arch specific specs
+        self._mspec._l2_banks = 16
+        self._mspec.lds_banks_per_cu = 32
+        self._mspec.pipes_per_gpu = 4
+
+    # -----------------------
+    # Required child methods
+    # -----------------------
+    @demarcate
+    def profiling_setup(self):
+        """Perform any SoC-specific setup prior to profiling."""
+        super().profiling_setup()
+        # Performance counter filtering
+        self.perfmon_filter(self.get_args().roof_only)
+
+    @demarcate
+    def post_profiling(self):
+        """Perform any SoC-specific post profiling activities."""
+        super().post_profiling()
+
+        if not self.get_args().no_roof:
+            console_log(
+                "roofline", "Checking for roofline.csv in " + str(self.get_args().path)
+            )
+            if not Path(self.get_args().path).joinpath("roofline.csv").is_file():
+                mibench(self.get_args(), self._mspec)
+            self.roofline_obj.post_processing()
+        else:
+            console_log("roofline", "Skipping roofline")
+
+    @demarcate
+    def analysis_setup(self, roofline_parameters=None):
+        """Perform any SoC-specific setup prior to analysis."""
+        super().analysis_setup()
+        # configure roofline for analysis
+        if roofline_parameters:
+            self.roofline_obj = Roofline(
+                self.get_args(), self._mspec, roofline_parameters
+            )
@@ -120,7 +120,7 @@ def discrete_background_color_bins(df, n_bins=5, columns="all"):
 ####################
 # GRAPHICAL ELEMENTS
 ####################
-def build_bar_chart(display_df, table_config, barchart_elements, norm_filt, hbm_bw):
+def build_bar_chart(display_df, table_config, barchart_elements, norm_filt):
    """
    Read data into a bar chart. ID will determine which subtype of barchart.
    """
@@ -214,6 +214,9 @@ def build_bar_chart(display_df, table_config, barchart_elements, norm_filt, hbm_
                    orientation="h",
                ).update_xaxes(range=[0, 110], ticks="inside", title="%")
            )  # append first % chart
+            hbm_bw = float(
+                display_df[display_df["Metric"] == "HBM Bandwidth"]["Avg"].iloc[0]
+            )
            d_figs.append(
                px.bar(
                    display_df[display_df["Unit"] == "Gb/s"],
@@ -1,7 +1,5 @@
 import os
-import sys
-from dataclasses import dataclass, field
-from typing import Any, Dict, List, Optional, Union
+from typing import Any, Dict

 import yaml

@@ -13,14 +11,20 @@ MI50 = 0
 MI100 = 1
 MI200 = 2
 MI300 = 3
+MI350 = 4

-MI_CONSTANS = {MI50: "mi50", MI100: "mi100", MI200: "mi200", MI300: "mi300"}
+MI_CONSTANS = {
+    MI50: "mi50",
+    MI100: "mi100",
+    MI200: "mi200",
+    MI300: "mi300",
+    MI350: "mi350",
+}

 gpu_series_dict = {}  # key: gpu arch
 gpu_model_dict = {}  # key: gpu_arch
-mi300_num_xcds_dict = {}  # key: gpu model
-mi300_nps_dict = {}  # key: gpu model
-mi300_chip_id_dict = {}  # key: chip id (int)
+num_xcds_dict = {}  # key: gpu model
+chip_id_dict = {}  # key: chip id (int)


 # ----------------------------
@@ -60,10 +64,9 @@ def parse_mi_gpu_spec():
    MI GPUs
      |-- series
          |-- architecture (list)
-              |-- models
-                  |-- chip_ids
-                  |-- mi300_arch
-                  |-- partition_mode
+                |-- gpu model
+                |-- chip_ids
+                |-- partition_mode
    """

    current_dir = os.path.dirname(__file__)
@@ -71,61 +74,26 @@ def parse_mi_gpu_spec():

    # Load the YAML data
    yaml_data = load_yaml(yaml_file_path)
-    mi300_models_dict = {}

-    for mi_index, mi_series in MI_CONSTANS.items():
-        if mi_series != MI_CONSTANS[MI300]:
-            console_debug("[parse_mi_gpu_spec] Processing series: %s" % mi_series)
-            for key, value in yaml_data.items():
-                # parse out gpu series and gpu model information for mi50, 100, 200
-                curr_gpu_arch = value[mi_index]["gpu_archs"][0]["gpu_arch"]
-                gpu_series_dict[curr_gpu_arch] = mi_series
-                gpu_model_dict[curr_gpu_arch] = []
-                for models in value[mi_index]["gpu_archs"][0]["models"]:
-                    gpu_model_dict[curr_gpu_arch].append(models["gpu_model"])
-        elif mi_series == MI_CONSTANS[MI300]:
-            # MI300 requires specific processing
-            for key, value in yaml_data.items():
-                mi300_gpu_archs_list = []
-                # NOTE: only MI300 have multiple architectures
-                for archs in value[MI300]["gpu_archs"]:
-                    curr_gpu_arch = archs["gpu_arch"]
-                    mi300_gpu_archs_list.append(curr_gpu_arch)
-                    gpu_series_dict[curr_gpu_arch] = mi_series
-
-                for idx, arch in enumerate(mi300_gpu_archs_list):
-                    mi300_models_dict[arch] = []
-                    for models in value[MI300]["gpu_archs"][idx]["models"]:
-                        gpu_model = models["gpu_model"]
-
-                        # 1. Parse compute partition. NOTE: compute partition mode num xcds is available for all mi300 gpu models
-                        mi300_num_xcds_dict[gpu_model] = models["partition_mode"][
-                            "compute_partition_mode"
-                        ]["num_xcds"]
-
-                        # 2. Parse memory_partition. NOTE: memory partition mode nps is available for all mi300 gpu models
-                        mi300_nps_dict[gpu_model] = models["partition_mode"][
-                            "memory_partition_mode"
-                        ]
-
-                        # 3. Parse chip id (physical and virtual).
-                        if models["chip_ids"]["physical"]:
-                            # save chip_id, gpu_model pair if chip id is available
-                            # NOTE: chip id is available for all gfx942 machines
-                            mi300_chip_id_dict[models["chip_ids"]["physical"]] = models[
-                                "gpu_model"
-                            ]
-
-                        if models["chip_ids"]["virtual"]:
-                            # save chip_id, gpu_model pair if chip id is available
-                            # NOTE: chip id is available for all gfx942 machines
-                            mi300_chip_id_dict[models["chip_ids"]["virtual"]] = models[
-                                "gpu_model"
-                            ]
-
-                        mi300_models_dict[arch].append(gpu_model)
-
-    gpu_model_dict.update(mi300_models_dict)
+    for series in yaml_data["mi_gpu_spec"]:
+        curr_gpu_series = series["gpu_series"]
+        console_debug("[parse_mi_gpu_spec] Processing series: %s" % curr_gpu_series)
+        for archs in series["gpu_archs"]:
+            curr_gpu_arch = archs["gpu_arch"]
+            gpu_series_dict[curr_gpu_arch] = curr_gpu_series
+            gpu_model_dict[curr_gpu_arch] = []
+            for models in archs["models"]:
+                curr_gpu_model = models["gpu_model"]
+                gpu_model_dict[curr_gpu_arch].append(curr_gpu_model)
+                num_xcds_dict[curr_gpu_model] = (
+                    models.get("partition_mode", {})
+                    .get("compute_partition_mode", {})
+                    .get("num_xcds", {})
+                )
+                if "chip_ids" in models and "physical" in models["chip_ids"]:
+                    chip_id_dict[models["chip_ids"]["physical"]] = curr_gpu_model
+                if "chip_ids" in models and "virtual" in models["chip_ids"]:
+                    chip_id_dict[models["chip_ids"]["virtual"]] = curr_gpu_model


 def get_gpu_series_dict():
@@ -164,9 +132,9 @@ def get_gpu_model(gpu_arch_, chip_id_):
    gpu_arch_lower = gpu_arch_.lower()

    # Handle gfx942 with chip_id mapping
-    if gpu_arch_lower == "gfx942":
-        if chip_id_ and int(chip_id_) in mi300_chip_id_dict:
-            gpu_model = mi300_chip_id_dict.get(int(chip_id_))
+    if gpu_arch_lower not in ("gfx906", "gfx908", "gfx90a"):
+        if chip_id_ and int(chip_id_) in chip_id_dict:
+            gpu_model = chip_id_dict.get(int(chip_id_))
        else:
            console_warning(f"No gpu model found for chip id: {chip_id_}")
            return None
@@ -186,8 +154,12 @@ def get_gpu_model(gpu_arch_, chip_id_):
    return gpu_model.upper()


-def get_mi300_num_xcds(gpu_model_, compute_partition_):
-    if not mi300_num_xcds_dict:
+def get_num_xcds(gpu_model_, compute_partition_):
+    # Only GPUs in the MI300 series and above have more than one XCD
+    if gpu_model_.lower() in ("mi50", "mi60", "mi100", "mi210", "mi250", "mi250x"):
+        return 1
+
+    if not num_xcds_dict:
        console_error(
            "mi300_num_xcds_dict not yet populated, did you run parse_mi_gpu_spec()?"
        )
@@ -196,10 +168,10 @@ def get_mi300_num_xcds(gpu_model_, compute_partition_):
    gpu_model_lower = gpu_model_.lower()
    partition_lower = compute_partition_.lower()

-    if gpu_model_lower not in mi300_num_xcds_dict:
+    if gpu_model_lower not in num_xcds_dict:
        return None

-    model_dict = mi300_num_xcds_dict[gpu_model_lower]
+    model_dict = num_xcds_dict[gpu_model_lower]
    if partition_lower not in model_dict:
        console_log(f"Unknown compute partition: {compute_partition_}")
        return None
@@ -214,9 +186,9 @@ def get_mi300_num_xcds(gpu_model_, compute_partition_):
    return num_xcds


-def get_mi300_chip_id_dict():
-    if mi300_chip_id_dict:
-        return mi300_chip_id_dict
+def get_chip_id_dict():
+    if chip_id_dict:
+        return chip_id_dict
    else:
        console_error(
            "mi300_chip_id_dict not yet populated, did you run parse_mi_gpu_spec()?"
@@ -9,11 +9,11 @@
 # MI GPUs
 #   |-- series: the specific MI series; mi50, mi100, mi200, mi300
 #       |-- architecture: currently, only mi300 gpus hold different architectures
-#           |-- models
-#               |-- chip_ids: chip id is specific to the environment the gpu is being used on
-#               |-- partition_mode: currently, only mi300 gpus hold partition mode information
-#                                   two types: compute partition mode, memory partition mode,
-#                                   currently only mi300 gpus contains compute partition mode information on number of xcds
+#           |-- gpu model
+#           |-- chip_ids: chip id is specific to the environment the gpu is being used on
+#           |-- partition_mode
+#               |-- compute partition mode
+#               |-- memory partition mode
 #
 # --------------------------------------------------------------------------------

@@ -23,45 +23,31 @@ mi_gpu_spec:
      - gpu_arch: gfx906
        models:
          - gpu_model: mi50
-            partition_mode: null
-            chip_ids:
-              physical: null
-              virtual: null
          - gpu_model: mi60
-            partition_mode: null
-            chip_ids:
-              physical: null
-              virtual: null

  - gpu_series: mi100
    gpu_archs:
      - gpu_arch: gfx908
        models:
          - gpu_model: mi100
-            partition_mode: null
            chip_ids:
              physical: 29580
-              virtual: null

  - gpu_series: mi200
    gpu_archs:
      - gpu_arch: gfx90a
        models:
          - gpu_model: mi210
-            partition_mode: null
            chip_ids:
              physical: 29711
-              virtual: null
          - gpu_model: mi250
-            partition_mode: null
            chip_ids:
              physical: 29708
-              virtual: null
          - gpu_model: mi250x
-            partition_mode: null
            chip_ids:
              physical: 29704
-              virtual: null
+          - gpu_model: mi250
+          - gpu_model: mi250x

  - gpu_series: mi300
    gpu_archs:
@@ -72,16 +58,10 @@ mi_gpu_spec:
              compute_partition_mode:
                num_xcds:
                  spx: 6
-                  dpx: null
                  tpx: 2
-                  qpx: null
-                  cpx: null
              memory_partition_mode:
                nps4: [tpx]
                nps1: [spx, tpx]
-            chip_ids:
-              physical: null
-              virtual: null

      - gpu_arch: gfx941
        models:
@@ -91,15 +71,11 @@ mi_gpu_spec:
                num_xcds:
                  spx: 8
                  dpx: 4
-                  tpx: null
                  qpx: 2
                  cpx: 1
              memory_partition_mode:
                nps4: [qpx, cpx]
                nps1: [spx, qpx, cpx]
-            chip_ids:
-              physical: null
-              virtual: null

      - gpu_arch: gfx942
        models:
@@ -108,10 +84,7 @@ mi_gpu_spec:
              compute_partition_mode:
                num_xcds:
                  spx: 6
-                  dpx: null
                  tpx: 2
-                  qpx: null
-                  cpx: null
              memory_partition_mode:
                nps4: [tpx]
                nps1: [spx, tpx]
@@ -125,7 +98,6 @@ mi_gpu_spec:
                num_xcds:
                  spx: 8
                  dpx: 4
-                  tpx: null
                  qpx: 2
                  cpx: 1
              memory_partition_mode:
@@ -141,8 +113,6 @@ mi_gpu_spec:
                num_xcds:
                  spx: 4
                  dpx: 2
-                  tpx: null
-                  qpx: null
                  cpx: 1
              memory_partition_mode:
                nps4: [cpx]
@@ -150,3 +120,21 @@ mi_gpu_spec:
            chip_ids:
              physical: 29858
              virtual: 29878
+
+  - gpu_series: mi350
+    gpu_archs:
+      - gpu_arch: gfx950
+        models:
+          - gpu_model: mi350
+            partition_mode:
+              compute_partition_mode:
+                num_xcds:
+                  spx: 8
+                  dpx: 4
+                  qpx: 2
+                  cpx: 1
+              memory_partition_mode:
+                nps1: [spx, dpx, qpx, cpx]
+                nps4: [qpx, cpx]
+            chip_ids:
+              physical: 30112
@@ -86,6 +86,7 @@ build_in_vars = {
              0) / $max_waves_per_cu) * 8) + MIN(MOD(ROUND(AVG(((4 * SQ_BUSY_CU_CYCLES) \
              / $GRBM_GUI_ACTIVE_PER_XCD)), 0), $max_waves_per_cu), 8)), $cu_per_gpu))",
    "kernelBusyCycles": "ROUND(AVG((((End_Timestamp - Start_Timestamp) / 1000) * $max_sclk)), 0)",
+    "hbmBandwidth": "($max_mclk / 1000 * 32 * $num_hbm_channels)",
 }

 supported_call = {
@@ -700,19 +701,80 @@ def eval_metric(dfs, dfs_type, sys_info, raw_pmc_df, debug):
        console_error("Hauting execution for warning above.")

    ammolite__se_per_gpu = int(sys_info.se_per_gpu)
+    if np.isnan(ammolite__se_per_gpu) or ammolite__se_per_gpu == 0:
+        console_warning(
+            "se_per_gpu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
    ammolite__pipes_per_gpu = int(sys_info.pipes_per_gpu)
+    if np.isnan(ammolite__pipes_per_gpu) or ammolite__pipes_per_gpu == 0:
+        console_warning(
+            "pipes_per_gpu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
    ammolite__cu_per_gpu = int(sys_info.cu_per_gpu)
+    if np.isnan(ammolite__cu_per_gpu) or ammolite__cu_per_gpu == 0:
+        console_warning(
+            "cu_per_gpu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
    ammolite__simd_per_cu = int(sys_info.simd_per_cu)  # not used
+    if np.isnan(ammolite__simd_per_cu) or ammolite__simd_per_cu == 0:
+        console_warning(
+            "simd_per_cu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
    ammolite__sqc_per_gpu = int(sys_info.sqc_per_gpu)
+    if np.isnan(ammolite__sqc_per_gpu) or ammolite__sqc_per_gpu == 0:
+        console_warning(
+            "sqc_per_gpu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
    ammolite__lds_banks_per_cu = int(sys_info.lds_banks_per_cu)
+    if np.isnan(ammolite__lds_banks_per_cu) or ammolite__lds_banks_per_cu == 0:
+        console_warning(
+            "lds_banks_per_cu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
    ammolite__cur_sclk = float(sys_info.cur_sclk)  # not used
-    ammolite__mclk = float(sys_info.cur_mclk)  # not used
+    if np.isnan(ammolite__cur_sclk) or ammolite__cur_sclk == 0:
+        console_warning(
+            "cur_sclk is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
+    ammolite__cur_mclk = float(sys_info.cur_mclk)  # not used
+    if np.isnan(ammolite__cur_mclk) or ammolite__cur_mclk == 0:
+        console_warning(
+            "cur_mclk is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
+    ammolite__max_mclk = float(sys_info.max_mclk)
+    if np.isnan(ammolite__max_mclk) or ammolite__max_mclk == 0:
+        console_warning(
+            "max_mclk is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
    ammolite__max_sclk = float(sys_info.max_sclk)
+    if np.isnan(ammolite__max_sclk) or ammolite__max_sclk == 0:
+        console_warning(
+            "max_sclk is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
    ammolite__max_waves_per_cu = int(sys_info.max_waves_per_cu)
-    ammolite__hbm_bw = float(sys_info.hbm_bw)
+    if np.isnan(ammolite__max_waves_per_cu) or ammolite__max_waves_per_cu == 0:
+        console_warning(
+            "max_waves_per_cu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
+    ammolite__num_hbm_channels = float(sys_info.num_hbm_channels)
+    if np.isnan(ammolite__num_hbm_channels) or ammolite__num_hbm_channels == 0:
+        console_warning(
+            "num_hbm_channels is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
    ammolite__total_l2_chan = calc_builtin_var("$total_l2_chan", sys_info)
+    if np.isnan(ammolite__total_l2_chan) or ammolite__total_l2_chan == 0:
+        console_warning(
+            "total_l2_chan is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
    ammolite__num_xcd = int(sys_info.num_xcd)
+    if np.isnan(ammolite__num_xcd) or ammolite__num_xcd == 0:
+        console_warning(
+            "num_xcd is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )
    ammolite__wave_size = int(sys_info.wave_size)
+    if np.isnan(ammolite__wave_size) or ammolite__wave_size == 0:
+        console_warning(
+            "wave_size is not available in sysinfo.csv, please provide the correct value using --specs-correction"
+        )

    # TODO: fix all $normUnit in Unit column or title

@@ -751,6 +813,7 @@ def eval_metric(dfs, dfs_type, sys_info, raw_pmc_df, debug):
                ammolite__build_in[key] = None
    ammolite__numActiveCUs = ammolite__build_in["numActiveCUs"]
    ammolite__kernelBusyCycles = ammolite__build_in["kernelBusyCycles"]
+    ammolite__hbmBandwidth = ammolite__build_in["hbmBandwidth"]

    # Hmmm... apply + lambda should just work
    # df['Value'] = df['Value'].apply(lambda s: eval(compile(str(s), '<string>', 'eval')))
@@ -821,7 +884,6 @@ def eval_metric(dfs, dfs_type, sys_info, raw_pmc_df, debug):
                                        else:
                                            console_error("analysis", str(ae))

-                                # print("eval_metric", id, expr)
                                try:
                                    out = eval(compile(row[expr], "<string>", "eval"))

@@ -39,9 +39,9 @@ import pandas as pd

 import config
 from utils.logger import console_debug, console_error, console_log, console_warning
-from utils.mi_gpu_spec import get_gpu_series_dict, get_mi300_chip_id_dict
+from utils.mi_gpu_spec import get_chip_id_dict, get_gpu_series_dict, get_num_xcds
 from utils.tty import get_table_string
-from utils.utils import get_version, total_xcds
+from utils.utils import get_version

 VERSION_LOC = [
    "version",
@@ -72,7 +72,6 @@ def detect_arch(_rocminfo):

 def detect_gpu_chip_id(_rocminfo):
    gpu_chip_id = None
-    mi300_chip_id_dict = get_mi300_chip_id_dict().keys()

    for idx1, linetext in enumerate(_rocminfo):
        # NOTE: current supported socs only have numbers in Chip ID
@@ -84,8 +83,8 @@ def detect_gpu_chip_id(_rocminfo):
    if not gpu_chip_id:
        console_warning("No Chip ID detected: " + str(gpu_chip_id))
    elif (
-        gpu_chip_id not in mi300_chip_id_dict
-        and int(gpu_chip_id) not in mi300_chip_id_dict
+        gpu_chip_id not in get_chip_id_dict().keys()
+        and int(gpu_chip_id) not in get_chip_id_dict().keys()
    ):
        console_warning("Unknown Chip ID detected: " + str(gpu_chip_id))
    return gpu_chip_id
@@ -214,7 +213,7 @@ def generate_machine_specs(args, sysinfo: dict = None):
    specs.total_l2_chan: str = total_l2_banks(
        specs.gpu_model, int(specs._l2_banks), specs.compute_partition
    )
-    specs.hbm_bw: str = str(int(specs.max_mclk) / 1000 * 32 * specs.get_hbm_channels())
+    specs.num_hbm_channels: str = str(specs.get_hbm_channels())
    return specs


@@ -518,15 +517,6 @@ class MachineSpecs:
            "name": "Pipes per GPU",
        },
    )
-    hbm_bw: str = field(
-        default=None,
-        metadata={
-            "doc": "The peak theoretical HBM bandwidth for the accelerators/GPUs in the system. On systems with\n"
-            "configurable partitioning, (e.g., MI300) this is the peak theoretical HBM bandwidth for a partition.",
-            "name": "HBM BW",
-            "unit": "GB/s",
-        },
-    )
    num_xcd: str = field(
        default=None,
        metadata={
@@ -536,14 +526,13 @@ class MachineSpecs:
            "unit": "XCDs",
        },
    )
+    num_hbm_channels: str = field(
+        default=None,
+        metadata={"doc": "Number of HBM channels", "name": "HBM channels"},
+    )

    def get_hbm_channels(self):
-        # check MI300 has a valid compute partition
-        mi300a_archs = ["mi300a_a0", "mi300a_a1"]
-        mi300x_archs = ["mi300x_a0", "mi300x_a1"]
-        mi308x_archs = ["mi308x"]
-
-        if self.gpu_model.lower() in mi300a_archs + mi300x_archs + mi308x_archs:
+        if self.memory_partition.lower().startswith("nps"):
            hbmchannels = 128
            if self.memory_partition.lower() == "nps2":
                hbmchannels /= 2
@@ -551,10 +540,9 @@ class MachineSpecs:
                hbmchannels /= 4
            elif self.memory_partition.lower() == "nps8":
                hbmchannels /= 8
-            return int(hbmchannels)
+            return hbmchannels
        else:
-            hbmchannels = int(self.total_l2_chan)
-        return hbmchannels
+            return int(self.total_l2_chan)

    def get_class_members(self):
        all_populated = True
@@ -581,7 +569,7 @@ class MachineSpecs:
                data[name] = value

        if not all_populated:
-            console_error("Missing specs fields for %s" % self.gpu_arch)
+            console_warning("Missing specs fields for %s" % self.gpu_arch)
        return pd.DataFrame(data, index=[0])

    def __repr__(self):
@@ -682,7 +670,7 @@ def total_sqc(archname, numCUs, numSEs):


 def total_l2_banks(archname, L2Banks, compute_partition):
-    xcds = total_xcds(archname, compute_partition)
+    xcds = get_num_xcds(archname, compute_partition)
    totalL2Banks = L2Banks * xcds
    return totalL2Banks

@@ -43,16 +43,32 @@ import pandas as pd

 import config
 from utils.logger import console_debug, console_error, console_log, console_warning
-from utils.mi_gpu_spec import get_mi300_num_xcds
+from utils.mi_gpu_spec import get_num_xcds

 rocprof_cmd = ""
 rocprof_args = ""
+spi_pipe_counter_regexs = [r"SPI_CS\d+_(.*)", r"SPI_CSQ_P\d+_(.*)"]


 def is_tcc_channel_counter(counter):
    return counter.startswith("TCC") and counter.endswith("]")


+def is_spi_pipe_counter(counter):
+    for pattern in spi_pipe_counter_regexs:
+        if re.match(pattern, counter):
+            return True
+    return False
+
+
+def get_base_spi_pipe_counter(counter):
+    for pattern in spi_pipe_counter_regexs:
+        match = re.match(pattern, counter)
+        if match:
+            return match.group(1)
+    return ""
+
+
 def using_v1():

    return "ROCPROF" not in os.environ.keys() or (
@@ -571,12 +587,7 @@ def run_prof(

    # set required env var for mi300
    new_env = None
-    if (
-        mspec.gpu_model.lower() == "mi300x_a0"
-        or mspec.gpu_model.lower() == "mi300x_a1"
-        or mspec.gpu_model.lower() == "mi300a_a0"
-        or mspec.gpu_model.lower() == "mi300a_a1"
-    ):
+    if mspec.gpu_model.lower() not in (
+        "mi50", "mi60", "mi100", "mi210", "mi250", "mi250x"
+    ):
        new_env = os.environ.copy()
        new_env["ROCPROFILER_INDIVIDUAL_XCC_MODE"] = "1"

@@ -661,7 +672,7 @@ def run_prof(
    if new_env and not using_v3() and not using_v1():
        # flatten tcc for applicable mi300 input
        f = path(workload_dir + "/out/pmc_1/results_" + fbase + ".csv")
-        xcds = total_xcds(mspec.gpu_model, mspec.compute_partition)
+        xcds = get_num_xcds(mspec.gpu_model, mspec.compute_partition)
        df = flatten_tcc_info_across_xcds(f, xcds, int(mspec._l2_banks))
        df.to_csv(f, index=False)

@@ -1065,62 +1076,6 @@ def flatten_tcc_info_across_xcds(file, xcds, tcc_channel_per_xcd):
    return df


-def total_xcds(gpu_model, compute_partition):
-    """
-    Returns the number of xcds for a gpu model and compute_partition pair.
-    """
-
-    # For mi300 chips, return result from mi_gpu_spec
-    result = get_mi300_num_xcds(gpu_model, compute_partition)
-    if result:
-        return result
-
-    # For other systems, use manual check
-    # check MI300 has a valid compute partition
-    mi300a_model = ["mi300a_a0", "mi300a_a1"]
-    mi300x_model = ["mi300x_a0", "mi300x_a1"]
-    mi308x_model = ["mi308x"]
-    if (
-        gpu_model.lower() in mi300a_model + mi300x_model + mi308x_model
-        and compute_partition == "NA"
-    ):
-        console_error("Invalid compute partition found for {}".format(gpu_model))
-
-    if gpu_model.lower() not in mi300a_model + mi300x_model + mi308x_model:
-        return 1
-    # from the whitepaper
-    # https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf
-    if compute_partition.lower() == "spx":
-        if gpu_model.lower() in mi300a_model:
-            return 6
-        if gpu_model.lower() in mi300x_model:
-            return 8
-        if gpu_model.lower() in mi308x_model:
-            return 4
-    if compute_partition.lower() == "tpx":
-        if gpu_model.lower() in mi300a_model:
-            return 2
-    if compute_partition.lower() == "dpx":
-        if gpu_model.lower() in mi300x_model:
-            return 4
-        if gpu_model.lower() in mi308x_model:
-            return 2
-    if compute_partition.lower() == "qpx":
-        if gpu_model.lower() in mi300x_model:
-            return 2
-    if compute_partition.lower() == "cpx":
-        if gpu_model.lower() in mi300x_model:
-            return 1
-        if gpu_model.lower() in mi308x_model:
-            return 1
-    # TODO implement other archs here as needed
-    console_error(
-        "Unknown compute partition / arch found for {} / {}".format(
-            compute_partition, gpu_model
-        )
-    )
-
-
 def get_submodules(package_name):
    """List all submodules for a target package"""
    import importlib
@@ -136,7 +136,7 @@ def test_L1_cache_counters(
            options,
            check_success=False,
            roof=False,
-            app_name=app_name
+            app_name=app_name,
        )
        assert return_code == 0

@@ -15,6 +15,7 @@ indirs = [
    "tests/workloads/vcopy/MI200",
    "tests/workloads/vcopy/MI300A_A1",
    "tests/workloads/vcopy/MI300X_A1",
+    "tests/workloads/vcopy/MI350",
 ]


@@ -255,9 +256,13 @@ def test_dispatch_5(binary_handler_analyze_rocprof_compute):
@pytest.mark.misc
 def test_gpu_ids(binary_handler_analyze_rocprof_compute):
    for dir in indirs:
+        if dir.endswith("MI350"):
+            gpu_id = "0"
+        else:
+            gpu_id = "2"
        workload_dir = test_utils.setup_workload_dir(dir)
        code = binary_handler_analyze_rocprof_compute(
-            ["analyze", "--path", workload_dir, "--gpu-id", "2"]
+            ["analyze", "--path", workload_dir, "--gpu-id", gpu_id]
        )
        assert code == 0

@@ -112,6 +112,13 @@ def test_analyze_ipblocks_TCC_MI200(binary_handler_analyze_rocprof_compute):
    assert code == 0


+def test_analyze_no_roof_MI350(binary_handler_analyze_rocprof_compute):
+    code = binary_handler_analyze_rocprof_compute(
+        ["analyze", "--path", "tests/workloads/no_roof/MI350"]
+    )
+    assert code == 0
+
+
 def test_analyze_no_roof_MI300X_A1(binary_handler_analyze_rocprof_compute):
    code = binary_handler_analyze_rocprof_compute(
        ["analyze", "--path", "tests/workloads/no_roof/MI300X_A1"]
@@ -14,6 +14,7 @@ import test_utils

 # Globals

+# TODO: Determine which GPU models belong to the MI350 series
 SUPPORTED_ARCHS = {
    "gfx906": {"mi50": ["MI50", "MI60"]},
    "gfx908": {"mi100": ["MI100"]},
@@ -21,12 +22,14 @@ SUPPORTED_ARCHS = {
    "gfx940": {"mi300": ["MI300A_A0"]},
    "gfx941": {"mi300": ["MI300X_A0"]},
    "gfx942": {"mi300": ["MI300A_A1", "MI300X_A1"]},
+    "gfx950": {"mi350": ["MI350"]},
 }

-MI300_CHIP_IDS = {
+CHIP_IDS = {
    "29856": "MI300A_A1",
    "29857": "MI300X_A1",
    "29858": "MI308X",
+    "30112": "MI350",
 }


@@ -106,6 +109,25 @@ ALL_CSVS_MI300 = sorted(
        "timestamps.csv",
    ]
 )
+ALL_CSVS_MI350 = sorted(
+    [
+        "SQ_IFETCH_LEVEL.csv",
+        "SQ_INST_LEVEL_LDS.csv",
+        "SQ_INST_LEVEL_SMEM.csv",
+        "SQ_INST_LEVEL_VMEM.csv",
+        "SQ_LEVEL_WAVES.csv",
+        "pmc_perf.csv",
+        "pmc_perf_0.csv",
+        "pmc_perf_1.csv",
+        "pmc_perf_2.csv",
+        "pmc_perf_3.csv",
+        "pmc_perf_4.csv",
+        "pmc_perf_5.csv",
+        "pmc_perf_6.csv",
+        "pmc_perf_7.csv",
+        "sysinfo.csv",
+    ]
+)

 ROOF_ONLY_FILES = sorted(
    [
@@ -290,9 +312,9 @@ def gpu_soc():

    ## 3) Deduce gpu model name from arch
    gpu_model = list(SUPPORTED_ARCHS[gpu_arch].keys())[0].upper()
-    if gpu_model == "MI300":
-        if chip_id in MI300_CHIP_IDS:
-            gpu_model = MI300_CHIP_IDS[chip_id]
+    if gpu_model not in ("MI50", "MI100", "MI200"):
+        if chip_id in CHIP_IDS:
+            gpu_model = CHIP_IDS[chip_id]

    return gpu_model

@@ -303,6 +325,9 @@ soc = gpu_soc()
 if "MI300" in soc:
    os.environ["ROCPROF"] = "rocprofv2"

+if "MI350" in soc:
+    os.environ["ROCPROF"] = "rocprofv3"
+
 Baseline_dir = str(Path("tests/workloads/vcopy/" + soc).resolve())


@@ -491,6 +516,8 @@ def test_path(binary_handler_profile_rocprof_compute):
        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI200)
    elif "MI300" in soc:
        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI300)
+    elif "MI350" in soc:
+        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
    else:
        print("This test is not supported for {}".format(soc))
        assert 0
@@ -502,7 +529,7 @@ def test_path(binary_handler_profile_rocprof_compute):

@pytest.mark.misc
 def test_roof_kernel_names(binary_handler_profile_rocprof_compute):
-    if soc == "MI100":
+    if soc in ("MI100", "MI350"):
        # roofline is not supported on MI100
        assert True
        # Do not continue testing
@@ -517,7 +544,7 @@ def test_roof_kernel_names(binary_handler_profile_rocprof_compute):
    # assert successful run
    assert returncode == 0
    file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels)
-    if soc == "MI200" or "MI300" in soc:
+    if "MI200" in soc or "MI300" in soc:
        assert sorted(list(file_dict.keys())) == sorted(
            ROOF_ONLY_FILES + ["kernelName_legend.pdf"]
        )
@@ -546,6 +573,8 @@ def test_device_filter(binary_handler_profile_rocprof_compute):
        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI200)
    elif "MI300" in soc:
        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI300)
+    elif "MI350" in soc:
+        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
    else:
        print("Testing isn't supported yet for {}".format(soc))
        assert 0
@@ -574,6 +603,8 @@ def test_kernel(binary_handler_profile_rocprof_compute):
        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI200)
    elif "MI300" in soc:
        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI300)
+    elif "MI350" in soc:
+        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
    else:
        print("Testing isn't supported yet for {}".format(soc))
        assert 0
@@ -625,6 +656,24 @@ def test_block_SQ(binary_handler_profile_rocprof_compute):
            "sysinfo.csv",
            "timestamps.csv",
        ]
+    if soc == "MI350":
+        expected_csvs = [
+            "SQ_IFETCH_LEVEL.csv",
+            "SQ_INST_LEVEL_LDS.csv",
+            "SQ_INST_LEVEL_SMEM.csv",
+            "SQ_INST_LEVEL_VMEM.csv",
+            "SQ_LEVEL_WAVES.csv",
+            "pmc_perf.csv",
+            "pmc_perf_0.csv",
+            "pmc_perf_1.csv",
+            "pmc_perf_2.csv",
+            "pmc_perf_3.csv",
+            "pmc_perf_4.csv",
+            "pmc_perf_5.csv",
+            "pmc_perf_6.csv",
+            "pmc_perf_7.csv",
+            "sysinfo.csv",
+        ]

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -652,6 +701,8 @@ def test_block_SQC(binary_handler_profile_rocprof_compute):
        "sysinfo.csv",
        "timestamps.csv",
    ]
+    if soc == "MI350":
+        expected_csvs.remove("timestamps.csv")

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -684,6 +735,8 @@ def test_block_TA(binary_handler_profile_rocprof_compute):
        "sysinfo.csv",
        "timestamps.csv",
    ]
+    if soc == "MI350":
+        expected_csvs.remove("timestamps.csv")

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -721,6 +774,15 @@ def test_block_TD(binary_handler_profile_rocprof_compute):
            "sysinfo.csv",
            "timestamps.csv",
        ]
+    if soc == "MI350":
+        expected_csvs = [
+            "pmc_perf.csv",
+            "pmc_perf_0.csv",
+            "pmc_perf_1.csv",
+            "pmc_perf_2.csv",
+            "pmc_perf_3.csv",
+            "sysinfo.csv",
+        ]

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -771,6 +833,8 @@ def test_block_TCP(binary_handler_profile_rocprof_compute):
            "sysinfo.csv",
            "timestamps.csv",
        ]
+    if soc == "MI350":
+        expected_csvs.remove("timestamps.csv")

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -825,6 +889,8 @@ def test_block_TCC(binary_handler_profile_rocprof_compute):
            "sysinfo.csv",
            "timestamps.csv",
        ]
+    if soc == "MI350":
+        expected_csvs.remove("timestamps.csv")

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -857,6 +923,23 @@ def test_block_SPI(binary_handler_profile_rocprof_compute):
        "sysinfo.csv",
        "timestamps.csv",
    ]
+    if soc == "MI350":
+        expected_csvs = [
+            "pmc_perf.csv",
+            "pmc_perf_0.csv",
+            "pmc_perf_1.csv",
+            "pmc_perf_2.csv",
+            "pmc_perf_3.csv",
+            "pmc_perf_4.csv",
+            "pmc_perf_5.csv",
+            "pmc_perf_6.csv",
+            "pmc_perf_7.csv",
+            "pmc_perf_8.csv",
+            "pmc_perf_9.csv",
+            "pmc_perf_10.csv",
+            "pmc_perf_11.csv",
+            "sysinfo.csv",
+        ]

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -886,6 +969,19 @@ def test_block_CPC(binary_handler_profile_rocprof_compute):
        "sysinfo.csv",
        "timestamps.csv",
    ]
+    if soc == "MI350":
+        expected_csvs = [
+            "pmc_perf.csv",
+            "pmc_perf_0.csv",
+            "pmc_perf_1.csv",
+            "pmc_perf_2.csv",
+            "pmc_perf_3.csv",
+            "pmc_perf_4.csv",
+            "pmc_perf_5.csv",
+            "pmc_perf_6.csv",
+            "pmc_perf_7.csv",
+            "sysinfo.csv",
+        ]

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -910,6 +1006,8 @@ def test_block_CPF(binary_handler_profile_rocprof_compute):
        "sysinfo.csv",
        "timestamps.csv",
    ]
+    if soc == "MI350":
+        expected_csvs.remove("timestamps.csv")
    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

    validate(
@@ -959,6 +1057,24 @@ def test_block_SQ_CPC(binary_handler_profile_rocprof_compute):
            "sysinfo.csv",
            "timestamps.csv",
        ]
+    if soc == "MI350":
+        expected_csvs = [
+            "SQ_IFETCH_LEVEL.csv",
+            "SQ_INST_LEVEL_LDS.csv",
+            "SQ_INST_LEVEL_SMEM.csv",
+            "SQ_INST_LEVEL_VMEM.csv",
+            "SQ_LEVEL_WAVES.csv",
+            "pmc_perf.csv",
+            "pmc_perf_0.csv",
+            "pmc_perf_1.csv",
+            "pmc_perf_2.csv",
+            "pmc_perf_3.csv",
+            "pmc_perf_4.csv",
+            "pmc_perf_5.csv",
+            "pmc_perf_6.csv",
+            "pmc_perf_7.csv",
+            "sysinfo.csv",
+        ]

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -1009,6 +1125,24 @@ def test_block_SQ_TA(binary_handler_profile_rocprof_compute):
            "sysinfo.csv",
            "timestamps.csv",
        ]
+    if soc == "MI350":
+        expected_csvs = [
+            "SQ_IFETCH_LEVEL.csv",
+            "SQ_INST_LEVEL_LDS.csv",
+            "SQ_INST_LEVEL_SMEM.csv",
+            "SQ_INST_LEVEL_VMEM.csv",
+            "SQ_LEVEL_WAVES.csv",
+            "pmc_perf.csv",
+            "pmc_perf_0.csv",
+            "pmc_perf_1.csv",
+            "pmc_perf_2.csv",
+            "pmc_perf_3.csv",
+            "pmc_perf_4.csv",
+            "pmc_perf_5.csv",
+            "pmc_perf_6.csv",
+            "pmc_perf_7.csv",
+            "sysinfo.csv",
+        ]

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -1055,6 +1189,24 @@ def test_block_SQ_SPI(binary_handler_profile_rocprof_compute):
            "sysinfo.csv",
            "timestamps.csv",
        ]
+    if soc == "MI350":
+        expected_csvs = [
+            "SQ_IFETCH_LEVEL.csv",
+            "SQ_INST_LEVEL_LDS.csv",
+            "SQ_INST_LEVEL_SMEM.csv",
+            "SQ_INST_LEVEL_VMEM.csv",
+            "SQ_LEVEL_WAVES.csv",
+            "pmc_perf.csv",
+            "pmc_perf_0.csv",
+            "pmc_perf_1.csv",
+            "pmc_perf_2.csv",
+            "pmc_perf_3.csv",
+            "pmc_perf_4.csv",
+            "pmc_perf_5.csv",
+            "pmc_perf_6.csv",
+            "pmc_perf_7.csv",
+            "sysinfo.csv",
+        ]

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -1106,6 +1258,24 @@ def test_block_SQ_SQC_TCP_CPC(binary_handler_profile_rocprof_compute):
            "sysinfo.csv",
            "timestamps.csv",
        ]
+    if soc == "MI350":
+        expected_csvs = [
+            "SQ_IFETCH_LEVEL.csv",
+            "SQ_INST_LEVEL_LDS.csv",
+            "SQ_INST_LEVEL_SMEM.csv",
+            "SQ_INST_LEVEL_VMEM.csv",
+            "SQ_LEVEL_WAVES.csv",
+            "pmc_perf.csv",
+            "pmc_perf_0.csv",
+            "pmc_perf_1.csv",
+            "pmc_perf_2.csv",
+            "pmc_perf_3.csv",
+            "pmc_perf_4.csv",
+            "pmc_perf_5.csv",
+            "pmc_perf_6.csv",
+            "pmc_perf_7.csv",
+            "sysinfo.csv",
+        ]

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -1171,6 +1341,24 @@ def test_block_SQ_SPI_TA_TCC_CPF(binary_handler_profile_rocprof_compute):
            "sysinfo.csv",
            "timestamps.csv",
        ]
+    if soc == "MI350":
+        expected_csvs = [
+            "SQ_IFETCH_LEVEL.csv",
+            "SQ_INST_LEVEL_LDS.csv",
+            "SQ_INST_LEVEL_SMEM.csv",
+            "SQ_INST_LEVEL_VMEM.csv",
+            "SQ_LEVEL_WAVES.csv",
+            "pmc_perf.csv",
+            "pmc_perf_0.csv",
+            "pmc_perf_1.csv",
+            "pmc_perf_2.csv",
+            "pmc_perf_3.csv",
+            "pmc_perf_4.csv",
+            "pmc_perf_5.csv",
+            "pmc_perf_6.csv",
+            "pmc_perf_7.csv",
+            "sysinfo.csv",
+        ]

    assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -1196,6 +1384,8 @@ def test_dispatch_0(binary_handler_profile_rocprof_compute):
        assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
    elif "MI300" in soc:
        assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
+    elif "MI350" in soc:
+        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
    else:
        print("Testing isn't supported yet for {}".format(soc))
        assert 0
@@ -1226,6 +1416,8 @@ def test_dispatch_0_1(binary_handler_profile_rocprof_compute):
        assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
    elif "MI300" in soc:
        assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
+    elif "MI350" in soc:
+        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
    else:
        print("Testing isn't supported yet for {}".format(soc))
        assert 0
@@ -1253,6 +1445,8 @@ def test_dispatch_2(binary_handler_profile_rocprof_compute):
        assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
    elif "MI300" in soc:
        assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
+    elif "MI350" in soc:
+        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
    else:
        print("Testing isn't supported yet for {}".format(soc))
        assert 0
@@ -1283,6 +1477,8 @@ def test_join_type_grid(binary_handler_profile_rocprof_compute):
        assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
    elif "MI300" in soc:
        assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
+    elif "MI350" in soc:
+        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
    else:
        print("Testing isn't supported yet for {}".format(soc))
        assert 0
@@ -1310,6 +1506,8 @@ def test_join_type_kernel(binary_handler_profile_rocprof_compute):
        assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
    elif "MI300" in soc:
        assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
+    elif "MI350" in soc:
+        assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
    else:
        print("Testing isn't supported yet for {}".format(soc))
        assert 0
@@ -1326,7 +1524,7 @@ def test_join_type_kernel(binary_handler_profile_rocprof_compute):
@pytest.mark.sort
 def test_roof_sort_dispatches(binary_handler_profile_rocprof_compute):
    # only test 1 device for roofline
-    if soc == "MI100":
+    if soc in ("MI100", "MI350"):
        # roofline is not supported on MI100
        assert True
        # Do not continue testing
@@ -1356,7 +1554,7 @@ def test_roof_sort_dispatches(binary_handler_profile_rocprof_compute):
@pytest.mark.sort
 def test_roof_sort_kernels(binary_handler_profile_rocprof_compute):
    # only test 1 device for roofline
-    if soc == "MI100":
+    if soc in ("MI100", "MI350"):
        # roofline is not supported on MI100
        assert True
        # Do not continue testing
@@ -1386,7 +1584,7 @@ def test_roof_sort_kernels(binary_handler_profile_rocprof_compute):
@pytest.mark.mem
 def test_roof_mem_levels_vL1D(binary_handler_profile_rocprof_compute):
    # only test 1 device for roofline
-    if soc == "MI100":
+    if soc in ("MI100", "MI350"):
        # roofline is not supported on MI100
        assert True
        # Do not continue testing
@@ -1416,7 +1614,7 @@ def test_roof_mem_levels_vL1D(binary_handler_profile_rocprof_compute):
@pytest.mark.mem
 def test_roof_mem_levels_LDS(binary_handler_profile_rocprof_compute):
    # only test 1 device for roofline
-    if soc == "MI100":
+    if soc in ("MI100", "MI350"):
        # roofline is not supported on MI100
        assert True
        # Do not continue testing
@@ -1,2 +1,2 @@
-workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
-device_filter,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:54:18 PM  (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1
+workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
+path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:52:12 PM  (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1,32
@@ -1,2 +1,2 @@
-workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
-device_filter,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:35:56 PM  (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1
+workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
+path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:16:46 PM  (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1,32
@@ -0,0 +1,4 @@
+Dispatch_ID,Kernel_Name,GPU_ID
+0,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
+1,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
+2,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
@@ -1,2 +1,2 @@
-workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
-device_filter,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 01:39:25 PM  (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6
+workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
+vcopy,tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:09:51 PM  (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6,96
@@ -0,0 +1,4 @@
+Dispatch_ID,Kernel_Name,GPU_ID
+0,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
+1,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
+2,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
@@ -1,2 +1,2 @@
-workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
-device_filter,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 12:03:10 PM  (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8
+workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
+vcopy,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:19:39 PM  (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8,128
@@ -1,2 +1,2 @@
-workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
-device_inv_int,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:53:52 PM  (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1
+workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
+path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:52:12 PM  (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1,32
@@ -1,2 +1,2 @@
-workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
-device_inv_int,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:33:56 PM  (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1
+workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
+path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:16:46 PM  (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1,32
@@ -0,0 +1,4 @@
+Dispatch_ID,Kernel_Name,GPU_ID
+0,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
+1,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
+2,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
@@ -1,2 +1,2 @@
-workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
-device_inv_int,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 01:38:17 PM  (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6
+workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
+vcopy,tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:09:51 PM  (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6,96
@@ -0,0 +1,4 @@
+Dispatch_ID,Kernel_Name,GPU_ID
+0,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
+1,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
+2,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
@@ -1,2 +1,2 @@
-workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
-device_inv_int,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 12:02:25 PM  (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8
+workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
+vcopy,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:19:39 PM  (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8,128
@@ -1,2 +1,2 @@
-workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
-dispatch_0,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:53:14 PM  (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1
+workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
+path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:52:12 PM  (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1,32
@@ -1,2 +1,2 @@
-workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
-dispatch_0,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:24:01 PM  (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1
+workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
+path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:16:46 PM  (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1,32
@@ -0,0 +1,4 @@
+Dispatch_ID,Kernel_Name,GPU_ID
+0,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
+1,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
+2,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
@@ -1,2 +1,2 @@
-workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
-dispatch_0,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 01:36:42 PM  (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6
+workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
+vcopy,tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:09:51 PM  (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6,96
@@ -0,0 +1,4 @@
+Dispatch_ID,Kernel_Name,GPU_ID
+0,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
+1,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
+2,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
@@ -1,2 +1,2 @@
-workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
-dispatch_0,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 12:01:22 PM  (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8
+workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
+vcopy,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:19:39 PM  (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8,128
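
Review note: the MI350 branches in the `test_block_*` hunks above repeat nearly identical `expected_csvs` lists that differ only in the number of `pmc_perf_N.csv` files and in whether the `SQ_*_LEVEL` files are present. A minimal sketch of how those lists could be generated instead is shown here. The helper name `expected_mi350_csvs` and its parameters are hypothetical and are not part of this commit or of the test suite; the file names are taken from the hunks above.

```python
# Hypothetical helper (not part of the commit): builds the list of CSV files
# that the MI350 test branches above spell out by hand.
SQ_LEVEL_FILES = [
    "SQ_IFETCH_LEVEL.csv",
    "SQ_INST_LEVEL_LDS.csv",
    "SQ_INST_LEVEL_SMEM.csv",
    "SQ_INST_LEVEL_VMEM.csv",
    "SQ_LEVEL_WAVES.csv",
]


def expected_mi350_csvs(num_pmc_files, with_sq_levels=False):
    """Return the sorted CSV names expected for an MI350 profiling run.

    num_pmc_files: how many numbered pmc_perf_<N>.csv files are expected.
    with_sq_levels: include the SQ level-counter CSVs (used by SQ-based tests).
    """
    files = ["pmc_perf.csv", "sysinfo.csv"]
    files += ["pmc_perf_{}.csv".format(i) for i in range(num_pmc_files)]
    if with_sq_levels:
        files += SQ_LEVEL_FILES
    # Note: timestamps.csv is deliberately absent, matching the MI350 lists above.
    return sorted(files)
```

Under these assumptions, the MI350 branch of `test_block_TD` would compare against `expected_mi350_csvs(4)`, `test_block_SPI` against `expected_mi350_csvs(12)`, and the SQ-based tests against `expected_mi350_csvs(8, with_sq_levels=True)`.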