From 89c74ac3d3aa6516aa8be2eadf4dc850b227472b Mon Sep 17 00:00:00 2001
From: "systems-assistant[bot]"
 <221163467+systems-assistant[bot]@users.noreply.github.com>
Date: Wed, 6 Aug 2025 18:39:50 -0400
Subject: [PATCH] Update `Unit` of Bandwidth metrics to Gbps (#96)

* Add Utilization to metric name for Bandwidth related metrics whose Unit
  is Percent

* Update Unit of Bandwidth metrics to Gbps
    * Update metric Formula to use total duration as denominator instead of normalization unit.
    * Update metric Description
    * Update metric Unit

* Update CHANGELOG
---
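Reviewer note, illustrative only: the new bandwidth formulas divide a byte count by `End_Timestamp - Start_Timestamp` instead of by `$denom`. The Python sketch below shows that arithmetic under the assumption that both timestamps are in nanoseconds, which this patch does not state. The helper name `bandwidth` and the sample counter values are hypothetical. Under that assumption the result is numerically bytes per nanosecond, which the configs label `Gbps`.

```python
def bandwidth(total_bytes: float, start_ns: float, end_ns: float) -> float:
    """Return total_bytes divided by the kernel duration.

    Assumes (not confirmed by the patch) that timestamps are in nanoseconds,
    so the returned value is in bytes per nanosecond.
    """
    duration_ns = end_ns - start_ns
    if duration_ns <= 0:
        raise ValueError("end timestamp must be after start timestamp")
    return total_bytes / duration_ns


# Hypothetical sample: 1000 L1I-to-L2 requests of 64 bytes each over 2000 ns,
# mirroring the shape of (SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp).
l1i_l2_bw = bandwidth(1000 * 64, start_ns=0, end_ns=2000)
```

Unlike the old `$denom` form, this value does not change with the selected normalization unit.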
 projects/rocprofiler-compute/CHANGELOG.md     |  20 +
 .../docs/data/metrics_description.yaml        | 168 ++--
 .../gfx908/1200_local_data_share_lds.yaml     |  16 +-
 .../gfx908/1300_instruction_cache.yaml        |  27 +-
 .../gfx908/1400_scalar_l1_data_cache.yaml     |  23 +-
 .../gfx908/1600_vector_l1_data_cache.yaml     |  42 +-
 .../gfx908/1700_l2_cache.yaml                 |  60 +-
 .../gfx90a/1200_local_data_share_lds.yaml     |  16 +-
 .../gfx90a/1300_instruction_cache.yaml        |  27 +-
 .../gfx90a/1400_scalar_l1_data_cache.yaml     |  23 +-
 .../gfx90a/1600_vector_l1_data_cache.yaml     |  42 +-
 .../gfx90a/1700_l2_cache.yaml                 |  60 +-
 .../gfx940/1200_local_data_share_lds.yaml     |  16 +-
 .../gfx940/1300_instruction_cache.yaml        |  27 +-
 .../gfx940/1400_scalar_l1_data_cache.yaml     |  23 +-
 .../gfx940/1600_vector_l1_data_cache.yaml     |  42 +-
 .../gfx940/1700_l2_cache.yaml                 |  60 +-
 .../gfx941/1200_local_data_share_lds.yaml     |  16 +-
 .../gfx941/1300_instruction_cache.yaml        |  27 +-
 .../gfx941/1400_scalar_l1_data_cache.yaml     |  23 +-
 .../gfx941/1600_vector_l1_data_cache.yaml     |  42 +-
 .../gfx941/1700_l2_cache.yaml                 |  60 +-
 .../gfx942/1200_local_data_share_lds.yaml     |  16 +-
 .../gfx942/1300_instruction_cache.yaml        |  27 +-
 .../gfx942/1400_scalar_l1_data_cache.yaml     |  23 +-
 .../gfx942/1600_vector_l1_data_cache.yaml     |  42 +-
 .../gfx942/1700_l2_cache.yaml                 |  63 +-
 .../gfx950/1200_local_data_share_lds.yaml     |  16 +-
 .../gfx950/1300_instruction_cache.yaml        |  27 +-
 .../gfx950/1400_scalar_l1_data_cache.yaml     |  23 +-
 .../gfx950/1600_vector_l1_data_cache.yaml     |  42 +-
 .../gfx950/1700_l2_cache.yaml                 | 156 ++--
 .../utils/autogen_hash.yaml                   |  62 +-
 .../utils/unified_config.yaml                 | 719 +++++++++---------
 34 files changed, 1088 insertions(+), 988 deletions(-)

diff --git a/projects/rocprofiler-compute/CHANGELOG.md b/projects/rocprofiler-compute/CHANGELOG.md
index 9f33653aa6..9c2a3075fe 100644
--- a/projects/rocprofiler-compute/CHANGELOG.md
+++ b/projects/rocprofiler-compute/CHANGELOG.md
@@ -27,6 +27,26 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
 
 * Change the basic view of TUI from aggregated analysis data to individual kernel analysis data
 
+* Update `Unit` of the following `Bandwidth` related metrics to `Gbps` instead of `Bytes per Normalization Unit`
+  * Theoretical Bandwidth (section 1202)
+  * L1I-L2 Bandwidth (section 1303)
+  * sL1D-L2 BW (section 1403)
+  * Cache BW (section 1603)
+  * L1-L2 BW (section 1603)
+  * Read BW (section 1702)
+  * Write and Atomic BW (section 1702)
+  * Bandwidth (section 1703)
+  * Atomic/Read/Write Bandwidth (section 1703)
+  * Atomic/Read/Write Bandwidth - (HBM/PCIe/Infinity Fabric) (section 1706)
+
+* Add `Utilization` to metric name for the following `Bandwidth` related metrics whose `Unit` is `Percent`
+  * Theoretical Bandwidth Utilization (section 1201)
+  * L1I-L2 Bandwidth Utilization (section 1301)
+  * Bandwidth Utilization (section 1301)
+  * Bandwidth Utilization (section 1401)
+  * sL1D-L2 BW Utilization (section 1401)
+  * Bandwidth Utilization (section 1601)
+
 ### Resolved issues
 
 * Fixed not detecting memory clock issue when using amd-smi
diff --git a/projects/rocprofiler-compute/docs/data/metrics_description.yaml b/projects/rocprofiler-compute/docs/data/metrics_description.yaml
index 512518ab65..12eb28816a 100644
--- a/projects/rocprofiler-compute/docs/data/metrics_description.yaml
+++ b/projects/rocprofiler-compute/docs/data/metrics_description.yaml
@@ -397,13 +397,13 @@ LDS Speed-of-Light:
       over the number of LDS cycles that would have been  required to move the same
       amount of data in an uncontended access. [#lds-bank-conflict]_
     unit: Percent
-  Theoretical Bandwidth:
+  Theoretical Bandwidth Utilization:
     rst: Indicates the maximum amount of bytes that could have been loaded from,  stored
-      to, or atomically updated in the LDS per  :ref:`normalization unit <normalization-units>`.
+      to, or atomically updated in the LDS, as a percentage of the theoretical peak.
       Does *not* take into  account the execution mask of the wavefront when the instruction
       was  executed. See the  :ref:`LDS bandwidth example <lds-bandwidth>` for more
       detail.
-    unit: Bytes per normalization unit
+    unit: Percent
   Utilization:
     rst: Indicates what percent of the kernel's duration the :ref:`LDS <desc-lds>`  was
       actively executing instructions (including, but not limited to, load,  store,
@@ -450,17 +450,16 @@ LDS Statistics:
     unit: Accesses per normalization unit
   Theoretical Bandwidth:
     rst: Indicates the maximum amount of bytes that could have been loaded from,  stored
-      to, or atomically updated in the LDS per  :ref:`normalization unit <normalization-units>`.
-      Does *not* take into  account the execution mask of the wavefront when the instruction
-      was  executed. See the  :ref:`LDS bandwidth example <lds-bandwidth>` for more
-      detail.
-    unit: Bytes per normalization unit
+      to, or atomically updated in the LDS divided by total duration. Does *not* take
+      into  account the execution mask of the wavefront when the instruction was  executed.
+      See the  :ref:`LDS bandwidth example <lds-bandwidth>` for more detail.
+    unit: Gbps
   Unaligned Stall:
     rst: The total number of cycles spent in the :ref:`LDS scheduler <desc-lds>`  due
       to stalls from non-dword aligned addresses per  :ref:`normalization unit <normalization-units>`.
     unit: Cycles per normalization unit
 vL1D Speed-of-Light:
-  Bandwidth:
+  Bandwidth Utilization:
     rst: The number of bytes looked up in the vL1D cache as a result of  :ref:`VMEM
       <desc-vmem>` instructions, as a percent of the peak  theoretical bandwidth achievable
       on the specific accelerator. The number  of bytes is calculated as the number
@@ -614,13 +613,13 @@ vL1D cache access metrics:
     rst: The total number of cache line lookups in the vL1D.
     unit: Cache lines
   Cache BW:
-    rst: The number of bytes looked up in the vL1D cache as a result of  :ref:`VMEM
-      <desc-vmem>` instructions per  :ref:`normalization unit <normalization-units>`.  The
-      number of bytes is  calculated as the number of cache lines requested multiplied
-      by the cache  line size.  This value does not consider partial requests, so
-      for  instance, if only a single value is requested in a cache line, the data  movement
-      will still be counted as a full cache line.
-    unit: Bytes per normalization unit
+    rst: The number of bytes looked up in the vL1D cache as a result of :ref:`VMEM
+      <desc-vmem>` instructions divided by total duration. The number of bytes is
+      calculated as the number of cache lines requested multiplied by the cache line
+      size. This value does not consider partial requests, so for  instance, if only
+      a single value is requested in a cache line, the data movement will still be
+      counted as a full cache line.
+    unit: Gbps
   Cache Hit Rate:
     rst: The ratio of the number of vL1D cache line requests that hit in vL1D  cache
       over the total number of cache line requests to the  :ref:`vL1D Cache RAM <desc-tc>`.
@@ -646,12 +645,12 @@ vL1D cache access metrics:
     unit: Requests per normalization unit
   L1-L2 BW:
     rst: The number of bytes transferred across the vL1D-L2 interface as a result  of
-      :ref:`VMEM <desc-vmem>` instructions, per  :ref:`normalization unit <normalization-units>`.
-      The number of bytes is  calculated as the number of cache lines requested multiplied
-      by the cache  line size. This value does not consider partial requests, so for  instance,
+      :ref:`VMEM <desc-vmem>` instructions, divided by total duration. The number
+      of bytes is  calculated as the number of cache lines requested multiplied by
+      the cache  line size. This value does not consider partial requests, so for  instance,
       if only a single value is requested in a cache line, the data  movement will
       still be counted as a full cache line.
-    unit: Bytes per normalization unit
+    unit: Gbps
   L1-L2 Read:
     rst: The number of read requests for a vL1D cache line that were not satisfied  by
       the vL1D and must be retrieved from the to the  :doc:`L2 Cache <l2-cache>` per  :ref:`normalization
@@ -761,20 +760,20 @@ L2 Speed-of-Light:
     unit: Percent
 L2 cache accesses:
   Atomic Bandwidth:
-    rst: Total number of bytes looked up in the L2 cache for atomic requests, per
-      :ref:`normalization unit <normalization-units>`.
-    unit: Bytes per normalization unit
+    rst: Total number of bytes looked up in the L2 cache for atomic requests, divided
+      by total duration.
+    unit: Gbps
   Atomic Req:
     rst: The total number of atomic requests (with and without return) to the L2 from
       all clients.
     unit: Requests per normalization unit
   Bandwidth:
-    rst: The number of bytes looked up in the L2 cache, per  :ref:`normalization unit
-      <normalization-units>`.  The number of bytes is  calculated as the number of
-      cache lines requested multiplied by the cache  line size. This value does not
-      consider partial requests, so for example,  if only a single value is requested
-      in a cache line, the data movement  will still be counted as a full cache line.
-    unit: Bytes per normalization unit
+    rst: The number of bytes looked up in the L2 cache, divided by total duration.
+      The number of bytes is  calculated as the number of cache lines requested multiplied
+      by the cache line size. This value does not consider partial requests, so for
+      example, if only a single value is requested in a cache line, the data movement  will
+      still be counted as a full cache line.
+    unit: Gbps
   CC Req:
     rst: The total number of requests to the L2 that go to Coherently Cacheable (CC)  memory
       allocations. See the :ref:`memory-type` for more information.
@@ -818,9 +817,9 @@ L2 cache accesses:
       allocations. See the :ref:`memory-type` for more information.
     unit: Requests per normalization unit
   Read Bandwidth:
-    rst: Total number of bytes looked up in the L2 cache for read requests, per :ref:`normalization
-      unit <normalization-units>`.
-    unit: Bytes per normalization unit
+    rst: Total number of bytes looked up in the L2 cache for read requests, divided
+      by total duration.
+    unit: Gbps
   Read Req:
     rst: 'The total number of read requests to the L2 from all clients.  '
     unit: Requests per normalization unit
@@ -841,9 +840,9 @@ L2 cache accesses:
       See the :ref:`memory-type` for more information.
     unit: Requests per normalization unit
   Write Bandwidth:
-    rst: Total number of bytes looked up in the L2 cache for write requests, per :ref:`normalization
-      unit <normalization-units>`.
-    unit: Bytes per normalization unit
+    rst: Total number of bytes looked up in the L2 cache for write requests, divided
+      by total duration.
+    unit: Gbps
   Write Req:
     rst: The total number of write requests to the L2 from all clients.
     unit: Requests per normalization unit
@@ -896,9 +895,9 @@ L2-Fabric interface metrics:
       memory <memory-type>` allocations.
     unit: Percent
   Read BW:
-    rst: The total number of bytes read by the L2 cache from Infinity Fabric per  :ref:`normalization
-      unit <normalization-units>`.
-    unit: Bytes per normalization unit
+    rst: The total number of bytes read by the L2 cache from Infinity Fabric divided
+      by total duration.
+    unit: Gbps
   Read Latency:
     rst: The time-averaged number of cycles read requests spent in Infinity Fabric  before
       data was returned to the L2.
@@ -954,12 +953,12 @@ L2-Fabric interface metrics:
     unit: Percent
   Write and Atomic BW:
     rst: The total number of bytes written by the L2 over Infinity Fabric by write  and
-      atomic operations per  :ref:`normalization unit <normalization-units>`. Note
-      that on current  CDNA accelerators, such as the :ref:`MI2XX <mixxx-note>`, requests
-      are  only considered *atomic* by Infinity Fabric if they are targeted at  non-write-cacheable
-      memory, for example,  :ref:`fine-grained memory <memory-type>` allocations or  :ref:`uncached
+      atomic operations divided by total duration. Note that on current  CDNA accelerators,
+      such as the :ref:`MI2XX <mixxx-note>`, requests are  only considered *atomic*
+      by Infinity Fabric if they are targeted at  non-write-cacheable memory, for
+      example,  :ref:`fine-grained memory <memory-type>` allocations or  :ref:`uncached
       memory <memory-type>` allocations on the  MI2XX.
-    unit: Bytes per normalization unit
+    unit: Gbps
   Write and Atomic Latency:
     rst: The time-averaged number of cycles write requests spent in Infinity Fabric
       before a completion acknowledgement was returned to the L2.
@@ -975,17 +974,17 @@ L2 - Fabric interface detailed metrics:
       memory <memory-type>` allocations on the MI2XX.
     unit: Requests per normalization unit
   Atomic Bandwidth - HBM:
-    rst: Total number of bytes due to L2 atomic requests due to HBM traffic, per normalization
-      unit.
-    unit: Bytes per normalization unit
+    rst: Total number of bytes due to L2 atomic requests due to HBM traffic, divided
+      by total duration.
+    unit: Gbps
   "Atomic Bandwidth - Infinity Fabric\u2122":
     rst: Total number of bytes due to L2 atomic requests due to Infinity Fabric traffic,
-      per normalization unit.
-    unit: Bytes per normalization unit
+      divided by total duration.
+    unit: Gbps
   Atomic Bandwidth - PCIe:
-    rst: Total number of bytes due to L2 atomic requests due to PCIe traffic, per
-      normalization unit.
-    unit: Bytes per normalization unit
+    rst: Total number of bytes due to L2 atomic requests due to PCIe traffic, divided
+      by total duration.
+    unit: Gbps
   HBM Read:
     rst: The total number of L2 requests to Infinity Fabric to read 32B or 64B of  data
       from the accelerator's local HBM, per  :ref:`normalization unit <normalization-units>`.
@@ -1013,17 +1012,17 @@ L2 - Fabric interface detailed metrics:
       uncached data requests. See  :ref:`l2-request-flow` for more detail.
     unit: Requests per normalization unit
   Read Bandwidth - HBM:
-    rst: Total number of bytes due to L2 read requests due to HBM traffic, per normalization
-      unit.
-    unit: Bytes per normalization unit
+    rst: Total number of bytes due to L2 read requests due to HBM traffic, divided
+      by total duration.
+    unit: Gbps
   "Read Bandwidth - Infinity Fabric\u2122":
     rst: Total number of bytes due to L2 read requests due to Infinity Fabric traffic,
-      per normalization unit.
-    unit: Bytes per normalization unit
+      divided by total duration.
+    unit: Gbps
   Read Bandwidth - PCIe:
-    rst: Total number of bytes due to L2 read requests due to PCIe traffic, per normalization
-      unit.
-    unit: Bytes per normalization unit
+    rst: Total number of bytes due to L2 read requests due to PCIe traffic, divided
+      by total duration.
+    unit: Gbps
   Remote Read:
     rst: The total number of L2 requests to Infinity Fabric to read 32B or 64B of  data
       from any source other than the accelerator's local HBM, per  :ref:`normalization
@@ -1036,17 +1035,17 @@ L2 - Fabric interface detailed metrics:
       for more detail.
     unit: Requests per normalization unit
   Write Bandwidth - HBM:
-    rst: Total number of bytes due to L2 write requests due to HBM traffic, per normalization
-      unit.
-    unit: Bytes per normalization unit
+    rst: Total number of bytes due to L2 write requests due to HBM traffic, divided
+      by total duration.
+    unit: Gbps
   "Write Bandwidth - Infinity Fabric\u2122":
     rst: Total number of bytes due to L2 write requests due to Infinity Fabric traffic,
-      per normalization unit.
-    unit: Bytes per normalization unit
+      divided by total duration.
+    unit: Gbps
   Write Bandwidth - PCIe:
-    rst: Total number of bytes due to L2 write requests due to PCIe traffic, per normalization
-      unit.
-    unit: Bytes per normalization unit
+    rst: Total number of bytes due to L2 write requests due to PCIe traffic, divided
+      by total duration.
+    unit: Gbps
   Write and Atomic (32B):
     rst: The total number of L2 requests to Infinity Fabric to write or atomically  update
       32B of data to any memory location, per  :ref:`normalization unit <normalization-units>`.
@@ -1098,7 +1097,7 @@ L2 - Fabric Interface stalls:
       of the :ref:`total active L2 cycles <total-active-l2-cycles>`.
     unit: Percent
 Scalar L1D Speed-of-Light:
-  Bandwidth:
+  Bandwidth Utilization:
     rst: The number of bytes looked up in the sL1D cache, as a percent of the peak  theoretical
       bandwidth. Calculated as the ratio of sL1D requests over the  :ref:`total sL1D
       cycles <total-sl1d-cycles>`.
@@ -1108,13 +1107,11 @@ Scalar L1D Speed-of-Light:
       the cache. The ratio of the number of sL1D requests that hit  [#sl1d-cache]_
       over the number of all sL1D requests.
     unit: Percent
-  sL1D-L2 BW:
-    rst: "The total number of bytes read from, written to, or atomically updated \
-      \ across the sL1D\u2194:doc:`L2 <l2-cache>` interface, per  :ref:`normalization\
-      \ unit <normalization-units>`. Note that sL1D writes  and atomics are typically\
-      \ unused on current CDNA accelerators, so in the  majority of cases this can\
-      \ be interpreted as an sL1D\u2192L2 read bandwidth."
-    unit: Bytes per normalization unit
+  sL1D-L2 BW Utilization:
+    rst: The percentage of the peak theoretical sL1D - L2 interface bandwidth achieved.
+      Calculated as total number of bytes read from, written to, or atomically updated
+      across the sL1D - L2 interface.
+    unit: Percent
 Scalar L1D cache accesses:
   Atomic Req:
     rst: The total number of atomic requests from sL1D to the  :doc:`L2 <l2-cache>`,
@@ -1189,13 +1186,13 @@ Scalar L1D Cache - L2 Interface:
     unit: Requests per normalization unit
   sL1D-L2 BW:
     rst: "The total number of bytes read from, written to, or atomically updated \
-      \ across the sL1D\u2194:doc:`L2 <l2-cache>` interface, per  :ref:`normalization\
-      \ unit <normalization-units>`. Note that sL1D writes  and atomics are typically\
-      \ unused on current CDNA accelerators, so in the  majority of cases this can\
-      \ be interpreted as an sL1D\u2192L2 read bandwidth."
-    unit: Bytes per normalization unit
+      \ across the sL1D\u2194:doc:`L2 <l2-cache>` interface, divided by total duration.\
+      \ Note that sL1D writes and atomics are typically unused on current CDNA accelerators,\
+      \ so in the  majority of cases this can be interpreted as an sL1D\u2192L2 read\
+      \ bandwidth."
+    unit: Gbps
 L1I Speed-of-Light:
-  Bandwidth:
+  Bandwidth Utilization:
     rst: The number of bytes looked up in the L1I cache, as a percent of the peak  theoretical
       bandwidth. Calculated as the ratio of L1I requests over the  :ref:`total L1I
       cycles <total-l1i-cycles>`.
@@ -1205,7 +1202,7 @@ L1I Speed-of-Light:
       the cache. Calculated as the ratio of the number of L1I requests  that hit over
       the number of all L1I requests.
     unit: Percent
-  L1I-L2 Bandwidth:
+  L1I-L2 Bandwidth Utilization:
     rst: "The percent of the peak theoretical L1I \u2192 L2 cache request bandwidth\
       \  achieved. Calculated as the ratio of the total number of requests from  the\
       \ L1I to the L2 cache over the  :ref:`total L1I-L2 interface cycles <total-l1i-cycles>`."
@@ -1238,10 +1235,9 @@ L1I cache accesses:
     unit: Requests per normalization unit
 L1I <-> L2 interface:
   L1I-L2 Bandwidth:
-    rst: "The percent of the peak theoretical L1I \u2192 L2 cache request bandwidth\
-      \  achieved. Calculated as the ratio of the total number of requests from  the\
-      \ L1I to the L2 cache over the  :ref:`total L1I-L2 interface cycles <total-l1i-cycles>`."
-    unit: Percent
+    rst: Total number of bytes transferred across L1I - L2 interface divided by total
+      duration.
+    unit: Gbps
 Workgroup manager utilizations:
   Accelerator Utilization:
     rst: The percent of cycles in the kernel where the accelerator was actively doing
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1200_local_data_share_lds.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1200_local_data_share_lds.yaml
index 6cfe19d9de..2718654ad4 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1200_local_data_share_lds.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1200_local_data_share_lds.yaml
@@ -11,8 +11,12 @@ Panel Config:
       instructions, averaged over the lifetime of the kernel. Calculated as the ratio
       of the total number of cycles spent by the scheduler issuing LDS instructions
       over the total CU cycles.
+    Theoretical Bandwidth Utilization: Indicates the maximum amount of bytes that
+      could have been loaded from, stored to, or atomically updated in the LDS, as
+      a percentage of the theoretical peak. Does not take into account the execution
+      mask of the wavefront when the instruction was executed.
     Theoretical Bandwidth: Indicates the maximum amount of bytes that could have been
-      loaded from, stored to, or atomically updated in the LDS per normalization unit.
+      loaded from, stored to, or atomically updated in the LDS divided by total duration.
       Does not take into account the execution mask of the wavefront when the instruction
       was executed.
     Bank Conflict Rate: Indicates the percentage of active LDS cycles that were spent
@@ -58,7 +62,7 @@ Panel Config:
         Access Rate:
           value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
           unit: Pct of Peak
-        Theoretical Bandwidth:
+        Theoretical Bandwidth Utilization:
           value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
             / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) * 0.00128)))
           unit: Pct of Peak
@@ -86,12 +90,12 @@ Panel Config:
           unit: (Instr  + $normUnit)
         Theoretical Bandwidth:
           avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
+            / (End_Timestamp - Start_Timestamp)))
           min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
+            / (End_Timestamp - Start_Timestamp)))
           max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
-          unit: (Bytes  + $normUnit)
+            / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         LDS Latency:
           avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else
             None))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1300_instruction_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1300_instruction_cache.yaml
index a53c23691f..aeda9bc6c7 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1300_instruction_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1300_instruction_cache.yaml
@@ -3,15 +3,18 @@ Panel Config:
   id: 1300
   title: Instruction Cache
   metrics_description:
-    Bandwidth: The number of bytes looked up in the L1I cache, as a percent of the
-      peak theoretical bandwidth. Calculated as the ratio of L1I requests over the
-      total L1I cycles.
+    Bandwidth Utilization: The number of bytes looked up in the L1I cache, as a percent
+      of the peak theoretical bandwidth. Calculated as the ratio of L1I requests over
+      the total L1I cycles.
     Cache Hit Rate: The percent of L1I requests that hit [#l1i-cache]_ on a previously
       loaded line the cache. Calculated as the ratio of the number of L1I requests
       that hit over the number of all L1I requests.
-    L1I-L2 Bandwidth: "The percent of the peak theoretical L1I \u2192 L2 cache request\
-      \ bandwidth achieved. Calculated as the ratio of the total number of requests\
-      \ from the L1I to the L2 cache over the total L1I-L2 interface cycles."
+    L1I-L2 Bandwidth Utilization: "The percent of the peak theoretical L1I \u2192\
+      \ L2 cache request bandwidth achieved. Calculated as the ratio of the total\
+      \ number of requests from the L1I to the L2 cache over the total L1I-L2 interface\
+      \ cycles."
+    L1I-L2 Bandwidth: Total number of bytes transferred across L1I - L2 interface
+      divided by total duration.
     Req: The total number of requests made to the L1I per normalization-unit
     Hits: The total number of L1I requests that hit on a previously loaded cache line,
       per normalization-unit.
@@ -30,7 +33,7 @@ Panel Config:
         value: Avg
         unit: Unit
       metric:
-        Bandwidth:
+        Bandwidth Utilization:
           value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * (End_Timestamp
             - Start_Timestamp))))
           unit: Pct of Peak
@@ -38,7 +41,7 @@ Panel Config:
           value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
             + SQC_ICACHE_MISSES_DUPLICATE)))
           unit: Pct of Peak
-        L1I-L2 Bandwidth:
+        L1I-L2 Bandwidth Utilization:
           value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
             * (End_Timestamp - Start_Timestamp))))
           unit: Pct of Peak
@@ -100,7 +103,7 @@ Panel Config:
         unit: Unit
       metric:
         L1I-L2 Bandwidth:
-          avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
-          min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
-          max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
-          unit: (Bytes + $normUnit)
+          avg: AVG(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          min: MIN(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          max: MAX(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1400_scalar_l1_data_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1400_scalar_l1_data_cache.yaml
index d43157ce8e..282b97ad1f 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1400_scalar_l1_data_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1400_scalar_l1_data_cache.yaml
@@ -3,14 +3,17 @@ Panel Config:
   id: 1400
   title: Scalar L1 Data Cache
   metrics_description:
-    Bandwidth: The number of bytes looked up in the sL1D cache, as a percent of the
-      peak theoretical bandwidth. Calculated as the ratio of sL1D requests over the
-      total sL1D cycles.
+    Bandwidth Utilization: The number of bytes looked up in the sL1D cache, as a percent
+      of the peak theoretical bandwidth. Calculated as the ratio of sL1D requests
+      over the total sL1D cycles.
     Cache Hit Rate: Indicates the percent of sL1D requests that hit on a previously
       loaded line the cache. The ratio of the number of sL1D requests that hit over
       the number of all sL1D requests.
+    sL1D-L2 BW Utilization: The percentage of the peak theoretical sL1D - L2 interface
+      bandwidth achieved. Calculated as total number of bytes read from, written
+      to, or atomically updated across the sL1D - L2 interface.
     sL1D-L2 BW: "The total number of bytes read from, written to, or atomically updated\
-      \ across the sL1D\u2194L2 interface, per normalization unit. Note that sL1D\
+      \ across the sL1D\u2194L2 interface, divided by total duration. Note that sL1D\
       \ writes and atomics are typically unused on current CDNA accelerators, so in\
       \ the majority of cases this can be interpreted as an sL1D\u2192L2 read bandwidth."
     Req: The total number of requests, of any size or type, made to the sL1D per normalization
@@ -51,7 +54,7 @@ Panel Config:
         value: Avg
         unit: Unit
       metric:
-        Bandwidth:
+        Bandwidth Utilization:
           value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * (End_Timestamp
             - Start_Timestamp))))
           unit: Pct of Peak
@@ -60,7 +63,7 @@ Panel Config:
             + SQC_DCACHE_MISSES_DUPLICATE)) if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES
             + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
           unit: Pct of Peak
-        sL1D-L2 BW:
+        sL1D-L2 BW Utilization:
           value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
             * 100000) / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
           unit: Pct of Peak
@@ -158,12 +161,12 @@ Panel Config:
       metric:
         sL1D-L2 BW:
           avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
-          unit: (Bytes + $normUnit)
+            * 64)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Read Req:
           avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
           min: MIN((SQC_TC_DATA_READ_REQ / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1600_vector_l1_data_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1600_vector_l1_data_cache.yaml
index 96e021e378..50af33c21b 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1600_vector_l1_data_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1600_vector_l1_data_cache.yaml
@@ -5,12 +5,12 @@ Panel Config:
   metrics_description:
     Hit rate: The ratio of the number of vL1D cache line requests that hit in vL1D
       cache over the total number of cache line requests to the vL1D Cache RAM.
-    Bandwidth: The number of bytes looked up in the vL1D cache as a result of VMEM
-      instructions, as a percent of the peak theoretical bandwidth achievable on the
-      specific accelerator. The number of bytes is calculated as the number of cache
-      lines requested multiplied by the cache line size. This value does not consider
-      partial requests, so for instance, if only a single value is requested in a
-      cache line, the data movement will still be counted as a full cache line.
+    Bandwidth Utilization: The number of bytes looked up in the vL1D cache as a result
+      of VMEM instructions, as a percent of the peak theoretical bandwidth achievable
+      on the specific accelerator. The number of bytes is calculated as the number
+      of cache lines requested multiplied by the cache line size. This value does
+      not consider partial requests, so for instance, if only a single value is requested
+      in a cache line, the data movement will still be counted as a full cache line.
     Utilization: Indicates how busy the vL1D Cache RAM was during the kernel execution.
       The number of cycles where the vL1D Cache RAM is actively processing any request
       divided by the number of cycles where the vL1D is active.
@@ -42,11 +42,11 @@ Panel Config:
     Atomic Req: The total number of incoming atomic requests from the address processing
       unit after coalescing per normalization unit.
     Cache BW: The number of bytes looked up in the vL1D cache as a result of VMEM
-      instructions per normalization unit. The number of bytes is calculated as the
-      number of cache lines requested multiplied by the cache line size.  This value
-      does not consider partial requests, so for instance, if only a single value
-      is requested in a cache line, the data movement will still be counted as a full
-      cache line.
+      instructions divided by total duration. The number of bytes is calculated as
+      the number of cache lines requested multiplied by the cache line size.  This
+      value does not consider partial requests, so for instance, if only a single
+      value is requested in a cache line, the data movement will still be counted
+      as a full cache line.
     Cache Hit Rate: The ratio of the number of vL1D cache line requests that hit in
       vL1D cache over the total number of cache line requests to the vL1D Cache RAM.
     Cache Accesses: The total number of cache line lookups in the vL1D.
@@ -57,7 +57,7 @@ Panel Config:
       command during the kernel's execution per normalization unit. This may be triggered
       by, for instance, the buffer_wbinvl1 instruction.
     L1-L2 BW: The number of bytes transferred across the vL1D-L2 interface as a result
-      of VMEM instructions, per normalization unit. The number of bytes is calculated
+      of VMEM instructions, divided by total duration. The number of bytes is calculated
       as the number of cache lines requested multiplied by the cache line size. This
       value does not consider partial requests, so for instance, if only a single
       value is requested in a cache line, the data movement will still be counted
@@ -128,7 +128,7 @@ Panel Config:
             / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum != 0)
             else None))
           unit: Pct of Peak
-        Bandwidth:
+        Bandwidth Utilization:
           value: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp
             - Start_Timestamp)))) / ((($max_sclk / 1000) * 64) * $cu_per_gpu))
           unit: Pct of Peak
@@ -201,10 +201,10 @@ Panel Config:
             / $denom))
           unit: (Req  + $normUnit)
         Cache BW:
-          avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / $denom))
-          min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / $denom))
-          max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / $denom))
-          unit: (Bytes + $normUnit)
+          avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp - Start_Timestamp)))
+          min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp - Start_Timestamp)))
+          max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Cache Hit Rate:
           avg: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
             + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
@@ -242,12 +242,12 @@ Panel Config:
           unit: (Req + $normUnit)
         L1-L2 BW:
           avg: AVG(((64 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) + TCP_TCC_ATOMIC_WITH_RET_REQ_sum)
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
           min: MIN(((64 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) + TCP_TCC_ATOMIC_WITH_RET_REQ_sum)
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
           max: MAX(((64 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) + TCP_TCC_ATOMIC_WITH_RET_REQ_sum)
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
-          unit: (Bytes + $normUnit)
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         L1-L2 Read:
           avg: AVG((TCP_TCC_READ_REQ_sum / $denom))
           min: MIN((TCP_TCC_READ_REQ_sum / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml
index 6e77eb8f93..54046c8470 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml
@@ -20,8 +20,8 @@ Panel Config:
     HBM Bandwidth: Maximum theoretical bandwidth of the accelerator's local high-bandwidth
       memory (HBM) per unit time. This value is calculated as the number of HBM channels
       multiplied by the HBM channel width multiplied by the HBM clock frequency.
-    Read BW: The total number of bytes read by the L2 cache from Infinity Fabric per
-      normalization unit.
+    Read BW: The total number of bytes read by the L2 cache from Infinity Fabric divided
+      by total duration.
     HBM Read Traffic: The percent of read requests generated by the L2 cache that
       are routed to the accelerator's local high-bandwidth memory (HBM). This breakdown
       does not consider the size of the request (meaning that 32B and 64B requests
@@ -42,9 +42,9 @@ Panel Config:
       as a single request), so this metric only approximates the percent of the L2-Fabric
       read bandwidth directed to an uncached memory location.
     Write and Atomic BW: The total number of bytes written by the L2 over Infinity
-      Fabric by write and atomic operations per normalization unit. Note that on current
-      CDNA accelerators, such as the MI2XX, requests are only considered atomic by
-      Infinity Fabric if they are targeted at non-write-cacheable memory, for example,
+      Fabric by write and atomic operations divided by total duration. Note that on
+      current CDNA accelerators, such as the MI2XX, requests are only considered atomic
+      by Infinity Fabric if they are targeted at non-write-cacheable memory, for example,
       fine-grained memory allocations or uncached memory allocations on the MI2XX.
     HBM Write and Atomic Traffic: The percent of write and atomic requests generated
       by the L2 cache that are routed to the accelerator's local high-bandwidth memory
@@ -82,17 +82,17 @@ Panel Config:
     Atomic Latency: The time-averaged number of cycles atomic requests spent in Infinity
       Fabric before a completion acknowledgement (atomic without return value) or
       data (atomic with return value) was returned to the L2.
-    Bandwidth: The number of bytes looked up in the L2 cache, per normalization unit.
+    Bandwidth: The number of bytes looked up in the L2 cache, divided by total duration.
       The number of bytes is calculated as the number of cache lines requested multiplied
       by the cache line size. This value does not consider partial requests, so for
       example, if only a single value is requested in a cache line, the data movement
       will still be counted as a full cache line.
     Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests,
-      per normalization unit.
+      divided by total duration.
     Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests,
-      per normalization unit.
+      divided by total duration.
     Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests,
-      per normalization unit.
+      divided by total duration.
     Req: The total number of incoming requests to the L2 from all clients for all
       request types, per normalization unit.
     Read Req: The total number of read requests to the L2 from all clients.
@@ -150,11 +150,11 @@ Panel Config:
       64B of data from any source other than the accelerator's local HBM, per normalization
       unit.
     Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     "Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to
       write or atomically update 32B of data to any memory location, per normalization
       unit.
@@ -171,17 +171,17 @@ Panel Config:
       write or atomically update 32B or 64B of data in any memory location other than
       the accelerator's local HBM, per normalization unit.
     Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to
-      PCIe traffic, per normalization unit.
+      PCIe traffic, divided by total duration.
     "Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to
-      PCIe traffic, per normalization unit.
+      PCIe traffic, divided by total duration.
     "Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to
-      HBM traffic, per normalization unit.
+      HBM traffic, divided by total duration.
     Atomic: The total number of L2 requests to Infinity Fabric to atomically update
       32B or 64B of data in any memory location, per normalization unit. See Request
       flow for more detail. Note that on current CDNA accelerators, such as the MI2XX,
@@ -257,12 +257,12 @@ Panel Config:
       metric:
         Read BW:
           avg: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           min: MIN((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           max: MAX((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-            * 64)) / $denom))
-          unit: (Bytes  + $normUnit)
+            * 64)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         HBM Read Traffic:
           avg: AVG((100 * (TCC_EA_RDREQ_DRAM_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum
             != 0) else None))
@@ -289,12 +289,12 @@ Panel Config:
           unit: pct
         Write and Atomic BW:
           avg: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-            * 32)) / $denom))
+            * 32)) / (End_Timestamp - Start_Timestamp)))
           min: MIN((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-            * 32)) / $denom))
+            * 32)) / (End_Timestamp - Start_Timestamp)))
           max: MAX((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-            * 32)) / $denom))
-          unit: (Bytes  + $normUnit)
+            * 32)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         HBM Write and Atomic Traffic:
           avg: AVG((100 * (TCC_EA_WRREQ_DRAM_sum / TCC_EA_WRREQ_sum) if (TCC_EA_WRREQ_sum
             != 0) else None))
@@ -362,10 +362,10 @@ Panel Config:
         unit: Unit
       metric:
         Bandwidth:
-          avg: AVG((TCC_REQ_sum * 64) / $denom)
-          min: MIN((TCC_REQ_sum * 64) / $denom)
-          max: MAX((TCC_REQ_sum * 64) / $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG((TCC_REQ_sum * 64) / (End_Timestamp - Start_Timestamp))
+          min: MIN((TCC_REQ_sum * 64) / (End_Timestamp - Start_Timestamp))
+          max: MAX((TCC_REQ_sum * 64) / (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Req:
           avg: AVG((TCC_REQ_sum / $denom))
           min: MIN((TCC_REQ_sum / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1200_local_data_share_lds.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1200_local_data_share_lds.yaml
index 6cfe19d9de..2718654ad4 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1200_local_data_share_lds.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1200_local_data_share_lds.yaml
@@ -11,8 +11,12 @@ Panel Config:
       instructions, averaged over the lifetime of the kernel. Calculated as the ratio
       of the total number of cycles spent by the scheduler issuing LDS instructions
       over the total CU cycles.
+    Theoretical Bandwidth Utilization: Indicates the maximum amount of bytes that
+      could have been loaded from, stored to, or atomically updated in the LDS, as
+      a percentage of the theoretical peak. Does not take into account the execution
+      mask of the wavefront when the instruction was executed.
     Theoretical Bandwidth: Indicates the maximum amount of bytes that could have been
-      loaded from, stored to, or atomically updated in the LDS per normalization unit.
+      loaded from, stored to, or atomically updated in the LDS divided by total duration.
       Does not take into account the execution mask of the wavefront when the instruction
       was executed.
     Bank Conflict Rate: Indicates the percentage of active LDS cycles that were spent
@@ -58,7 +62,7 @@ Panel Config:
         Access Rate:
           value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
           unit: Pct of Peak
-        Theoretical Bandwidth:
+        Theoretical Bandwidth Utilization:
           value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
             / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) * 0.00128)))
           unit: Pct of Peak
@@ -86,12 +90,12 @@ Panel Config:
           unit: (Instr  + $normUnit)
         Theoretical Bandwidth:
           avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
+            / (End_Timestamp - Start_Timestamp)))
           min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
+            / (End_Timestamp - Start_Timestamp)))
           max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
-          unit: (Bytes  + $normUnit)
+            / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         LDS Latency:
           avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else
             None))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1300_instruction_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1300_instruction_cache.yaml
index a53c23691f..aeda9bc6c7 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1300_instruction_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1300_instruction_cache.yaml
@@ -3,15 +3,18 @@ Panel Config:
   id: 1300
   title: Instruction Cache
   metrics_description:
-    Bandwidth: The number of bytes looked up in the L1I cache, as a percent of the
-      peak theoretical bandwidth. Calculated as the ratio of L1I requests over the
-      total L1I cycles.
+    Bandwidth Utilization: The number of bytes looked up in the L1I cache, as a percent
+      of the peak theoretical bandwidth. Calculated as the ratio of L1I requests over
+      the total L1I cycles.
     Cache Hit Rate: The percent of L1I requests that hit [#l1i-cache]_ on a previously
       loaded line the cache. Calculated as the ratio of the number of L1I requests
       that hit over the number of all L1I requests.
-    L1I-L2 Bandwidth: "The percent of the peak theoretical L1I \u2192 L2 cache request\
-      \ bandwidth achieved. Calculated as the ratio of the total number of requests\
-      \ from the L1I to the L2 cache over the total L1I-L2 interface cycles."
+    L1I-L2 Bandwidth Utilization: "The percent of the peak theoretical L1I \u2192\
+      \ L2 cache request bandwidth achieved. Calculated as the ratio of the total\
+      \ number of requests from the L1I to the L2 cache over the total L1I-L2 interface\
+      \ cycles."
+    L1I-L2 Bandwidth: Total number of bytes transferred across L1I - L2 interface
+      divided by total duration.
     Req: The total number of requests made to the L1I per normalization-unit
     Hits: The total number of L1I requests that hit on a previously loaded cache line,
       per normalization-unit.
@@ -30,7 +33,7 @@ Panel Config:
         value: Avg
         unit: Unit
       metric:
-        Bandwidth:
+        Bandwidth Utilization:
           value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * (End_Timestamp
             - Start_Timestamp))))
           unit: Pct of Peak
@@ -38,7 +41,7 @@ Panel Config:
           value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
             + SQC_ICACHE_MISSES_DUPLICATE)))
           unit: Pct of Peak
-        L1I-L2 Bandwidth:
+        L1I-L2 Bandwidth Utilization:
           value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
             * (End_Timestamp - Start_Timestamp))))
           unit: Pct of Peak
@@ -100,7 +103,7 @@ Panel Config:
         unit: Unit
       metric:
         L1I-L2 Bandwidth:
-          avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
-          min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
-          max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
-          unit: (Bytes + $normUnit)
+          avg: AVG(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          min: MIN(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          max: MAX(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1400_scalar_l1_data_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1400_scalar_l1_data_cache.yaml
index d43157ce8e..282b97ad1f 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1400_scalar_l1_data_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1400_scalar_l1_data_cache.yaml
@@ -3,14 +3,17 @@ Panel Config:
   id: 1400
   title: Scalar L1 Data Cache
   metrics_description:
-    Bandwidth: The number of bytes looked up in the sL1D cache, as a percent of the
-      peak theoretical bandwidth. Calculated as the ratio of sL1D requests over the
-      total sL1D cycles.
+    Bandwidth Utilization: The number of bytes looked up in the sL1D cache, as a percent
+      of the peak theoretical bandwidth. Calculated as the ratio of sL1D requests
+      over the total sL1D cycles.
     Cache Hit Rate: Indicates the percent of sL1D requests that hit on a previously
       loaded line the cache. The ratio of the number of sL1D requests that hit over
       the number of all sL1D requests.
+    sL1D-L2 BW Utilization: The percentage of the peak theoretical sL1D - L2 interface
+      bandwidth achieved. Calculated from the total number of bytes read from, written
+      to, or atomically updated across the sL1D - L2 interface.
     sL1D-L2 BW: "The total number of bytes read from, written to, or atomically updated\
-      \ across the sL1D\u2194L2 interface, per normalization unit. Note that sL1D\
+      \ across the sL1D\u2194L2 interface, divided by total duration. Note that sL1D\
       \ writes and atomics are typically unused on current CDNA accelerators, so in\
       \ the majority of cases this can be interpreted as an sL1D\u2192L2 read bandwidth."
     Req: The total number of requests, of any size or type, made to the sL1D per normalization
@@ -51,7 +54,7 @@ Panel Config:
         value: Avg
         unit: Unit
       metric:
-        Bandwidth:
+        Bandwidth Utilization:
           value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * (End_Timestamp
             - Start_Timestamp))))
           unit: Pct of Peak
@@ -60,7 +63,7 @@ Panel Config:
             + SQC_DCACHE_MISSES_DUPLICATE)) if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES
             + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
           unit: Pct of Peak
-        sL1D-L2 BW:
+        sL1D-L2 BW Utilization:
           value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
             * 100000) / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
           unit: Pct of Peak
@@ -158,12 +161,12 @@ Panel Config:
       metric:
         sL1D-L2 BW:
           avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
-          unit: (Bytes + $normUnit)
+            * 64)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Read Req:
           avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
           min: MIN((SQC_TC_DATA_READ_REQ / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1600_vector_l1_data_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1600_vector_l1_data_cache.yaml
index 96e021e378..50af33c21b 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1600_vector_l1_data_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1600_vector_l1_data_cache.yaml
@@ -5,12 +5,12 @@ Panel Config:
   metrics_description:
     Hit rate: The ratio of the number of vL1D cache line requests that hit in vL1D
       cache over the total number of cache line requests to the vL1D Cache RAM.
-    Bandwidth: The number of bytes looked up in the vL1D cache as a result of VMEM
-      instructions, as a percent of the peak theoretical bandwidth achievable on the
-      specific accelerator. The number of bytes is calculated as the number of cache
-      lines requested multiplied by the cache line size. This value does not consider
-      partial requests, so for instance, if only a single value is requested in a
-      cache line, the data movement will still be counted as a full cache line.
+    Bandwidth Utilization: The number of bytes looked up in the vL1D cache as a result
+      of VMEM instructions, as a percent of the peak theoretical bandwidth achievable
+      on the specific accelerator. The number of bytes is calculated as the number
+      of cache lines requested multiplied by the cache line size. This value does
+      not consider partial requests, so for instance, if only a single value is requested
+      in a cache line, the data movement will still be counted as a full cache line.
     Utilization: Indicates how busy the vL1D Cache RAM was during the kernel execution.
       The number of cycles where the vL1D Cache RAM is actively processing any request
       divided by the number of cycles where the vL1D is active.
@@ -42,11 +42,11 @@ Panel Config:
     Atomic Req: The total number of incoming atomic requests from the address processing
       unit after coalescing per normalization unit.
     Cache BW: The number of bytes looked up in the vL1D cache as a result of VMEM
-      instructions per normalization unit. The number of bytes is calculated as the
-      number of cache lines requested multiplied by the cache line size.  This value
-      does not consider partial requests, so for instance, if only a single value
-      is requested in a cache line, the data movement will still be counted as a full
-      cache line.
+      instructions divided by total duration. The number of bytes is calculated as
+      the number of cache lines requested multiplied by the cache line size.  This
+      value does not consider partial requests, so for instance, if only a single
+      value is requested in a cache line, the data movement will still be counted
+      as a full cache line.
     Cache Hit Rate: The ratio of the number of vL1D cache line requests that hit in
       vL1D cache over the total number of cache line requests to the vL1D Cache RAM.
     Cache Accesses: The total number of cache line lookups in the vL1D.
@@ -57,7 +57,7 @@ Panel Config:
       command during the kernel's execution per normalization unit. This may be triggered
       by, for instance, the buffer_wbinvl1 instruction.
     L1-L2 BW: The number of bytes transferred across the vL1D-L2 interface as a result
-      of VMEM instructions, per normalization unit. The number of bytes is calculated
+      of VMEM instructions, divided by total duration. The number of bytes is calculated
       as the number of cache lines requested multiplied by the cache line size. This
       value does not consider partial requests, so for instance, if only a single
       value is requested in a cache line, the data movement will still be counted
@@ -128,7 +128,7 @@ Panel Config:
             / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum != 0)
             else None))
           unit: Pct of Peak
-        Bandwidth:
+        Bandwidth Utilization:
           value: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp
             - Start_Timestamp)))) / ((($max_sclk / 1000) * 64) * $cu_per_gpu))
           unit: Pct of Peak
@@ -201,10 +201,10 @@ Panel Config:
             / $denom))
           unit: (Req  + $normUnit)
         Cache BW:
-          avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / $denom))
-          min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / $denom))
-          max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / $denom))
-          unit: (Bytes + $normUnit)
+          avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp - Start_Timestamp)))
+          min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp - Start_Timestamp)))
+          max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Cache Hit Rate:
           avg: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
             + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
@@ -242,12 +242,12 @@ Panel Config:
           unit: (Req + $normUnit)
         L1-L2 BW:
           avg: AVG(((64 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) + TCP_TCC_ATOMIC_WITH_RET_REQ_sum)
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
           min: MIN(((64 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) + TCP_TCC_ATOMIC_WITH_RET_REQ_sum)
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
           max: MAX(((64 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) + TCP_TCC_ATOMIC_WITH_RET_REQ_sum)
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
-          unit: (Bytes + $normUnit)
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         L1-L2 Read:
           avg: AVG((TCP_TCC_READ_REQ_sum / $denom))
           min: MIN((TCP_TCC_READ_REQ_sum / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml
index 14398e1104..8153f7363c 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml
@@ -20,8 +20,8 @@ Panel Config:
     HBM Bandwidth: Maximum theoretical bandwidth of the accelerator's local high-bandwidth
       memory (HBM) per unit time. This value is calculated as the number of HBM channels
       multiplied by the HBM channel width multiplied by the HBM clock frequency.
-    Read BW: The total number of bytes read by the L2 cache from Infinity Fabric per
-      normalization unit.
+    Read BW: The total number of bytes read by the L2 cache from Infinity Fabric divided
+      by total duration.
     HBM Read Traffic: The percent of read requests generated by the L2 cache that
       are routed to the accelerator's local high-bandwidth memory (HBM). This breakdown
       does not consider the size of the request (meaning that 32B and 64B requests
@@ -42,9 +42,9 @@ Panel Config:
       as a single request), so this metric only approximates the percent of the L2-Fabric
       read bandwidth directed to an uncached memory location.
     Write and Atomic BW: The total number of bytes written by the L2 over Infinity
-      Fabric by write and atomic operations per normalization unit. Note that on current
-      CDNA accelerators, such as the MI2XX, requests are only considered atomic by
-      Infinity Fabric if they are targeted at non-write-cacheable memory, for example,
+      Fabric by write and atomic operations divided by total duration. Note that on
+      current CDNA accelerators, such as the MI2XX, requests are only considered atomic
+      by Infinity Fabric if they are targeted at non-write-cacheable memory, for example,
       fine-grained memory allocations or uncached memory allocations on the MI2XX.
     HBM Write and Atomic Traffic: The percent of write and atomic requests generated
       by the L2 cache that are routed to the accelerator's local high-bandwidth memory
@@ -82,17 +82,17 @@ Panel Config:
     Atomic Latency: The time-averaged number of cycles atomic requests spent in Infinity
       Fabric before a completion acknowledgement (atomic without return value) or
       data (atomic with return value) was returned to the L2.
-    Bandwidth: The number of bytes looked up in the L2 cache, per normalization unit.
+    Bandwidth: The number of bytes looked up in the L2 cache, divided by total duration.
       The number of bytes is calculated as the number of cache lines requested multiplied
       by the cache line size. This value does not consider partial requests, so for
       example, if only a single value is requested in a cache line, the data movement
       will still be counted as a full cache line.
     Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests,
-      per normalization unit.
+      divided by total duration.
     Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests,
-      per normalization unit.
+      divided by total duration.
     Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests,
-      per normalization unit.
+      divided by total duration.
     Req: The total number of incoming requests to the L2 from all clients for all
       request types, per normalization unit.
     Read Req: The total number of read requests to the L2 from all clients.
@@ -150,11 +150,11 @@ Panel Config:
       64B of data from any source other than the accelerator's local HBM, per normalization
       unit.
     Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     "Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to
       write or atomically update 32B of data to any memory location, per normalization
       unit.
@@ -171,17 +171,17 @@ Panel Config:
       write or atomically update 32B or 64B of data in any memory location other than
       the accelerator's local HBM, per normalization unit.
     Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to
-      PCIe traffic, per normalization unit.
+      PCIe traffic, divided by total duration.
     "Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to
-      PCIe traffic, per normalization unit.
+      PCIe traffic, divided by total duration.
     "Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to
-      HBM traffic, per normalization unit.
+      HBM traffic, divided by total duration.
     Atomic: The total number of L2 requests to Infinity Fabric to atomically update
       32B or 64B of data in any memory location, per normalization unit. See Request
       flow for more detail. Note that on current CDNA accelerators, such as the MI2XX,
@@ -257,12 +257,12 @@ Panel Config:
       metric:
         Read BW:
           avg: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           min: MIN((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           max: MAX((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-            * 64)) / $denom))
-          unit: (Bytes  + $normUnit)
+            * 64)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         HBM Read Traffic:
           avg: AVG((100 * (TCC_EA_RDREQ_DRAM_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum
             != 0) else None))
@@ -289,12 +289,12 @@ Panel Config:
           unit: pct
         Write and Atomic BW:
           avg: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-            * 32)) / $denom))
+            * 32)) / (End_Timestamp - Start_Timestamp)))
           min: MIN((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-            * 32)) / $denom))
+            * 32)) / (End_Timestamp - Start_Timestamp)))
           max: MAX((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-            * 32)) / $denom))
-          unit: (Bytes  + $normUnit)
+            * 32)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         HBM Write and Atomic Traffic:
           avg: AVG((100 * (TCC_EA_WRREQ_DRAM_sum / TCC_EA_WRREQ_sum) if (TCC_EA_WRREQ_sum
             != 0) else None))
@@ -362,10 +362,10 @@ Panel Config:
         unit: Unit
       metric:
         Bandwidth:
-          avg: AVG((TCC_REQ_sum * 128) / $denom)
-          min: MIN((TCC_REQ_sum * 128) / $denom)
-          max: MAX((TCC_REQ_sum * 128) / $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          min: MIN((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          max: MAX((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Req:
           avg: AVG((TCC_REQ_sum / $denom))
           min: MIN((TCC_REQ_sum / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1200_local_data_share_lds.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1200_local_data_share_lds.yaml
index c1a8525348..2718654ad4 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1200_local_data_share_lds.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1200_local_data_share_lds.yaml
@@ -11,8 +11,12 @@ Panel Config:
       instructions, averaged over the lifetime of the kernel. Calculated as the ratio
       of the total number of cycles spent by the scheduler issuing LDS instructions
       over the total CU cycles.
+    Theoretical Bandwidth Utilization: Indicates the maximum amount of bytes that
+      could have been loaded from, stored to, or atomically updated in the LDS, expressed
+      as a percentage of the theoretical peak. Does not take into account the execution
+      mask of the wavefront when the instruction was executed.
     Theoretical Bandwidth: Indicates the maximum amount of bytes that could have been
-      loaded from, stored to, or atomically updated in the LDS per normalization unit.
+      loaded from, stored to, or atomically updated in the LDS divided by total duration.
       Does not take into account the execution mask of the wavefront when the instruction
       was executed.
     Bank Conflict Rate: Indicates the percentage of active LDS cycles that were spent
@@ -58,7 +62,7 @@ Panel Config:
         Access Rate:
           value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
           unit: Pct of Peak
-        Theoretical Bandwidth (% of Peak):
+        Theoretical Bandwidth Utilization:
           value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
             / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) * 0.00128)))
           unit: Pct of Peak
@@ -86,12 +90,12 @@ Panel Config:
           unit: (Instr  + $normUnit)
         Theoretical Bandwidth:
           avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
+            / (End_Timestamp - Start_Timestamp)))
           min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
+            / (End_Timestamp - Start_Timestamp)))
           max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
-          unit: (Bytes  + $normUnit)
+            / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         LDS Latency:
           avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else
             None))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1300_instruction_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1300_instruction_cache.yaml
index a53c23691f..aeda9bc6c7 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1300_instruction_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1300_instruction_cache.yaml
@@ -3,15 +3,18 @@ Panel Config:
   id: 1300
   title: Instruction Cache
   metrics_description:
-    Bandwidth: The number of bytes looked up in the L1I cache, as a percent of the
-      peak theoretical bandwidth. Calculated as the ratio of L1I requests over the
-      total L1I cycles.
+    Bandwidth Utilization: The number of bytes looked up in the L1I cache, as a percent
+      of the peak theoretical bandwidth. Calculated as the ratio of L1I requests over
+      the total L1I cycles.
     Cache Hit Rate: The percent of L1I requests that hit [#l1i-cache]_ on a previously
       loaded line the cache. Calculated as the ratio of the number of L1I requests
       that hit over the number of all L1I requests.
-    L1I-L2 Bandwidth: "The percent of the peak theoretical L1I \u2192 L2 cache request\
-      \ bandwidth achieved. Calculated as the ratio of the total number of requests\
-      \ from the L1I to the L2 cache over the total L1I-L2 interface cycles."
+    L1I-L2 Bandwidth Utilization: "The percent of the peak theoretical L1I \u2192\
+      \ L2 cache request bandwidth achieved. Calculated as the ratio of the total\
+      \ number of requests from the L1I to the L2 cache over the total L1I-L2 interface\
+      \ cycles."
+    L1I-L2 Bandwidth: Total number of bytes transferred across L1I - L2 interface
+      divided by total duration.
     Req: The total number of requests made to the L1I per normalization-unit
     Hits: The total number of L1I requests that hit on a previously loaded cache line,
       per normalization-unit.
@@ -30,7 +33,7 @@ Panel Config:
         value: Avg
         unit: Unit
       metric:
-        Bandwidth:
+        Bandwidth Utilization:
           value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * (End_Timestamp
             - Start_Timestamp))))
           unit: Pct of Peak
@@ -38,7 +41,7 @@ Panel Config:
           value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
             + SQC_ICACHE_MISSES_DUPLICATE)))
           unit: Pct of Peak
-        L1I-L2 Bandwidth:
+        L1I-L2 Bandwidth Utilization:
           value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
             * (End_Timestamp - Start_Timestamp))))
           unit: Pct of Peak
@@ -100,7 +103,7 @@ Panel Config:
         unit: Unit
       metric:
         L1I-L2 Bandwidth:
-          avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
-          min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
-          max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
-          unit: (Bytes + $normUnit)
+          avg: AVG(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          min: MIN(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          max: MAX(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1400_scalar_l1_data_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1400_scalar_l1_data_cache.yaml
index d43157ce8e..282b97ad1f 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1400_scalar_l1_data_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1400_scalar_l1_data_cache.yaml
@@ -3,14 +3,17 @@ Panel Config:
   id: 1400
   title: Scalar L1 Data Cache
   metrics_description:
-    Bandwidth: The number of bytes looked up in the sL1D cache, as a percent of the
-      peak theoretical bandwidth. Calculated as the ratio of sL1D requests over the
-      total sL1D cycles.
+    Bandwidth Utilization: The number of bytes looked up in the sL1D cache, as a percent
+      of the peak theoretical bandwidth. Calculated as the ratio of sL1D requests
+      over the total sL1D cycles.
     Cache Hit Rate: Indicates the percent of sL1D requests that hit on a previously
       loaded line the cache. The ratio of the number of sL1D requests that hit over
       the number of all sL1D requests.
+    sL1D-L2 BW Utilization: The percentage of the peak theoretical sL1D - L2 interface
+      bandwidth achieved. Calculated as the total number of bytes read from, written
+      to, or atomically updated across the sL1D - L2 interface.
     sL1D-L2 BW: "The total number of bytes read from, written to, or atomically updated\
-      \ across the sL1D\u2194L2 interface, per normalization unit. Note that sL1D\
+      \ across the sL1D\u2194L2 interface, divided by total duration. Note that sL1D\
       \ writes and atomics are typically unused on current CDNA accelerators, so in\
       \ the majority of cases this can be interpreted as an sL1D\u2192L2 read bandwidth."
     Req: The total number of requests, of any size or type, made to the sL1D per normalization
@@ -51,7 +54,7 @@ Panel Config:
         value: Avg
         unit: Unit
       metric:
-        Bandwidth:
+        Bandwidth Utilization:
           value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * (End_Timestamp
             - Start_Timestamp))))
           unit: Pct of Peak
@@ -60,7 +63,7 @@ Panel Config:
             + SQC_DCACHE_MISSES_DUPLICATE)) if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES
             + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
           unit: Pct of Peak
-        sL1D-L2 BW:
+        sL1D-L2 BW Utilization:
           value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
             * 100000) / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
           unit: Pct of Peak
@@ -158,12 +161,12 @@ Panel Config:
       metric:
         sL1D-L2 BW:
           avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
-          unit: (Bytes + $normUnit)
+            * 64)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Read Req:
           avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
           min: MIN((SQC_TC_DATA_READ_REQ / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1600_vector_l1_data_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1600_vector_l1_data_cache.yaml
index 708bbafe14..db745209b7 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1600_vector_l1_data_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1600_vector_l1_data_cache.yaml
@@ -5,12 +5,12 @@ Panel Config:
   metrics_description:
     Hit rate: The ratio of the number of vL1D cache line requests that hit in vL1D
       cache over the total number of cache line requests to the vL1D Cache RAM.
-    Bandwidth: The number of bytes looked up in the vL1D cache as a result of VMEM
-      instructions, as a percent of the peak theoretical bandwidth achievable on the
-      specific accelerator. The number of bytes is calculated as the number of cache
-      lines requested multiplied by the cache line size. This value does not consider
-      partial requests, so for instance, if only a single value is requested in a
-      cache line, the data movement will still be counted as a full cache line.
+    Bandwidth Utilization: The number of bytes looked up in the vL1D cache as a result
+      of VMEM instructions, as a percent of the peak theoretical bandwidth achievable
+      on the specific accelerator. The number of bytes is calculated as the number
+      of cache lines requested multiplied by the cache line size. This value does
+      not consider partial requests, so for instance, if only a single value is requested
+      in a cache line, the data movement will still be counted as a full cache line.
     Utilization: Indicates how busy the vL1D Cache RAM was during the kernel execution.
       The number of cycles where the vL1D Cache RAM is actively processing any request
       divided by the number of cycles where the vL1D is active.
@@ -42,11 +42,11 @@ Panel Config:
     Atomic Req: The total number of incoming atomic requests from the address processing
       unit after coalescing per normalization unit.
     Cache BW: The number of bytes looked up in the vL1D cache as a result of VMEM
-      instructions per normalization unit. The number of bytes is calculated as the
-      number of cache lines requested multiplied by the cache line size.  This value
-      does not consider partial requests, so for instance, if only a single value
-      is requested in a cache line, the data movement will still be counted as a full
-      cache line.
+      instructions divided by total duration. The number of bytes is calculated as
+      the number of cache lines requested multiplied by the cache line size.  This
+      value does not consider partial requests, so for instance, if only a single
+      value is requested in a cache line, the data movement will still be counted
+      as a full cache line.
     Cache Hit Rate: The ratio of the number of vL1D cache line requests that hit in
       vL1D cache over the total number of cache line requests to the vL1D Cache RAM.
     Cache Accesses: The total number of cache line lookups in the vL1D.
@@ -57,7 +57,7 @@ Panel Config:
       command during the kernel's execution per normalization unit. This may be triggered
       by, for instance, the buffer_wbinvl1 instruction.
     L1-L2 BW: The number of bytes transferred across the vL1D-L2 interface as a result
-      of VMEM instructions, per normalization unit. The number of bytes is calculated
+      of VMEM instructions, divided by total duration. The number of bytes is calculated
       as the number of cache lines requested multiplied by the cache line size. This
       value does not consider partial requests, so for instance, if only a single
       value is requested in a cache line, the data movement will still be counted
@@ -128,7 +128,7 @@ Panel Config:
             / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum != 0)
             else None))
           unit: Pct of Peak
-        Bandwidth:
+        Bandwidth Utilization:
           value: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp
             - Start_Timestamp)))) / ((($max_sclk / 1000) * 128) * $cu_per_gpu))
           unit: Pct of Peak
@@ -201,10 +201,10 @@ Panel Config:
             / $denom))
           unit: (Req  + $normUnit)
         Cache BW:
-          avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-          min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-          max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-          unit: (Bytes + $normUnit)
+          avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+          min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+          max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Cache Hit Rate:
           avg: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
             + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
@@ -242,12 +242,12 @@ Panel Config:
           unit: (Req + $normUnit)
         L1-L2 BW:
           avg: AVG(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum + TCP_TCC_ATOMIC_WITH_RET_REQ_sum
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
           min: MIN(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum + TCP_TCC_ATOMIC_WITH_RET_REQ_sum
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
           max: MAX(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum + TCP_TCC_ATOMIC_WITH_RET_REQ_sum
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
-          unit: (Bytes + $normUnit)
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         L1-L2 Read:
           avg: AVG((TCP_TCC_READ_REQ_sum / $denom))
           min: MIN((TCP_TCC_READ_REQ_sum / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml
index 36d5943858..74c12857e0 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml
@@ -20,8 +20,8 @@ Panel Config:
     HBM Bandwidth: Maximum theoretical bandwidth of the accelerator's local high-bandwidth
       memory (HBM) per unit time. This value is calculated as the number of HBM channels
       multiplied by the HBM channel width multiplied by the HBM clock frequency.
-    Read BW: The total number of bytes read by the L2 cache from Infinity Fabric per
-      normalization unit.
+    Read BW: The total number of bytes read by the L2 cache from Infinity Fabric divided
+      by total duration.
     HBM Read Traffic: The percent of read requests generated by the L2 cache that
       are routed to the accelerator's local high-bandwidth memory (HBM). This breakdown
       does not consider the size of the request (meaning that 32B and 64B requests
@@ -42,9 +42,9 @@ Panel Config:
       as a single request), so this metric only approximates the percent of the L2-Fabric
       read bandwidth directed to an uncached memory location.
     Write and Atomic BW: The total number of bytes written by the L2 over Infinity
-      Fabric by write and atomic operations per normalization unit. Note that on current
-      CDNA accelerators, such as the MI2XX, requests are only considered atomic by
-      Infinity Fabric if they are targeted at non-write-cacheable memory, for example,
+      Fabric by write and atomic operations divided by total duration. Note that on
+      current CDNA accelerators, such as the MI2XX, requests are only considered atomic
+      by Infinity Fabric if they are targeted at non-write-cacheable memory, for example,
       fine-grained memory allocations or uncached memory allocations on the MI2XX.
     HBM Write and Atomic Traffic: The percent of write and atomic requests generated
       by the L2 cache that are routed to the accelerator's local high-bandwidth memory
@@ -82,17 +82,17 @@ Panel Config:
     Atomic Latency: The time-averaged number of cycles atomic requests spent in Infinity
       Fabric before a completion acknowledgement (atomic without return value) or
       data (atomic with return value) was returned to the L2.
-    Bandwidth: The number of bytes looked up in the L2 cache, per normalization unit.
+    Bandwidth: The number of bytes looked up in the L2 cache, divided by total duration.
       The number of bytes is calculated as the number of cache lines requested multiplied
       by the cache line size. This value does not consider partial requests, so for
       example, if only a single value is requested in a cache line, the data movement
       will still be counted as a full cache line.
     Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests,
-      per normalization unit.
+      divided by total duration.
     Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests,
-      per normalization unit.
+      divided by total duration.
     Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests,
-      per normalization unit.
+      divided by total duration.
     Req: The total number of incoming requests to the L2 from all clients for all
       request types, per normalization unit.
     Read Req: The total number of read requests to the L2 from all clients.
@@ -150,11 +150,11 @@ Panel Config:
       64B of data from any source other than the accelerator's local HBM, per normalization
       unit.
     Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     "Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to
       write or atomically update 32B of data to any memory location, per normalization
       unit.
@@ -171,17 +171,17 @@ Panel Config:
       write or atomically update 32B or 64B of data in any memory location other than
       the accelerator's local HBM, per normalization unit.
     Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to
-      PCIe traffic, per normalization unit.
+      PCIe traffic, divided by total duration.
     "Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to
-      PCIe traffic, per normalization unit.
+      PCIe traffic, divided by total duration.
     "Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to
-      HBM traffic, per normalization unit.
+      HBM traffic, divided by total duration.
     Atomic: The total number of L2 requests to Infinity Fabric to atomically update
       32B or 64B of data in any memory location, per normalization unit. See Request
       flow for more detail. Note that on current CDNA accelerators, such as the MI2XX,
@@ -257,12 +257,12 @@ Panel Config:
       metric:
         Read BW:
           avg: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           min: MIN((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           max: MAX((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
-            * 64)) / $denom))
-          unit: (Bytes  + $normUnit)
+            * 64)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         HBM Read Traffic:
           avg: AVG((100 * (TCC_EA0_RDREQ_DRAM_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
             != 0) else None))
@@ -289,12 +289,12 @@ Panel Config:
           unit: pct
         Write and Atomic BW:
           avg: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-            * 32)) / $denom))
+            * 32)) / (End_Timestamp - Start_Timestamp)))
           min: MIN((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-            * 32)) / $denom))
+            * 32)) / (End_Timestamp - Start_Timestamp)))
           max: MAX((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-            * 32)) / $denom))
-          unit: (Bytes  + $normUnit)
+            * 32)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         HBM Write and Atomic Traffic:
           avg: AVG((100 * (TCC_EA0_WRREQ_DRAM_sum / TCC_EA0_WRREQ_sum) if (TCC_EA0_WRREQ_sum
             != 0) else None))
@@ -362,10 +362,10 @@ Panel Config:
         unit: Unit
       metric:
         Bandwidth:
-          avg: AVG((TCC_REQ_sum * 128) / $denom)
-          min: MIN((TCC_REQ_sum * 128) / $denom)
-          max: MAX((TCC_REQ_sum * 128) / $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          min: MIN((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          max: MAX((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Req:
           avg: AVG((TCC_REQ_sum / $denom))
           min: MIN((TCC_REQ_sum / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1200_local_data_share_lds.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1200_local_data_share_lds.yaml
index c1a8525348..2718654ad4 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1200_local_data_share_lds.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1200_local_data_share_lds.yaml
@@ -11,8 +11,12 @@ Panel Config:
       instructions, averaged over the lifetime of the kernel. Calculated as the ratio
       of the total number of cycles spent by the scheduler issuing LDS instructions
       over the total CU cycles.
+    Theoretical Bandwidth Utilization: Indicates the maximum amount of bytes that
+      could have been loaded from, stored to, or atomically updated in the LDS, expressed
+      as a percentage of the theoretical peak. Does not take into account the execution
+      mask of the wavefront when the instruction was executed.
     Theoretical Bandwidth: Indicates the maximum amount of bytes that could have been
-      loaded from, stored to, or atomically updated in the LDS per normalization unit.
+      loaded from, stored to, or atomically updated in the LDS divided by total duration.
       Does not take into account the execution mask of the wavefront when the instruction
       was executed.
     Bank Conflict Rate: Indicates the percentage of active LDS cycles that were spent
@@ -58,7 +62,7 @@ Panel Config:
         Access Rate:
           value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
           unit: Pct of Peak
-        Theoretical Bandwidth (% of Peak):
+        Theoretical Bandwidth Utilization:
           value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
             / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) * 0.00128)))
           unit: Pct of Peak
@@ -86,12 +90,12 @@ Panel Config:
           unit: (Instr  + $normUnit)
         Theoretical Bandwidth:
           avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
+            / (End_Timestamp - Start_Timestamp)))
           min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
+            / (End_Timestamp - Start_Timestamp)))
           max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
-          unit: (Bytes  + $normUnit)
+            / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         LDS Latency:
           avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else
             None))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1300_instruction_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1300_instruction_cache.yaml
index a53c23691f..aeda9bc6c7 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1300_instruction_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1300_instruction_cache.yaml
@@ -3,15 +3,18 @@ Panel Config:
   id: 1300
   title: Instruction Cache
   metrics_description:
-    Bandwidth: The number of bytes looked up in the L1I cache, as a percent of the
-      peak theoretical bandwidth. Calculated as the ratio of L1I requests over the
-      total L1I cycles.
+    Bandwidth Utilization: The number of bytes looked up in the L1I cache, as a percent
+      of the peak theoretical bandwidth. Calculated as the ratio of L1I requests over
+      the total L1I cycles.
     Cache Hit Rate: The percent of L1I requests that hit [#l1i-cache]_ on a previously
       loaded line the cache. Calculated as the ratio of the number of L1I requests
       that hit over the number of all L1I requests.
-    L1I-L2 Bandwidth: "The percent of the peak theoretical L1I \u2192 L2 cache request\
-      \ bandwidth achieved. Calculated as the ratio of the total number of requests\
-      \ from the L1I to the L2 cache over the total L1I-L2 interface cycles."
+    L1I-L2 Bandwidth Utilization: "The percent of the peak theoretical L1I \u2192\
+      \ L2 cache request bandwidth achieved. Calculated as the ratio of the total\
+      \ number of requests from the L1I to the L2 cache over the total L1I-L2 interface\
+      \ cycles."
+    L1I-L2 Bandwidth: Total number of bytes transferred across the L1I-L2 interface,
+      divided by total duration.
     Req: The total number of requests made to the L1I per normalization-unit
     Hits: The total number of L1I requests that hit on a previously loaded cache line,
       per normalization-unit.
@@ -30,7 +33,7 @@ Panel Config:
         value: Avg
         unit: Unit
       metric:
-        Bandwidth:
+        Bandwidth Utilization:
           value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * (End_Timestamp
             - Start_Timestamp))))
           unit: Pct of Peak
@@ -38,7 +41,7 @@ Panel Config:
           value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
             + SQC_ICACHE_MISSES_DUPLICATE)))
           unit: Pct of Peak
-        L1I-L2 Bandwidth:
+        L1I-L2 Bandwidth Utilization:
           value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
             * (End_Timestamp - Start_Timestamp))))
           unit: Pct of Peak
@@ -100,7 +103,7 @@ Panel Config:
         unit: Unit
       metric:
         L1I-L2 Bandwidth:
-          avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
-          min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
-          max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
-          unit: (Bytes + $normUnit)
+          avg: AVG(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          min: MIN(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          max: MAX(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1400_scalar_l1_data_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1400_scalar_l1_data_cache.yaml
index d43157ce8e..282b97ad1f 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1400_scalar_l1_data_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1400_scalar_l1_data_cache.yaml
@@ -3,14 +3,17 @@ Panel Config:
   id: 1400
   title: Scalar L1 Data Cache
   metrics_description:
-    Bandwidth: The number of bytes looked up in the sL1D cache, as a percent of the
-      peak theoretical bandwidth. Calculated as the ratio of sL1D requests over the
-      total sL1D cycles.
+    Bandwidth Utilization: The number of bytes looked up in the sL1D cache, as a percent
+      of the peak theoretical bandwidth. Calculated as the ratio of sL1D requests
+      over the total sL1D cycles.
     Cache Hit Rate: Indicates the percent of sL1D requests that hit on a previously
       loaded line the cache. The ratio of the number of sL1D requests that hit over
       the number of all sL1D requests.
+    sL1D-L2 BW Utilization: The percentage of the peak theoretical sL1D - L2 interface
+      bandwidth achieved. Calculated from the total number of bytes read from, written
+      to, or atomically updated across the sL1D - L2 interface.
     sL1D-L2 BW: "The total number of bytes read from, written to, or atomically updated\
-      \ across the sL1D\u2194L2 interface, per normalization unit. Note that sL1D\
+      \ across the sL1D\u2194L2 interface, divided by total duration. Note that sL1D\
       \ writes and atomics are typically unused on current CDNA accelerators, so in\
       \ the majority of cases this can be interpreted as an sL1D\u2192L2 read bandwidth."
     Req: The total number of requests, of any size or type, made to the sL1D per normalization
@@ -51,7 +54,7 @@ Panel Config:
         value: Avg
         unit: Unit
       metric:
-        Bandwidth:
+        Bandwidth Utilization:
           value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * (End_Timestamp
             - Start_Timestamp))))
           unit: Pct of Peak
@@ -60,7 +63,7 @@ Panel Config:
             + SQC_DCACHE_MISSES_DUPLICATE)) if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES
             + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
           unit: Pct of Peak
-        sL1D-L2 BW:
+        sL1D-L2 BW Utilization:
           value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
             * 100000) / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
           unit: Pct of Peak
@@ -158,12 +161,12 @@ Panel Config:
       metric:
         sL1D-L2 BW:
           avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
-          unit: (Bytes + $normUnit)
+            * 64)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Read Req:
           avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
           min: MIN((SQC_TC_DATA_READ_REQ / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1600_vector_l1_data_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1600_vector_l1_data_cache.yaml
index 708bbafe14..db745209b7 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1600_vector_l1_data_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1600_vector_l1_data_cache.yaml
@@ -5,12 +5,12 @@ Panel Config:
   metrics_description:
     Hit rate: The ratio of the number of vL1D cache line requests that hit in vL1D
       cache over the total number of cache line requests to the vL1D Cache RAM.
-    Bandwidth: The number of bytes looked up in the vL1D cache as a result of VMEM
-      instructions, as a percent of the peak theoretical bandwidth achievable on the
-      specific accelerator. The number of bytes is calculated as the number of cache
-      lines requested multiplied by the cache line size. This value does not consider
-      partial requests, so for instance, if only a single value is requested in a
-      cache line, the data movement will still be counted as a full cache line.
+    Bandwidth Utilization: The number of bytes looked up in the vL1D cache as a result
+      of VMEM instructions, as a percent of the peak theoretical bandwidth achievable
+      on the specific accelerator. The number of bytes is calculated as the number
+      of cache lines requested multiplied by the cache line size. This value does
+      not consider partial requests, so for instance, if only a single value is requested
+      in a cache line, the data movement will still be counted as a full cache line.
     Utilization: Indicates how busy the vL1D Cache RAM was during the kernel execution.
       The number of cycles where the vL1D Cache RAM is actively processing any request
       divided by the number of cycles where the vL1D is active.
@@ -42,11 +42,11 @@ Panel Config:
     Atomic Req: The total number of incoming atomic requests from the address processing
       unit after coalescing per normalization unit.
     Cache BW: The number of bytes looked up in the vL1D cache as a result of VMEM
-      instructions per normalization unit. The number of bytes is calculated as the
-      number of cache lines requested multiplied by the cache line size.  This value
-      does not consider partial requests, so for instance, if only a single value
-      is requested in a cache line, the data movement will still be counted as a full
-      cache line.
+      instructions divided by total duration. The number of bytes is calculated as
+      the number of cache lines requested multiplied by the cache line size.  This
+      value does not consider partial requests, so for instance, if only a single
+      value is requested in a cache line, the data movement will still be counted
+      as a full cache line.
     Cache Hit Rate: The ratio of the number of vL1D cache line requests that hit in
       vL1D cache over the total number of cache line requests to the vL1D Cache RAM.
     Cache Accesses: The total number of cache line lookups in the vL1D.
@@ -57,7 +57,7 @@ Panel Config:
       command during the kernel's execution per normalization unit. This may be triggered
       by, for instance, the buffer_wbinvl1 instruction.
     L1-L2 BW: The number of bytes transferred across the vL1D-L2 interface as a result
-      of VMEM instructions, per normalization unit. The number of bytes is calculated
+      of VMEM instructions, divided by total duration. The number of bytes is calculated
       as the number of cache lines requested multiplied by the cache line size. This
       value does not consider partial requests, so for instance, if only a single
       value is requested in a cache line, the data movement will still be counted
@@ -128,7 +128,7 @@ Panel Config:
             / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum != 0)
             else None))
           unit: Pct of Peak
-        Bandwidth:
+        Bandwidth Utilization:
           value: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp
             - Start_Timestamp)))) / ((($max_sclk / 1000) * 128) * $cu_per_gpu))
           unit: Pct of Peak
@@ -201,10 +201,10 @@ Panel Config:
             / $denom))
           unit: (Req  + $normUnit)
         Cache BW:
-          avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-          min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-          max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-          unit: (Bytes + $normUnit)
+          avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+          min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+          max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Cache Hit Rate:
           avg: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
             + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
@@ -242,12 +242,12 @@ Panel Config:
           unit: (Req + $normUnit)
         L1-L2 BW:
           avg: AVG(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum + TCP_TCC_ATOMIC_WITH_RET_REQ_sum
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
           min: MIN(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum + TCP_TCC_ATOMIC_WITH_RET_REQ_sum
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
           max: MAX(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum + TCP_TCC_ATOMIC_WITH_RET_REQ_sum
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
-          unit: (Bytes + $normUnit)
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         L1-L2 Read:
           avg: AVG((TCP_TCC_READ_REQ_sum / $denom))
           min: MIN((TCP_TCC_READ_REQ_sum / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml
index e7acf40a5c..f0aefff869 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml
@@ -20,8 +20,8 @@ Panel Config:
     HBM Bandwidth: Maximum theoretical bandwidth of the accelerator's local high-bandwidth
       memory (HBM) per unit time. This value is calculated as the number of HBM channels
       multiplied by the HBM channel width multiplied by the HBM clock frequency.
-    Read BW: The total number of bytes read by the L2 cache from Infinity Fabric per
-      normalization unit.
+    Read BW: The total number of bytes read by the L2 cache from Infinity Fabric divided
+      by total duration.
     HBM Read Traffic: The percent of read requests generated by the L2 cache that
       are routed to the accelerator's local high-bandwidth memory (HBM). This breakdown
       does not consider the size of the request (meaning that 32B and 64B requests
@@ -42,9 +42,9 @@ Panel Config:
       as a single request), so this metric only approximates the percent of the L2-Fabric
       read bandwidth directed to an uncached memory location.
     Write and Atomic BW: The total number of bytes written by the L2 over Infinity
-      Fabric by write and atomic operations per normalization unit. Note that on current
-      CDNA accelerators, such as the MI2XX, requests are only considered atomic by
-      Infinity Fabric if they are targeted at non-write-cacheable memory, for example,
+      Fabric by write and atomic operations divided by total duration. Note that on
+      current CDNA accelerators, such as the MI2XX, requests are only considered atomic
+      by Infinity Fabric if they are targeted at non-write-cacheable memory, for example,
       fine-grained memory allocations or uncached memory allocations on the MI2XX.
     HBM Write and Atomic Traffic: The percent of write and atomic requests generated
       by the L2 cache that are routed to the accelerator's local high-bandwidth memory
@@ -82,17 +82,17 @@ Panel Config:
     Atomic Latency: The time-averaged number of cycles atomic requests spent in Infinity
       Fabric before a completion acknowledgement (atomic without return value) or
       data (atomic with return value) was returned to the L2.
-    Bandwidth: The number of bytes looked up in the L2 cache, per normalization unit.
+    Bandwidth: The number of bytes looked up in the L2 cache, divided by total duration.
       The number of bytes is calculated as the number of cache lines requested multiplied
       by the cache line size. This value does not consider partial requests, so for
       example, if only a single value is requested in a cache line, the data movement
       will still be counted as a full cache line.
     Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests,
-      per normalization unit.
+      divided by total duration.
     Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests,
-      per normalization unit.
+      divided by total duration.
     Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests,
-      per normalization unit.
+      divided by total duration.
     Req: The total number of incoming requests to the L2 from all clients for all
       request types, per normalization unit.
     Read Req: The total number of read requests to the L2 from all clients.
@@ -150,11 +150,11 @@ Panel Config:
       64B of data from any source other than the accelerator's local HBM, per normalization
       unit.
     Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     "Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to
       write or atomically update 32B of data to any memory location, per normalization
       unit.
@@ -171,17 +171,17 @@ Panel Config:
       write or atomically update 32B or 64B of data in any memory location other than
       the accelerator's local HBM, per normalization unit.
     Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to
-      PCIe traffic, per normalization unit.
+      PCIe traffic, divided by total duration.
     "Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to
-      PCIe traffic, per normalization unit.
+      PCIe traffic, divided by total duration.
     "Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to
-      HBM traffic, per normalization unit.
+      HBM traffic, divided by total duration.
     Atomic: The total number of L2 requests to Infinity Fabric to atomically update
       32B or 64B of data in any memory location, per normalization unit. See Request
       flow for more detail. Note that on current CDNA accelerators, such as the MI2XX,
@@ -257,12 +257,12 @@ Panel Config:
       metric:
         Read BW:
           avg: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           min: MIN((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           max: MAX((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
-            * 64)) / $denom))
-          unit: (Bytes  + $normUnit)
+            * 64)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         HBM Read Traffic:
           avg: AVG((100 * (TCC_EA0_RDREQ_DRAM_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
             != 0) else None))
@@ -289,12 +289,12 @@ Panel Config:
           unit: pct
         Write and Atomic BW:
           avg: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-            * 32)) / $denom))
+            * 32)) / (End_Timestamp - Start_Timestamp)))
           min: MIN((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-            * 32)) / $denom))
+            * 32)) / (End_Timestamp - Start_Timestamp)))
           max: MAX((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-            * 32)) / $denom))
-          unit: (Bytes  + $normUnit)
+            * 32)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         HBM Write and Atomic Traffic:
           avg: AVG((100 * (TCC_EA0_WRREQ_DRAM_sum / TCC_EA0_WRREQ_sum) if (TCC_EA0_WRREQ_sum
             != 0) else None))
@@ -362,10 +362,10 @@ Panel Config:
         unit: Unit
       metric:
         Bandwidth:
-          avg: AVG((TCC_REQ_sum * 128) / $denom)
-          min: MIN((TCC_REQ_sum * 128) / $denom)
-          max: MAX((TCC_REQ_sum * 128) / $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          min: MIN((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          max: MAX((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Req:
           avg: AVG((TCC_REQ_sum / $denom))
           min: MIN((TCC_REQ_sum / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1200_local_data_share_lds.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1200_local_data_share_lds.yaml
index c1a8525348..2718654ad4 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1200_local_data_share_lds.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1200_local_data_share_lds.yaml
@@ -11,8 +11,12 @@ Panel Config:
       instructions, averaged over the lifetime of the kernel. Calculated as the ratio
       of the total number of cycles spent by the scheduler issuing LDS instructions
       over the total CU cycles.
+    Theoretical Bandwidth Utilization: Indicates the maximum amount of bytes that
+      could have been loaded from, stored to, or atomically updated in the LDS,
+      as a percentage of the theoretical peak. Does not take into account the execution
+      mask of the wavefront when the instruction was executed.
     Theoretical Bandwidth: Indicates the maximum amount of bytes that could have been
-      loaded from, stored to, or atomically updated in the LDS per normalization unit.
+      loaded from, stored to, or atomically updated in the LDS divided by total duration.
       Does not take into account the execution mask of the wavefront when the instruction
       was executed.
     Bank Conflict Rate: Indicates the percentage of active LDS cycles that were spent
@@ -58,7 +62,7 @@ Panel Config:
         Access Rate:
           value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
           unit: Pct of Peak
-        Theoretical Bandwidth (% of Peak):
+        Theoretical Bandwidth Utilization:
           value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
             / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) * 0.00128)))
           unit: Pct of Peak
@@ -86,12 +90,12 @@ Panel Config:
           unit: (Instr  + $normUnit)
         Theoretical Bandwidth:
           avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
+            / (End_Timestamp - Start_Timestamp)))
           min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
+            / (End_Timestamp - Start_Timestamp)))
           max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
-          unit: (Bytes  + $normUnit)
+            / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         LDS Latency:
           avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else
             None))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1300_instruction_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1300_instruction_cache.yaml
index a53c23691f..aeda9bc6c7 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1300_instruction_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1300_instruction_cache.yaml
@@ -3,15 +3,18 @@ Panel Config:
   id: 1300
   title: Instruction Cache
   metrics_description:
-    Bandwidth: The number of bytes looked up in the L1I cache, as a percent of the
-      peak theoretical bandwidth. Calculated as the ratio of L1I requests over the
-      total L1I cycles.
+    Bandwidth Utilization: The number of bytes looked up in the L1I cache, as a percent
+      of the peak theoretical bandwidth. Calculated as the ratio of L1I requests over
+      the total L1I cycles.
     Cache Hit Rate: The percent of L1I requests that hit [#l1i-cache]_ on a previously
       loaded line the cache. Calculated as the ratio of the number of L1I requests
       that hit over the number of all L1I requests.
-    L1I-L2 Bandwidth: "The percent of the peak theoretical L1I \u2192 L2 cache request\
-      \ bandwidth achieved. Calculated as the ratio of the total number of requests\
-      \ from the L1I to the L2 cache over the total L1I-L2 interface cycles."
+    L1I-L2 Bandwidth Utilization: "The percent of the peak theoretical L1I \u2192\
+      \ L2 cache request bandwidth achieved. Calculated as the ratio of the total\
+      \ number of requests from the L1I to the L2 cache over the total L1I-L2 interface\
+      \ cycles."
+    L1I-L2 Bandwidth: Total number of bytes transferred across the L1I-L2 interface,
+      divided by total duration.
     Req: The total number of requests made to the L1I per normalization-unit
     Hits: The total number of L1I requests that hit on a previously loaded cache line,
       per normalization-unit.
@@ -30,7 +33,7 @@ Panel Config:
         value: Avg
         unit: Unit
       metric:
-        Bandwidth:
+        Bandwidth Utilization:
           value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * (End_Timestamp
             - Start_Timestamp))))
           unit: Pct of Peak
@@ -38,7 +41,7 @@ Panel Config:
           value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
             + SQC_ICACHE_MISSES_DUPLICATE)))
           unit: Pct of Peak
-        L1I-L2 Bandwidth:
+        L1I-L2 Bandwidth Utilization:
           value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
             * (End_Timestamp - Start_Timestamp))))
           unit: Pct of Peak
@@ -100,7 +103,7 @@ Panel Config:
         unit: Unit
       metric:
         L1I-L2 Bandwidth:
-          avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
-          min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
-          max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
-          unit: (Bytes + $normUnit)
+          avg: AVG(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          min: MIN(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          max: MAX(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1400_scalar_l1_data_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1400_scalar_l1_data_cache.yaml
index d43157ce8e..282b97ad1f 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1400_scalar_l1_data_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1400_scalar_l1_data_cache.yaml
@@ -3,14 +3,17 @@ Panel Config:
   id: 1400
   title: Scalar L1 Data Cache
   metrics_description:
-    Bandwidth: The number of bytes looked up in the sL1D cache, as a percent of the
-      peak theoretical bandwidth. Calculated as the ratio of sL1D requests over the
-      total sL1D cycles.
+    Bandwidth Utilization: The number of bytes looked up in the sL1D cache, as a percent
+      of the peak theoretical bandwidth. Calculated as the ratio of sL1D requests
+      over the total sL1D cycles.
     Cache Hit Rate: Indicates the percent of sL1D requests that hit on a previously
       loaded line the cache. The ratio of the number of sL1D requests that hit over
       the number of all sL1D requests.
+    sL1D-L2 BW Utilization: The percentage of the peak theoretical sL1D - L2 interface
+      bandwidth achieved. Calculated from the total number of bytes read from, written
+      to, or atomically updated across the sL1D - L2 interface.
     sL1D-L2 BW: "The total number of bytes read from, written to, or atomically updated\
-      \ across the sL1D\u2194L2 interface, per normalization unit. Note that sL1D\
+      \ across the sL1D\u2194L2 interface, divided by total duration. Note that sL1D\
       \ writes and atomics are typically unused on current CDNA accelerators, so in\
       \ the majority of cases this can be interpreted as an sL1D\u2192L2 read bandwidth."
     Req: The total number of requests, of any size or type, made to the sL1D per normalization
@@ -51,7 +54,7 @@ Panel Config:
         value: Avg
         unit: Unit
       metric:
-        Bandwidth:
+        Bandwidth Utilization:
           value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * (End_Timestamp
             - Start_Timestamp))))
           unit: Pct of Peak
@@ -60,7 +63,7 @@ Panel Config:
             + SQC_DCACHE_MISSES_DUPLICATE)) if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES
             + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
           unit: Pct of Peak
-        sL1D-L2 BW:
+        sL1D-L2 BW Utilization:
           value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
             * 100000) / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
           unit: Pct of Peak
@@ -158,12 +161,12 @@ Panel Config:
       metric:
         sL1D-L2 BW:
           avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
-          unit: (Bytes + $normUnit)
+            * 64)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Read Req:
           avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
           min: MIN((SQC_TC_DATA_READ_REQ / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1600_vector_l1_data_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1600_vector_l1_data_cache.yaml
index 708bbafe14..db745209b7 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1600_vector_l1_data_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1600_vector_l1_data_cache.yaml
@@ -5,12 +5,12 @@ Panel Config:
   metrics_description:
     Hit rate: The ratio of the number of vL1D cache line requests that hit in vL1D
       cache over the total number of cache line requests to the vL1D Cache RAM.
-    Bandwidth: The number of bytes looked up in the vL1D cache as a result of VMEM
-      instructions, as a percent of the peak theoretical bandwidth achievable on the
-      specific accelerator. The number of bytes is calculated as the number of cache
-      lines requested multiplied by the cache line size. This value does not consider
-      partial requests, so for instance, if only a single value is requested in a
-      cache line, the data movement will still be counted as a full cache line.
+    Bandwidth Utilization: The number of bytes looked up in the vL1D cache as a result
+      of VMEM instructions, as a percent of the peak theoretical bandwidth achievable
+      on the specific accelerator. The number of bytes is calculated as the number
+      of cache lines requested multiplied by the cache line size. This value does
+      not consider partial requests, so for instance, if only a single value is requested
+      in a cache line, the data movement will still be counted as a full cache line.
     Utilization: Indicates how busy the vL1D Cache RAM was during the kernel execution.
       The number of cycles where the vL1D Cache RAM is actively processing any request
       divided by the number of cycles where the vL1D is active.
@@ -42,11 +42,11 @@ Panel Config:
     Atomic Req: The total number of incoming atomic requests from the address processing
       unit after coalescing per normalization unit.
     Cache BW: The number of bytes looked up in the vL1D cache as a result of VMEM
-      instructions per normalization unit. The number of bytes is calculated as the
-      number of cache lines requested multiplied by the cache line size.  This value
-      does not consider partial requests, so for instance, if only a single value
-      is requested in a cache line, the data movement will still be counted as a full
-      cache line.
+      instructions divided by total duration. The number of bytes is calculated as
+      the number of cache lines requested multiplied by the cache line size.  This
+      value does not consider partial requests, so for instance, if only a single
+      value is requested in a cache line, the data movement will still be counted
+      as a full cache line.
     Cache Hit Rate: The ratio of the number of vL1D cache line requests that hit in
       vL1D cache over the total number of cache line requests to the vL1D Cache RAM.
     Cache Accesses: The total number of cache line lookups in the vL1D.
@@ -57,7 +57,7 @@ Panel Config:
       command during the kernel's execution per normalization unit. This may be triggered
       by, for instance, the buffer_wbinvl1 instruction.
     L1-L2 BW: The number of bytes transferred across the vL1D-L2 interface as a result
-      of VMEM instructions, per normalization unit. The number of bytes is calculated
+      of VMEM instructions, divided by total duration. The number of bytes is calculated
       as the number of cache lines requested multiplied by the cache line size. This
       value does not consider partial requests, so for instance, if only a single
       value is requested in a cache line, the data movement will still be counted
@@ -128,7 +128,7 @@ Panel Config:
             / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum != 0)
             else None))
           unit: Pct of Peak
-        Bandwidth:
+        Bandwidth Utilization:
           value: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp
             - Start_Timestamp)))) / ((($max_sclk / 1000) * 128) * $cu_per_gpu))
           unit: Pct of Peak
@@ -201,10 +201,10 @@ Panel Config:
             / $denom))
           unit: (Req  + $normUnit)
         Cache BW:
-          avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-          min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-          max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-          unit: (Bytes + $normUnit)
+          avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+          min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+          max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Cache Hit Rate:
           avg: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
             + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
@@ -242,12 +242,12 @@ Panel Config:
           unit: (Req + $normUnit)
         L1-L2 BW:
           avg: AVG(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum + TCP_TCC_ATOMIC_WITH_RET_REQ_sum
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
           min: MIN(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum + TCP_TCC_ATOMIC_WITH_RET_REQ_sum
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
           max: MAX(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum + TCP_TCC_ATOMIC_WITH_RET_REQ_sum
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
-          unit: (Bytes + $normUnit)
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         L1-L2 Read:
           avg: AVG((TCP_TCC_READ_REQ_sum / $denom))
           min: MIN((TCP_TCC_READ_REQ_sum / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml
index 0a72362ea7..efff4769b6 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml
@@ -20,8 +20,8 @@ Panel Config:
     HBM Bandwidth: Maximum theoretical bandwidth of the accelerator's local high-bandwidth
       memory (HBM) per unit time. This value is calculated as the number of HBM channels
       multiplied by the HBM channel width multiplied by the HBM clock frequency.
-    Read BW: The total number of bytes read by the L2 cache from Infinity Fabric per
-      normalization unit.
+    Read BW: The total number of bytes read by the L2 cache from Infinity Fabric divided
+      by total duration.
     HBM Read Traffic: The percent of read requests generated by the L2 cache that
       are routed to the accelerator's local high-bandwidth memory (HBM). This breakdown
       does not consider the size of the request (meaning that 32B and 64B requests
@@ -42,9 +42,9 @@ Panel Config:
       as a single request), so this metric only approximates the percent of the L2-Fabric
       read bandwidth directed to an uncached memory location.
     Write and Atomic BW: The total number of bytes written by the L2 over Infinity
-      Fabric by write and atomic operations per normalization unit. Note that on current
-      CDNA accelerators, such as the MI2XX, requests are only considered atomic by
-      Infinity Fabric if they are targeted at non-write-cacheable memory, for example,
+      Fabric by write and atomic operations divided by total duration. Note that on
+      current CDNA accelerators, such as the MI2XX, requests are only considered atomic
+      by Infinity Fabric if they are targeted at non-write-cacheable memory, for example,
       fine-grained memory allocations or uncached memory allocations on the MI2XX.
     HBM Write and Atomic Traffic: The percent of write and atomic requests generated
       by the L2 cache that are routed to the accelerator's local high-bandwidth memory
@@ -82,17 +82,17 @@ Panel Config:
     Atomic Latency: The time-averaged number of cycles atomic requests spent in Infinity
       Fabric before a completion acknowledgement (atomic without return value) or
       data (atomic with return value) was returned to the L2.
-    Bandwidth: The number of bytes looked up in the L2 cache, per normalization unit.
+    Bandwidth: The number of bytes looked up in the L2 cache, divided by total duration.
       The number of bytes is calculated as the number of cache lines requested multiplied
       by the cache line size. This value does not consider partial requests, so for
       example, if only a single value is requested in a cache line, the data movement
       will still be counted as a full cache line.
     Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests,
-      per normalization unit.
+      divided by total duration.
     Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests,
-      per normalization unit.
+      divided by total duration.
     Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests,
-      per normalization unit.
+      divided by total duration.
     Req: The total number of incoming requests to the L2 from all clients for all
       request types, per normalization unit.
     Read Req: The total number of read requests to the L2 from all clients.
@@ -150,11 +150,11 @@ Panel Config:
       64B of data from any source other than the accelerator's local HBM, per normalization
       unit.
     Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     "Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to
       write or atomically update 32B of data to any memory location, per normalization
       unit.
@@ -171,17 +171,17 @@ Panel Config:
       write or atomically update 32B or 64B of data in any memory location other than
       the accelerator's local HBM, per normalization unit.
     Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to
-      PCIe traffic, per normalization unit.
+      PCIe traffic, divided by total duration.
     "Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to
-      PCIe traffic, per normalization unit.
+      PCIe traffic, divided by total duration.
     "Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to
-      HBM traffic, per normalization unit.
+      HBM traffic, divided by total duration.
     Atomic: The total number of L2 requests to Infinity Fabric to atomically update
       32B or 64B of data in any memory location, per normalization unit. See Request
       flow for more detail. Note that on current CDNA accelerators, such as the MI2XX,
@@ -258,12 +258,15 @@ Panel Config:
       metric:
         Read BW:
           avg: AVG(((128 * TCC_BUBBLE_sum + 64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum
-            - TCC_EA0_RDREQ_32B_sum) + 32 * TCC_EA0_RDREQ_32B_sum) / $denom))
+            - TCC_EA0_RDREQ_32B_sum) + 32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp
+            - Start_Timestamp)))
           min: MIN(((128 * TCC_BUBBLE_sum + 64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum
-            - TCC_EA0_RDREQ_32B_sum) + 32 * TCC_EA0_RDREQ_32B_sum) / $denom))
+            - TCC_EA0_RDREQ_32B_sum) + 32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp
+            - Start_Timestamp)))
           max: MAX(((128 * TCC_BUBBLE_sum + 64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum
-            - TCC_EA0_RDREQ_32B_sum) + 32 * TCC_EA0_RDREQ_32B_sum) / $denom))
-          unit: (Bytes  + $normUnit)
+            - TCC_EA0_RDREQ_32B_sum) + 32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp
+            - Start_Timestamp)))
+          unit: Gbps
         HBM Read Traffic:
           avg: AVG((100 * (TCC_EA0_RDREQ_DRAM_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
             != 0) else None))
@@ -290,12 +293,12 @@ Panel Config:
           unit: pct
         Write and Atomic BW:
           avg: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-            * 32)) / $denom))
+            * 32)) / (End_Timestamp - Start_Timestamp)))
           min: MIN((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-            * 32)) / $denom))
+            * 32)) / (End_Timestamp - Start_Timestamp)))
           max: MAX((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-            * 32)) / $denom))
-          unit: (Bytes  + $normUnit)
+            * 32)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         HBM Write and Atomic Traffic:
           avg: AVG((100 * (TCC_EA0_WRREQ_DRAM_sum / TCC_EA0_WRREQ_sum) if (TCC_EA0_WRREQ_sum
             != 0) else None))
@@ -363,10 +366,10 @@ Panel Config:
         unit: Unit
       metric:
         Bandwidth:
-          avg: AVG((TCC_REQ_sum * 128) / $denom)
-          min: MIN((TCC_REQ_sum * 128) / $denom)
-          max: MAX((TCC_REQ_sum * 128) / $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          min: MIN((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          max: MAX((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Req:
           avg: AVG((TCC_REQ_sum / $denom))
           min: MIN((TCC_REQ_sum / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1200_local_data_share_lds.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1200_local_data_share_lds.yaml
index 0609c0a203..c334698661 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1200_local_data_share_lds.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1200_local_data_share_lds.yaml
@@ -11,8 +11,12 @@ Panel Config:
       instructions, averaged over the lifetime of the kernel. Calculated as the ratio
       of the total number of cycles spent by the scheduler issuing LDS instructions
       over the total CU cycles.
+    Theoretical Bandwidth Utilization: Indicates the maximum number of bytes that
+      could have been loaded from, stored to, or atomically updated in the LDS, as
+      a percentage of the theoretical peak. Does not take into account the execution
+      mask of the wavefront when the instruction was executed.
     Theoretical Bandwidth: Indicates the maximum amount of bytes that could have been
-      loaded from, stored to, or atomically updated in the LDS per normalization unit.
+      loaded from, stored to, or atomically updated in the LDS divided by total duration.
       Does not take into account the execution mask of the wavefront when the instruction
       was executed.
     Bank Conflict Rate: Indicates the percentage of active LDS cycles that were spent
@@ -58,7 +62,7 @@ Panel Config:
         Access Rate:
           value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
           unit: Pct of Peak
-        Theoretical Bandwidth (% of Peak):
+        Theoretical Bandwidth Utilization:
           value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
             / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) * 0.00128)))
           unit: Pct of Peak
@@ -116,12 +120,12 @@ Panel Config:
           units: Gbps
         Theoretical Bandwidth:
           avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
+            / (End_Timestamp - Start_Timestamp)))
           min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
+            / (End_Timestamp - Start_Timestamp)))
           max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-            / $denom))
-          unit: (Bytes  + $normUnit)
+            / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         LDS Latency:
           avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else
             None))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1300_instruction_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1300_instruction_cache.yaml
index a53c23691f..aeda9bc6c7 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1300_instruction_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1300_instruction_cache.yaml
@@ -3,15 +3,18 @@ Panel Config:
   id: 1300
   title: Instruction Cache
   metrics_description:
-    Bandwidth: The number of bytes looked up in the L1I cache, as a percent of the
-      peak theoretical bandwidth. Calculated as the ratio of L1I requests over the
-      total L1I cycles.
+    Bandwidth Utilization: The number of bytes looked up in the L1I cache, as a percent
+      of the peak theoretical bandwidth. Calculated as the ratio of L1I requests over
+      the total L1I cycles.
     Cache Hit Rate: The percent of L1I requests that hit [#l1i-cache]_ on a previously
       loaded line the cache. Calculated as the ratio of the number of L1I requests
       that hit over the number of all L1I requests.
-    L1I-L2 Bandwidth: "The percent of the peak theoretical L1I \u2192 L2 cache request\
-      \ bandwidth achieved. Calculated as the ratio of the total number of requests\
-      \ from the L1I to the L2 cache over the total L1I-L2 interface cycles."
+    L1I-L2 Bandwidth Utilization: "The percent of the peak theoretical L1I \u2192\
+      \ L2 cache request bandwidth achieved. Calculated as the ratio of the total\
+      \ number of requests from the L1I to the L2 cache over the total L1I-L2 interface\
+      \ cycles."
+    L1I-L2 Bandwidth: Total number of bytes transferred across the L1I - L2 interface,
+      divided by total duration.
     Req: The total number of requests made to the L1I per normalization-unit
     Hits: The total number of L1I requests that hit on a previously loaded cache line,
       per normalization-unit.
@@ -30,7 +33,7 @@ Panel Config:
         value: Avg
         unit: Unit
       metric:
-        Bandwidth:
+        Bandwidth Utilization:
           value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * (End_Timestamp
             - Start_Timestamp))))
           unit: Pct of Peak
@@ -38,7 +41,7 @@ Panel Config:
           value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
             + SQC_ICACHE_MISSES_DUPLICATE)))
           unit: Pct of Peak
-        L1I-L2 Bandwidth:
+        L1I-L2 Bandwidth Utilization:
           value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
             * (End_Timestamp - Start_Timestamp))))
           unit: Pct of Peak
@@ -100,7 +103,7 @@ Panel Config:
         unit: Unit
       metric:
         L1I-L2 Bandwidth:
-          avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
-          min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
-          max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
-          unit: (Bytes + $normUnit)
+          avg: AVG(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          min: MIN(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          max: MAX(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1400_scalar_l1_data_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1400_scalar_l1_data_cache.yaml
index d43157ce8e..282b97ad1f 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1400_scalar_l1_data_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1400_scalar_l1_data_cache.yaml
@@ -3,14 +3,17 @@ Panel Config:
   id: 1400
   title: Scalar L1 Data Cache
   metrics_description:
-    Bandwidth: The number of bytes looked up in the sL1D cache, as a percent of the
-      peak theoretical bandwidth. Calculated as the ratio of sL1D requests over the
-      total sL1D cycles.
+    Bandwidth Utilization: The number of bytes looked up in the sL1D cache, as a percent
+      of the peak theoretical bandwidth. Calculated as the ratio of sL1D requests
+      over the total sL1D cycles.
     Cache Hit Rate: Indicates the percent of sL1D requests that hit on a previously
       loaded line the cache. The ratio of the number of sL1D requests that hit over
       the number of all sL1D requests.
+    sL1D-L2 BW Utilization: The percentage of the peak theoretical sL1D - L2 interface
+      bandwidth achieved. Calculated as the total number of bytes read from, written
+      to, or atomically updated across the sL1D - L2 interface.
     sL1D-L2 BW: "The total number of bytes read from, written to, or atomically updated\
-      \ across the sL1D\u2194L2 interface, per normalization unit. Note that sL1D\
+      \ across the sL1D\u2194L2 interface, divided by total duration. Note that sL1D\
       \ writes and atomics are typically unused on current CDNA accelerators, so in\
       \ the majority of cases this can be interpreted as an sL1D\u2192L2 read bandwidth."
     Req: The total number of requests, of any size or type, made to the sL1D per normalization
@@ -51,7 +54,7 @@ Panel Config:
         value: Avg
         unit: Unit
       metric:
-        Bandwidth:
+        Bandwidth Utilization:
           value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * (End_Timestamp
             - Start_Timestamp))))
           unit: Pct of Peak
@@ -60,7 +63,7 @@ Panel Config:
             + SQC_DCACHE_MISSES_DUPLICATE)) if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES
             + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
           unit: Pct of Peak
-        sL1D-L2 BW:
+        sL1D-L2 BW Utilization:
           value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
             * 100000) / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
           unit: Pct of Peak
@@ -158,12 +161,12 @@ Panel Config:
       metric:
         sL1D-L2 BW:
           avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
+            * 64)) / (End_Timestamp - Start_Timestamp)))
           max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-            * 64)) / $denom))
-          unit: (Bytes + $normUnit)
+            * 64)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Read Req:
           avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
           min: MIN((SQC_TC_DATA_READ_REQ / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1600_vector_l1_data_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1600_vector_l1_data_cache.yaml
index a196aa64f0..f95e3fcb1f 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1600_vector_l1_data_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1600_vector_l1_data_cache.yaml
@@ -5,12 +5,12 @@ Panel Config:
   metrics_description:
     Hit rate: The ratio of the number of vL1D cache line requests that hit in vL1D
       cache over the total number of cache line requests to the vL1D Cache RAM.
-    Bandwidth: The number of bytes looked up in the vL1D cache as a result of VMEM
-      instructions, as a percent of the peak theoretical bandwidth achievable on the
-      specific accelerator. The number of bytes is calculated as the number of cache
-      lines requested multiplied by the cache line size. This value does not consider
-      partial requests, so for instance, if only a single value is requested in a
-      cache line, the data movement will still be counted as a full cache line.
+    Bandwidth Utilization: The number of bytes looked up in the vL1D cache as a result
+      of VMEM instructions, as a percent of the peak theoretical bandwidth achievable
+      on the specific accelerator. The number of bytes is calculated as the number
+      of cache lines requested multiplied by the cache line size. This value does
+      not consider partial requests, so for instance, if only a single value is requested
+      in a cache line, the data movement will still be counted as a full cache line.
     Utilization: Indicates how busy the vL1D Cache RAM was during the kernel execution.
       The number of cycles where the vL1D Cache RAM is actively processing any request
       divided by the number of cycles where the vL1D is active.
@@ -42,11 +42,11 @@ Panel Config:
     Atomic Req: The total number of incoming atomic requests from the address processing
       unit after coalescing per normalization unit.
     Cache BW: The number of bytes looked up in the vL1D cache as a result of VMEM
-      instructions per normalization unit. The number of bytes is calculated as the
-      number of cache lines requested multiplied by the cache line size.  This value
-      does not consider partial requests, so for instance, if only a single value
-      is requested in a cache line, the data movement will still be counted as a full
-      cache line.
+      instructions divided by total duration. The number of bytes is calculated as
+      the number of cache lines requested multiplied by the cache line size.  This
+      value does not consider partial requests, so for instance, if only a single
+      value is requested in a cache line, the data movement will still be counted
+      as a full cache line.
     Cache Hit Rate: The ratio of the number of vL1D cache line requests that hit in
       vL1D cache over the total number of cache line requests to the vL1D Cache RAM.
     Cache Accesses: The total number of cache line lookups in the vL1D.
@@ -57,7 +57,7 @@ Panel Config:
       command during the kernel's execution per normalization unit. This may be triggered
       by, for instance, the buffer_wbinvl1 instruction.
     L1-L2 BW: The number of bytes transferred across the vL1D-L2 interface as a result
-      of VMEM instructions, per normalization unit. The number of bytes is calculated
+      of VMEM instructions, divided by total duration. The number of bytes is calculated
       as the number of cache lines requested multiplied by the cache line size. This
       value does not consider partial requests, so for instance, if only a single
       value is requested in a cache line, the data movement will still be counted
@@ -128,7 +128,7 @@ Panel Config:
             / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum != 0)
             else None))
           unit: Pct of Peak
-        Bandwidth:
+        Bandwidth Utilization:
           value: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp
             - Start_Timestamp)))) / ((($max_sclk / 1000) * 128) * $cu_per_gpu))
           unit: Pct of Peak
@@ -216,10 +216,10 @@ Panel Config:
             / $denom))
           unit: (Req  + $normUnit)
         Cache BW:
-          avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-          min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-          max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-          unit: (Bytes + $normUnit)
+          avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+          min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+          max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Cache Hit Rate:
           avg: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
             + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
@@ -257,12 +257,12 @@ Panel Config:
           unit: (Req + $normUnit)
         L1-L2 BW:
           avg: AVG(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum + TCP_TCC_ATOMIC_WITH_RET_REQ_sum
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
           min: MIN(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum + TCP_TCC_ATOMIC_WITH_RET_REQ_sum
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
           max: MAX(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum + TCP_TCC_ATOMIC_WITH_RET_REQ_sum
-            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
-          unit: (Bytes + $normUnit)
+            + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         Tag RAM 0 Req:
           avg: AVG((TCP_TAGRAM0_REQ_sum / $denom))
           min: MIN((TCP_TAGRAM0_REQ_sum / $denom))
diff --git a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml
index c354429c0e..15ba2f4745 100644
--- a/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml
+++ b/projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml
@@ -20,8 +20,8 @@ Panel Config:
     HBM Bandwidth: Maximum theoretical bandwidth of the accelerator's local high-bandwidth
       memory (HBM) per unit time. This value is calculated as the number of HBM channels
       multiplied by the HBM channel width multiplied by the HBM clock frequency.
-    Read BW: The total number of bytes read by the L2 cache from Infinity Fabric per
-      normalization unit.
+    Read BW: The total number of bytes read by the L2 cache from Infinity Fabric divided
+      by total duration.
     HBM Read Traffic: The percent of read requests generated by the L2 cache that
       are routed to the accelerator's local high-bandwidth memory (HBM). This breakdown
       does not consider the size of the request (meaning that 32B and 64B requests
@@ -42,9 +42,9 @@ Panel Config:
       as a single request), so this metric only approximates the percent of the L2-Fabric
       read bandwidth directed to an uncached memory location.
     Write and Atomic BW: The total number of bytes written by the L2 over Infinity
-      Fabric by write and atomic operations per normalization unit. Note that on current
-      CDNA accelerators, such as the MI2XX, requests are only considered atomic by
-      Infinity Fabric if they are targeted at non-write-cacheable memory, for example,
+      Fabric by write and atomic operations divided by total duration. Note that on
+      current CDNA accelerators, such as the MI2XX, requests are only considered atomic
+      by Infinity Fabric if they are targeted at non-write-cacheable memory, for example,
       fine-grained memory allocations or uncached memory allocations on the MI2XX.
     HBM Write and Atomic Traffic: The percent of write and atomic requests generated
       by the L2 cache that are routed to the accelerator's local high-bandwidth memory
@@ -82,17 +82,17 @@ Panel Config:
     Atomic Latency: The time-averaged number of cycles atomic requests spent in Infinity
       Fabric before a completion acknowledgement (atomic without return value) or
       data (atomic with return value) was returned to the L2.
-    Bandwidth: The number of bytes looked up in the L2 cache, per normalization unit.
+    Bandwidth: The number of bytes looked up in the L2 cache, divided by total duration.
       The number of bytes is calculated as the number of cache lines requested multiplied
       by the cache line size. This value does not consider partial requests, so for
       example, if only a single value is requested in a cache line, the data movement
       will still be counted as a full cache line.
     Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests,
-      per normalization unit.
+      divided by total duration.
     Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests,
-      per normalization unit.
+      divided by total duration.
     Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests,
-      per normalization unit.
+      divided by total duration.
     Req: The total number of incoming requests to the L2 from all clients for all
       request types, per normalization unit.
     Read Req: The total number of read requests to the L2 from all clients.
@@ -150,11 +150,11 @@ Panel Config:
       64B of data from any source other than the accelerator's local HBM, per normalization
       unit.
     Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     "Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to
       write or atomically update 32B of data to any memory location, per normalization
       unit.
@@ -171,17 +171,17 @@ Panel Config:
       write or atomically update 32B or 64B of data in any memory location other than
       the accelerator's local HBM, per normalization unit.
     Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to
-      PCIe traffic, per normalization unit.
+      PCIe traffic, divided by total duration.
     "Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM
-      traffic, per normalization unit.
+      traffic, divided by total duration.
     Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to
-      PCIe traffic, per normalization unit.
+      PCIe traffic, divided by total duration.
     "Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic
-      requests due to Infinity Fabric traffic, per normalization unit.
+      requests due to Infinity Fabric traffic, divided by total duration.
     Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to
-      HBM traffic, per normalization unit.
+      HBM traffic, divided by total duration.
     Atomic: The total number of L2 requests to Infinity Fabric to atomically update
       32B or 64B of data in any memory location, per normalization unit. See Request
       flow for more detail. Note that on current CDNA accelerators, such as the MI2XX,
@@ -257,12 +257,12 @@ Panel Config:
       metric:
         Read BW:
           avg: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64) +
-            (TCC_EA0_RDREQ_128B_sum * 128)) / $denom))
+            (TCC_EA0_RDREQ_128B_sum * 128)) / (End_Timestamp - Start_Timestamp)))
           min: MIN((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64) +
-            (TCC_EA0_RDREQ_128B_sum * 128)) / $denom))
+            (TCC_EA0_RDREQ_128B_sum * 128)) / (End_Timestamp - Start_Timestamp)))
           max: MAX((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64) +
-            (TCC_EA0_RDREQ_128B_sum * 128)) / $denom))
-          unit: (Bytes  + $normUnit)
+            (TCC_EA0_RDREQ_128B_sum * 128)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         HBM Read Traffic:
           avg: AVG((100 * (TCC_EA0_RDREQ_DRAM_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
             != 0) else None))
@@ -289,12 +289,12 @@ Panel Config:
           unit: pct
         Write and Atomic BW:
           avg: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-            * 32)) / $denom))
+            * 32)) / (End_Timestamp - Start_Timestamp)))
           min: MIN((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-            * 32)) / $denom))
+            * 32)) / (End_Timestamp - Start_Timestamp)))
           max: MAX((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-            * 32)) / $denom))
-          unit: (Bytes  + $normUnit)
+            * 32)) / (End_Timestamp - Start_Timestamp)))
+          unit: Gbps
         HBM Write and Atomic Traffic:
           avg: AVG((100 * (TCC_EA0_WRREQ_DRAM_sum / TCC_EA0_WRREQ_sum) if (TCC_EA0_WRREQ_sum
             != 0) else None))
@@ -381,25 +381,25 @@ Panel Config:
         unit: Unit
       metric:
         Bandwidth:
-          avg: AVG((TCC_REQ_sum * 128) / $denom)
-          min: MIN((TCC_REQ_sum * 128) / $denom)
-          max: MAX((TCC_REQ_sum * 128) / $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          min: MIN((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          max: MAX((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Read Bandwidth:
-          avg: AVG(TCC_READ_SECTORS_sum * 32/ $denom)
-          min: MIN(TCC_READ_SECTORS_sum * 32/ $denom)
-          max: MAX(TCC_READ_SECTORS_sum * 32/ $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG(TCC_READ_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+          min: MIN(TCC_READ_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+          max: MAX(TCC_READ_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Write Bandwidth:
-          avg: AVG(TCC_WRITE_SECTORS_sum * 32/ $denom)
-          min: MIN(TCC_WRITE_SECTORS_sum * 32/ $denom)
-          max: MAX(TCC_WRITE_SECTORS_sum * 32/ $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG(TCC_WRITE_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+          min: MIN(TCC_WRITE_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+          max: MAX(TCC_WRITE_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Atomic Bandwidth:
-          avg: AVG(TCC_ATOMIC_SECTORS_sum * 32/ $denom)
-          min: MIN(TCC_ATOMIC_SECTORS_sum * 32/ $denom)
-          max: MAX(TCC_ATOMIC_SECTORS_sum * 32/ $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG(TCC_ATOMIC_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+          min: MIN(TCC_ATOMIC_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+          max: MAX(TCC_ATOMIC_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Req:
           avg: AVG((TCC_REQ_sum / $denom))
           min: MIN((TCC_REQ_sum / $denom))
@@ -653,20 +653,20 @@ Panel Config:
           max: MAX((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
           unit: (Req  + $normUnit)
         Read Bandwidth - PCIe:
-          avg: AVG(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
-          min: MIN(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
-          max: MAX(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG(TCC_EA0_RDREQ_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+          min: MIN(TCC_EA0_RDREQ_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+          max: MAX(TCC_EA0_RDREQ_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         "Read Bandwidth - Infinity Fabric\u2122":
-          avg: AVG(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom)
-          min: MIN(TCC_EA0_RDREQ_GMI_32B_sum  * 32/ $denom)
-          max: MAX(TCC_EA0_RDREQ_GMI_32B_sum  * 32/ $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG(TCC_EA0_RDREQ_GMI_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+          min: MIN(TCC_EA0_RDREQ_GMI_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          max: MAX(TCC_EA0_RDREQ_GMI_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Read Bandwidth - HBM:
-          avg: AVG(TCC_EA0_RDREQ_DRAM_32B_sum  * 32/ $denom)
-          min: MIN(TCC_EA0_RDREQ_DRAM_32B_sum  * 32/ $denom)
-          max: MAX(TCC_EA0_RDREQ_DRAM_32B_sum  * 32/ $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG(TCC_EA0_RDREQ_DRAM_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          min: MIN(TCC_EA0_RDREQ_DRAM_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          max: MAX(TCC_EA0_RDREQ_DRAM_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Write and Atomic (32B):
           avg: AVG(((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom))
           min: MIN(((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom))
@@ -693,20 +693,20 @@ Panel Config:
           max: MAX((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
           unit: (Req  + $normUnit)
         Write Bandwidth - PCIe:
-          avg: AVG(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
-          min: MIN(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
-          max: MAX(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+          min: MIN(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+          max: MAX(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         "Write Bandwidth - Infinity Fabric\u2122":
-          avg: AVG(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom)
-          min: MIN(TCC_EA0_WRREQ_WRITE_GMI_32B_sum  * 32/ $denom)
-          max: MAX(TCC_EA0_WRREQ_WRITE_GMI_32B_sum  * 32/ $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+          min: MIN(TCC_EA0_WRREQ_WRITE_GMI_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          max: MAX(TCC_EA0_WRREQ_WRITE_GMI_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Write Bandwidth - HBM:
-          avg: AVG(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum  * 32/ $denom)
-          min: MIN(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum  * 32/ $denom)
-          max: MAX(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum  * 32/ $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          min: MIN(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          max: MAX(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Atomic:
           avg: AVG((TCC_EA0_ATOMIC_sum / $denom))
           min: MIN((TCC_EA0_ATOMIC_sum / $denom))
@@ -718,17 +718,17 @@ Panel Config:
           max: MAX((TCC_EA0_WRREQ_ATOMIC_DRAM_sum / $denom))
           unit: (Req  + $normUnit)
         Atomic Bandwidth - PCIe:
-          avg: AVG(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum  * 32/ $denom)
-          min: MIN(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum  * 32/ $denom)
-          max: MAX(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum  * 32/ $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          min: MIN(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          max: MAX(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         "Atomic Bandwidth - Infinity Fabric\u2122":
-          avg: AVG(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum  * 32/ $denom)
-          min: MIN(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum  * 32/ $denom)
-          max: MAX(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum  * 32/ $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          min: MIN(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          max: MAX(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          unit: Gbps
         Atomic Bandwidth - HBM:
-          avg: AVG(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum  * 32/ $denom)
-          min: MIN(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum  * 32/ $denom)
-          max: MAX(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum  * 32/ $denom)
-          unit: (Bytes + $normUnit)
+          avg: AVG(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          min: MIN(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          max: MAX(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+          unit: Gbps
diff --git a/projects/rocprofiler-compute/utils/autogen_hash.yaml b/projects/rocprofiler-compute/utils/autogen_hash.yaml
index 2c50e5470b..a078e5122d 100644
--- a/projects/rocprofiler-compute/utils/autogen_hash.yaml
+++ b/projects/rocprofiler-compute/utils/autogen_hash.yaml
@@ -59,42 +59,42 @@ src/rocprof_compute_soc/analysis_configs/gfx940/1100_compute_units_compute_pipel
 src/rocprof_compute_soc/analysis_configs/gfx941/1100_compute_units_compute_pipeline.yaml: 4a25b6abf24f4a622fde1a3cfe65fe7236cf1e626fc2444667883997564cea1e
 src/rocprof_compute_soc/analysis_configs/gfx942/1100_compute_units_compute_pipeline.yaml: 4a25b6abf24f4a622fde1a3cfe65fe7236cf1e626fc2444667883997564cea1e
 src/rocprof_compute_soc/analysis_configs/gfx950/1100_compute_units_compute_pipeline.yaml: 4ef656938f8a9667ae872db522855856469accff9cb42bc0444b469346760dfd
-src/rocprof_compute_soc/analysis_configs/gfx908/1200_local_data_share_lds.yaml: 80f3ca3ea15de009c5278ea20566d8c08d62e0087971e5f9aeae1c89df1dd898
-src/rocprof_compute_soc/analysis_configs/gfx90a/1200_local_data_share_lds.yaml: 80f3ca3ea15de009c5278ea20566d8c08d62e0087971e5f9aeae1c89df1dd898
-src/rocprof_compute_soc/analysis_configs/gfx940/1200_local_data_share_lds.yaml: 3bbf3928288990863cfe72fd00a28785fde0a36f103f5381df578aae2eb28be0
-src/rocprof_compute_soc/analysis_configs/gfx941/1200_local_data_share_lds.yaml: 3bbf3928288990863cfe72fd00a28785fde0a36f103f5381df578aae2eb28be0
-src/rocprof_compute_soc/analysis_configs/gfx942/1200_local_data_share_lds.yaml: 3bbf3928288990863cfe72fd00a28785fde0a36f103f5381df578aae2eb28be0
-src/rocprof_compute_soc/analysis_configs/gfx950/1200_local_data_share_lds.yaml: 505163510a3b0132ee487f9e024188de2deb97d0f72e3d729b95f86e7c3434b3
-src/rocprof_compute_soc/analysis_configs/gfx908/1300_instruction_cache.yaml: 2437e2f8191675c4116d0da1db291f3ad2715281ea812e9fdd6506cf213e5d1b
-src/rocprof_compute_soc/analysis_configs/gfx90a/1300_instruction_cache.yaml: 2437e2f8191675c4116d0da1db291f3ad2715281ea812e9fdd6506cf213e5d1b
-src/rocprof_compute_soc/analysis_configs/gfx940/1300_instruction_cache.yaml: 2437e2f8191675c4116d0da1db291f3ad2715281ea812e9fdd6506cf213e5d1b
-src/rocprof_compute_soc/analysis_configs/gfx941/1300_instruction_cache.yaml: 2437e2f8191675c4116d0da1db291f3ad2715281ea812e9fdd6506cf213e5d1b
-src/rocprof_compute_soc/analysis_configs/gfx942/1300_instruction_cache.yaml: 2437e2f8191675c4116d0da1db291f3ad2715281ea812e9fdd6506cf213e5d1b
-src/rocprof_compute_soc/analysis_configs/gfx950/1300_instruction_cache.yaml: 2437e2f8191675c4116d0da1db291f3ad2715281ea812e9fdd6506cf213e5d1b
-src/rocprof_compute_soc/analysis_configs/gfx908/1400_scalar_l1_data_cache.yaml: 8871e3b65132321cb3880a48f894d8c3b2c56a3936d382c3c2b02723ed5c8ec5
-src/rocprof_compute_soc/analysis_configs/gfx90a/1400_scalar_l1_data_cache.yaml: 8871e3b65132321cb3880a48f894d8c3b2c56a3936d382c3c2b02723ed5c8ec5
-src/rocprof_compute_soc/analysis_configs/gfx940/1400_scalar_l1_data_cache.yaml: 8871e3b65132321cb3880a48f894d8c3b2c56a3936d382c3c2b02723ed5c8ec5
-src/rocprof_compute_soc/analysis_configs/gfx941/1400_scalar_l1_data_cache.yaml: 8871e3b65132321cb3880a48f894d8c3b2c56a3936d382c3c2b02723ed5c8ec5
-src/rocprof_compute_soc/analysis_configs/gfx942/1400_scalar_l1_data_cache.yaml: 8871e3b65132321cb3880a48f894d8c3b2c56a3936d382c3c2b02723ed5c8ec5
-src/rocprof_compute_soc/analysis_configs/gfx950/1400_scalar_l1_data_cache.yaml: 8871e3b65132321cb3880a48f894d8c3b2c56a3936d382c3c2b02723ed5c8ec5
+src/rocprof_compute_soc/analysis_configs/gfx908/1200_local_data_share_lds.yaml: f3f7a74e8b2915fe27eec7948f006f218a6b0a96c91b95cdff9e624b2c484bb2
+src/rocprof_compute_soc/analysis_configs/gfx90a/1200_local_data_share_lds.yaml: f3f7a74e8b2915fe27eec7948f006f218a6b0a96c91b95cdff9e624b2c484bb2
+src/rocprof_compute_soc/analysis_configs/gfx940/1200_local_data_share_lds.yaml: f3f7a74e8b2915fe27eec7948f006f218a6b0a96c91b95cdff9e624b2c484bb2
+src/rocprof_compute_soc/analysis_configs/gfx941/1200_local_data_share_lds.yaml: f3f7a74e8b2915fe27eec7948f006f218a6b0a96c91b95cdff9e624b2c484bb2
+src/rocprof_compute_soc/analysis_configs/gfx942/1200_local_data_share_lds.yaml: f3f7a74e8b2915fe27eec7948f006f218a6b0a96c91b95cdff9e624b2c484bb2
+src/rocprof_compute_soc/analysis_configs/gfx950/1200_local_data_share_lds.yaml: 6333e18126bde83da4c66fd967531d394bd22e69c08358096b27168a9dc11a30
+src/rocprof_compute_soc/analysis_configs/gfx908/1300_instruction_cache.yaml: f60b9c657bece161e34219f3ada4041107dc5ca3d248590ee3b67e7bd400ff54
+src/rocprof_compute_soc/analysis_configs/gfx90a/1300_instruction_cache.yaml: f60b9c657bece161e34219f3ada4041107dc5ca3d248590ee3b67e7bd400ff54
+src/rocprof_compute_soc/analysis_configs/gfx940/1300_instruction_cache.yaml: f60b9c657bece161e34219f3ada4041107dc5ca3d248590ee3b67e7bd400ff54
+src/rocprof_compute_soc/analysis_configs/gfx941/1300_instruction_cache.yaml: f60b9c657bece161e34219f3ada4041107dc5ca3d248590ee3b67e7bd400ff54
+src/rocprof_compute_soc/analysis_configs/gfx942/1300_instruction_cache.yaml: f60b9c657bece161e34219f3ada4041107dc5ca3d248590ee3b67e7bd400ff54
+src/rocprof_compute_soc/analysis_configs/gfx950/1300_instruction_cache.yaml: f60b9c657bece161e34219f3ada4041107dc5ca3d248590ee3b67e7bd400ff54
+src/rocprof_compute_soc/analysis_configs/gfx908/1400_scalar_l1_data_cache.yaml: 29fac4ea38e4a018baffc4a27a720b47078fd890c10da307655d40f693e6f0e7
+src/rocprof_compute_soc/analysis_configs/gfx90a/1400_scalar_l1_data_cache.yaml: 29fac4ea38e4a018baffc4a27a720b47078fd890c10da307655d40f693e6f0e7
+src/rocprof_compute_soc/analysis_configs/gfx940/1400_scalar_l1_data_cache.yaml: 29fac4ea38e4a018baffc4a27a720b47078fd890c10da307655d40f693e6f0e7
+src/rocprof_compute_soc/analysis_configs/gfx941/1400_scalar_l1_data_cache.yaml: 29fac4ea38e4a018baffc4a27a720b47078fd890c10da307655d40f693e6f0e7
+src/rocprof_compute_soc/analysis_configs/gfx942/1400_scalar_l1_data_cache.yaml: 29fac4ea38e4a018baffc4a27a720b47078fd890c10da307655d40f693e6f0e7
+src/rocprof_compute_soc/analysis_configs/gfx950/1400_scalar_l1_data_cache.yaml: 29fac4ea38e4a018baffc4a27a720b47078fd890c10da307655d40f693e6f0e7
 src/rocprof_compute_soc/analysis_configs/gfx908/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 633d59aba82b3a495b7ba33fa4b2ae4da638b58632bcc37ff18be87af68ce4d4
 src/rocprof_compute_soc/analysis_configs/gfx90a/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 2bdb9d7b3bea1057b3baee29ba3b428b211808261063a97bc4b6b319f4a19fb3
 src/rocprof_compute_soc/analysis_configs/gfx940/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 3180c2f3266be0ff44e01d73d247ca43ae2ee18ecaf61765f58849e36c701b19
 src/rocprof_compute_soc/analysis_configs/gfx941/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 3180c2f3266be0ff44e01d73d247ca43ae2ee18ecaf61765f58849e36c701b19
 src/rocprof_compute_soc/analysis_configs/gfx942/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 3180c2f3266be0ff44e01d73d247ca43ae2ee18ecaf61765f58849e36c701b19
 src/rocprof_compute_soc/analysis_configs/gfx950/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 9e56cef5b066fb575a5c530bcf9400f1291dd8636b12c8a2244cdba1defafc9f
-src/rocprof_compute_soc/analysis_configs/gfx908/1600_vector_l1_data_cache.yaml: e6ec43014ce7b7cc072385d4eba072dd187b5de14979c169a3c1e9b8fc4c2762
-src/rocprof_compute_soc/analysis_configs/gfx90a/1600_vector_l1_data_cache.yaml: e6ec43014ce7b7cc072385d4eba072dd187b5de14979c169a3c1e9b8fc4c2762
-src/rocprof_compute_soc/analysis_configs/gfx940/1600_vector_l1_data_cache.yaml: 0e53921cc8d87a9adade250b9632fa42d33c825565152e37d6e56f45f83a3a28
-src/rocprof_compute_soc/analysis_configs/gfx941/1600_vector_l1_data_cache.yaml: 0e53921cc8d87a9adade250b9632fa42d33c825565152e37d6e56f45f83a3a28
-src/rocprof_compute_soc/analysis_configs/gfx942/1600_vector_l1_data_cache.yaml: 0e53921cc8d87a9adade250b9632fa42d33c825565152e37d6e56f45f83a3a28
-src/rocprof_compute_soc/analysis_configs/gfx950/1600_vector_l1_data_cache.yaml: cd21327c193d2af8c18066b9c13f67e3d5dfb44731777bc5a1b6a7738c902dd1
-src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml: 5b48c690b6069a5610d07cc0c2a5e1da65a52296205dcf48a3b6fa5e3df36e9b
-src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml: a9b128267a069060e891533334c52586c706f145b1e813a4081cb21d425516ad
-src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml: b4eea39f0e23e501ad503cdd96db377109c7f0e212949828fe06102de7355349
-src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml: da0189cd7f6e1ab4b79d0c054c2cdc1f7a9c81972dae9e5285f2f3d9c30ca644
-src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml: b0802f923052eb584ce138210ebf2db70fb7883926896da1861a9e857d4abe81
-src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml: 58bdd965421d610567e461becd7094fa41d668b119eddab99054d2bd6dc12acf
+src/rocprof_compute_soc/analysis_configs/gfx908/1600_vector_l1_data_cache.yaml: 438d0f4a972dd341eb2485f51a47d6860fbb30a6169054cd8550b4b7226e199f
+src/rocprof_compute_soc/analysis_configs/gfx90a/1600_vector_l1_data_cache.yaml: 438d0f4a972dd341eb2485f51a47d6860fbb30a6169054cd8550b4b7226e199f
+src/rocprof_compute_soc/analysis_configs/gfx940/1600_vector_l1_data_cache.yaml: 6100b218f24de9f1433b39a093ed04b9bb9dfe656c5df77583c9db332c447230
+src/rocprof_compute_soc/analysis_configs/gfx941/1600_vector_l1_data_cache.yaml: 6100b218f24de9f1433b39a093ed04b9bb9dfe656c5df77583c9db332c447230
+src/rocprof_compute_soc/analysis_configs/gfx942/1600_vector_l1_data_cache.yaml: 6100b218f24de9f1433b39a093ed04b9bb9dfe656c5df77583c9db332c447230
+src/rocprof_compute_soc/analysis_configs/gfx950/1600_vector_l1_data_cache.yaml: 67054ec0a4c6ca147a5dd40cc91f0e8e81378e1affe7d479274747579ecc524a
+src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml: b1baa76f9dbfcc52d5e12cc1834102a0011ddf8bdece5be5fabc2945ab8971f4
+src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml: 4d834a2066d7f2cb655a8e41fc17531282150b6fe64bbc9c5ff3a10acddee5af
+src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml: 78f9fee5dafc83d311da1c801200c1820e16a0678dd0548fafa8a966ec6a94d5
+src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml: 51fe6e3888975b805594c2ab2b3147e717ae5e015468ee592cbcddc389c689bc
+src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml: dc2dc9ff61b1747e492c28ef5ac76764fd75c18fd0827834130bc583f2afc619
+src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml: d181f753c3fff608c72b8015d1af30bfd8cf8cdfbc0a17c505f717ddaa3b1efc
 src/rocprof_compute_soc/analysis_configs/gfx908/1800_l2_cache_per_channel.yaml: a0c53202fe9f68d5e1fa689ce0643c471ced7d47e007d8ccc68fba294f7f6a05
 src/rocprof_compute_soc/analysis_configs/gfx90a/1800_l2_cache_per_channel.yaml: a0c53202fe9f68d5e1fa689ce0643c471ced7d47e007d8ccc68fba294f7f6a05
 src/rocprof_compute_soc/analysis_configs/gfx940/1800_l2_cache_per_channel.yaml: e184e3692eb0d641fb2e37fada0e58a6c4958553931d7c038b884e1e6986093f
@@ -113,4 +113,4 @@ src/rocprof_compute_soc/profile_configs/sets/gfx940_sets.yaml: 44cd2b32b050cafa7
 src/rocprof_compute_soc/profile_configs/sets/gfx941_sets.yaml: 44cd2b32b050cafa73d0ead5703b82836edf25a057c21699046b6b8b8918b242
 src/rocprof_compute_soc/profile_configs/sets/gfx942_sets.yaml: 44cd2b32b050cafa73d0ead5703b82836edf25a057c21699046b6b8b8918b242
 src/rocprof_compute_soc/profile_configs/sets/gfx950_sets.yaml: 238d9dc8a98cfead3fc904885bfe413e5bcb4f1af31e9820cd640388bcd1e1c2
-docs/data/metrics_description.yaml: 819c08a584ae8b418e6983aa51108b95e43eda4f3b7892eab336c61d844b20bf
+docs/data/metrics_description.yaml: c2ddad7ef7973b128c1612e56cc6286e49c2f59af829b1795dc64b38c0ecfd61
diff --git a/projects/rocprofiler-compute/utils/unified_config.yaml b/projects/rocprofiler-compute/utils/unified_config.yaml
index fb6286d7ab..0f3e89e781 100644
--- a/projects/rocprofiler-compute/utils/unified_config.yaml
+++ b/projects/rocprofiler-compute/utils/unified_config.yaml
@@ -7972,7 +7972,7 @@ panels:
           Access Rate:
             value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
             unit: Pct of Peak
-          Theoretical Bandwidth:
+          Theoretical Bandwidth Utilization:
             value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
               / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) *
               0.00128)))
@@ -7988,7 +7988,7 @@ panels:
           Access Rate:
             value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
             unit: Pct of Peak
-          Theoretical Bandwidth (% of Peak):
+          Theoretical Bandwidth Utilization:
             value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
               / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) *
               0.00128)))
@@ -8004,7 +8004,7 @@ panels:
           Access Rate:
             value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
             unit: Pct of Peak
-          Theoretical Bandwidth (% of Peak):
+          Theoretical Bandwidth Utilization:
             value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
               / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) *
               0.00128)))
@@ -8020,7 +8020,7 @@ panels:
           Access Rate:
             value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
             unit: Pct of Peak
-          Theoretical Bandwidth (% of Peak):
+          Theoretical Bandwidth Utilization:
             value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
               / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) *
               0.00128)))
@@ -8036,7 +8036,7 @@ panels:
           Access Rate:
             value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
             unit: Pct of Peak
-          Theoretical Bandwidth (% of Peak):
+          Theoretical Bandwidth Utilization:
             value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
               / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) *
               0.00128)))
@@ -8052,7 +8052,7 @@ panels:
           Access Rate:
             value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
             unit: Pct of Peak
-          Theoretical Bandwidth:
+          Theoretical Bandwidth Utilization:
             value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
               / (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) *
               0.00128)))
@@ -8082,12 +8082,12 @@ panels:
             unit: (Instr  + $normUnit)
           Theoretical Bandwidth:
             avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
-            unit: (Bytes  + $normUnit)
+              / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           LDS Latency:
             avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0)
               else None))
@@ -8143,12 +8143,12 @@ panels:
             unit: (Instr  + $normUnit)
           Theoretical Bandwidth:
             avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
-            unit: (Bytes  + $normUnit)
+              / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           LDS Latency:
             avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0)
               else None))
@@ -8204,12 +8204,12 @@ panels:
             unit: (Instr  + $normUnit)
           Theoretical Bandwidth:
             avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
-            unit: (Bytes  + $normUnit)
+              / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           LDS Latency:
             avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0)
               else None))
@@ -8265,12 +8265,12 @@ panels:
             unit: (Instr  + $normUnit)
           Theoretical Bandwidth:
             avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
-            unit: (Bytes  + $normUnit)
+              / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           LDS Latency:
             avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0)
               else None))
@@ -8356,12 +8356,12 @@ panels:
             units: Gbps
           Theoretical Bandwidth:
             avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
-            unit: (Bytes  + $normUnit)
+              / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           LDS Latency:
             avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0)
               else None))
@@ -8427,12 +8427,12 @@ panels:
             unit: (Instr  + $normUnit)
           Theoretical Bandwidth:
             avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
-              / $denom))
-            unit: (Bytes  + $normUnit)
+              / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           LDS Latency:
             avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0)
               else None))
@@ -8502,17 +8502,28 @@ panels:
         <desc-scheduler>` issuing :ref:`LDS <desc-lds>` instructions over the :ref:`total
         CU cycles <total-cu-cycles>`.
       unit: Percent
-    Theoretical Bandwidth:
+    Theoretical Bandwidth Utilization:
       plain: Indicates the maximum amount of bytes that could have been loaded from,
-        stored to, or atomically updated in the LDS per normalization unit. Does not
-        take into account the execution mask of the wavefront when the instruction
+        stored to, or atomically updated in the LDS, as a percentage of the theoretical peak.
+        Does not take into account the execution mask of the wavefront when the instruction
         was executed.
       rst: Indicates the maximum amount of bytes that could have been loaded from,  stored
-        to, or atomically updated in the LDS per  :ref:`normalization unit <normalization-units>`.
+        to, or atomically updated in the LDS, as a percentage of the theoretical peak.
         Does *not* take into  account the execution mask of the wavefront when the
         instruction was  executed. See the  :ref:`LDS bandwidth example <lds-bandwidth>`
         for more detail.
-      unit: Bytes per normalization unit
+      unit: Percent
+    Theoretical Bandwidth:
+      plain: Indicates the maximum amount of bytes that could have been loaded from,
+        stored to, or atomically updated in the LDS divided by total duration. Does not
+        take into account the execution mask of the wavefront when the instruction
+        was executed.
+      rst: Indicates the maximum amount of bytes that could have been loaded from,  stored
+        to, or atomically updated in the LDS divided by total duration.
+        Does *not* take into  account the execution mask of the wavefront when the
+        instruction was  executed. See the  :ref:`LDS bandwidth example <lds-bandwidth>`
+        for more detail.
+      unit: Gbps
     Bank Conflict Rate:
       plain: Indicates the percentage of active LDS cycles that were spent servicing
         bank conflicts. Calculated as the ratio of LDS cycles spent servicing bank
@@ -8601,7 +8612,7 @@ panels:
         unit: Unit
       metric:
         gfx90a:
-          Bandwidth:
+          Bandwidth Utilization:
             value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) *
               (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -8609,12 +8620,12 @@ panels:
             value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
               + SQC_ICACHE_MISSES_DUPLICATE)))
             unit: Pct of Peak
-          L1I-L2 Bandwidth:
+          L1I-L2 Bandwidth Utilization:
             value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
               * (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
         gfx941:
-          Bandwidth:
+          Bandwidth Utilization:
             value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) *
               (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -8622,12 +8633,12 @@ panels:
             value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
               + SQC_ICACHE_MISSES_DUPLICATE)))
             unit: Pct of Peak
-          L1I-L2 Bandwidth:
+          L1I-L2 Bandwidth Utilization:
             value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
               * (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
         gfx940:
-          Bandwidth:
+          Bandwidth Utilization:
             value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) *
               (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -8635,12 +8646,12 @@ panels:
             value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
               + SQC_ICACHE_MISSES_DUPLICATE)))
             unit: Pct of Peak
-          L1I-L2 Bandwidth:
+          L1I-L2 Bandwidth Utilization:
             value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
               * (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
         gfx942:
-          Bandwidth:
+          Bandwidth Utilization:
             value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) *
               (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -8648,12 +8659,12 @@ panels:
             value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
               + SQC_ICACHE_MISSES_DUPLICATE)))
             unit: Pct of Peak
-          L1I-L2 Bandwidth:
+          L1I-L2 Bandwidth Utilization:
             value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
               * (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
         gfx950:
-          Bandwidth:
+          Bandwidth Utilization:
             value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) *
               (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -8661,12 +8672,12 @@ panels:
             value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
               + SQC_ICACHE_MISSES_DUPLICATE)))
             unit: Pct of Peak
-          L1I-L2 Bandwidth:
+          L1I-L2 Bandwidth Utilization:
             value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
               * (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
         gfx908:
-          Bandwidth:
+          Bandwidth Utilization:
             value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) *
               (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -8674,7 +8685,7 @@ panels:
             value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
               + SQC_ICACHE_MISSES_DUPLICATE)))
             unit: Pct of Peak
-          L1I-L2 Bandwidth:
+          L1I-L2 Bandwidth Utilization:
             value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
               * (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -8913,42 +8924,42 @@ panels:
       metric:
         gfx90a:
           L1I-L2 Bandwidth:
-            avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
-            min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
-            max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
-            unit: (Bytes + $normUnit)
+            avg: AVG(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            min: MIN(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            max: MAX(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
         gfx941:
           L1I-L2 Bandwidth:
-            avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
-            min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
-            max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
-            unit: (Bytes + $normUnit)
+            avg: AVG(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            min: MIN(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            max: MAX(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
         gfx940:
           L1I-L2 Bandwidth:
-            avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
-            min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
-            max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
-            unit: (Bytes + $normUnit)
+            avg: AVG(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            min: MIN(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            max: MAX(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
         gfx942:
           L1I-L2 Bandwidth:
-            avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
-            min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
-            max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
-            unit: (Bytes + $normUnit)
+            avg: AVG(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            min: MIN(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            max: MAX(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
         gfx950:
           L1I-L2 Bandwidth:
-            avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
-            min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
-            max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
-            unit: (Bytes + $normUnit)
+            avg: AVG(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            min: MIN(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            max: MAX(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
         gfx908:
           L1I-L2 Bandwidth:
-            avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
-            min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
-            max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
-            unit: (Bytes + $normUnit)
+            avg: AVG(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            min: MIN(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            max: MAX(((SQC_TC_INST_REQ * 64) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
   metrics_description:
-    Bandwidth:
+    Bandwidth Utilization:
       plain: The number of bytes looked up in the L1I cache, as a percent of the peak
         theoretical bandwidth. Calculated as the ratio of L1I requests over the total
         L1I cycles.
@@ -8964,7 +8975,7 @@ panels:
         the cache. Calculated as the ratio of the number of L1I requests  that hit
         over the number of all L1I requests.
       unit: Percent
-    L1I-L2 Bandwidth:
+    L1I-L2 Bandwidth Utilization:
       plain: "The percent of the peak theoretical L1I \u2192 L2 cache request bandwidth\
         \ achieved. Calculated as the ratio of the total number of requests from the\
         \ L1I to the L2 cache over the total L1I-L2 interface cycles."
@@ -8972,6 +8983,10 @@ panels:
         \  achieved. Calculated as the ratio of the total number of requests from\
         \  the L1I to the L2 cache over the  :ref:`total L1I-L2 interface cycles <total-l1i-cycles>`."
       unit: Percent
+    L1I-L2 Bandwidth:
+      plain: Total number of bytes transferred across the L1I - L2 interface divided by total duration.
+      rst: Total number of bytes transferred across the L1I - L2 interface divided by total duration.
+      unit: Gbps
     Req:
       plain: The total number of requests made to the L1I per normalization-unit
       rst: The total number of requests made to the L1I per normalization-unit
@@ -9013,7 +9028,7 @@ panels:
         unit: Unit
       metric:
         gfx90a:
-          Bandwidth:
+          Bandwidth Utilization:
             value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) *
               (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -9022,12 +9037,12 @@ panels:
               + SQC_DCACHE_MISSES_DUPLICATE)) if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES
               + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
             unit: Pct of Peak
-          sL1D-L2 BW:
+          sL1D-L2 BW Utilization:
             value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
               * 100000) / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
             unit: Pct of Peak
         gfx941:
-          Bandwidth:
+          Bandwidth Utilization:
             value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) *
               (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -9036,12 +9051,12 @@ panels:
               + SQC_DCACHE_MISSES_DUPLICATE)) if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES
               + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
             unit: Pct of Peak
-          sL1D-L2 BW:
+          sL1D-L2 BW Utilization:
             value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
               * 100000) / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
             unit: Pct of Peak
         gfx940:
-          Bandwidth:
+          Bandwidth Utilization:
             value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) *
               (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -9050,12 +9065,12 @@ panels:
               + SQC_DCACHE_MISSES_DUPLICATE)) if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES
               + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
             unit: Pct of Peak
-          sL1D-L2 BW:
+          sL1D-L2 BW Utilization:
             value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
               * 100000) / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
             unit: Pct of Peak
         gfx942:
-          Bandwidth:
+          Bandwidth Utilization:
             value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) *
               (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -9064,12 +9079,12 @@ panels:
               + SQC_DCACHE_MISSES_DUPLICATE)) if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES
               + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
             unit: Pct of Peak
-          sL1D-L2 BW:
+          sL1D-L2 BW Utilization:
             value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
               * 100000) / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
             unit: Pct of Peak
         gfx950:
-          Bandwidth:
+          Bandwidth Utilization:
             value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) *
               (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -9078,12 +9093,12 @@ panels:
               + SQC_DCACHE_MISSES_DUPLICATE)) if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES
               + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
             unit: Pct of Peak
-          sL1D-L2 BW:
+          sL1D-L2 BW Utilization:
             value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
               * 100000) / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
             unit: Pct of Peak
         gfx908:
-          Bandwidth:
+          Bandwidth Utilization:
             value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) *
               (End_Timestamp - Start_Timestamp))))
             unit: Pct of Peak
@@ -9092,7 +9107,7 @@ panels:
               + SQC_DCACHE_MISSES_DUPLICATE)) if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES
               + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
             unit: Pct of Peak
-          sL1D-L2 BW:
+          sL1D-L2 BW Utilization:
             value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
               * 100000) / (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
             unit: Pct of Peak
@@ -9542,12 +9557,12 @@ panels:
         gfx90a:
           sL1D-L2 BW:
             avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
-            unit: (Bytes + $normUnit)
+              * 64)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Read Req:
             avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
             min: MIN((SQC_TC_DATA_READ_REQ / $denom))
@@ -9571,12 +9586,12 @@ panels:
         gfx941:
           sL1D-L2 BW:
             avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
-            unit: (Bytes + $normUnit)
+              * 64)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Read Req:
             avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
             min: MIN((SQC_TC_DATA_READ_REQ / $denom))
@@ -9600,12 +9615,12 @@ panels:
         gfx940:
           sL1D-L2 BW:
             avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
-            unit: (Bytes + $normUnit)
+              * 64)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Read Req:
             avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
             min: MIN((SQC_TC_DATA_READ_REQ / $denom))
@@ -9629,12 +9644,12 @@ panels:
         gfx942:
           sL1D-L2 BW:
             avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
-            unit: (Bytes + $normUnit)
+              * 64)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Read Req:
             avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
             min: MIN((SQC_TC_DATA_READ_REQ / $denom))
@@ -9658,12 +9673,12 @@ panels:
         gfx950:
           sL1D-L2 BW:
             avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
-            unit: (Bytes + $normUnit)
+              * 64)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Read Req:
             avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
             min: MIN((SQC_TC_DATA_READ_REQ / $denom))
@@ -9687,12 +9702,12 @@ panels:
         gfx908:
           sL1D-L2 BW:
             avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ)
-              * 64)) / $denom))
-            unit: (Bytes + $normUnit)
+              * 64)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Read Req:
             avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
             min: MIN((SQC_TC_DATA_READ_REQ / $denom))
@@ -9714,7 +9729,7 @@ panels:
             max: MAX((SQC_TC_STALL / $denom))
             unit: (Cycles  + $normUnit)
   metrics_description:
-    Bandwidth:
+    Bandwidth Utilization:
       plain: The number of bytes looked up in the sL1D cache, as a percent of the
         peak theoretical bandwidth. Calculated as the ratio of sL1D requests over
         the total sL1D cycles.
@@ -9730,18 +9745,26 @@ panels:
         the cache. The ratio of the number of sL1D requests that hit  [#sl1d-cache]_
         over the number of all sL1D requests.
       unit: Percent
+    sL1D-L2 BW Utilization:
+      plain: The percentage of the peak theoretical sL1D - L2 interface bandwidth achieved.
+        Calculated as total number of bytes read from, written to, or atomically updated
+        across the sL1D - L2 interface.
+      rst: The percentage of the peak theoretical sL1D - L2 interface bandwidth achieved.
+        Calculated as total number of bytes read from, written to, or atomically updated
+        across the sL1D - L2 interface.
+      unit: Percent
     sL1D-L2 BW:
       plain: "The total number of bytes read from, written to, or atomically updated\
-        \ across the sL1D\u2194L2 interface, per normalization unit. Note that sL1D\
+        \ across the sL1D\u2194L2 interface, divided by total duration. Note that sL1D\
         \ writes and atomics are typically unused on current CDNA accelerators, so\
         \ in the majority of cases this can be interpreted as an sL1D\u2192L2 read\
         \ bandwidth."
       rst: "The total number of bytes read from, written to, or atomically updated\
-        \  across the sL1D\u2194:doc:`L2 <l2-cache>` interface, per  :ref:`normalization\
-        \ unit <normalization-units>`. Note that sL1D writes  and atomics are typically\
+        \  across the sL1D\u2194:doc:`L2 <l2-cache>` interface, divided by total duration.\
+        \ Note that sL1D writes and atomics are typically\
         \ unused on current CDNA accelerators, so in the  majority of cases this can\
         \ be interpreted as an sL1D\u2192L2 read bandwidth."
-      unit: Bytes per normalization unit
+      unit: Gbps
     Req:
       plain: The total number of requests, of any size or type, made to the sL1D per
         normalization unit.
@@ -10938,7 +10961,7 @@ panels:
               / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum !=
               0) else None))
             unit: Pct of Peak
-          Bandwidth:
+          Bandwidth Utilization:
             value: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp
               - Start_Timestamp)))) / ((($max_sclk / 1000) * 64) * $cu_per_gpu))
             unit: Pct of Peak
@@ -10957,7 +10980,7 @@ panels:
               / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum !=
               0) else None))
             unit: Pct of Peak
-          Bandwidth:
+          Bandwidth Utilization:
             value: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp
               - Start_Timestamp)))) / ((($max_sclk / 1000) * 128) * $cu_per_gpu))
             unit: Pct of Peak
@@ -10976,7 +10999,7 @@ panels:
               / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum !=
               0) else None))
             unit: Pct of Peak
-          Bandwidth:
+          Bandwidth Utilization:
             value: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp
               - Start_Timestamp)))) / ((($max_sclk / 1000) * 128) * $cu_per_gpu))
             unit: Pct of Peak
@@ -10995,7 +11018,7 @@ panels:
               / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum !=
               0) else None))
             unit: Pct of Peak
-          Bandwidth:
+          Bandwidth Utilization:
             value: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp
               - Start_Timestamp)))) / ((($max_sclk / 1000) * 128) * $cu_per_gpu))
             unit: Pct of Peak
@@ -11014,7 +11037,7 @@ panels:
               / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum !=
               0) else None))
             unit: Pct of Peak
-          Bandwidth:
+          Bandwidth Utilization:
             value: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp
               - Start_Timestamp)))) / ((($max_sclk / 1000) * 128) * $cu_per_gpu))
             unit: Pct of Peak
@@ -11033,7 +11056,7 @@ panels:
               / TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum !=
               0) else None))
             unit: Pct of Peak
-          Bandwidth:
+          Bandwidth Utilization:
             value: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp
               - Start_Timestamp)))) / ((($max_sclk / 1000) * 64) * $cu_per_gpu))
             unit: Pct of Peak
@@ -11203,10 +11226,10 @@ panels:
               / $denom))
             unit: (Req  + $normUnit)
           Cache BW:
-            avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / $denom))
-            min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / $denom))
-            max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / $denom))
-            unit: (Bytes + $normUnit)
+            avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp - Start_Timestamp)))
+            min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp - Start_Timestamp)))
+            max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Cache Hit Rate:
             avg: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
               + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
@@ -11244,12 +11267,12 @@ panels:
             unit: (Req + $normUnit)
           L1-L2 BW:
             avg: AVG(((64 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) + TCP_TCC_ATOMIC_WITH_RET_REQ_sum)
-              + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+              + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
             min: MIN(((64 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) + TCP_TCC_ATOMIC_WITH_RET_REQ_sum)
-              + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+              + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
             max: MAX(((64 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) + TCP_TCC_ATOMIC_WITH_RET_REQ_sum)
-              + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
-            unit: (Bytes + $normUnit)
+              + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           L1-L2 Read:
             avg: AVG((TCP_TCC_READ_REQ_sum / $denom))
             min: MIN((TCP_TCC_READ_REQ_sum / $denom))
@@ -11323,10 +11346,10 @@ panels:
               / $denom))
             unit: (Req  + $normUnit)
           Cache BW:
-            avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-            min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-            max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-            unit: (Bytes + $normUnit)
+            avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Cache Hit Rate:
             avg: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
               + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
@@ -11365,14 +11388,14 @@ panels:
           L1-L2 BW:
             avg: AVG(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum +
               TCP_TCC_ATOMIC_WITH_RET_REQ_sum + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             min: MIN(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum +
               TCP_TCC_ATOMIC_WITH_RET_REQ_sum + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             max: MAX(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum +
               TCP_TCC_ATOMIC_WITH_RET_REQ_sum + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
-              / $denom))
-            unit: (Bytes + $normUnit)
+              / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           L1-L2 Read:
             avg: AVG((TCP_TCC_READ_REQ_sum / $denom))
             min: MIN((TCP_TCC_READ_REQ_sum / $denom))
@@ -11416,10 +11439,10 @@ panels:
               / $denom))
             unit: (Req  + $normUnit)
           Cache BW:
-            avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-            min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-            max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-            unit: (Bytes + $normUnit)
+            avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Cache Hit Rate:
             avg: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
               + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
@@ -11458,14 +11481,14 @@ panels:
           L1-L2 BW:
             avg: AVG(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum +
               TCP_TCC_ATOMIC_WITH_RET_REQ_sum + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             min: MIN(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum +
               TCP_TCC_ATOMIC_WITH_RET_REQ_sum + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             max: MAX(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum +
               TCP_TCC_ATOMIC_WITH_RET_REQ_sum + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
-              / $denom))
-            unit: (Bytes + $normUnit)
+              / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           L1-L2 Read:
             avg: AVG((TCP_TCC_READ_REQ_sum / $denom))
             min: MIN((TCP_TCC_READ_REQ_sum / $denom))
@@ -11509,10 +11532,10 @@ panels:
               / $denom))
             unit: (Req  + $normUnit)
           Cache BW:
-            avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-            min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-            max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-            unit: (Bytes + $normUnit)
+            avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Cache Hit Rate:
             avg: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
               + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
@@ -11551,14 +11574,14 @@ panels:
           L1-L2 BW:
             avg: AVG(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum +
               TCP_TCC_ATOMIC_WITH_RET_REQ_sum + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             min: MIN(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum +
               TCP_TCC_ATOMIC_WITH_RET_REQ_sum + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             max: MAX(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum +
               TCP_TCC_ATOMIC_WITH_RET_REQ_sum + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
-              / $denom))
-            unit: (Bytes + $normUnit)
+              / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           L1-L2 Read:
             avg: AVG((TCP_TCC_READ_REQ_sum / $denom))
             min: MIN((TCP_TCC_READ_REQ_sum / $denom))
@@ -11602,10 +11625,10 @@ panels:
               / $denom))
             unit: (Req  + $normUnit)
           Cache BW:
-            avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-            min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-            max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / $denom))
-            unit: (Bytes + $normUnit)
+            avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Cache Hit Rate:
             avg: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
               + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
@@ -11644,14 +11667,14 @@ panels:
           L1-L2 BW:
             avg: AVG(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum +
               TCP_TCC_ATOMIC_WITH_RET_REQ_sum + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             min: MIN(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum +
               TCP_TCC_ATOMIC_WITH_RET_REQ_sum + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
-              / $denom))
+              / (End_Timestamp - Start_Timestamp)))
             max: MAX(((128 * TCP_TCC_READ_REQ_sum + 64 * (TCP_TCC_WRITE_REQ_sum +
               TCP_TCC_ATOMIC_WITH_RET_REQ_sum + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
-              / $denom))
-            unit: (Bytes + $normUnit)
+              / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Tag RAM 0 Req:
             avg: AVG((TCP_TAGRAM0_REQ_sum / $denom))
             min: MIN((TCP_TAGRAM0_REQ_sum / $denom))
@@ -11730,10 +11753,10 @@ panels:
               / $denom))
             unit: (Req  + $normUnit)
           Cache BW:
-            avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / $denom))
-            min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / $denom))
-            max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / $denom))
-            unit: (Bytes + $normUnit)
+            avg: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp - Start_Timestamp)))
+            min: MIN(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp - Start_Timestamp)))
+            max: MAX(((TCP_TOTAL_CACHE_ACCESSES_sum * 64) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           Cache Hit Rate:
             avg: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
               + TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
@@ -11771,12 +11794,12 @@ panels:
             unit: (Req + $normUnit)
           L1-L2 BW:
             avg: AVG(((64 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) + TCP_TCC_ATOMIC_WITH_RET_REQ_sum)
-              + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+              + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
             min: MIN(((64 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) + TCP_TCC_ATOMIC_WITH_RET_REQ_sum)
-              + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
+              + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
             max: MAX(((64 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) + TCP_TCC_ATOMIC_WITH_RET_REQ_sum)
-              + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / $denom))
-            unit: (Bytes + $normUnit)
+              + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           L1-L2 Read:
             avg: AVG((TCP_TCC_READ_REQ_sum / $denom))
             min: MIN((TCP_TCC_READ_REQ_sum / $denom))
@@ -12600,7 +12623,7 @@ panels:
         vL1D cache over the total number of cache line requests to the  :ref:`vL1D
         Cache RAM <desc-tc>`.
       unit: Percent
-    Bandwidth:
+    Bandwidth Utilization:
       plain: The number of bytes looked up in the vL1D cache as a result of VMEM instructions,
         as a percent of the peak theoretical bandwidth achievable on the specific
         accelerator. The number of bytes is calculated as the number of cache lines
@@ -12700,18 +12723,18 @@ panels:
       unit: Requests per normalization unit
     Cache BW:
       plain: The number of bytes looked up in the vL1D cache as a result of VMEM instructions
-        per normalization unit. The number of bytes is calculated as the number of
+        divided by total duration. The number of bytes is calculated as the number of
         cache lines requested multiplied by the cache line size.  This value does
         not consider partial requests, so for instance, if only a single value is
         requested in a cache line, the data movement will still be counted as a full
         cache line.
-      rst: The number of bytes looked up in the vL1D cache as a result of  :ref:`VMEM
-        <desc-vmem>` instructions per  :ref:`normalization unit <normalization-units>`.  The
-        number of bytes is  calculated as the number of cache lines requested multiplied
-        by the cache  line size.  This value does not consider partial requests, so
-        for  instance, if only a single value is requested in a cache line, the data  movement
+      rst: The number of bytes looked up in the vL1D cache as a result of :ref:`VMEM
+        <desc-vmem>` instructions divided by total duration. The
+        number of bytes is calculated as the number of cache lines requested multiplied
+        by the cache line size. This value does not consider partial requests, so
+        for instance, if only a single value is requested in a cache line, the data movement
         will still be counted as a full cache line.
-      unit: Bytes per normalization unit
+      unit: Gbps
     Cache Hit Rate:
       plain: The ratio of the number of vL1D cache line requests that hit in vL1D
         cache over the total number of cache line requests to the vL1D Cache RAM.
@@ -12741,18 +12764,18 @@ panels:
       unit: Invalidations per normalization unit
     L1-L2 BW:
       plain: The number of bytes transferred across the vL1D-L2 interface as a result
-        of VMEM instructions, per normalization unit. The number of bytes is calculated
+        of VMEM instructions, divided by total duration. The number of bytes is calculated
         as the number of cache lines requested multiplied by the cache line size.
         This value does not consider partial requests, so for instance, if only a
         single value is requested in a cache line, the data movement will still be
         counted as a full cache line.
       rst: The number of bytes transferred across the vL1D-L2 interface as a result  of
-        :ref:`VMEM <desc-vmem>` instructions, per  :ref:`normalization unit <normalization-units>`.
+        :ref:`VMEM <desc-vmem>` instructions, divided by total duration.
         The number of bytes is  calculated as the number of cache lines requested
         multiplied by the cache  line size. This value does not consider partial requests,
         so for  instance, if only a single value is requested in a cache line, the
         data  movement will still be counted as a full cache line.
-      unit: Bytes per normalization unit
+      unit: Gbps
     L1-L2 Read:
       plain: The number of read requests for a vL1D cache line that were not satisfied
         by the vL1D and must be retrieved from the to the L2 Cache per normalization
@@ -13064,12 +13087,12 @@ panels:
         gfx90a:
           Read BW:
             avg: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             min: MIN((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             max: MAX((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-              * 64)) / $denom))
-            unit: (Bytes  + $normUnit)
+              * 64)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           HBM Read Traffic:
             avg: AVG((100 * (TCC_EA_RDREQ_DRAM_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum
               != 0) else None))
@@ -13096,12 +13119,12 @@ panels:
             unit: pct
           Write and Atomic BW:
             avg: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-              * 32)) / $denom))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
             min: MIN((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-              * 32)) / $denom))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
             max: MAX((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-              * 32)) / $denom))
-            unit: (Bytes  + $normUnit)
+              * 32)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           HBM Write and Atomic Traffic:
             avg: AVG((100 * (TCC_EA_WRREQ_DRAM_sum / TCC_EA_WRREQ_sum) if (TCC_EA_WRREQ_sum
               != 0) else None))
@@ -13161,12 +13184,12 @@ panels:
         gfx941:
           Read BW:
             avg: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             min: MIN((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             max: MAX((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
-              * 64)) / $denom))
-            unit: (Bytes  + $normUnit)
+              * 64)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           HBM Read Traffic:
             avg: AVG((100 * (TCC_EA0_RDREQ_DRAM_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
               != 0) else None))
@@ -13193,12 +13216,12 @@ panels:
             unit: pct
           Write and Atomic BW:
             avg: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / $denom))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
             min: MIN((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / $denom))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
             max: MAX((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / $denom))
-            unit: (Bytes  + $normUnit)
+              * 32)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           HBM Write and Atomic Traffic:
             avg: AVG((100 * (TCC_EA0_WRREQ_DRAM_sum / TCC_EA0_WRREQ_sum) if (TCC_EA0_WRREQ_sum
               != 0) else None))
@@ -13258,12 +13281,12 @@ panels:
         gfx940:
           Read BW:
             avg: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             min: MIN((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             max: MAX((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
-              * 64)) / $denom))
-            unit: (Bytes  + $normUnit)
+              * 64)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           HBM Read Traffic:
             avg: AVG((100 * (TCC_EA0_RDREQ_DRAM_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
               != 0) else None))
@@ -13290,12 +13313,12 @@ panels:
             unit: pct
           Write and Atomic BW:
             avg: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / $denom))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
             min: MIN((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / $denom))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
             max: MAX((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / $denom))
-            unit: (Bytes  + $normUnit)
+              * 32)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           HBM Write and Atomic Traffic:
             avg: AVG((100 * (TCC_EA0_WRREQ_DRAM_sum / TCC_EA0_WRREQ_sum) if (TCC_EA0_WRREQ_sum
               != 0) else None))
@@ -13355,12 +13378,12 @@ panels:
         gfx942:
           Read BW:
             avg: AVG(((128 * TCC_BUBBLE_sum + 64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum
-              - TCC_EA0_RDREQ_32B_sum) + 32 * TCC_EA0_RDREQ_32B_sum) / $denom))
+              - TCC_EA0_RDREQ_32B_sum) + 32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))
             min: MIN(((128 * TCC_BUBBLE_sum + 64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum
-              - TCC_EA0_RDREQ_32B_sum) + 32 * TCC_EA0_RDREQ_32B_sum) / $denom))
+              - TCC_EA0_RDREQ_32B_sum) + 32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))
             max: MAX(((128 * TCC_BUBBLE_sum + 64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum
-              - TCC_EA0_RDREQ_32B_sum) + 32 * TCC_EA0_RDREQ_32B_sum) / $denom))
-            unit: (Bytes  + $normUnit)
+              - TCC_EA0_RDREQ_32B_sum) + 32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           HBM Read Traffic:
             avg: AVG((100 * (TCC_EA0_RDREQ_DRAM_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
               != 0) else None))
@@ -13387,12 +13410,12 @@ panels:
             unit: pct
           Write and Atomic BW:
             avg: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / $denom))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
             min: MIN((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / $denom))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
             max: MAX((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / $denom))
-            unit: (Bytes  + $normUnit)
+              * 32)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           HBM Write and Atomic Traffic:
             avg: AVG((100 * (TCC_EA0_WRREQ_DRAM_sum / TCC_EA0_WRREQ_sum) if (TCC_EA0_WRREQ_sum
               != 0) else None))
@@ -13452,12 +13475,12 @@ panels:
         gfx950:
           Read BW:
             avg: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64)
-              + (TCC_EA0_RDREQ_128B_sum * 128)) / $denom))
+              + (TCC_EA0_RDREQ_128B_sum * 128)) / (End_Timestamp - Start_Timestamp)))
             min: MIN((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64)
-              + (TCC_EA0_RDREQ_128B_sum * 128)) / $denom))
+              + (TCC_EA0_RDREQ_128B_sum * 128)) / (End_Timestamp - Start_Timestamp)))
             max: MAX((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64)
-              + (TCC_EA0_RDREQ_128B_sum * 128)) / $denom))
-            unit: (Bytes  + $normUnit)
+              + (TCC_EA0_RDREQ_128B_sum * 128)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           HBM Read Traffic:
             avg: AVG((100 * (TCC_EA0_RDREQ_DRAM_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
               != 0) else None))
@@ -13484,12 +13507,12 @@ panels:
             unit: pct
           Write and Atomic BW:
             avg: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / $denom))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
             min: MIN((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / $denom))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
             max: MAX((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
-              * 32)) / $denom))
-            unit: (Bytes  + $normUnit)
+              * 32)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           HBM Write and Atomic Traffic:
             avg: AVG((100 * (TCC_EA0_WRREQ_DRAM_sum / TCC_EA0_WRREQ_sum) if (TCC_EA0_WRREQ_sum
               != 0) else None))
@@ -13568,12 +13591,12 @@ panels:
         gfx908:
           Read BW:
             avg: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             min: MIN((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-              * 64)) / $denom))
+              * 64)) / (End_Timestamp - Start_Timestamp)))
             max: MAX((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
-              * 64)) / $denom))
-            unit: (Bytes  + $normUnit)
+              * 64)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           HBM Read Traffic:
             avg: AVG((100 * (TCC_EA_RDREQ_DRAM_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum
               != 0) else None))
@@ -13600,12 +13623,12 @@ panels:
             unit: pct
           Write and Atomic BW:
             avg: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-              * 32)) / $denom))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
             min: MIN((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-              * 32)) / $denom))
+              * 32)) / (End_Timestamp - Start_Timestamp)))
             max: MAX((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
-              * 32)) / $denom))
-            unit: (Bytes  + $normUnit)
+              * 32)) / (End_Timestamp - Start_Timestamp)))
+            unit: Gbps
           HBM Write and Atomic Traffic:
             avg: AVG((100 * (TCC_EA_WRREQ_DRAM_sum / TCC_EA_WRREQ_sum) if (TCC_EA_WRREQ_sum
               != 0) else None))
@@ -13674,10 +13697,10 @@ panels:
       metric:
         gfx90a:
           Bandwidth:
-            avg: AVG((TCC_REQ_sum * 128) / $denom)
-            min: MIN((TCC_REQ_sum * 128) / $denom)
-            max: MAX((TCC_REQ_sum * 128) / $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            min: MIN((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            max: MAX((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Req:
             avg: AVG((TCC_REQ_sum / $denom))
             min: MIN((TCC_REQ_sum / $denom))
@@ -13773,10 +13796,10 @@ panels:
             unit: (Req  + $normUnit)
         gfx941:
           Bandwidth:
-            avg: AVG((TCC_REQ_sum * 128) / $denom)
-            min: MIN((TCC_REQ_sum * 128) / $denom)
-            max: MAX((TCC_REQ_sum * 128) / $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            min: MIN((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            max: MAX((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Req:
             avg: AVG((TCC_REQ_sum / $denom))
             min: MIN((TCC_REQ_sum / $denom))
@@ -13872,10 +13895,10 @@ panels:
             unit: (Req  + $normUnit)
         gfx940:
           Bandwidth:
-            avg: AVG((TCC_REQ_sum * 128) / $denom)
-            min: MIN((TCC_REQ_sum * 128) / $denom)
-            max: MAX((TCC_REQ_sum * 128) / $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            min: MIN((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            max: MAX((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Req:
             avg: AVG((TCC_REQ_sum / $denom))
             min: MIN((TCC_REQ_sum / $denom))
@@ -13971,10 +13994,10 @@ panels:
             unit: (Req  + $normUnit)
         gfx942:
           Bandwidth:
-            avg: AVG((TCC_REQ_sum * 128) / $denom)
-            min: MIN((TCC_REQ_sum * 128) / $denom)
-            max: MAX((TCC_REQ_sum * 128) / $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            min: MIN((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            max: MAX((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Req:
             avg: AVG((TCC_REQ_sum / $denom))
             min: MIN((TCC_REQ_sum / $denom))
@@ -14070,25 +14093,25 @@ panels:
             unit: (Req  + $normUnit)
         gfx950:
           Bandwidth:
-            avg: AVG((TCC_REQ_sum * 128) / $denom)
-            min: MIN((TCC_REQ_sum * 128) / $denom)
-            max: MAX((TCC_REQ_sum * 128) / $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            min: MIN((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            max: MAX((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Read Bandwidth:
-            avg: AVG(TCC_READ_SECTORS_sum * 32/ $denom)
-            min: MIN(TCC_READ_SECTORS_sum * 32/ $denom)
-            max: MAX(TCC_READ_SECTORS_sum * 32/ $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG(TCC_READ_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+            min: MIN(TCC_READ_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+            max: MAX(TCC_READ_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Write Bandwidth:
-            avg: AVG(TCC_WRITE_SECTORS_sum * 32/ $denom)
-            min: MIN(TCC_WRITE_SECTORS_sum * 32/ $denom)
-            max: MAX(TCC_WRITE_SECTORS_sum * 32/ $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG(TCC_WRITE_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+            min: MIN(TCC_WRITE_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+            max: MAX(TCC_WRITE_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Atomic Bandwidth:
-            avg: AVG(TCC_ATOMIC_SECTORS_sum * 32/ $denom)
-            min: MIN(TCC_ATOMIC_SECTORS_sum * 32/ $denom)
-            max: MAX(TCC_ATOMIC_SECTORS_sum * 32/ $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG(TCC_ATOMIC_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+            min: MIN(TCC_ATOMIC_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+            max: MAX(TCC_ATOMIC_SECTORS_sum * 32/ (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Req:
             avg: AVG((TCC_REQ_sum / $denom))
             min: MIN((TCC_REQ_sum / $denom))
@@ -14194,10 +14217,10 @@ panels:
             unit: (Req  + $normUnit)
         gfx908:
           Bandwidth:
-            avg: AVG((TCC_REQ_sum * 64) / $denom)
-            min: MIN((TCC_REQ_sum * 64) / $denom)
-            max: MAX((TCC_REQ_sum * 64) / $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG((TCC_REQ_sum * 64) / (End_Timestamp - Start_Timestamp))
+            min: MIN((TCC_REQ_sum * 64) / (End_Timestamp - Start_Timestamp))
+            max: MAX((TCC_REQ_sum * 64) / (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Req:
             avg: AVG((TCC_REQ_sum / $denom))
             min: MIN((TCC_REQ_sum / $denom))
@@ -14736,20 +14759,20 @@ panels:
             max: MAX((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
             unit: (Req  + $normUnit)
           Read Bandwidth - PCIe:
-            avg: AVG(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
-            min: MIN(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
-            max: MAX(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG(TCC_EA0_RDREQ_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            min: MIN(TCC_EA0_RDREQ_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            max: MAX(TCC_EA0_RDREQ_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           "Read Bandwidth - Infinity Fabric\u2122":
-            avg: AVG(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom)
-            min: MIN(TCC_EA0_RDREQ_GMI_32B_sum  * 32/ $denom)
-            max: MAX(TCC_EA0_RDREQ_GMI_32B_sum  * 32/ $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG(TCC_EA0_RDREQ_GMI_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            min: MIN(TCC_EA0_RDREQ_GMI_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+            max: MAX(TCC_EA0_RDREQ_GMI_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Read Bandwidth - HBM:
-            avg: AVG(TCC_EA0_RDREQ_DRAM_32B_sum  * 32/ $denom)
-            min: MIN(TCC_EA0_RDREQ_DRAM_32B_sum  * 32/ $denom)
-            max: MAX(TCC_EA0_RDREQ_DRAM_32B_sum  * 32/ $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG(TCC_EA0_RDREQ_DRAM_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+            min: MIN(TCC_EA0_RDREQ_DRAM_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+            max: MAX(TCC_EA0_RDREQ_DRAM_32B_sum  * 32/ (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Write and Atomic (32B):
             avg: AVG(((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom))
             min: MIN(((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom))
@@ -14776,20 +14799,20 @@ panels:
             max: MAX((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
             unit: (Req  + $normUnit)
           Write Bandwidth - PCIe:
-            avg: AVG(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
-            min: MIN(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
-            max: MAX(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            min: MIN(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            max: MAX(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           "Write Bandwidth - Infinity Fabric\u2122":
-            avg: AVG(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom)
-            min: MIN(TCC_EA0_WRREQ_WRITE_GMI_32B_sum  * 32/ $denom)
-            max: MAX(TCC_EA0_WRREQ_WRITE_GMI_32B_sum  * 32/ $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            min: MIN(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            max: MAX(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Write Bandwidth - HBM:
-            avg: AVG(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum  * 32/ $denom)
-            min: MIN(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum  * 32/ $denom)
-            max: MAX(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum  * 32/ $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            min: MIN(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            max: MAX(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Atomic:
             avg: AVG((TCC_EA0_ATOMIC_sum / $denom))
             min: MIN((TCC_EA0_ATOMIC_sum / $denom))
@@ -14801,20 +14824,20 @@ panels:
             max: MAX((TCC_EA0_WRREQ_ATOMIC_DRAM_sum / $denom))
             unit: (Req  + $normUnit)
           Atomic Bandwidth - PCIe:
-            avg: AVG(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum  * 32/ $denom)
-            min: MIN(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum  * 32/ $denom)
-            max: MAX(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum  * 32/ $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            min: MIN(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            max: MAX(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           "Atomic Bandwidth - Infinity Fabric\u2122":
-            avg: AVG(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum  * 32/ $denom)
-            min: MIN(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum  * 32/ $denom)
-            max: MAX(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum  * 32/ $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            min: MIN(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            max: MAX(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            unit: Gbps
           Atomic Bandwidth - HBM:
-            avg: AVG(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum  * 32/ $denom)
-            min: MIN(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum  * 32/ $denom)
-            max: MAX(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum  * 32/ $denom)
-            unit: (Bytes + $normUnit)
+            avg: AVG(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            min: MIN(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            max: MAX(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ (End_Timestamp - Start_Timestamp))
+            unit: Gbps
         gfx908:
           Read (32B):
             avg: AVG((TCC_EA_RDREQ_32B_sum / $denom))
@@ -14920,11 +14943,9 @@ panels:
         channels multiplied by the HBM channel width multiplied by the HBM clock frequency.
       unit: GB/s
     Read BW:
-      plain: The total number of bytes read by the L2 cache from Infinity Fabric per
-        normalization unit.
-      rst: The total number of bytes read by the L2 cache from Infinity Fabric per  :ref:`normalization
-        unit <normalization-units>`.
-      unit: Bytes per normalization unit
+      plain: The total number of bytes read by the L2 cache from Infinity Fabric divided by total duration.
+      rst: The total number of bytes read by the L2 cache from Infinity Fabric divided by total duration.
+      unit: Gbps
     HBM Read Traffic:
       plain: The percent of read requests generated by the L2 cache that are routed
         to the accelerator's local high-bandwidth memory (HBM). This breakdown does
@@ -14972,17 +14993,17 @@ panels:
       unit: Percent
     Write and Atomic BW:
       plain: The total number of bytes written by the L2 over Infinity Fabric by write
-        and atomic operations per normalization unit. Note that on current CDNA accelerators,
+        and atomic operations divided by total duration. Note that on current CDNA accelerators,
         such as the MI2XX, requests are only considered atomic by Infinity Fabric
         if they are targeted at non-write-cacheable memory, for example, fine-grained
         memory allocations or uncached memory allocations on the MI2XX.
       rst: The total number of bytes written by the L2 over Infinity Fabric by write  and
-        atomic operations per  :ref:`normalization unit <normalization-units>`. Note
+        atomic operations divided by total duration. Note
         that on current  CDNA accelerators, such as the :ref:`MI2XX <mixxx-note>`,
         requests are  only considered *atomic* by Infinity Fabric if they are targeted
         at  non-write-cacheable memory, for example,  :ref:`fine-grained memory <memory-type>`
         allocations or  :ref:`uncached memory <memory-type>` allocations on the  MI2XX.
-      unit: Bytes per normalization unit
+      unit: Gbps
     HBM Write and Atomic Traffic:
       plain: The percent of write and atomic requests generated by the L2 cache that
         are routed to the accelerator's local high-bandwidth memory (HBM). This breakdown
@@ -15074,36 +15095,36 @@ panels:
         (atomic with return value) was returned to the L2.
       unit: Cycles
     Bandwidth:
-      plain: The number of bytes looked up in the L2 cache, per normalization unit.
+      plain: The number of bytes looked up in the L2 cache, divided by total duration.
         The number of bytes is calculated as the number of cache lines requested multiplied
         by the cache line size. This value does not consider partial requests, so
         for example, if only a single value is requested in a cache line, the data
         movement will still be counted as a full cache line.
-      rst: The number of bytes looked up in the L2 cache, per  :ref:`normalization
-        unit <normalization-units>`.  The number of bytes is  calculated as the number
-        of cache lines requested multiplied by the cache  line size. This value does
-        not consider partial requests, so for example,  if only a single value is
+      rst: The number of bytes looked up in the L2 cache, divided by total duration.
+        The number of bytes is calculated as the number of cache lines requested
+        multiplied by the cache line size. This value does
+        not consider partial requests, so for example, if only a single value is
         requested in a cache line, the data movement  will still be counted as a full
         cache line.
-      unit: Bytes per normalization unit
+      unit: Gbps
     Read Bandwidth:
       plain: Total number of bytes looked up in the L2 cache for read requests,
-        per normalization unit.
+        divided by total duration.
       rst: Total number of bytes looked up in the L2 cache for read requests,
-        per :ref:`normalization unit <normalization-units>`.
-      unit: Bytes per normalization unit
+        divided by total duration.
+      unit: Gbps
     Write Bandwidth:
       plain: Total number of bytes looked up in the L2 cache for write requests,
-        per normalization unit.
+        divided by total duration.
       rst: Total number of bytes looked up in the L2 cache for write requests,
-        per :ref:`normalization unit <normalization-units>`.
-      unit: Bytes per normalization unit
+        divided by total duration.
+      unit: Gbps
     Atomic Bandwidth:
       plain: Total number of bytes looked up in the L2 cache for atomic requests,
-        per normalization unit.
+        divided by total duration.
       rst: Total number of bytes looked up in the L2 cache for atomic requests,
-        per :ref:`normalization unit <normalization-units>`.
-      unit: Bytes per normalization unit
+        divided by total duration.
+      unit: Gbps
     Req:
       plain: The total number of incoming requests to the L2 from all clients for
         all request types, per normalization unit.
@@ -15276,17 +15297,17 @@ panels:
         unit <normalization-units>`. See  :ref:`l2-request-flow` for more detail.
       unit: Requests per normalization unit
     Read Bandwidth - PCIe:
-      plain: Total number of bytes due to L2 read requests due to PCIe traffic, per normalization unit.
-      rst: Total number of bytes due to L2 read requests due to PCIe traffic, per normalization unit.
-      unit: Bytes per normalization unit
+      plain: Total number of bytes from L2 read requests attributable to PCIe traffic, divided by total duration.
+      rst: Total number of bytes from L2 read requests attributable to PCIe traffic, divided by total duration.
+      unit: Gbps
     "Read Bandwidth - Infinity Fabric\u2122":
-      plain: Total number of bytes due to L2 read requests due to Infinity Fabric traffic, per normalization unit.
-      rst: Total number of bytes due to L2 read requests due to Infinity Fabric traffic, per normalization unit.
-      unit: Bytes per normalization unit
+      plain: Total number of bytes from L2 read requests attributable to Infinity Fabric traffic, divided by total duration.
+      rst: Total number of bytes from L2 read requests attributable to Infinity Fabric traffic, divided by total duration.
+      unit: Gbps
     Read Bandwidth - HBM:
-      plain: Total number of bytes due to L2 read requests due to HBM traffic, per normalization unit.
-      rst: Total number of bytes due to L2 read requests due to HBM traffic, per normalization unit.
-      unit: Bytes per normalization unit
+      plain: Total number of bytes from L2 read requests attributable to HBM traffic, divided by total duration.
+      rst: Total number of bytes from L2 read requests attributable to HBM traffic, divided by total duration.
+      unit: Gbps
     Write and Atomic (32B):
       plain: The total number of L2 requests to Infinity Fabric to write or atomically
         update 32B of data to any memory location, per normalization unit.
@@ -15326,29 +15347,29 @@ panels:
         for more detail.
       unit: Requests per normalization unit
     Write Bandwidth - PCIe:
-      plain: Total number of bytes due to L2 write requests due to PCIe traffic, per normalization unit.
-      rst: Total number of bytes due to L2 write requests due to PCIe traffic, per normalization unit.
-      unit: Bytes per normalization unit
+      plain: Total number of bytes from L2 write requests attributable to PCIe traffic, divided by total duration.
+      rst: Total number of bytes from L2 write requests attributable to PCIe traffic, divided by total duration.
+      unit: Gbps
     "Write Bandwidth - Infinity Fabric\u2122":
-      plain: Total number of bytes due to L2 write requests due to Infinity Fabric traffic, per normalization unit.
-      rst: Total number of bytes due to L2 write requests due to Infinity Fabric traffic, per normalization unit.
-      unit: Bytes per normalization unit
+      plain: Total number of bytes from L2 write requests attributable to Infinity Fabric traffic, divided by total duration.
+      rst: Total number of bytes from L2 write requests attributable to Infinity Fabric traffic, divided by total duration.
+      unit: Gbps
     Write Bandwidth - HBM:
-      plain: Total number of bytes due to L2 write requests due to HBM traffic, per normalization unit.
-      rst: Total number of bytes due to L2 write requests due to HBM traffic, per normalization unit.
-      unit: Bytes per normalization unit
+      plain: Total number of bytes from L2 write requests attributable to HBM traffic, divided by total duration.
+      rst: Total number of bytes from L2 write requests attributable to HBM traffic, divided by total duration.
+      unit: Gbps
     Atomic Bandwidth - PCIe:
-      plain: Total number of bytes due to L2 atomic requests due to PCIe traffic, per normalization unit.
-      rst: Total number of bytes due to L2 atomic requests due to PCIe traffic, per normalization unit.
-      unit: Bytes per normalization unit
+      plain: Total number of bytes from L2 atomic requests attributable to PCIe traffic, divided by total duration.
+      rst: Total number of bytes from L2 atomic requests attributable to PCIe traffic, divided by total duration.
+      unit: Gbps
     "Atomic Bandwidth - Infinity Fabric\u2122":
-      plain: Total number of bytes due to L2 atomic requests due to Infinity Fabric traffic, per normalization unit.
-      rst: Total number of bytes due to L2 atomic requests due to Infinity Fabric traffic, per normalization unit.
-      unit: Bytes per normalization unit
+      plain: Total number of bytes from L2 atomic requests attributable to Infinity Fabric traffic, divided by total duration.
+      rst: Total number of bytes from L2 atomic requests attributable to Infinity Fabric traffic, divided by total duration.
+      unit: Gbps
     Atomic Bandwidth - HBM:
-      plain: Total number of bytes due to L2 atomic requests due to HBM traffic, per normalization unit.
-      rst: Total number of bytes due to L2 atomic requests due to HBM traffic, per normalization unit.
-      unit: Bytes per normalization unit
+      plain: Total number of bytes from L2 atomic requests attributable to HBM traffic, divided by total duration.
+      rst: Total number of bytes from L2 atomic requests attributable to HBM traffic, divided by total duration.
+      unit: Gbps
     Atomic:
       plain: The total number of L2 requests to Infinity Fabric to atomically update
         32B or 64B of data in any memory location, per normalization unit. See Request