diff --git a/docs/data/metrics_description.yaml b/docs/data/metrics_description.yaml index 822119dc28..512518ab65 100644 --- a/docs/data/metrics_description.yaml +++ b/docs/data/metrics_description.yaml @@ -716,6 +716,11 @@ Vector L1 data-return path or Texture Data (TD): was stalled by the :ref:`workgroup manager ` due to initialization of registers as a part of launching new workgroups. unit: Percent + Write Ack Instructions: + rst: The total number of write acknowledgements submitted by :ref:`data-return + unit ` to SQ, summed over all compute units on the accelerator, per + normalization unit. + unit: Instructions per normalization unit Write Instructions: rst: The number of store instructions submitted to the :ref:`data-return unit ` by the :ref:`address processor ` summed over all :doc:`compute @@ -755,6 +760,10 @@ L2 Speed-of-Light: :ref:`total L2 cycles `. unit: Percent L2 cache accesses: + Atomic Bandwidth: + rst: Total number of bytes looked up in the L2 cache for atomic requests, per + :ref:`normalization unit `. + unit: Bytes per normalization unit Atomic Req: rst: The total number of atomic requests (with and without return) to the L2 from all clients. @@ -808,6 +817,10 @@ L2 cache accesses: rst: The total number of requests to the L2 that go to Read-Write coherent memory (RW) allocations. See the :ref:`memory-type` for more information. unit: Requests per normalization unit + Read Bandwidth: + rst: Total number of bytes looked up in the L2 cache for read requests, per :ref:`normalization + unit `. + unit: Bytes per normalization unit Read Req: rst: 'The total number of read requests to the L2 from all clients. ' unit: Requests per normalization unit @@ -827,6 +840,10 @@ L2 cache accesses: rst: The total number of requests to the L2 that go to Uncached (UC) memory allocations. See the :ref:`memory-type` for more information. 
unit: Requests per normalization unit + Write Bandwidth: + rst: Total number of bytes looked up in the L2 cache for write requests, per :ref:`normalization + unit `. + unit: Bytes per normalization unit Write Req: rst: The total number of write requests to the L2 from all clients. unit: Requests per normalization unit @@ -957,6 +974,18 @@ L2 - Fabric interface detailed metrics: as :ref:`fine-grained memory ` allocations or :ref:`uncached memory ` allocations on the MI2XX. unit: Requests per normalization unit + Atomic Bandwidth - HBM: + rst: Total number of bytes due to L2 atomic requests due to HBM traffic, per normalization + unit. + unit: Bytes per normalization unit + "Atomic Bandwidth - Infinity Fabric\u2122": + rst: Total number of bytes due to L2 atomic requests due to Infinity Fabric traffic, + per normalization unit. + unit: Bytes per normalization unit + Atomic Bandwidth - PCIe: + rst: Total number of bytes due to L2 atomic requests due to PCIe traffic, per + normalization unit. + unit: Bytes per normalization unit HBM Read: rst: The total number of L2 requests to Infinity Fabric to read 32B or 64B of data from the accelerator's local HBM, per :ref:`normalization unit `. @@ -983,6 +1012,18 @@ L2 - Fabric interface detailed metrics: `. 64B requests for uncached data are counted as two 32B uncached data requests. See :ref:`l2-request-flow` for more detail. unit: Requests per normalization unit + Read Bandwidth - HBM: + rst: Total number of bytes due to L2 read requests due to HBM traffic, per normalization + unit. + unit: Bytes per normalization unit + "Read Bandwidth - Infinity Fabric\u2122": + rst: Total number of bytes due to L2 read requests due to Infinity Fabric traffic, + per normalization unit. + unit: Bytes per normalization unit + Read Bandwidth - PCIe: + rst: Total number of bytes due to L2 read requests due to PCIe traffic, per normalization + unit. 
+ unit: Bytes per normalization unit Remote Read: rst: The total number of L2 requests to Infinity Fabric to read 32B or 64B of data from any source other than the accelerator's local HBM, per :ref:`normalization @@ -994,6 +1035,18 @@ L2 - Fabric interface detailed metrics: HBM, per :ref:`normalization unit `. See :ref:`l2-request-flow` for more detail. unit: Requests per normalization unit + Write Bandwidth - HBM: + rst: Total number of bytes due to L2 write requests due to HBM traffic, per normalization + unit. + unit: Bytes per normalization unit + "Write Bandwidth - Infinity Fabric\u2122": + rst: Total number of bytes due to L2 write requests due to Infinity Fabric traffic, + per normalization unit. + unit: Bytes per normalization unit + Write Bandwidth - PCIe: + rst: Total number of bytes due to L2 write requests due to PCIe traffic, per normalization + unit. + unit: Bytes per normalization unit Write and Atomic (32B): rst: The total number of L2 requests to Infinity Fabric to write or atomically update 32B of data to any memory location, per :ref:`normalization unit `. diff --git a/src/rocprof_compute_soc/analysis_configs/gfx908/1500_address_processing_unit_and_data_return_path_ta_td.yaml b/src/rocprof_compute_soc/analysis_configs/gfx908/1500_address_processing_unit_and_data_return_path_ta_td.yaml index 754cbbb688..67c3aa1dfc 100644 --- a/src/rocprof_compute_soc/analysis_configs/gfx908/1500_address_processing_unit_and_data_return_path_ta_td.yaml +++ b/src/rocprof_compute_soc/analysis_configs/gfx908/1500_address_processing_unit_and_data_return_path_ta_td.yaml @@ -63,6 +63,9 @@ Panel Config: unit by the address processor summed over all compute units on the accelerator, per normalization unit. This is expected to be the sum of global/generic and spill/stack atomics in the address processor. 
+ Write Ack Instructions: The total number of write acknowledgements submitted by + data-return unit to SQ, summed over all compute units on the accelerator, per + normalization unit. data source: - metric_table: id: 1501 diff --git a/src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml b/src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml index d9bc1ca1a9..6e77eb8f93 100644 --- a/src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml +++ b/src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml @@ -87,6 +87,12 @@ Panel Config: by the cache line size. This value does not consider partial requests, so for example, if only a single value is requested in a cache line, the data movement will still be counted as a full cache line. + Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests, + per normalization unit. + Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests, + per normalization unit. + Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests, + per normalization unit. Req: The total number of incoming requests to the L2 from all clients for all request types, per normalization unit. Read Req: The total number of read requests to the L2 from all clients. @@ -143,6 +149,12 @@ Panel Config: Remote Read: The total number of L2 requests to Infinity Fabric to read 32B or 64B of data from any source other than the accelerator's local HBM, per normalization unit. + Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe + traffic, per normalization unit. + "Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read + requests due to Infinity Fabric traffic, per normalization unit. + Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM + traffic, per normalization unit. 
Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to write or atomically update 32B of data to any memory location, per normalization unit. @@ -158,6 +170,18 @@ Panel Config: Remote Write and Atomic: The total number of L2 requests to Infinity Fabric to write or atomically update 32B or 64B of data in any memory location other than the accelerator's local HBM, per normalization unit. + Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to + PCIe traffic, per normalization unit. + "Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write + requests due to Infinity Fabric traffic, per normalization unit. + Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM + traffic, per normalization unit. + Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to + PCIe traffic, per normalization unit. + "Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic + requests due to Infinity Fabric traffic, per normalization unit. + Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to + HBM traffic, per normalization unit. Atomic: The total number of L2 requests to Infinity Fabric to atomically update 32B or 64B of data in any memory location, per normalization unit. See Request flow for more detail. 
Note that on current CDNA accelerators, such as the MI2XX, diff --git a/src/rocprof_compute_soc/analysis_configs/gfx90a/1500_address_processing_unit_and_data_return_path_ta_td.yaml b/src/rocprof_compute_soc/analysis_configs/gfx90a/1500_address_processing_unit_and_data_return_path_ta_td.yaml index 4d808aecab..0d826ceb1b 100644 --- a/src/rocprof_compute_soc/analysis_configs/gfx90a/1500_address_processing_unit_and_data_return_path_ta_td.yaml +++ b/src/rocprof_compute_soc/analysis_configs/gfx90a/1500_address_processing_unit_and_data_return_path_ta_td.yaml @@ -63,6 +63,9 @@ Panel Config: unit by the address processor summed over all compute units on the accelerator, per normalization unit. This is expected to be the sum of global/generic and spill/stack atomics in the address processor. + Write Ack Instructions: The total number of write acknowledgements submitted by + data-return unit to SQ, summed over all compute units on the accelerator, per + normalization unit. data source: - metric_table: id: 1501 diff --git a/src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml b/src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml index f3ecdc468c..14398e1104 100644 --- a/src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml +++ b/src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml @@ -87,6 +87,12 @@ Panel Config: by the cache line size. This value does not consider partial requests, so for example, if only a single value is requested in a cache line, the data movement will still be counted as a full cache line. + Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests, + per normalization unit. + Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests, + per normalization unit. + Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests, + per normalization unit. 
Req: The total number of incoming requests to the L2 from all clients for all request types, per normalization unit. Read Req: The total number of read requests to the L2 from all clients. @@ -143,6 +149,12 @@ Panel Config: Remote Read: The total number of L2 requests to Infinity Fabric to read 32B or 64B of data from any source other than the accelerator's local HBM, per normalization unit. + Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe + traffic, per normalization unit. + "Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read + requests due to Infinity Fabric traffic, per normalization unit. + Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM + traffic, per normalization unit. Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to write or atomically update 32B of data to any memory location, per normalization unit. @@ -158,6 +170,18 @@ Panel Config: Remote Write and Atomic: The total number of L2 requests to Infinity Fabric to write or atomically update 32B or 64B of data in any memory location other than the accelerator's local HBM, per normalization unit. + Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to + PCIe traffic, per normalization unit. + "Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write + requests due to Infinity Fabric traffic, per normalization unit. + Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM + traffic, per normalization unit. + Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to + PCIe traffic, per normalization unit. + "Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic + requests due to Infinity Fabric traffic, per normalization unit. + Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to + HBM traffic, per normalization unit. 
Atomic: The total number of L2 requests to Infinity Fabric to atomically update 32B or 64B of data in any memory location, per normalization unit. See Request flow for more detail. Note that on current CDNA accelerators, such as the MI2XX, diff --git a/src/rocprof_compute_soc/analysis_configs/gfx940/1500_address_processing_unit_and_data_return_path_ta_td.yaml b/src/rocprof_compute_soc/analysis_configs/gfx940/1500_address_processing_unit_and_data_return_path_ta_td.yaml index f920234926..cdbb5393aa 100644 --- a/src/rocprof_compute_soc/analysis_configs/gfx940/1500_address_processing_unit_and_data_return_path_ta_td.yaml +++ b/src/rocprof_compute_soc/analysis_configs/gfx940/1500_address_processing_unit_and_data_return_path_ta_td.yaml @@ -63,6 +63,9 @@ Panel Config: unit by the address processor summed over all compute units on the accelerator, per normalization unit. This is expected to be the sum of global/generic and spill/stack atomics in the address processor. + Write Ack Instructions: The total number of write acknowledgements submitted by + data-return unit to SQ, summed over all compute units on the accelerator, per + normalization unit. data source: - metric_table: id: 1501 diff --git a/src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml b/src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml index c2b82a38ec..36d5943858 100644 --- a/src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml +++ b/src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml @@ -87,6 +87,12 @@ Panel Config: by the cache line size. This value does not consider partial requests, so for example, if only a single value is requested in a cache line, the data movement will still be counted as a full cache line. + Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests, + per normalization unit. + Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests, + per normalization unit. 
+ Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests, + per normalization unit. Req: The total number of incoming requests to the L2 from all clients for all request types, per normalization unit. Read Req: The total number of read requests to the L2 from all clients. @@ -143,6 +149,12 @@ Panel Config: Remote Read: The total number of L2 requests to Infinity Fabric to read 32B or 64B of data from any source other than the accelerator's local HBM, per normalization unit. + Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe + traffic, per normalization unit. + "Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read + requests due to Infinity Fabric traffic, per normalization unit. + Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM + traffic, per normalization unit. Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to write or atomically update 32B of data to any memory location, per normalization unit. @@ -158,6 +170,18 @@ Panel Config: Remote Write and Atomic: The total number of L2 requests to Infinity Fabric to write or atomically update 32B or 64B of data in any memory location other than the accelerator's local HBM, per normalization unit. + Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to + PCIe traffic, per normalization unit. + "Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write + requests due to Infinity Fabric traffic, per normalization unit. + Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM + traffic, per normalization unit. + Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to + PCIe traffic, per normalization unit. + "Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic + requests due to Infinity Fabric traffic, per normalization unit. 
+ Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to + HBM traffic, per normalization unit. Atomic: The total number of L2 requests to Infinity Fabric to atomically update 32B or 64B of data in any memory location, per normalization unit. See Request flow for more detail. Note that on current CDNA accelerators, such as the MI2XX, diff --git a/src/rocprof_compute_soc/analysis_configs/gfx941/1500_address_processing_unit_and_data_return_path_ta_td.yaml b/src/rocprof_compute_soc/analysis_configs/gfx941/1500_address_processing_unit_and_data_return_path_ta_td.yaml index f920234926..cdbb5393aa 100644 --- a/src/rocprof_compute_soc/analysis_configs/gfx941/1500_address_processing_unit_and_data_return_path_ta_td.yaml +++ b/src/rocprof_compute_soc/analysis_configs/gfx941/1500_address_processing_unit_and_data_return_path_ta_td.yaml @@ -63,6 +63,9 @@ Panel Config: unit by the address processor summed over all compute units on the accelerator, per normalization unit. This is expected to be the sum of global/generic and spill/stack atomics in the address processor. + Write Ack Instructions: The total number of write acknowledgements submitted by + data-return unit to SQ, summed over all compute units on the accelerator, per + normalization unit. data source: - metric_table: id: 1501 diff --git a/src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml b/src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml index f1fd043df1..e7acf40a5c 100644 --- a/src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml +++ b/src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml @@ -87,6 +87,12 @@ Panel Config: by the cache line size. This value does not consider partial requests, so for example, if only a single value is requested in a cache line, the data movement will still be counted as a full cache line. + Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests, + per normalization unit. 
+ Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests, + per normalization unit. + Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests, + per normalization unit. Req: The total number of incoming requests to the L2 from all clients for all request types, per normalization unit. Read Req: The total number of read requests to the L2 from all clients. @@ -143,6 +149,12 @@ Panel Config: Remote Read: The total number of L2 requests to Infinity Fabric to read 32B or 64B of data from any source other than the accelerator's local HBM, per normalization unit. + Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe + traffic, per normalization unit. + "Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read + requests due to Infinity Fabric traffic, per normalization unit. + Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM + traffic, per normalization unit. Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to write or atomically update 32B of data to any memory location, per normalization unit. @@ -158,6 +170,18 @@ Panel Config: Remote Write and Atomic: The total number of L2 requests to Infinity Fabric to write or atomically update 32B or 64B of data in any memory location other than the accelerator's local HBM, per normalization unit. + Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to + PCIe traffic, per normalization unit. + "Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write + requests due to Infinity Fabric traffic, per normalization unit. + Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM + traffic, per normalization unit. + Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to + PCIe traffic, per normalization unit. 
+ "Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic + requests due to Infinity Fabric traffic, per normalization unit. + Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to + HBM traffic, per normalization unit. Atomic: The total number of L2 requests to Infinity Fabric to atomically update 32B or 64B of data in any memory location, per normalization unit. See Request flow for more detail. Note that on current CDNA accelerators, such as the MI2XX, diff --git a/src/rocprof_compute_soc/analysis_configs/gfx942/1500_address_processing_unit_and_data_return_path_ta_td.yaml b/src/rocprof_compute_soc/analysis_configs/gfx942/1500_address_processing_unit_and_data_return_path_ta_td.yaml index f920234926..cdbb5393aa 100644 --- a/src/rocprof_compute_soc/analysis_configs/gfx942/1500_address_processing_unit_and_data_return_path_ta_td.yaml +++ b/src/rocprof_compute_soc/analysis_configs/gfx942/1500_address_processing_unit_and_data_return_path_ta_td.yaml @@ -63,6 +63,9 @@ Panel Config: unit by the address processor summed over all compute units on the accelerator, per normalization unit. This is expected to be the sum of global/generic and spill/stack atomics in the address processor. + Write Ack Instructions: The total number of write acknowledgements submitted by + data-return unit to SQ, summed over all compute units on the accelerator, per + normalization unit. data source: - metric_table: id: 1501 diff --git a/src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml b/src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml index 35777aa064..0a72362ea7 100644 --- a/src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml +++ b/src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml @@ -87,6 +87,12 @@ Panel Config: by the cache line size. 
This value does not consider partial requests, so for example, if only a single value is requested in a cache line, the data movement will still be counted as a full cache line. + Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests, + per normalization unit. + Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests, + per normalization unit. + Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests, + per normalization unit. Req: The total number of incoming requests to the L2 from all clients for all request types, per normalization unit. Read Req: The total number of read requests to the L2 from all clients. @@ -143,6 +149,12 @@ Panel Config: Remote Read: The total number of L2 requests to Infinity Fabric to read 32B or 64B of data from any source other than the accelerator's local HBM, per normalization unit. + Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe + traffic, per normalization unit. + "Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read + requests due to Infinity Fabric traffic, per normalization unit. + Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM + traffic, per normalization unit. Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to write or atomically update 32B of data to any memory location, per normalization unit. @@ -158,6 +170,18 @@ Panel Config: Remote Write and Atomic: The total number of L2 requests to Infinity Fabric to write or atomically update 32B or 64B of data in any memory location other than the accelerator's local HBM, per normalization unit. + Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to + PCIe traffic, per normalization unit. + "Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write + requests due to Infinity Fabric traffic, per normalization unit. 
+ Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM + traffic, per normalization unit. + Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to + PCIe traffic, per normalization unit. + "Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic + requests due to Infinity Fabric traffic, per normalization unit. + Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to + HBM traffic, per normalization unit. Atomic: The total number of L2 requests to Infinity Fabric to atomically update 32B or 64B of data in any memory location, per normalization unit. See Request flow for more detail. Note that on current CDNA accelerators, such as the MI2XX, diff --git a/src/rocprof_compute_soc/analysis_configs/gfx950/1500_address_processing_unit_and_data_return_path_ta_td.yaml b/src/rocprof_compute_soc/analysis_configs/gfx950/1500_address_processing_unit_and_data_return_path_ta_td.yaml index dfe29d7b99..a37f24eab6 100644 --- a/src/rocprof_compute_soc/analysis_configs/gfx950/1500_address_processing_unit_and_data_return_path_ta_td.yaml +++ b/src/rocprof_compute_soc/analysis_configs/gfx950/1500_address_processing_unit_and_data_return_path_ta_td.yaml @@ -63,6 +63,9 @@ Panel Config: unit by the address processor summed over all compute units on the accelerator, per normalization unit. This is expected to be the sum of global/generic and spill/stack atomics in the address processor. + Write Ack Instructions: The total number of write acknowledgements submitted by + data-return unit to SQ, summed over all compute units on the accelerator, per + normalization unit. 
data source: - metric_table: id: 1501 diff --git a/src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml b/src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml index 85abb7d025..c354429c0e 100644 --- a/src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml +++ b/src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml @@ -87,6 +87,12 @@ Panel Config: by the cache line size. This value does not consider partial requests, so for example, if only a single value is requested in a cache line, the data movement will still be counted as a full cache line. + Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests, + per normalization unit. + Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests, + per normalization unit. + Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests, + per normalization unit. Req: The total number of incoming requests to the L2 from all clients for all request types, per normalization unit. Read Req: The total number of read requests to the L2 from all clients. @@ -143,6 +149,12 @@ Panel Config: Remote Read: The total number of L2 requests to Infinity Fabric to read 32B or 64B of data from any source other than the accelerator's local HBM, per normalization unit. + Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe + traffic, per normalization unit. + "Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read + requests due to Infinity Fabric traffic, per normalization unit. + Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM + traffic, per normalization unit. Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to write or atomically update 32B of data to any memory location, per normalization unit. 
@@ -158,6 +170,18 @@ Panel Config: Remote Write and Atomic: The total number of L2 requests to Infinity Fabric to write or atomically update 32B or 64B of data in any memory location other than the accelerator's local HBM, per normalization unit. + Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to + PCIe traffic, per normalization unit. + "Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write + requests due to Infinity Fabric traffic, per normalization unit. + Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM + traffic, per normalization unit. + Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to + PCIe traffic, per normalization unit. + "Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic + requests due to Infinity Fabric traffic, per normalization unit. + Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to + HBM traffic, per normalization unit. Atomic: The total number of L2 requests to Infinity Fabric to atomically update 32B or 64B of data in any memory location, per normalization unit. See Request flow for more detail. 
Note that on current CDNA accelerators, such as the MI2XX, @@ -628,6 +652,21 @@ Panel Config: min: MIN((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom)) max: MAX((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom)) unit: (Req + $normUnit) + Read Bandwidth - PCIe: + avg: AVG(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom) + min: MIN(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom) + max: MAX(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom) + unit: (Bytes + $normUnit) + "Read Bandwidth - Infinity Fabric\u2122": + avg: AVG(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom) + min: MIN(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom) + max: MAX(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom) + unit: (Bytes + $normUnit) + Read Bandwidth - HBM: + avg: AVG(TCC_EA0_RDREQ_DRAM_32B_sum * 32/ $denom) + min: MIN(TCC_EA0_RDREQ_DRAM_32B_sum * 32/ $denom) + max: MAX(TCC_EA0_RDREQ_DRAM_32B_sum * 32/ $denom) + unit: (Bytes + $normUnit) Write and Atomic (32B): avg: AVG(((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom)) min: MIN(((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom)) @@ -654,19 +693,19 @@ Panel Config: max: MAX((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom)) unit: (Req + $normUnit) Write Bandwidth - PCIe: - avg: AVG(TCC_EA0_WRREQ_WRITE_IO_32B_sum / $denom) - min: MIN(TCC_EA0_WRREQ_WRITE_IO_32B_sum / $denom) - max: MAX(TCC_EA0_WRREQ_WRITE_IO_32B_sum / $denom) + avg: AVG(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom) + min: MIN(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom) + max: MAX(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom) unit: (Bytes + $normUnit) "Write Bandwidth - Infinity Fabric\u2122": - avg: AVG(TCC_EA0_WRREQ_WRITE_GMI_32B_sum / $denom) - min: MIN(TCC_EA0_WRREQ_WRITE_GMI_32B_sum / $denom) - max: MAX(TCC_EA0_WRREQ_WRITE_GMI_32B_sum / $denom) + avg: AVG(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom) + min: MIN(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom) + max: MAX(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom) unit: (Bytes + $normUnit) Write Bandwidth - HBM: - avg: 
AVG(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum / $denom) - min: MIN(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum / $denom) - max: MAX(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum / $denom) + avg: AVG(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ $denom) + min: MIN(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ $denom) + max: MAX(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ $denom) unit: (Bytes + $normUnit) Atomic: avg: AVG((TCC_EA0_ATOMIC_sum / $denom)) @@ -679,17 +718,17 @@ Panel Config: max: MAX((TCC_EA0_WRREQ_ATOMIC_DRAM_sum / $denom)) unit: (Req + $normUnit) Atomic Bandwidth - PCIe: - avg: AVG(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum / $denom) - min: MIN(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum / $denom) - max: MAX(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum / $denom) + avg: AVG(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ $denom) + min: MIN(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ $denom) + max: MAX(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ $denom) unit: (Bytes + $normUnit) "Atomic Bandwidth - Infinity Fabric\u2122": - avg: AVG(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum / $denom) - min: MIN(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum / $denom) - max: MAX(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum / $denom) + avg: AVG(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ $denom) + min: MIN(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ $denom) + max: MAX(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ $denom) unit: (Bytes + $normUnit) Atomic Bandwidth - HBM: - avg: AVG(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum / $denom) - min: MIN(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum / $denom) - max: MAX(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum / $denom) + avg: AVG(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ $denom) + min: MIN(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ $denom) + max: MAX(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ $denom) unit: (Bytes + $normUnit) diff --git a/utils/autogen_hash.yaml b/utils/autogen_hash.yaml index ff42ad10e6..b3b20b7a8e 100644 --- a/utils/autogen_hash.yaml +++ b/utils/autogen_hash.yaml @@ -77,24 +77,24 @@ src/rocprof_compute_soc/analysis_configs/gfx940/1400_scalar_l1_data_cache.yaml: 
 src/rocprof_compute_soc/analysis_configs/gfx941/1400_scalar_l1_data_cache.yaml: 8871e3b65132321cb3880a48f894d8c3b2c56a3936d382c3c2b02723ed5c8ec5
 src/rocprof_compute_soc/analysis_configs/gfx942/1400_scalar_l1_data_cache.yaml: 8871e3b65132321cb3880a48f894d8c3b2c56a3936d382c3c2b02723ed5c8ec5
 src/rocprof_compute_soc/analysis_configs/gfx950/1400_scalar_l1_data_cache.yaml: 8871e3b65132321cb3880a48f894d8c3b2c56a3936d382c3c2b02723ed5c8ec5
-src/rocprof_compute_soc/analysis_configs/gfx908/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 231f9b7c09266c4aac50ac4db1b055c36eb6e563ba713c5f3aa30508d03b9170
-src/rocprof_compute_soc/analysis_configs/gfx90a/1500_address_processing_unit_and_data_return_path_ta_td.yaml: eb1ec287cc1f9f133b80fdde072a2b86e819f96ccdf4c305e721f3466d37b156
-src/rocprof_compute_soc/analysis_configs/gfx940/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 52ae21cec4ce4990e966d7fb438ac02b7e63ad4bc428f9770cd2c08d80f712da
-src/rocprof_compute_soc/analysis_configs/gfx941/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 52ae21cec4ce4990e966d7fb438ac02b7e63ad4bc428f9770cd2c08d80f712da
-src/rocprof_compute_soc/analysis_configs/gfx942/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 52ae21cec4ce4990e966d7fb438ac02b7e63ad4bc428f9770cd2c08d80f712da
-src/rocprof_compute_soc/analysis_configs/gfx950/1500_address_processing_unit_and_data_return_path_ta_td.yaml: f7b032202e1aea6befda0d62e3d9f04b846f473218bd62e90d59a34678b62a77
+src/rocprof_compute_soc/analysis_configs/gfx908/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 633d59aba82b3a495b7ba33fa4b2ae4da638b58632bcc37ff18be87af68ce4d4
+src/rocprof_compute_soc/analysis_configs/gfx90a/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 2bdb9d7b3bea1057b3baee29ba3b428b211808261063a97bc4b6b319f4a19fb3
+src/rocprof_compute_soc/analysis_configs/gfx940/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 3180c2f3266be0ff44e01d73d247ca43ae2ee18ecaf61765f58849e36c701b19
+src/rocprof_compute_soc/analysis_configs/gfx941/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 3180c2f3266be0ff44e01d73d247ca43ae2ee18ecaf61765f58849e36c701b19
+src/rocprof_compute_soc/analysis_configs/gfx942/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 3180c2f3266be0ff44e01d73d247ca43ae2ee18ecaf61765f58849e36c701b19
+src/rocprof_compute_soc/analysis_configs/gfx950/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 9e56cef5b066fb575a5c530bcf9400f1291dd8636b12c8a2244cdba1defafc9f
 src/rocprof_compute_soc/analysis_configs/gfx908/1600_vector_l1_data_cache.yaml: e6ec43014ce7b7cc072385d4eba072dd187b5de14979c169a3c1e9b8fc4c2762
 src/rocprof_compute_soc/analysis_configs/gfx90a/1600_vector_l1_data_cache.yaml: e6ec43014ce7b7cc072385d4eba072dd187b5de14979c169a3c1e9b8fc4c2762
 src/rocprof_compute_soc/analysis_configs/gfx940/1600_vector_l1_data_cache.yaml: 0e53921cc8d87a9adade250b9632fa42d33c825565152e37d6e56f45f83a3a28
 src/rocprof_compute_soc/analysis_configs/gfx941/1600_vector_l1_data_cache.yaml: 0e53921cc8d87a9adade250b9632fa42d33c825565152e37d6e56f45f83a3a28
 src/rocprof_compute_soc/analysis_configs/gfx942/1600_vector_l1_data_cache.yaml: 0e53921cc8d87a9adade250b9632fa42d33c825565152e37d6e56f45f83a3a28
 src/rocprof_compute_soc/analysis_configs/gfx950/1600_vector_l1_data_cache.yaml: cd21327c193d2af8c18066b9c13f67e3d5dfb44731777bc5a1b6a7738c902dd1
-src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml: 6aeda249093c666000b104f8631b4a85698e083dd55e77e1e1f095f222054742
-src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml: a4ec667e0b827c046de207416d185dd528f030f29bdee162a2634e579bb31846
-src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml: a9ac811e491fce354aef029b11a96edb589535e84224fa2e2b323623e9fd6e00
-src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml: 7d925c3369b366c23e638ca2b3d074672324a5b9fd0fa586a3e71dee458743a6
-src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml: 7532dc55c28c809f435f5edae98632a2d99adc898b2b71a661e2c9696f674f4a
-src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml: a9f3146a99e74eaba5327be3cdf9361fb8b69d1640751fb05519e44dd2ec7292
+src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml: 5b48c690b6069a5610d07cc0c2a5e1da65a52296205dcf48a3b6fa5e3df36e9b
+src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml: a9b128267a069060e891533334c52586c706f145b1e813a4081cb21d425516ad
+src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml: b4eea39f0e23e501ad503cdd96db377109c7f0e212949828fe06102de7355349
+src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml: da0189cd7f6e1ab4b79d0c054c2cdc1f7a9c81972dae9e5285f2f3d9c30ca644
+src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml: b0802f923052eb584ce138210ebf2db70fb7883926896da1861a9e857d4abe81
+src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml: 58bdd965421d610567e461becd7094fa41d668b119eddab99054d2bd6dc12acf
 src/rocprof_compute_soc/analysis_configs/gfx908/1800_l2_cache_per_channel.yaml: a0c53202fe9f68d5e1fa689ce0643c471ced7d47e007d8ccc68fba294f7f6a05
 src/rocprof_compute_soc/analysis_configs/gfx90a/1800_l2_cache_per_channel.yaml: a0c53202fe9f68d5e1fa689ce0643c471ced7d47e007d8ccc68fba294f7f6a05
 src/rocprof_compute_soc/analysis_configs/gfx940/1800_l2_cache_per_channel.yaml: e184e3692eb0d641fb2e37fada0e58a6c4958553931d7c038b884e1e6986093f
@@ -107,4 +107,4 @@ src/rocprof_compute_soc/analysis_configs/gfx940/2100_pc_sampling.yaml: 4f3af5504
 src/rocprof_compute_soc/analysis_configs/gfx941/2100_pc_sampling.yaml: 4f3af55040c40bee5f1fd88d83e2324d06e5dc462c0adc3e6d5b19b3f31af5e7
 src/rocprof_compute_soc/analysis_configs/gfx942/2100_pc_sampling.yaml: 4f3af55040c40bee5f1fd88d83e2324d06e5dc462c0adc3e6d5b19b3f31af5e7
 src/rocprof_compute_soc/analysis_configs/gfx950/2100_pc_sampling.yaml: 4f3af55040c40bee5f1fd88d83e2324d06e5dc462c0adc3e6d5b19b3f31af5e7
-docs/data/metrics_description.yaml: b912cf868d488d6ff78d4efc6ceeca27cca5811f4c705efa68a21dd6ddb1609b
+docs/data/metrics_description.yaml: 819c08a584ae8b418e6983aa51108b95e43eda4f3b7892eab336c61d844b20bf
diff --git a/utils/unified_config.yaml b/utils/unified_config.yaml
index fbc585e6c8..fb6286d7ab 100644
--- a/utils/unified_config.yaml
+++ b/utils/unified_config.yaml
@@ -10913,6 +10913,13 @@ panels:
           This is expected to be the sum of global/generic and spill/stack atomics in
           the :ref:`address processor `.
         unit: Instructions per normalization unit
+      Write Ack Instructions:
+        plain: The total number of write acknowledgements submitted by data-return
+          unit to SQ, summed over all compute units on the accelerator, per normalization
+          unit.
+        rst: The total number of write acknowledgements submitted by :ref:`data-return unit `
+          to SQ, summed over all compute units on the accelerator, per normalization unit.
+        unit: Instructions per normalization unit
 - id: 1600
   title: Vector L1 Data Cache
   data source:
@@ -14728,6 +14735,21 @@ panels:
         min: MIN((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
         max: MAX((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
         unit: (Req + $normUnit)
+      Read Bandwidth - PCIe:
+        avg: AVG(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
+        min: MIN(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
+        max: MAX(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
+        unit: (Bytes + $normUnit)
+      "Read Bandwidth - Infinity Fabric\u2122":
+        avg: AVG(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom)
+        min: MIN(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom)
+        max: MAX(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom)
+        unit: (Bytes + $normUnit)
+      Read Bandwidth - HBM:
+        avg: AVG(TCC_EA0_RDREQ_DRAM_32B_sum * 32/ $denom)
+        min: MIN(TCC_EA0_RDREQ_DRAM_32B_sum * 32/ $denom)
+        max: MAX(TCC_EA0_RDREQ_DRAM_32B_sum * 32/ $denom)
+        unit: (Bytes + $normUnit)
       Write and Atomic (32B):
         avg: AVG(((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom))
         min: MIN(((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom))
@@ -14754,19 +14776,19 @@ panels:
         max: MAX((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
         unit: (Req + $normUnit)
       Write Bandwidth - PCIe:
-        avg: AVG(TCC_EA0_WRREQ_WRITE_IO_32B_sum / $denom)
-        min: MIN(TCC_EA0_WRREQ_WRITE_IO_32B_sum / $denom)
-        max: MAX(TCC_EA0_WRREQ_WRITE_IO_32B_sum / $denom)
+        avg: AVG(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
+        min: MIN(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
+        max: MAX(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
         unit: (Bytes + $normUnit)
       "Write Bandwidth - Infinity Fabric\u2122":
-        avg: AVG(TCC_EA0_WRREQ_WRITE_GMI_32B_sum / $denom)
-        min: MIN(TCC_EA0_WRREQ_WRITE_GMI_32B_sum / $denom)
-        max: MAX(TCC_EA0_WRREQ_WRITE_GMI_32B_sum / $denom)
+        avg: AVG(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom)
+        min: MIN(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom)
+        max: MAX(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom)
         unit: (Bytes + $normUnit)
       Write Bandwidth - HBM:
-        avg: AVG(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum / $denom)
-        min: MIN(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum / $denom)
-        max: MAX(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum / $denom)
+        avg: AVG(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ $denom)
+        min: MIN(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ $denom)
+        max: MAX(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ $denom)
         unit: (Bytes + $normUnit)
       Atomic:
         avg: AVG((TCC_EA0_ATOMIC_sum / $denom))
@@ -14779,19 +14801,19 @@ panels:
         max: MAX((TCC_EA0_WRREQ_ATOMIC_DRAM_sum / $denom))
         unit: (Req + $normUnit)
       Atomic Bandwidth - PCIe:
-        avg: AVG(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum / $denom)
-        min: MIN(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum / $denom)
-        max: MAX(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum / $denom)
+        avg: AVG(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ $denom)
+        min: MIN(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ $denom)
+        max: MAX(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ $denom)
         unit: (Bytes + $normUnit)
       "Atomic Bandwidth - Infinity Fabric\u2122":
-        avg: AVG(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum / $denom)
-        min: MIN(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum / $denom)
-        max: MAX(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum / $denom)
+        avg: AVG(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ $denom)
+        min: MIN(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ $denom)
+        max: MAX(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ $denom)
         unit: (Bytes + $normUnit)
       Atomic Bandwidth - HBM:
-        avg: AVG(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum / $denom)
-        min: MIN(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum / $denom)
-        max: MAX(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum / $denom)
+        avg: AVG(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ $denom)
+        min: MIN(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ $denom)
+        max: MAX(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ $denom)
         unit: (Bytes + $normUnit)
     gfx908:
       Read (32B):
@@ -15064,6 +15086,24 @@ panels:
           requested in a cache line, the data movement will still be counted as a
           full cache line.
         unit: Bytes per normalization unit
+      Read Bandwidth:
+        plain: Total number of bytes looked up in the L2 cache for read requests,
+          per normalization unit.
+        rst: Total number of bytes looked up in the L2 cache for read requests,
+          per :ref:`normalization unit `.
+        unit: Bytes per normalization unit
+      Write Bandwidth:
+        plain: Total number of bytes looked up in the L2 cache for write requests,
+          per normalization unit.
+        rst: Total number of bytes looked up in the L2 cache for write requests,
+          per :ref:`normalization unit `.
+        unit: Bytes per normalization unit
+      Atomic Bandwidth:
+        plain: Total number of bytes looked up in the L2 cache for atomic requests,
+          per normalization unit.
+        rst: Total number of bytes looked up in the L2 cache for atomic requests,
+          per :ref:`normalization unit `.
+        unit: Bytes per normalization unit
       Req:
         plain: The total number of incoming requests to the L2 from all clients
           for all request types, per normalization unit.
@@ -15235,6 +15275,18 @@ panels:
           from any source other than the accelerator's local HBM, per :ref:`normalization
           unit `. See :ref:`l2-request-flow` for more detail.
         unit: Requests per normalization unit
+      Read Bandwidth - PCIe:
+        plain: Total number of bytes due to L2 read requests due to PCIe traffic, per normalization unit.
+        rst: Total number of bytes due to L2 read requests due to PCIe traffic, per normalization unit.
+        unit: Bytes per normalization unit
+      "Read Bandwidth - Infinity Fabric\u2122":
+        plain: Total number of bytes due to L2 read requests due to Infinity Fabric traffic, per normalization unit.
+        rst: Total number of bytes due to L2 read requests due to Infinity Fabric traffic, per normalization unit.
+        unit: Bytes per normalization unit
+      Read Bandwidth - HBM:
+        plain: Total number of bytes due to L2 read requests due to HBM traffic, per normalization unit.
+        rst: Total number of bytes due to L2 read requests due to HBM traffic, per normalization unit.
+        unit: Bytes per normalization unit
       Write and Atomic (32B):
         plain: The total number of L2 requests to Infinity Fabric to write or atomically
           update 32B of data to any memory location, per normalization unit.
@@ -15273,6 +15325,30 @@ panels:
           HBM, per :ref:`normalization unit `. See :ref:`l2-request-flow` for more
           detail.
         unit: Requests per normalization unit
+      Write Bandwidth - PCIe:
+        plain: Total number of bytes due to L2 write requests due to PCIe traffic, per normalization unit.
+        rst: Total number of bytes due to L2 write requests due to PCIe traffic, per normalization unit.
+        unit: Bytes per normalization unit
+      "Write Bandwidth - Infinity Fabric\u2122":
+        plain: Total number of bytes due to L2 write requests due to Infinity Fabric traffic, per normalization unit.
+        rst: Total number of bytes due to L2 write requests due to Infinity Fabric traffic, per normalization unit.
+        unit: Bytes per normalization unit
+      Write Bandwidth - HBM:
+        plain: Total number of bytes due to L2 write requests due to HBM traffic, per normalization unit.
+        rst: Total number of bytes due to L2 write requests due to HBM traffic, per normalization unit.
+        unit: Bytes per normalization unit
+      Atomic Bandwidth - PCIe:
+        plain: Total number of bytes due to L2 atomic requests due to PCIe traffic, per normalization unit.
+        rst: Total number of bytes due to L2 atomic requests due to PCIe traffic, per normalization unit.
+        unit: Bytes per normalization unit
+      "Atomic Bandwidth - Infinity Fabric\u2122":
+        plain: Total number of bytes due to L2 atomic requests due to Infinity Fabric traffic, per normalization unit.
+        rst: Total number of bytes due to L2 atomic requests due to Infinity Fabric traffic, per normalization unit.
+        unit: Bytes per normalization unit
+      Atomic Bandwidth - HBM:
+        plain: Total number of bytes due to L2 atomic requests due to HBM traffic, per normalization unit.
+        rst: Total number of bytes due to L2 atomic requests due to HBM traffic, per normalization unit.
+        unit: Bytes per normalization unit
       Atomic:
         plain: The total number of L2 requests to Infinity Fabric to atomically update
           32B or 64B of data in any memory location, per normalization unit. See Request