Fix L2 cache bandwidth metrics for MI350 (#843)
* Fix L2 cache bandwidth metrics for MI350
* Address review comments
committed by
GitHub
parent 6902b12e65
commit b349e406ed
@@ -716,6 +716,11 @@ Vector L1 data-return path or Texture Data (TD):
 was stalled by the :ref:`workgroup manager <desc-spi>` due to initialization
 of registers as a part of launching new workgroups.
 unit: Percent
+Write Ack Instructions:
+rst: The total number of write acknowledgements submitted by :ref:`data-return
+unit <desc-td>` to SQ, summed over all compute units on the accelerator, per
+normalization unit.
+unit: Instructions per normalization unit
 Write Instructions:
 rst: The number of store instructions submitted to the :ref:`data-return unit
 <desc-td>` by the :ref:`address processor <desc-ta>` summed over all :doc:`compute
@@ -755,6 +760,10 @@ L2 Speed-of-Light:
 :ref:`total L2 cycles <total-l2-cycles>`.
 unit: Percent
 L2 cache accesses:
+Atomic Bandwidth:
+rst: Total number of bytes looked up in the L2 cache for atomic requests, per
+:ref:`normalization unit <normalization-units>`.
+unit: Bytes per normalization unit
 Atomic Req:
 rst: The total number of atomic requests (with and without return) to the L2 from
 all clients.
@@ -808,6 +817,10 @@ L2 cache accesses:
 rst: The total number of requests to the L2 that go to Read-Write coherent memory (RW)
 allocations. See the :ref:`memory-type` for more information.
 unit: Requests per normalization unit
+Read Bandwidth:
+rst: Total number of bytes looked up in the L2 cache for read requests, per :ref:`normalization
+unit <normalization-units>`.
+unit: Bytes per normalization unit
 Read Req:
 rst: 'The total number of read requests to the L2 from all clients. '
 unit: Requests per normalization unit
@@ -827,6 +840,10 @@ L2 cache accesses:
 rst: The total number of requests to the L2 that go to Uncached (UC) memory allocations.
 See the :ref:`memory-type` for more information.
 unit: Requests per normalization unit
+Write Bandwidth:
+rst: Total number of bytes looked up in the L2 cache for write requests, per :ref:`normalization
+unit <normalization-units>`.
+unit: Bytes per normalization unit
 Write Req:
 rst: The total number of write requests to the L2 from all clients.
 unit: Requests per normalization unit
@@ -957,6 +974,18 @@ L2 - Fabric interface detailed metrics:
 as :ref:`fine-grained memory <memory-type>` allocations or :ref:`uncached
 memory <memory-type>` allocations on the MI2XX.
 unit: Requests per normalization unit
+Atomic Bandwidth - HBM:
+rst: Total number of bytes due to L2 atomic requests due to HBM traffic, per normalization
+unit.
+unit: Bytes per normalization unit
+"Atomic Bandwidth - Infinity Fabric\u2122":
+rst: Total number of bytes due to L2 atomic requests due to Infinity Fabric traffic,
+per normalization unit.
+unit: Bytes per normalization unit
+Atomic Bandwidth - PCIe:
+rst: Total number of bytes due to L2 atomic requests due to PCIe traffic, per
+normalization unit.
+unit: Bytes per normalization unit
 HBM Read:
 rst: The total number of L2 requests to Infinity Fabric to read 32B or 64B of data
 from the accelerator's local HBM, per :ref:`normalization unit <normalization-units>`.
@@ -983,6 +1012,18 @@ L2 - Fabric interface detailed metrics:
 <normalization-units>`. 64B requests for uncached data are counted as two 32B
 uncached data requests. See :ref:`l2-request-flow` for more detail.
 unit: Requests per normalization unit
+Read Bandwidth - HBM:
+rst: Total number of bytes due to L2 read requests due to HBM traffic, per normalization
+unit.
+unit: Bytes per normalization unit
+"Read Bandwidth - Infinity Fabric\u2122":
+rst: Total number of bytes due to L2 read requests due to Infinity Fabric traffic,
+per normalization unit.
+unit: Bytes per normalization unit
+Read Bandwidth - PCIe:
+rst: Total number of bytes due to L2 read requests due to PCIe traffic, per normalization
+unit.
+unit: Bytes per normalization unit
 Remote Read:
 rst: The total number of L2 requests to Infinity Fabric to read 32B or 64B of data
 from any source other than the accelerator's local HBM, per :ref:`normalization
@@ -994,6 +1035,18 @@ L2 - Fabric interface detailed metrics:
 HBM, per :ref:`normalization unit <normalization-units>`. See :ref:`l2-request-flow`
 for more detail.
 unit: Requests per normalization unit
+Write Bandwidth - HBM:
+rst: Total number of bytes due to L2 write requests due to HBM traffic, per normalization
+unit.
+unit: Bytes per normalization unit
+"Write Bandwidth - Infinity Fabric\u2122":
+rst: Total number of bytes due to L2 write requests due to Infinity Fabric traffic,
+per normalization unit.
+unit: Bytes per normalization unit
+Write Bandwidth - PCIe:
+rst: Total number of bytes due to L2 write requests due to PCIe traffic, per normalization
+unit.
+unit: Bytes per normalization unit
 Write and Atomic (32B):
 rst: The total number of L2 requests to Infinity Fabric to write or atomically update
 32B of data to any memory location, per :ref:`normalization unit <normalization-units>`.
+3
@@ -63,6 +63,9 @@ Panel Config:
 unit by the address processor summed over all compute units on the accelerator,
 per normalization unit. This is expected to be the sum of global/generic and
 spill/stack atomics in the address processor.
+Write Ack Instructions: The total number of write acknowledgements submitted by
+data-return unit to SQ, summed over all compute units on the accelerator, per
+normalization unit.
 data source:
 - metric_table:
 id: 1501
@@ -87,6 +87,12 @@ Panel Config:
 by the cache line size. This value does not consider partial requests, so for
 example, if only a single value is requested in a cache line, the data movement
 will still be counted as a full cache line.
+Read Bandwidth: Total number of bytes looked up in the L2 cache for read requests,
+per normalization unit.
+Write Bandwidth: Total number of bytes looked up in the L2 cache for write requests,
+per normalization unit.
+Atomic Bandwidth: Total number of bytes looked up in the L2 cache for atomic requests,
+per normalization unit.
 Req: The total number of incoming requests to the L2 from all clients for all
 request types, per normalization unit.
 Read Req: The total number of read requests to the L2 from all clients.
@@ -143,6 +149,12 @@ Panel Config:
 Remote Read: The total number of L2 requests to Infinity Fabric to read 32B or
 64B of data from any source other than the accelerator's local HBM, per normalization
 unit.
+Read Bandwidth - PCIe: Total number of bytes due to L2 read requests due to PCIe
+traffic, per normalization unit.
+"Read Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 read
+requests due to Infinity Fabric traffic, per normalization unit.
+Read Bandwidth - HBM: Total number of bytes due to L2 read requests due to HBM
+traffic, per normalization unit.
 Write and Atomic (32B): The total number of L2 requests to Infinity Fabric to
 write or atomically update 32B of data to any memory location, per normalization
 unit.
@@ -158,6 +170,18 @@ Panel Config:
 Remote Write and Atomic: The total number of L2 requests to Infinity Fabric to
 write or atomically update 32B or 64B of data in any memory location other than
 the accelerator's local HBM, per normalization unit.
+Write Bandwidth - PCIe: Total number of bytes due to L2 write requests due to
+PCIe traffic, per normalization unit.
+"Write Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 write
+requests due to Infinity Fabric traffic, per normalization unit.
+Write Bandwidth - HBM: Total number of bytes due to L2 write requests due to HBM
+traffic, per normalization unit.
+Atomic Bandwidth - PCIe: Total number of bytes due to L2 atomic requests due to
+PCIe traffic, per normalization unit.
+"Atomic Bandwidth - Infinity Fabric\u2122": Total number of bytes due to L2 atomic
+requests due to Infinity Fabric traffic, per normalization unit.
+Atomic Bandwidth - HBM: Total number of bytes due to L2 atomic requests due to
+HBM traffic, per normalization unit.
 Atomic: The total number of L2 requests to Infinity Fabric to atomically update
 32B or 64B of data in any memory location, per normalization unit. See Request
 flow for more detail. Note that on current CDNA accelerators, such as the MI2XX,
@@ -628,6 +652,21 @@ Panel Config:
 min: MIN((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
 max: MAX((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
 unit: (Req + $normUnit)
+Read Bandwidth - PCIe:
+avg: AVG(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
+unit: (Bytes + $normUnit)
+"Read Bandwidth - Infinity Fabric\u2122":
+avg: AVG(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom)
+unit: (Bytes + $normUnit)
+Read Bandwidth - HBM:
+avg: AVG(TCC_EA0_RDREQ_DRAM_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_RDREQ_DRAM_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_RDREQ_DRAM_32B_sum * 32/ $denom)
+unit: (Bytes + $normUnit)
 Write and Atomic (32B):
 avg: AVG(((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom))
 min: MIN(((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom))
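The context lines of this hunk derive request counts by subtracting one counter from another. A minimal Python sketch of those two expressions follows. The function names are hypothetical, and the explanation of the clamp (guarding against a negative result when the two counters disagree) is an assumption, not something the diff states.

```python
def remote_requests(total_requests, dram_requests, denom):
    """Mirror MAX((total - dram), 0) / $denom from the context lines.

    The clamp to zero presumably protects against a negative value when the
    DRAM counter exceeds the total; the diff itself does not explain it.
    """
    if denom <= 0:
        raise ValueError("denom must be positive")
    return max(total_requests - dram_requests, 0) / denom


def write_and_atomic_32b(total_writes, writes_64b, denom):
    """Mirror (TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom."""
    if denom <= 0:
        raise ValueError("denom must be positive")
    return (total_writes - writes_64b) / denom
```

With made-up inputs, `remote_requests(4, 10, 2)` is clamped to zero, while `write_and_atomic_32b(10, 4, 2)` counts the six non-64B writes and divides by the normalization denominator.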
@@ -654,19 +693,19 @@ Panel Config:
 max: MAX((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
 unit: (Req + $normUnit)
 Write Bandwidth - PCIe:
-avg: AVG(TCC_EA0_WRREQ_WRITE_IO_32B_sum / $denom)
-min: MIN(TCC_EA0_WRREQ_WRITE_IO_32B_sum / $denom)
-max: MAX(TCC_EA0_WRREQ_WRITE_IO_32B_sum / $denom)
+avg: AVG(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
 unit: (Bytes + $normUnit)
 "Write Bandwidth - Infinity Fabric\u2122":
-avg: AVG(TCC_EA0_WRREQ_WRITE_GMI_32B_sum / $denom)
-min: MIN(TCC_EA0_WRREQ_WRITE_GMI_32B_sum / $denom)
-max: MAX(TCC_EA0_WRREQ_WRITE_GMI_32B_sum / $denom)
+avg: AVG(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom)
 unit: (Bytes + $normUnit)
 Write Bandwidth - HBM:
-avg: AVG(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum / $denom)
-min: MIN(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum / $denom)
-max: MAX(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum / $denom)
+avg: AVG(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ $denom)
 unit: (Bytes + $normUnit)
 Atomic:
 avg: AVG((TCC_EA0_ATOMIC_sum / $denom))
@@ -679,17 +718,17 @@ Panel Config:
 max: MAX((TCC_EA0_WRREQ_ATOMIC_DRAM_sum / $denom))
 unit: (Req + $normUnit)
 Atomic Bandwidth - PCIe:
-avg: AVG(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum / $denom)
-min: MIN(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum / $denom)
-max: MAX(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum / $denom)
+avg: AVG(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ $denom)
 unit: (Bytes + $normUnit)
 "Atomic Bandwidth - Infinity Fabric\u2122":
-avg: AVG(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum / $denom)
-min: MIN(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum / $denom)
-max: MAX(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum / $denom)
+avg: AVG(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ $denom)
 unit: (Bytes + $normUnit)
 Atomic Bandwidth - HBM:
-avg: AVG(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum / $denom)
-min: MIN(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum / $denom)
-max: MAX(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum / $denom)
+avg: AVG(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ $denom)
 unit: (Bytes + $normUnit)
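The write and atomic bandwidth changes in the two hunks above multiply each `*_32B_sum` counter by 32 before dividing by `$denom`, so the value matches the `Bytes` unit that was already declared. The Python sketch below illustrates that arithmetic under stated assumptions: the function name, the sample list, and the idea that AVG/MIN/MAX aggregate over a list of per-dispatch samples are all hypothetical, and the counter is assumed to count 32-byte requests as its name suggests.

```python
from statistics import mean

# Assumption: each tick of a *_32B counter represents one 32-byte request.
REQUEST_BYTES = 32


def bandwidth_stats(counts_32b, denom, scale=REQUEST_BYTES):
    """Return (avg, min, max) of count * scale / denom over the samples."""
    if denom <= 0:
        raise ValueError("denom must be positive")
    values = [count * scale / denom for count in counts_32b]
    return mean(values), min(values), max(values)


# Hypothetical samples of a counter such as TCC_EA0_WRREQ_WRITE_DRAM_32B_sum.
samples = [100, 200, 300]

# Old expression: the raw request count was reported under a Bytes unit.
before = bandwidth_stats(samples, denom=1, scale=1)
# New expression: requests are converted to bytes first.
after = bandwidth_stats(samples, denom=1)
```

For these made-up samples the old expression under-reports every statistic by a factor of 32, which is the discrepancy the commit title refers to.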
+13
-13
@@ -77,24 +77,24 @@ src/rocprof_compute_soc/analysis_configs/gfx940/1400_scalar_l1_data_cache.yaml:
 src/rocprof_compute_soc/analysis_configs/gfx941/1400_scalar_l1_data_cache.yaml: 8871e3b65132321cb3880a48f894d8c3b2c56a3936d382c3c2b02723ed5c8ec5
 src/rocprof_compute_soc/analysis_configs/gfx942/1400_scalar_l1_data_cache.yaml: 8871e3b65132321cb3880a48f894d8c3b2c56a3936d382c3c2b02723ed5c8ec5
 src/rocprof_compute_soc/analysis_configs/gfx950/1400_scalar_l1_data_cache.yaml: 8871e3b65132321cb3880a48f894d8c3b2c56a3936d382c3c2b02723ed5c8ec5
-src/rocprof_compute_soc/analysis_configs/gfx908/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 231f9b7c09266c4aac50ac4db1b055c36eb6e563ba713c5f3aa30508d03b9170
-src/rocprof_compute_soc/analysis_configs/gfx90a/1500_address_processing_unit_and_data_return_path_ta_td.yaml: eb1ec287cc1f9f133b80fdde072a2b86e819f96ccdf4c305e721f3466d37b156
-src/rocprof_compute_soc/analysis_configs/gfx940/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 52ae21cec4ce4990e966d7fb438ac02b7e63ad4bc428f9770cd2c08d80f712da
-src/rocprof_compute_soc/analysis_configs/gfx941/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 52ae21cec4ce4990e966d7fb438ac02b7e63ad4bc428f9770cd2c08d80f712da
-src/rocprof_compute_soc/analysis_configs/gfx942/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 52ae21cec4ce4990e966d7fb438ac02b7e63ad4bc428f9770cd2c08d80f712da
-src/rocprof_compute_soc/analysis_configs/gfx950/1500_address_processing_unit_and_data_return_path_ta_td.yaml: f7b032202e1aea6befda0d62e3d9f04b846f473218bd62e90d59a34678b62a77
+src/rocprof_compute_soc/analysis_configs/gfx908/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 633d59aba82b3a495b7ba33fa4b2ae4da638b58632bcc37ff18be87af68ce4d4
+src/rocprof_compute_soc/analysis_configs/gfx90a/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 2bdb9d7b3bea1057b3baee29ba3b428b211808261063a97bc4b6b319f4a19fb3
+src/rocprof_compute_soc/analysis_configs/gfx940/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 3180c2f3266be0ff44e01d73d247ca43ae2ee18ecaf61765f58849e36c701b19
+src/rocprof_compute_soc/analysis_configs/gfx941/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 3180c2f3266be0ff44e01d73d247ca43ae2ee18ecaf61765f58849e36c701b19
+src/rocprof_compute_soc/analysis_configs/gfx942/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 3180c2f3266be0ff44e01d73d247ca43ae2ee18ecaf61765f58849e36c701b19
+src/rocprof_compute_soc/analysis_configs/gfx950/1500_address_processing_unit_and_data_return_path_ta_td.yaml: 9e56cef5b066fb575a5c530bcf9400f1291dd8636b12c8a2244cdba1defafc9f
 src/rocprof_compute_soc/analysis_configs/gfx908/1600_vector_l1_data_cache.yaml: e6ec43014ce7b7cc072385d4eba072dd187b5de14979c169a3c1e9b8fc4c2762
 src/rocprof_compute_soc/analysis_configs/gfx90a/1600_vector_l1_data_cache.yaml: e6ec43014ce7b7cc072385d4eba072dd187b5de14979c169a3c1e9b8fc4c2762
 src/rocprof_compute_soc/analysis_configs/gfx940/1600_vector_l1_data_cache.yaml: 0e53921cc8d87a9adade250b9632fa42d33c825565152e37d6e56f45f83a3a28
 src/rocprof_compute_soc/analysis_configs/gfx941/1600_vector_l1_data_cache.yaml: 0e53921cc8d87a9adade250b9632fa42d33c825565152e37d6e56f45f83a3a28
 src/rocprof_compute_soc/analysis_configs/gfx942/1600_vector_l1_data_cache.yaml: 0e53921cc8d87a9adade250b9632fa42d33c825565152e37d6e56f45f83a3a28
 src/rocprof_compute_soc/analysis_configs/gfx950/1600_vector_l1_data_cache.yaml: cd21327c193d2af8c18066b9c13f67e3d5dfb44731777bc5a1b6a7738c902dd1
-src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml: 6aeda249093c666000b104f8631b4a85698e083dd55e77e1e1f095f222054742
-src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml: a4ec667e0b827c046de207416d185dd528f030f29bdee162a2634e579bb31846
-src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml: a9ac811e491fce354aef029b11a96edb589535e84224fa2e2b323623e9fd6e00
-src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml: 7d925c3369b366c23e638ca2b3d074672324a5b9fd0fa586a3e71dee458743a6
-src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml: 7532dc55c28c809f435f5edae98632a2d99adc898b2b71a661e2c9696f674f4a
-src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml: a9f3146a99e74eaba5327be3cdf9361fb8b69d1640751fb05519e44dd2ec7292
+src/rocprof_compute_soc/analysis_configs/gfx908/1700_l2_cache.yaml: 5b48c690b6069a5610d07cc0c2a5e1da65a52296205dcf48a3b6fa5e3df36e9b
+src/rocprof_compute_soc/analysis_configs/gfx90a/1700_l2_cache.yaml: a9b128267a069060e891533334c52586c706f145b1e813a4081cb21d425516ad
+src/rocprof_compute_soc/analysis_configs/gfx940/1700_l2_cache.yaml: b4eea39f0e23e501ad503cdd96db377109c7f0e212949828fe06102de7355349
+src/rocprof_compute_soc/analysis_configs/gfx941/1700_l2_cache.yaml: da0189cd7f6e1ab4b79d0c054c2cdc1f7a9c81972dae9e5285f2f3d9c30ca644
+src/rocprof_compute_soc/analysis_configs/gfx942/1700_l2_cache.yaml: b0802f923052eb584ce138210ebf2db70fb7883926896da1861a9e857d4abe81
+src/rocprof_compute_soc/analysis_configs/gfx950/1700_l2_cache.yaml: 58bdd965421d610567e461becd7094fa41d668b119eddab99054d2bd6dc12acf
 src/rocprof_compute_soc/analysis_configs/gfx908/1800_l2_cache_per_channel.yaml: a0c53202fe9f68d5e1fa689ce0643c471ced7d47e007d8ccc68fba294f7f6a05
 src/rocprof_compute_soc/analysis_configs/gfx90a/1800_l2_cache_per_channel.yaml: a0c53202fe9f68d5e1fa689ce0643c471ced7d47e007d8ccc68fba294f7f6a05
 src/rocprof_compute_soc/analysis_configs/gfx940/1800_l2_cache_per_channel.yaml: e184e3692eb0d641fb2e37fada0e58a6c4958553931d7c038b884e1e6986093f
@@ -107,4 +107,4 @@ src/rocprof_compute_soc/analysis_configs/gfx940/2100_pc_sampling.yaml: 4f3af5504
 src/rocprof_compute_soc/analysis_configs/gfx941/2100_pc_sampling.yaml: 4f3af55040c40bee5f1fd88d83e2324d06e5dc462c0adc3e6d5b19b3f31af5e7
 src/rocprof_compute_soc/analysis_configs/gfx942/2100_pc_sampling.yaml: 4f3af55040c40bee5f1fd88d83e2324d06e5dc462c0adc3e6d5b19b3f31af5e7
 src/rocprof_compute_soc/analysis_configs/gfx950/2100_pc_sampling.yaml: 4f3af55040c40bee5f1fd88d83e2324d06e5dc462c0adc3e6d5b19b3f31af5e7
-docs/data/metrics_description.yaml: b912cf868d488d6ff78d4efc6ceeca27cca5811f4c705efa68a21dd6ddb1609b
+docs/data/metrics_description.yaml: 819c08a584ae8b418e6983aa51108b95e43eda4f3b7892eab336c61d844b20bf
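The entries above pair each file path with a 64-character hexadecimal digest. The following is a minimal sketch of re-checking one entry, assuming the digests are SHA-256 hashes of the raw file contents. The diff does not name the algorithm; the digest length is merely consistent with SHA-256. The helper names `sha256_of_file` and `matches_recorded_digest` are hypothetical and are not part of rocprof-compute.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path, recorded_hex):
    """Compare a file's digest with a recorded value (assumed to be SHA-256)."""
    return sha256_of_file(path) == recorded_hex.strip().lower()
```

If a config file is edited without refreshing its recorded digest, a check of this kind would report a mismatch for that path.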
@@ -10913,6 +10913,13 @@ panels:
 This is expected to be the sum of global/generic and spill/stack atomics
 in the :ref:`address processor <desc-ta>`.
 unit: Instructions per normalization unit
+Write Ack Instructions:
+plain: The total number of write acknowledgements submitted by data-return
+unit to SQ, summed over all compute units on the accelerator, per normalization
+unit.
+rst: The total number of write acknowledgements submitted by :ref:`data-return unit <desc-td>`
+to SQ, summed over all compute units on the accelerator, per normalization unit.
+unit: Instructions per normalization unit
 - id: 1600
 title: Vector L1 Data Cache
 data source:
@@ -14728,6 +14735,21 @@ panels:
 min: MIN((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
 max: MAX((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
 unit: (Req + $normUnit)
+Read Bandwidth - PCIe:
+avg: AVG(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_RDREQ_IO_32B_sum * 32/ $denom)
+unit: (Bytes + $normUnit)
+"Read Bandwidth - Infinity Fabric\u2122":
+avg: AVG(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_RDREQ_GMI_32B_sum * 32/ $denom)
+unit: (Bytes + $normUnit)
+Read Bandwidth - HBM:
+avg: AVG(TCC_EA0_RDREQ_DRAM_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_RDREQ_DRAM_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_RDREQ_DRAM_32B_sum * 32/ $denom)
+unit: (Bytes + $normUnit)
 Write and Atomic (32B):
 avg: AVG(((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom))
 min: MIN(((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) / $denom))
@@ -14754,19 +14776,19 @@ panels:
 max: MAX((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
 unit: (Req + $normUnit)
 Write Bandwidth - PCIe:
-avg: AVG(TCC_EA0_WRREQ_WRITE_IO_32B_sum / $denom)
-min: MIN(TCC_EA0_WRREQ_WRITE_IO_32B_sum / $denom)
-max: MAX(TCC_EA0_WRREQ_WRITE_IO_32B_sum / $denom)
+avg: AVG(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_WRREQ_WRITE_IO_32B_sum * 32/ $denom)
 unit: (Bytes + $normUnit)
 "Write Bandwidth - Infinity Fabric\u2122":
-avg: AVG(TCC_EA0_WRREQ_WRITE_GMI_32B_sum / $denom)
-min: MIN(TCC_EA0_WRREQ_WRITE_GMI_32B_sum / $denom)
-max: MAX(TCC_EA0_WRREQ_WRITE_GMI_32B_sum / $denom)
+avg: AVG(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_WRREQ_WRITE_GMI_32B_sum * 32/ $denom)
 unit: (Bytes + $normUnit)
 Write Bandwidth - HBM:
-avg: AVG(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum / $denom)
-min: MIN(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum / $denom)
-max: MAX(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum / $denom)
+avg: AVG(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_WRREQ_WRITE_DRAM_32B_sum * 32/ $denom)
 unit: (Bytes + $normUnit)
 Atomic:
 avg: AVG((TCC_EA0_ATOMIC_sum / $denom))
@@ -14779,19 +14801,19 @@ panels:
 max: MAX((TCC_EA0_WRREQ_ATOMIC_DRAM_sum / $denom))
 unit: (Req + $normUnit)
 Atomic Bandwidth - PCIe:
-avg: AVG(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum / $denom)
-min: MIN(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum / $denom)
-max: MAX(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum / $denom)
+avg: AVG(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_WRREQ_ATOMIC_IO_32B_sum * 32/ $denom)
 unit: (Bytes + $normUnit)
 "Atomic Bandwidth - Infinity Fabric\u2122":
-avg: AVG(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum / $denom)
-min: MIN(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum / $denom)
-max: MAX(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum / $denom)
+avg: AVG(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_WRREQ_ATOMIC_GMI_32B_sum * 32/ $denom)
 unit: (Bytes + $normUnit)
 Atomic Bandwidth - HBM:
-avg: AVG(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum / $denom)
-min: MIN(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum / $denom)
-max: MAX(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum / $denom)
+avg: AVG(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ $denom)
+min: MIN(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ $denom)
+max: MAX(TCC_EA0_WRREQ_ATOMIC_DRAM_32B_sum * 32/ $denom)
 unit: (Bytes + $normUnit)
 gfx908:
 Read (32B):
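The corrected formulas above multiply each `*_32B_sum` counter by 32 before dividing by `$denom`, which is what makes the result agree with the `Bytes` unit. The following is a minimal sketch of that arithmetic, assuming each counter increment stands for one 32-byte transfer (implied by the factor of 32 in the diff, not stated outright). The function names and the sample numbers are hypothetical and are not taken from rocprof-compute.

```python
BYTES_PER_COUNT = 32  # assumption: one *_32B_sum increment equals 32 bytes


def bytes_per_norm_unit(count_32b, denom):
    """Convert a 32-byte request count to bytes per normalization unit."""
    if denom <= 0:
        raise ValueError("denom must be positive")
    return count_32b * BYTES_PER_COUNT / denom


def summarize(samples):
    """Return avg/min/max bandwidth over (count_32b, denom) samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    values = [bytes_per_norm_unit(count, denom) for count, denom in samples]
    return {
        "avg": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }
```

Without the factor of 32, the same inputs would yield a request count per normalization unit while still being labeled as bytes, which appears to be the mismatch this change addresses.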
@@ -15064,6 +15086,24 @@ panels:
 requested in a cache line, the data movement will still be counted as a full
 cache line.
 unit: Bytes per normalization unit
+Read Bandwidth:
+plain: Total number of bytes looked up in the L2 cache for read requests,
+per normalization unit.
+rst: Total number of bytes looked up in the L2 cache for read requests,
+per :ref:`normalization unit <normalization-units>`.
+unit: Bytes per normalization unit
+Write Bandwidth:
+plain: Total number of bytes looked up in the L2 cache for write requests,
+per normalization unit.
+rst: Total number of bytes looked up in the L2 cache for write requests,
+per :ref:`normalization unit <normalization-units>`.
+unit: Bytes per normalization unit
+Atomic Bandwidth:
+plain: Total number of bytes looked up in the L2 cache for atomic requests,
+per normalization unit.
+rst: Total number of bytes looked up in the L2 cache for atomic requests,
+per :ref:`normalization unit <normalization-units>`.
+unit: Bytes per normalization unit
 Req:
 plain: The total number of incoming requests to the L2 from all clients for
 all request types, per normalization unit.
@@ -15235,6 +15275,18 @@ panels:
 from any source other than the accelerator's local HBM, per :ref:`normalization
 unit <normalization-units>`. See :ref:`l2-request-flow` for more detail.
 unit: Requests per normalization unit
+Read Bandwidth - PCIe:
+plain: Total number of bytes due to L2 read requests due to PCIe traffic, per normalization unit.
+rst: Total number of bytes due to L2 read requests due to PCIe traffic, per normalization unit.
+unit: Bytes per normalization unit
+"Read Bandwidth - Infinity Fabric\u2122":
+plain: Total number of bytes due to L2 read requests due to Infinity Fabric traffic, per normalization unit.
+rst: Total number of bytes due to L2 read requests due to Infinity Fabric traffic, per normalization unit.
+unit: Bytes per normalization unit
+Read Bandwidth - HBM:
+plain: Total number of bytes due to L2 read requests due to HBM traffic, per normalization unit.
+rst: Total number of bytes due to L2 read requests due to HBM traffic, per normalization unit.
+unit: Bytes per normalization unit
 Write and Atomic (32B):
 plain: The total number of L2 requests to Infinity Fabric to write or atomically
 update 32B of data to any memory location, per normalization unit.
@@ -15273,6 +15325,30 @@ panels:
 HBM, per :ref:`normalization unit <normalization-units>`. See :ref:`l2-request-flow`
 for more detail.
 unit: Requests per normalization unit
+Write Bandwidth - PCIe:
+plain: Total number of bytes due to L2 write requests due to PCIe traffic, per normalization unit.
+rst: Total number of bytes due to L2 write requests due to PCIe traffic, per normalization unit.
+unit: Bytes per normalization unit
+"Write Bandwidth - Infinity Fabric\u2122":
+plain: Total number of bytes due to L2 write requests due to Infinity Fabric traffic, per normalization unit.
+rst: Total number of bytes due to L2 write requests due to Infinity Fabric traffic, per normalization unit.
+unit: Bytes per normalization unit
+Write Bandwidth - HBM:
+plain: Total number of bytes due to L2 write requests due to HBM traffic, per normalization unit.
+rst: Total number of bytes due to L2 write requests due to HBM traffic, per normalization unit.
+unit: Bytes per normalization unit
+Atomic Bandwidth - PCIe:
+plain: Total number of bytes due to L2 atomic requests due to PCIe traffic, per normalization unit.
+rst: Total number of bytes due to L2 atomic requests due to PCIe traffic, per normalization unit.
+unit: Bytes per normalization unit
+"Atomic Bandwidth - Infinity Fabric\u2122":
+plain: Total number of bytes due to L2 atomic requests due to Infinity Fabric traffic, per normalization unit.
+rst: Total number of bytes due to L2 atomic requests due to Infinity Fabric traffic, per normalization unit.
+unit: Bytes per normalization unit
+Atomic Bandwidth - HBM:
+plain: Total number of bytes due to L2 atomic requests due to HBM traffic, per normalization unit.
+rst: Total number of bytes due to L2 atomic requests due to HBM traffic, per normalization unit.
+unit: Bytes per normalization unit
 Atomic:
 plain: The total number of L2 requests to Infinity Fabric to atomically update
 32B or 64B of data in any memory location, per normalization unit. See Request