Support MI 350 profiling (#632)

* Add MI 350 hardware information

* Refactor MI GPU YAML file and corresponding interface

* Add SoC file for gfx950 architecture

* Add analysis report configs for MI 350 containing existing metrics

* Add placeholder None-valued metrics for previous architectures so that
  baseline comparison works

* Enable testing on MI 350

* Analysis config metric changes
    - SPI changes
        - Update metric formula for default SPI pipe counter
            - Use efficiently collected pipe-wise SPI counters
        - Add SPI Wave Occupancy
        - Add Scheduler-Pipe Wave Utilization
        - Update formula for VGPR Writes
        - Add Scheduler-Pipe FIFO Full Rate
    - CPC changes
        - Add CPC SYNC FIFO Full Rate
        - Add CPC CANE Stall Rate
        - Add CPC ADC Utilization
    - SQ changes
        - Add VALU co-issue efficiency
        - Add F6F4 datatype metrics
        - Update formula for total FLOPs by adding F6F4 counters
        - Add LDS STORE / LOAD / ATOMIC metrics
        - Add LDS STORE / LOAD / ATOMIC bandwidth
        - Add LDS FIFO and TA ADDR / CMD / DATA FIFO full rates

* Collect TCP_TCP_LATENCY_sum only for gfx950 (MI 350)

* Do not inject SQ_ACCUM_PREV_HIRES unnecessarily

* Do not hardcode memory and shader clock speeds

* Write num_hbm_channels to sysinfo.csv instead of hbm_bw while profiling

* Move sysinfo.csv generation to the pre-processing step of profiling

* Add warnings to use --specs-correction for missing sysinfo.csv values during the analysis phase

* Update CHANGELOG

* Analysis phase warning to use --specs-correction when needed
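The help text added for `--specs-correction` describes its argument as comma-separated `specname:specvalue` pairs. Below is a minimal sketch of splitting such a string into a dictionary. The helper name `parse_specs_correction` is hypothetical and is not the tool's own parser. The spec names in the example (`max_sclk`, `max_mclk`) are illustrative. Presumably the real names correspond to the `sysinfo.csv` values that the analysis warnings refer to.

```python
def parse_specs_correction(arg: str) -> dict:
    """Split 'name1:value1,name2:value2' into {'name1': 'value1', ...}.

    Hypothetical helper for illustration; not the tool's own parser.
    """
    specs = {}
    for pair in arg.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty entries such as a trailing comma
        # Partition on the first ':' so a value may itself contain ':'.
        name, sep, value = pair.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not name or not value:
            raise ValueError(f"expected specname:specvalue, got {pair!r}")
        specs[name] = value
    return specs


# Illustrative spec names and values only.
print(parse_specs_correction("max_sclk:2100,max_mclk:1300"))
```

Values are kept as strings here. Converting them to numbers is left to whatever consumes the corrected specs.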

[ROCm/rocprofiler-compute commit: f9aa7be97c]
vedithal-amd
2025-04-03 02:21:18 -04:00
committed by GitHub
parent 1273a5e2a9
commit 27585a8a2b
366 changed files with 22368 additions and 577 deletions
+29 -5
@@ -22,17 +22,33 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
* Support host-trap PC Sampling on CLI (beta version)
* Add support for tuned performance counters for gfx950 GPUs
* Add L1 latencies
* Add L2 latencies
* Add L2 to EA stalls
* Add L2 to EA stalls per channel
* Support for AMD Instinct MI350 series GPUs with the addition of the following metrics:
  * VALU co-issue efficiency (two VALU instructions issued together)
  * Shader Processor Input (SPI) Wave Occupancy
  * Scheduler-Pipe Wave Utilization
  * Scheduler FIFO Full Rate
  * CPC ADC Utilization
  * F6F4 datatype metrics
  * Updated formula for total FLOPs that takes F6F4 ops into account
  * LDS STORE, LDS LOAD, and LDS ATOMIC instruction count metrics
  * LDS STORE, LDS LOAD, and LDS ATOMIC bandwidth metrics
  * LDS FIFO full rate
  * Sequencer -> TA ADDR stall rates
  * Sequencer -> TA CMD stall rates
  * Sequencer -> TA DATA stall rates
  * L1 latencies
  * L2 latencies
  * L2 to EA stalls
  * L2 to EA stalls per channel
### Changed
* Change normal_unit default to per_kernel
* Change dependency from rocm-smi to amd-smi
* Decrease profiling time by not collecting counters that are unused in post-analysis
* Update the definition of the following metrics for MI 350:
  * VGPR Writes
  * Total FLOPs (now includes FP6 and FP4 ops)
### Resolved issues
@@ -44,6 +60,14 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
* GPU ID filtering is not supported when using rocprof v3
* Analysis of previously collected workload data will not work due to the sysinfo.csv schema change
  * As a workaround, run the profiling operation again for the workload and interrupt the process after ten seconds
    (`sysinfo.csv` is now written before profiling begins), then copy the `sysinfo.csv` file from the new data folder to the old one.
    This assumes your system specification has not changed since the previous workload data was created.
* Analysis of new workloads might require providing the shader and memory clock speeds with the
  `--specs-correction` option if `amd-smi` or `rocminfo` does not report them.
## ROCm Compute Profiler 3.1.0 for ROCm 6.4.0
### Added
@@ -292,9 +292,8 @@ add_test(
add_test(
NAME test_L1_cache_counters
COMMAND
${Python3_EXECUTABLE} -m pytest -m L1_cache
--junitxml=tests/test_TCP_counters.xml ${COV_OPTION}
${PROJECT_SOURCE_DIR}/tests/test_TCP_counters.py
${Python3_EXECUTABLE} -m pytest -m L1_cache --junitxml=tests/test_TCP_counters.xml
${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_TCP_counters.py
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
# ---------
@@ -673,7 +673,7 @@ Examples:
"--specs-correction",
type=str,
metavar="",
help="\t\tSpecify the specs to correct.",
help="\t\tSpecify the specs to correct. e.g. --specs-correction='specname1:specvalue1,specname2:specvalue2'",
)
analyze_advanced_group.add_argument(
"--list-nodes",
@@ -107,7 +107,6 @@ class webui_analysis(OmniAnalyze_Base):
console_debug("analysis", "gui normalization is %s" % norm_filt)
base_data = self.initalize_runs() # Re-initalizes everything
hbm_bw = base_data[base_run].sys_info["hbm_bw"][0]
panel_configs = copy.deepcopy(arch_configs.panel_configs)
# Generate original raw df
base_data[base_run].raw_pmc = file_io.create_df_pmc(
@@ -231,7 +230,6 @@ class webui_analysis(OmniAnalyze_Base):
norm_filt=norm_filt,
comparable_columns=comparable_columns,
decimal=self.get_args().decimal,
hbm_bw=base_data[base_run].sys_info["hbm_bw"][0],
)
# Update content for this section
@@ -358,7 +356,6 @@ def determine_chart_type(
norm_filt,
comparable_columns,
decimal,
hbm_bw,
):
content = []
@@ -372,9 +369,7 @@ def determine_chart_type(
# Determine chart type:
# a) Barchart
if table_config["id"] in [x for i in barchart_elements.values() for x in i]:
d_figs = build_bar_chart(
display_df, table_config, barchart_elements, norm_filt, hbm_bw
)
d_figs = build_bar_chart(display_df, table_config, barchart_elements, norm_filt)
# Smaller formatting if barchart yeilds several graphs
if (
len(d_figs)
@@ -311,6 +311,21 @@ class RocProfCompute_Base:
if self.__args.name.find(".") != -1 or self.__args.name.find("-") != -1:
console_error("'-' and '.' are not permitted in -n/--name")
gen_sysinfo(
workload_name=self.__args.name,
workload_dir=self.get_args().path,
ip_blocks=[
name
for name, type in self.__args.filter_blocks.items()
if type == "hardware_block"
],
app_cmd=self.__args.remaining,
skip_roof=self.__args.no_roof,
roof_only=self.__args.roof_only,
mspec=self._soc._mspec,
soc=self._soc,
)
@abstractmethod
def run_profiling(self, version: str, prog: str):
"""Run profiling."""
@@ -446,21 +461,6 @@ class RocProfCompute_Base:
"performing post-processing using %s profiler" % self.__profiler,
)
gen_sysinfo(
workload_name=self.__args.name,
workload_dir=self.get_args().path,
ip_blocks=[
name
for name, type in self.__args.filter_blocks.items()
if type == "hardware_block"
],
app_cmd=self.__args.remaining,
skip_roof=self.__args.no_roof,
roof_only=self.__args.roof_only,
mspec=self._soc._mspec,
soc=self._soc,
)
def test_df_column_equality(df):
return df.eq(df.iloc[:, 0], axis=0).all(1).all()
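The context helper `test_df_column_equality` uses pandas to check that every column of a DataFrame equals its first column, row by row. Below is a dependency-free sketch of the same check over a table stored as a list of rows. `rows_all_equal` is a hypothetical name used only for illustration.

```python
from typing import Sequence


def rows_all_equal(rows: Sequence[Sequence[object]]) -> bool:
    """True if every value in each row equals that row's first value.

    Stdlib stand-in for df.eq(df.iloc[:, 0], axis=0).all(1).all(), with the
    table stored row by row (hypothetical helper, for illustration only).
    """
    return all(all(value == row[0] for value in row) for row in rows)


print(rows_all_equal([[1, 1, 1], [2, 2, 2]]))  # True
print(rows_all_equal([[1, 1, 1], [2, 3, 2]]))  # False
```

As with pandas `eq`, NaN never compares equal, so a row containing NaN fails the check.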
@@ -62,6 +62,13 @@ Panel Config:
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: None # No perf counter
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (Int8):
value: None # No perf counter
unit: GOPs
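The TODO comments explain why these `None`-valued entries exist: they let baseline comparison work against architectures that do not define the new metric. The sketch below compares plain dictionaries to show the failure the placeholders avoid. This is a simplification, not the tool's actual comparison code. The metric values are made up, and `compare_to_baseline` is a hypothetical name.

```python
from typing import Dict, Optional

Metrics = Dict[str, Optional[float]]


def compare_to_baseline(baseline: Metrics, target: Metrics) -> Metrics:
    """Percent change of each target metric relative to the baseline run.

    Simplified illustration, not the tool's comparison code: it looks up
    every target metric name in the baseline, so a metric that the baseline
    architecture never defines raises KeyError.
    """
    result: Metrics = {}
    for name, value in target.items():
        base = baseline[name]  # KeyError if the baseline lacks this metric
        if base is None or value is None or base == 0:
            result[name] = None  # placeholder or zero baseline: no comparison
        else:
            result[name] = 100.0 * (value - base) / base
    return result


# Hypothetical values: an older-architecture baseline vs. an MI 350 run.
older = {"MFMA FLOPs (F16)": 100.0}
mi350 = {"MFMA FLOPs (F16)": 150.0, "MFMA FLOPs (F6F4)": 40.0}

try:
    compare_to_baseline(older, mi350)
except KeyError as missing:
    print("baseline is missing", missing)

# With a None-valued placeholder, the comparison completes.
padded = {**older, "MFMA FLOPs (F6F4)": None}
print(compare_to_baseline(padded, mi350))
```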
@@ -179,17 +186,17 @@ Panel Config:
value: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA_RDREQ_LEVEL_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum
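The L2-Fabric bandwidth formulas above weight each request by its size:

- **Reads:** `TCC_EA_RDREQ_32B_sum` requests count as 32 bytes and all remaining read requests as 64 bytes.
- **Writes:** `TCC_EA_WRREQ_64B_sum` requests count as 64 bytes and the rest as 32 bytes.

The `pop` field divides the kernel-averaged value by `$hbmBandwidth`. The sketch below evaluates both formulas for one kernel. It assumes nanosecond timestamps, so bytes per nanosecond equals GB/s. The function names and counter values are hypothetical.

```python
from statistics import mean


def l2_fabric_read_bw(rdreq: int, rdreq_32b: int, start_ns: int, end_ns: int) -> float:
    """GB/s: RDREQ_32B requests move 32 bytes, all other read requests 64 bytes."""
    bytes_read = rdreq_32b * 32 + (rdreq - rdreq_32b) * 64
    return bytes_read / (end_ns - start_ns)  # bytes per ns == GB/s


def l2_fabric_write_bw(wrreq: int, wrreq_64b: int, start_ns: int, end_ns: int) -> float:
    """GB/s: WRREQ_64B requests move 64 bytes, all other write requests 32 bytes."""
    bytes_written = wrreq_64b * 64 + (wrreq - wrreq_64b) * 32
    return bytes_written / (end_ns - start_ns)


def pct_of_peak(per_kernel_bw: list, hbm_bandwidth: float) -> float:
    """Mirror of pop: 100 * AVG(per-kernel bandwidth) / $hbmBandwidth."""
    return 100 * mean(per_kernel_bw) / hbm_bandwidth


# Illustrative counter values for a single 100 us kernel.
read_bw = l2_fabric_read_bw(1_000_000, 200_000, 0, 100_000)
write_bw = l2_fabric_write_bw(500_000, 300_000, 0, 100_000)
print(read_bw, write_bw)               # 576.0 256.0
print(pct_of_peak([read_bw], 4800.0))  # 12.0 (4800 GB/s is an assumed peak)
```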
@@ -19,6 +19,24 @@ Panel Config:
unit: Unit
tips: Tips
metric:
CPC SYNC FIFO Full Rate:
avg: None
min: None
max: None
unit: pct
tips:
CPC CANE Stall Rate:
avg: None
min: None
max: None
unit: pct
tips:
CPC ADC Utilization:
avg: None
min: None
max: None
unit: pct
tips:
CPF Utilization:
avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
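The guarded ratio in `CPF Utilization` yields `None` instead of dividing by zero when the CPF recorded neither busy nor idle cycles. Below is a per-kernel sketch of the same expression, where `busy` and `idle` stand for `CPF_CPF_STAT_BUSY` and `CPF_CPF_STAT_IDLE`. The function name is hypothetical.

```python
from typing import Optional


def cpf_utilization(busy: int, idle: int) -> Optional[float]:
    """Percent of CPF busy cycles; None when no cycles were recorded,
    matching the `if ... != 0 else None` guard in the YAML formula."""
    total = busy + idle
    return (100 * busy) / total if total != 0 else None


print(cpf_utilization(750, 250))  # 75.0
print(cpf_utilization(0, 0))      # None
```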
@@ -19,6 +19,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Schedule-Pipe Wave Occupancy:
avg: None
min: None
max: None
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe Wave Utilization:
avg: None
min: None
max: None
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
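The utilization metrics in this hunk are simple percentages of GRBM cycle counts:

- **Accelerator Utilization:** `$GRBM_GUI_ACTIVE_PER_XCD` over `$GRBM_COUNT_PER_XCD`.
- **The `SPI_CSN_BUSY`-based metric** (its name falls outside the hunk): busy cycles over active cycles times the pipe and shader-engine counts.
- **Workgroup Manager Utilization:** SPI-busy cycles over GUI-active cycles.

The sketch below treats the `$`-prefixed values as plain numbers. How the tool derives the per-XCD quantities is not shown in this diff, and the function names are hypothetical.

```python
def accelerator_utilization(gui_active: float, grbm_count: float) -> float:
    """100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD"""
    return 100 * gui_active / grbm_count


def scheduler_pipe_utilization(spi_csn_busy: float, gui_active: float,
                               pipes_per_gpu: int, se_per_gpu: int) -> float:
    """100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu)"""
    return 100 * spi_csn_busy / (gui_active * pipes_per_gpu * se_per_gpu)


def workgroup_manager_utilization(spi_busy: float, gui_active: float) -> float:
    """100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD"""
    return 100 * spi_busy / gui_active


# Illustrative per-XCD values, not measurements.
print(accelerator_utilization(900, 1000))           # 90.0
print(scheduler_pipe_utilization(5760, 900, 4, 2))  # 80.0
print(workgroup_manager_utilization(450, 900))      # 50.0
```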
@@ -108,6 +122,13 @@ Panel Config:
0) else None)
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe FIFO Full Rate:
avg: None
min: None
max: None
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
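`Scheduler-Pipe Stall Rate` divides the resource-stall count by SPI-busy cycles scaled by the shader-engine count. It returns `None` when `$GRBM_SPI_BUSY_PER_XCD` is zero. As in the YAML, only the SPI-busy term is guarded. Below is a per-kernel sketch with a hypothetical function name.

```python
from typing import Optional


def scheduler_pipe_stall_rate(ra_res_stall_csn: float, spi_busy: float,
                              se_per_gpu: int) -> Optional[float]:
    """100 * SPI_RA_RES_STALL_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu),
    or None when the workgroup manager was never busy."""
    if spi_busy == 0:
        return None
    return (100 * ra_res_stall_csn) / (spi_busy * se_per_gpu)


print(scheduler_pipe_stall_rate(360, 900, 2))  # 20.0
print(scheduler_pipe_stall_rate(0, 0, 2))      # None
```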
@@ -181,6 +181,13 @@ Panel Config:
max: MAX((TA_FLAT_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
Spill/Stack Coalesceable Instr:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
Global/Generic Coalesceable Instr:
avg: None # No perf counter
min: None # No perf counter
@@ -283,3 +290,10 @@ Panel Config:
max: None # No HW module
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
MFMA-F6F4:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
@@ -61,6 +61,13 @@ Panel Config:
peak: None
pop: None
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: None # No perf counter
unit: None
@@ -109,6 +116,13 @@ Panel Config:
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
VALU Co-Issue Efficiency:
avg: None
min: None
max: None
unit: pct
tips:
VMEM Utilization:
avg: None # No HW module
min: None # No HW module
@@ -210,6 +224,13 @@ Panel Config:
max: None # No perf counter
unit: (OPs + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
F6F4 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: None # No perf counter
min: None # No perf counter
@@ -55,6 +55,48 @@ Panel Config:
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS LOAD:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS STORE:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS ATOMIC:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS LOAD Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS STORE Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS ATOMIC Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
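`Theoretical Bandwidth` multiplies the conflict-free LDS active cycles (`SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT`) by 4 and by the bank count, then normalizes by `$denom`. `$denom` depends on the selected normalization unit. Reading the constant 4 as bytes per bank per cycle is an assumption on our part. Below is a sketch with a hypothetical function name and illustrative inputs.

```python
def lds_theoretical_bandwidth(idx_active: int, bank_conflict: int,
                              lds_banks_per_cu: float, denom: float) -> float:
    """((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4 * TO_INT(banks)) / $denom.

    The factor 4 is read here as bytes per bank per cycle (an assumption).
    """
    conflict_free_cycles = idx_active - bank_conflict
    return (conflict_free_cycles * 4 * int(lds_banks_per_cu)) / denom


# 8000 conflict-free cycles * 4 bytes * 32 banks, with denom = 1 (illustrative)
print(lds_theoretical_bandwidth(10_000, 2_000, 32, 1))  # 1024000.0
```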
@@ -116,3 +158,17 @@ Panel Config:
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
unit: (Accesses + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS Command FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS Data FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
@@ -43,6 +43,27 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Address Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Command Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Data Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
HBM Bandwidth:
value: $hbmBandwidth
unit: GB/s
tips:
- metric_table:
id: 1702
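Profiling now writes `num_hbm_channels` to `sysinfo.csv` instead of `hbm_bw`. `$hbmBandwidth` therefore presumably has to be derived at analysis time from the channel count and the memory clock, which this commit no longer hardcodes. The exact formula and constants are not part of this diff. The sketch below uses the generic estimate channels × clock × bytes per clock, with a hypothetical per-channel width. None of the numbers are MI 350 specifications.

```python
def peak_hbm_bandwidth_gbs(num_hbm_channels: int, max_mclk_mhz: float,
                           bytes_per_clock_per_channel: float) -> float:
    """Generic peak-bandwidth estimate: channels * clock * bytes per clock.

    MHz * bytes gives 1e6 bytes/s; dividing by 1000 converts to GB/s.
    The bytes-per-clock constant is an assumption, not taken from the tool.
    """
    return num_hbm_channels * max_mclk_mhz * bytes_per_clock_per_channel / 1000


# Illustrative inputs only; they are not MI 350 specifications.
print(peak_hbm_bandwidth_gbs(128, 1300, 32))  # 5324.8
```

Keeping the channel count rather than a precomputed bandwidth means a corrected memory clock (for example, one supplied with `--specs-correction`) can feed straight into the estimate.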
@@ -62,6 +62,13 @@ Panel Config:
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: None # No perf counter
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (Int8):
value: None # No perf counter
unit: GOPs
@@ -179,17 +186,17 @@ Panel Config:
value: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA_RDREQ_LEVEL_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum
@@ -19,6 +19,27 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
CPC SYNC FIFO Full Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
CPC CANE Stall Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
CPC ADC Utilization:
avg: None
min: None
max: None
unit: pct
tips:
CPF Utilization:
avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
@@ -19,6 +19,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Schedule-Pipe Wave Occupancy:
avg: None
min: None
max: None
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe Wave Utilization:
avg: None
min: None
max: None
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
0) else None)
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe FIFO Full Rate:
avg: None
min: None
max: None
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
@@ -181,6 +181,13 @@ Panel Config:
max: MAX((TA_FLAT_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
Spill/Stack Coalesceable Instr:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
Global/Generic Read:
avg: AVG((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
min: MIN((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
@@ -271,3 +278,10 @@ Panel Config:
max: None # No HW module
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
MFMA-F6F4:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
@@ -61,6 +61,13 @@ Panel Config:
peak: None
pop: None
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: None # No perf counter
unit: None
@@ -109,6 +116,13 @@ Panel Config:
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
VALU Co-Issue Efficiency:
avg: None
min: None
max: None
unit: pct
tips:
VMEM Utilization:
avg: None # No HW module
min: None # No HW module
@@ -210,6 +224,13 @@ Panel Config:
max: None # No perf counter
unit: (OPs + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
F6F4 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: None # No perf counter
min: None # No perf counter
@@ -55,6 +55,48 @@ Panel Config:
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS LOAD:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS STORE:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS ATOMIC:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS LOAD Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS STORE Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS ATOMIC Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
@@ -116,3 +158,17 @@ Panel Config:
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
unit: (Accesses + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS Command FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS Data FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
@@ -43,6 +43,27 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Address Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Command Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Data Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
HBM Bandwidth:
value: $hbmBandwidth
unit: GB/s
tips:
- metric_table:
id: 1702
@@ -76,6 +76,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (Int8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
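The MFMA rate metrics in this hunk multiply a `SQ_INSTS_VALU_MFMA_MOPS_*` count by 512 and divide by the kernel duration. Assuming nanosecond timestamps, this gives GFLOP/s (or GIOP/s for INT8). The F64 `pop` then compares that rate against a peak of `(($max_sclk * $cu_per_gpu) * 256) / 1000`. Below is a sketch with hypothetical function names and illustrative inputs.

```python
def mfma_rate(mops: int, start_ns: int, end_ns: int) -> float:
    """SQ_INSTS_VALU_MFMA_MOPS_* * 512 / duration; ops per ns == G-ops/s."""
    return mops * 512 / (end_ns - start_ns)


def mfma_f64_peak_gflops(max_sclk_mhz: float, cu_per_gpu: int) -> float:
    """((($max_sclk * $cu_per_gpu) * 256) / 1000) from the F64 pop formula."""
    return (max_sclk_mhz * cu_per_gpu * 256) / 1000


def pct_of_peak(value: float, peak: float) -> float:
    return 100 * value / peak


f64 = mfma_rate(1_000_000, 0, 1_000_000)  # illustrative counts, 1 ms kernel
peak = mfma_f64_peak_gflops(2000, 100)    # illustrative clock and CU count
print(f64, peak, pct_of_peak(f64, peak))  # 512.0 51200.0 1.0
```

The same `mfma_rate` applies to `MFMA IOPs (Int8)` with `SQ_INSTS_VALU_MFMA_MOPS_I8` as input.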
@@ -197,17 +204,17 @@ Panel Config:
value: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA_RDREQ_LEVEL_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum
@@ -19,6 +19,24 @@ Panel Config:
unit: Unit
tips: Tips
metric:
CPC SYNC FIFO Full Rate:
avg: None
min: None
max: None
unit: pct
tips:
CPC CANE Stall Rate:
avg: None
min: None
max: None
unit: pct
tips:
CPC ADC Utilization:
avg: None
min: None
max: None
unit: pct
tips:
CPF Utilization:
avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
@@ -19,6 +19,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Schedule-Pipe Wave Occupancy:
avg: None
min: None
max: None
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe Wave Utilization:
avg: None
min: None
max: None
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
0) else None)
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe FIFO Full Rate:
avg: None
min: None
max: None
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
@@ -181,6 +181,13 @@ Panel Config:
max: MAX((TA_FLAT_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
Spill/Stack Coalesceable Instr:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
Global/Generic Read:
avg: AVG((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
min: MIN((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
@@ -271,3 +278,10 @@ Panel Config:
max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
MFMA-F6F4:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
@@ -75,6 +75,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -124,6 +131,13 @@ Panel Config:
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
VALU Co-Issue Efficiency:
avg: None
min: None
max: None
unit: pct
tips:
VMEM Utilization:
avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
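`VALU Utilization` and `VMEM Utilization` share one normalization. They take active-instruction cycles, divide by GUI-active cycles, and then divide by the CU count. The counters are presumably summed across CUs, so the result reads as an average per-CU percentage. For VMEM, the FLAT and VMEM active cycles are added first. Below is a sketch with hypothetical function names and made-up counts.

```python
def valu_utilization(active_inst_valu: float, gui_active: float, cu_per_gpu: int) -> float:
    """(((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu)"""
    return ((100 * active_inst_valu) / gui_active) / cu_per_gpu


def vmem_utilization(active_inst_flat: float, active_inst_vmem: float,
                     gui_active: float, cu_per_gpu: int) -> float:
    """Same normalization applied to FLAT plus VMEM active-instruction cycles."""
    return ((100 * (active_inst_flat + active_inst_vmem)) / gui_active) / cu_per_gpu


print(valu_utilization(100_000, 10_000, 20))         # 50.0
print(vmem_utilization(30_000, 20_000, 10_000, 20))  # 25.0
```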
@@ -264,6 +278,13 @@ Panel Config:
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
unit: (OPs + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
F6F4 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
@@ -55,6 +55,48 @@ Panel Config:
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS LOAD:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS STORE:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS ATOMIC:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS LOAD Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS STORE Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS ATOMIC Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
@@ -116,3 +158,17 @@ Panel Config:
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
unit: (Accesses + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS Command FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS Data FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
@@ -43,6 +43,27 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Address Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Command Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Data Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
HBM Bandwidth:
value: $hbmBandwidth
unit: GB/s
tips:
- metric_table:
id: 1702
@@ -77,6 +77,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (Int8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -198,18 +205,18 @@ Panel Config:
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA0_RDREQ_LEVEL_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
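One architecture's `L2-Fabric Read BW` in this commit uses `TCC_EA0_*` counters plus an extra `TCC_BUBBLE_sum` term. Read literally, the formula weights each request class differently:

- `TCC_BUBBLE_sum` requests: 128 bytes.
- `TCC_EA0_RDREQ_32B_sum` requests: 32 bytes.
- All remaining read requests: 64 bytes.

The total is then divided by the kernel duration. The sketch below assumes nanosecond timestamps. The function name and counter values are hypothetical.

```python
def l2_fabric_read_bw_bubble(rdreq: int, bubble: int, rdreq_32b: int,
                             start_ns: int, end_ns: int) -> float:
    """Read BW (GB/s) for the TCC_BUBBLE variant of the formula.

    Bubble requests are weighted as 128 bytes, 32B requests as 32 bytes,
    and the remaining read requests as 64 bytes.
    """
    bytes_read = (128 * bubble
                  + 64 * (rdreq - bubble - rdreq_32b)
                  + 32 * rdreq_32b)
    return bytes_read / (end_ns - start_ns)  # bytes per ns == GB/s


print(l2_fabric_read_bw_bubble(1_000_000, 100_000, 200_000, 0, 100_000))  # 640.0
```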
@@ -19,6 +19,27 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
CPC SYNC FIFO Full Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
CPC CANE Stall Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
CPC ADC Utilization:
avg: None
min: None
max: None
unit: pct
tips:
CPF Utilization:
avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
@@ -19,6 +19,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Schedule-Pipe Wave Occupancy:
avg: None
min: None
max: None
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe Wave Utilization:
avg: None
min: None
max: None
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
0) else None)
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe FIFO Full Rate:
avg: None
min: None
max: None
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
@@ -209,6 +209,13 @@ Panel Config:
max: MAX((TA_BUFFER_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Spill/Stack Coalesceable Instr:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
Spill/Stack Read:
avg: AVG((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
min: MIN((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
@@ -274,4 +281,11 @@ Panel Config:
min: MIN((SQ_INSTS_VALU_MFMA_F64 / $denom))
max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
MFMA-F6F4:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
@@ -76,6 +76,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -125,6 +132,13 @@ Panel Config:
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
VALU Co-Issue Efficiency:
avg: None
min: None
max: None
unit: pct
tips:
VMEM Utilization:
avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
@@ -265,6 +279,13 @@ Panel Config:
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
unit: (OPs + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
F6F4 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
@@ -55,6 +55,48 @@ Panel Config:
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS LOAD:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS STORE:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS ATOMIC:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS LOAD Bandwidth:
avg: None
min: None
max: None
unit: Gbps
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS STORE Bandwidth:
avg: None
min: None
max: None
unit: Gbps
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS ATOMIC Bandwidth:
avg: None
min: None
max: None
unit: Gbps
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
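The Theoretical Bandwidth expression counts LDS cycles that were active without a bank conflict. It assumes each such cycle moves 4 bytes per bank across all `$lds_banks_per_cu` banks. `TO_INT` suggests the bank count arrives as a string from the system info. The Speed-of-Light variant divides by kernel duration instead of `$denom`. The sketch below reproduces both forms with hypothetical counter values and a hypothetical 32-bank LDS. It assumes timestamps are in nanoseconds, so bytes per ns read directly as GB/s.

```python
def lds_theoretical_bytes(idx_active: int, bank_conflict: int, banks_per_cu: int) -> int:
    """Conflict-free active cycles x 4 bytes x banks per CU."""
    return (idx_active - bank_conflict) * 4 * banks_per_cu


def lds_bandwidth_gbps(idx_active: int, bank_conflict: int, banks_per_cu: int,
                       start_ns: int, end_ns: int) -> float:
    """Speed-of-Light variant: bytes per nanosecond, numerically GB/s."""
    return lds_theoretical_bytes(idx_active, bank_conflict, banks_per_cu) / (end_ns - start_ns)


# Hypothetical counters: 1,000,000 active cycles, 200,000 of them bank
# conflicts, 32 banks per CU, kernel duration 100,000 ns.
print(lds_theoretical_bytes(1_000_000, 200_000, 32))             # 102400000
print(lds_bandwidth_gbps(1_000_000, 200_000, 32, 0, 100_000))   # 1024.0
```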
@@ -116,3 +158,17 @@ Panel Config:
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
unit: (Accesses + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS Command FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS Data FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
@@ -43,6 +43,27 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Sequencer → TA Address Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Sequencer → TA Command Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Sequencer → TA Data Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
HBM Bandwidth:
value: $hbmBandwidth
unit: GB/s
tips:
- metric_table:
id: 1702
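The new HBM Bandwidth row exposes `$hbmBandwidth`. The L2-Fabric read and write bandwidth rows use it as their peak in place of `$hbm_bw`. How `$hbmBandwidth` is derived from the values written to sysinfo.csv is not shown in this diff, so the sketch below simply takes it as an input (an assumed 8,000 GB/s). The byte weights follow the L2-Fabric formulas:

- `TCC_BUBBLE_sum` counts as 128 B.
- `TCC_EA0_RDREQ_32B_sum` counts as 32 B.
- All other read requests count as 64 B.
- Writes count as 64 B or 32 B.

Counter values are hypothetical.

```python
def l2_fabric_read_bytes(bubble: int, rdreq: int, rdreq_32b: int) -> int:
    """Bytes read over the fabric, following the L2-Fabric Read BW formula.

    TCC_BUBBLE_sum is weighted by 128 B, TCC_EA0_RDREQ_32B_sum by 32 B,
    and every remaining TCC_EA0_RDREQ_sum request by 64 B.
    """
    return 128 * bubble + 64 * (rdreq - bubble - rdreq_32b) + 32 * rdreq_32b


def l2_fabric_write_bytes(wrreq: int, wrreq_64b: int) -> int:
    """Bytes written: 64 B requests plus the remaining requests at 32 B each."""
    return 64 * wrreq_64b + 32 * (wrreq - wrreq_64b)


def pct_of_hbm_peak(num_bytes: int, start_ns: int, end_ns: int,
                    hbm_bandwidth_gbps: float) -> float:
    """Bytes per ns equals GB/s; express it as a percentage of $hbmBandwidth."""
    gbps = num_bytes / (end_ns - start_ns)
    return 100 * gbps / hbm_bandwidth_gbps


# Hypothetical counters for a 1,000 ns kernel and an assumed 8,000 GB/s peak.
read_bytes = l2_fabric_read_bytes(bubble=10_000, rdreq=50_000, rdreq_32b=10_000)
write_bytes = l2_fabric_write_bytes(wrreq=2_000, wrreq_64b=1_500)
print(read_bytes, write_bytes)                         # 3520000 112000
print(pct_of_hbm_peak(read_bytes, 0, 1_000, 8_000.0))  # 44.0
```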
@@ -77,6 +77,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (Int8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -198,18 +205,18 @@ Panel Config:
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA0_RDREQ_LEVEL_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
@@ -19,6 +19,27 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
CPC SYNC FIFO Full Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
CPC CANE Stall Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
CPC ADC Utilization:
avg: None
min: None
max: None
unit: pct
tips:
CPF Utilization:
avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
@@ -19,6 +19,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Schedule-Pipe Wave Occupancy:
avg: None
min: None
max: None
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Scheduler-Pipe Wave Utilization:
avg: None
min: None
max: None
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
0) else None)
unit: Pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Scheduler-Pipe FIFO Full Rate:
avg: None
min: None
max: None
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
@@ -209,6 +209,13 @@ Panel Config:
max: MAX((TA_BUFFER_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Spill/Stack Coalesceable Instr:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
Spill/Stack Read:
avg: AVG((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
min: MIN((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
@@ -275,3 +282,10 @@ Panel Config:
max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
MFMA-F6F4:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
@@ -76,6 +76,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -125,6 +132,13 @@ Panel Config:
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
VALU Co-Issue Efficiency:
avg: None
min: None
max: None
unit: pct
tips:
VMEM Utilization:
avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
@@ -265,6 +279,13 @@ Panel Config:
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
unit: (OPs + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
F6F4 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
@@ -55,6 +55,48 @@ Panel Config:
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS LOAD:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS STORE:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS ATOMIC:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS LOAD Bandwidth:
avg: None
min: None
max: None
unit: Gbps
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS STORE Bandwidth:
avg: None
min: None
max: None
unit: Gbps
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS ATOMIC Bandwidth:
avg: None
min: None
max: None
unit: Gbps
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
@@ -116,3 +158,17 @@ Panel Config:
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
unit: (Accesses + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS Command FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS Data FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
@@ -43,6 +43,27 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Sequencer → TA Address Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Sequencer → TA Command Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Sequencer → TA Data Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
HBM Bandwidth:
value: $hbmBandwidth
unit: GB/s
tips:
- metric_table:
id: 1702
@@ -77,6 +77,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (Int8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -198,18 +205,18 @@ Panel Config:
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA0_RDREQ_LEVEL_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
@@ -76,6 +76,27 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
CPC SYNC FIFO Full Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
CPC CANE Stall Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
CPC ADC Utilization:
avg: None
min: None
max: None
unit: pct
tips:
CPC Utilization:
avg: AVG((((100 * CPC_CPC_STAT_BUSY) / (CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE))
if ((CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE) != 0) else None))
@@ -19,6 +19,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Schedule-Pipe Wave Occupancy:
avg: None
min: None
max: None
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Scheduler-Pipe Wave Utilization:
avg: None
min: None
max: None
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
0) else None)
unit: Pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Scheduler-Pipe FIFO Full Rate:
avg: None
min: None
max: None
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
@@ -209,6 +209,13 @@ Panel Config:
max: MAX((TA_BUFFER_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Spill/Stack Coalesceable Instr:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
Spill/Stack Read:
avg: AVG((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
min: MIN((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
@@ -275,3 +282,10 @@ Panel Config:
max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
MFMA-F6F4:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
@@ -76,6 +76,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -125,6 +132,13 @@ Panel Config:
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
VALU Co-Issue Efficiency:
avg: None
min: None
max: None
unit: pct
tips:
VMEM Utilization:
avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
@@ -265,6 +279,13 @@ Panel Config:
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
unit: (OPs + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
F6F4 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
@@ -55,6 +55,48 @@ Panel Config:
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS LOAD:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS STORE:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS ATOMIC:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS LOAD Bandwidth:
avg: None
min: None
max: None
unit: Gbps
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS STORE Bandwidth:
avg: None
min: None
max: None
unit: Gbps
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS ATOMIC Bandwidth:
avg: None
min: None
max: None
unit: Gbps
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
@@ -116,3 +158,17 @@ Panel Config:
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
unit: (Accesses + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS Command FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
LDS Data FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
@@ -43,6 +43,27 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Sequencer → TA Address Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Sequencer → TA Command Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then remove this
Sequencer → TA Data Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -41,6 +41,10 @@ Panel Config:
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
HBM Bandwidth:
value: $hbmBandwidth
unit: GB/s
tips:
- metric_table:
id: 1702
@@ -0,0 +1,14 @@
---
Panel Config:
id: 000
title: Top Stats
data source:
- raw_csv_table:
id: 001
title: Top Kernels
source: pmc_kernel_top.csv
- raw_csv_table:
id: 002
title: Dispatch List
source: pmc_dispatch_info.csv
@@ -0,0 +1,9 @@
---
Panel Config:
id: 100
title: System Info
data source:
- raw_csv_table:
id: 101
source: sysinfo.csv
columnwise: True
@@ -0,0 +1,269 @@
---
# Add a description/tip for each metric in this section so it can be shown on hover.
Metric Description:
SALU: &SALU_anchor Scalar Arithmetic Logic Unit
# Define the panel properties and properties of each metric in the panel.
Panel Config:
id: 200
title: System Speed-of-Light
data source:
- metric_table:
id: 201
title: Speed-of-Light
header:
metric: Metric
value: Avg
unit: Unit
peak: Peak
pop: Pct of Peak
tips: Tips
metric:
VALU FLOPs:
value: AVG(((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
+ (2 * SQ_INSTS_VALU_FMA_F16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
+ SQ_INSTS_VALU_TRANS_F32) + (2 * SQ_INSTS_VALU_FMA_F32)))) + (64 * (((SQ_INSTS_VALU_ADD_F64
+ SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64) + (2 * SQ_INSTS_VALU_FMA_F64))))
/ (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
pop: ((100 * AVG(((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16)
+ SQ_INSTS_VALU_TRANS_F16) + (2 * SQ_INSTS_VALU_FMA_F16))) + (64 * (((SQ_INSTS_VALU_ADD_F32
+ SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32) + (2 * SQ_INSTS_VALU_FMA_F32))))
+ (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+ (2 * SQ_INSTS_VALU_FMA_F64)))) / (End_Timestamp - Start_Timestamp)))) / (((($max_sclk
* $cu_per_gpu) * 64) * 2) / 1000))
tips:
VALU IOPs:
value: AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
pop: ((100 * AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp
- Start_Timestamp)))) / (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000))
tips:
MFMA FLOPs (F8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 8192) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
tips:
MFMA FLOPs (BF16):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 4096) / 1000))
tips:
MFMA FLOPs (F16):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F16 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F16 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 4096) / 1000))
tips:
MFMA FLOPs (F32):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F32 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F32 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
MFMA FLOPs (F64):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
MFMA FLOPs (F6F4):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F6F4 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F6F4 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
MFMA IOPs (Int8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
peak: ((($max_sclk * $cu_per_gpu) * 8192) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
tips:
Active CUs:
value: $numActiveCUs
unit: CUs
peak: $cu_per_gpu
pop: ((100 * $numActiveCUs) / $cu_per_gpu)
tips:
SALU Utilization:
value: AVG(((100 * SQ_ACTIVE_INST_SCA) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
peak: 100
pop: AVG(((100 * SQ_ACTIVE_INST_SCA) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
tips:
VALU Utilization:
value: AVG(((100 * SQ_ACTIVE_INST_VALU) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
peak: 100
pop: AVG(((100 * SQ_ACTIVE_INST_VALU) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
tips:
MFMA Utilization:
value: AVG(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / (($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)
* 4)))
unit: pct
peak: 100
pop: AVG(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / (($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)
* 4)))
tips:
VMEM Utilization:
value: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
peak: 100
pop: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
tips:
Branch Utilization:
value: AVG((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
peak: 100
pop: AVG((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
tips:
VALU Active Threads:
value: AVG(((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU) if (SQ_ACTIVE_INST_VALU
!= 0) else None))
unit: Threads
peak: $wave_size
pop: (100 * AVG((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU / $wave_size) if (SQ_ACTIVE_INST_VALU != 0) else None))
tips:
IPC:
value: AVG((SQ_INSTS / SQ_BUSY_CU_CYCLES))
unit: Instr/cycle
peak: 5
pop: ((100 * AVG((SQ_INSTS / SQ_BUSY_CU_CYCLES))) / 5)
tips:
Wavefront Occupancy:
value: AVG((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD))
unit: Wavefronts
peak: ($max_waves_per_cu * $cu_per_gpu)
pop: (100 * AVG(((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD) / ($max_waves_per_cu
* $cu_per_gpu))))
coll_level: SQ_LEVEL_WAVES
tips:
Theoretical LDS Bandwidth:
value: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: (($max_sclk * $cu_per_gpu) * 0.128)
pop: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) * 0.00128)))
tips:
LDS Bank Conflicts/Access:
value: AVG(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
unit: Conflicts/access
peak: 32
pop: ((100 * AVG(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))) / 32)
tips:
vL1D Cache Hit Rate:
value: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
+ TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
/ TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum != 0) else
None))
unit: pct
peak: 100
pop: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) +
TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) /
TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum != 0) else
None))
tips:
vL1D Cache BW:
value: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: ((($max_sclk / 1000) * 128) * $cu_per_gpu)
pop: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk / 1000) * 128) * $cu_per_gpu))
tips:
L2 Cache Hit Rate:
value: AVG((((100 * TCC_HIT_sum) / (TCC_HIT_sum + TCC_MISS_sum)) if ((TCC_HIT_sum
+ TCC_MISS_sum) != 0) else None))
unit: pct
peak: 100
pop: AVG((((100 * TCC_HIT_sum) / (TCC_HIT_sum + TCC_MISS_sum)) if ((TCC_HIT_sum
+ TCC_MISS_sum) != 0) else None))
tips:
L2 Cache BW:
value: AVG(((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: ((($max_sclk / 1000) * 128) * TO_INT($total_l2_chan))
pop: ((100 * AVG(((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk / 1000) * 128) * TO_INT($total_l2_chan)))
tips:
L2-Fabric Read BW:
value: AVG((128 * TCC_BUBBLE_sum +
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
unit: GB/s
peak: $hbmBandwidth
pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA0_RDREQ_LEVEL_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
!= 0) else None))
unit: Cycles
peak: None
pop: None
tips:
L2-Fabric Write Latency:
value: AVG(((TCC_EA0_WRREQ_LEVEL_sum / TCC_EA0_WRREQ_sum) if (TCC_EA0_WRREQ_sum
!= 0) else None))
unit: Cycles
peak: None
pop: None
tips:
sL1D Cache Hit Rate:
value: AVG((((100 * SQC_DCACHE_HITS) / (SQC_DCACHE_HITS + SQC_DCACHE_MISSES))
if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES) != 0) else None))
unit: pct
peak: 100
pop: AVG((((100 * SQC_DCACHE_HITS) / (SQC_DCACHE_HITS + SQC_DCACHE_MISSES))
if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES) != 0) else None))
tips:
sL1D Cache BW:
value: AVG(((SQC_DCACHE_REQ / (End_Timestamp - Start_Timestamp)) * 64))
unit: GB/s
peak: ((($max_sclk / 1000) * 64) * $sqc_per_gpu)
pop: ((100 * AVG(((SQC_DCACHE_REQ / (End_Timestamp - Start_Timestamp)) * 64))) / ((($max_sclk
/ 1000) * 64) * $sqc_per_gpu))
tips:
L1I Hit Rate:
value: AVG((((100 * SQC_ICACHE_HITS) / (SQC_ICACHE_HITS + SQC_ICACHE_MISSES)) if ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES) != 0) else None))
unit: pct
peak: 100
pop: AVG((((100 * SQC_ICACHE_HITS) / (SQC_ICACHE_HITS + SQC_ICACHE_MISSES)) if ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES) != 0) else None))
tips:
L1I BW:
value: AVG(((SQC_ICACHE_REQ / (End_Timestamp - Start_Timestamp)) * 64))
unit: GB/s
peak: ((($max_sclk / 1000) * 64) * $sqc_per_gpu)
pop: ((100 * AVG(((SQC_ICACHE_REQ / (End_Timestamp - Start_Timestamp)) * 64))) / ((($max_sclk
/ 1000) * 64) * $sqc_per_gpu))
tips:
L1I Fetch Latency:
value: AVG(((SQ_ACCUM_PREV_HIRES / SQ_IFETCH) if (SQ_IFETCH != 0) else None))
unit: Cycles
peak: None
pop: None
coll_level: SQ_IFETCH_LEVEL
tips:
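The Speed-of-Light rows above share one unit convention. Each `SQ_INSTS_VALU_MFMA_MOPS_*` count is scaled by 512 FLOPs, and VALU FLOPs count 64 lanes per instruction with FMA as two operations. Dividing by `End_Timestamp - Start_Timestamp` then gives GFLOP/s. The peaks multiply `$max_sclk`, `$cu_per_gpu`, and a per-CU FLOPs-per-clock constant, then divide by 1000.

The sketch below assumes two things, both consistent with the `/ 1000` scaling and the GFLOP unit but not stated in this diff:

- Timestamps are in nanoseconds.
- `$max_sclk` is in MHz.

The clock, CU count, and counter values are hypothetical. The 4,096 FLOPs/clock/CU figure is the BF16 constant from the config.

```python
def valu_flops(add: int, mul: int, trans: int, fma: int) -> int:
    """FLOPs for one precision: 64 lanes per VALU instruction, FMA counted twice."""
    return 64 * (add + mul + trans + 2 * fma)


def mfma_flops(mops: int) -> int:
    """MFMA FLOPs: each SQ_INSTS_VALU_MFMA_MOPS_* count stands for 512 FLOPs."""
    return mops * 512


def gflops(flops: int, start_ns: int, end_ns: int) -> float:
    """FLOPs per nanosecond is numerically GFLOP/s."""
    return flops / (end_ns - start_ns)


def peak_gflops(max_sclk_mhz: float, cu_per_gpu: int, flops_per_clk_per_cu: int) -> float:
    """MHz x CUs x FLOPs/clock is MFLOP/s; dividing by 1000 gives GFLOP/s."""
    return max_sclk_mhz * cu_per_gpu * flops_per_clk_per_cu / 1000


def pct_of_peak(value: float, peak: float) -> float:
    """The `pop` column: achieved value as a percentage of the peak."""
    return 100 * value / peak


# Hypothetical run: 2,000 MHz, 256 CUs, BF16 MFMA at 4,096 FLOPs/clock/CU,
# kernel duration 1,000,000 ns.
peak = peak_gflops(2_000, 256, 4_096)
value = gflops(mfma_flops(2_048_000_000), 0, 1_000_000)
print(peak, value, pct_of_peak(value, peak))  # 2097152.0 1048576.0 50.0
```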
@@ -0,0 +1,153 @@
---
# Add a description/tip for each metric in this section so it can be shown on hover.
Metric Description:
# Define the panel properties and properties of each metric in the panel.
Panel Config:
id: 500
title: Command Processor (CPC/CPF)
data source:
- metric_table:
id: 501
title: Command Processor Fetcher
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
CPF Utilization:
avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
min: MIN((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
max: MAX((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
unit: pct
tips:
CPF Stall:
avg: AVG((((100 * CPF_CPF_STAT_STALL) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
!= 0) else None))
min: MIN((((100 * CPF_CPF_STAT_STALL) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
!= 0) else None))
max: MAX((((100 * CPF_CPF_STAT_STALL) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
!= 0) else None))
unit: pct
tips:
CPF-L2 Utilization:
avg: AVG((((100 * CPF_CPF_TCIU_BUSY) / (CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE))
if ((CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE) != 0) else None))
min: MIN((((100 * CPF_CPF_TCIU_BUSY) / (CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE))
if ((CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE) != 0) else None))
max: MAX((((100 * CPF_CPF_TCIU_BUSY) / (CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE))
if ((CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE) != 0) else None))
unit: pct
tips:
CPF-L2 Stall:
avg: AVG((((100 * CPF_CPF_TCIU_STALL) / CPF_CPF_TCIU_BUSY) if (CPF_CPF_TCIU_BUSY
!= 0) else None))
min: MIN((((100 * CPF_CPF_TCIU_STALL) / CPF_CPF_TCIU_BUSY) if (CPF_CPF_TCIU_BUSY
!= 0) else None))
max: MAX((((100 * CPF_CPF_TCIU_STALL) / CPF_CPF_TCIU_BUSY) if (CPF_CPF_TCIU_BUSY
!= 0) else None))
unit: pct
tips:
CPF-UTCL1 Stall:
avg: AVG(((100 * CPF_CMP_UTCL1_STALL_ON_TRANSLATION) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
!= 0) else None)
min: MIN(((100 * CPF_CMP_UTCL1_STALL_ON_TRANSLATION) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
!= 0) else None)
max: MAX(((100 * CPF_CMP_UTCL1_STALL_ON_TRANSLATION) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
!= 0) else None)
unit: pct
tips:
- metric_table:
id: 502
title: Packet Processor
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
CPC SYNC FIFO Full Rate:
avg: AVG((100 * CPC_SYNC_FIFO_FULL) / CPC_SYNC_WRREQ_FIFO_BUSY if (CPC_SYNC_WRREQ_FIFO_BUSY != 0) else None)
min: MIN((100 * CPC_SYNC_FIFO_FULL) / CPC_SYNC_WRREQ_FIFO_BUSY if (CPC_SYNC_WRREQ_FIFO_BUSY != 0) else None)
max: MAX((100 * CPC_SYNC_FIFO_FULL) / CPC_SYNC_WRREQ_FIFO_BUSY if (CPC_SYNC_WRREQ_FIFO_BUSY != 0) else None)
unit: pct
tips:
CPC CANE Stall Rate:
avg: AVG((100 * CPC_CANE_STALL) / CPC_CANE_BUSY if (CPC_CANE_BUSY != 0) else None)
min: MIN((100 * CPC_CANE_STALL) / CPC_CANE_BUSY if (CPC_CANE_BUSY != 0) else None)
max: MAX((100 * CPC_CANE_STALL) / CPC_CANE_BUSY if (CPC_CANE_BUSY != 0) else None)
unit: pct
tips:
CPC ADC Utilization:
avg: AVG((100 * CPC_TG_SEND) / CPC_GD_BUSY if (CPC_GD_BUSY != 0) else None)
min: MIN((100 * CPC_TG_SEND) / CPC_GD_BUSY if (CPC_GD_BUSY != 0) else None)
max: MAX((100 * CPC_TG_SEND) / CPC_GD_BUSY if (CPC_GD_BUSY != 0) else None)
unit: pct
tips:
CPC Utilization:
avg: AVG((((100 * CPC_CPC_STAT_BUSY) / (CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE))
if ((CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE) != 0) else None))
min: MIN((((100 * CPC_CPC_STAT_BUSY) / (CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE))
if ((CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE) != 0) else None))
max: MAX((((100 * CPC_CPC_STAT_BUSY) / (CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE))
if ((CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE) != 0) else None))
unit: pct
tips:
CPC Stall Rate:
avg: AVG((((100 * CPC_CPC_STAT_STALL) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
!= 0) else None))
min: MIN((((100 * CPC_CPC_STAT_STALL) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
!= 0) else None))
max: MAX((((100 * CPC_CPC_STAT_STALL) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
!= 0) else None))
unit: pct
tips:
CPC Packet Decoding Utilization:
avg: AVG((100 * CPC_ME1_BUSY_FOR_PACKET_DECODE) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
min: MIN((100 * CPC_ME1_BUSY_FOR_PACKET_DECODE) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
max: MAX((100 * CPC_ME1_BUSY_FOR_PACKET_DECODE) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
unit: pct
tips:
CPC-Workgroup Manager Utilization:
avg: AVG((100 * CPC_ME1_DC0_SPI_BUSY) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
min: MIN((100 * CPC_ME1_DC0_SPI_BUSY) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
max: MAX((100 * CPC_ME1_DC0_SPI_BUSY) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
unit: Pct
tips:
CPC-L2 Utilization:
avg: AVG((((100 * CPC_CPC_TCIU_BUSY) / (CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE))
if ((CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE) != 0) else None))
min: MIN((((100 * CPC_CPC_TCIU_BUSY) / (CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE))
if ((CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE) != 0) else None))
max: MAX((((100 * CPC_CPC_TCIU_BUSY) / (CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE))
if ((CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE) != 0) else None))
unit: pct
tips:
CPC-UTCL1 Stall:
avg: AVG(((100 * CPC_UTCL1_STALL_ON_TRANSLATION) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
!= 0) else None)
min: MIN(((100 * CPC_UTCL1_STALL_ON_TRANSLATION) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
!= 0) else None)
max: MAX(((100 * CPC_UTCL1_STALL_ON_TRANSLATION) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
!= 0) else None)
unit: pct
tips:
CPC-UTCL2 Utilization:
avg: AVG((((100 * CPC_CPC_UTCL2IU_BUSY) / (CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE))
if ((CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE) != 0) else None))
min: MIN((((100 * CPC_CPC_UTCL2IU_BUSY) / (CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE))
if ((CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE) != 0) else None))
max: MAX((((100 * CPC_CPC_UTCL2IU_BUSY) / (CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE))
if ((CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE) != 0) else None))
unit: pct
tips:
@@ -0,0 +1,188 @@
---
# Add a description/tips for each metric in this section
# so that they can be shown on hover.
Metric Description:
# Define the panel properties and properties of each metric in the panel.
Panel Config:
id: 600
title: Workgroup Manager (SPI)
data source:
- metric_table:
id: 601
title: Workgroup Manager Utilizations
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
Scheduler-Pipe Wave Occupancy:
avg: AVG(SPI_CSQ_P0_OCCUPANCY + SPI_CSQ_P1_OCCUPANCY + SPI_CSQ_P2_OCCUPANCY + SPI_CSQ_P3_OCCUPANCY)
min: MIN(SPI_CSQ_P0_OCCUPANCY + SPI_CSQ_P1_OCCUPANCY + SPI_CSQ_P2_OCCUPANCY + SPI_CSQ_P3_OCCUPANCY)
max: MAX(SPI_CSQ_P0_OCCUPANCY + SPI_CSQ_P1_OCCUPANCY + SPI_CSQ_P2_OCCUPANCY + SPI_CSQ_P3_OCCUPANCY)
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
max: MAX(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
unit: Pct
tips:
Scheduler-Pipe Utilization:
avg: AVG(100 * (SPI_CS0_BUSY + SPI_CS1_BUSY + SPI_CS2_BUSY + SPI_CS3_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
min: MIN(100 * (SPI_CS0_BUSY + SPI_CS1_BUSY + SPI_CS2_BUSY + SPI_CS3_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
max: MAX(100 * (SPI_CS0_BUSY + SPI_CS1_BUSY + SPI_CS2_BUSY + SPI_CS3_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
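# Worked example with hypothetical counter and spec values: suppose the four
# SPI_CSn_BUSY counters sum to 2,400,000, $GRBM_GUI_ACTIVE_PER_XCD = 100,000,
# $pipes_per_gpu = 4 and $se_per_gpu = 32. The denominator is then
# 100,000 * 4 * 32 = 12,800,000, and the utilization is
# 100 * 2,400,000 / 12,800,000 = 18.75 pct.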
Scheduler-Pipe Wave Utilization:
avg: AVG(100 * (SPI_CSC_WAVE_CNT_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
min: MIN(100 * (SPI_CSC_WAVE_CNT_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
max: MAX(100 * (SPI_CSC_WAVE_CNT_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
max: MAX(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
unit: Pct
tips:
Shader Engine Utilization:
avg: AVG(100 * SQ_BUSY_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $se_per_gpu))
min: MIN(100 * SQ_BUSY_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $se_per_gpu))
max: MAX(100 * SQ_BUSY_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $se_per_gpu))
unit: Pct
tips:
SIMD Utilization:
avg: AVG(100 * SQ_BUSY_CU_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(100 * SQ_BUSY_CU_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(100 * SQ_BUSY_CU_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Dispatched Workgroups:
avg: AVG(SPI_CS0_NUM_THREADGROUPS + SPI_CS1_NUM_THREADGROUPS + SPI_CS2_NUM_THREADGROUPS + SPI_CS3_NUM_THREADGROUPS)
min: MIN(SPI_CS0_NUM_THREADGROUPS + SPI_CS1_NUM_THREADGROUPS + SPI_CS2_NUM_THREADGROUPS + SPI_CS3_NUM_THREADGROUPS)
max: MAX(SPI_CS0_NUM_THREADGROUPS + SPI_CS1_NUM_THREADGROUPS + SPI_CS2_NUM_THREADGROUPS + SPI_CS3_NUM_THREADGROUPS)
unit: Workgroups
tips:
Dispatched Wavefronts:
avg: AVG(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
min: MIN(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
max: MAX(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
unit: Wavefronts
tips:
VGPR Writes:
avg: AVG((((SPI_VWC0_VDATA_VALID_WR + SPI_VWC1_VDATA_VALID_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
None))
min: MIN((((SPI_VWC0_VDATA_VALID_WR + SPI_VWC1_VDATA_VALID_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
None))
max: MAX((((SPI_VWC0_VDATA_VALID_WR + SPI_VWC1_VDATA_VALID_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
None))
unit: Cycles/wave
tips:
SGPR Writes:
avg: AVG((((1 * SPI_SWC_CSC_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
None))
min: MIN((((1 * SPI_SWC_CSC_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
None))
max: MAX((((1 * SPI_SWC_CSC_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
None))
unit: Cycles/wave
tips:
- metric_table:
id: 602
title: Workgroup Manager - Resource Allocation
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
Not-scheduled Rate (Workgroup Manager):
avg: AVG((100 * SPI_RA_REQ_NO_ALLOC_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
min: MIN((100 * SPI_RA_REQ_NO_ALLOC_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
max: MAX((100 * SPI_RA_REQ_NO_ALLOC_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
unit: Pct
tips:
Not-scheduled Rate (Scheduler-Pipe):
avg: AVG((100 * SPI_RA_REQ_NO_ALLOC / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
min: MIN((100 * SPI_RA_REQ_NO_ALLOC / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
max: MAX((100 * SPI_RA_REQ_NO_ALLOC / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
unit: Pct
tips:
Scheduler-Pipe FIFO Full Rate:
avg: AVG((100 * (SPI_CS0_CRAWLER_STALL + SPI_CS1_CRAWLER_STALL + SPI_CS2_CRAWLER_STALL + SPI_CS3_CRAWLER_STALL) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
min: MIN((100 * (SPI_CS0_CRAWLER_STALL + SPI_CS1_CRAWLER_STALL + SPI_CS2_CRAWLER_STALL + SPI_CS3_CRAWLER_STALL) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
max: MAX((100 * (SPI_CS0_CRAWLER_STALL + SPI_CS1_CRAWLER_STALL + SPI_CS2_CRAWLER_STALL + SPI_CS3_CRAWLER_STALL) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
min: MIN((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
max: MAX((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
unit: Pct
tips:
Scratch Stall Rate:
avg: AVG((100 * SPI_RA_TMP_STALL_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD != 0) else None)
min: MIN((100 * SPI_RA_TMP_STALL_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD != 0) else None)
max: MAX((100 * SPI_RA_TMP_STALL_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD != 0) else None)
unit: Pct
tips:
Insufficient SIMD Waveslots:
avg: AVG(100 * SPI_RA_WAVE_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(100 * SPI_RA_WAVE_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(100 * SPI_RA_WAVE_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Insufficient SIMD VGPRs:
avg: AVG(100 * SPI_RA_VGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(100 * SPI_RA_VGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(100 * SPI_RA_VGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Insufficient SIMD SGPRs:
avg: AVG(100 * SPI_RA_SGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(100 * SPI_RA_SGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(100 * SPI_RA_SGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Insufficient CU LDS:
avg: AVG(400 * SPI_RA_LDS_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(400 * SPI_RA_LDS_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(400 * SPI_RA_LDS_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Insufficient CU Barriers:
avg: AVG(400 * SPI_RA_BAR_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(400 * SPI_RA_BAR_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(400 * SPI_RA_BAR_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Reached CU Workgroup Limit:
avg: AVG(400 * SPI_RA_TGLIM_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(400 * SPI_RA_TGLIM_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(400 * SPI_RA_TGLIM_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Reached CU Wavefront Limit:
avg: AVG(400 * SPI_RA_WVLIM_STALL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(400 * SPI_RA_WVLIM_STALL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(400 * SPI_RA_WVLIM_STALL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
@@ -0,0 +1,142 @@
---
# Add a description/tips for each metric in this section
# so that they can be shown on hover.
Metric Description:
# Define the panel properties and properties of each metric in the panel.
Panel Config:
id: 700
title: Wavefront
data source:
- metric_table:
id: 701
title: Wavefront Launch Stats
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
Grid Size:
avg: AVG(Grid_Size)
min: MIN(Grid_Size)
max: MAX(Grid_Size)
unit: Work Items
tips:
Workgroup Size:
avg: AVG(Workgroup_Size)
min: MIN(Workgroup_Size)
max: MAX(Workgroup_Size)
unit: Work Items
tips:
Total Wavefronts:
avg: AVG(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
min: MIN(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
max: MAX(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
unit: Wavefronts
tips:
Saved Wavefronts:
avg: AVG(SQ_WAVES_SAVED)
min: MIN(SQ_WAVES_SAVED)
max: MAX(SQ_WAVES_SAVED)
unit: Wavefronts
tips:
Restored Wavefronts:
avg: AVG(SQ_WAVES_RESTORED)
min: MIN(SQ_WAVES_RESTORED)
max: MAX(SQ_WAVES_RESTORED)
unit: Wavefronts
tips:
VGPRs:
avg: AVG(Arch_VGPR)
min: MIN(Arch_VGPR)
max: MAX(Arch_VGPR)
unit: Registers
tips:
AGPRs:
avg: AVG(Accum_VGPR)
min: MIN(Accum_VGPR)
max: MAX(Accum_VGPR)
unit: Registers
tips:
SGPRs:
avg: AVG(SGPR)
min: MIN(SGPR)
max: MAX(SGPR)
unit: Registers
tips:
LDS Allocation:
avg: AVG(LDS_Per_Workgroup)
min: MIN(LDS_Per_Workgroup)
max: MAX(LDS_Per_Workgroup)
unit: Bytes
tips:
Scratch Allocation:
avg: AVG(Scratch_Per_Workitem)
min: MIN(Scratch_Per_Workitem)
max: MAX(Scratch_Per_Workitem)
unit: Bytes/Workitem
tips:
- metric_table:
id: 702
title: Wavefront Runtime Stats
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
Kernel Time (Nanosec):
avg: AVG((End_Timestamp - Start_Timestamp))
min: MIN((End_Timestamp - Start_Timestamp))
max: MAX((End_Timestamp - Start_Timestamp))
unit: ns
tips:
Kernel Time (Cycles):
avg: AVG($GRBM_GUI_ACTIVE_PER_XCD)
min: MIN($GRBM_GUI_ACTIVE_PER_XCD)
max: MAX($GRBM_GUI_ACTIVE_PER_XCD)
unit: Cycle
tips:
Instructions per wavefront:
avg: AVG((SQ_INSTS / SQ_WAVES))
min: MIN((SQ_INSTS / SQ_WAVES))
max: MAX((SQ_INSTS / SQ_WAVES))
unit: Instr/wavefront
tips:
Wave Cycles:
avg: AVG(((4 * SQ_WAVE_CYCLES) / $denom))
min: MIN(((4 * SQ_WAVE_CYCLES) / $denom))
max: MAX(((4 * SQ_WAVE_CYCLES) / $denom))
unit: (Cycles + $normUnit)
tips:
Dependency Wait Cycles:
avg: AVG(((4 * SQ_WAIT_ANY) / $denom))
min: MIN(((4 * SQ_WAIT_ANY) / $denom))
max: MAX(((4 * SQ_WAIT_ANY) / $denom))
unit: (Cycles + $normUnit)
tips:
Issue Wait Cycles:
avg: AVG(((4 * SQ_WAIT_INST_ANY) / $denom))
min: MIN(((4 * SQ_WAIT_INST_ANY) / $denom))
max: MAX(((4 * SQ_WAIT_INST_ANY) / $denom))
unit: (Cycles + $normUnit)
tips:
Active Cycles:
avg: AVG(((4 * SQ_ACTIVE_INST_ANY) / $denom))
min: MIN(((4 * SQ_ACTIVE_INST_ANY) / $denom))
max: MAX(((4 * SQ_ACTIVE_INST_ANY) / $denom))
unit: (Cycles + $normUnit)
tips:
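# The factor of 4 in the four cycle metrics above presumably converts the SQ
# counters from quad-cycle units to cycles. This is an assumption; this file
# does not state it. Example: SQ_WAVE_CYCLES = 250,000 with $denom = 1 gives
# 4 * 250,000 / 1 = 1,000,000 cycles.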
Wavefront Occupancy:
avg: AVG((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD))
min: MIN((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD))
max: MAX((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD))
unit: Wavefronts
coll_level: SQ_LEVEL_WAVES
tips:
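# Wavefront Occupancy treats SQ_ACCUM_PREV_HIRES as a per-cycle running sum
# of resident waves. This is assumed from coll_level: SQ_LEVEL_WAVES.
# Dividing that sum by the active cycles gives an average wave count.
# Hypothetical example: 5,000,000 / 100,000 cycles = 50 resident wavefronts on average.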
@@ -185,15 +185,6 @@ Panel Config:
max: MAX((TA_FLAT_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
Global/Generic Coalesceable Instr:
avg: None
# AVG((TA_FLAT_COALESCEABLE_WAVEFRONTS_sum / $denom))
min: None
# MIN((TA_FLAT_COALESCEABLE_WAVEFRONTS_sum / $denom))
max: None
# MAX((TA_FLAT_COALESCEABLE_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
Global/Generic Read:
avg: AVG((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
min: MIN((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
@@ -290,3 +281,9 @@ Panel Config:
max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
unit: (instr + $normUnit)
tips:
MFMA-F6F4:
avg: AVG((SQ_INSTS_VALU_MFMA_F6F4 / $denom))
min: MIN((SQ_INSTS_VALU_MFMA_F6F4 / $denom))
max: MAX((SQ_INSTS_VALU_MFMA_F6F4 / $denom))
unit: (instr + $normUnit)
tips:
@@ -0,0 +1,293 @@
---
# Add a description/tips for each metric in this section
# so that they can be shown on hover.
Metric Description:
# Define the panel properties and properties of each metric in the panel.
Panel Config:
id: 1100
title: Compute Units - Compute Pipeline
data source:
- metric_table:
id: 1101
title: Speed-of-Light
header:
metric: Metric
value: Avg
unit: Unit
peak: Peak
pop: Pct of Peak
tips: Tips
metric:
VALU FLOPs:
value: AVG(((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
+ (2 * SQ_INSTS_VALU_FMA_F16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
+ SQ_INSTS_VALU_TRANS_F32) + (2 * SQ_INSTS_VALU_FMA_F32)))) + (64 * (((SQ_INSTS_VALU_ADD_F64
+ SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64) + (2 * SQ_INSTS_VALU_FMA_F64))))
/ (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
pop: ((100 * AVG(((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16)
+ SQ_INSTS_VALU_TRANS_F16) + (2 * SQ_INSTS_VALU_FMA_F16))) + (64 * (((SQ_INSTS_VALU_ADD_F32
+ SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32) + (2 * SQ_INSTS_VALU_FMA_F32))))
+ (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+ (2 * SQ_INSTS_VALU_FMA_F64)))) / (End_Timestamp - Start_Timestamp)))) / (((($max_sclk
* $cu_per_gpu) * 64) * 2) / 1000))
tips:
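# Worked example for VALU FLOPs. The values are hypothetical, and this assumes
# $max_sclk is in MHz.
#   value: only SQ_INSTS_VALU_FMA_F32 = 1,000,000 over 100,000 ns
#          -> 64 lanes * 2 FLOPs * 1,000,000 / 100,000 = 1280 FLOPs/ns (GFLOP/s)
#   peak:  $max_sclk = 2100, $cu_per_gpu = 256
#          -> 2100 * 256 * 64 * 2 / 1000 = 68,812.8
#   pop:   100 * 1280 / 68,812.8 ~= 1.86 pct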
VALU IOPs:
value: AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
pop: ((100 * AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp
- Start_Timestamp)))) / (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000))
tips:
MFMA FLOPs (F8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 8192) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
tips:
MFMA FLOPs (BF16):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 4096) / 1000))
tips:
MFMA FLOPs (F16):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F16 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F16 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 4096) / 1000))
tips:
MFMA FLOPs (F32):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F32 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F32 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
MFMA FLOPs (F64):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
MFMA FLOPs (F6F4):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F6F4 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F6F4 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
MFMA IOPs (INT8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
peak: ((($max_sclk * $cu_per_gpu) * 8192) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
tips:
- metric_table:
id: 1102
title: Pipeline Stats
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
IPC:
avg: AVG((SQ_INSTS / SQ_BUSY_CU_CYCLES))
min: MIN((SQ_INSTS / SQ_BUSY_CU_CYCLES))
max: MAX((SQ_INSTS / SQ_BUSY_CU_CYCLES))
unit: Instr/cycle
tips:
IPC (Issued):
avg: AVG(((((((((SQ_INSTS_VALU + SQ_INSTS_VMEM) + SQ_INSTS_SALU) + SQ_INSTS_SMEM))
+ SQ_INSTS_BRANCH) + SQ_INSTS_SENDMSG) + SQ_INSTS_VSKIPPED + SQ_INSTS_LDS)
/ SQ_ACTIVE_INST_ANY))
min: MIN(((((((((SQ_INSTS_VALU + SQ_INSTS_VMEM) + SQ_INSTS_SALU) + SQ_INSTS_SMEM))
+ SQ_INSTS_BRANCH) + SQ_INSTS_SENDMSG) + SQ_INSTS_VSKIPPED + SQ_INSTS_LDS)
/ SQ_ACTIVE_INST_ANY))
max: MAX(((((((((SQ_INSTS_VALU + SQ_INSTS_VMEM) + SQ_INSTS_SALU) + SQ_INSTS_SMEM))
+ SQ_INSTS_BRANCH) + SQ_INSTS_SENDMSG) + SQ_INSTS_VSKIPPED + SQ_INSTS_LDS)
/ SQ_ACTIVE_INST_ANY))
unit: Instr/cycle
tips:
SALU Utilization:
avg: AVG((((100 * SQ_ACTIVE_INST_SCA) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
min: MIN((((100 * SQ_ACTIVE_INST_SCA) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
max: MAX((((100 * SQ_ACTIVE_INST_SCA) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
VALU Utilization:
avg: AVG((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
min: MIN((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# Percentage of VALU instructions that are co-issued to two VALUs at a time
VALU Co-Issue Efficiency:
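avg: AVG(((100 * SQ_ACTIVE_INST_VALU2) / (SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2)) if ((SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2) != 0) else None)
min: MIN(((100 * SQ_ACTIVE_INST_VALU2) / (SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2)) if ((SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2) != 0) else None)
max: MAX(((100 * SQ_ACTIVE_INST_VALU2) / (SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2)) if ((SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2) != 0) else None)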
unit: pct
tips:
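# Arithmetic check for VALU Co-Issue Efficiency (hypothetical values):
# SQ_ACTIVE_INST_VALU = 1000 and SQ_ACTIVE_INST_VALU2 = 200 give
# 100 * 200 / (1000 - 200) = 25 pct.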
VMEM Utilization:
avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
max: MAX((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
Branch Utilization:
avg: AVG((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
min: MIN((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
max: MAX((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
VALU Active Threads:
avg: AVG(((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU) if (SQ_ACTIVE_INST_VALU
!= 0) else None))
min: MIN(((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU) if (SQ_ACTIVE_INST_VALU
!= 0) else None))
max: MAX(((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU) if (SQ_ACTIVE_INST_VALU
!= 0) else None))
unit: Threads
tips:
MFMA Utilization:
avg: AVG(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / ((4 * $cu_per_gpu) * $GRBM_GUI_ACTIVE_PER_XCD)))
min: MIN(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / ((4 * $cu_per_gpu) * $GRBM_GUI_ACTIVE_PER_XCD)))
max: MAX(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / ((4 * $cu_per_gpu) * $GRBM_GUI_ACTIVE_PER_XCD)))
unit: pct
tips:
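# Worked example with hypothetical values: SQ_VALU_MFMA_BUSY_CYCLES = 51,200,000,
# $cu_per_gpu = 256 and $GRBM_GUI_ACTIVE_PER_XCD = 100,000 give
# 100 * 51,200,000 / (4 * 256 * 100,000) = 50 pct. The constant 4 presumably
# counts the four SIMDs (matrix units) per CU. This is an assumption.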
MFMA Instr Cycles:
avg: AVG(((SQ_VALU_MFMA_BUSY_CYCLES / SQ_INSTS_MFMA) if (SQ_INSTS_MFMA != 0)
else None))
min: MIN(((SQ_VALU_MFMA_BUSY_CYCLES / SQ_INSTS_MFMA) if (SQ_INSTS_MFMA != 0)
else None))
max: MAX(((SQ_VALU_MFMA_BUSY_CYCLES / SQ_INSTS_MFMA) if (SQ_INSTS_MFMA != 0)
else None))
unit: cycles/instr
tips:
VMEM Latency:
avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_VMEM) if (SQ_INSTS_VMEM != 0)
else None))
min: MIN(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_VMEM) if (SQ_INSTS_VMEM != 0)
else None))
max: MAX(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_VMEM) if (SQ_INSTS_VMEM != 0)
else None))
unit: Cycles
coll_level: SQ_INST_LEVEL_VMEM
tips:
SMEM Latency:
avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_SMEM) if (SQ_INSTS_SMEM != 0)
else None))
min: MIN(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_SMEM) if (SQ_INSTS_SMEM != 0)
else None))
max: MAX(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_SMEM) if (SQ_INSTS_SMEM != 0)
else None))
unit: Cycles
coll_level: SQ_INST_LEVEL_SMEM
tips:
- metric_table:
id: 1103
title: Arithmetic Operations
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
FLOPs (Total):
avg: AVG((((((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
+ (SQ_INSTS_VALU_FMA_F16 * 2))) + ((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F16) + (512
* SQ_INSTS_VALU_MFMA_MOPS_BF16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
+ SQ_INSTS_VALU_TRANS_F32) + (SQ_INSTS_VALU_FMA_F32 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32))
+ (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+ (SQ_INSTS_VALU_FMA_F64 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4)) /
$denom))
min: MIN((((((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
+ (SQ_INSTS_VALU_FMA_F16 * 2))) + ((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F16) + (512
* SQ_INSTS_VALU_MFMA_MOPS_BF16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
+ SQ_INSTS_VALU_TRANS_F32) + (SQ_INSTS_VALU_FMA_F32 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32))
+ (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+ (SQ_INSTS_VALU_FMA_F64 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4)) /
$denom))
max: MAX((((((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
+ (SQ_INSTS_VALU_FMA_F16 * 2))) + ((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F16) + (512
* SQ_INSTS_VALU_MFMA_MOPS_BF16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
+ SQ_INSTS_VALU_TRANS_F32) + (SQ_INSTS_VALU_FMA_F32 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32))
+ (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+ (SQ_INSTS_VALU_FMA_F64 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4)) /
$denom))
unit: (OPs + $normUnit)
tips:
IOPs (Total):
avg: AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) + (SQ_INSTS_VALU_MFMA_MOPS_I8 * 512)) / $denom)
min: MIN(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) + (SQ_INSTS_VALU_MFMA_MOPS_I8 * 512)) / $denom)
max: MAX(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) + (SQ_INSTS_VALU_MFMA_MOPS_I8 * 512)) / $denom)
unit: (OPs + $normUnit)
tips:
F8 OPs:
avg: AVG(((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) / $denom))
min: MIN(((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) / $denom))
max: MAX(((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) / $denom))
unit: (OPs + $normUnit)
tips:
F16 OPs:
avg: AVG(((((((64 * SQ_INSTS_VALU_ADD_F16) + (64 * SQ_INSTS_VALU_MUL_F16)) +
(64 * SQ_INSTS_VALU_TRANS_F16)) + (128 * SQ_INSTS_VALU_FMA_F16)) + (512 *
SQ_INSTS_VALU_MFMA_MOPS_F16)) / $denom))
min: MIN(((((((64 * SQ_INSTS_VALU_ADD_F16) + (64 * SQ_INSTS_VALU_MUL_F16)) +
(64 * SQ_INSTS_VALU_TRANS_F16)) + (128 * SQ_INSTS_VALU_FMA_F16)) + (512 *
SQ_INSTS_VALU_MFMA_MOPS_F16)) / $denom))
max: MAX(((((((64 * SQ_INSTS_VALU_ADD_F16) + (64 * SQ_INSTS_VALU_MUL_F16)) +
(64 * SQ_INSTS_VALU_TRANS_F16)) + (128 * SQ_INSTS_VALU_FMA_F16)) + (512 *
SQ_INSTS_VALU_MFMA_MOPS_F16)) / $denom))
unit: (OPs + $normUnit)
tips:
BF16 OPs:
avg: AVG(((512 * SQ_INSTS_VALU_MFMA_MOPS_BF16) / $denom))
min: MIN(((512 * SQ_INSTS_VALU_MFMA_MOPS_BF16) / $denom))
max: MAX(((512 * SQ_INSTS_VALU_MFMA_MOPS_BF16) / $denom))
unit: (OPs + $normUnit)
tips:
F32 OPs:
avg: AVG((((64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32)
+ (SQ_INSTS_VALU_FMA_F32 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32)) / $denom))
min: MIN((((64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32)
+ (SQ_INSTS_VALU_FMA_F32 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32)) / $denom))
max: MAX((((64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32)
+ (SQ_INSTS_VALU_FMA_F32 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32)) / $denom))
unit: (OPs + $normUnit)
tips:
F64 OPs:
avg: AVG((((64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
min: MIN((((64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
max: MAX((((64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
unit: (OPs + $normUnit)
tips:
F6F4 OPs:
avg: AVG((512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4) / $denom)
min: MIN((512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4) / $denom)
max: MAX((512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4) / $denom)
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
max: MAX(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
unit: (OPs + $normUnit)
tips:
@@ -0,0 +1,166 @@
---
# Add a description/tips for each metric in this section
# so that they can be shown on hover.
Metric Description:
# Define the panel properties and properties of each metric in the panel.
Panel Config:
id: 1200
title: Local Data Share (LDS)
data source:
- metric_table:
id: 1201
title: Speed-of-Light
header:
metric: Metric
value: Avg
unit: Unit
tips: Tips
metric:
Utilization:
value: AVG(((100 * SQ_LDS_IDX_ACTIVE) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: Pct of Peak
tips:
Access Rate:
value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: Pct of Peak
tips:
Theoretical Bandwidth (% of Peak):
value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) * 0.00128)))
unit: Pct of Peak
tips:
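# One interpretation of the 0.00128 constant (not stated in this file):
# peak bytes/ns = $max_sclk (MHz) * $cu_per_gpu * 128 bytes/cycle / 1000.
# Dividing by peak / 100 yields a percentage, so 128 / 1000 / 100 = 0.00128.
# Hypothetical example: (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) = 1,000,000,
# $lds_banks_per_cu = 32, 100,000 ns, $max_sclk = 2100, $cu_per_gpu = 256:
# 1,000,000 * 4 * 32 / 100,000 = 1280 bytes/ns, and
# 1280 / (2100 * 256 * 0.00128) = 1280 / 688.128 ~= 1.86 pct.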
Bank Conflict Rate:
value: AVG((((SQ_LDS_BANK_CONFLICT * 3.125) / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
unit: Pct of Peak
tips:
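# 3.125 equals 100 / 32, so this is presumably a percentage normalized over 32
# LDS banks. This is an assumption. Hypothetical example:
# SQ_LDS_BANK_CONFLICT = 320 and SQ_LDS_IDX_ACTIVE = 1,320 give
# 320 * 3.125 / (1,320 - 320) = 1.0.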
comparable: false # for now
cli_style: simple_bar
- metric_table:
id: 1202
title: LDS Stats
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
LDS Instrs:
avg: AVG((SQ_INSTS_LDS / $denom))
min: MIN((SQ_INSTS_LDS / $denom))
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
LDS LOAD:
avg: AVG((SQ_INSTS_LDS_LOAD / $denom))
min: MIN((SQ_INSTS_LDS_LOAD / $denom))
max: MAX((SQ_INSTS_LDS_LOAD / $denom))
unit: (Instr + $normUnit)
tips:
LDS STORE:
avg: AVG((SQ_INSTS_LDS_STORE / $denom))
min: MIN((SQ_INSTS_LDS_STORE / $denom))
max: MAX((SQ_INSTS_LDS_STORE / $denom))
unit: (Instr + $normUnit)
tips:
LDS ATOMIC:
avg: AVG((SQ_INSTS_LDS_ATOMIC / $denom))
min: MIN((SQ_INSTS_LDS_ATOMIC / $denom))
max: MAX((SQ_INSTS_LDS_ATOMIC / $denom))
unit: (Instr + $normUnit)
tips:
LDS LOAD Bandwidth:
avg: AVG(64 * SQ_INSTS_LDS_LOAD_BANDWIDTH / (End_Timestamp - Start_Timestamp))
min: MIN(64 * SQ_INSTS_LDS_LOAD_BANDWIDTH / (End_Timestamp - Start_Timestamp))
max: MAX(64 * SQ_INSTS_LDS_LOAD_BANDWIDTH / (End_Timestamp - Start_Timestamp))
unit: GB/s
tips:
LDS STORE Bandwidth:
avg: AVG(64 * SQ_INSTS_LDS_STORE_BANDWIDTH / (End_Timestamp - Start_Timestamp))
min: MIN(64 * SQ_INSTS_LDS_STORE_BANDWIDTH / (End_Timestamp - Start_Timestamp))
max: MAX(64 * SQ_INSTS_LDS_STORE_BANDWIDTH / (End_Timestamp - Start_Timestamp))
unit: GB/s
tips:
LDS ATOMIC Bandwidth:
avg: AVG(64 * SQ_INSTS_LDS_ATOMIC_BANDWIDTH / (End_Timestamp - Start_Timestamp))
min: MIN(64 * SQ_INSTS_LDS_ATOMIC_BANDWIDTH / (End_Timestamp - Start_Timestamp))
max: MAX(64 * SQ_INSTS_LDS_ATOMIC_BANDWIDTH / (End_Timestamp - Start_Timestamp))
unit: GB/s
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
unit: (Bytes + $normUnit)
tips:
LDS Latency:
avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else None))
min: MIN(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else None))
max: MAX(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else None))
unit: Cycles
coll_level: SQ_INST_LEVEL_LDS
tips:
Bank Conflicts/Access:
avg: AVG(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
min: MIN(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
max: MAX(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
unit: Conflicts/Access
tips:
Index Accesses:
avg: AVG((SQ_LDS_IDX_ACTIVE / $denom))
min: MIN((SQ_LDS_IDX_ACTIVE / $denom))
max: MAX((SQ_LDS_IDX_ACTIVE / $denom))
unit: (Cycles + $normUnit)
tips:
Atomic Return Cycles:
avg: AVG((SQ_LDS_ATOMIC_RETURN / $denom))
min: MIN((SQ_LDS_ATOMIC_RETURN / $denom))
max: MAX((SQ_LDS_ATOMIC_RETURN / $denom))
unit: (Cycles + $normUnit)
tips:
Bank Conflict:
avg: AVG((SQ_LDS_BANK_CONFLICT / $denom))
min: MIN((SQ_LDS_BANK_CONFLICT / $denom))
max: MAX((SQ_LDS_BANK_CONFLICT / $denom))
unit: (Cycles + $normUnit)
tips:
Addr Conflict:
avg: AVG((SQ_LDS_ADDR_CONFLICT / $denom))
min: MIN((SQ_LDS_ADDR_CONFLICT / $denom))
max: MAX((SQ_LDS_ADDR_CONFLICT / $denom))
unit: (Cycles + $normUnit)
tips:
Unaligned Stall:
avg: AVG((SQ_LDS_UNALIGNED_STALL / $denom))
min: MIN((SQ_LDS_UNALIGNED_STALL / $denom))
max: MAX((SQ_LDS_UNALIGNED_STALL / $denom))
unit: (Cycles + $normUnit)
tips:
Mem Violations:
avg: AVG((SQ_LDS_MEM_VIOLATIONS / $denom))
min: MIN((SQ_LDS_MEM_VIOLATIONS / $denom))
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
unit: (Accesses + $normUnit)
tips:
LDS Command FIFO Full Rate:
avg: AVG((SQ_LDS_CMD_FIFO_FULL / $denom))
min: MIN((SQ_LDS_CMD_FIFO_FULL / $denom))
max: MAX((SQ_LDS_CMD_FIFO_FULL / $denom))
unit: (Cycles + $normUnit)
tips:
LDS Data FIFO Full Rate:
avg: AVG((SQ_LDS_DATA_FIFO_FULL / $denom))
min: MIN((SQ_LDS_DATA_FIFO_FULL / $denom))
max: MAX((SQ_LDS_DATA_FIFO_FULL / $denom))
unit: (Cycles + $normUnit)
tips:
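The LDS formulas reuse the conflict-free cycle count `SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT` and guard each division against zero. The Python sketch below mirrors those expressions. The function names are hypothetical. Timestamps are assumed to be in nanoseconds, so bytes per nanosecond equals GB/s. Two readings are interpretations rather than statements from the diff: 3.125 as 100 / 32 banks, and the factor 4 as one 4-byte dword per bank.

```python
from typing import Optional


def lds_conflicts_per_access(bank_conflict: float, idx_active: float) -> Optional[float]:
    """'Bank Conflicts/Access': conflict cycles per conflict-free LDS cycle."""
    accesses = idx_active - bank_conflict
    # Same guard as the YAML: None instead of a division by zero.
    return bank_conflict / accesses if accesses != 0 else None


def lds_bank_conflict_rate(bank_conflict: float, idx_active: float) -> Optional[float]:
    """'Bank Conflict Rate' from the Speed-of-Light table (3.125 == 100 / 32)."""
    accesses = idx_active - bank_conflict
    return (bank_conflict * 3.125) / accesses if accesses != 0 else None


def lds_theoretical_bytes(bank_conflict: float, idx_active: float,
                          lds_banks_per_cu: int = 32) -> float:
    """'Theoretical Bandwidth' numerator: 4 bytes per bank per conflict-free cycle."""
    return (idx_active - bank_conflict) * 4 * lds_banks_per_cu


def lds_bandwidth_gb_per_s(bandwidth_counter: float, start_ns: float, end_ns: float) -> float:
    """LDS LOAD/STORE/ATOMIC Bandwidth: 64 bytes per count over the kernel duration."""
    # bytes / ns == 1e9 bytes / s == GB/s
    return 64 * bandwidth_counter / (end_ns - start_ns)
```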
@@ -0,0 +1,105 @@
---
# Add description/tips for each metric in this section.
# So it could be shown in hover.
Metric Description:
# Define the panel properties and properties of each metric in the panel.
Panel Config:
id: 1300
title: Instruction Cache
data source:
- metric_table:
id: 1301
title: Speed-of-Light
header:
metric: Metric
value: Avg
unit: Unit
tips: Tips
metric:
Bandwidth:
value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu)
* (End_Timestamp - Start_Timestamp))))
unit: Pct of Peak
tips:
Cache Hit Rate:
value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
+ SQC_ICACHE_MISSES_DUPLICATE)))
unit: Pct of Peak
tips:
L1I-L2 Bandwidth:
value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
* (End_Timestamp - Start_Timestamp))))
unit: Pct of Peak
tips:
comparable: false # for now
cli_style: simple_bar
- metric_table:
id: 1302
title: Instruction Cache Accesses
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
Req:
avg: AVG((SQC_ICACHE_REQ / $denom))
min: MIN((SQC_ICACHE_REQ / $denom))
max: MAX((SQC_ICACHE_REQ / $denom))
unit: (Req + $normUnit)
tips:
Hits:
avg: AVG((SQC_ICACHE_HITS / $denom))
min: MIN((SQC_ICACHE_HITS / $denom))
max: MAX((SQC_ICACHE_HITS / $denom))
unit: (Hits + $normUnit)
tips:
Misses - Non Duplicated:
avg: AVG((SQC_ICACHE_MISSES / $denom))
min: MIN((SQC_ICACHE_MISSES / $denom))
max: MAX((SQC_ICACHE_MISSES / $denom))
unit: (Misses + $normUnit)
tips:
Misses - Duplicated:
avg: AVG((SQC_ICACHE_MISSES_DUPLICATE / $denom))
min: MIN((SQC_ICACHE_MISSES_DUPLICATE / $denom))
max: MAX((SQC_ICACHE_MISSES_DUPLICATE / $denom))
unit: (Misses + $normUnit)
tips:
Cache Hit Rate:
avg: AVG(((100 * SQC_ICACHE_HITS) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
+ SQC_ICACHE_MISSES_DUPLICATE)))
min: MIN(((100 * SQC_ICACHE_HITS) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
+ SQC_ICACHE_MISSES_DUPLICATE)))
max: MAX(((100 * SQC_ICACHE_HITS) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
+ SQC_ICACHE_MISSES_DUPLICATE)))
unit: pct
tips:
Instruction Fetch Latency:
avg: AVG((SQ_ACCUM_PREV_HIRES / SQ_IFETCH))
min: MIN((SQ_ACCUM_PREV_HIRES / SQ_IFETCH))
max: MAX((SQ_ACCUM_PREV_HIRES / SQ_IFETCH))
unit: Cycles
coll_level: SQ_IFETCH_LEVEL
tips:
- metric_table:
id: 1303
title: Instruction Cache - L2 Interface
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
L1I-L2 Bandwidth:
avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
unit: (Bytes + $normUnit)
tips:
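The Speed-of-Light bandwidth row divides requests by the available request slots. Assume `$max_sclk` is in MHz and timestamps are in nanoseconds. Then `max_sclk * duration / 1000` is the elapsed cycle count, so the constant 100000 is 100 (percent) times 1000 (unit conversion). The Python sketch below follows that reading and also computes the hit rate. The function names are hypothetical.

```python
def icache_hit_rate_pct(hits: int, misses: int, misses_duplicate: int) -> float:
    # Duplicated misses also count as accesses, so they lower the hit rate.
    return 100 * hits / (hits + misses + misses_duplicate)


def icache_sol_bandwidth_pct(icache_req: int, max_sclk_mhz: float,
                             sqc_per_gpu: int, duration_ns: float) -> float:
    # Elapsed cycles at max_sclk: 1 MHz over 1 ns is 1e-3 cycles.
    cycles = max_sclk_mhz * duration_ns / 1000
    # Percent of one request per SQC per cycle; algebraically equal to
    # (REQ * 100000) / (max_sclk * sqc_per_gpu * duration).
    return 100 * icache_req / (cycles * sqc_per_gpu)


def l1i_l2_bytes(tc_inst_req: int) -> int:
    # Table 1303 counts 64 bytes per L1I-L2 request.
    return 64 * tc_inst_req
```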
@@ -0,0 +1,171 @@
---
# Add description/tips for each metric in this section.
# So it could be shown in hover.
Metric Description:
# Define the panel properties and properties of each metric in the panel.
Panel Config:
id: 1400
title: Scalar L1 Data Cache
data source:
- metric_table:
id: 1401
title: Speed-of-Light
header:
metric: Metric
value: Avg
unit: Unit
tips: Tips
metric:
Bandwidth:
value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu)
* (End_Timestamp - Start_Timestamp))))
unit: Pct of Peak
tips:
Cache Hit Rate:
value: AVG((((SQC_DCACHE_HITS * 100) / (SQC_DCACHE_HITS + SQC_DCACHE_MISSES + SQC_DCACHE_MISSES_DUPLICATE))
if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
unit: Pct of Peak
tips:
sL1D-L2 BW:
value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ) * 100000)
/ (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
unit: Pct of Peak
tips:
comparable: false # for now
cli_style: simple_bar
- metric_table:
id: 1402
title: Scalar L1D Cache Accesses
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
Req:
avg: AVG((SQC_DCACHE_REQ / $denom))
min: MIN((SQC_DCACHE_REQ / $denom))
max: MAX((SQC_DCACHE_REQ / $denom))
unit: (Req + $normUnit)
tips:
Hits:
avg: AVG((SQC_DCACHE_HITS / $denom))
min: MIN((SQC_DCACHE_HITS / $denom))
max: MAX((SQC_DCACHE_HITS / $denom))
unit: (Req + $normUnit)
tips:
Misses - Non Duplicated:
avg: AVG((SQC_DCACHE_MISSES / $denom))
min: MIN((SQC_DCACHE_MISSES / $denom))
max: MAX((SQC_DCACHE_MISSES / $denom))
unit: (Req + $normUnit)
tips:
Misses- Duplicated:
avg: AVG((SQC_DCACHE_MISSES_DUPLICATE / $denom))
min: MIN((SQC_DCACHE_MISSES_DUPLICATE / $denom))
max: MAX((SQC_DCACHE_MISSES_DUPLICATE / $denom))
unit: (Req + $normUnit)
tips:
Cache Hit Rate:
avg: AVG((((100 * SQC_DCACHE_HITS) / ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
+ SQC_DCACHE_MISSES_DUPLICATE)) if (((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
+ SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
min: MIN((((100 * SQC_DCACHE_HITS) / ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
+ SQC_DCACHE_MISSES_DUPLICATE)) if (((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
+ SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
max: MAX((((100 * SQC_DCACHE_HITS) / ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
+ SQC_DCACHE_MISSES_DUPLICATE)) if (((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
+ SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
unit: pct
tips:
Read Req (Total):
avg: AVG((((((SQC_DCACHE_REQ_READ_1 + SQC_DCACHE_REQ_READ_2) + SQC_DCACHE_REQ_READ_4)
+ SQC_DCACHE_REQ_READ_8) + SQC_DCACHE_REQ_READ_16) / $denom))
min: MIN((((((SQC_DCACHE_REQ_READ_1 + SQC_DCACHE_REQ_READ_2) + SQC_DCACHE_REQ_READ_4)
+ SQC_DCACHE_REQ_READ_8) + SQC_DCACHE_REQ_READ_16) / $denom))
max: MAX((((((SQC_DCACHE_REQ_READ_1 + SQC_DCACHE_REQ_READ_2) + SQC_DCACHE_REQ_READ_4)
+ SQC_DCACHE_REQ_READ_8) + SQC_DCACHE_REQ_READ_16) / $denom))
unit: (Req + $normUnit)
tips:
Atomic Req:
avg: AVG((SQC_DCACHE_ATOMIC / $denom))
min: MIN((SQC_DCACHE_ATOMIC / $denom))
max: MAX((SQC_DCACHE_ATOMIC / $denom))
unit: (Req + $normUnit)
tips:
Read Req (1 DWord):
avg: AVG((SQC_DCACHE_REQ_READ_1 / $denom))
min: MIN((SQC_DCACHE_REQ_READ_1 / $denom))
max: MAX((SQC_DCACHE_REQ_READ_1 / $denom))
unit: (Req + $normUnit)
tips:
Read Req (2 DWord):
avg: AVG((SQC_DCACHE_REQ_READ_2 / $denom))
min: MIN((SQC_DCACHE_REQ_READ_2 / $denom))
max: MAX((SQC_DCACHE_REQ_READ_2 / $denom))
unit: (Req + $normUnit)
tips:
Read Req (4 DWord):
avg: AVG((SQC_DCACHE_REQ_READ_4 / $denom))
min: MIN((SQC_DCACHE_REQ_READ_4 / $denom))
max: MAX((SQC_DCACHE_REQ_READ_4 / $denom))
unit: (Req + $normUnit)
tips:
Read Req (8 DWord):
avg: AVG((SQC_DCACHE_REQ_READ_8 / $denom))
min: MIN((SQC_DCACHE_REQ_READ_8 / $denom))
max: MAX((SQC_DCACHE_REQ_READ_8 / $denom))
unit: (Req + $normUnit)
tips:
Read Req (16 DWord):
avg: AVG((SQC_DCACHE_REQ_READ_16 / $denom))
min: MIN((SQC_DCACHE_REQ_READ_16 / $denom))
max: MAX((SQC_DCACHE_REQ_READ_16 / $denom))
unit: (Req + $normUnit)
tips:
- metric_table:
id: 1403
title: Scalar L1D Cache - L2 Interface
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
sL1D-L2 BW:
avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ) * 64)) / $denom))
min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ) * 64)) / $denom))
max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ) * 64)) / $denom))
unit: (Bytes + $normUnit)
tips:
Read Req:
avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
min: MIN((SQC_TC_DATA_READ_REQ / $denom))
max: MAX((SQC_TC_DATA_READ_REQ / $denom))
unit: (Req + $normUnit)
tips:
Write Req:
avg: AVG((SQC_TC_DATA_WRITE_REQ / $denom))
min: MIN((SQC_TC_DATA_WRITE_REQ / $denom))
max: MAX((SQC_TC_DATA_WRITE_REQ / $denom))
unit: (Req + $normUnit)
tips:
Atomic Req:
avg: AVG((SQC_TC_DATA_ATOMIC_REQ / $denom))
min: MIN((SQC_TC_DATA_ATOMIC_REQ / $denom))
max: MAX((SQC_TC_DATA_ATOMIC_REQ / $denom))
unit: (Req + $normUnit)
tips:
Stall Cycles:
avg: AVG((SQC_TC_STALL / $denom))
min: MIN((SQC_TC_STALL / $denom))
max: MAX((SQC_TC_STALL / $denom))
unit: (Cycles + $normUnit)
tips:
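The scalar L1D tables follow the instruction-cache pattern. The hit rate is guarded against an empty denominator, and L2 traffic is counted at 64 bytes per request. The sketch below is minimal. The function names and the dict keyed by request size in DWords are hypothetical conveniences.

```python
from typing import Dict, Optional

# Request sizes (in DWords) with a SQC_DCACHE_REQ_READ_<n> counter.
DWORD_SIZES = (1, 2, 4, 8, 16)


def sl1d_read_req_total(reads_by_size: Dict[int, int]) -> int:
    # 'Read Req (Total)': sum of the per-size read request counters.
    return sum(reads_by_size.get(n, 0) for n in DWORD_SIZES)


def sl1d_hit_rate_pct(hits: int, misses: int, misses_dup: int) -> Optional[float]:
    total = hits + misses + misses_dup
    # None when there were no accesses, as in the YAML.
    return 100 * hits / total if total != 0 else None


def sl1d_l2_bytes(read_req: int, write_req: int, atomic_req: int) -> int:
    # 'sL1D-L2 BW' numerator: 64 bytes per read, write or atomic request.
    return 64 * (read_req + write_req + atomic_req)
```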
@@ -43,6 +43,24 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
Sequencer → TA Address Stall:
avg: AVG((SQ_VMEM_TA_ADDR_FIFO_FULL / $denom))
min: MIN((SQ_VMEM_TA_ADDR_FIFO_FULL / $denom))
max: MAX((SQ_VMEM_TA_ADDR_FIFO_FULL / $denom))
unit: (Cycles + $normUnit)
tips:
Sequencer → TA Command Stall:
avg: AVG((SQ_VMEM_TA_CMD_FIFO_FULL / $denom))
min: MIN((SQ_VMEM_TA_CMD_FIFO_FULL / $denom))
max: MAX((SQ_VMEM_TA_CMD_FIFO_FULL / $denom))
unit: (Cycles + $normUnit)
tips:
Sequencer → TA Data Stall:
avg: AVG((SQ_VMEM_WR_TA_DATA_FIFO_FULL / $denom))
min: MIN((SQ_VMEM_WR_TA_DATA_FIFO_FULL / $denom))
max: MAX((SQ_VMEM_WR_TA_DATA_FIFO_FULL / $denom))
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -32,12 +32,12 @@ Panel Config:
tips:
L2-Fabric Read BW:
value: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64) + (TCC_EA0_RDREQ_128B_sum
* 128)) / (End_Timestamp - Start_Timestamp))
* 128)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
L2-Fabric Write and Atomic BW:
value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp))
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
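Both L2-Fabric rows convert request counts into bytes and divide by the kernel duration. The Python sketch below assumes timestamps in nanoseconds, so bytes per nanosecond reads directly as GB/s. The function names are hypothetical. As in the YAML, write requests not counted as 64-byte are charged 32 bytes each.

```python
def l2_fabric_read_gb_per_s(rd_32b: int, rd_64b: int, rd_128b: int,
                            duration_ns: float) -> float:
    # Bytes moved by each request size class.
    bytes_read = rd_32b * 32 + rd_64b * 64 + rd_128b * 128
    return bytes_read / duration_ns  # bytes per ns == GB/s


def l2_fabric_write_gb_per_s(wr_total: int, wr_64b: int, duration_ns: float) -> float:
    # Requests not counted as 64-byte are charged 32 bytes.
    bytes_written = wr_64b * 64 + (wr_total - wr_64b) * 32
    return bytes_written / duration_ns
```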
@@ -52,6 +52,15 @@ Panel Config:
unit: Unit
tips: Tips
metric:
Read BW:
avg: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
* 64)) / $denom))
min: MIN((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
* 64)) / $denom))
max: MAX((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
* 64)) / $denom))
unit: (Bytes + $normUnit)
tips:
Read BW:
avg: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64) + (TCC_EA0_RDREQ_128B_sum * 128)) / $denom))
min: MIN((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64) + (TCC_EA0_RDREQ_128B_sum * 128)) / $denom))
@@ -457,13 +466,13 @@ Panel Config:
max: MAX((TCC_EA0_RD_UNCACHED_32B_sum / $denom))
unit: (Req + $normUnit)
tips:
Read - HBM:
HBM Read:
avg: AVG((TCC_EA0_RDREQ_DRAM_sum / $denom))
min: MIN((TCC_EA0_RDREQ_DRAM_sum / $denom))
max: MAX((TCC_EA0_RDREQ_DRAM_sum / $denom))
unit: (Req + $normUnit)
tips:
Read - Remote:
Remote Read:
avg: AVG((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
min: MIN((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
max: MAX((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
@@ -505,13 +514,13 @@ Panel Config:
max: MAX((TCC_EA0_WRREQ_64B_sum / $denom))
unit: (Req + $normUnit)
tips:
Write - HBM:
HBM Write and Atomic:
avg: AVG((TCC_EA0_WRREQ_WRITE_DRAM_sum / $denom))
min: MIN((TCC_EA0_WRREQ_WRITE_DRAM_sum / $denom))
max: MAX((TCC_EA0_WRREQ_WRITE_DRAM_sum / $denom))
unit: (Req + $normUnit)
tips:
Write and Atomic - Remote:
Remote Write and Atomic:
avg: AVG((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
min: MIN((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
max: MAX((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
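The renamed HBM and Remote rows split fabric requests into a DRAM part and a remainder clamped at zero. The sketch below shows that split. The function name is hypothetical. The diff does not say why the clamp is needed; skew between separately sampled counters is one possible reason.

```python
from typing import Tuple


def split_requests(total_req: int, dram_req: int) -> Tuple[int, int]:
    """Return (HBM, Remote) request counts as in the YAML rows."""
    hbm = dram_req
    # MAX(total - dram, 0): never report a negative remote count.
    remote = max(total_req - dram_req, 0)
    return hbm, remote
```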
@@ -0,0 +1,9 @@
---
Panel Config:
id: 2100
title: PC Sampling
data source:
- pc_sampling_table:
id: 2101
source: ps_file
comparable: false # enable it later
@@ -42,15 +42,16 @@ from utils.logger import (
console_warning,
demarcate,
)
from utils.mi_gpu_spec import get_gpu_model, get_gpu_series
from utils.mi_gpu_spec import get_gpu_model, get_gpu_series, get_num_xcds
from utils.parser import build_in_vars, supported_denom
from utils.utils import (
capture_subprocess_output,
convert_metric_id_to_panel_idx,
detect_rocprof,
get_base_spi_pipe_counter,
get_submodules,
is_spi_pipe_counter,
is_tcc_channel_counter,
total_xcds,
using_v3,
)
@@ -186,7 +187,7 @@ class OmniSoC_Base:
self._mspec.gpu_arch, self._mspec.gpu_chip_id
)
self._mspec.num_xcd = str(
total_xcds(self._mspec.gpu_model, self._mspec.compute_partition)
get_num_xcds(self._mspec.gpu_model, self._mspec.compute_partition)
)
@demarcate
@@ -316,10 +317,10 @@ class OmniSoC_Base:
counters = counters - {"SQ_INSTS_VALU_MFMA_F8", "SQ_INSTS_VALU_MFMA_MOPS_F8"}
# Following counters are not supported
# TCP_TCP_LATENCY_sum (except for gfx908 and gfx90a)
# TCP_TCP_LATENCY_sum (except for gfx950)
# SQC_DCACHE_INFLIGHT_LEVEL
counters = counters - {"SQC_DCACHE_INFLIGHT_LEVEL"}
if self.__arch not in ("gfx908", "gfx90a"):
if self.__arch != "gfx950":
counters = counters - {"TCP_TCP_LATENCY_sum"}
# SQ_ACCUM_PREV_HIRES will be injected for level counters later on
@@ -510,6 +511,8 @@ class OmniSoC_Base:
file_count = 0
# Store all channels for a TCC channel counter in the same file
tcc_channel_counter_file_map = dict()
# Store all pipes for SPI pipe counters in the same file
spi_pipe_counter_file_map = dict()
for ctr in counters:
# Store all channels for a TCC channel counter in the same file
if is_tcc_channel_counter(ctr):
@@ -517,13 +520,27 @@ class OmniSoC_Base:
if output_file:
output_file.add(ctr)
continue
# Store all pipes for SPI pipe counters in the same file
if is_spi_pipe_counter(ctr):
output_file = spi_pipe_counter_file_map.get(
get_base_spi_pipe_counter(ctr)
)
if output_file:
output_file.add(ctr)
continue
# Add counter to first file that has room
added = False
for i in range(len(output_files)):
if output_files[i].add(ctr):
added = True
# Store all channels for a TCC channel counter in the same file
if is_tcc_channel_counter(ctr):
tcc_channel_counter_file_map[ctr.split("[")[0]] = output_files[i]
# Store all pipes for SPI pipe counters in the same file
if is_spi_pipe_counter(ctr):
spi_pipe_counter_file_map[get_base_spi_pipe_counter(ctr)] = (
output_files[i]
)
break
# All files are full, create a new file
@@ -711,8 +728,18 @@ class LimitedSet:
if e.split("[")[0] in {element.split("[")[0] for element in self.elements}:
self.elements.append(e)
return True
# Store all pipes for SPI pipe counters in the same file
if is_spi_pipe_counter(e) and get_base_spi_pipe_counter(e) in {
get_base_spi_pipe_counter(element) for element in self.elements
}:
self.elements.append(e)
return True
if self.avail > 0:
self.avail -= 1
# SPI pipe counters take space of 2 counters
if is_spi_pipe_counter(e):
self.avail -= 2
else:
self.avail -= 1
self.elements.append(e)
return True
return False
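The packing logic above keeps all channels of a TCC counter, and all pipes of an SPI pipe counter, in one output file. An SPI pipe counter is charged two slots. The sketch below is simplified to a single IP block. Its SPI naming convention (`<base>_PIPE<n>`) is hypothetical, because the diff does not show how `is_spi_pipe_counter` or `get_base_spi_pipe_counter` recognize names.

The sketch differs from the diff in one deliberate way. The diff checks `self.avail > 0` and then subtracts 2 for an SPI pipe counter, so a set with one free slot still accepts one and `avail` drops to -1. The sketch instead requires the full cost. The diff alone does not show whether the original behavior is intended.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical naming convention for this sketch only.
_SPI_PIPE = re.compile(r"^(?P<base>.+)_PIPE\d+$")


def is_spi_pipe_counter(name: str) -> bool:
    return _SPI_PIPE.match(name) is not None


def group_key(name: str) -> Optional[Tuple[str, str]]:
    """Counters sharing a key must land in the same output file."""
    if "[" in name:  # TCC channel counter, e.g. "TCC_HIT[3]"
        return ("tcc", name.split("[")[0])
    match = _SPI_PIPE.match(name)
    if match:
        return ("spi", match.group("base"))
    return None


class LimitedSetSketch:
    def __init__(self, capacity: int) -> None:
        self.avail = capacity
        self.elements: List[str] = []

    def add(self, e: str) -> bool:
        key = group_key(e)
        # Further channels or pipes of a counter already present are free.
        if key is not None and key in {group_key(x) for x in self.elements}:
            self.elements.append(e)
            return True
        # An SPI pipe counter costs two slots; anything else costs one.
        cost = 2 if is_spi_pipe_counter(e) else 1
        if self.avail >= cost:
            self.avail -= cost
            self.elements.append(e)
            return True
        return False


def pack(counters: List[str], capacity: int) -> List[List[str]]:
    files: List[LimitedSetSketch] = []
    home: Dict[Tuple[str, str], LimitedSetSketch] = {}  # group -> its file
    for ctr in counters:
        key = group_key(ctr)
        if key is not None and key in home:
            home[key].add(ctr)  # succeeds: the group is already in that file
            continue
        for f in files:  # first file that has room
            if f.add(ctr):
                break
        else:  # all files are full: create a new one
            f = LimitedSetSketch(capacity)
            if not f.add(ctr):
                raise ValueError(f"{ctr} does not fit in an empty set")
            files.append(f)
        if key is not None:
            home[key] = f
    return [f.elements for f in files]
```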
@@ -54,10 +54,6 @@ class gfx908_soc(OmniSoC_Base):
self._mspec._l2_banks = 32
self._mspec.lds_banks_per_cu = 32
self._mspec.pipes_per_gpu = 4
# --showmclkrange is broken in Mi100, hardcode freq
if self._mspec.max_mclk is None or self._mspec.cur_mclk is None:
self._mspec.max_mclk = 1200
self._mspec.cur_mclk = 1200
# -----------------------
# Required child methods
@@ -64,12 +64,6 @@ class gfx90a_soc(OmniSoC_Base):
)
self.roofline_obj = Roofline(args, self._mspec)
# Workaround for broken --showmclkrange
# MI210/MI250/MI250X have 1600MHz mclk
if self._mspec.max_mclk is None or self._mspec.cur_mclk is None:
self._mspec.max_mclk = 1600
self._mspec.cur_mclk = 1600
# Set arch specific specs
self._mspec._l2_banks = 32
self._mspec.lds_banks_per_cu = 32
@@ -64,12 +64,6 @@ class gfx942_soc(OmniSoC_Base):
)
self.roofline_obj = Roofline(args, self._mspec)
# Workaround for broken --showmclkrange
# MI300X/MI300A/MI308X have 1300MHz mclk
if self._mspec.max_mclk is None or self._mspec.cur_mclk is None:
self._mspec.max_mclk = 1300
self._mspec.cur_mclk = 1300
# Set arch specific specs
self._mspec._l2_banks = 16
self._mspec.lds_banks_per_cu = 32
@@ -0,0 +1,117 @@
##############################################################################bl
# MIT License
#
# Copyright (c) 2021 - 2025 Advanced Micro Devices, Inc. All Rights Reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
##############################################################################el
from pathlib import Path
import config
from rocprof_compute_soc.soc_base import OmniSoC_Base
from roofline import Roofline
from utils.logger import demarcate
from utils.utils import console_error, console_log, mibench
class gfx950_soc(OmniSoC_Base):
def __init__(self, args, mspec):
super().__init__(args, mspec)
self.set_arch("gfx950")
if hasattr(self.get_args(), "roof_only") and self.get_args().roof_only:
self.set_perfmon_dir(
str(
Path(str(config.rocprof_compute_home)).joinpath(
"rocprof_compute_soc",
"profile_configs",
"gfx950",
"roofline",
)
)
)
else:
# NB: gfx950 uses its own perfmon configs (profile_configs/gfx950)
self.set_perfmon_dir(
str(
Path(str(config.rocprof_compute_home)).joinpath(
"rocprof_compute_soc",
"profile_configs",
"gfx950",
)
)
)
self.set_compatible_profilers(["rocprofv3"])
# Per IP block max number of simultaneous counters. GFX IP Blocks
self.set_perfmon_config(
{
"SQ": 8,
"TA": 2,
"TD": 2,
"TCP": 4,
"TCC": 4,
"CPC": 2,
"CPF": 2,
"SPI": 2,
"GRBM": 2,
"GDS": 4,
"TCC_channels": 16,
}
)
self.roofline_obj = Roofline(args, self._mspec)
# Set arch specific specs
self._mspec._l2_banks = 16
self._mspec.lds_banks_per_cu = 32
self._mspec.pipes_per_gpu = 4
# -----------------------
# Required child methods
# -----------------------
@demarcate
def profiling_setup(self):
"""Perform any SoC-specific setup prior to profiling."""
super().profiling_setup()
# Performance counter filtering
self.perfmon_filter(self.get_args().roof_only)
@demarcate
def post_profiling(self):
"""Perform any SoC-specific post profiling activities."""
super().post_profiling()
if not self.get_args().no_roof:
console_log(
"roofline", "Checking for roofline.csv in " + str(self.get_args().path)
)
if not Path(self.get_args().path).joinpath("roofline.csv").is_file():
mibench(self.get_args(), self._mspec)
self.roofline_obj.post_processing()
else:
console_log("roofline", "Skipping roofline")
@demarcate
def analysis_setup(self, roofline_parameters=None):
"""Perform any SoC-specific setup prior to analysis."""
super().analysis_setup()
# configure roofline for analysis
if roofline_parameters:
self.roofline_obj = Roofline(
self.get_args(), self._mspec, roofline_parameters
)
@@ -120,7 +120,7 @@ def discrete_background_color_bins(df, n_bins=5, columns="all"):
####################
# GRAPHICAL ELEMENTS
####################
def build_bar_chart(display_df, table_config, barchart_elements, norm_filt, hbm_bw):
def build_bar_chart(display_df, table_config, barchart_elements, norm_filt):
"""
Read data into a bar chart. ID will determine which subtype of barchart.
"""
@@ -214,6 +214,9 @@ def build_bar_chart(display_df, table_config, barchart_elements, norm_filt, hbm_
orientation="h",
).update_xaxes(range=[0, 110], ticks="inside", title="%")
) # append first % chart
hbm_bw = float(
display_df[display_df["Metric"] == "HBM Bandwidth"]["Avg"].iloc[0]
)
d_figs.append(
px.bar(
display_df[display_df["Unit"] == "Gb/s"],
@@ -1,7 +1,5 @@
import os
import sys
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union
from typing import Any, Dict
import yaml
@@ -13,14 +11,20 @@ MI50 = 0
MI100 = 1
MI200 = 2
MI300 = 3
MI350 = 4
MI_CONSTANS = {MI50: "mi50", MI100: "mi100", MI200: "mi200", MI300: "mi300"}
MI_CONSTANS = {
MI50: "mi50",
MI100: "mi100",
MI200: "mi200",
MI300: "mi300",
MI350: "mi350",
}
gpu_series_dict = {} # key: gpu arch
gpu_model_dict = {} # key: gpu_arch
mi300_num_xcds_dict = {} # key: gpu model
mi300_nps_dict = {} # key: gpu model
mi300_chip_id_dict = {} # key: chip id (int)
num_xcds_dict = {} # key: gpu model
chip_id_dict = {} # key: chip id (int)
# ----------------------------
@@ -60,10 +64,9 @@ def parse_mi_gpu_spec():
MI GPUs
|-- series
|-- architecture (list)
|-- models
|-- chip_ids
|-- mi300_arch
|-- partition_mode
|-- gpu model
|-- chip_ids
|-- partition_mode
"""
current_dir = os.path.dirname(__file__)
@@ -71,61 +74,26 @@ def parse_mi_gpu_spec():
# Load the YAML data
yaml_data = load_yaml(yaml_file_path)
mi300_models_dict = {}
for mi_index, mi_series in MI_CONSTANS.items():
if mi_series != MI_CONSTANS[MI300]:
console_debug("[parse_mi_gpu_spec] Processing series: %s" % mi_series)
for key, value in yaml_data.items():
# parse out gpu series and gpu model information for mi50, 100, 200
curr_gpu_arch = value[mi_index]["gpu_archs"][0]["gpu_arch"]
gpu_series_dict[curr_gpu_arch] = mi_series
gpu_model_dict[curr_gpu_arch] = []
for models in value[mi_index]["gpu_archs"][0]["models"]:
gpu_model_dict[curr_gpu_arch].append(models["gpu_model"])
elif mi_series == MI_CONSTANS[MI300]:
# MI300 requires specific processing
for key, value in yaml_data.items():
mi300_gpu_archs_list = []
# NOTE: only MI300 have multiple architectures
for archs in value[MI300]["gpu_archs"]:
curr_gpu_arch = archs["gpu_arch"]
mi300_gpu_archs_list.append(curr_gpu_arch)
gpu_series_dict[curr_gpu_arch] = mi_series
for idx, arch in enumerate(mi300_gpu_archs_list):
mi300_models_dict[arch] = []
for models in value[MI300]["gpu_archs"][idx]["models"]:
gpu_model = models["gpu_model"]
# 1. Parse compute partition. NOTE: compute partition mode num xcds is available for all mi300 gpu models
mi300_num_xcds_dict[gpu_model] = models["partition_mode"][
"compute_partition_mode"
]["num_xcds"]
# 2. Parse memory_partition. NOTE: memory partition mode nps is available for all mi300 gpu models
mi300_nps_dict[gpu_model] = models["partition_mode"][
"memory_partition_mode"
]
# 3. Parse chip id (physical and virtual).
if models["chip_ids"]["physical"]:
# save chip_id, gpu_model pair if chip id is available
# NOTE: chip id is available for all gfx942 machines
mi300_chip_id_dict[models["chip_ids"]["physical"]] = models[
"gpu_model"
]
if models["chip_ids"]["virtual"]:
# save chip_id, gpu_model pair if chip id is available
# NOTE: chip id is available for all gfx942 machines
mi300_chip_id_dict[models["chip_ids"]["virtual"]] = models[
"gpu_model"
]
mi300_models_dict[arch].append(gpu_model)
gpu_model_dict.update(mi300_models_dict)
for series in yaml_data["mi_gpu_spec"]:
curr_gpu_series = series["gpu_series"]
console_debug("[parse_mi_gpu_spec] Processing series: %s" % curr_gpu_series)
for archs in series["gpu_archs"]:
curr_gpu_arch = archs["gpu_arch"]
gpu_series_dict[curr_gpu_arch] = curr_gpu_series
gpu_model_dict[curr_gpu_arch] = []
for models in archs["models"]:
curr_gpu_model = models["gpu_model"]
gpu_model_dict[curr_gpu_arch].append(curr_gpu_model)
num_xcds_dict[curr_gpu_model] = (
models.get("partition_mode", {})
.get("compute_partition_mode", {})
.get("num_xcds", {})
)
if "chip_ids" in models and "physical" in models["chip_ids"]:
chip_id_dict[models["chip_ids"]["physical"]] = curr_gpu_model
if "chip_ids" in models and "virtual" in models["chip_ids"]:
chip_id_dict[models["chip_ids"]["virtual"]] = curr_gpu_model
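The refactored parser walks a single `series -> gpu_archs -> models` tree and reads the optional partition data through chained `.get()` calls. A `.get(key, {})` default applies only when the key is absent. If the key is present with a YAML `null`, `.get()` returns `None`, and the next `.get()` raises `AttributeError`. That behavior is consistent with this change dropping the `partition_mode: null` entries.

The sketch below shows the parsing and the XCD lookup on an illustrative in-memory excerpt; the values are copied from the diff. The names `SPEC`, `parse_spec` and `num_xcds` are hypothetical stand-ins, not the module's functions.

```python
from typing import Any, Dict, List, Optional, Tuple

# Illustrative excerpt shaped like the refactored YAML, already loaded.
# A model without partition data omits the key (like mi250 here).
SPEC: Dict[str, Any] = {
    "mi_gpu_spec": [
        {"gpu_series": "mi200", "gpu_archs": [{"gpu_arch": "gfx90a", "models": [
            {"gpu_model": "mi210", "chip_ids": {"physical": 29711}},
            {"gpu_model": "mi250"},
        ]}]},
        {"gpu_series": "mi350", "gpu_archs": [{"gpu_arch": "gfx950", "models": [
            {"gpu_model": "mi350",
             "partition_mode": {"compute_partition_mode": {
                 "num_xcds": {"spx": 8, "dpx": 4, "qpx": 2, "cpx": 1}}},
             "chip_ids": {"physical": 30112}},
        ]}]},
    ]
}

# Models the diff treats as single-XCD without consulting the table.
SINGLE_XCD_MODELS = {"mi50", "mi60", "mi100", "mi210", "mi250", "mi250x"}


def parse_spec(spec: Dict[str, Any]) -> Tuple[dict, dict, dict, dict]:
    series_by_arch: Dict[str, str] = {}
    models_by_arch: Dict[str, List[str]] = {}
    xcds_by_model: Dict[str, Any] = {}
    model_by_chip: Dict[int, str] = {}
    for series in spec["mi_gpu_spec"]:
        for arch in series["gpu_archs"]:
            name = arch["gpu_arch"]
            series_by_arch[name] = series["gpu_series"]
            models_by_arch[name] = []
            for m in arch["models"]:
                model = m["gpu_model"]
                models_by_arch[name].append(model)
                # Each default applies only when the key is absent.
                xcds_by_model[model] = (
                    m.get("partition_mode", {})
                    .get("compute_partition_mode", {})
                    .get("num_xcds", {})
                )
                for kind in ("physical", "virtual"):
                    chip = m.get("chip_ids", {}).get(kind)
                    if chip is not None:
                        model_by_chip[chip] = model
    return series_by_arch, models_by_arch, xcds_by_model, model_by_chip


def num_xcds(xcds_by_model: Dict[str, Any], model: str, partition: str) -> Optional[int]:
    if model.lower() in SINGLE_XCD_MODELS:
        return 1
    table = xcds_by_model.get(model.lower())
    if table is None:  # unknown GPU model
        return None
    return table.get(partition.lower())  # None for an unknown partition mode
```

The sketch also skips chip ids whose value is `None`. The diff checks only that the `physical` or `virtual` key exists.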
def get_gpu_series_dict():
@@ -164,9 +132,9 @@ def get_gpu_model(gpu_arch_, chip_id_):
gpu_arch_lower = gpu_arch_.lower()
# Handle gfx942 with chip_id mapping
if gpu_arch_lower == "gfx942":
if chip_id_ and int(chip_id_) in mi300_chip_id_dict:
gpu_model = mi300_chip_id_dict.get(int(chip_id_))
if gpu_arch_lower not in ("gfx906", "gfx908", "gfx90a"):
if chip_id_ and int(chip_id_) in chip_id_dict:
gpu_model = chip_id_dict.get(int(chip_id_))
else:
console_warning(f"No gpu model found for chip id: {chip_id_}")
return None
@@ -186,8 +154,12 @@ def get_gpu_model(gpu_arch_, chip_id_):
return gpu_model.upper()
def get_mi300_num_xcds(gpu_model_, compute_partition_):
if not mi300_num_xcds_dict:
def get_num_xcds(gpu_model_, compute_partition_):
# Only GPUs in the MI300 series and later have more than one XCD
if gpu_model_.lower() in ("mi50", "mi60", "mi100", "mi210", "mi250", "mi250x"):
return 1
if not num_xcds_dict:
console_error(
"num_xcds_dict not yet populated; did you run parse_mi_gpu_spec()?"
)
@@ -196,10 +168,10 @@ def get_mi300_num_xcds(gpu_model_, compute_partition_):
gpu_model_lower = gpu_model_.lower()
partition_lower = compute_partition_.lower()
if gpu_model_lower not in mi300_num_xcds_dict:
if gpu_model_lower not in num_xcds_dict:
return None
model_dict = mi300_num_xcds_dict[gpu_model_lower]
model_dict = num_xcds_dict[gpu_model_lower]
if partition_lower not in model_dict:
console_log(f"Unknown compute partition: {compute_partition_}")
return None
@@ -214,9 +186,9 @@ def get_mi300_num_xcds(gpu_model_, compute_partition_):
return num_xcds
def get_mi300_chip_id_dict():
if mi300_chip_id_dict:
return mi300_chip_id_dict
def get_chip_id_dict():
if chip_id_dict:
return chip_id_dict
else:
console_error(
"chip_id_dict not yet populated; did you run parse_mi_gpu_spec()?"
@@ -9,11 +9,11 @@
# MI GPUs
# |-- series: the specific MI series; mi50, mi100, mi200, mi300, mi350
# |-- architecture: currently, only mi300 gpus hold different architectures
# |-- models
# |-- chip_ids: chip id is specific to the environment the gpu is being used on
# |-- partition_mode: currently, only mi300 gpus hold partition mode information
# two types: compute partition mode, memory partition mode,
# currently only mi300 gpus contains compute partition mode information on number of xcds
# |-- gpu model
# |-- chip_ids: chip id is specific to the environment the gpu is being used on
# |-- partition_mode
# | -- compute partition mode
# | -- memory partition mode
#
# --------------------------------------------------------------------------------
@@ -23,45 +23,31 @@ mi_gpu_spec:
- gpu_arch: gfx906
models:
- gpu_model: mi50
partition_mode: null
chip_ids:
physical: null
virtual: null
- gpu_model: mi60
partition_mode: null
chip_ids:
physical: null
virtual: null
- gpu_series: mi100
gpu_archs:
- gpu_arch: gfx908
models:
- gpu_model: mi100
partition_mode: null
chip_ids:
physical: 29580
virtual: null
- gpu_series: mi200
gpu_archs:
- gpu_arch: gfx90a
models:
- gpu_model: mi210
partition_mode: null
chip_ids:
physical: 29711
virtual: null
- gpu_model: mi250
partition_mode: null
chip_ids:
physical: 29708
virtual: null
- gpu_model: mi250x
partition_mode: null
chip_ids:
physical: 29704
virtual: null
- gpu_model: mi250
- gpu_model: mi250x
- gpu_series: mi300
gpu_archs:
@@ -72,16 +58,10 @@ mi_gpu_spec:
compute_partition_mode:
num_xcds:
spx: 6
dpx: null
tpx: 2
qpx: null
cpx: null
memory_partition_mode:
nps4: [tpx]
nps1: [spx, tpx]
chip_ids:
physical: null
virtual: null
- gpu_arch: gfx941
models:
@@ -91,15 +71,11 @@ mi_gpu_spec:
num_xcds:
spx: 8
dpx: 4
tpx: null
qpx: 2
cpx: 1
memory_partition_mode:
nps4: [qpx, cpx]
nps1: [spx, qpx, cpx]
chip_ids:
physical: null
virtual: null
- gpu_arch: gfx942
models:
@@ -108,10 +84,7 @@ mi_gpu_spec:
compute_partition_mode:
num_xcds:
spx: 6
dpx: null
tpx: 2
qpx: null
cpx: null
memory_partition_mode:
nps4: [tpx]
nps1: [spx, tpx]
@@ -125,7 +98,6 @@ mi_gpu_spec:
num_xcds:
spx: 8
dpx: 4
tpx: null
qpx: 2
cpx: 1
memory_partition_mode:
@@ -141,8 +113,6 @@ mi_gpu_spec:
num_xcds:
spx: 4
dpx: 2
tpx: null
qpx: null
cpx: 1
memory_partition_mode:
nps4: [cpx]
@@ -150,3 +120,21 @@ mi_gpu_spec:
chip_ids:
physical: 29858
virtual: 29878
- gpu_series: mi350
gpu_archs:
- gpu_arch: gfx950
models:
- gpu_model: mi350
partition_mode:
compute_partition_mode:
num_xcds:
spx: 8
dpx: 4
qpx: 2
cpx: 1
memory_partition_mode:
nps1: [spx, dpx, qpx, cpx]
nps4: [qpx, cpx]
chip_ids:
physical: 30112
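The new mi350 entry does two things:

- It pairs each memory partition mode (NPS) with the compute partition modes that mode allows.
- It gives the XCD count for each compute partition mode.

The sketch below encodes those tables as Python dicts. The names are hypothetical. The number of partitions per mode (1, 2, 4, 8 for SPX, DPX, QPX, CPX) is inferred from the mode names, not stated in the YAML. Under that assumption, XCDs per partition times partitions is always 8.

```python
# Copied from the mi350 YAML entry.
MI350_MEMORY_PARTITIONS = {
    "nps1": ["spx", "dpx", "qpx", "cpx"],
    "nps4": ["qpx", "cpx"],
}
MI350_NUM_XCDS = {"spx": 8, "dpx": 4, "qpx": 2, "cpx": 1}

# Assumption: partitions per GPU implied by each compute partition mode name.
PARTITIONS_PER_GPU = {"spx": 1, "dpx": 2, "qpx": 4, "cpx": 8}


def is_supported(nps_mode: str, compute_mode: str) -> bool:
    """True if the compute partition mode is listed under the NPS mode."""
    return compute_mode in MI350_MEMORY_PARTITIONS.get(nps_mode, [])
```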
@@ -86,6 +86,7 @@ build_in_vars = {
0) / $max_waves_per_cu) * 8) + MIN(MOD(ROUND(AVG(((4 * SQ_BUSY_CU_CYCLES) \
/ $GRBM_GUI_ACTIVE_PER_XCD)), 0), $max_waves_per_cu), 8)), $cu_per_gpu))",
"kernelBusyCycles": "ROUND(AVG((((End_Timestamp - Start_Timestamp) / 1000) * $max_sclk)), 0)",
"hbmBandwidth": "($max_mclk / 1000 * 32 * $num_hbm_channels)",
}
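The new `hbmBandwidth` built-in derives peak HBM bandwidth at analysis time from `max_mclk` and `num_hbm_channels` instead of reading a precomputed `hbm_bw`. A minimal worked version of that expression follows. Reading the constant 32 as bytes transferred per channel per memory clock is an interpretation; this diff does not state it.

```python
def hbm_bandwidth_gbps(max_mclk_mhz: float, num_hbm_channels: float) -> float:
    """Peak HBM bandwidth in GB/s, mirroring
    ($max_mclk / 1000 * 32 * $num_hbm_channels)."""
    mclk_ghz = max_mclk_mhz / 1000  # max_mclk is reported in MHz
    bytes_per_clock_per_channel = 32  # interpretation of the constant 32 (assumption)
    return mclk_ghz * bytes_per_clock_per_channel * num_hbm_channels


# Values taken from the MI100 and MI200 sysinfo.csv rows later in this diff.
mi100_bw = hbm_bandwidth_gbps(1200, 32)  # about 1228.8, matching the old hbm_bw column
mi200_bw = hbm_bandwidth_gbps(1600, 32)  # about 1638.4, matching the old hbm_bw column
```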
supported_call = {
@@ -700,19 +701,80 @@ def eval_metric(dfs, dfs_type, sys_info, raw_pmc_df, debug):
console_error("Halting execution for warning above.")
ammolite__se_per_gpu = int(sys_info.se_per_gpu)
if np.isnan(ammolite__se_per_gpu) or ammolite__se_per_gpu == 0:
console_warning(
"se_per_gpu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__pipes_per_gpu = int(sys_info.pipes_per_gpu)
if np.isnan(ammolite__pipes_per_gpu) or ammolite__pipes_per_gpu == 0:
console_warning(
"pipes_per_gpu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__cu_per_gpu = int(sys_info.cu_per_gpu)
if np.isnan(ammolite__cu_per_gpu) or ammolite__cu_per_gpu == 0:
console_warning(
"cu_per_gpu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__simd_per_cu = int(sys_info.simd_per_cu) # not used
if np.isnan(ammolite__simd_per_cu) or ammolite__simd_per_cu == 0:
console_warning(
"simd_per_cu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__sqc_per_gpu = int(sys_info.sqc_per_gpu)
if np.isnan(ammolite__sqc_per_gpu) or ammolite__sqc_per_gpu == 0:
console_warning(
"sqc_per_gpu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__lds_banks_per_cu = int(sys_info.lds_banks_per_cu)
if np.isnan(ammolite__lds_banks_per_cu) or ammolite__lds_banks_per_cu == 0:
console_warning(
"lds_banks_per_cu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__cur_sclk = float(sys_info.cur_sclk) # not used
ammolite__mclk = float(sys_info.cur_mclk) # not used
if np.isnan(ammolite__cur_sclk) or ammolite__cur_sclk == 0:
console_warning(
"cur_sclk is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__cur_mclk = float(sys_info.cur_mclk) # not used
if np.isnan(ammolite__cur_mclk) or ammolite__cur_mclk == 0:
console_warning(
"cur_mclk is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__max_mclk = float(sys_info.max_mclk)
if np.isnan(ammolite__max_mclk) or ammolite__max_mclk == 0:
console_warning(
"max_mclk is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__max_sclk = float(sys_info.max_sclk)
if np.isnan(ammolite__max_sclk) or ammolite__max_sclk == 0:
console_warning(
"max_sclk is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__max_waves_per_cu = int(sys_info.max_waves_per_cu)
ammolite__hbm_bw = float(sys_info.hbm_bw)
if np.isnan(ammolite__max_waves_per_cu) or ammolite__max_waves_per_cu == 0:
console_warning(
"max_waves_per_cu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__num_hbm_channels = float(sys_info.num_hbm_channels)
if np.isnan(ammolite__num_hbm_channels) or ammolite__num_hbm_channels == 0:
console_warning(
"num_hbm_channels is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__total_l2_chan = calc_builtin_var("$total_l2_chan", sys_info)
if np.isnan(ammolite__total_l2_chan) or ammolite__total_l2_chan == 0:
console_warning(
"total_l2_chan is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__num_xcd = int(sys_info.num_xcd)
if np.isnan(ammolite__num_xcd) or ammolite__num_xcd == 0:
console_warning(
"num_xcd is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__wave_size = int(sys_info.wave_size)
if np.isnan(ammolite__wave_size) or ammolite__wave_size == 0:
console_warning(
"wave_size is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
# TODO: fix all $normUnit in Unit column or title
@@ -751,6 +813,7 @@ def eval_metric(dfs, dfs_type, sys_info, raw_pmc_df, debug):
ammolite__build_in[key] = None
ammolite__numActiveCUs = ammolite__build_in["numActiveCUs"]
ammolite__kernelBusyCycles = ammolite__build_in["kernelBusyCycles"]
ammolite__hbmBandwidth = ammolite__build_in["hbmBandwidth"]
# Hmmm... apply + lambda should just work
# df['Value'] = df['Value'].apply(lambda s: eval(compile(str(s), '<string>', 'eval')))
@@ -821,7 +884,6 @@ def eval_metric(dfs, dfs_type, sys_info, raw_pmc_df, debug):
else:
console_error("analysis", str(ae))
# print("eval_metric", id, expr)
try:
out = eval(compile(row[expr], "<string>", "eval"))
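The warning blocks in `eval_metric` call `int(...)` before `np.isnan(...)`. In Python, `int(float("nan"))` raises `ValueError` rather than returning NaN, so a missing integer field can stop execution before its warning is printed. A minimal sketch of checking first and casting second, assuming `sys_info` can be treated as a plain dict (the real code reads attributes of a pandas row); `read_spec_field` is a hypothetical helper:

```python
import math


def read_spec_field(sys_info: dict, name: str, cast=float, warn=print):
    """Return sys_info[name] converted with `cast`, or None when it is missing, NaN or zero.

    The NaN check runs on a float *before* casting, because
    int(float("nan")) raises ValueError instead of producing NaN.
    """
    raw = sys_info.get(name)
    try:
        value = float(raw)
    except (TypeError, ValueError):
        # Absent (None) or non-numeric values are treated like NaN.
        value = math.nan
    if math.isnan(value) or value == 0:
        warn(
            f"{name} is not available in sysinfo.csv, "
            "please provide the correct value using --specs-correction"
        )
        return None
    return cast(value)
```

Passing `warn=messages.append` instead of `print` makes the warnings easy to collect or route to a logger such as `console_warning`.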
@@ -39,9 +39,9 @@ import pandas as pd
import config
from utils.logger import console_debug, console_error, console_log, console_warning
from utils.mi_gpu_spec import get_gpu_series_dict, get_mi300_chip_id_dict
from utils.mi_gpu_spec import get_chip_id_dict, get_gpu_series_dict, get_num_xcds
from utils.tty import get_table_string
from utils.utils import get_version, total_xcds
from utils.utils import get_version
VERSION_LOC = [
"version",
@@ -72,7 +72,6 @@ def detect_arch(_rocminfo):
def detect_gpu_chip_id(_rocminfo):
gpu_chip_id = None
mi300_chip_id_dict = get_mi300_chip_id_dict().keys()
for idx1, linetext in enumerate(_rocminfo):
# NOTE: current supported socs only have numbers in Chip ID
@@ -84,8 +83,8 @@ def detect_gpu_chip_id(_rocminfo):
if not gpu_chip_id:
console_warning("No Chip ID detected: " + str(gpu_chip_id))
elif (
gpu_chip_id not in mi300_chip_id_dict
and int(gpu_chip_id) not in mi300_chip_id_dict
gpu_chip_id not in get_chip_id_dict().keys()
and int(gpu_chip_id) not in get_chip_id_dict().keys()
):
console_warning("Unknown Chip ID detected: " + str(gpu_chip_id))
return gpu_chip_id
@@ -214,7 +213,7 @@ def generate_machine_specs(args, sysinfo: dict = None):
specs.total_l2_chan: str = total_l2_banks(
specs.gpu_model, int(specs._l2_banks), specs.compute_partition
)
specs.hbm_bw: str = str(int(specs.max_mclk) / 1000 * 32 * specs.get_hbm_channels())
specs.num_hbm_channels: str = str(specs.get_hbm_channels())
return specs
@@ -518,15 +517,6 @@ class MachineSpecs:
"name": "Pipes per GPU",
},
)
hbm_bw: str = field(
default=None,
metadata={
"doc": "The peak theoretical HBM bandwidth for the accelerators/GPUs in the system. On systems with\n"
"configurable partitioning (e.g., MI300), this is the peak theoretical HBM bandwidth for a partition.",
"name": "HBM BW",
"unit": "GB/s",
},
)
num_xcd: str = field(
default=None,
metadata={
@@ -536,14 +526,13 @@ class MachineSpecs:
"unit": "XCDs",
},
)
num_hbm_channels: str = field(
default=None,
metadata={"doc": "Number of HBM channels", "name": "HBM channels"},
)
def get_hbm_channels(self):
# check MI300 has a valid compute partition
mi300a_archs = ["mi300a_a0", "mi300a_a1"]
mi300x_archs = ["mi300x_a0", "mi300x_a1"]
mi308x_archs = ["mi308x"]
if self.gpu_model.lower() in mi300a_archs + mi300x_archs + mi308x_archs:
if self.memory_partition.lower().startswith("nps"):
hbmchannels = 128
if self.memory_partition.lower() == "nps2":
hbmchannels /= 2
@@ -551,10 +540,9 @@ class MachineSpecs:
hbmchannels /= 4
elif self.memory_partition.lower() == "nps8":
hbmchannels /= 8
return int(hbmchannels)
return hbmchannels
else:
hbmchannels = int(self.total_l2_chan)
return hbmchannels
return int(self.total_l2_chan)
def get_class_members(self):
all_populated = True
@@ -581,7 +569,7 @@ class MachineSpecs:
data[name] = value
if not all_populated:
console_error("Missing specs fields for %s" % self.gpu_arch)
console_warning("Missing specs fields for %s" % self.gpu_arch)
return pd.DataFrame(data, index=[0])
def __repr__(self):
@@ -682,7 +670,7 @@ def total_sqc(archname, numCUs, numSEs):
def total_l2_banks(archname, L2Banks, compute_partition):
xcds = total_xcds(archname, compute_partition)
xcds = get_num_xcds(archname, compute_partition)
totalL2Banks = L2Banks * xcds
return totalL2Banks
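`get_hbm_channels` now feeds `num_hbm_channels` rather than a bandwidth. A condensed sketch of the same branching as a free function with hypothetical names. Unlike the method above, it returns an `int` in every branch and falls back to `total_l2_chan` when an MI300-family model reports a non-NPS memory partition; that fallback is a simplifying assumption.

```python
# Model names copied from get_hbm_channels above.
MI300_MODELS = {"mi300a_a0", "mi300a_a1", "mi300x_a0", "mi300x_a1", "mi308x"}
# Divisor per memory partition mode, following the nps2/nps4/nps8 branches.
NPS_DIVISOR = {"nps1": 1, "nps2": 2, "nps4": 4, "nps8": 8}


def hbm_channels(gpu_model: str, memory_partition: str, total_l2_chan: int) -> int:
    """Number of HBM channels visible to one memory partition."""
    model = gpu_model.lower()
    mode = memory_partition.lower()
    if model in MI300_MODELS and mode.startswith("nps"):
        # 128 channels per device, split evenly across NPS partitions.
        return 128 // NPS_DIVISOR.get(mode, 1)
    # Every other model (MI350 included, as written) uses the L2 channel count.
    return int(total_l2_chan)
```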
@@ -43,16 +43,32 @@ import pandas as pd
import config
from utils.logger import console_debug, console_error, console_log, console_warning
from utils.mi_gpu_spec import get_mi300_num_xcds
from utils.mi_gpu_spec import get_num_xcds
rocprof_cmd = ""
rocprof_args = ""
spi_pipe_counter_regexs = [r"SPI_CS\d+_(.*)", r"SPI_CSQ_P\d+_(.*)"]
def is_tcc_channel_counter(counter):
return counter.startswith("TCC") and counter.endswith("]")
def is_spi_pipe_counter(counter):
for pattern in spi_pipe_counter_regexs:
if re.match(pattern, counter):
return True
return False
def get_base_spi_pipe_counter(counter):
for pattern in spi_pipe_counter_regexs:
match = re.match(pattern, counter)
if match:
return match.group(1)
return ""
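The two patterns in `spi_pipe_counter_regexs` recognize pipe-wise SPI counters and strip the pipe prefix. The sketch below reuses those patterns and adds a hypothetical `group_spi_pipe_counters` helper to show how per-pipe counters sharing a base name could be collected together; the counter names in the test are illustrative only.

```python
import re

# Patterns copied from spi_pipe_counter_regexs: SPI_CS<n>_<BASE> and SPI_CSQ_P<n>_<BASE>.
SPI_PIPE_COUNTER_REGEXS = [r"SPI_CS\d+_(.*)", r"SPI_CSQ_P\d+_(.*)"]


def base_spi_pipe_counter(counter: str) -> str:
    """Base name of a pipe-wise SPI counter, or "" if the name matches neither pattern."""
    for pattern in SPI_PIPE_COUNTER_REGEXS:
        match = re.match(pattern, counter)  # re.match anchors at the start of the name
        if match:
            return match.group(1)
    return ""


def group_spi_pipe_counters(counters):
    """Group pipe-wise counters by base name, preserving input order."""
    groups = {}
    for counter in counters:
        base = base_spi_pipe_counter(counter)
        if base:
            groups.setdefault(base, []).append(counter)
    return groups
```

Because `\d+` must match right after `SPI_CS`, a name such as `SPI_CSQ_P0_...` skips the first pattern and is picked up by the second.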
def using_v1():
return "ROCPROF" not in os.environ.keys() or (
@@ -571,12 +587,7 @@ def run_prof(
# set required env var for all models except mi50, mi60, mi210, mi250 and mi250x
new_env = None
if (
mspec.gpu_model.lower() == "mi300x_a0"
or mspec.gpu_model.lower() == "mi300x_a1"
or mspec.gpu_model.lower() == "mi300a_a0"
or mspec.gpu_model.lower() == "mi300a_a1"
):
if mspec.gpu_model.lower() not in ("mi50", "mi60", "mi210", "mi250", "mi250x"):
new_env = os.environ.copy()
new_env["ROCPROFILER_INDIVIDUAL_XCC_MODE"] = "1"
@@ -661,7 +672,7 @@ def run_prof(
if new_env and not using_v3() and not using_v1():
# flatten tcc for applicable mi300 input
f = path(workload_dir + "/out/pmc_1/results_" + fbase + ".csv")
xcds = total_xcds(mspec.gpu_model, mspec.compute_partition)
xcds = get_num_xcds(mspec.gpu_model, mspec.compute_partition)
df = flatten_tcc_info_across_xcds(f, xcds, int(mspec._l2_banks))
df.to_csv(f, index=False)
@@ -1065,62 +1076,6 @@ def flatten_tcc_info_across_xcds(file, xcds, tcc_channel_per_xcd):
return df
def total_xcds(gpu_model, compute_partition):
"""
Returns the number of xcds for a gpu model and compute_partition pair.
"""
# For mi300 chips, return result from mi_gpu_spec
result = get_mi300_num_xcds(gpu_model, compute_partition)
if result:
return result
# For other systems, use manual check
# check MI300 has a valid compute partition
mi300a_model = ["mi300a_a0", "mi300a_a1"]
mi300x_model = ["mi300x_a0", "mi300x_a1"]
mi308x_model = ["mi308x"]
if (
gpu_model.lower() in mi300a_model + mi300x_model + mi308x_model
and compute_partition == "NA"
):
console_error("Invalid compute partition found for {}".format(gpu_model))
if gpu_model.lower() not in mi300a_model + mi300x_model + mi308x_model:
return 1
# from the whitepaper
# https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf
if compute_partition.lower() == "spx":
if gpu_model.lower() in mi300a_model:
return 6
if gpu_model.lower() in mi300x_model:
return 8
if gpu_model.lower() in mi308x_model:
return 4
if compute_partition.lower() == "tpx":
if gpu_model.lower() in mi300a_model:
return 2
if compute_partition.lower() == "dpx":
if gpu_model.lower() in mi300x_model:
return 4
if gpu_model.lower() in mi308x_model:
return 2
if compute_partition.lower() == "qpx":
if gpu_model.lower() in mi300x_model:
return 2
if compute_partition.lower() == "cpx":
if gpu_model.lower() in mi300x_model:
return 1
if gpu_model.lower() in mi308x_model:
return 1
# TODO implement other archs here as needed
console_error(
"Unknown compute partition / arch found for {} / {}".format(
compute_partition, gpu_model
)
)
def get_submodules(package_name):
"""List all submodules for a target package"""
import importlib
@@ -136,7 +136,7 @@ def test_L1_cache_counters(
options,
check_success=False,
roof=False,
app_name=app_name
app_name=app_name,
)
assert return_code == 0
@@ -15,6 +15,7 @@ indirs = [
"tests/workloads/vcopy/MI200",
"tests/workloads/vcopy/MI300A_A1",
"tests/workloads/vcopy/MI300X_A1",
"tests/workloads/vcopy/MI350",
]
@@ -255,9 +256,13 @@ def test_dispatch_5(binary_handler_analyze_rocprof_compute):
@pytest.mark.misc
def test_gpu_ids(binary_handler_analyze_rocprof_compute):
for dir in indirs:
if dir.endswith("MI350"):
gpu_id = "0"
else:
gpu_id = "2"
workload_dir = test_utils.setup_workload_dir(dir)
code = binary_handler_analyze_rocprof_compute(
["analyze", "--path", workload_dir, "--gpu-id", "2"]
["analyze", "--path", workload_dir, "--gpu-id", gpu_id]
)
assert code == 0
@@ -112,6 +112,13 @@ def test_analyze_ipblocks_TCC_MI200(binary_handler_analyze_rocprof_compute):
assert code == 0
def test_analyze_no_roof_MI350(binary_handler_analyze_rocprof_compute):
code = binary_handler_analyze_rocprof_compute(
["analyze", "--path", "tests/workloads/no_roof/MI350"]
)
assert code == 0
def test_analyze_no_roof_MI300X_A1(binary_handler_analyze_rocprof_compute):
code = binary_handler_analyze_rocprof_compute(
["analyze", "--path", "tests/workloads/no_roof/MI300X_A1"]
@@ -14,6 +14,7 @@ import test_utils
# Globals
# TODO: MI350: which GPU models belong to the MI350 series?
SUPPORTED_ARCHS = {
"gfx906": {"mi50": ["MI50", "MI60"]},
"gfx908": {"mi100": ["MI100"]},
@@ -21,12 +22,14 @@ SUPPORTED_ARCHS = {
"gfx940": {"mi300": ["MI300A_A0"]},
"gfx941": {"mi300": ["MI300X_A0"]},
"gfx942": {"mi300": ["MI300A_A1", "MI300X_A1"]},
"gfx950": {"mi350": ["MI350"]},
}
MI300_CHIP_IDS = {
CHIP_IDS = {
"29856": "MI300A_A1",
"29857": "MI300X_A1",
"29858": "MI308X",
"30112": "MI350",
}
@@ -106,6 +109,25 @@ ALL_CSVS_MI300 = sorted(
"timestamps.csv",
]
)
ALL_CSVS_MI350 = sorted(
[
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]
)
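The MI350 branches in the tests spell out the same file lists many times: the SQ level CSVs, `pmc_perf.csv`, one `pmc_perf_<n>.csv` per pass, and `sysinfo.csv`. A hypothetical helper that builds such a list from a pass count, shown only as one possible way to reduce that repetition; the pass counts in the test come from the expected lists in this diff.

```python
# SQ level CSVs that appear in the MI350 expected lists.
SQ_LEVEL_CSVS = [
    "SQ_IFETCH_LEVEL.csv",
    "SQ_INST_LEVEL_LDS.csv",
    "SQ_INST_LEVEL_SMEM.csv",
    "SQ_INST_LEVEL_VMEM.csv",
    "SQ_LEVEL_WAVES.csv",
]


def expected_mi350_csvs(num_passes: int, include_sq_level: bool = False):
    """Sorted CSV names an MI350 run is expected to produce (hypothetical helper)."""
    files = ["pmc_perf.csv", "sysinfo.csv"]
    files += [f"pmc_perf_{i}.csv" for i in range(num_passes)]
    if include_sq_level:
        files += SQ_LEVEL_CSVS
    return sorted(files)
```

With this, `ALL_CSVS_MI350` corresponds to `expected_mi350_csvs(8, include_sq_level=True)`, the TD block list to `expected_mi350_csvs(4)`, and the SPI block list to `expected_mi350_csvs(12)`.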
ROOF_ONLY_FILES = sorted(
[
@@ -290,9 +312,9 @@ def gpu_soc():
## 3) Deduce gpu model name from arch
gpu_model = list(SUPPORTED_ARCHS[gpu_arch].keys())[0].upper()
if gpu_model == "MI300":
if chip_id in MI300_CHIP_IDS:
gpu_model = MI300_CHIP_IDS[chip_id]
if gpu_model not in ("MI50", "MI100", "MI200"):
if chip_id in CHIP_IDS:
gpu_model = CHIP_IDS[chip_id]
return gpu_model
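`gpu_soc()` now refines the series name through `CHIP_IDS` for every series other than MI50, MI100 and MI200, so MI350 is resolved the same way as MI300. A self-contained sketch of that step (step 3 of `gpu_soc`), with the arch table trimmed to the entries visible in this diff; `deduce_gpu_model` is a hypothetical name.

```python
# Entries copied from SUPPORTED_ARCHS and CHIP_IDS in this diff (gfx90a omitted, not shown).
SUPPORTED_ARCHS = {
    "gfx906": {"mi50": ["MI50", "MI60"]},
    "gfx908": {"mi100": ["MI100"]},
    "gfx940": {"mi300": ["MI300A_A0"]},
    "gfx941": {"mi300": ["MI300X_A0"]},
    "gfx942": {"mi300": ["MI300A_A1", "MI300X_A1"]},
    "gfx950": {"mi350": ["MI350"]},
}
CHIP_IDS = {"29856": "MI300A_A1", "29857": "MI300X_A1", "29858": "MI308X", "30112": "MI350"}


def deduce_gpu_model(gpu_arch: str, chip_id: str) -> str:
    # Start from the series name registered for the architecture, e.g. "MI300".
    gpu_model = list(SUPPORTED_ARCHS[gpu_arch].keys())[0].upper()
    # Series without per-chip variants keep the series name; others are refined by chip ID.
    if gpu_model not in ("MI50", "MI100", "MI200"):
        if chip_id in CHIP_IDS:
            gpu_model = CHIP_IDS[chip_id]
    return gpu_model
```

An unknown chip ID leaves the series name unchanged, e.g. `deduce_gpu_model("gfx942", "99999")` returns `"MI300"`.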
@@ -303,6 +325,9 @@ soc = gpu_soc()
if "MI300" in soc:
os.environ["ROCPROF"] = "rocprofv2"
if "MI350" in soc:
os.environ["ROCPROF"] = "rocprofv3"
Baseline_dir = str(Path("tests/workloads/vcopy/" + soc).resolve())
@@ -491,6 +516,8 @@ def test_path(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI200)
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI300)
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("This test is not supported for {}".format(soc))
assert 0
@@ -502,7 +529,7 @@ def test_path(binary_handler_profile_rocprof_compute):
@pytest.mark.misc
def test_roof_kernel_names(binary_handler_profile_rocprof_compute):
if soc == "MI100":
if soc in ("MI100", "MI350"):
# roofline is not supported on MI100 or MI350
assert True
# Do not continue testing
@@ -517,7 +544,7 @@ def test_roof_kernel_names(binary_handler_profile_rocprof_compute):
# assert successful run
assert returncode == 0
file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels)
if soc == "MI200" or "MI300" in soc:
if "MI200" in soc or "MI300" in soc:
assert sorted(list(file_dict.keys())) == sorted(
ROOF_ONLY_FILES + ["kernelName_legend.pdf"]
)
@@ -546,6 +573,8 @@ def test_device_filter(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI200)
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI300)
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -574,6 +603,8 @@ def test_kernel(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI200)
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI300)
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -625,6 +656,24 @@ def test_block_SQ(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -652,6 +701,8 @@ def test_block_SQC(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs.remove("timestamps.csv")
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -684,6 +735,8 @@ def test_block_TA(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs.remove("timestamps.csv")
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -721,6 +774,15 @@ def test_block_TD(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"sysinfo.csv",
]
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -771,6 +833,8 @@ def test_block_TCP(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs.remove("timestamps.csv")
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -825,6 +889,8 @@ def test_block_TCC(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs.remove("timestamps.csv")
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -857,6 +923,23 @@ def test_block_SPI(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"pmc_perf_8.csv",
"pmc_perf_9.csv",
"pmc_perf_10.csv",
"pmc_perf_11.csv",
"sysinfo.csv",
]
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -886,6 +969,19 @@ def test_block_CPC(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -910,6 +1006,8 @@ def test_block_CPF(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs.remove("timestamps.csv")
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
validate(
@@ -959,6 +1057,24 @@ def test_block_SQ_CPC(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -1009,6 +1125,24 @@ def test_block_SQ_TA(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -1055,6 +1189,24 @@ def test_block_SQ_SPI(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -1106,6 +1258,24 @@ def test_block_SQ_SQC_TCP_CPC(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -1171,6 +1341,24 @@ def test_block_SQ_SPI_TA_TCC_CPF(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)
@@ -1196,6 +1384,8 @@ def test_dispatch_0(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -1226,6 +1416,8 @@ def test_dispatch_0_1(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -1253,6 +1445,8 @@ def test_dispatch_2(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -1283,6 +1477,8 @@ def test_join_type_grid(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -1310,6 +1506,8 @@ def test_join_type_kernel(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -1326,7 +1524,7 @@ def test_join_type_kernel(binary_handler_profile_rocprof_compute):
@pytest.mark.sort
def test_roof_sort_dispatches(binary_handler_profile_rocprof_compute):
# only test 1 device for roofline
if soc == "MI100":
if soc in ("MI100", "MI350"):
# roofline is not supported on MI100 or MI350
assert True
# Do not continue testing
@@ -1356,7 +1554,7 @@ def test_roof_sort_dispatches(binary_handler_profile_rocprof_compute):
@pytest.mark.sort
def test_roof_sort_kernels(binary_handler_profile_rocprof_compute):
# only test 1 device for roofline
if soc == "MI100":
if soc in ("MI100", "MI350"):
# roofline is not supported on MI100 or MI350
assert True
# Do not continue testing
@@ -1386,7 +1584,7 @@ def test_roof_sort_kernels(binary_handler_profile_rocprof_compute):
@pytest.mark.mem
def test_roof_mem_levels_vL1D(binary_handler_profile_rocprof_compute):
# only test 1 device for roofline
if soc == "MI100":
if soc in ("MI100", "MI350"):
# roofline is not supported on MI100 or MI350
assert True
# Do not continue testing
@@ -1416,7 +1614,7 @@ def test_roof_mem_levels_vL1D(binary_handler_profile_rocprof_compute):
@pytest.mark.mem
def test_roof_mem_levels_LDS(binary_handler_profile_rocprof_compute):
# only test 1 device for roofline
if soc == "MI100":
if soc in ("MI100", "MI350"):
# roofline is not supported on MI100 or MI350
assert True
# Do not continue testing
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_filter,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:54:18 PM (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:52:12 PM (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1,32
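Workloads profiled before this change carry `hbm_bw` but no `num_hbm_channels` column, as the old and new MI100 headers above show. A sketch of reading the new column defensively with the standard `csv` module, treating its absence as "not available" so the caller can fall back to `--specs-correction`. The two CSV strings are column excerpts of those headers, and `read_num_hbm_channels` is a hypothetical helper.

```python
import csv
import io
from typing import Optional


def read_num_hbm_channels(csv_text: str) -> Optional[float]:
    """num_hbm_channels from the first data row, or None if the column is
    absent (older sysinfo.csv files), empty, or there is no data row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    row = next(reader, None)
    if row is None or not row.get("num_hbm_channels"):
        return None
    return float(row["num_hbm_channels"])


# Column excerpts of the old and new MI100 sysinfo.csv files shown above.
OLD_SYSINFO = "gpu_model,max_mclk,hbm_bw,num_xcd\nMI100,1200,1228.8,1\n"
NEW_SYSINFO = "gpu_model,max_mclk,hbm_bw,num_xcd,num_hbm_channels\nMI100,1200,1228.8,1,32\n"
```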
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_filter,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:35:56 PM (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:16:46 PM (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1,32
@@ -0,0 +1,4 @@
Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
1,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
2,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_filter,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 01:39:25 PM (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
vcopy,tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:09:51 PM (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6,96
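In this fixture, the only header change is the trailing `num_hbm_channels` column. The updated header keeps `hbm_bw` and appends the new field after `num_xcd`. Below is a minimal Python sketch that checks this by comparing the two header rows with the standard `csv` module. The helper names are hypothetical and are not part of rocprofiler-compute.

```python
import csv
import io

# Header rows copied verbatim from the sysinfo.csv fixture hunk (old, then new).
OLD_HEADER = "workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd"
NEW_HEADER = "workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels"


def header_columns(csv_text: str) -> list[str]:
    """Return the column names from the first row of a CSV string."""
    return next(csv.reader(io.StringIO(csv_text)))


def schema_change(old_text: str, new_text: str) -> tuple[list[str], list[str], bool]:
    """Return (added, removed, append_only) between two CSV headers."""
    old_cols = header_columns(old_text)
    new_cols = header_columns(new_text)
    added = [c for c in new_cols if c not in old_cols]
    removed = [c for c in old_cols if c not in new_cols]
    # append_only: the old columns form an unchanged prefix of the new ones,
    # so code that reads the old columns by position sees the same indices.
    append_only = new_cols[: len(old_cols)] == old_cols
    return added, removed, append_only


if __name__ == "__main__":
    print(schema_change(OLD_HEADER, NEW_HEADER))  # (['num_hbm_channels'], [], True)
```

Because the change only appends a column, existing columns keep their positions. Readers that look up fields by name only need to handle the case where the new column is absent from older files.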
@@ -0,0 +1,4 @@
Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
1,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
2,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
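In the dispatch fixtures, `Kernel_Name` is quoted because the demangled signature itself contains commas. The file therefore has to be read with a CSV parser rather than split on `,`. The minimal Python sketch below uses the standard `csv` module; the helper name is hypothetical.

```python
import csv
import io

# Dispatch table copied from the fixture above.
DISPATCH_CSV = """Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
1,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
2,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
"""


def load_dispatches(csv_text: str) -> list[dict]:
    """Parse the dispatch table, converting the numeric columns to int."""
    dispatches = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        dispatches.append(
            {
                "Dispatch_ID": int(rec["Dispatch_ID"]),
                "Kernel_Name": rec["Kernel_Name"],
                "GPU_ID": int(rec["GPU_ID"]),
            }
        )
    return dispatches


if __name__ == "__main__":
    rows = load_dispatches(DISPATCH_CSV)
    print(len(rows), rows[0]["Kernel_Name"])
    # A naive split breaks the quoted kernel signature into extra fields.
    print(len(DISPATCH_CSV.splitlines()[1].split(",")))  # 7
```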
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_filter,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 12:03:10 PM (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
vcopy,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:19:39 PM (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8,128
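Once `num_hbm_channels` is recorded, a peak HBM bandwidth figure can be derived from it and `max_mclk`. The sketch below rests on three assumptions inferred from the fixture values rather than taken from the tool's source: 32 bytes per channel per memory clock, `max_mclk` in MHz, and a result in GB/s.

Under those assumptions, the formula reproduces the `hbm_bw` recorded for the MI100 (1228.8), MI200 (1638.4) and MI300X (5324.8) rows. For the MI300A row it gives 3993.6, but that row records 5324.8. So for that part, the per-channel width, the channel count, or the stored value differs from this model.

```python
import math

# Assumption: each HBM channel moves 32 bytes per memory clock.
BYTES_PER_CHANNEL_PER_CLOCK = 32


def peak_hbm_bw_gbps(max_mclk_mhz: int, num_hbm_channels: int) -> float:
    """Peak HBM bandwidth in GB/s under the 32-byte-per-channel assumption."""
    # MHz * bytes per clock * channels = MB/s; divide by 1000 for GB/s.
    return max_mclk_mhz * num_hbm_channels * BYTES_PER_CHANNEL_PER_CLOCK / 1000


# (gpu_model, max_mclk, num_hbm_channels, recorded hbm_bw) from the fixtures.
FIXTURE_ROWS = [
    ("MI100", 1200, 32, 1228.8),
    ("MI200", 1600, 32, 1638.4),
    ("MI300X_A1", 1300, 128, 5324.8),
    ("MI300A_A1", 1300, 96, 5324.8),
]

if __name__ == "__main__":
    for model, mclk, channels, recorded in FIXTURE_ROWS:
        derived = peak_hbm_bw_gbps(mclk, channels)
        status = "match" if math.isclose(derived, recorded) else "MISMATCH"
        print(f"{model:10s} derived={derived:8.1f} recorded={recorded:8.1f} {status}")
```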
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_inv_int,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:53:52 PM (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:52:12 PM (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1,32
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_inv_int,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:33:56 PM (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:16:46 PM (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1,32
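The `ip_blocks` column stores the profiled blocks as a single `|`-separated value. The MI200 fixtures above also carry a `roofline` entry that the other fixtures lack. Below is a small sketch, using a hypothetical helper, for turning that field into a list, for example to check whether a workload's `ip_blocks` includes `roofline`:

```python
def parse_ip_blocks(field: str) -> list[str]:
    """Split a '|'-separated ip_blocks value, dropping empty entries."""
    return [block for block in field.split("|") if block]


# ip_blocks value copied from the MI200 (gfx90a) fixture above.
MI200_IP_BLOCKS = "SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline"

if __name__ == "__main__":
    blocks = parse_ip_blocks(MI200_IP_BLOCKS)
    print(len(blocks), "roofline" in blocks)  # 11 True
```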
@@ -0,0 +1,4 @@
Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
1,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
2,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
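Each fixture file in this diff is introduced by a unified-diff hunk header.

- `@@ -1,2 +1,2 @@` replaces a two-line sysinfo.csv (header plus one row) with a new two-line version.
- `@@ -0,0 +1,4 @@` marks a newly added four-line file: a header plus three dispatch rows.

Below is a short Python sketch for reading those ranges; the function name is hypothetical. In the unified format, an omitted line count means 1.

```python
import re

# old_start[,old_count] and new_start[,new_count] between the @@ markers.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    m = HUNK_RE.match(line)
    if m is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return (
        int(old_start),
        int(old_count) if old_count is not None else 1,
        int(new_start),
        int(new_count) if new_count is not None else 1,
    )


if __name__ == "__main__":
    print(parse_hunk_header("@@ -0,0 +1,4 @@"))  # (0, 0, 1, 4): new 4-line file
    print(parse_hunk_header("@@ -1,2 +1,2 @@"))  # (1, 2, 1, 2): 2 lines replaced
```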
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_inv_int,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 01:38:17 PM (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
vcopy,tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:09:51 PM (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6,96
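A baseline comparison can pair a workload recorded with the old header, which has no `num_hbm_channels` column, against one recorded with the new header. The commit notes that the analysis phase now warns users to use `--specs-correction` when sysinfo.csv values are missing. The sketch below only illustrates that kind of check:

- The function name, the chosen columns, and the warning text are hypothetical.
- The CSV samples are trimmed to a few columns from the MI300A fixtures.
- Empty strings count as missing, since some fixture fields (for example `amd_gpu_kernel_version`) are blank.

```python
import csv
import io
import warnings

# Hypothetical selection of columns an analysis step might depend on.
REQUIRED_COLUMNS = ("max_mclk", "num_hbm_channels")

# Trimmed samples based on the old and new MI300A sysinfo.csv fixtures.
OLD_SYSINFO = "gpu_model,max_mclk,hbm_bw,num_xcd\nMI300A_A1,1300,5324.8,6\n"
NEW_SYSINFO = "gpu_model,max_mclk,hbm_bw,num_xcd,num_hbm_channels\nMI300A_A1,1300,5324.8,6,96\n"


def missing_sysinfo_values(csv_text: str, required=REQUIRED_COLUMNS) -> list[str]:
    """Return required columns that are absent or blank in the first data row."""
    row = next(csv.DictReader(io.StringIO(csv_text)))
    # row.get returns None for a column the header does not have at all.
    missing = [col for col in required if not (row.get(col) or "").strip()]
    for col in missing:
        # Flag name taken from the commit message; its argument syntax is not shown here.
        warnings.warn(
            f"sysinfo.csv has no value for '{col}'; consider supplying it via --specs-correction",
            stacklevel=2,
        )
    return missing


if __name__ == "__main__":
    print(missing_sysinfo_values(OLD_SYSINFO))  # ['num_hbm_channels']
    print(missing_sysinfo_values(NEW_SYSINFO))  # []
```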
@@ -0,0 +1,4 @@
Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
1,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
2,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_inv_int,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 12:02:25 PM (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
vcopy,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:19:39 PM (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8,128
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
dispatch_0,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:53:14 PM (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:52:12 PM (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1,32
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
dispatch_0,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:24:01 PM (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:16:46 PM (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1,32
@@ -0,0 +1,4 @@
Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
1,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
2,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
dispatch_0,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 01:36:42 PM (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
vcopy,tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:09:51 PM (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6,96
@@ -0,0 +1,4 @@
Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
1,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
2,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
dispatch_0,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 12:01:22 PM (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
vcopy,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:19:39 PM (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8,128

Some files were not shown because too many files have changed in this diff.