Support MI 350 profiling (#632)
* Add MI 350 hardware information
* Refactor MI GPU YAML file and corresponding interface
* Add SoC file for gfx950 architecture
* Add analysis report configs for MI 350 containing existing metrics
* Add placeholder None valued metrics for previous architectures to make
  baseline comparison work
* Enable testing on MI 350
* Analysis config metric changes
  - SPI changes
    - Update metric formula for default SPI pipe counter
    - Use efficiently collected pipe wise SPI counters
    - Add SPI Wave Occupancy
    - Add Scheduler-Pipe Wave Utilization
    - Update formula for VGPR Writes
    - Add Scheduler-Pipe FIFO Full Rate
  - CPC changes
    - Add CPC SYNC FIFO Full Rate
    - Add CPC CANE Stall Rate
    - Add CPC ADC Utilization
  - SQ changes
    - Add VALU co-issue efficiency
    - Add F6F4 datatype metrics
    - Update formula for total FLOPs by adding F6F4 counters
    - Add LDS STORE / LOAD / ATOMIC metrics
    - Add LDS STORE / LOAD / ATOMIC bandwidth
    - Add LDS FIFO and TA ADDR / CMD / DATA FIFO full rates
* Collect TCP_TCP_LATENCY_sum only for gfx950 (MI 350)
* Do not inject SQ_ACCUM_PREV_HIRES unnecessarily
* Do not hardcode memory and shader clock speeds
* Write num_hbm_channels to sysinfo.csv instead of hbm_bw while profiling
* Move sysinfo.csv generation to the pre-processing step of profiling
* Add warnings to use --specs-correction for missing sysinfo.csv values during analysis phase
* Update CHANGELOG
* Analysis phase warning to use --specs-correction when needed
[ROCm/rocprofiler-compute commit: f9aa7be97c]
Committed by GitHub
Parent: 1273a5e2a9
Commit: 27585a8a2b
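The commit message mentions placeholder None-valued metrics added to older architectures "to make baseline comparison work". A minimal sketch of why that helps, assuming metric tables are plain dicts keyed by metric name and that the comparison indexes both tables by the same keys; `align_metrics` and `percent_change` are hypothetical helpers, not rocprofiler-compute's actual comparison code, and the values are invented.

```python
from typing import Dict, List, Optional, Tuple

Metric = Optional[float]  # None marks "no counter on this architecture"


def align_metrics(
    baseline: Dict[str, Metric], current: Dict[str, Metric]
) -> List[Tuple[str, Metric, Metric]]:
    """Pair metrics by name for a side-by-side comparison.

    Direct indexing stands in for comparison logic that assumes every metric
    exists in both configs: a missing key raises KeyError, a None placeholder
    does not.
    """
    return [(name, baseline[name], current[name]) for name in current]


def percent_change(base: Metric, cur: Metric) -> Metric:
    """Relative change in percent, or None when either side has no value."""
    if base is None or cur is None or base == 0:
        return None
    return 100.0 * (cur - base) / base


# Hypothetical values: the older-architecture config carries a None placeholder
# for the MI 350-only metric, so both tables have the same keys.
older = {"VMEM Utilization": 40.0, "VALU Co-Issue Efficiency": None}
mi350 = {"VMEM Utilization": 50.0, "VALU Co-Issue Efficiency": 12.5}

for name, base, cur in align_metrics(older, mi350):
    print(name, percent_change(base, cur))
```

With the placeholder present, the loop prints 25.0 for the shared metric and None for the MI 350-only one; dropping the placeholder key from `older` makes `align_metrics` raise `KeyError` instead.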
@@ -22,17 +22,33 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.

* Support host-trap PC Sampling on CLI (beta version)

* Add support for tuned performance counters for gfx950 GPUs
* Add L1 latencies
* Add L2 latencies
* Add L2 to EA stalls
* Add L2 to EA stalls per channel
* Support for AMD Instinct MI350 series GPUs with the addition of the following counters:
  * VALU co-issue efficiency (two VALU instructions issued together)
  * Shader Processor Input (SPI) Wave Occupancy
  * Scheduler-Pipe Wave Utilization
  * Scheduler FIFO Full Rate
  * CPC ADC Utilization
  * F6F4 datatype metrics
  * Update formula for total FLOPs while taking into account F6F4 ops
  * LDS STORE, LDS LOAD, LDS ATOMIC instruction count metrics
  * LDS STORE, LDS LOAD, LDS ATOMIC bandwidth metrics
  * LDS FIFO full rate
  * Sequencer -> TA ADDR Stall rates
  * Sequencer -> TA CMD Stall rates
  * Sequencer -> TA DATA Stall rates
  * L1 latencies
  * L2 latencies
  * L2 to EA stalls
  * L2 to EA stalls per channel

### Changed

* Change normal_unit default to per_kernel
* Change dependency from rocm-smi to amd-smi
* Decrease profiling time by not collecting counters not used in post analysis
* Update definition of the following metrics for MI 350:
  * VGPR Writes
  * Total FLOPs (consider fp6 and fp4 ops)

### Resolved issues

@@ -44,6 +60,14 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.

* GPU ID filtering is not supported when using rocprof v3

* Analysis of previously collected workload data will not work due to a `sysinfo.csv` schema change
  * As a workaround, run the profiling operation again for the workload and interrupt the process after ten seconds.
    Then copy the `sysinfo.csv` file from the new data folder to the old one.
    This assumes your system specification hasn't changed since the creation of the previous workload data.

* Analysis of new workloads might require providing shader/memory clock speed using
  the `--specs-correction` option if `amd-smi` or `rocminfo` does not provide clock speeds.

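The copy step of the known-issue workaround above can be scripted once the short re-profiling run has produced a fresh `sysinfo.csv`. A minimal sketch, assuming both workload folders are local directories that each hold a `sysinfo.csv`; `copy_sysinfo` is a hypothetical helper, the example paths are illustrative, and it keeps a `.bak` copy of the old file rather than overwriting it silently.

```python
import shutil
from pathlib import Path


def copy_sysinfo(new_workload: Path, old_workload: Path) -> Path:
    """Copy sysinfo.csv from a fresh profiling run into an older workload folder.

    Any existing sysinfo.csv in the old folder is preserved as sysinfo.csv.bak.
    Only meaningful if the system specification has not changed between runs.
    """
    src = new_workload / "sysinfo.csv"
    dst = old_workload / "sysinfo.csv"
    if not src.is_file():
        raise FileNotFoundError(f"no sysinfo.csv in {new_workload}")
    if dst.exists():
        # Keep the old-schema file in case the copy needs to be undone.
        shutil.copy2(dst, dst.with_name("sysinfo.csv.bak"))
    shutil.copy2(src, dst)
    return dst


# Usage (hypothetical paths):
# copy_sysinfo(Path("workloads/app_new/MI300"), Path("workloads/app_old/MI300"))
```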
## ROCm Compute Profiler 3.1.0 for ROCm 6.4.0

### Added

@@ -292,9 +292,8 @@ add_test(
add_test(
NAME test_L1_cache_counters
COMMAND
${Python3_EXECUTABLE} -m pytest -m L1_cache
--junitxml=tests/test_TCP_counters.xml ${COV_OPTION}
${PROJECT_SOURCE_DIR}/tests/test_TCP_counters.py
${Python3_EXECUTABLE} -m pytest -m L1_cache --junitxml=tests/test_TCP_counters.xml
${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_TCP_counters.py
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})

# ---------

@@ -673,7 +673,7 @@ Examples:
"--specs-correction",
type=str,
metavar="",
help="\t\tSpecify the specs to correct.",
help="\t\tSpecify the specs to correct. e.g. --specs-correction='specname1:specvalue1,specname2:specvalue2'",
)
analyze_advanced_group.add_argument(
"--list-nodes",

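The updated help text documents the `--specs-correction` value as comma-separated `specname:specvalue` pairs. A minimal sketch of parsing that string into a dict, assuming names contain no `:` and neither side contains a `,`; `parse_specs_correction` is a hypothetical helper rather than the tool's own parser, and the spec names in the example (`max_sclk` from the configs, `num_hbm_channels` from the commit message) are illustrative, not a confirmed list of accepted keys.

```python
from typing import Dict


def parse_specs_correction(arg: str) -> Dict[str, str]:
    """Split 'name1:value1,name2:value2' into {'name1': 'value1', 'name2': 'value2'}."""
    specs: Dict[str, str] = {}
    for pair in arg.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma
        name, sep, value = pair.partition(":")
        if not sep or not name.strip() or not value.strip():
            raise ValueError(f"expected specname:specvalue, got {pair!r}")
        specs[name.strip()] = value.strip()
    return specs


print(parse_specs_correction("max_sclk:2100,num_hbm_channels:128"))
# {'max_sclk': '2100', 'num_hbm_channels': '128'}
```

Values stay strings here; converting them to numbers is left to whatever consumes each spec.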
@@ -107,7 +107,6 @@ class webui_analysis(OmniAnalyze_Base):
console_debug("analysis", "gui normalization is %s" % norm_filt)

base_data = self.initalize_runs() # Re-initalizes everything
hbm_bw = base_data[base_run].sys_info["hbm_bw"][0]
panel_configs = copy.deepcopy(arch_configs.panel_configs)
# Generate original raw df
base_data[base_run].raw_pmc = file_io.create_df_pmc(
@@ -231,7 +230,6 @@ class webui_analysis(OmniAnalyze_Base):
norm_filt=norm_filt,
comparable_columns=comparable_columns,
decimal=self.get_args().decimal,
hbm_bw=base_data[base_run].sys_info["hbm_bw"][0],
)

# Update content for this section
@@ -358,7 +356,6 @@ def determine_chart_type(
norm_filt,
comparable_columns,
decimal,
hbm_bw,
):
content = []

@@ -372,9 +369,7 @@ def determine_chart_type(
# Determine chart type:
# a) Barchart
if table_config["id"] in [x for i in barchart_elements.values() for x in i]:
d_figs = build_bar_chart(
display_df, table_config, barchart_elements, norm_filt, hbm_bw
)
d_figs = build_bar_chart(display_df, table_config, barchart_elements, norm_filt)
# Smaller formatting if barchart yeilds several graphs
if (
len(d_figs)

@@ -311,6 +311,21 @@ class RocProfCompute_Base:
if self.__args.name.find(".") != -1 or self.__args.name.find("-") != -1:
console_error("'-' and '.' are not permitted in -n/--name")

gen_sysinfo(
workload_name=self.__args.name,
workload_dir=self.get_args().path,
ip_blocks=[
name
for name, type in self.__args.filter_blocks.items()
if type == "hardware_block"
],
app_cmd=self.__args.remaining,
skip_roof=self.__args.no_roof,
roof_only=self.__args.roof_only,
mspec=self._soc._mspec,
soc=self._soc,
)

@abstractmethod
def run_profiling(self, version: str, prog: str):
"""Run profiling."""
@@ -446,21 +461,6 @@ class RocProfCompute_Base:
"performing post-processing using %s profiler" % self.__profiler,
)

gen_sysinfo(
workload_name=self.__args.name,
workload_dir=self.get_args().path,
ip_blocks=[
name
for name, type in self.__args.filter_blocks.items()
if type == "hardware_block"
],
app_cmd=self.__args.remaining,
skip_roof=self.__args.no_roof,
roof_only=self.__args.roof_only,
mspec=self._soc._mspec,
soc=self._soc,
)


def test_df_column_equality(df):
return df.eq(df.iloc[:, 0], axis=0).all(1).all()

+11
-4
@@ -62,6 +62,13 @@ Panel Config:
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: None # No perf counter
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (Int8):
value: None # No perf counter
unit: GOPs
@@ -179,17 +186,17 @@ Panel Config:
value: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA_RDREQ_LEVEL_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum

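The L2-Fabric bandwidth expressions in the hunk above weight each request by its size: reads count `TCC_EA_RDREQ_32B_sum` requests as 32 bytes and the remaining reads as 64 bytes, writes count `TCC_EA_WRREQ_64B_sum` as 64 bytes and the remaining writes as 32 bytes, and `pop` divides the averaged result by the peak (`$hbmBandwidth` after this change). A minimal Python sketch of the same arithmetic, assuming one dict of counters per kernel and timestamps in nanoseconds (so bytes per ns equals GB/s); the function names and sample counters are invented for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List

Row = Dict[str, float]  # one kernel's counters and timestamps


def l2_fabric_read_bw(r: Row) -> float:
    """(32B requests * 32 + remaining requests * 64) bytes over the kernel duration."""
    nbytes = (r["TCC_EA_RDREQ_32B_sum"] * 32
              + (r["TCC_EA_RDREQ_sum"] - r["TCC_EA_RDREQ_32B_sum"]) * 64)
    return nbytes / (r["End_Timestamp"] - r["Start_Timestamp"])


def l2_fabric_write_bw(r: Row) -> float:
    """(64B requests * 64 + remaining requests * 32) bytes over the kernel duration."""
    nbytes = (r["TCC_EA_WRREQ_64B_sum"] * 64
              + (r["TCC_EA_WRREQ_sum"] - r["TCC_EA_WRREQ_64B_sum"]) * 32)
    return nbytes / (r["End_Timestamp"] - r["Start_Timestamp"])


def pct_of_peak(rows: List[Row], bw: Callable[[Row], float], hbm_bandwidth: float) -> float:
    """The pop expression: 100 * AVG(per-kernel bandwidth) / peak bandwidth."""
    return 100 * mean(bw(r) for r in rows) / hbm_bandwidth


# Made-up counters for a single kernel that ran for 1000 ns.
rows = [{
    "TCC_EA_RDREQ_sum": 1000, "TCC_EA_RDREQ_32B_sum": 200,
    "TCC_EA_WRREQ_sum": 500, "TCC_EA_WRREQ_64B_sum": 100,
    "Start_Timestamp": 0, "End_Timestamp": 1000,
}]
print(l2_fabric_read_bw(rows[0]), l2_fabric_write_bw(rows[0]))  # 57.6 19.2
```

Note the asymmetry the configs encode: the read side singles out 32-byte requests, while the write side singles out 64-byte requests.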
+18
@@ -19,6 +19,24 @@ Panel Config:
unit: Unit
tips: Tips
metric:
CPC SYNC FIFO Full Rate:
avg: None
min: None
max: None
unit: pct
tips:
CPC CANE Stall Rate:
avg: None
min: None
max: None
unit: pct
tips:
CPC ADC Utilization:
avg: None
min: None
max: None
unit: pct
tips:
CPF Utilization:
avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))

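The CPF Utilization expression above shows the guard these configs use against division by zero: a kernel's ratio becomes `None` when `CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE` is 0. A sketch of the same expression in Python, assuming the `AVG` aggregate skips `None` entries (the way a pandas mean skips missing values); that skipping behaviour is an assumption about the evaluator, not something this diff shows, and the cycle counts are invented.

```python
from typing import Iterable, List, Optional


def cpf_utilization(busy: float, idle: float) -> Optional[float]:
    """100 * BUSY / (BUSY + IDLE), or None when the kernel recorded no cycles."""
    total = busy + idle
    return (100 * busy) / total if total != 0 else None


def avg_skipping_none(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average of the non-None values; None if nothing is left (assumed AVG semantics)."""
    kept: List[float] = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else None


# Hypothetical per-kernel (busy, idle) cycle counts; the last kernel has none.
kernels = [(300, 100), (50, 150), (0, 0)]
per_kernel = [cpf_utilization(b, i) for b, i in kernels]
print(per_kernel, avg_skipping_none(per_kernel))
```

The same `... if (denominator != 0) else None` shape appears in the Scheduler-Pipe Stall Rate and L2-Fabric Read Latency expressions in this commit.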
+21
@@ -19,6 +19,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Schedule-Pipe Wave Occupancy:
avg: None
min: None
max: None
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe Wave Utilization:
avg: None
min: None
max: None
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
0) else None)
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe FIFO Full Rate:
avg: None
min: None
max: None
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))

+14
@@ -181,6 +181,13 @@ Panel Config:
max: MAX((TA_FLAT_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
Spill/Stack Coalesceable Instr:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
Global/Generic Coalesceable Instr:
avg: None # No perf counter
min: None # No perf counter
@@ -283,3 +290,10 @@ Panel Config:
max: None # No HW module
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
MFMA-F6F4:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
+21
@@ -61,6 +61,13 @@ Panel Config:
peak: None
pop: None
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: None # No perf counter
unit: None
@@ -109,6 +116,13 @@ Panel Config:
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
VALU Co-Issue Efficiency:
avg: None
min: None
max: None
unit: pct
tips:
VMEM Utilization:
avg: None # No HW module
min: None # No HW module
@@ -210,6 +224,13 @@ Panel Config:
max: None # No perf counter
unit: (OPs + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
F6F4 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: None # No perf counter
min: None # No perf counter

+56
@@ -55,6 +55,48 @@ Panel Config:
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS LOAD:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS STORE:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS ATOMIC:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS LOAD Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS STORE Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS ATOMIC Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
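The LDS Theoretical Bandwidth expression above counts LDS-active cycles that were not lost to bank conflicts, scales them by 4 (presumably bytes per bank per cycle) and by the number of LDS banks per CU, then normalizes by `$denom`. A sketch of that arithmetic, assuming a single constant `denom` for all kernels (in the tool it depends on the chosen normalization unit) and treating the 4 as given by the config; the function names, the bank count of 32 and the counter values are all hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple, Union


def lds_theoretical_bw(idx_active: float, bank_conflict: float,
                       lds_banks_per_cu: Union[str, int], denom: float) -> float:
    """One kernel: ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4 * banks) / denom."""
    conflict_free_cycles = idx_active - bank_conflict
    # int() plays the role of TO_INT in the config expression.
    return (conflict_free_cycles * 4 * int(lds_banks_per_cu)) / denom


def avg_lds_theoretical_bw(kernels: Sequence[Tuple[float, float]],
                           lds_banks_per_cu: Union[str, int], denom: float) -> float:
    """AVG over kernels of the per-kernel value, as in the panel config."""
    return mean(lds_theoretical_bw(a, c, lds_banks_per_cu, denom) for a, c in kernels)


# Hypothetical (SQ_LDS_IDX_ACTIVE, SQ_LDS_BANK_CONFLICT) pairs, 32 banks, denom = 1.
print(avg_lds_theoretical_bw([(1000, 200), (600, 100)], "32", 1))
```

Subtracting bank-conflict cycles before scaling is what makes this a conflict-free upper bound rather than a measured rate.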
@@ -116,3 +158,17 @@ Panel Config:
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
unit: (Accesses + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS Command FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS Data FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:

+21
@@ -43,6 +43,27 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Address Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Command Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Data Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))

@@ -40,6 +40,10 @@ Panel Config:
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
HBM Bandwidth:
value: $hbmBandwidth
unit: GB/s
tips:

- metric_table:
id: 1702

+11
-4
@@ -62,6 +62,13 @@ Panel Config:
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: None # No perf counter
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (Int8):
value: None # No perf counter
unit: GOPs
@@ -179,17 +186,17 @@ Panel Config:
value: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA_RDREQ_LEVEL_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum

+21
@@ -19,6 +19,27 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
CPC SYNC FIFO Full Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
CPC CANE Stall Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
CPC ADC Utilization:
avg: None
min: None
max: None
unit: pct
tips:
CPF Utilization:
avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))

+21
@@ -19,6 +19,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Schedule-Pipe Wave Occupancy:
avg: None
min: None
max: None
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe Wave Utilization:
avg: None
min: None
max: None
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
0) else None)
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe FIFO Full Rate:
avg: None
min: None
max: None
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))

+14
@@ -181,6 +181,13 @@ Panel Config:
max: MAX((TA_FLAT_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
Spill/Stack Coalesceable Instr:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
Global/Generic Read:
avg: AVG((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
min: MIN((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
@@ -271,3 +278,10 @@ Panel Config:
max: None # No HW module
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
MFMA-F6F4:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:

+21
@@ -61,6 +61,13 @@ Panel Config:
peak: None
pop: None
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: None # No perf counter
unit: None
@@ -109,6 +116,13 @@ Panel Config:
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
VALU Co-Issue Efficiency:
avg: None
min: None
max: None
unit: pct
tips:
VMEM Utilization:
avg: None # No HW module
min: None # No HW module
@@ -210,6 +224,13 @@ Panel Config:
max: None # No perf counter
unit: (OPs + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
F6F4 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: None # No perf counter
min: None # No perf counter

+56
@@ -55,6 +55,48 @@ Panel Config:
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS LOAD:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS STORE:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS ATOMIC:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS LOAD Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS STORE Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS ATOMIC Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
@@ -116,3 +158,17 @@ Panel Config:
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
unit: (Accesses + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS Command FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
LDS Data FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:

+21
@@ -43,6 +43,27 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Address Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Command Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Sequencer → TA Data Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))

@@ -40,6 +40,10 @@ Panel Config:
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
HBM Bandwidth:
value: $hbmBandwidth
unit: GB/s
tips:

- metric_table:
id: 1702

+11
-4
@@ -76,6 +76,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (Int8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
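The MFMA F64 `pop` in the hunk above divides the achieved rate, `SQ_INSTS_VALU_MFMA_MOPS_F64 * 512` over the kernel duration, by the peak `(($max_sclk * $cu_per_gpu) * 256) / 1000`. Assuming timestamps in nanoseconds and `$max_sclk` in MHz, both sides come out in GFLOP/s: FLOPs per ns is GFLOP/s, and MHz times FLOPs per cycle per CU times CUs is MFLOP/s, which the `/ 1000` turns into GFLOP/s. The constants 512 and 256 are taken from the config as-is. A sketch under those assumptions; the function names are hypothetical and the clock and CU count are made-up inputs, not the specification of any particular GPU.

```python
def mfma_f64_gflops(mops_f64: float, start_ns: float, end_ns: float) -> float:
    """SQ_INSTS_VALU_MFMA_MOPS_F64 * 512 FLOPs over the kernel duration (FLOP/ns = GFLOP/s)."""
    return (mops_f64 * 512) / (end_ns - start_ns)


def mfma_f64_peak_gflops(max_sclk_mhz: float, cu_per_gpu: int) -> float:
    """MHz * CUs * 256 FLOPs/cycle/CU is MFLOP/s; dividing by 1000 gives GFLOP/s."""
    return ((max_sclk_mhz * cu_per_gpu) * 256) / 1000


def mfma_f64_pop(mops_f64: float, start_ns: float, end_ns: float,
                 max_sclk_mhz: float, cu_per_gpu: int) -> float:
    """Percent of peak for a single kernel (AVG over one kernel is the value itself)."""
    achieved = mfma_f64_gflops(mops_f64, start_ns, end_ns)
    return 100 * achieved / mfma_f64_peak_gflops(max_sclk_mhz, cu_per_gpu)


# Made-up inputs: a 1 ms kernel, a 1700 MHz shader clock and 220 CUs.
print(mfma_f64_peak_gflops(1700, 220))  # 95744.0
print(mfma_f64_pop(18_700_000, 0, 1_000_000, 1700, 220))
```

The MFMA IOPs (Int8) value in the same hunk follows the same pattern with `SQ_INSTS_VALU_MFMA_MOPS_I8` in place of the F64 counter.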
@@ -197,17 +204,17 @@ Panel Config:
value: AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA_RDREQ_32B_sum * 32) + ((TCC_EA_RDREQ_sum - TCC_EA_RDREQ_32B_sum)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 64)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA_WRREQ_64B_sum * 64) + ((TCC_EA_WRREQ_sum - TCC_EA_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA_RDREQ_LEVEL_sum / TCC_EA_RDREQ_sum) if (TCC_EA_RDREQ_sum

+18
@@ -19,6 +19,24 @@ Panel Config:
unit: Unit
tips: Tips
metric:
CPC SYNC FIFO Full Rate:
avg: None
min: None
max: None
unit: pct
tips:
CPC CANE Stall Rate:
avg: None
min: None
max: None
unit: pct
tips:
CPC ADC Utilization:
avg: None
min: None
max: None
unit: pct
tips:
CPF Utilization:
avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))

+21
@@ -19,6 +19,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Schedule-Pipe Wave Occupancy:
avg: None
min: None
max: None
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe Wave Utilization:
avg: None
min: None
max: None
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
0) else None)
unit: Pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
Scheduler-Pipe FIFO Full Rate:
avg: None
min: None
max: None
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))

+14
@@ -181,6 +181,13 @@ Panel Config:
max: MAX((TA_FLAT_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then remove this
Spill/Stack Coalesceable Instr:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
Global/Generic Read:
avg: AVG((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
min: MIN((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
@@ -271,3 +278,10 @@ Panel Config:
max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
MFMA-F6F4:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:

+21
@@ -75,6 +75,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -124,6 +131,13 @@ Panel Config:
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
VALU Co-Issue Efficiency:
avg: None
min: None
max: None
unit: pct
tips:
VMEM Utilization:
avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
@@ -264,6 +278,13 @@ Panel Config:
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
unit: (OPs + $normUnit)
tips:
# TODO: Fix baseline comparision logic to handle non existent metrics, then
F6F4 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))

+56
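Counts in these rows are divided by `$denom`, which depends on the normalization unit chosen for the report. That unit also appears as `$normUnit` in the `unit` field. The sketch below uses three denominators for illustration: per kernel, per wave via `SQ_WAVES`, and per second via nanosecond timestamps. This mapping is an assumption, not a copy of the tool's definitions.

```python
from typing import Callable, Dict

Counters = Dict[str, float]


# Hypothetical $denom choices; the profiler's real definitions may differ.
def per_kernel(_d: Counters) -> float:
    return 1.0


def per_wave(d: Counters) -> float:
    return d["SQ_WAVES"]


def per_second(d: Counters) -> float:
    # Assumes nanosecond timestamps.
    return (d["End_Timestamp"] - d["Start_Timestamp"]) / 1e9


DENOMS: Dict[str, Callable[[Counters], float]] = {
    "per_kernel": per_kernel,
    "per_wave": per_wave,
    "per_second": per_second,
}


def int8_ops(d: Counters, norm: str) -> float:
    """(SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom for one dispatch."""
    return d["SQ_INSTS_VALU_MFMA_MOPS_I8"] * 512 / DENOMS[norm](d)


dispatch = {
    "SQ_INSTS_VALU_MFMA_MOPS_I8": 10,
    "SQ_WAVES": 4,
    "Start_Timestamp": 0,
    "End_Timestamp": 2_000_000_000,
}
for norm in DENOMS:
    print(norm, int8_ops(dispatch, norm))
```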
@@ -55,6 +55,48 @@ Panel Config:
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS LOAD:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS STORE:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS ATOMIC:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS LOAD Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS STORE Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS ATOMIC Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
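`Theoretical Bandwidth` counts LDS cycles that were active and not lost to bank conflicts. It then scales that count by 4 (read here as bytes per bank) and by the number of banks per CU. The sketch below assumes `$lds_banks_per_cu` arrives as text, which would explain the `TO_INT` wrapper. The function name is hypothetical. The sketch does not compute the new `LDS LOAD/STORE/ATOMIC Bandwidth` rows, whose values are `None` on these architectures.

```python
def lds_theoretical_bytes(idx_active: int, bank_conflict: int,
                          lds_banks_per_cu: str, denom: float = 1.0) -> float:
    """((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu) / $denom."""
    conflict_free_cycles = idx_active - bank_conflict
    bytes_per_bank = 4  # the constant 4 in the YAML formula
    return conflict_free_cycles * bytes_per_bank * int(lds_banks_per_cu) / denom


# Hypothetical counters: 1000 active cycles, 200 of them bank-conflicted, 32 banks.
print(lds_theoretical_bytes(1000, 200, "32"))
```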
@@ -116,3 +158,17 @@ Panel Config:
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
unit: (Accesses + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS Command FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS Data FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
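Every metric added in these hunks sets `avg`, `min`, and `max` to `None`. The commit message says these placeholders let baseline comparison work against MI 350 reports, because each metric name then also exists on older architectures. The TODO asks for comparison logic that tolerates missing metrics instead. One way such logic could look is the hypothetical helper below, which treats a missing key and a `None` value the same way. It is not the profiler's code.

```python
from typing import Dict, Optional

Metric = Optional[float]


def compare_to_baseline(current: Dict[str, Metric],
                        baseline: Dict[str, Metric]) -> Dict[str, Optional[float]]:
    """Percent change per metric; None if either side is missing, None, or a zero baseline."""
    result: Dict[str, Optional[float]] = {}
    for name in sorted(current.keys() | baseline.keys()):
        cur = current.get(name)
        base = baseline.get(name)
        if cur is None or base is None or base == 0:
            result[name] = None
        else:
            result[name] = 100 * (cur - base) / base
    return result


# Hypothetical values: "LDS LOAD" is a placeholder in the baseline,
# and "LDS STORE" does not exist there at all.
print(compare_to_baseline(
    {"LDS LOAD": 120.0, "LDS Latency": 10.0, "LDS STORE": 5.0},
    {"LDS LOAD": None, "LDS Latency": 8.0},
))
```

With a helper like this, the placeholder blocks could be deleted, because a metric that is absent from one report would simply compare as `None`.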
@@ -43,6 +43,27 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Sequencer → TA Address Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Sequencer → TA Command Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Sequencer → TA Data Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
HBM Bandwidth:
value: $hbmBandwidth
unit: GB/s
tips:

- metric_table:
id: 1702
@@ -77,6 +77,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (Int8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -198,18 +205,18 @@ Panel Config:
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA0_RDREQ_LEVEL_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
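The L2-Fabric read formula assigns byte sizes by request type:

- `TCC_BUBBLE_sum`: 128 bytes each.
- `TCC_EA0_RDREQ_32B_sum`: 32 bytes each.
- All remaining read requests: 64 bytes each.

The write formula counts `TCC_EA0_WRREQ_64B_sum` at 64 bytes and all other write requests at 32 bytes. After this change, both `pop` values divide by `$hbmBandwidth` instead of `$hbm_bw`.

The Python sketch below assumes nanosecond timestamps, so bytes per nanosecond equals GB/s. It treats the HBM bandwidth as an already-resolved number. The names and counter values are hypothetical.

```python
def l2_fabric_read_bytes(bubble: int, rdreq: int, rdreq_32b: int) -> int:
    """128 B per bubble, 32 B per 32B request, 64 B for every other read request."""
    return 128 * bubble + 64 * (rdreq - bubble - rdreq_32b) + 32 * rdreq_32b


def l2_fabric_write_bytes(wrreq: int, wrreq_64b: int) -> int:
    """64 B per 64B write request, 32 B for every other write request."""
    return 64 * wrreq_64b + 32 * (wrreq - wrreq_64b)


def gb_per_s(num_bytes: int, start_ns: int, end_ns: int) -> float:
    return num_bytes / (end_ns - start_ns)


def pct_of_hbm(bw_gbps: float, hbm_bandwidth_gbps: float) -> float:
    return 100 * bw_gbps / hbm_bandwidth_gbps


# Hypothetical counters for a single 1,000 ns dispatch.
read_bw = gb_per_s(l2_fabric_read_bytes(bubble=10, rdreq=100, rdreq_32b=20), 0, 1_000)
write_bw = gb_per_s(l2_fabric_write_bytes(wrreq=50, wrreq_64b=30), 0, 1_000)
print(read_bw, write_bw, pct_of_hbm(read_bw, hbm_bandwidth_gbps=6400.0))
```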
@@ -19,6 +19,27 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
CPC SYNC FIFO Full Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
CPC CANE Stall Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
CPC ADC Utilization:
avg: None
min: None
max: None
unit: pct
tips:
CPF Utilization:
avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
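`CPF Utilization` guards its division with a conditional. A dispatch whose busy and idle cycles sum to zero therefore yields `None` instead of a division error. The same guard in plain Python, as a sketch with a hypothetical function name:

```python
from typing import Optional


def busy_percent(busy: int, idle: int) -> Optional[float]:
    """100 * busy / (busy + idle), or None when the counters sum to zero."""
    total = busy + idle
    if total == 0:
        return None
    return 100 * busy / total


print(busy_percent(300, 100))  # 75.0
print(busy_percent(0, 0))      # None
```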
@@ -19,6 +19,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Schedule-Pipe Wave Occupancy:
avg: None
min: None
max: None
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Scheduler-Pipe Wave Utilization:
avg: None
min: None
max: None
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
0) else None)
unit: Pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Scheduler-Pipe FIFO Full Rate:
avg: None
min: None
max: None
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
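The SPI rows in this file normalize in two different ways:

- The `SPI_CSN_BUSY` row divides by GPU-active cycles times `$pipes_per_gpu` times `$se_per_gpu`.
- `Scheduler-Pipe Stall Rate` divides `SPI_RA_RES_STALL_CSN` by `$GRBM_SPI_BUSY_PER_XCD` times `$se_per_gpu`. It returns `None` when the SPI was never busy.

The sketch below takes already-resolved inputs. The function names and values are hypothetical.

```python
from typing import Optional


def scheduler_pipe_busy_pct(spi_csn_busy: float, gui_active_per_xcd: float,
                            pipes_per_gpu: int, se_per_gpu: int) -> float:
    """100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu)."""
    return 100 * spi_csn_busy / (gui_active_per_xcd * pipes_per_gpu * se_per_gpu)


def scheduler_pipe_stall_rate_pct(spi_ra_res_stall_csn: float,
                                  spi_busy_per_xcd: float,
                                  se_per_gpu: int) -> Optional[float]:
    """Stall cycles per SPI-busy cycle per shader engine, guarded against zero."""
    if spi_busy_per_xcd == 0:
        return None
    return 100 * spi_ra_res_stall_csn / (spi_busy_per_xcd * se_per_gpu)


# Hypothetical counters: 4 pipes, 4 shader engines.
print(scheduler_pipe_busy_pct(1600, 1000, pipes_per_gpu=4, se_per_gpu=4))
print(scheduler_pipe_stall_rate_pct(200, 1000, se_per_gpu=4))
print(scheduler_pipe_stall_rate_pct(5, 0, se_per_gpu=4))
```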
@@ -209,6 +209,13 @@ Panel Config:
max: MAX((TA_BUFFER_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
Spill/Stack Coalesceable Instr:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
Spill/Stack Read:
avg: AVG((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
min: MIN((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
@@ -274,4 +281,11 @@ Panel Config:
min: MIN((SQ_INSTS_VALU_MFMA_F64 / $denom))
max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
unit: (instr + $normUnit)
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
MFMA-F6F4:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
tips:
@@ -76,6 +76,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -125,6 +132,13 @@ Panel Config:
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
VALU Co-Issue Efficiency:
avg: None
min: None
max: None
unit: pct
tips:
VMEM Utilization:
avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
@@ -265,6 +279,13 @@ Panel Config:
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
unit: (OPs + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
F6F4 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
@@ -55,6 +55,48 @@ Panel Config:
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS LOAD:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS STORE:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS ATOMIC:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS LOAD Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS STORE Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS ATOMIC Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
@@ -116,3 +158,17 @@ Panel Config:
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
unit: (Accesses + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS Command FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS Data FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
@@ -43,6 +43,27 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Sequencer → TA Address Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Sequencer → TA Command Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Sequencer → TA Data Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
HBM Bandwidth:
value: $hbmBandwidth
unit: GB/s
tips:

- metric_table:
id: 1702
@@ -77,6 +77,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (Int8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -198,18 +205,18 @@ Panel Config:
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA0_RDREQ_LEVEL_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
@@ -19,6 +19,27 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
CPC SYNC FIFO Full Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
CPC CANE Stall Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
CPC ADC Utilization:
avg: None
min: None
max: None
unit: pct
tips:
CPF Utilization:
avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
@@ -19,6 +19,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Schedule-Pipe Wave Occupancy:
avg: None
min: None
max: None
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Scheduler-Pipe Wave Utilization:
avg: None
min: None
max: None
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
0) else None)
unit: Pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Scheduler-Pipe FIFO Full Rate:
avg: None
min: None
max: None
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
@@ -209,6 +209,13 @@ Panel Config:
max: MAX((TA_BUFFER_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
Spill/Stack Coalesceable Instr:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
Spill/Stack Read:
avg: AVG((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
min: MIN((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
@@ -275,3 +282,10 @@ Panel Config:
max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
MFMA-F6F4:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
@@ -76,6 +76,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -125,6 +132,13 @@ Panel Config:
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
VALU Co-Issue Efficiency:
avg: None
min: None
max: None
unit: pct
tips:
VMEM Utilization:
avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
@@ -265,6 +279,13 @@ Panel Config:
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
unit: (OPs + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
F6F4 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
@@ -55,6 +55,48 @@ Panel Config:
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS LOAD:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS STORE:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS ATOMIC:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS LOAD Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS STORE Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS ATOMIC Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
@@ -116,3 +158,17 @@ Panel Config:
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
unit: (Accesses + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS Command FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS Data FIFO Full Rate:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
@@ -43,6 +43,27 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Sequencer → TA Address Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Sequencer → TA Command Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Sequencer → TA Data Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
@@ -40,6 +40,10 @@ Panel Config:
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
HBM Bandwidth:
value: $hbmBandwidth
unit: GB/s
tips:

- metric_table:
id: 1702
@@ -77,6 +77,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (Int8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -198,18 +205,18 @@ Panel Config:
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbm_bw
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbm_bw)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA0_RDREQ_LEVEL_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
@@ -76,6 +76,27 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
CPC SYNC FIFO Full Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
CPC CANE Stall Rate:
avg: None
min: None
max: None
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
CPC ADC Utilization:
avg: None
min: None
max: None
unit: pct
tips:
CPC Utilization:
avg: AVG((((100 * CPC_CPC_STAT_BUSY) / (CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE))
if ((CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE) != 0) else None))
@@ -19,6 +19,13 @@ Panel Config:
unit: Unit
tips: Tips
metric:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Schedule-Pipe Wave Occupancy:
avg: None
min: None
max: None
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
@@ -31,6 +38,13 @@ Panel Config:
max: MAX(100 * SPI_CSN_BUSY / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Scheduler-Pipe Wave Utilization:
avg: None
min: None
max: None
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
@@ -108,6 +122,13 @@ Panel Config:
0) else None)
unit: Pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
Scheduler-Pipe FIFO Full Rate:
avg: None
min: None
max: None
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
@@ -209,6 +209,13 @@ Panel Config:
max: MAX((TA_BUFFER_WAVEFRONTS_sum / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then remove this
Spill/Stack Coalesceable Instr:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
Spill/Stack Read:
avg: AVG((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
min: MIN((TA_BUFFER_READ_WAVEFRONTS_sum / $denom))
@@ -275,3 +282,10 @@ Panel Config:
max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
MFMA-F6F4:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
@@ -76,6 +76,13 @@ Panel Config:
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
MFMA FLOPs (F6F4):
value: None
unit: GFLOP
peak: None
pop: None
tips:
MFMA IOPs (INT8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
@@ -125,6 +132,13 @@ Panel Config:
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
VALU Co-Issue Efficiency:
avg: None
min: None
max: None
unit: pct
tips:
VMEM Utilization:
avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
@@ -265,6 +279,13 @@ Panel Config:
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
unit: (OPs + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
F6F4 OPs:
avg: None
min: None
max: None
unit: (OPs + $normUnit)
tips:
INT8 OPs:
avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
@@ -55,6 +55,48 @@ Panel Config:
max: MAX((SQ_INSTS_LDS / $denom))
unit: (Instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS LOAD:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS STORE:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS ATOMIC:
avg: None
min: None
max: None
unit: (instr + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS LOAD Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS STORE Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
# TODO: Fix baseline comparison logic to handle nonexistent metrics, then
LDS ATOMIC Bandwidth:
avg: None
min: None
max: None
units: Gbps
tips:
Theoretical Bandwidth:
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ $denom))
@@ -116,3 +158,17 @@ Panel Config:
|
||||
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
|
||||
unit: (Accesses + $normUnit)
|
||||
tips:
|
||||
# TODO: Fix baseline comparision logic to handle non existent metrics, then
|
||||
LDS Command FIFO Full Rate:
|
||||
avg: None
|
||||
min: None
|
||||
max: None
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
# TODO: Fix baseline comparision logic to handle non existent metrics, then
|
||||
LDS Data FIFO Full Rate:
|
||||
avg: None
|
||||
min: None
|
||||
max: None
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
|
||||
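The `Theoretical Bandwidth` row above (like `Theoretical LDS Bandwidth` in the Speed-of-Light panel) turns LDS counters into bytes: conflict-free active cycles `(SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT)` times 4 bytes times `$lds_banks_per_cu`. The Python sketch below evaluates that expression for one dispatch using the Speed-of-Light normalization (division by kernel duration) together with the peak `($max_sclk * $cu_per_gpu) * 0.128`. It assumes timestamps are in nanoseconds (so bytes/ns reads as GB/s) and `$max_sclk` is in MHz; the function names and sample numbers are hypothetical and not part of rocprofiler-compute.

```python
from typing import Optional


def lds_theoretical_bw_gbs(
    idx_active: int,
    bank_conflict: int,
    lds_banks_per_cu: int,
    start_ns: int,
    end_ns: int,
) -> Optional[float]:
    """Conflict-free LDS cycles * 4 bytes * banks per CU, divided by kernel time in ns."""
    duration_ns = end_ns - start_ns
    if duration_ns <= 0:
        return None  # no usable time window for this dispatch
    conflict_free_cycles = idx_active - bank_conflict
    return conflict_free_cycles * 4 * int(lds_banks_per_cu) / duration_ns


def lds_peak_gbs(max_sclk_mhz: float, cu_per_gpu: int) -> float:
    """Peak from the YAML: ($max_sclk * $cu_per_gpu) * 0.128, i.e. 128 bytes/clock/CU."""
    return max_sclk_mhz * cu_per_gpu * 0.128


def pct_of_peak(value: Optional[float], peak: float) -> Optional[float]:
    """100 * value / peak, or None when either side is unusable."""
    if value is None or peak == 0:
        return None
    return 100.0 * value / peak


if __name__ == "__main__":
    bw = lds_theoretical_bw_gbs(1000, 200, 32, 0, 1000)
    peak = lds_peak_gbs(2400, 256)
    print(f"{bw:.1f} GB/s of {peak:.1f} GB/s -> {pct_of_peak(bw, peak):.3f}%")
```

The YAML `pop` divides by `($max_sclk * $cu_per_gpu) * 0.00128` instead of multiplying by 100 and dividing by the peak; the two forms are algebraically the same.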
+21
@@ -43,6 +43,27 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then
Sequencer → TA Address Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then
Sequencer → TA Command Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
# TODO: Fix baseline comparison logic to handle non-existent metrics, then
Sequencer → TA Data Stall:
avg: None
min: None
max: None
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))
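The TA rows are written as a per-dispatch expression wrapped in `AVG`, `MIN`, or `MAX`; the context row at the top of this hunk, for example, reports `100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)` as a percentage. A minimal Python sketch of that evaluation follows, assuming each dispatch contributes one value and that the aggregates skip `None` (similar to pandas' default `skipna=True`); that skipping behavior is an assumption, and the function names and sample numbers are hypothetical.

```python
from statistics import mean
from typing import Dict, Iterable, Optional


def ta_addr_stall_pct(
    stalled_cycles: int, grbm_gui_active_per_xcd: int, cu_per_gpu: int
) -> Optional[float]:
    """100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum / (GRBM_GUI_ACTIVE_PER_XCD * cu_per_gpu)."""
    denom = grbm_gui_active_per_xcd * cu_per_gpu
    if denom == 0:
        return None  # zero guard added in this sketch; the YAML row has none
    return 100.0 * stalled_cycles / denom


def summarize(values: Iterable[Optional[float]]) -> Dict[str, Optional[float]]:
    """AVG/MIN/MAX across dispatches, ignoring None (assumed skip-missing behavior)."""
    present = [v for v in values if v is not None]
    if not present:
        return {"avg": None, "min": None, "max": None}
    return {"avg": mean(present), "min": min(present), "max": max(present)}


if __name__ == "__main__":
    # (stalled cycles, GRBM_GUI_ACTIVE per XCD, CUs) for three hypothetical dispatches
    dispatches = [(500, 1000, 4), (0, 2000, 4), (300, 0, 4)]
    per_dispatch = [ta_addr_stall_pct(*d) for d in dispatches]
    print(per_dispatch, summarize(per_dispatch))
```

A placeholder row whose `avg`, `min`, and `max` are literally `None`, like the `Sequencer → TA` rows above, leaves nothing to aggregate, which is the case the last branch of `summarize` covers.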
@@ -41,6 +41,10 @@ Panel Config:
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
HBM Bandwidth:
value: $hbmBandwidth
unit: GB/s
tips:

- metric_table:
id: 1702
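This hunk adds an `HBM Bandwidth` row reporting `$hbmBandwidth`, the same value the Speed-of-Light panel uses as the peak for its `L2-Fabric Read BW` and `L2-Fabric Write BW` rows. The sketch below evaluates the Speed-of-Light write-bandwidth expression (64-byte write requests counted at 64 bytes, the remaining requests at 32 bytes, divided by kernel duration) and reports it as a percentage of an HBM figure. It assumes nanosecond timestamps; how `$hbmBandwidth` itself is derived is not shown here, so it is simply passed in, and all names and values are illustrative.

```python
from typing import Optional


def l2_fabric_write_bw_gbs(
    wrreq: int, wrreq_64b: int, start_ns: int, end_ns: int
) -> Optional[float]:
    """((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum) * 32))
    / (End_Timestamp - Start_Timestamp)."""
    duration_ns = end_ns - start_ns
    if duration_ns <= 0:
        return None
    bytes_written = wrreq_64b * 64 + (wrreq - wrreq_64b) * 32
    return bytes_written / duration_ns  # bytes per ns == GB/s


def pct_of_hbm(bw_gbs: Optional[float], hbm_bandwidth_gbs: float) -> Optional[float]:
    """pop = 100 * value / $hbmBandwidth."""
    if bw_gbs is None or hbm_bandwidth_gbs <= 0:
        return None
    return 100.0 * bw_gbs / hbm_bandwidth_gbs


if __name__ == "__main__":
    bw = l2_fabric_write_bw_gbs(wrreq=1000, wrreq_64b=600, start_ns=0, end_ns=1000)
    print(bw, pct_of_hbm(bw, hbm_bandwidth_gbs=8000.0))
```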
+14
@@ -0,0 +1,14 @@
---
Panel Config:
id: 000
title: Top Stats
data source:
- raw_csv_table:
id: 001
title: Top Kernels
source: pmc_kernel_top.csv

- raw_csv_table:
id: 002
title: Dispatch List
source: pmc_dispatch_info.csv
@@ -0,0 +1,9 @@
---
Panel Config:
id: 100
title: System Info
data source:
- raw_csv_table:
id: 101
source: sysinfo.csv
columnwise: True
+269
@@ -0,0 +1,269 @@
---
# Add description/tips for each metric in this section.
# So it could be shown in hover.
Metric Description:
SALU: &SALU_anchor Scalar Arithmetic Logic Unit

# Define the panel properties and properties of each metric in the panel.
Panel Config:
id: 200
title: System Speed-of-Light
data source:
- metric_table:
id: 201
title: Speed-of-Light
header:
metric: Metric
value: Avg
unit: Unit
peak: Peak
pop: Pct of Peak
tips: Tips
metric:
VALU FLOPs:
value: AVG(((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
+ (2 * SQ_INSTS_VALU_FMA_F16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
+ SQ_INSTS_VALU_TRANS_F32) + (2 * SQ_INSTS_VALU_FMA_F32)))) + (64 * (((SQ_INSTS_VALU_ADD_F64
+ SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64) + (2 * SQ_INSTS_VALU_FMA_F64))))
/ (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
pop: ((100 * AVG(((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16)
+ SQ_INSTS_VALU_TRANS_F16) + (2 * SQ_INSTS_VALU_FMA_F16))) + (64 * (((SQ_INSTS_VALU_ADD_F32
+ SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32) + (2 * SQ_INSTS_VALU_FMA_F32))))
+ (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
+ (2 * SQ_INSTS_VALU_FMA_F64)))) / (End_Timestamp - Start_Timestamp)))) / (((($max_sclk
* $cu_per_gpu) * 64) * 2) / 1000))
tips:
VALU IOPs:
value: AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
pop: ((100 * AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp
- Start_Timestamp)))) / (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000))
tips:
MFMA FLOPs (F8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 8192) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
tips:
MFMA FLOPs (BF16):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 4096) / 1000))
tips:
MFMA FLOPs (F16):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F16 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F16 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 4096) / 1000))
tips:
MFMA FLOPs (F32):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F32 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F32 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
MFMA FLOPs (F64):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
MFMA FLOPs (F6F4):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F6F4 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GFLOP
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F6F4 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
tips:
MFMA IOPs (Int8):
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
unit: GIOP
peak: ((($max_sclk * $cu_per_gpu) * 8192) / 1000)
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
tips:
Active CUs:
value: $numActiveCUs
unit: CUs
peak: $cu_per_gpu
pop: ((100 * $numActiveCUs) / $cu_per_gpu)
tips:
SALU Utilization:
value: AVG(((100 * SQ_ACTIVE_INST_SCA) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
peak: 100
pop: AVG(((100 * SQ_ACTIVE_INST_SCA) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
tips:
VALU Utilization:
value: AVG(((100 * SQ_ACTIVE_INST_VALU) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
peak: 100
pop: AVG(((100 * SQ_ACTIVE_INST_VALU) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
tips:
MFMA Utilization:
value: AVG(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / (($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)
* 4)))
unit: pct
peak: 100
pop: AVG(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / (($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)
* 4)))
tips:
VMEM Utilization:
value: AVG((((100 * (SQ_ACTIVE_INST_FLAT + SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
peak: 100
pop: AVG((((100 * (SQ_ACTIVE_INST_FLAT + SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
tips:
Branch Utilization:
value: AVG((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
unit: pct
peak: 100
pop: AVG((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
tips:
VALU Active Threads:
value: AVG(((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU) if (SQ_ACTIVE_INST_VALU
!= 0) else None))
unit: Threads
peak: $wave_size
pop: (100 * AVG((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU / $wave_size) if (SQ_ACTIVE_INST_VALU != 0) else None))
tips:
IPC:
value: AVG((SQ_INSTS / SQ_BUSY_CU_CYCLES))
unit: Instr/cycle
peak: 5
pop: ((100 * AVG((SQ_INSTS / SQ_BUSY_CU_CYCLES))) / 5)
tips:
Wavefront Occupancy:
value: AVG((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD))
unit: Wavefronts
peak: ($max_waves_per_cu * $cu_per_gpu)
pop: (100 * AVG(((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD) / ($max_waves_per_cu
* $cu_per_gpu))))
coll_level: SQ_LEVEL_WAVES
tips:
Theoretical LDS Bandwidth:
value: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: (($max_sclk * $cu_per_gpu) * 0.128)
pop: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
/ (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) * 0.00128)))
tips:
LDS Bank Conflicts/Access:
value: AVG(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
unit: Conflicts/access
peak: 32
pop: ((100 * AVG(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))) / 32)
tips:
vL1D Cache Hit Rate:
value: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum)
+ TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum))
/ TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum != 0) else
None))
unit: pct
peak: 100
pop: AVG(((100 - ((100 * (((TCP_TCC_READ_REQ_sum + TCP_TCC_WRITE_REQ_sum) +
TCP_TCC_ATOMIC_WITH_RET_REQ_sum) + TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum)) /
TCP_TOTAL_CACHE_ACCESSES_sum)) if (TCP_TOTAL_CACHE_ACCESSES_sum != 0) else
None))
tips:
vL1D Cache BW:
value: AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: ((($max_sclk / 1000) * 128) * $cu_per_gpu)
pop: ((100 * AVG(((TCP_TOTAL_CACHE_ACCESSES_sum * 128) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk / 1000) * 128) * $cu_per_gpu))
tips:
L2 Cache Hit Rate:
value: AVG((((100 * TCC_HIT_sum) / (TCC_HIT_sum + TCC_MISS_sum)) if ((TCC_HIT_sum
+ TCC_MISS_sum) != 0) else None))
unit: pct
peak: 100
pop: AVG((((100 * TCC_HIT_sum) / (TCC_HIT_sum + TCC_MISS_sum)) if ((TCC_HIT_sum
+ TCC_MISS_sum) != 0) else None))
tips:
L2 Cache BW:
value: AVG(((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: ((($max_sclk / 1000) * 128) * TO_INT($total_l2_chan))
pop: ((100 * AVG(((TCC_REQ_sum * 128) / (End_Timestamp - Start_Timestamp))))
/ ((($max_sclk / 1000) * 128) * TO_INT($total_l2_chan)))
tips:
L2-Fabric Read BW:
value: AVG((128 * TCC_BUBBLE_sum +
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp))
unit: GB/s
peak: $hbmBandwidth
pop: ((100 * (AVG((128 * TCC_BUBBLE_sum +
64 * (TCC_EA0_RDREQ_sum - TCC_BUBBLE_sum - TCC_EA0_RDREQ_32B_sum) +
32 * TCC_EA0_RDREQ_32B_sum) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Write BW:
value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
peak: $hbmBandwidth
pop: ((100 * AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp)))) / $hbmBandwidth)
tips:
L2-Fabric Read Latency:
value: AVG(((TCC_EA0_RDREQ_LEVEL_sum / TCC_EA0_RDREQ_sum) if (TCC_EA0_RDREQ_sum
!= 0) else None))
unit: Cycles
peak: None
pop: None
tips:
L2-Fabric Write Latency:
value: AVG(((TCC_EA0_WRREQ_LEVEL_sum / TCC_EA0_WRREQ_sum) if (TCC_EA0_WRREQ_sum
!= 0) else None))
unit: Cycles
peak: None
pop: None
tips:
sL1D Cache Hit Rate:
value: AVG((((100 * SQC_DCACHE_HITS) / (SQC_DCACHE_HITS + SQC_DCACHE_MISSES))
if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES) != 0) else None))
unit: pct
peak: 100
pop: AVG((((100 * SQC_DCACHE_HITS) / (SQC_DCACHE_HITS + SQC_DCACHE_MISSES))
if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES) != 0) else None))
tips:
sL1D Cache BW:
value: AVG(((SQC_DCACHE_REQ / (End_Timestamp - Start_Timestamp)) * 64))
unit: GB/s
peak: ((($max_sclk / 1000) * 64) * $sqc_per_gpu)
pop: ((100 * AVG(((SQC_DCACHE_REQ / (End_Timestamp - Start_Timestamp)) * 64))) / ((($max_sclk
/ 1000) * 64) * $sqc_per_gpu))
tips:
L1I Hit Rate:
value: AVG((((100 * SQC_ICACHE_HITS) / (SQC_ICACHE_HITS + SQC_ICACHE_MISSES))
if ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES) != 0) else None))
unit: pct
peak: 100
pop: AVG((((100 * SQC_ICACHE_HITS) / (SQC_ICACHE_HITS + SQC_ICACHE_MISSES))
if ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES) != 0) else None))
tips:
L1I BW:
value: AVG(((SQC_ICACHE_REQ / (End_Timestamp - Start_Timestamp)) * 64))
unit: GB/s
peak: ((($max_sclk / 1000) * 64) * $sqc_per_gpu)
pop: ((100 * AVG(((SQC_ICACHE_REQ / (End_Timestamp - Start_Timestamp)) * 64))) / ((($max_sclk
/ 1000) * 64) * $sqc_per_gpu))
tips:
L1I Fetch Latency:
value: AVG(((SQ_ACCUM_PREV_HIRES / SQ_IFETCH) if (SQ_IFETCH != 0) else None))
unit: Cycles
peak: None
pop: None
coll_level: SQ_IFETCH_LEVEL
tips:
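The `VALU FLOPs` row above counts 64 operations per ADD, MUL, or TRANS instruction and 128 per FMA (the `2 *` factor), summed over F16, F32, and F64, and divides by kernel duration; the factor 64 is presumably one operation per lane of a 64-lane wavefront. Its peak is `$max_sclk * $cu_per_gpu * 64 * 2 / 1000`. The Python sketch below mirrors both expressions, assuming nanosecond timestamps, `$max_sclk` in MHz, and that a missing counter counts as zero; the function names and sample counter values are hypothetical.

```python
from typing import Mapping, Optional

DTYPES = ("F16", "F32", "F64")


def _count(counters: Mapping[str, int], op: str, dtype: str) -> int:
    # Missing counters are treated as 0 in this sketch.
    return counters.get(f"SQ_INSTS_VALU_{op}_{dtype}", 0)


def valu_flops_gflops(
    counters: Mapping[str, int], start_ns: int, end_ns: int
) -> Optional[float]:
    """64 * (ADD + MUL + TRANS + 2 * FMA) per data type, summed over F16/F32/F64,
    divided by the kernel duration in ns (FLOP per ns == GFLOP/s)."""
    duration_ns = end_ns - start_ns
    if duration_ns <= 0:
        return None
    flop = 0
    for dtype in DTYPES:
        flop += 64 * (
            _count(counters, "ADD", dtype)
            + _count(counters, "MUL", dtype)
            + _count(counters, "TRANS", dtype)
            + 2 * _count(counters, "FMA", dtype)  # an FMA counts as two operations
        )
    return flop / duration_ns


def valu_flops_peak(max_sclk_mhz: float, cu_per_gpu: int) -> float:
    """peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)."""
    return max_sclk_mhz * cu_per_gpu * 64 * 2 / 1000


if __name__ == "__main__":
    sample = {
        "SQ_INSTS_VALU_ADD_F16": 10,
        "SQ_INSTS_VALU_MUL_F16": 10,
        "SQ_INSTS_VALU_FMA_F16": 5,
        "SQ_INSTS_VALU_FMA_F32": 1,
    }
    value = valu_flops_gflops(sample, 0, 1000)
    peak = valu_flops_peak(2400, 256)
    print(value, peak, 100 * value / peak)
```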
+153
@@ -0,0 +1,153 @@
---
# Add description/tips for each metric in this section.
# So it could be shown in hover.
Metric Description:

# Define the panel properties and properties of each metric in the panel.
Panel Config:
id: 500
title: Command Processor (CPC/CPF)
data source:
- metric_table:
id: 501
title: Command Processor Fetcher
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
CPF Utilization:
avg: AVG((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
min: MIN((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
max: MAX((((100 * CPF_CPF_STAT_BUSY) / (CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE))
if ((CPF_CPF_STAT_BUSY + CPF_CPF_STAT_IDLE) != 0) else None))
unit: pct
tips:
CPF Stall:
avg: AVG((((100 * CPF_CPF_STAT_STALL) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
!= 0) else None))
min: MIN((((100 * CPF_CPF_STAT_STALL) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
!= 0) else None))
max: MAX((((100 * CPF_CPF_STAT_STALL) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
!= 0) else None))
unit: pct
tips:
CPF-L2 Utilization:
avg: AVG((((100 * CPF_CPF_TCIU_BUSY) / (CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE))
if ((CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE) != 0) else None))
min: MIN((((100 * CPF_CPF_TCIU_BUSY) / (CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE))
if ((CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE) != 0) else None))
max: MAX((((100 * CPF_CPF_TCIU_BUSY) / (CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE))
if ((CPF_CPF_TCIU_BUSY + CPF_CPF_TCIU_IDLE) != 0) else None))
unit: pct
tips:
CPF-L2 Stall:
avg: AVG((((100 * CPF_CPF_TCIU_STALL) / CPF_CPF_TCIU_BUSY) if (CPF_CPF_TCIU_BUSY
!= 0) else None))
min: MIN((((100 * CPF_CPF_TCIU_STALL) / CPF_CPF_TCIU_BUSY) if (CPF_CPF_TCIU_BUSY
!= 0) else None))
max: MAX((((100 * CPF_CPF_TCIU_STALL) / CPF_CPF_TCIU_BUSY) if (CPF_CPF_TCIU_BUSY
!= 0) else None))
unit: pct
tips:
CPF-UTCL1 Stall:
avg: AVG(((100 * CPF_CMP_UTCL1_STALL_ON_TRANSLATION) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
!= 0) else None)
min: MIN(((100 * CPF_CMP_UTCL1_STALL_ON_TRANSLATION) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
!= 0) else None)
max: MAX(((100 * CPF_CMP_UTCL1_STALL_ON_TRANSLATION) / CPF_CPF_STAT_BUSY) if (CPF_CPF_STAT_BUSY
!= 0) else None)
unit: pct
tips:

- metric_table:
id: 502
title: Packet Processor
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
CPC SYNC FIFO Full Rate:
avg: AVG((100 * CPC_SYNC_FIFO_FULL) / CPC_SYNC_WRREQ_FIFO_BUSY if (CPC_SYNC_WRREQ_FIFO_BUSY != 0) else None)
min: MIN((100 * CPC_SYNC_FIFO_FULL) / CPC_SYNC_WRREQ_FIFO_BUSY if (CPC_SYNC_WRREQ_FIFO_BUSY != 0) else None)
max: MAX((100 * CPC_SYNC_FIFO_FULL) / CPC_SYNC_WRREQ_FIFO_BUSY if (CPC_SYNC_WRREQ_FIFO_BUSY != 0) else None)
unit: pct
tips:
CPC CANE Stall Rate:
avg: AVG((100 * CPC_CANE_STALL) / CPC_CANE_BUSY if (CPC_CANE_BUSY != 0) else None)
min: MIN((100 * CPC_CANE_STALL) / CPC_CANE_BUSY if (CPC_CANE_BUSY != 0) else None)
max: MAX((100 * CPC_CANE_STALL) / CPC_CANE_BUSY if (CPC_CANE_BUSY != 0) else None)
unit: pct
tips:
CPC ADC Utilization:
avg: AVG((100 * CPC_TG_SEND) / CPC_GD_BUSY if (CPC_GD_BUSY != 0) else None)
min: MIN((100 * CPC_TG_SEND) / CPC_GD_BUSY if (CPC_GD_BUSY != 0) else None)
max: MAX((100 * CPC_TG_SEND) / CPC_GD_BUSY if (CPC_GD_BUSY != 0) else None)
unit: pct
tips:
CPC Utilization:
avg: AVG((((100 * CPC_CPC_STAT_BUSY) / (CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE))
if ((CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE) != 0) else None))
min: MIN((((100 * CPC_CPC_STAT_BUSY) / (CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE))
if ((CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE) != 0) else None))
max: MAX((((100 * CPC_CPC_STAT_BUSY) / (CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE))
if ((CPC_CPC_STAT_BUSY + CPC_CPC_STAT_IDLE) != 0) else None))
unit: pct
tips:
CPC Stall Rate:
avg: AVG((((100 * CPC_CPC_STAT_STALL) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
!= 0) else None))
min: MIN((((100 * CPC_CPC_STAT_STALL) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
!= 0) else None))
max: MAX((((100 * CPC_CPC_STAT_STALL) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
!= 0) else None))
unit: pct
tips:
CPC Packet Decoding Utilization:
avg: AVG((100 * CPC_ME1_BUSY_FOR_PACKET_DECODE) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
min: MIN((100 * CPC_ME1_BUSY_FOR_PACKET_DECODE) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
max: MAX((100 * CPC_ME1_BUSY_FOR_PACKET_DECODE) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
unit: pct
tips:
CPC-Workgroup Manager Utilization:
avg: AVG((100 * CPC_ME1_DC0_SPI_BUSY) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
min: MIN((100 * CPC_ME1_DC0_SPI_BUSY) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
max: MAX((100 * CPC_ME1_DC0_SPI_BUSY) / CPC_CPC_STAT_BUSY if (CPC_CPC_STAT_BUSY != 0) else None)
unit: Pct
tips:
CPC-L2 Utilization:
avg: AVG((((100 * CPC_CPC_TCIU_BUSY) / (CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE))
if ((CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE) != 0) else None))
min: MIN((((100 * CPC_CPC_TCIU_BUSY) / (CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE))
if ((CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE) != 0) else None))
max: MAX((((100 * CPC_CPC_TCIU_BUSY) / (CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE))
if ((CPC_CPC_TCIU_BUSY + CPC_CPC_TCIU_IDLE) != 0) else None))
unit: pct
tips:
CPC-UTCL1 Stall:
avg: AVG(((100 * CPC_UTCL1_STALL_ON_TRANSLATION) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
!= 0) else None)
min: MIN(((100 * CPC_UTCL1_STALL_ON_TRANSLATION) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
!= 0) else None)
max: MAX(((100 * CPC_UTCL1_STALL_ON_TRANSLATION) / CPC_CPC_STAT_BUSY) if (CPC_CPC_STAT_BUSY
!= 0) else None)
unit: pct
tips:
CPC-UTCL2 Utilization:
avg: AVG((((100 * CPC_CPC_UTCL2IU_BUSY) / (CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE))
if ((CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE) != 0) else None))
min: MIN((((100 * CPC_CPC_UTCL2IU_BUSY) / (CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE))
if ((CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE) != 0) else None))
max: MAX((((100 * CPC_CPC_UTCL2IU_BUSY) / (CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE))
if ((CPC_CPC_UTCL2IU_BUSY + CPC_CPC_UTCL2IU_IDLE) != 0) else None))
unit: pct
tips:
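Several Packet Processor rows added in this file, such as `CPC SYNC FIFO Full Rate`, leave the division unparenthesized: `(100 * CPC_SYNC_FIFO_FULL) / CPC_SYNC_WRREQ_FIFO_BUSY if (CPC_SYNC_WRREQ_FIFO_BUSY != 0) else None`. Read as Python, a conditional expression binds more loosely than `/`, so this groups as `((100 * FULL) / BUSY) if BUSY != 0 else None`, and a zero denominator yields `None` rather than raising. The sketch below checks that reading with Python's own `eval` on the inner expression (without the `AVG(...)` wrapper). It only illustrates operator precedence; it is not a claim about how rocprofiler-compute parses its configs, and the counter values are made up.

```python
"""Check how Python groups the CPC SYNC FIFO Full Rate expression from the YAML."""

EXPR = (
    "(100 * CPC_SYNC_FIFO_FULL) / CPC_SYNC_WRREQ_FIFO_BUSY "
    "if (CPC_SYNC_WRREQ_FIFO_BUSY != 0) else None"
)


def evaluate(expr: str, counters: dict):
    # No builtins are needed: the expression only references the counter names.
    return eval(expr, {"__builtins__": {}}, dict(counters))


def sync_fifo_full_rate(full: int, busy: int):
    """Explicitly parenthesized equivalent of EXPR."""
    return ((100 * full) / busy) if (busy != 0) else None


if __name__ == "__main__":
    for full, busy in [(5, 20), (7, 0)]:
        counters = {"CPC_SYNC_FIFO_FULL": full, "CPC_SYNC_WRREQ_FIFO_BUSY": busy}
        print(full, busy, evaluate(EXPR, counters), sync_fifo_full_rate(full, busy))
```

Because the condition is evaluated before either branch, the division never runs when the busy counter is zero, so the unparenthesized form is safe as written.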
+188
@@ -0,0 +1,188 @@
---
# Add description/tips for each metric in this section.
# So it could be shown in hover.
Metric Description:

# Define the panel properties and properties of each metric in the panel.
Panel Config:
id: 600
title: Workgroup Manager (SPI)
data source:
- metric_table:
id: 601
title: Workgroup Manager Utilizations
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
Schedule-Pipe Wave Occupancy:
avg: AVG(SPI_CSQ_P0_OCCUPANCY + SPI_CSQ_P1_OCCUPANCY + SPI_CSQ_P2_OCCUPANCY + SPI_CSQ_P3_OCCUPANCY)
min: MIN(SPI_CSQ_P0_OCCUPANCY + SPI_CSQ_P1_OCCUPANCY + SPI_CSQ_P2_OCCUPANCY + SPI_CSQ_P3_OCCUPANCY)
max: MAX(SPI_CSQ_P0_OCCUPANCY + SPI_CSQ_P1_OCCUPANCY + SPI_CSQ_P2_OCCUPANCY + SPI_CSQ_P3_OCCUPANCY)
unit: Wave
tips:
Accelerator Utilization:
avg: AVG(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
min: MIN(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
max: MAX(100 * $GRBM_GUI_ACTIVE_PER_XCD / $GRBM_COUNT_PER_XCD)
unit: Pct
tips:
Scheduler-Pipe Utilization:
avg: AVG(100 * (SPI_CS0_BUSY + SPI_CS1_BUSY + SPI_CS2_BUSY + SPI_CS3_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
min: MIN(100 * (SPI_CS0_BUSY + SPI_CS1_BUSY + SPI_CS2_BUSY + SPI_CS3_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
max: MAX(100 * (SPI_CS0_BUSY + SPI_CS1_BUSY + SPI_CS2_BUSY + SPI_CS3_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
Scheduler-Pipe Wave Utilization:
avg: AVG(100 * (SPI_CSC_WAVE_CNT_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
min: MIN(100 * (SPI_CSC_WAVE_CNT_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
max: MAX(100 * (SPI_CSC_WAVE_CNT_BUSY) / ($GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu))
unit: Pct
tips:
Workgroup Manager Utilization:
avg: AVG(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
min: MIN(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
max: MAX(100 * $GRBM_SPI_BUSY_PER_XCD / $GRBM_GUI_ACTIVE_PER_XCD)
unit: Pct
tips:
Shader Engine Utilization:
avg: AVG(100 * SQ_BUSY_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $se_per_gpu))
min: MIN(100 * SQ_BUSY_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $se_per_gpu))
max: MAX(100 * SQ_BUSY_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $se_per_gpu))
unit: Pct
tips:
SIMD Utilization:
avg: AVG(100 * SQ_BUSY_CU_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(100 * SQ_BUSY_CU_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(100 * SQ_BUSY_CU_CYCLES / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Dispatched Workgroups:
avg: AVG(SPI_CS0_NUM_THREADGROUPS + SPI_CS1_NUM_THREADGROUPS + SPI_CS2_NUM_THREADGROUPS + SPI_CS3_NUM_THREADGROUPS)
min: MIN(SPI_CS0_NUM_THREADGROUPS + SPI_CS1_NUM_THREADGROUPS + SPI_CS2_NUM_THREADGROUPS + SPI_CS3_NUM_THREADGROUPS)
max: MAX(SPI_CS0_NUM_THREADGROUPS + SPI_CS1_NUM_THREADGROUPS + SPI_CS2_NUM_THREADGROUPS + SPI_CS3_NUM_THREADGROUPS)
unit: Workgroups
tips:
Dispatched Wavefronts:
avg: AVG(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
min: MIN(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
max: MAX(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
unit: Wavefronts
tips:
VGPR Writes:
avg: AVG((((SPI_VWC0_VDATA_VALID_WR + SPI_VWC1_VDATA_VALID_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
None))
min: MIN((((SPI_VWC0_VDATA_VALID_WR + SPI_VWC1_VDATA_VALID_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
None))
max: MAX((((SPI_VWC0_VDATA_VALID_WR + SPI_VWC1_VDATA_VALID_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
None))
unit: Cycles/wave
tips:
SGPR Writes:
avg: AVG((((1 * SPI_SWC_CSC_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
None))
min: MIN((((1 * SPI_SWC_CSC_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
None))
max: MAX((((1 * SPI_SWC_CSC_WR) / (SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)) if ((SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE) != 0) else
None))
unit: Cycles/wave
tips:
- metric_table:
id: 602
title: Workgroup Manager - Resource Allocation
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
Not-scheduled Rate (Workgroup Manager):
avg: AVG((100 * SPI_RA_REQ_NO_ALLOC_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
min: MIN((100 * SPI_RA_REQ_NO_ALLOC_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
max: MAX((100 * SPI_RA_REQ_NO_ALLOC_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
unit: Pct
tips:
Not-scheduled Rate (Scheduler-Pipe):
avg: AVG((100 * SPI_RA_REQ_NO_ALLOC / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
min: MIN((100 * SPI_RA_REQ_NO_ALLOC / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
max: MAX((100 * SPI_RA_REQ_NO_ALLOC / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
unit: Pct
tips:
Scheduler-Pipe FIFO Full Rate:
avg: AVG((100 * (SPI_CS0_CRAWLER_STALL + SPI_CS1_CRAWLER_STALL + SPI_CS2_CRAWLER_STALL + SPI_CS3_CRAWLER_STALL) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
min: MIN((100 * (SPI_CS0_CRAWLER_STALL + SPI_CS1_CRAWLER_STALL + SPI_CS2_CRAWLER_STALL + SPI_CS3_CRAWLER_STALL) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
max: MAX((100 * (SPI_CS0_CRAWLER_STALL + SPI_CS1_CRAWLER_STALL + SPI_CS2_CRAWLER_STALL + SPI_CS3_CRAWLER_STALL) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None)
unit: Pct
tips:
Scheduler-Pipe Stall Rate:
avg: AVG((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
min: MIN((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
max: MAX((((100 * SPI_RA_RES_STALL_CSN) / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD !=
0) else None))
unit: Pct
tips:
Scratch Stall Rate:
avg: AVG((100 * SPI_RA_TMP_STALL_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD != 0) else None)
min: MIN((100 * SPI_RA_TMP_STALL_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD != 0) else None)
max: MAX((100 * SPI_RA_TMP_STALL_CSN / ($GRBM_SPI_BUSY_PER_XCD * $se_per_gpu)) if ($GRBM_SPI_BUSY_PER_XCD != 0) else None)
unit: Pct
tips:
Insufficient SIMD Waveslots:
avg: AVG(100 * SPI_RA_WAVE_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(100 * SPI_RA_WAVE_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(100 * SPI_RA_WAVE_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Insufficient SIMD VGPRs:
avg: AVG(100 * SPI_RA_VGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(100 * SPI_RA_VGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(100 * SPI_RA_VGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Insufficient SIMD SGPRs:
avg: AVG(100 * SPI_RA_SGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(100 * SPI_RA_SGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(100 * SPI_RA_SGPR_SIMD_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Insufficient CU LDS:
avg: AVG(400 * SPI_RA_LDS_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(400 * SPI_RA_LDS_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(400 * SPI_RA_LDS_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Insufficient CU Barriers:
avg: AVG(400 * SPI_RA_BAR_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(400 * SPI_RA_BAR_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(400 * SPI_RA_BAR_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Reached CU Workgroup Limit:
avg: AVG(400 * SPI_RA_TGLIM_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(400 * SPI_RA_TGLIM_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(400 * SPI_RA_TGLIM_CU_FULL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
Reached CU Wavefront Limit:
avg: AVG(400 * SPI_RA_WVLIM_STALL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
min: MIN(400 * SPI_RA_WVLIM_STALL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
max: MAX(400 * SPI_RA_WVLIM_STALL_CSN / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu))
unit: Pct
tips:
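In the Workgroup Manager panel the per-pipe SPI counters are summed before being normalized. `Scheduler-Pipe Utilization` divides `SPI_CS0_BUSY + ... + SPI_CS3_BUSY` by `$GRBM_GUI_ACTIVE_PER_XCD * $pipes_per_gpu * $se_per_gpu`, while `Scheduler-Pipe FIFO Full Rate` divides the summed `SPI_CS*_CRAWLER_STALL` counters by `$GRBM_SPI_BUSY_PER_XCD * $se_per_gpu` and yields None when that busy count is zero. A minimal Python sketch of both follows, assuming each sequence holds one value per pipe (CS0 to CS3); the function names and sample numbers are hypothetical.

```python
from typing import Optional, Sequence


def scheduler_pipe_utilization(
    cs_busy: Sequence[int],
    grbm_gui_active_per_xcd: int,
    pipes_per_gpu: int,
    se_per_gpu: int,
) -> float:
    """100 * sum(SPI_CSn_BUSY) / (GRBM_GUI_ACTIVE_PER_XCD * pipes_per_gpu * se_per_gpu)."""
    return 100 * sum(cs_busy) / (grbm_gui_active_per_xcd * pipes_per_gpu * se_per_gpu)


def scheduler_pipe_fifo_full_rate(
    crawler_stall: Sequence[int], grbm_spi_busy_per_xcd: int, se_per_gpu: int
) -> Optional[float]:
    """100 * sum(SPI_CSn_CRAWLER_STALL) / (GRBM_SPI_BUSY_PER_XCD * se_per_gpu), None if idle."""
    if grbm_spi_busy_per_xcd == 0:
        return None  # mirrors the YAML's "if ($GRBM_SPI_BUSY_PER_XCD != 0) else None"
    return 100 * sum(crawler_stall) / (grbm_spi_busy_per_xcd * se_per_gpu)


if __name__ == "__main__":
    print(scheduler_pipe_utilization([100, 200, 300, 400], 100, 4, 8))
    print(scheduler_pipe_fifo_full_rate([10, 0, 5, 5], 50, 8))
    print(scheduler_pipe_fifo_full_rate([10, 0, 5, 5], 0, 8))
```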
+142
@@ -0,0 +1,142 @@
---
# Add description/tips for each metric in this section.
# So it could be shown in hover.
Metric Description:

# Define the panel properties and properties of each metric in the panel.
Panel Config:
id: 700
title: Wavefront
data source:
- metric_table:
id: 701
title: Wavefront Launch Stats
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
Grid Size:
avg: AVG(Grid_Size)
min: MIN(Grid_Size)
max: MAX(Grid_Size)
unit: Work Items
tips:
Workgroup Size:
avg: AVG(Workgroup_Size)
min: MIN(Workgroup_Size)
max: MAX(Workgroup_Size)
unit: Work Items
tips:
Total Wavefronts:
avg: AVG(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
min: MIN(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
max: MAX(SPI_CS0_WAVE + SPI_CS1_WAVE + SPI_CS2_WAVE + SPI_CS3_WAVE)
unit: Wavefronts
tips:
Saved Wavefronts:
avg: AVG(SQ_WAVES_SAVED)
min: MIN(SQ_WAVES_SAVED)
max: MAX(SQ_WAVES_SAVED)
unit: Wavefronts
tips:
Restored Wavefronts:
avg: AVG(SQ_WAVES_RESTORED)
min: MIN(SQ_WAVES_RESTORED)
max: MAX(SQ_WAVES_RESTORED)
unit: Wavefronts
tips:
VGPRs:
avg: AVG(Arch_VGPR)
min: MIN(Arch_VGPR)
max: MAX(Arch_VGPR)
unit: Registers
tips:
AGPRs:
avg: AVG(Accum_VGPR)
min: MIN(Accum_VGPR)
max: MAX(Accum_VGPR)
unit: Registers
tips:
SGPRs:
avg: AVG(SGPR)
min: MIN(SGPR)
max: MAX(SGPR)
unit: Registers
tips:
LDS Allocation:
avg: AVG(LDS_Per_Workgroup)
min: MIN(LDS_Per_Workgroup)
max: MAX(LDS_Per_Workgroup)
unit: Bytes
tips:
Scratch Allocation:
avg: AVG(Scratch_Per_Workitem)
min: MIN(Scratch_Per_Workitem)
max: MAX(Scratch_Per_Workitem)
unit: Bytes/Workitem
tips:

- metric_table:
id: 702
title: Wavefront Runtime Stats
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
Kernel Time (Nanosec):
avg: AVG((End_Timestamp - Start_Timestamp))
|
||||
min: MIN((End_Timestamp - Start_Timestamp))
|
||||
max: MAX((End_Timestamp - Start_Timestamp))
|
||||
unit: ns
|
||||
tips:
|
||||
Kernel Time (Cycles):
|
||||
avg: AVG($GRBM_GUI_ACTIVE_PER_XCD)
|
||||
min: MIN($GRBM_GUI_ACTIVE_PER_XCD)
|
||||
max: MAX($GRBM_GUI_ACTIVE_PER_XCD)
|
||||
unit: Cycle
|
||||
tips:
|
||||
Instructions per wavefront:
|
||||
avg: AVG((SQ_INSTS / SQ_WAVES))
|
||||
min: MIN((SQ_INSTS / SQ_WAVES))
|
||||
max: MAX((SQ_INSTS / SQ_WAVES))
|
||||
unit: Instr/wavefront
|
||||
tips:
|
||||
Wave Cycles:
|
||||
avg: AVG(((4 * SQ_WAVE_CYCLES) / $denom))
|
||||
min: MIN(((4 * SQ_WAVE_CYCLES) / $denom))
|
||||
max: MAX(((4 * SQ_WAVE_CYCLES) / $denom))
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
Dependency Wait Cycles:
|
||||
avg: AVG(((4 * SQ_WAIT_ANY) / $denom))
|
||||
min: MIN(((4 * SQ_WAIT_ANY) / $denom))
|
||||
max: MAX(((4 * SQ_WAIT_ANY) / $denom))
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
Issue Wait Cycles:
|
||||
avg: AVG(((4 * SQ_WAIT_INST_ANY) / $denom))
|
||||
min: MIN(((4 * SQ_WAIT_INST_ANY) / $denom))
|
||||
max: MAX(((4 * SQ_WAIT_INST_ANY) / $denom))
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
Active Cycles:
|
||||
avg: AVG(((4 * SQ_ACTIVE_INST_ANY) / $denom))
|
||||
min: MIN(((4 * SQ_ACTIVE_INST_ANY) / $denom))
|
||||
max: MAX(((4 * SQ_ACTIVE_INST_ANY) / $denom))
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
Wavefront Occupancy:
|
||||
avg: AVG((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD))
|
||||
min: MIN((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD))
|
||||
max: MAX((SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD))
|
||||
unit: Wavefronts
|
||||
coll_level: SQ_LEVEL_WAVES
|
||||
tips:
|
||||
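The following Python sketch shows the Wavefront Runtime Stats arithmetic. It uses invented counter values for one dispatch and $denom = 1 (per-kernel normalization), and the function names are hypothetical. Reading SQ_ACCUM_PREV_HIRES as a per-cycle running sum of resident wavefronts is an assumption. The config does not state it, but `coll_level: SQ_LEVEL_WAVES` and the Wavefronts unit suggest it.

```python
def instructions_per_wave(sq_insts, sq_waves):
    # Config: SQ_INSTS / SQ_WAVES
    return sq_insts / sq_waves


def wave_cycles(sq_wave_cycles, denom):
    # Config: (4 * SQ_WAVE_CYCLES) / $denom; the factor 4 is taken from the config as-is.
    return 4 * sq_wave_cycles / denom


def wavefront_occupancy(sq_accum_prev_hires, gui_active_per_xcd):
    # Config: SQ_ACCUM_PREV_HIRES / $GRBM_GUI_ACTIVE_PER_XCD, collected with
    # coll_level SQ_LEVEL_WAVES (assumed: a per-cycle running sum of resident waves).
    return sq_accum_prev_hires / gui_active_per_xcd


# Hypothetical single-dispatch counters, normalized per kernel ($denom = 1).
print(instructions_per_wave(1_200_000, 4_000))   # 300.0
print(wave_cycles(250_000, 1))                   # 1000000.0
print(wavefront_occupancy(5_000_000, 100_000))   # 50.0
```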
+6
-9
@@ -185,15 +185,6 @@ Panel Config:
|
||||
max: MAX((TA_FLAT_WAVEFRONTS_sum / $denom))
|
||||
unit: (instr + $normUnit)
|
||||
tips:
|
||||
Global/Generic Coalesceable Instr:
|
||||
avg: None
|
||||
# AVG((TA_FLAT_COALESCEABLE_WAVEFRONTS_sum / $denom))
|
||||
min: None
|
||||
# MIN((TA_FLAT_COALESCEABLE_WAVEFRONTS_sum / $denom))
|
||||
max: None
|
||||
# MAX((TA_FLAT_COALESCEABLE_WAVEFRONTS_sum / $denom))
|
||||
unit: (instr + $normUnit)
|
||||
tips:
|
||||
Global/Generic Read:
|
||||
avg: AVG((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
|
||||
min: MIN((TA_FLAT_READ_WAVEFRONTS_sum / $denom))
|
||||
@@ -290,3 +281,9 @@ Panel Config:
|
||||
max: MAX((SQ_INSTS_VALU_MFMA_F64 / $denom))
|
||||
unit: (instr + $normUnit)
|
||||
tips:
|
||||
MFMA-F6F4:
|
||||
avg: AVG((SQ_INSTS_VALU_MFMA_F6F4 / $denom))
|
||||
min: MIN((SQ_INSTS_VALU_MFMA_F6F4 / $denom))
|
||||
max: MAX((SQ_INSTS_VALU_MFMA_F6F4 / $denom))
|
||||
unit: (instr + $normUnit)
|
||||
tips:
|
||||
|
||||
+293
@@ -0,0 +1,293 @@
|
||||
---
|
||||
# Add description/tips for each metric in this section.
|
||||
# They can then be shown on hover.
|
||||
Metric Description:
|
||||
|
||||
# Define the panel properties and properties of each metric in the panel.
|
||||
Panel Config:
|
||||
id: 1100
|
||||
title: Compute Units - Compute Pipeline
|
||||
data source:
|
||||
- metric_table:
|
||||
id: 1101
|
||||
title: Speed-of-Light
|
||||
header:
|
||||
metric: Metric
|
||||
value: Avg
|
||||
unit: Unit
|
||||
peak: Peak
|
||||
pop: Pct of Peak
|
||||
tips: Tips
|
||||
metric:
|
||||
VALU FLOPs:
|
||||
value: AVG(((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
|
||||
+ (2 * SQ_INSTS_VALU_FMA_F16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
|
||||
+ SQ_INSTS_VALU_TRANS_F32) + (2 * SQ_INSTS_VALU_FMA_F32)))) + (64 * (((SQ_INSTS_VALU_ADD_F64
|
||||
+ SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64) + (2 * SQ_INSTS_VALU_FMA_F64))))
|
||||
/ (End_Timestamp - Start_Timestamp)))
|
||||
unit: GFLOP
|
||||
peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
|
||||
pop: ((100 * AVG(((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16)
|
||||
+ SQ_INSTS_VALU_TRANS_F16) + (2 * SQ_INSTS_VALU_FMA_F16))) + (64 * (((SQ_INSTS_VALU_ADD_F32
|
||||
+ SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32) + (2 * SQ_INSTS_VALU_FMA_F32))))
|
||||
+ (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
|
||||
+ (2 * SQ_INSTS_VALU_FMA_F64)))) / (End_Timestamp - Start_Timestamp)))) / (((($max_sclk
|
||||
* $cu_per_gpu) * 64) * 2) / 1000))
|
||||
tips:
|
||||
VALU IOPs:
|
||||
value: AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp - Start_Timestamp)))
|
||||
unit: GIOP
|
||||
peak: (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000)
|
||||
pop: ((100 * AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) / (End_Timestamp
|
||||
- Start_Timestamp)))) / (((($max_sclk * $cu_per_gpu) * 64) * 2) / 1000))
|
||||
tips:
|
||||
MFMA FLOPs (F8):
|
||||
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp)))
|
||||
unit: GFLOP
|
||||
peak: ((($max_sclk * $cu_per_gpu) * 8192) / 1000)
|
||||
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F8 * 512) / (End_Timestamp - Start_Timestamp))))
|
||||
/ ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
|
||||
tips:
|
||||
MFMA FLOPs (BF16):
|
||||
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp)))
|
||||
unit: GFLOP
|
||||
peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
|
||||
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_BF16 * 512) / (End_Timestamp - Start_Timestamp))))
|
||||
/ ((($max_sclk * $cu_per_gpu) * 4096) / 1000))
|
||||
tips:
|
||||
MFMA FLOPs (F16):
|
||||
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F16 * 512) / (End_Timestamp - Start_Timestamp)))
|
||||
unit: GFLOP
|
||||
peak: ((($max_sclk * $cu_per_gpu) * 4096) / 1000)
|
||||
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F16 * 512) / (End_Timestamp - Start_Timestamp))))
|
||||
/ ((($max_sclk * $cu_per_gpu) * 4096) / 1000))
|
||||
tips:
|
||||
MFMA FLOPs (F32):
|
||||
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F32 * 512) / (End_Timestamp - Start_Timestamp)))
|
||||
unit: GFLOP
|
||||
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
|
||||
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F32 * 512) / (End_Timestamp - Start_Timestamp))))
|
||||
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
|
||||
tips:
|
||||
MFMA FLOPs (F64):
|
||||
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp)))
|
||||
unit: GFLOP
|
||||
peak: ((($max_sclk * $cu_per_gpu) * 256) / 1000)
|
||||
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F64 * 512) / (End_Timestamp - Start_Timestamp))))
|
||||
/ ((($max_sclk * $cu_per_gpu) * 256) / 1000))
|
||||
tips:
|
||||
MFMA FLOPs (F6F4):
|
||||
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_F6F4 * 512) / (End_Timestamp - Start_Timestamp)))
|
||||
unit: GFLOP
|
||||
peak: ((($max_sclk * $cu_per_gpu) * 16384) / 1000)
|
||||
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_F6F4 * 512) / (End_Timestamp - Start_Timestamp))))
|
||||
/ ((($max_sclk * $cu_per_gpu) * 16384) / 1000))
|
||||
tips:
|
||||
MFMA IOPs (INT8):
|
||||
value: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp)))
|
||||
unit: GIOP
|
||||
peak: ((($max_sclk * $cu_per_gpu) * 8192) / 1000)
|
||||
pop: ((100 * AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / (End_Timestamp - Start_Timestamp))))
|
||||
/ ((($max_sclk * $cu_per_gpu) * 8192) / 1000))
|
||||
tips:
|
||||
|
||||
- metric_table:
|
||||
id: 1102
|
||||
title: Pipeline Stats
|
||||
header:
|
||||
metric: Metric
|
||||
avg: Avg
|
||||
min: Min
|
||||
max: Max
|
||||
unit: Unit
|
||||
tips: Tips
|
||||
metric:
|
||||
IPC:
|
||||
avg: AVG((SQ_INSTS / SQ_BUSY_CU_CYCLES))
|
||||
min: MIN((SQ_INSTS / SQ_BUSY_CU_CYCLES))
|
||||
max: MAX((SQ_INSTS / SQ_BUSY_CU_CYCLES))
|
||||
unit: Instr/cycle
|
||||
tips:
|
||||
IPC (Issued):
|
||||
avg: AVG(((((((((SQ_INSTS_VALU + SQ_INSTS_VMEM) + SQ_INSTS_SALU) + SQ_INSTS_SMEM))
|
||||
+ SQ_INSTS_BRANCH) + SQ_INSTS_SENDMSG) + SQ_INSTS_VSKIPPED + SQ_INSTS_LDS)
|
||||
/ SQ_ACTIVE_INST_ANY))
|
||||
min: MIN(((((((((SQ_INSTS_VALU + SQ_INSTS_VMEM) + SQ_INSTS_SALU) + SQ_INSTS_SMEM))
|
||||
+ SQ_INSTS_BRANCH) + SQ_INSTS_SENDMSG) + SQ_INSTS_VSKIPPED + SQ_INSTS_LDS)
|
||||
/ SQ_ACTIVE_INST_ANY))
|
||||
max: MAX(((((((((SQ_INSTS_VALU + SQ_INSTS_VMEM) + SQ_INSTS_SALU) + SQ_INSTS_SMEM))
|
||||
+ SQ_INSTS_BRANCH) + SQ_INSTS_SENDMSG) + SQ_INSTS_VSKIPPED + SQ_INSTS_LDS)
|
||||
/ SQ_ACTIVE_INST_ANY))
|
||||
unit: Instr/cycle
|
||||
tips:
|
||||
SALU Utilization:
|
||||
avg: AVG((((100 * SQ_ACTIVE_INST_SCA) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
|
||||
min: MIN((((100 * SQ_ACTIVE_INST_SCA) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
|
||||
max: MAX((((100 * SQ_ACTIVE_INST_SCA) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
|
||||
unit: pct
|
||||
tips:
|
||||
VALU Utilization:
|
||||
avg: AVG((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
|
||||
min: MIN((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
|
||||
max: MAX((((100 * SQ_ACTIVE_INST_VALU) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
|
||||
unit: pct
|
||||
tips:
|
||||
# Percentage of VALU instructions that are issued to two VALUs at a time (co-issued)
|
||||
VALU Co-Issue Efficiency:
|
||||
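avg: AVG((((100 * SQ_ACTIVE_INST_VALU2) / (SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2)) if ((SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2) != 0) else None))
|
||||
min: MIN((((100 * SQ_ACTIVE_INST_VALU2) / (SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2)) if ((SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2) != 0) else None))
|
||||
max: MAX((((100 * SQ_ACTIVE_INST_VALU2) / (SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2)) if ((SQ_ACTIVE_INST_VALU - SQ_ACTIVE_INST_VALU2) != 0) else None))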
|
||||
unit: pct
|
||||
tips:
|
||||
VMEM Utilization:
|
||||
avg: AVG((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
|
||||
min: MIN((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
|
||||
max: MAX((((100 * (SQ_ACTIVE_INST_FLAT+SQ_ACTIVE_INST_VMEM)) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
|
||||
unit: pct
|
||||
tips:
|
||||
Branch Utilization:
|
||||
avg: AVG((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
|
||||
min: MIN((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
|
||||
max: MAX((((100 * SQ_ACTIVE_INST_MISC) / $GRBM_GUI_ACTIVE_PER_XCD) / $cu_per_gpu))
|
||||
unit: pct
|
||||
tips:
|
||||
VALU Active Threads:
|
||||
avg: AVG(((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU) if (SQ_ACTIVE_INST_VALU
|
||||
!= 0) else None))
|
||||
min: MIN(((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU) if (SQ_ACTIVE_INST_VALU
|
||||
!= 0) else None))
|
||||
max: MAX(((SQ_THREAD_CYCLES_VALU / SQ_ACTIVE_INST_VALU) if (SQ_ACTIVE_INST_VALU
|
||||
!= 0) else None))
|
||||
unit: Threads
|
||||
tips:
|
||||
MFMA Utilization:
|
||||
avg: AVG(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / ((4 * $cu_per_gpu) * $GRBM_GUI_ACTIVE_PER_XCD)))
|
||||
min: MIN(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / ((4 * $cu_per_gpu) * $GRBM_GUI_ACTIVE_PER_XCD)))
|
||||
max: MAX(((100 * SQ_VALU_MFMA_BUSY_CYCLES) / ((4 * $cu_per_gpu) * $GRBM_GUI_ACTIVE_PER_XCD)))
|
||||
unit: pct
|
||||
tips:
|
||||
MFMA Instr Cycles:
|
||||
avg: AVG(((SQ_VALU_MFMA_BUSY_CYCLES / SQ_INSTS_MFMA) if (SQ_INSTS_MFMA != 0)
|
||||
else None))
|
||||
min: MIN(((SQ_VALU_MFMA_BUSY_CYCLES / SQ_INSTS_MFMA) if (SQ_INSTS_MFMA != 0)
|
||||
else None))
|
||||
max: MAX(((SQ_VALU_MFMA_BUSY_CYCLES / SQ_INSTS_MFMA) if (SQ_INSTS_MFMA != 0)
|
||||
else None))
|
||||
unit: cycles/instr
|
||||
tips:
|
||||
VMEM Latency:
|
||||
avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_VMEM) if (SQ_INSTS_VMEM != 0)
|
||||
else None))
|
||||
min: MIN(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_VMEM) if (SQ_INSTS_VMEM != 0)
|
||||
else None))
|
||||
max: MAX(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_VMEM) if (SQ_INSTS_VMEM != 0)
|
||||
else None))
|
||||
unit: Cycles
|
||||
coll_level: SQ_INST_LEVEL_VMEM
|
||||
tips:
|
||||
SMEM Latency:
|
||||
avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_SMEM) if (SQ_INSTS_SMEM != 0)
|
||||
else None))
|
||||
min: MIN(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_SMEM) if (SQ_INSTS_SMEM != 0)
|
||||
else None))
|
||||
max: MAX(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_SMEM) if (SQ_INSTS_SMEM != 0)
|
||||
else None))
|
||||
unit: Cycles
|
||||
coll_level: SQ_INST_LEVEL_SMEM
|
||||
tips:
|
||||
|
||||
- metric_table:
|
||||
id: 1103
|
||||
title: Arithmetic Operations
|
||||
header:
|
||||
metric: Metric
|
||||
avg: Avg
|
||||
min: Min
|
||||
max: Max
|
||||
unit: Unit
|
||||
tips: Tips
|
||||
metric:
|
||||
FLOPs (Total):
|
||||
avg: AVG((((((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
|
||||
+ (SQ_INSTS_VALU_FMA_F16 * 2))) + ((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F16) + (512
|
||||
* SQ_INSTS_VALU_MFMA_MOPS_BF16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
|
||||
+ SQ_INSTS_VALU_TRANS_F32) + (SQ_INSTS_VALU_FMA_F32 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32))
|
||||
+ (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
|
||||
+ (SQ_INSTS_VALU_FMA_F64 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4)) /
|
||||
$denom))
|
||||
min: MIN((((((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
|
||||
+ (SQ_INSTS_VALU_FMA_F16 * 2))) + ((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F16) + (512
|
||||
* SQ_INSTS_VALU_MFMA_MOPS_BF16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
|
||||
+ SQ_INSTS_VALU_TRANS_F32) + (SQ_INSTS_VALU_FMA_F32 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32))
|
||||
+ (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
|
||||
+ (SQ_INSTS_VALU_FMA_F64 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4)) /
|
||||
$denom))
|
||||
max: MAX((((((((64 * (((SQ_INSTS_VALU_ADD_F16 + SQ_INSTS_VALU_MUL_F16) + SQ_INSTS_VALU_TRANS_F16)
|
||||
+ (SQ_INSTS_VALU_FMA_F16 * 2))) + ((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F16) + (512
|
||||
* SQ_INSTS_VALU_MFMA_MOPS_BF16))) + (64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32)
|
||||
+ SQ_INSTS_VALU_TRANS_F32) + (SQ_INSTS_VALU_FMA_F32 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32))
|
||||
+ (64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
|
||||
+ (SQ_INSTS_VALU_FMA_F64 * 2)))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4)) /
|
||||
$denom))
|
||||
unit: (OPs + $normUnit)
|
||||
tips:
|
||||
IOPs (Total):
|
||||
avg: AVG(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) + (SQ_INSTS_VALU_MFMA_MOPS_I8 * 512)) / $denom)
|
||||
min: MIN(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) + (SQ_INSTS_VALU_MFMA_MOPS_I8 * 512)) / $denom)
|
||||
max: MAX(((64 * (SQ_INSTS_VALU_INT32 + SQ_INSTS_VALU_INT64)) + (SQ_INSTS_VALU_MFMA_MOPS_I8 * 512)) / $denom)
|
||||
unit: (OPs + $normUnit)
|
||||
tips:
|
||||
F8 OPs:
|
||||
avg: AVG(((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) / $denom))
|
||||
min: MIN(((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) / $denom))
|
||||
max: MAX(((512 * SQ_INSTS_VALU_MFMA_MOPS_F8) / $denom))
|
||||
unit: (OPs + $normUnit)
|
||||
tips:
|
||||
F16 OPs:
|
||||
avg: AVG(((((((64 * SQ_INSTS_VALU_ADD_F16) + (64 * SQ_INSTS_VALU_MUL_F16)) +
|
||||
(64 * SQ_INSTS_VALU_TRANS_F16)) + (128 * SQ_INSTS_VALU_FMA_F16)) + (512 *
|
||||
SQ_INSTS_VALU_MFMA_MOPS_F16)) / $denom))
|
||||
min: MIN(((((((64 * SQ_INSTS_VALU_ADD_F16) + (64 * SQ_INSTS_VALU_MUL_F16)) +
|
||||
(64 * SQ_INSTS_VALU_TRANS_F16)) + (128 * SQ_INSTS_VALU_FMA_F16)) + (512 *
|
||||
SQ_INSTS_VALU_MFMA_MOPS_F16)) / $denom))
|
||||
max: MAX(((((((64 * SQ_INSTS_VALU_ADD_F16) + (64 * SQ_INSTS_VALU_MUL_F16)) +
|
||||
(64 * SQ_INSTS_VALU_TRANS_F16)) + (128 * SQ_INSTS_VALU_FMA_F16)) + (512 *
|
||||
SQ_INSTS_VALU_MFMA_MOPS_F16)) / $denom))
|
||||
unit: (OPs + $normUnit)
|
||||
tips:
|
||||
BF16 OPs:
|
||||
avg: AVG(((512 * SQ_INSTS_VALU_MFMA_MOPS_BF16) / $denom))
|
||||
min: MIN(((512 * SQ_INSTS_VALU_MFMA_MOPS_BF16) / $denom))
|
||||
max: MAX(((512 * SQ_INSTS_VALU_MFMA_MOPS_BF16) / $denom))
|
||||
unit: (OPs + $normUnit)
|
||||
tips:
|
||||
F32 OPs:
|
||||
avg: AVG((((64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32)
|
||||
+ (SQ_INSTS_VALU_FMA_F32 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32)) / $denom))
|
||||
min: MIN((((64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32)
|
||||
+ (SQ_INSTS_VALU_FMA_F32 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32)) / $denom))
|
||||
max: MAX((((64 * (((SQ_INSTS_VALU_ADD_F32 + SQ_INSTS_VALU_MUL_F32) + SQ_INSTS_VALU_TRANS_F32)
|
||||
+ (SQ_INSTS_VALU_FMA_F32 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F32)) / $denom))
|
||||
unit: (OPs + $normUnit)
|
||||
tips:
|
||||
F64 OPs:
|
||||
avg: AVG((((64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
|
||||
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
|
||||
min: MIN((((64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
|
||||
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
|
||||
max: MAX((((64 * (((SQ_INSTS_VALU_ADD_F64 + SQ_INSTS_VALU_MUL_F64) + SQ_INSTS_VALU_TRANS_F64)
|
||||
+ (SQ_INSTS_VALU_FMA_F64 * 2))) + (512 * SQ_INSTS_VALU_MFMA_MOPS_F64)) / $denom))
|
||||
unit: (OPs + $normUnit)
|
||||
tips:
|
||||
F6F4 OPs:
|
||||
avg: AVG((512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4) / $denom)
|
||||
min: MIN((512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4) / $denom)
|
||||
max: MAX((512 * SQ_INSTS_VALU_MFMA_MOPS_F6F4) / $denom)
|
||||
unit: (OPs + $normUnit)
|
||||
tips:
|
||||
INT8 OPs:
|
||||
avg: AVG(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
|
||||
min: MIN(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
|
||||
max: MAX(((SQ_INSTS_VALU_MFMA_MOPS_I8 * 512) / $denom))
|
||||
unit: (OPs + $normUnit)
|
||||
tips:
|
||||
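The Speed-of-Light rows in panel 1101 compare an achieved rate against a peak derived from clock and CU count. The sketch below reproduces that arithmetic. It assumes $max_sclk is expressed in MHz, which is what makes both the value and the peak come out in GFLOP/s even though the unit label reads GFLOP. The inputs are hypothetical and the helper names are illustrative only. The figure of 512 operations per MOPS count and the per-CU-per-clock factor N are copied from the config.

```python
def valu_flops(add, mul, trans, fma, lanes=64):
    # Config: 64 * (((ADD + MUL) + TRANS) + (2 * FMA)); an FMA counts as two ops.
    return lanes * (add + mul + trans + 2 * fma)


def mfma_gflops(mops, duration_ns, ops_per_mop=512):
    # value: (SQ_INSTS_VALU_MFMA_MOPS_<type> * 512) / (End_Timestamp - Start_Timestamp)
    # Operations per nanosecond is numerically equal to GFLOP/s.
    return mops * ops_per_mop / duration_ns


def peak_gflops(max_sclk_mhz, cu_per_gpu, flops_per_cu_per_clk):
    # peak: (($max_sclk * $cu_per_gpu) * N) / 1000; GFLOP/s if $max_sclk is in MHz.
    return max_sclk_mhz * cu_per_gpu * flops_per_cu_per_clk / 1000


def pct_of_peak(value, peak):
    # pop: 100 * value / peak
    return 100 * value / peak


# Hypothetical inputs for an F8 MFMA kernel (N = 8192 in the F8 row above).
peak = peak_gflops(2000, 100, 8192)
value = mfma_gflops(1_600_000_000, 1_000_000)
print(peak, value, pct_of_peak(value, peak))  # 1638400.0 819200.0 50.0
```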
+166
@@ -0,0 +1,166 @@
|
||||
---
|
||||
# Add description/tips for each metric in this section.
|
||||
# They can then be shown on hover.
|
||||
Metric Description:
|
||||
|
||||
# Define the panel properties and properties of each metric in the panel.
|
||||
Panel Config:
|
||||
id: 1200
|
||||
title: Local Data Share (LDS)
|
||||
data source:
|
||||
- metric_table:
|
||||
id: 1201
|
||||
title: Speed-of-Light
|
||||
header:
|
||||
metric: Metric
|
||||
value: Avg
|
||||
unit: Unit
|
||||
tips: Tips
|
||||
metric:
|
||||
Utilization:
|
||||
value: AVG(((100 * SQ_LDS_IDX_ACTIVE) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
|
||||
unit: Pct of Peak
|
||||
tips:
|
||||
Access Rate:
|
||||
value: AVG(((200 * SQ_ACTIVE_INST_LDS) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
|
||||
unit: Pct of Peak
|
||||
tips:
|
||||
Theoretical Bandwidth (% of Peak):
|
||||
value: AVG((((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
|
||||
/ (End_Timestamp - Start_Timestamp)) / (($max_sclk * $cu_per_gpu) * 0.00128)))
|
||||
unit: Pct of Peak
|
||||
tips:
|
||||
Bank Conflict Rate:
|
||||
value: AVG((((SQ_LDS_BANK_CONFLICT * 3.125) / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
|
||||
if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
|
||||
unit: Pct of Peak
|
||||
tips:
|
||||
comparable: false # for now
|
||||
cli_style: simple_bar
|
||||
|
||||
- metric_table:
|
||||
id: 1202
|
||||
title: LDS Stats
|
||||
header:
|
||||
metric: Metric
|
||||
avg: Avg
|
||||
min: Min
|
||||
max: Max
|
||||
unit: Unit
|
||||
tips: Tips
|
||||
metric:
|
||||
LDS Instrs:
|
||||
avg: AVG((SQ_INSTS_LDS / $denom))
|
||||
min: MIN((SQ_INSTS_LDS / $denom))
|
||||
max: MAX((SQ_INSTS_LDS / $denom))
|
||||
unit: (Instr + $normUnit)
|
||||
tips:
|
||||
LDS LOAD:
|
||||
avg: AVG((SQ_INSTS_LDS_LOAD / $denom))
|
||||
min: MIN((SQ_INSTS_LDS_LOAD / $denom))
|
||||
max: MAX((SQ_INSTS_LDS_LOAD / $denom))
|
||||
unit: (instr + $normUnit)
|
||||
tips:
|
||||
LDS STORE:
|
||||
avg: AVG((SQ_INSTS_LDS_STORE / $denom))
|
||||
min: MIN((SQ_INSTS_LDS_STORE / $denom))
|
||||
max: MAX((SQ_INSTS_LDS_STORE / $denom))
|
||||
unit: (instr + $normUnit)
|
||||
tips:
|
||||
LDS ATOMIC:
|
||||
avg: AVG((SQ_INSTS_LDS_ATOMIC / $denom))
|
||||
min: MIN((SQ_INSTS_LDS_ATOMIC / $denom))
|
||||
max: MAX((SQ_INSTS_LDS_ATOMIC / $denom))
|
||||
unit: (instr + $normUnit)
|
||||
tips:
|
||||
LDS LOAD Bandwidth:
|
||||
avg: AVG(64 * SQ_INSTS_LDS_LOAD_BANDWIDTH / (End_Timestamp - Start_Timestamp))
|
||||
min: MIN(64 * SQ_INSTS_LDS_LOAD_BANDWIDTH / (End_Timestamp - Start_Timestamp))
|
||||
max: MAX(64 * SQ_INSTS_LDS_LOAD_BANDWIDTH / (End_Timestamp - Start_Timestamp))
|
||||
unit: Gbps
|
||||
tips:
|
||||
LDS STORE Bandwidth:
|
||||
avg: AVG(64 * SQ_INSTS_LDS_STORE_BANDWIDTH / (End_Timestamp - Start_Timestamp))
|
||||
min: MIN(64 * SQ_INSTS_LDS_STORE_BANDWIDTH / (End_Timestamp - Start_Timestamp))
|
||||
max: MAX(64 * SQ_INSTS_LDS_STORE_BANDWIDTH / (End_Timestamp - Start_Timestamp))
|
||||
unit: Gbps
|
||||
tips:
|
||||
LDS ATOMIC Bandwidth:
|
||||
avg: AVG(64 * SQ_INSTS_LDS_ATOMIC_BANDWIDTH / (End_Timestamp - Start_Timestamp))
|
||||
min: MIN(64 * SQ_INSTS_LDS_ATOMIC_BANDWIDTH / (End_Timestamp - Start_Timestamp))
|
||||
max: MAX(64 * SQ_INSTS_LDS_ATOMIC_BANDWIDTH / (End_Timestamp - Start_Timestamp))
|
||||
unit: Gbps
|
||||
tips:
|
||||
Theoretical Bandwidth:
|
||||
avg: AVG(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
|
||||
/ $denom))
|
||||
min: MIN(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
|
||||
/ $denom))
|
||||
max: MAX(((((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu))
|
||||
/ $denom))
|
||||
unit: (Bytes + $normUnit)
|
||||
tips:
|
||||
LDS Latency:
|
||||
avg: AVG(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else None))
|
||||
min: MIN(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else None))
|
||||
max: MAX(((SQ_ACCUM_PREV_HIRES / SQ_INSTS_LDS) if (SQ_INSTS_LDS != 0) else None))
|
||||
unit: Cycles
|
||||
coll_level: SQ_INST_LEVEL_LDS
|
||||
tips:
|
||||
Bank Conflicts/Access:
|
||||
avg: AVG(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
|
||||
if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
|
||||
min: MIN(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
|
||||
if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
|
||||
max: MAX(((SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT))
|
||||
if ((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) != 0) else None))
|
||||
unit: Conflicts/Access
|
||||
tips:
|
||||
Index Accesses:
|
||||
avg: AVG((SQ_LDS_IDX_ACTIVE / $denom))
|
||||
min: MIN((SQ_LDS_IDX_ACTIVE / $denom))
|
||||
max: MAX((SQ_LDS_IDX_ACTIVE / $denom))
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
Atomic Return Cycles:
|
||||
avg: AVG((SQ_LDS_ATOMIC_RETURN / $denom))
|
||||
min: MIN((SQ_LDS_ATOMIC_RETURN / $denom))
|
||||
max: MAX((SQ_LDS_ATOMIC_RETURN / $denom))
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
Bank Conflict:
|
||||
avg: AVG((SQ_LDS_BANK_CONFLICT / $denom))
|
||||
min: MIN((SQ_LDS_BANK_CONFLICT / $denom))
|
||||
max: MAX((SQ_LDS_BANK_CONFLICT / $denom))
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
Addr Conflict:
|
||||
avg: AVG((SQ_LDS_ADDR_CONFLICT / $denom))
|
||||
min: MIN((SQ_LDS_ADDR_CONFLICT / $denom))
|
||||
max: MAX((SQ_LDS_ADDR_CONFLICT / $denom))
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
Unaligned Stall:
|
||||
avg: AVG((SQ_LDS_UNALIGNED_STALL / $denom))
|
||||
min: MIN((SQ_LDS_UNALIGNED_STALL / $denom))
|
||||
max: MAX((SQ_LDS_UNALIGNED_STALL / $denom))
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
Mem Violations:
|
||||
avg: AVG((SQ_LDS_MEM_VIOLATIONS / $denom))
|
||||
min: MIN((SQ_LDS_MEM_VIOLATIONS / $denom))
|
||||
max: MAX((SQ_LDS_MEM_VIOLATIONS / $denom))
|
||||
unit: (Accesses + $normUnit)
|
||||
tips:
|
||||
LDS Command FIFO Full Rate:
|
||||
avg: AVG((SQ_LDS_CMD_FIFO_FULL / $denom))
|
||||
min: MIN((SQ_LDS_CMD_FIFO_FULL / $denom))
|
||||
max: MAX((SQ_LDS_CMD_FIFO_FULL / $denom))
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
LDS Data FIFO Full Rate:
|
||||
avg: AVG((SQ_LDS_DATA_FIFO_FULL / $denom))
|
||||
min: MIN((SQ_LDS_DATA_FIFO_FULL / $denom))
|
||||
max: MAX((SQ_LDS_DATA_FIFO_FULL / $denom))
|
||||
unit: (Cycles + $normUnit)
|
||||
tips:
|
||||
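The LDS Theoretical Bandwidth and Bank Conflicts/Access rows can be evaluated as shown in the Python sketch below, which uses hypothetical counter values. Treating the factor 4 as bytes per bank per conflict-free cycle is an assumption. $lds_banks_per_cu is passed as a string to mirror the TO_INT conversion in the config.

```python
def lds_theoretical_bytes(idx_active, bank_conflict, lds_banks_per_cu, denom):
    # Config: (((SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT) * 4) * TO_INT($lds_banks_per_cu)) / $denom
    # Assumed reading: conflict-free active cycles x 4 bytes per bank x number of banks.
    return (idx_active - bank_conflict) * 4 * int(lds_banks_per_cu) / denom


def bank_conflicts_per_access(idx_active, bank_conflict):
    # Config: SQ_LDS_BANK_CONFLICT / (SQ_LDS_IDX_ACTIVE - SQ_LDS_BANK_CONFLICT),
    # or None when the denominator is zero.
    useful = idx_active - bank_conflict
    return bank_conflict / useful if useful != 0 else None


print(lds_theoretical_bytes(1000, 200, "32", 1))  # 102400.0
print(bank_conflicts_per_access(1000, 200))       # 0.25
print(bank_conflicts_per_access(0, 0))            # None
```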
+105
@@ -0,0 +1,105 @@
|
||||
---
|
||||
# Add description/tips for each metric in this section.
|
||||
# They can then be shown on hover.
|
||||
Metric Description:
|
||||
|
||||
# Define the panel properties and properties of each metric in the panel.
|
||||
Panel Config:
|
||||
id: 1300
|
||||
title: Instruction Cache
|
||||
data source:
|
||||
- metric_table:
|
||||
id: 1301
|
||||
title: Speed-of-Light
|
||||
header:
|
||||
metric: Metric
|
||||
value: Avg
|
||||
unit: Unit
|
||||
tips: Tips
|
||||
metric:
|
||||
Bandwidth:
|
||||
value: AVG(((SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu)
|
||||
* (End_Timestamp - Start_Timestamp))))
|
||||
unit: Pct of Peak
|
||||
tips:
|
||||
Cache Hit Rate:
|
||||
value: AVG(((SQC_ICACHE_HITS * 100) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
|
||||
+ SQC_ICACHE_MISSES_DUPLICATE)))
|
||||
unit: Pct of Peak
|
||||
tips:
|
||||
L1I-L2 Bandwidth:
|
||||
value: AVG(((SQC_TC_INST_REQ * 100000) / (2 * ($max_sclk * $sqc_per_gpu)
|
||||
* (End_Timestamp - Start_Timestamp))))
|
||||
unit: Pct of Peak
|
||||
tips:
|
||||
comparable: false # for now
|
||||
cli_style: simple_bar
|
||||
|
||||
- metric_table:
|
||||
id: 1302
|
||||
title: Instruction Cache Accesses
|
||||
header:
|
||||
metric: Metric
|
||||
avg: Avg
|
||||
min: Min
|
||||
max: Max
|
||||
unit: Unit
|
||||
tips: Tips
|
||||
metric:
|
||||
Req:
|
||||
avg: AVG((SQC_ICACHE_REQ / $denom))
|
||||
min: MIN((SQC_ICACHE_REQ / $denom))
|
||||
max: MAX((SQC_ICACHE_REQ / $denom))
|
||||
unit: (Req + $normUnit)
|
||||
tips:
|
||||
Hits:
|
||||
avg: AVG((SQC_ICACHE_HITS / $denom))
|
||||
min: MIN((SQC_ICACHE_HITS / $denom))
|
||||
max: MAX((SQC_ICACHE_HITS / $denom))
|
||||
unit: (Hits + $normUnit)
|
||||
tips:
|
||||
Misses - Non Duplicated:
|
||||
avg: AVG((SQC_ICACHE_MISSES / $denom))
|
||||
min: MIN((SQC_ICACHE_MISSES / $denom))
|
||||
max: MAX((SQC_ICACHE_MISSES / $denom))
|
||||
unit: (Misses + $normUnit)
|
||||
tips:
|
||||
Misses - Duplicated:
|
||||
avg: AVG((SQC_ICACHE_MISSES_DUPLICATE / $denom))
|
||||
min: MIN((SQC_ICACHE_MISSES_DUPLICATE / $denom))
|
||||
max: MAX((SQC_ICACHE_MISSES_DUPLICATE / $denom))
|
||||
unit: (Misses + $normUnit)
|
||||
tips:
|
||||
Cache Hit Rate:
|
||||
avg: AVG(((100 * SQC_ICACHE_HITS) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
|
||||
+ SQC_ICACHE_MISSES_DUPLICATE)))
|
||||
min: MIN(((100 * SQC_ICACHE_HITS) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
|
||||
+ SQC_ICACHE_MISSES_DUPLICATE)))
|
||||
max: MAX(((100 * SQC_ICACHE_HITS) / ((SQC_ICACHE_HITS + SQC_ICACHE_MISSES)
|
||||
+ SQC_ICACHE_MISSES_DUPLICATE)))
|
||||
unit: pct
|
||||
tips:
|
||||
Instruction Fetch Latency:
|
||||
avg: AVG((SQ_ACCUM_PREV_HIRES / SQ_IFETCH))
|
||||
min: MIN((SQ_ACCUM_PREV_HIRES / SQ_IFETCH))
|
||||
max: MAX((SQ_ACCUM_PREV_HIRES / SQ_IFETCH))
|
||||
unit: Cycles
|
||||
coll_level: SQ_IFETCH_LEVEL
|
||||
tips:
|
||||
- metric_table:
|
||||
id: 1303
|
||||
title: Instruction Cache - L2 Interface
|
||||
header:
|
||||
metric: Metric
|
||||
avg: Avg
|
||||
min: Min
|
||||
max: Max
|
||||
unit: Unit
|
||||
tips: Tips
|
||||
metric:
|
||||
L1I-L2 Bandwidth:
|
||||
avg: AVG(((SQC_TC_INST_REQ * 64) / $denom))
|
||||
min: MIN(((SQC_TC_INST_REQ * 64) / $denom))
|
||||
max: MAX(((SQC_TC_INST_REQ * 64) / $denom))
|
||||
unit: (Bytes + $normUnit)
|
||||
tips:
|
||||
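The instruction-cache Speed-of-Light bandwidth uses a constant of 100000 that falls out of unit conversion. The sketch below shows the derivation in comments. It assumes that $max_sclk is in MHz, that timestamps are in nanoseconds, and that each SQC can accept one instruction-cache request per cycle. These are assumptions, not statements from the config. Function names and inputs are hypothetical.

```python
def hit_rate_pct(hits, misses, misses_duplicate):
    # Config: 100 * SQC_ICACHE_HITS / ((HITS + MISSES) + MISSES_DUPLICATE)
    total = hits + misses + misses_duplicate
    return 100 * hits / total if total != 0 else None


def icache_bw_pct(req, max_sclk_mhz, sqc_per_gpu, duration_ns):
    # Config: (SQC_ICACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu) * duration_ns)
    # Derivation under the stated assumptions:
    #   achieved = req / (duration_ns * 1e-9)        requests/s
    #   peak     = max_sclk_mhz * 1e6 * sqc_per_gpu  requests/s
    #   100 * achieved / peak = req * 1e5 / (max_sclk_mhz * sqc_per_gpu * duration_ns)
    return req * 100000 / (max_sclk_mhz * sqc_per_gpu * duration_ns)


print(hit_rate_pct(90, 8, 2))                  # 90.0
print(icache_bw_pct(64_000, 2000, 64, 1000))   # 50.0
```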
+171
@@ -0,0 +1,171 @@
|
||||
---
|
||||
# Add description/tips for each metric in this section.
|
||||
# They can then be shown on hover.
|
||||
Metric Description:
|
||||
|
||||
# Define the panel properties and properties of each metric in the panel.
|
||||
Panel Config:
|
||||
id: 1400
|
||||
title: Scalar L1 Data Cache
|
||||
data source:
|
||||
- metric_table:
|
||||
id: 1401
|
||||
title: Speed-of-Light
|
||||
header:
|
||||
metric: Metric
|
||||
value: Avg
|
||||
unit: Unit
|
||||
tips: Tips
|
||||
metric:
|
||||
Bandwidth:
|
||||
value: AVG(((SQC_DCACHE_REQ * 100000) / (($max_sclk * $sqc_per_gpu)
|
||||
* (End_Timestamp - Start_Timestamp))))
|
||||
unit: Pct of Peak
|
||||
tips:
|
||||
Cache Hit Rate:
|
||||
value: AVG((((SQC_DCACHE_HITS * 100) / (SQC_DCACHE_HITS + SQC_DCACHE_MISSES + SQC_DCACHE_MISSES_DUPLICATE))
|
||||
if ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES + SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
|
||||
unit: Pct of Peak
|
||||
tips:
|
||||
sL1D-L2 BW:
|
||||
value: AVG(((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ) * 100000)
|
||||
/ (2 * ($max_sclk * $sqc_per_gpu) * (End_Timestamp - Start_Timestamp)))
|
||||
unit: Pct of Peak
|
||||
tips:
|
||||
comparable: false # for now
|
||||
cli_style: simple_bar
|
||||
|
||||
- metric_table:
|
||||
id: 1402
|
||||
title: Scalar L1D Cache Accesses
|
||||
header:
|
||||
metric: Metric
|
||||
avg: Avg
|
||||
min: Min
|
||||
max: Max
|
||||
unit: Unit
|
||||
tips: Tips
|
||||
metric:
|
||||
Req:
|
||||
avg: AVG((SQC_DCACHE_REQ / $denom))
|
||||
min: MIN((SQC_DCACHE_REQ / $denom))
|
||||
max: MAX((SQC_DCACHE_REQ / $denom))
|
||||
unit: (Req + $normUnit)
|
||||
tips:
|
||||
Hits:
|
||||
avg: AVG((SQC_DCACHE_HITS / $denom))
|
||||
min: MIN((SQC_DCACHE_HITS / $denom))
|
||||
max: MAX((SQC_DCACHE_HITS / $denom))
|
||||
unit: (Req + $normUnit)
|
||||
tips:
|
||||
Misses - Non Duplicated:
|
||||
avg: AVG((SQC_DCACHE_MISSES / $denom))
|
||||
min: MIN((SQC_DCACHE_MISSES / $denom))
|
||||
max: MAX((SQC_DCACHE_MISSES / $denom))
|
||||
unit: (Req + $normUnit)
|
||||
tips:
|
||||
Misses - Duplicated:
|
||||
avg: AVG((SQC_DCACHE_MISSES_DUPLICATE / $denom))
|
||||
min: MIN((SQC_DCACHE_MISSES_DUPLICATE / $denom))
|
||||
max: MAX((SQC_DCACHE_MISSES_DUPLICATE / $denom))
|
||||
unit: (Req + $normUnit)
|
||||
tips:
|
||||
Cache Hit Rate:
|
||||
avg: AVG((((100 * SQC_DCACHE_HITS) / ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
|
||||
+ SQC_DCACHE_MISSES_DUPLICATE)) if (((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
|
||||
+ SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
|
||||
min: MIN((((100 * SQC_DCACHE_HITS) / ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
|
||||
+ SQC_DCACHE_MISSES_DUPLICATE)) if (((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
|
||||
+ SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
|
||||
max: MAX((((100 * SQC_DCACHE_HITS) / ((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
|
||||
+ SQC_DCACHE_MISSES_DUPLICATE)) if (((SQC_DCACHE_HITS + SQC_DCACHE_MISSES)
|
||||
+ SQC_DCACHE_MISSES_DUPLICATE) != 0) else None))
|
||||
unit: pct
|
||||
tips:
|
||||
Read Req (Total):
|
||||
avg: AVG((((((SQC_DCACHE_REQ_READ_1 + SQC_DCACHE_REQ_READ_2) + SQC_DCACHE_REQ_READ_4)
|
||||
+ SQC_DCACHE_REQ_READ_8) + SQC_DCACHE_REQ_READ_16) / $denom))
|
||||
min: MIN((((((SQC_DCACHE_REQ_READ_1 + SQC_DCACHE_REQ_READ_2) + SQC_DCACHE_REQ_READ_4)
|
||||
+ SQC_DCACHE_REQ_READ_8) + SQC_DCACHE_REQ_READ_16) / $denom))
|
||||
max: MAX((((((SQC_DCACHE_REQ_READ_1 + SQC_DCACHE_REQ_READ_2) + SQC_DCACHE_REQ_READ_4)
|
||||
+ SQC_DCACHE_REQ_READ_8) + SQC_DCACHE_REQ_READ_16) / $denom))
|
||||
unit: (Req + $normUnit)
|
||||
tips:
|
||||
Atomic Req:
|
||||
avg: AVG((SQC_DCACHE_ATOMIC / $denom))
|
||||
min: MIN((SQC_DCACHE_ATOMIC / $denom))
|
||||
max: MAX((SQC_DCACHE_ATOMIC / $denom))
|
||||
unit: (Req + $normUnit)
|
||||
tips:
|
||||
Read Req (1 DWord):
|
||||
avg: AVG((SQC_DCACHE_REQ_READ_1 / $denom))
|
||||
min: MIN((SQC_DCACHE_REQ_READ_1 / $denom))
|
||||
max: MAX((SQC_DCACHE_REQ_READ_1 / $denom))
|
||||
unit: (Req + $normUnit)
|
||||
tips:
|
||||
Read Req (2 DWord):
|
||||
avg: AVG((SQC_DCACHE_REQ_READ_2 / $denom))
|
||||
min: MIN((SQC_DCACHE_REQ_READ_2 / $denom))
|
||||
max: MAX((SQC_DCACHE_REQ_READ_2 / $denom))
|
||||
unit: (Req + $normUnit)
|
||||
tips:
|
||||
Read Req (4 DWord):
|
||||
avg: AVG((SQC_DCACHE_REQ_READ_4 / $denom))
|
||||
min: MIN((SQC_DCACHE_REQ_READ_4 / $denom))
|
||||
max: MAX((SQC_DCACHE_REQ_READ_4 / $denom))
|
||||
unit: (Req + $normUnit)
|
||||
tips:
|
||||
Read Req (8 DWord):
|
||||
avg: AVG((SQC_DCACHE_REQ_READ_8 / $denom))
|
||||
min: MIN((SQC_DCACHE_REQ_READ_8 / $denom))
|
||||
max: MAX((SQC_DCACHE_REQ_READ_8 / $denom))
|
||||
unit: (Req + $normUnit)
|
||||
tips:
|
||||
Read Req (16 DWord):
|
||||
avg: AVG((SQC_DCACHE_REQ_READ_16 / $denom))
|
||||
min: MIN((SQC_DCACHE_REQ_READ_16 / $denom))
|
||||
max: MAX((SQC_DCACHE_REQ_READ_16 / $denom))
|
||||
unit: (Req + $normUnit)
|
||||
tips:
|
||||
|
||||
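Every row of table 1402 follows one pattern: divide a raw counter (or a sum of counters) by `$denom` for each dispatch, then reduce with AVG, MIN, and MAX. Below is a minimal sketch for `Read Req (Total)`. It assumes `$denom` has already been resolved to one number per dispatch; the real value depends on the normalization unit selected at analysis time. The total counts requests, not DWords, so a 16-DWord read adds 1, the same as a 1-DWord read.

```python
from statistics import mean

# The five sized scalar L1D read counters that "Read Req (Total)" adds together.
READ_COUNTERS = (
    "SQC_DCACHE_REQ_READ_1",
    "SQC_DCACHE_REQ_READ_2",
    "SQC_DCACHE_REQ_READ_4",
    "SQC_DCACHE_REQ_READ_8",
    "SQC_DCACHE_REQ_READ_16",
)


def read_req_total(counters: dict, denom: float) -> float:
    """Read requests of every size for one dispatch, divided by that dispatch's denom."""
    return sum(counters[name] for name in READ_COUNTERS) / denom


def summarize(values: list) -> dict:
    """Reduce per-dispatch values to the Avg/Min/Max columns of the table."""
    return {"avg": mean(values), "min": min(values), "max": max(values)}


# Made-up (counters, denom) pairs for two dispatches.
dispatches = [
    (dict(zip(READ_COUNTERS, (10, 5, 3, 1, 1))), 4.0),  # 20 requests / 4 -> 5.0
    (dict(zip(READ_COUNTERS, (30, 0, 0, 0, 0))), 2.0),  # 30 requests / 2 -> 15.0
]
row = summarize([read_req_total(c, d) for c, d in dispatches])
```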
- metric_table:
id: 1403
title: Scalar L1D Cache - L2 Interface
header:
metric: Metric
avg: Avg
min: Min
max: Max
unit: Unit
tips: Tips
metric:
sL1D-L2 BW:
avg: AVG(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ) * 64)) / $denom))
min: MIN(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ) * 64)) / $denom))
max: MAX(((((SQC_TC_DATA_READ_REQ + SQC_TC_DATA_WRITE_REQ + SQC_TC_DATA_ATOMIC_REQ) * 64)) / $denom))
unit: (Bytes + $normUnit)
tips:
Read Req:
avg: AVG((SQC_TC_DATA_READ_REQ / $denom))
min: MIN((SQC_TC_DATA_READ_REQ / $denom))
max: MAX((SQC_TC_DATA_READ_REQ / $denom))
unit: (Req + $normUnit)
tips:
Write Req:
avg: AVG((SQC_TC_DATA_WRITE_REQ / $denom))
min: MIN((SQC_TC_DATA_WRITE_REQ / $denom))
max: MAX((SQC_TC_DATA_WRITE_REQ / $denom))
unit: (Req + $normUnit)
tips:
Atomic Req:
avg: AVG((SQC_TC_DATA_ATOMIC_REQ / $denom))
min: MIN((SQC_TC_DATA_ATOMIC_REQ / $denom))
max: MAX((SQC_TC_DATA_ATOMIC_REQ / $denom))
unit: (Req + $normUnit)
tips:
Stall Cycles:
avg: AVG((SQC_TC_STALL / $denom))
min: MIN((SQC_TC_STALL / $denom))
max: MAX((SQC_TC_STALL / $denom))
unit: (Cycles + $normUnit)
tips:
@@ -43,6 +43,24 @@ Panel Config:
max: MAX(((100 * TA_ADDR_STALLED_BY_TD_CYCLES_sum) / ($GRBM_GUI_ACTIVE_PER_XCD * $cu_per_gpu)))
unit: pct
tips:
Sequencer → TA Address Stall:
avg: AVG((SQ_VMEM_TA_ADDR_FIFO_FULL / $denom))
min: MIN((SQ_VMEM_TA_ADDR_FIFO_FULL / $denom))
max: MAX((SQ_VMEM_TA_ADDR_FIFO_FULL / $denom))
unit: (Cycles + $normUnit)
tips:
Sequencer → TA Command Stall:
avg: AVG((SQ_VMEM_TA_CMD_FIFO_FULL / $denom))
min: MIN((SQ_VMEM_TA_CMD_FIFO_FULL / $denom))
max: MAX((SQ_VMEM_TA_CMD_FIFO_FULL / $denom))
unit: (Cycles + $normUnit)
tips:
Sequencer → TA Data Stall:
avg: AVG((SQ_VMEM_WR_TA_DATA_FIFO_FULL / $denom))
min: MIN((SQ_VMEM_WR_TA_DATA_FIFO_FULL / $denom))
max: MAX((SQ_VMEM_WR_TA_DATA_FIFO_FULL / $denom))
unit: (Cycles + $normUnit)
tips:
Total Instructions:
avg: AVG((TA_TOTAL_WAVEFRONTS_sum / $denom))
min: MIN((TA_TOTAL_WAVEFRONTS_sum / $denom))

@@ -32,12 +32,12 @@ Panel Config:
tips:
L2-Fabric Read BW:
value: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64) + (TCC_EA0_RDREQ_128B_sum
* 128)) / (End_Timestamp - Start_Timestamp))
* 128)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:
L2-Fabric Write and Atomic BW:
value: AVG((((TCC_EA0_WRREQ_64B_sum * 64) + ((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_64B_sum)
* 32)) / (End_Timestamp - Start_Timestamp))
* 32)) / (End_Timestamp - Start_Timestamp)))
unit: GB/s
tips:

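The two L2-Fabric rows convert EA request counts to bytes by request size and divide by the kernel duration. If the timestamps are in nanoseconds (an assumption here, though consistent with the `GB/s` unit, since one byte per nanosecond is 1 GB/s), no further scaling is needed. A sketch of the updated per-dispatch formulas, before AVG is applied:

```python
def l2_fabric_read_bw(c: dict) -> float:
    """Read bytes from 32B, 64B, and 128B EA read requests, per unit of kernel time."""
    read_bytes = (
        c["TCC_EA0_RDREQ_32B_sum"] * 32
        + c["TCC_EA0_RDREQ_64B_sum"] * 64
        + c["TCC_EA0_RDREQ_128B_sum"] * 128
    )
    return read_bytes / (c["End_Timestamp"] - c["Start_Timestamp"])


def l2_fabric_write_bw(c: dict) -> float:
    """Write and atomic bytes: 64B requests count 64 bytes, all other requests 32 bytes."""
    wr_64b = c["TCC_EA0_WRREQ_64B_sum"]
    write_bytes = wr_64b * 64 + (c["TCC_EA0_WRREQ_sum"] - wr_64b) * 32
    return write_bytes / (c["End_Timestamp"] - c["Start_Timestamp"])


# Made-up counts for one 1000 ns dispatch.
sample = {
    "TCC_EA0_RDREQ_32B_sum": 100, "TCC_EA0_RDREQ_64B_sum": 50, "TCC_EA0_RDREQ_128B_sum": 25,
    "TCC_EA0_WRREQ_sum": 30, "TCC_EA0_WRREQ_64B_sum": 10,
    "Start_Timestamp": 0, "End_Timestamp": 1000,
}
read_gbps = l2_fabric_read_bw(sample)    # 9600 bytes / 1000 ns = 9.6
write_gbps = l2_fabric_write_bw(sample)  # (640 + 640) bytes / 1000 ns = 1.28
```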
@@ -52,6 +52,15 @@ Panel Config:
unit: Unit
tips: Tips
metric:
Read BW:
avg: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
* 64)) / $denom))
min: MIN((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
* 64)) / $denom))
max: MAX((((TCC_EA0_RDREQ_32B_sum * 32) + ((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_32B_sum)
* 64)) / $denom))
unit: (Bytes + $normUnit)
tips:
Read BW:
avg: AVG((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64) + (TCC_EA0_RDREQ_128B_sum * 128)) / $denom))
min: MIN((((TCC_EA0_RDREQ_32B_sum * 32) + (TCC_EA0_RDREQ_64B_sum * 64) + (TCC_EA0_RDREQ_128B_sum * 128)) / $denom))
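The hunk above swaps the per-`$denom` Read BW formula. The old one charged every non-32B read at 64 bytes; the new one counts 32B, 64B, and 128B reads at their own sizes. Assuming, as the old formula implies, that `TCC_EA0_RDREQ_sum` counts reads of every size, the old version undercounts by 64 bytes per 128B read. The sketch below illustrates this with made-up counts:

```python
def read_bytes_old(c: dict) -> int:
    """Old formula: 32B reads at 32 bytes, every other read assumed to be 64 bytes."""
    small = c["TCC_EA0_RDREQ_32B_sum"]
    return small * 32 + (c["TCC_EA0_RDREQ_sum"] - small) * 64


def read_bytes_new(c: dict) -> int:
    """New formula: each read size is counted explicitly."""
    return (
        c["TCC_EA0_RDREQ_32B_sum"] * 32
        + c["TCC_EA0_RDREQ_64B_sum"] * 64
        + c["TCC_EA0_RDREQ_128B_sum"] * 128
    )


counts = {"TCC_EA0_RDREQ_32B_sum": 100, "TCC_EA0_RDREQ_64B_sum": 50, "TCC_EA0_RDREQ_128B_sum": 25}
counts["TCC_EA0_RDREQ_sum"] = 100 + 50 + 25  # assumed: total = sum of all sizes

old, new = read_bytes_old(counts), read_bytes_new(counts)  # 8000 vs 9600 bytes
missing = new - old  # 25 reads of 128B, each undercounted by 64 bytes
```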
@@ -457,13 +466,13 @@ Panel Config:
max: MAX((TCC_EA0_RD_UNCACHED_32B_sum / $denom))
unit: (Req + $normUnit)
tips:
Read - HBM:
HBM Read:
avg: AVG((TCC_EA0_RDREQ_DRAM_sum / $denom))
min: MIN((TCC_EA0_RDREQ_DRAM_sum / $denom))
max: MAX((TCC_EA0_RDREQ_DRAM_sum / $denom))
unit: (Req + $normUnit)
tips:
Read - Remote:
Remote Read:
avg: AVG((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
min: MIN((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
max: MAX((MAX((TCC_EA0_RDREQ_sum - TCC_EA0_RDREQ_DRAM_sum), 0) / $denom))
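`Remote Read`, like `Remote Write and Atomic` in the next hunk, is the EA request total minus the DRAM-bound part. The inner `MAX(..., 0)` clamps it at zero so that a DRAM count slightly above the total can never produce a negative request rate. A one-function sketch with made-up values:

```python
def remote_requests(total: float, dram: float, denom: float) -> float:
    """Requests not sent to local HBM, clamped at zero, then normalized by denom."""
    return max(total - dram, 0) / denom


# e.g. TCC_EA0_RDREQ_sum = 120, TCC_EA0_RDREQ_DRAM_sum = 100, denom = 2 (made-up values)
remote = remote_requests(120, 100, 2)  # 10.0
clamped = remote_requests(98, 100, 2)  # 0.0 rather than -1.0
```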
@@ -505,13 +514,13 @@ Panel Config:
max: MAX((TCC_EA0_WRREQ_64B_sum / $denom))
unit: (Req + $normUnit)
tips:
Write - HBM:
HBM Write and Atomic:
avg: AVG((TCC_EA0_WRREQ_WRITE_DRAM_sum / $denom))
min: MIN((TCC_EA0_WRREQ_WRITE_DRAM_sum / $denom))
max: MAX((TCC_EA0_WRREQ_WRITE_DRAM_sum / $denom))
unit: (Req + $normUnit)
tips:
Write and Atomic - Remote:
Remote Write and Atomic:
avg: AVG((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
min: MIN((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))
max: MAX((MAX((TCC_EA0_WRREQ_sum - TCC_EA0_WRREQ_DRAM_sum), 0) / $denom))

@@ -0,0 +1,9 @@
---
Panel Config:
id: 2100
title: PC Sampling
data source:
- pc_sampling_table:
id: 2101
source: ps_file
comparable: false # enable it later
@@ -42,15 +42,16 @@ from utils.logger import (
console_warning,
demarcate,
)
from utils.mi_gpu_spec import get_gpu_model, get_gpu_series
from utils.mi_gpu_spec import get_gpu_model, get_gpu_series, get_num_xcds
from utils.parser import build_in_vars, supported_denom
from utils.utils import (
capture_subprocess_output,
convert_metric_id_to_panel_idx,
detect_rocprof,
get_base_spi_pipe_counter,
get_submodules,
is_spi_pipe_counter,
is_tcc_channel_counter,
total_xcds,
using_v3,
)

@@ -186,7 +187,7 @@ class OmniSoC_Base:
self._mspec.gpu_arch, self._mspec.gpu_chip_id
)
self._mspec.num_xcd = str(
total_xcds(self._mspec.gpu_model, self._mspec.compute_partition)
get_num_xcds(self._mspec.gpu_model, self._mspec.compute_partition)
)

@demarcate
@@ -316,10 +317,10 @@ class OmniSoC_Base:
counters = counters - {"SQ_INSTS_VALU_MFMA_F8", "SQ_INSTS_VALU_MFMA_MOPS_F8"}

# Following counters are not supported
# TCP_TCP_LATENCY_sum (except for gfx908 and gfx90a)
# TCP_TCP_LATENCY_sum (except for gfx950)
# SQC_DCACHE_INFLIGHT_LEVEL
counters = counters - {"SQC_DCACHE_INFLIGHT_LEVEL"}
if self.__arch not in ("gfx908", "gfx90a"):
if self.__arch != "gfx950":
counters = counters - {"TCP_TCP_LATENCY_sum"}

# SQ_ACCUM_PREV_HIRES will be injected for level counters later on
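The net effect of this hunk on the filter is that `SQC_DCACHE_INFLIGHT_LEVEL` is always dropped, and `TCP_TCP_LATENCY_sum` is now kept only on gfx950 (MI350) rather than on gfx908 and gfx90a. Below is a standalone sketch of just those two rules, with the architecture passed in as a string. The real method reads it from `self.__arch` and applies other filters that are not shown here.

```python
def filter_unsupported(counters: set, arch: str) -> set:
    """Apply the two unsupported-counter rules shown in this hunk."""
    # Never collected on any architecture.
    counters = counters - {"SQC_DCACHE_INFLIGHT_LEVEL"}
    # Collected only on gfx950 after this change.
    if arch != "gfx950":
        counters = counters - {"TCP_TCP_LATENCY_sum"}
    return counters


requested = {"TCP_TCP_LATENCY_sum", "SQC_DCACHE_INFLIGHT_LEVEL", "SQ_WAVES"}
on_mi350 = filter_unsupported(requested, "gfx950")  # keeps TCP_TCP_LATENCY_sum
on_mi300 = filter_unsupported(requested, "gfx942")  # drops it
```

Set difference returns a new set, so the caller's `requested` set is left unchanged.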
@@ -510,6 +511,8 @@ class OmniSoC_Base:
file_count = 0
# Store all channels for a TCC channel counter in the same file
tcc_channel_counter_file_map = dict()
# Store all pipes for SPI pipe counters in the same file
spi_pipe_counter_file_map = dict()
for ctr in counters:
# Store all channels for a TCC channel counter in the same file
if is_tcc_channel_counter(ctr):
@@ -517,13 +520,27 @@ class OmniSoC_Base:
if output_file:
output_file.add(ctr)
continue
# Store all pipes for SPI pipe counters in the same file
if is_spi_pipe_counter(ctr):
output_file = spi_pipe_counter_file_map.get(
get_base_spi_pipe_counter(ctr)
)
if output_file:
output_file.add(ctr)
continue
# Add counter to first file that has room
added = False
for i in range(len(output_files)):
if output_files[i].add(ctr):
added = True
# Store all channels for a TCC channel counter in the same file
if is_tcc_channel_counter(ctr):
tcc_channel_counter_file_map[ctr.split("[")[0]] = output_files[i]
# Store all pipes for SPI pipe counters in the same file
if is_spi_pipe_counter(ctr):
spi_pipe_counter_file_map[get_base_spi_pipe_counter(ctr)] = (
output_files[i]
)
break

# All files are full, create a new file
@@ -711,8 +728,18 @@ class LimitedSet:
if e.split("[")[0] in {element.split("[")[0] for element in self.elements}:
self.elements.append(e)
return True
# Store all pipes for SPI pipe counters in the same file
if is_spi_pipe_counter(e) and get_base_spi_pipe_counter(e) in {
get_base_spi_pipe_counter(element) for element in self.elements
}:
self.elements.append(e)
return True
if self.avail > 0:
self.avail -= 1
# SPI pipe counters take space of 2 counters
if is_spi_pipe_counter(e):
self.avail -= 2
else:
self.avail -= 1
self.elements.append(e)
return True
return False

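Taken together, the two hunks above pack counters into profiling passes. Every channel of a TCC counter (`NAME[n]`) and every pipe of an SPI pipe counter lands in the same pass, and a new SPI pipe counter reserves two slots. The sketch below models this with a single per-pass capacity. That is a simplification: the real code budgets per IP block through the perfmon config. The SPI pipe naming rule (`SPI_<BASE>_P<n>`) is hypothetical, because the bodies of `is_spi_pipe_counter` and `get_base_spi_pipe_counter` are not part of this diff. The committed `LimitedSet.add` only checks `avail > 0` before subtracting 2; unlike it, the sketch checks that the full cost fits.

```python
import re
from typing import List, Optional

# Hypothetical naming rule standing in for is_spi_pipe_counter / get_base_spi_pipe_counter.
_SPI_PIPE = re.compile(r"^(?P<base>SPI_\w+?)_P\d+$")


def group_key(ctr: str) -> Optional[str]:
    """Counters sharing a key must be collected in the same pass."""
    if "[" in ctr:  # TCC channel counter, e.g. TCC_HIT[3]
        return ctr.split("[")[0]
    match = _SPI_PIPE.match(ctr)
    return match.group("base") if match else None


def slot_cost(ctr: str) -> int:
    """SPI pipe counters take the space of two counters; everything else takes one."""
    return 2 if _SPI_PIPE.match(ctr) else 1


class Pass:
    """One profiling pass with a fixed number of counter slots (simplified LimitedSet)."""

    def __init__(self, capacity: int):
        self.avail = capacity
        self.elements: List[str] = []

    def fits(self, ctr: str) -> bool:
        return slot_cost(ctr) <= self.avail

    def take(self, ctr: str) -> None:
        self.avail -= slot_cost(ctr)
        self.elements.append(ctr)


def pack(counters: List[str], capacity: int) -> List[Pass]:
    passes: List[Pass] = []
    home = {}  # group key -> pass that already holds that group
    for ctr in counters:
        key = group_key(ctr)
        if key is not None and key in home:
            home[key].elements.append(ctr)  # siblings share the slots already reserved
            continue
        # First pass with room, otherwise open a new one (assumes capacity >= 2).
        target = next((p for p in passes if p.fits(ctr)), None)
        if target is None:
            target = Pass(capacity)
            passes.append(target)
        target.take(ctr)
        if key is not None:
            home[key] = target
    return passes


layout = pack(
    ["SQ_WAVES", "SPI_FOO_P0", "SPI_FOO_P1", "TCC_HIT[0]", "TCC_HIT[1]", "SQ_BUSY_CYCLES"],
    capacity=2,
)
```

The `home` map plays the role of `tcc_channel_counter_file_map` and `spi_pipe_counter_file_map`. Without it, a later sibling could be placed in an earlier pass that merely has a free slot, splitting the group.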
@@ -54,10 +54,6 @@ class gfx908_soc(OmniSoC_Base):
self._mspec._l2_banks = 32
self._mspec.lds_banks_per_cu = 32
self._mspec.pipes_per_gpu = 4
# --showmclkrange is broken in Mi100, hardcode freq
if self._mspec.max_mclk is None or self._mspec.cur_mclk is None:
self._mspec.max_mclk = 1200
self._mspec.cur_mclk = 1200

# -----------------------
# Required child methods

@@ -64,12 +64,6 @@ class gfx90a_soc(OmniSoC_Base):
)
self.roofline_obj = Roofline(args, self._mspec)

# Workaround for broken --showmclkrange
# MI210/MI250/MI250X have 1600MHz mclk
if self._mspec.max_mclk is None or self._mspec.cur_mclk is None:
self._mspec.max_mclk = 1600
self._mspec.cur_mclk = 1600

# Set arch specific specs
self._mspec._l2_banks = 32
self._mspec.lds_banks_per_cu = 32

@@ -64,12 +64,6 @@ class gfx942_soc(OmniSoC_Base):
)
self.roofline_obj = Roofline(args, self._mspec)

# Workaround for broken --showmclkrange
# MI300X/MI300A/MI308X have 1300MHz mclk
if self._mspec.max_mclk is None or self._mspec.cur_mclk is None:
self._mspec.max_mclk = 1300
self._mspec.cur_mclk = 1300

# Set arch specific specs
self._mspec._l2_banks = 16
self._mspec.lds_banks_per_cu = 32

@@ -0,0 +1,117 @@
##############################################################################bl
# MIT License
#
# Copyright (c) 2021 - 2025 Advanced Micro Devices, Inc. All Rights Reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
##############################################################################el

from pathlib import Path

import config
from rocprof_compute_soc.soc_base import OmniSoC_Base
from roofline import Roofline
from utils.logger import demarcate
from utils.utils import console_error, console_log, mibench


class gfx950_soc(OmniSoC_Base):
def __init__(self, args, mspec):
super().__init__(args, mspec)
self.set_arch("gfx950")
if hasattr(self.get_args(), "roof_only") and self.get_args().roof_only:
self.set_perfmon_dir(
str(
Path(str(config.rocprof_compute_home)).joinpath(
"rocprof_compute_soc",
"profile_configs",
"gfx950",
"roofline",
)
)
)
else:
# NB: We're using generalized Mi300 perfmon configs
self.set_perfmon_dir(
str(
Path(str(config.rocprof_compute_home)).joinpath(
"rocprof_compute_soc",
"profile_configs",
"gfx950",
)
)
)
self.set_compatible_profilers(["rocprofv3"])
# Per IP block max number of simultaneous counters. GFX IP Blocks
self.set_perfmon_config(
{
"SQ": 8,
"TA": 2,
"TD": 2,
"TCP": 4,
"TCC": 4,
"CPC": 2,
"CPF": 2,
"SPI": 2,
"GRBM": 2,
"GDS": 4,
"TCC_channels": 16,
}
)
self.roofline_obj = Roofline(args, self._mspec)

# Set arch specific specs
self._mspec._l2_banks = 16
self._mspec.lds_banks_per_cu = 32
self._mspec.pipes_per_gpu = 4

# -----------------------
# Required child methods
# -----------------------
@demarcate
def profiling_setup(self):
"""Perform any SoC-specific setup prior to profiling."""
super().profiling_setup()
# Performance counter filtering
self.perfmon_filter(self.get_args().roof_only)

@demarcate
def post_profiling(self):
"""Perform any SoC-specific post profiling activities."""
super().post_profiling()

if not self.get_args().no_roof:
console_log(
"roofline", "Checking for roofline.csv in " + str(self.get_args().path)
)
if not Path(self.get_args().path).joinpath("roofline.csv").is_file():
mibench(self.get_args(), self._mspec)
self.roofline_obj.post_processing()
else:
console_log("roofline", "Skipping roofline")

@demarcate
def analysis_setup(self, roofline_parameters=None):
"""Perform any SoC-specific setup prior to analysis."""
super().analysis_setup()
# configure roofline for analysis
if roofline_parameters:
self.roofline_obj = Roofline(
self.get_args(), self._mspec, roofline_parameters
)
@@ -120,7 +120,7 @@ def discrete_background_color_bins(df, n_bins=5, columns="all"):
####################
# GRAPHICAL ELEMENTS
####################
def build_bar_chart(display_df, table_config, barchart_elements, norm_filt, hbm_bw):
def build_bar_chart(display_df, table_config, barchart_elements, norm_filt):
"""
Read data into a bar chart. ID will determine which subtype of barchart.
"""
@@ -214,6 +214,9 @@ def build_bar_chart(display_df, table_config, barchart_elements, norm_filt, hbm_
orientation="h",
).update_xaxes(range=[0, 110], ticks="inside", title="%")
) # append first % chart
hbm_bw = float(
display_df[display_df["Metric"] == "HBM Bandwidth"]["Avg"].iloc[0]
)
d_figs.append(
px.bar(
display_df[display_df["Unit"] == "Gb/s"],

@@ -1,7 +1,5 @@
import os
import sys
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Union
from typing import Any, Dict

import yaml

@@ -13,14 +11,20 @@ MI50 = 0
MI100 = 1
MI200 = 2
MI300 = 3
MI350 = 4

MI_CONSTANS = {MI50: "mi50", MI100: "mi100", MI200: "mi200", MI300: "mi300"}
MI_CONSTANS = {
MI50: "mi50",
MI100: "mi100",
MI200: "mi200",
MI300: "mi300",
MI350: "mi350",
}

gpu_series_dict = {} # key: gpu arch
gpu_model_dict = {} # key: gpu_arch
mi300_num_xcds_dict = {} # key: gpu model
mi300_nps_dict = {} # key: gpu model
mi300_chip_id_dict = {} # key: chip id (int)
num_xcds_dict = {} # key: gpu model
chip_id_dict = {} # key: chip id (int)


# ----------------------------
@@ -60,10 +64,9 @@ def parse_mi_gpu_spec():
MI GPUs
|-- series
|-- architecture (list)
|-- models
|-- chip_ids
|-- mi300_arch
|-- partition_mode
|-- gpu model
|-- chip_ids
|-- partition_mode
"""

current_dir = os.path.dirname(__file__)
@@ -71,61 +74,26 @@ def parse_mi_gpu_spec():

# Load the YAML data
yaml_data = load_yaml(yaml_file_path)
mi300_models_dict = {}

for mi_index, mi_series in MI_CONSTANS.items():
if mi_series != MI_CONSTANS[MI300]:
console_debug("[parse_mi_gpu_spec] Processing series: %s" % mi_series)
for key, value in yaml_data.items():
# parse out gpu series and gpu model information for mi50, 100, 200
curr_gpu_arch = value[mi_index]["gpu_archs"][0]["gpu_arch"]
gpu_series_dict[curr_gpu_arch] = mi_series
gpu_model_dict[curr_gpu_arch] = []
for models in value[mi_index]["gpu_archs"][0]["models"]:
gpu_model_dict[curr_gpu_arch].append(models["gpu_model"])
elif mi_series == MI_CONSTANS[MI300]:
# MI300 requires specific processing
for key, value in yaml_data.items():
mi300_gpu_archs_list = []
# NOTE: only MI300 have multiple architectures
for archs in value[MI300]["gpu_archs"]:
curr_gpu_arch = archs["gpu_arch"]
mi300_gpu_archs_list.append(curr_gpu_arch)
gpu_series_dict[curr_gpu_arch] = mi_series

for idx, arch in enumerate(mi300_gpu_archs_list):
mi300_models_dict[arch] = []
for models in value[MI300]["gpu_archs"][idx]["models"]:
gpu_model = models["gpu_model"]

# 1. Parse compute partition. NOTE: compute partition mode num xcds is available for all mi300 gpu models
mi300_num_xcds_dict[gpu_model] = models["partition_mode"][
"compute_partition_mode"
]["num_xcds"]

# 2. Parse memory_partition. NOTE: memory partition mode nps is available for all mi300 gpu models
mi300_nps_dict[gpu_model] = models["partition_mode"][
"memory_partition_mode"
]

# 3. Parse chip id (physical and virtual).
if models["chip_ids"]["physical"]:
# save chip_id, gpu_model pair if chip id is available
# NOTE: chip id is available for all gfx942 machines
mi300_chip_id_dict[models["chip_ids"]["physical"]] = models[
"gpu_model"
]

if models["chip_ids"]["virtual"]:
# save chip_id, gpu_model pair if chip id is available
# NOTE: chip id is available for all gfx942 machines
mi300_chip_id_dict[models["chip_ids"]["virtual"]] = models[
"gpu_model"
]

mi300_models_dict[arch].append(gpu_model)

gpu_model_dict.update(mi300_models_dict)
for series in yaml_data["mi_gpu_spec"]:
curr_gpu_series = series["gpu_series"]
console_debug("[parse_mi_gpu_spec] Processing series: %s" % curr_gpu_series)
for archs in series["gpu_archs"]:
curr_gpu_arch = archs["gpu_arch"]
gpu_series_dict[curr_gpu_arch] = curr_gpu_series
gpu_model_dict[curr_gpu_arch] = []
for models in archs["models"]:
curr_gpu_model = models["gpu_model"]
gpu_model_dict[curr_gpu_arch].append(curr_gpu_model)
num_xcds_dict[curr_gpu_model] = (
models.get("partition_mode", {})
.get("compute_partition_mode", {})
.get("num_xcds", {})
)
if "chip_ids" in models and "physical" in models["chip_ids"]:
chip_id_dict[models["chip_ids"]["physical"]] = curr_gpu_model
if "chip_ids" in models and "virtual" in models["chip_ids"]:
chip_id_dict[models["chip_ids"]["virtual"]] = curr_gpu_model


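The rewritten loop treats every series uniformly instead of special-casing MI300. Below is a standalone sketch of the same traversal over a trimmed excerpt of the spec YAML in this commit. The excerpt is written as the nested dicts and lists a YAML loader would return, so no YAML library is needed. One deliberate difference: `models.get("partition_mode", {})` returns `None`, not `{}`, when the key is present with a `null` value, and `"virtual" in models["chip_ids"]` is true even for `virtual: null`. The sketch uses `or {}` and `is not None` so that an entry like the `mi210` one parses cleanly.

```python
# Trimmed excerpt of the spec in this commit, as nested Python objects.
SPEC = {
    "mi_gpu_spec": [
        {
            "gpu_series": "mi200",
            "gpu_archs": [
                {
                    "gpu_arch": "gfx90a",
                    "models": [
                        {
                            "gpu_model": "mi210",
                            "partition_mode": None,
                            "chip_ids": {"physical": 29711, "virtual": None},
                        }
                    ],
                }
            ],
        },
        {
            "gpu_series": "mi350",
            "gpu_archs": [
                {
                    "gpu_arch": "gfx950",
                    "models": [
                        {
                            "gpu_model": "mi350",
                            "partition_mode": {
                                "compute_partition_mode": {
                                    "num_xcds": {"spx": 8, "dpx": 4, "qpx": 2, "cpx": 1}
                                }
                            },
                            "chip_ids": {"physical": 30112},
                        }
                    ],
                }
            ],
        },
    ]
}


def parse_spec(spec: dict):
    """Build arch->series, arch->models, model->num_xcds, and chip_id->model maps."""
    series_by_arch, models_by_arch, xcds_by_model, model_by_chip = {}, {}, {}, {}
    for series in spec["mi_gpu_spec"]:
        for arch in series["gpu_archs"]:
            arch_name = arch["gpu_arch"]
            series_by_arch[arch_name] = series["gpu_series"]
            models_by_arch[arch_name] = []
            for model in arch["models"]:
                name = model["gpu_model"]
                models_by_arch[arch_name].append(name)
                # `or {}` also covers keys that are present but null.
                partition = model.get("partition_mode") or {}
                compute = partition.get("compute_partition_mode") or {}
                xcds_by_model[name] = compute.get("num_xcds") or {}
                chip_ids = model.get("chip_ids") or {}
                for kind in ("physical", "virtual"):
                    if chip_ids.get(kind) is not None:
                        model_by_chip[chip_ids[kind]] = name
    return series_by_arch, models_by_arch, xcds_by_model, model_by_chip


series_by_arch, models_by_arch, xcds_by_model, model_by_chip = parse_spec(SPEC)
```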
def get_gpu_series_dict():
@@ -164,9 +132,9 @@ def get_gpu_model(gpu_arch_, chip_id_):
gpu_arch_lower = gpu_arch_.lower()

# Handle gfx942 with chip_id mapping
if gpu_arch_lower == "gfx942":
if chip_id_ and int(chip_id_) in mi300_chip_id_dict:
gpu_model = mi300_chip_id_dict.get(int(chip_id_))
if gpu_arch_lower not in ("gfx906", "gfx908", "gfx90a"):
if chip_id_ and int(chip_id_) in chip_id_dict:
gpu_model = chip_id_dict.get(int(chip_id_))
else:
console_warning(f"No gpu model found for chip id: {chip_id_}")
return None
@@ -186,8 +154,12 @@ def get_gpu_model(gpu_arch_, chip_id_):
return gpu_model.upper()


def get_mi300_num_xcds(gpu_model_, compute_partition_):
if not mi300_num_xcds_dict:
def get_num_xcds(gpu_model_, compute_partition_):
# Only GPUs in the MI300 series and later have more than one XCD
if gpu_model_.lower() in ("mi50", "mi60", "mi100", "mi210", "mi250", "mi250x"):
return 1

if not num_xcds_dict:
console_error(
"mi300_num_xcds_dict not yet populated, did you run parse_mi_gpu_spec()?"
)
@@ -196,10 +168,10 @@ def get_mi300_num_xcds(gpu_model_, compute_partition_):
gpu_model_lower = gpu_model_.lower()
partition_lower = compute_partition_.lower()

if gpu_model_lower not in mi300_num_xcds_dict:
if gpu_model_lower not in num_xcds_dict:
return None

model_dict = mi300_num_xcds_dict[gpu_model_lower]
model_dict = num_xcds_dict[gpu_model_lower]
if partition_lower not in model_dict:
console_log(f"Unknown compute partition: {compute_partition_}")
return None
@@ -214,9 +186,9 @@ def get_mi300_num_xcds(gpu_model_, compute_partition_):
return num_xcds


def get_mi300_chip_id_dict():
if mi300_chip_id_dict:
return mi300_chip_id_dict
def get_chip_id_dict():
if chip_id_dict:
return chip_id_dict
else:
console_error(
"mi300_chip_id_dict not yet populated, did you run parse_mi_gpu_spec()?"

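With the dictionaries merged, `get_num_xcds` reduces to a short lookup: the listed pre-MI300 models always report one XCD, and any other model's partition table is consulted. The sketch below takes the table as a parameter. The committed function reads the module-level `num_xcds_dict` and logs unknown partitions. How it treats a partition whose value is `null` is not visible in this diff; the sketch simply returns `None` in that case.

```python
from typing import Optional

# Models that get_num_xcds() short-circuits to a single XCD.
SINGLE_XCD_MODELS = ("mi50", "mi60", "mi100", "mi210", "mi250", "mi250x")


def get_num_xcds_sketch(xcds_by_model: dict, gpu_model: str, partition: str) -> Optional[int]:
    """Number of XCDs for a model in a compute partition mode, or None if unknown."""
    model = gpu_model.lower()
    if model in SINGLE_XCD_MODELS:
        return 1
    table = xcds_by_model.get(model)
    if table is None:
        return None  # model not in the spec
    return table.get(partition.lower())  # None for an unknown or null partition


# MI350 partition table as listed in the spec YAML in this commit.
table = {"mi350": {"spx": 8, "dpx": 4, "qpx": 2, "cpx": 1}}
```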
@@ -9,11 +9,11 @@
# MI GPUs
# |-- series: the specific MI series; mi50, mi100, mi200, mi300
# |-- architecture: currently, only mi300 gpus hold different architectures
# |-- models
# |-- chip_ids: chip id is specific to the environment the gpu is being used on
# |-- partition_mode: currently, only mi300 gpus hold partition mode information
# two types: compute partition mode, memory partition mode,
# currently only mi300 gpus contain compute partition mode information on number of xcds
# |-- gpu model
# |-- chip_ids: chip id is specific to the environment the gpu is being used on
# |-- partition_mode
# | -- compute partition mode
# | -- memory partition mode
#
# --------------------------------------------------------------------------------

@@ -23,45 +23,31 @@ mi_gpu_spec:
- gpu_arch: gfx906
models:
- gpu_model: mi50
partition_mode: null
chip_ids:
physical: null
virtual: null
- gpu_model: mi60
partition_mode: null
chip_ids:
physical: null
virtual: null

- gpu_series: mi100
gpu_archs:
- gpu_arch: gfx908
models:
- gpu_model: mi100
partition_mode: null
chip_ids:
physical: 29580
virtual: null

- gpu_series: mi200
gpu_archs:
- gpu_arch: gfx90a
models:
- gpu_model: mi210
partition_mode: null
chip_ids:
physical: 29711
virtual: null
- gpu_model: mi250
partition_mode: null
chip_ids:
physical: 29708
virtual: null
- gpu_model: mi250x
partition_mode: null
chip_ids:
physical: 29704
virtual: null
- gpu_model: mi250
- gpu_model: mi250x

- gpu_series: mi300
gpu_archs:
@@ -72,16 +58,10 @@ mi_gpu_spec:
compute_partition_mode:
num_xcds:
spx: 6
dpx: null
tpx: 2
qpx: null
cpx: null
memory_partition_mode:
nps4: [tpx]
nps1: [spx, tpx]
chip_ids:
physical: null
virtual: null

- gpu_arch: gfx941
models:
@@ -91,15 +71,11 @@ mi_gpu_spec:
num_xcds:
spx: 8
dpx: 4
tpx: null
qpx: 2
cpx: 1
memory_partition_mode:
nps4: [qpx, cpx]
nps1: [spx, qpx, cpx]
chip_ids:
physical: null
virtual: null

- gpu_arch: gfx942
models:
@@ -108,10 +84,7 @@ mi_gpu_spec:
compute_partition_mode:
num_xcds:
spx: 6
dpx: null
tpx: 2
qpx: null
cpx: null
memory_partition_mode:
nps4: [tpx]
nps1: [spx, tpx]
@@ -125,7 +98,6 @@ mi_gpu_spec:
num_xcds:
spx: 8
dpx: 4
tpx: null
qpx: 2
cpx: 1
memory_partition_mode:
@@ -141,8 +113,6 @@ mi_gpu_spec:
num_xcds:
spx: 4
dpx: 2
tpx: null
qpx: null
cpx: 1
memory_partition_mode:
nps4: [cpx]
@@ -150,3 +120,21 @@ mi_gpu_spec:
chip_ids:
physical: 29858
virtual: 29878

- gpu_series: mi350
gpu_archs:
- gpu_arch: gfx950
models:
- gpu_model: mi350
partition_mode:
compute_partition_mode:
num_xcds:
spx: 8
dpx: 4
qpx: 2
cpx: 1
memory_partition_mode:
nps1: [spx, dpx, qpx, cpx]
nps4: [qpx, cpx]
chip_ids:
physical: 30112

@@ -86,6 +86,7 @@ build_in_vars = {
0) / $max_waves_per_cu) * 8) + MIN(MOD(ROUND(AVG(((4 * SQ_BUSY_CU_CYCLES) \
/ $GRBM_GUI_ACTIVE_PER_XCD)), 0), $max_waves_per_cu), 8)), $cu_per_gpu))",
"kernelBusyCycles": "ROUND(AVG((((End_Timestamp - Start_Timestamp) / 1000) * $max_sclk)), 0)",
"hbmBandwidth": "($max_mclk / 1000 * 32 * $num_hbm_channels)",
}

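The new `hbmBandwidth` built-in replaces the `hbm_bw` value that used to be stored in sysinfo.csv and derives peak HBM bandwidth from `$max_mclk` and `$num_hbm_channels` instead. Read literally, and assuming `max_mclk` is in MHz, `$max_mclk / 1000` is the memory clock in GHz and each channel is credited with 32 bytes per clock, which gives GB/s. Below is a worked example using the 1300 MHz clock quoted in the gfx942 comment and a hypothetical channel count of 128:

```python
def hbm_bandwidth(max_mclk: float, num_hbm_channels: float) -> float:
    """Evaluates "($max_mclk / 1000 * 32 * $num_hbm_channels)" with the same operator order."""
    return max_mclk / 1000 * 32 * num_hbm_channels


peak = hbm_bandwidth(max_mclk=1300, num_hbm_channels=128)  # about 5324.8 (GB/s under the assumptions above)
```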
supported_call = {
@@ -700,19 +701,80 @@ def eval_metric(dfs, dfs_type, sys_info, raw_pmc_df, debug):
|
||||
console_error("Hauting execution for warning above.")
|
||||
|
||||
ammolite__se_per_gpu = int(sys_info.se_per_gpu)
|
||||
if np.isnan(ammolite__se_per_gpu) or ammolite__se_per_gpu == 0:
|
||||
console_warning(
|
||||
"se_per_gpu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
|
||||
)
|
||||
ammolite__pipes_per_gpu = int(sys_info.pipes_per_gpu)
|
||||
if np.isnan(ammolite__pipes_per_gpu) or ammolite__pipes_per_gpu == 0:
|
||||
console_warning(
|
||||
"pipes_per_gpu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
|
||||
)
|
||||
ammolite__cu_per_gpu = int(sys_info.cu_per_gpu)
|
||||
if np.isnan(ammolite__cu_per_gpu) or ammolite__cu_per_gpu == 0:
|
||||
console_warning(
|
||||
"cu_per_gpu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__simd_per_cu = int(sys_info.simd_per_cu) # not used
if np.isnan(ammolite__simd_per_cu) or ammolite__simd_per_cu == 0:
console_warning(
"simd_per_cu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__sqc_per_gpu = int(sys_info.sqc_per_gpu)
if np.isnan(ammolite__sqc_per_gpu) or ammolite__sqc_per_gpu == 0:
console_warning(
"sqc_per_gpu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__lds_banks_per_cu = int(sys_info.lds_banks_per_cu)
if np.isnan(ammolite__lds_banks_per_cu) or ammolite__lds_banks_per_cu == 0:
console_warning(
"lds_banks_per_cu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__cur_sclk = float(sys_info.cur_sclk) # not used
ammolite__mclk = float(sys_info.cur_mclk) # not used
if np.isnan(ammolite__cur_sclk) or ammolite__cur_sclk == 0:
console_warning(
"cur_sclk is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__cur_mclk = float(sys_info.cur_mclk) # not used
if np.isnan(ammolite__cur_mclk) or ammolite__cur_mclk == 0:
console_warning(
"cur_mclk is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__max_mclk = float(sys_info.max_mclk)
if np.isnan(ammolite__max_mclk) or ammolite__max_mclk == 0:
console_warning(
"max_mclk is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__max_sclk = float(sys_info.max_sclk)
if np.isnan(ammolite__max_sclk) or ammolite__max_sclk == 0:
console_warning(
"max_sclk is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__max_waves_per_cu = int(sys_info.max_waves_per_cu)
ammolite__hbm_bw = float(sys_info.hbm_bw)
if np.isnan(ammolite__max_waves_per_cu) or ammolite__max_waves_per_cu == 0:
console_warning(
"max_waves_per_cu is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__num_hbm_channels = float(sys_info.num_hbm_channels)
if np.isnan(ammolite__num_hbm_channels) or ammolite__num_hbm_channels == 0:
console_warning(
"num_hbm_channels is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__total_l2_chan = calc_builtin_var("$total_l2_chan", sys_info)
if np.isnan(ammolite__total_l2_chan) or ammolite__total_l2_chan == 0:
console_warning(
"total_l2_chan is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__num_xcd = int(sys_info.num_xcd)
if np.isnan(ammolite__num_xcd) or ammolite__num_xcd == 0:
console_warning(
"num_xcd is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)
ammolite__wave_size = int(sys_info.wave_size)
if np.isnan(ammolite__wave_size) or ammolite__wave_size == 0:
console_warning(
"wave_size is not available in sysinfo.csv, please provide the correct value using --specs-correction"
)

# TODO: fix all $normUnit in Unit column or title

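The analysis-phase checks above repeat one pattern per `sysinfo.csv` field: read the value, then warn if it is NaN or zero. A minimal sketch of a table-driven version follows, assuming `sys_info` behaves like a mapping with `.get()` (a plain dict here; a pandas `Series` row also provides `.get()`). The names `REQUIRED_FIELDS`, `find_missing_specs`, and `warn_missing` are hypothetical, and `print` stands in for `console_warning`. The sketch also tests each value as a float before any integer conversion, because `int()` on a NaN raises `ValueError`, so a NaN check placed after `int(sys_info.field)` cannot fire for a missing field.

```python
import math

# Hypothetical list of sysinfo.csv fields the analysis phase relies on.
REQUIRED_FIELDS = [
    "cu_per_gpu", "simd_per_cu", "sqc_per_gpu", "lds_banks_per_cu",
    "cur_sclk", "cur_mclk", "max_sclk", "max_mclk", "max_waves_per_cu",
    "num_hbm_channels", "num_xcd", "wave_size",
]


def find_missing_specs(sys_info, fields=REQUIRED_FIELDS):
    """Return (values, missing): values as floats, missing as NaN/zero/absent names."""
    values, missing = {}, []
    for name in fields:
        try:
            value = float(sys_info.get(name))
        except (TypeError, ValueError):
            value = math.nan  # absent (None) or non-numeric entry
        # Test the float first: int(float("nan")) would raise ValueError.
        if math.isnan(value) or value == 0:
            missing.append(name)
        values[name] = value
    return values, missing


def warn_missing(missing, warn=print):
    """Emit one warning per missing field, worded like the diff's messages."""
    for name in missing:
        warn(
            f"{name} is not available in sysinfo.csv, please provide the "
            "correct value using --specs-correction"
        )
```

Collecting the missing names first and warning afterwards keeps the validation testable without capturing console output.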
@@ -751,6 +813,7 @@ def eval_metric(dfs, dfs_type, sys_info, raw_pmc_df, debug):
ammolite__build_in[key] = None
ammolite__numActiveCUs = ammolite__build_in["numActiveCUs"]
ammolite__kernelBusyCycles = ammolite__build_in["kernelBusyCycles"]
ammolite__hbmBandwidth = ammolite__build_in["hbmBandwidth"]

# Hmmm... apply + lambda should just work
# df['Value'] = df['Value'].apply(lambda s: eval(compile(str(s), '<string>', 'eval')))
@@ -821,7 +884,6 @@ def eval_metric(dfs, dfs_type, sys_info, raw_pmc_df, debug):
else:
console_error("analysis", str(ae))

# print("eval_metric", id, expr)
try:
out = eval(compile(row[expr], "<string>", "eval"))


@@ -39,9 +39,9 @@ import pandas as pd

import config
from utils.logger import console_debug, console_error, console_log, console_warning
from utils.mi_gpu_spec import get_gpu_series_dict, get_mi300_chip_id_dict
from utils.mi_gpu_spec import get_chip_id_dict, get_gpu_series_dict, get_num_xcds
from utils.tty import get_table_string
from utils.utils import get_version, total_xcds
from utils.utils import get_version

VERSION_LOC = [
"version",
@@ -72,7 +72,6 @@ def detect_arch(_rocminfo):

def detect_gpu_chip_id(_rocminfo):
gpu_chip_id = None
mi300_chip_id_dict = get_mi300_chip_id_dict().keys()

for idx1, linetext in enumerate(_rocminfo):
# NOTE: current supported socs only have numbers in Chip ID
@@ -84,8 +83,8 @@ def detect_gpu_chip_id(_rocminfo):
if not gpu_chip_id:
console_warning("No Chip ID detected: " + str(gpu_chip_id))
elif (
gpu_chip_id not in mi300_chip_id_dict
and int(gpu_chip_id) not in mi300_chip_id_dict
gpu_chip_id not in get_chip_id_dict().keys()
and int(gpu_chip_id) not in get_chip_id_dict().keys()
):
console_warning("Unknown Chip ID detected: " + str(gpu_chip_id))
return gpu_chip_id
@@ -214,7 +213,7 @@ def generate_machine_specs(args, sysinfo: dict = None):
specs.total_l2_chan: str = total_l2_banks(
specs.gpu_model, int(specs._l2_banks), specs.compute_partition
)
specs.hbm_bw: str = str(int(specs.max_mclk) / 1000 * 32 * specs.get_hbm_channels())
specs.num_hbm_channels: str = str(specs.get_hbm_channels())
return specs


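The removed `hbm_bw` line computed peak HBM bandwidth in GB/s as `max_mclk / 1000 * 32 * channels`, with `max_mclk` in MHz; the constant 32 is presumably bytes transferred per channel per memory clock. Since profiling now records `num_hbm_channels` instead, the same figure can be recovered from `sysinfo.csv` at analysis time. The sketch below (the function name is hypothetical) reproduces that formula and checks it against the MI100 and MI200 `sysinfo.csv` rows in this diff, whose `hbm_bw` columns read 1228.8 and 1638.4.

```python
def peak_hbm_bandwidth_gbps(max_mclk_mhz: float, num_hbm_channels: int) -> float:
    """Peak HBM bandwidth in GB/s, following the removed hbm_bw formula."""
    bytes_per_channel_per_clk = 32  # constant copied from the removed line
    mclk_ghz = max_mclk_mhz / 1000  # MHz -> GHz
    return mclk_ghz * bytes_per_channel_per_clk * num_hbm_channels


if __name__ == "__main__":
    # max_mclk and num_hbm_channels from the MI100 / MI200 sysinfo.csv rows in this diff.
    print(round(peak_hbm_bandwidth_gbps(1200, 32), 1))  # MI100: 1228.8
    print(round(peak_hbm_bandwidth_gbps(1600, 32), 1))  # MI200: 1638.4
```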
@@ -518,15 +517,6 @@ class MachineSpecs:
"name": "Pipes per GPU",
},
)
hbm_bw: str = field(
default=None,
metadata={
"doc": "The peak theoretical HBM bandwidth for the accelerators/GPUs in the system. On systems with\n"
"configurable partitioning, (e.g., MI300) this is the peak theoretical HBM bandwidth for a partition.",
"name": "HBM BW",
"unit": "GB/s",
},
)
num_xcd: str = field(
default=None,
metadata={
@@ -536,14 +526,13 @@ class MachineSpecs:
"unit": "XCDs",
},
)
num_hbm_channels: str = field(
default=None,
metadata={"doc": "Number of HBM channels", "name": "HBM channels"},
)

def get_hbm_channels(self):
# check MI300 has a valid compute partition
mi300a_archs = ["mi300a_a0", "mi300a_a1"]
mi300x_archs = ["mi300x_a0", "mi300x_a1"]
mi308x_archs = ["mi308x"]

if self.gpu_model.lower() in mi300a_archs + mi300x_archs + mi308x_archs:
if self.memory_partition.lower().startswith("nps"):
hbmchannels = 128
if self.memory_partition.lower() == "nps2":
hbmchannels /= 2
@@ -551,10 +540,9 @@ class MachineSpecs:
hbmchannels /= 4
elif self.memory_partition.lower() == "nps8":
hbmchannels /= 8
return int(hbmchannels)
return hbmchannels
else:
hbmchannels = int(self.total_l2_chan)
return hbmchannels
return int(self.total_l2_chan)

def get_class_members(self):
all_populated = True
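After this change, `get_hbm_channels` returns an integer channel count: MI300-family parts start from 128 channels divided by the NPS memory-partition count, and every other GPU falls back to the total L2 channel count. A standalone sketch of that rule follows, with hypothetical names and explicit arguments in place of `MachineSpecs` fields. It assumes the NPS4 branch elided from the hunk divides by 4, and it treats an MI300 part whose partition string does not start with `nps` like any other GPU, which may differ from the real method.

```python
# Hypothetical standalone version of the MachineSpecs.get_hbm_channels() rule.
MI300_FAMILY = {"mi300a_a0", "mi300a_a1", "mi300x_a0", "mi300x_a1", "mi308x"}
NPS_DIVISOR = {"nps1": 1, "nps2": 2, "nps4": 4, "nps8": 8}
MI300_HBM_CHANNELS = 128


def hbm_channels(gpu_model: str, memory_partition: str, total_l2_chan) -> int:
    model = gpu_model.lower()
    partition = memory_partition.lower()
    if model in MI300_FAMILY and partition.startswith("nps"):
        # Channels are split evenly across NPS partitions; an unrecognized
        # NPS mode keeps all 128, as the chain shown in the diff appears to.
        return MI300_HBM_CHANNELS // NPS_DIVISOR.get(partition, 1)
    # All other GPUs: fall back to the total L2 channel count.
    return int(total_l2_chan)
```

Integer division replaces the original's `/=` followed by `int(...)`; for the listed NPS modes both give the same result.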
@@ -581,7 +569,7 @@ class MachineSpecs:
data[name] = value

if not all_populated:
console_error("Missing specs fields for %s" % self.gpu_arch)
console_warning("Missing specs fields for %s" % self.gpu_arch)
return pd.DataFrame(data, index=[0])

def __repr__(self):
@@ -682,7 +670,7 @@ def total_sqc(archname, numCUs, numSEs):


def total_l2_banks(archname, L2Banks, compute_partition):
xcds = total_xcds(archname, compute_partition)
xcds = get_num_xcds(archname, compute_partition)
totalL2Banks = L2Banks * xcds
return totalL2Banks


@@ -43,16 +43,32 @@ import pandas as pd

import config
from utils.logger import console_debug, console_error, console_log, console_warning
from utils.mi_gpu_spec import get_mi300_num_xcds
from utils.mi_gpu_spec import get_num_xcds

rocprof_cmd = ""
rocprof_args = ""
spi_pipe_counter_regexs = [r"SPI_CS\d+_(.*)", r"SPI_CSQ_P\d+_(.*)"]


def is_tcc_channel_counter(counter):
return counter.startswith("TCC") and counter.endswith("]")


def is_spi_pipe_counter(counter):
for pattern in spi_pipe_counter_regexs:
if re.match(pattern, counter):
return True
return False


def get_base_spi_pipe_counter(counter):
for pattern in spi_pipe_counter_regexs:
match = re.match(pattern, counter)
if match:
return match.group(1)
return ""


def using_v1():

return "ROCPROF" not in os.environ.keys() or (
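The two regexes in `spi_pipe_counter_regexs` recognize per-pipe SPI counters and extract the base counter name shared across pipes. The snippet below copies the regexes and helper logic from the diff (with `is_spi_pipe_counter` written using `any()`) and adds a hypothetical `group_by_base` to show how per-pipe counters could be bucketed under one name. The counter names in the test are illustrative only, not a confirmed list of gfx950 counters.

```python
import re

# Regexes copied from the diff: per-pipe SPI counters and their base name.
spi_pipe_counter_regexs = [r"SPI_CS\d+_(.*)", r"SPI_CSQ_P\d+_(.*)"]


def is_spi_pipe_counter(counter):
    # re.match anchors at the start of the string only.
    return any(re.match(p, counter) for p in spi_pipe_counter_regexs)


def get_base_spi_pipe_counter(counter):
    for pattern in spi_pipe_counter_regexs:
        match = re.match(pattern, counter)
        if match:
            return match.group(1)  # text after the pipe prefix
    return ""


def group_by_base(counters):
    """Hypothetical helper: bucket per-pipe counters under their base name."""
    groups = {}
    for name in counters:
        if is_spi_pipe_counter(name):
            groups.setdefault(get_base_spi_pipe_counter(name), []).append(name)
    return groups
```

A name such as `SPI_CSN_BUSY` matches neither pattern, because `\d+` requires a digit right after `SPI_CS` and the second pattern requires the literal `SPI_CSQ_P`.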
@@ -571,12 +587,7 @@ def run_prof(

# set required env var for mi300
new_env = None
if (
mspec.gpu_model.lower() == "mi300x_a0"
or mspec.gpu_model.lower() == "mi300x_a1"
or mspec.gpu_model.lower() == "mi300a_a0"
or mspec.gpu_model.lower() == "mi300a_a1"
):
if mspec.gpu_model.lower() not in ("mi50", "mi60", "mi210", "mi250", "mi250x"):
new_env = os.environ.copy()
new_env["ROCPROFILER_INDIVIDUAL_XCC_MODE"] = "1"

@@ -661,7 +672,7 @@ def run_prof(
if new_env and not using_v3() and not using_v1():
# flatten tcc for applicable mi300 input
f = path(workload_dir + "/out/pmc_1/results_" + fbase + ".csv")
xcds = total_xcds(mspec.gpu_model, mspec.compute_partition)
xcds = get_num_xcds(mspec.gpu_model, mspec.compute_partition)
df = flatten_tcc_info_across_xcds(f, xcds, int(mspec._l2_banks))
df.to_csv(f, index=False)

@@ -1065,62 +1076,6 @@ def flatten_tcc_info_across_xcds(file, xcds, tcc_channel_per_xcd):
return df


def total_xcds(gpu_model, compute_partition):
"""
Returns the number of xcds for a gpu model and compute_partition pair.
"""

# For mi300 chips, return result from mi_gpu_spec
result = get_mi300_num_xcds(gpu_model, compute_partition)
if result:
return result

# For other systems, use manual check
# check MI300 has a valid compute partition
mi300a_model = ["mi300a_a0", "mi300a_a1"]
mi300x_model = ["mi300x_a0", "mi300x_a1"]
mi308x_model = ["mi308x"]
if (
gpu_model.lower() in mi300a_model + mi300x_model + mi308x_model
and compute_partition == "NA"
):
console_error("Invalid compute partition found for {}".format(gpu_model))

if gpu_model.lower() not in mi300a_model + mi300x_model + mi308x_model:
return 1
# from the whitepaper
# https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf
if compute_partition.lower() == "spx":
if gpu_model.lower() in mi300a_model:
return 6
if gpu_model.lower() in mi300x_model:
return 8
if gpu_model.lower() in mi308x_model:
return 4
if compute_partition.lower() == "tpx":
if gpu_model.lower() in mi300a_model:
return 2
if compute_partition.lower() == "dpx":
if gpu_model.lower() in mi300x_model:
return 4
if gpu_model.lower() in mi308x_model:
return 2
if compute_partition.lower() == "qpx":
if gpu_model.lower() in mi300x_model:
return 2
if compute_partition.lower() == "cpx":
if gpu_model.lower() in mi300x_model:
return 1
if gpu_model.lower() in mi308x_model:
return 1
# TODO implement other archs here as needed
console_error(
"Unknown compute partition / arch found for {} / {}".format(
compute_partition, gpu_model
)
)


def get_submodules(package_name):
"""List all submodules for a target package"""
import importlib

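The removed `total_xcds` encoded the XCD count per MI300 model and compute partition as an if-chain; this commit replaces it with `get_num_xcds` from `utils.mi_gpu_spec`, whose implementation is not shown in this diff. As a sketch only, the values from the removed function can be expressed as a lookup table. `num_xcds`, `XCDS_BY_PARTITION`, and `FAMILY_OF` are hypothetical names, and raising `ValueError` stands in for `console_error`.

```python
# (model family, compute partition) -> XCD count, values from the removed if-chain.
XCDS_BY_PARTITION = {
    ("mi300a", "spx"): 6, ("mi300a", "tpx"): 2,
    ("mi300x", "spx"): 8, ("mi300x", "dpx"): 4,
    ("mi300x", "qpx"): 2, ("mi300x", "cpx"): 1,
    ("mi308x", "spx"): 4, ("mi308x", "dpx"): 2, ("mi308x", "cpx"): 1,
}
FAMILY_OF = {
    "mi300a_a0": "mi300a", "mi300a_a1": "mi300a",
    "mi300x_a0": "mi300x", "mi300x_a1": "mi300x",
    "mi308x": "mi308x",
}


def num_xcds(gpu_model: str, compute_partition: str) -> int:
    family = FAMILY_OF.get(gpu_model.lower())
    if family is None:
        return 1  # non-MI300 parts are treated as a single XCD
    key = (family, compute_partition.lower())
    if key not in XCDS_BY_PARTITION:
        # Covers "NA" and unsupported partition/model pairs.
        raise ValueError(
            f"Unknown compute partition / arch found for {compute_partition} / {gpu_model}"
        )
    return XCDS_BY_PARTITION[key]
```

Keeping the data in a table makes adding a new family a one-line change, which is roughly the motivation the commit message gives for moving GPU specifications into a YAML file.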
@@ -136,7 +136,7 @@ def test_L1_cache_counters(
options,
check_success=False,
roof=False,
app_name=app_name
app_name=app_name,
)
assert return_code == 0


@@ -15,6 +15,7 @@ indirs = [
"tests/workloads/vcopy/MI200",
"tests/workloads/vcopy/MI300A_A1",
"tests/workloads/vcopy/MI300X_A1",
"tests/workloads/vcopy/MI350",
]


@@ -255,9 +256,13 @@ def test_dispatch_5(binary_handler_analyze_rocprof_compute):
@pytest.mark.misc
def test_gpu_ids(binary_handler_analyze_rocprof_compute):
for dir in indirs:
if dir.endswith("MI350"):
gpu_id = "0"
else:
gpu_id = "2"
workload_dir = test_utils.setup_workload_dir(dir)
code = binary_handler_analyze_rocprof_compute(
["analyze", "--path", workload_dir, "--gpu-id", "2"]
["analyze", "--path", workload_dir, "--gpu-id", gpu_id]
)
assert code == 0


@@ -112,6 +112,13 @@ def test_analyze_ipblocks_TCC_MI200(binary_handler_analyze_rocprof_compute):
assert code == 0


def test_analyze_no_roof_MI350(binary_handler_analyze_rocprof_compute):
code = binary_handler_analyze_rocprof_compute(
["analyze", "--path", "tests/workloads/no_roof/MI350"]
)
assert code == 0


def test_analyze_no_roof_MI300X_A1(binary_handler_analyze_rocprof_compute):
code = binary_handler_analyze_rocprof_compute(
["analyze", "--path", "tests/workloads/no_roof/MI300X_A1"]

@@ -14,6 +14,7 @@ import test_utils

# Globals

# TODO: MI350 What are the gpu models in MI 350 series
SUPPORTED_ARCHS = {
"gfx906": {"mi50": ["MI50", "MI60"]},
"gfx908": {"mi100": ["MI100"]},
@@ -21,12 +22,14 @@ SUPPORTED_ARCHS = {
"gfx940": {"mi300": ["MI300A_A0"]},
"gfx941": {"mi300": ["MI300X_A0"]},
"gfx942": {"mi300": ["MI300A_A1", "MI300X_A1"]},
"gfx950": {"mi350": ["MI350"]},
}

MI300_CHIP_IDS = {
CHIP_IDS = {
"29856": "MI300A_A1",
"29857": "MI300X_A1",
"29858": "MI308X",
"30112": "MI350",
}


@@ -106,6 +109,25 @@ ALL_CSVS_MI300 = sorted(
"timestamps.csv",
]
)
ALL_CSVS_MI350 = sorted(
[
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]
)

ROOF_ONLY_FILES = sorted(
[
@@ -290,9 +312,9 @@ def gpu_soc():

## 3) Deduce gpu model name from arch
gpu_model = list(SUPPORTED_ARCHS[gpu_arch].keys())[0].upper()
if gpu_model == "MI300":
if chip_id in MI300_CHIP_IDS:
gpu_model = MI300_CHIP_IDS[chip_id]
if gpu_model not in ("MI50", "MI100", "MI200"):
if chip_id in CHIP_IDS:
gpu_model = CHIP_IDS[chip_id]

return gpu_model

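In the test harness, `gpu_soc()` now lets the chip ID refine the model name for every family except MI50, MI100, and MI200, not only MI300, which is how a gfx950 device with chip ID 30112 resolves to `MI350`. A standalone sketch of that step follows, using a subset of the tables shown in the diff (entries elided from the hunks are omitted) and a hypothetical function name, `deduce_gpu_model`; in the diff this logic is inline in `gpu_soc()`.

```python
# Subset of the test tables shown in the diff.
SUPPORTED_ARCHS = {
    "gfx906": {"mi50": ["MI50", "MI60"]},
    "gfx908": {"mi100": ["MI100"]},
    "gfx942": {"mi300": ["MI300A_A1", "MI300X_A1"]},
    "gfx950": {"mi350": ["MI350"]},
}
CHIP_IDS = {"29856": "MI300A_A1", "29857": "MI300X_A1", "29858": "MI308X", "30112": "MI350"}


def deduce_gpu_model(gpu_arch: str, chip_id: str) -> str:
    # Start from the family key for the arch, e.g. "mi300" -> "MI300".
    gpu_model = next(iter(SUPPORTED_ARCHS[gpu_arch])).upper()
    # Newer families are refined by chip ID when the ID is known.
    if gpu_model not in ("MI50", "MI100", "MI200") and chip_id in CHIP_IDS:
        gpu_model = CHIP_IDS[chip_id]
    return gpu_model
```

An unknown chip ID leaves the family name in place, so a gfx950 device with an unlisted ID still resolves to `MI350`.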
@@ -303,6 +325,9 @@ soc = gpu_soc()
if "MI300" in soc:
os.environ["ROCPROF"] = "rocprofv2"

if "MI350" in soc:
os.environ["ROCPROF"] = "rocprofv3"

Baseline_dir = str(Path("tests/workloads/vcopy/" + soc).resolve())


@@ -491,6 +516,8 @@ def test_path(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI200)
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI300)
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("This test is not supported for {}".format(soc))
assert 0
@@ -502,7 +529,7 @@ def test_path(binary_handler_profile_rocprof_compute):

@pytest.mark.misc
def test_roof_kernel_names(binary_handler_profile_rocprof_compute):
if soc == "MI100":
if soc in ("MI100", "MI350"):
# roofline is not supported on MI100
assert True
# Do not continue testing
@@ -517,7 +544,7 @@ def test_roof_kernel_names(binary_handler_profile_rocprof_compute):
# assert successful run
assert returncode == 0
file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels)
if soc == "MI200" or "MI300" in soc:
if soc == "MI200" in soc or "MI300" in soc:
assert sorted(list(file_dict.keys())) == sorted(
ROOF_ONLY_FILES + ["kernelName_legend.pdf"]
)
@@ -546,6 +573,8 @@ def test_device_filter(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI200)
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI300)
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -574,6 +603,8 @@ def test_kernel(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI200)
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI300)
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -625,6 +656,24 @@ def test_block_SQ(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

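Every MI350 branch added to these tests follows the same shape: optionally the five SQ level CSVs, then `pmc_perf.csv`, `pmc_perf_0.csv` through `pmc_perf_{N-1}.csv`, and `sysinfo.csv`, with no `timestamps.csv`. A hypothetical helper could generate these lists instead of spelling them out; `mi350_expected_csvs` and `SQ_LEVEL_CSVS` are not part of the test suite.

```python
SQ_LEVEL_CSVS = [
    "SQ_IFETCH_LEVEL.csv",
    "SQ_INST_LEVEL_LDS.csv",
    "SQ_INST_LEVEL_SMEM.csv",
    "SQ_INST_LEVEL_VMEM.csv",
    "SQ_LEVEL_WAVES.csv",
]


def mi350_expected_csvs(num_pmc_passes, include_sq_level=False):
    """Hypothetical helper: expected profiling outputs for an MI350 run."""
    csvs = ["pmc_perf.csv", "sysinfo.csv"]
    csvs += [f"pmc_perf_{i}.csv" for i in range(num_pmc_passes)]
    if include_sq_level:
        csvs += SQ_LEVEL_CSVS
    # Sorted so it compares directly against sorted(file_dict.keys()).
    return sorted(csvs)
```

For example, `mi350_expected_csvs(4)` matches the list in `test_block_TD`, `mi350_expected_csvs(12)` the one in `test_block_SPI`, and `mi350_expected_csvs(8, include_sq_level=True)` matches `ALL_CSVS_MI350`.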
@@ -652,6 +701,8 @@ def test_block_SQC(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs.remove("timestamps.csv")

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -684,6 +735,8 @@ def test_block_TA(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs.remove("timestamps.csv")

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -721,6 +774,15 @@ def test_block_TD(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"sysinfo.csv",
]

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -771,6 +833,8 @@ def test_block_TCP(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs.remove("timestamps.csv")

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -825,6 +889,8 @@ def test_block_TCC(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs.remove("timestamps.csv")

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -857,6 +923,23 @@ def test_block_SPI(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"pmc_perf_8.csv",
"pmc_perf_9.csv",
"pmc_perf_10.csv",
"pmc_perf_11.csv",
"sysinfo.csv",
]

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -886,6 +969,19 @@ def test_block_CPC(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -910,6 +1006,8 @@ def test_block_CPF(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs.remove("timestamps.csv")
assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

validate(
@@ -959,6 +1057,24 @@ def test_block_SQ_CPC(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -1009,6 +1125,24 @@ def test_block_SQ_TA(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -1055,6 +1189,24 @@ def test_block_SQ_SPI(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -1106,6 +1258,24 @@ def test_block_SQ_SQC_TCP_CPC(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -1171,6 +1341,24 @@ def test_block_SQ_SPI_TA_TCC_CPF(binary_handler_profile_rocprof_compute):
"sysinfo.csv",
"timestamps.csv",
]
if soc == "MI350":
expected_csvs = [
"SQ_IFETCH_LEVEL.csv",
"SQ_INST_LEVEL_LDS.csv",
"SQ_INST_LEVEL_SMEM.csv",
"SQ_INST_LEVEL_VMEM.csv",
"SQ_LEVEL_WAVES.csv",
"pmc_perf.csv",
"pmc_perf_0.csv",
"pmc_perf_1.csv",
"pmc_perf_2.csv",
"pmc_perf_3.csv",
"pmc_perf_4.csv",
"pmc_perf_5.csv",
"pmc_perf_6.csv",
"pmc_perf_7.csv",
"sysinfo.csv",
]

assert sorted(list(file_dict.keys())) == sorted(expected_csvs)

@@ -1196,6 +1384,8 @@ def test_dispatch_0(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -1226,6 +1416,8 @@ def test_dispatch_0_1(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -1253,6 +1445,8 @@ def test_dispatch_2(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -1283,6 +1477,8 @@ def test_join_type_grid(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -1310,6 +1506,8 @@ def test_join_type_kernel(binary_handler_profile_rocprof_compute):
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI200
elif "MI300" in soc:
assert sorted(list(file_dict.keys())) == ALL_CSVS_MI300
elif "MI350" in soc:
assert sorted(list(file_dict.keys())) == sorted(ALL_CSVS_MI350)
else:
print("Testing isn't supported yet for {}".format(soc))
assert 0
@@ -1326,7 +1524,7 @@ def test_join_type_kernel(binary_handler_profile_rocprof_compute):
@pytest.mark.sort
def test_roof_sort_dispatches(binary_handler_profile_rocprof_compute):
# only test 1 device for roofline
if soc == "MI100":
if soc in ("MI100", "MI350"):
# roofline is not supported on MI100
assert True
# Do not continue testing
@@ -1356,7 +1554,7 @@ def test_roof_sort_dispatches(binary_handler_profile_rocprof_compute):
@pytest.mark.sort
def test_roof_sort_kernels(binary_handler_profile_rocprof_compute):
# only test 1 device for roofline
if soc == "MI100":
if soc in ("MI100", "MI350"):
# roofline is not supported on MI100
assert True
# Do not continue testing
@@ -1386,7 +1584,7 @@ def test_roof_sort_kernels(binary_handler_profile_rocprof_compute):
@pytest.mark.mem
def test_roof_mem_levels_vL1D(binary_handler_profile_rocprof_compute):
# only test 1 device for roofline
if soc == "MI100":
if soc in ("MI100", "MI350"):
# roofline is not supported on MI100
assert True
# Do not continue testing
@@ -1416,7 +1614,7 @@ def test_roof_mem_levels_vL1D(binary_handler_profile_rocprof_compute):
@pytest.mark.mem
def test_roof_mem_levels_LDS(binary_handler_profile_rocprof_compute):
# only test 1 device for roofline
if soc == "MI100":
if soc in ("MI100", "MI350"):
# roofline is not supported on MI100
assert True
# Do not continue testing

@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_filter,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:54:18 PM (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:52:12 PM (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1,32
|
|
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_filter,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:35:56 PM (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:16:46 PM (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1,32
|
|
@@ -0,0 +1,4 @@
Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
1,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
2,"vecCopy(double*, double*, double*, int, int) (.kd)",11995

@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_filter,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 01:39:25 PM (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
vcopy,tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:09:51 PM (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6,96
@@ -0,0 +1,4 @@
Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
1,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
2,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_filter,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 12:03:10 PM (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
vcopy,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:19:39 PM (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8,128
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_inv_int,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:53:52 PM (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:52:12 PM (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1,32
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_inv_int,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:33:56 PM (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:16:46 PM (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1,32
@@ -0,0 +1,4 @@
Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
1,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
2,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_inv_int,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 01:38:17 PM (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
vcopy,tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:09:51 PM (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6,96
@@ -0,0 +1,4 @@
Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
1,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
2,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
device_inv_int,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 12:02:25 PM (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
vcopy,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:19:39 PM (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8,128
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
dispatch_0,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:53:14 PM (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 21 Mar 2024 03:52:12 PM (CDT),2,t007-001.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527651008,,6.0.2-115,113-D3431401-100,NA,NA,MI100,gfx908,16,8192,120,4,8,64,1024,40,1502,1200,1502,1200,32,32,64,4,1228.8,1,32
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
dispatch_0,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:24:01 PM (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
path,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF|roofline,Thu 21 Mar 2024 04:16:46 PM (CDT),2,t007-002.hpcfund,AMD EPYC 7V13 64-Core Processor,American Megatrends Inc.0602,Rocky Linux 9.1 (Blue Onyx),5.14.0-162.18.1.el9_1.x86_64,,527650760,,6.0.2-115,113-D67301-059,NA,NA,MI200,gfx90a,16,8192,104,4,8,64,1024,32,1700,1600,1700,1600,32,32,56,4,1638.4,1,32
@@ -0,0 +1,4 @@
Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
1,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
2,"vecCopy(double*, double*, double*, int, int) (.kd)",11995
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
dispatch_0,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 01:36:42 PM (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
vcopy,tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:09:51 PM (CDT),2,sh5-1w300-rg3-3,AMD Instinct MI300A Accelerator,"American Megatrends International, LLC.RMO1002DS",Ubuntu 22.04.2 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,131174852,,6.1.2-110,N/A,SPX,NPS1,MI300A_A1,gfx942,32,24576,228,4,24,64,1024,32,2100,1300,2100,1300,96,32,120,4,5324.8,6,96
@@ -0,0 +1,4 @@
Dispatch_ID,Kernel_Name,GPU_ID
0,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
1,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
2,"vecCopy(double*, double*, double*, int, int) (.kd)",60633
@@ -1,2 +1,2 @@
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd
dispatch_0,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Wed 29 May 2024 12:01:22 PM (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8
workload_name,command,ip_blocks,timestamp,version,hostname,cpu_model,sbios,linux_distro,linux_kernel_version,amd_gpu_kernel_version,cpu_memory,gpu_memory,rocm_version,vbios,compute_partition,memory_partition,gpu_model,gpu_arch,gpu_l1,gpu_l2,cu_per_gpu,simd_per_cu,se_per_gpu,wave_size,workgroup_max_size,max_waves_per_cu,max_sclk,max_mclk,cur_sclk,cur_mclk,total_l2_chan,lds_banks_per_cu,sqc_per_gpu,pipes_per_gpu,hbm_bw,num_xcd,num_hbm_channels
vcopy,./tests/vcopy -n 1048576 -b 256 -i 3,SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF,Thu 30 May 2024 02:19:39 PM (CDT),2,splinter-126-wr-c6,AMD Ryzen 9 7950X 16-Core Processor,"American Megatrends International, LLC.VS2683299N.FD",Ubuntu 22.04.4 LTS,5.18.2-mi300-build-140423-ubuntu-22.04+,,114656528,,6.2.0-13611,113-MI3SRIOV-001,SPX,NPS1,MI300X_A1,gfx942,32,4096,304,4,32,64,1024,32,2100,1300,2100,1300,128,32,160,4,5324.8,8,128
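Each rewritten `sysinfo.csv` row above appends a `num_hbm_channels` column while keeping `hbm_bw`. As a quick consistency check, the hypothetical Python sketch below (not code from this commit or from rocprofiler-compute) assumes peak HBM bandwidth in GB/s is roughly `num_hbm_channels × max_mclk (MHz) × 32 bytes per clock / 1000`; the 32-byte-per-clock factor is an assumption, not a documented constant. The values are copied from the new rows in this diff, trimmed to the relevant columns. Under that assumption the MI100, MI200, and MI300X rows reproduce their recorded `hbm_bw`, while the MI300A row does not (3993.6 estimated vs. 5324.8 recorded).

```python
import csv
import io

# Assumption (not taken from rocprofiler-compute): each HBM channel moves
# 32 bytes per memory-clock cycle.
BYTES_PER_CHANNEL_PER_CLOCK = 32

# Columns trimmed from the new sysinfo.csv rows in this diff.
SYSINFO = """gpu_model,max_mclk,hbm_bw,num_hbm_channels
MI100,1200,1228.8,32
MI200,1600,1638.4,32
MI300A_A1,1300,5324.8,96
MI300X_A1,1300,5324.8,128
"""


def estimated_hbm_bw_gbs(row: dict) -> float:
    """Hypothetical estimate: channels * MHz * bytes/clock gives MB/s; divide by 1000 for GB/s."""
    channels = int(row["num_hbm_channels"])
    mclk_mhz = int(row["max_mclk"])
    return channels * mclk_mhz * BYTES_PER_CHANNEL_PER_CLOCK / 1000


def compare(text: str) -> list[tuple[str, float, float]]:
    """Pair each GPU's recorded hbm_bw with the estimate above."""
    results = []
    for row in csv.DictReader(io.StringIO(text)):
        results.append(
            (row["gpu_model"], float(row["hbm_bw"]), estimated_hbm_bw_gbs(row))
        )
    return results


if __name__ == "__main__":
    for model, recorded, estimated in compare(SYSINFO):
        status = "match" if abs(recorded - estimated) < 1e-6 else "MISMATCH"
        print(f"{model:10s} recorded={recorded:7.1f} estimated={estimated:7.1f} {status}")
```

The names `BYTES_PER_CHANNEL_PER_CLOCK`, `estimated_hbm_bw_gbs`, and `compare` are illustrative only; the script prints one line per GPU model comparing the recorded and estimated values.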
Some files were not shown because too many files have changed in this diff.