Analysis report block based filtering for profiling (#566)

* Analysis report block based filtering for profiling

* Profiling mode changes

- `-b` option now additionally accepts metric id(s), similar to `-b` option in analyze mode (e.g. 6, 6.2, 6.23)
    - Only counters mentioned in the selected analysis report blocks will be collected
        - Add parsing logic to identify hardware counters from analysis report blocks
        - Add filtering logic to only write filtered counters in perfmon files
        - Log counters that are not collected on a single line
- `--list-metrics` option added in profile mode to list possible metric id(s) similar to analyze mode
- Write arguments provided during profiling in profiling_config.yaml file
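
The profile-mode filtering described above can be sketched roughly as follows. This is a minimal illustration rather than the code from this commit: `classify_block`, `extract_counters`, `filter_counters`, and the sample config text are hypothetical, and the regular expressions are simplified assumptions modeled on the patterns that appear in the diff.

```python
import re

# Assumed patterns, simplified from the ones that appear in this commit.
METRIC_ID = re.compile(r"^\d+(\.\d{1,2})?$")  # e.g. 6, 6.2, 6.23
HARDWARE_BLOCK = re.compile(r"^(SQ|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF)$")
HW_COUNTER = re.compile(
    r"(?:SQ|SQC|TA|TD|TCP|TCC|CPC|CPF|SPI|GRBM)_[0-9A-Za-z_]+"
)


def classify_block(value: str) -> str:
    """Return 'metric_id' or 'hardware_block' for one -b argument."""
    if METRIC_ID.match(value):
        return "metric_id"
    if HARDWARE_BLOCK.match(value):
        return "hardware_block"
    raise ValueError(f"Invalid hardware block or metric id: {value}")


def extract_counters(config_text: str) -> set:
    """Collect every hardware counter name mentioned in a config string."""
    return set(HW_COUNTER.findall(config_text))


def filter_counters(available: list, needed: set) -> tuple:
    """Split available counters into (collected, not collected)."""
    kept = [c for c in available if c in needed]
    skipped = [c for c in available if c not in needed]
    return kept, skipped


if __name__ == "__main__":
    # Hypothetical report block expression, for illustration only.
    text = "expr: AVG(SQ_INSTS_VALU / SQ_WAVES)"
    needed = extract_counters(text)
    kept, skipped = filter_counters(
        ["SQ_WAVES", "SQ_INSTS_VALU", "TCC_HIT"], needed
    )
    print("Collecting:", ", ".join(kept))
    # Skipped counters are reported together on one line.
    print("Not collecting:", ", ".join(skipped))
```

The actual change also resolves `$variable` references and denominators recursively before filtering; that step is omitted here to keep the sketch short.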

* Analysis mode changes

- During analysis mode, only show report blocks selected during profiling
    - If `-b` option is provided in analysis mode, then follow provided filters
- Do not show empty tables in analysis report
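
The analysis-mode rule above amounts to a small decision function. The sketch below is an illustration under stated assumptions: `metric_id_to_panel_idx` and `should_show` are hypothetical names, and the id encoding (block number times 100 plus table number) is inferred from how table ids are printed in this diff, so the real helper in the commit may differ.

```python
def metric_id_to_panel_idx(metric_id: str) -> int:
    """Map '7' -> 700 and '7.1' -> 701 (assumed: block * 100 + table)."""
    block, _, table = metric_id.partition(".")
    return int(block) * 100 + (int(table) if table else 0)


def should_show(table_id, panel_id, analyze_filters, profiled_ids):
    """Decide whether a report table is displayed in analysis mode."""
    if analyze_filters:
        # An explicit -b in analyze mode takes precedence over the
        # profiling configuration; that filtering is handled elsewhere.
        return True
    if not profiled_ids:
        # No report block filter was used while profiling: show everything.
        return True
    if panel_id <= 100:
        # Panels with id <= 100 are always shown.
        return True
    selected = {metric_id_to_panel_idx(m) for m in profiled_ids}
    return table_id in selected or panel_id in selected
```

For example, after profiling with `-b 10 7`, a table in panel 1000 is shown while a table in panel 500 is skipped unless `-b` is passed again in analyze mode.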

* Miscellaneous changes

- Update CHANGELOG
- Add test cases
    - Instruction mix report block filter
    - Instruction mix and Memory chart report block filter
    - Instruction mix report block filter and CPC hardware block filter
    - TA hardware block filter
    - --list-metrics in profile mode should work
- Move binary handler fixtures to conftest.py to avoid importing
  fixtures
- cmake file in tests directory has been updated to compile sample/vmem.hip for testing

* Public documentation changes

- Use the term "Hardware report block" instead of "Hardware block"
- Add documentation for "--list-metrics" option in profile mode
- Add example of filtering by hardware report block such as instruction
  mix and wavefront launch statistics
- Add deprecation warning for hardware component (sq, tcc) based filtering
vedithal-amd
2025-03-10 14:42:56 -04:00
committed by GitHub
parent 0aefd15b7b
commit 55cf0e237e
27 changed files with 748 additions and 199 deletions
+5
@@ -8,6 +8,11 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
* Add Docker files to package the application and dependencies into a single portable and executable standalone binary file
* Analysis report based filtering
* -b option in profile mode now additionally accepts metric id(s) for analysis report based filtering
* -b option in profile mode also accepts hardware IP blocks for filtering; however, this support will be deprecated soon
* --list-metrics option added in profile mode to list possible metric id(s), similar to analyze mode
### Changed
* Change normal_unit default to per_kernel
+7
@@ -244,6 +244,13 @@ add_test(
${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
add_test(
NAME test_profile_section
COMMAND
${Python3_EXECUTABLE} -m pytest -m section --junitxml=tests/test_profile_section.xml
${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
set_tests_properties(
test_profile_kernel_execution
test_profile_ipblocks
+86 -10
@@ -230,7 +230,7 @@ Filtering options
-----------------
``-b``, ``--block <block-name>``
Allows system profiling on one or more selected hardware components to speed
Allows system profiling on one or more selected hardware report blocks to speed
up the profiling process. See :ref:`profiling-hw-component-filtering`.
``-k``, ``--kernel <kernel-substr>``
@@ -251,21 +251,91 @@ Filtering options
.. _profiling-hw-component-filtering:
Hardware component filtering
Hardware report block filtering
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can profile specific hardware components to speed up the profiling process.
In ROCm Compute Profiler, the term hardware block to refers to a hardware component or a
group of hardware components. All profiling results are accumulated in the same
target directory without overwriting those for other hardware components. This
enables incremental profiling and analysis.
You can profile specific hardware report blocks to speed up the profiling process.
In ROCm Compute Profiler, the term hardware report block refers to a section of the
analysis report which focuses on metrics associated with a hardware component or
a group of hardware components. All profiling results are accumulated in the same
target directory without overwriting those for other hardware components.
This enables incremental profiling and analysis.
The following example only gathers hardware counters for the shader sequencer
(SQ) and L2 cache (TCC) components, skipping all other hardware components.
The following example only gathers hardware counters used to calculate metrics
for ``Compute Unit - Instruction Mix`` (block 10) and ``Wavefront Launch Statistics``
(block 7) sections of the analysis report, while skipping over all other hardware counters.
.. code-block:: shell-session
$ rocprof-compute profile --name vcopy -b SQ TCC -- ./vcopy -n 1048576 -b 256
$ rocprof-compute profile --name vcopy -b 10 7 -- ./vcopy -n 1048576 -b 256
__ _
_ __ ___ ___ _ __ _ __ ___ / _| ___ ___ _ __ ___ _ __ _ _| |_ ___
| '__/ _ \ / __| '_ \| '__/ _ \| |_ _____ / __/ _ \| '_ ` _ \| '_ \| | | | __/ _ \
| | | (_) | (__| |_) | | | (_) | _|_____| (_| (_) | | | | | | |_) | |_| | || __/
|_| \___/ \___| .__/|_| \___/|_| \___\___/|_| |_| |_| .__/ \__,_|\__\___|
|_| |_|
rocprofiler-compute version: 2.0.0
Profiler choice: rocprofv1
Path: /home/auser/repos/rocprofiler-compute/sample/workloads/vcopy/MI200
Target: MI200
Command: ./vcopy -n 1048576 -b 256
Kernel Selection: None
Dispatch Selection: None
Hardware Blocks: []
Report Sections: ['10', '7']
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
To see a list of available hardware report blocks, use the ``--list-metrics`` option.
.. code-block:: shell-session
$ rocprof-compute profile --list-metrics
__ _
_ __ ___ ___ _ __ _ __ ___ / _| ___ ___ _ __ ___ _ __ _ _| |_ ___
| '__/ _ \ / __| '_ \| '__/ _ \| |_ _____ / __/ _ \| '_ ` _ \| '_ \| | | | __/ _ \
| | | (_) | (__| |_) | | | (_) | _|_____| (_| (_) | | | | | | |_) | |_| | || __/
|_| \___/ \___| .__/|_| \___/|_| \___\___/|_| |_| |_| .__/ \__,_|\__\___|
|_| |_|
0 -> Top Stats
1 -> System Info
2 -> System Speed-of-Light
2.1 -> Speed-of-Light
2.1.0 -> VALU FLOPs
2.1.1 -> VALU IOPs
2.1.2 -> MFMA FLOPs (F8)
...
5 -> Command Processor (CPC/CPF)
5.1 -> Command Processor Fetcher
5.1.0 -> CPF Utilization
5.1.1 -> CPF Stall
5.1.2 -> CPF-L2 Utilization
5.2 -> Packet Processor
5.2.0 -> CPC Utilization
5.2.1 -> CPC Stall Rate
5.2.5 -> CPC-UTCL1 Stall
...
6 -> Workgroup Manager (SPI)
6.1 -> Workgroup Manager Utilizations
6.1.0 -> Accelerator Utilization
6.1.1 -> Scheduler-Pipe Utilization
6.1.2 -> Workgroup Manager Utilization
It is also possible to filter counter collection by hardware component such as Shader Sequencer (SQ)
and L2 cache (TCC) as shown below.
.. code-block:: shell-session
$ rocprof-compute profile --name vcopy -b SQ TCC -- ./vcopy -n 1048576 -b 256
__ _
_ __ ___ ___ _ __ _ __ ___ / _| ___ ___ _ __ ___ _ __ _ _| |_ ___
@@ -297,12 +367,18 @@ The following example only gathers hardware counters for the shader sequencer
Kernel Selection: None
Dispatch Selection: None
Hardware Blocks: ['sq', 'tcc']
Report Sections: []
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
.. warning::
Filtering by hardware components (e.g. SQ, TCC) will soon be deprecated.
It is recommended to use hardware report block based filtering.
.. _profiling-kernel-filtering:
Kernel filtering
+6 -6
@@ -57,17 +57,17 @@ Common filters to customize data collection include:
Enables filtering based on dispatch ID.
``-b``, ``--block``
Enables collection metrics for only the specified (one or more) hardware
component blocks.
Enables collection metrics for only the specified hardware report blocks.
See :ref:`Filtering <filtering>` for an in-depth walkthrough.
To view available metrics by hardware block, use the ``--list-metrics``
argument:
To view available metrics by hardware block, use the ``profile`` mode ``--list-metrics``
option with an optional system architecture argument (inferred if not provided):
.. code-block:: shell
$ rocprof-compute analyze --list-metrics <sys_arch>
$ rocprof-compute profile --list-metrics
$ rocprof-compute profile --list-metrics <sys_arch>
.. _basic-analyze-cli:
@@ -80,7 +80,7 @@ interface with profiling results. View different metrics derived from your
profiled results and get immediate access to all metrics organized by hardware
blocks.
If you don't apply kernel, dispatch, or hardware block filters at this stage,
If you don't apply kernel, dispatch, or hardware report block filters at this stage,
analysis is reflective of the entirety of the profiling data.
To interact with profiling results from a different session, provide the
+1
@@ -50,6 +50,7 @@ pythonpath = [
]
markers = [
"section",
"kernel_execution",
"block",
"misc",
+60 -9
@@ -24,14 +24,15 @@
import argparse
import os
import re
import shutil
from pathlib import Path
def print_avail_arch(avail_arch: list):
ret_str = "\t\tList all available metrics for analysis on specified arch:"
ret_str = "\t\t\tList all available metrics for analysis on specified arch:"
for arch in avail_arch:
ret_str += "\n\t\t {}".format(arch)
ret_str += "\n\t\t\t {}".format(arch)
return ret_str
@@ -114,7 +115,6 @@ Examples:
type=str,
metavar="",
dest="name",
required=True,
help="\t\t\tAssign a name to workload.",
)
profile_group.add_argument("--target", type=str, default=None, help=argparse.SUPPRESS)
@@ -154,7 +154,7 @@ Examples:
default=False,
action="store_true",
help=argparse.SUPPRESS,
#help="\t\t\tKokkos trace, traces Kokkos API calls.",
# help="\t\t\tKokkos trace, traces Kokkos API calls.",
)
profile_group.add_argument(
"-k",
@@ -177,16 +177,67 @@ Examples:
required=False,
help="\t\t\tDispatch ID filtering.",
)
class AggregateDict(argparse.Action):
def __call__(self, parser, namespace, values, option_string=None):
aggregated_dict = getattr(namespace, self.dest, {})
if aggregated_dict is None:
aggregated_dict = {}
for key, value in values:
aggregated_dict[key] = value
setattr(namespace, self.dest, aggregated_dict)
def validate_block(value):
# Metric id regex, for example, 10, 4, 4.3, 4.32
# Don't allow more than two digits after the decimal point
metric_id_pattern = re.compile(r"^\d+(\.\d{1,2})?$")
# Allow only the following hardware blocks
hardware_block_pattern = re.compile(r"^(SQ|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF)$")
if metric_id_pattern.match(value):
return (str(value), "metric_id")
if hardware_block_pattern.match(value):
return (str(value), "hardware_block")
raise argparse.ArgumentTypeError(f"Invalid hardware block or metric id: {value}")
profile_group.add_argument(
"-b",
"--block",
type=str,
dest="ipblocks",
type=validate_block,
action=AggregateDict,
dest="filter_blocks",
metavar="",
nargs="+",
required=False,
choices=["SQ", "SQC", "TA", "TD", "TCP", "TCC", "SPI", "CPC", "CPF"],
help="\t\t\tHardware block filtering:\n\t\t\t SQ\n\t\t\t SQC\n\t\t\t TA\n\t\t\t TD\n\t\t\t TCP\n\t\t\t TCC\n\t\t\t SPI\n\t\t\t CPC\n\t\t\t CPF",
default={},
help="""\t\t\tSpecify metric id(s) from --list-metrics for filtering (e.g. 10, 4, 4.3).
\t\t\tCan provide multiple space separated arguments.
\t\t\tCan also accept Hardware blocks.
\t\t\tHardware block filtering (to be deprecated soon):
\t\t\t SQ
\t\t\t SQC
\t\t\t TA
\t\t\t TD
\t\t\t TCP
\t\t\t TCC
\t\t\t SPI
\t\t\t CPC
\t\t\t CPF""",
)
profile_group.add_argument(
"--list-metrics",
metavar="",
nargs="?",
const="",
# Argument to --list-metrics is optional
choices=[""] + list(supported_archs.keys()), # ["gfx906", "gfx908", "gfx90a"],
help=print_avail_arch(supported_archs.keys()),
)
profile_group.add_argument(
"--config-dir",
dest="config_dir",
metavar="",
help="\t\t\tSpecify the directory of customized report section configs.",
default=rocprof_compute_home.joinpath("rocprof_compute_soc/analysis_configs/"),
)
result = shutil.which("rocscope")
@@ -487,7 +538,7 @@ Examples:
dest="filter_metrics",
metavar="",
nargs="+",
help="\t\tSpecify hardware block/metric id(s) from --list-metrics for filtering.",
help="\t\tSpecify metric id(s) from --list-metrics for filtering.",
)
analyze_group.add_argument(
"--gpu-id",
@@ -45,6 +45,7 @@ class OmniAnalyze_Base:
self.__args = args
self._runs = OrderedDict()
self._arch_configs = {}
self._profiling_config = dict()
self.__supported_archs = supported_archs
self._output = None
self.__socs: dict = None # available OmniSoC objs
@@ -254,6 +255,9 @@ class OmniAnalyze_Base:
open(self.__args.output_file, "w+") if self.__args.output_file else sys.stdout
)
# Read profiling config
self._profiling_config = file_io.load_profiling_config(self.__args.path[0][0])
# initialize runs
self._runs = self.initalize_runs()
@@ -100,4 +100,5 @@ class cli_analysis(OmniAnalyze_Base):
self._runs[self.get_args().path[0][0]].sys_info.iloc[0]["gpu_arch"]
],
self._output,
self._profiling_config,
)
+63 -5
@@ -33,10 +33,11 @@ import time
from pathlib import Path
import pandas as pd
import yaml
import config
from argparser import omniarg_parser
from utils import file_io
from utils import file_io, parser, schema
from utils.logger import (
setup_console_handler,
setup_file_handler,
@@ -47,6 +48,7 @@ from utils.utils import (
console_debug,
console_error,
console_log,
console_warning,
demarcate,
detect_rocprof,
get_submodules,
@@ -230,11 +232,50 @@ class RocProfCompute:
return
@demarcate
def list_metrics(self):
if not self.__args.list_metrics:
arch = self.__mspec.gpu_arch
else:
arch = self.__args.list_metrics
if arch in self.__supported_archs.keys():
ac = schema.ArchConfig()
ac.panel_configs = file_io.load_panel_configs(
self.__args.config_dir.joinpath(arch)
)
sys_info = self.__mspec.get_class_members().iloc[0]
parser.build_dfs(archConfigs=ac, filter_metrics=[], sys_info=sys_info)
for key, value in ac.metric_list.items():
prefix = ""
if "." not in str(key):
prefix = ""
elif str(key).count(".") == 1:
prefix = "\t"
else:
prefix = "\t\t"
print(prefix + key, "->", value)
sys.exit(0)
else:
console_error("Unsupported arch")
@demarcate
def run_profiler(self):
self.print_graphic()
self.load_soc_specs()
if self.__args.list_metrics is not None:
self.list_metrics()
elif self.__args.name is None:
sys.exit("Either --list-metrics or --name is required")
# Deprecation warning for hardware blocks
if [
name
for name, type in self.__args.filter_blocks.items()
if type == "hardware_block"
]:
console_warning("Hardware block based filtering will be deprecated soon")
# FIXME:
# Changing default path should be done at the end of arg parsing stage,
# unless there is a specific reason to do here.
@@ -250,25 +291,37 @@ class RocProfCompute:
from rocprof_compute_profile.profiler_rocprof_v1 import rocprof_v1_profiler
profiler = rocprof_v1_profiler(
self.__args, self.__profiler_mode, self.__soc[self.__mspec.gpu_arch]
self.__args,
self.__profiler_mode,
self.__soc[self.__mspec.gpu_arch],
self.__supported_archs,
)
elif self.__profiler_mode == "rocprofv2":
from rocprof_compute_profile.profiler_rocprof_v2 import rocprof_v2_profiler
profiler = rocprof_v2_profiler(
self.__args, self.__profiler_mode, self.__soc[self.__mspec.gpu_arch]
self.__args,
self.__profiler_mode,
self.__soc[self.__mspec.gpu_arch],
self.__supported_archs,
)
elif self.__profiler_mode == "rocprofv3":
from rocprof_compute_profile.profiler_rocprof_v3 import rocprof_v3_profiler
profiler = rocprof_v3_profiler(
self.__args, self.__profiler_mode, self.__soc[self.__mspec.gpu_arch]
self.__args,
self.__profiler_mode,
self.__soc[self.__mspec.gpu_arch],
self.__supported_archs,
)
elif self.__profiler_mode == "rocscope":
from rocprof_compute_profile.profiler_rocscope import rocscope_profiler
profiler = rocscope_profiler(
self.__args, self.__profiler_mode, self.__soc[self.__mspec.gpu_arch]
self.__args,
self.__profiler_mode,
self.__soc[self.__mspec.gpu_arch],
self.__supported_archs,
)
else:
console_error("Unsupported profiler")
@@ -278,6 +331,11 @@ class RocProfCompute:
# -----------------------
self.__soc[self.__mspec.gpu_arch].profiling_setup()
# Write profiling configuration as yaml file
with open(Path(self.__args.path).joinpath("profiling_config.yaml"), "w") as f:
args_dict = vars(self.__args)
args_dict["config_dir"] = str(args_dict["config_dir"])
yaml.dump(args_dict, f)
# enable file-based logging
setup_file_handler(self.__args.loglevel, self.__args.path)
+19 -5
@@ -27,7 +27,6 @@ import logging
import os
import re
import shutil
import sys
import time
from abc import ABC, abstractmethod
from pathlib import Path
@@ -51,15 +50,22 @@ from utils.utils import (
class RocProfCompute_Base:
def __init__(self, args, profiler_mode, soc):
def __init__(self, args, profiler_mode, soc, supported_archs):
self.__args = args
self.__profiler = profiler_mode
self.__supported_archs = supported_archs
self._soc = soc # OmniSoC obj
self.__perfmon_dir = str(
Path(str(config.rocprof_compute_home)).joinpath(
"rocprof_compute_soc", "profile_configs"
)
)
self.__filter_hardware_blocks = [
name for name, type in args.filter_blocks.items() if type == "hardware_block"
]
self.__filter_metric_ids = [
name for name, type in args.filter_blocks.items() if type == "metric_id"
]
def get_args(self):
return self.__args
@@ -320,10 +326,14 @@ class RocProfCompute_Base:
console_log("Command: " + str(self.__args.remaining))
console_log("Kernel Selection: " + str(self.__args.kernel))
console_log("Dispatch Selection: " + str(self.__args.dispatch))
if self.__args.ipblocks == None:
if self.__filter_hardware_blocks == None:
console_log("Hardware Blocks: All")
else:
console_log("Hardware Blocks: " + str(self.__args.ipblocks))
console_log("Hardware Blocks: " + str(self.__filter_hardware_blocks))
if self.__filter_metric_ids == None:
console_log("Report Sections: All")
else:
console_log("Report Sections: " + str(self.__filter_metric_ids))
msg = "Collecting Performance Counters"
(
@@ -424,7 +434,11 @@ class RocProfCompute_Base:
gen_sysinfo(
workload_name=self.__args.name,
workload_dir=self.get_args().path,
ip_blocks=self.__args.ipblocks,
ip_blocks=[
name
for name, type in self.__args.filter_blocks.items()
if type == "hardware_block"
],
app_cmd=self.__args.remaining,
skip_roof=self.__args.no_roof,
roof_only=self.__args.roof_only,
@@ -30,8 +30,8 @@ from utils.utils import console_log, demarcate, replace_timestamps, store_app_cm
class rocprof_v1_profiler(RocProfCompute_Base):
def __init__(self, profiling_args, profiler_mode, soc):
super().__init__(profiling_args, profiler_mode, soc)
def __init__(self, profiling_args, profiler_mode, soc, supported_archs):
super().__init__(profiling_args, profiler_mode, soc, supported_archs)
self.ready_to_profile = (
self.get_args().roof_only
and not Path(self.get_args().path).joinpath("pmc_perf.csv").is_file()
@@ -31,8 +31,8 @@ from utils.utils import console_log, demarcate, replace_timestamps, store_app_cm
class rocprof_v2_profiler(RocProfCompute_Base):
def __init__(self, profiling_args, profiler_mode, soc):
super().__init__(profiling_args, profiler_mode, soc)
def __init__(self, profiling_args, profiler_mode, soc, supported_archs):
super().__init__(profiling_args, profiler_mode, soc, supported_archs)
self.ready_to_profile = (
self.get_args().roof_only
and not Path(self.get_args().path).joinpath("pmc_perf.csv").is_file()
@@ -32,8 +32,8 @@ from utils.utils import console_error, console_log, demarcate, replace_timestamp
class rocprof_v3_profiler(RocProfCompute_Base):
def __init__(self, profiling_args, profiler_mode, soc):
super().__init__(profiling_args, profiler_mode, soc)
def __init__(self, profiling_args, profiler_mode, soc, supported_archs):
super().__init__(profiling_args, profiler_mode, soc, supported_archs)
self.ready_to_profile = (
self.get_args().roof_only
and not Path(self.get_args().path).joinpath("pmc_perf.csv").is_file()
@@ -27,8 +27,8 @@ from utils.utils import console_log, demarcate
class rocscope_profiler(RocProfCompute_Base):
def __init__(self, profiling_args, profiler_mode, soc):
super().__init__(profiling_args, profiler_mode, soc)
def __init__(self, profiling_args, profiler_mode, soc, supported_archs):
super().__init__(profiling_args, profiler_mode, soc, supported_archs)
# -----------------------
# Required child methods
@@ -1,11 +0,0 @@
---
Panel Config:
id: 400
title: Roofline
data source:
- raw_csv_table:
id: 401
source: roofline.csv
comparable: false # for now
cli_style: roofline_chart
# TODO: refactoring the data structure to have metrics here!
+164 -23
@@ -32,9 +32,17 @@ from collections import OrderedDict
from pathlib import Path
import numpy as np
import yaml
from rocprof_compute_base import MI300_CHIP_IDS, SUPPORTED_ARCHS
from utils.utils import console_debug, console_error, console_log, demarcate
from utils.parser import build_in_vars, supported_denom
from utils.utils import (
console_debug,
console_error,
console_log,
convert_metric_id_to_panel_idx,
demarcate,
)
class OmniSoC_Base:
@@ -48,19 +56,10 @@ class OmniSoC_Base:
self.__perfmon_config = (
{}
) # Per IP block max number of simultaneous counters. GFX IP Blocks
self.__section_counters = set() # hw counters corresponding to filtered sections
self.__soc_params = {} # SoC specifications
self.__compatible_profilers = [] # Store profilers compatible with SoC
self.populate_mspec()
# In some cases (i.e. --specs) path will not be given
if hasattr(self.__args, "path"):
if self.__args.path == str(Path(os.getcwd()).joinpath("workloads")):
self.__workload_dir = str(
Path(self.__args.path).joinpath(
self.__args.name, self._mspec.gpu_model
)
)
else:
self.__workload_dir = self.__args.path
def __hash__(self):
return hash(self.__arch)
@@ -189,6 +188,47 @@ class OmniSoC_Base:
total_xcds(self._mspec.gpu_model, self._mspec.compute_partition)
)
@demarcate
def section_filter(self):
"""
Create a set of counters required for the selected report sections.
Parse analysis report configuration files based on the selected report sections to be filtered.
"""
args = self.__args
for section in self.__filter_metric_ids:
section_num = convert_metric_id_to_panel_idx(section)
file_id = str(section_num // 100)
# Convert "4" to "04"
if len(file_id) == 1:
file_id = f"0{file_id}"
# Identify yaml file corresponding to file_id
config_filename = [
filename
for filename in os.listdir(Path(args.config_dir).joinpath(self.__arch))
if filename.endswith(".yaml") and filename.startswith(file_id)
][0]
# Read the yaml file
with open(
Path(args.config_dir).joinpath(self.__arch, config_filename), "r"
) as stream:
section_config = yaml.safe_load(stream)
# Extract subsection if section is of the form 4.52
if section_num % 100:
section_config_text = "\n".join(
[
# Convert yaml to string
yaml.dump(subsection)
for subsection in section_config["Panel Config"]["data source"]
# Skip data sources that are not metric tables instead of raising KeyError
if subsection.get("metric_table", {}).get("id") == section_num
]
)
else:
# Convert yaml to string
section_config_text = yaml.dump(section_config)
self.__section_counters = self.__section_counters.union(
parse_counters(section_config_text)
)
@demarcate
def perfmon_filter(self, roofline_perfmon_only: bool):
"""Filter default performance counter set based on user arguments"""
@@ -197,15 +237,40 @@ class OmniSoC_Base:
and Path(self.get_args().path).joinpath("pmc_perf.csv").is_file()
):
return
workload_perfmon_dir = self.__workload_dir + "/perfmon"
# In some cases (i.e. --specs) path will not be given
if hasattr(self.__args, "path"):
if self.__args.path == str(Path(os.getcwd()).joinpath("workloads")):
workload_dir = str(
Path(self.__args.path).joinpath(
self.__args.name, self._mspec.gpu_model
)
)
else:
workload_dir = self.__args.path
workload_perfmon_dir = workload_dir + "/perfmon"
self.__filter_hardware_blocks = [
name
for name, type in self.get_args().filter_blocks.items()
if type == "hardware_block"
]
self.__filter_metric_ids = [
name
for name, type in self.get_args().filter_blocks.items()
if type == "metric_id"
]
self.section_filter()
# Initialize directories
if not Path(self.__workload_dir).is_dir():
os.makedirs(self.__workload_dir)
elif not Path(self.__workload_dir).is_symlink():
shutil.rmtree(self.__workload_dir)
if not Path(workload_dir).is_dir():
os.makedirs(workload_dir)
elif not Path(workload_dir).is_symlink():
shutil.rmtree(workload_dir)
else:
os.unlink(self.__workload_dir)
os.unlink(workload_dir)
os.makedirs(workload_perfmon_dir)
@@ -216,16 +281,17 @@ class OmniSoC_Base:
)
# Perfmon list filtering
if self.__args.ipblocks != None:
for i in range(len(self.__args.ipblocks)):
self.__args.ipblocks[i] = self.__args.ipblocks[i].lower()
if self.__filter_hardware_blocks:
hardware_blocks = [
block.lower() for block in self.__filter_hardware_blocks
]
mpattern = "pmc_([a-zA-Z0-9_]+)_perf*"
pmc_files_list = []
for fname in ref_pmc_files_list:
fbase = Path(fname).stem
ip = re.match(mpattern, fbase).group(1)
if ip in self.__args.ipblocks:
if ip in hardware_blocks:
pmc_files_list.append(fname)
console_log("fname: " + fbase + ": Added")
else:
@@ -242,8 +308,9 @@ class OmniSoC_Base:
perfmon_coalesce(
pmc_files_list,
self.__perfmon_config,
self.__workload_dir,
workload_dir,
self.get_args().spatial_multiplexing,
self.__section_counters,
)
# ----------------------------------------------------
@@ -310,7 +377,38 @@ def using_v3():
@demarcate
def perfmon_coalesce(pmc_files_list, perfmon_config, workload_dir, spatial_multiplexing):
def parse_counters(config_text):
"""
Create a set of all hardware counters mentioned in the given config file content string
"""
# hw counter name should start with ip block name
hw_counter_regex = r"(?:SQ|SQC|TA|TD|TCP|TCC|CPC|CPF|SPI|GRBM)_[0-9A-Za-z_]+"
# only capture the variable name after $ using capturing group
variable_regex = r"\$([0-9A-Za-z_]+)"
hw_counter_matches = set(re.findall(hw_counter_regex, config_text))
variable_matches = set(re.findall(variable_regex, config_text))
# get hw counters and variables for all supported denominators
for formula in supported_denom.values():
hw_counter_matches.update(re.findall(hw_counter_regex, formula))
variable_matches.update(re.findall(variable_regex, formula))
# get hw counters corresponding to variables recursively
while variable_matches:
subvariable_matches = set()
for var in variable_matches:
if var in build_in_vars:
hw_counter_matches.update(
re.findall(hw_counter_regex, build_in_vars[var])
)
subvariable_matches.update(re.findall(variable_regex, build_in_vars[var]))
# process new found variables
variable_matches = subvariable_matches - variable_matches
return list(hw_counter_matches)
@demarcate
def perfmon_coalesce(
pmc_files_list, perfmon_config, workload_dir, spatial_multiplexing, section_counters
):
"""Sort and bucket all related performance counters to minimize required application passes"""
workload_perfmon_dir = workload_dir + "/perfmon"
@@ -388,6 +486,49 @@ def perfmon_coalesce(pmc_files_list, perfmon_config, workload_dir, spatial_multi
if accu in normal_counters:
del normal_counters[accu]
# If section report filters have been provided, only collect counters necessary for those section reports
# Remove _sum and _expand suffixes while matching
def remove_suffixes(string):
for suffix in ["_sum", "_expand"]:
if string.endswith(suffix):
string = string[: -len(suffix)]
break
return string
section_counters = {remove_suffixes(counter) for counter in section_counters}
ignored_counters = list()
if section_counters:
# Remove unnecessary normal counters
for counter_name in list(normal_counters.keys()):
if remove_suffixes(counter_name) not in section_counters:
del normal_counters[counter_name]
ignored_counters.append(counter_name)
# Remove unnecessary accumulate counters
filtered_accumlate_counters = list()
for counters in accumulate_counters:
if any(
remove_suffixes(counter_name) in section_counters
for counter_name in counters
):
filtered_accumlate_counters.append(counters)
else:
ignored_counters.extend(counters)
accumulate_counters = filtered_accumlate_counters
if ignored_counters:
console_log(
f"Not collecting following counters per provided filter: {', '.join(ignored_counters)} "
)
# Throw error if no counters to be collected
if len(normal_counters) == 0 and len(accumulate_counters) == 0:
console_error(
"profiling",
"No performance counters to collect, please check the provided profiling filters",
)
output_files = []
accu_file_count = 0
+13 -35
@@ -25,9 +25,11 @@
import os
import time
from abc import ABC, abstractmethod
from collections import OrderedDict
from pathlib import Path
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from dash import dcc, html
@@ -75,12 +77,6 @@ class Roofline:
if hasattr(self.__args, "sort") and self.__args.sort != "ALL":
self.__run_parameters["sort_type"] = self.__args.sort
if (
not isinstance(self.__run_parameters["workload_dir"], list)
and self.__run_parameters["workload_dir"] != None
):
self.roof_setup()
self.validate_parameters()
def validate_parameters(self):
@@ -110,6 +106,12 @@ class Roofline:
ret_df,
):
"""Generate a set of empirical roofline plots given a directory containing required profiling and benchmarking data"""
if (
not isinstance(self.__run_parameters["workload_dir"], list)
and self.__run_parameters["workload_dir"] != None
):
self.roof_setup()
# Create arithmetic intensity data that will populate the roofline model
console_debug("roofline", "Path: %s" % self.__run_parameters["workload_dir"])
self.__ai_data = calc_ai(self.__mspec, self.__run_parameters["sort_type"], ret_df)
@@ -375,9 +377,11 @@ class Roofline:
@demarcate
def standalone_roofline(self):
from collections import OrderedDict
import pandas as pd
if (
not isinstance(self.__run_parameters["workload_dir"], list)
and self.__run_parameters["workload_dir"] != None
):
self.roof_setup()
# Change vL1D to an interpretable str, if required
if "vL1D" in self.__run_parameters["mem_level"]:
@@ -394,32 +398,6 @@ class Roofline:
t_df["pmc_perf"] = pd.read_csv(app_path)
self.empirical_roofline(ret_df=t_df)
# Main methods
@abstractmethod
def pre_processing(self):
if self.__args.roof_only:
# check for sysinfo
console_log(
"roofline", "Checking for sysinfo.csv in " + str(self.__args.path)
)
sysinfo_path = str(Path(self.__args.path).joinpath("sysinfo.csv"))
if not Path(sysinfo_path).is_file():
console_log("roofline", "sysinfo.csv not found. Generating...")
class Dummy_SoC:
roofline_obj = True
gen_sysinfo(
workload_name=self.__args.name,
workload_dir=self.__workload_dir,
ip_blocks=self.__args.ipblocks,
app_cmd=self.__args.remaining,
skip_roof=self.__args.no_roof,
roof_only=self.__args.roof_only,
mspec=self.__mspec,
soc=Dummy_SoC,
)
@abstractmethod
def profile(self):
if self.__args.roof_only:
+16 -1
@@ -36,7 +36,7 @@ import yaml
import config
from utils import schema
from utils.kernel_name_shortener import kernel_name_shortener
from utils.utils import console_debug, console_error, demarcate
from utils.utils import console_debug, console_error, console_log, demarcate
# TODO: use pandas chunksize or dask to read really large csv file
# from dask import dataframe as dd
@@ -85,6 +85,21 @@ def load_panel_configs(dir):
return od
def load_profiling_config(config_dir):
"""
Load profiling config from yaml file.
"""
try:
with open(Path(config_dir).joinpath("profiling_config.yaml")) as file:
prof_config = yaml.safe_load(file)
return prof_config
except FileNotFoundError:
console_log(
f"Could not find profiling_config.yaml in {config_dir} for filtering analysis report"
)
return dict()
@demarcate
def create_df_kernel_top_stats(
df_in,
+3 -1
@@ -492,7 +492,9 @@ def build_dfs(archConfigs, filter_metrics, sys_info):
if type == "metric_table":
headers = ["Metric_ID"]
data_source_idx = str(data_config["id"] // 100)
if data_source_idx != 0 or data_source_idx in filter_metrics:
if data_source_idx != 0 or (
filter_metrics and data_source_idx in filter_metrics
):
metric_list[data_source_idx] = panel["title"]
if (
"cli_style" in data_config
+58 -7
@@ -29,7 +29,7 @@ import pandas as pd
from tabulate import tabulate
from utils import parser
from utils.utils import console_log, console_warning
from utils.utils import console_log, console_warning, convert_metric_id_to_panel_idx
hidden_columns = ["Tips", "coll_level"]
hidden_sections = [1900, 2000]
@@ -60,11 +60,20 @@ def get_table_string(df, transpose=False, decimal=2):
)
def show_all(args, runs, archConfigs, output):
def show_all(args, runs, archConfigs, output, profiling_config):
"""
Show all panels with their data in plain text mode.
"""
comparable_columns = parser.build_comparable_columns(args.time_unit)
filter_panel_ids = [
convert_metric_id_to_panel_idx(section)
for section in [
name
for name, type in profiling_config.get("filter_blocks", {}).items()
if type == "metric_id"
]
]
comparable_columns = parser.build_comparable_columns(args.time_unit)
for panel_id, panel in archConfigs.panel_configs.items():
# Skip panels that don't support baseline comparison
@@ -74,6 +83,27 @@ def show_all(args, runs, archConfigs, output):
for data_source in panel["data source"]:
for type, table_config in data_source.items():
# If block filtering was used during analysis, then dont use profiling config
# If block filtering was used in profiling config, only show those panels
# If block filtering not used in profiling config, show all panels
# Skip this table if table id or panel id is not present in block filters
# However, always show panel id <= 100
if (
not args.filter_metrics
and filter_panel_ids
and table_config["id"] not in filter_panel_ids
and panel_id not in filter_panel_ids
and panel_id > 100
):
table_id_str = (
str(table_config["id"] // 100)
+ "."
+ str(table_config["id"] % 100)
)
console_log(
f"Not showing table not selected during profiling: {table_id_str} {table_config['title']}"
)
continue
# take the 1st run as baseline
base_run, base_data = next(iter(runs.items()))
base_df = base_data.dfs[table_config["id"]]
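The skip condition added in this hunk can be read as a small predicate. The sketch below is not the project's code: `should_skip` is a hypothetical helper, the `profiling_config` dict is a made-up example shaped like the `filter_blocks` entries the tests look for, and the id conversion is simplified to whole-number ids (the commit uses `convert_metric_id_to_panel_idx`).

```python
# Made-up profiling config, shaped like the entries checked by the tests.
profiling_config = {"filter_blocks": {"10": "metric_id", "TA": "hardware_block"}}

# Keep only metric-id filters; hardware block names are ignored here.
# Simplification: whole-number ids only ("10" -> 1000).
filter_panel_ids = [
    int(name) * 100
    for name, kind in profiling_config.get("filter_blocks", {}).items()
    if kind == "metric_id"
]


def should_skip(table_id, panel_id, filter_metrics, filter_panel_ids):
    """Hypothetical helper mirroring the condition in show_all()."""
    return bool(
        not filter_metrics  # analysis-time -b overrides the profiling config
        and filter_panel_ids  # no profiling-time filter means show everything
        and table_id not in filter_panel_ids
        and panel_id not in filter_panel_ids
        and panel_id > 100  # panels with id <= 100 are always shown
    )


shown_selected = should_skip(1001, 1000, None, filter_panel_ids)
hidden_unselected = should_skip(301, 300, None, filter_panel_ids)
analysis_override = should_skip(301, 300, ["3"], filter_panel_ids)
low_panel = should_skip(101, 100, None, filter_panel_ids)
no_profile_filter = should_skip(301, 300, None, [])
```

Under these assumptions only `hidden_unselected` is true: a table is dropped only when a profiling-time filter exists, no analysis-time filter was given, and neither the table nor its panel was selected.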
@@ -207,7 +237,25 @@ def show_all(args, runs, archConfigs, output):
+ str(table_config["id"] % 100)
)
if "title" in table_config and table_config["title"]:
# Check if any column in df is empty
is_empty_columns_exist = any(
[
df.columns[col_idx]
for col_idx in range(len(df.columns))
if df.replace("", None).iloc[:, col_idx].isnull().all()
]
)
# Do not print the table if any column is empty
if is_empty_columns_exist:
console_log(
f"Not showing table with empty column(s): {table_id_str} {table_config['title']}"
)
if (
"title" in table_config
and table_config["title"]
and not is_empty_columns_exist
):
ss += table_id_str + " " + table_config["title"] + "\n"
if args.df_file_dir:
@@ -238,10 +286,13 @@ def show_all(args, runs, archConfigs, output):
and "columnwise" in table_config
and table_config["columnwise"] == True
)
ss += (
get_table_string(df, transpose=transpose, decimal=args.decimal)
+ "\n"
)
if not is_empty_columns_exist:
ss += (
get_table_string(
df, transpose=transpose, decimal=args.decimal
)
+ "\n"
)
if ss:
print("\n" + "-" * 80, file=output)
+15 -2
@@ -191,7 +191,7 @@ def capture_subprocess_output(subprocess_args, new_env=None, profileMode=False):
global rocprof_args
# Format command for debug messages, formatting for rocprofv1 and rocprofv2
command = " ".join(rocprof_args)
console_debug("subprocess", "Running: " + command)
console_debug("subprocess", "Running: " + command + " " + " ".join(subprocess_args))
# Start subprocess
# bufsize = 1 means output is line buffered
# universal_newlines = True is required for line buffering
@@ -820,7 +820,7 @@ def gen_sysinfo(
df["workload_name"] = workload_name
blocks = []
if ip_blocks == None:
if not ip_blocks:
t = ["SQ", "LDS", "SQC", "TA", "TD", "TCP", "TCC", "SPI", "CPC", "CPF"]
blocks += t
else:
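The switch from `ip_blocks == None` to `not ip_blocks` changes what happens for an empty list. A minimal sketch of the difference; `pick_old` and `pick_new` are hypothetical helpers, and the `else` branch (returning the given list unchanged) is an assumption, since the hunk does not show it:

```python
DEFAULT_BLOCKS = ["SQ", "LDS", "SQC", "TA", "TD", "TCP", "TCC", "SPI", "CPC", "CPF"]


def pick_old(ip_blocks):
    # Old check: only None selects the defaults.
    if ip_blocks == None:  # noqa: E711 (kept as in the old line)
        return list(DEFAULT_BLOCKS)
    return list(ip_blocks)  # assumed else branch


def pick_new(ip_blocks):
    # New check: None and an empty list both select the defaults.
    if not ip_blocks:
        return list(DEFAULT_BLOCKS)
    return list(ip_blocks)  # assumed else branch


old_empty = pick_old([])
new_empty = pick_new([])
new_explicit = pick_new(["TA"])
```

With the old check an empty list would yield no blocks at all; with the new check it falls back to the full default list.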
@@ -1249,3 +1249,16 @@ def merge_counters_spatial_multiplex(df_multi_index):
final_df = pd.concat(result_dfs, keys=coll_levels, axis=1, copy=False)
return final_df
def convert_metric_id_to_panel_idx(metric_id):
# "4.02" -> 402
# "4.23" -> 423
# "4" -> 400
tokens = metric_id.split(".")
if len(tokens) == 1:
return int(tokens[0]) * 100
elif len(tokens) == 2:
return int(tokens[0]) * 100 + int(tokens[1])
else:
raise Exception(f"Invalid metric id: {metric_id}")
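For reference, the helper above behaves as follows on a few inputs. The function body is copied from the hunk with indentation restored; the sample ids beyond the three in its comments are illustrative.

```python
def convert_metric_id_to_panel_idx(metric_id):
    # "4.02" -> 402
    # "4.23" -> 423
    # "4" -> 400
    tokens = metric_id.split(".")
    if len(tokens) == 1:
        return int(tokens[0]) * 100
    elif len(tokens) == 2:
        return int(tokens[0]) * 100 + int(tokens[1])
    else:
        raise Exception(f"Invalid metric id: {metric_id}")


results = {
    metric_id: convert_metric_id_to_panel_idx(metric_id)
    for metric_id in ["4", "4.02", "4.23", "4.2", "4.20"]
}

# Three-level ids are rejected.
try:
    convert_metric_id_to_panel_idx("4.2.1")
    rejected = False
except Exception:
    rejected = True
```

Note that `"4.2"` and `"4.02"` both map to 402, while `"4.20"` maps to 420, because the part after the dot is parsed as an integer rather than as a decimal fraction.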
+5
@@ -13,3 +13,8 @@ set(VCOPY_SOURCES ../sample/vcopy.cpp)
set_source_files_properties(${VCOPY_SOURCES} PROPERTIES LANGUAGE HIP)
add_executable(vcopy ${VCOPY_SOURCES})
set_target_properties(vcopy PROPERTIES RUNTIME_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/tests)
set(VMEM_SOURCES ../sample/vmem.hip)
set_source_files_properties(${VMEM_SOURCES} PROPERTIES LANGUAGE HIP)
add_executable(vmem ${VMEM_SOURCES})
set_target_properties(vmem PROPERTIES RUNTIME_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/tests)
+72
@@ -1,5 +1,11 @@
import subprocess
from importlib.machinery import SourceFileLoader
from unittest.mock import patch
import pytest
rocprof_compute = SourceFileLoader("rocprof-compute", "src/rocprof-compute").load_module()
def pytest_addoption(parser):
parser.addoption(
@@ -8,3 +14,69 @@ def pytest_addoption(parser):
default=False,
help="Call standalone binary instead of main function during tests",
)
@pytest.fixture
def binary_handler_profile_rocprof_compute(request):
def _handler(config, workload_dir, options=[], check_success=True, roof=False):
if request.config.getoption("--call-binary"):
baseline_opts = [
"build/rocprof-compute.bin",
"profile",
"-n",
"app_1",
"-VVV",
]
if not roof:
baseline_opts.append("--no-roof")
process = subprocess.run(
baseline_opts
+ options
+ ["--path", workload_dir, "--"]
+ config["app_1"],
text=True,
)
# verify run status
if check_success:
assert process.returncode == 0
return process.returncode
else:
baseline_opts = ["rocprof-compute", "profile", "-n", "app_1", "-VVV"]
if not roof:
baseline_opts.append("--no-roof")
with pytest.raises(SystemExit) as e:
with patch(
"sys.argv",
baseline_opts
+ options
+ ["--path", workload_dir, "--"]
+ config["app_1"],
):
rocprof_compute.main()
# verify run status
if check_success:
assert e.value.code == 0
return e.value.code
return _handler
@pytest.fixture
def binary_handler_analyze_rocprof_compute(request):
def _handler(arguments):
if request.config.getoption("--call-binary"):
process = subprocess.run(
["build/rocprof-compute.bin", *arguments],
text=True,
)
return process.returncode
else:
with pytest.raises(SystemExit) as e:
with patch(
"sys.argv",
["rocprof-compute", *arguments],
):
rocprof_compute.main()
return e.value.code
return _handler
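The non-binary branch of both fixtures relies on one pattern: patch `sys.argv`, call the entry point, and read the exit code from the resulting `SystemExit`. A standard-library-only sketch of that pattern follows; `main` here is a hypothetical stand-in for `rocprof_compute.main()`, and `run_cli` is an illustrative name, not part of the test suite.

```python
import sys
from unittest.mock import patch


def main():
    # Hypothetical stand-in for rocprof_compute.main(): exits 0 when
    # --list-metrics is present, 1 otherwise.
    if "--list-metrics" in sys.argv:
        sys.exit(0)
    sys.exit(1)


def run_cli(arguments):
    """Run main() with a patched sys.argv and return its exit code."""
    try:
        with patch("sys.argv", ["rocprof-compute", *arguments]):
            main()
    except SystemExit as e:
        return e.code
    return None  # main() returned without calling sys.exit()


argv_before = list(sys.argv)
ok_code = run_cli(["profile", "--list-metrics"])
fail_code = run_cli(["analyze"])
argv_after = list(sys.argv)
```

`patch` restores the original `sys.argv` when the `with` block exits, even though the block is left through an exception. The fixtures use `pytest.raises(SystemExit)` instead of `try`/`except`, which additionally fails the test if `main()` returns without exiting.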
-1
@@ -6,7 +6,6 @@ from unittest.mock import patch
import pandas as pd
import pytest
import test_utils
from test_utils import binary_handler_analyze_rocprof_compute
config = {}
config["cleanup"] = True if "PYTEST_XDIST_WORKER_COUNT" in os.environ else False
-2
@@ -1,8 +1,6 @@
from unittest.mock import patch
import pandas as pd
import pytest
from test_utils import binary_handler_analyze_rocprof_compute
##################################################
## Generated tests ##
+133 -1
@@ -11,7 +11,6 @@ from unittest.mock import patch
import pandas as pd
import pytest
import test_utils
from test_utils import binary_handler_profile_rocprof_compute
# Globals
@@ -1458,3 +1457,136 @@ def test_mem_levels_LDS(binary_handler_profile_rocprof_compute):
)
test_utils.clean_output_dir(config["cleanup"], workload_dir)
@pytest.mark.section
def test_instmix_section(binary_handler_profile_rocprof_compute):
options = ["--block", "10"]
workload_dir = test_utils.get_output_dir()
_ = binary_handler_profile_rocprof_compute(
config, workload_dir, options, check_success=True, roof=False
)
file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels)
validate(
inspect.stack()[0][3],
workload_dir,
file_dict,
)
assert test_utils.check_file_pattern(
"'10': metric_id", f"{workload_dir}/profiling_config.yaml"
)
assert test_utils.check_file_pattern(
"SQ_INSTS_VALU_MFMA_F64", f"{workload_dir}/pmc_perf.csv"
)
test_utils.clean_output_dir(config["cleanup"], workload_dir)
@pytest.mark.section
def test_instmix_memchart_section(binary_handler_profile_rocprof_compute):
options = ["--block", "10", "3"]
workload_dir = test_utils.get_output_dir()
_ = binary_handler_profile_rocprof_compute(
config, workload_dir, options, check_success=True, roof=False
)
file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels)
validate(
inspect.stack()[0][3],
workload_dir,
file_dict,
)
assert test_utils.check_file_pattern(
"'10': metric_id", f"{workload_dir}/profiling_config.yaml"
)
assert test_utils.check_file_pattern(
"'3': metric_id", f"{workload_dir}/profiling_config.yaml"
)
assert test_utils.check_file_pattern(
"SQ_INSTS_VALU_MFMA_F64", f"{workload_dir}/pmc_perf.csv"
)
assert test_utils.check_file_pattern(
"SQC_TC_DATA_READ_REQ", f"{workload_dir}/pmc_perf.csv"
)
test_utils.clean_output_dir(config["cleanup"], workload_dir)
@pytest.mark.section
def test_instmix_section_TA_block(binary_handler_profile_rocprof_compute):
options = ["--block", "10", "TA"]
workload_dir = test_utils.get_output_dir()
_ = binary_handler_profile_rocprof_compute(
config, workload_dir, options, check_success=True, roof=False
)
file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels)
validate(
inspect.stack()[0][3],
workload_dir,
file_dict,
)
assert test_utils.check_file_pattern(
"'10': metric_id", f"{workload_dir}/profiling_config.yaml"
)
assert test_utils.check_file_pattern(
"TA: hardware_block", f"{workload_dir}/profiling_config.yaml"
)
assert test_utils.check_file_pattern(
"TA_FLAT_WAVEFRONTS", f"{workload_dir}/pmc_perf.csv"
)
assert not test_utils.check_file_pattern(
"SQC_TC_DATA_READ_REQ", f"{workload_dir}/pmc_perf.csv"
)
assert test_utils.check_file_pattern("", f"{workload_dir}/pmc_perf.csv")
test_utils.clean_output_dir(config["cleanup"], workload_dir)
@pytest.mark.section
def test_instmix_section_global_write_kernel(binary_handler_profile_rocprof_compute):
options = ["-k", "global_write", "--block", "10"]
custom_config = dict(config)
custom_config["kernel_name_1"] = "global_write"
custom_config["app_1"] = ["./tests/vmem"]
num_kernels = 1
workload_dir = test_utils.get_output_dir()
_ = binary_handler_profile_rocprof_compute(
custom_config, workload_dir, options, check_success=True, roof=False
)
file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels)
validate(
inspect.stack()[0][3],
workload_dir,
file_dict,
)
assert test_utils.check_file_pattern(
"'10': metric_id", f"{workload_dir}/profiling_config.yaml"
)
assert test_utils.check_file_pattern(
"- global_write", f"{workload_dir}/profiling_config.yaml"
)
assert test_utils.check_file_pattern(
"SQ_INSTS_VALU_MFMA_F64", f"{workload_dir}/pmc_perf.csv"
)
assert test_utils.check_file_pattern("global_write", f"{workload_dir}/pmc_perf.csv")
assert not test_utils.check_file_pattern(
"global_read", f"{workload_dir}/pmc_perf.csv"
)
test_utils.clean_output_dir(config["cleanup"], workload_dir)
@pytest.mark.section
def test_list_metrics(binary_handler_profile_rocprof_compute):
options = ["--list-metrics"]
workload_dir = test_utils.get_output_dir()
_ = binary_handler_profile_rocprof_compute(
config, workload_dir, options, check_success=True, roof=False
)
# workload dir should be empty
assert not os.listdir(workload_dir)
test_utils.clean_output_dir(config["cleanup"], workload_dir)
+9 -72
@@ -25,16 +25,11 @@
import inspect
import os
import re
import shutil
import subprocess
from importlib.machinery import SourceFileLoader
from pathlib import Path
from unittest.mock import patch
import pandas as pd
import pytest
rocprof_compute = SourceFileLoader("rocprof-compute", "src/rocprof-compute").load_module()
def check_resource_allocation():
@@ -57,6 +52,14 @@ def check_resource_allocation():
return
def check_file_pattern(pattern, file_path):
"""Check if the given pattern exists in the file"""
content = ""
with open(file_path) as f:
content = f.read()
return len(re.findall(pattern, content)) != 0
def get_output_dir(suffix="_output", clean_existing=True):
"""Provides a unique output directory based on the name of the calling test function with a suffix applied.
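Since `check_file_pattern` passes its argument to `re.findall`, the pattern is a regular expression, not a literal string. A small sketch of what that implies; the helper is copied from the hunk above (minus the redundant initial assignment), and the CSV contents are made up for illustration:

```python
import os
import re
import tempfile


def check_file_pattern(pattern, file_path):
    """Check if the given pattern exists in the file"""
    with open(file_path) as f:
        content = f.read()
    return len(re.findall(pattern, content)) != 0


with tempfile.TemporaryDirectory() as tmp:
    # Made-up file contents; not real profiler output.
    path = os.path.join(tmp, "pmc_perf.csv")
    with open(path, "w") as f:
        f.write("Kernel_Name,TA_FLAT_WAVEFRONTS\nglobal_write,4\n")

    found = check_file_pattern("TA_FLAT_WAVEFRONTS", path)
    missing = check_file_pattern("SQC_TC_DATA_READ_REQ", path)
    # An empty pattern matches at every position, so this is always True.
    empty_pattern = check_file_pattern("", path)
    # "." is a wildcard, so it matches the underscore in "TA_FLAT".
    dot_is_wildcard = check_file_pattern("TA.FLAT", path)
    # re.escape() makes the dot literal.
    escaped = check_file_pattern(re.escape("TA.FLAT"), path)
```

Two consequences for the tests in this commit: an assertion with an empty pattern cannot fail for a readable file, and patterns containing regex metacharacters would need `re.escape` to be matched literally.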
@@ -130,69 +133,3 @@ def check_csv_files(output_dir, num_devices, num_kernels):
elif file.endswith(".pdf"):
file_dict[file] = "pdf"
return file_dict
@pytest.fixture
def binary_handler_profile_rocprof_compute(request):
def _handler(config, workload_dir, options=[], check_success=True, roof=False):
if request.config.getoption("--call-binary"):
baseline_opts = [
"build/rocprof-compute.bin",
"profile",
"-n",
"app_1",
"-VVV",
]
if not roof:
baseline_opts.append("--no-roof")
process = subprocess.run(
baseline_opts
+ options
+ ["--path", workload_dir, "--"]
+ config["app_1"],
text=True,
)
# verify run status
if check_success:
assert process.returncode == 0
return process.returncode
else:
baseline_opts = ["rocprof-compute", "profile", "-n", "app_1", "-VVV"]
if not roof:
baseline_opts.append("--no-roof")
with pytest.raises(SystemExit) as e:
with patch(
"sys.argv",
baseline_opts
+ options
+ ["--path", workload_dir, "--"]
+ config["app_1"],
):
rocprof_compute.main()
# verify run status
if check_success:
assert e.value.code == 0
return e.value.code
return _handler
@pytest.fixture
def binary_handler_analyze_rocprof_compute(request):
def _handler(arguments):
if request.config.getoption("--call-binary"):
process = subprocess.run(
["build/rocprof-compute.bin", *arguments],
text=True,
)
return process.returncode
else:
with pytest.raises(SystemExit) as e:
with patch(
"sys.argv",
["rocprof-compute", *arguments],
):
rocprof_compute.main()
return e.value.code
return _handler