Analysis report block based filtering for profiling (#566)

* Analysis report block based filtering for profiling * Profiling mode changes - `-b` option now additionally accepts metric id(s), similar to `-b` option in analyze mode (e.g. 6, 6.2, 6.23) - Only counters mentioned in the selected analysis report blocks will be collected - Add parsing logic to identify hardware counters from analysis report blocks - Add filtering logic to only write filtered counters in perfmon files - Log not collected counters in one line - `--list-metrics` option added in profile mode to list possible metric id(s) similar to analyze mode - Write arguments provided during profiling in profiling_configuration.yaml file * Analysis mode changes - During analysis mode, only show report blocks selected during profiling - If `-b` option is provided in analysis mode, then follow provided filters - Do not show empty tables in analysis report * Miscellaneous changes - Update CHANGELOG - Add test cases - Instruction mix report block filter - Instruction mix and Memory chart report block filter - Instruction mix report block filter and CPC hardware block filter - TA hardware block filter - --list-metrics in profile mode should work - Move binary handler fixtures to conftest.py to avoid importing fixtures - cmake file in tests directory has been updated to compile sample/vmem.hip for testing * Public documentation changes - Use the term "Hardware report block" instead of "Hardware block" - Add documentation for "--list-metrics" option in profile mode - Add example of filtering by hardware report block such as instruction mix and wavefront launch statistics - Add deprecation warning for hardware component (sq, tcc) based filtering
2025-03-10 14:42:56 -04:00
@@ -8,6 +8,11 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.

 * Add Docker files to package the application and dependencies into a single portable and executable standalone binary file

+* Analysis report based filtering
+  * -b option in profile mode now additionally accepts metric id(s) for analysis report based filtering
+  * -b option in profile mode also accept hardware IP block for filtering, however, this support will be deprecated soon
+  * --list-metrics option added in profile mode to list possible metric id(s), similar to analyze mode
+
 ### Changed

 * Change normal_unit default to per_kernel
@@ -244,6 +244,13 @@ add_test(
            ${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
    WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})

+add_test(
+    NAME test_profile_section
+    COMMAND
+        ${Python3_EXECUTABLE} -m pytest -m section --junitxml=tests/test_profile_misc.xml
+        ${COV_OPTION} ${PROJECT_SOURCE_DIR}/tests/test_profile_general.py
+    WORKING_DIRECTORY ${PROJECT_SOURCE_DIR})
+
 set_tests_properties(
    test_profile_kernel_execution
    test_profile_ipblocks
@@ -230,7 +230,7 @@ Filtering options
 -----------------

 ``-b``, ``--block <block-name>``
-   Allows system profiling on one or more selected hardware components to speed
+   Allows system profiling on one or more selected hardware report blocks to speed
   up the profiling process. See :ref:`profiling-hw-component-filtering`.

 ``-k``, ``--kernel <kernel-substr>``
@@ -251,21 +251,91 @@ Filtering options

 .. _profiling-hw-component-filtering:

-Hardware component filtering
+Hardware report block filtering
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-You can profile specific hardware components to speed up the profiling process.
-In ROCm Compute Profiler, the term hardware block to refers to a hardware component or a
-group of hardware components. All profiling results are accumulated in the same
-target directory without overwriting those for other hardware components. This
-enables incremental profiling and analysis.
+You can profile specific hardware report blocks to speed up the profiling process.
+In ROCm Compute Profiler, the term hardware report block refers to a section of the
+analysis report which focuses on metrics associated with a hardware component or
+a group of hardware components. All profiling results are accumulated in the same
+target directory without overwriting those for other hardware components.
+This enables incremental profiling and analysis.

-The following example only gathers hardware counters for the shader sequencer
-(SQ) and L2 cache (TCC) components, skipping all other hardware components.
+The following example only gathers hardware counters used to calculate metrics
+for ``Compute Unit - Instruction Mix`` (block 10) and ``Wavefront Launch Statistics``
+(block 7) sections of the analysis report, while skipping over all other hardware counters.

 .. code-block:: shell-session

-   $ rocprof-compute profile --name vcopy -b SQ TCC -- ./vcopy -n 1048576 -b 256
+   $ rocprof-compute profile --name vcopy -b 10 7 -- ./vcopy -n 1048576 -b 256
+
+                                    __                                       _
+    _ __ ___   ___ _ __  _ __ ___  / _|       ___ ___  _ __ ___  _ __  _   _| |_ ___
+   | '__/ _ \ / __| '_ \| '__/ _ \| |_ _____ / __/ _ \| '_ ` _ \| '_ \| | | | __/ _ \
+   | | | (_) | (__| |_) | | | (_) |  _|_____| (_| (_) | | | | | | |_) | |_| | ||  __/
+   |_|  \___/ \___| .__/|_|  \___/|_|        \___\___/|_| |_| |_| .__/ \__,_|\__\___|
+                  |_|                                           |_|
+
+   rocprofiler-compute version: 2.0.0
+   Profiler choice: rocprofv1
+   Path: /home/auser/repos/rocprofiler-compute/sample/workloads/vcopy/MI200
+   Target: MI200
+   Command: ./vcopy -n 1048576 -b 256
+   Kernel Selection: None
+   Dispatch Selection: None
+   Hardware Blocks: []
+   Report Sections: ['10', '7']
+
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+   Collecting Performance Counters
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+   ...
+
+
+To see a list of available hardware report blocks, use the ``--list-metrics`` option.
+
+.. code-block:: shell-session
+
+   $ rocprof-compute profile --list-metrics
+
+                                    __                                       _
+    _ __ ___   ___ _ __  _ __ ___  / _|       ___ ___  _ __ ___  _ __  _   _| |_ ___
+   | '__/ _ \ / __| '_ \| '__/ _ \| |_ _____ / __/ _ \| '_ ` _ \| '_ \| | | | __/ _ \
+   | | | (_) | (__| |_) | | | (_) |  _|_____| (_| (_) | | | | | | |_) | |_| | ||  __/
+   |_|  \___/ \___| .__/|_|  \___/|_|        \___\___/|_| |_| |_| .__/ \__,_|\__\___|
+                  |_|                                           |_|
+
+   0 -> Top Stats
+   1 -> System Info
+   2 -> System Speed-of-Light
+         2.1 -> Speed-of-Light
+                  2.1.0 -> VALU FLOPs
+                  2.1.1 -> VALU IOPs
+                  2.1.2 -> MFMA FLOPs (F8)
+   ...
+   5 -> Command Processor (CPC/CPF)
+         5.1 -> Command Processor Fetcher
+                  5.1.0 -> CPF Utilization
+                  5.1.1 -> CPF Stall
+                  5.1.2 -> CPF-L2 Utilization
+         5.2 -> Packet Processor
+                  5.2.0 -> CPC Utilization
+                  5.2.1 -> CPC Stall Rate
+                  5.2.5 -> CPC-UTCL1 Stall
+   ...
+   6 -> Workgroup Manager (SPI)
+         6.1 -> Workgroup Manager Utilizations
+                  6.1.0 -> Accelerator Utilization
+                  6.1.1 -> Scheduler-Pipe Utilization
+                  6.1.2 -> Workgroup Manager Utilization
+
+
+It is also possible to filter counter collection by hardware component such as Shader Sequencer (SQ)
+and L2 cache (TCC) as shown below.
+
+.. code-block:: shell-session
+
+   $ rocprof-compute profile --name vcopy -b 10 7 -- ./vcopy -n 1048576 -b 256

                                    __                                       _
    _ __ ___   ___ _ __  _ __ ___  / _|       ___ ___  _ __ ___  _ __  _   _| |_ ___
@@ -297,12 +367,18 @@ The following example only gathers hardware counters for the shader sequencer
   Kernel Selection: None
   Dispatch Selection: None
   Hardware Blocks: ['sq', 'tcc']
+   Report Sections: []

   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Collecting Performance Counters
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   ...

+.. warning::
+
+   Filtering by hardware components (e.g. SQ, TCC) will soon be deprecated.
+   It is recommended to use hardware report block based filtering.
+
 .. _profiling-kernel-filtering:

 Kernel filtering
@@ -57,17 +57,17 @@ Common filters to customize data collection include:
   Enables filtering based on dispatch ID.

 ``-b``, ``--block``
-   Enables collection metrics for only the specified (one or more) hardware
-   component blocks.
+   Enables collection metrics for only the specified hardware report blocks.

 See :ref:`Filtering <filtering>` for an in-depth walkthrough.

-To view available metrics by hardware block, use the ``--list-metrics``
-argument:
+To view available metrics by hardware block, use the ``profile`` mode ``--list-metrics``
+option with an optional system architecture argument (inferred if not provided):

 .. code-block:: shell

-   $ rocprof-compute analyze --list-metrics <sys_arch>
+   $ rocprof-compute profile --list-metrics
+   $ rocprof-compute profile --list-metrics <sys_arch>

 .. _basic-analyze-cli:

@@ -80,7 +80,7 @@ interface with profiling results. View different metrics derived from your
 profiled results and get immediate access all metrics organized by hardware
 blocks.

-If you don't apply kernel, dispatch, or hardware block filters at this stage,
+If you don't apply kernel, dispatch, or hardware report block filters at this stage,
 analysis is reflective of the entirety of the profiling data.

 To interact with profiling results from a different session, provide the
@@ -50,6 +50,7 @@ pythonpath = [
    ]

 markers = [
+	"section",
 	"kernel_execution",
 	"block",
 	"misc",
@@ -24,14 +24,15 @@

 import argparse
 import os
+import re
 import shutil
 from pathlib import Path


 def print_avail_arch(avail_arch: list):
-    ret_str = "\t\tList all available metrics for analysis on specified arch:"
+    ret_str = "\t\t\tList all available metrics for analysis on specified arch:"
    for arch in avail_arch:
-        ret_str += "\n\t\t   {}".format(arch)
+        ret_str += "\n\t\t\t   {}".format(arch)
    return ret_str


@@ -114,7 +115,6 @@ Examples:
        type=str,
        metavar="",
        dest="name",
-        required=True,
        help="\t\t\tAssign a name to workload.",
    )
    profile_group.add_argument("--target", type=str, default=None, help=argparse.SUPPRESS)
@@ -154,7 +154,7 @@ Examples:
        default=False,
        action="store_true",
        help=argparse.SUPPRESS,
-        #help="\t\t\tKokkos trace, traces Kokkos API calls.",
+        # help="\t\t\tKokkos trace, traces Kokkos API calls.",
    )
    profile_group.add_argument(
        "-k",
@@ -177,16 +177,67 @@ Examples:
        required=False,
        help="\t\t\tDispatch ID filtering.",
    )
+
+    class AggregateDict(argparse.Action):
+        def __call__(self, parser, namespace, values, option_string=None):
+            aggregated_dict = getattr(namespace, self.dest, {})
+            if aggregated_dict is None:
+                aggregated_dict = {}
+            for key, value in values:
+                aggregated_dict[key] = value
+            setattr(namespace, self.dest, aggregated_dict)
+
+    def validate_block(value):
+        # Metric id regex, for example, 10, 4, 4.3, 4.32
+        # Dont allow more than two digits after decimal point
+        metric_id_pattern = re.compile(r"^\d+$|^\d\.\d$|^\d+\.\d\d$")
+        # Allow only the following hardware blocks
+        hardware_block_pattern = re.compile(r"^(SQ|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF)$")
+        if metric_id_pattern.match(value):
+            return (str(value), "metric_id")
+        if hardware_block_pattern.match(value):
+            return (str(value), "hardware_block")
+        raise argparse.ArgumentTypeError(f"Invalid hardware block or metric id: {value}")
+
    profile_group.add_argument(
        "-b",
        "--block",
-        type=str,
-        dest="ipblocks",
+        type=validate_block,
+        action=AggregateDict,
+        dest="filter_blocks",
        metavar="",
        nargs="+",
        required=False,
-        choices=["SQ", "SQC", "TA", "TD", "TCP", "TCC", "SPI", "CPC", "CPF"],
-        help="\t\t\tHardware block filtering:\n\t\t\t   SQ\n\t\t\t   SQC\n\t\t\t   TA\n\t\t\t   TD\n\t\t\t   TCP\n\t\t\t   TCC\n\t\t\t   SPI\n\t\t\t   CPC\n\t\t\t   CPF",
+        default={},
+        help="""\t\t\tSpecify metric id(s) from --list-metrics for filtering (e.g. 10, 4, 4.3).
+    \t\t\tCan provide multiple space separated arguments.
+    \t\t\tCan also accept Hardware blocks.
+    \t\t\tHardware block filtering (to be deprecated soon):
+    \t\t\t   SQ
+    \t\t\t   SQC
+    \t\t\t   TA
+    \t\t\t   TD
+    \t\t\t   TCP
+    \t\t\t   TCC
+    \t\t\t   SPI
+    \t\t\t   CPC
+    \t\t\t   CPF""",
+    )
+    profile_group.add_argument(
+        "--list-metrics",
+        metavar="",
+        nargs="?",
+        const="",
+        # Argument to --list-metrics is optional
+        choices=[""] + list(supported_archs.keys()),  # ["gfx906", "gfx908", "gfx90a"],
+        help=print_avail_arch(supported_archs.keys()),
+    )
+    profile_group.add_argument(
+        "--config-dir",
+        dest="config_dir",
+        metavar="",
+        help="\t\t\tSpecify the directory of customized report section configs.",
+        default=rocprof_compute_home.joinpath("rocprof_compute_soc/analysis_configs/"),
    )

    result = shutil.which("rocscope")
@@ -487,7 +538,7 @@ Examples:
        dest="filter_metrics",
        metavar="",
        nargs="+",
-        help="\t\tSpecify hardware block/metric id(s) from --list-metrics for filtering.",
+        help="\t\tSpecify metric id(s) from --list-metrics for filtering.",
    )
    analyze_group.add_argument(
        "--gpu-id",
@@ -45,6 +45,7 @@ class OmniAnalyze_Base:
        self.__args = args
        self._runs = OrderedDict()
        self._arch_configs = {}
+        self._profiling_config = dict()
        self.__supported_archs = supported_archs
        self._output = None
        self.__socs: dict = None  # available OmniSoC objs
@@ -254,6 +255,9 @@ class OmniAnalyze_Base:
            open(self.__args.output_file, "w+") if self.__args.output_file else sys.stdout
        )

+        # Read profiling config
+        self._profiling_config = file_io.load_profiling_config(self.__args.path[0][0])
+
        # initalize runs
        self._runs = self.initalize_runs()

@@ -100,4 +100,5 @@ class cli_analysis(OmniAnalyze_Base):
                    self._runs[self.get_args().path[0][0]].sys_info.iloc[0]["gpu_arch"]
                ],
                self._output,
+                self._profiling_config,
            )
@@ -33,10 +33,11 @@ import time
 from pathlib import Path

 import pandas as pd
+import yaml

 import config
 from argparser import omniarg_parser
-from utils import file_io
+from utils import file_io, parser, schema
 from utils.logger import (
    setup_console_handler,
    setup_file_handler,
@@ -47,6 +48,7 @@ from utils.utils import (
    console_debug,
    console_error,
    console_log,
+    console_warning,
    demarcate,
    detect_rocprof,
    get_submodules,
@@ -230,11 +232,50 @@ class RocProfCompute:

        return

+    @demarcate
+    def list_metrics(self):
+        if not self.__args.list_metrics:
+            arch = self.__mspec.gpu_arch
+        else:
+            arch = self.__args.list_metrics
+        if arch in self.__supported_archs.keys():
+            ac = schema.ArchConfig()
+            ac.panel_configs = file_io.load_panel_configs(
+                self.__args.config_dir.joinpath(arch)
+            )
+            sys_info = self.__mspec.get_class_members().iloc[0]
+            parser.build_dfs(archConfigs=ac, filter_metrics=[], sys_info=sys_info)
+            for key, value in ac.metric_list.items():
+                prefix = ""
+                if "." not in str(key):
+                    prefix = ""
+                elif str(key).count(".") == 1:
+                    prefix = "\t"
+                else:
+                    prefix = "\t\t"
+                print(prefix + key, "->", value)
+            sys.exit(0)
+        else:
+            console_error("Unsupported arch")
+
    @demarcate
    def run_profiler(self):
        self.print_graphic()
        self.load_soc_specs()

+        if self.__args.list_metrics is not None:
+            self.list_metrics()
+        elif self.__args.name is None:
+            sys.exit("Either --list-name or --name is required")
+
+        # Deprecation warning for hardware blocks
+        if [
+            name
+            for name, type in self.__args.filter_blocks.items()
+            if type == "hardware_block"
+        ]:
+            console_warning("Hardware block based filtering will be deprecated soon")
+
        # FIXME:
        #     Changing default path should be done at the end of arg parsing stage,
        #     unless there is a specific reason to do here.
@@ -250,25 +291,37 @@ class RocProfCompute:
            from rocprof_compute_profile.profiler_rocprof_v1 import rocprof_v1_profiler

            profiler = rocprof_v1_profiler(
-                self.__args, self.__profiler_mode, self.__soc[self.__mspec.gpu_arch]
+                self.__args,
+                self.__profiler_mode,
+                self.__soc[self.__mspec.gpu_arch],
+                self.__supported_archs,
            )
        elif self.__profiler_mode == "rocprofv2":
            from rocprof_compute_profile.profiler_rocprof_v2 import rocprof_v2_profiler

            profiler = rocprof_v2_profiler(
-                self.__args, self.__profiler_mode, self.__soc[self.__mspec.gpu_arch]
+                self.__args,
+                self.__profiler_mode,
+                self.__soc[self.__mspec.gpu_arch],
+                self.__supported_archs,
            )
        elif self.__profiler_mode == "rocprofv3":
            from rocprof_compute_profile.profiler_rocprof_v3 import rocprof_v3_profiler

            profiler = rocprof_v3_profiler(
-                self.__args, self.__profiler_mode, self.__soc[self.__mspec.gpu_arch]
+                self.__args,
+                self.__profiler_mode,
+                self.__soc[self.__mspec.gpu_arch],
+                self.__supported_archs,
            )
        elif self.__profiler_mode == "rocscope":
            from rocprof_compute_profile.profiler_rocscope import rocscope_profiler

            profiler = rocscope_profiler(
-                self.__args, self.__profiler_mode, self.__soc[self.__mspec.gpu_arch]
+                self.__args,
+                self.__profiler_mode,
+                self.__soc[self.__mspec.gpu_arch],
+                self.__supported_archs,
            )
        else:
            console_error("Unsupported profiler")
@@ -278,6 +331,11 @@ class RocProfCompute:
        # -----------------------

        self.__soc[self.__mspec.gpu_arch].profiling_setup()
+        # Write profiling configuration as yaml file
+        with open(Path(self.__args.path).joinpath("profiling_config.yaml"), "w") as f:
+            args_dict = vars(self.__args)
+            args_dict["config_dir"] = str(args_dict["config_dir"])
+            yaml.dump(args_dict, f)
        # enable file-based logging
        setup_file_handler(self.__args.loglevel, self.__args.path)

@@ -27,7 +27,6 @@ import logging
 import os
 import re
 import shutil
-import sys
 import time
 from abc import ABC, abstractmethod
 from pathlib import Path
@@ -51,15 +50,22 @@ from utils.utils import (


 class RocProfCompute_Base:
-    def __init__(self, args, profiler_mode, soc):
+    def __init__(self, args, profiler_mode, soc, supported_archs):
        self.__args = args
        self.__profiler = profiler_mode
+        self.__supported_archs = supported_archs
        self._soc = soc  # OmniSoC obj
        self.__perfmon_dir = str(
            Path(str(config.rocprof_compute_home)).joinpath(
                "rocprof_compute_soc", "profile_configs"
            )
        )
+        self.__filter_hardware_blocks = [
+            name for name, type in args.filter_blocks.items() if type == "hardware_block"
+        ]
+        self.__filter_metric_ids = [
+            name for name, type in args.filter_blocks.items() if type == "metric_id"
+        ]

    def get_args(self):
        return self.__args
@@ -320,10 +326,14 @@ class RocProfCompute_Base:
        console_log("Command: " + str(self.__args.remaining))
        console_log("Kernel Selection: " + str(self.__args.kernel))
        console_log("Dispatch Selection: " + str(self.__args.dispatch))
-        if self.__args.ipblocks == None:
+        if self.__filter_hardware_blocks == None:
            console_log("Hardware Blocks: All")
        else:
-            console_log("Hardware Blocks: " + str(self.__args.ipblocks))
+            console_log("Hardware Blocks: " + str(self.__filter_hardware_blocks))
+        if self.__filter_metric_ids == None:
+            console_log("Report Sections: All")
+        else:
+            console_log("Report Sections: " + str(self.__filter_metric_ids))

        msg = "Collecting Performance Counters"
        (
@@ -424,7 +434,11 @@ class RocProfCompute_Base:
        gen_sysinfo(
            workload_name=self.__args.name,
            workload_dir=self.get_args().path,
-            ip_blocks=self.__args.ipblocks,
+            ip_blocks=[
+                name
+                for name, type in self.__args.filter_blocks.items()
+                if type == "hardware_block"
+            ],
            app_cmd=self.__args.remaining,
            skip_roof=self.__args.no_roof,
            roof_only=self.__args.roof_only,
@@ -30,8 +30,8 @@ from utils.utils import console_log, demarcate, replace_timestamps, store_app_cm


 class rocprof_v1_profiler(RocProfCompute_Base):
-    def __init__(self, profiling_args, profiler_mode, soc):
-        super().__init__(profiling_args, profiler_mode, soc)
+    def __init__(self, profiling_args, profiler_mode, soc, supported_archs):
+        super().__init__(profiling_args, profiler_mode, soc, supported_archs)
        self.ready_to_profile = (
            self.get_args().roof_only
            and not Path(self.get_args().path).joinpath("pmc_perf.csv").is_file()
@@ -31,8 +31,8 @@ from utils.utils import console_log, demarcate, replace_timestamps, store_app_cm


 class rocprof_v2_profiler(RocProfCompute_Base):
-    def __init__(self, profiling_args, profiler_mode, soc):
-        super().__init__(profiling_args, profiler_mode, soc)
+    def __init__(self, profiling_args, profiler_mode, soc, supported_archs):
+        super().__init__(profiling_args, profiler_mode, soc, supported_archs)
        self.ready_to_profile = (
            self.get_args().roof_only
            and not Path(self.get_args().path).joinpath("pmc_perf.csv").is_file()
@@ -32,8 +32,8 @@ from utils.utils import console_error, console_log, demarcate, replace_timestamp


 class rocprof_v3_profiler(RocProfCompute_Base):
-    def __init__(self, profiling_args, profiler_mode, soc):
-        super().__init__(profiling_args, profiler_mode, soc)
+    def __init__(self, profiling_args, profiler_mode, soc, supported_archs):
+        super().__init__(profiling_args, profiler_mode, soc, supported_archs)
        self.ready_to_profile = (
            self.get_args().roof_only
            and not Path(self.get_args().path).joinpath("pmc_perf.csv").is_file()
@@ -27,8 +27,8 @@ from utils.utils import console_log, demarcate


 class rocscope_profiler(RocProfCompute_Base):
-    def __init__(self, profiling_args, profiler_mode, soc):
-        super().__init__(profiling_args, profiler_mode, soc)
+    def __init__(self, profiling_args, profiler_mode, soc, supported_archs):
+        super().__init__(profiling_args, profiler_mode, soc, supported_archs)

    # -----------------------
    # Required child methods
@@ -1,11 +0,0 @@
---
-Panel Config:
-  id: 400
-  title: Roofline
-  data source:
-    - raw_csv_table:
-        id: 401
-        source: roofline.csv
-        comparable: false # for now
-        cli_style: roofline_chart
-        # TODO: refactoring the data structure to have metrics here!
@@ -32,9 +32,17 @@ from collections import OrderedDict
 from pathlib import Path

 import numpy as np
+import yaml

 from rocprof_compute_base import MI300_CHIP_IDS, SUPPORTED_ARCHS
-from utils.utils import console_debug, console_error, console_log, demarcate
+from utils.parser import build_in_vars, supported_denom
+from utils.utils import (
+    console_debug,
+    console_error,
+    console_log,
+    convert_metric_id_to_panel_idx,
+    demarcate,
+)


 class OmniSoC_Base:
@@ -48,19 +56,10 @@ class OmniSoC_Base:
        self.__perfmon_config = (
            {}
        )  # Per IP block max number of simulutaneous counters. GFX IP Blocks
+        self.__section_counters = set()  # hw counters corresponding to filtered sections
        self.__soc_params = {}  # SoC specifications
        self.__compatible_profilers = []  # Store profilers compatible with SoC
        self.populate_mspec()
-        # In some cases (i.e. --specs) path will not be given
-        if hasattr(self.__args, "path"):
-            if self.__args.path == str(Path(os.getcwd()).joinpath("workloads")):
-                self.__workload_dir = str(
-                    Path(self.__args.path).joinpath(
-                        self.__args.name, self._mspec.gpu_model
-                    )
-                )
-            else:
-                self.__workload_dir = self.__args.path

    def __hash__(self):
        return hash(self.__arch)
@@ -189,6 +188,47 @@ class OmniSoC_Base:
            total_xcds(self._mspec.gpu_model, self._mspec.compute_partition)
        )

+    @demarcate
+    def section_filter(self):
+        """
+        Create a set of counters required for the selected report sections.
+        Parse analysis report configuration files based on the selected report sections to be filtered.
+        """
+        args = self.__args
+        for section in self.__filter_metric_ids:
+            section_num = convert_metric_id_to_panel_idx(section)
+            file_id = str(section_num // 100)
+            # Convert "4" to "04"
+            if len(file_id) == 1:
+                file_id = f"0{file_id}"
+            # Identify yaml file corresponding to file_id
+            config_filename = [
+                filename
+                for filename in os.listdir(Path(args.config_dir).joinpath(self.__arch))
+                if filename.endswith(".yaml") and filename.startswith(file_id)
+            ][0]
+            # Read the yaml file
+            with open(
+                Path(args.config_dir).joinpath(self.__arch, config_filename), "r"
+            ) as stream:
+                section_config = yaml.safe_load(stream)
+            # Extract subsection if section is of the form 4.52
+            if section_num % 100:
+                section_config_text = "\n".join(
+                    [
+                        # Convert yaml to string
+                        yaml.dump(subsection)
+                        for subsection in section_config["Panel Config"]["data source"]
+                        if subsection["metric_table"]["id"] == section_num
+                    ]
+                )
+            else:
+                # Convert yaml to string
+                section_config_text = yaml.dump(section_config)
+            self.__section_counters = self.__section_counters.union(
+                parse_counters(section_config_text)
+            )
+
    @demarcate
    def perfmon_filter(self, roofline_perfmon_only: bool):
        """Filter default performance counter set based on user arguments"""
@@ -197,15 +237,40 @@ class OmniSoC_Base:
            and Path(self.get_args().path).joinpath("pmc_perf.csv").is_file()
        ):
            return
-        workload_perfmon_dir = self.__workload_dir + "/perfmon"
+
+        # In some cases (i.e. --specs) path will not be given
+        if hasattr(self.__args, "path"):
+            if self.__args.path == str(Path(os.getcwd()).joinpath("workloads")):
+                workload_dir = str(
+                    Path(self.__args.path).joinpath(
+                        self.__args.name, self._mspec.gpu_model
+                    )
+                )
+            else:
+                workload_dir = self.__args.path
+
+        workload_perfmon_dir = workload_dir + "/perfmon"
+
+        self.__filter_hardware_blocks = [
+            name
+            for name, type in self.get_args().filter_blocks.items()
+            if type == "hardware_block"
+        ]
+        self.__filter_metric_ids = [
+            name
+            for name, type in self.get_args().filter_blocks.items()
+            if type == "metric_id"
+        ]
+
+        self.section_filter()

        # Initialize directories
-        if not Path(self.__workload_dir).is_dir():
-            os.makedirs(self.__workload_dir)
-        elif not Path(self.__workload_dir).is_symlink():
-            shutil.rmtree(self.__workload_dir)
+        if not Path(workload_dir).is_dir():
+            os.makedirs(workload_dir)
+        elif not Path(workload_dir).is_symlink():
+            shutil.rmtree(workload_dir)
        else:
-            os.unlink(self.__workload_dir)
+            os.unlink(workload_dir)

        os.makedirs(workload_perfmon_dir)

@@ -216,16 +281,17 @@ class OmniSoC_Base:
            )

            # Perfmon list filtering
-            if self.__args.ipblocks != None:
-                for i in range(len(self.__args.ipblocks)):
-                    self.__args.ipblocks[i] = self.__args.ipblocks[i].lower()
+            if self.__filter_hardware_blocks:
+                hardware_blocks = [
+                    block.lower() for block in self.__filter_hardware_blocks
+                ]
                mpattern = "pmc_([a-zA-Z0-9_]+)_perf*"

                pmc_files_list = []
                for fname in ref_pmc_files_list:
                    fbase = Path(fname).stem
                    ip = re.match(mpattern, fbase).group(1)
-                    if ip in self.__args.ipblocks:
+                    if ip in hardware_blocks:
                        pmc_files_list.append(fname)
                        console_log("fname: " + fbase + ": Added")
                    else:
@@ -242,8 +308,9 @@ class OmniSoC_Base:
        perfmon_coalesce(
            pmc_files_list,
            self.__perfmon_config,
-            self.__workload_dir,
+            workload_dir,
            self.get_args().spatial_multiplexing,
+            self.__section_counters,
        )

    # ----------------------------------------------------
@@ -310,7 +377,38 @@ def using_v3():


@demarcate
-def perfmon_coalesce(pmc_files_list, perfmon_config, workload_dir, spatial_multiplexing):
+def parse_counters(config_text):
+    """
+    Create a set of all hardware counters mentioned in the given config file content string
+    """
+    # hw counter name should start with ip block name
+    hw_counter_regex = r"(?:SQ|SQC|TA|TD|TCP|TCC|CPC|CPF|SPI|GRBM)_[0-9A-Za-z_]+"
+    # only capture the variable name after $ using capturing group
+    variable_regex = r"\$([0-9A-Za-z_]+)"
+    hw_counter_matches = set(re.findall(hw_counter_regex, config_text))
+    variable_matches = set(re.findall(variable_regex, config_text))
+    # get hw counters and variables for all supported denominators
+    for formula in supported_denom.values():
+        hw_counter_matches.update(re.findall(hw_counter_regex, formula))
+        variable_matches.update(re.findall(variable_regex, formula))
+    # get hw counters corresponding to variables recursively
+    while variable_matches:
+        subvariable_matches = set()
+        for var in variable_matches:
+            if var in build_in_vars:
+                hw_counter_matches.update(
+                    re.findall(hw_counter_regex, build_in_vars[var])
+                )
+                subvariable_matches.update(re.findall(variable_regex, build_in_vars[var]))
+        # process new found variables
+        variable_matches = subvariable_matches - variable_matches
+    return list(hw_counter_matches)
+
+
+@demarcate
+def perfmon_coalesce(
+    pmc_files_list, perfmon_config, workload_dir, spatial_multiplexing, section_counters
+):
    """Sort and bucket all related performance counters to minimize required application passes"""
    workload_perfmon_dir = workload_dir + "/perfmon"

@@ -388,6 +486,49 @@ def perfmon_coalesce(pmc_files_list, perfmon_config, workload_dir, spatial_multi
            if accu in normal_counters:
                del normal_counters[accu]

+    # If section report filters have been provided, only collect counters necessary for those section reports
+    # Remove _sum and _expand suffixes while matching
+    def remove_suffixes(string):
+        for suffix in ["_sum", "_expand"]:
+            if string.endswith(suffix):
+                string = string[: -len(suffix)]
+                break
+        return string
+
+    section_counters = {remove_suffixes(counter) for counter in section_counters}
+    ignored_counters = list()
+
+    if section_counters:
+        # Remove unnecessary normal counters
+        for counter_name in list(normal_counters.keys()):
+            if remove_suffixes(counter_name) not in section_counters:
+                del normal_counters[counter_name]
+                ignored_counters.append(counter_name)
+
+        # Remove unnecessary accumulate counters
+        filtered_accumlate_counters = list()
+        for counters in accumulate_counters:
+            if any(
+                remove_suffixes(counter_name) in section_counters
+                for counter_name in counters
+            ):
+                filtered_accumlate_counters.append(counters)
+            else:
+                ignored_counters.extend(counter_name)
+        accumulate_counters = filtered_accumlate_counters
+
+    if ignored_counters:
+        console_log(
+            f"Not collecting following counters per provided filter: {', '.join(ignored_counters)} "
+        )
+
+    # Throw error if no counters to be collected
+    if len(normal_counters) == 0 and len(accumulate_counters) == 0:
+        console_error(
+            "profiling",
+            "No performance counters to collect, please check the provided profiling filters",
+        )
+
    output_files = []

    accu_file_count = 0
@@ -25,9 +25,11 @@
 import os
 import time
 from abc import ABC, abstractmethod
+from collections import OrderedDict
 from pathlib import Path

 import numpy as np
+import pandas as pd
 import plotly.graph_objects as go
 from dash import dcc, html

@@ -75,12 +77,6 @@ class Roofline:
        if hasattr(self.__args, "sort") and self.__args.sort != "ALL":
            self.__run_parameters["sort_type"] = self.__args.sort

-        if (
-            not isinstance(self.__run_parameters["workload_dir"], list)
-            and self.__run_parameters["workload_dir"] != None
-        ):
-            self.roof_setup()
-
        self.validate_parameters()

    def validate_parameters(self):
@@ -110,6 +106,12 @@ class Roofline:
        ret_df,
    ):
        """Generate a set of empirical roofline plots given a directory containing required profiling and benchmarking data"""
+        if (
+            not isinstance(self.__run_parameters["workload_dir"], list)
+            and self.__run_parameters["workload_dir"] != None
+        ):
+            self.roof_setup()
+
        # Create arithmetic intensity data that will populate the roofline model
        console_debug("roofline", "Path: %s" % self.__run_parameters["workload_dir"])
        self.__ai_data = calc_ai(self.__mspec, self.__run_parameters["sort_type"], ret_df)
@@ -375,9 +377,11 @@ class Roofline:

    @demarcate
    def standalone_roofline(self):
-        from collections import OrderedDict
-
-        import pandas as pd
+        if (
+            not isinstance(self.__run_parameters["workload_dir"], list)
+            and self.__run_parameters["workload_dir"] != None
+        ):
+            self.roof_setup()

        # Change vL1D to a interpretable str, if required
        if "vL1D" in self.__run_parameters["mem_level"]:
@@ -394,32 +398,6 @@ class Roofline:
        t_df["pmc_perf"] = pd.read_csv(app_path)
        self.empirical_roofline(ret_df=t_df)

-    # Main methods
-    @abstractmethod
-    def pre_processing(self):
-        if self.__args.roof_only:
-            # check for sysinfo
-            console_log(
-                "roofline", "Checking for sysinfo.csv in " + str(self.__args.path)
-            )
-            sysinfo_path = str(Path(self.__args.path).joinpath("sysinfo.csv"))
-            if not Path(sysinfo_path).is_file():
-                console_log("roofline", "sysinfo.csv not found. Generating...")
-
-                class Dummy_SoC:
-                    roofline_obj = True
-
-                gen_sysinfo(
-                    workload_name=self.__args.name,
-                    workload_dir=self.__workload_dir,
-                    ip_blocks=self.__args.ipblocks,
-                    app_cmd=self.__args.remaining,
-                    skip_roof=self.__args.no_roof,
-                    roof_only=self.__args.roof_only,
-                    mspec=self.__mspec,
-                    soc=Dummy_SoC,
-                )
-
    @abstractmethod
    def profile(self):
        if self.__args.roof_only:
@@ -36,7 +36,7 @@ import yaml
 import config
 from utils import schema
 from utils.kernel_name_shortener import kernel_name_shortener
-from utils.utils import console_debug, console_error, demarcate
+from utils.utils import console_debug, console_error, console_log, demarcate

 # TODO: use pandas chunksize or dask to read really large csv file
 # from dask import dataframe as dd
@@ -85,6 +85,21 @@ def load_panel_configs(dir):
    return od


+def load_profiling_config(config_dir):
+    """
+    Load profiling config from yaml file.
+    """
+    try:
+        with open(Path(config_dir).joinpath("profiling_config.yaml")) as file:
+            prof_config = yaml.safe_load(file)
+            return prof_config
+    except FileNotFoundError:
+        console_log(
+            f"Could not find profiling_config.yaml in {config_dir} for filtering analysis report"
+        )
+    return dict()
+
+
@demarcate
 def create_df_kernel_top_stats(
    df_in,
@@ -492,7 +492,9 @@ def build_dfs(archConfigs, filter_metrics, sys_info):
                if type == "metric_table":
                    headers = ["Metric_ID"]
                    data_source_idx = str(data_config["id"] // 100)
-                    if data_source_idx != 0 or data_source_idx in filter_metrics:
+                    if data_source_idx != 0 or (
+                        filter_metrics and data_source_idx in filter_metrics
+                    ):
                        metric_list[data_source_idx] = panel["title"]
                    if (
                        "cli_style" in data_config
@@ -29,7 +29,7 @@ import pandas as pd
 from tabulate import tabulate

 from utils import parser
-from utils.utils import console_log, console_warning
+from utils.utils import console_log, console_warning, convert_metric_id_to_panel_idx

 hidden_columns = ["Tips", "coll_level"]
 hidden_sections = [1900, 2000]
@@ -60,11 +60,20 @@ def get_table_string(df, transpose=False, decimal=2):
    )


-def show_all(args, runs, archConfigs, output):
+def show_all(args, runs, archConfigs, output, profiling_config):
    """
    Show all panels with their data in plain text mode.
    """
    comparable_columns = parser.build_comparable_columns(args.time_unit)
+    filter_panel_ids = [
+        convert_metric_id_to_panel_idx(section)
+        for section in [
+            name
+            for name, type in profiling_config.get("filter_blocks", {}).items()
+            if type == "metric_id"
+        ]
+    ]
+    comparable_columns = parser.build_comparable_columns(args.time_unit)

    for panel_id, panel in archConfigs.panel_configs.items():
        # Skip panels that don't support baseline comparison
@@ -74,6 +83,27 @@ def show_all(args, runs, archConfigs, output):

        for data_source in panel["data source"]:
            for type, table_config in data_source.items():
+                # If block filtering was used during analysis, then dont use profiling config
+                # If block filtering was used in profiling config, only show those panels
+                # If block filtering not used in profiling config, show all panels
+                # Skip this table if table id or panel id is not present in block filters
+                # However, always show panel id <= 100
+                if (
+                    not args.filter_metrics
+                    and filter_panel_ids
+                    and table_config["id"] not in filter_panel_ids
+                    and panel_id not in filter_panel_ids
+                    and panel_id > 100
+                ):
+                    table_id_str = (
+                        str(table_config["id"] // 100)
+                        + "."
+                        + str(table_config["id"] % 100)
+                    )
+                    console_log(
+                        f"Not showing table not selected during profiling: {table_id_str} {table_config['title']}"
+                    )
+                    continue
                # take the 1st run as baseline
                base_run, base_data = next(iter(runs.items()))
                base_df = base_data.dfs[table_config["id"]]
@@ -207,7 +237,25 @@ def show_all(args, runs, archConfigs, output):
                        + str(table_config["id"] % 100)
                    )

-                    if "title" in table_config and table_config["title"]:
+                    # Check if any column in df is empty
+                    is_empty_columns_exist = any(
+                        [
+                            df.columns[col_idx]
+                            for col_idx in range(len(df.columns))
+                            if df.replace("", None).iloc[:, col_idx].isnull().all()
+                        ]
+                    )
+                    # Do not print the table if any column is empty
+                    if is_empty_columns_exist:
+                        console_log(
+                            f"Not showing table with empty column(s): {table_id_str} {table_config['title']}"
+                        )
+
+                    if (
+                        "title" in table_config
+                        and table_config["title"]
+                        and not is_empty_columns_exist
+                    ):
                        ss += table_id_str + " " + table_config["title"] + "\n"

                    if args.df_file_dir:
@@ -238,10 +286,13 @@ def show_all(args, runs, archConfigs, output):
                        and "columnwise" in table_config
                        and table_config["columnwise"] == True
                    )
-                    ss += (
-                        get_table_string(df, transpose=transpose, decimal=args.decimal)
-                        + "\n"
-                    )
+                    if not is_empty_columns_exist:
+                        ss += (
+                            get_table_string(
+                                df, transpose=transpose, decimal=args.decimal
+                            )
+                            + "\n"
+                        )

        if ss:
            print("\n" + "-" * 80, file=output)
@@ -191,7 +191,7 @@ def capture_subprocess_output(subprocess_args, new_env=None, profileMode=False):
    global rocprof_args
    # Format command for debug messages, formatting for rocprofv1 and rocprofv2
    command = " ".join(rocprof_args)
-    console_debug("subprocess", "Running: " + command)
+    console_debug("subprocess", "Running: " + command + " " + " ".join(subprocess_args))
    # Start subprocess
    # bufsize = 1 means output is line buffered
    # universal_newlines = True is required for line buffering
@@ -820,7 +820,7 @@ def gen_sysinfo(
    df["workload_name"] = workload_name

    blocks = []
-    if ip_blocks == None:
+    if not ip_blocks:
        t = ["SQ", "LDS", "SQC", "TA", "TD", "TCP", "TCC", "SPI", "CPC", "CPF"]
        blocks += t
    else:
@@ -1249,3 +1249,16 @@ def merge_counters_spatial_multiplex(df_multi_index):

    final_df = pd.concat(result_dfs, keys=coll_levels, axis=1, copy=False)
    return final_df
+
+
+def convert_metric_id_to_panel_idx(metric_id):
+    # "4.02" -> 402
+    # "4.23" -> 423
+    # "4" -> 400
+    tokens = metric_id.split(".")
+    if len(tokens) == 1:
+        return int(tokens[0]) * 100
+    elif len(tokens) == 2:
+        return int(tokens[0]) * 100 + int(tokens[1])
+    else:
+        raise Exception(f"Invalid metric id: {metric_id}")
@@ -13,3 +13,8 @@ set(VCOPY_SOURCES ../sample/vcopy.cpp)
 set_source_files_properties(${VCOPY_SOURCES} PROPERTIES LANGUAGE HIP)
 add_executable(vcopy ${VCOPY_SOURCES})
 set_target_properties(vcopy PROPERTIES RUNTIME_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/tests)
+
+set(VMEM_SOURCES ../sample/vmem.hip)
+set_source_files_properties(${VMEM_SOURCES} PROPERTIES LANGUAGE HIP)
+add_executable(vmem ${VMEM_SOURCES})
+set_target_properties(vmem PROPERTIES RUNTIME_OUTPUT_DIRECTORY ${CMAKE_SOURCE_DIR}/tests)
@@ -1,5 +1,11 @@
+import subprocess
+from importlib.machinery import SourceFileLoader
+from unittest.mock import patch
+
 import pytest

+rocprof_compute = SourceFileLoader("rocprof-compute", "src/rocprof-compute").load_module()
+

 def pytest_addoption(parser):
    parser.addoption(
@@ -8,3 +14,69 @@ def pytest_addoption(parser):
        default=False,
        help="Call standalone binary instead of main function during tests",
    )
+
+
+@pytest.fixture
+def binary_handler_profile_rocprof_compute(request):
+    def _handler(config, workload_dir, options=[], check_success=True, roof=False):
+        if request.config.getoption("--call-binary"):
+            baseline_opts = [
+                "build/rocprof-compute.bin",
+                "profile",
+                "-n",
+                "app_1",
+                "-VVV",
+            ]
+            if not roof:
+                baseline_opts.append("--no-roof")
+            process = subprocess.run(
+                baseline_opts
+                + options
+                + ["--path", workload_dir, "--"]
+                + config["app_1"],
+                text=True,
+            )
+            # verify run status
+            if check_success:
+                assert process.returncode == 0
+            return process.returncode
+        else:
+            baseline_opts = ["rocprof-compute", "profile", "-n", "app_1", "-VVV"]
+            if not roof:
+                baseline_opts.append("--no-roof")
+            with pytest.raises(SystemExit) as e:
+                with patch(
+                    "sys.argv",
+                    baseline_opts
+                    + options
+                    + ["--path", workload_dir, "--"]
+                    + config["app_1"],
+                ):
+                    rocprof_compute.main()
+            # verify run status
+            if check_success:
+                assert e.value.code == 0
+            return e.value.code
+
+    return _handler
+
+
+@pytest.fixture
+def binary_handler_analyze_rocprof_compute(request):
+    def _handler(arguments):
+        if request.config.getoption("--call-binary"):
+            process = subprocess.run(
+                ["build/rocprof-compute.bin", *arguments],
+                text=True,
+            )
+            return process.returncode
+        else:
+            with pytest.raises(SystemExit) as e:
+                with patch(
+                    "sys.argv",
+                    ["rocprof-compute", *arguments],
+                ):
+                    rocprof_compute.main()
+            return e.value.code
+
+    return _handler
@@ -6,7 +6,6 @@ from unittest.mock import patch
 import pandas as pd
 import pytest
 import test_utils
-from test_utils import binary_handler_analyze_rocprof_compute

 config = {}
 config["cleanup"] = True if "PYTEST_XDIST_WORKER_COUNT" in os.environ else False
@@ -1,8 +1,6 @@
 from unittest.mock import patch

-import pandas as pd
 import pytest
-from test_utils import binary_handler_analyze_rocprof_compute

 ##################################################
 ##          Generated tests                     ##
@@ -11,7 +11,6 @@ from unittest.mock import patch
 import pandas as pd
 import pytest
 import test_utils
-from test_utils import binary_handler_profile_rocprof_compute

 # Globals

@@ -1458,3 +1457,136 @@ def test_mem_levels_LDS(binary_handler_profile_rocprof_compute):
    )

    test_utils.clean_output_dir(config["cleanup"], workload_dir)
+
+
+@pytest.mark.section
+def test_instmix_section(binary_handler_profile_rocprof_compute):
+    options = ["--block", "10"]
+    workload_dir = test_utils.get_output_dir()
+    _ = binary_handler_profile_rocprof_compute(
+        config, workload_dir, options, check_success=True, roof=False
+    )
+
+    file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels)
+    validate(
+        inspect.stack()[0][3],
+        workload_dir,
+        file_dict,
+    )
+
+    assert test_utils.check_file_pattern(
+        "'10': metric_id", f"{workload_dir}/profiling_config.yaml"
+    )
+    assert test_utils.check_file_pattern(
+        "SQ_INSTS_VALU_MFMA_F64", f"{workload_dir}/pmc_perf.csv"
+    )
+    test_utils.clean_output_dir(config["cleanup"], workload_dir)
+
+
+@pytest.mark.section
+def test_instmix_memchart_section(binary_handler_profile_rocprof_compute):
+    options = ["--block", "10", "3"]
+    workload_dir = test_utils.get_output_dir()
+    _ = binary_handler_profile_rocprof_compute(
+        config, workload_dir, options, check_success=True, roof=False
+    )
+
+    file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels)
+    validate(
+        inspect.stack()[0][3],
+        workload_dir,
+        file_dict,
+    )
+
+    assert test_utils.check_file_pattern(
+        "'10': metric_id", f"{workload_dir}/profiling_config.yaml"
+    )
+    assert test_utils.check_file_pattern(
+        "'3': metric_id", f"{workload_dir}/profiling_config.yaml"
+    )
+    assert test_utils.check_file_pattern(
+        "SQ_INSTS_VALU_MFMA_F64", f"{workload_dir}/pmc_perf.csv"
+    )
+    assert test_utils.check_file_pattern(
+        "SQC_TC_DATA_READ_REQ", f"{workload_dir}/pmc_perf.csv"
+    )
+    test_utils.clean_output_dir(config["cleanup"], workload_dir)
+
+
+@pytest.mark.section
+def test_instmix_section_TA_block(binary_handler_profile_rocprof_compute):
+    options = ["--block", "10", "TA"]
+    workload_dir = test_utils.get_output_dir()
+    _ = binary_handler_profile_rocprof_compute(
+        config, workload_dir, options, check_success=True, roof=False
+    )
+
+    file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels)
+    validate(
+        inspect.stack()[0][3],
+        workload_dir,
+        file_dict,
+    )
+
+    assert test_utils.check_file_pattern(
+        "'10': metric_id", f"{workload_dir}/profiling_config.yaml"
+    )
+    assert test_utils.check_file_pattern(
+        "TA: hardware_block", f"{workload_dir}/profiling_config.yaml"
+    )
+    assert test_utils.check_file_pattern(
+        "TA_FLAT_WAVEFRONTS", f"{workload_dir}/pmc_perf.csv"
+    )
+    assert not test_utils.check_file_pattern(
+        "SQC_TC_DATA_READ_REQ", f"{workload_dir}/pmc_perf.csv"
+    )
+    assert test_utils.check_file_pattern("", f"{workload_dir}/pmc_perf.csv")
+    test_utils.clean_output_dir(config["cleanup"], workload_dir)
+
+
+@pytest.mark.section
+def test_instmix_section_global_write_kernel(binary_handler_profile_rocprof_compute):
+    options = ["-k", "global_write", "--block", "10"]
+    custom_config = dict(config)
+    custom_config["kernel_name_1"] = "global_write"
+    custom_config["app_1"] = ["./tests/vmem"]
+    num_kernels = 1
+
+    workload_dir = test_utils.get_output_dir()
+    _ = binary_handler_profile_rocprof_compute(
+        custom_config, workload_dir, options, check_success=True, roof=False
+    )
+
+    file_dict = test_utils.check_csv_files(workload_dir, 1, num_kernels)
+    validate(
+        inspect.stack()[0][3],
+        workload_dir,
+        file_dict,
+    )
+
+    assert test_utils.check_file_pattern(
+        "'10': metric_id", f"{workload_dir}/profiling_config.yaml"
+    )
+    assert test_utils.check_file_pattern(
+        "- global_write", f"{workload_dir}/profiling_config.yaml"
+    )
+    assert test_utils.check_file_pattern(
+        "SQ_INSTS_VALU_MFMA_F64", f"{workload_dir}/pmc_perf.csv"
+    )
+    assert test_utils.check_file_pattern("global_write", f"{workload_dir}/pmc_perf.csv")
+    assert not test_utils.check_file_pattern(
+        "global_read", f"{workload_dir}/pmc_perf.csv"
+    )
+    test_utils.clean_output_dir(config["cleanup"], workload_dir)
+
+
+@pytest.mark.section
+def test_list_metrics(binary_handler_profile_rocprof_compute):
+    options = ["--list-metrics"]
+    workload_dir = test_utils.get_output_dir()
+    _ = binary_handler_profile_rocprof_compute(
+        config, workload_dir, options, check_success=True, roof=False
+    )
+    # workload dir should be empty
+    assert not os.listdir(workload_dir)
+    test_utils.clean_output_dir(config["cleanup"], workload_dir)
@@ -25,16 +25,11 @@

 import inspect
 import os
+import re
 import shutil
-import subprocess
-from importlib.machinery import SourceFileLoader
 from pathlib import Path
-from unittest.mock import patch

 import pandas as pd
-import pytest
-
-rocprof_compute = SourceFileLoader("rocprof-compute", "src/rocprof-compute").load_module()


 def check_resource_allocation():
@@ -57,6 +52,14 @@ def check_resource_allocation():
    return


+def check_file_pattern(pattern, file_path):
+    """Check if the given pattern exists in the file"""
+    content = ""
+    with open(file_path) as f:
+        content = f.read()
+    return len(re.findall(pattern, content)) != 0
+
+
 def get_output_dir(suffix="_output", clean_existing=True):
    """Provides a unique output directory based on the name of the calling test function with a suffix applied.

@@ -130,69 +133,3 @@ def check_csv_files(output_dir, num_devices, num_kernels):
        elif file.endswith(".pdf"):
            file_dict[file] = "pdf"
    return file_dict
-
-
-@pytest.fixture
-def binary_handler_profile_rocprof_compute(request):
-    def _handler(config, workload_dir, options=[], check_success=True, roof=False):
-        if request.config.getoption("--call-binary"):
-            baseline_opts = [
-                "build/rocprof-compute.bin",
-                "profile",
-                "-n",
-                "app_1",
-                "-VVV",
-            ]
-            if not roof:
-                baseline_opts.append("--no-roof")
-            process = subprocess.run(
-                baseline_opts
-                + options
-                + ["--path", workload_dir, "--"]
-                + config["app_1"],
-                text=True,
-            )
-            # verify run status
-            if check_success:
-                assert process.returncode == 0
-            return process.returncode
-        else:
-            baseline_opts = ["rocprof-compute", "profile", "-n", "app_1", "-VVV"]
-            if not roof:
-                baseline_opts.append("--no-roof")
-            with pytest.raises(SystemExit) as e:
-                with patch(
-                    "sys.argv",
-                    baseline_opts
-                    + options
-                    + ["--path", workload_dir, "--"]
-                    + config["app_1"],
-                ):
-                    rocprof_compute.main()
-            # verify run status
-            if check_success:
-                assert e.value.code == 0
-            return e.value.code
-
-    return _handler
-
-
-@pytest.fixture
-def binary_handler_analyze_rocprof_compute(request):
-    def _handler(arguments):
-        if request.config.getoption("--call-binary"):
-            process = subprocess.run(
-                ["build/rocprof-compute.bin", *arguments],
-                text=True,
-            )
-            return process.returncode
-        else:
-            with pytest.raises(SystemExit) as e:
-                with patch(
-                    "sys.argv",
-                    ["rocprof-compute", *arguments],
-                ):
-                    rocprof_compute.main()
-            return e.value.code
-
-    return _handler