[SWDEV-542223] Update Violation Status Changes to Design + Minor cleanup (#558)

Changes:
- Update violation status logic and metric naming for XCP/XCC metrics (thrm/thm consistency)
- Add XCP identifier in monitor so partition metrics can be shown with applicable APIs (Violation Status is the first example of this in monitor)
- Improve CLI monitor output: support multiple GPU lines per GPU, add new columns, and improve formatting
- Refactor helpers and logger for flexible unit formatting and table rendering
- Add examples for the new metrics APIs amdsmi_get_gpu_pm_metrics_info()/amdsmi_get_gpu_reg_table_info() in the C++ example
- Sync Python/C++ interface and structures for new metrics fields and naming
- Remove deprecated/unused RSMI activity APIs; no documentation is needed since the APIs no longer exist in ROCm SMI either
- Clean up metric violations and fix handle watch arguments
- Provide better handling/documentation for average_flattened_ints()
- Group XCP metrics with brackets in human-readable output and adjust output size

Signed-off-by: Poag, Charis <Charis.Poag@amd.com>
[ROCm/amdsmi commit: e2e4fc65c1]
2025-08-06 16:03:06 -05:00
parent 3437d5b5da
commit 07dfa789d0
@@ -117,12 +117,12 @@ $ amd-smi
 <a name="separate-driver-reload-anchor"></a>
 - **Separated driver reload from `amdsmi_set_gpu_memory_partition()` / `amdsmi_set_gpu_memory_partition_mode()` and CLI (`sudo amd-smi set -M <NPS mode>`)**  
  - Providing new API (`amdsmi_gpu_driver_reload()`) and CLI (`sudo amd-smi reset -r` or `sudo amd-smi reset --reload-driver`) once user is ready to reload driver. We understand
-  the automatic reload could be at an inconvienient time. This is why we now provide this
+  the automatic reload could be at an inconvenient time. This is why we now provide this
  functionality in separate API/CLI commands to use when the time is right.
  - It is important to understand, the memory (NPS) partition change requires:
-  1) Memory partition change request (`amdsmi_set_gpu_memory_partition()` / `amdsmi_set_gpu_memory_partition_mode()`) or CLI (`sudo amd-smi set -M <NPS mode>`)
-  2) Driver reload (`amdsmi_gpu_driver_reload()` / `sudo amd-smi reset -r` or `sudo amd-smi reset --reload-driver`) \[\*\]  
-  \[\*\] <i>Driver reload requires all GPU activity on all devices to be stopped.</i>
+    1) Memory partition change request (`amdsmi_set_gpu_memory_partition()` / `amdsmi_set_gpu_memory_partition_mode()`) or CLI (`sudo amd-smi set -M <NPS mode>`)
+    2) Driver reload (`amdsmi_gpu_driver_reload()` / `sudo amd-smi reset -r` or `sudo amd-smi reset --reload-driver`) \[\*\]
+  ***Driver reload requires all GPU activity on all devices to be stopped.***

 - **Modified `amd-smi` CLI `monitor` and `metric` for violations**.  
  - Disabled `amd-smi monitor --violation` on guests.  
@@ -164,9 +164,6 @@ $ amd-smi
  - `AMDSMI_EVT_NOTIF_PROCESS_START`
  - `AMDSMI_EVT_NOTIF_PROCESS_END`

- **Updated `amdsmi_get_clock_info` in `amdsmi_interface.py`**.  
-  - The `clk_deep_sleep` field now returns the sleep integer value.  
-
 - **Added Power Cap to `amd-smi monitor`**.  
  - `amd-smi monitor -p` will display the power cap along with power.

@@ -357,6 +354,8 @@ $ amd-smi

 - **Removed duplicated GPU IDs when receiving events using the `amd-smi event` command**.  

+- **Fixed `amd-smi monitor` decoder utilization (`DEC%`) not showing up on MI3x ASICs**.
+
 ### Upcoming changes

 - N/A
@@ -800,7 +799,7 @@ Updated `amdsmi_get_gpu_metrics_info()` and structure `amdsmi_gpu_metrics_t` to

 ### Changed

- **AMDSMI Library Version number to reflect changes in backwards compatability**.  
+- **AMDSMI Library Version number to reflect changes in backwards compatibility**.  
  - Removed Year from AMDSMI Library version number.
  - Version changed from 25.2.0.0 (Year.Major.Minor.Patch) to 25.2.0 (Major.Minor.Patch)
  - Removed year in all version references
@@ -852,7 +851,7 @@ Functions affected by struct change are:
 - **Added violation status output for Graphics Clock Below Host Limit to our CLI: `amdsmi_get_violation_status()`, `amd-smi metric  --throttle`, and `amd-smi monitor --violation`**.  
  ***Only available for MI300+ ASICs.***  
  Users can retrieve violation status' through either our Python or C++ APIs.  
-  Additionally, we have added capability to view these outputs conviently through `amd-smi metric --throttle` and `amd-smi monitor --violation`.  
+  Additionally, we have added capability to view these outputs conveniently through `amd-smi metric --throttle` and `amd-smi monitor --violation`.  
  Example outputs are listed below (below is for reference, output is subject to change):

    ```console
@@ -963,7 +962,7 @@ Functions affected by struct change are:
    ...
    ```

- **Changed amd-smi partition --accelerator & `amdsmi_get_gpu_accelerator_partition_profile_config()` detect users running without root/sudo privledges**  
+- **Changed `amd-smi partition --accelerator` and `amdsmi_get_gpu_accelerator_partition_profile_config()` to detect users running without root/sudo permissions**  
  - Updated `amdsmi_get_gpu_accelerator_partition_profile_config()` to return `AMDSMI_STATUS_NO_PERM` immediately if users run without root/sudo permissions.
  - Updated `amd-smi partition --accelerator` to provide a warning for users without root/sudo permissions (see example below, ***output subject to change***).

@@ -75,12 +75,64 @@ def _print_error(e, destination):
        f = open(destination, "w", encoding="utf-8")
        f.write(e)
        f.close()
-        print("Error occured. Result written to " + str(destination) + " file")
+        print("Error occurred. Result written to " + str(destination) + " file")
+
+def configure_logging_and_execute(args, amd_smi_commands):
+    """
+    Configures logging based on the provided arguments and executes the subcommand.
+
+    Args:
+        args: Parsed command-line arguments.
+        amd_smi_commands: Instance of AMDSMICommands.
+    """
+    # Remove previous log handlers
+    for handler in logging.root.handlers[:]:
+        logging.root.removeHandler(handler)
+
+    # To enable debug logs in AMD SMI library:
+    #   set RSMI_LOGGING = 1 for logging to files
+    #   set RSMI_LOGGING = 2 for logging to stdout
+    #   set RSMI_LOGGING = 3 for logging to stdout and files
+    #   set RSMI_LOGGING = 0 to disable logging
+    # Files will be located in /var/log/amd_smi_lib/AMD-SMI-lib.log*
+
+    # log string with the following format:
+    # loglevel | YYYY-MM-DD HH:MM:SS.ms | filename:line | message
+    logging_dict = {
+        'DEBUG': logging.DEBUG,
+        'INFO': logging.INFO,
+        'WARNING': logging.WARNING,
+        'ERROR': logging.ERROR,
+        'CRITICAL': logging.CRITICAL
+    }
+
+    time = '%(asctime)s.%(msecs)03d'
+    datefmt = '%Y-%m-%d %H:%M:%S'
+    logging.basicConfig(format='%(levelname)s | ' + time + ' | %(filename)s:%(lineno)d | %(message)s',
+                        level=logging_dict[args.loglevel], datefmt=datefmt)
+
+    # Disable traceback for non-debug log levels
+    if args.loglevel == "DEBUG":
+        sys.tracebacklimit = 10
+    else:
+        sys.tracebacklimit = -1
+
+    logging.debug(args)
+
+    # Execute subcommands
+    try:
+        args.func(args)
+    except amdsmi_cli_exceptions.AmdSmiException as e:
+        _print_error(str(e), amd_smi_commands.logger.destination)
+    except amdsmi_exception.AmdSmiLibraryException as e:
+        exc = amdsmi_cli_exceptions.AmdSmiLibraryErrorException(amd_smi_commands.logger.format, e.get_error_code())
+        _print_error(str(exc), amd_smi_commands.logger.destination)


 if __name__ == "__main__":
    # Disable traceback before possible init errors in AMDSMICommands and AMDSMIParser
-    if "DEBUG" in sys.argv:
+    copy_argv = str(sys.argv.copy()).upper()
+    if "DEBUG" in copy_argv:
        sys.tracebacklimit = 10
    else:
        sys.tracebacklimit = -1
@@ -107,57 +159,31 @@ if __name__ == "__main__":
                                    sys_argv=sys.argv,
                                    helpers=amd_smi_helpers)
    try:
-        try:
-            argcomplete.autocomplete(amd_smi_parser)
-        except NameError:
-            logging.debug("argcomplete module not found. Autocomplete will not work.")
+        argcomplete.autocomplete(amd_smi_parser)
+    except NameError:
+        logging.debug("argcomplete module not found. Autocomplete will not work.")

-        # Store possible subcommands & aliases for later errors
-        valid_commands = amd_smi_parser.possible_commands
-        valid_commands += ['--help', '-h']
+    # Store possible subcommands & aliases for later errors
+    valid_commands = amd_smi_parser.possible_commands
+    valid_commands += ['--help', '-h']

-        sys.argv = [arg.lower() if arg.startswith('--') or not arg.startswith('-')
-                    else arg for arg in sys.argv]
-        if len(sys.argv) == 1:
-            args = amd_smi_parser.parse_args(args=['default'])
-        elif sys.argv[1] in valid_commands:
-            args = amd_smi_parser.parse_args(args=None)
-        else:
-            raise amdsmi_cli_exceptions.AmdSmiInvalidSubcommandException(sys.argv[1],amd_smi_commands.logger.destination)
+    sys.argv = [arg.lower() if arg.startswith('--') or not arg.startswith('-')
+                else arg for arg in sys.argv]
+    if len(sys.argv) == 1:
+        args = amd_smi_parser.parse_args(args=['default'])
+    elif sys.tracebacklimit == 10 and (sys.argv[1] == '--loglevel'):
+        args = amd_smi_parser.parse_args(args=['default', '--loglevel'] + sys.argv[2:])
+    elif sys.argv[1] in valid_commands:
+        args = amd_smi_parser.parse_args(args=None)
+    else:
+        raise amdsmi_cli_exceptions.AmdSmiInvalidSubcommandException(sys.argv[1], amd_smi_commands.logger.destination)

-        # Handle command modifiers before subcommand execution
-            # human readable is the default output format
-        if hasattr(args, 'json') and args.json:
-            amd_smi_commands.logger.format = amd_smi_commands.logger.LoggerFormat.json.value
-        if hasattr(args, 'csv') and args.csv:
-            amd_smi_commands.logger.format = amd_smi_commands.logger.LoggerFormat.csv.value
-        if hasattr(args, 'file') and args.file:
-            amd_smi_commands.logger.destination = args.file
-
-        # Remove previous log handlers
-        for handler in logging.root.handlers[:]:
-            logging.root.removeHandler(handler)
-
-        logging_dict = {'DEBUG' : logging.DEBUG,
-                        'INFO' : logging.INFO,
-                        'WARNING': logging.WARNING,
-                        'ERROR': logging.ERROR,
-                        'CRITICAL': logging.CRITICAL}
-        # To enable debug logs on rocm-smi library set RSMI_LOGGING = 1 in environment
-        logging.basicConfig(format='%(levelname)s: %(message)s', level=logging_dict[args.loglevel])
-
-        # Disable traceback for non-debug log levels
-        if args.loglevel == "DEBUG":
-            sys.tracebacklimit = 10
-        else:
-            sys.tracebacklimit = -1
-
-        logging.debug(args)
-
-        # Execute subcommands
-        args.func(args)
-    except amdsmi_cli_exceptions.AmdSmiException as e:
-        _print_error(str(e), amd_smi_commands.logger.destination)
-    except amdsmi_exception.AmdSmiLibraryException as e:
-        exc = amdsmi_cli_exceptions.AmdSmiLibraryErrorException(amd_smi_commands.logger.format, e.get_error_code())
-        _print_error(str(exc), amd_smi_commands.logger.destination)
+    # Handle command modifiers before subcommand execution
+    # human readable is the default output format
+    if hasattr(args, 'json') and args.json:
+        amd_smi_commands.logger.format = amd_smi_commands.logger.LoggerFormat.json.value
+    if hasattr(args, 'csv') and args.csv:
+        amd_smi_commands.logger.format = amd_smi_commands.logger.LoggerFormat.csv.value
+    if hasattr(args, 'file') and args.file:
+        amd_smi_commands.logger.destination = args.file
+    configure_logging_and_execute(args, amd_smi_commands)
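The reworked entry point above normalizes `sys.argv` and then chooses the argument list handed to the parser. The following is a rough standalone sketch of that dispatch order, not the amd-smi code itself: `normalize_argv` and `resolve_parser_args` are hypothetical names, and raising `ValueError` stands in for `AmdSmiInvalidSubcommandException`.

```python
from typing import List


def normalize_argv(argv: List[str]) -> List[str]:
    """Lower-case long options and positional values; keep short flags such as -g untouched."""
    return [arg.lower() if arg.startswith("--") or not arg.startswith("-") else arg
            for arg in argv]


def resolve_parser_args(argv: List[str], valid_commands: List[str]) -> List[str]:
    """Return the argument list the parser would receive (argv[0] is the program name)."""
    # DEBUG detection runs on the raw argv, case-insensitively, before normalization
    debug = "DEBUG" in str(argv).upper()
    argv = normalize_argv(argv)
    if len(argv) == 1:
        # No arguments at all: run the 'default' subcommand
        return ["default"]
    if debug and argv[1] == "--loglevel":
        # '--loglevel DEBUG' with no subcommand is routed to 'default'
        return ["default", "--loglevel"] + argv[2:]
    if argv[1] in valid_commands:
        # Known subcommand or alias: parse everything after the program name
        return argv[1:]
    # Hypothetical stand-in for AmdSmiInvalidSubcommandException
    raise ValueError(f"invalid subcommand: {argv[1]}")
```

Note that the bare `--loglevel` shortcut only applies in debug mode; `amd-smi --loglevel INFO` without a subcommand still falls through to the invalid-subcommand error, both in the diff and in this sketch.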
@@ -27,6 +27,7 @@ import os
 import sys
 import threading
 import time
+import copy

 from _version import __version__
 from amdsmi_cli_exceptions import AmdSmiInvalidParameterException, AmdSmiRequiredCommandException, AmdSmiInvalidCommandException
@@ -1430,7 +1431,29 @@ class AMDSMICommands():


    def build_xcp_dict(self, key, violation_status, num_partition):
-        return {f"xcp_{i}": violation_status[key][i] for i in range(num_partition)}
+        if not isinstance(violation_status[key], list):
+            # Scalar value: convert boolean "active_*" flags to display strings
+            if "active_" in key and violation_status[key] != "N/A":
+                if violation_status[key] is True:
+                    violation_status[key] = "ACTIVE"
+                elif violation_status[key] is False:
+                    violation_status[key] = "NOT ACTIVE"
+            ret = violation_status[key]
+        else:
+            # Per-XCP rows: convert boolean "active_*" flags in place;
+            # "per_*" and "acc_*" values are passed through unchanged
+            for row in violation_status[key]:
+                for index, element in enumerate(row):
+                    if element == "N/A" or "active_" not in key:
+                        continue
+                    if element is True:
+                        row[index] = "ACTIVE"
+                    elif element is False:
+                        row[index] = "NOT ACTIVE"
+            ret = {f"xcp_{i}": violation_status[key][i] for i in range(num_partition)}
+        return ret

    def metric_gpu(self, args, multiple_devices=False, watching_output=False, gpu=None,
                usage=None, watch=None, watch_time=None, iterations=None, power=None,
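`build_xcp_dict()` now accepts both scalar values and per-partition rows. The sketch below restates that behavior as a standalone, non-mutating function. It assumes, as the nested loop in the diff implies, that list-valued fields hold one row per XCP. The helper `to_display` is a hypothetical name.

```python
from typing import Any, Dict, List, Union


def to_display(value: Any) -> Any:
    """Map boolean violation flags to CLI strings; "N/A" and numbers pass through."""
    if value is True:
        return "ACTIVE"
    if value is False:
        return "NOT ACTIVE"
    return value


def build_xcp_dict(key: str, violation_status: Dict[str, Any],
                   num_partition: int) -> Union[Any, Dict[str, List[Any]]]:
    """Sketch of the per-XCP grouping; returns a new value instead of mutating the input."""
    value = violation_status[key]
    convert = "active_" in key  # only "active_*" fields carry booleans
    if not isinstance(value, list):
        # Scalar field: convert it directly
        return to_display(value) if convert else value
    # List field: one row per XCP, converted element by element
    rows = [[to_display(v) if convert else v for v in row] for row in value]
    return {f"xcp_{i}": rows[i] for i in range(num_partition)}
```

Using `is True` / `is False` rather than truthiness keeps numeric readings such as `0` or `1` in `per_*` and `acc_*` rows from being mistaken for flags. Returning a fresh structure, which is a choice made for this sketch and not in the CLI, leaves the raw library values intact for other consumers.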
@@ -1469,7 +1492,7 @@ class AMDSMICommands():
            guest_data (bool, optional): Value override for args.guest_data. Defaults to None.
            fb_usage (bool, optional): Value override for args.fb_usage. Defaults to None.
            xgmi (bool, optional): Value override for args.xgmi. Defaults to None.
-            throttle (bool, optional): Value override for args.violation. Defaults to None.
+            throttle (bool, optional): Value override for args.throttle. Defaults to None.

        Raises:
            IndexError: Index error if gpu list is empty
@@ -1506,6 +1529,8 @@ class AMDSMICommands():
                args.clock = clock
            if temperature:
                args.temperature = temperature
+            if voltage:
+                args.voltage = voltage
            if pcie:
                args.pcie = pcie
            if ecc:
@@ -1532,10 +1557,11 @@ class AMDSMICommands():
                args.energy = energy
            if throttle:
                args.violation = throttle
+                args.throttle = throttle
            current_platform_args += ["fan", "voltage_curve", "overdrive", "perf_level",
                                      "xgmi_err", "energy", "throttle"]
            current_platform_values += [args.fan, args.voltage_curve, args.overdrive,
-                                        args.perf_level, args.xgmi_err, args.energy, args.violation,
+                                        args.perf_level, args.xgmi_err, args.energy, args.throttle,
                                        ]

        if self.helpers.is_hypervisor():
@@ -1636,88 +1662,22 @@ class AMDSMICommands():
            gpu_metric = amdsmi_interface.amdsmi_get_gpu_metrics_info(args.gpu)
        except amdsmi_exception.AmdSmiLibraryException as e:
            logging.debug("#3 - Unable to load GPU Metrics table for %s | %s", gpu_id, e.get_error_info())
-            gpu_metric = {
-                "temperature_edge": "N/A",
-                "temperature_hotspot": "N/A",
-                "temperature_mem": "N/A",
-                "temperature_vrgfx": "N/A",
-                "temperature_vrsoc": "N/A",
-                "temperature_vrmem": "N/A",
-                "average_gfx_activity": "N/A",
-                "average_umc_activity": "N/A",
-                "average_mm_activity": "N/A",
-                "average_socket_power": "N/A",
-                "energy_accumulator": "N/A",
-                "system_clock_counter": "N/A",
-                "average_gfxclk_frequency": "N/A",
-                "average_socclk_frequency": "N/A",
-                "average_uclk_frequency": "N/A",
-                "average_vclk0_frequency": "N/A",
-                "average_dclk0_frequency": "N/A",
-                "average_vclk1_frequency": "N/A",
-                "average_dclk1_frequency": "N/A",
-                "current_gfxclk": "N/A",
-                "current_socclk": "N/A",
-                "current_uclk": "N/A",
-                "current_vclk0": "N/A",
-                "current_dclk0": "N/A",
-                "current_vclk1": "N/A",
-                "current_dclk1": "N/A",
-                "throttle_status": "N/A",
-                "current_fan_speed": "N/A",
-                "pcie_link_width": "N/A",
-                "pcie_link_speed": "N/A",
-                "gfx_activity_acc": "N/A",
-                "mem_activity_acc": "N/A",
-                "temperature_hbm": "N/A",
-                "firmware_timestamp": "N/A",
-                "voltage_soc": "N/A",
-                "voltage_gfx": "N/A",
-                "voltage_mem": "N/A",
-                "indep_throttle_status": "N/A",
-                "current_socket_power": "N/A",
-                "vcn_activity": "N/A",
-                "gfxclk_lock_status": "N/A",
-                "xgmi_link_width": "N/A",
-                "xgmi_link_speed": "N/A",
-                "pcie_bandwidth_acc": "N/A",
-                "pcie_bandwidth_inst": "N/A",
-                "pcie_l0_to_recov_count_acc": "N/A",
-                "pcie_replay_count_acc": "N/A",
-                "pcie_replay_rover_count_acc": "N/A",
-                "xgmi_read_data_acc": "N/A",
-                "xgmi_write_data_acc": "N/A",
-                "current_gfxclks": "N/A",
-                "current_socclks": "N/A",
-                "current_vclk0s": "N/A",
-                "current_dclk0s": "N/A",
-                "jpeg_activity": "N/A",
-                "pcie_nak_sent_count_acc": "N/A",
-                "pcie_nak_rcvd_count_acc": "N/A",
-                "accumulation_counter": "N/A",
-                "prochot_residency_acc": "N/A",
-                "ppt_residency_acc": "N/A",
-                "socket_thm_residency_acc": "N/A",
-                "vr_thm_residency_acc": "N/A",
-                "hbm_thm_residency_acc": "N/A",
-                "num_partition": "N/A",
-                "xcp_stats.gfx_busy_inst": "N/A",
-                "xcp_stats.jpeg_busy": "N/A",
-                "xcp_stats.vcn_busy": "N/A",
-                "xcp_stats.gfx_busy_acc": "N/A",
-                "xcp_stats.gfx_below_host_limit_acc": "N/A",
-                "xcp_stats.gfx_below_host_limit_ppt_acc": "N/A",
-                "xcp_stats.gfx_below_host_limit_thm_acc": "N/A",
-                "xcp_stats.gfx_low_utilization_acc": "N/A",
-                "xcp_stats.gfx_below_host_limit_total_acc": "N/A",
-                "xcp_stats.gfx_below_host_limit_ppt_per": "N/A",
-                "xcp_stats.gfx_below_host_limit_thm_per": "N/A",
-                "xcp_stats.gfx_low_utilization_per": "N/A",
-                "xcp_stats.gfx_below_host_limit_total_per": "N/A",
-                "pcie_lc_perf_other_end_recovery": "N/A",
-                "vram_max_bandwidth": "N/A",
-                "xgmi_link_status": "N/A",
-            }
+            gpu_metric = amdsmi_interface._NA_amdsmi_get_gpu_metrics_info()
+
+        # Workaround for XCP (partition) metrics not providing num_partition in v1.0
+        # Confirmed with driver team that we can default to 1 if num_partition is not defined.
+        # Only applies when a partition exists, i.e. partition_id > 0. See logic below.
+        try:
+            partition_id = amdsmi_interface.amdsmi_get_gpu_kfd_info(args.gpu)['current_partition_id']
+        except amdsmi_exception.AmdSmiLibraryException as e:
+            logging.debug("Failed to get current partition id for gpu %s | %s", gpu_id, e.get_error_info())
+            partition_id = "N/A"
+
+        num_partition = gpu_metric['num_partition']
+        if num_partition == "N/A" and isinstance(partition_id, int) and partition_id > 0:
+            num_partition = 1  # Workaround for XCP metrics not providing num_partition in v1.0
+            logging.debug(f"num_partition is N/A and partition_id is {partition_id} (> 0).\nModified num_partition to {num_partition} to adjust for XCP metrics.")
+
        if self.logger.is_json_format():
            values_dict['gpu'] = int(gpu_id)
        # Populate the pcie_dict first due to multiple gpu metrics calls incorrectly increasing bandwidth
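The fallback above only fires when the metrics table reports no partition count but the KFD info reports a non-zero `current_partition_id`. A minimal sketch of that decision, with `resolve_num_partition` as a hypothetical name:

```python
from typing import Union

NA = "N/A"


def resolve_num_partition(num_partition: Union[int, str],
                          partition_id: Union[int, str]) -> Union[int, str]:
    """Default num_partition to 1 only when it is missing and a partition exists."""
    if num_partition == NA and isinstance(partition_id, int) and partition_id > 0:
        return 1
    return num_partition
```

When the result is still `"N/A"`, any later `range(num_partition)` call (as in the list branch of `build_xcp_dict()`) would raise `TypeError`, so callers should check for an `int` before using the value as a count.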
@@ -1821,7 +1781,6 @@ class AMDSMICommands():
                    # TODO: move vcn_activity and jpeg_activity into amdsmi_get_gpu_activity
                    engine_usage['vcn_activity'] = gpu_metric['vcn_activity']
                    engine_usage['jpeg_activity'] = gpu_metric['jpeg_activity']
-                    num_partition = gpu_metric['num_partition']
                    engine_usage['gfx_busy_inst'] = "N/A"
                    engine_usage['jpeg_busy'] = "N/A"
                    engine_usage['vcn_busy'] = "N/A"
@@ -2560,7 +2519,7 @@ class AMDSMICommands():

                values_dict['mem_usage'] = memory_usage
        if "throttle" in current_platform_args:
-            if args.violation:
+            if args.throttle:
                throttle_status = {
                    # Current values - counter/accumulated
                    'accumulation_counter': "N/A",
@@ -2571,9 +2530,9 @@ class AMDSMICommands():
                    'hbm_thermal_accumulated': "N/A",
                    'gfx_clk_below_host_limit_accumulated': "N/A", # deprecated
                    'gfx_clk_below_host_limit_power_accumulated': "N/A",
-                    'gfx_clk_below_host_limit_thermal_violation_accumulated': "N/A",
-                    'gfx_clk_below_host_limit_violation_accumulated': "N/A",
-                    'low_utilization_violation_accumulated': "N/A",
+                    'gfx_clk_below_host_limit_thermal_accumulated': "N/A",
+                    'total_gfx_clk_below_host_limit_accumulated': "N/A",
+                    'low_utilization_accumulated': "N/A",

                    # violation status values - active/not active
                    'prochot_violation_status': "N/A",
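Scripts that parse the `throttle` section of `amd-smi metric` JSON output may need to follow the renamed accumulated keys. The mapping below is taken from the removed (`-`) and added (`+`) lines of this commit. The `rename_keys` helper is a hypothetical migration aid, not part of amd-smi.

```python
from typing import Any, Dict

# Old CLI throttle keys -> new keys, as changed in this commit
THROTTLE_KEY_RENAMES: Dict[str, str] = {
    "gfx_clk_below_host_limit_thermal_violation_accumulated": "gfx_clk_below_host_limit_thermal_accumulated",
    "gfx_clk_below_host_limit_violation_accumulated": "total_gfx_clk_below_host_limit_accumulated",
    "low_utilization_violation_accumulated": "low_utilization_accumulated",
}


def rename_keys(record: Dict[str, Any], renames: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of record with old key names replaced by their new names."""
    return {renames.get(key, key): value for key, value in record.items()}
```

The `*_violation_status` and `*_violation_activity` keys are deliberately left out. Their old `gfx_clk_below_host_limit_violation_*` names are kept (marked deprecated), and the totals move to the new `total_gfx_clk_below_host_limit_violation_*` keys, so a plain rename would be wrong for them.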
@@ -2581,9 +2540,10 @@ class AMDSMICommands():
                    'socket_thermal_violation_status': "N/A",
                    'vr_thermal_violation_status': "N/A",
                    'hbm_thermal_violation_status': "N/A",
+                    'gfx_clk_below_host_limit_violation_status': "N/A", # deprecated
                    'gfx_clk_below_host_limit_power_violation_status': "N/A",
                    'gfx_clk_below_host_limit_thermal_violation_status': "N/A",
-                    'gfx_clk_below_host_limit_violation_status': "N/A",
+                    'total_gfx_clk_below_host_limit_violation_status': "N/A",
                    'low_utilization_violation_status': "N/A",

                    # violation activity values - percent
@@ -2592,12 +2552,12 @@ class AMDSMICommands():
                    'socket_thermal_violation_activity': "N/A",
                    'vr_thermal_violation_activity': "N/A",
                    'hbm_thermal_violation_activity': "N/A",
+                    'gfx_clk_below_host_limit_violation_activity': "N/A", # deprecated
                    'gfx_clk_below_host_limit_power_violation_activity': "N/A",
                    'gfx_clk_below_host_limit_thermal_violation_activity': "N/A",
-                    'gfx_clk_below_host_limit_violation_activity': "N/A",
+                    'total_gfx_clk_below_host_limit_violation_activity': "N/A",
                    'low_utilization_violation_activity': "N/A",
                }
-                num_partition = gpu_metric['num_partition']

                try:
                    violation_status = amdsmi_interface.amdsmi_get_violation_status(args.gpu)
@@ -2609,18 +2569,18 @@ class AMDSMICommands():
                    throttle_status['hbm_thermal_accumulated'] = violation_status['acc_hbm_thrm']
                    throttle_status['gfx_clk_below_host_limit_accumulated'] = violation_status['acc_gfx_clk_below_host_limit'] #deprecated
                    throttle_status['gfx_clk_below_host_limit_power_accumulated'] = self.build_xcp_dict('acc_gfx_clk_below_host_limit_pwr', violation_status, num_partition)
-                    throttle_status['gfx_clk_below_host_limit_thermal_violation_accumulated'] = self.build_xcp_dict('acc_gfx_clk_below_host_limit_thm', violation_status, num_partition)
-                    throttle_status['gfx_clk_below_host_limit_violation_accumulated'] = self.build_xcp_dict('acc_gfx_clk_below_host_limit_total', violation_status, num_partition)
-                    throttle_status['low_utilization_violation_accumulated'] = self.build_xcp_dict('acc_low_utilization', violation_status, num_partition)
-                    throttle_status['prochot_violation_status'] = violation_status['active_prochot_thrm']
-                    throttle_status['ppt_violation_status'] = violation_status['active_ppt_pwr']
-                    throttle_status['socket_thermal_violation_status'] = violation_status['active_socket_thrm']
-                    throttle_status['vr_thermal_violation_status'] = violation_status['active_vr_thrm']
-                    throttle_status['hbm_thermal_violation_status'] = violation_status['active_hbm_thrm']
-                    throttle_status['gfx_clk_below_host_limit_violation_status'] = violation_status['active_gfx_clk_below_host_limit'] # deprecated
+                    throttle_status['gfx_clk_below_host_limit_thermal_accumulated'] = self.build_xcp_dict('acc_gfx_clk_below_host_limit_thrm', violation_status, num_partition)
+                    throttle_status['total_gfx_clk_below_host_limit_accumulated'] = self.build_xcp_dict('acc_gfx_clk_below_host_limit_total', violation_status, num_partition)
+                    throttle_status['low_utilization_accumulated'] = self.build_xcp_dict('acc_low_utilization', violation_status, num_partition)
+                    throttle_status['prochot_violation_status'] = self.build_xcp_dict('active_prochot_thrm', violation_status, num_partition)
+                    throttle_status['ppt_violation_status'] = self.build_xcp_dict('active_ppt_pwr', violation_status, num_partition)
+                    throttle_status['socket_thermal_violation_status'] = self.build_xcp_dict('active_socket_thrm', violation_status, num_partition)
+                    throttle_status['vr_thermal_violation_status'] = self.build_xcp_dict('active_vr_thrm', violation_status, num_partition)
+                    throttle_status['hbm_thermal_violation_status'] = self.build_xcp_dict('active_hbm_thrm', violation_status, num_partition)
+                    throttle_status['gfx_clk_below_host_limit_violation_status'] = self.build_xcp_dict('active_gfx_clk_below_host_limit', violation_status, num_partition) # deprecated
                    throttle_status['gfx_clk_below_host_limit_power_violation_status'] = self.build_xcp_dict('active_gfx_clk_below_host_limit_pwr', violation_status, num_partition)
-                    throttle_status['gfx_clk_below_host_limit_thermal_violation_status'] = self.build_xcp_dict('active_gfx_clk_below_host_limit_thm', violation_status, num_partition)
-                    throttle_status['gfx_clk_below_host_limit_violation_status'] = self.build_xcp_dict('active_gfx_clk_below_host_limit_total', violation_status, num_partition)
+                    throttle_status['gfx_clk_below_host_limit_thermal_violation_status'] = self.build_xcp_dict('active_gfx_clk_below_host_limit_thrm', violation_status, num_partition)
+                    throttle_status['total_gfx_clk_below_host_limit_violation_status'] = self.build_xcp_dict('active_gfx_clk_below_host_limit_total', violation_status, num_partition)
                    throttle_status['low_utilization_violation_status'] = self.build_xcp_dict('active_low_utilization', violation_status, num_partition)
                    throttle_status['prochot_violation_activity'] = violation_status['per_prochot_thrm']
                    throttle_status['ppt_violation_activity'] = violation_status['per_ppt_pwr']
@@ -2629,20 +2589,15 @@ class AMDSMICommands():
                    throttle_status['hbm_thermal_violation_activity'] = violation_status['per_hbm_thrm']
                    throttle_status['gfx_clk_below_host_limit_violation_activity'] = violation_status['per_gfx_clk_below_host_limit'] # deprecated
                    throttle_status['gfx_clk_below_host_limit_power_violation_activity'] = self.build_xcp_dict('per_gfx_clk_below_host_limit_pwr', violation_status, num_partition)
-                    throttle_status['gfx_clk_below_host_limit_thermal_violation_activity'] = self.build_xcp_dict('per_gfx_clk_below_host_limit_thm', violation_status, num_partition)
-                    throttle_status['gfx_clk_below_host_limit_violation_activity'] = self.build_xcp_dict('per_low_utilization', violation_status, num_partition)
-                    throttle_status['low_utilization_violation_activity'] = self.build_xcp_dict('per_gfx_clk_below_host_limit_total', violation_status, num_partition)
+                    throttle_status['gfx_clk_below_host_limit_thermal_violation_activity'] = self.build_xcp_dict('per_gfx_clk_below_host_limit_thrm', violation_status, num_partition)
+                    throttle_status['total_gfx_clk_below_host_limit_violation_activity'] = self.build_xcp_dict('per_gfx_clk_below_host_limit_total', violation_status, num_partition)
+                    throttle_status['low_utilization_violation_activity'] = self.build_xcp_dict('per_low_utilization', violation_status, num_partition)

                except amdsmi_exception.AmdSmiLibraryException as e:
                    values_dict['throttle'] = throttle_status
                    logging.debug("Failed to get violation status' for gpu %s | %s", gpu_id, e.get_error_info())

                for key, value in throttle_status.items():
-                    if "_status" in key:
-                        if value is True:
-                            throttle_status[key] = "ACTIVE"
-                        elif value is False:
-                            throttle_status[key] = "NOT ACTIVE"

                    activity_unit = ''
                    if "_activity" in key:
@@ -2651,21 +2606,18 @@ class AMDSMICommands():
                    if self.logger.is_human_readable_format():
                        if isinstance(value, (list, dict)):
                            for k, v in value.items():
-                                    for index, activity in enumerate(v):
-                                        if activity != "N/A":
-                                            value[k][index] = f"{activity} {activity_unit}"
-                                    value[k] = '[' + ", ".join(value[k]) + ']'
+                                for index, activity in enumerate(v):
+                                    value[k][index] = self.helpers.unit_format(self.logger, activity, activity_unit)
+                                value[k] = '[' + ", ".join(value[k]) + ']'
                        elif value != "N/A":
-                            throttle_status[key] = f"{value} {activity_unit}"
+                            throttle_status[key] = self.helpers.unit_format(self.logger, value, activity_unit)
                    if self.logger.is_json_format():
-                        if isinstance(value, list):
-                            for index, activity in enumerate(value):
-                                if activity != "N/A":
-                                    throttle_status[key][index] = {"value" : activity,
-                                                                  "unit" : activity_unit}
+                        if isinstance(value, (list, dict)):
+                            for k, v in value.items():
+                                for index, activity in enumerate(v):
+                                    value[k][index] = self.helpers.unit_format(self.logger, activity, activity_unit)
                        elif value != "N/A":
-                            throttle_status[key] = {"value" : value,
-                                                    "unit" : activity_unit}
+                            throttle_status[key] = self.helpers.unit_format(self.logger, value, activity_unit)
                values_dict['throttle'] = throttle_status

        # Store timestamp first if watching_output is enabled
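In human-readable mode, each per-XCP activity entry holds one value per XCC. The loop above formats every element with its unit and collapses the list into a bracketed string. A minimal sketch of that transformation follows. `unit_format_human` and `format_xcp_activity` are hypothetical stand-ins, not amdsmi functions, and the sample values are made up.

```python
def unit_format_human(value, unit):
    """Hypothetical stand-in for AMDSMIHelpers.unit_format in human-readable mode."""
    if value == "N/A":
        return "N/A"
    return f"{value} {unit}".rstrip()


def format_xcp_activity(xcp_dict, unit="%"):
    """Render {"xcp_N": [per-XCC values]} as {"xcp_N": "[v0 %, v1 %, ...]"}."""
    formatted = {}
    for xcp, values in xcp_dict.items():
        parts = [unit_format_human(v, unit) for v in values]
        formatted[xcp] = "[" + ", ".join(parts) + "]"
    return formatted


example = {"xcp_0": [12, 0, "N/A"], "xcp_1": [5, 7, "N/A"]}
print(format_xcp_activity(example))
# {'xcp_0': '[12 %, 0 %, N/A]', 'xcp_1': '[5 %, 7 %, N/A]'}
```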
@@ -5525,7 +5477,6 @@ class AMDSMICommands():
            self.logger.clear_multiple_devices_output()
            return

-
    def monitor(self, args, multiple_devices=False, watching_output=False, gpu=None,
                    watch=None, watch_time=None, iterations=None, power_usage=None,
                    temperature=None, gfx_util=None, mem_util=None, encoder=None,
@@ -5691,9 +5642,29 @@ class AMDSMICommands():
                gpu_metric_debug_info = json.dumps(gpu_metrics_info, indent=4)
                logging.debug("GPU Metrics table for GPU %s | %s", gpu_id, gpu_metric_debug_info)
        except amdsmi_exception.AmdSmiLibraryException as e:
-            gpu_metrics_info = {} # Empty dict to avoid NameError
+            gpu_metrics_info = amdsmi_interface._NA_amdsmi_get_gpu_metrics_info()
            logging.debug("Unable to load GPU Metrics table for %s | %s", gpu_id, e.get_error_info())

+        # Workaround: XCP (partition) metrics v1.0 do not provide num_partition.
+        # Confirmed with the driver team that we can default to 1 if num_partition is not defined.
+        # Pending partitions exist, i.e. partition_id > 0. See logic below.
+        try:
+            partition_id = amdsmi_interface.amdsmi_get_gpu_kfd_info(args.gpu)['current_partition_id']
+        except amdsmi_exception.AmdSmiLibraryException as e:
+            logging.debug("Failed to get current partition id for gpu %s | %s", gpu_id, e.get_error_info())
+            partition_id = "N/A"
+
+        num_partition = gpu_metrics_info['num_partition']
+        if num_partition == "N/A":
+            num_partition = partition_id
+
+        num_xcp = num_partition  # used later for XCP metrics
+        self.logger.table_header += 'XCP'.rjust(5, ' ')
+        self.logger.store_output(args.gpu, 'xcp', partition_id)  # Starting with partition_id.
+                                                                 # Outputs which have xcp details
+                                                                 # will update this value via num_xcp.
+                                                                 # This value will help map to primary device.
+
        # Store the pcie_bw values due to possible increase in bandwidth due to repeated gpu_metrics calls
        if args.pcie:
            try:
@@ -5725,10 +5696,11 @@ class AMDSMICommands():
            self.logger.table_header += 'POWER'.rjust(7)

        if args.power_usage and not args.default_output:
-            # Get Max Power Cap
+            # Get Current Power Cap
            try:
                power_cap_info = amdsmi_interface.amdsmi_get_power_cap_info(args.gpu)
-                monitor_values['max_power'] = power_cap_info['max_power_cap']
+                monitor_values['max_power'] = power_cap_info['power_cap']  # Current power cap (`power_cap`) the socket is set to;
+                                                                           # `max_power_cap` is the maximum value it can be set to
                monitor_values['max_power'] = self.helpers.convert_SI_unit(monitor_values['max_power'], AMDSMIHelpers.SI_Unit.MICRO)

                if self.logger.is_human_readable_format() and monitor_values['max_power'] != "N/A":
@@ -5785,7 +5757,7 @@ class AMDSMICommands():
                        monitor_values['gfx_clk'] = f"{monitor_values['gfx_clk']} {freq_unit}"
                    if self.logger.is_json_format():
                        monitor_values['gfx_clk'] = {"value" : monitor_values['gfx_clk'],
-                                                       "unit" : freq_unit}
+                                                     "unit" : freq_unit}

            except (KeyError, amdsmi_exception.AmdSmiLibraryException) as e:
                monitor_values['gfx_clk'] = "N/A"
@@ -5795,13 +5767,13 @@ class AMDSMICommands():

            try:
                gfx_util = gpu_metrics_info['average_gfx_activity']
-                monitor_values['gfx'] = round(gfx_util)
                activity_unit = '%'
                if gfx_util != "N/A":
-                    if self.logger.is_human_readable_format():
-                        monitor_values['gfx'] = f"{monitor_values['gfx']} {activity_unit}"
-                    if self.logger.is_json_format():
-                        monitor_values['gfx'] = {"value" : monitor_values['gfx'],
+                    monitor_values['gfx'] = gfx_util
+                if self.logger.is_human_readable_format():
+                    monitor_values['gfx'] = f"{monitor_values['gfx']} {activity_unit}"
+                if self.logger.is_json_format():
+                    monitor_values['gfx'] = {"value" : monitor_values['gfx'],
                                                 "unit" : activity_unit}
            except (KeyError, amdsmi_exception.AmdSmiLibraryException) as e:
                monitor_values['gfx'] = "N/A"
@@ -5812,14 +5784,14 @@ class AMDSMICommands():
        if args.mem:
            try:
                mem_util = gpu_metrics_info['average_umc_activity']
-                monitor_values['mem'] = round(mem_util)
                activity_unit = '%'
                if mem_util != "N/A":
-                    if self.logger.is_human_readable_format():
-                        monitor_values['mem'] = f"{monitor_values['mem']} {activity_unit}"
-                    if self.logger.is_json_format():
-                        monitor_values['mem'] = {"value" : monitor_values['mem'],
-                                                 "unit" : activity_unit}
+                    monitor_values['mem'] = mem_util
+                if self.logger.is_human_readable_format():
+                    monitor_values['mem'] = f"{monitor_values['mem']} {activity_unit}"
+                if self.logger.is_json_format():
+                    monitor_values['mem'] = {"value" : monitor_values['mem'],
+                                             "unit" : activity_unit}
            except (KeyError, amdsmi_exception.AmdSmiLibraryException) as e:
                monitor_values['mem'] = "N/A"
                logging.debug("Failed to get mem utilization on gpu %s | %s", gpu_id, e)
@@ -5878,19 +5850,13 @@ class AMDSMICommands():
        if args.decoder:
            try:
                # Get List of vcn activity values
-                # Note: MI3x ASICs only support decoding, so the vcn_activity is used for decoding activity.
+                # Note: MI3x ASICs only support decoding, so the vcn_activity/vcn_busy
+                #       is used for decoding activity.
                decoder_util = gpu_metrics_info['vcn_activity']
-                decoding_activity_avg = []
-                for value in decoder_util:
-                    if isinstance(value, int):
-                        decoding_activity_avg.append(value)
-
-                # Averaging the possible decoding activity values
-                if decoding_activity_avg:
-                    decoding_activity_avg = round(sum(decoding_activity_avg) / len(decoding_activity_avg))
-                else:
-                    decoding_activity_avg = "N/A"
-
+                if (gpu_metrics_info['vcn_activity'][0] == "N/A" and
+                    gpu_metrics_info['xcp_stats.vcn_busy'][partition_id][0] != "N/A"):
+                    decoder_util = gpu_metrics_info['xcp_stats.vcn_busy'][partition_id]
+                decoding_activity_avg = self.helpers.average_flattened_ints(decoder_util, context="decoder_util")
                monitor_values['decoder'] = decoding_activity_avg

                activity_unit = '%'
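The decoder column now prefers `vcn_activity` and falls back to the current partition's row of `xcp_stats.vcn_busy` when the first `vcn_activity` entry is "N/A", then averages the remaining integer samples. The sketch below models that selection on a plain dict. The function names and metric values are hypothetical, and the `isinstance(partition_id, int)` guard is an added assumption: the diff indexes with `partition_id` directly, and that variable is set to "N/A" when the KFD lookup fails.

```python
def decoder_samples(gpu_metrics, partition_id):
    """Pick the per-VCN samples used for the decoder column (sketch).

    Prefers `vcn_activity`; if its first entry is "N/A", falls back to this
    partition's `xcp_stats.vcn_busy` row. The isinstance() guard is an extra
    assumption so that a partition_id of "N/A" is never used as an index.
    """
    samples = gpu_metrics["vcn_activity"]
    busy = gpu_metrics.get("xcp_stats.vcn_busy")
    if (samples[0] == "N/A" and isinstance(partition_id, int)
            and busy is not None and busy[partition_id][0] != "N/A"):
        samples = busy[partition_id]
    return samples


def average_ints(samples):
    """Average the integer samples, ignoring "N/A"; rounded like the CLI."""
    ints = [v for v in samples if isinstance(v, int)]
    return round(sum(ints) / len(ints)) if ints else "N/A"


# Hypothetical metrics: vcn_activity unsupported, vcn_busy reported per partition
metrics = {
    "vcn_activity": ["N/A", "N/A"],
    "xcp_stats.vcn_busy": [[40, 60, "N/A"], ["N/A", "N/A", "N/A"]],
}
print(average_ints(decoder_samples(metrics, 0)))      # 50
print(average_ints(decoder_samples(metrics, 1)))      # N/A
print(average_ints(decoder_samples(metrics, "N/A")))  # N/A
```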
@@ -6050,6 +6016,10 @@ class AMDSMICommands():
                "vr_tviol": "N/A",
                "hbm_tviol": "N/A",
                "gfx_clkviol": "N/A",
+                "gfxclk_pviol": "N/A",
+                "gfxclk_tviol": "N/A",
+                "gfxclk_totalviol": "N/A",
+                "low_utilviol": "N/A"
            }
            try:
                violations = amdsmi_interface.amdsmi_get_violation_status(args.gpu)
@@ -6060,6 +6030,10 @@ class AMDSMICommands():
                violation_status['vr_tviol'] = violations['per_vr_thrm']
                violation_status['hbm_tviol'] = violations['per_hbm_thrm']
                violation_status['gfx_clkviol'] = violations['per_gfx_clk_below_host_limit']
+                violation_status['gfxclk_pviol'] = violations['per_gfx_clk_below_host_limit_pwr']
+                violation_status['gfxclk_tviol'] = violations['per_gfx_clk_below_host_limit_thrm']
+                violation_status['gfxclk_totalviol'] = violations['per_gfx_clk_below_host_limit_total']
+                violation_status['low_utilviol'] = violations['per_low_utilization']
            except amdsmi_exception.AmdSmiLibraryException as e:
                monitor_values['pviol'] = violation_status['pviol']
                monitor_values['tviol'] = violation_status['tviol']
@@ -6068,6 +6042,10 @@ class AMDSMICommands():
                monitor_values['vr_tviol'] = violation_status['vr_tviol']
                monitor_values['hbm_tviol'] = violation_status['hbm_tviol']
                monitor_values['gfx_clkviol'] = violation_status['gfx_clkviol']
+                monitor_values['gfxclk_pviol'] = violation_status['gfxclk_pviol']
+                monitor_values['gfxclk_tviol'] = violation_status['gfxclk_tviol']
+                monitor_values['gfxclk_totalviol'] = violation_status['gfxclk_totalviol']
+                monitor_values['low_utilviol'] = violation_status['low_utilviol']
                logging.debug("Failed to get violation status on gpu %s | %s", gpu_id, e.get_error_info())
            violation_status_unit = "%"
            kPVIOL_MAX_WIDTH = 7
@@ -6077,23 +6055,32 @@ class AMDSMICommands():
            kVR_MAX_WIDTH = 10
            kHBM_MAX_WIDTH = 11
            kGFXC_MAX_WIDTH = 13
+            kGFXC_PVIOL_MAX_WIDTH = 58
+            kGFXC_TVIOL_MAX_WIDTH = kGFXC_PVIOL_MAX_WIDTH
+            kGFXC_TOTALVIOL_MAX_WIDTH = kGFXC_PVIOL_MAX_WIDTH
+            kLOW_UTILVIOL_MAX_WIDTH = kGFXC_PVIOL_MAX_WIDTH

            for key, value in violation_status.items():
-                if value != "N/A":
-                    if key == "tviol_active":
-                        monitor_values[key] = value
+                if not isinstance(value, list):
+                    if value != "N/A":
+                        if key == 'tviol_active' or key == 'xcp':
+                            monitor_values[key] = value
+                        else:
+                            monitor_values[key] = self.helpers.unit_format(self.logger, violation_status[key], violation_status_unit)
                    else:
-                        monitor_values[key] = self.helpers.unit_format(self.logger, violation_status[key], violation_status_unit)
+                        monitor_values[key] = violation_status[key]
                else:
-                    monitor_values[key] = violation_status[key]
+                    if num_partition != "N/A":
+                        # these are one after another, in order to display each in sub-sections
+                        new_xcp_dict = {}
+                        for current_xcp in range(num_partition):
+                            new_xcp_dict[f"xcp_{current_xcp}"] = self.helpers.unit_format(self.logger, value[current_xcp], "%")
+                        monitor_values[key] = new_xcp_dict
+                    else:
+                        monitor_values[key] = value[0] if value else "N/A"
+            # save deep copy of monitor values, used later to grab xcp specific values
+            monitor_values_deepcopy = copy.deepcopy(monitor_values)

-            if self.logger.is_human_readable_format():
-                monitor_values['pviol'] = monitor_values['pviol'].rjust(kPVIOL_MAX_WIDTH, ' ')
-                monitor_values['tviol'] = monitor_values['tviol'].rjust(kTVIOL_MAX_WIDTH, ' ')
-                monitor_values['phot_tviol'] = monitor_values['phot_tviol'].rjust(kPHOT_MAX_WIDTH, ' ')
-                monitor_values['vr_tviol'] = monitor_values['vr_tviol'].rjust(kVR_MAX_WIDTH, ' ')
-                monitor_values['hbm_tviol'] = monitor_values['hbm_tviol'].rjust(kHBM_MAX_WIDTH, ' ')
-                monitor_values['gfx_clkviol'] = monitor_values['gfx_clkviol'].rjust(kGFXC_MAX_WIDTH, ' ')
            self.logger.table_header += 'PVIOL'.rjust(kPVIOL_MAX_WIDTH, ' ')
            self.logger.table_header += 'TVIOL'.rjust(kTVIOL_MAX_WIDTH, ' ')
            self.logger.table_header += 'TVIOL_ACTIVE'.rjust(kTVIOL_ACTIVE_MAX_WIDTH, ' ')
@@ -6101,9 +6088,69 @@ class AMDSMICommands():
            self.logger.table_header += 'VR_TVIOL'.rjust(kVR_MAX_WIDTH, ' ')
            self.logger.table_header += 'HBM_TVIOL'.rjust(kHBM_MAX_WIDTH, ' ')
            self.logger.table_header += 'GFX_CLKVIOL'.rjust(kGFXC_MAX_WIDTH, ' ')
+            self.logger.table_header += 'GFXCLK_PVIOL'.rjust(kGFXC_PVIOL_MAX_WIDTH, ' ')
+            self.logger.table_header += 'GFXCLK_TVIOL'.rjust(kGFXC_TVIOL_MAX_WIDTH, ' ')
+            self.logger.table_header += 'GFXCLK_TOTALVIOL'.rjust(kGFXC_TOTALVIOL_MAX_WIDTH, ' ')
+            self.logger.table_header += 'LOW_UTILVIOL'.rjust(kLOW_UTILVIOL_MAX_WIDTH, ' ')

-        self.logger.store_output(args.gpu, 'values', monitor_values)
+            # Print/capture by XCPs
+            if num_partition != "N/A" and partition_id == 0:
+                current_xcp = 0
+                while (current_xcp in range(num_partition) or current_xcp == 0):
+                    if not multiple_devices and watching_output and current_xcp == 0:
+                        # Need to clear output for single device, otherwise while watching output
+                        # XCP detail will continue stacking on top of each other
+                        self.logger.clear_multiple_devices_output()

+                    if watching_output:
+                        self.logger.store_output(args.gpu, 'timestamp', int(time.time()))
+
+                    self.logger.store_output(args.gpu, 'xcp', current_xcp)
+                    if current_xcp != 0:  # set all other values without XCP stats to N/A
+                        monitor_values['pviol'] = "N/A"
+                        monitor_values['tviol'] = "N/A"
+                        monitor_values['tviol_active'] = "N/A"
+                        monitor_values['phot_tviol'] = "N/A"
+                        monitor_values['vr_tviol'] = "N/A"
+                        monitor_values['hbm_tviol'] = "N/A"
+                        monitor_values['gfx_clkviol'] = "N/A"
+                        for k, _ in monitor_values.items():  # change other keys to "N/A" since we should have all applicable XCP stats
+                                                             # eg. amd-smi monitor -p -t -V should only show XCP info for violations
+                                                             # below primary device
+                            if k != 'xcp' and k not in ['gfxclk_pviol', 'gfxclk_tviol', 'gfxclk_totalviol', 'low_utilviol']:
+                                monitor_values[k] = "N/A"
+
+                    if isinstance(monitor_values_deepcopy['gfxclk_pviol'], dict):
+                        monitor_values['gfxclk_pviol'] = monitor_values_deepcopy['gfxclk_pviol'][f"xcp_{current_xcp}"]
+                    if isinstance(monitor_values_deepcopy['gfxclk_tviol'], dict):
+                        monitor_values['gfxclk_tviol'] = monitor_values_deepcopy['gfxclk_tviol'][f"xcp_{current_xcp}"]
+                    if isinstance(monitor_values_deepcopy['gfxclk_totalviol'], dict):
+                        monitor_values['gfxclk_totalviol'] = monitor_values_deepcopy['gfxclk_totalviol'][f"xcp_{current_xcp}"]
+                    if isinstance(monitor_values_deepcopy['low_utilviol'], dict):
+                        monitor_values['low_utilviol'] = monitor_values_deepcopy['low_utilviol'][f"xcp_{current_xcp}"]
+
+                    if self.logger.is_human_readable_format():
+                        monitor_values['pviol'] = monitor_values['pviol'].rjust(kPVIOL_MAX_WIDTH, ' ')
+                        monitor_values['tviol'] = monitor_values['tviol'].rjust(kTVIOL_MAX_WIDTH, ' ')
+                        monitor_values['phot_tviol'] = monitor_values['phot_tviol'].rjust(kPHOT_MAX_WIDTH, ' ')
+                        monitor_values['vr_tviol'] = monitor_values['vr_tviol'].rjust(kVR_MAX_WIDTH, ' ')
+                        monitor_values['hbm_tviol'] = monitor_values['hbm_tviol'].rjust(kHBM_MAX_WIDTH, ' ')
+                        monitor_values['gfx_clkviol'] = monitor_values['gfx_clkviol'].rjust(kGFXC_MAX_WIDTH, ' ')
+                        monitor_values['gfxclk_pviol'] = str(monitor_values['gfxclk_pviol']).strip().replace('\'', '')
+                        monitor_values['gfxclk_tviol'] = str(monitor_values['gfxclk_tviol']).strip().replace('\'', '')
+                        monitor_values['gfxclk_totalviol'] = str(monitor_values['gfxclk_totalviol']).strip().replace('\'', '')
+                        monitor_values['low_utilviol'] = str(monitor_values['low_utilviol']).strip().replace('\'', '')
+                    self.logger.store_output(args.gpu, 'values', monitor_values)
+                    self.logger.store_multiple_device_output()
+                    current_xcp += 1
+            else:
+                self.logger.store_output(args.gpu, 'xcp', num_xcp)
+                self.logger.store_output(args.gpu, 'values', monitor_values)
+                self.logger.store_multiple_device_output()
+
+        # Store typical output for all commands (XCP data will be handled separately, eg. violation status)
+        if not args.violation:
+            self.logger.store_output(args.gpu, 'values', monitor_values)
        # initialize dual_csv_format; applicable to process only
        dual_csv_output = False
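When per-XCP violation data is available and the device is the primary partition (`partition_id == 0`), the monitor emits one table row per XCP: row 0 keeps the device-wide columns, and later rows blank every column except the four per-XCP violation columns. The function below is a hypothetical, simplified model of that loop. It ignores timestamps, watch mode, and column padding, and it uses `dict.get(..., "N/A")` where the diff indexes the per-XCP dict directly.

```python
# The four per-XCP violation columns added in this change
XCP_KEYS = ("gfxclk_pviol", "gfxclk_tviol", "gfxclk_totalviol", "low_utilviol")


def expand_xcp_rows(values, num_partition):
    """Split one monitor row into one row per XCP (simplified sketch).

    Row 0 keeps every device-wide column. Rows for XCP 1..n-1 set all
    device-wide columns to "N/A" and keep only the per-XCP columns.
    At least one row is always produced, mirroring the
    `or current_xcp == 0` condition in the loop above.
    """
    rows = []
    for xcp in range(max(num_partition, 1)):
        row = {"xcp": xcp}
        for key, value in values.items():
            if key in XCP_KEYS and isinstance(value, dict):
                # Per-XCP dict built earlier as {"xcp_0": ..., "xcp_1": ...}
                row[key] = value.get(f"xcp_{xcp}", "N/A")
            elif key in XCP_KEYS or xcp == 0:
                row[key] = value
            else:
                row[key] = "N/A"
        rows.append(row)
    return rows


values = {"pviol": "3 %", "gfxclk_pviol": {"xcp_0": "[1 %, 2 %]", "xcp_1": "[0 %, 0 %]"}}
for row in expand_xcp_rows(values, 2):
    print(row)
# {'xcp': 0, 'pviol': '3 %', 'gfxclk_pviol': '[1 %, 2 %]'}
# {'xcp': 1, 'pviol': 'N/A', 'gfxclk_pviol': '[0 %, 0 %]'}
```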

@@ -6207,7 +6254,7 @@ class AMDSMICommands():
            self.logger.store_watch_output(multiple_device_enabled=False)


-        self.logger.print_output(multiple_device_enabled=False, watching_output=watching_output, tabular=True, dual_csv_output=dual_csv_output)
+        self.logger.print_output(multiple_device_enabled=True, watching_output=watching_output, tabular=True, dual_csv_output=dual_csv_output)


    def xgmi(self, args, multiple_devices=False, gpu=None, metric=None, xgmi_link_status=None):
@@ -6947,7 +6994,7 @@ class AMDSMICommands():
            try:
                gpu_metrics = amdsmi_interface.amdsmi_get_gpu_metrics_info(processor)
            except amdsmi_exception.AmdSmiLibraryException as e:
-                gpu_metrics = "N/A"
+                gpu_metrics = amdsmi_interface._NA_amdsmi_get_gpu_metrics_info()

            # partition info
            try:
@@ -6999,9 +7046,7 @@ class AMDSMICommands():
            # mem utilization, GPU utilization, power usage, and temperature from gpu_metrics
            if gpu_metrics != "N/A":
                mem_util = gpu_metrics['average_umc_activity']
-                mem_util = round(mem_util)
                gfx_util = gpu_metrics['average_gfx_activity']
-                gfx_util = round(gfx_util)
                if gpu_metrics['current_socket_power'] != "N/A":
                    current_power = gpu_metrics['current_socket_power']
                else:
@@ -1014,13 +1014,28 @@ class AMDSMIHelpers():
        return:
            str or dict : formatted output
        """
-        if value == "N/A":
-            return "N/A"
-        if logger.is_json_format():
-            return {"value": value, "unit": unit}
-        if logger.is_human_readable_format():
-            return f"{value} {unit}".rstrip()
-        return f"{value}"
+        if isinstance(value, list):
+            formatted_values = []
+            for val in value:
+                if isinstance(val, str) and val == "N/A":
+                    formatted_values.append("N/A")
+                else:
+                    formatted_values.append(self.unit_format(logger, val, unit))
+            return formatted_values
+        else:
+            if value == "N/A":
+                return "N/A"
+            if logger.is_json_format():
+                if unit:
+                    return {"value": value, "unit": unit}
+                else:
+                    return value
+            if logger.is_human_readable_format():
+                if unit:
+                    return f"{value} {unit}".rstrip()
+                else:
+                    return f"{value}".rstrip()
+            return f"{value}"
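The reworked `unit_format` recurses into lists, so per-XCC arrays are formatted element by element, and it drops the unit when the unit string is empty. The following standalone sketch models that behavior. `StubLogger` is a hypothetical stand-in for `AMDSMILogger` that only answers the two format checks the helper calls, and the sample values are made up.

```python
class StubLogger:
    """Hypothetical stand-in for AMDSMILogger: only the format checks used below."""

    def __init__(self, fmt):
        self.fmt = fmt

    def is_json_format(self):
        return self.fmt == "json"

    def is_human_readable_format(self):
        return self.fmt == "human"


def unit_format(logger, value, unit):
    """Sketch of the updated helper: recurse into lists, omit empty units."""
    if isinstance(value, list):
        # Keep "N/A" placeholders as-is; format everything else recursively
        return [v if isinstance(v, str) and v == "N/A" else unit_format(logger, v, unit)
                for v in value]
    if value == "N/A":
        return "N/A"
    if logger.is_json_format():
        return {"value": value, "unit": unit} if unit else value
    if logger.is_human_readable_format():
        return f"{value} {unit}".rstrip() if unit else f"{value}".rstrip()
    return f"{value}"  # any other format, such as CSV: bare value, no unit


json_logger = StubLogger("json")
human_logger = StubLogger("human")
print(unit_format(json_logger, [10, "N/A"], "%"))   # [{'value': 10, 'unit': '%'}, 'N/A']
print(unit_format(human_logger, [[1, 2], 3], "W"))  # [['1 W', '2 W'], '3 W']
print(unit_format(json_logger, 5, ""))              # 5
```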

    def unit_unformat(self, logger, formatted_value):
        """
@@ -1483,3 +1498,22 @@ class AMDSMIHelpers():
                ranges[cpu] = f"{start_setbit}-{end_setbit}"

        return ranges
+
+    @staticmethod
+    def average_flattened_ints(data, context="data"):
+        """Calculate the average of flattened integers from a list or tuple
+        Args:
+            data (list or tuple): Data to calculate the average from
+            context (str, optional): Context for logging. Defaults to "data".
+        Returns:
+            int or str: Rounded average of integers if available, otherwise "N/A"
+        """
+        # Type validation - ensure data is list or tuple
+        # Note: Data can be nested list of lists and will filter out N/A values
+        if not isinstance(data, (list, tuple)):
+            logging.debug("Invalid data type for %s: expected list/tuple, got %s", context, type(data))
+            return "N/A"
+
+        # Flatten nested lists and filter integers
+        flat = [v for value in data for v in (value if isinstance(value, list) else [value]) if isinstance(v, int)]
+        return round(sum(flat) / len(flat)) if flat else "N/A"
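To check what `average_flattened_ints` reports for typical inputs, a standalone copy can be run outside the helpers class. The copy below flattens one level of nesting, skips non-integers such as "N/A", and returns a rounded value. The sample data is made up.

```python
import logging


def average_flattened_ints(data, context="data"):
    """Standalone copy of the helper: average all ints, one level of nesting deep."""
    if not isinstance(data, (list, tuple)):
        logging.debug("Invalid data type for %s: expected list/tuple, got %s", context, type(data))
        return "N/A"
    # Flatten one level of nested lists and keep only integers (drops "N/A")
    flat = [v for value in data
            for v in (value if isinstance(value, list) else [value])
            if isinstance(v, int)]
    return round(sum(flat) / len(flat)) if flat else "N/A"


print(average_flattened_ints([[10, 20, "N/A"], [30, "N/A"]], context="vcn_busy"))  # 20
print(average_flattened_ints(["N/A", "N/A"]))  # N/A
print(average_flattened_ints("N/A"))           # N/A
print(average_flattened_ints([2, 3]))          # 2 (2.5 rounds half to even)
print(average_flattened_ints([3, 4]))          # 4 (3.5 rounds half to even)
```

Because the result comes from `round()` with a single argument, it is an `int`, not a `float`. Note also that `bool` is a subclass of `int`, so `True`/`False` entries would be counted as 1/0.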
@@ -157,6 +157,9 @@ class AMDSMILogger():
            elif key == 'gpu':
                stored_gpu = string_value
                table_values += string_value.rjust(3)
+            elif key == 'xcp':
+                stored_gpu = string_value
+                table_values += string_value.rjust(5)
            elif key == 'timestamp':
                stored_timestamp = string_value
                table_values += string_value.rjust(10) + '  '
@@ -170,6 +173,8 @@ class AMDSMILogger():
                table_values += string_value.rjust(7)
            elif key in ('gfx_clk'):
                table_values += string_value.rjust(10)
+            elif key == 'vram_usage':
+                table_values += string_value.rjust(16)
            elif key in ('mem_clock', 'vram_used'):
                table_values += string_value.rjust(11)
            elif key in ('vram_total', 'vram_free'):
@@ -217,6 +222,8 @@ class AMDSMILogger():
                table_values += string_value.rjust(11)
            elif key == "gfx_clkviol":
                table_values += string_value.rjust(13)
+            elif key in ("gfxclk_pviol", "gfxclk_tviol", "gfxclk_totalviol", "low_utilviol"):
+                table_values += string_value.rjust(58)
            elif key == "process_list":
                #Add an additional padding between the first instance of GPU and NAME
                table_values += '  '
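Several branches in this `elif` chain test membership against a parenthesized single string, such as `key in ('gfx_clk')`. In Python, parentheses alone do not create a tuple, so this is a substring test rather than an exact match. The snippet below shows the difference, plus a hypothetical width table keyed by exact column names (widths taken from the branches in this diff).

```python
# A parenthesized string is still a string: `in` does a substring test.
print("clk" in ("gfx_clk"))   # True
# A trailing comma makes a one-element tuple: `in` compares whole items.
print("clk" in ("gfx_clk",))  # False

# Hypothetical width table keyed by exact column names.
COLUMN_WIDTHS = {
    "xcp": 5,
    "gfx_clk": 10,
    "vram_usage": 16,
    "gfx_clkviol": 13,
    "gfxclk_pviol": 58,
    "gfxclk_tviol": 58,
    "gfxclk_totalviol": 58,
    "low_utilviol": 58,
}


def render_row(values):
    """Right-justify each value to its column width (unknown keys: no padding)."""
    return "".join(str(v).rjust(COLUMN_WIDTHS.get(k, 0)) for k, v in values.items())


print(repr(render_row({"xcp": 1, "gfx_clk": "1700 MHz"})))  # '    1  1700 MHz'
```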
@@ -1014,8 +1014,8 @@ class AMDSMIParser(argparse.ArgumentParser):
                metric_parser.add_argument('-l', '--perf-level', action='store_true', required=False, help=perf_level_help)
                metric_parser.add_argument('-x', '--xgmi-err', action='store_true', required=False, help=xgmi_err_help)
                metric_parser.add_argument('-E', '--energy', action='store_true', required=False, help=energy_help)
-                metric_parser.add_argument('-v', '--violation', action='store_true', required=False, help=throttle_help)
-                metric_parser.add_argument('-T', '--throttle', dest='violation', action='store_true', required=False, help=argparse.SUPPRESS)
+                metric_parser.add_argument('-v', '--violation', dest='throttle', action='store_true', required=False, help=throttle_help)
+                metric_parser.add_argument('-T', '--throttle', dest='throttle', action='store_true', required=False, help=argparse.SUPPRESS)

            # Options to only display to Hypervisors
            if self.helpers.is_hypervisor():
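This change makes `-v/--violation` the visible flag and keeps `-T/--throttle` as a hidden alias, with both writing to the same `throttle` destination. A reduced sketch with the standard `argparse` module shows the effect. The parser, prog name, and help text are simplified placeholders rather than the real `AMDSMIParser` setup.

```python
import argparse

parser = argparse.ArgumentParser(prog="amd-smi metric")
# Visible flag
parser.add_argument("-v", "--violation", dest="throttle", action="store_true",
                    help="Show violation status (placeholder help text)")
# Legacy alias: accepted on the command line, hidden from --help
parser.add_argument("-T", "--throttle", dest="throttle", action="store_true",
                    help=argparse.SUPPRESS)

print(parser.parse_args(["-v"]).throttle)  # True
print(parser.parse_args(["-T"]).throttle)  # True
print(parser.parse_args([]).throttle)      # False
```

Because both actions share `dest='throttle'`, code reading the parsed arguments only needs to check `args.throttle`, whichever spelling the user typed.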
@@ -872,6 +872,7 @@ int main() {
        // For each device of the socket, get name and temperature.
        for (uint32_t device_index = 0; device_index < device_count; device_index++) {
            std::cout << "Device Index: " << device_index << std::endl;
+            std::cout << "SMI gpu #: " << gpu_number << std::endl;

 // Commenting out the code to get CPU socket count and GPU count
 // Doesn't work on system with no supported CPU sockets
@@ -884,6 +885,95 @@ int main() {
            std::cout << "GPU count: " << gpus << std::endl;
 #endif

+// Commented out because it is not yet verified to work on all ASICs.
+#if 0
+            amdsmi_name_value_t *pm_metrics = {};
+            uint32_t num_metrics = 0;
+            ret = amdsmi_get_gpu_pm_metrics_info(processor_handles[device_index],
+                                                 &pm_metrics, &num_metrics);
+            const char* err_str;
+            amdsmi_status_code_to_string(ret, &err_str);
+            std::cout << "    Output of amdsmi_get_gpu_pm_metrics_info: " << err_str << "\n";
+            if (ret == AMDSMI_STATUS_SUCCESS) {
+                CHK_AMDSMI_RET(ret)
+                std::cout << "\tNumber of PM metrics: " << num_metrics << std::endl;
+                for (uint32_t j = 0; j < num_metrics; j++) {
+                    std::cout << "\tPM Metric Name: " << pm_metrics[j].name
+                              << ", Value: " << pm_metrics[j].value << std::endl;
+                }
+            }
+            free(pm_metrics);
+
+            // typedef enum {
+            //     AMDSMI_REG_XGMI,  //!< XGMI registers
+            //     AMDSMI_REG_WAFL,  //!< WAFL registers
+            //     AMDSMI_REG_PCIE,  //!< PCIe registers
+            //     AMDSMI_REG_USR,   //!< Usr registers
+            //     AMDSMI_REG_USR1   //!< Usr1 registers
+            // } amdsmi_reg_type_t;
+            std::map<amdsmi_reg_type_t, std::string> reg_type_map = {
+                {AMDSMI_REG_XGMI, "XGMI"},
+                {AMDSMI_REG_WAFL, "WAFL"},
+                {AMDSMI_REG_PCIE, "PCIE"},
+                {AMDSMI_REG_USR, "USR"},
+                {AMDSMI_REG_USR1, "USR1"}
+            };
+
+            for (uint32_t j = static_cast<uint32_t>(AMDSMI_REG_XGMI);
+                 j <= static_cast<uint32_t>(AMDSMI_REG_USR1); j++) {
+                amdsmi_name_value_t *reg_metrics = {};
+                amdsmi_reg_type_t reg_type = static_cast<amdsmi_reg_type_t>(j);
+                std::string reg_type_str = "N/A";
+                ret = amdsmi_get_gpu_reg_table_info(processor_handles[device_index],
+                                                    reg_type, &reg_metrics, &num_metrics);
+                if (auto it = reg_type_map.find(reg_type); it != reg_type_map.end()) {
+                    reg_type_str = it->second;
+                }
+                // Skip these register types for now because some ASICs have issues with them
+                if (reg_type == AMDSMI_REG_USR1 || reg_type == AMDSMI_REG_XGMI ||
+                    reg_type == AMDSMI_REG_USR) {
+                    std::cout << "\tSkipping " << reg_type_str << " registers for now."
+                              << std::endl;
+                    free(reg_metrics);
+                    continue;
+                }
+
+                amdsmi_status_code_to_string(ret, &err_str);
+                std::cout << "    Output of amdsmi_get_gpu_reg_table_info(" << gpu_number << ", "
+                          << reg_type_str << "): " << err_str << "\n";
+                if (ret == AMDSMI_STATUS_SUCCESS) {
+                    CHK_AMDSMI_RET(ret)
+                    std::cout << "\tNumber of Register metrics: " << num_metrics << std::endl;
+                    for (uint32_t k = 0; k < num_metrics; k++) {
+                        if (reg_metrics == nullptr) {
+                            std::cout << "\tRegister Number: " << k
+                                      << ", Type: " << reg_type_str
+                                      << ", Register Metric Name: N/A, Value: N/A" << std::endl;
+                            continue;
+                        }
+                        if (reg_metrics[k].name == nullptr) {
+                            std::cout << "\tRegister Number: " << k
+                                      << ", Type: " << reg_type_str
+                                      << ", Register Metric Name: N/A, Value: N/A" << std::endl;
+                            continue;
+                        }
+                        std::cout << "\tRegister Number: " << k
+                                  << ", Type: " << reg_type_str
+                                  << ", Register Metric Name: " << reg_metrics[k].name
+                                  << ", Value: " << reg_metrics[k].value << std::endl;
+                    }
+                }
+                free(reg_metrics);
+                std::cout << std::endl;
+            }
+            std::cout << std::endl;
+#endif
+
            // Get device type. Since the amdsmi is initialized with
            // AMD_SMI_INIT_AMD_GPUS, the processor_type must be AMDSMI_PROCESSOR_TYPE_AMD_GPU.
            processor_type_t processor_type = {};
@@ -1909,8 +1999,8 @@ int main() {
                    }
                }
            }
-          gpu_number++;
-      }
+            gpu_number++;
+        }
    }

    // Clean up resources allocated at amdsmi_init. It will invalidate sockets
@@ -714,17 +714,17 @@ typedef struct {
                                                   Gfx clock below host limit violation; 1 = active 0 = not active; Max uint8 means unsupported.*/
    //GPU metrics 1.8 violations
    uint64_t acc_gfx_clk_below_host_limit_pwr[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];    //!< New Driver 1.8 fields: Current gfx clock below host limit power count; Max uint64 means unsupported
-    uint64_t acc_gfx_clk_below_host_limit_thm[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];    //!< New Driver 1.8 fields: Current gfx clock below host limit thermal count; Max uint64 means unsupported
+    uint64_t acc_gfx_clk_below_host_limit_thrm[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];   //!< New Driver 1.8 fields: Current gfx clock below host limit thermal count; Max uint64 means unsupported
    uint64_t acc_low_utilization[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];                 //!< New Driver 1.8 fields: Current low utilization count; Max uint64 means unsupported
    uint64_t acc_gfx_clk_below_host_limit_total[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];  //!< New Driver 1.8 fields: Current gfx clock below host limit total count; Max uint64 means unsupported

    uint64_t per_gfx_clk_below_host_limit_pwr[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];    //!< New Driver 1.8 fields: Gfx clock below host limit power violation % (greater than 0% is a violation); Max uint64 means unsupported
-    uint64_t per_gfx_clk_below_host_limit_thm[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];    //!< New Driver 1.8 fields: Gfx clock below host limit violation % (greater than 0% is a violation); Max uint64 means unsupported
+    uint64_t per_gfx_clk_below_host_limit_thrm[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];    //!< New Driver 1.8 fields: Gfx clock below host limit thermal violation % (greater than 0% is a violation); Max uint64 means unsupported
    uint64_t per_low_utilization[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];                 //!< New Driver 1.8 fields: Low utilization violation % (greater than 0% is a violation); Max uint64 means unsupported
    uint64_t per_gfx_clk_below_host_limit_total[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];  //!< New Driver 1.8 fields: Any Gfx clock below host limit violation % (greater than 0% is a violation); Max uint64 means unsupported

    uint8_t active_gfx_clk_below_host_limit_pwr[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];  //!< New Driver 1.8 fields: Gfx clock below host limit power violation; 1 = active 0 = not active; Max uint8 means unsupported
-    uint8_t active_gfx_clk_below_host_limit_thm[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];  //!< New Driver 1.8 fields: Gfx clock below host limit thermal violation; 1 = active 0 = not active; Max uint8 means unsupported
+    uint8_t active_gfx_clk_below_host_limit_thrm[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];  //!< New Driver 1.8 fields: Gfx clock below host limit thermal violation; 1 = active 0 = not active; Max uint8 means unsupported
    uint8_t active_low_utilization[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];               //!< New Driver 1.8 fields: Low utilization violation; 1 = active 0 = not active; Max uint8 means unsupported
    uint8_t active_gfx_clk_below_host_limit_total[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];//!< New Driver 1.8 fields: Any Gfx clock below host limit violation; 1 = active 0 = not active; Max uint8 means unsupported
    uint64_t reserved[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC];   // reserved for new violation info
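Every [XCP][XCC] field above reports "unsupported" through the maximum value of its element type (max uint64 or max uint8). A minimal C++ sketch of how a caller might walk one of these arrays and count the usable entries; the 8 x 8 dimensions are an assumption taken from the ctypes binding's `8 * 8` shape, and `count_supported` is a hypothetical helper, not part of the amdsmi API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Assumed dimensions; the real values come from AMDSMI_MAX_NUM_XCP / AMDSMI_MAX_NUM_XCC.
constexpr std::size_t kMaxNumXcp = 8;
constexpr std::size_t kMaxNumXcc = 8;

// Hypothetical helper: count entries of a [XCP][XCC] violation array that are
// not the "unsupported" sentinel (std::numeric_limits<T>::max()).
template <typename T>
std::size_t count_supported(const T (&arr)[kMaxNumXcp][kMaxNumXcc]) {
    std::size_t supported = 0;
    for (std::size_t xcp = 0; xcp < kMaxNumXcp; ++xcp) {
        for (std::size_t xcc = 0; xcc < kMaxNumXcc; ++xcc) {
            if (arr[xcp][xcc] != std::numeric_limits<T>::max()) {
                ++supported;
            }
        }
    }
    return supported;
}
```

The same template serves both the `uint64_t` counter/percentage arrays and the `uint8_t` `active_*` arrays, because the sentinel is derived from the deduced element type.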
@@ -787,6 +787,103 @@ def _notifyTypeToString(notify_type_b):
    else:
        return "Unknown"

+def _NA_amdsmi_get_gpu_metrics_info() -> Dict[str, str]:
+    """
+    Return 'N/A' placeholder values for every gpu_metrics key; used for exception handling.
+
+    Parameters:
+        None
+
+    Returns:
+        Dict[str, str]: A dictionary mapping each metric name to 'N/A', indicating
+        that the metric is not available or not applicable.
+
+    Raises:
+        N/A
+    """
+    na_gpu_metrics_info = {
+        "common_header.structure_size": "N/A",
+        "common_header.format_revision": "N/A",
+        "common_header.content_revision": "N/A",
+        "temperature_edge": "N/A",
+        "temperature_hotspot": "N/A",
+        "temperature_mem": "N/A",
+        "temperature_vrgfx": "N/A",
+        "temperature_vrsoc": "N/A",
+        "temperature_vrmem": "N/A",
+        "average_gfx_activity": "N/A",
+        "average_umc_activity": "N/A",
+        "average_mm_activity": "N/A",
+        "average_socket_power": "N/A",
+        "energy_accumulator": "N/A",
+        "system_clock_counter": "N/A",
+        "average_gfxclk_frequency": "N/A",
+        "average_socclk_frequency": "N/A",
+        "average_uclk_frequency": "N/A",
+        "average_vclk0_frequency": "N/A",
+        "average_dclk0_frequency": "N/A",
+        "average_vclk1_frequency": "N/A",
+        "average_dclk1_frequency": "N/A",
+        "current_gfxclk": "N/A",
+        "current_socclk": "N/A",
+        "current_uclk": "N/A",
+        "current_vclk0": "N/A",
+        "current_dclk0": "N/A",
+        "current_vclk1": "N/A",
+        "current_dclk1": "N/A",
+        "throttle_status": "N/A",
+        "current_fan_speed": "N/A",
+        "pcie_link_width": "N/A",
+        "pcie_link_speed": "N/A",
+        "gfx_activity_acc": "N/A",
+        "mem_activity_acc": "N/A",
+        "temperature_hbm": "N/A",
+        "firmware_timestamp": "N/A",
+        "voltage_soc": "N/A",
+        "voltage_gfx": "N/A",
+        "voltage_mem": "N/A",
+        "indep_throttle_status": "N/A",
+        "current_socket_power": "N/A",
+        "vcn_activity": "N/A",
+        "gfxclk_lock_status": "N/A",
+        "xgmi_link_width": "N/A",
+        "xgmi_link_speed": "N/A",
+        "pcie_bandwidth_acc": "N/A",
+        "pcie_bandwidth_inst": "N/A",
+        "pcie_l0_to_recov_count_acc": "N/A",
+        "pcie_replay_count_acc": "N/A",
+        "pcie_replay_rover_count_acc": "N/A",
+        "xgmi_read_data_acc": "N/A",
+        "xgmi_write_data_acc": "N/A",
+        "current_gfxclks": "N/A",
+        "current_socclks": "N/A",
+        "current_vclk0s": "N/A",
+        "current_dclk0s": "N/A",
+        "jpeg_activity": "N/A",
+        "pcie_nak_sent_count_acc": "N/A",
+        "pcie_nak_rcvd_count_acc": "N/A",
+        "accumulation_counter": "N/A",
+        "prochot_residency_acc": "N/A",
+        "ppt_residency_acc": "N/A",
+        "socket_thm_residency_acc": "N/A",
+        "vr_thm_residency_acc": "N/A",
+        "hbm_thm_residency_acc": "N/A",
+        "num_partition": "N/A",
+        "xcp_stats.gfx_busy_inst": "N/A",
+        "xcp_stats.jpeg_busy": "N/A",
+        "xcp_stats.vcn_busy": "N/A",
+        "xcp_stats.gfx_busy_acc": "N/A",
+        "xcp_stats.gfx_below_host_limit_acc": "N/A",
+        "xcp_stats.gfx_below_host_limit_ppt_acc": "N/A",
+        "xcp_stats.gfx_below_host_limit_thm_acc": "N/A",
+        "xcp_stats.gfx_low_utilization_acc": "N/A",
+        "xcp_stats.gfx_below_host_limit_total_acc": "N/A",
+        "pcie_lc_perf_other_end_recovery": "N/A",
+        "vram_max_bandwidth": "N/A",
+        "xgmi_link_status": "N/A"
+    }
+    return na_gpu_metrics_info
+

 def amdsmi_get_socket_handles() -> List[c_void_p]:
    """
@@ -2351,9 +2448,9 @@ def amdsmi_get_violation_status(
        "acc_hbm_thrm": _validate_if_max_uint(violation_status.acc_hbm_thrm, MaxUIntegerTypes.UINT64_T),
        "acc_gfx_clk_below_host_limit": _validate_if_max_uint(violation_status.acc_gfx_clk_below_host_limit, MaxUIntegerTypes.UINT64_T),
        "acc_gfx_clk_below_host_limit_pwr": list(violation_status.acc_gfx_clk_below_host_limit_pwr),
-        "acc_gfx_clk_below_host_limit_thm": list(violation_status.acc_gfx_clk_below_host_limit_thm),
-        "acc_low_utilization": list(violation_status.acc_low_utilization),
+        "acc_gfx_clk_below_host_limit_thrm": list(violation_status.acc_gfx_clk_below_host_limit_thrm),
        "acc_gfx_clk_below_host_limit_total": list(violation_status.acc_gfx_clk_below_host_limit_total),
+        "acc_low_utilization": list(violation_status.acc_low_utilization),
        "per_prochot_thrm": _validate_if_max_uint(violation_status.per_prochot_thrm, MaxUIntegerTypes.UINT64_T, isActivity=True),
        "per_ppt_pwr": _validate_if_max_uint(violation_status.per_ppt_pwr, MaxUIntegerTypes.UINT64_T, isActivity=True),          #PVIOL
        "per_socket_thrm": _validate_if_max_uint(violation_status.per_socket_thrm, MaxUIntegerTypes.UINT64_T, isActivity=True),  #TVIOL
@@ -2361,9 +2458,9 @@ def amdsmi_get_violation_status(
        "per_hbm_thrm": _validate_if_max_uint(violation_status.per_hbm_thrm, MaxUIntegerTypes.UINT64_T, isActivity=True),
        "per_gfx_clk_below_host_limit": _validate_if_max_uint(violation_status.per_gfx_clk_below_host_limit, MaxUIntegerTypes.UINT64_T, isActivity=True),
        "per_gfx_clk_below_host_limit_pwr": list(violation_status.per_gfx_clk_below_host_limit_pwr),
-        "per_gfx_clk_below_host_limit_thm": list(violation_status.per_gfx_clk_below_host_limit_thm),
-        "per_low_utilization": list(violation_status.per_low_utilization),
+        "per_gfx_clk_below_host_limit_thrm": list(violation_status.per_gfx_clk_below_host_limit_thrm),
        "per_gfx_clk_below_host_limit_total": list(violation_status.per_gfx_clk_below_host_limit_total),
+        "per_low_utilization": list(violation_status.per_low_utilization),
        "active_prochot_thrm": _validate_if_max_uint(violation_status.active_prochot_thrm, MaxUIntegerTypes.UINT8_T, isBool=True),
        "active_ppt_pwr": _validate_if_max_uint(violation_status.active_ppt_pwr, MaxUIntegerTypes.UINT8_T, isBool=True),         #PVIOL
        "active_socket_thrm": _validate_if_max_uint(violation_status.active_socket_thrm, MaxUIntegerTypes.UINT8_T, isBool=True), #TVIOL
@@ -2371,9 +2468,9 @@ def amdsmi_get_violation_status(
        "active_hbm_thrm": _validate_if_max_uint(violation_status.active_hbm_thrm, MaxUIntegerTypes.UINT8_T, isBool=True),
        "active_gfx_clk_below_host_limit": _validate_if_max_uint(violation_status.active_gfx_clk_below_host_limit, MaxUIntegerTypes.UINT8_T, isBool=True),
        "active_gfx_clk_below_host_limit_pwr": list(violation_status.active_gfx_clk_below_host_limit_pwr),
-        "active_gfx_clk_below_host_limit_thm": list(violation_status.active_gfx_clk_below_host_limit_thm),
-        "active_low_utilization": list(violation_status.active_low_utilization),
+        "active_gfx_clk_below_host_limit_thrm": list(violation_status.active_gfx_clk_below_host_limit_thrm),
        "active_gfx_clk_below_host_limit_total": list(violation_status.active_gfx_clk_below_host_limit_total),
+        "active_low_utilization": list(violation_status.active_low_utilization),
    }

    # Build 2D lists: one row per XCP, one entry per XCC
@@ -2381,25 +2478,25 @@ def amdsmi_get_violation_status(
        for xcp_index, xcp_metrics in enumerate(dict_return['acc_gfx_clk_below_host_limit_pwr']):
            xcp_detail = []
            for val in xcp_metrics:
-                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T, isActivity=True))
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T))
            dict_return['acc_gfx_clk_below_host_limit_pwr'][xcp_index] = xcp_detail
-    if 'acc_gfx_clk_below_host_limit_thm' in dict_return:
-        for xcp_index, xcp_metrics in enumerate(dict_return['acc_gfx_clk_below_host_limit_thm']):
+    if 'acc_gfx_clk_below_host_limit_thrm' in dict_return:
+        for xcp_index, xcp_metrics in enumerate(dict_return['acc_gfx_clk_below_host_limit_thrm']):
            xcp_detail = []
            for val in xcp_metrics:
-                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T, isActivity=True))
-            dict_return['acc_gfx_clk_below_host_limit_thm'][xcp_index] = xcp_detail
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T))
+            dict_return['acc_gfx_clk_below_host_limit_thrm'][xcp_index] = xcp_detail
    if 'acc_low_utilization' in dict_return:
        for xcp_index, xcp_metrics in enumerate(dict_return['acc_low_utilization']):
            xcp_detail = []
            for val in xcp_metrics:
-                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T, isActivity=True))
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T))
            dict_return['acc_low_utilization'][xcp_index] = xcp_detail
    if 'acc_gfx_clk_below_host_limit_total' in dict_return:
        for xcp_index, xcp_metrics in enumerate(dict_return['acc_gfx_clk_below_host_limit_total']):
            xcp_detail = []
            for val in xcp_metrics:
-                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T, isActivity=True))
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T))
            dict_return['acc_gfx_clk_below_host_limit_total'][xcp_index] = xcp_detail

    if 'per_gfx_clk_below_host_limit_pwr' in dict_return:
@@ -2408,12 +2505,12 @@ def amdsmi_get_violation_status(
            for val in xcp_metrics:
                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T, isActivity=True))
            dict_return['per_gfx_clk_below_host_limit_pwr'][xcp_index] = xcp_detail
-    if 'per_gfx_clk_below_host_limit_thm' in dict_return:
-        for xcp_index, xcp_metrics in enumerate(dict_return['per_gfx_clk_below_host_limit_thm']):
+    if 'per_gfx_clk_below_host_limit_thrm' in dict_return:
+        for xcp_index, xcp_metrics in enumerate(dict_return['per_gfx_clk_below_host_limit_thrm']):
            xcp_detail = []
            for val in xcp_metrics:
                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T, isActivity=True))
-            dict_return['per_gfx_clk_below_host_limit_thm'][xcp_index] = xcp_detail
+            dict_return['per_gfx_clk_below_host_limit_thrm'][xcp_index] = xcp_detail
    if 'per_low_utilization' in dict_return:
        for xcp_index, xcp_metrics in enumerate(dict_return['per_low_utilization']):
            xcp_detail = []
@@ -2433,12 +2530,12 @@ def amdsmi_get_violation_status(
            for val in xcp_metrics:
                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT8_T, isBool=True))
            dict_return['active_gfx_clk_below_host_limit_pwr'][xcp_index] = xcp_detail
-    if 'active_gfx_clk_below_host_limit_thm' in dict_return:
-        for xcp_index, xcp_metrics in enumerate(dict_return['active_gfx_clk_below_host_limit_thm']):
+    if 'active_gfx_clk_below_host_limit_thrm' in dict_return:
+        for xcp_index, xcp_metrics in enumerate(dict_return['active_gfx_clk_below_host_limit_thrm']):
            xcp_detail = []
            for val in xcp_metrics:
                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT8_T, isBool=True))
-            dict_return['active_gfx_clk_below_host_limit_thm'][xcp_index] = xcp_detail
+            dict_return['active_gfx_clk_below_host_limit_thrm'][xcp_index] = xcp_detail
    if 'active_low_utilization' in dict_return:
        for xcp_index, xcp_metrics in enumerate(dict_return['active_low_utilization']):
            xcp_detail = []
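The Python loops above convert each XCP row element by element, and the `acc_*` counters no longer pass `isActivity=True` because they are raw counts rather than percentages. A rough C++ analogue of that per-row conversion, as a sketch only: `to_display` and `xcp_row_to_display` are hypothetical stand-ins for `_validate_if_max_uint`, and the `" %"` suffix for percentage fields is an assumed display format.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hypothetical analogue of _validate_if_max_uint(): the max value of the
// field's type means "unsupported" and is rendered as "N/A".
template <typename T>
std::string to_display(T value, bool is_percent) {
    if (value == std::numeric_limits<T>::max()) {
        return "N/A";
    }
    std::string out = std::to_string(static_cast<unsigned long long>(value));
    // acc_* fields are raw counts; only per_* fields are percentages.
    return is_percent ? out + " %" : out;
}

// Convert one XCP row (one value per XCC) into display strings.
template <typename T>
std::vector<std::string> xcp_row_to_display(const std::vector<T>& row, bool is_percent) {
    std::vector<std::string> out;
    out.reserve(row.size());
    for (T v : row) {
        out.push_back(to_display(v, is_percent));
    }
    return out;
}
```

Keeping the percent/count decision as an explicit flag mirrors the change above: dropping `isActivity=True` is what stops accumulated counts from being formatted as percentages.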
@@ -4614,6 +4711,9 @@ def amdsmi_get_gpu_metrics_info(
    )

    gpu_metrics_output = {
+        "common_header.structure_size": _validate_if_max_uint(gpu_metrics.common_header.structure_size, MaxUIntegerTypes.UINT16_T),
+        "common_header.format_revision": _validate_if_max_uint(gpu_metrics.common_header.format_revision, MaxUIntegerTypes.UINT8_T),
+        "common_header.content_revision": _validate_if_max_uint(gpu_metrics.common_header.content_revision, MaxUIntegerTypes.UINT8_T),
        "temperature_edge": _validate_if_max_uint(gpu_metrics.temperature_edge, MaxUIntegerTypes.UINT16_T),
        "temperature_hotspot": _validate_if_max_uint(gpu_metrics.temperature_hotspot, MaxUIntegerTypes.UINT16_T),
        "temperature_mem": _validate_if_max_uint(gpu_metrics.temperature_mem, MaxUIntegerTypes.UINT16_T),
@@ -871,15 +871,15 @@ struct_amdsmi_violation_status_t._fields_ = [
    ('active_gfx_clk_below_host_limit', ctypes.c_ubyte),
    ('PADDING_0', ctypes.c_ubyte * 2),
    ('acc_gfx_clk_below_host_limit_pwr', ctypes.c_uint64 * 8 * 8),
-    ('acc_gfx_clk_below_host_limit_thm', ctypes.c_uint64 * 8 * 8),
+    ('acc_gfx_clk_below_host_limit_thrm', ctypes.c_uint64 * 8 * 8),
    ('acc_low_utilization', ctypes.c_uint64 * 8 * 8),
    ('acc_gfx_clk_below_host_limit_total', ctypes.c_uint64 * 8 * 8),
    ('per_gfx_clk_below_host_limit_pwr', ctypes.c_uint64 * 8 * 8),
-    ('per_gfx_clk_below_host_limit_thm', ctypes.c_uint64 * 8 * 8),
+    ('per_gfx_clk_below_host_limit_thrm', ctypes.c_uint64 * 8 * 8),
    ('per_low_utilization', ctypes.c_uint64 * 8 * 8),
    ('per_gfx_clk_below_host_limit_total', ctypes.c_uint64 * 8 * 8),
    ('active_gfx_clk_below_host_limit_pwr', ctypes.c_ubyte * 8 * 8),
-    ('active_gfx_clk_below_host_limit_thm', ctypes.c_ubyte * 8 * 8),
+    ('active_gfx_clk_below_host_limit_thrm', ctypes.c_ubyte * 8 * 8),
    ('active_low_utilization', ctypes.c_ubyte * 8 * 8),
    ('active_gfx_clk_below_host_limit_total', ctypes.c_ubyte * 8 * 8),
    ('reserved', ctypes.c_uint64 * 8 * 8),
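In ctypes, `ctypes.c_uint64 * 8 * 8` is an array of 8 elements of type `ctypes.c_uint64 * 8`, so it has the same row-major layout as the C declaration `uint64_t field[8][8]`, assuming `AMDSMI_MAX_NUM_XCP` and `AMDSMI_MAX_NUM_XCC` are both 8. Because the `_thm` to `_thrm` rename changes only names, offsets are unaffected; the ctypes names just need to agree with the Python code that reads them (for example `violation_status.acc_gfx_clk_below_host_limit_thrm`), which is why both sides are renamed together. A C++ sketch that checks the layout assumption on a hypothetical, truncated mirror of the struct (the `per_*` arrays are omitted):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, truncated mirror of the [XCP][XCC] arrays, assuming
// AMDSMI_MAX_NUM_XCP == AMDSMI_MAX_NUM_XCC == 8 (the binding's 8 * 8 shape).
struct ViolationArraysSketch {
    uint64_t acc_gfx_clk_below_host_limit_pwr[8][8];
    uint64_t acc_gfx_clk_below_host_limit_thrm[8][8];
    uint64_t acc_low_utilization[8][8];
    uint64_t acc_gfx_clk_below_host_limit_total[8][8];
    uint8_t  active_gfx_clk_below_host_limit_pwr[8][8];
};

// Each uint64_t [8][8] block is 8 * 8 * 8 = 512 bytes with no padding between rows.
static_assert(sizeof(uint64_t[8][8]) == 512, "8x8 uint64 block should be 512 bytes");
static_assert(offsetof(ViolationArraysSketch, acc_gfx_clk_below_host_limit_thrm) == 512,
              "thrm array directly follows the pwr array");
static_assert(offsetof(ViolationArraysSketch, active_gfx_clk_below_host_limit_pwr) == 4 * 512,
              "uint8 arrays start after the four uint64 arrays in this sketch");

// Row-major flat index of [xcp][xcc], matching how ctypes nests the arrays.
constexpr std::size_t flat_index(std::size_t xcp, std::size_t xcc) {
    return xcp * 8 + xcc;
}
```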
@@ -705,7 +705,7 @@ struct AMDGpuMetrics_v18_t {
  uint16_t m_average_gfx_activity;
  uint16_t m_average_umc_activity;  // memory controller

-  /* VRAM max bandwidthi (in GB/sec) at max memory clock */
+  /* VRAM max bandwidth (in GB/sec) at max memory clock */
  uint64_t m_mem_max_bandwidth;

  /* Energy (15.259uJ (2^-16) units) */
@@ -1043,20 +1043,32 @@ amdsmi_status_t amdsmi_get_violation_status(amdsmi_processor_handle processor_ha
    violation_status->active_hbm_thrm = std::numeric_limits<uint8_t>::max();
    violation_status->active_gfx_clk_below_host_limit = std::numeric_limits<uint8_t>::max();

-    fill_2d_array(violation_status->acc_gfx_clk_below_host_limit_pwr, std::numeric_limits<uint64_t>::max());
-    fill_2d_array(violation_status->acc_gfx_clk_below_host_limit_thm, std::numeric_limits<uint64_t>::max());
-    fill_2d_array(violation_status->acc_low_utilization, std::numeric_limits<uint64_t>::max());
-    fill_2d_array(violation_status->acc_gfx_clk_below_host_limit_total, std::numeric_limits<uint64_t>::max());
+    fill_2d_array(violation_status->acc_gfx_clk_below_host_limit_pwr,
+        std::numeric_limits<uint64_t>::max());
+    fill_2d_array(violation_status->acc_gfx_clk_below_host_limit_thrm,
+        std::numeric_limits<uint64_t>::max());
+    fill_2d_array(violation_status->acc_low_utilization,
+        std::numeric_limits<uint64_t>::max());
+    fill_2d_array(violation_status->acc_gfx_clk_below_host_limit_total,
+        std::numeric_limits<uint64_t>::max());

-    fill_2d_array(violation_status->per_gfx_clk_below_host_limit_pwr, std::numeric_limits<uint64_t>::max());
-    fill_2d_array(violation_status->per_gfx_clk_below_host_limit_thm, std::numeric_limits<uint64_t>::max());
-    fill_2d_array(violation_status->per_low_utilization, std::numeric_limits<uint64_t>::max());
-    fill_2d_array(violation_status->per_gfx_clk_below_host_limit_total, std::numeric_limits<uint64_t>::max());
+    fill_2d_array(violation_status->per_gfx_clk_below_host_limit_pwr,
+        std::numeric_limits<uint64_t>::max());
+    fill_2d_array(violation_status->per_gfx_clk_below_host_limit_thrm,
+        std::numeric_limits<uint64_t>::max());
+    fill_2d_array(violation_status->per_low_utilization,
+        std::numeric_limits<uint64_t>::max());
+    fill_2d_array(violation_status->per_gfx_clk_below_host_limit_total,
+        std::numeric_limits<uint64_t>::max());

-    fill_2d_array(violation_status->active_gfx_clk_below_host_limit_pwr, std::numeric_limits<uint8_t>::max());
-    fill_2d_array(violation_status->active_gfx_clk_below_host_limit_thm, std::numeric_limits<uint8_t>::max());
-    fill_2d_array(violation_status->active_low_utilization, std::numeric_limits<uint8_t>::max());
-    fill_2d_array(violation_status->active_gfx_clk_below_host_limit_total, std::numeric_limits<uint8_t>::max());
+    fill_2d_array(violation_status->active_gfx_clk_below_host_limit_pwr,
+        std::numeric_limits<uint8_t>::max());
+    fill_2d_array(violation_status->active_gfx_clk_below_host_limit_thrm,
+        std::numeric_limits<uint8_t>::max());
+    fill_2d_array(violation_status->active_low_utilization,
+        std::numeric_limits<uint8_t>::max());
+    fill_2d_array(violation_status->active_gfx_clk_below_host_limit_total,
+        std::numeric_limits<uint8_t>::max());

    const auto p1 = std::chrono::system_clock::now();
    auto current_time = std::chrono::duration_cast<std::chrono::microseconds>(
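`fill_2d_array()` pre-sets every [XCP][XCC] element to the type's maximum value, the "unsupported" sentinel, before any real data is copied in. Its definition is not part of this hunk; one plausible template form, written here as an assumption under a hypothetical name, deduces both dimensions from the array type:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <limits>

// Hypothetical fill_2d_array-style helper: sets every element of a
// fixed-size 2D array to `value`. Rows and Cols are deduced from the
// array type, so callers never pass sizes explicitly.
template <typename T, std::size_t Rows, std::size_t Cols>
void fill_2d_array_sketch(T (&arr)[Rows][Cols], T value) {
    for (std::size_t r = 0; r < Rows; ++r) {
        std::fill(std::begin(arr[r]), std::end(arr[r]), value);
    }
}
```

Because `T` is deduced from both arguments, passing `std::numeric_limits<uint64_t>::max()` for a `uint64_t` array (or the `uint8_t` limit for the `active_*` arrays) keeps the deduction unambiguous; a mismatched sentinel type would fail to compile instead of silently truncating.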
@@ -1081,14 +1093,14 @@ amdsmi_status_t amdsmi_get_violation_status(amdsmi_processor_handle processor_ha
    }

    // default to 0xffffffff as not supported
-    uint32_t partitition_id = std::numeric_limits<uint32_t>::max();
+    uint32_t partition_id = std::numeric_limits<uint32_t>::max();
    auto tmp_partition_id = uint32_t(0);
    amdsmi_status_t status = rsmi_wrapper(rsmi_dev_partition_id_get, processor_handle, 0,
                                          &(tmp_partition_id));
    // Do not return early if this value fails
    // continue to try getting all info
    if (status == AMDSMI_STATUS_SUCCESS) {
-        partitition_id = tmp_partition_id;
+        partition_id = tmp_partition_id;
    }

    amdsmi_gpu_metrics_t metric_info_a = {};
@@ -1102,15 +1114,28 @@ amdsmi_status_t amdsmi_get_violation_status(amdsmi_processor_handle processor_ha
        return status;
    }

-    // if all of these values are "undefined" then the feature is not supported on the ASIC
+    // Note: Both the XCP index and partition_id default to 0 if the gpu_metrics file is not present.
+    //       This is why checking element kFIRST_ELEMENT (0) is sufficient for both XCP and partition_id.
+    const uint32_t kFIRST_ELEMENT = 0;
+
+    // Check if violation status is supported:
+    // If all of these values are "undefined" then the feature is not supported on the ASIC
    if (metric_info_a.accumulation_counter == std::numeric_limits<uint64_t>::max()
        && metric_info_a.prochot_residency_acc == std::numeric_limits<uint64_t>::max()
        && metric_info_a.ppt_residency_acc == std::numeric_limits<uint64_t>::max()
        && metric_info_a.socket_thm_residency_acc == std::numeric_limits<uint64_t>::max()
        && metric_info_a.vr_thm_residency_acc == std::numeric_limits<uint64_t>::max()
        && metric_info_a.hbm_thm_residency_acc == std::numeric_limits<uint64_t>::max()
-        && (metric_info_a.xcp_stats->gfx_below_host_limit_acc[partitition_id]
-        == std::numeric_limits<uint64_t>::max())) {
+        && metric_info_a.xcp_stats[kFIRST_ELEMENT].gfx_below_host_limit_acc[kFIRST_ELEMENT]
+        == std::numeric_limits<uint64_t>::max()
+        && metric_info_a.xcp_stats[kFIRST_ELEMENT].gfx_below_host_limit_ppt_acc[kFIRST_ELEMENT]
+        == std::numeric_limits<uint64_t>::max()
+        && metric_info_a.xcp_stats[kFIRST_ELEMENT].gfx_below_host_limit_thm_acc[kFIRST_ELEMENT]
+        == std::numeric_limits<uint64_t>::max()
+        && metric_info_a.xcp_stats[kFIRST_ELEMENT].gfx_low_utilization_acc[kFIRST_ELEMENT]
+        == std::numeric_limits<uint64_t>::max()
+        && metric_info_a.xcp_stats[kFIRST_ELEMENT].gfx_below_host_limit_total_acc[kFIRST_ELEMENT]
+        == std::numeric_limits<uint64_t>::max()) {
        ss << __PRETTY_FUNCTION__
           << " | ASIC does not support throttle violations, "
           << "returning AMDSMI_STATUS_NOT_SUPPORTED";
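The check above reports `AMDSMI_STATUS_NOT_SUPPORTED` only when every listed accumulator equals the max-uint64 sentinel; a single valid value is enough to proceed. Since the XCP index and partition_id both default to 0, only element `[0][0]` of each XCP accumulator is inspected. The rule reduces to an any-of test, sketched below with a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <limits>

// Sketch of the "all undefined => not supported" rule: violation status is
// treated as supported if at least one accumulator is not the max-uint64
// sentinel. A caller would pass the top-level residency accumulators plus
// the XCP [0][0] entries, mirroring the condition above.
bool violation_status_supported(std::initializer_list<uint64_t> accumulators) {
    constexpr uint64_t kUnsupported = std::numeric_limits<uint64_t>::max();
    return std::any_of(accumulators.begin(), accumulators.end(),
                       [](uint64_t v) { return v != kUnsupported; });
}
```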
@@ -1136,8 +1161,26 @@ amdsmi_status_t amdsmi_get_violation_status(amdsmi_processor_handle processor_ha
    violation_status->acc_socket_thrm = metric_info_b.socket_thm_residency_acc;
    violation_status->acc_vr_thrm = metric_info_b.vr_thm_residency_acc;
    violation_status->acc_hbm_thrm = metric_info_b.hbm_thm_residency_acc;
-    violation_status->acc_gfx_clk_below_host_limit //deprecated
-        = metric_info_b.xcp_stats->gfx_below_host_limit_acc[partitition_id];
+    violation_status->acc_gfx_clk_below_host_limit  // deprecated
+        = metric_info_b.xcp_stats[partition_id].gfx_below_host_limit_acc[kFIRST_ELEMENT];
+
+    // Copy XCP accumulators into 2D array
+    auto copy_xcp_metric = [](const auto& src, auto& dst, auto member_ptr) {
+        for (size_t i = 0; i < AMDSMI_MAX_NUM_XCP; ++i) {
+            std::copy(
+                std::begin(src[i].*member_ptr),
+                std::end(src[i].*member_ptr),
+                dst[i]);
+        }
+    };
+    copy_xcp_metric(metric_info_b.xcp_stats, violation_status->acc_gfx_clk_below_host_limit_pwr,
+                    &amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_ppt_acc);
+    copy_xcp_metric(metric_info_b.xcp_stats, violation_status->acc_gfx_clk_below_host_limit_thrm,
+                    &amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_thm_acc);
+    copy_xcp_metric(metric_info_b.xcp_stats, violation_status->acc_low_utilization,
+                    &amdsmi_gpu_xcp_metrics_t::gfx_low_utilization_acc);
+    copy_xcp_metric(metric_info_b.xcp_stats, violation_status->acc_gfx_clk_below_host_limit_total,
+                    &amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_total_acc);

    ss << __PRETTY_FUNCTION__ << " | "
       << "[gpu_metrics A] metric_info_a.accumulation_counter: " << std::dec
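`copy_xcp_metric` takes a pointer to data member (for example `&amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_ppt_acc`) so a single lambda can copy any of the four per-XCC accumulators into the matching [XCP][XCC] output array. A self-contained sketch of the same pattern; `XcpStatsSketch`, `copy_xcp_metric_sketch`, and the small 2 x 3 dimensions are hypothetical stand-ins for the real struct, lambda, and `AMDSMI_MAX_NUM_*` constants:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>

constexpr std::size_t kXcp = 2;  // stand-in for AMDSMI_MAX_NUM_XCP
constexpr std::size_t kXcc = 3;  // stand-in for AMDSMI_MAX_NUM_XCC

// Hypothetical stand-in for amdsmi_gpu_xcp_metrics_t.
struct XcpStatsSketch {
    uint64_t ppt_acc[kXcc];
    uint64_t thm_acc[kXcc];
};

// Copy the member array selected by member_ptr from every XCP entry of src
// into the corresponding row of dst.
template <typename MemberPtr>
void copy_xcp_metric_sketch(const XcpStatsSketch (&src)[kXcp],
                            uint64_t (&dst)[kXcp][kXcc], MemberPtr member_ptr) {
    for (std::size_t i = 0; i < kXcp; ++i) {
        std::copy(std::begin(src[i].*member_ptr), std::end(src[i].*member_ptr), dst[i]);
    }
}
```

Compared with the removed iterator-pair version, indexing by XCP makes the row correspondence between source and destination explicit.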
@@ -1152,8 +1195,9 @@ amdsmi_status_t amdsmi_get_violation_status(amdsmi_processor_handle processor_ha
       << metric_info_a.vr_thm_residency_acc << "\n"
       << "; metric_info_a.hbm_thm_residency_acc: " << std::dec
       << metric_info_a.hbm_thm_residency_acc << "\n"
-       << "; metric_info_b.xcp_stats->gfx_below_host_limit_acc[" << partitition_id << "]: "
-       << std::dec << metric_info_a.xcp_stats->gfx_below_host_limit_acc[partitition_id] << "\n"
+       << "; metric_info_a.xcp_stats[" << partition_id << "].gfx_below_host_limit_acc["
+       << kFIRST_ELEMENT << "]: " << std::dec  // deprecated
+       << metric_info_a.xcp_stats[partition_id].gfx_below_host_limit_acc[kFIRST_ELEMENT] << "\n"
       << " [gpu_metrics B] metric_info_b.accumulation_counter: " << std::dec
       << metric_info_b.accumulation_counter << "\n"
       << "; metric_info_b.prochot_residency_acc: " << std::dec
@@ -1166,46 +1210,11 @@ amdsmi_status_t amdsmi_get_violation_status(amdsmi_processor_handle processor_ha
       << metric_info_b.vr_thm_residency_acc << "\n"
       << "; metric_info_b.hbm_thm_residency_acc: " << std::dec
       << metric_info_b.hbm_thm_residency_acc << "\n"
-       << "; metric_info_b.xcp_stats->gfx_below_host_limit_acc[" << partitition_id << "]: " //deprecated
-       << std::dec << metric_info_b.xcp_stats->gfx_below_host_limit_acc[partitition_id] << "\n";
+       << "; metric_info_b.xcp_stats[" << partition_id << "].gfx_below_host_limit_acc["
+       << kFIRST_ELEMENT << "]: " << std::dec  // deprecated
+       << metric_info_b.xcp_stats[partition_id].gfx_below_host_limit_acc[kFIRST_ELEMENT] << "\n";
    LOG_DEBUG(ss);

-    auto copy_gfx_acc = [](auto priv_it, auto priv_end, auto pub_it, auto gfx_acc_ptr) {
-        for (; priv_it != priv_end; ++priv_it, ++pub_it) {
-            std::copy(std::begin((*priv_it).*gfx_acc_ptr),
-                      std::end((*priv_it).*gfx_acc_ptr),
-                      std::begin(*pub_it));
-        }
-    };
-
-    copy_gfx_acc(
-        std::begin(metric_info_b.xcp_stats),
-        std::end(metric_info_b.xcp_stats),
-        std::begin(violation_status->acc_gfx_clk_below_host_limit_pwr),
-        &amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_ppt_acc
-    );
-
-    copy_gfx_acc(
-        std::begin(metric_info_b.xcp_stats),
-        std::end(metric_info_b.xcp_stats),
-        std::begin(violation_status->acc_gfx_clk_below_host_limit_thm),
-        &amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_thm_acc
-    );
-
-    copy_gfx_acc(
-        std::begin(metric_info_b.xcp_stats),
-        std::end(metric_info_b.xcp_stats),
-        std::begin(violation_status->acc_low_utilization),
-        &amdsmi_gpu_xcp_metrics_t::gfx_low_utilization_acc
-    );
-
-    copy_gfx_acc(
-        std::begin(metric_info_b.xcp_stats),
-        std::end(metric_info_b.xcp_stats),
-        std::begin(violation_status->acc_gfx_clk_below_host_limit_total),
-        &amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_total_acc
-    );
-
    if ( (metric_info_b.prochot_residency_acc != std::numeric_limits<uint64_t>::max()
        || metric_info_a.prochot_residency_acc != std::numeric_limits<uint64_t>::max())
        && (metric_info_b.prochot_residency_acc >= metric_info_a.prochot_residency_acc)
@@ -1309,15 +1318,19 @@ amdsmi_status_t amdsmi_get_violation_status(amdsmi_processor_handle processor_ha
           << violation_status->active_hbm_thrm << "\n";
        LOG_DEBUG(ss);
    }
-    /* //deprecated
-    if ((metric_info_b.xcp_stats->gfx_below_host_limit_acc[partitition_id] != std::numeric_limits<uint64_t>::max() ||
-         metric_info_a.xcp_stats->gfx_below_host_limit_acc[partitition_id] != std::numeric_limits<uint64_t>::max()) &&
-        (metric_info_b.xcp_stats->gfx_below_host_limit_acc[partitition_id] >= metric_info_a.xcp_stats->gfx_below_host_limit_acc[partitition_id]) &&
+    // deprecated - design likely needs to include both [XCP][XCC], like the new metrics
+    if ((metric_info_b.xcp_stats[partition_id].gfx_below_host_limit_acc[kFIRST_ELEMENT]
+        != std::numeric_limits<uint64_t>::max() ||
+         metric_info_a.xcp_stats[partition_id].gfx_below_host_limit_acc[kFIRST_ELEMENT]
+         != std::numeric_limits<uint64_t>::max()) &&
+        (metric_info_b.xcp_stats[partition_id].gfx_below_host_limit_acc[kFIRST_ELEMENT]
+            >= metric_info_a.xcp_stats[partition_id].gfx_below_host_limit_acc[kFIRST_ELEMENT]) &&
        ((metric_info_b.accumulation_counter - metric_info_a.accumulation_counter) > 0)) {
        violation_status->per_gfx_clk_below_host_limit =
-            (((metric_info_b.xcp_stats->gfx_below_host_limit_acc[partitition_id] -
-                metric_info_a.xcp_stats->gfx_below_host_limit_acc[partitition_id]) * 100) /
-            (metric_info_b.accumulation_counter - metric_info_a.accumulation_counter));
+            (((metric_info_b.xcp_stats[partition_id].gfx_below_host_limit_acc[kFIRST_ELEMENT] -
+                metric_info_a.xcp_stats[partition_id].gfx_below_host_limit_acc[kFIRST_ELEMENT])
+                * 100) /
+                (metric_info_b.accumulation_counter - metric_info_a.accumulation_counter));

        if (violation_status->per_gfx_clk_below_host_limit > 0) {
            violation_status->active_gfx_clk_below_host_limit = 1;
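Each violation percentage is derived from two gpu_metrics snapshots, A then B: the growth of a residency accumulator is divided by the growth of `accumulation_counter` over the same interval, `per = (acc_B - acc_A) * 100 / (counter_B - counter_A)`, and the violation is marked active when the result is greater than 0. A sketch of that arithmetic as a standalone function; `compute_violation` and `ViolationSample` are hypothetical, and the validity test follows the stricter checks of the new per-XCP loop (both snapshots must be valid, the accumulator must not decrease), additionally treating a non-increasing `accumulation_counter` as invalid, rather than the `||` condition in the deprecated block above:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

struct ViolationSample {
    uint64_t percent;  // max uint64 => not available
    uint8_t active;    // 1 = active, 0 = not active, max uint8 => not available
};

// Percentage of the sampling window spent in violation, computed from two
// snapshots (A taken before B) of one accumulator and accumulation_counter.
ViolationSample compute_violation(uint64_t acc_a, uint64_t acc_b,
                                  uint64_t counter_a, uint64_t counter_b) {
    constexpr uint64_t kNA64 = std::numeric_limits<uint64_t>::max();
    constexpr uint8_t kNA8 = std::numeric_limits<uint8_t>::max();
    if (acc_a == kNA64 || acc_b == kNA64 || acc_b < acc_a || counter_b <= counter_a) {
        return {kNA64, kNA8};
    }
    const uint64_t percent = ((acc_b - acc_a) * 100) / (counter_b - counter_a);
    return {percent, static_cast<uint8_t>(percent > 0 ? 1 : 0)};
}
```

For example, an accumulator that grows from 100 to 150 while `accumulation_counter` grows from 1000 to 1100 yields 50 * 100 / 100 = 50%, so the violation is active.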
@@ -1327,68 +1340,95 @@ amdsmi_status_t amdsmi_get_violation_status(amdsmi_processor_handle processor_ha
        ss << __PRETTY_FUNCTION__ << " | "
           << "ENTERED gfx_below_host_limit_acc | per_gfx_clk_below_host_limit: " << std::dec
           << violation_status->per_gfx_clk_below_host_limit
-           << "%; active_ppt_pwr = " << std::dec
+           << "%; active_gfx_clk_below_host_limit = " << std::boolalpha
           << static_cast<bool>(violation_status->active_gfx_clk_below_host_limit) << "\n";
        LOG_DEBUG(ss);
    }
-    */
-    uint64_t counter_delta = metric_info_b.accumulation_counter - metric_info_a.accumulation_counter;
-    auto calc_viol_actv_percent = [](auto priv_it1, auto end1, auto priv_it2, auto pub_it, auto act_it, auto viol_ptr, uint64_t counter_delta) {
-        for (; priv_it1 != end1; ++priv_it1, ++priv_it2, ++pub_it, ++act_it) {
-            auto& priv_it_arr2 = (*priv_it2).*viol_ptr;
-            auto& priv_it_arr1 = (*priv_it1).*viol_ptr;
-            for (size_t i = 0; i < AMDSMI_MAX_NUM_XCC; ++i) {
-                uint64_t value2 = priv_it_arr2[i];
-                uint64_t value1 = priv_it_arr1[i];
-                if ((value2 != std::numeric_limits<uint64_t>::max() ||
-                     value1 != std::numeric_limits<uint64_t>::max()) &&
-                    (value2 > value1) && (counter_delta > 0)) {
-                    (*pub_it)[i] = ((value2 - value1) * 100) / counter_delta;
-                    (*act_it)[i] = (((*pub_it)[i]) > 0) ? 1 : 0;
+
+    // one-shot processing of all XCP violation metrics
+    // using a lambda function to avoid code duplication
+    using MetricArrayType = uint64_t[AMDSMI_MAX_NUM_XCC];
+    using MetricMemberPtr = MetricArrayType amdsmi_gpu_xcp_metrics_t::*;
+
+    auto process_all_XCP_violation_metrics = [&](
+        const std::vector<std::pair<std::string, MetricMemberPtr>>& metric_members,
+        std::vector<std::reference_wrapper<
+            uint64_t[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC]>> per_arrays,
+        std::vector<std::reference_wrapper<
+            uint8_t[AMDSMI_MAX_NUM_XCP][AMDSMI_MAX_NUM_XCC]>> active_arrays) {
+        uint64_t counter_delta = static_cast<uint64_t>(metric_info_b.accumulation_counter)
+                            - static_cast<uint64_t>(metric_info_a.accumulation_counter);
+
+        ss << __PRETTY_FUNCTION__ << " | Processing all XCP metrics with counter_delta: "
+           << std::dec << counter_delta << "\n";
+        LOG_DEBUG(ss);
+
+        for (size_t metric_idx = 0; metric_idx < metric_members.size(); ++metric_idx) {
+            const auto& member_pair = metric_members[metric_idx];
+            const std::string& member_name = member_pair.first;
+            MetricMemberPtr member_ptr = member_pair.second;
+
+            auto& per_arr = per_arrays[metric_idx].get();
+            auto& active_arr = active_arrays[metric_idx].get();
+
+            ss << "  [Metric] " << member_name << "\n";
+            for (uint32_t xcp = 0; xcp < AMDSMI_MAX_NUM_XCP; ++xcp) {
+                const MetricArrayType& arr_a = metric_info_a.xcp_stats[xcp].*member_ptr;
+                const MetricArrayType& arr_b = metric_info_b.xcp_stats[xcp].*member_ptr;
+                ss << "    xcp: " << xcp << " (";
+                for (uint32_t xcc = 0; xcc < AMDSMI_MAX_NUM_XCC; ++xcc) {
+                    uint64_t val_a = arr_a[xcc];
+                    uint64_t val_b = arr_b[xcc];
+
+                    if (val_b == std::numeric_limits<uint64_t>::max() ||
+                        val_a == std::numeric_limits<uint64_t>::max() ||
+                        counter_delta == 0 ||
+                        val_b < val_a) {
+                        per_arr[xcp][xcc] = std::numeric_limits<uint64_t>::max();
+                        active_arr[xcp][xcc] = std::numeric_limits<uint8_t>::max();
+                        ss << "[Invalid] (" << std::dec << per_arr[xcp][xcc]
+                           << ", " << static_cast<int>(active_arr[xcp][xcc]) << ") ";
+                        continue;
+                    }
+
+                    uint64_t percent = ((val_b - val_a) * 100) / counter_delta;
+                    per_arr[xcp][xcc] = percent;
+                    active_arr[xcp][xcc] = (percent > 0) ? 1 : 0;
+                    ss << "[Valid] (" << std::dec << percent << "%, "
+                       << std::boolalpha << static_cast<bool>(active_arr[xcp][xcc])
+                       << ") | val_b: " << std::dec << val_b
+                       << ", val_a: " << std::dec << val_a
+                       << ", counter_delta: " << std::dec << counter_delta << " ";
                }
+                ss << ")\n";
            }
        }
+        LOG_DEBUG(ss);
    };

-    calc_viol_actv_percent(
-        std::begin(metric_info_a.xcp_stats),
-        std::end(metric_info_a.xcp_stats),
-        std::begin(metric_info_b.xcp_stats),
-        std::begin(violation_status->per_gfx_clk_below_host_limit_pwr),
-        std::begin(violation_status->active_gfx_clk_below_host_limit_pwr),
-        &amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_ppt_acc,
-        counter_delta
-    );
+    // Prepare metric members and arrays for processing
+    const std::vector<std::pair<std::string, MetricMemberPtr>> metric_members = {
+        {"gfx_below_host_limit_ppt_acc", &amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_ppt_acc},
+        {"gfx_below_host_limit_thm_acc", &amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_thm_acc},
+        {"gfx_low_utilization_acc", &amdsmi_gpu_xcp_metrics_t::gfx_low_utilization_acc},
+        {"gfx_below_host_limit_total_acc",
+            &amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_total_acc}
+    };

-    calc_viol_actv_percent(
-        std::begin(metric_info_a.xcp_stats),
-        std::end(metric_info_a.xcp_stats),
-        std::begin(metric_info_b.xcp_stats),
-        std::begin(violation_status->per_gfx_clk_below_host_limit_thm),
-        std::begin(violation_status->active_gfx_clk_below_host_limit_thm),
-        &amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_thm_acc,
-        counter_delta
-    );
-
-    calc_viol_actv_percent(
-        std::begin(metric_info_a.xcp_stats),
-        std::end(metric_info_a.xcp_stats),
-        std::begin(metric_info_b.xcp_stats),
-        std::begin(violation_status->per_low_utilization),
-        std::begin(violation_status->active_low_utilization),
-        &amdsmi_gpu_xcp_metrics_t::gfx_low_utilization_acc,
-        counter_delta
-    );
-
-    calc_viol_actv_percent(
-        std::begin(metric_info_a.xcp_stats),
-        std::end(metric_info_a.xcp_stats),
-        std::begin(metric_info_b.xcp_stats),
-        std::begin(violation_status->per_gfx_clk_below_host_limit_total),
-        std::begin(violation_status->active_gfx_clk_below_host_limit_total),
-        &amdsmi_gpu_xcp_metrics_t::gfx_below_host_limit_total_acc,
-        counter_delta
-    );
+    process_all_XCP_violation_metrics(
+        metric_members,
+        {
+            std::ref(violation_status->per_gfx_clk_below_host_limit_pwr),
+            std::ref(violation_status->per_gfx_clk_below_host_limit_thrm),
+            std::ref(violation_status->per_low_utilization),
+            std::ref(violation_status->per_gfx_clk_below_host_limit_total)
+        },
+        {
+            std::ref(violation_status->active_gfx_clk_below_host_limit_pwr),
+            std::ref(violation_status->active_gfx_clk_below_host_limit_thrm),
+            std::ref(violation_status->active_low_utilization),
+            std::ref(violation_status->active_gfx_clk_below_host_limit_total)
+        });

    ss << __PRETTY_FUNCTION__ << " | "
       << "RETURNING AMDSMI_STATUS_SUCCESS | "
@@ -1406,20 +1446,20 @@ amdsmi_status_t amdsmi_get_violation_status(amdsmi_processor_handle processor_ha
       << violation_status->per_vr_thrm
       << "; violation_status->per_hbm_thrm (%): " << std::dec
       << violation_status->per_hbm_thrm
-       << "; violation_status->per_gfx_clk_below_host_limit (%): " << std::dec //deprecated
+       << "; violation_status->per_gfx_clk_below_host_limit (%): " << std::dec  // deprecated
       << violation_status->per_gfx_clk_below_host_limit
-       << "; violation_status->active_prochot_thrm (bool): " << std::dec
+       << "; violation_status->active_prochot_thrm (bool): " << std::boolalpha
       << static_cast<int>(violation_status->active_prochot_thrm)
-       << "; violation_status->active_ppt_pwr (bool): " << std::dec
+       << "; violation_status->active_ppt_pwr (bool): " << std::boolalpha
       << static_cast<int>(violation_status->active_ppt_pwr)
-       << "; violation_status->active_socket_thrm (bool): " << std::dec
+       << "; violation_status->active_socket_thrm (bool): " << std::boolalpha
       << static_cast<int>(violation_status->active_socket_thrm)
-       << "; violation_status->active_vr_thrm (bool): " << std::dec
+       << "; violation_status->active_vr_thrm (bool): " << std::boolalpha
       << static_cast<int>(violation_status->active_vr_thrm)
-       << "; violation_status->active_hbm_thrm (bool): " << std::dec
+       << "; violation_status->active_hbm_thrm (bool): " << std::boolalpha
       << static_cast<int>(violation_status->active_hbm_thrm)
-       << "; violation_status->active_gfx_clk_below_host_limit (bool): " << std::dec //deprecated
-       << static_cast<int>(violation_status->active_gfx_clk_below_host_limit)
+       << "; violation_status->active_gfx_clk_below_host_limit (bool): "  // deprecated
+       << std::boolalpha << static_cast<int>(violation_status->active_gfx_clk_below_host_limit)
       << "\n";
    LOG_INFO(ss);
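The per-XCP/XCC calculation performed by `process_all_XCP_violation_metrics` can be isolated into a small standalone routine. This is a minimal sketch, not the amdsmi implementation: `Sample`, `XcpStats`, `compute_violation`, and the 2 x 4 dimensions are hypothetical stand-ins for `amdsmi_gpu_xcp_metrics_t` and `AMDSMI_MAX_NUM_XCP`/`AMDSMI_MAX_NUM_XCC`, and treating a backwards-moving `accumulation_counter` as zero elapsed ticks is an assumption. The percentage is `(acc_b - acc_a) * 100 / (counter_b - counter_a)`; any `UINT64_MAX` input, a zero counter delta, or a decreasing accumulator yields the `UINT64_MAX`/`UINT8_MAX` sentinels, mirroring the `[Invalid]` branch in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical, simplified stand-ins for the amdsmi metric structures.
// The real amdsmi_gpu_xcp_metrics_t has more fields and is sized by
// AMDSMI_MAX_NUM_XCP / AMDSMI_MAX_NUM_XCC rather than these constants.
constexpr std::size_t kNumXcp = 2;
constexpr std::size_t kNumXcc = 4;
constexpr uint64_t kInvalid64 = std::numeric_limits<uint64_t>::max();
constexpr uint8_t kInvalid8 = std::numeric_limits<uint8_t>::max();

struct XcpStats {
    uint64_t gfx_below_host_limit_ppt_acc[kNumXcc];
    uint64_t gfx_low_utilization_acc[kNumXcc];
};

struct Sample {
    uint64_t accumulation_counter;
    XcpStats xcp_stats[kNumXcp];
};

using AccArray = uint64_t[kNumXcc];
using AccMember = AccArray XcpStats::*;  // selects one accumulator array

// Computes, for each XCP/XCC, the percentage of elapsed accumulation ticks
// (b - a) during which the selected violation accumulator advanced.
// Unavailable (UINT64_MAX) inputs, a zero tick delta, or a decreasing
// accumulator produce the UINT64_MAX / UINT8_MAX sentinels.
void compute_violation(const Sample& a, const Sample& b, AccMember member,
                       uint64_t (&percent)[kNumXcp][kNumXcc],
                       uint8_t (&active)[kNumXcp][kNumXcc]) {
    // Assumption: a counter that moved backwards (e.g. after a reset) is
    // treated as zero elapsed ticks instead of wrapping around.
    const uint64_t counter_delta =
        (b.accumulation_counter > a.accumulation_counter)
            ? b.accumulation_counter - a.accumulation_counter
            : 0;
    for (std::size_t xcp = 0; xcp < kNumXcp; ++xcp) {
        const AccArray& acc_a = a.xcp_stats[xcp].*member;
        const AccArray& acc_b = b.xcp_stats[xcp].*member;
        for (std::size_t xcc = 0; xcc < kNumXcc; ++xcc) {
            const uint64_t va = acc_a[xcc];
            const uint64_t vb = acc_b[xcc];
            if (va == kInvalid64 || vb == kInvalid64 || counter_delta == 0 || vb < va) {
                percent[xcp][xcc] = kInvalid64;
                active[xcp][xcc] = kInvalid8;
                continue;
            }
            percent[xcp][xcc] = ((vb - va) * 100) / counter_delta;
            active[xcp][xcc] = (percent[xcp][xcc] > 0) ? 1 : 0;
        }
    }
}
```

Passing a pointer-to-member such as `&XcpStats::gfx_below_host_limit_ppt_acc` is what lets one routine serve every violation metric, which is the same reason the commit replaces the four `calc_viol_actv_percent` calls with a single table-driven lambda.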