[SWDEV-535159] Add support for GPU partition metrics (#490)

[SWDEV-535159] Add support for GPU partition metrics Changes include: - Internal logic to smart-switch between gpu_metrics/xcp_metrics files - [WIP] Initial plumbing for new partition metric API Change-Id: I4340fb1b48bac0117d80d5d486b9e871430d5cd8 Signed-off-by: Charis Poag <Charis.Poag@amd.com> Add amdsmi_get_gpu_partition_metrics_info() + minor cleanup Change-Id: I5d60604f18baddbd03852dc90e88aa0b8107d50e Signed-off-by: Charis Poag <Charis.Poag@amd.com> Fix partition metric logic + update logging/tests Change-Id: I9e89b19ead17694c54e224f8e13ff8ee3eb2e22a Signed-off-by: Charis Poag <Charis.Poag@amd.com> Adjust amd-smi metric/monitor/default to show (some) partition information Change-Id: I2e8d2745876a19bdaec3c039daa97345c9f701b5 Signed-off-by: Charis Poag <Charis.Poag@amd.com> Add C++ tests Change-Id: Ib9eb0b57a6d7a280992e05a4c6eba632826952ef Signed-off-by: Charis Poag <Charis.Poag@amd.com> Remove modification of energy counter, not needed Change-Id: I5c48eaaae248ee6dc79abba609d837ec35d78022 Signed-off-by: Charis Poag <Charis.Poag@amd.com> [CLI] amd-smi metric: cleaned up N/A'd multi-valued to show just N/A Changes: 1. amd-smi metric: cleaned up N/A'd multi-valued to show just N/A ex. JPEG_ACTIVITY: [N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A] Now just shows: N/A 2. [Python Unit Test] Changed testname TestAmdSmiPythonBDF(unittest.TestCase) -> AmdSmiPythonUnitTest Test name was confusing. Change-Id: Ieb3b036f30002fd22362508eb9fc5d443df395ae Signed-off-by: Charis Poag <Charis.Poag@amd.com> Log cleanup Change-Id: I1b1a95f1844d35bec7a7bd8cb996f87e4914c069 Signed-off-by: Charis Poag <Charis.Poag@amd.com> Add amd-smi partition-metrics CLI + general cleanup Change-Id: Ia91488e6cb3a4d62b4087afbddfe0b3bb9378fdc Signed-off-by: Charis Poag <Charis.Poag@amd.com> [1.3 metrics] Remove forwards compatibility for partition metrics Change-Id: Iab928983e6f6f1587bc9307f6f3fa2b2696ca6f7 Signed-off-by: Charis Poag <Charis.Poag@amd.com> Fixed violation output not showing % + general cleanup Change-Id: Icac1b0a55b18c7628b07109ae0c377d17e0825f1 Signed-off-by: Charis Poag <Charis.Poag@amd.com> Clean up amdsmi_get_gpu_partition_metrics_info & amd-smi partition-metric outputs Change-Id: I6427028b980874641e9ffb3b5d88ad493dbf9cf4 Signed-off-by: Charis Poag <Charis.Poag@amd.com> * Fix metrics not found + extra logging/formatting Change-Id: I841a27bb2c305e97ec7579a13ac915e5be497c3a Signed-off-by: Charis Poag <Charis.Poag@amd.com> * Update license to current default Change-Id: I0de9b8a2d5dbbeab4491097f0354ba17b0d30866 Signed-off-by: Charis Poag <Charis.Poag@amd.com> * Cleanup for review Change-Id: I96ed25c3f2b8968eea1af24c5e5860c2b4e74e6e Signed-off-by: Charis Poag <Charis.Poag@amd.com> * Moderize updated/new interal APIs. Change-Id: I3c48a250eeb703709b14cb5ffa68268d8321626c Signed-off-by: Charis Poag <Charis.Poag@amd.com> * Remove extra logging in dynamic metrics Change-Id: Idb97547bcbe143d6fa1cb5cb278ffe4da615ce14 Signed-off-by: Charis Poag <Charis.Poag@amd.com> * Remove amd-smi partition-metric command Change-Id: Ib83c17e5cd7e0da3798198943bddd46c296b411c Signed-off-by: Charis Poag <Charis.Poag@amd.com> * Move new CLI updates to another PR + minor fixes Change-Id: I3b1163eec12f9b5f7d95ee33de08e168cec1b1fe Signed-off-by: Charis Poag <Charis.Poag@amd.com> * Allow dynamic metrics to work for gpu/xcp metrics 1.9+/1.1+ Updated some logging as well. Change-Id: I2ed9f5a5ef8afb1520508820ca6153525f0644b4 Signed-off-by: Charis Poag <Charis.Poag@amd.com> * Allow dyn gpu/xcp metric v1.9+/v1.1+ Added tests for quick check Change-Id: I576d6f6582a55afb08e5ac57791ce95e2fa184a2 Signed-off-by: Charis Poag <Charis.Poag@amd.com> * Update tests for larger subset of version checks Change-Id: I3cdf4f8bb4fc6161f4c76566939f90545d0f362a Signed-off-by: Charis Poag <Charis.Poag@amd.com> * Fix XCP metrics in gpu/partition metric pre-v1.9/v1.1 (dynamic) Change-Id: I4dabc1ed6bef6b86c8e7f92bf9cb5992f3966fe2 Signed-off-by: Charis Poag <Charis.Poag@amd.com> --------- Signed-off-by: Charis Poag <Charis.Poag@amd.com> [ROCm/amdsmi commit: 01b4fe6614]
2025-10-20 14:43:40 -05:00
@@ -1632,6 +1632,7 @@ class AMDSMICommands():
        # Add timestamp and store values for specified arguments
        values_dict = {}

+        is_partition_metrics = False  # True if we get the metrics from xcp_metrics file (amdsmi_get_gpu_partition_metrics_info)
        #get metric info only once per gpu, this will speed up data output
        try:
            # Get GPU Metrics table
@@ -1640,19 +1641,10 @@ class AMDSMICommands():
            logging.debug("#3 - Unable to load GPU Metrics table for %s | %s", gpu_id, e.get_error_info())
            gpu_metric = amdsmi_interface._NA_amdsmi_get_gpu_metrics_info()

-        # Workaround for XCP (partition) metrics not providing num_partition in v1.0
-        # Confirmed with driver team that we can default to 1 if num_partition is not defined.
-        # Pending partitions exist, ie. partition_id > 0. See logic below.
-        try:
-            partition_id = amdsmi_interface.amdsmi_get_gpu_kfd_info(args.gpu)['current_partition_id']
-        except amdsmi_exception.AmdSmiLibraryException as e:
-            logging.debug("Failed to get current partition id for gpu %s | %s", gpu_id, e.get_error_info())
-            partition_id = "N/A"
-
-        num_partition = gpu_metric['num_partition']
-        if num_partition == "N/A":
-            num_partition = 1  # Workaround for XCP metrics not providing num_partition in v1.0
-            logging.debug(f"num_partition is N/A and partition_id: {partition_id} (greater > 0).\nModified num_partition: {num_partition} to adjust for XCP metrics.")
+        # Workaround for XCP (partition) metrics not providing num_partition in v1.9+/v1.1+
+        # Provides original formatting for earlier metric versions
+        partition_metric_info = self.helpers._get_metric_version_and_partition_info(gpu_metric, is_partition_metrics, gpu_id, args.gpu)
+        num_partition = partition_metric_info['num_partition']

        if self.logger.is_json_format():
            values_dict['gpu'] = int(gpu_id)
@@ -2679,7 +2671,7 @@ class AMDSMICommands():
                                    value[k][index] = self.helpers.unit_format(self.logger, activity, activity_unit)
                                value[k] = '[' + ", ".join(value[k]) + ']'
                        elif value != "N/A":
-                            value = self.helpers.unit_format(self.logger, value, activity_unit)
+                            throttle_status[key] = self.helpers.unit_format(self.logger, value, activity_unit)
                    if self.logger.is_json_format():
                        if isinstance(value, (list, dict)):
                            for k, v in value.items():
@@ -3090,7 +3082,6 @@ class AMDSMICommands():
        if not self.logger.is_json_format():
            self.logger.print_output(multiple_device_enabled=multiple_devices_csv_override)

-
    def metric(self, args, multiple_devices=False, watching_output=False, gpu=None,
                usage=None, watch=None, watch_time=None, iterations=None, power=None,
                clock=None, temperature=None, ecc=None, ecc_blocks=None, pcie=None,
@@ -5710,6 +5701,7 @@ class AMDSMICommands():
            except amdsmi_exception.AmdSmiLibraryException as e:
                logging.debug("#5 - Unable to load GPU Metrics table for %s | %s", gpu_id, e.get_error_info())

+        is_partition_metrics = False  # True if we get the metrics from xcp_metrics file (amdsmi_get_gpu_partition_metrics_info)
        #get metric info only once per gpu, this will speed up data output
        try:
            # Get GPU Metrics table
@@ -5721,25 +5713,15 @@ class AMDSMICommands():
            gpu_metrics_info = amdsmi_interface._NA_amdsmi_get_gpu_metrics_info()
            logging.debug("Unable to load GPU Metrics table for %s | %s", gpu_id, e.get_error_info())

-        # Workaround for XCP (partition) metrics not providing num_partition in v1.0
-        # Confirmed with driver team that we can default to 1 if num_partition is not defined.
-        # Pending partitions exist, ie. partition_id > 0. See logic below.
-        try:
-            partition_id = amdsmi_interface.amdsmi_get_gpu_kfd_info(args.gpu)['current_partition_id']
-        except amdsmi_exception.AmdSmiLibraryException as e:
-            logging.debug("Failed to get current partition id for gpu %s | %s", gpu_id, e.get_error_info())
-            partition_id = "N/A"
+        # Workaround for XCP (partition) metrics not providing num_partition in v1.9+/v1.1+
+        # Provides original formatting for earlier metric versions
+        partition_metric_info = self.helpers._get_metric_version_and_partition_info(gpu_metrics_info, is_partition_metrics, gpu_id, args.gpu)
+        partition_id = partition_metric_info['partition_id']
+        num_partition = partition_metric_info['num_partition']

-        num_partition = gpu_metrics_info['num_partition']
-        if num_partition == "N/A":
-            num_partition = partition_id
-
-        num_xcp = num_partition  # used later for XCP metrics
+        # Update logger for XCP display (only if applicable)
        self.logger.table_header += 'XCP'.rjust(5, ' ')
-        self.logger.store_output(args.gpu, 'xcp', partition_id)  # Starting with partition_id.
-                                                                 # Outputs which have xcp details
-                                                                 # will update this value via num_xcp.
-                                                                 # This value will help map to primary device.
+        self.logger.store_output(args.gpu, 'xcp', partition_id)  # Store partition_id initially; can be updated via num_xcp

        # Store the pcie_bw values due to possible increase in bandwidth due to repeated gpu_metrics calls
        if args.pcie:
@@ -5979,7 +5961,7 @@ class AMDSMICommands():
                                                           "unit" : freq_unit}
            except (KeyError, amdsmi_exception.AmdSmiLibraryException) as e:
                monitor_values['dclock'] = "N/A"
-                logging.debug("Failed to get vclock on gpu %s | %s", gpu_id, e)
+                logging.debug("Failed to get dclock on gpu %s | %s", gpu_id, e)

            self.logger.table_header += 'DCLOCK'.rjust(10)

@@ -6322,7 +6304,7 @@ class AMDSMICommands():
                    self.logger.store_multiple_device_output()
                    current_xcp += 1
            else:
-                self.logger.store_output(args.gpu, 'xcp', num_xcp)
+                self.logger.store_output(args.gpu, 'xcp', partition_id)
                self.logger.store_output(args.gpu, 'values', monitor_values)

        # Store typical output for all commands (XCP data will be handled separately, eg. violation status)
@@ -1018,7 +1018,6 @@ class AMDSMIHelpers():
        """This function will format output with unit based on the logger output format

        params:
-            args - argparser args to pass to subcommand
            logger (AMDSMILogger) - Logger to print out output
            value - the value to be formatted
            unit - the unit to be formatted with the value
@@ -1041,6 +1040,9 @@ class AMDSMIHelpers():
                    return {"value": value, "unit": unit}
                else:
                    return value
+            if logger.is_csv_format():
+                # For CSV, return the raw value (number or "N/A"), not a string
+                return value
            if logger.is_human_readable_format():
                if unit:
                    return f"{value} {unit}".rstrip()
@@ -1745,3 +1747,70 @@ class AMDSMIHelpers():
        # Flatten nested lists and filter integers
        flat = [v for value in data for v in (value if isinstance(value, list) else [value]) if isinstance(v, int)]
        return round(sum(flat) / len(flat)) if flat else "N/A"
+
+    def _get_metric_version_and_partition_info(self, gpu_metrics_info, is_partition_metrics, gpu_id, gpu_handle):
+        """
+        Helper method to compute metric version, partition ID, and num_partition for dynamic metrics.
+        Handles logging updates internally for reusability.
+        
+        Args:
+            gpu_metrics_info (dict): GPU metrics info from amdsmi_get_gpu_metrics_info.
+            is_partition_metrics (bool): Whether this is for partition metrics.
+            gpu_id (int): GPU ID for logging.
+            gpu_handle: GPU device handle for KFD info retrieval.
+        
+        Returns:
+            dict: {
+                'metric_version': float or "N/A",
+                'partition_id': int or "N/A",
+                'num_partition': int or "N/A",
+                'num_xcp': int or "N/A"  # Alias for num_partition
+            }
+        """
+        # Compute metric version from header revisions
+        metric_version = "N/A"
+        format_rev = gpu_metrics_info.get('common_header.format_revision', "N/A")
+        content_rev = gpu_metrics_info.get('common_header.content_revision', "N/A")
+        if format_rev != "N/A" and content_rev != "N/A":
+            try:
+                metric_version = float(f"{format_rev}.{content_rev}")
+            except ValueError:
+                metric_version = "N/A"  # Fallback if conversion fails
+        
+        # Retrieve partition ID from KFD info
+        partition_id = "N/A"
+        try:
+            kfd_info = amdsmi_interface.amdsmi_get_gpu_kfd_info(gpu_handle)
+            partition_id = kfd_info.get('current_partition_id', "N/A")
+        except amdsmi_exception.AmdSmiLibraryException as e:
+            logging.debug("Failed to get current partition ID for GPU %s | %s", gpu_id, e.get_error_info())
+        
+        # Determine num_partition with fallback logic for dynamic metrics
+        num_partition = gpu_metrics_info.get('num_partition', "N/A")
+        if metric_version != "N/A" and num_partition == "N/A":
+            # Workaround: Default to 1 for newer metric versions if num_partition is missing
+            # (Confirmed with driver team; applies to GPU and partition metrics)
+            if not is_partition_metrics and metric_version >= 1.9:
+                num_partition = 1
+            elif is_partition_metrics and metric_version >= 1.1:
+                num_partition = 1
+            elif partition_id != "N/A" and partition_id > 0:
+                # Fallback to partition_id if partitions exist but num_partition is unavailable
+                num_partition = partition_id
+            # Else: Remains "N/A" if no conditions match
+        
+        # Alias num_xcp for XCP metrics usage
+        num_xcp = num_partition
+        
+        # Debug logging
+        logging.debug(
+            "GPU %s | Metric version: %s, num_partition: %s, partition_id: %s, num_xcp: %s",
+            gpu_id, metric_version, num_partition, partition_id, num_xcp
+        )
+        
+        return {
+            'metric_version': metric_version,
+            'partition_id': partition_id,
+            'num_partition': num_partition,
+            'num_xcp': num_xcp
+        }    
@@ -918,7 +918,6 @@ class AMDSMIParser(argparse.ArgumentParser):
        self._add_device_arguments(bad_pages_parser, required=False)
        self._add_command_modifiers(bad_pages_parser)

-
    def _add_metric_parser(self, subparsers: argparse._SubParsersAction, func):
        # Subparser help text
        metric_help = "Gets metric/performance information about the specified GPU"
@@ -4055,6 +4055,30 @@ amdsmi_get_gpu_metrics_header_info(amdsmi_processor_handle processor_handle, amd
 amdsmi_status_t amdsmi_get_gpu_metrics_info(amdsmi_processor_handle processor_handle,
                                            amdsmi_gpu_metrics_t *pgpu_metrics);

+/**
+ *  @brief This function retrieves the partition metrics information.
+ *
+ *  @ingroup tagClkPowerPerfQuery
+ *
+ *  @platform{gpu_bm_linux} @platform{guest_1vf}
+ *
+ *  @details Given a processor handle @p processor_handle and a pointer to a
+ *  ::amdsmi_gpu_metrics_t structure @p pgpu_metrics, this function will populate
+ *  @p pgpu_metrics. See ::amdsmi_gpu_metrics_t for more details.
+ *
+ *  @param[in] processor_handle a processor handle
+ *
+ *  @param[in,out] pgpu_metrics a pointer to an ::amdsmi_gpu_metrics_t structure
+ *  If this parameter is nullptr, this function will return
+ *  ::AMDSMI_STATUS_INVAL if the function is supported with the provided,
+ *  arguments and ::AMDSMI_STATUS_NOT_SUPPORTED if it is not supported with the
+ *  provided arguments.
+ *
+ *  @return ::amdsmi_status_t | ::AMDSMI_STATUS_SUCCESS on success, non-zero on fail
+ */
+amdsmi_status_t amdsmi_get_gpu_partition_metrics_info(amdsmi_processor_handle processor_handle,
+                                                      amdsmi_gpu_metrics_t *pgpu_metrics);
+
 /**
 *  @brief Get the pm metrics table with provided device index.
 *
@@ -184,6 +184,7 @@ from .amdsmi_interface import amdsmi_get_gpu_mem_overdrive_level
 from .amdsmi_interface import amdsmi_get_clk_freq
 from .amdsmi_interface import amdsmi_get_gpu_od_volt_info
 from .amdsmi_interface import amdsmi_get_gpu_metrics_info
+from .amdsmi_interface import amdsmi_get_gpu_partition_metrics_info
 from .amdsmi_interface import amdsmi_get_gpu_od_volt_curve_regions
 from .amdsmi_interface import amdsmi_is_gpu_power_management_enabled

@@ -4932,6 +4932,165 @@ def amdsmi_get_gpu_metrics_info(
            gpu_metrics_output['xcp_stats.gfx_below_host_limit_total_acc'][xcp_index] = xcp_detail
    return gpu_metrics_output

+def amdsmi_get_gpu_partition_metrics_info(
+    processor_handle: processor_handle_t,
+) -> Dict[str, Any]:
+    if not isinstance(processor_handle, amdsmi_wrapper.amdsmi_processor_handle):
+        raise AmdSmiParameterException(
+            processor_handle, amdsmi_wrapper.amdsmi_processor_handle
+        )
+
+    gpu_metrics = amdsmi_wrapper.amdsmi_gpu_metrics_t()
+    _check_res(
+        amdsmi_wrapper.amdsmi_get_gpu_partition_metrics_info(
+            processor_handle, ctypes.byref(gpu_metrics)
+        )
+    )
+
+    gpu_metrics_output = {
+        "common_header.structure_size": _validate_if_max_uint(gpu_metrics.common_header.structure_size, MaxUIntegerTypes.UINT16_T),
+        "common_header.format_revision": _validate_if_max_uint(gpu_metrics.common_header.format_revision, MaxUIntegerTypes.UINT8_T),
+        "common_header.content_revision": _validate_if_max_uint(gpu_metrics.common_header.content_revision, MaxUIntegerTypes.UINT8_T),
+        "temperature_edge": _validate_if_max_uint(gpu_metrics.temperature_edge, MaxUIntegerTypes.UINT16_T),
+        "temperature_hotspot": _validate_if_max_uint(gpu_metrics.temperature_hotspot, MaxUIntegerTypes.UINT16_T),
+        "temperature_mem": _validate_if_max_uint(gpu_metrics.temperature_mem, MaxUIntegerTypes.UINT16_T),
+        "temperature_vrgfx": _validate_if_max_uint(gpu_metrics.temperature_vrgfx, MaxUIntegerTypes.UINT16_T),
+        "temperature_vrsoc": _validate_if_max_uint(gpu_metrics.temperature_vrsoc, MaxUIntegerTypes.UINT16_T),
+        "temperature_vrmem": _validate_if_max_uint(gpu_metrics.temperature_vrmem, MaxUIntegerTypes.UINT16_T),
+        "average_gfx_activity": _validate_if_max_uint(gpu_metrics.average_gfx_activity, MaxUIntegerTypes.UINT16_T, isActivity=True),
+        "average_umc_activity": _validate_if_max_uint(gpu_metrics.average_umc_activity, MaxUIntegerTypes.UINT16_T, isActivity=True),
+        "average_mm_activity": _validate_if_max_uint(gpu_metrics.average_mm_activity, MaxUIntegerTypes.UINT16_T, isActivity=True),
+        "average_socket_power": _validate_if_max_uint(gpu_metrics.average_socket_power, MaxUIntegerTypes.UINT16_T),
+        "energy_accumulator": _validate_if_max_uint(gpu_metrics.energy_accumulator, MaxUIntegerTypes.UINT64_T),
+        "system_clock_counter": _validate_if_max_uint(gpu_metrics.system_clock_counter, MaxUIntegerTypes.UINT64_T),
+        "average_gfxclk_frequency": _validate_if_max_uint(gpu_metrics.average_gfxclk_frequency, MaxUIntegerTypes.UINT16_T),
+        "average_socclk_frequency": _validate_if_max_uint(gpu_metrics.average_socclk_frequency, MaxUIntegerTypes.UINT16_T),
+        "average_uclk_frequency": _validate_if_max_uint(gpu_metrics.average_uclk_frequency, MaxUIntegerTypes.UINT16_T),
+        "average_vclk0_frequency": _validate_if_max_uint(gpu_metrics.average_vclk0_frequency, MaxUIntegerTypes.UINT16_T),
+        "average_dclk0_frequency": _validate_if_max_uint(gpu_metrics.average_dclk0_frequency, MaxUIntegerTypes.UINT16_T),
+        "average_vclk1_frequency": _validate_if_max_uint(gpu_metrics.average_vclk1_frequency, MaxUIntegerTypes.UINT16_T),
+        "average_dclk1_frequency": _validate_if_max_uint(gpu_metrics.average_dclk1_frequency, MaxUIntegerTypes.UINT16_T),
+        "current_gfxclk": _validate_if_max_uint(gpu_metrics.current_gfxclk, MaxUIntegerTypes.UINT16_T),
+        "current_socclk": _validate_if_max_uint(gpu_metrics.current_socclk, MaxUIntegerTypes.UINT16_T),
+        "current_uclk": _validate_if_max_uint(gpu_metrics.current_uclk, MaxUIntegerTypes.UINT16_T),
+        "current_vclk0": _validate_if_max_uint(gpu_metrics.current_vclk0, MaxUIntegerTypes.UINT16_T),
+        "current_dclk0": _validate_if_max_uint(gpu_metrics.current_dclk0, MaxUIntegerTypes.UINT16_T),
+        "current_vclk1": _validate_if_max_uint(gpu_metrics.current_vclk1, MaxUIntegerTypes.UINT16_T),
+        "current_dclk1": _validate_if_max_uint(gpu_metrics.current_dclk1, MaxUIntegerTypes.UINT16_T),
+        "throttle_status": _validate_if_max_uint(gpu_metrics.throttle_status, MaxUIntegerTypes.UINT32_T, isBool=True),
+        "current_fan_speed": _validate_if_max_uint(gpu_metrics.current_fan_speed, MaxUIntegerTypes.UINT16_T),
+        "pcie_link_width": _validate_if_max_uint(gpu_metrics.pcie_link_width, MaxUIntegerTypes.UINT16_T),
+        "pcie_link_speed": _validate_if_max_uint(gpu_metrics.pcie_link_speed, MaxUIntegerTypes.UINT16_T),
+        "gfx_activity_acc": _validate_if_max_uint(gpu_metrics.gfx_activity_acc, MaxUIntegerTypes.UINT32_T),
+        "mem_activity_acc": _validate_if_max_uint(gpu_metrics.mem_activity_acc, MaxUIntegerTypes.UINT32_T),
+        "temperature_hbm": _validate_if_max_uint(list(gpu_metrics.temperature_hbm), MaxUIntegerTypes.UINT16_T),
+        "firmware_timestamp": _validate_if_max_uint(gpu_metrics.firmware_timestamp, MaxUIntegerTypes.UINT64_T),
+        "voltage_soc": _validate_if_max_uint(gpu_metrics.voltage_soc, MaxUIntegerTypes.UINT16_T),
+        "voltage_gfx": _validate_if_max_uint(gpu_metrics.voltage_gfx, MaxUIntegerTypes.UINT16_T),
+        "voltage_mem": _validate_if_max_uint(gpu_metrics.voltage_mem, MaxUIntegerTypes.UINT16_T),
+        "indep_throttle_status": _validate_if_max_uint(gpu_metrics.indep_throttle_status, MaxUIntegerTypes.UINT64_T, isBool=True),
+        "current_socket_power": _validate_if_max_uint(gpu_metrics.current_socket_power, MaxUIntegerTypes.UINT16_T),
+        "vcn_activity": _validate_if_max_uint(list(gpu_metrics.vcn_activity), MaxUIntegerTypes.UINT16_T, isActivity=True),
+        "gfxclk_lock_status": _validate_if_max_uint(gpu_metrics.gfxclk_lock_status, MaxUIntegerTypes.UINT32_T),
+        "xgmi_link_width": _validate_if_max_uint(gpu_metrics.xgmi_link_width, MaxUIntegerTypes.UINT16_T),
+        "xgmi_link_speed": _validate_if_max_uint(gpu_metrics.xgmi_link_speed, MaxUIntegerTypes.UINT16_T),
+        "pcie_bandwidth_acc": _validate_if_max_uint(gpu_metrics.pcie_bandwidth_acc, MaxUIntegerTypes.UINT64_T),
+        "pcie_bandwidth_inst": _validate_if_max_uint(gpu_metrics.pcie_bandwidth_inst, MaxUIntegerTypes.UINT64_T),
+        "pcie_l0_to_recov_count_acc": _validate_if_max_uint(gpu_metrics.pcie_l0_to_recov_count_acc, MaxUIntegerTypes.UINT64_T),
+        "pcie_replay_count_acc": _validate_if_max_uint(gpu_metrics.pcie_replay_count_acc, MaxUIntegerTypes.UINT64_T),
+        "pcie_replay_rover_count_acc": _validate_if_max_uint(gpu_metrics.pcie_replay_rover_count_acc, MaxUIntegerTypes.UINT64_T),
+        "xgmi_read_data_acc": _validate_if_max_uint(list(gpu_metrics.xgmi_read_data_acc), MaxUIntegerTypes.UINT64_T),
+        "xgmi_write_data_acc": _validate_if_max_uint(list(gpu_metrics.xgmi_write_data_acc), MaxUIntegerTypes.UINT64_T),
+        "current_gfxclks": _validate_if_max_uint(list(gpu_metrics.current_gfxclks), MaxUIntegerTypes.UINT16_T),
+        "current_socclks": _validate_if_max_uint(list(gpu_metrics.current_socclks), MaxUIntegerTypes.UINT16_T),
+        "current_vclk0s": _validate_if_max_uint(list(gpu_metrics.current_vclk0s), MaxUIntegerTypes.UINT16_T),
+        "current_dclk0s": _validate_if_max_uint(list(gpu_metrics.current_dclk0s), MaxUIntegerTypes.UINT16_T),
+        "jpeg_activity": _validate_if_max_uint(list(gpu_metrics.jpeg_activity), MaxUIntegerTypes.UINT16_T, isActivity=True),
+        "pcie_nak_sent_count_acc": _validate_if_max_uint(gpu_metrics.pcie_nak_sent_count_acc, MaxUIntegerTypes.UINT32_T),
+        "pcie_nak_rcvd_count_acc": _validate_if_max_uint(gpu_metrics.pcie_nak_rcvd_count_acc, MaxUIntegerTypes.UINT32_T),
+        "accumulation_counter": _validate_if_max_uint(gpu_metrics.accumulation_counter, MaxUIntegerTypes.UINT64_T),
+        "prochot_residency_acc": _validate_if_max_uint(gpu_metrics.prochot_residency_acc, MaxUIntegerTypes.UINT64_T),
+        "ppt_residency_acc": _validate_if_max_uint(gpu_metrics.ppt_residency_acc, MaxUIntegerTypes.UINT64_T),
+        "socket_thm_residency_acc": _validate_if_max_uint(gpu_metrics.socket_thm_residency_acc, MaxUIntegerTypes.UINT64_T),
+        "vr_thm_residency_acc": _validate_if_max_uint(gpu_metrics.vr_thm_residency_acc, MaxUIntegerTypes.UINT64_T),
+        "hbm_thm_residency_acc": _validate_if_max_uint(gpu_metrics.hbm_thm_residency_acc, MaxUIntegerTypes.UINT64_T),
+        "num_partition": _validate_if_max_uint(gpu_metrics.num_partition, MaxUIntegerTypes.UINT16_T),
+        "xcp_stats.gfx_busy_inst": list(gpu_metrics.xcp_stats),
+        "xcp_stats.jpeg_busy": list(gpu_metrics.xcp_stats),
+        "xcp_stats.vcn_busy": list(gpu_metrics.xcp_stats),
+        "xcp_stats.gfx_busy_acc": list(gpu_metrics.xcp_stats),
+        "xcp_stats.gfx_below_host_limit_acc": list(gpu_metrics.xcp_stats),
+        "xcp_stats.gfx_below_host_limit_ppt_acc": list(gpu_metrics.xcp_stats),
+        "xcp_stats.gfx_below_host_limit_thm_acc": list(gpu_metrics.xcp_stats),
+        "xcp_stats.gfx_low_utilization_acc": list(gpu_metrics.xcp_stats),
+        "xcp_stats.gfx_below_host_limit_total_acc": list(gpu_metrics.xcp_stats),
+        "pcie_lc_perf_other_end_recovery": _validate_if_max_uint(gpu_metrics.pcie_lc_perf_other_end_recovery, MaxUIntegerTypes.UINT32_T),
+        "vram_max_bandwidth": _validate_if_max_uint(gpu_metrics.vram_max_bandwidth, MaxUIntegerTypes.UINT64_T),
+        "xgmi_link_status": _validate_if_max_uint(list(gpu_metrics.xgmi_link_status), MaxUIntegerTypes.UINT16_T),
+    }
+
+    # Create 2d array with each XCD's stats
+    if 'xcp_stats.gfx_busy_inst' in gpu_metrics_output:
+        for xcp_index, xcp_metrics in enumerate(gpu_metrics_output['xcp_stats.gfx_busy_inst']):
+            xcp_detail = []
+            for val in xcp_metrics.gfx_busy_inst:
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT32_T, isActivity=True))
+            gpu_metrics_output['xcp_stats.gfx_busy_inst'][xcp_index] = xcp_detail
+
+    if 'xcp_stats.jpeg_busy' in gpu_metrics_output:
+        for xcp_index, xcp_metrics in enumerate(gpu_metrics_output['xcp_stats.jpeg_busy']):
+            xcp_detail = []
+            for val in xcp_metrics.jpeg_busy:
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT16_T, isActivity=True))
+            gpu_metrics_output['xcp_stats.jpeg_busy'][xcp_index] = xcp_detail
+
+    if 'xcp_stats.vcn_busy' in gpu_metrics_output:
+        for xcp_index, xcp_metrics in enumerate(gpu_metrics_output['xcp_stats.vcn_busy']):
+            xcp_detail = []
+            for val in xcp_metrics.vcn_busy:
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT16_T, isActivity=True))
+            gpu_metrics_output["xcp_stats.vcn_busy"][xcp_index] = xcp_detail
+
+    if 'xcp_stats.gfx_busy_acc' in gpu_metrics_output:
+        for xcp_index, xcp_metrics in enumerate(gpu_metrics_output['xcp_stats.gfx_busy_acc']):
+            xcp_detail = []
+            for val in xcp_metrics.gfx_busy_acc:
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T))
+            gpu_metrics_output["xcp_stats.gfx_busy_acc"][xcp_index] = xcp_detail
+
+    if 'xcp_stats.gfx_below_host_limit_acc' in gpu_metrics_output:
+        for xcp_index, xcp_metrics in enumerate(gpu_metrics_output['xcp_stats.gfx_below_host_limit_acc']):
+            xcp_detail = []
+            for val in xcp_metrics.gfx_below_host_limit_acc:
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T))
+            gpu_metrics_output['xcp_stats.gfx_below_host_limit_acc'][xcp_index] = xcp_detail
+    # new for gpu metrics v1.8
+    if 'xcp_stats.gfx_below_host_limit_ppt_acc'  in gpu_metrics_output:
+        for xcp_index, xcp_metrics in enumerate(gpu_metrics_output['xcp_stats.gfx_below_host_limit_ppt_acc']):
+            xcp_detail = []
+            for val in xcp_metrics.gfx_below_host_limit_ppt_acc:
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T))
+            gpu_metrics_output['xcp_stats.gfx_below_host_limit_ppt_acc'][xcp_index] = xcp_detail
+    if 'xcp_stats.gfx_below_host_limit_thm_acc' in gpu_metrics_output:
+        for xcp_index, xcp_metrics in enumerate(gpu_metrics_output['xcp_stats.gfx_below_host_limit_thm_acc']):
+            xcp_detail = []
+            for val in xcp_metrics.gfx_below_host_limit_thm_acc:
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T))
+            gpu_metrics_output['xcp_stats.gfx_below_host_limit_thm_acc'][xcp_index] = xcp_detail
+    if 'xcp_stats.gfx_low_utilization_acc' in gpu_metrics_output:
+        for xcp_index, xcp_metrics in enumerate(gpu_metrics_output['xcp_stats.gfx_low_utilization_acc']):
+            xcp_detail = []
+            for val in xcp_metrics.gfx_low_utilization_acc:
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T))
+            gpu_metrics_output['xcp_stats.gfx_low_utilization_acc'][xcp_index] = xcp_detail
+    if 'xcp_stats.gfx_below_host_limit_total_acc' in gpu_metrics_output:
+        for xcp_index, xcp_metrics in enumerate(gpu_metrics_output['xcp_stats.gfx_below_host_limit_total_acc']):
+            xcp_detail = []
+            for val in xcp_metrics.gfx_below_host_limit_total_acc:
+                xcp_detail.append(_validate_if_max_uint(val, MaxUIntegerTypes.UINT64_T))
+            gpu_metrics_output['xcp_stats.gfx_below_host_limit_total_acc'][xcp_index] = xcp_detail
+    return gpu_metrics_output
+

 def amdsmi_get_gpu_od_volt_curve_regions(
    processor_handle: processor_handle_t, num_regions: int
@@ -964,6 +964,21 @@ amdsmi_card_form_factor_t = ctypes.c_uint32 # enum
 class struct_amdsmi_pcie_info_t(Structure):
    pass

+class struct_pcie_static_(Structure):
+    pass
+
+struct_pcie_static_._pack_ = 1 # source:False
+struct_pcie_static_._fields_ = [
+    ('max_pcie_width', ctypes.c_uint16),
+    ('PADDING_0', ctypes.c_ubyte * 2),
+    ('max_pcie_speed', ctypes.c_uint32),
+    ('pcie_interface_version', ctypes.c_uint32),
+    ('slot_type', amdsmi_card_form_factor_t),
+    ('max_pcie_interface_version', ctypes.c_uint32),
+    ('PADDING_1', ctypes.c_ubyte * 4),
+    ('reserved', ctypes.c_uint64 * 9),
+]
+
 class struct_pcie_metric_(Structure):
    pass

@@ -984,21 +999,6 @@ struct_pcie_metric_._fields_ = [
    ('reserved', ctypes.c_uint64 * 12),
 ]

-class struct_pcie_static_(Structure):
-    pass
-
-struct_pcie_static_._pack_ = 1 # source:False
-struct_pcie_static_._fields_ = [
-    ('max_pcie_width', ctypes.c_uint16),
-    ('PADDING_0', ctypes.c_ubyte * 2),
-    ('max_pcie_speed', ctypes.c_uint32),
-    ('pcie_interface_version', ctypes.c_uint32),
-    ('slot_type', amdsmi_card_form_factor_t),
-    ('max_pcie_interface_version', ctypes.c_uint32),
-    ('PADDING_1', ctypes.c_ubyte * 4),
-    ('reserved', ctypes.c_uint64 * 9),
-]
-
 struct_amdsmi_pcie_info_t._pack_ = 1 # source:False
 struct_amdsmi_pcie_info_t._fields_ = [
    ('pcie_static', struct_pcie_static_),
@@ -2630,6 +2630,9 @@ amdsmi_get_gpu_metrics_header_info.argtypes = [amdsmi_processor_handle, ctypes.P
 amdsmi_get_gpu_metrics_info = _libraries['libamd_smi.so'].amdsmi_get_gpu_metrics_info
 amdsmi_get_gpu_metrics_info.restype = amdsmi_status_t
 amdsmi_get_gpu_metrics_info.argtypes = [amdsmi_processor_handle, ctypes.POINTER(struct_amdsmi_gpu_metrics_t)]
+amdsmi_get_gpu_partition_metrics_info = _libraries['libamd_smi.so'].amdsmi_get_gpu_partition_metrics_info
+amdsmi_get_gpu_partition_metrics_info.restype = amdsmi_status_t
+amdsmi_get_gpu_partition_metrics_info.argtypes = [amdsmi_processor_handle, ctypes.POINTER(struct_amdsmi_gpu_metrics_t)]
 amdsmi_get_gpu_pm_metrics_info = _libraries['libamd_smi.so'].amdsmi_get_gpu_pm_metrics_info
 amdsmi_get_gpu_pm_metrics_info.restype = amdsmi_status_t
 amdsmi_get_gpu_pm_metrics_info.argtypes = [amdsmi_processor_handle, ctypes.POINTER(ctypes.POINTER(struct_amdsmi_name_value_t)), ctypes.POINTER(ctypes.c_uint32)]
@@ -3418,6 +3421,7 @@ __all__ = \
    'amdsmi_get_gpu_metrics_info',
    'amdsmi_get_gpu_od_volt_curve_regions',
    'amdsmi_get_gpu_od_volt_info', 'amdsmi_get_gpu_overdrive_level',
+    'amdsmi_get_gpu_partition_metrics_info',
    'amdsmi_get_gpu_pci_bandwidth',
    'amdsmi_get_gpu_pci_replay_counter',
    'amdsmi_get_gpu_pci_throughput', 'amdsmi_get_gpu_perf_level',
@@ -3264,6 +3264,29 @@ rsmi_status_t rsmi_dev_gpu_reset(uint32_t dv_ind);
 rsmi_status_t rsmi_dev_od_volt_info_get(uint32_t dv_ind,
                                               rsmi_od_volt_freq_data_t *odv);

+/**
+ *  @brief This function retrieves the gpu partition metrics information
+ *
+ *  @details Given a device index @p dv_ind and a pointer to a
+ *  ::rsmi_gpu_metrics_t structure @p pgpu_metrics, this function will populate
+ *  @p pgpu_metrics. See ::rsmi_gpu_metrics_t for more details.
+ *
+ *  @param[in] dv_ind a device index
+ *
+ *  @param[inout] pgpu_metrics a pointer to an ::rsmi_gpu_metrics_t structure
+ *  If this parameter is nullptr, this function will return
+ *  ::RSMI_STATUS_INVALID_ARGS if the function is supported with the provided,
+ *  arguments and ::RSMI_STATUS_NOT_SUPPORTED if it is not supported with the
+ *  provided arguments.
+ *
+ *  @retval ::RSMI_STATUS_SUCCESS call was successful
+ *  @retval ::RSMI_STATUS_NOT_SUPPORTED installed software or hardware does not
+ *  support this function with the given arguments
+ *  @retval ::RSMI_STATUS_INVALID_ARGS the provided arguments are not valid
+ */
+rsmi_status_t rsmi_dev_gpu_partition_metrics_info_get(uint32_t dv_ind,
+                                                      rsmi_gpu_metrics_t *pgpu_metrics);
+
 /**
 *  @brief This function retrieves the gpu metrics information
 *
@@ -156,6 +156,7 @@ enum DevInfoTypes {
  kDevMemPageBad,
  kDevNumaNode,
  kDevGpuMetrics,
+  kdevGpuPartitionMetrics,
  kDevPmMetrics,
  kDevRegMetrics,
  kDevBaseBoardTempMetrics,
@@ -215,7 +216,7 @@ class Device {
    int readDevInfo(DevInfoTypes type, std::vector<std::string> *retVec);
    int readDevInfo(DevInfoTypes type, std::size_t b_size,
                                      void *p_binary_data);
-    std::string get_sys_file_path_by_type(DevInfoTypes type) const;
+    std::string get_sys_file_path_by_type(DevInfoTypes type, bool getPathOnly = false) const;
    // Get the property from a file which may contain multiple properties.
    int readDevInfo(DevInfoTypes type, const std::string& property,
                                      std::string& value);
@@ -254,19 +255,31 @@ class Device {
    template <typename T> std::string readBootPartitionState(uint32_t dv_ind);
    rsmi_status_t check_amdgpu_property_reinforcement_query(uint32_t dev_idx, AMDGpuVerbTypes_t verb_type);

-    void dev_set_gpu_metric(GpuMetricsBasePtr gpu_metrics_ptr) { m_gpu_metrics_ptr = std::move(gpu_metrics_ptr); };
-    GpuMetricsBasePtr& dev_get_gpu_metric() { return m_gpu_metrics_ptr; };
    const AMDGpuMetricsHeader_v1_t& dev_get_metrics_header() {return m_gpu_metrics_header; }
-    rsmi_status_t setup_gpu_metrics_reading();
-    rsmi_status_t dev_read_gpu_metrics_header_data();
-    rsmi_status_t dev_read_gpu_metrics_all_data();
-    rsmi_status_t run_internal_gpu_metrics_query(AMDGpuMetricsUnitType_t metric_counter, AMDGpuDynamicMetricTblValues_t& values);
-    rsmi_status_t dev_log_gpu_metrics(std::ostringstream& outstream_metrics);
-    AMGpuMetricsPublicLatestTupl_t dev_copy_internal_to_external_metrics();
+    auto setup_gpu_metrics_reading(DevInfoTypes type = DevInfoTypes::kDevGpuMetrics)
+        -> rsmi_status_t;
+    auto dev_read_gpu_metrics_header_data(DevInfoTypes type = DevInfoTypes::kDevGpuMetrics)
+        -> rsmi_status_t;
+    auto dev_read_gpu_metrics_all_data(DevInfoTypes type = DevInfoTypes::kDevGpuMetrics)
+        -> rsmi_status_t;
+    auto run_internal_gpu_metrics_query(AMDGpuMetricsUnitType_t metric_counter,
+                                        AMDGpuDynamicMetricTblValues_t &values,
+                                        DevInfoTypes type = DevInfoTypes::kDevGpuMetrics)
+        -> rsmi_status_t;
+    auto dev_log_gpu_metrics(std::ostringstream &outstream_metrics,
+                             DevInfoTypes type = DevInfoTypes::kDevGpuMetrics) -> rsmi_status_t;
+    auto dev_copy_internal_to_external_metrics(DevInfoTypes type = DevInfoTypes::kDevGpuMetrics)
+        -> AMGpuMetricsPublicLatestTupl_t;

    static const std::map<DevInfoTypes, const char*> devInfoTypesStrings;
    void set_smi_device_id(uint32_t device_id) { m_device_id = device_id; }
    void set_smi_partition_id(uint32_t partition_id) { m_partition_id = partition_id; }
+    auto set_smi_dev_info_type(DevInfoTypes type) -> void { m_dev_info_type = type; }
+    auto get_smi_device_id(void) const -> uint32_t { return m_device_id; }
+    auto get_smi_partition_id(void) const -> uint32_t { return m_partition_id; }
+    auto is_smi_expecting_partition_metrics(void) const -> bool {
+        return m_dev_info_type == DevInfoTypes::kdevGpuPartitionMetrics;
+    }
    static const char* get_type_string(DevInfoTypes type);
    rsmi_status_t get_smi_device_identifiers(uint32_t device_id,
                  rsmi_device_identifiers_t *device_identifiers);
@@ -310,6 +323,7 @@ class Device {
    uint64_t m_gpu_metrics_updated_timestamp;
    uint32_t m_device_id;
    uint32_t m_partition_id;
+    DevInfoTypes m_dev_info_type{DevInfoTypes::kDevGpuMetrics};

    // New dynamic GPU metrics support
    bool m_is_dynamic_gpu_metrics_supported = false;
@@ -1,49 +1,24 @@
 /*
- * MIT License
- *
 * Copyright (c) Advanced Micro Devices, Inc. All rights reserved.
 *
- *  Developed by:
- *
- *                  AMD ML Software Engineering
- *
- *                  Advanced Micro Devices, Inc.
- *
- *                  www.amd.com
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- *  - Redistributions of source code must retain the above copyright notice,
- *    this list of conditions and the following disclaimers.
- *  - Redistributions in binary form must reproduce the above copyright
- *    notice, this list of conditions and the following disclaimers in
- *    the documentation and/or other materials provided with the distribution.
- *  - Neither the names of Advanced Micro Devices, Inc,
- *    nor the names of its contributors may be used to endorse or promote
- *    products derived from this Software without specific prior written
- *    permission.
- *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- *
- *
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
 */
-
- 
 #ifndef ROCM_SMI_ROCM_SMI_DYN_GPU_METRICS_H_
 #define ROCM_SMI_ROCM_SMI_DYN_GPU_METRICS_H_

@@ -299,6 +274,7 @@ enum class AMDGpuMetricUnitType_t
    QUANTITY,
    STATUS_FLAG
 };
+
 using AMDGpuMetricUnitTypeTranslationTable_t = std::unordered_map<AMDGpuMetricUnitType_t, AMDGpuDynamicTranslationTextInfo_t>;

 static const auto AMDGpuMetricUnitTypeToString = AMDGpuMetricUnitTypeTranslationTable_t {
@@ -26,6 +26,7 @@
 #include "rocm_smi/rocm_smi_common.h"
 #include "rocm_smi/rocm_smi.h"
 #include "rocm_smi/rocm_smi_dyn_gpu_metrics.h"
+#include "rocm_smi/rocm_smi_logger.h"

 #include <array>
 #include <algorithm>
@@ -689,6 +690,33 @@ struct AMDGpuMetrics_v17_t {
  uint32_t m_pcie_lc_perf_other_end_recovery;
 };

+struct AMDGpuMetrics_v18_Partition_v1_0_t {
+  ~AMDGpuMetrics_v18_Partition_v1_0_t() = default;
+  struct AMDGpuMetricsHeader_v1_t m_common_header;
+
+  /* Current clocks (Mhz) */
+  uint16_t m_current_gfxclk[kRSMI_MAX_NUM_XCC];
+  uint16_t m_current_socclk[kRSMI_MAX_NUM_CLKS];
+  uint16_t m_current_vclk0[kRSMI_MAX_NUM_CLKS];
+  uint16_t m_current_dclk0[kRSMI_MAX_NUM_CLKS];
+  uint16_t m_current_uclk;
+  uint16_t m_padding;
+
+  /* Utilization Instantaneous (%) */
+  uint32_t m_gfx_busy_inst[kRSMI_MAX_NUM_XCC];
+  uint16_t m_jpeg_busy[kRSMI_MAX_NUM_JPEG_ENG_V1];
+  uint16_t m_vcn_busy[kRSMI_MAX_NUM_VCNS];
+
+  /* Utilization Accumulated (%) */
+  uint64_t m_gfx_busy_acc[kRSMI_MAX_NUM_XCC];
+
+  /* Total App Clock Counter Accumulated */
+  uint64_t m_gfx_below_host_limit_ppt_acc[kRSMI_MAX_NUM_XCC];
+  uint64_t m_gfx_below_host_limit_thm_acc[kRSMI_MAX_NUM_XCC];
+  uint64_t m_gfx_low_utilization_acc[kRSMI_MAX_NUM_XCC];
+  uint64_t m_gfx_below_host_limit_total_acc[kRSMI_MAX_NUM_XCC];
+};
+
 struct AMDGpuMetrics_v18_t {
  ~AMDGpuMetrics_v18_t() = default;
  struct AMDGpuMetricsHeader_v1_t m_common_header;
@@ -1053,8 +1081,10 @@ enum class AMDGpuMetricVersionFlags_t : AMDGpuMetricVersionFlagId_t
  kGpuMetricV15 = (0x1 << 5),
  kGpuMetricV16 = (0x1 << 6),
  kGpuMetricV17 = (0x1 << 7),
-  kGpuMetricV18 = (0x1 << 8),  // Added new version flag: Last static GPU Metrics
-  kGpuMetricV19 = (0x1 << 9),  // Dyn.GPU Metrics
+  kGpuMetricV18 = (0x1 << 8),
+  kGpuXcpMetricV10 = (0x1 << 0),         // Added in v1.8 for partition metrics v1.0
+  kGpuMetricDynV19Plus = (0x1 << 9),     // Dyn. GPU Metrics v1.9+
+  kGpuXcpMetricDynV11Plus = (0x1 << 1),  // Added in v1.9 for Dyn. partition metrics v1.1+
 };
 using AMDGpuMetricVersionTranslationTbl_t = std::map<uint16_t, AMDGpuMetricVersionFlags_t>;
 using GpuMetricTypePtr_t = std::shared_ptr<void>;
@@ -1069,6 +1099,7 @@ class GpuMetricsBase_t {
    virtual AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() = 0;
    virtual void set_device_id(uint32_t device_id) { m_device_id = device_id; }
    virtual void set_partition_id(uint32_t partition_id) { m_partition_id = partition_id; }
+    virtual void set_is_partition_metrics(bool is_partition_req) { m_is_partition_metrics = is_partition_req; }
    static std::mutex s_base_tbl_mu;
    virtual AMDGpuDynamicMetricsTbl_t get_metrics_dynamic_tbl() {
      std::lock_guard<std::mutex> lk(s_base_tbl_mu);
@@ -1080,6 +1111,7 @@ class GpuMetricsBase_t {
    uint64_t m_metrics_timestamp;
    uint32_t m_device_id;
    uint32_t m_partition_id;
+    bool m_is_partition_metrics {false};
 };
 using GpuMetricsBasePtr = std::shared_ptr<GpuMetricsBase_t>;
 using AMDGpuMetricFactories_t = const std::map<AMDGpuMetricVersionFlags_t, GpuMetricsBasePtr>;
@@ -1293,11 +1325,31 @@ class GpuMetricsBase_v18_t final : public GpuMetricsBase_t {
  }

  GpuMetricTypePtr_t get_metrics_table() override {
-    if (!m_gpu_metric_ptr) {
-      m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v18_t*){});
+    std::ostringstream ss;
+    ss << __PRETTY_FUNCTION__
+       << " ==== START ==== "
+       << " Initializing metrics table request: "
+       << " | Partition ID: " << m_partition_id
+       << " | Device ID: " << m_device_id
+       << " | Is Partition Metrics: " << std::boolalpha << m_is_partition_metrics
+       << " | m_gpu_metric_ptr: " << (!m_gpu_metric_ptr ? "nullptr" : "valid")
+       << " | m_gpu_metric_partition_ptr: "
+       << (!m_gpu_metric_partition_ptr ? "nullptr" : "valid");
+    LOG_DEBUG(ss);
+    // If m_is_partition_metrics is false, we use the main GPU metrics table.
+    // Otherwise, we use the partition metrics table.
+    // This is to avoid having two pointers to the same table.
+    if (m_is_partition_metrics && !m_gpu_metric_partition_ptr) {
+      return std::shared_ptr<AMDGpuMetrics_v18_Partition_v1_0_t>(
+                &m_gpu_metrics_partition_tbl, [](AMDGpuMetrics_v18_Partition_v1_0_t*){/* no-op */});
+    } else if (!m_is_partition_metrics && !m_gpu_metric_ptr) {
+      return std::shared_ptr<AMDGpuMetrics_v18_t>(
+                &m_gpu_metrics_tbl, [](AMDGpuMetrics_v18_t*){/* no-op */});
    }
-    assert(m_gpu_metric_ptr != nullptr);
-    return m_gpu_metric_ptr;
+    return std::shared_ptr<AMDGpuMetrics_v18_t>(
+                nullptr, [](AMDGpuMetrics_v18_t*){/* no-op */});  // Return nullptr if we couldn't
+                                                                  // validate which metric table
+                                                                  // user is requesting
  }

  AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
@@ -1310,10 +1362,12 @@ class GpuMetricsBase_v18_t final : public GpuMetricsBase_t {
 private:
  AMDGpuMetrics_v18_t m_gpu_metrics_tbl;
  std::shared_ptr<AMDGpuMetrics_v18_t> m_gpu_metric_ptr;
+  AMDGpuMetrics_v18_Partition_v1_0_t m_gpu_metrics_partition_tbl;
+  std::shared_ptr<AMDGpuMetrics_v18_Partition_v1_0_t> m_gpu_metric_partition_ptr;
 };

 class GpuMetricsBaseDynamic_t final : public GpuMetricsBase_t {
-  public:
+ public:
    ~GpuMetricsBaseDynamic_t() = default;

  // Unused
@@ -1341,7 +1395,7 @@ class GpuMetricsBaseDynamic_t final : public GpuMetricsBase_t {

  AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;

-  private:
+ private:
    AMDGpuDynamicMetrics_t m_dyn;
    details::AMDGpuDynamicMetricsHeader_v1_t m_header{};

@@ -114,6 +114,7 @@ static const char *kDevXGMIErrorFName = "xgmi_error";
 static const char *kDevSerialNumberFName = "serial_number";
 static const char *kDevNumaNodeFName = "numa_node";
 static const char *kDevGpuMetricsFName = "gpu_metrics";
+static const char *kDevGpuPartitionMetricsFName = "xcp/xcp_metrics";
 static const char *kDevPmMetricsFName = "pm_metrics";   // PM log
 static const char *kDevRegMetricsFName = "reg_state";   // register table
 static const char *kDevBaseBoardTempMetricsFName = "board/baseboard_temp";
@@ -321,6 +322,7 @@ static const std::map<DevInfoTypes, const char *> kDevAttribNameMap = {
    {kDevMemPageBad, kDevMemPageBadFName},
    {kDevNumaNode, kDevNumaNodeFName},
    {kDevGpuMetrics, kDevGpuMetricsFName},
+    {kdevGpuPartitionMetrics, kDevGpuPartitionMetricsFName},
    {kDevPmMetrics, kDevPmMetricsFName},
    {kDevSocPstate, kDevSocPstateFName},
    {kDevXgmiPlpd, kDevXgmiPlpdFName},
@@ -498,6 +500,7 @@ Device::devInfoTypesStrings = {
  {kDevMemPageBad, "kDevMemPageBad"},
  {kDevNumaNode, "kDevNumaNode"},
  {kDevGpuMetrics, "kDevGpuMetrics"},
+  {kdevGpuPartitionMetrics, "kdevGpuPartitionMetrics"},
  {kDevPmMetrics, "kDevPmMetrics"},
  {kDevRegMetrics, "kDevRegMetrics"},
  {kDevBaseBoardTempMetrics, "kDevBaseBoardTempMetrics"},
@@ -747,10 +750,29 @@ int Device::openDebugFileStream(DevInfoTypes type, T *fs, const char *str) {
  return 0;
 }

-std::string Device::get_sys_file_path_by_type(DevInfoTypes type) const {
+/**
+ * @brief Get the sysfs file path for a given device attribute type.
+ *
+ * This function constructs the full path to a sysfs file corresponding to the specified
+ * device attribute type for this device instance. The path is constructed using the device's
+ * base path, appending "/device/" and the attribute name from kDevAttribNameMap.
+ *
+ * If getPathOnly is true, the constructed path is returned without checking for file existence.
+ * If getPathOnly is false, the function checks if the file exists; if not, an empty string is returned.
+ *
+ * @param type        The device attribute type (DevInfoTypes) for which to get the sysfs file path.
+ * @param getPathOnly If true, return the constructed path without checking for file existence.
+ *                    If false, return an empty string if the file does not exist.
+ * @return std::string The full sysfs file path, or an empty string if the file does not exist
+ *                     and getPathOnly is false.
+ */
+std::string Device::get_sys_file_path_by_type(DevInfoTypes type, bool getPathOnly) const {
  auto sysfs_path = path_;
  sysfs_path += "/device/";
  sysfs_path += kDevAttribNameMap.at(type);
+  if (getPathOnly) {
+    return sysfs_path;
+  }

  if (access(sysfs_path.c_str(), F_OK) != 0) {
      sysfs_path.clear();
@@ -1133,7 +1155,6 @@ int Device::readDevInfoBinary(DevInfoTypes type, std::size_t b_size,
  // is the issue, so should remain.
  const std::string key = path_ + "/device/" + kDevAttribNameMap.at(type)
                                + "#" + std::to_string(b_size);
-                                
  GpuMetricsCache* cache_ptr = nullptr;
  {
    std::lock_guard<std::mutex> map_lk(g_gpu_metrics_cache_map_mu);
@@ -1447,6 +1468,7 @@ int Device::readDevInfo(DevInfoTypes type, std::size_t b_size,

  switch (type) {
     case kDevGpuMetrics:
+     case kdevGpuPartitionMetrics:
      return readDevInfoBinary(type, b_size, p_binary_data);
      break;

@@ -1,46 +1,23 @@
 /*
- * MIT License
- *
 * Copyright (c) Advanced Micro Devices, Inc. All rights reserved.
 *
- *  Developed by:
- *
- *                  AMD ML Software Engineering
- *
- *                  Advanced Micro Devices, Inc.
- *
- *                  www.amd.com
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- *  - Redistributions of source code must retain the above copyright notice,
- *    this list of conditions and the following disclaimers.
- *  - Redistributions in binary form must reproduce the above copyright
- *    notice, this list of conditions and the following disclaimers in
- *    the documentation and/or other materials provided with the distribution.
- *  - Neither the names of Advanced Micro Devices, Inc,
- *    nor the names of its contributors may be used to endorse or promote
- *    products derived from this Software without specific prior written
- *    permission.
- *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- *
- *
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
 */

 #include "rocm_smi/rocm_smi.h"
@@ -156,7 +133,7 @@ static inline std::optional<AMDGpuMetricAttributeValue_t> read_metric_value(Curs

 auto AMDGpuDynamicMetrics_t::parse_from_buffer(const std::byte* data,
                                              std::size_t size) noexcept -> rsmi_status_t {
-
+  std::ostringstream ss;
  rsmi_status_t status = RSMI_STATUS_SUCCESS;
  if (!data || (size < (sizeof(AMDGpuDynamicMetricsHeader_v1_t) + sizeof(uint32_t)))) {
      return RSMI_STATUS_INSUFFICIENT_SIZE;
@@ -178,6 +155,17 @@ auto AMDGpuDynamicMetrics_t::parse_from_buffer(const std::byte* data,
  if (attr_count == 0 || attr_count > size){
    return RSMI_STATUS_UNEXPECTED_SIZE;
  }
+  std::string m_header_version_str = std::to_string(static_cast<uint32_t>(hdr.m_format_revision))
+                                     + "." +
+                                     std::to_string(static_cast<uint32_t>(hdr.m_content_revision));
+  ss << __PRETTY_FUNCTION__
+     << " | Info: Dynamic GPU Metrics"
+     << " | Attr Count: " << attr_count
+     << " | Header Version: " << m_header_version_str
+     << " | Header Size: " << hdr.get_size()
+     << " | Total Size: " << size
+     << " |";
+  LOG_TRACE(ss);

  details::AMDGpuMetricSchemaType_t metrics_data;
  metrics_data.reserve(attr_count);
@@ -212,7 +200,6 @@ auto AMDGpuDynamicMetrics_t::parse_from_buffer(const std::byte* data,
    AMDGpuMetricAttributeInstance_t inst{};
    status = schema_lookup_instance(attr_id, attr_type, inst);
    if (status != RSMI_STATUS_SUCCESS){
-      std::ostringstream ss;
      ss << __PRETTY_FUNCTION__
        << " | Warn: schema lookup miss"
        << " | Attr ID: "   << static_cast<std::underlying_type_t<AMDGpuMetricAttributeId_t>>(attr_id)
@@ -3352,13 +3352,27 @@ amdsmi_get_gpu_metrics_header_info(amdsmi_processor_handle processor_handle,
                    reinterpret_cast<metrics_table_header_t*>(header_value));
 }

+amdsmi_status_t  amdsmi_get_gpu_partition_metrics_info(
+        amdsmi_processor_handle processor_handle,
+        amdsmi_gpu_metrics_t *pgpu_metrics) {
+    AMDSMI_CHECK_INIT();
+    if (pgpu_metrics != nullptr) {
+        *pgpu_metrics = amdsmi_gpu_metrics_t{};  // Use a default initializer for the struct
+    } else {
+        return AMDSMI_STATUS_INVAL;  // Return error if pgpu_metrics is null
+    }
+    return rsmi_wrapper(rsmi_dev_gpu_partition_metrics_info_get, processor_handle, 0,
+                       reinterpret_cast<rsmi_gpu_metrics_t*>(pgpu_metrics));
+}
+
 amdsmi_status_t  amdsmi_get_gpu_metrics_info(
        amdsmi_processor_handle processor_handle,
        amdsmi_gpu_metrics_t *pgpu_metrics) {
    AMDSMI_CHECK_INIT();
-    // nullptr api supported
    if (pgpu_metrics != nullptr) {
        *pgpu_metrics = amdsmi_gpu_metrics_t{};  // Use a default initializer for the struct
+    } else {
+        return AMDSMI_STATUS_INVAL;  // Return error if pgpu_metrics is null
    }
    return rsmi_wrapper(rsmi_dev_gpu_metrics_info_get, processor_handle, 0,
                       reinterpret_cast<rsmi_gpu_metrics_t*>(pgpu_metrics));
@@ -52,6 +52,11 @@ include_directories(${TEST} ${CMAKE_CURRENT_SOURCE_DIR}/.. ${ROCM_INC_DIR}/..)
 add_executable(${TEST} ${tstSources} ${functionalSources})
 target_link_libraries(${TEST} ${AMD_SMI} GTest::gtest_main c stdc++ pthread)

+if (CMAKE_CXX_COMPILER_ID STREQUAL "GNU"
+    AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS "9")
+  target_link_libraries(${TEST} stdc++fs)
+endif()
+
 # Install tests
 install(
    TARGETS ${TEST}
@@ -0,0 +1,203 @@
+/*
+ * Copyright (c) Advanced Micro Devices, Inc. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#include <amd_smi_test/test_base.h>
+#include <gtest/gtest.h>
+
+#include <cstdint>
+#include <filesystem>
+#include <fstream>
+#include <vector>
+
+#include "rocm_smi/rocm_smi_gpu_metrics.h"
+
+namespace amd::smi {
+
+// Forward declarations of internal helpers we exercise in this unit-test.
+AMDGpuMetricVersionFlags_t translate_header_to_flag_version(
+    const AMDGpuMetricsHeader_v1_t& metrics_header, bool is_partition_metrics,
+    const std::string& file_path);
+
+GpuMetricsBasePtr amdgpu_metrics_factory(AMDGpuMetricVersionFlags_t gpu_metric_version,
+                                         bool is_partition_metrics, const std::string& file_path);
+
+}  // namespace amd::smi
+
+namespace {
+// Version helper checker
+auto GetExpectedMetricVersionFlag(uint16_t major, uint16_t minor, bool is_partition_metrics)
+    -> amd::smi::AMDGpuMetricVersionFlags_t {
+  using Flag = amd::smi::AMDGpuMetricVersionFlags_t;
+  if (is_partition_metrics) {
+    if (major == 1) {
+      if (minor == 0) {
+        return Flag::kGpuXcpMetricV10;
+      } else if (minor >= 1) {
+        return Flag::kGpuXcpMetricDynV11Plus;
+      } else {
+        return Flag::kGpuMetricNone;
+      }
+    }
+  } else {  // GPU metrics
+    if (major == 1) {
+      switch (minor) {
+        case 0: return Flag::kGpuMetricNone;
+        case 1: return Flag::kGpuMetricV11;
+        case 2: return Flag::kGpuMetricV12;
+        case 3: return Flag::kGpuMetricV13;
+        case 4: return Flag::kGpuMetricV14;
+        case 5: return Flag::kGpuMetricV15;
+        case 6: return Flag::kGpuMetricV16;
+        case 7: return Flag::kGpuMetricV17;
+        case 8: return Flag::kGpuMetricV18;
+        default: return Flag::kGpuMetricDynV19Plus;
+      }
+    }
+  }
+  return Flag::kGpuMetricNone;
+}
+
+// pass a header we want to test against
+auto BuildFakeMetricsBlob(amd::smi::AMDGpuMetricsHeader_v1_t new_header) -> std::vector<uint8_t> {
+  if (new_header.m_structure_size < sizeof(new_header)) {
+    throw std::runtime_error("Header size too small");
+  }
+  amd::smi::AMDGpuMetricsHeader_v1_t header{};
+  header.m_structure_size = static_cast<uint16_t>(sizeof(header));
+  header.m_format_revision = new_header.m_format_revision;
+  header.m_content_revision = new_header.m_content_revision;
+
+  const uint8_t* begin = reinterpret_cast<const uint8_t*>(&header);
+  return std::vector<uint8_t>(begin, begin + sizeof(header));
+}
+
+auto WriteBlobToTempFile(const std::vector<uint8_t>& blob,
+                         const std::string& filename = "amdsmi_fake_metrics.bin")
+    -> std::filesystem::path {
+  auto temp_dir = std::filesystem::temp_directory_path();
+  auto file_path = temp_dir / filename;
+
+  std::ofstream stream(file_path, std::ios::binary | std::ios::trunc);
+  stream.write(reinterpret_cast<const char*>(blob.data()),
+               static_cast<std::streamsize>(blob.size()));
+  stream.close();
+
+  return file_path;
+}
+
+}  // namespace
+
+TEST(AmdSmiDynamicMetricTest, GPUMetricDynamicVersionSupported) {
+  const bool is_partition_metrics = false;
+  for (auto ver : {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18}) {
+    std::string test_detail = "[GPUMetric";
+    if (ver >= 9) {
+      test_detail += "Dynamic] ";
+    } else {
+      test_detail += "] ";
+    }
+    std::cout << test_detail << "Checking version 1." << ver << std::endl;
+    SCOPED_TRACE(testing::Message() << "Subtest for minor version: 1." << ver);
+    const auto blob = BuildFakeMetricsBlob(amd::smi::AMDGpuMetricsHeader_v1_t{
+        .m_structure_size = sizeof(amd::smi::AMDGpuMetricsHeader_v1_t),
+        .m_format_revision = 1,
+        .m_content_revision = static_cast<uint16_t>(ver),  // Known minor versions
+    });
+    const auto fake_path =
+        WriteBlobToTempFile(blob, "amdsmi_fake_gpu_metrics_v1" + std::to_string(ver) + ".bin");
+
+    ASSERT_FALSE(blob.empty());
+    ASSERT_TRUE(std::filesystem::exists(fake_path));
+
+    const auto* header = reinterpret_cast<const amd::smi::AMDGpuMetricsHeader_v1_t*>(blob.data());
+
+    const auto flag = amd::smi::translate_header_to_flag_version(*header, is_partition_metrics,
+                                                                 fake_path.string());
+
+    EXPECT_EQ(flag, GetExpectedMetricVersionFlag(1, ver, is_partition_metrics))
+        << "Version 1." << ver << " should be treated as supported";
+
+    auto gpu_metrics_ptr =
+        amd::smi::amdgpu_metrics_factory(flag, is_partition_metrics, fake_path.string());
+
+    if (ver != 0) {
+      EXPECT_NE(gpu_metrics_ptr, nullptr)
+          << "Factory must create metrics object for supported version";
+    } else {
+      EXPECT_EQ(gpu_metrics_ptr, nullptr)
+          << "Factory must not create metrics object for unsupported versions";
+    }
+    if (gpu_metrics_ptr) {
+      std::cout << test_detail << "Created valid object for version 1." << ver << std::endl;
+    } else {
+      std::cout << test_detail << "Unsupported Metric Version"
+                << " | Failed to create valid object for version 1." << ver << std::endl;
+    }
+
+    std::filesystem::remove(fake_path);
+  }
+}
+
+TEST(AmdSmiDynamicMetricTest, XCPMetricDynamicVersionSupported) {
+  const bool is_partition_metrics = true;
+  for (auto ver : {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18}) {
+    std::string test_detail = "[XCPMetric";
+    if (ver >= 1) {
+      test_detail += "Dynamic] ";
+    } else {
+      test_detail += "] ";
+    }
+    std::cout << test_detail << "Checking version 1." << ver << std::endl;
+    SCOPED_TRACE(testing::Message() << "Subtest for minor version: 1." << ver);
+    const auto blob = BuildFakeMetricsBlob(amd::smi::AMDGpuMetricsHeader_v1_t{
+        .m_structure_size = sizeof(amd::smi::AMDGpuMetricsHeader_v1_t),
+        .m_format_revision = 1,
+        .m_content_revision = static_cast<uint16_t>(ver),  // Known minor versions
+    });
+    const auto fake_path =
+        WriteBlobToTempFile(blob, "amdsmi_fake_xcp_metrics_v1" + std::to_string(ver) + ".bin");
+
+    ASSERT_FALSE(blob.empty());
+    ASSERT_TRUE(std::filesystem::exists(fake_path));
+
+    const auto* header = reinterpret_cast<const amd::smi::AMDGpuMetricsHeader_v1_t*>(blob.data());
+
+    const auto flag = amd::smi::translate_header_to_flag_version(*header, is_partition_metrics,
+                                                                 fake_path.string());
+
+    EXPECT_EQ(flag, GetExpectedMetricVersionFlag(1, ver, is_partition_metrics))
+        << "Version 1." << ver << " should be treated as supported";
+
+    auto xcp_metrics_ptr =
+        amd::smi::amdgpu_metrics_factory(flag, is_partition_metrics, fake_path.string());
+
+    EXPECT_NE(xcp_metrics_ptr, nullptr)
+        << "Factory must create metrics object for supported version";
+    if (xcp_metrics_ptr) {
+      std::cout << test_detail << "Created valid object for version 1." << ver << std::endl;
+    } else {
+      std::cout << test_detail << "Failed to create valid object for version 1." << ver
+                << std::endl;
+    }
+
+    std::filesystem::remove(fake_path);
+  }
+}
@@ -0,0 +1,426 @@
+/*
+ * Copyright (c) Advanced Micro Devices, Inc. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include <cstdint>
+
+#include <iostream>
+#include <iterator>
+#include <string>
+#include <map>
+
+#include <gtest/gtest.h>
+#include "amd_smi/amdsmi.h"
+#include "gpu_partition_metrics_read.h"
+#include "../test_common.h"
+#include "rocm_smi/rocm_smi_utils.h"
+#include "amd_smi/impl/amd_smi_utils.h"
+
+
+TestGpuPartitionMetricsRead::TestGpuPartitionMetricsRead() : TestBase() {
+  set_title("AMDSMI GPU Partition (XCP) Metrics Read Test");
+  set_description("The GPU Partition (XCP) Metrics tests verifies that "
+                  "the gpu metrics info can be read properly.");
+}
+
+TestGpuPartitionMetricsRead::~TestGpuPartitionMetricsRead(void) {
+}
+
+void TestGpuPartitionMetricsRead::SetUp(void) {
+  TestBase::SetUp();
+  return;
+}
+
+void TestGpuPartitionMetricsRead::DisplayTestInfo(void) {
+  TestBase::DisplayTestInfo();
+}
+
+void TestGpuPartitionMetricsRead::DisplayResults(void) const {
+  TestBase::DisplayResults();
+  return;
+}
+
+void TestGpuPartitionMetricsRead::Close() {
+  // This will close handles opened within amdsmitst utility calls and call
+  // amdsmi_shut_down(), so it should be done after other hsa cleanup
+  TestBase::Close();
+}
+
+
+
+void TestGpuPartitionMetricsRead::Run(void) {
+  amdsmi_status_t err;
+
+  TestBase::Run();
+  if (setup_failed_) {
+    std::cout << "** SetUp Failed for this test. Skipping.**" << std::endl;
+    return;
+  }
+
+  for (uint32_t i = 0; i < num_monitor_devs(); ++i) {
+    PrintDeviceHeader(processor_handles_[i]);
+    std::cout << "Device #" << std::to_string(i) << "\n";
+
+    IF_VERB(STANDARD) {
+        std::cout << "\n\n";
+        std::cout << "\t**GPU PARTITION METRICS: Using static struct (Backwards Compatibility):\n";
+    }
+    amdsmi_gpu_metrics_t smu = {};
+    err =  amdsmi_get_gpu_partition_metrics_info(processor_handles_[i], &smu);
+    const char *status_string;
+    amdsmi_status_code_to_string(err, &status_string);
+    std::cout << "\t\t** amdsmi_get_gpu_partition_metrics_info(): " << status_string
+    << "\n";
+    if (err != AMDSMI_STATUS_SUCCESS) {
+      if (err == AMDSMI_STATUS_NOT_SUPPORTED) {
+        IF_VERB(STANDARD) {
+          std::cout << "\t**" <<
+          "Not supported on this machine" << std::endl;
+          continue;
+        }
+      }
+      CHK_ERR_ASRT(err);  // Anything else should be a failure
+                          // (ie, we are not handling the metrics right/etc..)
+    } else {
+      IF_VERB(STANDARD) {
+          std::cout << "METRIC TABLE HEADER:\n";
+          std::cout << "structure_size=" << std::dec
+          << static_cast<uint16_t>(smu.common_header.structure_size) << "\n";
+          std::cout << "format_revision=" << std::dec
+          << static_cast<uint16_t>(smu.common_header.format_revision) << "\n";
+          std::cout << "content_revision=" << std::dec
+          << static_cast<uint16_t>(smu.common_header.content_revision) << "\n";
+
+          std::cout << "\n";
+          std::cout << "TIME STAMPS (ns):\n";
+          std::cout << std::dec << "system_clock_counter=" << smu.system_clock_counter << "\n";
+          std::cout << "firmware_timestamp (10ns resolution)=" << std::dec << smu.firmware_timestamp
+                    << "\n";
+
+          std::cout << "\n";
+          std::cout << "TEMPERATURES (C):\n";
+          std::cout << std::dec << "temperature_edge= " << smu.temperature_edge << "\n";
+          std::cout << std::dec << "temperature_hotspot= " << smu.temperature_hotspot << "\n";
+          std::cout << std::dec << "temperature_mem= " << smu.temperature_mem << "\n";
+          std::cout << std::dec << "temperature_vrgfx= " << smu.temperature_vrgfx << "\n";
+          std::cout << std::dec << "temperature_vrsoc= " << smu.temperature_vrsoc << "\n";
+          std::cout << std::dec << "temperature_vrmem= " << smu.temperature_vrmem << "\n";
+          std::cout << "temperature_hbm = [";
+          std::copy(std::begin(smu.temperature_hbm),
+                    std::end(smu.temperature_hbm),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+          std::cout << std::dec << "]\n";
+
+          std::cout << "\n";
+          std::cout << "UTILIZATION (%):\n";
+          std::cout << std::dec << "average_gfx_activity=" << smu.average_gfx_activity << "\n";
+          std::cout << std::dec << "average_umc_activity=" << smu.average_umc_activity << "\n";
+          std::cout << std::dec << "average_mm_activity=" << smu.average_mm_activity << "\n";
+          std::cout << std::dec << "vcn_activity= [";
+          std::copy(std::begin(smu.vcn_activity),
+                    std::end(smu.vcn_activity),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+          std::cout << std::dec << "]\n";
+
+          std::cout << "\n";
+          std::cout << std::dec << "jpeg_activity= [";
+          std::copy(std::begin(smu.jpeg_activity),
+                    std::end(smu.jpeg_activity),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+          std::cout << std::dec << "]\n";
+
+          std::cout << "\n";
+          std::cout << "POWER (W)/ENERGY (15.259uJ per 1ns):\n";
+          std::cout << std::dec << "average_socket_power=" << smu.average_socket_power << "\n";
+          std::cout << std::dec << "current_socket_power=" << smu.current_socket_power << "\n";
+          std::cout << std::dec << "energy_accumulator=" << smu.energy_accumulator << "\n";
+
+          std::cout << "\n";
+          std::cout << "AVG CLOCKS (MHz):\n";
+          std::cout << std::dec << "average_gfxclk_frequency=" << smu.average_gfxclk_frequency
+                    << "\n";
+          std::cout << std::dec << "average_gfxclk_frequency=" << smu.average_gfxclk_frequency
+                    << "\n";
+          std::cout << std::dec << "average_uclk_frequency=" << smu.average_uclk_frequency << "\n";
+          std::cout << std::dec << "average_vclk0_frequency=" << smu.average_vclk0_frequency
+                    << "\n";
+          std::cout << std::dec << "average_dclk0_frequency=" << smu.average_dclk0_frequency
+                    << "\n";
+          std::cout << std::dec << "average_vclk1_frequency=" << smu.average_vclk1_frequency
+                    << "\n";
+          std::cout << std::dec << "average_dclk1_frequency=" << smu.average_dclk1_frequency
+                    << "\n";
+
+          std::cout << "\n";
+          std::cout << "CURRENT CLOCKS (MHz):\n";
+          std::cout << std::dec << "current_gfxclk=" << smu.current_gfxclk << "\n";
+          std::cout << std::dec << "current_gfxclks= [";
+          std::copy(std::begin(smu.current_gfxclks),
+                    std::end(smu.current_gfxclks),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+          std::cout << std::dec << "]\n";
+
+          std::cout << std::dec << "current_socclk=" << smu.current_socclk << "\n";
+          std::cout << std::dec << "current_socclks= [";
+          std::copy(std::begin(smu.current_socclks),
+                    std::end(smu.current_socclks),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+          std::cout << std::dec << "]\n";
+
+          std::cout << std::dec << "current_uclk=" << smu.current_uclk << "\n";
+          std::cout << std::dec << "current_vclk0=" << smu.current_vclk0 << "\n";
+          std::cout << std::dec << "current_vclk0s= [";
+          std::copy(std::begin(smu.current_vclk0s),
+                    std::end(smu.current_vclk0s),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+          std::cout << std::dec << "]\n";
+
+          std::cout << std::dec << "current_dclk0=" << smu.current_dclk0 << "\n";
+          std::cout << std::dec << "current_dclk0s= [";
+          std::copy(std::begin(smu.current_dclk0s),
+                    std::end(smu.current_dclk0s),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+          std::cout << std::dec << "]\n";
+
+          std::cout << std::dec << "current_vclk1=" << smu.current_vclk1 << "\n";
+          std::cout << std::dec << "current_dclk1=" << smu.current_dclk1 << "\n";
+
+          std::cout << "\n";
+          std::cout << "TROTTLE STATUS:\n";
+          std::cout << std::dec << "throttle_status=" << smu.throttle_status << "\n";
+
+          std::cout << "\n";
+          std::cout << "FAN SPEED:\n";
+          std::cout << std::dec << "current_fan_speed=" << smu.current_fan_speed << "\n";
+
+          std::cout << "\n";
+          std::cout << "LINK WIDTH (number of lanes) /SPEED (0.1 GT/s):\n";
+          std::cout << "pcie_link_width=" << smu.pcie_link_width << "\n";
+          std::cout << "pcie_link_speed=" << smu.pcie_link_speed << "\n";
+          std::cout << "xgmi_link_width=" << smu.xgmi_link_width << "\n";
+          std::cout << "xgmi_link_speed=" << smu.xgmi_link_speed << "\n";
+
+          std::cout << "\n";
+          std::cout << "Utilization Accumulated(%):\n";
+          std::cout << "gfx_activity_acc=" << std::dec << smu.gfx_activity_acc << "\n";
+          std::cout << "mem_activity_acc=" << std::dec << smu.mem_activity_acc  << "\n";
+
+          std::cout << "\n";
+          std::cout << "XGMI ACCUMULATED DATA TRANSFER SIZE (KB):\n";
+          std::cout << std::dec << "xgmi_read_data_acc= [";
+          std::copy(std::begin(smu.xgmi_read_data_acc),
+                    std::end(smu.xgmi_read_data_acc),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+          std::cout << std::dec << "]\n";
+
+          std::cout << std::dec << "xgmi_write_data_acc= [";
+          std::copy(std::begin(smu.xgmi_write_data_acc),
+                    std::end(smu.xgmi_write_data_acc),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+          std::cout << std::dec << "]\n";
+
+          std::cout << std::dec << "xgmi_link_status= [";
+          std::copy(std::begin(smu.xgmi_link_status),
+                    std::end(smu.xgmi_link_status),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+          std::cout << std::dec << "]\n";
+
+          // Voltage (mV)
+          std::cout << "voltage_soc = " << std::dec << smu.voltage_soc << "\n";
+          std::cout << "voltage_gfx = " << std::dec << smu.voltage_gfx << "\n";
+          std::cout << "voltage_mem = " << std::dec << smu.voltage_mem << "\n";
+
+          std::cout << "indep_throttle_status = " << std::dec << smu.indep_throttle_status << "\n";
+
+          // Clock Lock Status. Each bit corresponds to clock instance
+          std::cout << "gfxclk_lock_status (in hex) = " << std::hex
+                    << smu.gfxclk_lock_status << std::dec <<"\n";
+
+          // Bandwidth (GB/sec)
+          std::cout << "pcie_bandwidth_acc=" << std::dec << smu.pcie_bandwidth_acc << "\n";
+          std::cout << "pcie_bandwidth_inst=" << std::dec << smu.pcie_bandwidth_inst << "\n";
+
+          // VRAM max bandwidth at max memory clock (GB/sec)
+          std::cout << "vram_max_bandwidth=" << std::dec << smu.vram_max_bandwidth << "\n";
+
+          // Counts
+          std::cout << "pcie_l0_to_recov_count_acc= " << std::dec << smu.pcie_l0_to_recov_count_acc
+                    << "\n";
+          std::cout << "pcie_replay_count_acc= " << std::dec << smu.pcie_replay_count_acc << "\n";
+          std::cout << "pcie_replay_rover_count_acc= " << std::dec
+                    << smu.pcie_replay_rover_count_acc << "\n";
+          std::cout << "pcie_nak_sent_count_acc= " << std::dec << smu.pcie_nak_sent_count_acc
+                    << "\n";
+          std::cout << "pcie_nak_rcvd_count_acc= " << std::dec << smu.pcie_nak_rcvd_count_acc
+                    << "\n";
+
+          // Accumulation cycle counter
+          // Accumulated throttler residencies
+          std::cout << "\n";
+          std::cout << "RESIDENCY ACCUMULATION / COUNTER:\n";
+          std::cout << "accumulation_counter = " << std::dec << smu.accumulation_counter << "\n";
+          std::cout << "prochot_residency_acc = " << std::dec << smu.prochot_residency_acc << "\n";
+          std::cout << "ppt_residency_acc = " << std::dec << smu.ppt_residency_acc << "\n";
+          std::cout << "socket_thm_residency_acc = " << std::dec << smu.socket_thm_residency_acc
+                    << "\n";
+          std::cout << "vr_thm_residency_acc = " << std::dec << smu.vr_thm_residency_acc
+                    << "\n";
+          std::cout << "hbm_thm_residency_acc = " << std::dec << smu.hbm_thm_residency_acc << "\n";
+
+          // Number of current partitions
+          std::cout << "num_partition = " << std::dec << smu.num_partition << "\n";
+
+          // PCIE other end recovery counter
+          std::cout << "pcie_lc_perf_other_end_recovery = "
+                    << std::dec << smu.pcie_lc_perf_other_end_recovery << "\n";
+
+          std::cout << std::dec << "xcp_stats.gfx_busy_inst = \n";
+          auto xcp = 0;
+          for (auto& row : smu.xcp_stats) {
+            std::cout << "XCP[" << xcp << "] = " << "[ ";
+            std::copy(std::begin(row.gfx_busy_inst),
+                    std::end(row.gfx_busy_inst),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+            std::cout << " ]\n";
+            xcp++;
+          }
+
+          xcp = 0;
+          std::cout << std::dec << "xcp_stats.jpeg_busy = \n";
+          for (auto& row : smu.xcp_stats) {
+            std::cout << "XCP[" << xcp << "] = " << "[ ";
+            std::copy(std::begin(row.jpeg_busy),
+                    std::end(row.jpeg_busy),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+            std::cout << " ]\n";
+            xcp++;
+          }
+
+          xcp = 0;
+          std::cout << std::dec << "xcp_stats.vcn_busy = \n";
+          for (auto& row : smu.xcp_stats) {
+            std::cout << "XCP[" << xcp << "] = " << "[ ";
+            std::copy(std::begin(row.vcn_busy),
+                    std::end(row.vcn_busy),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+            std::cout << " ]\n";
+            xcp++;
+          }
+
+          xcp = 0;
+          std::cout << std::dec << "xcp_stats.gfx_busy_acc = \n";
+          for (auto& row : smu.xcp_stats) {
+            std::cout << "XCP[" << xcp << "] = " << "[ ";
+            std::copy(std::begin(row.gfx_busy_acc),
+                    std::end(row.gfx_busy_acc),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+            std::cout << " ]\n";
+            xcp++;
+          }
+
+          xcp = 0;
+          std::cout << std::dec << "xcp_stats.gfx_below_host_limit_acc = \n";
+          for (auto& row : smu.xcp_stats) {
+            std::cout << "XCP[" << xcp << "] = " << "[ ";
+            std::copy(std::begin(row.gfx_below_host_limit_acc),
+                    std::end(row.gfx_below_host_limit_acc),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+            std::cout << " ]\n";
+            xcp++;
+          }
+          // new for gpu metrics v1.8
+          xcp = 0;
+          std::cout << std::dec << "xcp_stats.gfx_below_host_limit_ppt_acc = \n";
+          for (auto& row : smu.xcp_stats) {
+            std::cout << "XCP[" << xcp << "] = " << "[ ";
+            std::copy(std::begin(row.gfx_below_host_limit_ppt_acc),
+                    std::end(row.gfx_below_host_limit_ppt_acc),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+            std::cout << " ]\n";
+            xcp++;
+          }
+
+          xcp = 0;
+          std::cout << std::dec << "xcp_stats.gfx_below_host_limit_thm_acc = \n";
+          for (auto& row : smu.xcp_stats) {
+            std::cout << "XCP[" << xcp << "] = " << "[ ";
+            std::copy(std::begin(row.gfx_below_host_limit_thm_acc),
+                    std::end(row.gfx_below_host_limit_thm_acc),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+            std::cout << " ]\n";
+            xcp++;
+          }
+
+          xcp = 0;
+          std::cout << std::dec << "xcp_stats.gfx_low_utilization_acc = \n";
+          for (auto& row : smu.xcp_stats) {
+            std::cout << "XCP[" << xcp << "] = " << "[ ";
+            std::copy(std::begin(row.gfx_low_utilization_acc),
+                    std::end(row.gfx_low_utilization_acc),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+            std::cout << " ]\n";
+            xcp++;
+          }
+
+          xcp = 0;
+          std::cout << std::dec << "xcp_stats.gfx_below_host_limit_total_acc = \n";
+          for (auto& row : smu.xcp_stats) {
+            std::cout << "XCP[" << xcp << "] = " << "[ ";
+            std::copy(std::begin(row.gfx_below_host_limit_total_acc),
+                    std::end(row.gfx_below_host_limit_total_acc),
+                    amd::smi::make_ostream_joiner(&std::cout, ", "));
+            std::cout << " ]\n";
+            xcp++;
+          }
+
+          std::cout << "\n\n";
+          std::cout << "\t ** -> Checking metrics with constant changes ** " << "\n";
+          constexpr uint16_t kMAX_ITER_TEST = 10;
+          amdsmi_gpu_metrics_t gpu_xcp_metrics_check = {};
+          for (auto idx = uint16_t(1); idx <= kMAX_ITER_TEST; ++idx) {
+              amdsmi_get_gpu_metrics_info(processor_handles_[i], &gpu_xcp_metrics_check);
+              std::cout << "\t\t -> firmware_timestamp [" << idx << "/" << kMAX_ITER_TEST << "]: "
+                        << gpu_xcp_metrics_check.firmware_timestamp << "\n";
+          }
+
+          std::cout << "\n";
+          for (auto idx = uint16_t(1); idx <= kMAX_ITER_TEST; ++idx) {
+              amdsmi_get_gpu_partition_metrics_info(processor_handles_[i], &gpu_xcp_metrics_check);
+              std::cout << "\t\t -> system_clock_counter [" << idx << "/" << kMAX_ITER_TEST << "]: "
+                        << gpu_xcp_metrics_check.system_clock_counter << "\n";
+          }
+
+          std::cout << "\n";
+          std::cout << " ** Note: Values MAX'ed out "
+                    << "(UINTX MAX are unsupported for the version in question) ** " << "\n\n";
+      }
+    }
+
+    // Verify api support checking functionality is working
+    err =  amdsmi_get_gpu_partition_metrics_info(processor_handles_[i], nullptr);
+    if (err !=AMDSMI_STATUS_INVAL) {
+      DISPLAY_AMDSMI_ERR(err);
+    }
+    amdsmi_status_code_to_string(err, &status_string);
+    std::cout << "\t\t** amdsmi_get_gpu_partition_metrics_info(nullptr check): " << status_string << "\n";
+    ASSERT_EQ(err, AMDSMI_STATUS_INVAL);
+  }
+}
@@ -0,0 +1,51 @@
+/*
+ * Copyright (c) Advanced Micro Devices, Inc. All rights reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef TESTS_AMD_SMI_TEST_FUNCTIONAL_GPU_PARTITION_METRICS_READ_H_
+#define TESTS_AMD_SMI_TEST_FUNCTIONAL_GPU_PARTITION_METRICS_READ_H_
+
+#include "../test_base.h"
+
+class TestGpuPartitionMetricsRead : public TestBase {
+ public:
+    TestGpuPartitionMetricsRead();
+
+  // @Brief: Destructor for test case of TestGpuPartitionMetricsRead
+  virtual ~TestGpuPartitionMetricsRead();
+
+  // @Brief: Setup the environment for measurement
+  virtual void SetUp();
+
+  // @Brief: Core measurement execution
+  virtual void Run();
+
+  // @Brief: Clean up and retrive the resource
+  virtual void Close();
+
+  // @Brief: Display  results
+  virtual void DisplayResults() const;
+
+  // @Brief: Display information about what this test does
+  virtual void DisplayTestInfo(void);
+};
+
+#endif  // TESTS_AMD_SMI_TEST_FUNCTIONAL_GPU_PARTITION_METRICS_READ_H_
@@ -37,6 +37,7 @@
 #include "functional/process_info_read.h"
 #include "functional/gpu_busy_read.h"
 #include "functional/gpu_metrics_read.h"
+#include "functional/gpu_partition_metrics_read.h"
 #include "functional/err_cnt_read.h"
 #include "functional/power_read.h"
 #include "functional/power_read_write.h"
@@ -224,6 +225,10 @@ TEST(amdsmitstReadOnly, TestGpuMetricsRead) {
  TestGpuMetricsRead tst;
  RunGenericTest(&tst);
 }
+TEST(amdsmitstReadOnly, TestGpuPartitionMetricsRead) {
+  TestGpuPartitionMetricsRead tst;
+  RunGenericTest(&tst);
+}
 TEST(amdsmitstReadOnly, TestMetricsCounterRead) {
  TestMetricsCounterRead tst;
  RunGenericTest(&tst);
@@ -282,7 +282,23 @@ void TestBase::PrintDeviceHeader(amdsmi_processor_handle dv_ind) {
    }
  }

-  std::cout << std::setbase(10);
+  amdsmi_kfd_info_t kfd_info;
+  err = amdsmi_get_gpu_kfd_info(dv_ind, &kfd_info);
+  if (err == AMDSMI_STATUS_NOT_SUPPORTED) {
+    IF_VERB(STANDARD) {
+      std::cout << "\t**KFD info: " << smi_amdgpu_get_status_string(err, false) << std::endl;
+    }
+    ASSERT_EQ(err, AMDSMI_STATUS_NOT_SUPPORTED);
+  } else {
+    CHK_ERR_ASRT(err)
+    IF_VERB(STANDARD) {
+      std::cout << "\t**KFD info: " << std::endl;
+      std::cout << "\t\t**GPU ID: " << std::dec << kfd_info.kfd_id << std::endl;
+      std::cout << "\t\t**Node ID: " << std::dec << kfd_info.node_id << std::endl;
+      std::cout << "\t\t**Partition ID: "
+                << std::dec << kfd_info.current_partition_id << std::endl;
+    }
+  }
 }
 void TestBase::Run(void) {
  std::string label;
@@ -1581,8 +1581,6 @@ class TestAmdSmiPython(unittest.TestCase):

    def test_get_gpu_metrics_info(self):
        self._print_func_name('')
-        if self.TODO_SKIP_FAIL:
-            self.skipTest("Skipping test_get_gpu_metrics_info as it fails (MI350X, AMDSMI_STATUS_UNEXPECTED_DATA).")
        for i, gpu in enumerate(self.processors):
            msg = f'gpu({i}):'
            try:
@@ -1595,6 +1593,19 @@ class TestAmdSmiPython(unittest.TestCase):
            raise self.raise_exception
        return

+    def test_get_gpu_partition_metrics_info(self):
+        self._print_func_name('')
+        for i, gpu in enumerate(self.processors):
+            try:
+                msg = f'gpu({i}): '
+                ret = amdsmi.amdsmi_get_gpu_partition_metrics_info(gpu)
+                self._print(msg, ret)
+            except amdsmi.AmdSmiLibraryException as e:
+                if self._check_ret(msg, e, self.PASS):
+                    self.raise_exception = e
+        if self.raise_exception:
+            raise self.raise_exception
+
    def test_get_gpu_od_volt_curve_regions(self):
        self._print_func_name('')
        num_region = 10
@@ -3110,6 +3121,8 @@ class TestAmdSmiPython(unittest.TestCase):

    def test_set_gpu_perf_level(self):
        self._print_func_name('')
+        if self.TODO_SKIP_NOT_COMPLETE:
+            self.skipTest("Skipping test_set_gpu_perf_level as it is not complete.")
        dev_perf_level_current = self.dev_perf_levels[0][1]
        for i, gpu in enumerate(self.processors):
            msg = f'gpu({i}):'