diff --git a/projects/amdsmi/CHANGELOG.md b/projects/amdsmi/CHANGELOG.md index 00728e5308..b7a3a85d55 100644 --- a/projects/amdsmi/CHANGELOG.md +++ b/projects/amdsmi/CHANGELOG.md @@ -4,11 +4,127 @@ Full documentation for amd_smi_lib is available at [https://rocm.docs.amd.com/]( ***All information listed below is for reference and subject to change.*** +## amd_smi_lib for ROCm 6.2.0 + +### Changed + +Output for `amd-smi metric --clock` now lists each clock engine separately and includes fixes for the clock lock status and deep sleep status. + +```shell +$ amd-smi metric --clock +GPU: 0 + CLOCK: + GFX_0: + CLK: 113 MHz + MIN_CLK: 500 MHz + MAX_CLK: 1800 MHz + CLK_LOCKED: DISABLED + DEEP_SLEEP: ENABLED + GFX_1: + CLK: 113 MHz + MIN_CLK: 500 MHz + MAX_CLK: 1800 MHz + CLK_LOCKED: DISABLED + DEEP_SLEEP: ENABLED + GFX_2: + CLK: 112 MHz + MIN_CLK: 500 MHz + MAX_CLK: 1800 MHz + CLK_LOCKED: DISABLED + DEEP_SLEEP: ENABLED + GFX_3: + CLK: 113 MHz + MIN_CLK: 500 MHz + MAX_CLK: 1800 MHz + CLK_LOCKED: DISABLED + DEEP_SLEEP: ENABLED + GFX_4: + CLK: 113 MHz + MIN_CLK: 500 MHz + MAX_CLK: 1800 MHz + CLK_LOCKED: DISABLED + DEEP_SLEEP: ENABLED + GFX_5: + CLK: 113 MHz + MIN_CLK: 500 MHz + MAX_CLK: 1800 MHz + CLK_LOCKED: DISABLED + DEEP_SLEEP: ENABLED + GFX_6: + CLK: 113 MHz + MIN_CLK: 500 MHz + MAX_CLK: 1800 MHz + CLK_LOCKED: DISABLED + DEEP_SLEEP: ENABLED + GFX_7: + CLK: 113 MHz + MIN_CLK: 500 MHz + MAX_CLK: 1800 MHz + CLK_LOCKED: DISABLED + DEEP_SLEEP: ENABLED + MEM_0: + CLK: 900 MHz + MIN_CLK: 900 MHz + MAX_CLK: 1200 MHz + CLK_LOCKED: N/A + DEEP_SLEEP: DISABLED + VCLK_0: + CLK: 29 MHz + MIN_CLK: 914 MHz + MAX_CLK: 1480 MHz + CLK_LOCKED: N/A + DEEP_SLEEP: ENABLED + VCLK_1: + CLK: 29 MHz + MIN_CLK: 914 MHz + MAX_CLK: 1480 MHz + CLK_LOCKED: N/A + DEEP_SLEEP: ENABLED + VCLK_2: + CLK: 29 MHz + MIN_CLK: 914 MHz + MAX_CLK: 1480 MHz + CLK_LOCKED: N/A + DEEP_SLEEP: ENABLED + VCLK_3: + CLK: 29 MHz + MIN_CLK: 914 MHz + MAX_CLK: 1480 MHz + CLK_LOCKED: N/A + DEEP_SLEEP: ENABLED + DCLK_0: + CLK: 22 MHz +
MIN_CLK: 711 MHz + MAX_CLK: 1233 MHz + CLK_LOCKED: N/A + DEEP_SLEEP: ENABLED + DCLK_1: + CLK: 22 MHz + MIN_CLK: 711 MHz + MAX_CLK: 1233 MHz + CLK_LOCKED: N/A + DEEP_SLEEP: ENABLED + DCLK_2: + CLK: 22 MHz + MIN_CLK: 711 MHz + MAX_CLK: 1233 MHz + CLK_LOCKED: N/A + DEEP_SLEEP: ENABLED + DCLK_3: + CLK: 22 MHz + MIN_CLK: 711 MHz + MAX_CLK: 1233 MHz + CLK_LOCKED: N/A + DEEP_SLEEP: ENABLED +``` + ## amd_smi_lib for ROCm 6.1.0 ### Added + - **Added Monitor Command** Provides users the ability to customize GPU metrics to capture, collect, and observe. Output is provided in a table view. This aligns closer to ROCm SMI `rocm-smi` (no argument), additionally allows uers to customize what data is helpful for their use-case. + ```shell $ amd-smi monitor -h usage: amd-smi monitor [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL] @@ -52,6 +168,7 @@ Command Modifiers: --loglevel LEVEL Set the logging level from the possible choices: DEBUG, INFO, WARNING, ERROR, CRITICAL ``` + ```shell $ amd-smi monitor -ptumv GPU POWER GPU_TEMP MEM_TEMP GFX_UTIL GFX_CLOCK MEM_UTIL MEM_CLOCK VRAM_USED VRAM_TOTAL @@ -80,6 +197,7 @@ CPU: 0 INTERFACE_VERSION: PROTO VERSION: 6 ``` + ```shell $ amd-smi metric -O 0 1 2 CORE: 0 @@ -106,6 +224,7 @@ CORE: 2 CORE_ENERGY: VALUE: N/A ``` + ```shell $ amd-smi metric -U all CPU: 0 @@ -212,6 +331,7 @@ CPU: 0 CPU_TEMP: RESPONSE: N/A ``` + - **Added support for new metrics: VCN, JPEG engines, and PCIe errors** Using the AMD SMI tool, users can retreive VCN, JPEG engines, and PCIe errors by calling `amd-smi metric -P` or `amd-smi metric --usage`. Depending on device support, `VCN_ACTIVITY` will update for MI3x ASICs (with 4 separate VCN engine activities) for older asics `MM_ACTIVITY` with UVD/VCN engine activity (average of all engines). `JPEG_ACTIVITY` is a new field for MI3x ASICs, where device can support up to 32 JPEG engine activities. See our documentation for more in-depth understanding of these new fields. 
@@ -230,6 +350,7 @@ GPU: 0 CURRENT_BANDWIDTH_RECEIVED: N/A MAX_PACKET_SIZE: N/A ``` + ```shell $ amd-smi metric --usage GPU: 0 @@ -243,11 +364,13 @@ GPU: 0 0 %, 0 %, 0 %, 0 %] ``` + - **Added AMDSMI Tool Version** AMD SMI will report ***three versions***: AMDSMI Tool, AMDSMI Library version, and ROCm version. The AMDSMI Tool version is the CLI/tool version number with commit ID appended after `+` sign. The AMDSMI Library version is the library package version number. The ROCm version is the system's installed ROCm version, if ROCm is not installed it will report N/A. + ```shell $ amd-smi version AMDSMI Tool: 23.4.2+505b858 | AMDSMI Library version: 24.2.0.0 | ROCm version: 6.1.0 @@ -255,6 +378,7 @@ AMDSMI Tool: 23.4.2+505b858 | AMDSMI Library version: 24.2.0.0 | ROCm version: 6 - **Added XGMI table** Displays XGMI information for AMD GPU devices in a table format. Only available on supported ASICs (eg. MI300). Here users can view read/write data XGMI or PCIe accumulated data transfer size (in KiloBytes). + ```shell $ amd-smi xgmi LINK METRIC TABLE: @@ -285,10 +409,12 @@ GPU7 0000:df:00.0 32 Gb/s 512 Gb/s XGMI Write 0 KB 1 KB 1 KB 1 KB 1 KB 1 KB 1 KB N/A ``` + - **Added units of measure to JSON output.** We added unit of measure to JSON/CSV `amd-smi metric`, `amd-smi static`, and `amd-smi monitor` commands. Ex. + ```shell amd-smi metric -p --json [ @@ -321,7 +447,8 @@ amd-smi metric -p --json ### Changed - **Topology is now left-aligned with BDF of each device listed individual table's row/coloumns.** -We provided each device's BDF for every table's row/columns, then left aligned data. We want AMD SMI Tool output to be easy to understand and digest for our users. Having users scroll up to find this information made it difficult to follow, especially for devices which have many devices associated with one ASIC. +We provided each device's BDF for every table's row/columns, then left aligned data. 
We want AMD SMI Tool output to be easy to understand and digest for our users. Having users scroll up to find this information made it difficult to follow, especially for devices which have many devices associated with one ASIC. + ```shell $ amd-smi topology ACCESS TABLE: @@ -381,6 +508,7 @@ NUMA BW TABLE: ``` ### Optimizations + - N/A ### Fixed @@ -394,14 +522,14 @@ Platforms which are identified as having an older pyyaml version or pip, we no m - `amd-smi firmware` - `amd-smi metric` - `amd-smi topology` + ```shell TypeError: dump_all() got an unexpected keyword argument 'sort_keys' ``` + - **Fix for crash when user is not a member of video/render groups** AMD SMI now uses same mutex handler for devices as rocm-smi. This helps avoid crashes when DRM/device data is inaccessable to the logged in user. - - ### Known Issues - N/A @@ -419,7 +547,6 @@ You can now query MI300 device metrics to get real-time information. Metrics inc - **Compute and memory partition support** Users can now view, set, and reset partitions. The topology display can provide a more in-depth look at the device's current configuration. - ### Changed - **GPU index sorting made consistent with other tools** @@ -437,7 +564,6 @@ Now the information is displayed as a table by each GPU's BDF, which closer rese - **Fix for driver not initialized** If driver module is not loaded, user retrieve error reponse indicating amdgpu module is not loaded. 
- ### Known Issues - N/A diff --git a/projects/amdsmi/amdsmi_cli/amdsmi_commands.py b/projects/amdsmi/amdsmi_cli/amdsmi_commands.py index 689b3fa55f..fce9e85214 100644 --- a/projects/amdsmi/amdsmi_cli/amdsmi_commands.py +++ b/projects/amdsmi/amdsmi_cli/amdsmi_commands.py @@ -1344,73 +1344,189 @@ class AMDSMICommands(): values_dict['power'] = power_dict if "clock" in current_platform_args: if args.clock: + # Populate Skeleton output with N/A clocks = {} - clock_types = [amdsmi_interface.AmdSmiClkType.GFX, - amdsmi_interface.AmdSmiClkType.MEM, - amdsmi_interface.AmdSmiClkType.VCLK0, - amdsmi_interface.AmdSmiClkType.VCLK1] - for clock_type in clock_types: - clock_name = amdsmi_interface.amdsmi_wrapper.amdsmi_clk_type_t__enumvalues[clock_type].replace("CLK_TYPE_", "") - # Ensure that gfx is the clock_name instead of another macro - if clock_type == amdsmi_interface.AmdSmiClkType.GFX: - clock_name = "gfx" - # Store the clock_name for vclk0 - vlck0_clock_name = None - if clock_type == amdsmi_interface.AmdSmiClkType.VCLK0: - vlck0_clock_name = clock_name + for clock_index in range(amdsmi_interface.AMDSMI_MAX_NUM_GFX_CLKS): + gfx_index = f"gfx_{clock_index}" + clocks[gfx_index] = {"clk" : "N/A", + "min_clk" : "N/A", + "max_clk" : "N/A", + "clk_locked" : "N/A", + "deep_sleep" : "N/A"} - try: - clock_info_dict = amdsmi_interface.amdsmi_get_clock_info(args.gpu, clock_type) - clock_info = {"clk" : clock_info_dict["cur_clk"]} - del clock_info_dict["cur_clk"] - clock_info.update(clock_info_dict) + clocks["mem_0"] = {"clk" : "N/A", + "min_clk" : "N/A", + "max_clk" : "N/A", + "clk_locked" : "N/A", + "deep_sleep" : "N/A"} - if clock_info['sleep_clk'] == 0xFFFFFFFF: - clock_info['sleep_clk'] = "N/A" + for clock_index in range(amdsmi_interface.AMDSMI_MAX_NUM_CLKS): + vclk_index = f"vclk_{clock_index}" + clocks[vclk_index] = {"clk" : "N/A", + "min_clk" : "N/A", + "max_clk" : "N/A", + "clk_locked" : "N/A", + "deep_sleep" : "N/A"} - clock_freq_unit = 'MHz' - for key, value in 
clock_info.items(): - if isinstance(value, int): - if self.logger.is_human_readable_format(): - clock_info[key] = f"{value} {clock_freq_unit}" - if self.logger.is_json_format(): - clock_info[key] = {"value" : value, - "unit" : clock_freq_unit} + for clock_index in range(amdsmi_interface.AMDSMI_MAX_NUM_CLKS): + dclk_index = f"dclk_{clock_index}" + clocks[dclk_index] = {"clk" : "N/A", + "min_clk" : "N/A", + "max_clk" : "N/A", + "clk_locked" : "N/A", + "deep_sleep" : "N/A"} - clocks[clock_name] = clock_info - except amdsmi_exception.AmdSmiLibraryException as e: - # Handle the case where VCLK1 is not enaled in sysfs on all GPUs - if clock_type == amdsmi_interface.AmdSmiClkType.VCLK1: - # Check if VCLK0 was retrieved successfully - if vlck0_clock_name in clocks: - # Since VCLK0 exists, do not error - logging.debug("VLCK0 exists, not adding %s clock info to output for gpu %s | %s", clock_name, gpu_id, e.get_error_info()) - continue - else: - # Handle all other failed to get clock info - clocks[clock_name] = {"clk": "N/A", - "max_clk": "N/A", - "min_clk": "N/A", - "sleep_clk": "N/A"} - logging.debug("Failed to get %s clock info for gpu %s | %s", clock_name, gpu_id, e.get_error_info()) + clock_unit = "MHz" + # TODO make the deepsleep threshold correspond to the * in sysfs for current deep sleep status + deep_sleep_threshold = 140 + # Populate clock values from gpu_metrics_info try: - gfxclk_lock_status = amdsmi_interface.amdsmi_get_gpu_metrics_info(args.gpu)['gfxclk_lock_status'] - if gfxclk_lock_status != "N/A": - if gfxclk_lock_status: - gfxclk_lock_status = "ENABLED" - else: - gfxclk_lock_status = "DISABLED" - except amdsmi_exception.AmdSmiLibraryException as e: - gfxclk_lock_status = "N/A" - logging.debug("Failed to get gfx clock lock status info for gpu %s | %s", gpu_id, e.get_error_info()) + gpu_metrics_info = amdsmi_interface.amdsmi_get_gpu_metrics_info(args.gpu) - if "gfx" in clocks: - if isinstance(clocks['gfx'], dict): - clocks['gfx']['clk_locked'] = 
gfxclk_lock_status - else: - clocks['gfx'] = {"clk_locked": gfxclk_lock_status} + # Populate GFX clock values + current_gfx_clocks = gpu_metrics_info["current_gfxclks"] + for clock_index, current_gfx_clock in enumerate(current_gfx_clocks): + # If the current clock is N/A then nothing else applies + if current_gfx_clock == "N/A": + continue + + gfx_index = f"gfx_{clock_index}" + clocks[gfx_index]["clk"] = self.helpers.unit_format(self.logger, + current_gfx_clock, + clock_unit) + + # Populate clock locked status + if gpu_metrics_info["gfxclk_lock_status"] != "N/A": + gfx_clock_lock_flag = 1 << clock_index # This is the position of the clock lock flag + if gpu_metrics_info["gfxclk_lock_status"] & gfx_clock_lock_flag: + clocks[gfx_index]["clk_locked"] = "ENABLED" + else: + clocks[gfx_index]["clk_locked"] = "DISABLED" + + # Populate deep sleep status + if int(current_gfx_clock) <= deep_sleep_threshold: + clocks[gfx_index]["deep_sleep"] = "ENABLED" + else: + clocks[gfx_index]["deep_sleep"] = "DISABLED" + + # Populate MEM clock value + current_mem_clock = gpu_metrics_info["current_uclk"] # single value + if current_mem_clock != "N/A": + clocks["mem_0"]["clk"] = self.helpers.unit_format(self.logger, + current_mem_clock, + clock_unit) + + if int(current_mem_clock) <= deep_sleep_threshold: + clocks["mem_0"]["deep_sleep"] = "ENABLED" + else: + clocks["mem_0"]["deep_sleep"] = "DISABLED" + + # Populate VCLK clock values + current_vclk_clocks = gpu_metrics_info["current_vclk0s"] + for clock_index, current_vclk_clock in enumerate(current_vclk_clocks): + # If the current clock is N/A then nothing else applies + if current_vclk_clock == "N/A": + continue + + vclk_index = f"vclk_{clock_index}" + clocks[vclk_index]["clk"] = self.helpers.unit_format(self.logger, + current_vclk_clock, + clock_unit) + + if int(current_vclk_clock) <= deep_sleep_threshold: + clocks[vclk_index]["deep_sleep"] = "ENABLED" + else: + clocks[vclk_index]["deep_sleep"] = "DISABLED" + + # Populate DCLK clock 
values + current_dclk_clocks = gpu_metrics_info["current_dclk0s"] + for clock_index, current_dclk_clock in enumerate(current_dclk_clocks): + # If the current clock is N/A then nothing else applies + if current_dclk_clock == "N/A": + continue + + dclk_index = f"dclk_{clock_index}" + clocks[dclk_index]["clk"] = self.helpers.unit_format(self.logger, + current_dclk_clock, + clock_unit) + + if int(current_dclk_clock) <= deep_sleep_threshold: + clocks[dclk_index]["deep_sleep"] = "ENABLED" + else: + clocks[dclk_index]["deep_sleep"] = "DISABLED" + except amdsmi_exception.AmdSmiLibraryException as e: + logging.debug("Failed to get gpu_metrics_info for gpu %s | %s", gpu_id, e.get_error_info()) + + # Populate the max and min clock values from sysfs + # Min and Max values are per clock type, not per clock engine + + # GFX min and max clocks + try: + gfx_clock_info_dict = amdsmi_interface.amdsmi_get_clock_info(args.gpu, + amdsmi_interface.AmdSmiClkType.GFX) + + for clock_index in range(amdsmi_interface.AMDSMI_MAX_NUM_GFX_CLKS): + gfx_index = f"gfx_{clock_index}" + if clocks[gfx_index]["clk"] == "N/A": + # if the current clock is N/A then we shouldn't populate the max and min values + continue + + clocks[gfx_index]["min_clk"] = self.helpers.unit_format(self.logger, + gfx_clock_info_dict["min_clk"], + clock_unit) + clocks[gfx_index]["max_clk"] = self.helpers.unit_format(self.logger, + gfx_clock_info_dict["max_clk"], + clock_unit) + except amdsmi_exception.AmdSmiLibraryException as e: + logging.debug("Failed to get gfx clock info for gpu %s | %s", gpu_id, e.get_error_info()) + + # MEM min and max clocks + try: + mem_clock_info_dict = amdsmi_interface.amdsmi_get_clock_info(args.gpu, + amdsmi_interface.AmdSmiClkType.MEM) + + # if the current clock is N/A then we shouldn't populate the max and min values + if clocks["mem_0"]["clk"] != "N/A": + clocks["mem_0"]["min_clk"] = self.helpers.unit_format(self.logger, + mem_clock_info_dict["min_clk"], + clock_unit) + 
clocks["mem_0"]["max_clk"] = self.helpers.unit_format(self.logger, + mem_clock_info_dict["max_clk"], + clock_unit) + except amdsmi_exception.AmdSmiLibraryException as e: + logging.debug("Failed to get mem clock info for gpu %s | %s", gpu_id, e.get_error_info()) + + # VCLK & DCLK min and max clocks + try: + vclk0_clock_info_dict = amdsmi_interface.amdsmi_get_clock_info(args.gpu, + amdsmi_interface.AmdSmiClkType.VCLK0) + + dclk0_clock_info_dict = amdsmi_interface.amdsmi_get_clock_info(args.gpu, + amdsmi_interface.AmdSmiClkType.DCLK0) + + for clock_index in range(amdsmi_interface.AMDSMI_MAX_NUM_CLKS): + vclk_index = f"vclk_{clock_index}" + # if the current clock is N/A then we shouldn't populate the max and min values + if clocks[vclk_index]["clk"] != "N/A": + clocks[vclk_index]["min_clk"] = self.helpers.unit_format(self.logger, + vclk0_clock_info_dict["min_clk"], + clock_unit) + clocks[vclk_index]["max_clk"] = self.helpers.unit_format(self.logger, + vclk0_clock_info_dict["max_clk"], + clock_unit) + + dclk_index = f"dclk_{clock_index}" + if clocks[dclk_index]["clk"] != "N/A": + clocks[dclk_index]["min_clk"] = self.helpers.unit_format(self.logger, + dclk0_clock_info_dict["min_clk"], + clock_unit) + clocks[dclk_index]["max_clk"] = self.helpers.unit_format(self.logger, + dclk0_clock_info_dict["max_clk"], + clock_unit) + except amdsmi_exception.AmdSmiLibraryException as e: + logging.debug("Failed to get vclk and/or dclk clock info for gpu %s | %s", gpu_id, e.get_error_info()) values_dict['clock'] = clocks if "temperature" in current_platform_args: @@ -4116,7 +4232,7 @@ class AMDSMICommands(): for xgmi_dict in xgmi_values: src_gpu_id = xgmi_dict['gpu'] src_gpu_bdf = xgmi_dict['bdf'] - src_gpu = amdsmi_interface.amdsmi_get_processor_handle_from_bdf(src_gpu_bdf) #TODO VERIFY this is correct + src_gpu = amdsmi_interface.amdsmi_get_processor_handle_from_bdf(src_gpu_bdf) logging.debug("check2 device_handle: %s", src_gpu) # This should be the same order as the check1 @@ -4256,7 
+4372,7 @@ class AMDSMICommands(): self.logger.multiple_device_output = xgmi_values - if self.logger.is_csv_format(): # @TODO Test topology override needed + if self.logger.is_csv_format(): new_output = [] for elem in self.logger.multiple_device_output: new_output.append(self.logger.flatten_dict(elem, topology_override=True)) diff --git a/projects/amdsmi/amdsmi_cli/amdsmi_helpers.py b/projects/amdsmi/amdsmi_cli/amdsmi_helpers.py index 2083c15530..6383969a6a 100644 --- a/projects/amdsmi/amdsmi_cli/amdsmi_helpers.py +++ b/projects/amdsmi/amdsmi_cli/amdsmi_helpers.py @@ -412,7 +412,7 @@ class AMDSMIHelpers(): return True, selected_device_handles - def handle_gpus(self, args,logger, subcommand): + def handle_gpus(self, args, logger, subcommand): """This function will run execute the subcommands based on the number of gpus passed in via args. params: @@ -708,3 +708,20 @@ class AMDSMIHelpers(): return f"{bytes_input:3.1f} {unit}" bytes_input /= 1024 return f"{bytes_input:.1f} YB" + + + def unit_format(self, logger, value, unit): + """Format a value with its unit according to the logger's output format + + params: + logger (AMDSMILogger) - Logger whose output format selects the result type + value - the value to be formatted + unit - the unit to be formatted with the value + return: + dict for JSON output, otherwise str : formatted output + """ + if logger.is_json_format(): + return {"value": value, "unit": unit} + if logger.is_human_readable_format(): + return f"{value} {unit}" + return f"{value}" diff --git a/projects/amdsmi/py-interface/README.md b/projects/amdsmi/py-interface/README.md index ae9b356810..f8b8b3d2a9 100644 --- a/projects/amdsmi/py-interface/README.md +++ b/projects/amdsmi/py-interface/README.md @@ -2155,7 +2155,7 @@ Output: Dictionary with fields `indep_throttle_status` | ASIC independent throttle status (see drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h for bit flags) | `current_socket_power` | Current socket power (also known as instant socket
power) | W `vcn_activity` | List of VCN encode/decode engine utilization per AID | % -`gfxclk_lock_status` | Clock lock status. Each bit corresponds to clock instance. | +`gfxclk_lock_status` | Clock lock status. Bits 0:7 each correspond to one gfx clock engine instance; APU/AID devices use bits 0:5 only. | `xgmi_link_width` | XGMI bus width | lanes `xgmi_link_speed` | XGMI bitrate | GB/s `pcie_bandwidth_acc` | PCIe accumulated bandwidth | GB/s diff --git a/projects/amdsmi/py-interface/amdsmi_interface.py b/projects/amdsmi/py-interface/amdsmi_interface.py index e3dfa1a47e..98c41f73bc 100644 --- a/projects/amdsmi/py-interface/amdsmi_interface.py +++ b/projects/amdsmi/py-interface/amdsmi_interface.py @@ -3519,7 +3519,7 @@ def amdsmi_get_gpu_metrics_info( if gpu_metrics_output[metric] == 0xFFFF: gpu_metrics_output[metric] = "N/A" - uint_32_metrics = ['gfx_activity_acc','mem_activity_acc', 'pcie_nak_sent_count_acc', 'pcie_nak_rcvd_count_acc'] + uint_32_metrics = ['gfx_activity_acc','mem_activity_acc', 'pcie_nak_sent_count_acc', 'pcie_nak_rcvd_count_acc', 'gfxclk_lock_status'] for metric in uint_32_metrics: if gpu_metrics_output[metric] == 0xFFFFFFFF: gpu_metrics_output[metric] = "N/A" @@ -3533,7 +3533,7 @@ def amdsmi_get_gpu_metrics_info( gpu_metrics_output[metric] = "N/A" # Custom validation for metrics in a bool format - uint_32_bool_metrics = ['throttle_status', 'gfxclk_lock_status'] + uint_32_bool_metrics = ['throttle_status'] for metric in uint_32_bool_metrics: if gpu_metrics_output[metric] == 0xFFFFFFFF: gpu_metrics_output[metric] = "N/A" diff --git a/projects/amdsmi/src/amd_smi/amd_smi.cc b/projects/amdsmi/src/amd_smi/amd_smi.cc index 1dafee87ff..2f56eb45df 100644 --- a/projects/amdsmi/src/amd_smi/amd_smi.cc +++ b/projects/amdsmi/src/amd_smi/amd_smi.cc @@ -1651,6 +1651,12 @@ amdsmi_get_clock_info(amdsmi_processor_handle processor_handle, amdsmi_clk_type_ case CLK_TYPE_VCLK1: info->cur_clk = metrics.current_vclk1; break; + case CLK_TYPE_DCLK0: + info->cur_clk =
metrics.current_dclk0; + break; + case CLK_TYPE_DCLK1: + info->cur_clk = metrics.current_dclk1; + break; default: return AMDSMI_STATUS_INVAL; } diff --git a/projects/amdsmi/src/amd_smi/amd_smi_utils.cc b/projects/amdsmi/src/amd_smi/amd_smi_utils.cc index f73a1a7626..13762c3808 100644 --- a/projects/amdsmi/src/amd_smi/amd_smi_utils.cc +++ b/projects/amdsmi/src/amd_smi/amd_smi_utils.cc @@ -229,6 +229,12 @@ amdsmi_status_t smi_amdgpu_get_ranges(amd::smi::AMDSmiGPUDevice* device, amdsmi_ case CLK_TYPE_VCLK1: fullpath += "/pp_dpm_vclk1"; break; + case CLK_TYPE_DCLK0: + fullpath += "/pp_dpm_dclk"; + break; + case CLK_TYPE_DCLK1: + fullpath += "/pp_dpm_dclk1"; + break; default: return AMDSMI_STATUS_INVAL; }
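
The per-engine decoding that the `amdsmi_commands.py` hunk applies to `gfxclk_lock_status` and the deep-sleep check can be sketched in isolation. This is a minimal sketch under stated assumptions, not the CLI's code: `decode_gfx_clocks` and `DEEP_SLEEP_THRESHOLD_MHZ` are hypothetical names, the two arguments stand in for the `current_gfxclks` and `gfxclk_lock_status` fields returned by `amdsmi_get_gpu_metrics_info`, and the 140 MHz threshold is the value the diff itself marks with a TODO.

```python
NA = "N/A"

# Value used in the diff; the diff flags it as a placeholder heuristic (TODO).
DEEP_SLEEP_THRESHOLD_MHZ = 140


def decode_gfx_clocks(current_gfxclks, lock_status):
    """Build per-engine clock entries from a list of clocks and a lock bitmask.

    current_gfxclks: list of ints (MHz) or "N/A", one per gfx engine.
    lock_status: int bitmask (bit i = engine i locked) or "N/A".
    Hypothetical helper for illustration only.
    """
    clocks = {}
    for index, clk in enumerate(current_gfxclks):
        entry = {"clk": NA, "clk_locked": NA, "deep_sleep": NA}
        if clk != NA:
            entry["clk"] = clk
            if lock_status != NA:
                # Bit `index` of the mask is the lock flag for this engine.
                locked = lock_status & (1 << index)
                entry["clk_locked"] = "ENABLED" if locked else "DISABLED"
            # An engine at or below the threshold is treated as deep-sleeping.
            asleep = int(clk) <= DEEP_SLEEP_THRESHOLD_MHZ
            entry["deep_sleep"] = "ENABLED" if asleep else "DISABLED"
        clocks[f"gfx_{index}"] = entry
    return clocks


example = decode_gfx_clocks([113, 1800, NA], 0b10)
```

Here engine 0 is unlocked and below the threshold, engine 1 is locked and above it, and engine 2 stays at `N/A` throughout because its current clock is unavailable, which mirrors the skeleton-then-populate approach in the hunk.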