File
rocm-systems/projects/amdsmi/py-interface/__init__.py
Poag, Charis ce19b921b0 [SWDEV-535159] Add support for GPU partition metrics (#490)
[SWDEV-535159] Add support for GPU partition metrics

Changes include:
  - Internal logic to smart-switch between gpu_metrics/xcp_metrics files
  - [WIP] Initial plumbing for new partition metric API

Change-Id: I4340fb1b48bac0117d80d5d486b9e871430d5cd8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Add amdsmi_get_gpu_partition_metrics_info() + minor cleanup

Change-Id: I5d60604f18baddbd03852dc90e88aa0b8107d50e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Fix partition metric logic + update logging/tests

Change-Id: I9e89b19ead17694c54e224f8e13ff8ee3eb2e22a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Adjust amd-smi metric/monitor/default to show (some) partition information

Change-Id: I2e8d2745876a19bdaec3c039daa97345c9f701b5
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Add C++ tests

Change-Id: Ib9eb0b57a6d7a280992e05a4c6eba632826952ef
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Remove modification of energy counter, not needed

Change-Id: I5c48eaaae248ee6dc79abba609d837ec35d78022
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[CLI] amd-smi metric: cleaned up N/A'd multi-valued to show just N/A

Changes:
1. amd-smi metric: cleaned up N/A'd multi-valued to show just N/A
ex.
JPEG_ACTIVITY: [N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A]

Now just shows: N/A

2. [Python Unit Test] Changed testname TestAmdSmiPythonBDF(unittest.TestCase) ->
 AmdSmiPythonUnitTest

Test name was confusing.
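
The N/A cleanup described in item 1 can be sketched as follows. This is a minimal illustration, not the CLI's actual code: the helper name `collapse_na` and its `na` parameter are hypothetical, and the sketch assumes multi-valued metrics arrive as plain Python lists.

```python
def collapse_na(values, na="N/A"):
    """Return a single N/A marker when every entry is N/A (hypothetical helper).

    Non-list values, empty lists, and lists with at least one real reading
    are returned unchanged.
    """
    if isinstance(values, list) and values and all(v == na for v in values):
        return na
    return values


# A fully unavailable 32-entry metric collapses to one marker.
print(collapse_na(["N/A"] * 32))
# A partially available metric keeps its per-entry layout.
print(collapse_na([5, "N/A", 7]))
```

Keeping mixed lists intact preserves the position of each real reading, so only the fully unavailable case is shortened.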

Change-Id: Ieb3b036f30002fd22362508eb9fc5d443df395ae
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Log cleanup

Change-Id: I1b1a95f1844d35bec7a7bd8cb996f87e4914c069
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Add amd-smi partition-metrics CLI + general cleanup

Change-Id: Ia91488e6cb3a4d62b4087afbddfe0b3bb9378fdc
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[1.3 metrics] Remove forwards compatibility for partition metrics

Change-Id: Iab928983e6f6f1587bc9307f6f3fa2b2696ca6f7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Fixed violation output not showing % + general cleanup

Change-Id: Icac1b0a55b18c7628b07109ae0c377d17e0825f1
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Clean up amdsmi_get_gpu_partition_metrics_info & amd-smi partition-metric outputs

Change-Id: I6427028b980874641e9ffb3b5d88ad493dbf9cf4
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Fix metrics not found + extra logging/formatting

Change-Id: I841a27bb2c305e97ec7579a13ac915e5be497c3a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Update license to current default

Change-Id: I0de9b8a2d5dbbeab4491097f0354ba17b0d30866
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Cleanup for review

Change-Id: I96ed25c3f2b8968eea1af24c5e5860c2b4e74e6e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Modernize updated/new internal APIs.

Change-Id: I3c48a250eeb703709b14cb5ffa68268d8321626c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Remove extra logging in dynamic metrics

Change-Id: Idb97547bcbe143d6fa1cb5cb278ffe4da615ce14
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Remove amd-smi partition-metric command

Change-Id: Ib83c17e5cd7e0da3798198943bddd46c296b411c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Move new CLI updates to another PR + minor fixes

Change-Id: I3b1163eec12f9b5f7d95ee33de08e168cec1b1fe
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Allow dynamic metrics to work for gpu/xcp metrics 1.9+/1.1+

Updated some logging as well.

Change-Id: I2ed9f5a5ef8afb1520508820ca6153525f0644b4
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Allow dyn gpu/xcp metric v1.9+/v1.1+

Added tests for quick check

Change-Id: I576d6f6582a55afb08e5ac57791ce95e2fa184a2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Update tests for larger subset of version checks

Change-Id: I3cdf4f8bb4fc6161f4c76566939f90545d0f362a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Fix XCP metrics in gpu/partition metric pre-v1.9/v1.1 (dynamic)

Change-Id: I4dabc1ed6bef6b86c8e7f92bf9cb5992f3966fe2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

---------

Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/amdsmi commit: 01b4fe6614]
2025-10-20 14:43:40 -05:00

317 lines
15 KiB
Python

# Copyright (C) Advanced Micro Devices. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy of
# this software and associated documentation files (the "Software"), to deal in
# the Software without restriction, including without limitation the rights to
# use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
# the Software, and to permit persons to whom the Software is furnished to do so,
# subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
# FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
# COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
# IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
# CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
# Library Version is the tool/amdsmi_interface version
from ._version import __version__
# Library Initialization
from .amdsmi_interface import amdsmi_init
from .amdsmi_interface import amdsmi_shut_down
# Device Discovery
from .amdsmi_interface import amdsmi_get_processor_type
from .amdsmi_interface import amdsmi_get_processor_handles
from .amdsmi_interface import amdsmi_get_socket_handles
from .amdsmi_interface import amdsmi_get_socket_info
from .amdsmi_interface import amdsmi_get_processor_count_from_handles
from .amdsmi_interface import amdsmi_get_processor_handles_by_type
# ESMI Dependent Functions
try:
    from .amdsmi_interface import amdsmi_get_cpusocket_handles
    from .amdsmi_interface import amdsmi_get_cpucore_handles
    from .amdsmi_interface import amdsmi_get_processor_info
    from .amdsmi_interface import amdsmi_get_cpu_hsmp_proto_ver
    from .amdsmi_interface import amdsmi_get_cpu_smu_fw_version
    from .amdsmi_interface import amdsmi_get_cpu_core_energy
    from .amdsmi_interface import amdsmi_get_cpu_socket_energy
    from .amdsmi_interface import amdsmi_get_threads_per_core
    from .amdsmi_interface import amdsmi_get_cpu_hsmp_driver_version
    from .amdsmi_interface import amdsmi_get_cpu_prochot_status
    from .amdsmi_interface import amdsmi_get_cpu_fclk_mclk
    from .amdsmi_interface import amdsmi_get_cpu_cclk_limit
    from .amdsmi_interface import amdsmi_get_cpu_socket_current_active_freq_limit
    from .amdsmi_interface import amdsmi_get_cpu_socket_freq_range
    from .amdsmi_interface import amdsmi_get_cpu_core_current_freq_limit
    from .amdsmi_interface import amdsmi_get_cpu_socket_power
    from .amdsmi_interface import amdsmi_get_cpu_socket_power_cap
    from .amdsmi_interface import amdsmi_get_cpu_socket_power_cap_max
    from .amdsmi_interface import amdsmi_get_cpu_pwr_svi_telemetry_all_rails
    from .amdsmi_interface import amdsmi_set_cpu_socket_power_cap
    from .amdsmi_interface import amdsmi_set_cpu_pwr_efficiency_mode
    from .amdsmi_interface import amdsmi_get_cpu_core_boostlimit
    from .amdsmi_interface import amdsmi_get_cpu_socket_c0_residency
    from .amdsmi_interface import amdsmi_set_cpu_core_boostlimit
    from .amdsmi_interface import amdsmi_set_cpu_socket_boostlimit
    from .amdsmi_interface import amdsmi_get_cpu_ddr_bw
    from .amdsmi_interface import amdsmi_get_cpu_socket_temperature
    from .amdsmi_interface import amdsmi_get_cpu_dimm_temp_range_and_refresh_rate
    from .amdsmi_interface import amdsmi_get_cpu_dimm_power_consumption
    from .amdsmi_interface import amdsmi_get_cpu_dimm_thermal_sensor
    from .amdsmi_interface import amdsmi_set_cpu_xgmi_width
    from .amdsmi_interface import amdsmi_set_cpu_gmi3_link_width_range
    from .amdsmi_interface import amdsmi_cpu_apb_enable
    from .amdsmi_interface import amdsmi_cpu_apb_disable
    from .amdsmi_interface import amdsmi_set_cpu_socket_lclk_dpm_level
    from .amdsmi_interface import amdsmi_get_cpu_socket_lclk_dpm_level
    from .amdsmi_interface import amdsmi_set_cpu_pcie_link_rate
    from .amdsmi_interface import amdsmi_set_cpu_df_pstate_range
    from .amdsmi_interface import amdsmi_get_cpu_current_io_bandwidth
    from .amdsmi_interface import amdsmi_get_cpu_current_xgmi_bw
    from .amdsmi_interface import amdsmi_get_hsmp_metrics_table_version
    from .amdsmi_interface import amdsmi_get_hsmp_metrics_table
    from .amdsmi_interface import amdsmi_first_online_core_on_cpu_socket
    from .amdsmi_interface import amdsmi_get_cpu_family
    from .amdsmi_interface import amdsmi_get_cpu_model
    from .amdsmi_interface import amdsmi_get_cpu_model_name
    from .amdsmi_interface import amdsmi_get_cpu_handles
except AttributeError:
    pass
from .amdsmi_interface import amdsmi_get_processor_handle_from_bdf
from .amdsmi_interface import amdsmi_get_gpu_device_bdf
from .amdsmi_interface import amdsmi_get_gpu_device_uuid
from .amdsmi_interface import amdsmi_get_gpu_enumeration_info
# # Functions not dependent on ESMI library
from .amdsmi_interface import amdsmi_get_cpu_socket_count
from .amdsmi_interface import amdsmi_get_cpu_cores_per_socket
from .amdsmi_interface import amdsmi_get_cpu_affinity_with_scope
# # SW Version Information
from .amdsmi_interface import amdsmi_get_gpu_driver_info
# # ASIC and Bus Static Information
from .amdsmi_interface import amdsmi_get_gpu_asic_info
from .amdsmi_interface import amdsmi_get_gpu_kfd_info
from .amdsmi_interface import amdsmi_get_power_cap_info
from .amdsmi_interface import amdsmi_get_gpu_vram_info
from .amdsmi_interface import amdsmi_get_gpu_cache_info
from .amdsmi_interface import amdsmi_get_gpu_xcd_counter
from .amdsmi_interface import amdsmi_get_gpu_revision
# # Microcode and VBIOS Information
from .amdsmi_interface import amdsmi_get_gpu_vbios_info
from .amdsmi_interface import amdsmi_get_fw_info
# # GPU Monitoring
from .amdsmi_interface import amdsmi_get_gpu_activity
from .amdsmi_interface import amdsmi_get_gpu_vram_usage
from .amdsmi_interface import amdsmi_get_power_info
from .amdsmi_interface import amdsmi_get_clock_info
from .amdsmi_interface import amdsmi_get_gpu_busy_percent
from .amdsmi_interface import amdsmi_get_pcie_info
from .amdsmi_interface import amdsmi_get_gpu_bad_page_info
from .amdsmi_interface import amdsmi_get_gpu_bad_page_threshold
from .amdsmi_interface import amdsmi_get_violation_status
from .amdsmi_interface import amdsmi_get_gpu_xgmi_link_status
# # Event Notification
from .amdsmi_interface import amdsmi_init_gpu_event_notification
from .amdsmi_interface import amdsmi_set_gpu_event_notification_mask
from .amdsmi_interface import amdsmi_get_gpu_event_notification
from .amdsmi_interface import amdsmi_stop_gpu_event_notification
# # Process Information
from .amdsmi_interface import amdsmi_get_gpu_process_list
# # ECC Error Information
from .amdsmi_interface import amdsmi_get_gpu_total_ecc_count
# # Board Information
from .amdsmi_interface import amdsmi_get_gpu_board_info
# # Ras Information
from .amdsmi_interface import amdsmi_get_gpu_ras_feature_info
from .amdsmi_interface import amdsmi_get_gpu_ras_block_features_enabled
from .amdsmi_interface import amdsmi_get_gpu_cper_entries
from .amdsmi_interface import amdsmi_gpu_validate_ras_eeprom
# # Unsupported Functions In Virtual Environment
from .amdsmi_interface import amdsmi_set_gpu_pci_bandwidth
from .amdsmi_interface import amdsmi_set_power_cap
from .amdsmi_interface import amdsmi_set_gpu_power_profile
from .amdsmi_interface import amdsmi_set_gpu_clk_range
from .amdsmi_interface import amdsmi_set_gpu_clk_limit
from .amdsmi_interface import amdsmi_set_gpu_od_clk_info
from .amdsmi_interface import amdsmi_set_gpu_od_volt_info
from .amdsmi_interface import amdsmi_set_gpu_perf_level
from .amdsmi_interface import amdsmi_get_gpu_power_profile_presets
from .amdsmi_interface import amdsmi_reset_gpu
from .amdsmi_interface import amdsmi_gpu_driver_reload
from .amdsmi_interface import amdsmi_set_gpu_perf_determinism_mode
from .amdsmi_interface import amdsmi_set_gpu_fan_speed
from .amdsmi_interface import amdsmi_reset_gpu_fan
from .amdsmi_interface import amdsmi_set_clk_freq
from .amdsmi_interface import amdsmi_set_gpu_overdrive_level
from .amdsmi_interface import amdsmi_get_soc_pstate
from .amdsmi_interface import amdsmi_set_soc_pstate
from .amdsmi_interface import amdsmi_set_xgmi_plpd
from .amdsmi_interface import amdsmi_get_xgmi_plpd
from .amdsmi_interface import amdsmi_clean_gpu_local_data
from .amdsmi_interface import amdsmi_get_gpu_process_isolation
from .amdsmi_interface import amdsmi_set_gpu_process_isolation
# # Physical State Queries
from .amdsmi_interface import amdsmi_get_gpu_fan_rpms
from .amdsmi_interface import amdsmi_get_gpu_fan_speed
from .amdsmi_interface import amdsmi_get_gpu_fan_speed_max
from .amdsmi_interface import amdsmi_get_temp_metric
from .amdsmi_interface import amdsmi_get_gpu_volt_metric
# # Clock, Power and Performance Query
from .amdsmi_interface import amdsmi_get_utilization_count
from .amdsmi_interface import amdsmi_get_gpu_perf_level
from .amdsmi_interface import amdsmi_get_gpu_overdrive_level
from .amdsmi_interface import amdsmi_get_gpu_mem_overdrive_level
from .amdsmi_interface import amdsmi_get_clk_freq
from .amdsmi_interface import amdsmi_get_gpu_od_volt_info
from .amdsmi_interface import amdsmi_get_gpu_metrics_info
from .amdsmi_interface import amdsmi_get_gpu_partition_metrics_info
from .amdsmi_interface import amdsmi_get_gpu_od_volt_curve_regions
from .amdsmi_interface import amdsmi_is_gpu_power_management_enabled
# # Performance Counters
from .amdsmi_interface import amdsmi_gpu_counter_group_supported
from .amdsmi_interface import amdsmi_gpu_create_counter
from .amdsmi_interface import amdsmi_gpu_destroy_counter
from .amdsmi_interface import amdsmi_gpu_control_counter
from .amdsmi_interface import amdsmi_gpu_read_counter
from .amdsmi_interface import amdsmi_get_gpu_available_counters
# # Error Query
from .amdsmi_interface import amdsmi_get_gpu_ecc_count
from .amdsmi_interface import amdsmi_get_gpu_ecc_enabled
from .amdsmi_interface import amdsmi_get_gpu_ecc_status
from .amdsmi_interface import amdsmi_status_code_to_string
# # System Information Query
from .amdsmi_interface import amdsmi_get_gpu_compute_process_info
from .amdsmi_interface import amdsmi_get_gpu_compute_process_info_by_pid
from .amdsmi_interface import amdsmi_get_gpu_compute_process_gpus
from .amdsmi_interface import amdsmi_gpu_xgmi_error_status
from .amdsmi_interface import amdsmi_reset_gpu_xgmi_error
from .amdsmi_interface import amdsmi_get_esmi_err_msg
# # PCIE information
from .amdsmi_interface import amdsmi_get_gpu_bdf_id
from .amdsmi_interface import amdsmi_get_gpu_pci_bandwidth
from .amdsmi_interface import amdsmi_get_gpu_pci_throughput
from .amdsmi_interface import amdsmi_get_gpu_pci_replay_counter
from .amdsmi_interface import amdsmi_get_gpu_topo_numa_affinity
# # Power information
from .amdsmi_interface import amdsmi_get_energy_count
# # Memory information
from .amdsmi_interface import amdsmi_get_gpu_memory_total
from .amdsmi_interface import amdsmi_get_gpu_memory_usage
from .amdsmi_interface import amdsmi_get_gpu_memory_reserved_pages
# # Events
from .amdsmi_interface import AmdSmiEventReader
# # Device Identification information
from .amdsmi_interface import amdsmi_get_gpu_vendor_name
from .amdsmi_interface import amdsmi_get_gpu_id
from .amdsmi_interface import amdsmi_get_gpu_vram_vendor
from .amdsmi_interface import amdsmi_get_gpu_subsystem_id
from .amdsmi_interface import amdsmi_get_gpu_subsystem_name
# # Hardware topology query
from .amdsmi_interface import amdsmi_topo_get_numa_node_number
from .amdsmi_interface import amdsmi_topo_get_link_weight
from .amdsmi_interface import amdsmi_get_minmax_bandwidth_between_processors
from .amdsmi_interface import amdsmi_get_link_metrics
from .amdsmi_interface import amdsmi_topo_get_link_type
from .amdsmi_interface import amdsmi_topo_get_p2p_status
from .amdsmi_interface import amdsmi_is_P2P_accessible
from .amdsmi_interface import amdsmi_get_xgmi_info
from .amdsmi_interface import amdsmi_get_link_topology_nearest
# # Partition Functions
from .amdsmi_interface import amdsmi_get_gpu_compute_partition
from .amdsmi_interface import amdsmi_set_gpu_compute_partition
from .amdsmi_interface import amdsmi_get_gpu_memory_partition
from .amdsmi_interface import amdsmi_set_gpu_memory_partition
from .amdsmi_interface import amdsmi_get_gpu_accelerator_partition_profile
from .amdsmi_interface import amdsmi_get_gpu_accelerator_partition_profile_config
from .amdsmi_interface import amdsmi_get_gpu_memory_partition_config
from .amdsmi_interface import amdsmi_set_gpu_accelerator_partition_profile
from .amdsmi_interface import amdsmi_set_gpu_memory_partition_mode
# # Individual GPU Metrics Functions
from .amdsmi_interface import amdsmi_get_gpu_metrics_header_info
from .amdsmi_interface import amdsmi_get_gpu_reg_table_info
from .amdsmi_interface import amdsmi_get_gpu_pm_metrics_info
# # Virtualization Mode Detection
from .amdsmi_interface import amdsmi_get_gpu_virtualization_mode
# # Functions where library initialization is not needed
# # Version information
from .amdsmi_interface import amdsmi_get_lib_version
from .amdsmi_interface import amdsmi_get_rocm_version
# # Enums
from .amdsmi_interface import AmdSmiStatus
from .amdsmi_interface import AmdSmiInitFlags
from .amdsmi_interface import AmdSmiContainerTypes
from .amdsmi_interface import AmdSmiDeviceType
from .amdsmi_interface import AmdSmiMmIp
from .amdsmi_interface import AmdSmiFwBlock
from .amdsmi_interface import AmdSmiClkType
from .amdsmi_interface import AmdSmiClkLimitType
from .amdsmi_interface import AmdSmiRegType
from .amdsmi_interface import AmdSmiTemperatureType
from .amdsmi_interface import AmdSmiDevPerfLevel
from .amdsmi_interface import AmdSmiEventGroup
from .amdsmi_interface import AmdSmiEventType
from .amdsmi_interface import AmdSmiCounterCommand
from .amdsmi_interface import AmdSmiEvtNotificationType
from .amdsmi_interface import AmdSmiTemperatureMetric
from .amdsmi_interface import AmdSmiVoltageMetric
from .amdsmi_interface import AmdSmiVoltageType
from .amdsmi_interface import AmdSmiComputePartitionType
from .amdsmi_interface import AmdSmiMemoryPartitionType
from .amdsmi_interface import AmdSmiPowerProfilePresetMasks
from .amdsmi_interface import AmdSmiGpuBlock
from .amdsmi_interface import AmdSmiRasErrState
from .amdsmi_interface import AmdSmiMemoryType
from .amdsmi_interface import AmdSmiFreqInd
from .amdsmi_interface import AmdSmiXgmiStatus
from .amdsmi_interface import AmdSmiMemoryPageStatus
from .amdsmi_interface import AmdSmiLinkType
from .amdsmi_interface import AmdSmiUtilizationCounterType
from .amdsmi_interface import AmdSmiProcessorType
from .amdsmi_interface import AmdSmiVirtualizationMode
from .amdsmi_interface import AmdSmiVramType
from .amdsmi_interface import AmdSmiAffinityScope
# Exceptions
from .amdsmi_exception import AmdSmiLibraryException
from .amdsmi_exception import AmdSmiRetryException
from .amdsmi_exception import AmdSmiParameterException
from .amdsmi_exception import AmdSmiKeyException
from .amdsmi_exception import AmdSmiBdfFormatException
from .amdsmi_exception import AmdSmiTimeoutException
from .amdsmi_exception import AmdSmiException
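
The names re-exported above are typically used in an init / query / shut-down sequence. The sketch below illustrates that pattern under stated assumptions: `list_gpu_bdfs` and `fake_smi` are hypothetical names introduced here, and the stub stands in for the real package so the example runs on a machine without AMD hardware. Only `amdsmi_init`, `amdsmi_get_processor_handles`, `amdsmi_get_gpu_device_bdf`, and `amdsmi_shut_down` come from the import list above.

```python
from types import SimpleNamespace


def list_gpu_bdfs(smi):
    """Initialize the library, collect each processor's BDF string, then shut down.

    `smi` is any object exposing the four amdsmi_* functions used below,
    such as the imported package itself or a test stub.
    """
    smi.amdsmi_init()
    try:
        handles = smi.amdsmi_get_processor_handles()
        return [smi.amdsmi_get_gpu_device_bdf(handle) for handle in handles]
    finally:
        # Always release library resources, even if a query raises.
        smi.amdsmi_shut_down()


# Hypothetical stub with made-up handles and BDF values, for a dry run only.
calls = []
fake_smi = SimpleNamespace(
    amdsmi_init=lambda: calls.append("init"),
    amdsmi_get_processor_handles=lambda: ["h0", "h1"],
    amdsmi_get_gpu_device_bdf=lambda h: {"h0": "0000:03:00.0", "h1": "0000:83:00.0"}[h],
    amdsmi_shut_down=lambda: calls.append("shut_down"),
)

print(list_gpu_bdfs(fake_smi))
```

On a system with the library installed, passing the imported package in place of the stub (for example `import amdsmi; list_gpu_bdfs(amdsmi)`) should follow the same path. Real calls can raise the exception types exported at the end of this file, such as `AmdSmiLibraryException`, so callers may want to catch those.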