[SWDEV-422195/SWDEV-440985] GPU metrics 1.6 + --showmetrics

Changes:
- Added new GPU metrics:
  1) Violation status (e.g. PVIOL/TVIOL) accumulators
  2) XCP (Graphics Compute Partitions) statistics
  3) PCIe other-end recovery counter
- Added rocm-smi --showmetrics
Units/values are shown as reported by the driver and may differ
from AMD SMI or other ROCm SMI interfaces that
use these fields.
- N/A in a field means the device does not support providing this
data.

Change-Id: Ia2cd3bb65c4f474ebdb39db8062ea716f2b4d8ee
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Charis Poag
2024-06-21 15:13:15 -05:00
commit 0609cbf1d0
13 files changed, 2308 additions, 602 deletions
+63 -8
@@ -4,22 +4,68 @@ Full documentation for rocm_smi_lib is available at [https://rocm.docs.amd.com/]
***All information listed below is for reference and subject to change.***
## rocm_smi_lib for ROCm 6.3
### Changes
- **Added support for GPU metrics 1.6 to `rsmi_dev_gpu_metrics_info_get()`**
Updated `rsmi_dev_gpu_metrics_info_get()` and structure `rsmi_gpu_metrics_t` to include new fields for PVIOL / TVIOL, XCP (Graphics Compute Partitions) stats, and pcie_lc_perf_other_end_recovery:
- `uint64_t accumulation_counter` - used for all throttled calculations
- `uint64_t prochot_residency_acc` - Processor hot accumulator
- `uint64_t ppt_residency_acc` - Package Power Tracking (PPT) accumulator (used in PVIOL calculations)
- `uint64_t socket_thm_residency_acc` - Socket thermal accumulator (used in TVIOL calculations)
- `uint64_t vr_thm_residency_acc` - Voltage Rail (VR) thermal accumulator
- `uint64_t hbm_thm_residency_acc` - High Bandwidth Memory (HBM) thermal accumulator
- `uint16_t num_partition` - corresponds to the current total number of partitions
- `struct amdgpu_xcp_metrics_t xcp_stats[MAX_NUM_XCP]` - for each partition associated with current GPU, provides gfx busy & accumulators, jpeg, and decoder (VCN) engine utilizations
- `uint32_t gfx_busy_inst[MAX_NUM_XCC]` - graphic engine utilization (%)
- `uint16_t jpeg_busy[MAX_NUM_JPEG_ENGS]` - jpeg engine utilization (%)
- `uint16_t vcn_busy[MAX_NUM_VCNS]` - decoder (VCN) engine utilization (%)
- `uint64_t gfx_busy_acc[MAX_NUM_XCC]` - graphic engine utilization accumulated (%)
- `uint32_t pcie_lc_perf_other_end_recovery` - corresponds to the pcie other end recovery counter
- **Added ability to view raw GPU metrics with `rocm-smi --showmetrics`**
Users can now view GPU metrics with the new `rocm-smi --showmetrics` option. Unlike AMD SMI (or other ROCm SMI interfaces), these values are ***not*** converted into applicable units, as users may see in `amd-smi metric`; units are displayed as reported by the driver. Fields that display `N/A` indicate that the ASIC does not support the field, or that backward compatibility was not provided in a newer ASIC's GPU metric structure.
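Because `--showmetrics` reports values in driver units, a consumer has to do any scaling itself. The following is a minimal sketch of that scaling, based only on the unit notes in the `rsmi_gpu_metrics_t` comments (PCIe link speed in 0.1 GT/s, energy accumulator in 15.259 uJ, i.e. 2^-16 J, units, firmware timestamp at 10 ns resolution). The helper names are hypothetical and are not part of rocm_smi_lib.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helpers (not part of rocm_smi_lib): scale raw driver values
// into conventional units, following the unit notes in rsmi_gpu_metrics_t.

// pcie_link_speed is reported in units of 0.1 GT/s.
inline double pcie_link_speed_gts(uint16_t raw) {
  return raw / 10.0;
}

// energy_accumulator counts in units of 2^-16 J (about 15.259 uJ).
inline double energy_joules(uint64_t raw) {
  return static_cast<double>(raw) / 65536.0;
}

// firmware_timestamp has 10 ns resolution, so one tick is 10 ns.
inline uint64_t firmware_timestamp_ns(uint64_t raw) {
  // Guard against overflow when scaling to nanoseconds.
  assert(raw <= std::numeric_limits<uint64_t>::max() / 10);
  return raw * 10;
}
```

For example, a raw `pcie_link_speed` of 160 corresponds to 16 GT/s. Any other field should be checked against its own comment before scaling, since the units differ per field.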
### Removals
- N/A
### Optimizations
- N/A
### Resolved issues
- N/A
### Known Issues
- N/A
### Upcoming changes
- N/A
## rocm_smi_lib for ROCm 6.2.1
### Added
### Changes
- N/A
### Changed
### Removals
- N/A
### Optimized
### Optimizations
- **Improved handling of UnicodeEncodeErrors with non UTF-8 locales**
Non UTF-8 locales were causing crashes on UTF-8 special characters
### Fixed
### Resolved issues
- **Fixed rsmitstReadWrite.TestComputePartitionReadWrite segfault**
The segfault was caused by unhandled start conditions:
@@ -36,28 +82,33 @@ c. reload amgpu - `sudo modprobe amdgpu`
Test needed to keep track of total number of devices, in order to ensure test comes back to the original configuration.
The test segfault could be seen on all MI3x ASICs, if brought up in a non-SPX configuration upon boot.
### Known Issues
- N/A
### Upcoming changes
- N/A
## rocm_smi_lib for ROCm 6.2
### Added
### Changes
- **Added Partition ID API (`rsmi_dev_partition_id_get(..)`)**
Previously, the partition ID could only be retrieved by querying `rsmi_dev_pci_id_get()`
and parsing optional bits in our Python CLI/API. It is now available directly through the API.
We also added testing to our compute partitioning tests to verify that partition IDs update accordingly.
### Changed
### Removals
- N/A
### Optimized
### Optimizations
- N/A
### Fixed
### Resolved issues
- **Partition ID CLI output**
Due to driver changes in KFD, some devices may report bits [31:28] or [2:0]. With the newly added `rsmi_dev_partition_id_get(..)`, we provided this fallback to properly retrieve the partition ID. We
@@ -74,6 +125,10 @@ plan to eventually remove partition ID from the function portion of the BDF (Bus
- N/A
### Upcoming changes
- N/A
## rocm_smi_lib for ROCm 6.1.2
### Added
+97 -15
@@ -925,10 +925,6 @@ struct metrics_table_header_t {
typedef struct metrics_table_header_t metrics_table_header_t;
/// \endcond
/**
* @brief The following structure holds the gpu metrics values for a device.
*/
/**
* @brief Unit conversion factor for HBM temperatures
*/
@@ -964,6 +960,41 @@ typedef struct metrics_table_header_t metrics_table_header_t;
*/
#define RSMI_MAX_NUM_GFX_CLKS 8
/**
* @brief This should match kRSMI_MAX_NUM_XCC;
* XCC - Accelerated Compute Core, the collection of compute units,
* ACE (Asynchronous Compute Engines), caches,
* and global resources organized as one unit.
*
* Refer to amd.com documentation for more detail:
* https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf
*/
#define RSMI_MAX_NUM_XCC 8
/**
* @brief This should match kRSMI_MAX_NUM_XCP;
* XCP - Accelerated Compute Processor,
* also referred to as the Graphics Compute Partitions.
* Each physical gpu could have a maximum of 8 separate partitions
* associated with each (depending on ASIC support).
*
* Refer to amd.com documentation for more detail:
* https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf
*/
#define RSMI_MAX_NUM_XCP 8
/**
* @brief The following structures hold the gpu statistics for a device.
*/
struct amdgpu_xcp_metrics_t {
/* Utilization Instantaneous (%) */
uint32_t gfx_busy_inst[RSMI_MAX_NUM_XCC];
uint16_t jpeg_busy[RSMI_MAX_NUM_JPEG_ENGS];
uint16_t vcn_busy[RSMI_MAX_NUM_VCNS];
/* Utilization Accumulated (%) */
uint64_t gfx_busy_acc[RSMI_MAX_NUM_XCC];
};
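The per-partition arrays above are fixed-size, so a reader has to decide which slots actually carry data. A minimal sketch of averaging `gfx_busy_inst` over the populated XCC slots follows. It assumes an unsupported slot holds the maximum value of its type, which is an assumption made for this illustration and is not stated in this header; the function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>

// Assumption (not stated in this header): an unsupported slot holds the
// maximum value of its type.
constexpr uint32_t kUnsupportedU32 = std::numeric_limits<uint32_t>::max();

// Hypothetical helper: average instantaneous gfx busy (%) over the XCC slots
// that report data. Returns std::nullopt when no slot is populated, which a
// CLI would typically render as "N/A".
inline std::optional<double> average_gfx_busy(const uint32_t* values,
                                              std::size_t count) {
  assert(values != nullptr || count == 0);
  uint64_t sum = 0;
  std::size_t populated = 0;
  for (std::size_t i = 0; i < count; ++i) {
    if (values[i] == kUnsupportedU32) continue;  // skip unsupported slots
    sum += values[i];
    ++populated;
  }
  if (populated == 0) return std::nullopt;
  return static_cast<double>(sum) / static_cast<double>(populated);
}
```

With the real structure, a caller would pass `xcp_stats[p].gfx_busy_inst` and `RSMI_MAX_NUM_XCC` for each partition `p` below `num_partition`.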
typedef struct {
// TODO(amd) Doxygen documents
@@ -985,7 +1016,7 @@ typedef struct {
*/
struct metrics_table_header_t common_header;
// Temperature
// Temperature (C)
uint16_t temperature_edge;
uint16_t temperature_hotspot;
uint16_t temperature_mem;
@@ -993,19 +1024,19 @@ typedef struct {
uint16_t temperature_vrsoc;
uint16_t temperature_vrmem;
// Utilization
// Utilization (%)
uint16_t average_gfx_activity;
uint16_t average_umc_activity; // memory controller
uint16_t average_mm_activity; // UVD or VCN
// Power/Energy
// Power (W) /Energy (15.259uJ per 1ns)
uint16_t average_socket_power;
uint64_t energy_accumulator; // v1 mod. (32->64)
// Driver attached timestamp (in ns)
uint64_t system_clock_counter; // v1 mod. (moved from top of struct)
// Average clocks
// Average clocks (MHz)
uint16_t average_gfxclk_frequency;
uint16_t average_socclk_frequency;
uint16_t average_uclk_frequency;
@@ -1014,7 +1045,7 @@ typedef struct {
uint16_t average_vclk1_frequency;
uint16_t average_dclk1_frequency;
// Current clocks
// Current clocks (MHz)
uint16_t current_gfxclk;
uint16_t current_socclk;
uint16_t current_uclk;
@@ -1026,10 +1057,10 @@ typedef struct {
// Throttle status
uint32_t throttle_status;
// Fans
// Fans (RPM)
uint16_t current_fan_speed;
// Link width/speed
// Link width (number of lanes) /speed (0.1 GT/s)
uint16_t pcie_link_width; // v1 mod.(8->16)
uint16_t pcie_link_speed; // in 0.1 GT/s; v1 mod. (8->16)
@@ -1045,7 +1076,7 @@ typedef struct {
/*
* v1.2 additions
*/
// PMFW attached timestamp (10ns resolution)
// PMFW attached timestamp (10ns resolution)
uint64_t firmware_timestamp;
@@ -1068,19 +1099,19 @@ typedef struct {
uint16_t current_socket_power;
// Utilization (%)
uint16_t vcn_activity[RSMI_MAX_NUM_VCNS]; // VCN instances activity percent (encode/decode)
uint16_t vcn_activity[RSMI_MAX_NUM_VCNS]; // VCN instances activity percent (encode/decode)
// Clock Lock Status. Each bit corresponds to clock instance
uint32_t gfxclk_lock_status;
// XGMI bus width and bitrate (in Gbps)
// XGMI bus width and bitrate (in GB/s)
uint16_t xgmi_link_width;
uint16_t xgmi_link_speed;
// PCIE accumulated bandwidth (GB/sec)
uint64_t pcie_bandwidth_acc;
// PCIE instantaneous bandwidth (GB/sec)
// PCIE instantaneous bandwidth (GB/sec)
uint64_t pcie_bandwidth_inst;
// PCIE L0 to recovery state transition accumulated count
@@ -1114,6 +1145,57 @@ typedef struct {
// PCIE NAK received accumulated count
uint32_t pcie_nak_rcvd_count_acc;
/*
* v1.6 additions
*/
/* Accumulation cycle counter */
uint64_t accumulation_counter;
/**
* Accumulated throttler residencies
*/
uint64_t prochot_residency_acc;
/**
* Accumulated throttler residencies
*
* Prochot (thermal) - PPT (power)
* Package Power Tracking (PPT) violation % (greater than 0% is a violation);
* aka PVIOL
*
* Ex. PVIOL/TVIOL calculations
* Where A and B are measurements recorded at prior points in time.
* Typically A is the earlier measured value and B is the latest measured value.
*
* PVIOL % = (PptResidencyAcc (B) - PptResidencyAcc (A)) * 100 / (AccumulationCounter (B) - AccumulationCounter (A))
* TVIOL % = (SocketThmResidencyAcc (B) - SocketThmResidencyAcc (A)) * 100 / (AccumulationCounter (B) - AccumulationCounter (A))
*/
uint64_t ppt_residency_acc;
/**
* Accumulated throttler residencies
*
* Socket (thermal) -
* Socket thermal violation % (greater than 0% is a violation);
* aka TVIOL
*
* Ex. PVIOL/TVIOL calculations
* Where A and B are measurements recorded at prior points in time.
* Typically A is the earlier measured value and B is the latest measured value.
*
* PVIOL % = (PptResidencyAcc (B) - PptResidencyAcc (A)) * 100 / (AccumulationCounter (B) - AccumulationCounter (A))
* TVIOL % = (SocketThmResidencyAcc (B) - SocketThmResidencyAcc (A)) * 100 / (AccumulationCounter (B) - AccumulationCounter (A))
*/
uint64_t socket_thm_residency_acc;
uint64_t vr_thm_residency_acc;
uint64_t hbm_thm_residency_acc;
/* Current number of partitions */
uint16_t num_partition;
/* XCP (Graphic Cluster Partitions) metrics stats */
struct amdgpu_xcp_metrics_t xcp_stats[RSMI_MAX_NUM_XCP];
/* PCIE other end recovery counter */
uint32_t pcie_lc_perf_other_end_recovery;
/// \endcond
} rsmi_gpu_metrics_t;
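The PVIOL/TVIOL formulas in the comments above can be sketched as a single helper, since both have the same shape and differ only in which residency accumulator is used. This is a minimal illustration under stated assumptions: `ResidencySample` and `violation_percent` are hypothetical names, and in real code the two samples would come from two successive `rsmi_gpu_metrics_t` readings.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical snapshot of the two counters needed for one violation figure.
struct ResidencySample {
  uint64_t residency_acc;         // ppt_residency_acc (PVIOL) or
                                  // socket_thm_residency_acc (TVIOL)
  uint64_t accumulation_counter;  // accumulation_counter from the same reading
};

// Violation % between an earlier sample A and a later sample B:
//   (Acc(B) - Acc(A)) * 100 / (Counter(B) - Counter(A))
// Returns std::nullopt if the accumulation counter did not advance or if
// either counter went backwards, since no percentage is defined then.
inline std::optional<double> violation_percent(const ResidencySample& a,
                                               const ResidencySample& b) {
  if (b.accumulation_counter <= a.accumulation_counter) return std::nullopt;
  if (b.residency_acc < a.residency_acc) return std::nullopt;
  const double num = static_cast<double>(b.residency_acc - a.residency_acc);
  const double den =
      static_cast<double>(b.accumulation_counter - a.accumulation_counter);
  return num * 100.0 / den;
}
```

For instance, if the PPT accumulator advanced by 50 while the accumulation counter advanced by 1000, the PVIOL figure is 5%. Returning an empty optional rather than 0 keeps "no new data" distinguishable from "no violation".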
+4
@@ -242,6 +242,8 @@ class Device {
AMGpuMetricsPublicLatestTupl_t dev_copy_internal_to_external_metrics();
static const std::map<DevInfoTypes, const char*> devInfoTypesStrings;
void set_smi_device_id(uint32_t i) { m_device_id = i; }
void set_smi_partition_id(uint32_t i) { m_partition_id = i; }
private:
std::shared_ptr<Monitor> monitor_;
@@ -278,6 +280,8 @@ class Device {
GpuMetricsBasePtr m_gpu_metrics_ptr;
AMDGpuMetricsHeader_v1_t m_gpu_metrics_header;
uint64_t m_gpu_metrics_updated_timestamp;
uint32_t m_device_id;
uint32_t m_partition_id;
};
+253 -114
@@ -3,7 +3,7 @@
* The University of Illinois/NCSA
* Open Source License (NCSA)
*
* Copyright (c) 2017-2023, Advanced Micro Devices, Inc.
* Copyright (c) 2017-2024, Advanced Micro Devices, Inc.
* All rights reserved.
*
* Developed by:
@@ -52,6 +52,7 @@
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <map>
#include <memory>
#include <type_traits>
@@ -64,21 +65,19 @@
* All 1.4 and newer GPU metrics are now defined in this header.
*
*/
namespace amd::smi
{
namespace amd::smi {
constexpr uint32_t kRSMI_GPU_METRICS_API_CONTENT_MAJOR_VER_1 = 1;
constexpr uint32_t kRSMI_GPU_METRICS_API_CONTENT_MINOR_VER_1 = 1;
constexpr uint32_t kRSMI_GPU_METRICS_API_CONTENT_MINOR_VER_2 = 2;
constexpr uint32_t kRSMI_GPU_METRICS_API_CONTENT_MINOR_VER_3 = 3;
constexpr uint32_t kRSMI_GPU_METRICS_API_CONTENT_MINOR_VER_4 = 4;
constexpr uint32_t kRSMI_LATEST_GPU_METRICS_API_CONTENT_MAJOR_VER = kRSMI_GPU_METRICS_API_CONTENT_MAJOR_VER_1;
constexpr uint32_t kRSMI_LATEST_GPU_METRICS_API_CONTENT_MINON_VER = kRSMI_GPU_METRICS_API_CONTENT_MINOR_VER_4;
constexpr uint32_t kRSMI_LATEST_GPU_METRICS_API_CONTENT_MAJOR_VER
= kRSMI_GPU_METRICS_API_CONTENT_MAJOR_VER_1;
constexpr uint32_t kRSMI_LATEST_GPU_METRICS_API_CONTENT_MINON_VER
= kRSMI_GPU_METRICS_API_CONTENT_MINOR_VER_4;
// Note: As gpu metrics are updating
constexpr uint32_t kRSMI_GPU_METRICS_EXPIRATION_SECS = 5;
// Note: This *must* match NUM_HBM_INSTANCES
constexpr uint32_t kRSMI_MAX_NUM_HBM_INSTANCES = 4;
@@ -97,23 +96,36 @@ constexpr uint32_t kRSMI_MAX_NUM_VCNS = 4;
// Note: This *must* match NUM_JPEG_ENG
constexpr uint32_t kRSMI_MAX_JPEG_ENGINES = 32;
// Note: This *must* match MAX_XCC
constexpr uint32_t kRSMI_MAX_NUM_XCC = 8;
struct AMDGpuMetricsHeader_v1_t
{
// Note: This *must* match MAX_XCP
constexpr uint32_t kRSMI_MAX_NUM_XCP = 8;
struct AMDGpuMetricsHeader_v1_t {
uint16_t m_structure_size;
uint8_t m_format_revision;
uint8_t m_content_revision;
};
struct AMDGpuMetricsBase_t
{
struct amdgpu_xcp_metrics {
/* Utilization Instantaneous (%) */
uint32_t gfx_busy_inst[kRSMI_MAX_NUM_XCC];
uint16_t jpeg_busy[kRSMI_MAX_JPEG_ENGINES];
uint16_t vcn_busy[kRSMI_MAX_NUM_VCNS];
/* Utilization Accumulated (%) */
uint64_t gfx_busy_acc[kRSMI_MAX_NUM_XCC];
};
struct AMDGpuMetricsBase_t {
virtual ~AMDGpuMetricsBase_t() = default;
};
using AMDGpuMetricsBaseRef = AMDGpuMetricsBase_t&;
struct AMDGpuMetrics_v11_t
{
struct AMDGpuMetrics_v11_t {
~AMDGpuMetrics_v11_t() = default;
struct AMDGpuMetricsHeader_v1_t m_common_header;
@@ -174,8 +186,7 @@ struct AMDGpuMetrics_v11_t
uint16_t m_temperature_hbm[kRSMI_MAX_NUM_HBM_INSTANCES];
};
struct AMDGpuMetrics_v12_t
{
struct AMDGpuMetrics_v12_t {
~AMDGpuMetrics_v12_t() = default;
struct AMDGpuMetricsHeader_v1_t m_common_header;
@@ -238,8 +249,7 @@ struct AMDGpuMetrics_v12_t
uint64_t m_firmware_timestamp;
};
struct AMDGpuMetrics_v13_t
{
struct AMDGpuMetrics_v13_t {
~AMDGpuMetrics_v13_t() = default;
struct AMDGpuMetricsHeader_v1_t m_common_header;
@@ -298,7 +308,7 @@ struct AMDGpuMetrics_v13_t
uint32_t m_mem_activity_acc; // new in v1
uint16_t m_temperature_hbm[kRSMI_MAX_NUM_HBM_INSTANCES]; // new in v1
// PMFW attached timestamp (10ns resolution)
// PMFW attached timestamp (10ns resolution)
uint64_t m_firmware_timestamp;
// Voltage (mV)
@@ -312,8 +322,7 @@ struct AMDGpuMetrics_v13_t
uint64_t m_indep_throttle_status;
};
struct AMDGpuMetrics_v14_t
{
struct AMDGpuMetrics_v14_t {
~AMDGpuMetrics_v14_t() = default;
struct AMDGpuMetricsHeader_v1_t m_common_header;
@@ -329,7 +338,7 @@ struct AMDGpuMetrics_v14_t
// Utilization (%)
uint16_t m_average_gfx_activity;
uint16_t m_average_umc_activity; // memory controller
uint16_t m_vcn_activity[kRSMI_MAX_NUM_VCNS]; // VCN instances activity percent (encode/decode)
uint16_t m_vcn_activity[kRSMI_MAX_NUM_VCNS]; // VCN instances activity percent (encode/decode)
// Energy (15.259uJ (2^-16) units)
uint64_t m_energy_accumulator;
@@ -345,9 +354,9 @@ struct AMDGpuMetrics_v14_t
// Link width (number of lanes) and speed (in 0.1 GT/s)
uint16_t m_pcie_link_width;
uint16_t m_pcie_link_speed; // in 0.1 GT/s
uint16_t m_pcie_link_speed; // in 0.1 GT/s
// XGMI bus width and bitrate (in Gbps)
// XGMI bus width and bitrate (in Gbps)
uint16_t m_xgmi_link_width;
uint16_t m_xgmi_link_speed;
@@ -358,7 +367,7 @@ struct AMDGpuMetrics_v14_t
// PCIE accumulated bandwidth (GB/sec)
uint64_t m_pcie_bandwidth_acc;
// PCIE instantaneous bandwidth (GB/sec)
// PCIE instantaneous bandwidth (GB/sec)
uint64_t m_pcie_bandwidth_inst;
// PCIE L0 to recovery state transition accumulated count
@@ -387,8 +396,7 @@ struct AMDGpuMetrics_v14_t
uint16_t m_padding;
};
struct AMDGpuMetrics_v15_t
{
struct AMDGpuMetrics_v15_t {
~AMDGpuMetrics_v15_t() = default;
struct AMDGpuMetricsHeader_v1_t m_common_header;
@@ -404,7 +412,7 @@ struct AMDGpuMetrics_v15_t
// Utilization (%)
uint16_t m_average_gfx_activity;
uint16_t m_average_umc_activity; // memory controller
uint16_t m_vcn_activity[kRSMI_MAX_NUM_VCNS]; // VCN instances activity percent (encode/decode)
uint16_t m_vcn_activity[kRSMI_MAX_NUM_VCNS]; // VCN instances activity percent (encode/decode)
uint16_t m_jpeg_activity[kRSMI_MAX_JPEG_ENGINES]; // JPEG activity percent (encode/decode)
// Energy (15.259uJ (2^-16) units)
@@ -421,7 +429,7 @@ struct AMDGpuMetrics_v15_t
// Link width (number of lanes) and speed (in 0.1 GT/s)
uint16_t m_pcie_link_width;
uint16_t m_pcie_link_speed; // in 0.1 GT/s
uint16_t m_pcie_link_speed; // in 0.1 GT/s
// XGMI bus width and bitrate (in Gbps)
uint16_t m_xgmi_link_width;
@@ -468,7 +476,103 @@ struct AMDGpuMetrics_v15_t
uint16_t m_padding;
};
using AMGpuMetricsLatest_t = AMDGpuMetrics_v15_t;
struct AMDGpuMetrics_v16_t {
~AMDGpuMetrics_v16_t() = default;
struct AMDGpuMetricsHeader_v1_t m_common_header;
// Temperature (Celsius). It will be zero (0) if unsupported.
uint16_t m_temperature_hotspot;
uint16_t m_temperature_mem;
uint16_t m_temperature_vrsoc;
// Power (Watts)
uint16_t m_current_socket_power;
// Utilization (%)
uint16_t m_average_gfx_activity;
uint16_t m_average_umc_activity; // memory controller
// Energy (15.259uJ (2^-16) units)
uint64_t m_energy_accumulator;
// Driver attached timestamp (in ns)
uint64_t m_system_clock_counter;
/*
* Important: bumped up public to uint64_t due to planned size increase
* for newer ASICs
*/
/* Accumulation cycle counter */
uint32_t m_accumulation_counter;
/* Accumulated throttler residencies */
uint32_t m_prochot_residency_acc;
uint32_t m_ppt_residency_acc;
uint32_t m_socket_thm_residency_acc;
uint32_t m_vr_thm_residency_acc;
uint32_t m_hbm_thm_residency_acc;
// Clock Lock Status. Each bit corresponds to clock instance
uint32_t m_gfxclk_lock_status;
// Link width (number of lanes) and speed (in 0.1 GT/s)
uint16_t m_pcie_link_width;
uint16_t m_pcie_link_speed; // in 0.1 GT/s
// XGMI bus width and bitrate (in Gbps)
uint16_t m_xgmi_link_width;
uint16_t m_xgmi_link_speed;
// Utilization Accumulated (%)
uint32_t m_gfx_activity_acc;
uint32_t m_mem_activity_acc;
// PCIE accumulated bandwidth (GB/sec)
uint64_t m_pcie_bandwidth_acc;
// PCIE instantaneous bandwidth (GB/sec)
uint64_t m_pcie_bandwidth_inst;
// PCIE L0 to recovery state transition accumulated count
uint64_t m_pcie_l0_to_recov_count_acc;
// PCIE replay accumulated count
uint64_t m_pcie_replay_count_acc;
// PCIE replay rollover accumulated count
uint64_t m_pcie_replay_rover_count_acc;
// PCIE NAK sent accumulated count
uint32_t m_pcie_nak_sent_count_acc;
// PCIE NAK received accumulated count
uint32_t m_pcie_nak_rcvd_count_acc;
// XGMI accumulated data transfer size(KiloBytes)
uint64_t m_xgmi_read_data_acc[kRSMI_MAX_NUM_XGMI_LINKS];
uint64_t m_xgmi_write_data_acc[kRSMI_MAX_NUM_XGMI_LINKS];
// PMFW attached timestamp (10ns resolution)
uint64_t m_firmware_timestamp;
// Current clocks (MHz)
uint16_t m_current_gfxclk[kRSMI_MAX_NUM_GFX_CLKS];
uint16_t m_current_socclk[kRSMI_MAX_NUM_CLKS];
uint16_t m_current_vclk0[kRSMI_MAX_NUM_CLKS];
uint16_t m_current_dclk0[kRSMI_MAX_NUM_CLKS];
uint16_t m_current_uclk;
/* Current number of partitions */
uint16_t m_num_partition;
/* XCP (Graphic Cluster Partitions) metrics stats */
struct amdgpu_xcp_metrics m_xcp_stats[kRSMI_MAX_NUM_XCP];
/* PCIE other end recovery counter */
uint32_t m_pcie_lc_perf_other_end_recovery;
};
using AMGpuMetricsLatest_t = AMDGpuMetrics_v16_t;
/**
* This is GPU Metrics version that gets to public access.
@@ -555,8 +659,7 @@ using AMDGpuMetricVersionFlagId_t = uint32_t;
* Each Metric Unit (or a set of them) is related to a Metric class.
*
*/
enum class AMDGpuMetricsClassId_t : AMDGpuMetricTypeId_t
{
enum class AMDGpuMetricsClassId_t : AMDGpuMetricTypeId_t {
kGpuMetricHeader,
kGpuMetricTemperature,
kGpuMetricUtilization,
@@ -569,6 +672,9 @@ enum class AMDGpuMetricsClassId_t : AMDGpuMetricTypeId_t
kGpuMetricLinkWidthSpeed,
kGpuMetricVoltage,
kGpuMetricTimestamp,
kGpuMetricThrottleResidency,
kGpuMetricPartition,
kGpuMetricXcpStats,
};
using AMDGpuMetricsClassIdTranslationTbl_t = std::map<AMDGpuMetricsClassId_t, std::string>;
@@ -605,8 +711,8 @@ enum class AMDGpuMetricsUnitType_t : AMDGpuMetricTypeId_t
kMetricAvgMmActivity,
kMetricGfxActivityAccumulator,
kMetricMemActivityAccumulator,
kMetricVcnActivity, //v1.4
kMetricJpegActivity, //v1.5
kMetricVcnActivity, // v1.4
kMetricJpegActivity, // v1.5
// kGpuMetricAverageClock counters
kMetricAvgGfxClockFrequency,
@@ -618,11 +724,11 @@ enum class AMDGpuMetricsUnitType_t : AMDGpuMetricTypeId_t
kMetricAvgDClock1Frequency,
// kGpuMetricCurrentClock counters
kMetricCurrGfxClock, //v1.4: Changed to multi-valued
kMetricCurrSocClock, //v1.4: Changed to multi-valued
kMetricCurrGfxClock, // v1.4: Changed to multi-valued
kMetricCurrSocClock, // v1.4: Changed to multi-valued
kMetricCurrUClock,
kMetricCurrVClock0, //v1.4: Changed to multi-valued
kMetricCurrDClock0, //v1.4: Changed to multi-valued
kMetricCurrVClock0, // v1.4: Changed to multi-valued
kMetricCurrDClock0, // v1.4: Changed to multi-valued
kMetricCurrVClock1,
kMetricCurrDClock1,
@@ -631,7 +737,7 @@ enum class AMDGpuMetricsUnitType_t : AMDGpuMetricTypeId_t
kMetricIndepThrottleStatus,
// kGpuMetricGfxClkLockStatus counters
kMetricGfxClkLockStatus, //v1.4
kMetricGfxClkLockStatus, // v1.4
// kGpuMetricCurrentFanSpeed counters
kMetricCurrFanSpeed,
@@ -639,31 +745,50 @@ enum class AMDGpuMetricsUnitType_t : AMDGpuMetricTypeId_t
// kGpuMetricLinkWidthSpeed counters
kMetricPcieLinkWidth,
kMetricPcieLinkSpeed,
kMetricPcieBandwidthAccumulator, //v1.4
kMetricPcieBandwidthInst, //v1.4
kMetricXgmiLinkWidth, //v1.4
kMetricXgmiLinkSpeed, //v1.4
kMetricXgmiReadDataAccumulator, //v1.4
kMetricXgmiWriteDataAccumulator, //v1.4
kMetricPcieL0RecovCountAccumulator, //v1.4
kMetricPcieReplayCountAccumulator, //v1.4
kMetricPcieReplayRollOverCountAccumulator, //v1.4
kMetricPcieNakSentCountAccumulator, //v1.5
kMetricPcieNakReceivedCountAccumulator, //v1.5
kMetricPcieBandwidthAccumulator, // v1.4
kMetricPcieBandwidthInst, // v1.4
kMetricXgmiLinkWidth, // v1.4
kMetricXgmiLinkSpeed, // v1.4
kMetricXgmiReadDataAccumulator, // v1.4
kMetricXgmiWriteDataAccumulator, // v1.4
kMetricPcieL0RecovCountAccumulator, // v1.4
kMetricPcieReplayCountAccumulator, // v1.4
kMetricPcieReplayRollOverCountAccumulator, // v1.4
kMetricPcieNakSentCountAccumulator, // v1.5
kMetricPcieNakReceivedCountAccumulator, // v1.5
// kGpuMetricPowerEnergy counters
kMetricAvgSocketPower,
kMetricCurrSocketPower, //v1.4
kMetricEnergyAccumulator, //v1.4
kMetricCurrSocketPower, // v1.4
kMetricEnergyAccumulator, // v1.4
// kGpuMetricVoltage counters
kMetricVoltageSoc, //v1.3
kMetricVoltageGfx, //v1.3
kMetricVoltageMem, //v1.3
kMetricVoltageSoc, // v1.3
kMetricVoltageGfx, // v1.3
kMetricVoltageMem, // v1.3
// kGpuMetricTimestamp counters
kMetricTSClockCounter,
kMetricTSFirmware,
// kMetricAccumulationCounter counters
kMetricAccumulationCounter, // v1.6
kMetricProchotResidencyAccumulator, // v1.6
kMetricPPTResidencyAccumulator, // v1.6
kMetricSocketThmResidencyAccumulator, // v1.6
kMetricVRThmResidencyAccumulator, // v1.6
kMetricHBMThmResidencyAccumulator, // v1.6
// kGpuMetricPartition
kGpuMetricNumPartition, // v1.6
// kGpuMetricXcpStats
kMetricGfxBusyInst, // v1.6
kMetricJpegBusy, // v1.6
kMetricVcnBusy, // v1.6
kMetricGfxBusyAcc, // v1.6
kMetricPcieLCPerfOtherEndRecov, // v1.6
};
using AMDGpuMetricsUnitTypeTranslationTbl_t = std::map<AMDGpuMetricsUnitType_t, std::string>;
@@ -676,14 +801,14 @@ enum class AMDGpuMetricsDataType_t : AMDGpuMetricsDataTypeId_t
kUInt64,
};
struct AMDGpuDynamicMetricsValue_t
{
struct AMDGpuDynamicMetricsValue_t {
uint64_t m_value;
std::string m_info;
AMDGpuMetricsDataType_t m_original_type;
};
using AMDGpuDynamicMetricTblValues_t = std::vector<AMDGpuDynamicMetricsValue_t>;
using AMDGpuDynamicMetricsTbl_t = std::map<AMDGpuMetricsClassId_t, std::map<AMDGpuMetricsUnitType_t, AMDGpuDynamicMetricTblValues_t>>;
using AMDGpuDynamicMetricsTbl_t = std::map<AMDGpuMetricsClassId_t,
std::map<AMDGpuMetricsUnitType_t, AMDGpuDynamicMetricTblValues_t>>;
/*
@@ -700,13 +825,13 @@ enum class AMDGpuMetricVersionFlags_t : AMDGpuMetricVersionFlagId_t
kGpuMetricV13 = (0x1 << 3),
kGpuMetricV14 = (0x1 << 4),
kGpuMetricV15 = (0x1 << 5),
kGpuMetricV16 = (0x1 << 6),
};
using AMDGpuMetricVersionTranslationTbl_t = std::map<uint16_t, AMDGpuMetricVersionFlags_t>;
using GpuMetricTypePtr_t = std::shared_ptr<void>;
class GpuMetricsBase_t
{
public:
class GpuMetricsBase_t {
public:
virtual ~GpuMetricsBase_t() = default;
virtual size_t sizeof_metric_table() = 0;
virtual GpuMetricTypePtr_t get_metrics_table() = 0;
@@ -714,30 +839,32 @@ class GpuMetricsBase_t
virtual AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() = 0;
virtual rsmi_status_t populate_metrics_dynamic_tbl() = 0;
virtual AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() = 0;
virtual void set_device_id(uint32_t device_id) { m_device_id = device_id; }
virtual void set_partition_id(uint32_t partition_id) { m_partition_id = partition_id; }
virtual AMDGpuDynamicMetricsTbl_t get_metrics_dynamic_tbl() {
return m_metrics_dynamic_tbl;
}
protected:
protected:
AMDGpuDynamicMetricsTbl_t m_metrics_dynamic_tbl;
uint64_t m_metrics_timestamp;
uint32_t m_device_id;
uint32_t m_partition_id;
};
using GpuMetricsBasePtr = std::shared_ptr<GpuMetricsBase_t>;
using AMDGpuMetricFactories_t = const std::map<AMDGpuMetricVersionFlags_t, GpuMetricsBasePtr>;
class GpuMetricsBase_v11_t final : public GpuMetricsBase_t
{
public:
class GpuMetricsBase_v11_t final : public GpuMetricsBase_t {
public:
virtual ~GpuMetricsBase_v11_t() = default;
size_t sizeof_metric_table() override {
return sizeof(AMDGpuMetrics_v11_t);
}
GpuMetricTypePtr_t get_metrics_table() override
{
GpuMetricTypePtr_t get_metrics_table() override {
if (!m_gpu_metric_ptr) {
m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v11_t*){});
}
@@ -745,13 +872,11 @@ class GpuMetricsBase_v11_t final : public GpuMetricsBase_t
return m_gpu_metric_ptr;
}
void dump_internal_metrics_table() override
{
void dump_internal_metrics_table() override {
return;
}
AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override
{
AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
return AMDGpuMetricVersionFlags_t::kGpuMetricV11;
}
@@ -759,23 +884,20 @@ class GpuMetricsBase_v11_t final : public GpuMetricsBase_t
AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;
private:
private:
AMDGpuMetrics_v11_t m_gpu_metrics_tbl;
std::shared_ptr<AMDGpuMetrics_v11_t> m_gpu_metric_ptr;
};
class GpuMetricsBase_v12_t final : public GpuMetricsBase_t
{
public:
class GpuMetricsBase_v12_t final : public GpuMetricsBase_t {
public:
~GpuMetricsBase_v12_t() = default;
size_t sizeof_metric_table() override {
return sizeof(AMDGpuMetrics_v12_t);
}
GpuMetricTypePtr_t get_metrics_table() override
{
GpuMetricTypePtr_t get_metrics_table() override {
if (!m_gpu_metric_ptr) {
m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v12_t*){});
}
@@ -783,36 +905,31 @@ class GpuMetricsBase_v12_t final : public GpuMetricsBase_t
return m_gpu_metric_ptr;
}
void dump_internal_metrics_table() override
{
void dump_internal_metrics_table() override {
return;
}
AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override
{
AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
return AMDGpuMetricVersionFlags_t::kGpuMetricV12;
}
rsmi_status_t populate_metrics_dynamic_tbl() override;
AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;
private:
private:
AMDGpuMetrics_v12_t m_gpu_metrics_tbl;
std::shared_ptr<AMDGpuMetrics_v12_t> m_gpu_metric_ptr;
};
class GpuMetricsBase_v13_t final : public GpuMetricsBase_t
{
public:
class GpuMetricsBase_v13_t final : public GpuMetricsBase_t {
public:
~GpuMetricsBase_v13_t() = default;
size_t sizeof_metric_table() override {
return sizeof(AMDGpuMetrics_v13_t);
}
GpuMetricTypePtr_t get_metrics_table() override
{
GpuMetricTypePtr_t get_metrics_table() override {
if (!m_gpu_metric_ptr) {
m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v13_t*){});
}
@@ -822,8 +939,7 @@ class GpuMetricsBase_v13_t final : public GpuMetricsBase_t
void dump_internal_metrics_table() override;
AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override
{
AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
return AMDGpuMetricVersionFlags_t::kGpuMetricV13;
}
@@ -831,23 +947,20 @@ class GpuMetricsBase_v13_t final : public GpuMetricsBase_t
AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;
private:
private:
AMDGpuMetrics_v13_t m_gpu_metrics_tbl;
std::shared_ptr<AMDGpuMetrics_v13_t> m_gpu_metric_ptr;
};
class GpuMetricsBase_v14_t final : public GpuMetricsBase_t
{
public:
class GpuMetricsBase_v14_t final : public GpuMetricsBase_t {
public:
~GpuMetricsBase_v14_t() = default;
size_t sizeof_metric_table() override {
return sizeof(AMDGpuMetrics_v14_t);
}
GpuMetricTypePtr_t get_metrics_table() override
{
GpuMetricTypePtr_t get_metrics_table() override {
if (!m_gpu_metric_ptr) {
m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v14_t*){});
}
@@ -857,8 +970,7 @@ class GpuMetricsBase_v14_t final : public GpuMetricsBase_t
void dump_internal_metrics_table() override;
AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override
{
AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
return AMDGpuMetricVersionFlags_t::kGpuMetricV14;
}
@@ -866,23 +978,20 @@ class GpuMetricsBase_v14_t final : public GpuMetricsBase_t
AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;
private:
private:
AMDGpuMetrics_v14_t m_gpu_metrics_tbl;
std::shared_ptr<AMDGpuMetrics_v14_t> m_gpu_metric_ptr;
};
class GpuMetricsBase_v15_t final : public GpuMetricsBase_t
{
public:
class GpuMetricsBase_v15_t final : public GpuMetricsBase_t {
public:
~GpuMetricsBase_v15_t() = default;
size_t sizeof_metric_table() override {
return sizeof(AMDGpuMetrics_v15_t);
}
GpuMetricTypePtr_t get_metrics_table() override
{
GpuMetricTypePtr_t get_metrics_table() override {
if (!m_gpu_metric_ptr) {
m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v15_t*){});
}
@@ -892,8 +1001,7 @@ class GpuMetricsBase_v15_t final : public GpuMetricsBase_t
void dump_internal_metrics_table() override;
AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override
{
AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
return AMDGpuMetricVersionFlags_t::kGpuMetricV15;
}
@@ -901,20 +1009,51 @@ class GpuMetricsBase_v15_t final : public GpuMetricsBase_t
AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;
private:
private:
AMDGpuMetrics_v15_t m_gpu_metrics_tbl;
std::shared_ptr<AMDGpuMetrics_v15_t> m_gpu_metric_ptr;
};
class GpuMetricsBase_v16_t final : public GpuMetricsBase_t {
public:
~GpuMetricsBase_v16_t() = default;
size_t sizeof_metric_table() override {
return sizeof(AMDGpuMetrics_v16_t);
}
GpuMetricTypePtr_t get_metrics_table() override {
if (!m_gpu_metric_ptr) {
m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v16_t*){});
}
assert(m_gpu_metric_ptr != nullptr);
return m_gpu_metric_ptr;
}
void dump_internal_metrics_table() override;
AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
return AMDGpuMetricVersionFlags_t::kGpuMetricV16;
}
rsmi_status_t populate_metrics_dynamic_tbl() override;
AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;
private:
AMDGpuMetrics_v16_t m_gpu_metrics_tbl;
std::shared_ptr<AMDGpuMetrics_v16_t> m_gpu_metric_ptr;
};
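Each `GpuMetricsBase_vNN_t` class returns its table through a `std::shared_ptr` that is reset with an empty lambda as its deleter. A minimal, self-contained sketch of that non-owning pattern is shown here; `MetricsTable` and `MetricsHolder` are hypothetical stand-ins, not types from this header.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a metrics table such as AMDGpuMetrics_v16_t.
struct MetricsTable {
  unsigned temperature_hotspot = 0;
};

// Owns its table by value and hands out a type-erased, non-owning pointer,
// mirroring the get_metrics_table() pattern used in this header.
class MetricsHolder {
 public:
  std::shared_ptr<void> get_metrics_table() {
    if (!m_ptr) {
      // No-op deleter: the table is a data member, so the shared_ptr must
      // never try to free it.
      m_ptr.reset(&m_tbl, [](MetricsTable*) {});
    }
    assert(m_ptr != nullptr);
    return m_ptr;
  }

  const MetricsTable& table() const { return m_tbl; }

 private:
  MetricsTable m_tbl;
  std::shared_ptr<MetricsTable> m_ptr;
};
```

The returned pointer aliases the member table, so writes through it are visible to the holder, and it is only valid while the holder is alive. Copying such a holder would also copy a pointer to the original object's table, which is worth keeping in mind if these classes ever become copyable.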
template<typename T>
rsmi_status_t rsmi_dev_gpu_metrics_info_query(uint32_t dv_ind, AMDGpuMetricsUnitType_t metric_counter, T& metric_value);
rsmi_status_t rsmi_dev_gpu_metrics_info_query(uint32_t dv_ind,
AMDGpuMetricsUnitType_t metric_counter, T& metric_value);
} // namespace amd::smi
} // namespace amd::smi
rsmi_status_t
-rsmi_dev_gpu_metrics_header_info_get(uint32_t dv_ind, metrics_table_header_t& header_value);
+rsmi_dev_gpu_metrics_header_info_get(uint32_t dv_ind,
+    metrics_table_header_t& header_value);
-#endif // ROCM_SMI_ROCM_SMI_GPU_METRICS_H_
+#endif // ROCM_SMI_ROCM_SMI_GPU_METRICS_H_
+72
@@ -48,14 +48,18 @@
#include <algorithm>
#include <cstdint>
#include <iomanip>
#include <iosfwd>
#include <iostream>
#include <iterator>
#include <limits>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>
#include <utility>
#include "rocm_smi/rocm_smi_device.h"
@@ -599,6 +603,74 @@ using TextFileTagContents_t = TagTextContents_t<std::string, std::string,
std::string, std::string>;
//
// Note: Output iterator that inserts a delimiter between elements.
//
template<typename DelimiterType, typename CharType = char,
typename TraitsType = std::char_traits<CharType>>
class ostream_joiner {
public:
using Char_t = CharType;
using Traits_t = TraitsType;
using Ostream_t = std::basic_ostream<Char_t, Traits_t>;
using iterator_category = std::output_iterator_tag;
using value_type = void;
using difference_type = void;
using pointer = void;
using reference = void;
ostream_joiner(Ostream_t* outstream,
const DelimiterType& delimiter) noexcept
(std::is_nothrow_copy_constructible_v<DelimiterType>)
: m_outstream(outstream), m_delimiter(delimiter) {}
ostream_joiner(Ostream_t* outstream, DelimiterType&& delimiter) noexcept
(std::is_nothrow_move_constructible_v<DelimiterType>)
: m_outstream(outstream), m_delimiter(std::move(delimiter)) {}
template<typename ValueType> ostream_joiner& operator=(const ValueType& value) {
if (!m_is_first) {
*m_outstream << m_delimiter;
}
this->m_is_first = false;
this->m_value_count++;
if ((m_value_count % kMAX_VALUES_PER_LINE) == 0) {
*m_outstream << "\n" << value;
this->m_value_count = 0;
} else {
*m_outstream << value;
}
return *this;
}
ostream_joiner& operator*() noexcept { return *this; }
ostream_joiner& operator++() noexcept { return *this; }
ostream_joiner& operator++(int) noexcept { return *this; }
private:
Ostream_t* m_outstream;
DelimiterType m_delimiter;
bool m_is_first = true;
uint32_t m_value_count = 0;
const uint32_t kMAX_VALUES_PER_LINE = 9;
};
/// Object generator for ostream_joiner.
template<typename CharType, typename TraitsType, typename DelimiterType>
inline ostream_joiner<std::decay_t<DelimiterType>, CharType, TraitsType>
make_ostream_joiner(std::basic_ostream<CharType, TraitsType>* outstream,
DelimiterType&& delimiter) {
return {
outstream,
std::forward<DelimiterType>(delimiter)
};
}
} // namespace smi
} // namespace amd
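The wrapping behaviour of `ostream_joiner::operator=` above may be easier to follow outside the iterator machinery. The following is a minimal Python analogue, for illustration only and not part of the commit; `join_with_wrap` is a hypothetical name. It mirrors the logic as shown in the diff: the delimiter is written before every value except the first, and every ninth value is preceded by a newline.

```python
def join_with_wrap(values, delimiter=", ", max_per_line=9):
    """Join values the way the C++ ostream_joiner in the diff does (hypothetical helper)."""
    parts = []
    count = 0
    for index, value in enumerate(values):
        # The delimiter goes in front of every element except the first one.
        if index > 0:
            parts.append(delimiter)
        count += 1
        if count % max_per_line == 0:
            # Every max_per_line-th value starts on a new line; the counter resets.
            parts.append("\n" + str(value))
            count = 0
        else:
            parts.append(str(value))
    return "".join(parts)


wrapped = join_with_wrap(range(1, 11))
# wrapped == "1, 2, 3, 4, 5, 6, 7, 8, \n9, 10"
```

Because the delimiter is written before the newline, a wrapped line ends with a trailing `", "`. That is a property of the logic as written, not something this sketch adds.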
+325 -5
@@ -24,6 +24,7 @@ import trace
from io import StringIO
from time import ctime
from subprocess import check_output
from enum import IntEnum
from typing import TYPE_CHECKING
# only used for type checking
@@ -48,9 +49,9 @@ except ImportError:
# Minor version - Increment when adding a new feature, set to 0 when major is incremented
# Patch version - Increment when adding a fix, set to 0 when minor is incremented
# Hash version - Shortened commit hash. Print here and not with lib for consistency with amd-smi
-SMI_MAJ = 2
-SMI_MIN = 3
-SMI_PAT = 1
+SMI_MAJ = 3
+SMI_MIN = 0
+SMI_PAT = 0
# SMI_HASH is provided by rsmiBindings
__version__ = '%s.%s.%s+%s' % (SMI_MAJ, SMI_MIN, SMI_PAT, SMI_HASH)
@@ -856,7 +857,7 @@ def printEventList(device, delay, eventList):
print2DArray([['\rGPU[%d]:\t' % (data.dv_ind), ctime().split()[3], notification_type_names[data.event.value - 1],
data.message.decode('utf8') + '\r']])
-def printLog(device, metricName, value=None, extraSpace=False, useItalics=False):
+def printLog(device, metricName, value=None, extraSpace=False, useItalics=False, xcp=None):
""" Print out to the SMI log
:param device: DRM device identifier
@@ -878,7 +879,10 @@ def printLog(device, metricName, value=None, extraSpace=False, useItalics=False)
formatJson(device, str(metricName))
return
if value is not None:
-logstr = 'GPU[%s]\t\t: %s: %s' % (device, metricName, value)
+if xcp == None:
+    logstr = 'GPU[%s]\t\t: %s: %s' % (device, metricName, value)
+else:
+    logstr = 'GPU[%s] XCP[%s]\t: %s: %s' % (device, xcp, metricName, value)
else:
logstr = 'GPU[%s]\t\t: %s' % (device, metricName)
if device is None:
@@ -3544,6 +3548,318 @@ def showMemoryPartition(deviceList):
printErrLog(device, 'Failed to retrieve current memory partition, even though device supports it.')
printLogSpacer()
class UIntegerTypes(IntEnum):
UINT8_T = 0xFF
UINT16_T = 0xFFFF
UINT32_T = 0xFFFFFFFF
UINT64_T = 0xFFFFFFFFFFFFFFFF
def validateIfMaxUint(valToCheck, uintType: UIntegerTypes):
return_val = "N/A"
if not isinstance(valToCheck, list):
if valToCheck == uintType:
return return_val
else:
return valToCheck
else:
return_val = valToCheck
for idx, v in enumerate(valToCheck):
if v == uintType:
return_val[idx] = "N/A"
return return_val
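The sentinel convention handled by `validateIfMaxUint` (an all-ones value of the field's width is reported as "N/A") can be sketched in a self-contained form. This is a hedged variant rather than the commit's code: `UIntMax` and `mask_unsupported` are hypothetical names, and unlike the function above this version builds a new list instead of modifying the one passed in.

```python
from enum import IntEnum


class UIntMax(IntEnum):
    """Largest value of each fixed-width unsigned type, used as the 'unsupported' sentinel."""
    UINT8 = 0xFF
    UINT16 = 0xFFFF
    UINT32 = 0xFFFFFFFF
    UINT64 = 0xFFFFFFFFFFFFFFFF


def mask_unsupported(value, sentinel):
    """Return "N/A" in place of any value equal to the sentinel (hypothetical helper)."""
    if isinstance(value, list):
        # Build a fresh list so the caller's data is left untouched.
        return ["N/A" if v == sentinel else v for v in value]
    return "N/A" if value == sentinel else value


raw_temps = [45, 0xFFFF, 47, 0xFFFF]
shown_temps = mask_unsupported(raw_temps, UIntMax.UINT16)
```

The sentinel has to match the width of the field being checked: a `uint16_t` field holding `0xFFFF` is only recognised when compared against the 16-bit maximum, which is why each entry in `showGPUMetrics` names its type explicitly.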
def showGPUMetrics(deviceList):
""" Returns the gpu metrics for a list of devices
:param deviceList: List of DRM devices (can be a single-item list)
"""
printLogSpacer(' GPU Metrics ')
gpu_metrics = rsmi_gpu_metrics_t()
temp_unit="C"
power_unit="W"
energy_unit="15.259uJ (2^-16)"
volt_unit="mV"
clock_unit="MHz"
fan_speed="rpm"
percent_unit="%"
pcie_acc_unit="GB/s"
pcie_lanes_unit="Lanes"
pcie_speed_unit="0.1 GT/s"
xgmi_speed="Gbps"
xgmi_data_sz="kB"
time_unit="ns"
time_unit_10="10ns resolution"
count="Count"
no_unit = None
for device in deviceList:
ret = rocmsmi.rsmi_dev_gpu_metrics_info_get(device, byref(gpu_metrics))
metrics = {
"common_header": "N/A"
}
if rsmi_ret_ok(ret, device, 'rsmi_dev_gpu_metrics_info_get',silent=True):
metrics = {
"common_header": {
"version": float(str(gpu_metrics.common_header.format_revision) + "."
+ str(gpu_metrics.common_header.content_revision)),
"size": gpu_metrics.common_header.structure_size
}, "temperature_edge": {
"value": validateIfMaxUint(gpu_metrics.temperature_edge, UIntegerTypes.UINT16_T),
"unit": temp_unit,
}, "temperature_hotspot": {
"value": validateIfMaxUint(gpu_metrics.temperature_hotspot, UIntegerTypes.UINT16_T),
"unit": temp_unit,
}, "temperature_mem": {
"value": validateIfMaxUint(gpu_metrics.temperature_mem, UIntegerTypes.UINT16_T),
"unit": temp_unit,
}, "temperature_vrgfx": {
"value": validateIfMaxUint(gpu_metrics.temperature_vrgfx, UIntegerTypes.UINT16_T),
"unit": temp_unit,
}, "temperature_vrsoc": {
"value": validateIfMaxUint(gpu_metrics.temperature_vrsoc, UIntegerTypes.UINT16_T),
"unit": temp_unit,
}, "temperature_vrmem": {
"value": validateIfMaxUint(gpu_metrics.temperature_vrmem, UIntegerTypes.UINT16_T),
"unit": temp_unit,
}, "average_gfx_activity": {
"value": validateIfMaxUint(gpu_metrics.average_gfx_activity, UIntegerTypes.UINT16_T),
"unit": percent_unit,
}, "average_umc_activity": {
"value": validateIfMaxUint(gpu_metrics.average_umc_activity, UIntegerTypes.UINT16_T),
"unit": percent_unit,
}, "average_mm_activity": {
"value": validateIfMaxUint(gpu_metrics.average_mm_activity, UIntegerTypes.UINT16_T),
"unit": percent_unit,
}, "average_socket_power": {
"value": validateIfMaxUint(gpu_metrics.average_socket_power, UIntegerTypes.UINT16_T),
"unit": power_unit,
}, "energy_accumulator": {
"value": validateIfMaxUint(gpu_metrics.energy_accumulator, UIntegerTypes.UINT64_T),
"unit": energy_unit,
}, "system_clock_counter": {
"value": validateIfMaxUint(gpu_metrics.system_clock_counter, UIntegerTypes.UINT64_T),
"unit": time_unit,
}, "average_gfxclk_frequency": {
"value": validateIfMaxUint(gpu_metrics.average_gfxclk_frequency, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "average_socclk_frequency": {
"value": validateIfMaxUint(gpu_metrics.average_socclk_frequency, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "average_uclk_frequency": {
"value": validateIfMaxUint(gpu_metrics.average_uclk_frequency, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "average_vclk0_frequency": {
"value": validateIfMaxUint(gpu_metrics.average_vclk0_frequency, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "average_dclk0_frequency": {
"value": validateIfMaxUint(gpu_metrics.average_dclk0_frequency, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "average_vclk1_frequency": {
"value": validateIfMaxUint(gpu_metrics.average_vclk1_frequency, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "average_dclk1_frequency": {
"value": validateIfMaxUint(gpu_metrics.average_dclk1_frequency, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "current_gfxclk": {
"value": validateIfMaxUint(gpu_metrics.current_gfxclk, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "current_socclk": {
"value": validateIfMaxUint(gpu_metrics.current_socclk, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "current_uclk": {
"value": validateIfMaxUint(gpu_metrics.current_uclk, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "current_vclk0": {
"value": validateIfMaxUint(gpu_metrics.current_vclk0, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "current_dclk0": {
"value": validateIfMaxUint(gpu_metrics.current_dclk0, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "current_vclk1": {
"value": validateIfMaxUint(gpu_metrics.current_vclk1, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "current_dclk1": {
"value": validateIfMaxUint(gpu_metrics.current_dclk1, UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "throttle_status": {
"value": validateIfMaxUint(gpu_metrics.throttle_status, UIntegerTypes.UINT32_T),
"unit": no_unit,
}, "current_fan_speed": {
"value": validateIfMaxUint(gpu_metrics.current_fan_speed, UIntegerTypes.UINT16_T),
"unit": fan_speed,
}, "pcie_link_width": {
"value": validateIfMaxUint(gpu_metrics.pcie_link_width, UIntegerTypes.UINT16_T),
"unit": pcie_lanes_unit,
}, "pcie_link_speed": {
"value": validateIfMaxUint(gpu_metrics.pcie_link_speed, UIntegerTypes.UINT16_T),
"unit": pcie_speed_unit,
}, "gfx_activity_acc": {
"value": validateIfMaxUint(gpu_metrics.gfx_activity_acc, UIntegerTypes.UINT32_T),
"unit": percent_unit,
}, "mem_activity_acc": {
"value": validateIfMaxUint(gpu_metrics.mem_activity_acc, UIntegerTypes.UINT32_T),
"unit": percent_unit,
}, "temperature_hbm": {
"value": validateIfMaxUint(list(gpu_metrics.temperature_hbm), UIntegerTypes.UINT16_T),
"unit": temp_unit,
}, "firmware_timestamp": {
"value": validateIfMaxUint(gpu_metrics.firmware_timestamp, UIntegerTypes.UINT64_T),
"unit": time_unit_10,
}, "voltage_soc": {
"value": validateIfMaxUint(gpu_metrics.voltage_soc, UIntegerTypes.UINT16_T),
"unit": volt_unit,
}, "voltage_gfx": {
"value": validateIfMaxUint(gpu_metrics.voltage_gfx, UIntegerTypes.UINT16_T),
"unit": volt_unit,
}, "voltage_mem": {
"value": validateIfMaxUint(gpu_metrics.voltage_mem, UIntegerTypes.UINT16_T),
"unit": volt_unit,
}, "indep_throttle_status": {
"value": validateIfMaxUint(gpu_metrics.indep_throttle_status, UIntegerTypes.UINT64_T),
"unit": no_unit,
}, "current_socket_power": {
"value": validateIfMaxUint(gpu_metrics.current_socket_power, UIntegerTypes.UINT16_T),
"unit": power_unit,
}, "vcn_activity": {
"value": validateIfMaxUint(list(gpu_metrics.vcn_activity), UIntegerTypes.UINT16_T),
"unit": percent_unit,
}, "gfxclk_lock_status": {
"value": validateIfMaxUint(gpu_metrics.gfxclk_lock_status, UIntegerTypes.UINT32_T),
"unit": no_unit,
}, "xgmi_link_width": {
"value": validateIfMaxUint(gpu_metrics.xgmi_link_width, UIntegerTypes.UINT16_T),
"unit": no_unit,
}, "xgmi_link_speed": {
"value": validateIfMaxUint(gpu_metrics.xgmi_link_speed, UIntegerTypes.UINT16_T),
"unit": xgmi_speed,
}, "pcie_bandwidth_acc": {
"value": validateIfMaxUint(gpu_metrics.pcie_bandwidth_acc, UIntegerTypes.UINT64_T),
"unit": pcie_acc_unit,
}, "pcie_bandwidth_inst": {
"value": validateIfMaxUint(gpu_metrics.pcie_bandwidth_inst, UIntegerTypes.UINT64_T),
"unit": pcie_acc_unit,
}, "pcie_l0_to_recov_count_acc": {
"value": validateIfMaxUint(gpu_metrics.pcie_l0_to_recov_count_acc, UIntegerTypes.UINT64_T),
"unit": count,
}, "pcie_replay_count_acc": {
"value": validateIfMaxUint(gpu_metrics.pcie_replay_count_acc, UIntegerTypes.UINT64_T),
"unit": count,
}, "pcie_replay_rover_count_acc": {
"value": validateIfMaxUint(gpu_metrics.pcie_replay_rover_count_acc, UIntegerTypes.UINT64_T),
"unit": count,
}, "xgmi_read_data_acc": {
"value": validateIfMaxUint(list(gpu_metrics.xgmi_read_data_acc), UIntegerTypes.UINT64_T),
"unit": xgmi_data_sz,
}, "xgmi_write_data_acc": {
"value": validateIfMaxUint(list(gpu_metrics.xgmi_write_data_acc), UIntegerTypes.UINT64_T),
"unit": xgmi_data_sz,
}, "current_gfxclks": {
"value": validateIfMaxUint(list(gpu_metrics.current_gfxclks), UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "current_socclks": {
"value": validateIfMaxUint(list(gpu_metrics.current_socclks), UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "current_vclk0s": {
"value": validateIfMaxUint(list(gpu_metrics.current_vclk0s), UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "current_dclk0s": {
"value": validateIfMaxUint(list(gpu_metrics.current_dclk0s), UIntegerTypes.UINT16_T),
"unit": clock_unit,
}, "jpeg_activity": {
"value": validateIfMaxUint(list(gpu_metrics.jpeg_activity), UIntegerTypes.UINT16_T),
"unit": percent_unit,
}, "pcie_nak_sent_count_acc": {
"value": validateIfMaxUint(gpu_metrics.pcie_nak_sent_count_acc, UIntegerTypes.UINT32_T),
"unit": count,
}, "pcie_nak_rcvd_count_acc": {
"value": validateIfMaxUint(gpu_metrics.pcie_nak_rcvd_count_acc, UIntegerTypes.UINT32_T),
"unit": count,
}, "accumulation_counter": {
"value": validateIfMaxUint(gpu_metrics.accumulation_counter, UIntegerTypes.UINT64_T),
"unit": count,
}, "prochot_residency_acc": {
"value": validateIfMaxUint(gpu_metrics.prochot_residency_acc, UIntegerTypes.UINT64_T),
"unit": count,
}, "ppt_residency_acc": {
"value": validateIfMaxUint(gpu_metrics.ppt_residency_acc, UIntegerTypes.UINT64_T),
"unit": count,
}, "socket_thm_residency_acc": {
"value": validateIfMaxUint(gpu_metrics.socket_thm_residency_acc, UIntegerTypes.UINT64_T),
"unit": count,
}, "vr_thm_residency_acc": {
"value": validateIfMaxUint(gpu_metrics.vr_thm_residency_acc, UIntegerTypes.UINT64_T),
"unit": count,
}, "hbm_thm_residency_acc": {
"value": validateIfMaxUint(gpu_metrics.hbm_thm_residency_acc, UIntegerTypes.UINT64_T),
"unit": count,
},
"pcie_lc_perf_other_end_recovery": {
"value": validateIfMaxUint(gpu_metrics.pcie_lc_perf_other_end_recovery, UIntegerTypes.UINT32_T),
"unit": count,
},
"num_partition": {
"value": validateIfMaxUint(gpu_metrics.num_partition, UIntegerTypes.UINT16_T),
"unit": no_unit,
},
"xcp_stats.gfx_busy_inst": {
"value": gpu_metrics.xcp_stats,
"unit": percent_unit,
},
"xcp_stats.jpeg_busy": {
"value": gpu_metrics.xcp_stats,
"unit": percent_unit,
},
"xcp_stats.vcn_busy": {
"value": gpu_metrics.xcp_stats,
"unit": percent_unit,
},
"xcp_stats.gfx_busy_acc": {
"value": gpu_metrics.xcp_stats,
"unit": percent_unit,
},
}
printLog(device, 'Metric Version and Size (Bytes)',
str(metrics["common_header"]["version"]) + " " + str(metrics["common_header"]["size"]))
for k,v in metrics.items():
if k != "common_header" and 'xcp_stats' not in k:
if v["unit"] != None:
printLog(device, k + " (" + str(v["unit"]) + ")", str(v["value"]))
elif v["unit"] == None:
printLog(device, k, str(v["value"]))
if 'xcp_stats.gfx_busy_inst' in k:
for curr_xcp, item in enumerate(v['value']):
print_xcp_detail = []
for _, val in enumerate(item.gfx_busy_inst):
print_xcp_detail.append(validateIfMaxUint(val, UIntegerTypes.UINT32_T))
printLog(device, k + " (" + str(v["unit"]) + ")", str(print_xcp_detail), xcp=str(curr_xcp))
if 'xcp_stats.jpeg_busy' in k:
for curr_xcp, item in enumerate(v['value']):
print_xcp_detail = []
for _, val in enumerate(item.jpeg_busy):
print_xcp_detail.append(validateIfMaxUint(val, UIntegerTypes.UINT16_T))
printLog(device, k + " (" + str(v["unit"]) + ")", str(print_xcp_detail), xcp=str(curr_xcp))
if 'xcp_stats.vcn_busy' in k:
for curr_xcp, item in enumerate(v['value']):
print_xcp_detail = []
for _, val in enumerate(item.vcn_busy):
print_xcp_detail.append(validateIfMaxUint(val, UIntegerTypes.UINT16_T))
printLog(device, k + " (" + str(v["unit"]) + ")", str(print_xcp_detail), xcp=str(curr_xcp))
if 'xcp_stats.gfx_busy_acc' in k:
for curr_xcp, item in enumerate(v['value']):
print_xcp_detail = []
for _, val in enumerate(item.gfx_busy_acc):
print_xcp_detail.append(validateIfMaxUint(val, UIntegerTypes.UINT64_T))
printLog(device, k + " (" + str(v["unit"]) + ")", str(print_xcp_detail), xcp=str(curr_xcp))
if int(device) < (len(deviceList) - 1):
printLogSpacer()
elif ret == rsmi_status_t.RSMI_STATUS_NOT_SUPPORTED:
printLog(device, 'Not supported on the given system', None)
else:
rsmi_ret_ok(ret, device, 'get_gpu_metrics')
printErrLog(device, 'Failed to retrieve GPU metrics, metric version may not be supported for this device.')
printLogSpacer()
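As a rough illustration of the output shape produced by `showGPUMetrics` and the new `xcp` argument of `printLog`, the sketch below formats one log line with and without a partition index. `format_metric_line` and `format_version` are hypothetical helpers, not functions from `rocm-smi`; the two format strings are copied from the diff above. `format_version` keeps the revision as a string, an assumption made here because converting a value such as `1.10` to a float would drop the trailing zero.

```python
def format_version(format_revision, content_revision):
    """Render the metrics table version as 'major.minor' text (hypothetical helper)."""
    return "%d.%d" % (format_revision, content_revision)


def format_metric_line(device, name, value, unit=None, xcp=None):
    """Build one log line; the format strings mirror those used by printLog in the diff."""
    # Append the unit in parentheses only when the metric has one.
    label = name if unit is None else "%s (%s)" % (name, unit)
    if xcp is None:
        return 'GPU[%s]\t\t: %s: %s' % (device, label, value)
    return 'GPU[%s] XCP[%s]\t: %s: %s' % (device, xcp, label, value)


device_line = format_metric_line(0, "temperature_edge", 45, unit="C")
partition_line = format_metric_line(0, "xcp_stats.vcn_busy", [12, "N/A"], unit="%", xcp=1)
```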
def checkAmdGpus(deviceList):
""" Check if there are any AMD GPUs being queried,
@@ -3913,6 +4229,7 @@ if __name__ == '__main__':
groupDisplay.add_argument('--shownodesbw', help='Shows the numa nodes ', action='store_true')
groupDisplay.add_argument('--showcomputepartition', help='Shows current compute partitioning ', action='store_true')
groupDisplay.add_argument('--showmemorypartition', help='Shows current memory partition ', action='store_true')
groupDisplay.add_argument('--showmetrics', help='Show current gpu metric data ', action='store_true')
groupActionReset.add_argument('-r', '--resetclocks', help='Reset clocks and OverDrive to default',
action='store_true')
@@ -4079,6 +4396,7 @@ if __name__ == '__main__':
args.showvc = True
args.showcomputepartition = True
args.showmemorypartition = True
args.showmetrics = True
if not PRINT_JSON:
args.showprofile = True
@@ -4209,6 +4527,8 @@ if __name__ == '__main__':
showComputePartition(deviceList)
if args.showmemorypartition:
showMemoryPartition(deviceList)
if args.showmetrics:
showGPUMetrics(deviceList)
if args.setclock:
setClocks(deviceList, args.setclock[0], [int(args.setclock[1])])
if args.setsclk:
+99
@@ -642,3 +642,102 @@ rsmi_power_type_dict = {
1: 'CURRENT SOCKET',
0xFFFFFFFF: 'INVALID_POWER_TYPE'
}
class metrics_table_header_t(Structure):
pass
# metrics_table_header_t._pack_ = 1 # source:False
metrics_table_header_t._fields_ = [
('structure_size', c_uint16),
('format_revision', c_uint8),
('content_revision', c_uint8),
]
amd_metrics_table_header_t = metrics_table_header_t
class amdgpu_xcp_metrics_t(Structure):
pass
# amdgpu_xcp_metrics_t._pack_ = 1 # source:False
amdgpu_xcp_metrics_t._fields_ = [
('gfx_busy_inst', c_uint32 * 8),
('jpeg_busy', c_uint16 * 32),
('vcn_busy', c_uint16 * 4),
('gfx_busy_acc', c_uint64 * 8),
]
xcp_stats_t = amdgpu_xcp_metrics_t
class rsmi_gpu_metrics_t(Structure):
pass
# rsmi_gpu_metrics_t._pack_ = 1 # source:False
rsmi_gpu_metrics_t._fields_ = [
('common_header', amd_metrics_table_header_t),
('temperature_edge', c_uint16),
('temperature_hotspot', c_uint16),
('temperature_mem', c_uint16),
('temperature_vrgfx', c_uint16),
('temperature_vrsoc', c_uint16),
('temperature_vrmem', c_uint16),
('average_gfx_activity', c_uint16),
('average_umc_activity', c_uint16),
('average_mm_activity', c_uint16),
('average_socket_power', c_uint16),
('energy_accumulator', c_uint64),
('system_clock_counter', c_uint64),
('average_gfxclk_frequency', c_uint16),
('average_socclk_frequency', c_uint16),
('average_uclk_frequency', c_uint16),
('average_vclk0_frequency', c_uint16),
('average_dclk0_frequency', c_uint16),
('average_vclk1_frequency', c_uint16),
('average_dclk1_frequency', c_uint16),
('current_gfxclk', c_uint16),
('current_socclk', c_uint16),
('current_uclk', c_uint16),
('current_vclk0', c_uint16),
('current_dclk0', c_uint16),
('current_vclk1', c_uint16),
('current_dclk1', c_uint16),
('throttle_status', c_uint32),
('current_fan_speed', c_uint16),
('pcie_link_width', c_uint16),
('pcie_link_speed', c_uint16),
('gfx_activity_acc', c_uint32),
('mem_activity_acc', c_uint32),
('temperature_hbm', c_uint16 * 4),
('firmware_timestamp', c_uint64),
('voltage_soc', c_uint16),
('voltage_gfx', c_uint16),
('voltage_mem', c_uint16),
('indep_throttle_status', c_uint64),
('current_socket_power', c_uint16),
('vcn_activity', c_uint16 * 4),
('gfxclk_lock_status', c_uint32),
('xgmi_link_width', c_uint16),
('xgmi_link_speed', c_uint16),
('pcie_bandwidth_acc', c_uint64),
('pcie_bandwidth_inst', c_uint64),
('pcie_l0_to_recov_count_acc', c_uint64),
('pcie_replay_count_acc', c_uint64),
('pcie_replay_rover_count_acc', c_uint64),
('xgmi_read_data_acc', c_uint64 * 8),
('xgmi_write_data_acc', c_uint64 * 8),
('current_gfxclks', c_uint16 * 8),
('current_socclks', c_uint16 * 4),
('current_vclk0s', c_uint16 * 4),
('current_dclk0s', c_uint16 * 4),
('jpeg_activity', c_uint16 * 32),
('pcie_nak_sent_count_acc', c_uint32),
('pcie_nak_rcvd_count_acc', c_uint32),
('accumulation_counter', c_uint64),
('prochot_residency_acc', c_uint64),
('ppt_residency_acc', c_uint64),
('socket_thm_residency_acc', c_uint64),
('vr_thm_residency_acc', c_uint64),
('hbm_thm_residency_acc', c_uint64),
('num_partition', c_uint16),
('xcp_stats', xcp_stats_t * 8),
('pcie_lc_perf_other_end_recovery', c_uint32),
]
amdsmi_gpu_metrics_t = rsmi_gpu_metrics_t
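To make the per-partition layout concrete, here is a small standalone sketch that redeclares `amdgpu_xcp_metrics_t` with the same field list as above and walks an array of eight partitions. It needs neither a GPU nor the ROCm SMI library: the values are filled in by hand, and `summarize_vcn_busy` is a hypothetical helper. The array lengths (8, 32, 4, 8) are taken from the bindings in this diff.

```python
import ctypes

UINT16_MAX = 0xFFFF


class amdgpu_xcp_metrics_t(ctypes.Structure):
    # Same fields, in the same order, as the binding shown in the diff.
    _fields_ = [
        ('gfx_busy_inst', ctypes.c_uint32 * 8),
        ('jpeg_busy', ctypes.c_uint16 * 32),
        ('vcn_busy', ctypes.c_uint16 * 4),
        ('gfx_busy_acc', ctypes.c_uint64 * 8),
    ]


def summarize_vcn_busy(xcp_stats):
    """Return one list per partition, with sentinel entries shown as "N/A" (hypothetical helper)."""
    rows = []
    for xcp in xcp_stats:
        rows.append(["N/A" if v == UINT16_MAX else v for v in xcp.vcn_busy])
    return rows


# Hand-filled stand-in for gpu_metrics.xcp_stats: everything unsupported
# except the first VCN engine of partition 0.
stats = (amdgpu_xcp_metrics_t * 8)()
for xcp in stats:
    for i in range(4):
        xcp.vcn_busy[i] = UINT16_MAX
stats[0].vcn_busy[0] = 37

rows = summarize_vcn_busy(stats)
```

With this field order the structure needs no padding (32 + 64 + 8 + 64 bytes), so `ctypes.sizeof(amdgpu_xcp_metrics_t)` is 168.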
+83 -37
@@ -731,30 +731,6 @@ template<typename T> constexpr float convert_mw_to_w(T mw) {
return static_cast<float>(mw / 1000.0);
}
-template <typename T>
-auto print_error_or_value(rsmi_status_t status_code, const T& metric) {
-if (status_code == rsmi_status_t::RSMI_STATUS_SUCCESS) {
-if constexpr (std::is_array_v<T>) {
-auto idx = uint16_t(0);
-auto str_values = std::string();
-const auto num_elems = static_cast<uint16_t>(std::end(metric) - std::begin(metric));
-str_values = ("\n\t\t num of values: " + std::to_string(num_elems) + "\n");
-for (const auto& el : metric) {
-str_values += "\t\t [" + std::to_string(idx) + "]: " + std::to_string(el) + "\n";
-++idx;
-}
-return str_values;
-}
-else if constexpr ((std::is_same_v<T, std::uint16_t>) ||
-(std::is_same_v<T, std::uint32_t>) ||
-(std::is_same_v<T, std::uint64_t>)) {
-return std::to_string(metric);
-}
-}
-else {
-return ("\n\t\tStatus: [" + std::to_string(status_code) + "] " + "-> " + amd::smi::getRSMIStatusString(status_code));
-}
-};
template <typename T>
std::string print_unsigned_int(T value) {
@@ -860,8 +836,9 @@ int main() {
//
std::cout << "\n";
print_test_header("GPU METRICS: Using static struct (Backwards Compatibility) ", i);
-print_function_header_with_rsmi_ret(ret, "rsmi_dev_gpu_metrics_info_get(" + std::to_string(i) + ", &gpu_metrics)");
-rsmi_dev_gpu_metrics_info_get(i, &gpu_metrics);
+ret = rsmi_dev_gpu_metrics_info_get(i, &gpu_metrics);
+print_function_header_with_rsmi_ret(ret, "rsmi_dev_gpu_metrics_info_get("
+    + std::to_string(i) + ", &gpu_metrics)");
std::cout << "\t**.common_header.format_revision : "
<< print_unsigned_int(gpu_metrics.common_header.format_revision) << "\n";
@@ -960,6 +937,22 @@ int main() {
<< gpu_metrics.pcie_replay_count_acc << "\n";
std::cout << "\t**.pcie_replay_rover_count_acc : " << std::dec
<< gpu_metrics.pcie_replay_rover_count_acc << "\n";
std::cout << "\t**.accumulation_counter : " << std::dec
<< gpu_metrics.accumulation_counter << "\n";
std::cout << "\t**.prochot_residency_acc : " << std::dec
<< gpu_metrics.prochot_residency_acc << "\n";
std::cout << "\t**.ppt_residency_acc : " << std::dec
<< gpu_metrics.ppt_residency_acc << "\n";
std::cout << "\t**.socket_thm_residency_acc : " << std::dec
<< gpu_metrics.socket_thm_residency_acc << "\n";
std::cout << "\t**.vr_thm_residency_acc : " << std::dec
<< gpu_metrics.vr_thm_residency_acc << "\n";
std::cout << "\t**.hbm_thm_residency_acc : " << std::dec
<< gpu_metrics.hbm_thm_residency_acc << "\n";
std::cout << "\t**.num_partition: " << std::dec
<< gpu_metrics.num_partition << "\n";
std::cout << "\t**.pcie_lc_perf_other_end_recovery: "
<< gpu_metrics.pcie_lc_perf_other_end_recovery << "\n";
std::cout << "\t**.temperature_hbm[] : " << std::dec << "\n";
for (const auto& temp : gpu_metrics.temperature_hbm) {
@@ -1001,23 +994,70 @@ int main() {
std::cout << "\t -> " << std::dec << dclk << "\n";
}
std::cout << std::dec << "xcp_stats.gfx_busy_inst = \n";
auto xcp = 0;
for (auto& row : gpu_metrics.xcp_stats) {
std::cout << "XCP[" << xcp << "] = " << "[ ";
std::copy(std::begin(row.gfx_busy_inst),
std::end(row.gfx_busy_inst),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << " ]\n";
xcp++;
}
xcp = 0;
std::cout << std::dec << "xcp_stats.jpeg_busy = \n";
for (auto& row : gpu_metrics.xcp_stats) {
std::cout << "XCP[" << xcp << "] = " << "[ ";
std::copy(std::begin(row.jpeg_busy),
std::end(row.jpeg_busy),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << " ]\n";
xcp++;
}
xcp = 0;
std::cout << std::dec << "xcp_stats.vcn_busy = \n";
for (auto& row : gpu_metrics.xcp_stats) {
std::cout << "XCP[" << xcp << "] = " << "[ ";
std::copy(std::begin(row.vcn_busy),
std::end(row.vcn_busy),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << " ]\n";
xcp++;
}
xcp = 0;
std::cout << std::dec << "xcp_stats.gfx_busy_acc = \n";
for (auto& row : gpu_metrics.xcp_stats) {
std::cout << "XCP[" << xcp << "] = " << "[ ";
std::copy(std::begin(row.gfx_busy_acc),
std::end(row.gfx_busy_acc),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << " ]\n";
xcp++;
}
std::cout << "\n";
std::cout << "\t ** -> Checking metrics with constant changes ** " << "\n";
constexpr uint16_t kMAX_ITER_TEST = 10;
rsmi_gpu_metrics_t gpu_metrics_check;
for (auto idx = uint16_t(1); idx <= kMAX_ITER_TEST; ++idx) {
rsmi_dev_gpu_metrics_info_get(i, &gpu_metrics_check);
-std::cout << "\t\t -> firmware_timestamp [" << idx << "/" << kMAX_ITER_TEST << "]: " << gpu_metrics_check.firmware_timestamp << "\n";
+std::cout << "\t\t -> firmware_timestamp [" << idx
+    << "/" << kMAX_ITER_TEST << "]: " << gpu_metrics_check.firmware_timestamp << "\n";
}
std::cout << "\n";
for (auto idx = uint16_t(1); idx <= kMAX_ITER_TEST; ++idx) {
rsmi_dev_gpu_metrics_info_get(i, &gpu_metrics_check);
-std::cout << "\t\t -> system_clock_counter [" << idx << "/" << kMAX_ITER_TEST << "]: " << gpu_metrics_check.system_clock_counter << "\n";
+std::cout << "\t\t -> system_clock_counter [" << idx
+    << "/" << kMAX_ITER_TEST << "]: " << gpu_metrics_check.system_clock_counter << "\n";
}
std::cout << "\n\n";
-std::cout << " ** Note: Values MAX'ed out (UINTX MAX are unsupported for the version in question) ** " << "\n";
+std::cout << " ** Note: Values MAX'ed out "
+    "(UINTX MAX are unsupported for the version in question) ** " << "\n";
std::cout << "\n\n";
@@ -1026,14 +1066,16 @@ int main() {
ret = rsmi_dev_metrics_header_info_get(i, &header_values);
std::cout << "\t[Metrics Header]" << "\n";
-std::cout << "\t -> format_revision : " << print_unsigned_int(header_values.format_revision) << "\n";
-std::cout << "\t -> content_revision : " << print_unsigned_int(header_values.content_revision) << "\n";
+std::cout << "\t -> format_revision : "
+    << print_unsigned_int(header_values.format_revision) << "\n";
+std::cout << "\t -> content_revision : "
+    << print_unsigned_int(header_values.content_revision) << "\n";
std::cout << "\t--------------------" << "\n";
std::cout << "\n";
std::cout << "\t[XCD CounterVoltage]" << "\n";
ret = rsmi_dev_metrics_xcd_counter_get(i, &val_ui16);
-std::cout << "\t -> xcd_counter(): " << print_error_or_value(ret, val_ui16) << "\n";
+std::cout << "\t -> xcd_counter(): " << val_ui16;
std::cout << "\n\n";
ret = rsmi_dev_perf_level_get(i, &pfl);
@@ -1041,8 +1083,12 @@ int main() {
std::cout << "\t**Performance Level:" <<
perf_level_string(pfl) << "\n";
ret = rsmi_dev_overdrive_level_get(i, &val_ui32);
-CHK_AND_PRINT_RSMI_ERR_RET(ret)
-std::cout << "\t**OverDrive Level:" << val_ui32 << "\n";
+std::cout << "\t**OverDrive Level: ";
+if (ret == RSMI_STATUS_SUCCESS) {
+  std::cout << val_ui32 << "\n";
+} else {
+  CHK_RSMI_NOT_SUPPORTED_OR_UNEXPECTED_DATA_RET(ret)
+}
print_test_header("GPU Clocks", i);
for (int clkType = static_cast<int>(RSMI_CLK_TYPE_SYS);
@@ -1159,9 +1205,6 @@ int main() {
}
for (uint32_t i = 0; i < num_monitor_devs; ++i) {
-ret = test_set_overdrive(i);
-CHK_AND_PRINT_RSMI_ERR_RET(ret)
ret = test_set_perf_level(i);
CHK_AND_PRINT_RSMI_ERR_RET(ret)
@@ -1182,6 +1225,9 @@ int main() {
ret = test_set_memory_partition(i);
CHK_AND_PRINT_RSMI_ERR_RET(ret)
+ret = test_set_overdrive(i);
+CHK_RSMI_NOT_SUPPORTED_RET(ret)
}
return 0;
+3
@@ -5846,6 +5846,9 @@ rsmi_dev_metrics_xcd_counter_get(uint32_t dv_ind, uint16_t* xcd_counter_value)
auto status_code = rsmi_dev_gpu_metrics_info_get(dv_ind, &gpu_metrics);
if (status_code == rsmi_status_t::RSMI_STATUS_SUCCESS) {
for (const auto& gfxclk : gpu_metrics.current_gfxclks) {
if (gfxclk == UINT16_MAX) {
break;
}
if ((gfxclk != 0) && (gfxclk != UINT16_MAX)) {
xcd_counter++;
}
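The XCD counting rule in this hunk can be restated compactly. The following Python sketch is illustrative only (`count_xcds` is a hypothetical name, and the actual implementation is the C++ shown above): it stops at the first `UINT16_MAX` entry, as the `break` in the hunk does, and counts the non-zero clock entries seen before it.

```python
UINT16_MAX = 0xFFFF


def count_xcds(current_gfxclks):
    """Count non-zero clock entries before the first UINT16_MAX entry (hypothetical helper)."""
    xcd_counter = 0
    for gfxclk in current_gfxclks:
        if gfxclk == UINT16_MAX:
            # Stop scanning at the first sentinel, mirroring the break in the hunk.
            break
        if gfxclk != 0:
            xcd_counter += 1
    return xcd_counter


active = count_xcds([2100, 2100, 0, UINT16_MAX, 1500])
# active == 2: the entry after the sentinel is never inspected.
```

With the early `break` in place, the `gfxclk != UINT16_MAX` test in the condition that follows it can no longer fail, so the sketch omits it.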
+9 -7
@@ -964,15 +964,17 @@ int Device::readDevInfoBinary(DevInfoTypes type, std::size_t b_size,
LOG_ERROR(ss);
return ENOENT;
}
-ss << "Successfully read DevInfoBinary for DevInfoType ("
-<< devInfoTypesStrings.at(type) << ") - SYSFS ("
-<< sysfs_path << "), returning binaryData = " << p_binary_data
-<< "; byte_size = " << std::dec << static_cast<int>(b_size);
+if (ROCmLogging::Logger::getInstance()->isLoggerEnabled()) {
+  ss << "Successfully read DevInfoBinary for DevInfoType ("
+  << devInfoTypesStrings.at(type) << ") - SYSFS ("
+  << sysfs_path << "), returning binaryData = " << p_binary_data
+  << "; byte_size = " << std::dec << static_cast<int>(b_size);
-std::string metricDescription = "AMD SMI GPU METRICS (16-byte width), "
+  std::string metricDescription = "AMD SMI GPU METRICS (16-byte width), "
    + sysfs_path;
-logHexDump(metricDescription.c_str(), p_binary_data, b_size, 16);
-LOG_INFO(ss);
+  logHexDump(metricDescription.c_str(), p_binary_data, b_size, 16);
+  LOG_INFO(ss);
+}
return 0;
}
+983 -270
File diff suppressed because it is too large.
+228 -115
@@ -5,7 +5,7 @@
* The University of Illinois/NCSA
* Open Source License (NCSA)
*
-* Copyright (c) 2020, Advanced Micro Devices, Inc.
+* Copyright (c) 2020-2024, Advanced Micro Devices, Inc.
* All rights reserved.
*
* Developed by:
@@ -46,7 +46,9 @@
#include <stdint.h>
#include <stddef.h>
#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <map>
@@ -89,36 +91,6 @@ void TestGpuMetricsRead::Close() {
}
using GPUMetricResults_t = std::map<std::string, rsmi_status_t>;
GPUMetricResults_t MetricResults{};
-template <typename T>
-auto print_error_or_value(std::string title, std::string func_name, const T& metric) {
-auto str_values = title;
-const auto status_code = MetricResults.at(func_name);
-if (status_code == rsmi_status_t::RSMI_STATUS_SUCCESS) {
-if constexpr (std::is_array_v<T>) {
-auto idx = uint16_t(0);
-const auto num_elems = static_cast<uint16_t>(std::end(metric) - std::begin(metric));
-str_values += ("\n\t\t num of values: " + std::to_string(num_elems) + "\n");
-for (const auto& el : metric) {
-str_values += "\t\t [" + std::to_string(idx) + "]: " + std::to_string(el) + "\n";
-++idx;
-}
-return str_values;
-}
-else if constexpr ((std::is_same_v<T, std::uint16_t>) ||
-(std::is_same_v<T, std::uint32_t>) ||
-(std::is_same_v<T, std::uint64_t>)) {
-return str_values += std::to_string(metric);
-}
-}
-else {
-return str_values += ("\n\t\tStatus: [" + std::to_string(status_code) + "] " + "-> " + amd::smi::getRSMIStatusString(status_code));
-}
-};
void TestGpuMetricsRead::Run(void) {
rsmi_status_t err;
@@ -140,13 +112,15 @@ void TestGpuMetricsRead::Run(void) {
auto ret = rsmi_dev_metrics_header_info_get(i, &header_values);
if (ret == rsmi_status_t::RSMI_STATUS_SUCCESS) {
std::cout << "\t[Metrics Header]" << "\n";
-std::cout << "\t -> format_revision : " << amd::smi::print_unsigned_int(header_values.format_revision) << "\n";
-std::cout << "\t -> content_revision : " << amd::smi::print_unsigned_int(header_values.content_revision) << "\n";
+std::cout << "\t -> format_revision : "
+    << static_cast<uint16_t>(header_values.format_revision) << "\n";
+std::cout << "\t -> content_revision : "
+    << static_cast<uint16_t>(header_values.content_revision) << "\n";
std::cout << "\t--------------------" << "\n";
}
}
-rsmi_gpu_metrics_t smu;
+rsmi_gpu_metrics_t smu = {};
err = rsmi_dev_gpu_metrics_info_get(i, &smu);
if (err != RSMI_STATUS_SUCCESS) {
if (err == RSMI_STATUS_NOT_SUPPORTED) {
@@ -159,96 +133,232 @@ void TestGpuMetricsRead::Run(void) {
} else {
CHK_ERR_ASRT(err);
IF_VERB(STANDARD) {
std::cout << std::dec << "\tsystem_clock_counter=" << smu.system_clock_counter << '\n';
std::cout << std::dec << "\ttemperature_edge=" << smu.temperature_edge << '\n';
std::cout << std::dec << "\ttemperature_hotspot=" << smu.temperature_hotspot << '\n';
std::cout << std::dec << "\ttemperature_mem=" << smu.temperature_mem << '\n';
std::cout << std::dec << "\ttemperature_vrgfx=" << smu.temperature_vrgfx << '\n';
std::cout << std::dec << "\ttemperature_vrsoc=" << smu.temperature_vrsoc << '\n';
std::cout << std::dec << "\ttemperature_vrmem=" << smu.temperature_vrmem << '\n';
std::cout << std::dec << "\taverage_gfx_activity=" << smu.average_gfx_activity << '\n';
std::cout << std::dec << "\taverage_umc_activity=" << smu.average_umc_activity << '\n';
std::cout << std::dec << "\taverage_mm_activity=" << smu.average_mm_activity << '\n';
std::cout << std::dec << "\taverage_socket_power=" << smu.average_socket_power << '\n';
std::cout << std::dec << "\tenergy_accumulator=" << smu.energy_accumulator << '\n';
std::cout << std::dec << "\taverage_gfxclk_frequency=" << smu.average_gfxclk_frequency << '\n';
std::cout << std::dec << "\taverage_uclk_frequency=" << smu.average_uclk_frequency << '\n';
std::cout << std::dec << "\taverage_vclk0_frequency=" << smu.average_vclk0_frequency << '\n';
std::cout << std::dec << "\taverage_dclk0_frequency=" << smu.average_dclk0_frequency << '\n';
std::cout << std::dec << "\taverage_vclk1_frequency=" << smu.average_vclk1_frequency << '\n';
std::cout << std::dec << "\taverage_dclk1_frequency=" << smu.average_dclk1_frequency << '\n';
std::cout << std::dec << "\tcurrent_gfxclk=" << smu.current_gfxclk << '\n';
std::cout << std::dec << "\tcurrent_socclk=" << smu.current_socclk << '\n';
std::cout << std::dec << "\tcurrent_uclk=" << smu.current_uclk << '\n';
std::cout << std::dec << "\tcurrent_vclk0=" << smu.current_vclk0 << '\n';
std::cout << std::dec << "\tcurrent_dclk0=" << smu.current_dclk0 << '\n';
std::cout << std::dec << "\tcurrent_vclk1=" << smu.current_vclk1 << '\n';
std::cout << std::dec << "\tcurrent_dclk1=" << smu.current_dclk1 << '\n';
std::cout << std::dec << "\tthrottle_status=" << smu.throttle_status << '\n';
std::cout << std::dec << "\tcurrent_fan_speed=" << smu.current_fan_speed << '\n';
std::cout << std::dec << "\tpcie_link_width=" << smu.pcie_link_width << '\n';
std::cout << std::dec << "\tpcie_link_speed=" << smu.pcie_link_speed << '\n';
std::cout << std::dec << "\tgfx_activity_acc=" << std::dec << smu.gfx_activity_acc << '\n';
std::cout << std::dec << "\tmem_activity_acc=" << std::dec << smu.mem_activity_acc << '\n';
for (int i = 0; i < RSMI_NUM_HBM_INSTANCES; ++i) {
std::cout << "\ttemperature_hbm[" << i << "]=" << std::dec << smu.temperature_hbm[i] << '\n';
}
std::cout << "\n";
std::cout << "\tfirmware_timestamp=" << std::dec << smu.firmware_timestamp << '\n';
std::cout << "\tvoltage_soc=" << std::dec << smu.voltage_soc << '\n';
std::cout << "\tvoltage_gfx=" << std::dec << smu.voltage_gfx << '\n';
std::cout << "\tvoltage_mem=" << std::dec << smu.voltage_mem << '\n';
std::cout << "\tindep_throttle_status=" << std::dec << smu.indep_throttle_status << '\n';
std::cout << "\tcurrent_socket_power=" << std::dec << smu.current_socket_power << '\n';
for (int i = 0; i < RSMI_MAX_NUM_VCNS; ++i) {
std::cout << "\tvcn_activity[" << i << "]=" << std::dec << smu.vcn_activity[i] << '\n';
}
std::cout << "\n";
for (int i = 0; i < RSMI_MAX_NUM_JPEG_ENGS; ++i) {
std::cout << "\tjpeg_activity[" << i << "]=" << std::dec << smu.jpeg_activity[i] << '\n';
}
std::cout << "\n";
std::cout << "\tgfxclk_lock_status=" << std::dec << smu.gfxclk_lock_status << '\n';
std::cout << "\txgmi_link_width=" << std::dec << smu.xgmi_link_width << '\n';
std::cout << "\txgmi_link_speed=" << std::dec << smu.xgmi_link_speed << '\n';
std::cout << "\tpcie_bandwidth_acc=" << std::dec << smu.pcie_bandwidth_acc << '\n';
std::cout << "\tpcie_bandwidth_inst=" << std::dec << smu.pcie_bandwidth_inst << '\n';
std::cout << "\tpcie_l0_to_recov_count_acc=" << std::dec << smu.pcie_l0_to_recov_count_acc << '\n';
std::cout << "\tpcie_replay_count_acc=" << std::dec << smu.pcie_replay_count_acc << '\n';
std::cout << "\tpcie_replay_rover_count_acc=" << std::dec << smu.pcie_replay_rover_count_acc << '\n';
for (int i = 0; i < RSMI_MAX_NUM_XGMI_LINKS; ++i) {
std::cout << "\txgmi_read_data_acc[" << i << "]=" << std::dec << smu.xgmi_read_data_acc[i] << '\n';
}
std::cout << "METRIC TABLE HEADER:\n";
std::cout << "structure_size=" << std::dec
<< static_cast<uint16_t>(smu.common_header.structure_size) << "\n";
std::cout << "format_revision=" << std::dec
<< static_cast<uint16_t>(smu.common_header.format_revision) << "\n";
std::cout << "content_revision=" << std::dec
<< static_cast<uint16_t>(smu.common_header.content_revision) << "\n";
std::cout << "\n";
for (int i = 0; i < RSMI_MAX_NUM_XGMI_LINKS; ++i) {
std::cout << "\txgmi_write_data_acc[" << i << "]=" << std::dec << smu.xgmi_write_data_acc[i] << '\n';
}
std::cout << "TIME STAMPS (ns):\n";
std::cout << std::dec << "system_clock_counter=" << smu.system_clock_counter << "\n";
std::cout << "firmware_timestamp (10ns resolution)=" << std::dec << smu.firmware_timestamp
<< "\n";
std::cout << "\n";
for (int i = 0; i < RSMI_MAX_NUM_GFX_CLKS; ++i) {
std::cout << "\tcurrent_gfxclks[" << i << "]=" << std::dec << smu.current_gfxclks[i] << '\n';
}
std::cout << "TEMPERATURES (C):\n";
std::cout << std::dec << "temperature_edge= " << smu.temperature_edge << "\n";
std::cout << std::dec << "temperature_hotspot= " << smu.temperature_hotspot << "\n";
std::cout << std::dec << "temperature_mem= " << smu.temperature_mem << "\n";
std::cout << std::dec << "temperature_vrgfx= " << smu.temperature_vrgfx << "\n";
std::cout << std::dec << "temperature_vrsoc= " << smu.temperature_vrsoc << "\n";
std::cout << std::dec << "temperature_vrmem= " << smu.temperature_vrmem << "\n";
std::cout << "temperature_hbm = [";
std::copy(std::begin(smu.temperature_hbm),
std::end(smu.temperature_hbm),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << std::dec << "]\n";
std::cout << "\n";
for (int i = 0; i < RSMI_MAX_NUM_CLKS; ++i) {
std::cout << "\tcurrent_socclks[" << i << "]=" << std::dec << smu.current_socclks[i] << '\n';
}
std::cout << "UTILIZATION (%):\n";
std::cout << std::dec << "average_gfx_activity=" << smu.average_gfx_activity << "\n";
std::cout << std::dec << "average_umc_activity=" << smu.average_umc_activity << "\n";
std::cout << std::dec << "average_mm_activity=" << smu.average_mm_activity << "\n";
std::cout << std::dec << "vcn_activity= [";
std::copy(std::begin(smu.vcn_activity),
std::end(smu.vcn_activity),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << std::dec << "]\n";
std::cout << "\n";
for (int i = 0; i < RSMI_MAX_NUM_CLKS; ++i) {
std::cout << "\tcurrent_vclk0s[" << i << "]=" << std::dec << smu.current_vclk0s[i] << '\n';
}
std::cout << std::dec << "jpeg_activity= [";
std::copy(std::begin(smu.jpeg_activity),
std::end(smu.jpeg_activity),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << std::dec << "]\n";
std::cout << "\n";
for (int i = 0; i < RSMI_MAX_NUM_CLKS; ++i) {
std::cout << "\tcurrent_dclk0s[" << i << "]=" << std::dec << smu.current_dclk0s[i] << '\n';
std::cout << "POWER (W)/ENERGY (15.259uJ per 1ns):\n";
std::cout << std::dec << "average_socket_power=" << smu.average_socket_power << "\n";
std::cout << std::dec << "current_socket_power=" << smu.current_socket_power << "\n";
std::cout << std::dec << "energy_accumulator=" << smu.energy_accumulator << "\n";
std::cout << "\n";
std::cout << "AVG CLOCKS (MHz):\n";
std::cout << std::dec << "average_gfxclk_frequency=" << smu.average_gfxclk_frequency
<< "\n";
std::cout << std::dec << "average_uclk_frequency=" << smu.average_uclk_frequency << "\n";
std::cout << std::dec << "average_vclk0_frequency=" << smu.average_vclk0_frequency
<< "\n";
std::cout << std::dec << "average_dclk0_frequency=" << smu.average_dclk0_frequency
<< "\n";
std::cout << std::dec << "average_vclk1_frequency=" << smu.average_vclk1_frequency
<< "\n";
std::cout << std::dec << "average_dclk1_frequency=" << smu.average_dclk1_frequency
<< "\n";
std::cout << "\n";
std::cout << "CURRENT CLOCKS (MHz):\n";
std::cout << std::dec << "current_gfxclk=" << smu.current_gfxclk << "\n";
std::cout << std::dec << "current_gfxclks= [";
std::copy(std::begin(smu.current_gfxclks),
std::end(smu.current_gfxclks),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << std::dec << "]\n";
std::cout << std::dec << "current_socclk=" << smu.current_socclk << "\n";
std::cout << std::dec << "current_socclks= [";
std::copy(std::begin(smu.current_socclks),
std::end(smu.current_socclks),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << std::dec << "]\n";
std::cout << std::dec << "current_uclk=" << smu.current_uclk << "\n";
std::cout << std::dec << "current_vclk0=" << smu.current_vclk0 << "\n";
std::cout << std::dec << "current_vclk0s= [";
std::copy(std::begin(smu.current_vclk0s),
std::end(smu.current_vclk0s),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << std::dec << "]\n";
std::cout << std::dec << "current_dclk0=" << smu.current_dclk0 << "\n";
std::cout << std::dec << "current_dclk0s= [";
std::copy(std::begin(smu.current_dclk0s),
std::end(smu.current_dclk0s),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << std::dec << "]\n";
std::cout << std::dec << "current_vclk1=" << smu.current_vclk1 << "\n";
std::cout << std::dec << "current_dclk1=" << smu.current_dclk1 << "\n";
std::cout << "\n";
std::cout << "THROTTLE STATUS:\n";
std::cout << std::dec << "throttle_status=" << smu.throttle_status << "\n";
std::cout << "\n";
std::cout << "FAN SPEED:\n";
std::cout << std::dec << "current_fan_speed=" << smu.current_fan_speed << "\n";
std::cout << "\n";
std::cout << "LINK WIDTH (number of lanes) /SPEED (0.1 GT/s):\n";
std::cout << "pcie_link_width=" << smu.pcie_link_width << "\n";
std::cout << "pcie_link_speed=" << smu.pcie_link_speed << "\n";
std::cout << "xgmi_link_width=" << smu.xgmi_link_width << "\n";
std::cout << "xgmi_link_speed=" << smu.xgmi_link_speed << "\n";
std::cout << "\n";
std::cout << "Utilization Accumulated(%):\n";
std::cout << "gfx_activity_acc=" << std::dec << smu.gfx_activity_acc << "\n";
std::cout << "mem_activity_acc=" << std::dec << smu.mem_activity_acc << "\n";
std::cout << "\n";
std::cout << "XGMI ACCUMULATED DATA TRANSFER SIZE (KB):\n";
std::cout << std::dec << "xgmi_read_data_acc= [";
std::copy(std::begin(smu.xgmi_read_data_acc),
std::end(smu.xgmi_read_data_acc),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << std::dec << "]\n";
std::cout << std::dec << "xgmi_write_data_acc= [";
std::copy(std::begin(smu.xgmi_write_data_acc),
std::end(smu.xgmi_write_data_acc),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << std::dec << "]\n";
// Voltage (mV)
std::cout << "voltage_soc = " << std::dec << smu.voltage_soc << "\n";
std::cout << "voltage_gfx = " << std::dec << smu.voltage_gfx << "\n";
std::cout << "voltage_mem = " << std::dec << smu.voltage_mem << "\n";
std::cout << "indep_throttle_status = " << std::dec << smu.indep_throttle_status << "\n";
// Clock Lock Status. Each bit corresponds to clock instance
std::cout << "gfxclk_lock_status (in hex) = " << std::hex
<< smu.gfxclk_lock_status << std::dec <<"\n";
// Bandwidth (GB/sec)
std::cout << "pcie_bandwidth_acc=" << std::dec << smu.pcie_bandwidth_acc << "\n";
std::cout << "pcie_bandwidth_inst=" << std::dec << smu.pcie_bandwidth_inst << "\n";
// Counts
std::cout << "pcie_l0_to_recov_count_acc= " << std::dec << smu.pcie_l0_to_recov_count_acc
<< "\n";
std::cout << "pcie_replay_count_acc= " << std::dec << smu.pcie_replay_count_acc << "\n";
std::cout << "pcie_replay_rover_count_acc= " << std::dec
<< smu.pcie_replay_rover_count_acc << "\n";
std::cout << "pcie_nak_sent_count_acc= " << std::dec << smu.pcie_nak_sent_count_acc
<< "\n";
std::cout << "pcie_nak_rcvd_count_acc= " << std::dec << smu.pcie_nak_rcvd_count_acc
<< "\n";
// PCIE other end recovery counter
std::cout << "pcie_lc_perf_other_end_recovery = "
<< std::dec << smu.pcie_lc_perf_other_end_recovery << "\n";
// Accumulation cycle counter
// Accumulated throttler residencies
std::cout << "\n";
std::cout << "RESIDENCY ACCUMULATION / COUNTER:\n";
std::cout << "accumulation_counter = " << std::dec << smu.accumulation_counter << "\n";
std::cout << "prochot_residency_acc = " << std::dec << smu.prochot_residency_acc << "\n";
std::cout << "ppt_residency_acc = " << std::dec << smu.ppt_residency_acc << "\n";
std::cout << "socket_thm_residency_acc = " << std::dec << smu.socket_thm_residency_acc
<< "\n";
std::cout << "vr_thm_residency_acc = " << std::dec << smu.vr_thm_residency_acc
<< "\n";
std::cout << "hbm_thm_residency_acc = " << std::dec << smu.hbm_thm_residency_acc << "\n";
// Number of current partitions
std::cout << "num_partition = " << std::dec << smu.num_partition << "\n";
std::cout << std::dec << "xcp_stats.gfx_busy_inst = \n";
auto xcp = 0;
for (auto& row : smu.xcp_stats) {
std::cout << "XCP[" << xcp << "] = " << "[ ";
std::copy(std::begin(row.gfx_busy_inst),
std::end(row.gfx_busy_inst),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << " ]\n";
xcp++;
}
xcp = 0;
std::cout << std::dec << "xcp_stats.jpeg_busy = \n";
for (auto& row : smu.xcp_stats) {
std::cout << "XCP[" << xcp << "] = " << "[ ";
std::copy(std::begin(row.jpeg_busy),
std::end(row.jpeg_busy),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << " ]\n";
xcp++;
}
xcp = 0;
std::cout << std::dec << "xcp_stats.vcn_busy = \n";
for (auto& row : smu.xcp_stats) {
std::cout << "XCP[" << xcp << "] = " << "[ ";
std::copy(std::begin(row.vcn_busy),
std::end(row.vcn_busy),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << " ]\n";
xcp++;
}
xcp = 0;
std::cout << std::dec << "xcp_stats.gfx_busy_acc = \n";
for (auto& row : smu.xcp_stats) {
std::cout << "XCP[" << xcp << "] = " << "[ ";
std::copy(std::begin(row.gfx_busy_acc),
std::end(row.gfx_busy_acc),
amd::smi::make_ostream_joiner(&std::cout, ", "));
std::cout << " ]\n";
xcp++;
}
std::cout << "\n\n";
std::cout << "\t ** -> Checking metrics with constant changes ** " << "\n";
@@ -256,17 +366,20 @@ void TestGpuMetricsRead::Run(void) {
rsmi_gpu_metrics_t gpu_metrics_check;
for (auto idx = uint16_t(1); idx <= kMAX_ITER_TEST; ++idx) {
rsmi_dev_gpu_metrics_info_get(i, &gpu_metrics_check);
std::cout << "\t\t -> firmware_timestamp [" << idx << "/" << kMAX_ITER_TEST << "]: " << gpu_metrics_check.firmware_timestamp << "\n";
std::cout << "\t\t -> firmware_timestamp [" << idx << "/" << kMAX_ITER_TEST << "]: "
<< gpu_metrics_check.firmware_timestamp << "\n";
}
std::cout << "\n";
for (auto idx = uint16_t(1); idx <= kMAX_ITER_TEST; ++idx) {
rsmi_dev_gpu_metrics_info_get(i, &gpu_metrics_check);
std::cout << "\t\t -> system_clock_counter [" << idx << "/" << kMAX_ITER_TEST << "]: " << gpu_metrics_check.system_clock_counter << "\n";
std::cout << "\t\t -> system_clock_counter [" << idx << "/" << kMAX_ITER_TEST << "]: "
<< gpu_metrics_check.system_clock_counter << "\n";
}
std::cout << "\n";
std::cout << " ** Note: Values MAX'ed out (UINTX MAX are unsupported for the version in question) ** " << "\n\n";
std::cout << " ** Note: Values MAX'ed out "
<< "(UINTX MAX) are unsupported for the version in question ** " << "\n\n";
}
}
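The note printed at the end of the test says that fields reading back as the maximum unsigned value are unsupported for the metrics version in use. A caller could check for that sentinel roughly as sketched below. The helper names `is_metric_supported` and `metric_or_na` are hypothetical, and the "N/A" rendering is an assumption borrowed from the commit message wording.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <type_traits>

// Hypothetical helper (not from rocm_smi_lib): treats the maximum value of an
// unsigned type as the "unsupported" sentinel described in the test output.
template <typename T>
constexpr bool is_metric_supported(T value) {
  static_assert(std::is_unsigned_v<T>, "metrics are unsigned integers");
  return value != std::numeric_limits<T>::max();
}

// Renders a metric as text, or "N/A" when it carries the sentinel value.
template <typename T>
std::string metric_or_na(T value) {
  return is_metric_supported(value) ? std::to_string(value) : std::string("N/A");
}
```

For example, `metric_or_na(smu.temperature_edge)` would yield the reading as text on devices that report it and "N/A" otherwise, assuming the field follows the sentinel convention.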
+89 -31
@@ -53,6 +53,7 @@
#include "rocm_smi/rocm_smi.h"
#include "rocm_smi_test/functional/measure_api_execution_time.h"
#include "rocm_smi_test/test_common.h"
#include "rocm_smi/rocm_smi_utils.h"
TestMeasureApiExecutionTime::TestMeasureApiExecutionTime() : TestBase() {
@@ -89,8 +90,31 @@ void TestMeasureApiExecutionTime::Run(void) {
rsmi_temperature_metric_t met = RSMI_TEMP_CURRENT;
rsmi_status_t ret;
float repeat = 300.0;
constexpr uint32_t kFAN_SPEED_ELAPSED_MS_BASE = (1000);
constexpr uint32_t kMETRICS_ELAPSED_MS_BASE = (1500);
constexpr float kFAN_SPEED_ELAPSED_MICROSEC_BASE = (1000);
/**
* gpu_metrics can only refresh every 1000 microseconds (1 millisecond) due to FW.
*
* We have additional processing time (each read() -> fread() costs ~900 microseconds).
* Cost breakdown (the metrics file is read 2x):
* 1) Read the metric's header to check support (~900 microseconds)
* 2) Read the full metric based on the defined structure (~900 microseconds)
* 3) Set up backwards compatibility (~100 microseconds)
* 4) Put data into structures (~100 microseconds)
* 5) Pass to public structure (~100 microseconds)
* ---------------------------
* ~2100 microseconds worst case
*
* Note: performance of fread/mmap/read
* https://github.com/nurettn/c-read-vs-mmap-vs-fread
*
* Possible improvement ideas:
* a) Initialize "N/A" / max UINT only for non-backwards-compatible public structs
* or arrays
* b) Directly put data into the public structure - this skips other copy/fill
* procedures
* c) Experiment with other file reading options
**/
constexpr float kMETRICS_ELAPSED_MICROSEC_BASE = (2100);
bool skip = false;
TestBase::Run();
@@ -107,91 +131,125 @@ void TestMeasureApiExecutionTime::Run(void) {
for (uint32_t dv_ind = 0; dv_ind < num_monitor_devs(); ++dv_ind) {
PrintDeviceHeader(dv_ind);
//test execution time for rsmi_dev_fan_speed_get
// test execution time for rsmi_dev_fan_speed_get
auto start = std::chrono::high_resolution_clock::now();
for (int i=0; i < static_cast<int>(repeat); ++i){
for (int i=0; i < static_cast<int>(repeat); ++i) {
ret = rsmi_dev_fan_speed_get(dv_ind, 0, &val_i64);
}
auto stop = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast
<std::chrono::microseconds>(stop - start);
if (ret != RSMI_STATUS_SUCCESS){
std::cout << "\n\trsmi_dev_fan_speed_get returned: "
<< amd::smi::getRSMIStatusString(ret) << "\n";
if (ret != RSMI_STATUS_SUCCESS) {
skip = true;
}
std::cout << std:: endl;
// Expected performance: (stop - start) over all iterations [in microseconds]
// == (expected microseconds * # of iterations)
if (!skip) {
std::cout << "\trsmi_dev_fan_speed_get execution time: " <<
(static_cast<float>(duration.count()) / repeat) << " microseconds" << std::endl;
EXPECT_LT(duration.count(), (kFAN_SPEED_ELAPSED_MS_BASE * repeat));
std::cout << "\trsmi_dev_fan_speed_get() total execution time: "
<< std::to_string((static_cast<float>(duration.count())))
<< " microseconds, expected < "
<< std::to_string((static_cast<float>(kFAN_SPEED_ELAPSED_MICROSEC_BASE) * repeat))
<< " microseconds" << std::endl;
std::cout << "\trsmi_dev_fan_speed_get() average execution time: "
<< std::to_string(duration.count()/repeat) << " microseconds" << std::endl;
EXPECT_LT(duration.count(), static_cast<float>(kFAN_SPEED_ELAPSED_MICROSEC_BASE) * repeat);
}
skip = false;
//test execution time for rsmi_dev_temp_metric_get
// test execution time for rsmi_dev_temp_metric_get
start = std::chrono::high_resolution_clock::now();
for (int i=0; i < static_cast<int>(repeat); ++i){
for (int i=0; i < static_cast<int>(repeat); ++i) {
ret = rsmi_dev_temp_metric_get(dv_ind, 0, met, &val_i64);
}
stop = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast
<std::chrono::microseconds>(stop - start);
if (ret != RSMI_STATUS_SUCCESS){
std::cout << "\n\trsmi_dev_temp_metric_get returned: "
<< amd::smi::getRSMIStatusString(ret) << "\n";
if (ret != RSMI_STATUS_SUCCESS) {
skip = true;
}
if (!skip) {
std::cout << "\trsmi_dev_temp_metric_get execution time: " <<
(static_cast<float>(duration.count()) / repeat) << " microseconds" << std::endl;
EXPECT_LT(duration.count(), (kMETRICS_ELAPSED_MS_BASE * repeat));
std::cout << "\trsmi_dev_temp_metric_get() total execution time: "
<< std::to_string((static_cast<float>(duration.count())))
<< " microseconds, expected < "
<< std::to_string((static_cast<float>(kMETRICS_ELAPSED_MICROSEC_BASE) * repeat))
<< " microseconds" << std::endl;
std::cout << "\trsmi_dev_temp_metric_get() average execution time: "
<< std::to_string(duration.count()/repeat) << " microseconds" << std::endl;
EXPECT_LT(duration.count(), (static_cast<float>(kMETRICS_ELAPSED_MICROSEC_BASE) * repeat));
}
skip = false;
//test execution time for rsmi_dev_gpu_metrics_info_get
// test execution time for rsmi_dev_gpu_metrics_info_get
start = std::chrono::high_resolution_clock::now();
for (int i=0; i < static_cast<int>(repeat); ++i){
for (int i=0; i < static_cast<int>(repeat); ++i) {
ret = rsmi_dev_gpu_metrics_info_get(dv_ind, &smu);
}
stop = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast
<std::chrono::microseconds>(stop - start) ;
<std::chrono::microseconds>(stop - start);
if (ret != RSMI_STATUS_SUCCESS){
std::cout << "\n\trsmi_dev_gpu_metrics_info_get returned: "
<< amd::smi::getRSMIStatusString(ret) << "\n";
if (ret != RSMI_STATUS_SUCCESS) {
skip = true;
}
if (!skip) {
std::cout << "\trsmi_dev_gpu_metrics_info_get execution time: " <<
(static_cast<float>(duration.count()) / repeat ) << " microseconds" << std::endl;
EXPECT_LT(duration.count(), (kMETRICS_ELAPSED_MS_BASE * repeat));
std::cout << "\trsmi_dev_gpu_metrics_info_get() total execution time: "
<< std::to_string(static_cast<float>(duration.count()))
<< " microseconds, expected < "
<< std::to_string((kMETRICS_ELAPSED_MICROSEC_BASE * repeat))
<< " microseconds" << std::endl;
std::cout << "\trsmi_dev_gpu_metrics_info_get() average execution time: "
<< std::to_string(duration.count()/repeat) << " microseconds" << std::endl;
EXPECT_LT(static_cast<float>(duration.count()),
static_cast<float>(kMETRICS_ELAPSED_MICROSEC_BASE) * repeat);
}
skip = false;
auto val_ui16 = static_cast<uint16_t>(0);
auto status_code(rsmi_status_t::RSMI_STATUS_SUCCESS);
start = std::chrono::high_resolution_clock::now();
for (int i=0; i < static_cast<int>(repeat); ++i){
for (int i=0; i < static_cast<int>(repeat); ++i) {
status_code = rsmi_dev_metrics_xcd_counter_get(dv_ind, &val_ui16);
}
stop = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
if (status_code != rsmi_status_t::RSMI_STATUS_SUCCESS){
std::cout << "\n\trsmi_dev_metrics_xcd_counter_get returned: "
<< amd::smi::getRSMIStatusString(status_code) << "\n";
if (status_code != rsmi_status_t::RSMI_STATUS_SUCCESS) {
skip = true;
}
if (!skip) {
std::cout << "\trsmi_dev_metrics_xcd_counter_get() execution time: "
<< (static_cast<float>(duration.count()) / repeat) << " microseconds" << std::endl;
EXPECT_LT(duration.count(), (kMETRICS_ELAPSED_MS_BASE * repeat));
std::cout << "\trsmi_dev_metrics_xcd_counter_get() total execution time: "
<< std::to_string((static_cast<float>(duration.count())))
<< " microseconds, expected < "
<< std::to_string((static_cast<float>(kMETRICS_ELAPSED_MICROSEC_BASE) * repeat))
<< " microseconds" << std::endl;
std::cout << "\trsmi_dev_metrics_xcd_counter_get() average execution time: "
<< std::to_string(duration.count()/repeat) << " microseconds" << std::endl;
EXPECT_LT(duration.count(), static_cast<float>(kMETRICS_ELAPSED_MICROSEC_BASE) * repeat);
}
skip = false;
}
std::cout.precision(prev);
auto test_stop = std::chrono::high_resolution_clock::now();
auto test_duration = std::chrono::duration_cast<std::chrono::microseconds>(test_stop - test_start);
auto test_duration = std::chrono::duration_cast<std::chrono::microseconds>(
test_stop - test_start);
std::cout << "\n" << "============================================================================" << "\n";
std::cout << "\n"
<< "============================================================================" << "\n";
std::cout << " Total execution time (All APIs): "
<< (static_cast<float>(test_duration.count()) / repeat) << " microseconds" << "\n";
std::cout << "============================================================================" << "\n";
<< (test_duration.count()) << " microseconds" << "\n";
std::cout
<< "============================================================================" << "\n";
}
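The timing checks in this test all follow one pattern: run a call `repeat` times, measure the total elapsed microseconds, and compare that total against a per-call budget multiplied by the iteration count. A minimal sketch of the pattern follows. The helper names are hypothetical, and `std::chrono::steady_clock` is swapped in for the test's `high_resolution_clock` because it is guaranteed monotonic.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Per-call budget from the comment in the test: two ~900 us reads plus three
// ~100 us processing steps.
constexpr double kMetricsBudgetUs = 900 + 900 + 100 + 100 + 100;
static_assert(kMetricsBudgetUs == 2100, "budget should match the documented sum");

// Hypothetical helper: runs fn() `repeat` times and returns the total elapsed
// time in microseconds.
template <typename Fn>
std::int64_t total_elapsed_us(Fn&& fn, int repeat) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repeat; ++i) {
    fn();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Total budget for the whole loop: per-call budget times iterations.
constexpr double total_budget_us(double per_call_us, int repeat) {
  return per_call_us * repeat;
}
```

With `repeat` set to 300 as in the test, the total budget for the metrics calls works out to 2100 * 300 = 630000 microseconds, and the average per call is the measured total divided by 300.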