[SWDEV-422195/SWDEV-440985] GPU metrics 1.6 + --showmetrics
Changes:
- Added new GPU metrics:
  1) Violation status (e.g. PVIOL/TVIOL) accumulators
  2) XCP (Graphics Compute Partitions) statistics
  3) PCIe other end recovery counter
- Added rocm-smi --showmetrics
  Units/values are shown as indicated by the driver and may differ from AMD SMI or other ROCm SMI interfaces which use these fields.
- N/A fields mean the device does not support providing this data.

Change-Id: Ia2cd3bb65c4f474ebdb39db8062ea716f2b4d8ee
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
This commit is contained in:
+63
-8
@@ -4,22 +4,68 @@ Full documentation for rocm_smi_lib is available at [https://rocm.docs.amd.com/]

***All information listed below is for reference and subject to change.***

## rocm_smi_lib for ROCm 6.3

### Changes

- **Added support for GPU metrics 1.6 to `rsmi_dev_gpu_metrics_info_get()`**
Updated `rsmi_dev_gpu_metrics_info_get()` and the `rsmi_gpu_metrics_t` structure to include new fields for PVIOL / TVIOL, XCP (Graphics Compute Partitions) stats, and `pcie_lc_perf_other_end_recovery`:
  - `uint64_t accumulation_counter` - used for all throttled calculations
  - `uint64_t prochot_residency_acc` - Processor hot accumulator
  - `uint64_t ppt_residency_acc` - Package Power Tracking (PPT) accumulator (used in PVIOL calculations)
  - `uint64_t socket_thm_residency_acc` - Socket thermal accumulator (used in TVIOL calculations)
  - `uint64_t vr_thm_residency_acc` - Voltage Rail (VR) thermal accumulator
  - `uint64_t hbm_thm_residency_acc` - High Bandwidth Memory (HBM) thermal accumulator
  - `uint16_t num_partition` - corresponds to the current total number of partitions
  - `struct amdgpu_xcp_metrics_t xcp_stats[MAX_NUM_XCP]` - for each partition associated with the current GPU, provides gfx busy & accumulators, jpeg, and decoder (VCN) engine utilizations
    - `uint32_t gfx_busy_inst[MAX_NUM_XCC]` - graphics engine utilization (%)
    - `uint16_t jpeg_busy[MAX_NUM_JPEG_ENGS]` - jpeg engine utilization (%)
    - `uint16_t vcn_busy[MAX_NUM_VCNS]` - decoder (VCN) engine utilization (%)
    - `uint64_t gfx_busy_acc[MAX_NUM_XCC]` - graphics engine utilization accumulated (%)
  - `uint32_t pcie_lc_perf_other_end_recovery` - corresponds to the PCIe other end recovery counter
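
The PVIOL / TVIOL percentages that these accumulators feed can be derived from two samples of the counters. The following is a minimal sketch in C, assuming the formula given in the `rsmi_gpu_metrics_t` header comments of this change (residency delta times 100, divided by the accumulation counter delta). The names `violation_sample_t`, `violation_percent`, `pviol_percent`, and `tviol_percent` are hypothetical and are not part of rocm_smi_lib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the three v1.6 counters used here. In real use
 * the values would be copied out of rsmi_gpu_metrics_t after querying the
 * device twice, some time apart. */
typedef struct {
  uint64_t accumulation_counter;
  uint64_t ppt_residency_acc;
  uint64_t socket_thm_residency_acc;
} violation_sample_t;

/* Share (in %) of accumulation cycles spent in violation between an earlier
 * sample (a) and a later sample (b). Returns -1.0 when the counters did not
 * advance, so the caller can report the value as unavailable. */
static double violation_percent(uint64_t acc_a, uint64_t acc_b,
                                uint64_t cnt_a, uint64_t cnt_b) {
  if (cnt_b <= cnt_a || acc_b < acc_a) {
    return -1.0;
  }
  return (double)(acc_b - acc_a) * 100.0 / (double)(cnt_b - cnt_a);
}

/* PVIOL: based on the Package Power Tracking (PPT) residency accumulator. */
double pviol_percent(const violation_sample_t *a, const violation_sample_t *b) {
  assert(a != NULL && b != NULL);
  return violation_percent(a->ppt_residency_acc, b->ppt_residency_acc,
                           a->accumulation_counter, b->accumulation_counter);
}

/* TVIOL: based on the socket thermal residency accumulator. */
double tviol_percent(const violation_sample_t *a, const violation_sample_t *b) {
  assert(a != NULL && b != NULL);
  return violation_percent(a->socket_thm_residency_acc,
                           b->socket_thm_residency_acc,
                           a->accumulation_counter, b->accumulation_counter);
}
```

The guard against a non-advancing counter is a design choice of this sketch; it avoids a division by zero when both samples come from the same cached metrics table.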

- **Added ability to view raw GPU metrics using `rocm-smi --showmetrics`**
Users can now view raw GPU metrics with the new `rocm-smi --showmetrics` option. Unlike `amd-smi metric` and other AMD SMI or ROCm SMI interfaces that use the same data, these values are ***not*** converted into other units: values and units are displayed as indicated by the driver. Fields displaying `N/A` mean that the ASIC does not support the metric, or that backward compatibility for it was not provided in a newer ASIC's GPU metrics structure.
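
Because these values are shown in raw driver units, a consumer may need to scale some fields itself. The following is a minimal sketch in C, assuming the units noted in the header comments of this change: the energy accumulator counts steps of 2^-16 J (about 15.259 uJ) and the PCIe link speed is reported in 0.1 GT/s. The helper names are hypothetical and are not part of rocm_smi_lib.

```c
#include <stdint.h>

/* Assumption: one energy_accumulator unit equals 2^-16 J (~15.259 uJ),
 * as stated in the struct comments. */
double energy_raw_to_joules(uint64_t raw) {
  return (double)raw / 65536.0;
}

/* Assumption: pcie_link_speed is reported in units of 0.1 GT/s. */
double pcie_speed_raw_to_gts(uint16_t raw) {
  return (double)raw / 10.0;
}
```

For example, a raw `pcie_link_speed` of 160 would correspond to 16 GT/s under this assumption.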

### Removals

- N/A

### Optimizations

- N/A

### Resolved issues

- N/A

### Known Issues

- N/A

### Upcoming changes

- N/A

## rocm_smi_lib for ROCm 6.2.1
|
||||
|
||||
### Added
|
||||
### Changes
|
||||
|
||||
- N/A
|
||||
|
||||
### Changed
|
||||
### Removals
|
||||
|
||||
- N/A
|
||||
|
||||
### Optimized
|
||||
### Optimizations
|
||||
|
||||
- **Improved handling of UnicodeEncodeErrors with non UTF-8 locales**
|
||||
Non-UTF-8 locales were causing crashes on UTF-8 special characters
|
||||
|
||||
### Fixed
|
||||
### Resolved issues
|
||||
|
||||
- **Fixed rsmitstReadWrite.TestComputePartitionReadWrite segfault**
|
||||
The segfault was caused by unhandled start conditions:
|
||||
@@ -36,28 +82,33 @@ c. reload amgpu - `sudo modprobe amdgpu`
|
||||
Test needed to keep track of total number of devices, in order to ensure test comes back to the original configuration.
|
||||
The test segfault could be seen on all MI3x ASICs, if brought up in a non-SPX configuration upon boot.
|
||||
|
||||
|
||||
### Known Issues
|
||||
|
||||
- N/A
|
||||
|
||||
### Upcoming changes
|
||||
|
||||
- N/A
|
||||
|
||||
## rocm_smi_lib for ROCm 6.2
|
||||
|
||||
### Added
|
||||
### Changes
|
||||
|
||||
- **Added Partition ID API (`rsmi_dev_partition_id_get(..)`)**
|
||||
Previously, the partition ID could only be retrieved by querying through `rsmi_dev_pci_id_get()`
|
||||
and parsing optional bits in our python CLI/API. We are now making this available directly through API.
|
||||
We also added testing to our compute partitioning tests, verifying that partition IDs update accordingly.
|
||||
|
||||
### Changed
|
||||
### Removals
|
||||
|
||||
- N/A
|
||||
|
||||
### Optimized
|
||||
### Optimizations
|
||||
|
||||
- N/A
|
||||
|
||||
### Fixed
|
||||
### Resolved issues
|
||||
|
||||
- **Partition ID CLI output**
|
||||
Due to driver changes in KFD, some devices may report bits [31:28] or [2:0]. With the newly added `rsmi_dev_partition_id_get(..)`, we provided this fallback to properly retrieve partition ID. We
|
||||
@@ -74,6 +125,10 @@ plan to eventually remove partition ID from the function portion of the BDF (Bus
|
||||
|
||||
- N/A
|
||||
|
||||
### Upcoming changes
|
||||
|
||||
- N/A
|
||||
|
||||
## rocm_smi_lib for ROCm 6.1.2
|
||||
|
||||
### Added
|
||||
|
||||
+97
-15
@@ -925,10 +925,6 @@ struct metrics_table_header_t {
|
||||
typedef struct metrics_table_header_t metrics_table_header_t;
|
||||
/// \endcond
|
||||
|
||||
/**
|
||||
* @brief The following structure holds the gpu metrics values for a device.
|
||||
*/
|
||||
|
||||
/**
|
||||
* @brief Unit conversion factor for HBM temperatures
|
||||
*/
|
||||
@@ -964,6 +960,41 @@ typedef struct metrics_table_header_t metrics_table_header_t;
|
||||
*/
|
||||
#define RSMI_MAX_NUM_GFX_CLKS 8
|
||||
|
||||
/**
|
||||
* @brief This should match kRSMI_MAX_NUM_XCC;
|
||||
* XCC - Accelerated Compute Core, the collection of compute units,
|
||||
* ACE (Asynchronous Compute Engines), caches,
|
||||
* and global resources organized as one unit.
|
||||
*
|
||||
* Refer to amd.com documentation for more detail:
|
||||
* https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf
|
||||
*/
|
||||
#define RSMI_MAX_NUM_XCC 8
|
||||
|
||||
/**
|
||||
* @brief This should match kRSMI_MAX_NUM_XCP;
|
||||
* XCP - Accelerated Compute Processor,
|
||||
* also referred to as the Graphics Compute Partitions.
|
||||
* Each physical gpu could have a maximum of 8 separate partitions
|
||||
* associated with each (depending on ASIC support).
|
||||
*
|
||||
* Refer to amd.com documentation for more detail:
|
||||
* https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/white-papers/amd-cdna-3-white-paper.pdf
|
||||
*/
|
||||
#define RSMI_MAX_NUM_XCP 8
|
||||
|
||||
/**
|
||||
* @brief The following structures hold the gpu statistics for a device.
|
||||
*/
|
||||
struct amdgpu_xcp_metrics_t {
|
||||
/* Utilization Instantaneous (%) */
|
||||
uint32_t gfx_busy_inst[RSMI_MAX_NUM_XCC];
|
||||
uint16_t jpeg_busy[RSMI_MAX_NUM_JPEG_ENGS];
|
||||
uint16_t vcn_busy[RSMI_MAX_NUM_VCNS];
|
||||
|
||||
/* Utilization Accumulated (%) */
|
||||
uint64_t gfx_busy_acc[RSMI_MAX_NUM_XCC];
|
||||
};
|
||||
|
||||
typedef struct {
|
||||
// TODO(amd) Doxygen documents
|
||||
@@ -985,7 +1016,7 @@ typedef struct {
|
||||
*/
|
||||
struct metrics_table_header_t common_header;
|
||||
|
||||
// Temperature
|
||||
// Temperature (C)
|
||||
uint16_t temperature_edge;
|
||||
uint16_t temperature_hotspot;
|
||||
uint16_t temperature_mem;
|
||||
@@ -993,19 +1024,19 @@ typedef struct {
|
||||
uint16_t temperature_vrsoc;
|
||||
uint16_t temperature_vrmem;
|
||||
|
||||
// Utilization
|
||||
// Utilization (%)
|
||||
uint16_t average_gfx_activity;
|
||||
uint16_t average_umc_activity; // memory controller
|
||||
uint16_t average_mm_activity; // UVD or VCN
|
||||
|
||||
// Power/Energy
|
||||
// Power (W) /Energy (15.259uJ per 1ns)
|
||||
uint16_t average_socket_power;
|
||||
uint64_t energy_accumulator; // v1 mod. (32->64)
|
||||
|
||||
// Driver attached timestamp (in ns)
|
||||
uint64_t system_clock_counter; // v1 mod. (moved from top of struct)
|
||||
|
||||
// Average clocks
|
||||
// Average clocks (MHz)
|
||||
uint16_t average_gfxclk_frequency;
|
||||
uint16_t average_socclk_frequency;
|
||||
uint16_t average_uclk_frequency;
|
||||
@@ -1014,7 +1045,7 @@ typedef struct {
|
||||
uint16_t average_vclk1_frequency;
|
||||
uint16_t average_dclk1_frequency;
|
||||
|
||||
// Current clocks
|
||||
// Current clocks (MHz)
|
||||
uint16_t current_gfxclk;
|
||||
uint16_t current_socclk;
|
||||
uint16_t current_uclk;
|
||||
@@ -1026,10 +1057,10 @@ typedef struct {
|
||||
// Throttle status
|
||||
uint32_t throttle_status;
|
||||
|
||||
// Fans
|
||||
// Fans (RPM)
|
||||
uint16_t current_fan_speed;
|
||||
|
||||
// Link width/speed
|
||||
// Link width (number of lanes) /speed (0.1 GT/s)
|
||||
uint16_t pcie_link_width; // v1 mod.(8->16)
|
||||
uint16_t pcie_link_speed; // in 0.1 GT/s; v1 mod. (8->16)
|
||||
|
||||
@@ -1045,7 +1076,7 @@ typedef struct {
|
||||
/*
|
||||
* v1.2 additions
|
||||
*/
|
||||
// PMFW attached timestamp (10ns resolution)
|
||||
// PMFW attached timestamp (10ns resolution)
|
||||
uint64_t firmware_timestamp;
|
||||
|
||||
|
||||
@@ -1068,19 +1099,19 @@ typedef struct {
|
||||
uint16_t current_socket_power;
|
||||
|
||||
// Utilization (%)
|
||||
uint16_t vcn_activity[RSMI_MAX_NUM_VCNS]; // VCN instances activity percent (encode/decode)
|
||||
uint16_t vcn_activity[RSMI_MAX_NUM_VCNS]; // VCN instances activity percent (encode/decode)
|
||||
|
||||
// Clock Lock Status. Each bit corresponds to clock instance
|
||||
uint32_t gfxclk_lock_status;
|
||||
|
||||
// XGMI bus width and bitrate (in Gbps)
|
||||
// XGMI bus width and bitrate (in GB/s)
|
||||
uint16_t xgmi_link_width;
|
||||
uint16_t xgmi_link_speed;
|
||||
|
||||
// PCIE accumulated bandwidth (GB/sec)
|
||||
uint64_t pcie_bandwidth_acc;
|
||||
|
||||
// PCIE instantaneous bandwidth (GB/sec)
|
||||
// PCIE instantaneous bandwidth (GB/sec)
|
||||
uint64_t pcie_bandwidth_inst;
|
||||
|
||||
// PCIE L0 to recovery state transition accumulated count
|
||||
@@ -1114,6 +1145,57 @@ typedef struct {
|
||||
// PCIE NAK received accumulated count
|
||||
uint32_t pcie_nak_rcvd_count_acc;
|
||||
|
||||
/*
|
||||
* v1.6 additions
|
||||
*/
|
||||
/* Accumulation cycle counter */
|
||||
uint64_t accumulation_counter;
|
||||
|
||||
/**
|
||||
* Accumulated throttler residencies
|
||||
*/
|
||||
uint64_t prochot_residency_acc;
|
||||
/**
|
||||
* Accumulated throttler residencies
|
||||
*
|
||||
* Prochot (thermal) - PPT (power)
|
||||
* Package Power Tracking (PPT) violation % (greater than 0% is a violation);
|
||||
* aka PVIOL
|
||||
*
|
||||
* Ex. PVIOL/TVIOL calculations
|
||||
 * Where A and B are measurements recorded at prior points in time.
|
||||
* Typically A is the earlier measured value and B is the latest measured value.
|
||||
*
|
||||
* PVIOL % = (PptResidencyAcc (B) - PptResidencyAcc (A)) * 100/ (AccumulationCounter (B) - AccumulationCounter (A))
|
||||
* TVIOL % = (SocketThmResidencyAcc (B) - SocketThmResidencyAcc (A)) * 100 / (AccumulationCounter (B) - AccumulationCounter (A))
|
||||
*/
|
||||
uint64_t ppt_residency_acc;
|
||||
/**
|
||||
* Accumulated throttler residencies
|
||||
*
|
||||
* Socket (thermal) -
|
||||
* Socket thermal violation % (greater than 0% is a violation);
|
||||
* aka TVIOL
|
||||
*
|
||||
* Ex. PVIOL/TVIOL calculations
|
||||
 * Where A and B are measurements recorded at prior points in time.
|
||||
* Typically A is the earlier measured value and B is the latest measured value.
|
||||
*
|
||||
* PVIOL % = (PptResidencyAcc (B) - PptResidencyAcc (A)) * 100/ (AccumulationCounter (B) - AccumulationCounter (A))
|
||||
* TVIOL % = (SocketThmResidencyAcc (B) - SocketThmResidencyAcc (A)) * 100 / (AccumulationCounter (B) - AccumulationCounter (A))
|
||||
*/
|
||||
uint64_t socket_thm_residency_acc;
|
||||
uint64_t vr_thm_residency_acc;
|
||||
uint64_t hbm_thm_residency_acc;
|
||||
|
||||
/* Number of current partition */
|
||||
uint16_t num_partition;
|
||||
|
||||
/* XCP (Graphic Cluster Partitions) metrics stats */
|
||||
struct amdgpu_xcp_metrics_t xcp_stats[RSMI_MAX_NUM_XCP];
|
||||
|
||||
/* PCIE other end recovery counter */
|
||||
uint32_t pcie_lc_perf_other_end_recovery;
|
||||
|
||||
/// \endcond
|
||||
} rsmi_gpu_metrics_t;
|
||||
|
||||
@@ -242,6 +242,8 @@ class Device {
|
||||
AMGpuMetricsPublicLatestTupl_t dev_copy_internal_to_external_metrics();
|
||||
|
||||
static const std::map<DevInfoTypes, const char*> devInfoTypesStrings;
|
||||
void set_smi_device_id(uint32_t i) { m_device_id = i; }
|
||||
void set_smi_partition_id(uint32_t i) { m_partition_id = i; }
|
||||
|
||||
private:
|
||||
std::shared_ptr<Monitor> monitor_;
|
||||
@@ -278,6 +280,8 @@ class Device {
|
||||
GpuMetricsBasePtr m_gpu_metrics_ptr;
|
||||
AMDGpuMetricsHeader_v1_t m_gpu_metrics_header;
|
||||
uint64_t m_gpu_metrics_updated_timestamp;
|
||||
uint32_t m_device_id;
|
||||
uint32_t m_partition_id;
|
||||
};
|
||||
|
||||
|
||||
|
||||
@@ -3,7 +3,7 @@
|
||||
* The University of Illinois/NCSA
|
||||
* Open Source License (NCSA)
|
||||
*
|
||||
* Copyright (c) 2017-2023, Advanced Micro Devices, Inc.
|
||||
* Copyright (c) 2017-2024, Advanced Micro Devices, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* Developed by:
|
||||
@@ -52,6 +52,7 @@
|
||||
#include <cassert>
|
||||
#include <cstdint>
|
||||
#include <cstring>
|
||||
#include <string>
|
||||
#include <map>
|
||||
#include <memory>
|
||||
#include <type_traits>
|
||||
@@ -64,21 +65,19 @@
|
||||
* All 1.4 and newer GPU metrics are now defined in this header.
|
||||
*
|
||||
*/
|
||||
namespace amd::smi
|
||||
{
|
||||
namespace amd::smi {
|
||||
|
||||
constexpr uint32_t kRSMI_GPU_METRICS_API_CONTENT_MAJOR_VER_1 = 1;
|
||||
constexpr uint32_t kRSMI_GPU_METRICS_API_CONTENT_MINOR_VER_1 = 1;
|
||||
constexpr uint32_t kRSMI_GPU_METRICS_API_CONTENT_MINOR_VER_2 = 2;
|
||||
constexpr uint32_t kRSMI_GPU_METRICS_API_CONTENT_MINOR_VER_3 = 3;
|
||||
constexpr uint32_t kRSMI_GPU_METRICS_API_CONTENT_MINOR_VER_4 = 4;
|
||||
constexpr uint32_t kRSMI_LATEST_GPU_METRICS_API_CONTENT_MAJOR_VER = kRSMI_GPU_METRICS_API_CONTENT_MAJOR_VER_1;
|
||||
constexpr uint32_t kRSMI_LATEST_GPU_METRICS_API_CONTENT_MINON_VER = kRSMI_GPU_METRICS_API_CONTENT_MINOR_VER_4;
|
||||
constexpr uint32_t kRSMI_LATEST_GPU_METRICS_API_CONTENT_MAJOR_VER
|
||||
= kRSMI_GPU_METRICS_API_CONTENT_MAJOR_VER_1;
|
||||
constexpr uint32_t kRSMI_LATEST_GPU_METRICS_API_CONTENT_MINON_VER
|
||||
= kRSMI_GPU_METRICS_API_CONTENT_MINOR_VER_4;
|
||||
|
||||
|
||||
// Note: As gpu metrics are updating
|
||||
constexpr uint32_t kRSMI_GPU_METRICS_EXPIRATION_SECS = 5;
|
||||
|
||||
// Note: This *must* match NUM_HBM_INSTANCES
|
||||
constexpr uint32_t kRSMI_MAX_NUM_HBM_INSTANCES = 4;
|
||||
|
||||
@@ -97,23 +96,36 @@ constexpr uint32_t kRSMI_MAX_NUM_VCNS = 4;
|
||||
// Note: This *must* match NUM_JPEG_ENG
|
||||
constexpr uint32_t kRSMI_MAX_JPEG_ENGINES = 32;
|
||||
|
||||
// Note: This *must* match MAX_XCC
|
||||
constexpr uint32_t kRSMI_MAX_NUM_XCC = 8;
|
||||
|
||||
struct AMDGpuMetricsHeader_v1_t
|
||||
{
|
||||
// Note: This *must* match MAX_XCP
|
||||
constexpr uint32_t kRSMI_MAX_NUM_XCP = 8;
|
||||
|
||||
|
||||
struct AMDGpuMetricsHeader_v1_t {
|
||||
uint16_t m_structure_size;
|
||||
uint8_t m_format_revision;
|
||||
uint8_t m_content_revision;
|
||||
};
|
||||
|
||||
struct AMDGpuMetricsBase_t
|
||||
{
|
||||
struct amdgpu_xcp_metrics {
|
||||
/* Utilization Instantaneous (%) */
|
||||
uint32_t gfx_busy_inst[kRSMI_MAX_NUM_XCC];
|
||||
uint16_t jpeg_busy[kRSMI_MAX_JPEG_ENGINES];
|
||||
uint16_t vcn_busy[kRSMI_MAX_NUM_VCNS];
|
||||
|
||||
/* Utilization Accumulated (%) */
|
||||
uint64_t gfx_busy_acc[kRSMI_MAX_NUM_XCC];
|
||||
};
|
||||
|
||||
struct AMDGpuMetricsBase_t {
|
||||
virtual ~AMDGpuMetricsBase_t() = default;
|
||||
};
|
||||
using AMDGpuMetricsBaseRef = AMDGpuMetricsBase_t&;
|
||||
|
||||
|
||||
struct AMDGpuMetrics_v11_t
|
||||
{
|
||||
struct AMDGpuMetrics_v11_t {
|
||||
~AMDGpuMetrics_v11_t() = default;
|
||||
|
||||
struct AMDGpuMetricsHeader_v1_t m_common_header;
|
||||
@@ -174,8 +186,7 @@ struct AMDGpuMetrics_v11_t
|
||||
uint16_t m_temperature_hbm[kRSMI_MAX_NUM_HBM_INSTANCES];
|
||||
};
|
||||
|
||||
struct AMDGpuMetrics_v12_t
|
||||
{
|
||||
struct AMDGpuMetrics_v12_t {
|
||||
~AMDGpuMetrics_v12_t() = default;
|
||||
|
||||
struct AMDGpuMetricsHeader_v1_t m_common_header;
|
||||
@@ -238,8 +249,7 @@ struct AMDGpuMetrics_v12_t
|
||||
uint64_t m_firmware_timestamp;
|
||||
};
|
||||
|
||||
struct AMDGpuMetrics_v13_t
|
||||
{
|
||||
struct AMDGpuMetrics_v13_t {
|
||||
~AMDGpuMetrics_v13_t() = default;
|
||||
|
||||
struct AMDGpuMetricsHeader_v1_t m_common_header;
|
||||
@@ -298,7 +308,7 @@ struct AMDGpuMetrics_v13_t
|
||||
uint32_t m_mem_activity_acc; // new in v1
|
||||
uint16_t m_temperature_hbm[kRSMI_MAX_NUM_HBM_INSTANCES]; // new in v1
|
||||
|
||||
// PMFW attached timestamp (10ns resolution)
|
||||
// PMFW attached timestamp (10ns resolution)
|
||||
uint64_t m_firmware_timestamp;
|
||||
|
||||
// Voltage (mV)
|
||||
@@ -312,8 +322,7 @@ struct AMDGpuMetrics_v13_t
|
||||
uint64_t m_indep_throttle_status;
|
||||
};
|
||||
|
||||
struct AMDGpuMetrics_v14_t
|
||||
{
|
||||
struct AMDGpuMetrics_v14_t {
|
||||
~AMDGpuMetrics_v14_t() = default;
|
||||
|
||||
struct AMDGpuMetricsHeader_v1_t m_common_header;
|
||||
@@ -329,7 +338,7 @@ struct AMDGpuMetrics_v14_t
|
||||
// Utilization (%)
|
||||
uint16_t m_average_gfx_activity;
|
||||
uint16_t m_average_umc_activity; // memory controller
|
||||
uint16_t m_vcn_activity[kRSMI_MAX_NUM_VCNS]; // VCN instances activity percent (encode/decode)
|
||||
uint16_t m_vcn_activity[kRSMI_MAX_NUM_VCNS]; // VCN instances activity percent (encode/decode)
|
||||
|
||||
// Energy (15.259uJ (2^-16) units)
|
||||
uint64_t m_energy_accumulator;
|
||||
@@ -345,9 +354,9 @@ struct AMDGpuMetrics_v14_t
|
||||
|
||||
// Link width (number of lanes) and speed (in 0.1 GT/s)
|
||||
uint16_t m_pcie_link_width;
|
||||
uint16_t m_pcie_link_speed; // in 0.1 GT/s
|
||||
uint16_t m_pcie_link_speed; // in 0.1 GT/s
|
||||
|
||||
// XGMI bus width and bitrate (in Gbps)
|
||||
// XGMI bus width and bitrate (in Gbps)
|
||||
uint16_t m_xgmi_link_width;
|
||||
uint16_t m_xgmi_link_speed;
|
||||
|
||||
@@ -358,7 +367,7 @@ struct AMDGpuMetrics_v14_t
|
||||
// PCIE accumulated bandwidth (GB/sec)
|
||||
uint64_t m_pcie_bandwidth_acc;
|
||||
|
||||
// PCIE instantaneous bandwidth (GB/sec)
|
||||
// PCIE instantaneous bandwidth (GB/sec)
|
||||
uint64_t m_pcie_bandwidth_inst;
|
||||
|
||||
// PCIE L0 to recovery state transition accumulated count
|
||||
@@ -387,8 +396,7 @@ struct AMDGpuMetrics_v14_t
|
||||
uint16_t m_padding;
|
||||
};
|
||||
|
||||
struct AMDGpuMetrics_v15_t
|
||||
{
|
||||
struct AMDGpuMetrics_v15_t {
|
||||
~AMDGpuMetrics_v15_t() = default;
|
||||
|
||||
struct AMDGpuMetricsHeader_v1_t m_common_header;
|
||||
@@ -404,7 +412,7 @@ struct AMDGpuMetrics_v15_t
|
||||
// Utilization (%)
|
||||
uint16_t m_average_gfx_activity;
|
||||
uint16_t m_average_umc_activity; // memory controller
|
||||
uint16_t m_vcn_activity[kRSMI_MAX_NUM_VCNS]; // VCN instances activity percent (encode/decode)
|
||||
uint16_t m_vcn_activity[kRSMI_MAX_NUM_VCNS]; // VCN instances activity percent (encode/decode)
|
||||
uint16_t m_jpeg_activity[kRSMI_MAX_JPEG_ENGINES]; // JPEG activity percent (encode/decode)
|
||||
|
||||
// Energy (15.259uJ (2^-16) units)
|
||||
@@ -421,7 +429,7 @@ struct AMDGpuMetrics_v15_t
|
||||
|
||||
// Link width (number of lanes) and speed (in 0.1 GT/s)
|
||||
uint16_t m_pcie_link_width;
|
||||
uint16_t m_pcie_link_speed; // in 0.1 GT/s
|
||||
uint16_t m_pcie_link_speed; // in 0.1 GT/s
|
||||
|
||||
// XGMI bus width and bitrate (in Gbps)
|
||||
uint16_t m_xgmi_link_width;
|
||||
@@ -468,7 +476,103 @@ struct AMDGpuMetrics_v15_t
|
||||
|
||||
uint16_t m_padding;
|
||||
};
|
||||
using AMGpuMetricsLatest_t = AMDGpuMetrics_v15_t;
|
||||
struct AMDGpuMetrics_v16_t {
|
||||
~AMDGpuMetrics_v16_t() = default;
|
||||
|
||||
struct AMDGpuMetricsHeader_v1_t m_common_header;
|
||||
|
||||
// Temperature (Celsius). It will be zero (0) if unsupported.
|
||||
uint16_t m_temperature_hotspot;
|
||||
uint16_t m_temperature_mem;
|
||||
uint16_t m_temperature_vrsoc;
|
||||
|
||||
// Power (Watts)
|
||||
uint16_t m_current_socket_power;
|
||||
|
||||
// Utilization (%)
|
||||
uint16_t m_average_gfx_activity;
|
||||
uint16_t m_average_umc_activity; // memory controller
|
||||
|
||||
// Energy (15.259uJ (2^-16) units)
|
||||
uint64_t m_energy_accumulator;
|
||||
|
||||
// Driver attached timestamp (in ns)
|
||||
uint64_t m_system_clock_counter;
|
||||
|
||||
/*
|
||||
* Important: bumped up public to uint64_t due to planned size increase
|
||||
* for newer ASICs
|
||||
*/
|
||||
/* Accumulation cycle counter */
|
||||
uint32_t m_accumulation_counter;
|
||||
|
||||
/* Accumulated throttler residencies */
|
||||
uint32_t m_prochot_residency_acc;
|
||||
uint32_t m_ppt_residency_acc;
|
||||
uint32_t m_socket_thm_residency_acc;
|
||||
uint32_t m_vr_thm_residency_acc;
|
||||
uint32_t m_hbm_thm_residency_acc;
|
||||
|
||||
// Clock Lock Status. Each bit corresponds to clock instance
|
||||
uint32_t m_gfxclk_lock_status;
|
||||
|
||||
// Link width (number of lanes) and speed (in 0.1 GT/s)
|
||||
uint16_t m_pcie_link_width;
|
||||
uint16_t m_pcie_link_speed; // in 0.1 GT/s
|
||||
|
||||
// XGMI bus width and bitrate (in Gbps)
|
||||
uint16_t m_xgmi_link_width;
|
||||
uint16_t m_xgmi_link_speed;
|
||||
|
||||
// Utilization Accumulated (%)
|
||||
uint32_t m_gfx_activity_acc;
|
||||
uint32_t m_mem_activity_acc;
|
||||
|
||||
// PCIE accumulated bandwidth (GB/sec)
|
||||
uint64_t m_pcie_bandwidth_acc;
|
||||
|
||||
// PCIE instantaneous bandwidth (GB/sec)
|
||||
uint64_t m_pcie_bandwidth_inst;
|
||||
|
||||
// PCIE L0 to recovery state transition accumulated count
|
||||
uint64_t m_pcie_l0_to_recov_count_acc;
|
||||
|
||||
// PCIE replay accumulated count
|
||||
uint64_t m_pcie_replay_count_acc;
|
||||
|
||||
// PCIE replay rollover accumulated count
|
||||
uint64_t m_pcie_replay_rover_count_acc;
|
||||
|
||||
// PCIE NAK sent accumulated count
|
||||
uint32_t m_pcie_nak_sent_count_acc;
|
||||
|
||||
// PCIE NAK received accumulated count
|
||||
uint32_t m_pcie_nak_rcvd_count_acc;
|
||||
|
||||
// XGMI accumulated data transfer size(KiloBytes)
|
||||
uint64_t m_xgmi_read_data_acc[kRSMI_MAX_NUM_XGMI_LINKS];
|
||||
uint64_t m_xgmi_write_data_acc[kRSMI_MAX_NUM_XGMI_LINKS];
|
||||
|
||||
// PMFW attached timestamp (10ns resolution)
|
||||
uint64_t m_firmware_timestamp;
|
||||
|
||||
// Current clocks (Mhz)
|
||||
uint16_t m_current_gfxclk[kRSMI_MAX_NUM_GFX_CLKS];
|
||||
uint16_t m_current_socclk[kRSMI_MAX_NUM_CLKS];
|
||||
uint16_t m_current_vclk0[kRSMI_MAX_NUM_CLKS];
|
||||
uint16_t m_current_dclk0[kRSMI_MAX_NUM_CLKS];
|
||||
uint16_t m_current_uclk;
|
||||
|
||||
/* Number of current partition */
|
||||
uint16_t m_num_partition;
|
||||
|
||||
/* XCP (Graphic Cluster Partitions) metrics stats */
|
||||
struct amdgpu_xcp_metrics m_xcp_stats[kRSMI_MAX_NUM_XCP];
|
||||
|
||||
/* PCIE other end recovery counter */
|
||||
uint32_t m_pcie_lc_perf_other_end_recovery;
|
||||
};
|
||||
using AMGpuMetricsLatest_t = AMDGpuMetrics_v16_t;
|
||||
|
||||
/**
|
||||
* This is GPU Metrics version that gets to public access.
|
||||
@@ -555,8 +659,7 @@ using AMDGpuMetricVersionFlagId_t = uint32_t;
|
||||
* Each Metric Unit (or a set of them) is related to a Metric class.
|
||||
*
|
||||
*/
|
||||
enum class AMDGpuMetricsClassId_t : AMDGpuMetricTypeId_t
|
||||
{
|
||||
enum class AMDGpuMetricsClassId_t : AMDGpuMetricTypeId_t {
|
||||
kGpuMetricHeader,
|
||||
kGpuMetricTemperature,
|
||||
kGpuMetricUtilization,
|
||||
@@ -569,6 +672,9 @@ enum class AMDGpuMetricsClassId_t : AMDGpuMetricTypeId_t
|
||||
kGpuMetricLinkWidthSpeed,
|
||||
kGpuMetricVoltage,
|
||||
kGpuMetricTimestamp,
|
||||
kGpuMetricThrottleResidency,
|
||||
kGpuMetricPartition,
|
||||
kGpuMetricXcpStats,
|
||||
};
|
||||
using AMDGpuMetricsClassIdTranslationTbl_t = std::map<AMDGpuMetricsClassId_t, std::string>;
|
||||
|
||||
@@ -605,8 +711,8 @@ enum class AMDGpuMetricsUnitType_t : AMDGpuMetricTypeId_t
|
||||
kMetricAvgMmActivity,
|
||||
kMetricGfxActivityAccumulator,
|
||||
kMetricMemActivityAccumulator,
|
||||
kMetricVcnActivity, //v1.4
|
||||
kMetricJpegActivity, //v1.5
|
||||
kMetricVcnActivity, // v1.4
|
||||
kMetricJpegActivity, // v1.5
|
||||
|
||||
// kGpuMetricAverageClock counters
|
||||
kMetricAvgGfxClockFrequency,
|
||||
@@ -618,11 +724,11 @@ enum class AMDGpuMetricsUnitType_t : AMDGpuMetricTypeId_t
|
||||
kMetricAvgDClock1Frequency,
|
||||
|
||||
// kGpuMetricCurrentClock counters
|
||||
kMetricCurrGfxClock, //v1.4: Changed to multi-valued
|
||||
kMetricCurrSocClock, //v1.4: Changed to multi-valued
|
||||
kMetricCurrGfxClock, // v1.4: Changed to multi-valued
|
||||
kMetricCurrSocClock, // v1.4: Changed to multi-valued
|
||||
kMetricCurrUClock,
|
||||
kMetricCurrVClock0, //v1.4: Changed to multi-valued
|
||||
kMetricCurrDClock0, //v1.4: Changed to multi-valued
|
||||
kMetricCurrVClock0, // v1.4: Changed to multi-valued
|
||||
kMetricCurrDClock0, // v1.4: Changed to multi-valued
|
||||
kMetricCurrVClock1,
|
||||
kMetricCurrDClock1,
|
||||
|
||||
@@ -631,7 +737,7 @@ enum class AMDGpuMetricsUnitType_t : AMDGpuMetricTypeId_t
|
||||
kMetricIndepThrottleStatus,
|
||||
|
||||
// kGpuMetricGfxClkLockStatus counters
|
||||
kMetricGfxClkLockStatus, //v1.4
|
||||
kMetricGfxClkLockStatus, // v1.4
|
||||
|
||||
// kGpuMetricCurrentFanSpeed counters
|
||||
kMetricCurrFanSpeed,
|
||||
@@ -639,31 +745,50 @@ enum class AMDGpuMetricsUnitType_t : AMDGpuMetricTypeId_t
|
||||
// kGpuMetricLinkWidthSpeed counters
|
||||
kMetricPcieLinkWidth,
|
||||
kMetricPcieLinkSpeed,
|
||||
kMetricPcieBandwidthAccumulator, //v1.4
|
||||
kMetricPcieBandwidthInst, //v1.4
|
||||
kMetricXgmiLinkWidth, //v1.4
|
||||
kMetricXgmiLinkSpeed, //v1.4
|
||||
kMetricXgmiReadDataAccumulator, //v1.4
|
||||
kMetricXgmiWriteDataAccumulator, //v1.4
|
||||
kMetricPcieL0RecovCountAccumulator, //v1.4
|
||||
kMetricPcieReplayCountAccumulator, //v1.4
|
||||
kMetricPcieReplayRollOverCountAccumulator, //v1.4
|
||||
kMetricPcieNakSentCountAccumulator, //v1.5
|
||||
kMetricPcieNakReceivedCountAccumulator, //v1.5
|
||||
kMetricPcieBandwidthAccumulator, // v1.4
|
||||
kMetricPcieBandwidthInst, // v1.4
|
||||
kMetricXgmiLinkWidth, // v1.4
|
||||
kMetricXgmiLinkSpeed, // v1.4
|
||||
kMetricXgmiReadDataAccumulator, // v1.4
|
||||
kMetricXgmiWriteDataAccumulator, // v1.4
|
||||
kMetricPcieL0RecovCountAccumulator, // v1.4
|
||||
kMetricPcieReplayCountAccumulator, // v1.4
|
||||
kMetricPcieReplayRollOverCountAccumulator, // v1.4
|
||||
kMetricPcieNakSentCountAccumulator, // v1.5
|
||||
kMetricPcieNakReceivedCountAccumulator, // v1.5
|
||||
|
||||
// kGpuMetricPowerEnergy counters
|
||||
kMetricAvgSocketPower,
|
||||
kMetricCurrSocketPower, //v1.4
|
||||
kMetricEnergyAccumulator, //v1.4
|
||||
kMetricCurrSocketPower, // v1.4
|
||||
kMetricEnergyAccumulator, // v1.4
|
||||
|
||||
// kGpuMetricVoltage counters
|
||||
kMetricVoltageSoc, //v1.3
|
||||
kMetricVoltageGfx, //v1.3
|
||||
kMetricVoltageMem, //v1.3
|
||||
kMetricVoltageSoc, // v1.3
|
||||
kMetricVoltageGfx, // v1.3
|
||||
kMetricVoltageMem, // v1.3
|
||||
|
||||
// kGpuMetricTimestamp counters
|
||||
kMetricTSClockCounter,
|
||||
kMetricTSFirmware,
|
||||
|
||||
// kMetricAccumulationCounter counters
|
||||
kMetricAccumulationCounter, // v1.6
|
||||
kMetricProchotResidencyAccumulator, // v1.6
|
||||
kMetricPPTResidencyAccumulator, // v1.6
|
||||
kMetricSocketThmResidencyAccumulator, // v1.6
|
||||
kMetricVRThmResidencyAccumulator, // v1.6
|
||||
kMetricHBMThmResidencyAccumulator, // v1.6
|
||||
|
||||
// kGpuMetricPartition
|
||||
kGpuMetricNumPartition, // v1.6
|
||||
|
||||
// kGpuMetricXcpStats
|
||||
kMetricGfxBusyInst, // v1.6
|
||||
kMetricJpegBusy, // v1.6
|
||||
kMetricVcnBusy, // v1.6
|
||||
kMetricGfxBusyAcc, // v1.6
|
||||
|
||||
kMetricPcieLCPerfOtherEndRecov, // v1.6
|
||||
};
|
||||
using AMDGpuMetricsUnitTypeTranslationTbl_t = std::map<AMDGpuMetricsUnitType_t, std::string>;
|
||||
|
||||
@@ -676,14 +801,14 @@ enum class AMDGpuMetricsDataType_t : AMDGpuMetricsDataTypeId_t
|
||||
kUInt64,
|
||||
};
|
||||
|
||||
struct AMDGpuDynamicMetricsValue_t
|
||||
{
|
||||
struct AMDGpuDynamicMetricsValue_t {
|
||||
uint64_t m_value;
|
||||
std::string m_info;
|
||||
AMDGpuMetricsDataType_t m_original_type;
|
||||
};
|
||||
using AMDGpuDynamicMetricTblValues_t = std::vector<AMDGpuDynamicMetricsValue_t>;
|
||||
using AMDGpuDynamicMetricsTbl_t = std::map<AMDGpuMetricsClassId_t, std::map<AMDGpuMetricsUnitType_t, AMDGpuDynamicMetricTblValues_t>>;
|
||||
using AMDGpuDynamicMetricsTbl_t = std::map<AMDGpuMetricsClassId_t,
|
||||
std::map<AMDGpuMetricsUnitType_t, AMDGpuDynamicMetricTblValues_t>>;
|
||||
|
||||
|
||||
/*
|
||||
@@ -700,13 +825,13 @@ enum class AMDGpuMetricVersionFlags_t : AMDGpuMetricVersionFlagId_t
|
||||
kGpuMetricV13 = (0x1 << 3),
|
||||
kGpuMetricV14 = (0x1 << 4),
|
||||
kGpuMetricV15 = (0x1 << 5),
|
||||
kGpuMetricV16 = (0x1 << 6),
|
||||
};
|
||||
using AMDGpuMetricVersionTranslationTbl_t = std::map<uint16_t, AMDGpuMetricVersionFlags_t>;
|
||||
using GpuMetricTypePtr_t = std::shared_ptr<void>;
|
||||
|
||||
class GpuMetricsBase_t
|
||||
{
|
||||
public:
|
||||
class GpuMetricsBase_t {
|
||||
public:
|
||||
virtual ~GpuMetricsBase_t() = default;
|
||||
virtual size_t sizeof_metric_table() = 0;
|
||||
virtual GpuMetricTypePtr_t get_metrics_table() = 0;
|
||||
@@ -714,30 +839,32 @@ class GpuMetricsBase_t
|
||||
virtual AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() = 0;
|
||||
virtual rsmi_status_t populate_metrics_dynamic_tbl() = 0;
|
||||
virtual AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() = 0;
|
||||
virtual void set_device_id(uint32_t device_id) { m_device_id = device_id; }
|
||||
virtual void set_partition_id(uint32_t partition_id) { m_partition_id = partition_id; }
|
||||
virtual AMDGpuDynamicMetricsTbl_t get_metrics_dynamic_tbl() {
|
||||
return m_metrics_dynamic_tbl;
|
||||
}
|
||||
|
||||
protected:
|
||||
protected:
|
||||
AMDGpuDynamicMetricsTbl_t m_metrics_dynamic_tbl;
|
||||
uint64_t m_metrics_timestamp;
|
||||
uint32_t m_device_id;
|
||||
uint32_t m_partition_id;
|
||||
|
||||
};
|
||||
using GpuMetricsBasePtr = std::shared_ptr<GpuMetricsBase_t>;
|
||||
using AMDGpuMetricFactories_t = const std::map<AMDGpuMetricVersionFlags_t, GpuMetricsBasePtr>;
|
||||
|
||||
|
||||
-class GpuMetricsBase_v11_t final : public GpuMetricsBase_t
-{
-  public:
+class GpuMetricsBase_v11_t final : public GpuMetricsBase_t {
+ public:
     virtual ~GpuMetricsBase_v11_t() = default;

     size_t sizeof_metric_table() override {
       return sizeof(AMDGpuMetrics_v11_t);
     }

-    GpuMetricTypePtr_t get_metrics_table() override
-    {
+    GpuMetricTypePtr_t get_metrics_table() override {
       if (!m_gpu_metric_ptr) {
         m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v11_t*){});
       }
@@ -745,13 +872,11 @@ class GpuMetricsBase_v11_t final : public GpuMetricsBase_t
       return m_gpu_metric_ptr;
     }

-    void dump_internal_metrics_table() override
-    {
+    void dump_internal_metrics_table() override {
       return;
     }

-    AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override
-    {
+    AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
       return AMDGpuMetricVersionFlags_t::kGpuMetricV11;
     }

@@ -759,23 +884,20 @@ class GpuMetricsBase_v11_t final : public GpuMetricsBase_t
     AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;


-  private:
+ private:
     AMDGpuMetrics_v11_t m_gpu_metrics_tbl;
     std::shared_ptr<AMDGpuMetrics_v11_t> m_gpu_metric_ptr;

 };

-class GpuMetricsBase_v12_t final : public GpuMetricsBase_t
-{
-  public:
+class GpuMetricsBase_v12_t final : public GpuMetricsBase_t {
+ public:
     ~GpuMetricsBase_v12_t() = default;

     size_t sizeof_metric_table() override {
       return sizeof(AMDGpuMetrics_v12_t);
     }

-    GpuMetricTypePtr_t get_metrics_table() override
-    {
+    GpuMetricTypePtr_t get_metrics_table() override {
       if (!m_gpu_metric_ptr) {
         m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v12_t*){});
       }
@@ -783,36 +905,31 @@ class GpuMetricsBase_v12_t final : public GpuMetricsBase_t
       return m_gpu_metric_ptr;
     }

-    void dump_internal_metrics_table() override
-    {
+    void dump_internal_metrics_table() override {
       return;
     }

-    AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override
-    {
+    AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
       return AMDGpuMetricVersionFlags_t::kGpuMetricV12;
     }

     rsmi_status_t populate_metrics_dynamic_tbl() override;
     AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;

-  private:
+ private:
     AMDGpuMetrics_v12_t m_gpu_metrics_tbl;
     std::shared_ptr<AMDGpuMetrics_v12_t> m_gpu_metric_ptr;

 };

-class GpuMetricsBase_v13_t final : public GpuMetricsBase_t
-{
-  public:
+class GpuMetricsBase_v13_t final : public GpuMetricsBase_t {
+ public:
     ~GpuMetricsBase_v13_t() = default;

     size_t sizeof_metric_table() override {
       return sizeof(AMDGpuMetrics_v13_t);
     }

-    GpuMetricTypePtr_t get_metrics_table() override
-    {
+    GpuMetricTypePtr_t get_metrics_table() override {
       if (!m_gpu_metric_ptr) {
         m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v13_t*){});
       }
@@ -822,8 +939,7 @@ class GpuMetricsBase_v13_t final : public GpuMetricsBase_t

     void dump_internal_metrics_table() override;

-    AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override
-    {
+    AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
       return AMDGpuMetricVersionFlags_t::kGpuMetricV13;
     }

@@ -831,23 +947,20 @@ class GpuMetricsBase_v13_t final : public GpuMetricsBase_t
     AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;


-  private:
+ private:
     AMDGpuMetrics_v13_t m_gpu_metrics_tbl;
     std::shared_ptr<AMDGpuMetrics_v13_t> m_gpu_metric_ptr;

 };

-class GpuMetricsBase_v14_t final : public GpuMetricsBase_t
-{
-  public:
+class GpuMetricsBase_v14_t final : public GpuMetricsBase_t {
+ public:
     ~GpuMetricsBase_v14_t() = default;

     size_t sizeof_metric_table() override {
       return sizeof(AMDGpuMetrics_v14_t);
     }

-    GpuMetricTypePtr_t get_metrics_table() override
-    {
+    GpuMetricTypePtr_t get_metrics_table() override {
       if (!m_gpu_metric_ptr) {
         m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v14_t*){});
       }
@@ -857,8 +970,7 @@ class GpuMetricsBase_v14_t final : public GpuMetricsBase_t

     void dump_internal_metrics_table() override;

-    AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override
-    {
+    AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
       return AMDGpuMetricVersionFlags_t::kGpuMetricV14;
     }

@@ -866,23 +978,20 @@ class GpuMetricsBase_v14_t final : public GpuMetricsBase_t
     AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;


-  private:
+ private:
     AMDGpuMetrics_v14_t m_gpu_metrics_tbl;
     std::shared_ptr<AMDGpuMetrics_v14_t> m_gpu_metric_ptr;

 };

-class GpuMetricsBase_v15_t final : public GpuMetricsBase_t
-{
-  public:
+class GpuMetricsBase_v15_t final : public GpuMetricsBase_t {
+ public:
     ~GpuMetricsBase_v15_t() = default;

     size_t sizeof_metric_table() override {
       return sizeof(AMDGpuMetrics_v15_t);
     }

-    GpuMetricTypePtr_t get_metrics_table() override
-    {
+    GpuMetricTypePtr_t get_metrics_table() override {
       if (!m_gpu_metric_ptr) {
         m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v15_t*){});
       }
@@ -892,8 +1001,7 @@ class GpuMetricsBase_v15_t final : public GpuMetricsBase_t

     void dump_internal_metrics_table() override;

-    AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override
-    {
+    AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
       return AMDGpuMetricVersionFlags_t::kGpuMetricV15;
     }

@@ -901,20 +1009,51 @@ class GpuMetricsBase_v15_t final : public GpuMetricsBase_t
     AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;


-  private:
+ private:
     AMDGpuMetrics_v15_t m_gpu_metrics_tbl;
     std::shared_ptr<AMDGpuMetrics_v15_t> m_gpu_metric_ptr;
 };

+class GpuMetricsBase_v16_t final : public GpuMetricsBase_t {
+ public:
+    ~GpuMetricsBase_v16_t() = default;
+
+    size_t sizeof_metric_table() override {
+      return sizeof(AMDGpuMetrics_v16_t);
+    }
+
+    GpuMetricTypePtr_t get_metrics_table() override {
+      if (!m_gpu_metric_ptr) {
+        m_gpu_metric_ptr.reset(&m_gpu_metrics_tbl, [](AMDGpuMetrics_v16_t*){});
+      }
+      assert(m_gpu_metric_ptr != nullptr);
+      return m_gpu_metric_ptr;
+    }
+
+    void dump_internal_metrics_table() override;
+
+    AMDGpuMetricVersionFlags_t get_gpu_metrics_version_used() override {
+      return AMDGpuMetricVersionFlags_t::kGpuMetricV16;
+    }
+
+    rsmi_status_t populate_metrics_dynamic_tbl() override;
+    AMGpuMetricsPublicLatestTupl_t copy_internal_to_external_metrics() override;
+
+ private:
+    AMDGpuMetrics_v16_t m_gpu_metrics_tbl;
+    std::shared_ptr<AMDGpuMetrics_v16_t> m_gpu_metric_ptr;
+};

 template<typename T>
-rsmi_status_t rsmi_dev_gpu_metrics_info_query(uint32_t dv_ind, AMDGpuMetricsUnitType_t metric_counter, T& metric_value);
+rsmi_status_t rsmi_dev_gpu_metrics_info_query(uint32_t dv_ind,
+                AMDGpuMetricsUnitType_t metric_counter, T& metric_value);

-} // namespace amd::smi
+}  // namespace amd::smi


 rsmi_status_t
-rsmi_dev_gpu_metrics_header_info_get(uint32_t dv_ind, metrics_table_header_t& header_value);
+rsmi_dev_gpu_metrics_header_info_get(uint32_t dv_ind,
+                                     metrics_table_header_t& header_value);


-#endif // ROCM_SMI_ROCM_SMI_GPU_METRICS_H_
+#endif  // ROCM_SMI_ROCM_SMI_GPU_METRICS_H_

@@ -48,14 +48,18 @@
#include <algorithm>
#include <cstdint>
#include <iomanip>
#include <iosfwd>
#include <iostream>
#include <iterator>
#include <limits>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <tuple>
#include <type_traits>
#include <vector>
#include <utility>

#include "rocm_smi/rocm_smi_device.h"

@@ -599,6 +603,74 @@ using TextFileTagContents_t = TagTextContents_t<std::string, std::string,
                                                  std::string, std::string>;


+//
+// Note: Output iterator that inserts a delimiter between elements.
+//
+template<typename DelimiterType, typename CharType = char,
+         typename TraitsType = std::char_traits<CharType>>
+class ostream_joiner {
+ public:
+    using Char_t = CharType;
+    using Traits_t = TraitsType;
+    using Ostream_t = std::basic_ostream<Char_t, Traits_t>;
+    using iterator_category = std::output_iterator_tag;
+    using value_type = void;
+    using difference_type = void;
+    using pointer = void;
+    using reference = void;
+
+
+    ostream_joiner(Ostream_t* outstream,
+                   const DelimiterType& delimiter) noexcept
+                   (std::is_nothrow_copy_constructible_v<DelimiterType>)
+      : m_outstream(outstream), m_delimiter(delimiter) {}
+
+    ostream_joiner(Ostream_t* outstream, DelimiterType&& delimiter) noexcept
+                   (std::is_nothrow_move_constructible_v<DelimiterType>)
+      : m_outstream(outstream), m_delimiter(std::move(delimiter)) {}
+
+    template<typename ValueType> ostream_joiner& operator=(const ValueType& value) {
+      if (!m_is_first) {
+        *m_outstream << m_delimiter;
+      }
+      this->m_is_first = false;
+      this->m_value_count++;
+
+      if ((m_value_count % kMAX_VALUES_PER_LINE) == 0) {
+        *m_outstream << "\n" << value;
+        this->m_value_count = 0;
+      } else {
+        *m_outstream << value;
+      }
+
+      return *this;
+    }
+
+    ostream_joiner& operator*() noexcept { return *this; }
+    ostream_joiner& operator++() noexcept { return *this; }
+    ostream_joiner& operator++(int) noexcept { return *this; }
+
+
+ private:
+    Ostream_t* m_outstream;
+    DelimiterType m_delimiter;
+    bool m_is_first = true;
+    uint32_t m_value_count = 0;
+    const uint32_t kMAX_VALUES_PER_LINE = 9;
+};
+
+/// Object generator for ostream_joiner.
+template<typename CharType, typename TraitsType, typename DelimiterType>
+inline ostream_joiner<std::decay_t<DelimiterType>, CharType, TraitsType>
+make_ostream_joiner(std::basic_ostream<CharType, TraitsType>* outstream,
+                    DelimiterType&& delimiter) {
+  return {
+    outstream,
+    std::forward<DelimiterType>(delimiter)
+  };
+}
+
+
 } // namespace smi
 } // namespace amd
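
The wrapping rule in `ostream_joiner::operator=` is easy to misread, so the following is a small Python model of the same control flow. It is only a sketch for illustration and not part of the commit; `join_wrapped` is a hypothetical name, and the default of nine values per line mirrors `kMAX_VALUES_PER_LINE` above. It shows that the delimiter is written before the line break, and that the first line ends up with eight values while later lines hold nine.

```python
def join_wrapped(values, delimiter=", ", max_per_line=9):
    """Python model of the C++ joiner: delimiter first, then a newline
    before every max_per_line-th value written since the last reset."""
    out = []
    is_first = True
    count = 0
    for value in values:
        if not is_first:
            out.append(delimiter)      # delimiter precedes every value but the first
        is_first = False
        count += 1
        if count % max_per_line == 0:
            out.append("\n" + str(value))  # break the line, then write the value
            count = 0
        else:
            out.append(str(value))
    return "".join(out)


if __name__ == "__main__":
    print(join_wrapped(range(1, 20)))
```

Because the counter is reset only after the ninth write, the trailing delimiter stays at the end of each wrapped line rather than moving to the start of the next one.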
+325
-5
@@ -24,6 +24,7 @@ import trace
 from io import StringIO
 from time import ctime
 from subprocess import check_output
+from enum import IntEnum
 from typing import TYPE_CHECKING

 # only used for type checking
@@ -48,9 +49,9 @@ except ImportError:
 # Minor version - Increment when adding a new feature, set to 0 when major is incremented
 # Patch version - Increment when adding a fix, set to 0 when minor is incremented
 # Hash version - Shortened commit hash. Print here and not with lib for consistency with amd-smi
-SMI_MAJ = 2
-SMI_MIN = 3
-SMI_PAT = 1
+SMI_MAJ = 3
+SMI_MIN = 0
+SMI_PAT = 0
 # SMI_HASH is provided by rsmiBindings
 __version__ = '%s.%s.%s+%s' % (SMI_MAJ, SMI_MIN, SMI_PAT, SMI_HASH)

@@ -856,7 +857,7 @@ def printEventList(device, delay, eventList):
             print2DArray([['\rGPU[%d]:\t' % (data.dv_ind), ctime().split()[3], notification_type_names[data.event.value - 1],
                            data.message.decode('utf8') + '\r']])

-def printLog(device, metricName, value=None, extraSpace=False, useItalics=False):
+def printLog(device, metricName, value=None, extraSpace=False, useItalics=False, xcp=None):
     """ Print out to the SMI log

     :param device: DRM device identifier
@@ -878,7 +879,10 @@ def printLog(device, metricName, value=None, extraSpace=False, useItalics=False)
             formatJson(device, str(metricName))
         return
     if value is not None:
-        logstr = 'GPU[%s]\t\t: %s: %s' % (device, metricName, value)
+        if xcp == None:
+            logstr = 'GPU[%s]\t\t: %s: %s' % (device, metricName, value)
+        else:
+            logstr = 'GPU[%s] XCP[%s]\t: %s: %s' % (device, xcp, metricName, value)
     else:
         logstr = 'GPU[%s]\t\t: %s' % (device, metricName)
     if device is None:
@@ -3544,6 +3548,318 @@ def showMemoryPartition(deviceList):
             printErrLog(device, 'Failed to retrieve current memory partition, even though device supports it.')
     printLogSpacer()

+class UIntegerTypes(IntEnum):
+    UINT8_T = 0xFF
+    UINT16_T = 0xFFFF
+    UINT32_T = 0xFFFFFFFF
+    UINT64_T = 0xFFFFFFFFFFFFFFFF
+
+def validateIfMaxUint(valToCheck, uintType: UIntegerTypes):
+    return_val = "N/A"
+    if not isinstance(valToCheck, list):
+        if valToCheck == uintType:
+            return return_val
+        else:
+            return valToCheck
+    else:
+        return_val = valToCheck
+        for idx, v in enumerate(valToCheck):
+            if v == uintType:
+                return_val[idx] = "N/A"
+        return return_val
+
+def showGPUMetrics(deviceList):
+    """ Returns the gpu metrics for a list of devices
+
+    :param deviceList: List of DRM devices (can be a single-item list)
+    """
+    printLogSpacer(' GPU Metrics ')
+    gpu_metrics = rsmi_gpu_metrics_t()
+    temp_unit="C"
+    power_unit="W"
+    energy_unit="15.259uJ (2^-16)"
+    volt_unit="mV"
+    clock_unit="MHz"
+    fan_speed="rpm"
+    percent_unit="%"
+    pcie_acc_unit="GB/s"
+    pcie_lanes_unit="Lanes"
+    pcie_speed_unit="0.1 GT/s"
+    xgmi_speed="Gbps"
+    xgmi_data_sz="kB"
+    time_unit="ns"
+    time_unit_10="10ns resolution"
+    count="Count"
+    no_unit = None
+
+    for device in deviceList:
+        ret = rocmsmi.rsmi_dev_gpu_metrics_info_get(device, byref(gpu_metrics))
+        metrics = {
+            "common_header": "N/A"
+        }
+        if rsmi_ret_ok(ret, device, 'rsmi_dev_gpu_metrics_info_get',silent=True):
+            metrics = {
+                "common_header": {
+                    "version": float(str(gpu_metrics.common_header.format_revision) + "."
+                                     + str(gpu_metrics.common_header.content_revision)),
+                    "size": gpu_metrics.common_header.structure_size
+                }, "temperature_edge": {
+                    "value": validateIfMaxUint(gpu_metrics.temperature_edge, UIntegerTypes.UINT16_T),
+                    "unit": temp_unit,
+                }, "temperature_hotspot": {
+                    "value": validateIfMaxUint(gpu_metrics.temperature_hotspot, UIntegerTypes.UINT16_T),
+                    "unit": temp_unit,
+                }, "temperature_mem": {
+                    "value": validateIfMaxUint(gpu_metrics.temperature_mem, UIntegerTypes.UINT16_T),
+                    "unit": temp_unit,
+                }, "temperature_vrgfx": {
+                    "value": validateIfMaxUint(gpu_metrics.temperature_vrgfx, UIntegerTypes.UINT16_T),
+                    "unit": temp_unit,
+                }, "temperature_vrsoc": {
+                    "value": validateIfMaxUint(gpu_metrics.temperature_vrsoc, UIntegerTypes.UINT16_T),
+                    "unit": temp_unit,
+                }, "temperature_vrmem": {
+                    "value": validateIfMaxUint(gpu_metrics.temperature_vrmem, UIntegerTypes.UINT16_T),
+                    "unit": temp_unit,
+                }, "average_gfx_activity": {
+                    "value": validateIfMaxUint(gpu_metrics.average_gfx_activity, UIntegerTypes.UINT16_T),
+                    "unit": percent_unit,
+                }, "average_umc_activity": {
+                    "value": validateIfMaxUint(gpu_metrics.average_umc_activity, UIntegerTypes.UINT16_T),
+                    "unit": percent_unit,
+                }, "average_mm_activity": {
+                    "value": validateIfMaxUint(gpu_metrics.average_mm_activity, UIntegerTypes.UINT16_T),
+                    "unit": percent_unit,
+                }, "average_socket_power": {
+                    "value": validateIfMaxUint(gpu_metrics.average_socket_power, UIntegerTypes.UINT16_T),
+                    "unit": power_unit,
+                }, "energy_accumulator": {
+                    "value": validateIfMaxUint(gpu_metrics.energy_accumulator, UIntegerTypes.UINT64_T),
+                    "unit": energy_unit,
+                }, "system_clock_counter": {
+                    "value": validateIfMaxUint(gpu_metrics.system_clock_counter, UIntegerTypes.UINT64_T),
+                    "unit": time_unit,
+                }, "average_gfxclk_frequency": {
+                    "value": validateIfMaxUint(gpu_metrics.average_gfxclk_frequency, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "average_socclk_frequency": {
+                    "value": validateIfMaxUint(gpu_metrics.average_socclk_frequency, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "average_uclk_frequency": {
+                    "value": validateIfMaxUint(gpu_metrics.average_uclk_frequency, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "average_vclk0_frequency": {
+                    "value": validateIfMaxUint(gpu_metrics.average_vclk0_frequency, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "average_dclk0_frequency": {
+                    "value": validateIfMaxUint(gpu_metrics.average_dclk0_frequency, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "average_vclk1_frequency": {
+                    "value": validateIfMaxUint(gpu_metrics.average_vclk1_frequency, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "average_dclk1_frequency": {
+                    "value": validateIfMaxUint(gpu_metrics.average_dclk1_frequency, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "current_gfxclk": {
+                    "value": validateIfMaxUint(gpu_metrics.current_gfxclk, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "current_socclk": {
+                    "value": validateIfMaxUint(gpu_metrics.current_socclk, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "current_uclk": {
+                    "value": validateIfMaxUint(gpu_metrics.current_uclk, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "current_vclk0": {
+                    "value": validateIfMaxUint(gpu_metrics.current_vclk0, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "current_dclk0": {
+                    "value": validateIfMaxUint(gpu_metrics.current_dclk0, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "current_vclk1": {
+                    "value": validateIfMaxUint(gpu_metrics.current_vclk1, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "current_dclk1": {
+                    "value": validateIfMaxUint(gpu_metrics.current_dclk1, UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "throttle_status": {
+                    "value": validateIfMaxUint(gpu_metrics.throttle_status, UIntegerTypes.UINT32_T),
+                    "unit": no_unit,
+                }, "current_fan_speed": {
+                    "value": validateIfMaxUint(gpu_metrics.current_fan_speed, UIntegerTypes.UINT16_T),
+                    "unit": fan_speed,
+                }, "pcie_link_width": {
+                    "value": validateIfMaxUint(gpu_metrics.pcie_link_width, UIntegerTypes.UINT16_T),
+                    "unit": pcie_lanes_unit,
+                }, "pcie_link_speed": {
+                    "value": validateIfMaxUint(gpu_metrics.pcie_link_speed, UIntegerTypes.UINT16_T),
+                    "unit": pcie_speed_unit,
+                }, "gfx_activity_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.gfx_activity_acc, UIntegerTypes.UINT32_T),
+                    "unit": percent_unit,
+                }, "mem_activity_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.mem_activity_acc, UIntegerTypes.UINT32_T),
+                    "unit": percent_unit,
+                }, "temperature_hbm": {
+                    "value": validateIfMaxUint(list(gpu_metrics.temperature_hbm), UIntegerTypes.UINT16_T),
+                    "unit": temp_unit,
+                }, "firmware_timestamp": {
+                    "value": validateIfMaxUint(gpu_metrics.firmware_timestamp, UIntegerTypes.UINT64_T),
+                    "unit": time_unit_10,
+                }, "voltage_soc": {
+                    "value": validateIfMaxUint(gpu_metrics.voltage_soc, UIntegerTypes.UINT16_T),
+                    "unit": volt_unit,
+                }, "voltage_gfx": {
+                    "value": validateIfMaxUint(gpu_metrics.voltage_gfx, UIntegerTypes.UINT16_T),
+                    "unit": volt_unit,
+                }, "voltage_mem": {
+                    "value": validateIfMaxUint(gpu_metrics.voltage_mem, UIntegerTypes.UINT16_T),
+                    "unit": volt_unit,
+                }, "indep_throttle_status": {
+                    "value": validateIfMaxUint(gpu_metrics.indep_throttle_status, UIntegerTypes.UINT64_T),
+                    "unit": no_unit,
+                }, "current_socket_power": {
+                    "value": validateIfMaxUint(gpu_metrics.current_socket_power, UIntegerTypes.UINT16_T),
+                    "unit": power_unit,
+                }, "vcn_activity": {
+                    "value": validateIfMaxUint(list(gpu_metrics.vcn_activity), UIntegerTypes.UINT16_T),
+                    "unit": percent_unit,
+                }, "gfxclk_lock_status": {
+                    "value": validateIfMaxUint(gpu_metrics.gfxclk_lock_status, UIntegerTypes.UINT32_T),
+                    "unit": no_unit,
+                }, "xgmi_link_width": {
+                    "value": validateIfMaxUint(gpu_metrics.xgmi_link_width, UIntegerTypes.UINT16_T),
+                    "unit": no_unit,
+                }, "xgmi_link_speed": {
+                    "value": validateIfMaxUint(gpu_metrics.xgmi_link_speed, UIntegerTypes.UINT16_T),
+                    "unit": xgmi_speed,
+                }, "pcie_bandwidth_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.pcie_bandwidth_acc, UIntegerTypes.UINT64_T),
+                    "unit": pcie_acc_unit,
+                }, "pcie_bandwidth_inst": {
+                    "value": validateIfMaxUint(gpu_metrics.pcie_bandwidth_inst, UIntegerTypes.UINT64_T),
+                    "unit": pcie_acc_unit,
+                }, "pcie_l0_to_recov_count_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.pcie_l0_to_recov_count_acc, UIntegerTypes.UINT64_T),
+                    "unit": count,
+                }, "pcie_replay_count_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.pcie_replay_count_acc, UIntegerTypes.UINT64_T),
+                    "unit": count,
+                }, "pcie_replay_rover_count_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.pcie_replay_rover_count_acc, UIntegerTypes.UINT64_T),
+                    "unit": count,
+                }, "xgmi_read_data_acc": {
+                    "value": validateIfMaxUint(list(gpu_metrics.xgmi_read_data_acc), UIntegerTypes.UINT64_T),
+                    "unit": xgmi_data_sz,
+                }, "xgmi_write_data_acc": {
+                    "value": validateIfMaxUint(list(gpu_metrics.xgmi_write_data_acc), UIntegerTypes.UINT64_T),
+                    "unit": xgmi_data_sz,
+                }, "current_gfxclks": {
+                    "value": validateIfMaxUint(list(gpu_metrics.current_gfxclks), UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "current_socclks": {
+                    "value": validateIfMaxUint(list(gpu_metrics.current_socclks), UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "current_vclk0s": {
+                    "value": validateIfMaxUint(list(gpu_metrics.current_vclk0s), UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "current_dclk0s": {
+                    "value": validateIfMaxUint(list(gpu_metrics.current_dclk0s), UIntegerTypes.UINT16_T),
+                    "unit": clock_unit,
+                }, "jpeg_activity": {
+                    "value": validateIfMaxUint(list(gpu_metrics.jpeg_activity), UIntegerTypes.UINT16_T),
+                    "unit": percent_unit,
+                }, "pcie_nak_sent_count_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.pcie_nak_sent_count_acc, UIntegerTypes.UINT32_T),
+                    "unit": count,
+                }, "pcie_nak_rcvd_count_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.pcie_nak_rcvd_count_acc, UIntegerTypes.UINT32_T),
+                    "unit": count,
+                }, "accumulation_counter": {
+                    "value": validateIfMaxUint(gpu_metrics.accumulation_counter, UIntegerTypes.UINT64_T),
+                    "unit": count,
+                }, "prochot_residency_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.prochot_residency_acc, UIntegerTypes.UINT64_T),
+                    "unit": count,
+                }, "ppt_residency_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.ppt_residency_acc, UIntegerTypes.UINT64_T),
+                    "unit": count,
+                }, "socket_thm_residency_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.socket_thm_residency_acc, UIntegerTypes.UINT64_T),
+                    "unit": count,
+                }, "vr_thm_residency_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.vr_thm_residency_acc, UIntegerTypes.UINT64_T),
+                    "unit": count,
+                }, "hbm_thm_residency_acc": {
+                    "value": validateIfMaxUint(gpu_metrics.hbm_thm_residency_acc, UIntegerTypes.UINT64_T),
+                    "unit": count,
+                },
+                "pcie_lc_perf_other_end_recovery": {
+                    "value": validateIfMaxUint(gpu_metrics.pcie_lc_perf_other_end_recovery, UIntegerTypes.UINT32_T),
+                    "unit": count,
+                },
+                "num_partition": {
+                    "value": validateIfMaxUint(gpu_metrics.num_partition, UIntegerTypes.UINT16_T),
+                    "unit": no_unit,
+                },
+                "xcp_stats.gfx_busy_inst": {
+                    "value": gpu_metrics.xcp_stats,
+                    "unit": percent_unit,
+                },
+                "xcp_stats.jpeg_busy": {
+                    "value": gpu_metrics.xcp_stats,
+                    "unit": percent_unit,
+                },
+                "xcp_stats.vcn_busy": {
+                    "value": gpu_metrics.xcp_stats,
+                    "unit": percent_unit,
+                },
+                "xcp_stats.gfx_busy_acc": {
+                    "value": gpu_metrics.xcp_stats,
+                    "unit": percent_unit,
+                },
+            }
+
+            printLog(device, 'Metric Version and Size (Bytes)',
+                     str(metrics["common_header"]["version"]) + " " + str(metrics["common_header"]["size"]))
+            for k,v in metrics.items():
+                if k != "common_header" and 'xcp_stats' not in k:
+                    if v["unit"] != None:
+                        printLog(device, k + " (" + str(v["unit"]) + ")", str(v["value"]))
+                    elif v["unit"] == None:
+                        printLog(device, k, str(v["value"]))
+                if 'xcp_stats.gfx_busy_inst' in k:
+                    for curr_xcp, item in enumerate(v['value']):
+                        print_xcp_detail = []
+                        for _, val in enumerate(item.gfx_busy_inst):
+                            print_xcp_detail.append(validateIfMaxUint(val, UIntegerTypes.UINT32_T))
+                        printLog(device, k + " (" + str(v["unit"]) + ")", str(print_xcp_detail), xcp=str(curr_xcp))
+                if 'xcp_stats.jpeg_busy' in k:
+                    for curr_xcp, item in enumerate(v['value']):
+                        print_xcp_detail = []
+                        for _, val in enumerate(item.jpeg_busy):
+                            print_xcp_detail.append(validateIfMaxUint(val, UIntegerTypes.UINT16_T))
+                        printLog(device, k + " (" + str(v["unit"]) + ")", str(print_xcp_detail), xcp=str(curr_xcp))
+                if 'xcp_stats.vcn_busy' in k:
+                    for curr_xcp, item in enumerate(v['value']):
+                        print_xcp_detail = []
+                        for _, val in enumerate(item.vcn_busy):
+                            print_xcp_detail.append(validateIfMaxUint(val, UIntegerTypes.UINT16_T))
+                        printLog(device, k + " (" + str(v["unit"]) + ")", str(print_xcp_detail), xcp=str(curr_xcp))
+                if 'xcp_stats.gfx_busy_acc' in k:
+                    for curr_xcp, item in enumerate(v['value']):
+                        print_xcp_detail = []
+                        for _, val in enumerate(item.gfx_busy_acc):
+                            print_xcp_detail.append(validateIfMaxUint(val, UIntegerTypes.UINT64_T))
+                        printLog(device, k + " (" + str(v["unit"]) + ")", str(print_xcp_detail), xcp=str(curr_xcp))
+
+            if int(device) < (len(deviceList) - 1):
+                printLogSpacer()
+        elif ret == rsmi_status_t.RSMI_STATUS_NOT_SUPPORTED:
+            printLog(device, 'Not supported on the given system', None)
+        else:
+            rsmi_ret_ok(ret, device, 'get_gpu_metrics')
+            printErrLog(device, 'Failed to retrieve GPU metrics, metric version may not be supported for this device.')
+    printLogSpacer()

 def checkAmdGpus(deviceList):
     """ Check if there are any AMD GPUs being queried,
@@ -3913,6 +4229,7 @@ if __name__ == '__main__':
     groupDisplay.add_argument('--shownodesbw', help='Shows the numa nodes ', action='store_true')
     groupDisplay.add_argument('--showcomputepartition', help='Shows current compute partitioning ', action='store_true')
     groupDisplay.add_argument('--showmemorypartition', help='Shows current memory partition ', action='store_true')
+    groupDisplay.add_argument('--showmetrics', help='Show current gpu metric data ', action='store_true')

     groupActionReset.add_argument('-r', '--resetclocks', help='Reset clocks and OverDrive to default',
                                   action='store_true')
@@ -4079,6 +4396,7 @@ if __name__ == '__main__':
         args.showvc = True
         args.showcomputepartition = True
         args.showmemorypartition = True
+        args.showmetrics = True

         if not PRINT_JSON:
             args.showprofile = True
@@ -4209,6 +4527,8 @@ if __name__ == '__main__':
         showComputePartition(deviceList)
     if args.showmemorypartition:
         showMemoryPartition(deviceList)
+    if args.showmetrics:
+        showGPUMetrics(deviceList)
     if args.setclock:
         setClocks(deviceList, args.setclock[0], [int(args.setclock[1])])
     if args.setsclk:
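
`validateIfMaxUint()` in the diff above treats a field whose bits are all ones as "not reported" and prints `N/A` for it. The following is a standalone sketch of that idea, not the script's own code: `UIntMax` and `mask_unsupported` are hypothetical names, and unlike the function in the diff this version builds a new list rather than overwriting the one passed in.

```python
from enum import IntEnum


class UIntMax(IntEnum):
    """All-ones sentinels per field width (mirrors UIntegerTypes in the diff)."""
    UINT8 = 0xFF
    UINT16 = 0xFFFF
    UINT32 = 0xFFFFFFFF
    UINT64 = 0xFFFFFFFFFFFFFFFF


def mask_unsupported(value, sentinel):
    """Return "N/A" where value equals the sentinel; lists are handled
    element by element on a fresh copy, leaving the input untouched."""
    if isinstance(value, list):
        return ["N/A" if v == sentinel else v for v in value]
    return "N/A" if value == sentinel else value


if __name__ == "__main__":
    print(mask_unsupported(0xFFFF, UIntMax.UINT16))
    print(mask_unsupported(42, UIntMax.UINT16))
    print(mask_unsupported([35, 0xFFFF, 37, 0xFFFF], UIntMax.UINT16))
    # Width matters: 0xFFFF is an ordinary value for a 32-bit field.
    print(mask_unsupported(0xFFFF, UIntMax.UINT32))
```

The sentinel has to match the declared width of the field being checked, which is why each entry in the metrics dictionary passes its own `UIntegerTypes` member.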
@@ -642,3 +642,102 @@ rsmi_power_type_dict = {
     1: 'CURRENT SOCKET',
     0xFFFFFFFF: 'INVALID_POWER_TYPE'
 }
+
+class metrics_table_header_t(Structure):
+    pass
+
+# metrics_table_header_t._pack_ = 1 # source:False
+metrics_table_header_t._fields_ = [
+    ('structure_size', c_uint16),
+    ('format_revision', c_uint8),
+    ('content_revision', c_uint8),
+]
+amd_metrics_table_header_t = metrics_table_header_t
+
+class amdgpu_xcp_metrics_t(Structure):
+    pass
+
+# amdgpu_xcp_metrics_t._pack_ = 1 # source:False
+amdgpu_xcp_metrics_t._fields_ = [
+    ('gfx_busy_inst', c_uint32 * 8),
+    ('jpeg_busy', c_uint16 * 32),
+    ('vcn_busy', c_uint16 * 4),
+    ('gfx_busy_acc', c_uint64 * 8),
+]
+xcp_stats_t = amdgpu_xcp_metrics_t
+
+class rsmi_gpu_metrics_t(Structure):
+    pass
+
+
+# rsmi_gpu_metrics_t._pack_ = 1 # source:False
+rsmi_gpu_metrics_t._fields_ = [
+    ('common_header', amd_metrics_table_header_t),
+    ('temperature_edge', c_uint16),
+    ('temperature_hotspot', c_uint16),
+    ('temperature_mem', c_uint16),
+    ('temperature_vrgfx', c_uint16),
+    ('temperature_vrsoc', c_uint16),
+    ('temperature_vrmem', c_uint16),
+    ('average_gfx_activity', c_uint16),
+    ('average_umc_activity', c_uint16),
+    ('average_mm_activity', c_uint16),
+    ('average_socket_power', c_uint16),
+    ('energy_accumulator', c_uint64),
+    ('system_clock_counter', c_uint64),
+    ('average_gfxclk_frequency', c_uint16),
+    ('average_socclk_frequency', c_uint16),
+    ('average_uclk_frequency', c_uint16),
+    ('average_vclk0_frequency', c_uint16),
+    ('average_dclk0_frequency', c_uint16),
+    ('average_vclk1_frequency', c_uint16),
+    ('average_dclk1_frequency', c_uint16),
+    ('current_gfxclk', c_uint16),
+    ('current_socclk', c_uint16),
+    ('current_uclk', c_uint16),
+    ('current_vclk0', c_uint16),
+    ('current_dclk0', c_uint16),
+    ('current_vclk1', c_uint16),
+    ('current_dclk1', c_uint16),
+    ('throttle_status', c_uint32),
+    ('current_fan_speed', c_uint16),
+    ('pcie_link_width', c_uint16),
+    ('pcie_link_speed', c_uint16),
+    ('gfx_activity_acc', c_uint32),
+    ('mem_activity_acc', c_uint32),
+    ('temperature_hbm', c_uint16 * 4),
+    ('firmware_timestamp', c_uint64),
+    ('voltage_soc', c_uint16),
+    ('voltage_gfx', c_uint16),
+    ('voltage_mem', c_uint16),
+    ('indep_throttle_status', c_uint64),
+    ('current_socket_power', c_uint16),
+    ('vcn_activity', c_uint16 * 4),
+    ('gfxclk_lock_status', c_uint32),
+    ('xgmi_link_width', c_uint16),
+    ('xgmi_link_speed', c_uint16),
+    ('pcie_bandwidth_acc', c_uint64),
+    ('pcie_bandwidth_inst', c_uint64),
+    ('pcie_l0_to_recov_count_acc', c_uint64),
+    ('pcie_replay_count_acc', c_uint64),
+    ('pcie_replay_rover_count_acc', c_uint64),
+    ('xgmi_read_data_acc', c_uint64 * 8),
+    ('xgmi_write_data_acc', c_uint64 * 8),
+    ('current_gfxclks', c_uint16 * 8),
+    ('current_socclks', c_uint16 * 4),
+    ('current_vclk0s', c_uint16 * 4),
+    ('current_dclk0s', c_uint16 * 4),
+    ('jpeg_activity', c_uint16 * 32),
+    ('pcie_nak_sent_count_acc', c_uint32),
+    ('pcie_nak_rcvd_count_acc', c_uint32),
+    ('accumulation_counter', c_uint64),
+    ('prochot_residency_acc', c_uint64),
+    ('ppt_residency_acc', c_uint64),
+    ('socket_thm_residency_acc', c_uint64),
+    ('vr_thm_residency_acc', c_uint64),
+    ('hbm_thm_residency_acc', c_uint64),
+    ('num_partition', c_uint16),
+    ('xcp_stats', xcp_stats_t * 8),
+    ('pcie_lc_perf_other_end_recovery', c_uint32),
+]
+amdsmi_gpu_metrics_t = rsmi_gpu_metrics_t
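
The `('xcp_stats', xcp_stats_t * 8)` field above is a ctypes array of structures, each of which carries its own fixed-size arrays. The sketch below shows how such a nested layout can be filled and read back. It uses deliberately shortened, hypothetical stand-ins (`MiniXcp`, `MiniMetrics`, `busy_per_partition`) rather than the real bindings, so the field counts and array lengths here are assumptions chosen for brevity.

```python
from ctypes import Structure, c_uint16, c_uint32


class MiniXcp(Structure):
    # Hypothetical, shortened stand-in for amdgpu_xcp_metrics_t.
    _fields_ = [
        ("gfx_busy_inst", c_uint32 * 2),
        ("vcn_busy", c_uint16 * 2),
    ]


class MiniMetrics(Structure):
    # Hypothetical, shortened stand-in for rsmi_gpu_metrics_t.
    _fields_ = [
        ("num_partition", c_uint16),
        ("xcp_stats", MiniXcp * 4),
    ]


def busy_per_partition(metrics):
    """Return one Python list of gfx_busy_inst values per XCP slot."""
    return [list(xcp.gfx_busy_inst) for xcp in metrics.xcp_stats]


if __name__ == "__main__":
    m = MiniMetrics()  # ctypes zero-initialises the underlying buffer
    m.num_partition = 2
    m.xcp_stats[0].gfx_busy_inst[1] = 75
    m.xcp_stats[1].gfx_busy_inst[0] = 0xFFFFFFFF  # all-ones "not reported" value
    print(busy_per_partition(m))
```

Indexing into `xcp_stats` yields objects that share memory with the parent structure, so writes through `m.xcp_stats[i]` are visible when the array is iterated afterwards, which is the access pattern `showGPUMetrics()` relies on.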
|
||||
|
||||
@@ -731,30 +731,6 @@ template<typename T> constexpr float convert_mw_to_w(T mw) {
|
||||
return static_cast<float>(mw / 1000.0);
|
||||
}
|
||||
|
||||
template <typename T>
|
||||
auto print_error_or_value(rsmi_status_t status_code, const T& metric) {
|
||||
if (status_code == rsmi_status_t::RSMI_STATUS_SUCCESS) {
|
||||
if constexpr (std::is_array_v<T>) {
|
||||
auto idx = uint16_t(0);
|
||||
auto str_values = std::string();
|
||||
const auto num_elems = static_cast<uint16_t>(std::end(metric) - std::begin(metric));
|
||||
str_values = ("\n\t\t num of values: " + std::to_string(num_elems) + "\n");
|
||||
for (const auto& el : metric) {
|
||||
str_values += "\t\t [" + std::to_string(idx) + "]: " + std::to_string(el) + "\n";
|
||||
++idx;
|
||||
}
|
||||
return str_values;
|
||||
}
|
||||
else if constexpr ((std::is_same_v<T, std::uint16_t>) ||
|
||||
(std::is_same_v<T, std::uint32_t>) ||
|
||||
(std::is_same_v<T, std::uint64_t>)) {
|
||||
return std::to_string(metric);
|
||||
}
|
||||
}
|
||||
else {
|
||||
return ("\n\t\tStatus: [" + std::to_string(status_code) + "] " + "-> " + amd::smi::getRSMIStatusString(status_code));
|
||||
}
|
||||
};
|
||||
|
||||
template <typename T>
|
||||
std::string print_unsigned_int(T value) {
|
||||
@@ -860,8 +836,9 @@ int main() {
|
||||
//
|
||||
std::cout << "\n";
|
||||
print_test_header("GPU METRICS: Using static struct (Backwards Compatibility) ", i);
|
||||
print_function_header_with_rsmi_ret(ret, "rsmi_dev_gpu_metrics_info_get(" + std::to_string(i) + ", &gpu_metrics)");
|
||||
rsmi_dev_gpu_metrics_info_get(i, &gpu_metrics);
|
||||
ret = rsmi_dev_gpu_metrics_info_get(i, &gpu_metrics);
|
||||
print_function_header_with_rsmi_ret(ret, "rsmi_dev_gpu_metrics_info_get("
|
||||
+ std::to_string(i) + ", &gpu_metrics)");
|
||||
|
||||
std::cout << "\t**.common_header.format_revision : "
|
||||
<< print_unsigned_int(gpu_metrics.common_header.format_revision) << "\n";
|
||||
@@ -960,6 +937,22 @@ int main() {
|
||||
<< gpu_metrics.pcie_replay_count_acc << "\n";
|
||||
std::cout << "\t**.pcie_replay_rover_count_acc : " << std::dec
|
||||
<< gpu_metrics.pcie_replay_rover_count_acc << "\n";
|
||||
std::cout << "\t**.accumulation_counter : " << std::dec
|
||||
<< gpu_metrics.accumulation_counter << "\n";
|
||||
std::cout << "\t**.prochot_residency_acc : " << std::dec
|
||||
<< gpu_metrics.prochot_residency_acc << "\n";
|
||||
std::cout << "\t**.ppt_residency_acc : " << std::dec
|
||||
<< gpu_metrics.ppt_residency_acc << "\n";
|
||||
std::cout << "\t**.socket_thm_residency_acc : " << std::dec
|
||||
<< gpu_metrics.socket_thm_residency_acc << "\n";
|
||||
std::cout << "\t**.vr_thm_residency_acc : " << std::dec
|
||||
<< gpu_metrics.vr_thm_residency_acc << "\n";
|
||||
std::cout << "\t**.hbm_thm_residency_acc : " << std::dec
|
||||
<< gpu_metrics.hbm_thm_residency_acc << "\n";
|
||||
std::cout << "\t**.num_partition: " << std::dec
|
||||
<< gpu_metrics.num_partition << "\n";
|
||||
std::cout << "\t**.pcie_lc_perf_other_end_recovery: "
|
||||
<< gpu_metrics.pcie_lc_perf_other_end_recovery << "\n";
|
||||
|
||||
std::cout << "\t**.temperature_hbm[] : " << std::dec << "\n";
|
||||
for (const auto& temp : gpu_metrics.temperature_hbm) {
|
||||
@@ -1001,23 +994,70 @@ int main() {
|
||||
std::cout << "\t -> " << std::dec << dclk << "\n";
|
||||
}
|
||||
|
||||
std::cout << std::dec << "xcp_stats.gfx_busy_inst = \n";
|
||||
auto xcp = 0;
|
||||
for (auto& row : gpu_metrics.xcp_stats) {
|
||||
std::cout << "XCP[" << xcp << "] = " << "[ ";
|
||||
std::copy(std::begin(row.gfx_busy_inst),
|
||||
std::end(row.gfx_busy_inst),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << " ]\n";
|
||||
xcp++;
|
||||
}
|
||||
|
||||
xcp = 0;
|
||||
std::cout << std::dec << "xcp_stats.jpeg_busy = \n";
|
||||
for (auto& row : gpu_metrics.xcp_stats) {
|
||||
std::cout << "XCP[" << xcp << "] = " << "[ ";
|
||||
std::copy(std::begin(row.jpeg_busy),
|
||||
std::end(row.jpeg_busy),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << " ]\n";
|
||||
xcp++;
|
||||
}
|
||||
|
||||
xcp = 0;
|
||||
std::cout << std::dec << "xcp_stats.vcn_busy = \n";
|
||||
for (auto& row : gpu_metrics.xcp_stats) {
|
||||
std::cout << "XCP[" << xcp << "] = " << "[ ";
|
||||
std::copy(std::begin(row.vcn_busy),
|
||||
std::end(row.vcn_busy),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << " ]\n";
|
||||
xcp++;
|
||||
}
|
||||
|
||||
xcp = 0;
|
||||
std::cout << std::dec << "xcp_stats.gfx_busy_acc = \n";
|
||||
for (auto& row : gpu_metrics.xcp_stats) {
|
||||
std::cout << "XCP[" << xcp << "] = " << "[ ";
|
||||
std::copy(std::begin(row.gfx_busy_acc),
|
||||
std::end(row.gfx_busy_acc),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << " ]\n";
|
||||
xcp++;
|
||||
}
|
||||
|
||||
std::cout << "\n";
|
||||
std::cout << "\t ** -> Checking metrics with constant changes ** " << "\n";
|
||||
constexpr uint16_t kMAX_ITER_TEST = 10;
|
||||
rsmi_gpu_metrics_t gpu_metrics_check;
|
||||
for (auto idx = uint16_t(1); idx <= kMAX_ITER_TEST; ++idx) {
|
||||
rsmi_dev_gpu_metrics_info_get(i, &gpu_metrics_check);
|
||||
std::cout << "\t\t -> firmware_timestamp [" << idx << "/" << kMAX_ITER_TEST << "]: " << gpu_metrics_check.firmware_timestamp << "\n";
|
||||
std::cout << "\t\t -> firmware_timestamp [" << idx
|
||||
<< "/" << kMAX_ITER_TEST << "]: " << gpu_metrics_check.firmware_timestamp << "\n";
|
||||
}
|
||||
|
||||
std::cout << "\n";
|
||||
for (auto idx = uint16_t(1); idx <= kMAX_ITER_TEST; ++idx) {
|
||||
rsmi_dev_gpu_metrics_info_get(i, &gpu_metrics_check);
|
||||
std::cout << "\t\t -> system_clock_counter [" << idx << "/" << kMAX_ITER_TEST << "]: " << gpu_metrics_check.system_clock_counter << "\n";
|
||||
std::cout << "\t\t -> system_clock_counter [" << idx
|
||||
<< "/" << kMAX_ITER_TEST << "]: " << gpu_metrics_check.system_clock_counter << "\n";
|
||||
}
|
||||
|
||||
std::cout << "\n\n";
|
||||
std::cout << " ** Note: Values MAX'ed out (UINTX MAX are unsupported for the version in question) ** " << "\n";
|
||||
std::cout << " ** Note: Values MAX'ed out "
|
||||
"(UINTX MAX are unsupported for the version in question) ** " << "\n";
|
||||
|
||||
|
||||
std::cout << "\n\n";
|
||||
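Note on the `std::copy(..., amd::smi::make_ostream_joiner(&std::cout, ", "))` pattern used for the `xcp_stats` rows above: the helper's implementation is not part of this hunk, so the following is only a minimal sketch of how such a joiner could work. `JoinerSketch` and `join_values` are hypothetical names introduced for illustration and are not the library's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for amd::smi::make_ostream_joiner (assumption, not the real helper).
// An output iterator that writes each assigned value to a stream, separated by a delimiter.
class JoinerSketch {
 public:
  using iterator_category = std::output_iterator_tag;
  using value_type = void;
  using difference_type = std::ptrdiff_t;
  using pointer = void;
  using reference = void;

  JoinerSketch(std::ostream* os, std::string delimiter)
      : os_(os), delimiter_(std::move(delimiter)) {
    assert(os_ != nullptr);
  }

  // Each assignment writes one element; every element after the first
  // is preceded by the delimiter, so no trailing delimiter is emitted.
  template <typename T>
  JoinerSketch& operator=(const T& value) {
    if (!first_) *os_ << delimiter_;
    first_ = false;
    *os_ << value;
    return *this;
  }

  // Dereference and increment are no-ops, as for std::ostream_iterator.
  JoinerSketch& operator*() { return *this; }
  JoinerSketch& operator++() { return *this; }
  JoinerSketch& operator++(int) { return *this; }

 private:
  std::ostream* os_;
  std::string delimiter_;
  bool first_ = true;
};

// Join a fixed-size array of 16-bit values into "a, b, c".
template <std::size_t N>
std::string join_values(const std::uint16_t (&values)[N]) {
  std::ostringstream out;
  std::copy(std::begin(values), std::end(values), JoinerSketch(&out, ", "));
  return out.str();
}
```

Taking the stream by pointer mirrors the call sites in the diff, which pass `&std::cout`.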
@@ -1026,14 +1066,16 @@ int main() {
|
||||
|
||||
ret = rsmi_dev_metrics_header_info_get(i, &header_values);
|
||||
std::cout << "\t[Metrics Header]" << "\n";
|
||||
std::cout << "\t -> format_revision : " << print_unsigned_int(header_values.format_revision) << "\n";
|
||||
std::cout << "\t -> content_revision : " << print_unsigned_int(header_values.content_revision) << "\n";
|
||||
std::cout << "\t -> format_revision : "
|
||||
<< print_unsigned_int(header_values.format_revision) << "\n";
|
||||
std::cout << "\t -> content_revision : "
|
||||
<< print_unsigned_int(header_values.content_revision) << "\n";
|
||||
std::cout << "\t--------------------" << "\n";
|
||||
|
||||
std::cout << "\n";
|
||||
std::cout << "\t[XCD CounterVoltage]" << "\n";
|
||||
ret = rsmi_dev_metrics_xcd_counter_get(i, &val_ui16);
|
||||
std::cout << "\t -> xcd_counter(): " << print_error_or_value(ret, val_ui16) << "\n";
|
||||
std::cout << "\t -> xcd_counter(): " << val_ui16;
|
||||
std::cout << "\n\n";
|
||||
|
||||
ret = rsmi_dev_perf_level_get(i, &pfl);
|
||||
@@ -1041,8 +1083,12 @@ int main() {
|
||||
std::cout << "\t**Performance Level:" <<
|
||||
perf_level_string(pfl) << "\n";
|
||||
ret = rsmi_dev_overdrive_level_get(i, &val_ui32);
|
||||
CHK_AND_PRINT_RSMI_ERR_RET(ret)
|
||||
std::cout << "\t**OverDrive Level:" << val_ui32 << "\n";
|
||||
std::cout << "\t**OverDrive Level: ";
|
||||
if (ret == RSMI_STATUS_SUCCESS) {
|
||||
std::cout << val_ui32 << "\n";
|
||||
} else {
|
||||
CHK_RSMI_NOT_SUPPORTED_OR_UNEXPECTED_DATA_RET(ret)
|
||||
}
|
||||
|
||||
print_test_header("GPU Clocks", i);
|
||||
for (int clkType = static_cast<int>(RSMI_CLK_TYPE_SYS);
|
||||
@@ -1159,9 +1205,6 @@ int main() {
|
||||
}
|
||||
|
||||
for (uint32_t i = 0; i < num_monitor_devs; ++i) {
|
||||
ret = test_set_overdrive(i);
|
||||
CHK_AND_PRINT_RSMI_ERR_RET(ret)
|
||||
|
||||
ret = test_set_perf_level(i);
|
||||
CHK_AND_PRINT_RSMI_ERR_RET(ret)
|
||||
|
||||
@@ -1182,6 +1225,9 @@ int main() {
|
||||
|
||||
ret = test_set_memory_partition(i);
|
||||
CHK_AND_PRINT_RSMI_ERR_RET(ret)
|
||||
|
||||
ret = test_set_overdrive(i);
|
||||
CHK_RSMI_NOT_SUPPORTED_RET(ret)
|
||||
}
|
||||
|
||||
return 0;
|
||||
|
||||
@@ -5846,6 +5846,9 @@ rsmi_dev_metrics_xcd_counter_get(uint32_t dv_ind, uint16_t* xcd_counter_value)
|
||||
auto status_code = rsmi_dev_gpu_metrics_info_get(dv_ind, &gpu_metrics);
|
||||
if (status_code == rsmi_status_t::RSMI_STATUS_SUCCESS) {
|
||||
for (const auto& gfxclk : gpu_metrics.current_gfxclks) {
|
||||
if (gfxclk == UINT16_MAX) {
|
||||
break;
|
||||
}
|
||||
if ((gfxclk != 0) && (gfxclk != UINT16_MAX)) {
|
||||
xcd_counter++;
|
||||
}
|
||||
|
||||
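Note on the `rsmi_dev_metrics_xcd_counter_get` hunk above: the added `break` stops the scan at the first `UINT16_MAX` entry, so only the leading run of populated `current_gfxclks` slots is counted. A minimal standalone sketch of that counting rule follows; `GfxClocks` and `count_active_xcds` are hypothetical names, and the 8-entry array size is taken from the ctypes definition earlier in this diff.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the 8-entry current_gfxclks array of rsmi_gpu_metrics_t.
using GfxClocks = std::array<std::uint16_t, 8>;

// Count non-zero clock entries, stopping at the first UINT16_MAX sentinel.
// Entries after the sentinel are ignored even if they hold valid-looking values.
inline std::uint16_t count_active_xcds(const GfxClocks& clocks) {
  std::uint16_t count = 0;
  for (const auto clk : clocks) {
    if (clk == UINT16_MAX) {
      break;  // sentinel: nothing past this point is counted
    }
    if (clk != 0) {
      ++count;
    }
  }
  assert(count <= clocks.size());
  return count;
}
```

With the early `break`, a populated slot that appears after a `UINT16_MAX` slot no longer contributes to the count, which is the behavioral change this hunk introduces.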
@@ -964,15 +964,17 @@ int Device::readDevInfoBinary(DevInfoTypes type, std::size_t b_size,
|
||||
LOG_ERROR(ss);
|
||||
return ENOENT;
|
||||
}
|
||||
ss << "Successfully read DevInfoBinary for DevInfoType ("
|
||||
<< devInfoTypesStrings.at(type) << ") - SYSFS ("
|
||||
<< sysfs_path << "), returning binaryData = " << p_binary_data
|
||||
<< "; byte_size = " << std::dec << static_cast<int>(b_size);
|
||||
if (ROCmLogging::Logger::getInstance()->isLoggerEnabled()) {
|
||||
ss << "Successfully read DevInfoBinary for DevInfoType ("
|
||||
<< devInfoTypesStrings.at(type) << ") - SYSFS ("
|
||||
<< sysfs_path << "), returning binaryData = " << p_binary_data
|
||||
<< "; byte_size = " << std::dec << static_cast<int>(b_size);
|
||||
|
||||
std::string metricDescription = "AMD SMI GPU METRICS (16-byte width), "
|
||||
std::string metricDescription = "AMD SMI GPU METRICS (16-byte width), "
|
||||
+ sysfs_path;
|
||||
logHexDump(metricDescription.c_str(), p_binary_data, b_size, 16);
|
||||
LOG_INFO(ss);
|
||||
logHexDump(metricDescription.c_str(), p_binary_data, b_size, 16);
|
||||
LOG_INFO(ss);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
+983
-270
File diff omitted because it is too large.
@@ -5,7 +5,7 @@
|
||||
* The University of Illinois/NCSA
|
||||
* Open Source License (NCSA)
|
||||
*
|
||||
* Copyright (c) 2020, Advanced Micro Devices, Inc.
|
||||
* Copyright (c) 2020-2024, Advanced Micro Devices, Inc.
|
||||
* All rights reserved.
|
||||
*
|
||||
* Developed by:
|
||||
@@ -46,7 +46,9 @@
|
||||
#include <stdint.h>
|
||||
#include <stddef.h>
|
||||
|
||||
#include <algorithm>
|
||||
#include <iostream>
|
||||
#include <iterator>
|
||||
#include <sstream>
|
||||
#include <string>
|
||||
#include <map>
|
||||
@@ -89,36 +91,6 @@ void TestGpuMetricsRead::Close() {
|
||||
}
|
||||
|
||||
|
||||
using GPUMetricResults_t = std::map<std::string, rsmi_status_t>;
|
||||
GPUMetricResults_t MetricResults{};
|
||||
|
||||
template <typename T>
|
||||
auto print_error_or_value(std::string title, std::string func_name, const T& metric) {
|
||||
auto str_values = title;
|
||||
const auto status_code = MetricResults.at(func_name);
|
||||
if (status_code == rsmi_status_t::RSMI_STATUS_SUCCESS) {
|
||||
if constexpr (std::is_array_v<T>) {
|
||||
auto idx = uint16_t(0);
|
||||
|
||||
const auto num_elems = static_cast<uint16_t>(std::end(metric) - std::begin(metric));
|
||||
str_values += ("\n\t\t num of values: " + std::to_string(num_elems) + "\n");
|
||||
for (const auto& el : metric) {
|
||||
str_values += "\t\t [" + std::to_string(idx) + "]: " + std::to_string(el) + "\n";
|
||||
++idx;
|
||||
}
|
||||
return str_values;
|
||||
}
|
||||
else if constexpr ((std::is_same_v<T, std::uint16_t>) ||
|
||||
(std::is_same_v<T, std::uint32_t>) ||
|
||||
(std::is_same_v<T, std::uint64_t>)) {
|
||||
|
||||
return str_values += std::to_string(metric);
|
||||
}
|
||||
}
|
||||
else {
|
||||
return str_values += ("\n\t\tStatus: [" + std::to_string(status_code) + "] " + "-> " + amd::smi::getRSMIStatusString(status_code));
|
||||
}
|
||||
};
|
||||
|
||||
void TestGpuMetricsRead::Run(void) {
|
||||
rsmi_status_t err;
|
||||
@@ -140,13 +112,15 @@ void TestGpuMetricsRead::Run(void) {
|
||||
auto ret = rsmi_dev_metrics_header_info_get(i, &header_values);
|
||||
if (ret == rsmi_status_t::RSMI_STATUS_SUCCESS) {
|
||||
std::cout << "\t[Metrics Header]" << "\n";
|
||||
std::cout << "\t -> format_revision : " << amd::smi::print_unsigned_int(header_values.format_revision) << "\n";
|
||||
std::cout << "\t -> content_revision : " << amd::smi::print_unsigned_int(header_values.content_revision) << "\n";
|
||||
std::cout << "\t -> format_revision : "
|
||||
<< static_cast<uint16_t>(header_values.format_revision) << "\n";
|
||||
std::cout << "\t -> content_revision : "
|
||||
<< static_cast<uint16_t>(header_values.content_revision) << "\n";
|
||||
std::cout << "\t--------------------" << "\n";
|
||||
}
|
||||
}
|
||||
|
||||
rsmi_gpu_metrics_t smu;
|
||||
rsmi_gpu_metrics_t smu = {};
|
||||
err = rsmi_dev_gpu_metrics_info_get(i, &smu);
|
||||
if (err != RSMI_STATUS_SUCCESS) {
|
||||
if (err == RSMI_STATUS_NOT_SUPPORTED) {
|
||||
@@ -159,96 +133,232 @@ void TestGpuMetricsRead::Run(void) {
|
||||
} else {
|
||||
CHK_ERR_ASRT(err);
|
||||
IF_VERB(STANDARD) {
|
||||
std::cout << std::dec << "\tsystem_clock_counter=" << smu.system_clock_counter << '\n';
|
||||
std::cout << std::dec << "\ttemperature_edge=" << smu.temperature_edge << '\n';
|
||||
std::cout << std::dec << "\ttemperature_hotspot=" << smu.temperature_hotspot << '\n';
|
||||
std::cout << std::dec << "\ttemperature_mem=" << smu.temperature_mem << '\n';
|
||||
std::cout << std::dec << "\ttemperature_vrgfx=" << smu.temperature_vrgfx << '\n';
|
||||
std::cout << std::dec << "\ttemperature_vrsoc=" << smu.temperature_vrsoc << '\n';
|
||||
std::cout << std::dec << "\ttemperature_vrmem=" << smu.temperature_vrmem << '\n';
|
||||
std::cout << std::dec << "\taverage_gfx_activity=" << smu.average_gfx_activity << '\n';
|
||||
std::cout << std::dec << "\taverage_umc_activity=" << smu.average_umc_activity << '\n';
|
||||
std::cout << std::dec << "\taverage_mm_activity=" << smu.average_mm_activity << '\n';
|
||||
std::cout << std::dec << "\taverage_socket_power=" << smu.average_socket_power << '\n';
|
||||
std::cout << std::dec << "\tenergy_accumulator=" << smu.energy_accumulator << '\n';
|
||||
std::cout << std::dec << "\taverage_gfxclk_frequency=" << smu.average_gfxclk_frequency << '\n';
|
||||
std::cout << std::dec << "\taverage_uclk_frequency=" << smu.average_uclk_frequency << '\n';
|
||||
std::cout << std::dec << "\taverage_vclk0_frequency=" << smu.average_vclk0_frequency << '\n';
|
||||
std::cout << std::dec << "\taverage_dclk0_frequency=" << smu.average_dclk0_frequency << '\n';
|
||||
std::cout << std::dec << "\taverage_vclk1_frequency=" << smu.average_vclk1_frequency << '\n';
|
||||
std::cout << std::dec << "\taverage_dclk1_frequency=" << smu.average_dclk1_frequency << '\n';
|
||||
std::cout << std::dec << "\tcurrent_gfxclk=" << smu.current_gfxclk << '\n';
|
||||
std::cout << std::dec << "\tcurrent_socclk=" << smu.current_socclk << '\n';
|
||||
std::cout << std::dec << "\tcurrent_uclk=" << smu.current_uclk << '\n';
|
||||
std::cout << std::dec << "\tcurrent_vclk0=" << smu.current_vclk0 << '\n';
|
||||
std::cout << std::dec << "\tcurrent_dclk0=" << smu.current_dclk0 << '\n';
|
||||
std::cout << std::dec << "\tcurrent_vclk1=" << smu.current_vclk1 << '\n';
|
||||
std::cout << std::dec << "\tcurrent_dclk1=" << smu.current_dclk1 << '\n';
|
||||
std::cout << std::dec << "\tthrottle_status=" << smu.throttle_status << '\n';
|
||||
std::cout << std::dec << "\tcurrent_fan_speed=" << smu.current_fan_speed << '\n';
|
||||
std::cout << std::dec << "\tpcie_link_width=" << smu.pcie_link_width << '\n';
|
||||
std::cout << std::dec << "\tpcie_link_speed=" << smu.pcie_link_speed << '\n';
|
||||
std::cout << std::dec << "\tgfx_activity_acc=" << std::dec << smu.gfx_activity_acc << '\n';
|
||||
std::cout << std::dec << "\tmem_activity_acc=" << std::dec << smu.mem_activity_acc << '\n';
|
||||
|
||||
for (int i = 0; i < RSMI_NUM_HBM_INSTANCES; ++i) {
|
||||
std::cout << "\ttemperature_hbm[" << i << "]=" << std::dec << smu.temperature_hbm[i] << '\n';
|
||||
}
|
||||
std::cout << "\n";
|
||||
std::cout << "\tfirmware_timestamp=" << std::dec << smu.firmware_timestamp << '\n';
|
||||
std::cout << "\tvoltage_soc=" << std::dec << smu.voltage_soc << '\n';
|
||||
std::cout << "\tvoltage_gfx=" << std::dec << smu.voltage_gfx << '\n';
|
||||
std::cout << "\tvoltage_mem=" << std::dec << smu.voltage_mem << '\n';
|
||||
std::cout << "\tindep_throttle_status=" << std::dec << smu.indep_throttle_status << '\n';
|
||||
std::cout << "\tcurrent_socket_power=" << std::dec << smu.current_socket_power << '\n';
|
||||
|
||||
for (int i = 0; i < RSMI_MAX_NUM_VCNS; ++i) {
|
||||
std::cout << "\tvcn_activity[" << i << "]=" << std::dec << smu.vcn_activity[i] << '\n';
|
||||
}
|
||||
std::cout << "\n";
|
||||
|
||||
for (int i = 0; i < RSMI_MAX_NUM_JPEG_ENGS; ++i) {
|
||||
std::cout << "\tjpeg_activity[" << i << "]=" << std::dec << smu.jpeg_activity[i] << '\n';
|
||||
}
|
||||
std::cout << "\n";
|
||||
|
||||
std::cout << "\tgfxclk_lock_status=" << std::dec << smu.gfxclk_lock_status << '\n';
|
||||
std::cout << "\txgmi_link_width=" << std::dec << smu.xgmi_link_width << '\n';
|
||||
std::cout << "\txgmi_link_speed=" << std::dec << smu.xgmi_link_speed << '\n';
|
||||
std::cout << "\tpcie_bandwidth_acc=" << std::dec << smu.pcie_bandwidth_acc << '\n';
|
||||
std::cout << "\tpcie_bandwidth_inst=" << std::dec << smu.pcie_bandwidth_inst << '\n';
|
||||
std::cout << "\tpcie_l0_to_recov_count_acc=" << std::dec << smu.pcie_l0_to_recov_count_acc << '\n';
|
||||
std::cout << "\tpcie_replay_count_acc=" << std::dec << smu.pcie_replay_count_acc << '\n';
|
||||
std::cout << "\tpcie_replay_rover_count_acc=" << std::dec << smu.pcie_replay_rover_count_acc << '\n';
|
||||
for (int i = 0; i < RSMI_MAX_NUM_XGMI_LINKS; ++i) {
|
||||
std::cout << "\txgmi_read_data_acc[" << i << "]=" << std::dec << smu.xgmi_read_data_acc[i] << '\n';
|
||||
}
|
||||
std::cout << "METRIC TABLE HEADER:\n";
|
||||
std::cout << "structure_size=" << std::dec
|
||||
<< static_cast<uint16_t>(smu.common_header.structure_size) << "\n";
|
||||
std::cout << "format_revision=" << std::dec
|
||||
<< static_cast<uint16_t>(smu.common_header.format_revision) << "\n";
|
||||
std::cout << "content_revision=" << std::dec
|
||||
<< static_cast<uint16_t>(smu.common_header.content_revision) << "\n";
|
||||
|
||||
std::cout << "\n";
|
||||
for (int i = 0; i < RSMI_MAX_NUM_XGMI_LINKS; ++i) {
|
||||
std::cout << "\txgmi_write_data_acc[" << i << "]=" << std::dec << smu.xgmi_write_data_acc[i] << '\n';
|
||||
}
|
||||
std::cout << "TIME STAMPS (ns):\n";
|
||||
std::cout << std::dec << "system_clock_counter=" << smu.system_clock_counter << "\n";
|
||||
std::cout << "firmware_timestamp (10ns resolution)=" << std::dec << smu.firmware_timestamp
|
||||
<< "\n";
|
||||
|
||||
std::cout << "\n";
|
||||
for (int i = 0; i < RSMI_MAX_NUM_GFX_CLKS; ++i) {
|
||||
std::cout << "\tcurrent_gfxclks[" << i << "]=" << std::dec << smu.current_gfxclks[i] << '\n';
|
||||
}
|
||||
std::cout << "TEMPERATURES (C):\n";
|
||||
std::cout << std::dec << "temperature_edge= " << smu.temperature_edge << "\n";
|
||||
std::cout << std::dec << "temperature_hotspot= " << smu.temperature_hotspot << "\n";
|
||||
std::cout << std::dec << "temperature_mem= " << smu.temperature_mem << "\n";
|
||||
std::cout << std::dec << "temperature_vrgfx= " << smu.temperature_vrgfx << "\n";
|
||||
std::cout << std::dec << "temperature_vrsoc= " << smu.temperature_vrsoc << "\n";
|
||||
std::cout << std::dec << "temperature_vrmem= " << smu.temperature_vrmem << "\n";
|
||||
std::cout << "temperature_hbm = [";
|
||||
std::copy(std::begin(smu.temperature_hbm),
|
||||
std::end(smu.temperature_hbm),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << std::dec << "]\n";
|
||||
|
||||
std::cout << "\n";
|
||||
for (int i = 0; i < RSMI_MAX_NUM_CLKS; ++i) {
|
||||
std::cout << "\tcurrent_socclks[" << i << "]=" << std::dec << smu.current_socclks[i] << '\n';
|
||||
}
|
||||
std::cout << "UTILIZATION (%):\n";
|
||||
std::cout << std::dec << "average_gfx_activity=" << smu.average_gfx_activity << "\n";
|
||||
std::cout << std::dec << "average_umc_activity=" << smu.average_umc_activity << "\n";
|
||||
std::cout << std::dec << "average_mm_activity=" << smu.average_mm_activity << "\n";
|
||||
std::cout << std::dec << "vcn_activity= [";
|
||||
std::copy(std::begin(smu.vcn_activity),
|
||||
std::end(smu.vcn_activity),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << std::dec << "]\n";
|
||||
|
||||
std::cout << "\n";
|
||||
for (int i = 0; i < RSMI_MAX_NUM_CLKS; ++i) {
|
||||
std::cout << "\tcurrent_vclk0s[" << i << "]=" << std::dec << smu.current_vclk0s[i] << '\n';
|
||||
}
|
||||
std::cout << std::dec << "jpeg_activity= [";
|
||||
std::copy(std::begin(smu.jpeg_activity),
|
||||
std::end(smu.jpeg_activity),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << std::dec << "]\n";
|
||||
|
||||
std::cout << "\n";
|
||||
for (int i = 0; i < RSMI_MAX_NUM_CLKS; ++i) {
|
||||
std::cout << "\tcurrent_dclk0s[" << i << "]=" << std::dec << smu.current_dclk0s[i] << '\n';
|
||||
std::cout << "POWER (W)/ENERGY (15.259uJ per 1ns):\n";
|
||||
std::cout << std::dec << "average_socket_power=" << smu.average_socket_power << "\n";
|
||||
std::cout << std::dec << "current_socket_power=" << smu.current_socket_power << "\n";
|
||||
std::cout << std::dec << "energy_accumulator=" << smu.energy_accumulator << "\n";
|
||||
|
||||
std::cout << "\n";
|
||||
std::cout << "AVG CLOCKS (MHz):\n";
|
||||
std::cout << std::dec << "average_gfxclk_frequency=" << smu.average_gfxclk_frequency
|
||||
<< "\n";
|
||||
std::cout << std::dec << "average_gfxclk_frequency=" << smu.average_gfxclk_frequency
|
||||
<< "\n";
|
||||
std::cout << std::dec << "average_uclk_frequency=" << smu.average_uclk_frequency << "\n";
|
||||
std::cout << std::dec << "average_vclk0_frequency=" << smu.average_vclk0_frequency
|
||||
<< "\n";
|
||||
std::cout << std::dec << "average_dclk0_frequency=" << smu.average_dclk0_frequency
|
||||
<< "\n";
|
||||
std::cout << std::dec << "average_vclk1_frequency=" << smu.average_vclk1_frequency
|
||||
<< "\n";
|
||||
std::cout << std::dec << "average_dclk1_frequency=" << smu.average_dclk1_frequency
|
||||
<< "\n";
|
||||
|
||||
std::cout << "\n";
|
||||
std::cout << "CURRENT CLOCKS (MHz):\n";
|
||||
std::cout << std::dec << "current_gfxclk=" << smu.current_gfxclk << "\n";
|
||||
std::cout << std::dec << "current_gfxclks= [";
|
||||
std::copy(std::begin(smu.current_gfxclks),
|
||||
std::end(smu.current_gfxclks),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << std::dec << "]\n";
|
||||
|
||||
std::cout << std::dec << "current_socclk=" << smu.current_socclk << "\n";
|
||||
std::cout << std::dec << "current_socclks= [";
|
||||
std::copy(std::begin(smu.current_socclks),
|
||||
std::end(smu.current_socclks),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << std::dec << "]\n";
|
||||
|
||||
std::cout << std::dec << "current_uclk=" << smu.current_uclk << "\n";
|
||||
std::cout << std::dec << "current_vclk0=" << smu.current_vclk0 << "\n";
|
||||
std::cout << std::dec << "current_vclk0s= [";
|
||||
std::copy(std::begin(smu.current_vclk0s),
|
||||
std::end(smu.current_vclk0s),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << std::dec << "]\n";
|
||||
|
||||
std::cout << std::dec << "current_dclk0=" << smu.current_dclk0 << "\n";
|
||||
std::cout << std::dec << "current_dclk0s= [";
|
||||
std::copy(std::begin(smu.current_dclk0s),
|
||||
std::end(smu.current_dclk0s),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << std::dec << "]\n";
|
||||
|
||||
std::cout << std::dec << "current_vclk1=" << smu.current_vclk1 << "\n";
|
||||
std::cout << std::dec << "current_dclk1=" << smu.current_dclk1 << "\n";
|
||||
|
||||
std::cout << "\n";
|
||||
std::cout << "THROTTLE STATUS:\n";
|
||||
std::cout << std::dec << "throttle_status=" << smu.throttle_status << "\n";
|
||||
|
||||
std::cout << "\n";
|
||||
std::cout << "FAN SPEED:\n";
|
||||
std::cout << std::dec << "current_fan_speed=" << smu.current_fan_speed << "\n";
|
||||
|
||||
std::cout << "\n";
|
||||
std::cout << "LINK WIDTH (number of lanes) /SPEED (0.1 GT/s):\n";
|
||||
std::cout << "pcie_link_width=" << smu.pcie_link_width << "\n";
|
||||
std::cout << "pcie_link_speed=" << smu.pcie_link_speed << "\n";
|
||||
std::cout << "xgmi_link_width=" << smu.xgmi_link_width << "\n";
|
||||
std::cout << "xgmi_link_speed=" << smu.xgmi_link_speed << "\n";
|
||||
|
||||
std::cout << "\n";
|
||||
std::cout << "Utilization Accumulated(%):\n";
|
||||
std::cout << "gfx_activity_acc=" << std::dec << smu.gfx_activity_acc << "\n";
|
||||
std::cout << "mem_activity_acc=" << std::dec << smu.mem_activity_acc << "\n";
|
||||
|
||||
std::cout << "\n";
|
||||
std::cout << "XGMI ACCUMULATED DATA TRANSFER SIZE (KB):\n";
|
||||
std::cout << std::dec << "xgmi_read_data_acc= [";
|
||||
std::copy(std::begin(smu.xgmi_read_data_acc),
|
||||
std::end(smu.xgmi_read_data_acc),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << std::dec << "]\n";
|
||||
|
||||
std::cout << std::dec << "xgmi_write_data_acc= [";
|
||||
std::copy(std::begin(smu.xgmi_write_data_acc),
|
||||
std::end(smu.xgmi_write_data_acc),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << std::dec << "]\n";
|
||||
|
||||
// Voltage (mV)
|
||||
std::cout << "voltage_soc = " << std::dec << smu.voltage_soc << "\n";
|
||||
std::cout << "voltage_gfx = " << std::dec << smu.voltage_gfx << "\n";
|
||||
std::cout << "voltage_mem = " << std::dec << smu.voltage_mem << "\n";
|
||||
|
||||
std::cout << "indep_throttle_status = " << std::dec << smu.indep_throttle_status << "\n";
|
||||
|
||||
// Clock Lock Status. Each bit corresponds to clock instance
|
||||
std::cout << "gfxclk_lock_status (in hex) = " << std::hex
|
||||
<< smu.gfxclk_lock_status << std::dec <<"\n";
|
||||
|
||||
// Bandwidth (GB/sec)
|
||||
std::cout << "pcie_bandwidth_acc=" << std::dec << smu.pcie_bandwidth_acc << "\n";
|
||||
std::cout << "pcie_bandwidth_inst=" << std::dec << smu.pcie_bandwidth_inst << "\n";
|
||||
|
||||
// Counts
|
||||
std::cout << "pcie_l0_to_recov_count_acc= " << std::dec << smu.pcie_l0_to_recov_count_acc
|
||||
<< "\n";
|
||||
std::cout << "pcie_replay_count_acc= " << std::dec << smu.pcie_replay_count_acc << "\n";
|
||||
std::cout << "pcie_replay_rover_count_acc= " << std::dec
|
||||
<< smu.pcie_replay_rover_count_acc << "\n";
|
||||
std::cout << "pcie_nak_sent_count_acc= " << std::dec << smu.pcie_nak_sent_count_acc
|
||||
<< "\n";
|
||||
std::cout << "pcie_nak_rcvd_count_acc= " << std::dec << smu.pcie_nak_rcvd_count_acc
|
||||
<< "\n";
|
||||
|
||||
// PCIE other end recovery counter
|
||||
std::cout << "pcie_lc_perf_other_end_recovery = "
|
||||
<< std::dec << smu.pcie_lc_perf_other_end_recovery << "\n";
|
||||
|
||||
// Accumulation cycle counter
|
||||
// Accumulated throttler residencies
|
||||
std::cout << "\n";
|
||||
std::cout << "RESIDENCY ACCUMULATION / COUNTER:\n";
|
||||
std::cout << "accumulation_counter = " << std::dec << smu.accumulation_counter << "\n";
|
||||
std::cout << "prochot_residency_acc = " << std::dec << smu.prochot_residency_acc << "\n";
|
||||
std::cout << "ppt_residency_acc = " << std::dec << smu.ppt_residency_acc << "\n";
|
||||
std::cout << "socket_thm_residency_acc = " << std::dec << smu.socket_thm_residency_acc
|
||||
<< "\n";
|
||||
std::cout << "vr_thm_residency_acc = " << std::dec << smu.vr_thm_residency_acc
|
||||
<< "\n";
|
||||
std::cout << "hbm_thm_residency_acc = " << std::dec << smu.hbm_thm_residency_acc << "\n";
|
||||
|
||||
// Number of current partitions
|
||||
std::cout << "num_partition = " << std::dec << smu.num_partition << "\n";
|
||||
|
||||
|
||||
std::cout << std::dec << "xcp_stats.gfx_busy_inst = \n";
|
||||
auto xcp = 0;
|
||||
for (auto& row : smu.xcp_stats) {
|
||||
std::cout << "XCP[" << xcp << "] = " << "[ ";
|
||||
std::copy(std::begin(row.gfx_busy_inst),
|
||||
std::end(row.gfx_busy_inst),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << " ]\n";
|
||||
xcp++;
|
||||
}
|
||||
|
||||
xcp = 0;
|
||||
std::cout << std::dec << "xcp_stats.jpeg_busy = \n";
|
||||
for (auto& row : smu.xcp_stats) {
|
||||
std::cout << "XCP[" << xcp << "] = " << "[ ";
|
||||
std::copy(std::begin(row.jpeg_busy),
|
||||
std::end(row.jpeg_busy),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << " ]\n";
|
||||
xcp++;
|
||||
}
|
||||
|
||||
xcp = 0;
|
||||
std::cout << std::dec << "xcp_stats.vcn_busy = \n";
|
||||
for (auto& row : smu.xcp_stats) {
|
||||
std::cout << "XCP[" << xcp << "] = " << "[ ";
|
||||
std::copy(std::begin(row.vcn_busy),
|
||||
std::end(row.vcn_busy),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << " ]\n";
|
||||
xcp++;
|
||||
}
|
||||
|
||||
xcp = 0;
|
||||
std::cout << std::dec << "xcp_stats.gfx_busy_acc = \n";
|
||||
for (auto& row : smu.xcp_stats) {
|
||||
std::cout << "XCP[" << xcp << "] = " << "[ ";
|
||||
std::copy(std::begin(row.gfx_busy_acc),
|
||||
std::end(row.gfx_busy_acc),
|
||||
amd::smi::make_ostream_joiner(&std::cout, ", "));
|
||||
std::cout << " ]\n";
|
||||
xcp++;
|
||||
}
|
||||
|
||||
std::cout << "\n\n";
|
||||
std::cout << "\t ** -> Checking metrics with constant changes ** " << "\n";
|
||||
@@ -256,17 +366,20 @@ void TestGpuMetricsRead::Run(void) {
|
||||
rsmi_gpu_metrics_t gpu_metrics_check;
|
||||
for (auto idx = uint16_t(1); idx <= kMAX_ITER_TEST; ++idx) {
|
||||
rsmi_dev_gpu_metrics_info_get(i, &gpu_metrics_check);
|
||||
std::cout << "\t\t -> firmware_timestamp [" << idx << "/" << kMAX_ITER_TEST << "]: " << gpu_metrics_check.firmware_timestamp << "\n";
|
||||
std::cout << "\t\t -> firmware_timestamp [" << idx << "/" << kMAX_ITER_TEST << "]: "
|
||||
<< gpu_metrics_check.firmware_timestamp << "\n";
|
||||
}
|
||||
|
||||
std::cout << "\n";
|
||||
for (auto idx = uint16_t(1); idx <= kMAX_ITER_TEST; ++idx) {
|
||||
rsmi_dev_gpu_metrics_info_get(i, &gpu_metrics_check);
|
||||
std::cout << "\t\t -> system_clock_counter [" << idx << "/" << kMAX_ITER_TEST << "]: " << gpu_metrics_check.system_clock_counter << "\n";
|
||||
std::cout << "\t\t -> system_clock_counter [" << idx << "/" << kMAX_ITER_TEST << "]: "
|
||||
<< gpu_metrics_check.system_clock_counter << "\n";
|
||||
}
|
||||
|
||||
std::cout << "\n";
|
||||
std::cout << " ** Note: Values MAX'ed out (UINTX MAX are unsupported for the version in question) ** " << "\n\n";
|
||||
std::cout << " ** Note: Values MAX'ed out "
|
||||
<< "(UINTX MAX are unsupported for the version in question) ** " << "\n\n";
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
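Note on the residency accumulators printed by this test: the changelog says `accumulation_counter` is used for all throttle calculations and `ppt_residency_acc` / `socket_thm_residency_acc` feed the PVIOL / TVIOL values, but the commit does not show the formula. The sketch below assumes a violation percentage is the change in a residency accumulator divided by the change in `accumulation_counter` between two samples. That ratio is an assumption, and `ResidencySample` and `violation_percent` are hypothetical names. Samples holding `UINT64_MAX` are treated as unsupported, following the test's own note about maxed-out values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pair of fields copied from one rsmi_gpu_metrics_t read.
struct ResidencySample {
  std::uint64_t accumulation_counter;  // accumulation cycle counter
  std::uint64_t residency_acc;         // e.g. ppt_residency_acc or socket_thm_residency_acc
};

// Assumed formula: percent = 100 * delta(residency_acc) / delta(accumulation_counter).
// Returns false (leaving *percent untouched) when the samples are unsupported,
// the counter did not advance, or either value went backwards.
inline bool violation_percent(const ResidencySample& prev,
                              const ResidencySample& curr,
                              double* percent) {
  if (percent == nullptr) return false;
  if (prev.accumulation_counter == UINT64_MAX || prev.residency_acc == UINT64_MAX ||
      curr.accumulation_counter == UINT64_MAX || curr.residency_acc == UINT64_MAX) {
    return false;  // UINT64_MAX marks a field the device does not report
  }
  if (curr.accumulation_counter <= prev.accumulation_counter ||
      curr.residency_acc < prev.residency_acc) {
    return false;
  }
  const std::uint64_t cycles = curr.accumulation_counter - prev.accumulation_counter;
  const std::uint64_t in_violation = curr.residency_acc - prev.residency_acc;
  assert(cycles > 0);  // guaranteed by the check above
  *percent = 100.0 * static_cast<double>(in_violation) / static_cast<double>(cycles);
  return true;
}
```

Two reads of the metrics table taken some interval apart would supply `prev` and `curr`; units and scaling should be confirmed against the driver before relying on the result.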
@@ -53,6 +53,7 @@
|
||||
#include "rocm_smi/rocm_smi.h"
|
||||
#include "rocm_smi_test/functional/measure_api_execution_time.h"
|
||||
#include "rocm_smi_test/test_common.h"
|
||||
#include "rocm_smi/rocm_smi_utils.h"
|
||||
|
||||
|
||||
TestMeasureApiExecutionTime::TestMeasureApiExecutionTime() : TestBase() {
|
||||
@@ -89,8 +90,31 @@ void TestMeasureApiExecutionTime::Run(void) {
|
||||
rsmi_temperature_metric_t met = RSMI_TEMP_CURRENT;
|
||||
rsmi_status_t ret;
|
||||
float repeat = 300.0;
|
||||
constexpr uint32_t kFAN_SPEED_ELAPSED_MS_BASE = (1000);
|
||||
constexpr uint32_t kMETRICS_ELAPSED_MS_BASE = (1500);
|
||||
constexpr float kFAN_SPEED_ELAPSED_MICROSEC_BASE = (1000);
|
||||
/**
 * gpu_metrics can only refresh every 1000 microseconds (1 millisecond) due to FW.
 *
 * We have additional processing time (each read() -> fread() costs ~900 microseconds).
 * We need to read 2x, plus post-processing:
 * 1) Read the metric's header to check support (~900 microseconds)
 * 2) Read the full metric based on the defined structure (~900 microseconds)
 * 3) Set up backwards compatibility (~100 microseconds)
 * 4) Put data into structures (~100 microseconds)
 * 5) Pass to public structure (~100 microseconds)
 * ---------------------------
 * ~2100 microseconds worst case
 *
 * Note: performance of fread/mmap/read
 * https://github.com/nurettn/c-read-vs-mmap-vs-fread
 *
 * Possible improvement ideas:
 * a) Initialize "N/A" / max UINT only for non-backwards-compatible public struct
 *    or arrays
 * b) Directly put data into public structure; this skips other copy/fill
 *    procedures
 * c) Experiment with other file reading options
 **/
|
||||
constexpr float kMETRICS_ELAPSED_MICROSEC_BASE = (2100);
|
||||
bool skip = false;
|
||||
|
||||
TestBase::Run();
|
||||
@@ -107,91 +131,125 @@ void TestMeasureApiExecutionTime::Run(void) {
   for (uint32_t dv_ind = 0; dv_ind < num_monitor_devs(); ++dv_ind) {
     PrintDeviceHeader(dv_ind);

-    //test execution time for rsmi_dev_fan_speed_get
+    // test execution time for rsmi_dev_fan_speed_get
     auto start = std::chrono::high_resolution_clock::now();
-    for (int i=0; i < static_cast<int>(repeat); ++i){
+    for (int i=0; i < static_cast<int>(repeat); ++i) {
       ret = rsmi_dev_fan_speed_get(dv_ind, 0, &val_i64);

     }
     auto stop = std::chrono::high_resolution_clock::now();
     auto duration = std::chrono::duration_cast
                     <std::chrono::microseconds>(stop - start);

-    if (ret != RSMI_STATUS_SUCCESS){
+    std::cout << "\n\trsmi_dev_fan_speed_get returned: "
+              << amd::smi::getRSMIStatusString(ret) << "\n";
+    if (ret != RSMI_STATUS_SUCCESS) {
       skip = true;
     }
-    std::cout << std:: endl;

+    // Expected performance: (stop - start) over all iterations [in microseconds]
+    // == (expected microseconds * # of iterations)
+
     if (!skip) {
-      std::cout << "\trsmi_dev_fan_speed_get execution time: " <<
-          (static_cast<float>(duration.count()) / repeat) << " microseconds" << std::endl;
-      EXPECT_LT(duration.count(), (kFAN_SPEED_ELAPSED_MS_BASE * repeat));
+      std::cout << "\trsmi_dev_fan_speed_get() total execution time: "
+                << std::to_string((static_cast<float>(duration.count())))
+                << " microseconds, expected < "
+                << std::to_string((static_cast<float>(kFAN_SPEED_ELAPSED_MICROSEC_BASE) * repeat))
+                << " microseconds" << std::endl;
+      std::cout << "\trsmi_dev_fan_speed_get() average execution time: "
+                << std::to_string(duration.count()/repeat) << " microseconds" << std::endl;
+      EXPECT_LT(duration.count(), static_cast<float>(kFAN_SPEED_ELAPSED_MICROSEC_BASE) * repeat);
     }
     skip = false;

-    //test execution time for rsmi_dev_temp_metric_get
+    // test execution time for rsmi_dev_temp_metric_get
     start = std::chrono::high_resolution_clock::now();
-    for (int i=0; i < static_cast<int>(repeat); ++i){
+    for (int i=0; i < static_cast<int>(repeat); ++i) {
       ret = rsmi_dev_temp_metric_get(dv_ind, 0, met, &val_i64);
     }
     stop = std::chrono::high_resolution_clock::now();
     duration = std::chrono::duration_cast
                <std::chrono::microseconds>(stop - start);

-    if (ret != RSMI_STATUS_SUCCESS){
+    std::cout << "\n\trsmi_dev_temp_metric_get returned: "
+              << amd::smi::getRSMIStatusString(ret) << "\n";
+    if (ret != RSMI_STATUS_SUCCESS) {
       skip = true;
     }
     if (!skip) {
-      std::cout << "\trsmi_dev_temp_metric_get execution time: " <<
-          (static_cast<float>(duration.count()) / repeat) << " microseconds" << std::endl;
-      EXPECT_LT(duration.count(), (kMETRICS_ELAPSED_MS_BASE * repeat));
+      std::cout << "\trsmi_dev_temp_metric_get() total execution time: "
+                << std::to_string((static_cast<float>(duration.count())))
+                << " microseconds, expected < "
+                << std::to_string((static_cast<float>(kMETRICS_ELAPSED_MICROSEC_BASE) * repeat))
+                << " microseconds" << std::endl;
+      std::cout << "\trsmi_dev_temp_metric_get() average execution time: "
+                << std::to_string(duration.count()/repeat) << " microseconds" << std::endl;
+      EXPECT_LT(duration.count(), (static_cast<float>(kMETRICS_ELAPSED_MICROSEC_BASE) * repeat));
     }
     skip = false;

-    //test execution time for rsmi_dev_gpu_metrics_info_get
+    // test execution time for rsmi_dev_gpu_metrics_info_get
     start = std::chrono::high_resolution_clock::now();
-    for (int i=0; i < static_cast<int>(repeat); ++i){
+    for (int i=0; i < static_cast<int>(repeat); ++i) {
       ret = rsmi_dev_gpu_metrics_info_get(dv_ind, &smu);
     }
     stop = std::chrono::high_resolution_clock::now();
     duration = std::chrono::duration_cast
-               <std::chrono::microseconds>(stop - start) ;
+               <std::chrono::microseconds>(stop - start);

-    if (ret != RSMI_STATUS_SUCCESS){
+    std::cout << "\n\trsmi_dev_gpu_metrics_info_get returned: "
+              << amd::smi::getRSMIStatusString(ret) << "\n";
+    if (ret != RSMI_STATUS_SUCCESS) {
       skip = true;
     }
     if (!skip) {
-      std::cout << "\trsmi_dev_gpu_metrics_info_get execution time: " <<
-          (static_cast<float>(duration.count()) / repeat ) << " microseconds" << std::endl;
-      EXPECT_LT(duration.count(), (kMETRICS_ELAPSED_MS_BASE * repeat));
+      std::cout << "\trsmi_dev_gpu_metrics_info_get() total execution time: "
+                << std::to_string(static_cast<float>(duration.count()))
+                << " microseconds, expected < "
+                << std::to_string((kMETRICS_ELAPSED_MICROSEC_BASE * repeat))
+                << " microseconds" << std::endl;
+      std::cout << "\trsmi_dev_gpu_metrics_info_get() average execution time: "
+                << std::to_string(duration.count()/repeat) << " microseconds" << std::endl;
+      EXPECT_LT(static_cast<float>(duration.count()),
+                static_cast<float>(kMETRICS_ELAPSED_MICROSEC_BASE) * repeat);
     }
     skip = false;

     auto val_ui16 = static_cast<uint16_t>(0);
     auto status_code(rsmi_status_t::RSMI_STATUS_SUCCESS);
     start = std::chrono::high_resolution_clock::now();
-    for (int i=0; i < static_cast<int>(repeat); ++i){
+    for (int i=0; i < static_cast<int>(repeat); ++i) {
       status_code = rsmi_dev_metrics_xcd_counter_get(dv_ind, &val_ui16);
     }
     stop = std::chrono::high_resolution_clock::now();
     duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
-    if (status_code != rsmi_status_t::RSMI_STATUS_SUCCESS){
+    std::cout << "\n\tsmi_dev_metrics_xcd_counter_get returned: "
+              << amd::smi::getRSMIStatusString(ret) << "\n";
+    if (status_code != rsmi_status_t::RSMI_STATUS_SUCCESS) {
       skip = true;
     }
     if (!skip) {
-      std::cout << "\trsmi_dev_metrics_xcd_counter_get() execution time: "
-                << (static_cast<float>(duration.count()) / repeat) << " microseconds" << std::endl;
-      EXPECT_LT(duration.count(), (kMETRICS_ELAPSED_MS_BASE * repeat));
+      std::cout << "\trsmi_dev_metrics_xcd_counter_get() total execution time: "
+                << std::to_string((static_cast<float>(duration.count())))
+                << " microseconds, expected < "
+                << std::to_string((static_cast<float>(kMETRICS_ELAPSED_MICROSEC_BASE) * repeat))
+                << " microseconds" << std::endl;
+      std::cout << "\trsmi_dev_metrics_xcd_counter_get() average execution time: "
+                << std::to_string(duration.count()/repeat) << " microseconds" << std::endl;
+      EXPECT_LT(duration.count(), static_cast<float>(kMETRICS_ELAPSED_MICROSEC_BASE) * repeat);
     }
     skip = false;
   }

   std::cout.precision(prev);
   auto test_stop = std::chrono::high_resolution_clock::now();
-  auto test_duration = std::chrono::duration_cast<std::chrono::microseconds>(test_stop - test_start);
+  auto test_duration = std::chrono::duration_cast<std::chrono::microseconds>(
+      test_stop - test_start);

-  std::cout << "\n" << "============================================================================" << "\n";
+  std::cout << "\n"
+            << "============================================================================" << "\n";
   std::cout << " Total execution time (All APIs): "
-            << (static_cast<float>(test_duration.count()) / repeat) << " microseconds" << "\n";
-  std::cout << "============================================================================" << "\n";
+            << (test_duration.count()) << " microseconds" << "\n";
+  std::cout
+      << "============================================================================" << "\n";
 }
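
The timing check this test repeats for each API (time N back-to-back calls, then compare the total against a per-call budget multiplied by N) can be sketched in isolation. This is a minimal sketch, not code from the commit: `measure_total_us`, `within_budget`, `kMetricsBudgetUs`, and `run_demo` are hypothetical names, and `std::chrono::steady_clock` is swapped in for the `high_resolution_clock` the test uses, on the assumption that a monotonic clock is preferable for interval timing. Only the 900 + 900 + 100 + 100 + 100 microsecond breakdown and the 300 repeats come from the diff above.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>

// Hypothetical helper: call `fn` `repeat` times back to back and return the
// total elapsed wall time in microseconds.
template <typename Fn>
int64_t measure_total_us(Fn&& fn, int repeat) {
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repeat; ++i) {
    fn();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Hypothetical helper: true when the total stays strictly below
// (per-call budget * number of calls), mirroring the EXPECT_LT comparison.
constexpr bool within_budget(int64_t total_us, float per_call_budget_us, int repeat) {
  return static_cast<float>(total_us) < per_call_budget_us * static_cast<float>(repeat);
}

// Per-call budget taken from the comment in the diff:
// two reads (~900 us each) plus three post-processing steps (~100 us each).
constexpr float kMetricsBudgetUs = 900.0f + 900.0f + 100.0f + 100.0f + 100.0f;

// Hypothetical demo: times a trivial stand-in workload instead of a real
// rocm_smi call, so it runs on machines without a GPU.
inline bool run_demo() {
  constexpr int kRepeat = 300;
  volatile uint64_t sink = 0;
  const int64_t total_us = measure_total_us([&sink]() { sink = sink + 1; }, kRepeat);
  std::cout << "total: " << total_us << " microseconds, expected < "
            << kMetricsBudgetUs * kRepeat << " microseconds\n";
  return within_budget(total_us, kMetricsBudgetUs, kRepeat);
}
```

Comparing the total against `budget * repeat`, rather than comparing an average against the budget, keeps the assertion in the same units as the measured duration, which appears to be the intent of the renamed `*_MICROSEC_BASE` constants.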