Files
rocm-systems/docs/how-to/amdsmi-cpp-lib.md
T

199 خطوط
7.5 KiB
Markdown

2024-10-02 14:42:34 -04:00
---
myst:
html_meta:
"description lang=en": "Get started with the AMD SMI C++ library. Basic usage and examples."
"keywords": "api, smi, lib, c++, system, management, interface, ROCm"
---
# AMD SMI C++ library usage and examples
This section presents a brief overview and some basic examples on the AMD SMI
library's usage. Whether you are developing applications for performance
monitoring, system diagnostics, or resource allocation, the AMD SMI C++ library
serves as a valuable tool for leveraging the full potential of AMD hardware in
your projects.
```{note}
``hipcc`` and other compilers will not automatically link in the ``libamd_smi``
dynamic library. To compile code that uses the AMD SMI library API, ensure the
``libamd_smi.so`` can be located by setting the ``LD_LIBRARY_PATH`` environment
variable to the directory containing ``librocm_smi64.so`` (usually
``/opt/rocm/lib``) or by passing the ``-lamd_smi`` flag to the compiler.
```
```{seealso}
Refer to the [C++ library API reference](../reference/amdsmi-cpp-api.md).
```
(device_socket_handle)=
## Device and socket handles
Many functions in the library take a _socket handle_ or _device handle_. A
_socket_ refers to a physical hardware socket, abstracted by the library to
represent the hardware more effectively to the user. While there is always one
unique GPU per socket, an APU may house both a GPU and CPU on the same socket.
For MI200 GPUs, multiple GCDs may reside within a single socket
To identify the sockets in a system, use the `amdsmi_get_socket_handles()`
function, which returns a list of socket handles. These handles can then be used
with `amdsmi_get_processor_handles()` to query devices within each socket. The
device handle is used to differentiate between detected devices; however, it's
important to note that a device handle may change after restarting the
application, so it should not be considered a persistent identifier across
processes.
The list of socket handles obtained from `amdsmi_get_socket_handles()` can
also be used to query the CPUs in each socket by calling
`amdsmi_get_processor_handles_by_type()`. This function can then be called again
to query the cores within each CPU.
(cpp_hello_amdsmi)=
## Hello AMD SMI
An application using AMD SMI must call `amdsmi_init()` to initialize the AMI SMI
library before all other calls. This call initializes the internal data
structures required for subsequent AMD SMI operations. In the call, a flag can
be passed to indicate if the application is interested in a specific device
type.
`amdsmi_shut_down()` must be the last call to properly close connection to
driver and make sure that any resources held by AMD SMI are released.
1. A simple "Hello World" type program that displays the temperature of detected
devices looks like this:
```cpp
#include <iostream>
#include <vector>
#include "amd_smi/amdsmi.h"
int main() {
amdsmi_status_t ret;
// Init amdsmi for sockets and devices. Here we are only interested in AMD_GPUS.
ret = amdsmi_init(AMDSMI_INIT_AMD_GPUS);
// Get all sockets
uint32_t socket_count = 0;
// Get the socket count available in the system.
ret = amdsmi_get_socket_handles(&socket_count, nullptr);
// Allocate the memory for the sockets
std::vector<amdsmi_socket_handle> sockets(socket_count);
// Get the socket handles in the system
ret = amdsmi_get_socket_handles(&socket_count, &sockets[0]);
std::cout << "Total Socket: " << socket_count << std::endl;
// For each socket, get identifier and devices
for (uint32_t i=0; i < socket_count; i++) {
// Get Socket info
char socket_info[128];
ret = amdsmi_get_socket_info(sockets[i], 128, socket_info);
std::cout << "Socket " << socket_info<< std::endl;
// Get the device count for the socket.
uint32_t device_count = 0;
ret = amdsmi_get_processor_handles(sockets[i], &device_count, nullptr);
// Allocate the memory for the device handlers on the socket
std::vector<amdsmi_processor_handle> processor_handles(device_count);
// Get all devices of the socket
ret = amdsmi_get_processor_handles(sockets[i],
&device_count, &processor_handles[0]);
// For each device of the socket, get name and temperature.
for (uint32_t j=0; j < device_count; j++) {
// Get device type. Since the amdsmi is initialized with
// AMD_SMI_INIT_AMD_GPUS, the processor_type must be AMDSMI_PROCESSOR_TYPE_AMD_GPU.
processor_type_t processor_type;
ret = amdsmi_get_processor_type(processor_handles[j], &processor_type);
if (processor_type != AMDSMI_PROCESSOR_TYPE_AMD_GPU) {
std::cout << "Expect AMDSMI_PROCESSOR_TYPE_AMD_GPU device type!\n";
return 1;
}
// Get device name
amdsmi_board_info_t board_info;
ret = amdsmi_get_gpu_board_info(processor_handles[j], &board_info);
std::cout << "\tdevice "
<< j <<"\n\t\tName:" << board_info.product_name << std::endl;
// Get temperature
int64_t val_i64 = 0;
ret = amdsmi_get_temp_metric(processor_handles[j], AMDSMI_TEMPERATURE_TYPE_EDGE,
AMDSMI_TEMP_CURRENT, &val_i64);
std::cout << "\t\tTemperature: " << val_i64 << "C" << std::endl;
}
}
// Clean up resources allocated at amdsmi_init. It will invalidate sockets
// and devices pointers
ret = amdsmi_shut_down();
return 0;
}
```
2. A sample program that displays the power of detected CPUs looks like this:
```cpp
#include <iostream>
#include <vector>
#include "amd_smi/amdsmi.h"
int main(int argc, char **argv) {
amdsmi_status_t ret;
uint32_t socket_count = 0;
// Initialize amdsmi for AMD CPUs
ret = amdsmi_init(AMDSMI_INIT_AMD_CPUS);
ret = amdsmi_get_socket_handles(&socket_count, nullptr);
// Allocate the memory for the sockets
std::vector<amdsmi_socket_handle> sockets(socket_count);
// Get the sockets of the system
ret = amdsmi_get_socket_handles(&socket_count, &sockets[0]);
std::cout << "Total Socket: " << socket_count << std::endl;
// For each socket, get cpus
for (uint32_t i = 0; i < socket_count; i++) {
uint32_t cpu_count = 0;
// Set processor type as AMDSMI_PROCESSOR_TYPE_AMD_CPU
processor_type_t processor_type = AMDSMI_PROCESSOR_TYPE_AMD_CPU;
ret = amdsmi_get_processor_handles_by_type(sockets[i], processor_type, nullptr, &cpu_count);
// Allocate the memory for the cpus
std::vector<amdsmi_processor_handle> plist(cpu_count);
// Get the cpus for each socket
ret = amdsmi_get_processor_handles_by_type(sockets[i], processor_type, &plist[0], &cpu_count);
for (uint32_t index = 0; index < plist.size(); index++) {
uint32_t socket_power;
std::cout<<"CPU "<<index<<"\t"<< std::endl;
std::cout<<"Power (Watts): ";
ret = amdsmi_get_cpu_socket_power(plist[index], &socket_power);
if(ret != AMDSMI_STATUS_SUCCESS)
std::cout<<"Failed to get cpu socket power"<<"["<<index<<"] , Err["<<ret<<"] "<< std::endl;
if (!ret) {
std::cout<<static_cast<double>(socket_power)/1000<<std::endl;
}
std::cout<<std::endl;
}
}
// Clean up resources allocated at amdsmi_init
ret = amdsmi_shut_down();
return 0;
}
```