Files
Joseph Narlo 16f06808d4 [SWDEV-565460] AMD SMI Document Multiple Init Best Practices (#2293)
* [SWDEV-565460] AMD SMI Document Multiple Init Best Practices

Signed-off-by: amd-josnarlo <josnarlo.amd.com>

* Add sphinxcontrib-mermaid to render diagram in HTML

bump rocm-docs-core to 1.31.0
pip-compile requirements.txt

---------

Signed-off-by: amd-josnarlo <josnarlo.amd.com>
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
2025-12-16 11:06:18 -06:00

11 KiB

myst
myst
html_meta
description lang=en keywords
Get started with the AMD SMI C++ library. Basic usage and examples. api, smi, lib, c++, system, management, interface, ROCm

AMD SMI C++ library usage and examples

This section presents a brief overview and some basic examples on the AMD SMI library's usage. Whether you are developing applications for performance monitoring, system diagnostics, or resource allocation, the AMD SMI C++ library serves as a valuable tool for leveraging the full potential of AMD hardware in your projects.

``hipcc`` and other compilers will not automatically link in the ``libamd_smi``
dynamic library. To compile code that uses the AMD SMI library API, ensure the
``libamd_smi.so`` can be located by setting the ``LD_LIBRARY_PATH`` environment
variable to the directory containing ``librocm_smi64.so`` (usually
``/opt/rocm/lib``) or by passing the ``-lamd_smi`` flag to the compiler.
The environment variable ``AMDSMI_GPU_METRICS_CACHE_MS`` may be set to
control the internal GPU metrics cache duration (ms).
Default 1, set to 0 to disable.
The environment variable ``AMDSMI_ASIC_INFO_CACHE_MS`` may be set to
control the internal GPU asic info cache duration (ms).
Default 10000 ms, set to 0 to disable.
Refer to the [C++ library API reference](../reference/amdsmi-cpp-api.md).

(device_socket_handle)=

Device and socket handles

Many functions in the library take a socket handle or device handle. A socket refers to a physical hardware socket, abstracted by the library to represent the hardware more effectively to the user. While there is always one unique GPU per socket, an APU may house both a GPU and CPU on the same socket. For MI200 GPUs, multiple GCDs may reside within a single socket

To identify the sockets in a system, use the amdsmi_get_socket_handles() function, which returns a list of socket handles. These handles can then be used with amdsmi_get_processor_handles() to query devices within each socket. The device handle is used to differentiate between detected devices; however, it's important to note that a device handle may change after restarting the application, so it should not be considered a persistent identifier across processes.

The list of socket handles obtained from amdsmi_get_socket_handles() can also be used to query the CPUs in each socket by calling amdsmi_get_processor_handles_by_type(). This function can then be called again to query the cores within each CPU.

(cpp_hello_amdsmi)=

Hello AMD SMI

An application using AMD SMI must call amdsmi_init() to initialize the AMI SMI library before all other calls. This call initializes the internal data structures required for subsequent AMD SMI operations. In the call, a flag can be passed to indicate if the application is interested in a specific device type.

amdsmi_shut_down() must be the last call to properly close connection to driver and make sure that any resources held by AMD SMI are released.

  1. A simple "Hello World" type program that displays the temperature of detected devices.

    Sample build example:
    $ g++ -I/opt/rocm/include <file_name>.cc -L/opt/rocm/lib -lamd_smi -o <filename>
    
    Users /opt/rocm-*/bin path may differ (depending on install), please locate the path of your libamd_smi.so.*.
    For example:
    
    $ sudo find /opt/ -iname libamd_smi.so*
    /opt/rocm-6.4.1/lib/libamd_smi.so.25.0
    /opt/rocm-6.4.1/lib/libamd_smi.so
    

    The code is as follows:

    #include <iostream>
    #include <vector>
    #include "amd_smi/amdsmi.h"
    
    int main() {
      amdsmi_status_t ret;
    
      // Init amdsmi for sockets and devices. Here we are only interested in AMD_GPUS.
      ret = amdsmi_init(AMDSMI_INIT_AMD_GPUS);
    
      // Get all sockets
      uint32_t socket_count = 0;
    
      // Get the socket count available in the system.
      ret = amdsmi_get_socket_handles(&socket_count, nullptr);
    
      // Allocate the memory for the sockets
      std::vector<amdsmi_socket_handle> sockets(socket_count);
      // Get the socket handles in the system
      ret = amdsmi_get_socket_handles(&socket_count, &sockets[0]);
    
      std::cout << "Total Socket: " << socket_count << std::endl;
    
      // For each socket, get identifier and devices
      for (uint32_t i=0; i < socket_count; i++) {
        // Get Socket info
        char socket_info[128];
        ret = amdsmi_get_socket_info(sockets[i], 128, socket_info);
        std::cout << "Socket " << socket_info<< std::endl;
    
        // Get the device count for the socket.
        uint32_t device_count = 0;
        ret = amdsmi_get_processor_handles(sockets[i], &device_count, nullptr);
    
        // Allocate the memory for the device handlers on the socket
        std::vector<amdsmi_processor_handle> processor_handles(device_count);
        // Get all devices of the socket
        ret = amdsmi_get_processor_handles(sockets[i],
                  &device_count, &processor_handles[0]);
    
        // For each device of the socket, get name and temperature.
        for (uint32_t j=0; j < device_count; j++) {
          // Get device type. Since the amdsmi is initialized with
          // AMD_SMI_INIT_AMD_GPUS, the processor_type must be AMDSMI_PROCESSOR_TYPE_AMD_GPU.
          processor_type_t processor_type;
          ret = amdsmi_get_processor_type(processor_handles[j], &processor_type);
          if (processor_type != AMDSMI_PROCESSOR_TYPE_AMD_GPU) {
            std::cout << "Expect AMDSMI_PROCESSOR_TYPE_AMD_GPU device type!\n";
            return 1;
          }
    
          // Get device name
          amdsmi_board_info_t board_info;
          ret = amdsmi_get_gpu_board_info(processor_handles[j], &board_info);
          std::cout << "\tdevice "
                      << j <<"\n\t\tName:" << board_info.product_name << std::endl;
    
          // Get temperature
          int64_t val_i64 = 0;
          ret =  amdsmi_get_temp_metric(processor_handles[j], AMDSMI_TEMPERATURE_TYPE_EDGE,
                  AMDSMI_TEMP_CURRENT, &val_i64);
          std::cout << "\t\tTemperature: " << val_i64 << "C" << std::endl;
        }
      }
    
      // Clean up resources allocated at amdsmi_init. It will invalidate sockets
      // and devices pointers
      ret = amdsmi_shut_down();
    
      return 0;
    }
    
  2. A sample program that displays the power of detected CPUs.

    Sample build example:
    $ g++ -DENABLE_ESMI -I/opt/rocm/include <file_name>.cc -L/opt/rocm/lib -lamd_smi -o <filename>
    
    For finding available rocm include and library path, see building example on sample program 1 above.
    

    The code is as follows:

    #include <iostream>
    #include <vector>
    #include "amd_smi/amdsmi.h"
    
    int main(int argc, char **argv) {
        amdsmi_status_t ret;
        uint32_t socket_count = 0;
    
        // Initialize amdsmi for AMD CPUs
        ret = amdsmi_init(AMDSMI_INIT_AMD_CPUS);
    
        ret = amdsmi_get_socket_handles(&socket_count, nullptr);
    
        // Allocate the memory for the sockets
        std::vector<amdsmi_socket_handle> sockets(socket_count);
    
        // Get the sockets of the system
        ret = amdsmi_get_socket_handles(&socket_count, &sockets[0]);
    
        std::cout << "Total Socket: " << socket_count << std::endl;
    
        // For each socket, get cpus
        for (uint32_t i = 0; i < socket_count; i++) {
            uint32_t cpu_count = 0;
    
            // Set processor type as AMDSMI_PROCESSOR_TYPE_AMD_CPU
            processor_type_t processor_type = AMDSMI_PROCESSOR_TYPE_AMD_CPU;
            ret = amdsmi_get_processor_handles_by_type(sockets[i], processor_type, nullptr, &cpu_count);
    
            // Allocate the memory for the cpus
            std::vector<amdsmi_processor_handle> plist(cpu_count);
    
            // Get the cpus for each socket
            ret = amdsmi_get_processor_handles_by_type(sockets[i], processor_type, &plist[0], &cpu_count);
    
            for (uint32_t index = 0; index < plist.size(); index++) {
                uint32_t socket_power;
                std::cout<<"CPU "<<index<<"\t"<< std::endl;
                std::cout<<"Power (Watts): ";
    
                ret = amdsmi_get_cpu_socket_power(plist[index], &socket_power);
                if(ret != AMDSMI_STATUS_SUCCESS)
                    std::cout<<"Failed to get cpu socket power"<<"["<<index<<"] , Err["<<ret<<"] "<< std::endl;
    
                if (!ret) {
                    std::cout<<static_cast<double>(socket_power)/1000<<std::endl;
                }
                std::cout<<std::endl;
            }
        }
    
        // Clean up resources allocated at amdsmi_init
        ret = amdsmi_shut_down();
    
        return 0;
    }
    

(multiple_init_perf_opt)=

Multiple initialization performance optimization

To optimize performance when initializing multiple amd-smi instances, AMD SMI implements a static metrics cache stored in a Singleton object. This design allows metrics data to persist across multiple instances of amd-smi, whether the tool is invoked repeatedly or used within a multithreaded C/C++ application.

Upon creation of the first amd-smi instance, a Singleton object that contains the metrics cache is instantiated. Any amd-smi instances created thereafter will inherit the Singleton object metrics cache data. Each amd-smi creation increases the usage counter in the Singleton object. And each amd-smi destruction decreases the usage counter. When the Singleton counter reaches zero, the metrics cache is destroyed along with the Singleton object. This principle follows the Singleton Design Principal of sharing cached data across multiple objects.

graph LR
    subgraph "Singleton Object"
    data_cache["Metrics data cache<br>instances 3"]
    end
    subgraph "amd-smi 1"
    data1[data] ----> data_cache
    end
    subgraph "amd-smi 2"
    data2[data] ----> data_cache
    end
    subgraph "amd-smi 3"
    data3[data] ----> data_cache
    end

The data caching can be controlled through two environment variables:

AMD_GPU_METRICS_CACHE_MS = 1 ms
AMD_ASIC_INFO_CACHE_MS = 10000 ms

These environment variables control how long information is stored in the data cache before it is refreshed. Therefore calls for the system information will not trigger a system retrieval until the cache goes invalid and needs refreshing.

(best_practice)=

Best practice

You should tune the cache refresh interval based on how frequently your application accesses data. If a multi-threaded application or multiple amd-smi instances are only going to need information every 5 seconds, then set the AMD_GPU_METRICS_CACHE_MS environment variable to something slightly less than 5 seconds.

AMD_GPU_METRICS_CACHE_MS = 4900 ms

In that way, the system is not constantly updating the cache from requests by each of the instances in the threads.