Updated: rocm_smi.py - Remove all else: clauses from functions where rsmi_ret_ok is part of the if clause, as requested. - rsmi_ret_ok() function already handles unsucessful return codes and gracefully handles them. - Updated check_runtime_status() function to sweep through /sys/class/drm to find active runtime_status. - Updated the message to' AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status' - This clarifies the status of the GPU and tells them where to check for more info. Signed-off-by: Juan Castillo <juan.castillo@amd.com> Co-authored-by: Juan Castillo <juan.castillo@amd.com> Co-authored-by: Maisam Arif <Maisam.Arif@amd.com> Co-authored-by: gabrpham <Gabriel.Pham@amd.com>
🛠️ Maintenance Mode Notice 🛠️
Starting with ROCm 7.0, only critical bug fixes will be applied to ROCm-SMI. For a seamless experience and continued support, please switch to AMD-SMI.
Use C++ in ROCm SMI
Device Indices
Many of the functions in the library take a "device index". The device index is a number greater than or equal to 0, and less than the number of devices detected, as determined by rsmi_num_monitor_devices(). The index is used to distinguish the detected devices from one another. It is important to note that a device may end up with a different index after a reboot, so an index should not be relied upon to be constant over reboots.
Hello ROCm SMI
The only required ROCm-SMI call for any program that wants to use ROCm-SMI is the rsmi_init() call. This call initializes some internal data structures that will be used by subsequent ROCm-SMI calls.
When ROCm-SMI is no longer being used, rsmi_shut_down() should be called. This provides a way to do any releasing of resources that ROCm-SMI may have held. In many cases, this may have no effect, but may be necessary in future versions of the library.
A simple "Hello World" type program that displays the device ID of detected devices would look like this:
#include <stdint.h>
#include "rocm_smi/rocm_smi.h"
int main() {
rsmi_status_t ret;
uint32_t num_devices;
uint16_t dev_id;
// We will skip return code checks for this example, but it
// is recommended to always check this as some calls may not
// apply for some devices or ROCm releases
ret = rsmi_init(0);
ret = rsmi_num_monitor_devices(&num_devices);
for (int i=0; i < num_devices; ++i) {
ret = rsmi_dev_id_get(i, &dev_id);
// dev_id holds the device ID of device i, upon a
// successful call
}
ret = rsmi_shut_down();
return 0;
}