Documentation updates for AMDSMI_GPU_METRICS_CACHE_MS (#564)

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 2dc2e12a97]
Pryor, Adam
2025-08-05 19:58:37 -05:00
committed by GitHub
parent dbc496b36f
commit 32a1ef90cd
6 changed files with 58 additions and 1 deletion
@@ -41,6 +41,10 @@ except ImportError as e:
# from amdsmi import amdsmi_interface
# from amdsmi import amdsmi_exception
# Set the environment variable for GPU metrics cache duration
cache_ms = os.environ.setdefault("AMDSMI_GPU_METRICS_CACHE_MS", "100")
logging.debug("AMDSMI_GPU_METRICS_CACHE_MS = %sms", cache_ms)
try:
from amdsmi_init import *
from amdsmi_helpers import AMDSMIHelpers
@@ -75,6 +75,20 @@ usage information. See [Commands](#cmds).
For more detailed version information, use `amd-smi version`.
```
Environment variables:
You can set one or more variables in front of any `amd-smi` invocation. For example:
```shell-session
AMDSMI_GPU_METRICS_CACHE_MS=200 amd-smi metric
```
Current Variables:
```{note}
AMDSMI_GPU_METRICS_CACHE_MS - Controls the internal GPU metrics cache duration (ms). Default 100, set to 0 to disable.
```
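The CLI applies its 100 ms default through `os.environ.setdefault`, so a value already present in the environment takes precedence over it. The following is a minimal sketch of that precedence, using a plain dict in place of the real process environment. `effective_cache_ms` is a hypothetical helper written for illustration and is not part of `amd-smi`.

```python
def effective_cache_ms(env, cli_default="100"):
    """Return the cache duration string the CLI would see for a given environment.

    Mirrors os.environ.setdefault(): an existing value is kept, and the
    default is used only when the variable is absent.
    """
    env = dict(env)  # work on a copy so the caller's mapping is untouched
    return env.setdefault("AMDSMI_GPU_METRICS_CACHE_MS", cli_default)


# Variable unset: the CLI default applies.
print(effective_cache_ms({}))
# Variable set by the user: the user's value is kept.
print(effective_cache_ms({"AMDSMI_GPU_METRICS_CACHE_MS": "200"}))
# Variable set to 0: the value that disables the cache is passed through.
print(effective_cache_ms({"AMDSMI_GPU_METRICS_CACHE_MS": "0"}))
```

Working on a copy keeps the sketch free of side effects; the CLI itself writes the default into `os.environ` so that the library sees it when it loads.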
(cmds)=
## Commands
@@ -21,6 +21,12 @@ variable to the directory containing ``librocm_smi64.so`` (usually
``/opt/rocm/lib``) or by passing the ``-lamd_smi`` flag to the compiler.
```
```{note}
The environment variable ``AMDSMI_GPU_METRICS_CACHE_MS`` may be set to
control the internal GPU metrics cache duration (ms).
Default 1, set to 0 to disable.
```
```{seealso}
Refer to the [C++ library API reference](../reference/amdsmi-cpp-api.md).
```
@@ -39,6 +39,30 @@ variable to the directory containing ``librocm_smi64.so`` (usually
``/opt/rocm/lib``) or by passing the ``-lamd_smi`` flag to the compiler.
```
```{note}
The environment variable ``AMDSMI_GPU_METRICS_CACHE_MS`` may be set to
control the internal GPU metrics cache duration (ms).
Default 1, set to 0 to disable.
You can apply it in one of two ways:
1. In Python code (before the AMDSMI library loads):
```
```python
import os
os.environ["AMDSMI_GPU_METRICS_CACHE_MS"] = "200"
from amdsmi import *
```
```{note}
2. On the shell when invoking Python:
```
```shell
AMDSMI_GPU_METRICS_CACHE_MS=200 python tools/amdsmi_quick_start.py
```
To get started, the `amdsmi` folder should be copied and placed next to
the importing script. Import it as follows:
@@ -1104,7 +1104,7 @@ namespace {
// Keep 1 cache map, with an entry for each gpu
std::unordered_map<std::string, GpuMetricsCache> g_gpu_metrics_cache_map;
static const std::chrono::milliseconds kGpuMetricsCacheDuration(
- read_env_ms("AMDSMI_GPU_METRICS_CACHE_MS", 100)
+ read_env_ms("AMDSMI_GPU_METRICS_CACHE_MS", 1)
);
}
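To illustrate what a cache duration means in practice, here is a rough Python sketch of a per-device, time-limited cache. It is not the library's implementation: the body of the C++ `read_env_ms` helper and the cache's invalidation logic are not shown in this diff, so the fallback rule below (use the default when the variable is unset or not a non-negative integer) and the `MetricsCache` class are assumptions made for illustration.

```python
import os
import time


def read_env_ms(name, default_ms):
    """Hypothetical Python analogue of the C++ helper of the same name.

    Assumption: fall back to the default when the variable is unset or is
    not a non-negative integer. The real helper's parsing is not shown here.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default_ms
    try:
        value = int(raw)
    except ValueError:
        return default_ms
    return value if value >= 0 else default_ms


class MetricsCache:
    """Keep one cached reading per key and reuse it for `duration_ms`."""

    def __init__(self, duration_ms, clock=time.monotonic):
        self.duration_s = duration_ms / 1000.0
        self.clock = clock
        self.entries = {}  # key -> (timestamp, value)

    def get(self, key, read_fn):
        now = self.clock()
        hit = self.entries.get(key)
        # With a duration of 0 this freshness check never passes,
        # so every call performs a fresh read.
        if hit is not None and now - hit[0] < self.duration_s:
            return hit[1]
        value = read_fn()
        self.entries[key] = (now, value)
        return value
```

A caller would construct the cache once, for example `MetricsCache(read_env_ms("AMDSMI_GPU_METRICS_CACHE_MS", 1))`, and route each metrics read through `get`. The injectable `clock` parameter exists only to make the sketch easy to test.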
@@ -1113,6 +1113,12 @@ int Device::readDevInfoBinary(DevInfoTypes type, std::size_t b_size,
auto sysfs_path = path_;
std::ostringstream ss;
ss << __PRETTY_FUNCTION__
<< " | AMDSMI_GPU_METRICS_CACHE_MS = "
<< kGpuMetricsCacheDuration.count()
<< " ms";
LOG_DEBUG(ss);
// Size will either be 4, or 3872+. When 4, it's only reading from the header.
// If this header read is inconsequential, we could only cache full read.
// However, it seems reading the gpu_metrics sysfs in any capacity
@@ -26,6 +26,9 @@ import logging
import signal
import sys
# Metrics cache set to 1 by default, uncomment to change
# os.environ["AMDSMI_GPU_METRICS_CACHE_MS"] = "1"
try:
from amdsmi import *
except ImportError as e: