Update ROCm 6.0/6.1 CHANGELOG.md & README.md

* Updates:
    - [CHANGELOG.md] Add 6.1 and update 6.0 changes
    - [README.md] Update README.md with ROCm install instructions

Change-Id: Ic701ebcb00e5d0af54d8f97707c1cec71a0aac4c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 583e5e99bf]
This commit is contained in:
Charis Poag
2024-03-19 19:54:01 -05:00
parent 3a2f4286cc
commit e9190173ea
4 changed files with 420 additions and 8 deletions
+406 -2
@@ -1,6 +1,410 @@
# Change Log for AMD SMI Library
Full documentation for amd_smi_lib is available at [https://docs.amd.com/](https://rocm.docs.amd.com/projects/amdsmi/en/latest/).
Full documentation for amd_smi_lib is available at [https://rocm.docs.amd.com/](https://rocm.docs.amd.com/projects/amdsmi/en/latest/).
***All information listed below is for reference and subject to change.***
## amd_smi_lib for ROCm 6.1.0
### Added
- **Added Monitor Command**
Lets users choose which GPU metrics to capture, collect, and observe. Output is shown as a table. This aligns more closely with running ROCm SMI's `rocm-smi` with no arguments, and additionally lets users select the data that is useful for their use case.
```shell
$ amd-smi monitor -h
usage: amd-smi monitor [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL]
[-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]]
[-w INTERVAL] [-W TIME] [-i ITERATIONS] [-p] [-t] [-u] [-m] [-n]
[-d] [-s] [-e] [-v] [-r]
Monitor a target device for the specified arguments.
If no arguments are provided, all arguments will be enabled.
Use the watch arguments to run continuously
Monitor Arguments:
-h, --help show this help message and exit
-g, --gpu GPU [GPU ...] Select a GPU ID, BDF, or UUID from the possible choices:
ID: 0 | BDF: 0000:01:00.0 | UUID: 4eff74a0-0000-1000-802d-1d762a397f73
all | Selects all devices
-U, --cpu CPU [CPU ...] Select a CPU ID from the possible choices:
ID: 0
all | Selects all devices
-O, --core CORE [CORE ...] Select a Core ID from the possible choices:
ID: 0 - 23
all | Selects all devices
-w, --watch INTERVAL Reprint the command in a loop of INTERVAL seconds
-W, --watch_time TIME The total TIME to watch the given command
-i, --iterations ITERATIONS Total number of ITERATIONS to loop on the given command
-p, --power-usage Monitor power usage in Watts
-t, --temperature Monitor temperature in Celsius
-u, --gfx Monitor graphics utilization (%) and clock (MHz)
-m, --mem Monitor memory utilization (%) and clock (MHz)
-n, --encoder Monitor encoder utilization (%) and clock (MHz)
-d, --decoder Monitor decoder utilization (%) and clock (MHz)
-s, --throttle-status Monitor thermal throttle status
-e, --ecc Monitor ECC single bit, ECC double bit, and PCIe replay error counts
-v, --vram-usage Monitor memory usage in MB
-r, --pcie Monitor PCIe Tx/Rx in MB/s
Command Modifiers:
--json Displays output in JSON format (human readable by default).
--csv Displays output in CSV format (human readable by default).
--file FILE Saves output into a file on the provided path (stdout by default).
--loglevel LEVEL Set the logging level from the possible choices:
DEBUG, INFO, WARNING, ERROR, CRITICAL
```
```shell
$ amd-smi monitor -ptumv
GPU POWER GPU_TEMP MEM_TEMP GFX_UTIL GFX_CLOCK MEM_UTIL MEM_CLOCK VRAM_USED VRAM_TOTAL
0 171 W 32 °C 33 °C 0 % 114 MHz 0 % 900 MHz 283 MB 196300 MB
1 175 W 33 °C 34 °C 0 % 113 MHz 0 % 900 MHz 283 MB 196300 MB
2 177 W 31 °C 33 °C 0 % 113 MHz 0 % 900 MHz 283 MB 196300 MB
3 172 W 33 °C 32 °C 0 % 113 MHz 0 % 900 MHz 283 MB 196300 MB
4 178 W 32 °C 32 °C 0 % 113 MHz 0 % 900 MHz 284 MB 196300 MB
5 176 W 33 °C 35 °C 0 % 113 MHz 0 % 900 MHz 283 MB 196300 MB
6 176 W 32 °C 32 °C 0 % 113 MHz 0 % 900 MHz 283 MB 196300 MB
7 175 W 34 °C 32 °C 0 % 113 MHz 0 % 900 MHz 283 MB 196300 MB
```
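The table above can be turned into structured records for scripting. The following is a minimal sketch, not part of AMD SMI: `parse_monitor_table` is a hypothetical helper, and it assumes every metric is an integer followed by a unit, as in the sample rows. For anything beyond a quick check, the `--json` or `--csv` modifiers listed in the help text are likely the more robust choice.

```python
# Sample text copied from the `amd-smi monitor -ptumv` output above (first two rows).
SAMPLE = """\
GPU POWER GPU_TEMP MEM_TEMP GFX_UTIL GFX_CLOCK MEM_UTIL MEM_CLOCK VRAM_USED VRAM_TOTAL
0 171 W 32 °C 33 °C 0 % 114 MHz 0 % 900 MHz 283 MB 196300 MB
1 175 W 33 °C 34 °C 0 % 113 MHz 0 % 900 MHz 283 MB 196300 MB
"""


def parse_monitor_table(text):
    """Hypothetical helper: convert the human-readable table into a list of dicts."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    parsed = []
    for tokens in rows:
        # The first token is the GPU index; the rest are (value, unit) pairs.
        record = {header[0]: int(tokens[0])}
        pairs = zip(tokens[1::2], tokens[2::2])
        for name, (value, unit) in zip(header[1:], pairs):
            # Assumption: values are integers (no "N/A" entries in this sample).
            record[name] = (int(value), unit)
        parsed.append(record)
    return parsed


rows = parse_monitor_table(SAMPLE)
for row in rows:
    print(row["GPU"], row["POWER"], row["VRAM_USED"])
```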
- **Integrated ESMI Tool**
Users can get CPU metrics and telemetry through our API and CLI tools. This information is shown by the `amd-smi static` and `amd-smi metric` commands. It is only available on a limited set of target processors. As of ROCm 6.0.2, these are:
- AMD Zen3 based CPU Family 19h Models 0h-Fh and 30h-3Fh
- AMD Zen4 based CPU Family 19h Models 10h-1Fh and A0h-AFh
See a few examples listed below.
```shell
$ amd-smi static -U all
CPU: 0
SMU:
FW_VERSION: 85.90.0
INTERFACE_VERSION:
PROTO VERSION: 6
```
```shell
$ amd-smi metric -O 0 1 2
CORE: 0
BOOST_LIMIT:
VALUE: 400 MHz
CURR_ACTIVE_FREQ_CORE_LIMIT:
VALUE: 400 MHz
CORE_ENERGY:
VALUE: N/A
CORE: 1
BOOST_LIMIT:
VALUE: 400 MHz
CURR_ACTIVE_FREQ_CORE_LIMIT:
VALUE: 400 MHz
CORE_ENERGY:
VALUE: N/A
CORE: 2
BOOST_LIMIT:
VALUE: 400 MHz
CURR_ACTIVE_FREQ_CORE_LIMIT:
VALUE: 400 MHz
CORE_ENERGY:
VALUE: N/A
```
```shell
$ amd-smi metric -U all
CPU: 0
POWER_METRICS:
SOCKET POWER: 102675 mW
SOCKET POWER LIMIT: 550000 mW
SOCKET MAX POWER LIMIT: 550000 mW
PROCHOT:
PROCHOT_STATUS: 0
FREQ_METRICS:
FCLKMEMCLK:
FCLK: 2000 MHz
MCLK: 1300 MHz
CCLKFREQLIMIT: 400 MHz
SOC_CURRENT_ACTIVE_FREQ_LIMIT:
FREQ: 400 MHz
FREQ_SRC: [HSMP Agent]
SOC_FREQ_RANGE:
MAX_SOCKET_FREQ: 3700 MHz
MIN_SOCKET_FREQ: 400 MHz
C0_RESIDENCY:
RESIDENCY: 4 %
SVI_TELEMETRY_ALL_RAILS:
POWER: 102673 mW
METRIC_VERSION:
VERSION: 11
METRICS_TABLE:
CPU_FAMILY: 25
CPU_MODEL: 144
RESPONSE:
MTBL_ACCUMULATION_COUNTER: 2887162626
MTBL_MAX_SOCKET_TEMPERATURE: 41.0 °C
MTBL_MAX_VR_TEMPERATURE: 39.0 °C
MTBL_MAX_HBM_TEMPERATURE: 40.0 °C
MTBL_MAX_SOCKET_TEMPERATURE_ACC: 108583340881.125 °C
MTBL_MAX_VR_TEMPERATURE_ACC: 109472702595.0 °C
MTBL_MAX_HBM_TEMPERATURE_ACC: 111516663941.0 °C
MTBL_SOCKET_POWER_LIMIT: 550.0 W
MTBL_MAX_SOCKET_POWER_LIMIT: 550.0 W
MTBL_SOCKET_POWER: 102.678 W
MTBL_TIMESTAMP_RAW: 288731677361880
MTBL_TIMESTAMP_READABLE: Tue Mar 19 12:32:21 2024
MTBL_SOCKET_ENERGY_ACC: 166127.84 kJ
MTBL_CCD_ENERGY_ACC: 3317.837 kJ
MTBL_XCD_ENERGY_ACC: 21889.147 kJ
MTBL_AID_ENERGY_ACC: 121932.397 kJ
MTBL_HBM_ENERGY_ACC: 18994.108 kJ
MTBL_CCLK_FREQUENCY_LIMIT: 3.7 GHz
MTBL_GFXCLK_FREQUENCY_LIMIT: 0.0 MHz
MTBL_FCLK_FREQUENCY: 1999.988 MHz
MTBL_UCLK_FREQUENCY: 1299.993 MHz
MTBL_SOCCLK_FREQUENCY: [35.716, 35.715, 35.714, 35.714] MHz
MTBL_VCLK_FREQUENCY: [0.0, 53.749, 53.749, 53.749] MHz
MTBL_DCLK_FREQUENCY: [7.143, 44.791, 44.791, 44.791] MHz
MTBL_LCLK_FREQUENCY: [20.872, 18.75, 35.938, 599.558] MHz
MTBL_FCLK_FREQUENCY_TABLE: [1200.0, 1600.0, 1900.0, 2000.0] MHz
MTBL_UCLK_FREQUENCY_TABLE: [900.0, 1100.0, 1200.0, 1300.0] MHz
MTBL_SOCCLK_FREQUENCY_TABLE: [800.0, 1000.0, 1142.857, 1142.857] MHz
MTBL_VCLK_FREQUENCY_TABLE: [914.286, 1300.0, 1560.0, 1720.0] MHz
MTBL_DCLK_FREQUENCY_TABLE: [711.111, 975.0, 1300.0, 1433.333] MHz
MTBL_LCLK_FREQUENCY_TABLE: [600.0, 844.444, 1150.0, 1150.0] MHz
MTBL_CCLK_FREQUENCY_ACC: [4399751656.639, 4399751656.639, 4399751656.639, 4399751656.639,
4399751656.639, 4399751656.639, 4399751656.639, 4399751656.639, 4399751656.639,
4399751656.639, 4399751656.639, 4399751656.639, 4399751656.639, 4399751656.639,
4399751656.639, 4399751656.639, 4399751656.639, 4399751656.639, 4399751656.639,
4399751656.639, 4399751656.639, 4399751656.639, 4399751656.639, 4399751656.639,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] GHz
MTBL_GFXCLK_FREQUENCY_ACC: [0.0, 0.0, 250534397827.603, 251546257401.82, 250811364089.836,
249999070486.505, 251622633562.855, 251342375116.05] MHz
MTBL_GFXCLK_FREQUENCY: [0.0, 0.0, 31.091, 31.414, 31.141, 31.478, 31.32, 31.453]
MHz
MTBL_MAX_CCLK_FREQUENCY: 3.7 GHz
MTBL_MIN_CCLK_FREQUENCY: 0.4 GHz
MTBL_MAX_GFXCLK_FREQUENCY: 2100.0 MHz
MTBL_MIN_GFXCLK_FREQUENCY: 500.0 MHz
MTBL_MAX_LCLK_DPM_RANGE: 2
MTBL_MIN_LCLK_DPM_RANGE: 0
MTBL_XGMI_WIDTH: 0.0
MTBL_XGMI_BITRATE: 0.0 Gbps
MTBL_XGMI_READ_BANDWIDTH_ACC: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] Gbps
MTBL_XGMI_WRITE_BANDWIDTH_ACC: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] Gbps
MTBL_SOCKET_C0_RESIDENCY: 4.329 %
MTBL_SOCKET_GFX_BUSY: 0.0 %
MTBL_HBM_BANDWIDTH_UTILIZATION: 0.001 %
MTBL_SOCKET_C0_RESIDENCY_ACC: 311523106.34
MTBL_SOCKET_GFX_BUSY_ACC: 84739.281
MTBL_HBM_BANDWIDTH_ACC: 33231180.073 Gbps
MTBL_MAX_HBM_BANDWIDTH: 5324.801 Gbps
MTBL_DRAM_BANDWIDTH_UTILIZATION_ACC: 612843.699
MTBL_PCIE_BANDWIDTH_ACC: [0.0, 0.0, 0.0, 0.0] Gbps
MTBL_PROCHOT_RESIDENCY_ACC: 0
MTBL_PPT_RESIDENCY_ACC: 2887162626
MTBL_SOCKET_THM_RESIDENCY_ACC: 2887162626
MTBL_VR_THM_RESIDENCY_ACC: 0
MTBL_HBM_THM_RESIDENCY_ACC: 2887162626
SOCKET_ENERGY:
RESPONSE: N/A
DDR_BANDWIDTH:
RESPONSE: N/A
CPU_TEMP:
RESPONSE: N/A
```
- **Added support for new metrics: VCN, JPEG engines, and PCIe errors**
Using the AMD SMI tool, users can retrieve VCN, JPEG engine, and PCIe error metrics by calling `amd-smi metric -P` or `amd-smi metric --usage`. Depending on device support, `VCN_ACTIVITY` is populated on MI3x ASICs (with 4 separate VCN engine activities), while older ASICs populate `MM_ACTIVITY` with UVD/VCN engine activity (an average of all engines). `JPEG_ACTIVITY` is a new field for MI3x ASICs, where a device can support up to 32 JPEG engine activities. See our documentation for a more in-depth description of these new fields.
```shell
$ amd-smi metric -P
GPU: 0
PCIE:
WIDTH: 16
SPEED: 16 GT/s
REPLAY_COUNT: 0
L0_TO_RECOVERY_COUNT: 1
REPLAY_ROLL_OVER_COUNT: 0
NAK_SENT_COUNT: 0
NAK_RECEIVED_COUNT: 0
CURRENT_BANDWIDTH_SENT: N/A
CURRENT_BANDWIDTH_RECEIVED: N/A
MAX_PACKET_SIZE: N/A
```
```shell
$ amd-smi metric --usage
GPU: 0
USAGE:
GFX_ACTIVITY: 0 %
UMC_ACTIVITY: 0 %
MM_ACTIVITY: N/A
VCN_ACTIVITY: [0 %, 0 %, 0 %, 0 %]
JPEG_ACTIVITY: [0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0
%, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %, 0 %,
0 %, 0 %, 0 %, 0 %]
```
- **Added AMDSMI Tool Version**
AMD SMI will report ***three versions***: AMDSMI Tool, AMDSMI Library version, and ROCm version.
The AMDSMI Tool version is the CLI/tool version number with the commit ID appended after a `+` sign.
The AMDSMI Library version is the library package version number.
The ROCm version is the system's installed ROCm version. If ROCm is not installed, it reports N/A.
```shell
$ amd-smi version
AMDSMI Tool: 23.4.2+505b858 | AMDSMI Library version: 24.2.0.0 | ROCm version: 6.1.0
```
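The three fields in that line can be separated with plain string handling. This is a minimal sketch that assumes the `|`-separated `label: value` layout shown in the sample output. `parse_version_line` is a hypothetical helper, not an AMD SMI API.

```python
# Sample line copied from the `amd-smi version` output above.
LINE = "AMDSMI Tool: 23.4.2+505b858 | AMDSMI Library version: 24.2.0.0 | ROCm version: 6.1.0"


def parse_version_line(text):
    """Hypothetical helper: split 'label: value | label: value' into a dict."""
    fields = {}
    for part in text.split("|"):
        label, _, value = part.partition(":")
        fields[label.strip()] = value.strip()
    return fields


versions = parse_version_line(LINE)

# The tool version carries the commit ID after a '+' sign.
tool_version, _, commit_id = versions["AMDSMI Tool"].partition("+")

print(tool_version, commit_id)
print(versions["AMDSMI Library version"], versions["ROCm version"])
```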
- **Added XGMI table**
Displays XGMI information for AMD GPU devices in a table format. Only available on supported ASICs (e.g., MI300). Here users can view the accumulated read/write data transfer size over XGMI or PCIe (in kilobytes).
```shell
$ amd-smi xgmi
LINK METRIC TABLE:
bdf bit_rate max_bandwidth link_type 0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
GPU0 0000:0c:00.0 32 Gb/s 512 Gb/s XGMI
Read N/A 2 KB 2 KB 1 KB 2 KB 1 KB 2 KB 2 KB
Write N/A 1 KB 1 KB 1 KB 1 KB 1 KB 1 KB 1 KB
GPU1 0000:22:00.0 32 Gb/s 512 Gb/s XGMI
Read 0 KB N/A 2 KB 2 KB 1 KB 2 KB 1 KB 2 KB
Write 0 KB N/A 1 KB 1 KB 1 KB 1 KB 1 KB 1 KB
GPU2 0000:38:00.0 32 Gb/s 512 Gb/s XGMI
Read 0 KB 1 KB N/A 2 KB 1 KB 2 KB 0 KB 0 KB
Write 0 KB 1 KB N/A 1 KB 1 KB 1 KB 1 KB 1 KB
GPU3 0000:5c:00.0 32 Gb/s 512 Gb/s XGMI
Read 0 KB 0 KB 2 KB N/A 1 KB 0 KB 0 KB 2 KB
Write 0 KB 1 KB 1 KB N/A 1 KB 1 KB 1 KB 1 KB
GPU4 0000:9f:00.0 32 Gb/s 512 Gb/s XGMI
Read 0 KB 1 KB 0 KB 0 KB N/A 2 KB 0 KB 2 KB
Write 0 KB 1 KB 1 KB 1 KB N/A 1 KB 1 KB 1 KB
GPU5 0000:af:00.0 32 Gb/s 512 Gb/s XGMI
Read 0 KB 2 KB 0 KB 0 KB 0 KB N/A 2 KB 0 KB
Write 0 KB 1 KB 1 KB 1 KB 1 KB N/A 1 KB 1 KB
GPU6 0000:bf:00.0 32 Gb/s 512 Gb/s XGMI
Read 0 KB 0 KB 0 KB 0 KB 0 KB 0 KB N/A 0 KB
Write 0 KB 1 KB 1 KB 1 KB 1 KB 1 KB N/A 1 KB
GPU7 0000:df:00.0 32 Gb/s 512 Gb/s XGMI
Read 0 KB 0 KB 0 KB 0 KB 0 KB 0 KB 0 KB N/A
Write 0 KB 1 KB 1 KB 1 KB 1 KB 1 KB 1 KB N/A
```
- **Added units of measure to JSON output.**
We added units of measure to the JSON/CSV output of the `amd-smi metric`, `amd-smi static`, and `amd-smi monitor` commands.
Example:
```shell
$ amd-smi metric -p --json
[
{
"gpu": 0,
"power": {
"socket_power": {
"value": 10,
"unit": "W"
},
"gfx_voltage": {
"value": 6,
"unit": "mV"
},
"soc_voltage": {
"value": 918,
"unit": "mV"
},
"mem_voltage": {
"value": 1250,
"unit": "mV"
},
"power_management": "ENABLED",
"throttle_status": "UNTHROTTLED"
}
}
]
```
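With each measurement carried as a `value`/`unit` pair, a consumer can format readings without hard-coding units. The sketch below uses only the standard `json` module on a copy of the sample output above. `format_metrics` is a hypothetical helper, and the handling of plain string fields such as `power_management` is an assumption based on this one sample.

```python
import json

# Copy of the sample `amd-smi metric -p --json` output shown above.
SAMPLE = """
[
  {
    "gpu": 0,
    "power": {
      "socket_power": {"value": 10, "unit": "W"},
      "gfx_voltage": {"value": 6, "unit": "mV"},
      "soc_voltage": {"value": 918, "unit": "mV"},
      "mem_voltage": {"value": 1250, "unit": "mV"},
      "power_management": "ENABLED",
      "throttle_status": "UNTHROTTLED"
    }
  }
]
"""


def format_metrics(section):
    """Hypothetical helper: render value/unit pairs as 'value unit' strings."""
    formatted = {}
    for name, entry in section.items():
        if isinstance(entry, dict) and "value" in entry and "unit" in entry:
            formatted[name] = f"{entry['value']} {entry['unit']}"
        else:
            # Fields without a unit (plain strings in this sample) pass through.
            formatted[name] = str(entry)
    return formatted


devices = json.loads(SAMPLE)
formatted = format_metrics(devices[0]["power"])
for name, text in formatted.items():
    print(f"gpu {devices[0]['gpu']}: {name} = {text}")
```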
### Changed
- **Topology is now left-aligned, with each device's BDF listed in every table's rows/columns.**
We now list each device's BDF in every table's rows and columns, and left-align the data. We want AMD SMI Tool output to be easy for our users to understand and digest. Having users scroll up to find this information made it difficult to follow, especially for devices which have many devices associated with one ASIC.
```shell
$ amd-smi topology
ACCESS TABLE:
0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
0000:0c:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:22:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:38:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:5c:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:9f:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:af:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:bf:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
0000:df:00.0 ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED ENABLED
WEIGHT TABLE:
0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
0000:0c:00.0 0 15 15 15 15 15 15 15
0000:22:00.0 15 0 15 15 15 15 15 15
0000:38:00.0 15 15 0 15 15 15 15 15
0000:5c:00.0 15 15 15 0 15 15 15 15
0000:9f:00.0 15 15 15 15 0 15 15 15
0000:af:00.0 15 15 15 15 15 0 15 15
0000:bf:00.0 15 15 15 15 15 15 0 15
0000:df:00.0 15 15 15 15 15 15 15 0
HOPS TABLE:
0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
0000:0c:00.0 0 1 1 1 1 1 1 1
0000:22:00.0 1 0 1 1 1 1 1 1
0000:38:00.0 1 1 0 1 1 1 1 1
0000:5c:00.0 1 1 1 0 1 1 1 1
0000:9f:00.0 1 1 1 1 0 1 1 1
0000:af:00.0 1 1 1 1 1 0 1 1
0000:bf:00.0 1 1 1 1 1 1 0 1
0000:df:00.0 1 1 1 1 1 1 1 0
LINK TYPE TABLE:
0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
0000:0c:00.0 SELF XGMI XGMI XGMI XGMI XGMI XGMI XGMI
0000:22:00.0 XGMI SELF XGMI XGMI XGMI XGMI XGMI XGMI
0000:38:00.0 XGMI XGMI SELF XGMI XGMI XGMI XGMI XGMI
0000:5c:00.0 XGMI XGMI XGMI SELF XGMI XGMI XGMI XGMI
0000:9f:00.0 XGMI XGMI XGMI XGMI SELF XGMI XGMI XGMI
0000:af:00.0 XGMI XGMI XGMI XGMI XGMI SELF XGMI XGMI
0000:bf:00.0 XGMI XGMI XGMI XGMI XGMI XGMI SELF XGMI
0000:df:00.0 XGMI XGMI XGMI XGMI XGMI XGMI XGMI SELF
NUMA BW TABLE:
0000:0c:00.0 0000:22:00.0 0000:38:00.0 0000:5c:00.0 0000:9f:00.0 0000:af:00.0 0000:bf:00.0 0000:df:00.0
0000:0c:00.0 N/A 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000
0000:22:00.0 50000-50000 N/A 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000
0000:38:00.0 50000-50000 50000-50000 N/A 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000
0000:5c:00.0 50000-50000 50000-50000 50000-50000 N/A 50000-50000 50000-50000 50000-50000 50000-50000
0000:9f:00.0 50000-50000 50000-50000 50000-50000 50000-50000 N/A 50000-50000 50000-50000 50000-50000
0000:af:00.0 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 N/A 50000-50000 50000-50000
0000:bf:00.0 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 N/A 50000-50000
0000:df:00.0 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 50000-50000 N/A
```
### Optimizations
- N/A
### Fixed
- **Fix for Navi3X/Navi2X/MI100 `amdsmi_get_gpu_pci_bandwidth()` in frequencies_read tests**
For devices which do not report this data (e.g., Navi3X/Navi2X/MI100), we have added checks to confirm that they return `AMDSMI_STATUS_NOT_SUPPORTED`. Otherwise, tests now display a return string.
- **Fix for devices which have an older pyyaml installed**
On platforms identified as having an older pyyaml version or pip, we now manually update both pip and pyyaml as needed. This corrects the error shown below. The fix impacts the following CLI commands:
- `amd-smi list`
- `amd-smi static`
- `amd-smi firmware`
- `amd-smi metric`
- `amd-smi topology`
```shell
TypeError: dump_all() got an unexpected keyword argument 'sort_keys'
```
- **Fix for crash when user is not a member of video/render groups**
AMD SMI now uses the same mutex handler for devices as rocm-smi. This helps avoid crashes when DRM/device data is inaccessible to the logged-in user.
### Known Issues
- N/A
## amd_smi_lib for ROCm 6.0.0
@@ -26,7 +430,7 @@ Now the information is displayed as a table by each GPU's BDF, which closer rese
### Optimizations
- N/A
- Updated to C++17, gtest-1.14, and cmake 3.14
### Fixed
+11 -4
@@ -26,11 +26,18 @@ installed to query firmware information and hardware IPs.
### Installation
* Install amdgpu driver
* Install amd-smi-lib package through package manager
### Install amdgpu using ROCm
* Install amdgpu driver:
See the example below; your release and link may differ. Running `amdgpu-install --usecase=rocm` triggers both an amdgpu driver update and installation of the AMD SMI packages on your device.
```shell
sudo apt update
wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb
sudo apt install ./amdgpu-install_6.0.60002-1_all.deb
sudo amdgpu-install --usecase=rocm
```
* amd-smi --help
### Install Example for Ubuntu 22.04
### Install Example for Ubuntu 22.04 (without ROCm)
``` bash
apt install amd-smi-lib
@@ -277,4 +284,4 @@ Path to the program `amdsmitst`: build/tests/amd_smi_test/
The information contained herein is for informational purposes only, and is subject to change without notice. In addition, any stated support is planned and is also subject to change. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein.
© 2023 Advanced Micro Devices, Inc. All Rights Reserved.
© 2023-2024 Advanced Micro Devices, Inc. All Rights Reserved.
+1 -1
@@ -15,7 +15,7 @@ Recommended: At least one AMD GPU with AMD driver installed
### Installation
* Install amdgpu driver
* [Install amdgpu driver](../README.md#install-amdgpu-using-rocm)
* Optionally install amd_hsmp driver for ESMI CPU functions
* Install amd-smi-lib package through package manager
* amd-smi --help
@@ -1688,7 +1688,8 @@ struct_amdsmi_error_count_t._pack_ = 1 # source:False
struct_amdsmi_error_count_t._fields_ = [
('correctable_count', ctypes.c_uint64),
('uncorrectable_count', ctypes.c_uint64),
('reserved', ctypes.c_uint64 * 2),
('deferred_count', ctypes.c_uint64),
('reserved', ctypes.c_uint64 * 5),
]
amdsmi_error_count_t = struct_amdsmi_error_count_t
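The effect of this hunk on the structure layout can be checked with `ctypes` alone. The sketch below assumes that the `reserved ... * 2` line is the removed one and that `deferred_count` plus `reserved ... * 5` are the added ones, since the +/- markers are not visible in this view. The class names `ErrorCountOld` and `ErrorCountNew` are hypothetical. The source sets `_pack_ = 1`; it is omitted here because every field is a 64-bit integer, so packing does not change the offsets.

```python
import ctypes


class ErrorCountOld(ctypes.Structure):
    # Assumed pre-change layout: two counters plus two reserved words.
    _fields_ = [
        ("correctable_count", ctypes.c_uint64),
        ("uncorrectable_count", ctypes.c_uint64),
        ("reserved", ctypes.c_uint64 * 2),
    ]


class ErrorCountNew(ctypes.Structure):
    # Assumed post-change layout: a new deferred counter and five reserved words.
    _fields_ = [
        ("correctable_count", ctypes.c_uint64),
        ("uncorrectable_count", ctypes.c_uint64),
        ("deferred_count", ctypes.c_uint64),
        ("reserved", ctypes.c_uint64 * 5),
    ]


old_size = ctypes.sizeof(ErrorCountOld)
new_size = ctypes.sizeof(ErrorCountNew)
deferred_offset = ErrorCountNew.deferred_count.offset

print(f"old size: {old_size} bytes, new size: {new_size} bytes")
print(f"deferred_count offset: {deferred_offset}")
```

Under these assumptions the structure grows from 32 to 64 bytes, and `deferred_count` lands at the offset where `reserved` used to start, so the Python binding and the native library need to agree on which layout is in use.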