Add ROCm 6.1.1 changelog, ROCm SMI deprecation, vbios fix

* Updates:
    - Add ROCm 6.1.1 Changelog updates
    - Add planned ROCm SMI deprecation notice
    - Fix rocm-smi --showvbios showing extra errors
      for GPUs which do not have a VBIOS (MI300a ASICs)

Change-Id: I0e5ccfe2677f9c7909ca13863a920e323e82b439
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: f5c32b5415]
This commit belongs to:
Charis Poag
2024-03-30 00:11:09 -05:00
Parent 2ddcade4e7
Commit 0025ebafca
3 files changed with 431 additions and 2 deletions
@@ -4,6 +4,303 @@ Full documentation for rocm_smi_lib is available at [https://rocm.docs.amd.com/]
***All information listed below is for reference and subject to change.***
## rocm_smi_lib for ROCm 6.1.1
### Added
- **Unlock mutex if process is dead**
Added to unlock the mutex when the process is dead. Additional debug output has been added in case further issues are detected.
- **Added Partition ID to rocm-smi CLI**
`rsmi_dev_pci_id_get()` now provides the partition ID. See the API documentation for details. Previously, these bits were reserved (right before the domain) and the partition ID was encoded within the function.
- bits [63:32] = domain
- bits [31:28] = partition id
- bits [27:16] = reserved
- bits [15:0] = pci bus/device/function
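The layout above can be decoded with plain bit operations. The following is a minimal Python sketch. It assumes the 64-bit value returned by `rsmi_dev_pci_id_get()` is already available as a Python integer. It also assumes bits [15:0] follow the standard PCI split: bus in [15:8], device in [7:3], function in [2:0]. The `decode_bdf_id` helper and `BdfId` type are hypothetical and are not part of the library.

```python
from typing import NamedTuple


class BdfId(NamedTuple):
    domain: int
    partition: int
    bus: int
    device: int
    function: int

    def pci_string(self) -> str:
        # Domain:Bus:Device.Function, the form printed by `lspci -D`
        return f"{self.domain:04x}:{self.bus:02x}:{self.device:02x}.{self.function:x}"


def decode_bdf_id(bdf_id: int) -> BdfId:
    """Split a 64-bit ID laid out as described above (hypothetical helper)."""
    domain = (bdf_id >> 32) & 0xFFFFFFFF  # bits [63:32]
    partition = (bdf_id >> 28) & 0xF      # bits [31:28]
    # bits [27:16] are reserved and ignored here
    bus = (bdf_id >> 8) & 0xFF            # bits [15:8] (assumed standard PCI split)
    device = (bdf_id >> 3) & 0x1F         # bits [7:3]
    function = bdf_id & 0x7               # bits [2:0]
    return BdfId(domain, partition, bus, device, function)


if __name__ == "__main__":
    # Partition 1 of the device at 0001:01:00.0
    example = (0x1 << 32) | (0x1 << 28) | (0x01 << 8)
    info = decode_bdf_id(example)
    print(info.pci_string(), "partition", info.partition)
```

Running it prints `0001:01:00.0 partition 1`. The domain, bus, device and function stay intact, and the partition ID travels in its own field.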
rocm-smi now displays the partition ID in `rocm-smi` and `rocm-smi --showhw`. If a device supports partitioning and is in a non-SPX mode (CPX, DPX, TPX, etc.), its partitions are numbered from 0, so every partition after the first shows a non-zero ID. Devices in SPX mode and devices that do not support partitioning show 0. See the examples below.
```shell
$ rocm-smi
========================================= ROCm System Management Interface =========================================
=================================================== Concise Info ===================================================
Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
(DID, GUID) (Edge) (Avg) (Mem, Compute, ID)
====================================================================================================================
0 1 0x73bf, 34495 43.0°C 6.0W N/A, N/A, 0 0Mhz 96Mhz 0% manual 150.0W 3% 0%
1 2 0x73a3, 22215 34.0°C 8.0W N/A, N/A, 0 0Mhz 96Mhz 20.0% manual 213.0W 0% 0%
====================================================================================================================
=============================================== End of ROCm SMI Log ================================================
```
*Device below is in TPX*
```shell
$ rocm-smi --showhw
================================= ROCm System Management Interface =================================
====================================== Concise Hardware Info =======================================
GPU NODE DID GUID GFX VER GFX RAS SDMA RAS UMC RAS VBIOS BUS PARTITION ID
0 4 0x74a0 3877 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 0
1 5 0x74a0 54196 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 1
2 6 0x74a0 36891 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 2
3 7 0x74a0 28397 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 0
4 8 0x74a0 45692 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 1
5 9 0x74a0 61907 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 2
6 10 0x74a0 52404 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 0
7 11 0x74a0 4133 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 1
8 12 0x74a0 21386 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 2
9 13 0x74a0 10876 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 0
10 14 0x74a0 63213 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 1
11 15 0x74a0 46402 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 2
====================================================================================================
======================================= End of ROCm SMI Log ========================================
```
- **Added `NODE`, `GUID`, and `GFX Version`**
These changes affect the following rocm-smi commands:
- `rocm-smi`
- `rocm-smi -i`
- `rocm-smi --showhw`
- `rocm-smi --showproduct`
`NODE` is the KFD node. KFD nodes can be either CPU or GPU devices. This field is invariant between boots.
`GUID`, also known as GPU ID, is the KFD GPU's ID. This field may change between boots.
`GFX Version` is the device's target graphics version (for example, `gfx942`).
See below for a few example outputs.
```shell
$ rocm-smi --showhw
================================= ROCm System Management Interface =================================
====================================== Concise Hardware Info =======================================
GPU NODE DID GUID GFX VER GFX RAS SDMA RAS UMC RAS VBIOS BUS PARTITION ID
0 4 0x74a0 3877 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 0
1 5 0x74a0 54196 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 1
2 6 0x74a0 36891 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 2
3 7 0x74a0 28397 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 0
4 8 0x74a0 45692 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 1
5 9 0x74a0 61907 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 2
6 10 0x74a0 52404 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 0
7 11 0x74a0 4133 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 1
8 12 0x74a0 21386 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 2
9 13 0x74a0 10876 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 0
10 14 0x74a0 63213 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 1
11 15 0x74a0 46402 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 2
====================================================================================================
======================================= End of ROCm SMI Log ========================================
```
```shell
$ rocm-smi -i
============================ ROCm System Management Interface ============================
=========================================== ID ===========================================
GPU[0] : Device Name: Aqua Vanjaram [Instinct MI300A]
GPU[0] : Device ID: 0x74a0
GPU[0] : Device Rev: 0x00
GPU[0] : Subsystem ID: 0x74a0
GPU[0] : GUID: 60294
GPU[1] : Device Name: Aqua Vanjaram [Instinct MI300A]
GPU[1] : Device ID: 0x74a0
GPU[1] : Device Rev: 0x00
GPU[1] : Subsystem ID: 0x74a0
GPU[1] : GUID: 35406
GPU[2] : Device Name: Aqua Vanjaram [Instinct MI300A]
GPU[2] : Device ID: 0x74a0
GPU[2] : Device Rev: 0x00
GPU[2] : Subsystem ID: 0x74a0
GPU[2] : GUID: 10263
GPU[3] : Device Name: Aqua Vanjaram [Instinct MI300A]
GPU[3] : Device ID: 0x74a0
GPU[3] : Device Rev: 0x00
GPU[3] : Subsystem ID: 0x74a0
GPU[3] : GUID: 52959
==========================================================================================
================================== End of ROCm SMI Log ===================================
```
```shell
$ rocm-smi --showproduct
============================ ROCm System Management Interface ============================
====================================== Product Info ======================================
GPU[0] : Card Series: Aqua Vanjaram [Instinct MI300A]
GPU[0] : Card Model: 0x74a0
GPU[0] : Card Vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[0] : Card SKU: N/A
GPU[0] : Subsystem ID: 0x74a0
GPU[0] : Device Rev: 0x00
GPU[0] : Node ID: 4
GPU[0] : GUID: 60294
GPU[0] : GFX Version: gfx942
GPU[1] : Card Series: Aqua Vanjaram [Instinct MI300A]
GPU[1] : Card Model: 0x74a0
GPU[1] : Card Vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[1] : Card SKU: N/A
GPU[1] : Subsystem ID: 0x74a0
GPU[1] : Device Rev: 0x00
GPU[1] : Node ID: 5
GPU[1] : GUID: 35406
GPU[1] : GFX Version: gfx942
GPU[2] : Card Series: Aqua Vanjaram [Instinct MI300A]
GPU[2] : Card Model: 0x74a0
GPU[2] : Card Vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[2] : Card SKU: N/A
GPU[2] : Subsystem ID: 0x74a0
GPU[2] : Device Rev: 0x00
GPU[2] : Node ID: 6
GPU[2] : GUID: 10263
GPU[2] : GFX Version: gfx942
GPU[3] : Card Series: Aqua Vanjaram [Instinct MI300A]
GPU[3] : Card Model: 0x74a0
GPU[3] : Card Vendor: Advanced Micro Devices, Inc. [AMD/ATI]
GPU[3] : Card SKU: N/A
GPU[3] : Subsystem ID: 0x74a0
GPU[3] : Device Rev: 0x00
GPU[3] : Node ID: 7
GPU[3] : GUID: 52959
GPU[3] : GFX Version: gfx942
==========================================================================================
================================== End of ROCm SMI Log ===================================
```
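`NODE` and `GUID` are described above as KFD values. The following minimal Python sketch reads them, along with a gfx target version, from the KFD topology tree. It assumes the usual sysfs layout `/sys/class/kfd/kfd/topology/nodes/<N>/`, where each node has a `gpu_id` file and a `properties` file that contains `gfx_target_version`. It also assumes the version is encoded as `major * 10000 + minor * 100 + stepping`. This is not necessarily how rocm-smi reads these values, and all function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, List, Union

KFD_NODES = "/sys/class/kfd/kfd/topology/nodes"  # assumed standard KFD sysfs path


def gfx_name(target_version: int) -> str:
    """Turn a gfx_target_version such as 90402 into a name such as gfx942.

    Assumes the encoding major * 10000 + minor * 100 + stepping, with minor
    and stepping printed as hex digits (so 90010 becomes gfx90a).
    """
    major = target_version // 10000
    minor = (target_version // 100) % 100
    stepping = target_version % 100
    return f"gfx{major}{minor:x}{stepping:x}"


def read_properties(path: Path) -> Dict[str, int]:
    """Parse a KFD 'properties' file made of '<name> <integer>' lines."""
    props: Dict[str, int] = {}
    for line in path.read_text().splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            props[parts[0]] = int(parts[1])
    return props


def list_gpu_nodes(nodes_dir: str = KFD_NODES) -> List[Dict[str, Union[int, str]]]:
    """Return NODE, GUID and GFX version for each GPU node in the topology."""
    gpus: List[Dict[str, Union[int, str]]] = []
    node_dirs = [p for p in Path(nodes_dir).iterdir() if p.name.isdigit()]
    for node_dir in sorted(node_dirs, key=lambda p: int(p.name)):
        gpu_id = int((node_dir / "gpu_id").read_text().strip())
        if gpu_id == 0:
            continue  # CPU-only nodes are assumed to report gpu_id 0
        target = read_properties(node_dir / "properties").get("gfx_target_version")
        gpus.append({
            "node": int(node_dir.name),
            "guid": gpu_id,
            "gfx": gfx_name(target) if target else "N/A",
        })
    return gpus


if __name__ == "__main__" and os.path.isdir(KFD_NODES):
    for gpu in list_gpu_nodes():
        print(f"NODE {gpu['node']}  GUID {gpu['guid']}  GFX {gpu['gfx']}")
```

The CPU filter matters because KFD nodes can be CPU or GPU devices. Only GPU nodes have an entry in the rocm-smi tables.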
- **Documentation now includes C++ and Python tutorials, API guides, and C++ reference pages**
See [https://rocm.docs.amd.com/](https://rocm.docs.amd.com/projects/rocm_smi_lib/en/latest/) once 6.1.1 is released.
### Changed
- **`rocm-smi` fields `Card ID`, `DID`, `Model`, `SKU`, and `VBIOS` now consistently display "N/A" instead of "unknown"/"unsupported"**
This affects the following commands:
- `rocm-smi` - see the other 6.1.1 examples above
- `rocm-smi --showhw` - see the other 6.1.1 examples above
- `rocm-smi --showproduct` - see the other 6.1.1 examples above
- `rocm-smi -i` - see the other 6.1.1 examples above
- `rocm-smi --showvbios` - see example below
```shell
$ rocm-smi --showvbios
============================ ROCm System Management Interface ============================
========================================= VBIOS ==========================================
GPU[0] : VBIOS version: N/A
GPU[1] : VBIOS version: N/A
GPU[2] : VBIOS version: N/A
GPU[3] : VBIOS version: N/A
==========================================================================================
================================== End of ROCm SMI Log ===================================
```
- **Removed stacked id formatting in `rocm-smi`**
This simplifies the identifiers shown to users. Additional identifiers are available through:
- `rocm-smi -i`
- `rocm-smi --showhw`
- `rocm-smi --showproduct`
See the 6.1.1 examples above. An example of the previous output is shown below.
```shell
$ rocm-smi
========================================== ROCm System Management Interface ==========================================
==================================================== Concise Info ====================================================
Device [Model : Revision] Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU%
Name (20 chars) (Junction) (Socket) (Mem, Compute)
======================================================================================================================
0 [0x74a0 : 0x00] 40.0°C 102.0W NPS1, SPX 31Mhz 1300Mhz 0% manual 550.0W 0% 0%
Aqua Vanjaram [Insti
======================================================================================================================
================================================ End of ROCm SMI Log =================================================
```
### Optimizations
- N/A
### Fixed
- **Fixed HIP and ROCm SMI mismatch on GPU bus assignments**
These changes prompted us to provide better visibility into device nodes and partition IDs (see the additions above). See the examples below for an overview of the fix.
1. MI300a GPU device `Domain:Bus:Device.function` clashes with another AMD USB device
Cause(s):
a. ROCm SMI did not propagate domain consistently (for partitioned devices)
b. The AMD GPU driver previously reported partition IDs within the function field, causing a clash with the PCIe ID displayed for the other AMD USB device.
2. Domain does not propagate for devices which support partitioning (MI300x/a)
Cause(s):
a. ROCm SMI did not propagate domain consistently (for partitioned devices)
3. Displayed topology will show disordered nodes when compared to HIP
Cause(s):
a. ROCm SMI did not propagate domain consistently (for partitioned devices)
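To see why cause 1b produced a clash, compare the two ways of forming the displayed address. The Python below is a hypothetical sketch, not code from ROCm SMI. The old form places the partition ID in the PCI function field, as described in cause 1b. The new form keeps the real function and reports the partition ID separately. The USB controller address is taken from the `lspci` output in this changelog.

```python
from typing import Tuple

# AMD USB controller address from the lspci output in this changelog
USB_CONTROLLER = "0000:01:00.1"


def old_style_bus_id(domain: int, bus: int, device: int, partition: int) -> str:
    # Previous behaviour: the partition ID was reported in the function field
    return f"{domain:04x}:{bus:02x}:{device:02x}.{partition:x}"


def new_style_bus_id(domain: int, bus: int, device: int, function: int,
                     partition: int) -> Tuple[str, int]:
    # 6.1.1 behaviour: the real function is kept and the partition ID is
    # reported on its own (the PARTITION ID column of `rocm-smi --showhw`)
    return f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}", partition


old = old_style_bus_id(0x0, 0x01, 0x00, partition=1)
new_bus, new_partition = new_style_bus_id(0x0, 0x01, 0x00, 0, partition=1)

print(old, "clashes with USB controller:", old == USB_CONTROLLER)
print(new_bus, "partition", new_partition,
      "clashes with USB controller:", new_bus == USB_CONTROLLER)
```

The first line reports a clash (`0000:01:00.1` is the USB controller). The second does not, because partition 1 now stays at `0000:01:00.0` with its partition ID shown separately.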
*Device in TPX*
```shell
$ rocm-smi --showhw
================================= ROCm System Management Interface =================================
====================================== Concise Hardware Info =======================================
GPU NODE DID GUID GFX VER GFX RAS SDMA RAS UMC RAS VBIOS BUS PARTITION ID
0 4 0x74a0 3877 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 0
1 5 0x74a0 54196 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 1
2 6 0x74a0 36891 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 2
3 7 0x74a0 28397 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 0
4 8 0x74a0 45692 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 1
5 9 0x74a0 61907 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 2
6 10 0x74a0 52404 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 0
7 11 0x74a0 4133 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 1
8 12 0x74a0 21386 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 2
9 13 0x74a0 10876 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 0
10 14 0x74a0 63213 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 1
11 15 0x74a0 46402 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 2
====================================================================================================
======================================= End of ROCm SMI Log ========================================
$ lspci -D|grep -i "process\|usb"
0000:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A]
0000:01:00.1 USB controller: Advanced Micro Devices, Inc. [AMD] Device 14df
0001:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A]
0002:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A]
0003:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A]
```
```shell
$ rocm-smi --showtoponuma
======================================= Numa Nodes =======================================
GPU[0] : (Topology) Numa Node: 0
GPU[0] : (Topology) Numa Affinity: 0
GPU[1] : (Topology) Numa Node: 0
GPU[1] : (Topology) Numa Affinity: 0
GPU[2] : (Topology) Numa Node: 0
GPU[2] : (Topology) Numa Affinity: 0
GPU[3] : (Topology) Numa Node: 1
GPU[3] : (Topology) Numa Affinity: 1
GPU[4] : (Topology) Numa Node: 1
GPU[4] : (Topology) Numa Affinity: 1
GPU[5] : (Topology) Numa Node: 1
GPU[5] : (Topology) Numa Affinity: 1
GPU[6] : (Topology) Numa Node: 2
GPU[6] : (Topology) Numa Affinity: 2
GPU[7] : (Topology) Numa Node: 2
GPU[7] : (Topology) Numa Affinity: 2
GPU[8] : (Topology) Numa Node: 2
GPU[8] : (Topology) Numa Affinity: 2
GPU[9] : (Topology) Numa Node: 3
GPU[9] : (Topology) Numa Affinity: 3
GPU[10] : (Topology) Numa Node: 3
GPU[10] : (Topology) Numa Affinity: 3
GPU[11] : (Topology) Numa Node: 3
GPU[11] : (Topology) Numa Affinity: 3
================================== End of ROCm SMI Log ===================================
```
- **Fixed memory leaks**
Caused by not closing directories and by creating map nodes on lookup instead of checking for existing entries with `.at()`.
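Both leak causes have close Python analogies. They are shown below purely as an illustration; the actual fix is in the library's C++ code, which is not shown here. Indexing a `collections.defaultdict` with a missing key inserts a new entry, much as `operator[]` does on C++ map types such as `std::map`. Indexing a plain `dict` raises `KeyError` without inserting, much as `std::map::at()` throws instead of inserting. A `with` block around `os.scandir()` guarantees the directory handle is closed.

```python
import os
import tempfile
from collections import defaultdict

# Analogy for operator[]: looking up a missing key silently adds an entry.
counters = defaultdict(int)
_ = counters["card7"]
print(len(counters))  # 1: the lookup alone grew the container

# Analogy for .at(): a missing key raises instead of inserting.
plain = {}
try:
    _ = plain["card7"]
except KeyError:
    pass
print(len(plain))  # 0: nothing was inserted

# Analogy for opendir()/closedir(): the context manager always closes the handle.
with tempfile.TemporaryDirectory() as tmp:
    with os.scandir(tmp) as entries:
        names = [entry.name for entry in entries]
print(names)  # []: empty directory, and the scandir handle is already closed
```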
- **Fixed Python rocm_smi API calls**
Fixed initialization calls that reuse the `rocm_smi.initializeRsmi()` bindings. Previously, such calls failed as shown below.
```shell
Traceback (most recent call last):
File "/home/charpoag/rocmsmi_pythonapi.py", line 9, in <module>
rocm_smi.initializeRsmi()
File "/opt/rocm/libexec/rocm_smi/rocm_smi.py", line 3531, in initializeRsmi
ret_init = rocmsmi.rsmi_init(0)
NameError: name 'rocmsmi' is not defined
```
- **Fixed `rsmi_dev_activity_metric_get` gfx/memory activity not updating with GPU activity**
The library now checks and forces GPU metrics to be re-read unconditionally.
### Known Issues
- N/A
## rocm_smi_lib for ROCm 6.1.0
### Added
@@ -63,6 +360,133 @@ Updated to use `rsmi_dev_power_get()` within CLI to provide a consistent device
The `rsmi_dev_memory_partition_set` API is updated to handle the read-only sysfs check. Corresponding tests and CLI calls (`rocm-smi --setmemorypartition` and `rocm-smi --resetmemorypartition`) were updated accordingly.
- Fix `rocm-smi --showclkvolt` and `rocm-smi --showvc` displaying 0 for overdrive and voltage curve when they are not supported
### Known Issues
- **HIP and ROCm SMI mismatch on GPU bus assignments**
Three separate issues have been identified:
1. MI300a GPU device `Domain:Bus:Device.function` clashes with another AMD USB device
```shell
$ lspci|grep -i "process\|usb"
0000:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a0
0000:01:00.1 USB controller: Advanced Micro Devices, Inc. [AMD] Device 14df
0001:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a0
0002:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a0
0003:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a0
```
```shell
$ rocm-smi --showbus
============================ ROCm System Management Interface ============================
======================================= PCI Bus ID =======================================
GPU[0] : PCI Bus: 0000:01:00.0
GPU[1] : PCI Bus: 0000:01:00.1
GPU[2] : PCI Bus: 0000:01:00.2
GPU[3] : PCI Bus: 0000:01:00.3
...
==========================================================================================
================================== End of ROCm SMI Log ===================================
```
2. Domain does not propagate for devices which support partitioning (MI300x/a)
For example, when a device is in a non-SPX mode (SPX = single partition), its partitions overlap in the function field.
```shell
$ rocm-smi --showbus
============================ ROCm System Management Interface ============================
======================================= PCI Bus ID =======================================
GPU[0] : PCI Bus: 0000:01:00.0
GPU[1] : PCI Bus: 0000:01:00.1
GPU[2] : PCI Bus: 0000:01:00.1
GPU[3] : PCI Bus: 0000:01:00.1
GPU[4] : PCI Bus: 0000:01:00.1
GPU[5] : PCI Bus: 0000:01:00.2
GPU[6] : PCI Bus: 0000:01:00.2
GPU[7] : PCI Bus: 0000:01:00.2
GPU[8] : PCI Bus: 0000:01:00.2
GPU[9] : PCI Bus: 0000:01:00.3
GPU[10] : PCI Bus: 0000:01:00.3
GPU[11] : PCI Bus: 0000:01:00.3
GPU[12] : PCI Bus: 0000:01:00.3
GPU[13] : PCI Bus: 0000:01:00.4
GPU[14] : PCI Bus: 0000:01:00.4
GPU[15] : PCI Bus: 0000:01:00.4
GPU[16] : PCI Bus: 0000:01:00.4
GPU[17] : PCI Bus: 0000:01:00.5
GPU[18] : PCI Bus: 0000:01:00.5
GPU[19] : PCI Bus: 0000:01:00.5
GPU[20] : PCI Bus: 0000:01:00.5
GPU[21] : PCI Bus: 0001:01:00.0
GPU[22] : PCI Bus: 0002:01:00.0
GPU[23] : PCI Bus: 0003:01:00.0
================================== End of ROCm SMI Log ===================================
```
3. Displayed topology will show disordered nodes when compared to HIP
The `rocm-smi --showtopo` option does not display the correct information when the MI300 driver is loaded in TPX mode. Compare the rocm-smi output with the TransferBench output below.
```shell
$ rocm-smi --showtopo
============================ ROCm System Management Interface ============================
================================ Weight between two GPUs =================================
get_link_weight_topology, Not supported on the given system
ERROR: GPU[1] : Cannot read Link Weight: Not supported on this machine
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 GPU8 GPU9 GPU10 GPU11
GPU0 0 XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI
GPU1 XGMI 0 XXXX XXXX XXXX XGMI XGMI XGMI XGMI XGMI XGMI XGMI
GPU2 XGMI XXXX 0 XXXX XXXX XGMI XGMI XGMI XGMI XGMI XGMI XGMI
GPU3 XGMI XXXX XXXX 0 XXXX XGMI XGMI XGMI XGMI XGMI XGMI XGMI
GPU4 XGMI XXXX XXXX XXXX 0 XGMI XGMI XGMI XGMI XGMI XGMI XGMI
GPU5 XGMI XGMI XGMI XGMI XGMI 0 XXXX XXXX XXXX XGMI XGMI XGMI
GPU6 XGMI XGMI XGMI XGMI XGMI XXXX 0 XXXX XXXX XGMI XGMI XGMI
GPU7 XGMI XGMI XGMI XGMI XGMI XXXX XXXX 0 XXXX XGMI XGMI XGMI
GPU8 XGMI XGMI XGMI XGMI XGMI XXXX XXXX XXXX 0 XGMI XGMI XGMI
GPU9 XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI 0 XGMI XGMI
GPU10 XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI 0 XGMI
GPU11 XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI 0
======================================= Numa Nodes =======================================
GPU[0] : (Topology) Numa Node: 0
GPU[0] : (Topology) Numa Affinity: 0
GPU[1] : (Topology) Numa Node: 0
GPU[1] : (Topology) Numa Affinity: 0
GPU[2] : (Topology) Numa Node: 0
GPU[2] : (Topology) Numa Affinity: 1
GPU[3] : (Topology) Numa Node: 0
GPU[3] : (Topology) Numa Affinity: 2
GPU[4] : (Topology) Numa Node: 0
GPU[4] : (Topology) Numa Affinity: 3
GPU[5] : (Topology) Numa Node: 0
GPU[5] : (Topology) Numa Affinity: 0
GPU[6] : (Topology) Numa Node: 0
GPU[6] : (Topology) Numa Affinity: 1
GPU[7] : (Topology) Numa Node: 0
GPU[7] : (Topology) Numa Affinity: 2
GPU[8] : (Topology) Numa Node: 0
GPU[8] : (Topology) Numa Affinity: 3
GPU[9] : (Topology) Numa Node: 1
GPU[9] : (Topology) Numa Affinity: 1
GPU[10] : (Topology) Numa Node: 2
GPU[10] : (Topology) Numa Affinity: 2
GPU[11] : (Topology) Numa Node: 3
GPU[11] : (Topology) Numa Affinity: 3
================================== End of ROCm SMI Log ===================================
```
```shell
$ ./TransferBench
...
| GPU 00 | GPU 01 | GPU 02 | GPU 03 | GPU 04 | GPU 05 | GPU 06 | GPU 07 | PCIe Bus ID | #CUs | Closest NUMA | DMA engines
--------+--------+--------+--------+--------+--------+--------+--------+--------+--------------+------+-------------+------------
GPU 00 | - | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | 0000:0c:00.0 | 304 | 0 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
GPU 01 | XGMI-1 | - | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | 0000:22:00.0 | 304 | 0 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
GPU 02 | XGMI-1 | XGMI-1 | - | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | 0000:38:00.0 | 304 | 0 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
GPU 03 | XGMI-1 | XGMI-1 | XGMI-1 | - | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | 0000:5c:00.0 | 304 | 0 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
GPU 04 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | - | XGMI-1 | XGMI-1 | XGMI-1 | 0000:9f:00.0 | 304 | 1 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
GPU 05 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | - | XGMI-1 | XGMI-1 | 0000:af:00.0 | 304 | 1 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
GPU 06 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | - | XGMI-1 | 0000:bf:00.0 | 304 | 1 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
GPU 07 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | - | 0000:df:00.0 | 304 | 1 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
...
```
## rocm_smi_lib for ROCm 6.0.0
### Added
@@ -10,6 +10,11 @@ The information contained herein is for informational purposes only, and is subj
© 2022-2024 Advanced Micro Devices, Inc. All Rights Reserved.
## Planned Deprecation Notice
The ROCm System Management Interface (ROCm SMI) Library is planned to be ***deprecated***. The release date will be announced soon. Please start migrating to AMD SMI.
- Documentation: [https://rocm.docs.amd.com](https://rocm.docs.amd.com/projects/amdsmi/en/latest/)
- Github: [https://github.com/ROCm/amdsmi](https://github.com/ROCm/amdsmi)
## Installation
### Install amdgpu using ROCm
@@ -21,7 +26,7 @@ wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.
sudo apt install ./amdgpu-install_6.0.60002-1_all.deb
sudo amdgpu-install --usecase=rocm
```
-* rocm-smi --help
+* `rocm-smi --help`
## Building ROCm SMI
@@ -2941,7 +2941,7 @@ def showVbiosVersion(deviceList):
     """
     printLogSpacer(' VBIOS ')
     for device in deviceList:
-        printLog(device, 'VBIOS version', getVbiosVersion(device))
+        printLog(device, 'VBIOS version', getVbiosVersion(device, silent=True))
     printLogSpacer()