diff --git a/projects/rocm-smi-lib/CHANGELOG.md b/projects/rocm-smi-lib/CHANGELOG.md index 60613c6d54..1ed65dd85d 100644 --- a/projects/rocm-smi-lib/CHANGELOG.md +++ b/projects/rocm-smi-lib/CHANGELOG.md @@ -4,6 +4,303 @@ Full documentation for rocm_smi_lib is available at [https://rocm.docs.amd.com/] ***All information listed below is for reference and subject to change.*** +## rocm_smi_lib for ROCm 6.1.1 + +### Added +- **Unlock mutex if process is dead** +Unlocks the mutex when the process is dead. Additional debug output has been added in case further issues are detected. + +- **Added Partition ID to rocm-smi CLI** +`rsmi_dev_pci_id_get()` now provides the partition ID. See the API documentation for details. Previously these bits were reserved (right before domain) and the partition ID was reported within the function bits. + - bits [63:32] = domain + - bits [31:28] = partition id + - bits [27:16] = reserved + - bits [15: 0] = pci bus/device/function + +rocm-smi now provides the partition ID in `rocm-smi` and `rocm-smi --showhw`. If a device supports partitioning and is in a non-SPX mode (CPX, DPX, TPX, etc.), the partition ID can be non-zero. In SPX mode and on devices that do not support partitioning, the partition ID is shown as 0. See examples provided below. 
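As a rough illustration of the bit layout listed above, the sketch below unpacks a 64-bit value shaped like the one `rsmi_dev_pci_id_get()` is described as returning. The names `PciId`, `decode_bdfid`, and `format_bdf` are hypothetical helpers, not part of rocm_smi_lib. The split of bits [15:0] into bus [15:8], device [7:3], and function [2:0] is an assumption based on the conventional PCI BDF encoding, and the sample value is constructed by hand rather than read from a device.

```python
from typing import NamedTuple


class PciId(NamedTuple):
    """Fields unpacked from a 64-bit PCI ID (hypothetical helper type)."""
    domain: int
    partition_id: int
    bus: int
    device: int
    function: int


def decode_bdfid(bdfid: int) -> PciId:
    """Unpack a PCI ID using the bit layout from the changelog entry.

    bits [63:32] = domain, bits [31:28] = partition id,
    bits [27:16] = reserved (ignored here), bits [15:0] = bus/device/function.
    The bus/device/function split is assumed to be the conventional one.
    """
    return PciId(
        domain=(bdfid >> 32) & 0xFFFFFFFF,
        partition_id=(bdfid >> 28) & 0xF,
        bus=(bdfid >> 8) & 0xFF,
        device=(bdfid >> 3) & 0x1F,
        function=bdfid & 0x7,
    )


def format_bdf(pci: PciId) -> str:
    """Render Domain:Bus:Device.function in the style shown by rocm-smi."""
    return f"{pci.domain:04x}:{pci.bus:02x}:{pci.device:02x}.{pci.function}"


# Hand-built sample: domain 0x0003, partition 2, bus 0x01, device 0, function 0.
sample = (0x0003 << 32) | (2 << 28) | (0x01 << 8)
pci = decode_bdfid(sample)
print(format_bdf(pci), "partition", pci.partition_id)  # 0003:01:00.0 partition 2
```

Keeping the partition ID in its own bit field, rather than in the function bits, means the formatted `Domain:Bus:Device.function` string stays the same for every partition of one physical device.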
+ +```shell + $ rocm-smi + +========================================= ROCm System Management Interface ========================================= +=================================================== Concise Info =================================================== +Device Node IDs Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU% + (DID, GUID) (Edge) (Avg) (Mem, Compute, ID) +==================================================================================================================== +0 1 0x73bf, 34495 43.0°C 6.0W N/A, N/A, 0 0Mhz 96Mhz 0% manual 150.0W 3% 0% +1 2 0x73a3, 22215 34.0°C 8.0W N/A, N/A, 0 0Mhz 96Mhz 20.0% manual 213.0W 0% 0% +==================================================================================================================== +=============================================== End of ROCm SMI Log ================================================ +``` +*Device below is in TPX* +```shell +$ rocm-smi --showhw + +================================= ROCm System Management Interface ================================= +====================================== Concise Hardware Info ======================================= +GPU NODE DID GUID GFX VER GFX RAS SDMA RAS UMC RAS VBIOS BUS PARTITION ID +0 4 0x74a0 3877 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 0 +1 5 0x74a0 54196 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 1 +2 6 0x74a0 36891 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 2 +3 7 0x74a0 28397 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 0 +4 8 0x74a0 45692 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 1 +5 9 0x74a0 61907 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 2 +6 10 0x74a0 52404 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 0 +7 11 0x74a0 4133 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 1 +8 12 0x74a0 21386 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 2 +9 13 0x74a0 10876 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 0 +10 14 0x74a0 63213 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 
1 +11 15 0x74a0 46402 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 2 +==================================================================================================== +======================================= End of ROCm SMI Log ======================================== +``` + +- **Added `NODE`, `GUID`, and `GFX Version`** +Changes impact the following rocm-smi CLIs: + - `rocm-smi` + - `rocm-smi -i` + - `rocm-smi --showhw` + - `rocm-smi --showproduct` + + `NODE` - is the KFD node, since these can both be CPU and GPU devices. This field is invariant between boots. + `GUID` - also known as GPU ID. GUID is the KFD GPU's ID. This field has a chance to be variant between boots. + `GFX Version` - this is the device's target graphics version. + +See below for a few example outputs. +```shell +$ rocm-smi --showhw + +================================= ROCm System Management Interface ================================= +====================================== Concise Hardware Info ======================================= +GPU NODE DID GUID GFX VER GFX RAS SDMA RAS UMC RAS VBIOS BUS PARTITION ID +0 4 0x74a0 3877 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 0 +1 5 0x74a0 54196 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 1 +2 6 0x74a0 36891 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 2 +3 7 0x74a0 28397 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 0 +4 8 0x74a0 45692 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 1 +5 9 0x74a0 61907 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 2 +6 10 0x74a0 52404 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 0 +7 11 0x74a0 4133 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 1 +8 12 0x74a0 21386 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 2 +9 13 0x74a0 10876 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 0 +10 14 0x74a0 63213 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 1 +11 15 0x74a0 46402 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 2 
+==================================================================================================== +======================================= End of ROCm SMI Log ======================================== +``` +```shell +$ rocm-smi -i + +============================ ROCm System Management Interface ============================ +=========================================== ID =========================================== +GPU[0] : Device Name: Aqua Vanjaram [Instinct MI300A] +GPU[0] : Device ID: 0x74a0 +GPU[0] : Device Rev: 0x00 +GPU[0] : Subsystem ID: 0x74a0 +GPU[0] : GUID: 60294 +GPU[1] : Device Name: Aqua Vanjaram [Instinct MI300A] +GPU[1] : Device ID: 0x74a0 +GPU[1] : Device Rev: 0x00 +GPU[1] : Subsystem ID: 0x74a0 +GPU[1] : GUID: 35406 +GPU[2] : Device Name: Aqua Vanjaram [Instinct MI300A] +GPU[2] : Device ID: 0x74a0 +GPU[2] : Device Rev: 0x00 +GPU[2] : Subsystem ID: 0x74a0 +GPU[2] : GUID: 10263 +GPU[3] : Device Name: Aqua Vanjaram [Instinct MI300A] +GPU[3] : Device ID: 0x74a0 +GPU[3] : Device Rev: 0x00 +GPU[3] : Subsystem ID: 0x74a0 +GPU[3] : GUID: 52959 +========================================================================================== +================================== End of ROCm SMI Log =================================== +``` +```shell +$ rocm-smi --showproduct + +============================ ROCm System Management Interface ============================ +====================================== Product Info ====================================== +GPU[0] : Card Series: Aqua Vanjaram [Instinct MI300A] +GPU[0] : Card Model: 0x74a0 +GPU[0] : Card Vendor: Advanced Micro Devices, Inc. [AMD/ATI] +GPU[0] : Card SKU: N/A +GPU[0] : Subsystem ID: 0x74a0 +GPU[0] : Device Rev: 0x00 +GPU[0] : Node ID: 4 +GPU[0] : GUID: 60294 +GPU[0] : GFX Version: gfx942 +GPU[1] : Card Series: Aqua Vanjaram [Instinct MI300A] +GPU[1] : Card Model: 0x74a0 +GPU[1] : Card Vendor: Advanced Micro Devices, Inc. 
[AMD/ATI] +GPU[1] : Card SKU: N/A +GPU[1] : Subsystem ID: 0x74a0 +GPU[1] : Device Rev: 0x00 +GPU[1] : Node ID: 5 +GPU[1] : GUID: 35406 +GPU[1] : GFX Version: gfx942 +GPU[2] : Card Series: Aqua Vanjaram [Instinct MI300A] +GPU[2] : Card Model: 0x74a0 +GPU[2] : Card Vendor: Advanced Micro Devices, Inc. [AMD/ATI] +GPU[2] : Card SKU: N/A +GPU[2] : Subsystem ID: 0x74a0 +GPU[2] : Device Rev: 0x00 +GPU[2] : Node ID: 6 +GPU[2] : GUID: 10263 +GPU[2] : GFX Version: gfx942 +GPU[3] : Card Series: Aqua Vanjaram [Instinct MI300A] +GPU[3] : Card Model: 0x74a0 +GPU[3] : Card Vendor: Advanced Micro Devices, Inc. [AMD/ATI] +GPU[3] : Card SKU: N/A +GPU[3] : Subsystem ID: 0x74a0 +GPU[3] : Device Rev: 0x00 +GPU[3] : Node ID: 7 +GPU[3] : GUID: 52959 +GPU[3] : GFX Version: gfx942 +========================================================================================== +================================== End of ROCm SMI Log =================================== +``` + +- **Documentation now includes C++ and Python: tutorials, API guides, and C++ reference pages** +See [https://rocm.docs.amd.com/](https://rocm.docs.amd.com/projects/rocm_smi_lib/en/latest/) once 6.1.1 is released. 
+ + +### Changed +- **Aligned `rocm-smi` fields display "N/A" instead of "unknown"/"unsupported": `Card ID`, `DID`, `Model`, `SKU`, and `VBIOS`** +Impacts the following commands: + - `rocm-smi` - see other examples above for 6.1.1 + - `rocm-smi --showhw` - see other examples above for 6.1.1 + - `rocm-smi --showproduct` - see other examples above for 6.1.1 + - `rocm-smi -i` - see other examples above for 6.1.1 + - `rocm-smi --showvbios` - see example below +```shell +$ rocm-smi --showvbios + +============================ ROCm System Management Interface ============================ +========================================= VBIOS ========================================== +GPU[0] : VBIOS version: N/A +GPU[1] : VBIOS version: N/A +GPU[2] : VBIOS version: N/A +GPU[3] : VBIOS version: N/A +========================================================================================== +================================== End of ROCm SMI Log =================================== +``` +- **Removed stacked id formatting in `rocm-smi`** + This is to simplify identifiers helpful to users. More identifiers can be found on: + - `rocm-smi -i` + - `rocm-smi --showhw` + - `rocm-smi --showproduct` + + See examples shown above for 6.1.1. Previous output example can be seen below. 
+ ```shell + $ rocm-smi + +========================================== ROCm System Management Interface ========================================== +==================================================== Concise Info ==================================================== +Device [Model : Revision] Temp Power Partitions SCLK MCLK Fan Perf PwrCap VRAM% GPU% + Name (20 chars) (Junction) (Socket) (Mem, Compute) +====================================================================================================================== +0 [0x74a0 : 0x00] 40.0°C 102.0W NPS1, SPX 31Mhz 1300Mhz 0% manual 550.0W 0% 0% + Aqua Vanjaram [Insti +====================================================================================================================== +================================================ End of ROCm SMI Log ================================================= + ``` + +### Optimizations +- N/A + +### Fixed +- **Fixed HIP and ROCm SMI mismatch on GPU bus assignments** +These changes prompted us to provide better visibility for our device nodes and partition IDs (see the additions above). See the examples below for an overview of the fix. +1. MI300a GPU device `Domain:Bus:Device.function` clashes with another AMD USB device +Cause(s): +a. ROCm SMI did not propagate domain consistently (for partitioned devices) +b. AMD GPU driver previously reported partition IDs within the function node, causing a clash with the PCIe ID displayed for the other AMD USB device. +2. Domain does not propagate for devices which support partitioning (MI300x/a) +Cause(s): +a. ROCm SMI did not propagate domain consistently (for partitioned devices) +3. Displayed topology will show disordered nodes when compared to HIP +Cause(s): +a. 
ROCm SMI did not propagate domain consistently (for partitioned devices) + +*Device in TPX* +```shell +$ rocm-smi --showhw + +================================= ROCm System Management Interface ================================= +====================================== Concise Hardware Info ======================================= +GPU NODE DID GUID GFX VER GFX RAS SDMA RAS UMC RAS VBIOS BUS PARTITION ID +0 4 0x74a0 3877 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 0 +1 5 0x74a0 54196 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 1 +2 6 0x74a0 36891 gfx942 ENABLED ENABLED DISABLED N/A 0000:01:00.0 2 +3 7 0x74a0 28397 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 0 +4 8 0x74a0 45692 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 1 +5 9 0x74a0 61907 gfx942 ENABLED ENABLED DISABLED N/A 0001:01:00.0 2 +6 10 0x74a0 52404 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 0 +7 11 0x74a0 4133 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 1 +8 12 0x74a0 21386 gfx942 ENABLED ENABLED DISABLED N/A 0002:01:00.0 2 +9 13 0x74a0 10876 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 0 +10 14 0x74a0 63213 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 1 +11 15 0x74a0 46402 gfx942 ENABLED ENABLED DISABLED N/A 0003:01:00.0 2 +==================================================================================================== +======================================= End of ROCm SMI Log ======================================== + +$ lspci -D|grep -i "process\|usb" +0000:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A] +0000:01:00.1 USB controller: Advanced Micro Devices, Inc. [AMD] Device 14df +0001:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A] +0002:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Aqua Vanjaram [Instinct MI300A] +0003:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. 
[AMD/ATI] Aqua Vanjaram [Instinct MI300A] +``` +```shell +$ rocm-smi --showtoponuma + +======================================= Numa Nodes ======================================= +GPU[0] : (Topology) Numa Node: 0 +GPU[0] : (Topology) Numa Affinity: 0 +GPU[1] : (Topology) Numa Node: 0 +GPU[1] : (Topology) Numa Affinity: 0 +GPU[2] : (Topology) Numa Node: 0 +GPU[2] : (Topology) Numa Affinity: 0 +GPU[3] : (Topology) Numa Node: 1 +GPU[3] : (Topology) Numa Affinity: 1 +GPU[4] : (Topology) Numa Node: 1 +GPU[4] : (Topology) Numa Affinity: 1 +GPU[5] : (Topology) Numa Node: 1 +GPU[5] : (Topology) Numa Affinity: 1 +GPU[6] : (Topology) Numa Node: 2 +GPU[6] : (Topology) Numa Affinity: 2 +GPU[7] : (Topology) Numa Node: 2 +GPU[7] : (Topology) Numa Affinity: 2 +GPU[8] : (Topology) Numa Node: 2 +GPU[8] : (Topology) Numa Affinity: 2 +GPU[9] : (Topology) Numa Node: 3 +GPU[9] : (Topology) Numa Affinity: 3 +GPU[10] : (Topology) Numa Node: 3 +GPU[10] : (Topology) Numa Affinity: 3 +GPU[11] : (Topology) Numa Node: 3 +GPU[11] : (Topology) Numa Affinity: 3 +================================== End of ROCm SMI Log =================================== +``` +- **Fixed memory leaks** +Caused by not closing directories and by creating map nodes instead of checking with `.at()`. +- **Fixed Python rocm_smi API calls** +Fixed initialization calls which reuse `rocmsmi.initializeRsmi()` bindings. + +```shell +Traceback (most recent call last): + File "/home/charpoag/rocmsmi_pythonapi.py", line 9, in <module> + rocm_smi.initializeRsmi() + File "/opt/rocm/libexec/rocm_smi/rocm_smi.py", line 3531, in initializeRsmi + ret_init = rocmsmi.rsmi_init(0) +NameError: name 'rocmsmi' is not defined +``` +- **Fixed `rsmi_dev_activity_metric_get` gfx/memory activity not updating with GPU activity** + Now checks and forces an unconditional reread of GPU metrics. 
+ +### Known Issues +- N/A + ## rocm_smi_lib for ROCm 6.1.0 ### Added @@ -63,6 +360,133 @@ Updated to use `rsmi_dev_power_get()` within CLI to provide a consistent device The `rsmi_dev_memory_partition_set` API is updated to handle the readonly SYSFS check. Corresponding tests and CLI (`rocm-smi --setmemorypartition` and `rocm-smi --resetmemorypartition`) calls were updated accordingly. - Fix `rocm-smi --showclkvolt` and `rocm-smi --showvc` displaying 0 for overdrive and voltage curve is not supported +### Known Issues +- **HIP and ROCm SMI mismatch on GPU bus assignments** +Three separate issues have been identified: +1. MI300a GPU device `Domain:Bus:Device.function` clashes with another AMD USB device +```shell +$ lspci|grep -i "process\|usb" +0000:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a0 +0000:01:00.1 USB controller: Advanced Micro Devices, Inc. [AMD] Device 14df +0001:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a0 +0002:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a0 +0003:01:00.0 Processing accelerators: Advanced Micro Devices, Inc. [AMD/ATI] Device 74a0 +``` +```shell +$ rocm-smi --showbus + +============================ ROCm System Management Interface ============================ +======================================= PCI Bus ID ======================================= +GPU[0] : PCI Bus: 0000:01:00.0 +GPU[1] : PCI Bus: 0000:01:00.1 +GPU[2] : PCI Bus: 0000:01:00.2 +GPU[3] : PCI Bus: 0000:01:00.3 +... +========================================================================================== +================================== End of ROCm SMI Log =================================== +``` +2. Domain does not propagate for devices which support partitioning (MI300x/a) +For example, a device in non-SPX (single partition) - devices will overlap in function device. 
+```shell +$ rocm-smi --showbus + ============================ ROCm System Management Interface ============================ +======================================= PCI Bus ID ======================================= +GPU[0] : PCI Bus: 0000:01:00.0 +GPU[1] : PCI Bus: 0000:01:00.1 +GPU[2] : PCI Bus: 0000:01:00.1 +GPU[3] : PCI Bus: 0000:01:00.1 +GPU[4] : PCI Bus: 0000:01:00.1 +GPU[5] : PCI Bus: 0000:01:00.2 +GPU[6] : PCI Bus: 0000:01:00.2 +GPU[7] : PCI Bus: 0000:01:00.2 +GPU[8] : PCI Bus: 0000:01:00.2 +GPU[9] : PCI Bus: 0000:01:00.3 +GPU[10] : PCI Bus: 0000:01:00.3 +GPU[11] : PCI Bus: 0000:01:00.3 +GPU[12] : PCI Bus: 0000:01:00.3 +GPU[13] : PCI Bus: 0000:01:00.4 +GPU[14] : PCI Bus: 0000:01:00.4 +GPU[15] : PCI Bus: 0000:01:00.4 +GPU[16] : PCI Bus: 0000:01:00.4 +GPU[17] : PCI Bus: 0000:01:00.5 +GPU[18] : PCI Bus: 0000:01:00.5 +GPU[19] : PCI Bus: 0000:01:00.5 +GPU[20] : PCI Bus: 0000:01:00.5 +GPU[21] : PCI Bus: 0001:01:00.0 +GPU[22] : PCI Bus: 0002:01:00.0 +GPU[23] : PCI Bus: 0003:01:00.0 +================================== End of ROCm SMI Log =================================== +``` +3. Displayed topology will show disordered nodes when compared to HIP +See rocm-smi output vs transferbench. +```shell +rocm-smi --showtopo option is not displaying the correct information when the MI300 driver is loaded in TPX mode. 
+ + +============================ ROCm System Management Interface ============================ +================================ Weight between two GPUs ================================= +get_link_weight_topology, Not supported on the given system +ERROR: GPU[1] : Cannot read Link Weight: Not supported on this machine + + GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 GPU8 GPU9 GPU10 GPU11 +GPU0 0 XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI +GPU1 XGMI 0 XXXX XXXX XXXX XGMI XGMI XGMI XGMI XGMI XGMI XGMI +GPU2 XGMI XXXX 0 XXXX XXXX XGMI XGMI XGMI XGMI XGMI XGMI XGMI +GPU3 XGMI XXXX XXXX 0 XXXX XGMI XGMI XGMI XGMI XGMI XGMI XGMI +GPU4 XGMI XXXX XXXX XXXX 0 XGMI XGMI XGMI XGMI XGMI XGMI XGMI +GPU5 XGMI XGMI XGMI XGMI XGMI 0 XXXX XXXX XXXX XGMI XGMI XGMI +GPU6 XGMI XGMI XGMI XGMI XGMI XXXX 0 XXXX XXXX XGMI XGMI XGMI +GPU7 XGMI XGMI XGMI XGMI XGMI XXXX XXXX 0 XXXX XGMI XGMI XGMI +GPU8 XGMI XGMI XGMI XGMI XGMI XXXX XXXX XXXX 0 XGMI XGMI XGMI +GPU9 XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI 0 XGMI XGMI +GPU10 XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI 0 XGMI +GPU11 XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI XGMI 0 + +======================================= Numa Nodes ======================================= +GPU[0] : (Topology) Numa Node: 0 +GPU[0] : (Topology) Numa Affinity: 0 +GPU[1] : (Topology) Numa Node: 0 +GPU[1] : (Topology) Numa Affinity: 0 +GPU[2] : (Topology) Numa Node: 0 +GPU[2] : (Topology) Numa Affinity: 1 +GPU[3] : (Topology) Numa Node: 0 +GPU[3] : (Topology) Numa Affinity: 2 +GPU[4] : (Topology) Numa Node: 0 +GPU[4] : (Topology) Numa Affinity: 3 +GPU[5] : (Topology) Numa Node: 0 +GPU[5] : (Topology) Numa Affinity: 0 +GPU[6] : (Topology) Numa Node: 0 +GPU[6] : (Topology) Numa Affinity: 1 +GPU[7] : (Topology) Numa Node: 0 +GPU[7] : (Topology) Numa Affinity: 2 +GPU[8] : (Topology) Numa Node: 0 +GPU[8] : (Topology) Numa Affinity: 3 +GPU[9] : (Topology) Numa Node: 1 +GPU[9] : (Topology) Numa Affinity: 1 +GPU[10] : (Topology) Numa Node: 2 
+GPU[10] : (Topology) Numa Affinity: 2 +GPU[11] : (Topology) Numa Node: 3 +GPU[11] : (Topology) Numa Affinity: 3 +================================== End of ROCm SMI Log =================================== +``` + +```shell +./Transferbench +... + | GPU 00 | GPU 01 | GPU 02 | GPU 03 | GPU 04 | GPU 05 | GPU 06 | GPU 07 | PCIe Bus ID | #CUs | Closest NUMA | DMA engines +--------+--------+--------+--------+--------+--------+--------+--------+--------+--------------+------+-------------+------------ + GPU 00 | - | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | 0000:0c:00.0 | 304 | 0 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + GPU 01 | XGMI-1 | - | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | 0000:22:00.0 | 304 | 0 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + GPU 02 | XGMI-1 | XGMI-1 | - | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | 0000:38:00.0 | 304 | 0 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + GPU 03 | XGMI-1 | XGMI-1 | XGMI-1 | - | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | 0000:5c:00.0 | 304 | 0 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + GPU 04 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | - | XGMI-1 | XGMI-1 | XGMI-1 | 0000:9f:00.0 | 304 | 1 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + GPU 05 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | - | XGMI-1 | XGMI-1 | 0000:af:00.0 | 304 | 1 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + GPU 06 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | - | XGMI-1 | 0000:bf:00.0 | 304 | 1 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 + GPU 07 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | XGMI-1 | - | 0000:df:00.0 | 304 | 1 |0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 +... 
+``` + + ## rocm_smi_lib for ROCm 6.0.0 ### Added diff --git a/projects/rocm-smi-lib/README.md b/projects/rocm-smi-lib/README.md index 124ec3322b..2429bca2dc 100755 --- a/projects/rocm-smi-lib/README.md +++ b/projects/rocm-smi-lib/README.md @@ -10,6 +10,11 @@ The information contained herein is for informational purposes only, and is subj © 2022-2024 Advanced Micro Devices, Inc. All Rights Reserved. +## Planned Deprecation Notice +ROCm System Management Interface (ROCm SMI) Library is planned to be ***deprecated***. Release date to be announced soon. Please start migrating to AMD SMI. + - Documentation: [https://rocm.docs.amd.com](https://rocm.docs.amd.com/projects/amdsmi/en/latest/) + - GitHub: [https://github.com/ROCm/amdsmi](https://github.com/ROCm/amdsmi) + ## Installation ### Install amdgpu using ROCm @@ -21,7 +26,7 @@ wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6. sudo apt install ./amdgpu-install_6.0.60002-1_all.deb sudo amdgpu-install --usecase=rocm ``` -* rocm-smi --help +* `rocm-smi --help` ## Building ROCm SMI diff --git a/projects/rocm-smi-lib/python_smi_tools/rocm_smi.py b/projects/rocm-smi-lib/python_smi_tools/rocm_smi.py index c13d4c2b72..ae6435debe 100755 --- a/projects/rocm-smi-lib/python_smi_tools/rocm_smi.py +++ b/projects/rocm-smi-lib/python_smi_tools/rocm_smi.py @@ -2941,7 +2941,7 @@ def showVbiosVersion(deviceList): """ printLogSpacer(' VBIOS ') for device in deviceList: - printLog(device, 'VBIOS version', getVbiosVersion(device)) + printLog(device, 'VBIOS version', getVbiosVersion(device, silent=True)) printLogSpacer()