b7221c64b0
https://github.com/ROCm/rocm_smi_lib/pull/197 https://advanced-micro-devices-demo--197.com.readthedocs.build/projects/rocm_smi_lib/en/197/ Change-Id: I64386a4f364e40ce61ad9963376d2686db2aa36d
520 righe
31 KiB
ReStructuredText
520 righe
31 KiB
ReStructuredText
.. meta::
|
|
:description: Learn about using the ROCm SMI library with Python.
|
|
:keywords: install, SMI, library, api, python, system management interface
|
|
|
|
**************************
|
|
Using ROCm SMI with Python
|
|
**************************
|
|
|
|
This tool acts as a command line interface for manipulating and monitoring the amdgpu kernel, and is intended to replace and deprecate the existing ``rocm_smi.py`` CLI tool.
|
|
It uses Ctypes to call the ``rocm_smi_lib`` API.
|
|
|
|
To use ROCm SMI, you must have ROCm SMI library (``librocm_smi64``) installed and should have at least one AMD GPU with the ROCm driver installed.
|
|
|
|
Installation
|
|
============
|
|
|
|
Follow the installation procedure for ``rocm_smi_lib``. Refer to :doc:`../install/install`.
|
|
|
|
.. note::
|
|
|
|
``hipcc`` and other compilers will not automatically link in the ``librocm_smi64`` dynamic library. To ensure the
|
|
``librocm_smi64.so`` can be located, you must either set the ``LD_LIBRARY_PATH`` environment variable to the
|
|
directory containing ``librocm_smi64.so`` (usually ``/opt/rocm/lib`` or pass the ``-lrocm_smi64`` flag to the compiler.
|
|
|
|
Version
|
|
=======
|
|
|
|
The SMI will report two "versions": the ``ROCM-SMI`` version and the ``ROCM-SMI-LIB`` version.
|
|
|
|
- ``ROCM-SMI`` version is the CLI/tool version number with commit ID appended after + sign.
|
|
|
|
- ``ROCM-SMI-LIB`` version is the library package version number.
|
|
|
|
.. code-block:: shell-session
|
|
|
|
ROCM-SMI version: 2.0.0+8e78352
|
|
ROCM-SMI-LIB version: 6.1.0
|
|
|
|
Usage
|
|
=====
|
|
|
|
For detailed and up to date usage information, consult the help.
|
|
|
|
.. code-block:: shell-session
|
|
|
|
/opt/rocm/bin/rocm-smi -h
|
|
|
|
The following is the output from the ``-h`` flag:
|
|
|
|
.. code-block:: shell-session
|
|
|
|
$ /opt/rocm/bin/rocm-smi -h
|
|
usage: rocm-smi [-h] [-V] [-d DEVICE [DEVICE ...]] [--alldevices] [--showhw] [-a] [-i] [-v] [-e [EVENT [EVENT ...]]]
|
|
[--showdriverversion] [--showtempgraph] [--showfwinfo [BLOCK [BLOCK ...]]] [--showmclkrange]
|
|
[--showmemvendor] [--showsclkrange] [--showproductname] [--showserial] [--showuniqueid]
|
|
[--showvoltagerange] [--showbus] [--showpagesinfo] [--showpendingpages] [--showretiredpages]
|
|
[--showunreservablepages] [-f] [-P] [-t] [-u] [--showmemuse] [--showvoltage] [-b] [-c] [-g] [-l] [-M]
|
|
[-m] [-o] [-p] [-S] [-s] [--showmeminfo TYPE [TYPE ...]] [--showpids [VERBOSE]]
|
|
[--showpidgpus [SHOWPIDGPUS [SHOWPIDGPUS ...]]] [--showreplaycount]
|
|
[--showrasinfo [SHOWRASINFO [SHOWRASINFO ...]]] [--showvc] [--showxgmierr] [--showtopo]
|
|
[--showtopoaccess] [--showtopoweight] [--showtopohops] [--showtopotype] [--showtoponuma]
|
|
[--showenergycounter] [--shownodesbw] [--showcomputepartition] [--showmemorypartition] [-r]
|
|
[--resetfans] [--resetprofile] [--resetpoweroverdrive] [--resetxgmierr] [--resetperfdeterminism]
|
|
[--resetcomputepartition] [--resetmemorypartition] [--setclock TYPE LEVEL] [--setsclk LEVEL [LEVEL ...]]
|
|
[--setmclk LEVEL [LEVEL ...]] [--setpcie LEVEL [LEVEL ...]] [--setslevel SCLKLEVEL SCLK SVOLT]
|
|
[--setmlevel MCLKLEVEL MCLK MVOLT] [--setvc POINT SCLK SVOLT] [--setsrange SCLKMIN SCLKMAX]
|
|
[--setextremum min|max sclk|mclk CLK] [--setmrange MCLKMIN MCLKMAX] [--setfan LEVEL]
|
|
[--setperflevel LEVEL] [--setoverdrive %] [--setmemoverdrive %] [--setpoweroverdrive WATTS]
|
|
[--setprofile SETPROFILE] [--setperfdeterminism SCLK]
|
|
[--setcomputepartition {CPX,SPX,DPX,TPX,QPX,cpx,spx,dpx,tpx,qpx}]
|
|
[--setmemorypartition {NPS1,NPS2,NPS4,NPS8,nps1,nps2,nps4,nps8}] [--rasenable BLOCK ERRTYPE]
|
|
[--rasdisable BLOCK ERRTYPE] [--rasinject BLOCK] [--gpureset] [--load FILE | --save FILE]
|
|
[--autorespond RESPONSE] [--loglevel LEVEL] [--json] [--csv]
|
|
|
|
AMD ROCm System Management Interface | ROCM-SMI version: 2.0.0+8e78352
|
|
|
|
optional arguments:
|
|
-h, --help show this help message and exit
|
|
--gpureset Reset specified GPU (One GPU must be specified)
|
|
--load FILE Load Clock, Fan, Performance and Profile settings
|
|
from FILE
|
|
--save FILE Save Clock, Fan, Performance and Profile settings to
|
|
FILE
|
|
|
|
-V, --version Show version information
|
|
|
|
-d DEVICE [DEVICE ...], --device DEVICE [DEVICE ...] Execute command on specified device
|
|
|
|
Display Options:
|
|
--alldevices
|
|
--showhw Show Hardware details
|
|
-a, --showallinfo Show Temperature, Fan and Clock values
|
|
|
|
Topology:
|
|
-i, --showid Show DEVICE ID
|
|
-v, --showvbios Show VBIOS version
|
|
-e [EVENT [EVENT ...]], --showevents [EVENT [EVENT ...]] Show event list
|
|
--showdriverversion Show kernel driver version
|
|
--showtempgraph Show Temperature Graph
|
|
--showfwinfo [BLOCK [BLOCK ...]] Show FW information
|
|
--showmclkrange Show mclk range
|
|
--showmemvendor Show GPU memory vendor
|
|
--showsclkrange Show sclk range
|
|
--showproductname Show SKU/Vendor name
|
|
--showserial Show GPU's Serial Number
|
|
--showuniqueid Show GPU's Unique ID
|
|
--showvoltagerange Show voltage range
|
|
--showbus Show PCI bus number
|
|
|
|
Pages information:
|
|
--showpagesinfo Show retired, pending and unreservable pages
|
|
--showpendingpages Show pending retired pages
|
|
--showretiredpages Show retired pages
|
|
--showunreservablepages Show unreservable pages
|
|
|
|
Hardware-related information:
|
|
-f, --showfan Show current fan speed
|
|
-P, --showpower Show current Average or Socket Graphics Package Power
|
|
Consumption
|
|
-t, --showtemp Show current temperature
|
|
-u, --showuse Show current GPU use
|
|
--showmemuse Show current GPU memory used
|
|
--showvoltage Show current GPU voltage
|
|
|
|
Software-related/controlled information:
|
|
-b, --showbw Show estimated PCIe use
|
|
-c, --showclocks Show current clock frequencies
|
|
-g, --showgpuclocks Show current GPU clock frequencies
|
|
-l, --showprofile Show Compute Profile attributes
|
|
-M, --showmaxpower Show maximum graphics package power this GPU will
|
|
consume
|
|
-m, --showmemoverdrive Show current GPU Memory Clock OverDrive level
|
|
-o, --showoverdrive Show current GPU Clock OverDrive level
|
|
-p, --showperflevel Show current DPM Performance Level
|
|
-S, --showclkvolt Show supported GPU and Memory Clocks and Voltages
|
|
-s, --showclkfrq Show supported GPU and Memory Clock
|
|
--showmeminfo TYPE [TYPE ...] Show Memory usage information for given block(s) TYPE
|
|
--showpids [VERBOSE] Show current running KFD PIDs (pass details to
|
|
VERBOSE for detailed information)
|
|
--showpidgpus [SHOWPIDGPUS [SHOWPIDGPUS ...]] Show GPUs used by specified KFD PIDs (all if no arg
|
|
given)
|
|
--showreplaycount Show PCIe Replay Count
|
|
--showrasinfo [SHOWRASINFO [SHOWRASINFO ...]] Show RAS enablement information and error counts for
|
|
the specified block(s) (all if no arg given)
|
|
--showvc Show voltage curve
|
|
--showxgmierr Show XGMI error information since last read
|
|
--showtopo Show hardware topology information
|
|
--showtopoaccess Shows the link accessibility between GPUs
|
|
--showtopoweight Shows the relative weight between GPUs
|
|
--showtopohops Shows the number of hops between GPUs
|
|
--showtopotype Shows the link type between GPUs
|
|
--showtoponuma Shows the numa nodes
|
|
--showenergycounter Energy accumulator that stores amount of energy
|
|
consumed
|
|
--shownodesbw Shows the numa nodes
|
|
--showcomputepartition Shows current compute partitioning
|
|
--showmemorypartition Shows current memory partition
|
|
|
|
Set options:
|
|
--setclock TYPE LEVEL Set Clock Frequency Level(s) for specified clock
|
|
(requires manual Perf level)
|
|
--setsclk LEVEL [LEVEL ...] Set GPU Clock Frequency Level(s) (requires manual
|
|
Perf level)
|
|
--setmclk LEVEL [LEVEL ...] Set GPU Memory Clock Frequency Level(s) (requires
|
|
manual Perf level)
|
|
--setpcie LEVEL [LEVEL ...] Set PCIE Clock Frequency Level(s) (requires manual
|
|
Perf level)
|
|
--setslevel SCLKLEVEL SCLK SVOLT Change GPU Clock frequency (MHz) and Voltage (mV) for
|
|
a specific Level
|
|
--setmlevel MCLKLEVEL MCLK MVOLT Change GPU Memory clock frequency (MHz) and Voltage
|
|
for (mV) a specific Level
|
|
--setvc POINT SCLK SVOLT Change SCLK Voltage Curve (MHz mV) for a specific
|
|
point
|
|
--setsrange SCLKMIN SCLKMAX Set min and max SCLK speed
|
|
--setextremum min|max sclk|mclk CLK Set min/max of SCLK/MCLK speed
|
|
--setmrange MCLKMIN MCLKMAX Set min and max MCLK speed
|
|
--setfan LEVEL Set GPU Fan Speed (Level or %)
|
|
--setperflevel LEVEL Set Performance Level
|
|
--setoverdrive % Set GPU OverDrive level (requires manual|high Perf
|
|
level)
|
|
--setmemoverdrive % Set GPU Memory Overclock OverDrive level (requires
|
|
manual|high Perf level)
|
|
--setpoweroverdrive WATTS Set the maximum GPU power using Power OverDrive in
|
|
Watts
|
|
--setprofile SETPROFILE Specify Power Profile level (#) or a quoted string of
|
|
CUSTOM Profile attributes "# # # #..." (requires
|
|
manual Perf level)
|
|
--setperfdeterminism SCLK Set clock frequency limit to get minimal performance
|
|
variation
|
|
--setcomputepartition {CPX,SPX,DPX,TPX,QPX,cpx,spx,dpx,tpx,qpx} Set compute partition
|
|
--setmemorypartition {NPS1,NPS2,NPS4,NPS8,nps1,nps2,nps4,nps8} Set memory partition
|
|
--rasenable BLOCK ERRTYPE Enable RAS for specified block and error type
|
|
--rasdisable BLOCK ERRTYPE Disable RAS for specified block and error type
|
|
--rasinject BLOCK Inject RAS poison for specified block (ONLY WORKS ON
|
|
UNSECURE BOARDS)
|
|
|
|
Reset options:
|
|
-r, --resetclocks Reset clocks and OverDrive to default
|
|
--resetfans Reset fans to automatic (driver) control
|
|
--resetprofile Reset Power Profile back to default
|
|
--resetpoweroverdrive Set the maximum GPU power back to the device deafult
|
|
state
|
|
--resetxgmierr Reset XGMI error count
|
|
--resetperfdeterminism Disable performance determinism
|
|
--resetcomputepartition Resets to boot compute partition state
|
|
--resetmemorypartition Resets to boot memory partition state
|
|
|
|
Auto-response options:
|
|
--autorespond RESPONSE Response to automatically provide for all prompts
|
|
(NOT RECOMMENDED)
|
|
|
|
Output options:
|
|
--loglevel LEVEL How much output will be printed for what program is
|
|
doing, one of debug/info/warning/error/critical
|
|
--json Print output in JSON format
|
|
--csv Print output in CSV format
|
|
|
|
Detailed option descriptions
|
|
============================
|
|
|
|
--setextremum <[min or max] [sclk or mclk] [value in MHz to set to]>
|
|
Provided ASIC support, users can now set a maximum or minimum sclk or mclk value through our Python CLI tool (`rocm-smi --setextremum max sclk 1500`). See example below.
|
|
|
|
.. code-block:: shell-session
|
|
|
|
$ sudo /opt/rocm/bin/rocm-smi --setextremum max sclk 2100
|
|
|
|
============================ ROCm System Management Interface ============================
|
|
|
|
******WARNING******
|
|
|
|
Operating your AMD GPU outside of official AMD specifications or outside of
|
|
factory settings, including but not limited to the conducting of overclocking,
|
|
over-volting or under-volting (including use of this interface software,
|
|
even if such software has been directly or indirectly provided by AMD or otherwise
|
|
affiliated in any way with AMD), may cause damage to your AMD GPU, system components
|
|
and/or result in system failure, as well as cause other problems.
|
|
DAMAGES CAUSED BY USE OF YOUR AMD GPU OUTSIDE OF OFFICIAL AMD SPECIFICATIONS OR
|
|
OUTSIDE OF FACTORY SETTINGS ARE NOT COVERED UNDER ANY AMD PRODUCT WARRANTY AND
|
|
MAY NOT BE COVERED BY YOUR BOARD OR SYSTEM MANUFACTURER'S WARRANTY.
|
|
Use this utility with caution.
|
|
|
|
Do you accept these terms? [y/N] y
|
|
================================ Set Valid sclk Extremum =================================
|
|
GPU[0] : Successfully set max sclk to 2100(MHz)
|
|
GPU[1] : Successfully set max sclk to 2100(MHz)
|
|
GPU[2] : Successfully set max sclk to 2100(MHz)
|
|
GPU[3] : Successfully set max sclk to 2100(MHz)
|
|
================================== End of ROCm SMI Log ===================================
|
|
|
|
--setsclk, --setmclk <# [# # ...]>
|
|
This allows you to set a mask for the levels. For example, if a GPU has 8 clock levels,
|
|
you can set a mask to use levels 0, 5, 6 and 7 with --setsclk 0 5 6 7 . This will only
|
|
use the base level, and the top 3 clock levels. This will allow you to keep the GPU at
|
|
base level when there is no GPU load, and the top 3 levels when the GPU load increases.
|
|
|
|
.. note::
|
|
|
|
The clock levels will change dynamically based on GPU load based on the default
|
|
Compute and Graphics profiles. The thresholds and delays for a custom mask cannot
|
|
be controlled through the SMI tool.
|
|
|
|
This flag automatically sets the Performance Level to ``manual`` as the mask is not
|
|
applied when the Performance level is set to ``auto``.
|
|
|
|
--setfan <LEVEL>
|
|
This sets the fan speed to a value ranging from 0 to maxlevel, or from 0%-100%.
|
|
|
|
If the level ends with a %, the fan speed is calculated as pct*maxlevel/100 (maxlevel is usually 255, but is determined by the ASIC).
|
|
|
|
.. note::
|
|
|
|
While the hardware is usually capable of overriding this value when required, it is
|
|
recommended to not set the fan level lower than the default value for extended periods
|
|
of time.
|
|
|
|
--setperflevel <LEVEL>
|
|
This lets you use the pre-defined Performance Level values for clocks and power profile, which can include:
|
|
|
|
* ``auto`` (Automatically change values based on GPU workload)
|
|
|
|
* ``low`` (Keep values low, regardless of workload)
|
|
|
|
* ``high`` (Keep values high, regardless of workload)
|
|
|
|
* ``manual`` (Only use values defined by ``--setsclk`` and ``--setmclk``)
|
|
|
|
--setoverdrive, --setmemoverdrive <#>
|
|
.. warning::
|
|
|
|
DEPRECATED IN NEWER KERNEL VERSIONS. Use ``--setslevel`` or ``--setmlevel`` instead.
|
|
|
|
This sets the percentage above maximum for the max Performance Level.
|
|
For example, ``--setoverdrive`` 20 will increase the top sclk level by 20%, similarly
|
|
``--setmemoverdrive`` 20 will increase the top mclk level by 20%. Thus if the maximum
|
|
clock level is 1000MHz, then ``--setoverdrive`` 20 will increase the maximum clock to 1200MHz.
|
|
|
|
.. note::
|
|
|
|
This option can be used in conjunction with the ``--setsclk``/``--setmclk`` mask.
|
|
|
|
Operating the GPU outside of specifications can cause irreparable damage to your hardware
|
|
Observe the warning displayed when using this option.
|
|
|
|
This flag automatically sets the clock to the highest level, as only the highest level is
|
|
increased by the :ref:`OverDrive <overdrive-settings>` value.
|
|
|
|
--setpoweroverdrive, --resetpoweroverdrive <#>
|
|
This allows users to change the maximum power available to a GPU package.
|
|
The input value is in Watts. This limit is enforced by the hardware, and
|
|
some cards allow users to set it to a higher value than the default that
|
|
ships with the GPU. This Power OverDrive mode allows the GPU to run at
|
|
higher frequencies for longer periods of time, though this may mean the
|
|
GPU uses more power than it is allowed to use per power supply
|
|
specifications. Each GPU has a model-specific maximum Power OverDrive that
|
|
is will take; attempting to set a higher limit than that will cause this
|
|
command to fail.
|
|
|
|
.. note::
|
|
|
|
Operating the GPU outside of specifications can cause irreparable damage to your hardware.
|
|
Observe the warning displayed when using this option.
|
|
|
|
--setprofile <SETPROFILE>
|
|
The Compute Profile accepts 1 or n parameters, either the Profile to select (see ``--showprofile`` for a list
|
|
of preset Power Profiles) or a quoted string of values for the CUSTOM profile.
|
|
These values can vary based on the ASIC, and may include:
|
|
|
|
.. code-block:: shell-session
|
|
|
|
| Setting | Description |
|
|
|---------------------|--------------------------------------------------------------------------------------------|
|
|
| SCLK_PROFILE_ENABLE | Whether or not to apply the 3 following SCLK settings (0=disable,1=enable) |
|
|
| | **NOTE: This is a hidden field. If set to 0, the following 3 values are displayed as '-’** |
|
|
| SCLK_UP_HYST | Delay before sclk is increased (in milliseconds) |
|
|
| SCLK_DOWN_HYST | Delay before sclk is decresed (in milliseconds) |
|
|
| SCLK_ACTIVE_LEVEL | Workload required before sclk levels change (in %) |
|
|
| MCLK_PROFILE_ENABLE | Whether or not to apply the 3 following MCLK settings (0=disable,1=enable) |
|
|
| | **NOTE: This is a hidden field. If set to 0, the following 3 values are displayed as '-'** |
|
|
| MCLK_UP_HYST | Delay before mclk is increased (in milliseconds) |
|
|
| MCLK_DOWN_HYST | Delay before mclk is decresed (in milliseconds) |
|
|
| MCLK_ACTIVE_LEVEL | Workload required before mclk levels change (in %) |
|
|
|
|
Other settings
|
|
--------------
|
|
|
|
.. code-block:: shell-session
|
|
|
|
| Setting | Description |
|
|
|------------------|---------------------------------------------------------------------------|
|
|
| BUSY_SET_POINT | Threshold for raw activity level before levels change |
|
|
| FPS | Frames Per Second |
|
|
| USE_RLC_BUSY | When set to 1, DPM is switched up as long as RLC busy message is received |
|
|
| MIN_ACTIVE_LEVEL | Workload required before levels change (in %) |
|
|
|
|
.. note::
|
|
|
|
When a compute queue is detected, the COMPUTE Power Profile values will be automatically
|
|
applied to the system, provided that the Perf Level is set to ``auto``.
|
|
|
|
The CUSTOM Power Profile is only applied when the Performance Level is set to ``manual``
|
|
so using this flag will automatically set the performance level to ``manual``.
|
|
|
|
It is not possible to modify the non-CUSTOM Profiles. These are hard-coded by the kernel.
|
|
|
|
-P, --showpower
|
|
Show average or instantaneous socket graphics package power consumption.
|
|
|
|
"Graphics Package" refers to the GPU plus any HBM (High-Bandwidth memory) modules, if present.
|
|
|
|
-M, --showmaxpower
|
|
Show the maximum Graphics Package power that the GPU will attempt to consume.
|
|
|
|
This limit is enforced by the hardware.
|
|
|
|
--loglevel
|
|
This will allow the user to set a logging level for the SMI's actions. Currently this is
|
|
only implemented for ``sysfs`` writes, but can easily be expanded upon in the future to log
|
|
other things from the SMI.
|
|
|
|
--showmeminfo
|
|
This allows the user to see the amount of used and total memory for a given block (``vram``,
|
|
``vis_vram``, ``gtt``). It returns the number of bytes used and total number of bytes for each block
|
|
``all`` can be passed as a field to return all blocks, otherwise a quoted-string is used for
|
|
multiple values (such as ``vram vis_vram``).
|
|
|
|
``vram`` refers to the Video RAM, or graphics memory, on the specified device
|
|
``vis_vram`` refers to Visible VRAM, which is the CPU-accessible video memory on the device
|
|
``gtt`` refers to the Graphics Translation Table
|
|
|
|
-b, --showbw
|
|
This shows an approximation of the number of bytes received and sent by the GPU over
|
|
the last second through the PCIe bus. Note that this will not work for APUs since data for
|
|
the GPU portion of the APU goes through the memory fabric and does not 'enter/exit'
|
|
the chip via the PCIe interface, thus no accesses are generated, and the performance
|
|
counters can't count accesses that are not generated.
|
|
|
|
.. note::
|
|
|
|
It is not possible to easily grab the size of every packet that is transmitted
|
|
in real time, so the kernel estimates the bandwidth by taking the maximum payload size (mps),
|
|
which is the max size that a PCIe packet can be. and multiplies it by the number of packets
|
|
received and sent. This means that the SMI will report the maximum estimated bandwidth,
|
|
the actual usage could (and likely will be) less.
|
|
|
|
--showrasinfo
|
|
This shows the RAS information for a given block. This includes enablement of the block
|
|
(currently GFX, SDMA and UMC are the only supported blocks) and the number of errors.
|
|
|
|
* ``ue`` - Uncorrectable errors
|
|
|
|
* ``ce`` - Correctable errors
|
|
|
|
Clock type descriptions
|
|
=======================
|
|
|
|
.. code-block:: shell-session
|
|
|
|
| Clock type | Description |
|
|
| ---------- | -------------------------------------------------------------------------------------------- |
|
|
| DCEFCLK | DCE (Display) |
|
|
| FCLK | Data fabric (VG20 and later) - Data flow from XGMI, Memory, PCIe |
|
|
| SCLK | GFXCLK (Graphics core) |
|
|
| | **Note - SOCCLK split from SCLK as of Vega10. Pre-Vega10 they were both controlled by SCLK** |
|
|
| MCLK | GPU Memory (VRAM) |
|
|
| PCLK | PCIe bus |
|
|
| | **Note - This gives 2 speeds, PCIe Gen1 x1 and the highest available based on the hardware** |
|
|
| SOCCLK | System clock (VG10 and later) - Data Fabric (DF), MM HUB, AT HUB, SYSTEM HUB, OSS, DFD |
|
|
| | **Note - DF split from SOCCLK as of Vega20. Pre-Vega20 they were both controlled by SOCCLK** |
|
|
|
|
--gpureset
|
|
This flag will attempt to reset the GPU for a specified device. This will invoke the GPU reset through
|
|
the kernel debugfs file ``amdgpu_gpu_recover``. Note that GPU reset will not always work, depending on the
|
|
manner in which the GPU is hung.
|
|
|
|
--showdriverversion
|
|
This flag will print out the AMDGPU module version for amdgpu-pro or ROCm kernels. For other kernels,
|
|
it will simply print out the name of the kernel (``uname -r``).
|
|
|
|
--showserial
|
|
This flag will print out the serial number for the graphics card.
|
|
|
|
.. note::
|
|
|
|
This is currently only supported on Vega20 server cards that support it. Consumer cards and
|
|
cards older than Vega20 will not support this feature.
|
|
|
|
--showproductname
|
|
This uses the ``pci.ids`` file to print out more information regarding the GPUs on the system.
|
|
``update-pciids`` may need to be executed on the machine to get the latest PCI ID snapshot,
|
|
as certain newer GPUs will not be present in the stock ``pci.ids`` file, and the file may even
|
|
be absent on certain OS installation types.
|
|
|
|
--showpagesinfo, --showretiredpages, --showpendingpages, --showunreservablepages
|
|
These flags display the different "bad pages" as reported by the kernel. The three
|
|
types of pages are:
|
|
|
|
* Retired pages (reserved pages) - These pages are reserved and are unable to be used.
|
|
|
|
* Pending pages - These pages are pending for reservation, and will be reserved/retired.
|
|
|
|
* Unreservable pages - These pages are not reservable for some reason.
|
|
|
|
--showmemuse, --showuse, --showmeminfo
|
|
``--showuse`` and ``--showmemuse`` are used to indicate how busy the respective blocks are. For
|
|
example, for ``--showuse (gpu_busy_percent sysfs file)``, the SMU samples every ms or so to see
|
|
if any GPU block (RLC, MEC, PFP, CP) is busy. If so, that's 1 (or high). If not, that's 0 (low).
|
|
If we have 5 high and 5 low samples, that means 50% utilization (50% GPU busy, or 50% GPU use).
|
|
The windows and sampling vary from generation to generation, but that is how GPU and VRAM use
|
|
is calculated in a generic sense.
|
|
``--showmeminfo`` (and VRAM% in concise output) will show the amount of VRAM used (visible, total, GTT),
|
|
as well as the total available for those partitions. The percentage shown there indicates the
|
|
amount of used memory in terms of current allocations.
|
|
|
|
.. _overdrive-settings:
|
|
|
|
OverDrive settings
|
|
==================
|
|
|
|
Enabling OverDrive requires both a card that support OverDrive and a driver parameter that enables its use.
|
|
Because OverDrive features can damage your card, most workstation and server GPUs cannot use OverDrive.
|
|
Consumer GPUs that can use OverDrive must enable this feature by setting bit 14 in the amdgpu driver's
|
|
ppfeaturemask module parameter
|
|
|
|
For OverDrive functionality, the OverDrive bit (bit 14) must be enabled (by default, the
|
|
OverDrive bit is disabled on the ROCK and upstream kernels). This can be done by setting
|
|
amdgpu.ppfeaturemask accordingly in the kernel parameters, or by changing the default value
|
|
inside amdgpu_drv.c (if building your own kernel).
|
|
|
|
As an example, if the ``ppfeaturemask`` is set to ``0xffffbfff`` (``11111111111111111011111111111111``),
|
|
then enabling the OverDrive bit would make it ``0xffffffff`` (``11111111111111111111111111111111``).
|
|
|
|
These are the flags that require OverDrive functionality to be enabled for the flag to work:
|
|
|
|
* ``--showclkvolt``
|
|
|
|
* ``--showvoltagerange``
|
|
|
|
* ``--showvc``
|
|
|
|
* ``--showsclkrange``
|
|
|
|
* ``--showmclkrange``
|
|
|
|
* ``--setslevel``
|
|
|
|
* ``--setmlevel``
|
|
|
|
* ``--setoverdrive``
|
|
|
|
* ``--setpoweroverdrive``
|
|
|
|
* ``--resetpoweroverdrive``
|
|
|
|
* ``--setvc``
|
|
|
|
* ``--setsrange``
|
|
|
|
* ``--setmrange``
|