diff --git a/docs/conceptual/test.md b/docs/conceptual/test.md new file mode 100644 index 0000000000..9daeafb986 --- /dev/null +++ b/docs/conceptual/test.md @@ -0,0 +1 @@ +test diff --git a/docs/doxygen/Doxyfile b/docs/doxygen/Doxyfile index 144fc42d49..4e09b4615a 100644 --- a/docs/doxygen/Doxyfile +++ b/docs/doxygen/Doxyfile @@ -907,8 +907,11 @@ WARN_LOGFILE = # spaces. See also FILE_PATTERNS and EXTENSION_MAPPING # Note: If this tag is empty the current directory is searched. -INPUT = ../../README.md \ +INPUT = ../reference/index.rst \ ../../include/amd_smi/amdsmi.h + + + # This tag can be used to specify the character encoding of the source files # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses diff --git a/docs/how-to/using-AMD-SMI-CLI-tool.md b/docs/how-to/using-AMD-SMI-CLI-tool.md new file mode 100644 index 0000000000..1455ec380d --- /dev/null +++ b/docs/how-to/using-AMD-SMI-CLI-tool.md @@ -0,0 +1,1020 @@ +# Using AMD SMI Command Line Interface tool + +AMD-SMI reports the version and current platform detected when running the command line interface (CLI) without arguments: + +``` bash +~$ amd-smi +usage: amd-smi [-h] ... 
+ +AMD System Management Interface | Version: 24.5.2.0 | ROCm version: 6.1.2 | Platform: Linux Baremetal + +options: + -h, --help show this help message and exit + +AMD-SMI Commands: + Descriptions: + version Display version information + list List GPU information + static Gets static information about the specified GPU + firmware (ucode) Gets firmware information about the specified GPU + bad-pages Gets bad page information about the specified GPU + metric Gets metric/performance information about the specified GPU + process Lists general process information running on the specified GPU + event Displays event information for the given GPU + topology Displays topology information of the devices + set Set options for devices + reset Reset options for devices + monitor Monitor metrics for target devices + xgmi Displays xgmi information of the devices +``` + +More detailed version information is available from `amd-smi version`. + +Each command provides detailed help via `amd-smi [command] --help`. + +## Commands + +For convenience, here is the help output for each command: + +``` bash +~$ amd-smi list --help +usage: amd-smi list [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL] + [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]] + +- Lists all the devices on the system and the links between devices. +- Lists all the sockets and for each socket, GPUs and/or CPUs associated to that socket alongside some basic information for each device. + +.. NOTE:: + +In virtualization environments, it can also list VFs associated to each GPU with some basic information for each VF. + +options: + -h, --help show this help message and exit + -g, --gpu GPU [GPU ...] 
Select a GPU ID, BDF, or UUID from the possible choices: + ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817 + ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a + ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6 + ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499 + all | Selects all devices + -U, --cpu CPU [CPU ...] Select a CPU ID from the possible choices: + ID: 0 + ID: 1 + ID: 2 + ID: 3 + all | Selects all devices + -O, --core CORE [CORE ...] Select a Core ID from the possible choices: + ID: 0 - 95 + all | Selects all devices + +Command Modifiers: + --json Displays output in JSON format (human readable by default). + --csv Displays output in CSV format (human readable by default). + --file FILE Saves output into a file on the provided path (stdout by default). + --loglevel LEVEL Set the logging level from the possible choices: + DEBUG, INFO, WARNING, ERROR, CRITICAL +``` + +```bash +~$ amd-smi static --help +usage: amd-smi static [-h] [-g GPU [GPU ...]] [-a] [-b] [-V] [-d] [-v] [-c] [-B] [-r] [-p] + [-l] [-P] [-x] [-s] [-u] [--json | --csv] [--file FILE] + [--loglevel LEVEL] + +- If no GPU is specified, returns static information for all GPUs on the system. +- If no static argument is provided, all static information will be displayed. + +Static Arguments: + -h, --help show this help message and exit + -g, --gpu GPU [GPU ...] Select a GPU ID, BDF, or UUID from the possible choices: + ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817 + ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a + ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6 + ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499 + all | Selects all devices + -U, --cpu CPU [CPU ...] 
Select a CPU ID from the possible choices: + ID: 0 + ID: 1 + ID: 2 + ID: 3 + all | Selects all devices + -a, --asic All asic information + -b, --bus All bus information + -V, --vbios All video bios information (if available) + -d, --driver Displays driver version + -v, --vram All vram information + -c, --cache All cache information + -B, --board All board information + -r, --ras Displays RAS features information + -p, --partition Partition information + -l, --limit All limit metric values (i.e. power and thermal limits) + -s, --process-isolation The process isolation status + -u, --numa All numa node information + +CPU Arguments: + -s, --smu All SMU FW information + -i, --interface-ver Displays hsmp interface version + +Command Modifiers: + --json Displays output in JSON format (human readable by default). + --csv Displays output in CSV format (human readable by default). + --file FILE Saves output into a file on the provided path (stdout by default). + --loglevel LEVEL Set the logging level from the possible choices: + DEBUG, INFO, WARNING, ERROR, CRITICAL +``` + +``` bash +~$ amd-smi firmware --help +usage: amd-smi firmware [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL] + [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]] [-f] + +If no GPU is specified, return firmware information for all GPUs on the system. + +Firmware Arguments: + -h, --help show this help message and exit + -g, --gpu GPU [GPU ...] Select a GPU ID, BDF, or UUID from the possible choices: + ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817 + ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a + ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6 + ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499 + all | Selects all devices + -U, --cpu CPU [CPU ...] Select a CPU ID from the possible choices: + ID: 0 + ID: 1 + ID: 2 + ID: 3 + all | Selects all devices + -O, --core CORE [CORE ...] 
Select a Core ID from the possible choices: + ID: 0 - 95 + all | Selects all devices + -f, --ucode-list, --fw-list All FW list information + +Command Modifiers: + --json Displays output in JSON format (human readable by default). + --csv Displays output in CSV format (human readable by default). + --file FILE Saves output into a file on the provided path (stdout by default). + --loglevel LEVEL Set the logging level from the possible choices: + DEBUG, INFO, WARNING, ERROR, CRITICAL +``` + +```bash +~$ amd-smi bad-pages --help +usage: amd-smi bad-pages [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL] + [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]] [-p] + [-r] [-u] + +If no GPU is specified, return bad page information for all GPUs on the system. + +Bad Pages Arguments: + -h, --help show this help message and exit + -g, --gpu GPU [GPU ...] Select a GPU ID, BDF, or UUID from the possible choices: + ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817 + ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a + ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6 + ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499 + all | Selects all devices + -U, --cpu CPU [CPU ...] Select a CPU ID from the possible choices: + ID: 0 + ID: 1 + ID: 2 + ID: 3 + all | Selects all devices + -O, --core CORE [CORE ...] Select a Core ID from the possible choices: + ID: 0 - 95 + all | Selects all devices + -p, --pending Displays all pending retired pages + -r, --retired Displays retired pages + -u, --un-res Displays unreservable pages + +Command Modifiers: + --json Displays output in JSON format (human readable by default). + --csv Displays output in CSV format (human readable by default). + --file FILE Saves output into a file on the provided path (stdout by default). 
+ --loglevel LEVEL Set the logging level from the possible choices: + DEBUG, INFO, WARNING, ERROR, CRITICAL +``` + +```bash +~$ amd-smi metric --help +usage: amd-smi metric [-h] [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]] + [-w INTERVAL] [-W TIME] [-i ITERATIONS] [-m] [-u] [-p] [-c] [-t] + [-P] [-e] [-k] [-f] [-C] [-o] [-l] [-x] [-E] [--cpu-power-metrics] + [--cpu-prochot] [--cpu-freq-metrics] [--cpu-c0-res] + [--cpu-lclk-dpm-level NBIOID] [--cpu-pwr-svi-telemtry-rails] + [--cpu-io-bandwidth IO_BW LINKID_NAME] + [--cpu-xgmi-bandwidth XGMI_BW LINKID_NAME] [--cpu-metrics-ver] + [--cpu-metrics-table] [--cpu-socket-energy] [--cpu-ddr-bandwidth] + [--cpu-temp] [--cpu-dimm-temp-range-rate DIMM_ADDR] + [--cpu-dimm-pow-consumption DIMM_ADDR] + [--cpu-dimm-thermal-sensor DIMM_ADDR] [--core-boost-limit] + [--core-curr-active-freq-core-limit] [--core-energy] + [--json | --csv] [--file FILE] [--loglevel LEVEL] + +- If no GPU is specified, returns metric information for all GPUs on the system. +- If no metric argument is provided all metric information will be displayed. + +Metric arguments: + -h, --help show this help message and exit + -g, --gpu GPU [GPU ...] Select a GPU ID, BDF, or UUID from the possible choices: + ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817 + ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a + ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6 + ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499 + all | Selects all devices + -U, --cpu CPU [CPU ...] Select a CPU ID from the possible choices: + ID: 0 + ID: 1 + ID: 2 + ID: 3 + all | Selects all devices + -O, --core CORE [CORE ...] 
Select a Core ID from the possible choices: + ID: 0 - 95 + all | Selects all devices + -w, --watch INTERVAL Reprint the command in a loop of INTERVAL seconds + -W, --watch_time TIME The total TIME to watch the given command + -i, --iterations ITERATIONS Total number of ITERATIONS to loop on the given command + -m, --mem-usage Memory usage per block + -u, --usage Displays engine usage information + -p, --power Current power usage + -c, --clock Average, max, and current clock frequencies + -t, --temperature Current temperatures + -P, --pcie Current PCIe speed, width, and replay count + -e, --ecc Total number of ECC errors + -k, --ecc-blocks Number of ECC errors per block + -f, --fan Current fan speed + -C, --voltage-curve Display voltage curve + -o, --overdrive Current GPU clock overdrive level + -l, --perf-level Current DPM performance level + -x, --xgmi-err XGMI error information since last read + -E, --energy Amount of energy consumed + +CPU Arguments: + --cpu-power-metrics CPU power metrics + --cpu-prochot Displays prochot status + --cpu-freq-metrics Displays currentFclkMemclk frequencies and cclk frequency limit + --cpu-c0-res Displays C0 residency + --cpu-lclk-dpm-level NBIOID Displays lclk dpm level range. Requires socket ID and NBOID as inputs + --cpu-pwr-svi-telemtry-rails Displays svi based telemetry for all rails + --cpu-io-bandwidth IO_BW LINKID_NAME Displays current IO bandwidth for the selected CPU. + input parameters are bandwidth type(1) and link ID encodings + i.e. P2, P3, G0 - G7 + --cpu-xgmi-bandwidth XGMI_BW LINKID_NAME Displays current XGMI bandwidth for the selected CPU + input parameters are bandwidth type(1,2,4) and link ID encodings + i.e. 
P2, P3, G0 - G7 + --cpu-metrics-ver Displays metrics table version + --cpu-metrics-table Displays metric table + --cpu-socket-energy Displays socket energy for the selected CPU socket + --cpu-ddr-bandwidth Displays per socket max ddr bw, current utilized bw, + and current utilized ddr bw in percentage + --cpu-temp Displays cpu socket temperature + --cpu-dimm-temp-range-rate DIMM_ADDR Displays dimm temperature range and refresh rate + --cpu-dimm-pow-consumption DIMM_ADDR Displays dimm power consumption + --cpu-dimm-thermal-sensor DIMM_ADDR Displays dimm thermal sensor + +CPU Core Arguments: + --core-boost-limit Get boost limit for the selected cores + --core-curr-active-freq-core-limit Get Current CCLK limit set per Core + --core-energy Displays core energy for the selected core + +Command Modifiers: + --json Displays output in JSON format (human readable by default). + --csv Displays output in CSV format (human readable by default). + --file FILE Saves output into a file on the provided path (stdout by default). + --loglevel LEVEL Set the logging level from the possible choices: + DEBUG, INFO, WARNING, ERROR, CRITICAL +``` + +```bash +~$ amd-smi process --help +usage: amd-smi process [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL] + [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]] + [-w INTERVAL] [-W TIME] [-i ITERATIONS] [-G] [-e] [-p PID] + [-n NAME] + +- If no GPU is specified, returns information for all GPUs on the system. +- If no process argument is provided all process information will be displayed. + +Process arguments: + -h, --help show this help message and exit + -g, --gpu GPU [GPU ...] 
Select a GPU ID, BDF, or UUID from the possible choices: + ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817 + ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a + ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6 + ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499 + all | Selects all devices + -U, --cpu CPU [CPU ...] Select a CPU ID from the possible choices: + ID: 0 + ID: 1 + ID: 2 + ID: 3 + all | Selects all devices + -O, --core CORE [CORE ...] Select a Core ID from the possible choices: + ID: 0 - 95 + all | Selects all devices + -w, --watch INTERVAL Reprint the command in a loop of INTERVAL seconds + -W, --watch_time TIME The total TIME to watch the given command + -i, --iterations ITERATIONS Total number of ITERATIONS to loop on the given command + -G, --general pid, process name, memory usage + -e, --engine All engine usages + -p, --pid PID Gets all process information about the specified process based on Process ID + -n, --name NAME Gets all process information about the specified process based on Process Name. + If multiple processes have the same name information is returned for all of them. + +Command Modifiers: + --json Displays output in JSON format (human readable by default). + --csv Displays output in CSV format (human readable by default). + --file FILE Saves output into a file on the provided path (stdout by default). + --loglevel LEVEL Set the logging level from the possible choices: + DEBUG, INFO, WARNING, ERROR, CRITICAL +``` + +```bash +~$ amd-smi event --help +usage: amd-smi event [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL] + [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]] + +If no GPU is specified, returns event information for all GPUs on the system. + +Event Arguments: + -h, --help show this help message and exit + -g, --gpu GPU [GPU ...] 
Select a GPU ID, BDF, or UUID from the possible choices: + ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817 + ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a + ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6 + ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499 + all | Selects all devices + -U, --cpu CPU [CPU ...] Select a CPU ID from the possible choices: + ID: 0 + ID: 1 + ID: 2 + ID: 3 + all | Selects all devices + -O, --core CORE [CORE ...] Select a Core ID from the possible choices: + ID: 0 - 95 + all | Selects all devices + +Command Modifiers: + --json Displays output in JSON format (human readable by default). + --csv Displays output in CSV format (human readable by default). + --file FILE Saves output into a file on the provided path (stdout by default). + --loglevel LEVEL Set the logging level from the possible choices: + DEBUG, INFO, WARNING, ERROR, CRITICAL +``` + +```bash +~$ amd-smi topology --help +usage: amd-smi topology [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL] + [-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]] [-a] + [-w] [-o] [-t] [-b] + +- If no GPU is specified, returns information for all GPUs on the system. +- If no topology argument is provided all topology information will be displayed. + +Topology arguments: + -h, --help show this help message and exit + -g, --gpu GPU [GPU ...] Select a GPU ID, BDF, or UUID from the possible choices: + ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817 + ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a + ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6 + ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499 + all | Selects all devices + -U, --cpu CPU [CPU ...] Select a CPU ID from the possible choices: + ID: 0 + ID: 1 + ID: 2 + ID: 3 + all | Selects all devices + -O, --core CORE [CORE ...] 
Select a Core ID from the possible choices: + ID: 0 - 95 + all | Selects all devices + -a, --access Displays link accessibility between GPUs + -w, --weight Displays relative weight between GPUs + -o, --hops Displays the number of hops between GPUs + -t, --link-type Displays the link type between GPUs + -b, --numa-bw Display max and min bandwidth between nodes + +Command Modifiers: + --json Displays output in JSON format (human readable by default). + --csv Displays output in CSV format (human readable by default). + --file FILE Saves output into a file on the provided path (stdout by default). + --loglevel LEVEL Set the logging level from the possible choices: + DEBUG, INFO, WARNING, ERROR, CRITICAL +``` + +```bash +usage: amd-smi set [-h] (-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]) [-f %] + [-l LEVEL] [-P SETPROFILE] [-d SCLKMAX] [-C PARTITION] [-M PARTITION] + [-o WATTS] [-p POLICY] [-i STATUS] [--cpu-pwr-limit PWR_LIMIT] + [--cpu-xgmi-link-width MIN_WIDTH MAX_WIDTH] + [--cpu-lclk-dpm-level NBIOID MIN_DPM MAX_DPM] [--cpu-pwr-eff-mode MODE] + [--cpu-gmi3-link-width MIN_LW MAX_LW] [--cpu-pcie-link-rate LINK_RATE] + [--cpu-df-pstate-range MAX_PSTATE MIN_PSTATE] [--cpu-enable-apb] + [--cpu-disable-apb DF_PSTATE] [--soc-boost-limit BOOST_LIMIT] + [--core-boost-limit BOOST_LIMIT] [-c] [--json | --csv] [--file FILE] + [--loglevel LEVEL] + +- A GPU must be specified to set a configuration. +- A set argument must be provided; Multiple set arguments are accepted + +Set Arguments: + -h, --help show this help message and exit + -g, --gpu GPU [GPU ...] Select a GPU ID, BDF, or UUID from the possible choices: + ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817 + ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a + ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6 + ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499 + all | Selects all devices + -U, --cpu CPU [CPU ...] 
Select a CPU ID from the possible choices: + ID: 0 + ID: 1 + ID: 2 + ID: 3 + all | Selects all devices + -O, --core CORE [CORE ...] Select a Core ID from the possible choices: + ID: 0 - 95 + all | Selects all devices + -f, --fan % Set GPU fan speed (0-255 or 0-100%) + -l, --perf-level LEVEL Set performance level + -P, --profile SETPROFILE Set power profile level (#) or a quoted string of custom profile attributes + -d, --perf-determinism SCLKMAX Set GPU clock frequency limit and performance level to determinism to get minimal performance variation + -C, --compute-partition PARTITION Set one of the following the compute partition modes: + CPX, SPX, DPX, TPX, QPX + -M, --memory-partition PARTITION Set one of the following the memory partition modes: + NPS1, NPS2, NPS4, NPS8 + -o, --power-cap WATTS Set power capacity limit + -p, --dpm-policy POLICY_ID Set the GPU DPM policy using policy id + -x, --xgmi-plpd POLICY_ID Set the GPU XGMI per-link power down policy using policy id + -i, --process-isolation STATUS Enable or disable the GPU process isolation: 0 for disable and 1 for enable. + -c, --clear-sram-data Clear the GPU SRAM data + +CPU Arguments: + --cpu-pwr-limit PWR_LIMIT Set power limit for the given socket. Input parameter is power limit value. + --cpu-xgmi-link-width MIN_WIDTH MAX_WIDTH Set max and Min linkwidth. Input parameters are min and max link width values + --cpu-lclk-dpm-level NBIOID MIN_DPM MAX_DPM Sets the max and min dpm level on a given NBIO. + Input parameters are die_index, min dpm, max dpm. + --cpu-pwr-eff-mode MODE Sets the power efficency mode policy. Input parameter is mode. + --cpu-gmi3-link-width MIN_LW MAX_LW Sets max and min gmi3 link width range + --cpu-pcie-link-rate LINK_RATE Sets pcie link rate + --cpu-df-pstate-range MAX_PSTATE MIN_PSTATE Sets max and min df-pstates + --cpu-enable-apb Enables the DF p-state performance boost algorithm + --cpu-disable-apb DF_PSTATE Disables the DF p-state performance boost algorithm. 
Input parameter is DFPstate (0-3) + --soc-boost-limit BOOST_LIMIT Sets the boost limit for the given socket. Input parameter is socket BOOST_LIMIT value + +CPU Core Arguments: + --core-boost-limit BOOST_LIMIT Sets the boost limit for the given core. Input parameter is core BOOST_LIMIT value + +Command Modifiers: + --json Displays output in JSON format (human readable by default). + --csv Displays output in CSV format (human readable by default). + --file FILE Saves output into a file on the provided path (stdout by default). + --loglevel LEVEL Set the logging level from the possible choices: + DEBUG, INFO, WARNING, ERROR, CRITICAL +``` + +```bash +~$ amd-smi reset --help +usage: amd-smi reset [-h] [--json | --csv] [--file FILE] [--loglevel LEVEL] + (-g GPU [GPU ...] | -U CPU [CPU ...] | -O CORE [CORE ...]) [-G] [-c] + [-f] [-p] [-x] [-d] [-C] [-M] [-o] + +- A GPU must be specified to reset a configuration. +- A reset argument must be provided; Multiple reset arguments are accepted + +Reset Arguments: + -h, --help show this help message and exit + -g, --gpu GPU [GPU ...] Select a GPU ID, BDF, or UUID from the possible choices: + ID: 0 | BDF: 0000:01:00.0 | UUID: 71ff74a0-0000-1000-8066-0a3c71d5f817 + ID: 1 | BDF: 0001:01:00.0 | UUID: b4ff74a0-0000-1000-80b2-fa0be8628b1a + ID: 2 | BDF: 0002:01:00.0 | UUID: a9ff74a0-0000-1000-8007-3066a98ba4a6 + ID: 3 | BDF: 0003:01:00.0 | UUID: 53ff74a0-0000-1000-80a0-a1ff3830f499 + all | Selects all devices + -U, --cpu CPU [CPU ...] Select a CPU ID from the possible choices: + ID: 0 + ID: 1 + ID: 2 + ID: 3 + all | Selects all devices + -O, --core CORE [CORE ...] 
Select a Core ID from the possible choices: + ID: 0 - 95 + all | Selects all devices + -G, --gpureset Reset the specified GPU + -c, --clocks Reset clocks and overdrive to default + -f, --fans Reset fans to automatic (driver) control + -p, --profile Reset power profile back to default + -x, --xgmierr Reset XGMI error counts + -d, --perf-determinism Disable performance determinism + -C, --compute-partition Reset compute partitions on the specified GPU + -M, --memory-partition Reset memory partitions on the specified GPU + -o, --power-cap Reset power capacity limit to max capable + +Command Modifiers: + --json Displays output in JSON format (human readable by default). + --csv Displays output in CSV format (human readable by default). + --file FILE Saves output into a file on the provided path (stdout by default). + --loglevel LEVEL Set the logging level from the possible choices: + DEBUG, INFO, WARNING, ERROR, CRITICAL +``` + +### Example output from amd-smi static + +Here is some example output from the tool: + +```bash +~$ amd-smi static +CPU: 0 + SMU: + FW_VERSION: 85:81:0 + INTERFACE_VERSION: + PROTO VERSION: 6 + +CPU: 1 + SMU: + FW_VERSION: 85:81:0 + INTERFACE_VERSION: + PROTO VERSION: 6 + +CPU: 2 + SMU: + FW_VERSION: 85:81:0 + INTERFACE_VERSION: + PROTO VERSION: 6 + +CPU: 3 + SMU: + FW_VERSION: 85:81:0 + INTERFACE_VERSION: + PROTO VERSION: 6 + +GPU: 0 + ASIC: + MARKET_NAME: MI300A + VENDOR_ID: 0x1002 + VENDOR_NAME: Advanced Micro Devices Inc. 
[AMD/ATI] + SUBVENDOR_ID: 0x1002 + DEVICE_ID: 0x74a0 + REV_ID: 0x0 + ASIC_SERIAL: 0x71660A3C71D5F817 + OAM_ID: 0 + BUS: + BDF: 0000:01:00.0 + MAX_PCIE_WIDTH: 16 + MAX_PCIE_SPEED: 32 GT/s + PCIE_INTERFACE_VERSION: Gen 5 + SLOT_TYPE: PCIE + VBIOS: + NAME: N/A + BUILD_DATE: N/A + PART_NUMBER: N/A + VERSION: N/A + LIMIT: + MAX_POWER: 550 W + SOCKET_POWER: 550 W + SLOWDOWN_EDGE_TEMPERATURE: N/A + SLOWDOWN_HOTSPOT_TEMPERATURE: 100 °C + SLOWDOWN_VRAM_TEMPERATURE: 95 °C + SHUTDOWN_EDGE_TEMPERATURE: N/A + SHUTDOWN_HOTSPOT_TEMPERATURE: 110 °C + SHUTDOWN_VRAM_TEMPERATURE: 105 °C + DRIVER: + NAME: amdgpu + VERSION: 6.7.0 + BOARD: + MODEL_NUMBER: N/A + PRODUCT_SERIAL: N/A + FRU_ID: N/A + PRODUCT_NAME: N/A + MANUFACTURER_NAME: N/A + RAS: + EEPROM_VERSION: 0x0 + PARITY_SCHEMA: DISABLED + SINGLE_BIT_SCHEMA: DISABLED + DOUBLE_BIT_SCHEMA: DISABLED + POISON_SCHEMA: ENABLED + ECC_BLOCK_STATE: + UMC: DISABLED + SDMA: ENABLED + GFX: ENABLED + MMHUB: ENABLED + ATHUB: DISABLED + PCIE_BIF: DISABLED + HDP: DISABLED + XGMI_WAFL: DISABLED + DF: DISABLED + SMN: DISABLED + SEM: DISABLED + MP0: DISABLED + MP1: DISABLED + FUSE: DISABLED + PARTITION: + COMPUTE_PARTITION: SPX + MEMORY_PARTITION: NPS1 + DPM_POLICY: + NUM_SUPPORTED: 4 + CURRENT_ID: 1 + POLICIES: + POLICY_ID: 0 + POLICY_DESCRIPTION: pstate_default + POLICY_ID: 1 + POLICY_DESCRIPTION: soc_pstate_0 + POLICY_ID: 2 + POLICY_DESCRIPTION: soc_pstate_1 + POLICY_ID: 3 + POLICY_DESCRIPTION: soc_pstate_2 + XGMI_PLPD: + NUM_SUPPORTED: 3 + CURRENT_ID: 1 + PLPDS: + POLICY_ID: 0 + POLICY_DESCRIPTION: plpd_disallow + POLICY_ID: 1 + POLICY_DESCRIPTION: plpd_default + POLICY_ID: 2 + POLICY_DESCRIPTION: plpd_optimized + NUMA: + NODE: 0 + AFFINITY: 0 + VRAM: + TYPE: HBM + VENDOR: N/A + SIZE: 96432 MB + CACHE_INFO: + CACHE_0: + CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE + CACHE_SIZE: 32 KB + CACHE_LEVEL: 1 + MAX_NUM_CU_SHARED: 2 + NUM_CACHE_INSTANCE: 464 + CACHE_1: + CACHE_PROPERTIES: INST_CACHE, SIMD_CACHE + CACHE_SIZE: 64 KB + CACHE_LEVEL: 1 + 
MAX_NUM_CU_SHARED: 2 + NUM_CACHE_INSTANCE: 160 + CACHE_2: + CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE + CACHE_SIZE: 32768 KB + CACHE_LEVEL: 2 + MAX_NUM_CU_SHARED: 304 + NUM_CACHE_INSTANCE: 1 + CACHE_3: + CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE + CACHE_SIZE: 262144 KB + CACHE_LEVEL: 3 + MAX_NUM_CU_SHARED: 304 + NUM_CACHE_INSTANCE: 1 + +GPU: 1 + ASIC: + MARKET_NAME: MI300A + VENDOR_ID: 0x1002 + VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI] + SUBVENDOR_ID: 0x1002 + DEVICE_ID: 0x74a0 + REV_ID: 0x0 + ASIC_SERIAL: 0xB4B2FA0BE8628B1A + OAM_ID: 1 + BUS: + BDF: 0001:01:00.0 + MAX_PCIE_WIDTH: 16 + MAX_PCIE_SPEED: 32 GT/s + PCIE_INTERFACE_VERSION: Gen 5 + SLOT_TYPE: PCIE + VBIOS: + NAME: N/A + BUILD_DATE: N/A + PART_NUMBER: N/A + VERSION: N/A + LIMIT: + MAX_POWER: 550 W + SOCKET_POWER: 550 W + SLOWDOWN_EDGE_TEMPERATURE: N/A + SLOWDOWN_HOTSPOT_TEMPERATURE: 100 °C + SLOWDOWN_VRAM_TEMPERATURE: 95 °C + SHUTDOWN_EDGE_TEMPERATURE: N/A + SHUTDOWN_HOTSPOT_TEMPERATURE: 110 °C + SHUTDOWN_VRAM_TEMPERATURE: 105 °C + DRIVER: + NAME: amdgpu + VERSION: 6.7.0 + BOARD: + MODEL_NUMBER: N/A + PRODUCT_SERIAL: N/A + FRU_ID: N/A + PRODUCT_NAME: N/A + MANUFACTURER_NAME: N/A + RAS: + EEPROM_VERSION: 0x0 + PARITY_SCHEMA: DISABLED + SINGLE_BIT_SCHEMA: DISABLED + DOUBLE_BIT_SCHEMA: DISABLED + POISON_SCHEMA: ENABLED + ECC_BLOCK_STATE: + UMC: DISABLED + SDMA: ENABLED + GFX: ENABLED + MMHUB: ENABLED + ATHUB: DISABLED + PCIE_BIF: DISABLED + HDP: DISABLED + XGMI_WAFL: DISABLED + DF: DISABLED + SMN: DISABLED + SEM: DISABLED + MP0: DISABLED + MP1: DISABLED + FUSE: DISABLED + PARTITION: + COMPUTE_PARTITION: SPX + MEMORY_PARTITION: NPS1 + DPM_POLICY: + NUM_SUPPORTED: 4 + CURRENT_ID: 1 + POLICIES: + POLICY_ID: 0 + POLICY_DESCRIPTION: pstate_default + POLICY_ID: 1 + POLICY_DESCRIPTION: soc_pstate_0 + POLICY_ID: 2 + POLICY_DESCRIPTION: soc_pstate_1 + POLICY_ID: 3 + POLICY_DESCRIPTION: soc_pstate_2 + XGMI_PLPD: + NUM_SUPPORTED: 3 + CURRENT_ID: 1 + PLPDS: + POLICY_ID: 0 + POLICY_DESCRIPTION: plpd_disallow + 
POLICY_ID: 1 + POLICY_DESCRIPTION: plpd_default + POLICY_ID: 2 + POLICY_DESCRIPTION: plpd_optimized + NUMA: + NODE: 1 + AFFINITY: 1 + VRAM: + TYPE: HBM + VENDOR: N/A + SIZE: 96432 MB + CACHE_INFO: + CACHE_0: + CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE + CACHE_SIZE: 32 KB + CACHE_LEVEL: 1 + MAX_NUM_CU_SHARED: 2 + NUM_CACHE_INSTANCE: 464 + CACHE_1: + CACHE_PROPERTIES: INST_CACHE, SIMD_CACHE + CACHE_SIZE: 64 KB + CACHE_LEVEL: 1 + MAX_NUM_CU_SHARED: 2 + NUM_CACHE_INSTANCE: 160 + CACHE_2: + CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE + CACHE_SIZE: 32768 KB + CACHE_LEVEL: 2 + MAX_NUM_CU_SHARED: 304 + NUM_CACHE_INSTANCE: 1 + CACHE_3: + CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE + CACHE_SIZE: 262144 KB + CACHE_LEVEL: 3 + MAX_NUM_CU_SHARED: 304 + NUM_CACHE_INSTANCE: 1 + +GPU: 2 + ASIC: + MARKET_NAME: MI300A + VENDOR_ID: 0x1002 + VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI] + SUBVENDOR_ID: 0x1002 + DEVICE_ID: 0x74a0 + REV_ID: 0x0 + ASIC_SERIAL: 0xA9073066A98BA4A6 + OAM_ID: 2 + BUS: + BDF: 0002:01:00.0 + MAX_PCIE_WIDTH: 16 + MAX_PCIE_SPEED: 32 GT/s + PCIE_INTERFACE_VERSION: Gen 5 + SLOT_TYPE: PCIE + VBIOS: + NAME: N/A + BUILD_DATE: N/A + PART_NUMBER: N/A + VERSION: N/A + LIMIT: + MAX_POWER: 550 W + SOCKET_POWER: 550 W + SLOWDOWN_EDGE_TEMPERATURE: N/A + SLOWDOWN_HOTSPOT_TEMPERATURE: 100 °C + SLOWDOWN_VRAM_TEMPERATURE: 95 °C + SHUTDOWN_EDGE_TEMPERATURE: N/A + SHUTDOWN_HOTSPOT_TEMPERATURE: 110 °C + SHUTDOWN_VRAM_TEMPERATURE: 105 °C + DRIVER: + NAME: amdgpu + VERSION: 6.7.0 + BOARD: + MODEL_NUMBER: N/A + PRODUCT_SERIAL: N/A + FRU_ID: N/A + PRODUCT_NAME: N/A + MANUFACTURER_NAME: N/A + RAS: + EEPROM_VERSION: 0x0 + PARITY_SCHEMA: DISABLED + SINGLE_BIT_SCHEMA: DISABLED + DOUBLE_BIT_SCHEMA: DISABLED + POISON_SCHEMA: ENABLED + ECC_BLOCK_STATE: + UMC: DISABLED + SDMA: ENABLED + GFX: ENABLED + MMHUB: ENABLED + ATHUB: DISABLED + PCIE_BIF: DISABLED + HDP: DISABLED + XGMI_WAFL: DISABLED + DF: DISABLED + SMN: DISABLED + SEM: DISABLED + MP0: DISABLED + MP1: DISABLED + FUSE: DISABLED + 
PARTITION: + COMPUTE_PARTITION: SPX + MEMORY_PARTITION: NPS1 + DPM_POLICY: + NUM_SUPPORTED: 4 + CURRENT_ID: 1 + POLICIES: + POLICY_ID: 0 + POLICY_DESCRIPTION: pstate_default + POLICY_ID: 1 + POLICY_DESCRIPTION: soc_pstate_0 + POLICY_ID: 2 + POLICY_DESCRIPTION: soc_pstate_1 + POLICY_ID: 3 + POLICY_DESCRIPTION: soc_pstate_2 + XGMI_PLPD: + NUM_SUPPORTED: 3 + CURRENT_ID: 1 + PLPDS: + POLICY_ID: 0 + POLICY_DESCRIPTION: plpd_disallow + POLICY_ID: 1 + POLICY_DESCRIPTION: plpd_default + POLICY_ID: 2 + POLICY_DESCRIPTION: plpd_optimized + NUMA: + NODE: 2 + AFFINITY: 2 + VRAM: + TYPE: HBM + VENDOR: N/A + SIZE: 96432 MB + CACHE_INFO: + CACHE_0: + CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE + CACHE_SIZE: 32 KB + CACHE_LEVEL: 1 + MAX_NUM_CU_SHARED: 2 + NUM_CACHE_INSTANCE: 464 + CACHE_1: + CACHE_PROPERTIES: INST_CACHE, SIMD_CACHE + CACHE_SIZE: 64 KB + CACHE_LEVEL: 1 + MAX_NUM_CU_SHARED: 2 + NUM_CACHE_INSTANCE: 160 + CACHE_2: + CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE + CACHE_SIZE: 32768 KB + CACHE_LEVEL: 2 + MAX_NUM_CU_SHARED: 304 + NUM_CACHE_INSTANCE: 1 + CACHE_3: + CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE + CACHE_SIZE: 262144 KB + CACHE_LEVEL: 3 + MAX_NUM_CU_SHARED: 304 + NUM_CACHE_INSTANCE: 1 + +GPU: 3 + ASIC: + MARKET_NAME: MI300A + VENDOR_ID: 0x1002 + VENDOR_NAME: Advanced Micro Devices Inc. 
[AMD/ATI] + SUBVENDOR_ID: 0x1002 + DEVICE_ID: 0x74a0 + REV_ID: 0x0 + ASIC_SERIAL: 0x53A0A1FF3830F499 + OAM_ID: 3 + BUS: + BDF: 0003:01:00.0 + MAX_PCIE_WIDTH: 16 + MAX_PCIE_SPEED: 32 GT/s + PCIE_INTERFACE_VERSION: Gen 5 + SLOT_TYPE: PCIE + VBIOS: + NAME: N/A + BUILD_DATE: N/A + PART_NUMBER: N/A + VERSION: N/A + LIMIT: + MAX_POWER: 550 W + SOCKET_POWER: 550 W + SLOWDOWN_EDGE_TEMPERATURE: N/A + SLOWDOWN_HOTSPOT_TEMPERATURE: 100 °C + SLOWDOWN_VRAM_TEMPERATURE: 95 °C + SHUTDOWN_EDGE_TEMPERATURE: N/A + SHUTDOWN_HOTSPOT_TEMPERATURE: 110 °C + SHUTDOWN_VRAM_TEMPERATURE: 105 °C + DRIVER: + NAME: amdgpu + VERSION: 6.7.0 + BOARD: + MODEL_NUMBER: N/A + PRODUCT_SERIAL: N/A + FRU_ID: N/A + PRODUCT_NAME: N/A + MANUFACTURER_NAME: N/A + RAS: + EEPROM_VERSION: 0x0 + PARITY_SCHEMA: DISABLED + SINGLE_BIT_SCHEMA: DISABLED + DOUBLE_BIT_SCHEMA: DISABLED + POISON_SCHEMA: ENABLED + ECC_BLOCK_STATE: + UMC: DISABLED + SDMA: ENABLED + GFX: ENABLED + MMHUB: ENABLED + ATHUB: DISABLED + PCIE_BIF: DISABLED + HDP: DISABLED + XGMI_WAFL: DISABLED + DF: DISABLED + SMN: DISABLED + SEM: DISABLED + MP0: DISABLED + MP1: DISABLED + FUSE: DISABLED + PARTITION: + COMPUTE_PARTITION: SPX + MEMORY_PARTITION: NPS1 + DPM_POLICY: + NUM_SUPPORTED: 4 + CURRENT_ID: 1 + POLICIES: + POLICY_ID: 0 + POLICY_DESCRIPTION: pstate_default + POLICY_ID: 1 + POLICY_DESCRIPTION: soc_pstate_0 + POLICY_ID: 2 + POLICY_DESCRIPTION: soc_pstate_1 + POLICY_ID: 3 + POLICY_DESCRIPTION: soc_pstate_2 + XGMI_PLPD: + NUM_SUPPORTED: 3 + CURRENT_ID: 1 + PLPDS: + POLICY_ID: 0 + POLICY_DESCRIPTION: plpd_disallow + POLICY_ID: 1 + POLICY_DESCRIPTION: plpd_default + POLICY_ID: 2 + POLICY_DESCRIPTION: plpd_optimized + NUMA: + NODE: 3 + AFFINITY: 3 + VRAM: + TYPE: HBM + VENDOR: N/A + SIZE: 96432 MB + CACHE_INFO: + CACHE_0: + CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE + CACHE_SIZE: 32 KB + CACHE_LEVEL: 1 + MAX_NUM_CU_SHARED: 2 + NUM_CACHE_INSTANCE: 464 + CACHE_1: + CACHE_PROPERTIES: INST_CACHE, SIMD_CACHE + CACHE_SIZE: 64 KB + CACHE_LEVEL: 1 + 
MAX_NUM_CU_SHARED: 2 + NUM_CACHE_INSTANCE: 160 + CACHE_2: + CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE + CACHE_SIZE: 32768 KB + CACHE_LEVEL: 2 + MAX_NUM_CU_SHARED: 304 + NUM_CACHE_INSTANCE: 1 + CACHE_3: + CACHE_PROPERTIES: DATA_CACHE, SIMD_CACHE + CACHE_SIZE: 262144 KB + CACHE_LEVEL: 3 + MAX_NUM_CU_SHARED: 304 + NUM_CACHE_INSTANCE: 1 + +``` + diff --git a/docs/how-to/using-amdsmi-for-C++.rst b/docs/how-to/using-amdsmi-for-C++.rst new file mode 100644 index 0000000000..9428bc99cb --- /dev/null +++ b/docs/how-to/using-amdsmi-for-C++.rst @@ -0,0 +1,255 @@ +.. meta:: + :description: Using AMD SMI + :keywords: AMD, SMI, system, management, interface, ROCm + +******************************** +Usage Basics for the C Library +******************************** + +Device/Socket handles +---------------------- + +Many of the AMD SMI library's functions take a "socket handle" or "device handle." The socket is an abstraction of the hardware's physical socket. This will enable AMD SMI to provide a better representation of the hardware to the user. Although there is always one distinct GPU for a socket, the APU may have both GPU and CPU devices on the same socket. Moreover, for the MI200 series, it may have multiple GCDs. + +To discover the sockets in the system, `amdsmi_get_socket_handles()` is called to get a list of socket handles, which, in turn, can be used to query the devices in that socket using `amdsmi_get_processor_handles().` The device handler is used to distinguish the detected devices from one another. It is important to note that a device may end up with a different device handle after restarting the application, so a device handle should not be relied upon to be constant over the process. 
+ +The list of socket handles discovered using `amdsmi_get_socket_handles()`, can also be used to query the CPUs in that socket using amdsmi_get_processor_handles_by_type(), which in turn can then be used to query the cores in that CPU using amdsmi_get_processor_handles_by_type() again. + + +Hello AMD SMI +-------------- + +The only required AMD SMI call for any program that wants to use AMD SMI is the `amdsmi_init()` call. This call initializes some internal data structures that subsequent AMD-SMI calls will use. A flag can be passed in the call if the application is only interested in a specific device type. + +When AMD SMI is no longer used, `amdsmi_shut_down()` should be called. This provides a way to release resources that AMD-SMI may have held. + +1) A simple "Hello World" type program that displays the temperature of detected devices would look like this: + +.. code-block:: + + #include + #include + #include "amd_smi/amdsmi.h" + + int main() { + amdsmi_status_t ret; + + // Init amdsmi for sockets and devices. Here we are only interested in AMD_GPUS. + ret = amdsmi_init(AMDSMI_INIT_AMD_GPUS); + + // Get all sockets + uint32_t socket_count = 0; + + // Get the socket count available in the system. + ret = amdsmi_get_socket_handles(&socket_count, nullptr); + + // Allocate the memory for the sockets + std::vector sockets(socket_count); + // Get the socket handles in the system + ret = amdsmi_get_socket_handles(&socket_count, &sockets[0]); + + std::cout << "Total Socket: " << socket_count << std::endl; + + // For each socket, get identifier and devices + for (uint32_t i=0; i < socket_count; i++) { + // Get Socket info + char socket_info[128]; + ret = amdsmi_get_socket_info(sockets[i], 128, socket_info); + std::cout << "Socket " << socket_info<< std::endl; + + // Get the device count for the socket. 
+ uint32_t device_count = 0; + ret = amdsmi_get_processor_handles(sockets[i], &device_count, nullptr); + + // Allocate the memory for the device handlers on the socket + std::vector processor_handles(device_count); + // Get all devices of the socket + ret = amdsmi_get_processor_handles(sockets[i], + &device_count, &processor_handles[0]); + + // For each device of the socket, get name and temperature. + for (uint32_t j=0; j < device_count; j++) { + // Get device type. Since the amdsmi is initialized with + // AMD_SMI_INIT_AMD_GPUS, the processor_type must be AMD_GPU. + processor_type_t processor_type; + ret = amdsmi_get_processor_type(processor_handles[j], &processor_type); + if (processor_type != AMD_GPU) { + std::cout << "Expect AMD_GPU device type!\n"; + return 1; + } + + // Get device name + amdsmi_board_info_t board_info; + ret = amdsmi_get_gpu_board_info(processor_handles[j], &board_info); + std::cout << "\tdevice " + << j <<"\n\t\tName:" << board_info.product_name << std::endl; + + // Get temperature + int64_t val_i64 = 0; + ret = amdsmi_get_temp_metric(processor_handles[j], TEMPERATURE_TYPE_EDGE, + AMDSMI_TEMP_CURRENT, &val_i64); + std::cout << "\t\tTemperature: " << val_i64 << "C" << std::endl; + } + } + + // Clean up resources allocated at amdsmi_init. It will invalidate sockets + // and devices pointers + ret = amdsmi_shut_down(); + + return 0; + } + + +2) A sample program that displays the power of detected cpus would look like this: + +.. 
code-block:: + + #include + #include + #include "amd_smi/amdsmi.h" + + int main(int argc, char **argv) { + amdsmi_status_t ret; + uint32_t socket_count = 0; + + // Initialize amdsmi for AMD CPUs + ret = amdsmi_init(AMDSMI_INIT_AMD_CPUS); + + ret = amdsmi_get_socket_handles(&socket_count, nullptr); + + // Allocate the memory for the sockets + std::vector sockets(socket_count); + + // Get the sockets of the system + ret = amdsmi_get_socket_handles(&socket_count, &sockets[0]); + + std::cout << "Total Socket: " << socket_count << std::endl; + + // For each socket, get cpus + for (uint32_t i = 0; i < socket_count; i++) { + uint32_t cpu_count = 0; + + // Set processor type as AMD_CPU + processor_type_t processor_type = AMD_CPU; + ret = amdsmi_get_processor_handles_by_type(sockets[i], processor_type, nullptr, &cpu_count); + + // Allocate the memory for the cpus + std::vector plist(cpu_count); + + // Get the cpus for each socket + ret = amdsmi_get_processor_handles_by_type(sockets[i], processor_type, &plist[0], &cpu_count); + + for (uint32_t index = 0; index < plist.size(); index++) { + uint32_t socket_power; + std::cout<<"CPU "<(socket_power)/1000<::.` or `:.` in hexcode format. +Where: + +* `` is 4 hex digits long from 0000-FFFF interval +* `` is 2 hex digits long from 00-FF interval +* `` is 2 hex digits long from 00-1F interval +* `` is 1 hex digit long from 0-7 interval + +Output: device handle object + +Exceptions that can be thrown by `amdsmi_get_processor_handle_from_bdf` function: + +* `AmdSmiLibraryException` +* `AmdSmiBdfFormatException` + +Example: + +```python +try: + device = amdsmi_get_processor_handle_from_bdf("0000:23:00.0") + print(amdsmi_get_gpu_device_uuid(device)) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_device_bdf + +Description: Returns BDF of the given device + +Input parameters: + +* `processor_handle` dev for which to query + +Output: BDF string in form of `::.` in hexcode format. 
+Where: + +* `` is 4 hex digits long from 0000-FFFF interval +* `` is 2 hex digits long from 00-FF interval +* `` is 2 hex digits long from 00-1F interval +* `` is 1 hex digit long from 0-7 interval + +Exceptions that can be thrown by `amdsmi_get_gpu_device_bdf` function: + +* `AmdSmiParameterException` +* `AmdSmiLibraryException` + +Example: + +```python +try: + device = amdsmi_get_processor_handles()[0] + print("Device's bdf:", amdsmi_get_gpu_device_bdf(device)) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_device_uuid + +Description: Returns the UUID of the device + +Input parameters: + +* `processor_handle` dev for which to query + +Output: UUID string unique to the device + +Exceptions that can be thrown by `amdsmi_get_gpu_device_uuid` function: + +* `AmdSmiParameterException` +* `AmdSmiLibraryException` + +Example: + +```python +try: + device = amdsmi_get_processor_handles()[0] + print("Device UUID: ", amdsmi_get_gpu_device_uuid(device)) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_driver_info + +Description: Returns the info of the driver + +Input parameters: + +* `processor_handle` dev for which to query + +Output: Dictionary with fields + +Field | Content +---|--- +`driver_name` | driver name +`driver_version` | driver_version +`driver_date` | driver_date + +Exceptions that can be thrown by `amdsmi_get_gpu_driver_info` function: + +* `AmdSmiParameterException` +* `AmdSmiLibraryException` + +Example: + +```python +try: + device = amdsmi_get_processor_handles()[0] + print("Driver info: ", amdsmi_get_gpu_driver_info(device)) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_asic_info + +Description: Returns asic information for the given GPU + +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with fields + +Field | Content +---|--- +`market_name` | market name +`vendor_id` | vendor id +`vendor_name` | vendor name +`device_id` | device id +`rev_id` | revision id 
+`asic_serial` | asic serial +`oam_id` | oam id + +Exceptions that can be thrown by `amdsmi_get_gpu_asic_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + asic_info = amdsmi_get_gpu_asic_info(device) + print(asic_info['market_name']) + print(hex(asic_info['vendor_id'])) + print(asic_info['vendor_name']) + print(hex(asic_info['device_id'])) + print(hex(asic_info['rev_id'])) + print(asic_info['asic_serial']) + print(asic_info['oam_id']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_power_cap_info + +Description: Returns dictionary of power capabilities as currently configured +on the given GPU. It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with fields + +Field | Description +---|--- +`power_cap` | power capability +`dpm_cap` | dynamic power management capability +`default_power_cap` | default power capability +`min_power_cap` | min power capability +`max_power_cap` | max power capability + +Exceptions that can be thrown by `amdsmi_get_power_cap_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + power_info = amdsmi_get_power_cap_info(device) + print(power_info['power_cap']) + print(power_info['dpm_cap']) + print(power_info['default_power_cap']) + print(power_info['min_power_cap']) + print(power_info['max_power_cap']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_vram_info + +Description: Returns dictionary of vram information for the given GPU. 
+ +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with fields + +Field | Description +---|--- +`vram_type` | vram type +`vram_vendor` | vram vendor +`vram_size_mb` | vram size in mb + +Exceptions that can be thrown by `amdsmi_get_gpu_vram_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + vram_info = amdsmi_get_gpu_vram_info(device) + print(vram_info['vram_type']) + print(vram_info['vram_vendor']) + print(vram_info['vram_size_mb']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_cache_info + +Description: Returns a list of dictionaries containing cache information for the given GPU. + +Input parameters: + +* `processor_handle` device which to query + +Output: List of Dictionaries containing cache information following the schema below: + +Schema: + +``` +{ + cache_properties: + { + "type" : "array", + "items" : {"type" : "string"} + }, + cache_size: {"type" : "number"}, + cache_level: {"type" : "number"}, + max_num_cu_shared: {"type" : "number"}, + num_cache_instance: {"type" : "number"} +} + +``` + +Field | Description +---|--- +`cache_properties` | list of up to 4 cache property type strings. Ex. data ("DATA_CACHE"), instruction ("INST_CACHE"), CPU ("CPU_CACHE"), or SIMD ("SIMD_CACHE"). 
+`cache_size` | size of cache in KB +`cache_level` | level of cache +`max_num_cu_shared` | max number of compute units shared +`num_cache_instance` | number of cache instances + +Exceptions that can be thrown by `amdsmi_get_gpu_cache_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + cache_info = amdsmi_get_gpu_cache_info(device) + for cache_index, cache_values in cache_info.items(): + print(cache_values['cache_properties']) + print(cache_values['cache_size']) + print(cache_values['cache_level']) + print(cache_values['max_num_cu_shared']) + print(cache_values['num_cache_instance']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_vbios_info + +Description: Returns the static information for the VBIOS on the device. + +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with fields + +Field | Description +---|--- +`name` | vbios name +`build_date` | vbios build date +`part_number` | vbios part number +`version` | vbios version string + +Exceptions that can be thrown by `amdsmi_get_gpu_vbios_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + vbios_info = amdsmi_get_gpu_vbios_info(device) + print(vbios_info['name']) + print(vbios_info['build_date']) + print(vbios_info['part_number']) + print(vbios_info['version']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_fw_info + +Description: Returns GPU firmware related information. 
+ +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with fields + +Field | Description +---|--- +`fw_list` | List of dictionaries that contain information about a certain firmware block + +Exceptions that can be thrown by `amdsmi_get_fw_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + firmware_list = amdsmi_get_fw_info(device)['fw_list'] + for firmware_block in firmware_list: + print(firmware_block['fw_name']) + # String formated hex or decimal value ie: 21.00.00.AC or 130 + print(firmware_block['fw_version']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_activity + +Description: Returns the engine usage for the given GPU. +It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary of activites to their respective usage percentage or 'N/A' if not supported + +Field | Description +---|--- +`gfx_activity` | graphics engine usage percentage (0 - 100) +`umc_activity` | memory engine usage percentage (0 - 100) +`mm_activity` | average multimedia engine usages in percentage (0 - 100) + +Exceptions that can be thrown by `amdsmi_get_gpu_activity` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + engine_usage = amdsmi_get_gpu_activity(device) + print(engine_usage['gfx_activity']) + print(engine_usage['umc_activity']) + print(engine_usage['mm_activity']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_power_info + +Description: Returns the current power and voltage for the given GPU. 
+It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with fields + +Field | Description +---|--- +`average_socket_power` | average socket power +`gfx_voltage` | voltage gfx +`power_limit` | power limit + +Exceptions that can be thrown by `amdsmi_get_power_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + power_measure = amdsmi_get_power_info(device) + print(power_measure['average_socket_power']) + print(power_measure['gfx_voltage']) + print(power_measure['power_limit']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_vram_usage + +Description: Returns total VRAM and VRAM in use + +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with fields + +Field | Description +---|--- +`vram_total` | VRAM total +`vram_used` | VRAM currently in use + +Exceptions that can be thrown by `amdsmi_get_gpu_vram_usage` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + vram_usage = amdsmi_get_gpu_vram_usage(device) + print(vram_usage['vram_used']) + print(vram_usage['vram_total']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_clock_info + +Description: Returns the clock measure for the given GPU. 
+It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` device which to query +* `clock_type` one of `AmdSmiClkType` enum values: + +Field | Description +---|--- +`SYS` | SYS clock type +`GFX` | GFX clock type +`DF` | DF clock type +`DCEF` | DCEF clock type +`SOC` | SOC clock type +`MEM` | MEM clock type +`PCIE` | PCIE clock type +`VCLK0` | VCLK0 clock type +`VCLK1` | VCLK1 clock type +`DCLK0` | DCLK0 clock type +`DCLK1` | DCLK1 clock type + +Output: Dictionary with fields + +Field | Description +---|--- +`cur_clk` | Current clock for given clock type +`max_clk` | Maximum clock for given clock type +`min_clk` | Minimum clock for given clock type + +Exceptions that can be thrown by `amdsmi_get_clock_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + clock_measure = amdsmi_get_clock_info(device, AmdSmiClkType.GFX) + print(clock_measure['cur_clk']) + print(clock_measure['min_clk']) + print(clock_measure['max_clk']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_pcie_info + +Description: Returns the pcie metric and static information for the given GPU. +It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with 2 fields `pcie_static` and `pcie_metric` + +Fields | Description +---|--- +`pcie_static` |
Subfield | Description
`max_pcie_width` | Maximum number of PCIe lanes available
`max_pcie_speed` | Maximum capable PCIe speed in GT/s
`pcie_interface_version` | PCIe generation, e.g. 3, 4, 5...
`slot_type` | Form factor of the slot: OAM, PCIE, CEM, or Unknown
+`pcie_metric` |
Subfield | Description
`pcie_width` | Current number of PCIe lanes available
`pcie_speed` | Current PCIe speed in GT/s
`pcie_bandwidth` | Current instantaneous bandwidth usage in Mb/s
`pcie_replay_count` | Total number of PCIe replays (NAKs)
`pcie_l0_to_recovery_count` | Accumulated count of PCIe L0-to-recovery state transitions
`pcie_replay_roll_over_count` | Accumulated count of PCIe replay rollovers
`pcie_nak_sent_count` | Accumulated count of PCIe NAKs sent
`pcie_nak_received_count` | Accumulated count of PCIe NAKs received
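
The two-level layout described in the table can be flattened when the values need to be logged or exported. The following is a minimal sketch under stated assumptions, not part of the AMD SMI API: `flatten_pcie_info` is a hypothetical helper, and `sample` is made-up data shaped like the documented dictionary so that the snippet runs without a GPU.

```python
def flatten_pcie_info(pcie_info):
    """Flatten the two documented sub-dictionaries into 'group.subfield' keys."""
    flat = {}
    for group in ("pcie_static", "pcie_metric"):
        # Tolerate a missing group instead of raising KeyError.
        for subfield, value in pcie_info.get(group, {}).items():
            flat[f"{group}.{subfield}"] = value
    return flat


# Made-up values for illustration only; on real hardware this dictionary
# would come from amdsmi_get_pcie_info(device).
sample = {
    "pcie_static": {"max_pcie_width": 16, "max_pcie_speed": 32,
                    "pcie_interface_version": 5, "slot_type": "OAM"},
    "pcie_metric": {"pcie_width": 16, "pcie_speed": 32, "pcie_replay_count": 0},
}

for key, value in flatten_pcie_info(sample).items():
    print(f"{key}: {value}")
```

Keeping the group name in each key avoids confusing similarly named static and metric subfields such as `max_pcie_width` and `pcie_width`.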
+ +Exceptions that can be thrown by `amdsmi_get_pcie_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + pcie_info = amdsmi_get_pcie_info(device) + print(pcie_info["pcie_static"]) + print(pcie_info["pcie_metric"]) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_bad_page_info + +Description: Returns bad page info for the given GPU. +It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` device which to query + +Output: List consisting of dictionaries with fields for each bad page found + +Field | Description +---|--- +`value` | Value of page +`page_address` | Address of bad page +`page_size` | Size of bad page +`status` | Status of bad page + +Exceptions that can be thrown by `amdsmi_get_gpu_bad_page_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + bad_page_info = amdsmi_get_gpu_bad_page_info(device) + if not len(bad_page_info): + print("No bad pages found") + continue + for bad_page in bad_page_info: + print(bad_page["value"]) + print(bad_page["page_address"]) + print(bad_page["page_size"]) + print(bad_page["status"]) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_process_list + +Description: Returns the list of processes running on the target GPU; May require root level access + +Input parameters: + +* `processor_handle` device which to query + +Output: List of Dictionaries with the corresponding fields; empty list if no running process are detected + +Field | Description +---|--- +`name` | Name of process +`pid` | Process ID +`mem` | Process memory usage 
+`engine_usage` |
Subfield | Description
`gfx` | GFX engine usage in ns
`enc` | Encode engine usage in ns
+`memory_usage` |
Subfield | Description
`gtt_mem` | GTT memory usage
`cpu_mem` | CPU memory usage
`vram_mem` | VRAM memory usage
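
As a sketch of how the nested `engine_usage` and `memory_usage` subfields might be consumed, the snippet below builds a one-line summary per process. `summarize_process` is a hypothetical helper, the sample entry is invented, and the code assumes the engine usage values are numeric. The ns-to-ms conversion relies only on the "in ns" unit documented above; the memory values are printed as returned because their unit is not stated here.

```python
def summarize_process(process):
    """Build a one-line summary from one entry of the documented process list."""
    engine = process.get("engine_usage", {})
    memory = process.get("memory_usage", {})
    # Engine usage is documented in nanoseconds; convert to milliseconds.
    gfx_ms = engine.get("gfx", 0) / 1_000_000
    enc_ms = engine.get("enc", 0) / 1_000_000
    return (f"{process.get('name', 'N/A')} (pid {process.get('pid', 'N/A')}): "
            f"gfx {gfx_ms:.1f} ms, enc {enc_ms:.1f} ms, "
            f"vram_mem {memory.get('vram_mem', 'N/A')}")


# Invented entry for illustration; real entries would come from
# amdsmi_get_gpu_process_list(device).
sample_process = {
    "name": "demo_app",
    "pid": 1234,
    "mem": 4096,
    "engine_usage": {"gfx": 2_500_000, "enc": 0},
    "memory_usage": {"gtt_mem": 1024, "cpu_mem": 2048, "vram_mem": 8192},
}

print(summarize_process(sample_process))
```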
+ +Exceptions that can be thrown by `amdsmi_get_gpu_process_list` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + processes = amdsmi_get_gpu_process_list(device) + if len(processes) == 0: + print("No processes running on this GPU") + else: + for process in processes: + print(process) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_total_ecc_count + +Description: Returns the ECC error count for the given GPU. +It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with fields + +Field | Description +---|--- +`correctable_count` | Correctable ECC error count +`uncorrectable_count` | Uncorrectable ECC error count +`deferred_count` | Deferred ECC error count + +Exceptions that can be thrown by `amdsmi_get_gpu_total_ecc_count` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + ecc_error_count = amdsmi_get_gpu_total_ecc_count(device) + print(ecc_error_count["correctable_count"]) + print(ecc_error_count["uncorrectable_count"]) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_board_info + +Description: Returns board info for the given GPU + +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with fields correctable and uncorrectable + +Field | Description +---|--- +`model_number` | Board serial number +`product_serial` | Product serial +`fru_id` | FRU ID +`product_name` | Product name +`manufacturer_name` | Manufacturer name + +Exceptions that can be thrown by `amdsmi_get_gpu_board_info` function: + +* 
`AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + device = amdsmi_get_processor_handle_from_bdf("0000:23:00.0") + board_info = amdsmi_get_gpu_board_info(device) + print(board_info["model_number"]) + print(board_info["product_serial"]) + print(board_info["fru_id"]) + print(board_info["product_name"]) + print(board_info["manufacturer_name"]) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_ras_feature_info + +Description: Returns RAS version and schema information. +It is not supported on virtual machine guests. + +Input parameters: + +* `processor_handle` device to query + +Output: List containing dictionaries with fields + +Field | Description +---|--- +`eeprom_version` | eeprom version +`parity_schema` | parity schema +`single_bit_schema` | single bit schema +`double_bit_schema` | double bit schema +`poison_schema` | poison schema + +Exceptions that can be thrown by `amdsmi_get_gpu_ras_feature_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + ras_info = amdsmi_get_gpu_ras_feature_info(device) + print(ras_info) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_ras_block_features_enabled + +Description: Returns status of each RAS block for the given GPU.
+It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` device which to query + +Output: List containing dictionaries with fields for each RAS block + +Field | Description +---|--- +`block` | RAS block +`status` | RAS block status + +Exceptions that can be thrown by `amdsmi_get_gpu_ras_block_features_enabled` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + ras_block_features = amdsmi_get_gpu_ras_block_features_enabled(device) + print(ras_block_features) +except AmdSmiException as e: + print(e) +``` + +### AmdSmiEventReader class + +Description: Providing methods for event monitoring. This is context manager class. +Can be used with `with` statement for automatic cleanup. + +Methods: + +#### Constructor + +Description: Allocates a new event reader notifier to monitor different types of events for the given GPU + +Input parameters: + +* `processor_handle` device handle corresponding to the device on which to listen for events +* `event_types` list of event types from AmdSmiEvtNotificationType enum. Specifying which events to collect for the given device. + +Event Type | Description +---|------ +`VMFAULT` | VM page fault +`THERMAL_THROTTLE` | thermal throttle +`GPU_PRE_RESET` | gpu pre reset +`GPU_POST_RESET` | gpu post reset + +#### read + +Description: Reads events on the given device. When event is caught, device handle, message and event type are returned. Reading events stops when timestamp passes without event reading. + +Input parameters: + +* `timestamp` number of milliseconds to wait for an event to occur. If event does not happen monitoring is finished +* `num_elem` number of events. This is optional parameter. Default value is 10. 
+ +#### stop + +Description: Frees any resources used by event notification for the given device. It can be called explicitly or +automatically using a `with` statement, as in the examples below. It should be called, either manually or automatically, for every AmdSmiEventReader object that is created. + +Input parameters: `None` + +Example with manual cleanup of AmdSmiEventReader: + +```python +event = None +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + event = AmdSmiEventReader(devices[0], [AmdSmiEvtNotificationType.GPU_PRE_RESET, AmdSmiEvtNotificationType.GPU_POST_RESET]) + event.read(10000) +except AmdSmiException as e: + print(e) +finally: + if event is not None: + event.stop() +``` + +Example with automatic cleanup using a `with` statement: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + with AmdSmiEventReader(devices[0], [AmdSmiEvtNotificationType.GPU_PRE_RESET, AmdSmiEvtNotificationType.GPU_POST_RESET]) as event: + event.read(10000) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_pci_bandwidth + +Description: Controls the set of allowed PCIe bandwidths that can be used. +It is not supported on virtual machine guests. + +Input parameters: + +* `processor_handle` handle for the given device +* `bw_bitmask` A bitmask indicating the indices of the bandwidths that are +to be enabled (1) and disabled (0) + +Output: None + +Exceptions that can be thrown by `amdsmi_set_gpu_pci_bandwidth` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_gpu_pci_bandwidth(device, 0) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_power_cap + +Description: Set the power cap value.
It is not supported on virtual machine +guest + +Input parameters: + +* `processor_handle` handle for the given device +* `sensor_ind` a 0-based sensor index. Normally, this will be 0. If a +device has more than one sensor, it could be greater than 0 +* `cap` int that indicates the desired power cap, in microwatts + +Output: None + +Exceptions that can be thrown by `amdsmi_set_power_cap` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + power_cap = 250 * 1000000 + amdsmi_set_power_cap(device, 0, power_cap) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_power_profile + +Description: Set the power profile. It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `reserved` Not currently used, set to 0 +* `profile` a amdsmi_power_profile_preset_masks_t that hold the mask of +the desired new power profile + +Output: None + +Exceptions that can be thrown by `amdsmi_set_gpu_power_profile` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + profile = ... + amdsmi_set_gpu_power_profile(device, 0, profile) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_clk_range + +Description: This function sets the clock range information. 
+It is not supported on virtual machine guests. + +Input parameters: + +* `processor_handle` handle for the given device +* `min_clk_value` minimum clock value for desired clock range +* `max_clk_value` maximum clock value for desired clock range +* `clk_type` clock type the range applies to: `AmdSmiClkType.SYS` or `AmdSmiClkType.MEM` + +Output: None + +Exceptions that can be thrown by `amdsmi_set_gpu_clk_range` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_gpu_clk_range(device, 0, 1000, AmdSmiClkType.SYS) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_bdf_id + +Description: Get the unique PCI device identifier associated with a device + +Input parameters: + +* `processor_handle` device to query + +Output: device BDF ID +The format of the BDF ID is as follows: + +BDFID = ((DOMAIN & 0xffffffff) << 32) | ((BUS & 0xff) << 8) | + ((DEVICE & 0x1f) << 3) | (FUNCTION & 0x7) + +| Name | Field | +| -------- | ------- | +| Domain | [63:32] | +| Reserved | [31:16] | +| Bus | [15: 8] | +| Device | [ 7: 3] | +| Function | [ 2: 0] | + +Exceptions that can be thrown by `amdsmi_get_gpu_bdf_id` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + bdfid = amdsmi_get_gpu_bdf_id(device) + print(bdfid) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_pci_bandwidth + +Description: Get the list of possible PCIe bandwidths that are available.
+It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with the possible T/s values and associated number of lanes + +Field | Content +---|--- +`transfer_rate` | transfer_rate dictionary +`lanes` | lanes + +transfer_rate dictionary + +Field | Content +---|--- +`num_supported` | num_supported +`current` | current +`frequency` | list of frequency + +Exceptions that can be thrown by `amdsmi_get_gpu_pci_bandwidth` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + bandwidth = amdsmi_get_gpu_pci_bandwidth(device) + print(bandwidth) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_pci_throughput + +Description: Get PCIe traffic information. It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with the fields + +Field | Content +---|--- +`sent` | number of bytes sent in 1 second +`received` | the number of bytes received +`max_pkt_sz` | maximum packet size + +Exceptions that can be thrown by `amdsmi_get_gpu_pci_throughput` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + pci = amdsmi_get_gpu_pci_throughput(device) + print(pci) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_pci_replay_counter + +Description: Get PCIe replay counter + +Input parameters: + +* `processor_handle` device which to query + +Output: counter value +The sum of the NAK's received and generated by the GPU + +Exceptions that can be thrown by `amdsmi_get_gpu_pci_replay_counter` function: + +* 
`AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + counter = amdsmi_get_gpu_pci_replay_counter(device) + print(counter) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_topo_numa_affinity + +Description: Get the NUMA node associated with a device + +Input parameters: + +* `processor_handle` device which to query + +Output: NUMA node value + +Exceptions that can be thrown by `amdsmi_get_gpu_topo_numa_affinity` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + numa_node = amdsmi_get_gpu_topo_numa_affinity(device) + print(numa_node) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_energy_count + +Description: Get the energy accumulator counter of the device. 
+It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` device which to query + +Output: Dictionary with fields + +Field | Content +---|--- +`power` | power +`counter_resolution` | counter resolution +`timestamp` | timestamp + +Exceptions that can be thrown by `amdsmi_get_energy_count` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + power = amdsmi_get_energy_count(device) + print(power) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_memory_total + +Description: Get the total amount of memory that exists + +Input parameters: + +* `processor_handle` device which to query +* `mem_type` enum AmdSmiMemoryType + +Output: total amount of memory + +Exceptions that can be thrown by `amdsmi_get_gpu_memory_total` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + memory = amdsmi_get_gpu_memory_total(device, AmdSmiMemoryType.VRAM) + print(memory) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_od_clk_info + +Description: This function sets the clock frequency information. +It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `level` AMDSMI_FREQ_IND_MIN|AMDSMI_FREQ_IND_MAX to set the minimum (0) +or maximum (1) speed +* `clk_value` value to apply to the clock range +* `clk_type` AMDSMI_CLK_TYPE_SYS | AMDSMI_CLK_TYPE_MEM range type + +Output: None + +Exceptions that can be thrown by `amdsmi_set_gpu_od_clk_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = 
amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_gpu_od_clk_info( + device, + AmdSmiFreqInd.AMDSMI_FREQ_IND_MAX, + 1000, + AmdSmiClkType.AMDSMI_CLK_TYPE_SYS + ) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_memory_usage + +Description: Get the current memory usage + +Input parameters: + +* `processor_handle` device which to query +* `mem_type` enum AmdSmiMemoryType + +Output: the amount of memory currently being used + +Exceptions that can be thrown by `amdsmi_get_gpu_memory_usage` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + memory = amdsmi_get_gpu_memory_usage(device, AmdSmiMemoryType.VRAM) + print(memory) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_od_volt_info + +Description: This function sets one of the three voltage curve points. +It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `vpoint` voltage point [0|1|2] on the voltage curve +* `clk_value` clock value component of voltage curve point +* `volt_value` voltage value component of voltage curve point + +Output: None + +Exceptions that can be thrown by `amdsmi_set_gpu_od_volt_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_gpu_od_volt_info(device, 1, 1000, 980) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_fan_rpms + +Description: Get the fan speed in RPMs of the device with the specified device +handle and 0-based sensor index. 
It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `sensor_idx` a 0-based sensor index. Normally, this will be 0. If a device has +more than one sensor, it could be greater than 0. + +Output: Fan speed in rpms as integer + +Exceptions that can be thrown by `amdsmi_get_gpu_fan_rpms` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + fan_rpm = amdsmi_get_gpu_fan_rpms(device, 0) + print(fan_rpm) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_fan_speed + +Description: Get the fan speed for the specified device as a value relative to +AMDSMI_MAX_FAN_SPEED. It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `sensor_idx` a 0-based sensor index. Normally, this will be 0. If a device has +more than one sensor, it could be greater than 0. + +Output: Fan speed in relative to MAX + +Exceptions that can be thrown by `amdsmi_get_gpu_fan_speed` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + fan_speed = amdsmi_get_gpu_fan_speed(device, 0) + print(fan_speed) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_fan_speed_max + +Description: Get the max fan speed of the device with provided device handle. +It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `sensor_idx` a 0-based sensor index. Normally, this will be 0. If a device has +more than one sensor, it could be greater than 0. 
+ +Output: Max fan speed as integer + +Exceptions that can be thrown by `amdsmi_get_gpu_fan_speed_max` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + max_fan_speed = amdsmi_get_gpu_fan_speed_max(device, 0) + print(max_fan_speed) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_is_gpu_power_management_enabled + +Description: Returns is power management enabled + +Input parameters: + +* `processor_handle` GPU device which to query + +Output: Bool true if power management enabled else false + +Exceptions that can be thrown by `amdsmi_is_gpu_power_management_enabled` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for processor in devices: + is_power_management_enabled = amdsmi_is_gpu_power_management_enabled(processor) + print(is_power_management_enabled) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_temp_metric + +Description: Get the temperature metric value for the specified metric, from the +specified temperature sensor on the specified device. 
It is not supported on virtual +machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `sensor_type` part of device from which temperature should be obtained +* `metric` enum indicating which temperature value should be retrieved + +Output: Temperature as integer in millidegrees Celsius + +Exceptions that can be thrown by `amdsmi_get_temp_metric` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + temp_metric = amdsmi_get_temp_metric(device, AmdSmiTemperatureType.EDGE, + AmdSmiTemperatureMetric.CURRENT) + print(temp_metric) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_volt_metric + +Description: Get the voltage metric value for the specified metric, from the +specified voltage sensor on the specified device. It is not supported on virtual +machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `sensor_type` part of device from which voltage should be obtained +* `metric` enum indicating which voltage value should be retrieved + +Output: Voltage as integer in millivolts + +Exceptions that can be thrown by `amdsmi_get_gpu_volt_metric` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + voltage = amdsmi_get_gpu_volt_metric(device, AmdSmiVoltageType.VDDGFX, + AmdSmiVoltageMetric.AVERAGE) + print(voltage) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_utilization_count + +Description: Get coarse grain utilization counter of the specified device + +Input parameters: + +* `processor_handle` handle for the given device +* `counter_types` variable number of 
counter types desired + +Output: List containing dictionaries with fields + +Field | Description +---|--- +`timestamp` | The timestamp when the counter is retrieved - Resolution: 1 ns +`Dictionary for each counter` | 
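<table><thead><tr><th>Subfield</th><th>Description</th></tr></thead><tbody><tr><td>`type`</td><td>Type of utilization counter</td></tr><tr><td>`value`</td><td>Value of the utilization counter</td></tr></tbody></table>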
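
As a rough illustration of consuming this output, the sketch below splits a result list into its timestamp and a type-to-value mapping. The `sample` list is made-up data shaped like the table above (one dictionary holding `timestamp`, followed by one dictionary per counter); that exact layout is an assumption, and `split_utilization` is a hypothetical helper that is not part of AMD SMI.

```python
def split_utilization(result):
    """Separate the timestamp entry from the per-counter entries."""
    timestamp = None
    counters = {}
    for entry in result:
        if "timestamp" in entry:
            timestamp = entry["timestamp"]
        else:
            # Each remaining entry is assumed to carry `type` and `value`.
            counters[entry["type"]] = entry["value"]
    return timestamp, counters


# Made-up sample data, not real library output.
sample = [
    {"timestamp": 1000},
    {"type": "COARSE_GRAIN_GFX_ACTIVITY", "value": 42},
    {"type": "COARSE_GRAIN_MEM_ACTIVITY", "value": 7},
]

timestamp, counters = split_utilization(sample)
print(timestamp, counters)
```

Keying the counters by type makes it easier to compare two successive readings, since the order of the counter entries then no longer matters.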
+ +Exceptions that can be thrown by `amdsmi_get_utilization_count` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + utilization = amdsmi_get_utilization_count( + device, + AmdSmiUtilizationCounterType.COARSE_GRAIN_GFX_ACTIVITY + ) + print(utilization) + utilization = amdsmi_get_utilization_count( + device, + AmdSmiUtilizationCounterType.COARSE_GRAIN_GFX_ACTIVITY, + AmdSmiUtilizationCounterType.COARSE_GRAIN_MEM_ACTIVITY + ) + print(utilization) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_perf_level + +Description: Get the performance level of the device with provided device handle. +It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device + +Output: Performance level as enum value of dev_perf_level_t + +Exceptions that can be thrown by `amdsmi_get_gpu_perf_level` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + perf_level = amdsmi_get_gpu_perf_level(device) + print(perf_level) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_perf_determinism_mode + +Description: Enter performance determinism mode with provided device handle. 
+It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `clkvalue` softmax value for GFXCLK in MHz + +Output: None + +Exceptions that can be thrown by `amdsmi_set_gpu_perf_determinism_mode` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_gpu_perf_determinism_mode(device, 1333) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_process_isolation + +Description: Get the status of the Process Isolation + +Input parameters: + +* `processor_handle` handle for the given device + +Output: integer corresponding to isolation_status; 0 - disabled, 1 - enabled + +Exceptions that can be thrown by `amdsmi_get_gpu_process_isolation` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + isolate = amdsmi_get_gpu_process_isolation(device) + print("Process Isolation Status: ", isolate) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_process_isolation +Description: Enable/disable the system Process Isolation for the given device handle. + +Input parameters: + +* `processor_handle` handle for the given device +* `pisolate` the process isolation status to set. 0 is the process isolation disabled, and 1 is the process isolation enabled. 
+ +Output: None + +Exceptions that can be thrown by `amdsmi_set_gpu_process_isolation` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_gpu_process_isolation(device, 1) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_clear_sram_data +Description: Clear the SRAM data of the given device. This can be called between user logins to prevent information leak. + +Input parameters: + +* `processor_handle` handle for the given device +* `sclean` the clean flag. Only 1 will take effect and other number are reserved for future usage. + +Output: None + +Exceptions that can be thrown by `amdsmi_set_gpu_clear_sram_data` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_gpu_clear_sram_data(device, 1) +except AmdSmiException as e: + print(e) +``` + + +### amdsmi_get_gpu_overdrive_level + +Description: Get the overdrive percent associated with the device with provided +device handle. 
It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device + +Output: Overdrive percentage as integer + +Exceptions that can be thrown by `amdsmi_get_gpu_overdrive_level` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + od_level = amdsmi_get_gpu_overdrive_level(device) + print(od_level) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_clk_freq + +Description: Get the list of possible system clock speeds of the device for a +specified clock type. It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `clk_type` the type of clock for which the frequency is desired + +Output: Dictionary with fields + +Field | Description +---|--- +`num_supported` | The number of supported frequencies +`current` | The current frequency index +`frequency` | List of frequencies, only the first num_supported frequencies are valid + +Exceptions that can be thrown by `amdsmi_get_clk_freq` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_get_clk_freq(device, AmdSmiClkType.SYS) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_od_volt_info + +Description: This function retrieves the voltage/frequency curve information. +It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device + +Output: Dictionary with fields + +Field | Description +---|--- +`curr_sclk_range` | 
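<table><thead><tr><th>Subfield</th><th>Description</th></tr></thead><tbody><tr><td>`lower_bound`</td><td>lower bound sclk range</td></tr><tr><td>`upper_bound`</td><td>upper bound sclk range</td></tr></tbody></table>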
+`curr_mclk_range` |
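<table><thead><tr><th>Subfield</th><th>Description</th></tr></thead><tbody><tr><td>`lower_bound`</td><td>lower bound mclk range</td></tr><tr><td>`upper_bound`</td><td>upper bound mclk range</td></tr></tbody></table>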
+`sclk_freq_limits` |
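<table><thead><tr><th>Subfield</th><th>Description</th></tr></thead><tbody><tr><td>`lower_bound`</td><td>lower bound sclk range limit</td></tr><tr><td>`upper_bound`</td><td>upper bound sclk range limit</td></tr></tbody></table>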
+`mclk_freq_limits` |
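<table><thead><tr><th>Subfield</th><th>Description</th></tr></thead><tbody><tr><td>`lower_bound`</td><td>lower bound mclk range limit</td></tr><tr><td>`upper_bound`</td><td>upper bound mclk range limit</td></tr></tbody></table>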
+`curve.vc_points` | The number of supported frequencies +`num_regions` | The current frequency index + +Exceptions that can be thrown by `amdsmi_get_gpu_od_volt_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_get_gpu_od_volt_info(device) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_metrics_info + +Description: This function retrieves the GPU metrics information. It is not +supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device + +Output: Dictionary with fields + +| Field | Description | Unit | +|-------|-------------|------| +`temperature_edge` | Edge temperature value | Celsius (C) +`temperature_hotspot` | Hotspot (aka junction) temperature value | Celsius (C) +`temperature_mem` | Memory temperature value | Celsius (C) +`temperature_vrgfx` | vrgfx temperature value | Celsius (C) +`temperature_vrsoc` | vrsoc temperature value | Celsius (C) +`temperature_vrmem` | vrmem temperature value | Celsius (C) +`average_gfx_activity` | Average gfx activity | % +`average_umc_activity` | Average umc (Universal Memory Controller) activity | % +`average_mm_activity` | Average mm (multimedia) engine activity | % +`average_socket_power` | Average socket power | W +`energy_accumulator` | Energy accumulated with a 15.3 uJ resolution over 1 ns | uJ +`system_clock_counter` | System clock counter | ns +`average_gfxclk_frequency` | Average gfx clock frequency | MHz +`average_socclk_frequency` | Average soc clock frequency | MHz +`average_uclk_frequency` | Average uclk frequency | MHz +`average_vclk0_frequency` | Average vclk0 frequency | MHz +`average_dclk0_frequency` | Average dclk0 frequency | MHz +`average_vclk1_frequency` | Average vclk1 frequency | MHz +`average_dclk1_frequency` | Average dclk1 
frequency | MHz +`current_gfxclk` | Current gfx clock | MHz +`current_socclk` | Current soc clock | MHz +`current_uclk` | Current uclk | MHz +`current_vclk0` | Current vclk0 | MHz +`current_dclk0` | Current dclk0 | MHz +`current_vclk1` | Current vclk1 | MHz +`current_dclk1` | Current dclk1 | MHz +`throttle_status` | Current throttle status | MHz +`current_fan_speed` | Current fan speed | RPM +`pcie_link_width` | PCIe link width (number of lanes) | lanes +`pcie_link_speed` | PCIe link speed in 0.1 GT/s (Giga Transfers per second) | GT/s +`padding` | padding +`gfx_activity_acc` | gfx activity accumulated | % +`mem_activity_acc` | Memory activity accumulated | % +`temperature_hbm` | list of hbm temperatures | Celsius (C) +`firmware_timestamp` | timestamp from PMFW (10ns resolution) | ns +`voltage_soc` | soc voltage | mV +`voltage_gfx` | gfx voltage | mV +`voltage_mem` | mem voltage | mV +`indep_throttle_status` | ASIC independent throttle status (see drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h for bit flags) | +`current_socket_power` | Current socket power (also known as instant socket power) | W +`vcn_activity` | List of VCN encode/decode engine utilization per AID | % +`gfxclk_lock_status` | Clock lock status. Bits 0:7 correspond to each gfx clock engine instance. 
Bits 0:5 for APU/AID devices | +`xgmi_link_width` | XGMI bus width | lanes +`xgmi_link_speed` | XGMI bitrate | GB/s +`pcie_bandwidth_acc` | PCIe accumulated bandwidth | GB/s +`pcie_bandwidth_inst` | PCIe instantaneous bandwidth | GB/s +`pcie_l0_to_recov_count_acc` | PCIe L0 to recovery state transition accumulated count | +`pcie_replay_count_acc` | PCIe replay accumulated count | +`pcie_replay_rover_count_acc` | PCIe replay rollover accumulated count | +`xgmi_read_data_acc` | XGMI accumulated read data transfer size (KiloBytes) | KB +`xgmi_write_data_acc` | XGMI accumulated write data transfer size (KiloBytes) | KB +`current_gfxclks` | List of current gfx clock frequencies | MHz +`current_socclks` | List of current soc clock frequencies | MHz +`current_vclk0s` | List of current v0 clock frequencies | MHz +`current_dclk0s` | List of current d0 clock frequencies | MHz +`pcie_nak_sent_count_acc` | PCIe NAK sent count accumulated | +`pcie_nak_rcvd_count_acc` | PCIe NAK received count accumulated | +`jpeg_activity` | List of JPEG engine activity | % + +Exceptions that can be thrown by `amdsmi_get_gpu_metrics_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_get_gpu_metrics_info(device) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_od_volt_curve_regions + +Description: This function will retrieve the current valid regions in the +frequency/voltage space. It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `num_regions` number of freq volt regions + +Output: List containing a dictionary with fields for each freq volt region + +Field | Description +---|--- +`freq_range` | 
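<table><thead><tr><th>Subfield</th><th>Description</th></tr></thead><tbody><tr><td>`lower_bound`</td><td>lower bound freq range</td></tr><tr><td>`upper_bound`</td><td>upper bound freq range</td></tr></tbody></table>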
+`volt_range` |
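<table><thead><tr><th>Subfield</th><th>Description</th></tr></thead><tbody><tr><td>`lower_bound`</td><td>lower bound volt range</td></tr><tr><td>`upper_bound`</td><td>upper bound volt range</td></tr></tbody></table>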
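
One possible use of this output is checking whether a frequency/voltage pair falls inside any valid region. The sketch below assumes each list entry is a dictionary with `freq_range` and `volt_range` sub-dictionaries, as the table above describes. `point_in_regions` is a hypothetical helper and `sample_regions` is made-up, unitless data, since the units are not stated here.

```python
def point_in_regions(freq, volt, regions):
    """Return True if (freq, volt) lies inside at least one region."""
    for region in regions:
        freq_range = region["freq_range"]
        volt_range = region["volt_range"]
        in_freq = freq_range["lower_bound"] <= freq <= freq_range["upper_bound"]
        in_volt = volt_range["lower_bound"] <= volt <= volt_range["upper_bound"]
        if in_freq and in_volt:
            return True
    return False


# Made-up sample data, not real library output.
sample_regions = [
    {"freq_range": {"lower_bound": 500, "upper_bound": 1000},
     "volt_range": {"lower_bound": 700, "upper_bound": 900}},
    {"freq_range": {"lower_bound": 1000, "upper_bound": 1500},
     "volt_range": {"lower_bound": 900, "upper_bound": 1100}},
]

inside = point_in_regions(1200, 950, sample_regions)
outside = point_in_regions(1200, 700, sample_regions)
print(inside, outside)
```

The bounds are treated as inclusive in this sketch; whether the library considers them inclusive is not specified above.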
+ +Exceptions that can be thrown by `amdsmi_get_gpu_od_volt_curve_regions` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_get_gpu_od_volt_curve_regions(device, 3) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_power_profile_presets + +Description: Get the list of available preset power profiles and an indication of +which profile is currently active. It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `sensor_idx` a 0-based sensor index. Normally, this will be 0. + +Output: Dictionary with fields + +Field | Description +---|--- +`available_profiles` | Which profiles are supported by this system +`current` | Which power profile is currently active +`num_profiles` | How many power profiles are available + +Exceptions that can be thrown by `amdsmi_get_gpu_power_profile_presets` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_get_gpu_power_profile_presets(device, 0) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_gpu_counter_group_supported + +Description: Tell if an event group is supported by a given device. 
+It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` device which to query +* `event_group` event group being checked for support + +Output: None + +Exceptions that can be thrown by `amdsmi_gpu_counter_group_supported` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_gpu_counter_group_supported(device, AmdSmiEventGroup.XGMI) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_gpu_create_counter + +Description: Creates a performance counter object + +Input parameters: + +* `processor_handle` device which to query +* `event_type` event group being checked for support + +Output: An event handle of the newly created performance counter object + +Exceptions that can be thrown by `amdsmi_gpu_create_counter` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + event_handle = amdsmi_gpu_create_counter(device, AmdSmiEventGroup.XGMI) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_gpu_destroy_counter + +Description: Destroys a performance counter object + +Input parameters: + +* `event_handle` event handle of the performance counter object + +Output: None + +Exceptions that can be thrown by `amdsmi_gpu_destroy_counter` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + event_handle = amdsmi_gpu_create_counter(device, AmdSmiEventGroup.XGMI) + amdsmi_gpu_destroy_counter(event_handle) +except 
AmdSmiException as e: + print(e) +``` + +### amdsmi_gpu_control_counter + +Description: Issue performance counter control commands. It is not supported +on virtual machine guest + +Input parameters: + +* `event_handle` event handle of the performance counter object +* `counter_command` command being passed to counter as AmdSmiCounterCommand + +Output: None + +Exceptions that can be thrown by `amdsmi_gpu_control_counter` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + event_handle = amdsmi_gpu_create_counter(device, AmdSmiEventType.XGMI_1_REQUEST_TX) + amdsmi_gpu_control_counter(event_handle, AmdSmiCounterCommand.CMD_START) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_gpu_read_counter + +Description: Read the current value of a performance counter + +Input parameters: + +* `event_handle` event handle of the performance counter object + +Output: Dictionary with fields + +Field | Description +---|--- +`value` | Counter value +`time_enabled` | Time that the counter was enabled in nanoseconds +`time_running` | Time that the counter was running in nanoseconds + +Exceptions that can be thrown by `amdsmi_gpu_read_counter` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + event_handle = amdsmi_gpu_create_counter(device, AmdSmiEventType.XGMI_1_REQUEST_TX) + amdsmi_gpu_read_counter(event_handle) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_available_counters + +Description: Get the number of currently available counters. 
It is not supported +on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `event_group` event group being checked as AmdSmiEventGroup + +Output: Number of available counters for the given device of the inputted event group + +Exceptions that can be thrown by `amdsmi_get_gpu_available_counters` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + available_counters = amdsmi_get_gpu_available_counters(device, AmdSmiEventGroup.XGMI) + print(available_counters) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_perf_level + +Description: Set a desired performance level for given device. It is not +supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `perf_level` performance level being set as AmdSmiDevPerfLevel + +Output: None + +Exceptions that can be thrown by `amdsmi_set_gpu_perf_level` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_gpu_perf_level(device, AmdSmiDevPerfLevel.STABLE_PEAK) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_reset_gpu + +Description: Reset the gpu associated with the device with provided device handle +It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device + +Output: None + +Exceptions that can be thrown by `amdsmi_reset_gpu` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + 
print("No GPUs on machine") + else: + for device in devices: + amdsmi_reset_gpu(device) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_fan_speed + +Description: Set the fan speed for the specified device with the provided speed, +in RPMs. It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `sensor_idx` sensor index as integer +* `fan_speed` the speed to which the function will attempt to set the fan + +Output: None + +Exceptions that can be thrown by `amdsmi_set_gpu_fan_speed` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_gpu_fan_speed(device, 0, 1333) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_reset_gpu_fan + +Description: Reset the fan to automatic driver control. It is not +supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `sensor_idx` sensor index as integer + +Output: None + +Exceptions that can be thrown by `amdsmi_reset_gpu_fan` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_reset_gpu_fan(device, 0) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_clk_freq + +Description: Control the set of allowed frequencies that can be used for the +specified clock. 
It is not supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `clk_type` the type of clock for which the set of frequencies will be modified +as AmdSmiClkType +* `freq_bitmask` bitmask indicating the indices of the frequencies that are to +be enabled (1) and disabled (0). Only the lowest ::amdsmi_frequencies_t.num_supported +bits of this mask are relevant. + +Output: None + +Exceptions that can be thrown by `amdsmi_set_clk_freq` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + freq_bitmask = 0 + amdsmi_set_clk_freq(device, AmdSmiClkType.GFX, freq_bitmask) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_dpm_policy + +Description: Get dpm policy information. + +Input parameters: + +* `processor_handle` handle for the given device +* `policy_id` the policy id to set. + +Output: Dictionary with fields + +Field | Description +---|--- +`num_supported` | total number of supported policies +`current_id` | current policy id +`policies` | list of dictionaries containing possible policies + +Exceptions that can be thrown by `amdsmi_get_dpm_policy` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + dpm_policies = amdsmi_get_dpm_policy(device) + print(dpm_policies) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_dpm_policy + +Description: Set the dpm policy to corresponding policy_id. Typically following: 0(default),1,2,3 + +Input parameters: + +* `processor_handle` handle for the given device +* `policy_id` the policy id to set. 
+ +Output: None + +Exceptions that can be thrown by `amdsmi_set_dpm_policy` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_dpm_policy(device, 0) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_xgmi_plpd + +Description: Set the xgmi per-link power down policy parameter for the processor + +Input parameters: + +* `processor_handle` handle for the given device +* `policy_id` the xgmi plpd id to set. + +Output: None + +Exceptions that can be thrown by `amdsmi_set_xgmi_plpd` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_xgmi_plpd(device, 0) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_xgmi_plpd + +Description: Get the xgmi per-link power down policy parameter for the processor + +Input parameters: + +* `processor_handle` handle for the given device + +Output: Dict containing information about xgmi per-link power down policy + +Field | Description +---|--- +`num_supported` | The number of supported policies +`current_id` | The current policy index +`plpds` | List of policies. 
+ +Exceptions that can be thrown by `amdsmi_get_xgmi_plpd` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + xgmi_plpd = amdsmi_get_xgmi_plpd(device) + print(xgmi_plpd) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_overdrive_level + +Description: **deprecated** Set the overdrive percent associated with the +device with provided device handle with the provided value. It is not +supported on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `overdrive_value` value to which the overdrive level should be set + +Output: None + +Exceptions that can be thrown by `amdsmi_set_gpu_overdrive_level` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_gpu_overdrive_level(device, 0) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_ecc_count + +Description: Retrieve the error counts for a GPU block. 
It is not supported +on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `block` The block for which error counts should be retrieved + +Output: Dict containing information about error counts + +Field | Description +---|--- +`correctable_count` | Count of correctable errors +`uncorrectable_count` | Count of uncorrectable errors +`deferred_count` | Count of deferred errors + +Exceptions that can be thrown by `amdsmi_get_gpu_ecc_count` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + ecc_count = amdsmi_get_gpu_ecc_count(device, AmdSmiGpuBlock.UMC) + print(ecc_count) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_ecc_enabled + +Description: Retrieve the enabled ECC bit-mask. It is not supported on virtual +machine guest + +Input parameters: + +* `processor_handle` handle for the given device + +Output: Enabled ECC bit-mask + +Exceptions that can be thrown by `amdsmi_get_gpu_ecc_enabled` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + enabled = amdsmi_get_gpu_ecc_enabled(device) + print(enabled) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_ecc_status + +Description: Retrieve the ECC status for a GPU block. 
It is not supported +on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device +* `block` The block for which ECC status should be retrieved + +Output: ECC status for a requested GPU block + +Exceptions that can be thrown by `amdsmi_get_gpu_ecc_status` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + status = amdsmi_get_gpu_ecc_status(device, AmdSmiGpuBlock.UMC) + print(status) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_status_code_to_string + +Description: Get a description of a provided AMDSMI error status + +Input parameters: + +* `status` The error status for which a description is desired + +Output: String description of the provided error code + +Exceptions that can be thrown by `amdsmi_status_code_to_string` function: + +* `AmdSmiParameterException` + +Example: + +```python +try: + status_str = amdsmi_status_code_to_string(ctypes.c_uint32(0)) + print(status_str) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_compute_process_info + +Description: Get process information about processes currently using GPU + +Input parameters: None + +Output: List of Python dicts, each containing information about one process + +Field | Description +---|--- +`process_id` | Process ID +`pasid` | PASID +`vram_usage` | VRAM usage +`sdma_usage` | SDMA usage in microseconds +`cu_occupancy` | Compute Unit usage in percent + +Exceptions that can be thrown by `amdsmi_get_gpu_compute_process_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` + +Example: + +```python +try: + procs = amdsmi_get_gpu_compute_process_info() + for proc in procs: + print(proc) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_compute_process_info_by_pid + +Description: Get process information 
about processes currently using GPU + +Input parameters: + +* `pid` The process ID for which process information is being requested + +Output: Dict containing information about the process + +Field | Description +---|--- +`process_id` | Process ID +`pasid` | PASID +`vram_usage` | VRAM usage +`sdma_usage` | SDMA usage in microseconds +`cu_occupancy` | Compute Unit usage in percent + +Exceptions that can be thrown by `amdsmi_get_gpu_compute_process_info_by_pid` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + pid = 0 # << valid pid here + proc = amdsmi_get_gpu_compute_process_info_by_pid(pid) + print(proc) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_compute_process_gpus + +Description: Get the device indices currently being used by a process + +Input parameters: + +* `pid` The process ID of the process for which the GPUs currently in use are requested + +Output: List of indices of devices currently being used by the process + +Exceptions that can be thrown by `amdsmi_get_gpu_compute_process_gpus` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + pid = 0 # << valid pid here + indices = amdsmi_get_gpu_compute_process_gpus(pid) + print(indices) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_gpu_xgmi_error_status + +Description: Retrieve the XGMI error status for a device. 
It is not supported on +virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device + +Output: XGMI error status for a requested device + +Exceptions that can be thrown by `amdsmi_gpu_xgmi_error_status` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + status = amdsmi_gpu_xgmi_error_status(device) + print(status) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_reset_gpu_xgmi_error + +Description: Reset the XGMI error status for a device. It is not supported +on virtual machine guest + +Input parameters: + +* `processor_handle` handle for the given device + +Output: None + +Exceptions that can be thrown by `amdsmi_reset_gpu_xgmi_error` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_reset_gpu_xgmi_error(device) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_vendor_name + +Description: Returns the device vendor name + +Input parameters: + +* `processor_handle` device to query + +Output: device vendor name + +Exceptions that can be thrown by `amdsmi_get_gpu_vendor_name` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + vendor_name = amdsmi_get_gpu_vendor_name(device) + print(vendor_name) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_id + +Description: Get the device ID associated with the device with the provided device handle + +Input parameters: + +* 
`processor_handle` device to query + +Output: device id + +Exceptions that can be thrown by `amdsmi_get_gpu_id` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + dev_id = amdsmi_get_gpu_id(device) + print(dev_id) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_vram_vendor + +Description: Get the VRAM vendor string of a GPU device. + +Input parameters: + +* `processor_handle` device to query + +Output: VRAM vendor + +Exceptions that can be thrown by `amdsmi_get_gpu_vram_vendor` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + vram_vendor = amdsmi_get_gpu_vram_vendor(device) + print(vram_vendor) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_subsystem_id + +Description: Get the subsystem device id associated with the device with the provided device handle. 
+ +Input parameters: + +* `processor_handle` device to query + +Output: subsystem device id + +Exceptions that can be thrown by `amdsmi_get_gpu_subsystem_id` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + subsystem_id = amdsmi_get_gpu_subsystem_id(device) + print(subsystem_id) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_subsystem_name + +Description: Get the name string for the device subsystem + +Input parameters: + +* `processor_handle` device to query + +Output: device subsystem + +Exceptions that can be thrown by `amdsmi_get_gpu_subsystem_name` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + subsystem_name = amdsmi_get_gpu_subsystem_name(device) + print(subsystem_name) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_lib_version + +Description: Get the build version information for the currently running build of AMDSMI. 
+ +Output: amdsmi build version + +Exceptions that can be thrown by `amdsmi_get_lib_version` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + version = amdsmi_get_lib_version() + print(version) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_topo_get_numa_node_number + +Description: Retrieve the NUMA CPU node number for a device + +Input parameters: + +* `processor_handle` device to query + +Output: node number of NUMA CPU for the device + +Exceptions that can be thrown by `amdsmi_topo_get_numa_node_number` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + node_number = amdsmi_topo_get_numa_node_number(device) + print(node_number) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_topo_get_link_weight + +Description: Retrieve the weight for a connection between 2 GPUs. 
+ +Input parameters: + +* `processor_handle_src` the source device handle +* `processor_handle_dest` the destination device handle + +Output: the weight for a connection between 2 GPUs + +Exceptions that can be thrown by `amdsmi_topo_get_link_weight` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + processor_handle_src = devices[0] + processor_handle_dest = devices[1] + weight = amdsmi_topo_get_link_weight(processor_handle_src, processor_handle_dest) + print(weight) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_minmax_bandwidth_between_processors + +Description: Retrieve the minimum and maximum I/O link bandwidth between 2 GPUs. + +Input parameters: + +* `processor_handle_src` the source device handle +* `processor_handle_dest` the destination device handle + +Output: Dictionary with fields: + +Field | Description +---|--- +`min_bandwidth` | minimum bandwidth for the connection +`max_bandwidth` | maximum bandwidth for the connection + +Exceptions that can be thrown by `amdsmi_get_minmax_bandwidth_between_processors` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + processor_handle_src = devices[0] + processor_handle_dest = devices[1] + bandwidth = amdsmi_get_minmax_bandwidth_between_processors(processor_handle_src, processor_handle_dest) + print(bandwidth['min_bandwidth']) + print(bandwidth['max_bandwidth']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_topo_get_link_type + +Description: Retrieve the hops and the connection type between 2 GPUs + +Input parameters: + +* `processor_handle_src` the source device handle +* `processor_handle_dest` the destination device handle + 
+Output: Dictionary with fields: + +Field | Description +---|--- +`hops` | number of hops +`type` | the connection type + +Exceptions that can be thrown by `amdsmi_topo_get_link_type` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + processor_handle_src = devices[0] + processor_handle_dest = devices[1] + link_type = amdsmi_topo_get_link_type(processor_handle_src, processor_handle_dest) + print(link_type['hops']) + print(link_type['type']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_is_P2P_accessible + +Description: Return P2P availability status between 2 GPUs + +Input parameters: + +* `processor_handle_src` the source device handle +* `processor_handle_dest` the destination device handle + +Output: P2P availability status between 2 GPUs + +Exceptions that can be thrown by `amdsmi_is_P2P_accessible` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + processor_handle_src = devices[0] + processor_handle_dest = devices[1] + accessible = amdsmi_is_P2P_accessible(processor_handle_src, processor_handle_dest) + print(accessible) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_compute_partition + +Description: Get the compute partition from the given GPU + +Input parameters: + +* `processor_handle` the device handle + +Output: String of the partition type + +Exceptions that can be thrown by `amdsmi_get_gpu_compute_partition` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: 
+ compute_partition_type = amdsmi_get_gpu_compute_partition(device) + print(compute_partition_type) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_compute_partition + +Description: Set the compute partition to the given GPU + +Input parameters: + +* `processor_handle` the device handle +* `compute_partition` the type of compute_partition to set + +Output: String of the partition type + +Exceptions that can be thrown by `amdsmi_set_gpu_compute_partition` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + compute_partition = AmdSmiComputePartitionType.SPX + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_gpu_compute_partition(device, compute_partition) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_reset_gpu_compute_partition + +Description: Reset the compute partitioning on the given GPU + +Input parameters: + +* `processor_handle` the device handle + +Output: String of the partition type + +Exceptions that can be thrown by `amdsmi_reset_gpu_compute_partition` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_reset_gpu_compute_partition(device) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_gpu_memory_partition + +Description: Get the memory partition from the given GPU + +Input parameters: + +* `processor_handle` the device handle + +Output: String of the partition type + +Exceptions that can be thrown by `amdsmi_get_gpu_memory_partition` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + 
print("No GPUs on machine") + else: + for device in devices: + memory_partition_type = amdsmi_get_gpu_memory_partition(device) + print(memory_partition_type) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_gpu_memory_partition + +Description: Set the memory partition to the given GPU + +Input parameters: + +* `processor_handle` the device handle +* `memory_partition` the type of memory_partition to set + +Output: String of the partition type + +Exceptions that can be thrown by `amdsmi_set_gpu_memory_partition` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + memory_partition = AmdSmiMemoryPartitionType.NPS1 + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_set_gpu_memory_partition(device, memory_partition) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_reset_gpu_memory_partition + +Description: Reset the memory partitioning on the given GPU + +Input parameters: + +* `processor_handle` the device handle + +Output: String of the partition type + +Exceptions that can be thrown by `amdsmi_reset_gpu_memory_partition` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + amdsmi_reset_gpu_memory_partition(device) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_xgmi_info + +Description: Returns XGMI information for the GPU. 
+ +Input parameters: + +* `processor_handle` device handle + +Output: Dictionary with fields: + +Field | Description +---|--- +`xgmi_lanes` | xgmi lanes +`xgmi_hive_id` | xgmi hive id +`xgmi_node_id` | xgmi node id +`index` | index + +Exceptions that can be thrown by `amdsmi_get_xgmi_info` function: + +* `AmdSmiLibraryException` +* `AmdSmiRetryException` +* `AmdSmiParameterException` + +Example: + +```python +try: + devices = amdsmi_get_processor_handles() + if len(devices) == 0: + print("No GPUs on machine") + else: + for device in devices: + xgmi_info = amdsmi_get_xgmi_info(device) + print(xgmi_info['xgmi_lanes']) + print(xgmi_info['xgmi_hive_id']) + print(xgmi_info['xgmi_node_id']) + print(xgmi_info['index']) +except AmdSmiException as e: + print(e) +``` + +## CPU APIs + +### amdsmi_get_processor_info + +**Note: CURRENTLY HARDCODED TO RETURN EMPTY VALUES** +Description: Return processor name + +Input parameters: +`processor_handle` processor handle + +Output: Processor name + +Exceptions that can be thrown by `amdsmi_get_processor_info` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_processor_handles() + if len(processor_handles) == 0: + print("No processors on machine") + else: + for processor in processor_handles: + print(amdsmi_get_processor_info(processor)) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_hsmp_proto_ver + +Description: Get the hsmp protocol version. 
+ +Output: amdsmi hsmp protocol version + +Exceptions that can be thrown by `amdsmi_get_cpu_hsmp_proto_ver` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + version = amdsmi_get_cpu_hsmp_proto_ver(processor) + print(version) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_smu_fw_version + +Description: Get the SMU Firmware version. + +Output: amdsmi SMU Firmware version + +Exceptions that can be thrown by `amdsmi_get_cpu_smu_fw_version` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + version = amdsmi_get_cpu_smu_fw_version(processor) + print(version['debug']) + print(version['minor']) + print(version['major']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_prochot_status + +Description: Get the CPU's prochot status. + +Output: amdsmi cpu prochot status + +Exceptions that can be thrown by `amdsmi_get_cpu_prochot_status` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + prochot = amdsmi_get_cpu_prochot_status(processor) + print(prochot) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_fclk_mclk + +Description: Get the Data fabric clock and Memory clock in MHz. 
+ +Output: amdsmi data fabric clock and memory clock + +Exceptions that can be thrown by `amdsmi_get_cpu_fclk_mclk` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + clk = amdsmi_get_cpu_fclk_mclk(processor) + for fclk, mclk in clk.items(): + print(fclk) + print(mclk) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_cclk_limit + +Description: Get the core clock in MHz. + +Output: amdsmi core clock + +Exceptions that can be thrown by `amdsmi_get_cpu_cclk_limit` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + cclk_limit = amdsmi_get_cpu_cclk_limit(processor) + print(cclk_limit) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_socket_current_active_freq_limit + +Description: Get current active frequency limit of the socket. 
+ +Output: amdsmi frequency value in MHz and frequency source name + +Exceptions that can be thrown by `amdsmi_get_cpu_socket_current_active_freq_limit` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + freq_limit = amdsmi_get_cpu_socket_current_active_freq_limit(processor) + for freq, src in freq_limit.items(): + print(freq) + print(src) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_socket_freq_range + +Description: Get socket frequency range + +Output: amdsmi maximum frequency and minimum frequency + +Exceptions that can be thrown by `amdsmi_get_cpu_socket_freq_range` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + freq_range = amdsmi_get_cpu_socket_freq_range(processor) + for fmax, fmin in freq_range.items(): + print(fmax) + print(fmin) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_core_current_freq_limit + +Description: Get socket frequency limit of the core + +Output: amdsmi frequency + +Exceptions that can be thrown by `amdsmi_get_cpu_core_current_freq_limit` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpucore_handles() + if len(processor_handles) == 0: + print("No CPU cores on machine") + else: + for processor in processor_handles: + freq_limit = amdsmi_get_cpu_core_current_freq_limit(processor) + print(freq_limit) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_socket_power + +Description: Get the socket power. 
+ +Output: amdsmi socket power + +Exceptions that can be thrown by `amdsmi_get_cpu_socket_power` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + sock_power = amdsmi_get_cpu_socket_power(processor) + print(sock_power) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_socket_power_cap + +Description: Get the socket power cap. + +Output: amdsmi socket power cap + +Exceptions that can be thrown by `amdsmi_get_cpu_socket_power_cap` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + sock_power = amdsmi_get_cpu_socket_power_cap(processor) + print(sock_power) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_socket_power_cap_max + +Description: Get the socket power cap max. + +Output: amdsmi socket power cap max + +Exceptions that can be thrown by `amdsmi_get_cpu_socket_power_cap_max` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + sock_power = amdsmi_get_cpu_socket_power_cap_max(processor) + print(sock_power) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_pwr_svi_telemetry_all_rails + +Description: Get the SVI based power telemetry for all rails. 
+ +Output: amdsmi svi based power value + +Exceptions that can be thrown by `amdsmi_get_cpu_pwr_svi_telemetry_all_rails` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + power = amdsmi_get_cpu_pwr_svi_telemetry_all_rails(processor) + print(power) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_cpu_socket_power_cap + +Description: Set the power cap value for a given socket. + +Input: amdsmi socket power cap value + +Exceptions that can be thrown by `amdsmi_set_cpu_socket_power_cap` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + power = amdsmi_set_cpu_socket_power_cap(processor, 1000) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_cpu_pwr_efficiency_mode + +Description: Set the power efficiency profile policy. 
+ +Input: mode(0, 1, or 2) + +Exceptions that can be thrown by `amdsmi_set_cpu_pwr_efficiency_mode` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + policy = amdsmi_set_cpu_pwr_efficiency_mode(processor, 0) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_core_boostlimit + +Description: Get boost limit of the cpu core + +Output: amdsmi frequency + +Exceptions that can be thrown by `amdsmi_get_cpu_core_boostlimit` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpucore_handles() + if len(processor_handles) == 0: + print("No CPU cores on machine") + else: + for processor in processor_handles: + boost_limit = amdsmi_get_cpu_core_boostlimit(processor) + print(boost_limit) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_socket_c0_residency + +Description: Get the cpu socket C0 residency. + +Output: amdsmi C0 residency value + +Exceptions that can be thrown by `amdsmi_get_cpu_socket_c0_residency` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + c0_residency = amdsmi_get_cpu_socket_c0_residency(processor) + print(c0_residency) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_cpu_core_boostlimit + +Description: Set the cpu core boost limit. 
+ +Input: amdsmi boostlimit value + +Exceptions that can be thrown by `amdsmi_set_cpu_core_boostlimit` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpucore_handles() + if len(processor_handles) == 0: + print("No CPU cores on machine") + else: + for processor in processor_handles: + boost_limit = amdsmi_set_cpu_core_boostlimit(processor, 1000) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_cpu_socket_boostlimit + +Description: Set the cpu socket boost limit. + +Input: amdsmi boostlimit value + +Exceptions that can be thrown by `amdsmi_set_cpu_socket_boostlimit` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + boost_limit = amdsmi_set_cpu_socket_boostlimit(processor, 1000) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_ddr_bw + +Description: Get the CPU DDR Bandwidth. + +Output: amdsmi ddr bandwidth data + +Exceptions that can be thrown by `amdsmi_get_cpu_ddr_bw` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + ddr_bw = amdsmi_get_cpu_ddr_bw(processor) + print(ddr_bw['max_bw']) + print(ddr_bw['utilized_bw']) + print(ddr_bw['utilized_pct']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_socket_temperature + +Description: Get the socket temperature.
+ +Output: amdsmi temperature value + +Exceptions that can be thrown by `amdsmi_get_cpu_socket_temperature` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + ptmon = amdsmi_get_cpu_socket_temperature(processor) + print(ptmon) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_dimm_temp_range_and_refresh_rate + +Description: Get DIMM temperature range and refresh rate. + +Output: amdsmi dimm metric data + +Exceptions that can be thrown by `amdsmi_get_cpu_dimm_temp_range_and_refresh_rate` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + dimm = amdsmi_get_cpu_dimm_temp_range_and_refresh_rate(processor) + print(dimm['range']) + print(dimm['ref_rate']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_dimm_power_consumption + +Description: Get DIMM power consumption. + +Output: amdsmi dimm power consumption value + +Exceptions that can be thrown by `amdsmi_get_cpu_dimm_power_consumption` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + dimm = amdsmi_get_cpu_dimm_power_consumption(processor) + print(dimm['power']) + print(dimm['update_rate']) + print(dimm['dimm_addr']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_dimm_thermal_sensor + +Description: Get DIMM thermal sensor value.
+ +Output: amdsmi dimm temperature data + +Exceptions that can be thrown by `amdsmi_get_cpu_dimm_thermal_sensor` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + dimm = amdsmi_get_cpu_dimm_thermal_sensor(processor) + print(dimm['sensor']) + print(dimm['update_rate']) + print(dimm['dimm_addr']) + print(dimm['temp']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_cpu_xgmi_width + +Description: Set xgmi width. + +Input: amdsmi xgmi width + +Exceptions that can be thrown by `amdsmi_set_cpu_xgmi_width` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + xgmi_width = amdsmi_set_cpu_xgmi_width(processor, 0, 100) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_cpu_gmi3_link_width_range + +Description: Set gmi3 link width range. + +Input: minimum & maximum link width to be set. + +Exceptions that can be thrown by `amdsmi_set_cpu_gmi3_link_width_range` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + gmi_link_width = amdsmi_set_cpu_gmi3_link_width_range(processor, 0, 100) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_cpu_apb_enable + +Description: Enable APB. 
+ +Input: amdsmi processor handle + +Exceptions that can be thrown by `amdsmi_cpu_apb_enable` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + apb_enable = amdsmi_cpu_apb_enable(processor) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_cpu_apb_disable + +Description: Disable APB. + +Input: pstate value + +Exceptions that can be thrown by `amdsmi_cpu_apb_disable` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + apb_disable = amdsmi_cpu_apb_disable(processor, 0) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_cpu_socket_lclk_dpm_level + +Description: Set NBIO lclk dpm level value. + +Input: nbio id, min value, max value + +Exceptions that can be thrown by `amdsmi_set_cpu_socket_lclk_dpm_level` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + nbio = amdsmi_set_cpu_socket_lclk_dpm_level(processor, 0, 0, 2) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_socket_lclk_dpm_level + +Description: Get NBIO LCLK dpm level.
+ +Output: nbio id + +Exceptions that can be thrown by `amdsmi_get_cpu_socket_lclk_dpm_level` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + nbio = amdsmi_get_cpu_socket_lclk_dpm_level(processor) + print(nbio['max_dpm_level']) + print(nbio['min_dpm_level']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_cpu_pcie_link_rate + +Description: Set pcie link rate. + +Input: rate control value + +Exceptions that can be thrown by `amdsmi_set_cpu_pcie_link_rate` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + link_rate = amdsmi_set_cpu_pcie_link_rate(processor, 0, 0) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_set_cpu_df_pstate_range + +Description: Set df pstate range. + +Input: max pstate, min pstate + +Exceptions that can be thrown by `amdsmi_set_cpu_df_pstate_range` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + pstate_range = amdsmi_set_cpu_df_pstate_range(processor, 0, 2) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_current_io_bandwidth + +Description: Get current input output bandwidth.
+ +Output: current IO bandwidth for the requested link id and bandwidth type + +Exceptions that can be thrown by `amdsmi_get_cpu_current_io_bandwidth` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + io_bw = amdsmi_get_cpu_current_io_bandwidth(processor) + print(io_bw) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_current_xgmi_bw + +Description: Get current xgmi bandwidth. + +Output: current xgmi bandwidth for the requested link id and bandwidth type + +Exceptions that can be thrown by `amdsmi_get_cpu_current_xgmi_bw` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + xgmi_bw = amdsmi_get_cpu_current_xgmi_bw(processor) + print(xgmi_bw) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_hsmp_metrics_table_version + +Description: Get HSMP metrics table version.
+ +Output: amdsmi HSMP metrics table version + +Exceptions that can be thrown by `amdsmi_get_hsmp_metrics_table_version` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + met_ver = amdsmi_get_hsmp_metrics_table_version(processor) + print(met_ver) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_hsmp_metrics_table + +Description: Get HSMP metrics table + +Output: HSMP metric table data + +Exceptions that can be thrown by `amdsmi_get_hsmp_metrics_table` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + mtbl = amdsmi_get_hsmp_metrics_table(processor) + print(mtbl['accumulation_counter']) + print(mtbl['max_socket_temperature']) + print(mtbl['max_vr_temperature']) + print(mtbl['max_hbm_temperature']) + print(mtbl['socket_power_limit']) + print(mtbl['max_socket_power_limit']) + print(mtbl['socket_power']) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_first_online_core_on_cpu_socket + +Description: Get first online core on cpu socket. + +Output: first online core on cpu socket + +Exceptions that can be thrown by `amdsmi_first_online_core_on_cpu_socket` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + processor_handles = amdsmi_get_cpusocket_handles() + if len(processor_handles) == 0: + print("No CPU sockets on machine") + else: + for processor in processor_handles: + pcore_ind = amdsmi_first_online_core_on_cpu_socket(processor) + print(pcore_ind) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_family + +Description: Get cpu family. 
+ +Output: cpu family + +Exceptions that can be thrown by `amdsmi_get_cpu_family` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + cpu_family = amdsmi_get_cpu_family() + print(cpu_family) +except AmdSmiException as e: + print(e) +``` + +### amdsmi_get_cpu_model + +Description: Get cpu model. + +Output: cpu model + +Exceptions that can be thrown by `amdsmi_get_cpu_model` function: + +* `AmdSmiLibraryException` + +Example: + +```python +try: + cpu_model = amdsmi_get_cpu_model() + print(cpu_model) +except AmdSmiException as e: + print(e) +``` diff --git a/docs/index.md b/docs/index.md deleted file mode 100644 index 451bedaec2..0000000000 --- a/docs/index.md +++ /dev/null @@ -1,2 +0,0 @@ -```{include} ../README.md -``` diff --git a/docs/index.rst b/docs/index.rst new file mode 100644 index 0000000000..7d5725cba1 --- /dev/null +++ b/docs/index.rst @@ -0,0 +1,50 @@ +.. meta:: + :description: AMDSMI documentation and API reference library + :keywords: amdsmi, ROCm, API, documentation + +******************************************************************** +AMD SMI documentation +******************************************************************** + +The AMD System Management Interface (SMI) Library, or AMD SMI library, is a C library for Linux that provides a user space interface for applications to monitor and control AMD devices. + +You can access the AMD SMI code on the `GitHub repository `_. + +.. Note:: + +This project is a successor to `rocm_smi_lib. `_ + +.. grid:: 2 + :gutter: 3 + + .. grid-item-card:: Install + + * :doc:`AMD SMI installation <./install/install>` + + .. grid-item-card:: API reference + + * :doc:`Files <../doxygen/docBin/html/files>` + * :doc:`Globals <../doxygen/docBin/html/globals>` + * :doc:`Data structures <../doxygen/docBin/html/annotated>` + * :doc:`Modules <../doxygen/docBin/html/modules>` + * :doc:`Data fields <../doxygen/docBin/html/functions_data_fields>` + + .. 
grid-item-card:: How to + + * :doc:`Use AMD SMI for C++ library ` + * :doc:`Use AMD SMI for Python library ` + * :doc:`Use AMD SMI CLI tool ` + + + .. grid-item-card:: Tutorials + + * `AMD SMI GitHub samples `_ + * `ROCm SMI Github samples `_ + + +To contribute to the documentation, refer to +`Contributing to ROCm `_. + +You can find licensing information on the +`Licensing `_ page. + diff --git a/docs/install/install.rst b/docs/install/install.rst new file mode 100644 index 0000000000..fa6355e720 --- /dev/null +++ b/docs/install/install.rst @@ -0,0 +1,147 @@ +.. meta:: + :description: Install AMD SMI + :keywords: install, SMI, AMD, ROCm + +******************************************************************** +Installation +******************************************************************** + +AMD System Management Interface (AMD SMI) library +------------------------------------------------- + +The AMD System Management Interface Library (AMD SMI library) is a C library for Linux that provides a user space interface for applications to monitor and control AMD devices. + +.. Note:: + +This project is a successor to `rocm_smi_lib. `_ + +Supported platforms +===================== +In its initial release, the AMD SMI library supports Linux bare metal and Linux virtual machine guest for AMD GPUs. In a future release, the library will extend to support AMD EPYC™ CPUs. + +The AMD SMI library can run on AMD ROCm-supported platforms. Refer to `System requirements - Linux `_ for more information. + +To run the AMD SMI library, the `amdgpu` driver and the `hsmp` driver must be installed. Optionally, `libdrm` can be installed to query firmware information and hardware IPs. + + +CLI tool and libraries installation +------------------------------------ + +Requirements +============= + +* Python 3.6.8+ 64-bit +* amdgpu driver must be loaded for `amdsmi_init()` to pass + +Installation steps +------------------- + +1. Install amdgpu using ROCm. + +2. Install amdgpu driver. 
See the following example. Note that your release and link may differ. The `amdgpu-install --usecase=rocm` command installs both the amdgpu driver update and the AMD SMI packages on your device. + +.. code-block:: shell + + sudo apt update + + wget https://repo.radeon.com/amdgpu-install/6.0.2/ubuntu/jammy/amdgpu-install_6.0.60002-1_all.deb + + sudo apt install ./amdgpu-install_6.0.60002-1_all.deb + + sudo amdgpu-install --usecase=rocm + + amd-smi --help + +3. Install AMD SMI without ROCm (example for Ubuntu 22.04). + +.. code-block:: bash + + apt install amd-smi-lib + + # If installed with ROCm, skip the export + + export PATH="${PATH:+${PATH}:}/opt/rocm/bin" + + amd-smi --help + + +Optional autocompletion +------------------------ + +The `amd-smi` CLI application supports autocompletion. The package attempts to install it; if argcomplete is not installed, you can enable it with the following commands: + +.. code:: bash + + python3 -m pip install argcomplete + + activate-global-python-argcomplete --user + + # restart shell to enable + + +Manual/Multiple ROCm instance Python library install +------------------------------------------------------ + +If there are multiple ROCm installations and `pyenv` is not being used to select the correct amdsmi version, you must uninstall previous versions of AMD SMI and install the version you want directly from your ROCm instance. + +Python library install example for Ubuntu 22.04 +================================================= + +1. Remove any existing AMD SMI installation: + +.. code-block:: bash + + python3 -m pip list | grep amd + + python3 -m pip uninstall amdsmi + + +2. Install Python library from your target ROCm instance: + +.. code:: bash + + apt install amd-smi-lib + + cd /opt/rocm/share/amd_smi + + python3 -m pip install --upgrade pip + + python3 -m pip install --user . + + +Now you have the AMD SMI Python library in your Python path: + + +.. 
code:: bash + + ~$ python3 + + Python 3.8.10 (default, May 26 2023, 14:05:08) + + [GCC 9.4.0] on linux + + Type "help", "copyright", "credits" or "license" for more information. + + >>> import amdsmi + + +Sphinx documentation +===================== + +Run the following commands to build the documentation locally: + +.. code-block:: bash + + cd docs + + python3 -m pip install -r sphinx/requirements.txt + + python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html + + +The output is available in `docs/_build/html`. + +For additional details, see `Contribute to ROCm documentation `_. + diff --git a/docs/reference/index.rst b/docs/reference/index.rst new file mode 100644 index 0000000000..019a2bb523 --- /dev/null +++ b/docs/reference/index.rst @@ -0,0 +1,14 @@ +.. meta:: + :description: AMD SMI API reference + :keywords: API, reference, SMI, AMD, ROCm + +****************** +API reference +****************** + +This section provides technical descriptions and important information about the different AMD SMI library components. + +* :doc:`Library <../doxygen/docBin/html/files>` +* :doc:`Functions <../doxygen/docBin/html/globals>` +* :doc:`Data structures <../doxygen/docBin/html/annotated>` + diff --git a/docs/rocm-smi-lib/tutorials/c++_tutorials.rst b/docs/rocm-smi-lib/tutorials/c++_tutorials.rst new file mode 100644 index 0000000000..8c8d82aacd --- /dev/null +++ b/docs/rocm-smi-lib/tutorials/c++_tutorials.rst @@ -0,0 +1,34 @@ +==================== +C++ Tutorials +==================== + +This chapter contains the ROCm SMI C++ API tutorials. + +.. 
code-block:: c++ + + #include <stdint.h> + #include "rocm_smi/rocm_smi.h" + int main() { + + rsmi_status_t ret; + uint32_t num_devices; + uint16_t dev_id; + + // We will skip return code checks for this example, but it + // is recommended to always check this as some calls may not + // apply for some devices or ROCm releases + + ret = rsmi_init(0); + ret = rsmi_num_monitor_devices(&num_devices); + + for (uint32_t i = 0; i < num_devices; ++i) { + ret = rsmi_dev_id_get(i, &dev_id); + // dev_id holds the device ID of device i, upon a + // successful call + } + ret = rsmi_shut_down(); + return 0; + } + +For more examples, see the `C++ example `_ +or `tests. `_ diff --git a/docs/rocm-smi-lib/tutorials/index.rst b/docs/rocm-smi-lib/tutorials/index.rst new file mode 100644 index 0000000000..a0915ee846 --- /dev/null +++ b/docs/rocm-smi-lib/tutorials/index.rst @@ -0,0 +1,13 @@ + +.. meta:: + :description: ROCm SMI documentation and API reference library + :keywords: SMI, ROCm, API, documentation + + +**************************************************** +ROCm System Management Interface (ROCm SMI) library +**************************************************** + +The ROCm System Management Interface Library, or ROCm SMI library, is part of the ROCm software stack. It is a C library for Linux that provides a user space interface for applications to monitor and control GPU applications. + +ROCm SMI Library still works in the current release, but its documentation is now integrated with AMD SMI. For information specific to ROCm SMI Library, refer to `ROCm SMI Library `_ diff --git a/docs/rocm-smi-lib/tutorials/python_api.rst b/docs/rocm-smi-lib/tutorials/python_api.rst new file mode 100644 index 0000000000..604803b9c0 --- /dev/null +++ b/docs/rocm-smi-lib/tutorials/python_api.rst @@ -0,0 +1,269 @@ +==================== +Python API Reference +==================== + +This chapter describes the ROCm SMI Python module API. + +.. default-domain:: py +.. 
py:currentmodule:: rocm_smi + +Functions +--------- + +.. autofunction:: rocm_smi.driverInitialized + +.. autofunction:: rocm_smi.formatJson + +.. autofunction:: rocm_smi.formatCsv + +.. autofunction:: rocm_smi.formatMatrixToJSON + +.. autofunction:: rocm_smi.getBus + +.. autofunction:: rocm_smi.getFanSpeed + +.. autofunction:: rocm_smi.getGpuUse + +.. autofunction:: rocm_smi.getDRMDeviceId + +.. autofunction:: rocm_smi.getSubsystemId + +.. autofunction:: rocm_smi.getVendor + +.. autofunction:: rocm_smi.getGUID + +.. autofunction:: rocm_smi.getTargetGfxVersion + +.. autofunction:: rocm_smi.getNodeId + +.. autofunction:: rocm_smi.getDeviceName + +.. autofunction:: rocm_smi.getRev + +.. autofunction:: rocm_smi.getMaxPower + +.. autofunction:: rocm_smi.getMemInfo + +.. autofunction:: rocm_smi.getProcessName + +.. autofunction:: rocm_smi.getPerfLevel + +.. autofunction:: rocm_smi.getPid + +.. autofunction:: rocm_smi.getPidList + +.. autofunction:: rocm_smi.getPower + +.. autofunction:: rocm_smi.getRasEnablement + +.. autofunction:: rocm_smi.getTemp + +.. autofunction:: rocm_smi.findFirstAvailableTemp + +.. autofunction:: rocm_smi.getTemperatureLabel + +.. autofunction:: rocm_smi.getPowerLabel + +.. autofunction:: rocm_smi.getVbiosVersion + +.. autofunction:: rocm_smi.getVersion + +.. autofunction:: rocm_smi.getComputePartition + +.. autofunction:: rocm_smi.getMemoryPartition + +.. autofunction:: rocm_smi.print2DArray + +.. autofunction:: rocm_smi.printEmptyLine + +.. autofunction:: rocm_smi.printErrLog + +.. autofunction:: rocm_smi.printInfoLog + +.. autofunction:: rocm_smi.printEventList + +.. autofunction:: rocm_smi.printLog + +.. autofunction:: rocm_smi.printListLog + +.. autofunction:: rocm_smi.printLogSpacer + +.. autofunction:: rocm_smi.printSysLog + +.. autofunction:: rocm_smi.printTableLog + +.. autofunction:: rocm_smi.printTableRow + +.. autofunction:: rocm_smi.checkIfSecondaryDie + +.. autofunction:: rocm_smi.resetClocks + +.. 
autofunction:: rocm_smi.resetFans + +.. autofunction:: rocm_smi.resetPowerOverDrive + +.. autofunction:: rocm_smi.resetProfile + +.. autofunction:: rocm_smi.resetXgmiErr + +.. autofunction:: rocm_smi.resetPerfDeterminism + +.. autofunction:: rocm_smi.resetComputePartition + +.. autofunction:: rocm_smi.resetMemoryPartition + +.. autofunction:: rocm_smi.setClockRange + +.. autofunction:: rocm_smi.setClockExtremum + +.. autofunction:: rocm_smi.setVoltageCurve + +.. autofunction:: rocm_smi.setPowerPlayTableLevel + +.. autofunction:: rocm_smi.setClockOverDrive + +.. autofunction:: rocm_smi.setClocks + +.. autofunction:: rocm_smi.setPerfDeterminism + +.. autofunction:: rocm_smi.resetGpu + +.. autofunction:: rocm_smi.isRasControlAvailable + +.. autofunction:: rocm_smi.setRas + +.. autofunction:: rocm_smi.setFanSpeed + +.. autofunction:: rocm_smi.setPerformanceLevel + +.. autofunction:: rocm_smi.setPowerOverDrive + +.. autofunction:: rocm_smi.setProfile + +.. autofunction:: rocm_smi.setComputePartition + +.. autofunction:: rocm_smi.progressbar + +.. autofunction:: rocm_smi.showProgressbar + +.. autofunction:: rocm_smi.setMemoryPartition + +.. autofunction:: rocm_smi.showVersion + +.. autofunction:: rocm_smi.showAllConcise + +.. autofunction:: rocm_smi.showAllConciseHw + +.. autofunction:: rocm_smi.showBus + +.. autofunction:: rocm_smi.showClocks + +.. autofunction:: rocm_smi.showCurrentClocks + +.. autofunction:: rocm_smi.showCurrentFans + +.. autofunction:: rocm_smi.showCurrentTemps + +.. autofunction:: rocm_smi.showFwInfo + +.. autofunction:: rocm_smi.showGpusByPid + +.. autofunction:: rocm_smi.getCoarseGrainUtil + +.. autofunction:: rocm_smi.showGpuUse + +.. autofunction:: rocm_smi.showEnergy + +.. autofunction:: rocm_smi.showId + +.. autofunction:: rocm_smi.showMaxPower + +.. autofunction:: rocm_smi.showMemInfo + +.. autofunction:: rocm_smi.showMemUse + +.. autofunction:: rocm_smi.showMemVendor + +.. autofunction:: rocm_smi.showOverDrive + +.. 
autofunction:: rocm_smi.showPcieBw + +.. autofunction:: rocm_smi.showPcieReplayCount + +.. autofunction:: rocm_smi.showPerformanceLevel + +.. autofunction:: rocm_smi.showPids + +.. autofunction:: rocm_smi.showPower + +.. autofunction:: rocm_smi.showPowerPlayTable + +.. autofunction:: rocm_smi.showProduct + +.. autofunction:: rocm_smi.showProfile + +.. autofunction:: rocm_smi.showRange + +.. autofunction:: rocm_smi.showRasInfo + +.. autofunction:: rocm_smi.showRetiredPages + +.. autofunction:: rocm_smi.showSerialNumber + +.. autofunction:: rocm_smi.showUId + +.. autofunction:: rocm_smi.showVbiosVersion + +.. autofunction:: rocm_smi.showEvents + +.. autofunction:: rocm_smi.showDriverVersion + +.. autofunction:: rocm_smi.showVoltage + +.. autofunction:: rocm_smi.showVoltageCurve + +.. autofunction:: rocm_smi.showXgmiErr + +.. autofunction:: rocm_smi.showAccessibleTopology + +.. autofunction:: rocm_smi.showWeightTopology + +.. autofunction:: rocm_smi.showHopsTopology + +.. autofunction:: rocm_smi.showTypeTopology + +.. autofunction:: rocm_smi.showNumaTopology + +.. autofunction:: rocm_smi.showHwTopology + +.. autofunction:: rocm_smi.showNodesBw + +.. autofunction:: rocm_smi.showComputePartition + +.. autofunction:: rocm_smi.showMemoryPartition + +.. autofunction:: rocm_smi.checkAmdGpus + +.. autofunction:: rocm_smi.component_str + +.. autofunction:: rocm_smi.confirmOutOfSpecWarning + +.. autofunction:: rocm_smi.doesDeviceExist + +.. autofunction:: rocm_smi.initializeRsmi + +.. autofunction:: rocm_smi.isAmdDevice + +.. autofunction:: rocm_smi.listDevices + +.. autofunction:: rocm_smi.load + +.. autofunction:: rocm_smi.padHexValue + +.. autofunction:: rocm_smi.profileString + +.. autofunction:: rocm_smi.relaunchAsSudo + +.. autofunction:: rocm_smi.rsmi_ret_ok + +.. 
autofunction:: rocm_smi.save diff --git a/docs/rocm-smi-lib/tutorials/python_tutorials.rst b/docs/rocm-smi-lib/tutorials/python_tutorials.rst new file mode 100644 index 0000000000..78a4a43db2 --- /dev/null +++ b/docs/rocm-smi-lib/tutorials/python_tutorials.rst @@ -0,0 +1,29 @@ +==================== +Python Tutorials +==================== + +This chapter contains the ROCm SMI Python API tutorials. + +.. code-block:: python + + import sys + sys.path.append("/opt/rocm/libexec/rocm_smi/") + try: + import rocm_smi + except ImportError: + raise ImportError("Could not import /opt/rocm/libexec/rocm_smi/rocm_smi.py") + + class prof_utils: + def __init__(self, mode) -> None: + rocm_smi.initializeRsmi() + + def getPower(self, device): + return rocm_smi.getPower(device) + + def listDevices(self): + return rocm_smi.listDevices() + + def getMemInfo(self, device): + (memUsed, memTotal) = rocm_smi.getMemInfo(device, "vram") + return round(float(memUsed)/float(memTotal) * 100, 2) + diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index fabff6292c..415d73783d 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -1,26 +1,49 @@ -# Anywhere {branch} is used, the branch name will be substituted. -# These comments will also be removed. defaults: - numbered: False - maxdepth: 6 + numbered: false root: index -subtrees: - - caption: AMD SMI APIs - entries: - - file: doxygen/docBin/html/index - title: C - - file: py-interface_readme_link - title: Python - - caption: CLI Tools - entries: - - file: amdsmi_cli_readme_link - title: Python CLI Tool - - file: amdsmi_release_notes_link - title: Python CLI Release Notes - - caption: Changelog - entries: - - file: amdsmi_changelog_link - title: AMD-SMI Changelog - - caption: About - entries: - - file: license +subtrees: +- entries: + - file: what-is-AMDSMI.rst + title: What is AMD SMI?
+ +- caption: Install + entries: + - file: install/install.rst + title: AMD SMI installation + + +- caption: How to + entries: + - file: how-to/using-amdsmi-for-C++.rst + title: Use AMD SMI for C++ library + - file: how-to/using-amdsmi-for-python.md + title: Use AMD SMI for Python library + - file: how-to/using-AMD-SMI-CLI-tool.md + title: Use AMD SMI CLI tool + +- caption: API reference + entries: + - file: doxygen/docBin/html/files + title: Files + - file: doxygen/docBin/html/globals + title: Globals + - file: doxygen/docBin/html/annotated + title: Data structures + - file: doxygen/docBin/html/modules + title: Modules + - file: doxygen/docBin/html/functions_data_fields + title: Data fields + + + +- caption: Tutorials + entries: + - url: https://github.com/ROCm/amdsmi/tree/develop/example + title: AMD SMI GitHub samples + - url: https://github.com/ROCm/rocm_smi_lib/tree/develop/docs + title: ROCm SMI lib GitHub samples + +- caption: About + entries: + - file: license.md +