diff --git a/CHANGELOG.md b/CHANGELOG.md index 3370ea348a..071ac4f29a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,6 +4,32 @@ Full documentation for amd_smi_lib is available at [https://rocm.docs.amd.com/pr ***All information listed below is for reference and subject to change.*** +## amd_smi_lib for ROCm 6.3.0 + +### Changes + +- **Added Pytest functionality to test amdsmi API calls in Python**. + +### Removals + +- N/A + +### Optimizations + +- N/A + +### Resolved issues + +- N/A + +### Known issues + +- N/A + +### Upcoming changes + +- N/A + ## amd_smi_lib for ROCm 6.2.1 ### Additions @@ -20,9 +46,9 @@ Guest VMs can view enabled/disabled ras features that are on Host cards. ### Fixes -- **Fixed TypeError in `amd-smi process -G`**. +- **Fixed TypeError in `amd-smi process -G`**. -- **Updated CLI error strings to handle empty and invalid GPU/CPU inputs**. +- **Updated CLI error strings to handle empty and invalid GPU/CPU inputs**. - **Fixed Guest VM showing passthrough options**. @@ -36,9 +62,9 @@ Guest VMs can view enabled/disabled ras features that are on Host cards. ### Additions -- **`amd-smi dmon` is now available as an alias to `amd-smi monitor`**. +- **`amd-smi dmon` is now available as an alias to `amd-smi monitor`**. -- **Added optional process table under `amd-smi monitor -q`**. +- **Added optional process table under `amd-smi monitor -q`**. The monitor subcommand within the CLI Tool now has the `-q` option to enable an optional process table underneath the original monitored output. ```shell @@ -51,10 +77,10 @@ GPU NAME PID GTT_MEM CPU_MEM VRAM_MEM MEM_USAGE GF 0 rvs 1564865 0.0 B 0.0 B 1.1 GB 0.0 B 0 ns 0 ns ``` -- **Added Handling to detect VMs with passthrough configurations in CLI Tool**. +- **Added Handling to detect VMs with passthrough configurations in CLI Tool**. CLI Tool had only allowed a restricted set of options for Virtual Machines with passthrough GPUs. 
Now we offer an expanded set of functions available to passthrough configured GPUs. -- **Added Process Isolation and Clear SRAM functionality to the CLI Tool for VMs**. +- **Added Process Isolation and Clear SRAM functionality to the CLI Tool for VMs**. VMs now have the ability to set the process isolation and clear the SRAM from the CLI tool, using the following commands: ```shell @@ -62,22 +88,22 @@ amd-smi set --process-isolation <0 or 1> amd-smi reset --clean_local_data ``` -- **Added macros that were in `amdsmi.h` to the amdsmi Python library `amdsmi_interface.py`**. +- **Added macros that were in `amdsmi.h` to the amdsmi Python library `amdsmi_interface.py`**. Added macros to reference max size limitations for certain amdsmi functions such as max dpm policies and max fanspeed. -- **Added Ring Hang event**. +- **Added Ring Hang event**. Added `AMDSMI_EVT_NOTIF_RING_HANG` to the possible events in the `amdsmi_evt_notification_type_t` enum. ### Optimizations -- **Updated CLI error strings to specify invalid device type queried** +- **Updated CLI error strings to specify invalid device type queried** ```shell $ amd-smi static --asic --gpu 123123 Can not find a device: GPU '123123' Error code: -3 ``` -- **Removed elevated permission requirements for `amdsmi_get_gpu_process_list()`**. +- **Removed elevated permission requirements for `amdsmi_get_gpu_process_list()`**. Previously, if a process with elevated permissions was running, amd-smi required sudo to display all output. Now amd-smi will populate all process data and return N/A for elevated process names instead. However, if run with sudo, you will be able to see the name like so: ```shell @@ -112,10 +138,10 @@ GPU: 0 ENC: 0 ns ``` -- **Updated naming for `amdsmi_set_gpu_clear_sram_data()` to `amdsmi_clean_gpu_local_data()`**. +- **Updated naming for `amdsmi_set_gpu_clear_sram_data()` to `amdsmi_clean_gpu_local_data()`**. Changed the naming to be more accurate to what the function was doing. 
This change also extends to the CLI where we changed the `clear-sram-data` command to `clean_local_data`. -- **Updated `amdsmi_clk_info_t` struct in amdsmi.h and amdsmi_interface.py to align with host/guest**. +- **Updated `amdsmi_clk_info_t` struct in amdsmi.h and amdsmi_interface.py to align with host/guest**. Changed cur_clk to clk, changed sleep_clk to clk_deep_sleep, and added clk_locked value. New struct will be in the following format: ```shell @@ -129,7 +155,7 @@ Changed cur_clk to clk, changed sleep_clk to clk_deep_sleep, and added clk_locke } amdsmi_clk_info_t; ``` -- **Multiple structure updates in amdsmi.h and amdsmi_interface.py to align with host/guest**. +- **Multiple structure updates in amdsmi.h and amdsmi_interface.py to align with host/guest**. Multiple structures used by APIs were changed for alignment unification: - Changed `amdsmi_vram_info_t` `vram_size_mb` field changed to to `vram_size` - Updated `amdsmi_vram_type_t` struct updated to include new enums and added `AMDSMI` prefix @@ -137,7 +163,7 @@ Multiple structures used by APIs were changed for alignment unification: - Added `AMDSMI_PROCESSOR_TYPE` prefix to `processor_type_t` enums - Removed the fields structure definition in favor for an anonymous definition in `amdsmi_bdf_t` -- **Added `AMDSMI` prefix in amdsmi.h and amdsmi_interface.py to align with host/guest**. +- **Added `AMDSMI` prefix in amdsmi.h and amdsmi_interface.py to align with host/guest**. Multiple structures used by APIs were changed for alignment unification. `AMDSMI` prefix was added to the following structures: - Added AMDSMI prefix to `amdsmi_container_types_t` enums - Added AMDSMI prefix to `amdsmi_clk_type_t` enums @@ -147,13 +173,13 @@ Multiple structures used by APIs were changed for alignment unification. `AMDSMI - Added AMDSMI prefix to `amdsmi_temperature_type_t` enums - Added AMDSMI prefix to `amdsmi_fw_block_t` enums -- **Changed dpm_policy references to soc_pstate**. 
+- **Changed dpm_policy references to soc_pstate**. The file structure referenced to dpm_policy changed to soc_pstate and we have changed the APIs and CLI tool to be in line with the current structure. `amdsmi_get_dpm_policy()` and `amdsmi_set_dpm_policy()` are no longer valid; the new APIs are `amdsmi_get_soc_pstate()` and `amdsmi_set_soc_pstate()`. The CLI tool has been changed from `--policy` to `--soc-pstate` -- **Updated `amdsmi_get_gpu_board_info()` product_name to fallback to pciids**. +- **Updated `amdsmi_get_gpu_board_info()` product_name to fallback to pciids**. Previously, on devices without a FRU, we would not populate the product name in the `amdsmi_board_info_t` structure; now we fall back to using the name listed according to the pciids file if available. -- **Updated CLI voltage curve command output**. +- **Updated CLI voltage curve command output**. The output for `amd-smi metric --voltage-curve` now splits the frequency and voltage output by curve point or outputs N/A for each curve point if not applicable ```shell @@ -167,16 +193,16 @@ GPU: 0 POINT_2_VOLTAGE: 1186 mV ``` -- **Updated `amdsmi_get_gpu_board_info()` now has larger structure sizes for `amdsmi_board_info_t`**. +- **Updated `amdsmi_get_gpu_board_info()` now has larger structure sizes for `amdsmi_board_info_t`**. Updated sizes that work for retrieving relevant board information across AMD's ASIC products. This requires users to update any ABIs using this structure. ### Fixes -- **Fixed Leftover Mutex deadlock when running multiple instances of the CLI tool**. +- **Fixed Leftover Mutex deadlock when running multiple instances of the CLI tool**. When running `amd-smi reset --gpureset --gpu all` and then running an instance of `amd-smi static` (or any other subcommand that accesses the GPUs), a mutex would lock and not return, requiring either a clear of the mutex in /dev/shm or rebooting the machine. 
-- **Fixed multiple processes not being registered in `amd-smi process` with json and csv format**. +- **Fixed multiple processes not being registered in `amd-smi process` with json and csv format**. Multiple process outputs in the CLI tool were not being registered correctly. The json output did not handle multiple processes and is now in a new valid json format: ```shell @@ -209,33 +235,33 @@ Multiple process outputs in the CLI tool were not being registered correctly. Th ] ``` -- **Removed `throttle-status` from `amd-smi monitor` as it is no longer reliably supported**. +- **Removed `throttle-status` from `amd-smi monitor` as it is no longer reliably supported**. Throttle status may work for older ASICs, but will be replaced with PVIOL and TVIOL metrics for future ASIC support. It remains a field in the gpu_metrics API and in `amd-smi metric --power`. -- **`amdsmi_get_gpu_board_info()` no longer returns junk char strings**. +- **`amdsmi_get_gpu_board_info()` no longer returns junk char strings**. Previously if there was a partial failure to retrieve character strings, we would return garbage output to users using the API. This fix intends to populate as many values as possible. Then any failure(s) found along the way, `\0` is provided to `amdsmi_board_info_t` structures data members which cannot be populated. Ensuring empty char string values. -- **Fixed parsing of `pp_od_clk_voltage` within `amdsmi_get_gpu_od_volt_info`**. +- **Fixed parsing of `pp_od_clk_voltage` within `amdsmi_get_gpu_od_volt_info`**. The parsing of `pp_od_clk_voltage` was not dynamic enough to work with the dropping of voltage curve support on MI series cards. This propagates down to correcting the CLI's output `amd-smi metric --voltage-curve` to N/A if voltage curve is not enabled. ### Known Issues -- **`amdsmi_get_gpu_process_isolation` and `amdsmi_clean_gpu_local_data` commands do no currently work and will be supported in a future release**. 
+- **`amdsmi_get_gpu_process_isolation` and `amdsmi_clean_gpu_local_data` commands do no currently work and will be supported in a future release**. ## amd_smi_lib for ROCm 6.1.2 ### Additions -- **Added process isolation and clean shader APIs and CLI commands**. +- **Added process isolation and clean shader APIs and CLI commands**. Added APIs CLI and APIs to address LeftoverLocals security issues. Allowing clearing the sram data and setting process isolation on a per GPU basis. New APIs: - `amdsmi_get_gpu_process_isolation()` - `amdsmi_set_gpu_process_isolation()` - `amdsmi_set_gpu_clear_sram_data()` -- **Added `MIN_POWER` to output of `amd-smi static --limit`**. +- **Added `MIN_POWER` to output of `amd-smi static --limit`**. This change helps users identify the range to which they can change the power cap of the GPU. The change is added to simplify why a device supports (or does not support) power capping (also known as overdrive). See `amd-smi set -g all --power-cap ` or `amd-smi reset -g all --power-cap`. ```shell @@ -267,7 +293,7 @@ GPU: 1 ### Optimizations -- **Updated `amd-smi monitor --pcie` output**. +- **Updated `amd-smi monitor --pcie` output**. The source for pcie bandwidth monitor output was a legacy file we no longer support and was causing delays within the monitor command. The output is no longer using TX/RX but instantaneous bandwidth from gpu_metrics instead; updated output: ```shell @@ -276,13 +302,13 @@ GPU PCIE_BW 0 26 Mb/s ``` -- **`amdsmi_get_power_cap_info` now returns values in uW instead of W**. +- **`amdsmi_get_power_cap_info` now returns values in uW instead of W**. `amdsmi_get_power_cap_info` will return in uW as originally reflected by driver. Previously `amdsmi_get_power_cap_info` returned W values, this conflicts with our sets and modifies values retrieved from driver. We decided to keep the values returned from driver untouched (in original units, uW). Then in CLI we will convert to watts (as previously done - no changes here). 
Additionally, driver made updates to min power cap displayed for devices when overdrive is disabled which prompted for this change (in this case min_power_cap and max_power_cap are the same). -- **Updated Python Library return types for amdsmi_get_gpu_memory_reserved_pages & amdsmi_get_gpu_bad_page_info**. +- **Updated Python Library return types for amdsmi_get_gpu_memory_reserved_pages & amdsmi_get_gpu_bad_page_info**. Previously calls were returning "No bad pages found." if no pages were found, now it only returns the list type and can be empty. -- **Updated `amd-smi metric --ecc-blocks` output**. +- **Updated `amd-smi metric --ecc-blocks` output**. The ecc blocks argument was outputing blocks without counters available, updated the filtering show blocks that counters are available for: ``` shell @@ -319,12 +345,12 @@ GPU: 0 DEFERRED_COUNT: 0 ``` -- **Removed `amdsmi_get_gpu_process_info` from Python library**. +- **Removed `amdsmi_get_gpu_process_info` from Python library**. amdsmi_get_gpu_process_info was removed from the C library in an earlier build, but the API was still in the Python interface. ### Fixes -- **Fixed `amd-smi metric --power` now provides power output for Navi2x/Navi3x/MI1x**. +- **Fixed `amd-smi metric --power` now provides power output for Navi2x/Navi3x/MI1x**. These systems use an older version of gpu_metrics in amdgpu. This fix only updates what CLI outputs. No change in any of our APIs. @@ -349,10 +375,10 @@ GPU: 1 THROTTLE_STATUS: UNTHROTTLED ``` -- **Fixed `amdsmitstReadWrite.TestPowerCapReadWrite` test for Navi3X, Navi2X, MI100**. +- **Fixed `amdsmitstReadWrite.TestPowerCapReadWrite` test for Navi3X, Navi2X, MI100**. Updates required `amdsmi_get_power_cap_info` to return in uW as originally reflected by driver. Previously `amdsmi_get_power_cap_info` returned W values, this conflicts with our sets and modifies values retrieved from driver. We decided to keep the values returned from driver untouched (in original units, uW). 
Then in CLI we will convert to watts (as previously done - no changes here). Additionally, driver made updates to min power cap displayed for devices when overdrive is disabled which prompted for this change (in this case min_power_cap and max_power_cap are the same). -- **Fixed Python interface call amdsmi_get_gpu_memory_reserved_pages & amdsmi_get_gpu_bad_page_info**. +- **Fixed Python interface call amdsmi_get_gpu_memory_reserved_pages & amdsmi_get_gpu_bad_page_info**. Previously Python interface calls to populated bad pages resulted in a `ValueError: NULL pointer access`. This fixes the bad-pages CLI subcommand as well. ### Known Issues @@ -363,7 +389,7 @@ Previously Python interface calls to populated bad pages resulted in a `ValueErr ### Changes -- **Updated metrics --clocks**. +- **Updated metrics --clocks**. Output for `amd-smi metric --clock` is updated to reflect each engine and bug fixes for the clock lock status and deep sleep status. ``` shell @@ -474,7 +500,7 @@ GPU: 0 DEEP_SLEEP: ENABLED ``` -- **Added deferred ecc counts**. +- **Added deferred ecc counts**. Added deferred error correctable counts to `amd-smi metric --ecc --ecc-blocks` ```shell @@ -498,7 +524,7 @@ GPU: 0 ... ``` -- **Updated `amd-smi topology --json` to align with host/guest**. +- **Updated `amd-smi topology --json` to align with host/guest**. Topology's `--json` output now is changed to align with output host/guest systems. Additionally, users can select/filter specific topology details as desired (refer to `amd-smi topology -h` for full list). See examples shown below. *Previous format:* @@ -633,18 +659,18 @@ $ /opt/rocm/bin/amd-smi topology -a -t --json ### Fixes -- **Fix for GPU reset error on non-amdgpu cards**. +- **Fix for GPU reset error on non-amdgpu cards**. Previously our reset could attempt to reset non-AMD GPUs, resulting in an "Unable to reset non-amd GPU" error. The fix updates the CLI to target only AMD ASICs. 
-- **Fix for `amd-smi metric --pcie` and `amdsmi_get_pcie_info()`Navi32/31 cards**. +- **Fix for `amd-smi metric --pcie` and `amdsmi_get_pcie_info()`Navi32/31 cards**. Updated API to include `amdsmi_card_form_factor_t.AMDSMI_CARD_FORM_FACTOR_CEM`. Previously, this would report "UNKNOWN". This fix provides the correct board `SLOT_TYPE` associated with these ASICs (and other Navi cards). -- **Fix for `amd-smi process`**. +- **Fix for `amd-smi process`**. Fixed output results when getting processes running on a device. -- **Improved Error handling for `amd-smi process`**. +- **Improved Error handling for `amd-smi process`**. Fixed Attribute Error when getting process in csv format ### Known issues @@ -655,7 +681,7 @@ Fixed Attribute Error when getting process in csv format ### Additions -- **Added Monitor Command**. +- **Added Monitor Command**. Provides users the ability to customize GPU metrics to capture, collect, and observe. Output is provided in a table view. This aligns closer to ROCm SMI `rocm-smi` (no argument), and additionally allows users to customize what data is helpful for their use-case. ```shell @@ -715,7 +741,7 @@ GPU POWER GPU_TEMP MEM_TEMP GFX_UTIL GFX_CLOCK MEM_UTIL MEM_CLOCK VRAM_U 7 175 W 34 °C 32 °C 0 % 113 MHz 0 % 900 MHz 283 MB 196300 MB ``` -- **Integrated ESMI Tool**. +- **Integrated ESMI Tool**. Users can get CPU metrics and telemetry through our API and CLI tools. This information can be seen in `amd-smi static` and `amd-smi metric` commands. Only available for limited target processors. As of ROCm 6.0.2, this is listed as: - AMD Zen3 based CPU Family 19h Models 0h-Fh and 30h-3Fh - AMD Zen4 based CPU Family 19h Models 10h-1Fh and A0-AFh @@ -865,7 +891,7 @@ CPU: 0 RESPONSE: N/A ``` -- **Added support for new metrics: VCN, JPEG engines, and PCIe errors**. +- **Added support for new metrics: VCN, JPEG engines, and PCIe errors**. 
Using the AMD SMI tool, users can retrieve VCN, JPEG engines, and PCIe errors by calling `amd-smi metric -P` or `amd-smi metric --usage`. Depending on device support, `VCN_ACTIVITY` will update for MI3x ASICs (with 4 separate VCN engine activities); for older ASICs, `MM_ACTIVITY` will update with UVD/VCN engine activity (average of all engines). `JPEG_ACTIVITY` is a new field for MI3x ASICs, where device can support up to 32 JPEG engine activities. See our documentation for more in-depth understanding of these new fields. ```shell @@ -898,7 +924,7 @@ GPU: 0 ``` -- **Added AMDSMI Tool Version**. +- **Added AMDSMI Tool Version**. AMD SMI will report ***three versions***: AMDSMI Tool, AMDSMI Library version, and ROCm version. The AMDSMI Tool version is the CLI/tool version number with commit ID appended after `+` sign. The AMDSMI Library version is the library package version number. @@ -909,7 +935,7 @@ $ amd-smi version AMDSMI Tool: 23.4.2+505b858 | AMDSMI Library version: 24.2.0.0 | ROCm version: 6.1.0 ``` -- **Added XGMI table**. +- **Added XGMI table**. Displays XGMI information for AMD GPU devices in a table format. Only available on supported ASICs (e.g. MI300). Here users can view read/write data XGMI or PCIe accumulated data transfer size (in KiloBytes). ```shell @@ -943,7 +969,7 @@ GPU7 0000:df:00.0 32 Gb/s 512 Gb/s XGMI ``` -- **Added units of measure to JSON output**. +- **Added units of measure to JSON output**. We added units of measure to JSON/CSV `amd-smi metric`, `amd-smi static`, and `amd-smi monitor` commands. Ex. @@ -979,7 +1005,7 @@ amd-smi metric -p --json ### Changes -- **Topology is now left-aligned with BDF of each device listed individual table's row/columns**. +- **Topology is now left-aligned with BDF of each device listed individual table's row/columns**. We provided each device's BDF for every table's row/columns, then left aligned data. We want AMD SMI Tool output to be easy to understand and digest for our users. 
Having users scroll up to find this information made it difficult to follow, especially for devices which have many devices associated with one ASIC. ```shell @@ -1042,9 +1068,9 @@ NUMA BW TABLE: ### Fixes -- **Fix for Navi3X/Navi2X/MI100 `amdsmi_get_gpu_pci_bandwidth()` in frequencies_read tests**. +- **Fix for Navi3X/Navi2X/MI100 `amdsmi_get_gpu_pci_bandwidth()` in frequencies_read tests**. For devices which do not report (e.g. Navi3X/Navi2X/MI100), we have added checks to confirm these devices return AMDSMI_STATUS_NOT_SUPPORTED. Otherwise, tests now display a return string. -- **Fix for devices which have an older pyyaml installed**. +- **Fix for devices which have an older pyyaml installed**. On platforms which are identified as having an older pyyaml version or pip, we now manually update both pip and pyyaml as needed. This corrects issues identified below. Fix impacts the following CLI commands: - `amd-smi list` - `amd-smi static` @@ -1056,20 +1082,20 @@ Platforms which are identified as having an older pyyaml version or pip, we no m TypeError: dump_all() got an unexpected keyword argument 'sort_keys' ``` -- **Fix for crash when user is not a member of video/render groups**. +- **Fix for crash when user is not a member of video/render groups**. AMD SMI now uses the same mutex handler for devices as rocm-smi. This helps avoid crashes when DRM/device data is inaccessible to the logged-in user. ## amd_smi_lib for ROCm 6.0.0 ### Additions -- **Integrated the E-SMI (EPYC-SMI) library**. +- **Integrated the E-SMI (EPYC-SMI) library**. You can now query CPU-related information directly through AMD SMI. Metrics include power, energy, performance, and other system details. -- **Added support for gfx942 metrics**. +- **Added support for gfx942 metrics**. You can now query MI300 device metrics to get real-time information. Metrics include power, temperature, energy, and performance. -- **Compute and memory partition support**. +- **Compute and memory partition support**. 
Users can now view, set, and reset partitions. The topology display can provide a more in-depth look at the device's current configuration. ### Optimizations @@ -1078,13 +1104,13 @@ Users can now view, set, and reset partitions. The topology display can provide ### Changes -- **GPU index sorting made consistent with other tools**. +- **GPU index sorting made consistent with other tools**. To ensure alignment with other ROCm software tools, GPU index sorting is optimized to use Bus:Device.Function (BDF) rather than the card number. -- **Topology output is now aligned with GPU BDF table**. +- **Topology output is now aligned with GPU BDF table**. Earlier versions of the topology output were difficult to read since each GPU was displayed linearly. Now the information is displayed as a table by each GPU's BDF, which more closely resembles rocm-smi output. ### Fixes -- **Fix for driver not initialized**. +- **Fix for driver not initialized**. If the driver module is not loaded, the user receives an error response indicating that the amdgpu module is not loaded. diff --git a/CMakeLists.txt b/CMakeLists.txt index ba6636a658..18b11617fd 100755 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -187,9 +187,10 @@ if(BUILD_TESTS) add_subdirectory("tests/amd_smi_test") endif() -# python interface and CLI depend on shared libraries +# python interface, CLI, and py-test depend on shared libraries if(BUILD_SHARED_LIBS) add_subdirectory("py-interface") + add_subdirectory("pytest") if(BUILD_CLI) add_subdirectory("amdsmi_cli") endif() diff --git a/DEBIAN/postinst.in b/DEBIAN/postinst.in index 76c97b92f8..474cebc1b8 100755 --- a/DEBIAN/postinst.in +++ b/DEBIAN/postinst.in @@ -2,6 +2,7 @@ do_updatepciids() { update-pciids >/dev/null 2>&1 || true + return } do_configureLogrotate() { @@ -187,9 +188,22 @@ do_install_amdsmi_python_lib() { fi } +do_install_amdsmi_pytest() { + echo -n "Installing pytest... " + pip install -U pytest >/dev/null 2>&1 + if [ $? 
-ne 0 ]; then + echo "[WARNING] Detected pytest could not be installed. Running pytest may not work as documented." + else + echo -n "[SUCCESS]" + echo "" + fi + return +} + case "$1" in ( configure ) do_install_amdsmi_python_lib + do_install_amdsmi_pytest do_ldconfig do_updatepciids do_configureLogrotate || exit 0 diff --git a/RPM/post.in b/RPM/post.in index f8591489b5..78c71864ed 100755 --- a/RPM/post.in +++ b/RPM/post.in @@ -2,6 +2,7 @@ do_updatepciids() { update-pciids >/dev/null 2>&1 || true + return } do_configureLogrotate() { @@ -186,9 +187,22 @@ do_install_amdsmi_python_lib() { fi } +do_install_amdsmi_pytest() { + echo -n "Installing pytest... " + pip install -U pytest >/dev/null 2>&1 + if [ $? -ne 0 ]; then + echo "[WARNING] Detected pytest could not be installed. Running pytest may not work as documented." + else + echo -n "[SUCCESS]" + echo "" + fi + return +} + # post install or upgrade, $i is 1 or 2 -> do these actions if [ "$1" -ge 1 ]; then do_install_amdsmi_python_lib + do_install_amdsmi_pytest do_ldconfig do_updatepciids do_configureLogrotate || exit 0 diff --git a/py-interface/amdsmi_interface.py b/py-interface/amdsmi_interface.py index 664b7d2b3e..8c001011d9 100644 --- a/py-interface/amdsmi_interface.py +++ b/py-interface/amdsmi_interface.py @@ -590,6 +590,26 @@ def _padHexValue(value, length): return '0x' + value[2:].zfill(length) return value +class UIntegerTypes(IntEnum): + UINT8_T = 0xFF + UINT16_T = 0xFFFF + UINT32_T = 0xFFFFFFFF + UINT64_T = 0xFFFFFFFFFFFFFFFF + +def _validateIfMaxUint(valToCheck, uintType: UIntegerTypes): + return_val = "N/A" + if not isinstance(valToCheck, list): + if valToCheck == uintType: + return return_val + else: + return valToCheck + else: + return_val = valToCheck + for idx, v in enumerate(valToCheck): + if v == uintType: + return_val[idx] = "N/A" + return return_val + def amdsmi_get_socket_handles() -> List[amdsmi_wrapper.amdsmi_socket_handle]: """ @@ -2277,31 +2297,23 @@ def amdsmi_get_pcie_info( pcie_info_dict 
= { "pcie_static": { - "max_pcie_width": pcie_info.pcie_static.max_pcie_width, - "max_pcie_speed": pcie_info.pcie_static.max_pcie_speed, - "pcie_interface_version": pcie_info.pcie_static.pcie_interface_version, + "max_pcie_width": _validateIfMaxUint(pcie_info.pcie_static.max_pcie_width, UIntegerTypes.UINT16_T), + "max_pcie_speed": _validateIfMaxUint(pcie_info.pcie_static.max_pcie_speed, UIntegerTypes.UINT32_T), + "pcie_interface_version": _validateIfMaxUint(pcie_info.pcie_static.pcie_interface_version, UIntegerTypes.UINT32_T), "slot_type": pcie_info.pcie_static.slot_type, }, "pcie_metric": { - "pcie_width": pcie_info.pcie_metric.pcie_width, - "pcie_speed": pcie_info.pcie_metric.pcie_speed, - "pcie_bandwidth": pcie_info.pcie_metric.pcie_bandwidth, - "pcie_replay_count": pcie_info.pcie_metric.pcie_replay_count, - "pcie_l0_to_recovery_count": pcie_info.pcie_metric.pcie_l0_to_recovery_count, - "pcie_replay_roll_over_count": pcie_info.pcie_metric.pcie_replay_roll_over_count, - "pcie_nak_sent_count": pcie_info.pcie_metric.pcie_nak_sent_count, - "pcie_nak_received_count": pcie_info.pcie_metric.pcie_nak_received_count, + "pcie_width": _validateIfMaxUint(pcie_info.pcie_metric.pcie_width, UIntegerTypes.UINT16_T), + "pcie_speed": _validateIfMaxUint(pcie_info.pcie_metric.pcie_speed, UIntegerTypes.UINT32_T), + "pcie_bandwidth": _validateIfMaxUint(pcie_info.pcie_metric.pcie_bandwidth, UIntegerTypes.UINT32_T), + "pcie_replay_count": _validateIfMaxUint(pcie_info.pcie_metric.pcie_replay_count, UIntegerTypes.UINT64_T), + "pcie_l0_to_recovery_count": _validateIfMaxUint(pcie_info.pcie_metric.pcie_l0_to_recovery_count, UIntegerTypes.UINT64_T), + "pcie_replay_roll_over_count": _validateIfMaxUint(pcie_info.pcie_metric.pcie_replay_roll_over_count, UIntegerTypes.UINT64_T), + "pcie_nak_sent_count": _validateIfMaxUint(pcie_info.pcie_metric.pcie_nak_sent_count, UIntegerTypes.UINT64_T), + "pcie_nak_received_count": _validateIfMaxUint(pcie_info.pcie_metric.pcie_nak_received_count, 
UIntegerTypes.UINT64_T), } } - # Check pcie static values for uint max - if pcie_info_dict['pcie_static']['max_pcie_width'] == 0xFFFF: - pcie_info_dict['pcie_static']['max_pcie_width'] = "N/A" - if pcie_info_dict['pcie_static']['max_pcie_speed'] == 0xFFFFFFFF: - pcie_info_dict['pcie_static']['max_pcie_speed'] = "N/A" - if pcie_info_dict['pcie_static']['pcie_interface_version'] == 0xFFFFFFFF: - pcie_info_dict['pcie_static']['pcie_interface_version'] = "N/A" - slot_type = pcie_info_dict['pcie_static']['slot_type'] if isinstance(slot_type, int): slot_types = amdsmi_wrapper.amdsmi_card_form_factor_t__enumvalues @@ -2312,29 +2324,6 @@ def amdsmi_get_pcie_info( else: pcie_info_dict['pcie_static']['slot_type'] = "N/A" - # Check pcie metric values for uint max - if pcie_info_dict['pcie_metric']['pcie_width'] == 0xFFFF: - pcie_info_dict['pcie_metric']['pcie_width'] = "N/A" - if pcie_info_dict['pcie_metric']['pcie_speed'] == 0xFFFFFFFF: - pcie_info_dict['pcie_metric']['pcie_speed'] = "N/A" - if pcie_info_dict['pcie_metric']['pcie_bandwidth'] == 0xFFFFFFFF: - pcie_info_dict['pcie_metric']['pcie_bandwidth'] = "N/A" - - # TODO Just Navi 21 has a different uint max size for pcie_bandwidth - # if pcie_info_dict['pcie_metric']['pcie_bandwidth'] == 0xFFFFFFFF: - # pcie_info_dict['pcie_metric']['pcie_bandwidth'] = "N/A" - - if pcie_info_dict['pcie_metric']['pcie_replay_count'] == 0xFFFFFFFFFFFFFFFF: - pcie_info_dict['pcie_metric']['pcie_replay_count'] = "N/A" - if pcie_info_dict['pcie_metric']['pcie_l0_to_recovery_count'] == 0xFFFFFFFFFFFFFFFF: - pcie_info_dict['pcie_metric']['pcie_l0_to_recovery_count'] = "N/A" - if pcie_info_dict['pcie_metric']['pcie_replay_roll_over_count'] == 0xFFFFFFFFFFFFFFFF: - pcie_info_dict['pcie_metric']['pcie_replay_roll_over_count'] = "N/A" - if pcie_info_dict['pcie_metric']['pcie_nak_sent_count'] == 0xFFFFFFFFFFFFFFFF: - pcie_info_dict['pcie_metric']['pcie_nak_sent_count'] = "N/A" - if pcie_info_dict['pcie_metric']['pcie_nak_received_count'] == 
0xFFFFFFFFFFFFFFFF: - pcie_info_dict['pcie_metric']['pcie_nak_received_count'] = "N/A" - return pcie_info_dict diff --git a/py-interface/pyproject.toml.in b/py-interface/pyproject.toml.in index 45b18e4d11..6ea29b712f 100644 --- a/py-interface/pyproject.toml.in +++ b/py-interface/pyproject.toml.in @@ -29,3 +29,9 @@ packages = ["amdsmi"] # install libamd_smi.so [tool.setuptools.package-data] amdsmi = ["*.so"] + +[tool.pytest.ini_options] +pythonpath = "/opt/rocm/share/amd_smi" +addopts = [ + "--import-mode=importlib", +] diff --git a/pytest/CMakeLists.txt b/pytest/CMakeLists.txt new file mode 100644 index 0000000000..693d3b9f57 --- /dev/null +++ b/pytest/CMakeLists.txt @@ -0,0 +1,28 @@ +message("&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&") +message(" CMake AMD SMI Pytest ") +message("&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&") + +message("") +message("Build Configuration:") +# message("-----------BuildType: " ${CMAKE_BUILD_TYPE}) +# message("------------Compiler: " ${CMAKE_CXX_COMPILER}) +# message("-------------Version: " ${CMAKE_CXX_COMPILER_VERSION}) +message("--------Proj Src Dir: " ${PROJECT_SOURCE_DIR}) +# message("--------Proj Bld Dir: " ${PROJECT_BINARY_DIR}) +# message("--------Proj Lib Dir: " ${PROJECT_BINARY_DIR}/lib) +# message("--------Proj Exe Dir: " ${PROJECT_BINARY_DIR}/bin) +message("--------Share Install Prefix: " ${SHARE_INSTALL_PREFIX}) +message("--------Cpack_include_toplevel_directory: " ${CPACK_INCLUDE_TOPLEVEL_DIRECTORY}) +message("--------CPACK_COMPONENT_INCLUDE_TOPLEVEL_DIRECTORY: " ${CPACK_COMPONENT_INCLUDE_TOPLEVEL_DIRECTORY}) + +# copy python test files into shared directory +install( + DIRECTORY ./ + DESTINATION ${SHARE_INSTALL_PREFIX}/tests/pytest/ + COMPONENT dev + USE_SOURCE_PERMISSIONS + FILES_MATCHING + PATTERN "*.py" +) + +# message(FATAL_ERROR "python lib stop") \ No newline at end of file diff --git a/pytest/README.md b/pytest/README.md new file mode 100644 index 
0000000000..aaee9d0852 --- /dev/null +++ b/pytest/README.md @@ -0,0 +1,1583 @@ +# How to Run Python Unit Tests +## Overview +We use Python's built-in unittest testing framework. You can read more about it here: [Python unittest v3.8](https://docs.python.org/3.8/library/unittest.html). Alternatively, you can read up on pytest here: [Pytest how-to usage](https://docs.pytest.org/en/latest/index.html). + +## Warning to Users +AMD SMI Python API tests are subject to change. These tests are currently a work in progress and may not work on your system. + +## Pre-Requisites Before Running Tests +Follow our install/build guides to ensure the Python API is installed correctly according to [AMD SMI installation](https://rocm.docs.amd.com/projects/amdsmi/en/latest/). + +***Versions***: Python 3.8+ + +## How to Run +### Basic How To +The two test files are located at: +```/opt/rocm/share/amd_smi/tests/pytest/integration_test.py``` +```/opt/rocm/share/amd_smi/tests/pytest/unit_tests.py``` + + +The recommended method to run the tests: +Pytest verbose +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -s -v``` +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -s -v``` + +Pytest only (not verbose) +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -v``` +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -v``` + +Unittest verbose +```/opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -v``` +```/opt/rocm/share/amd_smi/tests/pytest/integration_test.py -v``` + +Unittest only (not verbose) +```/opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -b -v``` +```/opt/rocm/share/amd_smi/tests/pytest/integration_test.py -b -v``` + +See sections below for more detailed options with examples. + +### Unittest or Pytest Run +The Unittest Run calls the tests directly, assuming pytest is correctly installed in the PATH. 
+This is more straightforward and intuitive but offers less control over options. For example, the cache provider will always be used. + +```/opt/rocm/share/amd_smi/tests/pytest/*``` + +options: + - -h, --help show this help message and exit + - -v, --verbose Verbose output + - -q, --quiet Quiet output + - -b, --buffer Buffer stdout and stderr during tests + - -k "TESTNAME" Only run tests which match the given substring + +The Pytest Run can be more reliable and consistent, especially if pytest is not in the PATH. +This offers more options and flexibility, such as the option to disable the cache provider, ensuring completely independent runs. + +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/*``` + +options: + - -h, --help show this help message and exit + - --co Collect and list tests + - -p no:cacheprovider Disable cache provider + - -v, --verbose Verbose output + - -q, --quiet Quiet output + - -s, --capture=no Disable output capturing (show stdout) + - -k "TESTNAME" Only run tests which match the given substring + +The complete list of options is available in the [Pytest command-line flags](https://docs.pytest.org/en/latest/reference/reference.html#command-line-flags) reference. + +## Unittest Run Options +### Unittest Run: Verbose on +Helpful for seeing the tests' print output. + +```/opt/rocm/share/amd_smi/tests/pytest/integration_test.py -v``` + +```/opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -v``` + +ex. +
+ Click for example: Unittest run: verbose on + +~~~shell +/opt/rocm/share/amd_smi/tests/pytest/integration_test.py -v +test_init (__main__.TestAmdSmiInit) ... ok +test_bad_page_info (__main__.TestAmdSmiPythonInterface) ... ###Test amdsmi_get_gpu_bad_page_info + +**** [ERROR] | Test: test_bad_page_info | Caught AmdSmiLibraryException +ok +test_bdf_device_id (__main__.TestAmdSmiPythonInterface) ... ###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_vbios_info + + vbios_info['part_number'] is: 113-D41207XL-038 + vbios_info['build_date'] is: 2020/10/06 17:59 + vbios_info['version'] is: 020.001.000.038.015697 + + vbios_info['name'] is: NAVI21 Gaming XL D412 + +###Test amdsmi_get_gpu_device_uuid + + uuid is: 81ff73bf-0000-1000-80c1-6890a5911040 +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_vbios_info + + vbios_info['part_number'] is: 113-D4300100-100 + vbios_info['build_date'] is: 2021/04/22 09:34 + vbios_info['version'] is: 020.001.000.060.016898 + + vbios_info['name'] is: NAVI21 D43001 GLXL + +###Test amdsmi_get_gpu_device_uuid + + uuid is: 1fff73a3-0000-1000-8075-223e5e64eac1 +ok +test_ecc (__main__.TestAmdSmiPythonInterface) ... ###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_ras_feature_info + +**** [ERROR] | Test: test_ecc | Caught AmdSmiLibraryException +ok +test_gpu_performance (__main__.TestAmdSmiPythonInterface) ... 
###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_activity + engine_usage['gfx_activity'] is: 3 % + engine_usage['umc_activity'] is: 0 % + engine_usage['mm_activity'] is: 0 % + +###Test amdsmi_get_power_info + power_info['current_socket_power'] is: N/A + power_info['average_socket_power'] is: 8 + power_info['gfx_voltage'] is: 768 + power_info['soc_voltage'] is: 918 + power_info['mem_voltage'] is: 1250 + power_info['power_limit'] is: 203000000 +###Test amdsmi_is_gpu_power_management_enabled + Is power management enabled is: True +###Test amdsmi_get_temp_metric + Current temperature for EDGE is: 42 + Current temperature for HOTSPOT is: 42 + Current temperature for VRAM is: 38 +###Test amdsmi_get_temp_metric + Limit (critical) temperature for EDGE is: 100 + Limit (critical) temperature for HOTSPOT is: 110 + Limit (critical) temperature for VRAM is: 100 +###Test amdsmi_get_temp_metric + Shutdown (emergency) temperature for EDGE is: 105 + Shutdown (emergency) temperature for HOTSPOT is: 115 + Shutdown (emergency) temperature for VRAM is: 105 +###Test amdsmi_get_clock_info + Current clock for domain GFX is: 500 + Max clock for domain GFX is: 2475 + Min clock for domain GFX is: 500 + Is GFX clock locked: 0 + Is GFX clock in deep sleep: 255 + Current clock for domain MEM is: 96 + Max clock for domain MEM is: 1000 + Min clock for domain MEM is: 96 + Is MEM clock in deep sleep: 255 + Current clock for domain VCLK0 is: 0 + Max clock for domain VCLK0 is: 0 + Min clock for domain VCLK0 is: 0 + Is VCLK0 clock in deep sleep: 255 + Current clock for domain VCLK1 is: 0 + Max clock for domain VCLK1 is: 0 + Min clock for domain VCLK1 is: 0 + Is VCLK1 clock in deep sleep: 255 + Current clock for domain DCLK0 is: 0 + Max clock for domain DCLK0 is: 0 + Min clock for domain DCLK0 is: 0 + Is DCLK0 clock in deep sleep: 255 + Current clock for domain DCLK1 is: 0 + Max clock for domain DCLK1 is: 0 + Min clock for domain DCLK1 is: 0 + Is DCLK1 clock in deep sleep: 255 +###Test 
amdsmi_get_pcie_info + pcie_info['pcie_metric']['pcie_width'] is: 4 + pcie_info['pcie_static']['max_pcie_width'] is: 16 + pcie_info['pcie_metric']['pcie_speed'] is: 5000 MT/s + pcie_info['pcie_static']['max_pcie_speed'] is: 16000 + pcie_info['pcie_static']['pcie_interface_version'] is: 4 + pcie_info['pcie_static']['slot_type'] is: CEM + pcie_info['pcie_metric']['pcie_replay_count'] is: N/A + pcie_info['pcie_metric']['pcie_bandwidth'] is: N/A + pcie_info['pcie_metric']['pcie_l0_to_recovery_count'] is: N/A + pcie_info['pcie_metric']['pcie_replay_roll_over_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_sent_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_received_count'] is: N/A +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_activity + engine_usage['gfx_activity'] is: 0 % + engine_usage['umc_activity'] is: 0 % + engine_usage['mm_activity'] is: 0 % + +###Test amdsmi_get_power_info + power_info['current_socket_power'] is: N/A + power_info['average_socket_power'] is: 13 + power_info['gfx_voltage'] is: 781 + power_info['soc_voltage'] is: 812 + power_info['mem_voltage'] is: 1250 + power_info['power_limit'] is: 213000000 +###Test amdsmi_is_gpu_power_management_enabled + Is power management enabled is: True +###Test amdsmi_get_temp_metric + Current temperature for EDGE is: 34 + Current temperature for HOTSPOT is: 38 + Current temperature for VRAM is: 36 +###Test amdsmi_get_temp_metric + Limit (critical) temperature for EDGE is: 109 + Limit (critical) temperature for HOTSPOT is: 110 + Limit (critical) temperature for VRAM is: 100 +###Test amdsmi_get_temp_metric + Shutdown (emergency) temperature for EDGE is: 114 + Shutdown (emergency) temperature for HOTSPOT is: 115 + Shutdown (emergency) temperature for VRAM is: 105 +###Test amdsmi_get_clock_info + Current clock for domain GFX is: 500 + Max clock for domain GFX is: 2555 + Min clock for domain GFX is: 500 + Is GFX clock locked: 0 + Is GFX clock in deep sleep: 255 + Current clock for domain MEM 
is: 96 + Max clock for domain MEM is: 1000 + Min clock for domain MEM is: 96 + Is MEM clock in deep sleep: 255 + Current clock for domain VCLK0 is: 0 + Max clock for domain VCLK0 is: 0 + Min clock for domain VCLK0 is: 0 + Is VCLK0 clock in deep sleep: 255 + Current clock for domain VCLK1 is: 0 + Max clock for domain VCLK1 is: 0 + Min clock for domain VCLK1 is: 0 + Is VCLK1 clock in deep sleep: 255 + Current clock for domain DCLK0 is: 0 + Max clock for domain DCLK0 is: 0 + Min clock for domain DCLK0 is: 0 + Is DCLK0 clock in deep sleep: 255 + Current clock for domain DCLK1 is: 0 + Max clock for domain DCLK1 is: 0 + Min clock for domain DCLK1 is: 0 + Is DCLK1 clock in deep sleep: 255 +###Test amdsmi_get_pcie_info + pcie_info['pcie_metric']['pcie_width'] is: 16 + pcie_info['pcie_static']['max_pcie_width'] is: 16 + pcie_info['pcie_metric']['pcie_speed'] is: 8000 MT/s + pcie_info['pcie_static']['max_pcie_speed'] is: 16000 + pcie_info['pcie_static']['pcie_interface_version'] is: 4 + pcie_info['pcie_static']['slot_type'] is: CEM + pcie_info['pcie_metric']['pcie_replay_count'] is: N/A + pcie_info['pcie_metric']['pcie_bandwidth'] is: N/A + pcie_info['pcie_metric']['pcie_l0_to_recovery_count'] is: N/A + pcie_info['pcie_metric']['pcie_replay_roll_over_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_sent_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_received_count'] is: N/A +ok +test_walkthrough (__main__.TestAmdSmiPythonInterface) ... ###Test amdsmi_get_processor_handles() +###Test amdsmi_get_gpu_device_bdf() | START walk_through | processor i = 0 +###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_asic_info + asic_info['market_name'] is: NAVI21 + asic_info['vendor_id'] is: 0x1002 + asic_info['vendor_name'] is: Advanced Micro Devices Inc. 
[AMD/ATI] + asic_info['device_id'] is: 0x73bf + asic_info['rev_id'] is: 0xc3 + + asic_info['asic_serial'] is: 0x81C16890A5911040 + + asic_info['oam_id'] is: N/A + +###Test amdsmi_get_power_cap_info + power_info['dpm_cap'] is: 1 + power_info['power_cap'] is: 203000000 + +###Test amdsmi_get_gpu_vbios_info + vbios_info['part_number'] is: 113-D41207XL-038 + vbios_info['build_date'] is: 2020/10/06 17:59 + vbios_info['name'] is: NAVI21 Gaming XL D412 + + vbios_info['version'] is: 020.001.000.038.015697 + +###Test amdsmi_get_gpu_board_info + board_info['model_number'] is: N/A + + board_info['product_serial'] is: N/A + + board_info['fru_id'] is: N/A + + board_info['manufacturer_name'] is: Advanced Micro Devices, Inc. [AMD/ATI] + + board_info['product_name'] is: Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] + +###Test amdsmi_get_fw_info +FW name: AMDSMI_FW_ID_CP_CE +FW version: 37 +FW name: AMDSMI_FW_ID_CP_PFP +FW version: 98 +FW name: AMDSMI_FW_ID_CP_ME +FW version: 64 +FW name: AMDSMI_FW_ID_CP_MEC1 +FW version: 118 +FW name: AMDSMI_FW_ID_CP_MEC2 +FW version: 118 +FW name: AMDSMI_FW_ID_RLC +FW version: 96 +FW name: AMDSMI_FW_ID_SDMA0 +FW version: 83 +FW name: AMDSMI_FW_ID_SDMA1 +FW version: 83 +FW name: AMDSMI_FW_ID_VCN +FW version: 31.1E.00.8 +FW name: AMDSMI_FW_ID_PSP_SOSDRV +FW version: 21.0E.64 +FW name: AMDSMI_FW_ID_ASD +FW version: 553648340 +FW name: AMDSMI_FW_ID_TA_RAS +FW version: 1B.00.01.3E +FW name: AMDSMI_FW_ID_TA_XGMI +FW version: 20.00.00.0F +FW name: AMDSMI_FW_ID_PM +FW version: 58.89.0 +###Test amdsmi_get_gpu_driver_info +Driver info: {'driver_name': 'amdgpu', 'driver_version': '6.7.8', 'driver_date': '2015/01/01 00:00'} +###Test amdsmi_get_gpu_driver_info() | END walk_through | processor i = 0 +###Test amdsmi_get_gpu_device_bdf() | START walk_through | processor i = 1 +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_asic_info + asic_info['market_name'] is: Navi 21 GL-XL [Radeon PRO W6800] + asic_info['vendor_id'] is: 0x1002 + 
asic_info['vendor_name'] is: Advanced Micro Devices Inc. [AMD/ATI] + asic_info['device_id'] is: 0x73a3 + asic_info['rev_id'] is: 0x00 + + asic_info['asic_serial'] is: 0x1F75223E5E64EAC1 + + asic_info['oam_id'] is: N/A + +###Test amdsmi_get_power_cap_info + power_info['dpm_cap'] is: 1 + power_info['power_cap'] is: 213000000 + +###Test amdsmi_get_gpu_vbios_info + vbios_info['part_number'] is: 113-D4300100-100 + vbios_info['build_date'] is: 2021/04/22 09:34 + vbios_info['name'] is: NAVI21 D43001 GLXL + + vbios_info['version'] is: 020.001.000.060.016898 + +###Test amdsmi_get_gpu_board_info + board_info['model_number'] is: N/A + + board_info['product_serial'] is: N/A + + board_info['fru_id'] is: N/A + + board_info['manufacturer_name'] is: Advanced Micro Devices, Inc. [AMD/ATI] + + board_info['product_name'] is: Navi 21 GL-XL [Radeon PRO W6800] + +###Test amdsmi_get_fw_info +FW name: AMDSMI_FW_ID_CP_CE +FW version: 37 +FW name: AMDSMI_FW_ID_CP_PFP +FW version: 98 +FW name: AMDSMI_FW_ID_CP_ME +FW version: 64 +FW name: AMDSMI_FW_ID_CP_MEC1 +FW version: 118 +FW name: AMDSMI_FW_ID_CP_MEC2 +FW version: 118 +FW name: AMDSMI_FW_ID_RLC +FW version: 96 +FW name: AMDSMI_FW_ID_SDMA0 +FW version: 83 +FW name: AMDSMI_FW_ID_SDMA1 +FW version: 83 +FW name: AMDSMI_FW_ID_VCN +FW version: 31.1E.00.8 +FW name: AMDSMI_FW_ID_PSP_SOSDRV +FW version: 21.0E.64 +FW name: AMDSMI_FW_ID_ASD +FW version: 553648340 +FW name: AMDSMI_FW_ID_TA_RAS +FW version: 1B.00.01.3E +FW name: AMDSMI_FW_ID_TA_XGMI +FW version: 20.00.00.0F +FW name: AMDSMI_FW_ID_PM +FW version: 58.89.0 +###Test amdsmi_get_gpu_driver_info +Driver info: {'driver_name': 'amdgpu', 'driver_version': '6.7.8', 'driver_date': '2015/01/01 00:00'} +###Test amdsmi_get_gpu_driver_info() | END walk_through | processor i = 1 +ok + +---------------------------------------------------------------------- +Ran 6 tests in 0.083s + +OK + +~~~ + +
+ + +### Unittest Run: Verbose on + Filter (or exclude) a test + +```/opt/rocm/share/amd_smi/tests/pytest/integration_test.py -k "test_walkthrough" -v``` + +```/opt/rocm/share/amd_smi/tests/pytest/integration_test.py -k "not test_walkthrough" -v``` + +ex. +
+ Click for example: Unittest Run: Verbose on + Filter (or exclude) a Test + +~~~shell +> /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -k "test_bdf_device_id" -v +test_bdf_device_id (__main__.TestAmdSmiPythonInterface) ... ###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_vbios_info + + vbios_info['part_number'] is: 113-D41207XL-038 + vbios_info['build_date'] is: 2020/10/06 17:59 + vbios_info['version'] is: 020.001.000.038.015697 + + vbios_info['name'] is: NAVI21 Gaming XL D412 + +###Test amdsmi_get_gpu_device_uuid + + uuid is: 81ff73bf-0000-1000-80c1-6890a5911040 +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_vbios_info + + vbios_info['part_number'] is: 113-D4300100-100 + vbios_info['build_date'] is: 2021/04/22 09:34 + vbios_info['version'] is: 020.001.000.060.016898 + + vbios_info['name'] is: NAVI21 D43001 GLXL + +###Test amdsmi_get_gpu_device_uuid + + uuid is: 1fff73a3-0000-1000-8075-223e5e64eac1 +ok + +---------------------------------------------------------------------- +Ran 1 test in 0.012s + +OK +~~~ +
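The sketch below illustrates how unittest's `-k` filter selects tests, using a hypothetical `TestExample` class in place of the real test classes. To our understanding, unittest wraps a plain keyword in `*` wildcards and matches it against the full test name. It does not interpret `not` the way pytest's `-k` expressions do, so if the script runs through plain `unittest.main()` (an assumption), exclusion may be better served by the Pytest Run. The `select` helper is illustrative only and is not part of the test suite.

```python
import unittest


class TestExample(unittest.TestCase):
    """Hypothetical stand-in for the real AMD SMI test classes."""

    def test_walkthrough(self):
        self.assertTrue(True)

    def test_bdf_device_id(self):
        self.assertTrue(True)

    def test_ecc(self):
        self.assertTrue(True)


def select(keyword):
    """Return the test method names a unittest-style -k keyword would select."""
    # A keyword without '*' is treated as a substring match: '*keyword*'.
    pattern = keyword if "*" in keyword else f"*{keyword}*"
    loader = unittest.TestLoader()
    loader.testNamePatterns = [pattern]
    return loader.getTestCaseNames(TestExample)


print(select("test_walkthrough"))      # only the matching test
print(select("test_"))                 # every test method
print(select("not test_walkthrough"))  # 'not' is matched literally here
```

In this sketch the `not` keyword is treated as ordinary text, so the last call selects no tests.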
+ + +### Unittest Run: Silence stdout (print statements) and run all tests + Runs all tests and silences print statements to stdout. Lists test results. + This is also the best way to list all tests available. + +```/opt/rocm/share/amd_smi/tests/pytest/integration_test.py -b -v``` + +```/opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -b -v``` + +ex. +
+ Click for example: Unittest Run: Silence stdout (print statements) and run all tests + +~~~shell +/opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -b -v +test_check_res (__main__.TestAmdSmiPythonBDF) ... ok +test_format_bdf (__main__.TestAmdSmiPythonBDF) ... ok +test_parse_bdf (__main__.TestAmdSmiPythonBDF) ... ok + +---------------------------------------------------------------------- +Ran 3 tests in 0.001s + +OK +~~~ + +
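As a rough illustration of what `-b` does, the sketch below runs a hypothetical `TestNoisy` test through the standard unittest runner with and without buffering. It assumes the real tests use the standard runner. The class and the `run` helper are made up for this example. With buffering on, output printed by a passing test is discarded. The runner replays it only for failing tests.

```python
import contextlib
import io
import unittest


class TestNoisy(unittest.TestCase):
    """Hypothetical test that prints, much like the integration tests do."""

    def test_prints(self):
        print("###Test noisy output")
        self.assertEqual(1 + 1, 2)


def run(buffer):
    """Run TestNoisy and return (result, text the test printed to stdout)."""
    printed = io.StringIO()   # collects anything that reaches stdout
    report = io.StringIO()    # collects the runner's own report
    suite = unittest.TestLoader().loadTestsFromTestCase(TestNoisy)
    runner = unittest.TextTestRunner(stream=report, verbosity=2, buffer=buffer)
    with contextlib.redirect_stdout(printed):
        result = runner.run(suite)
    return result, printed.getvalue()


result, output = run(buffer=False)
print("without -b:", repr(output))
result, output = run(buffer=True)
print("with -b:   ", repr(output))
```

Passing `buffer=True` to the runner corresponds to the `-b` command-line flag.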
+ +## Pytest Run Options +### Pytest: List tests +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/integration_test.py --co``` + +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/unit_tests.py --co``` + +ex. +
+ Click for example: Pytest: List tests + +~~~shell +python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/integration_test.py --co +===================================================== test session starts ===================================================== +platform linux -- Python 3.8.10, pytest-8.2.2, pluggy-1.5.0 +rootdir: /opt/rocm/share/amd_smi +configfile: pyproject.toml +collected 6 items + + + + + + + + + + + + + + +================================================= 6 tests collected in 0.04s ================================================== +~~~ +
+ +### Pytest Run: Verbose on +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -v``` + +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -v``` + +ex. +
+ Click for example: Pytest Run: verbose on + +~~~shell + python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -v +===================================================== test session starts ===================================================== +platform linux -- Python 3.8.10, pytest-8.2.2, pluggy-1.5.0 -- /usr/bin/python3 +rootdir: /opt/rocm/share/amd_smi +configfile: pyproject.toml +collected 3 items + +../../opt/rocm/share/amd_smi/tests/pytest/unit_tests.py::TestAmdSmiPythonBDF::test_check_res PASSED [ 33%] +../../opt/rocm/share/amd_smi/tests/pytest/unit_tests.py::TestAmdSmiPythonBDF::test_format_bdf PASSED [ 66%] +../../opt/rocm/share/amd_smi/tests/pytest/unit_tests.py::TestAmdSmiPythonBDF::test_parse_bdf PASSED [100%] + +====================================================== 3 passed in 0.04s ====================================================== +~~~ +
+ +### Pytest Run: Verbose on + stdout (print statements) +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -s -v``` + +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -s -v``` + +ex. +
+ Click for example: Pytest Run: verbose on + stdout (print statements) + +~~~shell +python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -s -v +===================================================== test session starts ===================================================== +platform linux -- Python 3.8.10, pytest-8.2.2, pluggy-1.5.0 -- /usr/bin/python3 +rootdir: /opt/rocm/share/amd_smi +configfile: pyproject.toml +collected 6 items + +../../opt/rocm/share/amd_smi/tests/pytest/integration_test.py::TestAmdSmiInit::test_init PASSED +../../opt/rocm/share/amd_smi/tests/pytest/integration_test.py::TestAmdSmiPythonInterface::test_bad_page_info ###Test amdsmi_get_gpu_bad_page_info + +**** [ERROR] | Test: test_bad_page_info | Caught AmdSmiLibraryException +PASSED +../../opt/rocm/share/amd_smi/tests/pytest/integration_test.py::TestAmdSmiPythonInterface::test_bdf_device_id ###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_vbios_info + + vbios_info['part_number'] is: 113-D41207XL-038 + vbios_info['build_date'] is: 2020/10/06 17:59 + vbios_info['version'] is: 020.001.000.038.015697 + + vbios_info['name'] is: NAVI21 Gaming XL D412 + +###Test amdsmi_get_gpu_device_uuid + + uuid is: 81ff73bf-0000-1000-80c1-6890a5911040 +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_vbios_info + + vbios_info['part_number'] is: 113-D4300100-100 + vbios_info['build_date'] is: 2021/04/22 09:34 + vbios_info['version'] is: 020.001.000.060.016898 + + vbios_info['name'] is: NAVI21 D43001 GLXL + +###Test amdsmi_get_gpu_device_uuid + + uuid is: 1fff73a3-0000-1000-8075-223e5e64eac1 +PASSED +../../opt/rocm/share/amd_smi/tests/pytest/integration_test.py::TestAmdSmiPythonInterface::test_ecc ###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_ras_feature_info + +**** [ERROR] | Test: test_ecc | Caught AmdSmiLibraryException +PASSED 
+../../opt/rocm/share/amd_smi/tests/pytest/integration_test.py::TestAmdSmiPythonInterface::test_gpu_performance ###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_activity + engine_usage['gfx_activity'] is: 1 % + engine_usage['umc_activity'] is: 0 % + engine_usage['mm_activity'] is: 0 % + +###Test amdsmi_get_power_info + power_info['current_socket_power'] is: N/A + power_info['average_socket_power'] is: 8 + power_info['gfx_voltage'] is: 768 + power_info['soc_voltage'] is: 918 + power_info['mem_voltage'] is: 1250 + power_info['power_limit'] is: 203000000 +###Test amdsmi_is_gpu_power_management_enabled + Is power management enabled is: True +###Test amdsmi_get_temp_metric + Current temperature for EDGE is: 42 + Current temperature for HOTSPOT is: 43 + Current temperature for VRAM is: 38 +###Test amdsmi_get_temp_metric + Limit (critical) temperature for EDGE is: 100 + Limit (critical) temperature for HOTSPOT is: 110 + Limit (critical) temperature for VRAM is: 100 +###Test amdsmi_get_temp_metric + Shutdown (emergency) temperature for EDGE is: 105 + Shutdown (emergency) temperature for HOTSPOT is: 115 + Shutdown (emergency) temperature for VRAM is: 105 +###Test amdsmi_get_clock_info + Current clock for domain GFX is: 500 + Max clock for domain GFX is: 2475 + Min clock for domain GFX is: 500 + Is GFX clock locked: 0 + Is GFX clock in deep sleep: 255 + Current clock for domain MEM is: 96 + Max clock for domain MEM is: 1000 + Min clock for domain MEM is: 96 + Is MEM clock in deep sleep: 255 + Current clock for domain VCLK0 is: 0 + Max clock for domain VCLK0 is: 0 + Min clock for domain VCLK0 is: 0 + Is VCLK0 clock in deep sleep: 255 + Current clock for domain VCLK1 is: 0 + Max clock for domain VCLK1 is: 0 + Min clock for domain VCLK1 is: 0 + Is VCLK1 clock in deep sleep: 255 + Current clock for domain DCLK0 is: 0 + Max clock for domain DCLK0 is: 0 + Min clock for domain DCLK0 is: 0 + Is DCLK0 clock in deep sleep: 255 + Current clock for domain DCLK1 is: 0 + 
Max clock for domain DCLK1 is: 0 + Min clock for domain DCLK1 is: 0 + Is DCLK1 clock in deep sleep: 255 +###Test amdsmi_get_pcie_info + pcie_info['pcie_metric']['pcie_width'] is: 4 + pcie_info['pcie_static']['max_pcie_width'] is: 16 + pcie_info['pcie_metric']['pcie_speed'] is: 5000 MT/s + pcie_info['pcie_static']['max_pcie_speed'] is: 16000 + pcie_info['pcie_static']['pcie_interface_version'] is: 4 + pcie_info['pcie_static']['slot_type'] is: CEM + pcie_info['pcie_metric']['pcie_replay_count'] is: N/A + pcie_info['pcie_metric']['pcie_bandwidth'] is: N/A + pcie_info['pcie_metric']['pcie_l0_to_recovery_count'] is: N/A + pcie_info['pcie_metric']['pcie_replay_roll_over_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_sent_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_received_count'] is: N/A +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_activity + engine_usage['gfx_activity'] is: 0 % + engine_usage['umc_activity'] is: 0 % + engine_usage['mm_activity'] is: 0 % + +###Test amdsmi_get_power_info + power_info['current_socket_power'] is: N/A + power_info['average_socket_power'] is: 13 + power_info['gfx_voltage'] is: 787 + power_info['soc_voltage'] is: 806 + power_info['mem_voltage'] is: 1250 + power_info['power_limit'] is: 213000000 +###Test amdsmi_is_gpu_power_management_enabled + Is power management enabled is: True +###Test amdsmi_get_temp_metric + Current temperature for EDGE is: 34 + Current temperature for HOTSPOT is: 37 + Current temperature for VRAM is: 36 +###Test amdsmi_get_temp_metric + Limit (critical) temperature for EDGE is: 109 + Limit (critical) temperature for HOTSPOT is: 110 + Limit (critical) temperature for VRAM is: 100 +###Test amdsmi_get_temp_metric + Shutdown (emergency) temperature for EDGE is: 114 + Shutdown (emergency) temperature for HOTSPOT is: 115 + Shutdown (emergency) temperature for VRAM is: 105 +###Test amdsmi_get_clock_info + Current clock for domain GFX is: 500 + Max clock for domain GFX is: 2555 + Min clock 
for domain GFX is: 500 + Is GFX clock locked: 0 + Is GFX clock in deep sleep: 255 + Current clock for domain MEM is: 96 + Max clock for domain MEM is: 1000 + Min clock for domain MEM is: 96 + Is MEM clock in deep sleep: 255 + Current clock for domain VCLK0 is: 0 + Max clock for domain VCLK0 is: 0 + Min clock for domain VCLK0 is: 0 + Is VCLK0 clock in deep sleep: 255 + Current clock for domain VCLK1 is: 0 + Max clock for domain VCLK1 is: 0 + Min clock for domain VCLK1 is: 0 + Is VCLK1 clock in deep sleep: 255 + Current clock for domain DCLK0 is: 0 + Max clock for domain DCLK0 is: 0 + Min clock for domain DCLK0 is: 0 + Is DCLK0 clock in deep sleep: 255 + Current clock for domain DCLK1 is: 0 + Max clock for domain DCLK1 is: 0 + Min clock for domain DCLK1 is: 0 + Is DCLK1 clock in deep sleep: 255 +###Test amdsmi_get_pcie_info + pcie_info['pcie_metric']['pcie_width'] is: 16 + pcie_info['pcie_static']['max_pcie_width'] is: 16 + pcie_info['pcie_metric']['pcie_speed'] is: 8000 MT/s + pcie_info['pcie_static']['max_pcie_speed'] is: 16000 + pcie_info['pcie_static']['pcie_interface_version'] is: 4 + pcie_info['pcie_static']['slot_type'] is: CEM + pcie_info['pcie_metric']['pcie_replay_count'] is: N/A + pcie_info['pcie_metric']['pcie_bandwidth'] is: N/A + pcie_info['pcie_metric']['pcie_l0_to_recovery_count'] is: N/A + pcie_info['pcie_metric']['pcie_replay_roll_over_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_sent_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_received_count'] is: N/A +PASSED +../../opt/rocm/share/amd_smi/tests/pytest/integration_test.py::TestAmdSmiPythonInterface::test_walkthrough ###Test amdsmi_get_processor_handles() +###Test amdsmi_get_gpu_device_bdf() | START walk_through | processor i = 0 +###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_asic_info + asic_info['market_name'] is: NAVI21 + asic_info['vendor_id'] is: 0x1002 + asic_info['vendor_name'] is: Advanced Micro Devices Inc. 
[AMD/ATI] + asic_info['device_id'] is: 0x73bf + asic_info['rev_id'] is: 0xc3 + + asic_info['asic_serial'] is: 0x81C16890A5911040 + + asic_info['oam_id'] is: N/A + +###Test amdsmi_get_power_cap_info + power_info['dpm_cap'] is: 1 + power_info['power_cap'] is: 203000000 + +###Test amdsmi_get_gpu_vbios_info + vbios_info['part_number'] is: 113-D41207XL-038 + vbios_info['build_date'] is: 2020/10/06 17:59 + vbios_info['name'] is: NAVI21 Gaming XL D412 + + vbios_info['version'] is: 020.001.000.038.015697 + +###Test amdsmi_get_gpu_board_info + board_info['model_number'] is: N/A + + board_info['product_serial'] is: N/A + + board_info['fru_id'] is: N/A + + board_info['manufacturer_name'] is: Advanced Micro Devices, Inc. [AMD/ATI] + + board_info['product_name'] is: Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] + +###Test amdsmi_get_fw_info +FW name: AMDSMI_FW_ID_CP_CE +FW version: 37 +FW name: AMDSMI_FW_ID_CP_PFP +FW version: 98 +FW name: AMDSMI_FW_ID_CP_ME +FW version: 64 +FW name: AMDSMI_FW_ID_CP_MEC1 +FW version: 118 +FW name: AMDSMI_FW_ID_CP_MEC2 +FW version: 118 +FW name: AMDSMI_FW_ID_RLC +FW version: 96 +FW name: AMDSMI_FW_ID_SDMA0 +FW version: 83 +FW name: AMDSMI_FW_ID_SDMA1 +FW version: 83 +FW name: AMDSMI_FW_ID_VCN +FW version: 31.1E.00.8 +FW name: AMDSMI_FW_ID_PSP_SOSDRV +FW version: 21.0E.64 +FW name: AMDSMI_FW_ID_ASD +FW version: 553648340 +FW name: AMDSMI_FW_ID_TA_RAS +FW version: 1B.00.01.3E +FW name: AMDSMI_FW_ID_TA_XGMI +FW version: 20.00.00.0F +FW name: AMDSMI_FW_ID_PM +FW version: 58.89.0 +###Test amdsmi_get_gpu_driver_info +Driver info: {'driver_name': 'amdgpu', 'driver_version': '6.7.8', 'driver_date': '2015/01/01 00:00'} +###Test amdsmi_get_gpu_driver_info() | END walk_through | processor i = 0 +###Test amdsmi_get_gpu_device_bdf() | START walk_through | processor i = 1 +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_asic_info + asic_info['market_name'] is: Navi 21 GL-XL [Radeon PRO W6800] + asic_info['vendor_id'] is: 0x1002 + 
asic_info['vendor_name'] is: Advanced Micro Devices Inc. [AMD/ATI] + asic_info['device_id'] is: 0x73a3 + asic_info['rev_id'] is: 0x00 + + asic_info['asic_serial'] is: 0x1F75223E5E64EAC1 + + asic_info['oam_id'] is: N/A + +###Test amdsmi_get_power_cap_info + power_info['dpm_cap'] is: 1 + power_info['power_cap'] is: 213000000 + +###Test amdsmi_get_gpu_vbios_info + vbios_info['part_number'] is: 113-D4300100-100 + vbios_info['build_date'] is: 2021/04/22 09:34 + vbios_info['name'] is: NAVI21 D43001 GLXL + + vbios_info['version'] is: 020.001.000.060.016898 + +###Test amdsmi_get_gpu_board_info + board_info['model_number'] is: N/A + + board_info['product_serial'] is: N/A + + board_info['fru_id'] is: N/A + + board_info['manufacturer_name'] is: Advanced Micro Devices, Inc. [AMD/ATI] + + board_info['product_name'] is: Navi 21 GL-XL [Radeon PRO W6800] + +###Test amdsmi_get_fw_info +FW name: AMDSMI_FW_ID_CP_CE +FW version: 37 +FW name: AMDSMI_FW_ID_CP_PFP +FW version: 98 +FW name: AMDSMI_FW_ID_CP_ME +FW version: 64 +FW name: AMDSMI_FW_ID_CP_MEC1 +FW version: 118 +FW name: AMDSMI_FW_ID_CP_MEC2 +FW version: 118 +FW name: AMDSMI_FW_ID_RLC +FW version: 96 +FW name: AMDSMI_FW_ID_SDMA0 +FW version: 83 +FW name: AMDSMI_FW_ID_SDMA1 +FW version: 83 +FW name: AMDSMI_FW_ID_VCN +FW version: 31.1E.00.8 +FW name: AMDSMI_FW_ID_PSP_SOSDRV +FW version: 21.0E.64 +FW name: AMDSMI_FW_ID_ASD +FW version: 553648340 +FW name: AMDSMI_FW_ID_TA_RAS +FW version: 1B.00.01.3E +FW name: AMDSMI_FW_ID_TA_XGMI +FW version: 20.00.00.0F +FW name: AMDSMI_FW_ID_PM +FW version: 58.89.0 +###Test amdsmi_get_gpu_driver_info +Driver info: {'driver_name': 'amdgpu', 'driver_version': '6.7.8', 'driver_date': '2015/01/01 00:00'} +###Test amdsmi_get_gpu_driver_info() | END walk_through | processor i = 1 +PASSED + +====================================================== 6 passed in 0.13s ====================================================== +~~~ +
+ +### Pytest Run: Verbose on + Filter (or exclude) a Test +Use [Pytest: List tests](#pytest-list-tests) to find the test names, then either exclude a test (with "not") or run only the specified test. + +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -k "test_gpu_performance" -v``` + +```python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -k "not test_gpu_performance" -v``` + +ex. +
+ Click for example: Pytest Run: Verbose on + Filter (or exclude) a Test + +~~~shell +python3 -m pytest -p no:cacheprovider /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -k "not test_gpu_performance" -v +===================================================== test session starts ===================================================== +platform linux -- Python 3.8.10, pytest-8.2.2, pluggy-1.5.0 -- /usr/bin/python3 +rootdir: /opt/rocm/share/amd_smi +configfile: pyproject.toml +collected 6 items / 1 deselected / 5 selected + +../../opt/rocm/share/amd_smi/tests/pytest/integration_test.py::TestAmdSmiInit::test_init PASSED [ 20%] +../../opt/rocm/share/amd_smi/tests/pytest/integration_test.py::TestAmdSmiPythonInterface::test_bad_page_info PASSED [ 40%] +../../opt/rocm/share/amd_smi/tests/pytest/integration_test.py::TestAmdSmiPythonInterface::test_bdf_device_id PASSED [ 60%] +../../opt/rocm/share/amd_smi/tests/pytest/integration_test.py::TestAmdSmiPythonInterface::test_ecc PASSED [ 80%] +../../opt/rocm/share/amd_smi/tests/pytest/integration_test.py::TestAmdSmiPythonInterface::test_walkthrough PASSED [100%] + +=============================================== 5 passed, 1 deselected in 0.09s =============================================== +~~~ +
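If you script these runs, a small helper can keep the flags consistent. The sketch below is only an illustration: `pytest_command` and `TEST_DIR` are hypothetical names, and the function only builds the argument list shown in the commands above. It does not run pytest.

```python
import shlex

# Install location used throughout this README.
TEST_DIR = "/opt/rocm/share/amd_smi/tests/pytest"


def pytest_command(test_file, keyword=None, show_stdout=False):
    """Build the pytest command line for one test file (hypothetical helper)."""
    cmd = ["python3", "-m", "pytest", "-p", "no:cacheprovider",
           f"{TEST_DIR}/{test_file}"]
    if keyword:
        cmd += ["-k", keyword]   # e.g. "not test_gpu_performance"
    if show_stdout:
        cmd.append("-s")         # show print statements
    cmd.append("-v")             # verbose
    return cmd


cmd = pytest_command("integration_test.py", keyword="not test_gpu_performance")
print(shlex.join(cmd))
# To execute it, pass the list to subprocess.run(cmd).
```

Building the command as a list avoids shell-quoting problems with keywords that contain spaces, such as the `not ...` form.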
+ +## Run Tests +### Example Runs +Please refer to Python's unittest documentation for a better overview of the commands to run. + +```shell +python3 /opt/rocm/share/amd_smi/tests/pytest/unit_tests.py -v +test_check_res (tests.amd_smi_test.py-test.unit_tests.TestAmdSmiPythonBDF) ... ok +test_format_bdf (tests.amd_smi_test.py-test.unit_tests.TestAmdSmiPythonBDF) ... ok +test_parse_bdf (tests.amd_smi_test.py-test.unit_tests.TestAmdSmiPythonBDF) ... ok +``` + +```shell +python3 /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -v +test_init (__main__.TestAmdSmiInit) ... ok +test_bad_page_info (__main__.TestAmdSmiPythonInterface) ... ###Test amdsmi_get_gpu_bad_page_info + +**** [ERROR] | Test: test_bad_page_info | Caught AmdSmiLibraryException +ok +test_bdf_device_id (__main__.TestAmdSmiPythonInterface) ... ###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_vbios_info + + vbios_info['part_number'] is: 113-D41207XL-038 + vbios_info['build_date'] is: 2020/10/06 17:59 + vbios_info['version'] is: 020.001.000.038.015697 + + vbios_info['name'] is: NAVI21 Gaming XL D412 + +###Test amdsmi_get_gpu_device_uuid + + uuid is: 81ff73bf-0000-1000-80c1-6890a5911040 +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_vbios_info + + vbios_info['part_number'] is: 113-D4300100-100 + vbios_info['build_date'] is: 2021/04/22 09:34 + vbios_info['version'] is: 020.001.000.060.016898 + + vbios_info['name'] is: NAVI21 D43001 GLXL + +###Test amdsmi_get_gpu_device_uuid + + uuid is: 1fff73a3-0000-1000-8075-223e5e64eac1 +ok +test_ecc (__main__.TestAmdSmiPythonInterface) ... ###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_ras_feature_info + +**** [ERROR] | Test: test_ecc | Caught AmdSmiLibraryException +ok +test_gpu_performance (__main__.TestAmdSmiPythonInterface) ... 
###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_activity + engine_usage['gfx_activity'] is: 5 % + engine_usage['umc_activity'] is: 0 % + engine_usage['mm_activity'] is: 0 % + +###Test amdsmi_get_power_info + power_info['current_socket_power'] is: N/A + power_info['average_socket_power'] is: 8 + power_info['gfx_voltage'] is: 768 + power_info['soc_voltage'] is: 918 + power_info['mem_voltage'] is: 1250 + power_info['power_limit'] is: 203000000 +###Test amdsmi_is_gpu_power_management_enabled + Is power management enabled is: True +###Test amdsmi_get_temp_metric + Current temperature for EDGE is: 41 + Current temperature for HOTSPOT is: 42 + Current temperature for VRAM is: 38 +###Test amdsmi_get_temp_metric + Limit (critical) temperature for EDGE is: 100 + Limit (critical) temperature for HOTSPOT is: 110 + Limit (critical) temperature for VRAM is: 100 +###Test amdsmi_get_temp_metric + Shutdown (emergency) temperature for EDGE is: 105 + Shutdown (emergency) temperature for HOTSPOT is: 115 + Shutdown (emergency) temperature for VRAM is: 105 +###Test amdsmi_get_clock_info + Current clock for domain GFX is: 500 + Max clock for domain GFX is: 2475 + Min clock for domain GFX is: 500 + Is GFX clock locked: 0 + Is GFX clock in deep sleep: 255 + Current clock for domain MEM is: 96 + Max clock for domain MEM is: 1000 + Min clock for domain MEM is: 96 + Is MEM clock in deep sleep: 255 + Current clock for domain VCLK0 is: 0 + Max clock for domain VCLK0 is: 0 + Min clock for domain VCLK0 is: 0 + Is VCLK0 clock in deep sleep: 255 + Current clock for domain VCLK1 is: 0 + Max clock for domain VCLK1 is: 0 + Min clock for domain VCLK1 is: 0 + Is VCLK1 clock in deep sleep: 255 + Current clock for domain DCLK0 is: 0 + Max clock for domain DCLK0 is: 0 + Min clock for domain DCLK0 is: 0 + Is DCLK0 clock in deep sleep: 255 + Current clock for domain DCLK1 is: 0 + Max clock for domain DCLK1 is: 0 + Min clock for domain DCLK1 is: 0 + Is DCLK1 clock in deep sleep: 255 +###Test 
amdsmi_get_pcie_info + pcie_info['pcie_metric']['pcie_width'] is: 4 + pcie_info['pcie_static']['max_pcie_width'] is: 16 + pcie_info['pcie_metric']['pcie_speed'] is: 5000 MT/s + pcie_info['pcie_static']['max_pcie_speed'] is: 16000 + pcie_info['pcie_static']['pcie_interface_version'] is: 4 + pcie_info['pcie_static']['slot_type'] is: CEM + pcie_info['pcie_metric']['pcie_replay_count'] is: N/A + pcie_info['pcie_metric']['pcie_bandwidth'] is: N/A + pcie_info['pcie_metric']['pcie_l0_to_recovery_count'] is: N/A + pcie_info['pcie_metric']['pcie_replay_roll_over_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_sent_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_received_count'] is: N/A +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_activity + engine_usage['gfx_activity'] is: 0 % + engine_usage['umc_activity'] is: 0 % + engine_usage['mm_activity'] is: 0 % + +###Test amdsmi_get_power_info + power_info['current_socket_power'] is: N/A + power_info['average_socket_power'] is: 12 + power_info['gfx_voltage'] is: 787 + power_info['soc_voltage'] is: 806 + power_info['mem_voltage'] is: 1250 + power_info['power_limit'] is: 213000000 +###Test amdsmi_is_gpu_power_management_enabled + Is power management enabled is: True +###Test amdsmi_get_temp_metric + Current temperature for EDGE is: 33 + Current temperature for HOTSPOT is: 37 + Current temperature for VRAM is: 36 +###Test amdsmi_get_temp_metric + Limit (critical) temperature for EDGE is: 109 + Limit (critical) temperature for HOTSPOT is: 110 + Limit (critical) temperature for VRAM is: 100 +###Test amdsmi_get_temp_metric + Shutdown (emergency) temperature for EDGE is: 114 + Shutdown (emergency) temperature for HOTSPOT is: 115 + Shutdown (emergency) temperature for VRAM is: 105 +###Test amdsmi_get_clock_info + Current clock for domain GFX is: 500 + Max clock for domain GFX is: 2555 + Min clock for domain GFX is: 500 + Is GFX clock locked: 0 + Is GFX clock in deep sleep: 255 + Current clock for domain MEM 
is: 96 + Max clock for domain MEM is: 1000 + Min clock for domain MEM is: 96 + Is MEM clock in deep sleep: 255 + Current clock for domain VCLK0 is: 0 + Max clock for domain VCLK0 is: 0 + Min clock for domain VCLK0 is: 0 + Is VCLK0 clock in deep sleep: 255 + Current clock for domain VCLK1 is: 0 + Max clock for domain VCLK1 is: 0 + Min clock for domain VCLK1 is: 0 + Is VCLK1 clock in deep sleep: 255 + Current clock for domain DCLK0 is: 0 + Max clock for domain DCLK0 is: 0 + Min clock for domain DCLK0 is: 0 + Is DCLK0 clock in deep sleep: 255 + Current clock for domain DCLK1 is: 0 + Max clock for domain DCLK1 is: 0 + Min clock for domain DCLK1 is: 0 + Is DCLK1 clock in deep sleep: 255 +###Test amdsmi_get_pcie_info + pcie_info['pcie_metric']['pcie_width'] is: 16 + pcie_info['pcie_static']['max_pcie_width'] is: 16 + pcie_info['pcie_metric']['pcie_speed'] is: 8000 MT/s + pcie_info['pcie_static']['max_pcie_speed'] is: 16000 + pcie_info['pcie_static']['pcie_interface_version'] is: 4 + pcie_info['pcie_static']['slot_type'] is: CEM + pcie_info['pcie_metric']['pcie_replay_count'] is: N/A + pcie_info['pcie_metric']['pcie_bandwidth'] is: N/A + pcie_info['pcie_metric']['pcie_l0_to_recovery_count'] is: N/A + pcie_info['pcie_metric']['pcie_replay_roll_over_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_sent_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_received_count'] is: N/A +ok +test_walkthrough (__main__.TestAmdSmiPythonInterface) ... ###Test amdsmi_get_processor_handles() +###Test amdsmi_get_gpu_device_bdf() | START walk_through | processor i = 0 +###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_asic_info + asic_info['market_name'] is: NAVI21 + asic_info['vendor_id'] is: 0x1002 + asic_info['vendor_name'] is: Advanced Micro Devices Inc. 
[AMD/ATI] + asic_info['device_id'] is: 0x73bf + asic_info['rev_id'] is: 0xc3 + + asic_info['asic_serial'] is: 0x81C16890A5911040 + + asic_info['oam_id'] is: N/A + +###Test amdsmi_get_power_cap_info + power_info['dpm_cap'] is: 1 + power_info['power_cap'] is: 203000000 + +###Test amdsmi_get_gpu_vbios_info + vbios_info['part_number'] is: 113-D41207XL-038 + vbios_info['build_date'] is: 2020/10/06 17:59 + vbios_info['name'] is: NAVI21 Gaming XL D412 + + vbios_info['version'] is: 020.001.000.038.015697 + +###Test amdsmi_get_gpu_board_info + board_info['model_number'] is: N/A + + board_info['product_serial'] is: N/A + + board_info['fru_id'] is: N/A + + board_info['manufacturer_name'] is: Advanced Micro Devices, Inc. [AMD/ATI] + + board_info['product_name'] is: Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] + +###Test amdsmi_get_fw_info +FW name: AMDSMI_FW_ID_CP_CE +FW version: 37 +FW name: AMDSMI_FW_ID_CP_PFP +FW version: 98 +FW name: AMDSMI_FW_ID_CP_ME +FW version: 64 +FW name: AMDSMI_FW_ID_CP_MEC1 +FW version: 118 +FW name: AMDSMI_FW_ID_CP_MEC2 +FW version: 118 +FW name: AMDSMI_FW_ID_RLC +FW version: 96 +FW name: AMDSMI_FW_ID_SDMA0 +FW version: 83 +FW name: AMDSMI_FW_ID_SDMA1 +FW version: 83 +FW name: AMDSMI_FW_ID_VCN +FW version: 31.1E.00.8 +FW name: AMDSMI_FW_ID_PSP_SOSDRV +FW version: 21.0E.64 +FW name: AMDSMI_FW_ID_ASD +FW version: 553648340 +FW name: AMDSMI_FW_ID_TA_RAS +FW version: 1B.00.01.3E +FW name: AMDSMI_FW_ID_TA_XGMI +FW version: 20.00.00.0F +FW name: AMDSMI_FW_ID_PM +FW version: 58.89.0 +###Test amdsmi_get_gpu_driver_info +Driver info: {'driver_name': 'amdgpu', 'driver_version': '6.7.8', 'driver_date': '2015/01/01 00:00'} +###Test amdsmi_get_gpu_driver_info() | END walk_through | processor i = 0 +###Test amdsmi_get_gpu_device_bdf() | START walk_through | processor i = 1 +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_asic_info + asic_info['market_name'] is: Navi 21 GL-XL [Radeon PRO W6800] + asic_info['vendor_id'] is: 0x1002 + 
asic_info['vendor_name'] is: Advanced Micro Devices Inc. [AMD/ATI] + asic_info['device_id'] is: 0x73a3 + asic_info['rev_id'] is: 0x00 + + asic_info['asic_serial'] is: 0x1F75223E5E64EAC1 + + asic_info['oam_id'] is: N/A + +###Test amdsmi_get_power_cap_info + power_info['dpm_cap'] is: 1 + power_info['power_cap'] is: 213000000 + +###Test amdsmi_get_gpu_vbios_info + vbios_info['part_number'] is: 113-D4300100-100 + vbios_info['build_date'] is: 2021/04/22 09:34 + vbios_info['name'] is: NAVI21 D43001 GLXL + + vbios_info['version'] is: 020.001.000.060.016898 + +###Test amdsmi_get_gpu_board_info + board_info['model_number'] is: N/A + + board_info['product_serial'] is: N/A + + board_info['fru_id'] is: N/A + + board_info['manufacturer_name'] is: Advanced Micro Devices, Inc. [AMD/ATI] + + board_info['product_name'] is: Navi 21 GL-XL [Radeon PRO W6800] + +###Test amdsmi_get_fw_info +FW name: AMDSMI_FW_ID_CP_CE +FW version: 37 +FW name: AMDSMI_FW_ID_CP_PFP +FW version: 98 +FW name: AMDSMI_FW_ID_CP_ME +FW version: 64 +FW name: AMDSMI_FW_ID_CP_MEC1 +FW version: 118 +FW name: AMDSMI_FW_ID_CP_MEC2 +FW version: 118 +FW name: AMDSMI_FW_ID_RLC +FW version: 96 +FW name: AMDSMI_FW_ID_SDMA0 +FW version: 83 +FW name: AMDSMI_FW_ID_SDMA1 +FW version: 83 +FW name: AMDSMI_FW_ID_VCN +FW version: 31.1E.00.8 +FW name: AMDSMI_FW_ID_PSP_SOSDRV +FW version: 21.0E.64 +FW name: AMDSMI_FW_ID_ASD +FW version: 553648340 +FW name: AMDSMI_FW_ID_TA_RAS +FW version: 1B.00.01.3E +FW name: AMDSMI_FW_ID_TA_XGMI +FW version: 20.00.00.0F +FW name: AMDSMI_FW_ID_PM +FW version: 58.89.0 +###Test amdsmi_get_gpu_driver_info +Driver info: {'driver_name': 'amdgpu', 'driver_version': '6.7.8', 'driver_date': '2015/01/01 00:00'} +###Test amdsmi_get_gpu_driver_info() | END walk_through | processor i = 1 +ok + +---------------------------------------------------------------------- +Ran 6 tests in 0.077s + +OK +``` + +```shell +(Tue Jul-7 12:07:47am)-(CPU 0.3%:0:Net 
18)-(charpoag@mlsetools2:/opt/rocm/share/amd_smi/tests/pytest)-(44K:3) +> python3 -m pytest -s -ra -vvv -p no:cacheprovider +==================================== test session starts ===================================== +platform linux -- Python 3.8.10, pytest-8.2.2, pluggy-1.5.0 -- /usr/bin/python3 +rootdir: /opt/rocm/share/amd_smi +configfile: pyproject.toml +collected 6 items + +integration_test.py::TestAmdSmiInit::test_init PASSED +integration_test.py::TestAmdSmiPythonInterface::test_bad_page_info ###Test amdsmi_get_gpu_bad_page_info + +**** [ERROR] | Test: test_bad_page_info | Caught AmdSmiLibraryException +PASSED +integration_test.py::TestAmdSmiPythonInterface::test_bdf_device_id ###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_vbios_info + + vbios_info['part_number'] is: 113-D41207XL-038 + vbios_info['build_date'] is: 2020/10/06 17:59 + vbios_info['version'] is: 020.001.000.038.015697 + + vbios_info['name'] is: NAVI21 Gaming XL D412 + +###Test amdsmi_get_gpu_device_uuid + + uuid is: 81ff73bf-0000-1000-80c1-6890a5911040 +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_vbios_info + + vbios_info['part_number'] is: 113-D4300100-100 + vbios_info['build_date'] is: 2021/04/22 09:34 + vbios_info['version'] is: 020.001.000.060.016898 + + vbios_info['name'] is: NAVI21 D43001 GLXL + +###Test amdsmi_get_gpu_device_uuid + + uuid is: 1fff73a3-0000-1000-8075-223e5e64eac1 +PASSED +integration_test.py::TestAmdSmiPythonInterface::test_ecc ###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_ras_feature_info + +**** [ERROR] | Test: test_ecc | Caught AmdSmiLibraryException +PASSED +integration_test.py::TestAmdSmiPythonInterface::test_gpu_performance ###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_activity + engine_usage['gfx_activity'] is: 3 % + engine_usage['umc_activity'] is: 0 % + engine_usage['mm_activity'] is: 0 % + +###Test amdsmi_get_power_info + power_info['current_socket_power'] is: N/A + 
power_info['average_socket_power'] is: 8 + power_info['gfx_voltage'] is: 768 + power_info['soc_voltage'] is: 918 + power_info['mem_voltage'] is: 1250 + power_info['power_limit'] is: 203000000 +###Test amdsmi_is_gpu_power_management_enabled + Is power management enabled is: True +###Test amdsmi_get_temp_metric + Current temperature for EDGE is: 44 + Current temperature for HOTSPOT is: 45 + Current temperature for VRAM is: 40 +###Test amdsmi_get_temp_metric + Limit (critical) temperature for EDGE is: 100 + Limit (critical) temperature for HOTSPOT is: 110 + Limit (critical) temperature for VRAM is: 100 +###Test amdsmi_get_temp_metric + Shutdown (emergency) temperature for EDGE is: 105 + Shutdown (emergency) temperature for HOTSPOT is: 115 + Shutdown (emergency) temperature for VRAM is: 105 +###Test amdsmi_get_clock_info + Current clock for domain GFX is: 500 + Max clock for domain GFX is: 2475 + Min clock for domain GFX is: 500 + Is GFX clock locked: 0 + Is GFX clock in deep sleep: 255 + Current clock for domain MEM is: 96 + Max clock for domain MEM is: 1000 + Min clock for domain MEM is: 96 + Is MEM clock in deep sleep: 255 + Current clock for domain VCLK0 is: 0 + Max clock for domain VCLK0 is: 0 + Min clock for domain VCLK0 is: 0 + Is VCLK0 clock in deep sleep: 255 + Current clock for domain VCLK1 is: 0 + Max clock for domain VCLK1 is: 0 + Min clock for domain VCLK1 is: 0 + Is VCLK1 clock in deep sleep: 255 + Current clock for domain DCLK0 is: 0 + Max clock for domain DCLK0 is: 0 + Min clock for domain DCLK0 is: 0 + Is DCLK0 clock in deep sleep: 255 + Current clock for domain DCLK1 is: 0 + Max clock for domain DCLK1 is: 0 + Min clock for domain DCLK1 is: 0 + Is DCLK1 clock in deep sleep: 255 +###Test amdsmi_get_pcie_info + pcie_info['pcie_metric']['pcie_width'] is: 4 + pcie_info['pcie_static']['max_pcie_width'] is: 16 + pcie_info['pcie_metric']['pcie_speed'] is: 5000 MT/s + pcie_info['pcie_static']['max_pcie_speed'] is: 16000 + 
pcie_info['pcie_static']['pcie_interface_version'] is: 4 + pcie_info['pcie_static']['slot_type'] is: CEM + pcie_info['pcie_metric']['pcie_replay_count'] is: N/A + pcie_info['pcie_metric']['pcie_bandwidth'] is: N/A + pcie_info['pcie_metric']['pcie_l0_to_recovery_count'] is: N/A + pcie_info['pcie_metric']['pcie_replay_roll_over_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_sent_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_received_count'] is: N/A +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_activity + engine_usage['gfx_activity'] is: 0 % + engine_usage['umc_activity'] is: 0 % + engine_usage['mm_activity'] is: 0 % + +###Test amdsmi_get_power_info + power_info['current_socket_power'] is: N/A + power_info['average_socket_power'] is: 13 + power_info['gfx_voltage'] is: 781 + power_info['soc_voltage'] is: 806 + power_info['mem_voltage'] is: 1250 + power_info['power_limit'] is: 213000000 +###Test amdsmi_is_gpu_power_management_enabled + Is power management enabled is: True +###Test amdsmi_get_temp_metric + Current temperature for EDGE is: 36 + Current temperature for HOTSPOT is: 39 + Current temperature for VRAM is: 38 +###Test amdsmi_get_temp_metric + Limit (critical) temperature for EDGE is: 109 + Limit (critical) temperature for HOTSPOT is: 110 + Limit (critical) temperature for VRAM is: 100 +###Test amdsmi_get_temp_metric + Shutdown (emergency) temperature for EDGE is: 114 + Shutdown (emergency) temperature for HOTSPOT is: 115 + Shutdown (emergency) temperature for VRAM is: 105 +###Test amdsmi_get_clock_info + Current clock for domain GFX is: 500 + Max clock for domain GFX is: 2555 + Min clock for domain GFX is: 500 + Is GFX clock locked: 0 + Is GFX clock in deep sleep: 255 + Current clock for domain MEM is: 96 + Max clock for domain MEM is: 1000 + Min clock for domain MEM is: 96 + Is MEM clock in deep sleep: 255 + Current clock for domain VCLK0 is: 0 + Max clock for domain VCLK0 is: 0 + Min clock for domain VCLK0 is: 0 + Is VCLK0 
clock in deep sleep: 255 + Current clock for domain VCLK1 is: 0 + Max clock for domain VCLK1 is: 0 + Min clock for domain VCLK1 is: 0 + Is VCLK1 clock in deep sleep: 255 + Current clock for domain DCLK0 is: 0 + Max clock for domain DCLK0 is: 0 + Min clock for domain DCLK0 is: 0 + Is DCLK0 clock in deep sleep: 255 + Current clock for domain DCLK1 is: 0 + Max clock for domain DCLK1 is: 0 + Min clock for domain DCLK1 is: 0 + Is DCLK1 clock in deep sleep: 255 +###Test amdsmi_get_pcie_info + pcie_info['pcie_metric']['pcie_width'] is: 16 + pcie_info['pcie_static']['max_pcie_width'] is: 16 + pcie_info['pcie_metric']['pcie_speed'] is: 8000 MT/s + pcie_info['pcie_static']['max_pcie_speed'] is: 16000 + pcie_info['pcie_static']['pcie_interface_version'] is: 4 + pcie_info['pcie_static']['slot_type'] is: CEM + pcie_info['pcie_metric']['pcie_replay_count'] is: N/A + pcie_info['pcie_metric']['pcie_bandwidth'] is: N/A + pcie_info['pcie_metric']['pcie_l0_to_recovery_count'] is: N/A + pcie_info['pcie_metric']['pcie_replay_roll_over_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_sent_count'] is: N/A + pcie_info['pcie_metric']['pcie_nak_received_count'] is: N/A +PASSED +integration_test.py::TestAmdSmiPythonInterface::test_walkthrough ###Test amdsmi_get_processor_handles() +###Test amdsmi_get_gpu_device_bdf() | START walk_through | processor i = 0 +###Test Processor 0, bdf: 0000:08:00.0 + +###Test amdsmi_get_gpu_asic_info + asic_info['market_name'] is: NAVI21 + asic_info['vendor_id'] is: 0x1002 + asic_info['vendor_name'] is: Advanced Micro Devices Inc. 
[AMD/ATI] + asic_info['device_id'] is: 0x73bf + asic_info['rev_id'] is: 0xc3 + + asic_info['asic_serial'] is: 0x81C16890A5911040 + + asic_info['oam_id'] is: N/A + +###Test amdsmi_get_power_cap_info + power_info['dpm_cap'] is: 1 + power_info['power_cap'] is: 203000000 + +###Test amdsmi_get_gpu_vbios_info + vbios_info['part_number'] is: 113-D41207XL-038 + vbios_info['build_date'] is: 2020/10/06 17:59 + vbios_info['name'] is: NAVI21 Gaming XL D412 + + vbios_info['version'] is: 020.001.000.038.015697 + +###Test amdsmi_get_gpu_board_info + board_info['model_number'] is: N/A + + board_info['product_serial'] is: N/A + + board_info['fru_id'] is: N/A + + board_info['manufacturer_name'] is: Advanced Micro Devices, Inc. [AMD/ATI] + + board_info['product_name'] is: Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] + +###Test amdsmi_get_fw_info +FW name: AMDSMI_FW_ID_CP_CE +FW version: 37 +FW name: AMDSMI_FW_ID_CP_PFP +FW version: 98 +FW name: AMDSMI_FW_ID_CP_ME +FW version: 64 +FW name: AMDSMI_FW_ID_CP_MEC1 +FW version: 118 +FW name: AMDSMI_FW_ID_CP_MEC2 +FW version: 118 +FW name: AMDSMI_FW_ID_RLC +FW version: 96 +FW name: AMDSMI_FW_ID_SDMA0 +FW version: 83 +FW name: AMDSMI_FW_ID_SDMA1 +FW version: 83 +FW name: AMDSMI_FW_ID_VCN +FW version: 31.1E.00.8 +FW name: AMDSMI_FW_ID_PSP_SOSDRV +FW version: 21.0E.64 +FW name: AMDSMI_FW_ID_ASD +FW version: 553648340 +FW name: AMDSMI_FW_ID_TA_RAS +FW version: 1B.00.01.3E +FW name: AMDSMI_FW_ID_TA_XGMI +FW version: 20.00.00.0F +FW name: AMDSMI_FW_ID_PM +FW version: 58.89.0 +###Test amdsmi_get_gpu_driver_info +Driver info: {'driver_name': 'amdgpu', 'driver_version': '6.7.8', 'driver_date': '2015/01/01 00:00'} +###Test amdsmi_get_gpu_driver_info() | END walk_through | processor i = 0 +###Test amdsmi_get_gpu_device_bdf() | START walk_through | processor i = 1 +###Test Processor 1, bdf: 0000:44:00.0 + +###Test amdsmi_get_gpu_asic_info + asic_info['market_name'] is: Navi 21 GL-XL [Radeon PRO W6800] + asic_info['vendor_id'] is: 0x1002 + 
asic_info['vendor_name'] is: Advanced Micro Devices Inc. [AMD/ATI] + asic_info['device_id'] is: 0x73a3 + asic_info['rev_id'] is: 0x00 + + asic_info['asic_serial'] is: 0x1F75223E5E64EAC1 + + asic_info['oam_id'] is: N/A + +###Test amdsmi_get_power_cap_info + power_info['dpm_cap'] is: 1 + power_info['power_cap'] is: 213000000 + +###Test amdsmi_get_gpu_vbios_info + vbios_info['part_number'] is: 113-D4300100-100 + vbios_info['build_date'] is: 2021/04/22 09:34 + vbios_info['name'] is: NAVI21 D43001 GLXL + + vbios_info['version'] is: 020.001.000.060.016898 + +###Test amdsmi_get_gpu_board_info + board_info['model_number'] is: N/A + + board_info['product_serial'] is: N/A + + board_info['fru_id'] is: N/A + + board_info['manufacturer_name'] is: Advanced Micro Devices, Inc. [AMD/ATI] + + board_info['product_name'] is: Navi 21 GL-XL [Radeon PRO W6800] + +###Test amdsmi_get_fw_info +FW name: AMDSMI_FW_ID_CP_CE +FW version: 37 +FW name: AMDSMI_FW_ID_CP_PFP +FW version: 98 +FW name: AMDSMI_FW_ID_CP_ME +FW version: 64 +FW name: AMDSMI_FW_ID_CP_MEC1 +FW version: 118 +FW name: AMDSMI_FW_ID_CP_MEC2 +FW version: 118 +FW name: AMDSMI_FW_ID_RLC +FW version: 96 +FW name: AMDSMI_FW_ID_SDMA0 +FW version: 83 +FW name: AMDSMI_FW_ID_SDMA1 +FW version: 83 +FW name: AMDSMI_FW_ID_VCN +FW version: 31.1E.00.8 +FW name: AMDSMI_FW_ID_PSP_SOSDRV +FW version: 21.0E.64 +FW name: AMDSMI_FW_ID_ASD +FW version: 553648340 +FW name: AMDSMI_FW_ID_TA_RAS +FW version: 1B.00.01.3E +FW name: AMDSMI_FW_ID_TA_XGMI +FW version: 20.00.00.0F +FW name: AMDSMI_FW_ID_PM +FW version: 58.89.0 +###Test amdsmi_get_gpu_driver_info +Driver info: {'driver_name': 'amdgpu', 'driver_version': '6.7.8', 'driver_date': '2015/01/01 00:00'} +###Test amdsmi_get_gpu_driver_info() | END walk_through | processor i = 1 +PASSED + +===================================== 6 passed in 0.10s ====================================== +``` + +```shell +$ python3 /opt/rocm/share/amd_smi/tests/pytest/integration_test.py -k "*test_init" -vvv +test_init 
(__main__.TestAmdSmiInit) ... ok + +---------------------------------------------------------------------- +Ran 1 test in 0.009s + +OK + +``` + +```shell +(Tue Jul-7 12:10:10am)-(CPU 0.3%:0:Net 16)-(charpoag@mlsetools2:/opt/rocm/share/amd_smi/tests/pytest)-(44K:3) +> python3 -m pytest -ra -vvv -p no:cacheprovider +==================================== test session starts ===================================== +platform linux -- Python 3.8.10, pytest-8.2.2, pluggy-1.5.0 -- /usr/bin/python3 +rootdir: /opt/rocm/share/amd_smi +configfile: pyproject.toml +collected 6 items + +integration_test.py::TestAmdSmiInit::test_init PASSED [ 16%] +integration_test.py::TestAmdSmiPythonInterface::test_bad_page_info PASSED [ 33%] +integration_test.py::TestAmdSmiPythonInterface::test_bdf_device_id PASSED [ 50%] +integration_test.py::TestAmdSmiPythonInterface::test_ecc PASSED [ 66%] +integration_test.py::TestAmdSmiPythonInterface::test_gpu_performance PASSED [ 83%] +integration_test.py::TestAmdSmiPythonInterface::test_walkthrough PASSED [100%] + +===================================== 6 passed in 0.11s ====================================== +``` \ No newline at end of file diff --git a/pytest/__init__.py b/pytest/__init__.py new file mode 100644 index 0000000000..7a3d34733f --- /dev/null +++ b/pytest/__init__.py @@ -0,0 +1,4 @@ +import sys +sys.path.append("/opt/rocm/libexec/amdsmi_cli/") + +from _version import __version__ \ No newline at end of file diff --git a/pytest/integration_test.py b/pytest/integration_test.py new file mode 100755 index 0000000000..71de1f7114 --- /dev/null +++ b/pytest/integration_test.py @@ -0,0 +1,557 @@ +#!/usr/bin/env python3 +# +# Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. +import sys +sys.path.append("/opt/rocm/libexec/amdsmi_cli/") + +try: + import amdsmi +except ImportError: + raise ImportError("Could not import /opt/rocm/libexec/amdsmi_cli/amdsmi_cli.py") + +import unittest +import threading +import multiprocessing +from datetime import datetime + +def handle_exceptions(func): + """Catches AMD SMI exceptions raised by a test, silences them, and prints which exception was raised. 
+ + params: + func: test function(s) that use decorator to expose AMD SMI exceptions + return: + On success - original function is returned + On failure - silences error and prints to user what exception was caught + """ + def wrapper(*args, **kwargs): + try: + return func(*args, **kwargs) + except amdsmi.AmdSmiRetryException as e: + print("**** [ERROR] | Test: " + str(func.__name__) + " | Caught AmdSmiRetryException: {}".format(e)) + pass + except amdsmi.AmdSmiTimeoutException as e: + print("**** [ERROR] | Test: " + str(func.__name__) + " | Caught AmdSmiTimeoutException: {}".format(e)) + pass + except amdsmi.AmdSmiLibraryException as e: + print("**** [ERROR] | Test: " + str(func.__name__) + " | Caught AmdSmiLibraryException: {}".format(e)) + pass + except Exception as e: + print("**** [ERROR] | Test: " + str(func.__name__) + " | Caught unknown exception: {}".format(e)) + pass + return wrapper + +class TestAmdSmiInit(unittest.TestCase): + @handle_exceptions + def test_init(self): + amdsmi.amdsmi_init() + amdsmi.amdsmi_shut_down() + +class TestAmdSmiPythonInterface(unittest.TestCase): + @handle_exceptions + def setUp(self): + amdsmi.amdsmi_init() + @handle_exceptions + def tearDown(self): + amdsmi.amdsmi_shut_down() + + # Bad page is not supported in Navi21 and Navi31 + @handle_exceptions + def test_bad_page_info(self): + processors = amdsmi.amdsmi_get_processor_handles() + self.assertGreaterEqual(len(processors), 1) + self.assertLessEqual(len(processors), 32) + for i in range(0, len(processors)): + bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + processor = amdsmi.amdsmi_get_processor_handle_from_bdf(bdf) + print("\n###Test amdsmi_get_gpu_bad_page_info \n") + bad_page_info = amdsmi.amdsmi_get_gpu_bad_page_info(processors[i]) + print("bad_page_info: " + str(bad_page_info)) + print("Number of bad pages: {}".format(len(bad_page_info))) + j = 0 + for table_record in bad_page_info: + 
print("\ntable_record[\"value\"]" + str(table_record["value"])) + print("Page: {}".format(j)) + print("Page Address: " + str(table_record["page_address"])) + print("Page Size: " + str(table_record["page_size"])) + print("Status: " + str(table_record["status"])) + print() + j += 1 + print() + + def test_bdf_device_id(self): + processors = amdsmi.amdsmi_get_processor_handles() + self.assertGreaterEqual(len(processors), 1) + self.assertLessEqual(len(processors), 32) + for i in range(0, len(processors)): + bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + processor = amdsmi.amdsmi_get_processor_handle_from_bdf(bdf) + print("\n###Test amdsmi_get_gpu_vbios_info \n") + vbios_info = amdsmi.amdsmi_get_gpu_vbios_info(processor) + print(" vbios_info['part_number'] is: {}".format( + vbios_info['part_number'])) + print(" vbios_info['build_date'] is: {}".format( + vbios_info['build_date'])) + print(" vbios_info['version'] is: {}".format( + vbios_info['version'])) + print(" vbios_info['name'] is: {}".format( + vbios_info['name'])) + print("\n###Test amdsmi_get_gpu_device_uuid \n") + uuid = amdsmi.amdsmi_get_gpu_device_uuid(processor) + print(" uuid is: {}".format(uuid)) + print() + + def test_ecc(self): + processors = amdsmi.amdsmi_get_processor_handles() + self.assertGreaterEqual(len(processors), 1) + self.assertLessEqual(len(processors), 32) + for i in range(0, len(processors)): + bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + print("\n###Test amdsmi_get_gpu_total_ecc_count \n") + ecc_info = amdsmi.amdsmi_get_gpu_total_ecc_count(processors[i]) + print("Number of uncorrectable errors: {}".format( + ecc_info['uncorrectable_count'])) + print("Number of correctable errors: {}".format( + ecc_info['correctable_count'])) + print("Number of deferred errors: {}".format( + ecc_info['deferred_count'])) + 
self.assertGreaterEqual(ecc_info['uncorrectable_count'], 0) + self.assertGreaterEqual(ecc_info['correctable_count'], 0) + self.assertGreaterEqual(ecc_info['deferred_count'], 0) + print() + + # RAS is not supported in Navi21 and Navi31 + @handle_exceptions + def test_ras(self): + processors = amdsmi.amdsmi_get_processor_handles() + self.assertGreaterEqual(len(processors), 1) + self.assertLessEqual(len(processors), 32) + for i in range(0, len(processors)): + bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + print("\n###Test amdsmi_get_gpu_ras_feature_info \n") + ras_feature = amdsmi.amdsmi_get_gpu_ras_feature_info(processors[i]) + print("ras_feature: " + str(ras_feature)) + if ras_feature is not None: + print("ras_feature: " + str(ras_feature)) + print("RAS eeprom version: {}".format(ras_feature['eeprom_version'])) + print("RAS parity schema: {}".format(ras_feature['parity_schema'])) + print("RAS single bit schema: {}".format(ras_feature['single_bit_schema'])) + print("RAS double bit schema: {}".format(ras_feature['double_bit_schema'])) + print("Poisoning supported: {}".format(ras_feature['poison_schema'])) + print() + + def test_clock_info(self): + processors = amdsmi.amdsmi_get_processor_handles() + self.assertGreaterEqual(len(processors), 1) + self.assertLessEqual(len(processors), 32) + for i in range(0, len(processors)): + bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + print("\n###Test amdsmi_get_clock_info \n") + clock_measure = amdsmi.amdsmi_get_clock_info( + processors[i], amdsmi.AmdSmiClkType.GFX) + print(" Current clock for domain GFX is: {}".format( + clock_measure['clk'])) + print(" Max clock for domain GFX is: {}".format( + clock_measure['max_clk'])) + print(" Min clock for domain GFX is: {}".format( + clock_measure['min_clk'])) + print(" Is GFX clock locked: {}".format( + clock_measure['clk_locked'])) + print(" Is GFX clock 
in deep sleep: {}".format( + clock_measure['clk_deep_sleep'])) + clock_measure = amdsmi.amdsmi_get_clock_info( + processors[i], amdsmi.AmdSmiClkType.MEM) + print(" Current clock for domain MEM is: {}".format( + clock_measure['clk'])) + print(" Max clock for domain MEM is: {}".format( + clock_measure['max_clk'])) + print(" Min clock for domain MEM is: {}".format( + clock_measure['min_clk'])) + print(" Is MEM clock in deep sleep: {}".format( + clock_measure['clk_deep_sleep'])) + print() + + # VCLK0 and DCLK0 are not supported in MI210 + @handle_exceptions + def test_gpu_clock_vclk0_dclk0(self): + processors = amdsmi.amdsmi_get_processor_handles() + self.assertGreaterEqual(len(processors), 1) + self.assertLessEqual(len(processors), 32) + for i in range(0, len(processors)): + bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + print("\n###Test amdsmi_get_clock_info \n") + clock_measure = amdsmi.amdsmi_get_clock_info( + processors[i], amdsmi.AmdSmiClkType.VCLK0) + print(" Current clock for domain VCLK0 is: {}".format( + clock_measure['clk'])) + print(" Max clock for domain VCLK0 is: {}".format( + clock_measure['max_clk'])) + print(" Min clock for domain VCLK0 is: {}".format( + clock_measure['min_clk'])) + print(" Is VCLK0 clock in deep sleep: {}".format( + clock_measure['clk_deep_sleep'])) + clock_measure = amdsmi.amdsmi_get_clock_info( + processors[i], amdsmi.AmdSmiClkType.DCLK0) + print(" Current clock for domain DCLK0 is: {}".format( + clock_measure['clk'])) + print(" Max clock for domain DCLK0 is: {}".format( + clock_measure['max_clk'])) + print(" Min clock for domain DCLK0 is: {}".format( + clock_measure['min_clk'])) + print(" Is DCLK0 clock in deep sleep: {}".format( + clock_measure['clk_deep_sleep'])) + print() + + # VCLK1 and DCLK1 are not supported in Navi 31, MI210, and MI300 + @handle_exceptions + def test_gpu_clock_vclk1_dclk1(self): + processors = amdsmi.amdsmi_get_processor_handles() + 
self.assertGreaterEqual(len(processors), 1) + self.assertLessEqual(len(processors), 32) + for i in range(0, len(processors)): + bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + print("\n###Test amdsmi_get_clock_info \n") + clock_measure = amdsmi.amdsmi_get_clock_info( + processors[i], amdsmi.AmdSmiClkType.VCLK1) + print(" Current clock for domain VCLK1 is: {}".format( + clock_measure['clk'])) + print(" Max clock for domain VCLK1 is: {}".format( + clock_measure['max_clk'])) + print(" Min clock for domain VCLK1 is: {}".format( + clock_measure['min_clk'])) + print(" Is VCLK1 clock in deep sleep: {}".format( + clock_measure['clk_deep_sleep'])) + clock_measure = amdsmi.amdsmi_get_clock_info( + processors[i], amdsmi.AmdSmiClkType.DCLK1) + print(" Current clock for domain DCLK1 is: {}".format( + clock_measure['clk'])) + print(" Max clock for domain DCLK1 is: {}".format( + clock_measure['max_clk'])) + print(" Min clock for domain DCLK1 is: {}".format( + clock_measure['min_clk'])) + print(" Is DCLK1 clock in deep sleep: {}".format( + clock_measure['clk_deep_sleep'])) + print() + + def test_gpu_activity(self): + processors = amdsmi.amdsmi_get_processor_handles() + self.assertGreaterEqual(len(processors), 1) + self.assertLessEqual(len(processors), 32) + for i in range(0, len(processors)): + bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + print("\n###Test amdsmi_get_gpu_activity \n") + engine_usage = amdsmi.amdsmi_get_gpu_activity(processors[i]) + print(" engine_usage['gfx_activity'] is: {} %".format( + engine_usage['gfx_activity'])) + print(" engine_usage['umc_activity'] is: {} %".format( + engine_usage['umc_activity'])) + print(" engine_usage['mm_activity'] is: {} %".format( + engine_usage['mm_activity'])) + print() + + def test_pcie(self): + processors = amdsmi.amdsmi_get_processor_handles() + self.assertGreaterEqual(len(processors), 1) + 
self.assertLessEqual(len(processors), 32) + for i in range(0, len(processors)): + bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + print("\n###Test amdsmi_get_pcie_info \n") + pcie_info = amdsmi.amdsmi_get_pcie_info(processors[i]) + print(" pcie_info['pcie_metric']['pcie_width'] is: {}".format( + pcie_info['pcie_metric']['pcie_width'])) + print(" pcie_info['pcie_static']['max_pcie_width'] is: {} ".format( + pcie_info['pcie_static']['max_pcie_width'])) + print(" pcie_info['pcie_metric']['pcie_speed'] is: {} MT/s".format( + pcie_info['pcie_metric']['pcie_speed'])) + print(" pcie_info['pcie_static']['max_pcie_speed'] is: {} ".format( + pcie_info['pcie_static']['max_pcie_speed'])) + print(" pcie_info['pcie_static']['pcie_interface_version'] is: {}".format( + pcie_info['pcie_static']['pcie_interface_version'])) + print(" pcie_info['pcie_static']['slot_type'] is: {}".format( + pcie_info['pcie_static']['slot_type'])) + print(" pcie_info['pcie_metric']['pcie_replay_count'] is: {}".format( + pcie_info['pcie_metric']['pcie_replay_count'])) + print(" pcie_info['pcie_metric']['pcie_bandwidth'] is: {}".format( + pcie_info['pcie_metric']['pcie_bandwidth'])) + print(" pcie_info['pcie_metric']['pcie_l0_to_recovery_count'] is: {}".format( + pcie_info['pcie_metric']['pcie_l0_to_recovery_count'])) + print(" pcie_info['pcie_metric']['pcie_replay_roll_over_count'] is: {}".format( + pcie_info['pcie_metric']['pcie_replay_roll_over_count'])) + print(" pcie_info['pcie_metric']['pcie_nak_sent_count'] is: {}".format( + pcie_info['pcie_metric']['pcie_nak_sent_count'])) + print(" pcie_info['pcie_metric']['pcie_nak_received_count'] is: {}".format( + pcie_info['pcie_metric']['pcie_nak_received_count'])) + print() + + def test_power(self): + processors = amdsmi.amdsmi_get_processor_handles() + self.assertGreaterEqual(len(processors), 1) + self.assertLessEqual(len(processors), 32) + for i in range(0, len(processors)): + bdf = 
amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + print("\n###Test amdsmi_get_power_info \n") + power_info = amdsmi.amdsmi_get_power_info(processors[i]) + print(" power_info['current_socket_power'] is: {}".format( + power_info['current_socket_power'])) + print(" power_info['average_socket_power'] is: {}".format( + power_info['average_socket_power'])) + print(" power_info['gfx_voltage'] is: {}".format( + power_info['gfx_voltage'])) + print(" power_info['soc_voltage'] is: {}".format( + power_info['soc_voltage'])) + print(" power_info['mem_voltage'] is: {}".format( + power_info['mem_voltage'])) + print(" power_info['power_limit'] is: {}".format( + power_info['power_limit'])) + print("\n###Test amdsmi_is_gpu_power_management_enabled \n") + is_power_management_enabled = amdsmi.amdsmi_is_gpu_power_management_enabled(processors[i]) + print(" Is power management enabled is: {}".format( + is_power_management_enabled)) + print() + + def test_temperature(self): + processors = amdsmi.amdsmi_get_processor_handles() + self.assertGreaterEqual(len(processors), 1) + self.assertLessEqual(len(processors), 32) + for i in range(0, len(processors)): + bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + print("\n###Test amdsmi_get_temp_metric \n") + temperature_measure = amdsmi.amdsmi_get_temp_metric( + processors[i], amdsmi.AmdSmiTemperatureType.HOTSPOT, amdsmi.AmdSmiTemperatureMetric.CURRENT) + print(" Current temperature for HOTSPOT is: {}".format( + temperature_measure)) + temperature_measure = amdsmi.amdsmi_get_temp_metric( + processors[i], amdsmi.AmdSmiTemperatureType.VRAM, amdsmi.AmdSmiTemperatureMetric.CURRENT) + print(" Current temperature for VRAM is: {}".format( + temperature_measure)) + print("\n###Test amdsmi_get_temp_metric \n") + temperature_measure = amdsmi.amdsmi_get_temp_metric( + processors[i], amdsmi.AmdSmiTemperatureType.HOTSPOT, 
amdsmi.AmdSmiTemperatureMetric.CRITICAL) + print(" Limit (critical) temperature for HOTSPOT is: {}".format( + temperature_measure)) + temperature_measure = amdsmi.amdsmi_get_temp_metric( + processors[i], amdsmi.AmdSmiTemperatureType.VRAM, amdsmi.AmdSmiTemperatureMetric.CRITICAL) + print(" Limit (critical) temperature for VRAM is: {}".format( + temperature_measure)) + print("\n###Test amdsmi_get_temp_metric \n") + temperature_measure = amdsmi.amdsmi_get_temp_metric( + processors[i], amdsmi.AmdSmiTemperatureType.HOTSPOT, amdsmi.AmdSmiTemperatureMetric.EMERGENCY) + print(" Shutdown (emergency) temperature for HOTSPOT is: {}".format( + temperature_measure)) + temperature_measure = amdsmi.amdsmi_get_temp_metric( + processors[i], amdsmi.AmdSmiTemperatureType.VRAM, amdsmi.AmdSmiTemperatureMetric.EMERGENCY) + print(" Shutdown (emergency) temperature for VRAM is: {}".format( + temperature_measure)) + print() + + # Edge temperature is not supported in MI300 + @handle_exceptions + def test_temperature_edge(self): + processors = amdsmi.amdsmi_get_processor_handles() + self.assertGreaterEqual(len(processors), 1) + self.assertLessEqual(len(processors), 32) + for i in range(0, len(processors)): + bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + print("\n###Test amdsmi_get_temp_metric \n") + temperature_measure = amdsmi.amdsmi_get_temp_metric( + processors[i], amdsmi.AmdSmiTemperatureType.EDGE, amdsmi.AmdSmiTemperatureMetric.CURRENT) # current + print(" Current temperature for EDGE is: {}".format( + temperature_measure)) + temperature_measure = amdsmi.amdsmi_get_temp_metric( + processors[i], amdsmi.AmdSmiTemperatureType.EDGE, amdsmi.AmdSmiTemperatureMetric.CRITICAL) # slowdown/limit + print(" Limit (critical) temperature for EDGE is: {}".format( + temperature_measure)) + temperature_measure = amdsmi.amdsmi_get_temp_metric( + processors[i], amdsmi.AmdSmiTemperatureType.EDGE, amdsmi.AmdSmiTemperatureMetric.EMERGENCY) 
# shutdown + print(" Shutdown (emergency) temperature for EDGE is: {}".format( + temperature_measure)) + print() + + def test_walkthrough(self): + walk_through(self) + + # Unstable on workstation cards + # @handle_exceptions + # def test_walkthrough_multiprocess(self): + # print("\n\n========> test_walkthrough_multiprocess start <========\n") + # processors = amdsmi.amdsmi_get_processor_handles() + # self.assertGreaterEqual(len(processors), 1) + # self.assertLessEqual(len(processors), 32) + # p0 = multiprocessing.Process(target=walk_through, args=[self]) + # p1 = multiprocessing.Process(target=walk_through, args=[self]) + # p2 = multiprocessing.Process(target=walk_through, args=[self]) + # p3 = multiprocessing.Process(target=walk_through, args=[self]) + # p0.start() + # p1.start() + # p2.start() + # p3.start() + # p0.join() + # p1.join() + # p2.join() + # p3.join() + # print("\n========> test_walkthrough_multiprocess end <========\n") + + # Unstable on workstation cards + # @handle_exceptions + # def test_walkthrough_multithread(self): + # print("\n\n========> test_walkthrough_multithread start <========\n") + # processors = amdsmi.amdsmi_get_processor_handles() + # self.assertGreaterEqual(len(processors), 1) + # self.assertLessEqual(len(processors), 32) + # t0 = threading.Thread(target=walk_through, args=[self]) + # t1 = threading.Thread(target=walk_through, args=[self]) + # t2 = threading.Thread(target=walk_through, args=[self]) + # t3 = threading.Thread(target=walk_through, args=[self]) + # t0.start() + # t1.start() + # t2.start() + # t3.start() + # t0.join() + # t1.join() + # t2.join() + # t3.join() + # print("\n========> test_walkthrough_multithread end <========\n") + + # # Unstable - do not run + # @handle_exceptions + # def test_z_gpureset_asicinfo_multithread(self): + # def get_asic_info(processor): + # print("\n###Test amdsmi_get_gpu_asic_info \n") + # asic_info = amdsmi.amdsmi_get_gpu_asic_info(processor) + # print(" asic_info['market_name'] is: 
{}".format( + # asic_info['market_name'])) + # print(" asic_info['vendor_id'] is: {}".format( + # asic_info['vendor_id'])) + # print(" asic_info['vendor_name'] is: {}".format( + # asic_info['vendor_name'])) + # print(" asic_info['device_id'] is: {}".format( + # asic_info['device_id'])) + # print(" asic_info['rev_id'] is: {}".format( + # asic_info['rev_id'])) + # print(" asic_info['asic_serial'] is: {}".format( + # asic_info['asic_serial'])) + # print(" asic_info['oam_id'] is: {}\n".format( + # asic_info['oam_id'])) + # def gpu_reset(processor): + # print("\n###Test amdsmi_reset_gpu \n") + # amdsmi.amdsmi_reset_gpu(processor) + # print(" GPU reset completed.\n") + # print("\n\n========> test_z_gpureset_asicinfo_multithread start <========\n") + # processors = amdsmi.amdsmi_get_processor_handles() + # self.assertGreaterEqual(len(processors), 1) + # self.assertLessEqual(len(processors), 32) + # for i in range(0, len(processors)): + # bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + # print("\n\n###Test Processor {}, bdf: {}".format(i, bdf)) + # t0 = threading.Thread(target=get_asic_info, args=[processors[i]]) + # t1 = threading.Thread(target=gpu_reset, args=[processors[i]]) + # # t2 = threading.Thread(target=walk_through, args=[self]) + # # t3 = threading.Thread(target=walk_through, args=[self]) + # t0.start() + # t1.start() + # # t2.start() + # # t3.start() + # t0.join() + # t1.join() + # # t2.join() + # # t3.join() + # print("\n========> test_z_gpureset_asicinfo_multithread end <========\n") + +def walk_through(self): + print("\n###Test amdsmi_get_processor_handles() \n") + processors = amdsmi.amdsmi_get_processor_handles() + for i in range(0, len(processors)): + print("\n###Test amdsmi_get_gpu_device_bdf() | START walk_through | processor i = " + str(i) + "\n") + bdf = amdsmi.amdsmi_get_gpu_device_bdf(processors[i]) + print("###Test Processor {}, bdf: {} ".format(i, bdf)) + print("\n###Test amdsmi_get_gpu_asic_info \n") + asic_info = 
amdsmi.amdsmi_get_gpu_asic_info(processors[i]) + print(" asic_info['market_name'] is: {}".format( + asic_info['market_name'])) + print(" asic_info['vendor_id'] is: {}".format( + asic_info['vendor_id'])) + print(" asic_info['vendor_name'] is: {}".format( + asic_info['vendor_name'])) + print(" asic_info['device_id'] is: {}".format( + asic_info['device_id'])) + print(" asic_info['rev_id'] is: {}\n".format( + asic_info['rev_id'])) + print(" asic_info['asic_serial'] is: {}\n".format( + asic_info['asic_serial'])) + print(" asic_info['oam_id'] is: {}\n".format( + asic_info['oam_id'])) + print("###Test amdsmi_get_power_cap_info \n") + power_info = amdsmi.amdsmi_get_power_cap_info(processors[i]) + print(" power_info['dpm_cap'] is: {}".format( + power_info['dpm_cap'])) + print(" power_info['power_cap'] is: {}\n".format( + power_info['power_cap'])) + print("###Test amdsmi_get_gpu_vbios_info \n") + vbios_info = amdsmi.amdsmi_get_gpu_vbios_info(processors[i]) + print(" vbios_info['part_number'] is: {}".format( + vbios_info['part_number'])) + print(" vbios_info['build_date'] is: {}".format( + vbios_info['build_date'])) + print(" vbios_info['name'] is: {}\n".format( + vbios_info['name'])) + print(" vbios_info['version'] is: {}\n".format( + vbios_info['version'])) + print("###Test amdsmi_get_gpu_board_info \n") + board_info = amdsmi.amdsmi_get_gpu_board_info(processors[i]) + print(" board_info['model_number'] is: {}\n".format( + board_info['model_number'])) + print(" board_info['product_serial'] is: {}\n".format( + board_info['product_serial'])) + print(" board_info['fru_id'] is: {}\n".format( + board_info['fru_id'])) + print(" board_info['manufacturer_name'] is: {}\n".format( + board_info['manufacturer_name'])) + print(" board_info['product_name'] is: {}\n".format( + board_info['product_name'])) + print("###Test amdsmi_get_fw_info \n") + fw_info = amdsmi.amdsmi_get_fw_info(processors[i]) + fw_num = len(fw_info['fw_list']) + self.assertLessEqual(fw_num, len(amdsmi.AmdSmiFwBlock)) 
+ for j in range(0, fw_num): + fw = fw_info['fw_list'][j] + if fw['fw_version'] != 0: + print("FW name: {}".format( + fw['fw_name'].name)) + print("FW version: {}".format( + fw['fw_version'])) + print("\n###Test amdsmi_get_gpu_driver_info \n") + driver_info = amdsmi.amdsmi_get_gpu_driver_info(processors[i]) + print("Driver info: {}".format(driver_info)) + print("\n###Test amdsmi_get_gpu_driver_info() | END walk_through | processor i = " + str(i) + "\n") + +if __name__ == '__main__': + unittest.main() \ No newline at end of file diff --git a/pytest/unit_tests.py b/pytest/unit_tests.py new file mode 100755 index 0000000000..abcce34258 --- /dev/null +++ b/pytest/unit_tests.py @@ -0,0 +1,163 @@ +#!/usr/bin/env python3 +# +# Copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in +# all copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +# THE SOFTWARE. 
+import unittest + +import sys +sys.path.append("/opt/rocm/libexec/amdsmi_cli/") + +try: + import amdsmi +except ImportError: + raise ImportError("Could not import /opt/rocm/libexec/amdsmi_cli/amdsmi_cli.py") + +class TestAmdSmiPythonBDF(unittest.TestCase): + valid_bdfs = { + "00:00.0": [0, 0, 0, 0], + "01:01.1": [0, 1, 1, 1], + "FF:1F.7": [0, 255, 31, 7], + "FF:00.7": [0, 255, 0, 7], + "11:01.2": [0, 17, 1, 2], + "11:0a.2": [0, 17, 10, 2], + "0000:FF:1F.7": [0, 255, 31, 7], + "0001:ff:1F.7": [1, 255, 31, 7], + "ffff:FF:1f.7": [65535, 255, 31, 7], + } + invalid_bdfs = { + # invalid bdf strings, expect None + None: None, + "": None, + "00:00:0": None, + "00.00:0": None, + "00:00.Z": None, + "00:0Z.0": None, + "0Z:00.0": None, + "Z00:00.0": None, + "A00:00.0": None, + "0A00:00.0": None, + "00:00.07": None, + "00:00.8": None, + "00:00.10": None, + "00:00.11": None, + "00:00.-1": None, + "00:00.*-1": None, + "00:00.123": None, + "00:20.0": None, + "00:45.0": None, + "00:200.0": None, + "00:002.0": None, + "100:00.0": None, + "0100:00.0": None, + "00100:00.0": None, + "0101:00.0": None, + "00001:00.0": None, + "10001:00.0": None, + "45:0.0": None, + ".00:00.0": None, + "00.00.0": None, + "00.0.0": None, + "0.00.0": None, + "000.00.0": None, + "00 00 0": None, + " 00:00.0": None, + "00:00.0 ": None, + "0000:00.00.0": None, + "000:00:00.0": None, + "00:00:00.1": None, + "0:00:00.1": None, + "0000 00 00 0": None, + "-1-1:00:00.0": None, + "AAAA:00:AA.0": None, + "*1*1:00:00.0": None, + "0000:00:00.07": None, + "0000:00:00.8": None, + "0000:00:00.10": None, + "0000:00:00.11": None, + "0000:00:00.-1": None, + "0000:00:00.*-1": None, + "0000:00:00.123": None, + "0000:00:20.0": None, + "0000:00:45.0": None, + "0000:00:200.0": None, + "0000:00:002.0": None, + "0000:100:00.0": None, + "0000:0100:00.0": None, + "0000:00100:00.0": None, + "0000:0101:00.0": None, + "0000:00001:00.0": None, + "0000:10001:00.0": None, + "0000:45:0.0": None, + ".0000.00:00.0": None, + "0000.00.0.0": 
None, + " 0000:00:00.0": None, + "0000:00:00.0 ": None, + } + def test_parse_bdf(self): + # go through all bdfs + expectations = self.valid_bdfs.copy() + expectations.update(self.invalid_bdfs) + for bdf in expectations: + expected = expectations[bdf] + result = amdsmi.amdsmi_interface._parse_bdf(bdf) + self.assertEqual(result, expected, + "Expected {} for bdf {}, but got {}".format( + expected, bdf, result)) + @classmethod + def _convert_bdf_to_long(clz, bdf): + if len(bdf) == 12: + return bdf + if len(bdf) == 7: + return "0000:" + bdf + return None + def test_format_bdf(self): + # go through valid bdfs + expectations = self.valid_bdfs.copy() + for bdf_string in expectations: + # use key as result and value as input + bdf_list = expectations[bdf_string] + smi_bdf = amdsmi.amdsmi_interface._make_amdsmi_bdf_from_list(bdf_list) + expected = TestAmdSmiPythonBDF._convert_bdf_to_long(bdf_string) + expected = expected.lower() + result = amdsmi.amdsmi_interface._format_bdf(smi_bdf) + self.assertEqual(result, expected, + "Expected {} for bdf {}, but got {}".format( + expected, bdf_string, result)) + def test_check_res(self): + # expect retry error to raise AmdSmiRetryException + with self.assertRaises(amdsmi.AmdSmiRetryException) as retry_test: + amdsmi.amdsmi_interface._check_res( + (lambda: amdsmi.amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_RETRY)()) + # expect retry error to have AMDSMI_STATUS_RETRY error code + self.assertEqual(retry_test.exception.get_error_code(), + amdsmi.amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_RETRY) + # expect invalid args error to raise AmdSmiLibraryException + with self.assertRaises(amdsmi.AmdSmiLibraryException) as inval_test: + amdsmi.amdsmi_interface._check_res( + (lambda: amdsmi.amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_INVAL)()) + # expect invalid args error to have AMDSMI_STATUS_INVAL error code + self.assertEqual(inval_test.exception.get_error_code(), + amdsmi.amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_INVAL) + # for
successful call, expect no error is given + result = amdsmi.amdsmi_interface._check_res( + (lambda: amdsmi.amdsmi_interface.amdsmi_wrapper.AMDSMI_STATUS_SUCCESS)()) + self.assertEqual(None, result) +if __name__ == '__main__': + unittest.main() diff --git a/rocm_smi/src/rocm_smi.cc b/rocm_smi/src/rocm_smi.cc index 1526e5832c..3830f63986 100755 --- a/rocm_smi/src/rocm_smi.cc +++ b/rocm_smi/src/rocm_smi.cc @@ -963,6 +963,7 @@ rsmi_dev_oam_id_get(uint32_t dv_ind, uint16_t *id) { ss << __PRETTY_FUNCTION__ << "| ======= start ======="; LOG_TRACE(ss); CHK_SUPPORT_NAME_ONLY(id) + *id = std::numeric_limits<uint16_t>::max(); ret = get_id(dv_ind, amd::smi::kDevXGMIPhysicalID, id); ss << __PRETTY_FUNCTION__ << " | ======= end =======" diff --git a/src/amd_smi/amd_smi.cc b/src/amd_smi/amd_smi.cc index be0adfd876..80ee92cbfa 100644 --- a/src/amd_smi/amd_smi.cc +++ b/src/amd_smi/amd_smi.cc @@ -470,7 +470,7 @@ amdsmi_status_t amdsmi_get_gpu_board_info(amdsmi_processor_handle processor_hand LOG_INFO(ss); } - ss << __PRETTY_FUNCTION__ << "[After rocm smi correction] " + ss << __PRETTY_FUNCTION__ << " | [After rocm smi correction] " << "Returning status = AMDSMI_STATUS_SUCCESS" << "\n; info->model_number: |" << board_info->model_number << "|" << "\n; info->product_serial: |" << board_info->product_serial << "|" @@ -751,10 +751,9 @@ amdsmi_get_gpu_asic_info(amdsmi_processor_handle processor_handle, amdsmi_asic_i } // default to 0xffff as not supported - info->oam_id = std::numeric_limits::max(); + info->oam_id = std::numeric_limits::max(); uint16_t tmp_oam_id = 0; - status = rsmi_wrapper(rsmi_dev_oam_id_get, processor_handle, - &(tmp_oam_id)); + status = rsmi_wrapper(rsmi_dev_oam_id_get, processor_handle, &(tmp_oam_id)); info->oam_id = tmp_oam_id; // default to 0xffffffff as not supported