* Fix the amdgpu version string comparison
The intention behind it was to avoid showing the string if it's not
got information.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
* Display the kernel version in amd-smi output
This is an interesting debugging point, especially in the case of
not having a DKMS package installed.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Moving os_kernel_version to static --driver
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
---------
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
* Add support for get and set APIs for CPUISOFreqPolicy and DFCState Control
- Add support for get and set APIs for CPUISOFreqPolicy and DFCState Control
in AMD SMI and also in the CLI tool
* CHANGELOG.md file updated
* SWDEV-562837: Update amdsmi-py-api.md as per the new APIs
Updated amdsmi-py-api.md as per the new APIs added.
---------
Signed-off-by: Soumya <sranjanr@amd.com>
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Saka Sitharammurthy <SitharamMurthy.Saka@amd.com>
* Run pre-commit's whitespace related hooks on projects/amdsmi
In order for pre-commit to be useful, everything needs to meet a common
baseline.
* Add whitespace back to Changelog for formatting
---------
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
* Read the ids_flags when fetching GPU info
The ids_flags contains the flags that can help identify if a GPU
is a dGPU or an APU.
* Show correct memory pool for APUs
The kernel policy for APUs will be to choose the bigger pool of
memory (GTT or VRAM) for KFD work. Adjust the policy for the monitor
and default commands to show the right memory pool when using an APU.
* Don't require powercap support
APUs don't necessarily support setting a power cap from sysfs.
Ignore failures of the file missing.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Show edge temperature in default output if hotspot is missing
APUs don't have a hotspot temperature, they have an edge though.
Use that.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Format all "power" keys as watts
There will be more power keys when APU support is added, so format
them properly.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Don't show power limit in output if it's invalid
APUs can't set power limit using power_cap1 interface. The limit
will be 0 and thus the UX looks weird in default output.
Only add the `/power_limit` if it's valid.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Unify sizes of `amdsmi_power_info_t`
Sizes are used inconsistently. This causes tools to not show
N/A when they should. Make them unified.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---------
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Fix powercap default to enum for sensor_ind
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
* [SWDEV-559965] Refactor amdsmi set power cap
Modified power cap set to accept args with
optional power_cap type. Added power_cap helper
validate_and_set_power_cap(). Fixed JSON output
format.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Handled numa data - including cpu and socket list, bitmask,
and affinity for csv format.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 1b027d15bd]
* Added Python & C API's for new node devices. Currently these are functional for node 0 only.
- amdsmi_get_node_handle
- amdsmi_get_npm_info
* Added `amd-smi node` CLI for Node Power Management
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: f8e4771363]
Changes:
- Simplified reset calls
- Updated static limit N/A values to all possible data
(helps csv format be consistent)
- Unit format was broken on static
- get_power_cap() had min/max values swapped, and the return
was missing two fields
- Updated changelog to reflect all changes
Change-Id: I23713471b984f52085372486c6e6ff852e2f42f8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 00a893d299]
- Updated python integration test to account for PPT1 support changes
- Updated set/reset power-cap input format
- Adjusted python API and updated C++ API test
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f
[ROCm/amdsmi commit: 18faddf6f3]
* Added amd-smi set --pcie command
* Removed current pcie level due to it not being static
* Added pcie information to static --bus
---------
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 9e3537d778]
- **Added evicted_time metric for kfd processes**.
- Time that queues are evicted on a GPU in milliseconds
- Added to CLI in `amd-smi monitor -q` and `amd-smi process`
- Added to C API and Python API:
- amdsmi_get_gpu_process_list()
- amdsmi_get_gpu_compute_process_info()
- amdsmi_get_gpu_compute_process_info_by_pid()
---------
Signed-off-by: Pryor, Adam <Adam.Pryor@amd.com>
[ROCm/amdsmi commit: 2144cfbba4]
* [SWDEV-542718] Correct socket_affinity
Updated Socket affinity to show bitmask and expanded cpu list.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Update per-device local_cpulist for socket_affinity
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Added amdsmi_get_cpu_affinity_from_local_cpulist API.
Updated the wrapper.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Revert "Added amdsmi_get_cpu_affinity_from_local_cpulist API."
This reverts commit 9a2ef934b1787f8aa09d3e4efe02f897b4295215.
* Moved the changes to C API.
In case of SOCKET_SCOPE, use local_cpulist first.
If it is unavailable or not readable, fallback to
numa.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Addressed review comments
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 09a97f02ed]
* Updated groups printing
* added parameters to check_required_groups
* two device groups since kfd and render require the same group
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: ee1445e2cc]
Added `GPU LINK PORT STATUS` table to `amd-smi xgmi` command
The `amd-smi xgmi -s` or `amd-smi xgmi --source-status` will show `GPU LINK PORT STATUS` table.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 7ddd91653e]
Added the following API's to amdsmi_interface.py.
amdsmi_get_cpu_handle()
amdsmi_get_esmi_err_msg()
amdsmi_get_gpu_event_notification()
amdsmi_get_processor_count_from_handles()
amdsmi_get_processor_handles_by_type()
amdsmi_gpu_validate_ras_eeprom()
amdsmi_init_gpu_event_notification()
amdsmi_set_gpu_event_notification_mask()
amdsmi_stop_gpu_event_notification()
amdsmi_get_gpu_busy_percent()
Added additional return value to API amdsmi_get_xgmi_plpd().
The entry policies is added to the end of the dictionary to match API definition.
The entry plpds is marked for deprecation as it has the same information as policies.
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: 7decbc67a1]
* Added ability to format gpu_metrics v1_9
* New gpu_metrics format from the driver should allow amd-smi to parse with future compatibility guaranteed
---------
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Signed-off-by: adapryor <Adam.pryor@amd.com>
Co-authored-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/amdsmi commit: 5ef0b3c34d]
- Changed amd-smi static --vbios to accept ifwi
- Change population logic for vbios version API
- Added IFWI boot_firmware to the CLI, C++, Rust, and Python API
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I4ea504d40a43cfb011ab38fc9a664ecf12d39c8a
[ROCm/amdsmi commit: cd21b5edcc]
For process -
Dual CSV is required in order to print 4 separate rows.
1. Metric header + data
2. Process header + data
Change-Id: Ibb7bfb13fa95a7c43b2e3f9061ada3a6be4aa8cb
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 4fd8b88aa5]
* Adjusted amdsmitst and reset command to account for separation of power profile and perf level behavior
* Updated test to reset power profile to previous user setting
* Removed performance level from reset_profile_results in reset --profile command
* Updated Changelog with change to reset profile behavior
---------
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
[ROCm/amdsmi commit: 954d4860c1]
Added bad_page_threshold_exceeded field to ras, which
compares retired pages count against bad page threshold.
This field displays True if retired pages exceed the
threshold, False if within threshold, or N/A if
threshold data is unavailable.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
[ROCm/amdsmi commit: edaae978a2]
* Added gpu-board and base-board temperatures to amd-smi metric
* Updated Changelog and adjusted the metric base-board/gpu-board output
* Adjusted output of metric to hide base/gpu-board when not relevant
---------
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
[ROCm/amdsmi commit: b13fc16d60]
Changes:
- Fixes amd-smi monitor such as:
amd-smi monitor -Vqt, amd-smi monitor -g 0 -Vqt -w 1
amd-smi monitor -Vqt --file /tmp/test1, ...
- Required moving around when process is called, since xcp
information is gathered in right format expected by monitor
- Requires process to be appended first with the gpu data -> xcp
info to be gathered + added after 1st device
Change-Id: I76356a4610944f633a9530970fac66556d65bf11
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 1b2edd70bd]
Changes:
- This aligns back to original struct naming for ROCm 7.0. This removes
any Major ABI breakages for updates for 7.0 release.
- Minor ABI breakage is required since there were additions to the
header. Refer to changelog for these updates.
Change-Id: If35af74eac6beac8c267d05ce789b7761ed24bff
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: d3b73fac82]