Changes:
- Simplified reset calls
- Updated static limit N/A values to all possible data
(helps csv format be consistent)
- Unit format was broken on static
- get_power_cap() had min/max values swapped, and the return
was missing two fields
- Updated changelog to reflect all changes
Change-Id: I23713471b984f52085372486c6e6ff852e2f42f8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
- Updated python integration test to account for PPT1 support changes
- Updated set/reset power-cap input format
- Adjusted python API and updated C++ API test
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f
The argparse 'choices' parameter was receiving a comma-separated string
instead of a list, causing it to treat individual characters as valid
choices rather than complete tokens like 'SPX', 'DPX', etc.
Fixed by removing the unnecessary join() operation in
get_accelerator_choices_types_indices() to return the list directly.
This matches the pattern used by get_memory_partition_types().
Now 'amd-smi set -C DPX' and other partition commands work correctly.
* Changes:
- Modified attempting to open files to check
permissions -> check read access only.
Do not try to open all paths, may cause driver issues.
Read access is sufficient to check permissions.
Reason: GPUs which support partitioning (memory/compute),
logical devices will not be valid until configured.
See `sudo amd-smi set -h` or applicable APIs
to configure on supported hardware.
Example error dmesg output:
[965358.883112] amdgpu 0000:15:00.0: amdgpu: renderD153 partition 1 not valid!
[965358.883283] amdgpu 0000:15:00.0: amdgpu: renderD154 partition 2 not valid!
[965358.883438] amdgpu 0000:15:00.0: amdgpu: renderD155 partition 3 not valid!
[965358.883594] amdgpu 0000:15:00.0: amdgpu: renderD156 partition 4 not valid!
[965358.883749] amdgpu 0000:15:00.0: amdgpu: renderD157 partition 5 not valid!
[965358.883904] amdgpu 0000:15:00.0: amdgpu: renderD158 partition 6 not valid!
[965358.884060] amdgpu 0000:15:00.0: amdgpu: renderD159 partition 7 not valid!
---------
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Added json and csv output format support for newly
added xgmi link_status. Aligned legend.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Added amd-smi set --pcie command
* Removed current pcie level due to it not being static
* Added pcie information to static --bus
---------
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
- **Added evicted_time metric for kfd processes**.
- Time that queues are evicted on a GPU in milliseconds
- Added to CLI in `amd-smi monitor -q` and `amd-smi process`
- Added to C API and Python API:
- amdsmi_get_gpu_process_list()
- amdsmi_get_gpu_compute_process_info()
- amdsmi_get_gpu_compute_process_info_by_pid()
---------
Signed-off-by: Pryor, Adam <Adam.Pryor@amd.com>
* [SWDEV-542718] Correct socket_affinity
Updated Socket affinity to show bitmask and expanded cpu list.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Update per-device local_cpulist for socket_affinity
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Added amdsmi_get_cpu_affinity_from_local_cpulist API.
Updated the wrapper.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Revert "Added amdsmi_get_cpu_affinity_from_local_cpulist API."
This reverts commit 9a2ef934b1787f8aa09d3e4efe02f897b4295215.
* Moved the changes to C API.
In case of SOCKET_SCOPE, use local_cpulist first.
If it is unavailable or not readable, fallback to
numa.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Addressed review comments
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Updated groups printing
* added parameters to check_required_groups
* two device groups since kfd and render require the same group
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
Added `GPU LINK PORT STATUS` table to `amd-smi xgmi` command
The `amd-smi xgmi -s` or `amd-smi xgmi --source-status` will show `GPU LINK PORT STATUS` table.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Added the following API's to amdsmi_interface.py.
amdsmi_get_cpu_handle()
amdsmi_get_esmi_err_msg()
amdsmi_get_gpu_event_notification()
amdsmi_get_processor_count_from_handles()
amdsmi_get_processor_handles_by_type()
amdsmi_gpu_validate_ras_eeprom()
amdsmi_init_gpu_event_notification()
amdsmi_set_gpu_event_notification_mask()
amdsmi_stop_gpu_event_notification()
amdsmi_get_gpu_busy_percent()
Added additional return value to API amdsmi_get_xgmi_plpd().
The entry policies is added to the end of the dictionary to match API definition.
The entry plpds is marked for deprecation as it has the same information as policies.
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
* Added ability to format gpu_metrics v1_9
* New gpu_metrics format from the driver should allow amd-smi to parse with future compatibility guaranteed
---------
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Signed-off-by: adapryor <Adam.pryor@amd.com>
Co-authored-by: Oliveira, Daniel <daniel.oliveira@amd.com>
This unbreaks having sources on one mount point and builds at another.
Signed-off-by: Marius Brehler <marius.brehler@amd.com>
Change-Id: I68363112382a95baaa867cad91e09bdec2b30d90
- Changed amd-smi static --vbios to accept ifwi
- Change population logic for vbios version API
- Added IFWI boot_firmware to the CLI, C++, Rust, and Python API
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I4ea504d40a43cfb011ab38fc9a664ecf12d39c8a
For process -
Dual CSV is required in order to print 4 separate rows.
1. Metric header + data
2. Process header + data
Change-Id: Ibb7bfb13fa95a7c43b2e3f9061ada3a6be4aa8cb
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
* Adjusted amdsmitst and reset command to account for separation of power profile and perf level behavior
* Updated test to reset power profile to previous user setting
* Removed performance level from reset_profile_results in reset --profile command
* Updated Changelog with change to reset profile behavior
---------
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Added bad_page_threshold_exceeded field to ras, which
compares retired pages count against bad page threshold.
This field displays True if retired pages exceed the
threshold, False if within threshold, or N/A if
threshold data is unavailable.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
* Added gpu-board and base-board temperatures to amd-smi metric
* Updated Changelog and adjusted the metric base-board/gpu-board output
* Adjusted output of metric to hide base/gpu-board when not relevant
---------
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
* Filter out amd-smi from process detection
* Fixed N/A stripping N/ incorrectly from running elevated processes
Signed-off-by: adapryor <Adam.pryor@amd.com>
* Update amd-smi doc with examples of CPER and AFID API usage.
---------
Signed-off-by: Oosman Saeed <oossaeed@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>