- Updated python integration test to account for PPT1 support changes
- Updated set/reset power-cap input format
- Adjusted python API and updated C++ API test
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f
- **Added evicted_time metric for kfd processes**.
- Time that queues are evicted on a GPU in milliseconds
- Added to CLI in `amd-smi monitor -q` and `amd-smi process`
- Added to C API and Python API:
- amdsmi_get_gpu_process_list()
- amdsmi_get_gpu_compute_process_info()
- amdsmi_get_gpu_compute_process_info_by_pid()
---------
Signed-off-by: Pryor, Adam <Adam.Pryor@amd.com>
* [SWDEV-542718] Correct socket_affinity
Updated Socket affinity to show bitmask and expanded cpu list.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Update per-device local_cpulist for socket_affinity
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Added amdsmi_get_cpu_affinity_from_local_cpulist API.
Updated the wrapper.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Revert "Added amdsmi_get_cpu_affinity_from_local_cpulist API."
This reverts commit 9a2ef934b1787f8aa09d3e4efe02f897b4295215.
* Moved the changes to C API.
In case of SOCKET_SCOPE, use local_cpulist first.
If it is unavailable or not readable, fallback to
numa.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Addressed review comments
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* Clarified comments regarding power limit retrieval and its support on virtualized systems.
* Change unsupported comment to UINT32_MAX
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
* Used KFD to determine linking between GPUs and PIDs rather than depend on fdinfo's per pid single gpu bdf info that we were getting.
Signed-off-by: adapryor <Adam.pryor@amd.com>
---------
Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
- Changed amd-smi static --vbios to accept ifwi
- Change population logic for vbios version API
- Added IFWI boot_firmware to the CLI, C++, Rust, and Python API
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I4ea504d40a43cfb011ab38fc9a664ecf12d39c8a
Increased the AMDSMI_MAX_DEVICES to 64 to accomodate all
devices in CPX mode. The link type has been modified in
amd-smi to match with rocm-smi types, updated the same
for drm tests.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Decrement padding to keep struct size the same
Change-Id: I4bea5d4b4d5c908423c7cc55a7e8c404b4a6b5e8
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Changes:
- This aligns back to original struct naming for ROCm 7.0. This removes
any Major ABI breakages for updates for 7.0 release.
- Minor ABI breakage is required since there were additions to the
header. Refer to changelog for these updates.
Change-Id: If35af74eac6beac8c267d05ce789b7761ed24bff
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Changes:
- Update violation status logic and metric naming for XCP/XCC metrics (thrm/thm consistency)
- Added XCP identifier in monitor to allow partition metrics to be shown with applicable APIs
(Violation Status is the first example of this in monitor)
- Improve CLI monitor output:
support multiple GPU lines per GPU, add new columns, and better formatting
- Refactor helpers and logger for flexible unit formatting and table rendering
- Add examples for amdsmi_get_gpu_pm_metrics_info()/amdsmi_get_gpu_reg_table_info()
new metrics APIs in C++ example
- Sync Python/C++ interface and structures for new metrics fields and naming
- Remove deprecated/unused RSMI activity APIs, documentation not needed since
the APIs no longer exist in ROCm SMI either.
- Cleanup metric violations + fix handle watch arguments
- Provide better handling/doc for average_flattened_ints()
- Group xcp metrics with brackets in human readable + adjust output size
Signed-off-by: Poag, Charis <Charis.Poag@amd.com>
Description:
- Added a new API `amdsmi_gpu_driver_reload()` to reload the AMD GPU driver independently.
- Updated CLI (`sudo amd-smi reset -r`) and Python bindings to support driver reload functionality.
- Removed automatic driver reload from `amdsmi_set_gpu_memory_partition()` and `amdsmi_set_gpu_memory_partition_mode()`.
- Enhanced CLI and test cases to allow users to control when the driver reload occurs.
- Updated documentation and changelog to reflect the new driver reload process.
- Improved error handling and logging for driver reload operations.
- Added progress bar and user confirmation prompts for driver reload commands.
* Update build/test strategy to only allow one test execution at a time
* Modify API verbage + modify systemctl error output
- Systemctl is typically not enabled on docker.
- And is an edge case for gpu being active process/etc for display devices.
* Remove AMDSMI_STATUS_AMDGPU_RESTART_ERR from the return values
* Move driver reload to after we save original compute partitions
---------
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Add support to Query UBB/OAM temperature.
* Updated Python API with new temperature metrics enum
---------
Co-authored-by: Bill Liu <shuzhliu@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Moved the bit_rate and max_bandwidth back into links in the
amdsmi_link_metrics_t struct as this change was impacting
other teams. Modified the C and python API's, wrapper, and
CLI accordingly.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
* fix endian problem
* use hw_revision and flags_mask from cper section instead of hardcoded values
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
*Update aca-decode to #4cd539d that fixes some errors in parsing cper files for afid extraction
*Without this fix, we get garbage value for some cper input files relating GFX_poison_cpers
Signed-off-by: Oosman Saeed <oossaeed@amd.com>
The xgmi command was showing pcie bit rate and bandwidth instead of xgmi. Corrected the API to get xgmi data from gpu metric.
Added python API for amdsmi_get_link_metrics. Modified the amdsmi_link_metrics struct.
Added check to confirm non zero partition got xgmi command.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
* [SWDEV-488303] Adjusted process vram_mem data source
* Standardized sscanf format strings
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>