* Don't require powercap support
APUs don't necessarily support setting a power cap from sysfs.
Ignore failures of the file missing.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Show edge temperature in default output if hotspot is missing
APUs don't have a hotspot temperature, they have an edge though.
Use that.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Format all "power" keys as watts
There will be more power keys when APU support is added, so format
them properly.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Don't show power limit in output if it's invalid
APUs can't set power limit using power_cap1 interface. The limit
will be 0 and thus the UX looks weird in default output.
Only add the `/power_limit` if it's valid.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Unify sizes of `amdsmi_power_info_t`
Sizes are used inconsistently. This causes tools to not show
N/A when they should. Make them unified.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---------
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
- Changed amd-smi static --vbios to accept ifwi
- Change population logic for vbios version API
- Added IFWI boot_firmware to the CLI, C++, Rust, and Python API
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I4ea504d40a43cfb011ab38fc9a664ecf12d39c8a
[ROCm/amdsmi commit: cd21b5edcc]
Increased the AMDSMI_MAX_DEVICES to 64 to accomodate all
devices in CPX mode. The link type has been modified in
amd-smi to match with rocm-smi types, updated the same
for drm tests.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
[ROCm/amdsmi commit: 6715c5aa92]
Changes:
- Update violation status logic and metric naming for XCP/XCC metrics (thrm/thm consistency)
- Added XCP identifier in monitor to allow partition metrics to be shown with applicable APIs
(Violation Status is the first example of this in monitor)
- Improve CLI monitor output:
support multiple GPU lines per GPU, add new columns, and better formatting
- Refactor helpers and logger for flexible unit formatting and table rendering
- Add examples for amdsmi_get_gpu_pm_metrics_info()/amdsmi_get_gpu_reg_table_info()
new metrics APIs in C++ example
- Sync Python/C++ interface and structures for new metrics fields and naming
- Remove deprecated/unused RSMI activity APIs, documentation not needed since
the APIs no longer exist in ROCm SMI either.
- Cleanup metric violations + fix handle watch arguments
- Provide better handling/doc for average_flattened_ints()
- Group xcp metrics with brackets in human readable + adjust output size
Signed-off-by: Poag, Charis <Charis.Poag@amd.com>
[ROCm/amdsmi commit: e2e4fc65c1]
Description:
- Added a new API `amdsmi_gpu_driver_reload()` to reload the AMD GPU driver independently.
- Updated CLI (`sudo amd-smi reset -r`) and Python bindings to support driver reload functionality.
- Removed automatic driver reload from `amdsmi_set_gpu_memory_partition()` and `amdsmi_set_gpu_memory_partition_mode()`.
- Enhanced CLI and test cases to allow users to control when the driver reload occurs.
- Updated documentation and changelog to reflect the new driver reload process.
- Improved error handling and logging for driver reload operations.
- Added progress bar and user confirmation prompts for driver reload commands.
* Update build/test strategy to only allow one test execution at a time
* Modify API verbage + modify systemctl error output
- Systemctl is typically not enabled on docker.
- And is an edge case for gpu being active process/etc for display devices.
* Remove AMDSMI_STATUS_AMDGPU_RESTART_ERR from the return values
* Move driver reload to after we save original compute partitions
---------
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: d24dc7ef89]
* CMAKE - Remove example build from src/CMakeLists.txt
For some reason it was building examples every time even when not
necessary...
* CMAKE - Format
* Fix drm_example broken PRIu32
* CMAKE - Do NOT create lib64 when building examples
* CMAKE - Examples should only install C and CMake files
---------
Change-Id: I6274b72a085a41b5bd5ae698af798f60a8a092a0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
[ROCm/amdsmi commit: f9b8066c26]
feat: Report PLDM Bundle from SMC to IB
Code changes related to the following:
* APIs
* CLI
* Unit tests
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Change-Id: I35ccf01eb612ca80e3ae6b72039085c18c989222
[ROCm/amdsmi commit: fe9b6eeb49]
Changes:
- Removed libdrm/libdrm_amdgpu dependencies
- Added/updated new internal libdrm/libdrm_amdgpu/xf86drm APIs
to allow our APIs to reference before dynamic loading
the libdrm/libdrm_amdgpu libraries:
1. amdgpu_drm.h to what's seen in mainline
2. Added xf86drm.h to whats seen in mainline
- Modified internal DRM capabilities:
1. Require each API to independently connect to libdrm/libdrm_amdgpu
+ validate API handles reponses accordingly
2. Initialization of AMD SMI no longer has as strong of a tie to
libdrm
- Updated internal implementations of several APIs which have
connections to libdrm/libdrm_amdgpu or APIs which have conflicts
with open libdrm/libdrm_amdgpu connections:
1. amdsmi_init()
2. amdsmi_get_gpu_vram_usage()
3. amdsmi_get_gpu_asic_info()
4. amdsmi_get_gpu_vram_info()
5. amdsmi_get_gpu_vbios_info()
6. amdsmi_get_gpu_driver_info()
7. amdsmi_get_gpu_virtualization_mode()
8. amdsmi_set_gpu_memory_partition()
9. amdsmi_set_gpu_memory_partition_mode()
- Cleaned up effected tests/APIs
Change-Id: I96e2cf1b06b0cfee1b01a5e991ccc6116c4245a8
[ROCm/amdsmi commit: b5a43b7744]
- Updates:
- Adding the following metrics to allow new calculations for violation status:
- Per XCP metrics gfx_below_host_limit_ppt_acc
- Per XCP metrics gfx_below_host_limit_thm_acc
- Per XCP metrics gfx_low_utilization_acc
- Per XCP metrics gfx_below_host_limit_total_acc
- Increasing available JPEG engines to 40. Current ASICs may not support all 40. These will be indicated as UINT16_MAX or N/A in CLI.
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: 7c882b2f69]
Features added:
- [SWDEV-475244] Add new interface to get max memory bandwidth
Updated API: amdsmi_get_gpu_vram_info
Updated: struct amdsmi_vram_info_t to include vram_max_bandwidth
CLI: amd-smi static --vram
- [SWDEV-488349] Add new interface for XGMI link status
New API: amdsmi_get_gpu_xgmi_link_status
CLI: amd-smi xgmi --link-status
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I1aa35b741136eb4f02f7ea9a95b865886273eb72
[ROCm/amdsmi commit: f8b8347627]
Change copyrights to MIT and remove date
Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I16f5b412f2b9ddefaaa1771aa714cc18829a1be4
[ROCm/amdsmi commit: 3052ad4220]
amdsmi_get_link_topology_nearest() is used to retrieve
the set of GPUs that are nearest to a given device
at a specific interconnectivity level.
Code changes related to the following:
* API
* CLI
* Unit tests
* Examples
Header Unification Change: "/amdsmi/+/1122408"
Change-Id: Id0317797c652c267742513936d321677793ec634
Signed-off-by: Lang Yu <lang.yu@amd.com>
[ROCm/amdsmi commit: 7a557b1c50]
number of compute units `amdgpu_gpu_info.num_of_compute_units` is exposed through amdsmi_get_gpu_asic_info().
Code changes related to the following:
* API
* CLI
* Unit tests
* Examples
Change-Id: Ibeb612d079ed87437a0e56124b8504098fc2dcfd
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/amdsmi commit: b05849dad0]
Driver info `amdgpu_gpu_info.vram_bit_width` is exposed through amdsmi_get_gpu_vram_info().
Code changes related to the following:
* API
* CLI
* Unit tests
* Examples
Change-Id: I8abd8db7a603078b2b1c008b2685cecf35caf3d2
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/amdsmi commit: 893f13ab98]
The sysfs is changed to use the pm_policy folder with multiple
dpm_policy files.
Change-Id: I40fac8de2d0cb127950d238b8196f6d2416778d0
[ROCm/amdsmi commit: e5d1ba4621]
Updates:
* CLI - Added AMDSMIHelpers.convert_SI_unit() to help
conversion of units
* API - Reverted to uW for power cap limits
* CLI - amd-smi static --limit now includes MIN_POWER
* Tests now are all using uW units to keep W conversion
to only happen in CLI
* Python API now reflects same units as uW (what is seen
in amdgpu driver)
* CLI - amd-smi metric --power:
Fixed power seen on gpu_metrics v1.3
Change-Id: I32d9ba78d0d8806772f0860f9a803a885b3f316a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
[ROCm/amdsmi commit: c24d66740e]
Checks and forces rereading gpu metrics unconditionally
Code changes related to the following:
* Device::dev_log_gpu_metrics()
* amdsmi_get_gpu_metrics_header_info()
Removed unintentionally during work on 'header cleanup Remove non-unified headers'
* Examples
* Unit tests
Change-Id: I83710e173c0f7102d0b7f865c18474c979a95cd8
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
[ROCm/amdsmi commit: 78074d7d77]
Renamed structs to be more conistent with what they are calling
Signed-off-by: Maisam Arif <maisarif@amd.com>
Change-Id: I6f2be2fcb76f004aa592f0dad8545565700ccd4b
[ROCm/amdsmi commit: f831cf49f7]