614 Commity

Autor SHA1 Zpráva Datum
systems-assistant[bot] c6b7448227 Add support for get and set APIs for CPUISOFreqPolicy and DFCState Co… (#1901)
* Add support for get and set APIs for CPUISOFreqPolicy and DFCState Control

  - Add support for get and set APIs for CPUISOFreqPolicy and DFCState Control
    in AMD SMI and also in the CLI tool

* CHANGELOG.md file updated

* SWDEV-562837: Update amdsmi-py-api.md as per the new APIs

Updated amdsmi-py-api.md as per the new APIs added.

---------

Signed-off-by: Soumya <sranjanr@amd.com>
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Saka Sitharammurthy <SitharamMurthy.Saka@amd.com>
2026-01-06 10:37:07 -06:00
Mario Limonciello 08949cb884 Run pre-commit's whitespace related hooks on projects/amdsmi (#2119)
* Run pre-commit's whitespace related hooks on projects/amdsmi

In order for pre-commit to be useful, everything needs to meet a common
baseline.

* Add whitespace back to Changelog for formatting

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-12-15 13:20:47 -06:00
koushikbillakanti-amd 9e06ea8f79 [SWDEV-564696] Structure size mismatch in SOC pstate/XGMI PLPD (#2207)
* Address PR feedback: consolidate switch cases, move CSV formatting, use direct API calls for error messages
* csv output flattening changes

---------

Signed-off-by: Billakanti, Koushik <Koushik.Billakanti@amd.com>
2025-12-10 23:37:36 -06:00
Mario Limonciello 73778bf83c Adjust policy for memory display on APUs (#1967)
* Read the ids_flags when fetching GPU info

The ids_flags contains the flags that can help identify if a GPU
is a dGPU or an APU.

* Show correct memory pool for APUs

The kernel policy for APUs will be to choose the bigger pool of
memory (GTT or VRAM) for KFD work.  Adjust the policy for the monitor
and default commands to show the right memory pool when using an APU.
2025-12-09 21:49:06 -06:00
Mario Limonciello a08170bc75 Apu prerequisites (#1946)
* Don't require powercap support

APUs don't necessarily support setting a power cap from sysfs.
Ignore failures of the file missing.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Show edge temperature in default output if hotspot is missing

APUs don't have a hotspot temperature, they have an edge though.
Use that.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Format all "power" keys as watts

There will be more power keys when APU support is added, so format
them properly.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Don't show power limit in output if it's invalid

APUs can't set power limit using power_cap1 interface.  The limit
will be 0 and thus the UX looks weird in default output.
Only add the `/power_limit` if it's valid.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Unify sizes of `amdsmi_power_info_t`

Sizes are used inconsistently.  This causes tools to not show
N/A when they should.  Make them unified.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-08 21:36:45 -06:00
systems-assistant[bot] eb357fcd45 [SWDEV-531902] python docs need exception type updated (#1895)
* add parameter checks

* remove AmdSmiRetryException and AMDSMI_STATUS_RETRY

* remove bdf exception

* revert retry exception

* add parameter checks

* remove AmdSmiRetryException and AMDSMI_STATUS_RETRY

* remove bdf exception

* revert retry exception

* wip

* wip

* add missing error codes

* wip

* Updated amdsmi-py-api.md file and amdsmi_exception.py

* Updated amdsmi-py-api.md file

* "Deleted backup related files"

* updated amdsmi_interface.py file

* amdsmi_interface.py file changes

* updated amdsmi_interface.py file to fix check issues

* updated amdsmi-py-api.md file

* Reverted AmdSmiBdfFormatException definition

---------

Co-authored-by: Oosman Saeed <oossaeed@amd.com>
Co-authored-by: ssaka_amdeng <SitharamMurthy.Saka@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: gabrpham <Gabriel.Pham@amd.com>
2025-12-08 12:57:23 -06:00
systems-assistant[bot] ed02159bf6 Stop trying to fit too much in one line for default view (#1897)
* Stop trying to fit too much in one line for default view

The default view is really cramped trying to put a lot of version
information into one line, to the point that some strings are
cropped. Instead of cropping the strings just put each into it's
own line.

For running without a ROCm release installed hide the ROCm version
line.

Sample output:
```
+------------------------------------------------------------------------------+
| AMD-SMI 26.1.0+2a668c34                                                      |
| amdgpu version: Linuxver                                                     |
| VBIOS version: 023.010.001.022.000001                                        |
| Platform: Linux Baremetal                                                    |
|-------------------------------------+----------------------------------------|
| BDF                        GPU-Name | Mem-Uti   Temp   UEC       Power-Usage |
| GPU  HIP-ID  OAM-ID  Partition-Mode | GFX-Uti    Fan               Mem-Usage |
|=====================================+========================================|
| 0000:c1:00.0 ...adeon 890M Graphics | N/A      59 °C   0                17 W |
|   0       0     N/A             N/A | 25 %       N/A              479/512 MB |
+-------------------------------------+----------------------------------------+
+------------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU        PID  Process Name          GTT_MEM  VRAM_MEM  MEM_USAGE     CU % |
|==============================================================================|
|  No running processes found                                                  |
+------------------------------------------------------------------------------+
```

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Don't show amdgpu version on mainline kernels

amdgpu version doesn't exist on a mainline kernel.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Truncate amdgpu version string to 80 characters

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Allow longer AMD-SMI version strings

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Adjusted version header format

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Co-authored-by: Mario Limonciello (AMD) <superm1@kernel.org>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-12-04 23:23:34 -06:00
Adam Pryor 422253f871 Implement PTL support (#1957)
* Implement PTL support

Signed-off-by: adapryor <Adam.pryor@amd.com>
(cherry picked from commit 45bc31292e7940a3b8fca044ef7df22047b95733)

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-11-26 08:33:27 -06:00
Saeed, Oosman 6ba1f066ad [SWDEV_562432] update inband CPER meta data to be more consistent with OOB (#824)
* Added Product Serial Number to the raw_bytes cper entries
* Added Product Serial Number to the Python API return
---------

Signed-off-by: Saeed, Oosman <Oosman.Saeed@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 05ea00dcc4]
2025-11-17 13:25:56 -06:00
Bindhiya Kanangot Balakrishnan 40aa184c2a [SWDEV-538483] Fix C++17 fs linking for GCC<9.0
Added check for GCC versions prior to 9.0 and
link with libstdc++fs when needed. This fixes
undefined symbols on older systems like Deb10
with GCC 8.3.0.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: e1b3d5f02e]
2025-11-14 17:59:07 -06:00
Kanangot Balakrishnan, Bindhiya 072daa28d5 [SWDEV-538483] Add NPM API's and CLI (#817)
* Added Python & C API's for new node devices. Currently these are functional for node 0 only.
 - amdsmi_get_node_handle
 - amdsmi_get_npm_info
* Added `amd-smi node` CLI for Node Power Management

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: f8e4771363]
2025-11-13 21:51:31 -06:00
gabrpham_amdeng 351b6f96ae Added support for configuring PPT1 power cap
- Updated python integration test to account for PPT1 support changes
  - Updated set/reset power-cap input format
  - Adjusted python API and updated C++ API test

Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f


[ROCm/amdsmi commit: 18faddf6f3]
2025-11-13 13:08:12 -06:00
Pryor, Adam 354886f4ff [SWDEV-357472] Add evicted_ms metric (#620)
- **Added evicted_time metric for kfd processes**.  
  - Time that queues are evicted on a GPU in milliseconds
  - Added to CLI in `amd-smi monitor -q` and `amd-smi process`
  - Added to C API and Python API:
    - amdsmi_get_gpu_process_list()
    - amdsmi_get_gpu_compute_process_info()
    - amdsmi_get_gpu_compute_process_info_by_pid()

---------

Signed-off-by: Pryor, Adam <Adam.Pryor@amd.com>

[ROCm/amdsmi commit: 2144cfbba4]
2025-10-28 14:49:03 -05:00
Charis Poag fee59b2c58 [SWDEV-562726] Fix clang + ASAN errors
* Updates:
  - [ASAN] GCC does not support `-shared-libsan flags`, so removed this one
  - [Clang] Fixed refernces to local binding errors (name collision)
    & other strict scope/structure/lamda binding errors
  - [Clang] Fix rsmi_wrapper error: \"error: missing default argument on parameter \'args\'\"
  - [ASAN] Fixed stack-buffer-overflow found in
    `amdsmi_get_gpu_accelerator_partition_profile()`

Change-Id: I854007efb75d828dbb8088c0d56dbc125081f0f2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 00a04f5810]
2025-10-28 09:54:23 -05:00
Saeed, Oosman be53750aa3 Sync with latest ras-decode @bc6b43c (#770)
Signed-off-by: Oosman Saeed <oossaeed@amd.com>

[ROCm/amdsmi commit: 90f4b8c43d]
2025-10-27 14:10:00 -05:00
Kanangot Balakrishnan, Bindhiya 3924171d74 [SWDEV-542718] Correct socket_affinity (#760)
* [SWDEV-542718] Correct socket_affinity

Updated Socket affinity to show bitmask and expanded cpu list.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

* Update per-device local_cpulist for socket_affinity

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

* Added amdsmi_get_cpu_affinity_from_local_cpulist API.
Updated the wrapper.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

* Revert "Added amdsmi_get_cpu_affinity_from_local_cpulist API."

This reverts commit 9a2ef934b1787f8aa09d3e4efe02f897b4295215.

* Moved the changes to C API.
In case of SOCKET_SCOPE, use local_cpulist first.
If it is unavailable or not readable, fallback to
numa.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

* Addressed review comments

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

[ROCm/amdsmi commit: 09a97f02ed]
2025-10-22 16:20:41 -05:00
Poag, Charis ce19b921b0 [SWDEV-535159] Add support for GPU partition metrics (#490)
[SWDEV-535159] Add support for GPU partition metrics

Changes include:
  - Internal logic to smart-switch between gpu_metrics/xcp_metrics files
  - [WIP] Initial plumbing for new partition metric API

Change-Id: I4340fb1b48bac0117d80d5d486b9e871430d5cd8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Add amdsmi_get_gpu_partition_metrics_info() + minor cleanup

Change-Id: I5d60604f18baddbd03852dc90e88aa0b8107d50e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Fix partition metric logic + update logging/tests

Change-Id: I9e89b19ead17694c54e224f8e13ff8ee3eb2e22a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Adjust amd-smi metric/monitor/default to show (some) partition information

Change-Id: I2e8d2745876a19bdaec3c039daa97345c9f701b5
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Add C++ tests

Change-Id: Ib9eb0b57a6d7a280992e05a4c6eba632826952ef
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Remove modification of energy counter, not needed

Change-Id: I5c48eaaae248ee6dc79abba609d837ec35d78022
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[CLI] amd-smi metric: cleaned up N/A'd multi-valued to show just N/A

Changes:
1. amd-smi metric: cleaned up N/A'd multi-valued to show just N/A
ex.
JPEG_ACTIVITY: [N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A]

Now just shows: N/A

2. [Python Unit Test] Changed testname TestAmdSmiPythonBDF(unittest.TestCase) ->
 AmdSmiPythonUnitTest

Test name was confusing.

Change-Id: Ieb3b036f30002fd22362508eb9fc5d443df395ae
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Log cleanup

Change-Id: I1b1a95f1844d35bec7a7bd8cb996f87e4914c069
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Add amd-smi partition-metrics CLI + general cleanup

Change-Id: Ia91488e6cb3a4d62b4087afbddfe0b3bb9378fdc
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[1.3 metrics] Remove forwards compatibility for partition metrics

Change-Id: Iab928983e6f6f1587bc9307f6f3fa2b2696ca6f7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Fixed violation output not showing % + general cleanup

Change-Id: Icac1b0a55b18c7628b07109ae0c377d17e0825f1
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Clean up amdsmi_get_gpu_partition_metrics_info & amd-smi partition-metric outputs

Change-Id: I6427028b980874641e9ffb3b5d88ad493dbf9cf4
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Fix metrics not found + extra logging/formatting

Change-Id: I841a27bb2c305e97ec7579a13ac915e5be497c3a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Update license to current default

Change-Id: I0de9b8a2d5dbbeab4491097f0354ba17b0d30866
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Cleanup for review

Change-Id: I96ed25c3f2b8968eea1af24c5e5860c2b4e74e6e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Moderize updated/new interal APIs.

Change-Id: I3c48a250eeb703709b14cb5ffa68268d8321626c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Remove extra logging in dynamic metrics

Change-Id: Idb97547bcbe143d6fa1cb5cb278ffe4da615ce14
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Remove amd-smi partition-metric command

Change-Id: Ib83c17e5cd7e0da3798198943bddd46c296b411c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Move new CLI updates to another PR + minor fixes

Change-Id: I3b1163eec12f9b5f7d95ee33de08e168cec1b1fe
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Allow dynamic metrics to work for gpu/xcp metrics 1.9+/1.1+

Updated some logging as well.

Change-Id: I2ed9f5a5ef8afb1520508820ca6153525f0644b4
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Allow dyn gpu/xcp metric v1.9+/v1.1+

Added tests for quick check

Change-Id: I576d6f6582a55afb08e5ac57791ce95e2fa184a2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Update tests for larger subset of version checks

Change-Id: I3cdf4f8bb4fc6161f4c76566939f90545d0f362a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Fix XCP metrics in gpu/partition metric pre-v1.9/v1.1 (dynamic)

Change-Id: I4dabc1ed6bef6b86c8e7f92bf9cb5992f3966fe2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

---------

Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/amdsmi commit: 01b4fe6614]
2025-10-20 14:43:40 -05:00
Narlo, Joseph 5ec7b213e4 [SWDEV-555807] TestCudaMallocAsync test power draw failing (#755)
* Clarified comments regarding power limit retrieval and its support on virtualized systems.
* Change unsupported comment to UINT32_MAX

---------

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 460cfcba1f]
2025-10-17 08:57:57 -05:00
Pryor, Adam 5127c923b9 [SWDEV-559082] Add asic info cache (#756)
Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/amdsmi commit: cba4c871d3]
2025-10-08 21:48:08 -05:00
Maisam Arif dcb8ba2215 Clean up and add comments
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Id30c0ccb68918e109533593df7c360837bdfa002


[ROCm/amdsmi commit: 4e8ed1f3e3]
2025-10-08 12:00:21 -05:00
Oosman Saeed 2214445327 [SWDEV-553168] Add support for decoding out of band boot time CPER files.
Change-Id: Ic4278698f9c5b5ae56bd56fd43150c0653c1ef05


[ROCm/amdsmi commit: c6698c9100]
2025-10-07 22:23:33 -05:00
Maisam Arif 0b16d22254 [SWDEV-558993] Fix bdf sourcing
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I0c50f490334f6de12a4c01abf1c2ed9e50d87295


[ROCm/amdsmi commit: a0d59397b4]
2025-10-07 01:32:26 -05:00
Narlo, Joseph 6975b29c15 [SWDEV-539078] Add missing API definitions to python interface (#525)
Added the following API's to amdsmi_interface.py.
	amdsmi_get_cpu_handle()
	amdsmi_get_esmi_err_msg()
	amdsmi_get_gpu_event_notification()
	amdsmi_get_processor_count_from_handles()
	amdsmi_get_processor_handles_by_type()
	amdsmi_gpu_validate_ras_eeprom()
	amdsmi_init_gpu_event_notification()
	amdsmi_set_gpu_event_notification_mask()
	amdsmi_stop_gpu_event_notification()
	amdsmi_get_gpu_busy_percent()

Added additional return value to API amdsmi_get_xgmi_plpd().
	The entry policies is added to the end of the dictionary to match API definition.
	The entry plpds is marked for deprecation as it has the same information as policies.

---------

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 7decbc67a1]
2025-10-06 14:50:00 -05:00
Pryor, Adam 8bc7216a65 [SWDEV-525336] Use KFD to determine process start/stop (#723)
* Used KFD to determine linking between GPUs and PIDs rather than depend on fdinfo's per pid single gpu bdf info that we were getting.

Signed-off-by: adapryor <Adam.pryor@amd.com>

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: c967aead58]
2025-10-02 10:57:08 -05:00
Maisam Arif 4fcf92ee14 Removed unused version config files
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I3b00a8c302615026422f6d5d602959989ee0418e


[ROCm/amdsmi commit: 843dfaeed2]
2025-09-25 18:19:14 -05:00
Mario Limonciello ef8882b4bf Set the SOVERSION in CMake from MAJOR/MINOR/RELEASE variables
Having the SOVERSION derived from the git tags doesn't scale well
for distributions that don't have the git history while building
(such as a tarball).

As part of 8b96ee5 the strings are parsed from a header.  Re-use
those.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>


[ROCm/amdsmi commit: ccfdb65b6f]
2025-09-25 18:19:14 -05:00
Maisam Arif f20ecc8512 Changed amd_smi_drm.cc to depend on dynamic libdrm
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ie8a794578da1a3ad8893d436e54bbfb67857a7ae


[ROCm/amdsmi commit: df87246b40]
2025-09-25 17:40:05 -05:00
Maisam Arif 0caec33dc5 Change libdrm.so.2 references to dynamic libdrm naming
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ie02c91a3a210ab7612fec670b2aad66d476d2cf3


[ROCm/amdsmi commit: 9f22e59c52]
2025-09-24 20:44:03 -05:00
Stella Laurenzo 7412d14fed Fix delay loading of drm by soname.
[ROCm/amdsmi commit: 4d5d24d1c6]
2025-09-24 20:44:03 -05:00
Stella Laurenzo e16c125f20 Add rt dep back
[ROCm/amdsmi commit: 62e4329559]
2025-09-24 20:44:03 -05:00
Stella Laurenzo 060293c7fc [cmake] Fix dependencies.
* Use CMAKE_DL_LIBS instead of hard-coded `dl`.
* Use Threads::Threads instead of `pthread`.
* Drop `rt` dep.
* Find libdrm via pkgconfig (consistent to how other ROCm projects do it as documented here: https://github.com/ROCm/TheRock/blob/main/docs/development/dependencies.md#libdrm)


[ROCm/amdsmi commit: 4e6731a817]
2025-09-24 20:44:03 -05:00
Maisam Arif 405f34e4d1 [SWDEV-554587] Added IFWI Version and boot_firmware API
- Changed amd-smi static --vbios to accept ifwi
- Change population logic for vbios version API
- Added IFWI boot_firmware to the CLI, C++, Rust, and Python API

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I4ea504d40a43cfb011ab38fc9a664ecf12d39c8a


[ROCm/amdsmi commit: cd21b5edcc]
2025-09-23 16:05:10 -05:00
Kanangot Balakrishnan, Bindhiya e0995ce7a0 [SWDEV-534605] Increase max devices supported and drm test link type (#625)
Increased the AMDSMI_MAX_DEVICES to 64 to accomodate all
devices in CPX mode. The link type has been modified in
amd-smi to match with rocm-smi types, updated the same
for drm tests.

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

[ROCm/amdsmi commit: 6715c5aa92]
2025-09-17 16:30:04 -05:00
Mario Limonciello (AMD) 0c7c849c42 Use nested namespace for amd::smi
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: eacec681dd]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 47b2e80ab8 Drop an unnecessary NULL comparison
warning: the address of ‘amdsmi_asic_info_t::vendor_name’ will never be NULL [-Waddress]

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: 4a863b27ab]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 02b357526b Fix a comparison between signed and unsigned integer
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: a15bad1c9e]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 08eec3c675 Drop unused variables
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: a99e827d97]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) c9eddf75e7 Remove unnecessary includes
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: 924a06d1e1]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 15e335ac3f Use nested namespace for amd::smi
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: faca0222f0]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 3b411b6759 Fix a crash when running amd-smi version --cpu
When running on a system that doesn't support HSMP (such as an APU)
then the following is observed:
```
/usr/include/c++/15.1.1/bits/stl_vector.h:1263: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = void*; _Alloc = std::allocator<void*>; reference = void*&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
```

This is because no "CPU" are detected on the SOC, which really means
no CPUs that support HSMP.  Catch this case so that a clean return
can be passed up.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: e5d9e1361e]
2025-09-03 00:49:48 -05:00
Maisam Arif d8c125f2b0 [SWDEV-553016] Added Copyright to scoped_fd.cc
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I2ea872e7c5c61a6e4b5c7e7114d016b8a1069b28


[ROCm/amdsmi commit: c876180875]
2025-09-02 15:02:47 -05:00
Maisam Arif ed3e242202 [SWDEV-540665] Remove amdsmi_set_power_cap API Guest Restriction
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I682506b48c10eefbd04f9b494ad57fb8ae8842b0


[ROCm/amdsmi commit: 4ffa468613]
2025-08-27 20:18:43 -05:00
Arif, Maisam 433893c770 Revert "[SWDEV-536176] libdrm_amdgpu depdency change (#448)"
This reverts commit 4d33e79baa.


[ROCm/amdsmi commit: ed2300516f]
2025-08-27 20:11:17 -05:00
Arif, Maisam 4d33e79baa [SWDEV-536176] libdrm_amdgpu depdency change (#448)
* Cmake fix updates
* Next fix will be addressing libdrm further

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Justin Williams <juwillia@amd.com>

[ROCm/amdsmi commit: 652761de54]
2025-08-27 09:32:51 -05:00
Pryor, Adam 8486ac80ba [SWDEV-540665] Move Virtualization checks in APIs into amd-smi APIs (#643)
* Remove vm checks in rocm-smi
* Move virtualization checks up the stack into amd-smi

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: f8afba0a5f]
2025-08-21 18:11:50 -05:00
Oosman Saeed 7c83dac63d [SWDEV-547223] RAS HBM CRC Read CE failed due to AFID missing 24
cherry-pick aca-decode repo changeset: aca-decode repo: f9e5ad5 (HEAD -> main, origin/main, origin/HEAD) Fix bug in Corrected HBM Error being decoded as AFID 34 (#5)


[ROCm/amdsmi commit: ffca095246]
2025-08-21 11:00:30 -05:00
Maisam Arif 2851f80253 Removed kfd_ioctl.h from rocm include install
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I7948eb050f79a8a0f71e0b8a8e4e08187ac0bb84


[ROCm/amdsmi commit: 6de6290dc1]
2025-08-19 17:18:14 -05:00
Charis Poag 7ab967ec69 Revert Major ABI break for amdsmi_get_violation_status()
Changes:
- This aligns back to original struct naming for ROCm 7.0. This removes
any Major ABI breakages for updates for 7.0 release.
- Minor ABI breakage is required since there were additions to the
header. Refer to changelog for these updates.

Change-Id: If35af74eac6beac8c267d05ce789b7761ed24bff
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: d3b73fac82]
2025-08-18 11:36:57 -05:00
Bill Liu 2ebf71976e [SWDEV-548260] Enable Support for Multiple init() and shutdown()
Implemented reference counting to manage init and shutdown processes,
allowing for multiple initializations and shutdowns.


[ROCm/amdsmi commit: c45a53d751]
2025-08-15 11:44:50 -05:00
josnarlo 4fe4c4df23 Fix getting version information
Change-Id: I2695733307888f5ab41a1265ae4369a2ea011e09


[ROCm/amdsmi commit: 925014ddaf]
2025-08-08 08:12:10 -05:00