373 Commits

Author SHA1 Message Date
Joseph Narlo 48a4cda75c [SWDEV-552552] Provide CLI testing within amd-smi-lib-tests install (#2485)
* Add common module
* Added information to help with unknowns
* Allow paring of cmds
* change cmd print default
* Reduce cmds to be tested

---------

Signed-off-by: amd-josnarlo <joseph.narlo@amd.com>
Co-authored-by: amd-josnarlo <joseph.narlo@amd.com>
2026-01-28 22:16:01 -06:00
Joseph Narlo baf676f003 [SWDEV-572968] Readonly test failures on gfx1151 (#2697)
Signed-off-by: amd-josnarlo <josnarlo.amd.com>
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
2026-01-27 08:29:19 -06:00
Daniel Oliveira 32fde0f73d [SWDEV-568613] Add gpu_metrics 1.0 support for older GPUs (#2444)
fix: Add gpu_metrics 1.0 support which is still used by some hardware

Code changes related to the following:
  * APIs
  * Unit tests

Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2026-01-06 14:25:13 -06:00
Bindhiya Kanangot Balakrishnan 641fa27699 [SWDEV-566543] Fix param validation in FrequenciesRead test (#2430)
Fixed incorrect error code expectation in FrequenciesRead
test when calling amdsmi_get_gpu_pci_bandwidth() with nullptr
parameter.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-12-23 15:38:25 -08:00
Mario Limonciello 08949cb884 Run pre-commit's whitespace related hooks on projects/amdsmi (#2119)
* Run pre-commit's whitespace related hooks on projects/amdsmi

In order for pre-commit to be useful, everything needs to meet a common
baseline.

* Add whitespace back to Changelog for formatting

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-12-15 13:20:47 -06:00
SakaSitharammurthy caecbb4d01 [SWDEV-354749] Added CPU Performance Tests (#2173)
* CPU Performance testcases
  
---------

Signed-off-by: Saka, Sitharam Murthy <SitharamMurthy.Saka@amd.com>
2025-12-10 21:57:47 -06:00
systems-assistant[bot] e39fe03bcf [SWDEV-488296] Implemented API Performance test case (#1903)
Add API performance testing and execution script

---------

Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>
Co-authored-by: Sumanth Gavini <sumanth.gavini@amd.com>
2025-12-10 21:33:44 -06:00
Gavini, Sumanth c20a8a9a84 [SWDEV-563788] - Fix: amdsmitst crash from kernel in xcp_metrics read (#826)
Use fork/waitpid to isolate API call and detect SIGKILL from kernel

Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>

[ROCm/amdsmi commit: a044536b8d]
2025-11-17 16:24:45 -06:00
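
The fork/waitpid isolation described in the SWDEV-563788 commit above can be sketched as follows. The actual fix lives in the C++ test code; this is only a minimal Python analogue using `os.fork` and `os.waitpid`, and the helper name `run_isolated` is hypothetical, not part of amdsmi.

```python
import os
import signal


def run_isolated(fn):
    """Run fn() in a forked child and report how the child ended.

    Hypothetical helper: the parent survives even if the kernel kills
    the child (for example with SIGKILL) while fn() is running.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: run the risky call, then leave immediately
        # without running any of the parent's cleanup handlers.
        try:
            fn()
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent process: wait for the child and decode its status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return ("signaled", os.WTERMSIG(status))
    return ("exited", os.WEXITSTATUS(status))
```

A test could then treat `("signaled", signal.SIGKILL)` as "the kernel killed the call" and record a failure or skip instead of losing the whole test run.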
Ramalingam, Muthusamy 3659db6f21 [SWDEV-560044]: [AMDSMI][CPU] Update AMDSMI as per latest ESMI Driver (#763)
[AMDSMI][CPU] Update AMDSMI as per latest ESMI Driver,
1) hsmp_acpi
2) amd_hsmp
3) hsmp_common

Signed-off-by: Muthusamy Ramalingam <muthusamy.ramalingam@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: ssaka_amdeng <SitharamMurthy.Saka@amd.com>

[ROCm/amdsmi commit: b4b3539631]
2025-11-17 13:45:43 -06:00
Bindhiya Kanangot Balakrishnan 40aa184c2a [SWDEV-538483] Fix C++17 fs linking for GCC<9.0
Added check for GCC versions prior to 9.0 and
link with libstdc++fs when needed. This fixes
undefined symbols on older systems like Deb10
with GCC 8.3.0.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: e1b3d5f02e]
2025-11-14 17:59:07 -06:00
gabrpham_amdeng 351b6f96ae Added support for configuring PPT1 power cap
- Updated python integration test to account for PPT1 support changes
  - Updated set/reset power-cap input format
  - Adjusted python API and updated C++ API test

Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f


[ROCm/amdsmi commit: 18faddf6f3]
2025-11-13 13:08:12 -06:00
Galantsev, Dmitrii 4e8d89306e Add downloaded gtest as fallback
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: aac09912ec]
2025-11-06 01:26:40 -06:00
Galantsev, Dmitrii 87ace88e72 Fix missing iomanip and cstdio in tests
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 982737a852]
2025-11-05 10:14:19 -06:00
Galantsev, Dmitrii adaf3c9966 Use system gtest instead of building from source
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: a375479386]
2025-10-30 12:38:11 -05:00
Pryor, Adam 354886f4ff [SWDEV-357472] Add evicted_ms metric (#620)
- **Added evicted_time metric for kfd processes**.  
  - Time that queues are evicted on a GPU in milliseconds
  - Added to CLI in `amd-smi monitor -q` and `amd-smi process`
  - Added to C API and Python API:
    - amdsmi_get_gpu_process_list()
    - amdsmi_get_gpu_compute_process_info()
    - amdsmi_get_gpu_compute_process_info_by_pid()

---------

Signed-off-by: Pryor, Adam <Adam.Pryor@amd.com>

[ROCm/amdsmi commit: 2144cfbba4]
2025-10-28 14:49:03 -05:00
Charis Poag fee59b2c58 [SWDEV-562726] Fix clang + ASAN errors
* Updates:
  - [ASAN] GCC does not support the `-shared-libsan` flag, so removed it
  - [Clang] Fixed references to local binding errors (name collision)
    & other strict scope/structure/lambda binding errors
  - [Clang] Fix rsmi_wrapper error: "error: missing default argument on parameter 'args'"
  - [ASAN] Fixed stack-buffer-overflow found in
    `amdsmi_get_gpu_accelerator_partition_profile()`

Change-Id: I854007efb75d828dbb8088c0d56dbc125081f0f2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 00a04f5810]
2025-10-28 09:54:23 -05:00
Narlo, Joseph 54317f3fe8 [SWDEV-553416] Fix amdsmi_get_gpu_reg_table_info and amdsmi_get_gpu_pm_metrics_info(#787)
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>

[ROCm/amdsmi commit: ced7d12395]
2025-10-27 14:43:31 -05:00
Poag, Charis ce19b921b0 [SWDEV-535159] Add support for GPU partition metrics (#490)
[SWDEV-535159] Add support for GPU partition metrics

Changes include:
  - Internal logic to smart-switch between gpu_metrics/xcp_metrics files
  - [WIP] Initial plumbing for new partition metric API

Change-Id: I4340fb1b48bac0117d80d5d486b9e871430d5cd8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Add amdsmi_get_gpu_partition_metrics_info() + minor cleanup

Change-Id: I5d60604f18baddbd03852dc90e88aa0b8107d50e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Fix partition metric logic + update logging/tests

Change-Id: I9e89b19ead17694c54e224f8e13ff8ee3eb2e22a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Adjust amd-smi metric/monitor/default to show (some) partition information

Change-Id: I2e8d2745876a19bdaec3c039daa97345c9f701b5
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Add C++ tests

Change-Id: Ib9eb0b57a6d7a280992e05a4c6eba632826952ef
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Remove modification of energy counter, not needed

Change-Id: I5c48eaaae248ee6dc79abba609d837ec35d78022
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[CLI] amd-smi metric: cleaned up N/A'd multi-valued to show just N/A

Changes:
1. amd-smi metric: cleaned up N/A'd multi-valued to show just N/A
ex.
JPEG_ACTIVITY: [N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A]

Now just shows: N/A

2. [Python Unit Test] Changed testname TestAmdSmiPythonBDF(unittest.TestCase) ->
 AmdSmiPythonUnitTest

Test name was confusing.

Change-Id: Ieb3b036f30002fd22362508eb9fc5d443df395ae
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Log cleanup

Change-Id: I1b1a95f1844d35bec7a7bd8cb996f87e4914c069
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Add amd-smi partition-metrics CLI + general cleanup

Change-Id: Ia91488e6cb3a4d62b4087afbddfe0b3bb9378fdc
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[1.3 metrics] Remove forwards compatibility for partition metrics

Change-Id: Iab928983e6f6f1587bc9307f6f3fa2b2696ca6f7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Fixed violation output not showing % + general cleanup

Change-Id: Icac1b0a55b18c7628b07109ae0c377d17e0825f1
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Clean up amdsmi_get_gpu_partition_metrics_info & amd-smi partition-metric outputs

Change-Id: I6427028b980874641e9ffb3b5d88ad493dbf9cf4
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Fix metrics not found + extra logging/formatting

Change-Id: I841a27bb2c305e97ec7579a13ac915e5be497c3a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Update license to current default

Change-Id: I0de9b8a2d5dbbeab4491097f0354ba17b0d30866
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Cleanup for review

Change-Id: I96ed25c3f2b8968eea1af24c5e5860c2b4e74e6e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Modernize updated/new internal APIs.

Change-Id: I3c48a250eeb703709b14cb5ffa68268d8321626c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Remove extra logging in dynamic metrics

Change-Id: Idb97547bcbe143d6fa1cb5cb278ffe4da615ce14
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Remove amd-smi partition-metric command

Change-Id: Ib83c17e5cd7e0da3798198943bddd46c296b411c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Move new CLI updates to another PR + minor fixes

Change-Id: I3b1163eec12f9b5f7d95ee33de08e168cec1b1fe
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Allow dynamic metrics to work for gpu/xcp metrics 1.9+/1.1+

Updated some logging as well.

Change-Id: I2ed9f5a5ef8afb1520508820ca6153525f0644b4
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Allow dyn gpu/xcp metric v1.9+/v1.1+

Added tests for quick check

Change-Id: I576d6f6582a55afb08e5ac57791ce95e2fa184a2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Update tests for larger subset of version checks

Change-Id: I3cdf4f8bb4fc6161f4c76566939f90545d0f362a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Fix XCP metrics in gpu/partition metric pre-v1.9/v1.1 (dynamic)

Change-Id: I4dabc1ed6bef6b86c8e7f92bf9cb5992f3966fe2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

---------

Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/amdsmi commit: 01b4fe6614]
2025-10-20 14:43:40 -05:00
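
The "N/A'd multi-valued" cleanup mentioned in the partition-metrics commit above (a list of 32 `N/A` entries for `JPEG_ACTIVITY` now shown as a single `N/A`) could look roughly like this. It is a sketch under assumptions: `collapse_na` is a hypothetical name and not the CLI's actual helper.

```python
def collapse_na(values, na="N/A"):
    """Collapse a non-empty list made up only of N/A markers into one marker.

    Hypothetical helper; any other value (mixed list, empty list,
    scalar) is returned unchanged.
    """
    if isinstance(values, list) and values and all(v == na for v in values):
        return na
    return values
```

Lists that contain at least one real reading are left alone, so partially populated metrics still show every slot.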
Narlo, Joseph 6975b29c15 [SWDEV-539078] Add missing API definitions to python interface (#525)
Added the following APIs to amdsmi_interface.py.
	amdsmi_get_cpu_handle()
	amdsmi_get_esmi_err_msg()
	amdsmi_get_gpu_event_notification()
	amdsmi_get_processor_count_from_handles()
	amdsmi_get_processor_handles_by_type()
	amdsmi_gpu_validate_ras_eeprom()
	amdsmi_init_gpu_event_notification()
	amdsmi_set_gpu_event_notification_mask()
	amdsmi_stop_gpu_event_notification()
	amdsmi_get_gpu_busy_percent()

Added additional return value to API amdsmi_get_xgmi_plpd().
	The entry policies is added to the end of the dictionary to match API definition.
	The entry plpds is marked for deprecation as it has the same information as policies.

---------

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 7decbc67a1]
2025-10-06 14:50:00 -05:00
Narlo, Joseph 098aa488aa Add ASIC and Board information (#721)
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>

[ROCm/amdsmi commit: b1eeff9992]
2025-10-01 17:39:26 -05:00
Justin Williams 33ff308c20 Adjust the amdsmitst.exclude file to blacklist the partition tests for non-MI300+ asics
Signed-off-by: Justin Williams <Justin.Williams@amd.com>


[ROCm/amdsmi commit: b727fe1f8b]
2025-10-01 08:36:42 -05:00
Maisam Arif 405f34e4d1 [SWDEV-554587] Added IFWI Version and boot_firmware API
- Changed amd-smi static --vbios to accept ifwi
- Change population logic for vbios version API
- Added IFWI boot_firmware to the CLI, C++, Rust, and Python API

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I4ea504d40a43cfb011ab38fc9a664ecf12d39c8a


[ROCm/amdsmi commit: cd21b5edcc]
2025-09-23 16:05:10 -05:00
3049ac537468bd90fe07f2cbb3d7a83e_amdeng 85bcf06edd [SWDEV-531904] Unit and Integ Test Updates (#563)
* [SWDEV-531904] Unit and Integ Test Updates
Updated: unit_tests.py
- Removed redundant self.setUp() and self.tearDown() calls.
- Removed test_free_name_value_pairs() since it is internal only.
Updated: integration_test.py
- Added logic to set AMDSMI_CLI_PATH from environment or default.
- Raise FileNotFoundError if path does not exist.
- Append CLI path to sys.path and handle ImportError with a clear message.
- Removed redundant @handle_exceptions function decorator.
- Removed redundant self.setUp() and self.tearDown() calls.
Updated: amdsmi_interface.py
- Removed POINTER conversion in amdsmi_get_gpu_pm_metrics_info() and amdsmi_get_gpu_reg_table_info()

All tests pass/skip

Signed-off-by: Juan Castillo <juan.castillo@amd.com>

* Update tests/python_unittest/integration_test.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Castillo, Juan <Juan.Castillo@amd.com>

* Review Update 1
Modified: integration_test.py
- Added logic to properly loop through firmware list and display each name and version

Signed-off-by: Juan Castillo <juan.castillo@amd.com>

* Skip xgmi_err tests + improve running output

Changes:
1. Now check for elevated permissions
2. Skip xgmi_error related SYSFS tests, refer to xgmi_read_write.cc
   (both are skipped)
3. Added list of tests and provided a summary of additional output
   provided

Change-Id: Iefc85c270faad89c625e2bd7af397d24faed2437
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

---------

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Signed-off-by: Castillo, Juan <Juan.Castillo@amd.com>
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Co-authored-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/amdsmi commit: 67eb541c15]
2025-09-11 16:39:31 -05:00
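
The `AMDSMI_CLI_PATH` handling listed for integration_test.py in the commit above (read the path from the environment or a default, raise `FileNotFoundError` if it is missing, append it to `sys.path`, and report `ImportError` clearly) might be sketched like this. The function name `load_cli_module` and the default path are assumptions for illustration, not taken from the test file.

```python
import importlib
import os
import sys

# Assumed default install location; the real test may use a different one.
DEFAULT_CLI_PATH = "/opt/rocm/libexec/amdsmi_cli"


def load_cli_module(module_name, default_path=DEFAULT_CLI_PATH):
    """Import module_name from the directory named by AMDSMI_CLI_PATH."""
    cli_path = os.environ.get("AMDSMI_CLI_PATH", default_path)
    if not os.path.isdir(cli_path):
        raise FileNotFoundError(f"AMDSMI_CLI_PATH does not exist: {cli_path}")
    # Make the CLI directory importable without duplicating entries.
    if cli_path not in sys.path:
        sys.path.append(cli_path)
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with the path included so the failure is easy to diagnose.
        raise ImportError(
            f"Could not import {module_name!r} from {cli_path}"
        ) from exc
```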
Pham, Gabriel e9ee0bccf2 [SWDEV-551309] Adjusted amdsmitst and reset command (#654)
* Adjusted amdsmitst and reset command to account for separation of power profile and perf level behavior
* Updated test to reset power profile to previous user setting
* Removed performance level from reset_profile_results in reset --profile command
* Updated Changelog with change to reset profile behavior

---------

Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: 954d4860c1]
2025-09-09 16:11:07 -05:00
Galantsev, Dmitrii 74efdc57a7 Clean up clang-tidy warnings and unused variables
Change-Id: I1365edf8926908b3a49652fb87f079f8fbf1f56b


[ROCm/amdsmi commit: aba1c792b4]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 3a7b4a283a Remove an impossible check
amdsmi/tests/amd_smi_test/functional/memorypartition_read_write.cc:453:32: warning: the address of ‘orig_memory_partition’ will never be NULL [-Waddress]
  453 |     if ((orig_memory_partition == nullptr) ||
      |          ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: 66eb189396]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) c9eddf75e7 Remove unnecessary includes
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: 924a06d1e1]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 5fe413710b Fix a typo
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: 05f79879c3]
2025-09-05 17:44:17 -05:00
Poag, Charis 61c817a2d3 [SWDEV-546220] Fix mVF xcd check within tests (#628)
Adding a check to see if we're in guest -> allowing equal XCD values.
This is because in mVF configurations, we may not be able to read the gfx clock values.

Change-Id: I8e5d9627e061e98ec854734a91624c8077644a2a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/amdsmi commit: e12d270693]
2025-08-19 11:13:18 -05:00
Charis Poag 3eb536e34c [SWDEV-548755] Driver reload temporary fix for CQE
Temporary solution until CQE can update how their containers are run.

This is because the driver reload requires:
1) Containers must run serially
   (i.e. no parallel containers running at the same time)
2) Containers must run with extra parameters:
   `--cap-add=SYS_ADMIN -v /lib/modules:/lib/modules`

Change-Id: If6364c9e82da8404b73ac6a9688833f4d18693b0
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 425b05cb18]
2025-08-11 13:06:57 -05:00
Poag, Charis bf8bbd99c6 [SWDEV-518561] Separate Driver Reload from Memory Partition Sets (#582)
Description:
  - Added a new API `amdsmi_gpu_driver_reload()` to reload the AMD GPU driver independently.
  - Updated CLI (`sudo amd-smi reset -r`) and Python bindings to support driver reload functionality.
  - Removed automatic driver reload from `amdsmi_set_gpu_memory_partition()` and `amdsmi_set_gpu_memory_partition_mode()`.
  - Enhanced CLI and test cases to allow users to control when the driver reload occurs.
  - Updated documentation and changelog to reflect the new driver reload process.
  - Improved error handling and logging for driver reload operations.
  - Added progress bar and user confirmation prompts for driver reload commands.

* Update build/test strategy to only allow one test execution at a time
* Modify API verbiage + modify systemctl error output
  - Systemctl is typically not enabled on docker.
  - And is an edge case for gpu being active process/etc for display devices.
* Remove AMDSMI_STATUS_AMDGPU_RESTART_ERR from the return values
* Move driver reload to after we save original compute partitions

---------

Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/amdsmi commit: d24dc7ef89]
2025-08-05 20:44:28 -05:00
Liu, Shuzhou (Bill) 7ec0a1a7dd Query UBB/OAM temperature API (#581)
Add support to Query UBB/OAM temperature.
* Updated Python API with new temperature metrics enum

---------

Co-authored-by: Bill Liu <shuzhliu@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: abd3c02a3c]
2025-08-05 20:37:45 -05:00
Poag, Charis e754e8e7ad [SWDEV-536953] Fix sets/resets + Align Power Cap Behavior with ROCM_SMI (#456)
Changes:
  - Modified outputs for amd-smi set/reset when in partitions
    to display error codes
  - Provided some general cleanup for the above ^
----------------------------------------------------
  - Updated the `amd-smi set -o <value>` / `amd-smi set --power-cap <value>` command to
    allow setting power cap to values other than 0, provided the current power cap is not 0.
  - Modified power_cap_read_write.cc:
    - Added a check to ensure that the power cap can only be set to non-zero values if the current
      power cap is not 0.
    - Reset the power cap to the original value after the test to maintain state consistency.
Change-Id: If489bb35812ba4fc4cc34723b0dc39c99926e5d7

---------

Signed-off-by: Poag, Charis <Charis.Poag@amd.com>

[ROCm/amdsmi commit: ec055f2c2d]
2025-07-22 17:21:15 -05:00
Castillo, Juan 801dbaedec [SWDEV-531904] Added test_get_gpu_revision (#533)
* [SWDEV-531904] Added test_get_gpu_revision
New:
- amdsmi_get_gpu_revision() previously not implemented in amdsmi_interface.py
- test_get_gpu_revision() missing integration test.

Updated:
- changelog.md: added new doc fields for ROCm 7.1
- amdsmi-py-api.md: added field|description doc fields

Signed-off-by: Juan Castillo <juan.castillo@amd.com>

[ROCm/amdsmi commit: 3b1957e674]
2025-07-15 19:35:54 -05:00
Castillo, Juan c9d14c1c93 [SWDEV-531904] Removed Handle Exceptions function (#531)
Removed:
- handle_exceptions(): exposed, silenced, and logged AMDSMI exceptions to users and returned success/failure

Signed-off-by: Juan Castillo <juan.castillo@amd.com>

[ROCm/amdsmi commit: 34f465bfc5]
2025-07-07 13:26:26 -05:00
josnarlo 3f6b0bb1c7 [SWDEV-539912] Add Skipping to Unit Tests
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: 5858d643f3]
2025-06-24 12:01:32 -05:00
josnarlo 4c0c050962 [SWDEV-539591] Allow integration tests to skip Not Supported APIs
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: d8b8dc4116]
2025-06-20 14:19:56 -05:00
Galantsev, Dmitrii 44986cfbd4 DRM - Remove FD usage
Change-Id: I77dfa778ccd0d39a03289c2e11cf10357566ff16
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 9b5bbf555a]
2025-06-20 11:00:42 -05:00
Narlo, Joseph c5e604f357 [SWDEV-489696] Improve AMD SMI Python APIs Functional and Unit Testing (#468)
* Adding python unit tests
* Remove duplicate functions definitions
* Added missing classes for __init__ for py-interface

---------

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 7c0802889b]
2025-06-19 16:38:34 -05:00
Arif, Maisam 6123abe733 [SWDEV-538786] Fix ecc counts returning file error (#494)
Change-Id: I5cea584289df95e89b6151d549bf69e4c3e50d22

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 967e879861]
2025-06-19 15:24:03 -05:00
Castillo, Juan 4a55abaa05 [SWDEV-531904] - Added GPU Cache Read Tests (#464)
New:
- gpu_cache_read.h and gpu_cache_read.cc
- Test reads GPU cache info and asserts valid structure
Updated:
- integration_test.py
- Added test_gpu_cache_info() and asserts valid structure
- test_get_gpu_compute_partition() to loop through all devices when tests fail/pass
Added:
- test_get_gpu_compute_partition_returns_string() to integration_test.py
- This test displays the current compute partition for each bdf

---------

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Signed-off-by: Castillo, Juan <Juan.Castillo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 470c62f887]
2025-06-19 15:23:34 -05:00
Narlo, Joseph f543f77e30 [SWDEV-537038] amd-smi-lib build failing Fix for integration_test.py (#496)
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>

[ROCm/amdsmi commit: 57a749f457]
2025-06-19 15:12:31 -05:00
josnarlo 0862dd11fb [SWDEV-537038] amd_smi-lib build failing Fix for integration_test.py
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: 99b2bfbc61]
2025-06-19 11:23:25 -05:00
Maisam Arif 6e37490e87 [SWDEV-529665] PLDM Bundle naming
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Id7f652ddc4e790027869683a4aaa3226ffc05c83


[ROCm/amdsmi commit: 6da33b8ded]
2025-06-12 02:19:37 -05:00
Charis Poag df6de25624 [SWDEV-529030/SWDEV-531217] Fix tests & output for partitioned configurations (CPX, DPX, QPX, etc.)
Changes:
  - Updated AMD SMI firmware to display "N/A" for unavailable firmware in partitioned environments, improving clarity.
    Example (in DPX):
    $ amd-smi firmware
    GPU: 0
        FW_LIST:
            ...
            FW 12:
                FW_ID: PM
                FW_VERSION: 00.86.39.00
    GPU: 1
        FW_LIST: N/A
  - Fixed amd-smi partition not showing current partition information on
    ASICs unable to set memory or accelerator partitions.
    $ amd-smi partition -c -m
    CURRENT_PARTITION:
    GPU_ID  MEMORY  ACCELERATOR_TYPE  ACCELERATOR_PROFILE_INDEX  PARTITION_ID
    0       NPS1    CPX               2                          0
    1       N/A     N/A               N/A                        1
    2       N/A     N/A               N/A                        2
    3       N/A     N/A               N/A                        3
    4       N/A     N/A               N/A                        4
    5       N/A     N/A               N/A                        5
    6       NPS1    SPX               0                          0
    7       NPS1    SPX               0                          0
    8       NPS1    SPX               0                          0

    MEMORY_PARTITION:
    GPU_ID  MEMORY_PARTITION_CAPS  CURRENT_MEMORY_PARTITION
    0       N/A                    NPS1
    1       N/A                    N/A
    2       N/A                    N/A
    3       N/A                    N/A
    4       N/A                    N/A
    5       N/A                    N/A
    6       N/A                    NPS1
    7       N/A                    NPS1
    8       N/A                    NPS1

  - Refactored amd_smi_drm_example.cc:
    - Grouped partition changes and restores original partition settings.
    - Now handles partitioned environments allowing example to continue even if some APIs are not supported in partitioned configurations.
  - Modified amdsmi_asic_info_t (see amdsmi_get_gpu_asic_info()) to report OAM ID as N/A if 0xFFFFFFFF (was 0xFFFF).
    Allows for better handling of OAM IDs in partitioned environments (does not exist for non-primary nodes,
    since it's a physical identifier). Easier to handle in tests and example code (i.e., now consistent w/ max size of the structure's value).
  - Introduced amdsmi_RAII_open_FD() (internal API) to manage file descriptors using RAII, ensuring proper closure and preventing resource leaks.
    Updated the following APIs to use this function:
      - amdsmi_get_gpu_asic_info(), amdsmi_get_gpu_vram_usage(),
        amdsmi_get_gpu_vram_info(), amdsmi_get_gpu_vbios_info(),
        amdsmi_get_gpu_driver_info(), amdsmi_get_gpu_virtualization_mode()
  - Updated AMD SMI test_base.cc/.h:
    - Improved output and handling for partitioned environments.
    - Added detailed ASIC information logging to align with structure changes.
    - Enhanced error messages for better context before ASSERT checks.
  - Resolved test failures in partitioned environments by updating
    logic and handling for partition-specific configurations.
    Fixed tests include:
      - computepartition_read_write.cc, frequencies_read_write.cc,
        gpu_metrics_read.cc, mem_util_read.cc, memorypartition_read_write.cc,
        perf_level_read.cc, perf_level_read_write.cc, power_cap_read_write.cc,
        power_read.cc, sys_info_read.cc, gpu_busy_read.cc

Change-Id: I36e903f8fddd714c74c719459c71aba8bbb77e6f
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Resetting head + adding fixes for tests run in partitions

Change-Id: I0c1e9ac07488b50c95f3bc6d8a724e67d2c715dc
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 391451752b]
2025-06-05 19:24:49 -05:00
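
The commit above introduces `amdsmi_RAII_open_FD()`, an internal C++ helper that ties a file descriptor's lifetime to a scope. Its implementation is not shown here; as a rough illustration of the same idea in Python, a context manager can guarantee closure on every exit path. The name `open_fd` is hypothetical, and this swaps C++ RAII for Python's `contextlib`.

```python
import os
from contextlib import contextmanager


@contextmanager
def open_fd(path, flags=os.O_RDONLY):
    """Open a raw file descriptor and always close it on scope exit."""
    fd = os.open(path, flags)
    try:
        yield fd
    finally:
        # Runs on normal exit and when the body raises, so the
        # descriptor cannot leak.
        os.close(fd)
```

A caller would write `with open_fd(path) as fd:` and use `fd` only inside the block.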
Maisam Arif 00ad72baf9 Deprecated PASID
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ib008f80f3d736172079358c0ceb3ebca87340d28


[ROCm/amdsmi commit: c89b5db09d]
2025-05-30 20:48:29 -05:00
Liu, Shuzhou (Bill) ff2e230a34 [SWDEV-520665] Add support for board voltage (#303)
* Add the API and CLI to show the board voltage. 

---------

Change-Id: Icb25bd653bb1d004704b5a21b378ca31b2b242c7
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>

[ROCm/amdsmi commit: 970560fc7c]
2025-05-29 18:55:08 -05:00
Narlo, Joseph fc54da7679 [SWDEV-489696] Improve AMD SMI Python APIs Functional and Unit Testing (#408)
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>

[ROCm/amdsmi commit: 13148c5d8e]
2025-05-29 17:18:08 -05:00
Pryor, Adam 69fde31369 Remove ring hang (#391)
Change-Id: I856cd0949d3661911ab9302148aa1bc6e72abeed

Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/amdsmi commit: d0a89393df]
2025-05-29 11:58:46 -05:00
Narlo, Joseph 8d6253d772 [SWDEV-532125] Remove_Unused_Definitions (#385)
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>

[ROCm/amdsmi commit: b6d638d942]
2025-05-28 18:49:08 -05:00