228 Commits

Author SHA1 Message Date
habajpai-amd bad8d915c3 Fix: Add visibility hidden to devInfoTypesStrings to prevent symbol interposition (#2575) 2026-01-14 09:48:49 -08:00
Daniel Oliveira 32fde0f73d [SWDEV-568613] Add gpu_metrics 1.0 support for older GPUs (#2444)
fix: Add gpu_metrics 1.0 support which is still used by some hardware

Code changes related to the following:
  * APIs
  * Unit tests

Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2026-01-06 14:25:13 -06:00
Mario Limonciello 08949cb884 Run pre-commit's whitespace related hooks on projects/amdsmi (#2119)
* Run pre-commit's whitespace related hooks on projects/amdsmi

In order for pre-commit to be useful, everything needs to meet a common
baseline.

* Add whitespace back to Changelog for formatting

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-12-15 13:20:47 -06:00
koushikbillakanti-amd 9e06ea8f79 [SWDEV-564696] Structure size mismatch in SOC pstate/XGMI PLPD (#2207)
* Address PR feedback: consolidate switch cases, move CSV formatting, use direct API calls for error messages
* csv output flattening changes

---------

Signed-off-by: Billakanti, Koushik <Koushik.Billakanti@amd.com>
2025-12-10 23:37:36 -06:00
Adam Pryor 422253f871 Implement PTL support (#1957)
* Implement PTL support

Signed-off-by: adapryor <Adam.pryor@amd.com>
(cherry picked from commit 45bc31292e7940a3b8fca044ef7df22047b95733)

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-11-26 08:33:27 -06:00
Bindhiya Kanangot Balakrishnan 40aa184c2a [SWDEV-538483] Fix C++17 fs linking for GCC<9.0
Added check for GCC versions prior to 9.0 and
link with libstdc++fs when needed. This fixes
undefined symbols on older systems like Deb10
with GCC 8.3.0.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: e1b3d5f02e]
2025-11-14 17:59:07 -06:00
Kanangot Balakrishnan, Bindhiya 072daa28d5 [SWDEV-538483] Add NPM APIs and CLI (#817)
* Added Python & C APIs for new node devices. Currently these are functional for node 0 only.
 - amdsmi_get_node_handle
 - amdsmi_get_npm_info
* Added `amd-smi node` CLI for Node Power Management

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: f8e4771363]
2025-11-13 21:51:31 -06:00
gabrpham_amdeng 351b6f96ae Added support for configuring PPT1 power cap
- Updated python integration test to account for PPT1 support changes
  - Updated set/reset power-cap input format
  - Adjusted python API and updated C++ API test

Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f


[ROCm/amdsmi commit: 18faddf6f3]
2025-11-13 13:08:12 -06:00
Bindhiya Kanangot Balakrishnan 9973a6b324 [SWDEV-558046] Fix topology weight corruption due to casting
The out-of-bounds writes corrupted the next field,
which was weight. Fixed by reading into a temporary and then
assigning safely.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: a2aae5e8a9]
2025-10-30 10:49:38 -05:00
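
The casting bug described in the commit above can be illustrated with a minimal sketch. The actual fix is in the library's C/C++ code; the `LinkInfo` layout, field names, and helper functions below are hypothetical and only mirror the pattern in the commit message: a wide store through a cast pointer spills into the adjacent field, while reading into a temporary and assigning leaves it intact.

```python
import ctypes


class LinkInfo(ctypes.Structure):
    # Hypothetical layout: two adjacent 32-bit fields (not the real amdsmi struct).
    _fields_ = [("hops", ctypes.c_uint32), ("weight", ctypes.c_uint32)]


def write_hops_unsafe(info, value):
    # Buggy pattern: treat the 32-bit `hops` slot as a 64-bit destination.
    # The 8-byte store also overwrites `weight`, the next field.
    wide = ctypes.cast(ctypes.pointer(info), ctypes.POINTER(ctypes.c_uint64))
    wide[0] = value


def write_hops_safe(info, value):
    # Fix pattern: hold the wide value in a temporary, then assign only
    # the part that fits the destination field.
    tmp = ctypes.c_uint64(value)
    info.hops = tmp.value & 0xFFFFFFFF


bad = LinkInfo(hops=0, weight=15)
write_hops_unsafe(bad, 2)   # weight is clobbered

good = LinkInfo(hops=0, weight=15)
write_hops_safe(good, 2)    # weight is preserved
```

The struct is exactly 8 bytes, so the unsafe store stays inside the object here; in the real bug the same pattern reached past the intended field.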
Allan Xavier 9b4a9acd27 Allowed GPU enumeration to continue with non-contiguous render nodes
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 51971426bd]
2025-10-29 15:31:56 -05:00
Pryor, Adam 354886f4ff [SWDEV-357472] Add evicted_ms metric (#620)
- **Added evicted_time metric for kfd processes**.  
  - Time that queues are evicted on a GPU in milliseconds
  - Added to CLI in `amd-smi monitor -q` and `amd-smi process`
  - Added to C API and Python API:
    - amdsmi_get_gpu_process_list()
    - amdsmi_get_gpu_compute_process_info()
    - amdsmi_get_gpu_compute_process_info_by_pid()

---------

Signed-off-by: Pryor, Adam <Adam.Pryor@amd.com>

[ROCm/amdsmi commit: 2144cfbba4]
2025-10-28 14:49:03 -05:00
Charis Poag fee59b2c58 [SWDEV-562726] Fix clang + ASAN errors
* Updates:
  - [ASAN] GCC does not support the `-shared-libsan` flag, so it was removed
  - [Clang] Fixed references to local binding errors (name collision)
    & other strict scope/structure/lambda binding errors
  - [Clang] Fix rsmi_wrapper error: "error: missing default argument on parameter 'args'"
  - [ASAN] Fixed stack-buffer-overflow found in
    `amdsmi_get_gpu_accelerator_partition_profile()`

Change-Id: I854007efb75d828dbb8088c0d56dbc125081f0f2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 00a04f5810]
2025-10-28 09:54:23 -05:00
Poag, Charis ce19b921b0 [SWDEV-535159] Add support for GPU partition metrics (#490)
[SWDEV-535159] Add support for GPU partition metrics

Changes include:
  - Internal logic to smart-switch between gpu_metrics/xcp_metrics files
  - [WIP] Initial plumbing for new partition metric API

Change-Id: I4340fb1b48bac0117d80d5d486b9e871430d5cd8
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Add amdsmi_get_gpu_partition_metrics_info() + minor cleanup

Change-Id: I5d60604f18baddbd03852dc90e88aa0b8107d50e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Fix partition metric logic + update logging/tests

Change-Id: I9e89b19ead17694c54e224f8e13ff8ee3eb2e22a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Adjust amd-smi metric/monitor/default to show (some) partition information

Change-Id: I2e8d2745876a19bdaec3c039daa97345c9f701b5
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Add C++ tests

Change-Id: Ib9eb0b57a6d7a280992e05a4c6eba632826952ef
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Remove modification of energy counter, not needed

Change-Id: I5c48eaaae248ee6dc79abba609d837ec35d78022
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[CLI] amd-smi metric: cleaned up N/A'd multi-valued to show just N/A

Changes:
1. amd-smi metric: cleaned up N/A'd multi-valued to show just N/A
ex.
JPEG_ACTIVITY: [N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A, N/A]

Now just shows: N/A

2. [Python Unit Test] Changed testname TestAmdSmiPythonBDF(unittest.TestCase) ->
 AmdSmiPythonUnitTest

Test name was confusing.
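
Item 1 above (collapsing an all-N/A multi-valued metric into a single N/A) could be sketched roughly as follows. The helper name `collapse_na` is hypothetical and is not taken from the amd-smi CLI source.

```python
def collapse_na(values, na="N/A"):
    """Return a single N/A when every entry is N/A; otherwise return the list unchanged."""
    if values and all(v == na for v in values):
        return na
    return values


jpeg_activity = ["N/A"] * 32            # like the JPEG_ACTIVITY example above
mixed = [12, "N/A", 7]                  # partially populated metric

collapsed = collapse_na(jpeg_activity)  # single "N/A"
unchanged = collapse_na(mixed)          # list is left as-is
```

Lists with at least one real value are left untouched, so partially supported metrics still show which entries are missing.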

Change-Id: Ieb3b036f30002fd22362508eb9fc5d443df395ae
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Log cleanup

Change-Id: I1b1a95f1844d35bec7a7bd8cb996f87e4914c069
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Add amd-smi partition-metrics CLI + general cleanup

Change-Id: Ia91488e6cb3a4d62b4087afbddfe0b3bb9378fdc
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[1.3 metrics] Remove forwards compatibility for partition metrics

Change-Id: Iab928983e6f6f1587bc9307f6f3fa2b2696ca6f7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Fixed violation output not showing % + general cleanup

Change-Id: Icac1b0a55b18c7628b07109ae0c377d17e0825f1
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Clean up amdsmi_get_gpu_partition_metrics_info & amd-smi partition-metric outputs

Change-Id: I6427028b980874641e9ffb3b5d88ad493dbf9cf4
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Fix metrics not found + extra logging/formatting

Change-Id: I841a27bb2c305e97ec7579a13ac915e5be497c3a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Update license to current default

Change-Id: I0de9b8a2d5dbbeab4491097f0354ba17b0d30866
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Cleanup for review

Change-Id: I96ed25c3f2b8968eea1af24c5e5860c2b4e74e6e
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Modernize updated/new internal APIs.

Change-Id: I3c48a250eeb703709b14cb5ffa68268d8321626c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Remove extra logging in dynamic metrics

Change-Id: Idb97547bcbe143d6fa1cb5cb278ffe4da615ce14
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Remove amd-smi partition-metric command

Change-Id: Ib83c17e5cd7e0da3798198943bddd46c296b411c
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Move new CLI updates to another PR + minor fixes

Change-Id: I3b1163eec12f9b5f7d95ee33de08e168cec1b1fe
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Allow dynamic metrics to work for gpu/xcp metrics 1.9+/1.1+

Updated some logging as well.

Change-Id: I2ed9f5a5ef8afb1520508820ca6153525f0644b4
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Allow dyn gpu/xcp metric v1.9+/v1.1+

Added tests for quick check

Change-Id: I576d6f6582a55afb08e5ac57791ce95e2fa184a2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Update tests for larger subset of version checks

Change-Id: I3cdf4f8bb4fc6161f4c76566939f90545d0f362a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

* Fix XCP metrics in gpu/partition metric pre-v1.9/v1.1 (dynamic)

Change-Id: I4dabc1ed6bef6b86c8e7f92bf9cb5992f3966fe2
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

---------

Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/amdsmi commit: 01b4fe6614]
2025-10-20 14:43:40 -05:00
adapryor cda730140f [SWDEV-560778] Update gpu metrics factory to return a new pointer every time
[ROCm/amdsmi commit: a64e9b4ac4]
2025-10-15 11:00:44 -05:00
Pryor, Adam 5127c923b9 [SWDEV-559082] Add asic info cache (#756)
Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/amdsmi commit: cba4c871d3]
2025-10-08 21:48:08 -05:00
Pryor, Adam a93b9d473d [SWDEV-558895] Fix rsmi monitor fds (#748)
Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/amdsmi commit: 346e1516af]
2025-10-07 21:31:23 -05:00
Pryor, Adam d1679c7ade [SWDEV-558895] Fix rsmi_event_notification_get segfaulting (#738)
Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/amdsmi commit: ce016f0dcb]
2025-10-06 15:10:56 -05:00
Pryor, Adam 8bc7216a65 [SWDEV-525336] Use KFD to determine process start/stop (#723)
* Used KFD to determine the linking between GPUs and PIDs rather than depending on the per-PID, single-GPU BDF info that we were getting from fdinfo.

Signed-off-by: adapryor <Adam.pryor@amd.com>

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: c967aead58]
2025-10-02 10:57:08 -05:00
Pryor, Adam 94e6ba68b4 [SWDEV-547088] Dynamic GPU Metrics Implementation (#692)
* Added ability to format gpu_metrics v1_9
* New gpu_metrics format from the driver should allow amd-smi to parse with future compatibility guaranteed

---------

Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Signed-off-by: adapryor <Adam.pryor@amd.com>
Co-authored-by: Oliveira, Daniel <daniel.oliveira@amd.com>

[ROCm/amdsmi commit: 5ef0b3c34d]
2025-10-01 15:46:10 -05:00
Bindhiya Kanangot Balakrishnan f4b921c5f5 [SWDEV-556005 & SWDEV-556853] Initialize temp-type map
Added back the temp-type map initialization to
RSMI_TEMP_TYPE_INVALID before probing hwmon files. This
prevents std::out_of_range for unsupported or absent
temperature sensor types.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: 3e7e4ab1ac]
2025-09-25 12:03:35 -05:00
Stella Laurenzo 7412d14fed Fix delay loading of drm by soname.
[ROCm/amdsmi commit: 4d5d24d1c6]
2025-09-24 20:44:03 -05:00
Stella Laurenzo e16c125f20 Add rt dep back
[ROCm/amdsmi commit: 62e4329559]
2025-09-24 20:44:03 -05:00
Stella Laurenzo 060293c7fc [cmake] Fix dependencies.
* Use CMAKE_DL_LIBS instead of hard-coded `dl`.
* Use Threads::Threads instead of `pthread`.
* Drop `rt` dep.
* Find libdrm via pkgconfig (consistent with how other ROCm projects do it, as documented here: https://github.com/ROCm/TheRock/blob/main/docs/development/dependencies.md#libdrm)


[ROCm/amdsmi commit: 4e6731a817]
2025-09-24 20:44:03 -05:00
Maisam Arif 405f34e4d1 [SWDEV-554587] Added IFWI Version and boot_firmware API
- Changed amd-smi static --vbios to accept ifwi
- Change population logic for vbios version API
- Added IFWI boot_firmware to the CLI, C++, Rust, and Python API

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I4ea504d40a43cfb011ab38fc9a664ecf12d39c8a


[ROCm/amdsmi commit: cd21b5edcc]
2025-09-23 16:05:10 -05:00
Mario Limonciello e9fdf65aa2 Fix compilation with gcc 15
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>


[ROCm/amdsmi commit: 902667db3c]
2025-09-17 16:29:38 -05:00
Bindhiya Kanangot Balakrishnan 9d0ce8ba42 [SWDEV-414304] Reduce excessive hwmon operations
Previously, the function was iterating through all enum
values (0-250). This fix reduces the number of hwmon operations
by calling add_temp_sensor_entry only for temperature types
that fall within the defined enum ranges.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: 17ffe5a1bd]
2025-09-09 10:30:51 -05:00
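
A rough sketch of the idea in the commit above: probing only the defined enum ranges instead of every value from 0 to 250. The ranges and function names here are hypothetical; the real temperature-type enum and `add_temp_sensor_entry` live in the library's C++ code.

```python
# Hypothetical sparse enum layout: a few defined ranges inside 0-250.
DEFINED_RANGES = [range(0, 3), range(100, 104), range(200, 202)]


def probe_all(probe):
    # Old behaviour: one probe (hwmon operation) per value in 0-250.
    for temp_type in range(0, 251):
        probe(temp_type)


def probe_defined(probe):
    # New behaviour: probe only values that fall within a defined range.
    for defined in DEFINED_RANGES:
        for temp_type in defined:
            probe(temp_type)


calls_before, calls_after = [], []
probe_all(calls_before.append)
probe_defined(calls_after.append)
```

With these made-up ranges the number of probes drops from 251 to 9; the actual saving depends on how sparse the real enum is.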
Galantsev, Dmitrii 74efdc57a7 Clean up clang-tidy warnings and unused variables
Change-Id: I1365edf8926908b3a49652fb87f079f8fbf1f56b


[ROCm/amdsmi commit: aba1c792b4]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 0c7c849c42 Use nested namespace for amd::smi
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: eacec681dd]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 08eec3c675 Drop unused variables
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: a99e827d97]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) c9eddf75e7 Remove unnecessary includes
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: 924a06d1e1]
2025-09-05 17:44:17 -05:00
adapryor 671612471d [SWDEV-546543] Fix segfault in gpu_metrics
Signed-off-by: adapryor <Adam.pryor@amd.com>


[ROCm/amdsmi commit: e8fa06d223]
2025-08-22 15:23:57 -05:00
adapryor 17f9feb94e [SWDEV-546543] Fix segfault in gpu_metrics
Signed-off-by: adapryor <Adam.pryor@amd.com>


[ROCm/amdsmi commit: d25c01e802]
2025-08-22 15:23:57 -05:00
Pryor, Adam 8486ac80ba [SWDEV-540665] Move Virtualization checks in APIs into amd-smi APIs (#643)
* Remove vm checks in rocm-smi
* Move virtualization checks up the stack into amd-smi

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: f8afba0a5f]
2025-08-21 18:11:50 -05:00
Pryor, Adam 5e4a23dd01 [SWDEV-525336] Filter out amd-smi process itself from detection (#638)
* Filter out amd-smi from process detection
* Fixed N/A stripping N/ incorrectly from running elevated processes

Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/amdsmi commit: ad29de4238]
2025-08-21 11:41:03 -05:00
Pryor, Adam 96a28009fc [SWDEV-544620] Add kfd fallback for GPU Processes (#631)
Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/amdsmi commit: b62900c372]
2025-08-19 18:53:16 -05:00
Charis Poag 79ce271d1f Fix amd-smi sets attribute error & memory partition sets
* Changes:
- Fix for any set without CPU loaded (ex.):
sudo /opt/rocm/bin/amd-smi set -o 250
AttributeError: 'Namespace' object has no attribute 'core_boost_limit'

- Fix for recent changes to memory partition sets:
  a permission-denied error must be displayed as not supported.
  EACCES maps to *_STATUS_PERMISSION, but in this case the CLI needs to show
  NOT_SUPPORTED

Change-Id: Ie00bbb34d01adfe38300f1ac4c1620d78885b9b7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: e7964cda49]
2025-08-07 16:09:56 -05:00
Poag, Charis 07dfa789d0 [SWDEV-542223] Update Violation Status Changes to Design + Minor cleanup (#558)
Changes:
  - Update violation status logic and metric naming for XCP/XCC metrics (thrm/thm consistency)
  - Added XCP identifier in monitor to allow partition metrics to be shown with applicable APIs
    (Violation Status is the first example of this in monitor)
  - Improve CLI monitor output:
    support multiple GPU lines per GPU, add new columns, and better formatting
  - Refactor helpers and logger for flexible unit formatting and table rendering
  - Add examples for amdsmi_get_gpu_pm_metrics_info()/amdsmi_get_gpu_reg_table_info()
    new metrics APIs in C++ example
  - Sync Python/C++ interface and structures for new metrics fields and naming
  - Remove deprecated/unused RSMI activity APIs, documentation not needed since
    the APIs no longer exist in ROCm SMI either.
  - Cleanup metric violations + fix handle watch arguments
  - Provide better handling/doc for average_flattened_ints()
  - Group xcp metrics with brackets in human readable + adjust output size

Signed-off-by: Poag, Charis <Charis.Poag@amd.com>

[ROCm/amdsmi commit: e2e4fc65c1]
2025-08-06 16:03:06 -05:00
Pham, Gabriel 4565e23aca [SWDEV-542706] Corrected get_od_clk_volt_info (#604)
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: b916ceedb6]
2025-08-06 12:24:02 -05:00
Pham, Gabriel c8698c87ef [SWDEV-542706] Adjusted logic for reading pp_od_clk_voltage (#592)
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 95c11daa68]
2025-08-06 11:20:09 -05:00
Poag, Charis bf8bbd99c6 [SWDEV-518561] Separate Driver Reload from Memory Partition Sets (#582)
Description:
  - Added a new API `amdsmi_gpu_driver_reload()` to reload the AMD GPU driver independently.
  - Updated CLI (`sudo amd-smi reset -r`) and Python bindings to support driver reload functionality.
  - Removed automatic driver reload from `amdsmi_set_gpu_memory_partition()` and `amdsmi_set_gpu_memory_partition_mode()`.
  - Enhanced CLI and test cases to allow users to control when the driver reload occurs.
  - Updated documentation and changelog to reflect the new driver reload process.
  - Improved error handling and logging for driver reload operations.
  - Added progress bar and user confirmation prompts for driver reload commands.

* Update build/test strategy to only allow one test execution at a time
* Modify API verbiage + modify systemctl error output
  - Systemctl is typically not enabled on docker.
  - And is an edge case for gpu being active process/etc for display devices.
* Remove AMDSMI_STATUS_AMDGPU_RESTART_ERR from the return values
* Move driver reload to after we save original compute partitions

---------

Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/amdsmi commit: d24dc7ef89]
2025-08-05 20:44:28 -05:00
Liu, Shuzhou (Bill) 7ec0a1a7dd Query UBB/OAM temperature API (#581)
Add support to Query UBB/OAM temperature.
* Updated Python API with new temperature metrics enum

---------

Co-authored-by: Bill Liu <shuzhliu@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: abd3c02a3c]
2025-08-05 20:37:45 -05:00
Saeed, Oosman 7a6d75af7c [SWDEV-533349] CodeQL errors in amdsmi source code (#588)
Signed-off-by: Saeed, Oosman <Oosman.Saeed@amd.com>

[ROCm/amdsmi commit: 753a5ea326]
2025-08-05 20:17:21 -05:00
Pryor, Adam 32a1ef90cd Documentation updates for AMDSMI_GPU_METRICS_CACHE_MS (#564)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 2dc2e12a97]
2025-08-05 19:58:37 -05:00
Maisam Arif 45d8f954c8 gpu_metrics caching fix
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I6dacb0b81d6677c354ef3c86af4d7d5156a76d8b


[ROCm/amdsmi commit: fcf494bbc5]
2025-07-14 12:12:37 -05:00
Pryor, Adam 4303644f90 Add gpu metrics cache (#541)
* Add gpu metrics caching defaulted to 100ms
* AMDSMI_GPU_METRICS_CACHE_MS is used to set the caching rate limits

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 42096c1398]
2025-07-13 09:56:29 -05:00
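
The caching behaviour described in the commit above could look roughly like this time-window cache. Only the environment variable name `AMDSMI_GPU_METRICS_CACHE_MS` and the 100 ms default come from the commit message; the class, the parsing rules, and the treatment of `0` as "always re-read" are assumptions for illustration.

```python
import os
import time


def cache_window_ms(default_ms=100):
    # Variable name from the commit message; parsing rules are an assumption.
    raw = os.environ.get("AMDSMI_GPU_METRICS_CACHE_MS")
    if raw is None:
        return default_ms
    try:
        return max(0, int(raw))
    except ValueError:
        return default_ms


class MetricsCache:
    """Hypothetical wrapper that re-reads metrics at most once per window."""

    def __init__(self, reader, clock=time.monotonic):
        self._reader = reader    # callable that performs the expensive read
        self._clock = clock      # injectable clock, in seconds
        self._value = None
        self._stamp = None

    def get(self):
        now = self._clock()
        window_s = cache_window_ms() / 1000.0
        # Refresh on first use or once the cached value is older than the window.
        if self._stamp is None or now - self._stamp >= window_s:
            self._value = self._reader()
            self._stamp = now
        return self._value
```

Passing the clock in as a parameter keeps the sketch easy to exercise without sleeping; a caller would normally rely on the default `time.monotonic`.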
Poag, Charis 92f926b43b [SWDEV-533305] Remove partition info from amd-smi static (-p/--partition still available) + CLI API call cleanup (#529)
Updates:
- Separate extra API calls from amd-smi CLI to target specific CLI commands that need them.
- Remove extra current_compute_partition SYSFS calls from amd-smi static.
- Remove the partition information from the default `amd-smi static` CLI command.
- Users must now use the `-p` argument to view partition information with `amd-smi static`.
- The help text for the `partition` argument has been updated to reflect this change.
- The partition information can still be accessed using the `amd-smi partition -c -m` or `sudo amd-smi partition -a` commands.

---------

Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/amdsmi commit: 88473b7fd0]
2025-07-07 11:21:46 -05:00
Arif, Maisam 6123abe733 [SWDEV-538786] Fix ecc counts returning file error (#494)
Change-Id: I5cea584289df95e89b6151d549bf69e4c3e50d22

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 967e879861]
2025-06-19 15:24:03 -05:00
Castillo, Juan 4a55abaa05 [SWDEV-531904] - Added GPU Cache Read Tests (#464)
New:
- gpu_cache_read.h and gpu_cache_read.cc
- Test reads GPU cache info and asserts valid structure
Updated:
- integration_test.py
- Added test_gpu_cache_info() and asserts valid structure
- test_get_gpu_compute_partition() to loop through all devices when test fail/pass
Added:
- test_get_gpu_compute_partition_returns_string() to integration_test.py
- This test displays the current compute partition for each bdf

---------

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Signed-off-by: Castillo, Juan <Juan.Castillo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 470c62f887]
2025-06-19 15:23:34 -05:00
Galantsev, Dmitrii a480b2869d rsmi_init: Do not complain loudly when no driver is found (#74)
Co-authored-by: Samuel Thibault <samuel.thibault@ens-lyon.org>


[ROCm/amdsmi commit: ca52da194d]
2025-06-19 13:22:48 -05:00
Maisam Arif 6e37490e87 [SWDEV-529665] PLDM Bundle naming
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Id7f652ddc4e790027869683a4aaa3226ffc05c83


[ROCm/amdsmi commit: 6da33b8ded]
2025-06-12 02:19:37 -05:00