Commit graph

1945 Commits

Author SHA1 Message Date
Mario Limonciello (AMD) a8a89db945 Remove unnecessary typedef declarations
amd_smi_cper.h:32:1: warning: ‘typedef’ was ignored in this declaration

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: 3d0ea25af3]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) c9eddf75e7 Remove unnecessary includes
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: 924a06d1e1]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 5fe413710b Fix a typo
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: 05f79879c3]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 15e335ac3f Use nested namespace for amd::smi
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: faca0222f0]
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 3b411b6759 Fix a crash when running amd-smi version --cpu
When running on a system that doesn't support HSMP (such as an APU)
then the following is observed:
```
/usr/include/c++/15.1.1/bits/stl_vector.h:1263: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = void*; _Alloc = std::allocator<void*>; reference = void*&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
```

This is because no "CPUs" are detected on the SoC, which really means
no CPUs that support HSMP. Catch this case so that a clean return
can be passed up.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>


[ROCm/amdsmi commit: e5d9e1361e]
2025-09-03 00:49:48 -05:00
Maisam Arif d8c125f2b0 [SWDEV-553016] Added Copyright to scoped_fd.cc
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I2ea872e7c5c61a6e4b5c7e7114d016b8a1069b28


[ROCm/amdsmi commit: c876180875]
2025-09-02 15:02:47 -05:00
Maisam Arif db443c025c [SWDEV-540665] Change parser to not accept 0 as a power set input
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I510fa5523b8dd7ea33f49e21cc199d4a2cfcf9bb


[ROCm/amdsmi commit: 2c9f3af026]
2025-08-29 04:18:36 -05:00
gabrpham_amdeng 51c2ea4731 reverted help formatting column width to 80
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 39b26104d4]
2025-08-28 11:30:24 -05:00
Tim Huang c3f5771541 Regenerate Rust bindings against latest amdsmi.h header
- Regenerate Rust wrapper against latest amdsmi.h header
- Add libc dependency for proper C memory management
- Fix compilation errors caused by types removed from amdsmi.h
- Add FFI bindings regeneration documentation in README

This update ensures the Rust bindings are synchronized with the latest
C API and provides guidance for developers on regenerating
bindings.

Signed-off-by: Tim Huang <tim.huang@amd.com>


[ROCm/amdsmi commit: 51a44bc0c4]
2025-08-28 09:34:57 -05:00
Maisam Arif ed3e242202 [SWDEV-540665] Remove amdsmi_set_power_cap API Guest Restriction
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I682506b48c10eefbd04f9b494ad57fb8ae8842b0


[ROCm/amdsmi commit: 4ffa468613]
2025-08-27 20:18:43 -05:00
Arif, Maisam 433893c770 Revert "[SWDEV-536176] libdrm_amdgpu depdency change (#448)"
This reverts commit 4d33e79baa.


[ROCm/amdsmi commit: ed2300516f]
2025-08-27 20:11:17 -05:00
Oosman Saeed 190ed3953d [SWDEV-546239] Match amdsmi output with host output
[ROCm/amdsmi commit: 594d5ce8ee]
2025-08-27 18:41:59 -05:00
Maisam Arif 8d5335a8de [SWDEV-544299] Fix CLI prefix for amd-smi metric -G
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ic184ec824213421388356417e713d9ed5adeddeb


[ROCm/amdsmi commit: 978fad01d2]
2025-08-27 18:08:06 -05:00
Arif, Maisam 46a2ef944f [SWDEV-552378] Removed First enums in amdsmi_interface.py (#659)
- **Fixed gpuboard and baseboard temperatures enums in amdsmi Python Library**.  
  - AmdSmiTemperatureType had issues with referencing the right attribute, so we removed the following duplicate enums:
    - `AmdSmiTemperatureType.GPUBOARD_NODE_FIRST`
    - `AmdSmiTemperatureType.GPUBOARD_VR_FIRST`
    - `AmdSmiTemperatureType.BASEBOARD_FIRST`

Change-Id: Ia61446b593bd9182d597c4b4c2ac3c5ffdae7493
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 286c421a49]
2025-08-27 18:07:17 -05:00
Arif, Maisam 4d33e79baa [SWDEV-536176] libdrm_amdgpu depdency change (#448)
* Cmake fix updates
* Next fix will be addressing libdrm further

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Justin Williams <juwillia@amd.com>

[ROCm/amdsmi commit: 652761de54]
2025-08-27 09:32:51 -05:00
Pham, Gabriel 3ef5bfef94 Added gpuboard and baseboard temperatures to amd-smi metric (#617)
* Added gpu-board and base-board temperatures to amd-smi metric
* Updated Changelog and adjusted the metric base-board/gpu-board output
* Adjusted output of metric to hide base/gpu-board when not relevant

---------

Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: b13fc16d60]
2025-08-26 12:49:56 -05:00
adapryor 671612471d [SWDEV-546543] Fix segfault in gpu_metrics
Signed-off-by: adapryor <Adam.pryor@amd.com>


[ROCm/amdsmi commit: e8fa06d223]
2025-08-22 15:23:57 -05:00
adapryor 17f9feb94e [SWDEV-546543] Fix segfault in gpu_metrics
Signed-off-by: adapryor <Adam.pryor@amd.com>


[ROCm/amdsmi commit: d25c01e802]
2025-08-22 15:23:57 -05:00
Maisam Arif a68cd9612a [SWDEV-540665] Power cap on 1VF cli parsing fix
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I5aac8f820fd8ae1c6c1dbae3b5b9e69018c69452


[ROCm/amdsmi commit: e030f71229]
2025-08-22 15:22:44 -05:00
Oosman Saeed 588cf7d0c2 continue to process all entries
[ROCm/amdsmi commit: dee18e9fb4]
2025-08-21 23:37:24 -05:00
gabrpham_amdeng f55c41202e [SWDEV-549373] Added vbios and pldm information to version header and adjusted platform info display
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 71c8b92076]
2025-08-21 18:16:47 -05:00
Pryor, Adam 8486ac80ba [SWDEV-540665] Move Virtualization checks in APIs into amd-smi APIs (#643)
* Remove vm checks in rocm-smi
* Move virtualization checks up the stack into amd-smi

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: f8afba0a5f]
2025-08-21 18:11:50 -05:00
gabrpham_amdeng d12d268029 Added Version Header to all Help Sections
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 5aae1a31fa]
2025-08-21 17:17:16 -05:00
Pryor, Adam 7ede8b9f4a [SWDEV-540665] Fix power_caps in help text (#642)
Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/amdsmi commit: 4ac1c7e453]
2025-08-21 16:45:37 -05:00
Maisam Arif f732ee4e98 Fix spelling and incorrect error references
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I23e947a0cfd4f68067f9fca703574f44680163d4


[ROCm/amdsmi commit: 074c4b7a3f]
2025-08-21 12:36:43 -05:00
Pryor, Adam 5e4a23dd01 [SWDEV-525336] Filter out amd-smi process itself from detection (#638)
* Filter out amd-smi from process detection
* Fixed N/A stripping N/ incorrectly from running elevated processes
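
A minimal sketch of the self-filtering idea, assuming each detected process is represented as a dict with a `pid` key. The function name `drop_own_process` and the record layout are hypothetical and are not taken from the amd-smi sources:

```python
import os


def drop_own_process(processes, own_pid=None):
    """Return the process list without the entry for the calling tool itself.

    `processes` is assumed to be a list of dicts with a "pid" key
    (an illustrative layout, not the real amd-smi structure).
    """
    # Default to the PID of the current interpreter, i.e. the tool doing the query.
    own_pid = os.getpid() if own_pid is None else own_pid
    return [proc for proc in processes if proc["pid"] != own_pid]


detected = [
    {"pid": os.getpid(), "name": "amd-smi"},   # the monitoring tool itself
    {"pid": -1, "name": "some-gpu-workload"},  # placeholder PID for another process
]
visible = drop_own_process(detected)
```

Comparing PIDs rather than process names avoids hiding an unrelated process that happens to share the tool's name.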

Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/amdsmi commit: ad29de4238]
2025-08-21 11:41:03 -05:00
Oosman Saeed 7c83dac63d [SWDEV-547223] RAS HBM CRC Read CE failed due to AFID missing 24
cherry-pick aca-decode repo changeset: aca-decode repo: f9e5ad5 (HEAD -> main, origin/main, origin/HEAD) Fix bug in Corrected HBM Error being decoded as AFID 34 (#5)


[ROCm/amdsmi commit: ffca095246]
2025-08-21 11:00:30 -05:00
Saeed, Oosman 3779562abb [SWDEV-546239] amd-smi ras cper - no data created (#614)
* Update amd-smi doc with examples of CPER and AFID API usage.

---------

Signed-off-by: Oosman Saeed <oossaeed@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: fd5e37a07e]
2025-08-20 11:27:41 -05:00
Pham, Gabriel d32bae0e8f Updated Changelog for updated temperature metrics API (#616)
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: e6af86f44a]
2025-08-19 19:02:50 -05:00
AL Musaffar, Yazen 678972b8ec [SWDEV-549789] Removed incorrect CPER AFID references (#619)
* Fix for afid help
* Update amdsmi_parser.py

Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>

[ROCm/amdsmi commit: e84e364b35]
2025-08-19 18:55:33 -05:00
Pryor, Adam 96a28009fc [SWDEV-544620] Add kfd fallback for GPU Processes (#631)
Signed-off-by: adapryor <Adam.pryor@amd.com>

[ROCm/amdsmi commit: b62900c372]
2025-08-19 18:53:16 -05:00
Pham, Gabriel 729b7beddf [SWDEV-446394] Updated error message for setting clock limit (#633)
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: c0ea186d47]
2025-08-19 18:51:49 -05:00
Poag, Charis 35b4b0df38 [SWDEV-550355] Fix process + violation output when in partitions (#623)
Changes:
  - Fixes amd-smi monitor such as:
    amd-smi monitor -Vqt, amd-smi monitor -g 0 -Vqt -w 1
    amd-smi monitor -Vqt --file /tmp/test1, ...
  - Required moving around when process is called, since xcp
    information is gathered in right format expected by monitor
  - Requires process to be appended first with the gpu data -> xcp
    info to be gathered + added after 1st device

Change-Id: I76356a4610944f633a9530970fac66556d65bf11
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/amdsmi commit: 1b2edd70bd]
2025-08-19 18:50:51 -05:00
Charis Poag b239e5be60 [SWDEV-550679] Fix amd-smi monitor AttributeError
Impacts only Guest systems

Fixes following error:
$ amd-smi monitor
AttributeError: 'Namespace' object has no attribute 'violation'
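
One common way to guard against this class of error is to read optional arguments with a default. The sketch below uses only the standard `argparse.Namespace`; the helper name `violation_requested` and the `gpu` field are hypothetical, and this is not necessarily how the actual fix was written:

```python
import argparse


def violation_requested(args: argparse.Namespace) -> bool:
    """Return True only if the parsed arguments carry a truthy 'violation' flag."""
    # getattr with a default avoids AttributeError when the option was never
    # registered on the parser (assumed here to be the case on Guest systems).
    return bool(getattr(args, "violation", False))


guest_args = argparse.Namespace(gpu=0)                 # no 'violation' attribute
host_args = argparse.Namespace(gpu=0, violation=True)  # option registered and set
```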

Change-Id: If501819be3f8e2d2dfd75775dc776873a92465a3
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 5fe58a8e38]
2025-08-19 17:58:44 -05:00
Maisam Arif 2851f80253 Removed kfd_ioctl.h from rocm include install
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I7948eb050f79a8a0f71e0b8a8e4e08187ac0bb84


[ROCm/amdsmi commit: 6de6290dc1]
2025-08-19 17:18:14 -05:00
Galantsev, Dmitrii beaffa2c84 [SWDEV-545751] CMAKE - Enable fPIC (#629)
Change-Id: Iaade10e70b3a39d6bca23ae98f9f501339ffd76d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

[ROCm/amdsmi commit: cd33b75540]
2025-08-19 11:39:39 -05:00
Poag, Charis 61c817a2d3 [SWDEV-546220] Fix mVF xcd check within tests (#628)
Adding a check to see if we're in guest -> allowing equal XCD values.
This is because in mVF configurations, we may not be able to read the gfx clock values.

Change-Id: I8e5d9627e061e98ec854734a91624c8077644a2a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/amdsmi commit: e12d270693]
2025-08-19 11:13:18 -05:00
Bindhiya Kanangot Balakrishnan 8e645a6da7 [SWDEV-547160] Fix VRAM percentage calculation
The vram_percent calculation was missing
multiplication by 100.
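
As an illustration of the arithmetic, a minimal sketch assuming the percentage is derived from used and total byte counts. The function and parameter names are hypothetical:

```python
def vram_percent(used_bytes: int, total_bytes: int) -> float:
    """Return VRAM usage as a percentage in the range 0-100."""
    if total_bytes <= 0:
        # Avoid division by zero when the total is unknown or unreported.
        return 0.0
    # used / total alone is a fraction (0.0-1.0); multiply by 100 for a percentage.
    return used_bytes / total_bytes * 100


quarter_used = vram_percent(4 * 1024**3, 16 * 1024**3)
```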

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: 41488f0c18]
2025-08-18 17:28:30 -05:00
Arif, Maisam 4e568b2eea [SWDEV-540665] Add power_cap set to Linux Guest (#626)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I3c8d707681c141390b40521231e0d638c81cdeaf

[ROCm/amdsmi commit: 2d5accd000]
2025-08-18 14:59:14 -05:00
Charis Poag 7ab967ec69 Revert Major ABI break for amdsmi_get_violation_status()
Changes:
- This aligns back to original struct naming for ROCm 7.0. This removes
any Major ABI breakages for updates for 7.0 release.
- Minor ABI breakage is required since there were additions to the
header. Refer to changelog for these updates.

Change-Id: If35af74eac6beac8c267d05ce789b7761ed24bff
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: d3b73fac82]
2025-08-18 11:36:57 -05:00
Bill Liu 2ebf71976e [SWDEV-548260] Enable Support for Multiple init() and shutdown()
Implemented reference counting to manage init and shutdown processes,
allowing for multiple initializations and shutdowns.
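
The reference-counting pattern described above can be sketched as follows. This is an illustrative Python model, not the library's implementation; the class `RefCountedLibrary` and its attributes are hypothetical:

```python
import threading


class RefCountedLibrary:
    """Model of a library whose init()/shutdown() calls may be nested."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0
        self.initialized = False

    def init(self):
        with self._lock:
            if self._count == 0:
                # Real setup work would happen only on the first init().
                self.initialized = True
            self._count += 1

    def shutdown(self):
        with self._lock:
            if self._count == 0:
                # Unmatched shutdown(): nothing to release (an assumed policy).
                return
            self._count -= 1
            if self._count == 0:
                # Real teardown work would happen only on the last shutdown().
                self.initialized = False


lib = RefCountedLibrary()
lib.init()
lib.init()      # second caller: only the counter changes
lib.shutdown()  # first release: the library stays initialized
still_up = lib.initialized
lib.shutdown()  # last release: teardown happens here
```

The lock keeps the counter consistent if several threads initialize and shut down concurrently.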


[ROCm/amdsmi commit: c45a53d751]
2025-08-15 11:44:50 -05:00
Maisam Arif 029ca0f256 [SWDEV-549831] Fixed file outputs not printing
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I56b792256c30d618d59d2d40faf5fa0f1c2c4dc6


[ROCm/amdsmi commit: c8d0e5c497]
2025-08-14 11:08:49 -05:00
Charis Poag 3eb536e34c [SWDEV-548755] Driver reload temporary fix for CQE
Temporary solution until CQE can update how their containers are run.

This is because the driver reload requires:
1) Containers must run serially
   (i.e. no parallel containers running at the same time)
2) Containers must run with extra parameters:
   `--cap-add=SYS_ADMIN -v /lib/modules:/lib/modules`

Change-Id: If6364c9e82da8404b73ac6a9688833f4d18693b0
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 425b05cb18]
2025-08-11 13:06:57 -05:00
Galantsev, Dmitrii 8b96ee5271 Bump version to 26.1
Change-Id: I1b6ab552c9be965524ad49a866374a0d21b9ceb3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: e7d6590bbc]
2025-08-08 08:12:10 -05:00
josnarlo 4fe4c4df23 Fix getting version information
Change-Id: I2695733307888f5ab41a1265ae4369a2ea011e09


[ROCm/amdsmi commit: 925014ddaf]
2025-08-08 08:12:10 -05:00
Bindhiya Kanangot Balakrishnan 82a2c0dffc [SWDEV-543308] Fix xgmi_metrics_info initialization in xgmi
The xgmi_metrics_info variable was being referenced before
assignment when no destination GPUs were found or when the API
call failed. This caused an UnboundLocalError. Fixed this by
initializing xgmi_metrics_info with empty links structure.
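
The failure mode and the fix can be illustrated with a small sketch. The function `collect_xgmi_metrics`, the `query` callback, and the exact shape of the empty structure are assumptions made for illustration:

```python
def collect_xgmi_metrics(dest_gpus, query):
    """Gather per-link metrics, returning an empty structure when none are found."""
    # Bind the name up front so it exists on every path, including
    # "no destination GPUs" and "every API call failed".
    xgmi_metrics_info = {"links": []}
    for gpu in dest_gpus:
        try:
            xgmi_metrics_info["links"].append(query(gpu))
        except RuntimeError:
            # A failed query is skipped instead of leaving the result unbound.
            continue
    return xgmi_metrics_info


def failing_query(gpu):
    raise RuntimeError("simulated API failure")


no_gpus = collect_xgmi_metrics([], failing_query)
all_failed = collect_xgmi_metrics([0, 1], failing_query)
```

Without the initial assignment, a version that only assigned `xgmi_metrics_info` inside the loop would raise `UnboundLocalError` at the `return` in both of these cases.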

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: f0453c2c75]
2025-08-07 16:19:10 -05:00
Charis Poag 79ce271d1f Fix amd-smi sets attribute error & memory partition sets
* Changes:
- Fix for any set without CPU loaded (ex.):
sudo /opt/rocm/bin/amd-smi set -o 250
AttributeError: 'Namespace' object has no attribute 'core_boost_limit'

- Fix for recent changes to memory partition sets
  Needed to account for permission denied -> to display not supported.
  EACCES == *_STATUS_PERMISSION, but in this case we need to show
  NOT_SUPPORTED
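
A rough sketch of that display rule, assuming the set operation surfaces a plain errno value. The function name and the returned strings are hypothetical and do not reflect the real status enums:

```python
import errno


def memory_partition_set_message(err: int) -> str:
    """Map an errno from a memory partition set to the text shown to the user."""
    if err == 0:
        return "OK"
    # For this operation a permission-denied result is shown as "not supported"
    # rather than as a permission problem.
    if err in (errno.EACCES, errno.EOPNOTSUPP):
        return "NOT_SUPPORTED"
    # Any other failure is reported with its symbolic errno name when known.
    return f"ERROR ({errno.errorcode.get(err, err)})"


shown = memory_partition_set_message(errno.EACCES)
```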

Change-Id: Ie00bbb34d01adfe38300f1ac4c1620d78885b9b7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: e7964cda49]
2025-08-07 16:09:56 -05:00
Justin Williams 05ee73d6de CI - Updated Runners & Max Parallels
Signed-off-by: Justin Williams <juwillia@amd.com>


[ROCm/amdsmi commit: d0321875d9]
2025-08-07 12:07:19 -05:00
Poag, Charis 07dfa789d0 [SWDEV-542223] Update Violation Status Changes to Design + Minor cleanup (#558)
Changes:
  - Update violation status logic and metric naming for XCP/XCC metrics (thrm/thm consistency)
  - Added XCP identifier in monitor to allow partition metrics to be shown with applicable APIs
    (Violation Status is the first example of this in monitor)
  - Improve CLI monitor output:
    support multiple GPU lines per GPU, add new columns, and better formatting
  - Refactor helpers and logger for flexible unit formatting and table rendering
  - Add examples for amdsmi_get_gpu_pm_metrics_info()/amdsmi_get_gpu_reg_table_info()
    new metrics APIs in C++ example
  - Sync Python/C++ interface and structures for new metrics fields and naming
  - Remove deprecated/unused RSMI activity APIs, documentation not needed since
    the APIs no longer exist in ROCm SMI either.
  - Cleanup metric violations + fix handle watch arguments
  - Provide better handling/doc for average_flattened_ints()
  - Group xcp metrics with brackets in human readable + adjust output size

Signed-off-by: Poag, Charis <Charis.Poag@amd.com>

[ROCm/amdsmi commit: e2e4fc65c1]
2025-08-06 16:03:06 -05:00
62d92968791937c6480e7d49e40bec15_amdeng 3437d5b5da [SWDEV-539532] Enabled and updated set CPU APIs from CLI (#513)
* Enabled and updated set CPU APIs from CLI
* Fix sets not working consistently across devices + string/int comparison

Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Co-authored-by: Deepak Mewar <deepak.mewar@amd.com>

[ROCm/amdsmi commit: 1dedeac4e3]
2025-08-06 12:52:35 -05:00