1945 Commits

Author SHA1 Message Date
Mario Limonciello (AMD) 3d0ea25af3 Remove unnecessary typedef declarations
amd_smi_cper.h:32:1: warning: ‘typedef’ was ignored in this declaration

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 924a06d1e1 Remove unnecessary includes
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) 05f79879c3 Fix a typo
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) faca0222f0 Use nested namespace for amd::smi
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-09-05 17:44:17 -05:00
Mario Limonciello (AMD) e5d9e1361e Fix a crash when running amd-smi version --cpu
When running on a system that doesn't support HSMP (such as an APU),
the following is observed:
```
/usr/include/c++/15.1.1/bits/stl_vector.h:1263: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = void*; _Alloc = std::allocator<void*>; reference = void*&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
```

This is because no "CPUs" are detected on the SoC, which really means
no CPUs that support HSMP. Catch this case so that a clean return
can be passed up.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-09-03 00:49:48 -05:00
Maisam Arif c876180875 [SWDEV-553016] Added Copyright to scoped_fd.cc
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I2ea872e7c5c61a6e4b5c7e7114d016b8a1069b28
2025-09-02 15:02:47 -05:00
Maisam Arif 2c9f3af026 [SWDEV-540665] Change parser to not accept 0 as a power set input
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I510fa5523b8dd7ea33f49e21cc199d4a2cfcf9bb
2025-08-29 04:18:36 -05:00
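The commit title only states that 0 is no longer accepted as a power set input. One common way to enforce that in a Python argparse-based CLI is a custom `type=` callable that rejects non-positive values at parse time. This is a minimal sketch under that assumption; `nonzero_power_cap` and the `--power-cap` flag are hypothetical names, not the actual amd-smi parser code.

```python
import argparse


def nonzero_power_cap(value: str) -> int:
    """Hypothetical argparse type: accept only positive integer power caps."""
    try:
        watts = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid power cap: {value!r}") from None
    if watts <= 0:
        # 0 (and negative values) are rejected while parsing instead of being sent to the driver
        raise argparse.ArgumentTypeError("power cap must be greater than 0")
    return watts


# Hypothetical flag name used only for this sketch
parser = argparse.ArgumentParser(prog="power-cap-sketch")
parser.add_argument("--power-cap", type=nonzero_power_cap)

args = parser.parse_args(["--power-cap", "250"])
print(args.power_cap)  # 250
```

With this type function, `parser.parse_args(["--power-cap", "0"])` stops with a usage error instead of returning a namespace.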
gabrpham_amdeng 39b26104d4 reverted help formatting column width to 80
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-08-28 11:30:24 -05:00
Tim Huang 51a44bc0c4 Regenerate Rust bindings against latest amdsmi.h header
- Regenerate Rust wrapper against latest amdsmi.h header
- Add libc dependency for proper C memory management
- Fix compilation errors caused by types removed from amdsmi.h
- Add FFI bindings regeneration documentation in README

This update ensures the Rust bindings are synchronized with the latest
C API and provides guidance for developers on regenerating
bindings.

Signed-off-by: Tim Huang <tim.huang@amd.com>
2025-08-28 09:34:57 -05:00
Maisam Arif 4ffa468613 [SWDEV-540665] Remove amdsmi_set_power_cap API Guest Restriction
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I682506b48c10eefbd04f9b494ad57fb8ae8842b0
2025-08-27 20:18:43 -05:00
Arif, Maisam ed2300516f Revert "[SWDEV-536176] libdrm_amdgpu dependency change (#448)"
This reverts commit 652761de54.
2025-08-27 20:11:17 -05:00
Oosman Saeed 594d5ce8ee [SWDEV-546239] Match amdsmi output with host output
2025-08-27 18:41:59 -05:00
Maisam Arif 978fad01d2 [SWDEV-544299] Fix CLI prefix for amd-smi metric -G
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ic184ec824213421388356417e713d9ed5adeddeb
2025-08-27 18:08:06 -05:00
Arif, Maisam 286c421a49 [SWDEV-552378] Removed First enums in amdsmi_interface.py (#659)
- **Fixed gpuboard and baseboard temperature enums in the amdsmi Python library**.
  - AmdSmiTemperatureType had issues with referencing the right attribute, so we removed the following duplicate enums:
    - `AmdSmiTemperatureType.GPUBOARD_NODE_FIRST`
    - `AmdSmiTemperatureType.GPUBOARD_VR_FIRST`
    - `AmdSmiTemperatureType.BASEBOARD_FIRST`

Change-Id: Ia61446b593bd9182d597c4b4c2ac3c5ffdae7493
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-08-27 18:07:17 -05:00
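The commit says only that the `*_FIRST` enums caused trouble "referencing the right attribute". One plausible mechanism, assumed here and not confirmed by the log, is Python `Enum` aliasing: a member whose value duplicates an earlier member becomes an alias, so lookups by value and `.name` report whichever name was defined first. The class and values below are hypothetical, not the real `AmdSmiTemperatureType` numbering.

```python
from enum import IntEnum


class TemperatureTypeSketch(IntEnum):
    """Hypothetical values; NOT the real AmdSmiTemperatureType numbering."""
    GPUBOARD_NODE_FIRST = 100
    GPUBOARD_NODE_0 = 100  # same value, so this becomes an alias of GPUBOARD_NODE_FIRST
    GPUBOARD_NODE_1 = 101


# The alias resolves to the first member defined with that value
print(TemperatureTypeSketch.GPUBOARD_NODE_0.name)  # GPUBOARD_NODE_FIRST
# Iteration skips aliases entirely
print([m.name for m in TemperatureTypeSketch])  # ['GPUBOARD_NODE_FIRST', 'GPUBOARD_NODE_1']
```

Removing the duplicate `*_FIRST` entries (or decorating the class with `@enum.unique` to forbid them) makes each value map back to a single, intended name.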
Arif, Maisam 652761de54 [SWDEV-536176] libdrm_amdgpu dependency change (#448)
* CMake fix updates
* Next fix will address libdrm further

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Justin Williams <juwillia@amd.com>
2025-08-27 09:32:51 -05:00
Pham, Gabriel b13fc16d60 Added gpuboard and baseboard temperatures to amd-smi metric (#617)
* Added gpu-board and base-board temperatures to amd-smi metric
* Updated Changelog and adjusted the metric base-board/gpu-board output
* Adjusted output of metric to hide base/gpu-board when not relevant

---------

Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-08-26 12:49:56 -05:00
adapryor e8fa06d223 [SWDEV-546543] Fix segfault in gpu_metrics
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-08-22 15:23:57 -05:00
adapryor d25c01e802 [SWDEV-546543] Fix segfault in gpu_metrics
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-08-22 15:23:57 -05:00
Maisam Arif e030f71229 [SWDEV-540665] Power cap on 1VF cli parsing fix
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I5aac8f820fd8ae1c6c1dbae3b5b9e69018c69452
2025-08-22 15:22:44 -05:00
Oosman Saeed dee18e9fb4 continue to process all entries
2025-08-21 23:37:24 -05:00
gabrpham_amdeng 71c8b92076 [SWDEV-549373] Added vbios and pldm information to version header and adjusted platform info display
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-08-21 18:16:47 -05:00
Pryor, Adam f8afba0a5f [SWDEV-540665] Move Virtualization checks in APIs into amd-smi APIs (#643)
* Remove vm checks in rocm-smi
* Move virtualization checks up the stack into amd-smi

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-08-21 18:11:50 -05:00
gabrpham_amdeng 5aae1a31fa Added Version Header to all Help Sections
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-08-21 17:17:16 -05:00
Pryor, Adam 4ac1c7e453 [SWDEV-540665] Fix power_caps in help text (#642)
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-08-21 16:45:37 -05:00
Maisam Arif 074c4b7a3f Fix spelling and incorrect error references
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I23e947a0cfd4f68067f9fca703574f44680163d4
2025-08-21 12:36:43 -05:00
Pryor, Adam ad29de4238 [SWDEV-525336] Filter out amd-smi process itself from detection (#638)
* Filter out amd-smi from process detection
* Fixed N/A stripping N/ incorrectly from running elevated processes

Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-08-21 11:41:03 -05:00
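The commit does not say how amd-smi identifies itself among GPU processes. A minimal sketch, assuming the filter is by PID and that process records are dicts with a `"pid"` key (both assumptions, as is the helper name `filter_self`):

```python
import os
from typing import Dict, List, Optional


def filter_self(processes: List[Dict[str, object]],
                self_pid: Optional[int] = None) -> List[Dict[str, object]]:
    """Drop the calling process (e.g. the CLI itself) from a process list.

    The record shape {"pid": ..., "name": ...} is hypothetical.
    """
    if self_pid is None:
        self_pid = os.getpid()  # PID of the running interpreter
    return [p for p in processes if p.get("pid") != self_pid]


procs = [{"pid": 1000, "name": "amd-smi"}, {"pid": 4242, "name": "my_app"}]
print(filter_self(procs, self_pid=1000))  # [{'pid': 4242, 'name': 'my_app'}]
```

Filtering by PID rather than by process name avoids hiding unrelated processes that happen to share the name.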
Oosman Saeed ffca095246 [SWDEV-547223] RAS HBM CRC Read CE failed due to AFID missing 24
Cherry-pick of aca-decode repo changeset f9e5ad5: Fix bug in Corrected HBM Error being decoded as AFID 34 (#5)
2025-08-21 11:00:30 -05:00
Saeed, Oosman fd5e37a07e [SWDEV-546239] amd-smi ras cper - no data created (#614)
* Update amd-smi doc with examples of CPER and AFID API usage.

---------

Signed-off-by: Oosman Saeed <oossaeed@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-08-20 11:27:41 -05:00
Pham, Gabriel e6af86f44a Updated Changelog for updated temperature metrics API (#616)
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-08-19 19:02:50 -05:00
AL Musaffar, Yazen e84e364b35 [SWDEV-549789] Removed incorrect CPER AFID references (#619)
* Fix for afid help
* Update amdsmi_parser.py

Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
2025-08-19 18:55:33 -05:00
Pryor, Adam b62900c372 [SWDEV-544620] Add kfd fallback for GPU Processes (#631)
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-08-19 18:53:16 -05:00
Pham, Gabriel c0ea186d47 [SWDEV-446394] Updated error message for setting clock limit (#633)
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-08-19 18:51:49 -05:00
Poag, Charis 1b2edd70bd [SWDEV-550355] Fix process + violation output when in partitions (#623)
Changes:
  - Fixes amd-smi monitor invocations such as:
    amd-smi monitor -Vqt, amd-smi monitor -g 0 -Vqt -w 1,
    amd-smi monitor -Vqt --file /tmp/test1, ...
  - Required moving where process is called, since XCP
    information must be gathered in the format expected by monitor
  - Process data must be appended first with the GPU data; XCP
    info is then gathered and added after the first device

Change-Id: I76356a4610944f633a9530970fac66556d65bf11
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-08-19 18:50:51 -05:00
Charis Poag 5fe58a8e38 [SWDEV-550679] Fix amd-smi monitor AttributeError
Impacts only Guest systems

Fixes the following error:
$ amd-smi monitor
AttributeError: 'Namespace' object has no attribute 'violation'

Change-Id: If501819be3f8e2d2dfd75775dc776873a92465a3
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-08-19 17:58:44 -05:00
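The log shows the error but not the fix. A common pattern for options that are registered only on some platforms (here, apparently not on guests) is to read them with `getattr` and a default instead of plain attribute access. This is a sketch under that assumption; `wants_violation` is a hypothetical helper.

```python
import argparse


def wants_violation(args: argparse.Namespace) -> bool:
    # If the 'violation' option was never registered (e.g. on a guest), args.violation
    # would raise AttributeError; getattr with a default returns False instead.
    return bool(getattr(args, "violation", False))


guest_args = argparse.Namespace(watch=1)                   # no 'violation' attribute
host_args = argparse.Namespace(watch=1, violation=True)
print(wants_violation(guest_args))  # False
print(wants_violation(host_args))   # True
```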
Maisam Arif 6de6290dc1 Removed kfd_ioctl.h from rocm include install
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I7948eb050f79a8a0f71e0b8a8e4e08187ac0bb84
2025-08-19 17:18:14 -05:00
Galantsev, Dmitrii cd33b75540 [SWDEV-545751] CMAKE - Enable fPIC (#629)
Change-Id: Iaade10e70b3a39d6bca23ae98f9f501339ffd76d
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-08-19 11:39:39 -05:00
Poag, Charis e12d270693 [SWDEV-546220] Fix mVF xcd check within tests (#628)
Adds a check for guest mode that allows equal XCD values, because in
mVF configurations we may not be able to read the gfx clock values.

Change-Id: I8e5d9627e061e98ec854734a91624c8077644a2a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-08-19 11:13:18 -05:00
Bindhiya Kanangot Balakrishnan 41488f0c18 [SWDEV-547160] Fix VRAM percentage calculation
The vram_percent calculation was missing
the multiplication by 100.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-08-18 17:28:30 -05:00
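Worked through with numbers, the missing factor means 8 GiB used of 64 GiB reports 0.125 instead of 12.5. A minimal sketch of the corrected formula; the function name, the zero-total guard, and rounding to two decimals are assumptions for illustration, not the actual amd-smi code.

```python
def vram_percent(used_bytes: int, total_bytes: int) -> float:
    """Percentage of VRAM in use (0-100); returns 0.0 if the total is unknown or zero."""
    if total_bytes <= 0:
        return 0.0
    # Without "* 100" this would be a fraction (0.125), not a percentage (12.5)
    return round(used_bytes / total_bytes * 100, 2)


print(vram_percent(8 * 1024**3, 64 * 1024**3))  # 12.5
```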
Arif, Maisam 2d5accd000 [SWDEV-540665] Add power_cap set to Linux Guest (#626)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I3c8d707681c141390b40521231e0d638c81cdeaf
2025-08-18 14:59:14 -05:00
Charis Poag d3b73fac82 Revert Major ABI break for amdsmi_get_violation_status()
Changes:
- Restores the original struct naming for ROCm 7.0, removing any major
ABI breakage for the 7.0 release.
- A minor ABI break is still required because of additions to the
header. Refer to the changelog for these updates.

Change-Id: If35af74eac6beac8c267d05ce789b7761ed24bff
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-08-18 11:36:57 -05:00
Bill Liu c45a53d751 [SWDEV-548260] Enable Support for Multiple init() and shutdown()
Implemented reference counting to manage init and shutdown processes,
allowing for multiple initializations and shutdowns.
2025-08-15 11:44:50 -05:00
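Reference counting here means that only the first `init()` performs real setup and only the matching last `shutdown()` performs real teardown. The actual change lives in the library; this is a Python sketch of the idea, with a hypothetical class name and counters added only to make the behavior observable. Raising on an unbalanced `shutdown()` is a design choice of this sketch; the real API may report a status code instead.

```python
import threading


class RefCountedLibrary:
    """Hypothetical sketch: only the first init() and the last shutdown() do real work."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # init/shutdown may be called from several threads
        self._refs = 0
        self.setup_calls = 0     # how many times real setup ran
        self.teardown_calls = 0  # how many times real teardown ran

    def init(self) -> None:
        with self._lock:
            if self._refs == 0:
                self.setup_calls += 1  # real initialization would happen here
            self._refs += 1

    def shutdown(self) -> None:
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("shutdown() called without a matching init()")
            self._refs -= 1
            if self._refs == 0:
                self.teardown_calls += 1  # real teardown would happen here


lib = RefCountedLibrary()
lib.init()
lib.init()
lib.shutdown()
print(lib.setup_calls, lib.teardown_calls)  # 1 0
lib.shutdown()
print(lib.setup_calls, lib.teardown_calls)  # 1 1
```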
Maisam Arif c8d0e5c497 [SWDEV-549831] Fixed file outputs not printing
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I56b792256c30d618d59d2d40faf5fa0f1c2c4dc6
2025-08-14 11:08:49 -05:00
Charis Poag 425b05cb18 [SWDEV-548755] Driver reload temporary fix for CQE
Temporary solution until CQE can update how their containers are run.

This is because the driver reload requires:
1) Containers must run serially
   (i.e. no parallel containers running at the same time)
2) Containers must run with extra parameters:
   `--cap-add=SYS_ADMIN -v /lib/modules:/lib/modules`

Change-Id: If6364c9e82da8404b73ac6a9688833f4d18693b0
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-08-11 13:06:57 -05:00
Galantsev, Dmitrii e7d6590bbc Bump version to 26.1
Change-Id: I1b6ab552c9be965524ad49a866374a0d21b9ceb3
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-08-08 08:12:10 -05:00
josnarlo 925014ddaf Fix getting version information
Change-Id: I2695733307888f5ab41a1265ae4369a2ea011e09
2025-08-08 08:12:10 -05:00
Bindhiya Kanangot Balakrishnan f0453c2c75 [SWDEV-543308] Fix xgmi_metrics_info initialization in xgmi
The xgmi_metrics_info variable was being referenced before
assignment when no destination GPUs were found or when the API
call failed. This caused an UnboundLocalError. Fixed this by
initializing xgmi_metrics_info with empty links structure.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-08-07 16:19:10 -05:00
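The fix described above amounts to binding the variable before any branch that might skip the assignment. A minimal sketch of that pattern; `get_xgmi_metrics` and the `query` callable are hypothetical, and `RuntimeError` stands in for whatever exception the real API call raises.

```python
from typing import Callable, Dict, List


def get_xgmi_metrics(dest_gpus: List[int],
                     query: Callable[[List[int]], Dict[str, list]]) -> Dict[str, list]:
    # Bind the result up front so every path below returns a defined value
    xgmi_metrics_info: Dict[str, list] = {"links": []}
    if not dest_gpus:
        return xgmi_metrics_info  # no destination GPUs: empty links structure
    try:
        xgmi_metrics_info = query(dest_gpus)
    except RuntimeError:
        pass  # API call failed: keep the empty links structure
    return xgmi_metrics_info


def failing_query(gpus: List[int]) -> Dict[str, list]:
    raise RuntimeError("API call failed")


print(get_xgmi_metrics([], failing_query))      # {'links': []}
print(get_xgmi_metrics([1, 2], failing_query))  # {'links': []}
```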
Charis Poag e7964cda49 Fix amd-smi sets attribute error & memory partition sets
* Changes:
- Fix for any set without CPU loaded, for example:
sudo /opt/rocm/bin/amd-smi set -o 250
AttributeError: 'Namespace' object has no attribute 'core_boost_limit'

- Fix for recent changes to memory partition sets
  Permission denied must be displayed as not supported:
  EACCES == *_STATUS_PERMISSION, but in this case we need to show
  NOT_SUPPORTED

Change-Id: Ie00bbb34d01adfe38300f1ac4c1620d78885b9b7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-08-07 16:09:56 -05:00
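The memory partition change overrides the usual errno mapping for one operation: `EACCES` normally becomes a permission status, but for memory partition sets it should surface as not supported. A sketch of that override, assuming string status names as stand-ins for the library's `*_STATUS_*` values (the names and the helper are hypothetical):

```python
import errno

# Hypothetical stand-ins for the library's *_STATUS_* values
STATUS_SUCCESS = "SUCCESS"
STATUS_PERMISSION = "PERMISSION"
STATUS_NOT_SUPPORTED = "NOT_SUPPORTED"
STATUS_UNKNOWN = "UNKNOWN"

# Generic mapping used by most operations
GENERIC_ERRNO_MAP = {
    0: STATUS_SUCCESS,
    errno.EACCES: STATUS_PERMISSION,
}


def memory_partition_set_status(err: int) -> str:
    """Map an errno from a memory partition set to a user-facing status."""
    if err == errno.EACCES:
        # Operation-specific override: report "not supported" instead of "permission"
        return STATUS_NOT_SUPPORTED
    return GENERIC_ERRNO_MAP.get(err, STATUS_UNKNOWN)


print(memory_partition_set_status(errno.EACCES))  # NOT_SUPPORTED
print(memory_partition_set_status(0))             # SUCCESS
```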
Justin Williams d0321875d9 CI - Updated Runners & Max Parallels
Signed-off-by: Justin Williams <juwillia@amd.com>
2025-08-07 12:07:19 -05:00
Poag, Charis e2e4fc65c1 [SWDEV-542223] Update Violation Status Changes to Design + Minor cleanup (#558)
Changes:
  - Update violation status logic and metric naming for XCP/XCC metrics (thrm/thm consistency)
  - Added XCP identifier in monitor to allow partition metrics to be shown with applicable APIs
    (Violation Status is the first example of this in monitor)
  - Improve CLI monitor output:
    support multiple GPU lines per GPU, add new columns, and better formatting
  - Refactor helpers and logger for flexible unit formatting and table rendering
  - Add examples for amdsmi_get_gpu_pm_metrics_info()/amdsmi_get_gpu_reg_table_info()
    new metrics APIs in C++ example
  - Sync Python/C++ interface and structures for new metrics fields and naming
  - Remove deprecated/unused RSMI activity APIs; no documentation is needed since
    the APIs no longer exist in ROCm SMI either.
  - Cleanup metric violations + fix handle watch arguments
  - Provide better handling/doc for average_flattened_ints()
  - Group xcp metrics with brackets in human readable + adjust output size

Signed-off-by: Poag, Charis <Charis.Poag@amd.com>
2025-08-06 16:03:06 -05:00
62d92968791937c6480e7d49e40bec15_amdeng 1dedeac4e3 [SWDEV-539532] Enabled and updated set CPU APIs from CLI (#513)
* Enabled and updated set CPU APIs from CLI
* Fix sets not working consistently across devices + string/int comparison

Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Co-authored-by: Deepak Mewar <deepak.mewar@amd.com>
2025-08-06 12:52:35 -05:00