830 Commits

Author SHA1 Message Date
Loganaden Velvindron bf36e5f620 Fix disabled fortify source security flag (#2570)
Fix spurious character that caused CI issue.
2026-01-28 22:30:24 -06:00
Bindhiya Kanangot Balakrishnan aa16cca39a [SWDEV-549108] Increase gpu_metrics API execution test threshold (#2617)
Increased threshold from 2100 µs to 3100 µs to accommodate
gpu_metric read time variation across Navi systems.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2026-01-15 11:20:17 -06:00
Matthias Gehre 1883f736ad Fix double-free crash when librocm_smi64.so and libamd_smi.so are loaded together (#2531)
Problem:
When TheRock-based PyTorch package is installed along with amdsmi, importing
torch causes a double-free crash on exit (GitHub issue ROCm/TheRock#2269).

Root cause:
Both librocm_smi64.so and libamd_smi.so export the C++ static member
'amd::smi::Device::devInfoTypesStrings'. When libraries are loaded with
RTLD_GLOBAL, the dynamic linker resolves libamd_smi.so's reference to this
symbol to the one in librocm_smi64.so. This causes:
1. librocm_smi64.so registers its destructor for devInfoTypesStrings
2. libamd_smi.so also registers a destructor, but for the SAME address
3. On exit, both destructors run on the same object -> double-free

Fix:
Change devInfoTypesStrings from a class static member to a file-local static
variable. This ensures the symbol has internal linkage and is not exported,
preventing the symbol collision.

Changes:
- rocm_smi_device.h: Remove static member declaration
- rocm_smi_device.cc: Change from 'Device::devInfoTypesStrings' to file-local
  'static const std::map<...> devInfoTypesStrings'
- rocm_smi.cc: Remove the global alias to the (now removed) class member

Tested on gfx1151. `import torch` crashed on exit before the fix, and doesn't crash after the fix.
2026-01-15 08:43:47 -08:00
Bindhiya Kanangot Balakrishnan 8326c33d33 [SWDEV-573540] Add DRM-based wake for suspended AMD GPUs (#2510)
Implements automatic device wake using the getDRMDeviceId() DRM call when GPUs
are detected in low-power state. This ensures rocm-smi can access device
information on suspended GPUs.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2026-01-08 10:19:45 -06:00
Yazen AL Musaffar cb372748f8 [ROCM-SMI] [SWDEV-569731] rsmi tests failing on Frequency/Power/GpuMetrics ReadOnly Fix (#2303)
* Updated unsupported metric version file for rocm_smi_tests Frequency/Power/GpuMetrics ReadOnly tests

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>
2026-01-06 16:46:38 -06:00
arvindcheru 21afa807a9 Enable Lintian Support for ROCM-SMI, ROCMINFO (#1650)
* Enable Lintian Support for ROCM-SMI

* Enable Lintian Support for ROCMINFO

* Updated Lintian Override File Processing

* Update UT Fix for Lintian rocmsmi,rocminfo

* Update UT Fixes, Review Comments

* Update Review Comments - removed extra white spaces, added error check for gzip, date commands

* Update Review Comments - Correcting License Type

* Sync Lintian ChangeLog

* Changelog data sync enhanced

* Update Review Comments, UT fix

* white space cleanup - precommit check
2025-12-15 14:35:28 -06:00
Mario Limonciello bfb13f2b43 Run pre-commit's whitespace related hooks on projects/rocm-smi-lib (#2117)
* Run pre-commit's whitespace related hooks on projects/rocm-smi-lib

In order for pre-commit to be useful, everything needs to meet a common
baseline.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Added Changelog Spaces for formatting

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-12-11 15:41:24 -06:00
Yazen AL Musaffar c9d6a8720c [SWDEV-548312] Fix for rsmitstReadWrite.TestPciReadWrite failure in rsmi-tests on MI200. (#1834)
* Fix for rsmitstReadWrite.TestPciReadWrite failure in rsmi-tests

Signed-off-by: yalmusaf_amdeng <yalmusaf@amd.com>

* Resolved comments

Signed-off-by: yalmusaf_amdeng <yalmusaf@amd.com>

---------

Signed-off-by: yalmusaf_amdeng <yalmusaf@amd.com>
Co-authored-by: yalmusaf_amdeng <yalmusaf@amd.com>
2025-12-03 15:21:36 -06:00
Bindhiya Kanangot Balakrishnan e8c3b22734 [SWDEV-556483] Fix runtime PM suspend causing test failures (#1931)
Added runtime PM detection and DRM ioctl-based device wake
to handle GPUs in BACO state. Modified tests to wake
suspended devices before reading sysfs files.

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-11-25 13:36:45 -06:00
darren-amd 16e7ee32e6 [rocm-smi-lib] Add iomanip include to frequencies_read (#1797) 2025-11-24 16:38:21 -05:00
gabrpham 6b1e6187f6 [SWDEV-560681] Allowed GPU enumeration to continue with non-contiguous render nodes (#1609)
* Fix uninitialized variable in GPU enumeration loop (#1643)
* Initialize node_to_gpu_id to prevent undefined behavior

---------

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Allan Xavier <axavier@digitalocean.com>
2025-11-21 10:01:10 -06:00
Charis Poag Jones 933fdc3c7e [SWDEV-558141] Fix rocm-smi --setsclk [0...n] & other clocks in partitioned configurations (#1493)
Changes:
  - Fix `rocm-smi --setsclk [0 .. n]` for multiple devices to continue on fail when
    in a partitioned configuration (ex. in DPX/QPX/CPX/etc).
  - Partitioned configurations or devices which do not support changing
    sclk/mclk/pcie clks will now continue on failure. Will report a "not
    supported" or other (rocm-smi) error codes for these devices.
  - Updates impact other clock settings such as `--setmclk` and
    `--setpcie`.
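
The continue-on-failure behavior described above can be sketched roughly as follows. This is a minimal illustration and not the rocm_smi.py implementation: `apply_to_all`, `fake_set_sclk`, and the use of `NotImplementedError` as the "not supported" signal are all hypothetical names chosen for the example.

```python
def apply_to_all(devices, set_clock):
    """Apply set_clock to every device, recording a status instead of aborting."""
    results = {}
    for dev in devices:
        try:
            set_clock(dev)
            results[dev] = "ok"
        except NotImplementedError:
            # Hypothetical signal for "this device cannot change the clock".
            results[dev] = "not supported"
        except OSError as err:
            # Any other failure is recorded and the loop moves on.
            results[dev] = f"error: {err}"
    return results


def fake_set_sclk(dev):
    # Stand-in for a real setter: only device 0 accepts the change.
    if dev != 0:
        raise NotImplementedError


results = apply_to_all([0, 1, 2], fake_set_sclk)
print(results)
```

Each device ends up with its own status, so one unsupported partition does not stop the remaining devices from being processed.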

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-10-23 08:56:41 -05:00
Bindhiya Kanangot Balakrishnan 97b6e806da SWDEV-560768 - SMI test return if no devices available (#1369)
Return from Setup if no monitor devices are available.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-10-16 15:35:18 -05:00
Bindhiya Kanangot Balakrishnan b4288fd8d4 SWDEV-554099 - Update rsmi tests expected output (#1364)
Updated rsmitsts expected outputs to accommodate
returned status.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-10-16 15:34:07 -05:00
systems-assistant[bot] 857e5ef3ce chore: unset executable permission (#213)
Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-09-16 11:06:54 -05:00
systems-assistant[bot] 88201d2b79 [SWDEV-544729] Updated CLI error handling (#216)
Updated: rocm_smi.py
- Remove all else: clauses from functions where rsmi_ret_ok is part of the if clause, as requested.
- The rsmi_ret_ok() function already handles unsuccessful return codes gracefully.
- Updated check_runtime_status() function to sweep through /sys/class/drm to find active runtime_status.
- Updated the message to 'AMD GPU device(s) is/are in a low-power state. Check power control/runtime_status'
- This clarifies the status of the GPU and tells them where to check for more info.
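
A sweep of /sys/class/drm for runtime_status could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual check_runtime_status() code: the function names are hypothetical, and the per-card path `device/power/runtime_status` with the value `active` for an awake device is assumed from the usual Linux runtime PM sysfs layout. The demo runs against a temporary fake tree so it works without a GPU.

```python
import re
import tempfile
from pathlib import Path


def read_runtime_status(drm_root="/sys/class/drm"):
    """Return {card_name: status} for every cardN that exposes runtime_status."""
    root = Path(drm_root)
    statuses = {}
    if not root.is_dir():
        return statuses
    for card in sorted(root.iterdir()):
        # Keep only cardN entries; skip connectors and render nodes.
        if not re.fullmatch(r"card\d+", card.name):
            continue
        status_file = card / "device" / "power" / "runtime_status"
        try:
            statuses[card.name] = status_file.read_text().strip()
        except OSError:
            # Missing or unreadable file: ignore this card.
            continue
    return statuses


def low_power_cards(statuses):
    """List the cards whose status is anything other than 'active'."""
    return [name for name, status in statuses.items() if status != "active"]


# Demo on a fake sysfs tree (assumed layout, see the note above).
with tempfile.TemporaryDirectory() as tmp:
    for name, status in (("card0", "active"), ("card1", "suspended")):
        power_dir = Path(tmp) / name / "device" / "power"
        power_dir.mkdir(parents=True)
        (power_dir / "runtime_status").write_text(status + "\n")
    demo = read_runtime_status(tmp)

print(demo)
print(low_power_cards(demo))
```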

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: gabrpham <Gabriel.Pham@amd.com>
2025-09-16 10:56:03 -05:00
harkgill-amd 902ec4d3ad Fix documentation to match function signature (#990)
Co-authored-by: ammallya <ameyakeshava.mallya@amd.com>
2025-09-15 11:19:21 -07:00
systems-assistant[bot] bfdb3bc636 fix(python): fix comparison to None (#211)
from PEP8 (https://peps.python.org/pep-0008/#programming-recommendations):

> Comparisons to singletons like None should always be done with is or is not, never the equality operators.
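
A small sketch of why the identity check matters. The class below is hypothetical and exists only to show that `==` can be overridden while `is` cannot.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True


def is_missing(value):
    # Identity check: true only for the None singleton itself.
    return value is None


obj = AlwaysEqual()
equal_to_none = (obj == None)  # noqa: E711 - deliberately the discouraged form
print(equal_to_none)           # the overridden __eq__ is consulted
print(is_missing(obj))         # identity cannot be overridden
print(is_missing(None))
```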

Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
2025-09-10 14:50:32 -05:00
systems-assistant[bot] 39ea16e544 fix(E712): fix comparison to True/False (#212)
from PEP8 (https://peps.python.org/pep-0008/#programming-recommendations):

> Comparisons to singletons like None should always be done with is or is not, never the equality operators.
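
For the True/False case that E712 flags, a minimal sketch of the preferred forms follows. The function names are hypothetical; the point is that `== True` also accepts values such as `1`, whereas truthiness or `is True` states the intent explicitly.

```python
def describe(flag):
    # Preferred: rely on truthiness instead of writing `flag == True`.
    if flag:
        return "enabled"
    return "disabled"


def is_exactly_true(value):
    # Use identity when only the bool singleton True should match.
    return value is True


loose = (1 == True)  # noqa: E712 - deliberately the discouraged form
print(loose)
print(is_exactly_true(1))
print(describe(False))
```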

Co-authored-by: Eisuke Kawashima <e-kwsm@users.noreply.github.com>
2025-09-10 14:50:23 -05:00
systems-assistant[bot] f450ff0624 Update rocm_smi_device.cc to delete duplicate content (#214)
The removed content duplicates line 488 of the same file.

Co-authored-by: qiwei_ji <qiwei_ji@outlook.com>
2025-09-10 14:50:13 -05:00
Bindhiya Kanangot Balakrishnan c7b6bb9600 SWDEV-476667 - Get pcie bw from GPU metric (#853)
The sysfs pcie bandwidth file pcie_bw is deprecated
in newer asics. This change will get pcie BW from
GPU metric for version 1.5 or later.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-09-10 14:48:31 -05:00
Joseph Macaranas 696881ae82 LICENSE clean up (#919)
- Clean up and standardization of MIT licenses after discussion with legal team.
- Update README.md with blurb for top-level files.
- MIT License explicitly mentioned for relevant projects.
- Removal of years.
- Copyright attribution should be to `Advanced Micro Devices, Inc.` and not `AMD ROCm(TM) Software`
- Removal of `All rights reserved.`
- Reduce line width of the text for readability.
- Add clear visual separators for additional licenses.
- Convert text files to markdown format for aforementioned separators.
- Update build scripts to point to renamed files.
- Fixed SMI doc references

Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-09-10 12:06:14 -04:00
gabrpham 5dbca01d2d [SWDEV-551309] Adjusted rocmsmitst and --resetprofile command (#769) 2025-09-09 14:32:35 -05:00
gabrpham ee38e26ab2 [SWDEV_543709] Updated tests with new expectations for output (#692)
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-09-09 14:32:01 -05:00
gabrpham 94e194eba2 [SWDEV-540377] Fixed segfault in --showevent command (#649)
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-08-28 11:49:36 -05:00
Peter Park b4e336aef3 [rocm-smi-lib] docs: fix changelog heading ROCm 6.5.0 -> ROCm 7.0.0 (#363) 2025-08-18 11:46:02 -04:00
GabrPham 67fd6c0f73 Applied Copilot suggestions
[ROCm/rocm_smi_lib commit: def9a3c92d]
2025-08-06 12:42:44 -05:00
GabrPham 12f71cab20 Adjusted logic for reading pp_od_clk_voltage
Signed-off-by: GabrPham <gabrpham_amdeng@amd.com>


[ROCm/rocm_smi_lib commit: 3bea40dfd0]
2025-08-06 12:42:44 -05:00
GabrPham a1052e7598 Updated Tool and Lib Version
Signed-off-by: GabrPham <gabrpham_amdeng@amd.com>


[ROCm/rocm_smi_lib commit: 25aec994a0]
2025-08-06 12:38:08 -05:00
GabrPham 4e69ac4f59 Update threading to use more stable threading module
Unstable threading was causing segmentation faults. Switching from the
_thread module to the more recent threading module resolved the
segmentation faults.

Multiple issues solved by this commit:
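
As a rough illustration of the higher-level `threading` API, and not the code changed in this commit, the sketch below starts workers as `threading.Thread` objects and joins them before reading their results. `run_in_threads` is a hypothetical helper.

```python
import threading


def run_in_threads(worker, items):
    """Run worker(item) on one thread per item and collect the results."""
    results = []
    lock = threading.Lock()

    def wrapped(item):
        value = worker(item)
        with lock:  # guard the shared list
            results.append(value)

    threads = [threading.Thread(target=wrapped, args=(item,)) for item in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before returning
    return results


squares = sorted(run_in_threads(lambda n: n * n, [1, 2, 3]))
print(squares)
```

Joining every thread before returning keeps the caller from touching shared state while workers are still running.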
	[SWDEV-537518]
	[SWDEV-540377]
	[SWDEV-540223]

Signed-off-by: GabrPham <gabrpham_amdeng@amd.com>


[ROCm/rocm_smi_lib commit: 7dba992ebd]
2025-07-09 09:36:44 -05:00
Castillo, Juan 8133e89e82 [SWDEV-539845] Add support for board voltage (#92)
* Add the API and CLI to show the board voltage.

---------

Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Liu, Shuzhou (Bill) <Shuzhou.Liu@amd.com>

[ROCm/rocm_smi_lib commit: bab82d98b7]
2025-07-03 01:58:50 -05:00
Cheruvally, Aravindan d1c281a87e SWDEV-530465 Update share/doc/<pkgnm> License folder (#86)
Update share/doc/ folder for license/docs to reflect correct package name.
Signed-off-by: Cheruvally, Aravindan <Aravindan.Cheruvally@amd.com>

[ROCm/rocm_smi_lib commit: 55ab9b2bda]
2025-07-03 01:56:50 -05:00
Adam McDaniel b3caa4972c Exposed the energy count and current socket power metrics to profilers like PAPI
[ROCm/rocm_smi_lib commit: f688c9938d]
2025-07-02 09:47:17 -05:00
Galantsev, Dmitrii ff02dc85da Revert "rsmi_init: Do not complain loudly when no driver is found"
This reverts commit 42dc44f54d.


[ROCm/rocm_smi_lib commit: 731be3f743]
2025-07-02 09:45:41 -05:00
Samuel Thibault 42dc44f54d rsmi_init: Do not complain loudly when no driver is found
When librocm-smi is pulled in through a dependency, we may end up on a system
without actual hardware supported by ROCm, where rsmi_init() failing is
actually expected; we do not want to frighten the user in such a case.


[ROCm/rocm_smi_lib commit: 8ca4207d5c]
2025-06-19 13:30:51 -05:00
Ranjith Ramakrishnan e8477f460f SWDEV-534264 - Add liboam.a to static rocm-smi package
liboam.a was missing from the static rocm-smi package, resulting in compilation errors in applications that use rocm-smi


[ROCm/rocm_smi_lib commit: 59468e3f78]
2025-06-13 12:09:41 -05:00
Arif, Maisam 9002bcc5a8 Revert "SWDEV-534264 - Add liboam.a to static package"
This reverts commit ae9bcb11e1.


[ROCm/rocm_smi_lib commit: 5cc6c1ca1c]
2025-06-12 16:26:19 -05:00
Ranjith Ramakrishnan ae9bcb11e1 SWDEV-534264 - Add liboam.a to static package
liboam.a was missing from the static package. The library was being created but not packaged.
Fixed this.


[ROCm/rocm_smi_lib commit: ff7561607e]
2025-06-11 13:45:21 -05:00
Charis Poag b45713faf5 [SWDEV-530035] Fix tests ran with partitioned configurations (CPX, DPX, QPX, etc.)
Changes:
  - Updates to APIs to handle null pointers or RSMI_STATUS_NOT_SUPPORTED
  - Fixes to tests to handle partitioned configurations correctly
  - Synced with latest AMD SMI API changes
Change-Id: I7a932f9336ef29ccb01d3b15e2101f6136b45720


[ROCm/rocm_smi_lib commit: 12b78439d2]
2025-06-06 16:39:29 -05:00
Peter Park 5a3556ca85 update copyright years to 2025
revert shared_mutex.h


[ROCm/rocm_smi_lib commit: a156bfa4ae]
2025-06-03 17:16:54 -05:00
Peter Park 148400af45 update license year to 2025
[ROCm/rocm_smi_lib commit: b0831d79cf]
2025-06-03 17:16:54 -05:00
Charis Poag 5e31509711 Removed backwards compatibility for jpeg_activity/vcn_activity
Updated:
- Removed backwards compatibility for jpeg_activity/vcn_activity
- On supported ASICs users can use XCP (partition) stat values:
  jpeg_busy and vcn_busy

Change-Id: I78c403f8462668738ec57cac12b107f6a3989b18
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/rocm_smi_lib commit: 1c6b2adae7]
2025-05-29 13:47:56 -05:00
Stella Laurenzo a768477da4 [PATCH] Miscellaneous CMake fixes.
Change-Id: Ibca31745d2e9375523193310bc1ca5994c87aa32


[ROCm/rocm_smi_lib commit: 92db324944]
2025-05-27 12:12:42 -05:00
Afzal Patel 29602fec52 add interface drm include directory


[ROCm/rocm_smi_lib commit: f3c6e80fab]
2025-05-27 12:06:56 -05:00
Pham, Gabriel 87c455684d [SWDEV-533221] Synced rocm-smi with amd-smi lib to fix warning messages (#71)
* Removed URL that was on prohibited source list

---------

Signed-off-by: Gabriel Pham <Gabriel.Pham@amd.com.>


[ROCm/rocm_smi_lib commit: 4243e42758]
2025-05-26 10:08:16 -05:00
Castillo, Juan eaa2000af5 [SWDEV-523359] fan_read_write: Add set fan speed validation check. (#61)
- Handled NOT_SUPPORTED status which previously caused rsmitst to false fail
- Added continue statement to proceed with rest of FanReadWrite test.
- fixed spacing line 140

Signed-off-by: Juan Castillo <juan.castillo@amd.com>

[ROCm/rocm_smi_lib commit: ac31c6e576]
2025-05-26 09:54:41 -05:00
Ramakrishnan, Ranjith 623a6452e5 SWDEV-532478 - Add rocm-core dependency to RPM packages (#67)
The rocm-core dependency was missing from the RPM packages; this change adds it.

[ROCm/rocm_smi_lib commit: d32ff28ebf]
2025-05-20 14:41:38 -07:00
Maisam Arif ffb52bf65e Updated Maintenance mode notice
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>


[ROCm/rocm_smi_lib commit: 3f11537401]
2025-05-20 01:44:57 -05:00
Arif, Maisam a2443a5efe Revert "Correct the dependencies of rocm_smi package."
This reverts commit 93ff9b3547.


[ROCm/rocm_smi_lib commit: 40671be7c9]
2025-05-18 10:34:28 -05:00
Arif, Maisam a9ddb0dcea Revert "Use devel in RPM package requires field"
This reverts commit 22777b75b5.


[ROCm/rocm_smi_lib commit: a0ad4a3fcd]
2025-05-18 10:34:28 -05:00