Граф коммитов

257 Коммитов

Автор SHA1 Сообщение Дата
Charis Poag d3b73fac82 Revert Major ABI break for amdsmi_get_violation_status()
Changes:
- This aligns back to original struct naming for ROCm 7.0. This removes
any Major ABI breakages for updates for 7.0 release.
- Minor ABI breakage is required since there were additions to the
header. Refer to changelog for these updates.

Change-Id: If35af74eac6beac8c267d05ce789b7761ed24bff
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-08-18 11:36:57 -05:00
Poag, Charis e2e4fc65c1 [SWDEV-542223] Update Violation Status Changes to Design + Minor cleanup (#558)
Changes:
  - Update violation status logic and metric naming for XCP/XCC metrics (thrm/thm consistency)
  - Added XCP identifier in monitor to allow partition metrics to be shown with applicable APIs
    (Violation Status is the first example of this in monitor)
  - Improve CLI monitor output:
    support multiple GPU lines per GPU, add new columns, and better formatting
  - Refactor helpers and logger for flexible unit formatting and table rendering
  - Add examples for amdsmi_get_gpu_pm_metrics_info()/amdsmi_get_gpu_reg_table_info()
    new metrics APIs in C++ example
  - Sync Python/C++ interface and structures for new metrics fields and naming
  - Remove deprecated/unused RSMI activity APIs, documentation not needed since
    the APIs no longer exist in ROCm SMI either.
  - Cleanup metric violations + fix handle watch arguments
  - Provide better handling/doc for average_flattened_ints()
  - Group xcp metrics with brackets in human readable + adjust output size

Signed-off-by: Poag, Charis <Charis.Poag@amd.com>
2025-08-06 16:03:06 -05:00
Poag, Charis d24dc7ef89 [SWDEV-518561] Separate Driver Reload from Memory Partition Sets (#582)
Description:
  - Added a new API `amdsmi_gpu_driver_reload()` to reload the AMD GPU driver independently.
  - Updated CLI (`sudo amd-smi reset -r`) and Python bindings to support driver reload functionality.
  - Removed automatic driver reload from `amdsmi_set_gpu_memory_partition()` and `amdsmi_set_gpu_memory_partition_mode()`.
  - Enhanced CLI and test cases to allow users to control when the driver reload occurs.
  - Updated documentation and changelog to reflect the new driver reload process.
  - Improved error handling and logging for driver reload operations.
  - Added progress bar and user confirmation prompts for driver reload commands.

* Update build/test strategy to only allow one test execution at a time
* Modify API verbage + modify systemctl error output
  - Systemctl is typically not enabled on docker.
  - And is an edge case for gpu being active process/etc for display devices.
* Remove AMDSMI_STATUS_AMDGPU_RESTART_ERR from the return values
* Move driver reload to after we save original compute partitions

---------

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-08-05 20:44:28 -05:00
Liu, Shuzhou (Bill) abd3c02a3c Query UBB/OAM temperature API (#581)
Add support to Query UBB/OAM temperature.
* Updated Python API with new temperature metrics enum

---------

Co-authored-by: Bill Liu <shuzhliu@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-08-05 20:37:45 -05:00
Pham, Gabriel e2eac98496 [SWDEV-545342] Fixed amdsmi_link_type_t enumeration (#560)
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-07-22 18:22:49 -05:00
Bindhiya Kanangot Balakrishnan 645c313f00 [SWDEV-543308] Revert amdsmi_link_metrics structure change
Moved the bit_rate and max_bandwidth back into links in the
amdsmi_link_metrics_t struct as this change was impacting
other teams. Modified the C and python API's, wrapper, and
CLI accordingly.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-07-14 13:56:26 -05:00
Narlo, Joseph 2cf6272b53 [SWDEV-541675] Remove Unnecessary API from amdsmi.h (#530)
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-07-07 11:14:27 -05:00
Maisam Arif 6da33b8ded [SWDEV-529665] PLDM Bundle naming
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Id7f652ddc4e790027869683a4aaa3226ffc05c83
2025-06-12 02:19:37 -05:00
Arif, Maisam e2692ab533 Add Directory Not Found Status code to map to ENOTDIR (#238)
* Corrected ecc count error return
* Added directory not found error code
* Added ENOTDIR mapping to RSMI_STATUS_DIRECTORY_NOT_FOUND in ErrnoToRsmiStatus

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-06-03 17:53:33 -05:00
Joseph Narlo ee43ec71e8 [SWDEV-522996] Syncing Unified Header and AMDSMI
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
2025-06-02 13:44:33 -05:00
Maisam Arif c89b5db09d Deprecated PASID
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ib008f80f3d736172079358c0ceb3ebca87340d28
2025-05-30 20:48:29 -05:00
Kanangot Balakrishnan, Bindhiya 2eff0b3764 [SWDEV-530633] Use gpu_metric speed and BW for xgmi (#366)
The xgmi command was showing pcie bit rate and bandwidth instead of xgmi. Corrected the API to get xgmi data from gpu metric.
Added python API for amdsmi_get_link_metrics. Modified the amdsmi_link_metrics struct.
Added check to confirm non zero partition got xgmi command.

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-30 16:51:11 -05:00
Arif, Maisam 42441c78ea [SWDEV-488303] Adjusted process vram_mem data source (#411)
* [SWDEV-488303] Adjusted process vram_mem data source
* Standardized sscanf format strings

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-05-29 23:26:12 -05:00
Arif, Maisam 0fdaebdbaa [SWDEV-488303] Updated CU occupancy for per-process retrieval (#243)
Change-Id: I2990597c6dd4b2e8cf3e11ce60f72049ebdd9a8c
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-29 20:35:27 -05:00
Liu, Shuzhou (Bill) 970560fc7c [SWDEV-520665] Add support for board voltage (#303)
* Add the API and CLI to show the board voltage. 

---------

Change-Id: Icb25bd653bb1d004704b5a21b378ca31b2b242c7
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
2025-05-29 18:55:08 -05:00
Kanangot Balakrishnan, Bindhiya e7f19b36f0 [SWDEV-463406] ViolationStatus Changes (#288)
* Expanded Violation Status tracking for GPU metrics 1.8
* Added new fields to `amdsmi_violation_status_t` and related interfaces for enhanced violation statuses
---------

Signed-off-by: Kanangot Balakrishnan, Bindhiya <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>
2025-05-29 13:26:21 -05:00
Pryor, Adam d0a89393df Remove ring hang (#391)
Change-Id: I856cd0949d3661911ab9302148aa1bc6e72abeed

Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-05-29 11:58:46 -05:00
Narlo, Joseph 8724658c14 [SWDEV-535389] Removed unused definition (#402)
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
2025-05-29 10:48:16 -05:00
Narlo, Joseph b6d638d942 [SWDEV-532125] Remove_Unused_Definitions (#385)
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
2025-05-28 18:49:08 -05:00
Narlo, Joseph 7c29b4eab8 [SWDEV-532131] Update String Lengths (#383)
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
2025-05-28 18:31:30 -05:00
Narlo, Joseph 9862db63dd [SWDEV-532129] Update amdsmi asic info (#369)
* Added `subsystem_id` to `amdsmi_get_gpu_asic_info`
---------
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
2025-05-28 18:26:58 -05:00
Narlo, Joseph f3a5cc9cd5 [SWDEV-533941] Align P2P input struct (#395)
* Removed `amdsmi_io_link_type_t` and replaced with alredy implemented amdsmi_link_type_t
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-05-28 18:22:19 -05:00
Narlo, Joseph 38a1fadf44 [SWDEV-535200] Remove deprecated function amdsmi_get_power_info_v2 (#397)
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
2025-05-28 18:09:13 -05:00
Narlo, Joseph 7b3c85e970 [SWDEV-534438] Update structure amdsmi_bdf_t (#388)
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
2025-05-28 18:05:43 -05:00
Narlo, Joseph f71ae88956 [SWDEV-529483] Get Vram Vendor Name from Driver (#323)
* Update to remove vram enum and instead use the string directly from the driver.

Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-28 17:57:49 -05:00
Daniel Oliveira fe9b6eeb49 [SWDEV-529665] Add PLDM Bundle version
feat: Report PLDM Bundle from SMC to IB

Code changes related to the following:
  * APIs
  * CLI
  * Unit tests

Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Change-Id: I35ccf01eb612ca80e3ae6b72039085c18c989222
2025-05-20 01:37:00 -05:00
Mewar, Deepak b999f86611 [SWDEV-512393] Added amdsmi_get_cpu_affinity_with_scope (#198)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
2025-05-20 01:06:09 -05:00
Pryor, Adam 51e99965b3 [SWDEV-527092] - Fix ringhang event removal (#372)
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-05-16 16:41:31 -05:00
Pryor, Adam 8713305f80 [SWDEV-527092] - Process Start/Stop event addition (#368)
- Added more events to `amdsmi_evt_notification_type_t`

Change-Id: I6a256fe828e4bec3197c7fecbed374ab17c6f850
Signed-off-by: Adam Pryor <Adam.Pryor@amd.com>
2025-05-16 11:01:15 -05:00
Saeed, Oosman 1bb1f8acc2 [SWDEV-522623] Add afid functionality to API and CLI (#330)
Change-Id: I015bde926491d54e09da8f39b05650515711e09f

[SWDEV-522623] Add afid functionality to API and CLI


Change-Id: I015bde926491d54e09da8f39b05650515711e09f

Signed-off-by: Oosman Saeed <oossaeed@amd.com>
Co-authored-by: Oosman Saeed <oossaeed@amd.com>
2025-05-16 10:49:56 +08:00
Narlo, Joseph d5ce95573f [SWDEV-522996] Sync Unified Header and AMDSMI (#305)
Sync Unified Header and AMDSMI

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>

---------

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-04-24 13:31:08 -05:00
Arif, Maisam 53dbb7bf58 CLI Help text and parser formatting updates (#218)
* Small Fixes
* CLI Help text and parser formatting updates
* Changed metavar for set partition

---------
Change-Id: Ia8809665f6fac670452cd4db4e5e8f9c7270faba
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Co-authored-by: Pham, Gabriel <Gabriel.Pham@amd.com>
2025-04-22 23:32:42 -05:00
Galantsev, Dmitrii 955ceac78a Add amdsmi_get_gpu_busy_percent
This is required for GPU busy percent in RDC

Change-Id: Idf2ab72993ecc8227958e6eb47f36fc68c93759f
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-04-14 10:40:13 -05:00
Arif, Maisam d81871ef16 [SWDEV-511234] Added amdsmi_get_gpu_cper_entries & CLI implementation
Added amdsmi_get_gpu_cper_entries() in the python and C APIs

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Co-authored-by: Saeed, Oosman <Oosman.Saeed@amd.com>
Co-authored-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
2025-04-12 01:54:57 -05:00
Yuan, Perry 68e44c7f66 [SWDEV-482949] Add CPU model name querying support (#33)
- Add support to check CPU vendor info which will be called by RDC to
discovery CPU information
- Move esmi headers declaration to impl/amd_smi_common.h
- remove duplicated amdsmi_cpu_util_t

---------

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Deepak Mewar <deepak.mewar@amd.com>
2025-03-28 21:21:39 -05:00
Galantsev, Dmitrii 4a3c70136f Make amdsmi_get_power_info backwards compatible
Change-Id: Ie5b4c35265827e78934caa94c142d31efce597e4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-19 23:23:48 -05:00
Castillo, Juan 7c882b2f69 SWDEV-518209: GPU Metrics 1.8 (#177)
- Updates:
    - Adding the following metrics to allow new calculations for violation status:
        - Per XCP metrics gfx_below_host_limit_ppt_acc
        - Per XCP metrics gfx_below_host_limit_thm_acc
        - Per XCP metrics gfx_low_utilization_acc
        - Per XCP metrics gfx_below_host_limit_total_acc
    - Increasing available JPEG engines to 40. Current ASICs may not support all 40. These will be indicated as UINT16_MAX or N/A in CLI.

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>
2025-03-19 10:24:02 -05:00
Poag, Charis 48cb5529d2 [SWDEV-493274/SWDEV-514998] Add AMD SMI partition tests + Add Guest amd-smi static --partition (#127)
* [SWDEV-493274/SWDEV-514998] Add AMD SMI partition tests + Add Guest amd-smi static --partition

Changes:
    - Added amd-smi static --partition for guest systems
    - Added C++ tests for memory and compute (accelerator) partitions
    - Added Python tests for amdsmi_get_gpu_vram_info(),
       amdsmi_get_gpu_accelerator_partition_profile_config()
    - Updated Python tests for
      amdsmi_get_gpu_accelerator_partition_profile()
      Now includes more profile and resource detail
    - Added amdsmi_get_gpu_xcd_counter();
      Tests provided for both C++/Python APIs
    - Added AmdSmiVramType & AmdSmiVramVendor: they were missing
      python testing required adding.

Change-Id: Ib6549d8ccc5fb68726f38745b87c78f890186022
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-03-11 16:38:46 -05:00
AL Musaffar, Yazen 2936e00fed [SWDEV-453922] AMD SMI to provide mapping feature of other enumeration methods (#51)
Added enumeration mapping for 
- drm render
- drm card
- hsa id 
- hip id
- hip uuid (rocminfo uuid)

Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-03-07 09:09:12 -06:00
Joseph Narlo 695e619913 Merge branch 'SWDEV-517156/Synchronize_Unified_and_Amdsmi_Headers' of github.com:AMD-ROCm-Internal/amdsmi into SWDEV-517156/Synchronize_Unified_and_Amdsmi_Headers 2025-02-21 15:11:46 -06:00
Joseph Narlo b38c9aa1cc [SWDEV-517156] Synchronize Unified and Amdsmi Headers
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
2025-02-21 15:05:57 -06:00
Joseph Narlo 499ba8acb0 [SWDEV-517156] Synchronize Unified and Amdsmi Headers
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
2025-02-21 13:52:30 -06:00
Galantsev, Dmitrii 30a6cb02f2 [SWDEV-513769] Search standard locations for libamd_smi.so
Change-Id: I2c364b7dfa6ffa91e5b1c837c2d3d14ef58ee66b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-02-19 01:34:03 +00:00
Mewar, Deepak 2c591ffcc1 [SWDEV-499995] ESMI Build/Compiler warnings messages (#105)
* [SWDEV-499995] ESMI Build/Compiler warnings messages

Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
2025-02-18 16:20:28 -06:00
Narlo, Joseph dc4a16da6f [SWDEV-513651] Sync Unified And Linux Header (#98)
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
2025-02-06 22:25:50 -06:00
Narlo, Joseph 8e454950ef [SWDEV-509782] Add tags and redefine groups (#73)
Add tags and redefine groups in amd-smi header

Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
2025-02-05 18:43:55 -06:00
Pham, Gabriel e663bed7d6 [SWDEV-462952] Updated passthrough to use virtualization mode struct
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-01-31 17:34:01 -06:00
Pham, Gabriel 0f79efac78 [SWDEV-462952] Options enabled for GPU passthrough scenarios
Added Dynamic Passthrough detection

Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
2025-01-30 18:12:03 -06:00
Ramalingam, Muthusamy ced110dbb6 amdsmi: Adding Support to get hsmp Driver version
* amdsmi: Adding Support to get hsmp Driver version

Adding Support to fetch hsmp driver version from ESmi Interfaces.
Adding Support to fetch memory bandwidth per socket.

Signed-off-by: muthusamy <muthusamy.ramalingam@amd.com>
2025-01-29 13:45:02 -06:00
Williams, Justin 21841f44a5 [DCSM-524] ESMI build fix (#72)
Fix amd_hsmp failure to copy new version

Signed-off-by: Justin Williams <Justin.Williams@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
2025-01-29 13:39:19 -06:00