Commit Graph

441 Commits

Author SHA1 Message Date
Yuan, Perry 68e44c7f66 [SWDEV-482949] Add CPU model name querying support (#33)
- Add support to check CPU vendor info, which will be called by RDC to
discover CPU information
- Move esmi headers declaration to impl/amd_smi_common.h
- remove duplicated amdsmi_cpu_util_t

---------

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Deepak Mewar <deepak.mewar@amd.com>
2025-03-28 21:21:39 -05:00
Poag, Charis 0402bb4d75 [SWDEV-513807] Fix amd-smi partition --accelerator not returning AMDSMI_STATUS_NO_PERM (#192)
* [SWDEV-513807] Fix amd-smi partition --accelerator not returning AMDSMI_STATUS_NO_PERM

Changes:
- Fixed amdsmi_get_gpu_accelerator_partition_profile_config() not
  returning AMDSMI_STATUS_NO_PERM
- Changed amd-smi partition --accelerator to warn users who do not
  use sudo or root permissions.
- Updated changelog for fixes planned for 6.4.1 release

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-03-20 17:23:01 -05:00
Galantsev, Dmitrii 4a3c70136f Make amdsmi_get_power_info backwards compatible
Change-Id: Ie5b4c35265827e78934caa94c142d31efce597e4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-19 23:23:48 -05:00
Castillo, Juan 7c882b2f69 SWDEV-518209: GPU Metrics 1.8 (#177)
- Updates:
    - Adding the following metrics to allow new calculations for violation status:
        - Per XCP metrics gfx_below_host_limit_ppt_acc
        - Per XCP metrics gfx_below_host_limit_thm_acc
        - Per XCP metrics gfx_low_utilization_acc
        - Per XCP metrics gfx_below_host_limit_total_acc
    - Increasing available JPEG engines to 40. Current ASICs may not support all 40. These will be indicated as UINT16_MAX or N/A in CLI.

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>
2025-03-19 10:24:02 -05:00
Poag, Charis 48cb5529d2 [SWDEV-493274/SWDEV-514998] Add AMD SMI partition tests + Add Guest amd-smi static --partition (#127)
* [SWDEV-493274/SWDEV-514998] Add AMD SMI partition tests + Add Guest amd-smi static --partition

Changes:
    - Added amd-smi static --partition for guest systems
    - Added C++ tests for memory and compute (accelerator) partitions
    - Added Python tests for amdsmi_get_gpu_vram_info(),
       amdsmi_get_gpu_accelerator_partition_profile_config()
    - Updated Python tests for
      amdsmi_get_gpu_accelerator_partition_profile()
      Now includes more profile and resource detail
    - Added amdsmi_get_gpu_xcd_counter();
      Tests provided for both C++/Python APIs
    - Added AmdSmiVramType & AmdSmiVramVendor: they were missing
      and Python testing required adding them.

Change-Id: Ib6549d8ccc5fb68726f38745b87c78f890186022
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-03-11 16:38:46 -05:00
Arif, Maisam 0e67568902 [SWDEV-501958] Doc Update deprecating pasid in 7.0 (#166)
Change-Id: Ie19ba271c901d0be324143474871241272166124

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I024f7e2b5e7a5fcd6e1d12181d21ffacfe29c00f
2025-03-07 14:56:46 -06:00
AL Musaffar, Yazen 2936e00fed [SWDEV-453922] AMD SMI to provide mapping feature of other enumeration methods (#51)
Added enumeration mapping for 
- drm render
- drm card
- hsa id 
- hip id
- hip uuid (rocminfo uuid)

Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-03-07 09:09:12 -06:00
Pham, Gabriel d5b2763aba [SWDEV-515730] Updated set partition documentation (#151)
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
2025-03-06 23:16:32 -06:00
Park, Peter 0b4a6ff149 [SWDEV-513210] Add references to AMDGPU RAS Support info in API docs (#144)
Add reference to AMDGPU RAS Support info in API docs
2025-03-04 09:32:23 -06:00
Narlo, Joseph d7c3ad0886 [SWDEV-515031] Change Header Version to 25.2.0 (#109)
Change Versioning Scheme to match https://semver.org/
Dropping the year enum and API fields in a future release.
Should not impact library versioning since we are now starting from 25.2.0
---------

Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
Change-Id: Id090e23f156926d08f9c0b781447388adf268cf6
2025-02-26 19:17:09 -06:00
Joseph Narlo ddcdd60964 Merge branch 'SWDEV-517156/Synchronize_Unified_and_Amdsmi_Headers' of github.com:AMD-ROCm-Internal/amdsmi into SWDEV-517156/Synchronize_Unified_and_Amdsmi_Headers
2025-02-24 09:21:22 -06:00
Joseph Narlo b38c9aa1cc [SWDEV-517156] Synchronize Unified and Amdsmi Headers
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
2025-02-21 15:05:57 -06:00
Mewar, Deepak 2c591ffcc1 [SWDEV-499995] ESMI Build/Compiler warnings messages (#105)
* [SWDEV-499995] ESMI Build/Compiler warnings messages

Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
2025-02-18 16:20:28 -06:00
Narlo, Joseph dc4a16da6f [SWDEV-513651] Sync Unified And Linux Header (#98)
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
2025-02-06 22:25:50 -06:00
Pham, Gabriel 09379f8438 Changed default behavior of amdsmi_get_gpu_virtualization_mode (#97)
Changed return behavior of amdsmi_get_gpu_virtualization_mode

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-02-05 19:09:44 -06:00
Narlo, Joseph 8e454950ef [SWDEV-509782] Add tags and redefine groups (#73)
Add tags and redefine groups in amd-smi header

Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
2025-02-05 18:43:55 -06:00
Maisam Arif 5a9fb676bc Fixed description for amdsmi_get_gpu_virtualization_mode
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I9234a4e7b78f19e16484d7bd5fa078c38f0262ff
2025-01-31 17:40:48 -06:00
Pham, Gabriel e663bed7d6 [SWDEV-462952] Updated passthrough to use virtualization mode struct
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-01-31 17:34:01 -06:00
Arif, Maisam fb3a9c2290 Bump Version 25.2.0
Change-Id: I9a38e58c0c9ef9348312e4faf299518073a1c3c2

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-01-31 15:50:34 -05:00
Pham, Gabriel 0f79efac78 [SWDEV-462952] Options enabled for GPU passthrough scenarios
Added Dynamic Passthrough detection

Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
2025-01-30 18:12:03 -06:00
Maisam Arif 70b14166f7 Bump Version to 25.1.0.0
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I40efe2c9114357a6c34b5ee44fb523293d6b17e7
2025-01-30 04:12:57 -06:00
Ramalingam, Muthusamy ced110dbb6 amdsmi: Adding Support to get hsmp Driver version
* amdsmi: Adding Support to get hsmp Driver version

Adding Support to fetch hsmp driver version from ESmi Interfaces.
Adding Support to fetch memory bandwidth per socket.

Signed-off-by: muthusamy <muthusamy.ramalingam@amd.com>
2025-01-29 13:45:02 -06:00
Scaffidi, Salvatore 87834bf829 [SWDEV-511296] Update violation_timestamp to read timestamp from firmware
Updated violation_status->violation_timestamp to read values from firmware timestamp

Signed-off-by: Greg Scaffidi <salvatore.scaffidi@amd.com>
Change-Id: I567f824a9ace09a780bca8bb182d45bed681e9ce
2025-01-28 15:43:06 -06:00
Joseph Narlo 3d12d64c9b [SWDEV-504389] Sync Comments in Linux BM
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
2025-01-24 07:24:11 -06:00
Poag, Charis c1cd2b46ef [SWDEV-488276] Add partition 2.0 functionality (#44)
Changes:
* CLI:
  - Updated amd-smi partition
  - Updated amd-smi partition -c
  - Updated amd-smi partition -m
  - Updated amd-smi partition -a
  - Updated amd-smi set -M <NPS1/NPS2/NPS4/NPS8>
  - Updated amd-smi set -C <SPX/DPX/QPX/TPX/CPX>
  - Updated amd-smi set -C <ACCELERATOR_TYPE> or <PROFILE_INDEX>
    Where PROFILE_INDEX = available ACCELERATOR_TYPES
  - Updated amd-smi set --help, now includes more detail for
    amd-smi set -C <ACCELERATOR_TYPE> or <PROFILE_INDEX>

* API:
  - Added amdsmi_get_gpu_memory_partition_config
  - Added amdsmi_set_gpu_memory_partition_mode
  - Added amdsmi_get_gpu_accelerator_partition_profile_config
  - Updated amdsmi_get_gpu_accelerator_partition_profile_config
  - Added amdsmi_set_gpu_accelerator_partition_profile

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-01-16 00:53:46 -06:00
Scaffidi, Salvatore 3793be7735 [SWDEV-463406] Update API with fields for gfx_clock_below_host_limit and low_utilization violations
Updated API with fields for gfx_clock_below_host_limit and low_utilization violations
Change-Id: I25647bae6e7b785f44dab024272767658688bcad

---------
Signed-off-by: Scaffidi, Salvatore <Salvatore.Scaffidi@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>
2025-01-08 22:07:23 -06:00
Arif, Maisam 490132748f Corrected spacing and simplified logic
Change-Id: I51c98339367d1cb9470a00ee05463ac8662d6b01

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-01-08 20:18:24 -06:00
Meng, Li (Jassmine) dc400d916e [SWDEV-230863] add two new interfaces for background health check (#4)
1. Get the bad pages threshold of a processor.
2. Verify the checksum of RAS EEPROM

Signed-off-by: Meng Li <li.meng@amd.com>
2025-01-07 17:26:55 -06:00
Park, Peter d9bba639df [SWDEV-503717] Remove occurrences of "Fusion" in docs
Tiny PR to remove occurrences of "Kernel **Fusion** Driver" in
public-facing docs.

Signed-off-by: Peter Park <peter.park@amd.com>
2025-01-07 16:11:46 -06:00
Maisam Arif 338cdd63ce [SWDEV-481702] Update marketing name source
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2024-12-19 18:32:51 -06:00
Maisam Arif 6dcbff866b Bump Version to 24.7.2.0
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2024-12-19 17:04:55 -06:00
Juan Castillo f8b8347627 [SWDEV-496693]GPU Metrics 1.7
Features added:
- [SWDEV-475244] Add new interface to get max memory bandwidth
Updated API: amdsmi_get_gpu_vram_info
Updated: struct amdsmi_vram_info_t to include vram_max_bandwidth
CLI: amd-smi static --vram

- [SWDEV-488349] Add new interface for XGMI link status
New API: amdsmi_get_gpu_xgmi_link_status
CLI: amd-smi xgmi --link-status

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I1aa35b741136eb4f02f7ea9a95b865886273eb72
2024-12-18 10:57:06 -06:00
Joe Narlo ef31bb7166 SWDEV-504389 [AMD-SMI] Synching Comments in Linux BM
Sync comments from Unified Header to Linux BM

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I9b1ae94db68761a7963ad87cd60177a57e93ad85
2024-12-18 10:57:06 -06:00
gabrpham bd01cfc203 Fixed post reset and ring_hang issues
Issues include:
	SWDEV-480250
	SWDEV-480255
	SWDEV-480248
Known issue:
	`amd-smi event` has threads taking events from the same device
which, in the case of resetting gpus, makes it seem like some gpus have
reset multiple times and others have not reset at all.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Ic7dcc214e0366fc1532ece579d915d34d35d5407
2024-12-06 17:46:00 -05:00
Justin Williams 2c24cab86c [SWDEV-502001] Added amd_hsmp.h locally
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
Change-Id: I28e48913743f86fb5fc9082307ec326830d55960
2024-12-05 17:02:48 -05:00
Joe Narlo 547db10384 SWDEV-502330 [AMD-SMI][Unified Header] Convert struct to typedef struct
Change struct to a typedef struct

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I6f3b22a5219c0db0aab2c308b71213ae75334476
2024-12-04 09:14:05 -05:00
Charis Poag 7d061f9ae4 [SWDEV-499029] Fix unable to change memory partition modes
Changes:
  * [API] Removed checking board name, fixes for other MI ASICs
  * [API] Fixed inability to restart the AMD GPU; libdrm blocked
    this operation
  * [API] Added ability to unload/reload libdrm
    from within AMD SMI APIs
  * [CLI] Increased progress bar to change memory partition modes
    to 140 seconds, since driver reload is variable per system

Change-Id: I52f227f2ab850c4a6332ff3ecdc899903b1080f1
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-11-25 09:28:02 -05:00
Joe Narlo 35d8e827b9 SWDEV-497305 [AMDSMI] Consistent string lengths
Unify max string length to AMDSMI_MAX_STRING_LENGTH 256
Replace AMDSMI_NORMAL_STRING_LENGTH, AMDSMI_256_LENGTH

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: Ia81d738be0eefb9683ee53d51c969598fe587f50
2024-11-22 15:37:24 -05:00
Joe Narlo 3052ad4220 SWDEV-495787 [AMDSMI] Different license headers
Change copyrights to MIT and remove date

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I16f5b412f2b9ddefaaa1771aa714cc18829a1be4
2024-11-22 08:55:28 -05:00
Maisam Arif f1c3fbf226 Updated CLI exceptions
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I5c68eed7719c093727afa434e25ba2560dde894a
2024-11-15 11:44:51 -05:00
Charis Poag 3ea4a42a6e [SWDEV-488276/SWDEV-497613] Update memory partition set functionality
Changes:
  - [CLI] Added warning screen to AMD SMI users
    setting memory partition
  - [CLI] Added a 40-second progress bar to CLI set commands
  - [API] Updated to wait until the driver reloads with SYSFS files active
  - [CLI] Now users can set or reset without providing:
    amd-smi set -g all <set arguments>
    or amd-smi reset -g all <set arguments>
    now can directly call -> sudo amd-smi set <set arguments>
    or sudo amd-smi reset <set arguments>
  - [SWDEV-475712][CLI/API] Fixed target_graphics_version field
    not properly displaying for older MI or Navi ASICs.
  - [All APIs] Added a catch for the driver reporting invalid arguments;
    these APIs will now show AMDSMI_STATUS_INVAL
    (ex. changing to NPS8 if the device does not support it)
  - [Install] Modified paths for Python install commands to support
    multi-ROCm installs

Change-Id: Id11f25d68a82d23c6b2d77ccb30b51e860dd0ca7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-11-12 16:50:32 -04:00
Maisam Arif 4b511a31e1 Bump Version to 24.7.1.0
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I0fc42fe55cb653102d189db9aa5eaf723280170e
2024-11-11 19:23:20 -06:00
Maisam Arif 6e843436f5 Updated amdsmi_get_energy_count() C API documentation
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Iac75a0dcd583f39eb97aada769c736c3305cc8a2
2024-11-08 16:37:10 -05:00
Joe Narlo 54462ab447 SWDEV-495316 [AMDSMI] In amdsmi.h, change typedef amdsmi_accelerator_partition_profile_t to match definition in Confluence
Move memory_caps definition and correct the number in reserved to match Confluence

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: Id94144f4b3d2d3d7b4d7327211ffc1957ffd0a93
2024-10-31 12:48:48 -04:00
Charis Poag 0ceca28f41 [SWDEV-463406] Update sample rate + align metric output
Changes:
- Corrected the maximum rate at which users can sample from
  FW/driver to 100 ms
- Added warning to amdsmi_get_violation_status()
  about the 100 ms delay required to sample
- Removed guest support, this API will not be supported
- Updated CLI `amd-smi metric --throttle` outputs from
    XXX_active -> XXX_status
    XXX_percent -> XXX_activity
  to align with host
- Changelog updated

Change-Id: Ib30dd35dcc04ff67904ca82c86a55a16689df226
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-10-23 17:36:35 -04:00
gabrpham f5b7761ac7 [SWDEV-490187] reset gpu partition were removed
The reset gpu partition support for both compute and memory was removed

Code changes related to the following:
  * amdsmi_reset_gpu_compute_partition()
  * amdsmi_reset_gpu_memory_partition()
  * CLI

Change-Id: I372589074b4da172bedd39223edde18939e373ae
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-10-18 16:22:26 -05:00
Khader Basha Shaik 8308ede9e8 amdsmi [CPU]: Add implementation to get cpu handles and core handles API
- Update the API names, parameters to return cpu handles and core
handles in the system.
  - Update the amdsmi_wrapper.py.
  - Update the amdsmi_interface.py to use the processor handles and
    core handles API.

Change-Id: Ie24f62f345864f8b6773fdb3c6369993bca7e25b
2024-10-14 05:41:19 -04:00
Charis Poag 5eff39915b [SWDEV-463406] Add violation_status current counter/accumulated values
Changes:
  - amdsmi_violation_status_t now includes current accumulated/counter
   values
  - Tests/wrapper now include added values
  - Removed ASIC references in header for host/bm alignment
  - Fix violation_status->per_hbm_thrm /
    violation_status->active_hbm_thrm
    calculations.

Change-Id: Ic86a7cbad5198a41018f82f6b588b83158d9ba0b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-10-04 15:56:01 -04:00
Maisam Arif a266d602c5 Bump Version to 24.7.0.0
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ife9277f6abf64ed862e11e12a6472c6e6ea4d68f
2024-09-27 18:55:19 -05:00
Charis Poag 3a4abbd8c0 [SWDEV-422195/SWDEV-440985] GPU metrics 1.6
Changes:
    - Added new GPU metrics:
      1) Violation status' (ex. PVIOL/TVIOL) accumulators
      2) XCP (Graphics Compute Partitions) statistics
      3) pcie other end recovery counter
    - CLI/API/tests changes were made accordingly

Change-Id: I589b9b1f570f25dda12d95bb501feca85da8b3bb
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-09-27 12:04:21 -05:00