Arif, Maisam
240a607904
Revert "[SWDEV-505176] Submodule Unified Header in AMDSMI"
...
This reverts commit a315b62e37.
2025-07-30 14:08:24 -05:00
Narlo, Joseph
a315b62e37
[SWDEV-505176] Submodule Unified Header in AMDSMI
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-07-30 13:37:01 -05:00
Pham, Gabriel
e2eac98496
[SWDEV-545342] Fixed amdsmi_link_type_t enumeration (#560)
...
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-07-22 18:22:49 -05:00
Bindhiya Kanangot Balakrishnan
645c313f00
[SWDEV-543308] Revert amdsmi_link_metrics structure change
...
Moved bit_rate and max_bandwidth back into links in the
amdsmi_link_metrics_t struct because this change was impacting
other teams. Modified the C and Python APIs, wrapper, and
CLI accordingly.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-07-14 13:56:26 -05:00
Narlo, Joseph
2cf6272b53
[SWDEV-541675] Remove Unnecessary API from amdsmi.h (#530)
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-07-07 11:14:27 -05:00
Galantsev, Dmitrii
9b5bbf555a
DRM - Remove FD usage
...
Change-Id: I77dfa778ccd0d39a03289c2e11cf10357566ff16
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-06-20 11:00:42 -05:00
Galantsev, Dmitrii
202b46d96f
DRM - Remove caching
...
Change-Id: I21716cc953462e385e981024f75a9a7c2d76a466
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-06-20 11:00:42 -05:00
Galantsev, Dmitrii
cb2f152205
DRM - Update to latest public
...
Change-Id: I9f7b46acbae654c377702a599c4b094fd621f101
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-06-20 11:00:42 -05:00
josnarlo
5ed9fba9be
[SWDEV-538604] Sync Unified Header and AMDSMI Comments
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-06-18 09:13:01 -05:00
josnarlo
d4a946717b
[SWDEV-537983] Fix comments about temperature units for amdsmi_get_temp_metric
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-06-13 16:51:59 -05:00
josnarlo
4aee30f49b
[SWDEV-537983] Fix comments about temperature units for amdsmi_get_temp_metric
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-06-13 16:51:59 -05:00
Maisam Arif
6da33b8ded
[SWDEV-529665] PLDM Bundle naming
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Id7f652ddc4e790027869683a4aaa3226ffc05c83
2025-06-12 02:19:37 -05:00
Maisam Arif
5763412f7d
[SWDEV-537491] Updated Copyright to aca-decode files
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I9621e4c54f3b490c6eb4cfc3e9bdfb4d489f0052
2025-06-11 20:51:51 -05:00
Saeed, Oosman
815e0252b1
[SWDEV-536417] AFID & addc decode fixes (#449)
...
* fix endian problem
* use hw_revision and flags_mask from cper section instead of hardcoded values
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-06-06 13:41:16 -05:00
Charis Poag
391451752b
[SWDEV-529030/SWDEV-531217] Fix tests & output for partitioned configurations (CPX, DPX, QPX, etc.)
...
Changes:
- Updated AMD SMI firmware to display "N/A" for unavailable firmware in partitioned environments, improving clarity.
  Example (in DPX):
  $ amd-smi firmware
  GPU: 0
      FW_LIST:
          ...
          FW 12:
              FW_ID: PM
              FW_VERSION: 00.86.39.00
  GPU: 1
      FW_LIST: N/A
- Fixed amd-smi partition not showing current partition information on
  ASICs that cannot set memory or accelerator partitions.
  $ amd-smi partition -c -m
  CURRENT_PARTITION:
  GPU_ID  MEMORY  ACCELERATOR_TYPE  ACCELERATOR_PROFILE_INDEX  PARTITION_ID
  0       NPS1    CPX               2                          0
  1       N/A     N/A               N/A                        1
  2       N/A     N/A               N/A                        2
  3       N/A     N/A               N/A                        3
  4       N/A     N/A               N/A                        4
  5       N/A     N/A               N/A                        5
  6       NPS1    SPX               0                          0
  7       NPS1    SPX               0                          0
  8       NPS1    SPX               0                          0
  MEMORY_PARTITION:
  GPU_ID  MEMORY_PARTITION_CAPS  CURRENT_MEMORY_PARTITION
  0       N/A                    NPS1
  1       N/A                    N/A
  2       N/A                    N/A
  3       N/A                    N/A
  4       N/A                    N/A
  5       N/A                    N/A
  6       N/A                    NPS1
  7       N/A                    NPS1
  8       N/A                    NPS1
- Refactored amd_smi_drm_example.cc:
  - Groups partition changes and restores the original partition settings.
  - Now handles partitioned environments, allowing the example to continue even if some APIs are not supported in partitioned configurations.
- Modified amdsmi_asic_info_t (see amdsmi_get_gpu_asic_info()) to report OAM ID as N/A if 0xFFFFFFFF (was 0xFFFF).
  This allows better handling of OAM IDs in partitioned environments (the OAM ID does not exist for non-primary nodes,
  since it's a physical identifier). It is also easier to handle in tests and example code (i.e., now consistent with the max size of the structure's value).
- Introduced amdsmi_RAII_open_FD() (internal API) to manage file descriptors using RAII, ensuring proper closure and preventing resource leaks.
  Updated the following APIs to use this function:
  - amdsmi_get_gpu_asic_info(), amdsmi_get_gpu_vram_usage(),
    amdsmi_get_gpu_vram_info(), amdsmi_get_gpu_vbios_info(),
    amdsmi_get_gpu_driver_info(), amdsmi_get_gpu_virtualization_mode()
- Updated AMD SMI test_base.cc/.h:
  - Improved output and handling for partitioned environments.
  - Added detailed ASIC information logging to align with structure changes.
  - Enhanced error messages for better context before ASSERT checks.
- Resolved test failures in partitioned environments by updating
  logic and handling for partition-specific configurations.
  Fixed tests include:
  - computepartition_read_write.cc, frequencies_read_write.cc,
    gpu_metrics_read.cc, mem_util_read.cc, memorypartition_read_write.cc,
    perf_level_read.cc, perf_level_read_write.cc, power_cap_read_write.cc,
    power_read.cc, sys_info_read.cc, gpu_busy_read.cc
Change-Id: I36e903f8fddd714c74c719459c71aba8bbb77e6f
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Resetting head + adding fixes for tests run in partitions
Change-Id: I0c1e9ac07488b50c95f3bc6d8a724e67d2c715dc
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-06-05 19:24:49 -05:00
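The RAII file-descriptor handling described in commit 391451752b can be illustrated with a minimal sketch. The real amdsmi_RAII_open_FD() is an internal API whose signature is not shown in this log, so the ScopedFdSketch class below is a hypothetical stand-in that only demonstrates the idea: the descriptor is closed when the owning object goes out of scope, including on early returns.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <utility>

// Hypothetical illustration only; not the AMD SMI implementation.
// Owns one POSIX file descriptor and closes it on destruction.
class ScopedFdSketch {
 public:
  explicit ScopedFdSketch(const char* path, int flags = O_RDONLY | O_CLOEXEC)
      : fd_(::open(path, flags)) {}
  ~ScopedFdSketch() { reset(); }

  // Copying would lead to a double close, so only moves are allowed.
  ScopedFdSketch(const ScopedFdSketch&) = delete;
  ScopedFdSketch& operator=(const ScopedFdSketch&) = delete;
  ScopedFdSketch(ScopedFdSketch&& other) noexcept
      : fd_(std::exchange(other.fd_, -1)) {}
  ScopedFdSketch& operator=(ScopedFdSketch&& other) noexcept {
    if (this != &other) {
      reset();
      fd_ = std::exchange(other.fd_, -1);
    }
    return *this;
  }

  bool valid() const { return fd_ >= 0; }  // false if open() failed
  int get() const { return fd_; }          // raw descriptor for ioctl/read calls

 private:
  void reset() {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }
  int fd_;
};
```

A caller would construct the object at the top of an API function, check valid(), and pass get() to whatever call needs the descriptor. No explicit close is required on any return path.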
Saeed, Oosman
2c3fa591b5
[SWDEV-530385] Update aca-decode with parsing fixes (#435)
...
* Update aca-decode to #4cd539d, which fixes some errors in parsing CPER files for AFID extraction
* Without this fix, some CPER input files related to GFX_poison_cpers yield garbage values
Signed-off-by: Oosman Saeed <oossaeed@amd.com>
2025-06-04 18:49:05 -05:00
Arif, Maisam
e2692ab533
Add Directory Not Found Status code to map to ENOTDIR (#238)
...
* Corrected ecc count error return
* Added directory not found error code
* Added ENOTDIR mapping to RSMI_STATUS_DIRECTORY_NOT_FOUND in ErrnoToRsmiStatus
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-06-03 17:53:33 -05:00
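The errno mapping added in commit e2692ab533 can be sketched as follows. Only the names ErrnoToRsmiStatus, ENOTDIR, and RSMI_STATUS_DIRECTORY_NOT_FOUND come from the commit message. The enum and function below are hypothetical stand-ins, and the other cases are assumptions added for illustration rather than the actual contents of the library's mapping.

```cpp
#include <cerrno>

// Hypothetical status codes for illustration; the real rsmi_status_t
// values and names (other than the directory-not-found case) may differ.
enum class StatusSketch {
  kSuccess,
  kPermissionDenied,
  kFileNotFound,
  kDirectoryNotFound,  // stands in for RSMI_STATUS_DIRECTORY_NOT_FOUND
  kUnknownError,
};

// Translates an errno value into a status code, in the spirit of
// ErrnoToRsmiStatus. ENOTDIR gets its own code instead of falling
// through to the generic error.
StatusSketch ErrnoToStatusSketch(int err) {
  switch (err) {
    case 0:
      return StatusSketch::kSuccess;
    case EACCES:
    case EPERM:
      return StatusSketch::kPermissionDenied;
    case ENOENT:
      return StatusSketch::kFileNotFound;
    case ENOTDIR:
      return StatusSketch::kDirectoryNotFound;
    default:
      return StatusSketch::kUnknownError;
  }
}
```

Giving ENOTDIR a dedicated status lets callers distinguish a missing or invalid directory component from a missing file.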
Narlo, Joseph
c0c4e021ea
[SWDEV-532069] Doxygen Not Picking Non-Documented Values (#362)
...
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Deepak Mewar <deepak.mewar@amd.com>
2025-06-03 17:24:44 -05:00
Narlo, Joseph
ce7d6dfe61
[SWDEV-532769] amd-smi APIs mismatch with documentation (#428)
...
* Populated socket_power to get power info
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-06-03 17:12:13 -05:00
Joseph Narlo
ee43ec71e8
[SWDEV-522996] Syncing Unified Header and AMDSMI
...
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
2025-06-02 13:44:33 -05:00
Maisam Arif
c89b5db09d
Deprecated PASID
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ib008f80f3d736172079358c0ceb3ebca87340d28
2025-05-30 20:48:29 -05:00
Maisam Arif
cebb0799cb
[SWDEV-488303] Fixed process list information source
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Iec3416cb5ca1bdd806c3225b514bbf3dbf8c0d2e
2025-05-30 20:48:29 -05:00
Maisam Arif
cc4dfd834f
Version Bump 26.0.0
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I29ea6fa781dfc338a60b390ff498c46b4a1efe52
2025-05-30 20:48:29 -05:00
Kanangot Balakrishnan, Bindhiya
2eff0b3764
[SWDEV-530633] Use gpu_metric speed and BW for xgmi (#366)
...
The xgmi command was showing the PCIe bit rate and bandwidth instead of the XGMI values. Corrected the API to get XGMI data from the GPU metrics.
Added a Python API for amdsmi_get_link_metrics. Modified the amdsmi_link_metrics struct.
Added a check to confirm non-zero partition got xgmi command.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-30 16:51:11 -05:00
Arif, Maisam
42441c78ea
[SWDEV-488303] Adjusted process vram_mem data source (#411)
...
* [SWDEV-488303] Adjusted process vram_mem data source
* Standardized sscanf format strings
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-05-29 23:26:12 -05:00
Arif, Maisam
0fdaebdbaa
[SWDEV-488303] Updated CU occupancy for per-process retrieval (#243)
...
Change-Id: I2990597c6dd4b2e8cf3e11ce60f72049ebdd9a8c
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-29 20:35:27 -05:00
Maisam Arif
fba62e2270
[SWDEV-534707] Adjust power value documentation
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I1c4516e403715b9a1fe9c78fae94848c89daa920
2025-05-29 18:55:44 -05:00
Liu, Shuzhou (Bill)
970560fc7c
[SWDEV-520665] Add support for board voltage (#303)
...
* Add the API and CLI to show the board voltage.
---------
Change-Id: Icb25bd653bb1d004704b5a21b378ca31b2b242c7
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
2025-05-29 18:55:08 -05:00
Kanangot Balakrishnan, Bindhiya
e7f19b36f0
[SWDEV-463406] ViolationStatus Changes (#288)
...
* Expanded Violation Status tracking for GPU metrics 1.8
* Added new fields to `amdsmi_violation_status_t` and related interfaces for enhanced violation statuses
---------
Signed-off-by: Kanangot Balakrishnan, Bindhiya <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>
2025-05-29 13:26:21 -05:00
Mewar, Deepak
9a49e454fd
[SWDEV-512393] Fix for incorrect cpu set size input (#399)
...
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
2025-05-29 12:14:03 -05:00
Pryor, Adam
d0a89393df
Remove ring hang (#391)
...
Change-Id: I856cd0949d3661911ab9302148aa1bc6e72abeed
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-05-29 11:58:46 -05:00
Maisam Arif
2481573184
Removed leftover AMDSMI_MAX_DRIVER_VERSION_LENGTH
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Iee95728e6eb6d7962ed658b9a77feccb88e24e92
2025-05-29 10:34:21 -05:00
Narlo, Joseph
4cd0f3391e
[SWDEV-522996] Syncing Unified Header and AMDSMI (#355)
...
* Update doxygen help text and formatting
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-05-28 19:06:10 -05:00
Narlo, Joseph
b6d638d942
[SWDEV-532125] Remove_Unused_Definitions (#385)
...
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
2025-05-28 18:49:08 -05:00
Narlo, Joseph
7c29b4eab8
[SWDEV-532131] Update String Lengths (#383)
...
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
2025-05-28 18:31:30 -05:00
Narlo, Joseph
9862db63dd
[SWDEV-532129] Update amdsmi asic info (#369)
...
* Added `subsystem_id` to `amdsmi_get_gpu_asic_info`
---------
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
2025-05-28 18:26:58 -05:00
Narlo, Joseph
f3a5cc9cd5
[SWDEV-533941] Align P2P input struct (#395)
...
* Removed `amdsmi_io_link_type_t` and replaced it with the already implemented `amdsmi_link_type_t`
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-05-28 18:22:19 -05:00
Narlo, Joseph
38a1fadf44
[SWDEV-535200] Remove deprecated function amdsmi_get_power_info_v2 (#397)
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
2025-05-28 18:09:13 -05:00
Narlo, Joseph
7b3c85e970
[SWDEV-534438] Update structure amdsmi_bdf_t (#388)
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
2025-05-28 18:05:43 -05:00
Narlo, Joseph
f71ae88956
[SWDEV-529483] Get Vram Vendor Name from Driver (#323)
...
* Update to remove vram enum and instead use the string directly from the driver.
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-28 17:57:49 -05:00
Maisam Arif
cebc512b1a
Spellcheck
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I3842ca7552c8d3525ac7fee8c94b15cfdd7defdd
2025-05-27 13:59:23 -05:00
Pham, Gabriel
c40d4291f6
Updated docs with new KFD events (#382)
...
* Updated docs with new KFD events
---------
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
2025-05-27 12:21:38 -05:00
Daniel Oliveira
fe9b6eeb49
[SWDEV-529665] Add PLDM Bundle version
...
feat: Report PLDM Bundle from SMC to IB
Code changes related to the following:
* APIs
* CLI
* Unit tests
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Change-Id: I35ccf01eb612ca80e3ae6b72039085c18c989222
2025-05-20 01:37:00 -05:00
Mewar, Deepak
b999f86611
[SWDEV-512393] Added amdsmi_get_cpu_affinity_with_scope (#198)
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
2025-05-20 01:06:09 -05:00
Pryor, Adam
51e99965b3
[SWDEV-527092] - Fix ringhang event removal (#372)
...
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-05-16 16:41:31 -05:00
Pryor, Adam
8713305f80
[SWDEV-527092] - Process Start/Stop event addition (#368)
...
- Added more events to `amdsmi_evt_notification_type_t`
Change-Id: I6a256fe828e4bec3197c7fecbed374ab17c6f850
Signed-off-by: Adam Pryor <Adam.Pryor@amd.com>
2025-05-16 11:01:15 -05:00
Saeed, Oosman
1bb1f8acc2
[SWDEV-522623] Add afid functionality to API and CLI (#330)
...
Change-Id: I015bde926491d54e09da8f39b05650515711e09f
[SWDEV-522623] Add afid functionality to API and CLI
Change-Id: I015bde926491d54e09da8f39b05650515711e09f
Signed-off-by: Oosman Saeed <oossaeed@amd.com>
Co-authored-by: Oosman Saeed <oossaeed@amd.com>
2025-05-16 10:49:56 +08:00
Arif, Maisam
ace3b0901a
Version & Doc update (#343)
...
Change-Id: Ibf8e1809913e30aba4b21ba889b72e5db7205736
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-08 12:19:04 -05:00
Poag, Charis
b5a43b7744
[SWDEV-528647/SWDEV-528450] Reduce API load times and libdrm/libdrm_amdgpu dynamic loading (#333)
...
Changes:
- Removed libdrm/libdrm_amdgpu dependencies
- Added/updated new internal libdrm/libdrm_amdgpu/xf86drm APIs
  to allow our APIs to reference before dynamic loading
  the libdrm/libdrm_amdgpu libraries:
  1. Updated amdgpu_drm.h to what's seen in mainline
  2. Added xf86drm.h to what's seen in mainline
- Modified internal DRM capabilities:
  1. Require each API to independently connect to libdrm/libdrm_amdgpu
     + validate API handles responses accordingly
  2. Initialization of AMD SMI no longer has as strong of a tie to
     libdrm
- Updated internal implementations of several APIs which have
  connections to libdrm/libdrm_amdgpu or APIs which have conflicts
  with open libdrm/libdrm_amdgpu connections:
  1. amdsmi_init()
  2. amdsmi_get_gpu_vram_usage()
  3. amdsmi_get_gpu_asic_info()
  4. amdsmi_get_gpu_vram_info()
  5. amdsmi_get_gpu_vbios_info()
  6. amdsmi_get_gpu_driver_info()
  7. amdsmi_get_gpu_virtualization_mode()
  8. amdsmi_set_gpu_memory_partition()
  9. amdsmi_set_gpu_memory_partition_mode()
- Cleaned up affected tests/APIs
Change-Id: I96e2cf1b06b0cfee1b01a5e991ccc6116c4245a8
2025-05-02 21:58:53 -05:00
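The per-API dynamic loading described in commit b5a43b7744 can be sketched with POSIX dlopen/dlsym. This is an illustration under assumptions, not the AMD SMI code: DynLibSketch and DrmAvailableSketch are hypothetical names, and the function-pointer type used for the lookup is a placeholder that is never called. The library name libdrm.so.2 and the symbol drmGetVersion are real libdrm names, but whether AMD SMI resolves exactly these is not stated in the log.

```cpp
#include <dlfcn.h>

// Hypothetical RAII wrapper around a dynamically loaded shared library.
class DynLibSketch {
 public:
  explicit DynLibSketch(const char* name)
      : handle_(::dlopen(name, RTLD_LAZY | RTLD_LOCAL)) {}
  ~DynLibSketch() {
    if (handle_ != nullptr) ::dlclose(handle_);
  }
  DynLibSketch(const DynLibSketch&) = delete;
  DynLibSketch& operator=(const DynLibSketch&) = delete;

  bool loaded() const { return handle_ != nullptr; }

  // Returns the symbol cast to Fn, or nullptr if the library or the
  // symbol is unavailable. Callers must validate before calling.
  template <typename Fn>
  Fn symbol(const char* name) const {
    if (handle_ == nullptr) return nullptr;
    return reinterpret_cast<Fn>(::dlsym(handle_, name));
  }

 private:
  void* handle_;
};

// Each API call connects to the library on its own and validates the
// handle and symbol, so initialization does not depend on libdrm.
bool DrmAvailableSketch() {
  DynLibSketch drm("libdrm.so.2");
  if (!drm.loaded()) return false;    // library absent: report, do not crash
  using PlaceholderFn = void* (*)(int);  // placeholder type, never invoked here
  return drm.symbol<PlaceholderFn>("drmGetVersion") != nullptr;
}
```

With this shape, a missing libdrm turns into an ordinary "not supported" result for the affected call instead of a load-time failure of the whole library. On older glibc versions the program may need to link against libdl.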
Narlo, Joseph
d5ce95573f
[SWDEV-522996] Sync Unified Header and AMDSMI (#305)
...
Sync Unified Header and AMDSMI
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-04-24 13:31:08 -05:00