Charis Poag
df6de25624
[SWDEV-529030/SWDEV-531217] Fix tests & output for partitioned configurations (CPX, DPX, QPX, etc.)
...
Changes:
- Updated AMD SMI firmware to display "N/A" for unavailable firmware in partitioned environments, improving clarity.
Example (in DPX):
$ amd-smi firmware
GPU: 0
FW_LIST:
...
FW 12:
FW_ID: PM
FW_VERSION: 00.86.39.00
GPU: 1
FW_LIST: N/A
- Fixed amd-smi partition not showing current partition information on
ASICs that cannot set memory or accelerator partitions.
$ amd-smi partition -c -m
CURRENT_PARTITION:
GPU_ID MEMORY ACCELERATOR_TYPE ACCELERATOR_PROFILE_INDEX PARTITION_ID
0 NPS1 CPX 2 0
1 N/A N/A N/A 1
2 N/A N/A N/A 2
3 N/A N/A N/A 3
4 N/A N/A N/A 4
5 N/A N/A N/A 5
6 NPS1 SPX 0 0
7 NPS1 SPX 0 0
8 NPS1 SPX 0 0
MEMORY_PARTITION:
GPU_ID MEMORY_PARTITION_CAPS CURRENT_MEMORY_PARTITION
0 N/A NPS1
1 N/A N/A
2 N/A N/A
3 N/A N/A
4 N/A N/A
5 N/A N/A
6 N/A NPS1
7 N/A NPS1
8 N/A NPS1
- Refactored amd_smi_drm_example.cc:
- Grouped partition changes; the example now restores the original partition settings.
- Now handles partitioned environments, allowing the example to continue even if some APIs are not supported in partitioned configurations.
- Modified amdsmi_asic_info_t (see amdsmi_get_gpu_asic_info()) to report OAM ID as N/A if 0xFFFFFFFF (was 0xFFFF).
This allows better handling of OAM IDs in partitioned environments (the OAM ID does not exist for non-primary nodes,
since it is a physical identifier). It is also easier to handle in tests and example code (i.e., the sentinel is now consistent with the maximum value of the structure's field).
- Introduced amdsmi_RAII_open_FD() (internal API) to manage file descriptors using RAII, ensuring proper closure and preventing resource leaks.
Updated the following APIs to use this function:
- amdsmi_get_gpu_asic_info(), amdsmi_get_gpu_vram_usage(),
amdsmi_get_gpu_vram_info(), amdsmi_get_gpu_vbios_info(),
amdsmi_get_gpu_driver_info(), amdsmi_get_gpu_virtualization_mode()
- Updated AMD SMI test_base.cc/.h:
- Improved output and handling for partitioned environments.
- Added detailed ASIC information logging to align with structure changes.
- Enhanced error messages for better context before ASSERT checks.
- Resolved test failures in partitioned environments by updating
logic and handling for partition-specific configurations.
Fixed tests include:
- computepartition_read_write.cc, frequencies_read_write.cc,
gpu_metrics_read.cc, mem_util_read.cc, memorypartition_read_write.cc,
perf_level_read.cc, perf_level_read_write.cc, power_cap_read_write.cc,
power_read.cc, sys_info_read.cc, gpu_busy_read.cc
Change-Id: I36e903f8fddd714c74c719459c71aba8bbb77e6f
Signed-off-by: Charis Poag <Charis.Poag@amd.com >
Resetting head + adding fixes for tests run in partitions
Change-Id: I0c1e9ac07488b50c95f3bc6d8a724e67d2c715dc
Signed-off-by: Charis Poag <Charis.Poag@amd.com >
[ROCm/amdsmi commit: 391451752b ]
2025-06-05 19:24:49 -05:00
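The OAM ID change described in the commit above can be illustrated with a minimal Python sketch. Only the sentinel value 0xFFFFFFFF and the "N/A" rendering come from the commit message; the constant name `OAM_ID_NOT_AVAILABLE` and the helper `format_oam_id` are hypothetical and are not part of the AMD SMI API.

```python
# Sentinel taken from the commit message: 0xFFFFFFFF means "no OAM ID".
# The constant and function names below are hypothetical, for illustration only.
OAM_ID_NOT_AVAILABLE = 0xFFFFFFFF


def format_oam_id(oam_id: int) -> str:
    """Return a printable OAM ID, or "N/A" when the sentinel value is set."""
    if oam_id == OAM_ID_NOT_AVAILABLE:
        return "N/A"
    return str(oam_id)
```

Under this sketch, the old 16-bit sentinel 0xFFFF is treated as an ordinary value, so callers that still compare against 0xFFFF would need updating.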
Pham, Gabriel
f12b070e14
[SWDEV-536184] Removed extra debug print statement ( #447 )
...
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: f0233eb664 ]
2025-06-05 17:50:56 -05:00
gabrpham_amdeng
f30205b296
[SWDEV-536184] Modified KFD fallback condition for getting VRAM to include sysfs read failures
...
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: 7130de3058 ]
2025-06-05 01:49:16 -05:00
Bindhiya Kanangot Balakrishnan
60a86179b9
[SWDEV-534746] Generate valid json output for partition command
...
The amd-smi partition --json output was not valid JSON.
Updated the command so that its output is valid JSON.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com >
[ROCm/amdsmi commit: 872c58b7a3 ]
2025-06-05 01:40:52 -05:00
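One way to check a fix like the one above is to feed the command's output to a strict JSON parser. The sketch below is an assumption about how such a check could look, not the project's test code; the helper name `is_valid_json` and the sample strings are hypothetical, and the commit message does not say which kind of malformation was actually present.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as a single JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical samples: two top-level objects back to back are not one
# valid JSON document, while a single enclosing list is.
broken = '{"gpu": 0}{"gpu": 1}'
fixed = '[{"gpu": 0}, {"gpu": 1}]'
```

In practice the text would come from capturing the output of `amd-smi partition --json`, for example with `subprocess.run(..., capture_output=True, text=True)`, and passing its `stdout` to the helper.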
Saeed, Oosman
99df131155
[SWDEV-530385] Update aca-decode with parsing fixes ( #435 )
...
* Update aca-decode to #4cd539d, which fixes some errors in parsing cper files for afid extraction
* Without this fix, we get garbage values for some cper input files relating to GFX_poison_cpers
Signed-off-by: Oosman Saeed <oossaeed@amd.com >
[ROCm/amdsmi commit: 2c3fa591b5 ]
2025-06-04 18:49:05 -05:00
Arif, Maisam
e38de3932f
Add Directory Not Found Status code to map to ENOTDIR ( #238 )
...
* Corrected ecc count error return
* Added directory not found error code
* Added ENOTDIR mapping to RSMI_STATUS_DIRECTORY_NOT_FOUND in ErrnoToRsmiStatus
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: e2692ab533 ]
2025-06-03 17:53:33 -05:00
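The errno mapping added in the commit above can be sketched in Python as a lookup table. Only the ENOTDIR to RSMI_STATUS_DIRECTORY_NOT_FOUND pairing comes from the commit message; the real `ErrnoToRsmiStatus` is C++ code, and the helper name `errno_to_status`, the success entry, and the fallback value shown here are placeholders for illustration.

```python
import errno

# Only the ENOTDIR entry is taken from the commit message. The success entry
# and the fallback used below are illustrative placeholders.
_ERRNO_TO_STATUS = {
    0: "RSMI_STATUS_SUCCESS",
    errno.ENOTDIR: "RSMI_STATUS_DIRECTORY_NOT_FOUND",
}


def errno_to_status(err: int) -> str:
    """Translate an errno value to a status name, with a generic fallback."""
    return _ERRNO_TO_STATUS.get(err, "RSMI_STATUS_UNKNOWN_ERROR")
```

A table keeps each errno in one place, so adding a new status code such as the directory-not-found case is a one-line change.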
Narlo, Joseph
ba8d2f0d84
[SWDEV-532069] Doxygen Not Picking Non-Documented Values ( #362 )
...
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com >
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com >
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com >
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com >
Co-authored-by: Deepak Mewar <deepak.mewar@amd.com >
[ROCm/amdsmi commit: c0c4e021ea ]
2025-06-03 17:24:44 -05:00
Narlo, Joseph
4eb6d34df0
[SWDEV-532769] amd-smi APIs mismatch with documentation ( #428 )
...
* Populated socket_power to get power info
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: ce7d6dfe61 ]
2025-06-03 17:12:13 -05:00
Bindhiya Kanangot Balakrishnan
851d0d015d
[SWDEV-534745] Generate valid json output for xgmi command
...
The amd-smi xgmi --json output was not valid JSON.
Updated the command so that its output is valid JSON.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com >
[ROCm/amdsmi commit: 8f943b03e1 ]
2025-06-03 12:48:02 -05:00
Saeed, Oosman
877c7b1bda
[SWDEV-530385] show afids on each line of printout ( #422 )
...
* show afids on each line of printout
* clean up afids and cper code
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: fab13c5b60 ]
2025-06-02 17:22:10 -05:00
Pham, Gabriel
3d75b7881a
[SWDEV-446039] Added Flat Process table to default output ( #425 )
...
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: 91021da055 ]
2025-06-02 17:15:15 -05:00
Kanangot Balakrishnan, Bindhiya
a3521ea6ed
[SWDEV-519061] xgmi command output shows zero for all xgmi acc read/write data in the first column ( #392 )
...
The xgmi read and write accumulated data from the gpu metric index
is based on the sysfs xgmi_port_num file. Mapped these two to display
read and write with respect to src_gpu vs. dst_gpu.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com >
[ROCm/amdsmi commit: 8ed52616ad ]
2025-06-02 14:01:06 -05:00
Justin Williams
d8b32bf2ee
[SWDEV-533596] CI - Fixed Docs
...
Signed-off-by: Justin Williams <Justin.Williams@amd.com >
[ROCm/amdsmi commit: bf0448ff96 ]
2025-06-02 13:48:01 -05:00
Joseph Narlo
3d0f98c16d
[SWDEV-522996] Syncing Unified Header and AMDSMI
...
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com >
[ROCm/amdsmi commit: ee43ec71e8 ]
2025-06-02 13:44:33 -05:00
Maisam Arif
cd11d4f051
Updated Changelog
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: I10efa8ed10288d3445a330ad27081d1f03113b38
[ROCm/amdsmi commit: 996917e9bc ]
2025-05-30 20:48:29 -05:00
Maisam Arif
00ad72baf9
Deprecated PASID
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: Ib008f80f3d736172079358c0ceb3ebca87340d28
[ROCm/amdsmi commit: c89b5db09d ]
2025-05-30 20:48:29 -05:00
Maisam Arif
16d60f3411
[SWDEV-488303] Fixed process list information source
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: Iec3416cb5ca1bdd806c3225b514bbf3dbf8c0d2e
[ROCm/amdsmi commit: cebb0799cb ]
2025-05-30 20:48:29 -05:00
Maisam Arif
5324134708
Version Bump 26.0.0
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: I29ea6fa781dfc338a60b390ff498c46b4a1efe52
[ROCm/amdsmi commit: cc4dfd834f ]
2025-05-30 20:48:29 -05:00
gabrpham_amdeng
42238ef83c
Updated CLI Tool Help
...
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: c8f33c96c3 ]
2025-05-30 20:10:32 -05:00
dependabot[bot]
2f803473e1
Bump tornado from 6.4.2 to 6.5.1 in /docs/sphinx ( #418 )
...
Bumps [tornado](https://github.com/tornadoweb/tornado ) from 6.4.2 to 6.5.1.
- [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst )
- [Commits](https://github.com/tornadoweb/tornado/compare/v6.4.2...v6.5.1 )
---
updated-dependencies:
- dependency-name: tornado
dependency-version: 6.5.1
dependency-type: indirect
...
[ROCm/amdsmi commit: dd81cfd688 ]
2025-05-30 19:53:58 -05:00
gabrpham_amdeng
c4f8ba1178
Suppressed help text of default command
...
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: 1fa4cdacf3 ]
2025-05-30 19:53:14 -05:00
Pham, Gabriel
d229f86108
[SWDEV-511822] Added group check to default command ( #415 )
...
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: daf74d1cd6 ]
2025-05-30 18:40:18 -05:00
Kanangot Balakrishnan, Bindhiya
f12c72a4e2
[SWDEV-530633] Use gpu_metric speed and BW for xgmi ( #366 )
...
The xgmi command was showing PCIe bit rate and bandwidth instead of xgmi values. Corrected the API to get xgmi data from the gpu metric.
Added a Python API for amdsmi_get_link_metrics. Modified the amdsmi_link_metrics struct.
Added check to confirm non zero partition got xgmi command.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com >
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: 2eff0b3764 ]
2025-05-30 16:51:11 -05:00
Castillo, Juan
c830bb4d74
[SWDEV-534728] Fixed deep_sleep status does not work with --json flag ( #413 )
...
- In JSON output mode, the .rstrip function does not work because clk_value is a dict object.
- The clk_value is now checked for dict instance before extracting the value.
- If clk_value is a dict, the .get() function is used to extract the value.
- Otherwise it is a string object, which uses .split() to extract the value.
- If clk_value is < min_clk_value, deep_sleep is set to ENABLED.
- Initialize clk_value and min_clk_value to 0 for each loop.
- Fix if/else for better readability.
---------
Signed-off-by: Juan Castillo <juan.castillo@amd.com >
[ROCm/amdsmi commit: 2e8aaf02c9 ]
2025-05-30 16:45:32 -05:00
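The dict-versus-string handling described in the commit above can be sketched as follows. This is a minimal illustration, not the CLI's actual code: the helper names `_clock_to_int` and `deep_sleep_state`, the `"value"` key, the `"132 MHz"` string layout, and the `DISABLED` result are assumptions that the commit message does not confirm.

```python
def _clock_to_int(clk) -> int:
    """Extract a numeric clock from either output representation.

    Assumed shapes (not confirmed by the commit message):
      - JSON mode: a dict such as {"value": 132, "unit": "MHz"}
      - human-readable mode: a string such as "132 MHz"
    """
    if isinstance(clk, dict):
        # .rstrip()/.split() are not available on a dict, so use .get().
        return int(clk.get("value", 0))
    # Otherwise treat it as a string and take the leading number.
    return int(str(clk).split()[0])


def deep_sleep_state(clk, min_clk) -> str:
    """Report ENABLED when the current clock is below the minimum clock."""
    clk_value = _clock_to_int(clk)
    min_clk_value = _clock_to_int(min_clk)
    return "ENABLED" if clk_value < min_clk_value else "DISABLED"
```

The sketch does not handle non-numeric strings such as "N/A"; a real implementation would need a guard for that case.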
Arif, Maisam
da430dec05
[SWDEV-488303] Adjusted process vram_mem data source ( #411 )
...
* [SWDEV-488303] Adjusted process vram_mem data source
* Standardized sscanf format strings
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: 42441c78ea ]
2025-05-29 23:26:12 -05:00
Maisam Arif
b2b6779593
[SWDEV-523247] Corrected amdsmi_get_gpu_vram_usage total
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: I0f8bb067bf34f64d1b8d41e2a89d3a79a6745990
[ROCm/amdsmi commit: 876f3976e0 ]
2025-05-29 21:30:00 -05:00
Arif, Maisam
465f2e6a41
[SWDEV-488303] Updated CU occupancy for per-process retrieval ( #243 )
...
Change-Id: I2990597c6dd4b2e8cf3e11ce60f72049ebdd9a8c
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: 0fdaebdbaa ]
2025-05-29 20:35:27 -05:00
Maisam Arif
418ec78c21
[SWDEV-534707] Adjust power value documentation
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: I1c4516e403715b9a1fe9c78fae94848c89daa920
[ROCm/amdsmi commit: fba62e2270 ]
2025-05-29 18:55:44 -05:00
Liu, Shuzhou (Bill)
ff2e230a34
[SWDEV-520665] Add support for board voltage ( #303 )
...
* Add the API and CLI to show the board voltage.
---------
Change-Id: Icb25bd653bb1d004704b5a21b378ca31b2b242c7
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com >
Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com >
[ROCm/amdsmi commit: 970560fc7c ]
2025-05-29 18:55:08 -05:00
Narlo, Joseph
fc54da7679
[SWDEV-489696] Improve AMD SMI Python APIs Functional and Unit Testing ( #408 )
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: 13148c5d8e ]
2025-05-29 17:18:08 -05:00
Pham, Gabriel
c283cccf79
[SWDEV-511822] Created default command for amdsmi ( #348 )
...
* Added degree symbol and fixed power usage
* fixed default command
---------
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: bc158d2b51 ]
2025-05-29 17:14:58 -05:00
Saeed, Oosman
5874eb5151
[SWDEV-533349] codeQL - use strncpy instead of strcpy ( #405 )
...
use strncpy instead of strcpy
Co-authored-by: Oosman Saeed <oossaeed@amd.com >
[ROCm/amdsmi commit: 945e4a159c ]
2025-05-29 15:55:45 -05:00
Kanangot Balakrishnan, Bindhiya
51a7a5dc59
[SWDEV-463406] Update python doc for amdsmi_get_violation_status ( #406 )
...
* Updated the amdsmi_get_violation_status python API doc with newly added fields.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com >
[ROCm/amdsmi commit: 8e486c832b ]
2025-05-29 14:59:16 -05:00
Justin Williams
928c95f6eb
[SWDEV-533596] CI - Added Docs Generation
...
Signed-off-by: Justin Williams <Justin.Williams@amd.com >
[ROCm/amdsmi commit: 83185695c9 ]
2025-05-29 13:46:13 -05:00
Kanangot Balakrishnan, Bindhiya
2155c96c9e
[SWDEV-463406] ViolationStatus Changes ( #288 )
...
* Expanded Violation Status tracking for GPU metrics 1.8
* Added new fields to `amdsmi_violation_status_t` and related interfaces for enhanced violation statuses
---------
Signed-off-by: Kanangot Balakrishnan, Bindhiya <Bindhiya.KanangotBalakrishnan@amd.com >
Signed-off-by: Charis Poag <Charis.Poag@amd.com >
Co-authored-by: Charis Poag <Charis.Poag@amd.com >
[ROCm/amdsmi commit: e7f19b36f0 ]
2025-05-29 13:26:21 -05:00
Mewar, Deepak
4b0397fd80
[SWDEV-512393] Fix for incorrect cpu set size input ( #399 )
...
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com >
[ROCm/amdsmi commit: 9a49e454fd ]
2025-05-29 12:14:03 -05:00
Saeed, Oosman
b793acaa71
[SWDEV-530385] Fix CPER "--follow" & "--file-limit" ( #380 )
...
* --follow option fix & --file_limit option added
* change --file_limit and --cper_file to --file-limit and --cper-file
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: 91c9969b72 ]
2025-05-29 11:59:55 -05:00
Pryor, Adam
69fde31369
Remove ring hang ( #391 )
...
Change-Id: I856cd0949d3661911ab9302148aa1bc6e72abeed
Signed-off-by: adapryor <Adam.pryor@amd.com >
[ROCm/amdsmi commit: d0a89393df ]
2025-05-29 11:58:46 -05:00
Poag, Charis
b88ee7cc5a
Removed backwards compatibility for jpeg_activity/vcn_activity ( #357 )
...
Updated:
- Removed backwards compatibility for jpeg_activity/vcn_activity
- On supported ASICs users can use XCP (partition) stat values:
jpeg_busy and vcn_busy
Signed-off-by: Charis Poag <Charis.Poag@amd.com >
[ROCm/amdsmi commit: f89a8c895c ]
2025-05-29 11:58:06 -05:00
Narlo, Joseph
fea816ee47
[SWDEV-535389] Removed unused definition ( #402 )
...
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com >
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: 8724658c14 ]
2025-05-29 10:48:16 -05:00
Maisam Arif
3db6b8b36c
Removed leftover AMDSMI_MAX_DRIVER_VERSION_LENGTH
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: Iee95728e6eb6d7962ed658b9a77feccb88e24e92
[ROCm/amdsmi commit: 2481573184 ]
2025-05-29 10:34:21 -05:00
Narlo, Joseph
cd3128f997
[SWDEV-522996] Syncing Unified Header and AMDSMI ( #355 )
...
* Update doxygen help text and formatting
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: 4cd0f3391e ]
2025-05-28 19:06:10 -05:00
Narlo, Joseph
8d6253d772
[SWDEV-532125] Remove_Unused_Definitions ( #385 )
...
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: b6d638d942 ]
2025-05-28 18:49:08 -05:00
Narlo, Joseph
41522f665f
[SWDEV-532131] Update String Lengths ( #383 )
...
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: 7c29b4eab8 ]
2025-05-28 18:31:30 -05:00
Narlo, Joseph
d2bf77401e
[SWDEV-532129] Update amdsmi asic info ( #369 )
...
* Added `subsystem_id` to `amdsmi_get_gpu_asic_info`
---------
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: 9862db63dd ]
2025-05-28 18:26:58 -05:00
Narlo, Joseph
1fbddb6dcc
[SWDEV-533941] Align P2P input struct ( #395 )
...
* Removed `amdsmi_io_link_type_t` and replaced it with the already implemented `amdsmi_link_type_t`
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: f3a5cc9cd5 ]
2025-05-28 18:22:19 -05:00
Narlo, Joseph
59f5827164
[SWDEV-535200] Remove deprecated function amdsmi_get_power_info_v2 ( #397 )
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: 38a1fadf44 ]
2025-05-28 18:09:13 -05:00
Narlo, Joseph
268c4e59ed
[SWDEV-534438] Update structure amdsmi_bdf_t ( #388 )
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: 7b3c85e970 ]
2025-05-28 18:05:43 -05:00
Narlo, Joseph
cd71942678
[SWDEV-529483] Get Vram Vendor Name from Driver ( #323 )
...
* Update to remove vram enum and instead use the string directly from the driver.
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com >
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: f71ae88956 ]
2025-05-28 17:57:49 -05:00
Maisam Arif
28d1aaf066
Spellcheck
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: I3842ca7552c8d3525ac7fee8c94b15cfdd7defdd
[ROCm/amdsmi commit: cebc512b1a ]
2025-05-27 13:59:23 -05:00