Narlo, Joseph
c5e604f357
[SWDEV-489696] Improve AMD SMI Python APIs Functional and Unit Testing ( #468 )
...
* Adding python unit tests
* Remove duplicate functions definitions
* Added missing classes for __init__ for py-interface
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: 7c0802889b ]
2025-06-19 16:38:34 -05:00
Arif, Maisam
6123abe733
[SWDEV-538786] Fix ecc counts returning file error ( #494 )
...
Change-Id: I5cea584289df95e89b6151d549bf69e4c3e50d22
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: 967e879861 ]
2025-06-19 15:24:03 -05:00
Castillo, Juan
4a55abaa05
[SWDEV-531904] - Added GPU Cache Read Tests ( #464 )
...
New:
- gpu_cache_read.h and gpu_cache_read.cc
- Test reads GPU cache info and asserts valid structure
Updated:
- integration_test.py
- Added test_gpu_cache_info() and asserts valid structure
- test_get_gpu_compute_partition() to loop through all devices when a test fails or passes
Added:
- test_get_gpu_compute_partition_returns_string() to integration_test.py
- This test displays the current compute partition for each bdf
---------
Signed-off-by: Juan Castillo <juan.castillo@amd.com >
Signed-off-by: Castillo, Juan <Juan.Castillo@amd.com >
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: 470c62f887 ]
2025-06-19 15:23:34 -05:00
Narlo, Joseph
f543f77e30
[SWDEV-537038] amd-smi-lib build failing Fix for integration_test.py ( #496 )
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: 57a749f457 ]
2025-06-19 15:12:31 -05:00
Pham, Gabriel
aa95feee60
[SWDEV-531386] Changed source of metric GFX and MEM min and max clk to pp_od_clk_voltage ( #453 )
...
* Made corrections to reading of pp_od_clk_voltage
* Added fall back to pp_dpm files if pp_od_clk_voltage doesn't exist
---------
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: 4262aee8f5 ]
2025-06-19 15:00:45 -05:00
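Note on aa95feee60: the fallback described in that commit can be sketched as below. This is a minimal illustration, not the library's implementation. The helper name pick_clock_source and the choice of pp_dpm_sclk as the fallback file are assumptions; the commit message only says "pp_dpm files".

```python
import os


def pick_clock_source(device_dir, preferred="pp_od_clk_voltage",
                      fallbacks=("pp_dpm_sclk",)):
    """Return the path of the first candidate file that exists, else None.

    Hypothetical helper: the preferred file is tried first and the
    fallback files are only considered when it is missing.
    """
    for name in (preferred, *fallbacks):
        path = os.path.join(device_dir, name)
        if os.path.isfile(path):
            return path
    return None
```

Keeping the candidates in one ordered tuple makes the priority explicit and lets further fallbacks be appended without changing the loop.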
Galantsev, Dmitrii
a480b2869d
rsmi_init: Do not complain loudly when no driver is found ( #74 )
...
Co-authored-by: Samuel Thibault <samuel.thibault@ens-lyon.org >
[ROCm/amdsmi commit: ca52da194d ]
2025-06-19 13:22:48 -05:00
Narlo, Joseph
154d266abc
[SWDEV-482203] amd-smi Usage basics for C Library Multiple doc errors ( #477 )
...
* Added finding rocm include and library paths in code examples
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: db3d763aad ]
2025-06-19 11:25:57 -05:00
josnarlo
0862dd11fb
[SWDEV-537038] amd_smi-lib build failing Fix for integration_test.py
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: 99b2bfbc61 ]
2025-06-19 11:23:25 -05:00
Justin Williams
31df8b46bd
Adjusted amd-smi set --compute-partition docs
...
Signed-off-by: Justin Williams <juwillia@amd.com >
[ROCm/amdsmi commit: 81d58f06d1 ]
2025-06-19 10:58:04 -05:00
gabrpham_amdeng
771e3019ad
Adjusted CU % logic to be more robust
...
[ROCm/amdsmi commit: 9729aba695 ]
2025-06-19 10:57:19 -05:00
gabrpham_amdeng
d049815647
Changed NUM_CU to CU %
...
[ROCm/amdsmi commit: fd751ba918 ]
2025-06-19 10:57:19 -05:00
gabrpham
66d3ffe65a
Added GTT Memory to process table of default command
...
Signed-off-by: gabrpham <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: 9e221a3f09 ]
2025-06-19 10:57:19 -05:00
gabrpham
0e30436a0f
Added GTT Memory to default command and adjusted table format
...
Signed-off-by: gabrpham <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: 8a0e65d911 ]
2025-06-19 10:57:19 -05:00
Galantsev, Dmitrii
06b8484bbc
CLI - Fix partition json output
...
Change-Id: I2b9e575cb960db7c136776bfe5c040b27feba727
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
[ROCm/amdsmi commit: 4262802588 ]
2025-06-19 10:34:57 -05:00
josnarlo
ed9086505d
[SWDEV-538604] Sync Unified Header and AMDSMI Comments
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: 5ed9fba9be ]
2025-06-18 09:13:01 -05:00
Deepak Mewar
63784f77f7
Updated display format of cpu & socket affinities
...
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com >
[ROCm/amdsmi commit: 7571eb014f ]
2025-06-13 17:37:00 -05:00
Bindhiya Kanangot Balakrishnan
cd709e93d1
[SWDEV-512393] Print keys of lists in custom_dump
...
The custom_dump function was not printing the keys of lists,
so static numa did not display the list keys
CPU affinity and Socket affinity. Updated custom_dump
to print the keys.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com >
[ROCm/amdsmi commit: 6fbda16098 ]
2025-06-13 17:37:00 -05:00
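Note on cd709e93d1: the behaviour described there (printing a list's key before its items) can be sketched as below. This is not the actual custom_dump code; dump_lines, the indentation width, and the sample data are hypothetical.

```python
def dump_lines(data, indent=0):
    """Render a nested dict as indented text lines, keeping every list's key."""
    pad = " " * indent
    lines = []
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(dump_lines(value, indent + 4))
        elif isinstance(value, list):
            # Emit the list's key first, then one line per item beneath it.
            lines.append(f"{pad}{key}:")
            lines.extend(f"{pad}    {item}" for item in value)
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = {"NUMA": {"NODE": 0, "CPU_AFFINITY": ["0-15", "32-47"]}}
    print("\n".join(dump_lines(sample)))
```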
josnarlo
48ed5787a6
[SWDEV-537983] Fix comments about temperature units for amdsmi_get_temp_metric
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: d4a946717b ]
2025-06-13 16:51:59 -05:00
josnarlo
986a2dd0b5
[SWDEV-537983] Fix comments about temperature units for amdsmi_get_temp_metric
...
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
[ROCm/amdsmi commit: 4aee30f49b ]
2025-06-13 16:51:59 -05:00
Pham, Gabriel
dfaf8386fa
Added GTT Memory to default output process table ( #480 )
...
* Added GTT Memory to default command and adjusted table format
---------
Signed-off-by: gabrpham <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: 940ece6813 ]
2025-06-13 16:43:56 -05:00
dependabot[bot]
b1753ad3b3
Bump rocm-docs-core[api_reference] from 1.17.0 to 1.20.1 in /docs/sphinx
...
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.17.0 to 1.20.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.20.1/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.17.0...v1.20.1)
---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
dependency-version: 1.20.1
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com >
[ROCm/amdsmi commit: 152184dd49 ]
2025-06-13 16:35:08 -05:00
Maisam Arif
34041504f9
Update workflows and Contrib docs
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: I2ae31144ee1ab29c8bbba83f0c7eb0bb9dc079ba
[ROCm/amdsmi commit: 049c59c5bb ]
2025-06-13 16:19:10 -05:00
Maisam Arif
6688ae237f
Updated 6.4.2 Changelog
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: I975f5db0bde9ebccec3756415cb1e7dc47e78988
[ROCm/amdsmi commit: 772b572913 ]
2025-06-12 17:17:13 -05:00
Maisam Arif
6e37490e87
[SWDEV-529665] PLDM Bundle naming
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: Id7f652ddc4e790027869683a4aaa3226ffc05c83
[ROCm/amdsmi commit: 6da33b8ded ]
2025-06-12 02:19:37 -05:00
Maisam Arif
7be2218717
[SWDEV-537491] Updated Copyright to aca-decode files
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: I9621e4c54f3b490c6eb4cfc3e9bdfb4d489f0052
[ROCm/amdsmi commit: 5763412f7d ]
2025-06-11 20:51:51 -05:00
Arif, Maisam
2658f0fe20
Fixed type hinting & Added copyrights ( #462 )
...
* Added copyrights
* Fixed type hinting for processor_handle in python_interface
* Fixed incorrect type hints to match actual return types
---------
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com >
Change-Id: Ie2a09acf628ed0c43eacc8ec78c159d125acbcdb
[ROCm/amdsmi commit: 23b9da656c ]
2025-06-11 17:19:02 -05:00
Justin Williams
0c2228852a
CI - Added Build Warnings
...
Signed-off-by: Justin Williams <Justin.Williams@amd.com >
[ROCm/amdsmi commit: 6d03ca79ff ]
2025-06-11 13:13:38 -05:00
Maisam Arif
b8caa120a8
[SWDEV-537062] Fixed CU Occupancy reporting UINT MAX
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: I975579997a9e455eb930f6c0b8fc5f3dc3cbfae4
[ROCm/amdsmi commit: b579d89ae2 ]
2025-06-11 10:42:00 -05:00
dependabot[bot]
aa35398722
Bump requests from 2.32.3 to 2.32.4 in /docs/sphinx ( #471 )
...
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4)
---
updated-dependencies:
- dependency-name: requests
dependency-version: 2.32.4
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com >
[ROCm/amdsmi commit: 7e956ce4f3 ]
2025-06-11 08:23:27 -05:00
Maisam Arif
2cbf0accea
[SWDEV-529665] Fix PLDM version format
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: I7df4c2068e32a5c81c83adc69dc82a9f5d725533
[ROCm/amdsmi commit: 93404a6bff ]
2025-06-11 07:35:25 -05:00
Galantsev, Dmitrii
6892907072
CMAKE - Remove example build from src/CMakeLists.txt ( #469 )
...
* CMAKE - Remove example build from src/CMakeLists.txt
For some reason it was building examples every time even when not
necessary...
* CMAKE - Format
* Fix drm_example broken PRIu32
* CMAKE - Do NOT create lib64 when building examples
* CMAKE - Examples should only install C and CMake files
---------
Change-Id: I6274b72a085a41b5bd5ae698af798f60a8a092a0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com >
[ROCm/amdsmi commit: f9b8066c26 ]
2025-06-11 07:12:44 -05:00
Maisam Arif
75fac0a105
Fixed Parser Folder Checking
...
* Adjusted help text
* Adjusted --afid to run only with --cper-file
* Fixed interface return error
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: I2b96f4515c85f3b9dd84ba5c2d819729a997141b
[ROCm/amdsmi commit: ac63f410c2 ]
2025-06-10 15:58:06 -05:00
Maisam Arif
7eea09e4d8
[SWDEV-536417] CPER Display fixes
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: Ic2f3901d0f4c95bd9ed4beda8aa5fd3d596df8d2
[ROCm/amdsmi commit: fb592e003a ]
2025-06-10 15:58:06 -05:00
Williams, Justin
20e374663d
CI v5.0 ( #459 )
...
Signed-off-by: Justin Williams <Justin.Williams@amd.com >
[ROCm/amdsmi commit: ae4f56d14b ]
2025-06-06 16:29:20 -05:00
Saeed, Oosman
cc2b4b4067
[SWDEV-536417] AFID & addc decode fixes ( #449 )
...
* fix endian problem
* use hw_revision and flags_mask from cper section instead of hardcoded values
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: 815e0252b1 ]
2025-06-06 13:41:16 -05:00
Maisam Arif
8c60c4ed94
[SWDEV-536417] CPER & AFID CLI Fixes
...
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Change-Id: I20aafb1cd2bf8386c30e6d0a0fff8df9c8587554
[ROCm/amdsmi commit: 8bc37a19d2 ]
2025-06-06 12:26:13 -05:00
Charis Poag
df6de25624
[SWDEV-529030/SWDEV-531217] Fix tests & output for partitioned configurations (CPX, DPX, QPX, etc.)
...
Changes:
- Updated AMD SMI firmware to display "N/A" for unavailable firmware in partitioned environments, improving clarity.
Example (in DPX):
$ amd-smi firmware
GPU: 0
FW_LIST:
...
FW 12:
FW_ID: PM
FW_VERSION: 00.86.39.00
GPU: 1
FW_LIST: N/A
- Fixed amd-smi partition not showing current partition information on
ASICs that cannot set memory or accelerator partitions.
$ amd-smi partition -c -m
CURRENT_PARTITION:
GPU_ID  MEMORY  ACCELERATOR_TYPE  ACCELERATOR_PROFILE_INDEX  PARTITION_ID
0       NPS1    CPX               2                          0
1       N/A     N/A               N/A                        1
2       N/A     N/A               N/A                        2
3       N/A     N/A               N/A                        3
4       N/A     N/A               N/A                        4
5       N/A     N/A               N/A                        5
6       NPS1    SPX               0                          0
7       NPS1    SPX               0                          0
8       NPS1    SPX               0                          0
MEMORY_PARTITION:
GPU_ID  MEMORY_PARTITION_CAPS  CURRENT_MEMORY_PARTITION
0       N/A                    NPS1
1       N/A                    N/A
2       N/A                    N/A
3       N/A                    N/A
4       N/A                    N/A
5       N/A                    N/A
6       N/A                    NPS1
7       N/A                    NPS1
8       N/A                    NPS1
- Refactored amd_smi_drm_example.cc:
- Grouped partition changes and restores original partition settings.
- Now handles partitioned environments allowing example to continue even if some APIs are not supported in partitioned configurations.
- Modified amdsmi_asic_info_t (see amdsmi_get_gpu_asic_info()) to report OAM ID as N/A if 0xFFFFFFFF (was 0xFFFF).
This allows better handling of OAM IDs in partitioned environments (the ID does not exist for non-primary nodes,
since it's a physical identifier). It is also easier to handle in tests and example code (i.e., now consistent with the max size of the structure's value).
- Introduced amdsmi_RAII_open_FD() (internal API) to manage file descriptors using RAII, ensuring proper closure and preventing resource leaks.
Updated the following APIs to use this function:
- amdsmi_get_gpu_asic_info(), amdsmi_get_gpu_vram_usage(),
amdsmi_get_gpu_vram_info(), amdsmi_get_gpu_vbios_info(),
amdsmi_get_gpu_driver_info(), amdsmi_get_gpu_virtualization_mode()
- Updated AMD SMI test_base.cc/.h:
- Improved output and handling for partitioned environments.
- Added detailed ASIC information logging to align with structure changes.
- Enhanced error messages for better context before ASSERT checks.
- Resolved test failures in partitioned environments by updating
logic and handling for partition-specific configurations.
Fixed tests include:
- computepartition_read_write.cc, frequencies_read_write.cc,
gpu_metrics_read.cc, mem_util_read.cc, memorypartition_read_write.cc,
perf_level_read.cc, perf_level_read_write.cc, power_cap_read_write.cc,
power_read.cc, sys_info_read.cc, gpu_busy_read.cc
Change-Id: I36e903f8fddd714c74c719459c71aba8bbb77e6f
Signed-off-by: Charis Poag <Charis.Poag@amd.com >
Resetting head + adding fixes for tests run in partitions
Change-Id: I0c1e9ac07488b50c95f3bc6d8a724e67d2c715dc
Signed-off-by: Charis Poag <Charis.Poag@amd.com >
[ROCm/amdsmi commit: 391451752b ]
2025-06-05 19:24:49 -05:00
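Note on df6de25624: the commit introduces amdsmi_RAII_open_FD() so that file descriptors are always closed. The internal API itself is not shown in the message. The sketch below illustrates the same guarantee in Python, where a context manager is swapped in for C++ RAII; open_fd is a hypothetical name and is not part of AMD SMI.

```python
import os
from contextlib import contextmanager


@contextmanager
def open_fd(path, flags=os.O_RDONLY):
    """Open a raw file descriptor and close it when the block exits.

    Hypothetical helper: the descriptor is closed on normal exit and
    also when the body raises, so it cannot leak.
    """
    fd = os.open(path, flags)
    try:
        yield fd
    finally:
        os.close(fd)
```

A caller would write "with open_fd(path) as fd:" and read from fd inside the block; no explicit close is needed on any return or error path.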
Pham, Gabriel
f12b070e14
[SWDEV-536184] Removed extra debug print statement ( #447 )
...
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: f0233eb664 ]
2025-06-05 17:50:56 -05:00
gabrpham_amdeng
f30205b296
[SWDEV-536184] Modified KFD fallback condition for getting VRAM to include sysfs read failures
...
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: 7130de3058 ]
2025-06-05 01:49:16 -05:00
Bindhiya Kanangot Balakrishnan
60a86179b9
[SWDEV-534746] Generate valid json output for partition command
...
The amd-smi partition --json output was not valid JSON.
Updated the command to emit valid JSON.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com >
[ROCm/amdsmi commit: 872c58b7a3 ]
2025-06-05 01:40:52 -05:00
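Note on 60a86179b9: the commit message does not say what made the output invalid. A simple way to check a fix of this kind is to feed the captured CLI output to a strict JSON parser. The sketch below assumes the output has already been captured as a string; is_valid_json is a hypothetical helper name.

```python
import json


def is_valid_json(text):
    """Return True if text parses as exactly one JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Two objects printed back to back do not form one JSON document.
    print(is_valid_json('{"gpu": 0}{"gpu": 1}'))
    # Wrapping them in a list does.
    print(is_valid_json('[{"gpu": 0}, {"gpu": 1}]'))
```

The same check applies to any other command that offers a --json option.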
Saeed, Oosman
99df131155
[SWDEV-530385] Update aca-decode with parsing fixes ( #435 )
...
* Update aca-decode to #4cd539d, which fixes some errors in parsing cper files for afid extraction
* Without this fix, we get garbage values for some cper input files relating to GFX_poison_cpers
Signed-off-by: Oosman Saeed <oossaeed@amd.com >
[ROCm/amdsmi commit: 2c3fa591b5 ]
2025-06-04 18:49:05 -05:00
Arif, Maisam
e38de3932f
Add Directory Not Found Status code to map to ENOTDIR ( #238 )
...
* Corrected ecc count error return
* Added directory not found error code
* Added ENOTDIR mapping to RSMI_STATUS_DIRECTORY_NOT_FOUND in ErrnoToRsmiStatus
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
[ROCm/amdsmi commit: e2692ab533 ]
2025-06-03 17:53:33 -05:00
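Note on e38de3932f: the ENOTDIR mapping can be pictured as a lookup table from errno values to status codes. The sketch below is illustrative only and is not the ErrnoToRsmiStatus source. Only the ENOTDIR entry comes from the commit message; the function name, the use of strings for status codes, and the default value are assumptions.

```python
import errno

# Only this entry is taken from the commit message.
ERRNO_TO_STATUS = {
    errno.ENOTDIR: "RSMI_STATUS_DIRECTORY_NOT_FOUND",
}

# Placeholder default for this sketch (assumption, not from the commit).
DEFAULT_STATUS = "RSMI_STATUS_UNKNOWN_ERROR"


def errno_to_status(err):
    """Translate an errno value into a status name, with a catch-all default."""
    return ERRNO_TO_STATUS.get(err, DEFAULT_STATUS)
```

With a dedicated entry, a "not a directory" failure no longer falls through to a generic error, which is the kind of mismatch the "Corrected ecc count error return" bullet points at.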
Narlo, Joseph
ba8d2f0d84
[SWDEV-532069] Doxygen Not Picking Non-Documented Values ( #362 )
...
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com >
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com >
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com >
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com >
Co-authored-by: Deepak Mewar <deepak.mewar@amd.com >
[ROCm/amdsmi commit: c0c4e021ea ]
2025-06-03 17:24:44 -05:00
Narlo, Joseph
4eb6d34df0
[SWDEV-532769] amd-smi APIs mismatch with documentation ( #428 )
...
* Populated socket_power to get power info
---------
Signed-off-by: josnarlo <Joseph.Narlo@amd.com >
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: ce7d6dfe61 ]
2025-06-03 17:12:13 -05:00
Bindhiya Kanangot Balakrishnan
851d0d015d
[SWDEV-534745] Generate valid json output for xgmi command
...
The amd-smi xgmi --json output was not valid JSON.
Updated the command to emit valid JSON.
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com >
[ROCm/amdsmi commit: 8f943b03e1 ]
2025-06-03 12:48:02 -05:00
Saeed, Oosman
877c7b1bda
[SWDEV-530385] show afids on each line of printout ( #422 )
...
* show afids on each line of printout
* clean up afids and cper code
---------
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: fab13c5b60 ]
2025-06-02 17:22:10 -05:00
Pham, Gabriel
3d75b7881a
[SWDEV-446039] Added Flat Process table to default output ( #425 )
...
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com >
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com >
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com >
[ROCm/amdsmi commit: 91021da055 ]
2025-06-02 17:15:15 -05:00
Kanangot Balakrishnan, Bindhiya
a3521ea6ed
[SWDEV-519061] xgmi command output shows zero for all xgmi acc read/write data in the first column ( #392 )
...
The xgmi read and write accumulated data from the gpu metric index
is based on the sysfs xgmi_port_num file. Mapped these two to display
read and write with respect to src_gpu vs. dst_gpu.
---------
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com >
[ROCm/amdsmi commit: 8ed52616ad ]
2025-06-02 14:01:06 -05:00
Justin Williams
d8b32bf2ee
[SWDEV-533596] CI - Fixed Docs
...
Signed-off-by: Justin Williams <Justin.Williams@amd.com >
[ROCm/amdsmi commit: bf0448ff96 ]
2025-06-02 13:48:01 -05:00
Joseph Narlo
3d0f98c16d
[SWDEV-522996] Syncing Unified Header and AMDSMI
...
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com >
[ROCm/amdsmi commit: ee43ec71e8 ]
2025-06-02 13:44:33 -05:00