Commit Graph

1615 Commits

Author SHA1 Message Date
Narlo, Joseph c5e604f357 [SWDEV-489696] Improve AMD SMI Python APIs Functional and Unit Testing (#468)
* Added Python unit tests
* Removed duplicate function definitions
* Added missing classes to __init__ for py-interface

---------

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 7c0802889b]
2025-06-19 16:38:34 -05:00
Arif, Maisam 6123abe733 [SWDEV-538786] Fix ecc counts returning file error (#494)
Change-Id: I5cea584289df95e89b6151d549bf69e4c3e50d22

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 967e879861]
2025-06-19 15:24:03 -05:00
Castillo, Juan 4a55abaa05 [SWDEV-531904] - Added GPU Cache Read Tests (#464)
New:
- gpu_cache_read.h and gpu_cache_read.cc
- Test reads GPU cache info and asserts valid structure
Updated:
- integration_test.py
- Added test_gpu_cache_info(), which asserts a valid structure
- Updated test_get_gpu_compute_partition() to loop through all devices whether the test fails or passes
Added:
- test_get_gpu_compute_partition_returns_string() to integration_test.py
- This test displays the current compute partition for each bdf

---------

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Signed-off-by: Castillo, Juan <Juan.Castillo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 470c62f887]
2025-06-19 15:23:34 -05:00
Narlo, Joseph f543f77e30 [SWDEV-537038] amd-smi-lib build failing Fix for integration_test.py (#496)
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>

[ROCm/amdsmi commit: 57a749f457]
2025-06-19 15:12:31 -05:00
Pham, Gabriel aa95feee60 [SWDEV-531386] Changed source of metric GFX and MEM min and max clk to pp_od_clk_voltage (#453)
* Made corrections to reading of pp_od_clk_voltage
* Added fallback to pp_dpm files if pp_od_clk_voltage doesn't exist

---------

Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 4262aee8f5]
2025-06-19 15:00:45 -05:00
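The pp_od_clk_voltage change above (read the GFX and MEM min/max clocks from pp_od_clk_voltage and fall back to the pp_dpm files) can be sketched as follows. This is a minimal illustration under stated assumptions, not the AMD SMI code. The helpers parse_levels() and min_max_clk() are hypothetical, file contents are passed in as strings rather than read from sysfs, and the text format is assumed: an "OD_SCLK:" or "OD_MCLK:" section header followed by "<index>: <value>Mhz" lines, with pp_dpm_* files holding the same level lines without a header ("*" marking the active level).

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical (min, max) clock range in MHz.
using ClkRange = std::pair<uint32_t, uint32_t>;

// Hypothetical helper: collects the min and max of "<index>: <value>Mhz" lines.
// If `header` is non-empty (e.g. "OD_SCLK:"), only lines inside that section
// are used; if it is empty, every level line is used (assumed pp_dpm_* style).
std::optional<ClkRange> parse_levels(const std::string& text,
                                     const std::string& header) {
  std::istringstream in(text);
  std::string line;
  bool in_section = header.empty();
  std::optional<ClkRange> range;
  while (std::getline(in, line)) {
    if (line.empty()) continue;
    if (line.back() == ':') {  // section header such as "OD_MCLK:"
      if (!header.empty()) in_section = (line == header);
      continue;
    }
    if (!in_section) continue;
    const auto colon = line.find(':');
    if (colon == std::string::npos) continue;
    std::istringstream value(line.substr(colon + 1));
    uint32_t mhz = 0;
    if (!(value >> mhz)) continue;  // no numeric value after the colon
    if (!range) {
      range = ClkRange{mhz, mhz};
    } else {
      range->first = std::min(range->first, mhz);
      range->second = std::max(range->second, mhz);
    }
  }
  return range;
}

// Prefer pp_od_clk_voltage; fall back to the pp_dpm_* text when the OD file
// is absent (std::nullopt) or the requested section yields no levels.
std::optional<ClkRange> min_max_clk(const std::optional<std::string>& od_text,
                                    const std::string& od_header,
                                    const std::string& dpm_text) {
  if (od_text) {
    if (auto r = parse_levels(*od_text, od_header)) return r;
  }
  return parse_levels(dpm_text, "");
}
```

Keeping parsing separate from file I/O lets the fallback path be exercised without a GPU present.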
Galantsev, Dmitrii a480b2869d rsmi_init: Do not complain loudly when no driver is found (#74)
Co-authored-by: Samuel Thibault <samuel.thibault@ens-lyon.org>


[ROCm/amdsmi commit: ca52da194d]
2025-06-19 13:22:48 -05:00
Narlo, Joseph 154d266abc [SWDEV-482203] amd-smi Usage basics for C Library Multiple doc errors (#477)
* Added steps for finding the ROCm include and library paths to the code examples

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>

[ROCm/amdsmi commit: db3d763aad]
2025-06-19 11:25:57 -05:00
josnarlo 0862dd11fb [SWDEV-537038] amd_smi-lib build failing Fix for integration_test.py
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: 99b2bfbc61]
2025-06-19 11:23:25 -05:00
Justin Williams 31df8b46bd Adjusted amd-smi set --compute-partition docs
Signed-off-by: Justin Williams <juwillia@amd.com>


[ROCm/amdsmi commit: 81d58f06d1]
2025-06-19 10:58:04 -05:00
gabrpham_amdeng 771e3019ad Adjusted CU % logic to be more robust
[ROCm/amdsmi commit: 9729aba695]
2025-06-19 10:57:19 -05:00
gabrpham_amdeng d049815647 Changed NUM_CU to CU %
[ROCm/amdsmi commit: fd751ba918]
2025-06-19 10:57:19 -05:00
gabrpham 66d3ffe65a Added GTT Memory to process table of default command
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 9e221a3f09]
2025-06-19 10:57:19 -05:00
gabrpham 0e30436a0f Added GTT Memory to default command and adjusted table format
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 8a0e65d911]
2025-06-19 10:57:19 -05:00
Galantsev, Dmitrii 06b8484bbc CLI - Fix partition json output
Change-Id: I2b9e575cb960db7c136776bfe5c040b27feba727
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 4262802588]
2025-06-19 10:34:57 -05:00
josnarlo ed9086505d [SWDEV-538604] Sync Unified Header and AMDSMI Comments
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: 5ed9fba9be]
2025-06-18 09:13:01 -05:00
Deepak Mewar 63784f77f7 Updated display format of cpu & socket affinities
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>


[ROCm/amdsmi commit: 7571eb014f]
2025-06-13 17:37:00 -05:00
Bindhiya Kanangot Balakrishnan cd709e93d1 [SWDEV-512393] Print keys of lists in custom_dump
The custom_dump function was not printing the keys of lists,
so the static NUMA output was not displaying the CPU affinity
and socket affinity list keys. Updated custom_dump to print
the keys.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: 6fbda16098]
2025-06-13 17:37:00 -05:00
josnarlo 48ed5787a6 [SWDEV-537983] Fix comments about temperature units for amdsmi_get_temp_metric
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: d4a946717b]
2025-06-13 16:51:59 -05:00
josnarlo 986a2dd0b5 [SWDEV-537983] Fix comments about temperature units for amdsmi_get_temp_metric
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: 4aee30f49b]
2025-06-13 16:51:59 -05:00
Pham, Gabriel dfaf8386fa Added GTT Memory to default output process table (#480)
* Added GTT Memory to default command and adjusted table format

---------

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: 940ece6813]
2025-06-13 16:43:56 -05:00
dependabot[bot] b1753ad3b3 Bump rocm-docs-core[api_reference] from 1.17.0 to 1.20.1 in /docs/sphinx
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.17.0 to 1.20.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.20.1/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.17.0...v1.20.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.20.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

[ROCm/amdsmi commit: 152184dd49]
2025-06-13 16:35:08 -05:00
Maisam Arif 34041504f9 Update workflows and Contrib docs
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I2ae31144ee1ab29c8bbba83f0c7eb0bb9dc079ba


[ROCm/amdsmi commit: 049c59c5bb]
2025-06-13 16:19:10 -05:00
Maisam Arif 6688ae237f Updated 6.4.2 Changelog
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I975f5db0bde9ebccec3756415cb1e7dc47e78988


[ROCm/amdsmi commit: 772b572913]
2025-06-12 17:17:13 -05:00
Maisam Arif 6e37490e87 [SWDEV-529665] PLDM Bundle naming
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Id7f652ddc4e790027869683a4aaa3226ffc05c83


[ROCm/amdsmi commit: 6da33b8ded]
2025-06-12 02:19:37 -05:00
Maisam Arif 7be2218717 [SWDEV-537491] Updated Copyright to aca-decode files
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I9621e4c54f3b490c6eb4cfc3e9bdfb4d489f0052


[ROCm/amdsmi commit: 5763412f7d]
2025-06-11 20:51:51 -05:00
Arif, Maisam 2658f0fe20 Fixed type hinting & Added copy rights (#462)
* Added copyrights
* Fixed type hinting for processor_handle in python_interface
* Fixed incorrect type hints to match the actual return types

---------

Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Change-Id: Ie2a09acf628ed0c43eacc8ec78c159d125acbcdb

[ROCm/amdsmi commit: 23b9da656c]
2025-06-11 17:19:02 -05:00
Justin Williams 0c2228852a CI - Added Build Warnings
Signed-off-by: Justin Williams <Justin.Williams@amd.com>


[ROCm/amdsmi commit: 6d03ca79ff]
2025-06-11 13:13:38 -05:00
Maisam Arif b8caa120a8 [SWDEV-537062] Fixed CU Occupancy reporting UINT MAX
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I975579997a9e455eb930f6c0b8fc5f3dc3cbfae4


[ROCm/amdsmi commit: b579d89ae2]
2025-06-11 10:42:00 -05:00
dependabot[bot] aa35398722 Bump requests from 2.32.3 to 2.32.4 in /docs/sphinx (#471)
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

[ROCm/amdsmi commit: 7e956ce4f3]
2025-06-11 08:23:27 -05:00
Maisam Arif 2cbf0accea [SWDEV-529665] Fix PLDM version format
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I7df4c2068e32a5c81c83adc69dc82a9f5d725533


[ROCm/amdsmi commit: 93404a6bff]
2025-06-11 07:35:25 -05:00
Galantsev, Dmitrii 6892907072 CMAKE - Remove example build from src/CMakeLists.txt (#469)
* CMAKE - Remove example build from src/CMakeLists.txt
Examples were being built every time, even when not
necessary.
* CMAKE - Format
* Fix drm_example broken PRIu32
* CMAKE - Do NOT create lib64 when building examples
* CMAKE - Examples should only install C and CMake files

---------

Change-Id: I6274b72a085a41b5bd5ae698af798f60a8a092a0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

[ROCm/amdsmi commit: f9b8066c26]
2025-06-11 07:12:44 -05:00
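The "Fix drm_example broken PRIu32" item above does not describe the exact defect. As a general illustration only, the sketch below shows the standard way to print a uint32_t portably with the PRIu32 macro from <cinttypes>: the macro expands to a string literal that is concatenated after "%", outside the quotes. The helper format_u32() is hypothetical and is not code from amd_smi_drm_example.

```cpp
#include <cinttypes>  // PRIu32
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats "<label>: <value>" for a uint32_t.
// "%" PRIu32 becomes a single format string via literal concatenation,
// so the conversion always matches uint32_t on every platform.
std::string format_u32(const char* label, uint32_t value) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%s: %" PRIu32, label, value);
  return buf;
}
```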
Maisam Arif 75fac0a105 Fixed Parser Folder Checking
* Adjusted help text
* Adjusted --afid to run only with --cper-file
* Fixed interface return error

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I2b96f4515c85f3b9dd84ba5c2d819729a997141b


[ROCm/amdsmi commit: ac63f410c2]
2025-06-10 15:58:06 -05:00
Maisam Arif 7eea09e4d8 [SWDEV-536417] CPER Display fixes
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ic2f3901d0f4c95bd9ed4beda8aa5fd3d596df8d2


[ROCm/amdsmi commit: fb592e003a]
2025-06-10 15:58:06 -05:00
Williams, Justin 20e374663d CI v5.0 (#459)
Signed-off-by: Justin Williams <Justin.Williams@amd.com>

[ROCm/amdsmi commit: ae4f56d14b]
2025-06-06 16:29:20 -05:00
Saeed, Oosman cc2b4b4067 [SWDEV-536417] AFID & addc decode fixes (#449)
* Fixed endianness problem
* Use hw_revision and flags_mask from the CPER section instead of hard-coded values

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 815e0252b1]
2025-06-06 13:41:16 -05:00
Maisam Arif 8c60c4ed94 [SWDEV-536417] CPER & AFID CLI Fixes
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I20aafb1cd2bf8386c30e6d0a0fff8df9c8587554


[ROCm/amdsmi commit: 8bc37a19d2]
2025-06-06 12:26:13 -05:00
Charis Poag df6de25624 [SWDEV-529030/SWDEV-531217] Fix tests & output for partitioned configurations (CPX, DPX, QPX, etc.)
Changes:
  - Updated amd-smi firmware output to display "N/A" for unavailable firmware in partitioned environments.
    Example (in DPX):
    $ amd-smi firmware
    GPU: 0
        FW_LIST:
            ...
            FW 12:
                FW_ID: PM
                FW_VERSION: 00.86.39.00
    GPU: 1
        FW_LIST: N/A
  - Fixed amd-smi partition not showing current partition information on
    ASICs that are unable to set memory or accelerator partitions.
    $ amd-smi partition -c -m
    CURRENT_PARTITION:
    GPU_ID  MEMORY  ACCELERATOR_TYPE  ACCELERATOR_PROFILE_INDEX  PARTITION_ID
    0       NPS1    CPX               2                          0
    1       N/A     N/A               N/A                        1
    2       N/A     N/A               N/A                        2
    3       N/A     N/A               N/A                        3
    4       N/A     N/A               N/A                        4
    5       N/A     N/A               N/A                        5
    6       NPS1    SPX               0                          0
    7       NPS1    SPX               0                          0
    8       NPS1    SPX               0                          0

    MEMORY_PARTITION:
    GPU_ID  MEMORY_PARTITION_CAPS  CURRENT_MEMORY_PARTITION
    0       N/A                    NPS1
    1       N/A                    N/A
    2       N/A                    N/A
    3       N/A                    N/A
    4       N/A                    N/A
    5       N/A                    N/A
    6       N/A                    NPS1
    7       N/A                    NPS1
    8       N/A                    NPS1

  - Refactored amd_smi_drm_example.cc:
    - Grouped partition changes and restores original partition settings.
    - Now handles partitioned environments, allowing the example to continue even if some APIs are not supported in partitioned configurations.
  - Modified amdsmi_asic_info_t (see amdsmi_get_gpu_asic_info()) to report OAM ID as N/A if 0xFFFFFFFF (was 0xFFFF).
    This allows better handling of OAM IDs in partitioned environments (the OAM ID does not exist for non-primary nodes,
    since it is a physical identifier) and is easier to handle in tests and example code (i.e., now consistent with the max size of the structure's value).
  - Introduced amdsmi_RAII_open_FD() (internal API) to manage file descriptors using RAII, ensuring proper closure and preventing resource leaks.
    Updated the following APIs to use this function:
      - amdsmi_get_gpu_asic_info(), amdsmi_get_gpu_vram_usage(),
        amdsmi_get_gpu_vram_info(), amdsmi_get_gpu_vbios_info(),
        amdsmi_get_gpu_driver_info(), amdsmi_get_gpu_virtualization_mode()
  - Updated AMD SMI test_base.cc/.h:
    - Improved output and handling for partitioned environments.
    - Added detailed ASIC information logging to align with structure changes.
    - Enhanced error messages for better context before ASSERT checks.
  - Resolved test failures in partitioned environments by updating
    logic and handling for partition-specific configurations.
    Fixed tests include:
      - computepartition_read_write.cc, frequencies_read_write.cc,
        gpu_metrics_read.cc, mem_util_read.cc, memorypartition_read_write.cc,
        perf_level_read.cc, perf_level_read_write.cc, power_cap_read_write.cc,
        power_read.cc, sys_info_read.cc, gpu_busy_read.cc

Change-Id: I36e903f8fddd714c74c719459c71aba8bbb77e6f
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Reset head and added fixes for tests run in partitions

Change-Id: I0c1e9ac07488b50c95f3bc6d8a724e67d2c715dc
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 391451752b]
2025-06-05 19:24:49 -05:00
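The amdsmi_RAII_open_FD() change above relies on RAII: a file descriptor owned by an object is closed in its destructor, so every return path releases it. A minimal sketch of the idea follows, assuming POSIX open()/close(). The ScopedFd class is hypothetical and is not the internal AMD SMI implementation, whose signature is not given here.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <string>
#include <utility>

// Hypothetical RAII owner of a POSIX file descriptor (illustration only).
// The descriptor is closed when the object is destroyed, including on early
// returns, which is what prevents descriptor leaks.
class ScopedFd {
 public:
  explicit ScopedFd(const std::string& path, int flags = O_RDONLY | O_CLOEXEC)
      : fd_(::open(path.c_str(), flags)) {}
  ~ScopedFd() { reset(); }

  // Non-copyable: exactly one owner may close the descriptor.
  ScopedFd(const ScopedFd&) = delete;
  ScopedFd& operator=(const ScopedFd&) = delete;

  // Movable: ownership transfers and the source is left empty (-1).
  ScopedFd(ScopedFd&& other) noexcept : fd_(std::exchange(other.fd_, -1)) {}
  ScopedFd& operator=(ScopedFd&& other) noexcept {
    if (this != &other) {
      reset();
      fd_ = std::exchange(other.fd_, -1);
    }
    return *this;
  }

  bool valid() const { return fd_ >= 0; }
  int get() const { return fd_; }

 private:
  void reset() {
    if (fd_ >= 0) {
      ::close(fd_);
      fd_ = -1;
    }
  }
  int fd_;
};
```

Deleting the copy operations is the key design choice: it guarantees a single owner, so close() runs exactly once per descriptor.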
Pham, Gabriel f12b070e14 [SWDEV-536184] Removed extra debug print statement (#447)
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: f0233eb664]
2025-06-05 17:50:56 -05:00
gabrpham_amdeng f30205b296 [SWDEV-536184] Modified KFD fallback condition for getting VRAM to include sysfs read failures
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 7130de3058]
2025-06-05 01:49:16 -05:00
Bindhiya Kanangot Balakrishnan 60a86179b9 [SWDEV-534746] Generate valid json output for partition command
The amd-smi partition --json output was not valid JSON.
Updated the command so that its output is valid JSON.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: 872c58b7a3]
2025-06-05 01:40:52 -05:00
Saeed, Oosman 99df131155 [SWDEV-530385] Update aca-decode with parsing fixes (#435)
* Update aca-decode to #4cd539d, which fixes errors in parsing CPER files for AFID extraction
* Without this fix, some CPER input files related to GFX_poison_cpers produce garbage values

Signed-off-by: Oosman Saeed <oossaeed@amd.com>

[ROCm/amdsmi commit: 2c3fa591b5]
2025-06-04 18:49:05 -05:00
Arif, Maisam e38de3932f Add Directory Not Found Status code to map to ENOTDIR (#238)
* Corrected ecc count error return
* Added directory not found error code
* Added ENOTDIR mapping to RSMI_STATUS_DIRECTORY_NOT_FOUND in ErrnoToRsmiStatus

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: e2692ab533]
2025-06-03 17:53:33 -05:00
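The ENOTDIR change above maps one errno value to a dedicated status code inside ErrnoToRsmiStatus. A minimal sketch of that kind of translation follows. Only the ENOTDIR to directory-not-found mapping comes from the commit. The SmiStatus enum, the errno_to_status() name, and every other case are hypothetical stand-ins, not the actual RSMI definitions.

```cpp
#include <cerrno>

// Hypothetical status codes; DirectoryNotFound stands in for
// RSMI_STATUS_DIRECTORY_NOT_FOUND, the rest are illustrative only.
enum class SmiStatus {
  Success,
  Permission,
  NotFound,
  DirectoryNotFound,
  Busy,
  UnknownError,
};

// Translate an errno value from a failed sysfs/file operation into a status.
SmiStatus errno_to_status(int err) {
  switch (err) {
    case 0:
      return SmiStatus::Success;
    case EACCES:
    case EPERM:
      return SmiStatus::Permission;
    case ENOENT:
      return SmiStatus::NotFound;
    case ENOTDIR:  // a path component is not a directory (mapping named in the commit)
      return SmiStatus::DirectoryNotFound;
    case EBUSY:
      return SmiStatus::Busy;
    default:
      return SmiStatus::UnknownError;
  }
}
```

An explicit case keeps ENOTDIR from falling into the generic default, so callers can report a specific error rather than an unknown one.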
Narlo, Joseph ba8d2f0d84 [SWDEV-532069] Doxygen Not Picking Non-Documented Values (#362)
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Deepak Mewar <deepak.mewar@amd.com>

[ROCm/amdsmi commit: c0c4e021ea]
2025-06-03 17:24:44 -05:00
Narlo, Joseph 4eb6d34df0 [SWDEV-532769] amd-smi APIs mismatch with documentation (#428)
* Populated socket_power to get power info
---------

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: ce7d6dfe61]
2025-06-03 17:12:13 -05:00
Bindhiya Kanangot Balakrishnan 851d0d015d [SWDEV-534745] Generate valid json output for xgmi command
The amd-smi xgmi --json output was not valid JSON.
Updated the command so that its output is valid JSON.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: 8f943b03e1]
2025-06-03 12:48:02 -05:00
Saeed, Oosman 877c7b1bda [SWDEV-530385] show afids on each line of printout (#422)
* show afids on each line of printout
* clean up afids and cper code
---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: fab13c5b60]
2025-06-02 17:22:10 -05:00
Pham, Gabriel 3d75b7881a [SWDEV-446039] Added Flat Process table to default output (#425)
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 91021da055]
2025-06-02 17:15:15 -05:00
Kanangot Balakrishnan, Bindhiya a3521ea6ed [SWDEV-519061] xgmi command output shows zero for all xgmi acc read/write data in the first column (#392)
The accumulated XGMI read and write data from the GPU metrics index
is based on the sysfs xgmi_port_num file. Mapped the two so that read
and write are displayed with respect to src_gpu vs. dst_gpu.
---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

[ROCm/amdsmi commit: 8ed52616ad]
2025-06-02 14:01:06 -05:00
Justin Williams d8b32bf2ee [SWDEV-533596] CI - Fixed Docs
Signed-off-by: Justin Williams <Justin.Williams@amd.com>


[ROCm/amdsmi commit: bf0448ff96]
2025-06-02 13:48:01 -05:00
Joseph Narlo 3d0f98c16d [SWDEV-522996] Syncing Unified Header and AMDSMI
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>


[ROCm/amdsmi commit: ee43ec71e8]
2025-06-02 13:44:33 -05:00