Commit Graph

1822 Commits

Author SHA1 Message Date
Galantsev, Dmitrii 06b8484bbc CLI - Fix partition json output
Change-Id: I2b9e575cb960db7c136776bfe5c040b27feba727
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 4262802588]
2025-06-19 10:34:57 -05:00
josnarlo ed9086505d [SWDEV-538604] Sync Unified Header and AMDSMI Comments
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: 5ed9fba9be]
2025-06-18 09:13:01 -05:00
Deepak Mewar 63784f77f7 Updated display format of cpu & socket affinities
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>


[ROCm/amdsmi commit: 7571eb014f]
2025-06-13 17:37:00 -05:00
Bindhiya Kanangot Balakrishnan cd709e93d1 [SWDEV-512393] Print keys of lists in custom_dump
The custom_dump function was not printing the keys of lists,
so static numa was not displaying the list keys
CPU affinity and Socket affinity. Updated custom_dump
to print the keys.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: 6fbda16098]
2025-06-13 17:37:00 -05:00
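Note on cd709e93d1: the behavior described above (printing the key that owns a list before the list's items) can be illustrated with a minimal sketch. This is not the repository's custom_dump; the function name dump_with_list_keys, the 4-space indent, and the sample keys are assumptions made for illustration only.

```python
def dump_with_list_keys(data, indent=0):
    """Render a nested dict as indented lines, keeping list keys visible.

    Hypothetical helper: only illustrates the idea of emitting the key
    that owns a list before emitting the list's items.
    """
    pad = " " * indent
    lines = []
    for key, value in data.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(dump_with_list_keys(value, indent + 4))
        elif isinstance(value, list):
            # Emit the list's key first; dropping this line is the kind of
            # omission the commit message describes.
            lines.append(f"{pad}{key}:")
            for item in value:
                if isinstance(item, dict):
                    lines.extend(dump_with_list_keys(item, indent + 4))
                else:
                    lines.append(f"{pad}    {item}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


if __name__ == "__main__":
    sample = {"NUMA": {"CPU_AFFINITY": [0, 1], "NODE": 0}}
    print("\n".join(dump_with_list_keys(sample)))
```
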
josnarlo 48ed5787a6 [SWDEV-537983] Fix comments about temperature units for amdsmi_get_temp_metric
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: d4a946717b]
2025-06-13 16:51:59 -05:00
josnarlo 986a2dd0b5 [SWDEV-537983] Fix comments about temperature units for amdsmi_get_temp_metric
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: 4aee30f49b]
2025-06-13 16:51:59 -05:00
Pham, Gabriel dfaf8386fa Added GTT Memory to default output process table (#480)
* Added GTT Memory to default command and adjusted table format

---------

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: 940ece6813]
2025-06-13 16:43:56 -05:00
dependabot[bot] b1753ad3b3 Bump rocm-docs-core[api_reference] from 1.17.0 to 1.20.1 in /docs/sphinx
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.17.0 to 1.20.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.20.1/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.17.0...v1.20.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.20.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

[ROCm/amdsmi commit: 152184dd49]
2025-06-13 16:35:08 -05:00
Maisam Arif 34041504f9 Update workflows and Contrib docs
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I2ae31144ee1ab29c8bbba83f0c7eb0bb9dc079ba


[ROCm/amdsmi commit: 049c59c5bb]
2025-06-13 16:19:10 -05:00
Maisam Arif 6688ae237f Updated 6.4.2 Changelog
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I975f5db0bde9ebccec3756415cb1e7dc47e78988


[ROCm/amdsmi commit: 772b572913]
2025-06-12 17:17:13 -05:00
Maisam Arif 6e37490e87 [SWDEV-529665] PLDM Bundle naming
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Id7f652ddc4e790027869683a4aaa3226ffc05c83


[ROCm/amdsmi commit: 6da33b8ded]
2025-06-12 02:19:37 -05:00
Maisam Arif 7be2218717 [SWDEV-537491] Updated Copyright to aca-decode files
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I9621e4c54f3b490c6eb4cfc3e9bdfb4d489f0052


[ROCm/amdsmi commit: 5763412f7d]
2025-06-11 20:51:51 -05:00
Arif, Maisam 2658f0fe20 Fixed type hinting & Added copy rights (#462)
* Added copyrights
* Fixed type hinting for processor_handle in python_interface
* Fixed Incorrect type hinting to actual return types

---------

Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Change-Id: Ie2a09acf628ed0c43eacc8ec78c159d125acbcdb

[ROCm/amdsmi commit: 23b9da656c]
2025-06-11 17:19:02 -05:00
Justin Williams 0c2228852a CI - Added Build Warnings
Signed-off-by: Justin Williams <Justin.Williams@amd.com>


[ROCm/amdsmi commit: 6d03ca79ff]
2025-06-11 13:13:38 -05:00
Maisam Arif b8caa120a8 [SWDEV-537062] Fixed CU Occupancy reporting UINT MAX
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I975579997a9e455eb930f6c0b8fc5f3dc3cbfae4


[ROCm/amdsmi commit: b579d89ae2]
2025-06-11 10:42:00 -05:00
dependabot[bot] aa35398722 Bump requests from 2.32.3 to 2.32.4 in /docs/sphinx (#471)
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

[ROCm/amdsmi commit: 7e956ce4f3]
2025-06-11 08:23:27 -05:00
Maisam Arif 2cbf0accea [SWDEV-529665] Fix PLDM version format
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I7df4c2068e32a5c81c83adc69dc82a9f5d725533


[ROCm/amdsmi commit: 93404a6bff]
2025-06-11 07:35:25 -05:00
Galantsev, Dmitrii 6892907072 CMAKE - Remove example build from src/CMakeLists.txt (#469)
* CMAKE - Remove example build from src/CMakeLists.txt
For some reason it was building examples every time even when not
necessary...
* CMAKE - Format
* Fix drm_example broken PRIu32
* CMAKE - Do NOT create lib64 when building examples
* CMAKE - Examples should only install C and CMake files

---------

Change-Id: I6274b72a085a41b5bd5ae698af798f60a8a092a0
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

[ROCm/amdsmi commit: f9b8066c26]
2025-06-11 07:12:44 -05:00
Maisam Arif 75fac0a105 Fixed Parser Folder Checking
* Adjusted help text
* Adjusted --afid to run only with --cper-file
* Fixed interface return error

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I2b96f4515c85f3b9dd84ba5c2d819729a997141b


[ROCm/amdsmi commit: ac63f410c2]
2025-06-10 15:58:06 -05:00
Maisam Arif 7eea09e4d8 [SWDEV-536417] CPER Display fixes
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ic2f3901d0f4c95bd9ed4beda8aa5fd3d596df8d2


[ROCm/amdsmi commit: fb592e003a]
2025-06-10 15:58:06 -05:00
Williams, Justin 20e374663d CI v5.0 (#459)
Signed-off-by: Justin Williams <Justin.Williams@amd.com>

[ROCm/amdsmi commit: ae4f56d14b]
2025-06-06 16:29:20 -05:00
Saeed, Oosman cc2b4b4067 [SWDEV-536417] AFID & addc decode fixes (#449)
* fix endian problem
* use hw_revision and flags_mask from cper section instead of hardcoded values

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 815e0252b1]
2025-06-06 13:41:16 -05:00
Maisam Arif 8c60c4ed94 [SWDEV-536417] CPER & AFID CLI Fixes
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I20aafb1cd2bf8386c30e6d0a0fff8df9c8587554


[ROCm/amdsmi commit: 8bc37a19d2]
2025-06-06 12:26:13 -05:00
Charis Poag df6de25624 [SWDEV-529030/SWDEV-531217] Fix tests & output for partitioned configurations (CPX, DPX, QPX, etc.)
Changes:
  - Updated AMD SMI firmware to display "N/A" for unavailable firmware in partitioned environments, improving clarity.
    Example (in DPX):
    $ amd-smi firmware
    GPU: 0
        FW_LIST:
            ...
            FW 12:
                FW_ID: PM
                FW_VERSION: 00.86.39.00
    GPU: 1
        FW_LIST: N/A
  - Fixed amd-smi partition not showing current partition information on
    ASICs that cannot set memory or accelerator partitions.
    $ amd-smi partition -c -m
    CURRENT_PARTITION:
    GPU_ID  MEMORY  ACCELERATOR_TYPE  ACCELERATOR_PROFILE_INDEX  PARTITION_ID
    0       NPS1    CPX               2                          0
    1       N/A     N/A               N/A                        1
    2       N/A     N/A               N/A                        2
    3       N/A     N/A               N/A                        3
    4       N/A     N/A               N/A                        4
    5       N/A     N/A               N/A                        5
    6       NPS1    SPX               0                          0
    7       NPS1    SPX               0                          0
    8       NPS1    SPX               0                          0

    MEMORY_PARTITION:
    GPU_ID  MEMORY_PARTITION_CAPS  CURRENT_MEMORY_PARTITION
    0       N/A                    NPS1
    1       N/A                    N/A
    2       N/A                    N/A
    3       N/A                    N/A
    4       N/A                    N/A
    5       N/A                    N/A
    6       N/A                    NPS1
    7       N/A                    NPS1
    8       N/A                    NPS1

  - Refactored amd_smi_drm_example.cc:
    - Groups partition changes and restores original partition settings.
    - Now handles partitioned environments, allowing the example to continue even if some APIs are not supported in partitioned configurations.
  - Modified amdsmi_asic_info_t (see amdsmi_get_gpu_asic_info()) to report OAM ID as N/A if 0xFFFFFFFF (was 0xFFFF).
    Allows for better handling of OAM IDs in partitioned environments (the OAM ID does not exist for non-primary nodes,
    since it is a physical identifier). Easier to handle in tests and example code (i.e., now consistent with the max size of the structure's value).
  - Introduced amdsmi_RAII_open_FD() (internal API) to manage file descriptors using RAII, ensuring proper closure and preventing resource leaks.
    Updated the following APIs to use this function:
      - amdsmi_get_gpu_asic_info(), amdsmi_get_gpu_vram_usage(),
        amdsmi_get_gpu_vram_info(), amdsmi_get_gpu_vbios_info(),
        amdsmi_get_gpu_driver_info(), amdsmi_get_gpu_virtualization_mode()
  - Updated AMD SMI test_base.cc/.h:
    - Improved output and handling for partitioned environments.
    - Added detailed ASIC information logging to align with structure changes.
    - Enhanced error messages for better context before ASSERT checks.
  - Resolved test failures in partitioned environments by updating
    logic and handling for partition-specific configurations.
    Fixed tests include:
      - computepartition_read_write.cc, frequencies_read_write.cc,
        gpu_metrics_read.cc, mem_util_read.cc, memorypartition_read_write.cc,
        perf_level_read.cc, perf_level_read_write.cc, power_cap_read_write.cc,
        power_read.cc, sys_info_read.cc, gpu_busy_read.cc

Change-Id: I36e903f8fddd714c74c719459c71aba8bbb77e6f
Signed-off-by: Charis Poag <Charis.Poag@amd.com>

Resetting head + adding fixes for tests ran in partitions

Change-Id: I0c1e9ac07488b50c95f3bc6d8a724e67d2c715dc
Signed-off-by: Charis Poag <Charis.Poag@amd.com>


[ROCm/amdsmi commit: 391451752b]
2025-06-05 19:24:49 -05:00
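Note on df6de25624: the OAM ID change above amounts to treating 0xFFFFFFFF as a "not available" sentinel. A minimal sketch of that check, assuming a caller that already holds the raw integer; the names OAM_ID_UNSUPPORTED and format_oam_id are hypothetical and not part of the amdsmi API.

```python
# Sentinel value taken from the commit message (previously 0xFFFF).
OAM_ID_UNSUPPORTED = 0xFFFFFFFF


def format_oam_id(raw_value):
    """Return 'N/A' for the sentinel, otherwise the ID as a string.

    Hypothetical helper for illustration; not the library's implementation.
    """
    return "N/A" if raw_value == OAM_ID_UNSUPPORTED else str(raw_value)


if __name__ == "__main__":
    for raw in (0, 3, 0xFFFF, 0xFFFFFFFF):
        print(hex(raw), "->", format_oam_id(raw))
```

Under this sketch the old 16-bit value 0xFFFF is reported as an ordinary ID, which is why callers comparing against the old sentinel would need updating.
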
Pham, Gabriel f12b070e14 [SWDEV-536184] Removed extra debug print statement (#447)
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: f0233eb664]
2025-06-05 17:50:56 -05:00
gabrpham_amdeng f30205b296 [SWDEV-536184] Modified KFD fallback condition for getting VRAM to include sysfs read failures
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 7130de3058]
2025-06-05 01:49:16 -05:00
Bindhiya Kanangot Balakrishnan 60a86179b9 [SWDEV-534746] Generate valid json output for partition command
The amd-smi partition --json output was not valid JSON.
The command now generates its output in valid
JSON format.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: 872c58b7a3]
2025-06-05 01:40:52 -05:00
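Note on 60a86179b9: one way to check a fix of this kind is to feed the captured command output to a JSON parser. The sketch below assumes the output has already been captured as a string; the helper name is_valid_json and the sample payloads are illustrative and do not reproduce the actual amd-smi output.

```python
import json


def is_valid_json(text):
    """Return True if text parses as exactly one JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Two objects printed back to back are not a single valid JSON document.
    print(is_valid_json('{"gpu": 0}{"gpu": 1}'))
    # Wrapping the objects in one array makes the output parseable.
    print(is_valid_json('[{"gpu": 0}, {"gpu": 1}]'))
```
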
Saeed, Oosman 99df131155 [SWDEV-530385] Update aca-decode with parsing fixes (#435)
* Update aca-decode to #4cd539d, which fixes some errors in parsing cper files for afid extraction
* Without this fix, we get garbage values for some cper input files relating to GFX_poison_cpers

Signed-off-by: Oosman Saeed <oossaeed@amd.com>

[ROCm/amdsmi commit: 2c3fa591b5]
2025-06-04 18:49:05 -05:00
Arif, Maisam e38de3932f Add Directory Not Found Status code to map to ENOTDIR (#238)
* Corrected ecc count error return
* Added directory not found error code
* Added ENOTDIR mapping to RSMI_STATUS_DIRECTORY_NOT_FOUND in ErrnoToRsmiStatus

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: e2692ab533]
2025-06-03 17:53:33 -05:00
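Note on e38de3932f: the commit maps ENOTDIR to a dedicated "directory not found" status inside ErrnoToRsmiStatus. The actual function is C++; the Python sketch below only illustrates the table-lookup idea. The Status enum, its members other than the directory case, and errno_to_status are hypothetical names chosen for this example.

```python
import errno
from enum import Enum, auto


class Status(Enum):
    """Hypothetical status codes, loosely modeled on the commit message."""
    SUCCESS = auto()
    FILE_NOT_FOUND = auto()
    PERMISSION = auto()
    DIRECTORY_NOT_FOUND = auto()
    UNKNOWN_ERROR = auto()


# Lookup table from OS errno values to status codes.
_ERRNO_TO_STATUS = {
    0: Status.SUCCESS,
    errno.ENOENT: Status.FILE_NOT_FOUND,
    errno.EACCES: Status.PERMISSION,
    errno.ENOTDIR: Status.DIRECTORY_NOT_FOUND,  # the mapping the commit adds
}


def errno_to_status(err):
    """Translate an errno value, falling back to UNKNOWN_ERROR."""
    return _ERRNO_TO_STATUS.get(err, Status.UNKNOWN_ERROR)


if __name__ == "__main__":
    print(errno_to_status(errno.ENOTDIR))
```

Without a dedicated entry, ENOTDIR would fall through to the generic fallback, which gives callers no way to distinguish a missing directory from any other failure.
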
Narlo, Joseph ba8d2f0d84 [SWDEV-532069] Doxygen Not Picking Non-Documented Values (#362)
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Deepak Mewar <deepak.mewar@amd.com>

[ROCm/amdsmi commit: c0c4e021ea]
2025-06-03 17:24:44 -05:00
Narlo, Joseph 4eb6d34df0 [SWDEV-532769] amd-smi APIs mismatch with documentation (#428)
* Populated socket_power to get power info
---------

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: ce7d6dfe61]
2025-06-03 17:12:13 -05:00
Bindhiya Kanangot Balakrishnan 851d0d015d [SWDEV-534745] Generate valid json output for xgmi command
The amd-smi xgmi --json output was not valid JSON.
The command now generates its output in valid
JSON format.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: 8f943b03e1]
2025-06-03 12:48:02 -05:00
Saeed, Oosman 877c7b1bda [SWDEV-530385] show afids on each line of printout (#422)
* show afids on each line of printout
* clean up afids and cper code
---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: fab13c5b60]
2025-06-02 17:22:10 -05:00
Pham, Gabriel 3d75b7881a [SWDEV-446039] Added Flat Process table to default output (#425)
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 91021da055]
2025-06-02 17:15:15 -05:00
Kanangot Balakrishnan, Bindhiya a3521ea6ed [SWDEV-519061] xgmi command output shows zero for all xgmi acc read/write data in the first column (#392)
The xgmi read and write accumulated data from the gpu metric index
is based on the sysfs xgmi_port_num file. Mapped these two to display
read and write with respect to src_gpu vs. dst_gpu.
---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

[ROCm/amdsmi commit: 8ed52616ad]
2025-06-02 14:01:06 -05:00
Justin Williams d8b32bf2ee [SWDEV-533596] CI - Fixed Docs
Signed-off-by: Justin Williams <Justin.Williams@amd.com>


[ROCm/amdsmi commit: bf0448ff96]
2025-06-02 13:48:01 -05:00
Joseph Narlo 3d0f98c16d [SWDEV-522996] Syncing Unified Header and AMDSMI
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>


[ROCm/amdsmi commit: ee43ec71e8]
2025-06-02 13:44:33 -05:00
Maisam Arif cd11d4f051 Updated Changelog
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I10efa8ed10288d3445a330ad27081d1f03113b38


[ROCm/amdsmi commit: 996917e9bc]
2025-05-30 20:48:29 -05:00
Maisam Arif 00ad72baf9 Deprecated PASID
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ib008f80f3d736172079358c0ceb3ebca87340d28


[ROCm/amdsmi commit: c89b5db09d]
2025-05-30 20:48:29 -05:00
Maisam Arif 16d60f3411 [SWDEV-488303] Fixed process list information source
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Iec3416cb5ca1bdd806c3225b514bbf3dbf8c0d2e


[ROCm/amdsmi commit: cebb0799cb]
2025-05-30 20:48:29 -05:00
Maisam Arif 5324134708 Version Bump 26.0.0
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I29ea6fa781dfc338a60b390ff498c46b4a1efe52


[ROCm/amdsmi commit: cc4dfd834f]
2025-05-30 20:48:29 -05:00
gabrpham_amdeng 42238ef83c Updated CLI Tool Help
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: c8f33c96c3]
2025-05-30 20:10:32 -05:00
dependabot[bot] 2f803473e1 Bump tornado from 6.4.2 to 6.5.1 in /docs/sphinx (#418)
Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.4.2 to 6.5.1.
- [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst)
- [Commits](https://github.com/tornadoweb/tornado/compare/v6.4.2...v6.5.1)

---
updated-dependencies:
- dependency-name: tornado
  dependency-version: 6.5.1
  dependency-type: indirect
...

[ROCm/amdsmi commit: dd81cfd688]
2025-05-30 19:53:58 -05:00
gabrpham_amdeng c4f8ba1178 Suppressed help text of default command
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 1fa4cdacf3]
2025-05-30 19:53:14 -05:00
Pham, Gabriel d229f86108 [SWDEV-511822] Added group check to default command (#415)
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: daf74d1cd6]
2025-05-30 18:40:18 -05:00
Kanangot Balakrishnan, Bindhiya f12c72a4e2 [SWDEV-530633] Use gpu_metric speed and BW for xgmi (#366)
The xgmi command was showing pcie bit rate and bandwidth instead of xgmi. Corrected the API to get xgmi data from gpu metric.
Added python API for amdsmi_get_link_metrics. Modified the amdsmi_link_metrics struct.
Added check to confirm non zero partition got xgmi command.

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 2eff0b3764]
2025-05-30 16:51:11 -05:00
Castillo, Juan c830bb4d74 [SWDEV-534728] Fixed deep_sleep status does not work with --json flag (#413)
- When in json output mode the .rstrip function does not work due to dict obj type.
    - The clk_value is now checked for dict instance before extracting the value.
    - If clk_value is a dict then the .get() function is used to extract the value.
    - Else it is a string obj which uses .split() to extract the value.
    - If clk_value is < min_clk_value then deep_sleep is set to ENABLED
    - Initialize clk_value and min_clk_value to 0 for each loop.
    - Fix if/else for better readability

---------

Signed-off-by: Juan Castillo <juan.castillo@amd.com>

[ROCm/amdsmi commit: 2e8aaf02c9]
2025-05-30 16:45:32 -05:00
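Note on c830bb4d74: the steps listed in the commit message (dict check, .get() for dicts, .split() for strings, then comparing against the minimum clock) can be sketched as follows. This is not the CLI's code: the function names, the {"value": ..., "unit": ...} dict shape, the "132 MHz" string shape, and the DISABLED fallback are assumptions for illustration.

```python
def extract_clock_value(clk):
    """Return the numeric clock from either output shape.

    Assumed shapes: {'value': 132, 'unit': 'MHz'} in JSON mode,
    or a string such as '132 MHz' in human-readable mode.
    """
    if isinstance(clk, dict):
        # Dicts have no .rstrip()/.split(), so read the field directly.
        return int(clk.get("value", 0))
    return int(str(clk).split()[0])


def deep_sleep_status(clk, min_clk):
    """Report ENABLED when the current clock is below the minimum clock."""
    # Values are recomputed on every call, mirroring the per-loop reset.
    clk_value = extract_clock_value(clk)
    min_clk_value = extract_clock_value(min_clk)
    return "ENABLED" if clk_value < min_clk_value else "DISABLED"


if __name__ == "__main__":
    print(deep_sleep_status({"value": 100, "unit": "MHz"},
                            {"value": 500, "unit": "MHz"}))
    print(deep_sleep_status("800 MHz", "500 MHz"))
```
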
Arif, Maisam da430dec05 [SWDEV-488303] Adjusted process vram_mem data source (#411)
* [SWDEV-488303] Adjusted process vram_mem data source
* Standardized sscanf format strings

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>

[ROCm/amdsmi commit: 42441c78ea]
2025-05-29 23:26:12 -05:00
Maisam Arif b2b6779593 [SWDEV-523247] Corrected amdsmi_get_gpu_vram_usage total
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I0f8bb067bf34f64d1b8d41e2a89d3a79a6745990


[ROCm/amdsmi commit: 876f3976e0]
2025-05-29 21:30:00 -05:00
Arif, Maisam 465f2e6a41 [SWDEV-488303] Updated CU occupancy for per-process retrieval (#243)
Change-Id: I2990597c6dd4b2e8cf3e11ce60f72049ebdd9a8c
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 0fdaebdbaa]
2025-05-29 20:35:27 -05:00