Commit graph

191 commits

Author SHA1 Message Date
gabrpham_amdeng 18faddf6f3 Added support for configuring PPT1 power cap
- Updated python integration test to account for PPT1 support changes
  - Updated set/reset power-cap input format
  - Adjusted python API and updated C++ API test

Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Change-Id: Ia9d02868b6e91c88c10a9772d9e2d9f37c3c352f
2025-11-13 13:08:12 -06:00
Pryor, Adam 2144cfbba4 [SWDEV-357472] Add evicted_ms metric (#620)
- **Added evicted_time metric for kfd processes**.  
  - Time that queues are evicted on a GPU in milliseconds
  - Added to CLI in `amd-smi monitor -q` and `amd-smi process`
  - Added to C API and Python API:
    - amdsmi_get_gpu_process_list()
    - amdsmi_get_gpu_compute_process_info()
    - amdsmi_get_gpu_compute_process_info_by_pid()

---------

Signed-off-by: Pryor, Adam <Adam.Pryor@amd.com>
2025-10-28 14:49:03 -05:00
dependabot[bot] 6f222c11a6 Bump rocm-docs-core[api_reference] from 1.26.0 to 1.27.0 in /docs/sphinx (#790)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.26.0 to 1.27.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.26.0...v1.27.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.27.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-28 09:59:49 -05:00
Park, Peter 12fb58c30b Update install instructions (#759)
- `amdgpu-install` is no longer recommended. Link to separate driver
installation docs.
- add verify step
- update readme
- add package info

Signed-off-by: Park, Peter <Peter.Park@amd.com>
2025-10-28 09:59:11 -05:00
Saeed, Oosman f7c9fe3011 [SWDEV-551318] Update readme doc: amdsmi_get_afids_from_cper() input arguments (#766)
* Update readme doc: amdsmi_get_afids_from_cper() input argument is only bytes, not a list of dicts each with keys “bytes” (List[int]) and “size” (int)

---------

Signed-off-by: Oosman Saeed <oossaeed@amd.com>
2025-10-17 15:42:17 -05:00
Pryor, Adam cba4c871d3 [SWDEV-559082] Add asic info cache (#756)
Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-10-08 21:48:08 -05:00
Narlo, Joseph 7decbc67a1 [SWDEV-539078] Add missing API definitions to python interface (#525)
Added the following APIs to amdsmi_interface.py:
	amdsmi_get_cpu_handle()
	amdsmi_get_esmi_err_msg()
	amdsmi_get_gpu_event_notification()
	amdsmi_get_processor_count_from_handles()
	amdsmi_get_processor_handles_by_type()
	amdsmi_gpu_validate_ras_eeprom()
	amdsmi_init_gpu_event_notification()
	amdsmi_set_gpu_event_notification_mask()
	amdsmi_stop_gpu_event_notification()
	amdsmi_get_gpu_busy_percent()

Added an additional return value to the amdsmi_get_xgmi_plpd() API.
	The `policies` entry is added to the end of the dictionary to match the API definition.
	The `plpds` entry is marked for deprecation because it holds the same information as `policies`.
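
A caller moving off the deprecated entry could read `policies` first and fall back to `plpds`. This is a minimal sketch, assuming only that the return value is a plain dictionary with those two keys; the helper name `get_plpd_policies` and the sample contents are hypothetical and not taken from the amdsmi API:

```python
def get_plpd_policies(plpd_info: dict) -> list:
    """Return the policy list, preferring 'policies' over the deprecated 'plpds'."""
    if "policies" in plpd_info:
        return plpd_info["policies"]
    # Fall back to the deprecated key, e.g. for older library versions.
    return plpd_info.get("plpds", [])


# Hypothetical sample data; the real dictionary layout may differ.
sample = {
    "plpds": [{"policy_id": 0}],
    "policies": [{"policy_id": 0}],
}
print(get_plpd_policies(sample))
```

Reading through one helper keeps the fallback in a single place, so it can be deleted once `plpds` is actually removed.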

---------

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-10-06 14:50:00 -05:00
dependabot[bot] faf0024135 Bump rocm-docs-core[api_reference] from 1.25.0 to 1.26.0 in /docs/sphinx
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.25.0 to 1.26.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.26.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.25.0...v1.26.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.26.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-10-06 09:17:33 -05:00
Arif, Maisam 8758b8f75a [SWDEV-456192] Update process CLI help text (#720)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-09-26 17:33:01 -05:00
Park, Peter 311eade5b1 docs: Update Doxygen, Sphinx, and readthedocs configs (#719)
* conf: update RTD config to ub24.04 (doxygen 1.9.8) and py3.12
* update generate-docs workflow
* Update "modules" to "topics" due to Doxygen 1.9.8
* bump rocm-docs-core to 1.25.0 and pip-compile requirements.txt
* doxygen: fill in version string in Doxyfile from conf.py
* remove unneeded rocm-smi-lib tutorials
* remove wikipedia references in doxyfile to satisfy ci check

Signed-off-by: Park, Peter <Peter.Park@amd.com>
2025-09-26 17:30:48 -05:00
AL Musaffar, Yazen 6550c51b35 [AMD-SMI] [SWDEV-551318] amdsmi_get_afids_from_cper python api Docs Updated (#709)
* Fix formatting and CPER record examples for amdsmi_get_afids_from_cper in the documentation

---------
Change-Id: Ib5e268dc818adc633541652a0eb982641385bf7d
Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
2025-09-24 21:06:38 -05:00
Maisam Arif cd21b5edcc [SWDEV-554587] Added IFWI Version and boot_firmware API
- Changed amd-smi static --vbios to accept ifwi
- Change population logic for vbios version API
- Added IFWI boot_firmware to the CLI, C++, Rust, and Python API

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I4ea504d40a43cfb011ab38fc9a664ecf12d39c8a
2025-09-23 16:05:10 -05:00
Maisam Arif c708a7e11f Version Bump 26.1.0
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I2ca5acf58741fa4c64476615371b400b080e17e8
2025-09-23 16:05:10 -05:00
Park, Peter 5d0a39fa9d docs: Fix links to API usage examples (#701)
* fix links to python apis
* add links to repo for example code
* fix `WARNING: Pygments lexer name is not known`

Signed-off-by: Peter Park <Peter.Park@amd.com>
2025-09-19 10:07:38 -05:00
Saeed, Oosman ea225b459b Python_Cli_Examples (#696)
* Adjusted Python CLI examples

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-09-16 18:53:07 -05:00
Arif, Maisam fd5eb4e963 [SWDEV-550075] Updated README to link to amd-smi virtualization repo (#664)
Co-authored-by: Peter Park <peter.park@amd.com>
2025-09-09 16:05:01 -05:00
Park, Peter 5e92adc5b3 [SWDEV-551318] Add doc about RAS / CPER (#636)
* add doc about ras/cper
* add sample code examples for CPER and AFID
---------

Signed-off-by: Park, Peter <Peter.Park@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Oosman Saeed <oossaeed@amd.com>
2025-09-09 10:27:15 -05:00
Mario Limonciello (AMD) 924a06d1e1 Remove unnecessary includes
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-09-05 17:44:17 -05:00
Arif, Maisam ed2300516f Revert "[SWDEV-536176] libdrm_amdgpu dependency change (#448)"
This reverts commit 652761de54.
2025-08-27 20:11:17 -05:00
Arif, Maisam 652761de54 [SWDEV-536176] libdrm_amdgpu dependency change (#448)
* Cmake fix updates
* Next fix will be addressing libdrm further

---------

Signed-off-by: adapryor <Adam.pryor@amd.com>
Signed-off-by: Justin Williams <juwillia@amd.com>
2025-08-27 09:32:51 -05:00
Saeed, Oosman fd5e37a07e [SWDEV-546239] amd-smi ras cper - no data created (#614)
* Update amd-smi doc with examples of CPER and AFID API usage.

---------

Signed-off-by: Oosman Saeed <oossaeed@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-08-20 11:27:41 -05:00
AL Musaffar, Yazen e84e364b35 [SWDEV-549789] Removed incorrect CPER AFID references (#619)
* Fix for afid help
* Update amdsmi_parser.py

Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
2025-08-19 18:55:33 -05:00
Pryor, Adam 2dc2e12a97 Documentation updates for AMDSMI_GPU_METRICS_CACHE_MS (#564)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-08-05 19:58:37 -05:00
Pham, Gabriel e2eac98496 [SWDEV-545342] Fixed amdsmi_link_type_t enumeration (#560)
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-07-22 18:22:49 -05:00
Castillo, Juan 3b1957e674 [SWDEV-531904] Added test_get_gpu_revision (#533)
* [SWDEV-531904] Added test_get_gpu_revision
New:
- amdsmi_get_gpu_revision() previously not implemented in amdsmi_interface.py
- test_get_gpu_revision() missing integration test.

Updated:
- changelog.md: added new doc fields for ROCm 7.1
- amdsmi-py-api.md: added field|description doc fields

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
2025-07-15 19:35:54 -05:00
dependabot[bot] 28e577f1c0 Bump urllib3 from 2.3.0 to 2.5.0 in /docs/sphinx (#546)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.3.0 to 2.5.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.3.0...2.5.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.5.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-07-13 11:43:30 -05:00
Kanangot Balakrishnan, Bindhiya f6b854b4ed [SWDEV-541289] Update violation argument in amd-smi (#526)
* Disabled violation argument for monitor on guests as it is supported on BM only. 
* Added `-v` and `--violation` args to metric along with `throttle` due to legacy behavior.
	* Suppressed the metric throttle arg so it does not show in the help text
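
One common way to keep a legacy flag working while hiding it from the help text is `argparse.SUPPRESS`. The sketch below illustrates that general technique only; the parser name, the `--throttle` spelling, and the shared `dest` are assumptions, and the actual code in the amd-smi parser may be structured differently:

```python
import argparse

# Hypothetical stand-in for the "metric" subcommand parser.
parser = argparse.ArgumentParser(prog="metric-sketch")

# Documented flags for violation status.
parser.add_argument("-v", "--violation", action="store_true",
                    help="Show violation status")

# Legacy alias: still accepted and stored in the same destination,
# but hidden from --help and the usage line via argparse.SUPPRESS.
parser.add_argument("--throttle", dest="violation", action="store_true",
                    help=argparse.SUPPRESS)

args = parser.parse_args(["--throttle"])
print(args.violation)
```

Existing scripts that pass the legacy flag keep working, while new users only see `-v` and `--violation` in the help output.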

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-07-09 16:38:09 -05:00
Park, Peter 8039ab9449 Fix links in docs (#532)
* fix links in amdsmi_cli/README.md
* fix xrefs to install docs
* rm rocm-smi examples and add cli tutorial
* rm disclaimer and add amd smi contributing guidelines to index

Signed-off-by: Peter Park <Peter.Park@amd.com>
2025-07-07 11:18:40 -05:00
Saeed, Oosman 5b95d227bc [SWDEV-538308] CPER CLI 20 limit bug (#499)
The bug was reproduced like this.

In terminal #1, run command:
sudo amd-smi ras --cper --gpu 6 --severity all --folder /tmp/cper_dump --follow 

In terminal #2, inject errors:
while true; do sudo amdgpuras -b 7 -s 1 -m 6 -t 2; sleep 2; done

Terminal #1 starts dumping the CPER entry information it captures. After 20 entries have been captured, open terminal #3 and run the same command as in terminal #1:
sudo amd-smi ras --cper --gpu 6 --severity all --folder /tmp/cper_dump --follow 

Terminal #3 produces no output, even while terminal #1 continues capturing and printing information.

The fix:

More than 20 CPER entries are already available in the GPU buffer. When the command is run from terminal #3, it starts capturing from the beginning and passes 20 buffers to copy entries into, so the C++ API returns a code saying there is more data available.

The Python CLI should not treat this as an error, but should continue to print what the API returned.
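
The idea can be sketched with a small simulation. Everything here is hypothetical: the `Status` codes, `fetch_entries`, and `dump_all` are stand-ins for the real C++ API and CLI code, and the loop that keeps reading after a "more data" result is an assumption about how a caller might drain the buffer, not a description of the actual fix:

```python
from enum import Enum, auto


class Status(Enum):
    """Hypothetical status codes standing in for the real API return codes."""
    SUCCESS = auto()
    MORE_DATA = auto()  # the buffers filled up and further entries remain
    ERROR = auto()


def fetch_entries(available, cursor, buffer_count=20):
    """Simulated API call: copy up to buffer_count entries starting at cursor."""
    chunk = available[cursor:cursor + buffer_count]
    new_cursor = cursor + len(chunk)
    status = Status.MORE_DATA if new_cursor < len(available) else Status.SUCCESS
    return status, chunk, new_cursor


def dump_all(available):
    """Collect every entry, treating MORE_DATA as 'keep going', not as a failure."""
    cursor, printed = 0, []
    while True:
        status, chunk, cursor = fetch_entries(available, cursor)
        if status is Status.ERROR:
            raise RuntimeError("failed to read CPER entries")
        printed.extend(chunk)         # keep whatever the API returned
        if status is Status.SUCCESS:  # nothing left to read
            return printed


entries = [f"cper-{i}" for i in range(45)]
print(len(dump_all(entries)))
```

The key point is that only a genuine error aborts the dump; a partial result with more data pending is still printed.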

---------

Signed-off-by: Oosman Saeed <oossaeed@amd.com>
2025-07-07 11:11:13 -05:00
Narlo, Joseph db3d763aad [SWDEV-482203] amd-smi Usage basics for C Library Multiple doc errors (#477)
* Added finding rocm include and library paths in code examples

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-06-19 11:25:57 -05:00
dependabot[bot] 152184dd49 Bump rocm-docs-core[api_reference] from 1.17.0 to 1.20.1 in /docs/sphinx
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.17.0 to 1.20.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.20.1/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.17.0...v1.20.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.20.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-13 16:35:08 -05:00
dependabot[bot] 7e956ce4f3 Bump requests from 2.32.3 to 2.32.4 in /docs/sphinx (#471)
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-11 08:23:27 -05:00
Narlo, Joseph c0c4e021ea [SWDEV-532069] Doxygen Not Picking Non-Documented Values (#362)
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Deepak Mewar <deepak.mewar@amd.com>
2025-06-03 17:24:44 -05:00
Narlo, Joseph ce7d6dfe61 [SWDEV-532769] amd-smi APIs mismatch with documentation (#428)
* Populated socket_power to get power info
---------

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-06-03 17:12:13 -05:00
Saeed, Oosman fab13c5b60 [SWDEV-530385] show afids on each line of printout (#422)
* show afids on each line of printout
* clean up afids and cper code
---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-06-02 17:22:10 -05:00
Maisam Arif c89b5db09d Deprecated PASID
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ib008f80f3d736172079358c0ceb3ebca87340d28
2025-05-30 20:48:29 -05:00
Maisam Arif cebb0799cb [SWDEV-488303] Fixed process list information source
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Iec3416cb5ca1bdd806c3225b514bbf3dbf8c0d2e
2025-05-30 20:48:29 -05:00
Maisam Arif cc4dfd834f Version Bump 26.0.0
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I29ea6fa781dfc338a60b390ff498c46b4a1efe52
2025-05-30 20:48:29 -05:00
gabrpham_amdeng c8f33c96c3 Updated CLI Tool Help
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-05-30 20:10:32 -05:00
dependabot[bot] dd81cfd688 Bump tornado from 6.4.2 to 6.5.1 in /docs/sphinx (#418)
Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.4.2 to 6.5.1.
- [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst)
- [Commits](https://github.com/tornadoweb/tornado/compare/v6.4.2...v6.5.1)

---
updated-dependencies:
- dependency-name: tornado
  dependency-version: 6.5.1
  dependency-type: indirect
...
2025-05-30 19:53:58 -05:00
Kanangot Balakrishnan, Bindhiya 2eff0b3764 [SWDEV-530633] Use gpu_metric speed and BW for xgmi (#366)
The xgmi command was showing the PCIe bit rate and bandwidth instead of the XGMI values. Corrected the API to get XGMI data from the GPU metrics.
Added a Python API for amdsmi_get_link_metrics. Modified the amdsmi_link_metrics struct.
Added a check for non-zero partitions in the xgmi command.

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-30 16:51:11 -05:00
Arif, Maisam 42441c78ea [SWDEV-488303] Adjusted process vram_mem data source (#411)
* [SWDEV-488303] Adjusted process vram_mem data source
* Standardized sscanf format strings

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-05-29 23:26:12 -05:00
Arif, Maisam 0fdaebdbaa [SWDEV-488303] Updated CU occupancy for per-process retrieval (#243)
Change-Id: I2990597c6dd4b2e8cf3e11ce60f72049ebdd9a8c
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-29 20:35:27 -05:00
Maisam Arif fba62e2270 [SWDEV-534707] Adjust power value documentation
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I1c4516e403715b9a1fe9c78fae94848c89daa920
2025-05-29 18:55:44 -05:00
Liu, Shuzhou (Bill) 970560fc7c [SWDEV-520665] Add support for board voltage (#303)
* Add the API and CLI to show the board voltage. 

---------

Change-Id: Icb25bd653bb1d004704b5a21b378ca31b2b242c7
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
2025-05-29 18:55:08 -05:00
Kanangot Balakrishnan, Bindhiya 8e486c832b [SWDEV-463406] Update python doc for amdsmi_get_violation_status (#406)
* Updated the amdsmi_get_violation_status python API doc with newly added fields.
---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-05-29 14:59:16 -05:00
Saeed, Oosman 91c9969b72 [SWDEV-530385] Fix CPER "--follow" & "--file-limit" (#380)
* --follow option fix & --file_limit option added
* change --file_limit and --cper_file to --file-limit and --cper-file

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-29 11:59:55 -05:00
Pryor, Adam d0a89393df Remove ring hang (#391)
Change-Id: I856cd0949d3661911ab9302148aa1bc6e72abeed

Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-05-29 11:58:46 -05:00
Narlo, Joseph 9862db63dd [SWDEV-532129] Update amdsmi asic info (#369)
* Added `subsystem_id` to `amdsmi_get_gpu_asic_info`
---------
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
2025-05-28 18:26:58 -05:00
Pham, Gabriel c40d4291f6 Updated docs with new KFD events (#382)
* Updated docs with new KFD events

---------

Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
2025-05-27 12:21:38 -05:00