171 Commits

Author SHA1 Message Date
Saeed, Oosman fd5e37a07e [SWDEV-546239] amd-smi ras cper - no data created (#614)
* Update amd-smi doc with examples of CPER and AFID API usage.

---------

Signed-off-by: Oosman Saeed <oossaeed@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-08-20 11:27:41 -05:00
AL Musaffar, Yazen e84e364b35 [SWDEV-549789] Removed incorrect CPER AFID references (#619)
* Fix for afid help
* Update amdsmi_parser.py

Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
2025-08-19 18:55:33 -05:00
Pryor, Adam 2dc2e12a97 Documentation updates for AMDSMI_GPU_METRICS_CACHE_MS (#564)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-08-05 19:58:37 -05:00
Pham, Gabriel e2eac98496 [SWDEV-545342] Fixed amdsmi_link_type_t enumeration (#560)
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-07-22 18:22:49 -05:00
Castillo, Juan 3b1957e674 [SWDEV-531904] Added test_get_gpu_revision (#533)
* [SWDEV-531904] Added test_get_gpu_revision
New:
- amdsmi_get_gpu_revision(), previously not implemented in amdsmi_interface.py
- test_get_gpu_revision(), a previously missing integration test

Updated:
- changelog.md: added new doc fields for ROCm 7.1
- amdsmi-py-api.md: added field|description doc fields

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
2025-07-15 19:35:54 -05:00
dependabot[bot] 28e577f1c0 Bump urllib3 from 2.3.0 to 2.5.0 in /docs/sphinx (#546)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.3.0 to 2.5.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.3.0...2.5.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.5.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-07-13 11:43:30 -05:00
Kanangot Balakrishnan, Bindhiya f6b854b4ed [SWDEV-541289] Update violation argument in amd-smi (#526)
* Disabled the violation argument for monitor on guests, as it is supported on bare metal (BM) only.
* Added `-v` and `--violation` args to metric, alongside `throttle`, which is kept for legacy behavior.
	* Suppressed the metric throttle arg so it is not shown in help text

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-07-09 16:38:09 -05:00
Park, Peter 8039ab9449 Fix links in docs (#532)
* fix links in amdsmi_cli/README.md
* fix xrefs to install docs
* rm rocm-smi examples and add cli tutorial
* rm disclaimer and add amd smi contributing guidelines to index

Signed-off-by: Peter Park <Peter.Park@amd.com>
2025-07-07 11:18:40 -05:00
Saeed, Oosman 5b95d227bc [SWDEV-538308] CPER CLI 20 limit bug (#499)
The bug was reproduced as follows.

In terminal #1, run command:
sudo amd-smi ras --cper --gpu 6 --severity all --folder /tmp/cper_dump --follow 

In terminal #2, inject errors:
while true; do sudo amdgpuras -b 7 -s 1 -m 6 -t 2; sleep 2; done

Terminal #1 starts dumping the CPER entry information it captures. After 20 entries have been captured, open terminal #3 and run the same command as in terminal #1:
sudo amd-smi ras --cper --gpu 6 --severity all --folder /tmp/cper_dump --follow

Terminal #3 produces no output, even while terminal #1 continues capturing and printing information.

The fix:

Because more than 20 CPER entries are already in the GPU buffer, when the command in terminal #3 starts capturing from the beginning and passes 20 buffers to copy entries into, the C++ API returns a status code indicating that more data is available.

The Python CLI should not treat this as an error; it should still print what the API returned.
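A minimal Python sketch of this handling, assuming a fetch call that returns a status, the copied entries and a cursor for the next call. `drain_cper_entries`, `make_fake_fetch` and the `STATUS_*` constants are hypothetical stand-ins, not the real amdsmi Python API; the sketch only shows a "more data available" status being treated as a partial success instead of an error.

```python
from typing import Callable, List, Tuple

# Hypothetical status codes standing in for the C++ API's return values.
STATUS_SUCCESS = 0
STATUS_MORE_DATA = 1  # more entries remain than fit in the supplied buffers
STATUS_ERROR = 2

# Hypothetical fetch signature: (cursor, max_entries) -> (status, entries, next_cursor)
FetchFn = Callable[[int, int], Tuple[int, List[str], int]]


def drain_cper_entries(fetch: FetchFn, max_entries: int = 20) -> List[str]:
    """Collect every entry, treating STATUS_MORE_DATA as a partial success."""
    collected: List[str] = []
    cursor = 0
    while True:
        status, entries, cursor = fetch(cursor, max_entries)
        if status not in (STATUS_SUCCESS, STATUS_MORE_DATA):
            raise RuntimeError(f"CPER fetch failed with status {status}")
        # Keep (or print) whatever was returned, even if more data remains.
        collected.extend(entries)
        if status == STATUS_SUCCESS:
            return collected
        if not entries:
            # Guard against a source that reports more data but returns nothing.
            raise RuntimeError("STATUS_MORE_DATA returned without any entries")


def make_fake_fetch(total: int) -> FetchFn:
    """Simulated GPU buffer holding `total` entries (illustration only)."""
    buffer = [f"entry-{i}" for i in range(total)]

    def fetch(cursor: int, max_entries: int) -> Tuple[int, List[str], int]:
        chunk = buffer[cursor:cursor + max_entries]
        next_cursor = cursor + len(chunk)
        status = STATUS_MORE_DATA if next_cursor < total else STATUS_SUCCESS
        return status, chunk, next_cursor

    return fetch


# 45 buffered entries, fetched 20 at a time: 20 + 20 + 5.
print(len(drain_cper_entries(make_fake_fetch(45))))  # prints 45
```

Only a status outside the success/more-data pair aborts the loop, which mirrors the reproduction above: a late-starting reader that sees more than 20 buffered entries still gets output.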

---------

Signed-off-by: Oosman Saeed <oossaeed@amd.com>
2025-07-07 11:11:13 -05:00
Narlo, Joseph db3d763aad [SWDEV-482203] amd-smi Usage basics for C Library Multiple doc errors (#477)
* Added instructions for finding the ROCm include and library paths to the code examples

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-06-19 11:25:57 -05:00
dependabot[bot] 152184dd49 Bump rocm-docs-core[api_reference] from 1.17.0 to 1.20.1 in /docs/sphinx
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.17.0 to 1.20.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.20.1/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.17.0...v1.20.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.20.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-13 16:35:08 -05:00
dependabot[bot] 7e956ce4f3 Bump requests from 2.32.3 to 2.32.4 in /docs/sphinx (#471)
Bumps [requests](https://github.com/psf/requests) from 2.32.3 to 2.32.4.
- [Release notes](https://github.com/psf/requests/releases)
- [Changelog](https://github.com/psf/requests/blob/main/HISTORY.md)
- [Commits](https://github.com/psf/requests/compare/v2.32.3...v2.32.4)

---
updated-dependencies:
- dependency-name: requests
  dependency-version: 2.32.4
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-11 08:23:27 -05:00
Narlo, Joseph c0c4e021ea [SWDEV-532069] Doxygen Not Picking Non-Documented Values (#362)
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
Co-authored-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Deepak Mewar <deepak.mewar@amd.com>
2025-06-03 17:24:44 -05:00
Narlo, Joseph ce7d6dfe61 [SWDEV-532769] amd-smi APIs mismatch with documentation (#428)
* Populated socket_power to get power info
---------

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-06-03 17:12:13 -05:00
Saeed, Oosman fab13c5b60 [SWDEV-530385] show afids on each line of printout (#422)
* show afids on each line of printout
* clean up afids and cper code
---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-06-02 17:22:10 -05:00
Maisam Arif c89b5db09d Deprecated PASID
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ib008f80f3d736172079358c0ceb3ebca87340d28
2025-05-30 20:48:29 -05:00
Maisam Arif cebb0799cb [SWDEV-488303] Fixed process list information source
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Iec3416cb5ca1bdd806c3225b514bbf3dbf8c0d2e
2025-05-30 20:48:29 -05:00
Maisam Arif cc4dfd834f Version Bump 26.0.0
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I29ea6fa781dfc338a60b390ff498c46b4a1efe52
2025-05-30 20:48:29 -05:00
gabrpham_amdeng c8f33c96c3 Updated CLI Tool Help
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-05-30 20:10:32 -05:00
dependabot[bot] dd81cfd688 Bump tornado from 6.4.2 to 6.5.1 in /docs/sphinx (#418)
Bumps [tornado](https://github.com/tornadoweb/tornado) from 6.4.2 to 6.5.1.
- [Changelog](https://github.com/tornadoweb/tornado/blob/master/docs/releases.rst)
- [Commits](https://github.com/tornadoweb/tornado/compare/v6.4.2...v6.5.1)

---
updated-dependencies:
- dependency-name: tornado
  dependency-version: 6.5.1
  dependency-type: indirect
...
2025-05-30 19:53:58 -05:00
Kanangot Balakrishnan, Bindhiya 2eff0b3764 [SWDEV-530633] Use gpu_metric speed and BW for xgmi (#366)
The xgmi command was showing the PCIe bit rate and bandwidth instead of the XGMI values. Corrected the API to get XGMI data from GPU metrics.
Added a Python API for amdsmi_get_link_metrics. Modified the amdsmi_link_metrics struct.
Added a check to confirm a non-zero partition for the xgmi command.

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-30 16:51:11 -05:00
Arif, Maisam 42441c78ea [SWDEV-488303] Adjusted process vram_mem data source (#411)
* [SWDEV-488303] Adjusted process vram_mem data source
* Standardized sscanf format strings

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-05-29 23:26:12 -05:00
Arif, Maisam 0fdaebdbaa [SWDEV-488303] Updated CU occupancy for per-process retrieval (#243)
Change-Id: I2990597c6dd4b2e8cf3e11ce60f72049ebdd9a8c
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-29 20:35:27 -05:00
Maisam Arif fba62e2270 [SWDEV-534707] Adjust power value documentation
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I1c4516e403715b9a1fe9c78fae94848c89daa920
2025-05-29 18:55:44 -05:00
Liu, Shuzhou (Bill) 970560fc7c [SWDEV-520665] Add support for board voltage (#303)
* Add the API and CLI to show the board voltage. 

---------

Change-Id: Icb25bd653bb1d004704b5a21b378ca31b2b242c7
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
2025-05-29 18:55:08 -05:00
Kanangot Balakrishnan, Bindhiya 8e486c832b [SWDEV-463406] Update python doc for amdsmi_get_violation_status (#406)
* Updated the amdsmi_get_violation_status python API doc with newly added fields.
---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-05-29 14:59:16 -05:00
Saeed, Oosman 91c9969b72 [SWDEV-530385] Fix CPER "--follow" & "--file-limit" (#380)
* --follow option fix & --file_limit option added
* change --file_limit and --cper_file to --file-limit and --cper-file

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-29 11:59:55 -05:00
Pryor, Adam d0a89393df Remove ring hang (#391)
Change-Id: I856cd0949d3661911ab9302148aa1bc6e72abeed

Signed-off-by: adapryor <Adam.pryor@amd.com>
2025-05-29 11:58:46 -05:00
Narlo, Joseph 9862db63dd [SWDEV-532129] Update amdsmi asic info (#369)
* Added `subsystem_id` to `amdsmi_get_gpu_asic_info`
---------
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>
2025-05-28 18:26:58 -05:00
Pham, Gabriel c40d4291f6 Updated docs with new KFD events (#382)
* Updated docs with new KFD events

---------

Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
2025-05-27 12:21:38 -05:00
Mewar, Deepak b999f86611 [SWDEV-512393] Added amdsmi_get_cpu_affinity_with_scope (#198)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
2025-05-20 01:06:09 -05:00
Saeed, Oosman 1bb1f8acc2 [SWDEV-522623] Add afid functionality to API and CLI (#330)
Change-Id: I015bde926491d54e09da8f39b05650515711e09f

Signed-off-by: Oosman Saeed <oossaeed@amd.com>
Co-authored-by: Oosman Saeed <oossaeed@amd.com>
2025-05-16 10:49:56 +08:00
Park, Peter d4f057f95f [SWDEV-528854] docs: Add description of N/A in SMI tool output (#363)
Signed-off-by: Park, Peter <Peter.Park@amd.com>
2025-05-14 11:43:33 -05:00
Arif, Maisam 249537b2ff CPER Doc update (#352)
Change-Id: I59053eda863fc2b7349a3071a02e4557a8abe8c7

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-08 12:20:00 -05:00
Arif, Maisam ace3b0901a Version & Doc update (#343)
Change-Id: Ibf8e1809913e30aba4b21ba889b72e5db7205736

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-05-08 12:19:04 -05:00
Poag, Charis b5a43b7744 [SWDEV-528647/SWDEV-528450] Reduce API load times and libdrm/libdrm_amdgpu dynamic loading (#333)
Changes:
- Removed libdrm/libdrm_amdgpu dependencies
- Added/updated internal libdrm/libdrm_amdgpu/xf86drm API definitions
  so our APIs can reference them before dynamically loading
  the libdrm/libdrm_amdgpu libraries:
  1. Updated amdgpu_drm.h to match mainline
  2. Added xf86drm.h, matching mainline
- Modified internal DRM capabilities:
  1. Each API now connects to libdrm/libdrm_amdgpu independently
     and validates handles/responses accordingly
  2. AMD SMI initialization is no longer as tightly tied to
     libdrm
- Updated internal implementations of several APIs that
  connect to libdrm/libdrm_amdgpu or that conflict
  with open libdrm/libdrm_amdgpu connections:
  1. amdsmi_init()
  2. amdsmi_get_gpu_vram_usage()
  3. amdsmi_get_gpu_asic_info()
  4. amdsmi_get_gpu_vram_info()
  5. amdsmi_get_gpu_vbios_info()
  6. amdsmi_get_gpu_driver_info()
  7. amdsmi_get_gpu_virtualization_mode()
  8. amdsmi_set_gpu_memory_partition()
  9. amdsmi_set_gpu_memory_partition_mode()
- Cleaned up affected tests/APIs
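The idea of loading libdrm/libdrm_amdgpu at call time, with each API handling a failed connection on its own, can be illustrated with a short Python `ctypes` analogy. This is not AMD SMI's C++ implementation: `open_library`, `query_with_libdrm_amdgpu` and the `STATUS_*` constants are hypothetical; `amdgpu_device_initialize` is a real libdrm_amdgpu symbol used here only as a presence check.

```python
import ctypes
import ctypes.util
from typing import Optional

# Hypothetical status codes for this illustration.
STATUS_SUCCESS = 0
STATUS_DRM_ERROR = 1


def open_library(name: str) -> Optional[ctypes.CDLL]:
    """dlopen() a shared library at call time; return None if it cannot be loaded."""
    path = ctypes.util.find_library(name) or f"lib{name}.so"
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


def query_with_libdrm_amdgpu(lib_name: str = "drm_amdgpu") -> int:
    """Hypothetical API that opens libdrm_amdgpu itself instead of relying on init."""
    lib = open_library(lib_name)
    if lib is None:
        # Only this call fails; initialization never required the library.
        return STATUS_DRM_ERROR
    # Presence check for a real libdrm_amdgpu entry point; a real query
    # would call it through ctypes with a DRM file descriptor.
    if not hasattr(lib, "amdgpu_device_initialize"):
        return STATUS_DRM_ERROR
    return STATUS_SUCCESS


print(query_with_libdrm_amdgpu())  # 0 if libdrm_amdgpu is installed, else 1
```

Because nothing is linked or opened at import time, a missing library degrades a single query rather than the whole tool.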

Change-Id: I96e2cf1b06b0cfee1b01a5e991ccc6116c4245a8
2025-05-02 21:58:53 -05:00
dependabot[bot] 581ad75729 Bump jinja2 from 3.1.5 to 3.1.6 in /docs/sphinx
Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.5 to 3.1.6.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](https://github.com/pallets/jinja/compare/3.1.5...3.1.6)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-version: 3.1.6
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-16 15:08:41 -05:00
Liu, Shuzhou (Bill) d73452b3bf [SWDEV-526610] Palamida scan remediation copyright (#279)
Add missing copyrights
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
2025-04-16 14:54:45 -05:00
Kanangot Balakrishnan, Bindhiya 9d7964dff5 [SWDEV-516592] Add python interface API for Bad Page Threshold (#141)
- Added python interface APIs for amdsmi_get_gpu_bad_page_threshold()
- Updated the docs and changelog.

---------

Signed-off-by: Kanangot Balakrishnan, Bindhiya <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
2025-04-14 04:19:45 -05:00
Justin Williams af943ac05c [SWDEV-521116] Added 'more_itertools' error workaround
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
2025-04-12 13:42:43 -05:00
Arif, Maisam d81871ef16 [SWDEV-511234] Added amdsmi_get_gpu_cper_entries & CLI implementation
Added amdsmi_get_gpu_cper_entries() in the python and C APIs

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
Co-authored-by: Saeed, Oosman <Oosman.Saeed@amd.com>
Co-authored-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
2025-04-12 01:54:57 -05:00
Arif, Maisam 35fbe2cbf1 [SWDEV-521408] Fixed call to amdsmi_get_gpu_virtualization_mode (#230)
Change-Id: I29c86f8982b53cc139004ebc06b26a5d8f430091

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-04-01 16:57:23 -05:00
Galantsev, Dmitrii b0129c390c Bump Version 25.4.0
Change-Id: Ief60ff2270e7e73d4e14b5181fa6fb18e32bcc1e
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-28 21:50:38 -05:00
Yuan, Perry 68e44c7f66 [SWDEV-482949] Add CPU model name querying support (#33)
- Add support for querying CPU vendor info, which RDC will call to
discover CPU information
- Move esmi header declarations to impl/amd_smi_common.h
- Remove duplicated amdsmi_cpu_util_t

---------

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Deepak Mewar <deepak.mewar@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Deepak Mewar <deepak.mewar@amd.com>
2025-03-28 21:21:39 -05:00
Galantsev, Dmitrii 4a3c70136f Make amdsmi_get_power_info backwards compatible
Change-Id: Ie5b4c35265827e78934caa94c142d31efce597e4
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-03-19 23:23:48 -05:00
Park, Peter 547ed49efb [SWDEV-519170] Docker container config documentation (#178)
* add docker container guide

* add example

* update index, README, and _toc
2025-03-14 09:58:46 -05:00
Gill, Harkirat 36a965b5c7 [ROCm/ROCm#4476] Update amdsmi-cli-tool.md to include partition cmd (#179)
Update amdsmi-cli-tool.md to include partition cmd

Signed-off-by: Gill, Harkirat <Harkirat.Gill@amd.com>
2025-03-13 04:59:34 -05:00
Arif, Maisam 0e67568902 [SWDEV-501958] Doc Update deprecating pasid in 7.0 (#166)
Change-Id: Ie19ba271c901d0be324143474871241272166124

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I024f7e2b5e7a5fcd6e1d12181d21ffacfe29c00f
2025-03-07 14:56:46 -06:00
Park, Peter 15c32f6116 [SWDEV-510820] Add missing goamdsmi documentation (#147)
* add API doc comments to goamdsmi.go
* update README and usage
* add sphinx directive to parse go doc
* fix walrus operator typos
* make docs more consistent
* add Go docs to index.md

---------

Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
2025-03-07 12:37:54 -06:00
AL Musaffar, Yazen 2936e00fed [SWDEV-453922] AMD SMI to provide mapping feature of other enumeration methods (#51)
Added enumeration mapping for 
- drm render
- drm card
- hsa id 
- hip id
- hip uuid (rocminfo uuid)

Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-03-07 09:09:12 -06:00