Commit Graph

1651 Commits

Author SHA1 Message Date
Justin Williams 6d062556c5 CI - Fixed Debian 10 Install Errors
Signed-off-by: Justin Williams <juwillia@amd.com>


[ROCm/amdsmi commit: 553f2bfce3]
2025-07-22 17:17:15 -05:00
Galantsev, Dmitrii 4b722349d6 .clangd - Remove google readability config
Change-Id: I0535af5053eac9add068926c44073ae884df2008
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 1042c4fa6b]
2025-07-21 15:06:53 -05:00
Castillo, Juan 801dbaedec [SWDEV-531904] Added test_get_gpu_revision (#533)
* [SWDEV-531904] Added test_get_gpu_revision
New:
- amdsmi_get_gpu_revision() previously not implemented in amdsmi_interface.py
- test_get_gpu_revision() missing integration test.

Updated:
- changelog.md: added new doc fields for ROCm 7.1
- amdsmi-py-api.md: added field|description doc fields

Signed-off-by: Juan Castillo <juan.castillo@amd.com>

[ROCm/amdsmi commit: 3b1957e674]
2025-07-15 19:35:54 -05:00
Saeed, Oosman 9092f915fe SWDEV-539482: Different sizes of mem leaks observed in amdsmitst (#538)
Signed-off-by: Oosman Saeed <oossaeed@amd.com>

[ROCm/amdsmi commit: 03414e20ee]
2025-07-15 14:33:27 -05:00
Bindhiya Kanangot Balakrishnan c2bc3ca72e [SWDEV-543308] Revert amdsmi_link_metrics structure change
Moved the bit_rate and max_bandwidth back into links in the
amdsmi_link_metrics_t struct as this change was impacting
other teams. Modified the C and Python APIs, wrapper, and
CLI accordingly.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: 645c313f00]
2025-07-14 13:56:26 -05:00
Maisam Arif 45d8f954c8 gpu_metrics caching fix
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I6dacb0b81d6677c354ef3c86af4d7d5156a76d8b


[ROCm/amdsmi commit: fcf494bbc5]
2025-07-14 12:12:37 -05:00
dependabot[bot] dbe45d5ea2 Bump urllib3 from 2.3.0 to 2.5.0 in /docs/sphinx (#546)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.3.0 to 2.5.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.3.0...2.5.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.5.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

[ROCm/amdsmi commit: 28e577f1c0]
2025-07-13 11:43:30 -05:00
Pryor, Adam 4303644f90 Add gpu metrics cache (#541)
* Add gpu metrics caching defaulted to 100ms
* AMDSMI_GPU_METRICS_CACHE_MS is used to set the caching rate limits
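
The caching behavior described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: only the `AMDSMI_GPU_METRICS_CACHE_MS` variable name and the 100 ms default come from the commit message, while `MetricsCache`, `read_metrics`, and `clock` are hypothetical names.

```python
import os
import time


class MetricsCache:
    """Hypothetical time-based cache; not the actual amdsmi implementation."""

    def __init__(self, read_metrics, clock=time.monotonic):
        # Variable name and 100 ms default are taken from the commit message.
        ttl_ms = int(os.environ.get("AMDSMI_GPU_METRICS_CACHE_MS", "100"))
        self._ttl_s = ttl_ms / 1000.0
        self._read_metrics = read_metrics  # assumed callable that does the expensive read
        self._clock = clock
        self._value = None
        self._stamp = None

    def get(self):
        now = self._clock()
        # Refresh on first use or once the cached value is older than the TTL.
        if self._stamp is None or now - self._stamp >= self._ttl_s:
            self._value = self._read_metrics()
            self._stamp = now
        return self._value
```

Injecting the clock keeps the sketch testable without sleeping; a value of 0 for the variable would make every call refresh in this sketch.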

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 42096c1398]
2025-07-13 09:56:29 -05:00
Maisam Arif 6531fdd0fb Reduced calls to drm devinfo for getting virtualization_mode
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I22a6a9ca15131b37a775e8d4f595fb13c0b043c7


[ROCm/amdsmi commit: 10f9aae0b3]
2025-07-11 12:26:42 -05:00
Justin Williams 35b740f736 CI - Added Docs Generation Instructions
Signed-off-by: Justin Williams <juwillia@amd.com>


[ROCm/amdsmi commit: af69f75a86]
2025-07-10 09:42:51 -05:00
Kanangot Balakrishnan, Bindhiya be13c1cf81 [SWDEV-541289] Update violation argument in amd-smi (#526)
* Disabled violation argument for monitor on guests as it is supported on BM only. 
* Added `-v` and `--violation` args to metric along with `throttle` due to legacy behavior.
	* Suppressed the metric throttle arg so that it does not show in help text
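
One way to keep a legacy argument working while hiding it from help output is `argparse.SUPPRESS`. The sketch below assumes the legacy argument is spelled `--throttle` and that both spellings set the same flag; the parser name and help strings are illustrative and not taken from the amd-smi source.

```python
import argparse

# Illustrative parser; the real amd-smi CLI defines many more arguments.
parser = argparse.ArgumentParser(prog="metric-sketch")
parser.add_argument(
    "-v", "--violation",
    action="store_true",
    help="Show violation status",
)
# Assumed spelling of the legacy flag. help=argparse.SUPPRESS hides it from
# --help and the usage line, but the parser still accepts it.
parser.add_argument(
    "--throttle",
    dest="violation",
    action="store_true",
    help=argparse.SUPPRESS,
)
```

With this setup, `parser.parse_args(["--throttle"])` and `parser.parse_args(["-v"])` both set `violation`, while only `-v`/`--violation` is listed in the help text.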

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

[ROCm/amdsmi commit: f6b854b4ed]
2025-07-09 16:38:09 -05:00
Kanangot Balakrishnan, Bindhiya 4f43139bce [SWDEV-539721] Show complete process name (#536)
Modified the file used to fetch process name so that complete name with path can be displayed.

Changes:
amd-smi monitor -q
- human readable format will output only the process name
- csv and json formats will print the full path

amd-smi process
- name will always be the full path to the process

amd-smi (default output)
- name will always be truncated.
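
The `amd-smi monitor -q` rule above can be sketched as a small helper. The function name and format labels are hypothetical, and treating "only the process name" as the path's basename is an assumption; the truncation length of the default output is not stated in the commit, so it is not modeled here.

```python
import os


def monitor_process_name(full_path, output_format):
    """Hypothetical helper mirroring the monitor -q rule described above."""
    # csv and json keep the full path to the process.
    if output_format in ("csv", "json"):
        return full_path
    # Human-readable output shows only the process name (assumed: basename).
    return os.path.basename(full_path)
```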

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 514517e536]
2025-07-09 16:34:39 -05:00
AL Musaffar, Yazen 64bf2d6ae9 [SWDEV-532904] CLI lists unusable UUID without sudo (#510)
Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>

[ROCm/amdsmi commit: 01a6158c85]
2025-07-09 15:45:03 -05:00
josnarlo 395a42cafa [SWDEV-536953] Align Power Cap Behavior with ROCM_SMI
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: 0257140504]
2025-07-09 15:37:40 -05:00
Castillo, Juan c9d14c1c93 [SWDEV-531904] Removed Handle Exceptions function (#531)
Removed:
- handle_exceptions(), which exposed, silenced, and logged AMDSMI exceptions to users and returned success/failure

Signed-off-by: Juan Castillo <juan.castillo@amd.com>

[ROCm/amdsmi commit: 34f465bfc5]
2025-07-07 13:26:26 -05:00
Kanangot Balakrishnan, Bindhiya a59cd4c25e [SWDEV-537852] Update process name help text (#517)
* [SWDEV-537852] Update process name help text

Currently the process name displays N/A if it needs elevated
permissions. Updated the help text of the default amd-smi, process,
and monitor commands to state the elevated-permission requirement.

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: ce230efaaa]
2025-07-07 11:26:10 -05:00
Poag, Charis 92f926b43b [SWDEV-533305] Remove partition info from amd-smi static (-p/--partition still available) + CLI API call cleanup (#529)
Updates:
- Separate extra API calls from the amd-smi CLI so that they target only the specific CLI commands that need them.
- Remove extra current_compute_partition SYSFS calls from amd-smi static.
- Remove the partition information from the default `amd-smi static` CLI command.
- Users must now use the `-p` argument to view partition information with `amd-smi static`.
- The help text for the `partition` argument has been updated to reflect this change.
- The partition information can still be accessed using the `amd-smi partition -c -m` or `sudo amd-smi partition -a` commands.

---------

Signed-off-by: Charis Poag <Charis.Poag@amd.com>

[ROCm/amdsmi commit: 88473b7fd0]
2025-07-07 11:21:46 -05:00
Park, Peter aecda316c4 Fix links in docs (#532)
* fix links in amdsmi_cli/README.md
* fix xrefs to install docs
* rm rocm-smi examples and add cli tutorial
* rm disclaimer and add amd smi contributing guidelines to index

Signed-off-by: Peter Park <Peter.Park@amd.com>

[ROCm/amdsmi commit: 8039ab9449]
2025-07-07 11:18:40 -05:00
Narlo, Joseph 540ecd41bd [SWDEV-541675] Remove Unnecessary API from amdsmi.h (#530)
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>

[ROCm/amdsmi commit: 2cf6272b53]
2025-07-07 11:14:27 -05:00
Saeed, Oosman 1c60502d5f [SWDEV-538308] CPER CLI 20 limit bug (#499)
The bug was reproduced like this.

In terminal #1, run command:
sudo amd-smi ras --cper --gpu 6 --severity all --folder /tmp/cper_dump --follow 

In terminal #2, inject errors:
while true; do sudo amdgpuras -b 7 -s 1 -m 6 -t 2; sleep 2; done

Terminal #1 starts dumping the CPER entry information that it captures. After 20 entries have been captured, open terminal #3 and run the same command as in terminal #1:
sudo amd-smi ras --cper --gpu 6 --severity all --folder /tmp/cper_dump --follow 

Terminal #3 produces no output, even while terminal #1 continues capturing and printing information.

The fix:

Since we already have more than 20 CPER entries available in the GPU buffer, when we run the command from terminal #3 to start capturing from the beginning and pass 20 buffers to copy entries to, the C++ API returns a code saying there is more data available.

The Python CLI should not treat this as an error, but should continue to print what the API returned.
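
The idea of the fix can be sketched as follows. All names here (`Status`, `print_entries`, `fetch`) are hypothetical stand-ins and not the actual amdsmi API; the sketch only shows a "more data available" status being treated as a successful partial read instead of an error.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = 0
    MORE_DATA = 1  # hypothetical stand-in for "more entries are available"
    ERROR = 2


def print_entries(fetch, limit=20, out=print):
    """Print whatever entries one fetch returned.

    `fetch(limit)` is an assumed callable returning (status, entries).
    Returns True when the caller should fetch again for remaining entries.
    """
    status, entries = fetch(limit)
    if status is Status.ERROR:
        raise RuntimeError("CPER fetch failed")
    # SUCCESS and MORE_DATA both carry valid entries, so print either way.
    for entry in entries:
        out(entry)
    return status is Status.MORE_DATA
```

Returning a flag rather than raising lets a `--follow`-style loop keep polling until the buffer is drained.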

---------

Signed-off-by: Oosman Saeed <oossaeed@amd.com>

[ROCm/amdsmi commit: 5b95d227bc]
2025-07-07 11:11:13 -05:00
Cheruvally, Aravindan d25c73783e [SWDEV-530465] Update share/doc/<pkgnm> License Folder (#516)
Update share/doc/ folder for license/docs to reflect correct package name.
Signed-off-by: Cheruvally, Aravindan <Aravindan.Cheruvally@amd.com>

[ROCm/amdsmi commit: f559075a81]
2025-07-03 02:07:54 -05:00
gabrpham_amdeng e9f7fe2842 [SWDEV-539451] Adjusted reset command to prevent reset on partitions
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: a2885d6e70]
2025-07-03 01:11:46 -05:00
Kanangot Balakrishnan, Bindhiya f1db852cba [SWDEV-530646] Update changelog for topology optimization (#523)
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>

[ROCm/amdsmi commit: 80f7045f61]
2025-06-30 17:36:14 -05:00
Justin Williams 490a186dfc CI - Changed CI Runners
Signed-off-by: Justin Williams <Justin.Williams@amd.com>


[ROCm/amdsmi commit: abd3bf2dcf]
2025-06-30 14:23:43 -05:00
Bindhiya Kanangot Balakrishnan 38f59e353a [SWDEV-540014] Correct topology link_type check
Topology numa_bw checks for non-xgmi links to set as N/A.
The recent change in link_type enum mapping caused this
condition to check for PCIE instead of XGMI. Corrected
the same.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: fa9ca21520]
2025-06-30 14:01:19 -05:00
Jeremy Newton 7b9ef0e406 Don't install asan docs if disabled
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/amdsmi commit: 529c6ee151]
2025-06-30 12:05:29 -05:00
Williams, Justin 21f5755794 Fixed NoDRM Failures
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Williams, Justin <Justin.Williams@amd.com>

[ROCm/amdsmi commit: 738627af29]
2025-06-25 13:18:25 -05:00
Justin Williams 010f95bfb7 Fixed NoDRM Failures
Signed-off-by: Justin Williams <Justin.Williams@amd.com>


[ROCm/amdsmi commit: bad4868f39]
2025-06-25 13:18:25 -05:00
josnarlo 3f6b0bb1c7 [SWDEV-539912] Add Skipping to Unit Tests
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: 5858d643f3]
2025-06-24 12:01:32 -05:00
Bindhiya Kanangot Balakrishnan 371b349f6c [SWDEV-530646] Reduce amdsmi_topo_get_p2p_status calls in topology
The topology method calls amdsmi_topo_get_p2p_status repeatedly
for the same GPU pairs across different table sections,
significantly impacting performance with 60+ GPUs. Reduced this
by implementing result caching.
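
The result caching described above can be sketched as a small memoizing wrapper. This is an illustration under assumptions: `make_cached_p2p` and `get_p2p_status` are hypothetical names, the wrapped callable stands in for the real `amdsmi_topo_get_p2p_status` call, and the cache is keyed on the ordered pair because the commit does not say whether the status is symmetric.

```python
def make_cached_p2p(get_p2p_status):
    """Wrap a (src, dst) -> status callable so each pair is queried once."""
    cache = {}

    def cached(src, dst):
        key = (src, dst)  # ordered pair; symmetry is not assumed
        if key not in cache:
            cache[key] = get_p2p_status(src, dst)
        return cache[key]

    return cached
```

Each table section can then call the wrapper freely; repeated lookups for the same GPU pair are served from the dictionary.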

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>


[ROCm/amdsmi commit: c3453f7c97]
2025-06-24 11:27:28 -05:00
Maisam Arif cd057e446f [SWDEV-533390] Removed kfd_ioctl.h from being copied on install
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I03cb03b5f034e822c8f3c2d1e11e8b4e57251905


[ROCm/amdsmi commit: 2d2e5fe692]
2025-06-20 14:32:16 -05:00
josnarlo 4c0c050962 [SWDEV-539591] Allow integration tests to skip Not Supported APIs
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: d8b8dc4116]
2025-06-20 14:19:56 -05:00
Galantsev, Dmitrii 44986cfbd4 DRM - Remove FD usage
Change-Id: I77dfa778ccd0d39a03289c2e11cf10357566ff16
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 9b5bbf555a]
2025-06-20 11:00:42 -05:00
Galantsev, Dmitrii 40228106b5 DRM - Remove caching
Change-Id: I21716cc953462e385e981024f75a9a7c2d76a466
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 202b46d96f]
2025-06-20 11:00:42 -05:00
Galantsev, Dmitrii ccdd52e9c0 DRM - Update to latest public
Change-Id: I9f7b46acbae654c377702a599c4b094fd621f101
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: cb2f152205]
2025-06-20 11:00:42 -05:00
Maisam Arif bc0c47c515 Fix subsystem_id str comparison
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Icbe2440884458b63b42cb653009e7df36eb31e0f


[ROCm/amdsmi commit: 28a7f536f9]
2025-06-19 17:21:17 -05:00
Narlo, Joseph c5e604f357 [SWDEV-489696] Improve AMD SMI Python APIs Functional and Unit Testing (#468)
* Adding python unit tests
* Remove duplicate functions definitions
* Added missing classes for __init__ for py-interface

---------

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 7c0802889b]
2025-06-19 16:38:34 -05:00
Arif, Maisam 6123abe733 [SWDEV-538786] Fix ecc counts returning file error (#494)
Change-Id: I5cea584289df95e89b6151d549bf69e4c3e50d22

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 967e879861]
2025-06-19 15:24:03 -05:00
Castillo, Juan 4a55abaa05 [SWDEV-531904] - Added GPU Cache Read Tests (#464)
New:
- gpu_cache_read.h and gpu_cache_read.cc
- Test reads GPU cache info and asserts valid structure
Updated:
- integration_test.py
- Added test_gpu_cache_info() and asserts valid structure
- test_get_gpu_compute_partition() to loop through all devices when test fail/pass
Added:
- test_get_gpu_compute_partition_returns_string() to integration_test.py
- This test displays the current compute partition for each bdf

---------

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Signed-off-by: Castillo, Juan <Juan.Castillo@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 470c62f887]
2025-06-19 15:23:34 -05:00
Narlo, Joseph f543f77e30 [SWDEV-537038] amd-smi-lib build failing Fix for integration_test.py (#496)
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
Signed-off-by: Narlo, Joseph <Joseph.Narlo@amd.com>

[ROCm/amdsmi commit: 57a749f457]
2025-06-19 15:12:31 -05:00
Pham, Gabriel aa95feee60 [SWDEV-531386] Changed source of metric GFX and MEM min and max clk to pp_od_clk_voltage (#453)
* Made corrections to reading of pp_od_clk_voltage
* Added fall back to pp_dpm files if pp_od_clk_voltage doesn't exist
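
The fallback described above can be sketched as a path-selection helper. The function name, the `domain` labels, and the mapping of GFX to `pp_dpm_sclk` and MEM to `pp_dpm_mclk` are assumptions for illustration; the commit only says "pp_dpm files". Parsing of the file contents is omitted.

```python
import os


def clock_source(device_dir, domain="gfx", exists=os.path.exists):
    """Pick the sysfs file to read min/max clocks from (hypothetical helper)."""
    preferred = os.path.join(device_dir, "pp_od_clk_voltage")
    if exists(preferred):
        return preferred
    # Assumed mapping of clock domain to the pp_dpm fallback file.
    fallback = {"gfx": "pp_dpm_sclk", "mem": "pp_dpm_mclk"}[domain]
    return os.path.join(device_dir, fallback)
```

The `exists` parameter is injectable only so the selection logic can be exercised without a real GPU sysfs tree.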

---------

Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

[ROCm/amdsmi commit: 4262aee8f5]
2025-06-19 15:00:45 -05:00
Galantsev, Dmitrii a480b2869d rsmi_init: Do not complain loudly when no driver is found (#74)
Co-authored-by: Samuel Thibault <samuel.thibault@ens-lyon.org>


[ROCm/amdsmi commit: ca52da194d]
2025-06-19 13:22:48 -05:00
Narlo, Joseph 154d266abc [SWDEV-482203] amd-smi Usage basics for C Library Multiple doc errors (#477)
* Added finding rocm include and library paths in code examples

Signed-off-by: josnarlo <Joseph.Narlo@amd.com>

[ROCm/amdsmi commit: db3d763aad]
2025-06-19 11:25:57 -05:00
josnarlo 0862dd11fb [SWDEV-537038] amd_smi-lib build failing Fix for integration_test.py
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>


[ROCm/amdsmi commit: 99b2bfbc61]
2025-06-19 11:23:25 -05:00
Justin Williams 31df8b46bd Adjusted amd-smi set --compute-partition docs
Signed-off-by: Justin Williams <juwillia@amd.com>


[ROCm/amdsmi commit: 81d58f06d1]
2025-06-19 10:58:04 -05:00
gabrpham_amdeng 771e3019ad Adjusted CU % logic to be more robust
[ROCm/amdsmi commit: 9729aba695]
2025-06-19 10:57:19 -05:00
gabrpham_amdeng d049815647 Changed NUM_CU to CU %
[ROCm/amdsmi commit: fd751ba918]
2025-06-19 10:57:19 -05:00
gabrpham 66d3ffe65a Added GTT Memory to process table of default command
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 9e221a3f09]
2025-06-19 10:57:19 -05:00
gabrpham 0e30436a0f Added GTT Memory to default command and adjusted table format
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>


[ROCm/amdsmi commit: 8a0e65d911]
2025-06-19 10:57:19 -05:00
Galantsev, Dmitrii 06b8484bbc CLI - Fix partition json output
Change-Id: I2b9e575cb960db7c136776bfe5c040b27feba727
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>


[ROCm/amdsmi commit: 4262802588]
2025-06-19 10:34:57 -05:00