Commit graph

1889 commits

Author SHA1 Message Date
Pham, Gabriel fc5ea762b3 Added Platform Information to Default Command (#553)
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-08-05 20:11:42 -05:00
Pryor, Adam 2dc2e12a97 Documentation updates for AMDSMI_GPU_METRICS_CACHE_MS (#564)
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-08-05 19:58:37 -05:00
Galantsev, Dmitrii 4044d1da41 CI - Use self-hosted machines for format checking due to IP whitelist
Change-Id: I1f0f4af7ed42d849cf4c9384e3c0c6da57b0504c
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-08-04 21:09:25 -05:00
AL Musaffar, Yazen 27cae85910 [SWDEV-544092] Fix Navi process float conversion (#579)
Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
2025-08-04 14:40:18 -05:00
Bindhiya Kanangot Balakrishnan b16a66b2c5 [SWDEV-525336] Fix N/A process name display
The amd-smi command shows only the executable
name of a process by stripping the absolute path. This
caused "N/A" process names to be incorrectly displayed
as "A" in the output. Corrected this.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-08-04 13:51:42 -05:00
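For illustration, here is a minimal Python sketch of the failure mode described above. It assumes the CLI derived the short name with a basename-style strip. `os.path.basename("N/A")` treats the slash as a path separator and returns "A". The helper `display_process_name` and its `full_path` flag are hypothetical and are not amd-smi code. The guard simply leaves the "N/A" placeholder untouched.

```python
import os

NOT_AVAILABLE = "N/A"  # placeholder printed when the name cannot be read


def display_process_name(name: str, full_path: bool = False) -> str:
    """Hypothetical helper: return the process name to print.

    The "N/A" placeholder is returned as-is; otherwise the directory
    part is stripped unless the caller asks for the full path.
    """
    if name == NOT_AVAILABLE or full_path:
        return name
    return os.path.basename(name)
```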
Kanangot Balakrishnan, Bindhiya 27a1705d96 [SWDEV-537852] Update compute-partition set error messages (#505)
[SWDEV-537852] Update compute-partition set error messages

Setting the compute partition requires sudo privileges. Added
AmdSmiPermissionDeniedException to display elevated-permission
errors in the CLI.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-08-01 08:22:22 -05:00
Arif, Maisam 240a607904 Revert "[SWDEV-505176] Submodule Unified Header in AMDSMI"
This reverts commit a315b62e37.
2025-07-30 14:08:24 -05:00
Narlo, Joseph a315b62e37 [SWDEV-505176] Submodule Unified Header in AMDSMI
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-07-30 13:37:01 -05:00
gabrpham_amdeng 4f0d1c8c29 [SWDEV-543627] Fixed incorrect metric min clock values
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-07-26 04:55:25 -05:00
Kanangot Balakrishnan, Bindhiya c9f0d1b953 [SWDEV-545342] Remove link type translation (#575)
2025-07-25 13:16:06 -05:00
Williams, Justin d11ae93eb0 Updated CODEOWNERS (#578)
Signed-off-by: Justin Williams <juwillia@amd.com>
Co-authored-by: Justin Williams <juwillia@amd.com>
2025-07-25 09:42:16 -07:00
Bindhiya Kanangot Balakrishnan 449839a32e [SWDEV-537852] Update help text for InvalidParameterValueException
Updated the help text to display the command name.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-07-24 10:47:13 -05:00
Justin Williams 0d76d78e49 CI - Added Debian 10 Repository Updates
Signed-off-by: Justin Williams <juwillia@amd.com>
2025-07-24 10:39:38 -05:00
Kanangot Balakrishnan, Bindhiya 6f7b397998 [SWDEV-537852] Update help and error text (#518)
Improved amd-smi help and error messages.
Updated to show subcommand name in help text.

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-07-24 09:06:22 -05:00
Justin Williams 4c09fcac1f CI - Make ABI compliance checks non-blocking with warning labels
Signed-off-by: Justin Williams <juwillia@amd.com>
2025-07-24 08:49:44 -05:00
Pham, Gabriel e2eac98496 [SWDEV-545342] Fixed amdsmi_link_type_t enumeration (#560)
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-07-22 18:22:49 -05:00
Williams, Justin 5b72f3a950 CI - Created Automatic Github to Gerrit Mirror (#556)
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
2025-07-22 17:30:40 -05:00
Poag, Charis ec055f2c2d [SWDEV-536953] Fix sets/resets + Align Power Cap Behavior with ROCM_SMI (#456)
Changes:
  - Modified outputs for amd-smi set/reset when in partitions
    to display error codes
  - General cleanup for the above
----------------------------------------------------
  - Updated the `amd-smi set -o <value>` / `amd-smi set --power-cap <value>` command to
    allow setting the power cap to values other than 0, provided the current power cap is not 0.
  - Modified power_cap_read_write.cc:
    - Added a check to ensure that the power cap can only be set to non-zero values if the current
      power cap is not 0.
    - Reset the power cap to the original value after the test to maintain state consistency.
Change-Id: If489bb35812ba4fc4cc34723b0dc39c99926e5d7

---------

Signed-off-by: Poag, Charis <Charis.Poag@amd.com>
2025-07-22 17:21:15 -05:00
Justin Williams 553f2bfce3 CI - Fixed Debian 10 Install Errors
Signed-off-by: Justin Williams <juwillia@amd.com>
2025-07-22 17:17:15 -05:00
Galantsev, Dmitrii 1042c4fa6b .clangd - Remove google readability config
Change-Id: I0535af5053eac9add068926c44073ae884df2008
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-07-21 15:06:53 -05:00
Castillo, Juan 3b1957e674 [SWDEV-531904] Added test_get_gpu_revision (#533)
* [SWDEV-531904] Added test_get_gpu_revision
New:
- amdsmi_get_gpu_revision(): previously not implemented in amdsmi_interface.py
- test_get_gpu_revision(): previously missing integration test

Updated:
- changelog.md: added new doc fields for ROCm 7.1
- amdsmi-py-api.md: added field|description doc fields

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
2025-07-15 19:35:54 -05:00
Saeed, Oosman 03414e20ee SWDEV-539482: Different sizes of mem leaks observed in amdsmitst (#538)
Signed-off-by: Oosman Saeed <oossaeed@amd.com>
2025-07-15 14:33:27 -05:00
Bindhiya Kanangot Balakrishnan 645c313f00 [SWDEV-543308] Revert amdsmi_link_metrics structure change
Moved bit_rate and max_bandwidth back into links in the
amdsmi_link_metrics_t struct, as this change was impacting
other teams. Modified the C and Python APIs, wrapper, and
CLI accordingly.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-07-14 13:56:26 -05:00
Maisam Arif fcf494bbc5 gpu_metrics caching fix
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I6dacb0b81d6677c354ef3c86af4d7d5156a76d8b
2025-07-14 12:12:37 -05:00
dependabot[bot] 28e577f1c0 Bump urllib3 from 2.3.0 to 2.5.0 in /docs/sphinx (#546)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.3.0 to 2.5.0.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](https://github.com/urllib3/urllib3/compare/2.3.0...2.5.0)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-version: 2.5.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-07-13 11:43:30 -05:00
Pryor, Adam 42096c1398 Add gpu metrics cache (#541)
* Add gpu metrics caching defaulted to 100ms
* AMDSMI_GPU_METRICS_CACHE_MS is used to set the caching rate limits

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-07-13 09:56:29 -05:00
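Here is a minimal Python sketch of a time-based metrics cache of the kind this commit describes. It assumes AMDSMI_GPU_METRICS_CACHE_MS holds a window in milliseconds, with the 100 ms default stated above. The real cache lives in the C++ library. `TtlMetricsCache`, `cache_window_ms`, and the `read_metrics` callable are hypothetical names. Treating a window of 0 as "always re-read" is a choice of this sketch, not documented behavior.

```python
import os
import time
from typing import Callable, Dict, Optional, Tuple

DEFAULT_CACHE_MS = 100  # default window stated in the commit message


def cache_window_ms() -> int:
    """Read AMDSMI_GPU_METRICS_CACHE_MS, falling back to the default."""
    raw = os.environ.get("AMDSMI_GPU_METRICS_CACHE_MS")
    try:
        return int(raw) if raw is not None else DEFAULT_CACHE_MS
    except ValueError:
        return DEFAULT_CACHE_MS  # unparsable value: keep the default


class TtlMetricsCache:
    """Hypothetical per-GPU cache: reuse a reading younger than the window."""

    def __init__(self, read_metrics: Callable[[int], dict],
                 window_ms: Optional[int] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._read = read_metrics
        ms = cache_window_ms() if window_ms is None else window_ms
        self._window_s = ms / 1000.0
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[int, Tuple[float, dict]] = {}

    def get(self, gpu_index: int) -> dict:
        now = self._clock()
        hit = self._entries.get(gpu_index)
        if hit is not None and now - hit[0] < self._window_s:
            return hit[1]  # still fresh: skip the expensive read
        value = self._read(gpu_index)
        self._entries[gpu_index] = (now, value)
        return value
```

Injecting the clock keeps the sketch deterministic under test. Production code would rely on the default monotonic clock.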
Maisam Arif 10f9aae0b3 Reduced calls to drm devinfo for getting virtualization_mode
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I22a6a9ca15131b37a775e8d4f595fb13c0b043c7
2025-07-11 12:26:42 -05:00
Justin Williams af69f75a86 CI - Added Docs Generation Instructions
Signed-off-by: Justin Williams <juwillia@amd.com>
2025-07-10 09:42:51 -05:00
Kanangot Balakrishnan, Bindhiya f6b854b4ed [SWDEV-541289] Update violation argument in amd-smi (#526)
* Disabled the violation argument for monitor on guests, as it is supported only on bare metal (BM).
* Added `-v` and `--violation` args to metric alongside `throttle`, which is kept for legacy behavior.
	* Suppressed the metric throttle arg so it is not shown in the help text

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-07-09 16:38:09 -05:00
Kanangot Balakrishnan, Bindhiya 514517e536 [SWDEV-539721] Show complete process name (#536)
Changed the file used to fetch the process name so that the complete name, including its path, can be displayed.

Changes:
amd-smi monitor -q
- human readable format will output only the process name
- csv and json formats will print the full path

amd-smi process
- name will always be the full path to the process

amd-smi (default output)
- name will always be truncated.

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-07-09 16:34:39 -05:00
AL Musaffar, Yazen 01a6158c85 [SWDEV-532904] CLI lists unusable UUID without sudo (#510)
Signed-off-by: AL Musaffar, Yazen <Yazen.ALMusaffar@amd.com>
2025-07-09 15:45:03 -05:00
josnarlo 0257140504 [SWDEV-536953] Align Power Cap Behavior with ROCM_SMI
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-07-09 15:37:40 -05:00
Castillo, Juan 34f465bfc5 [SWDEV-531904] Removed Handle Exceptions function (#531)
Removed:
- handle_exceptions(): exposed, silenced, and logged AMDSMI exceptions to users and returned success/failure

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
2025-07-07 13:26:26 -05:00
Kanangot Balakrishnan, Bindhiya ce230efaaa [SWDEV-537852] Update process name help text (#517)
* [SWDEV-537852] Update process name help text

Currently, the process name displays N/A when elevated
permissions are needed. Updated the help text of the default amd-smi,
process, and monitor commands to state the elevated-permission requirement.

---------

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2025-07-07 11:26:10 -05:00
Poag, Charis 88473b7fd0 [SWDEV-533305] Remove partition info from amd-smi static (-p/--partition still available) + CLI API call cleanup (#529)
Updates:
- Separate extra API calls in the amd-smi CLI so that only the specific CLI commands that need them make them.
- Remove extra current_compute_partition SYSFS calls from amd-smi static.
- Remove the partition information from the default `amd-smi static` CLI command.
- Users must now use the `-p` argument to view partition information with `amd-smi static`.
- The help text for the `partition` argument has been updated to reflect this change.
- The partition information can still be accessed using the `amd-smi partition -c -m` or `sudo amd-smi partition -a` commands.

---------

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-07-07 11:21:46 -05:00
Park, Peter 8039ab9449 Fix links in docs (#532)
* fix links in amdsmi_cli/README.md
* fix xrefs to install docs
* rm rocm-smi examples and add cli tutorial
* rm disclaimer and add amd smi contributing guidelines to index

Signed-off-by: Peter Park <Peter.Park@amd.com>
2025-07-07 11:18:40 -05:00
Narlo, Joseph 2cf6272b53 [SWDEV-541675] Remove Unnecessary API from amdsmi.h (#530)
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-07-07 11:14:27 -05:00
Saeed, Oosman 5b95d227bc [SWDEV-538308] CPER CLI 20 limit bug (#499)
The bug was reproduced as follows.

In terminal #1, run command:
sudo amd-smi ras --cper --gpu 6 --severity all --folder /tmp/cper_dump --follow 

In terminal #2, inject errors:
while true; do sudo amdgpuras -b 7 -s 1 -m 6 -t 2; sleep 2; done

Terminal #1 starts dumping the CPER entries it captures. After 20 entries have been captured, open terminal #3 and run the same command as in terminal #1:
sudo amd-smi ras --cper --gpu 6 --severity all --folder /tmp/cper_dump --follow

Terminal #3 produces no output, even while terminal #1 continues capturing and printing entries.

The fix:

Because more than 20 CPER entries are already in the GPU buffer, when the command in terminal #3 starts capturing from the beginning and passes 20 buffers to copy entries into, the C++ API returns a code indicating that more data is available.

The Python CLI should not treat this as an error, but should continue to print what the API returned.

---------

Signed-off-by: Oosman Saeed <oossaeed@amd.com>
2025-07-07 11:11:13 -05:00
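Here is a minimal Python sketch of the idea behind the fix, using hypothetical names. A status meaning "more data available" is handled as a partial success. The returned entries are kept and printed, and only a genuine error is raised. The `Status` enum, `fetch_cper_entries`, and the `read_batch(cursor, slots)` callable are assumptions, not the AMD SMI API. The loop that requests the next batch also assumes a follow-style reader that keeps draining the buffer.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Status(Enum):
    # Hypothetical status codes; the real AMD SMI status enum differs.
    SUCCESS = 0
    MORE_DATA = 1
    ERROR = 2


BUFFER_SLOTS = 20  # the per-call buffer count mentioned in the commit message

ReadBatch = Callable[[int, int], Tuple[Status, List[str], int]]


def fetch_cper_entries(read_batch: ReadBatch) -> List[str]:
    """Collect entries, treating MORE_DATA as 'keep going', not as failure."""
    collected: List[str] = []
    cursor = 0
    while True:
        status, entries, cursor = read_batch(cursor, BUFFER_SLOTS)
        if status is Status.ERROR:
            raise RuntimeError(f"CPER read failed at cursor {cursor}")
        collected.extend(entries)  # keep whatever this call returned
        if status is Status.SUCCESS:
            return collected
        # Status.MORE_DATA: the buffer was full, so request the next batch.
```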
Cheruvally, Aravindan f559075a81 [SWDEV-530465] Update share/doc/<pkgnm> License Folder (#516)
Updated the share/doc/ folder for license/docs to reflect the correct package name.
Signed-off-by: Cheruvally, Aravindan <Aravindan.Cheruvally@amd.com>
2025-07-03 02:07:54 -05:00
gabrpham_amdeng a2885d6e70 [SWDEV-539451] Adjusted reset command to prevent reset on partitions
Signed-off-by: gabrpham_amdeng <Gabriel.Pham@amd.com>
2025-07-03 01:11:46 -05:00
Kanangot Balakrishnan, Bindhiya 80f7045f61 [SWDEV-530646] Update changelog for topology optimization (#523)
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-06-30 17:36:14 -05:00
Justin Williams abd3bf2dcf CI - Changed CI Runners
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
2025-06-30 14:23:43 -05:00
Bindhiya Kanangot Balakrishnan fa9ca21520 [SWDEV-540014] Correct topology link_type check
The topology numa_bw check sets non-XGMI links to N/A.
The recent change in the link_type enum mapping caused this
condition to check for PCIE instead of XGMI. Corrected
this.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-06-30 14:01:19 -05:00
Jeremy Newton 529c6ee151 Don't install asan docs if disabled
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2025-06-30 12:05:29 -05:00
Williams, Justin 738627af29 Fixed NoDRM Failures
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Williams, Justin <Justin.Williams@amd.com>
2025-06-25 13:18:25 -05:00
Justin Williams bad4868f39 Fixed NoDRM Failures
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
2025-06-25 13:18:25 -05:00
josnarlo 5858d643f3 [SWDEV-539912] Add Skipping to Unit Tests
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-06-24 12:01:32 -05:00
Bindhiya Kanangot Balakrishnan c3453f7c97 [SWDEV-530646] Reduce amdsmi_topo_get_p2p_status calls in topology
The topology method called amdsmi_topo_get_p2p_status repeatedly
for the same GPU pairs across different table sections,
significantly impacting performance with 60+ GPUs. Reduced the
number of calls by implementing result caching.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-06-24 11:27:28 -05:00
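Here is a minimal Python sketch of the result caching described above. It assumes one query per ordered (src, dst) pair, with every table section reading from the cache. Under that assumption, n GPUs cost n*(n-1) queries instead of one full set per section. `P2PStatusCache`, `build_sections`, the placeholder section names, and the `query` callable (standing in for amdsmi_topo_get_p2p_status) are all hypothetical. The key is ordered because the sketch does not assume the status is symmetric.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Pair = Tuple[Hashable, Hashable]


class P2PStatusCache:
    """Memoize one p2p-status query per ordered (src, dst) GPU pair."""

    def __init__(self, query: Callable[[Hashable, Hashable], object]) -> None:
        self._query = query  # stand-in for the real per-pair API call
        self._results: Dict[Pair, object] = {}

    def get(self, src: Hashable, dst: Hashable) -> object:
        key = (src, dst)
        if key not in self._results:
            self._results[key] = self._query(src, dst)  # first request only
        return self._results[key]


def build_sections(gpus: Iterable[Hashable], cache: P2PStatusCache,
                   sections=("type", "access", "cap")) -> Dict[str, Dict[Pair, object]]:
    """Build several topology tables that all share cached pair results."""
    gpu_list = list(gpus)
    return {
        name: {(a, b): cache.get(a, b)
               for a in gpu_list for b in gpu_list if a != b}
        for name in sections
    }
```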
Maisam Arif 2d2e5fe692 [SWDEV-533390] Removed kfd_ioctl.h from being copied on install
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I03cb03b5f034e822c8f3c2d1e11e8b4e57251905
2025-06-20 14:32:16 -05:00
josnarlo d8b8dc4116 [SWDEV-539591] Allow integration tests to skip Not Supported APIs
Signed-off-by: josnarlo <Joseph.Narlo@amd.com>
2025-06-20 14:19:56 -05:00