320 Commits

Author SHA1 Message Date
Arif, Maisam 703415cb1f Updated Import Error Logging
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ief4a5f100f54668c5bce001ea051136738fbc468
2025-01-28 15:56:49 -06:00
Kanangot Balakrishnan, Bindhiya e3e11835e4 [SWDEV-508042] Fix TypeError in specific clocks csv logging (#57)
Logging specific clocks in CSV format was causing a TypeError because the levels were ints.
Fixed this by prepending the string "Level" to each level.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-01-22 18:06:13 -06:00
Pham, Gabriel b779ce2831 [SWDEV-493207] Added amdgpu version to version command
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
2025-01-22 18:05:25 -06:00
Kanangot Balakrishnan, Bindhiya 834993e1c3 SWDEV-457845: Fix Linux VM clean_local_data error on set
Corrected the clean_local_data error on Linux VMs when running
amd-smi set without arguments.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-01-20 14:45:27 -06:00
Poag, Charis c1cd2b46ef [SWDEV-488276] Add partition 2.0 functionality (#44)
Changes:
* CLI:
  - Updated amd-smi partition
  - Updated amd-smi partition -c
  - Updated amd-smi partition -m
  - Updated amd-smi partition -a
  - Updated amd-smi set -M <NPS1/NPS2/NPS4/NPS8>
  - Updated amd-smi set -C <SPX/DPX/QPX/TPX/CPX>
  - Updated amd-smi set -C <ACCELERATOR_TYPE> or <PROFILE_INDEX>
    Where PROFILE_INDEX = available ACCELERATOR_TYPES
  - Updated amd-smi set --help, now includes more detail for
    amd-smi set -C <ACCELERATOR_TYPE> or <PROFILE_INDEX>

* API:
  - Added amdsmi_get_gpu_memory_partition_config
  - Added amdsmi_set_gpu_memory_partition_mode
  - Added amdsmi_get_gpu_accelerator_partition_profile_config
  - Updated amdsmi_get_gpu_accelerator_partition_profile_config
  - Added amdsmi_set_gpu_accelerator_partition_profile

Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2025-01-16 00:53:46 -06:00
Scaffidi, Salvatore 3793be7735 [SWDEV-463406] Update API with fields for gfx_clock_below_host_limit and low_utilization violations
Updated API with fields for gfx_clock_below_host_limit and low_utilization violations
Change-Id: I25647bae6e7b785f44dab024272767658688bcad

---------
Signed-off-by: Scaffidi, Salvatore <Salvatore.Scaffidi@amd.com>
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
Co-authored-by: Charis Poag <Charis.Poag@amd.com>
2025-01-08 22:07:23 -06:00
Kanangot Balakrishnan, Bindhiya d0e770ffbc SWDEV-504130 Add temperature violation status to amd-smi monitor (#2)
Added boolean temperature violation status to amd-smi monitor.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-01-08 16:35:53 -06:00
Pham, Gabriel 129ad8ffad [SWDEV-502523] Made amd-smi reset command arguments mutually exclusive
Made reset arguments mutually exclusive so that users can select
only one option at a time, which prevents errors.

---------
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
2025-01-08 16:24:05 -06:00
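A minimal sketch of declaring mutually exclusive options with Python's standard argparse module, which is one common way to get the behavior described above. The option names are hypothetical placeholders, not the real amd-smi reset flags, and this is not the amd-smi implementation.

```python
import argparse


def build_reset_parser() -> argparse.ArgumentParser:
    """Build a parser whose options cannot be combined (hypothetical flags)."""
    parser = argparse.ArgumentParser(prog="reset-example")
    # argparse rejects any command line that supplies more than one
    # option from this group.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--option-a", action="store_true", help="hypothetical reset target A")
    group.add_argument("--option-b", action="store_true", help="hypothetical reset target B")
    return parser


if __name__ == "__main__":
    parser = build_reset_parser()
    print(parser.parse_args(["--option-a"]))  # Namespace(option_a=True, option_b=False)
    # parser.parse_args(["--option-a", "--option-b"]) would print a
    # "not allowed with argument" error and exit with status 2.
```

With this approach the conflict is reported by the parser itself before any device call is made, instead of surfacing later as a runtime error.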
Kanangot Balakrishnan, Bindhiya 3897670757 [SWDEV-439701] Fix wrong error handling in MissingParameterValue (#32)
Error handling was not displaying the missing parameter details in
argument type validator functions. Fixed this by passing the parameter
name to AmdSmiMissingParameterValueException.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-01-07 17:13:00 -06:00
Pham, Gabriel 5ed340c08b [SWDEV-502523] made set gpu arguments mutually exclusive (#31)
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
2025-01-07 16:48:01 -06:00
Pham, Gabriel 93a027ec95 [SWDEV-476303] Exposed valid values for set command (#8)
Updated amd-smi set help text
---------

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Signed-off-by: Pham, Gabriel <Gabriel.Pham@amd.com>
2024-12-20 15:32:10 -06:00
gabrpham 23da950ef0 Additional fixes for amd-smi static --clock
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
2024-12-20 14:45:20 -06:00
Charis Poag 3226a1d0ea [SWDEV-484382] Fix VCLK/DCLK outputs for monitor, static, metric
Units were incorrect and VCLK/DCLK values were not being returned
properly by amdsmi_get_clk_freq().

Units now match those returned by rsmi_dev_gpu_clk_freq_get (MHz).

The CLI now shows at most 2 VCLKs/DCLKs, and shows N/A if there
is no current_freq listed.

Change-Id: I8a7b66cbb5263e8d396f8568c104e1ce3512923d
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-12-20 14:11:08 -06:00
Juan Castillo f8b8347627 [SWDEV-496693] GPU Metrics 1.7
Features added:
- [SWDEV-475244] Add new interface to get max memory bandwidth
Updated API: amdsmi_get_gpu_vram_info
Updated: struct amdsmi_vram_info_t to include vram_max_bandwidth
CLI: amd-smi static --vram

- [SWDEV-488349] Add new interface for XGMI link status
New API: amdsmi_get_gpu_xgmi_link_status
CLI: amd-smi xgmi --link-status

Signed-off-by: Juan Castillo <juan.castillo@amd.com>
Change-Id: I1aa35b741136eb4f02f7ea9a95b865886273eb72
2024-12-18 10:57:06 -06:00
gabrpham fe290a2056 [SWDEV-484382] Added fclk and socclk to amd-smi metric -c
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Ie7e19c757b05455693c0d26eeb5e8b6c1e238375
2024-12-13 00:33:12 -05:00
gabrpham 5f9c2db6f3 [SWDEV-484382] Added new command amd-smi set -c/--clk-level
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: If45152e3a3c94f65b6a8a960601b9ed16fa3d0d7
2024-12-13 00:32:19 -05:00
gabrpham bc16e1a5da [SWDEV-484382] Added new command amd-smi static --clock
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I49e1aa2e699734d81c40c76c62da1cecc5bd3c0e
2024-12-13 00:30:29 -05:00
Charis Poag 57f45954b7 Fix amd-smi firmware not printing YAML-like dictionary correctly
List string should take into account dictionary value types

Change-Id: Icc08288cb0007d43eacd1aff6d44c40a84ea9448
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-12-11 10:48:43 -05:00
Maisam Arif 554203c13a Fixed spacing in amd-smi --xgmi
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I9fbd20c50a25aa3be80c8aa68eea37b81a74dc67
2024-12-10 15:45:06 -05:00
Charis Poag bc0015fd36 [SWDEV-488288] Remove GFX_BUSY_ACC from amd-smi metric --usage
Output is not helpful to users.

Change-Id: I12a60e28b8eab2fc3ffca4ea88f03018bf0ef3ce
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-12-10 13:37:36 -05:00
Bindhiya Kanangot Balakrishnan 288b11df37 [SWDEV-496639] Align amd-smi xgmi statistics
The XGMI read and write values were displayed in KB, so large numbers
were misaligned and hard to read. Converted the read and write values
to human-readable units using a helper function. Updated the changelog.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Change-Id: I4c90a1de8a58c29cbdf43fe3480a1546f3946673
2024-12-09 12:57:45 -05:00
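A minimal sketch of the kind of unit-conversion helper described above. The function name, the 1024 base, the unit list, and the two-decimal precision are all assumptions for illustration; the actual amd-smi helper may differ.

```python
def kb_to_readable(value_kb: float) -> str:
    """Convert a size in KB to a human-readable string (hypothetical helper, 1 KB = 1024 B)."""
    units = ["KB", "MB", "GB", "TB", "PB"]
    value = float(value_kb)
    index = 0
    # Divide by 1024 until the value fits the current unit or units run out.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:.2f} {units[index]}"


print(kb_to_readable(512))           # 512.00 KB
print(kb_to_readable(1536))          # 1.50 MB
print(kb_to_readable(3 * 1024**2))   # 3.00 GB
```

Because every result has a fixed number of decimals and a short unit suffix, columns stay narrow enough to align in tabular output.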
Charis Poag d323ecff97 [SWDEV-502744] Fix "amd-smi monitor" shows VCN ENC utilization & clock but not VCN DEC
Reason for this fix:
Navi products use vclk and dclk for both encode and decode.
On MI products, only decode is supported.
Navi products cannot support displaying ENC_UTIL % at this time.

Change-Id: I107bb761794ae4724949ac21c110b23a4f616700
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-12-07 12:11:10 -05:00
gabrpham bd01cfc203 Fixed post reset and ring_hang issues
Issues include:
	SWDEV-480250
	SWDEV-480255
	SWDEV-480248
Known issue:
	`amd-smi event` has multiple threads taking events from the same device,
which, when GPUs are reset, makes it appear that some GPUs reset
multiple times and others did not reset at all.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Ic7dcc214e0366fc1532ece579d915d34d35d5407
2024-12-06 17:46:00 -05:00
Bindhiya Kanangot Balakrishnan 1586005a5b [SWDEV-457845] Error code unification for amd-smi set
Previously, amd-smi set returned different outputs on Linux and
Windows; on Linux it raised a ValueError. As part of error code
unification, corrected this output message.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Change-Id: Iba9ddd9c5b2bed0456f303e4373f6771c93608be
2024-12-06 14:21:31 -05:00
Maisam Arif bc3ac61641 Added gpu_metrics table debug logs in monitor
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I8aa96629a65df7a2d52ef9ed42a884732d097a54
2024-12-05 15:18:13 -06:00
Justin Williams 2370aa1b40 [SWDEV-469278] Removed PyYAML Dependency
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
Change-Id: Idec32cfb0de84cc255b506d7f972e2750992745e
2024-12-03 15:40:44 -05:00
Bindhiya Kanangot Balakrishnan fc7e1ddb4a [SWDEV-498507] Tool amd-smi could be more case insensitive
Modified amdsmi_cli to accept case-insensitive arguments, unless
the argument starts with a single dash (-).

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Change-Id: I1b6320db0afaad0900d5a2049206002c3899fa71
2024-12-02 18:09:45 -05:00
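A minimal sketch of one possible reading of the rule above: lower-case every token except single-dash short options, which must stay case-sensitive because options such as `amd-smi set -c/--clk-level` and `amd-smi set -C` differ only by case. The function name is hypothetical, and the real amdsmi_cli may restrict normalization further (for example, to avoid altering file paths).

```python
from typing import List


def normalize_case(argv: List[str]) -> List[str]:
    """Lower-case every token except single-dash short options (hypothetical sketch)."""
    normalized = []
    for arg in argv:
        # Short options such as -c and -C can differ only by case, so keep them as-is.
        is_short_option = arg.startswith("-") and not arg.startswith("--")
        normalized.append(arg if is_short_option else arg.lower())
    return normalized


print(normalize_case(["METRIC", "-C", "--USAGE"]))  # ['metric', '-C', '--usage']
```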
Charis Poag 7d061f9ae4 [SWDEV-499029] Fix inability to change memory partition modes
Changes:
  * [API] Removed the board name check, which fixes other MI ASICs
  * [API] Fixed inability to restart the AMD GPU; libdrm was blocking
    this operation
  * [API] Added ability to unload/reload libdrm
    from within AMD SMI APIs
  * [CLI] Increased the progress bar for changing memory partition modes
    to 140 seconds, since driver reload time varies per system

Change-Id: I52f227f2ab850c4a6332ff3ecdc899903b1080f1
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-11-25 09:28:02 -05:00
Joe Narlo 3052ad4220 SWDEV-495787 [AMDSMI] Different license headers
Change copyrights to MIT and remove date

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I16f5b412f2b9ddefaaa1771aa714cc18829a1be4
2024-11-22 08:55:28 -05:00
gabrpham 50eaf14b9e [SWDEV-498453] Enabled 'amd-smi set --clk-limit' for virtual environments
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I23e994502d4abc1a602d2341e77ad9c50fcf4839
2024-11-19 16:17:29 -06:00
gabrpham fc9d18dd3e [SWDEV-498453] Enabled for virtual environments
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Ic7b274cf8e579b733515efe84fc0f325256ef8b1
2024-11-18 11:57:04 -05:00
Maisam Arif f1c3fbf226 Updated CLI exceptions
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I5c68eed7719c093727afa434e25ba2560dde894a
2024-11-15 11:44:51 -05:00
Charis Poag 3ea4a42a6e [SWDEV-488276/SWDEV-497613] Update memory partition set functionality
Changes:
  - [CLI] Added a warning screen for AMD SMI users
    setting the memory partition
  - [CLI] Added a 40-second progress bar to the CLI display for sets
  - [API] Updated to wait until the driver reloads with SYSFS files active
  - [CLI] Users can now set or reset without providing -g all:
    instead of amd-smi set -g all <set arguments>
    or amd-smi reset -g all <reset arguments>,
    users can directly call sudo amd-smi set <set arguments>
    or sudo amd-smi reset <reset arguments>
  - [SWDEV-475712][CLI/API] Fixed target_graphics_version field
    not properly displaying for older MI or Navi ASICs.
  - [All APIs] Added handling for invalid arguments reported by the driver;
    these APIs now return AMDSMI_STATUS_INVAL
    (e.g., changing to NPS8 if the device does not support it)
  - [Install] Modified paths for Python install commands to support
    multi-ROCm installs

Change-Id: Id11f25d68a82d23c6b2d77ccb30b51e860dd0ca7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-11-12 16:50:32 -04:00
gabrpham 2273d95a6c [SWDEV-492739] Partial fix for sclk min/max out of bounds
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I1f0230955c890c11a735c8cb352c8a9ee4cebe27
2024-11-11 20:45:37 -05:00
gabrpham 0f067488e1 updated cli tool examples doc to reflect current CLI
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Iab78a412464ba6d7919aeb7da04a031b063a7d09
2024-11-11 17:12:40 -05:00
Maisam Arif 7932de967a Updated parser help text
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I8cc65edb1e629a55e0efbfc1109b1c549ed81101
2024-11-11 15:07:21 -06:00
Peter Park e196f98dba docs: Remove redundant/stale docs
bump rocm-docs-core to 1.8.2

rm unused files

rm stale docs

fix sphinx conf

reorg docs

SWDEV-482203 -- add note to usage guides

update readmes

Change-Id: I9e0111ac8fe2a691ac964b27436ba47747c27904
Signed-off-by: Peter Park <Peter.Park@amd.com>
2024-11-11 16:49:17 -04:00
Maisam Arif 5449d78cc4 Adjusted private helper variables
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I0590b9ee5a1b4d5e6d4ae71c9587550c8d95033b
2024-11-08 11:25:50 -06:00
Maisam Arif abee26d4ab Added ras and ecc counting back to Linux VMs
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ie981f7fe8f481f2137e95dda2e200d00ab4d92c8
2024-11-08 11:05:15 -06:00
gabrpham 27996aef18 [SWDEV-495985] Changed ACCELERATOR_TYPE default value.
Default value changed from 0 to "N/A".
Actual values for all fields will be filled in by a later API update.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I69b08fff894a032ef79301754807ed4b5c85257f
2024-11-07 21:22:28 -05:00
Charis Poag 7fc4b853d4 [SWDEV-495305] Fix AttributeError: 'Namespace' object has no attribute 'compute_partition'
Changes:
   - [CLI] Compute and memory partition resets were removed earlier;
     this fix restores the correct spacing for
     reset commands

Change-Id: I707ff197baf7a32bfb7ef20f2b26a63acd13f08a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-11-05 18:49:08 -05:00
Charis Poag 0ceca28f41 [SWDEV-463406] Update sample rate + align metric output
Changes:
- Corrected the maximum rate at which users can sample from FW/driver:
  once every 100 ms
- Added a warning to amdsmi_get_violation_status() that a
  100 ms delay is required between samples
- Removed guest support; this API will not be supported on guests
- Updated CLI `amd-smi metric --throttle` outputs from
    XXX_active -> XXX_status
    XXX_percent -> XXX_activity
  to align with host
- Changelog updated

Change-Id: Ib30dd35dcc04ff67904ca82c86a55a16689df226
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-10-23 17:36:35 -04:00
gabrpham f5b7761ac7 [SWDEV-490187] Removed reset GPU partition support
Reset GPU partition support for both compute and memory was removed.

Code changes related to the following:
  * amdsmi_reset_gpu_compute_partition()
  * amdsmi_reset_gpu_memory_partition()
  * CLI

Change-Id: I372589074b4da172bedd39223edde18939e373ae
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-10-18 16:22:26 -05:00
gabrpham 27b5a35d65 [SWDEV-488846] Removed '--ecc' option from 'amd-smi monitor' when platform is VM
Change-Id: I8f5d7771cbfac3fe5f52dbccbd9f28020adb5f6f
2024-10-16 10:34:19 -04:00
gabrpham eb9116e8c2 [SWDEV-486872] Removed '--ras' from static command when platform is VM
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I0b03f168d7011428cfea3ab303865f4eaeea78ac
2024-10-16 09:29:24 -05:00
Maisam Arif 9a0d56fea8 [SWDEV-491466] Fix throttle metrics CLI on VM
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I41166df4d155ec1d7d5f30b51dd9e0e02e655eb9
2024-10-16 09:14:25 -05:00
Maisam Arif 27a48e69d8 Corrected clean local data partition indexing
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ib0eeb065f160fccd3c3f4a2d13f0869af01a74ae
2024-10-10 10:54:45 -05:00
Maisam Arif 4fcf281f1d [SWDEV-447451] Fix attribute error for set/reset on Linux Guest
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I5d55bef44d2eea75c33ba489a57544976900c4a4
2024-10-09 12:59:19 -05:00
Charis Poag 5eff39915b [SWDEV-463406] Add violation_status current counter/accumulated values
Changes:
  - amdsmi_violation_status_t now includes current accumulated/counter
   values
  - Tests/wrapper now include added values
  - Removed ASIC references in header for host/bm alignment
  - Fix violation_status->per_hbm_thrm /
    violation_status->active_hbm_thrm
    calculations.

Change-Id: Ic86a7cbad5198a41018f82f6b588b83158d9ba0b
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-10-04 15:56:01 -04:00
Maisam Arif 30f6a114e1 [SWDEV-488819] - Backward Compatibility Disclaimer
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I8b00d2009e3d01da134ac21ddcb0994357d76a54
2024-10-01 14:57:23 -05:00