Commit Graph

1348 Commits

Author SHA1 Message Date
Charis Poag d323ecff97 [SWDEV-502744] Fix "amd-smi monitor" shows VCN ENC utilization & clock but not VCN DEC
Reason for this fix:
Navi products use vclk and dclk for both encode and decode.
On MI products, only decode is supported.
Navi products cannot support displaying ENC_UTIL % at this time.

Change-Id: I107bb761794ae4724949ac21c110b23a4f616700
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-12-07 12:11:10 -05:00
gabrpham bd01cfc203 Fixed post reset and ring_hang issues
Issues include:
	SWDEV-480250
	SWDEV-480255
	SWDEV-480248
Known issue:
	`amd-smi event` has threads taking events from the same device,
which, in the case of resetting GPUs, makes it seem like some GPUs have
reset multiple times and others have not reset at all.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Ic7dcc214e0366fc1532ece579d915d34d35d5407
2024-12-06 17:46:00 -05:00
Bindhiya Kanangot Balakrishnan 1586005a5b [SWDEV-457845] Error code unification for amd-smi set
Previously, amd-smi set returned different outputs on Linux
and Windows; on Linux it returned ValueError. As part of
error code unification, corrected this output message.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Change-Id: Iba9ddd9c5b2bed0456f303e4373f6771c93608be
2024-12-06 14:21:31 -05:00
Justin Williams 2c24cab86c [SWDEV-502001] Added amd_hsmp.h locally
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
Change-Id: I28e48913743f86fb5fc9082307ec326830d55960
2024-12-05 17:02:48 -05:00
Maisam Arif bc3ac61641 Added gpu_metrics table debug logs in monitor
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I8aa96629a65df7a2d52ef9ed42a884732d097a54
2024-12-05 15:18:13 -06:00
Joe Narlo 547db10384 SWDEV-502330 [AMD-SMI][Unified Header] Convert struct to typedef struct
Change struct to a typedef struct

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I6f3b22a5219c0db0aab2c308b71213ae75334476
2024-12-04 09:14:05 -05:00
Justin Williams 2370aa1b40 [SWDEV-469278] Removed PyYAML Dependency
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
Change-Id: Idec32cfb0de84cc255b506d7f972e2750992745e
2024-12-03 15:40:44 -05:00
Bindhiya Kanangot Balakrishnan bc77330a74 [SWDEV-499030] Fix truncated FRU_ID
The FRU_ID was truncated because the string copied from sysfs
was limited to 32 characters. This limit has been increased to
AMDSMI_MAX_STRING_LENGTH to accommodate longer FRU_IDs. Also
updated the deprecated string length macros.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Change-Id: I8becaf9f37609b2e5aecdf92b6ae60f4419ad8ef
2024-12-03 13:43:53 -06:00
Bindhiya Kanangot Balakrishnan fc7e1ddb4a [SWDEV-498507] Tool amd-smi could be more case insensitive
Modified amdsmi_cli to accept case-insensitive arguments if
the argument does not start with a single dash (-).

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Change-Id: I1b6320db0afaad0900d5a2049206002c3899fa71
2024-12-02 18:09:45 -05:00
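The rule described in the SWDEV-498507 entry above (lowercase an argument unless it starts with a single dash) can be sketched roughly as follows. This is only an illustration, not the actual amdsmi_cli code: `normalize_args` is a hypothetical helper, and lowercasing every remaining token, including option values, is an assumption.

```python
def normalize_args(argv):
    """Lowercase each argument unless it starts with a single dash.

    Hypothetical sketch: short flags such as "-g" are left untouched,
    while subcommands and long options ("--gpu") are lowercased.
    """
    normalized = []
    for arg in argv:
        # A short flag starts with exactly one dash, e.g. "-g" but not "--gpu".
        is_short_flag = arg.startswith("-") and not arg.startswith("--")
        normalized.append(arg if is_short_flag else arg.lower())
    return normalized
```

For example, `normalize_args(["METRIC", "--GPU", "-P"])` leaves `-P` unchanged and lowercases the other two tokens. A real CLI would likely need to exempt case-sensitive values such as file paths.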
Maisam Arif 664ade7354 [SWDEV-502001] Fix link for amd_hsmp.h
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I402ee539cdd4c896acd7ccc83f3090c3a5eeba12
2024-12-02 16:30:06 -06:00
Charis Poag 7d061f9ae4 [SWDEV-499029] Fix unable to change memory partition modes
Changes:
  * [API] Removed checking board name, fixes for other MI ASICs
  * [API] Fixed unable to restart AMD GPU, libdrm blocked
    doing this operation
  * [API] Added ability to unload/reload libdrm
    from within AMD SMI APIs
  * [CLI] Increased progress bar to change memory partition modes
    to 140 seconds, since driver reload is variable per system

Change-Id: I52f227f2ab850c4a6332ff3ecdc899903b1080f1
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-11-25 09:28:02 -05:00
Joe Narlo 35d8e827b9 SWDEV-497305 [AMDSMI] Consistent string lengths
Unify max string length to AMDSMI_MAX_STRING_LENGTH 256
Replace AMDSMI_NORMAL_STRING_LENGTH, AMDSMI_256_LENGTH

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: Ia81d738be0eefb9683ee53d51c969598fe587f50
2024-11-22 15:37:24 -05:00
Joe Narlo 3052ad4220 SWDEV-495787 [AMDSMI] Different license headers
Change copyrights to MIT and remove date

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I16f5b412f2b9ddefaaa1771aa714cc18829a1be4
2024-11-22 08:55:28 -05:00
gabrpham 50eaf14b9e [SWDEV-498453] Enabled 'amd-smi set --clk-limit' for virtual environments
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I23e994502d4abc1a602d2341e77ad9c50fcf4839
2024-11-19 16:17:29 -06:00
gabrpham fc9d18dd3e [SWDEV-498453] Enabled for virtual environments
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Ic7b274cf8e579b733515efe84fc0f325256ef8b1
2024-11-18 11:57:04 -05:00
Maisam Arif ed58196e35 Revert "[SWDEV-446215] Update cmake to put test libs in proper lib dir"
This reverts commit 6e01df00ca.

Reason for revert: Incorrect Path

Change-Id: I88bb304cfab997460a916e1a130fdb75435c648b
2024-11-18 11:15:22 -05:00
Adam Pryor b7789d4699 Revert "[SWDEV-446215] Update cmake to put test libs in proper lib dir"
This reverts commit 6e01df00ca.

Reason for revert: The gtest used by amdsmi differs from that of other components, so it was installed in a share/amdsmi/lib folder. It cannot be installed in a common folder such as /usr/local/bin or /usr/bin because all other components search those folders first.

This is breaking ROCmValidationSuite and other tools. Per Wang, Yanyao this should be reverted.

Change-Id: Id61bc6056fe41800e738616f39293e9b8762a377
2024-11-15 15:08:12 -05:00
Maisam Arif f1c3fbf226 Updated CLI exceptions
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I5c68eed7719c093727afa434e25ba2560dde894a
2024-11-15 11:44:51 -05:00
Maisam Arif afd06950c1 Revert "SWDEV-489696 [AMD SMI] Update python integration test"
This reverts commit 06e7bf8a98.

Reason for revert: Changes needed

Change-Id: I96cc956a2f1c73a2828c70ec9aa22931ba570d8f
2024-11-14 18:54:48 -05:00
Joe Narlo 06e7bf8a98 SWDEV-489696 [AMD SMI] Update python integration test
Initial update

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: I7c5777159f591f8b402168576b14ef8c1157e8d9
2024-11-14 17:52:01 -05:00
Maisam Arif dfcf5b4ae5 Corrected pyyaml debian package name
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ice1541b4c1fc2297ee8bef5a7c7336c93267e01a
2024-11-14 14:42:50 -06:00
Justin Williams d3d6157854 [SWDEV-492047] Removed setup.cfg.in
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
Change-Id: I97b14d05b17fefbb87368824f57bc4ab690f1bf0
2024-11-13 12:45:09 -05:00
Peter Park cbfe403b1d remove duplicated changelog
black format docs/conf.py
add seealso to python api reference

Change-Id: I60fa754f0af662669282dc90eea4b7dc5c5030cc
Signed-off-by: Peter Park <peter.park@amd.com>
2024-11-13 11:46:47 -05:00
Charis Poag 3ea4a42a6e [SWDEV-488276/SWDEV-497613] Update memory partition set functionality
Changes:
  - [CLI] Added warning screen to AMD SMI users
    setting memory partition
  - [CLI] Added a 40-second progress bar displayed during CLI sets
  - [API] Updated to wait until the driver reloads with SYSFS files active
  - [CLI] Users can now set or reset without providing -g all:
    instead of amd-smi set -g all <set arguments>
    or amd-smi reset -g all <set arguments>,
    they can directly call sudo amd-smi set <set arguments>
    or sudo amd-smi reset <set arguments>
  - [SWDEV-475712][CLI/API] Fixed target_graphics_version field
    not properly displaying for older MI or Navi ASICs.
  - [All APIs] Added a catch for invalid arguments reported by the driver;
    these APIs now return AMDSMI_STATUS_INVAL
    (e.g., changing to NPS8 if the device does not support it)
  - [Install] Modified paths for Python install commands to support
    multi-ROCm installs

Change-Id: Id11f25d68a82d23c6b2d77ccb30b51e860dd0ca7
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-11-12 16:50:32 -04:00
gabrpham 19cc4718c0 Documented and adjusted APIs for asic info, vram info, and P2P topology
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I7ac9a868148e29c92299b21540e057f64cb4123e
2024-11-11 20:45:37 -05:00
gabrpham 4d26db84ca Documented and adjusted python apis for pm metrics and reg table info
* amdsmi_get_gpu_pm_metrics_info and amdsmi_get_gpu_reg_table_info
were added to python api documentation
* AmdSmiRegType added as enum
* amdsmi_get_gpu_reg_table_info reg_type changed to AmdSmiRegType

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I57239ecf048e82226151db071e8d9299e9182647
2024-11-11 20:45:37 -05:00
gabrpham 2273d95a6c [SWDEV-492739] Partial fix for sclk min/max out of bounds
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I1f0230955c890c11a735c8cb352c8a9ee4cebe27
2024-11-11 20:45:37 -05:00
Maisam Arif 4b511a31e1 Bump Version to 24.7.1.0
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I0fc42fe55cb653102d189db9aa5eaf723280170e
2024-11-11 19:23:20 -06:00
gabrpham 0f067488e1 updated cli tool examples doc to reflect current CLI
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: Iab78a412464ba6d7919aeb7da04a031b063a7d09
2024-11-11 17:12:40 -05:00
Maisam Arif 7932de967a Updated parser help text
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I8cc65edb1e629a55e0efbfc1109b1c549ed81101
2024-11-11 15:07:21 -06:00
Peter Park e196f98dba docs: Remove redundant/stale docs
bump rocm-docs-core to 1.8.2

rm unused files

rm stale docs

fix sphinx conf

reorg docs

SWDEV-482203 -- add note to usage guides

update readmes

Change-Id: I9e0111ac8fe2a691ac964b27436ba47747c27904
Signed-off-by: Peter Park <Peter.Park@amd.com>
2024-11-11 16:49:17 -04:00
Maisam Arif 6e843436f5 Updated amdsmi_get_energy_count() C API documentation
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Iac75a0dcd583f39eb97aada769c736c3305cc8a2
2024-11-08 16:37:10 -05:00
Maisam Arif 5449d78cc4 Adjusted private helper variables
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I0590b9ee5a1b4d5e6d4ae71c9587550c8d95033b
2024-11-08 11:25:50 -06:00
Maisam Arif abee26d4ab Added ras and ecc counting back to Linux VMs
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: Ie981f7fe8f481f2137e95dda2e200d00ab4d92c8
2024-11-08 11:05:15 -06:00
Peter Park 31821cb585 Mod changelog to fit internal standard
Change-Id: Id90136f16f15a30b2791ed0634a408a7eb73f96f
2024-11-08 11:57:14 -05:00
gabrpham 27996aef18 [SWDEV-495985] Changed ACCELERATOR_TYPE default value.
Default value changed from 0 to "N/A".
	Actual values for all fields will be filled out in a later API
update.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I69b08fff894a032ef79301754807ed4b5c85257f
2024-11-07 21:22:28 -05:00
gabrpham 4effd48fe2 [SWDEV-489060] Added python3-setuptools and wheel as prereqs in README.
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I51cf938033d746bd6c255d518d7e0d3a87296be4
2024-11-07 14:42:04 -04:00
Charis Poag 7fc4b853d4 [SWDEV-495305] Fix AttributeError: 'Namespace' object has no attribute 'compute_partition'
Changes:
   - [CLI] Earlier we removed compute & memory partition resets;
     this fix changes back to the correct spacing for
     reset commands

Change-Id: I707ff197baf7a32bfb7ef20f2b26a63acd13f08a
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-11-05 18:49:08 -05:00
Maisam Arif 2678e1f3f7 [SWDEV-492031] Update Market Names
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Change-Id: I13c2047fd8c7af0dc566f88a3cac8b365697a092
2024-11-05 17:52:02 -04:00
Jorge López 172a3e233b Updates driverInitialized() to support amdgpu built as module as well as kernel built-in. Fixes ROCm/rocm_smi_lib#102 and is an updated version of ROCm/rocm_smi_lib#104
Change-Id: Icb3abe820bc67035b822358a1c04bd09a7c22b6b
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Reviewed-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2024-11-05 16:30:34 -05:00
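A check like the driverInitialized() update above, covering both a loadable amdgpu module and a built-in driver, might look something like this sketch. It is not the library's implementation: the function name, the `sys_module_root` parameter, and the logic are assumptions, based on the sysfs convention that `/sys/module/<name>/initstate` is present for loadable modules and reads `live` once loading has finished.

```python
from pathlib import Path


def driver_initialized(sys_module_root="/sys/module"):
    """Return True if amdgpu appears initialized (hypothetical sketch)."""
    module_dir = Path(sys_module_root) / "amdgpu"
    if not module_dir.is_dir():
        # No sysfs entry at all: the driver is neither loaded nor built in.
        return False
    initstate = module_dir / "initstate"
    if initstate.is_file():
        # Loadable module: initialized only once its state reads "live".
        return initstate.read_text().strip() == "live"
    # Directory exists without initstate: assume a built-in driver.
    return True
```

The root directory is a parameter only so the logic can be exercised against a temporary directory instead of the real sysfs tree.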
adapryor 02cbffb42a [SWDEV-412505] Handle mclk permission errors as not supported
Change-Id: Idb3eeed76ff55c507f28b5e692f8704704c3e46e
Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-10-31 17:40:34 -04:00
Joe Narlo 54462ab447 SWDEV-495316 [AMDSMI] In amdsmi.h, change typedef amdsmi_accelerator_partition_profile_t to match definition in Confluence
Move memory_caps definition and correct the number in reserved to match Confluence

Signed-off-by: Joe Narlo <Joseph.Narlo@amd.com>
Change-Id: Id94144f4b3d2d3d7b4d7327211ffc1957ffd0a93
2024-10-31 12:48:48 -04:00
adapryor 6e01df00ca [SWDEV-446215] Update cmake to put test libs in proper lib dir
Change-Id: I2e91b904b3f869cdba717d872c10d799d0260c30
Signed-off-by: adapryor <Adam.pryor@amd.com>
2024-10-29 16:07:58 -04:00
Charis Poag 0ceca28f41 [SWDEV-463406] Update sample rate + align metric output
Changes:
- Corrected the maximum speed at which users can sample from
  FW/driver to 100 ms
- Added a warning to the amdsmi_get_violation_status()
  call about the 100 ms delay required to sample
- Removed guest support, this API will not be supported
- Updated CLI `amd-smi metric --throttle` outputs from
    XXX_active -> XXX_status
    XXX_percent -> XXX_activity
  to align with host
- Changelog updated

Change-Id: Ib30dd35dcc04ff67904ca82c86a55a16689df226
Signed-off-by: Charis Poag <Charis.Poag@amd.com>
2024-10-23 17:36:35 -04:00
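The field renaming listed in the SWDEV-463406 entry above (XXX_active to XXX_status, XXX_percent to XXX_activity) amounts to a suffix rewrite on the output keys. A minimal sketch, assuming the throttle output is a flat dictionary; `rename_throttle_keys` and the `xxx_` placeholder keys are hypothetical and not taken from the amd-smi source:

```python
# Old suffix -> new suffix, as listed in the commit message.
SUFFIX_RENAMES = {"_active": "_status", "_percent": "_activity"}


def rename_throttle_keys(metrics):
    """Return a copy of `metrics` with the old key suffixes replaced."""
    renamed = {}
    for key, value in metrics.items():
        for old, new in SUFFIX_RENAMES.items():
            if key.endswith(old):
                # Swap only the trailing suffix; keep the rest of the key.
                key = key[: -len(old)] + new
                break
        renamed[key] = value
    return renamed
```

Keys without either suffix pass through unchanged, so the helper can be applied to a whole output dictionary without filtering it first.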
gabrpham 00b3184e9f SWDEV-478748 Changed TestPciReadWrite Test Failure message to Warning
TEST FAILURE message for `amdsmi_get_gpu_pci_throughput` and
`amdsmi_get_gpu_pci_bandwidth` changed to WARNING to indicate that
pcie_bw and/or pp_dpm_pcie sysfs files may not be supported on respective
devices.

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I1ad6e15eceacb5a00b022458ee5fb21df9d845c7
2024-10-18 16:32:57 -05:00
gabrpham f5b7761ac7 [SWDEV-490187] reset gpu partition were removed
The reset GPU partition support for both compute and memory was removed

Code changes related to the following:
  * amdsmi_reset_gpu_compute_partition()
  * amdsmi_reset_gpu_memory_partition()
  * CLI

Change-Id: I372589074b4da172bedd39223edde18939e373ae
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-10-18 16:22:26 -05:00
Justin Williams 2e5b164c43 [SWDEV-482058 / SWDEV-482971] Added setup.py install
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
Change-Id: Ibad07d34dfb455043ce307fe036289f1d5c20a9a
2024-10-18 16:59:13 -04:00
Oliveira, Daniel 25bcf6af2a [SWDEV-488526] BI-Direction Table mismatch
Implements DiscoverIOLinkPerNodeDirection() based on KFD Node infrastructure;
'/kfd/topology/nodes/*/io_links'

Code changes related to the following:
  * Internal implementation

Change-Id: Iccd84d1d69234dbeae4d4925f657e7e3bd801106
Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2024-10-17 15:27:09 -04:00
gabrpham 27b5a35d65 [SWDEV-488846] Removed '--ecc' option from 'amd-smi monitor' when platform is VM
Change-Id: I8f5d7771cbfac3fe5f52dbccbd9f28020adb5f6f
2024-10-16 10:34:19 -04:00
gabrpham eb9116e8c2 [SWDEV-486872] Removed '--ras' from static command when platform is VM
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Change-Id: I0b03f168d7011428cfea3ab303865f4eaeea78ac
2024-10-16 09:29:24 -05:00