Commit graph

119 commits

Author SHA1 Message Date
Hao Zhou 34f4b63853 Merge amd-staging into amd-master 20221212
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I555da02f58185a9eca00954755c2bfd8e418e153
2022-12-12 16:25:01 +08:00
kent.russell@amd.com 248c6f79f4 rocm_smi.py: Fix order of CE and UE reporting
We append CE then UE, but in the table right after, it goes UE then CE.
Fix the order of the table, and add capitals for consistency

Change-Id: I208f37685508ab1e2ff83d3456620bbbf3a16268
2022-12-08 12:28:37 -05:00
Hao Zhou 0f02a3a272 Merge amd-staging into amd-master 20220909
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Ic8bbdad24b0671f6b77543daa9656f5c3662c2c8
2022-09-09 09:26:33 +08:00
Alex Sierra 03fab6b2b6 Consider invalid peer link type during topology report
Invalid peer links are labeled as N/A during topology report creation.
This invalid link type can be triggered by a configuration that has
CPU XGMI iolinks while XGMI peer-to-peer access is disabled. Access can
be disabled by setting the driver parameter 'use_xgmi_p2p = 0'.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: Ifb09a8f3266a3f07686615dfb45781d6cfe55e83
2022-09-06 13:47:32 -05:00
Hao Zhou 1efd6ee29c Merge amd-staging into amd-master 20220901
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Ic59465c248f96a74d20226810c9ae98360797e34
2022-09-01 09:54:40 +08:00
Ori Messinger dfd88b593f ROCm SMI CLI: Modify Column Header
The purpose of this patch is to modify the column header of the default
'./rocm-smi' command from 'Temp' to 'Temp (DieEdge)' for clarity.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I127a9214be97a1185c3db010f1c9176d1f412ec9
2022-08-31 09:47:14 -04:00
Hao Zhou a5e286d250 Merge amd-staging into amd-master 20220826
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Ibef408e62669ec105571e605f333642cebc33112
2022-08-26 13:45:55 +08:00
Elena Sakhnovitch 8b2bc318eb [rocm_smi.py] bugfix for non-alphanum parse issue
--showdeviceid
Fix for false-positive "FRU is corrupted" messages,
since str(sn).isalphanum() triggers on an empty struct.

--showproductname
Fix script termination on a non-alphanumeric product name

Change-Id: I78d4998e156f9b0d9f45338bed2a0d30b789e220
2022-08-23 19:28:19 -04:00
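Note on commit 8b2bc318eb: the commit message writes `isalphanum()`, while Python's built-in string method is `str.isalnum()`, which returns False for an empty string. That behavior is enough to produce the false positive described. A minimal sketch, assuming the serial number arrives as an already decoded string; `classify_serial` is a hypothetical helper for illustration, not a function from rocm_smi.py:

```python
def classify_serial(serial: str) -> str:
    """Classify a serial number string (hypothetical helper for illustration)."""
    # str.isalnum() is False for "", so the empty case must be handled first;
    # otherwise a missing serial is misreported as a corrupted FRU.
    if serial == "":
        return "empty"
    if not serial.isalnum():
        return "corrupted"
    return "ok"
```

Checking for the empty string before the alphanumeric test keeps "no serial reported" separate from "serial contains unexpected bytes".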
Hao Zhou 350b77a1fc Merge amd-staging into amd-master 20220722
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I1575353fb596e1fa001b888ff8c3a4555375efee
2022-07-22 11:51:56 +08:00
Divya Shikre 8144dd4d8e Add perf determinism to perf_level_string
This fixes the 'unknown' value being displayed
for Perf Level because of a missing mapping of
RSMI_DEV_PERF_LEVEL_DETERMINISM to its string
value.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I479c2baea450f0ff61640ad81cbd4d08ad56ff8e
2022-07-21 08:55:38 -04:00
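Note on commit 8144dd4d8e: a lookup table with an 'unknown' fallback shows how a single missing entry produces the symptom described. The sketch below is illustrative only: `PerfLevel` is a reduced stand-in for the RSMI performance-level enum, its numeric values are not the real ones, and the display strings are assumptions.

```python
from enum import Enum, auto


class PerfLevel(Enum):
    # Illustrative subset; numeric values are NOT the real RSMI enum values.
    AUTO = auto()
    LOW = auto()
    HIGH = auto()
    MANUAL = auto()
    DETERMINISM = auto()


# Mapping from level to display string (strings are assumed for illustration).
# If the DETERMINISM entry were missing, the lookup below would fall through.
PERF_LEVEL_STRINGS = {
    PerfLevel.AUTO: "auto",
    PerfLevel.LOW: "low",
    PerfLevel.HIGH: "high",
    PerfLevel.MANUAL: "manual",
    PerfLevel.DETERMINISM: "determinism",
}


def perf_level_string(level: PerfLevel) -> str:
    # Any level without a mapping is reported as 'unknown'.
    return PERF_LEVEL_STRINGS.get(level, "unknown")
```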
Ori Messinger cbb068ccac ROCm SMI CLI: Force RETCODE to 0 by Default
The purpose of this patch is to set RETCODE equal to 0 by default
unless an appropriate '--loglevel LEVEL' has been set.

To allow a non-zero RETCODE value, you must use any loglevel that
is not 'warning' or 'None' (default).

You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I9484a750206a3f464c59952304e72c59c3d12465
2022-07-18 18:33:29 -04:00
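Note on commit cbb068ccac: the rule stated in the message (a non-zero RETCODE is only reported when the log level is neither 'warning' nor the default None) can be sketched as a small function. `effective_exit_code` is a hypothetical name and the case-insensitive comparison is an assumption; the actual CLI code may be structured differently.

```python
def effective_exit_code(retcode: int, loglevel=None) -> int:
    """Return the exit code the CLI would report (hypothetical helper)."""
    # With the default (None) or 'warning', failures are masked and 0 is returned.
    if loglevel is None or loglevel.lower() == "warning":
        return 0
    # Any other level (debug/info/error/critical) lets the real code through.
    return retcode
```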
Hao Zhou 46e21f2509 Merge amd-staging into amd-master 20220708
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I8c0061b099bb140ecc6c3c4b491165da44c6b96a
2022-07-08 08:56:45 +08:00
Elena Sakhnovitch 5d5ba738db rocm_smi.py: improve error output
Match alignment of error output with general output

Signed-off-by: Elena Sakhnovitch

Change-Id: Id4334152f4ad5665ff37d5d47e6f7ca0107a9428
2022-06-24 12:19:43 -04:00
Hao Zhou 4752e3184a Merge amd-staging into amd-master 20220624
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Ieb7c5bc9c3480dabb8534a0be5839f00f60e100b
2022-06-24 11:53:19 +08:00
Sreekant Somasekharan 1432e5e040 Add rsmi lib function to get memory overdrive value
Change-Id: I515b51d5ce4baf966bb31714886a0d72330026bc
2022-06-23 11:42:50 -04:00
Elena Sakhnovitch 0f88f59ddd [rocm_smi.py] Hiding unnecessary N/A lines
Hiding not applicable/unsupported sensors under INFO

Signed-off-by: Elena Sakhnovitch
Change-Id: I89c80ca7c6365ef3a2dd751a575ddf90044c8a2e
2022-06-23 11:02:13 -04:00
Hao Zhou 0635134df4 Merge amd-staging into amd-master 20220617
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I1bfa6a012b2bfb7f744e018f129539a495c2c5db
2022-06-17 11:08:54 +08:00
Kent Russell 6b6e840337 rocm_smi.py: Handle corrupted serial number
If the FRU has been corrupted, then the serial number will come in with
any manner of random bytes, which will cause decode() to fail
spectacularly. Check that the serial returned by the kernel is
alphanumeric, and print to the error log if not (then continue to the
next device).

Change-Id: If4f35b140b6089e02729b1490ed6b48d614a122a
2022-06-16 17:29:08 -04:00
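Note on commit 6b6e840337: the failure mode is that arbitrary bytes are frequently not valid text, so decoding raises an exception. A minimal sketch of a defensive decode, assuming UTF-8 and a strictly alphanumeric serial; `decode_serial` is a hypothetical helper and does not reproduce the rocm_smi.py implementation:

```python
def decode_serial(raw: bytes):
    """Return the serial as text, or None if the bytes look corrupted.

    Hypothetical helper; the real rocm_smi.py logic may differ.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Random bytes from a corrupted FRU are often not valid UTF-8.
        return None
    # Reject anything that decodes but is not strictly alphanumeric.
    return text if text.isalnum() else None
```

A caller would log an error and move on to the next device when None is returned, which matches the behavior the commit message describes.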
Elena Sakhnovitch 4dd2398f3d [rocm_smi.py] error feedback improvement
Clean up the overly verbose error reporting system.

Signed-off-by: Elena Sakhnovitch
Signed-off-by: Sreekant Somasekharan
Change-Id: Icc96086810b8dcfc426848b8c349a2572026c3bd
2022-06-16 14:32:13 -04:00
Ori Messinger 2b8d0ad70f ROCm SMI CLI: Fix setClockRange Error
This patch changes the error handling for setClockRange.

When a device does not support modifying a clock type (sclk/mclk),
an error message is printed through the python CLI.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I37d9ea4189b1ca81e5deaab5efa6cfa4901b89b3
2022-06-15 15:47:51 -04:00
Hao Zhou 90571416c1 Merge amd-staging into amd-master 20220610
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Iaefd0a9180925d91d2e3ec03be84e5b04cf262b6
2022-06-10 09:08:16 +08:00
Divya Shikre dcab886394 Print log when PIDs don't use any GPU device.
showpidgpus prints 'none' when no GPU devices are
being used by the running process. Add a fix
to print a relevant message instead.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I165a6644a76c8e1c3c3cad676dcfd41eb1c4724f
2022-05-31 16:17:42 -04:00
Hao Zhou 4da4de6dbe Merge amd-staging into amd-master 20220526
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Id96c706f0b9ecee20a9ded0fb1ee220f53219067
2022-05-26 09:23:37 +08:00
Elena Sakhnovitch 44ea49eb01 [rocm_smi.py]: shownodesbw fix for non xgmi
Improve error output for non-xgmi nodes bandwidth

Signed-off-by: Elena Sakhnovitch
Change-Id: I833970d3200a75c7639d33bf19e0e83afe176c8d
2022-05-24 16:45:32 -04:00
Ori Messinger 786f66671a ROCm SMI CLI: Fix --showvoltagerange bug
This patch fixes a --showvoltagerange bug, which attempts to check
the voltage curve on a device that does not have any voltage
regions in its OverDrive voltage frequency data (odvf).

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I647c30c978ffb13f6819ac3d069ee340710a7f99
2022-05-21 05:02:15 -04:00
Ori Messinger 4298cbb400 ROCm SMI CLI: Fix setPowerOverdrive resetPowerOverdrive Bugs
Fixes a bug in the 'setPowerOverdrive' function, which mishandled
GPUs with secondary dies. Secondary dies have a default power cap
of 0W that cannot be changed, so they are now skipped.

Fixes a bug in the 'resetPowerOverdrive' function, which incorrectly
reset the wattage to the current value.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I483fa3f58b1fa44a3bf7bae3b52c59ce523ae152
2022-05-21 05:01:32 -04:00
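Note on commit 4298cbb400: skipping secondary dies amounts to filtering out devices whose default power cap is 0 W before any cap is written. A minimal sketch under that assumption; `devices_to_update` and its dictionary input are hypothetical and stand in for whatever the CLI queries from the library:

```python
def devices_to_update(default_power_caps_w: dict) -> list:
    """Return device ids whose power cap may be changed.

    `default_power_caps_w` maps a device id to its default power cap in watts
    (hypothetical input shape, for illustration only).
    """
    # A default cap of 0 W is treated here as the marker of a secondary die,
    # whose cap cannot be changed, so such devices are skipped.
    return [dev for dev, cap in default_power_caps_w.items() if cap > 0]
```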
Hao Zhou 7e1154dc45 Merge amd-staging into amd-master 20220513
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: If141a24eeb09b9090fcf409507fe87abd6a90b29
2022-05-13 09:57:15 +08:00
Divya Shikre afe996c2ed Update get_frequencies to handle failures.
Show an optional debug log (RSMI_DEBUG_BITFIELD=2) to
the user in the following scenarios:
1. If more than one current frequency is found
2. If frequencies are not read in increasing order of
   their value
If the current frequency is not available, its index is
set to -1 and no value is marked with * in the
output. This will also be handled in rocm_smi.py.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I477ec065f7513c8045d6392f12ef6cb835a6b8f6
2022-05-11 15:33:15 -04:00
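Note on commit afe996c2ed: the two diagnostic scenarios and the -1 sentinel can be sketched with a small parser. The input format (lines such as `1: 800Mhz *`, with `*` marking the current level) is an assumption based on the usual amdgpu sysfs clock listing, and `parse_frequencies` is a hypothetical function, not the library's `get_frequencies`.

```python
import re


def parse_frequencies(text: str):
    """Parse lines such as '1: 800Mhz *' into (values_mhz, current_index, warnings)."""
    values, current, warnings = [], -1, []
    for line in text.splitlines():
        match = re.match(r"\s*\d+:\s*(\d+)\s*MHz\s*(\*)?", line, re.IGNORECASE)
        if not match:
            continue
        if match.group(2):
            if current != -1:
                warnings.append("more than one current frequency found")
            # This sketch keeps the last marked entry as the current one.
            current = len(values)
        values.append(int(match.group(1)))
    # Equal neighbours are tolerated; only a decrease is reported.
    if any(a > b for a, b in zip(values, values[1:])):
        warnings.append("frequencies are not in increasing order")
    return values, current, warnings
```

When no line carries the `*` marker, the returned index stays at -1, so a caller can print the list without highlighting any entry.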
Hao Zhou 318a19d5fb Merge amd-staging into amd-master 20220506
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I534e31fe3f65d363e5e83d3a72c7eb62d4a7acaf
2022-05-06 11:51:01 +08:00
Elena Sakhnovitch be66d67ef2 Revert "rocm_smi.py: Don't try to print absent clock files"
This reverts commit b931380f02.
The DRM device ID does not always match the GPU ID in rocm_smi.py. This leads to cases where the wrong device is checked by os.path.isfile().

Change-Id: Ib6f2b9be123b7eb64334d3feec57f63d7eb37d6f
2022-05-03 16:42:42 -04:00
Elena Sakhnovitch 9d7fd34d2b [rocm_smi.py] Hide unsupported clocks under debug
Signed-off-by: Elena Sakhnovitch <elena.sakhnovitch@amd.com>
Change-Id: I1f2c7b93d9a81f2735c76e8d441f9e298288f5c0
2022-05-03 16:38:22 -04:00
Hao Zhou 7eb9f16b89 Merge amd-staging into amd-master 20220415
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Ie9106c8ebfbf0c9d5cd542f759b70e14fcfa8914
2022-04-15 10:51:52 +08:00
Bill(Shuzhou) Liu 9f6614e83b Sanity check amdgpu module is loaded in rocm_smi.py
Instead of checking /proc/modules for amdgpu, the code checks
/sys/module/amdgpu/initstate, which also covers the case where the driver
is compiled into the kernel.

Change-Id: Id39ec5b0eb9b68204bc9f5f779057ba8cc090bdc
2022-04-14 11:28:38 -04:00
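Note on commit 9f6614e83b: a sanity check against the sysfs file named in the message might look like the sketch below. The commit message only says the file is checked; comparing its contents against `live` is an assumption of this sketch, and `driver_initialized` with its `sysfs_root` parameter is a hypothetical helper (the parameter exists so the function can be exercised against a temporary directory).

```python
import os


def driver_initialized(module: str = "amdgpu", sysfs_root: str = "/sys/module") -> bool:
    """Return True if <sysfs_root>/<module>/initstate exists and reads 'live'."""
    path = os.path.join(sysfs_root, module, "initstate")
    try:
        with open(path) as f:
            return f.read().strip() == "live"
    except OSError:
        # Missing or unreadable file: treat the driver as not loaded.
        return False
```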
Hao Zhou e273326ffc Merge amd-staging into amd-master 20220408
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I083b47b599f14ccc5269981097a79c83528b2924
2022-04-08 14:46:52 +08:00
Ori Messinger e800cbf161 ROCm SMI CLI: Fix formatCsv Bug
Fixes a bug in the 'formatCsv' function, which mishandles JSON
data conversion for 'system' data types.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I705060409bf5ae75b994ffda270843065ca12321
2022-04-07 19:33:46 -04:00
Hao Zhou d1db525155 Merge amd-staging into amd-master 20220317
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I1aaf47be05ceb7c46ee25b34509c11afa3fa7b54
2022-03-17 14:19:04 +08:00
Kent Russell 85571318e2 README: Remove restrictive licensing language
Also update copyright years

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ic9ead543c4937680afc1957623c4d5fcbfbd58b0
2022-03-16 13:52:25 -04:00
Hao Zhou 87af568be9 Merge amd-staging into amd-master 20220310
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I8fcf65fa293919572468a786409db75ea97c1097
2022-03-10 14:07:38 +08:00
Elena Sakhnovitch a3317714cb [rocm_smi.py] resetPowerOverdrive fix
resetPowerOverdrive: improve output messages.

Signed-off-by: Elena Sakhnovitch
Change-Id: Ic5b9084f0637458c36e460231f2d3622b0a23aa6
2022-03-04 11:26:45 -05:00
Ranjith Ramakrishnan f1da5591b5 File reorganization with backward compatibility
Wrapper header files
Soft link to libraries and binaries
rocm_smi.py and rsmiBindings.py installed in libexec/rocm_smi
Binaries, libraries and header files installed as per File Reorg folder structure

Change-Id: I3166ab67f89c2ae4aafbc87bb00c9a5233221ade
2022-03-03 18:48:52 -05:00
Hao Zhou 35ad11c7d5 Merge amd-staging into amd-master 20220224
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I371300c32821939aec486a70d22bcdd005971e95
2022-02-24 16:41:38 +08:00
Elena Sakhnovitch 9b871fcd9f [rocm_smi.py]: fix input error type for --setclock
Signed-off-by: Elena Sakhnovitch
Change-Id: I9626978780f360c591fb8908f5b759f2289dff0b
2022-02-22 14:24:38 -05:00
Hao Zhou 19c569146c Merge amd-staging into amd-master 20220211
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I95fd0cafb212a3e0f64b58ba6a009a4cd37ae0a6
2022-02-11 10:20:57 +08:00
Ori Messinger 007f326c34 ROCm SMI CLI: Hide Failed Command Warning
The purpose of this patch is to hide 'One or more commands failed.'
from showing up, unless an appropriate log level has been set.

You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ifa309cd62596491a6ea5892e0752251f037fc0e9
2022-02-09 11:52:33 -05:00
Hao Zhou 6e7c204564 Merge amd-staging into amd-master 20220121
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I0076befd07044063076f31332baa14ea0bdfb5b4
2022-01-21 11:50:24 +08:00
Sreekant Somasekharan cf2f0b0508 Print ASD firmware version in hex instead of decimal format
Change-Id: Idf113f63b79f2d2903ae795d272d232a43680516
2022-01-18 10:44:20 -05:00
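Note on commit cf2f0b0508: in Python, switching an integer from decimal to hexadecimal output is a formatting change only. A minimal sketch; `format_fw_version` is a hypothetical helper and the version value in the usage line is made up.

```python
def format_fw_version(version: int) -> str:
    # '#x' adds the 0x prefix and prints lowercase hexadecimal digits.
    return f"{version:#x}"


# Made-up value for illustration only.
print(format_fw_version(0x2100007A))
```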
Hao Zhou 3ef213258b Merge amd-staging into amd-master
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Ic324e60cd33d0db539537a978710d9c87c1dbd2e
2021-12-09 10:24:19 +08:00
Elena Sakhnovitch 1aeb27c4c9 [rocm_smi.py] remove \r symbol at print
Remove the carriage return at the end of the line in the printLog function.
On Linux, the end of a line is encoded as \n, not \n\r.

Change-Id: If3835d773033b53a7f25b4a0284df359a6f9555d
2021-12-08 10:13:56 -05:00
Divya Shikre 7b1daaef96 Add fix to display correct GPU Memory Activity and GFX Activity value.
The driver memory-fills 0xFF for every metric that is not supported on that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I86a38148c7a288ea0db94893f685560eaac098ab
2021-11-25 14:28:06 -05:00
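Note on commit 7b1daaef96: the sentinel check can be sketched as "a field whose bytes are all 0xFF was never filled in by the driver". The field width and little-endian layout are assumptions, the status names are plain strings standing in for the C enum values, and `read_metric` is a hypothetical helper.

```python
# Strings stand in for the C status codes; only the names are taken as given.
RSMI_STATUS_SUCCESS = "RSMI_STATUS_SUCCESS"
RSMI_STATUS_NOT_SUPPORTED = "RSMI_STATUS_NOT_SUPPORTED"


def read_metric(raw: bytes):
    """Return (status, value) for one little-endian metric field."""
    # A field left entirely at 0xFF is treated as unsupported on this ASIC.
    if raw and all(byte == 0xFF for byte in raw):
        return RSMI_STATUS_NOT_SUPPORTED, None
    return RSMI_STATUS_SUCCESS, int.from_bytes(raw, "little")
```

Returning an explicit status instead of the raw sentinel keeps a value such as 65535 from being displayed as a real activity reading.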
Ori Messinger 40eed25a3b ROCm SMI CLI: Fix printErrLog Arguments
This patch removes every erroneous occurrence of a third argument
when calling printErrLog(device, err), since it takes two arguments.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5971cc68b69c86f37c69f44e4785dabfc82c7955
2021-11-08 12:54:00 -05:00