Commit graph

91 commits

Author SHA1 Message Date
Hao Zhou 318a19d5fb Merge amd-staging into amd-master 20220506
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I534e31fe3f65d363e5e83d3a72c7eb62d4a7acaf
2022-05-06 11:51:01 +08:00
Elena Sakhnovitch be66d67ef2 Revert "rocm_smi.py: Don't try to print absent clock files"
This reverts commit b931380f02.
The DRM device ID does not always match the GPU ID in rocm_smi.py. This leads to cases where the wrong device is checked by os.path.isfile().

Change-Id: Ib6f2b9be123b7eb64334d3feec57f63d7eb37d6f
2022-05-03 16:42:42 -04:00
Elena Sakhnovitch 9d7fd34d2b [rocm_smi.py] Hide unsupported clocks under debug
Signed-off-by: Elena Sakhnovitch <elena.sakhnovitch@amd.com>
Change-Id: I1f2c7b93d9a81f2735c76e8d441f9e298288f5c0
2022-05-03 16:38:22 -04:00
Hao Zhou 7eb9f16b89 Merge amd-staging into amd-master 20220415
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Ie9106c8ebfbf0c9d5cd542f759b70e14fcfa8914
2022-04-15 10:51:52 +08:00
Bill(Shuzhou) Liu 9f6614e83b Sanity check amdgpu module is loaded in rocm_smi.py
Instead of checking /proc/modules for amdgpu, the code now checks
/sys/module/amdgpu/initstate, which also covers the case where the driver
is compiled into the kernel.

Change-Id: Id39ec5b0eb9b68204bc9f5f779057ba8cc090bdc
2022-04-14 11:28:38 -04:00
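The following is a minimal Python sketch of the check described in the commit above. It assumes that the initstate file reads "live" once the driver has finished initializing. The kernel's sysfs module ABI also lists "coming" and "going" as states. The function name amdgpu_loaded and its path parameter are hypothetical. They exist so the check can be exercised against a test file and are not taken from rocm_smi.py.

```python
from pathlib import Path

# Sysfs file named in the commit message above.
AMDGPU_INITSTATE = Path("/sys/module/amdgpu/initstate")


def amdgpu_loaded(initstate_path=AMDGPU_INITSTATE) -> bool:
    """Return True if the amdgpu module reports its state as "live".

    Hypothetical helper: assumes the file holds a single word such as
    "live", "coming" or "going".
    """
    try:
        state = Path(initstate_path).read_text().strip()
    except OSError:
        # File missing or unreadable: treat the driver as not loaded.
        return False
    return state == "live"
```

On a real system, call amdgpu_loaded() with no argument to read the default sysfs path.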
Hao Zhou e273326ffc Merge amd-staging into amd-master 20220408
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I083b47b599f14ccc5269981097a79c83528b2924
2022-04-08 14:46:52 +08:00
Ori Messinger e800cbf161 ROCm SMI CLI: Fix formatCsv Bug
Fixes a bug in the 'formatCsv' function which mishandles json
data conversion for 'system' data types.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I705060409bf5ae75b994ffda270843065ca12321
2022-04-07 19:33:46 -04:00
Hao Zhou d1db525155 Merge amd-staging into amd-master 20220317
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I1aaf47be05ceb7c46ee25b34509c11afa3fa7b54
2022-03-17 14:19:04 +08:00
Kent Russell 85571318e2 README: Remove restrictive licensing language
Also update copyright years

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ic9ead543c4937680afc1957623c4d5fcbfbd58b0
2022-03-16 13:52:25 -04:00
Hao Zhou 87af568be9 Merge amd-staging into amd-master 20220310
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I8fcf65fa293919572468a786409db75ea97c1097
2022-03-10 14:07:38 +08:00
Elena Sakhnovitch a3317714cb [rocm_smi.py] resetPowerOverdrive fix
resetPowerOverdrive: improve output messages.

Signed-off-by: Elena Sakhnovitch
Change-Id: Ic5b9084f0637458c36e460231f2d3622b0a23aa6
2022-03-04 11:26:45 -05:00
Ranjith Ramakrishnan f1da5591b5 File reorganization with backward compatibility
Wrapper header files
Soft link to libraries and binaries
rocm_smi.py and rsmiBindings.py installed in libexec/rocm_smi
Binaries, libraries and header files installed as per File Reorg folder structure

Change-Id: I3166ab67f89c2ae4aafbc87bb00c9a5233221ade
2022-03-03 18:48:52 -05:00
Hao Zhou 35ad11c7d5 Merge amd-staging into amd-master 20220224
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I371300c32821939aec486a70d22bcdd005971e95
2022-02-24 16:41:38 +08:00
Elena Sakhnovitch 9b871fcd9f [rocm_smi.py]: fix input error type for --setclock
Signed-off-by: Elena Sakhnovitch
Change-Id: I9626978780f360c591fb8908f5b759f2289dff0b
2022-02-22 14:24:38 -05:00
Hao Zhou 19c569146c Merge amd-staging into amd-master 20220211
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I95fd0cafb212a3e0f64b58ba6a009a4cd37ae0a6
2022-02-11 10:20:57 +08:00
Ori Messinger 007f326c34 ROCm SMI CLI: Hide Failed Command Warning
This patch hides the 'One or more commands failed.' message
unless an appropriate log level has been set.

You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ifa309cd62596491a6ea5892e0752251f037fc0e9
2022-02-09 11:52:33 -05:00
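The sketch below shows one way a --loglevel option can gate the 'One or more commands failed.' notice, using Python's standard argparse and logging modules. Several details are assumptions for illustration and are not necessarily what rocm_smi.py does:

- the default level of ERROR when no --loglevel is given;
- the logger name;
- the helper names;
- the choice to emit the notice at WARNING level.

```python
import argparse
import logging

LOG_LEVELS = ["debug", "info", "warning", "error", "critical"]


def make_logger(argv=None) -> logging.Logger:
    """Parse --loglevel and return a logger configured with that level.

    Assumption: without --loglevel only ERROR and above are enabled, so a
    WARNING-level notice stays hidden.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--loglevel", choices=LOG_LEVELS)
    args = parser.parse_args(argv)
    level_name = (args.loglevel or "error").upper()
    logger = logging.getLogger("rocm_smi_sketch")  # hypothetical name
    logger.setLevel(getattr(logging, level_name))
    return logger


def report_failures(logger: logging.Logger, failed: bool) -> None:
    # Emitted at WARNING, so it only appears when the level allows it.
    if failed:
        logger.warning("One or more commands failed.")
```

With this setup, passing --loglevel warning (or debug or info) makes the notice visible, while the default keeps it hidden.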
Hao Zhou 6e7c204564 Merge amd-staging into amd-master 20220121
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: I0076befd07044063076f31332baa14ea0bdfb5b4
2022-01-21 11:50:24 +08:00
Sreekant Somasekharan cf2f0b0508 Print ASD firmware version in hex instead of decimal format
Change-Id: Idf113f63b79f2d2903ae795d272d232a43680516
2022-01-18 10:44:20 -05:00
Hao Zhou 3ef213258b Merge amd-staging into amd-master
Signed-off-by: Hao Zhou <Hao.Zhou@amd.com>
Change-Id: Ic324e60cd33d0db539537a978710d9c87c1dbd2e
2021-12-09 10:24:19 +08:00
Elena Sakhnovitch 1aeb27c4c9 [rocm_smi.py] remove \r symbol at print
Remove the carriage return at the end of the line in the printLog function.
On Linux, the end of line is encoded as \n, not \r\n.

Change-Id: If3835d773033b53a7f25b4a0284df359a6f9555d
2021-12-08 10:13:56 -05:00
Divya Shikre 7b1daaef96 Add fix to display correct GPU Memory Activity and GFX Activity values.
The driver fills every metric that is not supported on a given ASIC with 0xFF,
so if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I86a38148c7a288ea0db94893f685560eaac098ab
2021-11-25 14:28:06 -05:00
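The following is a hedged Python illustration of the rule in the commit above; the actual fix is not reproduced here. A field is treated as unsupported when every byte equals 0xFF. The following are assumptions for illustration:

- the helper names;
- the default 2-byte field width;
- the string stand-ins for the status codes.

```python
# String stand-ins for the library's status codes (assumption: the real
# values are enum members in rocm_smi_lib, not strings).
RSMI_STATUS_SUCCESS = "RSMI_STATUS_SUCCESS"
RSMI_STATUS_NOT_SUPPORTED = "RSMI_STATUS_NOT_SUPPORTED"


def is_unsupported(raw: int, width_bytes: int) -> bool:
    """True when all bytes of a width_bytes-wide field are 0xFF."""
    all_ones = (1 << (8 * width_bytes)) - 1  # e.g. 0xFFFF for 2 bytes
    return raw == all_ones


def read_activity(raw: int, width_bytes: int = 2):
    """Return (status, value) for a hypothetical activity metric."""
    if is_unsupported(raw, width_bytes):
        return RSMI_STATUS_NOT_SUPPORTED, None
    return RSMI_STATUS_SUCCESS, raw
```

Checking the full field width matters: a 2-byte value of 0x00FF is a legitimate reading, while 0xFFFF is the fill pattern.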
Ori Messinger 40eed25a3b ROCm SMI CLI: Fix printErrLog Arguments
This patch removes every erroneous occurrence of a third argument
when calling printErrLog(device, err), since it takes only two arguments.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5971cc68b69c86f37c69f44e4785dabfc82c7955
2021-11-08 12:54:00 -05:00
Kent Russell 98baeca615 Merge amd-staging into amd-master
Conflicts:
	python_smi_tools/rocm_smi.py

Change-Id: Iad29365d4dd0ac30d19013dae31105343b126733
2021-11-01 11:47:00 -04:00
Elena Sakhnovitch 13cde8429d [ROCm-SMI] add --showNodesBw
Display min and max bandwidth between gpu nodes

Signed-off-by: Elena Sakhnovitch
Change-Id: I7289fb83f80e2f899996b7d7560ece670cc5f31f
2021-10-29 12:49:35 -04:00
Elena Sakhnovitch 15e4fe80e1 [rocm_smi.py] remove repetitive footnote
Print the "Primary die (usually one above or below the secondary) shows
total (primary + secondary) socket power information" footnote only once,
not for every secondary die.

Signed-off-by: Elena Sakhnovitch
Change-Id: Iae9c5c94945ec38ecdb128a576a4eacafc30a044
2021-10-29 08:32:06 -04:00
Ori Messinger e2d9a37e5f ROCm SMI CLI: Add --showtopoaccess Functionality
This patch implements --showtopoaccess in the CLI, which shows
True or False depending on whether P2P is possible between two given GPUs.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I07d70d80ae7b484136b31d5d22780c4990029391
2021-10-14 11:06:05 -04:00
Elena Sakhnovitch 2f84906cc2 [rocm_smi.py]: fix fan 255% error
Signed-off-by: Elena Sakhnovitch
Change-Id: I265ba32bc3777db5f04f1924547fe432ba78c3d0
2021-09-29 21:11:06 -04:00
Elena Sakhnovitch 80140c3b02 [rocm_smi.py]: pep8 formatting
Signed-off-by: Elena Sakhnovitch
Change-Id: If12b3371cd6acac16d9f6b3adf5f5cc8df28992f
2021-08-26 10:23:58 -04:00
Elena Sakhnovitch 6a01b6b2ec [rocm_smi.py] --showpower error bugfix
Fix error message in -P for secondary die

Signed-off-by: Elena Sakhnovitch
Change-Id: Ica3c0a83b565d2231fad23389b9378056a0f56b3
2021-07-30 15:20:21 -04:00
Elena Sakhnovitch 2c39e6cf51 [rocm_smi.py] add secondary die check.
Signed-off-by: Elena Sakhnovitch <Elena.Sakhnovitch@amd.com>
Change-Id: I46618002c1967ec115db88becbaba9e7c0a08af1
2021-07-30 15:20:21 -04:00
Harish Kasiviswanathan cef19745d1 rocm_smi.py: Remove extraneous line during process termination
When a process is terminating, the subprocess module can fail to find
it. This results in an extraneous line containing the character 'b'
being printed. Fix this.

BUG: SWDEV-296409

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I39aacf8ae948a5acec0aa93296cc0e0aec88b3ef
2021-07-30 15:20:21 -04:00
Elena Sakhnovitch 2db7e2a312 [rocm_smi.py] --showpower error bugfix
Fix error message in -P for secondary die

Signed-off-by: Elena Sakhnovitch
Change-Id: Ica3c0a83b565d2231fad23389b9378056a0f56b3
2021-07-30 00:08:14 -04:00
Elena Sakhnovitch b59e752122 [rocm_smi.py] add secondary die check.
Signed-off-by: Elena Sakhnovitch <Elena.Sakhnovitch@amd.com>
Change-Id: I46618002c1967ec115db88becbaba9e7c0a08af1
2021-07-29 17:46:12 -04:00
Harish Kasiviswanathan a03acf2c07 rocm_smi.py: Remove extraneous line during process termination
When a process is terminating, the subprocess module can fail to find
it. This results in an extraneous line containing the character 'b'
being printed. Fix this.

BUG: SWDEV-296409

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I39aacf8ae948a5acec0aa93296cc0e0aec88b3ef
2021-07-27 16:26:49 -04:00
Ori Messinger 8d5ced1f60 ROCm SMI Python CLI: Fix printLog Collisions
Python's default 'print' implementation is not thread safe, causing
empty lines to be printed during multithreaded code execution.

This fixes the --showevents output for multi-GPU systems.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I72f7341cdf4401f1fed4cd8f7d7a4a90bf9a3a4c
2021-07-27 15:26:37 -04:00
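A common way to keep output lines from interleaving is to build each line as one string and print it while holding a shared lock. The sketch below assumes that approach; the fix in this commit is not reproduced here. The print_log name and the line format are hypothetical, loosely modeled on the GPU[n] output shown elsewhere in this log.

```python
import threading

# One lock shared by every thread that writes log lines.
_print_lock = threading.Lock()


def print_log(device, metric, value) -> None:
    """Print a single log line atomically with respect to other callers."""
    # Build the whole line first so it is written in one print() call.
    line = "GPU[%s]\t\t: %s: %s" % (device, metric, value)
    with _print_lock:
        print(line, flush=True)
```

Only writes that go through print_log are serialized; a bare print() elsewhere in the program can still interleave with them.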
Ori Messinger 034caf6f76 ROCm SMI Python CLI: Add Zero Padding to Device Model
Use zero padding for the hexadecimal value 'device_model' inside
showProductName with a padding length of 4.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I962b94d414c6ba050d951486ad9e7559123f8850
2021-07-27 15:22:34 -04:00
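Python's format mini-language expresses this padding directly: the 04x format spec pads the hexadecimal digits with zeros to a minimum width of four. The 0x prefix and the helper name are assumptions for illustration.

```python
def format_device_model(device_model: int) -> str:
    """Render a device model ID as zero-padded hexadecimal (min. 4 digits)."""
    # "04x": lowercase hex, left-padded with zeros to at least 4 characters.
    return "0x{:04x}".format(device_model)


print(format_device_model(0x66A))  # prints 0x066a
```

Values wider than four digits are not truncated; the width is only a minimum.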
Ori Messinger 95348f37cc ROCm SMI Python CLI: Fix printLog Collisions
Python's default 'print' implementation is not thread safe, causing
empty lines to be printed during multithreaded code execution.

This fixes the --showevents output for multi-GPU systems.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I72f7341cdf4401f1fed4cd8f7d7a4a90bf9a3a4c
2021-07-21 23:58:07 -04:00
Ori Messinger 03ae187a35 ROCm SMI Python CLI: Add Zero Padding to Device Model
Use zero padding for the hexadecimal value 'device_model' inside
showProductName with a padding length of 4.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I962b94d414c6ba050d951486ad9e7559123f8850
2021-07-17 04:29:52 -04:00
Divya Shikre 686e6ac654 Add fix to show usage of setperfdeterminism functionality in --help command
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ife93c887eea2a9aae69f2923dba45c7cde4838d3
2021-05-12 17:29:37 -04:00
Kent Russell 242d94a668 rocm_smi.py: Fix gpu reset error
Since device is a list, we need to pass a single item to the isAmdGpu
function.

Fixes: c7c2ac5559 "rocm_smi.py: Don't try to reset non-AMD GPUs"

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I19a74377636ff4589f11d092f41e1d35c1acb307
2021-04-28 07:44:55 -04:00
Kent Russell b931380f02 rocm_smi.py: Don't try to print absent clock files
Instead of throwing "Unsupported clock" errors for ASICs that don't
support a certain clock type (e.g. dcefclk on MI-series), just dump the
warning to logging.debug and don't try to read the clock

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: If3cb9a472b03aa535a76fc24bcd9f77122090634
2021-04-23 10:19:04 -04:00
Ori Messinger b71e07b3fb rocm_smi.py: Show 'Out of Spec' warning only if required
Use the default power cap exposed via sysfs to determine when to
show the 'Out of Spec' warning.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I0fa3612b50e230856b0d5a390f876b35268d9587
2021-04-22 14:44:05 -04:00
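The following is a sketch of that decision. It assumes the amdgpu hwmon files power1_cap and power1_cap_default, both in microwatts. It also assumes that the warning applies when the configured cap exceeds the default. The function names and directory argument are hypothetical, and rocm_smi.py may obtain these values differently.

```python
from pathlib import Path


def _read_microwatts(path: Path) -> int:
    return int(path.read_text().strip())


def needs_out_of_spec_warning(hwmon_dir) -> bool:
    """Return True if power1_cap is above power1_cap_default (assumed rule)."""
    hwmon = Path(hwmon_dir)
    cap = _read_microwatts(hwmon / "power1_cap")
    default_cap = _read_microwatts(hwmon / "power1_cap_default")
    return cap > default_cap
```

Comparing against the default exposed by the driver avoids hard-coding a per-ASIC threshold in the CLI.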
Ori Messinger a9e7e5a475 ROCm SMI Python CLI: Add showevent Functionality
Implement showevent functionality in the ROCm SMI Python CLI.

It can be called using --showevents with any combination of:
VM_FAULT, THERMAL_THROTTLE, and/or GPU_RESET
For example:
./rocm-smi --showevents VM_FAULT, THERMAL_THROTTLE, GPU_RESET

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I905fd9c949e91423b79833a04ab89d6ba3760e62
2021-04-22 10:21:07 -04:00
Elena c80fc54500 [rocm_smi.py] add energy counter
--showenergycounter

Signed-off-by: Elena Sakhnovitch
Change-Id: Iede0f2b06523f7cb2719489a883e9c49722f8d93
2021-04-21 18:40:19 -04:00
Elena 771b4af95c [rocm_smi.py] Coarse Grain Utilization Counters
--showuse
--showmemuse

====================================
========= % time GPU is busy =======
GPU[0]          : GPU use (%): 0
GPU[0]          : GFX Activity: 0
====================================

Change-Id: I9db115ad78b394469206b22d195781a430b2f1d8
2021-04-21 17:23:21 -04:00
Harish Kasiviswanathan 1c9e384c8f Suppress warning message in getFanSpeed function
Many data center cards are fanless. Don't show a warning if the fan
speed cannot be read; the fan speed will be reported as 0.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I53efe67ac88fb0824cf4820430b46c18bc7692df
2021-04-21 15:29:44 -04:00
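The behavior described in the commit above, quietly falling back to 0, can be sketched as follows. Several details are assumptions; rocm_smi.py's getFanSpeed is not reproduced here:

- reading the standard hwmon pwm1 attribute (raw range 0 to 255);
- logging the failure at DEBUG level;
- the helper name.

```python
import logging
from pathlib import Path


def get_fan_speed(hwmon_dir) -> int:
    """Return the raw fan PWM value, or 0 if it cannot be read.

    Assumption: on fanless boards the attribute is missing or unreadable,
    so the failure is logged at DEBUG level instead of as a warning.
    """
    try:
        return int((Path(hwmon_dir) / "pwm1").read_text().strip())
    except (OSError, ValueError):
        logging.debug("Unable to read fan speed from %s", hwmon_dir)
        return 0
```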
Divya Shikre 56c132873b Update setrange functionality in CLI
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ic942bd76297c50caf189bfc0972d30dc42d91f32
2021-04-20 15:39:05 -04:00
Divya Shikre dc431506f5 Add support for MI200 clocks being continuous.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ifb7570054572239b9f48eaefe51e879fb3569031
2021-04-20 13:12:27 -04:00
Divya Shikre d9f7bd0ff4 Fix for cli errors - extra args in perf_determinism, undefined variable in setClocks
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Id138cfcbea4384f520537cc045d358024177b1ac
2021-04-19 17:32:07 -04:00
Elena 81c066350f Adding 4 new HBM temperature sensors.
Signed-off-by: Elena Sakhnovitch
Change-Id: Iaea04c38e8c2353e85d8aa2b871fdb82727157de
2021-04-17 23:58:49 -04:00