This patch fixes a --showvoltagerange bug, which attempts to check
the voltage curve on a device that does not have any voltage
regions in its OverDrive voltage frequency data (odvf).
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I647c30c978ffb13f6819ac3d069ee340710a7f99
Fixes bug in the 'setPowerOverdrive' function which mishandles
GPUs with secondary dies. Secondary dies have a default power cap
of 0W and cannot be changed, so they are now skipped.
Fixes bug in the 'resetPowerOverdrive' function which incorrectly
resets the wattage to the current value.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I483fa3f58b1fa44a3bf7bae3b52c59ce523ae152
Show an optional debug log (RSMI_DEBUG_BITFIELD=2) to
the user in the following scenarios:
1. If more than one current frequency is found
2. If frequencies are not read in increasing order of
their value
If current frequency is not available, index for it is
set to -1, values will not have * next to it in the
output. This will also be handled in rocm_smi.py.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I477ec065f7513c8045d6392f12ef6cb835a6b8f6
This reverts commit b931380f02.
DRM device id does not always match GPU ID in the rocm_smi.py. This leads to cases where wrong device is checked by os.path.isfile().
Change-Id: Ib6f2b9be123b7eb64334d3feec57f63d7eb37d6f
Instead of check /proc/modules for amdgpu, the code will check
/sys/module/amdgpu/initstate which covers the case when the driver
is compiled into the kernel.
Change-Id: Id39ec5b0eb9b68204bc9f5f779057ba8cc090bdc
Fixes a bug in the 'formatCsv' function which mishandles json
data conversion for 'system' data types.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I705060409bf5ae75b994ffda270843065ca12321
Wrapper header files
Soft link to libraries and binaries
rocm_smi.py and rsmiBindings.py installed in libexec/rocm_smi
Binaries, libraries and header files installed as per File Reorg folder structure
Change-Id: I3166ab67f89c2ae4aafbc87bb00c9a5233221ade
The purpose of this patch is to hide 'One or more commands failed.'
from showing up, unless an appropriate log level has been set.
You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ifa309cd62596491a6ea5892e0752251f037fc0e9
Remove carriage return at the end of the line in printLog function.
On linux end of line is encoded with \n, not \n\r.
Change-Id: If3835d773033b53a7f25b4a0284df359a6f9555d
Driver mem fills in 0xFF for all for the metrices not supported for that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I86a38148c7a288ea0db94893f685560eaac098ab
This patch removes every erroneous occurance of a third argument
when calling printErrLog(device, err), since it takes two arguments.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5971cc68b69c86f37c69f44e4785dabfc82c7955
Printing "Primary die (usually one above or below the secondary) shows
total (primary + secondary) socket power information" footnote only one time, not
for every secondary die.
Signed-off-by: Elena Sakhnovitch
Change-Id: Iae9c5c94945ec38ecdb128a576a4eacafc30a044
The purpose of this patch is to implement --showtopoaccess
functionality in the CLI, which shows True or False if P2P is
possible between two given GPUs.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I07d70d80ae7b484136b31d5d22780c4990029391
During the tail end when process is terminating, subprocess module fails
to find the process. This results in extraneous printing of a line with
char 'b'. Fix this.
BUG: SWDEV-296409
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I39aacf8ae948a5acec0aa93296cc0e0aec88b3ef
During the tail end when process is terminating, subprocess module fails
to find the process. This results in extraneous printing of a line with
char 'b'. Fix this.
BUG: SWDEV-296409
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I39aacf8ae948a5acec0aa93296cc0e0aec88b3ef
Python's default 'print' implementation is not thread safe, causing
empty lines to be printed during multithreaded code execution.
This fixes the --showevents output for multi-GPU systems.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I72f7341cdf4401f1fed4cd8f7d7a4a90bf9a3a4c
Use zero padding for the hexadecimal value 'device_model' inside
showProductName with a padding length of 4.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I962b94d414c6ba050d951486ad9e7559123f8850
Python's default 'print' implementation is not thread safe, causing
empty lines to be printed during multithreaded code execution.
This fixes the --showevents output for multi-GPU systems.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I72f7341cdf4401f1fed4cd8f7d7a4a90bf9a3a4c
Use zero padding for the hexadecimal value 'device_model' inside
showProductName with a padding length of 4.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I962b94d414c6ba050d951486ad9e7559123f8850
Since device is a list, we need to pass a single item to the isAmdGpu
function.
Fixes: c7c2ac5559 "rocm_smi.py: Don't try to reset non-AMD GPUs"
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I19a74377636ff4589f11d092f41e1d35c1acb307
Instead of throwing "Unsupported clock" errors for ASICs that don't
support a certain clock type (e.g. dcefclk on MI-series), just dump the
warning to logging.debug and don't try to read the clock
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: If3cb9a472b03aa535a76fc24bcd9f77122090634
Use default power cap exposed via sysfs to determine when to
show 'Out of Spec" warning.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I0fa3612b50e230856b0d5a390f876b35268d9587
Implement showevent functionality in the ROCm SMI Python CLI.
It can be called using --showevents with any combination of:
VM_FAULT, THERMAL_THROTTLE, and/or GPU_RESET
For example:
./rocm-smi --showevents VM_FAULT, THERMAL_THROTTLE, GPU_RESET
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I905fd9c949e91423b79833a04ab89d6ba3760e62