We append CE then UE, but in the table right after, it goes UE then CE.
Fix the order of the table, and add capitals for consistency
Change-Id: I208f37685508ab1e2ff83d3456620bbbf3a16268
[ROCm/amdsmi commit: 248c6f79f4]
Invalid peer links are labeled as N/A during topology report creation.
This invalid link type could be triggered by having a configuration
with CPU XGMI iolinks and disable XGMI peer to peer access. This can
be done by setting the driver parameter 'use_xgmi_p2p = 0'.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: Ifb09a8f3266a3f07686615dfb45781d6cfe55e83
[ROCm/amdsmi commit: 03fab6b2b6]
The purpose of this patch is to modify the column header of the default
'./rocm-smi' command from 'Temp' to 'Temp (DieEdge)' for clarity.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I127a9214be97a1185c3db010f1c9176d1f412ec9
[ROCm/amdsmi commit: dfd88b593f]
--showdeviceid
Fix for false-positive "FRU is corrupted" messages,
since str(sn).isalphanum() triggers on empty struct.
--showproductname
fix script termination on non-alphanum product name
Change-Id: I78d4998e156f9b0d9f45338bed2a0d30b789e220
[ROCm/amdsmi commit: 8b2bc318eb]
This fixes the 'unknown' value being displayed
for Perf Level because of a missing mapping of
RSMI_DEV_PERF_LEVEL_DETERMINISM to its string
value.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I479c2baea450f0ff61640ad81cbd4d08ad56ff8e
[ROCm/amdsmi commit: 8144dd4d8e]
The purpose of this patch is to set RETCODE equal to 0 by default
unless an appropriate '--loglevel LEVEL' has been set.
To allow a non-zero RETCODE value, you must use any loglevel that
is not 'warning' or 'None' (default).
You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I9484a750206a3f464c59952304e72c59c3d12465
[ROCm/amdsmi commit: cbb068ccac]
Match alignment of error output with general output
signed-off-by Elena Sakhnovitch
Change-Id: Id4334152f4ad5665ff37d5d47e6f7ca0107a9428
[ROCm/amdsmi commit: 5d5ba738db]
Hiding not applicable/unsupported sensors under INFO
Signed-off-by: Elena Sakhnovitch
Change-Id: I89c80ca7c6365ef3a2dd751a575ddf90044c8a2e
[ROCm/amdsmi commit: 0f88f59ddd]
If the FRU has been corrupted, then the serial number will come in with
any manner of random bytes, which will cause decode() to fail
spectacularily. Check that the serial returned by the kernel is
alphanumeric, and print to the error log if not (then continue to the
next device).
Change-Id: If4f35b140b6089e02729b1490ed6b48d614a122a
[ROCm/amdsmi commit: 6b6e840337]
This patch changes the error handling for setClockRange.
When a device does not support modifying a clock type (sclk/mclk),
an error message is printed through the python CLI.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I37d9ea4189b1ca81e5deaab5efa6cfa4901b89b3
[ROCm/amdsmi commit: 2b8d0ad70f]
showpidgpus prints 'none' when no GPU devices are
being used by the running process. Adding a fix
to print a relevant message.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I165a6644a76c8e1c3c3cad676dcfd41eb1c4724f
[ROCm/amdsmi commit: dcab886394]
This patch fixes a --showvoltagerange bug, which attempts to check
the voltage curve on a device that does not have any voltage
regions in its OverDrive voltage frequency data (odvf).
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I647c30c978ffb13f6819ac3d069ee340710a7f99
[ROCm/amdsmi commit: 786f66671a]
Fixes bug in the 'setPowerOverdrive' function which mishandles
GPUs with secondary dies. Secondary dies have a default power cap
of 0W and cannot be changed, so they are now skipped.
Fixes bug in the 'resetPowerOverdrive' function which incorrectly
resets the wattage to the current value.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I483fa3f58b1fa44a3bf7bae3b52c59ce523ae152
[ROCm/amdsmi commit: 4298cbb400]
Show an optional debug log (RSMI_DEBUG_BITFIELD=2) to
the user in the following scenarios:
1. If more than one current frequency is found
2. If frequencies are not read in increasing order of
their value
If current frequency is not available, index for it is
set to -1, values will not have * next to it in the
output. This will also be handled in rocm_smi.py.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I477ec065f7513c8045d6392f12ef6cb835a6b8f6
[ROCm/amdsmi commit: afe996c2ed]
This reverts commit 4de1e4094a.
DRM device id does not always match GPU ID in the rocm_smi.py. This leads to cases where wrong device is checked by os.path.isfile().
Change-Id: Ib6f2b9be123b7eb64334d3feec57f63d7eb37d6f
[ROCm/amdsmi commit: be66d67ef2]
Instead of check /proc/modules for amdgpu, the code will check
/sys/module/amdgpu/initstate which covers the case when the driver
is compiled into the kernel.
Change-Id: Id39ec5b0eb9b68204bc9f5f779057ba8cc090bdc
[ROCm/amdsmi commit: 9f6614e83b]
Fixes a bug in the 'formatCsv' function which mishandles json
data conversion for 'system' data types.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I705060409bf5ae75b994ffda270843065ca12321
[ROCm/amdsmi commit: e800cbf161]
Also update copyright years
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ic9ead543c4937680afc1957623c4d5fcbfbd58b0
[ROCm/amdsmi commit: 85571318e2]
Wrapper header files
Soft link to libraries and binaries
rocm_smi.py and rsmiBindings.py installed in libexec/rocm_smi
Binaries, libraries and header files installed as per File Reorg folder structure
Change-Id: I3166ab67f89c2ae4aafbc87bb00c9a5233221ade
[ROCm/amdsmi commit: f1da5591b5]
The purpose of this patch is to hide 'One or more commands failed.'
from showing up, unless an appropriate log level has been set.
You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ifa309cd62596491a6ea5892e0752251f037fc0e9
[ROCm/amdsmi commit: 007f326c34]
Remove carriage return at the end of the line in printLog function.
On linux end of line is encoded with \n, not \n\r.
Change-Id: If3835d773033b53a7f25b4a0284df359a6f9555d
[ROCm/amdsmi commit: 1aeb27c4c9]
Driver mem fills in 0xFF for all for the metrices not supported for that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I86a38148c7a288ea0db94893f685560eaac098ab
[ROCm/amdsmi commit: 7b1daaef96]
This patch removes every erroneous occurance of a third argument
when calling printErrLog(device, err), since it takes two arguments.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5971cc68b69c86f37c69f44e4785dabfc82c7955
[ROCm/amdsmi commit: 40eed25a3b]
Display min and max bandwidth between gpu nodes
Signed-off-by: Elena Sakhnovitch
Change-Id: I7289fb83f80e2f899996b7d7560ece670cc5f31f
[ROCm/amdsmi commit: 13cde8429d]
Printing "Primary die (usually one above or below the secondary) shows
total (primary + secondary) socket power information" footnote only one time, not
for every secondary die.
Signed-off-by: Elena Sakhnovitch
Change-Id: Iae9c5c94945ec38ecdb128a576a4eacafc30a044
[ROCm/amdsmi commit: 15e4fe80e1]
The purpose of this patch is to implement --showtopoaccess
functionality in the CLI, which shows True or False if P2P is
possible between two given GPUs.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I07d70d80ae7b484136b31d5d22780c4990029391
[ROCm/amdsmi commit: e2d9a37e5f]
Fix error message in -P for secondary die
Signed-off-by: Elena Sakhnovitch
Change-Id: Ica3c0a83b565d2231fad23389b9378056a0f56b3
[ROCm/amdsmi commit: 2db7e2a312]
During the tail end when process is terminating, subprocess module fails
to find the process. This results in extraneous printing of a line with
char 'b'. Fix this.
BUG: SWDEV-296409
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I39aacf8ae948a5acec0aa93296cc0e0aec88b3ef
[ROCm/amdsmi commit: a03acf2c07]
Python's default 'print' implementation is not thread safe, causing
empty lines to be printed during multithreaded code execution.
This fixes the --showevents output for multi-GPU systems.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I72f7341cdf4401f1fed4cd8f7d7a4a90bf9a3a4c
[ROCm/amdsmi commit: 95348f37cc]
Use zero padding for the hexadecimal value 'device_model' inside
showProductName with a padding length of 4.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I962b94d414c6ba050d951486ad9e7559123f8850
[ROCm/amdsmi commit: 03ae187a35]
Since device is a list, we need to pass a single item to the isAmdGpu
function.
Fixes: ffbe481241 "rocm_smi.py: Don't try to reset non-AMD GPUs"
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I19a74377636ff4589f11d092f41e1d35c1acb307
[ROCm/amdsmi commit: 242d94a668]
Instead of throwing "Unsupported clock" errors for ASICs that don't
support a certain clock type (e.g. dcefclk on MI-series), just dump the
warning to logging.debug and don't try to read the clock
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: If3cb9a472b03aa535a76fc24bcd9f77122090634
[ROCm/amdsmi commit: b931380f02]
Use default power cap exposed via sysfs to determine when to
show 'Out of Spec" warning.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I0fa3612b50e230856b0d5a390f876b35268d9587
[ROCm/amdsmi commit: b71e07b3fb]
Implement showevent functionality in the ROCm SMI Python CLI.
It can be called using --showevents with any combination of:
VM_FAULT, THERMAL_THROTTLE, and/or GPU_RESET
For example:
./rocm-smi --showevents VM_FAULT, THERMAL_THROTTLE, GPU_RESET
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I905fd9c949e91423b79833a04ab89d6ba3760e62
[ROCm/amdsmi commit: a9e7e5a475]
Many data center cards are fanless. Don't show warning if unable to get
fan speed. The fan speed will be reported as 0
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I53efe67ac88fb0824cf4820430b46c18bc7692df
[ROCm/amdsmi commit: 1c9e384c8f]