416 Commits

Author SHA1 Message Date
Ori Messinger dfd88b593f ROCm SMI CLI: Modify Column Header
The purpose of this patch is to modify the column header of the default
'./rocm-smi' command from 'Temp' to 'Temp (DieEdge)' for clarity.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I127a9214be97a1185c3db010f1c9176d1f412ec9
2022-08-31 09:47:14 -04:00
Elena Sakhnovitch 8b2bc318eb [rocm_smi.py] bugfix for non-alphanum parse issue
--showdeviceid
Fix for false-positive "FRU is corrupted" messages,
since the str(sn).isalnum() check also triggers on an empty struct.

--showproductname
Fix script termination on non-alphanum product name

Change-Id: I78d4998e156f9b0d9f45338bed2a0d30b789e220
2022-08-23 19:28:19 -04:00
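
The false positive above follows from a Python detail: `str.isalnum()` returns `False` for an empty string, so a check that treats every non-alphanumeric serial as corrupted also flags an empty one. A minimal sketch, assuming a hypothetical helper `classify_serial` (not a function taken from rocm_smi.py) that keeps the two cases apart:

```python
def classify_serial(serial: str) -> str:
    """Classify a serial-number string (hypothetical helper, illustration only)."""
    if serial == "":
        # "".isalnum() is False, so handle the empty case before the alnum check.
        return "empty"
    if not serial.isalnum():
        # Non-empty but containing non-alphanumeric characters.
        return "corrupted"
    return "ok"
```

With this split, only the "corrupted" result would warrant a "FRU is corrupted" message; how the real script reports each case is not shown in this log.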
Galantsev, Dmitrii cd11d7530b Remove python pyc file before uninstall
The removal added in I6520b51aac34060b5b90f94a016cec1827a4973f happens after uninstall, which
leaves a dangling directory under /opt/rocm/libexec/rocm_smi.
Removing __pycache__ before uninstall fixes the issue.

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I695bd085d4a43b678b563b4c35f6d2e8ddfa7d7c
2022-08-11 19:55:14 -05:00
Ranjith Ramakrishnan c5159fa6d1 Remove the default setting of cmake install libdir from source code
Any default value, if required, should be controlled from outside.
For ROCm, the build script sets the value to "lib".

Change-Id: I12a2951307fe64e46a4e608476bfceb678bdc97d
2022-07-28 13:55:55 -04:00
Divya Shikre 8144dd4d8e Add perf determinism to perf_level_string
This fixes the 'unknown' value being displayed
for Perf Level because of a missing mapping of
RSMI_DEV_PERF_LEVEL_DETERMINISM to its string
value.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I479c2baea450f0ff61640ad81cbd4d08ad56ff8e
2022-07-21 08:55:38 -04:00
Ori Messinger cbb068ccac ROCm SMI CLI: Force RETCODE to 0 by Default
The purpose of this patch is to set RETCODE equal to 0 by default
unless an appropriate '--loglevel LEVEL' has been set.

To allow a non-zero RETCODE value, you must use any loglevel that
is not 'warning' or 'None' (default).

You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I9484a750206a3f464c59952304e72c59c3d12465
2022-07-18 18:33:29 -04:00
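
The rule in this message can be sketched as a small function. This is an illustration under stated assumptions, not the code from the patch: `effective_retcode` is a hypothetical name, and the loglevel is assumed to arrive as a string or `None`.

```python
def effective_retcode(retcode, loglevel=None):
    """Hypothetical helper: exit status to report for a given --loglevel value."""
    # With no loglevel (the default) or 'warning', the exit status is forced to 0.
    if loglevel is None or loglevel.lower() == "warning":
        return 0
    # Any other loglevel (debug/info/error/critical) lets the real status through.
    return retcode
```

For example, a failing command (`retcode` of 1) would still report 0 by default, and 1 once `--loglevel error` is set.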
Sreekant Somasekharan aa5cba122c Fix documentation mistake related to get memory overdrive function.
Also update the rsmi_perf_determinism_mode_set function documentation
for styling consistency.

Change-Id: I09ce8139eb9cbda94352ac7725c4c9b9bb06bd59
2022-06-30 08:57:52 -04:00
Elena Sakhnovitch 5d5ba738db rocm_smi.py: improve error output
Match alignment of error output with general output

Signed-off-by: Elena Sakhnovitch

Change-Id: Id4334152f4ad5665ff37d5d47e6f7ca0107a9428
2022-06-24 12:19:43 -04:00
Sreekant Somasekharan 1432e5e040 Add rsmi lib function to get memory overdrive value
Change-Id: I515b51d5ce4baf966bb31714886a0d72330026bc
2022-06-23 11:42:50 -04:00
Elena Sakhnovitch 0f88f59ddd [rocm_smi.py] Hiding unnecessary N/A lines
Hiding not applicable/unsupported sensors under INFO

Signed-off-by: Elena Sakhnovitch
Change-Id: I89c80ca7c6365ef3a2dd751a575ddf90044c8a2e
2022-06-23 11:02:13 -04:00
Kent Russell 6b6e840337 rocm_smi.py: Handle corrupted serial number
If the FRU has been corrupted, then the serial number will come in with
any manner of random bytes, which will cause decode() to fail
spectacularly. Check that the serial returned by the kernel is
alphanumeric, and print to the error log if not (then continue to the
next device).

Change-Id: If4f35b140b6089e02729b1490ed6b48d614a122a
2022-06-16 17:29:08 -04:00
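
To illustrate the failure mode: decoding arbitrary bytes as UTF-8 can raise `UnicodeDecodeError`, and bytes that do decode may still not be alphanumeric. A minimal sketch, assuming a hypothetical helper `decode_serial` and a UTF-8 encoded serial (the actual encoding and error-log call used by rocm_smi.py are not shown in this log):

```python
def decode_serial(raw: bytes):
    """Return the serial as text, or None if it looks corrupted (hypothetical helper)."""
    try:
        text = raw.decode("utf-8")  # assumption: the serial is UTF-8/ASCII text
    except UnicodeDecodeError:
        # Random bytes from a corrupted FRU are often not valid UTF-8.
        return None
    # Bytes that decode cleanly can still contain control characters or punctuation.
    # Note that an empty serial also yields None here; callers may want to
    # treat that case separately.
    return text if text.isalnum() else None
```

A caller would log an error and continue to the next device whenever `None` comes back.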
Elena Sakhnovitch 4dd2398f3d [rocm_smi.py] error feedback improvement
Clean up the overly verbose error reporting system.

Signed-off-by: Elena Sakhnovitch
Signed-off-by: Sreekant Somasekharan
Change-Id: Icc96086810b8dcfc426848b8c349a2572026c3bd
2022-06-16 14:32:13 -04:00
Ranjith Ramakrishnan b72c464ac0 SWDEV-321112 - Use GNUInstallDirs
Use GNUInstallDirs variables to determine the location of LIBDIR, BINDIR, INCLUDEDIR, DOCDIR

Note that CMAKE_INSTALL_LIBDIR is overridden, since the default for RHEL
is lib64, but ROCm packaging wants it to be lib always. Distros or users
can easily override this.

Change-Id: I616152ccd2bc1f5a60bffa940312b38ca6e88c04
2022-06-16 13:22:49 -04:00
Ori Messinger 2b8d0ad70f ROCm SMI CLI: Fix setClockRange Error
This patch changes the error handling for setClockRange.

When a device does not support modifying a clock type (sclk/mclk),
an error message is printed through the python CLI.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I37d9ea4189b1ca81e5deaab5efa6cfa4901b89b3
2022-06-15 15:47:51 -04:00
Bill(Shuzhou) Liu 42f11bdd63 Remove python pyc file when uninstall rpm
Remove python pyc file when uninstall rpm.

Change-Id: I6520b51aac34060b5b90f94a016cec1827a4973f
2022-06-09 09:00:38 -04:00
Divya Shikre dcab886394 Print log when PIDs don't use any GPU device.
showpidgpus prints 'none' when no GPU devices are
being used by the running process. Adding a fix
to print a relevant message.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I165a6644a76c8e1c3c3cad676dcfd41eb1c4724f
2022-05-31 16:17:42 -04:00
Elena Sakhnovitch 44ea49eb01 [rocm_smi.py]: shownodesbw fix for non xgmi
Improve error output for non-xgmi nodes bandwidth

Signed-off-by: Elena Sakhnovitch
Change-Id: I833970d3200a75c7639d33bf19e0e83afe176c8d
2022-05-24 16:45:32 -04:00
Ori Messinger 786f66671a ROCm SMI CLI: Fix --showvoltagerange bug
This patch fixes a --showvoltagerange bug, which attempts to check
the voltage curve on a device that does not have any voltage
regions in its OverDrive voltage frequency data (odvf).

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I647c30c978ffb13f6819ac3d069ee340710a7f99
2022-05-21 05:02:15 -04:00
Ori Messinger 4298cbb400 ROCm SMI CLI: Fix setPowerOverdrive resetPowerOverdrive Bugs
Fixes bug in the 'setPowerOverdrive' function which mishandles
GPUs with secondary dies. Secondary dies have a default power cap
of 0W and cannot be changed, so they are now skipped.

Fixes bug in the 'resetPowerOverdrive' function which incorrectly
resets the wattage to the current value.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I483fa3f58b1fa44a3bf7bae3b52c59ce523ae152
2022-05-21 05:01:32 -04:00
Divya Shikre b23cfc0e82 Fix mem leaks observed while running rsmitst
1. Memory allocated for handle was not deleted
when no variant, subvariant or supported function
was found
2. handle->func_id_iter address was set to 0
before delete[]

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iab50fdfbe03eec8e6fd0e84e03bd2c47e645b3d8
2022-05-18 14:31:44 -04:00
Divya Shikre afe996c2ed Update get_frequencies to handle failures.
Show an optional debug log (RSMI_DEBUG_BITFIELD=2) to
the user in the following scenarios:
1. If more than one current frequency is found
2. If frequencies are not read in increasing order of
   their value
If the current frequency is not available, its index is
set to -1 and no value is marked with * in the
output. This will also be handled in rocm_smi.py.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I477ec065f7513c8045d6392f12ef6cb835a6b8f6
2022-05-11 15:33:15 -04:00
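
The checks listed in this message can be sketched in Python. The real change lives in the C++ library; the function below is a hypothetical illustration that assumes input lines of the form `1: 800Mhz *`, with `*` marking the current frequency. Keeping the last marked entry when several are marked is an assumption of this sketch, not something the commit states.

```python
def parse_frequencies(text):
    """Parse lines like '1: 800Mhz *' into (values_mhz, current_index, warnings)."""
    values = []
    current = -1  # -1 means no current frequency was found
    warnings = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        is_current = line.endswith("*")
        # The field after the colon holds the value, e.g. '800Mhz'.
        field = line.rstrip("*").split(":", 1)[1].strip()
        mhz = int("".join(ch for ch in field if ch.isdigit()))
        if is_current:
            if current != -1:
                warnings.append("more than one current frequency found")
            current = len(values)
        if values and mhz < values[-1]:
            warnings.append("frequencies are not in increasing order")
        values.append(mhz)
    return values, current, warnings
```

A caller that receives `current == -1` would simply print the list without a `*` next to any value.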
Divya Shikre 99be3451d7 Add DEBUG_LOG macro
Add DEBUG_LOG that will optionally print error
message when RSMI_DEBUG_BITFIELD is set to 2.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6017e92d8a9e5f9861ae29ece0488d4bc198f996
2022-05-11 11:03:24 -04:00
Divya Shikre c9b42bff57 Add RSMI_CLK_TYPE_PCIE to rsmi_clk_type_t
showclocks/showclkfrq does not display pp_dpm_pcie values
in SR-IOV. This fix adds PCIe clocks to rsmi_clk_type_t,
where the rest of the clocks are defined.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6d129ae412623b369c14456ae9781b2dbceb2139
2022-05-06 09:15:39 -04:00
Ori Messinger 9d6403bb17 ROCm SMI LIB: Add Missing GPU Blocks
This patch adds the following 4 missing GPU blocks to the SMI LIB:
-RSMI_GPU_BLOCK_MMHUB
-RSMI_GPU_BLOCK_PCIE_BIF
-RSMI_GPU_BLOCK_HDP
-RSMI_GPU_BLOCK_XGMI_WAFL

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia1ec6f53e195f4bf7b8f073d6bed4fdb6572e546
2022-05-05 00:44:16 -04:00
Elena Sakhnovitch be66d67ef2 Revert "rocm_smi.py: Don't try to print absent clock files"
This reverts commit b931380f02.
The DRM device ID does not always match the GPU ID in rocm_smi.py. This leads to cases where the wrong device is checked by os.path.isfile().

Change-Id: Ib6f2b9be123b7eb64334d3feec57f63d7eb37d6f
2022-05-03 16:42:42 -04:00
Elena Sakhnovitch 9d7fd34d2b [rocm_smi.py] Hide unsupported clocks under debug
Signed-off-by: Elena Sakhnovitch <elena.sakhnovitch@amd.com>
Change-Id: I1f2c7b93d9a81f2735c76e8d441f9e298288f5c0
2022-05-03 16:38:22 -04:00
Bill(Shuzhou) Liu 9f6614e83b Sanity check amdgpu module is loaded in rocm_smi.py
Instead of checking /proc/modules for amdgpu, the code will check
/sys/module/amdgpu/initstate which covers the case when the driver
is compiled into the kernel.

Change-Id: Id39ec5b0eb9b68204bc9f5f779057ba8cc090bdc
2022-04-14 11:28:38 -04:00
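
A minimal sketch of such a check, assuming the file reads `live` once the driver is initialized. The function name and the path parameter are hypothetical and exist only so the sketch can be exercised without real hardware; the actual check in rocm_smi.py may differ.

```python
from pathlib import Path


def driver_initialized(initstate_path="/sys/module/amdgpu/initstate"):
    """Return True if the initstate file exists and reports 'live' (hypothetical helper)."""
    try:
        state = Path(initstate_path).read_text().strip()
    except OSError:
        # Missing or unreadable file: treat the driver as not initialized.
        return False
    return state == "live"
```

Calling `driver_initialized()` with no argument reads the sysfs path named in the commit message.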
Bill(Shuzhou) Liu 7860de5107 Suppress "rsmi_init() failed" error message
When an application calls the library on a system without amdgpu,
it may always print out "rsmi_init() failed". Suppress the error
message in the library.

Change-Id: Ice63dd3a764b221a6935536bff1bfa6aa3e51a46
2022-04-12 09:44:00 -04:00
Ori Messinger e800cbf161 ROCm SMI CLI: Fix formatCsv Bug
Fixes a bug in the 'formatCsv' function which mishandles json
data conversion for 'system' data types.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I705060409bf5ae75b994ffda270843065ca12321
2022-04-07 19:33:46 -04:00
Bill(Shuzhou) Liu 9f814e150e Correct the __pycache__ folder
Remove the __pycache__ in the folder libexec/rocm_smi

Change-Id: I0ad505ff7e7368d5fe86e1eee12080039edc7111
2022-03-24 09:44:33 -04:00
Bill(Shuzhou) Liu c37d4bac8f Remove python pyc file when uninstall
Remove python pyc file when uninstall.

Change-Id: I383faf8fcfaeeb346c9ee38c1aad8577a460281e
2022-03-23 13:39:57 -04:00
Ranjith Ramakrishnan 869670866d Remove rocm_smi/bin folder and prefix name correction in pragma message
The /opt/rocm/rocm_smi/bin folder was added by mistake as part of the file reorg; this commit removes it.
File reorg commit: f1da5591b58e7c5f09ac3aa88aef85257b87478d
The pragma message for oam header files showed the prefix as rocm_smi; changed it to oam.

Change-Id: I74b3c1d2bd7e0ff0eee5738c1658063bc855066c
2022-03-17 18:16:10 -07:00
Kent Russell 85571318e2 README: Remove restrictive licensing language
Also update copyright years

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ic9ead543c4937680afc1957623c4d5fcbfbd58b0
2022-03-16 13:52:25 -04:00
Sreekant Somasekharan dbe3403bd3 make string variable 'tpath' an empty string.
A string variable that is not empty can lead to incorrect compilation
and corrupted output.

Change-Id: Ie66756c28aef7417759c29387500970a8b53e44c
2022-03-11 21:22:28 -05:00
Bill(Shuzhou) Liu 8ce9289bc2 Upgrade GoogleTest to v1.11.0
The old GoogleTest has compile errors on CentOS 9. Upgrade it
to the latest version.

Change-Id: I6bbe6afdfad6422a210f258880ddc87a9f088d76
2022-03-09 15:18:43 -05:00
Sreekant Somasekharan e6ae697e9c Add blacklist filter 'virtualization' for rsmi tests failing in SRIOV
Change-Id: Ibbaef092482c0b78ecd86a29f0b9b4331b51abe2
2022-03-04 22:13:44 -05:00
Elena Sakhnovitch a3317714cb [rocm_smi.py] resetPowerOverdrive fix
resetPowerOverdrive: improve output messages.

Signed-off-by: Elena Sakhnovitch
Change-Id: Ic5b9084f0637458c36e460231f2d3622b0a23aa6
2022-03-04 11:26:45 -05:00
Ranjith Ramakrishnan f1da5591b5 File reorganization with backward compatibility
Wrapper header files
Soft link to libraries and binaries
rocm_smi.py and rsmiBindings.py installed in libexec/rocm_smi
Binaries, libraries and header files installed as per File Reorg folder structure

Change-Id: I3166ab67f89c2ae4aafbc87bb00c9a5233221ade
2022-03-03 18:48:52 -05:00
Bill(Shuzhou) Liu 4b65b0307f Prevent stack buffer overflow
readlink() does not append a null byte to buffer. Initialize the
tpath to prevent stack buffer overflow.

Change-Id: I17895dc3576b080a0c35bd0528a5b83223ec1c1b
2022-03-03 15:43:53 -05:00
Saravanan Solaiyappan 3a3b8dd25d Consider apt/yum upgrade operation check in package scripts.
Include the upgrade operation check in the prerm and postun scripts
for rocm-smi-lib package.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ic3dee7ae50a2ac317f1aab88472b6d4805c4de90
2022-02-24 10:11:32 -05:00
Elena Sakhnovitch 9b871fcd9f [rocm_smi.py]: fix input error type for --setclock
Signed-off-by: Elena Sakhnovitch
Change-Id: I9626978780f360c591fb8908f5b759f2289dff0b
2022-02-22 14:24:38 -05:00
Freddy Paul d0545854dd rocm-smi: Fix cmake target files to reflect correct location
Change-Id: I86fda8447609c42e0f0615abd837b53ca5fbe717
2022-02-18 09:53:43 -08:00
Ori Messinger 007f326c34 ROCm SMI CLI: Hide Failed Command Warning
The purpose of this patch is to hide 'One or more commands failed.'
from showing up, unless an appropriate log level has been set.

You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ifa309cd62596491a6ea5892e0752251f037fc0e9
2022-02-09 11:52:33 -05:00
Bill(Shuzhou) Liu 3aab7b199e Link the library using sha1 build-id
The address sanitizer build requires a build ID longer than 8 bytes.

Change-Id: I530fe87dffbf4c46f010bf8a1c2914f733678e9a
2022-02-02 17:04:11 -05:00
Divya Shikre 8c4635acea Temporarily blacklist TestPerfLevelReadWrite for navi21
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iee2146170b6828fe4fe2846c3ebfd57f95734f34
2022-01-27 22:56:37 -05:00
Laurent Morichetti 2804bf7c28 Don't use NDEBUG when the intent is !DEBUG
CMakeLists.txt does not set up the DEBUG macro correctly to mean
!NDEBUG, so, as a workaround, replace all uses of ifdef NDEBUG with
ifndef DEBUG in the library sources.

Change-Id: I408adb36d1a2310fb894a486574469662ebb27cd
(cherry picked from commit 9f87197d8d)
2022-01-27 11:08:48 -05:00
Divya Shikre ec71380e1c Add fix to check for vector size while reading pp_dpm_pcie
pop_back() was causing a seg fault when pp_dpm_pcie file is empty and returns whitespace.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I888f1f79751cd456e43751a5b96d08560a039677
2022-01-26 10:34:57 -05:00
Bill(Shuzhou) Liu ce9cfa584f Add rpm License header
Add rpm License header for cpack

Change-Id: I2f4a89015b6389cfde801f41d4f6e0f59e7087aa
2022-01-20 13:30:40 -05:00
Divya Shikre 11a71c63b1 Don't assert when fan is not supported.
Add a check when RSMI_STATUS_NOT_SUPPORTED is returned for fanRead/fanReadWrite.
Fix for SWDEV-314176 & SWDEV-314175 reported.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Icf2cc541a3fa5ca4794aff5d6bc91104adc45e6d
2022-01-20 12:29:12 -05:00
Bill(Shuzhou) Liu 3356084074 Add license file to smi-lib package
Install LICENSE.txt to share/doc/smi-lib

Change-Id: Idcbb70db8808111203e8e4a4c3ab4d1e070ac79d
2022-01-19 12:15:31 -05:00