Commit Graph

406 commits

Author SHA1 Message Date
Kent Russell 140656e176 rocm_smi.py: Handle corrupted serial number
If the FRU has been corrupted, then the serial number will come in with
any manner of random bytes, which will cause decode() to fail
spectacularly. Check that the serial returned by the kernel is
alphanumeric, and print to the error log if not (then continue to the
next device).

Change-Id: If4f35b140b6089e02729b1490ed6b48d614a122a


[ROCm/rocm_smi_lib commit: 6b6e840337]
2022-06-16 17:29:08 -04:00
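
The serial-number check described in commit 140656e176 can be sketched roughly as follows. This is a minimal illustration, not the code from rocm_smi.py: the function name `decode_serial` and the choice of ASCII decoding are assumptions made for the example.

```python
import logging


def decode_serial(raw: bytes):
    """Return the serial number as text, or None if it looks corrupted.

    Hypothetical helper: the real CLI obtains the serial through the
    library bindings; here we only model the validation step.
    """
    try:
        # Drop trailing NULs and whitespace that a fixed-size buffer may carry.
        text = raw.decode("ascii").strip("\x00 \t\r\n")
    except UnicodeDecodeError:
        logging.error("Serial number contains non-ASCII bytes; skipping device")
        return None
    if not text.isalnum():
        logging.error("Serial number is not alphanumeric; skipping device")
        return None
    return text
```

A caller iterating over devices would `continue` to the next device whenever `decode_serial` returns `None`.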
Elena Sakhnovitch f4cac3e4ef [rocm_smi.py] error feedback improvement
Clean up the overly verbose error reporting system.

Signed-off-by: Elena Sakhnovitch
Signed-off-by: Sreekant Somasekharan
Change-Id: Icc96086810b8dcfc426848b8c349a2572026c3bd


[ROCm/rocm_smi_lib commit: 4dd2398f3d]
2022-06-16 14:32:13 -04:00
Ranjith Ramakrishnan 24b9610d3a SWDEV-321112 - Use GNUInstallDirs
Use GNUInstallDirs variables to determine the location of LIBDIR, BINDIR, INCLUDEDIR, DOCDIR

Note that CMAKE_INSTALL_LIBDIR is overridden, since the default for RHEL
is lib64, but ROCm packaging wants it to be lib always. Distros or users
can easily override this.

Change-Id: I616152ccd2bc1f5a60bffa940312b38ca6e88c04


[ROCm/rocm_smi_lib commit: b72c464ac0]
2022-06-16 13:22:49 -04:00
Ori Messinger d97ddd9e67 ROCm SMI CLI: Fix setClockRange Error
This patch changes the error handling for setClockRange.

When a device does not support modifying a clock type (sclk/mclk),
an error message is printed through the python CLI.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I37d9ea4189b1ca81e5deaab5efa6cfa4901b89b3


[ROCm/rocm_smi_lib commit: 2b8d0ad70f]
2022-06-15 15:47:51 -04:00
Bill(Shuzhou) Liu c8fbb50d8e Remove python pyc file when uninstall rpm
Remove the Python .pyc files when the RPM is uninstalled.

Change-Id: I6520b51aac34060b5b90f94a016cec1827a4973f


[ROCm/rocm_smi_lib commit: 42f11bdd63]
2022-06-09 09:00:38 -04:00
Divya Shikre 100e331812 Print log when PIDs dont use any GPU device.
showpidgpus prints 'none' when no GPU devices are
being used by the running process. Adding a fix
to print a relevant message.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I165a6644a76c8e1c3c3cad676dcfd41eb1c4724f


[ROCm/rocm_smi_lib commit: dcab886394]
2022-05-31 16:17:42 -04:00
Elena Sakhnovitch b2ac46009b [rocm_smi.py]: shownodesbw fix for non xgmi
Improve error output for non-xgmi nodes bandwidth

Signed-off-by: Elena Sakhnovitch
Change-Id: I833970d3200a75c7639d33bf19e0e83afe176c8d


[ROCm/rocm_smi_lib commit: 44ea49eb01]
2022-05-24 16:45:32 -04:00
Ori Messinger 750c640171 ROCm SMI CLI: Fix --showvoltagerange bug
This patch fixes a --showvoltagerange bug, which attempts to check
the voltage curve on a device that does not have any voltage
regions in its OverDrive voltage frequency data (odvf).

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I647c30c978ffb13f6819ac3d069ee340710a7f99


[ROCm/rocm_smi_lib commit: 786f66671a]
2022-05-21 05:02:15 -04:00
Ori Messinger 2459274e03 ROCm SMI CLI: Fix setPowerOverdrive restPowerOverdrive Bugs
Fixes bug in the 'setPowerOverdrive' function which mishandles
GPUs with secondary dies. Secondary dies have a default power cap
of 0W and cannot be changed, so they are now skipped.

Fixes bug in the 'resetPowerOverdrive' function which incorrectly
resets the wattage to the current value.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I483fa3f58b1fa44a3bf7bae3b52c59ce523ae152


[ROCm/rocm_smi_lib commit: 4298cbb400]
2022-05-21 05:01:32 -04:00
Divya Shikre 75467146b6 Fix mem leaks observed while running rsmitst
1. Memory allocated for handle was not deleted
   when no variant, subvariant or supported function
   was found
2. handle->func_id_iter address was set to 0
   before delete[]

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iab50fdfbe03eec8e6fd0e84e03bd2c47e645b3d8


[ROCm/rocm_smi_lib commit: b23cfc0e82]
2022-05-18 14:31:44 -04:00
Divya Shikre 231a61a394 Update get_frequencies to handle failures.
Show an optional debug log (RSMI_DEBUG_BITFIELD=2) to
the user in the following scenarios:
1. If more than one current frequency is found
2. If frequencies are not read in increasing order of
   their value
If the current frequency is not available, its index is
set to -1 and no value is marked with * in the
output. This will also be handled in rocm_smi.py.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I477ec065f7513c8045d6392f12ef6cb835a6b8f6


[ROCm/rocm_smi_lib commit: afe996c2ed]
2022-05-11 15:33:15 -04:00
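
The two failure scenarios and the -1 index from commit 231a61a394 can be illustrated with a small parser. This is only a sketch in Python (the actual change is in the C++ library); it assumes the usual amdgpu sysfs layout for files such as pp_dpm_sclk, where each line reads `<index>: <value>Mhz` and the current level carries a trailing `*`. The function name `parse_frequencies` is hypothetical.

```python
import re

# One line of a pp_dpm_* file, e.g. "1: 800Mhz *"
_LINE = re.compile(r"^\s*(\d+):\s*(\d+)\s*[Mm][Hh][Zz]\s*(\*)?\s*$")


def parse_frequencies(text):
    """Return (frequencies_mhz, current_index, warnings).

    current_index is -1 when no line is marked with '*'.
    """
    freqs = []
    current = -1
    warnings = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m is None:
            continue  # ignore blank or unrecognized lines
        mhz = int(m.group(2))
        if freqs and mhz < freqs[-1]:
            warnings.append("frequencies are not in increasing order")
        if m.group(3):
            if current != -1:
                warnings.append("more than one current frequency found")
            current = len(freqs)
        freqs.append(mhz)
    return freqs, current, warnings
```

In the library, the warnings would be emitted only as optional debug output; here they are returned so that a caller can decide what to do with them.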
Divya Shikre 853a6e517c Add DEBUG_LOG macro
Add DEBUG_LOG that will optionally print error
message when RSMI_DEBUG_BITFIELD is set to 2.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6017e92d8a9e5f9861ae29ece0488d4bc198f996


[ROCm/rocm_smi_lib commit: 99be3451d7]
2022-05-11 11:03:24 -04:00
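
The gating behind DEBUG_LOG can be pictured with the following Python sketch. The macro itself lives in the C++ library; this example only models the idea. It assumes that RSMI_DEBUG_BITFIELD is interpreted as a bit mask in which the value 2 enables these messages; the names `DEBUG_LOG_BIT`, `debug_enabled` and `debug_log` are hypothetical.

```python
import os
import sys

DEBUG_LOG_BIT = 2  # assumption: the bit that enables optional debug messages


def debug_enabled(env=os.environ):
    """Return True when RSMI_DEBUG_BITFIELD has the debug-log bit set."""
    try:
        bits = int(env.get("RSMI_DEBUG_BITFIELD", "0"), 0)
    except ValueError:
        return False  # unparsable value: treat as disabled
    return bool(bits & DEBUG_LOG_BIT)


def debug_log(message, env=os.environ, stream=sys.stderr):
    """Print message only when debug logging is enabled."""
    if debug_enabled(env):
        print(message, file=stream)
```

With this gating, running a program with `RSMI_DEBUG_BITFIELD=2` in its environment would make the optional messages visible, and leaving the variable unset keeps them hidden.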
Divya Shikre 4a8d4b2878 Add RSMI_CLK_TYPE_PCIE to rsmi_clk_type_t
showclocks/showclkfrq does not display pp_dpm_pcie values
under SR-IOV. This fix adds the PCIe clocks to rsmi_clk_type_t,
where the rest of the clocks are defined.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6d129ae412623b369c14456ae9781b2dbceb2139


[ROCm/rocm_smi_lib commit: c9b42bff57]
2022-05-06 09:15:39 -04:00
Ori Messinger c5ac3ea7bd ROCm SMI LIB: Add Missing GPU Blocks
This patch adds the following 4 missing GPU blocks to the SMI LIB:
-RSMI_GPU_BLOCK_MMHUB
-RSMI_GPU_BLOCK_PCIE_BIF
-RSMI_GPU_BLOCK_HDP
-RSMI_GPU_BLOCK_XGMI_WAFL

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia1ec6f53e195f4bf7b8f073d6bed4fdb6572e546


[ROCm/rocm_smi_lib commit: 9d6403bb17]
2022-05-05 00:44:16 -04:00
Elena Sakhnovitch 2a06a86b09 Revert "rocm_smi.py: Don't try to print absent clock files"
This reverts commit 2ba625e569.
The DRM device ID does not always match the GPU ID in rocm_smi.py. This leads to cases where the wrong device is checked by os.path.isfile().

Change-Id: Ib6f2b9be123b7eb64334d3feec57f63d7eb37d6f


[ROCm/rocm_smi_lib commit: be66d67ef2]
2022-05-03 16:42:42 -04:00
Elena Sakhnovitch 33320d6e1a [rocm_smi.py] Hide unsupported clocks under debug
Signed-off-by: Elena Sakhnovitch <elena.sakhnovitch@amd.com>
Change-Id: I1f2c7b93d9a81f2735c76e8d441f9e298288f5c0


[ROCm/rocm_smi_lib commit: 9d7fd34d2b]
2022-05-03 16:38:22 -04:00
Bill(Shuzhou) Liu 654834be6c Sanity check amdgpu module is loaded in rocm_smi.py
Instead of checking /proc/modules for amdgpu, the code will check
/sys/module/amdgpu/initstate, which covers the case when the driver
is compiled into the kernel.

Change-Id: Id39ec5b0eb9b68204bc9f5f779057ba8cc090bdc


[ROCm/rocm_smi_lib commit: 9f6614e83b]
2022-04-14 11:28:38 -04:00
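
The check described in commit 654834be6c might look something like the sketch below. It is not the code from rocm_smi.py: the function name `driver_is_live`, the `sys_module_dir` parameter (added so the check can be exercised against a temporary directory), and the comparison against the string "live" are assumptions for illustration.

```python
from pathlib import Path


def driver_is_live(module="amdgpu", sys_module_dir="/sys/module"):
    """Return True when <sys_module_dir>/<module>/initstate reads 'live'."""
    initstate = Path(sys_module_dir) / module / "initstate"
    try:
        return initstate.read_text().strip() == "live"
    except OSError:
        # Missing or unreadable file: treat the driver as not initialized.
        return False
```

A CLI could call `driver_is_live()` once at startup and exit with a clear message when it returns `False`.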
Bill(Shuzhou) Liu 54b4ad12df Suppress "rsmi_init() failed" error message
When an application calls the library on a system without amdgpu,
it may always print out "rsmi_init() failed". Suppress the error
message in the library.

Change-Id: Ice63dd3a764b221a6935536bff1bfa6aa3e51a46


[ROCm/rocm_smi_lib commit: 7860de5107]
2022-04-12 09:44:00 -04:00
Ori Messinger ecdc660778 ROCm SMI CLI: Fix formatCsv Bug
Fixes a bug in the 'formatCsv' function which mishandles json
data conversion for 'system' data types.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I705060409bf5ae75b994ffda270843065ca12321


[ROCm/rocm_smi_lib commit: e800cbf161]
2022-04-07 19:33:46 -04:00
Bill(Shuzhou) Liu 2e2f757d9d Correct the __pycache__ folder
Remove the __pycache__ in the folder libexec/rocm_smi

Change-Id: I0ad505ff7e7368d5fe86e1eee12080039edc7111


[ROCm/rocm_smi_lib commit: 9f814e150e]
2022-03-24 09:44:33 -04:00
Bill(Shuzhou) Liu 5c0dc0f383 Remove python pyc file when uninstall
Remove the Python .pyc files on uninstall.

Change-Id: I383faf8fcfaeeb346c9ee38c1aad8577a460281e


[ROCm/rocm_smi_lib commit: c37d4bac8f]
2022-03-23 13:39:57 -04:00
Ranjith Ramakrishnan bc3759120d Remove rocm_smi/bin folder and prefix name correction in pragma message
The /opt/rocm/rocm_smi/bin folder was added by mistake as part of the file reorg; this change removes it.
File reorg commit: f391b5d73935ebbadaa8f97185f40eefc88af020
The pragma message for the oam header files showed the prefix as rocm_smi; it is changed to oam.

Change-Id: I74b3c1d2bd7e0ff0eee5738c1658063bc855066c


[ROCm/rocm_smi_lib commit: 869670866d]
2022-03-17 18:16:10 -07:00
Kent Russell 0e18159c0e README: Remove restrictive licensing language
Also update copyright years

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ic9ead543c4937680afc1957623c4d5fcbfbd58b0


[ROCm/rocm_smi_lib commit: 85571318e2]
2022-03-16 13:52:25 -04:00
Sreekant Somasekharan 51f6fe8bc1 make string variable 'tpath' an empty string.
string variable not being empty can lead to incorrect compilation
and corrupted output.

Change-Id: Ie66756c28aef7417759c29387500970a8b53e44c


[ROCm/rocm_smi_lib commit: dbe3403bd3]
2022-03-11 21:22:28 -05:00
Bill(Shuzhou) Liu 30fb7220bd Upgrade GoogleTest to v1.11.0
The old GoogleTest has compile errors on CentOS 9. Upgrade it
to the latest version.

Change-Id: I6bbe6afdfad6422a210f258880ddc87a9f088d76


[ROCm/rocm_smi_lib commit: 8ce9289bc2]
2022-03-09 15:18:43 -05:00
Sreekant Somasekharan 265026251b Add blacklist filter 'virtualization' for rsmi tests failing in SRIOV
Change-Id: Ibbaef092482c0b78ecd86a29f0b9b4331b51abe2


[ROCm/rocm_smi_lib commit: e6ae697e9c]
2022-03-04 22:13:44 -05:00
Elena Sakhnovitch 090011b153 [rocm_smi.py] resetPowerOverdrive fix
resetPowerOverdrive: improve output messages.

Signed-off-by: Elena Sakhnovitch
Change-Id: Ic5b9084f0637458c36e460231f2d3622b0a23aa6


[ROCm/rocm_smi_lib commit: a3317714cb]
2022-03-04 11:26:45 -05:00
Ranjith Ramakrishnan f391b5d739 File reorganization with backward compatibility
Wrapper header files
Soft link to libraries and binaries
rocm_smi.py and rsmiBindings.py installed in libexec/rocm_smi
Binaries, libraries and header files installed as per File Reorg folder structure

Change-Id: I3166ab67f89c2ae4aafbc87bb00c9a5233221ade


[ROCm/rocm_smi_lib commit: f1da5591b5]
2022-03-03 18:48:52 -05:00
Bill(Shuzhou) Liu 9b2017de0c Prevent stack buffer overflow
readlink() does not append a null byte to buffer. Initialize the
tpath to prevent stack buffer overflow.

Change-Id: I17895dc3576b080a0c35bd0528a5b83223ec1c1b


[ROCm/rocm_smi_lib commit: 4b65b0307f]
2022-03-03 15:43:53 -05:00
Saravanan Solaiyappan 913986f721 Consider apt/yum upgrade operation check in package scripts.
Include the upgrade operation check in the prerm and postun scripts
for rocm-smi-lib package.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ic3dee7ae50a2ac317f1aab88472b6d4805c4de90


[ROCm/rocm_smi_lib commit: 3a3b8dd25d]
2022-02-24 10:11:32 -05:00
Elena Sakhnovitch 45763cc1bb [rocm_smi.py]: fix input error type for --setclock
Signed-off-by: Elena Sakhnovitch
Change-Id: I9626978780f360c591fb8908f5b759f2289dff0b


[ROCm/rocm_smi_lib commit: 9b871fcd9f]
2022-02-22 14:24:38 -05:00
Freddy Paul 566a0c794c rocm-smi:Fix cmake target files to reflect correct location
Change-Id: I86fda8447609c42e0f0615abd837b53ca5fbe717


[ROCm/rocm_smi_lib commit: d0545854dd]
2022-02-18 09:53:43 -08:00
Ori Messinger 9d6285f6c8 ROCm SMI CLI: Hide Failed Command Warning
The purpose of this patch is to hide 'One or more commands failed.'
from showing up, unless an appropriate log level has been set.

You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ifa309cd62596491a6ea5892e0752251f037fc0e9


[ROCm/rocm_smi_lib commit: 007f326c34]
2022-02-09 11:52:33 -05:00
Bill(Shuzhou) Liu f4ad11bc29 Link the library using sha1 build-id
The address sanitizer build requires a build ID longer than 8 bytes.

Change-Id: I530fe87dffbf4c46f010bf8a1c2914f733678e9a


[ROCm/rocm_smi_lib commit: 3aab7b199e]
2022-02-02 17:04:11 -05:00
Divya Shikre 25c9398a0d Temporary blacklist TestPerfLevelReadWrite for navi21
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iee2146170b6828fe4fe2846c3ebfd57f95734f34


[ROCm/rocm_smi_lib commit: 8c4635acea]
2022-01-27 22:56:37 -05:00
Laurent Morichetti fbb6e77dda Don't use NDEBUG when the intent is !DEBUG
CMakeLists.txt does not set up the DEBUG macro correctly to mean
!NDEBUG, so, as a workaround, replace all uses of ifdef NDEBUG with
ifndef DEBUG in the library sources.

Change-Id: I408adb36d1a2310fb894a486574469662ebb27cd
(cherry picked from commit f430cd4f91)


[ROCm/rocm_smi_lib commit: 2804bf7c28]
2022-01-27 11:08:48 -05:00
Divya Shikre a7a7c65e2a Add fix to check for vector size while reading pp_dpm_pcie
pop_back() was causing a seg fault when pp_dpm_pcie file is empty and returns whitespace.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I888f1f79751cd456e43751a5b96d08560a039677


[ROCm/rocm_smi_lib commit: ec71380e1c]
2022-01-26 10:34:57 -05:00
Bill(Shuzhou) Liu 9db28252c2 Add rpm License header
Add rpm License header for cpack

Change-Id: I2f4a89015b6389cfde801f41d4f6e0f59e7087aa


[ROCm/rocm_smi_lib commit: ce9cfa584f]
2022-01-20 13:30:40 -05:00
Divya Shikre 17e4460690 Don't assert when fan is not supported.
Add a check when RSMI_STATUS_NOT_SUPPORTED is returned for fanRead/fanReadWrite.
Fix for SWDEV-314176 & SWDEV-314175 reported.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Icf2cc541a3fa5ca4794aff5d6bc91104adc45e6d


[ROCm/rocm_smi_lib commit: 11a71c63b1]
2022-01-20 12:29:12 -05:00
Bill(Shuzhou) Liu 1a1e04b5a2 Add license file to smi-lib package
Install LICENSE.txt to share/doc/smi-lib

Change-Id: Idcbb70db8808111203e8e4a4c3ab4d1e070ac79d


[ROCm/rocm_smi_lib commit: 3356084074]
2022-01-19 12:15:31 -05:00
Sreekant Somasekharan 8266782850 Print ASD firmware version in hex instead of decimal format
Change-Id: Idf113f63b79f2d2903ae795d272d232a43680516


[ROCm/rocm_smi_lib commit: cf2f0b0508]
2022-01-18 10:44:20 -05:00
Bill(Shuzhou) Liu 9824aa1545 Enable the linker build id generation for address sanitizer build
The -Wl,--build-id option is added for address sanitizer build

Change-Id: I0d75bc8e6169010c460e62e51708828e75de478e


[ROCm/rocm_smi_lib commit: 7b69dde24f]
2022-01-17 09:06:34 -05:00
Bill(Shuzhou) Liu 7bf29acf35 strip the library instead of link when build release
When build the release, it will strip the library file instead of link.

Change-Id: Ib2d4cea614e8938bdb2be0fd74f046680158d256


[ROCm/rocm_smi_lib commit: 77502bed2a]
2022-01-14 10:39:15 -05:00
Harish Kasiviswanathan 16a9531a4d rocm_smi_lib: add stdbool.h needed for C90
'bool' keyword is supported only from C99 onwards. Include stdbool.h
for older compilers

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I09fd5cf6eac20e7185e85a1123bc4826958b2b7c


[ROCm/rocm_smi_lib commit: 8de6ed2b8d]
2021-12-14 15:25:59 -05:00
Elena Sakhnovitch 5553c7fb40 [rocm_smi.py] remove \r symbol at print
Remove carriage return at the end of the line in printLog function.
On linux end of line is encoded with \n, not \n\r.

Change-Id: If3835d773033b53a7f25b4a0284df359a6f9555d


[ROCm/rocm_smi_lib commit: 1aeb27c4c9]
2021-12-08 10:13:56 -05:00
Divya Shikre a83ee69dd3 Add null ptr check for temperature read from all sensors.
The (temperature == nullptr) check happens only when HBM temperature is retrieved.
This check needs to apply in other cases as well, hence moving this outside the HBM condition.
This should return RSMI_STATUS_INVALID_ARGS consistently in all cases when nullptr is passed through rsmitst.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iea3cec75312a0a669c7da27e15e9782e6a885c5f


[ROCm/rocm_smi_lib commit: 432df20321]
2021-12-01 14:05:46 -05:00
Divya Shikre 92fe455a8e Update temp_read rsmitst.
Check for RSMI_STATUS_INVALID_ARGS when invalid args are passed.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I0d5ff84aee5cce4214026ddcd860a17ae3e43147


[ROCm/rocm_smi_lib commit: b4fd9c0d94]
2021-11-29 18:09:45 -05:00
Sreekant Somasekharan 835f43311a Skip TestFrequenciesReadWrite for unsupported ASICs
For ASICs NAVI10 and above, setting the display clock [DCEFCLK] is not supported and the sysfs entry is
read-only. As a result, the test falsely fails for these ASICs. ROCm SMI Lib is ASIC independent,
so setting the display clock cannot be selectively disabled for these ASICs.

As a compromise, if the set (write to the sysfs entry) fails due to a permission error and the euid is root,
assume that the set feature is not supported and skip the test.

Change-Id: I7a273878cbf1465b01728705323e8a92a42378dd


[ROCm/rocm_smi_lib commit: c6f695f5a9]
2021-11-29 11:23:38 -05:00
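
The skip rule from commit 835f43311a amounts to a small decision function. The sketch below expresses it in Python for illustration only (the test itself is part of the C++ rsmitst suite); the function name `should_skip_write_test` is hypothetical.

```python
import errno


def should_skip_write_test(err: OSError, euid: int) -> bool:
    """Decide whether a failed sysfs write means 'feature not supported'.

    A permission error while running as root (euid 0) suggests that the
    entry is read-only on this ASIC, so the test is skipped rather than failed.
    """
    is_permission_error = err.errno in (errno.EACCES, errno.EPERM)
    return is_permission_error and euid == 0
```

A non-root user who hits the same permission error still gets a failure, since in that case the error more likely reflects missing privileges than missing hardware support.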
Divya Shikre c23694e66a Add fix to display correct GPU Memory Activity and GFX Activity value.
The driver fills in 0xFF for every metric that is not supported on that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I86a38148c7a288ea0db94893f685560eaac098ab


[ROCm/rocm_smi_lib commit: 7b1daaef96]
2021-11-25 14:28:06 -05:00
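
The sentinel check from commit c23694e66a can be sketched as below. This is an illustration in Python rather than the library's C++ code, and it assumes that "0xFF" means every byte of the metric field is 0xFF (for example 0xFFFF for a 16-bit field); the function name `metric_or_none` and the `width_bytes` parameter are hypothetical.

```python
def metric_or_none(value: int, width_bytes: int):
    """Return value, or None when the field is entirely 0xFF bytes.

    None stands in for the library returning RSMI_STATUS_NOT_SUPPORTED.
    """
    sentinel = (1 << (8 * width_bytes)) - 1  # e.g. 0xFFFF for 2 bytes
    return None if value == sentinel else value
```

Comparing against the full-width sentinel, rather than a single 0xFF byte, avoids misreading a legitimate value such as 255 in a wider field.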
Divya Shikre a95af9b70d Add fix for out of range temperature value for HBM.
The driver fills in 0xFF for every metric that is not supported on that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iacb6474486e3732f2aa824ff447c17f8243b65cd


[ROCm/rocm_smi_lib commit: f61cb1b41d]
2021-11-23 15:37:41 -05:00