If the FRU has been corrupted, the serial number can contain arbitrary
bytes, which causes decode() to fail spectacularly. Check that the serial
returned by the kernel is alphanumeric; if it is not, print to the error
log and continue to the next device.
Change-Id: If4f35b140b6089e02729b1490ed6b48d614a122a
[ROCm/rocm_smi_lib commit: 6b6e840337]
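A minimal Python sketch of the serial check, assuming the raw serial is available as bytes read from sysfs. The names serial_is_valid and collect_serials are hypothetical and are not the actual rocm_smi.py functions:

```python
import logging


def serial_is_valid(raw: bytes) -> bool:
    """Return True only if raw decodes to a non-empty alphanumeric string."""
    try:
        # Corrupted FRU data can contain arbitrary bytes that are not ASCII.
        text = raw.decode("ascii").strip()
    except UnicodeDecodeError:
        return False
    # isalnum() is False for the empty string and for punctuation or control bytes.
    return text.isalnum()


def collect_serials(raw_serials: dict) -> dict:
    """Map device index -> serial, logging and skipping devices with bad serials."""
    serials = {}
    for dev, raw in sorted(raw_serials.items()):
        if not serial_is_valid(raw):
            logging.error("GPU[%s]: serial number is not alphanumeric, skipping", dev)
            continue  # move on to the next device
        serials[dev] = raw.decode("ascii").strip()
    return serials
```

Validating before decoding keeps one bad device from aborting the whole listing.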
Use GNUInstallDirs variables to determine the location of LIBDIR, BINDIR, INCLUDEDIR, DOCDIR
Note that CMAKE_INSTALL_LIBDIR is overridden, since the default on RHEL
is lib64, but ROCm packaging always wants it to be lib. Distros or users
can easily override this.
Change-Id: I616152ccd2bc1f5a60bffa940312b38ca6e88c04
[ROCm/rocm_smi_lib commit: b72c464ac0]
This patch changes the error handling for setClockRange.
When a device does not support modifying a clock type (sclk/mclk),
an error message is now printed through the Python CLI.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I37d9ea4189b1ca81e5deaab5efa6cfa4901b89b3
[ROCm/rocm_smi_lib commit: 2b8d0ad70f]
showpidgpus prints 'none' when no GPU devices are
being used by the running process. This fix prints
a more descriptive message instead.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I165a6644a76c8e1c3c3cad676dcfd41eb1c4724f
[ROCm/rocm_smi_lib commit: dcab886394]
This patch fixes a --showvoltagerange bug in which the CLI attempts
to check the voltage curve on a device that has no voltage
regions in its OverDrive voltage frequency data (odvf).
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I647c30c978ffb13f6819ac3d069ee340710a7f99
[ROCm/rocm_smi_lib commit: 786f66671a]
Fixes a bug in the 'setPowerOverdrive' function, which mishandles
GPUs with secondary dies. Secondary dies have a default power cap
of 0W that cannot be changed, so they are now skipped.
Fixes a bug in the 'resetPowerOverdrive' function, which incorrectly
resets the wattage to the current value.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I483fa3f58b1fa44a3bf7bae3b52c59ce523ae152
[ROCm/rocm_smi_lib commit: 4298cbb400]
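A hedged Python sketch of both fixes. It assumes each device's default power cap is known in watts and that resetting should restore that default (the commit only says the old code wrongly reused the current value). The functions and the set_cap callback are hypothetical stand-ins for the real sysfs writes:

```python
from typing import Callable, Dict, List


def set_power_overdrive(default_caps_w: Dict[int, int], new_cap_w: int,
                        set_cap: Callable[[int, int], None]) -> List[int]:
    """Apply new_cap_w to every device, skipping secondary dies; return skipped indices."""
    skipped = []
    for dev, default_cap_w in sorted(default_caps_w.items()):
        if default_cap_w == 0:
            # Secondary die: default power cap is 0 W and cannot be changed.
            skipped.append(dev)
            continue
        set_cap(dev, new_cap_w)
    return skipped


def reset_power_overdrive(default_caps_w: Dict[int, int],
                          set_cap: Callable[[int, int], None]) -> None:
    """Restore each device's default cap rather than re-applying its current cap."""
    for dev, default_cap_w in sorted(default_caps_w.items()):
        if default_cap_w == 0:
            continue  # secondary die, nothing to reset
        set_cap(dev, default_cap_w)
```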
1. Memory allocated for the handle was not deleted
when no variant, subvariant or supported function
was found.
2. The handle->func_id_iter address was set to 0
before delete[] was called on it.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iab50fdfbe03eec8e6fd0e84e03bd2c47e645b3d8
[ROCm/rocm_smi_lib commit: b23cfc0e82]
Show an optional debug log (RSMI_DEBUG_BITFIELD=2) to
the user in the following scenarios:
1. If more than one current frequency is found
2. If frequencies are not read in increasing order of
their value
If the current frequency is not available, its index is
set to -1 and no value is marked with * in the
output. This will also be handled in rocm_smi.py.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I477ec065f7513c8045d6392f12ef6cb835a6b8f6
[ROCm/rocm_smi_lib commit: afe996c2ed]
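A minimal Python sketch of the parsing rules above, assuming pp_dpm_*-style lines such as "1: 800Mhz *" where "*" marks the current level. logging.debug stands in for the library's DEBUG_LOG, and parse_dpm_levels is a hypothetical name; lines in other formats (for example pp_dpm_pcie) are simply ignored here:

```python
import logging
import re

# Matches lines such as "1: 800Mhz *"; the trailing "*" marks the current level.
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text: str):
    """Return (frequencies_mhz, current_index); current_index is -1 if no level is marked."""
    freqs = []
    current = -1
    for line in text.splitlines():
        m = LEVEL_RE.match(line)
        if m is None:
            continue
        value = int(m.group(2))
        if freqs and value < freqs[-1]:
            logging.debug("frequencies not in increasing order: %d after %d", value, freqs[-1])
        if m.group(3):
            if current != -1:
                # Keep the first marked level (an assumption of this sketch).
                logging.debug("more than one current frequency found")
            else:
                current = len(freqs)
        freqs.append(value)
    return freqs, current
```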
Add DEBUG_LOG, which optionally prints an error
message when RSMI_DEBUG_BITFIELD is set to 2.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6017e92d8a9e5f9861ae29ece0488d4bc198f996
[ROCm/rocm_smi_lib commit: 99be3451d7]
showclocks/showclkfrq does not display pp_dpm_pcie values
under SR-IOV. This fix adds the PCIe clock to rsmi_clk_type_t,
where the rest of the clock types are defined.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6d129ae412623b369c14456ae9781b2dbceb2139
[ROCm/rocm_smi_lib commit: c9b42bff57]
This patch adds the following 4 missing GPU blocks to the SMI LIB:
-RSMI_GPU_BLOCK_MMHUB
-RSMI_GPU_BLOCK_PCIE_BIF
-RSMI_GPU_BLOCK_HDP
-RSMI_GPU_BLOCK_XGMI_WAFL
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia1ec6f53e195f4bf7b8f073d6bed4fdb6572e546
[ROCm/rocm_smi_lib commit: 9d6403bb17]
This reverts commit 2ba625e569.
The DRM device ID does not always match the GPU ID in rocm_smi.py. This leads to cases where the wrong device is checked by os.path.isfile().
Change-Id: Ib6f2b9be123b7eb64334d3feec57f63d7eb37d6f
[ROCm/rocm_smi_lib commit: be66d67ef2]
Instead of checking /proc/modules for amdgpu, the code now checks
/sys/module/amdgpu/initstate, which also covers the case where the
driver is compiled into the kernel.
Change-Id: Id39ec5b0eb9b68204bc9f5f779057ba8cc090bdc
[ROCm/rocm_smi_lib commit: 9f6614e83b]
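A Python sketch of the new check, assuming the initstate file reports "live" once the driver is initialized and that a missing or unreadable file means the driver is not available. amdgpu_is_live is a hypothetical helper, and the path is a parameter only so the check can be exercised against a test file:

```python
from pathlib import Path

INITSTATE = "/sys/module/amdgpu/initstate"


def amdgpu_is_live(initstate_path: str = INITSTATE) -> bool:
    """Return True if the module initstate file reports "live"."""
    try:
        state = Path(initstate_path).read_text().strip()
    except OSError:
        # Missing or unreadable file: treat the driver as not initialized.
        return False
    return state == "live"
```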
When an application calls the library on a system without amdgpu,
the library may print "rsmi_init() failed" every time. Suppress this
error message in the library.
Change-Id: Ice63dd3a764b221a6935536bff1bfa6aa3e51a46
[ROCm/rocm_smi_lib commit: 7860de5107]
Fixes a bug in the 'formatCsv' function which mishandles json
data conversion for 'system' data types.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I705060409bf5ae75b994ffda270843065ca12321
[ROCm/rocm_smi_lib commit: e800cbf161]
The /opt/rocm/rocm_smi/bin folder was added by mistake as part of the file reorg; remove it.
File reorg commit: f391b5d73935ebbadaa8f97185f40eefc88af020
The pragma message for the oam header files showed the prefix rocm_smi; change it to oam.
Change-Id: I74b3c1d2bd7e0ff0eee5738c1658063bc855066c
[ROCm/rocm_smi_lib commit: 869670866d]
Also update copyright years
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ic9ead543c4937680afc1957623c4d5fcbfbd58b0
[ROCm/rocm_smi_lib commit: 85571318e2]
string variable not being empty can lead to incorrect compilation
and corrupted output.
Change-Id: Ie66756c28aef7417759c29387500970a8b53e44c
[ROCm/rocm_smi_lib commit: dbe3403bd3]
The old GoogleTest version has compile errors on CentOS 9. Upgrade it
to the latest version.
Change-Id: I6bbe6afdfad6422a210f258880ddc87a9f088d76
[ROCm/rocm_smi_lib commit: 8ce9289bc2]
Wrapper header files
Soft link to libraries and binaries
rocm_smi.py and rsmiBindings.py installed in libexec/rocm_smi
Binaries, libraries and header files installed as per File Reorg folder structure
Change-Id: I3166ab67f89c2ae4aafbc87bb00c9a5233221ade
[ROCm/rocm_smi_lib commit: f1da5591b5]
readlink() does not append a null byte to the buffer. Initialize
tpath to prevent a stack buffer overflow.
Change-Id: I17895dc3576b080a0c35bd0528a5b83223ec1c1b
[ROCm/rocm_smi_lib commit: 4b65b0307f]
Include the upgrade operation check in the prerm and postun scripts
for rocm-smi-lib package.
Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ic3dee7ae50a2ac317f1aab88472b6d4805c4de90
[ROCm/rocm_smi_lib commit: 3a3b8dd25d]
The purpose of this patch is to hide 'One or more commands failed.'
from showing up, unless an appropriate log level has been set.
You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ifa309cd62596491a6ea5892e0752251f037fc0e9
[ROCm/rocm_smi_lib commit: 007f326c34]
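A Python sketch of gating the message on --loglevel with the standard argparse and logging modules. The default level ("critical"), the logger name, and the choice to log the message at WARNING are assumptions of this sketch, not values taken from rocm_smi.py:

```python
import argparse
import logging

LEVELS = ["debug", "info", "warning", "error", "critical"]


def make_logger(argv=None) -> logging.Logger:
    """Build a logger whose threshold comes from --loglevel."""
    parser = argparse.ArgumentParser()
    # Assumed default for this sketch: only critical messages unless asked otherwise.
    parser.add_argument("--loglevel", choices=LEVELS, default="critical")
    args = parser.parse_args(argv)
    logger = logging.getLogger("smi_sketch")
    logger.setLevel(getattr(logging, args.loglevel.upper()))
    if not logger.handlers:
        logger.addHandler(logging.StreamHandler())
    return logger


def report_failures(logger: logging.Logger, any_failed: bool) -> None:
    if any_failed:
        # Logged at WARNING (an assumption): shown only with --loglevel warning or lower.
        logger.warning("One or more commands failed.")
```

With this setup, running without --loglevel hides the message, while --loglevel warning (or debug/info) shows it.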
The address sanitizer build requires a build ID longer than 8 bytes.
Change-Id: I530fe87dffbf4c46f010bf8a1c2914f733678e9a
[ROCm/rocm_smi_lib commit: 3aab7b199e]
CMakeLists.txt does not set up the DEBUG macro correctly to mean
!NDEBUG, so, as a workaround, replace all uses of ifdef NDEBUG with
ifndef DEBUG in the library sources.
Change-Id: I408adb36d1a2310fb894a486574469662ebb27cd
(cherry picked from commit f430cd4f91)
[ROCm/rocm_smi_lib commit: 2804bf7c28]
pop_back() caused a segfault when the pp_dpm_pcie file was empty and reading it returned only whitespace.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I888f1f79751cd456e43751a5b96d08560a039677
[ROCm/rocm_smi_lib commit: ec71380e1c]
Add a check for RSMI_STATUS_NOT_SUPPORTED being returned by fanRead/fanReadWrite.
Fixes the reported issues SWDEV-314176 and SWDEV-314175.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Icf2cc541a3fa5ca4794aff5d6bc91104adc45e6d
[ROCm/rocm_smi_lib commit: 11a71c63b1]
The -Wl,--build-id option is added for address sanitizer build
Change-Id: I0d75bc8e6169010c460e62e51708828e75de478e
[ROCm/rocm_smi_lib commit: 7b69dde24f]
When building the release, strip the library file instead of the link.
Change-Id: Ib2d4cea614e8938bdb2be0fd74f046680158d256
[ROCm/rocm_smi_lib commit: 77502bed2a]
In C, 'bool' is not a built-in keyword before C23; from C99 onwards it is
provided by stdbool.h. Include stdbool.h so that C compilers accept it.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I09fd5cf6eac20e7185e85a1123bc4826958b2b7c
[ROCm/rocm_smi_lib commit: 8de6ed2b8d]
Remove the carriage return at the end of the line in the printLog function.
On Linux, end of line is encoded as \n alone, without \r.
Change-Id: If3835d773033b53a7f25b4a0284df359a6f9555d
[ROCm/rocm_smi_lib commit: 1aeb27c4c9]
The (temperature == nullptr) check happens only when the HBM temperature is retrieved.
This check needs to apply in the other cases as well, so move it outside the HBM condition.
The function then consistently returns RSMI_STATUS_INVALID_ARGS whenever nullptr is passed through rsmitst.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iea3cec75312a0a669c7da27e15e9782e6a885c5f
[ROCm/rocm_smi_lib commit: 432df20321]
For NAVI10 and newer ASICs, setting the display clock [DCEFCLK] is not supported and the sysfs entry is
read-only. As a result, the test falsely fails on these ASICs. ROCm SMI Lib is ASIC-independent,
so the display clock set test cannot be selectively disabled for these ASICs.
As a compromise, if the set (write to the sysfs entry) fails with a permission error and the euid is root,
assume that the set feature is not supported and skip the test.
Change-Id: I7a273878cbf1465b01728705323e8a92a42378dd
[ROCm/rocm_smi_lib commit: c6f695f5a9]
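A Python sketch of the skip rule. The real test lives in rsmitst (C++); here SkipTest is a hypothetical stand-in for a test framework's skip signal, and set_sysfs_value is a hypothetical helper:

```python
import os


class SkipTest(Exception):
    """Stand-in for a test framework's "skip" signal."""


def set_sysfs_value(path: str, value: str) -> None:
    """Write value to a sysfs-style file; skip if root is denied (read-only entry)."""
    try:
        with open(path, "w") as f:
            f.write(value)
    except PermissionError as exc:
        if os.geteuid() == 0:
            # Even root cannot write: assume the set feature is not supported.
            raise SkipTest(f"{path}: set not supported ({exc})") from exc
        # Non-root: a genuine permission problem, so report it as a failure.
        raise
```

Checking the euid matters because a non-root user is expected to get a permission error on any writable sysfs entry; only a denial to root suggests the entry is read-only.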
The driver fills memory with 0xFF for all metrics not supported by that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I86a38148c7a288ea0db94893f685560eaac098ab
[ROCm/rocm_smi_lib commit: 7b1daaef96]
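A Python sketch of the 0xFF check for a single metric field, assuming the field is an unsigned little-endian integer whose every byte is 0xFF when unsupported. read_metric is hypothetical, and the status names are returned as strings standing in for the rsmi_status_t values:

```python
def read_metric(raw: bytes):
    """Decode an unsigned little-endian metric field; return (status, value)."""
    if raw and all(b == 0xFF for b in raw):
        # Every byte is 0xFF: the driver's marker for "not supported on this ASIC".
        return "RSMI_STATUS_NOT_SUPPORTED", None
    return "RSMI_STATUS_SUCCESS", int.from_bytes(raw, "little")
```

Checking every byte, rather than comparing against a single 0xFF value, makes the same test work for 8-, 16-, 32- and 64-bit fields.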
The driver fills memory with 0xFF for all metrics not supported by that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iacb6474486e3732f2aa824ff447c17f8243b65cd
[ROCm/rocm_smi_lib commit: f61cb1b41d]