Commit Graph

409 Commits

Author SHA1 Message Date
Bill(Shuzhou) Liu 0c91ef919d Restructure the folder
Move rocm_smi-related functions to the rocm_smi folder. Move amd_smi to
the top-level include/ and src/ folders. Remove the obsolete oam folder.
Update CMakeLists.txt to reflect the new folder locations.

Change-Id: I52e6be739e49f3b0545865f25364787f5985e9c3
2022-10-20 09:23:51 -05:00
Bill(Shuzhou) Liu 1ec3a2182e Support rocm-smi related device information
Add a few fields to board_info and asic_info to hold rocm-smi
device information.

Implement the rocm-smi-related firmware block in amdsmi_get_fw_info().

Change-Id: I825d3e5c7feaa07a6e05386d4f1a59ebf528dfc0
2022-10-20 09:23:41 -05:00
Bill(Shuzhou) Liu f1d02aca79 Port rocm-smi function to amd-smi
Port most rocm-smi functions to amd-smi and add unit tests.

Change-Id: I6387a4bdaf20ead2389c99bb01d438156ccd0747
2022-09-06 12:08:59 -04:00
Bill(Shuzhou) Liu 86017b799c Port more rocm-smi function to amd-smi
This covers the API support functions, performance counters, process
information, topology, and XGMI information.

Change-Id: I3350ec75fdd2ca1438e79134582ae83c49763056
2022-08-24 12:49:27 -05:00
Bill(Shuzhou) Liu 7b92c694a0 Support events in the amdsmi
Port event handling from rocm-smi to amd-smi.

Change-Id: I0b4cb30a585cb2188a24be0e21c1c156b461bb1d
2022-08-23 16:49:56 -04:00
Bill(Shuzhou) Liu 98df483bef Add unit test support
Add a gtest-based unit test framework. Implement the fan read/write functions.

Change-Id: I83375c24b99d24d01d12bccda863a38f75f5987f
2022-08-05 09:55:34 -04:00
Alexsandar Nedeljkovic 61289339d8 Update amdsmi header to include GpuvSMI related APIs and definitions
Signed-off-by: Alexsandar Nedeljkovic <alexsandar.nedeljkovic@amd.com>
Change-Id: Iff46d724f35b52028b67ce272f800fcf820c96ac
2022-07-22 16:06:20 +02:00
Bill(Shuzhou) Liu 5ba371f285 Load libdrm at run time
Remove the compile-time dependency on libdrm; load it at run
time instead.

Add the headers missing from smi-lib.

Change-Id: Ie1ecf293b51425b6a61c502d11a42809dc099f70
2022-06-28 14:48:59 -04:00
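A minimal C++ sketch of loading a library at run time instead of linking it, as the "Load libdrm at run time" commit describes. The helper name `load_symbol` is hypothetical and is not the smi-lib code; `dlopen`, `dlsym`, `dlerror`, and `dlclose` are the standard POSIX calls, and the library name `libdrm.so.2` in the usage note is an assumption about the installed soname.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper: open a shared library at run time and resolve one
// symbol. On failure it returns nullptr and leaves no handle open; on
// success the caller owns *handle_out and must dlclose() it later.
void* load_symbol(const char* lib_name, const char* sym_name, void** handle_out) {
  void* handle = dlopen(lib_name, RTLD_LAZY);
  if (handle == nullptr) {
    // Library not installed: the caller can degrade gracefully instead of
    // failing at program start, which is the point of run-time loading.
    std::fprintf(stderr, "dlopen(%s) failed: %s\n", lib_name, dlerror());
    return nullptr;
  }
  dlerror();  // clear any stale error before dlsym
  void* sym = dlsym(handle, sym_name);
  if (sym == nullptr) {
    dlclose(handle);
    return nullptr;
  }
  *handle_out = handle;
  return sym;
}
```

Usage would look like `void* h = nullptr; void* fn = load_symbol("libdrm.so.2", "drmGetVersion", &h);`, followed by a cast of `fn` to the matching function-pointer type. The trade-off is that a missing libdrm becomes a run-time condition the library must check, rather than a build or load failure.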
Bill(Shuzhou) Liu 91ad08aa65 The init version of amd_smi
The initial version includes the amd_smi.h header, an example that uses
amd_smi, the folder structure, and CMake files.

Add support for libdrm.

Change-Id: I779e55e4cf9491c61dc226a30d24e96be9bc6016
2022-06-14 09:14:24 -04:00
Elena Sakhnovitch 44ea49eb01 [rocm_smi.py]: shownodesbw fix for non xgmi
Improve the error output for bandwidth of non-XGMI nodes.

Signed-off-by: Elena Sakhnovitch
Change-Id: I833970d3200a75c7639d33bf19e0e83afe176c8d
2022-05-24 16:45:32 -04:00
Ori Messinger 786f66671a ROCm SMI CLI: Fix --showvoltagerange bug
This patch fixes a --showvoltagerange bug in which the CLI attempts to
check the voltage curve on a device that has no voltage regions
in its OverDrive voltage frequency data (odvf).

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I647c30c978ffb13f6819ac3d069ee340710a7f99
2022-05-21 05:02:15 -04:00
Ori Messinger 4298cbb400 ROCm SMI CLI: Fix setPowerOverdrive resetPowerOverdrive Bugs
Fixes a bug in the 'setPowerOverdrive' function, which mishandled
GPUs with secondary dies. Secondary dies have a default power cap
of 0W that cannot be changed, so they are now skipped.

Fixes a bug in the 'resetPowerOverdrive' function, which incorrectly
reset the wattage to the current value.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I483fa3f58b1fa44a3bf7bae3b52c59ce523ae152
2022-05-21 05:01:32 -04:00
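The secondary-die rule in the setPowerOverdrive fix can be sketched as a filter over devices. This is a hypothetical C++ illustration, not the rocm_smi.py code: the `DeviceCap` struct and the microwatt unit are assumptions made for the example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-device record; not the rocm_smi.py data model.
struct DeviceCap {
  int index;
  uint64_t default_cap_uw;  // default power cap, assumed to be in microwatts
};

// Returns the indices of devices whose power cap may be changed.
// Secondary dies report a default cap of 0 W, which cannot be changed,
// so they are skipped instead of being treated as an error.
std::vector<int> adjustable_devices(const std::vector<DeviceCap>& devs) {
  std::vector<int> out;
  for (const auto& d : devs) {
    if (d.default_cap_uw == 0) continue;  // secondary die: cap fixed at 0 W
    out.push_back(d.index);
  }
  return out;
}
```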
Divya Shikre b23cfc0e82 Fix mem leaks observed while running rsmitst
1. Memory allocated for the handle was not deleted
   when no variant, subvariant, or supported function
   was found.
2. handle->func_id_iter was set to 0 before delete[],
   so the array it pointed to was never freed.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iab50fdfbe03eec8e6fd0e84e03bd2c47e645b3d8
2022-05-18 14:31:44 -04:00
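Both leaks above come down to cleanup order and early exits. The C++ sketch below shows the corrected pattern on a hypothetical `FuncIterHandle` type; the real rsmi iterator handle has more fields, and `make_handle`/`destroy_handle` are illustrative names only.

```cpp
#include <cstddef>

// Hypothetical iterator handle; the real rsmi structure has more fields.
struct FuncIterHandle {
  int* func_id_iter = nullptr;
};

// Correct teardown order: free the array first, then clear the pointer.
// Clearing func_id_iter *before* delete[] turns delete[] into a no-op
// on a null pointer and leaks the array.
void destroy_handle(FuncIterHandle* handle) {
  if (handle == nullptr) return;
  delete[] handle->func_id_iter;
  handle->func_id_iter = nullptr;
  delete handle;
}

// Early-exit path: when nothing is found, release the handle that was
// already allocated instead of returning without freeing it.
FuncIterHandle* make_handle(std::size_t n_funcs) {
  auto* handle = new FuncIterHandle;
  if (n_funcs == 0) {
    destroy_handle(handle);
    return nullptr;
  }
  handle->func_id_iter = new int[n_funcs];
  return handle;
}
```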
Divya Shikre afe996c2ed Update get_frequencies to handle failures.
Show an optional debug log (RSMI_DEBUG_BITFIELD=2) to
the user in the following scenarios:
1. More than one current frequency is found.
2. Frequencies are not read in increasing order of
   their value.
If the current frequency is not available, its index is
set to -1 and no value is marked with * in the
output. This case will also be handled in rocm_smi.py.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I477ec065f7513c8045d6392f12ef6cb835a6b8f6
2022-05-11 15:33:15 -04:00
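The checks in get_frequencies can be sketched as a small parser over lines in the sysfs style `1: 800Mhz *`. This is a hypothetical C++ illustration: `parse_freqs`, `FreqList`, and the `debug` flag standing in for RSMI_DEBUG_BITFIELD are assumptions, and keeping the first `*` when several appear is a choice of this sketch, not necessarily what the library does.

```cpp
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

struct FreqList {
  std::vector<uint64_t> values_mhz;
  int current = -1;  // stays -1 when no line is marked with '*'
};

// Hypothetical parser for lines such as "1: 800Mhz *".
FreqList parse_freqs(const std::vector<std::string>& lines, bool debug) {
  FreqList out;
  for (const auto& line : lines) {
    std::istringstream ss(line);
    int idx = 0;
    char colon = 0;
    uint64_t value = 0;
    std::string unit, star;
    if (!(ss >> idx >> colon >> value)) continue;  // skip malformed lines
    ss >> unit >> star;                           // "Mhz", then optional "*"
    if (debug && !out.values_mhz.empty() && value <= out.values_mhz.back())
      std::cerr << "frequencies are not in increasing order\n";
    if (star == "*") {
      if (out.current == -1)
        out.current = static_cast<int>(out.values_mhz.size());
      else if (debug)
        std::cerr << "more than one current frequency\n";
    }
    out.values_mhz.push_back(value);
  }
  return out;
}
```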
Divya Shikre 99be3451d7 Add DEBUG_LOG macro
Add a DEBUG_LOG macro that optionally prints an error
message when RSMI_DEBUG_BITFIELD is set to 2.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6017e92d8a9e5f9861ae29ece0488d4bc198f996
2022-05-11 11:03:24 -04:00
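One way such a macro could be written is sketched below in C++. It is hypothetical: the helper `debug_log_enabled` is invented for the example, and treating RSMI_DEBUG_BITFIELD as a bitfield in which bit 1 (the value 2) enables the messages is an assumption about how the variable is parsed.

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical: read RSMI_DEBUG_BITFIELD as a decimal bitfield and treat
// bit 1 (value 2) as "print debug messages". The real library may parse
// the variable differently.
inline bool debug_log_enabled() {
  const char* env = std::getenv("RSMI_DEBUG_BITFIELD");
  if (env == nullptr) return false;
  long bits = std::strtol(env, nullptr, 10);
  return (bits & 2) != 0;
}

// do { } while (0) keeps the macro safe inside unbraced if/else.
#define DEBUG_LOG(msg)                                   \
  do {                                                   \
    if (debug_log_enabled()) std::cerr << (msg) << '\n'; \
  } while (0)
```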
Divya Shikre c9b42bff57 Add RSMI_CLK_TYPE_PCIE to rsmi_clk_type_t
showclocks/showclkfrq do not display pp_dpm_pcie values
under SR-IOV. This fix adds the PCIe clock to rsmi_clk_type_t,
alongside the other clock types.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6d129ae412623b369c14456ae9781b2dbceb2139
2022-05-06 09:15:39 -04:00
Ori Messinger 9d6403bb17 ROCm SMI LIB: Add Missing GPU Blocks
This patch adds the following 4 missing GPU blocks to the SMI LIB:
-RSMI_GPU_BLOCK_MMHUB
-RSMI_GPU_BLOCK_PCIE_BIF
-RSMI_GPU_BLOCK_HDP
-RSMI_GPU_BLOCK_XGMI_WAFL

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia1ec6f53e195f4bf7b8f073d6bed4fdb6572e546
2022-05-05 00:44:16 -04:00
Elena Sakhnovitch be66d67ef2 Revert "rocm_smi.py: Don't try to print absent clock files"
This reverts commit b931380f02.
The DRM device ID does not always match the GPU ID in rocm_smi.py. This leads to cases where the wrong device is checked by os.path.isfile().

Change-Id: Ib6f2b9be123b7eb64334d3feec57f63d7eb37d6f
2022-05-03 16:42:42 -04:00
Elena Sakhnovitch 9d7fd34d2b [rocm_smi.py] Hide unsupported clocks under debug
Signed-off-by: Elena Sakhnovitch <elena.sakhnovitch@amd.com>
Change-Id: I1f2c7b93d9a81f2735c76e8d441f9e298288f5c0
2022-05-03 16:38:22 -04:00
Bill(Shuzhou) Liu 9f6614e83b Sanity check amdgpu module is loaded in rocm_smi.py
Instead of checking /proc/modules for amdgpu, the code now checks
/sys/module/amdgpu/initstate, which also covers the case where the
driver is compiled into the kernel.

Change-Id: Id39ec5b0eb9b68204bc9f5f779057ba8cc090bdc
2022-04-14 11:28:38 -04:00
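The check above is implemented in rocm_smi.py. For consistency with the other examples, the same idea is sketched here in C++. The function name is hypothetical, the path is a parameter so the logic can be exercised without sysfs, and the sketch assumes the file reads `live` once the driver is ready.

```cpp
#include <fstream>
#include <string>

// Hypothetical module check: read the initstate file and compare it with
// "live". A missing or unreadable file is treated as "driver not loaded".
bool amdgpu_initialized(const std::string& path = "/sys/module/amdgpu/initstate") {
  std::ifstream f(path);
  if (!f) return false;
  std::string state;
  f >> state;  // the file holds a single word followed by a newline
  return state == "live";
}
```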
Bill(Shuzhou) Liu 7860de5107 Suppress "rsmi_init() failed" error message
When an application calls the library on a system without amdgpu,
the library may print "rsmi_init() failed" every time. Suppress this
error message in the library.

Change-Id: Ice63dd3a764b221a6935536bff1bfa6aa3e51a46
2022-04-12 09:44:00 -04:00
Ori Messinger e800cbf161 ROCm SMI CLI: Fix formatCsv Bug
Fixes a bug in the 'formatCsv' function, which mishandled JSON
data conversion for 'system' data types.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I705060409bf5ae75b994ffda270843065ca12321
2022-04-07 19:33:46 -04:00
Bill(Shuzhou) Liu 9f814e150e Correct the __pycache__ folder
Remove the __pycache__ folder from libexec/rocm_smi.

Change-Id: I0ad505ff7e7368d5fe86e1eee12080039edc7111
2022-03-24 09:44:33 -04:00
Bill(Shuzhou) Liu c37d4bac8f Remove python pyc file when uninstall
Remove Python .pyc files on uninstall.

Change-Id: I383faf8fcfaeeb346c9ee38c1aad8577a460281e
2022-03-23 13:39:57 -04:00
Ranjith Ramakrishnan 869670866d Remove rocm_smi/bin folder and prefix name correction in pragma message
The /opt/rocm/rocm_smi/bin folder was added by mistake as part of the file reorg; remove it.
File reorg commit: f1da5591b58e7c5f09ac3aa88aef85257b87478d
The pragma message for the oam header files showed the prefix rocm_smi; change it to oam.

Change-Id: I74b3c1d2bd7e0ff0eee5738c1658063bc855066c
2022-03-17 18:16:10 -07:00
Kent Russell 85571318e2 README: Remove restrictive licensing language
Also update copyright years

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ic9ead543c4937680afc1957623c4d5fcbfbd58b0
2022-03-16 13:52:25 -04:00
Sreekant Somasekharan dbe3403bd3 make string variable 'tpath' an empty string.
If the string variable is not empty, it can lead to incorrect
compilation and corrupted output.

Change-Id: Ie66756c28aef7417759c29387500970a8b53e44c
2022-03-11 21:22:28 -05:00
Bill(Shuzhou) Liu 8ce9289bc2 Upgrade GoogleTest to v1.11.0
The old GoogleTest version fails to compile on CentOS 9. Upgrade it
to the latest version.

Change-Id: I6bbe6afdfad6422a210f258880ddc87a9f088d76
2022-03-09 15:18:43 -05:00
Sreekant Somasekharan e6ae697e9c Add blacklist filter 'virtualization' for rsmi tests failing in SRIOV
Change-Id: Ibbaef092482c0b78ecd86a29f0b9b4331b51abe2
2022-03-04 22:13:44 -05:00
Elena Sakhnovitch a3317714cb [rocm_smi.py] resetPowerOverdrive fix
resetPowerOverdrive: improve output messages.

Signed-off-by: Elena Sakhnovitch
Change-Id: Ic5b9084f0637458c36e460231f2d3622b0a23aa6
2022-03-04 11:26:45 -05:00
Ranjith Ramakrishnan f1da5591b5 File reorganization with backward compatibility
Wrapper header files
Soft link to libraries and binaries
rocm_smi.py and rsmiBindings.py installed in libexec/rocm_smi
Binaries, libraries and header files installed as per File Reorg folder structure

Change-Id: I3166ab67f89c2ae4aafbc87bb00c9a5233221ade
2022-03-03 18:48:52 -05:00
Bill(Shuzhou) Liu 4b65b0307f Prevent stack buffer overflow
readlink() does not append a null byte to the buffer. Initialize
tpath to prevent a stack buffer overflow.

Change-Id: I17895dc3576b080a0c35bd0528a5b83223ec1c1b
2022-03-03 15:43:53 -05:00
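A minimal C++ sketch of the safe readlink() pattern this commit and the later 'tpath' commit address. `readlink` is the POSIX call; the wrapper name `read_link` and the 256-byte buffer are assumptions for illustration. The buffer is zero-initialized, one byte is reserved, and the result is terminated explicitly at the returned length.

```cpp
#include <string>
#include <unistd.h>

// readlink() returns the number of bytes written and does NOT write a
// terminating '\0', so the caller must terminate the buffer itself.
std::string read_link(const std::string& path) {
  char tpath[256] = {};  // zero-filled: already terminated if nothing is written
  ssize_t len = readlink(path.c_str(), tpath, sizeof(tpath) - 1);  // keep 1 byte spare
  if (len < 0) return std::string();  // error: missing path or not a symlink
  tpath[len] = '\0';                  // explicit termination at the real length
  return std::string(tpath, static_cast<std::size_t>(len));
}
```

Reserving the last byte means a target longer than 255 bytes is silently truncated; a caller that cares can treat `len == sizeof(tpath) - 1` as a possible truncation.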
Saravanan Solaiyappan 3a3b8dd25d Consider apt/yum upgrade operation check in package scripts.
Include the upgrade operation check in the prerm and postun scripts
for the rocm-smi-lib package.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ic3dee7ae50a2ac317f1aab88472b6d4805c4de90
2022-02-24 10:11:32 -05:00
Elena Sakhnovitch 9b871fcd9f [rocm_smi.py]: fix input error type for --setclock
Signed-off-by: Elena Sakhnovitch
Change-Id: I9626978780f360c591fb8908f5b759f2289dff0b
2022-02-22 14:24:38 -05:00
Freddy Paul d0545854dd rocm-smi:Fix cmake target files to reflect correct location
Change-Id: I86fda8447609c42e0f0615abd837b53ca5fbe717
2022-02-18 09:53:43 -08:00
Ori Messinger 007f326c34 ROCm SMI CLI: Hide Failed Command Warning
This patch hides the 'One or more commands failed.' message
unless an appropriate log level has been set.

You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ifa309cd62596491a6ea5892e0752251f037fc0e9
2022-02-09 11:52:33 -05:00
Bill(Shuzhou) Liu 3aab7b199e Link the library using sha1 build-id
The address sanitizer build requires a build ID longer than 8 bytes.

Change-Id: I530fe87dffbf4c46f010bf8a1c2914f733678e9a
2022-02-02 17:04:11 -05:00
Divya Shikre 8c4635acea Temporary blacklist TestPerfLevelReadWrite for navi21
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iee2146170b6828fe4fe2846c3ebfd57f95734f34
2022-01-27 22:56:37 -05:00
Laurent Morichetti 2804bf7c28 Don't use NDEBUG when the intent is !DEBUG
CMakeLists.txt does not set up the DEBUG macro correctly to mean
!NDEBUG, so, as a workaround, replace all uses of ifdef NDEBUG with
ifndef DEBUG in the library sources.

Change-Id: I408adb36d1a2310fb894a486574469662ebb27cd
(cherry picked from commit 9f87197d8d)
2022-01-27 11:08:48 -05:00
Divya Shikre ec71380e1c Add fix to check for vector size while reading pp_dpm_pcie
pop_back() caused a segfault when the pp_dpm_pcie file was empty and returned only whitespace.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I888f1f79751cd456e43751a5b96d08560a039677
2022-01-26 10:34:57 -05:00
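Calling `pop_back()` on an empty `std::vector` is undefined behavior, which matches the crash described above. The C++ sketch below guards the call with `empty()`. The function `read_lines` and the rule of dropping a trailing blank entry are hypothetical stand-ins for the library's actual parsing.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: split sysfs file contents into lines and drop a trailing
// blank entry. The empty() check prevents pop_back() on an empty vector
// (undefined behavior) when the file is empty.
std::vector<std::string> read_lines(const std::string& contents) {
  std::vector<std::string> lines;
  std::istringstream ss(contents);
  std::string line;
  while (std::getline(ss, line)) lines.push_back(line);
  if (!lines.empty() &&
      lines.back().find_first_not_of(" \t") == std::string::npos) {
    lines.pop_back();  // safe: at least one element is present
  }
  return lines;
}
```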
Bill(Shuzhou) Liu ce9cfa584f Add rpm License header
Add rpm License header for cpack

Change-Id: I2f4a89015b6389cfde801f41d4f6e0f59e7087aa
2022-01-20 13:30:40 -05:00
Divya Shikre 11a71c63b1 Don't assert when fan is not supported.
Add a check for RSMI_STATUS_NOT_SUPPORTED returned by fanRead/fanReadWrite.
Fixes the reported SWDEV-314176 and SWDEV-314175.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Icf2cc541a3fa5ca4794aff5d6bc91104adc45e6d
2022-01-20 12:29:12 -05:00
Bill(Shuzhou) Liu 3356084074 Add license file to smi-lib package
Install LICENSE.txt to share/doc/smi-lib

Change-Id: Idcbb70db8808111203e8e4a4c3ab4d1e070ac79d
2022-01-19 12:15:31 -05:00
Sreekant Somasekharan cf2f0b0508 Print ASD firmware version in hex instead of decimal format
Change-Id: Idf113f63b79f2d2903ae795d272d232a43680516
2022-01-18 10:44:20 -05:00
Bill(Shuzhou) Liu 7b69dde24f Enable the linker build id generation for address sanitizer build
Add the -Wl,--build-id option to the address sanitizer build.

Change-Id: I0d75bc8e6169010c460e62e51708828e75de478e
2022-01-17 09:06:34 -05:00
Bill(Shuzhou) Liu 77502bed2a strip the library instead of link when build release
When building a release, strip the library file instead of the link.

Change-Id: Ib2d4cea614e8938bdb2be0fd74f046680158d256
2022-01-14 10:39:15 -05:00
Harish Kasiviswanathan 8de6ed2b8d rocm_smi_lib: add stdbool.h needed for C90
'bool' keyword is supported only from C99 onwards. Include stdbool.h
for older compilers

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I09fd5cf6eac20e7185e85a1123bc4826958b2b7c
2021-12-14 15:25:59 -05:00
Elena Sakhnovitch 1aeb27c4c9 [rocm_smi.py] remove \r symbol at print
Remove the carriage return at the end of the line in the printLog function.
On Linux, end of line is encoded as \n, not \n\r.

Change-Id: If3835d773033b53a7f25b4a0284df359a6f9555d
2021-12-08 10:13:56 -05:00
Divya Shikre 432df20321 Add null ptr check for temperature read from all sensors.
Previously, the (temperature == nullptr) check ran only when the HBM temperature was retrieved.
The check must apply to the other sensors as well, so move it outside the HBM condition.
The function should now return RSMI_STATUS_INVALID_ARGS consistently whenever rsmitst passes a nullptr.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iea3cec75312a0a669c7da27e15e9782e6a885c5f
2021-12-01 14:05:46 -05:00
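The point of this fix is where the argument check sits. The C++ sketch below hoists it above the sensor-specific branch. The `Status` and `Sensor` enums and `read_temperature` are hypothetical stand-ins for the rsmi types, and the sensor reads themselves are stubbed out.

```cpp
#include <cstdint>

// Hypothetical status codes and sensor enum, for illustration only.
enum class Status { kSuccess, kInvalidArgs };
enum class Sensor { kEdge, kJunction, kMemory, kHbm0 };

// The nullptr check runs before any sensor-specific branch, so every
// sensor type, not only HBM, rejects a null output pointer the same way.
Status read_temperature(Sensor sensor, int64_t* temperature) {
  if (temperature == nullptr) return Status::kInvalidArgs;
  if (sensor == Sensor::kHbm0) {
    *temperature = 0;  // stub: an HBM-specific read would go here
  } else {
    *temperature = 0;  // stub: the generic sensor read would go here
  }
  return Status::kSuccess;
}
```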
Divya Shikre b4fd9c0d94 Update temp_read rsmitst.
Check for RSMI_STATUS_INVALID_ARGS when invalid args are passed.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I0d5ff84aee5cce4214026ddcd860a17ae3e43147
2021-11-29 18:09:45 -05:00