Commit graph

357 Commits

Author SHA1 Message Date
Ranjith Ramakrishnan e6f3945503 SWDEV-366823 - Change pragma message to warning
File reorganization feature was implemented with backward compatibility
The backward compatibility support will be deprecated in future release.
Changed the #pragma message to #warning for a smooth transition

Change-Id: I281ad17949435fee4b508a2a7e112b6fa3365838


[ROCm/rocm_smi_lib commit: e7ed902fd6]
2022-11-21 01:08:12 -08:00
Sreekant Somasekharan 82331af987 [rocm_smi_kfd.cc] Handle return value from ReadSysfsStr function.
Return value from ReadSysfsStr function that reads cu_occupancy file
was not handled correctly. Modified the script to handle any fail conditions.

Change-Id: I3c71e0f6f288f196ed1f833e8709255c2b6e78ee


[ROCm/rocm_smi_lib commit: e9e3ba541e]
2022-10-31 12:20:06 -04:00
Ranjith Ramakrishnan 4f08cdd1ea SWDEV-345870 - Correct install interface for new directory layout
Install interface should provide /opt/rocm-ver/include as the include path
Path /opt/rocm-ver/rocm_smi/include should be used only as part of backward compatibility support

Change-Id: Idc1f663069356c6b1fbd492f45ef4637fc90e4eb


[ROCm/rocm_smi_lib commit: 9a650b1378]
2022-09-13 10:48:21 -07:00
Alex Sierra 43d4d2c55c Consider invalid peer link type during topology report
Invalid peer links are labeled as N/A during topology report creation.
This invalid link type could be triggered by having a configuration
with CPU XGMI iolinks and disable XGMI peer to peer access. This can
be done by setting the driver parameter 'use_xgmi_p2p = 0'.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: Ifb09a8f3266a3f07686615dfb45781d6cfe55e83


[ROCm/rocm_smi_lib commit: 03fab6b2b6]
2022-09-06 13:47:32 -05:00
Alex Sierra f4bb38e6ef Avoid reporting PCIe peer devices with CPU XGMI iolinks
Devices with CPU XGMI iolink do not support PCIe peer access. Therefore,
they should not be reported as accessible links in the topology.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I3ee51796945dc0966200dee03886510e8f1846b7


[ROCm/rocm_smi_lib commit: 4658630d8d]
2022-09-02 09:18:30 -05:00
Ori Messinger d415a3b2e1 ROCm SMI CLI: Modify Column Header
The purpose of this patch is to modify the column header of the default
'./rocm-smi' command from 'Temp' to 'Temp (DieEdge)' for clarity.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I127a9214be97a1185c3db010f1c9176d1f412ec9


[ROCm/rocm_smi_lib commit: dfd88b593f]
2022-08-31 09:47:14 -04:00
Elena Sakhnovitch 827344a3e8 [rocm_smi.py] bugfix for non-alphanum parse issue
--showdeviceid
Fix for false-positive "FRU is corrupted" messages,
since the str(sn).isalnum() check also triggers on an empty struct.

--showproductname
Fix script termination on non-alphanum product name

Change-Id: I78d4998e156f9b0d9f45338bed2a0d30b789e220


[ROCm/rocm_smi_lib commit: 8b2bc318eb]
2022-08-23 19:28:19 -04:00
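The false positive described in this commit follows from a Python detail: `str.isalnum()` returns False for an empty string, so an empty serial number looks the same as a corrupted one unless it is checked first. A minimal sketch of that distinction, assuming a hypothetical helper `classify_serial` (not a function from rocm_smi.py):

```python
def classify_serial(serial):
    """Classify a serial number as 'empty', 'ok', or 'corrupted'.

    Hypothetical helper for illustration only; rocm_smi.py may
    structure this check differently.
    """
    text = str(serial)
    if text == "":
        # "".isalnum() is False, so handle the empty case explicitly
        # instead of reporting it as corruption.
        return "empty"
    if text.isalnum():
        return "ok"
    return "corrupted"
```

With this ordering, only a non-empty value containing non-alphanumeric characters is reported as corrupted.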
Galantsev, Dmitrii 26ad77dace Remove python pyc file before uninstall
I6520b51aac34060b5b90f94a016cec1827a4973f happens after uninstall, which
leaves a dangling directory under /opt/rocm/libexec/rocm_smi.
Removing __pycache__ before uninstall fixes the issue.

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I695bd085d4a43b678b563b4c35f6d2e8ddfa7d7c


[ROCm/rocm_smi_lib commit: cd11d7530b]
2022-08-11 19:55:14 -05:00
Ranjith Ramakrishnan 37482beadc Remove the default setting of cmake install libdir from source code
Any default value if required should be controlled from outside.
For ROCM, build script is setting the value to "lib"

Change-Id: I12a2951307fe64e46a4e608476bfceb678bdc97d


[ROCm/rocm_smi_lib commit: c5159fa6d1]
2022-07-28 13:55:55 -04:00
Divya Shikre f1154d2599 Add perf determinism to perf_level_string
This fixes the 'unknown' value being displayed
for Perf Level because of a missing mapping of
RSMI_DEV_PERF_LEVEL_DETERMINISM to its string
value.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I479c2baea450f0ff61640ad81cbd4d08ad56ff8e


[ROCm/rocm_smi_lib commit: 8144dd4d8e]
2022-07-21 08:55:38 -04:00
Ori Messinger 6f372e2e7a ROCm SMI CLI: Force RETCODE to 0 by Default
The purpose of this patch is to set RETCODE equal to 0 by default
unless an appropriate '--loglevel LEVEL' has been set.

To allow a non-zero RETCODE value, you must use any loglevel that
is not 'warning' or 'None' (default).

You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I9484a750206a3f464c59952304e72c59c3d12465


[ROCm/rocm_smi_lib commit: cbb068ccac]
2022-07-18 18:33:29 -04:00
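The rule in this commit message can be sketched as a small function. This is an illustration under stated assumptions, not the code in rocm_smi.py: the name `effective_retcode` and its parameters are hypothetical.

```python
def effective_retcode(retcode, loglevel=None):
    """Return the exit code reported under the rule described above.

    Hypothetical helper: a non-zero code is passed through only when a
    loglevel other than 'warning' or None (the default) was requested.
    """
    if loglevel is None or loglevel.lower() in ("warning", "none"):
        return 0
    return retcode
```

For example, `effective_retcode(1)` and `effective_retcode(1, "warning")` both yield 0, while `effective_retcode(1, "error")` yields 1.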
Sreekant Somasekharan 36521ce4be Fix documentation mistake related to get memory overdrive function.
Changes made on rsmi_perf_determinism_mode_set function documentation
as well for styling consistency.

Change-Id: I09ce8139eb9cbda94352ac7725c4c9b9bb06bd59


[ROCm/rocm_smi_lib commit: aa5cba122c]
2022-06-30 08:57:52 -04:00
Elena Sakhnovitch fd81567c79 rocm_smi.py: improve error output
Match alignment of error output with general output

signed-off-by Elena Sakhnovitch

Change-Id: Id4334152f4ad5665ff37d5d47e6f7ca0107a9428


[ROCm/rocm_smi_lib commit: 5d5ba738db]
2022-06-24 12:19:43 -04:00
Sreekant Somasekharan 37136ee50e Add rsmi lib function to get memory overdrive value
Change-Id: I515b51d5ce4baf966bb31714886a0d72330026bc


[ROCm/rocm_smi_lib commit: 1432e5e040]
2022-06-23 11:42:50 -04:00
Elena Sakhnovitch 19bbfffbfc [rocm_smi.py] Hiding unnecessary N/A lines
Hiding not applicable/unsupported sensors under INFO

Signed-off-by: Elena Sakhnovitch
Change-Id: I89c80ca7c6365ef3a2dd751a575ddf90044c8a2e


[ROCm/rocm_smi_lib commit: 0f88f59ddd]
2022-06-23 11:02:13 -04:00
Kent Russell 140656e176 rocm_smi.py: Handle corrupted serial number
If the FRU has been corrupted, then the serial number will come in with
any manner of random bytes, which will cause decode() to fail
spectacularly. Check that the serial returned by the kernel is
alphanumeric, and print to the error log if not (then continue to the
next device).

Change-Id: If4f35b140b6089e02729b1490ed6b48d614a122a


[ROCm/rocm_smi_lib commit: 6b6e840337]
2022-06-16 17:29:08 -04:00
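The failure mode in this commit is that arbitrary bytes are not guaranteed to be valid text, so `bytes.decode()` can raise `UnicodeDecodeError`. A minimal sketch of a defensive decode, assuming a hypothetical helper `decode_serial` and UTF-8 as the encoding (the actual script may use a different one):

```python
def decode_serial(raw):
    """Decode a serial number from bytes, or return None if unusable.

    Hypothetical helper for illustration; the UTF-8 encoding is an
    assumption, not taken from rocm_smi.py.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Random bytes from a corrupted FRU are not valid text.
        return None
    # Reject anything that decodes but is not strictly alphanumeric.
    return text if text.isalnum() else None
```

A caller could log an error and continue to the next device whenever None is returned.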
Elena Sakhnovitch f4cac3e4ef [rocm_smi.py] error feedback improvement
Clean up the overly verbose error reporting system.

Signed-off-by: Elena Sakhnovitch
Signed-off-by: Sreekant Somasekharan
Change-Id: Icc96086810b8dcfc426848b8c349a2572026c3bd


[ROCm/rocm_smi_lib commit: 4dd2398f3d]
2022-06-16 14:32:13 -04:00
Ranjith Ramakrishnan 24b9610d3a SWDEV-321112 - Use GNUInstallDirs
Use GNUInstallDirs variables to determine the location of LIBDIR, BINDIR, INCLUDEDIR, DOCDIR

Note that CMAKE_INSTALL_LIBDIR is overridden, since the default for RHEL
is lib64, but ROCm packaging wants it to be lib always. Distros or users
can easily override this.

Change-Id: I616152ccd2bc1f5a60bffa940312b38ca6e88c04


[ROCm/rocm_smi_lib commit: b72c464ac0]
2022-06-16 13:22:49 -04:00
Ori Messinger d97ddd9e67 ROCm SMI CLI: Fix setClockRange Error
This patch changes the error handling for setClockRange.

When a device does not support modifying a clock type (sclk/mclk),
an error message is printed through the python CLI.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I37d9ea4189b1ca81e5deaab5efa6cfa4901b89b3


[ROCm/rocm_smi_lib commit: 2b8d0ad70f]
2022-06-15 15:47:51 -04:00
Bill(Shuzhou) Liu c8fbb50d8e Remove python pyc file when uninstall rpm
Remove python pyc file when uninstall rpm.

Change-Id: I6520b51aac34060b5b90f94a016cec1827a4973f


[ROCm/rocm_smi_lib commit: 42f11bdd63]
2022-06-09 09:00:38 -04:00
Divya Shikre 100e331812 Print log when PIDs don't use any GPU device.
showpidgpus prints 'none' when no GPU devices are
being used by the running process. Adding a fix
to print a relevant message.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I165a6644a76c8e1c3c3cad676dcfd41eb1c4724f


[ROCm/rocm_smi_lib commit: dcab886394]
2022-05-31 16:17:42 -04:00
Elena Sakhnovitch b2ac46009b [rocm_smi.py]: shownodesbw fix for non xgmi
Improve error output for non-xgmi nodes bandwidth

signed-off-by: Elena Sakhnovitch
Change-Id: I833970d3200a75c7639d33bf19e0e83afe176c8d


[ROCm/rocm_smi_lib commit: 44ea49eb01]
2022-05-24 16:45:32 -04:00
Ori Messinger 750c640171 ROCm SMI CLI: Fix --showvoltagerange bug
This patch fixes a --showvoltagerange bug, which attempts to check
the voltage curve on a device that does not have any voltage
regions in its OverDrive voltage frequency data (odvf).

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I647c30c978ffb13f6819ac3d069ee340710a7f99


[ROCm/rocm_smi_lib commit: 786f66671a]
2022-05-21 05:02:15 -04:00
Ori Messinger 2459274e03 ROCm SMI CLI: Fix setPowerOverdrive resetPowerOverdrive Bugs
Fixes bug in the 'setPowerOverdrive' function which mishandles
GPUs with secondary dies. Secondary dies have a default power cap
of 0W and cannot be changed, so they are now skipped.

Fixes bug in the 'resetPowerOverdrive' function which incorrectly
resets the wattage to the current value.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I483fa3f58b1fa44a3bf7bae3b52c59ce523ae152


[ROCm/rocm_smi_lib commit: 4298cbb400]
2022-05-21 05:01:32 -04:00
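The secondary-die handling described in this commit amounts to filtering out devices whose default power cap is 0 W before applying a change. A minimal sketch, assuming a hypothetical helper `devices_to_update` that receives the default caps as a plain mapping (the real CLI queries them through the library):

```python
def devices_to_update(default_power_caps):
    """Return the device indices whose power cap can be changed.

    default_power_caps: mapping of device index -> default power cap
    in watts. Hypothetical input shape, for illustration only.
    Devices with a default cap of 0 W (secondary dies) are skipped.
    """
    return [dev for dev, cap in sorted(default_power_caps.items()) if cap > 0]
```

For a mapping such as `{0: 300, 1: 0, 2: 300}`, device 1 is treated as a secondary die and left untouched.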
Divya Shikre 75467146b6 Fix mem leaks observed while running rsmitst
1. Memory allocated for handle was not deleted
when no variant, subvariant or supported function
was found
2. handle->func_id_iter address was set to 0
before delete[]

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iab50fdfbe03eec8e6fd0e84e03bd2c47e645b3d8


[ROCm/rocm_smi_lib commit: b23cfc0e82]
2022-05-18 14:31:44 -04:00
Divya Shikre 231a61a394 Update get_frequencies to handle failures.
Show an optional debug log (RSMI_DEBUG_BITFIELD=2) to
the user in the following scenarios:
1. If more than one current frequency is found
2. If frequencies are not read in increasing order of
   their value
If the current frequency is not available, its index is
set to -1 and no value is marked with * in the
output. This will also be handled in rocm_smi.py.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I477ec065f7513c8045d6392f12ef6cb835a6b8f6


[ROCm/rocm_smi_lib commit: afe996c2ed]
2022-05-11 15:33:15 -04:00
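The checks listed in this commit can be sketched in Python. The library itself is C++, so this is only an illustration of the logic: the function name `parse_frequencies`, the warning strings, and the assumed input format (lines such as `0: 300Mhz` or `1: 1000Mhz *`, where `*` marks the current level) are assumptions rather than the library's code.

```python
import re

# Assumed line format: "<index>: <value>Mhz" with an optional trailing
# '*' that marks the current frequency.
_LINE = re.compile(r"^\s*(\d+):\s*(\d+)\s*[Mm][Hh][Zz]\s*(\*)?\s*$")


def parse_frequencies(text):
    """Return (frequencies, current_index, warnings) for the given text.

    current_index is -1 when no line is marked as current.
    """
    freqs = []
    current = -1
    warnings = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue  # ignore lines that do not fit the assumed format
        if match.group(3):
            if current != -1:
                warnings.append("more than one current frequency found")
            else:
                current = len(freqs)
        value = int(match.group(2))
        if freqs and value < freqs[-1]:
            warnings.append("frequencies are not in increasing order")
        freqs.append(value)
    return freqs, current, warnings
```

A caller would print the collected warnings only when debug output is enabled, and would omit the `*` marker when the returned index is -1.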
Divya Shikre 853a6e517c Add DEBUG_LOG macro
Add DEBUG_LOG that will optionally print error
message when RSMI_DEBUG_BITFIELD is set to 2.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6017e92d8a9e5f9861ae29ece0488d4bc198f996


[ROCm/rocm_smi_lib commit: 99be3451d7]
2022-05-11 11:03:24 -04:00
Divya Shikre 4a8d4b2878 Add RSMI_CLK_TYPE_PCIE to rsmi_clk_type_t
showclocks/showclkfrq does not display pp_dpm_pcie values
in sriov. This fix adds pcie clocks to rsmi_clk_type_t
where rest of the clocks are present.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I6d129ae412623b369c14456ae9781b2dbceb2139


[ROCm/rocm_smi_lib commit: c9b42bff57]
2022-05-06 09:15:39 -04:00
Ori Messinger c5ac3ea7bd ROCm SMI LIB: Add Missing GPU Blocks
This patch adds the following 4 missing GPU blocks to the SMI LIB:
-RSMI_GPU_BLOCK_MMHUB
-RSMI_GPU_BLOCK_PCIE_BIF
-RSMI_GPU_BLOCK_HDP
-RSMI_GPU_BLOCK_XGMI_WAFL

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia1ec6f53e195f4bf7b8f073d6bed4fdb6572e546


[ROCm/rocm_smi_lib commit: 9d6403bb17]
2022-05-05 00:44:16 -04:00
Elena Sakhnovitch 2a06a86b09 Revert "rocm_smi.py: Don't try to print absent clock files"
This reverts commit 2ba625e569.
DRM device id does not always match GPU ID in the rocm_smi.py. This leads to cases where the wrong device is checked by os.path.isfile().

Change-Id: Ib6f2b9be123b7eb64334d3feec57f63d7eb37d6f


[ROCm/rocm_smi_lib commit: be66d67ef2]
2022-05-03 16:42:42 -04:00
Elena Sakhnovitch 33320d6e1a [rocm_smi.py] Hide unsupported clocks under debug
Signed-off-by: Elena Sakhnovitch <elena.sakhnovitch@amd.com>
Change-Id: I1f2c7b93d9a81f2735c76e8d441f9e298288f5c0


[ROCm/rocm_smi_lib commit: 9d7fd34d2b]
2022-05-03 16:38:22 -04:00
Bill(Shuzhou) Liu 654834be6c Sanity check amdgpu module is loaded in rocm_smi.py
Instead of checking /proc/modules for amdgpu, the code will check
/sys/module/amdgpu/initstate which covers the case when the driver
is compiled into the kernel.

Change-Id: Id39ec5b0eb9b68204bc9f5f779057ba8cc090bdc


[ROCm/rocm_smi_lib commit: 9f6614e83b]
2022-04-14 11:28:38 -04:00
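The check described in this commit can be sketched as a test for the presence of the `initstate` file. This is a minimal illustration, not the code in rocm_smi.py: the function name `amdgpu_initstate_present` and the `sysfs_root` parameter (added so the check can be exercised against a temporary directory) are assumptions.

```python
import os


def amdgpu_initstate_present(sysfs_root="/sys"):
    """Return True if <sysfs_root>/module/amdgpu/initstate exists.

    Hypothetical helper; only the path under /sys comes from the
    commit message.
    """
    path = os.path.join(sysfs_root, "module", "amdgpu", "initstate")
    return os.path.isfile(path)
```

Unlike scanning /proc/modules, this only depends on one sysfs path, which is the property the commit message relies on.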
Bill(Shuzhou) Liu 54b4ad12df Suppress "rsmi_init() failed" error message
When an application calls the library in a system without amdgpu,
it may always print out "rsmi_init() failed". Suppress the error
message in the library.

Change-Id: Ice63dd3a764b221a6935536bff1bfa6aa3e51a46


[ROCm/rocm_smi_lib commit: 7860de5107]
2022-04-12 09:44:00 -04:00
Ori Messinger ecdc660778 ROCm SMI CLI: Fix formatCsv Bug
Fixes a bug in the 'formatCsv' function which mishandles json
data conversion for 'system' data types.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I705060409bf5ae75b994ffda270843065ca12321


[ROCm/rocm_smi_lib commit: e800cbf161]
2022-04-07 19:33:46 -04:00
Bill(Shuzhou) Liu 2e2f757d9d Correct the __pycache__ folder
Remove the __pycache__ in the folder libexec/rocm_smi

Change-Id: I0ad505ff7e7368d5fe86e1eee12080039edc7111


[ROCm/rocm_smi_lib commit: 9f814e150e]
2022-03-24 09:44:33 -04:00
Bill(Shuzhou) Liu 5c0dc0f383 Remove python pyc file when uninstall
Remove python pyc file when uninstall.

Change-Id: I383faf8fcfaeeb346c9ee38c1aad8577a460281e


[ROCm/rocm_smi_lib commit: c37d4bac8f]
2022-03-23 13:39:57 -04:00
Ranjith Ramakrishnan bc3759120d Remove rocm_smi/bin folder and prefix name correction in pragma message
The /opt/rocm/rocm_smi/bin folder was added by mistake as part of the file reorg; it is now removed.
File reorg commit: f391b5d73935ebbadaa8f97185f40eefc88af020
The pragma message for the oam header files showed the prefix as rocm_smi; it is changed to oam.

Change-Id: I74b3c1d2bd7e0ff0eee5738c1658063bc855066c


[ROCm/rocm_smi_lib commit: 869670866d]
2022-03-17 18:16:10 -07:00
Kent Russell 0e18159c0e README: Remove restrictive licensing language
Also update copyright years

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ic9ead543c4937680afc1957623c4d5fcbfbd58b0


[ROCm/rocm_smi_lib commit: 85571318e2]
2022-03-16 13:52:25 -04:00
Sreekant Somasekharan 51f6fe8bc1 make string variable 'tpath' an empty string.
string variable not being empty can lead to incorrect compilation
and corrupted output.

Change-Id: Ie66756c28aef7417759c29387500970a8b53e44c


[ROCm/rocm_smi_lib commit: dbe3403bd3]
2022-03-11 21:22:28 -05:00
Bill(Shuzhou) Liu 30fb7220bd Upgrade GoogleTest to v1.11.0
The old GoogleTest has compile errors on CentOS 9. Upgrade it
to the latest version.

Change-Id: I6bbe6afdfad6422a210f258880ddc87a9f088d76


[ROCm/rocm_smi_lib commit: 8ce9289bc2]
2022-03-09 15:18:43 -05:00
Sreekant Somasekharan 265026251b Add blacklist filter 'virtualization' for rsmi tests failing in SRIOV
Change-Id: Ibbaef092482c0b78ecd86a29f0b9b4331b51abe2


[ROCm/rocm_smi_lib commit: e6ae697e9c]
2022-03-04 22:13:44 -05:00
Elena Sakhnovitch 090011b153 [rocm_smi.py] resetPowerOverdrive fix
resetPowerOverdrive: improve output messages.

Signed-off-by: Elena Sakhnovitch
Change-Id: Ic5b9084f0637458c36e460231f2d3622b0a23aa6


[ROCm/rocm_smi_lib commit: a3317714cb]
2022-03-04 11:26:45 -05:00
Ranjith Ramakrishnan f391b5d739 File reorganization with backward compatibility
Wrapper header files
Soft link to libraries and binaries
rocm_smi.py and rsmiBindings.py installed in libexec/rocm_smi
Binaries, libraries and header files installed as per File Reorg folder structure

Change-Id: I3166ab67f89c2ae4aafbc87bb00c9a5233221ade


[ROCm/rocm_smi_lib commit: f1da5591b5]
2022-03-03 18:48:52 -05:00
Bill(Shuzhou) Liu 9b2017de0c Prevent stack buffer overflow
readlink() does not append a null byte to buffer. Initialize the
tpath to prevent stack buffer overflow.

Change-Id: I17895dc3576b080a0c35bd0528a5b83223ec1c1b


[ROCm/rocm_smi_lib commit: 4b65b0307f]
2022-03-03 15:43:53 -05:00
Saravanan Solaiyappan 913986f721 Consider apt/yum upgrade operation check in package scripts.
Include the upgrade operation check in the prerm and postun scripts
for rocm-smi-lib package.

Signed-off-by: Saravanan Solaiyappan <saravanan.solaiyappan@amd.com>
Change-Id: Ic3dee7ae50a2ac317f1aab88472b6d4805c4de90


[ROCm/rocm_smi_lib commit: 3a3b8dd25d]
2022-02-24 10:11:32 -05:00
Elena Sakhnovitch 45763cc1bb [rocm_smi.py]: fix input error type for --setclock
signed-off-by: Elena Sakhnovitch
Change-Id: I9626978780f360c591fb8908f5b759f2289dff0b


[ROCm/rocm_smi_lib commit: 9b871fcd9f]
2022-02-22 14:24:38 -05:00
Freddy Paul 566a0c794c rocm-smi:Fix cmake target files to reflect correct location
Change-Id: I86fda8447609c42e0f0615abd837b53ca5fbe717


[ROCm/rocm_smi_lib commit: d0545854dd]
2022-02-18 09:53:43 -08:00
Ori Messinger 9d6285f6c8 ROCm SMI CLI: Hide Failed Command Warning
The purpose of this patch is to hide 'One or more commands failed.'
from showing up, unless an appropriate log level has been set.

You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ifa309cd62596491a6ea5892e0752251f037fc0e9


[ROCm/rocm_smi_lib commit: 007f326c34]
2022-02-09 11:52:33 -05:00
Bill(Shuzhou) Liu f4ad11bc29 Link the library using sha1 build-id
The address sanitizer build requires build id more than 8 bytes.

Change-Id: I530fe87dffbf4c46f010bf8a1c2914f733678e9a


[ROCm/rocm_smi_lib commit: 3aab7b199e]
2022-02-02 17:04:11 -05:00
Divya Shikre 25c9398a0d Temporary blacklist TestPerfLevelReadWrite for navi21
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iee2146170b6828fe4fe2846c3ebfd57f95734f34


[ROCm/rocm_smi_lib commit: 8c4635acea]
2022-01-27 22:56:37 -05:00