pop_back() was causing a seg fault when pp_dpm_pcie file is empty and returns whitespace.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I888f1f79751cd456e43751a5b96d08560a039677
[ROCm/amdsmi commit: ec71380e1c]
Add a check when RSMI_STATUS_NOT_SUPPORTED is returned for fanRead/fanReadWrite.
Fix for SWDEV-314176 & SWDEV-314175 reported.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Icf2cc541a3fa5ca4794aff5d6bc91104adc45e6d
[ROCm/amdsmi commit: 11a71c63b1]
When build the release, it will strip the library file instead of link.
Change-Id: Ib2d4cea614e8938bdb2be0fd74f046680158d256
[ROCm/amdsmi commit: 77502bed2a]
'bool' keyword is supported only from C99 onwards. Include stdbool.h
for older compilers
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I09fd5cf6eac20e7185e85a1123bc4826958b2b7c
[ROCm/amdsmi commit: 8de6ed2b8d]
Remove carriage return at the end of the line in printLog function.
On linux end of line is encoded with \n, not \n\r.
Change-Id: If3835d773033b53a7f25b4a0284df359a6f9555d
[ROCm/amdsmi commit: 1aeb27c4c9]
The (temperature == nullptr) check happens only when HBM temperature is retrieved.
This check needs to apply in other cases as well, hence moving this outside the HBM condition.
This should return RSMI_STATUS_INVALID_ARGS consistently in all cases when nullptr is passed through rsmitst.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iea3cec75312a0a669c7da27e15e9782e6a885c5f
[ROCm/amdsmi commit: 432df20321]
For ASICs NAVI10 and above setting display clock [DCEFCLK] is not supported and the sysfs entry is
read-only. As a result, the test falsely fails for these ASICs. ROCm SMI Lib is ASIC independent.
So Display clock set cannot be selectively disabled for these ASICs.
As a compromise if the set (write to sysfs entry) fails due to permission error and euid is root,
assume that set feature is not supported and skip the test.
Change-Id: I7a273878cbf1465b01728705323e8a92a42378dd
[ROCm/amdsmi commit: c6f695f5a9]
Driver mem fills in 0xFF for all for the metrices not supported for that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I86a38148c7a288ea0db94893f685560eaac098ab
[ROCm/amdsmi commit: 7b1daaef96]
Driver mem fills in 0xFF for all for the metrices not supported for that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iacb6474486e3732f2aa824ff447c17f8243b65cd
[ROCm/amdsmi commit: f61cb1b41d]
This patch removes every erroneous occurance of a third argument
when calling printErrLog(device, err), since it takes two arguments.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5971cc68b69c86f37c69f44e4785dabfc82c7955
[ROCm/amdsmi commit: 40eed25a3b]
Display min and max bandwidth between gpu nodes
Signed-off-by: Elena Sakhnovitch
Change-Id: I7289fb83f80e2f899996b7d7560ece670cc5f31f
[ROCm/amdsmi commit: 13cde8429d]
Printing "Primary die (usually one above or below the secondary) shows
total (primary + secondary) socket power information" footnote only one time, not
for every secondary die.
Signed-off-by: Elena Sakhnovitch
Change-Id: Iae9c5c94945ec38ecdb128a576a4eacafc30a044
[ROCm/amdsmi commit: 15e4fe80e1]
The purpose of this patch is to implement --showtopoaccess
functionality in the CLI, which shows True or False if P2P is
possible between two given GPUs.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I07d70d80ae7b484136b31d5d22780c4990029391
[ROCm/amdsmi commit: e2d9a37e5f]
Implements rsmi_is_p2p_accessible API.
The function returns True if P2P is possible between two nodes.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ic7316eebcec4480175c7ad04c21a42b2e1a4c454
[ROCm/amdsmi commit: ff02042c64]
rocm_smi will provide cmake files exporting the INCLUDE/LIBRARY targets.
Change-Id: I1943a3142bdc0abd8f03ff62e12e947aac835401
[ROCm/amdsmi commit: 088fe48d12]
rocm-smi --showproductname will not show "Card series" in its output if
product_name exported by Kernel is empty string. This has been raised a
regression by customer.
BUG: SWDEV-297228
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I9aae24778e2d3a30aa661d8f338278c1666590fb
[ROCm/amdsmi commit: 7a8c3f3629]
Fix error message in -P for secondary die
Signed-off-by: Elena Sakhnovitch
Change-Id: Ica3c0a83b565d2231fad23389b9378056a0f56b3
[ROCm/amdsmi commit: 2db7e2a312]
During the tail end when process is terminating, subprocess module fails
to find the process. This results in extraneous printing of a line with
char 'b'. Fix this.
BUG: SWDEV-296409
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I39aacf8ae948a5acec0aa93296cc0e0aec88b3ef
[ROCm/amdsmi commit: a03acf2c07]
Python's default 'print' implementation is not thread safe, causing
empty lines to be printed during multithreaded code execution.
This fixes the --showevents output for multi-GPU systems.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I72f7341cdf4401f1fed4cd8f7d7a4a90bf9a3a4c
[ROCm/amdsmi commit: 95348f37cc]
Use zero padding for the hexadecimal value 'device_model' inside
showProductName with a padding length of 4.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I962b94d414c6ba050d951486ad9e7559123f8850
[ROCm/amdsmi commit: 03ae187a35]
Fix the stack-use-after-scope error reported by the AddressSanitizer.
Bug: SWDEV-291913
Change-Id: I0ffd71af8679b8bff6c363096fafe75dffcf329e
[ROCm/amdsmi commit: 8c60dbebaa]
Specify that timestamp resolution is in ns in header file.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4db00a07c0b5c43ae23c98213f2fbbcf93110234
[ROCm/amdsmi commit: 14201290a2]
Implement default GPU power cap functionality in rsmitst.
It is available in the "rsmitstReadOnly.TestPowerRead" test, and
is displayed as: "Default Power Cap: #uW" (where uW is microwatts).
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I564ea3785f1a93dfd30587634057516549fa762c
[ROCm/amdsmi commit: 5b42cdf780]
Since device is a list, we need to pass a single item to the isAmdGpu
function.
Fixes: ffbe481241 "rocm_smi.py: Don't try to reset non-AMD GPUs"
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I19a74377636ff4589f11d092f41e1d35c1acb307
[ROCm/amdsmi commit: 242d94a668]
Instead of throwing "Unsupported clock" errors for ASICs that don't
support a certain clock type (e.g. dcefclk on MI-series), just dump the
warning to logging.debug and don't try to read the clock
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: If3cb9a472b03aa535a76fc24bcd9f77122090634
[ROCm/amdsmi commit: b931380f02]
Use default power cap exposed via sysfs to determine when to
show 'Out of Spec" warning.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I0fa3612b50e230856b0d5a390f876b35268d9587
[ROCm/amdsmi commit: b71e07b3fb]
Implement default GPU power cap functionality in the LIB.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia6b3420beb0e4df5559c3e6d11d0667972590b53
[ROCm/amdsmi commit: 83cd2fe4f1]