Commit Graph

375 Commits

Author SHA1 Message Date
Freddy Paul 566a0c794c rocm-smi: Fix cmake target files to reflect correct location
Change-Id: I86fda8447609c42e0f0615abd837b53ca5fbe717


[ROCm/rocm_smi_lib commit: d0545854dd]
2022-02-18 09:53:43 -08:00
Ori Messinger 9d6285f6c8 ROCm SMI CLI: Hide Failed Command Warning
The purpose of this patch is to hide 'One or more commands failed.'
from showing up, unless an appropriate log level has been set.

You can set the loglevel in the CLI with:
--loglevel <debug/info/warning/error/critical>

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ifa309cd62596491a6ea5892e0752251f037fc0e9


[ROCm/rocm_smi_lib commit: 007f326c34]
2022-02-09 11:52:33 -05:00
Bill(Shuzhou) Liu f4ad11bc29 Link the library using sha1 build-id
The address sanitizer build requires a build ID longer than 8 bytes.

Change-Id: I530fe87dffbf4c46f010bf8a1c2914f733678e9a


[ROCm/rocm_smi_lib commit: 3aab7b199e]
2022-02-02 17:04:11 -05:00
Divya Shikre 25c9398a0d Temporary blacklist TestPerfLevelReadWrite for navi21
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iee2146170b6828fe4fe2846c3ebfd57f95734f34


[ROCm/rocm_smi_lib commit: 8c4635acea]
2022-01-27 22:56:37 -05:00
Laurent Morichetti fbb6e77dda Don't use NDEBUG when the intent is !DEBUG
CMakeLists.txt does not set up the DEBUG macro correctly to mean
!NDEBUG, so, as a workaround, replace all uses of ifdef NDEBUG with
ifndef DEBUG in the library sources.

Change-Id: I408adb36d1a2310fb894a486574469662ebb27cd
(cherry picked from commit f430cd4f91)


[ROCm/rocm_smi_lib commit: 2804bf7c28]
2022-01-27 11:08:48 -05:00
Divya Shikre a7a7c65e2a Add fix to check for vector size while reading pp_dpm_pcie
pop_back() was causing a seg fault when the pp_dpm_pcie file is empty and returns whitespace.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I888f1f79751cd456e43751a5b96d08560a039677


[ROCm/rocm_smi_lib commit: ec71380e1c]
2022-01-26 10:34:57 -05:00
Bill(Shuzhou) Liu 9db28252c2 Add rpm License header
Add rpm License header for cpack

Change-Id: I2f4a89015b6389cfde801f41d4f6e0f59e7087aa


[ROCm/rocm_smi_lib commit: ce9cfa584f]
2022-01-20 13:30:40 -05:00
Divya Shikre 17e4460690 Don't assert when fan is not supported.
Add a check when RSMI_STATUS_NOT_SUPPORTED is returned for fanRead/fanReadWrite.
Fix for SWDEV-314176 & SWDEV-314175 reported.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Icf2cc541a3fa5ca4794aff5d6bc91104adc45e6d


[ROCm/rocm_smi_lib commit: 11a71c63b1]
2022-01-20 12:29:12 -05:00
Bill(Shuzhou) Liu 1a1e04b5a2 Add license file to smi-lib package
Install LICENSE.txt to share/doc/smi-lib

Change-Id: Idcbb70db8808111203e8e4a4c3ab4d1e070ac79d


[ROCm/rocm_smi_lib commit: 3356084074]
2022-01-19 12:15:31 -05:00
Sreekant Somasekharan 8266782850 Print ASD firmware version in hex instead of decimal format
Change-Id: Idf113f63b79f2d2903ae795d272d232a43680516


[ROCm/rocm_smi_lib commit: cf2f0b0508]
2022-01-18 10:44:20 -05:00
Bill(Shuzhou) Liu 9824aa1545 Enable the linker build id generation for address sanitizer build
The -Wl,--build-id option is added for address sanitizer build

Change-Id: I0d75bc8e6169010c460e62e51708828e75de478e


[ROCm/rocm_smi_lib commit: 7b69dde24f]
2022-01-17 09:06:34 -05:00
Bill(Shuzhou) Liu 7bf29acf35 strip the library instead of link when build release
When building a release, the build strips the library file instead of linking.

Change-Id: Ib2d4cea614e8938bdb2be0fd74f046680158d256


[ROCm/rocm_smi_lib commit: 77502bed2a]
2022-01-14 10:39:15 -05:00
Harish Kasiviswanathan 16a9531a4d rocm_smi_lib: add stdbool.h needed for C90
'bool' keyword is supported only from C99 onwards. Include stdbool.h
for older compilers

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I09fd5cf6eac20e7185e85a1123bc4826958b2b7c


[ROCm/rocm_smi_lib commit: 8de6ed2b8d]
2021-12-14 15:25:59 -05:00
Elena Sakhnovitch 5553c7fb40 [rocm_smi.py] remove \r symbol at print
Remove carriage return at the end of the line in printLog function.
On Linux, the end of a line is encoded as \n, not \n\r.

Change-Id: If3835d773033b53a7f25b4a0284df359a6f9555d


[ROCm/rocm_smi_lib commit: 1aeb27c4c9]
2021-12-08 10:13:56 -05:00
Divya Shikre a83ee69dd3 Add null ptr check for temperature read from all sensors.
The (temperature == nullptr) check happens only when HBM temperature is retrieved.
This check needs to apply in other cases as well, hence moving this outside the HBM condition.
This should return RSMI_STATUS_INVALID_ARGS consistently in all cases when nullptr is passed through rsmitst.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iea3cec75312a0a669c7da27e15e9782e6a885c5f


[ROCm/rocm_smi_lib commit: 432df20321]
2021-12-01 14:05:46 -05:00
Divya Shikre 92fe455a8e Update temp_read rsmitst.
Check for RSMI_STATUS_INVALID_ARGS when invalid args are passed.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I0d5ff84aee5cce4214026ddcd860a17ae3e43147


[ROCm/rocm_smi_lib commit: b4fd9c0d94]
2021-11-29 18:09:45 -05:00
Sreekant Somasekharan 835f43311a Skip TestFrequenciesReadWrite for unsupported ASICs
For ASICs NAVI10 and above, setting the display clock [DCEFCLK] is not supported and the sysfs entry is
read-only. As a result, the test falsely fails for these ASICs. ROCm SMI Lib is ASIC independent,
so setting the display clock cannot be selectively disabled for these ASICs.

As a compromise, if the set (write to sysfs entry) fails due to a permission error and euid is root,
assume that the set feature is not supported and skip the test.

Change-Id: I7a273878cbf1465b01728705323e8a92a42378dd


[ROCm/rocm_smi_lib commit: c6f695f5a9]
2021-11-29 11:23:38 -05:00
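
The skip rule in the entry above can be sketched in a few lines of Python. This only illustrates the decision logic and is not the code from the commit (the actual test is part of rsmitst); the names set_or_skip, write, and euid are hypothetical.

```python
def set_or_skip(write, euid):
    """Attempt a sysfs-style write and classify the outcome.

    write: a zero-argument callable that performs the write (hypothetical).
    euid:  the effective user ID of the caller, e.g. from os.geteuid().
    """
    try:
        write()
    except PermissionError:
        # Root was denied, so the entry is most likely read-only:
        # treat the set feature as unsupported and skip the test.
        if euid == 0:
            return "skipped"
        # A non-root caller may simply lack privileges, so let the error surface.
        raise
    return "passed"
```

A caller would pass os.geteuid() as euid; taking it as a parameter keeps the sketch easy to exercise without root privileges.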
Divya Shikre c23694e66a Add fix to display correct GPU Memory Activity and GFX Activity value.
The driver fills in 0xFF for all metrics not supported on that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I86a38148c7a288ea0db94893f685560eaac098ab


[ROCm/rocm_smi_lib commit: 7b1daaef96]
2021-11-25 14:28:06 -05:00
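
A minimal Python sketch of the sentinel check described in the entry above. It assumes that an unsupported field has every byte set to 0xFF and that values are little-endian; neither detail is stated in the log. The names is_unsupported and read_metric are hypothetical, and None stands in for RSMI_STATUS_NOT_SUPPORTED.

```python
def is_unsupported(raw: bytes) -> bool:
    """Return True when every byte of the field is the 0xFF fill value."""
    return len(raw) > 0 and all(b == 0xFF for b in raw)


def read_metric(raw: bytes):
    """Decode a metric field, or return None when it is marked unsupported."""
    if is_unsupported(raw):
        return None  # stand-in for RSMI_STATUS_NOT_SUPPORTED
    return int.from_bytes(raw, "little")  # byte order is an assumption
```

Checking the raw bytes before decoding prevents the fill pattern from being reported as a very large activity or temperature value.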
Divya Shikre a95af9b70d Add fix for out of range temperature value for HBM.
The driver fills in 0xFF for all metrics not supported on that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iacb6474486e3732f2aa824ff447c17f8243b65cd


[ROCm/rocm_smi_lib commit: f61cb1b41d]
2021-11-23 15:37:41 -05:00
Sreekant Somasekharan 70be1fab11 Modify bool variable to true in if condition of src=dst
Change-Id: Ie2024b3a6ad68e48384bb3472fe8785bcd643665


[ROCm/rocm_smi_lib commit: 3f27dcc1ac]
2021-11-17 12:53:40 -05:00
Ori Messinger 4883fa50c4 ROCm SMI CLI: Fix printErrLog Arguments
This patch removes every erroneous occurrence of a third argument
when calling printErrLog(device, err), since it takes two arguments.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5971cc68b69c86f37c69f44e4785dabfc82c7955


[ROCm/rocm_smi_lib commit: 40eed25a3b]
2021-11-08 12:54:00 -05:00
Elena Sakhnovitch 8a5effb2e1 [ROCm-SMI] add --showNodesBw
Display min and max bandwidth between gpu nodes

Signed-off-by: Elena Sakhnovitch
Change-Id: I7289fb83f80e2f899996b7d7560ece670cc5f31f


[ROCm/rocm_smi_lib commit: 13cde8429d]
2021-10-29 12:49:35 -04:00
Elena Sakhnovitch f0a86d3d29 [rocm_smi.py] remove repetitive footnote
Print the "Primary die (usually one above or below the secondary) shows
total (primary + secondary) socket power information" footnote only once, not
for every secondary die.

Signed-off-by: Elena Sakhnovitch
Change-Id: Iae9c5c94945ec38ecdb128a576a4eacafc30a044


[ROCm/rocm_smi_lib commit: 15e4fe80e1]
2021-10-29 08:32:06 -04:00
Sreekant Somasekharan 01cb5a2b61 Add test case for rsmi_is_P2P_accessible API.
Change-Id: Iccfede42925c98d96454b5f25cc0ed6fc9258911


[ROCm/rocm_smi_lib commit: ce46fd237a]
2021-10-28 17:06:07 -04:00
Elena Sakhnovitch 71fe1f8bce [ROCm SMI LIB]: Add rsmi_minmax_bandwidth_get()
The API provides min/max bandwidth values between nodes.
The current implementation only supports directly connected (1 hop)
XGMI devices.

Signed-off-by: Elena Sakhnovitch
Change-Id: Ifc95da13845fbe7903c5386d320183ffd58c5b53


[ROCm/rocm_smi_lib commit: 50ea68e694]
2021-10-28 17:00:41 -04:00
Divya Shikre 68a7ef02b5 Add failing rsmi tests to exclude file to enable blacklisting
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ibdad4d54ffe87391b13379c63e005fd04c6abaf5


[ROCm/rocm_smi_lib commit: e96d6ab77e]
2021-10-26 17:57:05 -04:00
Ori Messinger b1720b42cd ROCm SMI CLI: Add --showtopoaccess Functionality
The purpose of this patch is to implement --showtopoaccess
functionality in the CLI, which shows True or False if P2P is
possible between two given GPUs.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I07d70d80ae7b484136b31d5d22780c4990029391


[ROCm/rocm_smi_lib commit: e2d9a37e5f]
2021-10-14 11:06:05 -04:00
Ori Messinger a9993fb509 ROCm SMI LIB: Add rsmi_is_P2P_accessible() API
Implements the rsmi_is_P2P_accessible API.
The function returns True if P2P is possible between two nodes.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ic7316eebcec4480175c7ad04c21a42b2e1a4c454


[ROCm/rocm_smi_lib commit: ff02042c64]
2021-10-13 22:01:33 -04:00
Bill(Shuzhou) Liu 6601ab509b Add cmake target for rocm_smi
rocm_smi will provide cmake files exporting the INCLUDE/LIBRARY targets.

Change-Id: I1943a3142bdc0abd8f03ff62e12e947aac835401


[ROCm/rocm_smi_lib commit: 088fe48d12]
2021-10-04 11:08:23 -04:00
Elena Sakhnovitch 683df7e44c [rocm_smi.py]: fix fan 255% error
Signed-off-by: Elena Sakhnovitch
Change-Id: I265ba32bc3777db5f04f1924547fe432ba78c3d0


[ROCm/rocm_smi_lib commit: 2f84906cc2]
2021-09-29 21:11:06 -04:00
Elena Sakhnovitch bc5030e721 [rocm_smi.py]: pep8 formatting
Signed-off-by: Elena Sakhnovitch
Change-Id: If12b3371cd6acac16d9f6b3adf5f5cc8df28992f


[ROCm/rocm_smi_lib commit: 80140c3b02]
2021-08-26 10:23:58 -04:00
Elena Sakhnovitch 64a2e50a43 rocm_smi_lib: fix gpu_metrics_v1_3 support
Signed-off-by: Elena Sakhnovitch
Change-Id: Ia7a6b17eb0f317465613ba92ae7548a221c46ee3


[ROCm/rocm_smi_lib commit: 5e1bfcadd7]
2021-08-13 11:59:50 -04:00
Elena Sakhnovitch 0bd439006a rocm_smi_lib: add gpu_metrics_v1_3 support
Signed-off-by: Elena Sakhnovitch
Change-Id: I4a9dedc80b8fce60e12c5baf8651d54d16a6a41c


[ROCm/rocm_smi_lib commit: fee82af1fe]
2021-08-13 09:23:35 -04:00
Harish Kasiviswanathan 131a0d1705 Fall back to pci-ids if FRU product_name is empty
rocm-smi --showproductname will not show "Card series" in its output if
the product_name exported by the kernel is an empty string. This has been
raised as a regression by a customer.

BUG: SWDEV-297228

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I9aae24778e2d3a30aa661d8f338278c1666590fb


[ROCm/rocm_smi_lib commit: 7a8c3f3629]
2021-08-04 10:53:55 -04:00
Bill(Shuzhou) Liu 3e74c1a903 support rocm_smi_lib version in the header file
Package the rocm_smi64Config.h into deb/rpm.

Change-Id: Ic4ba90646a0dbeb8bc2dd4edf455004b1a7ea859


[ROCm/rocm_smi_lib commit: 26874d2a10]
2021-08-04 10:19:44 -04:00
Bill(Shuzhou) Liu 1f9cb25055 Add -g compiler option for ADDRESS_SANITIZER
Add -g compiler option for Address Sanitizer

Change-Id: I958fefa6c4b5871c29734ab1d4ec238c9e073192


[ROCm/rocm_smi_lib commit: 42d39d3e34]
2021-08-03 13:54:19 -04:00
Elena Sakhnovitch 578d20c037 [rocm_smi.py] --showpower error bugfix
Fix error message in -P for secondary die

Signed-off-by: Elena Sakhnovitch
Change-Id: Ica3c0a83b565d2231fad23389b9378056a0f56b3


[ROCm/rocm_smi_lib commit: 2db7e2a312]
2021-07-30 00:08:14 -04:00
Elena Sakhnovitch ebba123919 [rocm_smi.py] add secondary die check.
Signed-off-by: Elena Sakhnovitch <Elena.Sakhnovitch@amd.com>
Change-Id: I46618002c1967ec115db88becbaba9e7c0a08af1


[ROCm/rocm_smi_lib commit: b59e752122]
2021-07-29 17:46:12 -04:00
Harish Kasiviswanathan 3da3df8905 rocm_smi.py: Remove extraneous line during process termination
At the tail end, when the process is terminating, the subprocess module fails
to find the process. This results in an extraneous line being printed with
the char 'b'. Fix this.

BUG: SWDEV-296409

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I39aacf8ae948a5acec0aa93296cc0e0aec88b3ef


[ROCm/rocm_smi_lib commit: a03acf2c07]
2021-07-27 16:26:49 -04:00
Icarus Sparry 8a61dda627 Add dependency on rocm-core
Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
Change-Id: Ie2a5b08747129a1313edf2a834f2e0e8638372c2
(cherry picked from commit c0ad8375c7)


[ROCm/rocm_smi_lib commit: de025ca5f6]
2021-07-27 09:42:30 -04:00
Ori Messinger 8e3d715d10 ROCm SMI Python CLI: Fix printLog Collisions
Python's default 'print' implementation is not thread safe, causing
empty lines to be printed during multithreaded code execution.

This fixes the --showevents output for multi-GPU systems.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I72f7341cdf4401f1fed4cd8f7d7a4a90bf9a3a4c


[ROCm/rocm_smi_lib commit: 95348f37cc]
2021-07-21 23:58:07 -04:00
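
One common way to avoid this kind of interleaving is to serialize writes with a lock. The sketch below shows that general technique and is not the code from this commit; print_log, _print_lock, and the output format are hypothetical.

```python
import threading

# A single module-level lock shared by every thread that prints.
_print_lock = threading.Lock()


def print_log(device, message):
    """Print one log line for a device without interleaving with other threads."""
    line = "GPU[{}]: {}".format(device, message)
    # Hold the lock so the text and its trailing newline are written together.
    with _print_lock:
        print(line)
```

Each event-listening thread would call print_log instead of print, so that the output from several GPUs ends up as whole lines.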
Ori Messinger b8324162e0 ROCm SMI Python CLI: Add Zero Padding to Device Model
Use zero padding for the hexadecimal value 'device_model' inside
showProductName with a padding length of 4.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I962b94d414c6ba050d951486ad9e7559123f8850


[ROCm/rocm_smi_lib commit: 03ae187a35]
2021-07-17 04:29:52 -04:00
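
The zero padding described in the entry above can be illustrated with Python's format specification. The function name format_device_model and the 0x prefix are assumptions for illustration; the log only states that the hexadecimal device_model is padded to a length of 4.

```python
def format_device_model(device_model: int) -> str:
    """Render a device model ID as hex, zero-padded to 4 digits."""
    # "04x" means: lowercase hex, minimum width 4, padded with leading zeros.
    return "0x{:04x}".format(device_model)
```

With this format, a short value such as 0x66 is rendered with two leading zeros, while a 4-digit value is left unchanged.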
Bill(Shuzhou) Liu a4dba3ba7e AddressSanitizer report stack-use-after-scope
Fix the stack-use-after-scope error reported by the AddressSanitizer.

Bug: SWDEV-291913
Change-Id: I0ffd71af8679b8bff6c363096fafe75dffcf329e


[ROCm/rocm_smi_lib commit: 8c60dbebaa]
2021-06-25 13:33:38 -04:00
Divya Shikre 9abb288ace Add fix to ignore error returned when perf determinism is not supported.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I89b6a0a3dbba6fbd4b12ff2e20670eff9f32ed7f


[ROCm/rocm_smi_lib commit: 6edea7a92e]
2021-06-14 12:18:22 -04:00
Divya Shikre 47d033876c Add fix to show usage of setperfdeterminism functionality in --help command
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ife93c887eea2a9aae69f2923dba45c7cde4838d3


[ROCm/rocm_smi_lib commit: 686e6ac654]
2021-05-12 17:29:37 -04:00
Divya Shikre f3c90aa582 Return an error when user tries to set out of range clock values for setsrange functionality
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ibe1075c1d2b6c009332a52b81f4b41f7e93d0756


[ROCm/rocm_smi_lib commit: 462d4adc24]
2021-05-11 12:32:19 -04:00
Harish Kasiviswanathan deac3a055c Add timestamp resolution info in comments
Specify that timestamp resolution is in ns in header file.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4db00a07c0b5c43ae23c98213f2fbbcf93110234


[ROCm/rocm_smi_lib commit: 14201290a2]
2021-05-05 12:32:58 -04:00
Harish Kasiviswanathan 1f7954113f Add support to read gpu_metrics version 1.2
gpu_metrics version 1.2 provides atomic timestamp. Use this timestamp.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I7a1a675f53b93718f34b1f2979173e9064e0ef93


[ROCm/rocm_smi_lib commit: 6b10a7761b]
2021-05-05 12:31:10 -04:00
Harish Kasiviswanathan 1aac6e61d4 Change #define RSMI_GPU_METRICS_API_CONTENT_VER
Change to RSMI_GPU_METRICS_API_CONTENT_VER_1, in preparation for
supporting additional formats.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4367a2622a0fa41e6b05bc4436ecd24b8c4e30e2


[ROCm/rocm_smi_lib commit: e83cf605c6]
2021-05-04 20:51:10 -04:00
Harish Kasiviswanathan debafec88c Move gpu_metrics functions to different file
No logic change. Only structural change

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Id5e1a678c0888f04081ee06db4521c72b5eb9b16


[ROCm/rocm_smi_lib commit: c416726054]
2021-05-04 20:49:51 -04:00