Commit graph

301 commits

Author SHA1 Message Date
Bill(Shuzhou) Liu fcbb9e5945 Enable the linker build id generation for address sanitizer build
The -Wl,--build-id option is added for address sanitizer build

Change-Id: I0d75bc8e6169010c460e62e51708828e75de478e


[ROCm/amdsmi commit: 7b69dde24f]
2022-01-17 09:06:34 -05:00
Bill(Shuzhou) Liu e21e1aff43 strip the library instead of link when build release
When building the release, it strips the library file instead of the link.

Change-Id: Ib2d4cea614e8938bdb2be0fd74f046680158d256


[ROCm/amdsmi commit: 77502bed2a]
2022-01-14 10:39:15 -05:00
Harish Kasiviswanathan a014132bba rocm_smi_lib: add stdbool.h needed for C90
'bool' keyword is supported only from C99 onwards. Include stdbool.h
for older compilers

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I09fd5cf6eac20e7185e85a1123bc4826958b2b7c


[ROCm/amdsmi commit: 8de6ed2b8d]
2021-12-14 15:25:59 -05:00
Elena Sakhnovitch 48a2251ff6 [rocm_smi.py] remove \r symbol at print
Remove carriage return at the end of the line in printLog function.
On Linux, the end of a line is encoded as \n, not \n\r.

Change-Id: If3835d773033b53a7f25b4a0284df359a6f9555d


[ROCm/amdsmi commit: 1aeb27c4c9]
2021-12-08 10:13:56 -05:00
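
The carriage-return fix in 48a2251ff6 can be illustrated with a minimal Python sketch. The helper name `format_log_line` is hypothetical and is not taken from rocm_smi.py; it only shows one way to guarantee that a log line ends in a single \n.

```python
def format_log_line(message: str) -> str:
    """Hypothetical helper: normalise a log line so it ends in exactly one '\\n'.

    Any trailing carriage returns or newlines are dropped first, so the
    result never ends in '\\r'.
    """
    return message.rstrip("\r\n") + "\n"
```

A caller would pass the result to `sys.stdout.write` rather than appending its own line ending.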
Divya Shikre d72346c920 Add null ptr check for temperature read from all sensors.
The (temperature == nullptr) check happens only when HBM temperature is retrieved.
This check needs to apply in other cases as well, hence moving this outside the HBM condition.
This should return RSMI_STATUS_INVALID_ARGS consistently in all cases when nullptr is passed through rsmitst.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iea3cec75312a0a669c7da27e15e9782e6a885c5f


[ROCm/amdsmi commit: 432df20321]
2021-12-01 14:05:46 -05:00
Divya Shikre 656b39646e Update temp_read rsmitst.
Check for RSMI_STATUS_INVALID_ARGS when invalid args are passed.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I0d5ff84aee5cce4214026ddcd860a17ae3e43147


[ROCm/amdsmi commit: b4fd9c0d94]
2021-11-29 18:09:45 -05:00
Sreekant Somasekharan 1a4346e6ba Skip TestFrequenciesReadWrite for unsupported ASICs
For ASICs NAVI10 and above, setting the display clock [DCEFCLK] is not supported and the sysfs entry is
read-only. As a result, the test falsely fails for these ASICs. ROCm SMI Lib is ASIC independent,
so setting the display clock cannot be selectively disabled for these ASICs.

As a compromise, if the set (write to the sysfs entry) fails due to a permission error and the euid is root,
assume that the set feature is not supported and skip the test.

Change-Id: I7a273878cbf1465b01728705323e8a92a42378dd


[ROCm/amdsmi commit: c6f695f5a9]
2021-11-29 11:23:38 -05:00
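
The skip rule described in 1a4346e6ba can be sketched as follows. The real test lives in rsmitst and is not written in Python; the function name `should_skip_write_test` is an assumption made for illustration only.

```python
import errno
import os


def should_skip_write_test(write_error: OSError, euid: int) -> bool:
    """Hypothetical predicate: treat the set feature as unsupported when the
    write failed with a permission error even though the caller is root."""
    permission_denied = write_error.errno in (errno.EACCES, errno.EPERM)
    return euid == 0 and permission_denied


# Typical use (sketch): should_skip_write_test(err, os.geteuid())
```

A non-root caller that hits the same error is not skipped, because in that case the failure may simply mean insufficient privileges.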
Divya Shikre 58b5a538a7 Add fix to display correct GPU Memory Activity and GFX Activity value.
The driver fills in 0xFF for all metrics that are not supported for that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I86a38148c7a288ea0db94893f685560eaac098ab


[ROCm/amdsmi commit: 7b1daaef96]
2021-11-25 14:28:06 -05:00
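
The 0xFF sentinel check from 58b5a538a7 (and from 73f8cb0c71 below) might look roughly like this. It is a sketch under assumptions: the library itself is C++, the `Status` enum is a stand-in for the RSMI status codes without their real numeric values, and treating "every byte is 0xFF" as the sentinel for multi-byte fields is an interpretation of the commit message.

```python
from enum import Enum, auto


class Status(Enum):
    # Stand-ins for RSMI_STATUS_SUCCESS / RSMI_STATUS_NOT_SUPPORTED;
    # the numeric values here are NOT the library's real values.
    SUCCESS = auto()
    NOT_SUPPORTED = auto()


def read_metric(raw_value: int, width_bytes: int):
    """Return (status, value); value is None when the field holds the sentinel."""
    sentinel = (1 << (8 * width_bytes)) - 1  # e.g. 0xFFFF for a 2-byte field
    if raw_value == sentinel:
        return Status.NOT_SUPPORTED, None
    return Status.SUCCESS, raw_value
```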
Divya Shikre 73f8cb0c71 Add fix for out of range temperature value for HBM.
The driver fills in 0xFF for all metrics that are not supported for that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iacb6474486e3732f2aa824ff447c17f8243b65cd


[ROCm/amdsmi commit: f61cb1b41d]
2021-11-23 15:37:41 -05:00
Sreekant Somasekharan d48617a959 Modify bool variable to true in if condition of src=dst
Change-Id: Ie2024b3a6ad68e48384bb3472fe8785bcd643665


[ROCm/amdsmi commit: 3f27dcc1ac]
2021-11-17 12:53:40 -05:00
Ori Messinger 7e248102eb ROCm SMI CLI: Fix printErrLog Arguments
This patch removes every erroneous occurrence of a third argument
when calling printErrLog(device, err), since it takes two arguments.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5971cc68b69c86f37c69f44e4785dabfc82c7955


[ROCm/amdsmi commit: 40eed25a3b]
2021-11-08 12:54:00 -05:00
Elena Sakhnovitch 398df0b9d0 [ROCm-SMI] add --showNodesBw
Display min and max bandwidth between gpu nodes

Signed-off-by: Elena Sakhnovitch
Change-Id: I7289fb83f80e2f899996b7d7560ece670cc5f31f


[ROCm/amdsmi commit: 13cde8429d]
2021-10-29 12:49:35 -04:00
Elena Sakhnovitch ff2bcc16fa [rocm_smi.py] remove repetitive footnote
Print the "Primary die (usually one above or below the secondary) shows
total (primary + secondary) socket power information" footnote only once, not
for every secondary die.

Signed-off-by: Elena Sakhnovitch
Change-Id: Iae9c5c94945ec38ecdb128a576a4eacafc30a044


[ROCm/amdsmi commit: 15e4fe80e1]
2021-10-29 08:32:06 -04:00
Sreekant Somasekharan 3370c2dd7a Add test case for rsmi_is_P2P_accessible API.
Change-Id: Iccfede42925c98d96454b5f25cc0ed6fc9258911


[ROCm/amdsmi commit: ce46fd237a]
2021-10-28 17:06:07 -04:00
Elena Sakhnovitch b6792d995b [ROCm SMI LIB]: Add rsmi_minmax_bandwidth_get()
API provides min/max bandwidth values between nodes.
(The current implementation only supports directly connected (1 hop)
XGMI devices.)

Signed-off-by: Elena Sakhnovitch
Change-Id: Ifc95da13845fbe7903c5386d320183ffd58c5b53


[ROCm/amdsmi commit: 50ea68e694]
2021-10-28 17:00:41 -04:00
Divya Shikre 2d9a7dfb0b Add failing rsmi tests to exclude file to enable blacklisting
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ibdad4d54ffe87391b13379c63e005fd04c6abaf5


[ROCm/amdsmi commit: e96d6ab77e]
2021-10-26 17:57:05 -04:00
Ori Messinger de16bc4552 ROCm SMI CLI: Add --showtopoaccess Functionality
The purpose of this patch is to implement --showtopoaccess
functionality in the CLI, which shows True or False if P2P is
possible between two given GPUs.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I07d70d80ae7b484136b31d5d22780c4990029391


[ROCm/amdsmi commit: e2d9a37e5f]
2021-10-14 11:06:05 -04:00
Ori Messinger ac01f99000 ROCm SMI LIB: Add rsmi_is_P2P_accessible() API
Implements the rsmi_is_P2P_accessible API.
The function returns True if P2P is possible between two nodes.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ic7316eebcec4480175c7ad04c21a42b2e1a4c454


[ROCm/amdsmi commit: ff02042c64]
2021-10-13 22:01:33 -04:00
Bill(Shuzhou) Liu c1c9290a55 Add cmake target for rocm_smi
rocm_smi will provide cmake files exporting the INCLUDE/LIBRARY targets.

Change-Id: I1943a3142bdc0abd8f03ff62e12e947aac835401


[ROCm/amdsmi commit: 088fe48d12]
2021-10-04 11:08:23 -04:00
Elena Sakhnovitch 8b42fe51b5 [rocm_smi.py]: fix fan 255% error
Signed-off-by: Elena Sakhnovitch
Change-Id: I265ba32bc3777db5f04f1924547fe432ba78c3d0


[ROCm/amdsmi commit: 2f84906cc2]
2021-09-29 21:11:06 -04:00
Elena Sakhnovitch cda3383b3b [rocm_smi.py]: pep8 formatting
Signed-off-by: Elena Sakhnovitch
Change-Id: If12b3371cd6acac16d9f6b3adf5f5cc8df28992f


[ROCm/amdsmi commit: 80140c3b02]
2021-08-26 10:23:58 -04:00
Elena Sakhnovitch a620a895db rocm_smi_lib: fix gpu_metrics_v1_3 support
Signed-off-by: Elena Sakhnovitch
Change-Id: Ia7a6b17eb0f317465613ba92ae7548a221c46ee3


[ROCm/amdsmi commit: 5e1bfcadd7]
2021-08-13 11:59:50 -04:00
Elena Sakhnovitch 9a7f79905d rocm_smi_lib: add gpu_metrics_v1_3 support
Signed-off-by: Elena Sakhnovitch
Change-Id: I4a9dedc80b8fce60e12c5baf8651d54d16a6a41c


[ROCm/amdsmi commit: fee82af1fe]
2021-08-13 09:23:35 -04:00
Harish Kasiviswanathan 0fa22cf381 Fall back to pci-ids if FRU product_name is empty
rocm-smi --showproductname will not show "Card series" in its output if
the product_name exported by the kernel is an empty string. This has been
raised as a regression by a customer.

BUG: SWDEV-297228

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I9aae24778e2d3a30aa661d8f338278c1666590fb


[ROCm/amdsmi commit: 7a8c3f3629]
2021-08-04 10:53:55 -04:00
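
The fallback described in 0fa22cf381 can be sketched as below. The function and parameter names are hypothetical, and the pci-ids lookup is represented only by its result; how rocm-smi actually resolves the name is not shown in this log.

```python
from typing import Optional


def card_series(fru_product_name: Optional[str], pci_ids_name: str) -> str:
    """Hypothetical helper: prefer the FRU product_name exported by the kernel,
    and fall back to the pci-ids name when it is missing or empty."""
    name = (fru_product_name or "").strip()
    return name if name else pci_ids_name
```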
Bill(Shuzhou) Liu e72af58262 support rocm_smi_lib version in the header file
Package the rocm_smi64Config.h into deb/rpm.

Change-Id: Ic4ba90646a0dbeb8bc2dd4edf455004b1a7ea859


[ROCm/amdsmi commit: 26874d2a10]
2021-08-04 10:19:44 -04:00
Bill(Shuzhou) Liu e26cec9f5a Add -g compiler option for ADDRESS_SANITIZER
Add -g compiler option for Address Sanitizer

Change-Id: I958fefa6c4b5871c29734ab1d4ec238c9e073192


[ROCm/amdsmi commit: 42d39d3e34]
2021-08-03 13:54:19 -04:00
Elena Sakhnovitch 8e8586591a [rocm_smi.py] --showpower error bugfix
Fix error message in -P for secondary die

Signed-off-by: Elena Sakhnovitch
Change-Id: Ica3c0a83b565d2231fad23389b9378056a0f56b3


[ROCm/amdsmi commit: 2db7e2a312]
2021-07-30 00:08:14 -04:00
Elena Sakhnovitch fc4aa3d271 [rocm_smi.py] add secondary die check.
Signed-off-by: Elena Sakhnovitch <Elena.Sakhnovitch@amd.com>
Change-Id: I46618002c1967ec115db88becbaba9e7c0a08af1


[ROCm/amdsmi commit: b59e752122]
2021-07-29 17:46:12 -04:00
Harish Kasiviswanathan 419b720ea5 rocm_smi.py: Remove extraneous line during process termination
During the tail end when process is terminating, subprocess module fails
to find the process. This results in extraneous printing of a line with
char 'b'. Fix this.

BUG: SWDEV-296409

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I39aacf8ae948a5acec0aa93296cc0e0aec88b3ef


[ROCm/amdsmi commit: a03acf2c07]
2021-07-27 16:26:49 -04:00
Icarus Sparry f860abd385 Add dependency on rocm-core
Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
Change-Id: Ie2a5b08747129a1313edf2a834f2e0e8638372c2
(cherry picked from commit 3d74653383)


[ROCm/amdsmi commit: de025ca5f6]
2021-07-27 09:42:30 -04:00
Ori Messinger 546e11c058 ROCm SMI Python CLI: Fix printLog Collisions
Python's default 'print' implementation is not thread safe, causing
empty lines to be printed during multithreaded code execution.

This fixes the --showevents output for multi-GPU systems.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I72f7341cdf4401f1fed4cd8f7d7a4a90bf9a3a4c


[ROCm/amdsmi commit: 95348f37cc]
2021-07-21 23:58:07 -04:00
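
One common way to avoid the kind of interleaved output described in 546e11c058 is to serialise writes with a lock. The sketch below is an assumption about the general technique, not the actual change made to printLog; `print_log` and `emit_from_threads` are hypothetical names.

```python
import io
import threading

_print_lock = threading.Lock()


def print_log(message: str, stream) -> None:
    """Write one complete line while holding a lock, so the text and its
    newline cannot be separated by output from another thread."""
    with _print_lock:
        stream.write(message + "\n")
        stream.flush()


def emit_from_threads(stream, thread_count: int = 4, lines_per_thread: int = 50) -> None:
    """Demo driver: several threads log concurrently to the same stream."""
    def worker(gpu_index: int) -> None:
        for i in range(lines_per_thread):
            print_log("GPU[%d]: event %d" % (gpu_index, i), stream)

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(thread_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Building the whole line, including the newline, before a single write is the key design choice; the lock then guarantees that no other writer runs between the write and the flush.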
Ori Messinger 0cdc8fb26c ROCm SMI Python CLI: Add Zero Padding to Device Model
Use zero padding for the hexadecimal value 'device_model' inside
showProductName with a padding length of 4.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I962b94d414c6ba050d951486ad9e7559123f8850


[ROCm/amdsmi commit: 03ae187a35]
2021-07-17 04:29:52 -04:00
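
The zero padding from 0cdc8fb26c corresponds to a width-4 hexadecimal format. In this sketch the integer input and the "0x" prefix are assumptions; the commit message only states the padding length.

```python
def format_device_model(device_model: int) -> str:
    """Hypothetical helper: render a device model as 4 zero-padded hex digits."""
    return "0x{:04x}".format(device_model)
```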
Bill(Shuzhou) Liu d9e824060c AddressSanitizer report stack-use-after-scope
Fix the stack-use-after-scope error reported by the AddressSanitizer.

Bug: SWDEV-291913
Change-Id: I0ffd71af8679b8bff6c363096fafe75dffcf329e


[ROCm/amdsmi commit: 8c60dbebaa]
2021-06-25 13:33:38 -04:00
Divya Shikre 7a0b4bc8ac Add fix to ignore error returned when perf determinism is not supported.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I89b6a0a3dbba6fbd4b12ff2e20670eff9f32ed7f


[ROCm/amdsmi commit: 6edea7a92e]
2021-06-14 12:18:22 -04:00
Divya Shikre d356da056d Add fix to show usage of setperfdeterminism functionality in --help command
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ife93c887eea2a9aae69f2923dba45c7cde4838d3


[ROCm/amdsmi commit: 686e6ac654]
2021-05-12 17:29:37 -04:00
Divya Shikre ebec7991cb Return an error when user tries to set out of range clock values for setsrange functionality
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ibe1075c1d2b6c009332a52b81f4b41f7e93d0756


[ROCm/amdsmi commit: 462d4adc24]
2021-05-11 12:32:19 -04:00
Harish Kasiviswanathan 10a16579c1 Add timestamp resolution info in comments
Specify that timestamp resolution is in ns in header file.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4db00a07c0b5c43ae23c98213f2fbbcf93110234


[ROCm/amdsmi commit: 14201290a2]
2021-05-05 12:32:58 -04:00
Harish Kasiviswanathan 0e17236bc5 Add support to read gpu_metrics version 1.2
gpu_metrics version 1.2 provides atomic timestamp. Use this timestamp.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I7a1a675f53b93718f34b1f2979173e9064e0ef93


[ROCm/amdsmi commit: 6b10a7761b]
2021-05-05 12:31:10 -04:00
Harish Kasiviswanathan 3c7b9cef95 Change #define RSMI_GPU_METRICS_API_CONTENT_VER
Change to RSMI_GPU_METRICS_API_CONTENT_VER_1, in preparation for
supporting additional formats.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4367a2622a0fa41e6b05bc4436ecd24b8c4e30e2


[ROCm/amdsmi commit: e83cf605c6]
2021-05-04 20:51:10 -04:00
Harish Kasiviswanathan ab54197e08 Move gpu_metrics functions to different file
No logic change. Only structural change

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Id5e1a678c0888f04081ee06db4521c72b5eb9b16


[ROCm/amdsmi commit: c416726054]
2021-05-04 20:49:51 -04:00
Ori Messinger a9e6f40bbb ROCm SMI LIB: Add Default Power Cap To rsmitst
Implement default GPU power cap functionality in rsmitst.
It is available in the "rsmitstReadOnly.TestPowerRead" test, and
is displayed as: "Default Power Cap: #uW" (where uW is microwatts).

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I564ea3785f1a93dfd30587634057516549fa762c


[ROCm/amdsmi commit: 5b42cdf780]
2021-04-28 12:42:34 -04:00
Kent Russell 23635d1f90 rocm_smi.py: Fix gpu reset error
Since device is a list, we need to pass a single item to the isAmdGpu
function.

Fixes: ffbe481241 "rocm_smi.py: Don't try to reset non-AMD GPUs"

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I19a74377636ff4589f11d092f41e1d35c1acb307


[ROCm/amdsmi commit: 242d94a668]
2021-04-28 07:44:55 -04:00
Kent Russell 4de1e4094a rocm_smi.py: Don't try to print absent clock files
Instead of throwing "Unsupported clock" errors for ASICs that don't
support a certain clock type (e.g. dcefclk on MI-series), just dump the
warning to logging.debug and don't try to read the clock

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: If3cb9a472b03aa535a76fc24bcd9f77122090634


[ROCm/amdsmi commit: b931380f02]
2021-04-23 10:19:04 -04:00
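
The behaviour described in 4de1e4094a, logging at debug level and skipping the read when a clock is absent, can be sketched like this. The function name and the dictionary-based inputs are hypothetical; the real script reads sysfs files.

```python
import logging


def read_clocks(available: dict, wanted: list) -> dict:
    """Return only the clocks the device exposes; absent clocks are noted at
    debug level instead of being reported as errors."""
    result = {}
    for clock in wanted:
        if clock not in available:
            logging.debug("Clock %s is not present on this device; skipping", clock)
            continue
        result[clock] = available[clock]
    return result
```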
Ori Messinger 8a1ca3d26c rocm_smi.py: Show 'Out of Spec' warning only if required
Use the default power cap exposed via sysfs to determine when to
show the 'Out of Spec' warning.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I0fa3612b50e230856b0d5a390f876b35268d9587


[ROCm/amdsmi commit: b71e07b3fb]
2021-04-22 14:44:05 -04:00
Ori Messinger 9537c89a6b ROCm SMI LIB: Add Default GPU Power Cap
Implement default GPU power cap functionality in the LIB.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia6b3420beb0e4df5559c3e6d11d0667972590b53


[ROCm/amdsmi commit: 83cd2fe4f1]
2021-04-22 10:49:55 -04:00
Harish Kasiviswanathan 52dc52654d Add energy counter resolution to rsmi_dev_energy_count_get
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I03b70968257db7a45e21d7ba62542cdedd18eb85


[ROCm/amdsmi commit: 844acbc0d8]
2021-04-22 10:25:06 -04:00
Ori Messinger f225c95878 ROCm SMI Python CLI: Add showevent Functionality
Implement showevent functionality in the ROCm SMI Python CLI.

It can be called using --showevents with any combination of:
VM_FAULT, THERMAL_THROTTLE, and/or GPU_RESET
For example:
./rocm-smi --showevents VM_FAULT, THERMAL_THROTTLE, GPU_RESET

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I905fd9c949e91423b79833a04ab89d6ba3760e62


[ROCm/amdsmi commit: a9e7e5a475]
2021-04-22 10:21:07 -04:00
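
The --showevents usage in f225c95878 can be approximated with argparse. This is a sketch under assumptions: the parser setup, the comma stripping and the `parse_event_types` name are hypothetical and may differ from how rocm_smi.py handles the option. Only the three event names come from the commit message.

```python
import argparse

KNOWN_EVENTS = ("VM_FAULT", "THERMAL_THROTTLE", "GPU_RESET")


def parse_event_types(argv):
    """Parse --showevents values, tolerating trailing commas between names."""
    parser = argparse.ArgumentParser(prog="rocm-smi-sketch")
    parser.add_argument("--showevents", nargs="*", metavar="EVENT", default=None)
    args = parser.parse_args(argv)
    if args.showevents is None:
        return []
    events = []
    for token in args.showevents:
        # "VM_FAULT," arrives as one shell token, so split on commas here.
        for name in token.split(","):
            name = name.strip().upper()
            if not name:
                continue
            if name not in KNOWN_EVENTS:
                parser.error("unknown event type: " + name)
            events.append(name)
    return events
```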
Elena 3eb9426800 [rocm_smi.py] add energy counter
--showenergycounter

Signed-off-by: Elena Sakhnovitch
Change-Id: Iede0f2b06523f7cb2719489a883e9c49722f8d93


[ROCm/amdsmi commit: c80fc54500]
2021-04-21 18:40:19 -04:00
Elena 23d7d4a5ff [rocm_smi.py] Coarse Grain Utilization Counters
--showuse
--showmemuse

====================================
========= % time GPU is busy =======
GPU[0]          : GPU use (%): 0
GPU[0]          : GFX Activity: 0
====================================

Change-Id: I9db115ad78b394469206b22d195781a430b2f1d8


[ROCm/amdsmi commit: 771b4af95c]
2021-04-21 17:23:21 -04:00
Harish Kasiviswanathan 608afb879b Suppress warning message in getFanSpeed function
Many data center cards are fanless. Don't show warning if unable to get
fan speed. The fan speed will be reported as 0

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I53efe67ac88fb0824cf4820430b46c18bc7692df


[ROCm/amdsmi commit: 1c9e384c8f]
2021-04-21 15:29:44 -04:00