Commit graph

66 Commits

Author SHA1 Message Date
Elena Sakhnovitch 48a2251ff6 [rocm_smi.py] remove \r symbol at print
Remove the carriage return at the end of the line in the printLog function.
On Linux, the end of a line is encoded as \n, not \r\n.

Change-Id: If3835d773033b53a7f25b4a0284df359a6f9555d


[ROCm/amdsmi commit: 1aeb27c4c9]
2021-12-08 10:13:56 -05:00
Divya Shikre 58b5a538a7 Add fix to display correct GPU Memory Activity and GFX Activity value.
The driver fills in 0xFF for every metric that is not supported on that ASIC.
So if 0xFF is detected, return RSMI_STATUS_NOT_SUPPORTED.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I86a38148c7a288ea0db94893f685560eaac098ab


[ROCm/amdsmi commit: 7b1daaef96]
2021-11-25 14:28:06 -05:00
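As an illustration of the sentinel check described in 58b5a538a7, the sketch below treats a value whose bytes are all 0xFF as "not supported". The function name, the `width_bytes` parameter, and the use of `None` in place of `RSMI_STATUS_NOT_SUPPORTED` are assumptions made for this example, not code from the library.

```python
from typing import Optional


def read_activity(raw: int, width_bytes: int = 2) -> Optional[int]:
    """Return the metric value, or None when the driver marked it unsupported.

    Assumption: an unsupported field has every byte set to 0xFF, so a
    2-byte field reads back as 0xFFFF. None stands in for the library's
    RSMI_STATUS_NOT_SUPPORTED status.
    """
    sentinel = (1 << (8 * width_bytes)) - 1  # e.g. 0xFFFF for two bytes
    if raw == sentinel:
        return None
    return raw
```

A caller can then print "not supported" instead of a misleading activity percentage when the result is `None`.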
Ori Messinger 7e248102eb ROCm SMI CLI: Fix printErrLog Arguments
This patch removes every erroneous occurrence of a third argument
when calling printErrLog(device, err), since it takes two arguments.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I5971cc68b69c86f37c69f44e4785dabfc82c7955


[ROCm/amdsmi commit: 40eed25a3b]
2021-11-08 12:54:00 -05:00
Elena Sakhnovitch 398df0b9d0 [ROCm-SMI] add --showNodesBw
Display min and max bandwidth between gpu nodes

Signed-off-by: Elena Sakhnovitch
Change-Id: I7289fb83f80e2f899996b7d7560ece670cc5f31f


[ROCm/amdsmi commit: 13cde8429d]
2021-10-29 12:49:35 -04:00
Elena Sakhnovitch ff2bcc16fa [rocm_smi.py] remove repetitive footnote
Print the "Primary die (usually one above or below the secondary) shows
total (primary + secondary) socket power information" footnote only once, not
for every secondary die.

Signed-off-by: Elena Sakhnovitch
Change-Id: Iae9c5c94945ec38ecdb128a576a4eacafc30a044


[ROCm/amdsmi commit: 15e4fe80e1]
2021-10-29 08:32:06 -04:00
Ori Messinger de16bc4552 ROCm SMI CLI: Add --showtopoaccess Functionality
The purpose of this patch is to implement --showtopoaccess
functionality in the CLI, which shows True or False if P2P is
possible between two given GPUs.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I07d70d80ae7b484136b31d5d22780c4990029391


[ROCm/amdsmi commit: e2d9a37e5f]
2021-10-14 11:06:05 -04:00
Elena Sakhnovitch 8b42fe51b5 [rocm_smi.py]: fix fan 255% error
signed-off-by: Elena Sakhnovitch
Change-Id: I265ba32bc3777db5f04f1924547fe432ba78c3d0


[ROCm/amdsmi commit: 2f84906cc2]
2021-09-29 21:11:06 -04:00
Elena Sakhnovitch cda3383b3b [rocm_smi.py]: pep8 formatting
signed-off-by: Elena Sakhnovitch
Change-Id: If12b3371cd6acac16d9f6b3adf5f5cc8df28992f


[ROCm/amdsmi commit: 80140c3b02]
2021-08-26 10:23:58 -04:00
Elena Sakhnovitch 8e8586591a [rocm_smi.py] --showpower error bugfix
Fix error message in -P for secondary die

Signed-off-by: Elena Sakhnovitch
Change-Id: Ica3c0a83b565d2231fad23389b9378056a0f56b3


[ROCm/amdsmi commit: 2db7e2a312]
2021-07-30 00:08:14 -04:00
Elena Sakhnovitch fc4aa3d271 [rocm_smi.py] add secondary die check.
Signed-off-by: Elena Sakhnovitch <Elena.Sakhnovitch@amd.com>
Change-Id: I46618002c1967ec115db88becbaba9e7c0a08af1


[ROCm/amdsmi commit: b59e752122]
2021-07-29 17:46:12 -04:00
Harish Kasiviswanathan 419b720ea5 rocm_smi.py: Remove extraneous line during process termination
Near the end of process termination, the subprocess module fails
to find the process. This results in an extraneous line containing the
character 'b' being printed. Fix this.

BUG: SWDEV-296409

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I39aacf8ae948a5acec0aa93296cc0e0aec88b3ef


[ROCm/amdsmi commit: a03acf2c07]
2021-07-27 16:26:49 -04:00
Ori Messinger 546e11c058 ROCm SMI Python CLI: Fix printLog Collisions
Python's default 'print' implementation is not thread safe, causing
empty lines to be printed during multithreaded code execution.

This fixes the --showevents output for multi-GPU systems.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I72f7341cdf4401f1fed4cd8f7d7a4a90bf9a3a4c


[ROCm/amdsmi commit: 95348f37cc]
2021-07-21 23:58:07 -04:00
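One common way to avoid the kind of interleaving described in 546e11c058 is to serialize output through a shared lock. This is a minimal sketch of that technique, not the actual printLog implementation; `print_log` and `_print_lock` are hypothetical names.

```python
import sys
import threading

# A single lock shared by every thread that wants to print.
_print_lock = threading.Lock()


def print_log(message: str, stream=None) -> None:
    """Write one complete line while holding the shared lock."""
    out = stream if stream is not None else sys.stdout
    with _print_lock:
        # Emit the text and the newline in a single write so another
        # thread cannot slip output in between them.
        out.write(message + "\n")
        out.flush()
```

With one event-listener thread per GPU, each call produces exactly one intact line, so no stray empty lines appear between messages.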
Ori Messinger 0cdc8fb26c ROCm SMI Python CLI: Add Zero Padding to Device Model
Use zero padding for the hexadecimal value 'device_model' inside
showProductName with a padding length of 4.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I962b94d414c6ba050d951486ad9e7559123f8850


[ROCm/amdsmi commit: 03ae187a35]
2021-07-17 04:29:52 -04:00
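The zero padding described in 0cdc8fb26c can be expressed with Python's format specification. The sketch below assumes a lowercase rendering with a `0x` prefix; the commit only states the padding length of 4, so the prefix, the case, and the function name are illustrative.

```python
def format_device_model(device_model: int) -> str:
    """Format a device model ID as hexadecimal, zero-padded to 4 digits."""
    # "04x" means: pad with zeros to width 4, lowercase hexadecimal.
    return "0x{:04x}".format(device_model)
```

For example, a model ID of `0x66` is rendered as `0x0066` rather than `0x66`.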
Divya Shikre d356da056d Add fix to show usage of setperfdeterminism functionality in --help command
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ife93c887eea2a9aae69f2923dba45c7cde4838d3


[ROCm/amdsmi commit: 686e6ac654]
2021-05-12 17:29:37 -04:00
Kent Russell 23635d1f90 rocm_smi.py: Fix gpu reset error
Since device is a list, we need to pass a single item to the isAmdGpu
function.

Fixes: ffbe481241 "rocm_smi.py: Don't try to reset non-AMD GPUs"

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I19a74377636ff4589f11d092f41e1d35c1acb307


[ROCm/amdsmi commit: 242d94a668]
2021-04-28 07:44:55 -04:00
Kent Russell 4de1e4094a rocm_smi.py: Don't try to print absent clock files
Instead of throwing "Unsupported clock" errors for ASICs that don't
support a certain clock type (e.g. dcefclk on MI-series), just dump the
warning to logging.debug and don't try to read the clock

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: If3cb9a472b03aa535a76fc24bcd9f77122090634


[ROCm/amdsmi commit: b931380f02]
2021-04-23 10:19:04 -04:00
Ori Messinger 8a1ca3d26c rocm_smi.py: Show 'Out of Spec' warning only if required
Use the default power cap exposed via sysfs to determine when to
show the 'Out of Spec' warning.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I0fa3612b50e230856b0d5a390f876b35268d9587


[ROCm/amdsmi commit: b71e07b3fb]
2021-04-22 14:44:05 -04:00
Ori Messinger f225c95878 ROCm SMI Python CLI: Add showevent Functionality
Implement showevent functionality in the ROCm SMI Python CLI.

It can be called using --showevents with any combination of:
VM_FAULT, THERMAL_THROTTLE, and/or GPU_RESET
For example:
./rocm-smi --showevents VM_FAULT, THERMAL_THROTTLE, GPU_RESET

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I905fd9c949e91423b79833a04ab89d6ba3760e62


[ROCm/amdsmi commit: a9e7e5a475]
2021-04-22 10:21:07 -04:00
Elena 3eb9426800 [rocm_smi.py] add energy counter
--showenergycounter

Signed-off-by: Elena Sakhnovitch
Change-Id: Iede0f2b06523f7cb2719489a883e9c49722f8d93


[ROCm/amdsmi commit: c80fc54500]
2021-04-21 18:40:19 -04:00
Elena 23d7d4a5ff [rocm_smi.py] Coarse Grain Utilization Counters
--showuse
--showmemuse

====================================
========= % time GPU is busy =======
GPU[0]          : GPU use (%): 0
GPU[0]          : GFX Activity: 0
====================================

Change-Id: I9db115ad78b394469206b22d195781a430b2f1d8


[ROCm/amdsmi commit: 771b4af95c]
2021-04-21 17:23:21 -04:00
Harish Kasiviswanathan 608afb879b Suppress warning message in getFanSpeed function
Many data center cards are fanless. Don't show warning if unable to get
fan speed. The fan speed will be reported as 0

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I53efe67ac88fb0824cf4820430b46c18bc7692df


[ROCm/amdsmi commit: 1c9e384c8f]
2021-04-21 15:29:44 -04:00
Divya Shikre 38cee239c7 Update setrange functionality in CLI
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ic942bd76297c50caf189bfc0972d30dc42d91f32


[ROCm/amdsmi commit: 56c132873b]
2021-04-20 15:39:05 -04:00
Divya Shikre 86e595089b Add support for mi200 clocks being continuous.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ifb7570054572239b9f48eaefe51e879fb3569031


[ROCm/amdsmi commit: dc431506f5]
2021-04-20 13:12:27 -04:00
Divya Shikre 3a11b92287 Fix for cli errors - extra args in perf_determinism, undefined variable in setClocks
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Id138cfcbea4384f520537cc045d358024177b1ac


[ROCm/amdsmi commit: d9f7bd0ff4]
2021-04-19 17:32:07 -04:00
Elena ab17fca25f Adding 4 new HBM temperature sensors.
Signed-off-by: Elena Sakhnovitch
Change-Id: Iaea04c38e8c2353e85d8aa2b871fdb82727157de


[ROCm/amdsmi commit: 81c066350f]
2021-04-17 23:58:49 -04:00
Kent Russell ffbe481241 rocm_smi.py: Don't try to reset non-AMD GPUs
This won't work for obvious reasons, so exit with an error instead of
trying to access a file that doesn't exist and segfaulting

Change-Id: Id1230922fa6e9a19e9394280faad88a43c7d2e34


[ROCm/amdsmi commit: c7c2ac5559]
2021-04-13 08:00:17 -04:00
Divya Shikre 0fc1abdced Update performance determinism api as per the modified sysfs interface.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ib0ec5128819644a2ff6c916da9194a7fe1dad795


[ROCm/amdsmi commit: aaf2120117]
2021-04-07 16:38:48 -04:00
Chris Freehill d1e4491505 Handle set freq for double-digit index in rocm_smi.py
rocm_smi.py --set<m|s>clk was treating the freq as a string.
This causes problems in parsing when the index is more than 1
digit. Now, treat the indexes as integers.

Change-Id: Ia0d859d33b685fe90689a86ff1c83980808b1514


[ROCm/amdsmi commit: 11440536cf]
2021-02-23 18:51:29 -06:00
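To illustrate why d1e4491505 treats the indexes as integers: iterating over the string "10" yields the characters "1" and "0", so a multi-digit index can be misread as two separate levels. The helper below is a hypothetical sketch of the integer conversion, not the code from rocm_smi.py.

```python
from typing import List


def parse_level_indexes(args: List[str]) -> List[int]:
    """Convert command-line level arguments such as ['0', '10'] to integers."""
    # int() keeps "10" as the single index 10; it also raises ValueError
    # on non-numeric input, which surfaces bad arguments early.
    return [int(arg) for arg in args]
```

Calling `parse_level_indexes(["0", "10", "2"])` keeps index 10 intact as one value.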
Ori Messinger 42b33ea096 ROCm SMI Python CLI: Fix Lower Power Cap Warning
The purpose of this patch is to fix a power cap bug for --setpoweroverdrive.
This bug occurs when the user attempts to set a lower wattage than the current
or default wattage, which displays an unnecessary warning message.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I730d2c6031b7d7c4af5acf32ecd28da5ca21ab12


[ROCm/amdsmi commit: 20e2d260fb]
2021-01-27 03:24:22 -05:00
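The condition behind the fix in 42b33ea096 can be sketched as a single comparison: warn only when the requested cap is above the default cap. The function and parameter names are hypothetical, and the real CLI may compare against additional limits.

```python
def needs_power_cap_warning(requested_watts: int, default_cap_watts: int) -> bool:
    """Return True only when the requested cap exceeds the default cap."""
    # Lowering the cap (or keeping it at the default) is within spec,
    # so no warning is needed in those cases.
    return requested_watts > default_cap_watts
```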
Ori Messinger d41364d1cf ROCm SMI Python CLI & LIB: Add GPU Reset Functionality
The purpose of this patch is to implement GPU reset functionality
in the LIB, and to call it from the rocm_smi python CLI.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Iaf525f7016f8354a7fd93af0209ca2e97ef4fd56


[ROCm/amdsmi commit: 80f629b9be]
2021-01-26 17:52:24 -05:00
Ori Messinger a5fee40cbb ROCm SMI Python CLI: Fix Fan Speed Bug
The purpose of this patch is to fix a fan speed bug for --showfan.
This bug occurs when the current and/or maximum fan speeds are not
found by the LIB, which displayed an unclear error message.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ied06e460f22391238dd2d86572813e2a5a64f45b


[ROCm/amdsmi commit: 4f297bdeb3]
2021-01-26 08:51:04 -05:00
Kent Russell 8d37749c05 Fix typo in --setmrange documentation
mrange is for MCLK, not SCLK, so fix the typo accordingly

Change-Id: Ib20774b073288a8ec193322f2f767616979c95da


[ROCm/amdsmi commit: a902770f86]
2021-01-25 13:20:20 -05:00
Elena bb879e7f38 ROCm SMI Python CLI: Fix division by zero fan bug
Signed-off-by: Elena Sakhnovitch <Elena.Sakhnovitch@amd.com>
Change-Id: If259ac1ad6d77ce85b2b7616d972b6e7964a9f78


[ROCm/amdsmi commit: 61cdfff562]
2021-01-20 18:21:23 -05:00
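A division-by-zero guard of the kind named in bb879e7f38 might look like the sketch below. It assumes the percentage is computed as current speed over maximum speed; the function name and the choice to return `None` are illustrative.

```python
from typing import Optional


def fan_speed_percent(current: int, maximum: int) -> Optional[float]:
    """Return fan speed as a percentage, or None if it cannot be computed."""
    if maximum <= 0:
        # A fanless card or a failed read can report a maximum of 0.
        return None
    return round(100.0 * current / maximum, 2)
```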
Kent Russell 2ecaedb600 rocm-smi: Try to find librocm_smi64.so in a few locations
Instead of looking solely in ../lib, try looking in any /opt folder as a
backup option. This is a little more robust and hopefully leads to fewer
issues trying to find the lib

Change-Id: Ie0d3944b48b32d9965917e5c831388838b6d4ef7


[ROCm/amdsmi commit: c7b6b47211]
2021-01-08 15:29:11 -05:00
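The fallback search described in 2ecaedb600 can be sketched as an ordered list of candidate paths. The `/opt/*/lib` pattern is an assumption about what "any /opt folder" means, and `find_library` is a hypothetical helper rather than the function used in rocm-smi.

```python
import glob
import os
from typing import Optional

LIB_NAME = "librocm_smi64.so"


def find_library(script_dir: str, opt_root: str = "/opt") -> Optional[str]:
    """Return the first existing path to the library, or None."""
    # Preferred location: ../lib relative to the script.
    candidates = [os.path.join(script_dir, "..", "lib", LIB_NAME)]
    # Backup: a lib directory inside any folder directly under opt_root.
    candidates += sorted(glob.glob(os.path.join(opt_root, "*", "lib", LIB_NAME)))
    for path in candidates:
        if os.path.isfile(path):
            return os.path.normpath(path)
    return None
```

The returned path could then be handed to `ctypes.CDLL`; returning `None` lets the caller report a clear error instead of failing on import.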
Ori Messinger 848697c287 ROCm SMI Python CLI: Fix --showclkfrq/--showclocks Failure
The purpose of this patch is to check if each valid clock is supported
on the GPU before attempting to retrieve its value.

The valid clocks are: dcefclk, fclk, mclk, pcie, sclk, socclk.

This should get rid of the 'one or more commands failed' message when
running --showclkfrq or --showclocks on a machine that doesn't support
all the possible valid clocks.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I1fb10989fc1a36f38b68a23e17e6e600ed0ac85b


[ROCm/amdsmi commit: 3b52c895cc]
2020-12-18 17:46:23 -05:00
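The check-before-read pattern from 848697c287 is sketched below. The list of clock names comes from the commit message; `is_supported` and `read_clock` are hypothetical callbacks standing in for the library calls.

```python
from typing import Callable, Dict

VALID_CLOCKS = ["dcefclk", "fclk", "mclk", "pcie", "sclk", "socclk"]


def read_supported_clocks(
    is_supported: Callable[[str], bool],
    read_clock: Callable[[str], str],
) -> Dict[str, str]:
    """Read only the clocks the device reports as supported."""
    values = {}
    for clock in VALID_CLOCKS:
        if not is_supported(clock):
            continue  # skip quietly instead of counting it as a failure
        values[clock] = read_clock(clock)
    return values
```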
Ori Messinger 348ab2cf8e ROCm SMI Python CLI: Add Json Functionality to showPids
The purpose of this patch is to add Json functionality to showPids
by modifying the print2DArray function to use printSysLog.

Change-Id: Ie834d209b29332777c3f13f776f81c37d94b01b6
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: d9da490214]
2020-12-18 02:16:04 -05:00
Chris Freehill 24b23e90a8 Make rocm_smi.py handle disappearing PIDs
rocm_smi.py had an issue where it gets process information
in 2 different places. If the process disappears in between
those 2 places, a crash would occur.

This fix gracefully returns in this scenario.
Reading the file information from /proc instead of using
the python subProcess() call was considered, but it has the
drawback of not being able to read the process names of
processes not owned by the caller.

Change-Id: If812c4641f00da37e99defb0740f670107c8a797


[ROCm/amdsmi commit: db6d8d36ea]
2020-12-10 20:53:45 -06:00
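The graceful return described in 24b23e90a8 can be sketched by catching the error raised when a PID has vanished between two lookups. This example assumes the process name is obtained with the `ps` utility through `subprocess`; the exact command used by rocm_smi.py is not stated in the commit, and `get_process_name` is a hypothetical name.

```python
import subprocess
from typing import Optional


def get_process_name(pid: int) -> Optional[str]:
    """Return the command name for pid, or None if the process is gone."""
    try:
        output = subprocess.check_output(
            ["ps", "-p", str(pid), "-o", "comm="],
            stderr=subprocess.DEVNULL,
        )
    except (subprocess.CalledProcessError, OSError):
        # ps exits non-zero when the PID no longer exists; return
        # gracefully instead of letting the exception propagate.
        return None
    name = output.decode().strip()
    return name or None
```

A caller that builds the PID table can simply skip entries for which `None` is returned.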
Divya Shikre 08ba8bed83 Fix for syntax error caused due to performance determinism commit.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I02fbfec667e7f96ab0d0662036cf339a56025ba6


[ROCm/amdsmi commit: a0d10e021b]
2020-12-02 16:31:01 -05:00
Divya Shikre 4fd8d18e22 Adding Performance Determinism Mode to rocm_smi lib, CLI & gtest.
A special mode of operation that achieves minimal performance variation by letting
the user provide the desired frequency to be set as the soft limit.
The user controls entry to and exit from the mode via rocm-smi, using the
mechanism below.

Enter performance determinism mode:
- hold a lock
- write performance_determinism to power_dpm_force_performance_level
- write input clk_freq to pp_dpm_sclk
- release lock

Exit performance determinism mode:
- hold a lock
- write auto to power_dpm_force_performance_level
- release lock

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia24e27954cdf1c4337ffc83d8948fbdfaf4552d2


[ROCm/amdsmi commit: 60d0f3052f]
2020-12-02 11:11:00 -05:00
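The enter and exit sequences listed in 4fd8d18e22 can be sketched as follows. The file names and written values are taken from the commit message; the in-process `threading.Lock`, the `device_dir` parameter, and writing the frequency as a plain decimal string are assumptions, since the commit does not say which lock the library holds or what format the file expects.

```python
import os
import threading

# Stand-in for whatever lock the library actually holds.
_sysfs_lock = threading.Lock()


def _write(device_dir: str, name: str, value: str) -> None:
    with open(os.path.join(device_dir, name), "w") as handle:
        handle.write(value)


def enter_perf_determinism(device_dir: str, clk_freq: int) -> None:
    """Enter: hold the lock, set the level, set the clock, release."""
    with _sysfs_lock:
        _write(device_dir, "power_dpm_force_performance_level",
               "performance_determinism")
        _write(device_dir, "pp_dpm_sclk", str(clk_freq))


def exit_perf_determinism(device_dir: str) -> None:
    """Exit: hold the lock, restore the 'auto' level, release."""
    with _sysfs_lock:
        _write(device_dir, "power_dpm_force_performance_level", "auto")
```

Using the lock as a context manager guarantees it is released even if one of the writes raises an exception.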
Ori Messinger ffac195623 ROCm SMI Python CLI: Fix --gpureset Bug
The purpose of this patch is to fix a bug present when using the
--gpureset option on a machine that has both an AMD GPU and a
non-AMD GPU (such as a motherboard's integrated graphics).

This bug occurs due to non-AMD GPUs being ignored by the LIB when
enumerating a list of valid AMD GPUs, causing the gpuReset method
to attempt a reset on the integrated graphics.

Change-Id: I1c03a3c41f905786e3c8246ec0c2b42786ff1770
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: c0c1fd2098]
2020-11-25 11:21:36 -05:00
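One way to guard a reset against non-AMD devices, as motivated by ffac195623, is to compare the PCI vendor ID with AMD's ID (0x1002). The sketch assumes the ID is read from a sysfs file such as /sys/class/drm/card0/device/vendor; the function name and error handling are illustrative and not taken from the CLI.

```python
AMD_VENDOR_ID = 0x1002  # PCI vendor ID assigned to AMD


def is_amd_gpu(vendor_file: str) -> bool:
    """Return True if the sysfs vendor file holds AMD's PCI vendor ID."""
    try:
        with open(vendor_file) as handle:
            # The file contains a hexadecimal string such as "0x1002".
            return int(handle.read().strip(), 16) == AMD_VENDOR_ID
    except (OSError, ValueError):
        # Missing or unreadable file: treat the device as non-AMD.
        return False
```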
Ori Messinger 273ab71c38 ROCm SMI Python CLI: GPU showproductname SKU Fix
The purpose of this patch is to fix a bug present when using the
--showproductname option, resulting in the following error:
undefined symbol: rsmi_dev_sku_get

This bug fix uses a substring from vbios version instead of using the
LIB's rsmi_dev_sku_get to avoid getting the undefined symbol error.

Change-Id: I56d72a481d5dde44c56106ae297f4bcff40ac15f
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: 015c7d59d0]
2020-11-12 15:38:52 -05:00
Chris Freehill b5e575875c Use relative path to find librocm_smi
Change-Id: Ifca3f54d680a802c1c5fa360d17e64338b9ac9a8


[ROCm/amdsmi commit: 438d28612f]
2020-10-29 14:36:48 -05:00
Elena Sakhnovitch 61b8cdbe43 ROCm SMI Python CLI: --rasinject partial support
This implementation is copied directly from the previous rocm_smi.py
script. This feature is experimental and will be updated or removed in
future releases.

Signed-off-by: Elena Saknovitch
Change-Id: I5cd38266946302bc4123aeafaa825e13f704235e


[ROCm/amdsmi commit: 4117719edd]
2020-10-22 17:22:13 -04:00
Chris Freehill cac03f5a0e Add new XGMI counter events to rsmiBindings.py
Also, correct RSMI_EVNT_LAST to new value.

Change-Id: I9f693cb398bba583201f6b5b5f0e2d45ede2e4e0


[ROCm/amdsmi commit: 1982fdc4fb]
2020-10-22 17:21:50 -04:00
Divya Shikre 33ccef9a1e Fix for weight/hops not being updated
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I333d49fa011b85d41eca63c082c0615febe2f7e9


[ROCm/amdsmi commit: 94291bf882]
2020-10-20 15:01:06 -04:00
Ori Messinger 297f89a62a ROCm SMI Python CLI: Add CU Occupancy to showPids function
The purpose of this patch is to add CU occupancy functionality to showPids
by calling rsmi_compute_process_info_get from the LIB.

Now showPids shows the following information on (KFD compute) processes:
PID, process name, GPU(s), VRAM used, SDMA used, and CU occupancy.

Change-Id: Ie005901e0eb946ef0fbb3523245ca451c4eed595
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: 20ae72b078]
2020-10-15 21:21:32 -04:00
Ramesh Errabolu c69c210a6e Update ROCm SMI library with ability to read CU occupancy
Change-Id: Ib9882fa2d81c13604af282279bfa116bc2fd05a4


[ROCm/amdsmi commit: 328878343c]
2020-10-14 09:33:37 -04:00
Ori Messinger 6ea0c8b524 ROCm SMI Python CLI: Check for amdgpu Driver Initialization
The purpose of this patch is to check for amdgpu driver initialization
before attempting to initialize rocmsmi in the CLI.

Additionally, since the '--help' functionality does not rely on anything
external to the CLI, it can now be called without the driver initialized.

Change-Id: I2fcce60ca6d9f77835549e3558c4bb1747499c5c
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: e3c9aec714]
2020-10-08 11:17:45 -04:00
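A driver-initialization check like the one introduced in 6ea0c8b524 could be sketched by reading the kernel module state from sysfs. Whether the CLI uses this exact file is an assumption; /sys/module/<name>/initstate is a standard Linux interface that reads "live" once a module has finished loading.

```python
def driver_initialized(initstate_path: str = "/sys/module/amdgpu/initstate") -> bool:
    """Return True if the amdgpu kernel module reports the 'live' state."""
    try:
        with open(initstate_path) as handle:
            return handle.read().strip() == "live"
    except OSError:
        # The file is absent when the module is not loaded at all.
        return False
```

Argument parsing can run before this check, so an option such as --help still works on a machine without the driver.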
Kent Russell d41e3be1b0 Check FRU-based product information if available
WKS and server cards have an FRU with product information, so try to use
that for product name and product SKU if it exists.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I40bbd3bf62f4cb02e96015ed1630112691cacbc3


[ROCm/amdsmi commit: df7c3434cd]
2020-10-07 14:09:23 -04:00
Ori Messinger eca48bfd0b ROCm SMI Python CLI: Implement --setclock for all Valid Clocks
The purpose of this patch is to implement --setclock functionality for
all of the valid clocks (can be set with --setclock TYPE LEVEL).

The valid clocks are: dcefclk, fclk, mclk, pcie, sclk, socclk.
This functionality uses the existing 'setClocks' method.

Change-Id: I1d62baf372427ac1c0642c26a949663b673ef335
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: 4ed1c1d492]
2020-09-22 15:41:51 -04:00