When a process is in the final stage of terminating, the subprocess
module fails to find it. This results in an extraneous line containing
the character 'b' being printed. Fix this.
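A stray 'b' of this kind typically appears when the bytes returned by
subprocess are converted with str() instead of being decoded. A minimal
sketch, assuming the lookup yields an empty bytes object once the
process is gone; decode_output is a hypothetical helper, not a function
taken from rocm_smi.py:

```python
def decode_output(raw):
    """Turn subprocess output into stripped text; empty or missing output becomes ''."""
    if not raw:
        return ''
    if isinstance(raw, bytes):
        # Decode instead of calling str(), which would keep the b'' wrapper.
        return raw.decode('utf-8', errors='replace').strip()
    return str(raw).strip()
```

With this helper, callers can skip printing whenever the result is an
empty string.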
BUG: SWDEV-296409
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I39aacf8ae948a5acec0aa93296cc0e0aec88b3ef
Python's default 'print' implementation is not thread safe, causing
empty lines to be printed during multithreaded code execution.
This fixes the --showevents output for multi-GPU systems.
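One way to avoid the problem is to serialize output through a lock and
emit each line with a single write. This is a sketch under that
assumption, not necessarily how the patch does it; print_line and
_print_lock are hypothetical names:

```python
import sys
import threading

_print_lock = threading.Lock()


def print_line(message, stream=None):
    """Write one complete line while holding a lock so threads cannot interleave."""
    stream = stream if stream is not None else sys.stdout
    with _print_lock:
        # Text and newline go out in one write, so no other thread can slip between them.
        stream.write(message + '\n')
        stream.flush()
```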
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I72f7341cdf4401f1fed4cd8f7d7a4a90bf9a3a4c
Use zero padding for the hexadecimal value 'device_model' inside
showProductName with a padding length of 4.
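As an illustration of the padding, assuming the value is an integer and
that a 0x prefix is wanted in the output (both are assumptions, and
format_device_model is a hypothetical name):

```python
def format_device_model(device_model):
    """Format an integer model ID as four zero-padded hex digits with a 0x prefix."""
    return '0x%04x' % device_model
```

Values that already have four hex digits are unchanged; shorter ones
gain leading zeros.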
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I962b94d414c6ba050d951486ad9e7559123f8850
Since device is a list, we need to pass a single item to the isAmdGpu
function.
Fixes: c7c2ac5559 "rocm_smi.py: Don't try to reset non-AMD GPUs"
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I19a74377636ff4589f11d092f41e1d35c1acb307
Instead of throwing "Unsupported clock" errors for ASICs that don't
support a certain clock type (e.g. dcefclk on MI-series), just dump the
warning to logging.debug and don't try to read the clock.
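The behaviour described above can be sketched as follows. The names
read_clock, supported, and reader are hypothetical; only logging.debug
comes from the description:

```python
import logging


def read_clock(clock, supported, reader):
    """Return the clock value, or None (with a debug log) if the ASIC lacks it."""
    if clock not in supported:
        # Not an error: this ASIC simply does not expose the clock.
        logging.debug('Clock %s is not supported on this device, skipping', clock)
        return None
    return reader(clock)
```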
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: If3cb9a472b03aa535a76fc24bcd9f77122090634
Use the default power cap exposed via sysfs to determine when to
show the 'Out of Spec' warning.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I0fa3612b50e230856b0d5a390f876b35268d9587
Implement showevent functionality in the ROCm SMI Python CLI.
It can be called using --showevents with any combination of:
VM_FAULT, THERMAL_THROTTLE, and/or GPU_RESET
For example:
./rocm-smi --showevents VM_FAULT, THERMAL_THROTTLE, GPU_RESET
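In the example above the shell passes each comma-terminated word as its
own argument, so the names need to be cleaned up before use. A minimal
sketch, assuming trailing commas and letter case should be tolerated;
normalize_events and VALID_EVENTS are hypothetical names:

```python
VALID_EVENTS = ('VM_FAULT', 'THERMAL_THROTTLE', 'GPU_RESET')


def normalize_events(tokens):
    """Strip commas, upper-case each token, and keep only known event names."""
    events = []
    for token in tokens:
        name = token.replace(',', '').strip().upper()
        if name in VALID_EVENTS and name not in events:
            events.append(name)
    return events
```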
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I905fd9c949e91423b79833a04ab89d6ba3760e62
Many data center cards are fanless. Don't show a warning if unable to
get the fan speed. The fan speed will be reported as 0.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I53efe67ac88fb0824cf4820430b46c18bc7692df
This won't work for obvious reasons, so exit with an error instead of
trying to access a file that doesn't exist and segfaulting
Change-Id: Id1230922fa6e9a19e9394280faad88a43c7d2e34
rocm_smi.py --set<m|s>clk was treating the frequency index as a string.
This causes problems in parsing when the index has more than one
digit. Now, treat the indexes as integers.
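The exact failure is not spelled out here, but a multi-digit string such
as '10' splits into '1' and '0' when iterated and sorts before '2' when
compared as text. A minimal sketch of converting up front; parse_levels
is a hypothetical helper:

```python
def parse_levels(args):
    """Convert command-line level arguments to integers, rejecting non-numeric input."""
    levels = []
    for arg in args:
        if not str(arg).isdigit():
            raise ValueError('invalid clock level: %r' % (arg,))
        levels.append(int(arg))
    return levels
```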
Change-Id: Ia0d859d33b685fe90689a86ff1c83980808b1514
The purpose of this patch is to fix a power cap bug for --setpoweroverdrive.
This bug occurs when the user attempts to set a lower wattage than the current
or default wattage, which displays an unnecessary warning message.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I730d2c6031b7d7c4af5acf32ecd28da5ca21ab12
The purpose of this patch is to implement GPU reset functionality
in the LIB, and to call it from the rocm_smi python CLI.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Iaf525f7016f8354a7fd93af0209ca2e97ef4fd56
The purpose of this patch is to fix a fan speed bug for --showfan.
This bug occurs when the current and/or maximum fan speeds are not
found by the LIB, which results in an unclear error message.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ied06e460f22391238dd2d86572813e2a5a64f45b
Instead of looking solely in ../lib, try looking in any /opt folder as a
backup option. This is a little more robust and hopefully leads to fewer
issues trying to find the lib
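The lookup order can be sketched as below. The file name
librocm_smi64.so, the function name find_library, and the use of
os.walk are assumptions made for illustration:

```python
import os


def find_library(primary_dir, fallback_root='/opt', name='librocm_smi64.so'):
    """Return the first path to the library, trying primary_dir before fallback_root."""
    candidate = os.path.join(primary_dir, name)
    if os.path.isfile(candidate):
        return candidate
    # Backup option: search every folder under the fallback root.
    for root, _dirs, files in os.walk(fallback_root):
        if name in files:
            return os.path.join(root, name)
    return None
```

Returning None rather than raising lets the caller decide how to report
a missing library.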
Change-Id: Ie0d3944b48b32d9965917e5c831388838b6d4ef7
The purpose of this patch is to check if each valid clock is supported
on the GPU before attempting to retrieve its value.
The valid clocks are: dcefclk, fclk, mclk, pcie, sclk, socclk.
This should get rid of the 'one or more commands failed' message when
running --showclkfrq or --showclocks on a machine that doesn't support
all the possible valid clocks.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I1fb10989fc1a36f38b68a23e17e6e600ed0ac85b
The purpose of this patch is to add JSON functionality to showPids
by modifying the print2DArray function to use printSysLog.
Change-Id: Ie834d209b29332777c3f13f776f81c37d94b01b6
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
rocm_smi.py had an issue where it gets process information
in 2 different places. If the process disappears in between
those 2 places, a crash would occur.
This fix gracefully returns in this scenario.
Reading the file information from /proc instead of using
the Python subprocess module was considered, but it has the
drawback of not being able to read the process names of
processes not owned by the caller.
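A graceful return of this kind could look like the sketch below. The
ps invocation and the names get_process_name and run are assumptions,
not the script's actual code; run is injectable so the lookup can be
exercised without a live process:

```python
import subprocess


def get_process_name(pid, run=subprocess.check_output):
    """Return the command name for pid, or None if the process no longer exists."""
    try:
        raw = run(['ps', '-p', str(pid), '-o', 'comm='])
    except (subprocess.CalledProcessError, OSError):
        # The process exited between the two lookups (or ps is unavailable).
        return None
    name = raw.decode('utf-8', errors='replace').strip()
    return name or None
```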
Change-Id: If812c4641f00da37e99defb0740f670107c8a797
A special mode of operation that achieves minimal performance variation by
letting the user provide the desired frequency to be set as the soft limit.
rocm-smi provides a mechanism for the user to enter and exit performance
determinism mode, as described below.
Enter performance determinism mode:
- hold a lock
- write performance_determinism to power_dpm_force_performance_level
- write input clk_freq to pp_dpm_sclk
- release lock
Exit performance determinism mode:
- hold a lock
- write auto to power_dpm_force_performance_level
- release lock
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia24e27954cdf1c4337ffc83d8948fbdfaf4552d2
The purpose of this patch is to fix a bug present when using the
--gpureset option on a machine that has both an AMD GPU and a
non-AMD GPU (such as a motherboard's integrated graphics).
This bug occurs due to non-AMD GPUs being ignored by the LIB when
enumerating a list of valid AMD GPUs, causing the gpuReset method
to attempt a reset on the integrated graphics.
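One way to guard against this is to check each card's PCI vendor ID
before resetting it. A minimal sketch, assuming the vendor is read from
the card's sysfs device/vendor file; is_amd_gpu and resettable_cards are
hypothetical names, while 0x1002 is the PCI vendor ID assigned to AMD:

```python
import os

AMD_VENDOR_ID = 0x1002  # PCI vendor ID assigned to AMD/ATI


def is_amd_gpu(card_dir):
    """Return True if the card's sysfs vendor file reports the AMD vendor ID."""
    try:
        with open(os.path.join(card_dir, 'device', 'vendor')) as handle:
            return int(handle.read().strip(), 16) == AMD_VENDOR_ID
    except (OSError, ValueError):
        return False


def resettable_cards(card_dirs):
    """Check each card individually and keep only the AMD ones."""
    return [card for card in card_dirs if is_amd_gpu(card)]
```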
Change-Id: I1c03a3c41f905786e3c8246ec0c2b42786ff1770
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
The purpose of this patch is to fix a bug present when using the
--showproductname option, resulting in the following error:
undefined symbol: rsmi_dev_sku_get
This bug fix uses a substring from vbios version instead of using the
LIB's rsmi_dev_sku_get to avoid getting the undefined symbol error.
Change-Id: I56d72a481d5dde44c56106ae297f4bcff40ac15f
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
This implementation is copied directly from the previous rocm_smi.py
script. This feature is experimental and will be updated or removed in
future releases.
Signed-off-by: Elena Saknovitch
Change-Id: I5cd38266946302bc4123aeafaa825e13f704235e
The purpose of this patch is to add CU occupancy functionality to showPids
by calling rsmi_compute_process_info_get from the LIB.
Now showPids shows the following information on (KFD compute) processes:
PID, process name, GPU(s), VRAM used, SDMA used, and CU occupancy.
Change-Id: Ie005901e0eb946ef0fbb3523245ca451c4eed595
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
The purpose of this patch is to check for amdgpu driver initialization
before attempting to initialize rocmsmi in the CLI.
Additionally, since the '--help' functionality does not rely on anything
external to the CLI, it can now be called without the driver initialized.
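A check of this kind might read the module's initstate file, which the
kernel sets to "live" once a module has finished loading. The path and
the function name driver_initialized are assumptions made for
illustration:

```python
def driver_initialized(initstate_path='/sys/module/amdgpu/initstate'):
    """Return True if the module's initstate file exists and reports 'live'."""
    try:
        with open(initstate_path) as handle:
            return 'live' in handle.read()
    except OSError:
        # Missing file: the module is not loaded at all.
        return False
```

Handling --help before calling this check keeps the help text available
on machines without the driver.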
Change-Id: I2fcce60ca6d9f77835549e3558c4bb1747499c5c
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
WKS and server cards have an FRU with product information, so try to use
that for product name and product SKU if it exists.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I40bbd3bf62f4cb02e96015ed1630112691cacbc3
The purpose of this patch is to implement --setclock functionality for
all of the valid clocks (can be set with --setclock TYPE LEVEL).
The valid clocks are: dcefclk, fclk, mclk, pcie, sclk, socclk.
This functionality uses the existing 'setClocks' method.
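How a TYPE LEVEL pair might be parsed and validated is sketched below
with argparse. This is not the CLI's actual parser; parse_setclock and
VALID_CLOCKS are hypothetical names, and the clock list is the one given
above:

```python
import argparse

VALID_CLOCKS = ('dcefclk', 'fclk', 'mclk', 'pcie', 'sclk', 'socclk')


def parse_setclock(argv):
    """Parse '--setclock TYPE LEVEL' and validate both values."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--setclock', nargs=2, metavar=('TYPE', 'LEVEL'))
    args = parser.parse_args(argv)
    if args.setclock is None:
        return None
    clock, level = args.setclock
    if clock not in VALID_CLOCKS:
        parser.error('invalid clock type: %s' % clock)
    if not level.isdigit():
        parser.error('invalid clock level: %s' % level)
    return clock, int(level)
```

For example, parse_setclock(['--setclock', 'sclk', '3']) yields the
clock name and an integer level that can then be handed to a setter.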
Change-Id: I1d62baf372427ac1c0642c26a949663b673ef335
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
-Add divider between devices for --showclocks to increase readability.
-Fix fan rounding error
-Fix spaces to comply with coding standard
-Fix @param description error in topo functions
-JSON result for topology:
{
  "card0": {
    "(Topology) Numa Node": "0",
    "(Topology) Numa Affinity": "4294967295"
  },
  "card1": {
    "(Topology) Numa Node": "0",
    "(Topology) Numa Affinity": "4294967295"
  },
  "system": {
    "(Topology) Weight between DRM devices 0 and 1": "40",
    "(Topology) Hops between DRM devices 0 and 1": "2",
    "(Topology) Link type between DRM devices 0 and 1": "PCIE"
  }
}
Signed-off-by: Elena Sakhnovitch <Elena.Sakhnovitch@amd.com>
Change-Id: I711c100362826ed729ff90edd407009237d64f8f
During rocm_smi_lib installation the path should be set using ldconfig
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I0cab18f492013b783d1ce632591ce295f934a168