Specify in the header file that timestamp resolution is in nanoseconds.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4db00a07c0b5c43ae23c98213f2fbbcf93110234
[ROCm/rocm_smi_lib commit: 14201290a2]
Implement default GPU power cap functionality in rsmitst.
It is available in the "rsmitstReadOnly.TestPowerRead" test, and
is displayed as: "Default Power Cap: #uW" (where uW is microwatts).
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I564ea3785f1a93dfd30587634057516549fa762c
[ROCm/rocm_smi_lib commit: 5b42cdf780]
Since device is a list, we need to pass a single item to the isAmdGpu
function.
Fixes: 17bdc065a1 "rocm_smi.py: Don't try to reset non-AMD GPUs"
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I19a74377636ff4589f11d092f41e1d35c1acb307
[ROCm/rocm_smi_lib commit: 242d94a668]
Instead of throwing "Unsupported clock" errors for ASICs that don't
support a certain clock type (e.g. dcefclk on MI-series), log the
warning with logging.debug and skip reading that clock.
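The sketch below shows the skip-and-log behavior in minimal form. It assumes the caller already knows which clock types the ASIC supports. read_clock, the supported set and the reader callback are hypothetical names used for illustration; this is not the actual rocm_smi.py code.

```python
import logging


def read_clock(device, clk_type, supported, reader):
    """Return a clock reading, or None when the ASIC lacks this clock type.

    read_clock, `supported` and `reader` are hypothetical; they illustrate
    the pattern only, not the real rocm_smi.py implementation.
    """
    if clk_type not in supported:
        # Unsupported clock (e.g. dcefclk on MI-series): log at DEBUG level
        # and return without ever calling the reader.
        logging.debug("GPU[%s]: %s is not supported, skipping", device, clk_type)
        return None
    return reader(device, clk_type)
```

Because the message goes to DEBUG, the user only sees it when debug logging is enabled.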
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: If3cb9a472b03aa535a76fc24bcd9f77122090634
[ROCm/rocm_smi_lib commit: b931380f02]
Use the default power cap exposed via sysfs to determine when to
show the "Out of Spec" warning.
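Here is a sketch of the comparison. It assumes the default cap is read in microwatts from the amdgpu hwmon attribute power1_cap_default. That file name is an assumption here, so check the kernel's amdgpu hwmon documentation. Both helper names are hypothetical.

```python
import os


def read_default_power_cap(hwmon_dir):
    """Read the default power cap in microwatts (assumed file: power1_cap_default)."""
    with open(os.path.join(hwmon_dir, "power1_cap_default")) as f:
        return int(f.read().strip())


def out_of_spec_warning(requested_uw, default_cap_uw):
    """Return an "Out of Spec" message only if the request exceeds the default cap."""
    if requested_uw > default_cap_uw:
        return ("Out of Spec: requested %d uW exceeds the default power cap of %d uW"
                % (requested_uw, default_cap_uw))
    return None  # at or below the default cap: no warning needed
```

In this sketch, a request at or below the default cap never produces a warning.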
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I0fa3612b50e230856b0d5a390f876b35268d9587
[ROCm/rocm_smi_lib commit: b71e07b3fb]
Implement default GPU power cap functionality in the LIB.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia6b3420beb0e4df5559c3e6d11d0667972590b53
[ROCm/rocm_smi_lib commit: 83cd2fe4f1]
Implement showevent functionality in the ROCm SMI Python CLI.
It can be called using --showevents with any combination of:
VM_FAULT, THERMAL_THROTTLE, and/or GPU_RESET
For example:
./rocm-smi --showevents VM_FAULT, THERMAL_THROTTLE, GPU_RESET
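When run from a typical shell, the example above arrives as the separate arguments "VM_FAULT,", "THERMAL_THROTTLE," and "GPU_RESET". The normalizer below tolerates those trailing commas. It is a hypothetical sketch, not the actual argument handling in rocm_smi.py.

```python
VALID_EVENTS = ("VM_FAULT", "THERMAL_THROTTLE", "GPU_RESET")


def parse_event_args(args):
    """Split comma/space separated --showevents arguments into unique event names.

    parse_event_args is a hypothetical helper for illustration only.
    """
    events = []
    for arg in args:
        for name in arg.split(","):
            name = name.strip()
            if not name:
                continue  # tolerate trailing commas such as "VM_FAULT,"
            if name not in VALID_EVENTS:
                raise ValueError("unknown event type: %s" % name)
            if name not in events:
                events.append(name)  # keep first occurrence, drop duplicates
    return events
```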
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I905fd9c949e91423b79833a04ab89d6ba3760e62
[ROCm/rocm_smi_lib commit: a9e7e5a475]
Many data center cards are fanless. Don't show a warning if the fan
speed cannot be read; report the fan speed as 0 instead.
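The sketch below shows the fallback in minimal form. It assumes the speed is read from a sysfs-style file whose path is passed in. read_fan_speed is a hypothetical name.

```python
def read_fan_speed(path):
    """Return the fan speed read from `path`, or 0 if it cannot be read.

    Hypothetical helper: a fanless card simply has no readable fan
    attribute, so the failure is silent and 0 is reported.
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return 0  # missing or unparsable file: treat as no fan
```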
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I53efe67ac88fb0824cf4820430b46c18bc7692df
[ROCm/rocm_smi_lib commit: 1c9e384c8f]
The coarse grain utilization counter includes GFX and Memory activity.
Change-Id: I5d09976792d3f4a1c1081651fa24ff857016d4c0
[ROCm/rocm_smi_lib commit: 9bfb9ac297]
This won't work for obvious reasons, so exit with an error instead of
trying to access a file that doesn't exist and segfaulting.
Change-Id: Id1230922fa6e9a19e9394280faad88a43c7d2e34
[ROCm/rocm_smi_lib commit: c7c2ac5559]
rsmi_dev_temp_metric_get() can also support HBM temperatures,
which are retrieved from gpu_metrics.
Change-Id: I96b979296e90cf881523627b41b1a02849676416
[ROCm/rocm_smi_lib commit: da480b4589]
Previously, RSMI assumed that the event counter values returned
from perf were only new events. But in fact, when we read the
counter values, they are running totals. To account for this, we
now record the value we read and take the difference between the
current value and the previously recorded value.
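The bookkeeping described above can be sketched as follows. CounterDelta is a hypothetical helper. It assumes the counter only increases and starts from a known value (0 by default) when it is enabled.

```python
class CounterDelta:
    """Turn running-total counter reads into per-read event counts.

    Hypothetical helper: assumes the underlying counter only increases
    and starts at `initial` when it is enabled.
    """

    def __init__(self, initial=0):
        self._prev = initial  # last running total we recorded

    def update(self, running_total):
        """Return the number of events since the previous read."""
        delta = running_total - self._prev
        self._prev = running_total  # remember this total for the next read
        return delta
```

For example, reads of 100, 250 and 250 yield 100, 150 and 0 new events.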
Change-Id: I1e04b514e89c7c4d4719889f2dae3a1283864e7f
[ROCm/rocm_smi_lib commit: ce475b009c]
rocm_smi.py --set<m|s>clk was treating the frequency level indexes as strings.
This caused parsing problems when an index had more than one
digit. Now the indexes are treated as integers.
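One way that string handling goes wrong is by iterating over the characters of "10", which selects levels 1 and 0. The sketch below contrasts that with parsing each argument as a whole integer and setting one bit per level. It assumes the CLI ultimately builds a level bitmask, as rsmi_dev_gpu_clk_freq_set takes one. Both function names are hypothetical, and the buggy variant is an illustration rather than the original code.

```python
def buggy_levels_to_bitmask(levels):
    """Illustrative bug: walks the characters of each string argument."""
    mask = 0
    for level in levels:
        for ch in level:  # "10" -> '1', '0': selects levels 1 and 0
            mask |= 1 << int(ch)
    return mask


def levels_to_bitmask(levels):
    """Parse each level index as a whole integer and set its bit."""
    mask = 0
    for level in levels:
        index = int(level)  # "10" -> 10
        if index < 0:
            raise ValueError("level index must be non-negative: %d" % index)
        mask |= 1 << index
    return mask
```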
Change-Id: Ia0d859d33b685fe90689a86ff1c83980808b1514
[ROCm/rocm_smi_lib commit: 11440536cf]
rocm_smi_lib is not currently known to compile only
on specific architectures.
Change-Id: I209e8baa063e99ebe5ff09eaf0dc6541770aa829
[ROCm/rocm_smi_lib commit: 7effb405f0]
Previously, during the rsmi_init discovery process, the existence
of an hwmon# directory was used to distinguish GPU nodes from
non-GPU nodes. This isn't reliable in some scenarios. Instead,
the existence of the vbios_version file is now used as an
indicator that the node is indeed a GPU.
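Here is a sketch of the new check, written in Python for illustration although the real discovery code is C++. It assumes nodes are enumerated as cardN entries under /sys/class/drm, with vbios_version in each card's device directory. find_gpu_nodes is a hypothetical name.

```python
import os


def find_gpu_nodes(drm_root="/sys/class/drm"):
    """Return cardN names whose device directory contains vbios_version.

    Hypothetical helper; the directory layout is an assumption.
    """
    gpus = []
    for name in sorted(os.listdir(drm_root)):
        # Skip render nodes and connectors such as card0-DP-1.
        if not name.startswith("card") or "-" in name:
            continue
        vbios = os.path.join(drm_root, name, "device", "vbios_version")
        if os.path.isfile(vbios):
            gpus.append(name)
    return gpus
```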
Change-Id: Icfbe5c42ed0970077b05f25c3d209308a31bec85
[ROCm/rocm_smi_lib commit: ff9546aa62]
The purpose of this patch is to fix a power cap bug in --setpoweroverdrive.
This bug occurs when the user attempts to set a lower wattage than the current
or default wattage, which causes an unnecessary warning message to be displayed.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I730d2c6031b7d7c4af5acf32ecd28da5ca21ab12
[ROCm/rocm_smi_lib commit: 20e2d260fb]
The purpose of this patch is to implement GPU reset functionality
in the LIB, and to call it from the rocm_smi python CLI.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Iaf525f7016f8354a7fd93af0209ca2e97ef4fd56
[ROCm/rocm_smi_lib commit: 80f629b9be]
The purpose of this patch is to fix a fan speed bug in --showfan.
This bug occurs when the current and/or maximum fan speeds are not
found by the LIB, which results in an unclear error message.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ied06e460f22391238dd2d86572813e2a5a64f45b
[ROCm/rocm_smi_lib commit: 4f297bdeb3]
static-libasan doesn't exist, so use the easier-to-remember
shared-libsan, and change static-libasan to static-libsan.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ieef480aacdd770f3bb40673a2e8f8306b308b1c9
[ROCm/rocm_smi_lib commit: ef7f99a7e2]
Instead of looking solely in ../lib, also try looking in any /opt folder as a
backup option. This is a little more robust and should lead to fewer
issues finding the library.
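The search order can be sketched as follows. The sketch assumes the library file is named librocm_smi64.so and that the /opt fallback means any <dir>/lib directory directly under /opt. find_library and its parameters are hypothetical.

```python
import glob
import os


def find_library(script_dir, libname="librocm_smi64.so", opt_root="/opt"):
    """Return the first existing path to libname, preferring ../lib.

    Hypothetical helper: tries <script_dir>/../lib first, then falls back
    to any <opt_root>/*/lib directory (e.g. /opt/rocm-x.y.z/lib).
    """
    candidates = [os.path.join(script_dir, os.pardir, "lib", libname)]
    # sorted() makes the fallback order deterministic across runs.
    candidates += sorted(glob.glob(os.path.join(opt_root, "*", "lib", libname)))
    for path in candidates:
        if os.path.isfile(path):
            return os.path.normpath(path)
    return None  # caller decides how to report a missing library
```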
Change-Id: Ie0d3944b48b32d9965917e5c831388838b6d4ef7
[ROCm/rocm_smi_lib commit: c7b6b47211]
Previously, if we failed to find an expected temperature or voltage label
file, we attempted to re-add a mapping of file index to sensor type.
Attempting to insert a map item that is already present has no effect,
so there should be no functional change. This was a remnant of old code
that should have been deleted.
Change-Id: Ie6f8a62f619a1ae58756e0fd891532434518cf78
[ROCm/rocm_smi_lib commit: bb5132a66c]
The environment variable RSMI_DEBUG_INFINITE_LOOP is introduced
to facilitate debugging RSMI in user applications. When this
environment variable is non-zero, rsmi_init() enters an infinite
loop. At that point, a debugger can be attached and RSMI
can be debugged. This only applies to debug builds.
Change-Id: I23f6dd730fc965764295070de053314a1cc5b6aa
[ROCm/rocm_smi_lib commit: 68095b50e7]
There are some systems that don't have sudo, and since we require sudo
for any of the "set" functionality, add it to "Suggests".
See https://github.com/RadeonOpenCompute/ROCm/issues/1245
Change-Id: I9428b9a68810ee8b51f91bb2e3b63312463161b0
[ROCm/rocm_smi_lib commit: 7b5f220f76]
Now that rocm-smi is deprecated, change the DEB/RPM info so that this package
provides the rocm-smi package. This allows a seamless transition
during ROCm upgrades.
Change-Id: Ia29aab6e45c5974f7b623b786d0649710ba1f7cc
[ROCm/rocm_smi_lib commit: 36a0465127]