Commit graph

332 commits

Author SHA1 Message Date
Divya Shikre 9abb288ace Add fix to ignore error returned when perf determinism is not supported.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I89b6a0a3dbba6fbd4b12ff2e20670eff9f32ed7f


[ROCm/rocm_smi_lib commit: 6edea7a92e]
2021-06-14 12:18:22 -04:00
Divya Shikre 47d033876c Add fix to show usage of setperfdeterminism functionality in --help command
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ife93c887eea2a9aae69f2923dba45c7cde4838d3


[ROCm/rocm_smi_lib commit: 686e6ac654]
2021-05-12 17:29:37 -04:00
Divya Shikre f3c90aa582 Return an error when user tries to set out of range clock values for setsrange functionality
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ibe1075c1d2b6c009332a52b81f4b41f7e93d0756


[ROCm/rocm_smi_lib commit: 462d4adc24]
2021-05-11 12:32:19 -04:00
Harish Kasiviswanathan deac3a055c Add timestamp resolution info in comments
Specify that timestamp resolution is in ns in header file.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4db00a07c0b5c43ae23c98213f2fbbcf93110234


[ROCm/rocm_smi_lib commit: 14201290a2]
2021-05-05 12:32:58 -04:00
Harish Kasiviswanathan 1f7954113f Add support to read gpu_metrics version 1.2
gpu_metrics version 1.2 provides atomic timestamp. Use this timestamp.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I7a1a675f53b93718f34b1f2979173e9064e0ef93


[ROCm/rocm_smi_lib commit: 6b10a7761b]
2021-05-05 12:31:10 -04:00
Harish Kasiviswanathan 1aac6e61d4 Change #define RSMI_GPU_METRICS_API_CONTENT_VER
Change to RSMI_GPU_METRICS_API_CONTENT_VER_1 in preparation for
supporting additional formats

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I4367a2622a0fa41e6b05bc4436ecd24b8c4e30e2


[ROCm/rocm_smi_lib commit: e83cf605c6]
2021-05-04 20:51:10 -04:00
Harish Kasiviswanathan debafec88c Move gpu_metrics functions to different file
No logic change. Only structural change

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Id5e1a678c0888f04081ee06db4521c72b5eb9b16


[ROCm/rocm_smi_lib commit: c416726054]
2021-05-04 20:49:51 -04:00
Ori Messinger ecaf3c52ff ROCm SMI LIB: Add Default Power Cap To rsmitst
Implement default GPU power cap functionality in rsmitst.
It is available in the "rsmitstReadOnly.TestPowerRead" test, and
is displayed as: "Default Power Cap: #uW" (where uW is microwatts).

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I564ea3785f1a93dfd30587634057516549fa762c


[ROCm/rocm_smi_lib commit: 5b42cdf780]
2021-04-28 12:42:34 -04:00
Kent Russell 22485bf114 rocm_smi.py: Fix gpu reset error
Since device is a list, we need to pass a single item to the isAmdGpu
function.

Fixes: 17bdc065a1 "rocm_smi.py: Don't try to reset non-AMD GPUs"

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I19a74377636ff4589f11d092f41e1d35c1acb307


[ROCm/rocm_smi_lib commit: 242d94a668]
2021-04-28 07:44:55 -04:00
Kent Russell 2ba625e569 rocm_smi.py: Don't try to print absent clock files
Instead of throwing "Unsupported clock" errors for ASICs that don't
support a certain clock type (e.g. dcefclk on MI-series), just dump the
warning to logging.debug and don't try to read the clock

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: If3cb9a472b03aa535a76fc24bcd9f77122090634


[ROCm/rocm_smi_lib commit: b931380f02]
2021-04-23 10:19:04 -04:00
Ori Messinger 563db7514b rocm_smi.py: Show 'Out of Spec' warning only if required
Use the default power cap exposed via sysfs to determine when to
show the 'Out of Spec' warning.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I0fa3612b50e230856b0d5a390f876b35268d9587


[ROCm/rocm_smi_lib commit: b71e07b3fb]
2021-04-22 14:44:05 -04:00
Ori Messinger 9a23204e22 ROCm SMI LIB: Add Default GPU Power Cap
Implement default GPU power cap functionality in the LIB.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ia6b3420beb0e4df5559c3e6d11d0667972590b53


[ROCm/rocm_smi_lib commit: 83cd2fe4f1]
2021-04-22 10:49:55 -04:00
Harish Kasiviswanathan 108cf12a97 Add energy counter resolution to rsmi_dev_energy_count_get
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I03b70968257db7a45e21d7ba62542cdedd18eb85


[ROCm/rocm_smi_lib commit: 844acbc0d8]
2021-04-22 10:25:06 -04:00
Ori Messinger 6b4889a3a4 ROCm SMI Python CLI: Add showevent Functionality
Implement showevent functionality in the ROCm SMI Python CLI.

It can be called using --showevents with any combination of:
VM_FAULT, THERMAL_THROTTLE, and/or GPU_RESET
For example:
./rocm-smi --showevents VM_FAULT, THERMAL_THROTTLE, GPU_RESET

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I905fd9c949e91423b79833a04ab89d6ba3760e62


[ROCm/rocm_smi_lib commit: a9e7e5a475]
2021-04-22 10:21:07 -04:00
Elena 6f751e3fd5 [rocm_smi.py] add energy counter
--showenergycounter

Signed-off-by: Elena Sakhnovitch
Change-Id: Iede0f2b06523f7cb2719489a883e9c49722f8d93


[ROCm/rocm_smi_lib commit: c80fc54500]
2021-04-21 18:40:19 -04:00
Elena 8ee1e50e75 [rocm_smi.py] Coarse Grain Utilization Counters
--showuse
--showmemuse

====================================
========= % time GPU is busy =======
GPU[0]          : GPU use (%): 0
GPU[0]          : GFX Activity: 0
====================================

Change-Id: I9db115ad78b394469206b22d195781a430b2f1d8


[ROCm/rocm_smi_lib commit: 771b4af95c]
2021-04-21 17:23:21 -04:00
Harish Kasiviswanathan 7717cc9d88 Suppress warning message in getFanSpeed function
Many data center cards are fanless. Don't show warning if unable to get
fan speed. The fan speed will be reported as 0

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I53efe67ac88fb0824cf4820430b46c18bc7692df


[ROCm/rocm_smi_lib commit: 1c9e384c8f]
2021-04-21 15:29:44 -04:00
Harish Kasiviswanathan 5e9519a066 Add time profile for set_power_cap function
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Id728cb5fe85b3558e52b4517508211dca499e801


[ROCm/rocm_smi_lib commit: 92cf7ff28a]
2021-04-21 15:29:44 -04:00
Divya Shikre 7b99a4e180 Update setrange functionality in CLI
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ic942bd76297c50caf189bfc0972d30dc42d91f32


[ROCm/rocm_smi_lib commit: 56c132873b]
2021-04-20 15:39:05 -04:00
Divya Shikre 275094d6c5 Add support for mi200 clocks being continuous.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ifb7570054572239b9f48eaefe51e879fb3569031


[ROCm/rocm_smi_lib commit: dc431506f5]
2021-04-20 13:12:27 -04:00
Divya Shikre 58fe0fb6db Add new setrange function in C++ lib
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I670aaeb93827bf4b2cc08eb36d0f9756f00e4e4e


[ROCm/rocm_smi_lib commit: 9f9a7aaf65]
2021-04-19 22:38:59 -04:00
Divya Shikre f17e6de490 Fix for cli errors - extra args in perf_determinism, undefined variable in setClocks
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Id138cfcbea4384f520537cc045d358024177b1ac


[ROCm/rocm_smi_lib commit: d9f7bd0ff4]
2021-04-19 17:32:07 -04:00
Elena e813049fd7 [rocm-smi-lib] add HBM temperature conversion factor
Change-Id: I45339c87c3d2a40670baf1b76ada60dceb650dc0


[ROCm/rocm_smi_lib commit: a383dd23aa]
2021-04-19 16:41:48 -04:00
Elena 134bb5d820 Adding 4 new HBM temperature sensors.
Signed-off-by: Elena Sakhnovitch
Change-Id: Iaea04c38e8c2353e85d8aa2b871fdb82727157de


[ROCm/rocm_smi_lib commit: 81c066350f]
2021-04-17 23:58:49 -04:00
Bill(Shuzhou) Liu db8d076b35 Unit test for energy accumulator counter
Add a few unit tests for energy accumulator counter.

Change-Id: Ib78a67e29465de9c14e6e934c5d62ec64de66d8a


[ROCm/rocm_smi_lib commit: 392d13e318]
2021-04-14 16:04:46 -04:00
Bill(Shuzhou) Liu 6218ae6fa9 Unit tests for coarse grain utilization counters
The unit tests for GFX and Memory activity counters.

Change-Id: I968dabc9ef6de9d335d7f751b290fb713b51a79c


[ROCm/rocm_smi_lib commit: 6340176b99]
2021-04-14 10:53:55 -04:00
Bill(Shuzhou) Liu c3c7019436 Add energy accumulator counter
The energy accumulator counter tracks all energy consumed.

Change-Id: I5b25f817b7802d81c477361447f0ecd7ec02fc61


[ROCm/rocm_smi_lib commit: 8eec0a7d36]
2021-04-14 10:43:01 -04:00
Bill(Shuzhou) Liu dfa31bb5c4 Add coarse grain utilization counter
The coarse grain utilization counter includes GFX and Memory activity.

Change-Id: I5d09976792d3f4a1c1081651fa24ff857016d4c0


[ROCm/rocm_smi_lib commit: 9bfb9ac297]
2021-04-14 10:40:19 -04:00
Kent Russell 17bdc065a1 rocm_smi.py: Don't try to reset non-AMD GPUs
This won't work for obvious reasons, so exit with an error instead of
trying to access a file that doesn't exist and segfaulting

Change-Id: Id1230922fa6e9a19e9394280faad88a43c7d2e34


[ROCm/rocm_smi_lib commit: c7c2ac5559]
2021-04-13 08:00:17 -04:00
Kent Russell cd21f8fdb3 CMakeLists: Add python3 to required packages
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I434b24d12e92d2f6a6928b7450e74c3898303a44


[ROCm/rocm_smi_lib commit: b016a8269a]
2021-04-12 11:33:39 -04:00
Divya Shikre b44fccf1b3 Update performance determinism api as per the modified sysfs interface.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ib0ec5128819644a2ff6c916da9194a7fe1dad795


[ROCm/rocm_smi_lib commit: aaf2120117]
2021-04-07 16:38:48 -04:00
Bill(Shuzhou) Liu 72d9f4b9ce Add support for the HBM temperature
rsmi_dev_temp_metric_get() can also report the HBM
temperatures, which are retrieved from gpu_metrics.

Change-Id: I96b979296e90cf881523627b41b1a02849676416


[ROCm/rocm_smi_lib commit: da480b4589]
2021-04-05 15:55:55 -04:00
Cole Nelson 578d553580 CMakeLists.txt: add ENABLE_LDCONFIG to support multi-version install
Signed-off-by: Cole Nelson <cole.nelson@amd.com>
Change-Id: If06e8b7b57ad12f22c1970622d241a42083d575e


[ROCm/rocm_smi_lib commit: f990d775b7]
2021-03-30 15:39:47 -04:00
Chris Freehill 6448acfcaa Handle different gpu_metrics content versions for format v1
Change-Id: I344d1815da683befc8f8b5caf921803b267ae29f


[ROCm/rocm_smi_lib commit: 5e2a4f3a15]
2021-03-24 14:34:55 -05:00
Chris Freehill 0bf5eb21a5 Adjust event counters to report only new events
Previously, RSMI assumed that the event counter values returned
from perf were only new events. But in fact, when we read the
counter values, they are running totals. To account for this, we
now record the value we read and take the difference between the
current value and the previously recorded value.

Change-Id: I1e04b514e89c7c4d4719889f2dae3a1283864e7f


[ROCm/rocm_smi_lib commit: ce475b009c]
2021-02-24 11:02:17 -06:00
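
The delta approach described in commit 0bf5eb21a5 can be illustrated with a minimal sketch. It is not the library's code: the class name `DeltaCounter` and the sample readings are hypothetical, and the only assumption carried over from the commit message is that each read returns a running total.

```python
class DeltaCounter:
    """Turn a running-total counter into per-read counts of new events."""

    def __init__(self):
        # Value seen on the previous read (hypothetical starting point of 0).
        self._previous = 0

    def update(self, running_total):
        # New events are the difference between this read and the last one.
        new_events = running_total - self._previous
        self._previous = running_total
        return new_events


counter = DeltaCounter()
readings = [5, 12, 12, 20]  # made-up running totals, for illustration only
deltas = [counter.update(value) for value in readings]
```

Storing the last value per counter keeps each read cheap, at the cost of holding one extra integer of state for every counter being tracked.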
Chris Freehill 37e617e0c8 Handle set freq for double-digit index in rocm_smi.py
rocm_smi.py --set<m|s>clk was treating the freq as a string.
This causes problems in parsing when the index is more than 1
digit. Now, treat the indexes as integers.

Change-Id: Ia0d859d33b685fe90689a86ff1c83980808b1514


[ROCm/rocm_smi_lib commit: 11440536cf]
2021-02-23 18:51:29 -06:00
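
The class of bug described in commit 37e617e0c8 can be sketched as follows. This is an illustration of how string handling can split a double-digit index, not the actual code from rocm_smi.py; `parse_level_indexes` is a hypothetical helper.

```python
def parse_level_indexes(tokens):
    """Convert command-line tokens such as ['0', '10'] to integer indexes."""
    return [int(token) for token in tokens]


# Iterating over a raw string walks it character by character, so the
# double-digit index "10" is misread as the two indexes 1 and 0.
as_string = [int(ch) for ch in "10"]

# Converting each whole token to an integer keeps the index intact.
as_integers = parse_level_indexes(["10"])
```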
Chris Freehill 1e1dbeb8b4 Change Debian Architecture from amd64 to any
rocm_smi_lib is not currently known to only compile
on specific architectures.

Change-Id: I209e8baa063e99ebe5ff09eaf0dc6541770aa829


[ROCm/rocm_smi_lib commit: 7effb405f0]
2021-02-01 13:48:38 -06:00
Chris Freehill e202097fb0 Don't use hwmon# as indicator of gpu
Previously, during the rsmi_init discovery process, the existence
of an hwmon# directory was used to distinguish between gpu nodes
and non-gpu nodes. This isn't reliable in some scenarios. Instead,
the existence of the vbios_version file is used as an
indicator that the node is indeed a gpu.

Change-Id: Icfbe5c42ed0970077b05f25c3d209308a31bec85


[ROCm/rocm_smi_lib commit: ff9546aa62]
2021-01-29 13:05:10 -05:00
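
The check described in commit e202097fb0 amounts to testing for a file inside a device directory. The sketch below is in Python rather than the library's C++ and uses throwaway directories; the function name `is_gpu_node` and the directory layout are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def is_gpu_node(device_dir):
    """Return True if the device directory contains a vbios_version file."""
    return (Path(device_dir) / "vbios_version").is_file()


# Demonstration with temporary directories instead of real sysfs paths.
with tempfile.TemporaryDirectory() as root:
    gpu_dir = Path(root) / "card0" / "device"
    other_dir = Path(root) / "card1" / "device"
    gpu_dir.mkdir(parents=True)
    other_dir.mkdir(parents=True)
    # Only the first directory gets the marker file (placeholder contents).
    (gpu_dir / "vbios_version").write_text("placeholder\n")
    gpu_result = is_gpu_node(gpu_dir)
    other_result = is_gpu_node(other_dir)
```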
Ori Messinger eaec11ce8a ROCm SMI Python CLI: Fix Lower Power Cap Warning
The purpose of this patch is to fix a power cap bug for --setpoweroverdrive.
This bug occurs when the user attempts to set a lower wattage than the current
or default wattage, which displays an unnecessary warning message.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I730d2c6031b7d7c4af5acf32ecd28da5ca21ab12


[ROCm/rocm_smi_lib commit: 20e2d260fb]
2021-01-27 03:24:22 -05:00
Ori Messinger 12fd0f8c40 ROCm SMI Python CLI & LIB: Add GPU Reset Functionality
The purpose of this patch is to implement GPU reset functionality
in the LIB, and to call it from the rocm_smi python CLI.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Iaf525f7016f8354a7fd93af0209ca2e97ef4fd56


[ROCm/rocm_smi_lib commit: 80f629b9be]
2021-01-26 17:52:24 -05:00
Ori Messinger 4c3c50ea13 ROCm SMI Python CLI: Fix Fan Speed Bug
The purpose of this patch is to fix a fan speed bug for --showfan.
This bug occurs when the current and/or maximum fan speeds are not
found by the LIB, which displayed an unclear error message.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ied06e460f22391238dd2d86572813e2a5a64f45b


[ROCm/rocm_smi_lib commit: 4f297bdeb3]
2021-01-26 08:51:04 -05:00
Kent Russell 98a39bf706 Fix typo in --setmrange documentation
mrange is for MCLK, not SCLK, so fix the typo accordingly

Change-Id: Ib20774b073288a8ec193322f2f767616979c95da


[ROCm/rocm_smi_lib commit: a902770f86]
2021-01-25 13:20:20 -05:00
Elena e03f3b97f2 ROCm SMI Python CLI: Fix division by zero fan bug
Signed-off-by: Elena Sakhnovitch <Elena.Sakhnovitch@amd.com>
Change-Id: If259ac1ad6d77ce85b2b7616d972b6e7964a9f78


[ROCm/rocm_smi_lib commit: 61cdfff562]
2021-01-20 18:21:23 -05:00
Kent Russell b15eab2821 CMakeLists: Fix libasan usage
static-libasan doesn't exist, so use the easier-to-remember
shared-libsan and change static-libasan to static-libsan

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ieef480aacdd770f3bb40673a2e8f8306b308b1c9


[ROCm/rocm_smi_lib commit: ef7f99a7e2]
2021-01-15 15:39:05 -05:00
Chris Freehill dae44b59eb Comment out CPACK_RPM_PACKAGE_SUGGESTS line
This line makes the build fail on CentOS. It may be
that it's not supported on that distro.

See https://bugzilla.redhat.com/show_bug.cgi?id=1811358

Change-Id: Ied7ce634ae9fb2b1544f85c0b10ceecc039c388a


[ROCm/rocm_smi_lib commit: 47b882b8d3]
2021-01-12 17:15:52 -06:00
Kent Russell 6dd5a5b420 rocm-smi: Try to find the librocm_smi64.so in a few locations
Instead of looking solely in ../lib, try looking in any /opt folder as a
backup option. This is a little more robust and hopefully leads to fewer
issues trying to find the lib

Change-Id: Ie0d3944b48b32d9965917e5c831388838b6d4ef7


[ROCm/rocm_smi_lib commit: c7b6b47211]
2021-01-08 15:29:11 -05:00
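
A fallback search like the one commit 6dd5a5b420 describes could look roughly like this. It is a sketch under assumptions: the function name `find_library`, the `opt_root` parameter, and the `<opt_root>/*/lib/` pattern are hypothetical, since the commit message only says that ../lib is tried first and /opt folders serve as a backup.

```python
import glob
import os


def find_library(script_dir, name="librocm_smi64.so", opt_root="/opt"):
    """Return the first existing candidate path for the library, or None."""
    # Primary location: ../lib relative to the script, as in the commit text.
    primary = os.path.join(script_dir, "..", "lib", name)
    # Backup locations: assumed layout <opt_root>/<any folder>/lib/<name>.
    fallbacks = sorted(glob.glob(os.path.join(opt_root, "*", "lib", name)))
    for candidate in [primary] + fallbacks:
        if os.path.isfile(candidate):
            return os.path.normpath(candidate)
    return None
```

A caller would typically pass the directory of the running script, for example `find_library(os.path.dirname(os.path.abspath(__file__)))`, and report an error if the result is `None`.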
Chris Freehill d71ba4c666 Remove adding of bogus hwmon label entries
If we fail to find an expected temperature or voltage label
file, previously we were attempting to re-add a mapping of file
index to sensor types. Attempting to insert a map item that is already
present has no effect, so there should be no functional change.

This was a remnant of old code that should have been deleted.

Change-Id: Ie6f8a62f619a1ae58756e0fd891532434518cf78


[ROCm/rocm_smi_lib commit: bb5132a66c]
2021-01-06 11:01:07 -05:00
Chris Freehill 185ebc2f07 Introduce RSMI_DEBUG_INFINITE_LOOP
The environment variable RSMI_DEBUG_INFINITE_LOOP is introduced
to facilitate debugging RSMI in user applications. When this
env. variable is non-zero, an infinite loop will be entered in
rsmi_init(). At this point, a debugger can be attached and RSMI
can be debugged. This only applies to debug builds.

Change-Id: I23f6dd730fc965764295070de053314a1cc5b6aa


[ROCm/rocm_smi_lib commit: 68095b50e7]
2021-01-06 10:30:24 -05:00
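
The wait-for-debugger pattern from commit 185ebc2f07 can be sketched in Python. The library itself is C++, so this is only an analogy: `should_wait_for_debugger` and `wait_for_debugger` are hypothetical names, and treating an unparsable value as "do not wait" is an assumption the commit message does not state.

```python
import os
import time


def should_wait_for_debugger(environ=os.environ):
    """Return True when RSMI_DEBUG_INFINITE_LOOP is set to a non-zero integer."""
    value = environ.get("RSMI_DEBUG_INFINITE_LOOP", "0")
    try:
        return int(value) != 0
    except ValueError:
        # Assumption: a value that is not an integer does not trigger the loop.
        return False


def wait_for_debugger(still_waiting, poll_seconds=1.0):
    """Spin until still_waiting() returns False, for example once an attached
    debugger has changed the state that the callable inspects."""
    while still_waiting():
        time.sleep(poll_seconds)
```

Polling a callable rather than looping unconditionally gives an attached debugger a clear way to release the process, and it lets the loop be exercised in tests without hanging.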
Kent Russell 367ea32ad7 CMakeLists: Add sudo to Suggests field
There are some systems that don't have sudo, and since we require sudo
for any of the "set" functionality, add it to "Suggests".

See https://github.com/RadeonOpenCompute/ROCm/issues/1245

Change-Id: I9428b9a68810ee8b51f91bb2e3b63312463161b0


[ROCm/rocm_smi_lib commit: 7b5f220f76]
2021-01-04 10:46:46 -05:00
Kent Russell cd25354017 CMakeLists: Make rocm_smi_lib provide rocm-smi
Now that rocm-smi is deprecated, change the DEB/RPM info so that it
provides the rocm-smi package. This will allow for a seamless transition
over during ROCm upgrades

Change-Id: Ia29aab6e45c5974f7b623b786d0649710ba1f7cc


[ROCm/rocm_smi_lib commit: 36a0465127]
2021-01-04 10:46:40 -05:00