Commit graph

296 commits

Author SHA1 Message Date
Chris Freehill 9d2e2ffffd Change Debian Architecture from amd64 to any
rocm_smi_lib is not currently known to only compile
on specific architectures.

Change-Id: I209e8baa063e99ebe5ff09eaf0dc6541770aa829


[ROCm/amdsmi commit: 7effb405f0]
2021-02-01 13:48:38 -06:00
Chris Freehill fff19b1b3e Don't use hwmon# as indicator of gpu
Previously, during the rsmi_init discovery process, the existence
of an hwmon# directory was used to distinguish between GPU nodes
and non-gpu nodes. This isn't reliable in some scenarios. Instead,
the existence of the vbios_version file is used as an
indicator that the node is indeed a gpu.

Change-Id: Icfbe5c42ed0970077b05f25c3d209308a31bec85


[ROCm/amdsmi commit: ff9546aa62]
2021-01-29 13:05:10 -05:00
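The vbios_version check described in commit fff19b1b3e can be illustrated with a minimal Python sketch. This is not the library's C++ discovery code: the function name `find_gpu_nodes`, the `card*` directory naming, and the `device/vbios_version` layout under a DRM-style root directory are assumptions made for illustration.

```python
from pathlib import Path


def find_gpu_nodes(drm_root):
    """Return the names of card directories under drm_root that look like GPUs.

    Assumption: each candidate node is a directory named card<N>, and a GPU
    node exposes device/vbios_version. The presence or absence of hwmon
    directories is deliberately ignored.
    """
    gpu_nodes = []
    for node in sorted(Path(drm_root).glob("card*")):
        if (node / "device" / "vbios_version").is_file():
            gpu_nodes.append(node.name)
    return gpu_nodes
```

Passing the root directory as a parameter keeps the sketch testable against a temporary directory tree instead of a live sysfs.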
Ori Messinger 42b33ea096 ROCm SMI Python CLI: Fix Lower Power Cap Warning
The purpose of this patch is to fix a power cap bug for --setpoweroverdrive.
This bug occurs when the user attempts to set a lower wattage than the current
or default wattage, which displays an unnecessary warning message.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I730d2c6031b7d7c4af5acf32ecd28da5ca21ab12


[ROCm/amdsmi commit: 20e2d260fb]
2021-01-27 03:24:22 -05:00
Ori Messinger d41364d1cf ROCm SMI Python CLI & LIB: Add GPU Reset Functionality
The purpose of this patch is to implement GPU reset functionality
in the LIB, and to call it from the rocm_smi python CLI.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Iaf525f7016f8354a7fd93af0209ca2e97ef4fd56


[ROCm/amdsmi commit: 80f629b9be]
2021-01-26 17:52:24 -05:00
Ori Messinger a5fee40cbb ROCm SMI Python CLI: Fix Fan Speed Bug
The purpose of this patch is to fix a fan speed bug for --showfan.
This bug occurs when the current and/or maximum fan speeds are not
found by the LIB, which displayed an unclear error message.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: Ied06e460f22391238dd2d86572813e2a5a64f45b


[ROCm/amdsmi commit: 4f297bdeb3]
2021-01-26 08:51:04 -05:00
Kent Russell 8d37749c05 Fix typo in --setmrange documentation
mrange is for MCLK, not SCLK, so fix the typo accordingly

Change-Id: Ib20774b073288a8ec193322f2f767616979c95da


[ROCm/amdsmi commit: a902770f86]
2021-01-25 13:20:20 -05:00
Elena bb879e7f38 ROCm SMI Python CLI: Fix division by zero fan bug
Signed-off-by: Elena Sakhnovitch <Elena.Sakhnovitch@amd.com>
Change-Id: If259ac1ad6d77ce85b2b7616d972b6e7964a9f78


[ROCm/amdsmi commit: 61cdfff562]
2021-01-20 18:21:23 -05:00
Kent Russell 4a35269cc1 CMakeLists: Fix libasan usage
static-libasan doesn't exist, so use the easier-to-remember
shared-libsan and change static-libasan to static-libsan

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ieef480aacdd770f3bb40673a2e8f8306b308b1c9


[ROCm/amdsmi commit: ef7f99a7e2]
2021-01-15 15:39:05 -05:00
Chris Freehill 23345c0c3a Comment out CPACK_RPM_PACKAGE_SUGGESTS line
This line makes the build fail on CentOS. It may be
that it's not supported on that distro.

See https://bugzilla.redhat.com/show_bug.cgi?id=1811358

Change-Id: Ied7ce634ae9fb2b1544f85c0b10ceecc039c388a


[ROCm/amdsmi commit: 47b882b8d3]
2021-01-12 17:15:52 -06:00
Kent Russell 2ecaedb600 rocm-smi: Try to find librocm_smi64.so in a few locations
Instead of looking solely in ../lib, try looking in any /opt folder as a
backup option. This is a little more robust and hopefully leads to fewer
issues trying to find the lib

Change-Id: Ie0d3944b48b32d9965917e5c831388838b6d4ef7


[ROCm/amdsmi commit: c7b6b47211]
2021-01-08 15:29:11 -05:00
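The fallback search described in commit 2ecaedb600 might look roughly like the following Python sketch. The function name `find_rocm_smi_lib`, its parameters, and the assumption that the library sits in a `lib` directory one level below the `/opt` root are illustrative guesses, not the actual rocm-smi code.

```python
import glob
import os


def find_rocm_smi_lib(script_dir, opt_root="/opt", lib_name="librocm_smi64.so"):
    """Return the first existing path to lib_name, or None if not found.

    Search order (an assumption based on the commit message): the ../lib
    directory relative to the script first, then any <opt_root>/*/lib
    directory as a backup.
    """
    candidates = [os.path.join(script_dir, "..", "lib", lib_name)]
    candidates += sorted(glob.glob(os.path.join(opt_root, "*", "lib", lib_name)))
    for path in candidates:
        if os.path.isfile(path):
            return os.path.normpath(path)
    return None
```

A caller could pass the returned path to `ctypes.CDLL` and report a clear error when the result is `None`.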
Chris Freehill 55e86989c1 Remove adding of bogus hwmon label entries
If we fail to find an expected temperature or voltage label
file, previously we were attempting to re-add a mapping of file
index to sensor types. Attempting to insert a map item that is already
present has no effect, so there should be no functional change.

This was a remnant of old code that should have been deleted.

Change-Id: Ie6f8a62f619a1ae58756e0fd891532434518cf78


[ROCm/amdsmi commit: bb5132a66c]
2021-01-06 11:01:07 -05:00
Chris Freehill 76323354d1 Introduce RSMI_DEBUG_INFINITE_LOOP
The environment variable RSMI_DEBUG_INFINITE_LOOP is introduced
to facilitate debugging RSMI in user applications. When this
env. variable is non-zero, an infinite loop will be entered in
rsmi_init(). At this point, a debugger can be attached and RSMI
can be debugged. This only applies to debug builds.

Change-Id: I23f6dd730fc965764295070de053314a1cc5b6aa


[ROCm/amdsmi commit: 68095b50e7]
2021-01-06 10:30:24 -05:00
Kent Russell e4175d0eeb CMakeLists: Add sudo to Suggests field
There are some systems that don't have sudo, and since we require sudo
for any of the "set" functionality, add it to "Suggests".

See https://github.com/RadeonOpenCompute/ROCm/issues/1245

Change-Id: I9428b9a68810ee8b51f91bb2e3b63312463161b0


[ROCm/amdsmi commit: 7b5f220f76]
2021-01-04 10:46:46 -05:00
Kent Russell 2411ad3aea CMakeLists: Make rocm_smi_lib provide rocm-smi
Now that rocm-smi is deprecated, change the DEB/RPM info so that it
provides the rocm-smi package. This will allow for a seamless transition
over during ROCm upgrades

Change-Id: Ia29aab6e45c5974f7b623b786d0649710ba1f7cc


[ROCm/amdsmi commit: 36a0465127]
2021-01-04 10:46:40 -05:00
Ori Messinger 848697c287 ROCm SMI Python CLI: Fix --showclkfrq/--showclocks Failure
The purpose of this patch is to check if each valid clock is supported
on the GPU before attempting to retrieve its value.

The valid clocks are: dcefclk, fclk, mclk, pcie, sclk, socclk.

This should get rid of the 'one or more commands failed' message when
running --showclkfrq or --showclocks on a machine that doesn't support
all the possible valid clocks.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I1fb10989fc1a36f38b68a23e17e6e600ed0ac85b


[ROCm/amdsmi commit: 3b52c895cc]
2020-12-18 17:46:23 -05:00
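The support check described in commit 848697c287 can be sketched in Python as follows. The clock names come from the commit message; `show_clocks`, `is_supported`, and `read_clock` are hypothetical names standing in for the CLI function and the library queries, which the commit message does not name.

```python
VALID_CLOCKS = ["dcefclk", "fclk", "mclk", "pcie", "sclk", "socclk"]


def show_clocks(device, is_supported, read_clock):
    """Collect clock values for one device, skipping unsupported clocks.

    is_supported and read_clock are hypothetical callables standing in for
    the library queries; they are not real rocm_smi functions.
    """
    values = {}
    for clock in VALID_CLOCKS:
        if not is_supported(device, clock):
            continue  # skip quietly instead of counting it as a failed command
        values[clock] = read_clock(device, clock)
    return values
```

Checking support first means a missing clock is never queried, so it cannot contribute to a "one or more commands failed" message.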
Divya Shikre 22516a3b63 Fix for error while reading gpu_metrics sysfs file
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: If69b7eeb3573ebece9ed0cb539f5ddffbe3c2f09


[ROCm/amdsmi commit: efd234c9e3]
2020-12-18 15:31:16 -05:00
Ori Messinger 348ab2cf8e ROCm SMI Python CLI: Add JSON Functionality to showPids
The purpose of this patch is to add JSON functionality to showPids
by modifying the print2DArray function to use printSysLog.

Change-Id: Ie834d209b29332777c3f13f776f81c37d94b01b6
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: d9da490214]
2020-12-18 02:16:04 -05:00
Ashutosh Mishra 6cbe1fda15 Standardizing Package name
Enables standards-compliant package naming for Debian and RPM

Signed-off-by: Ashutosh Mishra <ashutosh.mishra@amd.com>
Change-Id: Ib37d29aedc20b610619f6921f4147b41c0eaf134


[ROCm/amdsmi commit: d097cb21e1]
2020-12-17 02:47:53 -05:00
Chris Freehill a0c99bc972 Correct TestPerfLevelReadWrite test
Enums referenced in the test did not match what's in rocm_smi.h.
Added static assert to try to catch this. Also moved enum string
map to test_common.cc/h where other such maps are.

Also, fixed some cpplint issues.

Change-Id: I683553248ceb2fabb28ce1a1208bc9744aaf88d6


[ROCm/amdsmi commit: 7e17684532]
2020-12-16 17:12:04 -06:00
Divya Shikre 1b30b426ae Fix for inconsistent GPU indexing between rocm-smi and rbt/hip.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I0d966c91bfe0f0d51859ff098d15011a3e4e8b29


[ROCm/amdsmi commit: 47ca37aef7]
2020-12-11 15:11:21 -05:00
Chris Freehill 24b23e90a8 Make rocm_smi.py handle disappearing PIDs
rocm_smi.py had an issue where it gets process information
in 2 different places. If the process disappears in between
those 2 places, a crash would occur.

This fix gracefully returns in this scenario.
Reading the file information from /proc instead of using
the python subProcess() call was considered, but it has the
drawback of not being able to read the process names of
processes not owned by the caller.

Change-Id: If812c4641f00da37e99defb0740f670107c8a797


[ROCm/amdsmi commit: db6d8d36ea]
2020-12-10 20:53:45 -06:00
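The graceful handling described in commit 24b23e90a8 can be approximated with a short Python sketch. It assumes the process name is looked up with `ps -p <pid> -o comm=`; the function name `get_process_name` and the exact `ps` invocation are illustrative and may differ from what rocm_smi.py does.

```python
import subprocess


def get_process_name(pid):
    """Return the command name for pid, or None if it cannot be determined.

    A process that exited between two lookups makes ps return a non-zero
    status; that case is reported as None instead of raising.
    """
    try:
        result = subprocess.run(
            ["ps", "-p", str(pid), "-o", "comm="],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
        )
    except OSError:
        return None  # ps itself could not be started
    name = result.stdout.strip()
    if result.returncode != 0 or not name:
        return None
    return name
```

A caller that builds a process table can then skip entries for which the result is `None` rather than crash.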
Chris Freehill 5d92777473 Show more info in stderr when rsmi_init() fails
Some rsmi apps fail without much explanation when
rsmi_init() fails. This patch hopes to provide some clues to
the reason for the failure.

Change-Id: Id51308dc327b9871d537dd3e709b677db4ef10bc


[ROCm/amdsmi commit: 6377e0258d]
2020-12-10 07:32:03 -05:00
Chris Freehill 3be45e2ea9 Fix process killed while holding mutex
Previously, when a process holding a shared mutex was killed,
the next time an RSMI application was started, it would not be
able to obtain the mutex--the application would have to exit.
This fix uses pthread_mutexattr_setrobust() to detect this
situation and act accordingly.

Also, add some missing, needed mutexes and move mutexes
closer to where the protected resource is used.

Change-Id: Icfdc3a246f4cfa3fd008e3f13472199abd76fd35


[ROCm/amdsmi commit: f4938b0ac9]
2020-12-04 12:59:55 -05:00
Divya Shikre 08ba8bed83 Fix for syntax error caused by performance determinism commit.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I02fbfec667e7f96ab0d0662036cf339a56025ba6


[ROCm/amdsmi commit: a0d10e021b]
2020-12-02 16:31:01 -05:00
Divya Shikre 4fd8d18e22 Adding Performance Determinism Mode to rocm_smi lib, CLI & gtest.
A special mode of operation that minimizes performance variation by letting
the user provide the desired frequency to be set as the soft limit.
rocm-smi provides a mechanism for the user to enter and exit performance
determinism mode, as described below.

Enter performance determinism mode:
- hold a lock
- write performance_determinism to power_dpm_force_performance_level
- write input clk_freq to pp_dpm_sclk
- release lock

Exit performance determinism mode:
- hold a lock
- write auto to power_dpm_force_performance_level
- release lock

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia24e27954cdf1c4337ffc83d8948fbdfaf4552d2


[ROCm/amdsmi commit: 60d0f3052f]
2020-12-02 11:11:00 -05:00
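The enter and exit sequences listed in commit 4fd8d18e22 can be sketched in Python as follows. The file names and written values come from the commit message. The function names, the `device_dir` parameter, and the use of `threading.Lock` as a stand-in for the library's lock are assumptions. The value format the driver expects in pp_dpm_sclk is not stated, so the frequency is written as given.

```python
import threading
from pathlib import Path

# Stand-in for the lock the library holds; an assumption for this sketch.
_sysfs_lock = threading.Lock()


def enter_performance_determinism(device_dir, clk_freq):
    """Follow the 'enter' steps from the commit message for one device."""
    device = Path(device_dir)
    with _sysfs_lock:  # hold a lock; released when the block exits
        (device / "power_dpm_force_performance_level").write_text(
            "performance_determinism"
        )
        (device / "pp_dpm_sclk").write_text(str(clk_freq))


def exit_performance_determinism(device_dir):
    """Follow the 'exit' steps: restore automatic performance level."""
    with _sysfs_lock:
        (Path(device_dir) / "power_dpm_force_performance_level").write_text("auto")
```

Taking the device directory as a parameter lets the sequence be exercised against a temporary directory; on real hardware, writing these files typically requires elevated privileges.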
Ori Messinger ffac195623 ROCm SMI Python CLI: Fix --gpureset Bug
The purpose of this patch is to fix a bug present when using the
--gpureset option on a machine that has both an AMD GPU and a
non-AMD GPU (such as a motherboard's integrated graphics).

This bug occurs due to non-AMD GPUs being ignored by the LIB when
enumerating a list of valid AMD GPUs, causing the gpuReset method
to attempt a reset on the integrated graphics.

Change-Id: I1c03a3c41f905786e3c8246ec0c2b42786ff1770
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: c0c1fd2098]
2020-11-25 11:21:36 -05:00
Mukul Joshi c017960507 Update Event Notification test for GPU Reset event
Update the event notification tests to handle both GPU pre reset
and GPU post reset events. The GPU post reset event takes some time to
be generated after the pre reset event, so issue another
notification read to wait for post reset event.

Change-Id: I2812760b184d5357130e478cc35d27b14592abb3


[ROCm/amdsmi commit: 446ab5c8c7]
2020-11-23 10:53:23 -05:00
Ori Messinger 273ab71c38 ROCm SMI Python CLI: GPU showproductname SKU Fix
The purpose of this patch is to fix a bug present when using the
--showproductname option, resulting in the following error:
undefined symbol: rsmi_dev_sku_get

This bug fix uses a substring from vbios version instead of using the
LIB's rsmi_dev_sku_get to avoid getting the undefined symbol error.

Change-Id: I56d72a481d5dde44c56106ae297f4bcff40ac15f
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: 015c7d59d0]
2020-11-12 15:38:52 -05:00
Chris Freehill a061e62b2b Add disclaimer to README and update pdf manual
Change-Id: I19c957e5a1de9f87f1834d341221fad6c826b252


[ROCm/amdsmi commit: af5227cdf7]
2020-11-10 17:36:51 -06:00
Chris Freehill bd9e47a53c rsmitst address sanitizer support
Also, add libasan flag variants for librocm_smi

Change-Id: Ibd012e40d26907addf8c0550aaf9f78c11b8d51f


[ROCm/amdsmi commit: bf6af90908]
2020-11-10 15:45:56 -06:00
Chris Freehill c6a58d91cb Quiet address sanitizer warnings
Also,
* Fix some doxygen issues
* Fix address sanitizer issues in rsmitst

Change-Id: Ie6c6fd9af5c418210b7064e79650fb92cd4a5e2b


[ROCm/amdsmi commit: 63064b0000]
2020-11-10 14:16:39 -06:00
Chris Freehill fa475d307a Make CMakeLists.txt recognize ADDRESS_SANITIZER
Change-Id: Ic80ac42c62cd400e48fb26d504547931fdd6863a


[ROCm/amdsmi commit: e7c8dfe2a2]
2020-11-04 17:57:31 -06:00
Chris Freehill b5e575875c Use relative path to find librocm_smi
Change-Id: Ifca3f54d680a802c1c5fa360d17e64338b9ac9a8


[ROCm/amdsmi commit: 438d28612f]
2020-10-29 14:36:48 -05:00
Elena Sakhnovitch 61b8cdbe43 ROCm SMI Python CLI: --rasinject partial support
This implementation is copied directly from the previous rocm_smi.py
script; This feature is experimental and will be updated or removed with
future releases.

Signed-off-by: Elena Saknovitch
Change-Id: I5cd38266946302bc4123aeafaa825e13f704235e


[ROCm/amdsmi commit: 4117719edd]
2020-10-22 17:22:13 -04:00
Chris Freehill cac03f5a0e Add new XGMI counter events to rsmiBindings.py
Also, correct RSMI_EVNT_LAST to new value.

Change-Id: I9f693cb398bba583201f6b5b5f0e2d45ede2e4e0


[ROCm/amdsmi commit: 1982fdc4fb]
2020-10-22 17:21:50 -04:00
Divya Shikre 33ccef9a1e Fix for weight/hops not being updated
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I333d49fa011b85d41eca63c082c0615febe2f7e9


[ROCm/amdsmi commit: 94291bf882]
2020-10-20 15:01:06 -04:00
Ori Messinger 297f89a62a ROCm SMI Python CLI: Add CU Occupancy to showPids function
The purpose of this patch is to add CU occupancy functionality to showPids
by calling rsmi_compute_process_info_get from the LIB.

Now showPids shows the following information on (KFD compute) processes:
PID, process name, GPU(s), VRAM used, SDMA used, and CU occupancy.

Change-Id: Ie005901e0eb946ef0fbb3523245ca451c4eed595
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: 20ae72b078]
2020-10-15 21:21:32 -04:00
Ramesh Errabolu c69c210a6e Update ROCm SMI library with ability to read CU occupancy
Change-Id: Ib9882fa2d81c13604af282279bfa116bc2fd05a4


[ROCm/amdsmi commit: 328878343c]
2020-10-14 09:33:37 -04:00
Divya Shikre 207d6339ec Adding gtest for gpu metrics read
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I66edb15c8b7380f3427822b33e845202bfac7a2b


[ROCm/amdsmi commit: f397cba414]
2020-10-08 13:37:47 -04:00
Ori Messinger 6ea0c8b524 ROCm SMI Python CLI: Check for amdgpu Driver Initialization
The purpose of this patch is to check for amdgpu driver initialization
before attempting to initialize rocmsmi in the CLI.

Additionally, since the '--help' functionality does not rely on anything
external to the CLI, it can now be called without the driver initialized.

Change-Id: I2fcce60ca6d9f77835549e3558c4bb1747499c5c
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: e3c9aec714]
2020-10-08 11:17:45 -04:00
Chris Freehill 137e33a1e6 Revert "Revert "Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters""
This reverts commit 1be2933806.



Change-Id: Ic412a64d35aab74caf12bf4c791f0a66ac15b061


[ROCm/amdsmi commit: 5465d872aa]
2020-10-08 10:36:30 -04:00
Kent Russell fb687f2c68 Remove extraneous mutexes
We already grab the mutex before getting the device name, so we don't
need to grab it again

Change-Id: Ib627ba3a39c485f6069af052cfd3e6c522873d43


[ROCm/amdsmi commit: e350278b68]
2020-10-08 07:55:07 -04:00
Chris Freehill 1be2933806 Revert "Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters"
This reverts commit dc48f149f0.

Temporarily reverting until the driver side of this is upstream

Change-Id: I2d8243208c1271ebad90bc2ee0fda2dfefb0831b


[ROCm/amdsmi commit: ae6d3fbdd0]
2020-10-07 18:42:56 -04:00
Kent Russell d41e3be1b0 Check FRU-based product information if available
WKS and server cards have an FRU with product information, so try to use
that for product name and product SKU if it exists.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I40bbd3bf62f4cb02e96015ed1630112691cacbc3


[ROCm/amdsmi commit: df7c3434cd]
2020-10-07 14:09:23 -04:00
Chris Freehill 17f6a43fa1 Fail gracefully if drm directory is not found
Change-Id: I0f3ab2721108355752caf0280124469b98af4967


[ROCm/amdsmi commit: c6f02b4d62]
2020-10-05 21:12:11 -04:00
Chris Freehill dc48f149f0 Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters
Also some format fixes

Change-Id: Id3c0f6b3cf5b327bb9ca6acb6091dc67764c8032


[ROCm/amdsmi commit: 946bf93dfb]
2020-10-05 17:22:19 -05:00
Divya Shikre 5cddbccec6 Adding functionality that will parse gpu_metrics sysfs file
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I3a84870b83eb4cd0ed46f10bb19169c91f99fd8e


[ROCm/amdsmi commit: 8b48564ce3]
2020-10-02 10:25:41 -04:00
Chris Freehill e775527ffc Add gtest lib dir to library search path
Change-Id: I57bb20e2a67a4eaac2d0e24314e22d1a5fbe3533


[ROCm/amdsmi commit: 3522e94ed0]
2020-10-01 23:46:33 -04:00
Ori Messinger eca48bfd0b ROCm SMI Python CLI: Implement --setclock for all Valid Clocks
The purpose of this patch is to implement --setclock functionality for
all of the valid clocks (can be set with --setclock TYPE LEVEL).

The valid clocks are: dcefclk, fclk, mclk, pcie, sclk, socclk.
This functionality uses the existing 'setClocks' method.

Change-Id: I1d62baf372427ac1c0642c26a949663b673ef335
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: 4ed1c1d492]
2020-09-22 15:41:51 -04:00
Mukul Joshi ffc473d40a Use correct string conversion function for VRAM and SDMA usage
VRAM and SDMA usage can be 64-bit long numbers. Use stoull()
instead of stoi() to convert the VRAM and SDMA usage strings to
numbers.

Change-Id: Ifadbada9f33320fc67666036ce8439823c1d1fb7


[ROCm/amdsmi commit: fb2ed24372]
2020-09-21 12:28:22 -04:00