Commit graph

280 commits

Author SHA1 Message Date
Ori Messinger e5a245ff00 ROCm SMI Python CLI: Add Json Functionality to showPids
The purpose of this patch is to add Json functionality to showPids
by modifying the print2DArray function to use printSysLog.

Change-Id: Ie834d209b29332777c3f13f776f81c37d94b01b6
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/rocm_smi_lib commit: d9da490214]
2020-12-18 02:16:04 -05:00
Ashutosh Mishra a6c76d6952 Standardizing Package name
Enables standards compliant package naming for debian and rpm

Signed-off-by: Ashutosh Mishra <ashutosh.mishra@amd.com>
Change-Id: Ib37d29aedc20b610619f6921f4147b41c0eaf134


[ROCm/rocm_smi_lib commit: d097cb21e1]
2020-12-17 02:47:53 -05:00
Chris Freehill c401b025fe Correct TestPerfLevelReadWrite test
Enums referenced in the test did not match what's in rocm_smi.h.
Added static assert to try to catch this. Also moved enum string
map to test_common.cc/h where other such maps are.

Also, fixed some cpplint issues.

Change-Id: I683553248ceb2fabb28ce1a1208bc9744aaf88d6


[ROCm/rocm_smi_lib commit: 7e17684532]
2020-12-16 17:12:04 -06:00
Divya Shikre 9f37923767 Fix for inconsistent GPU indexing between rocm-smi and rbt/hip.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I0d966c91bfe0f0d51859ff098d15011a3e4e8b29


[ROCm/rocm_smi_lib commit: 47ca37aef7]
2020-12-11 15:11:21 -05:00
Chris Freehill be455127c2 Make rocm_smi.py handle disappearing PIDs
rocm_smi.py had an issue where it gets process information
in 2 different places. If the process disappears in between
those 2 places, a crash would occur.

This fix gracefully returns in this scenario.
Reading the file information from /proc instead of using
the python subProcess() call was considered, but it has the
drawback of not being able to read the process names of
processes not owned by the caller.
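
The failure mode described above can be sketched as follows. This is a minimal illustration, not the code from rocm_smi.py: `get_process_name` and `describe_processes` are hypothetical helpers, and looking the name up with `ps` through `subprocess` is an assumption about how such a lookup might be done. The point is only that a PID collected earlier may no longer exist when its name is requested, so the lookup returns `None` instead of raising.

```python
import subprocess


def get_process_name(pid):
    """Return the command name for pid, or None if the process is gone.

    Hypothetical helper: it shells out to `ps` rather than reading /proc,
    following the trade-off described in the commit message.
    """
    try:
        result = subprocess.run(
            ["ps", "-p", str(pid), "-o", "comm="],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # `ps` itself could not be started.
        return None
    name = result.stdout.strip()
    # `ps` exits non-zero and prints nothing when the PID does not exist.
    if result.returncode != 0 or not name:
        return None
    return name


def describe_processes(pids):
    """Build {pid: name}, skipping PIDs that vanished since enumeration."""
    names = {}
    for pid in pids:
        name = get_process_name(pid)
        if name is None:
            continue  # process exited between the two lookups
        names[pid] = name
    return names
```

A caller that enumerated PIDs in one place can pass them to `describe_processes` later and simply receives fewer entries if some processes have exited in the meantime.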

Change-Id: If812c4641f00da37e99defb0740f670107c8a797


[ROCm/rocm_smi_lib commit: db6d8d36ea]
2020-12-10 20:53:45 -06:00
Chris Freehill df789612b6 Show more info in stderr when rsmi_init() fails
Some rsmi apps fail without much explanation when
rsmi_init() fails. This patch hopes to provide some clues to
the reason for the failure.

Change-Id: Id51308dc327b9871d537dd3e709b677db4ef10bc


[ROCm/rocm_smi_lib commit: 6377e0258d]
2020-12-10 07:32:03 -05:00
Chris Freehill aef625bfd3 Fix process killed while holding mutex
Previously, when a process holding a shared mutex was killed,
the next time an RSMI application was started, it would not be
able to obtain the mutex--the application would have to exit.
This fix uses pthread_mutexattr_setrobust() to detect this
situation and act accordingly.

Also, add some missing, needed mutexes and move mutexes
closer to where the protected resource is used.

Change-Id: Icfdc3a246f4cfa3fd008e3f13472199abd76fd35


[ROCm/rocm_smi_lib commit: f4938b0ac9]
2020-12-04 12:59:55 -05:00
Divya Shikre 129e3e8934 Fix for syntax error caused by the performance determinism commit.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I02fbfec667e7f96ab0d0662036cf339a56025ba6


[ROCm/rocm_smi_lib commit: a0d10e021b]
2020-12-02 16:31:01 -05:00
Divya Shikre 6d4fb11c6e Adding Performance Determinism Mode to rocm_smi lib, CLI & gtest.
A special mode of operation that achieves minimal performance variation by letting
the user provide the desired frequency to be set as the soft limit.
The user controls entry to and exit from the mode via rocm-smi, using the
mechanism below.

Enter performance determinism mode:
- hold a lock
- write performance_determinism to power_dpm_force_performance_level
- write input clk_freq to pp_dpm_sclk
- release lock

Exit performance determinism mode:
- hold a lock
- write auto to power_dpm_force_performance_level
- release lock
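
The enter and exit sequences above can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: the sysfs file names and the values `performance_determinism` and `auto` come from this commit message, while the function names, the `device_dir` parameter, and the use of a `threading.Lock` as the lock are hypothetical. The exact text format expected by `pp_dpm_sclk` is not given here, so the frequency is written through unchanged.

```python
import os
import threading

# Stand-in for the lock named in the commit message (assumption: the real
# library uses its own mutex, not a Python threading lock).
_device_lock = threading.Lock()


def _write_sysfs(device_dir, filename, value):
    """Write a string value to one file under the device directory."""
    with open(os.path.join(device_dir, filename), "w") as f:
        f.write(value)


def enter_perf_determinism(device_dir, clk_freq):
    """Enter the mode: set the performance level, then the soft sclk limit."""
    with _device_lock:  # hold a lock ... release lock
        _write_sysfs(device_dir, "power_dpm_force_performance_level",
                     "performance_determinism")
        _write_sysfs(device_dir, "pp_dpm_sclk", str(clk_freq))


def exit_perf_determinism(device_dir):
    """Exit the mode by returning the performance level to auto."""
    with _device_lock:
        _write_sysfs(device_dir, "power_dpm_force_performance_level", "auto")
```

Holding one lock across both writes in `enter_perf_determinism` keeps another caller from observing the performance level changed but the frequency not yet written.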

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia24e27954cdf1c4337ffc83d8948fbdfaf4552d2


[ROCm/rocm_smi_lib commit: 60d0f3052f]
2020-12-02 11:11:00 -05:00
Ori Messinger 3c7607d7f0 ROCm SMI Python CLI: Fix --gpureset Bug
The purpose of this patch is to fix a bug present when using the
--gpureset option on a machine that has both an AMD GPU and a
non-AMD GPU (such as a motherboard's integrated graphics).

This bug occurs due to non-AMD GPUs being ignored by the LIB when
enumerating a list of valid AMD GPUs, causing the gpuReset method
to attempt a reset on the integrated graphics.

Change-Id: I1c03a3c41f905786e3c8246ec0c2b42786ff1770
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/rocm_smi_lib commit: c0c1fd2098]
2020-11-25 11:21:36 -05:00
Mukul Joshi 87e86be7a9 Update Event Notification test for GPU Reset event
Update the event notification tests to handle both GPU pre reset
and GPU post reset events. The GPU post reset event takes some time to
be generated after the pre reset event, so issue another
notification read to wait for the post reset event.

Change-Id: I2812760b184d5357130e478cc35d27b14592abb3


[ROCm/rocm_smi_lib commit: 446ab5c8c7]
2020-11-23 10:53:23 -05:00
Ori Messinger 8cae8c3ad5 ROCm SMI Python CLI: GPU showproductname SKU Fix
The purpose of this patch is to fix a bug present when using the
--showproductname option, resulting in the following error:
undefined symbol: rsmi_dev_sku_get

This bug fix uses a substring from vbios version instead of using the
LIB's rsmi_dev_sku_get to avoid getting the undefined symbol error.

Change-Id: I56d72a481d5dde44c56106ae297f4bcff40ac15f
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/rocm_smi_lib commit: 015c7d59d0]
2020-11-12 15:38:52 -05:00
Chris Freehill 7faab5b601 Add disclaimer to README and update pdf manual
Change-Id: I19c957e5a1de9f87f1834d341221fad6c826b252


[ROCm/rocm_smi_lib commit: af5227cdf7]
2020-11-10 17:36:51 -06:00
Chris Freehill 7280f28b40 rsmitst address sanitizer support
Also, add libasan flag variants for librocm_smi

Change-Id: Ibd012e40d26907addf8c0550aaf9f78c11b8d51f


[ROCm/rocm_smi_lib commit: bf6af90908]
2020-11-10 15:45:56 -06:00
Chris Freehill 6fc9f802ae Quiet address sanitizer warnings
Also,
* Fix some doxygen issues
* Fix address sanitizer issues in rsmitst

Change-Id: Ie6c6fd9af5c418210b7064e79650fb92cd4a5e2b


[ROCm/rocm_smi_lib commit: 63064b0000]
2020-11-10 14:16:39 -06:00
Chris Freehill 0fb36c2f41 Make CMakeLists.txt recognize ADDRESS_SANITIZER
Change-Id: Ic80ac42c62cd400e48fb26d504547931fdd6863a


[ROCm/rocm_smi_lib commit: e7c8dfe2a2]
2020-11-04 17:57:31 -06:00
Chris Freehill b7df80c34b Use relative path to find librocm_smi
Change-Id: Ifca3f54d680a802c1c5fa360d17e64338b9ac9a8


[ROCm/rocm_smi_lib commit: 438d28612f]
2020-10-29 14:36:48 -05:00
Elena Sakhnovitch c17f9e05e1 ROCm SMI Python CLI: --rasinject partial support
This implementation is copied directly from the previous rocm_smi.py
script. This feature is experimental and will be updated or removed in
future releases.

Signed-off-by: Elena Sakhnovitch
Change-Id: I5cd38266946302bc4123aeafaa825e13f704235e


[ROCm/rocm_smi_lib commit: 4117719edd]
2020-10-22 17:22:13 -04:00
Chris Freehill bbbdd0cb2c Add new XGMI counter events to rsmiBindings.py
Also, correct RSMI_EVNT_LAST to new value.

Change-Id: I9f693cb398bba583201f6b5b5f0e2d45ede2e4e0


[ROCm/rocm_smi_lib commit: 1982fdc4fb]
2020-10-22 17:21:50 -04:00
Divya Shikre d7d7d1e7ea Fix for weight/hops not being updated
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I333d49fa011b85d41eca63c082c0615febe2f7e9


[ROCm/rocm_smi_lib commit: 94291bf882]
2020-10-20 15:01:06 -04:00
Ori Messinger 4e97667f31 ROCm SMI Python CLI: Add CU Occupancy to showPids function
The purpose of this patch is to add CU occupancy functionality to showPids
by calling rsmi_compute_process_info_get from the LIB.

Now showPids shows the following information on (KFD compute) processes:
PID, process name, GPU(s), VRAM used, SDMA used, and CU occupancy.

Change-Id: Ie005901e0eb946ef0fbb3523245ca451c4eed595
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/rocm_smi_lib commit: 20ae72b078]
2020-10-15 21:21:32 -04:00
Ramesh Errabolu 8b53e7812f Update ROCm SMI library with ability to read CU occupancy
Change-Id: Ib9882fa2d81c13604af282279bfa116bc2fd05a4


[ROCm/rocm_smi_lib commit: 328878343c]
2020-10-14 09:33:37 -04:00
Divya Shikre caf3a5132b Adding gtest for gpu metrics read
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I66edb15c8b7380f3427822b33e845202bfac7a2b


[ROCm/rocm_smi_lib commit: f397cba414]
2020-10-08 13:37:47 -04:00
Ori Messinger 9a20f6fa3e ROCm SMI Python CLI: Check for amdgpu Driver Initialization
The purpose of this patch is to check for amdgpu driver initialization
before attempting to initialize rocmsmi in the CLI.

Additionally, since the '--help' functionality does not rely on anything
external to the CLI, it can now be called without the driver initialized.

Change-Id: I2fcce60ca6d9f77835549e3558c4bb1747499c5c
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/rocm_smi_lib commit: e3c9aec714]
2020-10-08 11:17:45 -04:00
Chris Freehill c2acc451af Revert "Revert "Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters""
This reverts commit c009809fcd.



Change-Id: Ic412a64d35aab74caf12bf4c791f0a66ac15b061


[ROCm/rocm_smi_lib commit: 5465d872aa]
2020-10-08 10:36:30 -04:00
Kent Russell f5015e6cb4 Remove extraneous mutexes
We already grab the mutex before getting the device name, so we don't
need to grab it again

Change-Id: Ib627ba3a39c485f6069af052cfd3e6c522873d43


[ROCm/rocm_smi_lib commit: e350278b68]
2020-10-08 07:55:07 -04:00
Chris Freehill c009809fcd Revert "Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters"
This reverts commit 8acd845e5b.

Temporarily reverting until the driver side of this is upstream

Change-Id: I2d8243208c1271ebad90bc2ee0fda2dfefb0831b


[ROCm/rocm_smi_lib commit: ae6d3fbdd0]
2020-10-07 18:42:56 -04:00
Kent Russell e432b4e9e3 Check FRU-based product information if available
WKS and server cards have an FRU with product information, so try to use
that for product name and product SKU if it exists.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I40bbd3bf62f4cb02e96015ed1630112691cacbc3


[ROCm/rocm_smi_lib commit: df7c3434cd]
2020-10-07 14:09:23 -04:00
Chris Freehill 624f906f07 Fail gracefully if drm directory is not found
Change-Id: I0f3ab2721108355752caf0280124469b98af4967


[ROCm/rocm_smi_lib commit: c6f02b4d62]
2020-10-05 21:12:11 -04:00
Chris Freehill 8acd845e5b Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters
Also some format fixes

Change-Id: Id3c0f6b3cf5b327bb9ca6acb6091dc67764c8032


[ROCm/rocm_smi_lib commit: 946bf93dfb]
2020-10-05 17:22:19 -05:00
Divya Shikre 94fc1524c3 Adding functionality that will parse gpu_metrics sysfs file
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I3a84870b83eb4cd0ed46f10bb19169c91f99fd8e


[ROCm/rocm_smi_lib commit: 8b48564ce3]
2020-10-02 10:25:41 -04:00
Chris Freehill 91267d1440 Add gtest lib dir to library search path
Change-Id: I57bb20e2a67a4eaac2d0e24314e22d1a5fbe3533


[ROCm/rocm_smi_lib commit: 3522e94ed0]
2020-10-01 23:46:33 -04:00
Ori Messinger 1b36ce7e6d ROCm SMI Python CLI: Implement --setclock for all Valid Clocks
The purpose of this patch is to implement --setclock functionality for
all of the valid clocks (can be set with --setclock TYPE LEVEL).

The valid clocks are: dcefclk, fclk, mclk, pcie, sclk, socclk.
This functionality uses the existing 'setClocks' method.

Change-Id: I1d62baf372427ac1c0642c26a949663b673ef335
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/rocm_smi_lib commit: 4ed1c1d492]
2020-09-22 15:41:51 -04:00
Mukul Joshi 602f182344 Use correct string conversion function for VRAM and SDMA usage
VRAM and SDMA usage can be 64-bit long numbers. Use stoull()
instead of stoi() to convert the VRAM and SDMA usage strings to
numbers.

Change-Id: Ifadbada9f33320fc67666036ce8439823c1d1fb7


[ROCm/rocm_smi_lib commit: fb2ed24372]
2020-09-21 12:28:22 -04:00
Mukul Joshi 40bf5754fd Add support for GPU reset SMI events
Add handling for both pre GPU reset and post GPU reset SMI
events.

Change-Id: I64d5e006bef58cb28b1c580c75f482a4590427da


[ROCm/rocm_smi_lib commit: 8b95705e6f]
2020-09-16 13:25:06 -04:00
Mukul Joshi 4ad8b300d8 Add support for KFD Thermal Throttling SMI event
Add handling for receiving thermal throttling SMI event from the
kernel.
Also, update the event notification test to work with the new event.

Change-Id: Ib89c12b244f90998ccbae0a38b37f25705d156e0


[ROCm/rocm_smi_lib commit: aff75c955f]
2020-09-16 13:24:57 -04:00
Mukul Joshi 8082416569 Update KFD SMI event notification handling
The event bitmask in the KFD SMI event is now replaced with an event index in
the SMI event message. Sending an event bitmask, which was a 64-bit
field with only 1 bit set, was quite wasteful of memory and also
potentially limited the interface to 64 events. Instead, the kernel now sends
the event index in the SMI event message. As a result, update the
KFD SMI event handling to expect the event index in the message.
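
The difference between the two encodings can be sketched as below. Both function names are hypothetical, and the sketch treats the index as the 0-based position of the single set bit; whether the kernel numbers events from 0 or 1 is not stated in this commit message.

```python
def index_from_bitmask(mask):
    """Convert a one-hot event bitmask to the 0-based position of its bit."""
    # A valid mask is positive and has exactly one bit set.
    if mask <= 0 or mask & (mask - 1):
        raise ValueError("expected exactly one bit set")
    return mask.bit_length() - 1


def bitmask_from_index(index):
    """Inverse conversion; a 64-bit mask only covers indices 0..63."""
    if not 0 <= index < 64:
        raise ValueError("a 64-bit mask cannot represent this event")
    return 1 << index
```

An index carries the same information as the one-hot mask in far fewer bits, and it is not capped at 64 distinct events.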

Change-Id: I3e74620788d3c1f7c0bdaa69e9d9ab3d1aba2c92


[ROCm/rocm_smi_lib commit: 406859ca8a]
2020-09-16 13:24:50 -04:00
Chris Freehill 74113a5594 Enable library-based rocm_smi.py
Change-Id: I5443308905456defc9818fac07ac2f20fe9426fd


[ROCm/rocm_smi_lib commit: 8f9f9433d8]
2020-09-16 09:31:30 -05:00
Chris Freehill fb7952f401 Make sure all sensor labels have valid mappings
There may not be label files for some sensors on older
devices. We need to make sure there is a valid dummy
mapping in these cases.

Change-Id: Id6a8b71e554552be84a0e42a477070b504151e7f


[ROCm/rocm_smi_lib commit: b015052a07]
2020-09-11 17:32:54 -05:00
Chris Freehill c2381bff52 Add missing docs section for EvntNotif
Change-Id: I69187c734d2618ddb4272c58bb76d04646908793


[ROCm/rocm_smi_lib commit: cafd678d5d]
2020-09-11 15:48:56 -05:00
Elena Sakhnovitch 8116b10d72 ROCm SMI CLI: Add JSON support for topo functions
-Add divider between devices for --showclocks to increase readability.
-Fix fan rounding error
-Fix spaces to comply with coding standard
-Fix @param description error in topo functions
-JSON result for topology:
{
  "card0": {
    "(Topology) Numa Node": "0",
    "(Topology) Numa Affinity": "4294967295"
  },
  "card1": {
    "(Topology) Numa Node": "0",
    "(Topology) Numa Affinity": "4294967295"
  },
  "system": {
    "(Topology) Weight between DRM devices 0 and 1": "40",
    "(Topology) Hops between DRM devices 0 and 1": "2",
    "(Topology) Link type between DRM devices 0 and 1": "PCIE"
  }
}
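
A consumer of this output might read it as sketched below. The JSON text is the sample from this commit message; the helper name `link_info` is hypothetical, and the key format is taken from the sample as-is, so it may differ in other versions of the CLI. All values in the sample are strings, so the numeric ones are converted explicitly.

```python
import json

# Sample output copied from the commit message above.
SAMPLE = """
{
  "card0": {
    "(Topology) Numa Node": "0",
    "(Topology) Numa Affinity": "4294967295"
  },
  "card1": {
    "(Topology) Numa Node": "0",
    "(Topology) Numa Affinity": "4294967295"
  },
  "system": {
    "(Topology) Weight between DRM devices 0 and 1": "40",
    "(Topology) Hops between DRM devices 0 and 1": "2",
    "(Topology) Link type between DRM devices 0 and 1": "PCIE"
  }
}
"""


def link_info(topology, src, dst):
    """Extract weight, hops and link type for one device pair."""
    system = topology["system"]
    suffix = f"between DRM devices {src} and {dst}"
    return {
        "weight": int(system[f"(Topology) Weight {suffix}"]),
        "hops": int(system[f"(Topology) Hops {suffix}"]),
        "link_type": system[f"(Topology) Link type {suffix}"],
    }


topology = json.loads(SAMPLE)
print(link_info(topology, 0, 1))
# {'weight': 40, 'hops': 2, 'link_type': 'PCIE'}
```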

Signed-off-by: Elena Sakhnovitch <Elena.Sakhnovitch@amd.com>
Change-Id: I711c100362826ed729ff90edd407009237d64f8f


[ROCm/rocm_smi_lib commit: 91f8fcb7b1]
2020-09-10 12:57:14 -04:00
Elena Sakhnovitch 248fee7425 Add README.md starter file
signed-off-by: Elena Sakhnovitch
Change-Id: I677b7d643c6559693c5ad627b704ee36631cc32e


[ROCm/rocm_smi_lib commit: edcae88fe9]
2020-09-10 11:09:42 -04:00
Elena Sakhnovitch 889bda96e1 ROCm SMI Python CLI: Implement --showbw
PCIE bandwidth functionality

Signed-off-by: Elena Sakhnovitch
Change-Id: I5a9ddc589846b6032739d491319078ead5723a27


[ROCm/rocm_smi_lib commit: 8b82621e72]
2020-09-09 14:52:58 -04:00
Harish Kasiviswanathan 43831998c9 Don't hard code rocm_smi_lib path
During rocm_smi_lib installation the path should be set using ldconfig

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I0cab18f492013b783d1ce632591ce295f934a168


[ROCm/rocm_smi_lib commit: f1786a3095]
2020-09-08 19:29:09 -04:00
Divya Shikre 31f3b6d33d Adding setsrange, setmrange, setvc, setslevel and setmlevel functionality to rocm lib and cli
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I5fd65ea7bcd5403aaf2e42d2aa28d837929da253


[ROCm/rocm_smi_lib commit: 54d4b9d500]
2020-09-08 18:42:39 -04:00
Ori Messinger e4aff0d37c ROCm SMI Python CLI: Implement show/set mclk OverDrive
The purpose of this patch is to implement show and set mclk OverDrive.
This implementation is copied directly from the previous rocm_smi.py
script since this functionality is mostly deprecated.

Change-Id: I705430f873a73f954b6812c222a385ff4e9b6eb2
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/rocm_smi_lib commit: 95d43e30e3]
2020-09-08 14:24:11 -04:00
Ori Messinger c73a70b431 ROCm SMI Python CLI: Implement Valid Clocks
The purpose of this patch is to implement the remaining valid clocks.
The valid clocks are: dcefclk, fclk, mclk, pcie, sclk, socclk
This functionality is needed for the 'setClocks' method.

Change-Id: Ie648fb29dbbd61f0f064d4462ac566911f1ca2aa
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/rocm_smi_lib commit: 2d59d0877b]
2020-09-02 06:40:59 -04:00
Divya Shikre b6ca634dcd Adding voltage range functionality to rocm cli
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I9288c0c6cda2a984c34cfd2570deec640b6c9f0d


[ROCm/rocm_smi_lib commit: d1f4c252b0]
2020-08-28 12:04:36 -04:00
Divya Shikre 3e5469164e Adding logic to skip the loop if src and dest device are the same in HW Topology.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ib9cfbf5a7238ba75f6463e8fa6250bb9946b7979


[ROCm/rocm_smi_lib commit: 49734f8d34]
2020-08-20 10:44:28 -04:00
Harish Kasiviswanathan a659ff0a72 Update rsmi_process_info_t with sdma_usage field
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Ie326e75674127a2e13f17fac344e2b672e877ce1


[ROCm/rocm_smi_lib commit: 9f5d4a698e]
2020-08-19 17:54:15 -04:00