1558 Commits

Author SHA1 Message Date
Divya Shikre f397cba414 Adding gtest for gpu metrics read
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I66edb15c8b7380f3427822b33e845202bfac7a2b
2020-10-08 13:37:47 -04:00
Ori Messinger e3c9aec714 ROCm SMI Python CLI: Check for amdgpu Driver Initialization
The purpose of this patch is to check for amdgpu driver initialization
before attempting to initialize rocmsmi in the CLI.

Additionally, since the '--help' functionality does not rely on anything
external to the CLI, it can now be called without the driver initialized.

Change-Id: I2fcce60ca6d9f77835549e3558c4bb1747499c5c
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2020-10-08 11:17:45 -04:00
Chris Freehill 5465d872aa Revert "Revert "Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters""
This reverts commit ae6d3fbdd0.

Change-Id: Ic412a64d35aab74caf12bf4c791f0a66ac15b061
2020-10-08 10:36:30 -04:00
Kent Russell e350278b68 Remove extraneous mutexes
We already grab the mutex before getting the device name, so we don't
need to grab it again

Change-Id: Ib627ba3a39c485f6069af052cfd3e6c522873d43
2020-10-08 07:55:07 -04:00
Chris Freehill ae6d3fbdd0 Revert "Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters"
This reverts commit 946bf93dfb.

Temporarily reverting until the driver side of this is upstream

Change-Id: I2d8243208c1271ebad90bc2ee0fda2dfefb0831b
2020-10-07 18:42:56 -04:00
Kent Russell df7c3434cd Check FRU-based product information if available
WKS and server cards have an FRU with product information, so try to use
that for product name and product SKU if it exists.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I40bbd3bf62f4cb02e96015ed1630112691cacbc3
2020-10-07 14:09:23 -04:00
Chris Freehill c6f02b4d62 Fail gracefully if drm directory is not found
Change-Id: I0f3ab2721108355752caf0280124469b98af4967
2020-10-05 21:12:11 -04:00
Chris Freehill 946bf93dfb Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters
Also some format fixes

Change-Id: Id3c0f6b3cf5b327bb9ca6acb6091dc67764c8032
2020-10-05 17:22:19 -05:00
Divya Shikre 8b48564ce3 Adding functionality that will parse gpu_metrics sysfs file
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I3a84870b83eb4cd0ed46f10bb19169c91f99fd8e
2020-10-02 10:25:41 -04:00
Chris Freehill 3522e94ed0 Add gtest lib dir to library search path
Change-Id: I57bb20e2a67a4eaac2d0e24314e22d1a5fbe3533
2020-10-01 23:46:33 -04:00
Ori Messinger 4ed1c1d492 ROCm SMI Python CLI: Implement --setclock for all Valid Clocks
The purpose of this patch is to implement --setclock functionality for
all of the valid clocks (can be set with --setclock TYPE LEVEL).

The valid clocks are: dcefclk, fclk, mclk, pcie, sclk, socclk.
This functionality uses the existing 'setClocks' method.

Change-Id: I1d62baf372427ac1c0642c26a949663b673ef335
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2020-09-22 15:41:51 -04:00
Mukul Joshi fb2ed24372 Use correct string conversion function for VRAM and SDMA usage
VRAM and SDMA usage can be 64-bit long numbers. Use stoull()
instead of stoi() to convert the VRAM and SDMA usage strings to
numbers.

Change-Id: Ifadbada9f33320fc67666036ce8439823c1d1fb7
2020-09-21 12:28:22 -04:00
Mukul Joshi 8b95705e6f Add support for GPU reset SMI events
Add handling for both pre GPU reset and post GPU reset SMI
events.

Change-Id: I64d5e006bef58cb28b1c580c75f482a4590427da
2020-09-16 13:25:06 -04:00
Mukul Joshi aff75c955f Add support for KFD Thermal Throttling SMI event
Add handling for receiving thermal throttling SMI event from the
kernel.
Also, update the event notification test to work with the new event.

Change-Id: Ib89c12b244f90998ccbae0a38b37f25705d156e0
2020-09-16 13:24:57 -04:00
Mukul Joshi 406859ca8a Update KFD SMI event notification handling
The event bitmask in the KFD SMI event message is now replaced with an
event index. Sending an event bitmask, which was a 64-bit field with
only 1 bit set, was quite wasteful of memory and also potentially
limited the number of events to 64. Instead, the kernel now sends the
event index in the SMI event message. As a result, update the KFD SMI
event handling to expect the event index in the message.

Change-Id: I3e74620788d3c1f7c0bdaa69e9d9ab3d1aba2c92
2020-09-16 13:24:50 -04:00
Chris Freehill 8f9f9433d8 Enable library-based rocm_smi.py
Change-Id: I5443308905456defc9818fac07ac2f20fe9426fd
2020-09-16 09:31:30 -05:00
Chris Freehill b015052a07 Make sure all sensor labels have valid mappings
There may not be label files for some sensors on older
devices. We need to make sure there is a valid dummy
mapping in these cases.

Change-Id: Id6a8b71e554552be84a0e42a477070b504151e7f
2020-09-11 17:32:54 -05:00
Chris Freehill cafd678d5d Add missing docs section for EvntNotif
Change-Id: I69187c734d2618ddb4272c58bb76d04646908793
2020-09-11 15:48:56 -05:00
Elena Sakhnovitch 91f8fcb7b1 ROCm SMI CLI: Add JSON support for topo functions
-Add divider between devices for --showclocks to increase readability.
-Fix fan rounding error
-Fix spaces to comply with coding standard
-Fix @param description error in topo functions
-JSON result for topology:
{
  "card0": {
    "(Topology) Numa Node": "0",
    "(Topology) Numa Affinity": "4294967295"
  },
  "card1": {
    "(Topology) Numa Node": "0",
    "(Topology) Numa Affinity": "4294967295"
  },
  "system": {
    "(Topology) Weight between DRM devices 0 and 1": "40",
    "(Topology) Hops between DRM devices 0 and 1": "2",
    "(Topology) Link type between DRM devices 0 and 1": "PCIE"
  }
}

Signed-off-by: Elena Sakhnovitch <Elena.Sakhnovitch@amd.com>
Change-Id: I711c100362826ed729ff90edd407009237d64f8f
2020-09-10 12:57:14 -04:00
Elena Sakhnovitch edcae88fe9 Add README.md starter file
Signed-off-by: Elena Sakhnovitch
Change-Id: I677b7d643c6559693c5ad627b704ee36631cc32e
2020-09-10 11:09:42 -04:00
Elena Sakhnovitch 8b82621e72 ROCm SMI Python CLI: Implement --showbw
PCIE bandwidth functionality

Signed-off-by: Elena Sakhnovitch
Change-Id: I5a9ddc589846b6032739d491319078ead5723a27
2020-09-09 14:52:58 -04:00
Harish Kasiviswanathan f1786a3095 Don't hard code rocm_smi_lib path
During rocm_smi_lib installation the path should be set using ldconfig

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I0cab18f492013b783d1ce632591ce295f934a168
2020-09-08 19:29:09 -04:00
Divya Shikre 54d4b9d500 Adding setsrange, setmrange, setvc, setslevel and setmlevel functionality to rocm lib and cli
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I5fd65ea7bcd5403aaf2e42d2aa28d837929da253
2020-09-08 18:42:39 -04:00
Ori Messinger 95d43e30e3 ROCm SMI Python CLI: Implement show/set mclk OverDrive
The purpose of this patch is to implement show and set mclk OverDrive.
This implementation is copied directly from the previous rocm_smi.py
script since this functionality is mostly deprecated.

Change-Id: I705430f873a73f954b6812c222a385ff4e9b6eb2
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2020-09-08 14:24:11 -04:00
Ori Messinger 2d59d0877b ROCm SMI Python CLI: Implement Valid Clocks
The purpose of this patch is to implement the remaining valid clocks.
The valid clocks are: dcefclk, fclk, mclk, pcie, sclk, socclk
This functionality is needed for the 'setClocks' method.

Change-Id: Ie648fb29dbbd61f0f064d4462ac566911f1ca2aa
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2020-09-02 06:40:59 -04:00
Divya Shikre d1f4c252b0 Adding voltage range functionality to rocm cli
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I9288c0c6cda2a984c34cfd2570deec640b6c9f0d
2020-08-28 12:04:36 -04:00
Divya Shikre 49734f8d34 Adding logic to skip the loop if src and dest device are the same in HW Topology.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ib9cfbf5a7238ba75f6463e8fa6250bb9946b7979
2020-08-20 10:44:28 -04:00
Harish Kasiviswanathan 9f5d4a698e Update rsmi_process_info_t with sdma_usage field
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Ie326e75674127a2e13f17fac344e2b672e877ce1
2020-08-19 17:54:15 -04:00
Divya Shikre 1276e4b9e9 Adding gpu reset functionality to rocm cli
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ifc0a239e8e8046fd7f56893d0101e0866cc3185f
2020-08-19 13:37:47 -04:00
Chris Freehill 7be97ec2aa Clean up comments for rsmitst
Change-Id: Iea5322a5fd3bffe77557fa2cecbce70716e1258c
2020-08-17 11:48:07 -05:00
Divya Shikre 2e8dc4f2a9 Adding Sdma Usage to showpids
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I72a9e1adc61eba382f1ac17c8e50b2a8bd6d6898
2020-08-14 12:12:34 -04:00
Divya Shikre 4032898d1b Adding Hw Topology option to ROCm SMI Python CLI
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ic46334567703f705e38b3a8b4a08ab388c749251
2020-08-13 18:51:21 -04:00
Ori Messinger b568270f55 ROCm SMI Python CLI: properly cast pid to int
The purpose of this patch is to fix --showpids and --showpidgpus functionality.
When pid is passed into a LIB function, it must be cast to int first.

Change-Id: I5cb7ac41052abeefff0dedf2384c4bb3c8d577a3
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2020-08-13 04:34:08 -04:00
Chris Freehill da64e284dc Move README back to root
README should be at root to display in github main page.
Also, removed paragraph related to API changes early
in development.

Change-Id: I2e92573a31d3caa7790364de9356c6d7e7be553d
2020-08-06 09:27:48 -05:00
Chris Freehill 0468aa4971 Correct event counter documentation example
Change-Id: I74c41de8e4aacbd42d9e156983369eb76bec3367
2020-08-06 08:49:21 -05:00
Ori Messinger 2b909252ac ROCm SMI Python CLI
This tool acts as a command line interface for manipulating
and monitoring the Radeon Open Compute Kernel, similar to the
rocm_smi.py python tool.

The purpose of this commit is for the initial upload and cleanup
of the (incomplete) rocmSmiLib_cli.py and rsmiBindings.py files.

In the near future, this tool should have full feature parity with
rocm_smi.py by relying on the available rocm_smi_lib functions.

Change-Id: Ifbafd5118c15c68c240e3c83a47d2690a27c9353
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2020-08-05 12:38:11 -04:00
Chris Freehill 92c258c364 Replace "." in pkg name with "-"
Package name should have a hyphen (not a period) between
NumCommitsSinceLastTag and ROCMIntegrationJobIdentifier.

Fixes SWDEV-245838

Change-Id: I28c4337af6f92ac51a4aed03a09af23b92bd89b5
2020-07-27 20:54:52 -04:00
Chris Freehill c2439d28e8 Correct usage of bitwise &
Also, fix warning related to catch() and cpplint error.

Change-Id: I4292170538d0f700fccb605814c5058543abe74a
2020-07-26 20:08:24 -05:00
Ashutosh Mishra d325613220 Adding "BUILD_SHARED_LIBS" flag to cmake files
JIRA: SWDEV-234471
Change cmake to dynamically create shared or archive libs depending on the parameter passed to cmake

Adapted comments.

Change-Id: Ice5925719b8c307c32310b252f61cbc211d1af27
2020-07-16 22:32:55 -04:00
Chris Freehill 52514835f0 Update xgmi event counter documentation
Also:
* fix doxygen manual generation that was altered during
  OAM refactor
* quiet some compile warnings.

Change-Id: I548a3cf00eb887bea3dbf58e362ca6dfe90bde28
2020-07-16 17:42:56 -05:00
Mukul Joshi 9d24fc9175 Fix compiler warning in TestPciReadWrite
Use an unsigned number for the left shift operation. If it is not
specified as unsigned, the compiler warns about a left shift of a
negative number.

Change-Id: I05948073b0c40700bee69399b08df6031fc49d70
2020-07-13 17:32:17 -04:00
Mukul Joshi eea1ed8c3d Add support to retrieve process SDMA usage information.
Also, print SDMA usage information in TestProcInfoRead.

Change-Id: I8d19be3b8653e298c81237e5067eca75a1743e70
2020-07-13 17:32:08 -04:00
Chris Freehill 68155baed5 Handle un-readable kfd properties files
Some systems have kfd sysfs properties entries that
are unreadable; for example, when a multi-gpu system is
dividing the gpus among containers, each container may
only be able to access certain gpus.

Previously, all kfd topology node properties entries were
assumed to be valid. Now, we check for readability before
declaring them "valid".

Fixes SWDEV-240169

Also:
* remove an assertion that would happen when no pcie
device identifier files are found on the system.
* fix cpplint issues

Change-Id: I74321b685159dd2628c890b33c39ad82988cb9dd
2020-07-10 12:35:31 -04:00
Chris Freehill e2c7ef6422 TestPerfCntrReadWrite fail rsmitst if not supported
Fixes SWDEV-243639

Change-Id: I087171231fbbe5939f239efad25a5485529381a3
2020-07-08 18:41:30 -04:00
Chris Freehill c2ef9a6879 Fix docs + cmake_utils path issues
This corrects issues that arose after OAM reorganization.
It should address SWDEV-243294.

Also, fix some compile warnings that show up on RHEL.

Change-Id: Id14d444905da35cd7346bcfbcd82b6d0572708c4
2020-07-08 09:47:25 -05:00
Chris Freehill 866438966d Quiet spurious pthread_unlock warnings
A message is output in debug builds when pthread_unlock
returns an error. However, in most cases, it should return
EPERM. In fact, if it doesn't return EPERM, it is an
indication of a problem. This commit adjusts accordingly.

Change-Id: Ia5cad89aa6e68e79c1291ea21adffb0fa68f2300
2020-06-30 15:12:58 -05:00
Divya Shikre e21232f059 OAM: Implement get_sensors_info()
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia2c6e18f463c0f97530ca8ad07d249e6f2116534
2020-06-29 14:50:19 -04:00
Amber Lin 27deaea6e8 OAM: Add get dev and pci properties and sensor count
Also, add amdoam_get_error_description.

On behalf of
Amber Lin <Amber.Lin@amd.com> and
Divya Shikre <DivyaUday.Shikre@amd.com>

Change-Id: I1f5ac0c5948adb2c30008e95c501e8b69b8183b6
2020-06-23 17:21:07 -05:00
Chris Freehill 6594f8f58b Refactor rsmi to support oam
Change-Id: Idc524e01ba06eb5c8d1682becaf5bf8ced5bffcf
2020-06-22 18:51:46 -05:00
Chris Freehill 59394f3354 Ensure no device mutexes are left held on shut_down
Also, fix TestMutualExclusion and TestEvtNofifReadWrite.
Previously, some of the normal SetUp function was not
being done for this test. In some cases, no DRM
devices are being found on the test machine. Skip
those.

Change-Id: Iaa5a257841eb459aa57491ae9680c34a60d5ac2b
2020-06-19 13:59:20 -05:00