Commit graph

123 commits

Author SHA1 Message Date
Galantsev, Dmitrii 0c52236abd CMAKE: Resolve lib dependencies for tests
amdsmitst was failing and not finding libgtest and libamd_smi.

This change resolves the issue by

1. Installing gtest into tests directory
2. Modifying RUNPATH variable to point to libamd_smi.so

Change-Id: I126d01c88116d37c5f2b55b9ecb2c9f1313f26fe
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2022-12-12 14:37:57 -06:00
Dalibor Stanisavljevic b4b761d02f SWDEV-370223 - Change the name of the header to amdsmi.h
Change dev to device_handle throughout the file
Change the pcie_info pcie_speed field type to uint32_t
Add AMDSMI prefix before amdsmi_mm_ip enum

Change-Id: I242145389ddc3f2ad05dfd6ca371640f4d118fc4
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-12-08 13:34:34 +01:00
Galantsev, Dmitrii aeb0bf5832 CMAKE: Repackage whole project for ROCm 5.5 release
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Change-Id: I932b11a111c8e0db04bd8c5e0c3d1a470e5b2386
2022-11-29 17:04:32 -06:00
Bill(Shuzhou) Liu c8caa80405 Fix the unit tests
Fix a few broken unit tests to handle NOT_YET_IMPLEMENTED errors.

Change-Id: If3afac0dc32f2e3e82d83bffa5906b630bb1894a
2022-11-04 08:53:09 -05:00
Galantsev, Dmitrii 7957b63dd4 Cleanup tests
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2022-11-03 13:30:16 -05:00
Galantsev, Dmitrii c99e4e1501 Cleanup CMakeLists.txt for packaging
Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2022-11-03 12:44:23 -05:00
Bill(Shuzhou) Liu 9a92ea833f The device name and vbios version is incorrect
Get the device name from rocm-smi, because it is not displayed properly
on some cards. Set the vbios version using rocm-smi.

Change-Id: I138f1760cde94007cb93cad02c6d8cccbb4afa28
2022-10-28 13:03:18 -05:00
Bill(Shuzhou) Liu a25c71b730 Fix unit test compile error
Update the unit test as header enum type name is changed.

Change-Id: Ie965462da7d46259883650b15644003cf936982a
2022-10-20 09:54:11 -05:00
Bill(Shuzhou) Liu 2b2d11c446 Change the get_socket_handles and get_device_handles APIs interface
Those two APIs are changed to let the user get the handles count,
allocate memory, and then return handles to the allocated memory.

Change-Id: Ibe28a89ad188c99da6af3af1740b2b25ff22ba06
2022-10-20 09:24:31 -05:00
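The count-then-allocate-then-fill calling convention described in the commit above can be sketched as follows. This is only an illustration under stated assumptions: `example_handle`, `example_get_device_handles`, and `example_list_devices` are hypothetical names, and the three fake devices stand in for real hardware. The actual amdsmi signatures may differ.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical opaque handle type; the real library defines its own. */
typedef void *example_handle;

/* Stand-in data source for this sketch: three fake devices. */
static int fake_devices[3];

/*
 * Hypothetical count-then-fill query (not the real amdsmi API).
 * - handles == NULL: store the number of available handles in *count.
 * - handles != NULL: copy at most *count handles into the caller's buffer
 *   and store the number actually written in *count.
 * Returns 0 on success, -1 on invalid arguments.
 */
int example_get_device_handles(uint32_t *count, example_handle *handles)
{
    const uint32_t available = sizeof fake_devices / sizeof fake_devices[0];

    if (count == NULL)
        return -1;
    if (handles == NULL) {
        *count = available;
        return 0;
    }

    uint32_t n = (*count < available) ? *count : available;
    for (uint32_t i = 0; i < n; ++i)
        handles[i] = &fake_devices[i];
    *count = n;
    return 0;
}

/* Caller side: ask for the count, allocate, then fetch the handles. */
example_handle *example_list_devices(uint32_t *out_count)
{
    uint32_t count = 0;

    if (out_count == NULL)
        return NULL;
    if (example_get_device_handles(&count, NULL) != 0 || count == 0)
        return NULL;

    example_handle *handles = malloc(count * sizeof *handles);
    if (handles == NULL)
        return NULL;

    if (example_get_device_handles(&count, handles) != 0) {
        free(handles);
        return NULL;
    }
    *out_count = count;
    return handles; /* the caller owns the buffer and must free() it */
}
```

With this convention the library never allocates on the caller's behalf, so buffer ownership stays with the caller.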
Dalibor Stanisavljevic 3daf9c1063 SWDEV-353742 - Port smilib function to amdsmi
Change-Id: I99df249755a5c665a8dd1777fa82d046e139bd77
Signed-off-by: Dalibor Stanisavljevic <Dalibor.Stanisavljevic@amd.com>
2022-10-20 09:24:22 -05:00
Bill(Shuzhou) Liu 0c91ef919d Restructure the folder
Move rocm_smi related functions to the rocm_smi folder. Move amd_smi to
the top-level include/ and src/ folders. Remove the obsolete oam folder.
Change the CMakeLists.txt to update folder locations.

Change-Id: I52e6be739e49f3b0545865f25364787f5985e9c3
2022-10-20 09:23:51 -05:00
Bill(Shuzhou) Liu 1ec3a2182e Support rocm-smi related device information
A few fields are added to board_info and asic_info for rocm-smi
device information.

Implement rocm-smi related fw block in amdsmi_get_fw_info().

Change-Id: I825d3e5c7feaa07a6e05386d4f1a59ebf528dfc0
2022-10-20 09:23:41 -05:00
Bill(Shuzhou) Liu f1d02aca79 Port rocm-smi function to amd-smi
Port most rocm-smi function to amd-smi and add unit tests.

Change-Id: I6387a4bdaf20ead2389c99bb01d438156ccd0747
2022-09-06 12:08:59 -04:00
Bill(Shuzhou) Liu 86017b799c Port more rocm-smi function to amd-smi
The API support function, performance counter, process information,
topology and xgmi info.

Change-Id: I3350ec75fdd2ca1438e79134582ae83c49763056
2022-08-24 12:49:27 -05:00
Bill(Shuzhou) Liu 7b92c694a0 Support events in the amdsmi
Port the events handling from rocm-smi to amd-smi

Change-Id: I0b4cb30a585cb2188a24be0e21c1c156b461bb1d
2022-08-23 16:49:56 -04:00
Bill(Shuzhou) Liu 98df483bef Add unit test support
Add gtest based unit test framework. Implement fan read/write function.

Change-Id: I83375c24b99d24d01d12bccda863a38f75f5987f
2022-08-05 09:55:34 -04:00
Bill(Shuzhou) Liu 8ce9289bc2 Upgrade GoogleTest to v1.11.0
The old GoogleTest has compile errors on CentOS 9. Upgrade it
to the latest version.

Change-Id: I6bbe6afdfad6422a210f258880ddc87a9f088d76
2022-03-09 15:18:43 -05:00
Sreekant Somasekharan e6ae697e9c Add blacklist filter 'virtualization' for rsmi tests failing in SRIOV
Change-Id: Ibbaef092482c0b78ecd86a29f0b9b4331b51abe2
2022-03-04 22:13:44 -05:00
Divya Shikre 8c4635acea Temporary blacklist TestPerfLevelReadWrite for navi21
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iee2146170b6828fe4fe2846c3ebfd57f95734f34
2022-01-27 22:56:37 -05:00
Divya Shikre 11a71c63b1 Don't assert when fan is not supported.
Add a check when RSMI_STATUS_NOT_SUPPORTED is returned for fanRead/fanReadWrite.
Fix for SWDEV-314176 & SWDEV-314175 reported.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Icf2cc541a3fa5ca4794aff5d6bc91104adc45e6d
2022-01-20 12:29:12 -05:00
Divya Shikre b4fd9c0d94 Update temp_read rsmitst.
Check for RSMI_STATUS_INVALID_ARGS when invalid args are passed.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I0d5ff84aee5cce4214026ddcd860a17ae3e43147
2021-11-29 18:09:45 -05:00
Sreekant Somasekharan c6f695f5a9 Skip TestFrequenciesReadWrite for unsupported ASICs
For ASICs NAVI10 and above, setting the display clock [DCEFCLK] is not supported and the sysfs entry is
read-only. As a result, the test falsely fails for these ASICs. ROCm SMI Lib is ASIC independent,
so setting the display clock cannot be selectively disabled for these ASICs.

As a compromise, if the set (write to sysfs entry) fails due to a permission error and euid is root,
assume that the set feature is not supported and skip the test.

Change-Id: I7a273878cbf1465b01728705323e8a92a42378dd
2021-11-29 11:23:38 -05:00
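The skip rule described in the commit above can be sketched as a small decision function. This is an assumption-laden illustration: `example_verdict` and both function names are hypothetical, and treating `EACCES` and `EPERM` as the "permission error" is a guess. The real test may check a library status code instead.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome of attempting the sysfs write in a test. */
typedef enum {
    EXAMPLE_TEST_RUN,  /* write succeeded, continue with the test */
    EXAMPLE_TEST_SKIP, /* feature assumed unsupported, skip the test */
    EXAMPLE_TEST_FAIL  /* genuine failure */
} example_verdict;

/*
 * Classify the result of a write attempt.
 * err  - errno value from the failed write, or 0 on success.
 * euid - effective user id of the test process.
 * A permission error seen by root cannot be caused by missing privileges,
 * so the attribute is assumed to be read-only on this ASIC.
 */
example_verdict example_classify_set_result(int err, uid_t euid)
{
    if (err == 0)
        return EXAMPLE_TEST_RUN;
    if ((err == EACCES || err == EPERM) && euid == 0)
        return EXAMPLE_TEST_SKIP;
    return EXAMPLE_TEST_FAIL;
}

/* Convenience wrapper that uses the current process's effective uid. */
example_verdict example_classify_for_current_user(int err)
{
    return example_classify_set_result(err, geteuid());
}
```

A non-root user hitting the same error still fails, since in that case the error may simply mean the test was run without sufficient privileges.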
Sreekant Somasekharan 3f27dcc1ac Modify bool variable to true in if condition of src=dst
Change-Id: Ie2024b3a6ad68e48384bb3472fe8785bcd643665
2021-11-17 12:53:40 -05:00
Sreekant Somasekharan ce46fd237a Add test case for rsmi_is_P2P_accessible API.
Change-Id: Iccfede42925c98d96454b5f25cc0ed6fc9258911
2021-10-28 17:06:07 -04:00
Divya Shikre e96d6ab77e Add failing rsmi tests to exclude file to enable blacklisting
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ibdad4d54ffe87391b13379c63e005fd04c6abaf5
2021-10-26 17:57:05 -04:00
Bill(Shuzhou) Liu 42d39d3e34 Add -g compiler option for ADDRESS_SANITIZER
Add -g compiler option for Address Sanitizer

Change-Id: I958fefa6c4b5871c29734ab1d4ec238c9e073192
2021-08-03 13:54:19 -04:00
Divya Shikre 6edea7a92e Add fix to ignore error returned when perf determinism is not supported.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I89b6a0a3dbba6fbd4b12ff2e20670eff9f32ed7f
2021-06-14 12:18:22 -04:00
Ori Messinger 5b42cdf780 ROCm SMI LIB: Add Default Power Cap To rsmitst
Implement default GPU power cap functionality in rsmitst.
It is available in the "rsmitstReadOnly.TestPowerRead" test, and
is displayed as: "Default Power Cap: #uW" (where uW is microwatts).

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I564ea3785f1a93dfd30587634057516549fa762c
2021-04-28 12:42:34 -04:00
Harish Kasiviswanathan 844acbc0d8 Add energy counter resolution to rsmi_dev_energy_count_get
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I03b70968257db7a45e21d7ba62542cdedd18eb85
2021-04-22 10:25:06 -04:00
Harish Kasiviswanathan 92cf7ff28a Add time profile for set_power_cap function
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Id728cb5fe85b3558e52b4517508211dca499e801
2021-04-21 15:29:44 -04:00
Bill(Shuzhou) Liu 392d13e318 Unit test for energy accumulator counter
Add a few unit tests for energy accumulator counter.

Change-Id: Ib78a67e29465de9c14e6e934c5d62ec64de66d8a
2021-04-14 16:04:46 -04:00
Bill(Shuzhou) Liu 6340176b99 Unit tests for coarse grain utilization counters
The unit tests for GFX and Memory activity counters.

Change-Id: I968dabc9ef6de9d335d7f751b290fb713b51a79c
2021-04-14 10:53:55 -04:00
Divya Shikre aaf2120117 Update performance determinism api as per the modified sysfs interface.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ib0ec5128819644a2ff6c916da9194a7fe1dad795
2021-04-07 16:38:48 -04:00
Bill(Shuzhou) Liu da480b4589 Add support for the HBM temperature
The rsmi_dev_temp_metric_get() can also support the HBM
temperatures, which are retrieved from gpu_metrics.

Change-Id: I96b979296e90cf881523627b41b1a02849676416
2021-04-05 15:55:55 -04:00
Chris Freehill 5e2a4f3a15 Handle different gpu_metrics content versions for format v1
Change-Id: I344d1815da683befc8f8b5caf921803b267ae29f
2021-03-24 14:34:55 -05:00
Kent Russell ef7f99a7e2 CMakeLists: Fix libasan usage
static-libasan doesn't exist, so use the easier-to-remember
shared-libsan and change static-libasan to static-libsan

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ieef480aacdd770f3bb40673a2e8f8306b308b1c9
2021-01-15 15:39:05 -05:00
Chris Freehill 7e17684532 Correct TestPerfLevelReadWrite test
Enums referenced in the test did not match what's in rocm_smi.h.
Added static assert to try to catch this. Also moved enum string
map to test_common.cc/h where other such maps are.

Also, fixed some cpplint issues.

Change-Id: I683553248ceb2fabb28ce1a1208bc9744aaf88d6
2020-12-16 17:12:04 -06:00
Chris Freehill f4938b0ac9 Fix process killed while holding mutex
Previously, when a process holding a shared mutex was killed,
the next time an RSMI application was started, it would not be
able to obtain the mutex--the application would have to exit.
This fix uses pthread_mutexattr_setrobust() to detect this
situation and act accordingly.

Also, add some missing, needed mutexes and move mutexes
closer to where the protected resource is used.

Change-Id: Icfdc3a246f4cfa3fd008e3f13472199abd76fd35
2020-12-04 12:59:55 -05:00
Divya Shikre 60d0f3052f Adding Performance Determinism Mode to rocm_smi lib, CLI & gtest.
A special mode of operation to achieve minimal performance variation by letting
the user have the ability to provide the desired frequency to be set as the soft limit.
The user can control entry to and exit from the mode via rocm-smi, using the
mechanism below.

Enter performance determinism mode:
- hold a lock
- write performance_determinism to power_dpm_force_performance_level
- write input clk_freq to pp_dpm_sclk
- release lock

Exit performance determinism mode:
- hold a lock
- write auto to power_dpm_force_performance_level
- release lock

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia24e27954cdf1c4337ffc83d8948fbdfaf4552d2
2020-12-02 11:11:00 -05:00
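The enter and exit sequences listed in the commit above can be sketched as two small functions. Several details here are assumptions: the `example_*` names are hypothetical, the lock is a plain in-process pthread mutex (the commit does not say which lock is held), the device directory is passed in by the caller, and the frequency is written as a bare decimal number, which may not match the format the driver expects.

```c
#include <pthread.h>
#include <stdio.h>

/* Assumed in-process lock guarding the write sequence. */
static pthread_mutex_t example_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write a string to <dir>/<name>. Returns 0 on success, -1 on error. */
static int example_write_attr(const char *dir, const char *name,
                              const char *value)
{
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Enter: lock, set the performance level, set the clock, unlock. */
int example_enter_perf_determinism(const char *dev_dir, unsigned clk_freq)
{
    char freq[32];
    snprintf(freq, sizeof freq, "%u", clk_freq); /* value format assumed */

    pthread_mutex_lock(&example_lock);
    int rc = example_write_attr(dev_dir, "power_dpm_force_performance_level",
                                "performance_determinism");
    if (rc == 0)
        rc = example_write_attr(dev_dir, "pp_dpm_sclk", freq);
    pthread_mutex_unlock(&example_lock);
    return rc;
}

/* Exit: lock, restore the automatic performance level, unlock. */
int example_exit_perf_determinism(const char *dev_dir)
{
    pthread_mutex_lock(&example_lock);
    int rc = example_write_attr(dev_dir, "power_dpm_force_performance_level",
                                "auto");
    pthread_mutex_unlock(&example_lock);
    return rc;
}
```

Holding the lock across both writes in the enter path keeps another thread from changing the performance level between the two steps.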
Mukul Joshi 446ab5c8c7 Update Event Notification test for GPU Reset event
Update the event notification tests to handle both GPU pre reset
and GPU post reset events. GPU post reset event takes some time to
be generated after the pre reset event, so issue another
notification read to wait for post reset event.

Change-Id: I2812760b184d5357130e478cc35d27b14592abb3
2020-11-23 10:53:23 -05:00
Chris Freehill bf6af90908 rsmitst address sanitizer support
Also, add libasan flag variants for librocm_smi

Change-Id: Ibd012e40d26907addf8c0550aaf9f78c11b8d51f
2020-11-10 15:45:56 -06:00
Chris Freehill 63064b0000 Quiet address sanitizer warnings
Also,
* Fix some doxygen issues
* Fix address sanitizer issues in rsmitst

Change-Id: Ie6c6fd9af5c418210b7064e79650fb92cd4a5e2b
2020-11-10 14:16:39 -06:00
Ramesh Errabolu 328878343c Update ROCm SMI library with ability to read CU occupancy
Change-Id: Ib9882fa2d81c13604af282279bfa116bc2fd05a4
2020-10-14 09:33:37 -04:00
Divya Shikre f397cba414 Adding gtest for gpu metrics read
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I66edb15c8b7380f3427822b33e845202bfac7a2b
2020-10-08 13:37:47 -04:00
Chris Freehill 5465d872aa Revert "Revert "Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters""
This reverts commit ae6d3fbdd0.

Change-Id: Ic412a64d35aab74caf12bf4c791f0a66ac15b061
2020-10-08 10:36:30 -04:00
Chris Freehill ae6d3fbdd0 Revert "Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters"
This reverts commit 946bf93dfb.

Temporarily reverting until the driver side of this is upstream

Change-Id: I2d8243208c1271ebad90bc2ee0fda2dfefb0831b
2020-10-07 18:42:56 -04:00
Chris Freehill 946bf93dfb Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters
Also some format fixes

Change-Id: Id3c0f6b3cf5b327bb9ca6acb6091dc67764c8032
2020-10-05 17:22:19 -05:00
Chris Freehill 3522e94ed0 Add gtest lib dir to library search path
Change-Id: I57bb20e2a67a4eaac2d0e24314e22d1a5fbe3533
2020-10-01 23:46:33 -04:00
Mukul Joshi 8b95705e6f Add support for GPU reset SMI events
Add handling for both pre GPU reset and post GPU reset SMI
events.

Change-Id: I64d5e006bef58cb28b1c580c75f482a4590427da
2020-09-16 13:25:06 -04:00
Mukul Joshi aff75c955f Add support for KFD Thermal Throttling SMI event
Add handling for receiving thermal throttling SMI event from the
kernel.
Also, update the event notification test to work with the new event.

Change-Id: Ib89c12b244f90998ccbae0a38b37f25705d156e0
2020-09-16 13:24:57 -04:00