Граф коммитов

107 Коммитов

Автор SHA1 Сообщение Дата
Bill(Shuzhou) Liu 8ce9289bc2 Upgrade GoogleTest to v1.11.0
The old GoogleTest has compile errors on Centos 9. Upgrade it
to latest version.

Change-Id: I6bbe6afdfad6422a210f258880ddc87a9f088d76
2022-03-09 15:18:43 -05:00
Sreekant Somasekharan e6ae697e9c Add blacklist filter 'virtualization' for rsmi tests failing in SRIOV
Change-Id: Ibbaef092482c0b78ecd86a29f0b9b4331b51abe2
2022-03-04 22:13:44 -05:00
Divya Shikre 8c4635acea Temporary blacklist TestPerfLevelReadWrite for navi21
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Iee2146170b6828fe4fe2846c3ebfd57f95734f34
2022-01-27 22:56:37 -05:00
Divya Shikre 11a71c63b1 Don't assert when fan is not supported.
Add a check when RSMI_STATUS_NOT_SUPPORTED is returned for fanRead/fanReadWrite.
Fix for SWDEV-314176 & SWDEV-314175 reported.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Icf2cc541a3fa5ca4794aff5d6bc91104adc45e6d
2022-01-20 12:29:12 -05:00
Divya Shikre b4fd9c0d94 Update temp_read rsmitst.
Check for RSMI_STATUS_INVALID_ARGS when invalid args are passed.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I0d5ff84aee5cce4214026ddcd860a17ae3e43147
2021-11-29 18:09:45 -05:00
Sreekant Somasekharan c6f695f5a9 Skip TestFrequenciesReadWrite for unsupported ASICs
For ASICs NAVI10 and above setting display clock [DCEFCLK] is not supported and the sysfs entry is
read-only. As a result, the test falsely fails for these ASICs. ROCm SMI Lib is ASIC independent.
So Display clock set cannot be selectively disabled for these ASICs.

As a compromise if the set (write to sysfs entry) fails due to permission error and euid is root,
assume that set feature is not supported and skip the test.

Change-Id: I7a273878cbf1465b01728705323e8a92a42378dd
2021-11-29 11:23:38 -05:00
Sreekant Somasekharan 3f27dcc1ac Modify bool variable to true in if condition of src=dst
Change-Id: Ie2024b3a6ad68e48384bb3472fe8785bcd643665
2021-11-17 12:53:40 -05:00
Sreekant Somasekharan ce46fd237a Add test case for rsmi_is_P2P_accessible API.
Change-Id: Iccfede42925c98d96454b5f25cc0ed6fc9258911
2021-10-28 17:06:07 -04:00
Divya Shikre e96d6ab77e Add failing rsmi tests to exclude file to enable blacklisting
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ibdad4d54ffe87391b13379c63e005fd04c6abaf5
2021-10-26 17:57:05 -04:00
Bill(Shuzhou) Liu 42d39d3e34 Add -g compiler option for ADDRESS_SANITIZER
Add -g compiler option for Address Sanitizer

Change-Id: I958fefa6c4b5871c29734ab1d4ec238c9e073192
2021-08-03 13:54:19 -04:00
Divya Shikre 6edea7a92e Add fix to ignore error returned when perf determinism is not supported.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I89b6a0a3dbba6fbd4b12ff2e20670eff9f32ed7f
2021-06-14 12:18:22 -04:00
Ori Messinger 5b42cdf780 ROCm SMI LIB: Add Default Power Cap To rsmitst
Implement default GPU power cap functionality in rsmitst.
It is available in the "rsmitstReadOnly.TestPowerRead" test, and
is displayed as: "Default Power Cap: #uW" (where uW is microwatts).

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I564ea3785f1a93dfd30587634057516549fa762c
2021-04-28 12:42:34 -04:00
Harish Kasiviswanathan 844acbc0d8 Add energy counter resolution to rsmi_dev_energy_count_get
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I03b70968257db7a45e21d7ba62542cdedd18eb85
2021-04-22 10:25:06 -04:00
Harish Kasiviswanathan 92cf7ff28a Add time profile for set_power_cap function
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Id728cb5fe85b3558e52b4517508211dca499e801
2021-04-21 15:29:44 -04:00
Bill(Shuzhou) Liu 392d13e318 Unit test for energy accumulator counter
Add a few unit tests for energy accumulator counter.

Change-Id: Ib78a67e29465de9c14e6e934c5d62ec64de66d8a
2021-04-14 16:04:46 -04:00
Bill(Shuzhou) Liu 6340176b99 Unit tests for coarse grain utilization counters
The unit tests for GFX and Memory activity counters.

Change-Id: I968dabc9ef6de9d335d7f751b290fb713b51a79c
2021-04-14 10:53:55 -04:00
Divya Shikre aaf2120117 Update performance determinism api as per the modified sysfs interface.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ib0ec5128819644a2ff6c916da9194a7fe1dad795
2021-04-07 16:38:48 -04:00
Bill(Shuzhou) Liu da480b4589 Add support for the HBM temperature
The rsmi_dev_temp_metric_get() can also support the HBM
temperatures which is retrieved from gpu_metrics.

Change-Id: I96b979296e90cf881523627b41b1a02849676416
2021-04-05 15:55:55 -04:00
Chris Freehill 5e2a4f3a15 Handle different gpu_metrics content versions for format v1
Change-Id: I344d1815da683befc8f8b5caf921803b267ae29f
2021-03-24 14:34:55 -05:00
Kent Russell ef7f99a7e2 CMakeLIsts: Fix libasan usage
static-libasan doesn't exist, so use the easier-to-remember
shared-libsan and change static-libasan to static-libsan

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ieef480aacdd770f3bb40673a2e8f8306b308b1c9
2021-01-15 15:39:05 -05:00
Chris Freehill 7e17684532 Correct TestPerfLevelReadWrite test
Enums referenced in the test did not match what's in rocm_smi.h.
Added static assert to try to catch this. Also moved enum string
map to test_common.cc/h where other such maps are.

Also, fixed some cpplint issues.

Change-Id: I683553248ceb2fabb28ce1a1208bc9744aaf88d6
2020-12-16 17:12:04 -06:00
Chris Freehill f4938b0ac9 Fix process killed while holding mutex
Previously, when a process holding a shared mutex was killed,
the next time an RSMI application was started, it would not be
able to obtain the mutex--the application would have to exit.
This fix uses pthread_mutexattr_setrobust() to detect this
situation and act accordingingly.

Also, add some missing, needed mutexes and move mutexes
closer to where the protect resource is used.

Change-Id: Icfdc3a246f4cfa3fd008e3f13472199abd76fd35
2020-12-04 12:59:55 -05:00
Divya Shikre 60d0f3052f Adding Performance Determinism Mode to rocm_smi lib, CLI & gtest.
A special mode of operation to achieve minimal performance variation by letting
the user have the ability to provide the desired frequency to be set as the soft limit.
The user can control the entry and exit to the mode via rocm-smi a mechanism to
enter / exit performance determinism mode as below.

Enter performance determinism mode:
- hold a lock
- write performance_determinism to power_dpm_force_performance_level
- write input clk_freq to pp_dpm_sclk
- release lock

Exit performance determinism_mode:
- hold a lock
- write auto to power_dpm_force_performance_level
- release lock

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia24e27954cdf1c4337ffc83d8948fbdfaf4552d2
2020-12-02 11:11:00 -05:00
Mukul Joshi 446ab5c8c7 Update Event Notification test for GPU Reset event
Update the event notification tests to handle both GPU pre reset
and GPU post reset events. GPU post reset event takes sometime to
be generated after the pre reset event, so issue another
notification read to wait for post reset event.

Change-Id: I2812760b184d5357130e478cc35d27b14592abb3
2020-11-23 10:53:23 -05:00
Chris Freehill bf6af90908 rsmitst address sanitizer support
Also, add libasan flag variants for librocm_smi

Change-Id: Ibd012e40d26907addf8c0550aaf9f78c11b8d51f
2020-11-10 15:45:56 -06:00
Chris Freehill 63064b0000 Quiet address sanitizer warnings
Also,
* Fix some doxygen issues
* Fix address sanitizer issues in rsmitst

Change-Id: Ie6c6fd9af5c418210b7064e79650fb92cd4a5e2b
2020-11-10 14:16:39 -06:00
Ramesh Errabolu 328878343c Update ROCm SMI library with ability to read CU occupancy
Change-Id: Ib9882fa2d81c13604af282279bfa116bc2fd05a4
2020-10-14 09:33:37 -04:00
Divya Shikre f397cba414 Adding gtest for gpu metrics read
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I66edb15c8b7380f3427822b33e845202bfac7a2b
2020-10-08 13:37:47 -04:00
Chris Freehill 5465d872aa Revert "Revert "Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters""
This reverts commit ae6d3fbdd0.



Change-Id: Ic412a64d35aab74caf12bf4c791f0a66ac15b061
2020-10-08 10:36:30 -04:00
Chris Freehill ae6d3fbdd0 Revert "Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters"
This reverts commit 946bf93dfb.

Temporarily reverting until the driver side of this is upstream

Change-Id: I2d8243208c1271ebad90bc2ee0fda2dfefb0831b
2020-10-07 18:42:56 -04:00
Chris Freehill 946bf93dfb Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters
Also some format fixes

Change-Id: Id3c0f6b3cf5b327bb9ca6acb6091dc67764c8032
2020-10-05 17:22:19 -05:00
Chris Freehill 3522e94ed0 Add gtest lib dir to library search path
Change-Id: I57bb20e2a67a4eaac2d0e24314e22d1a5fbe3533
2020-10-01 23:46:33 -04:00
Mukul Joshi 8b95705e6f Add support for GPU reset SMI events
Add handling for both pre GPU reset and post GPU reset SMI
events.

Change-Id: I64d5e006bef58cb28b1c580c75f482a4590427da
2020-09-16 13:25:06 -04:00
Mukul Joshi aff75c955f Add support for KFD Thermal Throttling SMI event
Add handling for receiving thermal throttling SMI event from the
kernel.
Also, update the event notification test to work with the new event.

Change-Id: Ib89c12b244f90998ccbae0a38b37f25705d156e0
2020-09-16 13:24:57 -04:00
Mukul Joshi 406859ca8a Update KFD SMI event notification handling
Event bitmask in KFD SMI event is now replaced with event index in
the SMI event message. Sending a event bitmask, which was a 64-bit
field with only 1 bit set, was quite wasteful of memory and also
potentially limiting to 64 events. Instead the kernel would send
event index in the SMI event message. As a result, update the
KFD SMI event handling to expect the event index in the message.

Change-Id: I3e74620788d3c1f7c0bdaa69e9d9ab3d1aba2c92
2020-09-16 13:24:50 -04:00
Chris Freehill 7be97ec2aa Clean up comments for rsmitst
Change-Id: Iea5322a5fd3bffe77557fa2cecbce70716e1258c
2020-08-17 11:48:07 -05:00
Mukul Joshi 9d24fc9175 Fix compiler warning in TestPciReadWrite
Use unsigned number for left shift operation. If not specificed as
unsigned, compiler throws warning about left shift of negative
number.

Change-Id: I05948073b0c40700bee69399b08df6031fc49d70
2020-07-13 17:32:17 -04:00
Mukul Joshi eea1ed8c3d Add support to retrieve process SDMA usage information.
Also, print SDMA usage information in TestProcInfoRead.

Change-Id: I8d19be3b8653e298c81237e5067eca75a1743e70
2020-07-13 17:32:08 -04:00
Chris Freehill 68155baed5 Handle un-readable kfd properties files
Some systems have kfd sysfs properties entries that
are unreadable--for example, when a multi-gpu system is
dividing the gpus among containers, each container may
only be able to access certain gpus.

Previously, all kfd topology node properties entries were
assumed to be valid. Now, we check for readability before
declaring them "valid".

Fixes SWDEV-240169

Also:
* remove an assertion that would happen when no pcie
device identifier files are found on the system.
* fix cpplint issues

Change-Id: I74321b685159dd2628c890b33c39ad82988cb9dd
2020-07-10 12:35:31 -04:00
Chris Freehill e2c7ef6422 TestPerfCntrReadWrite fail rsmitst if not supported
Fixes SWDEV-243639

Change-Id: I087171231fbbe5939f239efad25a5485529381a3
2020-07-08 18:41:30 -04:00
Chris Freehill 59394f3354 Ensure no device mutexes are left held on shut_down
Also, fix TestMutualExclusion and TestEvtNofifReadWrite.
Previously, some of the normal SetUp function was not
being done for this test. In some cases, no DRM
devices are being found on the test machine. Skip
those.

Change-Id: Iaa5a257841eb459aa57491ae9680c34a60d5ac2b
2020-06-19 13:59:20 -05:00
Mike Li 488bbb668a Add support to retrieve XGMI hive id
Change-Id: I1eee05dd85ecb856889d1cfe0565454d2f538856
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2020-06-19 07:35:23 -07:00
Chris Freehill 9e0ebb250c Fix line endings for init_shutdown_refcount.*
* Also, add assert that check for proper usage of
rand_sleep_mod().

Change-Id: Ieb4179e1ad12fbbf85c2e4f7c7f119b0bb30b197
2020-06-17 21:26:12 -05:00
Chris Freehill efc9b7658c Make verbosity level 0 completely quiet
Also, support --iterations flag for certain functions that will
likely be repeated frequently.

Change-Id: I7ed76835001b5cbca30042d6bf26484258c7b9a6
2020-06-17 21:26:12 -05:00
Divya Shikre 2805ed16a4 Adding current voltage feature & gtest.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ic555a3af265e603419e2875d1989a366abc82596
2020-06-16 11:48:56 -04:00
Chris Freehill f946ea37ef Update XGMI perf counter test to show utilization
Also:
* When destroying a counter, make sure to stop the counter first
* In the test, do not stop (disable) the counter before
  reading it.
* Clean up some whitespace in other tests
* Re-add manual pdf file

Change-Id: I0786ef3a994ca568299c77e44f092af8943ac33d
2020-06-10 12:49:49 -04:00
Kent Russell 8cf44548c0 Make an empty unique_id file non-fatal
This isn't supported on all models, so just comment out on failure
instead of fully failing

Change-Id: Id36d5df7c87abbb41f7b6be43abfea82004703a6
2020-06-04 10:31:53 -04:00
Mukul Joshi 633c852f5d Print VRAM usage in rsmitst
Print VRAM usage information in TestProcInfoRead.
Also, fix output formatting when running TestProcInfoRead.

Change-Id: I9efed808458ef4645145610f6f564f0f2baadea2
2020-05-29 15:48:06 -04:00
Chris Freehill 8ced9c986a Add RSMI ref manual to packages
Also,
* remove extraneous test files
* fix Doxygen docs. issues
* fix whitespace issues

Change-Id: I9b58b0d68bd125a34f4fe0dc84d609c7b0b6e30e
2020-05-18 23:40:38 -04:00
Mike Li c7d349183a Add functions that are used to query Hardware topology.
Change-Id: I0f4cd02b237bde4d6dccfb0e83e65376ecb1cfaa
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2020-05-18 12:37:27 -04:00