Commit Graph

301 Commits

Author SHA1 Message Date
Sreekant Somasekharan 3f27dcc1ac Modify bool variable to true in if condition of src=dst
Change-Id: Ie2024b3a6ad68e48384bb3472fe8785bcd643665
2021-11-17 12:53:40 -05:00
Sreekant Somasekharan ce46fd237a Add test case for rsmi_is_P2P_accessible API.
Change-Id: Iccfede42925c98d96454b5f25cc0ed6fc9258911
2021-10-28 17:06:07 -04:00
Divya Shikre e96d6ab77e Add failing rsmi tests to exclude file to enable blacklisting
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ibdad4d54ffe87391b13379c63e005fd04c6abaf5
2021-10-26 17:57:05 -04:00
Bill(Shuzhou) Liu 42d39d3e34 Add -g compiler option for ADDRESS_SANITIZER
Add -g compiler option for Address Sanitizer

Change-Id: I958fefa6c4b5871c29734ab1d4ec238c9e073192
2021-08-03 13:54:19 -04:00
Divya Shikre 6edea7a92e Add fix to ignore error returned when perf determinism is not supported.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I89b6a0a3dbba6fbd4b12ff2e20670eff9f32ed7f
2021-06-14 12:18:22 -04:00
Ori Messinger 5b42cdf780 ROCm SMI LIB: Add Default Power Cap To rsmitst
Implement default GPU power cap functionality in rsmitst.
It is available in the "rsmitstReadOnly.TestPowerRead" test, and
is displayed as: "Default Power Cap: #uW" (where uW is microwatts).

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I564ea3785f1a93dfd30587634057516549fa762c
2021-04-28 12:42:34 -04:00
Harish Kasiviswanathan 844acbc0d8 Add energy counter resolution to rsmi_dev_energy_count_get
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I03b70968257db7a45e21d7ba62542cdedd18eb85
2021-04-22 10:25:06 -04:00
Harish Kasiviswanathan 92cf7ff28a Add time profile for set_power_cap function
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Id728cb5fe85b3558e52b4517508211dca499e801
2021-04-21 15:29:44 -04:00
Bill(Shuzhou) Liu 392d13e318 Unit test for energy accumulator counter
Add a few unit tests for energy accumulator counter.

Change-Id: Ib78a67e29465de9c14e6e934c5d62ec64de66d8a
2021-04-14 16:04:46 -04:00
Bill(Shuzhou) Liu 6340176b99 Unit tests for coarse grain utilization counters
The unit tests for GFX and Memory activity counters.

Change-Id: I968dabc9ef6de9d335d7f751b290fb713b51a79c
2021-04-14 10:53:55 -04:00
Divya Shikre aaf2120117 Update performance determinism api as per the modified sysfs interface.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ib0ec5128819644a2ff6c916da9194a7fe1dad795
2021-04-07 16:38:48 -04:00
Bill(Shuzhou) Liu da480b4589 Add support for the HBM temperature
rsmi_dev_temp_metric_get() can also report the HBM
temperatures, which are retrieved from gpu_metrics.

Change-Id: I96b979296e90cf881523627b41b1a02849676416
2021-04-05 15:55:55 -04:00
Chris Freehill 5e2a4f3a15 Handle different gpu_metrics content versions for format v1
Change-Id: I344d1815da683befc8f8b5caf921803b267ae29f
2021-03-24 14:34:55 -05:00
Kent Russell ef7f99a7e2 CMakeLists: Fix libasan usage
static-libasan doesn't exist, so use the easier-to-remember
shared-libsan and change static-libasan to static-libsan

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ieef480aacdd770f3bb40673a2e8f8306b308b1c9
2021-01-15 15:39:05 -05:00
Chris Freehill 7e17684532 Correct TestPerfLevelReadWrite test
Enums referenced in the test did not match what's in rocm_smi.h.
Added static assert to try to catch this. Also moved enum string
map to test_common.cc/h where other such maps are.

Also, fixed some cpplint issues.

Change-Id: I683553248ceb2fabb28ce1a1208bc9744aaf88d6
2020-12-16 17:12:04 -06:00
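The static assert mentioned in commit 7e17684532 can be illustrated with a minimal sketch. The enum and name table below (DemoPerfLevel, kDemoPerfLevelNames) are hypothetical stand-ins, not the definitions in rocm_smi.h or test_common.cc. The sketch only shows how a compile-time check can break the build when an enum and a table that mirrors it drift apart.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical enum standing in for one declared in a library header.
enum DemoPerfLevel {
  DEMO_PERF_LEVEL_AUTO = 0,
  DEMO_PERF_LEVEL_LOW,
  DEMO_PERF_LEVEL_HIGH,
  DEMO_PERF_LEVEL_LAST = DEMO_PERF_LEVEL_HIGH
};

// Hypothetical name table that a test might keep alongside the enum.
static const char *const kDemoPerfLevelNames[] = {"AUTO", "LOW", "HIGH"};

// Compile-time guard: adding an enumerator without adding a name (or the
// reverse) makes this assertion fail when the code is built.
static_assert(sizeof(kDemoPerfLevelNames) / sizeof(kDemoPerfLevelNames[0]) ==
                  static_cast<std::size_t>(DEMO_PERF_LEVEL_LAST) + 1,
              "kDemoPerfLevelNames is out of sync with DemoPerfLevel");

// Look up the printable name for a level.
inline std::string DemoPerfLevelName(DemoPerfLevel level) {
  assert(level >= DEMO_PERF_LEVEL_AUTO && level <= DEMO_PERF_LEVEL_LAST);
  return kDemoPerfLevelNames[level];
}
```

A check like this catches a size mismatch only; it would not notice two entries listed in the wrong order.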
Chris Freehill f4938b0ac9 Fix process killed while holding mutex
Previously, when a process holding a shared mutex was killed,
the next time an RSMI application was started, it would not be
able to obtain the mutex--the application would have to exit.
This fix uses pthread_mutexattr_setrobust() to detect this
situation and act accordingly.

Also, add some missing, needed mutexes and move mutexes
closer to where the protected resource is used.

Change-Id: Icfdc3a246f4cfa3fd008e3f13472199abd76fd35
2020-12-04 12:59:55 -05:00
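The robust-mutex approach named in commit f4938b0ac9 can be sketched as follows. This is a minimal illustration of pthread_mutexattr_setrobust() and pthread_mutex_consistent(), not the library's actual code: the helper names InitRobustMutex and LockRobustMutex are hypothetical, and the sketch omits the process-shared attribute and shared-memory placement that a mutex shared between processes would also need.

```cpp
#include <pthread.h>

#include <cerrno>

// Hypothetical helper: initialize a mutex with the robust attribute, so a
// later locker is told (via EOWNERDEAD) that the previous owner died while
// holding the lock. Returns 0 on success or a pthread error code.
inline int InitRobustMutex(pthread_mutex_t *mutex) {
  pthread_mutexattr_t attr;
  int ret = pthread_mutexattr_init(&attr);
  if (ret != 0) return ret;

  ret = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
  if (ret == 0) ret = pthread_mutex_init(mutex, &attr);

  pthread_mutexattr_destroy(&attr);
  return ret;
}

// Hypothetical helper: lock the mutex. If the previous owner died, mark the
// mutex consistent so it stays usable and report that via *recovered.
// Returns 0 when the caller holds the lock.
inline int LockRobustMutex(pthread_mutex_t *mutex, bool *recovered) {
  *recovered = false;
  int ret = pthread_mutex_lock(mutex);
  if (ret == EOWNERDEAD) {
    // The caller now owns the lock, but the protected state may be stale.
    ret = pthread_mutex_consistent(mutex);
    *recovered = (ret == 0);
  }
  return ret;
}
```

When recovered is set, the caller would normally revalidate whatever state the mutex protects before relying on it.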
Divya Shikre 60d0f3052f Adding Performance Determinism Mode to rocm_smi lib, CLI & gtest.
A special mode of operation that achieves minimal performance variation by letting
the user provide the desired frequency to be set as the soft limit.
The user controls entry to and exit from the mode via rocm-smi, using the
mechanism described below.

Enter performance determinism mode:
- hold a lock
- write performance_determinism to power_dpm_force_performance_level
- write input clk_freq to pp_dpm_sclk
- release lock

Exit performance determinism mode:
- hold a lock
- write auto to power_dpm_force_performance_level
- release lock

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia24e27954cdf1c4337ffc83d8948fbdfaf4552d2
2020-12-02 11:11:00 -05:00
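The enter and exit sequences listed in commit 60d0f3052f can be sketched as plain file writes under a lock. This is an illustration under stated assumptions, not the library's implementation: the function names, the std::mutex standing in for the device lock, the dev_dir parameter (the directory that holds the two sysfs files), and the exact text written to pp_dpm_sclk are all hypothetical.

```cpp
#include <cstdint>
#include <fstream>
#include <mutex>
#include <string>

// Hypothetical stand-in for the lock the commit message refers to.
static std::mutex g_demo_dev_mutex;

// Write a value to a sysfs-style file; returns false if the write fails.
inline bool WriteSysfsValue(const std::string &path, const std::string &value) {
  std::ofstream file(path);
  if (!file.is_open()) return false;
  file << value;
  file.flush();
  return file.good();
}

// Enter: hold the lock, select the mode, then write the requested clock.
// Assumption: the clock is written as a plain decimal number.
inline bool EnterPerfDeterminism(const std::string &dev_dir,
                                 uint64_t clk_freq) {
  std::lock_guard<std::mutex> guard(g_demo_dev_mutex);
  if (!WriteSysfsValue(dev_dir + "/power_dpm_force_performance_level",
                       "performance_determinism")) {
    return false;
  }
  return WriteSysfsValue(dev_dir + "/pp_dpm_sclk", std::to_string(clk_freq));
}

// Exit: hold the lock and hand control back to automatic management.
inline bool ExitPerfDeterminism(const std::string &dev_dir) {
  std::lock_guard<std::mutex> guard(g_demo_dev_mutex);
  return WriteSysfsValue(dev_dir + "/power_dpm_force_performance_level",
                         "auto");
}
```

Holding the lock across both writes keeps another thread from changing the performance level between the mode switch and the clock write. If the second write fails, this sketch leaves the device in the new mode, and a fuller version would probably roll that back.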
Mukul Joshi 446ab5c8c7 Update Event Notification test for GPU Reset event
Update the event notification tests to handle both GPU pre reset
and GPU post reset events. The GPU post reset event takes some time to
be generated after the pre reset event, so issue another
notification read to wait for the post reset event.

Change-Id: I2812760b184d5357130e478cc35d27b14592abb3
2020-11-23 10:53:23 -05:00
Chris Freehill bf6af90908 rsmitst address sanitizer support
Also, add libasan flag variants for librocm_smi

Change-Id: Ibd012e40d26907addf8c0550aaf9f78c11b8d51f
2020-11-10 15:45:56 -06:00
Chris Freehill 63064b0000 Quiet address sanitizer warnings
Also,
* Fix some doxygen issues
* Fix address sanitizer issues in rsmitst

Change-Id: Ie6c6fd9af5c418210b7064e79650fb92cd4a5e2b
2020-11-10 14:16:39 -06:00
Ramesh Errabolu 328878343c Update ROCm SMI library with ability to read CU occupancy
Change-Id: Ib9882fa2d81c13604af282279bfa116bc2fd05a4
2020-10-14 09:33:37 -04:00
Divya Shikre f397cba414 Adding gtest for gpu metrics read
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I66edb15c8b7380f3427822b33e845202bfac7a2b
2020-10-08 13:37:47 -04:00
Chris Freehill 5465d872aa Revert "Revert "Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters""
This reverts commit ae6d3fbdd0.



Change-Id: Ic412a64d35aab74caf12bf4c791f0a66ac15b061
2020-10-08 10:36:30 -04:00
Chris Freehill ae6d3fbdd0 Revert "Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters"
This reverts commit 946bf93dfb.

Temporarily reverting until the driver side of this is upstream

Change-Id: I2d8243208c1271ebad90bc2ee0fda2dfefb0831b
2020-10-07 18:42:56 -04:00
Chris Freehill 946bf93dfb Support for RSMI_EVNT_GRP_XGMI_DATA_OUT counters
Also some format fixes

Change-Id: Id3c0f6b3cf5b327bb9ca6acb6091dc67764c8032
2020-10-05 17:22:19 -05:00
Chris Freehill 3522e94ed0 Add gtest lib dir to library search path
Change-Id: I57bb20e2a67a4eaac2d0e24314e22d1a5fbe3533
2020-10-01 23:46:33 -04:00
Mukul Joshi 8b95705e6f Add support for GPU reset SMI events
Add handling for both pre GPU reset and post GPU reset SMI
events.

Change-Id: I64d5e006bef58cb28b1c580c75f482a4590427da
2020-09-16 13:25:06 -04:00
Mukul Joshi aff75c955f Add support for KFD Thermal Throttling SMI event
Add handling for receiving thermal throttling SMI event from the
kernel.
Also, update the event notification test to work with the new event.

Change-Id: Ib89c12b244f90998ccbae0a38b37f25705d156e0
2020-09-16 13:24:57 -04:00
Mukul Joshi 406859ca8a Update KFD SMI event notification handling
The event bitmask in the KFD SMI event is now replaced with an event index in
the SMI event message. Sending an event bitmask, which was a 64-bit
field with only 1 bit set, was quite wasteful of memory and also
potentially limited the interface to 64 events. Instead, the kernel now sends
the event index in the SMI event message. As a result, update the
KFD SMI event handling to expect the event index in the message.

Change-Id: I3e74620788d3c1f7c0bdaa69e9d9ab3d1aba2c92
2020-09-16 13:24:50 -04:00
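The relationship between the old one-bit mask and the new event index in commit 406859ca8a can be sketched as a pair of conversions. The function names are hypothetical, and the 1-based numbering (index 1 maps to bit 0) is an assumption made for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: build the 64-bit one-hot mask for an event index.
// Assumption: indices are 1-based, so only 1..64 fit in the mask form.
inline uint64_t EventIndexToMask(uint32_t index) {
  assert(index >= 1 && index <= 64);
  return UINT64_C(1) << (index - 1);
}

// Hypothetical helper: recover the index from a mask with exactly one bit
// set. Returns 0 when the mask is empty or has more than one bit set.
inline uint32_t EventMaskToIndex(uint64_t mask) {
  if (mask == 0 || (mask & (mask - 1)) != 0) return 0;
  uint32_t index = 1;
  while ((mask & 1) == 0) {
    mask >>= 1;
    ++index;
  }
  return index;
}
```

A bare index needs no more than a byte for 64 event types and has no built-in ceiling, whereas the mask form always occupies eight bytes and cannot express a 65th event.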
Chris Freehill 7be97ec2aa Clean up comments for rsmitst
Change-Id: Iea5322a5fd3bffe77557fa2cecbce70716e1258c
2020-08-17 11:48:07 -05:00
Mukul Joshi 9d24fc9175 Fix compiler warning in TestPciReadWrite
Use unsigned number for left shift operation. If not specified as
unsigned, compiler throws warning about left shift of negative
number.

Change-Id: I05948073b0c40700bee69399b08df6031fc49d70
2020-07-13 17:32:17 -04:00
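The warning described in commit 9d24fc9175 can be reproduced with a small, hypothetical example. HighBitsMask is not taken from TestPciReadWrite; it only shows the general pattern of shifting an unsigned operand instead of a negative signed one.

```cpp
#include <cstdint>

// Hypothetical helper: return a 32-bit mask with the low `bits` bits cleared.
//
// Writing the shift as (~0 << bits) shifts the signed value -1, which
// compilers may flag as a left shift of a negative number. Starting from an
// unsigned all-ones value avoids that.
inline uint32_t HighBitsMask(unsigned bits) {
  if (bits >= 32) return 0;  // shifting by the full width is undefined
  return ~UINT32_C(0) << bits;
}
```

For example, HighBitsMask(4) clears the low four bits and yields 0xFFFFFFF0.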
Mukul Joshi eea1ed8c3d Add support to retrieve process SDMA usage information.
Also, print SDMA usage information in TestProcInfoRead.

Change-Id: I8d19be3b8653e298c81237e5067eca75a1743e70
2020-07-13 17:32:08 -04:00
Chris Freehill 68155baed5 Handle un-readable kfd properties files
Some systems have kfd sysfs properties entries that
are unreadable--for example, when a multi-gpu system is
dividing the gpus among containers, each container may
only be able to access certain gpus.

Previously, all kfd topology node properties entries were
assumed to be valid. Now, we check for readability before
declaring them "valid".

Fixes SWDEV-240169

Also:
* remove an assertion that would happen when no pcie
device identifier files are found on the system.
* fix cpplint issues

Change-Id: I74321b685159dd2628c890b33c39ad82988cb9dd
2020-07-10 12:35:31 -04:00
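The readability check described in commit 68155baed5 can be sketched with the POSIX access() call. The helper names are hypothetical and the commit does not say which call the library uses, so treat this as one possible way to filter out entries the current process cannot read.

```cpp
#include <unistd.h>

#include <string>
#include <vector>

// Hypothetical helper: true when the calling process may read the path.
inline bool IsReadable(const std::string &path) {
  return access(path.c_str(), R_OK) == 0;
}

// Hypothetical helper: keep only the candidates that are readable, so that
// later code never treats an inaccessible entry as valid.
inline std::vector<std::string> FilterReadable(
    const std::vector<std::string> &candidates) {
  std::vector<std::string> readable;
  for (const std::string &path : candidates) {
    if (IsReadable(path)) readable.push_back(path);
  }
  return readable;
}
```

A check like this is only a snapshot: permissions can change between the check and the later read, so the read itself still needs error handling.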
Chris Freehill e2c7ef6422 TestPerfCntrReadWrite fail rsmitst if not supported
Fixes SWDEV-243639

Change-Id: I087171231fbbe5939f239efad25a5485529381a3
2020-07-08 18:41:30 -04:00
Chris Freehill 59394f3354 Ensure no device mutexes are left held on shut_down
Also, fix TestMutualExclusion and TestEvtNofifReadWrite.
Previously, some of the normal SetUp function was not
being done for this test. In some cases, no DRM
devices are being found on the test machine. Skip
those.

Change-Id: Iaa5a257841eb459aa57491ae9680c34a60d5ac2b
2020-06-19 13:59:20 -05:00
Mike Li 488bbb668a Add support to retrieve XGMI hive id
Change-Id: I1eee05dd85ecb856889d1cfe0565454d2f538856
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2020-06-19 07:35:23 -07:00
Chris Freehill 9e0ebb250c Fix line endings for init_shutdown_refcount.*
* Also, add assert that checks for proper usage of
rand_sleep_mod().

Change-Id: Ieb4179e1ad12fbbf85c2e4f7c7f119b0bb30b197
2020-06-17 21:26:12 -05:00
Chris Freehill efc9b7658c Make verbosity level 0 completely quiet
Also, support --iterations flag for certain functions that will
likely be repeated frequently.

Change-Id: I7ed76835001b5cbca30042d6bf26484258c7b9a6
2020-06-17 21:26:12 -05:00
Divya Shikre 2805ed16a4 Adding current voltage feature & gtest.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ic555a3af265e603419e2875d1989a366abc82596
2020-06-16 11:48:56 -04:00
Chris Freehill f946ea37ef Update XGMI perf counter test to show utilization
Also:
* When destroying a counter, make sure to stop the counter first
* In the test, do not stop (disable) the counter before
  reading it.
* Clean up some whitespace in other tests
* Re-add manual pdf file

Change-Id: I0786ef3a994ca568299c77e44f092af8943ac33d
2020-06-10 12:49:49 -04:00
Kent Russell 8cf44548c0 Make an empty unique_id file non-fatal
This isn't supported on all models, so just comment out on failure
instead of fully failing

Change-Id: Id36d5df7c87abbb41f7b6be43abfea82004703a6
2020-06-04 10:31:53 -04:00
Mukul Joshi 633c852f5d Print VRAM usage in rsmitst
Print VRAM usage information in TestProcInfoRead.
Also, fix output formatting when running TestProcInfoRead.

Change-Id: I9efed808458ef4645145610f6f564f0f2baadea2
2020-05-29 15:48:06 -04:00
Chris Freehill 8ced9c986a Add RSMI ref manual to packages
Also,
* remove extraneous test files
* fix Doxygen docs. issues
* fix whitespace issues

Change-Id: I9b58b0d68bd125a34f4fe0dc84d609c7b0b6e30e
2020-05-18 23:40:38 -04:00
Mike Li c7d349183a Add functions that are used to query Hardware topology.
Change-Id: I0f4cd02b237bde4d6dccfb0e83e65376ecb1cfaa
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2020-05-18 12:37:27 -04:00
Chris Freehill 8e03d10035 Add ref counting for rsmi init and shutdown
Also, clean lint from kfd_ioctl.h file.

Change-Id: I5a2ae127ab6ab6676a1b075ed10858d0ebfe13c1
2020-05-11 15:57:42 -05:00
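Reference counting for init and shutdown, as named in commit 8e03d10035, might look like the following sketch. The class InitRefCounter is hypothetical and is not the library's code; it only illustrates the usual pattern in which the first caller performs the real initialization and the last caller performs the real teardown.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical reference counter for paired init/shutdown calls.
class InitRefCounter {
 public:
  // Returns true when the caller is the first user and should perform the
  // real initialization.
  bool Acquire() {
    std::lock_guard<std::mutex> guard(mutex_);
    return count_++ == 0;
  }

  // Returns true when the caller was the last user and should perform the
  // real teardown. Calling Release() without a matching Acquire() is a bug.
  bool Release() {
    std::lock_guard<std::mutex> guard(mutex_);
    assert(count_ > 0);
    return --count_ == 0;
  }

  // Current number of outstanding users.
  unsigned Count() {
    std::lock_guard<std::mutex> guard(mutex_);
    return count_;
  }

 private:
  std::mutex mutex_;
  unsigned count_ = 0;
};
```

With a counter like this, nested or concurrent init calls from different components do not tear down shared state while another user still depends on it.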
Chris Freehill 2235ede34c Add event notification API
Change-Id: Ib6e8efbe6cdefaa7de1f74bd26993e9b4b011649
2020-05-06 14:07:25 -05:00
Chris Freehill 1c9ef44398 Add checking for no-longer-existing process in test
When getting process information for a process, it's possible
that the process ended between the time the process ID was
discovered and the time we attempt to collect data for it.
This change is meant to handle that in the test case.

* Also, fix compile warning by removing unused variable.

Change-Id: I62f9a84a63548c856f0661fef15b7d248a330c05
2020-04-10 08:51:44 -05:00
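The race described in commit 1c9ef44398 can be handled by treating "not found" as a normal outcome rather than a failure. The sketch below is hypothetical: QueryStatus, CollectProcInfo, and the query callback are illustrative names, and the callback stands in for whatever call actually fetches per-process data.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical result of querying one process.
enum class QueryStatus { kOk, kNotFound, kError };

// Query every discovered process ID. A process that exited after discovery
// (kNotFound) is skipped; any other error aborts and returns false.
// IDs that were queried successfully are appended to *collected.
inline bool CollectProcInfo(
    const std::vector<uint32_t> &pids,
    const std::function<QueryStatus(uint32_t)> &query,
    std::vector<uint32_t> *collected) {
  for (uint32_t pid : pids) {
    switch (query(pid)) {
      case QueryStatus::kOk:
        collected->push_back(pid);
        break;
      case QueryStatus::kNotFound:
        break;  // the process ended in the meantime; not a test failure
      case QueryStatus::kError:
        return false;
    }
  }
  return true;
}
```

Separating "gone" from "failed" keeps a test from reporting spurious failures on a busy machine where short-lived processes come and go.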
Chris Freehill f8b57c3b16 Add device mutual exclusion tests and related fixes
* Added a new test to verify mutual exclusion of access to device
  resources
* Added some missing acquiring of mutexes to some RSMI calls, as
  well as try-catch blocks.

Change-Id: I87aac009878a0b2d1f975e1d5b794d887bb23ff9
2020-04-08 15:05:11 -05:00
Mukul Joshi fd79e5c161 Add rsmi_topo_get_numa_affinity()
Given a device index, return the corresponding NUMA node for the
device.
Also, add NUMA node tests to Sys Info Read test.

Change-Id: I0df4937470e6362e6737ccea568d4b3e5890c91a
2020-04-01 11:38:08 -04:00
Chris Freehill d9ab846bee Make rsmitst tests fail quickly if rsmi_init fails
Change-Id: I7b5d94b77305b30e08f33e1ddb6e2f089db0431f
2020-03-11 12:13:28 -05:00