Implement default GPU power cap functionality in rsmitst.
It is available in the "rsmitstReadOnly.TestPowerRead" test, and
is displayed as: "Default Power Cap: #uW" (where uW is microwatts).
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I564ea3785f1a93dfd30587634057516549fa762c
The rsmi_dev_temp_metric_get() can also support the HBM
temperatures which is retrieved from gpu_metrics.
Change-Id: I96b979296e90cf881523627b41b1a02849676416
static-libasan doesn't exist, so use the easier-to-remember
shared-libsan and change static-libasan to static-libsan
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ieef480aacdd770f3bb40673a2e8f8306b308b1c9
Enums referenced in the test did not match what's in rocm_smi.h.
Added static assert to try to catch this. Also moved enum string
map to test_common.cc/h where other such maps are.
Also, fixed some cpplint issues.
Change-Id: I683553248ceb2fabb28ce1a1208bc9744aaf88d6
Previously, when a process holding a shared mutex was killed,
the next time an RSMI application was started, it would not be
able to obtain the mutex--the application would have to exit.
This fix uses pthread_mutexattr_setrobust() to detect this
situation and act accordingingly.
Also, add some missing, needed mutexes and move mutexes
closer to where the protect resource is used.
Change-Id: Icfdc3a246f4cfa3fd008e3f13472199abd76fd35
A special mode of operation to achieve minimal performance variation by letting
the user have the ability to provide the desired frequency to be set as the soft limit.
The user can control the entry and exit to the mode via rocm-smi a mechanism to
enter / exit performance determinism mode as below.
Enter performance determinism mode:
- hold a lock
- write performance_determinism to power_dpm_force_performance_level
- write input clk_freq to pp_dpm_sclk
- release lock
Exit performance determinism_mode:
- hold a lock
- write auto to power_dpm_force_performance_level
- release lock
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia24e27954cdf1c4337ffc83d8948fbdfaf4552d2
Update the event notification tests to handle both GPU pre reset
and GPU post reset events. GPU post reset event takes sometime to
be generated after the pre reset event, so issue another
notification read to wait for post reset event.
Change-Id: I2812760b184d5357130e478cc35d27b14592abb3
Add handling for receiving thermal throttling SMI event from the
kernel.
Also, update the event notification test to work with the new event.
Change-Id: Ib89c12b244f90998ccbae0a38b37f25705d156e0
Event bitmask in KFD SMI event is now replaced with event index in
the SMI event message. Sending a event bitmask, which was a 64-bit
field with only 1 bit set, was quite wasteful of memory and also
potentially limiting to 64 events. Instead the kernel would send
event index in the SMI event message. As a result, update the
KFD SMI event handling to expect the event index in the message.
Change-Id: I3e74620788d3c1f7c0bdaa69e9d9ab3d1aba2c92
Use unsigned number for left shift operation. If not specificed as
unsigned, compiler throws warning about left shift of negative
number.
Change-Id: I05948073b0c40700bee69399b08df6031fc49d70
Some systems have kfd sysfs properties entries that
are unreadable--for example, when a multi-gpu system is
dividing the gpus among containers, each container may
only be able to access certain gpus.
Previously, all kfd topology node properties entries were
assumed to be valid. Now, we check for readability before
declaring them "valid".
Fixes SWDEV-240169
Also:
* remove an assertion that would happen when no pcie
device identifier files are found on the system.
* fix cpplint issues
Change-Id: I74321b685159dd2628c890b33c39ad82988cb9dd
Also, fix TestMutualExclusion and TestEvtNofifReadWrite.
Previously, some of the normal SetUp function was not
being done for this test. In some cases, no DRM
devices are being found on the test machine. Skip
those.
Change-Id: Iaa5a257841eb459aa57491ae9680c34a60d5ac2b
Also:
* When destroying a counter, make sure to stop the counter first
* In the test, do not stop (disable) the counter before
reading it.
* Clean up some whitespace in other tests
* Re-add manual pdf file
Change-Id: I0786ef3a994ca568299c77e44f092af8943ac33d
Print VRAM usage information in TestProcInfoRead.
Also, fix output formatting when running TestProcInfoRead.
Change-Id: I9efed808458ef4645145610f6f564f0f2baadea2
When getting process information for a process, it's possible
that between the time the process ID was discovered and when
we attempt to collect data for that process, that the process
ended. This change is meant to handle that in the test case.
* Also, fix compile warning by removing unused variable.
Change-Id: I62f9a84a63548c856f0661fef15b7d248a330c05
* Added a new test to verify mutual exclusion of access to device
resources
* Added some missing acquiring of mutexes to some RSMI calls, as
well as try-catch blocks.
Change-Id: I87aac009878a0b2d1f975e1d5b794d887bb23ff9
Given a device index, return the corresponding NUMA node for the
device.
Also, add NUMA node tests to Sys Info Read test.
Change-Id: I0df4937470e6362e6737ccea568d4b3e5890c91a