Add a check for RSMI_STATUS_NOT_SUPPORTED being returned in fanRead/fanReadWrite.
Fixes the reported issues SWDEV-314176 and SWDEV-314175.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Icf2cc541a3fa5ca4794aff5d6bc91104adc45e6d
Check for RSMI_STATUS_INVALID_ARGS when invalid args are passed.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I0d5ff84aee5cce4214026ddcd860a17ae3e43147
On NAVI10 and newer ASICs, setting the display clock (DCEFCLK) is not supported and the sysfs entry is
read-only. As a result, the test fails spuriously on these ASICs. Because ROCm SMI Lib is ASIC-independent,
display clock setting cannot be selectively disabled for these ASICs.
As a compromise, if the set (a write to the sysfs entry) fails with a permission error and euid is root,
assume that the set feature is not supported and skip the test.
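The skip rule above can be sketched as a small decision helper. This is a minimal illustration, assuming the failure is reported as a raw errno from the sysfs write (the actual test works with RSMI status codes); should_skip_set() and try_sysfs_write() are hypothetical names, not part of ROCm SMI Lib.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: treat a failed set as "not supported" (skip) rather
 * than as a test failure. write_errno is the errno of the failed sysfs
 * write; euid is the test's effective UID (e.g. from geteuid()). File
 * permissions do not normally stop root, so a permission error while
 * running as root is taken to mean the entry is read-only. */
static bool should_skip_set(int write_errno, uid_t euid) {
  return (write_errno == EACCES || write_errno == EPERM) && euid == 0;
}

/* Hypothetical helper: write `value` to the sysfs file at `path`.
 * Returns 0 on success, otherwise the errno of the failing call. */
static int try_sysfs_write(const char *path, const char *value) {
  FILE *f = fopen(path, "w");
  if (f == NULL) return errno;
  int rc = (fputs(value, f) == EOF) ? errno : 0;
  /* With buffered I/O, a write error may only surface at fclose(). */
  if (fclose(f) != 0 && rc == 0) rc = errno;
  return rc;
}
```

A test would call try_sysfs_write() and, when it fails and should_skip_set(rc, geteuid()) is true, report the test as skipped instead of failed.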
Change-Id: I7a273878cbf1465b01728705323e8a92a42378dd
Implement default GPU power cap functionality in rsmitst.
It is available in the "rsmitstReadOnly.TestPowerRead" test, and
is displayed as: "Default Power Cap: #uW" (where uW is microwatts).
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I564ea3785f1a93dfd30587634057516549fa762c
rsmi_dev_temp_metric_get() can now also report HBM
temperatures, which are retrieved from gpu_metrics.
Change-Id: I96b979296e90cf881523627b41b1a02849676416
static-libasan doesn't exist, so change static-libasan to static-libsan,
and use the matching, easier-to-remember shared-libsan.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ieef480aacdd770f3bb40673a2e8f8306b308b1c9
Enums referenced in the test did not match those in rocm_smi.h.
Added a static assert to try to catch such mismatches. Also moved the enum-to-string
map to test_common.cc/h, where the other such maps are.
Also, fixed some cpplint issues.
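A static assert of this kind can be sketched as follows. The enum and map are hypothetical stand-ins, not the declarations in rocm_smi.h or test_common.cc/h, and the check shown (map length equals the enum's last value plus one) is one assumed way to catch drift. It detects added or removed enumerators, not reordering.

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h>

/* Hypothetical stand-in for an enum declared in rocm_smi.h. */
typedef enum {
  EX_TEMP_FIRST = 0,
  EX_TEMP_EDGE = EX_TEMP_FIRST,
  EX_TEMP_JUNCTION,
  EX_TEMP_MEMORY,
  EX_TEMP_LAST = EX_TEMP_MEMORY
} ex_temp_type_t;

/* Enum-to-string map, indexed by enum value. */
static const char *const kExTempNames[] = {"Edge", "Junction", "Memory"};

#define EX_ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Compilation fails if an enumerator is added or removed without updating
 * the map. Only the count is checked, not the order of the entries. */
static_assert(EX_ARRAY_LEN(kExTempNames) == (size_t)EX_TEMP_LAST + 1,
              "kExTempNames is out of sync with ex_temp_type_t");

/* Look up the display name, guarding against out-of-range values. */
static const char *ex_temp_name(ex_temp_type_t t) {
  return ((size_t)t < EX_ARRAY_LEN(kExTempNames)) ? kExTempNames[t]
                                                  : "Unknown";
}
```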
Change-Id: I683553248ceb2fabb28ce1a1208bc9744aaf88d6
Previously, when a process holding a shared mutex was killed,
the next RSMI application to start would not be able to
obtain the mutex, and the application would have to exit.
This fix uses pthread_mutexattr_setrobust() to detect this
situation and act accordingly.
Also, add some missing mutexes and move mutexes
closer to where the protected resources are used.
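The robust-mutex mechanism described above can be sketched with standard POSIX calls. This is a minimal illustration, not the library's code: it assumes the mutex lives in memory shared between processes (for example an mmap'd segment), and init_robust_shared_mutex() and lock_robust() are hypothetical names. Real code should repair any protected shared state before calling pthread_mutex_consistent().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Initialize *m as process-shared and robust. Returns 0 or an error code. */
static int init_robust_shared_mutex(pthread_mutex_t *m) {
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init(&attr);
  if (rc != 0) return rc;
  rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
  if (rc == 0) rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
  if (rc == 0) rc = pthread_mutex_init(m, &attr);
  pthread_mutexattr_destroy(&attr);
  return rc;
}

/* Lock *m. If the previous owner died while holding it, the lock call
 * returns EOWNERDEAD with the mutex acquired; marking it consistent lets
 * it be used normally afterwards. Returns 0 when the caller holds the lock. */
static int lock_robust(pthread_mutex_t *m) {
  int rc = pthread_mutex_lock(m);
  if (rc == EOWNERDEAD) {
    /* Repair the shared state guarded by *m here before continuing. */
    rc = pthread_mutex_consistent(m);
  }
  return rc;
}
```

Without the pthread_mutex_consistent() call, unlocking a mutex acquired with EOWNERDEAD leaves it permanently unusable (later locks fail with ENOTRECOVERABLE), which matches the original symptom of applications being unable to obtain the mutex.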
Change-Id: Icfdc3a246f4cfa3fd008e3f13472199abd76fd35
Performance determinism is a special mode of operation that minimizes performance
variation by letting the user provide the desired frequency to be set as the soft limit.
The user can enter and exit the mode via rocm-smi, using the following
mechanism:
Enter performance determinism mode:
- hold a lock
- write performance_determinism to power_dpm_force_performance_level
- write input clk_freq to pp_dpm_sclk
- release lock
Exit performance determinism mode:
- hold a lock
- write auto to power_dpm_force_performance_level
- release lock
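The enter and exit sequences above can be sketched as plain sysfs writes under a lock. This is an illustration under stated assumptions: dev_dir stands for the GPU's sysfs device directory (a hypothetical parameter), a process-local pthread mutex stands in for "hold a lock", the mode string is copied from this message (the exact token the driver accepts is not confirmed here), and the frequency is assumed to be written as a plain decimal number. write_dev_attr(), perf_determinism_enter(), and perf_determinism_exit() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <inttypes.h>
#include <pthread.h>
#include <stdio.h>

/* Process-local lock standing in for the "hold a lock" step. */
static pthread_mutex_t g_perf_det_lock = PTHREAD_MUTEX_INITIALIZER;

/* Write `value` to <dev_dir>/<attr>. Returns 0 or an errno value. */
static int write_dev_attr(const char *dev_dir, const char *attr,
                          const char *value) {
  char path[512];
  int n = snprintf(path, sizeof(path), "%s/%s", dev_dir, attr);
  if (n < 0 || (size_t)n >= sizeof(path)) return ENAMETOOLONG;
  FILE *f = fopen(path, "w");
  if (f == NULL) return errno;
  int rc = (fputs(value, f) == EOF) ? errno : 0;
  if (fclose(f) != 0 && rc == 0) rc = errno;
  return rc;
}

/* Enter: lock, set the performance level, write clk_freq, unlock. */
static int perf_determinism_enter(const char *dev_dir, uint64_t clk_freq) {
  char freq[32];
  snprintf(freq, sizeof(freq), "%" PRIu64, clk_freq);
  pthread_mutex_lock(&g_perf_det_lock);
  int rc = write_dev_attr(dev_dir, "power_dpm_force_performance_level",
                          "performance_determinism"); /* token: assumption */
  if (rc == 0) rc = write_dev_attr(dev_dir, "pp_dpm_sclk", freq);
  pthread_mutex_unlock(&g_perf_det_lock);
  return rc;
}

/* Exit: lock, restore "auto", unlock. */
static int perf_determinism_exit(const char *dev_dir) {
  pthread_mutex_lock(&g_perf_det_lock);
  int rc = write_dev_attr(dev_dir, "power_dpm_force_performance_level",
                          "auto");
  pthread_mutex_unlock(&g_perf_det_lock);
  return rc;
}
```

Holding the same lock for both sequences keeps another thread from changing the performance level between the two writes of the enter sequence.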
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia24e27954cdf1c4337ffc83d8948fbdfaf4552d2
Update the event notification tests to handle both GPU pre-reset
and GPU post-reset events. The GPU post-reset event takes some time to
be generated after the pre-reset event, so issue another
notification read to wait for the post-reset event.
Change-Id: I2812760b184d5357130e478cc35d27b14592abb3
Add handling for receiving the thermal throttling SMI event from the
kernel.
Also, update the event notification test to work with the new event.
Change-Id: Ib89c12b244f90998ccbae0a38b37f25705d156e0
The event bitmask in the KFD SMI event message is now replaced with an
event index. Sending an event bitmask, a 64-bit field with only one bit
set, was wasteful of memory and also potentially limited the number of
events to 64. Instead, the kernel now sends the event index in the SMI
event message. As a result, update the KFD SMI event handling to expect
the event index in the message.
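The change from a one-bit mask to an index can be sketched as follows. The message layout (index in hexadecimal followed by a space) and the mapping of index i to bit i - 1 are assumptions for illustration, not the exact kernel format; all function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption: indices start at 1 and the old bitmask used bit (index - 1).
 * Returns 0 for indices that do not fit in a 64-bit mask. */
static uint64_t event_mask_from_index(unsigned index) {
  return (index >= 1 && index <= 64) ? (UINT64_C(1) << (index - 1)) : 0;
}

/* Assumption: the message begins with the event index in hex followed by
 * a space, e.g. "2 <payload>". Returns the index, or 0 if none is found. */
static unsigned parse_event_index(const char *msg) {
  char *end = NULL;
  unsigned long idx = strtoul(msg, &end, 16);
  if (end == msg || *end != ' ') return 0;
  return (unsigned)idx;
}

/* A subscriber can still keep a local 64-bit mask of wanted events and
 * test each incoming message's index against it. */
static int event_wanted(uint64_t subscribed_mask, const char *msg) {
  return (subscribed_mask &
          event_mask_from_index(parse_event_index(msg))) != 0;
}
```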
Change-Id: I3e74620788d3c1f7c0bdaa69e9d9ab3d1aba2c92
Use an unsigned number for the left shift operation. If it is not specified
as unsigned, the compiler warns about a left shift of a negative
number.
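The difference can be shown with two small mask helpers. These are generic illustrations, not the code touched by this change: ~0 is the signed int -1, and left-shifting a negative value is undefined behavior in C (which is what the warning is about), whereas the unsigned forms are well defined.

```c
#include <stdint.h>

/* Mask with only bit n set. UINT32_C(1) instead of 1 keeps the shift
 * unsigned, so bit 31 does not land in a sign bit. */
static uint32_t bit(unsigned n) {
  return (n < 32) ? (UINT32_C(1) << n) : 0;
}

/* Mask with the low n bits cleared. ~UINT32_C(0) is all ones and unsigned;
 * writing ~0 << n would shift the negative int -1. */
static uint32_t high_mask(unsigned n) {
  return (n < 32) ? (~UINT32_C(0) << n) : 0;
}
```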
Change-Id: I05948073b0c40700bee69399b08df6031fc49d70
Some systems have kfd sysfs properties entries that
are unreadable. For example, when a multi-GPU system is
dividing the GPUs among containers, each container may
only be able to access certain GPUs.
Previously, all kfd topology node properties entries were
assumed to be valid. Now, we check for readability before
declaring them "valid".
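A readability check of this kind can be sketched by actually opening and reading the file, which reflects what the current process can really do. properties_readable() is a hypothetical name, and treating an empty but readable file as readable is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdio.h>

/* Return true only if `path` can be opened and read without error.
 * An empty but readable file counts as readable. */
static bool properties_readable(const char *path) {
  FILE *f = fopen(path, "r");
  if (f == NULL) return false;
  int c = fgetc(f);
  bool ok = !(c == EOF && ferror(f));
  fclose(f);
  return ok;
}
```

A topology scan would then record a node as valid only when properties_readable() succeeds for its properties file, instead of assuming every node is usable.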
Fixes SWDEV-240169
Also:
* remove an assertion that would fire when no PCIe
device identifier files are found on the system.
* fix cpplint issues
Change-Id: I74321b685159dd2628c890b33c39ad82988cb9dd
Also, fix TestMutualExclusion and TestEvtNofifReadWrite.
Previously, some of the normal SetUp steps were not
being performed for this test. In some cases, no DRM
devices are found on the test machine; skip
those cases.
Change-Id: Iaa5a257841eb459aa57491ae9680c34a60d5ac2b
Also:
* When destroying a counter, make sure to stop the counter first
* In the test, do not stop (disable) the counter before
reading it.
* Clean up some whitespace in other tests
* Re-add the manual PDF file
Change-Id: I0786ef3a994ca568299c77e44f092af8943ac33d
Print VRAM usage information in TestProcInfoRead.
Also, fix output formatting when running TestProcInfoRead.
Change-Id: I9efed808458ef4645145610f6f564f0f2baadea2