This corrects issues that arose after the OAM reorganization.
It should address SWDEV-243294.
Also, fix some compile warnings that show up on RHEL.
Change-Id: Id14d444905da35cd7346bcfbcd82b6d0572708c4
[ROCm/rocm_smi_lib commit: c2ef9a6879]
A message is output in debug builds when pthread_mutex_unlock()
returns an error. However, in most cases the error returned is
EPERM, which is expected. An error other than EPERM, on the other
hand, indicates a problem. This commit adjusts the debug message
accordingly.
Change-Id: Ia5cad89aa6e68e79c1291ea21adffb0fa68f2300
[ROCm/rocm_smi_lib commit: 866438966d]
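A minimal sketch of the unlock-error handling described in the commit above, assuming an error-checking POSIX mutex, for which unlocking a mutex the caller does not hold fails with EPERM. The helpers `UnlockAndReport` and `InitErrorCheckMutex` are hypothetical and not part of rocm_smi_lib.

```cpp
#include <pthread.h>
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper: unlock a mutex and, in debug builds, report only
// unexpected errors. EPERM (unlocking a mutex the caller does not hold) is
// treated as expected. The raw result is returned so callers can inspect it.
int UnlockAndReport(pthread_mutex_t* m) {
  int ret = pthread_mutex_unlock(m);
#ifndef NDEBUG
  if (ret != 0 && ret != EPERM) {
    std::fprintf(stderr, "pthread_mutex_unlock() failed: %s\n",
                 std::strerror(ret));
  }
#endif
  return ret;
}

// Hypothetical helper: create an error-checking mutex, for which unlocking
// a mutex not held by the calling thread returns an error instead of
// producing undefined behavior.
int InitErrorCheckMutex(pthread_mutex_t* m) {
  pthread_mutexattr_t attr;
  int ret = pthread_mutexattr_init(&attr);
  if (ret != 0) return ret;
  ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  if (ret == 0) ret = pthread_mutex_init(m, &attr);
  pthread_mutexattr_destroy(&attr);
  return ret;
}
```

With this arrangement, an EPERM from an unlock that was never paired with a lock stays silent, while any other error code still produces a debug message.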
Also, add amdoam_get_error_description.
On behalf of
Amber Lin <Amber.Lin@amd.com> and
Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I1f5ac0c5948adb2c30008e95c501e8b69b8183b6
[ROCm/rocm_smi_lib commit: 27deaea6e8]
Also, fix TestMutualExclusion and TestEvtNotifReadWrite.
Previously, part of the normal SetUp function was not
being run for this test. In some cases, no DRM devices
are found on the test machine; skip the test in those
cases.
Change-Id: Iaa5a257841eb459aa57491ae9680c34a60d5ac2b
[ROCm/rocm_smi_lib commit: 59394f3354]
Also, support an --iterations flag for certain functions that are
likely to be called repeatedly.
Change-Id: I7ed76835001b5cbca30042d6bf26484258c7b9a6
[ROCm/rocm_smi_lib commit: efc9b7658c]
Automatically updating the manual PDF file creates a local
git change, which breaks "repo sync" calls. Instead, write
an untracked file that can be used to update the tracked
version of the manual PDF.
Change-Id: Icd7edc244df60728ec169c5aa1cf8b322ca4143b
[ROCm/rocm_smi_lib commit: 8e6f7c798d]
Also:
* When destroying a counter, make sure to stop the counter first
* In the test, do not stop (disable) the counter before
reading it.
* Clean up some whitespace in other tests
* Re-add manual pdf file
Change-Id: I0786ef3a994ca568299c77e44f092af8943ac33d
[ROCm/rocm_smi_lib commit: f946ea37ef]
This isn't supported on all models, so just comment out on failure
instead of fully failing
Change-Id: Id36d5df7c87abbb41f7b6be43abfea82004703a6
[ROCm/rocm_smi_lib commit: 8cf44548c0]
* Also, remove the dependency of the manual PDF on the README
file; they are independent of each other.
Change-Id: I1ab8c8c9adf6b78e5b4aab86ecdf4c46f3a6bf63
[ROCm/rocm_smi_lib commit: bdf22c1c9e]
In some environments, std::regex throws when pattern matching
file names to determine API support. This change handles that
case more gracefully.
Change-Id: If1ccfe5bdd71ec4d08663c80692024488072e11b
[ROCm/rocm_smi_lib commit: 27148a02cb]
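One way to handle a throwing std::regex gracefully, as the commit above describes, is to catch std::regex_error and report that support could not be determined rather than letting the exception escape. This is a sketch only; `SafeRegexMatch` is a hypothetical helper, not the rocm_smi_lib function.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: match a file name against a pattern without letting
// std::regex_error escape. Returns false if the regex could not be built or
// evaluated; otherwise stores the match result in *matched and returns true.
bool SafeRegexMatch(const std::string& file_name, const std::string& pattern,
                    bool* matched) {
  try {
    std::regex re(pattern);
    *matched = std::regex_match(file_name, re);
    return true;
  } catch (const std::regex_error&) {
    // Treat this as "support could not be determined" instead of crashing.
    return false;
  }
}
```

A caller can then treat a `false` return as "unknown" and fall back to reporting the function as unsupported.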
To avoid build and runtime issues, we should set a minimum
compiler version. std::regex, used by rocm_smi_lib, requires
GCC 4.9.0 or later. However, the development and test
environments mainly use GCC 5.4.0.
Change-Id: Ie18e9f905786ec8eb50d61a326cb45173a0ec355
[ROCm/rocm_smi_lib commit: b7ff71c001]
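The commit above may enforce the minimum version in the build system; as an illustration only, an equivalent compile-time guard in C++ could look like this. The GCC 4.9.0 threshold comes from the commit text, and `VersionAtLeast` is a hypothetical helper.

```cpp
#include <cassert>

// Compile-time guard: reject GCC releases older than 4.9.0, which the commit
// identifies as the minimum for std::regex. Clang also defines __GNUC__, so
// it is excluded explicitly.
#if defined(__GNUC__) && !defined(__clang__)
#if (__GNUC__ < 4) || (__GNUC__ == 4 && __GNUC_MINOR__ < 9)
#error "GCC 4.9.0 or later is required (std::regex)."
#endif
#endif

// Hypothetical helper: the same major/minor comparison, usable at run time
// or in constant expressions.
constexpr bool VersionAtLeast(int major, int minor, int req_major,
                              int req_minor) {
  return major > req_major || (major == req_major && minor >= req_minor);
}
```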
Instead of hard-coding the install path to /opt/rocm, allow users to
specify where "make install" installs files, so they can install the
library into a local build path for testing without touching the
global /opt/rocm files.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Change-Id: I4144988e325edae4d1d1a2824e031996091036d3
[ROCm/rocm_smi_lib commit: 741f9c31ff]
This fixes a segfault that occurred in release builds when a
system has no KFD nodes, which is the case when no AMD GPUs are
present. This situation arises for higher-level application code
that is meant to be GPU-agnostic.
Change-Id: If374930bc2e62f9898f337349cde3ebb16091ff0
[ROCm/rocm_smi_lib commit: 806f665a85]
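A sketch of the kind of guard that prevents this crash, assuming the KFD nodes are held in a container that is empty on systems without AMD GPUs. `Status`, `KfdNode`, and `FirstGpuId` are hypothetical names. One plausible reason the fault appeared only in release builds is that an assert() guarding the access is compiled out there (NDEBUG), leaving the out-of-range access unchecked.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical status codes and node type, for illustration only.
enum class Status { kSuccess, kNotFound };

struct KfdNode {
  uint64_t gpu_id;
};

// Returns the GPU ID of the first KFD node. On a system with no AMD GPUs the
// node list is empty, and accessing nodes[0] would be undefined behavior, so
// the empty case is checked explicitly in every build type.
Status FirstGpuId(const std::vector<KfdNode>& nodes, uint64_t* gpu_id) {
  if (nodes.empty()) {
    return Status::kNotFound;
  }
  *gpu_id = nodes.front().gpu_id;
  return Status::kSuccess;
}
```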
When getting information for a process, the process may have
ended between the time its process ID was discovered and the time
we attempt to collect its data. This change handles that case in
the test.
* Also, fix a compile warning by removing an unused variable.
Change-Id: I62f9a84a63548c856f0661fef15b7d248a330c05
[ROCm/rocm_smi_lib commit: 1c9ef44398]
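A sketch of how the race in the commit above might be handled, assuming per-process data is read from files under /proc/<pid>/: if the file has disappeared (ENOENT), the process most likely exited after its ID was discovered, which is reported separately from a real error. `ReadProcessLine` and `ProcInfoResult` are hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <string>

enum class ProcInfoResult { kOk, kProcessEnded, kError };

// Hypothetical helper: read the first line of a per-process file.
ProcInfoResult ReadProcessLine(const std::string& path, std::string* line) {
  std::FILE* f = std::fopen(path.c_str(), "r");
  if (f == nullptr) {
    // ENOENT: the entry vanished, most likely because the process exited
    // between the time its ID was discovered and now.
    return errno == ENOENT ? ProcInfoResult::kProcessEnded
                           : ProcInfoResult::kError;
  }
  line->clear();
  int c;
  while ((c = std::fgetc(f)) != EOF && c != '\n') {
    line->push_back(static_cast<char>(c));
  }
  bool failed = std::ferror(f) != 0;
  std::fclose(f);
  return failed ? ProcInfoResult::kError : ProcInfoResult::kOk;
}
```

A test can then skip a process that returns `kProcessEnded` instead of failing.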
* Added a new test to verify mutual exclusion of access to device
resources
* Added missing mutex acquisition to some RSMI calls, as well
as try-catch blocks.
Change-Id: I87aac009878a0b2d1f975e1d5b794d887bb23ff9
[ROCm/rocm_smi_lib commit: f8b57c3b16]
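The locking and try-catch pattern described in the commit above can be sketched as follows. The device structure, status codes, and `IncrementCounter` are hypothetical; the assumption is that each RSMI call holds the device mutex for its whole access and converts exceptions into status codes so that none escape through the C API.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <stdexcept>

// Hypothetical status codes, for illustration only.
enum class Status { kSuccess, kInvalidArgs, kInternalError };

// Hypothetical per-device state guarded by a mutex.
struct Device {
  std::mutex mutex;
  uint64_t counter = 0;
};

// Hold the device mutex for the whole access, and convert any exception
// into a status code. std::lock_guard releases the mutex during stack
// unwinding, so a throw inside the critical section cannot leave it locked.
Status IncrementCounter(Device* dev, uint64_t amount) {
  if (dev == nullptr) return Status::kInvalidArgs;
  try {
    std::lock_guard<std::mutex> guard(dev->mutex);
    if (amount == 0) throw std::invalid_argument("amount must be non-zero");
    dev->counter += amount;
    return Status::kSuccess;
  } catch (const std::invalid_argument&) {
    return Status::kInvalidArgs;
  } catch (...) {
    return Status::kInternalError;
  }
}
```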
* Don't make different shared memory mutexes for different users
* Don't delete (unlink) the shared mutex file if the mutex
initialization fails. This may mess up other processes that
are using it. Instead, print a message on how to resolve the
situation, and then throw an error.
Note: this situation usually comes up when a debug build either
calls assert() or otherwise ends execution without proper
cleanup.
* Remove cpplint from shared_mutex code
Change-Id: I5f8ca6150cac5c2405fb97007516da345093f966
[ROCm/rocm_smi_lib commit: 52196caaee]
Given a device index, return the corresponding NUMA node for the
device.
Also, add NUMA node tests to the Sys Info Read test.
Change-Id: I0df4937470e6362e6737ccea568d4b3e5890c91a
[ROCm/rocm_smi_lib commit: fd79e5c161]
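A minimal sketch of reading a device's NUMA node from sysfs, assuming a single-integer attribute such as the PCI `numa_node` file (which reads -1 when the platform reports no NUMA affinity). This is not necessarily where rocm_smi_lib obtains the value; `ReadNumaNode` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper: read a NUMA node number from a sysfs-style attribute
// file containing a single integer, for example
// /sys/bus/pci/devices/<BDF>/numa_node. Returns false if the file cannot be
// opened or does not start with an integer.
bool ReadNumaNode(const std::string& path, int32_t* numa_node) {
  std::ifstream in(path);
  int32_t value = 0;
  if (!(in >> value)) return false;
  *numa_node = value;
  return true;
}
```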
This solution takes into account that some hwmons use
label files to map sensor types. The previous solution
did not take this into account.
Change-Id: I1d6204573cefa8197b2cfe0ffb412b545df3d80a
[ROCm/rocm_smi_lib commit: 324c0ca0e5]
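A sketch of label-based sensor mapping, assuming the contents of each hwmon `tempN_label` file have already been read (amdgpu, for example, labels its temperature sensors "edge", "junction", and "mem"). `MapLabelsToIndices` is a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: given the contents of hwmon temp*_label files keyed
// by sensor index (1 for temp1_label, 2 for temp2_label, ...), build a map
// from label to sensor index, so a sensor type is located by its label
// rather than by assuming a fixed index.
std::map<std::string, int> MapLabelsToIndices(
    const std::map<int, std::string>& labels_by_index) {
  std::map<std::string, int> index_by_label;
  for (const auto& entry : labels_by_index) {
    std::string label = entry.second;
    // sysfs attribute contents usually end with a newline; strip it.
    while (!label.empty() && (label.back() == '\n' || label.back() == ' ')) {
      label.pop_back();
    }
    index_by_label[label] = entry.first;
  }
  return index_by_label;
}
```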
Also, fix a potential issue when evaluating the functionality of
functions with multiple sub-variants.
Change-Id: I9a09e52f3d3f3181e72578ed1f3bfd0d85516aa3
[ROCm/rocm_smi_lib commit: 1d8e16bff2]
Given a process ID, return the indices of the devices that the
process is currently using.
Also:
* correct how RSMI, amdgpu (i.e., "card#"), and KFD indices
translate to one another
* add a few missing error codes to rsmi_status_string()
* fix some formatting
Change-Id: Icd2cae66bb4fec768da96af7cf9cf8b8b66ec7f9
[ROCm/rocm_smi_lib commit: 2d6e15190c]
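A sketch of the process-to-device lookup, assuming the KFD GPU IDs used by the process are already known and a table maps each KFD gpu_id to the RSMI index of the same device. `ProcessDeviceIndices` and the table are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Hypothetical helper: translate the KFD GPU IDs a process is using into
// RSMI device indices. IDs with no entry in the table are skipped.
std::vector<uint32_t> ProcessDeviceIndices(
    const std::vector<uint64_t>& process_gpu_ids,
    const std::map<uint64_t, uint32_t>& rsmi_index_by_gpu_id) {
  std::set<uint32_t> indices;  // de-duplicated and sorted
  for (uint64_t gpu_id : process_gpu_ids) {
    auto it = rsmi_index_by_gpu_id.find(gpu_id);
    if (it != rsmi_index_by_gpu_id.end()) {
      indices.insert(it->second);
    }
  }
  return std::vector<uint32_t>(indices.begin(), indices.end());
}
```

Keeping the translation in one table makes it harder for the RSMI, amdgpu, and KFD index spaces to drift apart.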
Improvements include:
* additional build flags that warn about stack-smashing and
type conversion errors
* run-time checks for valid function input values and for
adequate space to hold the results of arithmetic operations
* making sure the default case of switch statements does
something besides just assert
* disabling debugging via environment variables in release mode
Change-Id: I5f048310c5c56e05d9ec31bcc273404d6a0dd646
[ROCm/rocm_smi_lib commit: d00b9ac07d]
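A sketch of two of the improvements listed above: a run-time check that a product fits in its result type before computing it, and a switch whose default case returns an error rather than relying on assert() alone, which is compiled out in release builds. The status codes, `ClkType`, and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical status codes and clock type, for illustration only.
enum class Status { kSuccess, kInvalidArgs, kOutOfRange };
enum class ClkType { kSys, kMem, kUnknown };

// Check that a * b fits in uint64_t before computing it, instead of
// silently wrapping around.
Status SafeMultiply(uint64_t a, uint64_t b, uint64_t* result) {
  if (result == nullptr) return Status::kInvalidArgs;
  if (a != 0 && b > std::numeric_limits<uint64_t>::max() / a) {
    return Status::kOutOfRange;
  }
  *result = a * b;
  return Status::kSuccess;
}

// The default case returns an error, so an unexpected value is handled in
// every build type, not only in debug builds where assert() is active.
Status ClockName(ClkType type, const char** name) {
  if (name == nullptr) return Status::kInvalidArgs;
  switch (type) {
    case ClkType::kSys:
      *name = "sclk";  // hypothetical display name
      return Status::kSuccess;
    case ClkType::kMem:
      *name = "mclk";  // hypothetical display name
      return Status::kSuccess;
    default:
      return Status::kInvalidArgs;
  }
}
```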