This tool acts as a command-line interface for manipulating
and monitoring the Radeon Open Compute Kernel, similar to the
rocm_smi.py Python tool.
This commit is the initial upload and cleanup of the
(incomplete) rocmSmiLib_cli.py and rsmiBindings.py files.
In the near future, this tool should have full feature parity with
rocm_smi.py by relying on the available rocm_smi_lib functions.
Change-Id: Ifbafd5118c15c68c240e3c83a47d2690a27c9353
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
[ROCm/rocm_smi_lib commit: 2b909252ac]
Package name should have a hyphen (not a period) between
NumCommitsSinceLastTag and ROCMIntegrationJobIdentifier.
Fixes SWDEV-245838
Change-Id: I28c4337af6f92ac51a4aed03a09af23b92bd89b5
[ROCm/rocm_smi_lib commit: 92c258c364]
Use an unsigned number for the left shift operation. If the operand
is not specified as unsigned, the compiler warns about a left shift
of a negative number.
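A minimal sketch of the pattern, assuming a 32-bit mask. HighBitsMask
and BitMask are hypothetical helpers, not rocm_smi_lib functions; the
point is that the shifted operand stays unsigned, so no negative
signed value is ever shifted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: mask with every bit at or above `shift` set.
// Writing `~0 << shift` would shift a signed int whose value is -1, which
// is the "left shift of negative value" case compilers warn about (and is
// undefined behavior before C++20). `~0U` keeps the operand unsigned.
inline uint32_t HighBitsMask(unsigned shift) {
  assert(shift < 32 && "shift must be smaller than the operand width");
  return ~0U << shift;
}

// Same idea for a single bit: `1U << bit` keeps the whole computation in
// unsigned arithmetic instead of relying on signed shift rules.
inline uint32_t BitMask(unsigned bit) {
  assert(bit < 32 && "bit index must be smaller than the operand width");
  return 1U << bit;
}
```

For example, HighBitsMask(4) yields 0xFFFFFFF0 and BitMask(31) yields
0x80000000.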
Change-Id: I05948073b0c40700bee69399b08df6031fc49d70
[ROCm/rocm_smi_lib commit: 9d24fc9175]
Some systems have kfd sysfs properties entries that
are unreadable. For example, when a multi-GPU system is
dividing the GPUs among containers, each container may
only be able to access certain GPUs.
Previously, all kfd topology node properties entries were
assumed to be valid. Now, we check that they are readable
before declaring them "valid".
Fixes SWDEV-240169
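A minimal sketch of such a readability check. IsReadableFile is a
hypothetical helper, not the library's actual code: a properties file
counts as valid only if it opens and a real read attempt does not fail.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: existence alone is not enough, because a container
// may see sysfs entries for GPUs it is not allowed to access.
bool IsReadableFile(const std::string& path) {
  std::ifstream fs(path);
  if (!fs.is_open()) {
    return false;  // missing, or opening was denied
  }
  fs.peek();         // force an actual read attempt
  return !fs.bad();  // an empty file only sets eofbit, which is still fine
}
```

A caller would skip (rather than assert on) any node whose properties
file fails this check.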
Also:
* remove an assertion that would fire when no PCIe
device identifier files are found on the system.
* fix cpplint issues
Change-Id: I74321b685159dd2628c890b33c39ad82988cb9dd
[ROCm/rocm_smi_lib commit: 68155baed5]
This corrects issues that arose after the OAM reorganization.
It should address SWDEV-243294.
Also, fix some compile warnings that show up on RHEL.
Change-Id: Id14d444905da35cd7346bcfbcd82b6d0572708c4
[ROCm/rocm_smi_lib commit: c2ef9a6879]
A message is output in debug builds when pthread_mutex_unlock
returns an error. However, in most cases the error it returns is
EPERM, which is expected. In fact, any error other than EPERM is
an indication of a problem. This commit adjusts accordingly.
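A sketch of the adjusted check, assuming a hypothetical UnlockAndReport
wrapper around pthread_mutex_unlock. EPERM is passed back silently and
only other errors produce the debug message.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <pthread.h>

// Hypothetical wrapper: EPERM (the caller does not hold the mutex) is the
// expected failure, so it is returned without a message. Any other error
// code points to a real problem and is reported in debug builds.
int UnlockAndReport(pthread_mutex_t* mutex) {
  int ret = pthread_mutex_unlock(mutex);
#ifndef NDEBUG
  if (ret != 0 && ret != EPERM) {
    std::fprintf(stderr, "pthread_mutex_unlock failed: %s\n",
                 std::strerror(ret));
  }
#endif
  return ret;
}
```

With an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK), unlocking a
mutex the calling thread does not hold returns EPERM, which this
wrapper hands back without printing anything.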
Change-Id: Ia5cad89aa6e68e79c1291ea21adffb0fa68f2300
[ROCm/rocm_smi_lib commit: 866438966d]
Also, add amdoam_get_error_description.
On behalf of
Amber Lin <Amber.Lin@amd.com> and
Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I1f5ac0c5948adb2c30008e95c501e8b69b8183b6
[ROCm/rocm_smi_lib commit: 27deaea6e8]
Also, fix TestMutualExclusion and TestEvtNotifReadWrite.
Previously, part of the normal SetUp function was not
being run for this test. In some cases, no DRM
devices are found on the test machine; skip
those cases.
Change-Id: Iaa5a257841eb459aa57491ae9680c34a60d5ac2b
[ROCm/rocm_smi_lib commit: 59394f3354]
Also, support an --iterations flag for certain functions that are
likely to be repeated frequently.
Change-Id: I7ed76835001b5cbca30042d6bf26484258c7b9a6
[ROCm/rocm_smi_lib commit: efc9b7658c]
Automatically updating the manual PDF file creates a local
git change, which breaks "repo sync" calls. Instead, just write
an untracked file that can be used to update the tracked version
of the manual PDF.
Change-Id: Icd7edc244df60728ec169c5aa1cf8b322ca4143b
[ROCm/rocm_smi_lib commit: 8e6f7c798d]
Also:
* When destroying a counter, make sure to stop the counter first
* In the test, do not stop (disable) the counter before
reading it.
* Clean up some whitespace in other tests
* Re-add manual pdf file
Change-Id: I0786ef3a994ca568299c77e44f092af8943ac33d
[ROCm/rocm_smi_lib commit: f946ea37ef]
This isn't supported on all models, so just comment it out on
failure instead of failing outright.
Change-Id: Id36d5df7c87abbb41f7b6be43abfea82004703a6
[ROCm/rocm_smi_lib commit: 8cf44548c0]
* Also, remove the dependency of the manual PDF on the README
file; they are independent of each other.
Change-Id: I1ab8c8c9adf6b78e5b4aab86ecdf4c46f3a6bf63
[ROCm/rocm_smi_lib commit: bdf22c1c9e]
When pattern-matching file names to determine API support,
std::regex throws in some environments. This change handles
that case more gracefully.
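A minimal sketch of the graceful path, using a hypothetical
FileNameMatches helper (not the library's API). A std::regex_error is
caught and reported as "no match" instead of escaping the support check.

```cpp
#include <regex>
#include <string>

// Hypothetical helper. A malformed pattern always throws std::regex_error,
// and older standard libraries with incomplete std::regex support (for
// example, libstdc++ before GCC 4.9) may throw on valid patterns too.
bool FileNameMatches(const std::string& file_name, const std::string& pattern) {
  try {
    const std::regex re(pattern);
    return std::regex_match(file_name, re);
  } catch (const std::regex_error&) {
    return false;  // treat "could not evaluate" as "not supported"
  }
}
```

Mapping the exception to "not supported" means one bad pattern disables
only the API it guards rather than aborting the whole discovery pass.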
Change-Id: If1ccfe5bdd71ec4d08663c80692024488072e11b
[ROCm/rocm_smi_lib commit: 27148a02cb]
To avoid build and runtime issues, we should set a minimum
compiler version. std::regex, used by rocm_smi_lib, requires
GCC 4.9.0 or greater. However, the development and test
environments mainly use GCC 5.4.0.
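One way to express that minimum in code, as a sketch: a compile-time
guard built on a hypothetical VersionAtLeast helper. The actual change
may enforce the minimum in the build system instead. Clang also defines
__GNUC__, so the check is limited to GCC.

```cpp
// Hypothetical helper: true if major.minor >= req_major.req_minor.
constexpr bool VersionAtLeast(int major, int minor, int req_major, int req_minor) {
  return major > req_major || (major == req_major && minor >= req_minor);
}

// Clang reports __GNUC__ as 4 for compatibility, so exclude it explicitly.
#if defined(__GNUC__) && !defined(__clang__)
static_assert(VersionAtLeast(__GNUC__, __GNUC_MINOR__, 4, 9),
              "GCC 4.9.0 or newer is required for std::regex");
#endif
```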
Change-Id: Ie18e9f905786ec8eb50d61a326cb45173a0ec355
[ROCm/rocm_smi_lib commit: b7ff71c001]
Instead of hard-coding the install path to /opt/rocm, allow users to
specify where "make install" installs to, so they can install the
library to a local build path for testing without touching the
global /opt/rocm files.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Change-Id: I4144988e325edae4d1d1a2824e031996091036d3
[ROCm/rocm_smi_lib commit: 741f9c31ff]
This fixes a seg fault that would happen in release builds when
there are no KFD nodes on a system, which occurs when no AMD GPUs
are present in the system. This use case arises for higher-level
application code that is meant to be GPU-agnostic.
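A sketch of the defensive pattern, using hypothetical Status, KfdNode,
NumDevices and FirstGpuId names that do not correspond to the library's
real internals. An empty node list yields a device count of 0 or a
not-found status instead of an out-of-range access.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical types for illustration only.
enum class Status { kSuccess, kInvalidArgs, kNotFound };
struct KfdNode { uint64_t gpu_id; };

// Zero devices is a valid answer, not an error, for GPU-agnostic callers.
Status NumDevices(const std::vector<KfdNode>& nodes, uint32_t* count) {
  if (count == nullptr) return Status::kInvalidArgs;
  *count = static_cast<uint32_t>(nodes.size());
  return Status::kSuccess;
}

// front() on an empty vector is undefined behavior; without the empty()
// check, a release build (no asserts) can crash here.
Status FirstGpuId(const std::vector<KfdNode>& nodes, uint64_t* gpu_id) {
  if (gpu_id == nullptr) return Status::kInvalidArgs;
  if (nodes.empty()) return Status::kNotFound;
  *gpu_id = nodes.front().gpu_id;
  return Status::kSuccess;
}
```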
Change-Id: If374930bc2e62f9898f337349cde3ebb16091ff0
[ROCm/rocm_smi_lib commit: 806f665a85]
When getting information for a process, it's possible that the
process ended between the time its process ID was discovered and
the time we attempt to collect data for it. This change handles
that case in the test.
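A minimal sketch of tolerating that race, assuming Linux /proc as the
data source (the library itself may gather process data elsewhere).
ReadProcessName is a hypothetical helper; it needs C++17 for
std::optional.

```cpp
#include <fstream>
#include <optional>
#include <string>
#include <unistd.h>  // pid_t

// Hypothetical helper: read a process name from /proc/<pid>/comm. If the
// process exited after its PID was discovered, the file no longer exists;
// that is reported as std::nullopt rather than as a test failure.
std::optional<std::string> ReadProcessName(pid_t pid) {
  std::ifstream fs("/proc/" + std::to_string(pid) + "/comm");
  std::string name;
  if (!fs.is_open() || !std::getline(fs, name)) {
    return std::nullopt;  // process ended (or is not visible): skip it
  }
  return name;
}
```

A test loop would simply `continue` past PIDs that come back empty.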
* Also, fix compile warning by removing unused variable.
Change-Id: I62f9a84a63548c856f0661fef15b7d248a330c05
[ROCm/rocm_smi_lib commit: 1c9ef44398]
* Added a new test to verify mutual exclusion of access to device
resources
* Added missing mutex acquisition, as well as try-catch blocks, to
some RSMI calls.
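A sketch of the lock-plus-try-catch pattern. The g_device_mutex, Status
enum and WithDeviceLock template are hypothetical stand-ins for the
library's own types.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical status codes and device mutex, for illustration only.
enum class Status { kSuccess, kInternalException };
std::mutex g_device_mutex;

// Run `fn` while holding the device mutex. std::lock_guard releases the
// mutex even if `fn` throws, and catch (...) turns any exception into a
// status code so it never escapes through a C API boundary.
template <typename Fn>
Status WithDeviceLock(Fn&& fn) {
  try {
    std::lock_guard<std::mutex> guard(g_device_mutex);
    fn();
    return Status::kSuccess;
  } catch (...) {
    return Status::kInternalException;
  }
}
```

For example, `WithDeviceLock([] { throw std::runtime_error("x"); })`
returns Status::kInternalException and leaves the mutex unlocked.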
Change-Id: I87aac009878a0b2d1f975e1d5b794d887bb23ff9
[ROCm/rocm_smi_lib commit: f8b57c3b16]
* Don't make different shared memory mutexes for different users
* Don't delete (unlink) the shared mutex file if the mutex
initialization fails. This may mess up other processes that
are using it. Instead, print a message on how to resolve the
situation, and then throw an error.
Note: this situation usually comes up when debug builds
either assert() or otherwise end execution without proper
cleanup.
* Remove cpplint from shared_mutex code
Change-Id: I5f8ca6150cac5c2405fb97007516da345093f966
[ROCm/rocm_smi_lib commit: 52196caaee]
Given a device index, return the corresponding NUMA node for the
device.
Also, add NUMA node tests to the Sys Info Read test.
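A hedged sketch of the parsing step, assuming the value comes from a
PCI device's sysfs numa_node file (for example
/sys/bus/pci/devices/<bdf>/numa_node), where the kernel reports -1 when
no NUMA node is associated. ParseNumaNode is hypothetical and needs
C++17 for std::optional.

```cpp
#include <istream>
#include <optional>
#include <sstream>

// Hypothetical parser: a non-negative integer is a NUMA node number; -1
// (no affinity) or unparsable content yields std::nullopt.
std::optional<int> ParseNumaNode(std::istream& in) {
  int node = -1;
  if (!(in >> node) || node < 0) {
    return std::nullopt;
  }
  return node;
}
```

In a test, a std::istringstream can stand in for the sysfs file, which
keeps the parser checkable on machines without GPUs.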
Change-Id: I0df4937470e6362e6737ccea568d4b3e5890c91a
[ROCm/rocm_smi_lib commit: fd79e5c161]
This solution takes into account that some hwmons use
label files to map sensor types. The previous solution
did not.
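A sketch of label-based mapping, assuming the hwmon directory listing
and file contents have already been read into memory. MapTempLabels is
hypothetical, the label strings in the usage note are sample data, and
the code needs C++17 for structured bindings.

```cpp
#include <map>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper. Input: (file name, file contents) pairs from one
// hwmon directory. Output: label text -> sensor index, so a sensor type is
// found by its label instead of by assuming a fixed tempN position.
std::map<std::string, int> MapTempLabels(
    const std::vector<std::pair<std::string, std::string>>& files) {
  static const std::regex kLabelFile("temp([0-9]+)_label");
  std::map<std::string, int> label_to_index;
  for (const auto& [name, contents] : files) {
    std::smatch m;
    if (!std::regex_match(name, m, kLabelFile)) continue;  // not a label file
    std::string label = contents;
    while (!label.empty() && (label.back() == '\n' || label.back() == ' ')) {
      label.pop_back();  // sysfs values end with a newline
    }
    label_to_index[label] = std::stoi(m[1].str());
  }
  return label_to_index;
}
```

For example, a temp2_label file containing "junction" maps "junction"
to index 2, so the matching reading would come from temp2_input.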
Change-Id: I1d6204573cefa8197b2cfe0ffb412b545df3d80a
[ROCm/rocm_smi_lib commit: 324c0ca0e5]
Also, fix a potential issue with evaluating the functionality of
functions with multiple sub-variants.
Change-Id: I9a09e52f3d3f3181e72578ed1f3bfd0d85516aa3
[ROCm/rocm_smi_lib commit: 1d8e16bff2]