When pattern matching file names to determine API support, in
some environments std::regex will throw. This change is meant
to handle this more gracefully.
Change-Id: If1ccfe5bdd71ec4d08663c80692024488072e11b
[ROCm/amdsmi commit: 27148a02cb]
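As an illustration of the handling described in commit 27148a02cb above, the following is a minimal sketch, not the library's actual code. The helper name `FileNameMatches` and the patterns are hypothetical. It assumes that a pattern which cannot be compiled or matched should simply count as "no match".

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not taken from rocm_smi_lib): returns true only
// when `name` fully matches `pattern`. If std::regex throws, the
// exception is caught and the name is reported as not matching, so the
// caller can treat the API as unsupported instead of terminating.
bool FileNameMatches(const std::string &name, const std::string &pattern) {
  try {
    const std::regex re(pattern);       // may throw std::regex_error
    return std::regex_match(name, re);  // may also throw on some inputs
  } catch (const std::regex_error &) {
    return false;
  }
}
```

A caller could use `FileNameMatches(file, "temp[0-9]+_input")` and fall back to reporting "not supported" whenever it returns false.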
To avoid build and runtime issues, we should set a minimum
compiler version. std::regex, used by rocm_smi_lib, requires
GCC 4.9.0 or greater. However, the development and test
environments (mainly) use 5.4.0.
Change-Id: Ie18e9f905786ec8eb50d61a326cb45173a0ec355
[ROCm/amdsmi commit: b7ff71c001]
Instead of hard-coding the install path to /opt/rocm, allow users to specify
where "make install" installs to, so they can install the lib to a local build
path for testing purposes without touching the global /opt/rocm files.
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Change-Id: I4144988e325edae4d1d1a2824e031996091036d3
[ROCm/amdsmi commit: 741f9c31ff]
This fixes a seg fault that would happen in release builds when
there are no KFD nodes on a system, which occurs when there are
no AMD GPUs present in the system. This use case occurs
for higher-level application code that is meant to be GPU agnostic.
Change-Id: If374930bc2e62f9898f337349cde3ebb16091ff0
[ROCm/amdsmi commit: 806f665a85]
When getting process information for a process, it's possible
that the process ended between the time the process ID was
discovered and the time we attempt to collect data for it.
This change is meant to handle that in the test case.
* Also, fix compile warning by removing unused variable.
Change-Id: I62f9a84a63548c856f0661fef15b7d248a330c05
[ROCm/amdsmi commit: 1c9ef44398]
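The race described in commit 1c9ef44398 above can be sketched as follows. This is a hedged illustration only: `QueryStatus`, `QueryAll` and the `query` callback are hypothetical stand-ins, not RSMI APIs. It assumes that a "not found" result for a previously discovered process ID means the process has exited and should be skipped rather than reported as a failure.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical status values; the real library defines its own codes.
enum QueryStatus { kOk, kNotFound, kError };

// Queries every discovered pid. A pid that has disappeared since
// discovery (kNotFound) is skipped; any other failure aborts.
// Returns the number of pids queried successfully, or -1 on a real error.
int QueryAll(const std::vector<uint32_t> &pids,
             const std::function<QueryStatus(uint32_t)> &query) {
  int queried = 0;
  for (uint32_t pid : pids) {
    switch (query(pid)) {
      case kOk:
        ++queried;
        break;
      case kNotFound:
        // The process ended after it was discovered; not a failure.
        break;
      default:
        return -1;
    }
  }
  return queried;
}
```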
* Added a new test to verify mutual exclusion of access to device
resources
* Added missing mutex acquisition to some RSMI calls, as
well as try-catch blocks.
Change-Id: I87aac009878a0b2d1f975e1d5b794d887bb23ff9
[ROCm/amdsmi commit: f8b57c3b16]
* Don't make different shared memory mutexes for different users
* Don't delete (unlink) the shared mutex file if the mutex
initialization fails. This may mess up other processes that
are using it. Instead, print a message on how to resolve the
situation, and then throw an error.
Note, this situation comes up when debug builds (usually)
either assert() or otherwise end execution without a proper
clean up.
* Remove cpplint from shared_mutex code
Change-Id: I5f8ca6150cac5c2405fb97007516da345093f966
[ROCm/amdsmi commit: 52196caaee]
Given a device index, return the corresponding NUMA node for the
device.
Also, add NUMA node tests to Sys Info Read test.
Change-Id: I0df4937470e6362e6737ccea568d4b3e5890c91a
[ROCm/amdsmi commit: fd79e5c161]
This solution takes into account that some hwmons use
label files to map sensor types. The previous solution
did not take this into account.
Change-Id: I1d6204573cefa8197b2cfe0ffb412b545df3d80a
[ROCm/amdsmi commit: 324c0ca0e5]
Also fix potential issue with evaluating functionality of
functions with multiple sub-variants.
Change-Id: I9a09e52f3d3f3181e72578ed1f3bfd0d85516aa3
[ROCm/amdsmi commit: 1d8e16bff2]
Given a process ID, give the device indices that process is
currently using.
Also:
* made corrections to how RSMI, amdgpu (i.e., "card#") and
KFD indices translate to one another
* add a few missing error codes to rsmi_status_string()
* fix some formatting
Change-Id: Icd2cae66bb4fec768da96af7cf9cf8b8b66ec7f9
[ROCm/amdsmi commit: 2d6e15190c]
Improvements include
* adding additional build flags that warn about stack-smashing
and type conversion errors
* run-time checks for valid function input values and adequate
space for the result of arithmetic operations.
* make sure the default case for switch statements does something
besides just assert
* disable using env. var. debugging in release mode
Change-Id: I5f048310c5c56e05d9ec31bcc273404d6a0dd646
[ROCm/amdsmi commit: d00b9ac07d]
This is part of the fix for SWDEV-208805. The other part will
be in the build_* script.
Change-Id: I36397e3f918d08170db8bb228722a2b7389af83b
[ROCm/amdsmi commit: 0e5c44de2a]
* Update doc. on api-support function
* Check for valid integer value when reading a monitor int. val.
* If fan-write test attempts to set speed higher than max.
possible, then skip the test
Change-Id: I01ad0ab1f4caffdb0d2c26e9575f278c35a6b017
[ROCm/amdsmi commit: 52dfa4bcca]
For device-getter functions, allow users to specify a nullptr
for the provided buffer. In those cases, the function will return
RSMI_STATUS_NOT_SUPPORTED if the hardware or system software does
not support the function. If the function is supported, then
RSMI_STATUS_INVALID_ARGS will be returned, unless a different
error is encountered.
Additionally, tests and documentation were updated to reflect
this change.
Change-Id: Ie7db3a4c8c66af97ebd7ee1e3b95cd331ace9d9c
[ROCm/amdsmi commit: 68d25e82fd]
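The nullptr convention described in commit 68d25e82fd above can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: the `Status` enum and `GetDeviceValue` are hypothetical stand-ins for the RSMI status codes and a device-getter function, and the `supported` flag stands in for whatever hardware or driver check the library performs.

```cpp
#include <cstdint>

// Hypothetical stand-ins for RSMI_STATUS_SUCCESS,
// RSMI_STATUS_INVALID_ARGS and RSMI_STATUS_NOT_SUPPORTED.
enum Status { kSuccess, kInvalidArgs, kNotSupported };

// Hypothetical getter following the convention in the commit message:
// the support check comes first, so a nullptr buffer yields
// kNotSupported on unsupported systems and kInvalidArgs otherwise.
Status GetDeviceValue(bool supported, uint64_t *out) {
  if (!supported) return kNotSupported;
  if (out == nullptr) return kInvalidArgs;  // supported, nowhere to write
  *out = 42;  // placeholder value for illustration
  return kSuccess;
}

// Caller-side probe: passing nullptr asks "is this supported?"
// without needing a real buffer.
bool IsSupported(bool supported) {
  return GetDeviceValue(supported, nullptr) == kInvalidArgs;
}
```

With this ordering, a caller can probe for support up front and only allocate a buffer for functions that will actually succeed.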
Add support and testing for reading the vram vendor associated with
the GPU. The vram vendor can be found as a separate sysfs file at:
/sys/class/drm/card[X]/device/mem_info_vram_vendor
The vram vendor is displayed as a string value.
Change-Id: I12c8e56e57f45aa08d7d6c25338c4e468ed1c7fc
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
[ROCm/amdsmi commit: 2412dff6a2]
The new functions added in this commit allow a caller to tell up
front what functions, function variants and monitors are
supported.
Also,
* fixed a few documentation/formatting issues
* fixed a process_info test issue
Change-Id: I2184ab1a4a6898f847e791f273e2185d556e78e9
[ROCm/amdsmi commit: 551b15182b]
If the 32-bit domain is found in the kfd node properties for
a device, then it will be used when constructing the bdfid.
If it's not present, it will continue to use the 16-bit version.
Also, whether the 32b or 16b domain is used, the
domain will now be placed in the upper 32b of the 64b bdfid.
* Fixed some unrelated doxygen issues
Change-Id: Icb5116daa1ab45ee305bdbe6cd5df5736dd3ffa3
[ROCm/amdsmi commit: 469af303d6]
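The bdfid packing described in commit 469af303d6 above might look like the sketch below. Only the placement of the domain in the upper 32 bits comes from the commit message. The layout of the lower bits (bus in bits 15:8, device in bits 7:3, function in bits 2:0) and the names `MakeBdfid` and `DomainOf` are assumptions made for illustration.

```cpp
#include <cstdint>

// Packs a PCI location into a 64-bit bdfid. The domain occupies the
// upper 32 bits whether it originated as a 16-bit or a 32-bit value.
// ASSUMPTION: bus in bits 15:8, device in bits 7:3, function in bits 2:0.
uint64_t MakeBdfid(uint32_t domain, uint8_t bus, uint8_t device,
                   uint8_t function) {
  return (static_cast<uint64_t>(domain) << 32) |
         (static_cast<uint64_t>(bus) << 8) |
         (static_cast<uint64_t>(device & 0x1F) << 3) |
         static_cast<uint64_t>(function & 0x7);
}

// Recovers the domain from the upper 32 bits of a bdfid.
uint32_t DomainOf(uint64_t bdfid) {
  return static_cast<uint32_t>(bdfid >> 32);
}
```

A 16-bit domain is simply widened to `uint32_t` before packing, so both cases share one code path.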
* Specifically, address case when brand name is longer than buffer
provided
* Also, slightly modify prototype to match similar, existing APIs.
* Address some cpplint issues.
Change-Id: Iaf77304e23085123e88f301e4b33bc4e6be2a225
[ROCm/amdsmi commit: 01e0800741]
Add support and testing for reading the brand name associated with
a specific GPU (such as mi25, mi50, mi60, etc). The brand name is
associated with the SKU of the GPU, and some brand names can be
mapped from multiple different SKUs.
Change-Id: I36eb95ca8e72efdd294ccd684841195925dfe820
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
[ROCm/amdsmi commit: 7f2d970a80]
Also, use abbreviated ROCM_BUILD_ID environment variable for job
and build number, if it's available.
Change-Id: Ib5a721f5920f1008bb6382935f7b439429389de0
[ROCm/amdsmi commit: aa2db48237]
Library version will now only have major and minor. Package
version will now include number of commits since previous
package. Both SO and package versions rely on git tags to
determine the current build and the commits since the last
release.
Change-Id: If2bda74bf342930a9e07f5c91cb1380b6b7c64ca
[ROCm/amdsmi commit: fe738eaedb]
The RAS sysfs output format changed, so handle both types of sysfs output
until it's normalized
Change-Id: I56f2a2495af8ff4d01011bc614283376afb9ad0a
[ROCm/amdsmi commit: a34832f11e]