* Added a new test to verify mutual exclusion of access to device
resources
* Added missing mutex acquisition, as well as try-catch blocks,
to some RSMI calls.
Change-Id: I87aac009878a0b2d1f975e1d5b794d887bb23ff9
[ROCm/rocm_smi_lib commit: f8b57c3b16]
* Don't make different shared memory mutexes for different users
* Don't delete (unlink) the shared mutex file if the mutex
initialization fails. This may mess up other processes that
are using it. Instead, print a message on how to resolve the
situation, and then throw an error.
Note: this situation usually comes up when debug builds hit
an assert() or otherwise end execution without proper
cleanup.
* Remove cpplint from shared_mutex code
Change-Id: I5f8ca6150cac5c2405fb97007516da345093f966
[ROCm/rocm_smi_lib commit: 52196caaee]
Given a device index, return the corresponding NUMA node for the
device.
Also, add NUMA node tests to Sys Info Read test.
Change-Id: I0df4937470e6362e6737ccea568d4b3e5890c91a
[ROCm/rocm_smi_lib commit: fd79e5c161]
This solution takes into account that some hwmons use
label files to map sensor types. The previous solution
did not take this into account.
Change-Id: I1d6204573cefa8197b2cfe0ffb412b545df3d80a
[ROCm/rocm_smi_lib commit: 324c0ca0e5]
Also fix potential issue with evaluating functionality of
functions with multiple sub-variants.
Change-Id: I9a09e52f3d3f3181e72578ed1f3bfd0d85516aa3
[ROCm/rocm_smi_lib commit: 1d8e16bff2]
Given a process ID, give the device indices that process is
currently using.
Also:
* correct how RSMI, amdgpu (i.e., "card#") and
KFD indices translate to one another
* add a few missing error codes to rsmi_status_string()
* fix some formatting
Change-Id: Icd2cae66bb4fec768da96af7cf9cf8b8b66ec7f9
[ROCm/rocm_smi_lib commit: 2d6e15190c]
Improvements include
* adding build flags that warn about stack-smashing
and type conversion errors
* adding run-time checks for valid function input values and
adequate space for the result of arithmetic operations
* making sure the default case of switch statements does
something besides just assert
* disabling env. var. debugging in release mode
Change-Id: I5f048310c5c56e05d9ec31bcc273404d6a0dd646
[ROCm/rocm_smi_lib commit: d00b9ac07d]
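Illustration for the run-time checks in the commit above: a minimal
sketch of validating inputs and confirming that the result of an
arithmetic operation fits before computing it. The helper name
checked_mul_u64 is hypothetical and is not taken from the library; the
actual checks in the commit may look different.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not from the library): multiplies two unsigned
// 64-bit values only if the product fits in 64 bits.
// Returns false for an invalid output pointer or an overflowing product.
bool checked_mul_u64(std::uint64_t a, std::uint64_t b, std::uint64_t *out) {
  if (out == nullptr) {
    return false;  // invalid input: nowhere to store the result
  }
  // a * b overflows exactly when b > max / a (for a != 0).
  if (a != 0 && b > std::numeric_limits<std::uint64_t>::max() / a) {
    return false;  // not enough space for the result
  }
  *out = a * b;
  return true;
}
```

A caller would check the return value and report an error instead of
using a wrapped-around result.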
This is part of the fix for SWDEV-208805. The other part will
be in the build_* script.
Change-Id: I36397e3f918d08170db8bb228722a2b7389af83b
[ROCm/rocm_smi_lib commit: 0e5c44de2a]
* Update doc. on api-support function
* Check for valid integer value when reading a monitor int. val.
* If the fan-write test attempts to set a speed higher than the
maximum possible, skip the test
Change-Id: I01ad0ab1f4caffdb0d2c26e9575f278c35a6b017
[ROCm/rocm_smi_lib commit: 52dfa4bcca]
For device-getter functions, allow users to specify a nullptr
for the provided buffer. In those cases, the function will return
RSMI_STATUS_NOT_SUPPORTED if the hardware or system software does
not support the function. If the function is supported, then
RSMI_STATUS_INVALID_ARGS will be returned, unless a different
error is encountered.
Additionally, tests and documentation were updated to reflect
this change.
Change-Id: Ie7db3a4c8c66af97ebd7ee1e3b95cd331ace9d9c
[ROCm/rocm_smi_lib commit: 68d25e82fd]
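Illustration for the nullptr convention in the commit above: a minimal
sketch of how a device getter could order its checks so that a nullptr
buffer doubles as a support probe. The Status enum and sketch_value_get()
are hypothetical stand-ins for this example, not the library's types or
functions, and the hw_supported flag replaces whatever support check the
library really performs.

```cpp
#include <cstdint>

// Stand-in status codes for this sketch. kNotSupported and kInvalidArgs
// play the roles of RSMI_STATUS_NOT_SUPPORTED and RSMI_STATUS_INVALID_ARGS.
enum class Status { kSuccess, kInvalidArgs, kNotSupported };

// Hypothetical getter. The support check comes first, so a nullptr
// buffer yields kNotSupported on unsupported hardware and kInvalidArgs
// when the function is supported.
Status sketch_value_get(bool hw_supported, std::uint64_t *value) {
  if (!hw_supported) {
    return Status::kNotSupported;
  }
  if (value == nullptr) {
    return Status::kInvalidArgs;  // supported, but no buffer to fill
  }
  *value = 42;  // placeholder for the real reading
  return Status::kSuccess;
}
```

With this ordering, a caller that passes nullptr and receives the
invalid-arguments status can conclude that the function is supported.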
Add support and testing for reading the vram vendor associated with
the GPU. The vram vendor can be found as a separate sysfs file at:
/sys/class/drm/card[X]/device/mem_info_vram_vendor
The vram vendor is displayed as a string value.
Change-Id: I12c8e56e57f45aa08d7d6c25338c4e468ed1c7fc
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
[ROCm/rocm_smi_lib commit: 2412dff6a2]
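Illustration for the vram vendor commit above: a minimal sketch of
building the sysfs path for a card index and reading the vendor string
from it. Both helper names are hypothetical, and the sketch assumes the
vendor string is the first line of the file; the library's own reader
may differ.

```cpp
#include <fstream>
#include <string>

// Builds the sysfs path quoted in the commit message for one card index.
std::string vram_vendor_path(unsigned card_index) {
  return "/sys/class/drm/card" + std::to_string(card_index) +
         "/device/mem_info_vram_vendor";
}

// Hypothetical helper: reads the first line of `path` into `*vendor`.
// Returns false if the pointer is null, the file cannot be opened,
// or no line could be read.
bool read_vendor_string(const std::string &path, std::string *vendor) {
  if (vendor == nullptr) {
    return false;
  }
  std::ifstream in(path);
  if (!in.is_open()) {
    return false;
  }
  return static_cast<bool>(std::getline(in, *vendor));
}
```

For example, read_vendor_string(vram_vendor_path(0), &vendor) would
attempt to read the vendor for card0 on a system that exposes the file.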
The new functions added in this commit allow a caller to tell up
front what functions, function variants and monitors are
supported.
Also,
* fixed a few documentation/formatting issues
* fixed a process_info test issue
Change-Id: I2184ab1a4a6898f847e791f273e2185d556e78e9
[ROCm/rocm_smi_lib commit: 551b15182b]
If the 32-bit domain is found in the kfd node properties for
a device, then it will be used when constructing the bdfid.
If it's not present, it will continue to use the 16-bit version.
Also, regardless of whether the 32-bit or 16-bit domain is used, the
domain will now be placed in the upper 32 bits of the 64-bit bdfid.
* Fixed some unrelated doxygen issues
Change-Id: Icb5116daa1ab45ee305bdbe6cd5df5736dd3ffa3
[ROCm/rocm_smi_lib commit: 469af303d6]
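Illustration for the bdfid change in the commit above: a minimal sketch
of packing a domain into the upper 32 bits of a 64-bit bdfid. The commit
message only specifies where the domain goes; the placement of bus,
device and function in the low 16 bits below follows the conventional
PCI encoding and is an assumption, as are the function names.

```cpp
#include <cstdint>

// Packs a PCI location into a 64-bit id with the domain in bits 32-63.
// Assumed low-bit layout: bus in bits 8-15, device in bits 3-7,
// function in bits 0-2.
std::uint64_t pack_bdfid(std::uint32_t domain, std::uint8_t bus,
                         std::uint8_t device, std::uint8_t function) {
  return (static_cast<std::uint64_t>(domain) << 32) |
         (static_cast<std::uint64_t>(bus) << 8) |
         (static_cast<std::uint64_t>(device & 0x1f) << 3) |
         static_cast<std::uint64_t>(function & 0x7);
}

// Recovers the domain from the upper 32 bits.
std::uint32_t bdfid_domain(std::uint64_t bdfid) {
  return static_cast<std::uint32_t>(bdfid >> 32);
}
```

A 16-bit domain simply occupies the low half of the upper 32 bits, so
the same extraction works for both domain widths.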
* Specifically, address case when brand name is longer than buffer
provided
* Also, slightly modify prototype to match similar, existing APIs.
* Address some cpplint issues.
Change-Id: Iaf77304e23085123e88f301e4b33bc4e6be2a225
[ROCm/rocm_smi_lib commit: 01e0800741]
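Illustration for the buffer-length case in the commit above: a minimal
sketch of copying a name into a caller-provided buffer that may be too
small. The commit message does not say how the library handles the
overflow; truncating, always null-terminating, and reporting the
truncation through the return value is an assumption made for this
example, and copy_name_to_buffer() is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies `name` into `buf` (capacity `len` bytes),
// truncating if needed and always null-terminating.
// Returns true only if the whole name fit.
bool copy_name_to_buffer(const std::string &name, char *buf,
                         std::size_t len) {
  if (buf == nullptr || len == 0) {
    return false;  // nothing can be written safely
  }
  const std::size_t n = std::min(name.size(), len - 1);
  std::memcpy(buf, name.data(), n);
  buf[n] = '\0';
  return n == name.size();
}
```

A caller that receives false with a valid buffer can retry with a
larger buffer or accept the truncated name.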
Add support and testing for reading the brand name associated with
a specific GPU (such as mi25, mi50 or mi60). The brand name is
associated with the SKU of the GPU, and some brand names can be
mapped from multiple different SKUs.
Change-Id: I36eb95ca8e72efdd294ccd684841195925dfe820
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
[ROCm/rocm_smi_lib commit: 7f2d970a80]
Also, use abbreviated ROCM_BUILD_ID environment variable for job
and build number, if it's available.
Change-Id: Ib5a721f5920f1008bb6382935f7b439429389de0
[ROCm/rocm_smi_lib commit: aa2db48237]
The library version will now have only major and minor components.
The package version will now include the number of commits since the
previous package. Both SO and package versions rely on git tags to
determine the current build and the commits since the last
release.
Change-Id: If2bda74bf342930a9e07f5c91cb1380b6b7c64ca
[ROCm/rocm_smi_lib commit: fe738eaedb]
The RAS sysfs output format changed, so handle both formats
until it's normalized
Change-Id: I56f2a2495af8ff4d01011bc614283376afb9ad0a
[ROCm/rocm_smi_lib commit: a34832f11e]
Also, don't return an error for empty sysfs files. The reserved memory
page file will often have no lines. We don't want it to appear that
this function is not supported if the file is empty.
Change-Id: I1d28bb184ea587bb578fe71dd75adc2a812d09a8
[ROCm/rocm_smi_lib commit: 73c54e1fd0]
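Illustration for the empty-file handling in the commit above: a minimal
sketch of a line reader that treats an empty sysfs file as a successful
read with zero lines, and reports failure only when the file cannot be
opened. read_sysfs_lines() is a hypothetical helper, not the library's
reader.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every line of `path` into `*lines`.
// An empty file is a success that leaves `*lines` empty; only a null
// pointer or a file that cannot be opened is reported as failure.
bool read_sysfs_lines(const std::string &path,
                      std::vector<std::string> *lines) {
  if (lines == nullptr) {
    return false;
  }
  lines->clear();
  std::ifstream in(path);
  if (!in.is_open()) {
    return false;
  }
  std::string line;
  while (std::getline(in, line)) {
    lines->push_back(line);
  }
  return true;
}
```

Keeping "no lines" distinct from "cannot read" lets a caller avoid
reporting the function as unsupported when the file is merely empty.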
Add a function to get the DRM minor number associated with a ROCm device
Change-Id: I9356b9ca75151882acbb075076bc072f08b73aae
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
[ROCm/rocm_smi_lib commit: 68cb303a44]
Added implementation of and tests for
rsmi_dev_compute_process_info_by_pid_get() and
rsmi_dev_compute_process_info_get()
Change-Id: I4c4f5f39fe6701da37916c9ad41449b5d35ac7af
[ROCm/rocm_smi_lib commit: 9b93cbe21d]