* Added a new test to verify mutual exclusion of access to device
resources
* Added missing mutex acquisition to some RSMI calls, as
well as try-catch blocks.
Change-Id: I87aac009878a0b2d1f975e1d5b794d887bb23ff9
* Don't make different shared memory mutexes for different users
* Don't delete (unlink) the shared mutex file if the mutex
initialization fails. This may mess up other processes that
are using it. Instead, print a message on how to resolve the
situation, and then throw an error.
Note: this situation usually comes up when debug builds
either assert() or otherwise end execution without proper
cleanup.
* Remove cpplint from shared_mutex code
Change-Id: I5f8ca6150cac5c2405fb97007516da345093f966
Given a device index, return the corresponding NUMA node for the
device.
Also, add NUMA node tests to Sys Info Read test.
Change-Id: I0df4937470e6362e6737ccea568d4b3e5890c91a
This solution takes into account that some hwmons use
label files to map sensor types. The previous solution
did not take this into account.
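As a rough illustration of label-based mapping, the sketch below turns the
text read from a hwmon label file into a sensor type. The enum, the
function name and the label strings ("edge", "junction", "mem") are
assumptions made for this example, not the code in this commit.

```cpp
#include <cassert>
#include <string>

// Hypothetical sensor types; the names are illustrative only.
enum class TempSensor { kEdge, kJunction, kMemory, kUnknown };

// Map the contents of a hwmon label file to a sensor type.
// Trailing whitespace (such as the newline read from sysfs) is ignored.
TempSensor SensorFromLabel(std::string label) {
  const std::string ws = " \t\r\n";
  const auto end = label.find_last_not_of(ws);
  label = (end == std::string::npos) ? "" : label.substr(0, end + 1);

  if (label == "edge") return TempSensor::kEdge;
  if (label == "junction") return TempSensor::kJunction;
  if (label == "mem") return TempSensor::kMemory;
  return TempSensor::kUnknown;  // unrecognized or empty label
}
```

With a mapping like this, the sensor type comes from the label text rather
than from the numeric index in the file name.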
Change-Id: I1d6204573cefa8197b2cfe0ffb412b545df3d80a
Given a process ID, give the device indices that process is
currently using.
Also:
* corrected how RSMI, amdgpu (i.e., "card#") and
KFD indices translate to one another
* added a few missing error codes to rsmi_status_string()
* fixed some formatting
Change-Id: Icd2cae66bb4fec768da96af7cf9cf8b8b66ec7f9
Improvements include:
* adding build flags that warn about stack smashing
and type conversion errors
* adding run-time checks for valid function input values and adequate
space for the result of arithmetic operations
* making sure the default case of switch statements does something
besides just assert
* disabling env. var. debugging in release mode
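One way a run-time check for adequate result space might look is sketched
below. The helper name and signature are hypothetical and are not taken
from this commit; the point is only that the overflow is detected before
the multiplication happens.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: multiply two unsigned 64-bit values and report
// failure instead of silently wrapping when the product does not fit.
bool CheckedMul(uint64_t a, uint64_t b, uint64_t *result) {
  if (result == nullptr) return false;  // invalid input value
  if (a != 0 && b > std::numeric_limits<uint64_t>::max() / a) {
    return false;  // product would overflow 64 bits
  }
  *result = a * b;
  return true;
}
```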
Change-Id: I5f048310c5c56e05d9ec31bcc273404d6a0dd646
* Update documentation on api-support function
* Check for a valid integer value when reading an integer monitor value
* If fan-write test attempts to set speed higher than the maximum
possible, then skip the test
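A validity check of this kind could be sketched as follows. The function
name is hypothetical, and accepting trailing whitespace is an assumption
about what a monitor file contains.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse the text of a monitor file as a base-10
// integer. Returns false for empty, non-numeric or out-of-range text.
bool ParseMonitorInt(const std::string &text, int64_t *value) {
  if (value == nullptr || text.empty()) return false;
  errno = 0;
  char *end = nullptr;
  const long long v = std::strtoll(text.c_str(), &end, 10);
  if (errno == ERANGE || end == text.c_str()) return false;
  // Allow trailing whitespace, such as the newline sysfs files end with.
  while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n') ++end;
  if (*end != '\0') return false;  // trailing garbage after the number
  *value = v;
  return true;
}
```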
Change-Id: I01ad0ab1f4caffdb0d2c26e9575f278c35a6b017
For device-getter functions, allow users to specify a nullptr
for the provided buffer. In those cases, the function will return
RSMI_STATUS_NOT_SUPPORTED if the hardware or system software does
not support the function. If the function is supported, then
RSMI_STATUS_INVALID_ARGS will be returned, unless a different
error is encountered.
Additionally, tests and documentation were updated to reflect
this change.
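The return-code behavior described above can be modeled with a small
stand-in. Everything in this sketch is hypothetical (the enum, the getter
and the `supported` flag are not RSMI code); it only mirrors the order of
the checks.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for the status codes named above. The numeric values
// are arbitrary and are not taken from the RSMI headers.
enum Status { kSuccess, kInvalidArgs, kNotSupported };

// Hypothetical device-getter. `supported` models whether the hardware
// or system software provides the value.
Status ExampleGetter(bool supported, uint64_t *out) {
  if (!supported) return kNotSupported;     // checked before the buffer
  if (out == nullptr) return kInvalidArgs;  // supported, but no buffer
  *out = 42;                                // placeholder value
  return kSuccess;
}

// A caller can probe for support without providing a buffer.
bool IsSupported(bool hw_supported) {
  return ExampleGetter(hw_supported, nullptr) != kNotSupported;
}
```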
Change-Id: Ie7db3a4c8c66af97ebd7ee1e3b95cd331ace9d9c
Add support and testing for reading the vram vendor associated with
the GPU. The vram vendor can be found as a separate sysfs file at:
/sys/class/drm/card[X]/device/mem_info_vram_vendor
The vram vendor is displayed as a string value.
Change-Id: I12c8e56e57f45aa08d7d6c25338c4e468ed1c7fc
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
The new functions added in this commit allow a caller to tell up
front what functions, function variants and monitors are
supported.
Also,
* fixed a few documentation/formatting issues
* fixed a process_info test issue
Change-Id: I2184ab1a4a6898f847e791f273e2185d556e78e9
If the 32-bit domain is found in the kfd node properties for
a device, then it will be used when constructing the bdfid.
If it's not present, the 16-bit version will continue to be used.
Also, whether the 32-bit or the 16-bit domain is used, the
domain will now be placed in the upper 32 bits of the 64-bit bdfid.
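A minimal sketch of such a packing follows. Only the placement of the
domain in the upper 32 bits comes from the description above; the layout
of the lower bits (bus, device, function) is the conventional PCI BDF
packing and is assumed here, as is the function name.

```cpp
#include <cassert>
#include <cstdint>

// Pack a PCI location into a 64-bit bdfid with the domain in the upper
// 32 bits. The lower-bit layout (bus << 8, device << 3, function) is an
// assumption for illustration.
uint64_t MakeBdfid(uint32_t domain, uint8_t bus, uint8_t device,
                   uint8_t function) {
  return (static_cast<uint64_t>(domain) << 32) |
         (static_cast<uint64_t>(bus) << 8) |
         (static_cast<uint64_t>(device & 0x1f) << 3) |
         static_cast<uint64_t>(function & 0x7);
}
```

A 16-bit domain is simply zero-extended, so both domain widths end up in
the same position.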
* Fixed some unrelated doxygen issues
Change-Id: Icb5116daa1ab45ee305bdbe6cd5df5736dd3ffa3
* Specifically, address case when brand name is longer than buffer
provided
* Also, slightly modify prototype to match similar, existing APIs.
* Address some cpplint issues.
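One possible way to handle a name that is longer than the caller's buffer
is sketched below. The status enum, the function name and the choice to
truncate while reporting the shortfall are assumptions for illustration,
not a description of the actual change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical status codes for this example only.
enum CopyStatus { kOk, kInsufficientSize, kBadArgs };

// Copy `name` into `buf`, whose capacity `len` includes the terminator.
// If the name does not fit, copy as much as possible, keep the buffer
// NUL-terminated and report the truncation.
CopyStatus CopyName(const std::string &name, char *buf, size_t len) {
  if (buf == nullptr || len == 0) return kBadArgs;
  const size_t n = (name.size() < len) ? name.size() : len - 1;
  std::memcpy(buf, name.data(), n);
  buf[n] = '\0';
  return (n < name.size()) ? kInsufficientSize : kOk;
}
```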
Change-Id: Iaf77304e23085123e88f301e4b33bc4e6be2a225
Add support and testing for reading the brand name associated with
a specific GPU (such as mi25, mi50, mi60, etc). The brand name is
associated with the SKU of the GPU, and some brand names can be
mapped from multiple different SKUs.
Change-Id: I36eb95ca8e72efdd294ccd684841195925dfe820
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Also, don't return an error for empty sysfs files. The reserved memory
page file will often have no lines. We don't want it to appear that
this function is not supported if the file is empty.
Change-Id: I1d28bb184ea587bb578fe71dd75adc2a812d09a8
Function to get the drm minor number associated with ROCm device
Change-Id: I9356b9ca75151882acbb075076bc072f08b73aae
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Added implementation of and tests for
rsmi_dev_compute_process_info_by_pid_get() and
rsmi_dev_compute_process_info_get()
Change-Id: I4c4f5f39fe6701da37916c9ad41449b5d35ac7af
Add support and testing for reading the Unique ID associated with a
specific GPU. This ID will persist across reboots, even if the GPU is
moved to a different machine. Note that this is per-GPU, not per-card,
as some cards have multiple GPUs, and each GPU will get a unique
identifier.
Change-Id: Idce50c6febc2ceb1a4c1200d2489ec8b9d8fe174
* If vendor/device/subsystem name is not found, use device ID string
* Update documentation for get-name functions
* Add support for junction, edge and memory temperature sensors
With newly added initialization parameters that can be
passed to rsmi_init(), you can tell RSMI to consider other
devices.
Also:
-fixed incorrect header file name that would break in C
builds
-modified rsmi_init() and rsmi_shut_down() to reinitialize and
clear static structures
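The effect of such an initialization flag can be sketched as a filter over
discovered devices. The flag name and value and the function are
hypothetical; 0x1002 is AMD's PCI vendor ID.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint16_t kAmdVendorId = 0x1002;  // AMD's PCI vendor ID
constexpr uint64_t kFlagAllDevices = 0x1;  // hypothetical flag value

// Return the vendor IDs of the devices that would be kept in the device
// list: AMD devices only, unless the flag asks for every device.
std::vector<uint16_t> FilterDevices(const std::vector<uint16_t> &vendors,
                                    uint64_t init_flags) {
  std::vector<uint16_t> kept;
  for (uint16_t v : vendors) {
    if ((init_flags & kFlagAllDevices) != 0 || v == kAmdVendorId) {
      kept.push_back(v);
    }
  }
  return kept;
}
```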
* By default, only consider AMD GPUs in RSMI device list
* Remove duplicate definition of rsmi_init_flags_t
This commit uses a pthread mutex in shared memory to prevent
almost all cases of multiple processes simultaneously
reading/writing to device sysfs files. The main remaining race
condition is when two processes start at the same time and
set up their shared memory and mutexes. Since this is meant
to prevent collisions among threads and processes, the small
shared memory segments (big enough for a pthread_mutex) will
persist until reboot.
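The general technique, a process-shared pthread mutex placed in shared
memory, can be sketched as below. This is a simplified illustration and
not the code in this commit: it uses an anonymous shared mapping rather
than a named segment, so the mapping does not persist until reboot, and
the function names are hypothetical.

```cpp
#include <cassert>
#include <pthread.h>
#include <sys/mman.h>

// Create a process-shared pthread mutex inside a shared mapping.
// Returns nullptr on failure.
pthread_mutex_t *CreateSharedMutex() {
  void *mem = mmap(nullptr, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return nullptr;
  auto *mtx = static_cast<pthread_mutex_t *>(mem);

  pthread_mutexattr_t attr;
  if (pthread_mutexattr_init(&attr) != 0) {
    munmap(mem, sizeof(pthread_mutex_t));
    return nullptr;
  }
  // Without PTHREAD_PROCESS_SHARED the mutex is only valid within the
  // process that initialized it.
  int rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
  if (rc == 0) rc = pthread_mutex_init(mtx, &attr);
  pthread_mutexattr_destroy(&attr);
  if (rc != 0) {
    munmap(mem, sizeof(pthread_mutex_t));
    return nullptr;
  }
  return mtx;
}

// Destroy the mutex and release its mapping.
void DestroySharedMutex(pthread_mutex_t *mtx) {
  if (mtx == nullptr) return;
  pthread_mutex_destroy(mtx);
  munmap(mtx, sizeof(pthread_mutex_t));
}
```

A process forked after CreateSharedMutex() sees the same mutex, so a
lock taken in one process blocks the other until it is released.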