Commit Graph

301 Commits

Author SHA1 Message Date
Chris Freehill 2d6e15190c Add rsmi_compute_process_gpus_get()
Given a process ID, give the device indices that process is
currently using.

Also:
* made corrections to how RSMI, amdgpu (i.e., "card#") and
  KFD indices translate from one another
* add a few missing error codes to rsmi_status_string()
* fix some formatting

Change-Id: Icd2cae66bb4fec768da96af7cf9cf8b8b66ec7f9
2020-02-22 10:47:58 -06:00
Chris Freehill 1004a01094 Disable TestFrequenciesReadWrite for arcturus
Change-Id: Ia20ec853cdba34ff3dcdc68b4f869890bf58b539
2019-11-07 16:22:45 -05:00
Chris Freehill 52dfa4bcca Docs., error checking and test improvements
* Update doc. on api-support function
* Check for valid integer value when reading a monitor int. val.
* If fan-write test attempts to set speed higher than max.
   possible, then skip the test

Change-Id: I01ad0ab1f4caffdb0d2c26e9575f278c35a6b017
2019-11-06 11:19:47 -05:00
Chris Freehill 3a26a7270c Support rsmitst blacklisting by adding an exclude file
Change-Id: I9d581b8e24363a688b58a6ca59a6521c7be364d7
2019-10-17 13:47:02 -05:00
Chris Freehill 68d25e82fd Support checking for specific device-getter api support
For device-getter functions, allow users to specify a nullptr
for the provided buffer. In those cases, the function will return
RSMI_STATUS_NOT_SUPPORTED if the hardware or system software does
not support the function. If the function is supported, then
RSMI_STATUS_INVALID_ARGS will be returned, unless a different
error is encountered.

Additionally, tests and documentation were updated to reflect
this change.

Change-Id: Ie7db3a4c8c66af97ebd7ee1e3b95cd331ace9d9c
2019-10-05 15:55:18 -05:00
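
The support-probing convention described in commit 68d25e82fd can be modeled in a few lines. This is a plain-Python sketch of the return-code logic only, not the RSMI implementation: `Status`, `temp_get`, `is_supported`, and the `supported` flag are hypothetical names, the enum values are arbitrary, and the "different error" case mentioned in the message is not modeled.

```python
from enum import Enum, auto
from typing import List, Optional


class Status(Enum):
    # Names mirror the status codes in the commit message; values are arbitrary.
    SUCCESS = auto()
    INVALID_ARGS = auto()
    NOT_SUPPORTED = auto()


def temp_get(supported: bool, out: Optional[List[int]]) -> Status:
    """Hypothetical device getter; `out` stands in for the caller's buffer."""
    if not supported:
        # Support is checked first, so a missing buffer cannot mask it.
        return Status.NOT_SUPPORTED
    if out is None:
        # Supported, but the caller gave no buffer to write into.
        return Status.INVALID_ARGS
    out.append(42)  # placeholder reading, not real sensor data
    return Status.SUCCESS


def is_supported(supported_hw: bool) -> bool:
    """Probe with no buffer: INVALID_ARGS means the getter is available."""
    return temp_get(supported_hw, None) is Status.INVALID_ARGS
```

A caller can therefore probe for support without allocating a buffer, and only then make the real call.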
Ori Messinger 2412dff6a2 Display GPU vram vendor
Add support and testing for reading the vram vendor associated with
the GPU. The vram vendor can be found as a separate sysfs file at:
/sys/class/drm/card[X]/device/mem_info_vram_vendor
The vram vendor is displayed as a string value.

Change-Id: I12c8e56e57f45aa08d7d6c25338c4e468ed1c7fc
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2019-10-04 11:51:30 -04:00
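
For illustration, the sysfs file named in commit 2412dff6a2 can be read directly. The sketch below is not how the library does it; `read_vram_vendor` and its `sysfs_root` parameter are hypothetical, with the root made overridable only so the function can be exercised without a GPU.

```python
from pathlib import Path
from typing import Optional


def read_vram_vendor(card: int, sysfs_root: str = "/sys/class/drm") -> Optional[str]:
    """Return the VRAM vendor string for card<N>, or None if it cannot be read."""
    path = Path(sysfs_root) / f"card{card}" / "device" / "mem_info_vram_vendor"
    try:
        # The file holds a single string; strip the trailing newline.
        return path.read_text().strip()
    except OSError:
        # File missing or unreadable, e.g. an older kernel or a non-AMD device.
        return None
```

Calling `read_vram_vendor(0)` on a system with the file present returns the vendor string; on any other system it returns `None`.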
Chris Freehill 551b15182b Add functions that tell what capabilities are supported
The new functions added in this commit allow a caller to tell up
front what functions, function variants and monitors are
supported.

Also,
* fixed a few documentation/formatting issues
* fixed a process_info test issue

Change-Id: I2184ab1a4a6898f847e791f273e2185d556e78e9
2019-09-23 13:30:47 -05:00
Ori Messinger 7f2d970a80 Display GPU brand name
Add support and testing for reading the brand name associated with
a specific GPU (such as mi25, mi50, mi60, etc). The brand name is
associated with the SKU of the GPU, and some brand names can be
mapped from multiple different SKUs.

Change-Id: I36eb95ca8e72efdd294ccd684841195925dfe820
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2019-08-22 12:24:29 -04:00
Chris Freehill aaecfd6fff Adjust how we read ECC block counter status
This change corresponds to kernel changes.

Change-Id: Ibd977e8b3338349036cb16e55fb0b2c9c187726d
2019-08-09 16:06:43 -05:00
Kent Russell a34832f11e Fix RAS change
RAS formatting changed, so get it to handle both types of sysfs output
until it's normalized
Change-Id: I56f2a2495af8ff4d01011bc614283376afb9ad0a
2019-08-08 12:09:18 -04:00
Chris Freehill 73c54e1fd0 Add support for rsmi_dev_memory_reserved_pages_get()
Also, don't return an error for empty sysfs files. The reserved memory
page file will often have no lines. We don't want it to appear that
this function is not supported if the file is empty.

Change-Id: I1d28bb184ea587bb578fe71dd75adc2a812d09a8
2019-08-06 11:42:03 -05:00
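
The distinction drawn in commit 73c54e1fd0, between a file that is empty and a file that does not exist, can be sketched as follows. `read_lines` is a hypothetical helper written for this note, not a function from the library.

```python
from pathlib import Path
from typing import List, Optional


def read_lines(path: str) -> Optional[List[str]]:
    """Return the non-blank lines of a sysfs-style file.

    None -> file missing or unreadable (caller may report "not supported")
    []   -> file exists but is empty (supported, nothing to report)
    """
    try:
        text = Path(path).read_text()
    except OSError:
        return None
    return [line for line in text.splitlines() if line.strip()]
```

Returning an empty list rather than an error lets the caller report zero reserved pages instead of treating the feature as unsupported.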
Chris Freehill cf13d6f4d8 Add rsmi_dev_serial_number_get()
Also correct whitespace issues

Change-Id: I7ffe23672304c31ed08d7148b04a19a7d4c3d7ef
2019-07-22 07:09:53 -05:00
Harish Kasiviswanathan 904ea5fc27 Test rsmi_dev_drm_render_minor_get()
Change-Id: I5c0702efc8ed1bc155292e4c3a73d74e5c66204e
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2019-07-11 13:13:03 -04:00
Chris Freehill 31e02fdc61 Add rsmi_dev_firmware_version_get()
Change-Id: Iba3e5f3eaa0eb031fc013fc168bded22bc249b5c
2019-07-09 22:50:44 -05:00
Chris Freehill 557e1f5704 Add xgmi error_status and error_reset functions
Also, comment corrections and added check for invalid arguments

Change-Id: I891cbf9b37bfda629914a008811b840323872c02
2019-07-09 09:55:05 -04:00
Chris Freehill 9b93cbe21d Add initial support for getting process information
Added implementation of and tests for
rsmi_dev_compute_process_info_by_pid_get() and
rsmi_dev_compute_process_info_get()

Change-Id: I4c4f5f39fe6701da37916c9ad41449b5d35ac7af
2019-07-03 20:01:43 -05:00
Chris Freehill 1c5e090507 Add rsmi_dev_memory_busy_percent_get()
Change-Id: Ide683b6c72870af547331f4502c5bb8c445d61b5
2019-06-25 19:09:13 -05:00
Chris Freehill ea26baec20 Event counter support
XGMI related events are supported

Change-Id: If17036fe890c8be45da3654353599821b5828c14
2019-06-24 17:40:01 -05:00
Kent Russell 35d2807196 Add support for reading GPU's unique ID
Add support and testing for reading the Unique ID associated with a
specific GPU. This ID will persist across reboots, even if the GPU is
moved to a different machine. Note that this is per-GPU, not per-card,
as some cards have multiple GPUs, and each GPU will get a unique
identifier

Change-Id: Idce50c6febc2ceb1a4c1200d2489ec8b9d8fe174
2019-06-21 08:39:36 -04:00
Chris Freehill 11f714326b Add support for junction, edge and memory temperature sensors (#42)
* If vendor/device/subsystem name is not found, use device ID string

* Update documentation for get-name functions

* Add support for junction, edge and memory temperature sensors
2019-05-24 15:24:49 -05:00
Chris Freehill 98c2ad6aaf Updated google test to googletest-release-1.8.1 2019-05-15 10:21:37 -05:00
Chris Freehill 1dfef717bb By default, only consider AMD GPUs in RSMI device list
With newly added initialization parameters that can be
passed to rsmi_init(), you can tell RSMI to consider other
devices.

Also:
-fixed incorrect header file name that would break in C
builds
-modified rsmi_init() and rsmi_shut_down() to reinitialize and
clear static structures
2019-05-09 18:55:15 -05:00
Chris Freehill 34c977bd06 Added rsmi_dev_pci_replay_counter_get()
Also, added code to destroy/recreate mutex if we can't get a lock
within 3 seconds, when shared memory mutex is initialized.
2019-05-06 11:26:40 -05:00
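
The recovery strategy in commit 34c977bd06 (give up waiting after a timeout, then destroy and recreate the mutex) can be illustrated with an in-process analogue. This is only a sketch: the library uses a mutex in shared memory, whereas the hypothetical `RecoverableLock` below wraps a `threading.Lock`, and its `recreated` counter exists purely for demonstration.

```python
import threading


class RecoverableLock:
    """Lock that is replaced if it cannot be acquired within `timeout` seconds."""

    def __init__(self, timeout: float = 3.0) -> None:
        self._timeout = timeout
        self._lock = threading.Lock()
        self.recreated = 0  # how many times the lock had to be replaced

    def acquire(self) -> None:
        if self._lock.acquire(timeout=self._timeout):
            return
        # Assume the previous holder is gone: discard the stuck lock,
        # create a fresh one, and take it.
        self._lock = threading.Lock()
        self._lock.acquire()
        self.recreated += 1

    def release(self) -> None:
        self._lock.release()
```

Replacing a lock is only safe when the previous holder has really gone away; if it is still running, two parties end up inside the critical section at once.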
Chris Freehill bb73c2607f Added rsmi_version_str_get() 2019-04-24 17:46:53 -05:00
Chris Freehill 84e3c541d1 Add "Disabled" state to ECC states 2019-04-04 15:27:15 -05:00
Chris Freehill 4e679b9324 Added ECC enabled, status and get functions 2019-04-03 11:17:43 -05:00
Chris Freehill c77f3c0ebd Added new clock types
Also added missing error code strings and improved test output
messages
2019-03-28 17:01:35 -05:00
Chris Freehill cbdfac7bdc Added new id and id name string look up functions
Also, updated docs with typo corrections and a new section
2019-03-15 16:21:37 -05:00
Chris Freehill d39752cee7 Add table for GPU name lookup to give a better name than just "amdgpu" 2019-03-10 17:56:06 -05:00
Chris Freehill ddd292f1b5 Add rsmi_dev_memory_total_get() and rsmi_dev_memory_usage_get() 2019-03-04 18:26:11 -06:00
Chris Freehill 89fb40fbe5 Re-organize function documentation into sections 2019-03-03 23:11:50 -06:00
Chris Freehill bc0d801478 Use "_t" suffix consistently for RSMI types 2019-03-02 16:30:30 -06:00
Chris Freehill fb5f41fc10 Added rsmi_dev_error_count_get() 2019-03-01 16:33:11 -06:00
Chris Freehill 18ce553dce Add rsmi_dev_pci_throughput_get() 2019-02-27 15:10:26 -06:00
Chris Freehill 021f13a68f Add VBIOS version get function
Also, consolidate "get version" type function tests into 1 test.
2019-02-24 11:01:18 -06:00
Chris Freehill 68b5e2ee0d Documentation and volt-curve read updates 2019-02-22 15:05:44 -06:00
Chris Freehill f3fa9a036c Add get_version test; remove sanity test
Also, don't fail pcie bandwidth test when the pp_dpm_pcie file
does not correctly show the current bandwidth.
2019-02-12 18:07:26 -06:00
Chris Freehill 17bf80dcb2 Break down monolithic test into many smaller tests
Also, added boot-up default power profile, and modified to
accommodate new profile format
2019-02-11 22:53:24 -06:00
Chris Freehill dd450e963c Add dont_fail option to not fail entire test on a single failure 2019-02-09 12:18:49 -06:00
Chris Freehill 4ab27528be Replace fan test failure with warning
In some cases, the fan sysfs files will exist even if the device
doesn't have a fan. In these cases, the tests will give apparently
random results.

Also, remove documentation and ifdef'd test of debugfs related
power functions.
2019-02-08 09:51:10 -06:00
Chris Freehill 08ec2a9804 Don't assert or fail tests that are unsupported by system 2019-01-24 16:07:25 -06:00
Chris Freehill b6ce6d30f4 Add support for reading frequency-volt curve data 2019-01-09 23:17:16 -06:00
Chris Freehill 639a4e3503 Add support for reading frequency-volt curve data 2019-01-07 08:44:23 -06:00
Chris Freehill 5e6424cab3 Handle case where PCIe information is not implemented in system
Also add a new error code for this.
2018-12-19 17:24:27 -06:00
Chris Freehill 5a9a729b31 Add rsmi_version_get() function
Also, modify CMakeLists.txt to use git tags to determine the
shared library version for the SONAME and the ROCm build for the
package name.
2018-12-06 13:48:59 -06:00
Chris Freehill 9c897ab86d Add get and set routines for PCIe bandwidth 2018-11-16 15:55:38 -06:00
Chris Freehill 861c2c2e33 Add rsmi_dev_busy_percent_get()
Also: correct some comments, ifdef out unused code
2018-11-12 17:25:14 -06:00
Chris Freehill 59a952666f Add rsmi_dev_pci_id_get() to return BDFID for given device
Also:
* add some exception handling;
* chop newline character off of device name returned from
rsmi_dev_id_get()
2018-11-05 11:22:12 -06:00
Chris Freehill 62ba2f578e Use sysfs file to get average power instead of debugfs 2018-10-29 17:59:24 -05:00
Chris Freehill 767fa53d8c Add support for new performance levels
Also added tests for new performance levels and clean up some
formatting/style issues.
2018-10-25 14:13:55 -05:00