Commit graph

196 commits

Author SHA1 Message Date
Chris Freehill b1c550f82d Add RSMI ref manual to packages
Also,
* remove extraneous test files
* fix Doxygen docs. issues
* fix whitespace issues

Change-Id: I9b58b0d68bd125a34f4fe0dc84d609c7b0b6e30e


[ROCm/amdsmi commit: 8ced9c986a]
2020-05-18 23:40:38 -04:00
Mike Li 84e3d0f0e9 Add functions that are used to query Hardware topology.
Change-Id: I0f4cd02b237bde4d6dccfb0e83e65376ecb1cfaa
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/amdsmi commit: c7d349183a]
2020-05-18 12:37:27 -04:00
Chris Freehill b816c4a68b Fix README example error
Change-Id: Ib0124642cea34dcbfae0ea3bbe8ffaf09116bede


[ROCm/amdsmi commit: f8d623cb44]
2020-05-15 12:09:05 -04:00
Chris Freehill 5297b1172d Don't use static variable for monitors
Change-Id: I24b5ccfa94b2d722b070a6c6385af9201d21d9c5


[ROCm/amdsmi commit: 02e4a9c14f]
2020-05-15 08:05:06 -04:00
Chris Freehill b635264533 Catch and handle regex exception
When pattern matching file names to determine API support, in
some environments std::regex will throw. This change is meant
to handle this more gracefully.

Change-Id: If1ccfe5bdd71ec4d08663c80692024488072e11b


[ROCm/amdsmi commit: 27148a02cb]
2020-05-14 15:39:40 -04:00
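The idea in the commit above (treating a throwing std::regex as "no match" instead of letting the exception escape) can be sketched roughly as follows. This is a minimal illustration, not the code from the commit; the helper name `FileNameMatches` and the example patterns are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from rocm_smi_lib): returns true when file_name
// fully matches pattern. If std::regex throws, for example on a malformed
// pattern or in an environment with an incomplete std::regex implementation,
// the exception is caught and reported as "no match".
bool FileNameMatches(const std::string& file_name, const std::string& pattern) {
  try {
    const std::regex re(pattern);
    return std::regex_match(file_name, re);
  } catch (const std::regex_error&) {
    // Degrade gracefully: the caller treats the file as unsupported.
    return false;
  }
}
```

A caller that maps file names to API support can then simply skip entries for which the helper returns false.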
Chris Freehill 1f3612fe7b Require gcc version 5.4.0 or greater
To avoid build and runtime issues, we should set a minimum
compiler version. std::regex, used by rocm_smi_lib, requires
4.9.0 or greater. However, the development and test
environments are (mainly) 5.4.0.

Change-Id: Ie18e9f905786ec8eb50d61a326cb45173a0ec355


[ROCm/amdsmi commit: b7ff71c001]
2020-05-14 15:15:56 -04:00
Chris Freehill 0fd8f4806c Merge "Add ref counting for rsmi init and shutdown" into amd-master
[ROCm/amdsmi commit: 44f14f4a86]
2020-05-13 09:24:32 -04:00
Pruthvi Madugundu 3f8cb30dba Merge "Adding lib symlink to top level rocm lib directory" into amd-master
[ROCm/amdsmi commit: 2143bc30a1]
2020-05-11 21:25:32 -04:00
Pruthvi Madugundu dbf7fb8b58 Adding lib symlink to top level rocm lib directory
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Change-Id: Id00e501de7c3cbc814d18493b97449a5fcb96fd6


[ROCm/amdsmi commit: 2f3535f2eb]
2020-05-11 15:35:12 -07:00
Chris Freehill 42aa39ac22 Add ref counting for rsmi init and shutdown
Also, clean lint from kfd_ioctl.h file.

Change-Id: I5a2ae127ab6ab6676a1b075ed10858d0ebfe13c1


[ROCm/amdsmi commit: 8e03d10035]
2020-05-11 15:57:42 -05:00
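Reference counting for init and shutdown usually means that only the first init call does real setup and only the last matching shutdown call tears things down. A minimal sketch of that pattern, assuming a mutex-protected counter, is shown below; the class name `InitRefCounter` and its methods are hypothetical and are not taken from the library.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical sketch of init/shutdown reference counting.
class InitRefCounter {
 public:
  // Returns true when the caller is the first user and should perform
  // the real initialization.
  bool Acquire() {
    std::lock_guard<std::mutex> lock(mu_);
    return count_++ == 0;
  }

  // Returns true when the caller is the last user and should perform
  // the real shutdown. An unbalanced call (count already 0) returns false.
  bool Release() {
    std::lock_guard<std::mutex> lock(mu_);
    if (count_ == 0) {
      return false;
    }
    return --count_ == 0;
  }

 private:
  std::mutex mu_;
  uint64_t count_ = 0;
};
```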
Chris Freehill dcacb71f68 Use user-mode version of kfd_ioctl.h file
Previously using kernel mode version.

Change-Id: I82bfff9c019a9059b4d0d198c6cf06dc515cc528


[ROCm/amdsmi commit: e1f0d7e85a]
2020-05-07 17:13:59 -05:00
Amber Lin 264b0ea526 Allow specifying rsmi-lib install path
Instead of hard-coding install path to /opt/rocm, allow users to specify
where "make install" goes to so users can install lib to their local build
path for testing purposes without touching global /opt/rocm files.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Change-Id: I4144988e325edae4d1d1a2824e031996091036d3


[ROCm/amdsmi commit: 741f9c31ff]
2020-05-06 18:08:22 -04:00
Chris Freehill b10500a770 Add event notification API
Change-Id: Ib6e8efbe6cdefaa7de1f74bd26993e9b4b011649


[ROCm/amdsmi commit: 2235ede34c]
2020-05-06 14:07:25 -05:00
Chris Freehill 0495eb2077 Handle rsmi app running on machine with no AMD gpus
This fixes a seg fault that would happen in release builds when
there are no KFD nodes on a system, which occurs when there are
no AMD gpus present in the system. This use case occurs
for higher-level application code that is meant to be gpu agnostic.

Change-Id: If374930bc2e62f9898f337349cde3ebb16091ff0


[ROCm/amdsmi commit: 806f665a85]
2020-04-28 00:35:16 -04:00
Chris Freehill 922aa1b5dc Add checking for no-longer-existing process in test
When getting process information for a process, it's possible
that between the time the process ID was discovered and when
we attempt to collect data for that process, the process has
ended. This change is meant to handle that in the test case.

* Also, fix compile warning by removing unused variable.

Change-Id: I62f9a84a63548c856f0661fef15b7d248a330c05


[ROCm/amdsmi commit: 1c9ef44398]
2020-04-10 08:51:44 -05:00
Chris Freehill d592203ff5 Add device mutual exclusion tests and related fixes
* Added a new test to verify mutual exclusion of access to device
  resources
* Added some missing acquiring of mutexes to some RSMI calls, as
  well as try-catch blocks.

Change-Id: I87aac009878a0b2d1f975e1d5b794d887bb23ff9


[ROCm/amdsmi commit: f8b57c3b16]
2020-04-08 15:05:11 -05:00
Chris Freehill 8ecf004060 Shared mutex fixes and improvements
* Don't make different shared memory mutexes for different users
* Don't delete (unlink) the shared mutex file if the mutex
  initialization fails. This may mess up other processes that
  are using it. Instead, print a message on how to resolve the
  situation, and then throw an error.

  Note, this situation comes up when debug builds (usually)
  either assert() or otherwise end execution without a proper
  clean up.
* Remove cpplint from shared_mutex code

Change-Id: I5f8ca6150cac5c2405fb97007516da345093f966


[ROCm/amdsmi commit: 52196caaee]
2020-04-06 17:08:33 -05:00
Mukul Joshi b70850aa66 Add rsmi_topo_get_numa_affinity()
Given a device index, return the corresponding NUMA node for the
device.
Also, add NUMA node tests to Sys Info Read test.

Change-Id: I0df4937470e6362e6737ccea568d4b3e5890c91a


[ROCm/amdsmi commit: fd79e5c161]
2020-04-01 11:38:08 -04:00
Chris Freehill 735b0aec7d Documentation update
Change-Id: I646cf3d2fd6064295937f7e727076532894d3514


[ROCm/amdsmi commit: 7abe6dc1b2]
2020-03-27 14:08:19 -05:00
Chris Freehill 1d8b235f4d More general solution to api support hwmon mapping
This solution takes into account that some hwmons use
label files to map sensor types. The previous solution
did not take this into account.

Change-Id: I1d6204573cefa8197b2cfe0ffb412b545df3d80a


[ROCm/amdsmi commit: 324c0ca0e5]
2020-03-16 11:37:47 -05:00
Chris Freehill b1fd4a6fb4 Fix indexing problem with api support function
Also fix potential issue with evaluating functionality of
functions with multiple sub-variants.

Change-Id: I9a09e52f3d3f3181e72578ed1f3bfd0d85516aa3


[ROCm/amdsmi commit: 1d8e16bff2]
2020-03-12 11:43:01 -05:00
Chris Freehill a2cf4e2ab4 Make rsmitst tests fail quickly if rsmi_init fails
Change-Id: I7b5d94b77305b30e08f33e1ddb6e2f089db0431f


[ROCm/amdsmi commit: d9ab846bee]
2020-03-11 12:13:28 -05:00
Chris Freehill 0bafc0b65e Don't assert or re-throw exception caught at top level
Instead, return error and let caller deal with it.

Change-Id: I1a55337134b00aa4259af27281b2450fc2252be9


[ROCm/amdsmi commit: d54a9484be]
2020-03-11 12:11:29 -05:00
Chris Freehill 07c74cbe38 Correct rsmitst build instructions
Change-Id: Ia7dbdd7a489d235c6003badb79f2d0808e18143b


[ROCm/amdsmi commit: a482394263]
2020-03-02 16:29:10 -05:00
Chris Freehill 646000ca93 Fix segmentation fault that sometimes occurs on release builds
Fixes SWDEV-216441

Change-Id: I3ea01a4edd14000a103de751757dfaadc7d358bb


[ROCm/amdsmi commit: 0bf81ed2f9]
2020-02-24 17:17:26 -06:00
Chris Freehill abd01b37eb Add rsmi_compute_process_gpus_get()
Given a process ID, give the device indices that process is
currently using.

Also:
* made corrections to how RSMI, amdgpu (i.e., "card#") and
  KFD indices translate from one another
* add a few missing error codes to rsmi_status_string()
* fix some formatting

Change-Id: Icd2cae66bb4fec768da96af7cf9cf8b8b66ec7f9


[ROCm/amdsmi commit: 2d6e15190c]
2020-02-22 10:47:58 -06:00
Chris Freehill 40dbb98b41 Merge "Ensure string is non-empty before calling stoul or stoi" into amd-master
[ROCm/amdsmi commit: 842bd29568]
2020-01-30 20:16:56 -05:00
Srinivasan Subramanian 36697328c2 Changes for multiple ROCm installation
1. Support multiple rocm installations
2. Support shared library versioning.

Change-Id: Id5c25b90abed084e8fe8cb7c374c2d4384653bbf


[ROCm/amdsmi commit: 29d55e001a]
2020-01-30 11:08:57 -08:00
Chris Freehill f97ab32b5f Ensure string is non-empty before calling stoul or stoi
Change-Id: I2c6314fb86d3bba8fd6aab932dbb989263fa8542


[ROCm/amdsmi commit: f748868818]
2020-01-28 17:05:14 -06:00
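The check matters because std::stoul and std::stoi throw std::invalid_argument when given an empty string. A minimal sketch of a guarded parse is shown below; the helper name `ParseUnsigned` is hypothetical and the commit's actual code may be structured differently.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: parses an unsigned value from text into *out.
// Returns false for empty input, non-numeric input, or out-of-range values
// instead of letting std::stoul throw.
bool ParseUnsigned(const std::string& text, unsigned long* out) {
  if (text.empty() || out == nullptr) {
    return false;
  }
  try {
    *out = std::stoul(text);  // trailing characters such as '\n' are ignored
    return true;
  } catch (const std::invalid_argument&) {
    return false;
  } catch (const std::out_of_range&) {
    return false;
  }
}
```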
Chris Freehill 3aef34b9b1 Security improvements
Improvements include
* adding additional build flags that warn about stack-smashing
and type conversion errors
* run-time checks for valid function input values and adequate
space for the result of arithmetic operations.
* make sure default case for switch statements does something
besides just assert
* disable using env. var. debugging in release mode

Change-Id: I5f048310c5c56e05d9ec31bcc273404d6a0dd646


[ROCm/amdsmi commit: d00b9ac07d]
2020-01-16 14:56:27 -06:00
Chris Freehill 4c34034ec9 Use default value for version when git tags not present
Also, documentation typo correction.

Change-Id: I7fe4de05d3b8fb808a980862a09a9be32ed32bf5


[ROCm/amdsmi commit: fe4f7ed4a1]
2019-12-19 08:32:38 -06:00
Chris Freehill 23c153c198 Merge "Make dpkg and rpm package names match their file names" into amd-master
[ROCm/amdsmi commit: 8ffe1bc7f6]
2019-11-09 14:27:17 -05:00
Chris Freehill a9a37cb34d Make dpkg and rpm package names match their file names
For example,
$ dpkg -i rocm-smi-lib64-2.0.0.1.local-build-0-d10a391.deb 

will yield:
 ...
 Package: rocm-smi-lib64
 Version: 2.0.0.1.local-build-0-d10a391
 ...

Change-Id: I1e56e0c623b9421261cf0864958e821d10226d39


[ROCm/amdsmi commit: c926d50c3a]
2019-11-08 15:09:16 -04:00
Chris Freehill 35db3bb882 Disable TestFrequenciesReadWrite for arcturus
Change-Id: Ia20ec853cdba34ff3dcdc68b4f869890bf58b539


[ROCm/amdsmi commit: 1004a01094]
2019-11-07 16:22:45 -05:00
Chris Freehill 012bbcfc54 Merge "Docs., error checking and test improvements" into amd-master
[ROCm/amdsmi commit: 4ebb436893]
2019-11-06 20:15:26 -05:00
Chris Freehill d4e8ab37d5 Use "-" instead of "_" for package name
This is part of the fix for SWDEV-208805. The other part will
be in the build_* script.

Change-Id: I36397e3f918d08170db8bb228722a2b7389af83b


[ROCm/amdsmi commit: 0e5c44de2a]
2019-11-06 11:31:50 -05:00
Chris Freehill 106c87ad0e Docs., error checking and test improvements
* Update doc. on api-support function
* Check for valid integer value when reading a monitor int. val.
* If fan-write test attempts to set speed higher than max.
   possible, then skip the test

Change-Id: I01ad0ab1f4caffdb0d2c26e9575f278c35a6b017


[ROCm/amdsmi commit: 52dfa4bcca]
2019-11-06 11:19:47 -05:00
Chris Freehill c7070324f3 Support rsmitst blacklisting by adding an exclude file
Change-Id: I9d581b8e24363a688b58a6ca59a6521c7be364d7


[ROCm/amdsmi commit: 3a26a7270c]
2019-10-17 13:47:02 -05:00
Chris Freehill 5c62352033 Correct README Markdown formatting
Change-Id: Id63618fc7fa7fa7cdc68bcd451cbe89ef2c04469


[ROCm/amdsmi commit: ee13e85265]
2019-10-17 08:38:50 -05:00
Chris Freehill cae8ad4321 Support checking for specific device-getter api support
For device-getter functions, allow users to specify a nullptr
for the provided buffer. In those cases, the function will return
RSMI_STATUS_NOT_SUPPORTED if the hardware or system software does
not support the function. If the function is supported, then
RSMI_STATUS_INVALID_ARGS will be returned, unless a different
error is encountered.

Additionally, tests and documentation were updated to reflect
this change.

Change-Id: Ie7db3a4c8c66af97ebd7ee1e3b95cd331ace9d9c


[ROCm/amdsmi commit: 68d25e82fd]
2019-10-05 15:55:18 -05:00
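The nullptr convention described above can be illustrated with a small mock. Everything here is a stand-in: the `Status` enum, `GetFanSpeed`, and the `supported` flag are hypothetical and only mimic the described return-code behavior; they are not the library's API.

```cpp
#include <cstdint>

// Hypothetical status codes mirroring the ones named in the commit message.
enum class Status { kSuccess, kInvalidArgs, kNotSupported };

// Hypothetical device getter. `supported` stands in for whatever the real
// library checks (hardware and system software support).
Status GetFanSpeed(bool supported, int64_t* speed) {
  if (!supported) {
    return Status::kNotSupported;  // takes priority over the nullptr check
  }
  if (speed == nullptr) {
    return Status::kInvalidArgs;   // supported, but no buffer was provided
  }
  *speed = 1200;  // placeholder value for illustration only
  return Status::kSuccess;
}

// Probe for support by passing nullptr: kInvalidArgs means "supported".
bool IsGetterSupported(bool supported) {
  return GetFanSpeed(supported, nullptr) == Status::kInvalidArgs;
}
```

With this convention a caller can probe for support up front without allocating a result buffer.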
Ori Messinger 15844991b7 Display GPU vram vendor
Add support and testing for reading the vram vendor associated with
the GPU. The vram vendor can be found as a separate sysfs file at:
/sys/class/drm/card[X]/device/mem_info_vram_vendor
The vram vendor is displayed as a string value.

Change-Id: I12c8e56e57f45aa08d7d6c25338c4e468ed1c7fc
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: 2412dff6a2]
2019-10-04 11:51:30 -04:00
Chris Freehill 948b82ed1a Add functions that tell what capabilities are supported
The new functions added in this commit allow a caller to tell up
front what functions, function variants and monitors are
supported.

Also,
* fixed a few documentation/formatting issues
* fixed a process_info test issue

Change-Id: I2184ab1a4a6898f847e791f273e2185d556e78e9


[ROCm/amdsmi commit: 551b15182b]
2019-09-23 13:30:47 -05:00
Chris Freehill b51bf32bd4 Make bdfid use 32 bit domain if possible
If the 32-bit domain is found in the kfd node properties for
a device, then it will be used when constructing the bdfid.
If it's not present, it will continue to use the 16 bit version.

Also, whether or not 32b or 16b are used for the domain, the
domain will now be placed in the upper 32b of the 64b bdfid.

* Fixed some unrelated doxygen issues

Change-Id: Icb5116daa1ab45ee305bdbe6cd5df5736dd3ffa3


[ROCm/amdsmi commit: 469af303d6]
2019-08-27 11:05:58 -04:00
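As a rough sketch of the layout described above, the domain occupies the upper 32 bits of the 64-bit bdfid. The packing of bus, device and function in the lower bits is an assumption here (the conventional PCI layout of 8 bits bus, 5 bits device, 3 bits function), and the function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: packs a PCI location into a 64-bit bdfid with the
// domain in the upper 32 bits. The lower-bit layout (bus << 8, device << 3,
// function) is assumed, not taken from the commit message.
uint64_t MakeBdfid(uint32_t domain, uint8_t bus, uint8_t device,
                   uint8_t function) {
  return (static_cast<uint64_t>(domain) << 32) |
         (static_cast<uint64_t>(bus) << 8) |
         (static_cast<uint64_t>(device & 0x1f) << 3) |
         static_cast<uint64_t>(function & 0x7);
}

// Extracts the domain, which works the same whether the source value was
// 16 or 32 bits wide.
uint32_t DomainOf(uint64_t bdfid) {
  return static_cast<uint32_t>(bdfid >> 32);
}
```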
Chris Freehill d8ab1b477a Fix issues with buffer length when getting brand name
* Specifically, address case when brand name is longer than buffer
provided

* Also, slightly modify prototype to match similar, existing APIs.

* Address some cpplint issues.

Change-Id: Iaf77304e23085123e88f301e4b33bc4e6be2a225


[ROCm/amdsmi commit: 01e0800741]
2019-08-26 07:21:02 -04:00
Ori Messinger 15c604930e Display GPU brand name
Add support and testing for reading the brand name associated with
a specific GPU (such as mi25, mi50, mi60, etc). The brand name is
associated with the SKU of the GPU, and some brand names can be
mapped from multiple different SKUs.

Change-Id: I36eb95ca8e72efdd294ccd684841195925dfe820
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/amdsmi commit: 7f2d970a80]
2019-08-22 12:24:29 -04:00
Chris Freehill 9e945d3e1a Fix building lib and test in non-automated (CI) env.
Also, use abbreviated ROCM_BUILD_ID environment variable for job
and build number, if it's available.

Change-Id: Ib5a721f5920f1008bb6382935f7b439429389de0


[ROCm/amdsmi commit: aa2db48237]
2019-08-14 23:18:15 -05:00
Chris Freehill 25010162d5 Add build and job numbers to package version
Change-Id: I06baf23e09b3a63a24d0046046f7f22281e0ec93


[ROCm/amdsmi commit: dffa533e13]
2019-08-14 09:48:59 -05:00
Chris Freehill 3bdbb5f9bf Conform versioning to uniform version standards
Library version will now only have major and minor. Package
version will now include number of commits since previous
package. Both SO and package versions rely on git tags to
determine the current build and the commits since the last
release.

Change-Id: If2bda74bf342930a9e07f5c91cb1380b6b7c64ca


[ROCm/amdsmi commit: fe738eaedb]
2019-08-12 08:59:09 -05:00
Chris Freehill 6d6689b02c Adjust how we read ECC block counter status
This change corresponds to kernel changes.

Change-Id: Ibd977e8b3338349036cb16e55fb0b2c9c187726d


[ROCm/amdsmi commit: aaecfd6fff]
2019-08-09 16:06:43 -05:00
Kent Russell fe56aa131f Fix RAS change
RAS formatting changed, so get it to handle both types of sysfs output
until it's normalized
Change-Id: I56f2a2495af8ff4d01011bc614283376afb9ad0a


[ROCm/amdsmi commit: a34832f11e]
2019-08-08 12:09:18 -04:00