Commit graph

214 commits

Author SHA1 Message Date
Chris Freehill 001aa0b825 Fix docs + cmake_utils path issues
This corrects issues that arose after OAM reorganization.
It should address SWDEV-243294.

Also, fix some compile warnings that show up on RHEL.

Change-Id: Id14d444905da35cd7346bcfbcd82b6d0572708c4


[ROCm/rocm_smi_lib commit: c2ef9a6879]
2020-07-08 09:47:25 -05:00
Chris Freehill 77083980a8 Quiet spurious pthread_unlock warnings
A message is output in debug builds when pthread_unlock
returns an error. However, in most cases, it should return
EPERM. In fact, if it doesn't return EPERM, it is an
indication of a problem. This commit adjusts accordingly.

Change-Id: Ia5cad89aa6e68e79c1291ea21adffb0fa68f2300


[ROCm/rocm_smi_lib commit: 866438966d]
2020-06-30 15:12:58 -05:00
Divya Shikre 33f4141218 OAM: Implement get_sensors_info()
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia2c6e18f463c0f97530ca8ad07d249e6f2116534


[ROCm/rocm_smi_lib commit: e21232f059]
2020-06-29 14:50:19 -04:00
Amber Lin 9f72c8c08f OAM: Add get dev and pci properties and sensor count
Also, add amdoam_get_error_description.

On behalf of
Amber Lin <Amber.Lin@amd.com> and
Divya Shikre <DivyaUday.Shikre@amd.com>

Change-Id: I1f5ac0c5948adb2c30008e95c501e8b69b8183b6


[ROCm/rocm_smi_lib commit: 27deaea6e8]
2020-06-23 17:21:07 -05:00
Chris Freehill 98b976ef3e Refactor rsmi to support oam
Change-Id: Idc524e01ba06eb5c8d1682becaf5bf8ced5bffcf


[ROCm/rocm_smi_lib commit: 6594f8f58b]
2020-06-22 18:51:46 -05:00
Chris Freehill 4c94842508 Ensure no device mutexes are left held on shut_down
Also, fix TestMutualExclusion and TestEvtNofifReadWrite.
Previously, some of the normal SetUp function was not
being done for this test. In some cases, no DRM
devices are being found on the test machine. Skip
those.

Change-Id: Iaa5a257841eb459aa57491ae9680c34a60d5ac2b


[ROCm/rocm_smi_lib commit: 59394f3354]
2020-06-19 13:59:20 -05:00
Mike Li b3bb190b8d Add support to retrieve XGMI hive id
Change-Id: I1eee05dd85ecb856889d1cfe0565454d2f538856
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/rocm_smi_lib commit: 488bbb668a]
2020-06-19 07:35:23 -07:00
Chris Freehill 4fd855fe72 Fix line endings for init_shutdown_refcount.*
* Also, add an assert that checks for proper usage of
rand_sleep_mod().

Change-Id: Ieb4179e1ad12fbbf85c2e4f7c7f119b0bb30b197


[ROCm/rocm_smi_lib commit: 9e0ebb250c]
2020-06-17 21:26:12 -05:00
Chris Freehill b1787b8968 Make verbosity level 0 completely quiet
Also, support --iterations flag for certain functions that will
likely be repeated frequently.

Change-Id: I7ed76835001b5cbca30042d6bf26484258c7b9a6


[ROCm/rocm_smi_lib commit: efc9b7658c]
2020-06-17 21:26:12 -05:00
Divya Shikre 82825b40d3 Adding current voltage feature & gtest.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ic555a3af265e603419e2875d1989a366abc82596


[ROCm/rocm_smi_lib commit: 2805ed16a4]
2020-06-16 11:48:56 -04:00
Chris Freehill a17db4e98c Don't automatically overwrite manual .pdf file
Automatically updating the manual pdf file causes a local
git change. This messes up "repo sync" calls because of the
local change. Instead, just write an un-tracked file that can
be used to update the tracked version of the manual .pdf.

Change-Id: Icd7edc244df60728ec169c5aa1cf8b322ca4143b


[ROCm/rocm_smi_lib commit: 8e6f7c798d]
2020-06-12 14:19:15 -05:00
Chris Freehill b07fd8fca7 Update XGMI perf counter test to show utilization
Also:
* When destroying a counter, make sure to stop the counter first
* In the test, do not stop (disable) the counter before
  reading it.
* Clean up some whitespace in other tests
* Re-add manual pdf file

Change-Id: I0786ef3a994ca568299c77e44f092af8943ac33d


[ROCm/rocm_smi_lib commit: f946ea37ef]
2020-06-10 12:49:49 -04:00
Kent Russell ef34c94574 Make an empty unique_id file non-fatal
This isn't supported on all models, so just comment out on failure
instead of fully failing

Change-Id: Id36d5df7c87abbb41f7b6be43abfea82004703a6


[ROCm/rocm_smi_lib commit: 8cf44548c0]
2020-06-04 10:31:53 -04:00
Mukul Joshi 4067baeff3 Print VRAM usage in rsmitst
Print VRAM usage information in TestProcInfoRead.
Also, fix output formatting when running TestProcInfoRead.

Change-Id: I9efed808458ef4645145610f6f564f0f2baadea2


[ROCm/rocm_smi_lib commit: 633c852f5d]
2020-05-29 15:48:06 -04:00
Chris Freehill 430e52c394 Check only the minor ioctl version for event support
Change-Id: I70ddab4298a62178b2509a0365ee4cd6937302c1


[ROCm/rocm_smi_lib commit: 42c10633b6]
2020-05-27 09:27:01 -04:00
Mukul Joshi 506f71a01c Add support to retrieve process VRAM usage information.
Change-Id: I60843a99207a658022a26aa346b79f91863833cf


[ROCm/rocm_smi_lib commit: e30ebbc787]
2020-05-26 15:19:24 -04:00
Chris Freehill 13f3e6afb2 Update README doc. build instructions
* Also, remove dependency of manual pdf on the README
file; they are independent of each other.

Change-Id: I1ab8c8c9adf6b78e5b4aab86ecdf4c46f3a6bf63


[ROCm/rocm_smi_lib commit: bdf22c1c9e]
2020-05-21 09:10:08 -04:00
Chris Freehill 11b90a242a Return an error instead of assert when reading bad data
Assert doesn't help with release builds.

Change-Id: Ib076791fd442e96c7544914cdf08774fc7a40a94


[ROCm/rocm_smi_lib commit: 754a993d32]
2020-05-19 15:58:40 -04:00
Chris Freehill db0ed00070 Add RSMI ref manual to packages
Also,
* remove extraneous test files
* fix Doxygen docs. issues
* fix whitespace issues

Change-Id: I9b58b0d68bd125a34f4fe0dc84d609c7b0b6e30e


[ROCm/rocm_smi_lib commit: 8ced9c986a]
2020-05-18 23:40:38 -04:00
Mike Li f7885be06e Add functions that are used to query Hardware topology.
Change-Id: I0f4cd02b237bde4d6dccfb0e83e65376ecb1cfaa
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/rocm_smi_lib commit: c7d349183a]
2020-05-18 12:37:27 -04:00
Chris Freehill 24090f313a Fix README example error
Change-Id: Ib0124642cea34dcbfae0ea3bbe8ffaf09116bede


[ROCm/rocm_smi_lib commit: f8d623cb44]
2020-05-15 12:09:05 -04:00
Chris Freehill d4dda0017c Don't use static variable for monitors
Change-Id: I24b5ccfa94b2d722b070a6c6385af9201d21d9c5


[ROCm/rocm_smi_lib commit: 02e4a9c14f]
2020-05-15 08:05:06 -04:00
Chris Freehill 2862d311da Catch and handle regex exception
When pattern matching file names to determine API support, in
some environments std::regex will throw. This change is meant
to handle this more gracefully.

Change-Id: If1ccfe5bdd71ec4d08663c80692024488072e11b


[ROCm/rocm_smi_lib commit: 27148a02cb]
2020-05-14 15:39:40 -04:00
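
The regex handling described in commit 2862d311da could look roughly like the sketch below. It is an illustration only and is not taken from rocm_smi_lib: the helper name `MatchFileName` and its three-way return convention are assumptions made for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of rocm_smi_lib.
// Returns 1 if `name` matches `pattern`, 0 if it does not, and -1 if the
// regex library threw while compiling or matching the pattern.
int MatchFileName(const std::string &name, const std::string &pattern) {
  try {
    const std::regex re(pattern);  // may throw std::regex_error
    return std::regex_match(name, re) ? 1 : 0;
  } catch (const std::regex_error &) {
    // Report the failure to the caller instead of letting it propagate.
    return -1;
  }
}
```

Returning a distinct value for the exception case lets a caller treat "could not evaluate the pattern" differently from "file name does not match".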
Chris Freehill affa804d13 Require gcc version 5.4.0 or greater
To avoid build and runtime issues, we should set a minimum
compiler version. std::regex, used by rocm_smi_lib, requires
4.9.0 or greater. However, the development and test
environments are (mainly) 5.4.0.

Change-Id: Ie18e9f905786ec8eb50d61a326cb45173a0ec355


[ROCm/rocm_smi_lib commit: b7ff71c001]
2020-05-14 15:15:56 -04:00
Chris Freehill f2779ee3db Merge "Add ref counting for rsmi init and shutdown" into amd-master
[ROCm/rocm_smi_lib commit: 44f14f4a86]
2020-05-13 09:24:32 -04:00
Pruthvi Madugundu b172889329 Merge "Adding lib symlink to top level rocm lib directory" into amd-master
[ROCm/rocm_smi_lib commit: 2143bc30a1]
2020-05-11 21:25:32 -04:00
Pruthvi Madugundu 4a4daf2a94 Adding lib symlink to top level rocm lib directory
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Change-Id: Id00e501de7c3cbc814d18493b97449a5fcb96fd6


[ROCm/rocm_smi_lib commit: 2f3535f2eb]
2020-05-11 15:35:12 -07:00
Chris Freehill 0ab5e76b33 Add ref counting for rsmi init and shutdown
Also, clean lint from kfd_ioctl.h file.

Change-Id: I5a2ae127ab6ab6676a1b075ed10858d0ebfe13c1


[ROCm/rocm_smi_lib commit: 8e03d10035]
2020-05-11 15:57:42 -05:00
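
As a rough sketch of what reference counting for init and shutdown (commit 0ab5e76b33) can mean in general: only the first init call does the real work, and only the last matching shutdown call tears things down. The class `InitRefCount` and its method names are hypothetical and are not the rocm_smi_lib implementation.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical illustration, not the rocm_smi_lib code.
class InitRefCount {
 public:
  // Returns true when the caller is the first user and should perform
  // the real initialization.
  bool Acquire() {
    std::lock_guard<std::mutex> guard(lock_);
    return count_++ == 0;
  }

  // Returns true when the caller is the last user and should perform
  // the real shutdown. Returns false if other users remain, or if the
  // call is unbalanced (no matching Acquire()).
  bool Release() {
    std::lock_guard<std::mutex> guard(lock_);
    if (count_ == 0) {
      return false;
    }
    return --count_ == 0;
  }

 private:
  std::mutex lock_;
  uint32_t count_ = 0;
};
```

An unbalanced `Release()` is reported as `false` here rather than asserted on, so release builds also handle it.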
Chris Freehill ab2a22c90c Use user-mode version of kfd_ioctl.h file
Previously using kernel mode version.

Change-Id: I82bfff9c019a9059b4d0d198c6cf06dc515cc528


[ROCm/rocm_smi_lib commit: e1f0d7e85a]
2020-05-07 17:13:59 -05:00
Amber Lin 894ca737bc Allow specifying rsmi-lib install path
Instead of hard-coding install path to /opt/rocm, allow users to specify
where "make install" goes to so users can install lib to their local build
path for testing purpose without touching global /opt/rocm files.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Change-Id: I4144988e325edae4d1d1a2824e031996091036d3


[ROCm/rocm_smi_lib commit: 741f9c31ff]
2020-05-06 18:08:22 -04:00
Chris Freehill 1959ccb265 Add event notification API
Change-Id: Ib6e8efbe6cdefaa7de1f74bd26993e9b4b011649


[ROCm/rocm_smi_lib commit: 2235ede34c]
2020-05-06 14:07:25 -05:00
Chris Freehill fee5adb228 Handle rsmi app running on machine with no AMD gpus
This fixes a seg fault that would happen in release builds when
there are no KFD nodes on a system, which occurs when there are
no AMD gpus present in the system. This use case occurs
for higher application code that is meant to be gpu agnostic.

Change-Id: If374930bc2e62f9898f337349cde3ebb16091ff0


[ROCm/rocm_smi_lib commit: 806f665a85]
2020-04-28 00:35:16 -04:00
Chris Freehill 0759abca07 Add checking for no-longer-existing process in test
When getting process information for a process, it's possible
that the process ended between the time the process ID was
discovered and the time we attempt to collect data for it.
This change is meant to handle that in the test case.

* Also, fix compile warning by removing unused variable.

Change-Id: I62f9a84a63548c856f0661fef15b7d248a330c05


[ROCm/rocm_smi_lib commit: 1c9ef44398]
2020-04-10 08:51:44 -05:00
Chris Freehill 01401b0caa Add device mutual exclusion tests and related fixes
* Added a new test to verify mutual exclusion of access to device
  resources
* Added some missing acquiring of mutexes to some RSMI calls, as
  well as try-catch blocks.

Change-Id: I87aac009878a0b2d1f975e1d5b794d887bb23ff9


[ROCm/rocm_smi_lib commit: f8b57c3b16]
2020-04-08 15:05:11 -05:00
Chris Freehill 49b562b209 Shared mutex fixes and improvements
* Don't make different shared memory mutexes for different users
* Don't delete (unlink) the shared mutex file if the mutex
  initialization fails. This may mess up other processes that
  are using it. Instead, print a message on how to resolve the
  situation, and then throw an error.

  Note, this situation comes up when debug builds (usually)
  either assert() or otherwise end execution without a proper
  clean up.
* Remove cpplint from shared_mutex code

Change-Id: I5f8ca6150cac5c2405fb97007516da345093f966


[ROCm/rocm_smi_lib commit: 52196caaee]
2020-04-06 17:08:33 -05:00
Mukul Joshi 7137023637 Add rsmi_topo_get_numa_affinity()
Given a device index, return the corresponding NUMA node for the
device.
Also, add NUMA node tests to Sys Info Read test.

Change-Id: I0df4937470e6362e6737ccea568d4b3e5890c91a


[ROCm/rocm_smi_lib commit: fd79e5c161]
2020-04-01 11:38:08 -04:00
Chris Freehill 024e27229c Documentation update
Change-Id: I646cf3d2fd6064295937f7e727076532894d3514


[ROCm/rocm_smi_lib commit: 7abe6dc1b2]
2020-03-27 14:08:19 -05:00
Chris Freehill 17871ecb14 More general solution to api support hwmon mapping
This solution takes into account that some hwmons use
label files to map sensor types. The previous solution
did not take this into account.

Change-Id: I1d6204573cefa8197b2cfe0ffb412b545df3d80a


[ROCm/rocm_smi_lib commit: 324c0ca0e5]
2020-03-16 11:37:47 -05:00
Chris Freehill 4e2d769dcc Fix indexing problem with api support function
Also fix potential issue with evaluating functionality of
functions with multiple sub-variants.

Change-Id: I9a09e52f3d3f3181e72578ed1f3bfd0d85516aa3


[ROCm/rocm_smi_lib commit: 1d8e16bff2]
2020-03-12 11:43:01 -05:00
Chris Freehill 06149e94bb Make rsmitst tests fail quickly if rsmi_init fails
Change-Id: I7b5d94b77305b30e08f33e1ddb6e2f089db0431f


[ROCm/rocm_smi_lib commit: d9ab846bee]
2020-03-11 12:13:28 -05:00
Chris Freehill a7ca81d161 Don't assert or re-throw exception caught at top level
Instead, return error and let caller deal with it.

Change-Id: I1a55337134b00aa4259af27281b2450fc2252be9


[ROCm/rocm_smi_lib commit: d54a9484be]
2020-03-11 12:11:29 -05:00
Chris Freehill 6ba4f32620 Correct rsmitst build instructions
Change-Id: Ia7dbdd7a489d235c6003badb79f2d0808e18143b


[ROCm/rocm_smi_lib commit: a482394263]
2020-03-02 16:29:10 -05:00
Chris Freehill e4d918aa70 Fix segmentation fault that sometimes occurs on release builds
Fixes SWDEV-216441

Change-Id: I3ea01a4edd14000a103de751757dfaadc7d358bb


[ROCm/rocm_smi_lib commit: 0bf81ed2f9]
2020-02-24 17:17:26 -06:00
Chris Freehill 95d3da04b9 Add rsmi_compute_process_gpus_get()
Given a process ID, give the device indices that process is
currently using.

Also:
* made corrections to how RSMI, amdgpu (i.e., "card#") and
  KFD indices translate to one another
* add a few missing error codes to rsmi_status_string()
* fix some formatting

Change-Id: Icd2cae66bb4fec768da96af7cf9cf8b8b66ec7f9


[ROCm/rocm_smi_lib commit: 2d6e15190c]
2020-02-22 10:47:58 -06:00
Chris Freehill 386bab024e Merge "Ensure string is non-empty before calling stoul or stoi" into amd-master
[ROCm/rocm_smi_lib commit: 842bd29568]
2020-01-30 20:16:56 -05:00
Srinivasan Subramanian 05db31fdc7 Changes for multiple ROCm installation
1. Support multiple rocm installations
2. Support shared library versioning.

Change-Id: Id5c25b90abed084e8fe8cb7c374c2d4384653bbf


[ROCm/rocm_smi_lib commit: 29d55e001a]
2020-01-30 11:08:57 -08:00
Chris Freehill 61db4c7e15 Ensure string is non-empty before calling stoul or stoi
Change-Id: I2c6314fb86d3bba8fd6aab932dbb989263fa8542


[ROCm/rocm_smi_lib commit: f748868818]
2020-01-28 17:05:14 -06:00
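
The guard named in commit 61db4c7e15 matters because `std::stoul` and `std::stoi` throw `std::invalid_argument` when given an empty string. A minimal sketch of such a check follows; the helper name `ParseUnsigned` and its bool-plus-out-parameter shape are assumptions for illustration, not the library's code.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of rocm_smi_lib.
// Parses `text` as an unsigned long. Returns false, leaving *out
// untouched, if the string is empty, not numeric, or out of range.
bool ParseUnsigned(const std::string &text, unsigned long *out) {
  if (text.empty() || out == nullptr) {
    return false;  // std::stoul("") would throw std::invalid_argument
  }
  try {
    *out = std::stoul(text);
  } catch (const std::invalid_argument &) {
    return false;  // no leading digits to convert
  } catch (const std::out_of_range &) {
    return false;  // value does not fit in unsigned long
  }
  return true;
}
```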
Chris Freehill 078c298e7b Security improvements
Improvements include
* adding additional build flags that warn about stack-smashing
and type conversion errors
* run-time checks for valid function input values and adequate
space for the result of arithmetic operations.
* make sure the default case for switch statements does something
besides just assert
* disable using env. var. debugging in release mode

Change-Id: I5f048310c5c56e05d9ec31bcc273404d6a0dd646


[ROCm/rocm_smi_lib commit: d00b9ac07d]
2020-01-16 14:56:27 -06:00
Chris Freehill 322d1ff303 Use default value for version when git tags not present
Also, documentation typo correction.

Change-Id: I7fe4de05d3b8fb808a980862a09a9be32ed32bf5


[ROCm/rocm_smi_lib commit: fe4f7ed4a1]
2019-12-19 08:32:38 -06:00
Chris Freehill ddbe8013fe Merge "Make dpkg and rpm package names match their file names" into amd-master
[ROCm/rocm_smi_lib commit: 8ffe1bc7f6]
2019-11-09 14:27:17 -05:00