Commit Graph

1558 Commits

Author SHA1 Message Date
Mike Li 488bbb668a Add support to retrieve XGMI hive id
Change-Id: I1eee05dd85ecb856889d1cfe0565454d2f538856
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2020-06-19 07:35:23 -07:00
Chris Freehill 9e0ebb250c Fix line endings for init_shutdown_refcount.*
* Also, add an assert that checks for proper usage of
rand_sleep_mod().

Change-Id: Ieb4179e1ad12fbbf85c2e4f7c7f119b0bb30b197
2020-06-17 21:26:12 -05:00
Chris Freehill efc9b7658c Make verbosity level 0 completely quiet
Also, support --iterations flag for certain functions that will
likely be repeated frequently.

Change-Id: I7ed76835001b5cbca30042d6bf26484258c7b9a6
2020-06-17 21:26:12 -05:00
Divya Shikre 2805ed16a4 Adding current voltage feature & gtest.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ic555a3af265e603419e2875d1989a366abc82596
2020-06-16 11:48:56 -04:00
Chris Freehill 8e6f7c798d Don't automatically overwrite manual .pdf file
Automatically updating the manual pdf file causes a local
git change. This messes up "repo sync" calls because of the
local change. Instead, just write an un-tracked file that can
be used to update the tracked version of the manual .pdf.

Change-Id: Icd7edc244df60728ec169c5aa1cf8b322ca4143b
2020-06-12 14:19:15 -05:00
Chris Freehill f946ea37ef Update XGMI perf counter test to show utilization
Also:
* When destroying a counter, make sure to stop the counter first
* In the test, do not stop (disable) the counter before
  reading it.
* Clean up some whitespace in other tests
* Re-add manual pdf file

Change-Id: I0786ef3a994ca568299c77e44f092af8943ac33d
2020-06-10 12:49:49 -04:00
Kent Russell 8cf44548c0 Make an empty unique_id file non-fatal
This isn't supported on all models, so just comment out on failure
instead of fully failing

Change-Id: Id36d5df7c87abbb41f7b6be43abfea82004703a6
2020-06-04 10:31:53 -04:00
Mukul Joshi 633c852f5d Print VRAM usage in rsmitst
Print VRAM usage information in TestProcInfoRead.
Also, fix output formatting when running TestProcInfoRead.

Change-Id: I9efed808458ef4645145610f6f564f0f2baadea2
2020-05-29 15:48:06 -04:00
Chris Freehill 42c10633b6 Check only the minor ioctl version for event support
Change-Id: I70ddab4298a62178b2509a0365ee4cd6937302c1
2020-05-27 09:27:01 -04:00
Mukul Joshi e30ebbc787 Add support to retrieve process VRAM usage information.
Change-Id: I60843a99207a658022a26aa346b79f91863833cf
2020-05-26 15:19:24 -04:00
Chris Freehill bdf22c1c9e Update README doc. build instructions
* Also, remove dependency of manual pdf on the README
file; they are independent of each other.

Change-Id: I1ab8c8c9adf6b78e5b4aab86ecdf4c46f3a6bf63
2020-05-21 09:10:08 -04:00
Chris Freehill 754a993d32 Return an error instead of assert when reading bad data
Assert doesn't help with release builds.

Change-Id: Ib076791fd442e96c7544914cdf08774fc7a40a94
2020-05-19 15:58:40 -04:00
Chris Freehill 8ced9c986a Add RSMI ref manual to packages
Also,
* remove extraneous test files
* fix Doxygen docs. issues
* fix whitespace issues

Change-Id: I9b58b0d68bd125a34f4fe0dc84d609c7b0b6e30e
2020-05-18 23:40:38 -04:00
Mike Li c7d349183a Add functions that are used to query Hardware topology.
Change-Id: I0f4cd02b237bde4d6dccfb0e83e65376ecb1cfaa
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2020-05-18 12:37:27 -04:00
Chris Freehill f8d623cb44 Fix README example error
Change-Id: Ib0124642cea34dcbfae0ea3bbe8ffaf09116bede
2020-05-15 12:09:05 -04:00
Chris Freehill 02e4a9c14f Don't use static variable for monitors
Change-Id: I24b5ccfa94b2d722b070a6c6385af9201d21d9c5
2020-05-15 08:05:06 -04:00
Chris Freehill 27148a02cb Catch and handle regex exception
When pattern matching file names to determine API support, in
some environments std::regex will throw. This change is meant
to handle this more gracefully.

Change-Id: If1ccfe5bdd71ec4d08663c80692024488072e11b
2020-05-14 15:39:40 -04:00
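The handling described in the commit above can be sketched as follows. This is a minimal illustration, not the code from rocm_smi_lib: the helper name `TryMatchFileName`, its signature, and the choice to report failure through a boolean return value are all assumptions made for the example. Only `std::regex`, `std::regex_match`, and `std::regex_error` are standard library names.

```cpp
#include <regex>
#include <string>

// Hypothetical helper for illustration; not taken from rocm_smi_lib.
// Returns true if the match could be evaluated and stores the result in
// *matched. Returns false if std::regex threw while compiling the pattern
// or while matching, so the caller can fall back instead of crashing.
bool TryMatchFileName(const std::string& file_name,
                      const std::string& pattern, bool* matched) {
  try {
    const std::regex re(pattern);
    *matched = std::regex_match(file_name, re);
    return true;
  } catch (const std::regex_error&) {
    *matched = false;
    return false;
  }
}
```

Separating "could not evaluate" from "did not match" lets a caller treat a throwing regex implementation as an unsupported environment rather than as a missing file.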
Chris Freehill b7ff71c001 Require gcc version 5.4.0 or greater
To avoid build and runtime issues, we should set a minimum
compiler version. std::regex, used by rocm_smi_lib, requires
4.9.0 or greater. However, the development and test
environments are (mainly) 5.4.0.

Change-Id: Ie18e9f905786ec8eb50d61a326cb45173a0ec355
2020-05-14 15:15:56 -04:00
Chris Freehill 44f14f4a86 Merge "Add ref counting for rsmi init and shutdown" into amd-master 2020-05-13 09:24:32 -04:00
Pruthvi Madugundu 2143bc30a1 Merge "Adding lib symlink to top level rocm lib directory" into amd-master 2020-05-11 21:25:32 -04:00
Pruthvi Madugundu 2f3535f2eb Adding lib symlink to top level rocm lib directory
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Change-Id: Id00e501de7c3cbc814d18493b97449a5fcb96fd6
2020-05-11 15:35:12 -07:00
Chris Freehill 8e03d10035 Add ref counting for rsmi init and shutdown
Also, clean lint from kfd_ioctl.h file.

Change-Id: I5a2ae127ab6ab6676a1b075ed10858d0ebfe13c1
2020-05-11 15:57:42 -05:00
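Reference counting for paired init and shutdown calls, as named in the commit above, might look roughly like the sketch below. Everything here is an assumption made for illustration: the `sketch` namespace, the `Init`/`Shutdown` names, the integer status codes, and the `g_initialized` flag standing in for real setup and teardown work. It does not reproduce the actual rsmi implementation.

```cpp
#include <mutex>

namespace sketch {

// Hypothetical state; the real library's bookkeeping is not shown in the log.
std::mutex g_lock;
int g_ref_count = 0;
bool g_initialized = false;

// The first caller performs the (stand-in) initialization; later callers
// only bump the count.
int Init() {
  std::lock_guard<std::mutex> guard(g_lock);
  if (g_ref_count++ == 0) {
    g_initialized = true;
  }
  return 0;
}

// Only the last matching Shutdown() tears down. An unmatched call returns
// an error code instead of driving the count negative.
int Shutdown() {
  std::lock_guard<std::mutex> guard(g_lock);
  if (g_ref_count == 0) {
    return -1;
  }
  if (--g_ref_count == 0) {
    g_initialized = false;
  }
  return 0;
}

}  // namespace sketch
```

With this shape, two independent components in the same process can each call `Init()` and `Shutdown()` without one tearing down state the other still relies on.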
Chris Freehill e1f0d7e85a Use user-mode version of kfd_ioctl.h file
Previously using kernel mode version.

Change-Id: I82bfff9c019a9059b4d0d198c6cf06dc515cc528
2020-05-07 17:13:59 -05:00
Amber Lin 741f9c31ff Allow specifying rsmi-lib install path
Instead of hard-coding the install path to /opt/rocm, allow users to specify
where "make install" installs to, so they can install the lib to their local
build path for testing purposes without touching global /opt/rocm files.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Change-Id: I4144988e325edae4d1d1a2824e031996091036d3
2020-05-06 18:08:22 -04:00
Chris Freehill 2235ede34c Add event notification API
Change-Id: Ib6e8efbe6cdefaa7de1f74bd26993e9b4b011649
2020-05-06 14:07:25 -05:00
Chris Freehill 806f665a85 Handle rsmi app running on machine with no AMD gpus
This fixes a seg fault that would happen in release builds when
there are no KFD nodes on a system, which occurs when there are
no AMD gpus present in the system. This use case occurs
for higher application code that is meant to be gpu agnostic.

Change-Id: If374930bc2e62f9898f337349cde3ebb16091ff0
2020-04-28 00:35:16 -04:00
Chris Freehill 1c9ef44398 Add checking for no-longer-existing process in test
When getting process information for a process, it's possible
that, between the time the process ID was discovered and when
we attempt to collect data for that process, the process has
ended. This change is meant to handle that in the test case.

* Also, fix compile warning by removing unused variable.

Change-Id: I62f9a84a63548c856f0661fef15b7d248a330c05
2020-04-10 08:51:44 -05:00
Chris Freehill f8b57c3b16 Add device mutual exclusion tests and related fixes
* Added a new test to verify mutual exclusion of access to device
  resources
* Added some missing acquiring of mutexes to some RSMI calls, as
  well as try-catch blocks.

Change-Id: I87aac009878a0b2d1f975e1d5b794d887bb23ff9
2020-04-08 15:05:11 -05:00
Chris Freehill 52196caaee Shared mutex fixes and improvements
* Don't make different shared memory mutexes for different users
* Don't delete (unlink) the shared mutex file if the mutex
  initialization fails. This may mess up other processes that
  are using it. Instead, print a message on how to resolve the
  situation, and then throw an error.

  Note, this situation comes up when debug builds (usually)
  either assert() or otherwise end execution without a proper
  clean up.
* Remove cpplint from shared_mutex code

Change-Id: I5f8ca6150cac5c2405fb97007516da345093f966
2020-04-06 17:08:33 -05:00
Mukul Joshi fd79e5c161 Add rsmi_topo_get_numa_affinity()
Given a device index, return the corresponding NUMA node for the
device.
Also, add NUMA node tests to Sys Info Read test.

Change-Id: I0df4937470e6362e6737ccea568d4b3e5890c91a
2020-04-01 11:38:08 -04:00
Chris Freehill 7abe6dc1b2 Documentation update
Change-Id: I646cf3d2fd6064295937f7e727076532894d3514
2020-03-27 14:08:19 -05:00
Chris Freehill 324c0ca0e5 More general solution to api support hwmon mapping
This solution takes into account that some hwmons use
label files to map sensor types. The previous solution
did not take this into account.

Change-Id: I1d6204573cefa8197b2cfe0ffb412b545df3d80a
2020-03-16 11:37:47 -05:00
Chris Freehill 1d8e16bff2 Fix indexing problem with api support function
Also fix potential issue with evaluating functionality of
functions with multiple sub-variants.

Change-Id: I9a09e52f3d3f3181e72578ed1f3bfd0d85516aa3
2020-03-12 11:43:01 -05:00
Chris Freehill d9ab846bee Make rsmitst tests fail quickly if rsmi_init fails
Change-Id: I7b5d94b77305b30e08f33e1ddb6e2f089db0431f
2020-03-11 12:13:28 -05:00
Chris Freehill d54a9484be Don't assert or re-throw exception caught at top level
Instead, return error and let caller deal with it.

Change-Id: I1a55337134b00aa4259af27281b2450fc2252be9
2020-03-11 12:11:29 -05:00
Chris Freehill a482394263 Correct rsmitst build instructions
Change-Id: Ia7dbdd7a489d235c6003badb79f2d0808e18143b
2020-03-02 16:29:10 -05:00
Chris Freehill 0bf81ed2f9 Fix segmentation fault that sometimes occurs on release builds
Fixes SWDEV-216441

Change-Id: I3ea01a4edd14000a103de751757dfaadc7d358bb
2020-02-24 17:17:26 -06:00
Chris Freehill 2d6e15190c Add rsmi_compute_process_gpus_get()
Given a process ID, give the device indices that process is
currently using.

Also:
* made corrections to how RSMI, amdgpu (i.e., "card#") and
  KFD indices translate from one another
* add a few missing error codes to rsmi_status_string()
* fix some formatting

Change-Id: Icd2cae66bb4fec768da96af7cf9cf8b8b66ec7f9
2020-02-22 10:47:58 -06:00
Chris Freehill 842bd29568 Merge "Ensure string is non-empty before calling stoul or stoi" into amd-master 2020-01-30 20:16:56 -05:00
Srinivasan Subramanian 29d55e001a Changes for multiple ROCm installation
1. Support multiple ROCm installations.
2. Support shared library versioning.

Change-Id: Id5c25b90abed084e8fe8cb7c374c2d4384653bbf
2020-01-30 11:08:57 -08:00
Chris Freehill f748868818 Ensure string is non-empty before calling stoul or stoi
Change-Id: I2c6314fb86d3bba8fd6aab932dbb989263fa8542
2020-01-28 17:05:14 -06:00
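The guard named in the commit above can be sketched as a small wrapper around `std::stoul`. The helper name `ParseUnsigned` and its bool-plus-out-parameter signature are hypothetical, chosen for this example only. The standard library behavior it relies on is that `std::stoul` throws `std::invalid_argument` when no conversion can be performed (which includes an empty string) and `std::out_of_range` when the value does not fit.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration; not taken from rocm_smi_lib.
// Parses a base-10 unsigned integer from text (for example, the contents
// of a file). Returns false and leaves *out untouched on empty or
// non-numeric input, or when the value is out of range.
bool ParseUnsigned(const std::string& text, unsigned long* out) {
  if (text.empty()) {
    return false;  // std::stoul("") would throw std::invalid_argument
  }
  try {
    *out = std::stoul(text, nullptr, 10);
    return true;
  } catch (const std::invalid_argument&) {
    return false;
  } catch (const std::out_of_range&) {
    return false;
  }
}
```

Checking for the empty string first keeps the common "file exists but has no contents" case off the exception path, while the catch blocks still cover malformed input.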
Chris Freehill d00b9ac07d Security improvements
Improvements include
* adding additional build flags that warn about stack-smashing
and type conversion errors
* run-time checks for valid function input values and adequate
space for the result of arithmetic operations.
* make sure the default case for switch statements does something
besides just assert
* disable using env. var. debugging in release mode

Change-Id: I5f048310c5c56e05d9ec31bcc273404d6a0dd646
2020-01-16 14:56:27 -06:00
Chris Freehill fe4f7ed4a1 Use default value for version when git tags not present
Also, documentation typo correction.

Change-Id: I7fe4de05d3b8fb808a980862a09a9be32ed32bf5
2019-12-19 08:32:38 -06:00
Chris Freehill 8ffe1bc7f6 Merge "Make dpkg and rpm package names match their file names" into amd-master 2019-11-09 14:27:17 -05:00
Chris Freehill c926d50c3a Make dpkg and rpm package names match their file names
For example,
$ dpkg -i rocm-smi-lib64-2.0.0.1.local-build-0-d10a391.deb 

will yield:
 ...
 Package: rocm-smi-lib64
 Version: 2.0.0.1.local-build-0-d10a391
 ...

Change-Id: I1e56e0c623b9421261cf0864958e821d10226d39
2019-11-08 15:09:16 -04:00
Chris Freehill 1004a01094 Disable TestFrequenciesReadWrite for arcturus
Change-Id: Ia20ec853cdba34ff3dcdc68b4f869890bf58b539
2019-11-07 16:22:45 -05:00
Chris Freehill 4ebb436893 Merge "Docs., error checking and test improvements" into amd-master 2019-11-06 20:15:26 -05:00
Chris Freehill 0e5c44de2a Use "-" instead of "_" for package name
This is part of fix to SWDEV-208805. The other part will
be in the build_* script.

Change-Id: I36397e3f918d08170db8bb228722a2b7389af83b
2019-11-06 11:31:50 -05:00
Chris Freehill 52dfa4bcca Docs., error checking and test improvements
* Update doc. on api-support function
* Check for valid integer value when reading a monitor int. val.
* If fan-write test attempts to set speed higher than max.
   possible, then skip the test

Change-Id: I01ad0ab1f4caffdb0d2c26e9575f278c35a6b017
2019-11-06 11:19:47 -05:00
Chris Freehill 3a26a7270c Support rsmitst blacklisting by adding an exclude file
Change-Id: I9d581b8e24363a688b58a6ca59a6521c7be364d7
2019-10-17 13:47:02 -05:00