Commit graph

224 Commits

Author SHA1 Message Date
Chris Freehill fc4d433877 Correct event counter documentation example
Change-Id: I74c41de8e4aacbd42d9e156983369eb76bec3367


[ROCm/rocm_smi_lib commit: 0468aa4971]
2020-08-06 08:49:21 -05:00
Ori Messinger 217b9b2aea ROCm SMI Python CLI
This tool acts as a command line interface for manipulating
and monitoring the Radeon Open Compute Kernel, similar to the
rocm_smi.py python tool.

The purpose of this commit is for the initial upload and cleanup
of the (incomplete) rocmSmiLib_cli.py and rsmiBindings.py files.

In the near future, this tool should have full feature parity with
rocm_smi.py by relying on the available rocm_smi_lib functions.

Change-Id: Ifbafd5118c15c68c240e3c83a47d2690a27c9353
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>


[ROCm/rocm_smi_lib commit: 2b909252ac]
2020-08-05 12:38:11 -04:00
Chris Freehill 0fe5175ed1 Replace "." in pkg name with "-"
Package name should have a hyphen (not a period) between
NumCommitsSinceLastTag and ROCMIntegrationJobIdentifier.

Fixes SWDEV-245838

Change-Id: I28c4337af6f92ac51a4aed03a09af23b92bd89b5


[ROCm/rocm_smi_lib commit: 92c258c364]
2020-07-27 20:54:52 -04:00
Chris Freehill b662e7ce51 Correct usage of bitwise &
Also, fix warning related to catch() and cpplint error.

Change-Id: I4292170538d0f700fccb605814c5058543abe74a


[ROCm/rocm_smi_lib commit: c2439d28e8]
2020-07-26 20:08:24 -05:00
Ashutosh Mishra 4371cc7afd Adding "BUILD_SHARED_LIBS" flag to cmake files
JIRA : SWDEV-234471
Change cmake to dynamically create shared / archive libs depending on the parameter passed to cmake

Adapted comments.

Change-Id: Ice5925719b8c307c32310b252f61cbc211d1af27


[ROCm/rocm_smi_lib commit: d325613220]
2020-07-16 22:32:55 -04:00
Chris Freehill 5c2ac56166 Update xgmi event counter documentation
Also:
* fix doxygen manual generation that was altered during
  OAM refactor
* quiet some compile warnings.

Change-Id: I548a3cf00eb887bea3dbf58e362ca6dfe90bde28


[ROCm/rocm_smi_lib commit: 52514835f0]
2020-07-16 17:42:56 -05:00
Mukul Joshi fd17bdb90f Fix compiler warning in TestPciReadWrite
Use unsigned number for left shift operation. If not specified as
unsigned, compiler throws warning about left shift of negative
number.

Change-Id: I05948073b0c40700bee69399b08df6031fc49d70


[ROCm/rocm_smi_lib commit: 9d24fc9175]
2020-07-13 17:32:17 -04:00
Mukul Joshi fdda24038f Add support to retrieve process SDMA usage information.
Also, print SDMA usage information in TestProcInfoRead.

Change-Id: I8d19be3b8653e298c81237e5067eca75a1743e70


[ROCm/rocm_smi_lib commit: eea1ed8c3d]
2020-07-13 17:32:08 -04:00
Chris Freehill 202434abad Handle un-readable kfd properties files
Some systems have kfd sysfs properties entries that
are unreadable--for example, when a multi-gpu system is
dividing the gpus among containers, each container may
only be able to access certain gpus.

Previously, all kfd topology node properties entries were
assumed to be valid. Now, we check for readability before
declaring them "valid".

Fixes SWDEV-240169

Also:
* remove an assertion that would happen when no pcie
device identifier files are found on the system.
* fix cpplint issues

Change-Id: I74321b685159dd2628c890b33c39ad82988cb9dd


[ROCm/rocm_smi_lib commit: 68155baed5]
2020-07-10 12:35:31 -04:00
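
The readability check described in commit 202434abad can be illustrated with a minimal Python sketch. The library itself is C++; the function name `readable_nodes` and its directory argument are hypothetical, and treating a missing or unopenable `properties` file as "not valid" is an assumption made for this example.

```python
import os


def readable_nodes(topology_root):
    """Return the names of node directories whose 'properties' file can be read.

    Hypothetical helper: a node is only reported as valid if its
    properties file can actually be opened and read by this process.
    """
    valid = []
    for name in sorted(os.listdir(topology_root)):
        props = os.path.join(topology_root, name, "properties")
        try:
            # Opening and reading one character is a direct readability test;
            # it fails for missing files and for permission errors alike.
            with open(props, "r") as f:
                f.read(1)
        except OSError:
            # Unreadable entry (for example, a GPU assigned to another
            # container): skip it instead of assuming it is valid.
            continue
        valid.append(name)
    return valid
```

On a Linux system with the KFD driver loaded, the node directories are typically found under /sys/class/kfd/kfd/topology/nodes, so that path would be the likely argument.
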
Chris Freehill 5a8b57437e TestPerfCntrReadWrite fail rsmitst if not supported
Fixes SWDEV-243639

Change-Id: I087171231fbbe5939f239efad25a5485529381a3


[ROCm/rocm_smi_lib commit: e2c7ef6422]
2020-07-08 18:41:30 -04:00
Chris Freehill 001aa0b825 Fix docs + cmake_utils path issues
This corrects issues that arose after OAM reorganization.
It should address SWDEV-243294.

Also, fix some compile warnings that show up on RHEL.

Change-Id: Id14d444905da35cd7346bcfbcd82b6d0572708c4


[ROCm/rocm_smi_lib commit: c2ef9a6879]
2020-07-08 09:47:25 -05:00
Chris Freehill 77083980a8 Quiet spurious pthread_unlock warnings
A message is output in debug builds when pthread_unlock
returns an error. However, in most cases, it should return
EPERM. In fact, if it doesn't return EPERM, it is an
indication of a problem. This commit adjusts accordingly.

Change-Id: Ia5cad89aa6e68e79c1291ea21adffb0fa68f2300


[ROCm/rocm_smi_lib commit: 866438966d]
2020-06-30 15:12:58 -05:00
Divya Shikre 33f4141218 OAM: Implement get_sensors_info()
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia2c6e18f463c0f97530ca8ad07d249e6f2116534


[ROCm/rocm_smi_lib commit: e21232f059]
2020-06-29 14:50:19 -04:00
Amber Lin 9f72c8c08f OAM: Add get dev and pci properties and sensor count
Also, add amdoam_get_error_description.

On behalf of
Amber Lin <Amber.Lin@amd.com> and
Divya Shikre <DivyaUday.Shikre@amd.com>

Change-Id: I1f5ac0c5948adb2c30008e95c501e8b69b8183b6


[ROCm/rocm_smi_lib commit: 27deaea6e8]
2020-06-23 17:21:07 -05:00
Chris Freehill 98b976ef3e Refactor rsmi to support oam
Change-Id: Idc524e01ba06eb5c8d1682becaf5bf8ced5bffcf


[ROCm/rocm_smi_lib commit: 6594f8f58b]
2020-06-22 18:51:46 -05:00
Chris Freehill 4c94842508 Ensure no device mutexes are left held on shut_down
Also, fix TestMutualExclusion and TestEvtNotifReadWrite.
Previously, some of the normal SetUp function was not
being done for this test. In some cases, no DRM
devices are being found on the test machine. Skip
those.

Change-Id: Iaa5a257841eb459aa57491ae9680c34a60d5ac2b


[ROCm/rocm_smi_lib commit: 59394f3354]
2020-06-19 13:59:20 -05:00
Mike Li b3bb190b8d Add support to retrieve XGMI hive id
Change-Id: I1eee05dd85ecb856889d1cfe0565454d2f538856
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/rocm_smi_lib commit: 488bbb668a]
2020-06-19 07:35:23 -07:00
Chris Freehill 4fd855fe72 Fix line endings for init_shutdown_refcount.*
* Also, add an assert that checks for proper usage of
rand_sleep_mod().

Change-Id: Ieb4179e1ad12fbbf85c2e4f7c7f119b0bb30b197


[ROCm/rocm_smi_lib commit: 9e0ebb250c]
2020-06-17 21:26:12 -05:00
Chris Freehill b1787b8968 Make verbosity level 0 completely quiet
Also, support --iterations flag for certain functions that will
likely be repeated frequently.

Change-Id: I7ed76835001b5cbca30042d6bf26484258c7b9a6


[ROCm/rocm_smi_lib commit: efc9b7658c]
2020-06-17 21:26:12 -05:00
Divya Shikre 82825b40d3 Adding current voltage feature & gtest.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ic555a3af265e603419e2875d1989a366abc82596


[ROCm/rocm_smi_lib commit: 2805ed16a4]
2020-06-16 11:48:56 -04:00
Chris Freehill a17db4e98c Don't automatically overwrite manual .pdf file
Automatically updating the manual pdf file causes a local
git change. This messes up "repo sync" calls because of the
local change. Instead, just write an un-tracked file that can
be used to update the tracked version of the manual .pdf.

Change-Id: Icd7edc244df60728ec169c5aa1cf8b322ca4143b


[ROCm/rocm_smi_lib commit: 8e6f7c798d]
2020-06-12 14:19:15 -05:00
Chris Freehill b07fd8fca7 Update XGMI perf counter test to show utilization
Also:
* When destroying a counter, make sure to stop the counter first
* In the test, do not stop (disable) the counter before
  reading it.
* Clean up some whitespace in other tests
* Re-add manual pdf file

Change-Id: I0786ef3a994ca568299c77e44f092af8943ac33d


[ROCm/rocm_smi_lib commit: f946ea37ef]
2020-06-10 12:49:49 -04:00
Kent Russell ef34c94574 Make an empty unique_id file non-fatal
This isn't supported on all models, so just comment out on failure
instead of fully failing

Change-Id: Id36d5df7c87abbb41f7b6be43abfea82004703a6


[ROCm/rocm_smi_lib commit: 8cf44548c0]
2020-06-04 10:31:53 -04:00
Mukul Joshi 4067baeff3 Print VRAM usage in rsmitst
Print VRAM usage information in TestProcInfoRead.
Also, fix output formatting when running TestProcInfoRead.

Change-Id: I9efed808458ef4645145610f6f564f0f2baadea2


[ROCm/rocm_smi_lib commit: 633c852f5d]
2020-05-29 15:48:06 -04:00
Chris Freehill 430e52c394 Check only the minor ioctl version for event support
Change-Id: I70ddab4298a62178b2509a0365ee4cd6937302c1


[ROCm/rocm_smi_lib commit: 42c10633b6]
2020-05-27 09:27:01 -04:00
Mukul Joshi 506f71a01c Add support to retrieve process VRAM usage information.
Change-Id: I60843a99207a658022a26aa346b79f91863833cf


[ROCm/rocm_smi_lib commit: e30ebbc787]
2020-05-26 15:19:24 -04:00
Chris Freehill 13f3e6afb2 Update README doc. build instructions
* Also, remove dependency of manual pdf on the README
file; they are independent of each other.

Change-Id: I1ab8c8c9adf6b78e5b4aab86ecdf4c46f3a6bf63


[ROCm/rocm_smi_lib commit: bdf22c1c9e]
2020-05-21 09:10:08 -04:00
Chris Freehill 11b90a242a Return an error instead of assert when reading bad data
Assert doesn't help with release builds.

Change-Id: Ib076791fd442e96c7544914cdf08774fc7a40a94


[ROCm/rocm_smi_lib commit: 754a993d32]
2020-05-19 15:58:40 -04:00
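
The reasoning in commit 11b90a242a (asserts do not help in release builds) can be sketched in Python, where `assert` statements are likewise stripped when the interpreter runs with `-O`. The names `Status` and `parse_counter` are hypothetical and only illustrate the pattern of returning an error code for bad data; they are not taken from the library.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical status codes for this sketch."""
    SUCCESS = 0
    UNEXPECTED_DATA = 1


def parse_counter(raw):
    """Parse a non-negative integer from text; return (status, value).

    Bad input produces an error status instead of tripping an assert,
    so the check still runs when asserts are compiled or optimized out.
    """
    try:
        value = int(raw.strip())
    except ValueError:
        return Status.UNEXPECTED_DATA, None
    if value < 0:
        return Status.UNEXPECTED_DATA, None
    return Status.SUCCESS, value
```

The caller can then decide how to react to `Status.UNEXPECTED_DATA`, which an assert in an optimized build would never give it the chance to do.
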
Chris Freehill db0ed00070 Add RSMI ref manual to packages
Also,
* remove extraneous test files
* fix Doxygen docs. issues
* fix whitespace issues

Change-Id: I9b58b0d68bd125a34f4fe0dc84d609c7b0b6e30e


[ROCm/rocm_smi_lib commit: 8ced9c986a]
2020-05-18 23:40:38 -04:00
Mike Li f7885be06e Add functions that are used to query Hardware topology.
Change-Id: I0f4cd02b237bde4d6dccfb0e83e65376ecb1cfaa
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/rocm_smi_lib commit: c7d349183a]
2020-05-18 12:37:27 -04:00
Chris Freehill 24090f313a Fix README example error
Change-Id: Ib0124642cea34dcbfae0ea3bbe8ffaf09116bede


[ROCm/rocm_smi_lib commit: f8d623cb44]
2020-05-15 12:09:05 -04:00
Chris Freehill d4dda0017c Don't use static variable for monitors
Change-Id: I24b5ccfa94b2d722b070a6c6385af9201d21d9c5


[ROCm/rocm_smi_lib commit: 02e4a9c14f]
2020-05-15 08:05:06 -04:00
Chris Freehill 2862d311da Catch and handle regex exception
When pattern matching file names to determine API support, in
some environments std::regex will throw. This change is meant
to handle this more gracefully.

Change-Id: If1ccfe5bdd71ec4d08663c80692024488072e11b


[ROCm/rocm_smi_lib commit: 27148a02cb]
2020-05-14 15:39:40 -04:00
Chris Freehill affa804d13 Require gcc version 5.4.0 or greater
To avoid build and runtime issues, we should set a minimum
compiler version. std::regex, used by rocm_smi_lib, requires
4.9.0 or greater. However, the development and test
environments are (mainly) 5.4.0.

Change-Id: Ie18e9f905786ec8eb50d61a326cb45173a0ec355


[ROCm/rocm_smi_lib commit: b7ff71c001]
2020-05-14 15:15:56 -04:00
Chris Freehill f2779ee3db Merge "Add ref counting for rsmi init and shutdown" into amd-master
[ROCm/rocm_smi_lib commit: 44f14f4a86]
2020-05-13 09:24:32 -04:00
Pruthvi Madugundu b172889329 Merge "Adding lib symlink to top level rocm lib directory" into amd-master
[ROCm/rocm_smi_lib commit: 2143bc30a1]
2020-05-11 21:25:32 -04:00
Pruthvi Madugundu 4a4daf2a94 Adding lib symlink to top level rocm lib directory
Signed-off-by: Pruthvi Madugundu <pruthvi.madugundu@amd.com>
Change-Id: Id00e501de7c3cbc814d18493b97449a5fcb96fd6


[ROCm/rocm_smi_lib commit: 2f3535f2eb]
2020-05-11 15:35:12 -07:00
Chris Freehill 0ab5e76b33 Add ref counting for rsmi init and shutdown
Also, clean lint from kfd_ioctl.h file.

Change-Id: I5a2ae127ab6ab6676a1b075ed10858d0ebfe13c1


[ROCm/rocm_smi_lib commit: 8e03d10035]
2020-05-11 15:57:42 -05:00
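
Reference counting for init and shutdown, as named in commit 0ab5e76b33, can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the class `RefCountedLib` and its attributes are hypothetical, and raising on an unmatched shutdown is a choice made for this example.

```python
import threading


class RefCountedLib:
    """Hypothetical wrapper: set up on the first init, tear down on the last shut_down."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refs = 0
        self.initialized = False

    def init(self):
        with self._lock:
            if self._refs == 0:
                # Stand-in for the real one-time setup work.
                self.initialized = True
            self._refs += 1

    def shut_down(self):
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("shut_down called without a matching init")
            self._refs -= 1
            if self._refs == 0:
                # Stand-in for the real teardown; runs only for the last user.
                self.initialized = False
```

With this pattern, two independent components in one process can each call `init()` and `shut_down()` without one of them tearing down state the other still needs.
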
Chris Freehill ab2a22c90c Use user-mode version of kfd_ioctl.h file
Previously using kernel mode version.

Change-Id: I82bfff9c019a9059b4d0d198c6cf06dc515cc528


[ROCm/rocm_smi_lib commit: e1f0d7e85a]
2020-05-07 17:13:59 -05:00
Amber Lin 894ca737bc Allow specifying rsmi-lib install path
Instead of hard-coding install path to /opt/rocm, allow users to specify
where "make install" goes to so users can install lib to their local build
path for testing purposes without touching global /opt/rocm files.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Change-Id: I4144988e325edae4d1d1a2824e031996091036d3


[ROCm/rocm_smi_lib commit: 741f9c31ff]
2020-05-06 18:08:22 -04:00
Chris Freehill 1959ccb265 Add event notification API
Change-Id: Ib6e8efbe6cdefaa7de1f74bd26993e9b4b011649


[ROCm/rocm_smi_lib commit: 2235ede34c]
2020-05-06 14:07:25 -05:00
Chris Freehill fee5adb228 Handle rsmi app running on machine with no AMD gpus
This fixes a seg fault that would happen in release builds when
there are no KFD nodes on a system, which occurs when there are
no AMD gpus present in the system. This use case occurs
for higher application code that is meant to be gpu agnostic.

Change-Id: If374930bc2e62f9898f337349cde3ebb16091ff0


[ROCm/rocm_smi_lib commit: 806f665a85]
2020-04-28 00:35:16 -04:00
Chris Freehill 0759abca07 Add checking for no-longer-existing process in test
When getting process information for a process, it's possible
that the process ended between the time the process ID was
discovered and the time we attempt to collect data for it.
This change is meant to handle that in the test case.

* Also, fix compile warning by removing unused variable.

Change-Id: I62f9a84a63548c856f0661fef15b7d248a330c05


[ROCm/rocm_smi_lib commit: 1c9ef44398]
2020-04-10 08:51:44 -05:00
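
The race described in commit 0759abca07 (a process exiting between discovery and data collection) can be sketched as below. The function `collect_proc_info` and the `read_info` callback are hypothetical, and signalling a vanished process with `ProcessLookupError` is an assumption of this example rather than the library's behavior.

```python
def collect_proc_info(pids, read_info):
    """Gather per-process data, tolerating processes that have already exited.

    pids      -- process IDs discovered earlier
    read_info -- callable returning data for one PID; assumed to raise
                 ProcessLookupError if the process no longer exists
    """
    results = {}
    for pid in pids:
        try:
            results[pid] = read_info(pid)
        except ProcessLookupError:
            # The process ended after its PID was listed; this is an
            # expected outcome, not a failure.
            continue
    return results
```
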
Chris Freehill 01401b0caa Add device mutual exclusion tests and related fixes
* Added a new test to verify mutual exclusion of access to device
  resources
* Added missing mutex acquisition to some RSMI calls, as
  well as try-catch blocks.

Change-Id: I87aac009878a0b2d1f975e1d5b794d887bb23ff9


[ROCm/rocm_smi_lib commit: f8b57c3b16]
2020-04-08 15:05:11 -05:00
Chris Freehill 49b562b209 Shared mutex fixes and improvements
* Don't make different shared memory mutexes for different users
* Don't delete (unlink) the shared mutex file if the mutex
  initialization fails. This may mess up other processes that
  are using it. Instead, print a message on how to resolve the
  situation, and then throw an error.

  Note, this situation comes up when debug builds (usually)
  either assert() or otherwise end execution without a proper
  clean up.
* Remove cpplint from shared_mutex code

Change-Id: I5f8ca6150cac5c2405fb97007516da345093f966


[ROCm/rocm_smi_lib commit: 52196caaee]
2020-04-06 17:08:33 -05:00
Mukul Joshi 7137023637 Add rsmi_topo_get_numa_affinity()
Given a device index, return the corresponding NUMA node for the
device.
Also, add NUMA node tests to Sys Info Read test.

Change-Id: I0df4937470e6362e6737ccea568d4b3e5890c91a


[ROCm/rocm_smi_lib commit: fd79e5c161]
2020-04-01 11:38:08 -04:00
Chris Freehill 024e27229c Documentation update
Change-Id: I646cf3d2fd6064295937f7e727076532894d3514


[ROCm/rocm_smi_lib commit: 7abe6dc1b2]
2020-03-27 14:08:19 -05:00
Chris Freehill 17871ecb14 More general solution to api support hwmon mapping
This solution takes into account that some hwmons use
label files to map sensor types. The previous solution
did not take this into account.

Change-Id: I1d6204573cefa8197b2cfe0ffb412b545df3d80a


[ROCm/rocm_smi_lib commit: 324c0ca0e5]
2020-03-16 11:37:47 -05:00
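
The label-file mapping mentioned in commit 17871ecb14 can be sketched as follows. In the Linux hwmon sysfs convention, a sensor such as `temp2_input` may have a companion `temp2_label` file naming what it measures. The function `map_labels` and its return shape are hypothetical; this is not how the library necessarily implements the mapping.

```python
import os
import re

# Matches hwmon label files such as "temp1_label" or "in0_label".
LABEL_RE = re.compile(r"^([a-z]+)(\d+)_label$")


def map_labels(hwmon_dir):
    """Map (sensor kind, label text) to the sensor's numeric index.

    Only sensors that have a *_label file appear in the result;
    other files in the directory are ignored.
    """
    mapping = {}
    for name in sorted(os.listdir(hwmon_dir)):
        m = LABEL_RE.match(name)
        if not m:
            continue
        with open(os.path.join(hwmon_dir, name)) as f:
            label = f.read().strip()
        mapping[(m.group(1), label)] = int(m.group(2))
    return mapping
```

Given such a mapping, code that wants a particular sensor can look up its index by label rather than assuming a fixed numbering across devices.
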
Chris Freehill 4e2d769dcc Fix indexing problem with api support function
Also fix potential issue with evaluating functionality of
functions with multiple sub-variants.

Change-Id: I9a09e52f3d3f3181e72578ed1f3bfd0d85516aa3


[ROCm/rocm_smi_lib commit: 1d8e16bff2]
2020-03-12 11:43:01 -05:00
Chris Freehill 06149e94bb Make rsmitst tests fail quickly if rsmi_init fails
Change-Id: I7b5d94b77305b30e08f33e1ddb6e2f089db0431f


[ROCm/rocm_smi_lib commit: d9ab846bee]
2020-03-11 12:13:28 -05:00