Commit Graph

247 Commits

Author SHA1 Message Date
Mukul Joshi fb2ed24372 Use correct string conversion function for VRAM and SDMA usage
VRAM and SDMA usage can be 64-bit long numbers. Use stoull()
instead of stoi() to convert the VRAM and SDMA usage strings to
numbers.

Change-Id: Ifadbada9f33320fc67666036ce8439823c1d1fb7
2020-09-21 12:28:22 -04:00
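
The commit above switches from stoi() to stoull() because the usage values may not fit in an int. A minimal sketch of the difference, assuming the value arrives as a decimal string; the helper names ParseUsage and OverflowsInt are hypothetical and are not taken from rocm_smi_lib:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of rocm_smi_lib): parse a decimal usage
// counter that may exceed the range of int.
uint64_t ParseUsage(const std::string &text) {
  // std::stoull returns unsigned long long, which is at least 64 bits wide.
  return std::stoull(text);
}

// Hypothetical helper: returns true if std::stoi rejects the same text
// because the value does not fit in an int.
bool OverflowsInt(const std::string &text) {
  try {
    (void)std::stoi(text);
    return false;
  } catch (const std::out_of_range &) {
    return true;
  }
}
```

With a value such as "8589934592" (2^33), ParseUsage returns the full number, while std::stoi throws std::out_of_range on platforms where int is 32 bits wide.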
Mukul Joshi 8b95705e6f Add support for GPU reset SMI events
Add handling for both pre GPU reset and post GPU reset SMI
events.

Change-Id: I64d5e006bef58cb28b1c580c75f482a4590427da
2020-09-16 13:25:06 -04:00
Mukul Joshi aff75c955f Add support for KFD Thermal Throttling SMI event
Add handling for receiving thermal throttling SMI event from the
kernel.
Also, update the event notification test to work with the new event.

Change-Id: Ib89c12b244f90998ccbae0a38b37f25705d156e0
2020-09-16 13:24:57 -04:00
Mukul Joshi 406859ca8a Update KFD SMI event notification handling
The event bitmask in the KFD SMI event is now replaced with an event index in
the SMI event message. Sending an event bitmask, which was a 64-bit
field with only 1 bit set, was wasteful of memory and also
potentially limited the interface to 64 events. Instead, the kernel now sends
the event index in the SMI event message. As a result, update the
KFD SMI event handling to expect the event index in the message.

Change-Id: I3e74620788d3c1f7c0bdaa69e9d9ab3d1aba2c92
2020-09-16 13:24:50 -04:00
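
The trade-off described in the commit above can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the 1-based index convention (index 1 maps to bit 0) is an assumption, not something stated in the commit message.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; they are not rocm_smi_lib APIs.

// Bitmask encoding: a 64-bit mask with exactly one bit set.
// Assumes 1-based indices, so index 1 maps to bit 0 and at most
// 64 distinct events can be represented. Returns 0 for other indices.
uint64_t IndexToMask(uint32_t index) {
  if (index < 1 || index > 64) return 0;
  return uint64_t{1} << (index - 1);
}

// Recover the 1-based index from a one-bit mask.
// Returns 0 if the mask does not have exactly one bit set.
uint32_t MaskToIndex(uint64_t mask) {
  if (mask == 0 || (mask & (mask - 1)) != 0) return 0;
  uint32_t index = 1;
  while ((mask >>= 1) != 0) ++index;
  return index;
}
```

A plain index needs only a small integer and is not capped by the width of the mask, which is the motivation the commit message gives for the change.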
Chris Freehill 8f9f9433d8 Enable library-based rocm_smi.py
Change-Id: I5443308905456defc9818fac07ac2f20fe9426fd
2020-09-16 09:31:30 -05:00
Chris Freehill b015052a07 Make sure all sensor labels have valid mappings
There may not be label files for some sensors on older
devices. We need to make sure there is a valid dummy
mapping in these cases.

Change-Id: Id6a8b71e554552be84a0e42a477070b504151e7f
2020-09-11 17:32:54 -05:00
Chris Freehill cafd678d5d Add missing docs section for EvntNotif
Change-Id: I69187c734d2618ddb4272c58bb76d04646908793
2020-09-11 15:48:56 -05:00
Elena Sakhnovitch 91f8fcb7b1 ROCm SMI CLI: Add JSON support for topo functions
-Add divider between devices for --showclocks to increase readability.
-Fix fan rounding error
-Fix spaces to comply with coding standard
-Fix @param description error in topo functions
-JSON result for topology:
{
  "card0": {
    "(Topology) Numa Node": "0",
    "(Topology) Numa Affinity": "4294967295"
  },
  "card1": {
    "(Topology) Numa Node": "0",
    "(Topology) Numa Affinity": "4294967295"
  },
  "system": {
    "(Topology) Weight between DRM devices 0 and 1": "40",
    "(Topology) Hops between DRM devices 0 and 1": "2",
    "(Topology) Link type between DRM devices 0 and 1": "PCIE"
  }
}

Signed-off-by: Elena Sakhnovitch <Elena.Sakhnovitch@amd.com>
Change-Id: I711c100362826ed729ff90edd407009237d64f8f
2020-09-10 12:57:14 -04:00
Elena Sakhnovitch edcae88fe9 Add README.md starter file
signed-off-by: Elena Sakhnovitch
Change-Id: I677b7d643c6559693c5ad627b704ee36631cc32e
2020-09-10 11:09:42 -04:00
Elena Sakhnovitch 8b82621e72 ROCm SMI Python CLI: Implement --showbw
PCIE bandwidth functionality

Signed-off-by: Elena Sakhnovitch
Change-Id: I5a9ddc589846b6032739d491319078ead5723a27
2020-09-09 14:52:58 -04:00
Harish Kasiviswanathan f1786a3095 Don't hard code rocm_smi_lib path
During rocm_smi_lib installation the path should be set using ldconfig

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I0cab18f492013b783d1ce632591ce295f934a168
2020-09-08 19:29:09 -04:00
Divya Shikre 54d4b9d500 Adding setsrange, setmrange, setvc, setslevel and setmlevel functionality to rocm lib and cli
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I5fd65ea7bcd5403aaf2e42d2aa28d837929da253
2020-09-08 18:42:39 -04:00
Ori Messinger 95d43e30e3 ROCm SMI Python CLI: Implement show/set mclk OverDrive
The purpose of this patch is to implement show and set mclk OverDrive.
This implementation is copied directly from the previous rocm_smi.py
script since this functionality is mostly deprecated.

Change-Id: I705430f873a73f954b6812c222a385ff4e9b6eb2
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2020-09-08 14:24:11 -04:00
Ori Messinger 2d59d0877b ROCm SMI Python CLI: Implement Valid Clocks
The purpose of this patch is to implement the remaining valid clocks.
The valid clocks are: dcefclk, fclk, mclk, pcie, sclk, socclk
This functionality is needed for the 'setClocks' method.

Change-Id: Ie648fb29dbbd61f0f064d4462ac566911f1ca2aa
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2020-09-02 06:40:59 -04:00
Divya Shikre d1f4c252b0 Adding voltage range functionality to rocm cli
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I9288c0c6cda2a984c34cfd2570deec640b6c9f0d
2020-08-28 12:04:36 -04:00
Divya Shikre 49734f8d34 Adding logic to skip the loop if src and dest device are the same in HW Topology.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ib9cfbf5a7238ba75f6463e8fa6250bb9946b7979
2020-08-20 10:44:28 -04:00
Harish Kasiviswanathan 9f5d4a698e Update rsmi_process_info_t with sdma_usage field
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Ie326e75674127a2e13f17fac344e2b672e877ce1
2020-08-19 17:54:15 -04:00
Divya Shikre 1276e4b9e9 Adding gpu reset functionality to rocm cli
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ifc0a239e8e8046fd7f56893d0101e0866cc3185f
2020-08-19 13:37:47 -04:00
Chris Freehill 7be97ec2aa Clean up comments for rsmitst
Change-Id: Iea5322a5fd3bffe77557fa2cecbce70716e1258c
2020-08-17 11:48:07 -05:00
Divya Shikre 2e8dc4f2a9 Adding Sdma Usage to showpids
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I72a9e1adc61eba382f1ac17c8e50b2a8bd6d6898
2020-08-14 12:12:34 -04:00
Divya Shikre 4032898d1b Adding Hw Topology option to ROCm SMI Python CLI
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ic46334567703f705e38b3a8b4a08ab388c749251
2020-08-13 18:51:21 -04:00
Ori Messinger b568270f55 ROCm SMI Python CLI: properly cast pid to int
The purpose of this patch is to fix --showpids and --showpidgpus functionality.
When pid is passed into a LIB function, it must be cast to int first.

Change-Id: I5cb7ac41052abeefff0dedf2384c4bb3c8d577a3
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2020-08-13 04:34:08 -04:00
Chris Freehill da64e284dc Move README back to root
README should be at root to display in github main page.
Also, removed paragraph related to API changes early
in development.

Change-Id: I2e92573a31d3caa7790364de9356c6d7e7be553d
2020-08-06 09:27:48 -05:00
Chris Freehill 0468aa4971 Correct event counter documentation example
Change-Id: I74c41de8e4aacbd42d9e156983369eb76bec3367
2020-08-06 08:49:21 -05:00
Ori Messinger 2b909252ac ROCm SMI Python CLI
This tool acts as a command line interface for manipulating
and monitoring the Radeon Open Compute Kernel, similar to the
rocm_smi.py python tool.

The purpose of this commit is for the initial upload and cleanup
of the (incomplete) rocmSmiLib_cli.py and rsmiBindings.py files.

In the near future, this tool should have full feature parity with
rocm_smi.py by relying on the available rocm_smi_lib functions.

Change-Id: Ifbafd5118c15c68c240e3c83a47d2690a27c9353
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2020-08-05 12:38:11 -04:00
Chris Freehill 92c258c364 Replace "." in pkg name with "-"
Package name should have a hyphen (not a period) between
NumCommitsSinceLastTag and ROCMIntegrationJobIdentifier.

Fixes SWDEV-245838

Change-Id: I28c4337af6f92ac51a4aed03a09af23b92bd89b5
2020-07-27 20:54:52 -04:00
Chris Freehill c2439d28e8 Correct usage of bitwise &
Also, fix warning related to catch() and cpplint error.

Change-Id: I4292170538d0f700fccb605814c5058543abe74a
2020-07-26 20:08:24 -05:00
Ashutosh Mishra d325613220 Adding "BUILD_SHARED_LIBS" flag to cmake files
JIRA : SWDEV-234471
Change the cmake files to build shared or archive libs dynamically, depending on the parameter passed to cmake

Adapted comments.

Change-Id: Ice5925719b8c307c32310b252f61cbc211d1af27
2020-07-16 22:32:55 -04:00
Chris Freehill 52514835f0 Update xgmi event counter documentation
Also:
* fix doxygen manual generation that was altered during
  OAM refactor
* quiet some compile warnings.

Change-Id: I548a3cf00eb887bea3dbf58e362ca6dfe90bde28
2020-07-16 17:42:56 -05:00
Mukul Joshi 9d24fc9175 Fix compiler warning in TestPciReadWrite
Use an unsigned number for the left shift operation. If it is not specified as
unsigned, the compiler warns about a left shift of a negative
number.

Change-Id: I05948073b0c40700bee69399b08df6031fc49d70
2020-07-13 17:32:17 -04:00
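
A minimal sketch of the kind of fix the commit above describes. The actual expression in TestPciReadWrite is not shown in the commit message, so the helper below is hypothetical and only illustrates the general pattern:

```cpp
#include <cstdint>

// Hypothetical helper (not from the test suite): build a 32-bit mask with
// the low `n` bits cleared. Valid for n < 32.
uint32_t ClearLowBits(unsigned n) {
  // uint32_t{0} is unsigned, so its complement is an unsigned all-ones value.
  // Writing ~0 instead would produce the signed value -1, and left-shifting
  // a negative number is what compilers warn about.
  return ~uint32_t{0} << n;
}
```

For example, ClearLowBits(4) yields 0xFFFFFFF0.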
Mukul Joshi eea1ed8c3d Add support to retrieve process SDMA usage information.
Also, print SDMA usage information in TestProcInfoRead.

Change-Id: I8d19be3b8653e298c81237e5067eca75a1743e70
2020-07-13 17:32:08 -04:00
Chris Freehill 68155baed5 Handle un-readable kfd properties files
Some systems have kfd sysfs properties entries that
are unreadable--for example, when a multi-gpu system is
dividing the gpus among containers, each container may
only be able to access certain gpus.

Previously, all kfd topology node properties entries were
assumed to be valid. Now, we check for readability before
declaring them "valid".

Fixes SWDEV-240169

Also:
* remove an assertion that would happen when no pcie
device identifier files are found on the system.
* fix cpplint issues

Change-Id: I74321b685159dd2628c890b33c39ad82988cb9dd
2020-07-10 12:35:31 -04:00
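
The readability check described in the commit above could look roughly like this. It is a sketch under stated assumptions: the function name IsReadableFile is hypothetical, and the library's real check may be implemented differently.

```cpp
#include <fstream>
#include <string>

// Hypothetical check (not the rocm_smi_lib implementation): report a file as
// readable only if it can be opened and a read attempt does not hit an
// I/O error.
bool IsReadableFile(const std::string &path) {
  std::ifstream file(path);
  if (!file.is_open()) return false;
  // peek() attempts an actual read; a read can fail even after a
  // successful open, in which case the stream's badbit is set.
  file.peek();
  return !file.bad();
}
```

A caller would mark a properties entry as valid only when this returns true, and skip the node otherwise.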
Chris Freehill e2c7ef6422 TestPerfCntrReadWrite fail rsmitst if not supported
Fixes SWDEV-243639

Change-Id: I087171231fbbe5939f239efad25a5485529381a3
2020-07-08 18:41:30 -04:00
Chris Freehill c2ef9a6879 Fix docs + cmake_utils path issues
This corrects issues that arose after OAM reorganization.
It should address SWDEV-243294.

Also, fix some compile warnings that show up on RHEL.

Change-Id: Id14d444905da35cd7346bcfbcd82b6d0572708c4
2020-07-08 09:47:25 -05:00
Chris Freehill 866438966d Quiet spurious pthread_unlock warnings
A message is output in debug builds when pthread_unlock
returns an error. However, in most cases, it should return
EPERM. In fact, if it doesn't return EPERM, it is an
indication of a problem. This commit adjusts accordingly.

Change-Id: Ia5cad89aa6e68e79c1291ea21adffb0fa68f2300
2020-06-30 15:12:58 -05:00
Divya Shikre e21232f059 OAM: Implement get_sensors_info()
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ia2c6e18f463c0f97530ca8ad07d249e6f2116534
2020-06-29 14:50:19 -04:00
Amber Lin 27deaea6e8 OAM: Add get dev and pci properties and sensor count
Also, add amdoam_get_error_description.

On behalf of
Amber Lin <Amber.Lin@amd.com> and
Divya Shikre <DivyaUday.Shikre@amd.com>

Change-Id: I1f5ac0c5948adb2c30008e95c501e8b69b8183b6
2020-06-23 17:21:07 -05:00
Chris Freehill 6594f8f58b Refactor rsmi to support oam
Change-Id: Idc524e01ba06eb5c8d1682becaf5bf8ced5bffcf
2020-06-22 18:51:46 -05:00
Chris Freehill 59394f3354 Ensure no device mutexes are left held on shut_down
Also, fix TestMutualExclusion and TestEvtNofifReadWrite.
Previously, some of the normal SetUp function was not
being done for this test. In some cases, no DRM
devices are being found on the test machine. Skip
those.

Change-Id: Iaa5a257841eb459aa57491ae9680c34a60d5ac2b
2020-06-19 13:59:20 -05:00
Mike Li 488bbb668a Add support to retrieve XGMI hive id
Change-Id: I1eee05dd85ecb856889d1cfe0565454d2f538856
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2020-06-19 07:35:23 -07:00
Chris Freehill 9e0ebb250c Fix line endings for init_shutdown_refcount.*
* Also, add an assert that checks for proper usage of
rand_sleep_mod().

Change-Id: Ieb4179e1ad12fbbf85c2e4f7c7f119b0bb30b197
2020-06-17 21:26:12 -05:00
Chris Freehill efc9b7658c Make verbosity level 0 completely quiet
Also, support --iterations flag for certain functions that will
likely be repeated frequently.

Change-Id: I7ed76835001b5cbca30042d6bf26484258c7b9a6
2020-06-17 21:26:12 -05:00
Divya Shikre 2805ed16a4 Adding current voltage feature & gtest.
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: Ic555a3af265e603419e2875d1989a366abc82596
2020-06-16 11:48:56 -04:00
Chris Freehill 8e6f7c798d Don't automatically overwrite manual .pdf file
Automatically updating the manual pdf file causes a local
git change. This messes up "repo sync" calls because of the
local change. Instead, just write an un-tracked file that can
be used to update the tracked version of the manual .pdf.

Change-Id: Icd7edc244df60728ec169c5aa1cf8b322ca4143b
2020-06-12 14:19:15 -05:00
Chris Freehill f946ea37ef Update XGMI perf counter test to show utilization
Also:
* When destroying a counter, make sure to stop the counter first
* In the test, do not stop (disable) the counter before
  reading it.
* Clean up some whitespace in other tests
* Re-add manual pdf file

Change-Id: I0786ef3a994ca568299c77e44f092af8943ac33d
2020-06-10 12:49:49 -04:00
Kent Russell 8cf44548c0 Make an empty unique_id file non-fatal
This isn't supported on all models, so just comment out on failure
instead of fully failing

Change-Id: Id36d5df7c87abbb41f7b6be43abfea82004703a6
2020-06-04 10:31:53 -04:00
Mukul Joshi 633c852f5d Print VRAM usage in rsmitst
Print VRAM usage information in TestProcInfoRead.
Also, fix output formatting when running TestProcInfoRead.

Change-Id: I9efed808458ef4645145610f6f564f0f2baadea2
2020-05-29 15:48:06 -04:00
Chris Freehill 42c10633b6 Check only the minor ioctl version for event support
Change-Id: I70ddab4298a62178b2509a0365ee4cd6937302c1
2020-05-27 09:27:01 -04:00
Mukul Joshi e30ebbc787 Add support to retrieve process VRAM usage information.
Change-Id: I60843a99207a658022a26aa346b79f91863833cf
2020-05-26 15:19:24 -04:00
Chris Freehill bdf22c1c9e Update README doc. build instructions
* Also, remove dependency of manual pdf on the README
file; they are independent of each other.

Change-Id: I1ab8c8c9adf6b78e5b4aab86ecdf4c46f3a6bf63
2020-05-21 09:10:08 -04:00