Commit graph

529 Commits

Author SHA1 Message Date
Sean Keely d2a50a0048 Allow disabling scratch reclaim.
Debug and RCCL NPI assist feature.

Change-Id: I2cb76f0a086fa341465df3ede26965ab713bc3b4
2019-11-20 02:41:58 -05:00
Sean Keely 35c1ffa863 Raise large scratch allocation limit for RCCL.
Temporary workaround for 2.10 release.  RCCL, compiler, or firmware
must be corrected and this code reverted before another ASIC release.

Change-Id: I27851353289b93df9acb72d28b8c6ccb9f7f7d7a
2019-11-20 02:41:27 -05:00
Sean Keely f62017f4a5 Correct clang paths for incremental builds.
We should be using bin/clang, not build/lightning/bin/clang, since
build/ is the project's internal build directory.  This patch corrects
this where possible.

However, lightning does not install all its end-user files under out/,
so some headers cannot be found anywhere in out/ in an incremental build.
This header (opencl-c.h) is fetched out of the lightning source tree if
necessary.

Change-Id: I083d8b27bb39dd615fba3bb0711a789318f95e77
2019-11-16 03:10:28 -06:00
Laurent Morichetti 5774d9162b Fix a typo in INSN_S_ENDPGM_OPCODE encoding.
The correct s_endpgm instruction encoding is 0xBF810000.

Change-Id: I03f304762dcaced5bf3fa4f069da7a0b287d1cd2
2019-11-12 11:54:17 -08:00
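
The value quoted in the commit above can be cross-checked against the SOPP instruction layout in AMD's public ISA documents: a fixed 9-bit prefix in bits 31:23, a 7-bit opcode in bits 22:16, and a 16-bit immediate in bits 15:0. The following is a minimal sketch of that arithmetic; the helper name `encode_sopp` is hypothetical and is not taken from the ROCr sources.

```python
SOPP_PREFIX = 0b101111111  # fixed SOPP encoding field, bits [31:23]
S_ENDPGM_OP = 1            # s_endpgm opcode within the SOPP format


def encode_sopp(op: int, simm16: int = 0) -> int:
    """Pack a 32-bit SOPP instruction word (hypothetical helper)."""
    if not 0 <= op < (1 << 7):
        raise ValueError("SOPP opcode must fit in 7 bits")
    if not 0 <= simm16 < (1 << 16):
        raise ValueError("SIMM16 must fit in 16 bits")
    return (SOPP_PREFIX << 23) | (op << 16) | simm16


# Same constant name as in the commit subject; the value is derived here.
INSN_S_ENDPGM_OPCODE = encode_sopp(S_ENDPGM_OP)
print(hex(INSN_S_ENDPGM_OPCODE))  # 0xbf810000
```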
Ramesh Errabolu 086b4b686c Fix Aql Header initialization
Change-Id: If23c12b28d22fb62b673038890f40bb0f3c3948b
2019-11-06 16:13:58 -05:00
Qingchuan Shi 16a20cfb8c Adding code object list in loader.
Change-Id: Iab3541287bd56276fd32615ee59fcd590de84ca0
2019-10-30 20:31:51 -04:00
Jay Cornwall 78e754935c Merge debugger trap handler into ROCr trap handler
Debugger path is taken for (trap_id >= 3) and single step exceptions.
Other traps/exceptions behave as before.

Change-Id: I276c0eb69953709968353a57717ee017d22348a2
2019-10-30 13:56:06 -04:00
Sean Keely 851ee799c4 Correct strip command.
Strip should only apply to the output target library.  Symlinks
with .so endings which will be relocated during install will cause
strip to fail, aborting the build.

Change-Id: Ieb598c2cec5277d9d14c8afa88b91ca2c7f4412d
2019-10-30 01:24:43 -05:00
Sean Keely 6c3acda664 Adapt to new versioning.
Using branch point for count since last change since we don't
have questions answered on tags yet.

Removed unused CMake files.
Restructured CMake to use the cache rather than only commandline
and be ccmake & cmake-gui friendly.
Dependency search paths are added for the Repo tree layout.

Search paths still needed for install paths.

Simplified packaging.  hsa-ext-rocr-dev package and contents now
build from the package CMake rather than being 3 separate projects.

Not applying new version number or new install paths!

Change-Id: Ibea50dc8a6ab091e91857f78833f5379a4511547
2019-10-29 04:21:17 -05:00
Chris Freehill 53228ad819 Add some gfx1xxx targets
This is to fix 

Change-Id: I69a87884d8174733905e4c007cf0f19b5103482a
2019-10-16 09:01:22 -04:00
Jay Cornwall 906cd84186 Disable SDMA HDP flush on gfx10
Not currently functional, triggering SRBM write protection.

Change-Id: Ib0b832357e3df5a6a0d0b46648515ec9bd70f017
2019-09-14 14:08:47 -04:00
Jay Cornwall e0358d7dc2 Set MTYPE field in SDMA fence command on gfx10
This is the only SDMA command with an MTYPE field.

Change-Id: Ice146ace9c3e8e7aff038e1e004be73c070f48fe
2019-09-14 14:07:57 -04:00
Jay Cornwall 32a9a5dbb0 Add gfx1010, gfx1011, gfx1012 ELF types to loader
Change-Id: I23a1159fb10f60881ea6830ba13ee73bd373bfc9
2019-09-14 14:07:16 -04:00
Jay Cornwall 5b64fbd0e5 Implement code cache (SQC I$, SQC K$, TCP, GL1, GL2) invalidation for gfx10
Change-Id: I8b2a59118094fbb55e3f575fa9f79959d3725d7d
2019-09-14 14:06:31 -04:00
Jay Cornwall d1c5a079cd Add binary shaders for gfx10
Change-Id: Iaf586a15a2f2aebc266da5148aa8637b092c1002
2019-09-14 14:05:35 -04:00
Chris Freehill 0afe6618a6 Add gfx10,11,12 old to new name format conversion
Change-Id: I792c840d8d819d1d48f95fc4167b2e25c6beec23
2019-09-14 10:37:19 -04:00
Jay Cornwall f8d0ccd159 Support wave32/wave64 scratch allocations on gfx10
- Use new buffer resource descriptor layout
- Handle wave32 scratch allocation error from CP
- Make wavefront size a property of scratch allocation requests
- Repurpose wave64-specific amd_queue_t.scratch_workitem_byte_size field
- Clear index_stride field in V# on gfx10, calculated per-dispatch by CP

Change-Id: If2acdf6430772abd4d6a8c792fc8c11260764dda
2019-09-13 17:22:59 -04:00
Chris Freehill 08841faf4c Update addrlib to pull in gfx10
This is mostly un-edited from Perforce. We will make other required
edits in future commits.

Change-Id: I55a809f2f23f03d60e4dcd1fb947ad558e737027
2019-09-13 11:44:23 -04:00
Chris Freehill 0ec781478d Make gfx10 use OSS defined packet fields
Change-Id: Icf622c22a17005aaeafb24f80a414319bebb891f
2019-09-13 08:14:24 -04:00
Chris Freehill b104031628 Add gfx10 as a target ID
Change-Id: Ib9a78776af9f26ff9278a06b059cb8b7ee216ee2
2019-09-12 20:24:40 -05:00
Chris Freehill 6ebdad5896 Initial support for gfx1010, gfx1011, gfx1012
Change-Id: I9ec398070c85db08aea72947557c6e1b5f7d541d
2019-09-12 20:24:30 -05:00
Sean Keely f2599fccb6 Enable trap handler on APUs.
Change-Id: Ifdc8c2782498b3fbe238d773120d378c47918d07
2019-09-06 18:10:20 -04:00
Sean Keely f9d3796db8 Correct doorbell_queue_map allocation.
doorbell_queue_map should always be allocated or we will need to
add branches around all accesses.

Change-Id: I994c0eaf4be62c1a4a37bd06894272dba1fc1da6
2019-09-06 18:10:20 -04:00
Christian Sigg 00b0ee15b3 Add missing include to lazy_ptr.h
Change-Id: I5b061692a4ec6def631d7c3182e5b644b6b9c519
2019-09-05 02:44:27 -04:00
Christian Sigg 1f177cf9c2 Change #include of libelf.h from quote to angle.
Change-Id: Ie940ed0f78e95224e42978381c552861e6d58ee4
2019-09-05 02:43:54 -04:00
Christian Sigg 912c23a6d5 Adding missing includes to sdma_registers.h
Change-Id: Idb2a54f45c810508ae0ebac0ca12853df8025c7a
2019-09-04 20:15:13 -04:00
Sean Keely ec5ac95dce Remove sdma ts pool.
sdma end ts must be 256 bit aligned in oss 3.0 and prior.  Using
the ts pool requires copying into the signal and is a significant
performance penalty for small copies.

SharedSignal is 128 bytes due to alignment so can host the end ts.
Move sdma end ts into SharedSignal and remove ts pool and ts copy.

Change-Id: I7899bda36ebc9adcaad1d3a3d2b7a489857cc9e8
2019-08-29 20:24:05 -05:00
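
The alignment argument in the commit above can be illustrated with a small check. This is a sketch under stated assumptions, not the ROCr implementation: it assumes each SharedSignal starts on a 128-byte boundary (the commit only says the struct is 128 bytes "due to alignment"), and the field offsets used are hypothetical.

```python
TS_ALIGN_BYTES = 256 // 8   # 256-bit alignment required for the SDMA end ts
SHARED_SIGNAL_SIZE = 128    # SharedSignal size stated in the commit


def end_ts_slot_ok(signal_base: int, field_offset: int) -> bool:
    """Return True if a timestamp at signal_base + field_offset is
    256-bit aligned and starts inside the signal (hypothetical helper)."""
    # Assumption: the signal itself is aligned to its 128-byte size.
    if signal_base % SHARED_SIGNAL_SIZE != 0:
        return False
    if not 0 <= field_offset < SHARED_SIGNAL_SIZE:
        return False
    return (signal_base + field_offset) % TS_ALIGN_BYTES == 0


# With a 128-byte-aligned base, any offset that is a multiple of 32 bytes
# satisfies the requirement; other offsets do not.
print(end_ts_slot_ok(0x1000, 32))  # True
print(end_ts_slot_ok(0x1000, 8))   # False
```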
Sean Keely 5adb73fffd Allow default kernel to spin freely at first.
Impacts GPU_ONLY signal type latency when waiting for small operations.
Using this type improves total SDMA small copy performance by ~40% if
the signal is allowed to spin freely.

Change-Id: I27aa128c63a1bacb3f51fb08f166e4e1d6fef651
2019-08-29 02:46:56 -05:00
Sean Keely ea8c99f452 Correct copy completion signal handling.
Remove agent lookup in time stamp translation for IPC signals.  The copy
agent handle is not shared so does not need to be checked for cross
process use.  Cross process copy-timestamp read is illegal and continues
to deliver garbage.

Store the copy agent properly when doing CPU-CPU copies.

Change-Id: Ib4008f66ff866922047749dd556c84a32021c1fd
2019-08-29 02:46:56 -05:00
Sean Keely 8133563a93 Enable HDP flush for all gfx9+ clients.
ucode versions are per asic so not valid for feature enablement outside
of bringup/dev.  Feature is older than the latest ioctl change that
the thunk depends on so use of this patch with kernel packages that
don't contain the feature is not possible in a supported environment.

Change-Id: I36b14176a7d642017ef1518aeade454b0f3dc749
2019-08-29 02:46:56 -05:00
Sean Keely 4647a5454d Allow concurrent copies in blit kernel path.
Also removed an unnecessary cache flush in dependency barrier packet.

Change-Id: I573df3bdf0a10df0bcd78025672c44038f8091ff
2019-08-29 02:46:56 -05:00
Ramesh Errabolu 8864c188b4 Initial support for xgmi sdma queues
Change-Id: I1aee379c7b9eede5f4b913cf2f9af3abb32e5baa
2019-08-24 02:03:37 -04:00
Sean Keely 324e0e5e0a Correct ROCr library path in rocrtst.
Change-Id: I3624f37e256a0b61f55b1eb1ae48dabd87481b5f
2019-08-23 19:29:30 -04:00
Sean Keely f343f6706e Report PCIe domain number.
Adds HSA_AMD_AGENT_INFO_DOMAIN.

Change-Id: I2ffcae474e18b2fe5f962b499e02eb9dfe2e62cd
2019-08-23 19:28:37 -04:00
Ramesh Errabolu 3201f68f72 Update memory allocation guide in using pool apis
This is to allow allocations in system memory that exceed sizes
reported by a CPU device

Change-Id: I3d10d192aafcefbe4107f69b7c5e30bf7f836619
2019-08-23 14:55:40 -04:00
Konstantin Zhuravlyov 2275c74695 Loader: add basic logging abilities
- Enabled with env var LOADER_ENABLE_LOGGING=1

Change-Id: Ibdbb1b55ffddb7dc9c63e52fc9db3013409376a4
2019-08-21 13:29:15 -04:00
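
The commit above names the environment variable but not how its value is parsed. A minimal sketch of such a switch follows, assuming that only the exact value `1` enables logging; the function name `loader_logging_enabled` is hypothetical and does not come from the loader.

```python
import os


def loader_logging_enabled(environ=None) -> bool:
    """Return True when LOADER_ENABLE_LOGGING is set to "1".

    Assumption: any other value, or an unset variable, leaves logging off.
    The real loader may parse the variable differently.
    """
    if environ is None:
        environ = os.environ
    return environ.get("LOADER_ENABLE_LOGGING") == "1"


print(loader_logging_enabled({"LOADER_ENABLE_LOGGING": "1"}))  # True
print(loader_logging_enabled({}))                              # False
```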
Jay Cornwall ad717d2e98 Support KFD interrupt protocol in second-level trap handler
If M0[23] is set then the driver will interpret the interrupt as a
debug event, rather than a signal event.

Clear M0 before sending the interrupt. All paths here are terminal so
it's not necessary to save/restore M0.

Change-Id: Ibd85b8cc6f8556941f2308a2c3fa3c68702cd606
2019-08-08 15:16:15 -05:00
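
The bit test described in the commit above can be sketched as follows. This only models the classification rule stated in the message (bit 23 of M0 selects a debug event); the function and constant names are hypothetical, and the actual handler is shader assembly, not Python.

```python
M0_DEBUG_EVENT_BIT = 23  # bit position named in the commit message


def classify_interrupt(m0: int) -> str:
    """Classify an interrupt from the M0 value sent with it (hypothetical)."""
    if (m0 >> M0_DEBUG_EVENT_BIT) & 1:
        return "debug event"
    return "signal event"


# Clearing M0 before sending the interrupt yields a signal event.
print(classify_interrupt(0))        # signal event
print(classify_interrupt(1 << 23))  # debug event
```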
Ramesh Errabolu a043c6acbb Add override qualifier to CPU and GPU agent api
Change-Id: I930e29d671b5dc81dece6f910d611056a54d2c85
2019-08-06 18:13:26 -05:00
Ramesh Errabolu 4a0d50f415 Handle thread creation error correctly
Change-Id: Iaa8811e245aa20ac107aef104847df3e455518f1
2019-08-05 15:39:54 -04:00
Konstantin Zhuravlyov 7d8205548b Allow ccache enabled builds if -DROCM_CCACHE_BUILD=ON
Change-Id: Ie3ebb5d95af5fa55f11c9c88378ab29736538e25
2019-08-01 14:33:38 -04:00
Chris Freehill 04da198a31 Fix gfx908 build
Includes
-"Report SRAM ECC errors through the system event handler."
  (bf6af52892b1f677697309a7946f651bfe8e9061)
-"Fix async memory test; temporarily disable NUMA memory test"
  (626f13a88b)
-Temporary disable of rocrtst

Change-Id: Ide9fc999b01ab810f00a56fc3733d07be45117c7
2019-07-23 07:37:40 -04:00
Chris Freehill 6588165de1 gfx908 loader/isa related changes
Change-Id: I638d4b2b300ac5a99d4d31d4fadcfe9e1e3c7748
2019-07-23 03:41:27 -04:00
Chris Freehill 2c15bcac9d Add ISAREG entry for gfx908 for ECC not supported
* Also, re-enable rocrtst

Change-Id: I70106c5a1788818387e46f240d577cbe59bc89f4
2019-07-22 21:50:09 -04:00
Chris Freehill 447a30e985 Initial gfx908 updates
Change-Id: I3d6307d6613a38861a95561b9ac68abaa5964b48
2019-07-22 17:25:06 -04:00
Sean Keely 0721dfd2e7 Update README build instructions.
Change-Id: I595e629117adfb44afb2e829d1f975782238277e
2019-07-19 14:17:47 -04:00
Sean Keely 4fafdcb00c Add deallocation callback test to rocrtst.
Change-Id: Ia20abd8f1f64213eea0c3c1c771cc229cf38fd5d
2019-07-19 14:17:21 -04:00
Sean Keely 6e07bc8dc4 Adjust agentOwner in pointer info queries for locked memory.
agentOwner from thunk reflects the GPU which holds the device alias.
We need to return a CPU to better reflect that the memory is system memory.

Change-Id: I9233f8779a4bfd471f68dbbbce07ae4528412e18
2019-07-19 14:17:13 -04:00
Sean Keely 465a8eb40b PR from github user DiamondLovesYou.
Allow user specified profiles if the HSAIL note is not found.

Konstantin reviewed and approved.  HSAIL note is not generated by LLVM.

Change-Id: I40fbfbaedd6787b6a716507918f698d02007afe1
2019-07-16 13:55:38 -05:00
Ramesh Errabolu 4daee0c8a1 Allocate fine-grained regions for Gpu devices that are members of Hives
Change-Id: Ibbed393aeac691793845d16d2f3fe2c3e5a7ec40
2019-07-13 01:12:53 -04:00
Chris Freehill d699039284 Make build_rocrtst.sh build all target kernels by default
This will allow the default target list to be branch
specific.

Change-Id: If8ecc14e2b7fb5ed2eb25ab447480308d539b248
2019-07-05 19:30:07 -04:00