Commit Graph

507 Commits

Author SHA1 Message Date
Chris Freehill f2023220fd Initial support for gfx1010, gfx1011, gfx1012
Change-Id: I9ec398070c85db08aea72947557c6e1b5f7d541d


[ROCm/ROCR-Runtime commit: 6ebdad5896]
2019-09-12 20:24:30 -05:00
Sean Keely 286cf8f732 Enable trap handler on APUs.
Change-Id: Ifdc8c2782498b3fbe238d773120d378c47918d07


[ROCm/ROCR-Runtime commit: f2599fccb6]
2019-09-06 18:10:20 -04:00
Sean Keely 9c6f904413 Correct doorbell_queue_map allocation.
doorbell_queue_map should always be allocated or we will need to
add branches around all accesses.

Change-Id: I994c0eaf4be62c1a4a37bd06894272dba1fc1da6


[ROCm/ROCR-Runtime commit: f9d3796db8]
2019-09-06 18:10:20 -04:00
Christian Sigg c28aadf5a8 Add missing include to lazy_ptr.h
Change-Id: I5b061692a4ec6def631d7c3182e5b644b6b9c519


[ROCm/ROCR-Runtime commit: 00b0ee15b3]
2019-09-05 02:44:27 -04:00
Christian Sigg e17c7e24d6 Change #include of libelf.h from quote to angle.
Change-Id: Ie940ed0f78e95224e42978381c552861e6d58ee4


[ROCm/ROCR-Runtime commit: 1f177cf9c2]
2019-09-05 02:43:54 -04:00
Christian Sigg dea46036d3 Adding missing includes to sdma_registers.h
Change-Id: Idb2a54f45c810508ae0ebac0ca12853df8025c7a


[ROCm/ROCR-Runtime commit: 912c23a6d5]
2019-09-04 20:15:13 -04:00
Sean Keely 4edf1a4cf1 Remove sdma ts pool.
sdma end ts must be 256 bit aligned in oss 3.0 and prior.  Using
the ts pool requires copying into the signal and is a significant
performance penalty for small copies.

SharedSignal is 128 bytes due to alignment so can host the end ts.
Move sdma end ts into SharedSignal and remove ts pool and ts copy.

Change-Id: I7899bda36ebc9adcaad1d3a3d2b7a489857cc9e8


[ROCm/ROCR-Runtime commit: ec5ac95dce]
2019-08-29 20:24:05 -05:00
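
The alignment argument in the commit above can be checked with a small layout sketch. `DemoSharedSignal` and its fields are hypothetical stand-ins, not the runtime's actual `SharedSignal` definition; the only facts taken from the message are the 256-bit (32-byte) alignment requirement and the 128-byte total size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: 64 bytes of signal state, then padding that the
// 64-byte struct alignment forces anyway. The end timestamp reuses that space.
struct alignas(64) DemoSharedSignal {
  uint8_t signal_state[64];          // placeholder for the signal's own fields
  alignas(32) uint64_t sdma_end_ts;  // 256-bit aligned slot for the copy engine
};

static_assert(sizeof(DemoSharedSignal) == 128, "alignment pads the struct to 128 bytes");
static_assert(offsetof(DemoSharedSignal, sdma_end_ts) % 32 == 0, "slot is 256-bit aligned");

// Returns the timestamp slot, checking the alignment the engine would need.
inline uint64_t* EndTimestampSlot(DemoSharedSignal& signal) {
  uint64_t* slot = &signal.sdma_end_ts;
  assert(reinterpret_cast<std::uintptr_t>(slot) % 32 == 0);
  return slot;
}
```

Because the timestamp lives inside the signal in this sketch, no separate pool allocation or copy back into the signal is needed.
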
Sean Keely b66e58a053 Allow default kernel to spin freely at first.
Impacts GPU_ONLY signal type latency when waiting for small operations.
Using this type improves total SDMA small copy performance by ~40% if
the signal is allowed to spin freely.

Change-Id: I27aa128c63a1bacb3f51fb08f166e4e1d6fef651


[ROCm/ROCR-Runtime commit: 5adb73fffd]
2019-08-29 02:46:56 -05:00
Sean Keely 919e5dc802 Correct copy completion signal handling.
Remove agent lookup in time stamp translation for IPC signals.  The copy
agent handle is not shared so does not need to be checked for cross
process use.  Cross process copy-timestamp read is illegal and continues
to deliver garbage.

Store the copy agent properly when doing CPU-CPU copies.

Change-Id: Ib4008f66ff866922047749dd556c84a32021c1fd


[ROCm/ROCR-Runtime commit: ea8c99f452]
2019-08-29 02:46:56 -05:00
Sean Keely e00fc1b0a2 Enable HDP flush for all gfx9+ clients.
ucode versions are per asic so not valid for feature enablement outside
of bringup/dev.  Feature is older than the latest ioctl change that
the thunk depends on so use of this patch with kernel packages that
don't contain the feature is not possible in a supported environment.

Change-Id: I36b14176a7d642017ef1518aeade454b0f3dc749


[ROCm/ROCR-Runtime commit: 8133563a93]
2019-08-29 02:46:56 -05:00
Sean Keely 92575fd1c5 Allow concurrent copies in blit kernel path.
Also removed an unnecessary cache flush in dependency barrier packet.

Change-Id: I573df3bdf0a10df0bcd78025672c44038f8091ff


[ROCm/ROCR-Runtime commit: 4647a5454d]
2019-08-29 02:46:56 -05:00
Ramesh Errabolu 08e994db50 Initial support for xgmi sdma queues
Change-Id: I1aee379c7b9eede5f4b913cf2f9af3abb32e5baa


[ROCm/ROCR-Runtime commit: 8864c188b4]
2019-08-24 02:03:37 -04:00
Sean Keely 5e5b7fac71 Correct ROCr library path in rocrtst.
Change-Id: I3624f37e256a0b61f55b1eb1ae48dabd87481b5f


[ROCm/ROCR-Runtime commit: 324e0e5e0a]
2019-08-23 19:29:30 -04:00
Sean Keely 5df1a8ae77 Report PCIe domain number.
Adds HSA_AMD_AGENT_INFO_DOMAIN.

Change-Id: I2ffcae474e18b2fe5f962b499e02eb9dfe2e62cd


[ROCm/ROCR-Runtime commit: f343f6706e]
2019-08-23 19:28:37 -04:00
Ramesh Errabolu 61b9d4e8b2 Update memory allocation guide in using pool apis
This is to allow allocations in system memory that exceed sizes
reported by a CPU device

Change-Id: I3d10d192aafcefbe4107f69b7c5e30bf7f836619


[ROCm/ROCR-Runtime commit: 3201f68f72]
2019-08-23 14:55:40 -04:00
Konstantin Zhuravlyov 2b9e13a56c Loader: add basic logging abilities
- Enabled with env var LOADER_ENABLE_LOGGING=1

Change-Id: Ibdbb1b55ffddb7dc9c63e52fc9db3013409376a4


[ROCm/ROCR-Runtime commit: 2275c74695]
2019-08-21 13:29:15 -04:00
Jay Cornwall 87c13b8a7d Support KFD interrupt protocol in second-level trap handler
If M0[23] is set then the driver will interpret the interrupt as a
debug event, rather than a signal event.

Clear M0 before sending the interrupt. All paths here are terminal so
it's not necessary to save/restore M0.

Change-Id: Ibd85b8cc6f8556941f2308a2c3fa3c68702cd606


[ROCm/ROCR-Runtime commit: ad717d2e98]
2019-08-08 15:16:15 -05:00
Ramesh Errabolu 8f58e11c31 Add override qualifier to CPU and GPU agent api
Change-Id: I930e29d671b5dc81dece6f910d611056a54d2c85


[ROCm/ROCR-Runtime commit: a043c6acbb]
2019-08-06 18:13:26 -05:00
Ramesh Errabolu 568620e42e Handle thread creation error correctly
Change-Id: Iaa8811e245aa20ac107aef104847df3e455518f1


[ROCm/ROCR-Runtime commit: 4a0d50f415]
2019-08-05 15:39:54 -04:00
Konstantin Zhuravlyov 466099ec20 Allow ccache enabled builds if -DROCM_CCACHE_BUILD=ON
Change-Id: Ie3ebb5d95af5fa55f11c9c88378ab29736538e25


[ROCm/ROCR-Runtime commit: 7d8205548b]
2019-08-01 14:33:38 -04:00
Chris Freehill 3cd4461a7d gfx908 loader/isa related changes
Change-Id: I638d4b2b300ac5a99d4d31d4fadcfe9e1e3c7748


[ROCm/ROCR-Runtime commit: 6588165de1]
2019-07-23 03:41:27 -04:00
Chris Freehill 123dea7733 Add ISAREG entry for gfx908 for ECC not supported
* Also, re-enable rocrtst

Change-Id: I70106c5a1788818387e46f240d577cbe59bc89f4


[ROCm/ROCR-Runtime commit: 2c15bcac9d]
2019-07-22 21:50:09 -04:00
Chris Freehill a87ff82cad Initial gfx908 updates
Change-Id: I3d6307d6613a38861a95561b9ac68abaa5964b48


[ROCm/ROCR-Runtime commit: 447a30e985]
2019-07-22 17:25:06 -04:00
Sean Keely e2375b9328 Update README build instructions.
Change-Id: I595e629117adfb44afb2e829d1f975782238277e


[ROCm/ROCR-Runtime commit: 0721dfd2e7]
2019-07-19 14:17:47 -04:00
Sean Keely 3959e99131 Add deallocation callback test to rocrtst.
Change-Id: Ia20abd8f1f64213eea0c3c1c771cc229cf38fd5d


[ROCm/ROCR-Runtime commit: 4fafdcb00c]
2019-07-19 14:17:21 -04:00
Sean Keely b66fecd12f Adjust agentOwner in pointer info queries for locked memory.
agentOwner from thunk reflects the GPU which holds the device alias.
We need to return a CPU to better reflect that the memory is system memory.

Change-Id: I9233f8779a4bfd471f68dbbbce07ae4528412e18


[ROCm/ROCR-Runtime commit: 6e07bc8dc4]
2019-07-19 14:17:13 -04:00
Sean Keely 49e70a3ef5 PR from github user DiamondLovesYou.
Allow user specified profiles if the HSAIL note is not found.

Konstantin reviewed and approved.  HSAIL note is not generated by LLVM.

Change-Id: I40fbfbaedd6787b6a716507918f698d02007afe1


[ROCm/ROCR-Runtime commit: 465a8eb40b]
2019-07-16 13:55:38 -05:00
Ramesh Errabolu 9364c7ac0e Allocate fine-grained regions for Gpu devices that are members of Hives
Change-Id: Ibbed393aeac691793845d16d2f3fe2c3e5a7ec40


[ROCm/ROCR-Runtime commit: 4daee0c8a1]
2019-07-13 01:12:53 -04:00
Chris Freehill 290dfd785f Make build_rocrtst.sh build all target kernels by default
This will allow the default target list to be branch
specific.

Change-Id: If8ecc14e2b7fb5ed2eb25ab447480308d539b248


[ROCm/ROCR-Runtime commit: d699039284]
2019-07-05 19:30:07 -04:00
Jay Cornwall 60da601be4 Handle traps, illegal instruction, memory violations through queue signal
Report traps and fatal exceptions through a wavefront's
amd_queue_t.queue_inactive_signal. Previously, only traps were
reported and required the compiler to pass in the signal pointer
in s[0:1].

The signal is obtained through a mapping from doorbell index to
amd_queue_t*. The doorbell is fetched within a wavefront through
the gfx9+ S_SENDMSG(MSG_GET_DOORBELL) instruction.

Change-Id: I319b45f2e15dfcfe4db8f4065da1136e9539a42b


[ROCm/ROCR-Runtime commit: ff8f439112]
2019-07-01 22:59:41 -04:00
Jay Cornwall 822d838eae Replace gfx9 SP3 trap handler with LLVM, fix IB_STS restore
Assembler toolchains are moving from SP3 to LLVM. Replace trap handler
source code with LLVM equivalent.

Fix a trap issue with SQ_WAVE_IB_STS restore. Mostly harmless as all
traps are currently considered fatal to the wavefront.

Change-Id: Iacecd9dd31a1d96a083c8b8327f442f33c861f9f


[ROCm/ROCR-Runtime commit: 6ed686ee29]
2019-07-01 22:59:27 -04:00
Chris Freehill 970cca3731 Temporarily disable Debug test
Change-Id: Iabb238fcd78b9c2eb0c085b19ab93b8c9e538140


[ROCm/ROCR-Runtime commit: 8caa6c0b01]
2019-06-29 04:55:35 -04:00
Sean Keely 872c359ba2 Initial support for deallocation callbacks.
Adds hsa_amd_register_deallocation_callback and hsa_amd_deregister_deallocation_callback
to notify when HSA memory has been released.

Change-Id: I1f33cee250ca890e5c2e7fddfa4479aa5874651d


[ROCm/ROCR-Runtime commit: 299874f17d]
2019-06-26 04:12:17 -05:00
Chris Freehill 9d70b6a420 rocrtst fixes for hsa_signal cleanup and aql packet dispatch
In several places aql packets were written to queue all at once
instead of doing the header atomically. These cases have been
fixed.

A few leaked hsa_signals have been addressed.

There was some duplication of code that has been addressed.

Addresses ROCMOPS-456

Change-Id: Ia1869bc370f92e49ac560301df47741d5f76978e


[ROCm/ROCR-Runtime commit: 081a2cc875]
2019-06-21 17:34:10 -05:00
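
The header-last pattern that the rocrtst fix above refers to can be sketched as follows. `DemoPacket`, `kInvalid`, `kDispatch`, and `PublishPacket` are illustrative names, and the struct is a simplified stand-in rather than the real AQL packet layout (a real header also carries fence-scope bits). The idea is to fill the packet body first and publish it with a single release store to the header.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Simplified stand-in for a 64-byte AQL packet slot (not the real HSA layout).
struct DemoPacket {
  std::atomic<uint16_t> header;  // holds only the packet type in this sketch
  uint16_t setup;
  uint32_t payload[15];
};
static_assert(sizeof(DemoPacket) == 64, "demo slot should match the AQL packet size");

constexpr uint16_t kInvalid = 1;   // mirrors HSA_PACKET_TYPE_INVALID
constexpr uint16_t kDispatch = 2;  // mirrors HSA_PACKET_TYPE_KERNEL_DISPATCH

// Write the body first, then store the header last with release ordering, so a
// consumer that observes the new header also observes the completed body.
inline void PublishPacket(DemoPacket& slot, uint16_t setup, uint32_t value) {
  assert(slot.header.load(std::memory_order_relaxed) == kInvalid);
  slot.setup = setup;
  for (uint32_t& word : slot.payload) word = value;
  slot.header.store(kDispatch, std::memory_order_release);
}
```

Writing the whole packet in one plain copy would let the packet processor see a valid header before the rest of the body is visible, which is the failure mode the commit describes.
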
Evgeny 87cdf00d09 aqlprofile api fix
Change-Id: I2a710040422c7853ece5472ea776442b25d69dcb


[ROCm/ROCR-Runtime commit: 6c0aaa2773]
2019-06-19 23:14:27 -04:00
Sean Keely 904723af7c Fix IPC related hangs/faults in rocrtst.
IPC was failing due to calling fork when HSA was open.  The fix
was correcting incomplete cleanup in several other tests.

TestBase::Close (via CommonCleanUp) now checks that HSA is properly
closed between tests.

rocrtstPerf.Memory_Async_Copy uses hwloc which uses OpenCL which
has no shutdown routine.  Consequently this test cannot clean up
properly.  I added a hack to force HSA refcount to the value
it should have if OpenCL were cleaning up but this leaks resources
and potentially puts hwloc & OpenCL in a bad state.

OpenCL loads LLVM which installs some exit handlers.  Those handlers
can't execute in a child process and can't be removed since OpenCL
doesn't cleanup.  IPC hacks around this by aborting rather than exiting
in the child process.

Change-Id: I92326a73d7b11632208717d99728e6dafdc7d3ca


[ROCm/ROCR-Runtime commit: bb980462e7]
2019-06-19 01:03:52 -04:00
Sean Keely 5d5d40fcf9 PTHREAD_STACK_MIN may differ from system parameters.
Restrict stack adjustment to non-default stack requests and allow
stack growth within reason (20MB cutoff).

Change-Id: I320280c711402ac29683e94c7246b7c32c797611


[ROCm/ROCR-Runtime commit: 0c0e634458]
2019-06-17 21:04:17 -05:00
Sean Keely ca44cbb3d9 Revert to SystemClockCounter for HSA system time.
CPUClockCounter is not NTP adjusted (CLOCK_MONOTONIC_RAW) so should be 
better for measurements.  However, it is implemented with syscall while
CLOCK_MONOTONIC is implemented via vDSO.  The latency increase becomes
significant when language layers make corresponding clock measurements.
Reverting to CLOCK_MONOTONIC will reduce latency and allow small
duration events to be measured at the cost of incorporating NTP
frequency skew errors.  NTP may adjust frequency by 500ppm so limits us
to ~3 decimals in elapsed time.

Change-Id: I920b9f707f47109d80d6c256c475638c03fb8d76


[ROCm/ROCR-Runtime commit: 4b22d24346]
2019-06-17 21:07:26 -04:00
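
The "~3 decimals" figure in the commit above follows from the 500 ppm bound. A minimal arithmetic sketch, with illustrative helper names that are not part of the runtime:

```cpp
#include <cassert>
#include <cmath>

// Worst-case absolute error (same unit as `elapsed`) for a clock whose rate
// may be off by up to `ppm` parts per million.
inline double WorstCaseSkew(double elapsed, double ppm) {
  assert(elapsed >= 0.0 && ppm >= 0.0);
  return elapsed * ppm * 1e-6;
}

// Number of significant decimal digits that survive a relative error of `ppm`.
inline double TrustworthyDigits(double ppm) {
  assert(ppm > 0.0);
  return -std::log10(ppm * 1e-6);
}
```

With `ppm = 500` the relative error is 5e-4, so `TrustworthyDigits(500.0)` is about 3.3 and a one-second interval can be off by up to roughly half a millisecond.
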
Chris Freehill 74eb2440c3 Temporarily disable some failing tests
Change-Id: Iee713bb963db812c36ce2568aee2a4f8409c52e5


[ROCm/ROCR-Runtime commit: 259a1bac18]
2019-06-14 08:36:11 -05:00
Sean Keely ba3ec88220 Fix description of HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL.
Description was inconsistent with itself and code.  Existing behavior
returns HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL == true for system
memory pools only and system memory pools do require hsa_amd_agents_allow_access.

Change-Id: I64b287bff9fdb21688aa169296e410edf1b209b5


[ROCm/ROCR-Runtime commit: bbb90bdfc9]
2019-06-11 01:45:22 -04:00
Evgeny e07cc81005 aqlprofile API: sdma blocks
Change-Id: I619af8adc17706f808644180cdd5a5c785e052ec


[ROCm/ROCR-Runtime commit: a06d96cef8]
2019-06-05 18:54:08 -05:00
Evgeny f3b7848904 adding new trace API
Change-Id: I6c83b5789f5a6cdbb574d041c40d5a47229c7f1a


[ROCm/ROCR-Runtime commit: 1be9298f72]
2019-06-01 14:33:59 -04:00
Matt Arsenault 1379fea626 Don't check VERSION_BUILD is defined
Check if it is true or not. The string() call would define this to an
empty string, which would pass. This would then leave a trailing -
in the version string, which dpkg would error on during package
installation.

Change-Id: Ifb5fc15f5dde506e96bff7881a5d3f22d983406e


[ROCm/ROCR-Runtime commit: 0016c6ce5b]
2019-05-29 11:09:31 -04:00
Sean Keely b754622b33 Allow hsa_status_string when HSA is closed.
API is a stateless lookup of RO data and needed to interpret
hsa_init error codes.

Change-Id: If80cba2f697843d08e529da0f790acf3c37127a7


[ROCm/ROCR-Runtime commit: 22de0e7fb9]
2019-05-24 22:40:03 -04:00
Sean Keely b9c2754101 Add exception and error safety for CreateThread.
Change-Id: I82aaf64e039ca9614b4948deec1f87147f56279a


[ROCm/ROCR-Runtime commit: 9f81bdfbe1]
2019-05-24 22:39:55 -04:00
Matt Arsenault 0bf3b480ee Change include flag order
Search the local src directories first. If using a system
installed hsakmt, this would pick the installed hsa headers.

Change-Id: I9746d6e9db1749a130e4d93e024556754a537083


[ROCm/ROCR-Runtime commit: 22d29b55a4]
2019-05-22 16:43:18 -07:00
Sean Keely f336a19a0f Correct pthread join/detach handling.
Joined threads can not be joined more than once nor can they be detached.
Thread library wait and close allows multiple waits and separate close so
this fixes the pthread implementation.

Change-Id: I0019271a438f11ed4c6c11854011f5c4f6e16b65


[ROCm/ROCR-Runtime commit: a913549190]
2019-05-16 12:14:06 -05:00
Sean Keely bdda9b4f0e Correlate errors for time stamps which predate process start.
Small times may be given to time conversion if GPU clocks are used to
accumulate elapsed time.  Because HSA APIs deal in absolute time this
leads to large conversion offsets of order system uptime.  Variation
in relative clock ratio estimation may be amplified in this case,
destroying elapsed time measurements.

This patch fixes the relative clock ratio used for times which predate
the call to hsa_init.  This correlates errors in such times allowing
the elapsed time to be correctly computed.

The effective maximum system uptime before elapsed time conversion becomes
inaccurate is ~3.5 months.  GPU event timestamps are good for process uptime
of ~3.5 months.  These are limited by double's mantissa precision.

Change-Id: I48752ff354920439d91016d6f2b0c8ddfa60b445


[ROCm/ROCR-Runtime commit: 6e2a056e1b]
2019-05-14 17:35:06 -04:00
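
The "~3.5 months" limit attributed to double's mantissa can be reproduced with a short calculation. This sketch assumes timestamps are counted in nanoseconds, which the commit message does not state; the helper names are illustrative.

```cpp
#include <cassert>
#include <limits>

// Largest tick count up to which every integer is exactly representable in a
// double: 2^53 for IEEE 754 binary64 (53 mantissa digits).
constexpr double MaxExactTicks() {
  return static_cast<double>(1ULL << std::numeric_limits<double>::digits);
}

// Days until a counter running at `ticks_per_second` exceeds that limit.
inline double DaysUntilPrecisionLoss(double ticks_per_second) {
  assert(ticks_per_second > 0.0);
  return MaxExactTicks() / ticks_per_second / 86400.0;
}
```

At one tick per nanosecond, `DaysUntilPrecisionLoss(1e9)` comes to a little over 104 days, in line with the figure quoted in the message.
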
Sean Keely ec39134408 Expose HDP flush registers.
Exposed via agent info query.  Only valid if fine grain PCIe memory is enabled.

Change-Id: Ib4770901592ec047276458926a947737f9b93bb5


[ROCm/ROCR-Runtime commit: 06376e726b]
2019-05-11 00:04:47 -04:00
Sean Keely 5b71bc65b7 Patch from github.
At the moment it is not possible to build ROCr with Clang. This is
a spurious limitation. The present PR addresses it by guarding GCC
only flags and by fixing some additional warnings that Clang triggers;
one of said warnings did outline a rather interesting issue with math
being done on void*s. - AlexVlx

Void ptr arithmetic had already been fixed in amd-master branch.

Change-Id: I5ee97e20b5c40b10dd73facecabe75f02ba46462


[ROCm/ROCR-Runtime commit: e89f9807f1]
2019-04-29 16:17:24 -04:00