Commit graph

361 commits

Author SHA1 Message Date
Chris Freehill f2023220fd Initial support for gfx1010, gfx1011, gfx1012
Change-Id: I9ec398070c85db08aea72947557c6e1b5f7d541d


[ROCm/ROCR-Runtime commit: 6ebdad5896]
2019-09-12 20:24:30 -05:00
Sean Keely 286cf8f732 Enable trap handler on APUs.
Change-Id: Ifdc8c2782498b3fbe238d773120d378c47918d07


[ROCm/ROCR-Runtime commit: f2599fccb6]
2019-09-06 18:10:20 -04:00
Sean Keely 9c6f904413 Correct doorbell_queue_map allocation.
doorbell_queue_map should always be allocated or we will need to
add branches around all accesses.

Change-Id: I994c0eaf4be62c1a4a37bd06894272dba1fc1da6


[ROCm/ROCR-Runtime commit: f9d3796db8]
2019-09-06 18:10:20 -04:00
Christian Sigg c28aadf5a8 Add missing include to lazy_ptr.h
Change-Id: I5b061692a4ec6def631d7c3182e5b644b6b9c519


[ROCm/ROCR-Runtime commit: 00b0ee15b3]
2019-09-05 02:44:27 -04:00
Christian Sigg e17c7e24d6 Change #include of libelf.h from quote to angle.
Change-Id: Ie940ed0f78e95224e42978381c552861e6d58ee4


[ROCm/ROCR-Runtime commit: 1f177cf9c2]
2019-09-05 02:43:54 -04:00
Christian Sigg dea46036d3 Adding missing includes to sdma_registers.h
Change-Id: Idb2a54f45c810508ae0ebac0ca12853df8025c7a


[ROCm/ROCR-Runtime commit: 912c23a6d5]
2019-09-04 20:15:13 -04:00
Sean Keely 4edf1a4cf1 Remove sdma ts pool.
sdma end ts must be 256 bit aligned in oss 3.0 and prior.  Using
the ts pool requires copying into the signal and is a significant
performance penalty for small copies.

SharedSignal is 128 bytes due to alignment so can host the end ts.
Move sdma end ts into SharedSignal and remove ts pool and ts copy.

Change-Id: I7899bda36ebc9adcaad1d3a3d2b7a489857cc9e8


[ROCm/ROCR-Runtime commit: ec5ac95dce]
2019-08-29 20:24:05 -05:00
Sean Keely b66e58a053 Allow default kernel to spin freely at first.
Impacts GPU_ONLY signal type latency when waiting for small operations.
Using this type improves total SDMA small copy performance by ~40% if
the signal is allowed to spin freely.

Change-Id: I27aa128c63a1bacb3f51fb08f166e4e1d6fef651


[ROCm/ROCR-Runtime commit: 5adb73fffd]
2019-08-29 02:46:56 -05:00
Sean Keely 919e5dc802 Correct copy completion signal handling.
Remove agent lookup in time stamp translation for IPC signals.  The copy
agent handle is not shared so does not need to be checked for cross
process use.  Cross process copy-timestamp read is illegal and continues
to deliver garbage.

Store the copy agent properly when doing CPU-CPU copies.

Change-Id: Ib4008f66ff866922047749dd556c84a32021c1fd


[ROCm/ROCR-Runtime commit: ea8c99f452]
2019-08-29 02:46:56 -05:00
Sean Keely e00fc1b0a2 Enable HDP flush for all gfx9+ clients.
ucode versions are per asic so not valid for feature enablement outside
of bringup/dev.  Feature is older than the latest ioctl change that
the thunk depends on so use of this patch with kernel packages that
don't contain the feature is not possible in a supported environment.

Change-Id: I36b14176a7d642017ef1518aeade454b0f3dc749


[ROCm/ROCR-Runtime commit: 8133563a93]
2019-08-29 02:46:56 -05:00
Sean Keely 92575fd1c5 Allow concurrent copies in blit kernel path.
Also removed an unnecessary cache flush in dependency barrier packet.

Change-Id: I573df3bdf0a10df0bcd78025672c44038f8091ff


[ROCm/ROCR-Runtime commit: 4647a5454d]
2019-08-29 02:46:56 -05:00
Ramesh Errabolu 08e994db50 Initial support for xgmi sdma queues
Change-Id: I1aee379c7b9eede5f4b913cf2f9af3abb32e5baa


[ROCm/ROCR-Runtime commit: 8864c188b4]
2019-08-24 02:03:37 -04:00
Sean Keely 5df1a8ae77 Report PCIe domain number.
Adds HSA_AMD_AGENT_INFO_DOMAIN.

Change-Id: I2ffcae474e18b2fe5f962b499e02eb9dfe2e62cd


[ROCm/ROCR-Runtime commit: f343f6706e]
2019-08-23 19:28:37 -04:00
Ramesh Errabolu 61b9d4e8b2 Update memory allocation guide in using pool apis
This is to allow allocations in system memory that exceed sizes
reported by a CPU device

Change-Id: I3d10d192aafcefbe4107f69b7c5e30bf7f836619


[ROCm/ROCR-Runtime commit: 3201f68f72]
2019-08-23 14:55:40 -04:00
Konstantin Zhuravlyov 2b9e13a56c Loader: add basic logging abilities
- Enabled with env var LOADER_ENABLE_LOGGING=1

Change-Id: Ibdbb1b55ffddb7dc9c63e52fc9db3013409376a4


[ROCm/ROCR-Runtime commit: 2275c74695]
2019-08-21 13:29:15 -04:00
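A minimal sketch of how an environment switch such as LOADER_ENABLE_LOGGING=1 (commit 2b9e13a56c) might be read. The helper names `IsFlagEnabled` and `EnvFlagEnabled` are hypothetical, and treating only the exact value "1" as enabled is an assumption; the commit message does not say how the loader parses the value.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true only when the value is exactly "1".
// How the real loader interprets other values is not stated in the commit.
inline bool IsFlagEnabled(const char* value) {
  return value != nullptr && std::strcmp(value, "1") == 0;
}

// Looks up `name` in the process environment and applies the rule above.
// An unset variable counts as disabled.
inline bool EnvFlagEnabled(const char* name) {
  assert(name != nullptr);
  return IsFlagEnabled(std::getenv(name));
}
```

Under these assumptions, `EnvFlagEnabled("LOADER_ENABLE_LOGGING")` would return true after launching the process with `LOADER_ENABLE_LOGGING=1` in its environment.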
Jay Cornwall 87c13b8a7d Support KFD interrupt protocol in second-level trap handler
If M0[23] is set then the driver will interpret the interrupt as a
debug event, rather than a signal event.

Clear M0 before sending the interrupt. All paths here are terminal so
it's not necessary to save/restore M0.

Change-Id: Ibd85b8cc6f8556941f2308a2c3fa3c68702cd606


[ROCm/ROCR-Runtime commit: ad717d2e98]
2019-08-08 15:16:15 -05:00
Ramesh Errabolu 8f58e11c31 Add override qualifier to CPU and GPU agent api
Change-Id: I930e29d671b5dc81dece6f910d611056a54d2c85


[ROCm/ROCR-Runtime commit: a043c6acbb]
2019-08-06 18:13:26 -05:00
Konstantin Zhuravlyov 466099ec20 Allow ccache enabled builds if -DROCM_CCACHE_BUILD=ON
Change-Id: Ie3ebb5d95af5fa55f11c9c88378ab29736538e25


[ROCm/ROCR-Runtime commit: 7d8205548b]
2019-08-01 14:33:38 -04:00
Chris Freehill 3cd4461a7d gfx908 loader/isa related changes
Change-Id: I638d4b2b300ac5a99d4d31d4fadcfe9e1e3c7748


[ROCm/ROCR-Runtime commit: 6588165de1]
2019-07-23 03:41:27 -04:00
Chris Freehill 123dea7733 Add ISAREG entry for gfx908 for ECC not supported
* Also, re-enable rocrtst

Change-Id: I70106c5a1788818387e46f240d577cbe59bc89f4


[ROCm/ROCR-Runtime commit: 2c15bcac9d]
2019-07-22 21:50:09 -04:00
Chris Freehill a87ff82cad Initial gfx908 updates
Change-Id: I3d6307d6613a38861a95561b9ac68abaa5964b48


[ROCm/ROCR-Runtime commit: 447a30e985]
2019-07-22 17:25:06 -04:00
Sean Keely e2375b9328 Update README build instructions.
Change-Id: I595e629117adfb44afb2e829d1f975782238277e


[ROCm/ROCR-Runtime commit: 0721dfd2e7]
2019-07-19 14:17:47 -04:00
Sean Keely b66fecd12f Adjust agentOwner in pointer info queries for locked memory.
agentOwner from thunk reflects the GPU which holds the device alias.
We need to return a CPU to better reflect that the memory is system memory.

Change-Id: I9233f8779a4bfd471f68dbbbce07ae4528412e18


[ROCm/ROCR-Runtime commit: 6e07bc8dc4]
2019-07-19 14:17:13 -04:00
Sean Keely 49e70a3ef5 PR from github user DiamondLovesYou.
Allow user specified profiles if the HSAIL note is not found.

Konstantin reviewed and approved.  HSAIL note is not generated by LLVM.

Change-Id: I40fbfbaedd6787b6a716507918f698d02007afe1


[ROCm/ROCR-Runtime commit: 465a8eb40b]
2019-07-16 13:55:38 -05:00
Ramesh Errabolu 9364c7ac0e Allocate fine-grained regions for Gpu devices that are members of Hives
Change-Id: Ibbed393aeac691793845d16d2f3fe2c3e5a7ec40


[ROCm/ROCR-Runtime commit: 4daee0c8a1]
2019-07-13 01:12:53 -04:00
Jay Cornwall 60da601be4 Handle traps, illegal instruction, memory violations through queue signal
Report traps and fatal exceptions through a wavefront's
amd_queue_t.queue_inactive_signal. Previously, only traps were
reported and required the compiler to pass in the signal pointer
in s[0:1].

The signal is obtained through a mapping from doorbell index to
amd_queue_t*. The doorbell is fetched within a wavefront through
the gfx9+ S_SENDMSG(MSG_GET_DOORBELL) instruction.

Change-Id: I319b45f2e15dfcfe4db8f4065da1136e9539a42b


[ROCm/ROCR-Runtime commit: ff8f439112]
2019-07-01 22:59:41 -04:00
Jay Cornwall 822d838eae Replace gfx9 SP3 trap handler with LLVM, fix IB_STS restore
Assembler toolchains are moving from SP3 to LLVM. Replace trap handler
source code with LLVM equivalent.

Fix a trap issue with SQ_WAVE_IB_STS restore. Mostly harmless as all
traps are currently considered fatal to the wavefront.

Change-Id: Iacecd9dd31a1d96a083c8b8327f442f33c861f9f


[ROCm/ROCR-Runtime commit: 6ed686ee29]
2019-07-01 22:59:27 -04:00
Sean Keely 872c359ba2 Initial support for deallocation callbacks.
Adds hsa_amd_register_deallocation_callback and hsa_amd_deregister_deallocation_callback
to notify when HSA memory has been released.

Change-Id: I1f33cee250ca890e5c2e7fddfa4479aa5874651d


[ROCm/ROCR-Runtime commit: 299874f17d]
2019-06-26 04:12:17 -05:00
Evgeny 87cdf00d09 aqlprofile api fix
Change-Id: I2a710040422c7853ece5472ea776442b25d69dcb


[ROCm/ROCR-Runtime commit: 6c0aaa2773]
2019-06-19 23:14:27 -04:00
Sean Keely 5d5d40fcf9 PTHREAD_STACK_MIN may differ from system parameters.
Restrict stack adjustment to non-default stack requests and allow
stack growth within reason (20MB cutoff).

Change-Id: I320280c711402ac29683e94c7246b7c32c797611


[ROCm/ROCR-Runtime commit: 0c0e634458]
2019-06-17 21:04:17 -05:00
Sean Keely ca44cbb3d9 Revert to SystemClockCounter for HSA system time.
CPUClockCounter is not NTP adjusted (CLOCK_MONOTONIC_RAW) so should be 
better for measurements.  However, it is implemented with syscall while
CLOCK_MONOTONIC is implemented via vDSO.  The latency increase becomes
significant when language layers make corresponding clock measurements.
Reverting to CLOCK_MONOTONIC will reduce latency and allow small
duration events to be measured at the cost of incorporating NTP
frequency skew errors.  NTP may adjust frequency by 500ppm so limits us
to ~3 decimals in elapsed time.

Change-Id: I920b9f707f47109d80d6c256c475638c03fb8d76


[ROCm/ROCR-Runtime commit: 4b22d24346]
2019-06-17 21:07:26 -04:00
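The trade-off described in ca44cbb3d9 can be made concrete with a small sketch. `ClockNowNs` reads a POSIX clock through `clock_gettime`, and `SkewErrorNs` computes the worst-case error that a frequency offset introduces into an elapsed interval. Both helper names are hypothetical and this is not the ROCr implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Reads the given POSIX clock and returns its value in nanoseconds.
// Pass CLOCK_MONOTONIC for the NTP-adjusted clock discussed in the commit.
inline std::uint64_t ClockNowNs(clockid_t clock) {
  timespec ts{};
  int rc = clock_gettime(clock, &ts);
  assert(rc == 0);
  (void)rc;  // silence the unused warning when asserts are compiled out
  return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ull +
         static_cast<std::uint64_t>(ts.tv_nsec);
}

// Worst-case error in an elapsed interval when the clock frequency is off
// by `ppm` parts per million.
constexpr double SkewErrorNs(double elapsed_ns, double ppm) {
  return elapsed_ns * ppm / 1e6;
}
```

With the 500 ppm figure from the commit message, a one-second interval can be off by up to 0.5 ms, a relative error of 5e-4, which is why only about three significant decimal digits of elapsed time can be trusted.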
Sean Keely ba3ec88220 Fix description of HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL.
Description was inconsistent with itself and code.  Existing behavior
returns HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL == true for system
memory pools only and system memory pools do require hsa_amd_agents_allow_access.

Change-Id: I64b287bff9fdb21688aa169296e410edf1b209b5


[ROCm/ROCR-Runtime commit: bbb90bdfc9]
2019-06-11 01:45:22 -04:00
Evgeny e07cc81005 aqlprofile API: sdma blocks
Change-Id: I619af8adc17706f808644180cdd5a5c785e052ec


[ROCm/ROCR-Runtime commit: a06d96cef8]
2019-06-05 18:54:08 -05:00
Evgeny f3b7848904 adding new trace API
Change-Id: I6c83b5789f5a6cdbb574d041c40d5a47229c7f1a


[ROCm/ROCR-Runtime commit: 1be9298f72]
2019-06-01 14:33:59 -04:00
Matt Arsenault 1379fea626 Don't check VERSION_BUILD is defined
Check if it is true or not. The string() call would define this to an
empty string, which would pass. This would then leave a trailing -
in the version string, which dpkg would error on during package
installation.

Change-Id: Ifb5fc15f5dde506e96bff7881a5d3f22d983406e


[ROCm/ROCR-Runtime commit: 0016c6ce5b]
2019-05-29 11:09:31 -04:00
Sean Keely b754622b33 Allow hsa_status_string when HSA is closed.
API is a stateless lookup of RO data and needed to interpret
hsa_init error codes.

Change-Id: If80cba2f697843d08e529da0f790acf3c37127a7


[ROCm/ROCR-Runtime commit: 22de0e7fb9]
2019-05-24 22:40:03 -04:00
Sean Keely b9c2754101 Add exception and error safety for CreateThread.
Change-Id: I82aaf64e039ca9614b4948deec1f87147f56279a


[ROCm/ROCR-Runtime commit: 9f81bdfbe1]
2019-05-24 22:39:55 -04:00
Matt Arsenault 0bf3b480ee Change include flag order
Search the local src directories first. If using a system
installed hsakmt, this would pick the installed hsa headers.

Change-Id: I9746d6e9db1749a130e4d93e024556754a537083


[ROCm/ROCR-Runtime commit: 22d29b55a4]
2019-05-22 16:43:18 -07:00
Sean Keely f336a19a0f Correct pthread join/detach handling.
Joined threads can not be joined more than once nor can they be detached.
Thread library wait and close allows multiple waits and separate close so
this fixes the pthread implementation.

Change-Id: I0019271a438f11ed4c6c11854011f5c4f6e16b65


[ROCm/ROCR-Runtime commit: a913549190]
2019-05-16 12:14:06 -05:00
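The join/detach rule from f336a19a0f can be sketched as a thin wrapper over pthreads that permits repeated waits and a separate close. The `ThreadHandle` type and the function names are hypothetical and only illustrate the rule; the handle is assumed to be used from a single owner thread, so no locking is shown.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical handle type for illustration; not the ROCr implementation.
struct ThreadHandle {
  pthread_t thread{};
  bool joined = false;  // true once pthread_join has succeeded
  bool closed = false;  // true once the handle has been released
};

// Example worker: sets the int that `arg` points to.
inline void* SetFlag(void* arg) {
  *static_cast<int*>(arg) = 1;
  return nullptr;
}

// Starts entry(arg) on a new joinable thread. Returns false on failure.
inline bool StartThread(ThreadHandle& h, void* (*entry)(void*), void* arg) {
  return pthread_create(&h.thread, nullptr, entry, arg) == 0;
}

// Waits for the thread to finish. Repeated calls are allowed:
// only the first successful call reaches pthread_join.
inline bool WaitThread(ThreadHandle& h) {
  assert(!h.closed);
  if (h.joined) return true;
  if (pthread_join(h.thread, nullptr) != 0) return false;
  h.joined = true;
  return true;
}

// Releases the handle. A joined thread must not be detached,
// so detach only when no join has happened.
inline bool CloseThread(ThreadHandle& h) {
  if (h.closed) return true;
  if (!h.joined && pthread_detach(h.thread) != 0) return false;
  h.closed = true;
  return true;
}
```

Tracking the joined state in the handle keeps each pthread call to at most one invocation per thread, which is the constraint the commit message describes.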
Sean Keely bdda9b4f0e Correlate errors for time stamps which predate process start.
Small times may be given to time conversion if GPU clocks are used to
accumulate elapsed time.  Because HSA APIs deal in absolute time this
leads to large conversion offsets of order system uptime.  Variation
in relative clock ratio estimation may be amplified in this case,
destroying elapsed time measurements.

This patch fixes the relative clock ratio used for times which predate
the call to hsa_init.  This correlates errors in such times allowing
the elapsed time to be correctly computed.

The effective maximum system uptime before elapsed time conversion becomes
inaccurate is ~3.5 months.  GPU event timestamps are good for process uptime
of ~3.5 months.  These are limited by double's mantissa precision.

Change-Id: I48752ff354920439d91016d6f2b0c8ddfa60b445


[ROCm/ROCR-Runtime commit: 6e2a056e1b]
2019-05-14 17:35:06 -04:00
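The "~3.5 months" bound in bdda9b4f0e is consistent with the 53-bit significand of an IEEE-754 double, as the sketch below shows. It assumes timestamps counted in nanoseconds, which the commit message does not state, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Every integer up to 2^53 is exactly representable in an IEEE-754 double.
constexpr std::uint64_t kExactIntegerLimit = 1ull << 53;

// Converts a tick count to days for a given tick rate.
constexpr double TicksToDays(std::uint64_t ticks, double ticks_per_second) {
  return static_cast<double>(ticks) / ticks_per_second / 86400.0;
}

// True when `ticks` survives a round trip through double unchanged.
inline bool RoundTripsThroughDouble(std::uint64_t ticks) {
  assert(ticks < (1ull << 63));  // keeps the cast back to integer well defined
  return static_cast<std::uint64_t>(static_cast<double>(ticks)) == ticks;
}
```

At one tick per nanosecond, 2^53 ticks is roughly 104 days, or about 3.4 months. Past that point adjacent tick values start to collapse onto the same double, so conversions done in double lose single-tick resolution.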
Sean Keely ec39134408 Expose HDP flush registers.
Exposed via agent info query.  Only valid if fine grain PCIe memory is enabled.

Change-Id: Ib4770901592ec047276458926a947737f9b93bb5


[ROCm/ROCR-Runtime commit: 06376e726b]
2019-05-11 00:04:47 -04:00
Sean Keely 5b71bc65b7 Patch from github.
At the moment it is not possible to build ROCr with Clang. This is
a spurious limitation. The present PR addresses it by guarding GCC
only flags and by fixing some additional warnings that Clang triggers;
one of said warnings did outline a rather interesting issue with math
being done on void*s. - AlexVlx

Void ptr arithmetic had already been fixed in amd-master branch.

Change-Id: I5ee97e20b5c40b10dd73facecabe75f02ba46462


[ROCm/ROCR-Runtime commit: e89f9807f1]
2019-04-29 16:17:24 -04:00
Felix Kuehling d810b66917 Use non-paged memory for IPC signals
Non-paged memory can be IPC-shared even when HSA_USERPTR_FOR_PAGED_MEM
is enabled.

Change-Id: I8b1fa6d7a4a9327c78a77b3679697fbf55397093
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>


[ROCm/ROCR-Runtime commit: 0c6b9532d4]
2019-04-29 09:20:11 -04:00
Sean Keely bdf4b84f82 Don't create blits when copy profiling is enabled.
Change-Id: I879827133957ee610c3381ea30c536ec7d10ffab


[ROCm/ROCR-Runtime commit: 1251842900]
2019-04-18 20:00:02 -05:00
Jay Cornwall 2c3c92208d Detect memory event through Flags field instead of Failure
KFD no longer reports MemoryAccessFault.Failure with retry fault
implementation. ROCr ignores the memory event when Failure = 0.

Use the Flags field instead, which will be non-zero when the
event is triggered.

Change-Id: Ie90799a303b0b2f1b476b20ffafdde79ae137182


[ROCm/ROCR-Runtime commit: 56f280c8a7]
2019-04-15 19:16:07 -05:00
Ramesh Errabolu fd05ee66a7 Remove instantiation of MemoryRegion for heap type SVM surfaced by ROCt
Change-Id: Ib4ff7e7cabe9aacb811888aeb74f652dcb57f9e0


[ROCm/ROCR-Runtime commit: ba029ebe21]
2019-04-10 18:33:07 -05:00
Konstantin Zhuravlyov dde11e307d Process symbols with 0 address
Change-Id: I9ed943a8ccd3b103edd6aba8264c009d8cda29fa


[ROCm/ROCR-Runtime commit: 7001134757]
2019-03-30 02:14:43 -04:00
Sean Keely 59e91f0be8 Add hsa_amd_memory_lock_to_pool.
Makes malloc memory accessible to GPUs so that the memory has the
capabilities of the pool it is locked to.
This admits fine grained locked memory and reserves API space for any future
special CPU pools.

Change-Id: If8c3dd8582a43f19d3d36b3763c1a688cc419ef0


[ROCm/ROCR-Runtime commit: a535e18cc1]
2019-03-29 01:09:21 -05:00
Sean Keely f819304f49 Remove legacy memory fault event name.
Change-Id: I3ad240482523409e1152548009aecf127e63bbfa


[ROCm/ROCR-Runtime commit: 9f7df6d6fe]
2019-03-28 15:25:25 -05:00
Sean Keely 6121ae4f6b Fix void* arithmetic.
GCC allows arithmetic on void*, treating void as char.  Clang and
the language spec do not.

Change-Id: I939f2432f276979bb81881406e10528597ac6001


[ROCm/ROCR-Runtime commit: e5de33dd9a]
2019-03-28 12:49:19 -05:00
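The kind of change described in 6121ae4f6b can be sketched as follows: cast to `char*` before doing byte arithmetic so the code no longer relies on the GCC extension. The helper names `OffsetPointer` and `ByteDistance` are hypothetical and are not taken from the ROCr sources.

```cpp
#include <cassert>
#include <cstddef>

// Portable byte offset: cast to char* first, because arithmetic
// directly on void* is a GCC extension rather than standard C++.
inline void* OffsetPointer(void* base, std::size_t bytes) {
  assert(base != nullptr);
  return static_cast<char*>(base) + bytes;
}

// Byte distance between two pointers into the same allocation.
inline std::ptrdiff_t ByteDistance(const void* first, const void* last) {
  return static_cast<const char*>(last) - static_cast<const char*>(first);
}
```

For a buffer `char buf[16]`, `OffsetPointer(buf, 4)` yields the same address as `buf + 4`, and the code compiles under both GCC and Clang.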