rocm-systems

Author	SHA1	Message	Date
Chris Freehill	6ebdad5896	Initial support for gfx1010, gfx1011, gfx1012 Change-Id: I9ec398070c85db08aea72947557c6e1b5f7d541d	2019-09-12 20:24:30 -05:00
Sean Keely	f2599fccb6	Enable trap handler on APUs. Change-Id: Ifdc8c2782498b3fbe238d773120d378c47918d07	2019-09-06 18:10:20 -04:00
Sean Keely	f9d3796db8	Correct doorbell_queue_map allocation. doorbell_queue_map should always be allocated or we will need to add branches around all accesses. Change-Id: I994c0eaf4be62c1a4a37bd06894272dba1fc1da6	2019-09-06 18:10:20 -04:00
Christian Sigg	00b0ee15b3	Add missing include to lazy_ptr.h Change-Id: I5b061692a4ec6def631d7c3182e5b644b6b9c519	2019-09-05 02:44:27 -04:00
Christian Sigg	1f177cf9c2	Change #include of libelf.h from quote to angle. Change-Id: Ie940ed0f78e95224e42978381c552861e6d58ee4	2019-09-05 02:43:54 -04:00
Christian Sigg	912c23a6d5	Adding missing includes to sdma_registers.h Change-Id: Idb2a54f45c810508ae0ebac0ca12853df8025c7a	2019-09-04 20:15:13 -04:00
Sean Keely	ec5ac95dce	Remove sdma ts pool. sdma end ts must be 256 bit aligned in oss 3.0 and prior. Using the ts pool requires copying into the signal and is a significant performance penalty for small copies. SharedSignal is 128 bytes due to alignment so can host the end ts. Move sdma end ts into SharedSignal and remove ts pool and ts copy. Change-Id: I7899bda36ebc9adcaad1d3a3d2b7a489857cc9e8	2019-08-29 20:24:05 -05:00
Sean Keely	5adb73fffd	Allow default kernel to spin freely at first. Impacts GPU_ONLY signal type latency when waiting for small operations. Using this type improves total SDMA small copy performance by ~40% if the signal is allowed to spin freely. Change-Id: I27aa128c63a1bacb3f51fb08f166e4e1d6fef651	2019-08-29 02:46:56 -05:00
Sean Keely	ea8c99f452	Correct copy completion signal handling. Remove agent lookup in time stamp translation for IPC signals. The copy agent handle is not shared so does not need to be checked for cross process use. Cross process copy-timestamp read is illegal and continues to deliver garbage. Store the copy agent properly when doing CPU-CPU copies. Change-Id: Ib4008f66ff866922047749dd556c84a32021c1fd	2019-08-29 02:46:56 -05:00
Sean Keely	8133563a93	Enable HDP flush for all gfx9+ clients. ucode versions are per asic so not valid for feature enablement outside of bringup/dev. Feature is older than the latest ioctl change that the thunk depends on so use of this patch with kernel packages that don't contain the feature is not possible in a supported environment. Change-Id: I36b14176a7d642017ef1518aeade454b0f3dc749	2019-08-29 02:46:56 -05:00
Sean Keely	4647a5454d	Allow concurrent copies in blit kernel path. Also removed an unnecessary cache flush in dependency barrier packet. Change-Id: I573df3bdf0a10df0bcd78025672c44038f8091ff	2019-08-29 02:46:56 -05:00
Ramesh Errabolu	8864c188b4	Initial support for xgmi sdma queues Change-Id: I1aee379c7b9eede5f4b913cf2f9af3abb32e5baa	2019-08-24 02:03:37 -04:00
Sean Keely	f343f6706e	Report PCIe domain number. Adds HSA_AMD_AGENT_INFO_DOMAIN. Change-Id: I2ffcae474e18b2fe5f962b499e02eb9dfe2e62cd	2019-08-23 19:28:37 -04:00
Ramesh Errabolu	3201f68f72	Update memory allocation guide in using pool apis This is to allow allocations in system memory that exceed sizes reported by a CPU device Change-Id: I3d10d192aafcefbe4107f69b7c5e30bf7f836619	2019-08-23 14:55:40 -04:00
Konstantin Zhuravlyov	2275c74695	Loader: add basic logging abilities - Enabled with env var LOADER_ENABLE_LOGGING=1 Change-Id: Ibdbb1b55ffddb7dc9c63e52fc9db3013409376a4	2019-08-21 13:29:15 -04:00
Jay Cornwall	ad717d2e98	Support KFD interrupt protocol in second-level trap handler If M0[23] is set then the driver will interpret the interrupt as a debug event, rather than a signal event. Clear M0 before sending the interrupt. All paths here are terminal so it's not necessary to save/restore M0. Change-Id: Ibd85b8cc6f8556941f2308a2c3fa3c68702cd606	2019-08-08 15:16:15 -05:00
Ramesh Errabolu	a043c6acbb	Add override qualifier to CPU and GPU agent api Change-Id: I930e29d671b5dc81dece6f910d611056a54d2c85	2019-08-06 18:13:26 -05:00
Konstantin Zhuravlyov	7d8205548b	Allow ccache enabled builds if -DROCM_CCACHE_BUILD=ON Change-Id: Ie3ebb5d95af5fa55f11c9c88378ab29736538e25	2019-08-01 14:33:38 -04:00
Chris Freehill	6588165de1	gfx908 loader/isa related changes Change-Id: I638d4b2b300ac5a99d4d31d4fadcfe9e1e3c7748	2019-07-23 03:41:27 -04:00
Chris Freehill	2c15bcac9d	Add ISAREG entry for gfx908 for ECC not supported * Also, re-enable rocrtst Change-Id: I70106c5a1788818387e46f240d577cbe59bc89f4	2019-07-22 21:50:09 -04:00
Chris Freehill	447a30e985	Initial gfx908 updates Change-Id: I3d6307d6613a38861a95561b9ac68abaa5964b48	2019-07-22 17:25:06 -04:00
Sean Keely	0721dfd2e7	Update README build instructions. Change-Id: I595e629117adfb44afb2e829d1f975782238277e	2019-07-19 14:17:47 -04:00
Sean Keely	6e07bc8dc4	Adjust agentOwner in pointer info queries for locked memory. agentOwner from thunk reflects the GPU which holds the device alias. We need to return a CPU to better reflect that the memory is system memory. Change-Id: I9233f8779a4bfd471f68dbbbce07ae4528412e18	2019-07-19 14:17:13 -04:00
Sean Keely	465a8eb40b	PR from github user DiamondLovesYou. Allow user specified profiles if the HSAIL note is not found. Konstantin reviewed and approved. HSAIL note is not generated by LLVM. Change-Id: I40fbfbaedd6787b6a716507918f698d02007afe1	2019-07-16 13:55:38 -05:00
Ramesh Errabolu	4daee0c8a1	Allocate fine-grained regions for Gpu devices that are members of Hives Change-Id: Ibbed393aeac691793845d16d2f3fe2c3e5a7ec40	2019-07-13 01:12:53 -04:00
Jay Cornwall	ff8f439112	Handle traps, illegal instruction, memory violations through queue signal Report traps and fatal exceptions through a wavefront's amd_queue_t.queue_inactive_signal. Previously, only traps were reported and required the compiler to pass in the signal pointer in s[0:1]. The signal is obtained through a mapping from doorbell index to amd_queue_t*. The doorbell is fetched within a wavefront through the gfx9+ S_SENDMSG(MSG_GET_DOORBELL) instruction. Change-Id: I319b45f2e15dfcfe4db8f4065da1136e9539a42b	2019-07-01 22:59:41 -04:00
Jay Cornwall	6ed686ee29	Replace gfx9 SP3 trap handler with LLVM, fix IB_STS restore Assembler toolchains are moving from SP3 to LLVM. Replace trap handler source code with LLVM equivalent. Fix a trap issue with SQ_WAVE_IB_STS restore. Mostly harmless as all traps are currently considered fatal to the wavefront. Change-Id: Iacecd9dd31a1d96a083c8b8327f442f33c861f9f	2019-07-01 22:59:27 -04:00
Sean Keely	299874f17d	Initial support for deallocation callbacks. Adds hsa_amd_register_deallocation_callback and hsa_amd_deregister_deallocation_callback to notify when HSA memory has been released. Change-Id: I1f33cee250ca890e5c2e7fddfa4479aa5874651d	2019-06-26 04:12:17 -05:00
Evgeny	6c0aaa2773	aqlprofile api fix Change-Id: I2a710040422c7853ece5472ea776442b25d69dcb	2019-06-19 23:14:27 -04:00
Sean Keely	0c0e634458	PTHREAD_STACK_MIN may differ from system parameters. Restrict stack adjustment to non-default stack requests and allow stack growth within reason (20MB cutoff). Change-Id: I320280c711402ac29683e94c7246b7c32c797611	2019-06-17 21:04:17 -05:00
Sean Keely	4b22d24346	Revert to SystemClockCounter for HSA system time. CPUClockCounter is not NTP adjusted (CLOCK_MONOTONIC_RAW) so should be better for measurements. However, it is implemented with syscall while CLOCK_MONOTONIC is implemented via vDSO. The latency increase becomes significant when language layers make corresponding clock measurements. Reverting to CLOCK_MONOTONIC will reduce latency and allow small duration events to be measured at the cost of incorporating NTP frequency skew errors. NTP may adjust frequency by 500ppm so limits us to ~3 decimals in elapsed time. Change-Id: I920b9f707f47109d80d6c256c475638c03fb8d76	2019-06-17 21:07:26 -04:00
Sean Keely	bbb90bdfc9	Fix description of HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL. Description was inconsistent with itself and code. Existing behavior returns HSA_AMD_MEMORY_POOL_INFO_ACCESSIBLE_BY_ALL == true for system memory pools only and system memory pools do require hsa_amd_agents_allow_access. Change-Id: I64b287bff9fdb21688aa169296e410edf1b209b5	2019-06-11 01:45:22 -04:00
Evgeny	a06d96cef8	aqlprofile API: sdma blocks Change-Id: I619af8adc17706f808644180cdd5a5c785e052ec	2019-06-05 18:54:08 -05:00
Evgeny	1be9298f72	adding new trace API Change-Id: I6c83b5789f5a6cdbb574d041c40d5a47229c7f1a	2019-06-01 14:33:59 -04:00
Matt Arsenault	0016c6ce5b	Don't check VERSION_BUILD is defined Check if it is true or not. The string() call would define this to an empty string, which would pass. This would then leave a trailing - in the version string, which dpkg would error on during package installation. Change-Id: Ifb5fc15f5dde506e96bff7881a5d3f22d983406e	2019-05-29 11:09:31 -04:00
Sean Keely	22de0e7fb9	Allow hsa_status_string when HSA is closed. API is a stateless lookup of RO data and needed to interpret hsa_init error codes. Change-Id: If80cba2f697843d08e529da0f790acf3c37127a7	2019-05-24 22:40:03 -04:00
Sean Keely	9f81bdfbe1	Add exception and error safety for CreateThread. Change-Id: I82aaf64e039ca9614b4948deec1f87147f56279a	2019-05-24 22:39:55 -04:00
Matt Arsenault	22d29b55a4	Change include flag order Search the local src directories first. If using a system installed hsakmt, this would pick the installed hsa headers. Change-Id: I9746d6e9db1749a130e4d93e024556754a537083	2019-05-22 16:43:18 -07:00
Sean Keely	a913549190	Correct pthread join/detach handling. Joined threads can not be joined more than once nor can they be detached. Thread library wait and close allows multiple waits and separate close so this fixes the pthread implementation. Change-Id: I0019271a438f11ed4c6c11854011f5c4f6e16b65	2019-05-16 12:14:06 -05:00
Sean Keely	6e2a056e1b	Correlate errors for time stamps which predate process start. Small times may be given to time conversion if GPU clocks are used to accumulate elapsed time. Because HSA APIs deal in absolute time this leads to large conversion offsets of order system uptime. Variation in relative clock ratio estimation may be amplified in this case, destroying elapsed time measurements. This patch fixes the relative clock ratio used for times which predate the call to hsa_init. This correlates errors in such times allowing the elapsed time to be correctly computed. The effective maximum system uptime before elapsed time conversion becomes inaccurate is ~3.5 months. GPU event timestamps are good for process uptime of ~3.5 months. These are limited by double's mantissa precision. Change-Id: I48752ff354920439d91016d6f2b0c8ddfa60b445	2019-05-14 17:35:06 -04:00
Sean Keely	06376e726b	Expose HDP flush registers. Exposed via agent info query. Only valid if fine grain PCIe memory is enabled. Change-Id: Ib4770901592ec047276458926a947737f9b93bb5	2019-05-11 00:04:47 -04:00
Sean Keely	e89f9807f1	Patch from github. At the moment it is not possible to build ROCr with Clang. This is a spurious limitation. The present PR addresses it by guarding GCC only flags and by fixing some additional warnings that Clang triggers; one of said warnings did outline a rather interesting issue with math being done on void*s. - AlexVlx Void ptr arithmetic had already been fixed in amd-master branch. Change-Id: I5ee97e20b5c40b10dd73facecabe75f02ba46462	2019-04-29 16:17:24 -04:00
Felix Kuehling	0c6b9532d4	Use non-paged memory for IPC signals Non-paged memory can be IPC-shared even when HSA_USERPTR_FOR_PAGED_MEM is enabled. Change-Id: I8b1fa6d7a4a9327c78a77b3679697fbf55397093 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2019-04-29 09:20:11 -04:00
Sean Keely	1251842900	Don't create blits when copy profiling is enabled. Change-Id: I879827133957ee610c3381ea30c536ec7d10ffab	2019-04-18 20:00:02 -05:00
Jay Cornwall	56f280c8a7	Detect memory event through Flags field instead of Failure KFD no longer reports MemoryAccessFault.Failure with retry fault implementation. ROCr ignores the memory event when Failure = 0. Use the Flags field instead, which will be non-zero when the event is triggered. Change-Id: Ie90799a303b0b2f1b476b20ffafdde79ae137182	2019-04-15 19:16:07 -05:00
Ramesh Errabolu	ba029ebe21	Remove instantiation of MemoryRegion for heap type SVM surfaced by ROCt Change-Id: Ib4ff7e7cabe9aacb811888aeb74f652dcb57f9e0	2019-04-10 18:33:07 -05:00
Konstantin Zhuravlyov	7001134757	Process symbols with 0 address Change-Id: I9ed943a8ccd3b103edd6aba8264c009d8cda29fa	2019-03-30 02:14:43 -04:00
Sean Keely	a535e18cc1	Add hsa_amd_memory_lock_to_pool. Makes malloc memory accessible to GPUs so that the memory has the capabilities of the pool it is locked to. This admits fine grained locked memory and reserves API space for any future special CPU pools. Change-Id: If8c3dd8582a43f19d3d36b3763c1a688cc419ef0	2019-03-29 01:09:21 -05:00
Sean Keely	9f7df6d6fe	Remove legacy memory fault event name. Change-Id: I3ad240482523409e1152548009aecf127e63bbfa	2019-03-28 15:25:25 -05:00
Sean Keely	e5de33dd9a	Fix void* arithmetic. GCC allows arithmetic on void* treating void as char. Clang and the language spec do not. Change-Id: I939f2432f276979bb81881406e10528597ac6001	2019-03-28 12:49:19 -05:00

1261 commits