rocm-systems

Author	SHA1	Message	Date
Tony Gutierrez	6e3c375bf1	rocr: Flags to alloc queue buf/struct in dev mem This builds on a prior change that allowed for allocating a user-mode queue's packet buffer in device memory to also allocate the queue struct in device memory. This provides additional latency benefits particularly for cases where dispatches are performed from the GPU itself. Flags are added to support the various use cases.	2025-04-23 15:53:29 -04:00
David Yat Sin	aa2f98e6f9	rocr: Update for new async scratch reclaim Updating ROCr code to match new handshake protocol with CP FW for asynchronous scratch reclaim. Increase previous limits when scratch reclaim feature is available.	2025-02-19 21:02:00 -05:00
David Yat Sin	d90fbee9c4	rocr: find first dispatch pkt that needs scratch On GPUs where EOP is handled in asic, the read_dispatch_id is not always updated after each packet. Look for the first dispatch packet that needs scratch memory before allocating scratch. Change-Id: Ibf4b4b485f99bf2fabfe48e9609ca99111fdafbe	2024-10-25 14:36:40 -04:00
Saleel Kudchadker	3baaa6e9c0	rocr: Allocate AQL queue on device memory - Use HSA_ALLOCATE_QUEUE_DEV_MEM=1 to create AQL queue in device memory. - Before writing AQL packet header to the queue use an SFENCE to ensure that there is no reordering of the writes over PCIE Change-Id: I5eacdc35108c4a1e245c75ae349b7495451aa60d	2024-09-05 17:48:02 -04:00
David Yat Sin	1d1d402dcc	Do not allow default mem_flags Force mem_flags to be explicitly passed in when calling the Queue constructor to avoid ambiguity with calls to the Queue constructor trying to only pass the agent_node_id. Change-Id: Ib6fedcb9e52d6c9f35f9051dfa989343456ca368 Signed-off-by: David Yat Sin <David.YatSin@amd.com>	2024-08-19 12:19:32 -04:00
David Yat Sin	d6d5786051	Adding queue information queries New hsa_amd_queue_get_info API to support: - HSA_AMD_QUEUE_INFO_AGENT: Agent that owns the underlying HW queue - HSA_AMD_QUEUE_INFO_DOORBELL_ID: KFD doorbell ID of the queue completion signal. Change-Id: I98842131bcbdd08552649791a5d43e578a615808	2024-04-11 12:53:48 -04:00
David Yat Sin	721e56ef5c	Extend ExecutePM4() to accept completion signal and fences ExecutePM4() function can optionally accept extra arguments for acquire fence scope, release fence scope and completion signal. When a completion signal is provided, ExecutePM4() does not wait for the commands to complete. Change-Id: Ib2a433b7bce1cb6260be8b76fe902335bd5dfada	2024-04-11 12:51:52 -04:00
David Yat Sin	efe455c2fa	Temporary: Set AllocateGTTAccess and node_id for MES Temporary change to set the AllocateGTTAccess flag and node_id on MES devices. Change-Id: I22385d11b17b76cfb44278fa0d8a09bc8721cea6	2024-03-29 19:38:19 +00:00
David Yat Sin	fa317f8c41	Re-arrange and rename scratch elements that are used with main scratch Change-Id: I4c1ff8cf4121a06b586fe49c70400226506bf95e	2023-12-04 15:05:22 +00:00
Tony Tye	7955fb01ec	Make AqlPacket::string more robust AqlPacket::string should check the packet type is in range of the array used to print its name. Change-Id: I33dabbd941d086929526d842c9dbc0bd7305acd5	2023-10-18 12:54:36 -04:00
Tony Tye	395ad3b77b	AQL packet header may need to be loaded atomically An AQL packet header field is stored using an atomic release, and needs to be read using atomic acquire if it may be written by another thread. Change-Id: I1d75587fd93f9c6216deebffc9a627b404a7e749	2023-10-18 12:54:36 -04:00
Tony Tye	23b4ce501d	Add AMD_AQL_FORMAT_INTERCEPT_MARKER vendor packet Define AMD_AQL_FORMAT_INTERCEPT_MARKER AMD vendor AQL packet. Add support to intercept queue to invoke a callback for these packets. Change-Id: Ia58d5fe2171f563632b4edd6343e02585f49d149	2023-10-18 12:54:36 -04:00
Shweta Khatri	a2d0adf9be	Correct evaluating condition to use logical AND AqlPacket::IsValid() function: Replaced bitwise AND operator (&) with the logical AND operator (&&) when evaluating AQL packet type Change-Id: I59980bc206cc7eff424023fff0bb92b618aa8c70	2023-07-21 15:36:48 -04:00
David Yat Sin	c1e836b6ab	Use paged memory for queues on MEC devices MES devices need GART mappings and therefore need non-paged memory. But using non-paged memory introduces a performance regression where it can take over 80 ms to see the signal changes if the memory is in the wrong NUMA node. Currently, we cannot control NUMA affinity when allocating non-paged memory. Use non-paged memory allocation only on devices that have the MES scheduler. Change-Id: Ib27fb01d75247aa4f2bb2aa4503c6af5a98afda0	2022-11-04 13:23:21 +00:00
Graham Sider	061aa04147	Make queue memory allocation non-paged Non-paged allocation for queue memory necessary for binding wptr to GART. Required to support usermode queue oversubscription with MES for GFX11. Adds AllocateNonPaged entry to MemoryRegion::AllocateEnum for clarity; aliases AllocateIPC. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Change-Id: I1a97a1820da26cf2433d9c237b2e6d2b0b8628b4	2022-08-04 11:21:00 -04:00
Graham Sider	db1a13aa05	Clean up includes in queue.h Formatting. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Change-Id: I141c8308d6b283b376035e21344629dc665289bb	2022-08-03 10:57:17 -04:00
Sean Keely	752cfd5ffd	Adjust include paths for new header locations. Thunk and rocm_smi_lib paths have been updated. Change-Id: If2948172f8064dd992cbccbc2a80f9161ad4d457	2022-05-09 14:44:32 -04:00
Sean Keely	4b0c94cfe8	Silence Clang warning. Clang warns about bitwise operators on bools. Cast to int silences the warning without introducing short-circuit logic. Change-Id: I6e25138e1acf4a5562d3925ea5b2fcef3addb783	2021-10-14 23:56:58 -05:00
Sean Keely	4455250be1	Add HSA_CU_MASK New environment variable HSA_CU_MASK allows users to specify a cu mask to every queue allocated from any GPU. hsa_amd_queue_cu_set_mask is restricted from escaping this mask. A new API hsa_amd_queue_cu_get_mask is added to query the current cu mask. Change-Id: I846c03a5faaca9b95067c31db84b59cc9fce2f03	2021-07-29 02:23:34 -05:00
Ramesh Errabolu	f7350c6020	Update ROCr implementation of Queue ID Change-Id: Iec48b1978e4d01563e71cfb58aed8f1bbc446443	2020-06-26 13:25:00 -05:00
Ramesh Errabolu	fa13208698	Add rocr namespace to core header and impl files Change-Id: I1e1b33f9bba1078d049bc19797889988c3e43360	2020-06-19 22:34:21 -04:00
Sean Keely	ce19721c88	Update copyright date. Change-Id: If4bf4c20cf051878bfe759080bb7345d884dd53d	2020-06-19 22:34:01 -04:00
Sean Keely	0a43a107b1	Initial GWS queue support. Queues should transition to ref counting for all queues eventually. That cleanup will be part of shared queue pooling support. Change-Id: I217ff5d573156678b9559da6fb81baa8cd31c617	2019-12-09 21:21:17 -05:00
Sean Keely	8323b2e1d7	Add pooling for Signal ABI blocks (SharedSignal). Makes better use of memory and greatly reduces mmap count. Change-Id: Ib444cd1ccd144986adbcc7cec297a966e2c08bc7	2018-11-12 22:37:28 -06:00
Jay Cornwall	e388a23344	Add hsa_amd_queue_set_priority extension function Controls dispatch and wavefront scheduling arbitration across queues. Change-Id: I498f4898b544f79b8fb8514bf7e789ca9da29462	2018-06-19 19:41:28 -05:00
Sean Keely	5f25619bb7	Enable large scratch on GFX8. Ensure system release fence is set on GFX8 large scratch using packets. Change-Id: I13cfdcd35969482ea6e95e0b352f5cb3a0454b86	2018-04-30 07:24:53 -04:00
Sean Keely	b6f0248f53	Respect new memory model requirements at queue destroy. Spec requires GPU release fences and CPU acquire fences at queue destroy. Also update the recognized status codes. Change-Id: If9166f5149f65417c7057ff7c0f69f6ac094d6ab	2018-04-04 08:13:00 -04:00
Sean Keely	6455a69b03	Fix bad casts in tools. Also virtualize queue profiling enable. Change-Id: I761b41269be3df7eb64a5914ee9951ed6b51bb04	2017-11-08 15:50:02 -05:00
Sean Keely	f312a7386e	Exception support for Queue. Remove "zombie" queue state and report queue creation failure via exceptions. Make Shared object a final container and support array objects with Shared. Add message printing to hsa_exception in debug builds. Change-Id: I459f38c80846018acbf45538874e95f91dd6b195	2017-11-08 15:50:02 -05:00
Sean Keely	0c7dde2d1f	Add queue intercept support to the runtime. Queue intercept is exposed as two tools-only APIs via the API intercept table. Change-Id: Iac9602ed3143974d85c3569e9092295ad18037f8	2017-11-08 15:50:01 -05:00
Sean Keely	bc0bd00746	Fix queue interception in tools. 1. Correct amd::AqlQueue::ExecutePM4 to support interception. 2. Minor fixes to AqlPacket and SoftCP. 3. Minimal change to disable interception of runtime internal queues. Change-Id: I103fece2ebf9a188d27f01e61221c737405d7253	2017-07-12 16:39:43 -04:00
Kenny Ho	5b4df54b10	Revert "Implement memory fault analysis through context save area" This reverts commit `75c9506f9d`. Change-Id: Ibf11b764b383b9be291f3009a30550e1a1e2d115	2017-06-14 14:21:53 -04:00
Jay Cornwall	75c9506f9d	Implement memory fault analysis through context save area When a fatal memory fault occurs the scheduler context-saves all queues in the process and notifies the runtime through the memory event. The saved state contains all GPR/LDS data at the moment of the fault. Retrieve this state and present it to the user if HSA_DEBUG_FAULT is set to "analyze" and the wavefront caused the fault. If amdgcn-capable objdump is in the PATH invoke this to disassemble code around the PC. Queue lifetime is now managed by the runtime to allow querying the context save state for all active queues. Change-Id: I6fee662fad1c4f9aa125bf5c53d7d0ea1ab32f95	2017-06-13 23:12:28 -04:00
Sean Keely	0e17cc2887	Allow reducing max occupancy (max scratch waves) when applications request large amounts of scratch. Also emit error messages to stderr if no async queue error callback was registered and queue fault messages are enabled (on by default). Queue fault messages are controlled with env key HSA_ENABLE_QUEUE_FAULT_MESSAGE. Change-Id: I496487b8d048b83aa95b9784e92928211f167b17	2016-12-20 16:52:59 -06:00
Jay Cornwall	74f5aca93d	Refactor: Consolidate calls to hsaKmtAllocMemory Route all device-visible system memory allocations through system_allocator. Change-Id: I5e90a1bf491e432678a6d8ab1f9f3770734cbda1	2016-08-24 23:57:19 -04:00
Sean Keely	54f1311e01	Update clang-format file to clang-format v3.8. Format HSA v1.1 core updates. Change-Id: I540b5c0e5b3ec7522b09c2e070167812b3f17769	2016-08-23 05:50:28 -05:00
Konstantin Zhuravlyov	c2c993e0d8	Update code object/isa/loader to hsa v1.1 - Includes Sean's latest changes - Cleanups/improvements - Fixes for few bugs that crept over from previous releases Change-Id: I839dc4895bf13ebd0afc8843424387a9fef667b0	2016-08-22 15:03:23 -04:00
Jay Cornwall	f76577ae43	Invalidate caches after allocating a code object Due to a misinterpretation of the HSA specification the microcode has, until now, been responsible for ensuring a coherent view of the amd_kernel_code_t object when acquire_fence_scope is set to agent or system. To correct this the runtime must instead assume this responsibility. Introduce GpuAgentInt::InvalidateCodeCaches to perform this operation on-demand. Invoke this after code object allocation. Extend the Queue implementations to support PM4 command submission, through which the PM4 command ACQUIRE_MEM can be submitted to perform cache invalidation. Submit through a runtime-managed queue shared with the blit implementation. This change depends on microcode support and this is checked against the running version. Older microcode builds will perform cache invalidation themselves, so it is acceptable for this change to do nothing in that case. Change-Id: I268dd2b83af3decdd9ad07430a81df8a2ecb6bd2	2016-08-02 13:30:55 -04:00
James Edwards (xN/A) TX	7d2bc9d113	Separate open source core runtime code from DK makefiles. [git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1250152]	2016-03-22 18:10:13 -05:00
James Edwards (xN/A) TX	7d1e6c3a57	Remove opensrc test files. [git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1249961]	2016-03-22 13:39:51 -05:00
James Edwards (xN/A) TX	c9ffe0004e	Check open source core runtime code into perforce. This includes license and README files. [git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1249136]	2016-03-20 15:39:40 -05:00

41 Commits