Граф коммитов

41 Коммитов

Автор SHA1 Сообщение Дата
Tony Gutierrez 6e3c375bf1 rocr: Flags to alloc queue buf/struct in dev mem
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.
2025-04-23 15:53:29 -04:00
David Yat Sin aa2f98e6f9 rocr: Update for new async scratch reclaim
Updating ROCr code to match new handshake protocol with CP FW for
asynchronous scratch reclaim.
Increase previous limits when scratch reclaim feature is available.
2025-02-19 21:02:00 -05:00
David Yat Sin d90fbee9c4 rocr: find first dispatch pkt that needs scratch
On GPUs where EOP is handled in asic, the read_dispatch_id is not always
updated after each packet. Look for the first dispatch packet that needs
scratch memory before allocating scratch.

Change-Id: Ibf4b4b485f99bf2fabfe48e9609ca99111fdafbe
2024-10-25 14:36:40 -04:00
Saleel Kudchadker 3baaa6e9c0 rocr: Allocate AQL queue on device memory
- Use HSA_ALLOCATE_QUEUE_DEV_MEM=1 to create AQL queue in device
memory.
- Before writing AQL packet header to the queue use an SFENCE to ensure
that there is no reodering of the writes over PCIE

Change-Id: I5eacdc35108c4a1e245c75ae349b7495451aa60d
2024-09-05 17:48:02 -04:00
David Yat Sin 1d1d402dcc Do not allow default mem_flags
Force mem_flags to be explicit passed in then calling Queue constructor
to avoid ambiguity with calls to Queue constructor trying to only pass
the agent_node_id.

Change-Id: Ib6fedcb9e52d6c9f35f9051dfa989343456ca368
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 12:19:32 -04:00
David Yat Sin d6d5786051 Adding queue information queries
New hsa_amd_queue_get_info API to support:

- HSA_AMD_QUEUE_INFO_AGENT: Agent that owns the underlying HW queue

- HSA_AMD_QUEUE_INFO_DOORBELL_ID: KFD doorbell ID of the queue
completion signal.

Change-Id: I98842131bcbdd08552649791a5d43e578a615808
2024-04-11 12:53:48 -04:00
David Yat Sin 721e56ef5c Extend ExecutePM4() to accept completion signal and fences
ExecutePM4() function can optionally accept extra arguments for
acquire fence scope, release fence scope andcompletion signal. When
a completion signal is provided, ExecutePM4() does not wait for the
commands to complete.

Change-Id: Ib2a433b7bce1cb6260be8b76fe902335bd5dfada
2024-04-11 12:51:52 -04:00
David Yat Sin efe455c2fa Temporary: Set AllocateGTTAccess and node_id for MES
Temporary change to set the AllocateGTTAccess flag and node_id
on MES devices.

Change-Id: I22385d11b17b76cfb44278fa0d8a09bc8721cea6
2024-03-29 19:38:19 +00:00
David Yat Sin fa317f8c41 Re-arrange and rename scratch elements that are used with main scratch
Change-Id: I4c1ff8cf4121a06b586fe49c70400226506bf95e
2023-12-04 15:05:22 +00:00
Tony Tye 7955fb01ec Make AqlPacket::string more robust
AqlPacket::string should check the packet type is in range of the array
used to print its name.

Change-Id: I33dabbd941d086929526d842c9dbc0bd7305acd5
2023-10-18 12:54:36 -04:00
Tony Tye 395ad3b77b AQL packet header may need to be loaded atomically
An AQL packet header field is stored using an atomic release, and needs
to be read using atomic acquire if it may be written by another thread.

Change-Id: I1d75587fd93f9c6216deebffc9a627b404a7e749
2023-10-18 12:54:36 -04:00
Tony Tye 23b4ce501d Add AMD_AQL_FORMAT_INTERCEPT_MARKER vendor packet
Define AMD_AQL_FORMAT_INTERCEPT_MARKER AMD vendor AQL packet. Add
support to intercept queue to invoke a callback for these packets.

Change-Id: Ia58d5fe2171f563632b4edd6343e02585f49d149
2023-10-18 12:54:36 -04:00
Shweta Khatri a2d0adf9be Correct evaluating condition to use logical AND
Aqlpacket:IsValid() function: Replaced bitwise AND operator (&) with the logical
AND operator (&&) when evaluating AQL packet type

Change-Id: I59980bc206cc7eff424023fff0bb92b618aa8c70
2023-07-21 15:36:48 -04:00
David Yat Sin c1e836b6ab Use paged memory for queues on MEC devices
MES devices need GART mappings and therefore need non-paged memory. But
using non-paged memory introduces performance regression where it can
take over 80 ms to see the signal changes if the memory is in the wrong
NUMA node. Currently, we cannot control NUMA affinity when allocating
non-paged memory. Using non-paged memory allocation only on devices that
have MES scheduler

Change-Id: Ib27fb01d75247aa4f2bb2aa4503c6af5a98afda0
2022-11-04 13:23:21 +00:00
Graham Sider 061aa04147 Make queue memory allocation non-paged
Non-paged allocation for queue memory necessary for binding wptr to
GART. Required to support usermode queue oversubscription with MES for
GFX11.

Adds AllocateNonPaged entry to MemoryRegion::AllocateEnum for clarity;
aliases AllocateIPC.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I1a97a1820da26cf2433d9c237b2e6d2b0b8628b4
2022-08-04 11:21:00 -04:00
Graham Sider db1a13aa05 Clean up includes in queue.h
Formatting.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I141c8308d6b283b376035e21344629dc665289bb
2022-08-03 10:57:17 -04:00
Sean Keely 752cfd5ffd Adjust include paths for new header locations.
Thunk and rocm_smi_lib paths have been updated.

Change-Id: If2948172f8064dd992cbccbc2a80f9161ad4d457
2022-05-09 14:44:32 -04:00
Sean Keely 4b0c94cfe8 Silence Clang warning.
Clang warns about bitwise operators on bools.  Cast to int silences
the warning without introducing short circut logic.

Change-Id: I6e25138e1acf4a5562d3925ea5b2fcef3addb783
2021-10-14 23:56:58 -05:00
Sean Keely 4455250be1 Add HSA_CU_MASK
New environment variable HSA_CU_MASK allows users to
specify a cu mask to every queue allocated from any
GPU.  hsa_amd_queue_cu_set_mask is restricted from
escaping this mask.

A new API hsa_amd_queue_cu_get_mask is added to query
the current cu mask.

Change-Id: I846c03a5faaca9b95067c31db84b59cc9fce2f03
2021-07-29 02:23:34 -05:00
Ramesh Errabolu f7350c6020 Update ROCr implementation of Queue ID
Change-Id: Iec48b1978e4d01563e71cfb58aed8f1bbc446443
2020-06-26 13:25:00 -05:00
Ramesh Errabolu fa13208698 Add rocr namespace to core header and impl files
Change-Id: I1e1b33f9bba1078d049bc19797889988c3e43360
2020-06-19 22:34:21 -04:00
Sean Keely ce19721c88 Update copyright date.
Change-Id: If4bf4c20cf051878bfe759080bb7345d884dd53d
2020-06-19 22:34:01 -04:00
Sean Keely 0a43a107b1 Initial GWS queue support.
Queues should transition to ref counting for all queues eventually.
That cleanup will be part of shared queue pooling support.

Change-Id: I217ff5d573156678b9559da6fb81baa8cd31c617
2019-12-09 21:21:17 -05:00
Sean Keely 8323b2e1d7 Add pooling for Signal ABI blocks (SharedSignal).
Makes better use of memory and greatly reduces mmap count.

Change-Id: Ib444cd1ccd144986adbcc7cec297a966e2c08bc7
2018-11-12 22:37:28 -06:00
Jay Cornwall e388a23344 Add hsa_amd_queue_set_priority extension function
Controls dispatch and wavefront scheduling arbitration across quees.

Change-Id: I498f4898b544f79b8fb8514bf7e789ca9da29462
2018-06-19 19:41:28 -05:00
Sean Keely 5f25619bb7 Enable large scratch on GFX8.
Ensure system release fence is set on GFX8 large scratch using packets.

Change-Id: I13cfdcd35969482ea6e95e0b352f5cb3a0454b86
2018-04-30 07:24:53 -04:00
Sean Keely b6f0248f53 Respect new memory model requirements at queue destroy.
Spec requires GPU release fences and CPU acquire fences at queue destroy.
Also update the recognized status codes.

Change-Id: If9166f5149f65417c7057ff7c0f69f6ac094d6ab
2018-04-04 08:13:00 -04:00
Sean Keely 6455a69b03 Fix bad casts in tools.
Also virtualize queue profiling enable.

Change-Id: I761b41269be3df7eb64a5914ee9951ed6b51bb04
2017-11-08 15:50:02 -05:00
Sean Keely f312a7386e Exception support for Queue.
Remove "zombie" queue state and report queue creation failure via
exceptions.  Make Shared object a final container and support array
objects with Shared.  Add message printing to hsa_exception in
debug builds.

Change-Id: I459f38c80846018acbf45538874e95f91dd6b195
2017-11-08 15:50:02 -05:00
Sean Keely 0c7dde2d1f Add queue intercept support to the runtime.
Queue intercept is exposed as two tools-only APIs via the API
intercept table.

Change-Id: Iac9602ed3143974d85c3569e9092295ad18037f8
2017-11-08 15:50:01 -05:00
Sean Keely bc0bd00746 Fix queue interception in tools.
1. Correct amd::AqlQueue::ExecutePM4 to support interception.
2. Minor fixes to AqlPacket and SoftCP.
3. Minimal change to disable interception of runtime internal queues.

Change-Id: I103fece2ebf9a188d27f01e61221c737405d7253
2017-07-12 16:39:43 -04:00
Kenny Ho 5b4df54b10 Revert "Implement memory fault analysis through context save area"
This reverts commit 75c9506f9d.

Change-Id: Ibf11b764b383b9be291f3009a30550e1a1e2d115
2017-06-14 14:21:53 -04:00
Jay Cornwall 75c9506f9d Implement memory fault analysis through context save area
When a fatal memory fault occurs the scheduler context-saves all queues
in the process and notifies the runtime through the memory event. The
saved state contains all GPR/LDS data at the moment of the fault.

Retrieve this state and present it to the user if HSA_DEBUG_FAULT is set
to "analyze" and the wavefront caused the fault. If amdgcn-capable objdump
is in the PATH invoke this to disassemble code around the PC.

Queue lifetime is now managed by the runtime to allow querying the
context save state for all active queues.

Change-Id: I6fee662fad1c4f9aa125bf5c53d7d0ea1ab32f95
2017-06-13 23:12:28 -04:00
Sean Keely 0e17cc2887 Allow reducing max occupancy (max scratch waves) when applications request large amounts of scratch.
Also emit error messages to stderr if no async queue error callback was registered and queue fault messages are enabled (on by default).
Queue fault messages are controlled with env key HSA_ENABLE_QUEUE_FAULT_MESSAGE.

Change-Id: I496487b8d048b83aa95b9784e92928211f167b17
2016-12-20 16:52:59 -06:00
Jay Cornwall 74f5aca93d Refactor: Consolidate calls to hsaKmtAllocMemory
Route all device-visible system memory allocations through system_allocator.

Change-Id: I5e90a1bf491e432678a6d8ab1f9f3770734cbda1
2016-08-24 23:57:19 -04:00
Sean Keely 54f1311e01 Update clang-format file to clang-format v3.8.
Format HSA v1.1 core updates.

Change-Id: I540b5c0e5b3ec7522b09c2e070167812b3f17769
2016-08-23 05:50:28 -05:00
Konstantin Zhuravlyov c2c993e0d8 Update code object/isa/loader to hsa v1.1
- Includes Sean's latest changes
- Cleanups/improvements
- Fixes for few bugs that crept over from previous releases

Change-Id: I839dc4895bf13ebd0afc8843424387a9fef667b0
2016-08-22 15:03:23 -04:00
Jay Cornwall f76577ae43 Invalidate caches after allocating a code object
Due to a misinterpretation of the HSA specification the microcode has,
until now, been responsible for ensuring a coherent view of the
amd_kernel_code_t object when acquire_fence_scope is set to agent or system.
To correct this the runtime must instead assume this responsibility.

Introduce GpuAgentInt::InvalidateCodeCaches to perform this operation
on-demand. Invoke this after code object allocation. Extend the Queue
implementations to support PM4 command submission, through which the
PM4 command ACQUIRE_MEM can be submitted to perform cache invalidation.
Submit through a runtime-managed queue shared with the blit implementation.

This change depends on microcode support and this is checked against the
running version. Older microcode builds will perform cache invalidation
themselves, so it is acceptable for this change to do nothing in that case.

Change-Id: I268dd2b83af3decdd9ad07430a81df8a2ecb6bd2
2016-08-02 13:30:55 -04:00
James Edwards (xN/A) TX 7d2bc9d113 Separate open source core runtime code from DK makefiles.
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1250152]
2016-03-22 18:10:13 -05:00
James Edwards (xN/A) TX 7d1e6c3a57 Remove opensrc test files.
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1249961]
2016-03-22 13:39:51 -05:00
James Edwards (xN/A) TX c9ffe0004e Check open source core runtime code into perforce. This includes license and README files.
[git-p4: depot-paths = "//depot/stg/hsa/drivers/hsa/runtime/": change = 1249136]
2016-03-20 15:39:40 -05:00