This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.
Updating ROCr code to match new handshake protocol with CP FW for
asynchronous scratch reclaim.
Increase previous limits when scratch reclaim feature is available.
On GPUs where EOP is handled in asic, the read_dispatch_id is not always
updated after each packet. Look for the first dispatch packet that needs
scratch memory before allocating scratch.
Change-Id: Ibf4b4b485f99bf2fabfe48e9609ca99111fdafbe
- Use HSA_ALLOCATE_QUEUE_DEV_MEM=1 to create AQL queue in device
memory.
- Before writing AQL packet header to the queue use an SFENCE to ensure
that there is no reodering of the writes over PCIE
Change-Id: I5eacdc35108c4a1e245c75ae349b7495451aa60d
Force mem_flags to be explicit passed in then calling Queue constructor
to avoid ambiguity with calls to Queue constructor trying to only pass
the agent_node_id.
Change-Id: Ib6fedcb9e52d6c9f35f9051dfa989343456ca368
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
New hsa_amd_queue_get_info API to support:
- HSA_AMD_QUEUE_INFO_AGENT: Agent that owns the underlying HW queue
- HSA_AMD_QUEUE_INFO_DOORBELL_ID: KFD doorbell ID of the queue
completion signal.
Change-Id: I98842131bcbdd08552649791a5d43e578a615808
ExecutePM4() function can optionally accept extra arguments for
acquire fence scope, release fence scope andcompletion signal. When
a completion signal is provided, ExecutePM4() does not wait for the
commands to complete.
Change-Id: Ib2a433b7bce1cb6260be8b76fe902335bd5dfada
An AQL packet header field is stored using an atomic release, and needs
to be read using atomic acquire if it may be written by another thread.
Change-Id: I1d75587fd93f9c6216deebffc9a627b404a7e749
Define AMD_AQL_FORMAT_INTERCEPT_MARKER AMD vendor AQL packet. Add
support to intercept queue to invoke a callback for these packets.
Change-Id: Ia58d5fe2171f563632b4edd6343e02585f49d149
Aqlpacket:IsValid() function: Replaced bitwise AND operator (&) with the logical
AND operator (&&) when evaluating AQL packet type
Change-Id: I59980bc206cc7eff424023fff0bb92b618aa8c70
MES devices need GART mappings and therefore need non-paged memory. But
using non-paged memory introduces performance regression where it can
take over 80 ms to see the signal changes if the memory is in the wrong
NUMA node. Currently, we cannot control NUMA affinity when allocating
non-paged memory. Using non-paged memory allocation only on devices that
have MES scheduler
Change-Id: Ib27fb01d75247aa4f2bb2aa4503c6af5a98afda0
Non-paged allocation for queue memory necessary for binding wptr to
GART. Required to support usermode queue oversubscription with MES for
GFX11.
Adds AllocateNonPaged entry to MemoryRegion::AllocateEnum for clarity;
aliases AllocateIPC.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I1a97a1820da26cf2433d9c237b2e6d2b0b8628b4
Clang warns about bitwise operators on bools. Cast to int silences
the warning without introducing short circut logic.
Change-Id: I6e25138e1acf4a5562d3925ea5b2fcef3addb783
New environment variable HSA_CU_MASK allows users to
specify a cu mask to every queue allocated from any
GPU. hsa_amd_queue_cu_set_mask is restricted from
escaping this mask.
A new API hsa_amd_queue_cu_get_mask is added to query
the current cu mask.
Change-Id: I846c03a5faaca9b95067c31db84b59cc9fce2f03
Queues should transition to ref counting for all queues eventually.
That cleanup will be part of shared queue pooling support.
Change-Id: I217ff5d573156678b9559da6fb81baa8cd31c617
Spec requires GPU release fences and CPU acquire fences at queue destroy.
Also update the recognized status codes.
Change-Id: If9166f5149f65417c7057ff7c0f69f6ac094d6ab
Remove "zombie" queue state and report queue creation failure via
exceptions. Make Shared object a final container and support array
objects with Shared. Add message printing to hsa_exception in
debug builds.
Change-Id: I459f38c80846018acbf45538874e95f91dd6b195
1. Correct amd::AqlQueue::ExecutePM4 to support interception.
2. Minor fixes to AqlPacket and SoftCP.
3. Minimal change to disable interception of runtime internal queues.
Change-Id: I103fece2ebf9a188d27f01e61221c737405d7253
When a fatal memory fault occurs the scheduler context-saves all queues
in the process and notifies the runtime through the memory event. The
saved state contains all GPR/LDS data at the moment of the fault.
Retrieve this state and present it to the user if HSA_DEBUG_FAULT is set
to "analyze" and the wavefront caused the fault. If amdgcn-capable objdump
is in the PATH invoke this to disassemble code around the PC.
Queue lifetime is now managed by the runtime to allow querying the
context save state for all active queues.
Change-Id: I6fee662fad1c4f9aa125bf5c53d7d0ea1ab32f95
Also emit error messages to stderr if no async queue error callback was registered and queue fault messages are enabled (on by default).
Queue fault messages are controlled with env key HSA_ENABLE_QUEUE_FAULT_MESSAGE.
Change-Id: I496487b8d048b83aa95b9784e92928211f167b17
- Includes Sean's latest changes
- Cleanups/improvements
- Fixes for few bugs that crept over from previous releases
Change-Id: I839dc4895bf13ebd0afc8843424387a9fef667b0
Due to a misinterpretation of the HSA specification the microcode has,
until now, been responsible for ensuring a coherent view of the
amd_kernel_code_t object when acquire_fence_scope is set to agent or system.
To correct this the runtime must instead assume this responsibility.
Introduce GpuAgentInt::InvalidateCodeCaches to perform this operation
on-demand. Invoke this after code object allocation. Extend the Queue
implementations to support PM4 command submission, through which the
PM4 command ACQUIRE_MEM can be submitted to perform cache invalidation.
Submit through a runtime-managed queue shared with the blit implementation.
This change depends on microcode support and this is checked against the
running version. Older microcode builds will perform cache invalidation
themselves, so it is acceptable for this change to do nothing in that case.
Change-Id: I268dd2b83af3decdd9ad07430a81df8a2ecb6bd2