Ensure that the write index and ring buffer contents are visible
to the HW before sending the doorbell. The doorbell is a write-combined
MMIO store and must be ordered after prior cacheable non-MMIO stores.
Also be more explicit about memory semantics for doorbell stores.
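A minimal sketch of the ordering described above, not the actual runtime
code. The RingQueue struct, its field names, and SubmitPacket are
hypothetical, and the value written to the doorbell is illustrative. It
assumes an x86-64 build, where a store fence orders earlier cacheable
stores before a later write-combined store; other targets fall back to a
full fence here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>  // _mm_sfence
#endif

// Hypothetical queue state; names are illustrative only.
struct RingQueue {
  uint32_t ring[64] = {};                 // cacheable ring buffer contents
  std::atomic<uint64_t> write_index{0};   // cacheable write index read by the HW
  volatile uint64_t* doorbell = nullptr;  // would point at write-combined MMIO
};

// Publish one packet, then ring the doorbell.
inline void SubmitPacket(RingQueue& q, uint32_t packet) {
  assert(q.doorbell != nullptr);
  const uint64_t idx = q.write_index.load(std::memory_order_relaxed);
  q.ring[idx % 64] = packet;
  // Release store: ring contents are ordered before the new write index.
  q.write_index.store(idx + 1, std::memory_order_release);
#if defined(__x86_64__) || defined(_M_X64)
  // Store fence: prior cacheable stores are ordered before the doorbell store.
  _mm_sfence();
#else
  // Assumption: a full fence is a conservative stand-in on other targets.
  std::atomic_thread_fence(std::memory_order_seq_cst);
#endif
  // Doorbell store; writing the new write index is an assumption of this sketch.
  *q.doorbell = idx + 1;
}
```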
Change-Id: Ie4d96a7ee2a507237a8dbe7705fdf234d62ce9ba
If we issue too many copy commands without syncing and wrapping happens,
we need to wait for the blits to be done before moving forward; otherwise
we will overwrite the kernel args of the blits in flight.
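The wrap handling can be sketched roughly as follows. KernargRing and its
wait callback are hypothetical names used only for illustration; the real
blit code may track completion differently (for example with a signal).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical ring of kernel argument slots, one slot per blit.
class KernargRing {
 public:
  KernargRing(size_t slots, std::function<void()> wait_for_blits)
      : slots_(slots), wait_(std::move(wait_for_blits)) {
    assert(slots_ > 0);
  }

  // Returns the slot index to use for the next blit.
  size_t Acquire() {
    if (next_ == slots_) {
      // About to wrap: in-flight blits may still read the early slots,
      // so wait for them to finish before reusing slot 0.
      wait_();
      next_ = 0;
    }
    return next_++;
  }

 private:
  size_t slots_;
  std::function<void()> wait_;
  size_t next_ = 0;
};
```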
Change-Id: I9a21e31ce07f8e8157ca38e96dc264ff47fd3639
Introduce a tiling format for images; LINEAR is still used for now.
Use the new KFD/Thunk API hsaKmtGetTileConfig for the address library.
Change-Id: Ic0677429dd320eef09ab62dddaf9b2dd94c4f904
C11 atomics are not statically guaranteed to be lock free and so
may not be atomic with respect to atomic operations originating
outside the standard library, such as platform atomics.
C11 macros to statically discover always lock free operations
(ATOMIC_*_LOCK_FREE) do not cover uint64_t in GCC and
std::atomic<uint64_t> is not a type alias of any covered type.
All use of __atomic by atomic_helpers.h is statically checked to be
always lock free.
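A sketch of what such a static check can look like, assuming GCC or Clang.
The helper names below are hypothetical and are not the contents of
atomic_helpers.h; __atomic_always_lock_free and the __atomic_* operations
are compiler builtins.

```cpp
#include <cassert>
#include <cstdint>

// Fails the build if 8-byte atomics are not always lock free on the target.
// The second argument 0 asks about objects with typical alignment.
static_assert(__atomic_always_lock_free(sizeof(uint64_t), 0),
              "64-bit atomics must be always lock free");

// Hypothetical helpers built directly on the __atomic builtins.
inline uint64_t LoadAcquire(const uint64_t* p) {
  assert(p != nullptr);
  return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}

inline void StoreRelease(uint64_t* p, uint64_t v) {
  assert(p != nullptr);
  __atomic_store_n(p, v, __ATOMIC_RELEASE);
}

// Returns the value held before the addition.
inline uint64_t FetchAdd(uint64_t* p, uint64_t v) {
  assert(p != nullptr);
  return __atomic_fetch_add(p, v, __ATOMIC_ACQ_REL);
}
```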
GCC builtin fencing does not appear to be strong enough for WC memory.
Added an option (enabled) to enforce consistency for WC memory on x64.
__sync builtins were not used as they were declared legacy by GCC.
Added a strongly conservative option (ALWAYS_CONSERVATIVE) to enable
use of full memory fences in place of partial fences and
compiler-driven, processor-specific optimization.
Change-Id: Id7aaaca626144070f58759f6a348cbee4612bbc0
Change hsa_code_object_serialize and hsa_code_object_deserialize to use
memcpy instead of hsa_memory_copy, since this is a system-to-system copy.
Change-Id: I329e270ae4e2fc25e177dc8080d93662ffb261ab
- Includes Sean's latest changes
- Cleanups/improvements
- Fixes for a few bugs that crept in from previous releases
Change-Id: I839dc4895bf13ebd0afc8843424387a9fef667b0
The PM4 IB must have executable permission.
A second part of this fix concerns robustness when this is not the case.
This remains under investigation.
This fix will shortly be cleaned up in a refactoring pass to consolidate
calls to hsaKmtAllocMemory.
Change-Id: I326fe01949a77669e0b07c3cadc9fd44b8065055
Due to a misinterpretation of the HSA specification, the microcode has,
until now, been responsible for ensuring a coherent view of the
amd_kernel_code_t object when acquire_fence_scope is set to agent or system.
To correct this, the runtime must instead assume this responsibility.
Introduce GpuAgentInt::InvalidateCodeCaches to perform this operation
on-demand. Invoke this after code object allocation. Extend the Queue
implementations to support PM4 command submission, through which the
PM4 command ACQUIRE_MEM can be submitted to perform cache invalidation.
Submit through a runtime-managed queue shared with the blit implementation.
This change depends on microcode support and this is checked against the
running version. Older microcode builds will perform cache invalidation
themselves, so it is acceptable for this change to do nothing in that case.
Change-Id: I268dd2b83af3decdd9ad07430a81df8a2ecb6bd2
This option was disabled by default to address issues writing to stderr
in Windows applications. The lack of an error message for memory access
faults is confusing to users, however.
Enable the error message by default on Linux only.
Change-Id: I1f44ba42362f8874abdc7c8e63ddd54a855b5394
The runtime needs a queue on which to submit cache management commands.
Device-to-device blit copy already creates a queue unconditionally.
We can share this queue for both purposes.
This change restructures the BlitKernel interface to accept, rather than
create, a queue. GpuAgent creates queues as needed for both cache
management and blit compute.
Fix queue full detection in AcquireWriteIndex (<= vs <).
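The message does not say which comparison was wrong, so the following is
only an illustration of how an off-by-one in a fullness check behaves.
HasRoom and its parameters are hypothetical and are not the actual
AcquireWriteIndex code.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when num_packets more packets fit into a ring of `size` slots.
// Indices are assumed to increase monotonically and never wrap.
inline bool HasRoom(uint64_t write_index, uint64_t read_index,
                    uint64_t size, uint64_t num_packets) {
  assert(write_index >= read_index);
  const uint64_t in_flight = write_index - read_index;
  // `<=` lets the ring fill completely; `<` would report full one slot early.
  return in_flight + num_packets <= size;
}
```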
Change-Id: I61d0c6b9d04f2dba74872f0676ad791435778ba4
This is the first part of transitioning to the LLVM-based assembler.
SP3 is deprecated and all references to the library are removed.
Pending LLVM support, relevant shaders have been precompiled.
Change-Id: I7d44cef5ded1836c4a74b77881af5bea8803d2c1
On multi-node systems only the first CPU node was recognized in the
signal consumer list, causing fallback to non-interrupt signals.
Change-Id: I9bd0706bafbe046be9d7f210d05fa4cf1fcd16fa