Gfx9 requires monotonic write pointer and doorbell.
Cound fields are 1-based compared with 0-based pre-Gfx9.
- Restructure implementation to use monotonic ring indices
- Remove redundant submission size checks (handled by AcquireWriteAddress)
- Unify copy/fill per-command limit (documentation is unclear)
Change-Id: I57c1675221d2e63aa319fee700d9951671e1bd65
Note: Implementation same as 1.0 APIs for now.
The followup change will have the complete implementation.
Change-Id: Ife633f74ff27eee0bb9b0c46952cf5233b0114e8
Also emit error messages to stderr if no async queue error callback was registered and queue fault messages are enabled (on by default).
Queue fault messages are controlled with env key HSA_ENABLE_QUEUE_FAULT_MESSAGE.
Change-Id: I496487b8d048b83aa95b9784e92928211f167b17
Uncommented HSA IPC code.
Changed hsa_amd_ipc_memory_t to be 8 uint32_t's instead of 9 to
match spec
Change-Id: Id1523125e9b876a23c3743df1be29c98b47f6725
Ensure that the write index and ring buffer contents are visible
to the HW before sending the doorbell. The latter is a write-combined
MMIO store and must be ordered with prior cacehable non-MMIO stores.
Also be more explicit about memory semantics for doorbell stores.
Change-Id: Ie4d96a7ee2a507237a8dbe7705fdf234d62ce9ba
If we issue too many copy commands without syncing and wrapping happens,
we need to wait for the blits to be done before moving forward otherwise
we will overwrite the kernel args of the blits in flight.
Change-Id: I9a21e31ce07f8e8157ca38e96dc264ff47fd3639
Introducing tiling format for images, still using LINEAR for now.
Using the new KFD/Thunk API hsaKmtGetTileConfig API for the address library.
Change-Id: Ic0677429dd320eef09ab62dddaf9b2dd94c4f904
C11 atomics are not statically guaranteed to be lock free and so
may not be atomic with respect to atomic operations originating
outside the standard library, such as platform atomics.
C11 macros to statically discover always lock free operations
(ATOMIC_*_LOCK_FREE) do not cover uint64_t in GCC and
std::atomic<uint64_t> is not a type alias of any covered type.
All use of __atomic by atomic_helpers.h is statically checked to be
always lock free.
GCC builtin fencing does not appear to be strong enough for WC memory.
Added an option (enabled) to enforce consistency for WC memory on x64.
__sync builtin's were not used as they were declared legacy by GCC.
Added a strongly conservative option (ALWAYS_CONSERVATIVE) to enable
use of full memory fences in place of partial fences and compiler
driven processor specific optimization.
Change-Id: Id7aaaca626144070f58759f6a348cbee4612bbc0
Change hsa_code_object_serialize and hsa_code_object_deserialize to use memcpy instead of hsa_memory_copy since it is system->system copy
Change-Id: I329e270ae4e2fc25e177dc8080d93662ffb261ab