Implement HybridMutex to improve latencies compared to KernelMutex when
there is contention between several threads calling hsa_signal_create
and hsa_amd_signal_async_handler.
Change-Id: If53377033e749b0050727964c9303f09b02527cc
Prior solution used a single global lock to protect the memory tracking structures.
This change protects the memory tracking structure with a shared mutex (rw lock) in
shared (r) mode for memory allocations and frees so that long duration processes,
calling to kfd, can be done in parallel. Operations which must modify the memory map
take the mutex in exclusive mode (w) and must not call to the thunk while holding
the mutex.
The fragment allocator now requires separate protection and is protected with a
mutex at the device level. Protecting at the device level, rather than pool,
allows retention of the current recursive design and allows calling Trim from
withing Allocate. This could be made finer (pool level locks) but would
require backing out of Allocate entirely to call Trim. Trim and any retried
Allocation must be done in isolation (per device) or we may report OOM when
memory is actually available in some pool's fragment cache. So some device
level serialization is required in at least some paths.
Change-Id: I7c1e94d6965ffcc602b12fefdd3a6e97b84b5e00
GCC can't reasonably be told that the lock ptr isn't null. Adding a private bool
allows the branch to be eliminated, along with the bool.
Change-Id: I0605d69474d6a6e6951be93c0af1d8caf3f77124
Track pointer info for sub 2MB fragment allocations in allocation_map_.
Add fragment support to IPC.
Change-Id: I00cfc2e2fa289aac90a4718c392f9bb056a61a87