New API to accept a file stream for logging
Co-authored-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: Ie09c35ae14ca86a97eb25f61251be287c55d7169
Signed-off-by: Chris Freehill <cfreehil@amd.com>
Implement HybridMutex to improve latencies compared to KernelMutex when
there is contention between several threads calling hsa_signal_create
and hsa_amd_signal_async_handler.
Change-Id: If53377033e749b0050727964c9303f09b02527cc
Fixes hang due to change in order of initialization of libraries
that have cyclical dependencies and they call hsa_init() during their
initialization phase.
This implementation looks for a symbol called "HSA_AMD_TOOL_PRIORITY"
across all loaded shared libraries using dynamic section entries of the
loaded lib instead of using dlopen and dlsym for the same purpose.
Change-Id: I4865f2fd18dd186ec311a432ec38fbb5583805d2
During hsa initializing stage, ROCr now searches all the loaded libraries
for a symbol "HSA_AMD_TOOL_PRIORITY" and adds all those libraries to
the tools library init list. Tools libraries listed in HSA_TOOLS_LIB
env variable are also loaded in the given order and take priority
over HSA_AMD_TOOL_PRIORITY.
Change-Id: I739af42bbd777c44a9152c11e17dd69979b65e82
This is consistent with KFD and has significantly better latency.
KFD is taking this as the definition of the SystemClockCounter.
Change-Id: I4c1b3bc58c738206265c55ebefd41356c013bfe5
Prior solution used a single global lock to protect the memory tracking structures.
This change protects the memory tracking structure with a shared mutex (rw lock) in
shared (r) mode for memory allocations and frees so that long duration processes,
calling to kfd, can be done in parallel. Operations which must modify the memory map
take the mutex in exclusive mode (w) and must not call to the thunk while holding
the mutex.
The fragment allocator now requires separate protection and is protected with a
mutex at the device level. Protecting at the device level, rather than pool,
allows retention of the current recursive design and allows calling Trim from
withing Allocate. This could be made finer (pool level locks) but would
require backing out of Allocate entirely to call Trim. Trim and any retried
Allocation must be done in isolation (per device) or we may report OOM when
memory is actually available in some pool's fragment cache. So some device
level serialization is required in at least some paths.
Change-Id: I7c1e94d6965ffcc602b12fefdd3a6e97b84b5e00
Joined threads can not be joined more than once nor can they be detached.
Thread library wait and close allows multiple waits and separate close so
this fixes the pthread implementation.
Change-Id: I0019271a438f11ed4c6c11854011f5c4f6e16b65