Because eventDescrp->mutex is a non-recursive lock, attempting to
acquire it with pthread_mutex_lock can hang the system
indefinitely if the lock was already acquired by the
preceding call to pthread_mutex_trylock.
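A minimal C sketch of the corrected pattern, assuming a default (non-recursive) pthread mutex. `event_descr_t` and `acquire_event_mutex` are hypothetical names, not the runtime's code. The blocking lock is taken only when the try reported EBUSY:

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the event descriptor; only its mutex matters. */
typedef struct {
  pthread_mutex_t mutex; /* default, non-recursive mutex */
} event_descr_t;

/* Acquire the mutex exactly once. Relocking a non-recursive mutex that this
 * thread already holds hangs (or is undefined behavior for the default type),
 * so pthread_mutex_lock is called only when trylock found the lock busy. */
static int acquire_event_mutex(event_descr_t *e) {
  int rc = pthread_mutex_trylock(&e->mutex);
  if (rc == 0) return 0;      /* acquired: do NOT lock again */
  if (rc != EBUSY) return rc; /* unexpected error: report it */
  return pthread_mutex_lock(&e->mutex); /* held elsewhere: wait for it */
}
```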
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Removed 'args' as a unique pointer, along with its deletion in
'ThreadTrampoline'; it is now declared as a class member.
Change-Id: Ia52058392d0170e8b5e57cfdd2c587f47a6f93f0
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
WaitSemaphore and PostSemaphore are used in the HybridMutex
implementation. If HybridMutex was acquired without calling
WaitSemaphore, a subsequent PostSemaphore would make the internal count
inside sem_t slowly grow to large values and eventually
overflow.
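To illustrate why posts must stay paired with waits, here is a benaphore-style sketch (an atomic counter plus a POSIX semaphore). It is an assumed design for illustration, not the actual HybridMutex, and `hybrid_lock_t` and its functions are hypothetical. Unlock posts only when the counter shows a waiter, so the semaphore value stays bounded:

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdatomic.h>

/* Hypothetical benaphore-style lock: an atomic counter handles the
 * uncontended case; only contended lockers sleep on the semaphore. */
typedef struct {
  atomic_int count; /* threads holding or waiting for the lock */
  sem_t sem;        /* starts at 0; posted once per waiter to wake it */
} hybrid_lock_t;

static int hybrid_init(hybrid_lock_t *l) {
  atomic_init(&l->count, 0);
  return sem_init(&l->sem, 0, 0);
}

static void hybrid_lock(hybrid_lock_t *l) {
  /* A previous count > 0 means the lock is held: wait for a hand-off. */
  if (atomic_fetch_add(&l->count, 1) > 0) {
    while (sem_wait(&l->sem) == -1 && errno == EINTR) {
      /* interrupted by a signal: keep waiting */
    }
  }
}

static void hybrid_unlock(hybrid_lock_t *l) {
  /* A previous count > 1 means at least one thread is waiting (or about to
   * wait): post exactly once for it. An uncontended unlock never posts, so
   * the semaphore value cannot creep upward. */
  if (atomic_fetch_sub(&l->count, 1) > 1) {
    sem_post(&l->sem);
  }
}
```

Locking and unlocking this repeatedly from a single thread leaves the semaphore value at 0, whereas posting on every unlock would raise it by one per iteration.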
Change-Id: I173fc17c874b49926e56991405e9086ea8c138fc
Rewrite the logic to fix an issue where errors other than EINVAL
returned by pthread_create were ignored.
Change-Id: I573958724dcf886c20e8c14e6a9182303b3ffa06
New API to accept a file stream for logging
Co-authored-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: Ie09c35ae14ca86a97eb25f61251be287c55d7169
Signed-off-by: Chris Freehill <cfreehil@amd.com>
This reverts commit 1df7a44112e45b7fb447926778490f741601219a.
Change-Id: Ib386c8f944b6da0ef68ddd2be3f26013cd36ef5b
Signed-off-by: Chris Freehill <cfreehil@amd.com>
This reverts commit ef95ccf81e59b8608861e8f2f256d981eee19df7.
Reason for revert: Causing performance regressions on some systems
Change-Id: I82951350cafbd57c495852d6f90023a3373f04f6
Signed-off-by: Chris Freehill <cfreehil@amd.com>
If the pthread_attr_setaffinity_np function exists, use it instead of
pthread_setaffinity_np, as pthread_setaffinity_np appears to fail to set
the affinity on some systems.
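A sketch of the selection logic, assuming a hypothetical build-time macro `HAVE_PTHREAD_ATTR_SETAFFINITY_NP` that a configure check would define; `create_pinned_thread` and `echo_arg` are likewise hypothetical, and treating a failed pin as non-fatal is a choice of this sketch. Both affinity functions are GNU extensions and need `_GNU_SOURCE`:

```c
#define _GNU_SOURCE /* pthread_*affinity_np and cpu_set_t are GNU extensions */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Creates a thread restricted to `cpus` and returns pthread_create's result.
 * *pinned reports whether the affinity was applied. The macro
 * HAVE_PTHREAD_ATTR_SETAFFINITY_NP is hypothetical (from a feature check). */
static int create_pinned_thread(pthread_t *tid, const cpu_set_t *cpus,
                                void *(*fn)(void *), void *arg, bool *pinned) {
  pthread_attr_t attr;
  int rc = pthread_attr_init(&attr);
  *pinned = false;
  if (rc != 0) return rc;
#ifdef HAVE_PTHREAD_ATTR_SETAFFINITY_NP
  /* Preferred: affinity is part of the attributes, so the thread starts
   * on the requested CPUs. */
  bool attr_ok = pthread_attr_setaffinity_np(&attr, sizeof(cpu_set_t), cpus) == 0;
  rc = pthread_create(tid, &attr, fn, arg);
  *pinned = (rc == 0) && attr_ok;
#else
  /* Fallback: create first, then pin the already-running thread. */
  rc = pthread_create(tid, &attr, fn, arg);
  *pinned = (rc == 0) &&
            pthread_setaffinity_np(*tid, sizeof(cpu_set_t), cpus) == 0;
#endif
  pthread_attr_destroy(&attr);
  return rc;
}

/* Trivial worker for the usage example: returns its argument. */
static void *echo_arg(void *arg) { return arg; }
```

In the fallback path the thread is already running when its affinity is changed, so it may briefly execute on any allowed CPU; setting the affinity through the attribute object avoids that window.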
Change-Id: Icd8b17039699ac10d9cd5c4dbb6ac44630673949
Fix musl libc NULL errors and unsupported pthread functions for compatibility.
Also ensure cleanup and error handling regardless of the CPU affinity override.
Fix submitted by GitHub developer AngryLoki
https://github.com/ROCm/ROCR-Runtime/issues/181
Change-Id: Ia487315e504112be5d3370756f23f6e23b9ae4be
Implement HybridMutex to improve latencies compared to KernelMutex when
there is contention between several threads calling hsa_signal_create
and hsa_amd_signal_async_handler.
Change-Id: If53377033e749b0050727964c9303f09b02527cc
On some systems, pthread_attr_setaffinity_np does not exist, so we need
to call pthread_setaffinity_np on the thread after pthread_create.
Provided by Julian Samaroo on GitHub
https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/143
Change-Id: I4649f94333f2d7b0a5993b370a4bfc48d92acecb
I've reverted some code to what it was in 5.5 by wrapping the new
x86-specific bits with #if's, e.g.:
- CPUID is x86-specific
- mwait is x86-specific
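A minimal sketch of the guarding pattern, assuming GCC or Clang (whose `<cpuid.h>` provides `__get_cpuid`); `cpu_has_mwait` is a hypothetical helper. The CPUID query is compiled only on x86 builds, and other targets get a plain fallback:

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h> /* GCC/Clang wrapper for the x86 CPUID instruction */
#endif

/* Hypothetical helper: reports whether CPUID advertises MONITOR/MWAIT.
 * On non-x86 targets the CPUID code is compiled out and false is returned. */
static bool cpu_has_mwait(void) {
#if defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false; /* leaf 1 unsupported */
  return (ecx & (1u << 3)) != 0; /* CPUID.01H:ECX bit 3 = MONITOR/MWAIT */
#else
  return false;
#endif
}
```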
Change-Id: I6cefae34282c777c7340daf3f934d2a11742502e
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
On Linux, the os_thread abstraction is built on top of pthread. Many of
the pthread calls can fail and return error codes. The error
conditions are only checked via assertions (if at all), which means
that in a release build no error condition is checked. The
same goes for dlsym/dlinfo and clock_gettime.
This commit improves the situation by checking the error conditions
and acting accordingly. When an error condition is detected in a
function that has a means to report errors to its caller, this
patch prints an error message and returns. If there is no way to
propagate the error up the call stack, it prints an error message and
aborts the process.
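A sketch of the two policies described above, with hypothetical helper names: `mutex_create` can report failure and returns `false`, while `mutex_release` and `monotonic_ns` have no error channel and abort. Note that pthread functions return the error code directly, whereas `clock_gettime` returns -1 and sets `errno`:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Can report failure to its caller: print a message and return false. */
static bool mutex_create(pthread_mutex_t *m) {
  int rc = pthread_mutex_init(m, NULL);
  if (rc != 0) {
    fprintf(stderr, "pthread_mutex_init failed: %s\n", strerror(rc));
    return false;
  }
  return true;
}

/* No way to report failure: print a message and abort, which is what an
 * assert() would have done in a debug build. */
static void mutex_release(pthread_mutex_t *m) {
  int rc = pthread_mutex_unlock(m);
  if (rc != 0) {
    fprintf(stderr, "pthread_mutex_unlock failed: %s\n", strerror(rc));
    abort();
  }
}

/* clock_gettime signals failure with -1 and errno, not a return code. */
static uint64_t monotonic_ns(void) {
  struct timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
    perror("clock_gettime");
    abort();
  }
  return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```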
For the os_info::os_info ctor, the only user is CreateThread, which
checks that the built thread is Valid(). If not, nullptr is returned to
the caller.
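A C analogue of that pattern (the original is C++; `os_thread_t`, `os_thread_construct`, `thread_valid`, `create_thread`, and `echo_arg` are hypothetical names). The constructor cannot return an error, so it records the outcome, and the factory turns an invalid object into NULL:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical thread object whose "constructor" cannot return an error. */
typedef struct {
  pthread_t tid;
  bool valid;
} os_thread_t;

static void os_thread_construct(os_thread_t *t, void *(*fn)(void *), void *arg) {
  int rc = pthread_create(&t->tid, NULL, fn, arg);
  if (rc != 0) /* every non-zero code is a failure, not only EINVAL */
    fprintf(stderr, "pthread_create failed: %s\n", strerror(rc));
  t->valid = (rc == 0);
}

static bool thread_valid(const os_thread_t *t) { return t->valid; }

/* Factory: checks validity and hands NULL back to the caller on failure. */
static os_thread_t *create_thread(void *(*fn)(void *), void *arg) {
  os_thread_t *t = malloc(sizeof *t);
  if (t == NULL) return NULL;
  os_thread_construct(t, fn, arg);
  if (!thread_valid(t)) {
    free(t);
    return NULL;
  }
  return t;
}

/* Trivial worker for the usage example: returns its argument. */
static void *echo_arg(void *arg) { return arg; }
```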
It would be possible to use exceptions when functions cannot pass
errors, but for now I only use abort, as that is what assert would do
in a debug build.
Change-Id: I815703c3b95777cc29bb89a7d654ac879c14a759
Fixes a hang caused by a change in the initialization order of libraries
that have cyclic dependencies and call hsa_init() during their
initialization phase.
This implementation looks for a symbol called "HSA_AMD_TOOL_PRIORITY"
across all loaded shared libraries by using the dynamic section entries
of each loaded library instead of using dlopen and dlsym for the same purpose.
Change-Id: I4865f2fd18dd186ec311a432ec38fbb5583805d2
Simplified the callback method. Also fixed the way loaded shared objects were appended to a string vector,
which was not being passed to this callback method.
Change-Id: I68661dd73f61a11c42fa92f670e8e7b6ffcb5711
New environment variable HSA_OVERRIDE_CPU_AFFINITY_DEBUG to
enable/disable overriding CPU affinity.
The default value is enabled (1).
This is a temporary variable and may be removed in the future.
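A sketch of reading the variable with the documented default; `cpu_affinity_override_enabled` is a hypothetical helper, and treating only `0` as "disabled" (and an empty value as unset) is an assumption:

```c
#define _POSIX_C_SOURCE 200112L /* for setenv/unsetenv in the usage example */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: unset or empty means the documented default
 * (enabled); assuming "0" is the only value that disables the override. */
static bool cpu_affinity_override_enabled(void) {
  const char *v = getenv("HSA_OVERRIDE_CPU_AFFINITY_DEBUG");
  if (v == NULL || *v == '\0') return true;
  return strcmp(v, "0") != 0;
}
```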
Change-Id: Id6a7c611730471ddc276ca333fde1e57046bf32a
During the HSA initialization stage, ROCr now searches all loaded libraries
for the symbol "HSA_AMD_TOOL_PRIORITY" and adds those libraries to
the tools library init list. Tools libraries listed in the HSA_TOOLS_LIB
environment variable are also loaded, in the given order, and take priority
over HSA_AMD_TOOL_PRIORITY.
Change-Id: I739af42bbd777c44a9152c11e17dd69979b65e82
This is consistent with KFD and has significantly better latency.
KFD uses this as the definition of the SystemClockCounter.
Change-Id: I4c1b3bc58c738206265c55ebefd41356c013bfe5
The prior solution used a single global lock to protect the memory tracking structures.
This change protects the memory tracking structure with a shared mutex (rw lock), taken in
shared (r) mode for memory allocations and frees so that long-duration operations
that call into KFD can run in parallel. Operations that must modify the memory map
take the mutex in exclusive (w) mode and must not call into the thunk while holding
the mutex.
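A sketch of this locking discipline with a POSIX read-write lock; `mem_tracker_t`, `driver_alloc` (a stand-in for a slow KFD/thunk call), and the other names are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical memory tracker: map_lock guards the memory map, and
 * map_generation stands in for the map's contents. */
typedef struct {
  pthread_rwlock_t map_lock;
  unsigned long map_generation;
} mem_tracker_t;

/* Hypothetical stand-in for a slow, thread-safe driver call. */
static void *driver_alloc(size_t size) { return malloc(size); }

/* Allocation path: shared (read) mode, so several threads can be inside
 * the slow driver call at once; the map is only read here. */
static void *tracked_alloc(mem_tracker_t *t, size_t size) {
  if (pthread_rwlock_rdlock(&t->map_lock) != 0) return NULL;
  void *p = driver_alloc(size);
  pthread_rwlock_unlock(&t->map_lock);
  return p;
}

/* Map mutation: exclusive (write) mode, kept short, with no driver
 * calls made while the lock is held. */
static int tracked_update_map(mem_tracker_t *t) {
  int rc = pthread_rwlock_wrlock(&t->map_lock);
  if (rc != 0) return rc;
  t->map_generation++;
  return pthread_rwlock_unlock(&t->map_lock);
}
```

While any thread holds the lock in shared mode, further shared acquisitions succeed but an exclusive acquisition must wait, which is what lets concurrent allocations overlap while map updates stay serialized.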
The fragment allocator now requires separate protection and is protected with a
mutex at the device level. Protecting at the device level, rather than the pool level,
allows retaining the current recursive design and calling Trim from
within Allocate. This could be made finer (pool-level locks) but would
require backing out of Allocate entirely to call Trim. Trim and any retried
allocation must be done in isolation (per device), or we may report OOM when
memory is actually available in some pool's fragment cache. So some device-level
serialization is required in at least some paths.
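A sketch of a device-level Trim-and-retry using a deliberately simplified, hypothetical model (`device_t`, with byte counters in place of real fragment caches). Because the retry runs under the same device lock as the Trim, no other allocation on that device can run between the two:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_POOLS 2 /* hypothetical pool count for the model */

/* Hypothetical device model: memory is either free for any pool or parked
 * in a pool's fragment cache, where other pools cannot use it until trimmed. */
typedef struct {
  pthread_mutex_t frag_lock; /* one lock per device, not per pool */
  size_t free_bytes;
  size_t cached[NUM_POOLS];
} device_t;

/* Caller must hold frag_lock: return every pool's cached bytes to the device. */
static void trim_locked(device_t *d) {
  for (int i = 0; i < NUM_POOLS; ++i) {
    d->free_bytes += d->cached[i];
    d->cached[i] = 0;
  }
}

/* Try to allocate; on failure Trim and retry without dropping the lock, so
 * a concurrent allocation cannot interleave and cause a spurious OOM. */
static bool device_allocate(device_t *d, size_t size) {
  pthread_mutex_lock(&d->frag_lock);
  bool ok = d->free_bytes >= size;
  if (!ok) {
    trim_locked(d);
    ok = d->free_bytes >= size;
  }
  if (ok) d->free_bytes -= size;
  pthread_mutex_unlock(&d->frag_lock);
  return ok;
}
```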
Change-Id: I7c1e94d6965ffcc602b12fefdd3a6e97b84b5e00
A joined thread cannot be joined again, nor can it be detached.
The thread library's wait and close allow multiple waits and a separate
close, so this fixes the pthread implementation accordingly.
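A sketch of a handle that maps these semantics onto pthreads, with hypothetical names (`thread_handle_t`, `thread_start`, `thread_wait`, `thread_close`, `noop_worker`). A flag records whether the thread was already joined or detached, so repeated waits and a later close never call `pthread_join` or `pthread_detach` a second time:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical handle supporting repeated waits and a separate close. */
typedef struct {
  pthread_t tid;
  pthread_mutex_t lock; /* serializes wait/close on this handle */
  bool reaped;          /* true once the thread was joined or detached */
} thread_handle_t;

static int thread_start(thread_handle_t *h, void *(*fn)(void *), void *arg) {
  int rc = pthread_mutex_init(&h->lock, NULL);
  if (rc != 0) return rc;
  h->reaped = false;
  rc = pthread_create(&h->tid, NULL, fn, arg);
  if (rc != 0) pthread_mutex_destroy(&h->lock);
  return rc;
}

/* Joins on the first call; concurrent callers block on the handle lock
 * until the join finishes, and later calls return immediately. */
static int thread_wait(thread_handle_t *h) {
  int rc = 0;
  pthread_mutex_lock(&h->lock);
  if (!h->reaped) {
    rc = pthread_join(h->tid, NULL);
    if (rc == 0) h->reaped = true;
  }
  pthread_mutex_unlock(&h->lock);
  return rc;
}

/* Detaches only if the thread was never joined, then releases the handle.
 * The handle must not be used after a successful close. */
static int thread_close(thread_handle_t *h) {
  int rc = 0;
  pthread_mutex_lock(&h->lock);
  if (!h->reaped) {
    rc = pthread_detach(h->tid);
    if (rc == 0) h->reaped = true;
  }
  pthread_mutex_unlock(&h->lock);
  if (rc == 0) pthread_mutex_destroy(&h->lock);
  return rc;
}

/* Trivial worker for the usage example. */
static void *noop_worker(void *arg) { return arg; }
```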
Change-Id: I0019271a438f11ed4c6c11854011f5c4f6e16b65