5eb895fc7d
- corrections in the calculations for latency and throughput points in `validate-causal-json.py`
- `omnitrace-causal` LD_PRELOAD libpthread
- ensures omnitrace is always wrapping libpthread.so pthread symbols
- minimal experiment delay
- always sleep 10 milliseconds before starting experiments
- ensures ~10 samples are taken to determine the sampling rate
- fixes issue with deadlocks on condition variables
- overhaul of `causal::component::blocking_gotcha` and `causal::component::unblocking_gotcha` components
- these components enforce the processing/crediting of delays before/after a thread is suspended
- these components wrap functions `pthread_cond_wait`, `pthread_cond_signal`, `pthread_mutex_lock`, etc.
- Fully implemented correct handling of processing/crediting delays based on return values and arguments
- E.g. skip crediting delay if `pthread_mutex_trylock` fails to acquire the lock
- E.g. for `kill`, `sigwait`, etc., the handling is only applied if the PID involved matches the process's own PID
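
The return-value handling for `pthread_mutex_trylock` described above can be sketched roughly as follows. This is a minimal illustration, not the omnitrace implementation: `global_delay`, `local_delay`, `credit_delay`, and `wrapped_mutex_trylock` are hypothetical names, and the bookkeeping assumes a simple model where a thread compares its own accounted delay against a global total.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <pthread.h>

namespace sketch
{
// Hypothetical bookkeeping (assumption, not omnitrace's data structures):
// total delay requested so far vs. the delay this thread has accounted for.
std::atomic<std::int64_t> global_delay{ 0 };
thread_local std::int64_t local_delay = 0;

// "Crediting": treat all delay inserted so far as already paid by this thread.
void credit_delay() { local_delay = global_delay.load(); }

// Hypothetical wrapper around pthread_mutex_trylock.
int wrapped_mutex_trylock(pthread_mutex_t* mutex)
{
    assert(mutex != nullptr);
    int ret = pthread_mutex_trylock(mutex);
    // Only credit when the lock was actually acquired. On failure
    // (e.g. EBUSY) nothing was acquired, so the crediting step is skipped.
    if(ret == 0) credit_delay();
    return ret;
}
}  // namespace sketch
```

The same pattern (inspect the return value, then decide whether to credit) would apply to other wrapped functions that can return without having blocked.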
## Condition Variable Deadlock Fix
In parallel applications using condition variables, causal profiling was found to be virtually guaranteed to deadlock. Although it was difficult to prove, evidence suggested that the work done while taking a sample caused notifications to the condition variable to be lost. This was alleviated by the following updates:
- Separate out the part of `causal::backtrace::sample(int)` which calculates the sampling rate into a small `sample_rate` component
- This component is essentially "always on" during sampling
- Added bundle of components invoked by `causal_sampler_t` during sampling
- Added two function calls to support disabling and re-enabling calls to `causal::backtrace::sample(int)` on a per-thread basis
- `causal::sampling::block_backtrace_samples()`
- `causal::sampling::unblock_backtrace_samples()`
- These two functions now surround the wrappee functions of `blocking_gotcha` and `unblocking_gotcha`
**This solution was experimentally validated with a Geant4 application whose tasking model makes _numerous_ calls to wait on condition variables** (it was this application which exposed the bug)
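
The per-thread suppression described in this section can be sketched roughly as follows. Only the names `block_backtrace_samples` and `unblock_backtrace_samples` come from this change; the thread-local depth counter, the `scoped_sample_block` guard, the `call_with_samples_blocked` helper, and the stand-in `sample` function are assumptions made for illustration and may differ from the actual implementation.

```cpp
#include <cassert>
#include <utility>

namespace sketch
{
// Assumption: suppression is tracked with a per-thread nesting depth.
thread_local int  block_depth   = 0;
thread_local long samples_taken = 0;

void block_backtrace_samples() { ++block_depth; }

void unblock_backtrace_samples()
{
    assert(block_depth > 0 && "unbalanced unblock_backtrace_samples()");
    --block_depth;
}

// Stand-in for the sampling handler: returns early while samples are blocked,
// so no expensive work runs around the wrapped call.
void sample(int /*signum*/)
{
    if(block_depth > 0) return;
    ++samples_taken;  // placeholder for unwinding and recording the backtrace
}

// RAII guard so samples are re-enabled on every exit path.
struct scoped_sample_block
{
    scoped_sample_block() { block_backtrace_samples(); }
    ~scoped_sample_block() { unblock_backtrace_samples(); }
    scoped_sample_block(const scoped_sample_block&)            = delete;
    scoped_sample_block& operator=(const scoped_sample_block&) = delete;
};

// Hypothetical helper: invoke a wrappee (e.g. pthread_cond_wait) with
// backtrace samples suppressed on the calling thread only.
template <typename FuncT, typename... Args>
auto call_with_samples_blocked(FuncT&& wrappee, Args&&... args)
{
    scoped_sample_block guard{};
    return std::forward<FuncT>(wrappee)(std::forward<Args>(args)...);
}
}  // namespace sketch
```

Because the counter is thread-local, suppressing samples on a thread that is waiting on a condition variable does not affect sampling on any other thread.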
* Fix validate-causal-json.py
- corrections in the calculations for latency and throughput points
* Update timemory submodule
- support for thread-local trait::runtime_enabled
* omnitrace-causal: LD_PRELOAD pthread library
- ensures omnitrace is always wrapping libpthread.so pthread symbols
* initial experiment delay
- always sleep 10 milliseconds before starting experiments
- ensures ~10 samples are taken to determine the sampling rate
* sample_rate component + block_backtrace_samples
- separate out the part of backtrace::sample which calculates the sampling rate into a small sample_rate component
- add sample_rate component to causal_bundle_t used by causal_sampler_t
- causal::sampling::block_backtrace_samples() disables backtrace samples from being taken on a thread
- causal::sampling::unblock_backtrace_samples() enables backtrace samples from being taken on a thread
- the above two functions surround calls to the functions wrapped by blocking_gotcha and unblocking_gotcha
- the work happening in backtrace::sample when within these calls
produced deadlocks for condition variables (notifications to
condition variables were lost)
* blocking/unblocking gotcha updates
- overhaul of blocking_gotcha and unblocking_gotcha
- added fast_gotcha trait: replace function calls instead of wrapping
- when wrappees are called, backtrace samples are suppressed (thread-local)
- properly handle kill, sigwait, sigwaitinfo, sigtimedwait
- properly handle all instances of applying postblock based on return value
* Fix calculation of OMNITRACE_MAX_THREADS
* removed unnecessary checks in causal::delay
* Updated timemory with internal compiler error fix
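
As a rough illustration of why the initial experiment delay aims for ~10 samples: a sampling rate can be estimated from the spacing of a handful of sample timestamps. The sketch below is an assumption about how such an estimate could be made, not the logic of the `sample_rate` component; `mean_sample_interval_ns` is a hypothetical helper, and the 1 ms spacing used in the note after it is inferred from "~10 samples in 10 milliseconds" rather than stated in this change.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

namespace sketch
{
// Mean interval (ns) between consecutive sample timestamps.
// Returns 0 when fewer than two samples are available, since no
// interval can be measured from a single timestamp.
std::int64_t mean_sample_interval_ns(const std::vector<std::int64_t>& timestamps_ns)
{
    if(timestamps_ns.size() < 2) return 0;
    assert(timestamps_ns.back() >= timestamps_ns.front());
    const std::int64_t span = timestamps_ns.back() - timestamps_ns.front();
    return span / static_cast<std::int64_t>(timestamps_ns.size() - 1);
}
}  // namespace sketch
```

With ten timestamps spaced 1 ms apart, the helper averages nine intervals. With a single timestamp it cannot produce an estimate at all, which is the situation a short initial delay avoids.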
[ROCm/rocprofiler-systems commit: 7c73d98125]