Linux Perf Support + Causal Profiling Updates (#276)
* causal backtrace updates
- fix initial causal sampling period value
* causal delay updates
- tweak handling of sleep_for_overhead
* Fix experiment global scaling for prog pts
- results in drastically improved predictions
* pthread_mutex_gotcha updates
- disable all wrappers during causal profiling
* validate-causal-json.py updates
- support decimal stddev
- fix setting stddev from command-line
* causal perform_experiment_impl update
- handle start failing because finalizing
* deprecate causal::component::sample_rate
- appears to not help at all
* Rework sample info
* Increase causal unwind_depth
- use OMNITRACE_MAX_UNWIND_DEPTH
* validate-causal-json updates
- min experiments
- exclude reporting predictions with less than X experiments at a given speedup
- percent samples
- only print samples within X% of the peak (default: 95%)
* Update timemory submodule
- extensions to sampling for signals delivered via non-timer method
- e.g. via HW counter overflow
* dwarf_entry::operator< updates
- sort via file
* causal profiling docs updates
- info about backends
- info about installing/enabling perf
* config updates: causal backend
- CausalBackend enum
- OMNITRACE_CAUSAL_BACKEND: perf, timer, auto
- omnitrace-causal option: --backend
* debug update
- use spin_mutex instead of std::mutex
* address_range::contains update
- range from 0-100 contains range from 10-100 but was returning false because high was == 100 not < 100
* symbol::operator< update
- handle load address differences
* sampling updates (non-causal)
- update get_timer to get_trigger + dynamic_cast
* container::static_vector updates
- support construction from container::c_array
- update_size private member func for handling atomic m_size
* Move perf files
- moved library/causal/perf.{hpp,cpp} to library/perf.{hpp,cpp}
* causal example update
- created impl.hpp (forward decls)
- renamed {cpu,rng}_func_impl to {cpu,rng}_impl_func
- only create two threads which run N iterations instead of two threads each iteration
* Update timemory submodule
- updates to unwind::processed_entry
- updates to procfs::maps
* Updated causal documentation
- fixed line numbers changed by modifications to causal example
* omnitrace-causal exe updates
- set OMNITRACE_THREAD_POOL_SIZE to zero by default
* core/containers updates
- static_vector: provide data() member function
- c_array pop_front() and pop_back() member functions
* core: config and argparse updates + perf
- core/perf.{hpp,cpp}
- forward decl of enums
- config-related capabilities
- argparse: --sample-overflow
- renamed some config functions
- e.g. get_sampling_cpu_freq -> get_sampling_cputime_freq
- added config settings related to overflow sampling via perf
- added timer_sampling and overflow_sampling categories
* Update timemory submodule
- sampling allocator flushing
* binary updates
- lookup_ipaddr_entry
- use bfd_find_nearest_line instead of bfd_find_nearest_line_discriminator
- discriminators are not used
- explicit instantiations of inlined_symbol::serialize
* Bump VERSION to 1.10.0
* sampling and perf updates
- support overflow sampling via Linux Perf
- update perf namespace
- update perf::perf_event
- update record ctor: pointer instead of const ref
- update open member func: return optional string
- add m_batch_size member variable
- sampling updates
- support overflow sampling
- flush allocators
- increase buffer size from 1024 to 2048
- restructure post-processing in light of perf overflow supports
- improve offload memory usage only load buffers for thread
- load_offload_buffer(tid) uses thread-specific filepos
- component updates
- backtrace_metrics::operator-=
- backtrace_metrics::operator-
- backtrace::sample does not record for overflow signal
- callchain: perf overflow sample
* core updates
- component::sampling_percent does not report self + uses_percent_units
* causal updates
- tweak get_line_info
- overloads for set_current_selection (uint64_t, c_array, std::array)
- delay
- use sampling::pause/sampling::resume
- experiment
- experiment::sample derives from unwind::processed_entry
- experiment::samples is vector instead of set
- fixed samples
- overloads for is_selected (uint64_t, c_array, std::array)
- scaling factor defaults to 100 instead of 50
- serialize updates follow change to experiment::sample
- modify algorithm for increasing/decreasing experiment length
- sample_data
- use map<uintptr, uint64_t> instead of set<sample_data>
- get_samples returns vector<sample_data> instead of set<sample_data>
- sampling
- support overflow via Linux Perf
- update causal_offload_buffer
- flush sampling allocator
- backtrace
- overflow component
* libomnitrace-dl updates
- handle dl::InstrumentMode::PythonProfile
* testing updates (causal)
- causal line 155 -> causal line 100
- causal line 165 -> causal line 110
* formatting
* exit_gotcha updates
- exit_info for abort()
- message about non-zero exit code
* testing updates
- fail regex for causal tests
- validate-causal-json: >= min_experiments instead of > min_experiments
- handle OMNITRACE_DEBUG_SETTINGS in omnitrace_write_test_config
* causal sampling updates
- add new lines where appropriate
* causal data updates
- reorder diagnostic info when experiment fails to start
* binary updates
- symbol address range from address to address + symsize + 1
- add 1 based on debug info
* causal data updates
- sample_selection wait_ns defaults to 1,000 instead of 10,000
- sample_selection wait scaled by iteration number
- save_line_info_impl verbosity
- print latest_eligible_pc when experiment does not start
* causal sampling + component updates
- perf backend disables component::backtrace
- ensure get_sampling_(realtime|cputime|overflow)_signal do not malloc
* causal: remove period stats
* validate-causal-json update
- fix --help
* causal data updates
- improve eligible pc history reporting when experiment fails to start
* causal data updates
- fix compute_eligible_lines_impl
- eligible address ranges returning too many ranges
- occasionally, overwrite all *true* eligible address ranges
* causal data updates
- reduce scoped ranges to symbol ranges
- is_eligible_address() returns true contains (not just coarse)
- revert some sample_selection behavior
* binary address_multirange updates
- make coarse_range private
- fix operator+=(pair<coarse, uintptr_t>)
* causal example update
- fix nsync to default to once per iteration
* binary analysis updates
- tweak header file includes
* causal updates
- remove factoring in sleep_for_overhead
- invoke delay::process() even if experiment is not active
* causal data updates
- update latest_eligible_pc structure
* update omnitrace-install.py.in
- fix support for fedora
- /etc/os-release does not have ID_LIKE
- fallback to RHEL 8.7 if version not specified
* update omnitrace-install.py.in
- fix support for debian
- /etc/os-release does not have ID_LIKE
- version mapping
* Update documentation
- update docs on installation
* causal data and experiment updates
- data: reset_sample_selection
* causal set_current_selection debugging
- debug messages for failed e2e runs
* causal data and backtrace component updates
- data: set_current_selection returns the number of eligible addresses added
- backtrace: if cputime signal has selected zero IPs > 5x, then realtime signal starts contributing call-stacks
* core library updates
- move config::parse_numeric_range to utility namespace
- add core/utility.cpp
- support range:increment, e.g. 5-25:10 expands to '5 15 25' instead of '5 10 15 20 25'
* omnitrace-causal update
- end-to-end expands all speedups
- support range:increment in speedups
* causal backtrace updates
- remove select_ival (realtime signal always contributes when select_count == 0)
* containers: static_vector update
- explicit c_array constructor
- explicit std::array constructor
* causal data updates
- remove set_current_selection(uint64_t)
- remove set_current_selection(std::array)
- sample_selection increase default wait time
- report eligible PC candidates
- move reset_sample_selection to perform_experiment_impl
- decrease latest_eligible_pc array size
- set_current_selection does not guard for experiment::active
* core debug updates
- OMNITRACE_PRINT_COLOR macros
* causal data updates
- tweak to experiment never started message
* causal gotcha updates
- remove unused code
* critical trace updates
- remove unused code
* omnitrace-causal
- OMNITRACE_LAUNCHER
* causal data updates
- don't fail on end-to-end + omnitrace-causal
* causal backtrace updates
- reintroduce select_ival behavior
* causal data updates
- tweak verbose messages about number of PC candidates
* core mproc updates
- utilities for waiting on child PID and diagnosing status
- omnitrace::mproc::wait_pid
- omnitrace::mproc::diagnose_status
* omnitrace-run updates
- support --fork argument for executing via fork in current process + execvpe on child instead of execvpe in current process
* omnitrace-causal updates
- wait_pid and diagnose_status just call equivalent functions in omnitrace::mproc
* ubuntu-focal workflow update
- attempt to launch ubuntu-focal-codecov job with CAP_SYS_ADMIN and use perf backend
* tests reorg and updates
- remove binary-rewrite-sampling and runtime-instrument-sampling tests
- rename *-preload tests (which use omnitrace-sample exe) to *-sampling
- split tests/CMakeLists.txt into several tests/omnitrace-<category>-tests.cmake files
- tweak to causal-both-omni-func test
- add args: -n 2 -b timer
* update validate-causal-json.py
- better reasoning info for adjusting tolerance
- always apply tolerance adjustments in CI mode
* causal e2e tests update
- add label "causal-e2e" label
- tweak params
- old: 80 12 432525 500000000
- new: 80 50 432525 100000000
- disable processor affinity for slow-func/line-100 tests
- artificially inflates some speedups with perf
* unblocking_gotcha updates
- overload operator() according to gotcha function index
* blocking_gotcha updates
- overload operator() according to gotcha function index
- fix bug where potentially post block functors (e.g. pthread_mutex_trylock) throw error if lock is not acquired.
* parse_numeric_range update
- support unordered_set
* config update
- OMNITRACE_DEBUG_{TIDS,PIDS} use parse_numeric_range
This commit is contained in:
کامیت شده توسط
GitHub
والد
cc14b52584
کامیت
9de3a6b0b4
@@ -21,10 +21,9 @@ parse:
|
||||
omnitrace_add_test:
|
||||
flags:
|
||||
- SKIP_BASELINE
|
||||
- SKIP_PRELOAD
|
||||
- SKIP_SAMPLING
|
||||
- SKIP_REWRITE
|
||||
- SKIP_RUNTIME
|
||||
- SKIP_SAMPLING
|
||||
kwargs:
|
||||
NAME: '*'
|
||||
TARGET: '*'
|
||||
@@ -33,15 +32,16 @@ parse:
|
||||
NUM_PROCS: '*'
|
||||
REWRITE_TIMEOUT: '*'
|
||||
RUNTIME_TIMEOUT: '*'
|
||||
PRELOAD_TIMEOUT: '*'
|
||||
SAMPLING_TIMEOUT: '*'
|
||||
SAMPLING_ARGS: '*'
|
||||
REWRITE_ARGS: '*'
|
||||
RUNTIME_ARGS: '*'
|
||||
RUN_ARGS: '*'
|
||||
ENVIRONMENT: '*'
|
||||
LABELS: '*'
|
||||
PROPERTIES: '*'
|
||||
PRELOAD_PASS_REGEX: '*'
|
||||
PRELOAD_FAIL_REGEX: '*'
|
||||
SAMPLING_PASS_REGEX: '*'
|
||||
SAMPLING_FAIL_REGEX: '*'
|
||||
RUNTIME_PASS_REGEX: '*'
|
||||
RUNTIME_FAIL_REGEX: '*'
|
||||
REWRITE_PASS_REGEX: '*'
|
||||
|
||||
@@ -554,9 +554,11 @@ jobs:
|
||||
|
||||
container:
|
||||
image: jrmadsen/omnitrace:ci-base-ubuntu-20.04
|
||||
options: --cap-add CAP_SYS_ADMIN
|
||||
|
||||
env:
|
||||
OMNITRACE_VERBOSE: 2
|
||||
OMNITRACE_CAUSAL_BACKEND: perf
|
||||
|
||||
steps:
|
||||
- uses: actions/checkout@v3
|
||||
|
||||
+13
-3
@@ -99,9 +99,19 @@ See the [Getting Started documentation](https://amdresearch.github.io/omnitrace/
|
||||
- Visit [Releases](https://github.com/AMDResearch/omnitrace/releases) page
|
||||
- Select appropriate installer (recommendation: `.sh` scripts do not require super-user priviledges unlike the DEB/RPM installers)
|
||||
- If targeting a ROCm application, find the installer script with the matching ROCm version
|
||||
- If you are unsure about your Linux distro, check `/etc/os-release`
|
||||
- If no installer script matches your target OS, try one of the Ubuntu 18.04 `*.sh` installers
|
||||
- This installation may be built against older library versions supported on your distro via backwards compatibility
|
||||
- If you are unsure about your Linux distro, check `/etc/os-release` or use the `omnitrace-install.py` script
|
||||
|
||||
If the above recommendation is not desired, download the `omnitrace-install.py` and specify `--prefix <install-directory>` when
|
||||
executing it. This script will attempt to auto-detect a compatible OS distribution and version.
|
||||
If ROCm support is desired, specify `--rocm X.Y` where `X` is the ROCm major version and `Y`
|
||||
is the ROCm minor version, e.g. `--rocm 5.4`.
|
||||
|
||||
```console
|
||||
wget https://github.com/AMDResearch/omnitrace/releases/latest/download/omnitrace-install.py
|
||||
python3 ./omnitrace-install.py --prefix /opt/omnitrace/rocm-5.4 --rocm 5.4
|
||||
```
|
||||
|
||||
See the [Installation Documentation](https://amdresearch.github.io/omnitrace/installation) for detailed information.
|
||||
|
||||
### Setup
|
||||
|
||||
|
||||
+1
-1
@@ -1 +1 @@
|
||||
1.9.2
|
||||
1.10.0
|
||||
|
||||
@@ -63,30 +63,73 @@ def get_os_info(os_distrib, os_version):
|
||||
_key, _data = line.split("=", 1)
|
||||
_os_info[_key] = _data.strip('"')
|
||||
|
||||
def _parse_version(_v):
|
||||
_version = re.split(r"[\\.-]", _v)
|
||||
return (
|
||||
"{}.{}".format(_version[0], _version[1])
|
||||
if len(_version) > 1
|
||||
else "{}".format(_version[0])
|
||||
)
|
||||
|
||||
if os_distrib is None or os_distrib == "auto":
|
||||
if "debian" in _os_info["ID_LIKE"]:
|
||||
if "ubuntu" in _os_info["ID"]:
|
||||
os_distrib = "ubuntu"
|
||||
elif "suse" in _os_info["ID_LIKE"]:
|
||||
elif "opensuse" in _os_info["ID"]:
|
||||
os_distrib = "opensuse"
|
||||
elif "rhel" in _os_info["ID_LIKE"]:
|
||||
elif "rhel" in _os_info["ID"]:
|
||||
os_distrib = "rhel"
|
||||
elif "fedora" in _os_info["ID_LIKE"]:
|
||||
elif "centos" in _os_info["ID"]:
|
||||
os_distrib = "rhel"
|
||||
elif "centos" in _os_info["ID_LIKE"]:
|
||||
elif "rockylinux" in _os_info["ID"]:
|
||||
os_distrib = "rhel"
|
||||
elif "debian" in _os_info["ID"]:
|
||||
os_distrib = "ubuntu"
|
||||
if "debian" in _os_info["ID"] and os_version is None:
|
||||
_debian_version = float(_parse_version(_os_info["VERSION_ID"]))
|
||||
if _debian_version >= 11.0:
|
||||
os_version = "20.04"
|
||||
else:
|
||||
os_version = "18.04"
|
||||
elif "fedora" in _os_info["ID"]:
|
||||
os_distrib = "rhel"
|
||||
# fedora has different versioning system so fallback to 8.7
|
||||
if os_version is None:
|
||||
os_version = "8.7"
|
||||
else:
|
||||
raise RuntimeError(
|
||||
"Unknown ID_LIKE value in /etc/os-release: {}".format(_os_info["ID_LIKE"])
|
||||
)
|
||||
elif os_distrib == "fedora" or os_distrib == "centos":
|
||||
# if we don't have an exact match, check ID_LIKE
|
||||
if "ID_LIKE" not in _os_info.keys():
|
||||
_os_info["ID_LIKE"] = _os_info["ID"]
|
||||
|
||||
if "debian" in _os_info["ID_LIKE"]:
|
||||
os_distrib = "ubuntu"
|
||||
if os_version is None:
|
||||
# fallback on 18.04 if ID is not ubuntu but debian-like
|
||||
os_version = "18.04"
|
||||
elif "suse" in _os_info["ID_LIKE"]:
|
||||
os_distrib = "opensuse"
|
||||
# fallback on 15.3 if ID is not opensuse but suse-like
|
||||
if os_version is None:
|
||||
os_version = "15.3"
|
||||
elif "rhel" in _os_info["ID_LIKE"] or "centos" in _os_info["ID_LIKE"]:
|
||||
os_distrib = "rhel"
|
||||
if os_version is None:
|
||||
os_version = "8.7"
|
||||
else:
|
||||
raise RuntimeError(
|
||||
"Unknown ID_LIKE value in /etc/os-release: {}".format(
|
||||
_os_info["ID_LIKE"]
|
||||
)
|
||||
)
|
||||
elif os_distrib == "centos":
|
||||
os_distrib = "rhel"
|
||||
# uses same versioning system
|
||||
elif os_distrib == "fedora":
|
||||
os_distrib = "rhel"
|
||||
if os_version is None:
|
||||
# fedora has different versioning system so fallback to 8.7
|
||||
os_version = "8.7"
|
||||
|
||||
if os_version is None:
|
||||
|
||||
def _parse_version(_v):
|
||||
_version = re.split(r"[\\.-]", _v)
|
||||
return "{}.{}".format(_version[0], _version[1])
|
||||
|
||||
os_version = _parse_version(_os_info["VERSION_ID"])
|
||||
|
||||
return (os_distrib, os_version)
|
||||
|
||||
@@ -1,122 +1,67 @@
|
||||
#include "causal.hpp"
|
||||
|
||||
#include <chrono>
|
||||
#include <cmath>
|
||||
#include <cstdio>
|
||||
#include <cstdlib>
|
||||
#include <ctime>
|
||||
#include <iostream>
|
||||
#include <mutex>
|
||||
#include <random>
|
||||
#include <string>
|
||||
#include <thread>
|
||||
#include <unistd.h>
|
||||
#include <vector>
|
||||
|
||||
using mutex_t = std::timed_mutex;
|
||||
using auto_lock_t = std::unique_lock<mutex_t>;
|
||||
using clock_type = std::chrono::high_resolution_clock;
|
||||
using nanosec = std::chrono::nanoseconds;
|
||||
#include "impl.hpp"
|
||||
|
||||
namespace
|
||||
{
|
||||
std::chrono::duration<double, std::milli> t_ms;
|
||||
std::chrono::duration<double, std::milli> slow_ms;
|
||||
std::chrono::duration<double, std::milli> fast_ms;
|
||||
|
||||
template <typename... Args>
|
||||
inline void
|
||||
consume_variables(Args&&...)
|
||||
{}
|
||||
} // namespace
|
||||
|
||||
template <bool>
|
||||
bool
|
||||
rng_func_impl(int64_t n, uint64_t rseed);
|
||||
|
||||
template <bool>
|
||||
bool
|
||||
cpu_func_impl(int64_t n, int nloop);
|
||||
|
||||
void
|
||||
rng_slow_func(int64_t n, uint64_t rseed) __attribute__((noinline));
|
||||
|
||||
void
|
||||
rng_fast_func(int64_t n, uint64_t rseed) __attribute__((noinline));
|
||||
|
||||
void
|
||||
cpu_slow_func(int64_t n, int nloop) __attribute__((noinline));
|
||||
|
||||
void
|
||||
cpu_fast_func(int64_t n, int nloop) __attribute__((noinline));
|
||||
|
||||
#if USE_CPU > 0
|
||||
# define CPU_SLOW_FUNC(...) cpu_slow_func(__VA_ARGS__)
|
||||
# define CPU_FAST_FUNC(...) cpu_fast_func(__VA_ARGS__)
|
||||
#else
|
||||
# define CPU_SLOW_FUNC(...) consume_variables(__VA_ARGS__)
|
||||
# define CPU_FAST_FUNC(...) consume_variables(__VA_ARGS__)
|
||||
#endif
|
||||
|
||||
#if USE_RNG > 0
|
||||
# define RNG_SLOW_FUNC(...) rng_slow_func(__VA_ARGS__)
|
||||
# define RNG_FAST_FUNC(...) rng_fast_func(__VA_ARGS__)
|
||||
#else
|
||||
# define RNG_SLOW_FUNC(...) consume_variables(__VA_ARGS__)
|
||||
# define RNG_FAST_FUNC(...) consume_variables(__VA_ARGS__)
|
||||
#endif
|
||||
|
||||
int
|
||||
main(int argc, char** argv)
|
||||
{
|
||||
uint64_t rseed = std::random_device{}();
|
||||
int nitr = 200;
|
||||
size_t nitr = 50;
|
||||
double frac = 70;
|
||||
int64_t slow_val = 100000000L;
|
||||
int64_t slow_val = 200000000L;
|
||||
size_t nsync = 1;
|
||||
|
||||
if(argc > 1) frac = std::stod(argv[1]);
|
||||
if(argc > 2) nitr = std::stoi(argv[2]);
|
||||
if(argc > 2) nitr = std::stoull(argv[2]);
|
||||
if(argc > 3) rseed = std::stoul(argv[3]);
|
||||
if(argc > 4) slow_val = std::stol(argv[4]);
|
||||
if(argc > 5) nsync = std::stoull(argv[5]);
|
||||
|
||||
nsync = std::min<size_t>(std::max<size_t>(nsync, 1), nitr);
|
||||
int64_t fast_val = (frac / 100.0) * slow_val;
|
||||
double rfrac = (fast_val / static_cast<double>(slow_val));
|
||||
if(argc > 5) fast_val = std::stol(argv[5]);
|
||||
|
||||
printf("\nIterations: %i, fraction: %6.2f, random seed: %lu :: slow = %zu, "
|
||||
"fast = %zu, expected ratio = %6.2f\n",
|
||||
nitr, frac, rseed, slow_val, fast_val, rfrac * 100.0);
|
||||
printf("\nFraction: %6.2f, iterations: %zu, random seed: %lu :: slow = %zu, "
|
||||
"fast = %zu, expected ratio = %6.2f, sync every %lu iterations\n",
|
||||
frac, nitr, rseed, slow_val, fast_val, rfrac * 100.0, nsync);
|
||||
|
||||
auto _t = clock_type::now();
|
||||
for(int i = 0; i < nitr; ++i)
|
||||
auto _wait_barrier = pthread_barrier_t{};
|
||||
pthread_barrier_init(&_wait_barrier, nullptr, 3);
|
||||
auto _thread_func = [nitr, nsync, &_wait_barrier](const auto& _func, auto* _timer,
|
||||
auto _nsec, auto _nseed,
|
||||
auto _nloop) {
|
||||
pthread_barrier_wait(&_wait_barrier);
|
||||
for(size_t i = 0; i < nitr; ++i)
|
||||
{
|
||||
auto _t = clock_type::now();
|
||||
_func(_nsec, _nseed, _nloop);
|
||||
(*_timer) += (clock_type::now() - _t);
|
||||
CAUSAL_PROGRESS_NAMED("iteration");
|
||||
if(i % nsync == (nsync - 1)) pthread_barrier_wait(&_wait_barrier);
|
||||
}
|
||||
};
|
||||
|
||||
auto _t = clock_type::now();
|
||||
auto _threads = std::vector<std::thread>{};
|
||||
_threads.emplace_back(_thread_func, SLOW_FUNC, &slow_ms, slow_val, rseed, 10000);
|
||||
_threads.emplace_back(_thread_func, FAST_FUNC, &fast_ms, fast_val, rseed, 10000);
|
||||
pthread_barrier_wait(&_wait_barrier);
|
||||
for(size_t i = 0; i < nitr; ++i)
|
||||
{
|
||||
if(i == 0 || i + 1 == nitr || i % (nitr / 5) == 0)
|
||||
printf("executing iteration: %i\n", i);
|
||||
//
|
||||
auto&& _slow_func = [](auto _nsec, auto _seed, auto _nloop) {
|
||||
auto _t = clock_type::now();
|
||||
CPU_SLOW_FUNC(_nsec, _nloop);
|
||||
RNG_SLOW_FUNC(_nsec / 5, _seed);
|
||||
slow_ms += (clock_type::now() - _t);
|
||||
};
|
||||
//
|
||||
auto&& _fast_func = [](auto _nsec, auto _seed, auto _nloop) {
|
||||
auto _t = clock_type::now();
|
||||
CPU_FAST_FUNC(_nsec, _nloop);
|
||||
RNG_FAST_FUNC(_nsec / 5, _seed);
|
||||
fast_ms += (clock_type::now() - _t);
|
||||
};
|
||||
//
|
||||
CAUSAL_BEGIN("main_iteration");
|
||||
//
|
||||
auto _threads = std::vector<std::thread>{};
|
||||
_threads.emplace_back(std::move(_slow_func), slow_val, rseed, 10000);
|
||||
_threads.emplace_back(std::move(_fast_func), fast_val, rseed, 10000);
|
||||
for(auto& itr : _threads)
|
||||
itr.join();
|
||||
CAUSAL_END("main_iteration");
|
||||
CAUSAL_PROGRESS;
|
||||
(printf("executing iteration: %zu\n", i), fflush(stdout));
|
||||
if(i % nsync == (nsync - 1)) pthread_barrier_wait(&_wait_barrier);
|
||||
}
|
||||
for(auto& itr : _threads)
|
||||
itr.join();
|
||||
|
||||
t_ms += clock_type::now() - _t;
|
||||
auto rms = (fast_ms.count() / slow_ms.count());
|
||||
printf("slow_func() took %10.3f ms\n", slow_ms.count());
|
||||
@@ -132,7 +77,7 @@ void
|
||||
rng_slow_func(int64_t n, uint64_t rseed)
|
||||
{
|
||||
// clang-format off
|
||||
while(rng_func_impl<false>(n, rseed) != false) {}
|
||||
while(rng_impl_func<false>(n, rseed) != false) {}
|
||||
// clang-format on
|
||||
}
|
||||
//
|
||||
@@ -142,7 +87,7 @@ void
|
||||
rng_fast_func(int64_t n, uint64_t rseed)
|
||||
{
|
||||
// clang-format off
|
||||
while(rng_func_impl<true>(n, rseed) != true) {}
|
||||
while(rng_impl_func<true>(n, rseed) != true) {}
|
||||
// clang-format on
|
||||
}
|
||||
//
|
||||
@@ -152,7 +97,7 @@ void
|
||||
cpu_slow_func(int64_t n, int nloop)
|
||||
{
|
||||
// clang-format off
|
||||
while(cpu_func_impl<false>(n, nloop) != false) {}
|
||||
while(cpu_impl_func<false>(n, nloop) != false) {}
|
||||
// clang-format on
|
||||
}
|
||||
//
|
||||
@@ -162,6 +107,6 @@ void
|
||||
cpu_fast_func(int64_t n, int nloop)
|
||||
{
|
||||
// clang-format off
|
||||
while(cpu_func_impl<true>(n, nloop) != true) {}
|
||||
while(cpu_impl_func<true>(n, nloop) != true) {}
|
||||
// clang-format on
|
||||
}
|
||||
|
||||
@@ -66,7 +66,7 @@ get_clock_cpu_now() noexcept;
|
||||
//
|
||||
template <bool V>
|
||||
bool
|
||||
rng_func_impl(int64_t n, uint64_t rseed)
|
||||
rng_impl_func(int64_t n, uint64_t rseed)
|
||||
{
|
||||
int64_t _n = 0;
|
||||
auto _rng = std::mt19937_64{ rseed };
|
||||
@@ -77,8 +77,8 @@ rng_func_impl(int64_t n, uint64_t rseed)
|
||||
return V;
|
||||
}
|
||||
|
||||
template bool rng_func_impl<true>(int64_t, uint64_t);
|
||||
template bool rng_func_impl<false>(int64_t, uint64_t);
|
||||
template bool rng_impl_func<true>(int64_t, uint64_t);
|
||||
template bool rng_impl_func<false>(int64_t, uint64_t);
|
||||
|
||||
//
|
||||
// This implementation works well for COZ
|
||||
@@ -86,25 +86,25 @@ template bool rng_func_impl<false>(int64_t, uint64_t);
|
||||
//
|
||||
template <bool V>
|
||||
bool
|
||||
cpu_func_impl(int64_t n, int nloop)
|
||||
cpu_impl_func(int64_t n, int nloop)
|
||||
{
|
||||
auto _t = clock_type::now();
|
||||
auto _cpu_now = get_clock_cpu_now();
|
||||
auto _cpu_end = _cpu_now + n;
|
||||
// clang-format off
|
||||
while(get_clock_cpu_now() < _cpu_end)
|
||||
{
|
||||
for(volatile int i = 0; i < nloop; ++i) {}
|
||||
CAUSAL_PROGRESS_NAMED("cpu_impl");
|
||||
while(get_clock_cpu_now() < _cpu_end)
|
||||
{
|
||||
for(volatile int i = 0; i < nloop; ++i) {}
|
||||
CAUSAL_PROGRESS_NAMED("cpu_impl");
|
||||
}
|
||||
// clang-format on
|
||||
return V;
|
||||
}
|
||||
|
||||
template bool
|
||||
cpu_func_impl<true>(int64_t, int);
|
||||
cpu_impl_func<true>(int64_t, int);
|
||||
template bool
|
||||
cpu_func_impl<false>(int64_t, int);
|
||||
cpu_impl_func<false>(int64_t, int);
|
||||
|
||||
namespace
|
||||
{
|
||||
|
||||
@@ -0,0 +1,97 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in all
|
||||
// copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
// SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
#include <chrono>
|
||||
#include <cmath>
|
||||
#include <cstdio>
|
||||
#include <cstdlib>
|
||||
#include <ctime>
|
||||
#include <iostream>
|
||||
#include <mutex>
|
||||
#include <random>
|
||||
#include <string>
|
||||
#include <thread>
|
||||
#include <unistd.h>
|
||||
#include <vector>
|
||||
|
||||
using mutex_t = std::timed_mutex;
|
||||
using auto_lock_t = std::unique_lock<mutex_t>;
|
||||
using clock_type = std::chrono::high_resolution_clock;
|
||||
using nanosec = std::chrono::nanoseconds;
|
||||
|
||||
namespace
|
||||
{
|
||||
template <typename... Args>
|
||||
inline void
|
||||
consume_variables(Args&&...)
|
||||
{}
|
||||
} // namespace
|
||||
|
||||
template <bool>
|
||||
bool
|
||||
rng_impl_func(int64_t n, uint64_t rseed);
|
||||
|
||||
template <bool>
|
||||
bool
|
||||
cpu_impl_func(int64_t n, int nloop);
|
||||
|
||||
void
|
||||
rng_slow_func(int64_t n, uint64_t rseed) __attribute__((noinline));
|
||||
|
||||
void
|
||||
rng_fast_func(int64_t n, uint64_t rseed) __attribute__((noinline));
|
||||
|
||||
void
|
||||
cpu_slow_func(int64_t n, int nloop) __attribute__((noinline));
|
||||
|
||||
void
|
||||
cpu_fast_func(int64_t n, int nloop) __attribute__((noinline));
|
||||
|
||||
#if USE_CPU > 0
|
||||
# define CPU_SLOW_FUNC(...) cpu_slow_func(__VA_ARGS__)
|
||||
# define CPU_FAST_FUNC(...) cpu_fast_func(__VA_ARGS__)
|
||||
#else
|
||||
# define CPU_SLOW_FUNC(...) consume_variables(__VA_ARGS__)
|
||||
# define CPU_FAST_FUNC(...) consume_variables(__VA_ARGS__)
|
||||
#endif
|
||||
|
||||
#if USE_RNG > 0
|
||||
# define RNG_SLOW_FUNC(...) rng_slow_func(__VA_ARGS__)
|
||||
# define RNG_FAST_FUNC(...) rng_fast_func(__VA_ARGS__)
|
||||
#else
|
||||
# define RNG_SLOW_FUNC(...) consume_variables(__VA_ARGS__)
|
||||
# define RNG_FAST_FUNC(...) consume_variables(__VA_ARGS__)
|
||||
#endif
|
||||
|
||||
#define SLOW_FUNC \
|
||||
[](auto _nsec_v, auto _nseed_v, auto _nloop_v) { \
|
||||
CPU_SLOW_FUNC(_nsec_v, _nloop_v); \
|
||||
RNG_SLOW_FUNC(_nsec_v / 5, _nseed_v); \
|
||||
}
|
||||
|
||||
#define FAST_FUNC \
|
||||
[](auto _nsec_v, auto _nseed_v, auto _nloop_v) { \
|
||||
CPU_FAST_FUNC(_nsec_v, _nloop_v); \
|
||||
RNG_FAST_FUNC(_nsec_v / 5, _nseed_v); \
|
||||
}
|
||||
فروخته شده
+1
-1
Submodule external/timemory updated: 2b92a966d7...58536c55d7
@@ -14,7 +14,7 @@ target_include_directories(omnitrace-causal PRIVATE ${CMAKE_CURRENT_LIST_DIR})
|
||||
target_link_libraries(
|
||||
omnitrace-causal
|
||||
PRIVATE omnitrace::omnitrace-compile-definitions omnitrace::omnitrace-headers
|
||||
omnitrace::omnitrace-common-library)
|
||||
omnitrace::omnitrace-common-library omnitrace::omnitrace-core)
|
||||
set_target_properties(
|
||||
omnitrace-causal PROPERTIES BUILD_RPATH "\$ORIGIN:\$ORIGIN/../${CMAKE_INSTALL_LIBDIR}"
|
||||
INSTALL_RPATH "${OMNITRACE_EXE_INSTALL_RPATH}")
|
||||
|
||||
@@ -27,12 +27,14 @@
|
||||
#include "common/environment.hpp"
|
||||
#include "common/join.hpp"
|
||||
#include "common/setup.hpp"
|
||||
#include "core/mproc.hpp"
|
||||
#include "core/utility.hpp"
|
||||
|
||||
#include <regex>
|
||||
#include <timemory/environment.hpp>
|
||||
#include <timemory/log/color.hpp>
|
||||
#include <timemory/utility/argparse.hpp>
|
||||
#include <timemory/utility/console.hpp>
|
||||
#include <timemory/utility/delimit.hpp>
|
||||
#include <timemory/utility/filepath.hpp>
|
||||
#include <timemory/utility/join.hpp>
|
||||
|
||||
@@ -45,6 +47,7 @@
|
||||
#include <cstring>
|
||||
#include <gnu/lib-names.h>
|
||||
#include <iostream>
|
||||
#include <regex>
|
||||
#include <stdexcept>
|
||||
#include <string>
|
||||
#include <string_view>
|
||||
@@ -57,10 +60,11 @@ namespace color = ::tim::log::color;
|
||||
namespace filepath = ::tim::filepath;
|
||||
namespace console = ::tim::utility::console;
|
||||
namespace argparse = ::tim::argparse;
|
||||
using namespace timemory::join;
|
||||
using tim::get_env;
|
||||
using tim::log::monochrome;
|
||||
using tim::log::stream;
|
||||
using namespace ::timemory::join;
|
||||
using ::omnitrace::utility::parse_numeric_range;
|
||||
using ::tim::get_env;
|
||||
using ::tim::log::monochrome;
|
||||
using ::tim::log::stream;
|
||||
|
||||
namespace std
|
||||
{
|
||||
@@ -147,117 +151,13 @@ remove_child_pid(pid_t _v)
|
||||
int
|
||||
wait_pid(pid_t _pid, int _opts)
|
||||
{
|
||||
int _status = 0;
|
||||
pid_t _pid_v = -1;
|
||||
_opts |= WUNTRACED;
|
||||
do
|
||||
{
|
||||
if((_opts & WNOHANG) > 0)
|
||||
std::this_thread::sleep_for(std::chrono::milliseconds{ 100 });
|
||||
_pid_v = waitpid(_pid, &_status, _opts);
|
||||
} while(_pid <= 0);
|
||||
return _status;
|
||||
return ::omnitrace::mproc::wait_pid(_pid, _opts);
|
||||
}
|
||||
|
||||
int
|
||||
diagnose_status(pid_t _pid, int _status)
|
||||
{
|
||||
auto _verbose = get_verbose();
|
||||
if(_verbose >= 3)
|
||||
{
|
||||
fflush(stderr);
|
||||
fflush(stdout);
|
||||
std::cout << std::flush;
|
||||
std::cerr << std::flush;
|
||||
}
|
||||
|
||||
bool _normal_exit = (WIFEXITED(_status) > 0);
|
||||
bool _unhandled_signal = (WIFSIGNALED(_status) > 0);
|
||||
bool _core_dump = (WCOREDUMP(_status) > 0);
|
||||
bool _stopped = (WIFSTOPPED(_status) > 0);
|
||||
int _exit_status = WEXITSTATUS(_status);
|
||||
int _stop_signal = (_stopped) ? WSTOPSIG(_status) : 0;
|
||||
int _ec = (_unhandled_signal) ? WTERMSIG(_status) : 0;
|
||||
|
||||
if(_verbose >= 4)
|
||||
{
|
||||
TIMEMORY_PRINTF_INFO(
|
||||
stderr,
|
||||
"diagnosing status for process %i :: status: %i... normal exit: %s, "
|
||||
"unhandled signal: %s, core dump: %s, stopped: %s, exit status: %i, stop "
|
||||
"signal: %i, exit code: %i\n",
|
||||
_pid, _status, std::to_string(_normal_exit).c_str(),
|
||||
std::to_string(_unhandled_signal).c_str(), std::to_string(_core_dump).c_str(),
|
||||
std::to_string(_stopped).c_str(), _exit_status, _stop_signal, _ec);
|
||||
}
|
||||
else if(_verbose >= 3)
|
||||
{
|
||||
TIMEMORY_PRINTF_INFO(stderr,
|
||||
"diagnosing status for process %i :: status: %i ...\n", _pid,
|
||||
_status);
|
||||
}
|
||||
|
||||
if(!_normal_exit)
|
||||
{
|
||||
if(_ec == 0) _ec = EXIT_FAILURE;
|
||||
if(_verbose >= 0)
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(
|
||||
stderr, "process %i terminated abnormally. exit code: %i\n", _pid, _ec);
|
||||
}
|
||||
}
|
||||
|
||||
if(_stopped)
|
||||
{
|
||||
if(_verbose >= 0)
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(stderr,
|
||||
"process %i stopped with signal %i. exit code: %i\n",
|
||||
_pid, _stop_signal, _ec);
|
||||
}
|
||||
}
|
||||
|
||||
if(_core_dump)
|
||||
{
|
||||
if(_verbose >= 0)
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(
|
||||
stderr, "process %i terminated and produced a core dump. exit code: %i\n",
|
||||
_pid, _ec);
|
||||
}
|
||||
}
|
||||
|
||||
if(_unhandled_signal)
|
||||
{
|
||||
if(_verbose >= 0)
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(stderr,
|
||||
"process %i terminated because it received a signal "
|
||||
"(%i) that was not handled. exit code: %i\n",
|
||||
_pid, _ec, _ec);
|
||||
}
|
||||
}
|
||||
|
||||
if(!_normal_exit && _exit_status > 0)
|
||||
{
|
||||
if(_verbose >= 0)
|
||||
{
|
||||
if(_exit_status == 127)
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(
|
||||
stderr, "execv in process %i failed. exit code: %i\n", _pid, _ec);
|
||||
}
|
||||
else
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(
|
||||
stderr,
|
||||
"process %i terminated with a non-zero status. exit code: %i\n", _pid,
|
||||
_ec);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return _ec;
|
||||
return ::omnitrace::mproc::diagnose_status(_pid, _status, get_verbose());
|
||||
}
|
||||
|
||||
std::string
|
||||
@@ -301,6 +201,9 @@ get_initial_environment()
|
||||
update_env(_env, "OMNITRACE_USE_TIMEMORY", false);
|
||||
update_env(_env, "OMNITRACE_USE_PROCESS_SAMPLING", false);
|
||||
update_env(_env, "OMNITRACE_CRITICAL_TRACE", false);
|
||||
update_env(_env, "OMNITRACE_THREAD_POOL_SIZE",
|
||||
get_env<int>("OMNITRACE_THREAD_POOL_SIZE", 0));
|
||||
update_env(_env, "OMNITRACE_LAUNCHER", "omnitrace-causal");
|
||||
|
||||
return _env;
|
||||
}
|
||||
@@ -634,7 +537,12 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
|
||||
|
||||
parser.start_group("CAUSAL PROFILING OPTIONS (General)",
|
||||
"These settings will be applied to all causal profiling runs");
|
||||
parser.add_argument({ "-m", "--mode" }, "Causal profiling mode")
|
||||
parser
|
||||
.add_argument({ "-m", "--mode" },
|
||||
"Causal profiling mode. Function mode tends to resolve statistics "
|
||||
"faster than line mode (due to smaller sampling space). Ideally, "
|
||||
"use function mode first to identify a function to target and then "
|
||||
"switch to line mode + function scope setting")
|
||||
.count(1)
|
||||
.dtype("string")
|
||||
.choices({ "function", "line" })
|
||||
@@ -643,6 +551,14 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
|
||||
update_env(_env, "OMNITRACE_CAUSAL_MODE", p.get<std::string>("mode"));
|
||||
});
|
||||
|
||||
parser.add_argument({ "-b", "--backend" }, "Causal profiling sampling backend.")
|
||||
.count(1)
|
||||
.dtype("string")
|
||||
.choices({ "auto", "perf", "timer" })
|
||||
.action([&](parser_t& p) {
|
||||
update_env(_env, "OMNITRACE_CAUSAL_BACKEND", p.get<std::string>("backend"));
|
||||
});
|
||||
|
||||
parser
|
||||
.add_argument({ "-o", "--output-name" },
|
||||
"Output filename of causal profiling data w/o extension")
|
||||
@@ -717,16 +633,42 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
|
||||
"scopes (MAIN+foo, MAIN+bar, MAIN+foo, MAIN+bar)");
|
||||
|
||||
parser
|
||||
.add_argument({ "-s", "--speedups" },
|
||||
"Pool of virtual speedups to sample from during experimentation. "
|
||||
"Each space designates a group and multiple speedups can be "
|
||||
"grouped together by commas, e.g. -s 0 0,10,20-50 is two groups: "
|
||||
"group #1 is '0' and group #2 is '0 10 20 25 30 35 40 45 50'")
|
||||
.min_count(0)
|
||||
.add_argument(
|
||||
{ "-s", "--speedups" },
|
||||
"Pool of virtual speedups to sample from during experimentation. "
|
||||
"Each space designates a group and multiple speedups can be "
|
||||
"grouped together by commas, e.g. '-s 0 0,10,20-50' is two groups: "
|
||||
"group #1 is '0' and group #2 is '0 10 20 25 30 35 40 45 50' -- "
|
||||
"unless end-to-end mode is activated: in end-to-end mode, only one "
|
||||
"speedup is selected for the entire run so all groups are "
|
||||
"expanded. If a range is specified, the default increment is 5, "
|
||||
"however, this can be overridden by suffixing the range with a colon and the "
|
||||
"desired increment, e.g., '0-40:10' would expand to '0 10 20 30 40'")
|
||||
.min_count(1)
|
||||
.max_count(-1)
|
||||
.dtype("integers")
|
||||
.dtype("integer | range | range:increment")
|
||||
.action([&](parser_t& p) {
|
||||
_virtual_speedups = p.get<std::vector<std::string>>("speedups");
|
||||
auto _val = p.get<std::vector<std::string>>("speedups");
|
||||
if(p.get<bool>("end-to-end"))
|
||||
{
|
||||
_virtual_speedups.clear();
|
||||
for(const auto& itr : _val)
|
||||
{
|
||||
for(const auto& ditr : tim::delimit(itr, ",; \t\n\r"))
|
||||
{
|
||||
for(auto nitr :
|
||||
parse_numeric_range<int64_t, std::vector<int64_t>>(
|
||||
ditr, "virtual speedup", 5L))
|
||||
{
|
||||
_virtual_speedups.emplace_back(std::to_string(nitr));
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
_virtual_speedups = _val;
|
||||
}
|
||||
});
|
||||
|
||||
parser
|
||||
|
||||
@@ -247,7 +247,7 @@ print_updated_environment(parser_data_t& _data, std::string_view _prefix)
|
||||
}
|
||||
|
||||
parser_data_t&
|
||||
parse_args(int argc, char** argv, parser_data_t& _parser_data)
|
||||
parse_args(int argc, char** argv, parser_data_t& _parser_data, bool& _fork_exec)
|
||||
{
|
||||
get_initial_environment(_parser_data);
|
||||
|
||||
@@ -305,6 +305,13 @@ parse_args(int argc, char** argv, parser_data_t& _parser_data)
|
||||
omnitrace::argparse::add_core_arguments(parser, _parser_data);
|
||||
omnitrace::argparse::add_extended_arguments(parser, _parser_data);
|
||||
|
||||
parser.start_group("EXECUTION OPTIONS", "");
|
||||
parser.add_argument({ "--fork" }, "Execute via fork + execvpe instead of execvpe")
|
||||
.min_count(0)
|
||||
.max_count(1)
|
||||
.dtype("boolean")
|
||||
.action([&](parser_t& p) { _fork_exec = p.get<bool>("fork"); });
|
||||
|
||||
auto _inpv = std::vector<char*>{};
|
||||
auto& _outv = _parser_data.command;
|
||||
bool _hash = false;
|
||||
@@ -335,6 +342,8 @@ parse_args(int argc, char** argv, parser_data_t& _parser_data)
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
|
||||
tim::log::monochrome() = _parser_data.monochrome;
|
||||
|
||||
return _parser_data;
|
||||
}
|
||||
|
||||
|
||||
@@ -21,6 +21,7 @@
|
||||
// SOFTWARE.
|
||||
|
||||
#include "omnitrace-run.hpp"
|
||||
#include "core/mproc.hpp"
|
||||
|
||||
#include <timemory/log/color.hpp>
|
||||
#include <timemory/log/macros.hpp>
|
||||
@@ -61,7 +62,8 @@ main(int argc, char** argv)
|
||||
}
|
||||
|
||||
auto _parse_data = parser_data_t{};
|
||||
parse_args(argc, argv, _parse_data);
|
||||
auto _fork_exec = false;
|
||||
parse_args(argc, argv, _parse_data, _fork_exec);
|
||||
prepare_command_for_run(argv[0], _parse_data);
|
||||
prepare_environment_for_run(_parse_data);
|
||||
|
||||
@@ -73,7 +75,39 @@ main(int argc, char** argv)
|
||||
print_command(_parse_data, "OMNITRACE: ");
|
||||
_argv.emplace_back(nullptr);
|
||||
_envp.emplace_back(nullptr);
|
||||
return execvpe(_argv.front(), _argv.data(), _envp.data());
|
||||
|
||||
if(_fork_exec)
|
||||
{
|
||||
auto _main_pid = getpid();
|
||||
auto _pid = fork();
|
||||
|
||||
if(_pid == 0)
|
||||
{
|
||||
return execvpe(_argv.front(), _argv.data(), _envp.data());
|
||||
}
|
||||
else
|
||||
{
|
||||
auto _status = omnitrace::mproc::wait_pid(_pid);
|
||||
auto _ec = omnitrace::mproc::diagnose_status(_pid, _status);
|
||||
if(_ec != 0 && _parse_data.verbose >= 0)
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(
|
||||
stderr, "process %i exiting with non-zero exit code: %i\n", _pid,
|
||||
_ec);
|
||||
}
|
||||
else if(_parse_data.verbose >= 2)
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(
|
||||
stderr, "omnitrace run in process %i completed. exit code: %i\n",
|
||||
_pid, _ec);
|
||||
}
|
||||
return _ec;
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
return execvpe(_argv.front(), _argv.data(), _envp.data());
|
||||
}
|
||||
}
|
||||
|
||||
_print_usage();
|
||||
|
||||
@@ -47,7 +47,7 @@ void
|
||||
prepare_environment_for_run(parser_data_t&);
|
||||
|
||||
parser_data_t&
|
||||
parse_args(int argc, char** argv, parser_data_t&);
|
||||
parse_args(int argc, char** argv, parser_data_t&, bool&);
|
||||
|
||||
parser_data_t&
|
||||
parse_command(int argc, char** argv, parser_data_t&);
|
||||
|
||||
@@ -521,6 +521,7 @@ omnitrace_add_bin_test(
|
||||
cpu_clock
|
||||
peak_rss
|
||||
page_rss
|
||||
--fork
|
||||
--
|
||||
$<TARGET_FILE:sleeper>
|
||||
5
|
||||
|
||||
@@ -76,6 +76,7 @@ be found in the future.
|
||||
|
||||
| Concept | Setting | Options | Description |
|
||||
|------------------|-----------------------------------|----------------------------------|--------------------------------------------------------------------------------------------------------------------|
|
||||
| Backend | `OMNITRACE_CAUSAL_BACKEND` | `perf`, `timer` | Backend for recording samples required to calculate the virtual speed-up |
|
||||
| Mode | `OMNITRACE_CAUSAL_MODE` | `function`, `line` | Select entire function or individual line of code for causal experiments |
|
||||
| End-to-End | `OMNITRACE_CAUSAL_END_TO_END` | boolean | Perform a single experiment during the entire run (does not require progress-points) |
|
||||
| Fixed speedup(s) | `OMNITRACE_CAUSAL_FIXED_SPEEDUP` | one or more values from [0, 100] | Virtual speedup or pool of virtual speedups to randomly select |
|
||||
@@ -89,6 +90,58 @@ be found in the future.
|
||||
2. `<file>` and `<file>:<line>` support requires debug info (i.e. code was compiled with `-g` or, preferably, `-g3`)
|
||||
3. Function mode does not require debug info but does not support stripped binaries
|
||||
|
||||
### Backends
|
||||
|
||||
Both causal profiling backends interrupt each thread 1000x per second of CPU-time to apply virtual speedups.
|
||||
The difference between the backends is how the samples which are responsible calculating the virtual speedup are recorded.
|
||||
There are 3 key differences between the two backends:
|
||||
|
||||
1. `perf` backend requires Linux Perf and elevated security priviledges
|
||||
2. `perf` backend interrupts the application less frequently whereas the `timer` backend will interrupt the applicaiton 1000x per second of realtime
|
||||
3. `timer` backend has less accurate call-stacks due to instruction pointer skid
|
||||
|
||||
In general, the `"perf"` is preferred over the `"timer"` backend when sufficient security priviledges permit it's usage.
|
||||
If `"OMNITRACE_CAUSAL_BACKEND"` is set to `"auto"`, Omnitrace will fallback to using the `"timer"` backend only if
|
||||
using the `"perf"` backend fails; if `"OMNITRACE_CAUSAL_BACKEND"` is set to `"perf"` and using this backend fails, Omnitrace
|
||||
will abort.
|
||||
|
||||
#### Instruction Pointer Skid
|
||||
|
||||
Instruction pointer (IP) skid is how many instructions execute between an event of interest
|
||||
happening and where the IP is when the kernel is able to stop the application.
|
||||
For the `"timer"` backend, this translates to the
|
||||
difference between when the IP when the timer generated a signal and the IP when the
|
||||
signal was actually generated. Although IP skid does still occur with the `"perf"` backend,
|
||||
the overhead of pausing the entire thread with the `"timer"` backend makes this much more pronounced
|
||||
and, as such, the `"timer"` backend tends to have a lower resolution than the `"perf"` backend,
|
||||
especially in `"line"` mode.
|
||||
|
||||
#### Installing Linux Perf
|
||||
|
||||
Linux Perf is built into the kernel and may already be installed (e.g., included in the default kernel for OpenSUSE).
|
||||
The official method of checking whether Linux Perf is installed is checking for the existence of the file
|
||||
`/proc/sys/kernel/perf_event_paranoid` -- if the file exists, the kernel has Perf installed.
|
||||
|
||||
If this file does not exist, on Debian-based systems like Ubuntu, install (as superuser):
|
||||
|
||||
```console
|
||||
apt-get install linux-tools-common linux-tools-generic linux-tools-$(uname -r)
|
||||
```
|
||||
|
||||
and reboot your computer. In order to use the `"perf"` backend, the value of `/proc/sys/kernel/perf_event_paranoid`
|
||||
should be <= 2. If the value in this file is greater than 2, you will likely be unable to use the perf backend.
|
||||
|
||||
To update the paranoid level temporarily (until the system is rebooted), run one of the following methods
|
||||
as a superuser (where `PARANOID_LEVEL=<N>` with `<N>` in the range `[-1, 2]):
|
||||
|
||||
```console
|
||||
echo ${PARANOID_LEVEL} | sudo tee /proc/sys/kernel/perf_event_paranoid
|
||||
sysctl kernel.perf_event_paranoid=${PARANOID_LEVEL}
|
||||
```
|
||||
|
||||
To make the paranoid level persistent after a reboot, add `kernel.perf_event_paranoid=<N>`
|
||||
(where `<N>` is the desired paranoid level) to the `/etc/sysctl.conf` file.
|
||||
|
||||
### Speedup Prediction Variability and `omnitrace-causal` Executable
|
||||
|
||||
Causal profiling typically require executing the application several times in order to adequately sample all the domains of executing code, experiment speedups, etc. and resolve statistical fluctuations.
|
||||
@@ -259,7 +312,7 @@ omnitrace-causal \
|
||||
|
||||
|
||||
# 20 iterations in line mode with 1 speedup group
|
||||
# and source scope restricted to lines 155 and 165
|
||||
# and source scope restricted to lines 100 and 110
|
||||
# in the causal.cpp file.
|
||||
#
|
||||
# outputs to files:
|
||||
@@ -273,7 +326,7 @@ omnitrace-causal \
|
||||
-s ${SPEEDUPS} \
|
||||
-m line \
|
||||
-o experiments.line \
|
||||
-S "causal\\.cpp:(155|165)" \
|
||||
-S "causal\\.cpp:(100|110)" \
|
||||
-- \
|
||||
./causal-omni-cpu "${@}"
|
||||
|
||||
@@ -302,8 +355,8 @@ omnitrace-causal \
|
||||
|
||||
# 3 iterations in line mode of 15 singular speedups
|
||||
# in end-to-end mode with 2 different source scopes
|
||||
# where one is restricted to line 155 in causal.cpp
|
||||
# and another is restricted to line 165 in causal.cpp.
|
||||
# where one is restricted to line 100 in causal.cpp
|
||||
# and another is restricted to line 110 in causal.cpp.
|
||||
#
|
||||
# outputs to files:
|
||||
# - causal/experiments.line.e2e.coz
|
||||
@@ -317,8 +370,8 @@ omnitrace-causal \
|
||||
-m line \
|
||||
-e \
|
||||
-o experiments.line.e2e \
|
||||
-S "causal\\.cpp:155" \
|
||||
"causal\\.cpp:165" \
|
||||
-S "causal\\.cpp:100" \
|
||||
"causal\\.cpp:110" \
|
||||
-- \
|
||||
./causal-omni-cpu "${@}"
|
||||
|
||||
@@ -465,9 +518,7 @@ OmniTrace provides several additional features and utilities for causal profilin
|
||||
| Scope options | Supports binary and source scopes | Supports binary, source, and function scopes | See Note #4, #5, and #6 below |
|
||||
| Scope inclusion | Uses `%` as wildcard for binary and source scopes | Full regex support for binary, source, and function scopes | |
|
||||
| Scope exclusion | Not supported | Supports regexes for excluding binary/source/function | See Note #7 below |
|
||||
| Call-stack sampling | Linux perf | libunwind | See Note #8 below |
|
||||
|
||||
### Notes
|
||||
| Call-stack sampling | Linux perf | Linux perf, libunwind | See Note #8 below |
|
||||
|
||||
1. OmniTrace supports a "function" mode which does not require debug info
|
||||
2. OmniTrace supports selecting entire range of instruction pointers for a function instead of instruction pointer for one line. In large codes, "function" mode
|
||||
@@ -478,3 +529,5 @@ OmniTrace provides several additional features and utilities for causal profilin
|
||||
6. OmniTrace supports a "function" scope which narrows the functions/lines which are eligible for causal experiments to those within the matching functions
|
||||
7. OmniTrace supports a second filter on scopes for removing binary/source/function caught by inclusive match, e.g. `BINARY_SCOPE=.*` + `BINARY_EXCLUDE=libmpi.*`
|
||||
initially includes all binaries but exclude regex removes MPI libraries
|
||||
8. In Omnitrace, the Linux perf backend is preferred over use libunwind. However, Linux perf usage can be restricted for security reasons.
|
||||
Omnitrace will fallback to using a second POSIX timer and libunwind if Linux perf is not available.
|
||||
|
||||
@@ -6,15 +6,38 @@
|
||||
:maxdepth: 4
|
||||
```
|
||||
|
||||
## Quick Start (Latest Release, Binary Installer)
|
||||
|
||||
Download the `omnitrace-install.py` and specify `--prefix <install-directory>`. This script
|
||||
will attempt to auto-detect the appropriate OS distribution and OS version.
|
||||
If ROCm support is desired, specify `--rocm X.Y` where `X` is the ROCm major version and `Y`
|
||||
is the ROCm minor version, e.g. `--rocm 5.4`.
|
||||
|
||||
```console
|
||||
wget https://github.com/AMDResearch/omnitrace/releases/latest/download/omnitrace-install.py
|
||||
python3 ./omnitrace-install.py --prefix /opt/omnitrace --rocm 5.4
|
||||
```
|
||||
|
||||
This script supports installation on Ubuntu, OpenSUSE, RedHat, Debian, CentOS, and Fedora.
|
||||
If the target OS is compatible with one of the [operating system versions](#operating-system) below,
|
||||
specify `-d <DISTRO> -v <VERSION>`, e.g. if the OS is compatible with Ubuntu 18.04, pass
|
||||
`-d ubuntu -v 18.04` to the script.
|
||||
|
||||
## Operating System
|
||||
|
||||
OmniTrace is only supported on Linux.
|
||||
OmniTrace is only supported on Linux. The following distributions are tested:
|
||||
|
||||
- Ubuntu 18.04
|
||||
- Ubuntu 20.04
|
||||
- Ubuntu 22.04
|
||||
- OpenSUSE 15.2
|
||||
- OpenSUSE 15.3
|
||||
- Other OS distributions may be supported but are not tested
|
||||
- OpenSUSE 15.4
|
||||
- RedHat 8.7
|
||||
- RedHat 9.0
|
||||
- RedHat 9.1
|
||||
|
||||
Other OS distributions may be supported but are not tested.
|
||||
|
||||
### Identifying the Operating System
|
||||
|
||||
|
||||
@@ -34,16 +34,16 @@ namespace binary
|
||||
address_multirange&
|
||||
address_multirange::operator+=(std::pair<coarse, uintptr_t>&& _v)
|
||||
{
|
||||
coarse_range = address_range{ std::min(coarse_range.low, _v.second),
|
||||
std::max(coarse_range.high, _v.second) };
|
||||
m_coarse_range = address_range{ std::min(m_coarse_range.low, _v.second),
|
||||
std::max(m_coarse_range.high, _v.second + 1) };
|
||||
return *this;
|
||||
}
|
||||
|
||||
address_multirange&
|
||||
address_multirange::operator+=(std::pair<coarse, address_range>&& _v)
|
||||
{
|
||||
coarse_range = address_range{ std::min(coarse_range.low, _v.second.low),
|
||||
std::max(coarse_range.high, _v.second.high) };
|
||||
m_coarse_range = address_range{ std::min(m_coarse_range.low, _v.second.low),
|
||||
std::max(m_coarse_range.high, _v.second.high) };
|
||||
|
||||
return *this;
|
||||
}
|
||||
|
||||
@@ -49,14 +49,15 @@ struct address_multirange
|
||||
template <typename Tp>
|
||||
bool contains(Tp&& _v) const;
|
||||
|
||||
address_range coarse_range = {};
|
||||
|
||||
auto size() const { return m_fine_ranges.size(); }
|
||||
auto empty() const { return m_fine_ranges.empty(); }
|
||||
auto range_size() const { return coarse_range.size(); }
|
||||
auto range_size() const { return m_coarse_range.size(); }
|
||||
auto get_coarse_range() const { return m_coarse_range; }
|
||||
auto get_ranges() const { return m_fine_ranges; }
|
||||
|
||||
private:
|
||||
std::set<address_range> m_fine_ranges = {};
|
||||
address_range m_coarse_range = {};
|
||||
std::set<address_range> m_fine_ranges = {};
|
||||
};
|
||||
|
||||
template <typename Tp>
|
||||
@@ -68,7 +69,7 @@ address_multirange::contains(Tp&& _v) const
|
||||
std::is_same<type, address_range>::value,
|
||||
"Error! operator+= supports only integrals or address_ranges");
|
||||
|
||||
if(!coarse_range.contains(_v)) return false;
|
||||
if(!m_coarse_range.contains(_v)) return false;
|
||||
return std::any_of(m_fine_ranges.begin(), m_fine_ranges.end(),
|
||||
[_v](auto&& itr) { return itr.contains(_v); });
|
||||
}
|
||||
|
||||
@@ -37,15 +37,18 @@
|
||||
#include "core/common.hpp"
|
||||
#include "core/config.hpp"
|
||||
#include "core/debug.hpp"
|
||||
#include "core/locking.hpp"
|
||||
#include "core/state.hpp"
|
||||
#include "core/utility.hpp"
|
||||
#include "dwarf_entry.hpp"
|
||||
#include "link_map.hpp"
|
||||
#include "scope_filter.hpp"
|
||||
#include "symbol.hpp"
|
||||
|
||||
#include <timemory/log/macros.hpp>
|
||||
#include <timemory/unwind/bfd.hpp>
|
||||
#include <timemory/unwind/dlinfo.hpp>
|
||||
#include <timemory/unwind/types.hpp>
|
||||
#include <timemory/utility/filepath.hpp>
|
||||
#include <timemory/utility/join.hpp>
|
||||
#include <timemory/utility/procfs/maps.hpp>
|
||||
@@ -200,5 +203,106 @@ get_binary_info(const std::vector<std::string>& _files,
|
||||
|
||||
return _data;
|
||||
}
|
||||
|
||||
template <bool ExcludeInternal>
|
||||
std::optional<tim::unwind::processed_entry>
|
||||
lookup_ipaddr_entry(uintptr_t _addr, unw_context_t* _context_p,
|
||||
tim::unwind::cache* _cache_p)
|
||||
{
|
||||
static auto _mutex = locking::atomic_mutex{};
|
||||
static auto _cache_v = tim::unwind::cache{ true };
|
||||
static auto _context_v = []() {
|
||||
auto _v = unw_context_t{};
|
||||
unw_getcontext(&_v);
|
||||
return _v;
|
||||
}();
|
||||
|
||||
if constexpr(ExcludeInternal)
|
||||
{
|
||||
static auto _exclude_range = []() {
|
||||
auto _maps = ::tim::procfs::maps::iterate_program_headers();
|
||||
auto _exclude_range_v = std::set<address_range_t>{};
|
||||
auto _insert_exclude_range = [&_maps,
|
||||
&_exclude_range_v](const std::string& _v) {
|
||||
auto _base_v = std::string_view{ filepath::basename(_v) };
|
||||
auto _real_v = filepath::realpath(_v);
|
||||
for(const auto& mitr : _maps)
|
||||
{
|
||||
if(std::string_view{ filepath::basename(mitr.pathname) } == _base_v ||
|
||||
_real_v == _v)
|
||||
{
|
||||
_exclude_range_v.emplace(
|
||||
address_range_t{ mitr.load_address, mitr.last_address });
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
for(const auto& itr : binary::get_link_map("libomnitrace.so", "", ""))
|
||||
_insert_exclude_range(itr.real());
|
||||
|
||||
for(const auto& itr : binary::get_link_map("libomnitrace-dl.so", "", ""))
|
||||
_insert_exclude_range(itr.real());
|
||||
|
||||
return _exclude_range_v;
|
||||
}();
|
||||
|
||||
for(auto itr : _exclude_range)
|
||||
if(itr.contains(_addr)) return std::optional<tim::unwind::processed_entry>{};
|
||||
}
|
||||
|
||||
// NOLINTNEXTLINE(readability-misleading-indentation)
|
||||
if(_addr == 0) return std::optional<tim::unwind::processed_entry>{};
|
||||
|
||||
auto _lk = locking::atomic_lock{ _mutex, std::defer_lock };
|
||||
|
||||
if(!_context_p) _context_p = &_context_v;
|
||||
if(!_cache_p)
|
||||
{
|
||||
_cache_p = &_cache_v;
|
||||
// prevent concurrent access to cache
|
||||
_lk.lock();
|
||||
}
|
||||
|
||||
auto _entry = tim::unwind::entry{ _addr };
|
||||
|
||||
auto citr = _cache_p->entries.find(_entry);
|
||||
if(citr != _cache_p->entries.end())
|
||||
{
|
||||
if(citr->second.error == 0) return citr->second;
|
||||
return std::optional<tim::unwind::processed_entry>{};
|
||||
}
|
||||
|
||||
auto _v = tim::unwind::processed_entry{};
|
||||
_v.address = _entry.address();
|
||||
_v.name = _entry.template get_name<4096, true>(*_context_p, &_v.offset, &_v.error);
|
||||
|
||||
tim::unwind::processed_entry::construct(_v, &_cache_p->files);
|
||||
|
||||
if(_v.error != 0 && _v.lineinfo)
|
||||
{
|
||||
auto _lineinfo = _v.lineinfo.get();
|
||||
if(_lineinfo)
|
||||
{
|
||||
_v.name = _lineinfo.name;
|
||||
_v.error = 0;
|
||||
}
|
||||
}
|
||||
else if(_v.info && _v.info.symbol)
|
||||
{
|
||||
_v.name = _v.info.symbol.name;
|
||||
_v.error = 0;
|
||||
}
|
||||
|
||||
_cache_p->entries.emplace(_entry, _v);
|
||||
|
||||
return (_v.error == 0) ? std::optional<tim::unwind::processed_entry>{ _v }
|
||||
: std::optional<tim::unwind::processed_entry>{};
|
||||
}
|
||||
|
||||
template std::optional<tim::unwind::processed_entry>
|
||||
lookup_ipaddr_entry<true>(uintptr_t, unw_context_t*, tim::unwind::cache*);
|
||||
|
||||
template std::optional<tim::unwind::processed_entry>
|
||||
lookup_ipaddr_entry<false>(uintptr_t, unw_context_t*, tim::unwind::cache*);
|
||||
} // namespace binary
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -57,5 +57,9 @@ std::vector<binary_info>
|
||||
get_binary_info(const std::vector<std::string>&, const std::vector<scope_filter>&,
|
||||
bool _process_dwarf = true, bool _process_bfd = true,
|
||||
bool _include_all = false);
|
||||
|
||||
template <bool ExcludeInternal>
|
||||
std::optional<tim::unwind::processed_entry>
|
||||
lookup_ipaddr_entry(uintptr_t, unw_context_t* = nullptr, tim::unwind::cache* = nullptr);
|
||||
} // namespace binary
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -132,8 +132,8 @@ get_dwarf_entry(Dwarf_Die* _die)
|
||||
bool
|
||||
dwarf_entry::operator<(const dwarf_entry& _rhs) const
|
||||
{
|
||||
return std::tie(address, line, col, discriminator) <
|
||||
std::tie(_rhs.address, _rhs.line, _rhs.col, _rhs.discriminator);
|
||||
return std::tie(address, file, line, col, discriminator) <
|
||||
std::tie(_rhs.address, _rhs.file, _rhs.line, _rhs.col, _rhs.discriminator);
|
||||
}
|
||||
|
||||
bool
|
||||
|
||||
@@ -82,7 +82,7 @@ read_inliner_info(bfd* _inp)
|
||||
|
||||
symbol::symbol(const base_type& _v)
|
||||
: base_type{ _v }
|
||||
, address{ _v.address, _v.address + _v.symsize }
|
||||
, address{ _v.address, _v.address + _v.symsize + 1 }
|
||||
, func{ std::string{ base_type::name } }
|
||||
{}
|
||||
|
||||
@@ -96,6 +96,15 @@ symbol::operator==(const symbol& _rhs) const
|
||||
bool
|
||||
symbol::operator<(const symbol& _rhs) const
|
||||
{
|
||||
// if both have non-zero load addresses that are not equal, compare based on load
|
||||
// addresses
|
||||
if(load_address > 0 && _rhs.load_address > 0 && load_address != _rhs.load_address)
|
||||
return (load_address < _rhs.load_address);
|
||||
|
||||
// if address is same and name is same, return true if load_address is higher
|
||||
if(address == _rhs.address && base_type::name == _rhs.base_type::name)
|
||||
return load_address > _rhs.load_address;
|
||||
|
||||
return std::tie(address, base_type::binding, base_type::visibility, base_type::name) <
|
||||
std::tie(_rhs.address, _rhs.base_type::binding, base_type::visibility,
|
||||
base_type::name);
|
||||
@@ -122,6 +131,10 @@ symbol::operator+=(const symbol& _rhs)
|
||||
address += _rhs.address;
|
||||
utility::combine(inlines, _rhs.inlines);
|
||||
utility::combine(dwarf_info, _rhs.dwarf_info);
|
||||
if(_rhs.binding < binding) binding = _rhs.binding;
|
||||
if(_rhs.visibility < visibility) visibility = _rhs.visibility;
|
||||
if(load_address == 0 && _rhs.load_address > load_address)
|
||||
load_address = _rhs.load_address;
|
||||
}
|
||||
else
|
||||
{
|
||||
@@ -171,6 +184,20 @@ symbol::read_dwarf_entries(const std::deque<dwarf_entry>& _info)
|
||||
_get_next_address(itr, itr->address.low) };
|
||||
}
|
||||
|
||||
std::sort(dwarf_info.begin(), dwarf_info.end(),
|
||||
[](const auto& _lhs, const auto& _rhs) {
|
||||
return std::tie(_lhs.address, _lhs.file, _lhs.line, _lhs.col) <
|
||||
std::tie(_rhs.address, _rhs.file, _rhs.line, _rhs.col);
|
||||
});
|
||||
|
||||
dwarf_info.erase(std::unique(dwarf_info.begin(), dwarf_info.end(),
|
||||
[](const auto& _lhs, const auto& _rhs) {
|
||||
return std::tie(_lhs.address, _lhs.file,
|
||||
_lhs.line) ==
|
||||
std::tie(_rhs.address, _rhs.file, _rhs.line);
|
||||
}),
|
||||
dwarf_info.end());
|
||||
|
||||
return dwarf_info.size();
|
||||
}
|
||||
|
||||
@@ -206,15 +233,12 @@ symbol::read_bfd_line_info(bfd_file& _bfd)
|
||||
auto* _syms = reinterpret_cast<asymbol**>(_bfd.syms);
|
||||
|
||||
{
|
||||
const char* _file = nullptr;
|
||||
const char* _func = nullptr;
|
||||
unsigned int _line = 0;
|
||||
unsigned int _discriminator = 0;
|
||||
const char* _file = nullptr;
|
||||
const char* _func = nullptr;
|
||||
unsigned int _line = 0;
|
||||
|
||||
// if(bfd_find_nearest_line(_inp, _section, _syms, _pc - _vma, &_file,
|
||||
// &_func, &_line) != 0)
|
||||
if(bfd_find_nearest_line_discriminator(_inp, _section, _syms, _pc - _vma, &_file,
|
||||
&_func, &_line, &_discriminator) != 0)
|
||||
if(bfd_find_nearest_line(_inp, _section, _syms, _pc - _vma, &_file, &_func,
|
||||
&_line) != 0)
|
||||
{
|
||||
if(_file) file = _file;
|
||||
if(_func) func = _func;
|
||||
@@ -340,6 +364,18 @@ symbol::serialize(ArchiveT& ar, const unsigned int)
|
||||
ar(cereal::make_nvp("dfunc", demangle(func)));
|
||||
}
|
||||
|
||||
template void
|
||||
inlined_symbol::serialize<cereal::JSONInputArchive>(cereal::JSONInputArchive&,
|
||||
const unsigned int);
|
||||
|
||||
template void
|
||||
inlined_symbol::serialize<cereal::MinimalJSONOutputArchive>(
|
||||
cereal::MinimalJSONOutputArchive&, const unsigned int);
|
||||
|
||||
template void
|
||||
inlined_symbol::serialize<cereal::PrettyJSONOutputArchive>(
|
||||
cereal::PrettyJSONOutputArchive&, const unsigned int);
|
||||
|
||||
template void
|
||||
symbol::serialize<cereal::JSONInputArchive>(cereal::JSONInputArchive&,
|
||||
const unsigned int);
|
||||
|
||||
@@ -12,9 +12,11 @@ set(core_sources
|
||||
${CMAKE_CURRENT_LIST_DIR}/exception.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/gpu.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/mproc.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/perf.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/perfetto.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/state.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/timemory.cpp)
|
||||
${CMAKE_CURRENT_LIST_DIR}/timemory.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/utility.cpp)
|
||||
|
||||
set(core_headers
|
||||
${CMAKE_CURRENT_LIST_DIR}/argparse.hpp
|
||||
@@ -29,6 +31,7 @@ set(core_headers
|
||||
${CMAKE_CURRENT_LIST_DIR}/gpu.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/locking.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/mproc.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/perf.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/perfetto.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/redirect.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/state.hpp
|
||||
|
||||
@@ -272,6 +272,15 @@ add_core_arguments(parser_t& _parser, parser_data& _data)
|
||||
%{INDENT}% to consume more resources since, while idle, the real-clock time increases (and therefore triggers taking samples)
|
||||
%{INDENT}% whereas the CPU-clock time does not.)";
|
||||
|
||||
const auto* _overflow_desc =
|
||||
R"(Sample based on an overflow event. Accepts zero or more arguments:
|
||||
%{INDENT}%0. Enables sampling based on overflow.
|
||||
%{INDENT}%1. Overflow metric, e.g. PERF_COUNT_HW_INSTRUCTIONS
|
||||
%{INDENT}%2. Overflow value. E.g., if metric == PERF_COUNT_HW_INSTRUCTIONS, then 10000000 == sample every 10,000,000 instructions.
|
||||
%{INDENT}%3+ Thread IDs to target for sampling, starting at 0 (the main thread).
|
||||
%{INDENT}% May be specified as index or range, e.g., '0 2-4' will be interpreted as:
|
||||
%{INDENT}% sample the main thread (0), do not sample the first child thread but sample the 2nd, 3rd, and 4th child threads)";
|
||||
|
||||
const auto* _hsa_interrupt_desc =
|
||||
R"(Set the value of the HSA_ENABLE_INTERRUPT environment variable.
|
||||
%{INDENT}% ROCm version 5.2 and older have a bug which will cause a deadlock if a sample is taken while waiting for the signal
|
||||
@@ -1075,7 +1084,7 @@ add_core_arguments(parser_t& _parser, parser_data& _data)
|
||||
"SAMPLING TIMER OPTIONS",
|
||||
"These options determine the heuristic for deciding when to take a sample");
|
||||
|
||||
if(_data.environ_filter("sample_cputime", _data))
|
||||
if(_data.environ_filter("sampling_cputime", _data))
|
||||
{
|
||||
_parser.add_argument({ "--sample-cputime" }, _cputime_desc)
|
||||
.min_count(0)
|
||||
@@ -1103,7 +1112,7 @@ add_core_arguments(parser_t& _parser, parser_data& _data)
|
||||
_data.processed_environs.emplace("sampling_cputime");
|
||||
}
|
||||
|
||||
if(_data.environ_filter("sample_realtime", _data))
|
||||
if(_data.environ_filter("sampling_realtime", _data))
|
||||
{
|
||||
_parser.add_argument({ "--sample-realtime" }, _realtime_desc)
|
||||
.min_count(0)
|
||||
@@ -1132,6 +1141,41 @@ add_core_arguments(parser_t& _parser, parser_data& _data)
|
||||
_data.processed_environs.emplace("sampling_realtime");
|
||||
}
|
||||
|
||||
if(_data.environ_filter("sampling_overflow", _data))
|
||||
{
|
||||
_parser.add_argument({ "--sample-overflow" }, _overflow_desc)
|
||||
.min_count(0)
|
||||
.dtype("[event] [freq] [tids...]")
|
||||
.action([&](parser_t& p) {
|
||||
auto _v = p.get<std::deque<std::string>>("sample-overflow");
|
||||
update_env(_data, "OMNITRACE_SAMPLING_OVERFLOW", true);
|
||||
|
||||
if(!_v.empty())
|
||||
{
|
||||
if(p.exists("sampling-overflow-event") &&
|
||||
_v.front() != p.get<std::string>("sampling-overflow-event"))
|
||||
throw exception<std::runtime_error>(join(
|
||||
"", "'--sample-overflow ", _v.front(),
|
||||
" ...' conflicts with '--sampling-overflow-event ",
|
||||
p.get<std::string>("sampling-overflow-event"), "' option"));
|
||||
update_env(_data, "OMNITRACE_SAMPLING_OVERFLOW_EVENT", _v.front());
|
||||
_v.pop_front();
|
||||
}
|
||||
if(!_v.empty())
|
||||
{
|
||||
update_env(_data, "OMNITRACE_SAMPLING_OVERFLOW_FREQ", _v.front());
|
||||
_v.pop_front();
|
||||
}
|
||||
if(!_v.empty())
|
||||
{
|
||||
update_env(_data, "OMNITRACE_SAMPLING_OVERFLOW_TIDS",
|
||||
join(array_config_t{ "," }, _v));
|
||||
}
|
||||
});
|
||||
|
||||
_data.processed_environs.emplace("sampling_overflow");
|
||||
}
|
||||
|
||||
_parser.start_group(
|
||||
"ADVANCED SAMPLING OPTIONS",
|
||||
"These options determine the heuristic for deciding when to take a sample");
|
||||
|
||||
@@ -80,7 +80,7 @@ address_range::contains(uintptr_t _v) const
|
||||
bool
|
||||
address_range::contains(address_range _v) const
|
||||
{
|
||||
return (*this == _v) || (contains(_v.low) && contains(_v.high));
|
||||
return (*this == _v) || (contains(_v.low) && (contains(_v.high) || _v.high == high));
|
||||
}
|
||||
|
||||
bool
|
||||
|
||||
@@ -130,6 +130,8 @@ OMNITRACE_DEFINE_CATEGORY(category, thread_context_switch, OMNITRACE_CATEGORY_TH
|
||||
OMNITRACE_DEFINE_CATEGORY(category, thread_hardware_counter, OMNITRACE_CATEGORY_THREAD_HARDWARE_COUNTER, "thread_hardware_counter", "Hardware counter value on thread (derived from sampling)")
|
||||
OMNITRACE_DEFINE_CATEGORY(category, kernel_hardware_counter, OMNITRACE_CATEGORY_KERNEL_HARDWARE_COUNTER, "kernel_hardware_counter", "Hardware counter value for kernel (deterministic)")
|
||||
OMNITRACE_DEFINE_CATEGORY(category, numa, OMNITRACE_CATEGORY_NUMA, "numa", "Non-unified memory architecture")
|
||||
OMNITRACE_DEFINE_CATEGORY(category, timer_sampling, OMNITRACE_CATEGORY_TIMER_SAMPLING, "timer_sampling", "Sampling based on a timer")
|
||||
OMNITRACE_DEFINE_CATEGORY(category, overflow_sampling, OMNITRACE_CATEGORY_OVERFLOW_SAMPLING, "overflow_sampling", "Sampling based on a counter overflow")
|
||||
|
||||
OMNITRACE_DECLARE_CATEGORY(category, sampling, OMNITRACE_CATEGORY_SAMPLING, "sampling", "Host-side call-stack sampling")
|
||||
// clang-format on
|
||||
@@ -192,6 +194,8 @@ using name = perfetto_category<Tp...>;
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::thread_hardware_counter), \
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::kernel_hardware_counter), \
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::numa), \
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::timer_sampling), \
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::overflow_sampling), \
|
||||
::perfetto::Category("timemory").SetDescription("Events from the timemory API")
|
||||
|
||||
#if defined(TIMEMORY_USE_PERFETTO)
|
||||
|
||||
@@ -233,6 +233,8 @@ OMNITRACE_DEFINE_CONCRETE_TRAIT(uses_timing_units, component::sampling_cpu_clock
|
||||
// enable percent units
|
||||
OMNITRACE_DEFINE_CONCRETE_TRAIT(uses_percent_units, component::sampling_gpu_busy,
|
||||
true_type)
|
||||
OMNITRACE_DEFINE_CONCRETE_TRAIT(uses_percent_units, component::sampling_percent,
|
||||
true_type)
|
||||
|
||||
// enable memory units
|
||||
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_memory_category, component::sampling_gpu_memory,
|
||||
@@ -253,6 +255,9 @@ OMNITRACE_DEFINE_CONCRETE_TRAIT(report_mean, component::sampling_percent, false_
|
||||
OMNITRACE_DEFINE_CONCRETE_TRAIT(report_statistics, component::sampling_percent,
|
||||
false_type)
|
||||
|
||||
// reporting categories (self)
|
||||
OMNITRACE_DEFINE_CONCRETE_TRAIT(report_self, component::sampling_percent, false_type)
|
||||
|
||||
#define OMNITRACE_DECLARE_EXTERN_COMPONENT(NAME, HAS_DATA, ...) \
|
||||
TIMEMORY_DECLARE_EXTERN_TEMPLATE( \
|
||||
struct tim::component::base<TIMEMORY_ESC(omnitrace::component::NAME), \
|
||||
|
||||
+170
-135
@@ -27,7 +27,9 @@
|
||||
#include "defines.hpp"
|
||||
#include "gpu.hpp"
|
||||
#include "mproc.hpp"
|
||||
#include "perf.hpp"
|
||||
#include "perfetto.hpp"
|
||||
#include "utility.hpp"
|
||||
|
||||
#include <timemory/backends/dmp.hpp>
|
||||
#include <timemory/backends/mpi.hpp>
|
||||
@@ -109,59 +111,7 @@ get_available_categories()
|
||||
return _v;
|
||||
}
|
||||
|
||||
template <typename Tp = int64_t, typename ContainerT = std::set<Tp>, typename Up = Tp>
|
||||
ContainerT
|
||||
parse_numeric_range(std::string _input_string, const std::string& _label, Up _incr)
|
||||
{
|
||||
auto _get_value = [](const std::string& _inp) {
|
||||
std::stringstream iss{ _inp };
|
||||
auto var = Tp{};
|
||||
iss >> var;
|
||||
return var;
|
||||
};
|
||||
|
||||
for(auto& itr : _input_string)
|
||||
itr = tolower(itr);
|
||||
auto _result = ContainerT{};
|
||||
for(const auto& _v : tim::delimit(_input_string, ",; \t\n\r"))
|
||||
{
|
||||
if(_v.find_first_not_of("0123456789-") != std::string::npos)
|
||||
{
|
||||
OMNITRACE_VERBOSE_F(
|
||||
0,
|
||||
"Invalid %s specification. Only numerical values (e.g., 0) or "
|
||||
"ranges (e.g., 0-7) are permitted. Ignoring %s...",
|
||||
_label.c_str(), _v.c_str());
|
||||
continue;
|
||||
}
|
||||
if(_v.find('-') != std::string::npos)
|
||||
{
|
||||
auto _vv = tim::delimit(_v, "-");
|
||||
OMNITRACE_CONDITIONAL_THROW(
|
||||
_vv.size() != 2,
|
||||
"Invalid %s range specification: %s. Required format N-M, e.g. 0-4",
|
||||
_label.c_str(), _v.c_str());
|
||||
Tp _vn = _get_value(_vv.at(0));
|
||||
Tp _vN = _get_value(_vv.at(1));
|
||||
do
|
||||
{
|
||||
if constexpr(std::is_same<ContainerT, std::set<Tp>>::value)
|
||||
_result.emplace(_vn);
|
||||
else
|
||||
_result.emplace_back(_vn);
|
||||
_vn += _incr;
|
||||
} while(_vn <= _vN);
|
||||
}
|
||||
else
|
||||
{
|
||||
if constexpr(std::is_same<ContainerT, std::set<Tp>>::value)
|
||||
_result.emplace(std::stol(_v));
|
||||
else
|
||||
_result.emplace_back(std::stol(_v));
|
||||
}
|
||||
}
|
||||
return _result;
|
||||
}
|
||||
using utility::parse_numeric_range;
|
||||
|
||||
#define OMNITRACE_CONFIG_SETTING(TYPE, ENV_NAME, DESCRIPTION, INITIAL_VALUE, ...) \
|
||||
[&]() { \
|
||||
@@ -454,6 +404,11 @@ configure_settings(bool _init)
|
||||
"Defaults to OMNITRACE_SAMPLING_FREQ when <= 0.0",
|
||||
-1.0, "sampling", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(double, "OMNITRACE_SAMPLING_OVERFLOW_FREQ",
|
||||
"Number of events in between each sample. "
|
||||
"Defaults to OMNITRACE_SAMPLING_FREQ when <= 0.0",
|
||||
-1.0, "sampling", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
double, "OMNITRACE_SAMPLING_DELAY",
|
||||
"Time (in seconds) to wait before the first sampling signal is delivered, "
|
||||
@@ -518,15 +473,22 @@ configure_settings(bool _init)
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
std::string, "OMNITRACE_SAMPLING_CPUTIME_TIDS",
|
||||
"Same as OMNITRACE_SAMPLING_TIDS but applies specifically to samplers whose "
|
||||
"timers are based on the CPU-time. This is useful when both "
|
||||
"OMNITRACE_SAMPLING_CPUTIME=ON and OMNITRACE_SAMPLING_REALTIME=ON",
|
||||
"timers are based on the CPU-time. This is useful when you want to restrict "
|
||||
"samples to particular threads.",
|
||||
std::string{}, "sampling", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
std::string, "OMNITRACE_SAMPLING_REALTIME_TIDS",
|
||||
"Same as OMNITRACE_SAMPLING_TIDS but applies specifically to samplers whose "
|
||||
"timers are based on the real (wall) time. This is useful when both "
|
||||
"OMNITRACE_SAMPLING_CPUTIME=ON and OMNITRACE_SAMPLING_REALTIME=ON",
|
||||
"timers are based on the real (wall) time. This is useful when you want to "
|
||||
"restrict samples to particular threads.",
|
||||
std::string{}, "sampling", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
std::string, "OMNITRACE_SAMPLING_OVERFLOW_TIDS",
|
||||
"Same as OMNITRACE_SAMPLING_TIDS but applies specifically to samplers whose "
|
||||
"samples are based on the overflow of a particular event. This is useful when "
|
||||
"you want to restrict samples to particular threads.",
|
||||
std::string{}, "sampling", "advanced");
|
||||
|
||||
auto _backend = tim::get_env_choice<std::string>(
|
||||
@@ -593,29 +555,45 @@ configure_settings(bool _init)
|
||||
"thread started by the application.",
|
||||
8, "sampling", "debugging", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_SAMPLING_OVERFLOW",
|
||||
"Enable sampling via an overflow of a HW counter. This "
|
||||
"requires Linux perf (/proc/sys/kernel/perf_event_paranoid "
|
||||
"created by OS) with a value of 2 or less in that file",
|
||||
false, "sampling", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
bool, "OMNITRACE_SAMPLING_REALTIME",
|
||||
"Enable sampling frequency via a wall-clock timer on child threads. This may "
|
||||
"result in typically idle child threads consuming an unnecessary large amount of "
|
||||
"CPU time. The main thread always has this enabled.",
|
||||
"Enable sampling frequency via a wall-clock timer. This may result in typically "
|
||||
"idle child threads consuming an unnecessary large amount of CPU time.",
|
||||
false, "sampling", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_SAMPLING_CPUTIME",
|
||||
"Enable sampling frequency via a timer that measures both "
|
||||
"CPU time used by the current process, "
|
||||
"and CPU time expended on behalf of the process by the "
|
||||
"system. This is recommended.",
|
||||
true, "sampling", "advanced");
|
||||
|
||||
auto _sigrt_range = SIGRTMAX - SIGRTMIN;
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
int, "OMNITRACE_SAMPLING_REALTIME_OFFSET",
|
||||
std::string{
|
||||
"Modify this value only if the target process is also using SIGRTMIN. E.g. "
|
||||
"the signal used is SIGRTMIN + <THIS_VALUE>. Value must be <= " } +
|
||||
std::to_string(_sigrt_range),
|
||||
0, "sampling", "advanced");
|
||||
bool, "OMNITRACE_SAMPLING_CPUTIME",
|
||||
"Enable sampling frequency via a timer that measures both CPU time used by the "
|
||||
"current process, and CPU time expended on behalf of the process by the system. "
|
||||
"This is recommended.",
|
||||
false, "sampling", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(int, "OMNITRACE_SAMPLING_CPUTIME_SIGNAL",
|
||||
"Modify this value only if the target process is also using "
|
||||
"the same signal (SIGPROF)",
|
||||
SIGPROF, "sampling", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(int, "OMNITRACE_SAMPLING_REALTIME_SIGNAL",
|
||||
"Modify this value only if the target process is also using "
|
||||
"the same signal (SIGRTMIN)",
|
||||
SIGRTMIN, "sampling", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(int, "OMNITRACE_SAMPLING_OVERFLOW_SIGNAL",
|
||||
"Modify this value only if the target process is also using "
|
||||
"the same signal (SIGRTMIN + 1)",
|
||||
SIGRTMIN + 1, "sampling", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_SAMPLING_OVERFLOW_EVENT",
|
||||
"Metric for overflow sampling",
|
||||
std::string{ "perf::PERF_COUNT_HW_CACHE_REFERENCES" },
|
||||
"sampling", "hardware_counters")
|
||||
->set_choices(perf::get_config_choices());
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_ROCTRACER_HIP_API",
|
||||
"Enable HIP API tracing support", true, "roctracer", "rocm",
|
||||
@@ -757,12 +735,22 @@ configure_settings(bool _init)
|
||||
std::string, "OMNITRACE_TMPDIR", "Base directory for temporary files",
|
||||
get_env<std::string>("TMPDIR", "/tmp"), "io", "data", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
std::string, "OMNITRACE_CAUSAL_BACKEND",
|
||||
"Backend for call-stack sampling. See "
|
||||
"https://amdresearch.github.io/omnitrace/causal_profiling.html#backends for more "
|
||||
"info. If set to \"auto\", omnitrace will attempt to use the perf backend and "
|
||||
"fallback on the timer backend if unavailable",
|
||||
std::string{ "auto" }, "causal", "analysis")
|
||||
->set_choices({ "auto", "perf", "timer" });
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
std::string, "OMNITRACE_CAUSAL_MODE",
|
||||
"Perform causal experiments at the function-scope or line-scope. Ideally, use "
|
||||
"function first to locate function with highest impact and then switch to line "
|
||||
"mode + OMNITRACE_CAUSAL_FUNCTION_SCOPE set to the function being targeted.",
|
||||
std::string{ "function" }, "causal", "analysis", "advanced");
|
||||
std::string{ "function" }, "causal", "analysis")
|
||||
->set_choices({ "func", "line", "function" });
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
double, "OMNITRACE_CAUSAL_DELAY",
|
||||
@@ -1306,16 +1294,25 @@ configure_signal_handler(const std::shared_ptr<settings>& _config)
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
get_realtime_signal()
|
||||
bool
|
||||
get_use_sampling_overflow()
|
||||
{
|
||||
return SIGRTMIN + get_sampling_rtoffset();
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_OVERFLOW");
|
||||
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
|
||||
}
|
||||
|
||||
int
|
||||
get_cputime_signal()
|
||||
bool
|
||||
get_use_sampling_realtime()
|
||||
{
|
||||
return SIGPROF;
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_REALTIME");
|
||||
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
|
||||
}
|
||||
|
||||
bool
|
||||
get_use_sampling_cputime()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_CPUTIME");
|
||||
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
|
||||
}
|
||||
|
||||
std::set<int> get_sampling_signals(int64_t)
|
||||
@@ -1323,13 +1320,22 @@ std::set<int> get_sampling_signals(int64_t)
|
||||
auto _v = std::set<int>{};
|
||||
if(get_use_causal())
|
||||
{
|
||||
_v.emplace(get_cputime_signal());
|
||||
_v.emplace(get_realtime_signal());
|
||||
_v.emplace(get_sampling_cputime_signal());
|
||||
_v.emplace(get_sampling_realtime_signal());
|
||||
}
|
||||
else
|
||||
{
|
||||
if(get_use_sampling_cputime()) _v.emplace(get_cputime_signal());
|
||||
if(get_use_sampling_realtime()) _v.emplace(get_realtime_signal());
|
||||
if(get_use_sampling() && !get_use_sampling_cputime() &&
|
||||
!get_use_sampling_realtime() && !get_use_sampling_overflow())
|
||||
{
|
||||
OMNITRACE_VERBOSE_F(1, "sampling enabled by cputime/realtime/overflow not "
|
||||
"specified. defaulting to cputime...\n");
|
||||
set_setting_value("OMNITRACE_SAMPLING_CPUTIME", true);
|
||||
}
|
||||
|
||||
if(get_use_sampling_cputime()) _v.emplace(get_sampling_cputime_signal());
|
||||
if(get_use_sampling_realtime()) _v.emplace(get_sampling_realtime_signal());
|
||||
if(get_use_sampling_overflow()) _v.emplace(get_sampling_overflow_signal());
|
||||
}
|
||||
|
||||
return _v;
|
||||
@@ -2008,24 +2014,24 @@ get_sampling_keep_internal()
|
||||
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
|
||||
}
|
||||
|
||||
bool
|
||||
get_use_sampling_realtime()
|
||||
int
|
||||
get_sampling_overflow_signal()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_REALTIME");
|
||||
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
|
||||
}
|
||||
|
||||
bool
|
||||
get_use_sampling_cputime()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_CPUTIME");
|
||||
return static_cast<tim::tsettings<bool>&>(*_v->second).get();
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_OVERFLOW_SIGNAL");
|
||||
return static_cast<tim::tsettings<int>&>(*_v->second).get();
|
||||
}
|
||||
|
||||
int
|
||||
get_sampling_rtoffset()
|
||||
get_sampling_realtime_signal()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_REALTIME_OFFSET");
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_REALTIME_SIGNAL");
|
||||
return static_cast<tim::tsettings<int>&>(*_v->second).get();
|
||||
}
|
||||
|
||||
int
|
||||
get_sampling_cputime_signal()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_CPUTIME_SIGNAL");
|
||||
return static_cast<tim::tsettings<int>&>(*_v->second).get();
|
||||
}
|
||||
|
||||
@@ -2241,7 +2247,7 @@ get_sampling_freq()
|
||||
}
|
||||
|
||||
double
|
||||
get_sampling_cpu_freq()
|
||||
get_sampling_cputime_freq()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_CPUTIME_FREQ");
|
||||
auto& _val = static_cast<tim::tsettings<double>&>(*_v->second).get();
|
||||
@@ -2250,7 +2256,7 @@ get_sampling_cpu_freq()
|
||||
}
|
||||
|
||||
double
|
||||
get_sampling_real_freq()
|
||||
get_sampling_realtime_freq()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_REALTIME_FREQ");
|
||||
auto& _val = static_cast<tim::tsettings<double>&>(*_v->second).get();
|
||||
@@ -2258,6 +2264,15 @@ get_sampling_real_freq()
|
||||
return _val;
|
||||
}
|
||||
|
||||
double
|
||||
get_sampling_overflow_freq()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_OVERFLOW_FREQ");
|
||||
auto& _val = static_cast<tim::tsettings<double>&>(*_v->second).get();
|
||||
if(_val <= 0.0) _val = get_sampling_freq();
|
||||
return _val;
|
||||
}
|
||||
|
||||
double
|
||||
get_sampling_delay()
|
||||
{
|
||||
@@ -2266,7 +2281,7 @@ get_sampling_delay()
|
||||
}
|
||||
|
||||
double
|
||||
get_sampling_cpu_delay()
|
||||
get_sampling_cputime_delay()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_CPUTIME_DELAY");
|
||||
auto& _val = static_cast<tim::tsettings<double>&>(*_v->second).get();
|
||||
@@ -2275,7 +2290,7 @@ get_sampling_cpu_delay()
|
||||
}
|
||||
|
||||
double
|
||||
get_sampling_real_delay()
|
||||
get_sampling_realtime_delay()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_REALTIME_DELAY");
|
||||
auto& _val = static_cast<tim::tsettings<double>&>(*_v->second).get();
|
||||
@@ -2293,32 +2308,40 @@ get_sampling_duration()
|
||||
std::string
|
||||
get_sampling_cpus()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_CPUS");
|
||||
auto _v = get_config()->find("OMNITRACE_SAMPLING_CPUS");
|
||||
return static_cast<tim::tsettings<std::string>&>(*_v->second).get();
|
||||
}
|
||||
|
||||
std::set<int64_t>
|
||||
get_sampling_tids()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_TIDS");
|
||||
auto _v = get_config()->find("OMNITRACE_SAMPLING_TIDS");
|
||||
return parse_numeric_range<>(
|
||||
static_cast<tim::tsettings<std::string>&>(*_v->second).get(), "thread IDs", 1);
|
||||
static_cast<tim::tsettings<std::string>&>(*_v->second).get(), "thread IDs", 1L);
|
||||
}
|
||||
|
||||
std::set<int64_t>
|
||||
get_sampling_cpu_tids()
|
||||
get_sampling_cputime_tids()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_CPUTIME_TIDS");
|
||||
auto _v = get_config()->find("OMNITRACE_SAMPLING_CPUTIME_TIDS");
|
||||
return parse_numeric_range<>(
|
||||
static_cast<tim::tsettings<std::string>&>(*_v->second).get(), "thread IDs", 1);
|
||||
static_cast<tim::tsettings<std::string>&>(*_v->second).get(), "thread IDs", 1L);
|
||||
}
|
||||
|
||||
std::set<int64_t>
|
||||
get_sampling_real_tids()
|
||||
get_sampling_realtime_tids()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_SAMPLING_REALTIME_TIDS");
|
||||
auto _v = get_config()->find("OMNITRACE_SAMPLING_REALTIME_TIDS");
|
||||
return parse_numeric_range<>(
|
||||
static_cast<tim::tsettings<std::string>&>(*_v->second).get(), "thread IDs", 1);
|
||||
static_cast<tim::tsettings<std::string>&>(*_v->second).get(), "thread IDs", 1L);
|
||||
}
|
||||
|
||||
std::set<int64_t>
|
||||
get_sampling_overflow_tids()
|
||||
{
|
||||
auto _v = get_config()->find("OMNITRACE_SAMPLING_OVERFLOW_TIDS");
|
||||
return parse_numeric_range<>(
|
||||
static_cast<tim::tsettings<std::string>&>(*_v->second).get(), "thread IDs", 1L);
|
||||
}
|
||||
|
||||
bool
|
||||
@@ -2415,14 +2438,8 @@ get_trace_thread_join()
|
||||
bool
|
||||
get_debug_tid()
|
||||
{
|
||||
static auto _vlist = []() {
|
||||
std::unordered_set<int64_t> _tids{};
|
||||
for(auto itr : tim::delimit<std::vector<int64_t>>(
|
||||
tim::get_env<std::string>("OMNITRACE_DEBUG_TIDS", ""),
|
||||
",: ", [](const std::string& _v) { return std::stoll(_v); }))
|
||||
_tids.insert(itr);
|
||||
return _tids;
|
||||
}();
|
||||
static auto _vlist = parse_numeric_range<int64_t, std::unordered_set<int64_t>>(
|
||||
tim::get_env<std::string>("OMNITRACE_DEBUG_TIDS", ""), "debug tids", 1L);
|
||||
static thread_local bool _v =
|
||||
_vlist.empty() || _vlist.count(tim::threading::get_id()) > 0;
|
||||
return _v;
|
||||
@@ -2431,14 +2448,8 @@ get_debug_tid()
|
||||
bool
|
||||
get_debug_pid()
|
||||
{
|
||||
static auto _vlist = []() {
|
||||
std::unordered_set<int64_t> _pids{};
|
||||
for(auto itr : tim::delimit<std::vector<int64_t>>(
|
||||
tim::get_env<std::string>("OMNITRACE_DEBUG_PIDS", ""),
|
||||
",: ", [](const std::string& _v) { return std::stoll(_v); }))
|
||||
_pids.insert(itr);
|
||||
return _pids;
|
||||
}();
|
||||
static auto _vlist = parse_numeric_range<int64_t, std::unordered_set<int64_t>>(
|
||||
tim::get_env<std::string>("OMNITRACE_DEBUG_PIDS", ""), "debug pids", 1L);
|
||||
static bool _v = _vlist.empty() || _vlist.count(tim::process::get_id()) > 0 ||
|
||||
_vlist.count(dmp::rank()) > 0;
|
||||
return _v;
|
||||
@@ -2589,6 +2600,31 @@ get_tmp_file(std::string _basename, std::string _ext)
|
||||
return _existing_files.at(_fname);
|
||||
}
|
||||
|
||||
CausalBackend
|
||||
get_causal_backend()
|
||||
{
|
||||
static auto _m = std::unordered_map<std::string_view, CausalBackend>{
|
||||
{ "auto", CausalBackend::Auto },
|
||||
{ "perf", CausalBackend::Perf },
|
||||
{ "timer", CausalBackend::Timer },
|
||||
};
|
||||
|
||||
auto _v = get_config()->find("OMNITRACE_CAUSAL_BACKEND");
|
||||
try
|
||||
{
|
||||
return _m.at(static_cast<tim::tsettings<std::string>&>(*_v->second).get());
|
||||
} catch(std::runtime_error& _e)
|
||||
{
|
||||
auto _mode = static_cast<tim::tsettings<std::string>&>(*_v->second).get();
|
||||
OMNITRACE_THROW("[%s] invalid causal backend %s. Choices: %s\n", __FUNCTION__,
|
||||
_mode.c_str(),
|
||||
timemory::join::join(timemory::join::array_config{ ", ", "", "" },
|
||||
_v->second->get_choices())
|
||||
.c_str());
|
||||
}
|
||||
return CausalBackend::Auto;
|
||||
}
|
||||
|
||||
CausalMode
|
||||
get_causal_mode()
|
||||
{
|
||||
@@ -2612,12 +2648,11 @@ get_causal_mode()
|
||||
} catch(std::runtime_error& _e)
|
||||
{
|
||||
auto _mode = static_cast<tim::tsettings<std::string>&>(*_v->second).get();
|
||||
std::stringstream _ss{};
|
||||
for(const auto& itr : _v->second->get_choices())
|
||||
_ss << ", " << itr;
|
||||
auto _msg = (_ss.str().length() > 2) ? _ss.str().substr(2) : std::string{};
|
||||
OMNITRACE_THROW("[%s] invalid causal mode %s. Choices: %s\n", __FUNCTION__,
|
||||
_mode.c_str(), _msg.c_str());
|
||||
OMNITRACE_THROW(
|
||||
"[%s] invalid causal mode %s. Choices: %s\n", __FUNCTION__, _mode.c_str(),
|
||||
timemory::join::join(timemory::join::array_config{ ", ", "", "" },
|
||||
_v->second->get_choices())
|
||||
.c_str());
|
||||
}
|
||||
return CausalMode::Function;
|
||||
}();
|
||||
@@ -2637,7 +2672,7 @@ get_causal_fixed_speedup()
|
||||
static auto _v = get_config()->find("OMNITRACE_CAUSAL_FIXED_SPEEDUP");
|
||||
return parse_numeric_range<int64_t, std::vector<int64_t>>(
|
||||
static_cast<tim::tsettings<std::string>&>(*_v->second).get(),
|
||||
"causal fixed speedup", 5);
|
||||
"causal fixed speedup", 5L);
|
||||
}
|
||||
|
||||
std::string
|
||||
|
||||
@@ -64,10 +64,13 @@ void
|
||||
configure_disabled_settings(const std::shared_ptr<settings>&);
|
||||
|
||||
int
|
||||
get_realtime_signal();
|
||||
get_sampling_overflow_signal();
|
||||
|
||||
int
|
||||
get_cputime_signal();
|
||||
get_sampling_realtime_signal();
|
||||
|
||||
int
|
||||
get_sampling_cputime_signal();
|
||||
|
||||
std::set<int>
|
||||
get_sampling_signals(int64_t _tid = 0);
|
||||
@@ -233,15 +236,6 @@ get_use_code_coverage();
|
||||
bool
|
||||
get_sampling_keep_internal();
|
||||
|
||||
bool
|
||||
get_use_sampling_realtime();
|
||||
|
||||
bool
|
||||
get_use_sampling_cputime();
|
||||
|
||||
int
|
||||
get_sampling_rtoffset();
|
||||
|
||||
bool
|
||||
get_use_rcclp();
|
||||
|
||||
@@ -316,19 +310,22 @@ double
|
||||
get_sampling_freq();
|
||||
|
||||
double
|
||||
get_sampling_cpu_freq();
|
||||
get_sampling_cputime_freq();
|
||||
|
||||
double
|
||||
get_sampling_real_freq();
|
||||
get_sampling_realtime_freq();
|
||||
|
||||
double
|
||||
get_sampling_overflow_freq();
|
||||
|
||||
double
|
||||
get_sampling_delay();
|
||||
|
||||
double
|
||||
get_sampling_cpu_delay();
|
||||
get_sampling_cputime_delay();
|
||||
|
||||
double
|
||||
get_sampling_real_delay();
|
||||
get_sampling_realtime_delay();
|
||||
|
||||
double
|
||||
get_sampling_duration();
|
||||
@@ -337,10 +334,13 @@ std::string
|
||||
get_sampling_cpus();
|
||||
|
||||
std::set<int64_t>
|
||||
get_sampling_cpu_tids();
|
||||
get_sampling_cputime_tids();
|
||||
|
||||
std::set<int64_t>
|
||||
get_sampling_real_tids();
|
||||
get_sampling_realtime_tids();
|
||||
|
||||
std::set<int64_t>
|
||||
get_sampling_overflow_tids();
|
||||
|
||||
bool
|
||||
get_sampling_include_inlines();
|
||||
@@ -408,6 +408,9 @@ struct tmp_file
|
||||
std::shared_ptr<tmp_file>
|
||||
get_tmp_file(std::string _basename, std::string _ext = "dat");
|
||||
|
||||
CausalBackend
|
||||
get_causal_backend();
|
||||
|
||||
CausalMode
|
||||
get_causal_mode();
|
||||
|
||||
|
||||
@@ -82,6 +82,14 @@ struct c_array
|
||||
return c_array<Tp>(&m_base[start], end - start);
|
||||
}
|
||||
|
||||
void pop_front()
|
||||
{
|
||||
++m_base;
|
||||
--m_size;
|
||||
}
|
||||
|
||||
void pop_back() { --m_size; }
|
||||
|
||||
operator Tp*() const { return m_base; }
|
||||
|
||||
// Iterator class for convenient range-based for loop support
|
||||
|
||||
@@ -23,6 +23,7 @@
|
||||
#pragma once
|
||||
|
||||
#include "core/common.hpp"
|
||||
#include "core/containers/c_array.hpp"
|
||||
#include "core/debug.hpp"
|
||||
#include "core/exception.hpp"
|
||||
|
||||
@@ -50,6 +51,10 @@ struct static_vector
|
||||
static_vector& operator=(static_vector&&) noexcept = default;
|
||||
|
||||
static_vector(size_t _n, Tp _v = {});
|
||||
explicit static_vector(c_array<Tp>&&);
|
||||
|
||||
template <size_t M>
|
||||
explicit static_vector(std::array<Tp, M>&&);
|
||||
|
||||
static_vector& operator=(std::initializer_list<Tp>&& _v);
|
||||
static_vector& operator=(std::pair<std::array<Tp, N>, size_t>&&);
|
||||
@@ -92,10 +97,16 @@ struct static_vector
|
||||
decltype(auto) back() { return *(m_data.begin() + size() - 1); }
|
||||
decltype(auto) back() const { return *(m_data.begin() + size() - 1); }
|
||||
|
||||
auto* data() { return m_data.data(); }
|
||||
const auto* data() const { return m_data.data(); }
|
||||
|
||||
void swap(this_type& _v);
|
||||
|
||||
friend void swap(this_type& _lhs, this_type& _rhs) { _lhs.swap(_rhs); }
|
||||
|
||||
private:
|
||||
void update_size(size_t);
|
||||
|
||||
private:
|
||||
count_type m_size = count_type{ 0 };
|
||||
std::array<Tp, N> m_data = {};
|
||||
@@ -104,8 +115,25 @@ private:
|
||||
template <typename Tp, size_t N, bool AtomicSizeV>
|
||||
static_vector<Tp, N, AtomicSizeV>::static_vector(size_t _n, Tp _v)
|
||||
{
|
||||
m_size.store(_n);
|
||||
m_data.fill(_v);
|
||||
update_size(_n);
|
||||
}
|
||||
|
||||
template <typename Tp, size_t N, bool AtomicSizeV>
|
||||
static_vector<Tp, N, AtomicSizeV>::static_vector(c_array<Tp>&& _v)
|
||||
{
|
||||
auto _n = std::min<size_t>(N, _v.size());
|
||||
for(size_t i = 0; i < _n; ++i, ++m_size)
|
||||
m_data[i] = _v[i];
|
||||
}
|
||||
|
||||
template <typename Tp, size_t N, bool AtomicSizeV>
|
||||
template <size_t M>
|
||||
static_vector<Tp, N, AtomicSizeV>::static_vector(std::array<Tp, M>&& _v)
|
||||
{
|
||||
auto _n = std::min<size_t>(N, M);
|
||||
for(size_t i = 0; i < _n; ++i, ++m_size)
|
||||
m_data[i] = _v[i];
|
||||
}
|
||||
|
||||
template <typename Tp, size_t N, bool AtomicSizeV>
|
||||
@@ -129,14 +157,9 @@ template <typename Tp, size_t N, bool AtomicSizeV>
|
||||
static_vector<Tp, N, AtomicSizeV>&
|
||||
static_vector<Tp, N, AtomicSizeV>::operator=(std::pair<std::array<Tp, N>, size_t>&& _v)
|
||||
{
|
||||
if constexpr(AtomicSizeV) m_size.store(0);
|
||||
|
||||
update_size(0);
|
||||
m_data = std::move(_v.first);
|
||||
|
||||
if constexpr(AtomicSizeV)
|
||||
m_size.store(_v.second);
|
||||
else
|
||||
m_size = _v.second;
|
||||
update_size(_v.second);
|
||||
|
||||
return *this;
|
||||
}
|
||||
@@ -145,10 +168,7 @@ template <typename Tp, size_t N, bool AtomicSizeV>
|
||||
void
|
||||
static_vector<Tp, N, AtomicSizeV>::clear()
|
||||
{
|
||||
if constexpr(AtomicSizeV)
|
||||
m_size.store(0);
|
||||
else
|
||||
m_size = 0;
|
||||
update_size(0);
|
||||
}
|
||||
|
||||
template <typename Tp, size_t N, bool AtomicSizeV>
|
||||
@@ -160,8 +180,8 @@ static_vector<Tp, N, AtomicSizeV>::swap(this_type& _v)
|
||||
auto _t_size = m_size;
|
||||
auto _v_size = _v.m_size;
|
||||
std::swap(m_data, _v.m_data);
|
||||
m_size.store(_v_size);
|
||||
_v.m_size.store(_t_size);
|
||||
update_size(_v_size);
|
||||
_v.update_size(_t_size);
|
||||
}
|
||||
else
|
||||
{
|
||||
@@ -190,5 +210,14 @@ static_vector<Tp, N, AtomicSizeV>::emplace_back(Args&&... _v)
|
||||
return m_data[_idx];
|
||||
}
|
||||
|
||||
template <typename Tp, size_t N, bool AtomicSizeV>
|
||||
void
|
||||
static_vector<Tp, N, AtomicSizeV>::update_size(size_t _n)
|
||||
{
|
||||
if constexpr(AtomicSizeV)
|
||||
m_size.store(_n);
|
||||
else
|
||||
m_size = _n;
|
||||
}
|
||||
} // namespace container
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -22,6 +22,7 @@
|
||||
|
||||
#include "debug.hpp"
|
||||
#include "binary/address_range.hpp"
|
||||
#include "locking.hpp"
|
||||
#include "state.hpp"
|
||||
|
||||
#include <timemory/log/color.hpp>
|
||||
@@ -87,7 +88,8 @@ set_source_location(source_location&& _v)
|
||||
}
|
||||
|
||||
lock::lock()
|
||||
: m_lk{ tim::type_mutex<decltype(std::cerr)>(), std::defer_lock }
|
||||
: m_lk{ tim::type_mutex<decltype(std::cerr), TIMEMORY_API, 1, locking::atomic_mutex>(),
|
||||
std::defer_lock }
|
||||
{
|
||||
if(!m_lk.owns_lock() && !_protect_lock)
|
||||
{
|
||||
|
||||
@@ -24,6 +24,7 @@
|
||||
|
||||
#include "defines.hpp"
|
||||
#include "exception.hpp"
|
||||
#include "locking.hpp"
|
||||
|
||||
#include <timemory/api.hpp>
|
||||
#include <timemory/backends/dmp.hpp>
|
||||
@@ -109,7 +110,7 @@ struct lock
|
||||
~lock();
|
||||
|
||||
private:
|
||||
tim::auto_lock_t m_lk;
|
||||
locking::atomic_lock m_lk;
|
||||
};
|
||||
//
|
||||
template <typename Arg, typename... Args>
|
||||
@@ -220,6 +221,43 @@ as_hex<void*>(void*, size_t);
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
|
||||
#define OMNITRACE_CONDITIONAL_PRINT_COLOR(COLOR, COND, ...) \
|
||||
if((COND) && ::omnitrace::config::get_debug_tid() && \
|
||||
::omnitrace::config::get_debug_pid()) \
|
||||
{ \
|
||||
::omnitrace::debug::flush(); \
|
||||
::omnitrace::debug::lock _debug_lk{}; \
|
||||
OMNITRACE_FPRINTF_STDERR_COLOR(COLOR); \
|
||||
fprintf(::omnitrace::debug::get_file(), "[omnitrace][%i][%li]%s", \
|
||||
OMNITRACE_DEBUG_PROCESS_IDENTIFIER, OMNITRACE_DEBUG_THREAD_IDENTIFIER, \
|
||||
::omnitrace::debug::is_bracket(__VA_ARGS__) ? "" : " "); \
|
||||
fprintf(::omnitrace::debug::get_file(), __VA_ARGS__); \
|
||||
::omnitrace::debug::flush(); \
|
||||
}
|
||||
|
||||
#define OMNITRACE_CONDITIONAL_PRINT_COLOR_F(COLOR, COND, ...) \
|
||||
if((COND) && ::omnitrace::config::get_debug_tid() && \
|
||||
::omnitrace::config::get_debug_pid()) \
|
||||
{ \
|
||||
::omnitrace::debug::flush(); \
|
||||
::omnitrace::debug::lock _debug_lk{}; \
|
||||
OMNITRACE_FPRINTF_STDERR_COLOR(COLOR); \
|
||||
fprintf(::omnitrace::debug::get_file(), "[omnitrace][%i][%li][%s]%s", \
|
||||
OMNITRACE_DEBUG_PROCESS_IDENTIFIER, OMNITRACE_DEBUG_THREAD_IDENTIFIER, \
|
||||
OMNITRACE_FUNCTION, \
|
||||
::omnitrace::debug::is_bracket(__VA_ARGS__) ? "" : " "); \
|
||||
fprintf(::omnitrace::debug::get_file(), __VA_ARGS__); \
|
||||
::omnitrace::debug::flush(); \
|
||||
}
|
||||
|
||||
#define OMNITRACE_PRINT_COLOR(COLOR, ...) \
|
||||
OMNITRACE_CONDITIONAL_PRINT_COLOR(COLOR, true, __VA_ARGS__)
|
||||
|
||||
#define OMNITRACE_PRINT_COLOR_F(COLOR, ...) \
|
||||
OMNITRACE_CONDITIONAL_PRINT_COLOR_F(COLOR, true, __VA_ARGS__)
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
|
||||
#define OMNITRACE_CONDITIONAL_PRINT(COND, ...) \
|
||||
if((COND) && ::omnitrace::config::get_debug_tid() && \
|
||||
::omnitrace::config::get_debug_pid()) \
|
||||
|
||||
@@ -28,6 +28,8 @@
|
||||
#include <set>
|
||||
#include <sstream>
|
||||
#include <string>
|
||||
#include <sys/wait.h>
|
||||
#include <thread>
|
||||
#include <unistd.h>
|
||||
|
||||
namespace omnitrace
|
||||
@@ -70,5 +72,123 @@ get_process_index(int _pid, int _ppid)
|
||||
}
|
||||
return -1;
|
||||
}
|
||||
|
||||
int
|
||||
wait_pid(pid_t _pid, int _opts)
|
||||
{
|
||||
int _status = 0;
|
||||
pid_t _pid_v = -1;
|
||||
_opts |= WUNTRACED;
|
||||
do
|
||||
{
|
||||
if((_opts & WNOHANG) > 0)
|
||||
{
|
||||
std::this_thread::yield();
|
||||
std::this_thread::sleep_for(std::chrono::milliseconds{ 100 });
|
||||
}
|
||||
_pid_v = waitpid(_pid, &_status, _opts);
|
||||
} while(_pid_v <= 0);
|
||||
return _status;
|
||||
}
|
||||
|
||||
int
|
||||
diagnose_status(pid_t _pid, int _status, int _verbose)
|
||||
{
|
||||
if(_verbose >= 3)
|
||||
{
|
||||
fflush(stderr);
|
||||
fflush(stdout);
|
||||
std::cout << std::flush;
|
||||
std::cerr << std::flush;
|
||||
}
|
||||
|
||||
bool _normal_exit = (WIFEXITED(_status) > 0);
|
||||
bool _unhandled_signal = (WIFSIGNALED(_status) > 0);
|
||||
bool _core_dump = (WCOREDUMP(_status) > 0);
|
||||
bool _stopped = (WIFSTOPPED(_status) > 0);
|
||||
int _exit_status = WEXITSTATUS(_status);
|
||||
int _stop_signal = (_stopped) ? WSTOPSIG(_status) : 0;
|
||||
int _ec = (_unhandled_signal) ? WTERMSIG(_status) : 0;
|
||||
|
||||
if(_verbose >= 4)
|
||||
{
|
||||
TIMEMORY_PRINTF_INFO(
|
||||
stderr,
|
||||
"diagnosing status for process %i :: status: %i... normal exit: %s, "
|
||||
"unhandled signal: %s, core dump: %s, stopped: %s, exit status: %i, stop "
|
||||
"signal: %i, exit code: %i\n",
|
||||
_pid, _status, std::to_string(_normal_exit).c_str(),
|
||||
std::to_string(_unhandled_signal).c_str(), std::to_string(_core_dump).c_str(),
|
||||
std::to_string(_stopped).c_str(), _exit_status, _stop_signal, _ec);
|
||||
}
|
||||
else if(_verbose >= 3)
|
||||
{
|
||||
TIMEMORY_PRINTF_INFO(stderr,
|
||||
"diagnosing status for process %i :: status: %i ...\n", _pid,
|
||||
_status);
|
||||
}
|
||||
|
||||
if(!_normal_exit)
|
||||
{
|
||||
if(_ec == 0) _ec = EXIT_FAILURE;
|
||||
if(_verbose >= 0)
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(
|
||||
stderr, "process %i terminated abnormally. exit code: %i\n", _pid, _ec);
|
||||
}
|
||||
}
|
||||
|
||||
if(_stopped)
|
||||
{
|
||||
if(_verbose >= 0)
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(stderr,
|
||||
"process %i stopped with signal %i. exit code: %i\n",
|
||||
_pid, _stop_signal, _ec);
|
||||
}
|
||||
}
|
||||
|
||||
if(_core_dump)
|
||||
{
|
||||
if(_verbose >= 0)
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(
|
||||
stderr, "process %i terminated and produced a core dump. exit code: %i\n",
|
||||
_pid, _ec);
|
||||
}
|
||||
}
|
||||
|
||||
if(_unhandled_signal)
|
||||
{
|
||||
if(_verbose >= 0)
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(stderr,
|
||||
"process %i terminated because it received a signal "
|
||||
"(%i) that was not handled. exit code: %i\n",
|
||||
_pid, _ec, _ec);
|
||||
}
|
||||
}
|
||||
|
||||
if(!_normal_exit && _exit_status > 0)
|
||||
{
|
||||
if(_verbose >= 0)
|
||||
{
|
||||
if(_exit_status == 127)
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(
|
||||
stderr, "execv in process %i failed. exit code: %i\n", _pid, _ec);
|
||||
}
|
||||
else
|
||||
{
|
||||
TIMEMORY_PRINTF_FATAL(
|
||||
stderr,
|
||||
"process %i terminated with a non-zero status. exit code: %i\n", _pid,
|
||||
_ec);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return _ec;
|
||||
}
|
||||
} // namespace mproc
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -35,5 +35,11 @@ get_concurrent_processes(int _ppid = getppid());
|
||||
|
||||
int
|
||||
get_process_index(int _pid = getpid(), int _ppid = getppid());
|
||||
|
||||
int
|
||||
wait_pid(pid_t _pid, int _opts = 0);
|
||||
|
||||
int
|
||||
diagnose_status(pid_t _pid, int _status, int _verbose = 0);
|
||||
} // namespace mproc
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -0,0 +1,244 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in all
|
||||
// copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
// SOFTWARE.
|
||||
|
||||
#include "perf.hpp"
|
||||
#include "debug.hpp"
|
||||
|
||||
#include <timemory/units.hpp>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace perf
|
||||
{
|
||||
namespace units = ::tim::units;
|
||||
|
||||
std::vector<std::string>
|
||||
get_config_choices()
|
||||
{
|
||||
namespace regex_const = ::std::regex_constants;
|
||||
|
||||
auto _data = std::vector<std::string>{};
|
||||
auto _papi_events = tim::papi::available_events_info();
|
||||
const auto _prefix = std::string_view{ "perf::" };
|
||||
auto _regex =
|
||||
std::regex{ "^(perf::|)PERF_COUNT_(HW|SW|HW_CACHE)_([A-Z_]+)(|:[A-Z]+)$",
|
||||
regex_const::optimize };
|
||||
|
||||
for(const auto& itr : _papi_events)
|
||||
{
|
||||
if(std::regex_match(itr.symbol(), _regex))
|
||||
{
|
||||
auto _symbol = itr.symbol();
|
||||
auto _pos = _symbol.find(_prefix);
|
||||
if(_pos == 0) _symbol = _symbol.substr(_prefix.length());
|
||||
_data.emplace_back(_symbol);
|
||||
}
|
||||
}
|
||||
|
||||
std::sort(_data.begin(), _data.end());
|
||||
_data.erase(std::unique(_data.begin(), _data.end()), _data.end());
|
||||
|
||||
return _data;
|
||||
}
|
||||
|
||||
event_type
|
||||
get_event_type(std::string_view _v)
|
||||
{
|
||||
if(_v.find("PERF_COUNT_SW_") != std::string_view::npos)
|
||||
return event_type::software;
|
||||
else if(_v.find("PERF_COUNT_SW_") != std::string_view::npos &&
|
||||
!std::regex_search(_v.data(),
|
||||
std::regex{ "PERF_COUNT_HW_CACHE_(MISSES|REFERENCES)$" }))
|
||||
return event_type::hw_cache;
|
||||
else if(_v.find("PERF_COUNT_HW_") != std::string_view::npos)
|
||||
return event_type::hardware;
|
||||
return event_type::max;
|
||||
}
|
||||
|
||||
hw_config
|
||||
get_hw_config(std::string_view _v)
|
||||
{
|
||||
#define HW_CONFIG_REGEX(KEY) std::regex_search(_v.data(), std::regex{ "(HW_" KEY ")$" })
|
||||
|
||||
if(HW_CONFIG_REGEX("CPU_CYCLES"))
|
||||
return hw_config::cpu_cycles;
|
||||
else if(HW_CONFIG_REGEX("INSTRUCTIONS"))
|
||||
return hw_config::instructions;
|
||||
else if(HW_CONFIG_REGEX("CACHE_REFERENCES"))
|
||||
return hw_config::cache_references;
|
||||
else if(HW_CONFIG_REGEX("CACHE_MISSES"))
|
||||
return hw_config::cache_misses;
|
||||
else if(HW_CONFIG_REGEX("BRANCH_INSTRUCTIONS"))
|
||||
return hw_config::branch_instructions;
|
||||
else if(HW_CONFIG_REGEX("BRANCH_MISSES"))
|
||||
return hw_config::branch_misses;
|
||||
else if(HW_CONFIG_REGEX("BUS_CYCLES"))
|
||||
return hw_config::bus_cycles;
|
||||
else if(HW_CONFIG_REGEX("STALLED_CYCLES_FRONTEND"))
|
||||
return hw_config::stalled_cycles_frontend;
|
||||
else if(HW_CONFIG_REGEX("STALLED_CYCLES_BACKEND"))
|
||||
return hw_config::stalled_cycles_backend;
|
||||
else if(HW_CONFIG_REGEX("REF_CPU_CYCLES"))
|
||||
return hw_config::reference_cpu_cycles;
|
||||
else
|
||||
{
|
||||
OMNITRACE_THROW("Unknown perf hardware config: %s", _v.data());
|
||||
}
|
||||
|
||||
#undef HW_CONFIG_REGEX
|
||||
|
||||
return hw_config::max;
|
||||
}
|
||||
|
||||
sw_config
|
||||
get_sw_config(std::string_view _v)
|
||||
{
|
||||
#define SW_CONFIG_REGEX(KEY) std::regex_search(_v.data(), std::regex{ "(SW_" KEY ")$" })
|
||||
|
||||
if(SW_CONFIG_REGEX("CPU_CLOCK"))
|
||||
return sw_config::cpu_clock;
|
||||
else if(SW_CONFIG_REGEX("TASK_CLOCK"))
|
||||
return sw_config::task_clock;
|
||||
else if(SW_CONFIG_REGEX("PAGE_FAULTS"))
|
||||
return sw_config::page_faults;
|
||||
else if(SW_CONFIG_REGEX("CONTEXT_SWITCHES"))
|
||||
return sw_config::context_switches;
|
||||
else if(SW_CONFIG_REGEX("CPU_MIGRATIONS"))
|
||||
return sw_config::cpu_migrations;
|
||||
else if(SW_CONFIG_REGEX("PAGE_FAULTS_MIN"))
|
||||
return sw_config::page_faults_minor;
|
||||
else if(SW_CONFIG_REGEX("PAGE_FAULTS_MAJ"))
|
||||
return sw_config::page_faults_major;
|
||||
else if(SW_CONFIG_REGEX("ALIGNMENT_FAULTS"))
|
||||
return sw_config::alignment_faults;
|
||||
else if(SW_CONFIG_REGEX("EMULATION_FAULTS"))
|
||||
return sw_config::emulation_faults;
|
||||
else
|
||||
{
|
||||
OMNITRACE_THROW("Unknown perf hw cache config: %s", _v.data());
|
||||
}
|
||||
|
||||
#undef SW_CONFIG_REGEX
|
||||
|
||||
return sw_config::max;
|
||||
}
|
||||
|
||||
int
|
||||
get_hw_cache_config(std::string_view _v)
|
||||
{
|
||||
int _value = 0;
|
||||
|
||||
#define HW_CACHE_CONFIG_REGEX(KEY) \
|
||||
std::regex_search(_v.data(), std::regex{ "(HW_CACHE_" KEY ")" })
|
||||
|
||||
if(HW_CACHE_CONFIG_REGEX("L1D"))
|
||||
_value |= static_cast<int>(hw_cache_config::l1d);
|
||||
else if(HW_CACHE_CONFIG_REGEX("L1I"))
|
||||
_value |= static_cast<int>(hw_cache_config::l1i);
|
||||
else if(HW_CACHE_CONFIG_REGEX("LL"))
|
||||
_value |= static_cast<int>(hw_cache_config::ll);
|
||||
else if(HW_CACHE_CONFIG_REGEX("DTLB"))
|
||||
_value |= static_cast<int>(hw_cache_config::dtlb);
|
||||
else if(HW_CACHE_CONFIG_REGEX("ITLB"))
|
||||
_value |= static_cast<int>(hw_cache_config::itlb);
|
||||
else if(HW_CACHE_CONFIG_REGEX("BPU"))
|
||||
_value |= static_cast<int>(hw_cache_config::bpu);
|
||||
else if(HW_CACHE_CONFIG_REGEX("NODE"))
|
||||
_value |= static_cast<int>(hw_cache_config::node);
|
||||
else
|
||||
OMNITRACE_THROW("Unknown perf software config: %s", _v.data());
|
||||
|
||||
#undef HW_CACHE_CONFIG_REGEX
|
||||
#define HW_CACHE_OP_REGEX(KEY) \
|
||||
std::regex_search(_v.data(), std::regex{ "(HW_CACHE_([A-Z1]+):" KEY ")" })
|
||||
|
||||
if(HW_CACHE_OP_REGEX("READ"))
|
||||
_value |= (static_cast<int>(hw_cache_op::read) << 8);
|
||||
else if(HW_CACHE_OP_REGEX("WRITE"))
|
||||
_value |= (static_cast<int>(hw_cache_op::write) << 8);
|
||||
else if(HW_CACHE_OP_REGEX("PREFETCH"))
|
||||
_value |= (static_cast<int>(hw_cache_op::prefetch) << 8);
|
||||
else
|
||||
_value |= (static_cast<int>(hw_cache_op::read) << 8);
|
||||
|
||||
#undef HW_CACHE_OP_REGEX
|
||||
#define HW_CACHE_OP_RESULT_REGEX(KEY) \
|
||||
std::regex_search(_v.data(), std::regex{ "(HW_CACHE_([A-Z1]+):" KEY ")" })
|
||||
|
||||
if(HW_CACHE_OP_RESULT_REGEX("READ"))
|
||||
_value |= (static_cast<int>(hw_cache_op_result::access) << 16);
|
||||
else if(HW_CACHE_OP_RESULT_REGEX("WRITE"))
|
||||
_value |= (static_cast<int>(hw_cache_op_result::miss) << 16);
|
||||
else
|
||||
_value |= (static_cast<int>(hw_cache_op_result::access) << 16);
|
||||
|
||||
#undef HW_CACHE_OP_RESULT_REGEX
|
||||
|
||||
return _value;
|
||||
}
|
||||
|
||||
void
|
||||
config_overflow_sampling(struct perf_event_attr& _pe, std::string_view _event,
|
||||
double _freq)
|
||||
{
|
||||
auto _period = (1.0 / _freq) * units::sec;
|
||||
|
||||
_pe.type = static_cast<int>(perf::get_event_type(_event));
|
||||
switch(_pe.type)
|
||||
{
|
||||
case PERF_TYPE_HARDWARE:
|
||||
{
|
||||
_pe.config = static_cast<int>(perf::get_hw_config(_event));
|
||||
break;
|
||||
}
|
||||
case PERF_TYPE_SOFTWARE:
|
||||
{
|
||||
_pe.config = static_cast<int>(perf::get_sw_config(_event));
|
||||
break;
|
||||
}
|
||||
case PERF_TYPE_HW_CACHE:
|
||||
{
|
||||
_pe.config = static_cast<int>(perf::get_hw_cache_config(_event));
|
||||
break;
|
||||
}
|
||||
case PERF_TYPE_BREAKPOINT:
|
||||
case PERF_TYPE_TRACEPOINT:
|
||||
case PERF_TYPE_RAW:
|
||||
case PERF_TYPE_MAX:
|
||||
default:
|
||||
{
|
||||
OMNITRACE_THROW("unsupported perf type");
|
||||
}
|
||||
};
|
||||
|
||||
if(_pe.type == PERF_TYPE_SOFTWARE &&
|
||||
(_pe.config == PERF_COUNT_SW_CPU_CLOCK || _pe.config == PERF_COUNT_SW_TASK_CLOCK))
|
||||
{
|
||||
_pe.sample_period = static_cast<uint64_t>(_period);
|
||||
}
|
||||
else
|
||||
{
|
||||
_pe.sample_period = static_cast<uint64_t>(_freq);
|
||||
}
|
||||
}
|
||||
} // namespace perf
|
||||
} // namespace omnitrace
|
||||
@@ -0,0 +1,291 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in all
|
||||
// copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
// SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "core/defines.hpp"
|
||||
|
||||
#include <timemory/backends/papi.hpp>
|
||||
|
||||
#include <cstdint>
|
||||
#include <linux/perf_event.h>
|
||||
|
||||
#if __GLIBC__ == 2 && __GLIBC_MINOR__ < 30
|
||||
# include <sys/syscall.h>
|
||||
# define gettid() syscall(SYS_gettid)
|
||||
#endif
|
||||
|
||||
// Workaround for missing hw_breakpoint.h include file:
|
||||
// This include file just defines constants used to configure watchpoint registers.
|
||||
// This will be constant across x86 systems.
|
||||
enum
|
||||
{
|
||||
HW_BREAKPOINT_X = 4
|
||||
};
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace perf
|
||||
{
|
||||
/// An enum class with all the available sampling data
|
||||
enum class sample : uint64_t
|
||||
{
|
||||
ip = PERF_SAMPLE_IP,
|
||||
pid_tid = PERF_SAMPLE_TID,
|
||||
time = PERF_SAMPLE_TIME,
|
||||
addr = PERF_SAMPLE_ADDR,
|
||||
id = PERF_SAMPLE_ID,
|
||||
stream_id = PERF_SAMPLE_STREAM_ID,
|
||||
cpu = PERF_SAMPLE_CPU,
|
||||
period = PERF_SAMPLE_PERIOD,
|
||||
|
||||
#if defined(PERF_SAMPLE_READ)
|
||||
read = PERF_SAMPLE_READ,
|
||||
#else
|
||||
read = 0,
|
||||
#endif
|
||||
|
||||
callchain = PERF_SAMPLE_CALLCHAIN,
|
||||
raw = PERF_SAMPLE_RAW,
|
||||
|
||||
#if defined(PERF_SAMPLE_BRANCH_STACK)
|
||||
branch_stack = PERF_SAMPLE_BRANCH_STACK,
|
||||
#else
|
||||
branch_stack = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_SAMPLE_REGS_USER)
|
||||
regs = PERF_SAMPLE_REGS_USER,
|
||||
#else
|
||||
regs = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_SAMPLE_STACK_USER)
|
||||
stack = PERF_SAMPLE_STACK_USER,
|
||||
#else
|
||||
stack = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_SAMPLE_WEIGHT)
|
||||
weight = PERF_SAMPLE_WEIGHT,
|
||||
#else
|
||||
weight = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_SAMPLE_DATA_SRC)
|
||||
data_src = PERF_SAMPLE_DATA_SRC,
|
||||
#else
|
||||
data_src = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_SAMPLE_IDENTIFIER)
|
||||
identifier = PERF_SAMPLE_IDENTIFIER,
|
||||
#else
|
||||
identifier = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_SAMPLE_TRANSACTION)
|
||||
transaction = PERF_SAMPLE_TRANSACTION,
|
||||
#else
|
||||
transaction = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_SAMPLE_REGS_INTR)
|
||||
regs_intr = PERF_SAMPLE_REGS_INTR,
|
||||
#else
|
||||
regs_intr = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_SAMPLE_PHYS_ADDR)
|
||||
phys_addr = PERF_SAMPLE_PHYS_ADDR,
|
||||
#else
|
||||
phys_addr = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_SAMPLE_CGROUP)
|
||||
cgroup = PERF_SAMPLE_CGROUP,
|
||||
#else
|
||||
cgroup = 0,
|
||||
#endif
|
||||
|
||||
last = PERF_SAMPLE_MAX
|
||||
};
|
||||
|
||||
enum class event_type : int
|
||||
{
|
||||
hardware = PERF_TYPE_HARDWARE,
|
||||
software = PERF_TYPE_SOFTWARE,
|
||||
tracepoint = PERF_TYPE_TRACEPOINT,
|
||||
hw_cache = PERF_TYPE_HW_CACHE,
|
||||
raw = PERF_TYPE_RAW,
|
||||
breakpoint = PERF_TYPE_BREAKPOINT,
|
||||
max = PERF_TYPE_MAX,
|
||||
};
|
||||
|
||||
enum class hw_config : int
|
||||
{
|
||||
cpu_cycles = PERF_COUNT_HW_CPU_CYCLES,
|
||||
instructions = PERF_COUNT_HW_INSTRUCTIONS,
|
||||
cache_references = PERF_COUNT_HW_CACHE_REFERENCES,
|
||||
cache_misses = PERF_COUNT_HW_CACHE_MISSES,
|
||||
branch_instructions = PERF_COUNT_HW_BRANCH_INSTRUCTIONS,
|
||||
branch_misses = PERF_COUNT_HW_BRANCH_MISSES,
|
||||
bus_cycles = PERF_COUNT_HW_BUS_CYCLES,
|
||||
stalled_cycles_frontend = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND,
|
||||
stalled_cycles_backend = PERF_COUNT_HW_STALLED_CYCLES_BACKEND,
|
||||
reference_cpu_cycles = PERF_COUNT_HW_REF_CPU_CYCLES,
|
||||
max = PERF_COUNT_HW_MAX,
|
||||
};
|
||||
|
||||
enum class sw_config : int
|
||||
{
|
||||
cpu_clock = PERF_COUNT_SW_CPU_CLOCK,
|
||||
task_clock = PERF_COUNT_SW_TASK_CLOCK,
|
||||
page_faults = PERF_COUNT_SW_PAGE_FAULTS,
|
||||
context_switches = PERF_COUNT_SW_CONTEXT_SWITCHES,
|
||||
cpu_migrations = PERF_COUNT_SW_CPU_MIGRATIONS,
|
||||
page_faults_minor = PERF_COUNT_SW_PAGE_FAULTS_MIN,
|
||||
page_faults_major = PERF_COUNT_SW_PAGE_FAULTS_MAJ,
|
||||
alignment_faults = PERF_COUNT_SW_ALIGNMENT_FAULTS,
|
||||
emulation_faults = PERF_COUNT_SW_EMULATION_FAULTS,
|
||||
max = PERF_COUNT_SW_MAX,
|
||||
};
|
||||
|
||||
enum class hw_cache_config : int
|
||||
{
|
||||
l1d = PERF_COUNT_HW_CACHE_L1D,
|
||||
l1i = PERF_COUNT_HW_CACHE_L1I,
|
||||
ll = PERF_COUNT_HW_CACHE_LL,
|
||||
dtlb = PERF_COUNT_HW_CACHE_DTLB,
|
||||
itlb = PERF_COUNT_HW_CACHE_ITLB,
|
||||
bpu = PERF_COUNT_HW_CACHE_BPU,
|
||||
node = PERF_COUNT_HW_CACHE_NODE,
|
||||
max = PERF_COUNT_HW_CACHE_MAX,
|
||||
};
|
||||
|
||||
enum class hw_cache_op : int
|
||||
{
|
||||
read = PERF_COUNT_HW_CACHE_OP_READ,
|
||||
write = PERF_COUNT_HW_CACHE_OP_WRITE,
|
||||
prefetch = PERF_COUNT_HW_CACHE_OP_PREFETCH,
|
||||
max = PERF_COUNT_HW_CACHE_OP_MAX,
|
||||
};
|
||||
|
||||
enum class hw_cache_op_result : int
|
||||
{
|
||||
access = PERF_COUNT_HW_CACHE_RESULT_ACCESS,
|
||||
miss = PERF_COUNT_HW_CACHE_RESULT_MISS,
|
||||
max = PERF_COUNT_HW_CACHE_RESULT_MAX,
|
||||
};
|
||||
|
||||
/// An enum to distinguish types of records in the mmapped ring buffer
|
||||
enum class record_type
|
||||
{
|
||||
mmap = PERF_RECORD_MMAP,
|
||||
lost = PERF_RECORD_LOST,
|
||||
comm = PERF_RECORD_COMM,
|
||||
exit = PERF_RECORD_EXIT,
|
||||
throttle = PERF_RECORD_THROTTLE,
|
||||
unthrottle = PERF_RECORD_UNTHROTTLE,
|
||||
fork = PERF_RECORD_FORK,
|
||||
read = PERF_RECORD_READ,
|
||||
sample = PERF_RECORD_SAMPLE,
|
||||
|
||||
#if defined(PERF_RECORD_MMAP2)
|
||||
mmap2 = PERF_RECORD_MMAP2,
|
||||
#else
|
||||
mmap2 = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_RECORD_AUX)
|
||||
aux = PERF_RECORD_AUX,
|
||||
#else
|
||||
aux = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_RECORD_ITRACE_START)
|
||||
itrace_start = PERF_RECORD_ITRACE_START,
|
||||
#else
|
||||
itrace_start = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_RECORD_LOST_SAMPLES)
|
||||
lost_samples = PERF_RECORD_LOST_SAMPLES,
|
||||
#else
|
||||
lost_samples = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_RECORD_SWITCH)
|
||||
switch_record = PERF_RECORD_SWITCH,
|
||||
#else
|
||||
switch_record = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_RECORD_SWITCH_CPU_WIDE)
|
||||
switch_cpu_wide = PERF_RECORD_SWITCH_CPU_WIDE,
|
||||
#else
|
||||
switch_cpu_wide = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_RECORD_NAMESPACES)
|
||||
namespaces = PERF_RECORD_NAMESPACES,
|
||||
#else
|
||||
namespaces = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_RECORD_KSYMBOL)
|
||||
ksymbol = PERF_RECORD_KSYMBOL,
|
||||
#else
|
||||
ksymbol = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_RECORD_BPF_EVENT)
|
||||
bpf_event = PERF_RECORD_BPF_EVENT,
|
||||
#else
|
||||
bpf_event = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_RECORD_CGROUP)
|
||||
cgroup = PERF_RECORD_CGROUP,
|
||||
#else
|
||||
cgroup = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_RECORD_TEXT_POKE)
|
||||
text_poke = PERF_RECORD_TEXT_POKE,
|
||||
#else
|
||||
text_poke = 0,
|
||||
#endif
|
||||
};
|
||||
|
||||
std::vector<std::string>
|
||||
get_config_choices();
|
||||
|
||||
event_type get_event_type(std::string_view);
|
||||
hw_config get_hw_config(std::string_view);
|
||||
sw_config get_sw_config(std::string_view);
|
||||
int get_hw_cache_config(std::string_view);
|
||||
|
||||
void
|
||||
config_overflow_sampling(struct perf_event_attr&, std::string_view, double);
|
||||
} // namespace perf
|
||||
} // namespace omnitrace
|
||||
@@ -57,6 +57,13 @@ enum class Mode : unsigned short
|
||||
Coverage
|
||||
};
|
||||
|
||||
enum class CausalBackend : unsigned short
|
||||
{
|
||||
Perf = 0,
|
||||
Timer,
|
||||
Auto,
|
||||
};
|
||||
|
||||
enum class CausalMode : unsigned short
|
||||
{
|
||||
Line = 0,
|
||||
|
||||
@@ -0,0 +1,123 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in all
|
||||
// copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
// SOFTWARE.
|
||||
|
||||
#include "utility.hpp"
|
||||
#include "debug.hpp"
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace utility
|
||||
{
|
||||
namespace
|
||||
{
|
||||
template <typename ContainerT, typename Arg>
|
||||
auto
|
||||
emplace_impl(ContainerT& _targ, Arg&& _v, int)
|
||||
-> decltype(_targ.emplace(std::forward<Arg>(_v)))
|
||||
{
|
||||
return _targ.emplace(std::forward<Arg>(_v));
|
||||
}
|
||||
|
||||
template <typename ContainerT, typename Arg>
|
||||
auto
|
||||
emplace_impl(ContainerT& _targ, Arg&& _v, long)
|
||||
-> decltype(_targ.emplace_back(std::forward<Arg>(_v)))
|
||||
{
|
||||
return _targ.emplace_back(std::forward<Arg>(_v));
|
||||
}
|
||||
|
||||
template <typename ContainerT, typename Arg>
|
||||
decltype(auto)
|
||||
emplace(ContainerT& _targ, Arg&& _v)
|
||||
{
|
||||
return emplace_impl(_targ, std::forward<Arg>(_v), 0);
|
||||
}
|
||||
} // namespace
|
||||
|
||||
template <typename Tp, typename ContainerT, typename Up>
|
||||
ContainerT
|
||||
parse_numeric_range(std::string _input_string, const std::string& _label, Up _incr)
|
||||
{
|
||||
auto _get_value = [](const std::string& _inp) {
|
||||
std::stringstream iss{ _inp };
|
||||
auto var = Tp{};
|
||||
iss >> var;
|
||||
return var;
|
||||
};
|
||||
|
||||
for(auto& itr : _input_string)
|
||||
itr = tolower(itr);
|
||||
auto _result = ContainerT{};
|
||||
for(auto _v : tim::delimit(_input_string, ",; \t\n\r"))
|
||||
{
|
||||
if(_v.find_first_not_of("0123456789-:") != std::string::npos)
|
||||
{
|
||||
OMNITRACE_BASIC_VERBOSE_F(
|
||||
0,
|
||||
"Invalid %s specification. Only numerical values (e.g., 0), ranges "
|
||||
"(e.g., 0-7), and ranges with increments (e.g. 20-40:10) are permitted. "
|
||||
"Ignoring %s...",
|
||||
_label.c_str(), _v.c_str());
|
||||
continue;
|
||||
}
|
||||
|
||||
auto _incr_v = _incr;
|
||||
auto _incr_pos = _v.find(':');
|
||||
if(_incr_pos != std::string::npos)
|
||||
{
|
||||
auto _incr_str = _v.substr(_incr_pos + 1);
|
||||
if(!_incr_str.empty()) _incr_v = static_cast<Up>(std::stoull(_incr_str));
|
||||
_v = _v.substr(0, _incr_pos);
|
||||
}
|
||||
|
||||
if(_v.find('-') != std::string::npos)
|
||||
{
|
||||
auto _vv = tim::delimit(_v, "-");
|
||||
OMNITRACE_CONDITIONAL_THROW(
|
||||
_vv.size() != 2,
|
||||
"Invalid %s range specification: %s. Required format N-M, e.g. 0-4",
|
||||
_label.c_str(), _v.c_str());
|
||||
Tp _vn = _get_value(_vv.at(0));
|
||||
Tp _vN = _get_value(_vv.at(1));
|
||||
do
|
||||
{
|
||||
emplace(_result, _vn);
|
||||
_vn += _incr_v;
|
||||
} while(_vn <= _vN);
|
||||
}
|
||||
else
|
||||
{
|
||||
emplace(_result, std::stoll(_v));
|
||||
}
|
||||
}
|
||||
return _result;
|
||||
}
|
||||
|
||||
template std::set<int64_t>
|
||||
parse_numeric_range<int64_t, std::set<int64_t>>(std::string, const std::string&, long);
|
||||
template std::vector<int64_t>
|
||||
parse_numeric_range<int64_t, std::vector<int64_t>>(std::string, const std::string&, long);
|
||||
template std::unordered_set<int64_t>
|
||||
parse_numeric_range<int64_t, std::unordered_set<int64_t>>(std::string, const std::string&,
|
||||
long);
|
||||
} // namespace utility
|
||||
} // namespace omnitrace
|
||||
@@ -25,6 +25,7 @@
|
||||
#include "concepts.hpp"
|
||||
|
||||
#include <timemory/mpl/concepts.hpp>
|
||||
#include <timemory/utility/delimit.hpp>
|
||||
#include <timemory/utility/join.hpp>
|
||||
|
||||
#include <algorithm>
|
||||
@@ -32,6 +33,7 @@
|
||||
#include <atomic>
|
||||
#include <cstddef>
|
||||
#include <cstdint>
|
||||
#include <set>
|
||||
#include <sstream>
|
||||
#include <stdexcept>
|
||||
#include <vector>
|
||||
@@ -238,5 +240,17 @@ convert(std::string_view _inp)
|
||||
_iss >> _ret;
|
||||
return _ret;
|
||||
}
|
||||
|
||||
template <typename Tp = int64_t, typename ContainerT = std::set<Tp>, typename Up = Tp>
|
||||
ContainerT
|
||||
parse_numeric_range(std::string _input_string, const std::string& _label, Up _incr);
|
||||
|
||||
extern template std::set<int64_t>
|
||||
parse_numeric_range<int64_t, std::set<int64_t>>(std::string, const std::string&, long);
|
||||
extern template std::vector<int64_t>
|
||||
parse_numeric_range<int64_t, std::vector<int64_t>>(std::string, const std::string&, long);
|
||||
extern template std::unordered_set<int64_t>
|
||||
parse_numeric_range<int64_t, std::unordered_set<int64_t>>(std::string, const std::string&,
|
||||
long);
|
||||
} // namespace utility
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -136,6 +136,8 @@ reset_omnitrace_preload()
|
||||
auto&& _preload_libs = get_env("LD_PRELOAD", std::string{});
|
||||
if(_preload_libs.find("libomnitrace-dl.so") != std::string::npos)
|
||||
{
|
||||
(void) get_omnitrace_is_preloaded();
|
||||
(void) get_omnitrace_preload();
|
||||
auto _modified_preload = std::string{};
|
||||
for(const auto& itr : delimit(_preload_libs, ":"))
|
||||
{
|
||||
@@ -1293,6 +1295,7 @@ verify_instrumented_preloaded()
|
||||
case dl::InstrumentMode::None:
|
||||
case dl::InstrumentMode::ProcessAttach:
|
||||
case dl::InstrumentMode::ProcessCreate:
|
||||
case dl::InstrumentMode::PythonProfile:
|
||||
{
|
||||
return;
|
||||
}
|
||||
|
||||
@@ -83,6 +83,8 @@ extern "C"
|
||||
OMNITRACE_CATEGORY_THREAD_HARDWARE_COUNTER,
|
||||
OMNITRACE_CATEGORY_KERNEL_HARDWARE_COUNTER,
|
||||
OMNITRACE_CATEGORY_NUMA,
|
||||
OMNITRACE_CATEGORY_TIMER_SAMPLING,
|
||||
OMNITRACE_CATEGORY_OVERFLOW_SAMPLING,
|
||||
OMNITRACE_CATEGORY_LAST
|
||||
// the value of below enum is used for iterating
|
||||
// over the enum in C++ templates. It MUST
|
||||
|
||||
@@ -5,6 +5,7 @@ set(library_sources
|
||||
${CMAKE_CURRENT_LIST_DIR}/critical_trace.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/kokkosp.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/ompt.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/perf.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/process_sampler.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/ptl.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/runtime.cpp
|
||||
@@ -19,6 +20,7 @@ set(library_headers
|
||||
${CMAKE_CURRENT_LIST_DIR}/critical_trace.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/ompt.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/process_sampler.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/perf.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/ptl.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/rcclp.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/rocm.hpp
|
||||
|
||||
@@ -1,21 +1,13 @@
|
||||
#
|
||||
set(causal_sources
|
||||
${CMAKE_CURRENT_LIST_DIR}/data.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/delay.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/experiment.cpp
|
||||
# ${CMAKE_CURRENT_LIST_DIR}/perf.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/sample_data.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/sampling.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/selected_entry.cpp)
|
||||
${CMAKE_CURRENT_LIST_DIR}/data.cpp ${CMAKE_CURRENT_LIST_DIR}/delay.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/experiment.cpp ${CMAKE_CURRENT_LIST_DIR}/sample_data.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/sampling.cpp ${CMAKE_CURRENT_LIST_DIR}/selected_entry.cpp)
|
||||
|
||||
set(causal_headers
|
||||
${CMAKE_CURRENT_LIST_DIR}/data.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/delay.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/experiment.hpp
|
||||
# ${CMAKE_CURRENT_LIST_DIR}/perf.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/sample_data.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/sampling.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/selected_entry.hpp)
|
||||
${CMAKE_CURRENT_LIST_DIR}/data.hpp ${CMAKE_CURRENT_LIST_DIR}/delay.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/experiment.hpp ${CMAKE_CURRENT_LIST_DIR}/sample_data.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/sampling.hpp ${CMAKE_CURRENT_LIST_DIR}/selected_entry.hpp)
|
||||
|
||||
target_sources(omnitrace-object-library PRIVATE ${causal_sources} ${causal_headers})
|
||||
|
||||
|
||||
@@ -29,6 +29,7 @@
|
||||
#include "library/causal/data.hpp"
|
||||
#include "library/causal/delay.hpp"
|
||||
#include "library/causal/experiment.hpp"
|
||||
#include "library/perf.hpp"
|
||||
#include "library/runtime.hpp"
|
||||
#include "library/thread_data.hpp"
|
||||
#include "library/thread_info.hpp"
|
||||
@@ -45,6 +46,7 @@
|
||||
|
||||
#include <atomic>
|
||||
#include <ctime>
|
||||
#include <execinfo.h>
|
||||
#include <type_traits>
|
||||
|
||||
namespace omnitrace
|
||||
@@ -57,46 +59,85 @@ namespace
|
||||
{
|
||||
using ::tim::backtrace::get_unw_signal_frame_stack_raw;
|
||||
|
||||
auto&
|
||||
get_delay_statistics()
|
||||
{
|
||||
using thread_data_t =
|
||||
thread_data<identity<tim::statistics<int64_t>>, category::sampling>;
|
||||
|
||||
static_assert(
|
||||
use_placement_new_when_generating_unique_ptr<thread_data_t>::value,
|
||||
"delay statistics thread data should use placement new to allocate unique_ptr");
|
||||
|
||||
static auto& _v = thread_data_t::instance(construct_on_init{});
|
||||
return _v;
|
||||
}
|
||||
} // namespace
|
||||
int realtime_signal = 0;
|
||||
int cputime_signal = 0;
|
||||
int overflow_signal = 0;
|
||||
|
||||
void
|
||||
backtrace::start()
|
||||
generic_global_init()
|
||||
{
|
||||
// do not delete these lines. The thread data needs to be allocated
|
||||
// before it is called in sampler or else a deadlock will occur when
|
||||
// the sample interrupts a malloc call
|
||||
(void) get_delay_statistics();
|
||||
if(realtime_signal + cputime_signal + overflow_signal == 0)
|
||||
{
|
||||
realtime_signal = get_sampling_realtime_signal();
|
||||
cputime_signal = get_sampling_cputime_signal();
|
||||
overflow_signal = get_sampling_overflow_signal();
|
||||
}
|
||||
}
|
||||
} // namespace
|
||||
|
||||
void
|
||||
overflow::global_init()
|
||||
{
|
||||
// do not delete these lines.
|
||||
generic_global_init();
|
||||
}
|
||||
|
||||
void
|
||||
backtrace::stop()
|
||||
{}
|
||||
backtrace::global_init()
|
||||
{
|
||||
// do not delete these lines.
|
||||
generic_global_init();
|
||||
}
|
||||
|
||||
void
|
||||
sample_rate::sample(int _sig)
|
||||
overflow::sample(int _sig)
|
||||
{
|
||||
if(_sig != get_realtime_signal()) return;
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
|
||||
// update the last sample for backtrace signal(s) even when in use
|
||||
static thread_local int64_t _last_sample = 0;
|
||||
static thread_local const auto& _tinfo = thread_info::get();
|
||||
auto _tid = _tinfo->index_data->sequent_value;
|
||||
auto& _perf_event = perf::get_instance(_tid);
|
||||
|
||||
auto _this_sample = tracing::now();
|
||||
auto& _period_stat = get_delay_statistics()->at(threading::get_id());
|
||||
if(_last_sample > 0) _period_stat += (_this_sample - _last_sample);
|
||||
_last_sample = _this_sample;
|
||||
if(!_perf_event) return;
|
||||
|
||||
m_index = causal::experiment::get_index();
|
||||
|
||||
_perf_event->stop();
|
||||
|
||||
for(auto itr : *_perf_event)
|
||||
{
|
||||
if(itr.is_sample())
|
||||
{
|
||||
auto _sample_ip = itr.get_ip();
|
||||
auto _data = callchain_t{};
|
||||
_data.emplace_back(_sample_ip);
|
||||
for(auto ditr : itr.get_callchain())
|
||||
{
|
||||
if(ditr != _sample_ip) _data.emplace_back(ditr);
|
||||
if(_data.size() == _data.capacity()) break;
|
||||
}
|
||||
|
||||
if(causal::experiment::is_active() && causal::experiment::is_selected(_data))
|
||||
{
|
||||
++m_selected;
|
||||
causal::experiment::add_selected();
|
||||
causal::delay::get_local() += causal::experiment::get_delay();
|
||||
}
|
||||
else if(!causal::experiment::is_active())
|
||||
{
|
||||
causal::set_current_selection(_data);
|
||||
}
|
||||
|
||||
m_stack.emplace_back(_data);
|
||||
}
|
||||
}
|
||||
|
||||
_perf_event->start();
|
||||
|
||||
if(_sig == cputime_signal) causal::delay::process();
|
||||
}
|
||||
|
||||
void
|
||||
@@ -104,11 +145,16 @@ backtrace::sample(int _sig)
|
||||
{
|
||||
constexpr size_t depth = ::omnitrace::causal::unwind_depth;
|
||||
constexpr int64_t ignore_depth = ::omnitrace::causal::unwind_offset;
|
||||
constexpr size_t select_init = std::numeric_limits<size_t>::max();
|
||||
constexpr size_t select_ival = 5; // interval at which realtime signal contributes
|
||||
|
||||
// update the last sample for backtrace signal(s) even when in use
|
||||
static thread_local size_t _protect_flag = 0;
|
||||
|
||||
// sampling_guard _guard{};
|
||||
// the select_count is initialized to max so that realtime signal does
|
||||
// not initially set the current selection
|
||||
static thread_local size_t _select_count = select_init;
|
||||
static thread_local size_t _select_zeros = 0;
|
||||
|
||||
if((_protect_flag & 1) == 1 ||
|
||||
OMNITRACE_UNLIKELY(!trait::runtime_enabled<causal::component::backtrace>::get()))
|
||||
@@ -122,33 +168,52 @@ backtrace::sample(int _sig)
|
||||
m_index = causal::experiment::get_index();
|
||||
m_stack = get_unw_signal_frame_stack_raw<depth, ignore_depth>();
|
||||
|
||||
// the batch handler timer delivers a signal according to the thread CPU
|
||||
// clock, ensuring that setting the current selection and processing the
|
||||
// delays only happens when the thread is active
|
||||
if(_sig == get_cputime_signal())
|
||||
{
|
||||
if(!causal::experiment::is_active())
|
||||
causal::set_current_selection(m_stack);
|
||||
else
|
||||
causal::delay::process();
|
||||
}
|
||||
else if(_sig == get_realtime_signal())
|
||||
{
|
||||
static thread_local auto _tid = threading::get_id();
|
||||
auto& _period_stat = get_delay_statistics()->at(_tid);
|
||||
auto _set_current_selection = [](auto _stack) {
|
||||
// save the former selection count
|
||||
auto _former_count = _select_count;
|
||||
// get the current selection count
|
||||
_select_count = causal::set_current_selection(_stack);
|
||||
// if the selection count was reduced, reset select zeros.
|
||||
// this typically means that a new experiment was started
|
||||
if(_former_count > _select_count) _select_zeros = 0;
|
||||
// if no PCs were selected, increment the select zeros.
|
||||
// if the cputime signal has not selected a PC in select_ival iterations,
|
||||
// then the realtime signal will start contributing to the current
|
||||
// selection. We generally want only the cputime signal to contribute
|
||||
// because those PCs are in-use (since the thread CPU clock in increasing)
|
||||
if(_select_count == 0) ++_select_zeros;
|
||||
};
|
||||
|
||||
// the batch handler timer delivers a signal according to the thread CPU
|
||||
// clock, ensuring that setting the current selection is preferred when the thread
|
||||
// is active and processing the delays happens only when the thread is active
|
||||
if(_sig == cputime_signal)
|
||||
{
|
||||
if(causal::experiment::is_active())
|
||||
causal::delay::process();
|
||||
else
|
||||
_set_current_selection(m_stack);
|
||||
}
|
||||
else if(_sig == realtime_signal)
|
||||
{
|
||||
if(causal::experiment::is_active() && causal::experiment::is_selected(m_stack))
|
||||
{
|
||||
m_selected = true;
|
||||
causal::experiment::add_selected();
|
||||
// compute the delay time based on the rate of taking samples,
|
||||
// unless we have taken less than 10, in which case, we just
|
||||
// use the pre-computed value.
|
||||
auto _delay =
|
||||
(_period_stat.get_count() < 10)
|
||||
? causal::experiment::get_delay()
|
||||
: (_period_stat.get_mean() * causal::experiment::get_delay_scaling());
|
||||
causal::delay::get_local() += _delay;
|
||||
causal::delay::get_local() += causal::experiment::get_delay();
|
||||
}
|
||||
else if(!causal::experiment::is_active())
|
||||
{
|
||||
// if no PCs have been selected after at least "select_ival" call-stacks via
|
||||
// the cputime signal, then contribute the call-stack via the realtime signal.
|
||||
// This can be particularly relevant in end-to-end runs targeting a particular
|
||||
// line/function since it is possible that the line/function is situated such
|
||||
// the cputime signal is never delivered when executing the particular
|
||||
// line/function... despite the line/function executing in between the
|
||||
// the cputime signals. This is rare but has been observed
|
||||
//
|
||||
if(_select_count == 0 && _select_zeros >= select_ival)
|
||||
_set_current_selection(m_stack);
|
||||
}
|
||||
}
|
||||
else
|
||||
@@ -165,36 +230,10 @@ backtrace::get_period(uint64_t _units)
|
||||
{
|
||||
using cast_type = std::conditional_t<std::is_floating_point<Tp>::value, Tp, double>;
|
||||
|
||||
double _realtime_freq =
|
||||
(get_use_sampling_realtime()) ? get_sampling_real_freq() : 0.0;
|
||||
double _cputime_freq = (get_use_sampling_cputime()) ? get_sampling_cpu_freq() : 0.0;
|
||||
|
||||
auto _freq = std::max<double>(_realtime_freq, _cputime_freq);
|
||||
double _period = 1.0 / _freq;
|
||||
double _period = 1.0 / 1000.0;
|
||||
int64_t _period_nsec = static_cast<int64_t>(_period * units::sec) % units::sec;
|
||||
return static_cast<Tp>(_period_nsec) / static_cast<cast_type>(_units);
|
||||
}
|
||||
|
||||
tim::statistics<int64_t>
|
||||
backtrace::get_period_stats()
|
||||
{
|
||||
auto _data = tim::statistics<int64_t>{};
|
||||
if(!get_delay_statistics()) return _data;
|
||||
for(auto itr : *get_delay_statistics())
|
||||
{
|
||||
if(itr.get_count() > 1) _data += itr;
|
||||
}
|
||||
return _data;
|
||||
}
|
||||
|
||||
void
|
||||
backtrace::reset_period_stats()
|
||||
{
|
||||
for(auto& itr : *get_delay_statistics())
|
||||
{
|
||||
itr.reset();
|
||||
}
|
||||
}
|
||||
} // namespace component
|
||||
} // namespace causal
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -27,7 +27,8 @@
|
||||
#include "core/defines.hpp"
|
||||
#include "core/timemory.hpp"
|
||||
#include "library/causal/data.hpp"
|
||||
#include "library/causal/sample_data.hpp"
|
||||
#include "library/causal/fwd.hpp"
|
||||
#include "library/perf.hpp"
|
||||
|
||||
#include <timemory/components/base.hpp>
|
||||
#include <timemory/macros/language.hpp>
|
||||
@@ -45,22 +46,36 @@ namespace causal
|
||||
{
|
||||
namespace component
|
||||
{
|
||||
struct sample_rate : comp::empty_base
|
||||
struct overflow : comp::empty_base
|
||||
{
|
||||
using value_type = void;
|
||||
static void sample(int = -1);
|
||||
static constexpr auto alt_stack_size = perf::perf_event::max_batch_size;
|
||||
|
||||
using value_type = void;
|
||||
using callchain_t = container::static_vector<uintptr_t, unwind_depth>;
|
||||
using alt_stack_t = container::static_vector<callchain_t, alt_stack_size>;
|
||||
|
||||
static std::string label() { return "causal::overflow"; }
|
||||
static void global_init();
|
||||
|
||||
void sample(int = -1);
|
||||
|
||||
auto get_selected() const { return m_selected; }
|
||||
auto get_index() const { return m_index; }
|
||||
const auto& get_stack() const { return m_stack; }
|
||||
|
||||
private:
|
||||
int32_t m_selected = 0;
|
||||
uint32_t m_index = 0;
|
||||
alt_stack_t m_stack = {};
|
||||
};
|
||||
|
||||
struct backtrace : comp::empty_base
|
||||
{
|
||||
using value_type = void;
|
||||
using sample_data_set_t = std::set<sample_data>;
|
||||
using value_type = void;
|
||||
using callchain_t = container::static_vector<uint64_t, unwind_depth>;
|
||||
|
||||
static std::string label() { return "causal::backtrace"; }
|
||||
static std::string description()
|
||||
{
|
||||
return "Causal profiling data collected in backtrace";
|
||||
}
|
||||
static void global_init();
|
||||
|
||||
backtrace() = default;
|
||||
~backtrace() = default;
|
||||
@@ -70,9 +85,6 @@ struct backtrace : comp::empty_base
|
||||
backtrace& operator=(const backtrace&) = default;
|
||||
backtrace& operator=(backtrace&&) noexcept = default;
|
||||
|
||||
static void start();
|
||||
static void stop();
|
||||
|
||||
void sample(int = -1);
|
||||
|
||||
auto get_selected() const { return m_selected; }
|
||||
@@ -82,9 +94,6 @@ struct backtrace : comp::empty_base
|
||||
template <typename Tp = uint64_t>
|
||||
static Tp get_period(uint64_t _units = units::nsec);
|
||||
|
||||
static tim::statistics<int64_t> get_period_stats();
|
||||
static void reset_period_stats();
|
||||
|
||||
private:
|
||||
bool m_selected = false;
|
||||
uint32_t m_index = 0;
|
||||
|
||||
@@ -39,6 +39,7 @@
|
||||
#include <cstdint>
|
||||
#include <pthread.h>
|
||||
#include <stdexcept>
|
||||
#include <type_traits>
|
||||
|
||||
#pragma weak pthread_join
|
||||
#pragma weak pthread_mutex_lock
|
||||
@@ -141,9 +142,9 @@ blocking_gotcha::shutdown()
|
||||
blocking_gotcha_t::disable();
|
||||
}
|
||||
|
||||
template <typename Ret, typename... Args>
|
||||
Ret
|
||||
blocking_gotcha::operator()(const comp::gotcha_data& _data, Ret (*_func)(Args...),
|
||||
template <size_t Idx, typename Ret, typename... Args>
|
||||
std::enable_if_t<(Idx <= blocking_gotcha::indexes::maybe_post_block_max_idx), Ret>
|
||||
blocking_gotcha::operator()(gotcha_index<Idx>, Ret (*_func)(Args...),
|
||||
Args... _args) const noexcept
|
||||
{
|
||||
int64_t _delay_value = causal::delay::get_global().load(std::memory_order_relaxed);
|
||||
@@ -154,20 +155,26 @@ blocking_gotcha::operator()(const comp::gotcha_data& _data, Ret (*_func)(Args...
|
||||
|
||||
if(get_thread_state() < ::omnitrace::ThreadState::Internal)
|
||||
{
|
||||
if(_data.index <= 5)
|
||||
causal::delay::postblock(_delay_value);
|
||||
else if(_ret == 0 && _data.index >= 6 && _data.index <= 13)
|
||||
if constexpr(Idx >= always_post_block_min_idx && Idx <= always_post_block_max_idx)
|
||||
{
|
||||
causal::delay::postblock(_delay_value);
|
||||
}
|
||||
else if constexpr(Idx >= maybe_post_block_min_idx &&
|
||||
Idx <= maybe_post_block_max_idx)
|
||||
{
|
||||
if(_ret == 0) causal::delay::postblock(_delay_value);
|
||||
}
|
||||
else
|
||||
OMNITRACE_FAIL_F("Error! unexpected index %zu ('%s')\n", _data.index,
|
||||
_data.tool_id.c_str());
|
||||
{
|
||||
static_assert(Idx > maybe_post_block_max_idx, "Error! bad overload");
|
||||
}
|
||||
}
|
||||
|
||||
return _ret;
|
||||
}
|
||||
|
||||
int
|
||||
blocking_gotcha::operator()(const comp::gotcha_data&, int (*)(const sigset_t*, int*),
|
||||
blocking_gotcha::operator()(gotcha_index<sigwait_idx>, int (*)(const sigset_t*, int*),
|
||||
const sigset_t* _set_v, int* _sig) const noexcept
|
||||
{
|
||||
auto _active = get_thread_state() < ::omnitrace::ThreadState::Internal;
|
||||
@@ -198,7 +205,7 @@ blocking_gotcha::operator()(const comp::gotcha_data&, int (*)(const sigset_t*, i
|
||||
}
|
||||
|
||||
int
|
||||
blocking_gotcha::operator()(const comp::gotcha_data&,
|
||||
blocking_gotcha::operator()(gotcha_index<sigwaitinfo_idx>,
|
||||
int (*_func)(const sigset_t*, siginfo_t*),
|
||||
const sigset_t* _set_v, siginfo_t* _info_v) const noexcept
|
||||
{
|
||||
@@ -224,7 +231,7 @@ blocking_gotcha::operator()(const comp::gotcha_data&,
|
||||
}
|
||||
|
||||
int
|
||||
blocking_gotcha::operator()(const comp::gotcha_data&,
|
||||
blocking_gotcha::operator()(gotcha_index<sigtimedwait_idx>,
|
||||
int (*_func)(const sigset_t*, siginfo_t*,
|
||||
const struct timespec*),
|
||||
const sigset_t* _set_v, siginfo_t* _info_v,
|
||||
@@ -250,6 +257,20 @@ blocking_gotcha::operator()(const comp::gotcha_data&,
|
||||
|
||||
return _ret;
|
||||
}
|
||||
|
||||
int
|
||||
blocking_gotcha::operator()(gotcha_index<sigsuspend_idx>, int (*)(const sigset_t*),
|
||||
const sigset_t* _set_v) const noexcept
|
||||
{
|
||||
auto _old_set = sigset_t{};
|
||||
int _sig = 0;
|
||||
::sigprocmask(SIG_SETMASK, _set_v, &_old_set);
|
||||
// sigwait is wrapped so no need to block/unblock signals
|
||||
auto _ret = ::sigwait(_set_v, &_sig);
|
||||
::sigprocmask(SIG_SETMASK, &_old_set, nullptr);
|
||||
|
||||
return _ret;
|
||||
}
|
||||
} // namespace component
|
||||
} // namespace causal
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -44,6 +44,22 @@ struct blocking_gotcha : comp::base<blocking_gotcha, void>
|
||||
{
|
||||
static constexpr size_t gotcha_capacity = 19;
|
||||
|
||||
template <size_t Idx>
|
||||
using gotcha_index = std::integral_constant<size_t, Idx>;
|
||||
|
||||
enum indexes
|
||||
{
|
||||
always_post_block_min_idx = 0,
|
||||
always_post_block_max_idx = 5,
|
||||
maybe_post_block_min_idx = 6,
|
||||
maybe_post_block_max_idx = 13,
|
||||
sigwait_idx = 14,
|
||||
sigwaitinfo_idx = 15,
|
||||
sigtimedwait_idx = 16,
|
||||
sigsuspend_idx = 17,
|
||||
indexes_max = gotcha_capacity - 1,
|
||||
};
|
||||
|
||||
OMNITRACE_DEFAULT_OBJECT(blocking_gotcha)
|
||||
|
||||
// string id for component
|
||||
@@ -55,18 +71,22 @@ struct blocking_gotcha : comp::base<blocking_gotcha, void>
|
||||
static void configure();
|
||||
static void shutdown();
|
||||
|
||||
template <typename Ret, typename... Args>
|
||||
Ret operator()(const comp::gotcha_data&, Ret (*)(Args...), Args...) const noexcept;
|
||||
template <size_t Idx, typename Ret, typename... Args>
|
||||
std::enable_if_t<(Idx <= maybe_post_block_max_idx), Ret> operator()(
|
||||
gotcha_index<Idx>, Ret (*)(Args...), Args...) const noexcept;
|
||||
|
||||
int operator()(const comp::gotcha_data&, int (*)(const sigset_t*, int*),
|
||||
int operator()(gotcha_index<sigwait_idx>, int (*)(const sigset_t*, int*),
|
||||
const sigset_t*, int*) const noexcept;
|
||||
|
||||
int operator()(const comp::gotcha_data&, int (*)(const sigset_t*, siginfo_t*),
|
||||
int operator()(gotcha_index<sigwaitinfo_idx>, int (*)(const sigset_t*, siginfo_t*),
|
||||
const sigset_t*, siginfo_t*) const noexcept;
|
||||
|
||||
int operator()(const comp::gotcha_data&,
|
||||
int operator()(gotcha_index<sigtimedwait_idx>,
|
||||
int (*)(const sigset_t*, siginfo_t*, const struct timespec*),
|
||||
const sigset_t*, siginfo_t*, const struct timespec*) const noexcept;
|
||||
|
||||
int operator()(gotcha_index<sigsuspend_idx>, int (*)(const sigset_t*),
|
||||
const sigset_t*) const noexcept;
|
||||
};
|
||||
|
||||
using blocking_gotcha_t =
|
||||
|
||||
@@ -41,8 +41,6 @@ namespace component
|
||||
{
|
||||
namespace
|
||||
{
|
||||
namespace signals = ::tim::signals;
|
||||
|
||||
using bundle_t = tim::lightweight_tuple<blocking_gotcha_t, unblocking_gotcha_t>;
|
||||
|
||||
auto&
|
||||
@@ -101,18 +99,6 @@ causal_gotcha::stop()
|
||||
shutdown();
|
||||
}
|
||||
|
||||
void
|
||||
causal_gotcha::block_signals()
|
||||
{
|
||||
signals::block_signals(sampling_signals(), signals::sigmask_scope::thread);
|
||||
}
|
||||
|
||||
void
|
||||
causal_gotcha::unblock_signals()
|
||||
{
|
||||
signals::unblock_signals(sampling_signals(), signals::sigmask_scope::thread);
|
||||
}
|
||||
|
||||
void
|
||||
causal_gotcha::remove_signals(sigset_t* _set)
|
||||
{
|
||||
|
||||
@@ -49,8 +49,6 @@ struct causal_gotcha : tim::component::base<causal_gotcha, void>
|
||||
static void start();
|
||||
static void stop();
|
||||
|
||||
static void block_signals();
|
||||
static void unblock_signals();
|
||||
static void remove_signals(sigset_t*);
|
||||
};
|
||||
} // namespace component
|
||||
|
||||
@@ -97,34 +97,35 @@ unblocking_gotcha::shutdown()
|
||||
unblocking_gotcha_t::disable();
|
||||
}
|
||||
|
||||
template <typename Ret, typename... Args>
|
||||
Ret
|
||||
unblocking_gotcha::operator()(const comp::gotcha_data& _data, Ret (*_func)(Args...),
|
||||
template <size_t Idx, typename Ret, typename... Args>
|
||||
std::enable_if_t<(Idx < unblocking_gotcha::indexes::kill_idx), Ret>
|
||||
unblocking_gotcha::operator()(gotcha_index<Idx>, Ret (*_func)(Args...),
|
||||
Args... _args) const noexcept
|
||||
{
|
||||
auto _active = get_thread_state() < ::omnitrace::ThreadState::Internal;
|
||||
|
||||
if(_active) causal::delay::process();
|
||||
|
||||
if(_active && _data.index == 7)
|
||||
if(_active)
|
||||
{
|
||||
int64_t _delay_value = (_active) ? causal::delay::get_global().load() : 0;
|
||||
causal::delay::process();
|
||||
|
||||
causal::sampling::block_backtrace_samples();
|
||||
auto _ret = (*_func)(_args...);
|
||||
causal::sampling::unblock_backtrace_samples();
|
||||
if constexpr(Idx == pthread_barrier_wait_idx)
|
||||
{
|
||||
int64_t _delay_value = (_active) ? causal::delay::get_global().load() : 0;
|
||||
|
||||
causal::delay::postblock(_delay_value);
|
||||
return _ret;
|
||||
}
|
||||
else
|
||||
{
|
||||
return (*_func)(_args...);
|
||||
causal::sampling::block_backtrace_samples();
|
||||
auto _ret = (*_func)(_args...);
|
||||
causal::sampling::unblock_backtrace_samples();
|
||||
|
||||
causal::delay::postblock(_delay_value);
|
||||
return _ret;
|
||||
}
|
||||
}
|
||||
|
||||
return (*_func)(_args...);
|
||||
}
|
||||
|
||||
int
|
||||
unblocking_gotcha::operator()(const comp::gotcha_data&, int (*_func)(pid_t, int),
|
||||
unblocking_gotcha::operator()(gotcha_index<kill_idx>, int (*_func)(pid_t, int),
|
||||
pid_t _pid, int _sig) const noexcept
|
||||
{
|
||||
auto _active = get_thread_state() < ::omnitrace::ThreadState::Internal;
|
||||
|
||||
@@ -43,6 +43,16 @@ struct unblocking_gotcha : comp::base<unblocking_gotcha, void>
|
||||
{
|
||||
static constexpr size_t gotcha_capacity = 9;
|
||||
|
||||
enum indexes
|
||||
{
|
||||
pthread_barrier_wait_idx = 7,
|
||||
kill_idx = 8,
|
||||
indexes_max = gotcha_capacity,
|
||||
};
|
||||
|
||||
template <size_t Idx>
|
||||
using gotcha_index = std::integral_constant<size_t, Idx>;
|
||||
|
||||
OMNITRACE_DEFAULT_OBJECT(unblocking_gotcha)
|
||||
|
||||
// string id for component
|
||||
@@ -54,10 +64,12 @@ struct unblocking_gotcha : comp::base<unblocking_gotcha, void>
|
||||
static void configure();
|
||||
static void shutdown();
|
||||
|
||||
template <typename Ret, typename... Args>
|
||||
Ret operator()(const comp::gotcha_data&, Ret (*)(Args...), Args...) const noexcept;
|
||||
template <size_t Idx, typename Ret, typename... Args>
|
||||
std::enable_if_t<(Idx < kill_idx), Ret> operator()(gotcha_index<Idx>,
|
||||
Ret (*)(Args...),
|
||||
Args...) const noexcept;
|
||||
|
||||
int operator()(const comp::gotcha_data&, int (*)(pid_t, int), pid_t,
|
||||
int operator()(gotcha_index<kill_idx>, int (*)(pid_t, int), pid_t,
|
||||
int) const noexcept;
|
||||
};
|
||||
|
||||
|
||||
@@ -28,11 +28,14 @@
|
||||
#include "binary/scope_filter.hpp"
|
||||
#include "core/binary/fwd.hpp"
|
||||
#include "core/config.hpp"
|
||||
#include "core/containers/c_array.hpp"
|
||||
#include "core/debug.hpp"
|
||||
#include "core/state.hpp"
|
||||
#include "core/utility.hpp"
|
||||
#include "library/causal/delay.hpp"
|
||||
#include "library/causal/experiment.hpp"
|
||||
#include "library/causal/fwd.hpp"
|
||||
#include "library/causal/sample_data.hpp"
|
||||
#include "library/causal/sampling.hpp"
|
||||
#include "library/causal/selected_entry.hpp"
|
||||
#include "library/ptl.hpp"
|
||||
@@ -48,6 +51,7 @@
|
||||
#include <timemory/unwind/dlinfo.hpp>
|
||||
#include <timemory/unwind/processed_entry.hpp>
|
||||
#include <timemory/utility/procfs/maps.hpp>
|
||||
#include <timemory/utility/types.hpp>
|
||||
|
||||
#include <algorithm>
|
||||
#include <atomic>
|
||||
@@ -117,7 +121,7 @@ get_eligible_address_ranges()
|
||||
using sf = binary::scope_filter;
|
||||
|
||||
auto
|
||||
get_filters(std::set<binary::scope_filter::filter_scope> _scopes = {
|
||||
get_filters(const std::set<binary::scope_filter::filter_scope>& _scopes = {
|
||||
sf::BINARY_FILTER, sf::SOURCE_FILTER, sf::FUNCTION_FILTER })
|
||||
{
|
||||
auto _filters = std::vector<binary::scope_filter>{};
|
||||
@@ -276,29 +280,15 @@ auto
|
||||
compute_eligible_lines_impl()
|
||||
{
|
||||
const auto& _binary_info = get_cached_binary_info().first;
|
||||
auto& _filter_info = get_cached_binary_info().second;
|
||||
auto& _scoped_info = get_cached_binary_info().second;
|
||||
auto _filters = get_filters();
|
||||
|
||||
auto& _eligible_ar = get_eligible_address_ranges();
|
||||
for(const auto& litr : _binary_info)
|
||||
{
|
||||
for(const auto& ditr : litr.mappings)
|
||||
{
|
||||
_eligible_ar +=
|
||||
std::make_pair(binary::address_multirange::coarse{},
|
||||
address_range_t{ ditr.load_address, ditr.last_address });
|
||||
}
|
||||
|
||||
for(const auto& ditr : litr.symbols)
|
||||
{
|
||||
_eligible_ar += ditr.address + ditr.load_address;
|
||||
}
|
||||
|
||||
auto& _filtered = _filter_info.emplace_back();
|
||||
_filtered.bfd = litr.bfd;
|
||||
_filtered.mappings = litr.mappings;
|
||||
_filtered.ranges = litr.ranges;
|
||||
_filtered.sections = litr.sections;
|
||||
auto& _scoped = _scoped_info.emplace_back();
|
||||
_scoped.bfd = litr.bfd;
|
||||
_scoped.mappings = litr.mappings;
|
||||
_scoped.sections = litr.sections;
|
||||
|
||||
for(const auto& ditr : litr.symbols)
|
||||
{
|
||||
@@ -312,7 +302,8 @@ compute_eligible_lines_impl()
|
||||
|
||||
if(ditr(_filters) || (_sym.inlines.size() + _sym.dwarf_info.size()) > 0)
|
||||
{
|
||||
_filtered.symbols.emplace_back(_sym);
|
||||
_scoped.ranges.emplace_back(_sym.ipaddr());
|
||||
_scoped.symbols.emplace_back(_sym);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -322,17 +313,31 @@ compute_eligible_lines_impl()
|
||||
sf::satisfies_filter(_filters, sf::SOURCE_FILTER,
|
||||
join(':', ditr.file, ditr.line)))
|
||||
{
|
||||
_filtered.debug_info.emplace_back(ditr);
|
||||
_scoped.debug_info.emplace_back(ditr);
|
||||
}
|
||||
}
|
||||
|
||||
_filtered.sort();
|
||||
_scoped.sort();
|
||||
}
|
||||
|
||||
auto& _eligible_ar = get_eligible_address_ranges();
|
||||
for(const auto& litr : _scoped_info)
|
||||
{
|
||||
for(const auto& ditr : litr.symbols)
|
||||
{
|
||||
_eligible_ar += ditr.ipaddr();
|
||||
}
|
||||
|
||||
for(auto ditr : litr.ranges)
|
||||
{
|
||||
_eligible_ar += ditr;
|
||||
}
|
||||
}
|
||||
|
||||
OMNITRACE_VERBOSE(
|
||||
0, "[causal] eligible address ranges: %zu, coarse address range: %zu [%s]\n",
|
||||
_eligible_ar.size(), _eligible_ar.range_size(),
|
||||
_eligible_ar.coarse_range.as_string().c_str());
|
||||
_eligible_ar.get_coarse_range().as_string().c_str());
|
||||
|
||||
if(_eligible_ar.empty())
|
||||
{
|
||||
@@ -369,9 +374,10 @@ save_maps_info_impl(std::ostream& _ofs)
|
||||
|
||||
void
|
||||
save_line_info_impl(std::ostream& _ofs,
|
||||
const std::vector<binary::binary_info>& _binary_data)
|
||||
const std::vector<binary::binary_info>& _binary_data,
|
||||
const std::array<bool, 3>& _info = { true, true, true })
|
||||
{
|
||||
auto _write_impl = [&_ofs](const binary::binary_info& _data) {
|
||||
auto _write_impl = [&_ofs, &_info](const binary::binary_info& _data) {
|
||||
for(const auto& itr : _data.mappings)
|
||||
{
|
||||
_ofs << itr.pathname << " [" << as_hex(itr.load_address) << " - "
|
||||
@@ -389,28 +395,38 @@ save_line_info_impl(std::ostream& _ofs,
|
||||
if(!itr.func.empty()) _ofs << " [" << tim::demangle(itr.func) << "]";
|
||||
_ofs << "\n";
|
||||
|
||||
for(const auto& ditr : itr.inlines)
|
||||
if(std::get<0>(_info))
|
||||
{
|
||||
_ofs << " " << ditr.file << ":" << ditr.line;
|
||||
if(!ditr.func.empty()) _ofs << " [" << tim::demangle(ditr.func) << "]";
|
||||
_ofs << "\n";
|
||||
for(const auto& ditr : itr.inlines)
|
||||
{
|
||||
_ofs << " " << ditr.file << ":" << ditr.line;
|
||||
if(!ditr.func.empty())
|
||||
_ofs << " [" << tim::demangle(ditr.func) << "]";
|
||||
_ofs << "\n";
|
||||
}
|
||||
}
|
||||
|
||||
for(const auto& ditr : itr.dwarf_info)
|
||||
if(std::get<1>(_info))
|
||||
{
|
||||
_ofs << " " << as_hex(ditr.address) << " :: " << ditr.file << ":"
|
||||
<< ditr.line;
|
||||
_ofs << "\n";
|
||||
_emitted_dwarf_addresses.emplace(ditr.address.low);
|
||||
for(const auto& ditr : itr.dwarf_info)
|
||||
{
|
||||
_ofs << " " << as_hex(ditr.address) << " :: " << ditr.file
|
||||
<< ":" << ditr.line;
|
||||
_ofs << "\n";
|
||||
_emitted_dwarf_addresses.emplace(ditr.address.low);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
for(const auto& itr : _data.debug_info)
|
||||
if(std::get<2>(_info))
|
||||
{
|
||||
if(_emitted_dwarf_addresses.count(itr.address.low) > 0) continue;
|
||||
_ofs << " " << as_hex(itr.address) << " :: " << itr.file << ":"
|
||||
<< itr.line;
|
||||
_ofs << "\n";
|
||||
for(const auto& itr : _data.debug_info)
|
||||
{
|
||||
if(_emitted_dwarf_addresses.count(itr.address.low) > 0) continue;
|
||||
_ofs << " " << as_hex(itr.address) << " :: " << itr.file << ":"
|
||||
<< itr.line;
|
||||
_ofs << "\n";
|
||||
}
|
||||
}
|
||||
|
||||
_ofs << "\n" << std::flush;
|
||||
@@ -433,6 +449,10 @@ compute_eligible_lines()
|
||||
});
|
||||
}
|
||||
|
||||
auto eligible_pc_history = std::map<uintptr_t, size_t>{};
|
||||
auto eligible_pc_idx = std::atomic<size_t>{ 0 };
|
||||
auto eligible_pc_candidates = std::atomic<size_t>{ 0 };
|
||||
|
||||
void
|
||||
perform_experiment_impl(std::shared_ptr<std::promise<void>> _started) // NOLINT
|
||||
{
|
||||
@@ -455,19 +475,18 @@ perform_experiment_impl(std::shared_ptr<std::promise<void>> _started) // NOLINT
|
||||
// notify that thread has started
|
||||
if(_started) _started->set_value();
|
||||
|
||||
// pause at least one second to determine sampling rate
|
||||
// std::this_thread::sleep_for(std::chrono::seconds{ 1 });
|
||||
|
||||
if(!config::get_causal_end_to_end())
|
||||
{
|
||||
// wait for at least one progress point to start
|
||||
while(num_progress_points.load(std::memory_order_relaxed) == 0)
|
||||
{
|
||||
std::this_thread::yield();
|
||||
std::this_thread::sleep_for(std::chrono::milliseconds{ 1 });
|
||||
}
|
||||
}
|
||||
|
||||
// allow ~10 samples to be collected
|
||||
std::this_thread::yield();
|
||||
std::this_thread::sleep_for(std::chrono::milliseconds{ 10 });
|
||||
|
||||
double _delay_sec =
|
||||
@@ -481,6 +500,7 @@ perform_experiment_impl(std::shared_ptr<std::promise<void>> _started) // NOLINT
|
||||
OMNITRACE_VERBOSE(1, "[causal] delaying experimentation for %.2f seconds...\n",
|
||||
_delay_sec);
|
||||
uint64_t _delay_nsec = _delay_sec * units::sec;
|
||||
std::this_thread::yield();
|
||||
std::this_thread::sleep_for(std::chrono::nanoseconds{ _delay_nsec });
|
||||
}
|
||||
|
||||
@@ -515,58 +535,128 @@ perform_experiment_impl(std::shared_ptr<std::promise<void>> _started) // NOLINT
|
||||
{
|
||||
if(get_state() == State::Finalized)
|
||||
{
|
||||
auto _memory = std::stringstream{};
|
||||
auto _binary = std::stringstream{};
|
||||
auto _scoped = std::stringstream{};
|
||||
auto _sample = std::stringstream{};
|
||||
save_maps_info_impl(_memory);
|
||||
save_line_info_impl(_binary, get_cached_binary_info().first);
|
||||
save_line_info_impl(_scoped, get_cached_binary_info().second);
|
||||
if(_impl_no > 0) return;
|
||||
|
||||
auto _samples = std::map<uintptr_t, size_t>{};
|
||||
OMNITRACE_VERBOSE(
|
||||
0,
|
||||
"[causal] experiment failed to start. Number of PC candidates: %zu\n",
|
||||
eligible_pc_candidates.load());
|
||||
|
||||
auto _memory = std::stringstream{};
|
||||
auto _binary = std::stringstream{};
|
||||
auto _scoped = std::stringstream{};
|
||||
auto _sample = std::stringstream{};
|
||||
auto _eligible = std::stringstream{};
|
||||
save_maps_info_impl(_memory);
|
||||
save_line_info_impl(_binary, get_cached_binary_info().first,
|
||||
{ true, true, false });
|
||||
save_line_info_impl(_scoped, get_cached_binary_info().second,
|
||||
{ true, true, false });
|
||||
|
||||
auto _samples_map = std::map<uintptr_t, size_t>{};
|
||||
for(const auto& itr : get_samples())
|
||||
{
|
||||
for(const auto& iitr : itr.second)
|
||||
{
|
||||
_samples[iitr.address] += iitr.count;
|
||||
_samples_map[iitr.address] += iitr.count;
|
||||
}
|
||||
}
|
||||
|
||||
auto _eligible_pc_hist = std::vector<std::pair<uintptr_t, size_t>>{};
|
||||
for(const auto& itr : eligible_pc_history)
|
||||
{
|
||||
_eligible_pc_hist.emplace_back(std::make_pair(itr.first, itr.second));
|
||||
}
|
||||
|
||||
std::sort(
|
||||
_eligible_pc_hist.begin(), _eligible_pc_hist.end(),
|
||||
[](auto&& _lhs, auto&& _rhs) { return _lhs.second > _rhs.second; });
|
||||
|
||||
for(const auto& itr : _eligible_pc_hist)
|
||||
{
|
||||
_eligible << " " << std::setw(8) << itr.second
|
||||
<< " :: " << as_hex(itr.first) << "\n";
|
||||
}
|
||||
|
||||
auto _samples = std::vector<std::pair<uintptr_t, size_t>>{};
|
||||
for(const auto& itr : _samples_map)
|
||||
_samples.emplace_back(std::make_pair(itr.first, itr.second));
|
||||
|
||||
// sort by most samples
|
||||
std::sort(_samples.begin(), _samples.end(),
|
||||
[](const auto& _lhs, const auto& _rhs) {
|
||||
return _lhs.second > _rhs.second;
|
||||
});
|
||||
|
||||
for(const auto& itr : _samples)
|
||||
{
|
||||
if(itr.second > 0)
|
||||
{
|
||||
auto _linfo = get_line_info(itr.first, true);
|
||||
// if(_linfo.size() > 1) _linfo.pop_front();
|
||||
for(const auto& iitr : _linfo)
|
||||
auto _is_eligible = is_eligible_address(itr.first) &&
|
||||
!get_line_info(itr.first, false).empty();
|
||||
auto _linfo = binary::lookup_ipaddr_entry<true>(itr.first);
|
||||
if(_linfo)
|
||||
{
|
||||
_sample << " " << std::setw(8) << itr.second
|
||||
<< " :: " << as_hex(itr.first) << " [" << iitr.file
|
||||
<< ":" << iitr.line << "][" << demangle(iitr.func)
|
||||
<< "]\n";
|
||||
}
|
||||
|
||||
if(_linfo.empty())
|
||||
{
|
||||
_sample << " " << std::setw(8) << itr.second
|
||||
<< " :: " << as_hex(itr.first) << "\n";
|
||||
<< " :: " << std::setw(5) << std::boolalpha
|
||||
<< _is_eligible << " :: " << as_hex(itr.first) << " "
|
||||
<< _linfo->location << ":" << _linfo->lineno << " ["
|
||||
<< demangle(_linfo->name) << "]\n";
|
||||
for(const auto& iitr : _linfo->lineinfo.lines)
|
||||
{
|
||||
_sample << " " << std::setw(8) << itr.second
|
||||
<< " :: " << std::setw(5) << std::boolalpha
|
||||
<< _is_eligible << " :: " << as_hex(itr.first)
|
||||
<< " " << iitr.location << ":" << iitr.line
|
||||
<< " [" << demangle(iitr.name) << "]\n";
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
OMNITRACE_PRINT_COLOR(fatal, "causal experiment never started\n");
|
||||
|
||||
std::cerr << std::flush;
|
||||
auto _cerr = tim::log::warning_stream(std::cerr);
|
||||
_cerr << "\nmaps:\n\n" << _memory.str() << "\n";
|
||||
_cerr << "\nbinary:\n\n" << _binary.str() << "\n";
|
||||
_cerr << "\nscoped:\n\n" << _scoped.str() << "\n";
|
||||
_cerr << "\nsample:\n\n" << _sample.str() << "\n";
|
||||
_cerr << "\npc samples:\n\n" << _sample.str() << "\n";
|
||||
_cerr << "\neligible pcs:\n\n" << _eligible.str() << "\n";
|
||||
_cerr << "\nscoped pcs:\n\n" << _scoped.str() << "\n";
|
||||
if(get_verbose() >= 1)
|
||||
{
|
||||
_cerr << "\nbinary pcs:\n\n" << _binary.str() << "\n";
|
||||
_cerr << "\nmaps:\n\n" << _memory.str() << "\n";
|
||||
}
|
||||
std::cerr << std::flush;
|
||||
|
||||
OMNITRACE_CONDITIONAL_THROW(_impl_no == 0, "experiment never started");
|
||||
// if launched via omnitrace-causal, allow end-to-end runs that do not
|
||||
// start experiments
|
||||
auto _omni_causal_launcher =
|
||||
get_env<std::string>("OMNITRACE_LAUNCHER", "", false) ==
|
||||
"omnitrace-causal";
|
||||
|
||||
if(!(get_causal_end_to_end() && _omni_causal_launcher))
|
||||
{
|
||||
OMNITRACE_CONDITIONAL_THROW(_impl_no == 0,
|
||||
"causal experiment never started");
|
||||
}
|
||||
|
||||
return;
|
||||
}
|
||||
else
|
||||
{
|
||||
OMNITRACE_VERBOSE(
|
||||
1,
|
||||
"[causal] experiment failed to start. Number of PC candidates: %zu\n",
|
||||
eligible_pc_candidates.load());
|
||||
}
|
||||
}
|
||||
|
||||
OMNITRACE_VERBOSE(3,
|
||||
"[causal] experiment started. Number of PC candidates: %zu\n",
|
||||
eligible_pc_candidates.load());
|
||||
|
||||
reset_sample_selection();
|
||||
|
||||
// wait for the experiment to complete
|
||||
if(config::get_causal_end_to_end())
|
||||
{
|
||||
@@ -592,15 +682,14 @@ perform_experiment_impl(std::shared_ptr<std::promise<void>> _started) // NOLINT
|
||||
}
|
||||
}
|
||||
|
||||
// thread-safe read/write ring-buffer via atomics
|
||||
using pc_ring_buffer_t = tim::data_storage::atomic_ring_buffer<uintptr_t>;
|
||||
// latest_eligible_pcs is an array of unwind_depth size -> samples will
|
||||
// use lowest indexes for most recent functions address in the call-stack
|
||||
auto latest_eligible_pc = []() {
|
||||
auto _arr = std::array<std::unique_ptr<pc_ring_buffer_t>, unwind_depth>{};
|
||||
using atomic_uintptr_t = std::atomic<uintptr_t>;
|
||||
constexpr size_t array_size = unwind_depth;
|
||||
auto _arr = std::array<std::unique_ptr<atomic_uintptr_t>, array_size>{};
|
||||
for(auto& itr : _arr)
|
||||
itr = std::make_unique<pc_ring_buffer_t>(units::get_page_size() /
|
||||
(sizeof(uintptr_t) + 1));
|
||||
itr = std::make_unique<std::atomic<uintptr_t>>(0);
|
||||
return _arr;
|
||||
}();
|
||||
} // namespace
|
||||
@@ -610,20 +699,21 @@ auto latest_eligible_pc = []() {
|
||||
bool
|
||||
is_eligible_address(uintptr_t _v)
|
||||
{
|
||||
return get_eligible_address_ranges().coarse_range.contains(_v);
|
||||
return get_eligible_address_ranges().contains(_v);
|
||||
}
|
||||
|
||||
void
|
||||
save_line_info(const settings::compose_filename_config& _cfg, int _verbose)
|
||||
{
|
||||
auto _write = [_verbose](const std::string& ofname, const auto& _data) {
|
||||
auto _write = [_verbose](const std::string& ofname, const auto& _data,
|
||||
const std::array<bool, 3>& _info) {
|
||||
auto _ofs = std::ofstream{};
|
||||
if(tim::filepath::open(_ofs, ofname))
|
||||
{
|
||||
if(_verbose >= 0)
|
||||
operation::file_output_message<binary::symbol>{}(
|
||||
ofname, std::string{ "causal_symbol_info" });
|
||||
save_line_info_impl(_ofs, _data);
|
||||
save_line_info_impl(_ofs, _data, _info);
|
||||
save_maps_info_impl(_ofs);
|
||||
}
|
||||
else
|
||||
@@ -634,27 +724,54 @@ save_line_info(const settings::compose_filename_config& _cfg, int _verbose)
|
||||
|
||||
_write(tim::settings::compose_output_filename(
|
||||
join('-', config::get_causal_output_filename(), "binary"), "txt", _cfg),
|
||||
get_cached_binary_info().first);
|
||||
get_cached_binary_info().first, { true, true, true });
|
||||
_write(tim::settings::compose_output_filename(
|
||||
join('-', config::get_causal_output_filename(), "scoped"), "txt", _cfg),
|
||||
get_cached_binary_info().second);
|
||||
get_cached_binary_info().second, { true, true, false });
|
||||
}
|
||||
|
||||
size_t
|
||||
set_current_selection(unwind_addr_t _stack)
|
||||
{
|
||||
for(auto itr : _stack)
|
||||
{
|
||||
if(itr == 0) continue;
|
||||
++eligible_pc_candidates;
|
||||
if(is_eligible_address(itr))
|
||||
{
|
||||
auto _idx = eligible_pc_idx++ % latest_eligible_pc.size();
|
||||
latest_eligible_pc.at(_idx)->store(itr);
|
||||
}
|
||||
}
|
||||
|
||||
return eligible_pc_idx.load(std::memory_order_relaxed);
|
||||
}
|
||||
|
||||
size_t
|
||||
set_current_selection(container::c_array<uint64_t> _stack)
|
||||
{
|
||||
for(auto itr : _stack)
|
||||
{
|
||||
if(itr == 0) continue;
|
||||
++eligible_pc_candidates;
|
||||
if(is_eligible_address(itr))
|
||||
{
|
||||
auto _idx = eligible_pc_idx++ % latest_eligible_pc.size();
|
||||
latest_eligible_pc.at(_idx)->store(itr);
|
||||
}
|
||||
}
|
||||
|
||||
return eligible_pc_idx.load(std::memory_order_relaxed);
|
||||
}
|
||||
|
||||
void
|
||||
set_current_selection(unwind_addr_t _stack)
|
||||
reset_sample_selection()
|
||||
{
|
||||
if(experiment::is_active()) return;
|
||||
|
||||
size_t _n = 0;
|
||||
for(auto itr : _stack)
|
||||
eligible_pc_idx.store(0);
|
||||
eligible_pc_candidates.store(0);
|
||||
for(auto& itr : latest_eligible_pc)
|
||||
{
|
||||
auto& _pcs = latest_eligible_pc.at(_n);
|
||||
if(_pcs && is_eligible_address(itr))
|
||||
{
|
||||
_pcs->write(&itr);
|
||||
// increment after valid found -> first valid pc for call-stack
|
||||
++_n;
|
||||
}
|
||||
if(itr) itr->store(0);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -663,8 +780,6 @@ sample_selection(size_t _nitr, size_t _wait_ns)
|
||||
{
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
|
||||
size_t _n = 0;
|
||||
|
||||
auto _select_address = [&](auto& _address_vec) {
|
||||
// this isn't necessary bc of check before calling this lambda but
|
||||
// kept because of size() - 1 in distribution range
|
||||
@@ -688,6 +803,8 @@ sample_selection(size_t _nitr, size_t _wait_ns)
|
||||
|
||||
_address_vec.erase(_address_vec.begin() + _idx);
|
||||
|
||||
eligible_pc_history[_addr] += 1;
|
||||
|
||||
if(get_causal_mode() == CausalMode::Function)
|
||||
_sym_addr = (_dl_info.symbol) ? _dl_info.symbol.address() : _addr;
|
||||
|
||||
@@ -725,45 +842,37 @@ sample_selection(size_t _nitr, size_t _wait_ns)
|
||||
? linfo.front()
|
||||
: linfo.back();
|
||||
return selected_entry{ _addr, _sym_addr, _linfo_v };
|
||||
// return selected_entry{ address_range_t{ _addr },
|
||||
// address_range_t{ _sym_addr },
|
||||
// { _linfo_v.second } };
|
||||
}
|
||||
return selected_entry{};
|
||||
};
|
||||
|
||||
while(_n++ < _nitr)
|
||||
while(eligible_pc_idx.load(std::memory_order_relaxed) == 0)
|
||||
{
|
||||
if(get_state() >= State::Finalized) return selected_entry{};
|
||||
std::this_thread::yield();
|
||||
std::this_thread::sleep_for(std::chrono::nanoseconds{ _wait_ns });
|
||||
}
|
||||
|
||||
for(size_t _n = 0; _n < _nitr; ++_n)
|
||||
{
|
||||
auto _addresses = std::deque<uintptr_t>{};
|
||||
for(auto& aitr : latest_eligible_pc)
|
||||
{
|
||||
if(OMNITRACE_UNLIKELY(!aitr))
|
||||
{
|
||||
OMNITRACE_WARNING(0, "invalid ring buffer...\n");
|
||||
OMNITRACE_WARNING(0, "invalid atomic pc...\n");
|
||||
continue;
|
||||
}
|
||||
|
||||
auto _naddrs = aitr->count();
|
||||
if(_naddrs == 0) continue;
|
||||
|
||||
for(size_t i = 0; i < _naddrs; ++i)
|
||||
{
|
||||
uintptr_t _addr = 0;
|
||||
if(!aitr->is_empty() && aitr->read(&_addr) != nullptr)
|
||||
{
|
||||
if(_addr > 0) _addresses.emplace_back(_addr);
|
||||
}
|
||||
}
|
||||
|
||||
if(!_addresses.empty())
|
||||
{
|
||||
auto _selection = _select_address(_addresses);
|
||||
if(_selection) return _selection;
|
||||
}
|
||||
uintptr_t _addr = aitr->load();
|
||||
if(_addr > 0) _addresses.emplace_back(_addr);
|
||||
}
|
||||
|
||||
std::this_thread::yield();
|
||||
std::this_thread::sleep_for(std::chrono::nanoseconds{ _wait_ns });
|
||||
if(!_addresses.empty())
|
||||
{
|
||||
auto _selection = _select_address(_addresses);
|
||||
if(_selection) return _selection;
|
||||
}
|
||||
}
|
||||
|
||||
return selected_entry{};
|
||||
@@ -781,12 +890,32 @@ get_line_info(uintptr_t _addr, bool _include_discarded)
|
||||
{
|
||||
auto _local_data = std::deque<binary::symbol>{};
|
||||
|
||||
// make sure the address is in the coarse grained mapped regions
|
||||
// before performing an exhaustive search
|
||||
bool _is_mapped = std::find_if(litr.mappings.begin(), litr.mappings.end(),
|
||||
[_addr](const auto& mitr) {
|
||||
return address_range_t{ mitr.load_address,
|
||||
mitr.last_address }
|
||||
.contains(_addr);
|
||||
}) != litr.mappings.end();
|
||||
|
||||
if(!_is_mapped) return;
|
||||
|
||||
for(const auto& ditr : litr.symbols)
|
||||
{
|
||||
// skip if load address is greater than address
|
||||
if(_addr < ditr.load_address) continue;
|
||||
// compute the symbols ip address range
|
||||
auto _ipaddr = ditr.ipaddr();
|
||||
// if the lower bound of the ip address range is greater than the address,
|
||||
// all following symbols are not worth searching since they are at higher
|
||||
// addresses than this symbol (sorted by address)
|
||||
// if(_ipaddr.low > _addr) break;
|
||||
|
||||
if(!_ipaddr.contains(_addr)) continue;
|
||||
|
||||
if(config::get_causal_mode() == CausalMode::Function)
|
||||
if(_include_discarded ||
|
||||
config::get_causal_mode() == CausalMode::Function)
|
||||
{
|
||||
// check if the primary symbol satisfy the constraints
|
||||
if(ditr(_filters)) _local_data.emplace_back(ditr);
|
||||
@@ -795,28 +924,28 @@ get_line_info(uintptr_t _addr, bool _include_discarded)
|
||||
// functions may
|
||||
utility::combine(_local_data, ditr.get_inline_symbols(_filters));
|
||||
}
|
||||
else if(config::get_causal_mode() == CausalMode::Line)
|
||||
|
||||
if(_include_discarded || config::get_causal_mode() == CausalMode::Line)
|
||||
{
|
||||
auto _debug_data = std::deque<binary::symbol>{};
|
||||
for(const auto& itr : ditr.get_debug_line_info(_filters))
|
||||
{
|
||||
if(!_ipaddr.contains(itr.ipaddr()))
|
||||
OMNITRACE_THROW(
|
||||
"Error! debug line info ipaddr (%s) is not contained in "
|
||||
"symbol ipaddr (%s)",
|
||||
as_hex(itr.ipaddr()).c_str(), as_hex(_ipaddr).c_str());
|
||||
if(itr.ipaddr().contains(_addr)) _debug_data.emplace_back(itr);
|
||||
}
|
||||
utility::combine(_local_data, _debug_data);
|
||||
}
|
||||
else
|
||||
{
|
||||
throw exception<std::runtime_error>(
|
||||
join(" ", "Causal mode not supported:",
|
||||
std::to_string(config::get_causal_mode())));
|
||||
}
|
||||
}
|
||||
|
||||
if(!_local_data.empty())
|
||||
{
|
||||
// combine and only allow first match
|
||||
utility::combine(_data, _local_data);
|
||||
break;
|
||||
if(!_include_discarded) break;
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
@@ -52,10 +52,14 @@ get_line_info(uintptr_t _addr, bool include_discarded = true);
|
||||
|
||||
bool is_eligible_address(uintptr_t);
|
||||
|
||||
void set_current_selection(unwind_addr_t);
|
||||
size_t set_current_selection(unwind_addr_t);
|
||||
size_t set_current_selection(container::c_array<uint64_t>);
|
||||
|
||||
void
|
||||
reset_sample_selection();
|
||||
|
||||
selected_entry
|
||||
sample_selection(size_t _nitr = 1000, size_t _wait_ns = 10000);
|
||||
sample_selection(size_t _nitr = 1000, size_t _wait_ns = 100000);
|
||||
|
||||
void push_progress_point(std::string_view);
|
||||
|
||||
|
||||
@@ -25,6 +25,7 @@
|
||||
#include "core/utility.hpp"
|
||||
#include "library/causal/components/causal_gotcha.hpp"
|
||||
#include "library/causal/experiment.hpp"
|
||||
#include "library/causal/sampling.hpp"
|
||||
#include "library/runtime.hpp"
|
||||
#include "library/thread_data.hpp"
|
||||
#include "library/thread_info.hpp"
|
||||
@@ -108,17 +109,16 @@ delay::process()
|
||||
{
|
||||
if(get_global() < get_local())
|
||||
{
|
||||
auto _diff = (get_local() - get_global());
|
||||
if(_diff > sleep_for_overhead) get_global() += _diff;
|
||||
get_global() += (get_local() - get_global());
|
||||
}
|
||||
else if(get_global() > get_local())
|
||||
{
|
||||
::omnitrace::causal::component::causal_gotcha::block_signals();
|
||||
::omnitrace::causal::sampling::pause();
|
||||
auto _beg = tracing::now();
|
||||
std::this_thread::sleep_for(
|
||||
std::chrono::nanoseconds{ get_global() - get_local() });
|
||||
get_local() += (tracing::now() - _beg);
|
||||
::omnitrace::causal::component::causal_gotcha::unblock_signals();
|
||||
::omnitrace::causal::sampling::resume();
|
||||
}
|
||||
}
|
||||
else
|
||||
|
||||
@@ -21,6 +21,9 @@
|
||||
// SOFTWARE.
|
||||
|
||||
#include "library/causal/experiment.hpp"
|
||||
#include "binary/analysis.hpp"
|
||||
#include "binary/dwarf_entry.hpp"
|
||||
#include "binary/symbol.hpp"
|
||||
#include "common/defines.h"
|
||||
#include "core/config.hpp"
|
||||
#include "core/debug.hpp"
|
||||
@@ -29,11 +32,11 @@
|
||||
#include "library/causal/components/progress_point.hpp"
|
||||
#include "library/causal/data.hpp"
|
||||
#include "library/causal/delay.hpp"
|
||||
#include "library/causal/sample_data.hpp"
|
||||
#include "library/thread_data.hpp"
|
||||
#include "library/thread_info.hpp"
|
||||
#include "library/tracing.hpp"
|
||||
|
||||
#include <string>
|
||||
#include <timemory/components/timing/backends.hpp>
|
||||
#include <timemory/hash/types.hpp>
|
||||
#include <timemory/mpl/policy.hpp>
|
||||
@@ -42,10 +45,12 @@
|
||||
#include <timemory/tpls/cereal/cereal/archives/json.hpp>
|
||||
#include <timemory/tpls/cereal/types.hpp>
|
||||
#include <timemory/units.hpp>
|
||||
#include <timemory/unwind/dlinfo.hpp>
|
||||
|
||||
#include <chrono>
|
||||
#include <ratio>
|
||||
#include <regex>
|
||||
#include <string>
|
||||
#include <thread>
|
||||
#include <vector>
|
||||
|
||||
@@ -68,26 +73,31 @@ bool use_exp_speedup_scaling =
|
||||
get_env<bool>("OMNITRACE_CAUSAL_SCALE_EXPERIMENT_TIME_BY_SPEEDUP", false);
|
||||
} // namespace
|
||||
|
||||
experiment::sample::sample(const base_type& _b, uint64_t _c)
|
||||
: base_type{ _b }
|
||||
, count{ _c }
|
||||
{
|
||||
if(lineinfo)
|
||||
{
|
||||
for(const auto& itr : lineinfo.lines)
|
||||
{
|
||||
if(itr.inlined)
|
||||
inlines.emplace_back(
|
||||
binary::inlined_symbol{ itr.line, itr.location, itr.name });
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
bool
|
||||
experiment::sample::operator==(const sample& _v) const
|
||||
{
|
||||
return std::tie(address, info.line, info.file, info.func, location) ==
|
||||
std::tie(_v.address, _v.info.line, _v.info.file, _v.info.func, _v.location);
|
||||
return base_type::operator==(_v);
|
||||
}
|
||||
|
||||
bool
|
||||
experiment::sample::operator<(const sample& _v) const
|
||||
{
|
||||
if(info.line > 0 && _v.info.line > 0)
|
||||
{
|
||||
return std::tie(info.line, info.file) == std::tie(_v.info.line, _v.info.file);
|
||||
}
|
||||
else if((info.line + _v.info.line) > 0)
|
||||
{
|
||||
return std::tie(info.file, location, info.line) <
|
||||
std::tie(_v.info.file, _v.location, _v.info.line);
|
||||
}
|
||||
return (location < _v.location);
|
||||
return base_type::operator<(_v);
|
||||
}
|
||||
|
||||
const auto&
|
||||
@@ -102,8 +112,35 @@ void
|
||||
experiment::sample::serialize(ArchiveT& ar, const unsigned)
|
||||
{
|
||||
namespace cereal = ::tim::cereal;
|
||||
ar(cereal::make_nvp("location", location), cereal::make_nvp("count", count),
|
||||
cereal::make_nvp("info", info));
|
||||
using cereal::make_nvp;
|
||||
|
||||
ar(cereal::make_nvp("count", count));
|
||||
if constexpr(concepts::is_output_archive<ArchiveT>::value)
|
||||
{
|
||||
ar(cereal::make_nvp("location", get_identifier()));
|
||||
}
|
||||
|
||||
ar.setNextName("info");
|
||||
ar.startNode();
|
||||
ar(make_nvp("address", address), make_nvp("line", lineno), make_nvp("file", location),
|
||||
make_nvp("func", name));
|
||||
|
||||
if constexpr(concepts::is_output_archive<ArchiveT>::value)
|
||||
{
|
||||
ar(cereal::make_nvp("dfunc", demangle(name)),
|
||||
cereal::make_nvp("dwarf_info", std::vector<binary::dwarf_entry>{}));
|
||||
}
|
||||
ar(cereal::make_nvp("inlines", inlines));
|
||||
ar.finishNode();
|
||||
|
||||
ar(cereal::make_nvp("dlinfo", info));
|
||||
}
|
||||
|
||||
std::string
|
||||
experiment::sample::get_identifier() const
|
||||
{
|
||||
return (lineno > 0 && !location.empty()) ? join(":", location, lineno)
|
||||
: demangle(name);
|
||||
}
|
||||
|
||||
template <typename ArchiveT>
|
||||
@@ -119,7 +156,7 @@ experiment::record::serialize(ArchiveT& ar, const unsigned)
|
||||
{
|
||||
ar(cereal::make_nvp("samples", _samples));
|
||||
for(auto& itr : _samples)
|
||||
samples.emplace(std::move(itr));
|
||||
samples.emplace_back(std::move(itr));
|
||||
}
|
||||
else
|
||||
{
|
||||
@@ -171,8 +208,6 @@ experiment::serialize(ArchiveT& ar, const unsigned)
|
||||
}
|
||||
ar(cereal::make_nvp("progress_points", _ppts));
|
||||
}
|
||||
|
||||
ar(cereal::make_nvp("period_stats", period_stats));
|
||||
}
|
||||
|
||||
std::string
|
||||
@@ -203,9 +238,6 @@ experiment::start()
|
||||
|
||||
// sampling period in nanoseconds
|
||||
sampling_period = backtrace_causal::get_period(units::nsec);
|
||||
// adjust for the real sampling period
|
||||
period_stats = causal::component::backtrace::get_period_stats();
|
||||
if(period_stats.get_count() > 10) sampling_period = period_stats.get_mean();
|
||||
|
||||
// experiment time is scaled up for longer speedups
|
||||
index = experiment_history.size() + 1;
|
||||
@@ -222,10 +254,14 @@ experiment::start()
|
||||
OMNITRACE_VERBOSE(0, "Starting causal experiment #%-3u: %s\n", index,
|
||||
as_string().c_str());
|
||||
|
||||
current_experiment_value = *this;
|
||||
current_selected_count.store(0);
|
||||
current_experiment.store(this);
|
||||
return true;
|
||||
if(get_state() < State::Finalized)
|
||||
{
|
||||
current_experiment_value = *this;
|
||||
current_selected_count.store(0);
|
||||
current_experiment.store(this);
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
bool
|
||||
@@ -258,34 +294,46 @@ experiment::stop()
|
||||
total_delay = (global_delay - total_delay);
|
||||
duration = (experiment_time > total_delay) ? (experiment_time - total_delay) : 0;
|
||||
fini_progress = component::progress_point::get_progress_points();
|
||||
period_stats = causal::component::backtrace::get_period_stats();
|
||||
|
||||
// sync data
|
||||
delay::sync();
|
||||
|
||||
// for larger speedups, we increased the experiment time, so we want to artificially
|
||||
// increase num by the same factor. E.g. 10 throughput points at speedup 50 should
|
||||
// really look like 15
|
||||
double _scale_num = 1.0 + ((use_exp_speedup_scaling) ? delay_scaling : 0.0);
|
||||
auto _prog_stats = tim::statistics<int64_t>{};
|
||||
auto _prog_stats = tim::statistics<double>{};
|
||||
auto _prog_vals = std::vector<int64_t>{};
|
||||
_prog_vals.reserve(fini_progress.size());
|
||||
for(auto fitr : fini_progress)
|
||||
{
|
||||
auto _pt = fitr.second - init_progress[fitr.first];
|
||||
int64_t _num =
|
||||
std::max<int64_t>({ _pt.get_laps(), _pt.get_arrival(), _pt.get_departure() });
|
||||
if(_num > 0) _prog_stats += (_num * _scale_num);
|
||||
if(_num > 0) _prog_vals.emplace_back(_num);
|
||||
}
|
||||
std::sort(_prog_vals.begin(), _prog_vals.end());
|
||||
for(auto itr : _prog_vals)
|
||||
_prog_stats += itr;
|
||||
|
||||
auto _mean = (_prog_stats.get_count() > 0) ? _prog_stats.get_mean() : 0;
|
||||
auto _high = (_prog_stats.get_count() > 0) ? _prog_stats.get_max() : 0;
|
||||
if(_high < 5)
|
||||
auto _nvals = _prog_vals.size();
|
||||
auto _medi = (_nvals > 2) ? _prog_vals.at(_nvals / 2) : _prog_vals.front();
|
||||
auto _mean = (_nvals > 0) ? _prog_stats.get_mean() : 0;
|
||||
auto _high = (_nvals > 0) ? _prog_stats.get_max() : 0;
|
||||
auto _lowv = (_nvals > 0) ? _prog_stats.get_min() : 0;
|
||||
|
||||
if(_lowv <= 3 && (_mean < 5 || _medi < 5))
|
||||
{
|
||||
OMNITRACE_VERBOSE(2,
|
||||
"[progress points] increasing experiment time :: low: %6.3f, "
|
||||
"high: %6.3f, mean: %6.3f, median: %zi\n",
|
||||
_lowv, _high, _mean, _medi);
|
||||
global_scaling *= 2;
|
||||
++global_scaling_increments; // keep track of how many successive increments have
|
||||
// been performed
|
||||
}
|
||||
else if(_mean > 10 && global_scaling > 1)
|
||||
else if(_mean > 10 && _lowv >= 8 && global_scaling > 1)
|
||||
{
|
||||
OMNITRACE_VERBOSE(2,
|
||||
"[progress points] decreasing experiment time :: low: %6.3f, "
|
||||
"high: %6.3f, mean: %6.3f, median: %zi\n",
|
||||
_lowv, _high, _mean, _medi);
|
||||
global_scaling /= 2;
|
||||
global_scaling_increments = 0;
|
||||
}
|
||||
@@ -304,7 +352,9 @@ experiment::stop()
|
||||
|
||||
if(_high > 0) experiment_history.emplace_back(*this);
|
||||
|
||||
std::this_thread::sleep_for(std::chrono::nanoseconds{ sampling_period * batch_size });
|
||||
std::this_thread::sleep_for(
|
||||
std::chrono::nanoseconds{ 5 * sampling_period * batch_size });
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
@@ -320,7 +370,6 @@ experiment::as_string() const
|
||||
_ss << ", duration: " << std::setw(5) << std::fixed << std::setprecision(3)
|
||||
<< _dur << " sec";
|
||||
_ss << " :: experiment: " << as_hex(selection.address) << " ";
|
||||
//_ss << " [" << selection.info.ipaddr().as_string() << "]";
|
||||
if(selection.symbol_address > 0 && selection.address != selection.symbol_address)
|
||||
_ss << "(symbol@" << as_hex(selection.symbol_address) << ") ";
|
||||
if(!selection.symbol.file.empty() && selection.symbol.line > 0)
|
||||
@@ -375,13 +424,30 @@ experiment::is_active()
|
||||
return (current_experiment.load(std::memory_order_relaxed) != nullptr);
|
||||
}
|
||||
|
||||
bool
|
||||
experiment::is_selected(uint64_t _addr)
|
||||
{
|
||||
return (is_active() && current_experiment_value.selection.contains(_addr));
|
||||
}
|
||||
|
||||
bool
|
||||
experiment::is_selected(unwind_addr_t _stack)
|
||||
{
|
||||
if(is_active())
|
||||
{
|
||||
for(auto itr : _stack)
|
||||
if(current_experiment_value.selection.contains(itr)) return true;
|
||||
if(itr > 0 && current_experiment_value.selection.contains(itr)) return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
bool
|
||||
experiment::is_selected(container::c_array<uint64_t> _stack)
|
||||
{
|
||||
if(is_active())
|
||||
{
|
||||
for(auto itr : _stack)
|
||||
if(itr > 0 && current_experiment_value.selection.contains(itr)) return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
@@ -413,9 +479,6 @@ experiment::save_experiments(std::string _fname_base, const filename_config_t& _
|
||||
{
|
||||
const auto& _info0 = thread_info::get(0, InternalTID);
|
||||
|
||||
// if(experiment_history.size() > 1)
|
||||
// experiment_history.erase(experiment_history.begin());
|
||||
|
||||
auto current_record = record{};
|
||||
current_record.startup = _info0->lifetime.first;
|
||||
|
||||
@@ -446,11 +509,7 @@ experiment::save_experiments(std::string _fname_base, const filename_config_t& _
|
||||
// update sample data
|
||||
{
|
||||
auto _add_sample = [¤t_record](sample&& _v) {
|
||||
auto fitr = current_record.samples.find(_v);
|
||||
if(fitr != current_record.samples.end())
|
||||
*fitr += _v;
|
||||
else
|
||||
current_record.samples.emplace(std::move(_v));
|
||||
current_record.samples.emplace_back(std::move(_v));
|
||||
};
|
||||
|
||||
auto _total_samples = std::map<uintptr_t, size_t>{};
|
||||
@@ -462,41 +521,24 @@ experiment::save_experiments(std::string _fname_base, const filename_config_t& _
|
||||
}
|
||||
}
|
||||
|
||||
OMNITRACE_VERBOSE_F(1, "Processing line info for %zu sampled addresses...\n",
|
||||
_total_samples.size());
|
||||
|
||||
for(const auto& itr : _total_samples)
|
||||
{
|
||||
auto _entry = binary::lookup_ipaddr_entry<true>(itr.first);
|
||||
if(_entry) _add_sample(sample{ *_entry, itr.second });
|
||||
}
|
||||
|
||||
auto _binfo_cfg = settings::compose_filename_config{};
|
||||
_binfo_cfg.subdirectory = "causal/binary-info";
|
||||
_binfo_cfg.use_suffix = config::get_use_pid();
|
||||
save_line_info(_binfo_cfg, config::get_verbose());
|
||||
|
||||
for(const auto& itr : _total_samples)
|
||||
{
|
||||
auto _addr = itr.first;
|
||||
auto _count = itr.second;
|
||||
if(_count > 0)
|
||||
{
|
||||
auto _linfo = get_line_info(_addr, true);
|
||||
for(const auto& iitr : _linfo)
|
||||
{
|
||||
auto _name = (iitr.line > 0) ? join(":", iitr.file, iitr.line)
|
||||
: demangle(iitr.func);
|
||||
|
||||
_name = join(" :: ", as_hex(_addr), _name);
|
||||
_add_sample(sample{ _count, _addr, _name, iitr });
|
||||
}
|
||||
|
||||
if(_linfo.empty() && config::get_debug())
|
||||
{
|
||||
_add_sample(
|
||||
sample{ _count, _addr, as_hex(_addr), sample::line_info{} });
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
bool _causal_output_reset =
|
||||
config::get_setting_value<bool>("OMNITRACE_CAUSAL_FILE_RESET").value_or(false);
|
||||
|
||||
// if(current_record.experiments.empty()) return;
|
||||
|
||||
{
|
||||
auto _saved_experiments = (_causal_output_reset)
|
||||
? std::vector<experiment::record>{}
|
||||
@@ -615,7 +657,8 @@ experiment::save_experiments(std::string _fname_base, const filename_config_t& _
|
||||
|
||||
for(const auto& itr : current_record.samples)
|
||||
{
|
||||
ofs << "samples\tlocation=" << itr.location << "\tcount=" << itr.count;
|
||||
ofs << "samples\tlocation=" << itr.get_identifier()
|
||||
<< "\tcount=" << itr.count;
|
||||
if(config::get_debug()) ofs << "\taddress=" << as_hex(itr.address);
|
||||
ofs << "\n";
|
||||
}
|
||||
|
||||
@@ -30,13 +30,13 @@
|
||||
#include "library/causal/components/backtrace.hpp"
|
||||
#include "library/causal/components/progress_point.hpp"
|
||||
#include "library/causal/data.hpp"
|
||||
#include "library/causal/sample_data.hpp"
|
||||
#include "library/causal/selected_entry.hpp"
|
||||
|
||||
#include <timemory/hash/types.hpp>
|
||||
#include <timemory/mpl/concepts.hpp>
|
||||
#include <timemory/tpls/cereal/cereal.hpp>
|
||||
#include <timemory/tpls/cereal/cereal/cereal.hpp>
|
||||
#include <timemory/unwind/types.hpp>
|
||||
#include <timemory/utility/unwind.hpp>
|
||||
|
||||
#include <atomic>
|
||||
@@ -55,17 +55,17 @@ struct experiment
|
||||
std::unordered_map<tim::hash_value_t, component::progress_point>;
|
||||
using experiments_t = std::vector<experiment>;
|
||||
using filename_config_t = settings::compose_filename_config;
|
||||
using sample_dataset_t = std::set<sample_data>;
|
||||
using period_stats_t = tim::statistics<int64_t>;
|
||||
|
||||
struct sample
|
||||
struct sample : unwind::processed_entry
|
||||
{
|
||||
using line_info = binary::symbol;
|
||||
using base_type = unwind::processed_entry;
|
||||
|
||||
mutable uint64_t count = 0;
|
||||
uintptr_t address = 0;
|
||||
std::string location = {};
|
||||
line_info info = {};
|
||||
sample() = default;
|
||||
sample(const base_type&, uint64_t);
|
||||
|
||||
mutable uint64_t count = 0;
|
||||
std::vector<binary::inlined_symbol> inlines = {};
|
||||
|
||||
bool operator==(const sample&) const;
|
||||
bool operator<(const sample&) const;
|
||||
@@ -73,6 +73,8 @@ struct experiment
|
||||
|
||||
template <typename ArchiveT>
|
||||
void serialize(ArchiveT& ar, const unsigned);
|
||||
|
||||
std::string get_identifier() const;
|
||||
};
|
||||
|
||||
struct record
|
||||
@@ -80,7 +82,7 @@ struct experiment
|
||||
int64_t startup = 0;
|
||||
uint64_t runtime = 0;
|
||||
std::vector<experiment> experiments = {};
|
||||
std::set<sample> samples = {};
|
||||
std::vector<sample> samples = {};
|
||||
|
||||
template <typename ArchiveT>
|
||||
void serialize(ArchiveT& ar, const unsigned);
|
||||
@@ -105,10 +107,18 @@ struct experiment
|
||||
static double get_delay_scaling();
|
||||
static uint32_t get_index();
|
||||
static bool is_active();
|
||||
static bool is_selected(uint64_t);
|
||||
static bool is_selected(unwind_addr_t);
|
||||
static bool is_selected(container::c_array<uint64_t>);
|
||||
static void add_selected();
|
||||
static experiments_t get_experiments();
|
||||
|
||||
template <size_t N>
|
||||
static bool is_selected(std::array<uint64_t, N> _v)
|
||||
{
|
||||
return is_selected(container::c_array<uint64_t>{ _v.data(), _v.size() });
|
||||
}
|
||||
|
||||
static void save_experiments();
|
||||
static void save_experiments(std::string, const filename_config_t&);
|
||||
static std::vector<record> load_experiments(bool _throw_on_err = true);
|
||||
@@ -116,24 +126,23 @@ struct experiment
|
||||
bool = true);
|
||||
|
||||
bool running = false;
|
||||
uint16_t virtual_speedup = 0; /// 0-100 in multiples of 5
|
||||
uint32_t index = 0; /// experiment number
|
||||
uint64_t sampling_period = 0; /// period b/t samples [nsec]
|
||||
uint64_t start_time = 0; /// start of experiment [nsec]
|
||||
uint64_t end_time = 0; /// end of experiment [nsec]
|
||||
uint64_t experiment_time = 0; /// how long the experiment ran [nsec]
|
||||
uint64_t duration = 0; /// runtime - delays [nsec]
|
||||
uint64_t batch_size = 10; /// batch factor for experiment/cooloff
|
||||
uint64_t scaling_factor = 50; /// scaling factor for experiment time
|
||||
uint64_t sample_delay = 0; /// how long to delay [nsec]
|
||||
uint64_t total_delay = 0; /// total delays [nsec]
|
||||
uint64_t selected = 0; /// num times selected line sampled
|
||||
uint16_t virtual_speedup = 0; /// 0-100 in multiples of 5
|
||||
uint32_t index = 0; /// experiment number
|
||||
uint64_t sampling_period = 0; /// period b/t samples [nsec]
|
||||
uint64_t start_time = 0; /// start of experiment [nsec]
|
||||
uint64_t end_time = 0; /// end of experiment [nsec]
|
||||
uint64_t experiment_time = 0; /// how long the experiment ran [nsec]
|
||||
uint64_t duration = 0; /// runtime - delays [nsec]
|
||||
uint64_t batch_size = 10; /// batch factor for experiment/cooloff
|
||||
uint64_t scaling_factor = 100; /// scaling factor for experiment time
|
||||
uint64_t sample_delay = 0; /// how long to delay [nsec]
|
||||
uint64_t total_delay = 0; /// total delays [nsec]
|
||||
uint64_t selected = 0; /// num times selected line sampled
|
||||
uint64_t global_delay = 0;
|
||||
double delay_scaling = 0.0; /// virtual_speedup / 100.
|
||||
selected_entry selection = {}; /// which line was selected
|
||||
progress_points_t init_progress = {}; /// progress points at start
|
||||
progress_points_t fini_progress = {}; /// progress points at end
|
||||
period_stats_t period_stats = {}; /// stats for sampling period
|
||||
};
|
||||
} // namespace causal
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -22,6 +22,7 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "common/defines.h"
|
||||
#include "core/binary/fwd.hpp"
|
||||
#include "core/containers/static_vector.hpp"
|
||||
#include "core/defines.hpp"
|
||||
@@ -41,7 +42,7 @@ namespace unwind = ::tim::unwind;
|
||||
|
||||
namespace causal
|
||||
{
|
||||
static constexpr size_t unwind_depth = 8;
|
||||
static constexpr size_t unwind_depth = OMNITRACE_MAX_UNWIND_DEPTH;
|
||||
static constexpr size_t unwind_offset = 0;
|
||||
using unwind_stack_t = unwind::stack<unwind_depth>;
|
||||
using unwind_addr_t = container::static_vector<uintptr_t, unwind_depth>;
|
||||
|
||||
@@ -33,32 +33,38 @@ namespace causal
|
||||
{
|
||||
namespace
|
||||
{
|
||||
auto samples = std::map<uint32_t, std::set<sample_data>>{};
|
||||
auto samples = std::map<uint32_t, std::map<uintptr_t, uint64_t>>{};
|
||||
}
|
||||
|
||||
std::set<sample_data>
|
||||
std::vector<sample_data>
|
||||
get_samples(uint32_t _index)
|
||||
{
|
||||
return samples[_index];
|
||||
auto _data = std::vector<sample_data>{};
|
||||
_data.reserve(samples.at(_index).size());
|
||||
for(const auto& itr : samples.at(_index))
|
||||
{
|
||||
_data.emplace_back(sample_data{ itr.first, itr.second });
|
||||
}
|
||||
return _data;
|
||||
}
|
||||
|
||||
std::map<uint32_t, std::set<sample_data>>
|
||||
std::map<uint32_t, std::vector<sample_data>>
|
||||
get_samples()
|
||||
{
|
||||
return samples;
|
||||
auto _data = std::map<uint32_t, std::vector<sample_data>>{};
|
||||
|
||||
for(const auto& itr : samples)
|
||||
{
|
||||
_data[itr.first] = get_samples(itr.first);
|
||||
}
|
||||
|
||||
return _data;
|
||||
}
|
||||
|
||||
void
|
||||
add_sample(uint32_t _index, uintptr_t _v)
|
||||
add_sample(uint32_t _index, uintptr_t _addr, uint64_t _count)
|
||||
{
|
||||
auto& _samples = samples[_index];
|
||||
auto _value = sample_data{ _v };
|
||||
_value.count = 1;
|
||||
auto itr = _samples.find(_value);
|
||||
if(itr == _samples.end())
|
||||
_samples.emplace(_value);
|
||||
else
|
||||
itr->count += 1;
|
||||
samples[_index][_addr] += _count;
|
||||
}
|
||||
|
||||
void
|
||||
@@ -67,5 +73,12 @@ add_samples(uint32_t _index, const std::vector<uintptr_t>& _v)
|
||||
for(const auto& itr : _v)
|
||||
add_sample(_index, itr);
|
||||
}
|
||||
|
||||
void
|
||||
add_samples(uint32_t _index, const std::map<uintptr_t, uint64_t>& _v)
|
||||
{
|
||||
for(const auto& itr : _v)
|
||||
add_sample(_index, itr.first, itr.second);
|
||||
}
|
||||
} // namespace causal
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -51,14 +51,17 @@ struct sample_data
|
||||
}
|
||||
};
|
||||
|
||||
std::map<uint32_t, std::set<sample_data>>
|
||||
std::map<uint32_t, std::vector<sample_data>>
|
||||
get_samples();
|
||||
|
||||
void
|
||||
add_samples(uint32_t, const std::vector<uintptr_t>&);
|
||||
|
||||
std::set<sample_data> get_samples(uint32_t);
|
||||
std::vector<sample_data> get_samples(uint32_t);
|
||||
|
||||
void add_sample(uint32_t, uintptr_t);
|
||||
void add_sample(uint32_t, uintptr_t, uint64_t = 1);
|
||||
|
||||
void
|
||||
add_samples(uint32_t, const std::map<uintptr_t, uint64_t>&);
|
||||
} // namespace causal
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -21,6 +21,7 @@
|
||||
// SOFTWARE.
|
||||
|
||||
#include "library/causal/sampling.hpp"
|
||||
#include "binary/analysis.hpp"
|
||||
#include "core/common.hpp"
|
||||
#include "core/concepts.hpp"
|
||||
#include "core/config.hpp"
|
||||
@@ -30,6 +31,8 @@
|
||||
#include "core/utility.hpp"
|
||||
#include "library/causal/components/backtrace.hpp"
|
||||
#include "library/causal/data.hpp"
|
||||
#include "library/causal/sample_data.hpp"
|
||||
#include "library/perf.hpp"
|
||||
#include "library/ptl.hpp"
|
||||
#include "library/runtime.hpp"
|
||||
#include "library/sampling.hpp"
|
||||
@@ -37,8 +40,11 @@
|
||||
#include "library/thread_info.hpp"
|
||||
|
||||
#include <timemory/macros.hpp>
|
||||
#include <timemory/mpl/types.hpp>
|
||||
#include <timemory/sampling/allocator.hpp>
|
||||
#include <timemory/sampling/overflow.hpp>
|
||||
#include <timemory/sampling/sampler.hpp>
|
||||
#include <timemory/sampling/timer.hpp>
|
||||
#include <timemory/units.hpp>
|
||||
#include <timemory/utility/backtrace.hpp>
|
||||
#include <timemory/variadic.hpp>
|
||||
@@ -46,6 +52,7 @@
|
||||
#include <csignal>
|
||||
#include <cstring>
|
||||
#include <ctime>
|
||||
#include <memory>
|
||||
#include <mutex>
|
||||
#include <sstream>
|
||||
#include <string>
|
||||
@@ -58,11 +65,14 @@ namespace causal
|
||||
namespace sampling
|
||||
{
|
||||
using ::tim::sampling::dynamic;
|
||||
using ::tim::sampling::overflow;
|
||||
using ::tim::sampling::timer;
|
||||
|
||||
using causal_bundle_t =
|
||||
tim::lightweight_tuple<causal::component::sample_rate, causal::component::backtrace>;
|
||||
using causal_sampler_t = tim::sampling::sampler<causal_bundle_t, dynamic>;
|
||||
tim::lightweight_tuple<causal::component::overflow, causal::component::backtrace>;
|
||||
using causal_sampler_t = tim::sampling::sampler<causal_bundle_t, dynamic>;
|
||||
using backtrace_enabled = trait::runtime_enabled<component::backtrace>;
|
||||
using overflow_enabled = trait::runtime_enabled<component::overflow>;
|
||||
} // namespace sampling
|
||||
} // namespace causal
|
||||
} // namespace omnitrace
|
||||
@@ -156,17 +166,34 @@ void
|
||||
causal_offload_buffer(int64_t, causal_sampler_buffer_t&& _buf)
|
||||
{
|
||||
auto _data = std::move(_buf);
|
||||
auto _processed = std::map<uint32_t, std::vector<uintptr_t>>{};
|
||||
auto _processed = std::map<uint32_t, std::map<uintptr_t, uint64_t>>{};
|
||||
while(!_data.is_empty())
|
||||
{
|
||||
auto _bundle = causal_sampler_bundle_t{};
|
||||
_data.read(&_bundle);
|
||||
auto* _bt_causal = _bundle.get<causal::component::backtrace>();
|
||||
|
||||
const auto* _bt_causal = _bundle.get<causal::component::backtrace>();
|
||||
if(_bt_causal)
|
||||
{
|
||||
for(auto&& itr : _bt_causal->get_stack())
|
||||
auto _stack = _bt_causal->get_stack();
|
||||
|
||||
for(auto itr : _stack)
|
||||
{
|
||||
if(itr > 0) _processed[_bt_causal->get_index()].emplace_back(itr);
|
||||
if(itr > 0) _processed[_bt_causal->get_index()][itr] += 1;
|
||||
}
|
||||
}
|
||||
|
||||
const auto* _of_causal = _bundle.get<causal::component::overflow>();
|
||||
if(_of_causal)
|
||||
{
|
||||
const auto& _stack = _of_causal->get_stack();
|
||||
|
||||
for(const auto& ditr : _stack)
|
||||
{
|
||||
for(auto aitr : ditr)
|
||||
{
|
||||
if(aitr > 0) _processed[_of_causal->get_index()][aitr] += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -177,7 +204,9 @@ causal_offload_buffer(int64_t, causal_sampler_buffer_t&& _buf)
|
||||
static auto _mutex = locking::atomic_mutex{};
|
||||
auto _lk = locking::atomic_lock{ _mutex };
|
||||
for(const auto& itr : _processed)
|
||||
{
|
||||
add_samples(itr.first, itr.second);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -186,6 +215,7 @@ configure(bool _setup, int64_t _tid)
|
||||
{
|
||||
const auto& _info = thread_info::get(_tid, SequentTID);
|
||||
auto& _causal = get_causal_sampler(_tid);
|
||||
auto& _causal_perf = perf::get_instance(_tid);
|
||||
auto& _running = get_causal_sampler_running(_tid);
|
||||
auto& _signal_types = get_causal_sampler_signals(_tid);
|
||||
|
||||
@@ -197,6 +227,19 @@ configure(bool _setup, int64_t _tid)
|
||||
|
||||
if(_setup && _signal_types.empty()) _signal_types = get_sampling_signals(_tid);
|
||||
|
||||
// initialize
|
||||
if(_setup)
|
||||
{
|
||||
using global_init_mode = operation::mode_constant<operation::init_mode::global>;
|
||||
using thread_init_mode = operation::mode_constant<operation::init_mode::thread>;
|
||||
// initialize backtrace
|
||||
operation::init<component::backtrace>{}(global_init_mode{});
|
||||
operation::init<component::backtrace>{}(thread_init_mode{});
|
||||
// initialize overflow
|
||||
operation::init<component::overflow>{}(global_init_mode{});
|
||||
operation::init<component::overflow>{}(thread_init_mode{});
|
||||
}
|
||||
|
||||
if(_setup && !_causal && !_running && !_signal_types.empty())
|
||||
{
|
||||
auto _verbose = std::min<int>(get_verbose() - 2, 2);
|
||||
@@ -218,21 +261,95 @@ configure(bool _setup, int64_t _tid)
|
||||
_causal = std::make_unique<causal_sampler_t>(_causal_alloc, "omnitrace", _tid,
|
||||
_verbose);
|
||||
|
||||
auto _activate_perf_backend = [&_causal, &_causal_perf, &_info, &_tid]() {
|
||||
_causal_perf = std::make_unique<perf::perf_event>();
|
||||
auto _open_error =
|
||||
_causal_perf->open(1000.0, 10, _info->index_data->system_value);
|
||||
if(_open_error)
|
||||
{
|
||||
_causal_perf.reset();
|
||||
}
|
||||
else
|
||||
{
|
||||
overflow_enabled::set(true);
|
||||
overflow_enabled::set(scope::thread_scope{}, true);
|
||||
backtrace_enabled::set(false);
|
||||
backtrace_enabled::set(scope::thread_scope{}, false);
|
||||
_causal->configure(overflow{ get_sampling_overflow_signal(),
|
||||
[](int, pid_t, long, int64_t) {
|
||||
// perf::get_instance(_idx)->set_ready_signal(_sig);
|
||||
return true;
|
||||
},
|
||||
[](int, pid_t, long, int64_t _idx) {
|
||||
return perf::get_instance(_idx)->start();
|
||||
},
|
||||
[](int, pid_t, long, int64_t _idx) {
|
||||
return perf::get_instance(_idx)->stop();
|
||||
},
|
||||
_tid, threading::get_sys_tid() });
|
||||
if(_tid == 0) OMNITRACE_VERBOSE(1, "causal profiling backend: perf\n");
|
||||
}
|
||||
|
||||
return _open_error;
|
||||
};
|
||||
|
||||
auto _activate_timer_backend = [&_causal, &_tid]() {
|
||||
backtrace_enabled::set(true);
|
||||
backtrace_enabled::set(scope::thread_scope{}, true);
|
||||
overflow_enabled::set(false);
|
||||
overflow_enabled::set(scope::thread_scope{}, false);
|
||||
_causal->configure(timer{ get_sampling_realtime_signal(), CLOCK_REALTIME,
|
||||
SIGEV_THREAD_ID, 1000.0, 1.0e-6, _tid,
|
||||
threading::get_sys_tid() });
|
||||
if(_tid == 0) OMNITRACE_VERBOSE(1, "causal profiling backend: timer\n");
|
||||
return true;
|
||||
};
|
||||
|
||||
TIMEMORY_REQUIRE(_causal) << "nullptr to causal profiling instance";
|
||||
|
||||
_causal->set_flags(SA_RESTART);
|
||||
_causal->set_verbose(_verbose);
|
||||
_causal->set_offload(&causal_offload_buffer);
|
||||
|
||||
_causal->configure(timer{ get_realtime_signal(), CLOCK_REALTIME, SIGEV_THREAD_ID,
|
||||
1000.0, 1.0e-6, _tid, threading::get_sys_tid() });
|
||||
if(get_causal_backend() == CausalBackend::Perf)
|
||||
{
|
||||
auto _perf_error = _activate_perf_backend();
|
||||
OMNITRACE_REQUIRE(!_perf_error)
|
||||
<< "perf backend for causal profiling failed to activate: "
|
||||
<< *_perf_error << "\n";
|
||||
}
|
||||
else if(get_causal_backend() == CausalBackend::Timer)
|
||||
{
|
||||
OMNITRACE_REQUIRE(_activate_timer_backend())
|
||||
<< "timer backend for causal profiling failed to activate\n";
|
||||
}
|
||||
else if(get_causal_backend() == CausalBackend::Auto)
|
||||
{
|
||||
auto _perf_error = _activate_perf_backend();
|
||||
if(!_perf_error)
|
||||
{
|
||||
config::set_setting_value("OMNITRACE_CAUSAL_BACKEND",
|
||||
std::string{ "perf" });
|
||||
}
|
||||
else
|
||||
{
|
||||
OMNITRACE_WARNING_F(
|
||||
0, "perf backend for causal profiling failed to activate: %s\n",
|
||||
_perf_error->c_str());
|
||||
|
||||
_causal->configure(timer{ get_cputime_signal(), CLOCK_THREAD_CPUTIME_ID,
|
||||
OMNITRACE_REQUIRE(_activate_timer_backend())
|
||||
<< "timer backend for causal profiling failed to activate\n";
|
||||
|
||||
config::set_setting_value("OMNITRACE_CAUSAL_BACKEND",
|
||||
std::string{ "timer" });
|
||||
}
|
||||
}
|
||||
|
||||
_causal->configure(timer{ get_sampling_cputime_signal(), CLOCK_THREAD_CPUTIME_ID,
|
||||
SIGEV_THREAD_ID, 1000.0, 1.0e-6, _tid,
|
||||
threading::get_sys_tid() });
|
||||
|
||||
_running = true;
|
||||
if(_tid == 0) causal::component::backtrace::start();
|
||||
_causal->start();
|
||||
}
|
||||
else if(!_setup && _causal && _running)
|
||||
@@ -257,12 +374,22 @@ configure(bool _setup, int64_t _tid)
|
||||
get_causal_sampler(i)->stop();
|
||||
get_causal_sampler(i)->reset();
|
||||
}
|
||||
|
||||
if(perf::get_instance(i))
|
||||
{
|
||||
perf::get_instance(i).reset();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
_causal->stop();
|
||||
_causal->reset();
|
||||
|
||||
if(_causal_perf)
|
||||
{
|
||||
_causal_perf.reset();
|
||||
}
|
||||
|
||||
OMNITRACE_DEBUG("Causal sampler destroyed for thread %lu\n", _tid);
|
||||
}
|
||||
|
||||
@@ -311,17 +438,113 @@ unblock_samples()
|
||||
void
|
||||
block_backtrace_samples()
|
||||
{
|
||||
trait::runtime_enabled<causal::component::backtrace>::set(scope::thread_scope{},
|
||||
false);
|
||||
pause(scope::thread_scope{});
|
||||
}
|
||||
|
||||
void
|
||||
unblock_backtrace_samples()
|
||||
{
|
||||
trait::runtime_enabled<causal::component::backtrace>::set(scope::thread_scope{},
|
||||
true);
|
||||
resume(scope::thread_scope{});
|
||||
}
|
||||
|
||||
namespace
|
||||
{
|
||||
std::optional<bool> _process_paused = {};
|
||||
thread_local std::optional<bool> _thread_paused = {};
|
||||
namespace signals = ::tim::signals;
|
||||
|
||||
const auto&
|
||||
sampling_signals()
|
||||
{
|
||||
static thread_local auto _v = get_signal_types(threading::get_id());
|
||||
return _v;
|
||||
}
|
||||
} // namespace
|
||||
|
||||
template <typename ScopeT>
|
||||
void pause(ScopeT)
|
||||
{
|
||||
static_assert(
|
||||
tim::is_one_of<ScopeT,
|
||||
type_list<scope::thread_scope, scope::process_scope>>::value,
|
||||
"Unsupported scope");
|
||||
|
||||
if constexpr(std::is_same<ScopeT, scope::thread_scope>::value)
|
||||
{
|
||||
if(!_thread_paused) _thread_paused = false;
|
||||
|
||||
bool _paused_v = *_thread_paused;
|
||||
if(!_paused_v)
|
||||
{
|
||||
auto& _causal_perf = perf::get_instance(threading::get_id());
|
||||
if(_causal_perf) _causal_perf->stop();
|
||||
signals::block_signals(sampling_signals(), signals::sigmask_scope::thread);
|
||||
_thread_paused = true;
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
if(!_process_paused) _process_paused = false;
|
||||
|
||||
bool _paused_v = *_process_paused;
|
||||
if(!_paused_v)
|
||||
{
|
||||
for(auto i = 0; i < OMNITRACE_MAX_THREADS; ++i)
|
||||
{
|
||||
auto& _causal_perf = perf::get_instance(i);
|
||||
if(_causal_perf) _causal_perf->stop();
|
||||
}
|
||||
signals::block_signals(sampling_signals(), signals::sigmask_scope::process);
|
||||
_process_paused = true;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
template <typename ScopeT>
|
||||
void resume(ScopeT)
|
||||
{
|
||||
static_assert(
|
||||
tim::is_one_of<ScopeT,
|
||||
type_list<scope::thread_scope, scope::process_scope>>::value,
|
||||
"Unsupported scope");
|
||||
|
||||
if constexpr(std::is_same<ScopeT, scope::thread_scope>::value)
|
||||
{
|
||||
if(!_thread_paused) _thread_paused = true;
|
||||
|
||||
bool _paused_v = *_thread_paused;
|
||||
if(_paused_v)
|
||||
{
|
||||
auto& _causal_perf = perf::get_instance(threading::get_id());
|
||||
if(_causal_perf) _causal_perf->start();
|
||||
signals::unblock_signals(sampling_signals(), signals::sigmask_scope::thread);
|
||||
_thread_paused = false;
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
if(!_process_paused) _process_paused = true;
|
||||
|
||||
bool _paused_v = *_process_paused;
|
||||
if(_paused_v)
|
||||
{
|
||||
for(auto i = 0; i < OMNITRACE_MAX_THREADS; ++i)
|
||||
{
|
||||
auto& _causal_perf = perf::get_instance(i);
|
||||
if(_causal_perf) _causal_perf->start();
|
||||
}
|
||||
signals::unblock_signals(sampling_signals(), signals::sigmask_scope::process);
|
||||
_process_paused = false;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
template void pause<scope::thread_scope>(scope::thread_scope);
|
||||
template void pause<scope::process_scope>(scope::process_scope);
|
||||
|
||||
template void resume<scope::thread_scope>(scope::thread_scope);
|
||||
template void resume<scope::process_scope>(scope::process_scope);
|
||||
|
||||
void
|
||||
block_signals(std::set<int> _signals)
|
||||
{
|
||||
@@ -354,10 +577,15 @@ post_process()
|
||||
{
|
||||
auto& _causal = get_causal_sampler(i);
|
||||
if(_causal) _causal->stop();
|
||||
auto& _causal_perf = perf::get_instance(i);
|
||||
if(_causal_perf) _causal_perf->stop();
|
||||
}
|
||||
|
||||
configure(false, 0);
|
||||
|
||||
auto _allocator = get_causal_sampler_allocator(false);
|
||||
if(_allocator) _allocator->flush();
|
||||
|
||||
for(size_t i = 0; i < max_supported_threads; ++i)
|
||||
{
|
||||
auto& _causal = get_causal_sampler(i);
|
||||
@@ -370,12 +598,15 @@ post_process()
|
||||
for(size_t i = 0; i < max_supported_threads; ++i)
|
||||
{
|
||||
get_causal_sampler(i).reset();
|
||||
|
||||
auto& _causal_perf = perf::get_instance(i);
|
||||
if(_causal_perf)
|
||||
{
|
||||
_causal_perf.reset();
|
||||
}
|
||||
}
|
||||
|
||||
if(get_causal_sampler_allocator(false))
|
||||
{
|
||||
get_causal_sampler_allocator(false).reset();
|
||||
}
|
||||
if(_allocator) _allocator.reset();
|
||||
}
|
||||
|
||||
namespace
|
||||
@@ -386,9 +617,27 @@ post_process_causal(int64_t, const std::vector<causal_bundle_t>& _data)
|
||||
for(const auto& itr : _data)
|
||||
{
|
||||
const auto* _bt_causal = itr.get<causal::component::backtrace>();
|
||||
for(auto&& ditr : _bt_causal->get_stack())
|
||||
if(_bt_causal)
|
||||
{
|
||||
if(ditr > 0) add_sample(_bt_causal->get_index(), ditr);
|
||||
auto _stack = _bt_causal->get_stack();
|
||||
for(auto&& ditr : _stack)
|
||||
{
|
||||
if(ditr > 0) add_sample(_bt_causal->get_index(), ditr);
|
||||
}
|
||||
}
|
||||
|
||||
const auto* _of_causal = itr.get<causal::component::overflow>();
|
||||
if(_of_causal)
|
||||
{
|
||||
const auto& _stack = _of_causal->get_stack();
|
||||
|
||||
for(const auto& ditr : _stack)
|
||||
{
|
||||
for(auto aitr : ditr)
|
||||
{
|
||||
if(aitr > 0) add_sample(_of_causal->get_index(), aitr);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -51,6 +51,12 @@ block_backtrace_samples();
|
||||
void
|
||||
unblock_backtrace_samples();
|
||||
|
||||
template <typename Tp = tim::scope::thread_scope>
|
||||
void pause(Tp = {});
|
||||
|
||||
template <typename Tp = tim::scope::thread_scope>
|
||||
void resume(Tp = {});
|
||||
|
||||
void block_signals(std::set<int> = {});
|
||||
|
||||
void unblock_signals(std::set<int> = {});
|
||||
|
||||
@@ -3,6 +3,7 @@ set(component_sources
|
||||
${CMAKE_CURRENT_LIST_DIR}/backtrace.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/backtrace_metrics.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/backtrace_timestamp.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/callchain.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/comm_data.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/cpu_freq.cpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/exit_gotcha.cpp
|
||||
@@ -17,6 +18,7 @@ set(component_headers
|
||||
${CMAKE_CURRENT_LIST_DIR}/backtrace.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/backtrace_metrics.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/backtrace_timestamp.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/callchain.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/category_region.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/comm_data.hpp
|
||||
${CMAKE_CURRENT_LIST_DIR}/cpu_freq.hpp
|
||||
|
||||
@@ -132,12 +132,12 @@ backtrace::filter_and_patch(const std::vector<entry_type>& _data)
|
||||
return 1;
|
||||
};
|
||||
|
||||
bool _keep_suffix = tim::get_env<bool>("OMNITRACE_SAMPLING_KEEP_DYNINST_SUFFIX",
|
||||
get_debug_sampling());
|
||||
static bool _keep_suffix = tim::get_env<bool>(
|
||||
"OMNITRACE_SAMPLING_KEEP_DYNINST_SUFFIX", get_debug_sampling());
|
||||
|
||||
// in the dyninst binary rewrite runtime, instrumented functions are appended with
|
||||
// "_dyninst", i.e. "main" will show up as "main_dyninst" in the backtrace.
|
||||
auto _patch_label = [_keep_suffix](std::string_view _lbl) -> std::string {
|
||||
auto _patch_label = [](std::string_view _lbl) -> std::string {
|
||||
// debugging feature
|
||||
if(_keep_suffix) return std::string{ _lbl };
|
||||
const std::string _dyninst{ "_dyninst" };
|
||||
@@ -183,8 +183,10 @@ backtrace::size() const
|
||||
}
|
||||
|
||||
void
|
||||
backtrace::sample(int)
|
||||
backtrace::sample(int signo)
|
||||
{
|
||||
if(signo == get_sampling_overflow_signal()) return;
|
||||
|
||||
// on RedHat, the unw_step within get_unw_stack involves a mutex lock
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
|
||||
|
||||
@@ -317,6 +317,34 @@ backtrace_metrics::fini_perfetto(int64_t _tid, valid_array_t _valid)
|
||||
}
|
||||
}
|
||||
|
||||
backtrace_metrics&
|
||||
backtrace_metrics::operator-=(const backtrace_metrics& _rhs)
|
||||
{
|
||||
auto& _lhs = *this;
|
||||
if(_lhs(category::thread_peak_memory{}))
|
||||
{
|
||||
_lhs.m_mem_peak -= _rhs.m_mem_peak;
|
||||
}
|
||||
|
||||
if(_lhs(category::thread_context_switch{}))
|
||||
{
|
||||
_lhs.m_ctx_swch -= _rhs.m_ctx_swch;
|
||||
}
|
||||
|
||||
if(_lhs(category::thread_page_fault{}))
|
||||
{
|
||||
_lhs.m_page_flt -= _rhs.m_page_flt;
|
||||
}
|
||||
|
||||
if(_lhs(type_list<hw_counters>{}) && _lhs(category::thread_hardware_counter{}))
|
||||
{
|
||||
for(size_t i = 0; i < _lhs.m_hw_counter.size(); ++i)
|
||||
_lhs.m_hw_counter.at(i) -= _rhs.m_hw_counter.at(i);
|
||||
}
|
||||
|
||||
return _lhs;
|
||||
}
|
||||
|
||||
void
|
||||
backtrace_metrics::post_process_perfetto(int64_t _tid, uint64_t _ts) const
|
||||
{
|
||||
@@ -340,6 +368,7 @@ backtrace_metrics::post_process_perfetto(int64_t _tid, uint64_t _ts) const
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts,
|
||||
m_page_flt);
|
||||
}
|
||||
|
||||
if((*this)(type_list<hw_counters>{}) && (*this)(category::thread_hardware_counter{}))
|
||||
{
|
||||
for(size_t i = 0; i < perfetto_counter_track<hw_counters>::size(_tid); ++i)
|
||||
|
||||
@@ -114,6 +114,14 @@ struct backtrace_metrics : comp::empty_base
|
||||
|
||||
void post_process_perfetto(int64_t _tid, uint64_t _ts) const;
|
||||
|
||||
backtrace_metrics& operator-=(const backtrace_metrics&);
|
||||
|
||||
friend backtrace_metrics operator-(backtrace_metrics _lhs,
|
||||
const backtrace_metrics& _rhs)
|
||||
{
|
||||
return (_lhs -= _rhs);
|
||||
}
|
||||
|
||||
private:
|
||||
valid_array_t m_valid = {};
|
||||
int64_t m_cpu = 0;
|
||||
|
||||
@@ -0,0 +1,210 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in all
|
||||
// copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
// SOFTWARE.
|
||||
|
||||
#include "callchain.hpp"
|
||||
#include "binary/analysis.hpp"
|
||||
#include "core/common.hpp"
|
||||
#include "core/components/fwd.hpp"
|
||||
#include "core/config.hpp"
|
||||
#include "core/debug.hpp"
|
||||
#include "core/perfetto.hpp"
|
||||
#include "core/state.hpp"
|
||||
#include "library/components/ensure_storage.hpp"
|
||||
#include "library/perf.hpp"
|
||||
#include "library/ptl.hpp"
|
||||
#include "library/runtime.hpp"
|
||||
#include "library/sampling.hpp"
|
||||
#include "library/thread_info.hpp"
|
||||
|
||||
#include <timemory/backends/papi.hpp>
|
||||
#include <timemory/backends/threading.hpp>
|
||||
#include <timemory/components/data_tracker/components.hpp>
|
||||
#include <timemory/components/macros.hpp>
|
||||
#include <timemory/components/papi/extern.hpp>
|
||||
#include <timemory/components/papi/papi_array.hpp>
|
||||
#include <timemory/components/papi/papi_vector.hpp>
|
||||
#include <timemory/components/rusage/components.hpp>
|
||||
#include <timemory/components/rusage/types.hpp>
|
||||
#include <timemory/components/timing/backends.hpp>
|
||||
#include <timemory/components/trip_count/extern.hpp>
|
||||
#include <timemory/macros.hpp>
|
||||
#include <timemory/math.hpp>
|
||||
#include <timemory/mpl.hpp>
|
||||
#include <timemory/mpl/quirks.hpp>
|
||||
#include <timemory/mpl/type_traits.hpp>
|
||||
#include <timemory/operations.hpp>
|
||||
#include <timemory/storage.hpp>
|
||||
#include <timemory/units.hpp>
|
||||
#include <timemory/unwind/entry.hpp>
|
||||
#include <timemory/utility/demangle.hpp>
|
||||
#include <timemory/utility/types.hpp>
|
||||
#include <timemory/variadic.hpp>
|
||||
|
||||
#include <array>
|
||||
#include <cstring>
|
||||
#include <ctime>
|
||||
#include <initializer_list>
|
||||
#include <mutex>
|
||||
#include <regex>
|
||||
#include <sstream>
|
||||
#include <string>
|
||||
#include <string_view>
|
||||
#include <type_traits>
|
||||
#include <vector>
|
||||
|
||||
#include <pthread.h>
|
||||
#include <signal.h>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace component
|
||||
{
|
||||
bool
|
||||
callchain::record::operator<(const record& rhs) const
|
||||
{
|
||||
return timestamp < rhs.timestamp;
|
||||
}
|
||||
|
||||
std::vector<callchain::ts_entry_vec_t>
|
||||
callchain::get() const
|
||||
{
|
||||
std::vector<ts_entry_vec_t> _v = {};
|
||||
if(size() == 0) return _v;
|
||||
|
||||
_v.reserve(size());
|
||||
auto _data = m_data;
|
||||
std::sort(_data.begin(), _data.end());
|
||||
for(const auto& itr : _data)
|
||||
{
|
||||
auto _v2 = ts_entry_vec_t{ itr.timestamp, {} };
|
||||
for(auto iitr : itr.data)
|
||||
{
|
||||
auto _entry = binary::lookup_ipaddr_entry<true>(iitr);
|
||||
if(_entry) _v2.second.emplace_back(*_entry);
|
||||
}
|
||||
|
||||
if(!_v2.second.empty())
|
||||
{
|
||||
// put the bottom of the call-stack on top
|
||||
std::reverse(_v2.second.begin(), _v2.second.end());
|
||||
_v.emplace_back(std::move(_v2));
|
||||
}
|
||||
}
|
||||
|
||||
auto _known_excludes =
|
||||
std::set<std::string>{ "funlockfile", "killpg", "__restore_rt" };
|
||||
// remove some known functions which are by-products of interrupts
|
||||
for(auto& itr : _v)
|
||||
{
|
||||
while(!itr.second.empty() &&
|
||||
_known_excludes.find(itr.second.back().name) != _known_excludes.end())
|
||||
itr.second.pop_back();
|
||||
}
|
||||
|
||||
return _v;
|
||||
}
|
||||
|
||||
std::string
|
||||
callchain::label()
|
||||
{
|
||||
return "callchain";
|
||||
}
|
||||
|
||||
std::string
|
||||
callchain::description()
|
||||
{
|
||||
return "Records callchain data";
|
||||
}
|
||||
|
||||
std::vector<callchain::ts_entry_vec_t>
|
||||
callchain::filter_and_patch(const std::vector<ts_entry_vec_t>& _data)
|
||||
{
|
||||
auto _ret = std::vector<ts_entry_vec_t>{};
|
||||
_ret.reserve(_data.size());
|
||||
for(const auto& itr : _data)
|
||||
{
|
||||
auto _v = backtrace::filter_and_patch(itr.second);
|
||||
if(!_v.empty()) _ret.emplace_back(ts_entry_vec_t{ itr.first, std::move(_v) });
|
||||
}
|
||||
|
||||
return _ret;
|
||||
}
|
||||
|
||||
void
|
||||
callchain::start()
|
||||
{}
|
||||
|
||||
void
|
||||
callchain::stop()
|
||||
{}
|
||||
|
||||
bool
|
||||
callchain::empty() const
|
||||
{
|
||||
return (size() == 0);
|
||||
}
|
||||
|
||||
size_t
|
||||
callchain::size() const
|
||||
{
|
||||
return m_data.size();
|
||||
}
|
||||
|
||||
void
|
||||
callchain::sample(int signo)
|
||||
{
|
||||
if(signo != get_sampling_overflow_signal()) return;
|
||||
|
||||
// on RedHat, the unw_step within get_unw_stack involves a mutex lock
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
|
||||
static thread_local const auto& _tinfo = thread_info::get();
|
||||
auto _tid = _tinfo->index_data->sequent_value;
|
||||
auto& _perf_event = perf::get_instance(_tid);
|
||||
|
||||
if(!_perf_event) return;
|
||||
|
||||
_perf_event->stop();
|
||||
|
||||
for(auto itr : *_perf_event)
|
||||
{
|
||||
if(itr.is_sample())
|
||||
{
|
||||
auto _ip = itr.get_ip();
|
||||
auto _data = record{};
|
||||
_data.timestamp = itr.get_time();
|
||||
_data.data.emplace_back(_ip);
|
||||
for(auto ditr : itr.get_callchain())
|
||||
{
|
||||
if(ditr != _ip) _data.data.emplace_back(ditr);
|
||||
if(_data.data.size() == _data.data.capacity()) break;
|
||||
}
|
||||
if(!_data.data.empty()) m_data.emplace_back(_data);
|
||||
}
|
||||
}
|
||||
|
||||
_perf_event->start();
|
||||
}
|
||||
} // namespace component
|
||||
} // namespace omnitrace
|
||||
|
||||
TIMEMORY_INITIALIZE_STORAGE(omnitrace::component::callchain)
|
||||
@@ -0,0 +1,95 @@
|
||||
// MIT License
|
||||
//
|
||||
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
|
||||
//
|
||||
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
// of this software and associated documentation files (the "Software"), to deal
|
||||
// in the Software without restriction, including without limitation the rights
|
||||
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
||||
// copies of the Software, and to permit persons to whom the Software is
|
||||
// furnished to do so, subject to the following conditions:
|
||||
//
|
||||
// The above copyright notice and this permission notice shall be included in all
|
||||
// copies or substantial portions of the Software.
|
||||
//
|
||||
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
// SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "core/common.hpp"
|
||||
#include "core/components/fwd.hpp"
|
||||
#include "core/containers/static_vector.hpp"
|
||||
#include "core/defines.hpp"
|
||||
#include "core/timemory.hpp"
|
||||
#include "library/thread_data.hpp"
|
||||
|
||||
#include <timemory/components/base/declaration.hpp>
|
||||
#include <timemory/mpl/concepts.hpp>
|
||||
#include <timemory/unwind/cache.hpp>
|
||||
#include <timemory/unwind/processed_entry.hpp>
|
||||
#include <timemory/unwind/stack.hpp>
|
||||
|
||||
#include <array>
|
||||
#include <chrono>
|
||||
#include <cstddef>
|
||||
#include <cstdint>
|
||||
#include <set>
|
||||
#include <vector>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace component
|
||||
{
|
||||
struct callchain : comp::empty_base
|
||||
{
|
||||
static constexpr size_t stack_depth = OMNITRACE_MAX_UNWIND_DEPTH;
|
||||
|
||||
struct record
|
||||
{
|
||||
uint64_t timestamp = 0;
|
||||
container::static_vector<uintptr_t, stack_depth> data = {};
|
||||
|
||||
bool operator<(const record& rhs) const;
|
||||
};
|
||||
|
||||
using cache_type = tim::unwind::cache;
|
||||
using entry_type = tim::unwind::processed_entry;
|
||||
using value_type = void;
|
||||
using data_t = container::static_vector<record, 64>;
|
||||
using entry_vec_t = std::vector<entry_type>;
|
||||
using ts_entry_vec_t = std::pair<uint64_t, entry_vec_t>;
|
||||
|
||||
static std::string label();
|
||||
static std::string description();
|
||||
|
||||
callchain() = default;
|
||||
~callchain() = default;
|
||||
callchain(const callchain&) = default;
|
||||
callchain(callchain&&) noexcept = default;
|
||||
|
||||
callchain& operator=(const callchain&) = default;
|
||||
callchain& operator=(callchain&&) noexcept = default;
|
||||
|
||||
static std::vector<ts_entry_vec_t> filter_and_patch(
|
||||
const std::vector<ts_entry_vec_t>&);
|
||||
|
||||
static void start();
|
||||
static void stop();
|
||||
|
||||
void sample(int = -1);
|
||||
bool empty() const;
|
||||
size_t size() const;
|
||||
std::vector<ts_entry_vec_t> get() const;
|
||||
data_t get_data() const { return m_data; }
|
||||
|
||||
private:
|
||||
data_t m_data = {};
|
||||
};
|
||||
} // namespace component
|
||||
} // namespace omnitrace
|
||||
@@ -89,6 +89,12 @@ invoke_exit_gotcha(const exit_gotcha::gotcha_data& _data, FuncT _func, Args... _
|
||||
JOIN(", ", _args...).c_str(), get_exe_name().c_str());
|
||||
}
|
||||
|
||||
if(_exit_info.is_known && _exit_info.exit_code != 0)
|
||||
{
|
||||
OMNITRACE_BASIC_VERBOSE(0, "%s exiting with non-zero exit code: %i...\n",
|
||||
get_exe_name().c_str(), _exit_info.exit_code);
|
||||
}
|
||||
|
||||
(*_func)(_args...);
|
||||
}
|
||||
} // namespace
|
||||
@@ -106,6 +112,7 @@ exit_gotcha::operator()(const gotcha_data& _data, exit_func_t _func, int _ec) co
|
||||
void
|
||||
exit_gotcha::operator()(const gotcha_data& _data, abort_func_t _func) const
|
||||
{
|
||||
_exit_info = { true, false, SIGABRT };
|
||||
invoke_exit_gotcha(_data, _func);
|
||||
}
|
||||
|
||||
|
||||
@@ -98,7 +98,8 @@ void
|
||||
pthread_mutex_gotcha::configure()
|
||||
{
|
||||
pthread_mutex_gotcha_t::get_initializer() = []() {
|
||||
if(!tim::settings::enabled()) return;
|
||||
if(!tim::settings::enabled() || get_use_causal()) return;
|
||||
|
||||
if(config::get_trace_thread_locks())
|
||||
{
|
||||
pthread_mutex_gotcha_t::configure(
|
||||
@@ -155,7 +156,7 @@ pthread_mutex_gotcha::configure()
|
||||
"pthread_spin_unlock" });
|
||||
}
|
||||
|
||||
if(config::get_trace_thread_join() && !get_use_causal())
|
||||
if(config::get_trace_thread_join())
|
||||
{
|
||||
pthread_mutex_gotcha_t::configure(
|
||||
comp::gotcha_config<12, int, pthread_t, void**>{ "pthread_join" });
|
||||
|
||||
@@ -228,100 +228,6 @@ entry::get_cost() const
|
||||
return 0;
|
||||
}
|
||||
|
||||
int64_t
|
||||
entry::get_overlap(const entry& rhs) const
|
||||
{
|
||||
if(begin_ns >= rhs.end_ns || end_ns >= rhs.begin_ns) // no overlap
|
||||
return 0;
|
||||
else if(begin_ns >= rhs.begin_ns && end_ns <= rhs.end_ns) // inclusive to rhs
|
||||
return get_cost();
|
||||
else if(begin_ns <= rhs.begin_ns && end_ns >= rhs.end_ns) // rhs is inclusive
|
||||
return rhs.get_cost();
|
||||
else if(begin_ns <= rhs.begin_ns && end_ns <= rhs.end_ns) // at beginning
|
||||
return (end_ns - rhs.begin_ns);
|
||||
else if(begin_ns >= rhs.begin_ns && end_ns >= rhs.end_ns) // at end
|
||||
return (rhs.end_ns - begin_ns);
|
||||
else
|
||||
{
|
||||
OMNITRACE_PRINT("Warning! entry::get_overlap(entry, tid) "
|
||||
"could not determine the overlap :: %s\n",
|
||||
JOIN("", *this).c_str());
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
int64_t
|
||||
entry::get_independent(const entry& rhs) const
|
||||
{
|
||||
if(begin_ns >= rhs.end_ns || end_ns >= rhs.begin_ns) // no overlap
|
||||
return get_cost();
|
||||
else if(begin_ns >= rhs.begin_ns && end_ns <= rhs.end_ns) // inclusive to rhs
|
||||
return 0;
|
||||
else if(begin_ns <= rhs.begin_ns && end_ns >= rhs.end_ns) // rhs is inclusive
|
||||
return get_cost() - rhs.get_cost();
|
||||
else if(begin_ns <= rhs.begin_ns && end_ns <= rhs.end_ns) // at beginning
|
||||
return (rhs.begin_ns - begin_ns);
|
||||
else if(begin_ns >= rhs.begin_ns && end_ns >= rhs.end_ns) // at end
|
||||
return (end_ns - rhs.end_ns);
|
||||
else
|
||||
{
|
||||
OMNITRACE_PRINT("Warning! entry::get_independent(entry, tid) "
|
||||
"could not determine the overlap :: %s\n",
|
||||
JOIN("", *this).c_str());
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
int64_t
|
||||
entry::get_overlap(const entry& rhs, int32_t _devid, int32_t _pid, int64_t _tid) const
|
||||
{
|
||||
if(_devid != this->devid || _pid != this->pid) // different device or process id
|
||||
return 0;
|
||||
|
||||
if(!is_delta(*this, __FUNCTION__)) return 0;
|
||||
if(!is_delta(rhs, __FUNCTION__)) return 0;
|
||||
|
||||
if(_tid < 0 || (this->tid == _tid && rhs.tid == _tid)) // all threads or same thread
|
||||
return get_overlap(rhs);
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
int64_t
|
||||
entry::get_independent(const entry& rhs, int32_t _devid, int32_t _pid, int64_t _tid) const
|
||||
{
|
||||
if(!is_delta(*this, __FUNCTION__)) return 0;
|
||||
if(!is_delta(rhs, __FUNCTION__)) return 0;
|
||||
|
||||
if(_devid != this->devid || _pid != this->pid) // different device or process id
|
||||
return get_independent(rhs);
|
||||
else if(_tid < 0 ||
|
||||
(this->tid == _tid && rhs.tid == _tid)) // all threads or same thread
|
||||
return get_independent(rhs);
|
||||
else if(this->tid == _tid && rhs.tid != _tid) // rhs is on different thread
|
||||
return get_cost();
|
||||
return 0;
|
||||
}
|
||||
|
||||
bool
|
||||
entry::is_bounded(const entry& rhs) const
|
||||
{
|
||||
// ignores thread
|
||||
return !(begin_ns < rhs.begin_ns || end_ns > rhs.end_ns);
|
||||
}
|
||||
|
||||
bool
|
||||
entry::is_bounded(const entry& rhs, int32_t _devid, int32_t _pid, int64_t _tid) const
|
||||
{
|
||||
if(_devid != this->devid || _pid != this->pid) // different device or process id
|
||||
return false;
|
||||
|
||||
if(tid == _tid && rhs.tid == _tid) // all threads or same thread
|
||||
return !(begin_ns < rhs.begin_ns || end_ns > rhs.end_ns);
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
void
|
||||
entry::write(std::ostream& _os) const
|
||||
{
|
||||
@@ -354,19 +260,6 @@ entry::write(std::ostream& _os) const
|
||||
_os << ", hash: " << hash << " :: " << tim::demangle(tim::get_hash_identifier(hash));
|
||||
}
|
||||
|
||||
bool
|
||||
entry::is_delta(const entry& _v, const std::string_view& _ctx)
|
||||
{
|
||||
if(_v.phase != Phase::DELTA)
|
||||
{
|
||||
OMNITRACE_CT_DEBUG(
|
||||
"Warning! Invalid phase for entry. entry::%s requires Phase::DELTA :: %s\n",
|
||||
_ctx.data(), JOIN("", _v).c_str());
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//
|
||||
// CALL CHAIN
|
||||
@@ -382,16 +275,6 @@ call_chain::operator==(const call_chain& rhs) const
|
||||
return true;
|
||||
}
|
||||
|
||||
size_t
|
||||
call_chain::get_hash() const
|
||||
{
|
||||
if(empty()) return 0;
|
||||
int64_t _hash = this->at(0).get_hash();
|
||||
for(size_t i = 1; i < this->size(); ++i)
|
||||
_hash = get_combined_hash(_hash, at(i).get_hash());
|
||||
return _hash;
|
||||
}
|
||||
|
||||
int64_t
|
||||
call_chain::get_cost(int64_t _tid) const
|
||||
{
|
||||
@@ -411,35 +294,6 @@ call_chain::get_cost(int64_t _tid) const
|
||||
return _cost;
|
||||
}
|
||||
|
||||
int64_t
|
||||
call_chain::get_overlap(int32_t _devid, int32_t _pid, int64_t _tid) const
|
||||
{
|
||||
int64_t _cost = 0;
|
||||
auto itr = this->begin();
|
||||
auto nitr = ++this->begin();
|
||||
for(; nitr != this->end(); ++nitr, ++itr)
|
||||
_cost += nitr->get_overlap(*itr, _devid, _pid, _tid);
|
||||
return _cost;
|
||||
}
|
||||
|
||||
int64_t
|
||||
call_chain::get_independent(int32_t _devid, int32_t _pid, int64_t _tid) const
|
||||
{
|
||||
int64_t _cost = 0;
|
||||
auto itr = this->begin();
|
||||
auto nitr = ++this->begin();
|
||||
for(; nitr != this->end(); ++nitr, ++itr)
|
||||
_cost += itr->get_independent(*nitr, _devid, _pid, _tid);
|
||||
return _cost;
|
||||
}
|
||||
|
||||
std::vector<call_chain>&
|
||||
call_chain::get_top_chains()
|
||||
{
|
||||
static std::vector<call_chain> _v{};
|
||||
return _v;
|
||||
}
|
||||
|
||||
template <Device DevT>
|
||||
void
|
||||
call_chain::generate_perfetto(::perfetto::Track _track, std::set<entry>& _used) const
|
||||
|
||||
@@ -102,20 +102,8 @@ struct OMNITRACE_ATTRIBUTE(packed) entry
|
||||
|
||||
int64_t get_cost() const;
|
||||
|
||||
bool is_bounded(const entry& rhs) const;
|
||||
int64_t get_overlap(const entry& rhs) const;
|
||||
int64_t get_independent(const entry& rhs) const;
|
||||
|
||||
int64_t get_overlap(const entry& rhs, int32_t _devid, int32_t _pid,
|
||||
int64_t _tid) const;
|
||||
int64_t get_independent(const entry& rhs, int32_t _devid, int32_t _pid,
|
||||
int64_t _tid) const;
|
||||
bool is_bounded(const entry& rhs, int32_t _devid, int32_t _pid, int64_t _tid) const;
|
||||
|
||||
void write(std::ostream& _os) const;
|
||||
|
||||
static bool is_delta(const entry&, const std::string_view&);
|
||||
|
||||
friend std::ostream& operator<<(std::ostream& _os, const entry& _v)
|
||||
{
|
||||
_v.write(_os);
|
||||
@@ -222,11 +210,7 @@ struct call_chain : private std::vector<entry>
|
||||
using base_type::reserve;
|
||||
using base_type::size;
|
||||
|
||||
size_t get_hash() const;
|
||||
int64_t get_cost(int64_t _tid = -1) const;
|
||||
int64_t get_overlap(int32_t _devid, int32_t _pid, int64_t _tid = -1) const;
|
||||
int64_t get_independent(int32_t _devid, int32_t _pid, int64_t _tid = -1) const;
|
||||
static std::vector<call_chain>& get_top_chains();
|
||||
|
||||
bool operator==(const call_chain& rhs) const;
|
||||
bool operator!=(const call_chain& rhs) const { return !(*this == rhs); }
|
||||
|
||||
+184
-78
@@ -20,18 +20,25 @@
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
// SOFTWARE.
|
||||
|
||||
#include "library/causal/perf.hpp"
|
||||
#include "library/perf.hpp"
|
||||
#include "core/debug.hpp"
|
||||
#include "core/locking.hpp"
|
||||
#include "core/state.hpp"
|
||||
#include "core/timemory.hpp"
|
||||
#include "core/utility.hpp"
|
||||
#include "library/thread_data.hpp"
|
||||
|
||||
#include <timemory/log/logger.hpp>
|
||||
#include <timemory/log/macros.hpp>
|
||||
#include <timemory/units.hpp>
|
||||
|
||||
#include <asm/unistd.h>
|
||||
#include <ctime>
|
||||
#include <fcntl.h>
|
||||
#include <linux/perf_event.h>
|
||||
#include <mutex>
|
||||
#include <poll.h>
|
||||
#include <regex>
|
||||
#include <signal.h>
|
||||
#include <stdint.h>
|
||||
#include <stdio.h>
|
||||
@@ -42,10 +49,26 @@
|
||||
#include <sys/types.h>
|
||||
#include <unistd.h>
|
||||
|
||||
#if !defined(OMNITRACE_RETURN_ERROR_MSG)
|
||||
# define OMNITRACE_RETURN_ERROR_MSG(COND, ...) \
|
||||
if((COND)) \
|
||||
{ \
|
||||
auto _msg_ss = std::stringstream{}; \
|
||||
_msg_ss << __VA_ARGS__; \
|
||||
return std::optional<std::string>{ _msg_ss.str() }; \
|
||||
}
|
||||
#endif
|
||||
|
||||
#if !defined(OMNITRACE_FATAL)
|
||||
# define OMNITRACE_FATAL TIMEMORY_FATAL
|
||||
#endif
|
||||
|
||||
#if !defined(OMNITRACE_ASSERT)
|
||||
# define OMNITRACE_ASSERT(COND) (COND) ? ::tim::log::base() : TIMEMORY_FATAL
|
||||
#endif
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace causal
|
||||
{
|
||||
namespace perf
|
||||
{
|
||||
namespace
|
||||
@@ -75,7 +98,7 @@ perf_event::perf_event(perf_event&& rhs) noexcept
|
||||
if(m_fd != -1 && m_fd != rhs.m_fd)
|
||||
{
|
||||
::close(m_fd);
|
||||
TIMEMORY_INFO << "Closed perf event fd " << m_fd;
|
||||
OMNITRACE_VERBOSE(1, "Closed perf event fd %li\n", m_fd);
|
||||
}
|
||||
|
||||
if(m_mapping != nullptr && m_mapping != rhs.m_mapping) munmap(m_mapping, sizes.mmap);
|
||||
@@ -100,6 +123,7 @@ perf_event::~perf_event() { close(); }
|
||||
perf_event&
|
||||
perf_event::operator=(perf_event&& rhs) noexcept
|
||||
{
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
if(&rhs == this) return *this;
|
||||
|
||||
// Release resources if the current perf_event is initialized and not equal to this
|
||||
@@ -123,11 +147,13 @@ perf_event::operator=(perf_event&& rhs) noexcept
|
||||
}
|
||||
|
||||
// Open a perf_event file and map it (if sampling is enabled)
|
||||
bool
|
||||
std::optional<std::string>
|
||||
perf_event::open(struct perf_event_attr& _pe, pid_t _pid, int _cpu)
|
||||
{
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
m_sample_type = _pe.sample_type;
|
||||
m_read_format = _pe.read_format;
|
||||
m_batch_size = _pe.wakeup_events;
|
||||
|
||||
// Set some mandatory fields
|
||||
_pe.size = sizeof(struct perf_event_attr);
|
||||
@@ -139,27 +165,22 @@ perf_event::open(struct perf_event_attr& _pe, pid_t _pid, int _cpu)
|
||||
{
|
||||
std::string path = "/proc/sys/kernel/perf_event_paranoid";
|
||||
|
||||
FILE* file = fopen(path.c_str(), "r");
|
||||
OMNITRACE_PREFER(file != nullptr)
|
||||
<< "Failed to open " << path << ": " << strerror(errno);
|
||||
auto file = std::ifstream{ path.c_str() };
|
||||
|
||||
if(file == nullptr) return false;
|
||||
OMNITRACE_RETURN_ERROR_MSG(!file,
|
||||
"Failed to open " << path << ": " << strerror(errno));
|
||||
|
||||
char value_str[3];
|
||||
int res = fread(value_str, sizeof(value_str), 1, file);
|
||||
TIMEMORY_REQUIRE(res != -1)
|
||||
<< "Failed to read from " << path << ": " << strerror(errno);
|
||||
int value = 4;
|
||||
file >> value;
|
||||
|
||||
if(res == -1) return false;
|
||||
OMNITRACE_RETURN_ERROR_MSG(file.bad(), "Failed to read from " << path << ": "
|
||||
<< strerror(errno));
|
||||
|
||||
value_str[2] = '\0';
|
||||
int value = atoi(value_str);
|
||||
|
||||
TIMEMORY_WARNING << "Failed to open perf event. "
|
||||
<< "Consider tweaking " << path << " to 2 or less "
|
||||
<< "(current value is " << value << "), "
|
||||
<< "or run omnitrace as a privileged user (with CAP_SYS_ADMIN).";
|
||||
return false;
|
||||
OMNITRACE_RETURN_ERROR_MSG(
|
||||
true, "Failed to open perf event. Consider tweaking "
|
||||
<< path << " to 2 or less "
|
||||
<< "(current value is " << value << "), "
|
||||
<< "or run omnitrace as a privileged user (with CAP_SYS_ADMIN).");
|
||||
}
|
||||
|
||||
// If sampling, map the perf event file
|
||||
@@ -168,78 +189,107 @@ perf_event::open(struct perf_event_attr& _pe, pid_t _pid, int _cpu)
|
||||
void* ring_buffer =
|
||||
mmap(nullptr, sizes.mmap, PROT_READ | PROT_WRITE, MAP_SHARED, m_fd, 0);
|
||||
|
||||
OMNITRACE_PREFER(ring_buffer != MAP_FAILED)
|
||||
<< "Mapping perf_event ring buffer failed. "
|
||||
<< "Make sure the current user has permission "
|
||||
"to invoke the perf tool, and that "
|
||||
<< "the program being profiled does not use "
|
||||
"an excessive number of threads (>1000).\n";
|
||||
|
||||
if(ring_buffer == MAP_FAILED) return false;
|
||||
OMNITRACE_RETURN_ERROR_MSG(
|
||||
ring_buffer == MAP_FAILED,
|
||||
"Mapping perf_event ring buffer failed. Make sure the current user has "
|
||||
"permission to invoke the perf tool, and that the program being profiled "
|
||||
"does not use an excessive number of threads (>1000)");
|
||||
|
||||
m_mapping = reinterpret_cast<struct perf_event_mmap_page*>(ring_buffer);
|
||||
}
|
||||
|
||||
return true;
|
||||
return std::optional<std::string>{};
|
||||
}
|
||||
|
||||
bool
|
||||
std::optional<std::string>
|
||||
perf_event::open(double _freq, uint32_t _batch_size, pid_t _pid, int _cpu)
|
||||
{
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
uint64_t _period = (1.0 / _freq) * units::sec;
|
||||
struct perf_event_attr _pe;
|
||||
|
||||
if(_batch_size > 0)
|
||||
m_batch_size = _batch_size;
|
||||
else
|
||||
_batch_size = m_batch_size;
|
||||
|
||||
memset(&_pe, 0, sizeof(_pe));
|
||||
_pe.type = PERF_TYPE_SOFTWARE;
|
||||
_pe.config = PERF_COUNT_SW_TASK_CLOCK;
|
||||
_pe.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_CALLCHAIN;
|
||||
_pe.sample_period = _period;
|
||||
_pe.wakeup_events = _batch_size;
|
||||
_pe.sample_period = _period;
|
||||
_pe.wakeup_events = _batch_size; // This is ignored on linux 3.13 (why?)
|
||||
_pe.exclude_idle = 1;
|
||||
_pe.exclude_kernel = 1;
|
||||
_pe.precise_ip = 0;
|
||||
_pe.disabled = 1;
|
||||
// potential additions
|
||||
_pe.inherit = 0;
|
||||
_pe.exclude_hv = 1;
|
||||
_pe.exclude_callchain_kernel = 1;
|
||||
_pe.use_clockid = 1;
|
||||
_pe.clockid = CLOCK_REALTIME;
|
||||
// _pe.precise_ip = 0;
|
||||
// _pe.exclusive = 1;
|
||||
// _pe.pinned = 1;
|
||||
|
||||
return open(_pe, _pid, _cpu);
|
||||
}
|
||||
|
||||
/// Read event count
|
||||
long
|
||||
perf_event::get_fileno() const
|
||||
{
|
||||
return m_fd;
|
||||
}
|
||||
|
||||
/// Read event count
|
||||
uint64_t
|
||||
perf_event::get_count() const
|
||||
{
|
||||
uint64_t count;
|
||||
TIMEMORY_REQUIRE(read(m_fd, &count, sizeof(uint64_t)) == sizeof(uint64_t))
|
||||
OMNITRACE_REQUIRE(read(m_fd, &count, sizeof(uint64_t)) == sizeof(uint64_t))
|
||||
<< "Failed to read event count from perf_event file";
|
||||
return count;
|
||||
}
|
||||
|
||||
/// Start counting events
|
||||
void
|
||||
bool
|
||||
perf_event::start() const
|
||||
{
|
||||
if(m_fd != -1)
|
||||
{
|
||||
TIMEMORY_REQUIRE(ioctl(m_fd, PERF_EVENT_IOC_ENABLE, 0) != -1)
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
OMNITRACE_REQUIRE(ioctl(m_fd, PERF_EVENT_IOC_ENABLE, 0) != -1)
|
||||
<< "Failed to start perf event: " << strerror(errno);
|
||||
}
|
||||
return (m_fd != -1);
|
||||
}
|
||||
|
||||
/// Stop counting events
|
||||
void
|
||||
bool
|
||||
perf_event::stop() const
|
||||
{
|
||||
if(m_fd != -1)
|
||||
{
|
||||
TIMEMORY_REQUIRE(ioctl(m_fd, PERF_EVENT_IOC_DISABLE, 0) != -1)
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
OMNITRACE_REQUIRE(ioctl(m_fd, PERF_EVENT_IOC_DISABLE, 0) != -1)
|
||||
<< "Failed to stop perf event: " << strerror(errno) << " (" << m_fd << ")";
|
||||
}
|
||||
return (m_fd != -1);
|
||||
}
|
||||
|
||||
bool
|
||||
perf_event::is_open() const
|
||||
{
|
||||
return (m_fd != -1);
|
||||
}
|
||||
|
||||
void
|
||||
perf_event::close()
|
||||
{
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
stop();
|
||||
|
||||
if(m_fd != -1)
|
||||
{
|
||||
::close(m_fd);
|
||||
@@ -256,22 +306,25 @@ perf_event::close()
|
||||
void
|
||||
perf_event::set_ready_signal(int sig) const
|
||||
{
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
// Set the perf_event file to async
|
||||
TIMEMORY_REQUIRE(fcntl(m_fd, F_SETFL, fcntl(m_fd, F_GETFL, 0) | O_ASYNC) != -1)
|
||||
OMNITRACE_REQUIRE(fcntl(m_fd, F_SETFL, fcntl(m_fd, F_GETFL, 0) | O_ASYNC) != -1)
|
||||
<< "failed to set perf_event file to async mode";
|
||||
|
||||
// Set the notification signal for the perf file
|
||||
TIMEMORY_REQUIRE(fcntl(m_fd, F_SETSIG, sig) != -1)
|
||||
OMNITRACE_REQUIRE(fcntl(m_fd, F_SETSIG, sig) != -1)
|
||||
<< "failed to set perf_event file signal";
|
||||
|
||||
// Set the current thread as the owner of the file (to target signal delivery)
|
||||
TIMEMORY_REQUIRE(fcntl(m_fd, F_SETOWN, gettid()) != -1)
|
||||
OMNITRACE_REQUIRE(fcntl(m_fd, F_SETOWN, gettid()) != -1)
|
||||
<< "failed to set the owner of the perf_event file";
|
||||
}
|
||||
|
||||
void
|
||||
perf_event::iterator::next()
|
||||
{
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
|
||||
struct perf_event_header _hdr;
|
||||
|
||||
// Copy out the record header
|
||||
@@ -322,6 +375,8 @@ perf_event::iterator::operator!=(const iterator& other) const
|
||||
perf_event::record
|
||||
perf_event::iterator::get()
|
||||
{
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
|
||||
// Copy out the record header
|
||||
perf_event::copy_from_ring_buffer(m_mapping, m_index, _buf,
|
||||
sizeof(struct perf_event_header));
|
||||
@@ -332,7 +387,7 @@ perf_event::iterator::get()
|
||||
// Copy out the entire record
|
||||
perf_event::copy_from_ring_buffer(m_mapping, m_index, _buf, header->size);
|
||||
|
||||
return perf_event::record(m_source, header);
|
||||
return perf_event::record(&m_source, header);
|
||||
}
|
||||
|
||||
bool
|
||||
@@ -367,6 +422,8 @@ void
|
||||
perf_event::copy_from_ring_buffer(struct perf_event_mmap_page* _mapping, ptrdiff_t _index,
|
||||
void* _dest, size_t _nbytes)
|
||||
{
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
|
||||
uintptr_t _base = reinterpret_cast<uintptr_t>(_mapping) + sizes.page;
|
||||
size_t _beg_idx = _index % sizes.data;
|
||||
size_t _end_idx = _beg_idx + _nbytes;
|
||||
@@ -391,53 +448,74 @@ perf_event::copy_from_ring_buffer(struct perf_event_mmap_page* _mapping, ptrdiff
|
||||
uint64_t
|
||||
perf_event::record::get_ip() const
|
||||
{
|
||||
TIMEMORY_ASSERT(is_sample() && m_source.is_sampling(sample::ip))
|
||||
<< "Record does not have an ip field";
|
||||
OMNITRACE_ASSERT(is_sample() && m_source != nullptr &&
|
||||
m_source->is_sampling(sample::ip))
|
||||
<< "Record does not have an ip field (" << is_sample() << "|" << m_source << ")";
|
||||
return *locate_field<sample::ip, uint64_t*>();
|
||||
}
|
||||
|
||||
uint64_t
|
||||
perf_event::record::get_pid() const
|
||||
{
|
||||
TIMEMORY_ASSERT(is_sample() && m_source.is_sampling(sample::pid_tid))
|
||||
<< "Record does not have a `pid` field";
|
||||
OMNITRACE_ASSERT(is_sample() && m_source != nullptr &&
|
||||
m_source->is_sampling(sample::pid_tid))
|
||||
<< "Record does not have a `pid` field (" << is_sample() << "|" << m_source
|
||||
<< ")";
|
||||
return locate_field<sample::pid_tid, uint32_t*>()[0];
|
||||
}
|
||||
|
||||
uint64_t
|
||||
perf_event::record::get_tid() const
|
||||
{
|
||||
TIMEMORY_ASSERT(is_sample() && m_source.is_sampling(sample::pid_tid))
|
||||
<< "Record does not have a `tid` field";
|
||||
OMNITRACE_ASSERT(is_sample() && m_source != nullptr &&
|
||||
m_source->is_sampling(sample::pid_tid))
|
||||
<< "Record does not have a `tid` field (" << is_sample() << "|" << m_source
|
||||
<< ")";
|
||||
return locate_field<sample::pid_tid, uint32_t*>()[1];
|
||||
}
|
||||
|
||||
uint64_t
|
||||
perf_event::record::get_time() const
|
||||
{
|
||||
TIMEMORY_ASSERT(is_sample() && m_source.is_sampling(sample::time))
|
||||
<< "Record does not have a 'time' field";
|
||||
OMNITRACE_ASSERT(is_sample() && m_source != nullptr &&
|
||||
m_source->is_sampling(sample::time))
|
||||
<< "Record does not have a 'time' field (" << is_sample() << "|" << m_source
|
||||
<< ")";
|
||||
return *locate_field<sample::time, uint64_t*>();
|
||||
}
|
||||
|
||||
uint64_t
|
||||
perf_event::record::get_period() const
|
||||
{
|
||||
OMNITRACE_ASSERT(is_sample() && m_source != nullptr &&
|
||||
m_source->is_sampling(sample::period))
|
||||
<< "Record does not have a 'period' field (" << is_sample() << "|" << m_source
|
||||
<< ")";
|
||||
return *locate_field<sample::period, uint64_t*>();
|
||||
}
|
||||
|
||||
uint32_t
|
||||
perf_event::record::get_cpu() const
|
||||
{
|
||||
TIMEMORY_ASSERT(is_sample() && m_source.is_sampling(sample::cpu))
|
||||
<< "Record does not have a 'cpu' field";
|
||||
OMNITRACE_ASSERT(is_sample() && m_source != nullptr &&
|
||||
m_source->is_sampling(sample::cpu))
|
||||
<< "Record does not have a 'cpu' field (" << is_sample() << "|" << m_source
|
||||
<< ")";
|
||||
return *locate_field<sample::cpu, uint32_t*>();
|
||||
}
|
||||
|
||||
container::c_array<uint64_t>
|
||||
perf_event::record::get_callchain() const
|
||||
{
|
||||
TIMEMORY_ASSERT(is_sample() && m_source.is_sampling(sample::callchain))
|
||||
<< "Record does not have a callchain field";
|
||||
OMNITRACE_ASSERT(is_sample() && m_source != nullptr &&
|
||||
m_source->is_sampling(sample::callchain))
|
||||
<< "Record does not have a callchain field (" << is_sample() << "|" << m_source
|
||||
<< ")";
|
||||
|
||||
uint64_t* _base = locate_field<sample::callchain, uint64_t*>();
|
||||
uint64_t _size = *_base;
|
||||
// Advance the callchain array pointer past the size
|
||||
_base++;
|
||||
++_base;
|
||||
return container::wrap_c_array(_base, _size);
|
||||
}
|
||||
|
||||
@@ -445,6 +523,8 @@ template <sample SampleT, typename Tp>
|
||||
Tp
|
||||
perf_event::record::locate_field() const
|
||||
{
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
|
||||
uintptr_t p =
|
||||
reinterpret_cast<uintptr_t>(m_header) + sizeof(struct perf_event_header);
|
||||
|
||||
@@ -454,41 +534,45 @@ perf_event::record::locate_field() const
|
||||
|
||||
// ip
|
||||
if constexpr(SampleT == sample::ip) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::ip)) p += sizeof(uint64_t);
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::ip)) p += sizeof(uint64_t);
|
||||
|
||||
// pid, tid
|
||||
if constexpr(SampleT == sample::pid_tid) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::pid_tid)) p += sizeof(uint32_t) + sizeof(uint32_t);
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::pid_tid))
|
||||
p += sizeof(uint32_t) + sizeof(uint32_t);
|
||||
|
||||
// time
|
||||
if constexpr(SampleT == sample::time) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::time)) p += sizeof(uint64_t);
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::time)) p += sizeof(uint64_t);
|
||||
|
||||
// addr
|
||||
if constexpr(SampleT == sample::addr) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::addr)) p += sizeof(uint64_t);
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::addr)) p += sizeof(uint64_t);
|
||||
|
||||
// id
|
||||
if constexpr(SampleT == sample::id) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::id)) p += sizeof(uint64_t);
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::id)) p += sizeof(uint64_t);
|
||||
|
||||
// stream_id
|
||||
if constexpr(SampleT == sample::stream_id) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::stream_id)) p += sizeof(uint64_t);
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::stream_id))
|
||||
p += sizeof(uint64_t);
|
||||
|
||||
// cpu
|
||||
if constexpr(SampleT == sample::cpu) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::cpu)) p += sizeof(uint32_t) + sizeof(uint32_t);
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::cpu))
|
||||
p += sizeof(uint32_t) + sizeof(uint32_t);
|
||||
|
||||
// period
|
||||
if constexpr(SampleT == sample::period) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::period)) p += sizeof(uint64_t);
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::period))
|
||||
p += sizeof(uint64_t);
|
||||
|
||||
// value
|
||||
if constexpr(SampleT == sample::read) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::read))
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::read))
|
||||
{
|
||||
uint64_t read_format = m_source.get_read_format();
|
||||
uint64_t read_format = m_source->get_read_format();
|
||||
if(read_format & PERF_FORMAT_GROUP)
|
||||
{
|
||||
// Get the number of values in the read format structure
|
||||
@@ -516,15 +600,15 @@ perf_event::record::locate_field() const
|
||||
|
||||
// callchain
|
||||
if constexpr(SampleT == sample::callchain) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::callchain))
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::callchain))
|
||||
{
|
||||
uint64_t nr = *reinterpret_cast<uint64_t*>(p);
|
||||
p += sizeof(uint64_t) + nr * sizeof(uint64_t);
|
||||
p += sizeof(uint64_t) + (nr * sizeof(uint64_t));
|
||||
}
|
||||
|
||||
// raw
|
||||
if constexpr(SampleT == sample::raw) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::raw))
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::raw))
|
||||
{
|
||||
uint32_t raw_size = *reinterpret_cast<uint32_t*>(p);
|
||||
p += sizeof(uint32_t) + raw_size;
|
||||
@@ -532,24 +616,46 @@ perf_event::record::locate_field() const
|
||||
|
||||
// branch_stack
|
||||
if constexpr(SampleT == sample::branch_stack) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::branch_stack))
|
||||
TIMEMORY_FATAL << "Branch stack sampling is not supported";
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::branch_stack))
|
||||
OMNITRACE_FATAL << "Branch stack sampling is not supported";
|
||||
|
||||
// regs
|
||||
if constexpr(SampleT == sample::regs) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::regs))
|
||||
TIMEMORY_FATAL << "Register sampling is not supported";
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::regs))
|
||||
OMNITRACE_FATAL << "Register sampling is not supported";
|
||||
|
||||
// stack
|
||||
if constexpr(SampleT == sample::stack) return reinterpret_cast<Tp>(p);
|
||||
if(m_source.is_sampling(sample::stack))
|
||||
TIMEMORY_FATAL << "Stack sampling is not supported";
|
||||
if(m_source != nullptr && m_source->is_sampling(sample::stack))
|
||||
OMNITRACE_FATAL << "Stack sampling is not supported";
|
||||
|
||||
// end
|
||||
if constexpr(SampleT == sample::last) return reinterpret_cast<Tp>(p);
|
||||
|
||||
TIMEMORY_FATAL << "Unsupported sample field requested!";
|
||||
OMNITRACE_FATAL << "Unsupported sample field requested!";
|
||||
}
|
||||
|
||||
namespace
|
||||
{
|
||||
inline auto&
|
||||
get_instances()
|
||||
{
|
||||
using thread_data_t = thread_data<identity<std::unique_ptr<perf_event>>, perf_event>;
|
||||
static auto& _v = thread_data_t::instance(construct_on_init{});
|
||||
return _v;
|
||||
}
|
||||
} // namespace
|
||||
|
||||
std::unique_ptr<perf_event>&
|
||||
get_instance(int64_t _tid)
|
||||
{
|
||||
auto& _data = get_instances();
|
||||
if(static_cast<size_t>(_tid) >= _data->size())
|
||||
{
|
||||
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
|
||||
_data->resize(_tid + 1);
|
||||
}
|
||||
return _data->at(_tid);
|
||||
}
|
||||
} // namespace perf
|
||||
} // namespace causal
|
||||
} // namespace omnitrace
|
||||
+45
-89
@@ -24,79 +24,31 @@
|
||||
|
||||
#include "core/containers/c_array.hpp"
|
||||
#include "core/defines.hpp"
|
||||
#include "core/locking.hpp"
|
||||
#include "core/perf.hpp"
|
||||
|
||||
#include <timemory/backends/papi.hpp>
|
||||
|
||||
#include <cstddef>
|
||||
#include <cstdint>
|
||||
#include <functional>
|
||||
#include <linux/perf_event.h>
|
||||
#include <regex>
|
||||
#include <set>
|
||||
#include <string>
|
||||
#include <sys/types.h>
|
||||
|
||||
#if __GLIBC__ == 2 && __GLIBC_MINOR__ < 30
|
||||
# include <sys/syscall.h>
|
||||
# define gettid() syscall(SYS_gettid)
|
||||
#endif
|
||||
|
||||
// Workaround for missing hw_breakpoint.h include file:
|
||||
// This include file just defines constants used to configure watchpoint registers.
|
||||
// This will be constant across x86 systems.
|
||||
enum
|
||||
{
|
||||
HW_BREAKPOINT_X = 4
|
||||
};
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace causal
|
||||
{
|
||||
namespace perf
|
||||
{
|
||||
/// An enum class with all the available sampling data
|
||||
enum class sample : uint64_t
|
||||
{
|
||||
ip = PERF_SAMPLE_IP,
|
||||
pid_tid = PERF_SAMPLE_TID,
|
||||
time = PERF_SAMPLE_TIME,
|
||||
addr = PERF_SAMPLE_ADDR,
|
||||
id = PERF_SAMPLE_ID,
|
||||
stream_id = PERF_SAMPLE_STREAM_ID,
|
||||
cpu = PERF_SAMPLE_CPU,
|
||||
period = PERF_SAMPLE_PERIOD,
|
||||
|
||||
#if defined(PREF_SAMPLE_READ)
|
||||
read = PERF_SAMPLE_READ,
|
||||
#else
|
||||
read = 0,
|
||||
#endif
|
||||
|
||||
callchain = PERF_SAMPLE_CALLCHAIN,
|
||||
raw = PERF_SAMPLE_RAW,
|
||||
|
||||
#if defined(PERF_SAMPLE_BRANCH_STACK)
|
||||
branch_stack = PERF_SAMPLE_BRANCH_STACK,
|
||||
#else
|
||||
branch_stack = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_SAMPLE_REGS_USER)
|
||||
regs = PERF_SAMPLE_REGS_USER,
|
||||
#else
|
||||
regs = 0,
|
||||
#endif
|
||||
|
||||
#if defined(PERF_SAMPLE_STACK_USER)
|
||||
stack = PERF_SAMPLE_STACK_USER,
|
||||
#else
|
||||
stack = 0,
|
||||
#endif
|
||||
|
||||
last = PERF_SAMPLE_MAX
|
||||
};
|
||||
|
||||
struct perf_event
|
||||
{
|
||||
enum class record_type;
|
||||
static constexpr uint32_t max_batch_size = 32;
|
||||
|
||||
struct record;
|
||||
struct sample_record;
|
||||
class iterator;
|
||||
|
||||
/// Default constructor
|
||||
perf_event() = default;
|
||||
@@ -113,17 +65,27 @@ struct perf_event
|
||||
perf_event& operator=(const perf_event&) = delete;
|
||||
|
||||
/// Open a perf_event file using the given options structure
|
||||
bool open(struct perf_event_attr& pe, pid_t pid = 0, int cpu = -1);
|
||||
bool open(double, uint32_t, pid_t pid = 0, int cpu = -1);
|
||||
std::optional<std::string> open(struct perf_event_attr& pe, pid_t pid = 0,
|
||||
int cpu = -1);
|
||||
std::optional<std::string> open(double, uint32_t = 0, pid_t pid = 0, int cpu = -1);
|
||||
|
||||
/// Return file descriptor
|
||||
long get_fileno() const;
|
||||
|
||||
/// Read event count
|
||||
uint64_t get_count() const;
|
||||
|
||||
/// Get the batch size
|
||||
uint32_t get_batch_size() const { return m_batch_size; }
|
||||
|
||||
/// Start counting events and collecting samples
|
||||
void start() const;
|
||||
bool start() const;
|
||||
|
||||
/// Stop counting events
|
||||
void stop() const;
|
||||
bool stop() const;
|
||||
|
||||
/// Check if counting events and collecting samples
|
||||
bool is_open() const;
|
||||
|
||||
/// Close the perf_event file and unmap the ring buffer
|
||||
void close();
|
||||
@@ -141,33 +103,21 @@ struct perf_event
|
||||
/// Get the configuration for this perf_event's read format
|
||||
inline uint64_t get_read_format() const { return m_read_format; }
|
||||
|
||||
/// An enum to distinguish types of records in the mmapped ring buffer
|
||||
enum class record_type
|
||||
{
|
||||
mmap = PERF_RECORD_MMAP,
|
||||
lost = PERF_RECORD_LOST,
|
||||
comm = PERF_RECORD_COMM,
|
||||
exit = PERF_RECORD_EXIT,
|
||||
throttle = PERF_RECORD_THROTTLE,
|
||||
unthrottle = PERF_RECORD_UNTHROTTLE,
|
||||
fork = PERF_RECORD_FORK,
|
||||
read = PERF_RECORD_READ,
|
||||
sample = PERF_RECORD_SAMPLE,
|
||||
|
||||
#if defined(PERF_RECORD_MMAP2)
|
||||
mmap2 = PERF_RECORD_MMAP2
|
||||
#else
|
||||
mmap2 = 0
|
||||
#endif
|
||||
};
|
||||
|
||||
class iterator;
|
||||
|
||||
/// A generic record type
|
||||
struct record
|
||||
{
|
||||
friend class perf_event::iterator;
|
||||
|
||||
record() = default;
|
||||
~record() = default;
|
||||
record(const record&) = default;
|
||||
record(record&&) noexcept = default;
|
||||
record& operator=(const record&) = default;
|
||||
record& operator=(record&&) noexcept = default;
|
||||
|
||||
bool is_valid() const { return (m_source != nullptr && m_header != nullptr); }
|
||||
operator bool() const { return is_valid(); }
|
||||
|
||||
record_type get_type() const { return static_cast<record_type>(m_header->type); }
|
||||
|
||||
inline bool is_mmap() const { return get_type() == record_type::mmap; }
|
||||
@@ -188,11 +138,12 @@ struct perf_event
|
||||
uint64_t get_pid() const;
|
||||
uint64_t get_tid() const;
|
||||
uint64_t get_time() const;
|
||||
uint64_t get_period() const;
|
||||
uint32_t get_cpu() const;
|
||||
container::c_array<uint64_t> get_callchain() const;
|
||||
|
||||
private:
|
||||
record(const perf_event& source, struct perf_event_header* header)
|
||||
record(const perf_event* source, struct perf_event_header* header)
|
||||
: m_source(source)
|
||||
, m_header(header)
|
||||
{}
|
||||
@@ -200,8 +151,8 @@ struct perf_event
|
||||
template <sample SampleT, typename Tp = void*>
|
||||
Tp locate_field() const;
|
||||
|
||||
const perf_event& m_source;
|
||||
struct perf_event_header* m_header;
|
||||
const perf_event* m_source = nullptr;
|
||||
struct perf_event_header* m_header = nullptr;
|
||||
};
|
||||
|
||||
class iterator
|
||||
@@ -232,7 +183,7 @@ struct perf_event
|
||||
/// Get an iterator to the beginning of the memory mapped ring buffer
|
||||
iterator begin() { return iterator(*this, m_mapping); }
|
||||
|
||||
// Get an iterator to the end of the memory mapped ring buffer
|
||||
/// Get an iterator to the end of the memory mapped ring buffer
|
||||
iterator end() { return iterator(*this, nullptr); }
|
||||
|
||||
private:
|
||||
@@ -240,6 +191,8 @@ private:
|
||||
static void copy_from_ring_buffer(struct perf_event_mmap_page* mapping,
|
||||
ptrdiff_t index, void* dest, size_t bytes);
|
||||
|
||||
uint32_t m_batch_size = 10;
|
||||
|
||||
/// File descriptor for the perf event
|
||||
long m_fd = -1;
|
||||
|
||||
@@ -251,6 +204,9 @@ private:
|
||||
/// The read format from this perf event's configuration
|
||||
uint64_t m_read_format = 0;
|
||||
};
|
||||
|
||||
/// provides thread-local instance of perf_event
|
||||
std::unique_ptr<perf_event>&
|
||||
get_instance(int64_t _tid);
|
||||
} // namespace perf
|
||||
} // namespace causal
|
||||
} // namespace omnitrace
|
||||
تفاوت فایلی نمایش داده نمی شود زیرا این فایل بسیار بزرگ است
Diff را بارگزاری کن
@@ -29,6 +29,7 @@
|
||||
#include "library/components/backtrace.hpp"
|
||||
#include "library/components/backtrace_metrics.hpp"
|
||||
#include "library/components/backtrace_timestamp.hpp"
|
||||
#include "library/components/callchain.hpp"
|
||||
#include "library/thread_data.hpp"
|
||||
|
||||
#include <timemory/macros/language.hpp>
|
||||
@@ -43,20 +44,6 @@ namespace omnitrace
|
||||
{
|
||||
namespace sampling
|
||||
{
|
||||
using component::backtrace; // NOLINT
|
||||
using component::backtrace_cpu_clock; // NOLINT
|
||||
using component::backtrace_fraction; // NOLINT
|
||||
using component::backtrace_metrics; // NOLINT
|
||||
using component::backtrace_timestamp; // NOLINT
|
||||
using component::backtrace_wall_clock; // NOLINT
|
||||
using component::sampling_cpu_clock;
|
||||
using component::sampling_gpu_busy;
|
||||
using component::sampling_gpu_memory;
|
||||
using component::sampling_gpu_power;
|
||||
using component::sampling_gpu_temp;
|
||||
using component::sampling_percent;
|
||||
using component::sampling_wall_clock;
|
||||
|
||||
unique_ptr_t<std::set<int>>&
|
||||
get_signal_types(int64_t _tid);
|
||||
|
||||
|
||||
+17
-1420
تفاوت فایلی نمایش داده نمی شود زیرا این فایل بسیار بزرگ است
Diff را بارگزاری کن
@@ -0,0 +1,53 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# attach tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
set(_VALID_PTRACE_SCOPE OFF)
|
||||
|
||||
if(EXISTS "/proc/sys/kernel/yama/ptrace_scope")
|
||||
file(READ "/proc/sys/kernel/yama/ptrace_scope" _PTRACE_SCOPE LIMIT 1)
|
||||
|
||||
if("${_PTRACE_SCOPE}" EQUAL 0)
|
||||
set(_VALID_PTRACE_SCOPE ON)
|
||||
endif()
|
||||
else()
|
||||
omnitrace_message(
|
||||
AUTHOR_WARNING
|
||||
"Disabling attach tests. Run 'echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope' to enable attaching to process"
|
||||
)
|
||||
endif()
|
||||
|
||||
if(NOT _VALID_PTRACE_SCOPE)
|
||||
return()
|
||||
endif()
|
||||
|
||||
if(NOT TARGET parallel-overhead)
|
||||
return()
|
||||
endif()
|
||||
|
||||
add_test(
|
||||
NAME parallel-overhead-attach
|
||||
COMMAND
|
||||
${CMAKE_CURRENT_LIST_DIR}/run-omnitrace-pid.sh $<TARGET_FILE:omnitrace-instrument>
|
||||
-ME "\.c$" -E fib -e -v 1 --label return args file -l --
|
||||
$<TARGET_FILE:parallel-overhead> 30 8 1000
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
|
||||
set(_parallel_overhead_attach_environ
|
||||
"${_attach_environment}" "OMNITRACE_OUTPUT_PATH=omnitrace-tests-output"
|
||||
"OMNITRACE_OUTPUT_PREFIX=parallel-overhead-attach/")
|
||||
|
||||
set_tests_properties(
|
||||
parallel-overhead-attach
|
||||
PROPERTIES ENVIRONMENT
|
||||
"${_parallel_overhead_attach_environ}"
|
||||
TIMEOUT
|
||||
300
|
||||
LABELS
|
||||
"parallel-overhead;attach"
|
||||
PASS_REGULAR_EXPRESSION
|
||||
"Outputting.*(perfetto-trace.proto).*Outputting.*(wall_clock.txt)"
|
||||
FAIL_REGULAR_EXPRESSION
|
||||
"Dyninst was unable to attach to the specified process")
|
||||
@@ -0,0 +1,180 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# causal profiling tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
omnitrace_add_causal_test(
|
||||
NAME cpu-omni-func
|
||||
TARGET causal-cpu-omni
|
||||
RUN_ARGS 70 10 432525 1000000000
|
||||
CAUSAL_MODE "function"
|
||||
CAUSAL_PASS_REGEX
|
||||
"Starting causal experiment #1(.*)causal/experiments.json(.*)causal/experiments.coz"
|
||||
)
|
||||
|
||||
omnitrace_add_causal_test(
|
||||
SKIP_BASELINE
|
||||
NAME cpu-omni-func-ndebug
|
||||
TARGET causal-cpu-omni-ndebug
|
||||
RUN_ARGS 70 10 432525 1000000000
|
||||
CAUSAL_MODE "function"
|
||||
CAUSAL_PASS_REGEX
|
||||
"Starting causal experiment #1(.*)causal/experiments.json(.*)causal/experiments.coz"
|
||||
)
|
||||
|
||||
omnitrace_add_causal_test(
|
||||
SKIP_BASELINE
|
||||
NAME cpu-omni-line
|
||||
TARGET causal-cpu-omni
|
||||
RUN_ARGS 70 10 432525 1000000000
|
||||
CAUSAL_MODE "line"
|
||||
CAUSAL_PASS_REGEX
|
||||
"Starting causal experiment #1(.*)causal/experiments.json(.*)causal/experiments.coz"
|
||||
)
|
||||
|
||||
omnitrace_add_causal_test(
|
||||
NAME both-omni-func
|
||||
TARGET causal-both-omni
|
||||
RUN_ARGS 70 10 432525 400000000
|
||||
CAUSAL_MODE "function"
|
||||
CAUSAL_ARGS
|
||||
-n
|
||||
2
|
||||
-w
|
||||
1
|
||||
-d
|
||||
3
|
||||
--monochrome
|
||||
-g
|
||||
${CMAKE_BINARY_DIR}/omnitrace-tests-config/causal-both-omni-func
|
||||
-l
|
||||
causal-both-omni
|
||||
-v
|
||||
3
|
||||
-b
|
||||
timer
|
||||
ENVIRONMENT "OMNITRACE_STRICT_CONFIG=OFF"
|
||||
CAUSAL_PASS_REGEX
|
||||
"Starting causal experiment #1(.*)causal/experiments.json(.*)causal/experiments.coz"
|
||||
)
|
||||
|
||||
omnitrace_add_causal_test(
|
||||
NAME lulesh-func
|
||||
TARGET lulesh-omni
|
||||
RUN_ARGS -i 35 -s 50 -p
|
||||
CAUSAL_MODE "function"
|
||||
CAUSAL_ARGS -s 0,10,25,50,75
|
||||
CAUSAL_PASS_REGEX
|
||||
"Starting causal experiment #1(.*)causal/experiments.json(.*)causal/experiments.coz"
|
||||
)
|
||||
|
||||
omnitrace_add_causal_test(
|
||||
SKIP_BASELINE
|
||||
NAME lulesh-func-ndebug
|
||||
TARGET lulesh-omni-ndebug
|
||||
RUN_ARGS -i 35 -s 50 -p
|
||||
CAUSAL_MODE "function"
|
||||
CAUSAL_ARGS -s 0,10,25,50,75
|
||||
CAUSAL_PASS_REGEX
|
||||
"Starting causal experiment #1(.*)causal/experiments.json(.*)causal/experiments.coz"
|
||||
)
|
||||
|
||||
omnitrace_add_causal_test(
|
||||
SKIP_BASELINE
|
||||
NAME lulesh-line
|
||||
TARGET lulesh-omni
|
||||
RUN_ARGS -i 35 -s 50 -p
|
||||
CAUSAL_MODE "line"
|
||||
CAUSAL_ARGS -s 0,10,25,50,75 -S lulesh.cc
|
||||
CAUSAL_PASS_REGEX
|
||||
"Starting causal experiment #1(.*)causal/experiments.json(.*)causal/experiments.coz"
|
||||
)
|
||||
|
||||
# set(_causal_e2e_exe_args 80 100 432525 100000000) set(_causal_e2e_exe_args 80 12 432525
|
||||
# 500000000)
|
||||
set(_causal_e2e_exe_args 80 50 432525 100000000)
|
||||
set(_causal_common_args
|
||||
"-n 5 -e -s 0 10 20 30 -B $<TARGET_FILE_BASE_NAME:causal-cpu-omni>")
|
||||
|
||||
macro(
|
||||
causal_e2e_args_and_validation
|
||||
_NAME
|
||||
_TEST
|
||||
_MODE
|
||||
_EXPER
|
||||
_V10 # expected value for virtual speedup of 15
|
||||
_V20
|
||||
_V30
|
||||
_TOL # tolerance for virtual speedup
|
||||
)
|
||||
# arguments to omnitrace-causal
|
||||
set(${_NAME}_args "${_causal_common_args} ${_MODE} ${_EXPER}")
|
||||
|
||||
# arguments to validate-causal-json.py
|
||||
set(${_NAME}_valid
|
||||
"-n 0 -i omnitrace-tests-output/causal-cpu-omni-${_TEST}-e2e/causal/experiments.json -v ${_EXPER} $<TARGET_FILE_BASE_NAME:causal-cpu-omni> 10 ${_V10} ${_TOL} ${_EXPER} $<TARGET_FILE_BASE_NAME:causal-cpu-omni> 20 ${_V20} ${_TOL} ${_EXPER} $<TARGET_FILE_BASE_NAME:causal-cpu-omni> 30 ${_V30} ${_TOL}"
|
||||
)
|
||||
|
||||
# patch string for command-line
|
||||
string(REPLACE " " ";" ${_NAME}_args "${${_NAME}_args}")
|
||||
string(REPLACE " " ";" ${_NAME}_valid "${${_NAME}_valid}")
|
||||
endmacro()
|
||||
|
||||
causal_e2e_args_and_validation(_causal_slow_func slow-func "-F" "cpu_slow_func" 10 20 20
|
||||
5)
|
||||
causal_e2e_args_and_validation(_causal_fast_func fast-func "-F" "cpu_fast_func" 0 0 0 5)
|
||||
causal_e2e_args_and_validation(_causal_line_100 line-100 "-S" "causal.cpp:100" 10 20 20 5)
|
||||
causal_e2e_args_and_validation(_causal_line_110 line-110 "-S" "causal.cpp:110" 0 0 0 5)
|
||||
|
||||
omnitrace_add_causal_test(
|
||||
SKIP_BASELINE
|
||||
NAME cpu-omni-slow-func-e2e
|
||||
TARGET causal-cpu-omni
|
||||
LABELS "causal-e2e"
|
||||
RUN_ARGS ${_causal_e2e_exe_args}
|
||||
CAUSAL_MODE "func"
|
||||
CAUSAL_ARGS ${_causal_slow_func_args}
|
||||
CAUSAL_VALIDATE_ARGS ${_causal_slow_func_valid}
|
||||
CAUSAL_PASS_REGEX
|
||||
"Starting causal experiment #1(.*)causal/experiments.json(.*)causal/experiments.coz"
|
||||
PROPERTIES PROCESSORS 2 PROCESSOR_AFFINITY OFF)
|
||||
|
||||
omnitrace_add_causal_test(
|
||||
SKIP_BASELINE
|
||||
NAME cpu-omni-fast-func-e2e
|
||||
TARGET causal-cpu-omni
|
||||
LABELS "causal-e2e"
|
||||
RUN_ARGS ${_causal_e2e_exe_args}
|
||||
CAUSAL_MODE "func"
|
||||
CAUSAL_ARGS ${_causal_fast_func_args}
|
||||
CAUSAL_VALIDATE_ARGS ${_causal_fast_func_valid}
|
||||
CAUSAL_PASS_REGEX
|
||||
"Starting causal experiment #1(.*)causal/experiments.json(.*)causal/experiments.coz"
|
||||
PROPERTIES PROCESSORS 2 PROCESSOR_AFFINITY OFF)
|
||||
|
||||
omnitrace_add_causal_test(
|
||||
SKIP_BASELINE
|
||||
NAME cpu-omni-line-100-e2e
|
||||
TARGET causal-cpu-omni
|
||||
LABELS "causal-e2e"
|
||||
RUN_ARGS ${_causal_e2e_exe_args}
|
||||
CAUSAL_MODE "line"
|
||||
CAUSAL_ARGS ${_causal_line_100_args}
|
||||
CAUSAL_VALIDATE_ARGS ${_causal_line_100_valid}
|
||||
CAUSAL_PASS_REGEX
|
||||
"Starting causal experiment #1(.*)causal/experiments.json(.*)causal/experiments.coz"
|
||||
PROPERTIES PROCESSORS 2 PROCESSOR_AFFINITY OFF)
|
||||
|
||||
omnitrace_add_causal_test(
|
||||
SKIP_BASELINE
|
||||
NAME cpu-omni-line-110-e2e
|
||||
TARGET causal-cpu-omni
|
||||
LABELS "causal-e2e"
|
||||
RUN_ARGS ${_causal_e2e_exe_args}
|
||||
CAUSAL_MODE "line"
|
||||
CAUSAL_ARGS ${_causal_line_110_args}
|
||||
CAUSAL_VALIDATE_ARGS ${_causal_line_110_valid}
|
||||
CAUSAL_PASS_REGEX
|
||||
"Starting causal experiment #1(.*)causal/experiments.json(.*)causal/experiments.coz"
|
||||
PROPERTIES PROCESSORS 2 PROCESSOR_AFFINITY OFF)
|
||||
@@ -0,0 +1,137 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# code-coverage tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_SAMPLING
|
||||
NAME code-coverage
|
||||
TARGET code-coverage
|
||||
REWRITE_ARGS
|
||||
-e
|
||||
-v
|
||||
2
|
||||
--min-instructions=4
|
||||
-E
|
||||
^std::
|
||||
-M
|
||||
coverage
|
||||
--coverage
|
||||
function
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
--min-instructions=4
|
||||
-E
|
||||
^std::
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
-M
|
||||
coverage
|
||||
--coverage
|
||||
function
|
||||
--module-restrict
|
||||
code.coverage
|
||||
LABELS "coverage;function-coverage"
|
||||
RUN_ARGS 10 ${NUM_THREADS} 1000
|
||||
ENVIRONMENT "${_base_environment}"
|
||||
RUNTIME_PASS_REGEX "(\\\[[0-9]+\\\]) code coverage :: 66.67%"
|
||||
REWRITE_RUN_PASS_REGEX "(\\\[[0-9]+\\\]) code coverage :: 66.67%")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_SAMPLING
|
||||
NAME code-coverage-hybrid
|
||||
TARGET code-coverage
|
||||
REWRITE_ARGS -e -v 2 --min-instructions=4 -E ^std:: --coverage function
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
--min-instructions=4
|
||||
-E
|
||||
^std::
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
--coverage
|
||||
function
|
||||
--module-restrict
|
||||
code.coverage
|
||||
LABELS "coverage;function-coverage;hybrid-coverage"
|
||||
RUN_ARGS 10 ${NUM_THREADS} 1000
|
||||
ENVIRONMENT "${_base_environment}"
|
||||
RUNTIME_PASS_REGEX "(\\\[[0-9]+\\\]) code coverage :: 66.67%"
|
||||
REWRITE_RUN_PASS_REGEX "(\\\[[0-9]+\\\]) code coverage :: 66.67%")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_SAMPLING
|
||||
NAME code-coverage-basic-blocks
|
||||
TARGET code-coverage
|
||||
REWRITE_ARGS
|
||||
-e
|
||||
-v
|
||||
2
|
||||
--min-instructions=4
|
||||
-E
|
||||
^std::
|
||||
-M
|
||||
coverage
|
||||
--coverage
|
||||
basic_block
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
--min-instructions=4
|
||||
-E
|
||||
^std::
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
-M
|
||||
coverage
|
||||
--coverage
|
||||
basic_block
|
||||
--module-restrict
|
||||
code.coverage
|
||||
LABELS "coverage;bb-coverage"
|
||||
RUN_ARGS 10 ${NUM_THREADS} 1000
|
||||
ENVIRONMENT "${_base_environment}"
|
||||
RUNTIME_PASS_REGEX "(\\\[[0-9]+\\\]) function coverage :: 66.67%"
|
||||
REWRITE_RUN_PASS_REGEX "(\\\[[0-9]+\\\]) function coverage :: 66.67%")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_SAMPLING
|
||||
NAME code-coverage-basic-blocks-hybrid
|
||||
TARGET code-coverage
|
||||
REWRITE_ARGS -e -v 2 --min-instructions=4 -E ^std:: --coverage basic_block
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
--min-instructions=4
|
||||
-E
|
||||
^std::
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
--coverage
|
||||
basic_block
|
||||
--module-restrict
|
||||
code.coverage
|
||||
LABELS "coverage;bb-coverage;hybrid-coverage"
|
||||
RUN_ARGS 10 ${NUM_THREADS} 1000
|
||||
ENVIRONMENT "${_base_environment}"
|
||||
RUNTIME_PASS_REGEX "(\\\[[0-9]+\\\]) function coverage :: 66.67%"
|
||||
REWRITE_RUN_PASS_REGEX "(\\\[[0-9]+\\\]) function coverage :: 66.67%")
|
||||
@@ -0,0 +1,40 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# general config file tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
file(
|
||||
WRITE ${CMAKE_CURRENT_BINARY_DIR}/invalid.cfg
|
||||
"
|
||||
OMNITRACE_CONFIG_FILE =
|
||||
FOOBAR = ON
|
||||
")
|
||||
|
||||
if(TARGET parallel-overhead)
|
||||
set(_CONFIG_TEST_EXE $<TARGET_FILE:parallel-overhead>)
|
||||
else()
|
||||
set(_CONFIG_TEST_EXE ls)
|
||||
endif()
|
||||
|
||||
add_test(
|
||||
NAME omnitrace-invalid-config
|
||||
COMMAND $<TARGET_FILE:omnitrace-instrument> -- ${_CONFIG_TEST_EXE}
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
|
||||
set_tests_properties(
|
||||
omnitrace-invalid-config
|
||||
PROPERTIES ENVIRONMENT
|
||||
"OMNITRACE_CONFIG_FILE=${CMAKE_CURRENT_BINARY_DIR}/invalid.cfg" TIMEOUT
|
||||
120 LABELS "config" WILL_FAIL ON)
|
||||
|
||||
add_test(
|
||||
NAME omnitrace-missing-config
|
||||
COMMAND $<TARGET_FILE:omnitrace-instrument> -- ${_CONFIG_TEST_EXE}
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
|
||||
set_tests_properties(
|
||||
omnitrace-missing-config
|
||||
PROPERTIES ENVIRONMENT
|
||||
"OMNITRACE_CONFIG_FILE=${CMAKE_CURRENT_BINARY_DIR}/missing.cfg" TIMEOUT
|
||||
120 LABELS "config" WILL_FAIL ON)
|
||||
@@ -0,0 +1,52 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# critical-trace tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_RUNTIME SKIP_SAMPLING
|
||||
NAME parallel-overhead-critical-trace
|
||||
TARGET parallel-overhead
|
||||
LABELS "critical-trace"
|
||||
REWRITE_ARGS
|
||||
-e
|
||||
-i
|
||||
8
|
||||
-E
|
||||
"^fib"
|
||||
-v
|
||||
2
|
||||
--print-instrumented
|
||||
functions
|
||||
RUN_ARGS 10 4 100
|
||||
ENVIRONMENT "${_critical_trace_environment}")
|
||||
|
||||
add_test(
|
||||
NAME parallel-overhead-process-critical-trace
|
||||
COMMAND
|
||||
$<TARGET_FILE:omnitrace-critical-trace>
|
||||
${PROJECT_BINARY_DIR}/omnitrace-tests-output/parallel-overhead-critical-trace-binary-rewrite/call-chain.json
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
|
||||
set(_parallel_overhead_critical_trace_environ
|
||||
"OMNITRACE_OUTPUT_PATH=omnitrace-tests-output"
|
||||
"OMNITRACE_OUTPUT_PREFIX=parallel-overhead-critical-trace/"
|
||||
"OMNITRACE_CRITICAL_TRACE_DEBUG=ON"
|
||||
"OMNITRACE_VERBOSE=4"
|
||||
"OMNITRACE_USE_PID=OFF"
|
||||
"OMNITRACE_TIME_OUTPUT=OFF")
|
||||
|
||||
set_tests_properties(
|
||||
parallel-overhead-process-critical-trace
|
||||
PROPERTIES
|
||||
ENVIRONMENT
|
||||
"${_parallel_overhead_critical_trace_environ}"
|
||||
TIMEOUT
|
||||
300
|
||||
LABELS
|
||||
"parallel-overhead;critical-trace"
|
||||
PASS_REGULAR_EXPRESSION
|
||||
"Outputting.*(critical-trace-cpu.json).*Outputting.*(critical-trace-any.json)"
|
||||
DEPENDS
|
||||
parallel-overhead-critical-trace-binary-rewrite-run)
|
||||
@@ -0,0 +1,23 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# fork tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME fork
|
||||
TARGET fork-example
|
||||
REWRITE_ARGS -e -v 2 --print-instrumented modules -i 16
|
||||
RUNTIME_ARGS -e -v 1 --label file -i 16
|
||||
ENVIRONMENT
|
||||
"${_base_environment};OMNITRACE_CRITICAL_TRACE=ON;OMNITRACE_SAMPLING_FREQ=250;OMNITRACE_SAMPLING_REALTIME=ON"
|
||||
SAMPLING_PASS_REGEX "fork.. called on PID"
|
||||
RUNTIME_PASS_REGEX "fork.. called on PID"
|
||||
REWRITE_RUN_PASS_REGEX "fork.. called on PID"
|
||||
SAMPLING_FAIL_REGEX
|
||||
"(terminate called after throwing an instance|calling abort.. in |Exit code: [1-9])"
|
||||
RUNTIME_FAIL_REGEX
|
||||
"(terminate called after throwing an instance|calling abort.. in |Exit code: [1-9])"
|
||||
REWRITE_RUN_FAIL_REGEX
|
||||
"(terminate called after throwing an instance|calling abort.. in |Exit code: [1-9])"
|
||||
)
|
||||
@@ -0,0 +1,54 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# binary-rewrite and runtime-instrumentation tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_SAMPLING SKIP_RUNTIME
|
||||
NAME rewrite-caller
|
||||
TARGET rewrite-caller
|
||||
LABELS "caller-include"
|
||||
REWRITE_ARGS
|
||||
-e
|
||||
-i
|
||||
256
|
||||
--caller-include
|
||||
"^inner"
|
||||
-v
|
||||
2
|
||||
--print-instrumented
|
||||
functions
|
||||
RUN_ARGS 17
|
||||
ENVIRONMENT "${_base_environment};OMNITRACE_COUT_OUTPUT=ON"
|
||||
BASELINE_PASS_REGEX "number of calls made = 17"
|
||||
REWRITE_PASS_REGEX "\\[function\\]\\[Forcing\\] caller-include-regex :: 'outer'"
|
||||
REWRITE_RUN_PASS_REGEX ">>> ._outer ([ \\|]+) 17")
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME parallel-overhead
|
||||
TARGET parallel-overhead
|
||||
REWRITE_ARGS -e -v 2 --min-instructions=8
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
--min-instructions=8
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
RUN_ARGS 10 ${NUM_THREADS} 1000
|
||||
ENVIRONMENT "${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_RUNTIME
|
||||
NAME parallel-overhead-locks-perfetto
|
||||
TARGET parallel-overhead-locks
|
||||
LABELS "locks"
|
||||
REWRITE_ARGS -e -v 2 --min-instructions=8
|
||||
RUN_ARGS 10 4 1000
|
||||
ENVIRONMENT
|
||||
"${_lock_environment};OMNITRACE_FLAT_PROFILE=ON;OMNITRACE_USE_TIMEMORY=OFF;OMNITRACE_USE_PERFETTO=ON;OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF"
|
||||
)
|
||||
@@ -0,0 +1,128 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# kokkos (lulesh) tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME lulesh
|
||||
TARGET lulesh
|
||||
MPI ${LULESH_USE_MPI}
|
||||
GPU ${LULESH_USE_GPU}
|
||||
NUM_PROCS 8
|
||||
LABELS "kokkos"
|
||||
REWRITE_ARGS -e -v 2 --label file line return args
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
-ME
|
||||
[==[lib(gomp|m-)]==]
|
||||
LABELS "kokkos;kokkos-profile-library"
|
||||
RUN_ARGS -i 25 -s 20 -p
|
||||
ENVIRONMENT
|
||||
"${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF;OMNITRACE_USE_KOKKOSP=ON;OMNITRACE_COUT_OUTPUT=ON;OMNITRACE_SAMPLING_FREQ=50;OMNITRACE_KOKKOSP_PREFIX=[kokkos];KOKKOS_PROFILE_LIBRARY=libomnitrace-dl.so"
|
||||
REWRITE_RUN_PASS_REGEX "\\|_\\[kokkos\\] [a-zA-Z]"
|
||||
RUNTIME_PASS_REGEX "\\|_\\[kokkos\\] [a-zA-Z]")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_RUNTIME SKIP_REWRITE
|
||||
NAME lulesh-baseline-kokkosp-libomnitrace
|
||||
TARGET lulesh
|
||||
MPI ${LULESH_USE_MPI}
|
||||
GPU ${LULESH_USE_GPU}
|
||||
NUM_PROCS 8
|
||||
LABELS "kokkos;kokkos-profile-library"
|
||||
RUN_ARGS -i 10 -s 20 -p
|
||||
ENVIRONMENT
|
||||
"${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF;OMNITRACE_USE_KOKKOSP=ON;OMNITRACE_COUT_OUTPUT=ON;OMNITRACE_SAMPLING_FREQ=50;OMNITRACE_KOKKOSP_PREFIX=[kokkos];KOKKOS_PROFILE_LIBRARY=libomnitrace.so"
|
||||
BASELINE_PASS_REGEX "\\|_\\[kokkos\\] [a-zA-Z]")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_RUNTIME SKIP_REWRITE
|
||||
NAME lulesh-baseline-kokkosp-libomnitrace-dl
|
||||
TARGET lulesh
|
||||
MPI ${LULESH_USE_MPI}
|
||||
GPU ${LULESH_USE_GPU}
|
||||
NUM_PROCS 8
|
||||
LABELS "kokkos;kokkos-profile-library"
|
||||
RUN_ARGS -i 10 -s 20 -p
|
||||
ENVIRONMENT
|
||||
"${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF;OMNITRACE_USE_KOKKOSP=ON;OMNITRACE_COUT_OUTPUT=ON;OMNITRACE_SAMPLING_FREQ=50;OMNITRACE_KOKKOSP_PREFIX=[kokkos];KOKKOS_PROFILE_LIBRARY=libomnitrace-dl.so"
|
||||
BASELINE_PASS_REGEX "\\|_\\[kokkos\\] [a-zA-Z]")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE
|
||||
NAME lulesh-kokkosp
|
||||
TARGET lulesh
|
||||
MPI ${LULESH_USE_MPI}
|
||||
GPU ${LULESH_USE_GPU}
|
||||
NUM_PROCS 8
|
||||
LABELS "kokkos"
|
||||
REWRITE_ARGS -e -v 2
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
-ME
|
||||
[==[lib(gomp|m-)]==]
|
||||
RUN_ARGS -i 10 -s 20 -p
|
||||
ENVIRONMENT
|
||||
"${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF;OMNITRACE_USE_KOKKOSP=ON")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE
|
||||
NAME lulesh-perfetto
|
||||
TARGET lulesh
|
||||
MPI ${LULESH_USE_MPI}
|
||||
GPU ${LULESH_USE_GPU}
|
||||
NUM_PROCS 8
|
||||
LABELS "kokkos;loops"
|
||||
REWRITE_ARGS -e -v 2
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
-l
|
||||
--dynamic-callsites
|
||||
--traps
|
||||
--allow-overlapping
|
||||
-ME
|
||||
[==[libgomp]==]
|
||||
RUN_ARGS -i 10 -s 20 -p
|
||||
ENVIRONMENT
|
||||
"${_perfetto_environment};OMNITRACE_CRITICAL_TRACE=OFF;OMNITRACE_USE_KOKKOSP=OFF")
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME lulesh-timemory
|
||||
TARGET lulesh
|
||||
MPI ${LULESH_USE_MPI}
|
||||
GPU ${LULESH_USE_GPU}
|
||||
NUM_PROCS 8
|
||||
LABELS "kokkos;loops"
|
||||
REWRITE_ARGS -e -v 2 -l --dynamic-callsites --traps --allow-overlapping
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
-l
|
||||
--dynamic-callsites
|
||||
-ME
|
||||
[==[libgomp]==]
|
||||
-d
|
||||
wall_clock
|
||||
peak_rss
|
||||
RUN_ARGS -i 10 -s 20 -p
|
||||
ENVIRONMENT
|
||||
"${_timemory_environment};OMNITRACE_CRITICAL_TRACE=OFF;OMNITRACE_USE_KOKKOSP=OFF"
|
||||
REWRITE_FAIL_REGEX "0 instrumented loops in procedure")
|
||||
@@ -0,0 +1,122 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# MPI tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
if(NOT OMNITRACE_USE_MPI AND NOT OMNITRACE_USE_MPI_HEADERS)
|
||||
return()
|
||||
endif()
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_RUNTIME
|
||||
NAME "mpi"
|
||||
TARGET mpi-example
|
||||
MPI ON
|
||||
NUM_PROCS 4
|
||||
REWRITE_ARGS
|
||||
-e
|
||||
-v
|
||||
2
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
--min-instructions
|
||||
0
|
||||
ENVIRONMENT "${_base_environment};OMNITRACE_VERBOSE=1"
|
||||
REWRITE_RUN_PASS_REGEX
|
||||
"(/[A-Za-z-]+/perfetto-trace-0.proto).*(/[A-Za-z-]+/wall_clock-0.txt')"
|
||||
REWRITE_RUN_FAIL_REGEX
|
||||
"(perfetto-trace|trip_count|sampling_percent|sampling_cpu_clock|sampling_wall_clock|wall_clock)-[0-9][0-9]+.(json|txt|proto)"
|
||||
)
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_RUNTIME
|
||||
NAME "mpi-flat-mpip"
|
||||
TARGET mpi-example
|
||||
MPI ON
|
||||
NUM_PROCS 4
|
||||
LABELS "mpip"
|
||||
REWRITE_ARGS
|
||||
-e
|
||||
-v
|
||||
2
|
||||
--label
|
||||
file
|
||||
line
|
||||
args
|
||||
--min-instructions
|
||||
0
|
||||
ENVIRONMENT
|
||||
"${_flat_environment};OMNITRACE_USE_SAMPLING=OFF;OMNITRACE_STRICT_CONFIG=OFF;OMNITRACE_USE_MPIP=ON"
|
||||
REWRITE_RUN_PASS_REGEX
|
||||
">>> mpi-flat-mpip.inst(.*\n.*)>>> MPI_Init_thread(.*\n.*)>>> pthread_create(.*\n.*)>>> MPI_Comm_size(.*\n.*)>>> MPI_Comm_rank(.*\n.*)>>> MPI_Barrier(.*\n.*)>>> MPI_Alltoall"
|
||||
)
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_RUNTIME
|
||||
NAME "mpi-flat"
|
||||
TARGET mpi-example
|
||||
MPI ON
|
||||
NUM_PROCS 4
|
||||
LABELS "mpip"
|
||||
REWRITE_ARGS
|
||||
-e
|
||||
-v
|
||||
2
|
||||
--label
|
||||
file
|
||||
line
|
||||
args
|
||||
--min-instructions
|
||||
0
|
||||
ENVIRONMENT "${_flat_environment};OMNITRACE_USE_SAMPLING=OFF"
|
||||
REWRITE_RUN_PASS_REGEX
|
||||
">>> mpi-flat.inst(.*\n.*)>>> MPI_Init_thread(.*\n.*)>>> pthread_create(.*\n.*)>>> MPI_Comm_size(.*\n.*)>>> MPI_Comm_rank(.*\n.*)>>> MPI_Barrier(.*\n.*)>>> MPI_Alltoall"
|
||||
)
|
||||
|
||||
set(_mpip_environment
|
||||
"OMNITRACE_USE_PERFETTO=ON"
|
||||
"OMNITRACE_USE_TIMEMORY=ON"
|
||||
"OMNITRACE_USE_SAMPLING=OFF"
|
||||
"OMNITRACE_USE_PROCESS_SAMPLING=OFF"
|
||||
"OMNITRACE_TIME_OUTPUT=OFF"
|
||||
"OMNITRACE_FILE_OUTPUT=ON"
|
||||
"OMNITRACE_USE_MPIP=ON"
|
||||
"OMNITRACE_DEBUG=OFF"
|
||||
"OMNITRACE_VERBOSE=2"
|
||||
"OMNITRACE_DL_VERBOSE=2"
|
||||
"${_test_openmp_env}"
|
||||
"${_test_library_path}")
|
||||
|
||||
set(_mpip_all2all_environment
|
||||
"OMNITRACE_USE_PERFETTO=ON"
|
||||
"OMNITRACE_USE_TIMEMORY=ON"
|
||||
"OMNITRACE_USE_SAMPLING=OFF"
|
||||
"OMNITRACE_USE_PROCESS_SAMPLING=OFF"
|
||||
"OMNITRACE_TIME_OUTPUT=OFF"
|
||||
"OMNITRACE_FILE_OUTPUT=ON"
|
||||
"OMNITRACE_USE_MPIP=ON"
|
||||
"OMNITRACE_DEBUG=ON"
|
||||
"OMNITRACE_VERBOSE=3"
|
||||
"OMNITRACE_DL_VERBOSE=3"
|
||||
"${_test_openmp_env}"
|
||||
"${_test_library_path}")
|
||||
|
||||
foreach(_EXAMPLE all2all allgather allreduce bcast reduce scatter-gather send-recv)
|
||||
if("${_mpip_${_EXAMPLE}_environment}" STREQUAL "")
|
||||
set(_mpip_${_EXAMPLE}_environment "${_mpip_environment}")
|
||||
endif()
|
||||
omnitrace_add_test(
|
||||
SKIP_RUNTIME SKIP_SAMPLING
|
||||
NAME "mpi-${_EXAMPLE}"
|
||||
TARGET mpi-${_EXAMPLE}
|
||||
MPI ON
|
||||
NUM_PROCS 2
|
||||
LABELS "mpip"
|
||||
REWRITE_ARGS -e -v 2 --label file line --min-instructions 0
|
||||
RUN_ARGS 30
|
||||
ENVIRONMENT "${_mpip_${_EXAMPLE}_environment}")
|
||||
endforeach()
|
||||
@@ -0,0 +1,99 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# openmp tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
if(OMNITRACE_OPENMP_USING_LIBOMP_LIBRARY AND OMNITRACE_USE_OMPT)
|
||||
set(_OMPT_PASS_REGEX "\\|_ompt_")
|
||||
else()
|
||||
set(_OMPT_PASS_REGEX "")
|
||||
endif()
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME openmp-cg
|
||||
TARGET openmp-cg
|
||||
LABELS "openmp"
|
||||
REWRITE_ARGS -e -v 2 --instrument-loops
|
||||
RUNTIME_ARGS -e -v 1 --label return args
|
||||
REWRITE_TIMEOUT 180
|
||||
RUNTIME_TIMEOUT 360
|
||||
ENVIRONMENT "${_ompt_environment};OMNITRACE_USE_SAMPLING=OFF;OMNITRACE_COUT_OUTPUT=ON"
|
||||
REWRITE_RUN_PASS_REGEX "${_OMPT_PASS_REGEX}"
|
||||
RUNTIME_PASS_REGEX "${_OMPT_PASS_REGEX}"
|
||||
REWRITE_FAIL_REGEX "0 instrumented loops in procedure")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_RUNTIME
|
||||
NAME openmp-lu
|
||||
TARGET openmp-lu
|
||||
LABELS "openmp"
|
||||
REWRITE_ARGS -e -v 2 --instrument-loops
|
||||
RUNTIME_ARGS -e -v 1 --label return args -E ^GOMP
|
||||
REWRITE_TIMEOUT 180
|
||||
RUNTIME_TIMEOUT 360
|
||||
ENVIRONMENT
|
||||
"${_ompt_environment};OMNITRACE_USE_SAMPLING=ON;OMNITRACE_SAMPLING_FREQ=50;OMNITRACE_COUT_OUTPUT=ON"
|
||||
REWRITE_RUN_PASS_REGEX "${_OMPT_PASS_REGEX}"
|
||||
REWRITE_FAIL_REGEX "0 instrumented loops in procedure")
|
||||
|
||||
set(_ompt_sampling_environ
|
||||
"${_ompt_environment}"
|
||||
"OMNITRACE_VERBOSE=2"
|
||||
"OMNITRACE_USE_OMPT=OFF"
|
||||
"OMNITRACE_USE_SAMPLING=ON"
|
||||
"OMNITRACE_USE_PROCESS_SAMPLING=OFF"
|
||||
"OMNITRACE_SAMPLING_FREQ=100"
|
||||
"OMNITRACE_SAMPLING_DELAY=0.1"
|
||||
"OMNITRACE_SAMPLING_DURATION=0.25"
|
||||
"OMNITRACE_SAMPLING_CPUTIME=ON"
|
||||
"OMNITRACE_SAMPLING_REALTIME=ON"
|
||||
"OMNITRACE_SAMPLING_CPUTIME_FREQ=1000"
|
||||
"OMNITRACE_SAMPLING_REALTIME_FREQ=500"
|
||||
"OMNITRACE_MONOCHROME=ON")
|
||||
|
||||
set(_ompt_sample_no_tmpfiles_environ
|
||||
"${_ompt_environment}"
|
||||
"OMNITRACE_VERBOSE=2"
|
||||
"OMNITRACE_USE_OMPT=OFF"
|
||||
"OMNITRACE_USE_SAMPLING=ON"
|
||||
"OMNITRACE_USE_PROCESS_SAMPLING=OFF"
|
||||
"OMNITRACE_SAMPLING_CPUTIME=ON"
|
||||
"OMNITRACE_SAMPLING_REALTIME=OFF"
|
||||
"OMNITRACE_SAMPLING_CPUTIME_FREQ=700"
|
||||
"OMNITRACE_USE_TEMPORARY_FILES=OFF"
|
||||
"OMNITRACE_MONOCHROME=ON")
|
||||
|
||||
set(_ompt_sampling_samp_regex
|
||||
"Sampler for thread 0 will be triggered 1000.0x per second of CPU-time(.*)Sampler for thread 0 will be triggered 500.0x per second of wall-time(.*)Sampling will be disabled after 0.250000 seconds(.*)Sampling duration of 0.250000 seconds has elapsed. Shutting down sampling"
|
||||
)
|
||||
set(_ompt_sampling_file_regex
|
||||
"sampling-duration-sampling/sampling_percent.(json|txt)(.*)sampling-duration-sampling/sampling_cpu_clock.(json|txt)(.*)sampling-duration-sampling/sampling_wall_clock.(json|txt)"
|
||||
)
|
||||
set(_notmp_sampling_file_regex
|
||||
"sampling-no-tmp-files-sampling/sampling_percent.(json|txt)(.*)sampling-no-tmp-files-sampling/sampling_cpu_clock.(json|txt)(.*)sampling-no-tmp-files-sampling/sampling_wall_clock.(json|txt)"
|
||||
)
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_RUNTIME SKIP_REWRITE
|
||||
NAME openmp-cg-sampling-duration
|
||||
TARGET openmp-cg
|
||||
LABELS "openmp;sampling-duration"
|
||||
ENVIRONMENT "${_ompt_sampling_environ}"
|
||||
SAMPLING_PASS_REGEX "${_ompt_sampling_samp_regex}(.*)${_ompt_sampling_file_regex}")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_RUNTIME SKIP_REWRITE
|
||||
NAME openmp-lu-sampling-duration
|
||||
TARGET openmp-lu
|
||||
LABELS "openmp;sampling-duration"
|
||||
ENVIRONMENT "${_ompt_sampling_environ}"
|
||||
SAMPLING_PASS_REGEX "${_ompt_sampling_samp_regex}(.*)${_ompt_sampling_file_regex}")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_RUNTIME SKIP_REWRITE
|
||||
NAME openmp-cg-sampling-no-tmp-files
|
||||
TARGET openmp-cg
|
||||
LABELS "openmp;no-tmp-files"
|
||||
ENVIRONMENT "${_ompt_sample_no_tmpfiles_environ}"
|
||||
SAMPLING_PASS_REGEX "${_notmp_sampling_file_regex}")
|
||||
@@ -0,0 +1,34 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# binary-rewrite and runtime-instrumentation tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME parallel-overhead-locks
|
||||
TARGET parallel-overhead-locks
|
||||
LABELS "locks"
|
||||
REWRITE_ARGS -e -i 256
|
||||
RUNTIME_ARGS -e -i 256
|
||||
RUN_ARGS 30 4 1000
|
||||
ENVIRONMENT
|
||||
"${_lock_environment};OMNITRACE_USE_TIMEMORY=ON;OMNITRACE_USE_PERFETTO=ON;OMNITRACE_COLLAPSE_THREADS=OFF;OMNITRACE_SAMPLING_REALTIME=ON;OMNITRACE_SAMPLING_REALTIME_FREQ=10;OMNITRACE_SAMPLING_REALTIME_TIDS=0;OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF"
|
||||
REWRITE_RUN_PASS_REGEX
|
||||
"wall_clock .*\\|_pthread_create .* 4 .*\\|_pthread_mutex_lock .* 1000 .*\\|_pthread_mutex_unlock .* 1000 .*\\|_pthread_mutex_lock .* 1000 .*\\|_pthread_mutex_unlock .* 1000 .*\\|_pthread_mutex_lock .* 1000 .*\\|_pthread_mutex_unlock .* 1000 .*\\|_pthread_mutex_lock .* 1000 .*\\|_pthread_mutex_unlock .* 1000"
|
||||
RUNTIME_PASS_REGEX
|
||||
"wall_clock .*\\|_pthread_create .* 4 .*\\|_pthread_mutex_lock .* 1000 .*\\|_pthread_mutex_unlock .* 1000 .*\\|_pthread_mutex_lock .* 1000 .*\\|_pthread_mutex_unlock .* 1000 .*\\|_pthread_mutex_lock .* 1000 .*\\|_pthread_mutex_unlock .* 1000 .*\\|_pthread_mutex_lock .* 1000 .*\\|_pthread_mutex_unlock .* 1000"
|
||||
)
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_RUNTIME
|
||||
NAME parallel-overhead-locks-timemory
|
||||
TARGET parallel-overhead-locks
|
||||
LABELS "locks"
|
||||
REWRITE_ARGS -e -v 2 --min-instructions=32 --dyninst-options InstrStackFrames SaveFPR
|
||||
TrampRecursive
|
||||
RUN_ARGS 10 4 1000
|
||||
ENVIRONMENT
|
||||
"${_lock_environment};OMNITRACE_FLAT_PROFILE=ON;OMNITRACE_USE_TIMEMORY=ON;OMNITRACE_USE_PERFETTO=OFF;OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF"
|
||||
REWRITE_RUN_PASS_REGEX
|
||||
"start_thread (.*) 4 (.*) pthread_mutex_lock (.*) 4000 (.*) pthread_mutex_unlock (.*) 4000"
|
||||
)
|
||||
@@ -0,0 +1,267 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# python tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
set(_INDEX 0)
|
||||
|
||||
foreach(_VERSION ${OMNITRACE_PYTHON_VERSIONS})
|
||||
if(NOT OMNITRACE_USE_PYTHON)
|
||||
continue()
|
||||
endif()
|
||||
|
||||
list(GET OMNITRACE_PYTHON_ROOT_DIRS ${_INDEX} _PYTHON_ROOT_DIR)
|
||||
|
||||
omnitrace_find_python(
|
||||
_PYTHON
|
||||
ROOT_DIR "${_PYTHON_ROOT_DIR}"
|
||||
COMPONENTS Interpreter)
|
||||
|
||||
# ---------------------------------------------------------------------------------- #
|
||||
# python tests
|
||||
# ---------------------------------------------------------------------------------- #
|
||||
omnitrace_add_python_test(
|
||||
NAME python-external
|
||||
PYTHON_EXECUTABLE ${_PYTHON_EXECUTABLE}
|
||||
PYTHON_VERSION ${_VERSION}
|
||||
FILE ${CMAKE_SOURCE_DIR}/examples/python/external.py
|
||||
PROFILE_ARGS "--label" "file"
|
||||
RUN_ARGS -v 10 -n 5
|
||||
ENVIRONMENT "${_python_environment}")
|
||||
|
||||
omnitrace_add_python_test(
|
||||
NAME python-external-exclude-inefficient
|
||||
PYTHON_EXECUTABLE ${_PYTHON_EXECUTABLE}
|
||||
PYTHON_VERSION ${_VERSION}
|
||||
FILE ${CMAKE_SOURCE_DIR}/examples/python/external.py
|
||||
PROFILE_ARGS -E "^inefficient$"
|
||||
RUN_ARGS -v 10 -n 5
|
||||
ENVIRONMENT "${_python_environment}")
|
||||
|
||||
omnitrace_add_python_test(
|
||||
NAME python-builtin
|
||||
PYTHON_EXECUTABLE ${_PYTHON_EXECUTABLE}
|
||||
PYTHON_VERSION ${_VERSION}
|
||||
FILE ${CMAKE_SOURCE_DIR}/examples/python/builtin.py
|
||||
PROFILE_ARGS "-b" "--label" "file" "line"
|
||||
RUN_ARGS -v 10 -n 5
|
||||
ENVIRONMENT "${_python_environment}")
|
||||
|
||||
omnitrace_add_python_test(
|
||||
NAME python-builtin-noprofile
|
||||
PYTHON_EXECUTABLE ${_PYTHON_EXECUTABLE}
|
||||
PYTHON_VERSION ${_VERSION}
|
||||
FILE ${CMAKE_SOURCE_DIR}/examples/python/noprofile.py
|
||||
PROFILE_ARGS "-b" "--label" "file"
|
||||
RUN_ARGS -v 15 -n 5
|
||||
ENVIRONMENT "${_python_environment}")
|
||||
|
||||
omnitrace_add_python_test(
|
||||
STANDALONE
|
||||
NAME python-source
|
||||
PYTHON_EXECUTABLE ${_PYTHON_EXECUTABLE}
|
||||
PYTHON_VERSION ${_VERSION}
|
||||
FILE ${CMAKE_SOURCE_DIR}/examples/python/source.py
|
||||
RUN_ARGS -v 5 -n 5 -s 3
|
||||
ENVIRONMENT "${_python_environment}")
|
||||
|
||||
omnitrace_add_python_test(
|
||||
STANDALONE
|
||||
NAME python-code-coverage
|
||||
PYTHON_EXECUTABLE ${_PYTHON_EXECUTABLE}
|
||||
PYTHON_VERSION ${_VERSION}
|
||||
FILE ${CMAKE_SOURCE_DIR}/examples/code-coverage/code-coverage.py
|
||||
RUN_ARGS
|
||||
-i
|
||||
${PROJECT_BINARY_DIR}/omnitrace-tests-output/code-coverage-basic-blocks-binary-rewrite/coverage.json
|
||||
${PROJECT_BINARY_DIR}/omnitrace-tests-output/code-coverage-basic-blocks-hybrid-runtime-instrument/coverage.json
|
||||
-o
|
||||
${PROJECT_BINARY_DIR}/omnitrace-tests-output/code-coverage-basic-blocks-summary/coverage.json
|
||||
DEPENDS code-coverage-basic-blocks-binary-rewrite
|
||||
code-coverage-basic-blocks-binary-rewrite-run
|
||||
code-coverage-basic-blocks-hybrid-runtime-instrument
|
||||
LABELS "code-coverage"
|
||||
ENVIRONMENT "${_python_environment}")
|
||||
|
||||
# ---------------------------------------------------------------------------------- #
|
||||
# python output tests
|
||||
# ---------------------------------------------------------------------------------- #
|
||||
if(CMAKE_VERSION VERSION_LESS "3.18.0")
|
||||
find_program(
|
||||
OMNITRACE_CAT_EXE
|
||||
NAMES cat
|
||||
PATH_SUFFIXES bin)
|
||||
|
||||
if(OMNITRACE_CAT_EXE)
|
||||
set(OMNITRACE_CAT_COMMAND ${OMNITRACE_CAT_EXE})
|
||||
endif()
|
||||
else()
|
||||
set(OMNITRACE_CAT_COMMAND ${CMAKE_COMMAND} -E cat)
|
||||
endif()
|
||||
|
||||
if(OMNITRACE_CAT_COMMAND)
|
||||
omnitrace_add_python_test(
|
||||
NAME python-external-check
|
||||
COMMAND ${OMNITRACE_CAT_COMMAND}
|
||||
PYTHON_VERSION ${_VERSION}
|
||||
FILE omnitrace-tests-output/python-external/${_VERSION}/trip_count.txt
|
||||
PASS_REGEX
|
||||
"(\\\[compile\\\]).*(\\\| \\\|0>>> \\\[run\\\]\\\[external.py\\\]).*(\\\| \\\|0>>> \\\|_\\\[fib\\\]\\\[external.py\\\]).*(\\\| \\\|0>>> \\\|_\\\[inefficient\\\]\\\[external.py\\\])"
|
||||
DEPENDS python-external-${_VERSION}
|
||||
ENVIRONMENT "${_python_environment}")
|
||||
|
||||
omnitrace_add_python_test(
|
||||
NAME python-external-exclude-inefficient-check
|
||||
COMMAND ${OMNITRACE_CAT_COMMAND}
|
||||
PYTHON_VERSION ${_VERSION}
|
||||
FILE omnitrace-tests-output/python-external-exclude-inefficient/${_VERSION}/trip_count.txt
|
||||
FAIL_REGEX "(\\\|_inefficient).*(\\\|_sum)"
|
||||
DEPENDS python-external-exclude-inefficient-${_VERSION}
|
||||
ENVIRONMENT "${_python_environment}")
|
||||
|
||||
omnitrace_add_python_test(
|
||||
NAME python-builtin-check
|
||||
COMMAND ${OMNITRACE_CAT_COMMAND}
|
||||
PYTHON_VERSION ${_VERSION}
|
||||
FILE omnitrace-tests-output/python-builtin/${_VERSION}/trip_count.txt
|
||||
PASS_REGEX "\\\[inefficient\\\]\\\[builtin.py:14\\\]"
|
||||
DEPENDS python-builtin-${_VERSION}
|
||||
ENVIRONMENT "${_python_environment}")
|
||||
|
||||
omnitrace_add_python_test(
|
||||
NAME python-builtin-noprofile-check
|
||||
COMMAND ${OMNITRACE_CAT_COMMAND}
|
||||
PYTHON_VERSION ${_VERSION}
|
||||
FILE omnitrace-tests-output/python-builtin-noprofile/${_VERSION}/trip_count.txt
|
||||
PASS_REGEX ".(run)..(noprofile.py)."
|
||||
FAIL_REGEX ".(fib|inefficient)..(noprofile.py)."
|
||||
DEPENDS python-builtin-noprofile-${_VERSION}
|
||||
ENVIRONMENT "${_python_environment}")
|
||||
else()
|
||||
omnitrace_message(
|
||||
WARNING
|
||||
"Neither 'cat' nor 'cmake -E cat' are available. Python source checks are disabled"
|
||||
)
|
||||
endif()
|
||||
|
||||
function(OMNITRACE_ADD_PYTHON_VALIDATION_TEST)
|
||||
cmake_parse_arguments(
|
||||
TEST "" "NAME;TIMEMORY_METRIC;TIMEMORY_FILE;PERFETTO_METRIC;PERFETTO_FILE"
|
||||
"ARGS" ${ARGN})
|
||||
|
||||
omnitrace_add_python_test(
|
||||
NAME ${TEST_NAME}-validate-timemory
|
||||
COMMAND
|
||||
${_PYTHON_EXECUTABLE} ${CMAKE_CURRENT_LIST_DIR}/validate-timemory-json.py
|
||||
-m ${TEST_TIMEMORY_METRIC} ${TEST_ARGS} -i
|
||||
PYTHON_VERSION ${_VERSION}
|
||||
FILE omnitrace-tests-output/${TEST_NAME}/${_VERSION}/${TEST_TIMEMORY_FILE}
|
||||
DEPENDS ${TEST_NAME}-${_VERSION}
|
||||
PASS_REGEX
|
||||
"omnitrace-tests-output/${TEST_NAME}/${_VERSION}/${TEST_TIMEMORY_FILE} validated"
|
||||
ENVIRONMENT "${_python_environment}")
|
||||
|
||||
omnitrace_add_python_test(
|
||||
NAME ${TEST_NAME}-validate-perfetto
|
||||
COMMAND
|
||||
${_PYTHON_EXECUTABLE} ${CMAKE_CURRENT_LIST_DIR}/validate-perfetto-proto.py
|
||||
-m ${TEST_PERFETTO_METRIC} ${TEST_ARGS} -p -i
|
||||
PYTHON_VERSION ${_VERSION}
|
||||
FILE omnitrace-tests-output/${TEST_NAME}/${_VERSION}/${TEST_PERFETTO_FILE}
|
||||
DEPENDS ${TEST_NAME}-${_VERSION}
|
||||
PASS_REGEX
|
||||
"omnitrace-tests-output/${TEST_NAME}/${_VERSION}/${TEST_PERFETTO_FILE} validated"
|
||||
ENVIRONMENT "${_python_environment}")
|
||||
endfunction()
|
||||
|
||||
set(python_source_labels
|
||||
main_loop
|
||||
run
|
||||
fib
|
||||
fib
|
||||
fib
|
||||
fib
|
||||
fib
|
||||
inefficient
|
||||
_sum)
|
||||
set(python_source_count
|
||||
5
|
||||
3
|
||||
3
|
||||
6
|
||||
12
|
||||
18
|
||||
6
|
||||
3
|
||||
3)
|
||||
set(python_source_depth
|
||||
0
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
5
|
||||
6
|
||||
2
|
||||
3)
|
||||
|
||||
omnitrace_add_python_validation_test(
|
||||
NAME python-source
|
||||
TIMEMORY_METRIC "trip_count"
|
||||
TIMEMORY_FILE "trip_count.json"
|
||||
PERFETTO_METRIC "host;user"
|
||||
PERFETTO_FILE "perfetto-trace.proto"
|
||||
ARGS -l ${python_source_labels} -c ${python_source_count} -d
|
||||
${python_source_depth})
|
||||
|
||||
set(python_builtin_labels
|
||||
[run][builtin.py:28]
|
||||
[fib][builtin.py:10]
|
||||
[fib][builtin.py:10]
|
||||
[fib][builtin.py:10]
|
||||
[fib][builtin.py:10]
|
||||
[fib][builtin.py:10]
|
||||
[fib][builtin.py:10]
|
||||
[fib][builtin.py:10]
|
||||
[fib][builtin.py:10]
|
||||
[fib][builtin.py:10]
|
||||
[fib][builtin.py:10]
|
||||
[inefficient][builtin.py:14])
|
||||
set(python_builtin_count
|
||||
5
|
||||
5
|
||||
10
|
||||
20
|
||||
40
|
||||
80
|
||||
160
|
||||
260
|
||||
220
|
||||
80
|
||||
10
|
||||
5)
|
||||
set(python_builtin_depth
|
||||
0
|
||||
1
|
||||
2
|
||||
3
|
||||
4
|
||||
5
|
||||
6
|
||||
7
|
||||
8
|
||||
9
|
||||
10
|
||||
1)
|
||||
|
||||
omnitrace_add_python_validation_test(
|
||||
NAME python-builtin
|
||||
TIMEMORY_METRIC "trip_count"
|
||||
TIMEMORY_FILE "trip_count.json"
|
||||
PERFETTO_METRIC "host;user"
|
||||
PERFETTO_FILE "perfetto-trace.proto"
|
||||
ARGS -l ${python_builtin_labels} -c ${python_builtin_count} -d
|
||||
${python_builtin_depth})
|
||||
math(EXPR _INDEX "${_INDEX} + 1")
|
||||
endforeach()
|
||||
@@ -0,0 +1,60 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# rccl tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
foreach(_TARGET ${RCCL_TEST_TARGETS})
|
||||
string(REPLACE "rccl-tests::" "" _NAME "${_TARGET}")
|
||||
string(REPLACE "_" "-" _NAME "${_NAME}")
|
||||
omnitrace_add_test(
|
||||
NAME rccl-test-${_NAME}
|
||||
TARGET ${_TARGET}
|
||||
LABELS "rccl-tests;rcclp"
|
||||
MPI ON
|
||||
GPU ON
|
||||
NUM_PROCS 1
|
||||
REWRITE_ARGS
|
||||
-e
|
||||
-v
|
||||
2
|
||||
-i
|
||||
8
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
-i
|
||||
8
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
-ME
|
||||
sysdeps
|
||||
--log-file
|
||||
rccl-test-${_NAME}.log
|
||||
RUN_ARGS -t
|
||||
1
|
||||
-g
|
||||
1
|
||||
-i
|
||||
10
|
||||
-w
|
||||
2
|
||||
-m
|
||||
2
|
||||
-p
|
||||
-c
|
||||
1
|
||||
-z
|
||||
-s
|
||||
1
|
||||
ENVIRONMENT "${_rccl_environment}")
|
||||
endforeach()
|
||||
@@ -0,0 +1,85 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# ROCm tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
set(OMNITRACE_ROCM_EVENTS_TEST
|
||||
"GRBM_COUNT,GPUBusy,SQ_WAVES,SQ_INSTS_VALU,VALUInsts,TCC_HIT_sum,TA_TA_BUSY[0]:device=0,TA_TA_BUSY[11]:device=0"
|
||||
)
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME transpose
|
||||
TARGET transpose
|
||||
MPI ${TRANSPOSE_USE_MPI}
|
||||
GPU ON
|
||||
NUM_PROCS ${NUM_PROCS}
|
||||
REWRITE_ARGS -e -v 2 --print-instructions -E uniform_int_distribution
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
-E
|
||||
uniform_int_distribution
|
||||
ENVIRONMENT "${_base_environment};OMNITRACE_CRITICAL_TRACE=ON")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_RUNTIME
|
||||
NAME transpose-loops
|
||||
TARGET transpose
|
||||
LABELS "loops"
|
||||
MPI ${TRANSPOSE_USE_MPI}
|
||||
GPU ON
|
||||
NUM_PROCS ${NUM_PROCS}
|
||||
REWRITE_ARGS
|
||||
-e
|
||||
-v
|
||||
2
|
||||
--label
|
||||
return
|
||||
args
|
||||
-l
|
||||
-i
|
||||
8
|
||||
-E
|
||||
uniform_int_distribution
|
||||
RUN_ARGS 2 100 50
|
||||
ENVIRONMENT "${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF"
|
||||
REWRITE_FAIL_REGEX "0 instrumented loops in procedure transpose")
|
||||
|
||||
if(OMNITRACE_USE_ROCPROFILER)
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_RUNTIME
|
||||
NAME transpose-rocprofiler
|
||||
TARGET transpose
|
||||
LABELS "rocprofiler"
|
||||
MPI ${TRANSPOSE_USE_MPI}
|
||||
GPU ON
|
||||
NUM_PROCS ${NUM_PROCS}
|
||||
REWRITE_ARGS -e -v 2 -E uniform_int_distribution
|
||||
ENVIRONMENT
|
||||
"${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF;OMNITRACE_ROCM_EVENTS=${OMNITRACE_ROCM_EVENTS_TEST}"
|
||||
REWRITE_RUN_PASS_REGEX
|
||||
"rocprof-device-0-GRBM_COUNT.txt(.*)rocprof-device-0-GPUBusy.txt(.*)rocprof-device-0-SQ_WAVES.txt(.*)rocprof-device-0-SQ_INSTS_VALU.txt(.*)rocprof-device-0-VALUInsts.txt(.*)rocprof-device-0-TCC_HIT_sum.txt(.*)rocprof-device-0-TA_TA_BUSY_0.txt(.*)rocprof-device-0-TA_TA_BUSY_11.txt"
|
||||
)
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_RUNTIME
|
||||
NAME transpose-rocprofiler-no-roctracer
|
||||
TARGET transpose
|
||||
LABELS "rocprofiler"
|
||||
MPI ${TRANSPOSE_USE_MPI}
|
||||
GPU ON
|
||||
NUM_PROCS ${NUM_PROCS}
|
||||
REWRITE_ARGS -e -v 2 -E uniform_int_distribution
|
||||
ENVIRONMENT
|
||||
"${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF;OMNITRACE_USE_ROCTRACER=OFF;OMNITRACE_ROCM_EVENTS=${OMNITRACE_ROCM_EVENTS_TEST}"
|
||||
REWRITE_RUN_PASS_REGEX
|
||||
"rocprof-device-0-GRBM_COUNT.txt(.*)rocprof-device-0-GPUBusy.txt(.*)rocprof-device-0-SQ_WAVES.txt(.*)rocprof-device-0-SQ_INSTS_VALU.txt(.*)rocprof-device-0-VALUInsts.txt(.*)rocprof-device-0-TCC_HIT_sum.txt(.*)rocprof-device-0-TA_TA_BUSY_0.txt(.*)rocprof-device-0-TA_TA_BUSY_11.txt"
|
||||
REWRITE_RUN_FAIL_REGEX "roctracer.txt")
|
||||
endif()
|
||||
@@ -262,7 +262,11 @@ function(OMNITRACE_WRITE_TEST_CONFIG _FILE _ENV)
|
||||
set(_FILE_CONTENTS)
|
||||
set(_ENV_CONTENTS)
|
||||
|
||||
set(_DEBUG_SETTINGS ON)
|
||||
foreach(_VAL ${${_ENV}})
|
||||
if("${_VAL}" MATCHES "^OMNITRACE_DEBUG_SETTINGS=")
|
||||
set(_DEBUG_SETTINGS OFF)
|
||||
endif()
|
||||
if("${_VAL}" MATCHES "^OMNITRACE_" AND NOT "${_VAL}" MATCHES "${_ENV_ONLY}")
|
||||
set(_FILE_CONTENTS "${_FILE_CONTENTS}${_VAL}\n")
|
||||
else()
|
||||
@@ -290,7 +294,9 @@ OMNITRACE_ROCTRACER_HSA_ACTIVITY = ON
|
||||
${_FILE_CONTENTS}
|
||||
")
|
||||
list(APPEND _ENV_CONTENTS "OMNITRACE_CONFIG_FILE=${_CONFIG_FILE}")
|
||||
list(APPEND _ENV_CONTENTS "OMNITRACE_DEBUG_SETTINGS=1")
|
||||
if(_DEBUG_SETTINGS)
|
||||
list(APPEND _ENV_CONTENTS "OMNITRACE_DEBUG_SETTINGS=1")
|
||||
endif()
|
||||
set(${_ENV}
|
||||
"${_ENV_CONTENTS}"
|
||||
PARENT_SCOPE)
|
||||
@@ -336,25 +342,24 @@ endmacro()
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
function(OMNITRACE_ADD_TEST)
|
||||
foreach(_PREFIX PRELOAD RUNTIME REWRITE REWRITE_RUN BASELINE)
|
||||
foreach(_PREFIX SAMPLING RUNTIME REWRITE REWRITE_RUN BASELINE)
|
||||
foreach(_TYPE PASS FAIL SKIP)
|
||||
list(APPEND _REGEX_OPTS "${_PREFIX}_${_TYPE}_REGEX")
|
||||
endforeach()
|
||||
endforeach()
|
||||
set(_KWARGS REWRITE_ARGS RUNTIME_ARGS RUN_ARGS ENVIRONMENT LABELS PROPERTIES
|
||||
${_REGEX_OPTS})
|
||||
set(_KWARGS REWRITE_ARGS RUNTIME_ARGS SAMPLING_ARGS RUN_ARGS ENVIRONMENT LABELS
|
||||
PROPERTIES ${_REGEX_OPTS})
|
||||
|
||||
cmake_parse_arguments(
|
||||
TEST
|
||||
"SKIP_BASELINE;SKIP_PRELOAD;SKIP_REWRITE;SKIP_RUNTIME;SKIP_SAMPLING;FORCE_SAMPLING"
|
||||
"NAME;TARGET;MPI;GPU;NUM_PROCS;REWRITE_TIMEOUT;RUNTIME_TIMEOUT;PRELOAD"
|
||||
"${_KWARGS}"
|
||||
TEST "SKIP_BASELINE;SKIP_SAMPLING;SKIP_REWRITE;SKIP_RUNTIME"
|
||||
"NAME;TARGET;MPI;GPU;NUM_PROCS;REWRITE_TIMEOUT;RUNTIME_TIMEOUT" "${_KWARGS}"
|
||||
${ARGN})
|
||||
|
||||
foreach(_PREFIX PRELOAD RUNTIME REWRITE REWRITE_RUN BASELINE)
|
||||
foreach(_PREFIX SAMPLING RUNTIME REWRITE REWRITE_RUN BASELINE)
|
||||
if("${${_PREFIX}_FAIL_REGEX}" STREQUAL "")
|
||||
set(${_PREFIX}_FAIL_REGEX
|
||||
"(### ERROR ###|address of faulting memory reference)")
|
||||
"(### ERROR ###|address of faulting memory reference|exiting with non-zero exit code)"
|
||||
)
|
||||
endif()
|
||||
endforeach()
|
||||
|
||||
@@ -387,8 +392,8 @@ function(OMNITRACE_ADD_TEST)
|
||||
set(TEST_RUNTIME_TIMEOUT 300)
|
||||
endif()
|
||||
|
||||
if(NOT TEST_PRELOAD_TIMEOUT)
|
||||
set(TEST_PRELOAD_TIMEOUT 120)
|
||||
if(NOT TEST_SAMPLING_TIMEOUT)
|
||||
set(TEST_SAMPLING_TIMEOUT 120)
|
||||
endif()
|
||||
|
||||
if(NOT DEFINED TEST_ENVIRONMENT OR "${TEST_ENVIRONMENT}" STREQUAL "")
|
||||
@@ -448,11 +453,12 @@ function(OMNITRACE_ADD_TEST)
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
endif()
|
||||
|
||||
if(NOT TEST_SKIP_PRELOAD)
|
||||
if(NOT TEST_SKIP_SAMPLING)
|
||||
add_test(
|
||||
NAME ${TEST_NAME}-preload
|
||||
COMMAND ${COMMAND_PREFIX} $<TARGET_FILE:omnitrace-sample> --
|
||||
$<TARGET_FILE:${TEST_TARGET}> ${TEST_RUN_ARGS}
|
||||
NAME ${TEST_NAME}-sampling
|
||||
COMMAND
|
||||
${COMMAND_PREFIX} $<TARGET_FILE:omnitrace-sample> ${TEST_SAMPLE_ARGS}
|
||||
-- $<TARGET_FILE:${TEST_TARGET}> ${TEST_RUN_ARGS}
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
endif()
|
||||
|
||||
@@ -473,23 +479,6 @@ function(OMNITRACE_ADD_TEST)
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
endif()
|
||||
|
||||
if(TEST_FORCE_SAMPLING OR (NOT TEST_SKIP_REWRITE AND NOT TEST_SKIP_SAMPLING))
|
||||
add_test(
|
||||
NAME ${TEST_NAME}-binary-rewrite-sampling
|
||||
COMMAND
|
||||
$<TARGET_FILE:omnitrace-instrument> -o
|
||||
$<TARGET_FILE_DIR:${TEST_TARGET}>/${TEST_NAME}.samp -M sampling
|
||||
${TEST_REWRITE_ARGS} -- $<TARGET_FILE:${TEST_TARGET}>
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
|
||||
add_test(
|
||||
NAME ${TEST_NAME}-binary-rewrite-sampling-run
|
||||
COMMAND
|
||||
${COMMAND_PREFIX} $<TARGET_FILE:omnitrace-run> --
|
||||
$<TARGET_FILE_DIR:${TEST_TARGET}>/${TEST_NAME}.samp ${TEST_RUN_ARGS}
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
endif()
|
||||
|
||||
if(NOT TEST_SKIP_RUNTIME AND NOT OMNITRACE_USE_SANITIZER)
|
||||
add_test(
|
||||
NAME ${TEST_NAME}-runtime-instrument
|
||||
@@ -498,34 +487,16 @@ function(OMNITRACE_ADD_TEST)
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
endif()
|
||||
|
||||
if((TEST_FORCE_SAMPLING OR (NOT TEST_SKIP_RUNTIME AND NOT TEST_SKIP_SAMPLING))
|
||||
AND NOT OMNITRACE_USE_SANITIZER)
|
||||
add_test(
|
||||
NAME ${TEST_NAME}-runtime-instrument-sampling
|
||||
COMMAND
|
||||
$<TARGET_FILE:omnitrace-instrument> -M sampling ${TEST_RUNTIME_ARGS}
|
||||
-- $<TARGET_FILE:${TEST_TARGET}> ${TEST_RUN_ARGS}
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
endif()
|
||||
|
||||
if(TEST ${TEST_NAME}-binary-rewrite-run)
|
||||
set_tests_properties(${TEST_NAME}-binary-rewrite-run
|
||||
PROPERTIES DEPENDS ${TEST_NAME}-binary-rewrite)
|
||||
endif()
|
||||
|
||||
if(TEST ${TEST_NAME}-binary-rewrite-sampling-run)
|
||||
set_tests_properties(${TEST_NAME}-binary-rewrite-sampling-run
|
||||
PROPERTIES DEPENDS ${TEST_NAME}-binary-rewrite-sampling)
|
||||
endif()
|
||||
|
||||
foreach(
|
||||
_TEST
|
||||
baseline preload binary-rewrite binary-rewrite-run binary-rewrite-sampling
|
||||
binary-rewrite-sampling-run runtime-instrument runtime-instrument-sampling)
|
||||
foreach(_TEST baseline sampling binary-rewrite binary-rewrite-run
|
||||
runtime-instrument)
|
||||
string(REGEX REPLACE "-run(-|/)" "\\1" _prefix "${TEST_NAME}-${_TEST}/")
|
||||
set(_labels "${_TEST}")
|
||||
string(REPLACE "-run" "" _labels "${_TEST}")
|
||||
string(REPLACE "-sampling" ";sampling" _labels "${_labels}")
|
||||
if(TEST_TARGET)
|
||||
list(APPEND _labels "${TEST_TARGET}")
|
||||
endif()
|
||||
@@ -539,14 +510,14 @@ function(OMNITRACE_ADD_TEST)
|
||||
"OMNITRACE_OUTPUT_PREFIX=${_prefix}")
|
||||
|
||||
set(_timeout ${TEST_REWRITE_TIMEOUT})
|
||||
if("${_TEST}" MATCHES "preload")
|
||||
set(_timeout ${TEST_PRELOAD_TIMEOUT})
|
||||
if("${_TEST}" MATCHES "sampling")
|
||||
set(_timeout ${TEST_SAMPLING_TIMEOUT})
|
||||
elseif("${_TEST}" MATCHES "runtime-instrument")
|
||||
set(_timeout ${TEST_RUNTIME_TIMEOUT})
|
||||
endif()
|
||||
|
||||
set(_props)
|
||||
if("${_TEST}" MATCHES "run|preload|baseline")
|
||||
if("${_TEST}" MATCHES "run|sampling|baseline")
|
||||
set(_props ${TEST_PROPERTIES})
|
||||
if(NOT "RUN_SERIAL" IN_LIST _props)
|
||||
list(APPEND _props RUN_SERIAL ON)
|
||||
@@ -561,13 +532,13 @@ function(OMNITRACE_ADD_TEST)
|
||||
set(_REGEX_VAR REWRITE)
|
||||
elseif("${_TEST}" MATCHES "baseline")
|
||||
set(_REGEX_VAR BASELINE)
|
||||
elseif("${_TEST}" MATCHES "preload")
|
||||
set(_REGEX_VAR PRELOAD)
|
||||
elseif("${_TEST}" MATCHES "sampling")
|
||||
set(_REGEX_VAR SAMPLING)
|
||||
else()
|
||||
set(_REGEX_VAR)
|
||||
endif()
|
||||
|
||||
if("${_TEST}" MATCHES "binary-rewrite-run|runtime-instrument|preload")
|
||||
if("${_TEST}" MATCHES "binary-rewrite-run|runtime-instrument|sampling")
|
||||
omnitrace_patch_sanitizer_environment(_environ)
|
||||
endif()
|
||||
|
||||
@@ -632,6 +603,12 @@ function(OMNITRACE_ADD_CAUSAL_TEST)
|
||||
set(TEST_CAUSAL_VALIDATE_TIMEOUT 60)
|
||||
endif()
|
||||
|
||||
if("${TEST_CAUSAL_FAIL_REGEX}" STREQUAL "")
|
||||
set(TEST_CAUSAL_FAIL_REGEX
|
||||
"(### ERROR ###|address of faulting memory reference|exiting with non-zero exit code)"
|
||||
)
|
||||
endif()
|
||||
|
||||
if(TARGET ${TEST_TARGET})
|
||||
set(COMMAND_PREFIX $<TARGET_FILE:omnitrace-causal> --reset -m ${TEST_CAUSAL_MODE}
|
||||
${TEST_CAUSAL_ARGS} --)
|
||||
@@ -692,7 +669,10 @@ function(OMNITRACE_ADD_CAUSAL_TEST)
|
||||
"OMNITRACE_OUTPUT_PREFIX=${_prefix}"
|
||||
"OMNITRACE_CI=ON"
|
||||
"OMNITRACE_USE_PID=OFF"
|
||||
"OMNITRACE_THREAD_POOL_SIZE=1"
|
||||
"OMNITRACE_THREAD_POOL_SIZE=0"
|
||||
"OMNITRACE_VERBOSE=1"
|
||||
"OMNITRACE_DL_VERBOSE=0"
|
||||
"OMNITRACE_DEBUG_SETTINGS=0"
|
||||
"${TEST_ENVIRONMENT}")
|
||||
|
||||
set(_timeout ${TEST_CAUSAL_TIMEOUT})
|
||||
@@ -954,6 +934,9 @@ function(OMNITRACE_ADD_VALIDATION_TEST)
|
||||
endforeach()
|
||||
|
||||
list(APPEND TEST_DEPENDS "${TEST_NAME}")
|
||||
if("${TEST_NAME}" MATCHES "-binary-rewrite")
|
||||
list(APPEND TEST_DEPENDS "${TEST_NAME}-run")
|
||||
endif()
|
||||
|
||||
if(NOT TEST_PASS_REGEX)
|
||||
set(TEST_PASS_REGEX
|
||||
|
||||
@@ -0,0 +1,114 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# time-window tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
if(_OS_RELEASE STREQUAL "ubuntu-18.04")
|
||||
set(_TRACE_WINDOW_SKIP SKIP_RUNTIME)
|
||||
endif()
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_SAMPLING ${_TRACE_WINDOW_SKIP}
|
||||
NAME trace-time-window
|
||||
TARGET trace-time-window
|
||||
REWRITE_ARGS -e -v 2 --caller-include inner -i 4096
|
||||
RUNTIME_ARGS -e -v 1 --caller-include inner -i 4096
|
||||
LABELS "time-window"
|
||||
ENVIRONMENT "${_window_environment};OMNITRACE_TRACE_DURATION=1.25")
|
||||
|
||||
omnitrace_add_validation_test(
|
||||
NAME trace-time-window-binary-rewrite
|
||||
TIMEMORY_METRIC "wall_clock"
|
||||
TIMEMORY_FILE "wall_clock.json"
|
||||
PERFETTO_METRIC "host"
|
||||
PERFETTO_FILE "perfetto-trace.proto"
|
||||
LABELS "time-window"
|
||||
FAIL_REGEX "outer_d"
|
||||
ARGS -l
|
||||
trace-time-window.inst
|
||||
outer_a
|
||||
outer_b
|
||||
outer_c
|
||||
-c
|
||||
1
|
||||
1
|
||||
1
|
||||
1
|
||||
-d
|
||||
0
|
||||
1
|
||||
1
|
||||
1
|
||||
-p)
|
||||
|
||||
omnitrace_add_validation_test(
|
||||
NAME trace-time-window-runtime-instrument
|
||||
TIMEMORY_METRIC "wall_clock"
|
||||
TIMEMORY_FILE "wall_clock.json"
|
||||
PERFETTO_METRIC "host"
|
||||
PERFETTO_FILE "perfetto-trace.proto"
|
||||
LABELS "time-window"
|
||||
FAIL_REGEX "outer_d"
|
||||
ARGS -l
|
||||
trace-time-window
|
||||
outer_a
|
||||
outer_b
|
||||
outer_c
|
||||
-c
|
||||
1
|
||||
1
|
||||
1
|
||||
1
|
||||
-d
|
||||
0
|
||||
1
|
||||
1
|
||||
1
|
||||
-p)
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_SAMPLING ${_TRACE_WINDOW_SKIP}
|
||||
NAME trace-time-window-delay
|
||||
TARGET trace-time-window
|
||||
REWRITE_ARGS -e -v 2 --caller-include inner -i 4096
|
||||
RUNTIME_ARGS -e -v 1 --caller-include inner -i 4096
|
||||
LABELS "time-window"
|
||||
ENVIRONMENT
|
||||
"${_window_environment};OMNITRACE_TRACE_DELAY=0.75;OMNITRACE_TRACE_DURATION=0.75")
|
||||
|
||||
omnitrace_add_validation_test(
|
||||
NAME trace-time-window-delay-binary-rewrite
|
||||
TIMEMORY_METRIC "wall_clock"
|
||||
TIMEMORY_FILE "wall_clock.json"
|
||||
PERFETTO_METRIC "host"
|
||||
PERFETTO_FILE "perfetto-trace.proto"
|
||||
LABELS "time-window"
|
||||
ARGS -l
|
||||
outer_c
|
||||
outer_d
|
||||
-c
|
||||
1
|
||||
1
|
||||
-d
|
||||
0
|
||||
0
|
||||
-p)
|
||||
|
||||
omnitrace_add_validation_test(
|
||||
NAME trace-time-window-delay-runtime-instrument
|
||||
TIMEMORY_METRIC "wall_clock"
|
||||
TIMEMORY_FILE "wall_clock.json"
|
||||
PERFETTO_METRIC "host"
|
||||
PERFETTO_FILE "perfetto-trace.proto"
|
||||
LABELS "time-window"
|
||||
ARGS -l
|
||||
outer_c
|
||||
outer_d
|
||||
-c
|
||||
1
|
||||
1
|
||||
-d
|
||||
0
|
||||
0
|
||||
-p)
|
||||
@@ -0,0 +1,31 @@
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# User API tests
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
omnitrace_add_test(
|
||||
NAME user-api
|
||||
TARGET user-api
|
||||
LABELS "loops"
|
||||
REWRITE_ARGS -e -v 2 -l --min-instructions=8 -E custom_push_region
|
||||
RUNTIME_ARGS
|
||||
-e
|
||||
-v
|
||||
1
|
||||
-l
|
||||
--min-instructions=8
|
||||
-E
|
||||
custom_push_region
|
||||
--label
|
||||
file
|
||||
line
|
||||
return
|
||||
args
|
||||
RUN_ARGS 10 ${NUM_THREADS} 1000
|
||||
ENVIRONMENT "${_base_environment};OMNITRACE_CRITICAL_TRACE=OFF"
|
||||
REWRITE_RUN_PASS_REGEX "Pushing custom region :: run.10. x 1000"
|
||||
RUNTIME_PASS_REGEX "Pushing custom region :: run.10. x 1000"
|
||||
SAMPLING_PASS_REGEX "Pushing custom region :: run.10. x 1000"
|
||||
BASELINE_FAIL_REGEX "Pushing custom region"
|
||||
REWRITE_FAIL_REGEX "0 instrumented loops in procedure")
|
||||
@@ -9,7 +9,7 @@ import argparse
|
||||
from collections import OrderedDict
|
||||
|
||||
|
||||
num_stddev = 1
|
||||
num_stddev = 1.0
|
||||
|
||||
|
||||
def mean(_data):
|
||||
@@ -21,7 +21,7 @@ def stddev(_data):
|
||||
return 0.0
|
||||
_mean = mean(_data)
|
||||
_variance = sum([((x - _mean) ** 2) for x in _data]) / float(len(_data))
|
||||
return _variance**0.5
|
||||
return float(num_stddev) * math.sqrt(_variance)
|
||||
|
||||
|
||||
def simpsons_rule(a, b, fa, fb):
|
||||
@@ -66,21 +66,28 @@ class validation(object):
|
||||
return None
|
||||
|
||||
_tolerance = self.tolerance
|
||||
|
||||
if _ci is True and _virt_speedup > 10:
|
||||
"""On GitHub Action servers, you typically only get one core with two hyperthreads.
|
||||
The hyperthreading causes the speedup potential to drop off at higher virtual speedups
|
||||
so we consider
|
||||
_reason = "[unspecified reason]"
|
||||
if _ci is True:
|
||||
"""On GitHub Action servers, you typically only get two CPUs, which may be one
|
||||
core with two hyperthreads. The hyperthreading can causes the speedup potential
|
||||
to drop. Furthermore, these are typically shared resources so the runtime may
|
||||
vary significantly. Thus, always account for stddev to prevent failures due to
|
||||
these causes
|
||||
"""
|
||||
_tolerance += max([_base_speedup_stddev, _prog_speedup_stddev])
|
||||
_reason = "results obtained on a shared CI system... potentially artificially deflating speedup predictions"
|
||||
elif _base_speedup_stddev > self.tolerance:
|
||||
_tolerance += math.sqrt(_base_speedup_stddev)
|
||||
_reason = (
|
||||
f"large standard deviation of the baseline ({_base_speedup_stddev:.3f})"
|
||||
)
|
||||
elif _prog_speedup_stddev > 1.0:
|
||||
_tolerance += math.sqrt(_prog_speedup_stddev)
|
||||
_reason = f"large standard deviation of the program speedup ({_prog_speedup_stddev:.3f})"
|
||||
|
||||
if _tolerance > self.tolerance:
|
||||
sys.stderr.write(
|
||||
f" [{_exp_name}][{_pp_name}][{_virt_speedup}] Tolerance adjusted due to stddev or to account for hyperthreading on CI systems ({self.tolerance:.3f} increased to {_tolerance:.3f})...\n"
|
||||
f" [{_exp_name}][{_pp_name}][{_virt_speedup}] Tolerance increased: {_reason} ({self.tolerance:.3f} increased to {_tolerance:.3f})...\n"
|
||||
)
|
||||
|
||||
def _compute(_speedup_v, _tolerance_v):
|
||||
@@ -195,9 +202,7 @@ class line_speedup(object):
|
||||
if self.data is None or self.base is None:
|
||||
return f"{self.name}"
|
||||
_line_speedup = self.compute_speedup()
|
||||
_line_stddev = (
|
||||
float(num_stddev) * self.compute_speedup_stddev()
|
||||
) # 3 stddev == 99.87%
|
||||
_line_stddev = self.compute_speedup_stddev() # 3 stddev == 99.87%
|
||||
_name = self.get_name()
|
||||
return f"[{_name}][{self.prog}][{self.data.speedup:3}] speedup: {_line_speedup:6.1f} +/- {_line_stddev:6.2f} %"
|
||||
|
||||
@@ -345,7 +350,6 @@ def compute_speedups(_data, args):
|
||||
for selected, pitr in _data.items():
|
||||
for progpt, ditr in pitr.items():
|
||||
if 0 not in ditr.keys():
|
||||
# print(f"missing baseline data for {progpt} in {selected}...")
|
||||
continue
|
||||
_baseline = ditr[0].mean()
|
||||
for speedup, itr in ditr.items():
|
||||
@@ -353,8 +357,9 @@ def compute_speedups(_data, args):
|
||||
continue
|
||||
if speedup != itr.speedup:
|
||||
raise ValueError(f"in {selected}: {speedup} != {itr.speedup}")
|
||||
_val = line_speedup(selected, progpt, itr, ditr[0])
|
||||
ret.append(_val)
|
||||
if len(itr) >= args.min_experiments:
|
||||
_val = line_speedup(selected, progpt, itr, ditr[0])
|
||||
ret.append(_val)
|
||||
|
||||
ret.sort()
|
||||
_last_name = None
|
||||
@@ -400,6 +405,8 @@ def get_validations(args):
|
||||
def main():
|
||||
import argparse
|
||||
|
||||
global num_stddev
|
||||
|
||||
parser = argparse.ArgumentParser()
|
||||
parser.add_argument(
|
||||
"-e", "--experiments", type=str, help="Regex for experiments", default=".*"
|
||||
@@ -414,6 +421,13 @@ def main():
|
||||
parser.add_argument(
|
||||
"-n", "--num-points", type=int, help="Minimum number of data points", default=5
|
||||
)
|
||||
parser.add_argument(
|
||||
"-m",
|
||||
"--min-experiments",
|
||||
type=int,
|
||||
help="Minimum number of experiments per speedup (e.g. do not display speedups when there are fewer than X experiments at this speedup)",
|
||||
default=2,
|
||||
)
|
||||
parser.add_argument(
|
||||
"-i", "--input", type=str, nargs="*", help="Input file(s)", required=True
|
||||
)
|
||||
@@ -428,9 +442,9 @@ def main():
|
||||
parser.add_argument(
|
||||
"-d",
|
||||
"--stddev",
|
||||
type=int,
|
||||
type=float,
|
||||
help="Number of standard deviations to report",
|
||||
default=1,
|
||||
default=1.0,
|
||||
)
|
||||
parser.add_argument(
|
||||
"-v",
|
||||
@@ -440,6 +454,12 @@ def main():
|
||||
help="Validate speedup: {experiment regex} {progress-point regex} {virtual-speedup} {expected-speedup} {tolerance}",
|
||||
default=[],
|
||||
)
|
||||
parser.add_argument(
|
||||
"--samples",
|
||||
type=float,
|
||||
help="Report samples within this percentage of the peak (0.0, 100.0] (default: 95 percent)",
|
||||
default=95.0,
|
||||
)
|
||||
parser.add_argument(
|
||||
"--ci",
|
||||
action="store_true",
|
||||
@@ -454,6 +474,13 @@ def main():
|
||||
num_stddev = args.stddev
|
||||
num_speedups = len(args.speedups)
|
||||
|
||||
percent_samples = args.samples
|
||||
if not percent_samples > 0.0 and not percent_samples <= 100.0:
|
||||
raise ValueError(
|
||||
f"Invalid samples value: {percent_samples}. Supported range: 0.0 < x <= 100.0"
|
||||
)
|
||||
percent_samples = 1.0 - (percent_samples / 100.0)
|
||||
|
||||
if num_speedups > 0 and args.num_points > num_speedups:
|
||||
args.num_points = num_speedups
|
||||
|
||||
@@ -466,9 +493,11 @@ def main():
|
||||
samp = process_samples(samp, inp_data)
|
||||
|
||||
print("Samples:")
|
||||
width = max([len(x) for x in samp.keys()])
|
||||
for name, count in sorted(samp.items()):
|
||||
print(f" {name:{width}} :: {count}")
|
||||
width = max([int(math.log10(x) + 1) for _, x in samp.items()])
|
||||
samp_peak = max([count for _, count in samp.items()])
|
||||
for name, count in sorted(samp.items(), key=lambda x: x[1], reverse=True):
|
||||
if count >= samp_peak * percent_samples:
|
||||
print(f" {count:{width}} :: {name}")
|
||||
|
||||
results = compute_speedups(data, args)
|
||||
print("")
|
||||
|
||||
مرجع در شماره جدید
Block a user