rocm-systems/tests/validate-causal-json.py
Jonathan R. Madsen 9de3a6b0b4 Linux Perf Support + Causal Profiling Updates (#276)
* causal backtrace updates

- fix initial causal sampling period value

* causal delay updates

- tweak handling of sleep_for_overhead

* Fix experiment global scaling for prog pts

- results in drastically improved predictions

* pthread_mutex_gotcha updates

- disable all wrappers during causal profiling

* validate-causal-json.py updates

- support decimal stddev
- fix setting stddev from command-line

* causal perform_experiment_impl update

- handle start failing because finalizing

* deprecate causal::component::sample_rate

- appears to not help at all

* Rework sample info

* Increase causal unwind_depth

- use OMNITRACE_MAX_UNWIND_DEPTH

* validate-causal-json updates

- min experiments
  - exclude reporting predictions with less than X experiments at a given speedup
- percent samples
  - only print samples within X% of the peak (default: 95%)
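
As a rough illustration of the two filters described above, the sketch below mirrors the arithmetic validate-causal-json.py uses: "within X% of the peak" keeps any sample whose count is at least (100 - X)% of the largest count, and a speedup is reported only when it has at least the minimum number of experiments. The helper names `filter_samples` and `enough_experiments` and the sample labels are hypothetical and exist only for this example.

```python
def filter_samples(samples, percent=95.0):
    """Keep entries whose count lies within `percent` % of the peak count."""
    if not samples:
        return {}
    if not (0.0 < percent <= 100.0):
        raise ValueError(f"percent must be in (0, 100], got {percent}")
    peak = max(samples.values())
    # "within 95% of the peak" keeps every count >= 5% of the peak
    cutoff = peak * (1.0 - percent / 100.0)
    return {name: count for name, count in samples.items() if count >= cutoff}


def enough_experiments(num_experiments, min_experiments=2):
    """Report a speedup only if at least `min_experiments` experiments exist."""
    return num_experiments >= min_experiments


# hypothetical sample counts keyed by file:line
counts = {"causal.cpp:100": 200, "causal.cpp:110": 40, "main.cpp:12": 5}
kept = filter_samples(counts)
```

With the default of 95%, the cutoff here is roughly 10 samples, so the entry with 5 samples is dropped and the other two are kept.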

* Update timemory submodule

- extensions to sampling for signals delivered via non-timer method
  - e.g. via HW counter overflow

* dwarf_entry::operator< updates

- sort via file

* causal profiling docs updates

- info about backends
- info about installing/enabling perf

* config updates: causal backend

- CausalBackend enum
- OMNITRACE_CAUSAL_BACKEND: perf, timer, auto
- omnitrace-causal option: --backend

* debug update

- use spin_mutex instead of std::mutex

* address_range::contains update

- range from 0-100 contains range from 10-100 but was returning false because high was == 100 not < 100
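
The boundary bug described above can be sketched in a few lines. This is a hypothetical Python model, not the C++ `address_range` implementation: the class name `AddressRange` and the method `contains_old` are assumptions made only to contrast a strict upper bound with an inclusive one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRange:
    low: int
    high: int

    def contains_old(self, other):
        # strict upper bound: rejects a sub-range that ends exactly at `high`
        return self.low <= other.low and other.high < self.high

    def contains(self, other):
        # inclusive upper bound: a sub-range may share the same `high`
        return self.low <= other.low and other.high <= self.high


outer = AddressRange(0, 100)
inner = AddressRange(10, 100)
```

Here `outer.contains_old(inner)` is false because `inner.high == outer.high`, while the inclusive `outer.contains(inner)` is true.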

* symbol::operator< update

- handle load address differences

* sampling updates (non-causal)

- update get_timer to get_trigger + dynamic_cast

* container::static_vector updates

- support construction from container::c_array
- update_size private member func for handling atomic m_size

* Move perf files

- moved library/causal/perf.{hpp,cpp} to library/perf.{hpp,cpp}

* causal example update

- created impl.hpp (forward decls)
- renamed {cpu,rng}_func_impl to {cpu,rng}_impl_func
- only create two threads which run N iterations instead of two threads each iteration

* Update timemory submodule

- updates to unwind::processed_entry
- updates to procfs::maps

* Updated causal documentation

- fixed line numbers changed by modifications to causal example

* omnitrace-causal exe updates

- set OMNITRACE_THREAD_POOL_SIZE to zero by default

* core/containers updates

- static_vector: provide data() member function
- c_array pop_front() and pop_back() member functions

* core: config and argparse updates + perf

- core/perf.{hpp,cpp}
  - forward decl of enums
  - config-related capabilities
- argparse: --sample-overflow
- renamed some config functions
  - e.g. get_sampling_cpu_freq -> get_sampling_cputime_freq
- added config settings related to overflow sampling via perf
- added timer_sampling and overflow_sampling categories

* Update timemory submodule

- sampling allocator flushing

* binary updates

- lookup_ipaddr_entry
- use bfd_find_nearest_line instead of bfd_find_nearest_line_discriminator
  - discriminators are not used
- explicit instantiations of inlined_symbol::serialize

* Bump VERSION to 1.10.0

* sampling and perf updates

- support overflow sampling via Linux Perf
- update perf namespace
- update perf::perf_event
  - update record ctor: pointer instead of const ref
  - update open member func: return optional string
  - add m_batch_size member variable
- sampling updates
  - support overflow sampling
  - flush allocators
  - increase buffer size from 1024 to 2048
  - restructure post-processing in light of perf overflow support
  - improve offload memory usage: only load buffers for thread
  - load_offload_buffer(tid) uses thread-specific filepos
- component updates
  - backtrace_metrics::operator-=
  - backtrace_metrics::operator-
  - backtrace::sample does not record for overflow signal
  - callchain: perf overflow sample

* core updates

- component::sampling_percent does not report self + uses_percent_units

* causal updates

- tweak get_line_info
- overloads for set_current_selection (uint64_t, c_array, std::array)
- delay
  - use sampling::pause/sampling::resume
- experiment
  - experiment::sample derives from unwind::processed_entry
  - experiment::samples is vector instead of set
  - fixed samples
  - overloads for is_selected (uint64_t, c_array, std::array)
  - scaling factor defaults to 100 instead of 50
  - serialize updates follow change to experiment::sample
  - modify algorithm for increasing/decreasing experiment length
- sample_data
  - use map<uintptr, uint64_t> instead of set<sample_data>
  - get_samples returns vector<sample_data> instead of set<sample_data>
- sampling
  - support overflow via Linux Perf
  - update causal_offload_buffer
  - flush sampling allocator
- backtrace
  - overflow component

* libomnitrace-dl updates

- handle dl::InstrumentMode::PythonProfile

* testing updates (causal)

- causal line 155 -> causal line 100
- causal line 165 -> causal line 110

* formatting

* exit_gotcha updates

- exit_info for abort()
- message about non-zero exit code

* testing updates

- fail regex for causal tests
- validate-causal-json: >= min_experiments instead of > min_experiments
- handle OMNITRACE_DEBUG_SETTINGS in omnitrace_write_test_config

* causal sampling updates

- add new lines where appropriate

* causal data updates

- reorder diagnostic info when experiment fails to start

* binary updates

- symbol address range from address to address + symsize + 1
  - add 1 based on debug info

* causal data updates

- sample_selection wait_ns defaults to 1,000 instead of 10,000
- sample_selection wait scaled by iteration number
- save_line_info_impl verbosity
- print latest_eligible_pc when experiment does not start

* causal sampling + component updates

- perf backend disables component::backtrace
- ensure get_sampling_(realtime|cputime|overflow)_signal do not malloc

* causal: remove period stats

* validate-causal-json update

- fix --help

* causal data updates

- improve eligible pc history reporting when experiment fails to start

* causal data updates

- fix compute_eligible_lines_impl
  - eligible address ranges returning too many ranges
  - occasionally, overwrite all *true* eligible address ranges

* causal data updates

- reduce scoped ranges to symbol ranges
- is_eligible_address() returns the true contains result (not just coarse)
- revert some sample_selection behavior

* binary address_multirange updates

- make coarse_range private
- fix operator+=(pair<coarse, uintptr_t>)

* causal example update

- fix nsync to default to once per iteration

* binary analysis updates

- tweak header file includes

* causal updates

- remove factoring in sleep_for_overhead
- invoke delay::process() even if experiment is not active

* causal data updates

- update latest_eligible_pc structure

* update omnitrace-install.py.in

- fix support for fedora
  - /etc/os-release does not have ID_LIKE
  - fallback to RHEL 8.7 if version not specified

* update omnitrace-install.py.in

- fix support for debian
  - /etc/os-release does not have ID_LIKE
  - version mapping

* Update documentation

- update docs on installation

* causal data and experiment updates

- data: reset_sample_selection

* causal set_current_selection debugging

- debug messages for failed e2e runs

* causal data and backtrace component updates

- data: set_current_selection returns the number of eligible addresses added
- backtrace: if cputime signal has selected zero IPs > 5x, then realtime signal starts contributing call-stacks

* core library updates

- move config::parse_numeric_range to utility namespace
- add core/utility.cpp
- support range:increment, e.g. 5-25:10 expands to '5 15 25' instead of '5 10 15 20 25'
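
A minimal sketch of the `range:increment` expansion, written in Python for illustration. The real implementation lives in the C++ core library and is not shown here, so the function below, its whitespace-separated token format, and its default increment of 1 are all assumptions. It does not handle negative values.

```python
def parse_numeric_range(text):
    """Expand tokens such as '7', '0-3', or '5-25:10' into a list of ints."""
    values = []
    for token in text.split():
        # optional ':increment' suffix; the default of 1 is an assumption
        span, _, step = token.partition(":")
        step = int(step) if step else 1
        if step <= 0:
            raise ValueError(f"increment must be positive: {token!r}")
        if "-" in span:
            low, high = (int(x) for x in span.split("-", 1))
        else:
            low = high = int(span)
        # the upper bound is inclusive
        values.extend(range(low, high + 1, step))
    return values


expanded = parse_numeric_range("5-25:10")
```

For the example from the commit message, `expanded` is `[5, 15, 25]`.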

* omnitrace-causal update

- end-to-end expands all speedups
- support range:increment in speedups

* causal backtrace updates

- remove select_ival (realtime signal always contributes when select_count == 0)

* containers: static_vector update

- explicit c_array constructor
- explicit std::array constructor

* causal data updates

- remove set_current_selection(uint64_t)
- remove set_current_selection(std::array)
- sample_selection increase default wait time
- report eligible PC candidates
- move reset_sample_selection to perform_experiment_impl
- decrease latest_eligible_pc array size
- set_current_selection does not guard for experiment::active

* core debug updates

- OMNITRACE_PRINT_COLOR macros

* causal data updates

- tweak to experiment never started message

* causal gotcha updates

- remove unused code

* critical trace updates

- remove unused code

* omnitrace-causal

- OMNITRACE_LAUNCHER

* causal data updates

- don't fail on end-to-end + omnitrace-causal

* causal backtrace updates

- reintroduce select_ival behavior

* causal data updates

- tweak verbose messages about number of PC candidates

* core mproc updates

- utilities for waiting on child PID and diagnosing status
  - omnitrace::mproc::wait_pid
  - omnitrace::mproc::diagnose_status

* omnitrace-run updates

- support --fork argument for executing via fork in current process + execvpe on child instead of execvpe in current process

* omnitrace-causal updates

- wait_pid and diagnose_status just call equivalent functions in omnitrace::mproc

* ubuntu-focal workflow update

- attempt to launch ubuntu-focal-codecov job with CAP_SYS_ADMIN and use perf backend

* tests reorg and updates

- remove binary-rewrite-sampling and runtime-instrument-sampling tests
- rename *-preload tests (which use omnitrace-sample exe) to *-sampling
- split tests/CMakeLists.txt into several tests/omnitrace-<category>-tests.cmake files
- tweak to causal-both-omni-func test
  - add args: -n 2 -b timer

* update validate-causal-json.py

- better reasoning info for adjusting tolerance
- always apply tolerance adjustments in CI mode
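
The tolerance adjustment can be summarized with a small sketch that follows the branches of `validation.validate` in the script. The function names `adjusted_tolerance` and `within` and the shortened reason strings are hypothetical; only the arithmetic is taken from the script.

```python
import math


def adjusted_tolerance(tolerance, base_stddev, prog_stddev, ci=False):
    """Return (tolerance, reason) after accounting for noisy measurements."""
    if ci:
        # CI mode: always widen by the larger standard deviation
        return tolerance + max(base_stddev, prog_stddev), "shared CI system"
    if base_stddev > tolerance:
        return tolerance + math.sqrt(base_stddev), "large baseline stddev"
    if prog_stddev > 1.0:
        return tolerance + math.sqrt(prog_stddev), "large program speedup stddev"
    return tolerance, None


def within(predicted, expected, tolerance):
    """True when the predicted speedup is inside expected +/- tolerance."""
    return expected - tolerance <= predicted <= expected + tolerance


ci_tolerance, ci_reason = adjusted_tolerance(5.0, 2.0, 3.0, ci=True)
```

Outside CI mode the tolerance grows only by the square root of the offending standard deviation, so a noisy run loosens the check less aggressively than it does on shared CI hardware.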

* causal e2e tests update

- add "causal-e2e" label
- tweak params
  - old: 80 12 432525 500000000
  - new: 80 50 432525 100000000
- disable processor affinity for slow-func/line-100 tests
  - artificially inflates some speedups with perf

* unblocking_gotcha updates

- overload operator() according to gotcha function index

* blocking_gotcha updates

- overload operator() according to gotcha function index
- fix bug where post-block functors (e.g. pthread_mutex_trylock) could throw an error if the lock is not acquired

* parse_numeric_range update

- support unordered_set

* config update

- OMNITRACE_DEBUG_{TIDS,PIDS} use parse_numeric_range
2023-04-13 02:14:35 -05:00

561 lines
18 KiB
Python
Executable file

#!/usr/bin/env python3

import os
import re
import sys
import json
import math
import argparse
from collections import OrderedDict

num_stddev = 1.0


def mean(_data):
    return sum(_data) / float(len(_data)) if len(_data) > 0 else 0.0


def stddev(_data):
    if len(_data) == 0:
        return 0.0
    _mean = mean(_data)
    _variance = sum([((x - _mean) ** 2) for x in _data]) / float(len(_data))
    return float(num_stddev) * math.sqrt(_variance)


def simpsons_rule(a, b, fa, fb):
    """Simple numerical integration via Simpson's rule
    https://en.m.wikipedia.org/wiki/Simpson%27s_rule
    """

    slope = (fb - fa) / (b - a)
    # f(x) at midpoint
    fm = fa + (0.5 * (b - a) * slope)
    factor = (b - a) / 6.0
    # print(
    #     f"[{a:8.3f} : {b:8.3f}|{fa:8.3f} : {fb:8.3f}][slope={slope:8.3f}] {factor:8.3f} * ({fa:8.3f} + (4.0 * {fm:8.3f}) + {fb:8.3f})"
    # )
    return factor * (fa + (4.0 * fm) + fb)

class validation(object):
    def __init__(self, _exp_re, _pp_re, _virt, _expected, _tolerance):
        self.experiment_filter = re.compile(_exp_re)
        self.progress_pt_filter = re.compile(_pp_re)
        self.virtual_speedup = int(_virt)
        self.program_speedup = float(_expected)
        self.tolerance = float(_tolerance)

    def validate(
        self,
        _exp_name,
        _pp_name,
        _virt_speedup,
        _prog_speedup,
        _prog_speedup_stddev,
        _base_speedup_stddev,
        _ci=False,
    ):
        # return None when this validation does not apply to the given result
        if (
            not re.search(self.experiment_filter, _exp_name)
            or not re.search(self.progress_pt_filter, _pp_name)
            or _virt_speedup != self.virtual_speedup
        ):
            return None

        _tolerance = self.tolerance
        _reason = "[unspecified reason]"
        if _ci is True:
            # On GitHub Actions servers, you typically only get two CPUs, which
            # may be one core with two hyperthreads. The hyperthreading can cause
            # the speedup potential to drop. Furthermore, these are typically
            # shared resources, so the runtime may vary significantly. Thus,
            # always account for stddev to prevent failures due to these causes.
            _tolerance += max([_base_speedup_stddev, _prog_speedup_stddev])
            _reason = "results obtained on a shared CI system... potentially artificially deflating speedup predictions"
        elif _base_speedup_stddev > self.tolerance:
            _tolerance += math.sqrt(_base_speedup_stddev)
            _reason = (
                f"large standard deviation of the baseline ({_base_speedup_stddev:.3f})"
            )
        elif _prog_speedup_stddev > 1.0:
            _tolerance += math.sqrt(_prog_speedup_stddev)
            _reason = f"large standard deviation of the program speedup ({_prog_speedup_stddev:.3f})"

        if _tolerance > self.tolerance:
            sys.stderr.write(
                f" [{_exp_name}][{_pp_name}][{_virt_speedup}] Tolerance increased: {_reason} ({self.tolerance:.3f} increased to {_tolerance:.3f})...\n"
            )

        def _compute(_speedup_v, _tolerance_v):
            return _speedup_v >= (self.program_speedup - _tolerance_v) and _speedup_v <= (
                self.program_speedup + _tolerance_v
            )

        return _compute(_prog_speedup, _tolerance)

class throughput_point(object):
    def __init__(self, _speedup):
        self.speedup = _speedup
        self.delta = []
        self.duration = []

    def __iadd__(self, _data):
        self.delta += [float(_data[0])]
        self.duration += [float(_data[1])]
        # in-place operators must return the object, otherwise "x += y" binds x to None
        return self

    def __len__(self):
        return len(self.duration)

    def __eq__(self, rhs):
        return self.speedup == rhs.speedup

    # Python's inequality hook is __ne__ (a method named __neq__ is never called)
    def __ne__(self, rhs):
        return not self == rhs

    def __lt__(self, rhs):
        return self.speedup < rhs.speedup

    def get_data(self):
        return [x / y for x, y in zip(self.duration, self.delta)]

    def mean(self):
        return sum(self.duration) / sum(self.delta)


class latency_point(object):
    def __init__(self, _speedup):
        self.speedup = _speedup
        self.arrivals = []
        self.departures = []
        self.duration = []

    def __iadd__(self, _data):
        self.arrivals += [float(_data[0])]
        self.departures += [float(_data[1])]
        self.duration += [float(_data[2])]
        # in-place operators must return the object, otherwise "x += y" binds x to None
        return self

    def __len__(self):
        return len(self.duration)

    def __eq__(self, rhs):
        return self.speedup == rhs.speedup

    # Python's inequality hook is __ne__ (a method named __neq__ is never called)
    def __ne__(self, rhs):
        return not self == rhs

    def __lt__(self, rhs):
        return self.speedup < rhs.speedup

    def get_data(self):
        return [y / x for x, y in zip(self.arrivals, self.duration)]

    def get_difference(self):
        _duration = sum(self.duration)
        return [x / _duration for x in self.duration]

    def mean(self):
        rate = sum(self.arrivals) / sum(self.duration)
        return sum(self.get_difference()) / rate

class line_speedup(object):
    def __init__(self, _name="", _prog="", _exp_data=None, _exp_base=None):
        self.name = _name
        self.prog = _prog
        self.data = _exp_data
        self.base = _exp_base

    def virtual_speedup(self):
        if self.data is None or self.base is None:
            return 0.0
        return self.data.speedup

    def compute_speedup(self):
        if self.data is None or self.base is None:
            return 0.0
        # percent change of the mean relative to the baseline (0% virtual speedup)
        return ((self.base.mean() - self.data.mean()) / self.base.mean()) * 100

    def compute_speedup_stddev(self):
        if self.data is None or self.base is None:
            return 0.0
        _data = []
        _base = self.base.mean()
        for ditr in self.data.get_data():
            _data += [((_base - ditr) / _base) * 100]
        return stddev(_data)

    def get_name(self):
        # shorten any path component that refers to an existing file to its basename
        return ":".join(
            [
                os.path.basename(x) if os.path.isfile(x) else x
                for x in self.name.split(":")
            ]
        )

    def __str__(self):
        if self.data is None or self.base is None:
            return f"{self.name}"
        _line_speedup = self.compute_speedup()
        _line_stddev = self.compute_speedup_stddev()  # 3 stddev == 99.87%
        _name = self.get_name()
        return f"[{_name}][{self.prog}][{self.data.speedup:3}] speedup: {_line_speedup:6.1f} +/- {_line_stddev:6.2f} %"

    def __eq__(self, rhs):
        return (
            self.name == rhs.name
            and self.prog == rhs.prog
            and self.data == rhs.data
            and self.base == rhs.base
        )

    # Python's inequality hook is __ne__ (a method named __neq__ is never called)
    def __ne__(self, rhs):
        return not self == rhs

    def __lt__(self, rhs):
        if self.name != rhs.name:
            return self.name < rhs.name
        elif self.prog != rhs.prog:
            return self.prog < rhs.prog
        elif self.data != rhs.data:
            return self.data < rhs.data
        elif self.base != rhs.base:
            return self.base < rhs.base
        return False


class experiment_progress(object):
    def __init__(self, _data):
        self.data = _data

    def get_impact(self):
        speedup_c = [float(x.compute_speedup()) for x in self.data]
        speedup_v = [float(x.virtual_speedup()) for x in self.data]
        impact = []
        # integrate the program speedup over each pair of adjacent virtual speedups
        for i in range(len(self.data) - 1):
            impact += [
                simpsons_rule(
                    speedup_v[i], speedup_v[i + 1], speedup_c[i], speedup_c[i + 1]
                )
            ]
        return [sum(impact), mean(impact), stddev(impact)]

    def __len__(self):
        return len(self.data)

    def __str__(self):
        _impact_v = self.get_impact()
        _name = self.data[0].get_name()
        _prog = self.data[0].prog
        _impact = [
            f"[{_name}][{_prog}][sum] impact: {_impact_v[0]:6.1f}",
            f"[{_name}][{_prog}][avg] impact: {_impact_v[1]:6.1f} +/- {_impact_v[2]:6.2f}",
        ]
        return "\n".join([f"{x}" for x in self.data] + _impact)

    def __lt__(self, rhs):
        self.data.sort()
        return self.get_impact()[0] < rhs.get_impact()[0]

def process_samples(data, _data):
    if not _data:
        return data
    for record in _data["omnitrace"]["causal"]["records"]:
        for samp in record["samples"]:
            _info = samp["info"]
            _count = samp["count"]
            _func = _info["dfunc"]
            # accumulate the sample count per function
            if _func not in data:
                data[_func] = 0
            data[_func] += _count
            # accumulate the sample count per file:line
            for dwarf_entry in _info["dwarf_info"]:
                _name = "{}:{}".format(dwarf_entry["file"], dwarf_entry["line"])
                if _name not in data:
                    data[_name] = 0
                data[_name] += _count
    return data


def process_data(data, _data, args):
    def find_or_insert(_data, _value, _type):
        if _value not in _data:
            if _type == "throughput":
                _data[_value] = throughput_point(_value)
            elif _type == "latency":
                _data[_value] = latency_point(_value)
        return _data[_value]

    if not _data:
        return data

    _selection_filter = re.compile(args.experiments)
    _progresspt_filter = re.compile(args.progress_points)

    for record in _data["omnitrace"]["causal"]["records"]:
        for exp in record["experiments"]:
            _speedup = exp["virtual_speedup"]
            _duration = exp["duration"]
            _file = exp["selection"]["info"]["file"]
            _line = exp["selection"]["info"]["line"]
            _func = exp["selection"]["info"]["dfunc"]
            _sym_addr = exp["selection"]["symbol_address"]
            # line-based selections are keyed by file:line, function-based ones by name
            _selected = ":".join([_file, f"{_line}"]) if _sym_addr == 0 else _func
            if not re.search(_selection_filter, _selected):
                continue
            if _selected not in data:
                data[_selected] = {}
            for pts in exp["progress_points"]:
                _name = pts["name"]
                if not re.search(_progresspt_filter, _name):
                    continue
                if _name not in data[_selected]:
                    data[_selected][_name] = {}
                if "delta" in pts:
                    _delt = pts["delta"]
                    if _delt > 0:
                        itr = find_or_insert(
                            data[_selected][_name], _speedup, "throughput"
                        )
                        itr += [_delt, _duration]
                elif "arrival" in pts and pts["arrival"] > 0:
                    itr = find_or_insert(data[_selected][_name], _speedup, "latency")
                    itr += [pts["arrival"], pts["departure"], _duration]
                else:
                    _delt = pts["laps"]
                    if _delt > 0:
                        # laps are recorded as [count, duration], i.e. throughput data;
                        # the original call omitted the required type argument
                        itr = find_or_insert(
                            data[_selected][_name], _speedup, "throughput"
                        )
                        itr += [_delt, _duration]
    return data

def compute_speedups(_data, args):
    # copy the input with the speedups of each progress point sorted
    data = {}
    for selected, pitr in _data.items():
        if selected not in data:
            data[selected] = {}
        for progpt, ditr in pitr.items():
            data[selected][progpt] = OrderedDict(sorted(ditr.items()))

    ret = []
    for selected, pitr in data.items():
        for progpt, ditr in pitr.items():
            # a baseline (0% virtual speedup) is required to compute any speedup
            if 0 not in ditr.keys():
                continue
            for speedup, itr in ditr.items():
                if len(args.speedups) > 0 and speedup not in args.speedups:
                    continue
                if speedup != itr.speedup:
                    raise ValueError(f"in {selected}: {speedup} != {itr.speedup}")
                if len(itr) >= args.min_experiments:
                    _val = line_speedup(selected, progpt, itr, ditr[0])
                    ret.append(_val)

    ret.sort()

    # group consecutive entries that share the same selection and progress point
    _last_name = None
    _last_prog = None
    result = []
    for itr in ret:
        if itr.name != _last_name or itr.prog != _last_prog:
            result.append([])
        result[-1].append(itr)
        _last_name = itr.name
        _last_prog = itr.prog

    _data = []
    for itr in result:
        _data.append(experiment_progress(itr))

    _data.sort()
    return _data


def get_validations(args):
    data = []
    _len = len(args.validate)
    if _len == 0:
        return data
    elif _len % 5 != 0:
        # the braces in the first two literals are plain text, not format fields
        raise ValueError(
            "validation requires format: {experiment regex} {progress-point regex} "
            "{virtual-speedup} {expected-speedup} {tolerance} "
            f"(i.e. 5 args per validation). There are {_len % 5} extra/missing arguments"
        )
    v = args.validate
    for i in range(_len // 5):
        off = 5 * i
        data.append(
            validation(v[off + 0], v[off + 1], v[off + 2], v[off + 3], v[off + 4])
        )
    return data

def main():
    global num_stddev

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-e", "--experiments", type=str, help="Regex for experiments", default=".*"
    )
    parser.add_argument(
        "-p",
        "--progress-points",
        type=str,
        help="Regex for progress points",
        default=".*",
    )
    parser.add_argument(
        "-n", "--num-points", type=int, help="Minimum number of data points", default=5
    )
    parser.add_argument(
        "-m",
        "--min-experiments",
        type=int,
        help="Minimum number of experiments per speedup (e.g. do not display speedups when there are fewer than X experiments at this speedup)",
        default=2,
    )
    parser.add_argument(
        "-i", "--input", type=str, nargs="*", help="Input file(s)", required=True
    )
    parser.add_argument(
        "-s",
        "--speedups",
        type=int,
        help="List of speedup values to report",
        nargs="*",
        default=[],
    )
    parser.add_argument(
        "-d",
        "--stddev",
        type=float,
        help="Number of standard deviations to report",
        default=1.0,
    )
    parser.add_argument(
        "-v",
        "--validate",
        type=str,
        nargs="*",
        help="Validate speedup: {experiment regex} {progress-point regex} {virtual-speedup} {expected-speedup} {tolerance}",
        default=[],
    )
    parser.add_argument(
        "--samples",
        type=float,
        help="Report samples within this percentage of the peak (0.0, 100.0] (default: 95 percent)",
        default=95.0,
    )
    parser.add_argument(
        "--ci",
        action="store_true",
        help="{}. {}".format(
            "Always increase the validation tolerance by the larger of the baseline and program speedup standard deviations",
            "This is primarily used for the CI where the two threads commonly run on 1 CPU core with 2 hyperthreads (causing the speedup potential to drop)",
        ),
    )

    args = parser.parse_args()

    num_stddev = args.stddev
    num_speedups = len(args.speedups)
    percent_samples = args.samples

    # reject anything outside (0, 100]; the original "and" could never be true
    if not (0.0 < percent_samples <= 100.0):
        raise ValueError(
            f"Invalid samples value: {percent_samples}. Supported range: 0.0 < x <= 100.0"
        )
    percent_samples = 1.0 - (percent_samples / 100.0)

    if num_speedups > 0 and args.num_points > num_speedups:
        args.num_points = num_speedups

    data = {}
    samp = {}
    for inp in args.input:
        with open(inp, "r") as f:
            inp_data = json.load(f)
            data = process_data(data, inp_data, args)
            samp = process_samples(samp, inp_data)

    print("Samples:")
    # max() raises on an empty sequence, so skip the report when there are no samples
    if samp:
        width = max([int(math.log10(x) + 1) for _, x in samp.items()])
        samp_peak = max([count for _, count in samp.items()])
        for name, count in sorted(samp.items(), key=lambda x: x[1], reverse=True):
            if count >= samp_peak * percent_samples:
                print(f" {count:{width}} :: {name}")

    results = compute_speedups(data, args)

    print("")
    print("Experiments:")
    for itr in results:
        if len(itr) < args.num_points:
            continue
        print("")
        # split each line, indent each line, and join again into single string
        print("{}".format("\n".join([f" {x}" for x in f"{itr}".split("\n")])))
    sys.stdout.flush()

    validations = get_validations(args)
    expected_validations = len(validations)
    correct_validations = 0
    if expected_validations > 0:
        print(f"\nPerforming {expected_validations} validations...\n")
        for eitr in results:
            _experiment = eitr.data[0].get_name()
            _progresspt = eitr.data[0].prog
            _base_speedup_stddev = eitr.data[0].compute_speedup_stddev()
            for ditr in eitr.data:
                _virt_speedup = ditr.virtual_speedup()
                _prog_speedup = ditr.compute_speedup()
                _prog_speedup_stddev = ditr.compute_speedup_stddev()
                for vitr in validations:
                    _v = vitr.validate(
                        _experiment,
                        _progresspt,
                        _virt_speedup,
                        _prog_speedup,
                        _prog_speedup_stddev,
                        _base_speedup_stddev,
                        args.ci,
                    )
                    # None means the validation did not apply to this result
                    if _v is None:
                        continue
                    if _v is True:
                        correct_validations += 1
                    else:
                        sys.stderr.write(
                            f"\n [{_experiment}][{_progresspt}][{_virt_speedup}] failed validation: {_prog_speedup:8.3f} != {vitr.program_speedup} +/- {vitr.tolerance}\n\n"
                        )

    if expected_validations != correct_validations:
        sys.stderr.flush()
        sys.stderr.write(
            f"\nCausal profiling predictions not validated. Expected {expected_validations}, found {correct_validations}\n"
        )
        sys.stderr.flush()
        sys.exit(-1)
    elif expected_validations > 0:
        print(f"Causal profiling predictions validated: {expected_validations}")


if __name__ == "__main__":
    main()