Global trace delay and duration (#235)
- The primary feature of this PR is the **addition of support for scoping the collection of tracing/profiling data into one or more time-based windows**
  - Closes #222
  - Closes #207
  - Support for a real-clock time delay and/or a duration for tracing/profiling was added, *resembling the support for this feature during sampling and process-sampling*
  - However, the above paradigm was enhanced for tracing
    - Instead of one delay and/or one duration based on real time, ***tracing supports periodic and varying delays and durations and these delay+duration sets can be controlled with different clocks***
    - At some point, this capability will be extended to sampling and process-sampling
- A secondary feature of this PR is the improved handling of categories (a by-product of the primary feature)
  - For example, setting `OMNITRACE_ENABLE_CATEGORIES` to a specific set of categories previously only eliminated the disabled categories from the perfetto trace; now these are applied to timemory profiles too
  - A new configuration variable `OMNITRACE_DISABLE_CATEGORIES` was added for when disabling only a handful of categories is easier
- There are quite a few miscellaneous modifications which pollute this PR a bit

## Multiple Tracing Windows

As noted above, tracing now supports specifying multiple delays and durations _and_ using different clocks for each. Consider the configuration below with two entries in the format `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>`:

```console
OMNITRACE_TRACE_PERIODS = 0.5:1.0:2:realtime 10.0:5.0:3:cputime
```

The above configuration defines:

1. `0.5:1.0:2:realtime`
   - A delay of 0.5 seconds (real-time)
   - Followed by a data collection duration of 1 second (real-time)
   - This delay + duration is repeated 2x
   - Summary: tracing data is collected for 2 out of the first 3 seconds of the application's execution
2. `10.0:5.0:3:cputime`
   - A delay of 10 seconds (process _CPU-time_)
   - Followed by a data collection duration of 5 seconds (process _CPU-time_)
   - This delay + duration is repeated 3x
   - Summary: tracing data is collected for a total of 15 seconds of process CPU-time in the ensuing 45 seconds of CPU-time during the application execution.
   - Note: the elapsed CPU-time is the aggregate of the CPU-time consumed by all the threads in the process and should be scaled accordingly, e.g., 4 threads running constantly for 1 second of real-time is ~4 seconds of CPU time.

## `omnitrace-sample` Changes

Formerly, the `--wait` and `--duration` command-line options only applied to the sampling delay and duration. The values of these options are now also applied to the tracing delay and duration. To retain the ability to control the sampling delay/duration without setting the tracing delay/duration or vice versa, the `--sampling-wait`, `--sampling-duration`, `--trace-wait`, and `--trace-duration` options were added.

`omnitrace-sample` also has new options for most of the new configuration options detailed below.

## New configuration options

| Option | Description |
| ------- | ----------- |
| `OMNITRACE_DISABLE_CATEGORIES` | Inverse behavior of `OMNITRACE_ENABLE_CATEGORIES`: populates the list of all available categories and then removes the specified ones. |
| `OMNITRACE_TRACE_DELAY` | Single floating-point number specifying the time to wait before starting data collection. Analogous to `OMNITRACE_SAMPLING_DELAY` and `OMNITRACE_PROCESS_SAMPLING_DELAY` |
| `OMNITRACE_TRACE_DURATION` | Single floating-point number specifying the data collection duration. Analogous to `OMNITRACE_SAMPLING_DURATION` and `OMNITRACE_PROCESS_SAMPLING_DURATION` |
| `OMNITRACE_TRACE_PERIOD_CLOCK_ID` | Sets the default clock type for the tracing delay/duration. Always applied to `OMNITRACE_TRACE_DELAY` and `OMNITRACE_TRACE_DURATION`; can be overridden per entry in `OMNITRACE_TRACE_PERIODS`. Accepts `CLOCK_REALTIME`, `CLOCK_MONOTONIC`, `CLOCK_PROCESS_CPUTIME_ID`, `CLOCK_MONOTONIC_RAW`, `CLOCK_REALTIME_COARSE`, `CLOCK_MONOTONIC_COARSE`, `CLOCK_BOOTTIME`. See `man 2 clock_gettime` for details on the differences. |
| `OMNITRACE_TRACE_PERIODS` | More powerful way of specifying delay + duration. Supports the formats `<DELAY>`, `<DELAY>:<DURATION>`, `<DELAY>:<DURATION>:<REPEAT>`, and `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_ID>`. |

## Miscellaneous Changes

- Expanded `critical_trace_categories_t` to include tracing data from MPI, pthread, HIP, HSA, RCCL, NUMA, and Python.
- Added categories `thread_wall_time` and `thread_cpu_time` (derived from sampling)
- Read DWARF info for breakpoints
- Relocated some source code
  - Reason: necessary to make `libomnitrace` a bit more modular. Eventually, a large chunk will be separated into `libomnitrace-core`, `libomnitrace-binary`, etc. in order to facilitate re-usability
  - Relocated some functionality from `runtime.cpp` to `config.cpp`
  - Relocated code using the rocm-smi library to query the number of devices to `gpu.cpp` (where the code for using HIP to query the number of devices is)
  - Relocated code for perfetto config and perfetto session out of the tracing namespace to reside with other perfetto code
- `OMNITRACE_COLORIZED_LOG` configuration option renamed to `OMNITRACE_MONOCHROME`
  - Backwards compatibility via a deprecated option was not retained here since the logic changed (i.e. true in the former means false in the latter)
- Replaced `TIMEMORY_DEFAULT_OBJECT` macro with `OMNITRACE_DEFAULT_OBJECT` macro
- Updated some code in roctracer to use `component::category_region` instead of explicitly using `tracing::` functions
- Updated `backtrace_metrics` to better support controlling their presence in the traces/profiles via categories
- Added support for `--print` in `validate-timemory-json.py`
- Added a generic `OMNITRACE_ADD_VALIDATION_TEST` CMake function

## Git Log

* OMNITRACE_DEFAULT_OBJECT
  - replace TIMEMORY_DEFAULT_OBJECT with OMNITRACE_DEFAULT_OBJECT
* trace-time-window example + tests
  - adds cmake OMNITRACE_ADD_VALIDATION_TEST function for testing
  - validate-timemory-json.py now supports printing (-p)
  - update to OMNITRACE_STRIP_TARGET
* Update timemory submodule
  - detailed backtrace print `/proc/<PID>/maps`
  - operation::push_node verbosity change
  - storage::insert_hierarchy use emplace + at instead of operator[]
  - concepts::is_type_listing
  - argparse updates for start/end group
  - argparse color fixes
* perfetto updates
  - Remove OMNITRACE_CUSTOM_DATA_SOURCE CMake option
  - move tracing::get_perfetto_config and tracing::get_perfetto_session to perfetto.cpp
* config and runtime updates
  - OMNITRACE_DISABLE_CATEGORIES option
  - get_enabled_categories() + get_disabled_categories()
    - config impl handles populating them
  - OMNITRACE_TRACE_DELAY option
  - OMNITRACE_TRACE_DURATION option
  - OMNITRACE_TRACE_PERIODS option
  - {get,set}_signal_handler
    - removes config.cpp link dependency for omnitrace_finalize
  - get_realtime_signal() + get_cputime_signal() + get_sampling_signals()
    - moved from runtime.cpp to config.cpp
* utility::convert
  - helper function for converting string to a type
* pthread_create_gotcha + thread_info updates
  - thread_index_data::as_string()
  - tweak printing info about new thread / exited thread
* binary updates
  - get_binary_info has arg to disable dwarf parsing
  - binary_info contains vector of breakpoint addresses
  - binary_info::filename() function
  - binary::get_linked_path
  - binary::get_link_map has args for dlopen mode
  - symbol::read_dwarf -> symbol::read_dwarf_entries
  - symbol::read_dwarf_breakpoints
* library updates + categories impl
  - implement config::set_signal_handler
  - categories.cpp for handling trace delays
  - implement trace delay/duration/periods
* concepts + debug + defines
  - tuple_element in concepts
  - removed runtime header from debug header
  - OMNITRACE_DEFAULT_COPY_MOVE
* gpu + rocm_smi
  - moved rsmi_num_monitor_devices call to gpu.cpp
  - gpu::rsmi_device_count()
* roctracer updates
  - roctracer_bundle_t -> roctracer_hip_bundle_t
  - use category_region instead of explicit tracing push/pop calls
* sampling + backtrace_metrics
  - rework backtrace_metrics to support categories
* tracing updates
  - category stack counters (i.e. push vs. pop counter) for profiling and tracing
  - push_timemory and pop_timemory accept string_view instead of const char*
  - tweaked the pop_timemory hash search
  - {push,pop}_perfetto theoretically supports same invocations as for {push,pop}_perfetto_ts and {push,pop}_perfetto_track
  - mark_perfetto, mark_perfetto_ts, mark_perfetto_track
* category_region update
  - expanded the critical trace categories
  - use category_push_disabled
  - use category_pop_disabled
  - use category_mark_disabled
* constraint implementation
  - This provides generic functionality for constraining data collection within a window of time.
  - E.g., delay, delay + duration, (delay + duration) * nrepeat
* COLORIZED_LOG -> MONOCHROME
* constraint + omnitrace-causal + omnitrace-sample updates
  - support for using different clock IDs for constraints
  - OMNITRACE_TRACE_PERIOD_CLOCK_ID option
  - tweak to trace-time-window example
  - tweak to trace-time-window tests
* Fix formatting
* Update time-window tests
  - Fix detection of validation support for perfetto
  - Using the --caller-include feature + runtime instrumentation on Ubuntu 18.04 and OpenSUSE 15.2 results in a segfault in the internals of Dyninst.
    - For now, mark that these tests will fail
    - Later, determine if updating Dyninst submodule fixes this problem
* Fix OMNITRACE_OUTPUT_PATH for all tests
  - Provide absolute path instead of relative
* Tweak lambda for checking whether HW counters are enabled
  - causing strange build errors on older GCC compilers
* Update dyninst submodule
  - fix issues with using --caller-include for Ubuntu 18.04, OpenSUSE 15.x
* cmake formatting
* fix sampling compiler issue for GCC 8
* Tweak thread create message
* Increase causal validation iterations
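The `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>` format from the "Multiple Tracing Windows" section can be illustrated with a small parser. This is only a sketch of the arithmetic, not omnitrace's implementation: `trace_period`, `parse_period`, `collected_time`, and `elapsed_time` are hypothetical names, and the defaults used for omitted fields are placeholders, since the commit message does not say what an omitted duration or repeat count means.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one "<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>" entry.
// The default values are placeholders for this sketch only.
struct trace_period
{
    double      delay    = 0.0;
    double      duration = 0.0;
    int         repeat   = 1;
    std::string clock    = "realtime";
};

// Split the entry on ':' and fill in whichever fields are present.
trace_period
parse_period(const std::string& entry)
{
    auto fields = std::vector<std::string>{};
    auto ss     = std::istringstream{ entry };
    for(std::string tok; std::getline(ss, tok, ':');)
        fields.emplace_back(tok);

    auto p = trace_period{};
    if(fields.size() > 0) p.delay = std::stod(fields[0]);
    if(fields.size() > 1) p.duration = std::stod(fields[1]);
    if(fields.size() > 2) p.repeat = std::stoi(fields[2]);
    if(fields.size() > 3) p.clock = fields[3];
    return p;
}

// Time (on the entry's clock) during which data is collected.
double
collected_time(const trace_period& p)
{
    return p.duration * p.repeat;
}

// Total time (on the entry's clock) spanned by all delay + duration cycles.
double
elapsed_time(const trace_period& p)
{
    return (p.delay + p.duration) * p.repeat;
}
```

Under these assumptions, `0.5:1.0:2:realtime` collects for 2 of 3 seconds of real time, and `10.0:5.0:3:cputime` collects for 15 of (10 + 5) × 3 = 45 seconds of process CPU-time.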
Committed by: GitHub
Parent: 2fb67c394b
Commit: 8feb6bf8b6
@@ -215,6 +215,22 @@ parse:
         DEFINITIONS: '*'
         LINK_LIBRARIES: '*'
         INCLUDE_DIRECTORIES: '*'
+    omnitrace_add_validation_test:
+      kwargs:
+        NAME: '*'
+        ARGS: '*'
+        LABELS: '*'
+        TIMEOUT: '*'
+        DEPENDS: '*'
+        PROPERTIES: '*'
+        PASS_REGEX: '*'
+        FAIL_REGEX: '*'
+        SKIP_REGEX: '*'
+        ENVIRONMENT: '*'
+        PERFETTO_FILE: '*'
+        PERFETTO_METRIC: '*'
+        TIMEMORY_FILE: '*'
+        TIMEMORY_METRIC: '*'
   override_spec: {}
   vartags: []
   proptags: []
@@ -116,8 +116,6 @@ if(CI_BUILD)
                          ADVANCED)
     omnitrace_add_option(OMNITRACE_BUILD_DEBUG
                          "Enable building with extensive debug symbols" OFF ADVANCED)
-    omnitrace_add_option(OMNITRACE_CUSTOM_DATA_SOURCE "Enable custom data source" OFF
-                         ADVANCED)
     omnitrace_add_option(
         OMNITRACE_BUILD_HIDDEN_VISIBILITY
         "Build with hidden visibility (disable for Debug builds)" OFF ADVANCED)
@@ -131,8 +129,6 @@ else()
                          ADVANCED)
     omnitrace_add_option(OMNITRACE_BUILD_DEBUG
                          "Enable building with extensive debug symbols" OFF ADVANCED)
-    omnitrace_add_option(OMNITRACE_CUSTOM_DATA_SOURCE "Enable custom data source" OFF
-                         ADVANCED)
     omnitrace_add_option(
         OMNITRACE_BUILD_HIDDEN_VISIBILITY
         "Build with hidden visibility (disable for Debug builds)" ON ADVANCED)
@@ -108,27 +108,57 @@ function(OMNITRACE_CAPITALIZE str var)
 endfunction()

 # ------------------------------------------------------------------------------#
-# function omnitrace_strip_target()
+# function omnitrace_strip_target(<TARGET> [FORCE] [EXPLICIT])
 #
-# Creates a target which runs ctest but depends on all the tests being built.
+# Creates a post-build command which strips a binary. FORCE flag will override
 #
-function(OMNITRACE_STRIP_TARGET _TARGET)
-    if(CMAKE_STRIP AND OMNITRACE_STRIP_LIBRARIES)
-        add_custom_command(
-            TARGET ${_TARGET}
-            POST_BUILD
-            COMMAND
-                ${CMAKE_STRIP} -w --keep-symbol="omnitrace_init"
-                --keep-symbol="omnitrace_finalize" --keep-symbol="omnitrace_push_trace"
-                --keep-symbol="omnitrace_pop_trace" --keep-symbol="omnitrace_push_region"
-                --keep-symbol="omnitrace_pop_region" --keep-symbol="omnitrace_set_env"
-                --keep-symbol="omnitrace_set_mpi" --keep-symbol="omnitrace_reset_preload"
-                --keep-symbol="omnitrace_user_*" --keep-symbol="ompt_start_tool"
-                --keep-symbol="kokkosp_*" --keep-symbol="OnLoad" --keep-symbol="OnUnload"
-                --keep-symbol="OnLoadToolProp" --keep-symbol="OnUnloadTool"
-                --keep-symbol="__libc_start_main" ${ARGN} $<TARGET_FILE:${_TARGET}>
-            WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
-            COMMENT "Stripping ${_TARGET}...")
+function(OMNITRACE_STRIP_TARGET)
+    cmake_parse_arguments(STRIP "FORCE;EXPLICIT" "" "ARGS" ${ARGN})
+
+    list(LENGTH STRIP_UNPARSED_ARGUMENTS NUM_UNPARSED)
+
+    if(NUM_UNPARSED EQUAL 1)
+        set(_TARGET "${STRIP_UNPARSED_ARGUMENTS}")
+    else()
+        omnitrace_message(FATAL_ERROR
+                          "omnitrace_strip_target cannot deduce target from \"${ARGN}\"")
+    endif()
+
+    if(NOT TARGET "${_TARGET}")
+        omnitrace_message(
+            FATAL_ERROR
+            "omnitrace_strip_target not provided valid target: \"${_TARGET}\"")
+    endif()
+
+    if(CMAKE_STRIP AND (STRIP_FORCE OR OMNITRACE_STRIP_LIBRARIES))
+        if(STRIP_EXPLICIT)
+            add_custom_command(
+                TARGET ${_TARGET}
+                POST_BUILD
+                COMMAND ${CMAKE_STRIP} ${STRIP_ARGS} $<TARGET_FILE:${_TARGET}>
+                WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
+                COMMENT "Stripping ${_TARGET}...")
+        else()
+            add_custom_command(
+                TARGET ${_TARGET}
+                POST_BUILD
+                COMMAND
+                    ${CMAKE_STRIP} -w --keep-symbol="omnitrace_init"
+                    --keep-symbol="omnitrace_finalize"
+                    --keep-symbol="omnitrace_push_trace"
+                    --keep-symbol="omnitrace_pop_trace"
+                    --keep-symbol="omnitrace_push_region"
+                    --keep-symbol="omnitrace_pop_region" --keep-symbol="omnitrace_set_env"
+                    --keep-symbol="omnitrace_set_mpi"
+                    --keep-symbol="omnitrace_reset_preload"
+                    --keep-symbol="omnitrace_user_*" --keep-symbol="ompt_start_tool"
+                    --keep-symbol="kokkosp_*" --keep-symbol="OnLoad"
+                    --keep-symbol="OnUnload" --keep-symbol="OnLoadToolProp"
+                    --keep-symbol="OnUnloadTool" --keep-symbol="__libc_start_main"
+                    ${STRIP_ARGS} $<TARGET_FILE:${_TARGET}>
+                WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
+                COMMENT "Stripping ${_TARGET}...")
+        endif()
     endif()
 endfunction()

@@ -53,3 +53,4 @@ add_subdirectory(lulesh)
 add_subdirectory(rccl)
 add_subdirectory(rewrite-caller)
 add_subdirectory(causal)
+add_subdirectory(trace-time-window)
@@ -0,0 +1,24 @@
+cmake_minimum_required(VERSION 3.15 FATAL_ERROR)
+
+project(omnitrace-trace-time-window-example LANGUAGES CXX)
+
+if(OMNITRACE_DISABLE_EXAMPLES)
+    get_filename_component(_DIR ${CMAKE_CURRENT_LIST_DIR} NAME)
+
+    if(${PROJECT_NAME} IN_LIST OMNITRACE_DISABLE_EXAMPLES OR ${_DIR} IN_LIST
+                                                             OMNITRACE_DISABLE_EXAMPLES)
+        return()
+    endif()
+endif()
+
+set(CMAKE_BUILD_TYPE "Debug")
+
+add_executable(trace-time-window trace-time-window.cpp)
+target_compile_options(trace-time-window PRIVATE ${_FLAGS})
+
+if(OMNITRACE_INSTALL_EXAMPLES)
+    install(
+        TARGETS trace-time-window
+        DESTINATION bin
+        COMPONENT omnitrace-examples)
+endif()
@@ -0,0 +1,80 @@
+#include <chrono>
+#include <cstdio>
+#include <cstdlib>
+#include <ratio>
+#include <string>
+#include <thread>
+
+#define NOINLINE __attribute__((noinline))
+
+NOINLINE size_t
+inner();
+
+NOINLINE size_t
+outer_a();
+
+NOINLINE size_t
+outer_b();
+
+NOINLINE size_t
+outer_c();
+
+NOINLINE size_t
+outer_d();
+
+NOINLINE size_t
+outer_e();
+
+int
+main(int argc, char** argv)
+{
+    int nrepeat = 1;
+    if(argc > 1) nrepeat = atol(argv[1]);
+
+    std::string _name = argv[0];
+    auto _pos = _name.find_last_of('/');
+    if(_pos != std::string::npos) _name = _name.substr(_pos + 1);
+
+    size_t nitr = 0;
+    for(int i = 0; i < nrepeat; ++i)
+    {
+        nitr += outer_a();
+        nitr += outer_b();
+        nitr += outer_c();
+        nitr += outer_d();
+        nitr += outer_e();
+        printf("[%s][%i] number of calls made = %zu\n", _name.c_str(), i, nitr);
+    }
+}
+
+size_t
+inner(size_t _duration)
+{
+    static int64_t _n = 0;
+
+    if(_n++ % 5 == 2)
+    {
+        using clock_type = std::chrono::high_resolution_clock;
+        auto _end = clock_type::now() + std::chrono::milliseconds{ _duration };
+        size_t nitr = 0;
+        while(clock_type::now() < _end)
+        {
+            ++nitr;
+        }
+        return nitr;
+    }
+    else
+    {
+        std::this_thread::sleep_for(std::chrono::milliseconds{ _duration });
+        return 1;
+    }
+}
+
+#define OUTER_FUNCTION(TAG) \
+    size_t outer_##TAG() { return inner(500); }
+
+OUTER_FUNCTION(a)
+OUTER_FUNCTION(b)
+OUTER_FUNCTION(c)
+OUTER_FUNCTION(d)
+OUTER_FUNCTION(e)
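The example program's `inner()` alternates between sleeping and busy-waiting, which matters once a tracing window is driven by process CPU-time rather than real time: a sleeping process barely advances `CLOCK_PROCESS_CPUTIME_ID`. The sketch below measures both clocks around a piece of work to make that difference visible. It is an illustration only and not part of the commit; `read_clock`, `elapsed_times`, `measure`, `sleep_ms`, and `spin_ms` are hypothetical helper names, and `CLOCK_MONOTONIC` stands in for real time.

```cpp
#include <chrono>
#include <thread>
#include <time.h>

// Read a POSIX clock and return its value in seconds.
double
read_clock(clockid_t id)
{
    timespec ts{};
    clock_gettime(id, &ts);
    return static_cast<double>(ts.tv_sec) + 1.0e-9 * static_cast<double>(ts.tv_nsec);
}

// Elapsed time of one piece of work, as seen by two different clocks.
struct elapsed_times
{
    double realtime;
    double cputime;
};

// Run work() and report how far each clock advanced while it ran.
template <typename Func>
elapsed_times
measure(Func&& work)
{
    auto r0 = read_clock(CLOCK_MONOTONIC);
    auto c0 = read_clock(CLOCK_PROCESS_CPUTIME_ID);
    work();
    return { read_clock(CLOCK_MONOTONIC) - r0,
             read_clock(CLOCK_PROCESS_CPUTIME_ID) - c0 };
}

// Sleeping consumes real time but almost no CPU time.
void
sleep_ms(int ms)
{
    std::this_thread::sleep_for(std::chrono::milliseconds{ ms });
}

// Busy-waiting consumes real time and roughly the same amount of CPU time.
void
spin_ms(int ms)
{
    using clock_type = std::chrono::steady_clock;
    auto end = clock_type::now() + std::chrono::milliseconds{ ms };
    while(clock_type::now() < end)
    {}
}
```

Calling `measure([] { sleep_ms(100); })` and `measure([] { spin_ms(100); })` should report similar real-time values but very different CPU-time values, which is why a `cputime` delay can take much longer than the same delay in real time when the application is mostly idle.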
Submodule external/dyninst updated: e4d2eb36ae...dcc8dad3fb
Submodule external/timemory updated: 64bf1067a4...92fc712074
@@ -19,7 +19,6 @@ def which(cmd, require):


 def generate_custom(args, cmake_args, ctest_args):
-
     if not os.path.exists(args.binary_dir):
         os.makedirs(args.binary_dir)

@@ -74,7 +73,6 @@ def generate_custom(args, cmake_args, ctest_args):


 def generate_dashboard_script(args):
-
     CODECOV = 1 if args.coverage else 0
     DASHBOARD_MODE = args.mode
     SOURCE_DIR = os.path.realpath(args.source_dir)
@@ -244,7 +242,6 @@ def run(*args, **kwargs):


 if __name__ == "__main__":
-
     args, cmake_args, ctest_args = parse_args()

     if not os.path.exists(args.binary_dir):
@@ -58,7 +58,7 @@ namespace console = ::tim::utility::console;
 namespace argparse = ::tim::argparse;
 using namespace timemory::join;
 using tim::get_env;
-using tim::log::colorized;
+using tim::log::monochrome;
 using tim::log::stream;

 namespace std
@@ -535,15 +535,6 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
         exit(EXIT_FAILURE);
     });

-    auto _add_separator = [&](std::string _v, const std::string& _desc) {
-        parser.add_argument({ "" }, "");
-        parser
-            .add_argument({ join("", "[", _v, "]") },
-                          (_desc.empty()) ? _desc : join({ "", "(", ")" }, _desc))
-            .color(color::info());
-        parser.add_argument({ "" }, "");
-    };
-
     parser.enable_help();
     parser.enable_version("omnitrace-causal", "v" OMNITRACE_VERSION_STRING,
                           OMNITRACE_GIT_DESCRIBE, OMNITRACE_GIT_REVISION);
@@ -553,16 +544,16 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
     parser.set_description_width(
         std::min<int>(_cols - parser.get_help_width() - 8, 120));

-    _add_separator("DEBUG OPTIONS", "");
+    parser.start_group("DEBUG OPTIONS", "");
     parser.add_argument({ "--monochrome" }, "Disable colorized output")
         .max_count(1)
         .dtype("bool")
         .action([&](parser_t& p) {
-            auto _colorized = !p.get<bool>("monochrome");
-            colorized() = _colorized;
-            p.set_use_color(_colorized);
-            update_env(_env, "OMNITRACE_COLORIZED_LOG", (_colorized) ? "1" : "0");
-            update_env(_env, "COLORIZED_LOG", (_colorized) ? "1" : "0");
+            auto _monochrome = p.get<bool>("monochrome");
+            monochrome() = _monochrome;
+            p.set_use_color(!_monochrome);
+            update_env(_env, "OMNITRACE_MONOCHROME", (_monochrome) ? "1" : "0");
+            update_env(_env, "MONOCHROME", (_monochrome) ? "1" : "0");
         });
     parser.add_argument({ "--debug" }, "Debug output")
         .max_count(1)
@@ -582,7 +573,7 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
     bool _generate_configs = false;
     bool _add_defaults = true;

-    _add_separator("GENERAL OPTIONS", "");
+    parser.start_group("GENERAL OPTIONS", "");
     parser.add_argument({ "-c", "--config" }, "Base configuration file")
         .min_count(0)
         .dtype("filepath")
@@ -629,8 +620,8 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
         .dtype("bool")
         .action([&](parser_t& p) { _add_defaults = !p.get<bool>("no-defaults"); });

-    _add_separator("CAUSAL PROFILING OPTIONS (General)",
-                   "These settings will be applied to all causal profiling runs");
+    parser.start_group("CAUSAL PROFILING OPTIONS (General)",
+                       "These settings will be applied to all causal profiling runs");
     parser.add_argument({ "-m", "--mode" }, "Causal profiling mode")
         .count(1)
         .dtype("string")
@@ -706,7 +697,7 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
         .dtype("int")
         .action([&](parser_t& p) { _niterations = p.get<int64_t>("iterations"); });

-    _add_separator(
+    parser.start_group(
         "CAUSAL PROFILING OPTIONS (Combinatorial)",
         "Each individual argument to these options will multiply the number runs by the "
         "number of arguments and the number of iterations. E.g. -n 2 -B \"MAIN\" -F "
@@ -804,6 +795,8 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
             _function_excludes = p.get<std::vector<std::string>>("function-exclude");
         });

+    parser.end_group();
+
 #if OMNITRACE_HIP_VERSION > 0 && OMNITRACE_HIP_VERSION < 50300
     update_env(_env, "HSA_ENABLE_INTERRUPT", 0);
 #endif
@@ -54,14 +54,47 @@
 namespace color = tim::log::color;
 using namespace timemory::join;
 using tim::get_env;
-using tim::log::colorized;
+using tim::log::monochrome;
 using tim::log::stream;

 namespace
 {
-int  verbose       = 0;
-auto updated_envs  = std::set<std::string_view>{};
-auto original_envs = std::set<std::string>{};
+int  verbose          = 0;
+auto updated_envs     = std::set<std::string_view>{};
+auto original_envs    = std::set<std::string>{};
+auto clock_id_choices = []() {
+    auto clock_name = [](std::string _v) {
+        constexpr auto _clock_prefix = std::string_view{ "clock_" };
+        for(auto& itr : _v)
+            itr = tolower(itr);
+        auto _pos = _v.find(_clock_prefix);
+        if(_pos == 0) _v = _v.substr(_pos + _clock_prefix.length());
+        if(_v == "process_cputime_id") _v = "cputime";
+        return _v;
+    };
+
+#define OMNITRACE_CLOCK_IDENTIFIER(VAL) \
+    std::make_tuple(clock_name(#VAL), VAL, std::string_view{ #VAL })
+
+    auto _choices = std::vector<std::string>{};
+    auto _aliases = std::map<std::string, std::vector<std::string>>{};
+    for(auto itr : { OMNITRACE_CLOCK_IDENTIFIER(CLOCK_REALTIME),
+                     OMNITRACE_CLOCK_IDENTIFIER(CLOCK_MONOTONIC),
+                     OMNITRACE_CLOCK_IDENTIFIER(CLOCK_PROCESS_CPUTIME_ID),
+                     OMNITRACE_CLOCK_IDENTIFIER(CLOCK_MONOTONIC_RAW),
+                     OMNITRACE_CLOCK_IDENTIFIER(CLOCK_REALTIME_COARSE),
+                     OMNITRACE_CLOCK_IDENTIFIER(CLOCK_MONOTONIC_COARSE),
+                     OMNITRACE_CLOCK_IDENTIFIER(CLOCK_BOOTTIME) })
+    {
+        auto _choice = std::to_string(std::get<1>(itr));
+        _choices.emplace_back(_choice);
+        _aliases[_choice] = { std::get<0>(itr), std::string{ std::get<2>(itr) } };
+    }
+
+#undef OMNITRACE_CLOCK_IDENTIFIER
+
+    return std::make_pair(_choices, _aliases);
+}();
 }  // namespace

 std::string
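The `clock_id_choices` lambda above registers each clock under its numeric ID and gives it two aliases: the full macro name and a shortened lowercase name (for example `cputime` for `CLOCK_PROCESS_CPUTIME_ID`). The standalone sketch below mirrors that name normalization and shows how such aliases could be resolved back to a clock ID. `resolve_clock_id` and its lookup table are hypothetical and cover only three clocks; the actual resolution in the commit is done by the timemory argument parser via `.choices()` and `.choice_aliases()`.

```cpp
#include <cctype>
#include <map>
#include <string>
#include <time.h>

// Normalize a clock name: lowercase it, drop a leading "clock_" and
// shorten "process_cputime_id" to "cputime" (same rules as the lambda above).
std::string
clock_name(std::string v)
{
    const std::string prefix = "clock_";
    for(auto& c : v)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    if(v.rfind(prefix, 0) == 0) v = v.substr(prefix.length());
    if(v == "process_cputime_id") v = "cputime";
    return v;
}

// Hypothetical helper: map an alias, a macro name, or a numeric string to a
// clock ID. Returns -1 when the input is not recognized.
int
resolve_clock_id(const std::string& input)
{
    static const std::map<std::string, int> ids = {
        { "realtime", CLOCK_REALTIME },
        { "monotonic", CLOCK_MONOTONIC },
        { "cputime", CLOCK_PROCESS_CPUTIME_ID },
    };

    auto itr = ids.find(clock_name(input));
    if(itr != ids.end()) return itr->second;

    // Accept a plain non-negative integer such as "1"
    if(input.empty()) return -1;
    for(char c : input)
        if(!std::isdigit(static_cast<unsigned char>(c))) return -1;
    return std::stoi(input);
}
```

With this sketch, `cputime`, `CLOCK_PROCESS_CPUTIME_ID`, and `clock_process_cputime_id` all resolve to the same ID, which is the behavior the alias list is meant to provide on the command line.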
@@ -329,15 +362,7 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
|
||||
%{INDENT}%- discard : new data is ignored
|
||||
%{INDENT}%- ring_buffer : new data overwrites oldest data)";
|
||||
|
||||
auto _add_separator = [&](std::string _v, const std::string& _desc) {
|
||||
parser.add_argument({ "" }, "");
|
||||
parser
|
||||
.add_argument({ join("", "[", _v, "]") },
|
||||
(_desc.empty()) ? _desc : join({ "", "(", ")" }, _desc))
|
||||
.color(tim::log::color::info());
|
||||
parser.add_argument({ "" }, "");
|
||||
};
|
||||
|
||||
parser.set_use_color(true);
|
||||
parser.enable_help();
|
||||
parser.enable_version("omnitrace-sample", "v" OMNITRACE_VERSION_STRING,
|
||||
OMNITRACE_GIT_DESCRIBE, OMNITRACE_GIT_REVISION);
|
||||
@@ -347,16 +372,16 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
|
||||
parser.set_description_width(
|
||||
std::min<int>(_cols - parser.get_help_width() - 8, 120));
|
||||
|
||||
_add_separator("DEBUG OPTIONS", "");
|
||||
parser.start_group("DEBUG OPTIONS", "");
|
||||
parser.add_argument({ "--monochrome" }, "Disable colorized output")
|
||||
.max_count(1)
|
||||
.dtype("bool")
|
||||
.action([&](parser_t& p) {
|
||||
auto _colorized = !p.get<bool>("monochrome");
|
||||
colorized() = _colorized;
|
||||
p.set_use_color(_colorized);
|
||||
update_env(_env, "OMNITRACE_COLORIZED_LOG", (_colorized) ? "1" : "0");
|
||||
update_env(_env, "COLORIZED_LOG", (_colorized) ? "1" : "0");
|
||||
auto _monochrome = p.get<bool>("monochrome");
|
||||
monochrome() = _monochrome;
|
||||
p.set_use_color(!_monochrome);
|
||||
update_env(_env, "OMNITRACE_MONOCHROME", (_monochrome) ? "1" : "0");
|
||||
update_env(_env, "MONOCHROME", (_monochrome) ? "1" : "0");
|
||||
});
|
||||
parser.add_argument({ "--debug" }, "Debug output")
|
||||
.max_count(1)
|
||||
@@ -371,7 +396,8 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
|
||||
update_env(_env, "OMNITRACE_VERBOSE", _v);
|
||||
});
|
||||
|
||||
_add_separator("GENERAL OPTIONS", "");
|
||||
parser.start_group("GENERAL OPTIONS",
|
||||
"These are options which are ubiquitously applied");
|
||||
parser.add_argument({ "-c", "--config" }, "Configuration file")
|
||||
.min_count(0)
|
||||
.dtype("filepath")
|
||||
@@ -437,8 +463,28 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
|
||||
update_env(_env, "OMNITRACE_USE_PROCESS_SAMPLING", _h || _d);
|
||||
update_env(_env, "OMNITRACE_USE_ROCM_SMI", _d);
|
||||
});
|
||||
parser
|
||||
.add_argument({ "-w", "--wait" },
|
||||
"This option is a combination of '--trace-wait' and "
|
||||
"'--sampling-wait'. See the descriptions for those two options.")
|
||||
.count(1)
|
||||
.action([&](parser_t& p) {
|
||||
update_env(_env, "OMNITRACE_TRACE_DELAY", p.get<double>("wait"));
|
||||
update_env(_env, "OMNITRACE_SAMPLING_DELAY", p.get<double>("wait"));
|
||||
});
|
||||
parser
|
||||
.add_argument(
|
||||
{ "-d", "--duration" },
|
||||
"This option is a combination of '--trace-duration' and "
|
||||
"'--sampling-duration'. See the descriptions for those two options.")
|
||||
.count(1)
|
||||
.action([&](parser_t& p) {
|
||||
update_env(_env, "OMNITRACE_TRACE_DURATION", p.get<double>("duration"));
|
||||
update_env(_env, "OMNITRACE_SAMPLING_DURATION", p.get<double>("duration"));
|
||||
});
|
||||
|
||||
_add_separator("TRACING OPTIONS", "");
|
||||
parser.start_group("TRACING OPTIONS", "Specific options controlling tracing (i.e. "
|
||||
"deterministic measurements of every event)");
|
||||
parser
|
||||
.add_argument({ "--trace-file" },
|
||||
"Specify the trace output filename. Relative filepath will be with "
|
||||
@@ -464,8 +510,57 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
|
||||
update_env(_env, "OMNITRACE_PERFETTO_FILL_POLICY",
|
||||
p.get<std::string>("trace-fill-policy"));
|
||||
});
|
||||
parser
|
||||
.add_argument({ "--trace-wait" },
|
||||
"Set the wait time (in seconds) "
|
||||
"before collecting trace and/or profiling data"
|
||||
"(in seconds). By default, the duration is in seconds of realtime "
|
||||
"but that can changed via --trace-clock-id.")
|
||||
.count(1)
|
||||
.action([&](parser_t& p) {
|
||||
update_env(_env, "OMNITRACE_TRACE_DELAY", p.get<double>("trace-wait"));
|
||||
});
|
||||
parser
|
||||
.add_argument({ "--trace-duration" },
|
||||
"Set the duration of the trace and/or profile data collection (in "
|
||||
"seconds). By default, the duration is in seconds of realtime but "
|
||||
"that can changed via --trace-clock-id.")
|
||||
.count(1)
|
||||
.action([&](parser_t& p) {
|
||||
update_env(_env, "OMNITRACE_TRACE_DURATION", p.get<double>("trace-duration"));
|
||||
});
|
||||
parser
|
||||
.add_argument(
|
||||
{ "--trace-periods" },
|
||||
"More powerful version of specifying trace delay and/or duration. Format is "
|
||||
"one or more groups of: <DELAY>:<DURATION>, <DELAY>:<DURATION>:<REPEAT>, "
|
||||
"and/or <DELAY>:<DURATION>:<REPEAT>:<CLOCK_ID>.")
|
||||
.min_count(1)
|
||||
.action([&](parser_t& p) {
|
||||
update_env(_env, "OMNITRACE_TRACE_PERIODS",
|
||||
join(array_config{ ",", "", "" },
|
||||
p.get<std::vector<std::string>>("trace-periods")));
|
||||
});
|
||||
parser
|
||||
.add_argument(
|
||||
{ "--trace-clock-id" },
|
||||
"Set the default clock ID for for trace delay/duration. Note: \"cputime\" is "
|
||||
"the *process* CPU time and might need to be scaled based on the number of "
|
||||
"threads, i.e. 4 seconds of CPU-time for an application with 4 fully active "
|
||||
"threads would equate to ~1 second of realtime. If this proves to be "
|
||||
"difficult to handle in practice, please file a feature request for "
|
||||
"omnitrace to auto-scale based on the number of threads.")
|
||||
.count(1)
|
||||
.action([&](parser_t& p) {
|
||||
update_env(_env, "OMNITRACE_TRACE_PERIOD_CLOCK_ID",
|
||||
p.get<double>("trace-clock-id"));
|
||||
})
|
||||
.choices(clock_id_choices.first)
|
||||
.choice_aliases(clock_id_choices.second);
|
||||
|
||||
_add_separator("PROFILE OPTIONS", "");
|
||||
parser.start_group("PROFILE OPTIONS",
|
||||
"Specific options controlling profiling (i.e. deterministic "
|
||||
"measurements which are aggregated into a summary)");
|
||||
parser.add_argument({ "--profile-format" }, "Data formats for profiling results")
|
||||
.min_count(1)
|
||||
.max_count(3)
|
||||
@@ -496,7 +591,10 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
|
||||
if(_v.size() > 1) update_env(_env, "OMNITRACE_INPUT_PREFIX", _v.at(1));
|
||||
});
|
||||
|
||||
_add_separator("HOST/DEVICE (PROCESS SAMPLING) OPTIONS", "");
|
||||
parser.start_group(
|
||||
"HOST/DEVICE (PROCESS SAMPLING) OPTIONS",
|
||||
"Process sampling is background measurements for resources available to the "
|
||||
"entire process. These samples are not tied to specific lines/regions of code");
|
||||
parser
|
||||
.add_argument({ "--process-freq" },
|
||||
"Set the default host/device sampling frequency "
|
||||
@@ -545,7 +643,8 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
|
||||
join(array_config{ "," }, p.get<std::vector<std::string>>("gpus")));
|
||||
});
|
||||
|
||||
_add_separator("GENERAL SAMPLING OPTIONS", "");
|
||||
parser.start_group("GENERAL SAMPLING OPTIONS",
|
||||
"General options for timer-based sampling per-thread");
|
||||
parser
|
||||
.add_argument({ "-f", "--freq" }, "Set the default sampling frequency "
|
||||
"(number of interrupts per second)")
|
||||
@@ -555,23 +654,24 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
});
parser
.add_argument(
{ "-w", "--wait" },
{ "--sampling-wait" },
"Set the default wait time (i.e. delay) before taking first sample "
"(in seconds). This delay time is based on the clock of the sampler, i.e., a "
"delay of 1 second for CPU-clock sampler may not equal 1 second of realtime")
.count(1)
.action([&](parser_t& p) {
update_env(_env, "OMNITRACE_SAMPLING_DELAY", p.get<double>("wait"));
update_env(_env, "OMNITRACE_SAMPLING_DELAY", p.get<double>("sampling-wait"));
});
parser
.add_argument(
{ "-d", "--duration" },
{ "--sampling-duration" },
"Set the duration of the sampling (in seconds of realtime). I.e., it is "
"possible (currently) to set a CPU-clock time delay that exceeds the "
"real-time duration... resulting in zero samples being taken")
.count(1)
.action([&](parser_t& p) {
update_env(_env, "OMNITRACE_SAMPLING_DURATION", p.get<double>("duration"));
update_env(_env, "OMNITRACE_SAMPLING_DURATION",
p.get<double>("sampling-duration"));
});
parser
.add_argument({ "-t", "--tids" },
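The `--sampling-wait` help text above notes that the delay follows the sampler's clock, so one second of CPU-clock delay need not equal one second of real time. A minimal sketch of why, using only the C++ standard library: a sleeping process advances the real-time clock while consuming almost no CPU time. The `elapsed` struct and the `measure` / `measure_sleep` helpers are hypothetical names for illustration and are not part of omnitrace.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical result type: seconds elapsed on each clock.
struct elapsed
{
    double real;  // wall-clock (steady_clock) seconds
    double cpu;   // process CPU seconds as reported by std::clock()
};

// Run `work` and report how far each clock advanced while it ran.
template <typename FuncT>
elapsed
measure(FuncT&& work)
{
    auto _real_beg = std::chrono::steady_clock::now();
    auto _cpu_beg  = std::clock();
    work();
    auto _cpu_end  = std::clock();
    auto _real_end = std::chrono::steady_clock::now();
    return elapsed{ std::chrono::duration<double>(_real_end - _real_beg).count(),
                    static_cast<double>(_cpu_end - _cpu_beg) / CLOCKS_PER_SEC };
}

// Sleeping advances real time but leaves the CPU clock nearly untouched,
// so a delay counted in CPU time would not have expired yet.
inline elapsed
measure_sleep(double _seconds)
{
    assert(_seconds >= 0.0);
    return measure([_seconds]() {
        std::this_thread::sleep_for(std::chrono::duration<double>(_seconds));
    });
}
```

The reverse also holds: several busy threads can accumulate more than one second of process CPU time per second of real time, which is why the two kinds of delay are not interchangeable.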
@@ -584,7 +684,9 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
join(array_config{ ", " }, p.get<std::vector<int64_t>>("tids")));
});

_add_separator("SAMPLING TIMER OPTIONS", "");
parser.start_group(
"SAMPLING TIMER OPTIONS",
"These options determine the heuristic for deciding when to take a sample");
parser.add_argument({ "--cputime" }, _cputime_desc)
.min_count(0)
.action([&](parser_t& p) {
@@ -660,8 +762,9 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
_backend_choices.erase("rocprofiler");
#endif

_add_separator("BACKEND OPTIONS", "These options control region information captured "
"w/o sampling or instrumentation");
parser.start_group("BACKEND OPTIONS",
"These options control region information captured "
"w/o sampling or instrumentation");
parser.add_argument({ "-I", "--include" }, "Include data from these backends")
.choices(_backend_choices)
.action([&](parser_t& p) {
@@ -727,7 +830,7 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
remove_env(_env, "KOKKOS_PROFILE_LIBRARY");
});

_add_separator("HARDWARE COUNTER OPTIONS", "");
parser.start_group("HARDWARE COUNTER OPTIONS", "See also: omnitrace-avail -H");
parser
.add_argument({ "-C", "--cpu-events" },
"Set the CPU hardware counter events to record (ref: "
@@ -750,7 +853,7 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
});
#endif

_add_separator("MISCELLANEOUS OPTIONS", "");
parser.start_group("MISCELLANEOUS OPTIONS", "");
parser
.add_argument({ "-i", "--inlines" },
"Include inline info in output when available")
@@ -768,6 +871,8 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
update_env(_env, "HSA_ENABLE_INTERRUPT", p.get<int>("hsa-interrupt"));
});

parser.end_group();

auto _inpv = std::vector<char*>{};
auto _outv = std::vector<char*>{};
bool _hash = false;
@@ -154,6 +154,7 @@ for pref in preferences:

from recommonmark.transform import AutoStructify


# app setup hook
def setup(app):
    app.add_config_value(
@@ -550,8 +550,8 @@ extern "C"
{
void omnitrace_preinit_library(void)
{
if(!omnitrace::common::get_env("OMNITRACE_COLORIZED_LOG", tim::log::colorized()))
tim::log::colorized() = false;
if(omnitrace::common::get_env("OMNITRACE_MONOCHROME", tim::log::monochrome()))
tim::log::monochrome() = true;
}

int omnitrace_preload_library(void)
@@ -75,6 +75,8 @@ extern "C"
OMNITRACE_CATEGORY_PROCESS_PAGE_FAULT,
OMNITRACE_CATEGORY_PROCESS_USER_MODE_TIME,
OMNITRACE_CATEGORY_PROCESS_KERNEL_MODE_TIME,
OMNITRACE_CATEGORY_THREAD_WALL_TIME,
OMNITRACE_CATEGORY_THREAD_CPU_TIME,
OMNITRACE_CATEGORY_THREAD_PAGE_FAULT,
OMNITRACE_CATEGORY_THREAD_PEAK_MEMORY,
OMNITRACE_CATEGORY_THREAD_CONTEXT_SWITCH,
@@ -15,12 +15,8 @@ target_include_directories(
omnitrace-interface-library INTERFACE ${CMAKE_CURRENT_SOURCE_DIR}
${CMAKE_CURRENT_BINARY_DIR})

target_compile_definitions(
omnitrace-interface-library
INTERFACE
OMNITRACE_MAX_THREADS=${OMNITRACE_MAX_THREADS}
$<BUILD_INTERFACE:$<IF:$<BOOL:${OMNITRACE_CUSTOM_DATA_SOURCE}>,CUSTOM_DATA_SOURCE,>>
)
target_compile_definitions(omnitrace-interface-library
INTERFACE OMNITRACE_MAX_THREADS=${OMNITRACE_MAX_THREADS})

target_link_libraries(
omnitrace-interface-library
@@ -20,12 +20,13 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.

// clang-format off
#include <timemory/log/color.hpp>
// clang-format on

//
// above should always be included first
//
#include "api.hpp"
#include "common/setup.hpp"
#include "library/categories.hpp"
#include "library/causal/data.hpp"
#include "library/causal/experiment.hpp"
#include "library/causal/sampling.hpp"
@@ -37,6 +38,7 @@
#include "library/components/rocprofiler.hpp"
#include "library/concepts.hpp"
#include "library/config.hpp"
#include "library/constraint.hpp"
#include "library/coverage.hpp"
#include "library/critical_trace.hpp"
#include "library/debug.hpp"
@@ -56,24 +58,25 @@
#include "library/utility.hpp"
#include "omnitrace/categories.h" // in omnitrace-user

#include <timemory/process/threading.hpp>
#include <timemory/signals/signal_handlers.hpp>
#include <timemory/signals/types.hpp>
#include <timemory/hash/types.hpp>
#include <timemory/manager/manager.hpp>
#include <timemory/mpl/type_traits.hpp>
#include <timemory/operations/types/file_output_message.hpp>
#include <timemory/signals/signal_mask.hpp>
#include <timemory/process/threading.hpp>
#include <timemory/settings/types.hpp>
#include <timemory/signals/signal_handlers.hpp>
#include <timemory/signals/signal_mask.hpp>
#include <timemory/signals/types.hpp>
#include <timemory/utility/backtrace.hpp>
#include <timemory/utility/procfs/maps.hpp>

#include <atomic>
#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <stdexcept>
#include <string_view>
#include <utility>
#include <cstdlib>
#include <stdexcept>

using namespace omnitrace;
@@ -122,9 +125,18 @@ ensure_initialization(bool _offset, int64_t _glob_n, int64_t _offset_n)
return _offset;
}

void
finalization_handler()
{
if(get_state() == State::Active) omnitrace_finalize();
}

auto
ensure_finalization(bool _static_init = false)
{
if(config::set_signal_handler(nullptr) == nullptr)
config::set_signal_handler(&finalization_handler);

if(_static_init)
{
auto _idx = threading::add_callback(&ensure_initialization);
@@ -132,6 +144,12 @@ ensure_finalization(bool _static_init = false)
throw exception<std::runtime_error>("failure adding threading callback");
}

OMNITRACE_CI_BASIC_THROW(
config::set_signal_handler(nullptr) != &finalization_handler,
"Assignment of signal handler failed. signal handler is %s, expected %s\n",
as_hex(reinterpret_cast<void*>(config::set_signal_handler(nullptr))).c_str(),
as_hex(reinterpret_cast<void*>(&finalization_handler)).c_str());

const auto& _info = thread_info::init();
const auto& _tid = _info->index_data;
if(_tid)
@@ -144,7 +162,7 @@ ensure_finalization(bool _static_init = false)
_tid->system_value);
}

if(!get_env("OMNITRACE_COLORIZED_LOG", true)) tim::log::colorized() = false;
if(get_env("OMNITRACE_MONOCHROME", false)) tim::log::monochrome() = true;

(void) tim::manager::instance();
(void) tim::settings::shared_instance();
@@ -192,7 +210,7 @@ struct fini_bundle
{
using data_type = std::tuple<Tp...>;

TIMEMORY_DEFAULT_OBJECT(fini_bundle)
OMNITRACE_DEFAULT_OBJECT(fini_bundle)

fini_bundle(std::string_view _label)
: m_label{ _label }
@@ -400,7 +418,7 @@ omnitrace_init_library_hidden()
extern "C" bool
omnitrace_init_tooling_hidden()
{
if(!get_env("OMNITRACE_COLORIZED_LOG", true, false)) tim::log::colorized() = false;
if(get_env("OMNITRACE_MONOCHROME", false, false)) tim::log::monochrome() = true;

if(!tim::get_env("OMNITRACE_INIT_TOOLING", true))
{
@@ -538,6 +556,8 @@ omnitrace_init_tooling_hidden()
omnitrace::perfetto::start();
}

categories::setup();

// if static objects are destroyed in the inverse order of when they are
// created this should ensure that finalization is called before perfetto
// ends the tracing session
@@ -701,6 +721,10 @@ omnitrace_finalize_hidden(void)
push_enable_sampling_on_child_threads(false);
set_sampling_on_all_future_threads(false);

// if the categories are not enabled, it can/will suppress generating output for data
// in category
categories::enable_categories();

auto _debug_init = get_debug_finalize();
auto _debug_value = get_debug();
if(_debug_init) config::set_setting_value("OMNITRACE_DEBUG", true);
@@ -951,7 +975,7 @@ omnitrace_finalize_hidden(void)
bool _perfetto_output_error = false;
if(get_use_perfetto() && !is_system_backend())
{
auto& tracing_session = tracing::get_perfetto_session();
auto& tracing_session = get_perfetto_session();

OMNITRACE_CI_THROW(tracing_session == nullptr,
"Null pointer to the tracing session");
@@ -1061,6 +1085,8 @@ omnitrace_finalize_hidden(void)
"omnitrace", _cfg);
}

categories::shutdown();

_finalization.stop();

if(_perfetto_output_error)
@@ -3,7 +3,9 @@ configure_file(${CMAKE_CURRENT_SOURCE_DIR}/defines.hpp.in
${CMAKE_CURRENT_BINARY_DIR}/defines.hpp @ONLY)

set(library_sources
${CMAKE_CURRENT_LIST_DIR}/categories.cpp
${CMAKE_CURRENT_LIST_DIR}/config.cpp
${CMAKE_CURRENT_LIST_DIR}/constraint.cpp
${CMAKE_CURRENT_LIST_DIR}/coverage.cpp
${CMAKE_CURRENT_LIST_DIR}/cpu_freq.cpp
${CMAKE_CURRENT_LIST_DIR}/critical_trace.cpp
@@ -31,6 +33,7 @@ set(library_headers
${CMAKE_CURRENT_LIST_DIR}/common.hpp
${CMAKE_CURRENT_LIST_DIR}/concepts.hpp
${CMAKE_CURRENT_LIST_DIR}/config.hpp
${CMAKE_CURRENT_LIST_DIR}/constraint.hpp
${CMAKE_CURRENT_LIST_DIR}/coverage.hpp
${CMAKE_CURRENT_LIST_DIR}/cpu_freq.hpp
${CMAKE_CURRENT_LIST_DIR}/critical_trace.hpp
@@ -39,7 +39,7 @@ struct address_multirange
struct coarse
{};

TIMEMORY_DEFAULT_OBJECT(address_multirange)
OMNITRACE_DEFAULT_OBJECT(address_multirange)

address_multirange& operator+=(std::pair<coarse, uintptr_t>&&);
address_multirange& operator+=(std::pair<coarse, address_range>&& _v);
@@ -43,7 +43,7 @@ struct address_range
uintptr_t low = std::numeric_limits<uintptr_t>::max();
uintptr_t high = std::numeric_limits<uintptr_t>::min();

TIMEMORY_DEFAULT_OBJECT(address_range)
OMNITRACE_DEFAULT_OBJECT(address_range)

explicit address_range(uintptr_t _v);
address_range(uintptr_t _low, uintptr_t _high);
@@ -64,7 +64,7 @@ namespace binary
namespace
{
binary_info
parse_line_info(const std::string& _name)
parse_line_info(const std::string& _name, bool _process_dwarf)
{
auto _info = binary_info{};
@@ -105,10 +105,17 @@ parse_line_info(const std::string& _name)
<< "section set size (" << _section_set.size() << ") != section map size ("
<< _section_map.size() << ")\n";

_info.debug_info = dwarf_entry::process_dwarf(_bfd->fd, _info.ranges);
if(_process_dwarf)
{
std::tie(_info.debug_info, _info.ranges, _info.breakpoints) =
dwarf_entry::process_dwarf(_bfd->fd);
}

for(auto& itr : _info.symbols)
itr.read_dwarf(_info.debug_info);
{
itr.read_dwarf_entries(_info.debug_info);
itr.read_dwarf_breakpoints(_info.breakpoints);
}

_info.sort();
}
@@ -122,7 +129,7 @@ parse_line_info(const std::string& _name)

std::vector<binary_info>
get_binary_info(const std::vector<std::string>& _files,
const std::vector<scope_filter>& _filters)
const std::vector<scope_filter>& _filters, bool _process_dwarf)
{
auto _satisfies_filter = [&_filters](auto _scope, const std::string& _value) {
for(const auto& itr : _filters) // NOLINT
@@ -157,7 +164,7 @@ get_binary_info(const std::vector<std::string>& _files,
if(filepath::exists(_filename) && _satisfies_binary_filter(_filename) &&
_exists.find(_filename) == _exists.end())
{
_data.emplace_back(parse_line_info(_filename));
_data.emplace_back(parse_line_info(_filename, _process_dwarf));
_exists.emplace(_filename);
}
}
@@ -54,6 +54,7 @@ using bfd_file = ::tim::unwind::bfd_file;
using hash_value_t = ::tim::hash_value_t;

std::vector<binary_info>
get_binary_info(const std::vector<std::string>&, const std::vector<scope_filter>&);
get_binary_info(const std::vector<std::string>&, const std::vector<scope_filter>&,
bool _process_dwarf = true);
} // namespace binary
} // namespace omnitrace
@@ -30,8 +30,10 @@

#include <timemory/utility/procfs/maps.hpp>

#include <cstdint>
#include <deque>
#include <memory>
#include <string>
#include <vector>

namespace omnitrace
@@ -40,17 +42,19 @@ namespace binary
{
struct binary_info
{
std::shared_ptr<bfd_file> bfd = {};
std::vector<procfs::maps> mappings = {};
std::deque<symbol> symbols = {};
std::deque<dwarf_entry> debug_info = {};
std::vector<address_range> ranges = {};
std::unordered_map<address_range, void*> sections = {};
std::shared_ptr<bfd_file> bfd = {};
std::vector<procfs::maps> mappings = {};
std::deque<symbol> symbols = {};
std::deque<dwarf_entry> debug_info = {};
std::vector<address_range> ranges = {};
std::vector<uintptr_t> breakpoints = {};
std::unordered_map<address_range, void*> sections = {};

void sort();
void sort();
std::string filename() const;

template <typename RetT = void>
RetT* find_section(uintptr_t);
RetT* find_section(uintptr_t) const;
};

inline void
@@ -60,11 +64,12 @@ binary_info::sort()
utility::filter_sort_unique(symbols);
utility::filter_sort_unique(ranges);
utility::filter_sort_unique(debug_info);
utility::filter_sort_unique(breakpoints);
}

template <typename RetT>
inline RetT*
binary_info::find_section(uintptr_t _addr)
binary_info::find_section(uintptr_t _addr) const
{
for(const auto& sitr : sections)
{
@@ -72,5 +77,11 @@ binary_info::find_section(uintptr_t _addr)
}
return nullptr;
}

inline std::string
binary_info::filename() const
{
return (bfd) ? std::string{ bfd->name } : std::string{};
}
} // namespace binary
} // namespace omnitrace
@@ -41,28 +41,51 @@ get_dwarf_address_ranges(Dwarf_Die* _die)
{
auto _ranges = std::vector<address_range>{};

if(dwarf_tag(_die) != DW_TAG_compile_unit) return _ranges;
if(dwarf_tag(_die) != DW_TAG_compile_unit && dwarf_tag(_die) != DW_TAG_subprogram)
return _ranges;

Dwarf_Addr _low_pc;
Dwarf_Addr _high_pc;
dwarf_lowpc(_die, &_low_pc);
dwarf_highpc(_die, &_high_pc);

_ranges.emplace_back(address_range{ _low_pc, _high_pc });
if(_low_pc > _high_pc)
{
Dwarf_Addr _entry_pc;
dwarf_entrypc(_die, &_entry_pc);
if(_entry_pc < _low_pc) _low_pc = _entry_pc;
}

if(_low_pc < _high_pc) _ranges.emplace_back(_low_pc, _high_pc);

Dwarf_Addr _base_addr;
ptrdiff_t _offset = 0;
do
{
_ranges.emplace_back(address_range{ 0, 0 });
} while((_offset = dwarf_ranges(_die, _offset, &_base_addr, &_ranges.back().low,
&_ranges.back().high)) > 0);
// will always have one extra
_ranges.pop_back();
uintptr_t _low = 0;
uintptr_t _high = 0;
_offset = dwarf_ranges(_die, _offset, &_base_addr, &_low, &_high);
if(_low < _high) _ranges.emplace_back(_low, _high);
} while(_offset > 0);

return _ranges;
}

auto
get_dwarf_breakpoints(Dwarf_Die* _die)
{
auto _bkpts = std::vector<uintptr_t>{};

if(dwarf_tag(_die) != DW_TAG_subprogram) return _bkpts;

Dwarf_Addr* _pts = nullptr;
auto _npts = dwarf_entry_breakpoints(_die, &_pts);

if(_npts > 0 && _pts) _bkpts.assign(_pts, _pts + _npts);

return _bkpts;
}

auto
get_dwarf_entry(Dwarf_Die* _die)
{
@@ -133,37 +156,50 @@ dwarf_entry::is_valid() const
return (*this != dwarf_entry{} && !file.empty());
}

std::deque<dwarf_entry>
dwarf_entry::process_dwarf(int _fd, std::vector<address_range>& _ranges)
dwarf_entry::dwarf_tuple_t
dwarf_entry::process_dwarf(int _fd)
{
auto* _dwarf_v = dwarf_begin(_fd, DWARF_C_READ);
auto _line_info = std::deque<dwarf_entry>{};
auto* _dwarf_v = dwarf_begin(_fd, DWARF_C_READ);
auto _data_v = dwarf_tuple_t{};

size_t cu_header_size = 0;
Dwarf_Off cu_off = 0;
Dwarf_Off next_cu_off = 0;
for(; dwarf_nextcu(_dwarf_v, cu_off, &next_cu_off, &cu_header_size, nullptr, nullptr,
nullptr) == 0;
cu_off = next_cu_off)
if(_dwarf_v)
{
Dwarf_Off cu_die_off = cu_off + cu_header_size;
Dwarf_Die cu_die;
if(dwarf_offdie(_dwarf_v, cu_die_off, &cu_die) != nullptr)
auto& _entries = std::get<0>(_data_v);
auto& _ranges = std::get<1>(_data_v);
auto& _bkpts = std::get<2>(_data_v);

size_t cu_header_size = 0;
Dwarf_Off cu_off = 0;
Dwarf_Off next_cu_off = 0;
for(; dwarf_nextcu(_dwarf_v, cu_off, &next_cu_off, &cu_header_size, nullptr,
nullptr, nullptr) == 0;
cu_off = next_cu_off)
{
Dwarf_Die* _die = &cu_die;
if(dwarf_tag(_die) == DW_TAG_compile_unit)
auto cu_die_off = cu_off + cu_header_size;
auto cu_die = Dwarf_Die{};
if(dwarf_offdie(_dwarf_v, cu_die_off, &cu_die) != nullptr)
{
combine(_line_info, get_dwarf_entry(_die));
combine(_ranges, get_dwarf_address_ranges(_die));
Dwarf_Die* _die = &cu_die;
if(dwarf_tag(_die) == DW_TAG_compile_unit)
{
combine(_entries, get_dwarf_entry(_die));
combine(_ranges, get_dwarf_address_ranges(_die));
}
else if(dwarf_tag(_die) == DW_TAG_subprogram)
{
combine(_bkpts, get_dwarf_breakpoints(_die));
combine(_ranges, get_dwarf_address_ranges(_die));
}
}
}

dwarf_end(_dwarf_v);
utility::filter_sort_unique(_entries);
utility::filter_sort_unique(_ranges);
utility::filter_sort_unique(_bkpts);
}

dwarf_end(_dwarf_v);
utility::filter_sort_unique(_line_info);
utility::filter_sort_unique(_ranges);

return _line_info;
return _data_v;
}

template <typename ArchiveT>
@@ -31,7 +31,11 @@ namespace binary
{
struct dwarf_entry
{
TIMEMORY_DEFAULT_OBJECT(dwarf_entry)
// tuple of dwarf line info, address ranges, and breakpoints
using dwarf_tuple_t = std::tuple<std::deque<dwarf_entry>, std::vector<address_range>,
std::vector<uintptr_t>>;

OMNITRACE_DEFAULT_OBJECT(dwarf_entry)

bool begin_statement = false;
bool end_sequence = false;
@@ -53,7 +57,7 @@ struct dwarf_entry
bool operator!=(const dwarf_entry&) const;
explicit operator bool() const { return is_valid(); }

static std::deque<dwarf_entry> process_dwarf(int _fd, std::vector<address_range>&);
static dwarf_tuple_t process_dwarf(int _fd);

template <typename ArchiveT>
void serialize(ArchiveT&, const unsigned int);
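In the hunks above, `process_dwarf` changes from filling an output parameter to returning a `dwarf_tuple_t`, which `parse_line_info` unpacks with `std::tie`. The sketch below shows that pattern in isolation. It assumes simplified stand-in types: `line_entry`, `range_t`, `debug_tuple_t`, `info_t`, `parse_debug_info`, and `load` are hypothetical names, and nothing here calls libdw.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-ins for dwarf_entry and address_range.
struct line_entry
{
    std::string file;
    unsigned    line;
};
using range_t = std::pair<std::uintptr_t, std::uintptr_t>;

// Tuple of line info, address ranges, and breakpoints (same shape as dwarf_tuple_t).
using debug_tuple_t = std::tuple<std::deque<line_entry>, std::vector<range_t>,
                                 std::vector<std::uintptr_t>>;

// Producer: bind references to the tuple members, fill them, return by value.
debug_tuple_t
parse_debug_info(bool _have_debug_info)
{
    auto _data = debug_tuple_t{};
    if(_have_debug_info)
    {
        auto& _entries = std::get<0>(_data);
        auto& _ranges  = std::get<1>(_data);
        auto& _bkpts   = std::get<2>(_data);
        _entries.push_back({ "main.cpp", 10 });
        _ranges.emplace_back(0x1000, 0x1040);
        _bkpts.push_back(0x1004);
    }
    return _data;
}

// Hypothetical consumer with the same three members as binary_info.
struct info_t
{
    std::deque<line_entry>      debug_info;
    std::vector<range_t>        ranges;
    std::vector<std::uintptr_t> breakpoints;
};

// Consumer: std::tie assigns each tuple element to the matching member,
// and the whole step can be skipped when processing is disabled.
info_t
load(bool _process)
{
    auto _info = info_t{};
    if(_process)
    {
        std::tie(_info.debug_info, _info.ranges, _info.breakpoints) =
            parse_debug_info(true);
    }
    return _info;
}
```

Returning the tuple keeps the producer free of caller state, so adding a third result such as breakpoints changes the tuple alias rather than the function signature.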
@@ -39,13 +39,59 @@ namespace omnitrace
{
namespace binary
{
namespace
{
const open_modes_vec_t default_link_open_modes = { (RTLD_LAZY | RTLD_NOLOAD),
(RTLD_LAZY | RTLD_LOCAL) };
}

std::string
get_linked_path(const char* _name, open_modes_vec_t&& _open_modes)
{
if(_name == nullptr) return config::get_exe_realpath();

if(_open_modes.empty()) _open_modes = default_link_open_modes;

auto _lib = std::string{ _name };
void* _handle = nullptr;
bool _noload = false;
for(auto _mode : _open_modes)
{
_handle = dlopen(_name, _mode);
_noload = (_mode & RTLD_NOLOAD) == RTLD_NOLOAD;
if(_handle) break;
}

if(_handle)
{
struct link_map* _link_map = nullptr;
dlinfo(_handle, RTLD_DI_LINKMAP, &_link_map);
if(_link_map != nullptr && !std::string_view{ _link_map->l_name }.empty())
{
_lib = filepath::realpath(_link_map->l_name, nullptr, false);
}
if(_noload == false) dlclose(_handle);
}
return _lib;
}

std::set<link_file>
get_link_map(const char* _lib, const std::string& _exclude_linked_by,
const std::string& _exclude_re)
const std::string& _exclude_re, open_modes_vec_t&& _open_modes)
{
auto _get_chain = [](const char* _name) {
void* _handle = dlopen(_name, RTLD_LAZY | RTLD_NOLOAD);
auto _chain = std::set<std::string>{};
if(_open_modes.empty()) _open_modes = default_link_open_modes;

auto _get_chain = [&_open_modes](const char* _name) {
void* _handle = nullptr;
bool _noload = false;
for(auto _mode : _open_modes)
{
_handle = dlopen(_name, _mode);
_noload = (_mode & RTLD_NOLOAD) == RTLD_NOLOAD;
if(_handle) break;
}

auto _chain = std::set<std::string>{};
if(_handle)
{
struct link_map* _link_map = nullptr;
@@ -66,6 +112,8 @@ get_link_map(const char* _lib, const std::string& _exclude_linked_by,
}
_next = _next->l_next;
}

if(_noload == false) dlclose(_handle);
}
return _chain;
};
@@ -78,6 +126,7 @@ get_link_map(const char* _lib, const std::string& _exclude_linked_by,

for(const auto& itr : _full_chain)
{
std::cout << itr << std::endl;
if(_excl_chain.find(itr) == _excl_chain.end())
{
if(_exclude_re.empty() || !std::regex_search(itr, std::regex{ _exclude_re }))
@@ -23,14 +23,18 @@
#pragma once

#include <cstdint>
#include <dlfcn.h>
#include <set>
#include <string>
#include <string_view>
#include <vector>

namespace omnitrace
{
namespace binary
{
using open_modes_vec_t = std::vector<int>;

struct link_file
{
link_file(std::string_view&& _v)
@@ -44,11 +48,16 @@ struct link_file
std::string name = {};
};

// helper function for translating generic lib name to resolved path
std::string
get_linked_path(const char*, open_modes_vec_t&& = {});

// default parameters: get the linked binaries for the exe but exclude the linked binaries
// from libomnitrace
std::set<link_file>
get_link_map(const char* _lib = nullptr,
const std::string& _exclude_linked_by = "libomnitrace.so",
const std::string& _exclude_re = "libomnitrace-([a-zA-Z]+)\\.so");
const std::string& _exclude_re = "libomnitrace-([a-zA-Z]+)\\.so",
open_modes_vec_t&& _open_modes = {});
} // namespace binary
} // namespace omnitrace
@@ -136,7 +136,7 @@ symbol::operator bool() const
}

size_t
symbol::read_dwarf(const std::deque<dwarf_entry>& _info)
symbol::read_dwarf_entries(const std::deque<dwarf_entry>& _info)
{
for(const auto& itr : _info)
{
@@ -173,6 +173,20 @@ symbol::read_dwarf(const std::deque<dwarf_entry>& _info)
return dwarf_info.size();
}

size_t
symbol::read_dwarf_breakpoints(const std::vector<uintptr_t>& _bkpts)
{
for(const auto& itr : _bkpts)
{
if(address.contains(itr)) breakpoints.emplace_back(itr);
}

// make sure the breakpoints are sorted low to high
std::sort(breakpoints.begin(), breakpoints.end());

return breakpoints.size();
}

bool
symbol::read_bfd(bfd_file& _bfd)
{
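`symbol::read_dwarf_breakpoints` above keeps only the breakpoint addresses that fall inside the symbol's address range and then sorts them low to high. A standalone sketch of that filter-then-sort step follows. The `address_range` here is a simplified stand-in, and treating it as the half-open interval `[low, high)` is an assumption, since the real `contains` is not shown in this diff. `select_breakpoints` is a hypothetical free function.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Simplified stand-in for omnitrace's address_range.
struct address_range
{
    std::uintptr_t low;
    std::uintptr_t high;

    // Assumption: half-open interval [low, high).
    bool contains(std::uintptr_t _v) const { return _v >= low && _v < high; }
};

// Keep the breakpoints that belong to `_symbol`, sorted low to high.
std::vector<std::uintptr_t>
select_breakpoints(const address_range&               _symbol,
                   const std::vector<std::uintptr_t>& _bkpts)
{
    auto _selected = std::vector<std::uintptr_t>{};
    std::copy_if(_bkpts.begin(), _bkpts.end(), std::back_inserter(_selected),
                 [&_symbol](std::uintptr_t _v) { return _symbol.contains(_v); });
    std::sort(_selected.begin(), _selected.end());
    return _selected;
}
```

Under the half-open assumption, a breakpoint exactly at `high` belongs to the next symbol rather than this one.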
@@ -67,7 +67,8 @@ struct symbol : private tim::unwind::bfd_file::symbol
explicit operator bool() const;

bool read_bfd(bfd_file&);
size_t read_dwarf(const std::deque<dwarf_entry>&);
size_t read_dwarf_entries(const std::deque<dwarf_entry>&);
size_t read_dwarf_breakpoints(const std::vector<uintptr_t>&);
address_range ipaddr() const { return address + load_address; }
symbol clone() const;
@@ -89,6 +90,7 @@ struct symbol : private tim::unwind::bfd_file::symbol
address_range address = {};
std::string func = {};
std::string file = {};
std::vector<uintptr_t> breakpoints = {};
std::vector<inlined_symbol> inlines = {};
std::vector<dwarf_entry> dwarf_info = {};
};
@@ -0,0 +1,141 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.

#include "library/categories.hpp"
#include "library/common.hpp"
#include "library/config.hpp"
#include "library/constraint.hpp"
#include "library/debug.hpp"
#include "library/timemory.hpp"
#include "library/utility.hpp"

#include <set>
#include <string>

namespace omnitrace
{
namespace categories
{
namespace
{
template <typename Tp>
void
configure_categories(bool _enable, const std::set<std::string>& _categories)
{
auto _name = trait::name<Tp>::value;
if(_categories.count(_name) > 0)
{
OMNITRACE_VERBOSE_F(3, "%s category: %s\n", (_enable) ? "Enabling" : "Disabling",
_name);
trait::runtime_enabled<Tp>::set(_enable);
}
}

template <size_t... Idx>
void
configure_categories(bool _enable, const std::set<std::string>& _categories,
std::index_sequence<Idx...>)
{
(configure_categories<category_type_id_t<Idx>>(_enable, _categories), ...);
}

void
configure_categories(bool _enable, const std::set<std::string>& _categories)
{
OMNITRACE_VERBOSE_F(1, "%s categories...\n", (_enable) ? "Enabling" : "Disabling");

configure_categories(
_enable, _categories,
utility::make_index_sequence_range<1, OMNITRACE_CATEGORY_LAST>{});
}
} // namespace

void
enable_categories(const std::set<std::string>& _categories)
{
configure_categories(
true, _categories,
utility::make_index_sequence_range<1, OMNITRACE_CATEGORY_LAST>{});
}

void
disable_categories(const std::set<std::string>& _categories)
{
configure_categories(
false, _categories,
utility::make_index_sequence_range<1, OMNITRACE_CATEGORY_LAST>{});
}

void
setup()
{
// disable specified categories
disable_categories();

auto _trace_specs = constraint::get_trace_specs();

if(!_trace_specs.empty())
{
auto _trace_stages = constraint::get_trace_stages();

_trace_stages.init = [](const constraint::spec& _spec) {
if(_spec.delay > 1.0e-3) disable_categories(config::get_enabled_categories());
return get_state() < State::Finalized;
};

_trace_stages.start = [](const constraint::spec&) {
enable_categories(config::get_enabled_categories());
return get_state() < State::Finalized;
};

_trace_stages.stop = [](const constraint::spec&) {
// only disable categories if not finalized since this might run in background
// during finalization and disable output of data in those categories
if(get_state() < State::Finalized)
disable_categories(config::get_enabled_categories());
return get_state() < State::Finalized;
};

auto _promise = std::promise<void>();
std::thread{ [_trace_specs, _trace_stages](std::promise<void>* _prom) {
// ensure all categories are disabled before proceeding
// if a delay is requested
if(_trace_specs.front().delay > 1.0e-3)
disable_categories(config::get_enabled_categories());
_prom->set_value();
for(const auto& itr : _trace_specs)
itr(_trace_stages);
},
&_promise }
.detach();

_promise.get_future().wait_for(std::chrono::seconds{ 1 });
}
}

void
shutdown()
{
disable_categories(config::get_enabled_categories());
}
} // namespace categories
} // namespace omnitrace
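`categories::setup()` above walks the trace specs in a background thread: each spec waits out its delay with the categories disabled, enables them for the duration, then disables them again, and repeats. The arithmetic behind the `<DELAY>:<DURATION>:<REPEAT>` entries from the PR description can be sketched as below. This is a simplified model under stated assumptions: all specs share one clock, `REPEAT` counts the number of delay + duration cycles, and `period_spec`, `window_t`, `collection_windows`, and `collected_time` are hypothetical names that do not mirror `constraint::spec`.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical mirror of one <DELAY>:<DURATION>:<REPEAT> entry (single clock).
struct period_spec
{
    double   delay;     // seconds to wait with collection disabled
    double   duration;  // seconds to collect
    unsigned repeat;    // assumption: number of delay + duration cycles
};

// Collection window [start, stop) in seconds since the start of execution.
using window_t = std::pair<double, double>;

// Lay the specs out back to back on one timeline.
std::vector<window_t>
collection_windows(const std::vector<period_spec>& _specs)
{
    auto   _windows = std::vector<window_t>{};
    double _now     = 0.0;
    for(const auto& itr : _specs)
    {
        assert(itr.delay >= 0.0 && itr.duration >= 0.0);
        for(unsigned i = 0; i < itr.repeat; ++i)
        {
            _now += itr.delay;  // categories disabled
            _windows.emplace_back(_now, _now + itr.duration);
            _now += itr.duration;  // categories enabled
        }
    }
    return _windows;
}

// Total time spent collecting across all windows.
double
collected_time(const std::vector<window_t>& _windows)
{
    double _total = 0.0;
    for(const auto& itr : _windows)
        _total += itr.second - itr.first;
    return _total;
}
```

For the `0.5:1.0:2` entry this yields the windows `[0.5, 1.5)` and `[2.0, 3.0)`, which matches the summary in the description: data is collected for 2 of the first 3 seconds. Entries that use different clocks cannot be placed on a single timeline this way.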
@@ -122,6 +122,8 @@ OMNITRACE_DEFINE_CATEGORY(category, process_context_switch, OMNITRACE_CATEGORY_P
OMNITRACE_DEFINE_CATEGORY(category, process_page_fault, OMNITRACE_CATEGORY_PROCESS_PAGE_FAULT, "process_page_fault", "Memory page faults in process (collected in background thread)")
OMNITRACE_DEFINE_CATEGORY(category, process_user_mode_time, OMNITRACE_CATEGORY_PROCESS_USER_MODE_TIME, "process_user_cpu_time", "CPU time of functions executing in user-space in process in seconds (collected in background thread)")
OMNITRACE_DEFINE_CATEGORY(category, process_kernel_mode_time, OMNITRACE_CATEGORY_PROCESS_KERNEL_MODE_TIME, "process_kernel_cpu_time", "CPU time of functions executing in kernel-space in process in seconds (collected in background thread)")
OMNITRACE_DEFINE_CATEGORY(category, thread_wall_time, OMNITRACE_CATEGORY_THREAD_WALL_TIME, "thread_wall_time", "Wall-clock time on thread (derived from sampling)")
OMNITRACE_DEFINE_CATEGORY(category, thread_cpu_time, OMNITRACE_CATEGORY_THREAD_CPU_TIME, "thread_cpu_time", "CPU time on thread (derived from sampling)")
OMNITRACE_DEFINE_CATEGORY(category, thread_page_fault, OMNITRACE_CATEGORY_THREAD_PAGE_FAULT, "thread_page_fault", "Memory page faults on thread (derived from sampling)")
OMNITRACE_DEFINE_CATEGORY(category, thread_peak_memory, OMNITRACE_CATEGORY_THREAD_PEAK_MEMORY, "thread_peak_memory", "Peak memory usage on thread in MB (derived from sampling)")
OMNITRACE_DEFINE_CATEGORY(category, thread_context_switch, OMNITRACE_CATEGORY_THREAD_CONTEXT_SWITCH, "thread_context_switch", "Context switches on thread (derived from sampling)")
@@ -182,6 +184,8 @@ using name = perfetto_category<Tp...>;
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::process_page_fault), \
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::process_user_mode_time), \
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::process_kernel_mode_time), \
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::thread_wall_time), \
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::thread_cpu_time), \
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::thread_page_fault), \
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::thread_peak_memory), \
|
||||
OMNITRACE_PERFETTO_CATEGORY(category::thread_context_switch), \
|
@@ -193,3 +197,33 @@ using name = perfetto_category<Tp...>;
#if defined(TIMEMORY_USE_PERFETTO)
# define TIMEMORY_PERFETTO_CATEGORIES OMNITRACE_PERFETTO_CATEGORIES
#endif

#include <set>
#include <string>

namespace omnitrace
{
inline namespace config
{
std::set<std::string>
get_enabled_categories();

std::set<std::string>
get_disabled_categories();
} // namespace config

namespace categories
{
void
enable_categories(const std::set<std::string>& = config::get_enabled_categories());

void
disable_categories(const std::set<std::string>& = config::get_disabled_categories());

void
setup();

void
shutdown();
} // namespace categories
} // namespace omnitrace
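As a rough illustration of how an enable list and a disable list can combine into one effective set, here is a minimal sketch. The helper `resolve_enabled` is hypothetical and not part of omnitrace; it assumes that an empty enable list means "all available categories", which matches the PR description of `OMNITRACE_DISABLE_CATEGORIES` (populate all available categories, then remove the specified ones) but is otherwise an assumption.

```cpp
#include <set>
#include <string>

// Hypothetical helper: compute the effective set of enabled categories.
inline std::set<std::string>
resolve_enabled(const std::set<std::string>& available,
                const std::set<std::string>& enable,
                const std::set<std::string>& disable)
{
    auto result = std::set<std::string>{};
    for(const auto& itr : available)
    {
        // an empty enable list is treated as "everything available" (assumption)
        bool requested = enable.empty() || enable.count(itr) > 0;
        // a category named in the disable list is always removed
        if(requested && disable.count(itr) == 0) result.emplace(itr);
    }
    return result;
}
```

Names that are not in `available` are silently ignored here; a real implementation might prefer to warn about them.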
||||
|
||||
@@ -46,7 +46,7 @@ struct blocking_gotcha : comp::base<blocking_gotcha, void>
|
||||
{
|
||||
static constexpr size_t gotcha_capacity = 13;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(blocking_gotcha)
|
||||
OMNITRACE_DEFAULT_OBJECT(blocking_gotcha)
|
||||
|
||||
// string id for component
|
||||
static std::string label();
|
||||
|
||||
@@ -37,7 +37,7 @@ namespace component
|
||||
{
|
||||
struct causal_gotcha : tim::component::base<causal_gotcha, void>
|
||||
{
|
||||
TIMEMORY_DEFAULT_OBJECT(causal_gotcha)
|
||||
OMNITRACE_DEFAULT_OBJECT(causal_gotcha)
|
||||
|
||||
// string id for component
|
||||
static std::string label() { return "causal_gotcha"; }
|
||||
|
||||
@@ -52,7 +52,7 @@ struct progress_point : comp::base<progress_point, void>
|
||||
static std::string label();
|
||||
static std::string description();
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(progress_point)
|
||||
OMNITRACE_DEFAULT_OBJECT(progress_point)
|
||||
|
||||
void start();
|
||||
void stop();
|
||||
@@ -130,7 +130,7 @@ struct push_node<omnitrace::causal::component::progress_point>
|
||||
{
|
||||
using type = omnitrace::causal::component::progress_point;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(push_node)
|
||||
OMNITRACE_DEFAULT_OBJECT(push_node)
|
||||
|
||||
push_node(type& _obj, scope::config _scope, hash_value_t _hash,
|
||||
int64_t _tid = threading::get_id())
|
||||
@@ -147,7 +147,7 @@ struct pop_node<omnitrace::causal::component::progress_point>
|
||||
{
|
||||
using type = omnitrace::causal::component::progress_point;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(pop_node)
|
||||
OMNITRACE_DEFAULT_OBJECT(pop_node)
|
||||
|
||||
pop_node(type& _obj, int64_t _tid = threading::get_id()) { (*this)(_obj, _tid); }
|
||||
|
||||
|
||||
@@ -45,7 +45,7 @@ struct unblocking_gotcha : comp::base<unblocking_gotcha, void>
|
||||
{
|
||||
static constexpr size_t gotcha_capacity = 8;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(unblocking_gotcha)
|
||||
OMNITRACE_DEFAULT_OBJECT(unblocking_gotcha)
|
||||
|
||||
// string id for component
|
||||
static std::string label();
|
||||
|
||||
@@ -44,7 +44,7 @@ struct delay
|
||||
{
|
||||
using value_type = void;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(delay)
|
||||
OMNITRACE_DEFAULT_OBJECT(delay)
|
||||
|
||||
static void process();
|
||||
static void credit();
|
||||
|
||||
@@ -90,7 +90,7 @@ struct experiment
|
||||
static std::string description();
|
||||
static const std::atomic<experiment*>& get_current_experiment();
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(experiment)
|
||||
OMNITRACE_DEFAULT_OBJECT(experiment)
|
||||
|
||||
bool start();
|
||||
bool wait() const; // returns false if interrupted
|
||||
|
||||
@@ -47,7 +47,7 @@ namespace causal
|
||||
{
|
||||
struct selected_entry
|
||||
{
|
||||
TIMEMORY_DEFAULT_OBJECT(selected_entry)
|
||||
OMNITRACE_DEFAULT_OBJECT(selected_entry)
|
||||
|
||||
uintptr_t address = 0x0;
|
||||
uintptr_t symbol_address = 0x0;
|
||||
|
||||
@@ -48,6 +48,7 @@
|
||||
#include <timemory/mpl.hpp>
|
||||
#include <timemory/mpl/quirks.hpp>
|
||||
#include <timemory/mpl/type_traits.hpp>
|
||||
#include <timemory/mpl/types.hpp>
|
||||
#include <timemory/operations.hpp>
|
||||
#include <timemory/storage.hpp>
|
||||
#include <timemory/units.hpp>
|
@@ -150,10 +151,32 @@ void
backtrace_metrics::stop()
{}

namespace
{
template <typename... Tp>
auto get_enabled(tim::type_list<Tp...>)
{
    constexpr size_t N = sizeof...(Tp);
    auto _v = std::bitset<N>{};
    size_t _n = 0;
    (_v.set(_n++, trait::runtime_enabled<Tp>::get()), ...);
    return _v;
}
} // namespace
void
backtrace_metrics::sample(int)
{
    auto _tid = threading::get_id();
    if(!get_enabled(type_list<category::process_sampling, backtrace_metrics>{}).all())
    {
        m_valid.reset();
        return;
    }

    m_valid = get_enabled(categories_t{});

    // return if everything is disabled
    if(!m_valid.any()) return;

    auto _cache = tim::rusage_cache{ RUSAGE_THREAD };
    m_cpu = tim::get_clock_thread_now<int64_t, std::nano>();
    m_mem_peak = _cache.get_peak_rss();
||||
@@ -163,16 +186,15 @@ backtrace_metrics::sample(int)
|
||||
|
||||
if constexpr(tim::trait::is_available<hw_counters>::value)
|
||||
{
|
||||
if(tim::trait::runtime_enabled<hw_counters>::get())
|
||||
constexpr auto hw_counters_idx = tim::index_of<hw_counters, categories_t>::value;
|
||||
constexpr auto hw_category_idx =
|
||||
tim::index_of<category::thread_hardware_counter, categories_t>::value;
|
||||
|
||||
auto _tid = threading::get_id();
|
||||
if(m_valid.test(hw_category_idx) && m_valid.test(hw_counters_idx))
|
||||
{
|
||||
assert(get_papi_vector(_tid).get() != nullptr);
|
||||
m_hw_counter = get_papi_vector(_tid)->record();
|
||||
// const auto& _cfg = get_papi_vector(_tid)->get_config();
|
||||
// std::cerr << "Config: ";
|
||||
// for(size_t i = 0; i < _cfg->size; ++i)
|
||||
// std::cerr << "[" << _cfg->labels.at(i) << "|" << _cfg->event_names.at(i)
|
||||
// << "|" << _cfg->event_codes.at(i) << "]";
|
||||
// std::cerr << "\n";
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -220,23 +242,27 @@ backtrace_metrics::configure(bool _setup, int64_t _tid)
|
||||
}
|
||||
|
||||
void
|
||||
backtrace_metrics::init_perfetto(int64_t _tid)
|
||||
backtrace_metrics::init_perfetto(int64_t _tid, valid_array_t _valid)
|
||||
{
|
||||
auto _hw_cnt_labels = *get_papi_labels(_tid);
|
||||
auto _tid_name = JOIN("", '[', _tid, ']');
|
||||
|
||||
if(!perfetto_counter_track<perfetto_rusage>::exists(_tid))
|
||||
{
|
||||
perfetto_counter_track<perfetto_rusage>::emplace(
|
||||
_tid, JOIN(' ', "Thread Peak Memory Usage", _tid_name, "(S)"), "MB");
|
||||
perfetto_counter_track<perfetto_rusage>::emplace(
|
||||
_tid, JOIN(' ', "Thread Context Switches", _tid_name, "(S)"));
|
||||
perfetto_counter_track<perfetto_rusage>::emplace(
|
||||
_tid, JOIN(' ', "Thread Page Faults", _tid_name, "(S)"));
|
||||
if(get_valid(category::thread_peak_memory{}, _valid))
|
||||
perfetto_counter_track<perfetto_rusage>::emplace(
|
||||
_tid, JOIN(' ', "Thread Peak Memory Usage", _tid_name, "(S)"), "MB");
|
||||
if(get_valid(category::thread_context_switch{}, _valid))
|
||||
perfetto_counter_track<perfetto_rusage>::emplace(
|
||||
_tid, JOIN(' ', "Thread Context Switches", _tid_name, "(S)"));
|
||||
if(get_valid(category::thread_page_fault{}, _valid))
|
||||
perfetto_counter_track<perfetto_rusage>::emplace(
|
||||
_tid, JOIN(' ', "Thread Page Faults", _tid_name, "(S)"));
|
||||
}
|
||||
|
||||
if(!perfetto_counter_track<hw_counters>::exists(_tid) &&
|
||||
tim::trait::runtime_enabled<hw_counters>::get())
|
||||
get_valid(type_list<hw_counters>{}, _valid) &&
|
||||
get_valid(category::thread_hardware_counter{}, _valid))
|
||||
{
|
||||
for(auto& itr : _hw_cnt_labels)
|
||||
{
|
||||
@@ -250,7 +276,7 @@ backtrace_metrics::init_perfetto(int64_t _tid)
|
||||
}
|
||||
|
||||
void
|
||||
backtrace_metrics::fini_perfetto(int64_t _tid)
|
||||
backtrace_metrics::fini_perfetto(int64_t _tid, valid_array_t _valid)
|
||||
{
|
||||
auto _hw_cnt_labels = *get_papi_labels(_tid);
|
||||
const auto& _thread_info = thread_info::get(_tid, SequentTID);
|
||||
@@ -260,22 +286,32 @@ backtrace_metrics::fini_perfetto(int64_t _tid)
|
||||
|
||||
uint64_t _ts = _thread_info->get_stop();
|
||||
|
||||
TRACE_COUNTER("thread_peak_memory",
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 0), _ts, 0);
|
||||
if(get_valid(category::thread_peak_memory{}, _valid))
|
||||
{
|
||||
TRACE_COUNTER(trait::name<category::thread_peak_memory>::value,
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 0), _ts, 0);
|
||||
}
|
||||
|
||||
TRACE_COUNTER("thread_context_switch",
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 1), _ts, 0);
|
||||
if(get_valid(category::thread_context_switch{}, _valid))
|
||||
{
|
||||
TRACE_COUNTER(trait::name<category::thread_context_switch>::value,
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 1), _ts, 0);
|
||||
}
|
||||
|
||||
TRACE_COUNTER("thread_page_fault",
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts, 0);
|
||||
if(get_valid(category::thread_page_fault{}, _valid))
|
||||
{
|
||||
TRACE_COUNTER(trait::name<category::thread_page_fault>::value,
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts, 0);
|
||||
}
|
||||
|
||||
if(tim::trait::runtime_enabled<hw_counters>::get())
|
||||
if(get_valid(type_list<hw_counters>{}, _valid) &&
|
||||
get_valid(category::thread_hardware_counter{}, _valid))
|
||||
{
|
||||
for(size_t i = 0; i < perfetto_counter_track<hw_counters>::size(_tid); ++i)
|
||||
{
|
||||
if(i < _hw_cnt_labels.size())
|
||||
{
|
||||
TRACE_COUNTER("thread_hardware_counter",
|
||||
TRACE_COUNTER(trait::name<category::thread_hardware_counter>::value,
|
||||
perfetto_counter_track<hw_counters>::at(_tid, i), _ts, 0.0);
|
||||
}
|
||||
}
|
||||
@@ -285,23 +321,33 @@ backtrace_metrics::fini_perfetto(int64_t _tid)
|
||||
void
|
||||
backtrace_metrics::post_process_perfetto(int64_t _tid, uint64_t _ts) const
|
||||
{
|
||||
TRACE_COUNTER("thread_peak_memory",
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 0), _ts,
|
||||
m_mem_peak / units::megabyte);
|
||||
if((*this)(category::thread_peak_memory{}))
|
||||
{
|
||||
TRACE_COUNTER(trait::name<category::thread_peak_memory>::value,
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 0), _ts,
|
||||
m_mem_peak / units::megabyte);
|
||||
}
|
||||
|
||||
TRACE_COUNTER("thread_context_switch",
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 1), _ts, m_ctx_swch);
|
||||
if((*this)(category::thread_context_switch{}))
|
||||
{
|
||||
TRACE_COUNTER(trait::name<category::thread_context_switch>::value,
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 1), _ts,
|
||||
m_ctx_swch);
|
||||
}
|
||||
|
||||
TRACE_COUNTER("thread_page_fault",
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts, m_page_flt);
|
||||
|
||||
if(tim::trait::runtime_enabled<hw_counters>::get())
|
||||
if((*this)(category::thread_page_fault{}))
|
||||
{
|
||||
TRACE_COUNTER(trait::name<category::thread_page_fault>::value,
|
||||
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts,
|
||||
m_page_flt);
|
||||
}
|
||||
if((*this)(type_list<hw_counters>{}) && (*this)(category::thread_hardware_counter{}))
|
||||
{
|
||||
for(size_t i = 0; i < perfetto_counter_track<hw_counters>::size(_tid); ++i)
|
||||
{
|
||||
if(i < m_hw_counter.size())
|
||||
{
|
||||
TRACE_COUNTER("thread_hardware_counter",
|
||||
TRACE_COUNTER(trait::name<category::thread_hardware_counter>::value,
|
||||
perfetto_counter_track<hw_counters>::at(_tid, i), _ts,
|
||||
m_hw_counter.at(i));
|
||||
}
|
||||
|
||||
@@ -34,6 +34,7 @@
|
||||
#include <timemory/components/papi/types.hpp>
|
||||
#include <timemory/macros/language.hpp>
|
||||
#include <timemory/mpl/concepts.hpp>
|
||||
#include <timemory/utility/type_list.hpp>
|
||||
#include <timemory/variadic/types.hpp>
|
||||
|
||||
#include <array>
|
||||
@@ -45,11 +46,14 @@
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
template <typename... Tp>
|
||||
using type_list = ::tim::type_list<Tp...>;
|
||||
|
||||
namespace component
|
||||
{
|
||||
struct backtrace_metrics
|
||||
: tim::component::empty_base
|
||||
, tim::concepts::component
|
||||
, concepts::component
|
||||
{
|
||||
static constexpr size_t num_hw_counters = TIMEMORY_PAPI_ARRAY_SIZE;
|
||||
|
||||
@@ -60,6 +64,13 @@ struct backtrace_metrics
|
||||
using system_clock = std::chrono::system_clock;
|
||||
using system_time_point = typename system_clock::time_point;
|
||||
|
||||
using categories_t =
|
||||
type_list<category::thread_cpu_time, category::thread_peak_memory,
|
||||
category::thread_context_switch, category::thread_page_fault,
|
||||
category::thread_hardware_counter, hw_counters>;
|
||||
static constexpr size_t num_categories = std::tuple_size<categories_t>::value;
|
||||
using valid_array_t = std::bitset<num_categories>;
|
||||
|
||||
static std::string label();
|
||||
static std::string description();
|
||||
|
||||
@@ -72,16 +83,31 @@ struct backtrace_metrics
|
||||
backtrace_metrics& operator=(backtrace_metrics&&) noexcept = default;
|
||||
|
||||
static void configure(bool, int64_t _tid = threading::get_id());
|
||||
static void init_perfetto(int64_t _tid);
|
||||
static void fini_perfetto(int64_t _tid);
|
||||
static void init_perfetto(int64_t _tid, valid_array_t);
|
||||
static void fini_perfetto(int64_t _tid, valid_array_t);
|
||||
static std::vector<std::string> get_hw_counter_labels(int64_t);
|
||||
|
||||
template <typename Tp>
|
||||
static bool get_valid(Tp, valid_array_t);
|
||||
|
||||
template <typename Tp>
|
||||
static bool get_valid(type_list<Tp>, valid_array_t);
|
||||
|
||||
static void start();
|
||||
static void stop();
|
||||
void sample(int = -1);
|
||||
void post_process(int64_t _tid, const backtrace* _bt,
|
||||
const backtrace_metrics* _last) const;
|
||||
|
||||
explicit operator bool() const { return m_valid.any(); }
|
||||
|
||||
template <typename Tp>
|
||||
bool operator()(Tp) const;
|
||||
|
||||
template <typename Tp>
|
||||
bool operator()(type_list<Tp>) const;
|
||||
|
||||
auto get_valid() const { return m_valid; }
|
||||
auto get_cpu_timestamp() const { return m_cpu; }
|
||||
auto get_peak_memory() const { return m_mem_peak; }
|
||||
auto get_context_switches() const { return m_ctx_swch; }
|
@@ -91,12 +117,44 @@ struct backtrace_metrics
    void post_process_perfetto(int64_t _tid, uint64_t _ts) const;

private:
    valid_array_t m_valid = {};
    int64_t m_cpu = 0;
    int64_t m_mem_peak = 0;
    int64_t m_ctx_swch = 0;
    int64_t m_page_flt = 0;
    hw_counter_data_t m_hw_counter = {};
};

template <typename Tp>
bool
backtrace_metrics::get_valid(type_list<Tp>, valid_array_t _valid)
{
    constexpr auto idx = tim::index_of<Tp, categories_t>::value;
    return _valid.test(idx);
}

template <typename Tp>
bool backtrace_metrics::operator()(type_list<Tp>) const
{
    static_assert(!concepts::is_type_listing<Tp>::value,
                  "Error! invalid call with tuple");

    constexpr auto idx = tim::index_of<Tp, categories_t>::value;
    return m_valid.test(idx);
}

template <typename Tp>
bool
backtrace_metrics::get_valid(Tp, valid_array_t _valid)
{
    return get_valid(type_list<Tp>{}, _valid);
}

template <typename Tp>
bool backtrace_metrics::operator()(Tp) const
{
    return (*this)(type_list<Tp>{});
}
} // namespace component
} // namespace omnitrace
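The `valid_array_t` bookkeeping above pairs a compile-time type list with a runtime `std::bitset`: each category's position in the list is its bit index. A self-contained sketch of that pattern follows. Everything here is illustrative: `type_list` and `index_of` are simplified stand-ins for the timemory utilities, the `enabled` flags stand in for `trait::runtime_enabled<Tp>::get()`, and the category tags are hypothetical.

```cpp
#include <bitset>
#include <cstddef>
#include <type_traits>

template <typename... Tp>
struct type_list
{};

// Position of Tp within a type_list (simplified stand-in for tim::index_of).
template <typename Tp, typename ListT>
struct index_of;

template <typename Tp, typename... Tail>
struct index_of<Tp, type_list<Tp, Tail...>> : std::integral_constant<std::size_t, 0>
{};

template <typename Tp, typename Head, typename... Tail>
struct index_of<Tp, type_list<Head, Tail...>>
: std::integral_constant<std::size_t, 1 + index_of<Tp, type_list<Tail...>>::value>
{};

// Hypothetical category tags; "enabled" models a runtime on/off switch.
struct cpu_time    { static inline bool enabled = true; };
struct peak_memory { static inline bool enabled = true; };
struct page_fault  { static inline bool enabled = true; };

using categories_t = type_list<cpu_time, peak_memory, page_fault>;

// Snapshot the runtime state of every type in the list into a bitset.
template <typename... Tp>
std::bitset<sizeof...(Tp)>
get_enabled(type_list<Tp...>)
{
    auto        bits = std::bitset<sizeof...(Tp)>{};
    std::size_t n    = 0;
    // comma fold: evaluated left to right, one bit per type
    ((bits.set(n++, Tp::enabled)), ...);
    return bits;
}

// Query the snapshot by type instead of by raw index.
template <typename Tp, std::size_t N>
bool
is_valid(const std::bitset<N>& bits)
{
    return bits.test(index_of<Tp, categories_t>::value);
}
```

Taking the snapshot once per sample means later post-processing sees the state that was in effect when the sample was recorded, even if categories are toggled afterwards.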
||||
|
||||
|
||||
@@ -68,7 +68,10 @@ using tracing_count_categories_t =
|
||||
category::rocm_hsa, category::rocm_rccl>;
|
||||
|
||||
// these categories are added to the critical trace
|
||||
using critical_trace_categories_t = type_list<category::host>;
|
||||
using critical_trace_categories_t =
|
||||
type_list<category::host, category::mpi, category::pthread, category::rocm_hip,
|
||||
category::rocm_hsa, category::rocm_rccl, category::device_hip,
|
||||
category::device_hsa, category::numa, category::python>;
|
||||
|
||||
// convert these categories to throughput points
|
||||
using causal_throughput_categories_t =
|
||||
@@ -128,7 +131,7 @@ void
|
||||
category_region<CategoryT>::start(std::string_view name, Args&&... args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
if(tracing::category_push_disabled<CategoryT>()) return;
|
||||
|
||||
// unconditionally return if thread is disabled or finalized
|
||||
if(get_thread_state() == ThreadState::Disabled) return;
|
||||
@@ -212,7 +215,7 @@ void
|
||||
category_region<CategoryT>::stop(std::string_view name, Args&&... args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
if(tracing::category_pop_disabled<CategoryT>()) return;
|
||||
|
||||
if(get_thread_state() == ThreadState::Disabled) return;
|
||||
|
||||
@@ -315,7 +318,7 @@ category_region<CategoryT>::mark(std::string_view name, Args&&...)
|
||||
if constexpr(!_ct_use_causal) return;
|
||||
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
if(tracing::category_mark_disabled<CategoryT>()) return;
|
||||
|
||||
// the expectation here is that if the state is not active then the call
|
||||
// to omnitrace_init_tooling_hidden will activate all the appropriate
|
||||
@@ -345,9 +348,6 @@ void
|
||||
category_region<CategoryT>::audit(const gotcha_data_t& _data, audit::incoming,
|
||||
Args&&... _args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
|
||||
start<OptsT...>(_data.tool_id.c_str(), [&](perfetto::EventContext ctx) {
|
||||
if(config::get_perfetto_annotations())
|
||||
{
|
||||
@@ -364,9 +364,6 @@ void
|
||||
category_region<CategoryT>::audit(const gotcha_data_t& _data, audit::outgoing,
|
||||
Args&&... _args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
|
||||
stop<OptsT...>(_data.tool_id.c_str(), [&](perfetto::EventContext ctx) {
|
||||
if(config::get_perfetto_annotations())
|
||||
tracing::add_perfetto_annotation(ctx, "return", JOIN(", ", _args...));
|
||||
@@ -379,9 +376,6 @@ void
|
||||
category_region<CategoryT>::audit(std::string_view _name, audit::incoming,
|
||||
Args&&... _args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
|
||||
start<OptsT...>(_name.data(), [&](perfetto::EventContext ctx) {
|
||||
if(config::get_perfetto_annotations())
|
||||
{
|
||||
@@ -398,9 +392,6 @@ void
|
||||
category_region<CategoryT>::audit(std::string_view _name, audit::outgoing,
|
||||
Args&&... _args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
|
||||
stop<OptsT...>(_name.data(), [&](perfetto::EventContext ctx) {
|
||||
if(config::get_perfetto_annotations())
|
||||
tracing::add_perfetto_annotation(ctx, "return", JOIN(", ", _args...));
|
||||
@@ -466,6 +457,5 @@ struct local_category_region : comp::base<local_category_region<CategoryT>, void
|
||||
private:
|
||||
std::string_view m_prefix = {};
|
||||
};
|
||||
|
||||
} // namespace component
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -97,7 +97,7 @@ struct comm_data : base<comm_data, void>
|
||||
static constexpr auto label = "RCCL Comm Send";
|
||||
};
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(comm_data)
|
||||
OMNITRACE_DEFAULT_OBJECT(comm_data)
|
||||
|
||||
static void preinit();
|
||||
static void configure();
|
||||
|
||||
@@ -45,7 +45,7 @@ struct cpu_freq
|
||||
using storage_type = tim::storage<cpu_freq, value_type>;
|
||||
using cpu_id_set_t = std::set<uint64_t>;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(cpu_freq)
|
||||
OMNITRACE_DEFAULT_OBJECT(cpu_freq)
|
||||
|
||||
// string id for component
|
||||
static std::string label();
|
||||
|
||||
@@ -39,7 +39,7 @@ namespace
|
||||
template <typename... Tp>
|
||||
struct ensure_storage
|
||||
{
|
||||
TIMEMORY_DEFAULT_OBJECT(ensure_storage)
|
||||
OMNITRACE_DEFAULT_OBJECT(ensure_storage)
|
||||
|
||||
void operator()() const { OMNITRACE_FOLD_EXPRESSION((*this)(tim::type_list<Tp>{})); }
|
||||
|
||||
|
||||
@@ -44,7 +44,7 @@ struct exit_gotcha : tim::component::base<exit_gotcha, void>
|
||||
using exit_func_t = void (*)(int);
|
||||
using abort_func_t = void (*)();
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(exit_gotcha)
|
||||
OMNITRACE_DEFAULT_OBJECT(exit_gotcha)
|
||||
|
||||
// string id for component
|
||||
static std::string label() { return "exit_gotcha"; }
|
||||
|
||||
@@ -37,7 +37,7 @@ struct fork_gotcha : comp::base<fork_gotcha, void>
|
||||
|
||||
using gotcha_data_t = comp::gotcha_data;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(fork_gotcha)
|
||||
OMNITRACE_DEFAULT_OBJECT(fork_gotcha)
|
||||
|
||||
// string id for component
|
||||
static std::string label() { return "fork_gotcha"; }
|
||||
|
||||
@@ -38,7 +38,7 @@ struct mpi_gotcha : comp::base<mpi_gotcha, void>
|
||||
using comm_t = tim::mpi::comm_t;
|
||||
using gotcha_data_t = comp::gotcha_data;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(mpi_gotcha)
|
||||
OMNITRACE_DEFAULT_OBJECT(mpi_gotcha)
|
||||
|
||||
// string id for component
|
||||
static std::string label() { return "mpi_gotcha"; }
|
||||
|
||||
@@ -44,7 +44,7 @@ struct numa_gotcha : tim::component::base<numa_gotcha, void>
|
||||
using exit_func_t = void (*)(int);
|
||||
using abort_func_t = void (*)();
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(numa_gotcha)
|
||||
OMNITRACE_DEFAULT_OBJECT(numa_gotcha)
|
||||
|
||||
// string id for component
|
||||
static std::string label() { return "numa_gotcha"; }
|
||||
|
||||
@@ -161,6 +161,7 @@ pthread_create_gotcha::wrapper::operator()() const
|
||||
auto _signals = std::set<int>{};
|
||||
auto _coverage = (get_mode() == Mode::Coverage);
|
||||
const auto& _parent_info = thread_info::get(m_config.parent_tid, InternalTID);
|
||||
const auto& _info = thread_info::init(m_config.offset);
|
||||
auto _dtor = [&]() {
|
||||
set_thread_state(ThreadState::Internal);
|
||||
if(_is_sampling)
|
||||
@@ -189,16 +190,22 @@ pthread_create_gotcha::wrapper::operator()() const
|
||||
_thr_bundle->stop();
|
||||
if(_bundle) stop_bundle(*_bundle, _tid);
|
||||
pthread_create_gotcha::shutdown(_tid);
|
||||
OMNITRACE_BASIC_VERBOSE(
|
||||
1, "[PID=%i][rank=%i] Thread %s (parent: %s) exited\n", process::get_id(),
|
||||
dmp::rank(), _info->index_data->as_string().c_str(),
|
||||
_parent_info->index_data->as_string().c_str());
|
||||
}
|
||||
};
|
||||
|
||||
auto _active = (get_state() == ::omnitrace::State::Active && bundles != nullptr &&
|
||||
bundles_mutex != nullptr);
|
||||
|
||||
const auto& _info = thread_info::init(m_config.offset);
|
||||
if(_active && !_coverage && !m_config.offset)
|
||||
{
|
||||
_tid = _info->index_data->sequent_value;
|
||||
OMNITRACE_BASIC_VERBOSE(1, "[PID=%i][rank=%i] Thread %s (parent: %s) created\n",
|
||||
process::get_id(), dmp::rank(),
|
||||
_info->index_data->as_string().c_str(),
|
||||
_parent_info->index_data->as_string().c_str());
|
||||
threading::set_thread_name(TIMEMORY_JOIN(" ", "Thread", _tid).c_str());
|
||||
if(!thread_bundle_data_t::instances().at(_tid))
|
||||
{
|
||||
@@ -235,6 +242,14 @@ pthread_create_gotcha::wrapper::operator()() const
|
||||
sampling::unblock_signals();
|
||||
}
|
||||
}
|
||||
else if(m_config.offset)
|
||||
{
|
||||
OMNITRACE_BASIC_VERBOSE(
|
||||
2,
|
||||
"[PID=%i][rank=%i] Thread %s (parent: %s) created [started by omnitrace]\n",
|
||||
process::get_id(), dmp::rank(), _info->index_data->as_string().c_str(),
|
||||
_parent_info->index_data->as_string().c_str());
|
||||
}
|
||||
|
||||
// notify the wrapper that all internal work is completed
|
||||
if(m_config.promise) m_config.promise->set_value();
|
||||
@@ -399,8 +414,9 @@ pthread_create_gotcha::operator()(pthread_t* thread, const pthread_attr_t* attr,
|
||||
|
||||
if(_active && !_disabled && !_info->is_offset)
|
||||
{
|
||||
OMNITRACE_VERBOSE(1, "Creating new thread on PID %i (rank: %i), TID %li\n",
|
||||
process::get_id(), dmp::rank(), _tid);
|
||||
OMNITRACE_BASIC_VERBOSE(2, "[PID=%i][rank=%i] Starting new thread on %s...\n",
|
||||
process::get_id(), dmp::rank(),
|
||||
_info->index_data->as_string().c_str());
|
||||
}
|
||||
|
||||
// ensure that cpu cid stack exists on the parent thread if active
|
||||
|
||||
@@ -64,7 +64,7 @@ struct pthread_create_gotcha : tim::component::base<pthread_create_gotcha, void>
|
||||
wrapper_config m_config = {};
|
||||
};
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(pthread_create_gotcha)
|
||||
OMNITRACE_DEFAULT_OBJECT(pthread_create_gotcha)
|
||||
|
||||
// string id for component
|
||||
static std::string label() { return "pthread_create_gotcha"; }
|
||||
|
||||
@@ -48,7 +48,7 @@ struct stop<omnitrace::component::pthread_create_gotcha_t>
|
||||
{
|
||||
using type = omnitrace::component::pthread_create_gotcha_t;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(stop)
|
||||
OMNITRACE_DEFAULT_OBJECT(stop)
|
||||
|
||||
template <typename... Args>
|
||||
explicit stop(type&, Args&&...)
|
||||
|
||||
@@ -33,7 +33,7 @@ namespace omnitrace
|
||||
{
|
||||
struct pthread_gotcha : tim::component::base<pthread_gotcha, void>
|
||||
{
|
||||
TIMEMORY_DEFAULT_OBJECT(pthread_gotcha)
|
||||
OMNITRACE_DEFAULT_OBJECT(pthread_gotcha)
|
||||
|
||||
// string id for component
|
||||
static std::string label() { return "pthread_gotcha"; }
|
||||
|
||||
@@ -44,7 +44,7 @@ struct pthread_mutex_gotcha : comp::base<pthread_mutex_gotcha, void>
|
||||
using hash_array_t = std::array<size_t, gotcha_capacity>;
|
||||
using gotcha_data_t = comp::gotcha_data;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(pthread_mutex_gotcha)
|
||||
OMNITRACE_DEFAULT_OBJECT(pthread_mutex_gotcha)
|
||||
|
||||
explicit pthread_mutex_gotcha(const gotcha_data_t&);
|
||||
|
||||
|
||||
@@ -109,7 +109,7 @@ struct rocprofiler
|
||||
using base_type = base<rocprofiler, void>;
|
||||
using tracker_type = policy::instance_tracker<rocprofiler, false>;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(rocprofiler)
|
||||
OMNITRACE_DEFAULT_OBJECT(rocprofiler)
|
||||
|
||||
static void preinit();
|
||||
static void global_init() { setup(); }
|
||||
@@ -173,7 +173,7 @@ struct set_storage<component::rocm_data_tracker>
|
||||
using storage_array_t = std::array<storage<type>*, max_threads>;
|
||||
friend struct get_storage<component::rocm_data_tracker>;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(set_storage)
|
||||
OMNITRACE_DEFAULT_OBJECT(set_storage)
|
||||
|
||||
auto operator()(storage<type>*, size_t) const {}
|
||||
auto operator()(type&, size_t) const {}
|
||||
@@ -192,7 +192,7 @@ struct get_storage<component::rocm_data_tracker>
|
||||
{
|
||||
using type = component::rocm_data_tracker;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(get_storage)
|
||||
OMNITRACE_DEFAULT_OBJECT(get_storage)
|
||||
|
||||
auto operator()(const type&) const
|
||||
{
|
||||
|
||||
@@ -51,7 +51,7 @@ struct roctracer
|
||||
using base_type = base<roctracer, void>;
|
||||
using tracker_type = policy::instance_tracker<roctracer, false>;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(roctracer)
|
||||
OMNITRACE_DEFAULT_OBJECT(roctracer)
|
||||
|
||||
static void preinit();
|
||||
static void global_init() { setup(); }
|
||||
|
@@ -94,5 +94,33 @@ public:
    static constexpr bool value = sfinae(0);
    constexpr auto operator()() const { return sfinae(0); }
};

template <size_t N, typename Tp, bool>
struct tuple_element_impl;

template <size_t N, typename... Tp>
struct tuple_element_impl<N, std::tuple<Tp...>, true>
{
    using type = typename std::tuple_element<N, std::tuple<Tp...>>::type;
};

template <size_t N, typename... Tp>
struct tuple_element_impl<N, std::tuple<Tp...>, false>
{
    using type = void;
};

template <size_t N, typename Tp>
struct tuple_element;

template <size_t N, typename... Tp>
struct tuple_element<N, std::tuple<Tp...>>
{
    using type =
        typename tuple_element_impl<N, std::tuple<Tp...>, (N < sizeof...(Tp))>::type;
};

template <size_t N, typename Tp>
using tuple_element_t = typename tuple_element<N, Tp>::type;
} // namespace concepts
} // namespace tim
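The trait above differs from `std::tuple_element` in one respect: an out-of-range index yields `void` instead of a hard compile error. The sketch below restates the same technique under hypothetical names (`safe_element`, `safe_element_t`) so that it compiles on its own, and adds `static_assert` checks showing the behavior.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

template <std::size_t N, typename Tp, bool InRange>
struct safe_element_impl;

// in range: defer to the standard trait
template <std::size_t N, typename... Tp>
struct safe_element_impl<N, std::tuple<Tp...>, true>
{
    using type = typename std::tuple_element<N, std::tuple<Tp...>>::type;
};

// out of range: std::tuple_element is never instantiated
template <std::size_t N, typename... Tp>
struct safe_element_impl<N, std::tuple<Tp...>, false>
{
    using type = void;
};

template <std::size_t N, typename Tp>
struct safe_element;

template <std::size_t N, typename... Tp>
struct safe_element<N, std::tuple<Tp...>>
{
    using type =
        typename safe_element_impl<N, std::tuple<Tp...>, (N < sizeof...(Tp))>::type;
};

template <std::size_t N, typename Tp>
using safe_element_t = typename safe_element<N, Tp>::type;

using args_t = std::tuple<int, double>;
static_assert(std::is_same<safe_element_t<0, args_t>, int>::value, "index 0");
static_assert(std::is_same<safe_element_t<1, args_t>, double>::value, "index 1");
static_assert(std::is_same<safe_element_t<5, args_t>, void>::value, "out of range");
```

The boolean parameter selects the specialization before `std::tuple_element` is named, which is what keeps the out-of-range case from being instantiated.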
||||
|
||||
@@ -22,12 +22,12 @@
|
||||
|
||||
#include "library/config.hpp"
|
||||
#include "common/defines.h"
|
||||
#include "library/constraint.hpp"
|
||||
#include "library/debug.hpp"
|
||||
#include "library/defines.hpp"
|
||||
#include "library/gpu.hpp"
|
||||
#include "library/mproc.hpp"
|
||||
#include "library/perfetto.hpp"
|
||||
#include "library/runtime.hpp"
|
||||
|
||||
#include <timemory/backends/dmp.hpp>
|
||||
#include <timemory/backends/mpi.hpp>
|
||||
@@ -43,12 +43,14 @@
|
||||
#include <timemory/settings/types.hpp>
|
||||
#include <timemory/utility/argparse.hpp>
|
||||
#include <timemory/utility/declaration.hpp>
|
||||
#include <timemory/utility/delimit.hpp>
|
||||
#include <timemory/utility/filepath.hpp>
|
||||
#include <timemory/utility/join.hpp>
|
||||
#include <timemory/utility/signals.hpp>
|
||||
|
||||
#include <algorithm>
|
||||
#include <array>
|
||||
#include <atomic>
|
||||
#include <csignal>
|
||||
#include <cstdint>
|
||||
#include <cstdlib>
|
||||
@@ -98,7 +100,7 @@ get_setting_name(std::string _v)
|
||||
|
||||
template <typename Tp>
|
||||
Tp
|
||||
get_available_perfetto_categories()
|
||||
get_available_categories()
|
||||
{
|
||||
auto _v = Tp{};
|
||||
for(auto itr : { OMNITRACE_PERFETTO_CATEGORIES })
|
||||
@@ -287,8 +289,8 @@ configure_settings(bool _init)
|
||||
"for continuous integration)",
|
||||
false, "debugging", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_COLORIZED_LOG", "Enable colorized logging",
|
||||
true, "debugging", "advanced");
|
||||
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_MONOCHROME", "Disable colorized logging",
|
||||
false, "debugging", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_EXT_SETTING(int, "OMNITRACE_DL_VERBOSE",
|
||||
"Verbosity within the omnitrace-dl library", 0,
|
||||
@@ -392,10 +394,45 @@ configure_settings(bool _init)
|
||||
"Enable support for code coverage", false, "coverage",
|
||||
"backend", "advanced");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(size_t, "OMNITRACE_INSTRUMENTATION_INTERVAL",
|
||||
"Instrumentation only takes measurements once every N "
|
||||
"function calls (not statistical)",
|
||||
size_t{ 1 }, "instrumentation", "data_sampling", "advanced");
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
double, "OMNITRACE_TRACE_DELAY",
|
||||
"Time in seconds to wait before enabling trace/profile data collection. If "
|
||||
"multiple delays + durations are needed, see OMNITRACE_TRACE_PERIODS.",
|
||||
0.0, "trace", "profile", "perfetto", "timemory");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
double, "OMNITRACE_TRACE_DURATION",
|
||||
"If > 0.0, time (in seconds) to collect trace/profile data. If multiple delays + "
|
||||
"durations are needed, see OMNITRACE_TRACE_PERIODS.",
|
||||
0.0, "trace", "profile", "perfetto", "timemory");
|
||||
|
||||
auto _clock_s =
|
||||
config::get_setting_value<std::string>("OMNITRACE_TRACE_PERIOD_CLOCK_ID").second;
|
||||
|
||||
auto _clock_choices = std::vector<std::string>{};
|
||||
|
||||
for(const auto& itr : constraint::get_valid_clock_ids())
|
||||
{
|
||||
_clock_choices.emplace_back(
|
||||
join("", "(", join('|', itr.name, itr.value, itr.raw_name), ")"));
|
||||
}
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_TRACE_PERIODS",
"Similar to specifying a trace delay and/or duration, except in "
"the form <DELAY>:<DURATION>, <DELAY>:<DURATION>:<REPEAT>, "
"and/or <DELAY>:<DURATION>:<REPEAT>:<CLOCK_ID>",
std::string{}, "trace", "profile", "perfetto", "timemory");
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
std::string, "OMNITRACE_TRACE_PERIOD_CLOCK_ID",
"Set the default clock ID for OMNITRACE_TRACE_DELAY, OMNITRACE_TRACE_DURATION, "
"and/or OMNITRACE_TRACE_PERIODS. E.g. \"realtime\" == the delay/duration is "
"governed by the elapsed realtime, \"cputime\" == the delay/duration is governed "
"by the elapsed CPU-time within the process, etc. Note: when using CPU-based "
"timing, it is recommended to scale the value by the number of threads and be "
"aware that omnitrace may contribute to advancing the process CPU-time",
|
||||
"CLOCK_REALTIME", "trace", "profile", "perfetto", "timemory")
|
||||
->set_choices(_clock_choices);
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
double, "OMNITRACE_SAMPLING_FREQ",
|
||||
@@ -639,10 +676,18 @@ configure_settings(bool _init)
|
||||
"discard", "perfetto", "data")
|
||||
->set_choices({ "fill", "discard" });
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_PERFETTO_CATEGORIES",
|
||||
"Categories to collect within perfetto", "", "perfetto",
|
||||
"data", "advanced")
|
||||
->set_choices(get_available_perfetto_categories<std::vector<std::string>>());
|
||||
OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_ENABLE_CATEGORIES",
|
||||
"Enable collecting profiling and trace data for these "
|
||||
"categories and disable all other categories",
|
||||
"", "trace", "profile", "perfetto", "timemory", "data",
|
||||
"advanced")
|
||||
->set_choices(get_available_categories<std::vector<std::string>>());
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(
|
||||
std::string, "OMNITRACE_DISABLE_CATEGORIES",
|
||||
"Disable collecting profiling and trace data for these categories", "", "trace",
|
||||
"profile", "perfetto", "timemory", "data", "advanced")
|
||||
->set_choices(get_available_categories<std::vector<std::string>>());
|
||||
|
||||
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_PERFETTO_ANNOTATIONS",
|
||||
"Include debug annotations in perfetto trace. When enabled, "
|
||||
@@ -977,8 +1022,8 @@ configure_settings(bool _init)
|
||||
|
||||
settings::suppress_config() = true;
|
||||
|
||||
if(!get_env("OMNITRACE_COLORIZED_LOG", _config->get<bool>("OMNITRACE_COLORIZED_LOG")))
|
||||
tim::log::colorized() = false;
|
||||
if(get_env("OMNITRACE_MONOCHROME", _config->get<bool>("OMNITRACE_MONOCHROME")))
|
||||
tim::log::monochrome() = true;
|
||||
|
||||
if(_init)
|
||||
{
|
||||
@@ -1105,8 +1150,6 @@ configure_mode_settings()
|
||||
_set("OMNITRACE_USE_ROCM_SMI", false);
|
||||
}
|
||||
|
||||
get_instrumentation_interval() = std::max<size_t>(get_instrumentation_interval(), 1);
|
||||
|
||||
if(get_use_kokkosp())
|
||||
{
|
||||
auto _current_kokkosp_lib = tim::get_env<std::string>("KOKKOS_PROFILE_LIBRARY");
|
||||
@@ -1156,6 +1199,13 @@ namespace
|
||||
using signal_settings = tim::signals::signal_settings;
|
||||
using sys_signal = tim::signals::sys_signal;
|
||||
|
||||
std::atomic<signal_handler_t>&
|
||||
get_signal_handler()
|
||||
{
|
||||
static auto _v = std::atomic<signal_handler_t>{ nullptr };
|
||||
return _v;
|
||||
}
|
||||
|
||||
void
|
||||
omnitrace_exit_action(int nsig)
|
||||
{
|
||||
@@ -1163,7 +1213,8 @@ omnitrace_exit_action(int nsig)
|
||||
tim::signals::sigmask_scope::process);
|
||||
OMNITRACE_BASIC_PRINT("Finalizing after signal %i :: %s\n", nsig,
signal_settings::str(static_cast<sys_signal>(nsig)).c_str());
|
||||
if(get_state() == State::Active) omnitrace_finalize();
|
||||
auto _handler = get_signal_handler().load();
|
||||
if(_handler) (*_handler)();
|
||||
kill(process::get_id(), nsig);
|
||||
}
|
||||
|
||||
@@ -1183,6 +1234,28 @@ omnitrace_trampoline_handler(int _v)
|
||||
}
|
||||
} // namespace
|
||||
|
||||
signal_handler_t
|
||||
set_signal_handler(signal_handler_t _func)
|
||||
{
|
||||
if(_func)
|
||||
{
|
||||
auto _handler = get_signal_handler().load(std::memory_order_relaxed);
|
||||
if(get_signal_handler().compare_exchange_strong(_handler, _func,
|
||||
std::memory_order_relaxed))
|
||||
{
|
||||
return _handler;
|
||||
}
|
||||
else
|
||||
{
|
||||
_handler = get_signal_handler().load(std::memory_order_seq_cst);
|
||||
get_signal_handler().store(_func);
|
||||
return _handler;
|
||||
}
|
||||
}
|
||||
|
||||
return get_signal_handler().load();
|
||||
}
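
The contract for `set_signal_handler` (a null argument returns the current handler, a non-null argument installs it and returns the one it replaced) can be illustrated in isolation. The following is a minimal sketch, not the code in this PR: it assumes a single `std::atomic::exchange` is an acceptable substitute for the compare-exchange/store sequence above, and the names `handler_t`, `handler_slot`, `swap_handler`, `handler_a`, and `handler_b` are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the signal_handler_t alias (void (*)(void)).
using handler_t = void (*)();

// Process-wide handler slot; a function-local static avoids
// static-initialization-order problems.
inline std::atomic<handler_t>&
handler_slot()
{
    static std::atomic<handler_t> slot{ nullptr };
    return slot;
}

// nullptr argument: query only, nothing is replaced.
// non-null argument: install it and return the previously stored handler.
inline handler_t
swap_handler(handler_t func)
{
    if(func == nullptr) return handler_slot().load();
    return handler_slot().exchange(func);
}

// Two placeholder handlers used only for illustration.
inline void handler_a() {}
inline void handler_b() {}
```

With this shape, `swap_handler(handler_a)` followed by `swap_handler(handler_b)` hands back `handler_a`, so a caller can chain or restore the earlier handler.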
|
||||
|
||||
void
|
||||
configure_signal_handler()
|
||||
{
|
||||
@@ -1218,6 +1291,35 @@ configure_signal_handler()
|
||||
}
|
||||
}
|
||||
|
||||
int
|
||||
get_realtime_signal()
|
||||
{
|
||||
return SIGRTMIN + get_sampling_rtoffset();
|
||||
}
|
||||
|
||||
int
|
||||
get_cputime_signal()
|
||||
{
|
||||
return SIGPROF;
|
||||
}
|
||||
|
||||
std::set<int> get_sampling_signals(int64_t)
|
||||
{
|
||||
auto _v = std::set<int>{};
|
||||
if(get_use_causal())
|
||||
{
|
||||
_v.emplace(get_cputime_signal());
|
||||
_v.emplace(get_realtime_signal());
|
||||
}
|
||||
else
|
||||
{
|
||||
if(get_use_sampling_cputime()) _v.emplace(get_cputime_signal());
|
||||
if(get_use_sampling_realtime()) _v.emplace(get_realtime_signal());
|
||||
}
|
||||
|
||||
return _v;
|
||||
}
|
||||
|
||||
void
|
||||
configure_disabled_settings()
|
||||
{
|
||||
@@ -1964,18 +2066,74 @@ get_perfetto_fill_policy()
|
||||
return static_cast<tim::tsettings<std::string>&>(*_v->second).get();
|
||||
}
|
||||
|
||||
std::set<std::string>
|
||||
get_perfetto_categories()
|
||||
namespace
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_PERFETTO_CATEGORIES");
|
||||
static auto _avail = get_available_perfetto_categories<std::set<std::string>>();
|
||||
auto _ret = std::set<std::string>{};
|
||||
for(auto itr : tim::delimit(
|
||||
static_cast<tim::tsettings<std::string>&>(*_v->second).get(), " ,;:"))
|
||||
{
|
||||
if(_avail.count(itr) > 0) _ret.emplace(itr);
|
||||
}
|
||||
return _ret;
|
||||
auto
|
||||
get_category_config()
|
||||
{
|
||||
using strset_t = std::set<std::string>;
|
||||
|
||||
static auto _v = []() {
|
||||
auto _avail = get_available_categories<strset_t>();
|
||||
auto _parse = [&_avail](const auto& _setting) {
|
||||
auto _ret = strset_t{};
|
||||
for(auto itr : tim::delimit(
|
||||
static_cast<tim::tsettings<std::string>&>(*_setting->second).get(),
|
||||
" ,;:\n\t"))
|
||||
{
|
||||
if(_avail.count(itr) > 0) _ret.emplace(itr);
|
||||
}
|
||||
return _ret;
|
||||
};
|
||||
|
||||
auto _enabled = _parse(get_config()->find("OMNITRACE_ENABLE_CATEGORIES"));
|
||||
auto _disabled = _parse(get_config()->find("OMNITRACE_DISABLE_CATEGORIES"));
|
||||
|
||||
if(_enabled.empty() && _disabled.empty())
|
||||
{
|
||||
_enabled = _avail;
|
||||
}
|
||||
else if(_enabled.empty() && !_disabled.empty())
|
||||
{
|
||||
for(auto itr : _avail)
|
||||
{
|
||||
if(_disabled.count(itr) == 0) _enabled.emplace(itr);
|
||||
}
|
||||
}
|
||||
else if(!_enabled.empty() && _disabled.empty())
|
||||
{
|
||||
for(auto itr : _avail)
|
||||
{
|
||||
if(_enabled.count(itr) == 0) _disabled.emplace(itr);
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
OMNITRACE_ABORT("Error! Conflicting options OMNITRACE_ENABLE_CATEGORIES and "
|
||||
"OMNITRACE_DISABLE_CATEGORIES were both provided.");
|
||||
}
|
||||
|
||||
OMNITRACE_CI_THROW(_enabled.size() + _disabled.size() != _avail.size(),
|
||||
"Error! Internal error for categories: %zu (enabled) + %zu "
|
||||
"(disabled) != %zu (total)\n",
|
||||
_enabled.size(), _disabled.size(), _avail.size());
|
||||
|
||||
return std::make_pair(_enabled, _disabled);
|
||||
}();
|
||||
|
||||
return _v;
|
||||
}
|
||||
} // namespace
|
||||
std::set<std::string>
|
||||
get_enabled_categories()
|
||||
{
|
||||
return get_category_config().first;
|
||||
}
|
||||
|
||||
std::set<std::string>
|
||||
get_disabled_categories()
|
||||
{
|
||||
return get_category_config().second;
|
||||
}
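
The enable/disable resolution in `get_category_config` can be read as a small pure function. The sketch below is an illustration under stated assumptions, not the PR's implementation: `resolve_categories` is a hypothetical name, the inputs are assumed to be already filtered against the available categories, and the conflicting case throws `std::invalid_argument` instead of calling `OMNITRACE_ABORT`.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

using strset_t = std::set<std::string>;

// Returns {enabled, disabled} such that the two sets partition `avail`.
// Assumption: `enabled` and `disabled` only contain names present in `avail`.
inline std::pair<strset_t, strset_t>
resolve_categories(const strset_t& avail, strset_t enabled, strset_t disabled)
{
    // Providing both lists is treated as a configuration error.
    if(!enabled.empty() && !disabled.empty())
        throw std::invalid_argument("enable and disable lists are mutually exclusive");

    if(disabled.empty())
    {
        // Neither list given: everything is enabled.
        if(enabled.empty()) enabled = avail;
        // Whatever was not enabled is disabled.
        for(const auto& itr : avail)
            if(enabled.count(itr) == 0) disabled.emplace(itr);
    }
    else
    {
        // Only a disable list given: enable everything else.
        for(const auto& itr : avail)
            if(disabled.count(itr) == 0) enabled.emplace(itr);
    }

    return { std::move(enabled), std::move(disabled) };
}
```

Because the two returned sets always partition the available set, the size check performed by `OMNITRACE_CI_THROW` above holds by construction in this sketch.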
|
||||
|
||||
bool
|
||||
@@ -2043,13 +2201,6 @@ get_perfetto_output_filename()
|
||||
return _val;
|
||||
}
|
||||
|
||||
size_t&
|
||||
get_instrumentation_interval()
|
||||
{
|
||||
static auto _v = get_config()->find("OMNITRACE_INSTRUMENTATION_INTERVAL");
|
||||
return static_cast<tim::tsettings<size_t>&>(*_v->second).get();
|
||||
}
|
||||
|
||||
double
|
||||
get_sampling_freq()
|
||||
{
|
||||
|
||||
@@ -22,7 +22,6 @@
|
||||
|
||||
#pragma once
|
||||
|
||||
#include "api.hpp"
|
||||
#include "library/common.hpp"
|
||||
#include "library/defines.hpp"
|
||||
#include "library/state.hpp"
|
||||
@@ -43,6 +42,12 @@ namespace omnitrace
|
||||
//
|
||||
inline namespace config
|
||||
{
|
||||
using signal_handler_t = void (*)(void);
|
||||
|
||||
// if arg is nullptr, returns current signal handler
|
||||
// if arg is non-null, returns replaced signal handler
|
||||
signal_handler_t set_signal_handler(signal_handler_t);
|
||||
|
||||
bool
|
||||
settings_are_configured() OMNITRACE_HOT;
|
||||
|
||||
@@ -55,6 +60,15 @@ configure_mode_settings();
|
||||
void
|
||||
configure_signal_handler();
|
||||
|
||||
int
|
||||
get_realtime_signal();
|
||||
|
||||
int
|
||||
get_cputime_signal();
|
||||
|
||||
std::set<int>
|
||||
get_sampling_signals(int64_t _tid = 0);
|
||||
|
||||
void
|
||||
configure_disabled_settings();
|
||||
|
||||
@@ -257,7 +271,10 @@ std::string
|
||||
get_perfetto_fill_policy();
|
||||
|
||||
std::set<std::string>
|
||||
get_perfetto_categories();
|
||||
get_enabled_categories();
|
||||
|
||||
std::set<std::string>
|
||||
get_disabled_categories();
|
||||
|
||||
bool
|
||||
get_perfetto_annotations() OMNITRACE_HOT;
|
||||
@@ -284,8 +301,11 @@ get_perfetto_roctracer_per_stream() OMNITRACE_HOT;
|
||||
int64_t
|
||||
get_critical_trace_count();
|
||||
|
||||
size_t&
|
||||
get_instrumentation_interval();
|
||||
double
|
||||
get_trace_delay();
|
||||
|
||||
double
|
||||
get_trace_duration();
|
||||
|
||||
double
|
||||
get_sampling_freq();
|
||||
|
||||
@@ -0,0 +1,349 @@
|
||||
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
|
||||
|
||||
#include "library/constraint.hpp"
|
||||
#include "library/config.hpp"
|
||||
#include "library/debug.hpp"
|
||||
#include "library/state.hpp"
|
||||
#include "library/utility.hpp"
|
||||
|
||||
#include <timemory/units.hpp>
|
||||
#include <timemory/utility/delimit.hpp>
|
||||
|
||||
#include <chrono>
|
||||
#include <cstdint>
|
||||
#include <ratio>
|
||||
#include <string>
|
||||
#include <thread>
|
||||
#include <type_traits>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace constraint
|
||||
{
|
||||
namespace
|
||||
{
|
||||
namespace units = ::tim::units;
|
||||
|
||||
using clock_type = std::chrono::high_resolution_clock;
|
||||
using duration_type = std::chrono::duration<double, std::nano>;
|
||||
|
||||
#define OMNITRACE_CLOCK_IDENTIFIER(VAL) \
|
||||
clock_identifier { #VAL, VAL }
|
||||
|
||||
auto
|
||||
clock_name(std::string _v)
|
||||
{
|
||||
constexpr auto _clock_prefix = std::string_view{ "clock_" };
|
||||
for(auto& itr : _v)
|
||||
itr = tolower(itr);
|
||||
auto _pos = _v.find(_clock_prefix);
|
||||
if(_pos == 0) _v = _v.substr(_pos + _clock_prefix.length());
|
||||
if(_v == "process_cputime_id") _v = "cputime";
|
||||
return _v;
|
||||
}
|
||||
|
||||
auto accepted_clock_ids =
|
||||
std::set<clock_identifier>{ OMNITRACE_CLOCK_IDENTIFIER(CLOCK_REALTIME),
|
||||
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_MONOTONIC),
|
||||
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_PROCESS_CPUTIME_ID),
|
||||
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_MONOTONIC_RAW),
|
||||
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_REALTIME_COARSE),
|
||||
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_MONOTONIC_COARSE),
|
||||
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_BOOTTIME) };
|
||||
|
||||
template <typename Tp>
|
||||
clock_identifier
|
||||
find_clock_identifier(const Tp& _v)
|
||||
{
|
||||
const char* _descript = "";
|
||||
if constexpr(std::is_integral<Tp>::value)
|
||||
{
|
||||
_descript = "value";
|
||||
for(const auto& itr : accepted_clock_ids)
|
||||
{
|
||||
if(itr.value == _v)
|
||||
{
|
||||
return itr;
|
||||
}
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
_descript = "name";
|
||||
auto _clock_name = clock_name(_v);
|
||||
for(const auto& itr : accepted_clock_ids)
|
||||
{
|
||||
if(itr.name == _clock_name || itr.raw_name == _v ||
|
||||
std::to_string(itr.value) == _v)
|
||||
{
|
||||
return itr;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
OMNITRACE_THROW("Unknown clock id %s: %s. Valid choices: %s\n", _descript,
|
||||
timemory::join::join("", _v).c_str(),
|
||||
timemory::join::join("", accepted_clock_ids).c_str());
|
||||
}
|
||||
|
||||
void
|
||||
sleep(uint64_t _n)
|
||||
{
|
||||
std::this_thread::sleep_for(std::chrono::nanoseconds{ _n });
|
||||
}
|
||||
|
||||
timespec
|
||||
get_timespec(clockid_t clock_id) noexcept
|
||||
{
|
||||
struct timespec _ts;
|
||||
clock_gettime(clock_id, &_ts);
|
||||
return _ts;
|
||||
}
|
||||
|
||||
template <typename Tp = uint64_t, typename Precision = std::nano>
|
||||
Tp
|
||||
get_clock_now(clockid_t clock_id) noexcept
|
||||
{
|
||||
constexpr Tp factor = (Precision::den == std::nano::den)
|
||||
? 1
|
||||
: (Precision::den / static_cast<Tp>(std::nano::den));
|
||||
auto _ts = get_timespec(clock_id);
|
||||
return (_ts.tv_sec * std::nano::den + _ts.tv_nsec) * factor;
|
||||
}
|
||||
} // namespace
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//
|
||||
// stages implementation
|
||||
//
|
||||
//--------------------------------------------------------------------------------------//
|
||||
|
||||
stages::stages()
|
||||
: init{ [](const spec&) { return get_state() < State::Finalized; } }
|
||||
, wait{ [](const spec& _spec) {
|
||||
sleep(std::min<uint64_t>(100 * units::msec, _spec.delay * units::sec));
|
||||
return get_state() < State::Finalized;
|
||||
} }
|
||||
, start{ [](const spec&) { return get_state() < State::Finalized; } }
|
||||
, collect{ [](const spec& _spec) {
|
||||
sleep(std::min<uint64_t>(100 * units::msec, _spec.duration * units::sec));
|
||||
return get_state() < State::Finalized;
|
||||
} }
|
||||
, stop{ [](const spec&) { return get_state() < State::Finalized; } }
|
||||
{}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//
|
||||
// clock identifier implementation
|
||||
//
|
||||
//--------------------------------------------------------------------------------------//
|
||||
|
||||
clock_identifier::clock_identifier(std::string_view _name, int _val)
|
||||
: value{ _val }
|
||||
, raw_name{ _name }
|
||||
, name{ clock_name(std::string{ _name }) }
|
||||
{}
|
||||
|
||||
bool
|
||||
clock_identifier::operator<(const clock_identifier& _rhs) const
|
||||
{
|
||||
return value < _rhs.value;
|
||||
}
|
||||
|
||||
bool
|
||||
clock_identifier::operator==(const clock_identifier& _rhs) const
|
||||
{
|
||||
return std::tie(raw_name, value) == std::tie(_rhs.raw_name, _rhs.value);
|
||||
}
|
||||
|
||||
bool
|
||||
clock_identifier::operator==(int _rhs) const
|
||||
{
|
||||
return (value == _rhs);
|
||||
}
|
||||
|
||||
bool
|
||||
clock_identifier::operator==(std::string _rhs) const
|
||||
{
|
||||
return (raw_name == std::string_view{ _rhs }) ||
|
||||
(name == clock_name(std::move(_rhs)));
|
||||
}
|
||||
|
||||
std::string
|
||||
clock_identifier::as_string() const
|
||||
{
|
||||
auto _name = name;
|
||||
for(auto& itr : _name)
|
||||
itr = tolower(itr);
|
||||
auto _ss = std::stringstream{};
|
||||
_ss << _name << "(id=" << raw_name << ", value=" << value << ")";
|
||||
return _ss.str();
|
||||
}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//
|
||||
// spec implementation
|
||||
//
|
||||
//--------------------------------------------------------------------------------------//
|
||||
|
||||
spec::spec(clock_identifier _id, double _delay, double _dur, uint64_t _n, uint64_t _rep)
|
||||
: delay{ _delay }
|
||||
, duration{ _dur }
|
||||
, count{ _n }
|
||||
, repeat{ _rep }
|
||||
, clock_id{ std::move(_id) }
|
||||
{}
|
||||
|
||||
spec::spec(int _clock_id, double _delay, double _dur, uint64_t _n, uint64_t _rep)
|
||||
: delay{ _delay }
|
||||
, duration{ _dur }
|
||||
, count{ _n }
|
||||
, repeat{ _rep }
|
||||
, clock_id{ find_clock_identifier(_clock_id) }
|
||||
{}
|
||||
|
||||
spec::spec(const std::string& _clock_id, double _delay, double _dur, uint64_t _n,
|
||||
uint64_t _rep)
|
||||
: delay{ _delay }
|
||||
, duration{ _dur }
|
||||
, count{ _n }
|
||||
, repeat{ _rep }
|
||||
, clock_id{ find_clock_identifier(_clock_id) }
|
||||
{}
|
||||
|
||||
spec::spec(const std::string& _line)
|
||||
: spec{ config::get_setting_value<std::string>("OMNITRACE_TRACE_PERIOD_CLOCK_ID").second,
|
||||
config::get_setting_value<double>("OMNITRACE_TRACE_DELAY").second,
|
||||
config::get_setting_value<double>("OMNITRACE_TRACE_DURATION").second }
|
||||
{
|
||||
auto _delim = tim::delimit(_line, ":");
|
||||
if(!_delim.empty()) delay = utility::convert<double>(_delim.at(0));
|
||||
if(_delim.size() > 1) duration = utility::convert<double>(_delim.at(1));
|
||||
if(_delim.size() > 2) repeat = utility::convert<uint64_t>(_delim.at(2));
|
||||
if(_delim.size() > 3) clock_id = find_clock_identifier(_delim.at(3));
|
||||
}
|
||||
|
||||
void
|
||||
spec::operator()(const stages& _stages) const
|
||||
{
|
||||
auto _n = repeat;
|
||||
if(_n < 1) _n = std::numeric_limits<uint64_t>::max();
|
||||
|
||||
while(get_state() < State::Active)
|
||||
sleep(1 * units::usec);
|
||||
|
||||
for(uint64_t i = 0; i < _n; ++i)
|
||||
{
|
||||
auto _spec = spec{ clock_id, delay, duration, i, repeat };
|
||||
auto _wait = [_spec](const auto& _func, auto _dur) {
|
||||
auto _ret = true;
|
||||
auto _now = get_clock_now(_spec.clock_id.value);
|
||||
auto _del = (_dur * units::sec);
|
||||
auto _end = _now + _del;
|
||||
while(get_clock_now(_spec.clock_id.value) < _end && (_ret = _func(_spec)))
|
||||
{}
|
||||
return _ret;
|
||||
};
|
||||
|
||||
OMNITRACE_VERBOSE(2,
|
||||
"Executing constraint spec %lu of %lu :: delay: %6.3f, "
|
||||
"duration: %6.3f, clock: %s\n",
|
||||
i, _spec.repeat, _spec.delay, _spec.duration,
|
||||
_spec.clock_id.as_string().c_str());
|
||||
|
||||
if(_stages.init(_spec) && _wait(_stages.wait, _spec.delay) &&
|
||||
_stages.start(_spec) && _wait(_stages.collect, _spec.duration) &&
|
||||
_stages.stop(_spec))
|
||||
{}
|
||||
else
|
||||
{
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
//--------------------------------------------------------------------------------------//
|
||||
//
|
||||
// global usage functions
|
||||
//
|
||||
//--------------------------------------------------------------------------------------//
|
||||
|
||||
const std::set<clock_identifier>&
|
||||
get_valid_clock_ids()
|
||||
{
|
||||
return accepted_clock_ids;
|
||||
}
|
||||
|
||||
std::vector<spec>
|
||||
get_trace_specs()
|
||||
{
|
||||
auto _v = std::vector<constraint::spec>{};
|
||||
|
||||
{
|
||||
auto _delay_v = config::get_setting_value<double>("OMNITRACE_TRACE_DELAY").second;
|
||||
auto _duration_v =
|
||||
config::get_setting_value<double>("OMNITRACE_TRACE_DURATION").second;
|
||||
auto _clock_v = find_clock_identifier(
|
||||
config::get_setting_value<std::string>("OMNITRACE_TRACE_PERIOD_CLOCK_ID")
|
||||
.second);
|
||||
|
||||
if(_delay_v > 0.0 || _duration_v > 0.0)
|
||||
{
|
||||
_v.emplace_back(_clock_v, _delay_v, _duration_v);
|
||||
}
|
||||
}
|
||||
|
||||
{
|
||||
auto _periods_v =
|
||||
config::get_setting_value<std::string>("OMNITRACE_TRACE_PERIODS").second;
|
||||
if(!_periods_v.empty())
|
||||
{
|
||||
for(auto itr : tim::delimit(_periods_v, " ;\t\n"))
|
||||
_v.emplace_back(itr);
|
||||
}
|
||||
}
|
||||
|
||||
return _v;
|
||||
}
|
||||
|
||||
stages
|
||||
get_trace_stages()
|
||||
{
|
||||
auto _v = stages{};
|
||||
|
||||
_v.init = [](const spec&) { return get_state() < State::Finalized; };
|
||||
_v.wait = [](const spec& _spec) {
|
||||
sleep(std::min<uint64_t>(100 * units::msec, _spec.delay * units::sec));
|
||||
return get_state() < State::Finalized;
|
||||
};
|
||||
_v.start = [](const spec&) { return get_state() < State::Finalized; };
|
||||
_v.collect = [](const spec& _spec) {
|
||||
sleep(std::min<uint64_t>(100 * units::msec, _spec.duration * units::sec));
|
||||
return get_state() < State::Finalized;
|
||||
};
|
||||
_v.stop = [](const spec&) { return get_state() < State::Finalized; };
|
||||
|
||||
return _v;
|
||||
}
|
||||
} // namespace constraint
|
||||
} // namespace omnitrace
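
The `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_ID>` format handled by `spec::spec(const std::string&)` and `get_trace_specs()` can be exercised without the rest of the library. What follows is a standalone sketch using only the standard library. The `period` struct, `split`, `parse_period`, and `parse_periods` are hypothetical names. It splits entries on spaces only (the code above also accepts `;`, tab, and newline) and keeps the clock as a plain string rather than resolving it to a `clock_identifier`.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical plain-data mirror of constraint::spec (no clock lookup).
struct period
{
    double        delay    = 0.0;
    double        duration = 0.0;
    std::uint64_t repeat   = 1;
    std::string   clock    = "realtime";
};

// Splits on a single delimiter character and drops empty tokens.
inline std::vector<std::string>
split(const std::string& text, char delim)
{
    std::vector<std::string> out{};
    std::istringstream       ss{ text };
    for(std::string tok; std::getline(ss, tok, delim);)
        if(!tok.empty()) out.emplace_back(tok);
    return out;
}

// Fields that are absent keep the value from `defaults`.
inline period
parse_period(const std::string& entry, period defaults = period{})
{
    auto fields = split(entry, ':');
    if(!fields.empty()) defaults.delay = std::stod(fields.at(0));
    if(fields.size() > 1) defaults.duration = std::stod(fields.at(1));
    if(fields.size() > 2) defaults.repeat = std::stoull(fields.at(2));
    if(fields.size() > 3) defaults.clock = fields.at(3);
    return defaults;
}

// Parses a space-separated list of entries.
inline std::vector<period>
parse_periods(const std::string& value)
{
    std::vector<period> out{};
    for(const auto& entry : split(value, ' '))
        out.emplace_back(parse_period(entry));
    return out;
}
```

Applied to the example from the PR description, `parse_periods("0.5:1.0:2:realtime 10.0:5.0:3:cputime")` yields two entries, and a shorter entry such as `2.0:4.0` falls back to the default repeat count and clock.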
|
||||
@@ -0,0 +1,114 @@
|
||||
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
|
||||
|
||||
#pragma once
|
||||
|
||||
/// @file
/// This provides generic functionality for constraining data collection within
/// a window of time. E.g., delay, delay + duration, (delay + duration) * nrepeat
///
/// @todo Migrate delay/duration for sampling, process sampling, and causal profiling
/// to use this
///
|
||||
|
||||
#include "library/defines.hpp"
|
||||
|
||||
#include <cstdint>
|
||||
#include <ctime>
|
||||
#include <functional>
|
||||
#include <set>
|
||||
#include <string>
|
||||
#include <vector>
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
namespace constraint
|
||||
{
|
||||
struct spec;
|
||||
|
||||
struct stages
|
||||
{
|
||||
using functor_t = std::function<bool(const spec&)>;
|
||||
|
||||
stages();
|
||||
|
||||
OMNITRACE_DEFAULT_COPY_MOVE(stages)
|
||||
|
||||
functor_t init = [](const spec&) { return true; };
|
||||
functor_t wait = [](const spec&) { return true; };
|
||||
functor_t start = [](const spec&) { return true; };
|
||||
functor_t collect = [](const spec&) { return true; };
|
||||
functor_t stop = [](const spec&) { return true; };
|
||||
};
|
||||
|
||||
struct clock_identifier
|
||||
{
|
||||
int value = -1;
|
||||
std::string_view raw_name = {};
|
||||
std::string name = {};
|
||||
|
||||
clock_identifier();
|
||||
clock_identifier(std::string_view, int);
|
||||
|
||||
OMNITRACE_DEFAULT_COPY_MOVE(clock_identifier)
|
||||
|
||||
std::string as_string() const;
|
||||
|
||||
bool operator<(const clock_identifier& _rhs) const;
|
||||
bool operator==(const clock_identifier& _rhs) const;
|
||||
bool operator==(int _rhs) const;
|
||||
bool operator==(std::string _rhs) const;
|
||||
|
||||
friend std::ostream& operator<<(std::ostream& _os, const clock_identifier& _v)
|
||||
{
|
||||
return (_os << _v.as_string());
|
||||
}
|
||||
};
|
||||
|
||||
struct spec
|
||||
{
|
||||
spec(int, double, double, uint64_t = 0, uint64_t = 1);
|
||||
spec(clock_identifier, double, double, uint64_t = 0, uint64_t = 1);
|
||||
spec(const std::string&, double, double, uint64_t = 0, uint64_t = 1);
|
||||
spec(const std::string&);
|
||||
|
||||
OMNITRACE_DEFAULT_COPY_MOVE(spec)
|
||||
|
||||
void operator()(const stages&) const;
|
||||
|
||||
double delay = 0.0;
|
||||
double duration = 0.0;
|
||||
uint64_t count = 0;
|
||||
uint64_t repeat = 1;
|
||||
clock_identifier clock_id = {};
|
||||
};
|
||||
|
||||
const std::set<clock_identifier>&
|
||||
get_valid_clock_ids();
|
||||
|
||||
std::vector<spec>
|
||||
get_trace_specs();
|
||||
|
||||
stages
|
||||
get_trace_stages();
|
||||
} // namespace constraint
|
||||
} // namespace omnitrace
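
The delay + duration cycle that `spec::operator()` drives through the `stages` callbacks can be sketched with POSIX `clock_gettime` alone. This is a simplified illustration, assuming a POSIX system. `clock_now_ns`, `wait_on_clock`, and `run_windows` are hypothetical names. Unlike the code in this PR, the sketch always invokes `stop` once `start` has run, and it polls at a fixed 1 ms interval.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>
#include <time.h>

// Current reading of the given POSIX clock, in nanoseconds.
inline std::uint64_t
clock_now_ns(clockid_t id)
{
    struct timespec ts{};
    clock_gettime(id, &ts);
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ULL +
           static_cast<std::uint64_t>(ts.tv_nsec);
}

// Polls `keep_going` until `seconds` have elapsed on clock `id`.
// Returns false if the callback requested an early exit.
inline bool
wait_on_clock(clockid_t id, double seconds, const std::function<bool()>& keep_going)
{
    const auto end = clock_now_ns(id) + static_cast<std::uint64_t>(seconds * 1.0e9);
    while(clock_now_ns(id) < end)
    {
        if(!keep_going()) return false;
        std::this_thread::sleep_for(std::chrono::milliseconds{ 1 });
    }
    return true;
}

// Runs (delay, then collect for `duration`) `repeat` times.
// Returns the number of fully completed collection windows.
inline int
run_windows(clockid_t id, double delay, double duration, int repeat,
            const std::function<void()>& start, const std::function<void()>& stop)
{
    auto always = []() { return true; };
    int  done   = 0;
    for(int i = 0; i < repeat; ++i)
    {
        if(!wait_on_clock(id, delay, always)) break;
        start();  // e.g. begin emitting trace events
        bool ok = wait_on_clock(id, duration, always);
        stop();   // e.g. stop emitting trace events
        if(!ok) break;
        ++done;
    }
    return done;
}
```

Passing `CLOCK_PROCESS_CPUTIME_ID` instead of `CLOCK_MONOTONIC` makes the same loop advance with the aggregate CPU time of the process, which is why the PR description recommends scaling CPU-time values by the thread count.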
|
||||
@@ -22,7 +22,6 @@
|
||||
|
||||
#include "library/debug.hpp"
|
||||
#include "library/binary/address_range.hpp"
|
||||
#include "library/runtime.hpp"
|
||||
#include "library/state.hpp"
|
||||
|
||||
#include <timemory/log/color.hpp>
|
||||
@@ -91,7 +90,7 @@ get_file()
|
||||
{
|
||||
static FILE* _v = []() {
|
||||
auto&& _fname = tim::get_env<std::string>("OMNITRACE_LOG_FILE", "");
|
||||
if(!_fname.empty()) tim::log::colorized() = false;
|
||||
if(!_fname.empty()) tim::log::monochrome() = true;
|
||||
return (_fname.empty()) ? stderr : tim::filepath::fopen(_fname, "w");
|
||||
}();
|
||||
return _v;
|
||||
|
||||
@@ -42,3 +42,20 @@
|
||||
#define OMNITRACE_SAMPLING_GPU_MEMORY_USAGE OMNITRACE_SAMPLING_GPU_MEMORY_USAGE_idx
|
||||
|
||||
#define OMNITRACE_METADATA(...) ::tim::manager::add_metadata(__VA_ARGS__)
|
||||
|
||||
#if !defined(OMNITRACE_DEFAULT_OBJECT)
|
||||
# define OMNITRACE_DEFAULT_OBJECT(NAME) \
|
||||
NAME() = default; \
|
||||
NAME(const NAME&) = default; \
|
||||
NAME(NAME&&) noexcept = default; \
|
||||
NAME& operator=(const NAME&) = default; \
|
||||
NAME& operator=(NAME&&) noexcept = default;
|
||||
#endif
|
||||
|
||||
#if !defined(OMNITRACE_DEFAULT_COPY_MOVE)
|
||||
# define OMNITRACE_DEFAULT_COPY_MOVE(NAME) \
|
||||
NAME(const NAME&) = default; \
|
||||
NAME(NAME&&) noexcept = default; \
|
||||
NAME& operator=(const NAME&) = default; \
|
||||
NAME& operator=(NAME&&) noexcept = default;
|
||||
#endif
|
||||
|
||||
@@ -20,30 +20,34 @@
|
||||
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
||||
// SOFTWARE.
|
||||
|
||||
#if !defined(OMNITRACE_USE_ROCM_SMI)
|
||||
# define OMNITRACE_USE_ROCM_SMI 0
|
||||
#endif
|
||||
|
||||
#if !defined(OMNITRACE_USE_HIP)
|
||||
# define OMNITRACE_USE_HIP 0
|
||||
#endif
|
||||
|
||||
#if OMNITRACE_USE_HIP > 0
|
||||
# if !defined(TIMEMORY_USE_HIP)
|
||||
# define TIMEMORY_USE_HIP 1
|
||||
# endif
|
||||
#endif
|
||||
|
||||
#include "library/gpu.hpp"
|
||||
#include "library/debug.hpp"
|
||||
#include "library/defines.hpp"
|
||||
|
||||
#include <timemory/manager.hpp>
|
||||
|
||||
#if defined(OMNITRACE_USE_ROCM_SMI) && OMNITRACE_USE_ROCM_SMI > 0
|
||||
# include "library/rocm_smi.hpp"
|
||||
#elif !defined(OMNITRACE_USE_ROCM_SMI)
|
||||
# define OMNITRACE_USE_ROCM_SMI 0
|
||||
#endif
|
||||
|
||||
#if defined(OMNITRACE_USE_HIP) && OMNITRACE_USE_HIP > 0
|
||||
# if !defined(TIMEMORY_USE_HIP)
|
||||
# define TIMEMORY_USE_HIP 1
|
||||
# endif
|
||||
# include <timemory/components/hip/backends.hpp>
|
||||
#elif !defined(OMNITRACE_USE_HIP)
|
||||
# define OMNITRACE_USE_HIP 0
|
||||
#if OMNITRACE_USE_ROCM_SMI > 0
|
||||
# include <rocm_smi/rocm_smi.h>
|
||||
#endif
|
||||
|
||||
#if OMNITRACE_USE_HIP > 0
|
||||
# include <hip/hip_runtime.h>
|
||||
# include <hip/hip_runtime_api.h>
|
||||
# include <timemory/components/hip/backends.hpp>
|
||||
|
||||
# if !defined(OMNITRACE_HIP_RUNTIME_CALL)
|
||||
# define OMNITRACE_HIP_RUNTIME_CALL(err) \
|
||||
@@ -62,6 +66,49 @@ namespace omnitrace
|
||||
{
|
||||
namespace gpu
|
||||
{
|
||||
namespace
|
||||
{
|
||||
namespace scope = ::tim::scope;
|
||||
|
||||
#if OMNITRACE_USE_ROCM_SMI > 0
|
||||
# define OMNITRACE_ROCM_SMI_CALL(ERROR_CODE) \
|
||||
::omnitrace::gpu::check_rsmi_error(ERROR_CODE, __FILE__, __LINE__)
|
||||
|
||||
void
|
||||
check_rsmi_error(rsmi_status_t _code, const char* _file, int _line)
|
||||
{
|
||||
if(_code == RSMI_STATUS_SUCCESS) return;
|
||||
const char* _msg = nullptr;
|
||||
auto _err = rsmi_status_string(_code, &_msg);
|
||||
if(_err != RSMI_STATUS_SUCCESS)
|
||||
OMNITRACE_THROW("rsmi_status_string failed. No error message available. "
|
||||
"Error code %i originated at %s:%i\n",
|
||||
static_cast<int>(_code), _file, _line);
|
||||
OMNITRACE_THROW("[%s:%i] Error code %i :: %s", _file, _line, static_cast<int>(_code),
|
||||
_msg);
|
||||
}
|
||||
|
||||
bool
|
||||
rsmi_init()
|
||||
{
|
||||
auto _rsmi_init = []() {
|
||||
try
|
||||
{
|
||||
OMNITRACE_ROCM_SMI_CALL(::rsmi_init(0));
|
||||
} catch(std::exception& _e)
|
||||
{
|
||||
OMNITRACE_BASIC_VERBOSE(1, "Exception thrown initializing rocm-smi: %s\n",
|
||||
_e.what());
|
||||
return false;
|
||||
}
|
||||
return true;
|
||||
}();
|
||||
|
||||
return _rsmi_init;
|
||||
}
|
||||
#endif
|
||||
} // namespace
|
||||
|
||||
int
|
||||
hip_device_count()
|
||||
{
|
||||
@@ -72,13 +119,37 @@ hip_device_count()
|
||||
#endif
|
||||
}
|
||||
|
||||
int
rsmi_device_count()
{
#if OMNITRACE_USE_ROCM_SMI > 0
    if(!rsmi_init()) return 0;

    static auto _num_devices = []() {
        uint32_t _v = 0;
        try
        {
            OMNITRACE_ROCM_SMI_CALL(rsmi_num_monitor_devices(&_v));
        } catch(std::exception& _e)
        {
            OMNITRACE_BASIC_VERBOSE(
                1, "Exception thrown getting the rocm-smi devices: %s\n", _e.what());
        }
        return _v;
    }();

    return _num_devices;
#else
    return 0;
#endif
}
|
||||
|
||||
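The `rsmi_device_count()` function above caches its result in a function-local `static` that is initialized by an immediately invoked lambda. The following is a minimal sketch of that pattern in isolation, not the omnitrace implementation: `query_device_count` and `query_calls` are hypothetical stand-ins for the driver query, and the value `2` is arbitrary. Since C++11, initialization of a function-local static is thread-safe and happens once, so the query runs only on the first call.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>

// Hypothetical call counter, used only to show that the query runs once.
inline int&
query_calls()
{
    static int _n = 0;
    return _n;
}

// Hypothetical stand-in for a driver query such as rsmi_num_monitor_devices.
inline uint32_t
query_device_count()
{
    ++query_calls();
    return 2;
}

// The lambda runs exactly once, on the first call; later calls return the
// cached value even if the underlying driver has since been shut down.
inline uint32_t
cached_device_count()
{
    static auto _num_devices = []() {
        uint32_t _v = 0;
        try
        {
            _v = query_device_count();
        } catch(std::exception&)
        {
            // leave _v at zero so callers see "no devices" instead of an exception
        }
        return _v;
    }();
    return _num_devices;
}
```

Calling `cached_device_count()` repeatedly returns the same value while `query_calls()` stays at one.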
int
|
||||
device_count()
|
||||
{
|
||||
#if OMNITRACE_USE_ROCM_SMI > 0
|
||||
// store as static since calls after rsmi_shutdown will return zero
|
||||
static auto _v = rocm_smi::device_count();
|
||||
return _v;
|
||||
return rsmi_device_count();
|
||||
#elif OMNITRACE_USE_HIP > 0
|
||||
return ::tim::hip::device_count();
|
||||
#else
|
||||
|
||||
@@ -32,6 +32,9 @@ device_count();
|
||||
int
|
||||
hip_device_count();
|
||||
|
||||
int
|
||||
rsmi_device_count();
|
||||
|
||||
void
|
||||
add_hip_device_metadata();
|
||||
} // namespace gpu
|
||||
|
||||
@@ -28,37 +28,40 @@ namespace omnitrace
|
||||
{
|
||||
namespace perfetto
|
||||
{
|
||||
auto&
|
||||
get_config()
|
||||
{
|
||||
static auto _v = ::perfetto::TraceConfig{};
|
||||
return _v;
|
||||
}
|
||||
|
||||
auto&
|
||||
get_session()
|
||||
{
|
||||
static auto _v = std::unique_ptr<::perfetto::TracingSession>{};
|
||||
return _v;
|
||||
}
|
||||
|
||||
void
|
||||
setup()
|
||||
{
|
||||
auto args = ::perfetto::TracingInitArgs{};
|
||||
auto track_event_cfg = ::perfetto::protos::gen::TrackEventConfig{};
|
||||
auto& cfg = tracing::get_perfetto_config();
|
||||
auto& cfg = get_config();
|
||||
|
||||
// environment settings
|
||||
auto shmem_size_hint = get_perfetto_shmem_size_hint();
|
||||
auto buffer_size = get_perfetto_buffer_size();
|
||||
auto shmem_size_hint = config::get_perfetto_shmem_size_hint();
|
||||
auto buffer_size = config::get_perfetto_buffer_size();
|
||||
|
||||
auto _policy =
|
||||
get_perfetto_fill_policy() == "discard"
|
||||
config::get_perfetto_fill_policy() == "discard"
|
||||
? ::perfetto::protos::gen::TraceConfig_BufferConfig_FillPolicy_DISCARD
|
||||
: ::perfetto::protos::gen::TraceConfig_BufferConfig_FillPolicy_RING_BUFFER;
|
||||
auto* buffer_config = cfg.add_buffers();
|
||||
buffer_config->set_size_kb(buffer_size);
|
||||
buffer_config->set_fill_policy(_policy);
|
||||
|
||||
std::set<std::string> _available_categories = {};
|
||||
std::set<std::string> _disabled_categories = {};
|
||||
for(auto itr : { OMNITRACE_PERFETTO_CATEGORIES })
|
||||
_available_categories.emplace(itr.name);
|
||||
auto _enabled_categories = config::get_perfetto_categories();
|
||||
for(const auto& itr : _available_categories)
|
||||
{
|
||||
if(!_enabled_categories.empty() && _enabled_categories.count(itr) == 0)
|
||||
_disabled_categories.emplace(itr);
|
||||
}
|
||||
|
||||
for(const auto& itr : _disabled_categories)
|
||||
for(const auto& itr : config::get_disabled_categories())
|
||||
{
|
||||
OMNITRACE_VERBOSE_F(1, "Disabling perfetto track event category: %s\n",
|
||||
itr.c_str());
|
||||
@@ -81,31 +84,19 @@ setup()
|
||||
void
|
||||
start()
|
||||
{
|
||||
#if defined(CUSTOM_DATA_SOURCE)
|
||||
// Add the following:
|
||||
::perfetto::DataSourceDescriptor dsd{};
|
||||
dsd.set_name("com.example.custom_data_source");
|
||||
CustomDataSource::Register(dsd);
|
||||
auto* ds_cfg = cfg.add_data_sources()->mutable_config();
|
||||
ds_cfg->set_name("com.example.custom_data_source");
|
||||
CustomDataSource::Trace([](CustomDataSource::TraceContext ctx) {
|
||||
auto packet = ctx.NewTracePacket();
|
||||
packet->set_timestamp(::perfetto::TrackEvent::GetTraceTimeNs());
|
||||
packet->set_for_testing()->set_str("Hello world!");
|
||||
PRINT_HERE("%s", "Trace");
|
||||
});
|
||||
#endif
|
||||
auto& cfg = tracing::get_perfetto_config();
|
||||
auto& tracing_session = tracing::get_perfetto_session();
|
||||
auto& cfg = get_config();
|
||||
auto& tracing_session = get_session();
|
||||
tracing_session = ::perfetto::Tracing::NewTrace();
|
||||
tracing_session->Setup(cfg);
|
||||
tracing_session->StartBlocking();
|
||||
}
|
||||
} // namespace perfetto
|
||||
|
||||
std::unique_ptr<::perfetto::TracingSession>&
|
||||
get_perfetto_session()
|
||||
{
|
||||
return ::omnitrace::perfetto::get_session();
|
||||
}
|
||||
} // namespace omnitrace
|
||||
|
||||
PERFETTO_TRACK_EVENT_STATIC_STORAGE();
|
||||
|
||||
#if defined(CUSTOM_DATA_SOURCE)
|
||||
PERFETTO_DEFINE_DATA_SOURCE_STATIC_MEMBERS(CustomDataSource);
|
||||
#endif
|
||||
|
||||
@@ -43,123 +43,22 @@ PERFETTO_DEFINE_CATEGORIES(OMNITRACE_PERFETTO_CATEGORIES);
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
#if defined(CUSTOM_DATA_SOURCE)
|
||||
class CustomDataSource : public perfetto::DataSource<CustomDataSource>
|
||||
{
|
||||
public:
|
||||
void OnSetup(const SetupArgs&) override
|
||||
{
|
||||
// Use this callback to apply any custom configuration to your data source
|
||||
// based on the TraceConfig in SetupArgs.
|
||||
OMNITRACE_PRINT_F("[CustomDataSource] setup\n");
|
||||
}
|
||||
|
||||
void OnStart(const StartArgs&) override
|
||||
{
|
||||
// This notification can be used to initialize the GPU driver, enable
|
||||
// counters, etc. StartArgs will contains the DataSourceDescriptor,
|
||||
// which can be extended.
|
||||
OMNITRACE_PRINT_F("[CustomDataSource] start\n");
|
||||
}
|
||||
|
||||
void OnStop(const StopArgs&) override
|
||||
{
|
||||
// Undo any initialization done in OnStart.
|
||||
OMNITRACE_PRINT_F("[CustomDataSource] stop\n");
|
||||
}
|
||||
|
||||
// Data sources can also have per-instance state.
|
||||
int my_custom_state = 0;
|
||||
};
|
||||
|
||||
PERFETTO_DECLARE_DATA_SOURCE_STATIC_MEMBERS(CustomDataSource);
|
||||
#endif
|
||||
std::unique_ptr<::perfetto::TracingSession>&
|
||||
get_perfetto_session();
|
||||
|
||||
template <typename Tp>
|
||||
struct perfetto_counter_track
|
||||
{
|
||||
using track_map_t = std::map<uint32_t, std::vector<perfetto::CounterTrack>>;
|
||||
using track_map_t = std::map<uint32_t, std::vector<::perfetto::CounterTrack>>;
|
||||
using name_map_t = std::map<uint32_t, std::vector<std::unique_ptr<std::string>>>;
|
||||
using data_t = std::pair<name_map_t, track_map_t>;
|
||||
|
||||
static auto init() { (void) get_data(); }
|
||||
|
||||
static auto exists(size_t _idx, int64_t _n = -1)
|
||||
{
|
||||
bool _v = get_data().second.count(_idx) != 0;
|
||||
if(_n < 0 || !_v) return _v;
|
||||
return static_cast<size_t>(_n) < get_data().second.at(_idx).size();
|
||||
}
|
||||
|
||||
static size_t size(size_t _idx)
|
||||
{
|
||||
bool _v = get_data().second.count(_idx) != 0;
|
||||
if(!_v) return 0;
|
||||
return get_data().second.at(_idx).size();
|
||||
}
|
||||
|
||||
static auto init() { (void) get_data(); }
|
||||
static auto exists(size_t _idx, int64_t _n = -1);
|
||||
static size_t size(size_t _idx);
|
||||
static auto emplace(size_t _idx, const std::string& _v, const char* _units = nullptr,
|
||||
const char* _category = nullptr, int64_t _mult = 1,
|
||||
bool _incr = false)
|
||||
{
|
||||
auto& _name_data = get_data().first[_idx];
|
||||
auto& _track_data = get_data().second[_idx];
|
||||
std::vector<std::tuple<std::string, const char*, bool>> _missing = {};
|
||||
if(config::get_is_continuous_integration())
|
||||
{
|
||||
for(const auto& itr : _name_data)
|
||||
{
|
||||
_missing.emplace_back(std::make_tuple(*itr, itr->c_str(), false));
|
||||
}
|
||||
}
|
||||
auto _index = _track_data.size();
|
||||
auto& _name = _name_data.emplace_back(std::make_unique<std::string>(_v));
|
||||
const char* _unit_name = (_units && strlen(_units) > 0) ? _units : nullptr;
|
||||
_track_data.emplace_back(perfetto::CounterTrack{ _name->c_str() }
|
||||
.set_unit_name(_unit_name)
|
||||
.set_category(_category)
|
||||
.set_unit_multiplier(_mult)
|
||||
.set_is_incremental(_incr));
|
||||
if(config::get_is_continuous_integration())
|
||||
{
|
||||
for(auto& itr : _missing)
|
||||
{
|
||||
const char* citr = std::get<1>(itr);
|
||||
for(const auto& ditr : _name_data)
|
||||
{
|
||||
if(citr == ditr->c_str() && strcmp(citr, ditr->c_str()) == 0)
|
||||
{
|
||||
std::get<2>(itr) = true;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if(!std::get<2>(itr))
|
||||
{
|
||||
std::set<void*> _prev = {};
|
||||
std::set<void*> _curr = {};
|
||||
for(const auto& eitr : _missing)
|
||||
_prev.emplace(
|
||||
static_cast<void*>(const_cast<char*>(std::get<1>(eitr))));
|
||||
for(const auto& eitr : _name_data)
|
||||
_curr.emplace(
|
||||
static_cast<void*>(const_cast<char*>(eitr->c_str())));
|
||||
std::stringstream _pss{};
|
||||
for(auto&& eitr : _prev)
|
||||
_pss << " " << std::hex << std::setw(12) << std::left << eitr;
|
||||
std::stringstream _css{};
|
||||
for(auto&& eitr : _curr)
|
||||
_css << " " << std::hex << std::setw(12) << std::left << eitr;
|
||||
OMNITRACE_THROW("perfetto_counter_track emplace method for '%s' (%p) "
|
||||
"invalidated C-string '%s' (%p).\n%8s: %s\n%8s: %s\n",
|
||||
_v.c_str(), (void*) _name->c_str(),
|
||||
std::get<0>(itr).c_str(),
|
||||
(void*) std::get<0>(itr).c_str(), "previous",
|
||||
_pss.str().c_str(), "current", _css.str().c_str());
|
||||
}
|
||||
}
|
||||
}
|
||||
return _index;
|
||||
}
|
||||
bool _incr = false);
|
||||
|
||||
static auto& at(size_t _idx, size_t _n) { return get_data().second.at(_idx).at(_n); }
|
||||
|
||||
@@ -170,4 +69,87 @@ private:
|
||||
return _v;
|
||||
}
|
||||
};
|
||||
|
||||
template <typename Tp>
auto
perfetto_counter_track<Tp>::exists(size_t _idx, int64_t _n)
{
    bool _v = get_data().second.count(_idx) != 0;
    if(_n < 0 || !_v) return _v;
    return static_cast<size_t>(_n) < get_data().second.at(_idx).size();
}

template <typename Tp>
size_t
perfetto_counter_track<Tp>::size(size_t _idx)
{
    bool _v = get_data().second.count(_idx) != 0;
    if(!_v) return 0;
    return get_data().second.at(_idx).size();
}
|
||||
|
||||
template <typename Tp>
auto
perfetto_counter_track<Tp>::emplace(size_t _idx, const std::string& _v,
                                    const char* _units, const char* _category,
                                    int64_t _mult, bool _incr)
{
    auto& _name_data = get_data().first[_idx];
    auto& _track_data = get_data().second[_idx];
    std::vector<std::tuple<std::string, const char*, bool>> _missing = {};
    if(config::get_is_continuous_integration())
    {
        for(const auto& itr : _name_data)
        {
            _missing.emplace_back(std::make_tuple(*itr, itr->c_str(), false));
        }
    }
    auto _index = _track_data.size();
    auto& _name = _name_data.emplace_back(std::make_unique<std::string>(_v));
    const char* _unit_name = (_units && strlen(_units) > 0) ? _units : nullptr;
    _track_data.emplace_back(::perfetto::CounterTrack{ _name->c_str() }
                                 .set_unit_name(_unit_name)
                                 .set_category(_category)
                                 .set_unit_multiplier(_mult)
                                 .set_is_incremental(_incr));
    if(config::get_is_continuous_integration())
    {
        for(auto& itr : _missing)
        {
            const char* citr = std::get<1>(itr);
            for(const auto& ditr : _name_data)
            {
                if(citr == ditr->c_str() && strcmp(citr, ditr->c_str()) == 0)
                {
                    std::get<2>(itr) = true;
                    break;
                }
            }
            if(!std::get<2>(itr))
            {
                std::set<void*> _prev = {};
                std::set<void*> _curr = {};
                for(const auto& eitr : _missing)
                    _prev.emplace(
                        static_cast<void*>(const_cast<char*>(std::get<1>(eitr))));
                for(const auto& eitr : _name_data)
                    _curr.emplace(static_cast<void*>(const_cast<char*>(eitr->c_str())));
                std::stringstream _pss{};
                for(auto&& eitr : _prev)
                    _pss << " " << std::hex << std::setw(12) << std::left << eitr;
                std::stringstream _css{};
                for(auto&& eitr : _curr)
                    _css << " " << std::hex << std::setw(12) << std::left << eitr;
                OMNITRACE_THROW("perfetto_counter_track emplace method for '%s' (%p) "
                                "invalidated C-string '%s' (%p).\n%8s: %s\n%8s: %s\n",
                                _v.c_str(), (void*) _name->c_str(),
                                std::get<0>(itr).c_str(),
                                (void*) std::get<0>(itr).c_str(), "previous",
                                _pss.str().c_str(), "current", _css.str().c_str());
            }
        }
    }
    return _index;
}
|
||||
|
||||
} // namespace omnitrace
|
||||
|
||||
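The `emplace` definition above stores each counter name as a `std::unique_ptr<std::string>` and, in continuous-integration builds, verifies that previously handed-out C-strings were not invalidated. The sketch below illustrates why that storage choice keeps the pointers stable. It is an illustration under assumptions, not omnitrace code: `name_storage_t`, `add_name`, and `names_are_stable` are hypothetical names. When the vector reallocates, only the `unique_ptr` objects move; the heap-allocated strings, and therefore their `c_str()` buffers, stay where they are. A plain `std::vector<std::string>` would not give this guarantee, because short strings may keep their characters inside the string object itself.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-index name storage: every name lives in
// its own heap-allocated std::string.
using name_storage_t = std::vector<std::unique_ptr<std::string>>;

// Append a name and return the C-string that a counter track would hold on to.
inline const char*
add_name(name_storage_t& _names, const std::string& _v)
{
    auto& _name = _names.emplace_back(std::make_unique<std::string>(_v));
    assert(_name != nullptr);
    return _name->c_str();
}

// Returns true if every previously returned C-string still points at the
// buffer owned by the string stored at the same position.
inline bool
names_are_stable(const name_storage_t& _names, const std::vector<const char*>& _cstrs)
{
    if(_names.size() != _cstrs.size()) return false;
    for(std::size_t i = 0; i < _names.size(); ++i)
    {
        if(_names[i]->c_str() != _cstrs[i]) return false;
    }
    return true;
}
```

Adding many names forces several reallocations of the vector, yet `names_are_stable` keeps returning `true` for the pointers collected along the way.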
@@ -442,20 +442,7 @@ post_process()
|
||||
uint32_t
|
||||
device_count()
|
||||
{
|
||||
uint32_t _num_devices = 0;
|
||||
try
|
||||
{
|
||||
static auto _rsmi_init_once = []() { OMNITRACE_ROCM_SMI_CALL(rsmi_init(0)); };
|
||||
static std::once_flag _once{};
|
||||
std::call_once(_once, _rsmi_init_once);
|
||||
|
||||
OMNITRACE_ROCM_SMI_CALL(rsmi_num_monitor_devices(&_num_devices));
|
||||
} catch(std::exception& _e)
|
||||
{
|
||||
OMNITRACE_BASIC_VERBOSE(1, "Exception thrown getting the rocm-smi devices: %s\n",
|
||||
_e.what());
|
||||
}
|
||||
return _num_devices;
|
||||
return gpu::rsmi_device_count();
|
||||
}
|
||||
} // namespace rocm_smi
|
||||
} // namespace omnitrace
|
||||
|
||||
@@ -82,7 +82,7 @@ struct data
|
||||
using mem_usage_t = uint64_t;
|
||||
using temp_t = int64_t;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(data)
|
||||
OMNITRACE_DEFAULT_OBJECT(data)
|
||||
|
||||
explicit data(uint32_t _dev_id);
|
||||
|
||||
|
||||
@@ -660,7 +660,7 @@ post_process_timemory()
|
||||
rocm_event* parent = nullptr;
|
||||
mutable std::vector<local_event> children = {};
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(local_event)
|
||||
OMNITRACE_DEFAULT_OBJECT(local_event)
|
||||
|
||||
explicit local_event(rocm_event* _v)
|
||||
: parent{ _v }
|
||||
|
||||
@@ -21,6 +21,7 @@
|
||||
// SOFTWARE.
|
||||
|
||||
#include "library/roctracer.hpp"
|
||||
#include "library/components/category_region.hpp"
|
||||
#include "library/components/fwd.hpp"
|
||||
#include "library/config.hpp"
|
||||
#include "library/critical_trace.hpp"
|
||||
@@ -99,7 +100,7 @@ get_roctracer_kernels()
|
||||
auto&
|
||||
get_roctracer_hip_data(int64_t _tid = threading::get_id())
|
||||
{
|
||||
using data_t = std::unordered_map<uint64_t, roctracer_bundle_t>;
|
||||
using data_t = std::unordered_map<uint64_t, roctracer_hip_bundle_t>;
|
||||
using thread_data_t = thread_data<data_t, category::roctracer>;
|
||||
static auto& _v = thread_data_t::instances(construct_on_init{});
|
||||
return _v.at(_tid);
|
||||
@@ -124,7 +125,7 @@ struct cid_data : cid_tuple_t
|
||||
{
|
||||
using cid_tuple_t::cid_tuple_t;
|
||||
|
||||
TIMEMORY_DEFAULT_OBJECT(cid_data)
|
||||
OMNITRACE_DEFAULT_OBJECT(cid_data)
|
||||
|
||||
auto& cid() { return std::get<0>(*this); }
|
||||
auto& pcid() { return std::get<1>(*this); }
|
||||
@@ -454,20 +455,12 @@ roctx_api_callback(uint32_t domain, uint32_t cid, const void* callback_data,
|
||||
{
|
||||
case ROCTX_API_ID_roctxRangePushA:
|
||||
{
|
||||
if(get_use_perfetto())
|
||||
tracing::push_perfetto(category::rocm_roctx{}, _data->args.message);
|
||||
|
||||
if(get_use_timemory())
|
||||
tracing::push_timemory(category::rocm_roctx{}, _data->args.message);
|
||||
|
||||
component::category_region<category::rocm_roctx>::start(_data->args.message);
|
||||
break;
|
||||
}
|
||||
case ROCTX_API_ID_roctxRangePop:
|
||||
{
|
||||
if(get_use_timemory())
|
||||
tracing::pop_timemory(category::rocm_roctx{}, _data->args.message);
|
||||
if(get_use_perfetto())
|
||||
tracing::pop_perfetto(category::rocm_roctx{}, _data->args.message);
|
||||
component::category_region<category::rocm_roctx>::stop(_data->args.message);
|
||||
break;
|
||||
}
|
||||
case ROCTX_API_ID_roctxRangeStartA:
|
||||
@@ -479,11 +472,7 @@ roctx_api_callback(uint32_t domain, uint32_t cid, const void* callback_data,
|
||||
std::string_view{ _data->args.message });
|
||||
}
|
||||
|
||||
if(get_use_perfetto())
|
||||
tracing::push_perfetto(category::rocm_roctx{}, _data->args.message);
|
||||
|
||||
if(get_use_timemory())
|
||||
tracing::push_timemory(category::rocm_roctx{}, _data->args.message);
|
||||
component::category_region<category::rocm_roctx>::start(_data->args.message);
|
||||
break;
|
||||
}
|
||||
case ROCTX_API_ID_roctxRangeStop:
|
||||
@@ -510,10 +499,7 @@ roctx_api_callback(uint32_t domain, uint32_t cid, const void* callback_data,
|
||||
|
||||
if(!_message.empty())
|
||||
{
|
||||
if(get_use_timemory())
|
||||
tracing::pop_timemory(category::rocm_roctx{}, _message.data());
|
||||
if(get_use_perfetto())
|
||||
tracing::pop_perfetto(category::rocm_roctx{}, _message.data());
|
||||
component::category_region<category::rocm_roctx>::stop(_message.data());
|
||||
}
|
||||
|
||||
break;
|
||||
@@ -733,8 +719,8 @@ hip_api_callback(uint32_t domain, uint32_t cid, const void* callback_data, void*
|
||||
}
|
||||
if(get_use_timemory())
|
||||
{
|
||||
auto itr = get_roctracer_hip_data()->emplace(_corr_id,
|
||||
roctracer_bundle_t{ op_name });
|
||||
auto itr = get_roctracer_hip_data()->emplace(
|
||||
_corr_id, roctracer_hip_bundle_t{ op_name });
|
||||
if(itr.second)
|
||||
{
|
||||
itr.first->second.start();
|
||||
@@ -983,7 +969,7 @@ hip_activity_callback(const char* begin, const char* end, void* arg)
|
||||
if(_found && _name != nullptr && get_use_timemory())
|
||||
{
|
||||
auto _func = [_beg_ns, _end_ns, _name]() {
|
||||
roctracer_bundle_t _bundle{ _name };
|
||||
roctracer_hip_bundle_t _bundle{ _name };
|
||||
_bundle.start()
|
||||
.store(std::plus<double>{}, static_cast<double>(_end_ns - _beg_ns))
|
||||
.stop()
|
||||
|
||||
@@ -46,10 +46,10 @@
|
||||
|
||||
namespace omnitrace
|
||||
{
|
||||
using roctracer_bundle_t =
|
||||
tim::component_bundle<project::omnitrace, comp::roctracer_data, comp::wall_clock>;
|
||||
using roctracer_hip_bundle_t =
|
||||
tim::component_bundle<category::rocm_hip, comp::roctracer_data, comp::wall_clock>;
|
||||
using roctracer_hsa_bundle_t =
|
||||
tim::component_bundle<project::omnitrace, comp::roctracer_data>;
|
||||
tim::component_bundle<category::rocm_hsa, comp::roctracer_data>;
|
||||
using roctracer_functions_t = std::vector<std::pair<std::string, std::function<void()>>>;
|
||||
|
||||
// HSA API callback function
|
||||
|
||||
@@ -89,35 +89,6 @@ sampling_on_child_threads()
|
||||
}
|
||||
} // namespace
|
||||
|
||||
int
|
||||
get_realtime_signal()
|
||||
{
|
||||
return SIGRTMIN + config::get_sampling_rtoffset();
|
||||
}
|
||||
|
||||
int
|
||||
get_cputime_signal()
|
||||
{
|
||||
return SIGPROF;
|
||||
}
|
||||
|
||||
std::set<int> get_sampling_signals(int64_t)
|
||||
{
|
||||
auto _v = std::set<int>{};
|
||||
if(config::get_use_causal())
|
||||
{
|
||||
_v.emplace(get_cputime_signal());
|
||||
_v.emplace(get_realtime_signal());
|
||||
}
|
||||
else
|
||||
{
|
||||
if(config::get_use_sampling_cputime()) _v.emplace(get_cputime_signal());
|
||||
if(config::get_use_sampling_realtime()) _v.emplace(get_realtime_signal());
|
||||
}
|
||||
|
||||
return _v;
|
||||
}
|
||||
|
||||
std::atomic<uint64_t>&
|
||||
get_cpu_cid()
|
||||
{
|
||||
|
||||
@@ -78,15 +78,6 @@ get_init_bundle();
|
||||
std::unique_ptr<preinit_bundle_t>&
|
||||
get_preinit_bundle();
|
||||
|
||||
int
|
||||
get_realtime_signal();
|
||||
|
||||
int
|
||||
get_cputime_signal();
|
||||
|
||||
std::set<int>
|
||||
get_sampling_signals(int64_t _tid = 0);
|
||||
|
||||
std::atomic<uint64_t>&
|
||||
get_cpu_cid() TIMEMORY_HOT;
|
||||
|
||||
|
||||
@@ -854,11 +854,19 @@ void
|
||||
post_process_perfetto(int64_t _tid, const bundle_t* _init,
|
||||
const std::vector<bundle_t*>& _data)
|
||||
{
|
||||
auto _valid_metrics = backtrace_metrics::valid_array_t{};
|
||||
|
||||
for(const auto& itr : _data)
|
||||
{
|
||||
const auto* _bt_mt = itr->get<backtrace_metrics>();
|
||||
if(_bt_mt) _valid_metrics |= _bt_mt->get_valid();
|
||||
}
|
||||
|
||||
if(trait::runtime_enabled<backtrace_metrics>::get())
|
||||
{
|
||||
OMNITRACE_VERBOSE(3 || get_debug_sampling(),
|
||||
"[%li] Post-processing metrics for perfetto...\n", _tid);
|
||||
backtrace_metrics::init_perfetto(_tid);
|
||||
backtrace_metrics::init_perfetto(_tid, _valid_metrics);
|
||||
for(const auto& itr : _data)
|
||||
{
|
||||
const auto* _bt_metrics = itr->get<backtrace_metrics>();
|
||||
@@ -867,8 +875,7 @@ post_process_perfetto(int64_t _tid, const bundle_t* _init,
|
||||
if(_bt_time->get_tid() != _tid) continue;
|
||||
_bt_metrics->post_process_perfetto(_tid, _bt_time->get_timestamp());
|
||||
}
|
||||
|
||||
backtrace_metrics::fini_perfetto(_tid);
|
||||
backtrace_metrics::fini_perfetto(_tid, _valid_metrics);
|
||||
}
|
||||
|
||||
OMNITRACE_VERBOSE(3 || get_debug_sampling(),
|
||||
@@ -936,6 +943,12 @@ post_process_perfetto(int64_t _tid, const bundle_t* _init,
|
||||
_bt_mt->get_hw_counters().size() ==
|
||||
_last->get<backtrace_metrics>()->get_hw_counters().size();
|
||||
|
||||
auto _hw_counters_enabled = [](const auto* _bt_v) {
|
||||
return (_bt_v != nullptr) &&
|
||||
(*_bt_v)(type_list<backtrace_metrics::hw_counters>{}) &&
|
||||
(*_bt_v)(category::thread_hardware_counter{});
|
||||
};
|
||||
|
||||
// annotations common to both modes
|
||||
auto _common_annotate = [&](::perfetto::EventContext& ctx, bool _is_last) {
|
||||
if(_include_common && _is_last)
|
||||
@@ -943,7 +956,9 @@ post_process_perfetto(int64_t _tid, const bundle_t* _init,
|
||||
tracing::add_perfetto_annotation(ctx, "begin_ns", _beg);
|
||||
tracing::add_perfetto_annotation(ctx, "end_ns", _end);
|
||||
}
|
||||
if(_include_hw && _is_last)
|
||||
if(_include_hw && _is_last && _last &&
|
||||
_hw_counters_enabled(_last->get<backtrace_metrics>()) &&
|
||||
_hw_counters_enabled(_bt_mt))
|
||||
{
|
||||
// current values when read
|
||||
auto _hw_cnt_vals = _bt_mt->get_hw_counters();
|
||||
@@ -1048,16 +1063,15 @@ post_process_timemory(int64_t _tid, const bundle_t* _init,
|
||||
using bundle_t = tim::lightweight_tuple<comp::trip_count, sampling_wall_clock,
|
||||
sampling_cpu_clock, hw_counters>;
|
||||
|
||||
auto* _bt_data = itr->get<backtrace>();
|
||||
auto* _bt_time = itr->get<backtrace_timestamp>();
|
||||
auto* _bt_metrics = itr->get<backtrace_metrics>();
|
||||
auto* _bt_data = itr->get<backtrace>();
|
||||
auto* _bt_time = itr->get<backtrace_timestamp>();
|
||||
auto* _bt_metrics = itr->get<backtrace_metrics>();
|
||||
const auto* _last_metrics = _last->get<backtrace_metrics>();
|
||||
|
||||
if(!_bt_data || !_bt_time || !_bt_metrics) continue;
|
||||
if(!_bt_data || !_bt_time) continue;
|
||||
|
||||
double _elapsed_wc = (_bt_time->get_timestamp() -
|
||||
_last->get<backtrace_timestamp>()->get_timestamp());
|
||||
double _elapsed_cc = (_bt_metrics->get_cpu_timestamp() -
|
||||
_last->get<backtrace_metrics>()->get_cpu_timestamp());
|
||||
|
||||
std::vector<bundle_t> _tc{};
|
||||
_tc.reserve(_bt_data->size());
|
||||
@@ -1090,31 +1104,45 @@ post_process_timemory(int64_t _tid, const bundle_t* _init,
|
||||
if constexpr(tim::trait::is_available<sampling_cpu_clock>::value)
|
||||
{
|
||||
auto* _cc = iitr.get<sampling_cpu_clock>();
|
||||
if(_cc)
|
||||
|
||||
if(_cc && _bt_metrics && _last_metrics &&
|
||||
(*_bt_metrics)(category::thread_cpu_time{}) &&
|
||||
(*_last_metrics)(category::thread_cpu_time{}))
|
||||
{
|
||||
double _elapsed_cc = (_bt_metrics->get_cpu_timestamp() -
|
||||
_last_metrics->get_cpu_timestamp());
|
||||
|
||||
_cc->set_value(_elapsed_cc / sampling_cpu_clock::get_unit());
|
||||
_cc->set_accum(_elapsed_cc / sampling_cpu_clock::get_unit());
|
||||
}
|
||||
}
|
||||
if constexpr(tim::trait::is_available<hw_counters>::value)
|
||||
{
|
||||
auto _hw_cnt_vals = _bt_metrics->get_hw_counters();
|
||||
if(_last && _bt_metrics->get_hw_counters().size() ==
|
||||
_last->get<backtrace_metrics>()->get_hw_counters().size())
|
||||
auto _hw_counters_enabled = [](const auto* _bt_v) {
|
||||
return (_bt_v != nullptr) &&
|
||||
(*_bt_v)(type_list<backtrace_metrics::hw_counters>{}) &&
|
||||
(*_bt_v)(category::thread_hardware_counter{});
|
||||
};
|
||||
|
||||
if(_bt_metrics && _last_metrics && _hw_counters_enabled(_bt_metrics) &&
|
||||
_hw_counters_enabled(_last_metrics))
|
||||
{
|
||||
for(size_t k = 0; k < _bt_metrics->get_hw_counters().size(); ++k)
|
||||
auto _hw_cnt_vals = _bt_metrics->get_hw_counters();
|
||||
if(_bt_metrics->get_hw_counters().size() ==
|
||||
_last_metrics->get_hw_counters().size())
|
||||
{
|
||||
if(_last->get<backtrace_metrics>()->get_hw_counters()[k] >
|
||||
_hw_cnt_vals[k])
|
||||
_hw_cnt_vals[k] -=
|
||||
_last->get<backtrace_metrics>()->get_hw_counters()[k];
|
||||
for(size_t k = 0; k < _bt_metrics->get_hw_counters().size(); ++k)
|
||||
{
|
||||
if(_last_metrics->get_hw_counters()[k] > _hw_cnt_vals[k])
|
||||
_hw_cnt_vals[k] -= _last_metrics->get_hw_counters()[k];
|
||||
}
|
||||
}
|
||||
auto* _hw_counter = iitr.get<hw_counters>();
|
||||
if(_hw_counter)
|
||||
{
|
||||
_hw_counter->set_value(_hw_cnt_vals);
|
||||
_hw_counter->set_accum(_hw_cnt_vals);
|
||||
}
|
||||
}
|
||||
auto* _hw_counter = iitr.get<hw_counters>();
|
||||
if(_hw_counter)
|
||||
{
|
||||
_hw_counter->set_value(_hw_cnt_vals);
|
||||
_hw_counter->set_accum(_hw_cnt_vals);
|
||||
}
|
||||
}
|
||||
iitr.pop();
|
||||
|
||||
@@ -98,6 +98,15 @@ init_index_data(int64_t _tid, bool _offset = false)
|
||||
const auto unknown_thread = std::optional<thread_info>{};
|
||||
} // namespace
|
||||
|
||||
std::string
thread_index_data::as_string() const
{
    auto _ss = std::stringstream{};
    _ss << sequent_value << " [" << as_hex(system_value) << "] (#" << internal_value
        << ")";
    return _ss.str();
}
|
||||
|
||||
int64_t
|
||||
grow_data(int64_t _tid)
|
||||
{
|
||||
|
||||
@@ -64,6 +64,8 @@ struct thread_index_data
|
||||
int64_t internal_value = utility::get_thread_index();
|
||||
int64_t system_value = tim::threading::get_sys_tid();
|
||||
int64_t sequent_value = tim::threading::get_id();
|
||||
|
||||
std::string as_string() const;
|
||||
};
|
||||
|
||||
int64_t grow_data(int64_t);
|
||||
|
||||
@@ -34,20 +34,6 @@ bool debug_pop = tim::get_env("OMNITRACE_DEBUG_POP", false) || get_debug_env();
|
||||
bool debug_mark = tim::get_env("OMNITRACE_DEBUG_MARK", false) || get_debug_env();
|
||||
bool debug_user = tim::get_env("OMNITRACE_DEBUG_USER_REGIONS", false) || get_debug_env();
|
||||
|
||||
perfetto::TraceConfig&
|
||||
get_perfetto_config()
|
||||
{
|
||||
static auto _v = ::perfetto::TraceConfig{};
|
||||
return _v;
|
||||
}
|
||||
|
||||
std::unique_ptr<perfetto::TracingSession>&
|
||||
get_perfetto_session()
|
||||
{
|
||||
static auto _v = std::unique_ptr<perfetto::TracingSession>{};
|
||||
return _v;
|
||||
}
|
||||
|
||||
std::unordered_map<hash_value_t, std::string>&
|
||||
get_perfetto_track_uuids()
|
||||
{
|
||||
@@ -114,7 +100,6 @@ thread_init()
|
||||
process::get_id(), "thread",
|
||||
threading::get_id()),
|
||||
quirk::config<quirk::auto_start>{});
|
||||
get_interval_data()->reserve(512);
|
||||
// save the hash maps
|
||||
get_timemory_hash_ids() = tim::get_hash_ids();
|
||||
get_timemory_hash_aliases() = tim::get_hash_aliases();
|
||||
|
||||
@@ -40,6 +40,7 @@
|
||||
|
||||
#include <timemory/components/timing/backends.hpp>
|
||||
#include <timemory/hash/types.hpp>
|
||||
#include <timemory/mpl/concepts.hpp>
|
||||
#include <timemory/mpl/type_traits.hpp>
|
||||
|
||||
#include <atomic>
|
||||
@@ -70,12 +71,6 @@ extern OMNITRACE_HIDDEN_API bool debug_mark;
|
||||
std::unordered_map<hash_value_t, std::string>&
|
||||
get_perfetto_track_uuids();
|
||||
|
||||
perfetto::TraceConfig&
|
||||
get_perfetto_config();
|
||||
|
||||
std::unique_ptr<perfetto::TracingSession>&
|
||||
get_perfetto_session();
|
||||
|
||||
tim::hash_map_ptr_t&
|
||||
get_timemory_hash_ids(int64_t _tid = threading::get_id());
|
||||
|
||||
@@ -91,6 +86,46 @@ record_thread_start_time();
|
||||
void
|
||||
thread_init();
|
||||
|
||||
template <typename CategoryT>
auto&
get_category_stack();

template <typename CategoryT, typename... Args>
inline void
push_perfetto(CategoryT, const char*, Args&&...);

template <typename CategoryT, typename... Args>
inline void
pop_perfetto(CategoryT, const char*, Args&&...);

template <typename CategoryT, typename... Args>
inline void
push_perfetto_ts(CategoryT, const char*, uint64_t _ts, Args&&...);

template <typename CategoryT, typename... Args>
inline void
pop_perfetto_ts(CategoryT, const char*, uint64_t, Args&&...);

template <typename CategoryT, typename... Args>
inline void
push_perfetto_track(CategoryT, const char*, perfetto::Track, uint64_t, Args&&...);

template <typename CategoryT, typename... Args>
inline void
pop_perfetto_track(CategoryT, const char*, perfetto::Track, uint64_t, Args&&...);

template <typename CategoryT, typename... Args>
inline void
mark_perfetto(CategoryT, const char*, Args&&...);

template <typename CategoryT, typename... Args>
inline void
mark_perfetto_ts(CategoryT, const char*, uint64_t, Args&&...);

template <typename CategoryT, typename... Args>
inline void
mark_perfetto_track(CategoryT, const char*, perfetto::Track, uint64_t, Args&&...);
|
||||
|
||||
//
|
||||
// definitions
|
||||
//
|
||||
@@ -147,13 +182,6 @@ now()
|
||||
return ::tim::get_clock_real_now<Tp, std::nano>();
|
||||
}
|
||||
|
||||
inline auto&
|
||||
get_interval_data(int64_t _tid = threading::get_id())
|
||||
{
|
||||
static auto& _v = interval_data_instances::instances(construct_on_init{});
|
||||
return _v.at(_tid);
|
||||
}
|
||||
|
||||
inline auto&
|
||||
get_instrumentation_bundles(int64_t _tid = threading::get_id())
|
||||
{
|
||||
@@ -174,44 +202,128 @@ pop_count()
|
||||
return _v;
|
||||
}
|
||||
|
||||
template <typename CategoryT, typename... Args>
|
||||
inline void
|
||||
push_timemory(CategoryT, const char* name, Args&&... args)
|
||||
struct category_stack
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
int32_t profile = 0; // use signed so compiler doesn't have to
|
||||
int32_t tracing = 0; // account for underflow/overflow
|
||||
};
|
||||
|
||||
auto& _data = tracing::get_instrumentation_bundles();
|
||||
// this generates a hash for the raw string array
|
||||
auto _hash = tim::add_hash_id(tim::string_view_t{ name });
|
||||
_data.construct(_hash)->start(std::forward<Args>(args)...);
|
||||
template <typename CategoryT>
auto&
get_category_stack()
{
    static thread_local auto _v = category_stack{};
    return _v;
}

template <typename CategoryT>
auto&
get_tracing_stack()
{
    return get_category_stack<CategoryT>().tracing;
}

template <typename CategoryT>
auto&
get_profile_stack()
{
    return get_category_stack<CategoryT>().profile;
}

template <typename CategoryT>
auto
category_push_disabled()
{
    return !trait::runtime_enabled<CategoryT>::get();
}

template <typename CategoryT>
auto
category_mark_disabled()
{
    return !trait::runtime_enabled<CategoryT>::get();
}

template <typename CategoryT>
auto
category_pop_disabled()
{
    return !trait::runtime_enabled<CategoryT>::get() &&
           (get_profile_stack<CategoryT>() + get_tracing_stack<CategoryT>()) <= 0;
}

template <typename CategoryT>
auto
tracing_pop_disabled()
{
    return !trait::runtime_enabled<CategoryT>::get() &&
           get_tracing_stack<CategoryT>() <= 0;
}

template <typename CategoryT>
auto
profile_pop_disabled()
{
    return !trait::runtime_enabled<CategoryT>::get() &&
           get_profile_stack<CategoryT>() <= 0;
}
|
||||
|
||||
template <typename CategoryT, typename... Args>
|
||||
inline void
|
||||
pop_timemory(CategoryT, const char* name, Args&&... args)
|
||||
push_timemory(CategoryT, std::string_view name, Args&&... args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
if(category_push_disabled<CategoryT>()) return;
|
||||
|
||||
auto _hash = tim::hash::get_hash_id(tim::string_view_t{ name });
|
||||
auto& _data = tracing::get_instrumentation_bundles();
|
||||
if(_data.bundles.empty())
|
||||
// this generates a hash for the raw string array
|
||||
auto _hash = tim::add_hash_id(name);
|
||||
_data.construct(_hash)->start(std::forward<Args>(args)...);
|
||||
// increment the profile stack
|
||||
++get_profile_stack<CategoryT>();
|
||||
}
|
||||
|
||||
template <typename CategoryT, typename... Args>
|
||||
inline void
|
||||
pop_timemory(CategoryT, std::string_view name, Args&&... args)
|
||||
{
|
||||
// skip if category is disabled and not pushed on this thread
|
||||
if(profile_pop_disabled<CategoryT>()) return;
|
||||
|
||||
auto _hash = tim::hash::get_hash_id(name);
|
||||
auto& _data = tracing::get_instrumentation_bundles();
|
||||
if(OMNITRACE_UNLIKELY(_data.bundles.empty()))
|
||||
{
|
||||
OMNITRACE_DEBUG("[%s] skipped %s :: empty bundle stack\n", "omnitrace_pop_trace",
|
||||
name);
|
||||
name.data());
|
||||
return;
|
||||
}
|
||||
for(size_t i = _data.bundles.size(); i > 0; --i)
|
||||
|
||||
auto*& _v_back = _data.bundles.back();
|
||||
if(OMNITRACE_LIKELY(_v_back->get_hash() == _hash))
|
||||
{
|
||||
auto*& _v = _data.bundles.at(i - 1);
|
||||
if(_v->get_hash() == _hash)
|
||||
// decrement the profile stack
|
||||
--get_profile_stack<CategoryT>();
|
||||
_v_back->stop(std::forward<Args>(args)...);
|
||||
_data.allocator.destroy(_v_back);
|
||||
_data.allocator.deallocate(_v_back, 1);
|
||||
_data.bundles.erase(--_data.bundles.end());
|
||||
}
|
||||
else if(_data.bundles.size() > 1)
|
||||
{
|
||||
for(size_t i = _data.bundles.size() - 1; i > 0; --i)
|
||||
{
|
||||
_v->stop(std::forward<Args>(args)...);
|
||||
_data.allocator.destroy(_v);
|
||||
_data.allocator.deallocate(_v, 1);
|
||||
_data.bundles.erase(_data.bundles.begin() + (i - 1));
|
||||
break;
|
||||
auto*& _v = _data.bundles.at(i - 1);
|
||||
if(_v->get_hash() == _hash)
|
||||
{
|
||||
// decrement the profile stack
|
||||
--get_profile_stack<CategoryT>();
|
||||
_v->stop(std::forward<Args>(args)...);
|
||||
_data.allocator.destroy(_v);
|
||||
_data.allocator.deallocate(_v, 1);
|
||||
_data.bundles.erase(_data.bundles.begin() + (i - 1));
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
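The revised `pop_timemory` no longer assumes the matching bundle is at the top of the stack: it walks the bundle stack from the back and removes the first entry whose hash matches. A minimal sketch of that search is given here, assuming a plain `std::vector<uint64_t>` of hashes in place of the real bundle pointers and allocator; `pop_matching` is a hypothetical name, not a function from this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: remove the most recently pushed entry whose hash
// matches, searching from the back of the stack toward the front.
// Returns true if an entry was found and removed.
inline bool
pop_matching(std::vector<uint64_t>& _stack, uint64_t _hash)
{
    for(size_t i = _stack.size(); i > 0; --i)
    {
        if(_stack.at(i - 1) == _hash)
        {
            // erase by index so entries above the match keep their order
            _stack.erase(_stack.begin() + static_cast<std::ptrdiff_t>(i - 1));
            return true;
        }
    }
    // no matching push on this thread: nothing to stop
    return false;
}
```

Searching from the back keeps the common case (the match is the last entry) at one comparison while still tolerating pops that arrive out of order.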
||||
@@ -221,12 +333,13 @@ inline void
|
||||
push_perfetto(CategoryT, const char* name, Args&&... args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
if(category_push_disabled<CategoryT>()) return;
|
||||
|
||||
uint64_t _ts = comp::wall_clock::record();
|
||||
if constexpr(sizeof...(Args) == 1 &&
|
||||
std::is_invocable<Args..., perfetto::EventContext>::value)
|
||||
{
|
||||
++get_tracing_stack<CategoryT>();
|
||||
uint64_t _ts = now();
|
||||
if(config::get_perfetto_annotations())
|
||||
{
|
||||
TRACE_EVENT_BEGIN(trait::name<CategoryT>::value, perfetto::StaticString(name),
|
||||
@@ -240,28 +353,48 @@ push_perfetto(CategoryT, const char* name, Args&&... args)
|
||||
}
|
||||
else
|
||||
{
|
||||
TRACE_EVENT_BEGIN(trait::name<CategoryT>::value, perfetto::StaticString(name),
|
||||
_ts, std::forward<Args>(args)...,
|
||||
[&](perfetto::EventContext ctx) {
|
||||
if(config::get_perfetto_annotations())
|
||||
{
|
||||
tracing::add_perfetto_annotation(ctx, "begin_ns", _ts);
|
||||
}
|
||||
});
|
||||
using tuple_type = std::tuple<concepts::unqualified_type_t<Args>...>;
|
||||
using arg0_type = concepts::tuple_element_t<0, tuple_type>;
|
||||
using arg1_type = concepts::tuple_element_t<1, tuple_type>;
|
||||
|
||||
if constexpr(std::is_same<arg0_type, perfetto::Track>::value &&
|
||||
std::is_same<arg1_type, uint64_t>::value)
|
||||
{
|
||||
push_perfetto_track(CategoryT{}, name, std::forward<Args>(args)...);
|
||||
}
|
||||
else if constexpr(std::is_same<arg0_type, uint64_t>::value)
|
||||
{
|
||||
push_perfetto_ts(CategoryT{}, name, std::forward<Args>(args)...);
|
||||
}
|
||||
else
|
||||
{
|
||||
++get_tracing_stack<CategoryT>();
|
||||
uint64_t _ts = now();
|
||||
TRACE_EVENT_BEGIN(
|
||||
trait::name<CategoryT>::value, perfetto::StaticString(name), _ts,
|
||||
std::forward<Args>(args)..., [&](perfetto::EventContext ctx) {
|
||||
if(config::get_perfetto_annotations())
|
||||
{
|
||||
tracing::add_perfetto_annotation(ctx, "begin_ns", _ts);
|
||||
}
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
template <typename CategoryT, typename... Args>
|
||||
inline void
|
||||
pop_perfetto(CategoryT, const char*, Args&&... args)
|
||||
pop_perfetto(CategoryT, const char* name, Args&&... args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
// skip if category is disabled and not pushed on this thread
|
||||
if(tracing_pop_disabled<CategoryT>()) return;
|
||||
|
||||
uint64_t _ts = comp::wall_clock::record();
|
||||
if constexpr(sizeof...(Args) == 1 &&
|
||||
std::is_invocable<Args..., perfetto::EventContext>::value)
|
||||
{
|
||||
// decrement tracing stack
|
||||
--get_tracing_stack<CategoryT>();
|
||||
uint64_t _ts = now();
|
||||
if(config::get_perfetto_annotations())
|
||||
{
|
||||
TRACE_EVENT_END(trait::name<CategoryT>::value, _ts, "end_ns", _ts,
|
||||
@@ -275,14 +408,35 @@ pop_perfetto(CategoryT, const char*, Args&&... args)
|
||||
}
|
||||
else
|
||||
{
|
||||
TRACE_EVENT_END(trait::name<CategoryT>::value, _ts, std::forward<Args>(args)...,
|
||||
[&](perfetto::EventContext ctx) {
|
||||
if(config::get_perfetto_annotations())
|
||||
{
|
||||
tracing::add_perfetto_annotation(ctx, "end_ns", _ts);
|
||||
}
|
||||
});
|
||||
using tuple_type = std::tuple<concepts::unqualified_type_t<Args>...>;
|
||||
using arg0_type = concepts::tuple_element_t<0, tuple_type>;
|
||||
using arg1_type = concepts::tuple_element_t<1, tuple_type>;
|
||||
|
||||
if constexpr(std::is_same<arg0_type, perfetto::Track>::value &&
|
||||
std::is_same<arg1_type, uint64_t>::value)
|
||||
{
|
||||
pop_perfetto_track(CategoryT{}, name, std::forward<Args>(args)...);
|
||||
}
|
||||
else if constexpr(std::is_same<arg0_type, uint64_t>::value)
|
||||
{
|
||||
pop_perfetto_ts(CategoryT{}, name, std::forward<Args>(args)...);
|
||||
}
|
||||
else
|
||||
{
|
||||
// decrement tracing stack
|
||||
--get_tracing_stack<CategoryT>();
|
||||
uint64_t _ts = now();
|
||||
TRACE_EVENT_END(trait::name<CategoryT>::value, _ts,
|
||||
std::forward<Args>(args)..., [&](perfetto::EventContext ctx) {
|
||||
if(config::get_perfetto_annotations())
|
||||
{
|
||||
tracing::add_perfetto_annotation(ctx, "end_ns", _ts);
|
||||
}
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
(void) name;
|
||||
}
|
||||
|
||||
template <typename CategoryT, typename... Args>
|
||||
@@ -290,8 +444,9 @@ inline void
|
||||
push_perfetto_ts(CategoryT, const char* name, uint64_t _ts, Args&&... args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
if(category_push_disabled<CategoryT>()) return;
|
||||
|
||||
++get_tracing_stack<CategoryT>();
|
||||
TRACE_EVENT_BEGIN(trait::name<CategoryT>::value, perfetto::StaticString(name), _ts,
|
||||
std::forward<Args>(args)...);
|
||||
}
|
||||
@@ -300,8 +455,11 @@ template <typename CategoryT, typename... Args>
|
||||
inline void
|
||||
pop_perfetto_ts(CategoryT, const char*, uint64_t _ts, Args&&... args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(!trait::runtime_enabled<CategoryT>::get()) return;
|
||||
// skip if category is disabled and not pushed on this thread
|
||||
if(tracing_pop_disabled<CategoryT>()) return;
|
||||
|
||||
// decrement tracing stack
|
||||
--get_tracing_stack<CategoryT>();
|
||||
|
||||
TRACE_EVENT_END(trait::name<CategoryT>::value, _ts, std::forward<Args>(args)...);
|
||||
}
|
||||
@@ -311,6 +469,10 @@ inline void
|
||||
push_perfetto_track(CategoryT, const char* name, perfetto::Track _track, uint64_t _ts,
|
||||
Args&&... args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(category_push_disabled<CategoryT>()) return;
|
||||
|
||||
++get_tracing_stack<CategoryT>();
|
||||
TRACE_EVENT_BEGIN(trait::name<CategoryT>::value, perfetto::StaticString(name), _track,
|
||||
_ts, std::forward<Args>(args)...);
|
||||
}
|
||||
@@ -320,8 +482,91 @@ inline void
|
||||
pop_perfetto_track(CategoryT, const char*, perfetto::Track _track, uint64_t _ts,
|
||||
Args&&... args)
|
||||
{
|
||||
// skip if category is disabled and not pushed on this thread
|
||||
if(tracing_pop_disabled<CategoryT>()) return;
|
||||
|
||||
// decrement tracing stack
|
||||
--get_tracing_stack<CategoryT>();
|
||||
|
||||
TRACE_EVENT_END(trait::name<CategoryT>::value, _track, _ts,
|
||||
std::forward<Args>(args)...);
|
||||
}
|
||||
|
||||
template <typename CategoryT, typename... Args>
|
||||
inline void
|
||||
mark_perfetto(CategoryT, const char* name, Args&&... args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(category_mark_disabled<CategoryT>()) return;
|
||||
|
||||
if constexpr(sizeof...(Args) == 1 &&
|
||||
std::is_invocable<Args..., perfetto::EventContext>::value)
|
||||
{
|
||||
uint64_t _ts = now();
|
||||
if(config::get_perfetto_annotations())
|
||||
{
|
||||
TRACE_EVENT_INSTANT(trait::name<CategoryT>::value,
|
||||
perfetto::StaticString(name), _ts, "ns", _ts,
|
||||
std::forward<Args>(args)...);
|
||||
}
|
||||
else
|
||||
{
|
||||
TRACE_EVENT_INSTANT(trait::name<CategoryT>::value,
|
||||
perfetto::StaticString(name), _ts,
|
||||
std::forward<Args>(args)...);
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
using tuple_type = std::tuple<concepts::unqualified_type_t<Args>...>;
|
||||
using arg0_type = concepts::tuple_element_t<0, tuple_type>;
|
||||
using arg1_type = concepts::tuple_element_t<1, tuple_type>;
|
||||
|
||||
if constexpr(std::is_same<arg0_type, perfetto::Track>::value &&
|
||||
std::is_same<arg1_type, uint64_t>::value)
|
||||
{
|
||||
mark_perfetto_track(CategoryT{}, name, std::forward<Args>(args)...);
|
||||
}
|
||||
else if constexpr(std::is_same<arg0_type, uint64_t>::value)
|
||||
{
|
||||
mark_perfetto_ts(CategoryT{}, name, std::forward<Args>(args)...);
|
||||
}
|
||||
else
|
||||
{
|
||||
uint64_t _ts = now();
|
||||
TRACE_EVENT_INSTANT(
|
||||
trait::name<CategoryT>::value, perfetto::StaticString(name), _ts,
|
||||
std::forward<Args>(args)..., [&](perfetto::EventContext ctx) {
|
||||
if(config::get_perfetto_annotations())
|
||||
{
|
||||
tracing::add_perfetto_annotation(ctx, "ns", _ts);
|
||||
}
|
||||
});
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
template <typename CategoryT, typename... Args>
|
||||
inline void
|
||||
mark_perfetto_ts(CategoryT, const char* name, uint64_t _ts, Args&&... args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(category_mark_disabled<CategoryT>()) return;
|
||||
|
||||
TRACE_EVENT_INSTANT(trait::name<CategoryT>::value, perfetto::StaticString(name), _ts,
|
||||
std::forward<Args>(args)...);
|
||||
}
|
||||
|
||||
template <typename CategoryT, typename... Args>
|
||||
inline void
|
||||
mark_perfetto_track(CategoryT, const char*, perfetto::Track _track, uint64_t _ts,
|
||||
Args&&... args)
|
||||
{
|
||||
// skip if category is disabled
|
||||
if(category_mark_disabled<CategoryT>()) return;
|
||||
|
||||
TRACE_EVENT_INSTANT(trait::name<CategoryT>::value, _track, _ts,
|
||||
std::forward<Args>(args)...);
|
||||
}
|
||||
} // namespace tracing
|
||||
} // namespace omnitrace
|
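The push and pop guards in this header are intentionally asymmetric: a push is skipped whenever the category is disabled, whereas a pop is skipped only when the category is disabled and no push is outstanding on the calling thread. This lets a region that was opened before a tracing window closed still be closed afterwards. The sketch below reproduces only that counter logic and is not the PR's code: `runtime_enabled` is a hypothetical stand-in for `trait::runtime_enabled`, `host_category` is an illustrative tag type, and `push`/`pop` return `bool` purely so the behavior can be observed.

```cpp
#include <cstdint>

// Hypothetical stand-in for trait::runtime_enabled<CategoryT>
template <typename CategoryT>
struct runtime_enabled
{
    static bool& get()
    {
        static bool _v = true;
        return _v;
    }
};

struct category_stack
{
    int32_t profile = 0;
    int32_t tracing = 0;
};

// one counter pair per category, per thread
template <typename CategoryT>
category_stack&
get_category_stack()
{
    static thread_local auto _v = category_stack{};
    return _v;
}

template <typename CategoryT>
bool
tracing_pop_disabled()
{
    return !runtime_enabled<CategoryT>::get() &&
           get_category_stack<CategoryT>().tracing <= 0;
}

// returns true if the push was recorded
template <typename CategoryT>
bool push(CategoryT)
{
    if(!runtime_enabled<CategoryT>::get()) return false;
    ++get_category_stack<CategoryT>().tracing;
    return true;
}

// returns true if the pop was recorded
template <typename CategoryT>
bool pop(CategoryT)
{
    if(tracing_pop_disabled<CategoryT>()) return false;
    --get_category_stack<CategoryT>().tracing;
    return true;
}

// illustrative category tag
struct host_category
{};
```

With this arrangement, disabling a category between a push and its pop does not leave an unterminated region: the pop still runs once, and later pops are ignored until the category is enabled again.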
||||
|
||||
@@ -32,6 +32,7 @@
|
||||
#include <atomic>
|
||||
#include <cstddef>
|
||||
#include <cstdint>
|
||||
#include <sstream>
|
||||
#include <stdexcept>
|
||||
#include <vector>
|
||||
|
||||
@@ -226,5 +227,16 @@ get_regex_or(const ContainerT<Tp, TailT...>& _container, PredicateT&& _predicate
|
||||
|
||||
return get_regex_or(_dest, _fallback);
|
||||
}
|
||||
|
||||
template <typename Tp>
|
||||
Tp
|
||||
convert(std::string_view _inp)
|
||||
{
|
||||
auto _iss = std::stringstream{};
|
||||
auto _ret = Tp{};
|
||||
_iss << _inp;
|
||||
_iss >> _ret;
|
||||
return _ret;
|
||||
}
|
||||
} // namespace utility
|
||||
} // namespace omnitrace
|
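The `convert` helper added above is a generic string-to-value conversion via `std::stringstream`. As an illustration of where such a helper is useful, the sketch below applies it to one `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>` entry of `OMNITRACE_TRACE_PERIODS`. The `trace_period` struct, the `parse_trace_period` function, and the default values for missing fields are all assumptions made for this example; the PR's actual parsing code may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// same shape as the helper added in this diff
template <typename Tp>
Tp
convert(std::string_view _inp)
{
    auto _iss = std::stringstream{};
    auto _ret = Tp{};
    _iss << _inp;
    _iss >> _ret;
    return _ret;
}

// Hypothetical container for one entry; defaults are assumptions
struct trace_period
{
    double      delay    = 0.0;
    double      duration = 0.0;
    uint64_t    repeat   = 1;
    std::string clock    = "realtime";
};

// Hypothetical parser: split on ':' and convert each field in order
inline trace_period
parse_trace_period(std::string_view _entry)
{
    auto   _fields = std::vector<std::string_view>{};
    size_t _beg    = 0;
    while(_beg <= _entry.size())
    {
        auto _end = _entry.find(':', _beg);
        if(_end == std::string_view::npos) _end = _entry.size();
        _fields.emplace_back(_entry.substr(_beg, _end - _beg));
        _beg = _end + 1;
    }

    auto _ret = trace_period{};
    if(_fields.size() > 0) _ret.delay = convert<double>(_fields[0]);
    if(_fields.size() > 1) _ret.duration = convert<double>(_fields[1]);
    if(_fields.size() > 2) _ret.repeat = convert<uint64_t>(_fields[2]);
    if(_fields.size() > 3) _ret.clock = std::string{ _fields[3] };
    return _ret;
}
```

Note that a stream-based conversion silently yields a zero value when the input is not numeric, so a production parser would likely want to check the stream state before accepting the result.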
||||
|
||||
+116
-7
@@ -504,7 +504,7 @@ set(_ompt_preload_environ
|
||||
"OMNITRACE_SAMPLING_REALTIME=ON"
|
||||
"OMNITRACE_SAMPLING_CPUTIME_FREQ=1000"
|
||||
"OMNITRACE_SAMPLING_REALTIME_FREQ=500"
|
||||
"OMNITRACE_COLORIZED_LOG=OFF")
|
||||
"OMNITRACE_MONOCHROME=ON")
|
||||
|
||||
set(_ompt_sample_no_tmpfiles_environ
|
||||
"${_ompt_environment}"
|
||||
@@ -516,7 +516,7 @@ set(_ompt_sample_no_tmpfiles_environ
|
||||
"OMNITRACE_SAMPLING_REALTIME=OFF"
|
||||
"OMNITRACE_SAMPLING_CPUTIME_FREQ=700"
|
||||
"OMNITRACE_USE_TEMPORARY_FILES=OFF"
|
||||
"OMNITRACE_COLORIZED_LOG=OFF")
|
||||
"OMNITRACE_MONOCHROME=ON")
|
||||
|
||||
set(_ompt_preload_samp_regex
|
||||
"Sampler for thread 0 will be triggered 1000.0x per second of CPU-time(.*)Sampler for thread 0 will be triggered 500.0x per second of wall-time(.*)Sampling will be disabled after 0.250000 seconds(.*)Sampling duration of 0.250000 seconds has elapsed. Shutting down sampling"
|
||||
@@ -684,6 +684,111 @@ omnitrace_add_test(
|
||||
RUNTIME_PASS_REGEX "(\\\[[0-9]+\\\]) function coverage :: 66.67%"
|
||||
REWRITE_RUN_PASS_REGEX "(\\\[[0-9]+\\\]) function coverage :: 66.67%")
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_SAMPLING SKIP_PRELOAD
|
||||
NAME trace-time-window
|
||||
TARGET trace-time-window
|
||||
REWRITE_ARGS -e -v 2 --caller-include inner -i 4096
|
||||
RUNTIME_ARGS -e -v 1 --caller-include inner -i 4096
|
||||
LABELS "time-window"
|
||||
ENVIRONMENT "${_window_environment};OMNITRACE_TRACE_DURATION=1.25")
|
||||
|
||||
omnitrace_add_validation_test(
|
||||
NAME trace-time-window-binary-rewrite
|
||||
TIMEMORY_METRIC "wall_clock"
|
||||
TIMEMORY_FILE "wall_clock.json"
|
||||
PERFETTO_METRIC "host"
|
||||
PERFETTO_FILE "perfetto-trace.proto"
|
||||
LABELS "time-window"
|
||||
FAIL_REGEX "outer_d"
|
||||
ARGS -l
|
||||
main
|
||||
outer_a
|
||||
outer_b
|
||||
outer_c
|
||||
-c
|
||||
1
|
||||
1
|
||||
1
|
||||
1
|
||||
-d
|
||||
0
|
||||
1
|
||||
1
|
||||
1
|
||||
-p)
|
||||
|
||||
omnitrace_add_validation_test(
|
||||
NAME trace-time-window-runtime-instrument
|
||||
TIMEMORY_METRIC "wall_clock"
|
||||
TIMEMORY_FILE "wall_clock.json"
|
||||
PERFETTO_METRIC "host"
|
||||
PERFETTO_FILE "perfetto-trace.proto"
|
||||
LABELS "time-window"
|
||||
FAIL_REGEX "outer_d"
|
||||
ARGS -l
|
||||
main
|
||||
outer_a
|
||||
outer_b
|
||||
outer_c
|
||||
-c
|
||||
1
|
||||
1
|
||||
1
|
||||
1
|
||||
-d
|
||||
0
|
||||
1
|
||||
1
|
||||
1
|
||||
-p)
|
||||
|
||||
omnitrace_add_test(
|
||||
SKIP_BASELINE SKIP_SAMPLING SKIP_PRELOAD
|
||||
NAME trace-time-window-delay
|
||||
TARGET trace-time-window
|
||||
REWRITE_ARGS -e -v 2 --caller-include inner -i 4096
|
||||
RUNTIME_ARGS -e -v 1 --caller-include inner -i 4096
|
||||
LABELS "time-window"
|
||||
ENVIRONMENT
|
||||
"${_window_environment};OMNITRACE_TRACE_DELAY=0.75;OMNITRACE_TRACE_DURATION=0.75")
|
||||
|
||||
omnitrace_add_validation_test(
|
||||
NAME trace-time-window-delay-binary-rewrite
|
||||
TIMEMORY_METRIC "wall_clock"
|
||||
TIMEMORY_FILE "wall_clock.json"
|
||||
PERFETTO_METRIC "host"
|
||||
PERFETTO_FILE "perfetto-trace.proto"
|
||||
LABELS "time-window"
|
||||
ARGS -l
|
||||
outer_c
|
||||
outer_d
|
||||
-c
|
||||
1
|
||||
1
|
||||
-d
|
||||
0
|
||||
0
|
||||
-p)
|
||||
|
||||
omnitrace_add_validation_test(
|
||||
NAME trace-time-window-delay-runtime-instrument
|
||||
TIMEMORY_METRIC "wall_clock"
|
||||
TIMEMORY_FILE "wall_clock.json"
|
||||
PERFETTO_METRIC "host"
|
||||
PERFETTO_FILE "perfetto-trace.proto"
|
||||
LABELS "time-window"
|
||||
ARGS -l
|
||||
outer_c
|
||||
outer_d
|
||||
-c
|
||||
1
|
||||
1
|
||||
-d
|
||||
0
|
||||
0
|
||||
-p)
|
||||
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# critical-trace tests
|
||||
@@ -823,6 +928,10 @@ foreach(_TARGET ${RCCL_TEST_TARGETS})
|
||||
line
|
||||
return
|
||||
args
|
||||
-ME
|
||||
sysdeps
|
||||
--log-file
|
||||
rccl-test-${_NAME}.log
|
||||
RUN_ARGS -t
|
||||
1
|
||||
-g
|
||||
@@ -910,7 +1019,7 @@ omnitrace_add_causal_test(
|
||||
)
|
||||
|
||||
set(_causal_common_args
|
||||
"-n 10 -e -s 0 10 20 30 -B $<TARGET_FILE_BASE_NAME:causal-cpu-omni>")
|
||||
"-n 20 -e -s 0 10 20 30 -B $<TARGET_FILE_BASE_NAME:causal-cpu-omni>")
|
||||
|
||||
macro(
|
||||
causal_e2e_args_and_validation
|
||||
@@ -945,7 +1054,7 @@ omnitrace_add_causal_test(
|
||||
SKIP_BASELINE
|
||||
NAME cpu-omni-slow-func-e2e
|
||||
TARGET causal-cpu-omni
|
||||
RUN_ARGS 80 30 432525 200000000
|
||||
RUN_ARGS 80 12 432525 250000000
|
||||
CAUSAL_MODE "func"
|
||||
CAUSAL_ARGS ${_causal_slow_func_args}
|
||||
CAUSAL_VALIDATE_ARGS ${_causal_slow_func_valid}
|
||||
@@ -957,7 +1066,7 @@ omnitrace_add_causal_test(
|
||||
SKIP_BASELINE
|
||||
NAME cpu-omni-fast-func-e2e
|
||||
TARGET causal-cpu-omni
|
||||
RUN_ARGS 80 30 432525 200000000
|
||||
RUN_ARGS 80 12 432525 250000000
|
||||
CAUSAL_MODE "func"
|
||||
CAUSAL_ARGS ${_causal_fast_func_args}
|
||||
CAUSAL_VALIDATE_ARGS ${_causal_fast_func_valid}
|
||||
@@ -969,7 +1078,7 @@ omnitrace_add_causal_test(
|
||||
SKIP_BASELINE
|
||||
NAME cpu-omni-line-155-e2e
|
||||
TARGET causal-cpu-omni
|
||||
RUN_ARGS 80 30 432525 200000000
|
||||
RUN_ARGS 80 12 432525 250000000
|
||||
CAUSAL_MODE "line"
|
||||
CAUSAL_ARGS ${_causal_line_155_args}
|
||||
CAUSAL_VALIDATE_ARGS ${_causal_line_155_valid}
|
||||
@@ -981,7 +1090,7 @@ omnitrace_add_causal_test(
|
||||
SKIP_BASELINE
|
||||
NAME cpu-omni-line-165-e2e
|
||||
TARGET causal-cpu-omni
|
||||
RUN_ARGS 80 30 432525 200000000
|
||||
RUN_ARGS 80 12 432525 250000000
|
||||
CAUSAL_MODE "line"
|
||||
CAUSAL_ARGS ${_causal_line_165_args}
|
||||
CAUSAL_VALIDATE_ARGS ${_causal_line_165_valid}
|
||||
|
||||
@@ -164,6 +164,17 @@ set(_rccl_environment
|
||||
"${_test_openmp_env}"
|
||||
"${_test_library_path}")
|
||||
|
||||
set(_window_environment
|
||||
"OMNITRACE_USE_PERFETTO=ON"
|
||||
"OMNITRACE_USE_TIMEMORY=ON"
|
||||
"OMNITRACE_USE_SAMPLING=OFF"
|
||||
"OMNITRACE_USE_PROCESS_SAMPLING=OFF"
|
||||
"OMNITRACE_TIME_OUTPUT=OFF"
|
||||
"OMNITRACE_FILE_OUTPUT=ON"
|
||||
"OMNITRACE_VERBOSE=2"
|
||||
"${_test_openmp_env}"
|
||||
"${_test_library_path}")
|
||||
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
set(MPIEXEC_EXECUTABLE_ARGS)
|
||||
@@ -231,7 +242,7 @@ endif()
|
||||
|
||||
function(OMNITRACE_WRITE_TEST_CONFIG _FILE _ENV)
|
||||
set(_ENV_ONLY
|
||||
"OMNITRACE_(MODE|USE_MPIP|DEBUG_SETTINGS|FORCE_ROCPROFILER_INIT|DEFAULT_MIN_INSTRUCTIONS|COLORIZED_LOG)="
|
||||
"OMNITRACE_(MODE|USE_MPIP|DEBUG_SETTINGS|FORCE_ROCPROFILER_INIT|DEFAULT_MIN_INSTRUCTIONS|MONOCHROME)="
|
||||
)
|
||||
set(_FILE_CONTENTS)
|
||||
set(_ENV_CONTENTS)
|
||||
@@ -436,7 +447,7 @@ function(OMNITRACE_ADD_TEST)
|
||||
|
||||
set(_environ
|
||||
"OMNITRACE_DEFAULT_MIN_INSTRUCTIONS=64" "${TEST_ENVIRONMENT}"
|
||||
"OMNITRACE_OUTPUT_PATH=omnitrace-tests-output"
|
||||
"OMNITRACE_OUTPUT_PATH=${PROJECT_BINARY_DIR}/omnitrace-tests-output"
|
||||
"OMNITRACE_OUTPUT_PREFIX=${_prefix}")
|
||||
|
||||
set(_timeout ${TEST_REWRITE_TIMEOUT})
|
||||
@@ -575,7 +586,7 @@ function(OMNITRACE_ADD_CAUSAL_TEST)
|
||||
|
||||
set(_environ
|
||||
"${_causal_environment}"
|
||||
"OMNITRACE_OUTPUT_PATH=omnitrace-tests-output"
|
||||
"OMNITRACE_OUTPUT_PATH=${PROJECT_BINARY_DIR}/omnitrace-tests-output"
|
||||
"OMNITRACE_OUTPUT_PREFIX=${_prefix}"
|
||||
"OMNITRACE_CI=ON"
|
||||
"OMNITRACE_USE_PID=OFF"
|
||||
@@ -739,3 +750,146 @@ function(OMNITRACE_ADD_PYTHON_TEST)
|
||||
${_TEST_PROPERTIES})
|
||||
endforeach()
|
||||
endfunction()
|
||||
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# Find Python3 interpreter for output validation
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
if(NOT OMNITRACE_USE_PYTHON)
|
||||
find_package(Python3 QUIET COMPONENTS Interpreter)
|
||||
|
||||
if(Python3_FOUND)
|
||||
set(OMNITRACE_VALIDATION_PYTHON ${Python3_EXECUTABLE})
|
||||
execute_process(COMMAND ${Python3_EXECUTABLE} -c "import perfetto"
|
||||
RESULT_VARIABLE OMNITRACE_VALIDATION_PYTHON_PERFETTO)
|
||||
|
||||
if(NOT OMNITRACE_VALIDATION_PYTHON_PERFETTO EQUAL 0)
|
||||
omnitrace_message(AUTHOR_WARNING
|
||||
"Python3 found but perfetto support is disabled")
|
||||
endif()
|
||||
endif()
|
||||
else()
|
||||
set(_INDEX 0)
|
||||
foreach(_VERSION ${OMNITRACE_PYTHON_VERSIONS})
|
||||
if(NOT OMNITRACE_USE_PYTHON)
|
||||
continue()
|
||||
endif()
|
||||
|
||||
list(GET OMNITRACE_PYTHON_ROOT_DIRS ${_INDEX} _PYTHON_ROOT_DIR)
|
||||
|
||||
omnitrace_find_python(
|
||||
_PYTHON
|
||||
ROOT_DIR "${_PYTHON_ROOT_DIR}"
|
||||
COMPONENTS Interpreter)
|
||||
|
||||
if(_PYTHON_EXECUTABLE)
|
||||
set(OMNITRACE_VALIDATION_PYTHON ${_PYTHON_EXECUTABLE})
|
||||
execute_process(COMMAND ${_PYTHON_EXECUTABLE} -c "import perfetto"
|
||||
RESULT_VARIABLE OMNITRACE_VALIDATION_PYTHON_PERFETTO)
|
||||
|
||||
# prefer Python3 with perfetto support
|
||||
if(OMNITRACE_VALIDATION_PYTHON_PERFETTO EQUAL 0)
|
||||
break()
|
||||
else()
|
||||
omnitrace_message(
|
||||
AUTHOR_WARNING
|
||||
"${_PYTHON_EXECUTABLE} found but perfetto support is disabled")
|
||||
endif()
|
||||
endif()
|
||||
|
||||
math(EXPR _INDEX "${_INDEX} + 1")
|
||||
endforeach()
|
||||
endif()
|
||||
|
||||
if(NOT OMNITRACE_VALIDATION_PYTHON)
|
||||
omnitrace_message(AUTHOR_WARNING
|
||||
"Python3 interpreter not found. Validation tests will be disabled")
|
||||
endif()
|
||||
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
#
|
||||
# Output validation test function
|
||||
#
|
||||
# -------------------------------------------------------------------------------------- #
|
||||
|
||||
function(OMNITRACE_ADD_VALIDATION_TEST)
|
||||
|
||||
if(NOT OMNITRACE_VALIDATION_PYTHON)
|
||||
return()
|
||||
endif()
|
||||
|
||||
cmake_parse_arguments(
|
||||
TEST
|
||||
""
|
||||
"NAME;TIMEOUT;TIMEMORY_METRIC;TIMEMORY_FILE;PERFETTO_METRIC;PERFETTO_FILE"
|
||||
"ENVIRONMENT;LABELS;PROPERTIES;PASS_REGEX;FAIL_REGEX;SKIP_REGEX;DEPENDS;ARGS"
|
||||
${ARGN})
|
||||
|
||||
if(NOT TEST_TIMEOUT)
|
||||
set(TEST_TIMEOUT 30)
|
||||
endif()
|
||||
|
||||
set(PYTHON_EXECUTABLE "${OMNITRACE_VALIDATION_PYTHON}")
|
||||
|
||||
list(APPEND TEST_LABELS "validate")
|
||||
foreach(_DEP ${TEST_DEPENDS})
|
||||
list(APPEND TEST_LABELS "validate-${_DEP}")
|
||||
endforeach()
|
||||
|
||||
list(APPEND TEST_DEPENDS "${TEST_NAME}")
|
||||
|
||||
if(NOT TEST_PASS_REGEX)
|
||||
set(TEST_PASS_REGEX
|
||||
"omnitrace-tests-output/${TEST_NAME}/(${TEST_TIMEMORY_FILE}|${TEST_PERFETTO_FILE}) validated"
|
||||
)
|
||||
endif()
|
||||
|
||||
add_test(
|
||||
NAME validate-${TEST_NAME}-timemory
|
||||
COMMAND
|
||||
${OMNITRACE_VALIDATION_PYTHON}
|
||||
${CMAKE_CURRENT_LIST_DIR}/validate-timemory-json.py -m ${TEST_TIMEMORY_METRIC}
|
||||
${TEST_ARGS} -i
|
||||
${PROJECT_BINARY_DIR}/omnitrace-tests-output/${TEST_NAME}/${TEST_TIMEMORY_FILE}
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
|
||||
if(OMNITRACE_VALIDATION_PYTHON_PERFETTO EQUAL 0)
|
||||
add_test(
|
||||
NAME validate-${TEST_NAME}-perfetto
|
||||
COMMAND
|
||||
${OMNITRACE_VALIDATION_PYTHON}
|
||||
${CMAKE_CURRENT_LIST_DIR}/validate-perfetto-proto.py -m
|
||||
${TEST_PERFETTO_METRIC} ${TEST_ARGS} -i
|
||||
${PROJECT_BINARY_DIR}/omnitrace-tests-output/${TEST_NAME}/${TEST_PERFETTO_FILE}
|
||||
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
|
||||
endif()
|
||||
|
||||
foreach(_TEST validate-${TEST_NAME}-timemory validate-${TEST_NAME}-perfetto)
|
||||
|
||||
if(NOT TEST "${_TEST}")
|
||||
continue()
|
||||
endif()
|
||||
|
||||
set_tests_properties(
|
||||
${_TEST}
|
||||
PROPERTIES ENVIRONMENT
|
||||
"${_TEST_ENV}"
|
||||
TIMEOUT
|
||||
${TEST_TIMEOUT}
|
||||
LABELS
|
||||
"${TEST_LABELS}"
|
||||
DEPENDS
|
||||
"${TEST_DEPENDS};${TEST_NAME}"
|
||||
PASS_REGULAR_EXPRESSION
|
||||
"${TEST_PASS_REGEX}"
|
||||
FAIL_REGULAR_EXPRESSION
|
||||
"${TEST_FAIL_REGEX}"
|
||||
SKIP_REGULAR_EXPRESSION
|
||||
"${TEST_SKIP_REGEX}"
|
||||
REQUIRED_FILES
|
||||
"${TEST_FILE}"
|
||||
${TEST_PROPERTIES})
|
||||
endforeach()
|
||||
endfunction()
|
||||
|
||||
@@ -274,7 +274,6 @@ def compute_speedups(_data, args):
|
||||
|
||||
|
||||
def get_validations(args):
|
||||
|
||||
data = []
|
||||
_len = len(args.validate)
|
||||
if _len == 0:
|
||||
@@ -297,7 +296,6 @@ def get_validations(args):
|
||||
|
||||
|
||||
def main():
|
||||
|
||||
import argparse
|
||||
|
||||
parser = argparse.ArgumentParser()
|
||||
|
||||
@@ -42,6 +42,9 @@ if __name__ == "__main__":
|
||||
parser.add_argument(
|
||||
"-d", "--depths", nargs="+", type=int, help="Expected depths", default=[]
|
||||
)
|
||||
parser.add_argument(
|
||||
"-p", "--print", action="store_true", help="Print the processed perfetto data"
|
||||
)
|
||||
parser.add_argument("-i", "--input", type=str, help="Input file", required=True)
|
||||
|
||||
args = parser.parse_args()
|
||||
@@ -54,6 +57,19 @@ if __name__ == "__main__":
|
||||
ret = 0
|
||||
with open(args.input) as f:
|
||||
data = json.load(f)
|
||||
|
||||
# demo display of data
|
||||
if args.print:
|
||||
for itr in data["timemory"][args.metric]["ranks"][0]["graph"]:
|
||||
_prefix = itr["prefix"]
|
||||
_depth = itr["depth"]
|
||||
_count = itr["entry"]["laps"]
|
||||
_idx = _prefix.find(">>>")
|
||||
if _idx >= 0:  # str.find returns -1 (never None) when ">>>" is absent
|
||||
_prefix = _prefix[(_idx + 4) :]
|
||||
|
||||
print("| {:40} | {:6} | {:6} |".format(_prefix, _count, _depth))
|
||||
|
||||
try:
|
||||
validate_json(
|
||||
data["timemory"][args.metric]["ranks"][0]["graph"],
|
||||
|
||||