Global trace delay and duration (#235)

- The primary feature of this PR is the **addition of support for scoping the collection of tracing/profiling data into one or more time-based windows**
  - Closes #222 
  - Closes #207
  - Support for a real-clock time delay and/or a duration for tracing/profiling was added, *resembling the support for this feature during sampling and process-sampling*
  - However, the above paradigm was enhanced for tracing
    - Instead of one delay and/or one duration based on real time, ***tracing supports periodic and varying delays and durations and these delay+duration sets can be controlled with different clocks***  
    - At some point, this capability will be extended to sampling and process-sampling
- A secondary feature of this PR is the improved handling of categories (a by-product of the primary feature)
  - For example, setting `OMNITRACE_ENABLE_CATEGORIES` to a specific set of categories previously only eliminated the disabled categories from the perfetto trace; the selection is now applied to timemory profiles too
  - A new configuration variable `OMNITRACE_DISABLE_CATEGORIES` was added for when disabling only a handful of categories is easier
- There are quite a few miscellaneous modifications which pollute this PR a bit
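
The enable/disable category semantics described above can be sketched as follows. This is a minimal illustration rather than omnitrace's implementation: the helper name `effective_categories`, the behavior when both lists are given, and the sample category names are assumptions made for the example.

```python
def effective_categories(available, enable=None, disable=None):
    """Return the set of categories that remain active.

    Hypothetical helper: `enable` stands in for OMNITRACE_ENABLE_CATEGORIES and
    `disable` stands in for OMNITRACE_DISABLE_CATEGORIES.
    """
    available = set(available)
    # With an explicit enable list, only those categories are kept;
    # otherwise start from the full list of available categories.
    active = (set(enable) & available) if enable else set(available)
    # Assumption: when both lists are given, the disable list is applied last.
    return active - set(disable or ())


# Illustrative category names only
print(sorted(effective_categories(["mpi", "pthread", "hip"], disable=["hip"])))
```

Disabling is the more convenient form when only a handful of categories should be dropped, since the caller does not need to enumerate everything else.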

## Multiple Tracing Windows

As noted above, tracing now supports specifying multiple delays and durations _and_ using a different clock for each. Consider the configuration below with two entries in the format `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>`:

```console
OMNITRACE_TRACE_PERIODS = 0.5:1.0:2:realtime    10.0:5.0:3:cputime
```

The above configuration defines:
1. `0.5:1.0:2:realtime`
  - A delay of 0.5 seconds (real-time)
  - Followed by a data collection duration of 1 second (real-time)
  - This delay + duration is repeated 2x
  - Summary: tracing data is collected for 2 out of the first 3 seconds of the application's execution
2. `10.0:5.0:3:cputime`
  - A delay of 10 seconds (process _CPU-time_)
  - Followed by a data collection duration of 5 seconds (process _CPU-time_)
  - This delay + duration is repeated 3x
  - Summary: tracing data is collected for a total of 15 seconds of process CPU-time in the ensuing 45 seconds of CPU-time during the application's execution.
    - Note: the elapsed CPU-time is the aggregate of the CPU-time consumed by all the threads in the process and should be scaled accordingly, e.g., 4 threads running constantly for 1 second of real-time is ~4 seconds of CPU time. 
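
The arithmetic behind these summaries can be checked with a short sketch. It assumes fully specified entries (all four fields present); `summarize_period` and `approx_realtime` are hypothetical helper names, not part of omnitrace.

```python
def summarize_period(entry):
    """Total up one '<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>' entry."""
    delay, duration, repeat, clock = entry.split(":")
    delay, duration, repeat = float(delay), float(duration), int(repeat)
    collected = duration * repeat        # time during which data is collected
    span = (delay + duration) * repeat   # time covered by all delay + duration cycles
    return {"clock": clock, "collected": collected, "span": span}


def approx_realtime(cpu_seconds, busy_threads):
    """Rough real-time equivalent of process CPU-time, assuming every thread is constantly busy."""
    return cpu_seconds / busy_threads


for entry in "0.5:1.0:2:realtime 10.0:5.0:3:cputime".split():
    print(entry, summarize_period(entry))
```

Dividing by the thread count is only a rule of thumb: threads that sleep or block accumulate little CPU-time, so the real-time equivalent can be considerably longer.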

## `omnitrace-sample` Changes

Formerly, the `--wait` and `--duration` command-line options only applied to the sampling delay and duration. The values of these options are now also applied to the tracing delay and duration. To retain the ability to control the sampling delay/duration without setting the tracing delay/duration, or vice versa, the `--sampling-wait`, `--sampling-duration`, `--trace-wait`, and `--trace-duration` options were added. `omnitrace-sample` also has new options for most of the new configuration options detailed below.
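
The relationship between these options and the configuration variables they set can be summarized in a small sketch. The option-to-variable mapping follows the `omnitrace-sample` changes in this PR; the helper `env_updates` and its "later options override earlier ones" behavior are assumptions made for illustration.

```python
# Which configuration variables each omnitrace-sample option sets
OPTION_ENV = {
    "--wait": ("OMNITRACE_TRACE_DELAY", "OMNITRACE_SAMPLING_DELAY"),
    "--duration": ("OMNITRACE_TRACE_DURATION", "OMNITRACE_SAMPLING_DURATION"),
    "--trace-wait": ("OMNITRACE_TRACE_DELAY",),
    "--trace-duration": ("OMNITRACE_TRACE_DURATION",),
    "--sampling-wait": ("OMNITRACE_SAMPLING_DELAY",),
    "--sampling-duration": ("OMNITRACE_SAMPLING_DURATION",),
}


def env_updates(options):
    """Translate {option: seconds} into the variables that would be set.

    Hypothetical helper: options are applied in insertion order, so a later
    option overrides an earlier one that targets the same variable.
    """
    env = {}
    for option, seconds in options.items():
        for name in OPTION_ENV[option]:
            env[name] = str(seconds)
    return env


print(env_updates({"--wait": 2.0}))
print(env_updates({"--trace-wait": 1.5, "--sampling-duration": 10.0}))
```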

## New configuration options

| Option | Description |
| ------- | ----------- |
| `OMNITRACE_DISABLE_CATEGORIES` | Inverse of `OMNITRACE_ENABLE_CATEGORIES`: populates the list of all available categories and then removes the specified ones. |
| `OMNITRACE_TRACE_DELAY` | Single floating-point number specifying the time to wait before starting data collection. Analogous to `OMNITRACE_SAMPLING_DELAY` and `OMNITRACE_PROCESS_SAMPLING_DELAY`. |
| `OMNITRACE_TRACE_DURATION` | Single floating-point number specifying the data collection duration. Analogous to `OMNITRACE_SAMPLING_DURATION` and `OMNITRACE_PROCESS_SAMPLING_DURATION`. |
| `OMNITRACE_TRACE_PERIOD_CLOCK_ID` | Sets the default clock type for the tracing delay/duration. Always applied to the two options above; can be overridden per entry in the option below. Accepts `CLOCK_REALTIME`, `CLOCK_MONOTONIC`, `CLOCK_PROCESS_CPUTIME_ID`, `CLOCK_MONOTONIC_RAW`, `CLOCK_REALTIME_COARSE`, `CLOCK_MONOTONIC_COARSE`, `CLOCK_BOOTTIME`. See `man 2 clock_gettime` for details on the differences. |
| `OMNITRACE_TRACE_PERIODS` | More powerful version for specifying delay + duration. Supports formats: `<DELAY>`, `<DELAY>:<DURATION>`, `<DELAY>:<DURATION>:<REPEAT>`, and `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_ID>`.  |
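
The accepted `OMNITRACE_TRACE_PERIODS` formats and the clock naming can be illustrated with a minimal sketch. `clock_name` mirrors the name normalization used for the `omnitrace-sample` clock choices in this PR (lower-case, strip the `clock_` prefix, shorten `process_cputime_id` to `cputime`). `parse_period` is a hypothetical helper, and its defaults for omitted fields (`None` for an unbounded duration, a repeat count of 1) are assumptions, not documented omnitrace behavior.

```python
def clock_name(identifier):
    """Normalize e.g. 'CLOCK_PROCESS_CPUTIME_ID' to the short name 'cputime'."""
    name = identifier.lower()
    prefix = "clock_"
    if name.startswith(prefix):
        name = name[len(prefix):]
    if name == "process_cputime_id":
        name = "cputime"
    return name


def parse_period(entry, default_clock="CLOCK_REALTIME"):
    """Parse '<DELAY>[:<DURATION>[:<REPEAT>[:<CLOCK_ID>]]]' into a tuple."""
    fields = entry.split(":")
    if not 1 <= len(fields) <= 4:
        raise ValueError(f"expected 1 to 4 ':'-separated fields, got {entry!r}")
    delay = float(fields[0])
    # Assumption: a missing duration means "no upper bound"
    duration = float(fields[1]) if len(fields) > 1 else None
    # Assumption: a missing repeat count means a single cycle
    repeat = int(fields[2]) if len(fields) > 2 else 1
    # A missing clock falls back to the default (cf. OMNITRACE_TRACE_PERIOD_CLOCK_ID)
    clock = clock_name(fields[3] if len(fields) > 3 else default_clock)
    return delay, duration, repeat, clock


for entry in ("0.5", "0.5:1.0", "0.5:1.0:2", "10.0:5.0:3:cputime"):
    print(entry, "->", parse_period(entry))
```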

## Miscellaneous Changes

- Expanded `critical_trace_categories_t` to include tracing data from MPI, pthread, HIP, HSA, RCCL, NUMA, and Python.
- Added categories `thread_wall_time` and `thread_cpu_time` (derived from sampling)
- Read DWARF info for breakpoints
- Relocated some source code
  - Reason: necessary to make `libomnitrace` a bit more modular. Eventually, a large chunk will be separated into `libomnitrace-core`, `libomnitrace-binary`, etc. in order to facilitate re-usability
  - Relocated some functionality from `runtime.cpp` to `config.cpp`
  - Relocated code using rocm-smi library to query number of devices to `gpu.cpp` (where the code for using HIP to query number of devices is)
  - Relocated code for perfetto config and perfetto session out of tracing namespace to reside with other perfetto code
- `OMNITRACE_COLORIZED_LOG` configuration option renamed to `OMNITRACE_MONOCHROME`
  - Backwards compatibility via a deprecated option was not retained here since the logic changed (i.e. true in former means false in latter)
- Replaced `TIMEMORY_DEFAULT_OBJECT` macro with `OMNITRACE_DEFAULT_OBJECT` macro 
- Updated some code in roctracer to use `component::category_region` instead of explicitly using `tracing::` functions
- Updated `backtrace_metrics` to better support controlling their presence in the traces/profiles via categories
- Added support for `--print` in `validate-timemory-json.py`
- Generic `OMNITRACE_ADD_VALIDATION_TEST` CMake function
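
Because `OMNITRACE_MONOCHROME` inverts the meaning of the former `OMNITRACE_COLORIZED_LOG`, existing environments cannot simply rename the variable. The sketch below shows one way a user-side script might translate the old setting; `migrate_color_setting` and the set of accepted truthy strings are assumptions for illustration and are not part of omnitrace.

```python
# Assumption: strings treated as "true"; omnitrace's own parsing may differ
TRUTHY = {"1", "true", "on", "yes", "y", "t"}


def migrate_color_setting(env):
    """Return a copy of `env` with OMNITRACE_COLORIZED_LOG translated.

    colorized == true  -> OMNITRACE_MONOCHROME=0
    colorized == false -> OMNITRACE_MONOCHROME=1
    An explicit OMNITRACE_MONOCHROME entry is left untouched.
    """
    updated = dict(env)
    old = updated.pop("OMNITRACE_COLORIZED_LOG", None)
    if old is not None and "OMNITRACE_MONOCHROME" not in updated:
        colorized = old.strip().lower() in TRUTHY
        updated["OMNITRACE_MONOCHROME"] = "0" if colorized else "1"
    return updated


print(migrate_color_setting({"OMNITRACE_COLORIZED_LOG": "0"}))
```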

## Git Log

* OMNITRACE_DEFAULT_OBJECT

- replace TIMEMORY_DEFAULT_OBJECT with OMNITRACE_DEFAULT_OBJECT

* trace-time-window example + tests

- adds cmake OMNITRACE_ADD_VALIDATION_TEST function for testing
- validate-timemory-json.py now supports printing (-p)
- update to OMNITRACE_STRIP_TARGET

* Update timemory submodule

- detailed backtrace print /proc/<PID>/maps
- operation::push_node verbosity change
- storage::insert_hierarchy use emplace + at instead of operator[]
- concepts::is_type_listing
- argparse updates for start/end group
- argparse color fixes

* perfetto updates

- Remove OMNITRACE_CUSTOM_DATA_SOURCE CMake option
- move tracing::get_perfetto_config and tracing::get_perfetto_session to perfetto.cpp

* config and runtime updates

- OMNITRACE_DISABLE_CATEGORIES option
  - get_enabled_categories() + get_disabled_categories()
  - config impl handles populating them
- OMNITRACE_TRACE_DELAY option
- OMNITRACE_TRACE_DURATION option
- OMNITRACE_TRACE_PERIODS option
- {get,set}_signal_handler
  - removes config.cpp link dependency for omnitrace_finalize
- get_realtime_signal() + get_cputime_signal() + get_sampling_signals()
  - moved from runtime.cpp to config.cpp

* utility::convert

- helper function for converting string to a type

* pthread_create_gotcha + thread_info updates

- thread_index_data::as_string()
- tweak printing info about new thread / exited thread

* binary updates

- get_binary_info has arg to disable dwarf parsing
- binary_info contains vector of breakpoint addresses
- binary_info::filename() function
- binary::get_linked_path
- binary::get_link_map has args for dlopen mode
- symbol::read_dwarf -> symbol::read_dwarf_entries
- symbol::read_dwarf_breakpoints

* library updates + categories impl

- implement config::set_signal_handler
- categories.cpp for handling trace delays
  - implement trace delay/duration/periods

* concepts + debug + defines

- tuple_element in concepts
- removed runtime header from debug header
- OMNITRACE_DEFAULT_COPY_MOVE

* gpu + rocm_smi

- moved rsmi_num_monitor_devices call to gpu.cpp
  - gpu::rsmi_device_count()

* roctracer updates

- roctracer_bundle_t -> roctracer_hip_bundle_t
- use category_region instead of explicit tracing push/pop calls

* sampling + backtrace_metrics

- rework backtrace_metrics to support categories

* tracing updates

- category stack counters (i.e. push vs. pop counter) for profiling and tracing
- push_timemory and pop_timemory accept string_view instead of const char*
- tweaked the pop_timemory hash search
- {push,pop}_perfetto theoretically supports same invocations as for {push,pop}_perfetto_ts and {push,pop}_perfetto_track
- mark_perfetto, mark_perfetto_ts, mark_perfetto_track

* category_region update

- expanded the critical trace categories
- use category_push_disabled
- use category_pop_disabled
- use category_mark_disabled

* constraint implementation

- This provides generic functionality for constraining data collection within a window of time.
  - E.g., delay, delay + duration, (delay + duration) * nrepeat
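
The three constraint shapes listed above (delay, delay + duration, and repeated delay + duration cycles) can be expressed as one predicate on elapsed time. This is a minimal sketch of the idea, not the `constraint` implementation from this commit; the function name `is_collecting` and its signature are hypothetical.

```python
def is_collecting(elapsed, delay, duration=None, nrepeat=1):
    """Return True if data collection is active `elapsed` seconds after start.

    - delay only:                  active once the delay has passed
    - delay + duration:            active for `duration` seconds after the delay
    - (delay + duration) * nrepeat: the cycle above repeated `nrepeat` times
    """
    if elapsed < 0:
        return False
    if duration is None:
        return elapsed >= delay
    cycle = delay + duration
    if cycle <= 0 or elapsed >= cycle * nrepeat:
        return False  # all cycles have completed
    # Position within the current cycle: first the delay, then the collection window
    return (elapsed % cycle) >= delay


# Cycle of 0.5 s delay + 1.0 s duration, repeated twice
for t in (0.25, 1.0, 1.75, 2.5, 3.5):
    print(t, is_collecting(t, delay=0.5, duration=1.0, nrepeat=2))
```

The elapsed time can come from any clock (real time, process CPU-time, and so on); the predicate itself is clock-agnostic.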

* COLORIZED_LOG -> MONOCHROME

* constraint + omnitrace-causal + omnitrace-sample updates

- support for using different clock IDs for constraints
- OMNITRACE_TRACE_PERIOD_CLOCK_ID option
- tweak to trace-time-window example
- tweak to trace-time-window tests

* Fix formatting

* Update time-window tests

- Fix detection of validation support for perfetto
- Using the --caller-include feature + runtime instrumentation on Ubuntu 18.04 and OpenSUSE 15.2 results in a segfault in the internals of Dyninst.
  - For now, mark that these tests will fail
  - Later, determine if updating Dyninst submodule fixes this problem

* Fix OMNITRACE_OUTPUT_PATH for all tests

- Provide absolute path instead of relative

* Tweak lambda for checking whether HW counters are enabled

- causing strange build errors on older GCC compilers

* Update dyninst submodule

- fix issues with using --caller-include for Ubuntu 18.04, OpenSUSE 15.x

* cmake formatting

* fix sampling compiler issue for GCC 8

* Tweak thread create message

* Increase causal validation iterations

Jonathan R. Madsen
2023-02-03 14:10:42 -06:00
committed by GitHub
parent 2fb67c394b
commit 8feb6bf8b6
82 files changed with 2538 additions and 632 deletions
+16
@@ -215,6 +215,22 @@ parse:
DEFINITIONS: '*'
LINK_LIBRARIES: '*'
INCLUDE_DIRECTORIES: '*'
omnitrace_add_validation_test:
kwargs:
NAME: '*'
ARGS: '*'
LABELS: '*'
TIMEOUT: '*'
DEPENDS: '*'
PROPERTIES: '*'
PASS_REGEX: '*'
FAIL_REGEX: '*'
SKIP_REGEX: '*'
ENVIRONMENT: '*'
PERFETTO_FILE: '*'
PERFETTO_METRIC: '*'
TIMEMORY_FILE: '*'
TIMEMORY_METRIC: '*'
override_spec: {}
vartags: []
proptags: []
-4
@@ -116,8 +116,6 @@ if(CI_BUILD)
ADVANCED)
omnitrace_add_option(OMNITRACE_BUILD_DEBUG
"Enable building with extensive debug symbols" OFF ADVANCED)
omnitrace_add_option(OMNITRACE_CUSTOM_DATA_SOURCE "Enable custom data source" OFF
ADVANCED)
omnitrace_add_option(
OMNITRACE_BUILD_HIDDEN_VISIBILITY
"Build with hidden visibility (disable for Debug builds)" OFF ADVANCED)
@@ -131,8 +129,6 @@ else()
ADVANCED)
omnitrace_add_option(OMNITRACE_BUILD_DEBUG
"Enable building with extensive debug symbols" OFF ADVANCED)
omnitrace_add_option(OMNITRACE_CUSTOM_DATA_SOURCE "Enable custom data source" OFF
ADVANCED)
omnitrace_add_option(
OMNITRACE_BUILD_HIDDEN_VISIBILITY
"Build with hidden visibility (disable for Debug builds)" ON ADVANCED)
+49 -19
@@ -108,27 +108,57 @@ function(OMNITRACE_CAPITALIZE str var)
endfunction()
# ------------------------------------------------------------------------------#
# function omnitrace_strip_target()
# function omnitrace_strip_target(<TARGET> [FORCE] [EXPLICIT])
#
# Creates a target which runs ctest but depends on all the tests being built.
# Creates a post-build command which strips a binary. FORCE flag will override
#
function(OMNITRACE_STRIP_TARGET _TARGET)
if(CMAKE_STRIP AND OMNITRACE_STRIP_LIBRARIES)
add_custom_command(
TARGET ${_TARGET}
POST_BUILD
COMMAND
${CMAKE_STRIP} -w --keep-symbol="omnitrace_init"
--keep-symbol="omnitrace_finalize" --keep-symbol="omnitrace_push_trace"
--keep-symbol="omnitrace_pop_trace" --keep-symbol="omnitrace_push_region"
--keep-symbol="omnitrace_pop_region" --keep-symbol="omnitrace_set_env"
--keep-symbol="omnitrace_set_mpi" --keep-symbol="omnitrace_reset_preload"
--keep-symbol="omnitrace_user_*" --keep-symbol="ompt_start_tool"
--keep-symbol="kokkosp_*" --keep-symbol="OnLoad" --keep-symbol="OnUnload"
--keep-symbol="OnLoadToolProp" --keep-symbol="OnUnloadTool"
--keep-symbol="__libc_start_main" ${ARGN} $<TARGET_FILE:${_TARGET}>
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
COMMENT "Stripping ${_TARGET}...")
function(OMNITRACE_STRIP_TARGET)
cmake_parse_arguments(STRIP "FORCE;EXPLICIT" "" "ARGS" ${ARGN})
list(LENGTH STRIP_UNPARSED_ARGUMENTS NUM_UNPARSED)
if(NUM_UNPARSED EQUAL 1)
set(_TARGET "${STRIP_UNPARSED_ARGUMENTS}")
else()
omnitrace_message(FATAL_ERROR
"omnitrace_strip_target cannot deduce target from \"${ARGN}\"")
endif()
if(NOT TARGET "${_TARGET}")
omnitrace_message(
FATAL_ERROR
"omnitrace_strip_target not provided valid target: \"${_TARGET}\"")
endif()
if(CMAKE_STRIP AND (STRIP_FORCE OR OMNITRACE_STRIP_LIBRARIES))
if(STRIP_EXPLICIT)
add_custom_command(
TARGET ${_TARGET}
POST_BUILD
COMMAND ${CMAKE_STRIP} ${STRIP_ARGS} $<TARGET_FILE:${_TARGET}>
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
COMMENT "Stripping ${_TARGET}...")
else()
add_custom_command(
TARGET ${_TARGET}
POST_BUILD
COMMAND
${CMAKE_STRIP} -w --keep-symbol="omnitrace_init"
--keep-symbol="omnitrace_finalize"
--keep-symbol="omnitrace_push_trace"
--keep-symbol="omnitrace_pop_trace"
--keep-symbol="omnitrace_push_region"
--keep-symbol="omnitrace_pop_region" --keep-symbol="omnitrace_set_env"
--keep-symbol="omnitrace_set_mpi"
--keep-symbol="omnitrace_reset_preload"
--keep-symbol="omnitrace_user_*" --keep-symbol="ompt_start_tool"
--keep-symbol="kokkosp_*" --keep-symbol="OnLoad"
--keep-symbol="OnUnload" --keep-symbol="OnLoadToolProp"
--keep-symbol="OnUnloadTool" --keep-symbol="__libc_start_main"
${STRIP_ARGS} $<TARGET_FILE:${_TARGET}>
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
COMMENT "Stripping ${_TARGET}...")
endif()
endif()
endfunction()
+1
@@ -53,3 +53,4 @@ add_subdirectory(lulesh)
add_subdirectory(rccl)
add_subdirectory(rewrite-caller)
add_subdirectory(causal)
add_subdirectory(trace-time-window)
+24
@@ -0,0 +1,24 @@
cmake_minimum_required(VERSION 3.15 FATAL_ERROR)
project(omnitrace-trace-time-window-example LANGUAGES CXX)
if(OMNITRACE_DISABLE_EXAMPLES)
get_filename_component(_DIR ${CMAKE_CURRENT_LIST_DIR} NAME)
if(${PROJECT_NAME} IN_LIST OMNITRACE_DISABLE_EXAMPLES OR ${_DIR} IN_LIST
OMNITRACE_DISABLE_EXAMPLES)
return()
endif()
endif()
set(CMAKE_BUILD_TYPE "Debug")
add_executable(trace-time-window trace-time-window.cpp)
target_compile_options(trace-time-window PRIVATE ${_FLAGS})
if(OMNITRACE_INSTALL_EXAMPLES)
install(
TARGETS trace-time-window
DESTINATION bin
COMPONENT omnitrace-examples)
endif()
@@ -0,0 +1,80 @@
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <ratio>
#include <string>
#include <thread>
#define NOINLINE __attribute__((noinline))
NOINLINE size_t
inner();
NOINLINE size_t
outer_a();
NOINLINE size_t
outer_b();
NOINLINE size_t
outer_c();
NOINLINE size_t
outer_d();
NOINLINE size_t
outer_e();
int
main(int argc, char** argv)
{
int nrepeat = 1;
if(argc > 1) nrepeat = atol(argv[1]);
std::string _name = argv[0];
auto _pos = _name.find_last_of('/');
if(_pos != std::string::npos) _name = _name.substr(_pos + 1);
size_t nitr = 0;
for(int i = 0; i < nrepeat; ++i)
{
nitr += outer_a();
nitr += outer_b();
nitr += outer_c();
nitr += outer_d();
nitr += outer_e();
printf("[%s][%i] number of calls made = %zu\n", _name.c_str(), i, nitr);
}
}
size_t
inner(size_t _duration)
{
static int64_t _n = 0;
if(_n++ % 5 == 2)
{
using clock_type = std::chrono::high_resolution_clock;
auto _end = clock_type::now() + std::chrono::milliseconds{ _duration };
size_t nitr = 0;
while(clock_type::now() < _end)
{
++nitr;
}
return nitr;
}
else
{
std::this_thread::sleep_for(std::chrono::milliseconds{ _duration });
return 1;
}
}
#define OUTER_FUNCTION(TAG) \
size_t outer_##TAG() { return inner(500); }
OUTER_FUNCTION(a)
OUTER_FUNCTION(b)
OUTER_FUNCTION(c)
OUTER_FUNCTION(d)
OUTER_FUNCTION(e)
-3
@@ -19,7 +19,6 @@ def which(cmd, require):
def generate_custom(args, cmake_args, ctest_args):
if not os.path.exists(args.binary_dir):
os.makedirs(args.binary_dir)
@@ -74,7 +73,6 @@ def generate_custom(args, cmake_args, ctest_args):
def generate_dashboard_script(args):
CODECOV = 1 if args.coverage else 0
DASHBOARD_MODE = args.mode
SOURCE_DIR = os.path.realpath(args.source_dir)
@@ -244,7 +242,6 @@ def run(*args, **kwargs):
if __name__ == "__main__":
args, cmake_args, ctest_args = parse_args()
if not os.path.exists(args.binary_dir):
+13 -20
@@ -58,7 +58,7 @@ namespace console = ::tim::utility::console;
namespace argparse = ::tim::argparse;
using namespace timemory::join;
using tim::get_env;
using tim::log::colorized;
using tim::log::monochrome;
using tim::log::stream;
namespace std
@@ -535,15 +535,6 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
exit(EXIT_FAILURE);
});
auto _add_separator = [&](std::string _v, const std::string& _desc) {
parser.add_argument({ "" }, "");
parser
.add_argument({ join("", "[", _v, "]") },
(_desc.empty()) ? _desc : join({ "", "(", ")" }, _desc))
.color(color::info());
parser.add_argument({ "" }, "");
};
parser.enable_help();
parser.enable_version("omnitrace-causal", "v" OMNITRACE_VERSION_STRING,
OMNITRACE_GIT_DESCRIBE, OMNITRACE_GIT_REVISION);
@@ -553,16 +544,16 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
parser.set_description_width(
std::min<int>(_cols - parser.get_help_width() - 8, 120));
_add_separator("DEBUG OPTIONS", "");
parser.start_group("DEBUG OPTIONS", "");
parser.add_argument({ "--monochrome" }, "Disable colorized output")
.max_count(1)
.dtype("bool")
.action([&](parser_t& p) {
auto _colorized = !p.get<bool>("monochrome");
colorized() = _colorized;
p.set_use_color(_colorized);
update_env(_env, "OMNITRACE_COLORIZED_LOG", (_colorized) ? "1" : "0");
update_env(_env, "COLORIZED_LOG", (_colorized) ? "1" : "0");
auto _monochrome = p.get<bool>("monochrome");
monochrome() = _monochrome;
p.set_use_color(!_monochrome);
update_env(_env, "OMNITRACE_MONOCHROME", (_monochrome) ? "1" : "0");
update_env(_env, "MONOCHROME", (_monochrome) ? "1" : "0");
});
parser.add_argument({ "--debug" }, "Debug output")
.max_count(1)
@@ -582,7 +573,7 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
bool _generate_configs = false;
bool _add_defaults = true;
_add_separator("GENERAL OPTIONS", "");
parser.start_group("GENERAL OPTIONS", "");
parser.add_argument({ "-c", "--config" }, "Base configuration file")
.min_count(0)
.dtype("filepath")
@@ -629,8 +620,8 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
.dtype("bool")
.action([&](parser_t& p) { _add_defaults = !p.get<bool>("no-defaults"); });
_add_separator("CAUSAL PROFILING OPTIONS (General)",
"These settings will be applied to all causal profiling runs");
parser.start_group("CAUSAL PROFILING OPTIONS (General)",
"These settings will be applied to all causal profiling runs");
parser.add_argument({ "-m", "--mode" }, "Causal profiling mode")
.count(1)
.dtype("string")
@@ -706,7 +697,7 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
.dtype("int")
.action([&](parser_t& p) { _niterations = p.get<int64_t>("iterations"); });
_add_separator(
parser.start_group(
"CAUSAL PROFILING OPTIONS (Combinatorial)",
"Each individual argument to these options will multiply the number runs by the "
"number of arguments and the number of iterations. E.g. -n 2 -B \"MAIN\" -F "
@@ -804,6 +795,8 @@ parse_args(int argc, char** argv, std::vector<char*>& _env,
_function_excludes = p.get<std::vector<std::string>>("function-exclude");
});
parser.end_group();
#if OMNITRACE_HIP_VERSION > 0 && OMNITRACE_HIP_VERSION < 50300
update_env(_env, "HSA_ENABLE_INTERRUPT", 0);
#endif
+138 -33
@@ -54,14 +54,47 @@
namespace color = tim::log::color;
using namespace timemory::join;
using tim::get_env;
using tim::log::colorized;
using tim::log::monochrome;
using tim::log::stream;
namespace
{
int verbose = 0;
auto updated_envs = std::set<std::string_view>{};
auto original_envs = std::set<std::string>{};
int verbose = 0;
auto updated_envs = std::set<std::string_view>{};
auto original_envs = std::set<std::string>{};
auto clock_id_choices = []() {
auto clock_name = [](std::string _v) {
constexpr auto _clock_prefix = std::string_view{ "clock_" };
for(auto& itr : _v)
itr = tolower(itr);
auto _pos = _v.find(_clock_prefix);
if(_pos == 0) _v = _v.substr(_pos + _clock_prefix.length());
if(_v == "process_cputime_id") _v = "cputime";
return _v;
};
#define OMNITRACE_CLOCK_IDENTIFIER(VAL) \
std::make_tuple(clock_name(#VAL), VAL, std::string_view{ #VAL })
auto _choices = std::vector<std::string>{};
auto _aliases = std::map<std::string, std::vector<std::string>>{};
for(auto itr : { OMNITRACE_CLOCK_IDENTIFIER(CLOCK_REALTIME),
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_MONOTONIC),
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_PROCESS_CPUTIME_ID),
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_MONOTONIC_RAW),
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_REALTIME_COARSE),
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_MONOTONIC_COARSE),
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_BOOTTIME) })
{
auto _choice = std::to_string(std::get<1>(itr));
_choices.emplace_back(_choice);
_aliases[_choice] = { std::get<0>(itr), std::string{ std::get<2>(itr) } };
}
#undef OMNITRACE_CLOCK_IDENTIFIER
return std::make_pair(_choices, _aliases);
}();
} // namespace
std::string
@@ -329,15 +362,7 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
%{INDENT}%- discard : new data is ignored
%{INDENT}%- ring_buffer : new data overwrites oldest data)";
auto _add_separator = [&](std::string _v, const std::string& _desc) {
parser.add_argument({ "" }, "");
parser
.add_argument({ join("", "[", _v, "]") },
(_desc.empty()) ? _desc : join({ "", "(", ")" }, _desc))
.color(tim::log::color::info());
parser.add_argument({ "" }, "");
};
parser.set_use_color(true);
parser.enable_help();
parser.enable_version("omnitrace-sample", "v" OMNITRACE_VERSION_STRING,
OMNITRACE_GIT_DESCRIBE, OMNITRACE_GIT_REVISION);
@@ -347,16 +372,16 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
parser.set_description_width(
std::min<int>(_cols - parser.get_help_width() - 8, 120));
_add_separator("DEBUG OPTIONS", "");
parser.start_group("DEBUG OPTIONS", "");
parser.add_argument({ "--monochrome" }, "Disable colorized output")
.max_count(1)
.dtype("bool")
.action([&](parser_t& p) {
auto _colorized = !p.get<bool>("monochrome");
colorized() = _colorized;
p.set_use_color(_colorized);
update_env(_env, "OMNITRACE_COLORIZED_LOG", (_colorized) ? "1" : "0");
update_env(_env, "COLORIZED_LOG", (_colorized) ? "1" : "0");
auto _monochrome = p.get<bool>("monochrome");
monochrome() = _monochrome;
p.set_use_color(!_monochrome);
update_env(_env, "OMNITRACE_MONOCHROME", (_monochrome) ? "1" : "0");
update_env(_env, "MONOCHROME", (_monochrome) ? "1" : "0");
});
parser.add_argument({ "--debug" }, "Debug output")
.max_count(1)
@@ -371,7 +396,8 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
update_env(_env, "OMNITRACE_VERBOSE", _v);
});
_add_separator("GENERAL OPTIONS", "");
parser.start_group("GENERAL OPTIONS",
"These are options which are ubiquitously applied");
parser.add_argument({ "-c", "--config" }, "Configuration file")
.min_count(0)
.dtype("filepath")
@@ -437,8 +463,28 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
update_env(_env, "OMNITRACE_USE_PROCESS_SAMPLING", _h || _d);
update_env(_env, "OMNITRACE_USE_ROCM_SMI", _d);
});
parser
.add_argument({ "-w", "--wait" },
"This option is a combination of '--trace-wait' and "
"'--sampling-wait'. See the descriptions for those two options.")
.count(1)
.action([&](parser_t& p) {
update_env(_env, "OMNITRACE_TRACE_DELAY", p.get<double>("wait"));
update_env(_env, "OMNITRACE_SAMPLING_DELAY", p.get<double>("wait"));
});
parser
.add_argument(
{ "-d", "--duration" },
"This option is a combination of '--trace-duration' and "
"'--sampling-duration'. See the descriptions for those two options.")
.count(1)
.action([&](parser_t& p) {
update_env(_env, "OMNITRACE_TRACE_DURATION", p.get<double>("duration"));
update_env(_env, "OMNITRACE_SAMPLING_DURATION", p.get<double>("duration"));
});
_add_separator("TRACING OPTIONS", "");
parser.start_group("TRACING OPTIONS", "Specific options controlling tracing (i.e. "
"deterministic measurements of every event)");
parser
.add_argument({ "--trace-file" },
"Specify the trace output filename. Relative filepath will be with "
@@ -464,8 +510,57 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
update_env(_env, "OMNITRACE_PERFETTO_FILL_POLICY",
p.get<std::string>("trace-fill-policy"));
});
parser
.add_argument({ "--trace-wait" },
"Set the wait time (in seconds) "
"before collecting trace and/or profiling data"
"(in seconds). By default, the duration is in seconds of realtime "
"but that can changed via --trace-clock-id.")
.count(1)
.action([&](parser_t& p) {
update_env(_env, "OMNITRACE_TRACE_DELAY", p.get<double>("trace-wait"));
});
parser
.add_argument({ "--trace-duration" },
"Set the duration of the trace and/or profile data collection (in "
"seconds). By default, the duration is in seconds of realtime but "
"that can changed via --trace-clock-id.")
.count(1)
.action([&](parser_t& p) {
update_env(_env, "OMNITRACE_TRACE_DURATION", p.get<double>("trace-duration"));
});
parser
.add_argument(
{ "--trace-periods" },
"More powerful version of specifying trace delay and/or duration. Format is "
"one or more groups of: <DELAY>:<DURATION>, <DELAY>:<DURATION>:<REPEAT>, "
"and/or <DELAY>:<DURATION>:<REPEAT>:<CLOCK_ID>.")
.min_count(1)
.action([&](parser_t& p) {
update_env(_env, "OMNITRACE_TRACE_PERIODS",
join(array_config{ ",", "", "" },
p.get<std::vector<std::string>>("trace-periods")));
});
parser
.add_argument(
{ "--trace-clock-id" },
"Set the default clock ID for for trace delay/duration. Note: \"cputime\" is "
"the *process* CPU time and might need to be scaled based on the number of "
"threads, i.e. 4 seconds of CPU-time for an application with 4 fully active "
"threads would equate to ~1 second of realtime. If this proves to be "
"difficult to handle in practice, please file a feature request for "
"omnitrace to auto-scale based on the number of threads.")
.count(1)
.action([&](parser_t& p) {
update_env(_env, "OMNITRACE_TRACE_PERIOD_CLOCK_ID",
p.get<double>("trace-clock-id"));
})
.choices(clock_id_choices.first)
.choice_aliases(clock_id_choices.second);
_add_separator("PROFILE OPTIONS", "");
parser.start_group("PROFILE OPTIONS",
"Specific options controlling profiling (i.e. deterministic "
"measurements which are aggregated into a summary)");
parser.add_argument({ "--profile-format" }, "Data formats for profiling results")
.min_count(1)
.max_count(3)
@@ -496,7 +591,10 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
if(_v.size() > 1) update_env(_env, "OMNITRACE_INPUT_PREFIX", _v.at(1));
});
_add_separator("HOST/DEVICE (PROCESS SAMPLING) OPTIONS", "");
parser.start_group(
"HOST/DEVICE (PROCESS SAMPLING) OPTIONS",
"Process sampling is background measurements for resources available to the "
"entire process. These samples are not tied to specific lines/regions of code");
parser
.add_argument({ "--process-freq" },
"Set the default host/device sampling frequency "
@@ -545,7 +643,8 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
join(array_config{ "," }, p.get<std::vector<std::string>>("gpus")));
});
_add_separator("GENERAL SAMPLING OPTIONS", "");
parser.start_group("GENERAL SAMPLING OPTIONS",
"General options for timer-based sampling per-thread");
parser
.add_argument({ "-f", "--freq" }, "Set the default sampling frequency "
"(number of interrupts per second)")
@@ -555,23 +654,24 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
});
parser
.add_argument(
{ "-w", "--wait" },
{ "--sampling-wait" },
"Set the default wait time (i.e. delay) before taking first sample "
"(in seconds). This delay time is based on the clock of the sampler, i.e., a "
"delay of 1 second for CPU-clock sampler may not equal 1 second of realtime")
.count(1)
.action([&](parser_t& p) {
update_env(_env, "OMNITRACE_SAMPLING_DELAY", p.get<double>("wait"));
update_env(_env, "OMNITRACE_SAMPLING_DELAY", p.get<double>("sampling-wait"));
});
parser
.add_argument(
{ "-d", "--duration" },
{ "--sampling-duration" },
"Set the duration of the sampling (in seconds of realtime). I.e., it is "
"possible (currently) to set a CPU-clock time delay that exceeds the "
"real-time duration... resulting in zero samples being taken")
.count(1)
.action([&](parser_t& p) {
update_env(_env, "OMNITRACE_SAMPLING_DURATION", p.get<double>("duration"));
update_env(_env, "OMNITRACE_SAMPLING_DURATION",
p.get<double>("sampling-duration"));
});
parser
.add_argument({ "-t", "--tids" },
@@ -584,7 +684,9 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
join(array_config{ ", " }, p.get<std::vector<int64_t>>("tids")));
});
_add_separator("SAMPLING TIMER OPTIONS", "");
parser.start_group(
"SAMPLING TIMER OPTIONS",
"These options determine the heuristic for deciding when to take a sample");
parser.add_argument({ "--cputime" }, _cputime_desc)
.min_count(0)
.action([&](parser_t& p) {
@@ -660,8 +762,9 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
_backend_choices.erase("rocprofiler");
#endif
_add_separator("BACKEND OPTIONS", "These options control region information captured "
"w/o sampling or instrumentation");
parser.start_group("BACKEND OPTIONS",
"These options control region information captured "
"w/o sampling or instrumentation");
parser.add_argument({ "-I", "--include" }, "Include data from these backends")
.choices(_backend_choices)
.action([&](parser_t& p) {
@@ -727,7 +830,7 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
remove_env(_env, "KOKKOS_PROFILE_LIBRARY");
});
_add_separator("HARDWARE COUNTER OPTIONS", "");
parser.start_group("HARDWARE COUNTER OPTIONS", "See also: omnitrace-avail -H");
parser
.add_argument({ "-C", "--cpu-events" },
"Set the CPU hardware counter events to record (ref: "
@@ -750,7 +853,7 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
});
#endif
_add_separator("MISCELLANEOUS OPTIONS", "");
parser.start_group("MISCELLANEOUS OPTIONS", "");
parser
.add_argument({ "-i", "--inlines" },
"Include inline info in output when available")
@@ -768,6 +871,8 @@ parse_args(int argc, char** argv, std::vector<char*>& _env)
update_env(_env, "HSA_ENABLE_INTERRUPT", p.get<int>("hsa-interrupt"));
});
parser.end_group();
auto _inpv = std::vector<char*>{};
auto _outv = std::vector<char*>{};
bool _hash = false;
+1
@@ -154,6 +154,7 @@ for pref in preferences:
from recommonmark.transform import AutoStructify
# app setup hook
def setup(app):
app.add_config_value(
+2 -2
@@ -550,8 +550,8 @@ extern "C"
{
void omnitrace_preinit_library(void)
{
if(!omnitrace::common::get_env("OMNITRACE_COLORIZED_LOG", tim::log::colorized()))
tim::log::colorized() = false;
if(omnitrace::common::get_env("OMNITRACE_MONOCHROME", tim::log::monochrome()))
tim::log::monochrome() = true;
}
int omnitrace_preload_library(void)
@@ -75,6 +75,8 @@ extern "C"
OMNITRACE_CATEGORY_PROCESS_PAGE_FAULT,
OMNITRACE_CATEGORY_PROCESS_USER_MODE_TIME,
OMNITRACE_CATEGORY_PROCESS_KERNEL_MODE_TIME,
OMNITRACE_CATEGORY_THREAD_WALL_TIME,
OMNITRACE_CATEGORY_THREAD_CPU_TIME,
OMNITRACE_CATEGORY_THREAD_PAGE_FAULT,
OMNITRACE_CATEGORY_THREAD_PEAK_MEMORY,
OMNITRACE_CATEGORY_THREAD_CONTEXT_SWITCH,
+2 -6
@@ -15,12 +15,8 @@ target_include_directories(
omnitrace-interface-library INTERFACE ${CMAKE_CURRENT_SOURCE_DIR}
${CMAKE_CURRENT_BINARY_DIR})
target_compile_definitions(
omnitrace-interface-library
INTERFACE
OMNITRACE_MAX_THREADS=${OMNITRACE_MAX_THREADS}
$<BUILD_INTERFACE:$<IF:$<BOOL:${OMNITRACE_CUSTOM_DATA_SOURCE}>,CUSTOM_DATA_SOURCE,>>
)
target_compile_definitions(omnitrace-interface-library
INTERFACE OMNITRACE_MAX_THREADS=${OMNITRACE_MAX_THREADS})
target_link_libraries(
omnitrace-interface-library
+39 -13
@@ -20,12 +20,13 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
// clang-format off
#include <timemory/log/color.hpp>
// clang-format on
//
// above should always be included first
//
#include "api.hpp"
#include "common/setup.hpp"
#include "library/categories.hpp"
#include "library/causal/data.hpp"
#include "library/causal/experiment.hpp"
#include "library/causal/sampling.hpp"
@@ -37,6 +38,7 @@
#include "library/components/rocprofiler.hpp"
#include "library/concepts.hpp"
#include "library/config.hpp"
#include "library/constraint.hpp"
#include "library/coverage.hpp"
#include "library/critical_trace.hpp"
#include "library/debug.hpp"
@@ -56,24 +58,25 @@
#include "library/utility.hpp"
#include "omnitrace/categories.h" // in omnitrace-user
#include <timemory/process/threading.hpp>
#include <timemory/signals/signal_handlers.hpp>
#include <timemory/signals/types.hpp>
#include <timemory/hash/types.hpp>
#include <timemory/manager/manager.hpp>
#include <timemory/mpl/type_traits.hpp>
#include <timemory/operations/types/file_output_message.hpp>
#include <timemory/signals/signal_mask.hpp>
#include <timemory/process/threading.hpp>
#include <timemory/settings/types.hpp>
#include <timemory/signals/signal_handlers.hpp>
#include <timemory/signals/signal_mask.hpp>
#include <timemory/signals/types.hpp>
#include <timemory/utility/backtrace.hpp>
#include <timemory/utility/procfs/maps.hpp>
#include <atomic>
#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <stdexcept>
#include <string_view>
#include <utility>
#include <cstdlib>
#include <stdexcept>
using namespace omnitrace;
@@ -122,9 +125,18 @@ ensure_initialization(bool _offset, int64_t _glob_n, int64_t _offset_n)
return _offset;
}
void
finalization_handler()
{
if(get_state() == State::Active) omnitrace_finalize();
}
auto
ensure_finalization(bool _static_init = false)
{
if(config::set_signal_handler(nullptr) == nullptr)
config::set_signal_handler(&finalization_handler);
if(_static_init)
{
auto _idx = threading::add_callback(&ensure_initialization);
@@ -132,6 +144,12 @@ ensure_finalization(bool _static_init = false)
throw exception<std::runtime_error>("failure adding threading callback");
}
OMNITRACE_CI_BASIC_THROW(
config::set_signal_handler(nullptr) != &finalization_handler,
"Assignment of signal handler failed. signal handler is %s, expected %s\n",
as_hex(reinterpret_cast<void*>(config::set_signal_handler(nullptr))).c_str(),
as_hex(reinterpret_cast<void*>(&finalization_handler)).c_str());
const auto& _info = thread_info::init();
const auto& _tid = _info->index_data;
if(_tid)
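
The hunk above calls `config::set_signal_handler(nullptr)` both to check whether a handler is already registered and, later, to verify that the registered handler is `finalization_handler`. The sketch below illustrates that query-or-set pattern under the assumption that passing `nullptr` only reads the stored handler. `set_handler`, `on_finalize`, and `ensure_handler` are hypothetical names, not the omnitrace functions.

```cpp
#include <atomic>

using handler_t = void (*)();

// Hypothetical stand-in for config::set_signal_handler. Assumption: passing
// nullptr only queries the stored handler; passing a function stores it.
inline handler_t
set_handler(handler_t _new)
{
    static std::atomic<handler_t> _stored{ nullptr };
    if(_new != nullptr) _stored.store(_new);
    return _stored.load();
}

// Counter used only to observe that the handler ran.
inline int&
finalize_count()
{
    static int _n = 0;
    return _n;
}

inline void
on_finalize()
{
    ++finalize_count();
}

// Install the handler only if none is registered, then report whether the
// stored handler is the expected one (mirrors the CI check in the diff).
inline bool
ensure_handler()
{
    if(set_handler(nullptr) == nullptr) set_handler(&on_finalize);
    return set_handler(nullptr) == &on_finalize;
}
```

Installing only when nothing is registered leaves a handler supplied by another component in place, in which case `ensure_handler()` returns `false` instead of overwriting it.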
@@ -144,7 +162,7 @@ ensure_finalization(bool _static_init = false)
_tid->system_value);
}
if(!get_env("OMNITRACE_COLORIZED_LOG", true)) tim::log::colorized() = false;
if(get_env("OMNITRACE_MONOCHROME", false)) tim::log::monochrome() = true;
(void) tim::manager::instance();
(void) tim::settings::shared_instance();
@@ -192,7 +210,7 @@ struct fini_bundle
{
using data_type = std::tuple<Tp...>;
TIMEMORY_DEFAULT_OBJECT(fini_bundle)
OMNITRACE_DEFAULT_OBJECT(fini_bundle)
fini_bundle(std::string_view _label)
: m_label{ _label }
@@ -400,7 +418,7 @@ omnitrace_init_library_hidden()
extern "C" bool
omnitrace_init_tooling_hidden()
{
if(!get_env("OMNITRACE_COLORIZED_LOG", true, false)) tim::log::colorized() = false;
if(get_env("OMNITRACE_MONOCHROME", false, false)) tim::log::monochrome() = true;
if(!tim::get_env("OMNITRACE_INIT_TOOLING", true))
{
@@ -538,6 +556,8 @@ omnitrace_init_tooling_hidden()
omnitrace::perfetto::start();
}
categories::setup();
// if static objects are destroyed in the inverse order of when they are
// created this should ensure that finalization is called before perfetto
// ends the tracing session
@@ -701,6 +721,10 @@ omnitrace_finalize_hidden(void)
push_enable_sampling_on_child_threads(false);
set_sampling_on_all_future_threads(false);
// if the categories are not enabled, it can/will suppress generating output for data
// in category
categories::enable_categories();
auto _debug_init = get_debug_finalize();
auto _debug_value = get_debug();
if(_debug_init) config::set_setting_value("OMNITRACE_DEBUG", true);
@@ -951,7 +975,7 @@ omnitrace_finalize_hidden(void)
bool _perfetto_output_error = false;
if(get_use_perfetto() && !is_system_backend())
{
auto& tracing_session = tracing::get_perfetto_session();
auto& tracing_session = get_perfetto_session();
OMNITRACE_CI_THROW(tracing_session == nullptr,
"Null pointer to the tracing session");
@@ -1061,6 +1085,8 @@ omnitrace_finalize_hidden(void)
"omnitrace", _cfg);
}
categories::shutdown();
_finalization.stop();
if(_perfetto_output_error)
@@ -3,7 +3,9 @@ configure_file(${CMAKE_CURRENT_SOURCE_DIR}/defines.hpp.in
${CMAKE_CURRENT_BINARY_DIR}/defines.hpp @ONLY)
set(library_sources
${CMAKE_CURRENT_LIST_DIR}/categories.cpp
${CMAKE_CURRENT_LIST_DIR}/config.cpp
${CMAKE_CURRENT_LIST_DIR}/constraint.cpp
${CMAKE_CURRENT_LIST_DIR}/coverage.cpp
${CMAKE_CURRENT_LIST_DIR}/cpu_freq.cpp
${CMAKE_CURRENT_LIST_DIR}/critical_trace.cpp
@@ -31,6 +33,7 @@ set(library_headers
${CMAKE_CURRENT_LIST_DIR}/common.hpp
${CMAKE_CURRENT_LIST_DIR}/concepts.hpp
${CMAKE_CURRENT_LIST_DIR}/config.hpp
${CMAKE_CURRENT_LIST_DIR}/constraint.hpp
${CMAKE_CURRENT_LIST_DIR}/coverage.hpp
${CMAKE_CURRENT_LIST_DIR}/cpu_freq.hpp
${CMAKE_CURRENT_LIST_DIR}/critical_trace.hpp
@@ -39,7 +39,7 @@ struct address_multirange
struct coarse
{};
TIMEMORY_DEFAULT_OBJECT(address_multirange)
OMNITRACE_DEFAULT_OBJECT(address_multirange)
address_multirange& operator+=(std::pair<coarse, uintptr_t>&&);
address_multirange& operator+=(std::pair<coarse, address_range>&& _v);
@@ -43,7 +43,7 @@ struct address_range
uintptr_t low = std::numeric_limits<uintptr_t>::max();
uintptr_t high = std::numeric_limits<uintptr_t>::min();
TIMEMORY_DEFAULT_OBJECT(address_range)
OMNITRACE_DEFAULT_OBJECT(address_range)
explicit address_range(uintptr_t _v);
address_range(uintptr_t _low, uintptr_t _high);
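
Several hunks in this diff replace `TIMEMORY_DEFAULT_OBJECT(...)` with `OMNITRACE_DEFAULT_OBJECT(...)`. The definition of the new macro is not part of this section, so the block below is only a sketch of what a macro of this kind typically expands to: defaulted special member functions. `EXAMPLE_DEFAULT_OBJECT` and `example_range` are hypothetical names.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical macro (an assumption, not the omnitrace definition): declares
// all special member functions of NAME as defaulted.
#define EXAMPLE_DEFAULT_OBJECT(NAME)                                           \
    NAME()                = default;                                           \
    ~NAME()               = default;                                           \
    NAME(const NAME&)     = default;                                           \
    NAME(NAME&&) noexcept = default;                                           \
    NAME& operator=(const NAME&) = default;                                    \
    NAME& operator=(NAME&&) noexcept = default;

struct example_range
{
    // Same "inverted" defaults as address_range in the diff: a
    // default-constructed range is empty until real bounds are assigned.
    std::uintptr_t low  = std::numeric_limits<std::uintptr_t>::max();
    std::uintptr_t high = std::numeric_limits<std::uintptr_t>::min();

    EXAMPLE_DEFAULT_OBJECT(example_range)

    example_range(std::uintptr_t _low, std::uintptr_t _high)
    : low{ _low }
    , high{ _high }
    {}

    bool is_valid() const { return low <= high; }
};
```

Declaring the defaulted members explicitly keeps the type default-constructible and copyable even though it also declares user-provided constructors.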
+12 -5
@@ -64,7 +64,7 @@ namespace binary
namespace
{
binary_info
parse_line_info(const std::string& _name)
parse_line_info(const std::string& _name, bool _process_dwarf)
{
auto _info = binary_info{};
@@ -105,10 +105,17 @@ parse_line_info(const std::string& _name)
<< "section set size (" << _section_set.size() << ") != section map size ("
<< _section_map.size() << ")\n";
_info.debug_info = dwarf_entry::process_dwarf(_bfd->fd, _info.ranges);
if(_process_dwarf)
{
std::tie(_info.debug_info, _info.ranges, _info.breakpoints) =
dwarf_entry::process_dwarf(_bfd->fd);
}
for(auto& itr : _info.symbols)
itr.read_dwarf(_info.debug_info);
{
itr.read_dwarf_entries(_info.debug_info);
itr.read_dwarf_breakpoints(_info.breakpoints);
}
_info.sort();
}
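
In the hunk above, `process_dwarf` changes from filling an output parameter to returning a tuple that the caller unpacks with `std::tie`. The following sketch shows that pattern with simplified, hypothetical stand-in types (`line_entry`, `addr_range`, `info_t`) and placeholder data instead of real DWARF parsing.

```cpp
#include <cstdint>
#include <deque>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical, simplified stand-ins for the types used in the diff.
struct line_entry
{
    std::string file = {};
    unsigned    line = 0;
};

struct addr_range
{
    std::uintptr_t low  = 0;
    std::uintptr_t high = 0;
};

// Tuple of line info, address ranges, and breakpoints.
using debug_tuple_t = std::tuple<std::deque<line_entry>, std::vector<addr_range>,
                                 std::vector<std::uintptr_t>>;

struct info_t
{
    std::deque<line_entry>      debug_info  = {};
    std::vector<addr_range>     ranges      = {};
    std::vector<std::uintptr_t> breakpoints = {};
};

// Placeholder producer: returns fixed data where the real code reads DWARF.
inline debug_tuple_t
process_debug_data()
{
    auto _data = debug_tuple_t{};
    std::get<0>(_data).push_back({ "main.cpp", 10 });
    std::get<1>(_data).push_back({ 0x1000, 0x1040 });
    std::get<2>(_data).push_back(0x1004);
    return _data;
}

// Unpack all three results in one statement; skip the work when disabled.
inline info_t
load_info(bool _process)
{
    auto _info = info_t{};
    if(_process)
    {
        std::tie(_info.debug_info, _info.ranges, _info.breakpoints) =
            process_debug_data();
    }
    return _info;
}
```

Returning the tuple keeps the three results together, so adding the breakpoints did not require a second output parameter, and a flag such as `_process_dwarf` can skip the whole step.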
@@ -122,7 +129,7 @@ parse_line_info(const std::string& _name)
std::vector<binary_info>
get_binary_info(const std::vector<std::string>& _files,
const std::vector<scope_filter>& _filters)
const std::vector<scope_filter>& _filters, bool _process_dwarf)
{
auto _satisfies_filter = [&_filters](auto _scope, const std::string& _value) {
for(const auto& itr : _filters) // NOLINT
@@ -157,7 +164,7 @@ get_binary_info(const std::vector<std::string>& _files,
if(filepath::exists(_filename) && _satisfies_binary_filter(_filename) &&
_exists.find(_filename) == _exists.end())
{
_data.emplace_back(parse_line_info(_filename));
_data.emplace_back(parse_line_info(_filename, _process_dwarf));
_exists.emplace(_filename);
}
}
@@ -54,6 +54,7 @@ using bfd_file = ::tim::unwind::bfd_file;
using hash_value_t = ::tim::hash_value_t;
std::vector<binary_info>
get_binary_info(const std::vector<std::string>&, const std::vector<scope_filter>&);
get_binary_info(const std::vector<std::string>&, const std::vector<scope_filter>&,
bool _process_dwarf = true);
} // namespace binary
} // namespace omnitrace
@@ -30,8 +30,10 @@
#include <timemory/utility/procfs/maps.hpp>
#include <cstdint>
#include <deque>
#include <memory>
#include <string>
#include <vector>
namespace omnitrace
@@ -40,17 +42,19 @@ namespace binary
{
struct binary_info
{
std::shared_ptr<bfd_file> bfd = {};
std::vector<procfs::maps> mappings = {};
std::deque<symbol> symbols = {};
std::deque<dwarf_entry> debug_info = {};
std::vector<address_range> ranges = {};
std::unordered_map<address_range, void*> sections = {};
std::shared_ptr<bfd_file> bfd = {};
std::vector<procfs::maps> mappings = {};
std::deque<symbol> symbols = {};
std::deque<dwarf_entry> debug_info = {};
std::vector<address_range> ranges = {};
std::vector<uintptr_t> breakpoints = {};
std::unordered_map<address_range, void*> sections = {};
void sort();
void sort();
std::string filename() const;
template <typename RetT = void>
RetT* find_section(uintptr_t);
RetT* find_section(uintptr_t) const;
};
inline void
@@ -60,11 +64,12 @@ binary_info::sort()
utility::filter_sort_unique(symbols);
utility::filter_sort_unique(ranges);
utility::filter_sort_unique(debug_info);
utility::filter_sort_unique(breakpoints);
}
template <typename RetT>
inline RetT*
binary_info::find_section(uintptr_t _addr)
binary_info::find_section(uintptr_t _addr) const
{
for(const auto& sitr : sections)
{
@@ -72,5 +77,11 @@ binary_info::find_section(uintptr_t _addr)
}
return nullptr;
}
inline std::string
binary_info::filename() const
{
return (bfd) ? std::string{ bfd->name } : std::string{};
}
} // namespace binary
} // namespace omnitrace
@@ -41,28 +41,51 @@ get_dwarf_address_ranges(Dwarf_Die* _die)
{
auto _ranges = std::vector<address_range>{};
if(dwarf_tag(_die) != DW_TAG_compile_unit) return _ranges;
if(dwarf_tag(_die) != DW_TAG_compile_unit && dwarf_tag(_die) != DW_TAG_subprogram)
return _ranges;
Dwarf_Addr _low_pc;
Dwarf_Addr _high_pc;
dwarf_lowpc(_die, &_low_pc);
dwarf_highpc(_die, &_high_pc);
_ranges.emplace_back(address_range{ _low_pc, _high_pc });
if(_low_pc > _high_pc)
{
Dwarf_Addr _entry_pc;
dwarf_entrypc(_die, &_entry_pc);
if(_entry_pc < _low_pc) _low_pc = _entry_pc;
}
if(_low_pc < _high_pc) _ranges.emplace_back(_low_pc, _high_pc);
Dwarf_Addr _base_addr;
ptrdiff_t _offset = 0;
do
{
_ranges.emplace_back(address_range{ 0, 0 });
} while((_offset = dwarf_ranges(_die, _offset, &_base_addr, &_ranges.back().low,
&_ranges.back().high)) > 0);
// will always have one extra
_ranges.pop_back();
uintptr_t _low = 0;
uintptr_t _high = 0;
_offset = dwarf_ranges(_die, _offset, &_base_addr, &_low, &_high);
if(_low < _high) _ranges.emplace_back(_low, _high);
} while(_offset > 0);
return _ranges;
}
auto
get_dwarf_breakpoints(Dwarf_Die* _die)
{
auto _bkpts = std::vector<uintptr_t>{};
if(dwarf_tag(_die) != DW_TAG_subprogram) return _bkpts;
Dwarf_Addr* _pts = nullptr;
auto _npts = dwarf_entry_breakpoints(_die, &_pts);
if(_npts > 0 && _pts) _bkpts.assign(_pts, _pts + _npts);
return _bkpts;
}
auto
get_dwarf_entry(Dwarf_Die* _die)
{
@@ -133,37 +156,50 @@ dwarf_entry::is_valid() const
return (*this != dwarf_entry{} && !file.empty());
}
std::deque<dwarf_entry>
dwarf_entry::process_dwarf(int _fd, std::vector<address_range>& _ranges)
dwarf_entry::dwarf_tuple_t
dwarf_entry::process_dwarf(int _fd)
{
auto* _dwarf_v = dwarf_begin(_fd, DWARF_C_READ);
auto _line_info = std::deque<dwarf_entry>{};
auto* _dwarf_v = dwarf_begin(_fd, DWARF_C_READ);
auto _data_v = dwarf_tuple_t{};
size_t cu_header_size = 0;
Dwarf_Off cu_off = 0;
Dwarf_Off next_cu_off = 0;
for(; dwarf_nextcu(_dwarf_v, cu_off, &next_cu_off, &cu_header_size, nullptr, nullptr,
nullptr) == 0;
cu_off = next_cu_off)
if(_dwarf_v)
{
Dwarf_Off cu_die_off = cu_off + cu_header_size;
Dwarf_Die cu_die;
if(dwarf_offdie(_dwarf_v, cu_die_off, &cu_die) != nullptr)
auto& _entries = std::get<0>(_data_v);
auto& _ranges = std::get<1>(_data_v);
auto& _bkpts = std::get<2>(_data_v);
size_t cu_header_size = 0;
Dwarf_Off cu_off = 0;
Dwarf_Off next_cu_off = 0;
for(; dwarf_nextcu(_dwarf_v, cu_off, &next_cu_off, &cu_header_size, nullptr,
nullptr, nullptr) == 0;
cu_off = next_cu_off)
{
Dwarf_Die* _die = &cu_die;
if(dwarf_tag(_die) == DW_TAG_compile_unit)
auto cu_die_off = cu_off + cu_header_size;
auto cu_die = Dwarf_Die{};
if(dwarf_offdie(_dwarf_v, cu_die_off, &cu_die) != nullptr)
{
combine(_line_info, get_dwarf_entry(_die));
combine(_ranges, get_dwarf_address_ranges(_die));
Dwarf_Die* _die = &cu_die;
if(dwarf_tag(_die) == DW_TAG_compile_unit)
{
combine(_entries, get_dwarf_entry(_die));
combine(_ranges, get_dwarf_address_ranges(_die));
}
else if(dwarf_tag(_die) == DW_TAG_subprogram)
{
combine(_bkpts, get_dwarf_breakpoints(_die));
combine(_ranges, get_dwarf_address_ranges(_die));
}
}
}
dwarf_end(_dwarf_v);
utility::filter_sort_unique(_entries);
utility::filter_sort_unique(_ranges);
utility::filter_sort_unique(_bkpts);
}
dwarf_end(_dwarf_v);
utility::filter_sort_unique(_line_info);
utility::filter_sort_unique(_ranges);
return _line_info;
return _data_v;
}
template <typename ArchiveT>
@@ -31,7 +31,11 @@ namespace binary
{
struct dwarf_entry
{
TIMEMORY_DEFAULT_OBJECT(dwarf_entry)
// tuple of dwarf line info, address ranges, and breakpoints
using dwarf_tuple_t = std::tuple<std::deque<dwarf_entry>, std::vector<address_range>,
std::vector<uintptr_t>>;
OMNITRACE_DEFAULT_OBJECT(dwarf_entry)
bool begin_statement = false;
bool end_sequence = false;
@@ -53,7 +57,7 @@ struct dwarf_entry
bool operator!=(const dwarf_entry&) const;
explicit operator bool() const { return is_valid(); }
static std::deque<dwarf_entry> process_dwarf(int _fd, std::vector<address_range>&);
static dwarf_tuple_t process_dwarf(int _fd);
template <typename ArchiveT>
void serialize(ArchiveT&, const unsigned int);
+53 -4
@@ -39,13 +39,59 @@ namespace omnitrace
{
namespace binary
{
namespace
{
const open_modes_vec_t default_link_open_modes = { (RTLD_LAZY | RTLD_NOLOAD),
(RTLD_LAZY | RTLD_LOCAL) };
}
std::string
get_linked_path(const char* _name, open_modes_vec_t&& _open_modes)
{
if(_name == nullptr) return config::get_exe_realpath();
if(_open_modes.empty()) _open_modes = default_link_open_modes;
auto _lib = std::string{ _name };
void* _handle = nullptr;
bool _noload = false;
for(auto _mode : _open_modes)
{
_handle = dlopen(_name, _mode);
_noload = (_mode & RTLD_NOLOAD) == RTLD_NOLOAD;
if(_handle) break;
}
if(_handle)
{
struct link_map* _link_map = nullptr;
dlinfo(_handle, RTLD_DI_LINKMAP, &_link_map);
if(_link_map != nullptr && !std::string_view{ _link_map->l_name }.empty())
{
_lib = filepath::realpath(_link_map->l_name, nullptr, false);
}
if(_noload == false) dlclose(_handle);
}
return _lib;
}
std::set<link_file>
get_link_map(const char* _lib, const std::string& _exclude_linked_by,
const std::string& _exclude_re)
const std::string& _exclude_re, open_modes_vec_t&& _open_modes)
{
auto _get_chain = [](const char* _name) {
void* _handle = dlopen(_name, RTLD_LAZY | RTLD_NOLOAD);
auto _chain = std::set<std::string>{};
if(_open_modes.empty()) _open_modes = default_link_open_modes;
auto _get_chain = [&_open_modes](const char* _name) {
void* _handle = nullptr;
bool _noload = false;
for(auto _mode : _open_modes)
{
_handle = dlopen(_name, _mode);
_noload = (_mode & RTLD_NOLOAD) == RTLD_NOLOAD;
if(_handle) break;
}
auto _chain = std::set<std::string>{};
if(_handle)
{
struct link_map* _link_map = nullptr;
@@ -66,6 +112,8 @@ get_link_map(const char* _lib, const std::string& _exclude_linked_by,
}
_next = _next->l_next;
}
if(_noload == false) dlclose(_handle);
}
return _chain;
};
@@ -78,6 +126,7 @@ get_link_map(const char* _lib, const std::string& _exclude_linked_by,
for(const auto& itr : _full_chain)
{
std::cout << itr << std::endl;
if(_excl_chain.find(itr) == _excl_chain.end())
{
if(_exclude_re.empty() || !std::regex_search(itr, std::regex{ _exclude_re }))
+10 -1
@@ -23,14 +23,18 @@
#pragma once
#include <cstdint>
#include <dlfcn.h>
#include <set>
#include <string>
#include <string_view>
#include <vector>
namespace omnitrace
{
namespace binary
{
using open_modes_vec_t = std::vector<int>;
struct link_file
{
link_file(std::string_view&& _v)
@@ -44,11 +48,16 @@ struct link_file
std::string name = {};
};
// helper function for translating generic lib name to resolved path
std::string
get_linked_path(const char*, open_modes_vec_t&& = {});
// default parameters: get the linked binaries for the exe but exclude the linked binaries
// from libomnitrace
std::set<link_file>
get_link_map(const char* _lib = nullptr,
const std::string& _exclude_linked_by = "libomnitrace.so",
const std::string& _exclude_re = "libomnitrace-([a-zA-Z]+)\\.so");
const std::string& _exclude_re = "libomnitrace-([a-zA-Z]+)\\.so",
open_modes_vec_t&& _open_modes = {});
} // namespace binary
} // namespace omnitrace
+15 -1
@@ -136,7 +136,7 @@ symbol::operator bool() const
}
size_t
symbol::read_dwarf(const std::deque<dwarf_entry>& _info)
symbol::read_dwarf_entries(const std::deque<dwarf_entry>& _info)
{
for(const auto& itr : _info)
{
@@ -173,6 +173,20 @@ symbol::read_dwarf(const std::deque<dwarf_entry>& _info)
return dwarf_info.size();
}
size_t
symbol::read_dwarf_breakpoints(const std::vector<uintptr_t>& _bkpts)
{
for(const auto& itr : _bkpts)
{
if(address.contains(itr)) breakpoints.emplace_back(itr);
}
// make sure the breakpoints are sorted low to high
std::sort(breakpoints.begin(), breakpoints.end());
return breakpoints.size();
}
bool
symbol::read_bfd(bfd_file& _bfd)
{
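
`symbol::read_dwarf_breakpoints` in the hunk above keeps only the breakpoints that fall inside the symbol's address range and then sorts them. A self-contained sketch of that filter-and-sort step follows. `addr_range` and `select_breakpoints` are hypothetical names, and the half-open interval in `contains` is an assumption, since the real `address_range::contains` is not shown here.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

struct addr_range
{
    std::uintptr_t low  = 0;
    std::uintptr_t high = 0;

    // Assumption: half-open interval [low, high).
    bool contains(std::uintptr_t _v) const { return _v >= low && _v < high; }
};

// Keep only the addresses inside _range, sorted from low to high.
inline std::vector<std::uintptr_t>
select_breakpoints(const addr_range& _range, const std::vector<std::uintptr_t>& _all)
{
    auto _out = std::vector<std::uintptr_t>{};
    std::copy_if(_all.begin(), _all.end(), std::back_inserter(_out),
                 [&_range](std::uintptr_t _v) { return _range.contains(_v); });
    std::sort(_out.begin(), _out.end());
    return _out;
}
```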
+3 -1
@@ -67,7 +67,8 @@ struct symbol : private tim::unwind::bfd_file::symbol
explicit operator bool() const;
bool read_bfd(bfd_file&);
size_t read_dwarf(const std::deque<dwarf_entry>&);
size_t read_dwarf_entries(const std::deque<dwarf_entry>&);
size_t read_dwarf_breakpoints(const std::vector<uintptr_t>&);
address_range ipaddr() const { return address + load_address; }
symbol clone() const;
@@ -89,6 +90,7 @@ struct symbol : private tim::unwind::bfd_file::symbol
address_range address = {};
std::string func = {};
std::string file = {};
std::vector<uintptr_t> breakpoints = {};
std::vector<inlined_symbol> inlines = {};
std::vector<dwarf_entry> dwarf_info = {};
};
+141
@@ -0,0 +1,141 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include "library/categories.hpp"
#include "library/common.hpp"
#include "library/config.hpp"
#include "library/constraint.hpp"
#include "library/debug.hpp"
#include "library/timemory.hpp"
#include "library/utility.hpp"
#include <set>
#include <string>
namespace omnitrace
{
namespace categories
{
namespace
{
template <typename Tp>
void
configure_categories(bool _enable, const std::set<std::string>& _categories)
{
auto _name = trait::name<Tp>::value;
if(_categories.count(_name) > 0)
{
OMNITRACE_VERBOSE_F(3, "%s category: %s\n", (_enable) ? "Enabling" : "Disabling",
_name);
trait::runtime_enabled<Tp>::set(_enable);
}
}
template <size_t... Idx>
void
configure_categories(bool _enable, const std::set<std::string>& _categories,
std::index_sequence<Idx...>)
{
(configure_categories<category_type_id_t<Idx>>(_enable, _categories), ...);
}
void
configure_categories(bool _enable, const std::set<std::string>& _categories)
{
OMNITRACE_VERBOSE_F(1, "%s categories...\n", (_enable) ? "Enabling" : "Disabling");
configure_categories(
_enable, _categories,
utility::make_index_sequence_range<1, OMNITRACE_CATEGORY_LAST>{});
}
} // namespace
void
enable_categories(const std::set<std::string>& _categories)
{
configure_categories(
true, _categories,
utility::make_index_sequence_range<1, OMNITRACE_CATEGORY_LAST>{});
}
void
disable_categories(const std::set<std::string>& _categories)
{
configure_categories(
false, _categories,
utility::make_index_sequence_range<1, OMNITRACE_CATEGORY_LAST>{});
}
void
setup()
{
// disable specified categories
disable_categories();
auto _trace_specs = constraint::get_trace_specs();
if(!_trace_specs.empty())
{
auto _trace_stages = constraint::get_trace_stages();
_trace_stages.init = [](const constraint::spec& _spec) {
if(_spec.delay > 1.0e-3) disable_categories(config::get_enabled_categories());
return get_state() < State::Finalized;
};
_trace_stages.start = [](const constraint::spec&) {
enable_categories(config::get_enabled_categories());
return get_state() < State::Finalized;
};
_trace_stages.stop = [](const constraint::spec&) {
// only disable categories if not finalized since this might run in background
// during finalization and disable output of data in those categories
if(get_state() < State::Finalized)
disable_categories(config::get_enabled_categories());
return get_state() < State::Finalized;
};
auto _promise = std::promise<void>();
std::thread{ [_trace_specs, _trace_stages](std::promise<void>* _prom) {
// ensure all categories are disabled before proceeding
// if a delay is requested
if(_trace_specs.front().delay > 1.0e-3)
disable_categories(config::get_enabled_categories());
_prom->set_value();
for(const auto& itr : _trace_specs)
itr(_trace_stages);
},
&_promise }
.detach();
_promise.get_future().wait_for(std::chrono::seconds{ 1 });
}
}
void
shutdown()
{
disable_categories(config::get_enabled_categories());
}
} // namespace categories
} // namespace omnitrace
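
`categories::setup()` in the file above starts a background thread that disables collection when a delay is requested, signals the caller through a `std::promise`, and then walks through the delay and duration windows. The sketch below reproduces that handshake in a simplified form. It is an illustration under several assumptions: a single real-time spec, one global flag in place of per-category toggles, a joined thread in place of a detached one, and an unbounded wait in place of the one-second `wait_for`. `window_spec`, `collecting`, and `run_windows` are hypothetical names.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>
#include <utility>

// Hypothetical, simplified spec: one delay + duration pair repeated N times.
struct window_spec
{
    std::chrono::milliseconds delay;
    std::chrono::milliseconds duration;
    unsigned                  repeat;
};

// Single global toggle standing in for the per-category runtime switches.
inline std::atomic<bool>&
collecting()
{
    static std::atomic<bool> _v{ true };
    return _v;
}

// Start the window scheduler and return once the initial state is applied.
inline std::thread
run_windows(window_spec _spec)
{
    auto _ready  = std::promise<void>{};
    auto _future = _ready.get_future();

    auto _worker = std::thread{
        [_spec](std::promise<void> _prom) {
            // Disable collection before the caller proceeds if a delay is set.
            if(_spec.delay.count() > 0) collecting().store(false);
            _prom.set_value();

            for(unsigned i = 0; i < _spec.repeat; ++i)
            {
                std::this_thread::sleep_for(_spec.delay);
                collecting().store(true);  // window opens
                std::this_thread::sleep_for(_spec.duration);
                collecting().store(false);  // window closes
            }
        },
        std::move(_ready)
    };

    // Block until the worker has applied the initial state.
    _future.wait();
    return _worker;
}
```

Moving the promise into the worker means the worker owns it for its whole lifetime, so the sketch does not depend on the caller's stack frame staying alive.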
+34
@@ -122,6 +122,8 @@ OMNITRACE_DEFINE_CATEGORY(category, process_context_switch, OMNITRACE_CATEGORY_P
OMNITRACE_DEFINE_CATEGORY(category, process_page_fault, OMNITRACE_CATEGORY_PROCESS_PAGE_FAULT, "process_page_fault", "Memory page faults in process (collected in background thread)")
OMNITRACE_DEFINE_CATEGORY(category, process_user_mode_time, OMNITRACE_CATEGORY_PROCESS_USER_MODE_TIME, "process_user_cpu_time", "CPU time of functions executing in user-space in process in seconds (collected in background thread)")
OMNITRACE_DEFINE_CATEGORY(category, process_kernel_mode_time, OMNITRACE_CATEGORY_PROCESS_KERNEL_MODE_TIME, "process_kernel_cpu_time", "CPU time of functions executing in kernel-space in process in seconds (collected in background thread)")
OMNITRACE_DEFINE_CATEGORY(category, thread_wall_time, OMNITRACE_CATEGORY_THREAD_WALL_TIME, "thread_wall_time", "Wall-clock time on thread (derived from sampling)")
OMNITRACE_DEFINE_CATEGORY(category, thread_cpu_time, OMNITRACE_CATEGORY_THREAD_CPU_TIME, "thread_cpu_time", "CPU time on thread (derived from sampling)")
OMNITRACE_DEFINE_CATEGORY(category, thread_page_fault, OMNITRACE_CATEGORY_THREAD_PAGE_FAULT, "thread_page_fault", "Memory page faults on thread (derived from sampling)")
OMNITRACE_DEFINE_CATEGORY(category, thread_peak_memory, OMNITRACE_CATEGORY_THREAD_PEAK_MEMORY, "thread_peak_memory", "Peak memory usage on thread in MB (derived from sampling)")
OMNITRACE_DEFINE_CATEGORY(category, thread_context_switch, OMNITRACE_CATEGORY_THREAD_CONTEXT_SWITCH, "thread_context_switch", "Context switches on thread (derived from sampling)")
@@ -182,6 +184,8 @@ using name = perfetto_category<Tp...>;
OMNITRACE_PERFETTO_CATEGORY(category::process_page_fault), \
OMNITRACE_PERFETTO_CATEGORY(category::process_user_mode_time), \
OMNITRACE_PERFETTO_CATEGORY(category::process_kernel_mode_time), \
OMNITRACE_PERFETTO_CATEGORY(category::thread_wall_time), \
OMNITRACE_PERFETTO_CATEGORY(category::thread_cpu_time), \
OMNITRACE_PERFETTO_CATEGORY(category::thread_page_fault), \
OMNITRACE_PERFETTO_CATEGORY(category::thread_peak_memory), \
OMNITRACE_PERFETTO_CATEGORY(category::thread_context_switch), \
@@ -193,3 +197,33 @@ using name = perfetto_category<Tp...>;
#if defined(TIMEMORY_USE_PERFETTO)
# define TIMEMORY_PERFETTO_CATEGORIES OMNITRACE_PERFETTO_CATEGORIES
#endif
#include <set>
#include <string>
namespace omnitrace
{
inline namespace config
{
std::set<std::string>
get_enabled_categories();
std::set<std::string>
get_disabled_categories();
} // namespace config
namespace categories
{
void
enable_categories(const std::set<std::string>& = config::get_enabled_categories());
void
disable_categories(const std::set<std::string>& = config::get_disabled_categories());
void
setup();
void
shutdown();
} // namespace categories
} // namespace omnitrace
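
The `setup()` function declared above consumes the trace specs that the PR description gives in the form `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>`. The parsing code is not part of this section, so the block below is only a sketch of how one entry could be split and validated. `trace_spec` and `parse_trace_spec` are hypothetical names, and requiring all four fields and accepting only the `realtime` and `cputime` clock names are assumptions.

```cpp
#include <exception>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical representation of one "<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>".
struct trace_spec
{
    double        delay    = 0.0;  // seconds before collection starts
    double        duration = 0.0;  // seconds of collection
    unsigned long repeat   = 1;    // number of delay + duration cycles
    std::string   clock    = "realtime";
};

// Parse a single entry; returns std::nullopt on any malformed input.
inline std::optional<trace_spec>
parse_trace_spec(const std::string& _entry)
{
    auto _fields = std::vector<std::string>{};
    auto _iss    = std::istringstream{ _entry };
    for(std::string _tok; std::getline(_iss, _tok, ':');)
        _fields.emplace_back(_tok);

    // Assumption: all four fields are required.
    if(_fields.size() != 4) return std::nullopt;

    try
    {
        auto _spec     = trace_spec{};
        _spec.delay    = std::stod(_fields.at(0));
        _spec.duration = std::stod(_fields.at(1));
        _spec.repeat   = std::stoul(_fields.at(2));
        _spec.clock    = _fields.at(3);

        // Assumption: only the two clock names from the PR description.
        if(_spec.clock != "realtime" && _spec.clock != "cputime")
            return std::nullopt;
        return _spec;
    } catch(const std::exception&)
    {
        // std::stod / std::stoul throw on non-numeric or out-of-range input.
        return std::nullopt;
    }
}
```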
@@ -46,7 +46,7 @@ struct blocking_gotcha : comp::base<blocking_gotcha, void>
{
static constexpr size_t gotcha_capacity = 13;
TIMEMORY_DEFAULT_OBJECT(blocking_gotcha)
OMNITRACE_DEFAULT_OBJECT(blocking_gotcha)
// string id for component
static std::string label();
@@ -37,7 +37,7 @@ namespace component
{
struct causal_gotcha : tim::component::base<causal_gotcha, void>
{
TIMEMORY_DEFAULT_OBJECT(causal_gotcha)
OMNITRACE_DEFAULT_OBJECT(causal_gotcha)
// string id for component
static std::string label() { return "causal_gotcha"; }
@@ -52,7 +52,7 @@ struct progress_point : comp::base<progress_point, void>
static std::string label();
static std::string description();
TIMEMORY_DEFAULT_OBJECT(progress_point)
OMNITRACE_DEFAULT_OBJECT(progress_point)
void start();
void stop();
@@ -130,7 +130,7 @@ struct push_node<omnitrace::causal::component::progress_point>
{
using type = omnitrace::causal::component::progress_point;
TIMEMORY_DEFAULT_OBJECT(push_node)
OMNITRACE_DEFAULT_OBJECT(push_node)
push_node(type& _obj, scope::config _scope, hash_value_t _hash,
int64_t _tid = threading::get_id())
@@ -147,7 +147,7 @@ struct pop_node<omnitrace::causal::component::progress_point>
{
using type = omnitrace::causal::component::progress_point;
TIMEMORY_DEFAULT_OBJECT(pop_node)
OMNITRACE_DEFAULT_OBJECT(pop_node)
pop_node(type& _obj, int64_t _tid = threading::get_id()) { (*this)(_obj, _tid); }
@@ -45,7 +45,7 @@ struct unblocking_gotcha : comp::base<unblocking_gotcha, void>
{
static constexpr size_t gotcha_capacity = 8;
TIMEMORY_DEFAULT_OBJECT(unblocking_gotcha)
OMNITRACE_DEFAULT_OBJECT(unblocking_gotcha)
// string id for component
static std::string label();
+1 -1
@@ -44,7 +44,7 @@ struct delay
{
using value_type = void;
TIMEMORY_DEFAULT_OBJECT(delay)
OMNITRACE_DEFAULT_OBJECT(delay)
static void process();
static void credit();
@@ -90,7 +90,7 @@ struct experiment
static std::string description();
static const std::atomic<experiment*>& get_current_experiment();
TIMEMORY_DEFAULT_OBJECT(experiment)
OMNITRACE_DEFAULT_OBJECT(experiment)
bool start();
bool wait() const; // returns false if interrupted
@@ -47,7 +47,7 @@ namespace causal
{
struct selected_entry
{
TIMEMORY_DEFAULT_OBJECT(selected_entry)
OMNITRACE_DEFAULT_OBJECT(selected_entry)
uintptr_t address = 0x0;
uintptr_t symbol_address = 0x0;
@@ -48,6 +48,7 @@
#include <timemory/mpl.hpp>
#include <timemory/mpl/quirks.hpp>
#include <timemory/mpl/type_traits.hpp>
#include <timemory/mpl/types.hpp>
#include <timemory/operations.hpp>
#include <timemory/storage.hpp>
#include <timemory/units.hpp>
@@ -150,10 +151,32 @@ void
backtrace_metrics::stop()
{}
namespace
{
template <typename... Tp>
auto get_enabled(tim::type_list<Tp...>)
{
constexpr size_t N = sizeof...(Tp);
auto _v = std::bitset<N>{};
size_t _n = 0;
(_v.set(_n++, trait::runtime_enabled<Tp>::get()), ...);
return _v;
}
} // namespace
void
backtrace_metrics::sample(int)
{
auto _tid = threading::get_id();
if(!get_enabled(type_list<category::process_sampling, backtrace_metrics>{}).all())
{
m_valid.reset();
return;
}
m_valid = get_enabled(categories_t{});
// return if everything is disabled
if(!m_valid.any()) return;
auto _cache = tim::rusage_cache{ RUSAGE_THREAD };
m_cpu = tim::get_clock_thread_now<int64_t, std::nano>();
m_mem_peak = _cache.get_peak_rss();
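
The new `get_enabled` helper in the hunk above uses a C++17 fold expression over a type list to record, in a `std::bitset`, which categories are currently enabled. The sketch below isolates that technique. `type_list`, `runtime_enabled`, and the three category tags are simplified, hypothetical stand-ins for the timemory and omnitrace types.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical stand-in for tim::type_list.
template <typename... Tp>
struct type_list
{};

// Hypothetical per-type runtime toggle (stand-in for trait::runtime_enabled).
template <typename Tp>
struct runtime_enabled
{
    static bool& value()
    {
        static bool _v = true;
        return _v;
    }
    static bool get() { return value(); }
    static void set(bool _v) { value() = _v; }
};

// Hypothetical category tags.
struct peak_memory
{};
struct context_switch
{};
struct page_fault
{};

// Bit i of the result holds the enabled state of the i-th type in the list.
// The comma fold evaluates left to right, so _n increments in list order.
template <typename... Tp>
auto get_enabled(type_list<Tp...>)
{
    constexpr std::size_t N  = sizeof...(Tp);
    auto                  _v = std::bitset<N>{};
    std::size_t           _n = 0;
    (_v.set(_n++, runtime_enabled<Tp>::get()), ...);
    return _v;
}
```

Capturing the states once per sample gives later code a consistent snapshot to test with `any()`, `all()`, or `test(i)`, even if another thread toggles a category in the meantime.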
@@ -163,16 +186,15 @@ backtrace_metrics::sample(int)
if constexpr(tim::trait::is_available<hw_counters>::value)
{
if(tim::trait::runtime_enabled<hw_counters>::get())
constexpr auto hw_counters_idx = tim::index_of<hw_counters, categories_t>::value;
constexpr auto hw_category_idx =
tim::index_of<category::thread_hardware_counter, categories_t>::value;
auto _tid = threading::get_id();
if(m_valid.test(hw_category_idx) && m_valid.test(hw_counters_idx))
{
assert(get_papi_vector(_tid).get() != nullptr);
m_hw_counter = get_papi_vector(_tid)->record();
// const auto& _cfg = get_papi_vector(_tid)->get_config();
// std::cerr << "Config: ";
// for(size_t i = 0; i < _cfg->size; ++i)
// std::cerr << "[" << _cfg->labels.at(i) << "|" << _cfg->event_names.at(i)
// << "|" << _cfg->event_codes.at(i) << "]";
// std::cerr << "\n";
}
}
}
@@ -220,23 +242,27 @@ backtrace_metrics::configure(bool _setup, int64_t _tid)
}
void
backtrace_metrics::init_perfetto(int64_t _tid)
backtrace_metrics::init_perfetto(int64_t _tid, valid_array_t _valid)
{
auto _hw_cnt_labels = *get_papi_labels(_tid);
auto _tid_name = JOIN("", '[', _tid, ']');
if(!perfetto_counter_track<perfetto_rusage>::exists(_tid))
{
perfetto_counter_track<perfetto_rusage>::emplace(
_tid, JOIN(' ', "Thread Peak Memory Usage", _tid_name, "(S)"), "MB");
perfetto_counter_track<perfetto_rusage>::emplace(
_tid, JOIN(' ', "Thread Context Switches", _tid_name, "(S)"));
perfetto_counter_track<perfetto_rusage>::emplace(
_tid, JOIN(' ', "Thread Page Faults", _tid_name, "(S)"));
if(get_valid(category::thread_peak_memory{}, _valid))
perfetto_counter_track<perfetto_rusage>::emplace(
_tid, JOIN(' ', "Thread Peak Memory Usage", _tid_name, "(S)"), "MB");
if(get_valid(category::thread_context_switch{}, _valid))
perfetto_counter_track<perfetto_rusage>::emplace(
_tid, JOIN(' ', "Thread Context Switches", _tid_name, "(S)"));
if(get_valid(category::thread_page_fault{}, _valid))
perfetto_counter_track<perfetto_rusage>::emplace(
_tid, JOIN(' ', "Thread Page Faults", _tid_name, "(S)"));
}
if(!perfetto_counter_track<hw_counters>::exists(_tid) &&
tim::trait::runtime_enabled<hw_counters>::get())
get_valid(type_list<hw_counters>{}, _valid) &&
get_valid(category::thread_hardware_counter{}, _valid))
{
for(auto& itr : _hw_cnt_labels)
{
@@ -250,7 +276,7 @@ backtrace_metrics::init_perfetto(int64_t _tid)
}
void
backtrace_metrics::fini_perfetto(int64_t _tid)
backtrace_metrics::fini_perfetto(int64_t _tid, valid_array_t _valid)
{
auto _hw_cnt_labels = *get_papi_labels(_tid);
const auto& _thread_info = thread_info::get(_tid, SequentTID);
@@ -260,22 +286,32 @@ backtrace_metrics::fini_perfetto(int64_t _tid)
uint64_t _ts = _thread_info->get_stop();
TRACE_COUNTER("thread_peak_memory",
perfetto_counter_track<perfetto_rusage>::at(_tid, 0), _ts, 0);
if(get_valid(category::thread_peak_memory{}, _valid))
{
TRACE_COUNTER(trait::name<category::thread_peak_memory>::value,
perfetto_counter_track<perfetto_rusage>::at(_tid, 0), _ts, 0);
}
TRACE_COUNTER("thread_context_switch",
perfetto_counter_track<perfetto_rusage>::at(_tid, 1), _ts, 0);
if(get_valid(category::thread_context_switch{}, _valid))
{
TRACE_COUNTER(trait::name<category::thread_context_switch>::value,
perfetto_counter_track<perfetto_rusage>::at(_tid, 1), _ts, 0);
}
TRACE_COUNTER("thread_page_fault",
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts, 0);
if(get_valid(category::thread_page_fault{}, _valid))
{
TRACE_COUNTER(trait::name<category::thread_page_fault>::value,
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts, 0);
}
if(tim::trait::runtime_enabled<hw_counters>::get())
if(get_valid(type_list<hw_counters>{}, _valid) &&
get_valid(category::thread_hardware_counter{}, _valid))
{
for(size_t i = 0; i < perfetto_counter_track<hw_counters>::size(_tid); ++i)
{
if(i < _hw_cnt_labels.size())
{
TRACE_COUNTER("thread_hardware_counter",
TRACE_COUNTER(trait::name<category::thread_hardware_counter>::value,
perfetto_counter_track<hw_counters>::at(_tid, i), _ts, 0.0);
}
}
@@ -285,23 +321,33 @@ backtrace_metrics::fini_perfetto(int64_t _tid)
void
backtrace_metrics::post_process_perfetto(int64_t _tid, uint64_t _ts) const
{
TRACE_COUNTER("thread_peak_memory",
perfetto_counter_track<perfetto_rusage>::at(_tid, 0), _ts,
m_mem_peak / units::megabyte);
if((*this)(category::thread_peak_memory{}))
{
TRACE_COUNTER(trait::name<category::thread_peak_memory>::value,
perfetto_counter_track<perfetto_rusage>::at(_tid, 0), _ts,
m_mem_peak / units::megabyte);
}
TRACE_COUNTER("thread_context_switch",
perfetto_counter_track<perfetto_rusage>::at(_tid, 1), _ts, m_ctx_swch);
if((*this)(category::thread_context_switch{}))
{
TRACE_COUNTER(trait::name<category::thread_context_switch>::value,
perfetto_counter_track<perfetto_rusage>::at(_tid, 1), _ts,
m_ctx_swch);
}
TRACE_COUNTER("thread_page_fault",
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts, m_page_flt);
if(tim::trait::runtime_enabled<hw_counters>::get())
if((*this)(category::thread_page_fault{}))
{
TRACE_COUNTER(trait::name<category::thread_page_fault>::value,
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts,
m_page_flt);
}
if((*this)(type_list<hw_counters>{}) && (*this)(category::thread_hardware_counter{}))
{
for(size_t i = 0; i < perfetto_counter_track<hw_counters>::size(_tid); ++i)
{
if(i < m_hw_counter.size())
{
TRACE_COUNTER("thread_hardware_counter",
TRACE_COUNTER(trait::name<category::thread_hardware_counter>::value,
perfetto_counter_track<hw_counters>::at(_tid, i), _ts,
m_hw_counter.at(i));
}
@@ -34,6 +34,7 @@
#include <timemory/components/papi/types.hpp>
#include <timemory/macros/language.hpp>
#include <timemory/mpl/concepts.hpp>
#include <timemory/utility/type_list.hpp>
#include <timemory/variadic/types.hpp>
#include <array>
@@ -45,11 +46,14 @@
namespace omnitrace
{
template <typename... Tp>
using type_list = ::tim::type_list<Tp...>;
namespace component
{
struct backtrace_metrics
: tim::component::empty_base
, tim::concepts::component
, concepts::component
{
static constexpr size_t num_hw_counters = TIMEMORY_PAPI_ARRAY_SIZE;
@@ -60,6 +64,13 @@ struct backtrace_metrics
using system_clock = std::chrono::system_clock;
using system_time_point = typename system_clock::time_point;
using categories_t =
type_list<category::thread_cpu_time, category::thread_peak_memory,
category::thread_context_switch, category::thread_page_fault,
category::thread_hardware_counter, hw_counters>;
static constexpr size_t num_categories = std::tuple_size<categories_t>::value;
using valid_array_t = std::bitset<num_categories>;
static std::string label();
static std::string description();
@@ -72,16 +83,31 @@ struct backtrace_metrics
backtrace_metrics& operator=(backtrace_metrics&&) noexcept = default;
static void configure(bool, int64_t _tid = threading::get_id());
static void init_perfetto(int64_t _tid);
static void fini_perfetto(int64_t _tid);
static void init_perfetto(int64_t _tid, valid_array_t);
static void fini_perfetto(int64_t _tid, valid_array_t);
static std::vector<std::string> get_hw_counter_labels(int64_t);
template <typename Tp>
static bool get_valid(Tp, valid_array_t);
template <typename Tp>
static bool get_valid(type_list<Tp>, valid_array_t);
static void start();
static void stop();
void sample(int = -1);
void post_process(int64_t _tid, const backtrace* _bt,
const backtrace_metrics* _last) const;
explicit operator bool() const { return m_valid.any(); }
template <typename Tp>
bool operator()(Tp) const;
template <typename Tp>
bool operator()(type_list<Tp>) const;
auto get_valid() const { return m_valid; }
auto get_cpu_timestamp() const { return m_cpu; }
auto get_peak_memory() const { return m_mem_peak; }
auto get_context_switches() const { return m_ctx_swch; }
@@ -91,12 +117,44 @@ struct backtrace_metrics
void post_process_perfetto(int64_t _tid, uint64_t _ts) const;
private:
valid_array_t m_valid = {};
int64_t m_cpu = 0;
int64_t m_mem_peak = 0;
int64_t m_ctx_swch = 0;
int64_t m_page_flt = 0;
hw_counter_data_t m_hw_counter = {};
};
template <typename Tp>
bool
backtrace_metrics::get_valid(type_list<Tp>, valid_array_t _valid)
{
constexpr auto idx = tim::index_of<Tp, categories_t>::value;
return _valid.test(idx);
}
template <typename Tp>
bool backtrace_metrics::operator()(type_list<Tp>) const
{
static_assert(!concepts::is_type_listing<Tp>::value,
"Error! invalid call with tuple");
constexpr auto idx = tim::index_of<Tp, categories_t>::value;
return m_valid.test(idx);
}
template <typename Tp>
bool
backtrace_metrics::get_valid(Tp, valid_array_t _valid)
{
return get_valid(type_list<Tp>{}, _valid);
}
template <typename Tp>
bool backtrace_metrics::operator()(Tp) const
{
return (*this)(type_list<Tp>{});
}
} // namespace component
} // namespace omnitrace
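The `get_valid` / `operator()` helpers above map a category type to a bit position in `valid_array_t` at compile time. The following is a minimal, self-contained sketch of that idea only. `type_list`, `index_of`, the three `category` tags, and `set_valid` are hypothetical stand-ins written for illustration; they are not the timemory or omnitrace definitions.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for tim::type_list and tim::index_of.
template <typename... Tp>
struct type_list
{
    static constexpr std::size_t size = sizeof...(Tp);
};

template <typename Tp, typename ListT>
struct index_of;

// Match found at the head of the list: index 0.
template <typename Tp, typename... Tail>
struct index_of<Tp, type_list<Tp, Tail...>> : std::integral_constant<std::size_t, 0>
{};

// Otherwise: one more than the index within the tail.
template <typename Tp, typename Head, typename... Tail>
struct index_of<Tp, type_list<Head, Tail...>>
: std::integral_constant<std::size_t, 1 + index_of<Tp, type_list<Tail...>>::value>
{};

namespace category
{
struct peak_memory {};
struct context_switch {};
struct page_fault {};
}  // namespace category

using categories_t  = type_list<category::peak_memory, category::context_switch,
                                category::page_fault>;
using valid_array_t = std::bitset<categories_t::size>;

// Hypothetical setter: returns a copy with the bit for Tp set to `value`.
template <typename Tp>
valid_array_t set_valid(Tp, valid_array_t valid, bool value = true)
{
    valid.set(index_of<Tp, categories_t>::value, value);
    return valid;
}

// Tag-dispatched lookup: the bit index is resolved at compile time.
template <typename Tp>
bool get_valid(Tp, valid_array_t valid)
{
    return valid.test(index_of<Tp, categories_t>::value);
}

// Small usage example.
inline void example_valid_bits()
{
    auto valid = set_valid(category::context_switch{}, valid_array_t{});
    assert(get_valid(category::context_switch{}, valid));
    assert(!get_valid(category::page_fault{}, valid));
}
```

Because the index is a compile-time constant, asking for a type that is not in `categories_t` fails to compile instead of reading the wrong bit.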
@@ -68,7 +68,10 @@ using tracing_count_categories_t =
category::rocm_hsa, category::rocm_rccl>;
// these categories are added to the critical trace
using critical_trace_categories_t = type_list<category::host>;
using critical_trace_categories_t =
type_list<category::host, category::mpi, category::pthread, category::rocm_hip,
category::rocm_hsa, category::rocm_rccl, category::device_hip,
category::device_hsa, category::numa, category::python>;
// convert these categories to throughput points
using causal_throughput_categories_t =
@@ -128,7 +131,7 @@ void
category_region<CategoryT>::start(std::string_view name, Args&&... args)
{
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
if(tracing::category_push_disabled<CategoryT>()) return;
// unconditionally return if thread is disabled or finalized
if(get_thread_state() == ThreadState::Disabled) return;
@@ -212,7 +215,7 @@ void
category_region<CategoryT>::stop(std::string_view name, Args&&... args)
{
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
if(tracing::category_pop_disabled<CategoryT>()) return;
if(get_thread_state() == ThreadState::Disabled) return;
@@ -315,7 +318,7 @@ category_region<CategoryT>::mark(std::string_view name, Args&&...)
if constexpr(!_ct_use_causal) return;
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
if(tracing::category_mark_disabled<CategoryT>()) return;
// the expectation here is that if the state is not active then the call
// to omnitrace_init_tooling_hidden will activate all the appropriate
@@ -345,9 +348,6 @@ void
category_region<CategoryT>::audit(const gotcha_data_t& _data, audit::incoming,
Args&&... _args)
{
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
start<OptsT...>(_data.tool_id.c_str(), [&](perfetto::EventContext ctx) {
if(config::get_perfetto_annotations())
{
@@ -364,9 +364,6 @@ void
category_region<CategoryT>::audit(const gotcha_data_t& _data, audit::outgoing,
Args&&... _args)
{
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
stop<OptsT...>(_data.tool_id.c_str(), [&](perfetto::EventContext ctx) {
if(config::get_perfetto_annotations())
tracing::add_perfetto_annotation(ctx, "return", JOIN(", ", _args...));
@@ -379,9 +376,6 @@ void
category_region<CategoryT>::audit(std::string_view _name, audit::incoming,
Args&&... _args)
{
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
start<OptsT...>(_name.data(), [&](perfetto::EventContext ctx) {
if(config::get_perfetto_annotations())
{
@@ -398,9 +392,6 @@ void
category_region<CategoryT>::audit(std::string_view _name, audit::outgoing,
Args&&... _args)
{
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
stop<OptsT...>(_name.data(), [&](perfetto::EventContext ctx) {
if(config::get_perfetto_annotations())
tracing::add_perfetto_annotation(ctx, "return", JOIN(", ", _args...));
@@ -466,6 +457,5 @@ struct local_category_region : comp::base<local_category_region<CategoryT>, void
private:
std::string_view m_prefix = {};
};
} // namespace component
} // namespace omnitrace
@@ -97,7 +97,7 @@ struct comm_data : base<comm_data, void>
static constexpr auto label = "RCCL Comm Send";
};
TIMEMORY_DEFAULT_OBJECT(comm_data)
OMNITRACE_DEFAULT_OBJECT(comm_data)
static void preinit();
static void configure();
@@ -45,7 +45,7 @@ struct cpu_freq
using storage_type = tim::storage<cpu_freq, value_type>;
using cpu_id_set_t = std::set<uint64_t>;
TIMEMORY_DEFAULT_OBJECT(cpu_freq)
OMNITRACE_DEFAULT_OBJECT(cpu_freq)
// string id for component
static std::string label();
@@ -39,7 +39,7 @@ namespace
template <typename... Tp>
struct ensure_storage
{
TIMEMORY_DEFAULT_OBJECT(ensure_storage)
OMNITRACE_DEFAULT_OBJECT(ensure_storage)
void operator()() const { OMNITRACE_FOLD_EXPRESSION((*this)(tim::type_list<Tp>{})); }
@@ -44,7 +44,7 @@ struct exit_gotcha : tim::component::base<exit_gotcha, void>
using exit_func_t = void (*)(int);
using abort_func_t = void (*)();
TIMEMORY_DEFAULT_OBJECT(exit_gotcha)
OMNITRACE_DEFAULT_OBJECT(exit_gotcha)
// string id for component
static std::string label() { return "exit_gotcha"; }
@@ -37,7 +37,7 @@ struct fork_gotcha : comp::base<fork_gotcha, void>
using gotcha_data_t = comp::gotcha_data;
TIMEMORY_DEFAULT_OBJECT(fork_gotcha)
OMNITRACE_DEFAULT_OBJECT(fork_gotcha)
// string id for component
static std::string label() { return "fork_gotcha"; }
@@ -38,7 +38,7 @@ struct mpi_gotcha : comp::base<mpi_gotcha, void>
using comm_t = tim::mpi::comm_t;
using gotcha_data_t = comp::gotcha_data;
TIMEMORY_DEFAULT_OBJECT(mpi_gotcha)
OMNITRACE_DEFAULT_OBJECT(mpi_gotcha)
// string id for component
static std::string label() { return "mpi_gotcha"; }
@@ -44,7 +44,7 @@ struct numa_gotcha : tim::component::base<numa_gotcha, void>
using exit_func_t = void (*)(int);
using abort_func_t = void (*)();
TIMEMORY_DEFAULT_OBJECT(numa_gotcha)
OMNITRACE_DEFAULT_OBJECT(numa_gotcha)
// string id for component
static std::string label() { return "numa_gotcha"; }
@@ -161,6 +161,7 @@ pthread_create_gotcha::wrapper::operator()() const
auto _signals = std::set<int>{};
auto _coverage = (get_mode() == Mode::Coverage);
const auto& _parent_info = thread_info::get(m_config.parent_tid, InternalTID);
const auto& _info = thread_info::init(m_config.offset);
auto _dtor = [&]() {
set_thread_state(ThreadState::Internal);
if(_is_sampling)
@@ -189,16 +190,22 @@ pthread_create_gotcha::wrapper::operator()() const
_thr_bundle->stop();
if(_bundle) stop_bundle(*_bundle, _tid);
pthread_create_gotcha::shutdown(_tid);
OMNITRACE_BASIC_VERBOSE(
1, "[PID=%i][rank=%i] Thread %s (parent: %s) exited\n", process::get_id(),
dmp::rank(), _info->index_data->as_string().c_str(),
_parent_info->index_data->as_string().c_str());
}
};
auto _active = (get_state() == ::omnitrace::State::Active && bundles != nullptr &&
bundles_mutex != nullptr);
const auto& _info = thread_info::init(m_config.offset);
if(_active && !_coverage && !m_config.offset)
{
_tid = _info->index_data->sequent_value;
OMNITRACE_BASIC_VERBOSE(1, "[PID=%i][rank=%i] Thread %s (parent: %s) created\n",
process::get_id(), dmp::rank(),
_info->index_data->as_string().c_str(),
_parent_info->index_data->as_string().c_str());
threading::set_thread_name(TIMEMORY_JOIN(" ", "Thread", _tid).c_str());
if(!thread_bundle_data_t::instances().at(_tid))
{
@@ -235,6 +242,14 @@ pthread_create_gotcha::wrapper::operator()() const
sampling::unblock_signals();
}
}
else if(m_config.offset)
{
OMNITRACE_BASIC_VERBOSE(
2,
"[PID=%i][rank=%i] Thread %s (parent: %s) created [started by omnitrace]\n",
process::get_id(), dmp::rank(), _info->index_data->as_string().c_str(),
_parent_info->index_data->as_string().c_str());
}
// notify the wrapper that all internal work is completed
if(m_config.promise) m_config.promise->set_value();
@@ -399,8 +414,9 @@ pthread_create_gotcha::operator()(pthread_t* thread, const pthread_attr_t* attr,
if(_active && !_disabled && !_info->is_offset)
{
OMNITRACE_VERBOSE(1, "Creating new thread on PID %i (rank: %i), TID %li\n",
process::get_id(), dmp::rank(), _tid);
OMNITRACE_BASIC_VERBOSE(2, "[PID=%i][rank=%i] Starting new thread on %s...\n",
process::get_id(), dmp::rank(),
_info->index_data->as_string().c_str());
}
// ensure that cpu cid stack exists on the parent thread if active
@@ -64,7 +64,7 @@ struct pthread_create_gotcha : tim::component::base<pthread_create_gotcha, void>
wrapper_config m_config = {};
};
TIMEMORY_DEFAULT_OBJECT(pthread_create_gotcha)
OMNITRACE_DEFAULT_OBJECT(pthread_create_gotcha)
// string id for component
static std::string label() { return "pthread_create_gotcha"; }
@@ -48,7 +48,7 @@ struct stop<omnitrace::component::pthread_create_gotcha_t>
{
using type = omnitrace::component::pthread_create_gotcha_t;
TIMEMORY_DEFAULT_OBJECT(stop)
OMNITRACE_DEFAULT_OBJECT(stop)
template <typename... Args>
explicit stop(type&, Args&&...)
@@ -33,7 +33,7 @@ namespace omnitrace
{
struct pthread_gotcha : tim::component::base<pthread_gotcha, void>
{
TIMEMORY_DEFAULT_OBJECT(pthread_gotcha)
OMNITRACE_DEFAULT_OBJECT(pthread_gotcha)
// string id for component
static std::string label() { return "pthread_gotcha"; }
@@ -44,7 +44,7 @@ struct pthread_mutex_gotcha : comp::base<pthread_mutex_gotcha, void>
using hash_array_t = std::array<size_t, gotcha_capacity>;
using gotcha_data_t = comp::gotcha_data;
TIMEMORY_DEFAULT_OBJECT(pthread_mutex_gotcha)
OMNITRACE_DEFAULT_OBJECT(pthread_mutex_gotcha)
explicit pthread_mutex_gotcha(const gotcha_data_t&);
@@ -109,7 +109,7 @@ struct rocprofiler
using base_type = base<rocprofiler, void>;
using tracker_type = policy::instance_tracker<rocprofiler, false>;
TIMEMORY_DEFAULT_OBJECT(rocprofiler)
OMNITRACE_DEFAULT_OBJECT(rocprofiler)
static void preinit();
static void global_init() { setup(); }
@@ -173,7 +173,7 @@ struct set_storage<component::rocm_data_tracker>
using storage_array_t = std::array<storage<type>*, max_threads>;
friend struct get_storage<component::rocm_data_tracker>;
TIMEMORY_DEFAULT_OBJECT(set_storage)
OMNITRACE_DEFAULT_OBJECT(set_storage)
auto operator()(storage<type>*, size_t) const {}
auto operator()(type&, size_t) const {}
@@ -192,7 +192,7 @@ struct get_storage<component::rocm_data_tracker>
{
using type = component::rocm_data_tracker;
TIMEMORY_DEFAULT_OBJECT(get_storage)
OMNITRACE_DEFAULT_OBJECT(get_storage)
auto operator()(const type&) const
{
@@ -51,7 +51,7 @@ struct roctracer
using base_type = base<roctracer, void>;
using tracker_type = policy::instance_tracker<roctracer, false>;
TIMEMORY_DEFAULT_OBJECT(roctracer)
OMNITRACE_DEFAULT_OBJECT(roctracer)
static void preinit();
static void global_init() { setup(); }
+28
@@ -94,5 +94,33 @@ public:
static constexpr bool value = sfinae(0);
constexpr auto operator()() const { return sfinae(0); }
};
template <size_t N, typename Tp, bool>
struct tuple_element_impl;
template <size_t N, typename... Tp>
struct tuple_element_impl<N, std::tuple<Tp...>, true>
{
using type = typename std::tuple_element<N, std::tuple<Tp...>>::type;
};
template <size_t N, typename... Tp>
struct tuple_element_impl<N, std::tuple<Tp...>, false>
{
using type = void;
};
template <size_t N, typename Tp>
struct tuple_element;
template <size_t N, typename... Tp>
struct tuple_element<N, std::tuple<Tp...>>
{
using type =
typename tuple_element_impl<N, std::tuple<Tp...>, (N < sizeof...(Tp))>::type;
};
template <size_t N, typename Tp>
using tuple_element_t = typename tuple_element<N, Tp>::type;
} // namespace concepts
} // namespace tim
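The `tuple_element` added above differs from `std::tuple_element` in that an out-of-range index yields `void` instead of a hard error. A minimal sketch of the same technique follows. The `safe_` names are hypothetical, chosen to avoid suggesting that this is the timemory code.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Primary template: selected via the boolean "index is in range" flag.
template <std::size_t N, typename Tp, bool InRange>
struct safe_tuple_element_impl;

// In range: defer to the standard trait.
template <std::size_t N, typename... Tp>
struct safe_tuple_element_impl<N, std::tuple<Tp...>, true>
{
    using type = typename std::tuple_element<N, std::tuple<Tp...>>::type;
};

// Out of range: std::tuple_element is never instantiated, so no hard error.
template <std::size_t N, typename... Tp>
struct safe_tuple_element_impl<N, std::tuple<Tp...>, false>
{
    using type = void;
};

template <std::size_t N, typename Tp>
struct safe_tuple_element;

template <std::size_t N, typename... Tp>
struct safe_tuple_element<N, std::tuple<Tp...>>
{
    using type =
        typename safe_tuple_element_impl<N, std::tuple<Tp...>, (N < sizeof...(Tp))>::type;
};

template <std::size_t N, typename Tp>
using safe_tuple_element_t = typename safe_tuple_element<N, Tp>::type;

static_assert(std::is_same<safe_tuple_element_t<1, std::tuple<int, double>>, double>::value,
              "in-range index returns the element type");
static_assert(std::is_void<safe_tuple_element_t<5, std::tuple<int, double>>>::value,
              "out-of-range index falls back to void");
```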
+186 -35
@@ -22,12 +22,12 @@
#include "library/config.hpp"
#include "common/defines.h"
#include "library/constraint.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
#include "library/gpu.hpp"
#include "library/mproc.hpp"
#include "library/perfetto.hpp"
#include "library/runtime.hpp"
#include <timemory/backends/dmp.hpp>
#include <timemory/backends/mpi.hpp>
@@ -43,12 +43,14 @@
#include <timemory/settings/types.hpp>
#include <timemory/utility/argparse.hpp>
#include <timemory/utility/declaration.hpp>
#include <timemory/utility/delimit.hpp>
#include <timemory/utility/filepath.hpp>
#include <timemory/utility/join.hpp>
#include <timemory/utility/signals.hpp>
#include <algorithm>
#include <array>
#include <atomic>
#include <csignal>
#include <cstdint>
#include <cstdlib>
@@ -98,7 +100,7 @@ get_setting_name(std::string _v)
template <typename Tp>
Tp
get_available_perfetto_categories()
get_available_categories()
{
auto _v = Tp{};
for(auto itr : { OMNITRACE_PERFETTO_CATEGORIES })
@@ -287,8 +289,8 @@ configure_settings(bool _init)
"for continuous integration)",
false, "debugging", "advanced");
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_COLORIZED_LOG", "Enable colorized logging",
true, "debugging", "advanced");
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_MONOCHROME", "Disable colorized logging",
false, "debugging", "advanced");
OMNITRACE_CONFIG_EXT_SETTING(int, "OMNITRACE_DL_VERBOSE",
"Verbosity within the omnitrace-dl library", 0,
@@ -392,10 +394,45 @@ configure_settings(bool _init)
"Enable support for code coverage", false, "coverage",
"backend", "advanced");
OMNITRACE_CONFIG_SETTING(size_t, "OMNITRACE_INSTRUMENTATION_INTERVAL",
"Instrumentation only takes measurements once every N "
"function calls (not statistical)",
size_t{ 1 }, "instrumentation", "data_sampling", "advanced");
OMNITRACE_CONFIG_SETTING(
double, "OMNITRACE_TRACE_DELAY",
"Time in seconds to wait before enabling trace/profile data collection. If "
"multiple delays + durations are needed, see OMNITRACE_TRACE_PERIODS.",
0.0, "trace", "profile", "perfetto", "timemory");
OMNITRACE_CONFIG_SETTING(
double, "OMNITRACE_TRACE_DURATION",
"If > 0.0, time (in seconds) to collect trace/profile data. If multiple delays + "
"durations are needed, see OMNITRACE_TRACE_PERIODS.",
0.0, "trace", "profile", "perfetto", "timemory");
auto _clock_s =
config::get_setting_value<std::string>("OMNITRACE_TRACE_PERIOD_CLOCK_ID").second;
auto _clock_choices = std::vector<std::string>{};
for(const auto& itr : constraint::get_valid_clock_ids())
{
_clock_choices.emplace_back(
join("", "(", join('|', itr.name, itr.value, itr.raw_name), ")"));
}
OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_TRACE_PERIODS",
"Similar to specifying a trace delay and/or duration, except in "
"the form <DELAY>:<DURATION>, <DELAY>:<DURATION>:<REPEAT>, "
"and/or <DELAY>:<DURATION>:<REPEAT>:<CLOCK_ID>",
std::string{}, "trace", "profile", "perfetto", "timemory");
OMNITRACE_CONFIG_SETTING(
std::string, "OMNITRACE_TRACE_PERIOD_CLOCK_ID",
"Set the default clock ID for OMNITRACE_TRACE_DELAY, OMNITRACE_TRACE_DURATION, "
"and/or OMNITRACE_TRACE_PERIODS. E.g. \"realtime\" == the delay/duration is "
"governed by the elapsed realtime, \"cputime\" == the delay/duration is governed "
"by the elapsed CPU-time within the process, etc. Note: when using CPU-based "
"timing, it is recommended to scale the value by the number of threads and be "
"aware that omnitrace may contribute to advancing the process CPU-time",
"CLOCK_REALTIME", "trace", "profile", "perfetto", "timemory")
->set_choices(_clock_choices);
OMNITRACE_CONFIG_SETTING(
double, "OMNITRACE_SAMPLING_FREQ",
@@ -639,10 +676,18 @@ configure_settings(bool _init)
"discard", "perfetto", "data")
->set_choices({ "fill", "discard" });
OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_PERFETTO_CATEGORIES",
"Categories to collect within perfetto", "", "perfetto",
"data", "advanced")
->set_choices(get_available_perfetto_categories<std::vector<std::string>>());
OMNITRACE_CONFIG_SETTING(std::string, "OMNITRACE_ENABLE_CATEGORIES",
"Enable collecting profiling and trace data for these "
"categories and disable all other categories",
"", "trace", "profile", "perfetto", "timemory", "data",
"advanced")
->set_choices(get_available_categories<std::vector<std::string>>());
OMNITRACE_CONFIG_SETTING(
std::string, "OMNITRACE_DISABLE_CATEGORIES",
"Disable collecting profiling and trace data for these categories", "", "trace",
"profile", "perfetto", "timemory", "data", "advanced")
->set_choices(get_available_categories<std::vector<std::string>>());
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_PERFETTO_ANNOTATIONS",
"Include debug annotations in perfetto trace. When enabled, "
@@ -977,8 +1022,8 @@ configure_settings(bool _init)
settings::suppress_config() = true;
if(!get_env("OMNITRACE_COLORIZED_LOG", _config->get<bool>("OMNITRACE_COLORIZED_LOG")))
tim::log::colorized() = false;
if(get_env("OMNITRACE_MONOCHROME", _config->get<bool>("OMNITRACE_MONOCHROME")))
tim::log::monochrome() = true;
if(_init)
{
@@ -1105,8 +1150,6 @@ configure_mode_settings()
_set("OMNITRACE_USE_ROCM_SMI", false);
}
get_instrumentation_interval() = std::max<size_t>(get_instrumentation_interval(), 1);
if(get_use_kokkosp())
{
auto _current_kokkosp_lib = tim::get_env<std::string>("KOKKOS_PROFILE_LIBRARY");
@@ -1156,6 +1199,13 @@ namespace
using signal_settings = tim::signals::signal_settings;
using sys_signal = tim::signals::sys_signal;
std::atomic<signal_handler_t>&
get_signal_handler()
{
static auto _v = std::atomic<signal_handler_t>{ nullptr };
return _v;
}
void
omnitrace_exit_action(int nsig)
{
@@ -1163,7 +1213,8 @@ omnitrace_exit_action(int nsig)
tim::signals::sigmask_scope::process);
OMNITRACE_BASIC_PRINT("Finalizing after signal %i :: %s\n", nsig,
signal_settings::str(static_cast<sys_signal>(nsig)).c_str());
if(get_state() == State::Active) omnitrace_finalize();
auto _handler = get_signal_handler().load();
if(_handler) (*_handler)();
kill(process::get_id(), nsig);
}
@@ -1183,6 +1234,28 @@ omnitrace_trampoline_handler(int _v)
}
} // namespace
signal_handler_t
set_signal_handler(signal_handler_t _func)
{
if(_func)
{
auto _handler = get_signal_handler().load(std::memory_order_relaxed);
if(get_signal_handler().compare_exchange_strong(_handler, _func,
std::memory_order_relaxed))
{
return _handler;
}
else
{
_handler = get_signal_handler().load(std::memory_order_seq_cst);
get_signal_handler().store(_func);
return _handler;
}
}
return get_signal_handler().load();
}
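`set_signal_handler` above returns the current handler when passed `nullptr` and otherwise installs the new handler and returns the previous one. As a hedged sketch, the same contract can be written with a single `std::atomic::exchange`. This is an alternative formulation for illustration, not what the PR does, and `handler_a` / `handler_b` are hypothetical handlers.

```cpp
#include <atomic>
#include <cassert>

using signal_handler_t = void (*)();

// Function-local static avoids static-initialization-order issues.
inline std::atomic<signal_handler_t>& get_signal_handler()
{
    static std::atomic<signal_handler_t> handler{ nullptr };
    return handler;
}

// nullptr  -> query only, returns the current handler.
// non-null -> installs func and returns the handler it replaced.
inline signal_handler_t set_signal_handler(signal_handler_t func)
{
    if(!func) return get_signal_handler().load();
    return get_signal_handler().exchange(func);
}

// Hypothetical handlers, used only to demonstrate the contract.
inline void handler_a() {}
inline void handler_b() {}

inline void example_signal_handler()
{
    auto previous = set_signal_handler(&handler_a);
    assert(set_signal_handler(nullptr) == &handler_a);
    // restore whatever was installed before the example ran
    get_signal_handler().store(previous);
}
```

`exchange` swaps the value and returns the old one in one atomic step, so no retry path is needed when another thread changes the handler concurrently.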
void
configure_signal_handler()
{
@@ -1218,6 +1291,35 @@ configure_signal_handler()
}
}
int
get_realtime_signal()
{
return SIGRTMIN + get_sampling_rtoffset();
}
int
get_cputime_signal()
{
return SIGPROF;
}
std::set<int> get_sampling_signals(int64_t)
{
auto _v = std::set<int>{};
if(get_use_causal())
{
_v.emplace(get_cputime_signal());
_v.emplace(get_realtime_signal());
}
else
{
if(get_use_sampling_cputime()) _v.emplace(get_cputime_signal());
if(get_use_sampling_realtime()) _v.emplace(get_realtime_signal());
}
return _v;
}
void
configure_disabled_settings()
{
@@ -1964,18 +2066,74 @@ get_perfetto_fill_policy()
return static_cast<tim::tsettings<std::string>&>(*_v->second).get();
}
std::set<std::string>
get_perfetto_categories()
namespace
{
static auto _v = get_config()->find("OMNITRACE_PERFETTO_CATEGORIES");
static auto _avail = get_available_perfetto_categories<std::set<std::string>>();
auto _ret = std::set<std::string>{};
for(auto itr : tim::delimit(
static_cast<tim::tsettings<std::string>&>(*_v->second).get(), " ,;:"))
{
if(_avail.count(itr) > 0) _ret.emplace(itr);
}
return _ret;
auto
get_category_config()
{
using strset_t = std::set<std::string>;
static auto _v = []() {
auto _avail = get_available_categories<strset_t>();
auto _parse = [&_avail](const auto& _setting) {
auto _ret = strset_t{};
for(auto itr : tim::delimit(
static_cast<tim::tsettings<std::string>&>(*_setting->second).get(),
" ,;:\n\t"))
{
if(_avail.count(itr) > 0) _ret.emplace(itr);
}
return _ret;
};
auto _enabled = _parse(get_config()->find("OMNITRACE_ENABLE_CATEGORIES"));
auto _disabled = _parse(get_config()->find("OMNITRACE_DISABLE_CATEGORIES"));
if(_enabled.empty() && _disabled.empty())
{
_enabled = _avail;
}
else if(_enabled.empty() && !_disabled.empty())
{
for(auto itr : _avail)
{
if(_disabled.count(itr) == 0) _enabled.emplace(itr);
}
}
else if(!_enabled.empty() && _disabled.empty())
{
for(auto itr : _avail)
{
if(_enabled.count(itr) == 0) _disabled.emplace(itr);
}
}
else
{
OMNITRACE_ABORT("Error! Conflicting options OMNITRACE_ENABLE_CATEGORIES and "
"OMNITRACE_DISABLE_CATEGORIES were both provided.");
}
OMNITRACE_CI_THROW(_enabled.size() + _disabled.size() != _avail.size(),
"Error! Internal error for categories: %zu (enabled) + %zu "
"(disabled) != %zu (total)\n",
_enabled.size(), _disabled.size(), _avail.size());
return std::make_pair(_enabled, _disabled);
}();
return _v;
}
} // namespace
std::set<std::string>
get_enabled_categories()
{
return get_category_config().first;
}
std::set<std::string>
get_disabled_categories()
{
return get_category_config().second;
}
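The enable/disable resolution in `get_category_config` can be summarized as: neither list given means everything is enabled, exactly one list given means the other is derived as its complement, and both given is an error. A minimal sketch of that rule is below. `resolve_categories` is a hypothetical name, the inputs are assumed to be already filtered against the available set, and the conflict is reported with an exception here, whereas the diff uses `OMNITRACE_ABORT`.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

using strset_t = std::set<std::string>;

// Returns {enabled, disabled}; together they always partition `avail`.
inline std::pair<strset_t, strset_t>
resolve_categories(const strset_t& avail, strset_t enabled, strset_t disabled)
{
    if(!enabled.empty() && !disabled.empty())
        throw std::invalid_argument("enable and disable lists are mutually exclusive");

    if(enabled.empty() && disabled.empty())
    {
        enabled = avail;  // nothing specified: everything is enabled
    }
    else if(enabled.empty())
    {
        // only a disable list: enabled = avail minus disabled
        for(const auto& itr : avail)
            if(disabled.count(itr) == 0) enabled.emplace(itr);
    }
    else
    {
        // only an enable list: disabled = avail minus enabled
        for(const auto& itr : avail)
            if(enabled.count(itr) == 0) disabled.emplace(itr);
    }
    assert(enabled.size() + disabled.size() == avail.size());
    return { enabled, disabled };
}
```

For example, with available categories `host`, `mpi`, and `python`, passing only `mpi` as disabled yields `host` and `python` as enabled.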
bool
@@ -2043,13 +2201,6 @@ get_perfetto_output_filename()
return _val;
}
size_t&
get_instrumentation_interval()
{
static auto _v = get_config()->find("OMNITRACE_INSTRUMENTATION_INTERVAL");
return static_cast<tim::tsettings<size_t>&>(*_v->second).get();
}
double
get_sampling_freq()
{
+24 -4
@@ -22,7 +22,6 @@
#pragma once
#include "api.hpp"
#include "library/common.hpp"
#include "library/defines.hpp"
#include "library/state.hpp"
@@ -43,6 +42,12 @@ namespace omnitrace
//
inline namespace config
{
using signal_handler_t = void (*)(void);
// if arg is nullptr, returns current signal handler
// if arg is non-null, returns replaced signal handler
signal_handler_t set_signal_handler(signal_handler_t);
bool
settings_are_configured() OMNITRACE_HOT;
@@ -55,6 +60,15 @@ configure_mode_settings();
void
configure_signal_handler();
int
get_realtime_signal();
int
get_cputime_signal();
std::set<int>
get_sampling_signals(int64_t _tid = 0);
void
configure_disabled_settings();
@@ -257,7 +271,10 @@ std::string
get_perfetto_fill_policy();
std::set<std::string>
get_perfetto_categories();
get_enabled_categories();
std::set<std::string>
get_disabled_categories();
bool
get_perfetto_annotations() OMNITRACE_HOT;
@@ -284,8 +301,11 @@ get_perfetto_roctracer_per_stream() OMNITRACE_HOT;
int64_t
get_critical_trace_count();
size_t&
get_instrumentation_interval();
double
get_trace_delay();
double
get_trace_duration();
double
get_sampling_freq();
+349
@@ -0,0 +1,349 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include "library/constraint.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/state.hpp"
#include "library/utility.hpp"
#include <timemory/units.hpp>
#include <timemory/utility/delimit.hpp>
#include <chrono>
#include <cstdint>
#include <ratio>
#include <string>
#include <thread>
#include <type_traits>
namespace omnitrace
{
namespace constraint
{
namespace
{
namespace units = ::tim::units;
using clock_type = std::chrono::high_resolution_clock;
using duration_type = std::chrono::duration<double, std::nano>;
#define OMNITRACE_CLOCK_IDENTIFIER(VAL) \
clock_identifier { #VAL, VAL }
auto
clock_name(std::string _v)
{
constexpr auto _clock_prefix = std::string_view{ "clock_" };
for(auto& itr : _v)
itr = tolower(itr);
auto _pos = _v.find(_clock_prefix);
if(_pos == 0) _v = _v.substr(_pos + _clock_prefix.length());
if(_v == "process_cputime_id") _v = "cputime";
return _v;
}
auto accepted_clock_ids =
std::set<clock_identifier>{ OMNITRACE_CLOCK_IDENTIFIER(CLOCK_REALTIME),
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_MONOTONIC),
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_PROCESS_CPUTIME_ID),
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_MONOTONIC_RAW),
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_REALTIME_COARSE),
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_MONOTONIC_COARSE),
OMNITRACE_CLOCK_IDENTIFIER(CLOCK_BOOTTIME) };
template <typename Tp>
clock_identifier
find_clock_identifier(const Tp& _v)
{
const char* _descript = "";
if constexpr(std::is_integral<Tp>::value)
{
_descript = "value";
for(const auto& itr : accepted_clock_ids)
{
if(itr.value == _v)
{
return itr;
}
}
}
else
{
_descript = "name";
auto _clock_name = clock_name(_v);
for(const auto& itr : accepted_clock_ids)
{
if(itr.name == _clock_name || itr.raw_name == _v ||
std::to_string(itr.value) == _v)
{
return itr;
}
}
}
OMNITRACE_THROW("Unknown clock id %s: %s. Valid choices: %s\n", _descript,
timemory::join::join("", _v).c_str(),
timemory::join::join("", accepted_clock_ids).c_str());
}
void
sleep(uint64_t _n)
{
std::this_thread::sleep_for(std::chrono::nanoseconds{ _n });
}
timespec
get_timespec(clockid_t clock_id) noexcept
{
struct timespec _ts;
clock_gettime(clock_id, &_ts);
return _ts;
}
template <typename Tp = uint64_t, typename Precision = std::nano>
Tp
get_clock_now(clockid_t clock_id) noexcept
{
constexpr Tp factor = (Precision::den == std::nano::den)
? 1
: (Precision::den / static_cast<Tp>(std::nano::den));
auto _ts = get_timespec(clock_id);
return (_ts.tv_sec * std::nano::den + _ts.tv_nsec) * factor;
}
} // namespace
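`get_clock_now` above reads a POSIX clock with `clock_gettime` and flattens the `timespec` into a single integer. As written in the diff, the `factor` expression appears to truncate to zero under integer division when `Precision` is coarser than nanoseconds and `Tp` is the default `uint64_t`; the diff only calls it with the defaults, so this does not affect the PR. The sketch below shows one way to avoid that truncation by dividing instead of multiplying. `clock_now` is a hypothetical name, and the sketch assumes a POSIX system and a `Precision` that is nanoseconds or coarser.

```cpp
#include <cassert>
#include <cstdint>
#include <ratio>
#include <time.h>

// Returns the current value of `clock_id` in units of Precision (default: ns).
template <typename Precision = std::nano>
std::uint64_t clock_now(clockid_t clock_id) noexcept
{
    static_assert(Precision::num == 1 && Precision::den <= std::nano::den &&
                      std::nano::den % Precision::den == 0,
                  "sketch only handles power-of-ten precisions down to nanoseconds");

    struct timespec ts{};
    if(clock_gettime(clock_id, &ts) != 0) return 0;  // sketch: report failure as 0

    auto ns = static_cast<std::uint64_t>(ts.tv_sec) * std::nano::den +
              static_cast<std::uint64_t>(ts.tv_nsec);
    // e.g. std::micro -> divide by 1000; std::nano -> divide by 1
    return ns / (std::nano::den / Precision::den);
}

inline void example_clock_now()
{
    auto first  = clock_now<>(CLOCK_MONOTONIC);
    auto second = clock_now<>(CLOCK_MONOTONIC);
    assert(second >= first);  // a monotonic clock never goes backwards
}
```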
//--------------------------------------------------------------------------------------//
//
// stages implementation
//
//--------------------------------------------------------------------------------------//
stages::stages()
: init{ [](const spec&) { return get_state() < State::Finalized; } }
, wait{ [](const spec& _spec) {
sleep(std::min<uint64_t>(100 * units::msec, _spec.delay * units::sec));
return get_state() < State::Finalized;
} }
, start{ [](const spec&) { return get_state() < State::Finalized; } }
, collect{ [](const spec& _spec) {
sleep(std::min<uint64_t>(100 * units::msec, _spec.duration * units::sec));
return get_state() < State::Finalized;
} }
, stop{ [](const spec&) { return get_state() < State::Finalized; } }
{}
//--------------------------------------------------------------------------------------//
//
// clock identifier implementation
//
//--------------------------------------------------------------------------------------//
clock_identifier::clock_identifier(std::string_view _name, int _val)
: value{ _val }
, raw_name{ _name }
, name{ clock_name(std::string{ _name }) }
{}
bool
clock_identifier::operator<(const clock_identifier& _rhs) const
{
return value < _rhs.value;
}
bool
clock_identifier::operator==(const clock_identifier& _rhs) const
{
return std::tie(raw_name, value) == std::tie(_rhs.raw_name, _rhs.value);
}
bool
clock_identifier::operator==(int _rhs) const
{
return (value == _rhs);
}
bool
clock_identifier::operator==(std::string _rhs) const
{
return (raw_name == std::string_view{ _rhs }) ||
(name == clock_name(std::move(_rhs)));
}
std::string
clock_identifier::as_string() const
{
auto _name = name;
for(auto& itr : _name)
itr = tolower(itr);
auto _ss = std::stringstream{};
_ss << _name << "(id=" << raw_name << ", value=" << value << ")";
return _ss.str();
}
//--------------------------------------------------------------------------------------//
//
// spec implementation
//
//--------------------------------------------------------------------------------------//
spec::spec(clock_identifier _id, double _delay, double _dur, uint64_t _n, uint64_t _rep)
: delay{ _delay }
, duration{ _dur }
, count{ _n }
, repeat{ _rep }
, clock_id{ std::move(_id) }
{}
spec::spec(int _clock_id, double _delay, double _dur, uint64_t _n, uint64_t _rep)
: delay{ _delay }
, duration{ _dur }
, count{ _n }
, repeat{ _rep }
, clock_id{ find_clock_identifier(_clock_id) }
{}
spec::spec(const std::string& _clock_id, double _delay, double _dur, uint64_t _n,
uint64_t _rep)
: delay{ _delay }
, duration{ _dur }
, count{ _n }
, repeat{ _rep }
, clock_id{ find_clock_identifier(_clock_id) }
{}
spec::spec(const std::string& _line)
: spec{ config::get_setting_value<std::string>("OMNITRACE_TRACE_PERIOD_CLOCK_ID").second,
config::get_setting_value<double>("OMNITRACE_TRACE_DELAY").second,
config::get_setting_value<double>("OMNITRACE_TRACE_DURATION").second }
{
auto _delim = tim::delimit(_line, ":");
if(!_delim.empty()) delay = utility::convert<double>(_delim.at(0));
if(_delim.size() > 1) duration = utility::convert<double>(_delim.at(1));
if(_delim.size() > 2) repeat = utility::convert<uint64_t>(_delim.at(2));
if(_delim.size() > 3) clock_id = find_clock_identifier(_delim.at(3));
}
void
spec::operator()(const stages& _stages) const
{
auto _n = repeat;
if(_n < 1) _n = std::numeric_limits<uint64_t>::max();
while(get_state() < State::Active)
sleep(1 * units::usec);
for(uint64_t i = 0; i < _n; ++i)
{
auto _spec = spec{ clock_id, delay, duration, i, repeat };
auto _wait = [_spec](const auto& _func, auto _dur) {
auto _ret = true;
auto _now = get_clock_now(_spec.clock_id.value);
auto _del = (_dur * units::sec);
auto _end = _now + _del;
while(get_clock_now(_spec.clock_id.value) < _end && (_ret = _func(_spec)))
{}
return _ret;
};
OMNITRACE_VERBOSE(2,
"Executing constraint spec %lu of %lu :: delay: %6.3f, "
"duration: %6.3f, clock: %s\n",
i, _spec.repeat, _spec.delay, _spec.duration,
_spec.clock_id.as_string().c_str());
if(_stages.init(_spec) && _wait(_stages.wait, _spec.delay) &&
_stages.start(_spec) && _wait(_stages.collect, _spec.duration) &&
_stages.stop(_spec))
{}
else
{
break;
}
}
}
//--------------------------------------------------------------------------------------//
//
// global usage functions
//
//--------------------------------------------------------------------------------------//
const std::set<clock_identifier>&
get_valid_clock_ids()
{
return accepted_clock_ids;
}
std::vector<spec>
get_trace_specs()
{
auto _v = std::vector<constraint::spec>{};
{
auto _delay_v = config::get_setting_value<double>("OMNITRACE_TRACE_DELAY").second;
auto _duration_v =
config::get_setting_value<double>("OMNITRACE_TRACE_DURATION").second;
auto _clock_v = find_clock_identifier(
config::get_setting_value<std::string>("OMNITRACE_TRACE_PERIOD_CLOCK_ID")
.second);
if(_delay_v > 0.0 || _duration_v > 0.0)
{
_v.emplace_back(_clock_v, _delay_v, _duration_v);
}
}
{
auto _periods_v =
config::get_setting_value<std::string>("OMNITRACE_TRACE_PERIODS").second;
if(!_periods_v.empty())
{
for(auto itr : tim::delimit(_periods_v, " ;\t\n"))
_v.emplace_back(itr);
}
}
return _v;
}
stages
get_trace_stages()
{
auto _v = stages{};
_v.init = [](const spec&) { return get_state() < State::Finalized; };
_v.wait = [](const spec& _spec) {
sleep(std::min<uint64_t>(100 * units::msec, _spec.delay * units::sec));
return get_state() < State::Finalized;
};
_v.start = [](const spec&) { return get_state() < State::Finalized; };
_v.collect = [](const spec& _spec) {
sleep(std::min<uint64_t>(100 * units::msec, _spec.duration * units::sec));
return get_state() < State::Finalized;
};
_v.stop = [](const spec&) { return get_state() < State::Finalized; };
return _v;
}
} // namespace constraint
} // namespace omnitrace
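Reviewer note: the string constructor of `spec` and `get_trace_specs()` together turn `OMNITRACE_TRACE_PERIODS` into a list of windows. The sketch below reproduces that parsing in isolation. It is a minimal illustration, not the PR's code: `period_spec`, `split`, `parse_period`, and `parse_periods` are hypothetical names, `split` only approximates what `tim::delimit` is assumed to do (split on any delimiter character and drop empty tokens), and the clock is kept as a plain string instead of being resolved through `find_clock_identifier`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for constraint::spec; field names mirror the diff.
struct period_spec
{
    double      delay    = 0.0;         // seconds before collection starts
    double      duration = 0.0;         // seconds of collection
    uint64_t    repeat   = 1;           // number of delay + duration cycles
    std::string clock    = "realtime";  // clock name, left unresolved here
};

// Split on any character in `delims`, dropping empty tokens
// (assumed to approximate tim::delimit).
inline std::vector<std::string>
split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> out{};
    std::size_t              beg = 0;
    while((beg = text.find_first_not_of(delims, beg)) != std::string::npos)
    {
        std::size_t end = text.find_first_of(delims, beg);
        out.emplace_back(text.substr(beg, end - beg));
        beg = end;
    }
    return out;
}

// Parse one "<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>" entry.
// Fields missing from the entry keep the values passed in `defaults`.
inline period_spec
parse_period(const std::string& entry, period_spec defaults = {})
{
    auto fields = split(entry, ":");
    if(!fields.empty()) defaults.delay = std::stod(fields.at(0));
    if(fields.size() > 1) defaults.duration = std::stod(fields.at(1));
    if(fields.size() > 2) defaults.repeat = std::stoull(fields.at(2));
    if(fields.size() > 3) defaults.clock = fields.at(3);
    return defaults;
}

// Entries are separated by spaces, semicolons, tabs, or newlines.
inline std::vector<period_spec>
parse_periods(const std::string& value)
{
    std::vector<period_spec> out{};
    for(const auto& entry : split(value, " ;\t\n"))
        out.emplace_back(parse_period(entry));
    return out;
}
```

Calling `parse_periods("0.5:1.0:2:realtime    10.0:5.0:3:cputime")` yields two entries matching the example in the PR description. In the PR itself, the defaults for omitted fields come from `OMNITRACE_TRACE_DELAY`, `OMNITRACE_TRACE_DURATION`, and `OMNITRACE_TRACE_PERIOD_CLOCK_ID` rather than from hard-coded values.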
+114
@@ -0,0 +1,114 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#pragma once
/// @file
/// This provides generic functionality for constraining data collection within
/// a window of time. E.g., delay, delay + duration, (delay + duration) * nrepeat
///
/// @todo Migrate delay/duration for sampling, process sampling, and causal profiling
/// to use this
///
#include "library/defines.hpp"
#include <cstdint>
#include <ctime>
#include <functional>
#include <set>
#include <string>
#include <vector>
namespace omnitrace
{
namespace constraint
{
struct spec;
struct stages
{
using functor_t = std::function<bool(const spec&)>;
stages();
OMNITRACE_DEFAULT_COPY_MOVE(stages)
functor_t init = [](const spec&) { return true; };
functor_t wait = [](const spec&) { return true; };
functor_t start = [](const spec&) { return true; };
functor_t collect = [](const spec&) { return true; };
functor_t stop = [](const spec&) { return true; };
};
struct clock_identifier
{
int value = -1;
std::string_view raw_name = {};
std::string name = {};
clock_identifier();
clock_identifier(std::string_view, int);
OMNITRACE_DEFAULT_COPY_MOVE(clock_identifier)
std::string as_string() const;
bool operator<(const clock_identifier& _rhs) const;
bool operator==(const clock_identifier& _rhs) const;
bool operator==(int _rhs) const;
bool operator==(std::string _rhs) const;
friend std::ostream& operator<<(std::ostream& _os, const clock_identifier& _v)
{
return (_os << _v.as_string());
}
};
struct spec
{
spec(int, double, double, uint64_t = 0, uint64_t = 1);
spec(clock_identifier, double, double, uint64_t = 0, uint64_t = 1);
spec(const std::string&, double, double, uint64_t = 0, uint64_t = 1);
spec(const std::string&);
OMNITRACE_DEFAULT_COPY_MOVE(spec)
void operator()(const stages&) const;
double delay = 0.0;
double duration = 0.0;
uint64_t count = 0;
uint64_t repeat = 1;
clock_identifier clock_id = {};
};
const std::set<clock_identifier>&
get_valid_clock_ids();
std::vector<spec>
get_trace_specs();
stages
get_trace_stages();
} // namespace constraint
} // namespace omnitrace
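Reviewer note: `spec::operator()` chains the five `stages` functors with `&&`, so the first stage that returns `false` ends the whole sequence. The sketch below shows that control flow under simplifying assumptions: `window`, `window_stages`, and `run_windows` are hypothetical names, only `std::chrono::steady_clock` is used (the PR selects the clock through `clock_id`), and the initial wait for `State::Active` is omitted.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <limits>
#include <thread>

// Hypothetical, simplified stand-in for constraint::spec (one clock only).
struct window
{
    double   delay    = 0.0;  // seconds to wait before collecting
    double   duration = 0.0;  // seconds to collect
    uint64_t repeat   = 1;    // < 1 is treated as "repeat until a stage fails"
};

// Hypothetical stand-in for constraint::stages.
struct window_stages
{
    using functor_t = std::function<bool(const window&)>;

    // Polling stages nap briefly so the wait loops do not spin at full speed.
    static bool nap(const window&)
    {
        std::this_thread::sleep_for(std::chrono::milliseconds{ 1 });
        return true;
    }

    functor_t init    = [](const window&) { return true; };
    functor_t wait    = &window_stages::nap;
    functor_t start   = [](const window&) { return true; };
    functor_t collect = &window_stages::nap;
    functor_t stop    = [](const window&) { return true; };
};

// Runs delay + collection cycles and returns how many cycles completed.
inline uint64_t
run_windows(const window& w, const window_stages& s)
{
    using steady = std::chrono::steady_clock;

    // Poll until `seconds` have elapsed or the poll functor asks to stop.
    auto wait_for = [&w](const window_stages::functor_t& poll, double seconds) {
        auto end = steady::now() + std::chrono::duration<double>(seconds);
        bool ok  = true;
        while(steady::now() < end && (ok = poll(w)))
        {}
        return ok;
    };

    uint64_t n = (w.repeat < 1) ? std::numeric_limits<uint64_t>::max() : w.repeat;
    uint64_t done = 0;
    for(uint64_t i = 0; i < n; ++i)
    {
        // Short-circuit evaluation: any failing stage skips the rest.
        if(!(s.init(w) && wait_for(s.wait, w.delay) && s.start(w) &&
             wait_for(s.collect, w.duration) && s.stop(w)))
            break;
        ++done;
    }
    return done;
}
```

In this arrangement `start` and `stop` are the natural places to enable and disable data collection, while `wait` and `collect` double as cancellation checks. The stages built by `get_trace_stages()` use them that way by returning `get_state() < State::Finalized`.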
+1 -2
@@ -22,7 +22,6 @@
#include "library/debug.hpp"
#include "library/binary/address_range.hpp"
#include "library/runtime.hpp"
#include "library/state.hpp"
#include <timemory/log/color.hpp>
@@ -91,7 +90,7 @@ get_file()
{
static FILE* _v = []() {
auto&& _fname = tim::get_env<std::string>("OMNITRACE_LOG_FILE", "");
if(!_fname.empty()) tim::log::colorized() = false;
if(!_fname.empty()) tim::log::monochrome() = true;
return (_fname.empty()) ? stderr : tim::filepath::fopen(_fname, "w");
}();
return _v;
+17
@@ -42,3 +42,20 @@
#define OMNITRACE_SAMPLING_GPU_MEMORY_USAGE OMNITRACE_SAMPLING_GPU_MEMORY_USAGE_idx
#define OMNITRACE_METADATA(...) ::tim::manager::add_metadata(__VA_ARGS__)
#if !defined(OMNITRACE_DEFAULT_OBJECT)
# define OMNITRACE_DEFAULT_OBJECT(NAME) \
NAME() = default; \
NAME(const NAME&) = default; \
NAME(NAME&&) noexcept = default; \
NAME& operator=(const NAME&) = default; \
NAME& operator=(NAME&&) noexcept = default;
#endif
#if !defined(OMNITRACE_DEFAULT_COPY_MOVE)
# define OMNITRACE_DEFAULT_COPY_MOVE(NAME) \
NAME(const NAME&) = default; \
NAME(NAME&&) noexcept = default; \
NAME& operator=(const NAME&) = default; \
NAME& operator=(NAME&&) noexcept = default;
#endif
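Reviewer note: the two macros differ only in whether the default constructor is generated. `OMNITRACE_DEFAULT_COPY_MOVE` suits types such as `spec` that must be built from arguments. The sketch below shows the effect with a renamed copy of the macro, so it compiles on its own. `EXAMPLE_DEFAULT_COPY_MOVE` and `labeled_value` are hypothetical names and are not part of the PR.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Renamed copy of OMNITRACE_DEFAULT_COPY_MOVE so the example is standalone.
#define EXAMPLE_DEFAULT_COPY_MOVE(NAME)                                        \
    NAME(const NAME&)     = default;                                           \
    NAME(NAME&&) noexcept = default;                                           \
    NAME& operator=(const NAME&) = default;                                    \
    NAME& operator=(NAME&&) noexcept = default;

// Hypothetical type: constructible only from arguments, yet copyable/movable.
struct labeled_value
{
    labeled_value(std::string _label, int _value)
    : label{ std::move(_label) }
    , value{ _value }
    {}

    EXAMPLE_DEFAULT_COPY_MOVE(labeled_value)

    std::string label;
    int         value;
};

// No default constructor is generated because a user-declared one exists
// and the macro does not add `NAME() = default;`.
static_assert(!std::is_default_constructible<labeled_value>::value,
              "default construction should be unavailable");
static_assert(std::is_copy_constructible<labeled_value>::value,
              "copy construction should be available");
static_assert(std::is_nothrow_move_constructible<labeled_value>::value,
              "move construction should be noexcept");
```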
+86 -15
@@ -20,30 +20,34 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#if !defined(OMNITRACE_USE_ROCM_SMI)
# define OMNITRACE_USE_ROCM_SMI 0
#endif
#if !defined(OMNITRACE_USE_HIP)
# define OMNITRACE_USE_HIP 0
#endif
#if OMNITRACE_USE_HIP > 0
# if !defined(TIMEMORY_USE_HIP)
# define TIMEMORY_USE_HIP 1
# endif
#endif
#include "library/gpu.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
#include <timemory/manager.hpp>
#if defined(OMNITRACE_USE_ROCM_SMI) && OMNITRACE_USE_ROCM_SMI > 0
# include "library/rocm_smi.hpp"
#elif !defined(OMNITRACE_USE_ROCM_SMI)
# define OMNITRACE_USE_ROCM_SMI 0
#endif
#if defined(OMNITRACE_USE_HIP) && OMNITRACE_USE_HIP > 0
# if !defined(TIMEMORY_USE_HIP)
# define TIMEMORY_USE_HIP 1
# endif
# include <timemory/components/hip/backends.hpp>
#elif !defined(OMNITRACE_USE_HIP)
# define OMNITRACE_USE_HIP 0
#if OMNITRACE_USE_ROCM_SMI > 0
# include <rocm_smi/rocm_smi.h>
#endif
#if OMNITRACE_USE_HIP > 0
# include <hip/hip_runtime.h>
# include <hip/hip_runtime_api.h>
# include <timemory/components/hip/backends.hpp>
# if !defined(OMNITRACE_HIP_RUNTIME_CALL)
# define OMNITRACE_HIP_RUNTIME_CALL(err) \
@@ -62,6 +66,49 @@ namespace omnitrace
{
namespace gpu
{
namespace
{
namespace scope = ::tim::scope;
#if OMNITRACE_USE_ROCM_SMI > 0
# define OMNITRACE_ROCM_SMI_CALL(ERROR_CODE) \
::omnitrace::gpu::check_rsmi_error(ERROR_CODE, __FILE__, __LINE__)
void
check_rsmi_error(rsmi_status_t _code, const char* _file, int _line)
{
if(_code == RSMI_STATUS_SUCCESS) return;
const char* _msg = nullptr;
auto _err = rsmi_status_string(_code, &_msg);
if(_err != RSMI_STATUS_SUCCESS)
OMNITRACE_THROW("rsmi_status_string failed. No error message available. "
"Error code %i originated at %s:%i\n",
static_cast<int>(_code), _file, _line);
OMNITRACE_THROW("[%s:%i] Error code %i :: %s", _file, _line, static_cast<int>(_code),
_msg);
}
bool
rsmi_init()
{
auto _rsmi_init = []() {
try
{
OMNITRACE_ROCM_SMI_CALL(::rsmi_init(0));
} catch(std::exception& _e)
{
OMNITRACE_BASIC_VERBOSE(1, "Exception thrown initializing rocm-smi: %s\n",
_e.what());
return false;
}
return true;
}();
return _rsmi_init;
}
#endif
} // namespace
int
hip_device_count()
{
@@ -72,13 +119,37 @@ hip_device_count()
#endif
}
int
rsmi_device_count()
{
#if OMNITRACE_USE_ROCM_SMI > 0
if(!rsmi_init()) return 0;
static auto _num_devices = []() {
uint32_t _v = 0;
try
{
OMNITRACE_ROCM_SMI_CALL(rsmi_num_monitor_devices(&_v));
} catch(std::exception& _e)
{
OMNITRACE_BASIC_VERBOSE(
1, "Exception thrown getting the rocm-smi devices: %s\n", _e.what());
}
return _v;
}();
return _num_devices;
#else
return 0;
#endif
}
int
device_count()
{
#if OMNITRACE_USE_ROCM_SMI > 0
// store as static since calls after rsmi_shutdown will return zero
static auto _v = rocm_smi::device_count();
return _v;
return rsmi_device_count();
#elif OMNITRACE_USE_HIP > 0
return ::tim::hip::device_count();
#else
+3
@@ -32,6 +32,9 @@ device_count();
int
hip_device_count();
int
rsmi_device_count();
void
add_hip_device_metadata();
} // namespace gpu
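Reviewer note: `rsmi_device_count()` caches its result in a function-local static that is initialized by an immediately invoked lambda, and it catches exceptions so a failed query degrades to zero devices. The sketch below isolates that idiom. `query_calls`, `query_device_count`, and `cached_device_count` are hypothetical stand-ins, and no ROCm SMI call is made.

```cpp
#include <cstdint>
#include <cstdio>
#include <exception>
#include <stdexcept>

// Counts how often the (pretend) driver query runs.
inline int&
query_calls()
{
    static int _n = 0;
    return _n;
}

// Hypothetical stand-in for a fallible driver query.
inline uint32_t
query_device_count(bool _fail)
{
    ++query_calls();
    if(_fail) throw std::runtime_error("driver unavailable");
    return 4;
}

// The static is initialized exactly once (thread-safe since C++11);
// later calls return the cached value without querying again.
template <bool Fail>
inline int
cached_device_count()
{
    static auto _num_devices = []() {
        uint32_t _v = 0;
        try
        {
            _v = query_device_count(Fail);
        } catch(std::exception& _e)
        {
            std::fprintf(stderr, "Exception thrown getting devices: %s\n", _e.what());
        }
        return _v;
    }();
    return static_cast<int>(_num_devices);
}
```

One consequence of this design is that a failure is cached as well: if the first query throws, every later call keeps returning zero. That matches the intent stated in the replaced code's comment, where the value is stored because calls after `rsmi_shutdown` would return zero.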
+27 -36
@@ -28,37 +28,40 @@ namespace omnitrace
{
namespace perfetto
{
auto&
get_config()
{
static auto _v = ::perfetto::TraceConfig{};
return _v;
}
auto&
get_session()
{
static auto _v = std::unique_ptr<::perfetto::TracingSession>{};
return _v;
}
void
setup()
{
auto args = ::perfetto::TracingInitArgs{};
auto track_event_cfg = ::perfetto::protos::gen::TrackEventConfig{};
auto& cfg = tracing::get_perfetto_config();
auto& cfg = get_config();
// environment settings
auto shmem_size_hint = get_perfetto_shmem_size_hint();
auto buffer_size = get_perfetto_buffer_size();
auto shmem_size_hint = config::get_perfetto_shmem_size_hint();
auto buffer_size = config::get_perfetto_buffer_size();
auto _policy =
get_perfetto_fill_policy() == "discard"
config::get_perfetto_fill_policy() == "discard"
? ::perfetto::protos::gen::TraceConfig_BufferConfig_FillPolicy_DISCARD
: ::perfetto::protos::gen::TraceConfig_BufferConfig_FillPolicy_RING_BUFFER;
auto* buffer_config = cfg.add_buffers();
buffer_config->set_size_kb(buffer_size);
buffer_config->set_fill_policy(_policy);
std::set<std::string> _available_categories = {};
std::set<std::string> _disabled_categories = {};
for(auto itr : { OMNITRACE_PERFETTO_CATEGORIES })
_available_categories.emplace(itr.name);
auto _enabled_categories = config::get_perfetto_categories();
for(const auto& itr : _available_categories)
{
if(!_enabled_categories.empty() && _enabled_categories.count(itr) == 0)
_disabled_categories.emplace(itr);
}
for(const auto& itr : _disabled_categories)
for(const auto& itr : config::get_disabled_categories())
{
OMNITRACE_VERBOSE_F(1, "Disabling perfetto track event category: %s\n",
itr.c_str());
@@ -81,31 +84,19 @@ setup()
void
start()
{
#if defined(CUSTOM_DATA_SOURCE)
// Add the following:
::perfetto::DataSourceDescriptor dsd{};
dsd.set_name("com.example.custom_data_source");
CustomDataSource::Register(dsd);
auto* ds_cfg = cfg.add_data_sources()->mutable_config();
ds_cfg->set_name("com.example.custom_data_source");
CustomDataSource::Trace([](CustomDataSource::TraceContext ctx) {
auto packet = ctx.NewTracePacket();
packet->set_timestamp(::perfetto::TrackEvent::GetTraceTimeNs());
packet->set_for_testing()->set_str("Hello world!");
PRINT_HERE("%s", "Trace");
});
#endif
auto& cfg = tracing::get_perfetto_config();
auto& tracing_session = tracing::get_perfetto_session();
auto& cfg = get_config();
auto& tracing_session = get_session();
tracing_session = ::perfetto::Tracing::NewTrace();
tracing_session->Setup(cfg);
tracing_session->StartBlocking();
}
} // namespace perfetto
std::unique_ptr<::perfetto::TracingSession>&
get_perfetto_session()
{
return ::omnitrace::perfetto::get_session();
}
} // namespace omnitrace
PERFETTO_TRACK_EVENT_STATIC_STORAGE();
#if defined(CUSTOM_DATA_SOURCE)
PERFETTO_DEFINE_DATA_SOURCE_STATIC_MEMBERS(CustomDataSource);
#endif
+90 -108
@@ -43,123 +43,22 @@ PERFETTO_DEFINE_CATEGORIES(OMNITRACE_PERFETTO_CATEGORIES);
namespace omnitrace
{
#if defined(CUSTOM_DATA_SOURCE)
class CustomDataSource : public perfetto::DataSource<CustomDataSource>
{
public:
void OnSetup(const SetupArgs&) override
{
// Use this callback to apply any custom configuration to your data source
// based on the TraceConfig in SetupArgs.
OMNITRACE_PRINT_F("[CustomDataSource] setup\n");
}
void OnStart(const StartArgs&) override
{
// This notification can be used to initialize the GPU driver, enable
// counters, etc. StartArgs will contain the DataSourceDescriptor,
// which can be extended.
OMNITRACE_PRINT_F("[CustomDataSource] start\n");
}
void OnStop(const StopArgs&) override
{
// Undo any initialization done in OnStart.
OMNITRACE_PRINT_F("[CustomDataSource] stop\n");
}
// Data sources can also have per-instance state.
int my_custom_state = 0;
};
PERFETTO_DECLARE_DATA_SOURCE_STATIC_MEMBERS(CustomDataSource);
#endif
std::unique_ptr<::perfetto::TracingSession>&
get_perfetto_session();
template <typename Tp>
struct perfetto_counter_track
{
using track_map_t = std::map<uint32_t, std::vector<perfetto::CounterTrack>>;
using track_map_t = std::map<uint32_t, std::vector<::perfetto::CounterTrack>>;
using name_map_t = std::map<uint32_t, std::vector<std::unique_ptr<std::string>>>;
using data_t = std::pair<name_map_t, track_map_t>;
static auto init() { (void) get_data(); }
static auto exists(size_t _idx, int64_t _n = -1)
{
bool _v = get_data().second.count(_idx) != 0;
if(_n < 0 || !_v) return _v;
return static_cast<size_t>(_n) < get_data().second.at(_idx).size();
}
static size_t size(size_t _idx)
{
bool _v = get_data().second.count(_idx) != 0;
if(!_v) return 0;
return get_data().second.at(_idx).size();
}
static auto init() { (void) get_data(); }
static auto exists(size_t _idx, int64_t _n = -1);
static size_t size(size_t _idx);
static auto emplace(size_t _idx, const std::string& _v, const char* _units = nullptr,
const char* _category = nullptr, int64_t _mult = 1,
bool _incr = false)
{
auto& _name_data = get_data().first[_idx];
auto& _track_data = get_data().second[_idx];
std::vector<std::tuple<std::string, const char*, bool>> _missing = {};
if(config::get_is_continuous_integration())
{
for(const auto& itr : _name_data)
{
_missing.emplace_back(std::make_tuple(*itr, itr->c_str(), false));
}
}
auto _index = _track_data.size();
auto& _name = _name_data.emplace_back(std::make_unique<std::string>(_v));
const char* _unit_name = (_units && strlen(_units) > 0) ? _units : nullptr;
_track_data.emplace_back(perfetto::CounterTrack{ _name->c_str() }
.set_unit_name(_unit_name)
.set_category(_category)
.set_unit_multiplier(_mult)
.set_is_incremental(_incr));
if(config::get_is_continuous_integration())
{
for(auto& itr : _missing)
{
const char* citr = std::get<1>(itr);
for(const auto& ditr : _name_data)
{
if(citr == ditr->c_str() && strcmp(citr, ditr->c_str()) == 0)
{
std::get<2>(itr) = true;
break;
}
}
if(!std::get<2>(itr))
{
std::set<void*> _prev = {};
std::set<void*> _curr = {};
for(const auto& eitr : _missing)
_prev.emplace(
static_cast<void*>(const_cast<char*>(std::get<1>(eitr))));
for(const auto& eitr : _name_data)
_curr.emplace(
static_cast<void*>(const_cast<char*>(eitr->c_str())));
std::stringstream _pss{};
for(auto&& eitr : _prev)
_pss << " " << std::hex << std::setw(12) << std::left << eitr;
std::stringstream _css{};
for(auto&& eitr : _curr)
_css << " " << std::hex << std::setw(12) << std::left << eitr;
OMNITRACE_THROW("perfetto_counter_track emplace method for '%s' (%p) "
"invalidated C-string '%s' (%p).\n%8s: %s\n%8s: %s\n",
_v.c_str(), (void*) _name->c_str(),
std::get<0>(itr).c_str(),
(void*) std::get<0>(itr).c_str(), "previous",
_pss.str().c_str(), "current", _css.str().c_str());
}
}
}
return _index;
}
bool _incr = false);
static auto& at(size_t _idx, size_t _n) { return get_data().second.at(_idx).at(_n); }
@@ -170,4 +69,87 @@ private:
return _v;
}
};
template <typename Tp>
auto
perfetto_counter_track<Tp>::exists(size_t _idx, int64_t _n)
{
bool _v = get_data().second.count(_idx) != 0;
if(_n < 0 || !_v) return _v;
return static_cast<size_t>(_n) < get_data().second.at(_idx).size();
}
template <typename Tp>
size_t
perfetto_counter_track<Tp>::size(size_t _idx)
{
bool _v = get_data().second.count(_idx) != 0;
if(!_v) return 0;
return get_data().second.at(_idx).size();
}
template <typename Tp>
auto
perfetto_counter_track<Tp>::emplace(size_t _idx, const std::string& _v,
const char* _units, const char* _category,
int64_t _mult, bool _incr)
{
auto& _name_data = get_data().first[_idx];
auto& _track_data = get_data().second[_idx];
std::vector<std::tuple<std::string, const char*, bool>> _missing = {};
if(config::get_is_continuous_integration())
{
for(const auto& itr : _name_data)
{
_missing.emplace_back(std::make_tuple(*itr, itr->c_str(), false));
}
}
auto _index = _track_data.size();
auto& _name = _name_data.emplace_back(std::make_unique<std::string>(_v));
const char* _unit_name = (_units && strlen(_units) > 0) ? _units : nullptr;
_track_data.emplace_back(::perfetto::CounterTrack{ _name->c_str() }
.set_unit_name(_unit_name)
.set_category(_category)
.set_unit_multiplier(_mult)
.set_is_incremental(_incr));
if(config::get_is_continuous_integration())
{
for(auto& itr : _missing)
{
const char* citr = std::get<1>(itr);
for(const auto& ditr : _name_data)
{
if(citr == ditr->c_str() && strcmp(citr, ditr->c_str()) == 0)
{
std::get<2>(itr) = true;
break;
}
}
if(!std::get<2>(itr))
{
std::set<void*> _prev = {};
std::set<void*> _curr = {};
for(const auto& eitr : _missing)
_prev.emplace(
static_cast<void*>(const_cast<char*>(std::get<1>(eitr))));
for(const auto& eitr : _name_data)
_curr.emplace(static_cast<void*>(const_cast<char*>(eitr->c_str())));
std::stringstream _pss{};
for(auto&& eitr : _prev)
_pss << " " << std::hex << std::setw(12) << std::left << eitr;
std::stringstream _css{};
for(auto&& eitr : _curr)
_css << " " << std::hex << std::setw(12) << std::left << eitr;
OMNITRACE_THROW("perfetto_counter_track emplace method for '%s' (%p) "
"invalidated C-string '%s' (%p).\n%8s: %s\n%8s: %s\n",
_v.c_str(), (void*) _name->c_str(),
std::get<0>(itr).c_str(),
(void*) std::get<0>(itr).c_str(), "previous",
_pss.str().c_str(), "current", _css.str().c_str());
}
}
}
return _index;
}
} // namespace omnitrace
+1 -14
@@ -442,20 +442,7 @@ post_process()
uint32_t
device_count()
{
uint32_t _num_devices = 0;
try
{
static auto _rsmi_init_once = []() { OMNITRACE_ROCM_SMI_CALL(rsmi_init(0)); };
static std::once_flag _once{};
std::call_once(_once, _rsmi_init_once);
OMNITRACE_ROCM_SMI_CALL(rsmi_num_monitor_devices(&_num_devices));
} catch(std::exception& _e)
{
OMNITRACE_BASIC_VERBOSE(1, "Exception thrown getting the rocm-smi devices: %s\n",
_e.what());
}
return _num_devices;
return gpu::rsmi_device_count();
}
} // namespace rocm_smi
} // namespace omnitrace
+1 -1
@@ -82,7 +82,7 @@ struct data
using mem_usage_t = uint64_t;
using temp_t = int64_t;
TIMEMORY_DEFAULT_OBJECT(data)
OMNITRACE_DEFAULT_OBJECT(data)
explicit data(uint32_t _dev_id);
+1 -1
@@ -660,7 +660,7 @@ post_process_timemory()
rocm_event* parent = nullptr;
mutable std::vector<local_event> children = {};
TIMEMORY_DEFAULT_OBJECT(local_event)
OMNITRACE_DEFAULT_OBJECT(local_event)
explicit local_event(rocm_event* _v)
: parent{ _v }
+10 -24
@@ -21,6 +21,7 @@
// SOFTWARE.
#include "library/roctracer.hpp"
#include "library/components/category_region.hpp"
#include "library/components/fwd.hpp"
#include "library/config.hpp"
#include "library/critical_trace.hpp"
@@ -99,7 +100,7 @@ get_roctracer_kernels()
auto&
get_roctracer_hip_data(int64_t _tid = threading::get_id())
{
using data_t = std::unordered_map<uint64_t, roctracer_bundle_t>;
using data_t = std::unordered_map<uint64_t, roctracer_hip_bundle_t>;
using thread_data_t = thread_data<data_t, category::roctracer>;
static auto& _v = thread_data_t::instances(construct_on_init{});
return _v.at(_tid);
@@ -124,7 +125,7 @@ struct cid_data : cid_tuple_t
{
using cid_tuple_t::cid_tuple_t;
TIMEMORY_DEFAULT_OBJECT(cid_data)
OMNITRACE_DEFAULT_OBJECT(cid_data)
auto& cid() { return std::get<0>(*this); }
auto& pcid() { return std::get<1>(*this); }
@@ -454,20 +455,12 @@ roctx_api_callback(uint32_t domain, uint32_t cid, const void* callback_data,
{
case ROCTX_API_ID_roctxRangePushA:
{
if(get_use_perfetto())
tracing::push_perfetto(category::rocm_roctx{}, _data->args.message);
if(get_use_timemory())
tracing::push_timemory(category::rocm_roctx{}, _data->args.message);
component::category_region<category::rocm_roctx>::start(_data->args.message);
break;
}
case ROCTX_API_ID_roctxRangePop:
{
if(get_use_timemory())
tracing::pop_timemory(category::rocm_roctx{}, _data->args.message);
if(get_use_perfetto())
tracing::pop_perfetto(category::rocm_roctx{}, _data->args.message);
component::category_region<category::rocm_roctx>::stop(_data->args.message);
break;
}
case ROCTX_API_ID_roctxRangeStartA:
@@ -479,11 +472,7 @@ roctx_api_callback(uint32_t domain, uint32_t cid, const void* callback_data,
std::string_view{ _data->args.message });
}
if(get_use_perfetto())
tracing::push_perfetto(category::rocm_roctx{}, _data->args.message);
if(get_use_timemory())
tracing::push_timemory(category::rocm_roctx{}, _data->args.message);
component::category_region<category::rocm_roctx>::start(_data->args.message);
break;
}
case ROCTX_API_ID_roctxRangeStop:
@@ -510,10 +499,7 @@ roctx_api_callback(uint32_t domain, uint32_t cid, const void* callback_data,
if(!_message.empty())
{
if(get_use_timemory())
tracing::pop_timemory(category::rocm_roctx{}, _message.data());
if(get_use_perfetto())
tracing::pop_perfetto(category::rocm_roctx{}, _message.data());
component::category_region<category::rocm_roctx>::stop(_message.data());
}
break;
@@ -733,8 +719,8 @@ hip_api_callback(uint32_t domain, uint32_t cid, const void* callback_data, void*
}
if(get_use_timemory())
{
auto itr = get_roctracer_hip_data()->emplace(_corr_id,
roctracer_bundle_t{ op_name });
auto itr = get_roctracer_hip_data()->emplace(
_corr_id, roctracer_hip_bundle_t{ op_name });
if(itr.second)
{
itr.first->second.start();
@@ -983,7 +969,7 @@ hip_activity_callback(const char* begin, const char* end, void* arg)
if(_found && _name != nullptr && get_use_timemory())
{
auto _func = [_beg_ns, _end_ns, _name]() {
roctracer_bundle_t _bundle{ _name };
roctracer_hip_bundle_t _bundle{ _name };
_bundle.start()
.store(std::plus<double>{}, static_cast<double>(_end_ns - _beg_ns))
.stop()
+3 -3
@@ -46,10 +46,10 @@
namespace omnitrace
{
using roctracer_bundle_t =
tim::component_bundle<project::omnitrace, comp::roctracer_data, comp::wall_clock>;
using roctracer_hip_bundle_t =
tim::component_bundle<category::rocm_hip, comp::roctracer_data, comp::wall_clock>;
using roctracer_hsa_bundle_t =
tim::component_bundle<project::omnitrace, comp::roctracer_data>;
tim::component_bundle<category::rocm_hsa, comp::roctracer_data>;
using roctracer_functions_t = std::vector<std::pair<std::string, std::function<void()>>>;
// HSA API callback function
-29
@@ -89,35 +89,6 @@ sampling_on_child_threads()
}
} // namespace
int
get_realtime_signal()
{
return SIGRTMIN + config::get_sampling_rtoffset();
}
int
get_cputime_signal()
{
return SIGPROF;
}
std::set<int> get_sampling_signals(int64_t)
{
auto _v = std::set<int>{};
if(config::get_use_causal())
{
_v.emplace(get_cputime_signal());
_v.emplace(get_realtime_signal());
}
else
{
if(config::get_use_sampling_cputime()) _v.emplace(get_cputime_signal());
if(config::get_use_sampling_realtime()) _v.emplace(get_realtime_signal());
}
return _v;
}
std::atomic<uint64_t>&
get_cpu_cid()
{
-9
@@ -78,15 +78,6 @@ get_init_bundle();
std::unique_ptr<preinit_bundle_t>&
get_preinit_bundle();
int
get_realtime_signal();
int
get_cputime_signal();
std::set<int>
get_sampling_signals(int64_t _tid = 0);
std::atomic<uint64_t>&
get_cpu_cid() TIMEMORY_HOT;
+53 -25
@@ -854,11 +854,19 @@ void
post_process_perfetto(int64_t _tid, const bundle_t* _init,
const std::vector<bundle_t*>& _data)
{
auto _valid_metrics = backtrace_metrics::valid_array_t{};
for(const auto& itr : _data)
{
const auto* _bt_mt = itr->get<backtrace_metrics>();
if(_bt_mt) _valid_metrics |= _bt_mt->get_valid();
}
if(trait::runtime_enabled<backtrace_metrics>::get())
{
OMNITRACE_VERBOSE(3 || get_debug_sampling(),
"[%li] Post-processing metrics for perfetto...\n", _tid);
backtrace_metrics::init_perfetto(_tid);
backtrace_metrics::init_perfetto(_tid, _valid_metrics);
for(const auto& itr : _data)
{
const auto* _bt_metrics = itr->get<backtrace_metrics>();
@@ -867,8 +875,7 @@ post_process_perfetto(int64_t _tid, const bundle_t* _init,
if(_bt_time->get_tid() != _tid) continue;
_bt_metrics->post_process_perfetto(_tid, _bt_time->get_timestamp());
}
backtrace_metrics::fini_perfetto(_tid);
backtrace_metrics::fini_perfetto(_tid, _valid_metrics);
}
OMNITRACE_VERBOSE(3 || get_debug_sampling(),
@@ -936,6 +943,12 @@ post_process_perfetto(int64_t _tid, const bundle_t* _init,
_bt_mt->get_hw_counters().size() ==
_last->get<backtrace_metrics>()->get_hw_counters().size();
auto _hw_counters_enabled = [](const auto* _bt_v) {
return (_bt_v != nullptr) &&
(*_bt_v)(type_list<backtrace_metrics::hw_counters>{}) &&
(*_bt_v)(category::thread_hardware_counter{});
};
// annotations common to both modes
auto _common_annotate = [&](::perfetto::EventContext& ctx, bool _is_last) {
if(_include_common && _is_last)
@@ -943,7 +956,9 @@ post_process_perfetto(int64_t _tid, const bundle_t* _init,
tracing::add_perfetto_annotation(ctx, "begin_ns", _beg);
tracing::add_perfetto_annotation(ctx, "end_ns", _end);
}
if(_include_hw && _is_last)
if(_include_hw && _is_last && _last &&
_hw_counters_enabled(_last->get<backtrace_metrics>()) &&
_hw_counters_enabled(_bt_mt))
{
// current values when read
auto _hw_cnt_vals = _bt_mt->get_hw_counters();
@@ -1048,16 +1063,15 @@ post_process_timemory(int64_t _tid, const bundle_t* _init,
using bundle_t = tim::lightweight_tuple<comp::trip_count, sampling_wall_clock,
sampling_cpu_clock, hw_counters>;
auto* _bt_data = itr->get<backtrace>();
auto* _bt_time = itr->get<backtrace_timestamp>();
auto* _bt_metrics = itr->get<backtrace_metrics>();
auto* _bt_data = itr->get<backtrace>();
auto* _bt_time = itr->get<backtrace_timestamp>();
auto* _bt_metrics = itr->get<backtrace_metrics>();
const auto* _last_metrics = _last->get<backtrace_metrics>();
if(!_bt_data || !_bt_time || !_bt_metrics) continue;
if(!_bt_data || !_bt_time) continue;
double _elapsed_wc = (_bt_time->get_timestamp() -
_last->get<backtrace_timestamp>()->get_timestamp());
double _elapsed_cc = (_bt_metrics->get_cpu_timestamp() -
_last->get<backtrace_metrics>()->get_cpu_timestamp());
std::vector<bundle_t> _tc{};
_tc.reserve(_bt_data->size());
@@ -1090,31 +1104,45 @@ post_process_timemory(int64_t _tid, const bundle_t* _init,
if constexpr(tim::trait::is_available<sampling_cpu_clock>::value)
{
auto* _cc = iitr.get<sampling_cpu_clock>();
if(_cc)
if(_cc && _bt_metrics && _last_metrics &&
(*_bt_metrics)(category::thread_cpu_time{}) &&
(*_last_metrics)(category::thread_cpu_time{}))
{
double _elapsed_cc = (_bt_metrics->get_cpu_timestamp() -
_last_metrics->get_cpu_timestamp());
_cc->set_value(_elapsed_cc / sampling_cpu_clock::get_unit());
_cc->set_accum(_elapsed_cc / sampling_cpu_clock::get_unit());
}
}
if constexpr(tim::trait::is_available<hw_counters>::value)
{
auto _hw_cnt_vals = _bt_metrics->get_hw_counters();
if(_last && _bt_metrics->get_hw_counters().size() ==
_last->get<backtrace_metrics>()->get_hw_counters().size())
auto _hw_counters_enabled = [](const auto* _bt_v) {
return (_bt_v != nullptr) &&
(*_bt_v)(type_list<backtrace_metrics::hw_counters>{}) &&
(*_bt_v)(category::thread_hardware_counter{});
};
if(_bt_metrics && _last_metrics && _hw_counters_enabled(_bt_metrics) &&
_hw_counters_enabled(_last_metrics))
{
for(size_t k = 0; k < _bt_metrics->get_hw_counters().size(); ++k)
auto _hw_cnt_vals = _bt_metrics->get_hw_counters();
if(_bt_metrics->get_hw_counters().size() ==
_last_metrics->get_hw_counters().size())
{
if(_last->get<backtrace_metrics>()->get_hw_counters()[k] >
_hw_cnt_vals[k])
_hw_cnt_vals[k] -=
_last->get<backtrace_metrics>()->get_hw_counters()[k];
for(size_t k = 0; k < _bt_metrics->get_hw_counters().size(); ++k)
{
if(_last_metrics->get_hw_counters()[k] > _hw_cnt_vals[k])
_hw_cnt_vals[k] -= _last_metrics->get_hw_counters()[k];
}
}
auto* _hw_counter = iitr.get<hw_counters>();
if(_hw_counter)
{
_hw_counter->set_value(_hw_cnt_vals);
_hw_counter->set_accum(_hw_cnt_vals);
}
}
auto* _hw_counter = iitr.get<hw_counters>();
if(_hw_counter)
{
_hw_counter->set_value(_hw_cnt_vals);
_hw_counter->set_accum(_hw_cnt_vals);
}
}
iitr.pop();
@@ -98,6 +98,15 @@ init_index_data(int64_t _tid, bool _offset = false)
const auto unknown_thread = std::optional<thread_info>{};
} // namespace
std::string
thread_index_data::as_string() const
{
auto _ss = std::stringstream{};
_ss << sequent_value << " [" << as_hex(system_value) << "] (#" << internal_value
<< ")";
return _ss.str();
}
int64_t
grow_data(int64_t _tid)
{
@@ -64,6 +64,8 @@ struct thread_index_data
int64_t internal_value = utility::get_thread_index();
int64_t system_value = tim::threading::get_sys_tid();
int64_t sequent_value = tim::threading::get_id();
std::string as_string() const;
};
int64_t grow_data(int64_t);
-15
@@ -34,20 +34,6 @@ bool debug_pop = tim::get_env("OMNITRACE_DEBUG_POP", false) || get_debug_env();
bool debug_mark = tim::get_env("OMNITRACE_DEBUG_MARK", false) || get_debug_env();
bool debug_user = tim::get_env("OMNITRACE_DEBUG_USER_REGIONS", false) || get_debug_env();
perfetto::TraceConfig&
get_perfetto_config()
{
static auto _v = ::perfetto::TraceConfig{};
return _v;
}
std::unique_ptr<perfetto::TracingSession>&
get_perfetto_session()
{
static auto _v = std::unique_ptr<perfetto::TracingSession>{};
return _v;
}
std::unordered_map<hash_value_t, std::string>&
get_perfetto_track_uuids()
{
@@ -114,7 +100,6 @@ thread_init()
process::get_id(), "thread",
threading::get_id()),
quirk::config<quirk::auto_start>{});
get_interval_data()->reserve(512);
// save the hash maps
get_timemory_hash_ids() = tim::get_hash_ids();
get_timemory_hash_aliases() = tim::get_hash_aliases();
+304 -59
@@ -40,6 +40,7 @@
#include <timemory/components/timing/backends.hpp>
#include <timemory/hash/types.hpp>
#include <timemory/mpl/concepts.hpp>
#include <timemory/mpl/type_traits.hpp>
#include <atomic>
@@ -70,12 +71,6 @@ extern OMNITRACE_HIDDEN_API bool debug_mark;
std::unordered_map<hash_value_t, std::string>&
get_perfetto_track_uuids();
perfetto::TraceConfig&
get_perfetto_config();
std::unique_ptr<perfetto::TracingSession>&
get_perfetto_session();
tim::hash_map_ptr_t&
get_timemory_hash_ids(int64_t _tid = threading::get_id());
@@ -91,6 +86,46 @@ record_thread_start_time();
void
thread_init();
template <typename CategoryT>
auto&
get_category_stack();
template <typename CategoryT, typename... Args>
inline void
push_perfetto(CategoryT, const char*, Args&&...);
template <typename CategoryT, typename... Args>
inline void
pop_perfetto(CategoryT, const char*, Args&&...);
template <typename CategoryT, typename... Args>
inline void
push_perfetto_ts(CategoryT, const char*, uint64_t _ts, Args&&...);
template <typename CategoryT, typename... Args>
inline void
pop_perfetto_ts(CategoryT, const char*, uint64_t, Args&&...);
template <typename CategoryT, typename... Args>
inline void
push_perfetto_track(CategoryT, const char*, perfetto::Track, uint64_t, Args&&...);
template <typename CategoryT, typename... Args>
inline void
pop_perfetto_track(CategoryT, const char*, perfetto::Track, uint64_t, Args&&...);
template <typename CategoryT, typename... Args>
inline void
mark_perfetto(CategoryT, const char*, Args&&...);
template <typename CategoryT, typename... Args>
inline void
mark_perfetto_ts(CategoryT, const char*, uint64_t, Args&&...);
template <typename CategoryT, typename... Args>
inline void
mark_perfetto_track(CategoryT, const char*, perfetto::Track, uint64_t, Args&&...);
//
// definitions
//
@@ -147,13 +182,6 @@ now()
return ::tim::get_clock_real_now<Tp, std::nano>();
}
inline auto&
get_interval_data(int64_t _tid = threading::get_id())
{
static auto& _v = interval_data_instances::instances(construct_on_init{});
return _v.at(_tid);
}
inline auto&
get_instrumentation_bundles(int64_t _tid = threading::get_id())
{
@@ -174,44 +202,128 @@ pop_count()
return _v;
}
template <typename CategoryT, typename... Args>
inline void
push_timemory(CategoryT, const char* name, Args&&... args)
struct category_stack
{
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
int32_t profile = 0; // use signed so compiler doesn't have to
int32_t tracing = 0; // account for underflow/overflow
};
auto& _data = tracing::get_instrumentation_bundles();
// this generates a hash for the raw string array
auto _hash = tim::add_hash_id(tim::string_view_t{ name });
_data.construct(_hash)->start(std::forward<Args>(args)...);
template <typename CategoryT>
auto&
get_category_stack()
{
static thread_local auto _v = category_stack{};
return _v;
}
template <typename CategoryT>
auto&
get_tracing_stack()
{
return get_category_stack<CategoryT>().tracing;
}
template <typename CategoryT>
auto&
get_profile_stack()
{
return get_category_stack<CategoryT>().profile;
}
template <typename CategoryT>
auto
category_push_disabled()
{
return !trait::runtime_enabled<CategoryT>::get();
}
template <typename CategoryT>
auto
category_mark_disabled()
{
return !trait::runtime_enabled<CategoryT>::get();
}
template <typename CategoryT>
auto
category_pop_disabled()
{
return !trait::runtime_enabled<CategoryT>::get() &&
(get_profile_stack<CategoryT>() + get_tracing_stack<CategoryT>()) <= 0;
}
template <typename CategoryT>
auto
tracing_pop_disabled()
{
return !trait::runtime_enabled<CategoryT>::get() &&
get_tracing_stack<CategoryT>() <= 0;
}
template <typename CategoryT>
auto
profile_pop_disabled()
{
return !trait::runtime_enabled<CategoryT>::get() &&
get_profile_stack<CategoryT>() <= 0;
}
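The `*_pop_disabled` guards above skip a pop only when the category is disabled *and* no region is outstanding on the calling thread, so a region that was pushed while a tracing window was open can still be closed after the window ends. Below is a minimal standalone sketch of that behaviour. `demo_category`, `demo_stack`, `demo_push`, and `demo_pop` are hypothetical names used only for illustration (they stand in for `trait::runtime_enabled<CategoryT>` and the per-category counters) and are not part of omnitrace.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for trait::runtime_enabled<CategoryT>: a flag that
// would be toggled as a tracing window opens or closes.
struct demo_category
{
    static bool& enabled()
    {
        static bool _v = true;
        return _v;
    }
};

// Per-thread count of regions pushed but not yet popped for a category.
template <typename CategoryT>
int32_t&
demo_stack()
{
    static thread_local int32_t _v = 0;
    return _v;
}

// Returns true if the push was recorded.
template <typename CategoryT>
bool
demo_push()
{
    if(!CategoryT::enabled()) return false;
    ++demo_stack<CategoryT>();
    return true;
}

// Returns true if the pop was recorded. A pop is skipped only when the
// category is disabled AND nothing is outstanding on this thread.
template <typename CategoryT>
bool
demo_pop()
{
    if(!CategoryT::enabled() && demo_stack<CategoryT>() <= 0) return false;
    --demo_stack<CategoryT>();
    return true;
}

// Usage: a window closes while one region is still open.
inline void
demo_window_closes_mid_region()
{
    demo_category::enabled() = true;
    bool _pushed_in_window = demo_push<demo_category>();
    demo_category::enabled() = false;  // window closes
    bool _pushed_after   = demo_push<demo_category>();
    bool _popped_open    = demo_pop<demo_category>();
    bool _popped_unpaired = demo_pop<demo_category>();

    assert(_pushed_in_window);   // region opened inside the window
    assert(!_pushed_after);      // new regions are ignored
    assert(_popped_open);        // the open region still closes
    assert(!_popped_unpaired);   // unmatched pops are ignored
}
```

Without the counter, the second-to-last call would be dropped and the region opened inside the window would never receive its end event.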
template <typename CategoryT, typename... Args>
inline void
pop_timemory(CategoryT, const char* name, Args&&... args)
push_timemory(CategoryT, std::string_view name, Args&&... args)
{
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
if(category_push_disabled<CategoryT>()) return;
auto _hash = tim::hash::get_hash_id(tim::string_view_t{ name });
auto& _data = tracing::get_instrumentation_bundles();
if(_data.bundles.empty())
// this generates a hash for the raw string array
auto _hash = tim::add_hash_id(name);
_data.construct(_hash)->start(std::forward<Args>(args)...);
// increment the profile stack
++get_profile_stack<CategoryT>();
}
template <typename CategoryT, typename... Args>
inline void
pop_timemory(CategoryT, std::string_view name, Args&&... args)
{
// skip if category is disabled and not pushed on this thread
if(profile_pop_disabled<CategoryT>()) return;
auto _hash = tim::hash::get_hash_id(name);
auto& _data = tracing::get_instrumentation_bundles();
if(OMNITRACE_UNLIKELY(_data.bundles.empty()))
{
OMNITRACE_DEBUG("[%s] skipped %s :: empty bundle stack\n", "omnitrace_pop_trace",
name);
name.data());
return;
}
for(size_t i = _data.bundles.size(); i > 0; --i)
auto*& _v_back = _data.bundles.back();
if(OMNITRACE_LIKELY(_v_back->get_hash() == _hash))
{
auto*& _v = _data.bundles.at(i - 1);
if(_v->get_hash() == _hash)
// decrement the profile stack
--get_profile_stack<CategoryT>();
_v_back->stop(std::forward<Args>(args)...);
_data.allocator.destroy(_v_back);
_data.allocator.deallocate(_v_back, 1);
_data.bundles.erase(--_data.bundles.end());
}
else if(_data.bundles.size() > 1)
{
for(size_t i = _data.bundles.size() - 1; i > 0; --i)
{
_v->stop(std::forward<Args>(args)...);
_data.allocator.destroy(_v);
_data.allocator.deallocate(_v, 1);
_data.bundles.erase(_data.bundles.begin() + (i - 1));
break;
auto*& _v = _data.bundles.at(i - 1);
if(_v->get_hash() == _hash)
{
// decrement the profile stack
--get_profile_stack<CategoryT>();
_v->stop(std::forward<Args>(args)...);
_data.allocator.destroy(_v);
_data.allocator.deallocate(_v, 1);
_data.bundles.erase(_data.bundles.begin() + (i - 1));
break;
}
}
}
}
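The reworked `pop_timemory` checks the most recently pushed bundle first and only falls back to a newest-to-oldest search when the hashes do not match. A minimal sketch of that lookup order, assuming a plain `std::vector` of hashes in place of the bundle stack (`pop_matching` is a hypothetical helper, and the allocator handling from the diff is omitted):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the bundle stack: each entry is just a hash.
// Returns true if an entry matching _hash was removed.
inline bool
pop_matching(std::vector<std::size_t>& _stack, std::size_t _hash)
{
    if(_stack.empty()) return false;

    // common case: the region being popped is the most recently pushed one
    if(_stack.back() == _hash)
    {
        _stack.pop_back();
        return true;
    }

    // otherwise search the remaining entries from newest to oldest
    for(std::size_t i = _stack.size() - 1; i > 0; --i)
    {
        if(_stack[i - 1] == _hash)
        {
            _stack.erase(_stack.begin() + static_cast<std::ptrdiff_t>(i - 1));
            return true;
        }
    }
    return false;
}

// Usage: an in-order pop, an out-of-order pop, and a miss.
inline void
demo_pop_matching()
{
    auto _stack = std::vector<std::size_t>{ 1, 2, 3 };

    bool _in_order     = pop_matching(_stack, 3);  // hits the fast path
    bool _out_of_order = pop_matching(_stack, 1);  // found by the search
    bool _missing      = pop_matching(_stack, 9);  // not on the stack

    assert(_in_order && _out_of_order && !_missing);
    assert(_stack.size() == 1 && _stack.front() == 2);
}
```

Since well-nested instrumentation pops in reverse push order, the first comparison is expected to succeed in the common case, which is what the `OMNITRACE_LIKELY` annotation in the diff expresses.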
@@ -221,12 +333,13 @@ inline void
push_perfetto(CategoryT, const char* name, Args&&... args)
{
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
if(category_push_disabled<CategoryT>()) return;
uint64_t _ts = comp::wall_clock::record();
if constexpr(sizeof...(Args) == 1 &&
std::is_invocable<Args..., perfetto::EventContext>::value)
{
++get_tracing_stack<CategoryT>();
uint64_t _ts = now();
if(config::get_perfetto_annotations())
{
TRACE_EVENT_BEGIN(trait::name<CategoryT>::value, perfetto::StaticString(name),
@@ -240,28 +353,48 @@ push_perfetto(CategoryT, const char* name, Args&&... args)
}
else
{
TRACE_EVENT_BEGIN(trait::name<CategoryT>::value, perfetto::StaticString(name),
_ts, std::forward<Args>(args)...,
[&](perfetto::EventContext ctx) {
if(config::get_perfetto_annotations())
{
tracing::add_perfetto_annotation(ctx, "begin_ns", _ts);
}
});
using tuple_type = std::tuple<concepts::unqualified_type_t<Args>...>;
using arg0_type = concepts::tuple_element_t<0, tuple_type>;
using arg1_type = concepts::tuple_element_t<1, tuple_type>;
if constexpr(std::is_same<arg0_type, perfetto::Track>::value &&
std::is_same<arg1_type, uint64_t>::value)
{
push_perfetto_track(CategoryT{}, name, std::forward<Args>(args)...);
}
else if constexpr(std::is_same<arg0_type, uint64_t>::value)
{
push_perfetto_ts(CategoryT{}, name, std::forward<Args>(args)...);
}
else
{
++get_tracing_stack<CategoryT>();
uint64_t _ts = now();
TRACE_EVENT_BEGIN(
trait::name<CategoryT>::value, perfetto::StaticString(name), _ts,
std::forward<Args>(args)..., [&](perfetto::EventContext ctx) {
if(config::get_perfetto_annotations())
{
tracing::add_perfetto_annotation(ctx, "begin_ns", _ts);
}
});
}
}
}
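`push_perfetto` now inspects the leading argument types at compile time and forwards to the `_track` or `_ts` variant when they match. The sketch below isolates that `if constexpr` selection. It assumes a hypothetical `demo_track` type in place of `perfetto::Track` and a hypothetical `arg_type` helper that yields `void` for a missing argument; the diff itself relies on timemory's `concepts::tuple_element_t`, whose out-of-range behaviour is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <type_traits>

// Hypothetical stand-in for perfetto::Track.
struct demo_track
{
    uint64_t uuid = 0;
};

enum class demo_overload
{
    track,
    timestamp,
    generic
};

// Type of the Idx-th tuple element, or void when Idx is out of range.
template <std::size_t Idx, typename Tuple,
          bool = (Idx < std::tuple_size<Tuple>::value)>
struct arg_type
{
    using type = void;
};

template <std::size_t Idx, typename Tuple>
struct arg_type<Idx, Tuple, true>
{
    using type = std::tuple_element_t<Idx, Tuple>;
};

// Reports which variant a call with these argument types would select.
template <typename... Args>
constexpr demo_overload
select_overload(Args&&...)
{
    using tuple_type = std::tuple<std::decay_t<Args>...>;
    using arg0_type  = typename arg_type<0, tuple_type>::type;
    using arg1_type  = typename arg_type<1, tuple_type>::type;

    if constexpr(std::is_same<arg0_type, demo_track>::value &&
                 std::is_same<arg1_type, uint64_t>::value)
        return demo_overload::track;
    else if constexpr(std::is_same<arg0_type, uint64_t>::value)
        return demo_overload::timestamp;
    else
        return demo_overload::generic;
}

// Usage: the selection depends on exact types, not on convertibility.
inline void
demo_select_overload()
{
    assert(select_overload(demo_track{}, uint64_t{ 5 }, 1.0) == demo_overload::track);
    assert(select_overload(uint64_t{ 5 }) == demo_overload::timestamp);
    assert(select_overload(5) == demo_overload::generic);  // int, not uint64_t
    assert(select_overload() == demo_overload::generic);
}
```

Because the comparison uses `std::is_same`, a timestamp passed as a plain `int` or `int64_t` would take the generic branch in this sketch, so callers need to pass a `uint64_t` explicitly to reach the timestamp variant.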
template <typename CategoryT, typename... Args>
inline void
pop_perfetto(CategoryT, const char*, Args&&... args)
pop_perfetto(CategoryT, const char* name, Args&&... args)
{
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
// skip if category is disabled and not pushed on this thread
if(tracing_pop_disabled<CategoryT>()) return;
uint64_t _ts = comp::wall_clock::record();
if constexpr(sizeof...(Args) == 1 &&
std::is_invocable<Args..., perfetto::EventContext>::value)
{
// decrement tracing stack
--get_tracing_stack<CategoryT>();
uint64_t _ts = now();
if(config::get_perfetto_annotations())
{
TRACE_EVENT_END(trait::name<CategoryT>::value, _ts, "end_ns", _ts,
@@ -275,14 +408,35 @@ pop_perfetto(CategoryT, const char*, Args&&... args)
}
else
{
TRACE_EVENT_END(trait::name<CategoryT>::value, _ts, std::forward<Args>(args)...,
[&](perfetto::EventContext ctx) {
if(config::get_perfetto_annotations())
{
tracing::add_perfetto_annotation(ctx, "end_ns", _ts);
}
});
using tuple_type = std::tuple<concepts::unqualified_type_t<Args>...>;
using arg0_type = concepts::tuple_element_t<0, tuple_type>;
using arg1_type = concepts::tuple_element_t<1, tuple_type>;
if constexpr(std::is_same<arg0_type, perfetto::Track>::value &&
std::is_same<arg1_type, uint64_t>::value)
{
pop_perfetto_track(CategoryT{}, name, std::forward<Args>(args)...);
}
else if constexpr(std::is_same<arg0_type, uint64_t>::value)
{
pop_perfetto_ts(CategoryT{}, name, std::forward<Args>(args)...);
}
else
{
// decrement tracing stack
--get_tracing_stack<CategoryT>();
uint64_t _ts = now();
TRACE_EVENT_END(trait::name<CategoryT>::value, _ts,
std::forward<Args>(args)..., [&](perfetto::EventContext ctx) {
if(config::get_perfetto_annotations())
{
tracing::add_perfetto_annotation(ctx, "end_ns", _ts);
}
});
}
}
(void) name;
}
template <typename CategoryT, typename... Args>
@@ -290,8 +444,9 @@ inline void
push_perfetto_ts(CategoryT, const char* name, uint64_t _ts, Args&&... args)
{
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
if(category_push_disabled<CategoryT>()) return;
++get_tracing_stack<CategoryT>();
TRACE_EVENT_BEGIN(trait::name<CategoryT>::value, perfetto::StaticString(name), _ts,
std::forward<Args>(args)...);
}
@@ -300,8 +455,11 @@ template <typename CategoryT, typename... Args>
inline void
pop_perfetto_ts(CategoryT, const char*, uint64_t _ts, Args&&... args)
{
// skip if category is disabled
if(!trait::runtime_enabled<CategoryT>::get()) return;
// skip if category is disabled and not pushed on this thread
if(tracing_pop_disabled<CategoryT>()) return;
// decrement tracing stack
--get_tracing_stack<CategoryT>();
TRACE_EVENT_END(trait::name<CategoryT>::value, _ts, std::forward<Args>(args)...);
}
@@ -311,6 +469,10 @@ inline void
push_perfetto_track(CategoryT, const char* name, perfetto::Track _track, uint64_t _ts,
Args&&... args)
{
// skip if category is disabled
if(category_push_disabled<CategoryT>()) return;
++get_tracing_stack<CategoryT>();
TRACE_EVENT_BEGIN(trait::name<CategoryT>::value, perfetto::StaticString(name), _track,
_ts, std::forward<Args>(args)...);
}
@@ -320,8 +482,91 @@ inline void
pop_perfetto_track(CategoryT, const char*, perfetto::Track _track, uint64_t _ts,
Args&&... args)
{
// skip if category is disabled and not pushed on this thread
if(tracing_pop_disabled<CategoryT>()) return;
// decrement tracing stack
--get_tracing_stack<CategoryT>();
TRACE_EVENT_END(trait::name<CategoryT>::value, _track, _ts,
std::forward<Args>(args)...);
}
template <typename CategoryT, typename... Args>
inline void
mark_perfetto(CategoryT, const char* name, Args&&... args)
{
// skip if category is disabled
if(category_mark_disabled<CategoryT>()) return;
if constexpr(sizeof...(Args) == 1 &&
std::is_invocable<Args..., perfetto::EventContext>::value)
{
uint64_t _ts = now();
if(config::get_perfetto_annotations())
{
TRACE_EVENT_INSTANT(trait::name<CategoryT>::value,
perfetto::StaticString(name), _ts, "ns", _ts,
std::forward<Args>(args)...);
}
else
{
TRACE_EVENT_INSTANT(trait::name<CategoryT>::value,
perfetto::StaticString(name), _ts,
std::forward<Args>(args)...);
}
}
else
{
using tuple_type = std::tuple<concepts::unqualified_type_t<Args>...>;
using arg0_type = concepts::tuple_element_t<0, tuple_type>;
using arg1_type = concepts::tuple_element_t<1, tuple_type>;
if constexpr(std::is_same<arg0_type, perfetto::Track>::value &&
std::is_same<arg1_type, uint64_t>::value)
{
mark_perfetto_track(CategoryT{}, name, std::forward<Args>(args)...);
}
else if constexpr(std::is_same<arg0_type, uint64_t>::value)
{
mark_perfetto_ts(CategoryT{}, name, std::forward<Args>(args)...);
}
else
{
uint64_t _ts = now();
TRACE_EVENT_INSTANT(
trait::name<CategoryT>::value, perfetto::StaticString(name), _ts,
std::forward<Args>(args)..., [&](perfetto::EventContext ctx) {
if(config::get_perfetto_annotations())
{
tracing::add_perfetto_annotation(ctx, "ns", _ts);
}
});
}
}
}
template <typename CategoryT, typename... Args>
inline void
mark_perfetto_ts(CategoryT, const char* name, uint64_t _ts, Args&&... args)
{
// skip if category is disabled
if(category_mark_disabled<CategoryT>()) return;
TRACE_EVENT_INSTANT(trait::name<CategoryT>::value, perfetto::StaticString(name), _ts,
std::forward<Args>(args)...);
}
template <typename CategoryT, typename... Args>
inline void
mark_perfetto_track(CategoryT, const char*, perfetto::Track _track, uint64_t _ts,
Args&&... args)
{
// skip if category is disabled
if(category_mark_disabled<CategoryT>()) return;
TRACE_EVENT_INSTANT(trait::name<CategoryT>::value, _track, _ts,
std::forward<Args>(args)...);
}
} // namespace tracing
} // namespace omnitrace
+12
@@ -32,6 +32,7 @@
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <vector>
@@ -226,5 +227,16 @@ get_regex_or(const ContainerT<Tp, TailT...>& _container, PredicateT&& _predicate
return get_regex_or(_dest, _fallback);
}
template <typename Tp>
Tp
convert(std::string_view _inp)
{
auto _iss = std::stringstream{};
auto _ret = Tp{};
_iss << _inp;
_iss >> _ret;
return _ret;
}
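One plausible use for `convert` is turning the colon-separated fields of a `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>` entry into typed values. The sketch below is an assumption about how such parsing could look, not the parser in this PR: `demo_period` and `parse_period` are hypothetical, and the default field values are arbitrary placeholders.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Same shape as the utility::convert added in this diff.
template <typename Tp>
Tp
convert(std::string_view _inp)
{
    auto _iss = std::stringstream{};
    auto _ret = Tp{};
    _iss << _inp;
    _iss >> _ret;
    return _ret;
}

// Hypothetical container for one window specification.
struct demo_period
{
    double      delay    = 0.0;
    double      duration = 0.0;
    int64_t     repeat   = 1;           // placeholder default
    std::string clock    = "realtime";  // placeholder default
};

// Hypothetical helper: split on ':' and convert whichever fields are present.
inline demo_period
parse_period(std::string_view _spec)
{
    auto _fields = std::vector<std::string_view>{};
    while(true)
    {
        auto _pos = _spec.find(':');
        _fields.emplace_back(_spec.substr(0, _pos));
        if(_pos == std::string_view::npos) break;
        _spec.remove_prefix(_pos + 1);
    }

    auto _ret = demo_period{};
    if(_fields.size() > 0) _ret.delay = convert<double>(_fields[0]);
    if(_fields.size() > 1) _ret.duration = convert<double>(_fields[1]);
    if(_fields.size() > 2) _ret.repeat = convert<int64_t>(_fields[2]);
    if(_fields.size() > 3) _ret.clock = convert<std::string>(_fields[3]);
    return _ret;
}

// Usage: the first entry from the PR description.
inline void
demo_parse_period()
{
    auto _p = parse_period("0.5:1.0:2:realtime");
    assert(_p.delay == 0.5 && _p.duration == 1.0);
    assert(_p.repeat == 2 && _p.clock == "realtime");
}
```

Note that `convert` does not report a failed extraction, so a caller that needs to reject malformed fields would have to check the stream state itself.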
} // namespace utility
} // namespace omnitrace
+116 -7
@@ -504,7 +504,7 @@ set(_ompt_preload_environ
"OMNITRACE_SAMPLING_REALTIME=ON"
"OMNITRACE_SAMPLING_CPUTIME_FREQ=1000"
"OMNITRACE_SAMPLING_REALTIME_FREQ=500"
"OMNITRACE_COLORIZED_LOG=OFF")
"OMNITRACE_MONOCHROME=ON")
set(_ompt_sample_no_tmpfiles_environ
"${_ompt_environment}"
@@ -516,7 +516,7 @@ set(_ompt_sample_no_tmpfiles_environ
"OMNITRACE_SAMPLING_REALTIME=OFF"
"OMNITRACE_SAMPLING_CPUTIME_FREQ=700"
"OMNITRACE_USE_TEMPORARY_FILES=OFF"
"OMNITRACE_COLORIZED_LOG=OFF")
"OMNITRACE_MONOCHROME=ON")
set(_ompt_preload_samp_regex
"Sampler for thread 0 will be triggered 1000.0x per second of CPU-time(.*)Sampler for thread 0 will be triggered 500.0x per second of wall-time(.*)Sampling will be disabled after 0.250000 seconds(.*)Sampling duration of 0.250000 seconds has elapsed. Shutting down sampling"
@@ -684,6 +684,111 @@ omnitrace_add_test(
RUNTIME_PASS_REGEX "(\\\[[0-9]+\\\]) function coverage :: 66.67%"
REWRITE_RUN_PASS_REGEX "(\\\[[0-9]+\\\]) function coverage :: 66.67%")
omnitrace_add_test(
SKIP_BASELINE SKIP_SAMPLING SKIP_PRELOAD
NAME trace-time-window
TARGET trace-time-window
REWRITE_ARGS -e -v 2 --caller-include inner -i 4096
RUNTIME_ARGS -e -v 1 --caller-include inner -i 4096
LABELS "time-window"
ENVIRONMENT "${_window_environment};OMNITRACE_TRACE_DURATION=1.25")
omnitrace_add_validation_test(
NAME trace-time-window-binary-rewrite
TIMEMORY_METRIC "wall_clock"
TIMEMORY_FILE "wall_clock.json"
PERFETTO_METRIC "host"
PERFETTO_FILE "perfetto-trace.proto"
LABELS "time-window"
FAIL_REGEX "outer_d"
ARGS -l
main
outer_a
outer_b
outer_c
-c
1
1
1
1
-d
0
1
1
1
-p)
omnitrace_add_validation_test(
NAME trace-time-window-runtime-instrument
TIMEMORY_METRIC "wall_clock"
TIMEMORY_FILE "wall_clock.json"
PERFETTO_METRIC "host"
PERFETTO_FILE "perfetto-trace.proto"
LABELS "time-window"
FAIL_REGEX "outer_d"
ARGS -l
main
outer_a
outer_b
outer_c
-c
1
1
1
1
-d
0
1
1
1
-p)
omnitrace_add_test(
SKIP_BASELINE SKIP_SAMPLING SKIP_PRELOAD
NAME trace-time-window-delay
TARGET trace-time-window
REWRITE_ARGS -e -v 2 --caller-include inner -i 4096
RUNTIME_ARGS -e -v 1 --caller-include inner -i 4096
LABELS "time-window"
ENVIRONMENT
"${_window_environment};OMNITRACE_TRACE_DELAY=0.75;OMNITRACE_TRACE_DURATION=0.75")
omnitrace_add_validation_test(
NAME trace-time-window-delay-binary-rewrite
TIMEMORY_METRIC "wall_clock"
TIMEMORY_FILE "wall_clock.json"
PERFETTO_METRIC "host"
PERFETTO_FILE "perfetto-trace.proto"
LABELS "time-window"
ARGS -l
outer_c
outer_d
-c
1
1
-d
0
0
-p)
omnitrace_add_validation_test(
NAME trace-time-window-delay-runtime-instrument
TIMEMORY_METRIC "wall_clock"
TIMEMORY_FILE "wall_clock.json"
PERFETTO_METRIC "host"
PERFETTO_FILE "perfetto-trace.proto"
LABELS "time-window"
ARGS -l
outer_c
outer_d
-c
1
1
-d
0
0
-p)
# -------------------------------------------------------------------------------------- #
#
# critical-trace tests
@@ -823,6 +928,10 @@ foreach(_TARGET ${RCCL_TEST_TARGETS})
line
return
args
-ME
sysdeps
--log-file
rccl-test-${_NAME}.log
RUN_ARGS -t
1
-g
@@ -910,7 +1019,7 @@ omnitrace_add_causal_test(
)
set(_causal_common_args
"-n 10 -e -s 0 10 20 30 -B $<TARGET_FILE_BASE_NAME:causal-cpu-omni>")
"-n 20 -e -s 0 10 20 30 -B $<TARGET_FILE_BASE_NAME:causal-cpu-omni>")
macro(
causal_e2e_args_and_validation
@@ -945,7 +1054,7 @@ omnitrace_add_causal_test(
SKIP_BASELINE
NAME cpu-omni-slow-func-e2e
TARGET causal-cpu-omni
RUN_ARGS 80 30 432525 200000000
RUN_ARGS 80 12 432525 250000000
CAUSAL_MODE "func"
CAUSAL_ARGS ${_causal_slow_func_args}
CAUSAL_VALIDATE_ARGS ${_causal_slow_func_valid}
@@ -957,7 +1066,7 @@ omnitrace_add_causal_test(
SKIP_BASELINE
NAME cpu-omni-fast-func-e2e
TARGET causal-cpu-omni
RUN_ARGS 80 30 432525 200000000
RUN_ARGS 80 12 432525 250000000
CAUSAL_MODE "func"
CAUSAL_ARGS ${_causal_fast_func_args}
CAUSAL_VALIDATE_ARGS ${_causal_fast_func_valid}
@@ -969,7 +1078,7 @@ omnitrace_add_causal_test(
SKIP_BASELINE
NAME cpu-omni-line-155-e2e
TARGET causal-cpu-omni
RUN_ARGS 80 30 432525 200000000
RUN_ARGS 80 12 432525 250000000
CAUSAL_MODE "line"
CAUSAL_ARGS ${_causal_line_155_args}
CAUSAL_VALIDATE_ARGS ${_causal_line_155_valid}
@@ -981,7 +1090,7 @@ omnitrace_add_causal_test(
SKIP_BASELINE
NAME cpu-omni-line-165-e2e
TARGET causal-cpu-omni
RUN_ARGS 80 30 432525 200000000
RUN_ARGS 80 12 432525 250000000
CAUSAL_MODE "line"
CAUSAL_ARGS ${_causal_line_165_args}
CAUSAL_VALIDATE_ARGS ${_causal_line_165_valid}
+157 -3
@@ -164,6 +164,17 @@ set(_rccl_environment
"${_test_openmp_env}"
"${_test_library_path}")
set(_window_environment
"OMNITRACE_USE_PERFETTO=ON"
"OMNITRACE_USE_TIMEMORY=ON"
"OMNITRACE_USE_SAMPLING=OFF"
"OMNITRACE_USE_PROCESS_SAMPLING=OFF"
"OMNITRACE_TIME_OUTPUT=OFF"
"OMNITRACE_FILE_OUTPUT=ON"
"OMNITRACE_VERBOSE=2"
"${_test_openmp_env}"
"${_test_library_path}")
# -------------------------------------------------------------------------------------- #
set(MPIEXEC_EXECUTABLE_ARGS)
@@ -231,7 +242,7 @@ endif()
function(OMNITRACE_WRITE_TEST_CONFIG _FILE _ENV)
set(_ENV_ONLY
"OMNITRACE_(MODE|USE_MPIP|DEBUG_SETTINGS|FORCE_ROCPROFILER_INIT|DEFAULT_MIN_INSTRUCTIONS|COLORIZED_LOG)="
"OMNITRACE_(MODE|USE_MPIP|DEBUG_SETTINGS|FORCE_ROCPROFILER_INIT|DEFAULT_MIN_INSTRUCTIONS|MONOCHROME)="
)
set(_FILE_CONTENTS)
set(_ENV_CONTENTS)
@@ -436,7 +447,7 @@ function(OMNITRACE_ADD_TEST)
set(_environ
"OMNITRACE_DEFAULT_MIN_INSTRUCTIONS=64" "${TEST_ENVIRONMENT}"
"OMNITRACE_OUTPUT_PATH=omnitrace-tests-output"
"OMNITRACE_OUTPUT_PATH=${PROJECT_BINARY_DIR}/omnitrace-tests-output"
"OMNITRACE_OUTPUT_PREFIX=${_prefix}")
set(_timeout ${TEST_REWRITE_TIMEOUT})
@@ -575,7 +586,7 @@ function(OMNITRACE_ADD_CAUSAL_TEST)
set(_environ
"${_causal_environment}"
"OMNITRACE_OUTPUT_PATH=omnitrace-tests-output"
"OMNITRACE_OUTPUT_PATH=${PROJECT_BINARY_DIR}/omnitrace-tests-output"
"OMNITRACE_OUTPUT_PREFIX=${_prefix}"
"OMNITRACE_CI=ON"
"OMNITRACE_USE_PID=OFF"
@@ -739,3 +750,146 @@ function(OMNITRACE_ADD_PYTHON_TEST)
${_TEST_PROPERTIES})
endforeach()
endfunction()
# -------------------------------------------------------------------------------------- #
#
# Find Python3 interpreter for output validation
#
# -------------------------------------------------------------------------------------- #
if(NOT OMNITRACE_USE_PYTHON)
find_package(Python3 QUIET COMPONENTS Interpreter)
if(Python3_FOUND)
set(OMNITRACE_VALIDATION_PYTHON ${Python3_EXECUTABLE})
execute_process(COMMAND ${Python3_EXECUTABLE} -c "import perfetto"
RESULT_VARIABLE OMNITRACE_VALIDATION_PYTHON_PERFETTO)
if(NOT OMNITRACE_VALIDATION_PYTHON_PERFETTO EQUAL 0)
omnitrace_message(AUTHOR_WARNING
"Python3 found but perfetto support is disabled")
endif()
endif()
else()
set(_INDEX 0)
foreach(_VERSION ${OMNITRACE_PYTHON_VERSIONS})
if(NOT OMNITRACE_USE_PYTHON)
continue()
endif()
list(GET OMNITRACE_PYTHON_ROOT_DIRS ${_INDEX} _PYTHON_ROOT_DIR)
omnitrace_find_python(
_PYTHON
ROOT_DIR "${_PYTHON_ROOT_DIR}"
COMPONENTS Interpreter)
if(_PYTHON_EXECUTABLE)
set(OMNITRACE_VALIDATION_PYTHON ${_PYTHON_EXECUTABLE})
execute_process(COMMAND ${_PYTHON_EXECUTABLE} -c "import perfetto"
RESULT_VARIABLE OMNITRACE_VALIDATION_PYTHON_PERFETTO)
# prefer Python3 with perfetto support
if(OMNITRACE_VALIDATION_PYTHON_PERFETTO EQUAL 0)
break()
else()
omnitrace_message(
AUTHOR_WARNING
"${_PYTHON_EXECUTABLE} found but perfetto support is disabled")
endif()
endif()
math(EXPR _INDEX "${_INDEX} + 1")
endforeach()
endif()
if(NOT OMNITRACE_VALIDATION_PYTHON)
omnitrace_message(AUTHOR_WARNING
"Python3 interpreter not found. Validation tests will be disabled")
endif()
# -------------------------------------------------------------------------------------- #
#
# Output validation test function
#
# -------------------------------------------------------------------------------------- #
function(OMNITRACE_ADD_VALIDATION_TEST)
if(NOT OMNITRACE_VALIDATION_PYTHON)
return()
endif()
cmake_parse_arguments(
TEST
""
"NAME;TIMEOUT;TIMEMORY_METRIC;TIMEMORY_FILE;PERFETTO_METRIC;PERFETTO_FILE"
"ENVIRONMENT;LABELS;PROPERTIES;PASS_REGEX;FAIL_REGEX;SKIP_REGEX;DEPENDS;ARGS"
${ARGN})
if(NOT TEST_TIMEOUT)
set(TEST_TIMEOUT 30)
endif()
set(PYTHON_EXECUTABLE "${OMNITRACE_VALIDATION_PYTHON}")
list(APPEND TEST_LABELS "validate")
foreach(_DEP ${TEST_DEPENDS})
list(APPEND TEST_LABELS "validate-${_DEP}")
endforeach()
list(APPEND TEST_DEPENDS "${TEST_NAME}")
if(NOT TEST_PASS_REGEX)
set(TEST_PASS_REGEX
"omnitrace-tests-output/${TEST_NAME}/(${TEST_TIMEMORY_FILE}|${TEST_PERFETTO_FILE}) validated"
)
endif()
add_test(
NAME validate-${TEST_NAME}-timemory
COMMAND
${OMNITRACE_VALIDATION_PYTHON}
${CMAKE_CURRENT_LIST_DIR}/validate-timemory-json.py -m ${TEST_TIMEMORY_METRIC}
${TEST_ARGS} -i
${PROJECT_BINARY_DIR}/omnitrace-tests-output/${TEST_NAME}/${TEST_TIMEMORY_FILE}
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
if(OMNITRACE_VALIDATION_PYTHON_PERFETTO EQUAL 0)
add_test(
NAME validate-${TEST_NAME}-perfetto
COMMAND
${OMNITRACE_VALIDATION_PYTHON}
${CMAKE_CURRENT_LIST_DIR}/validate-perfetto-proto.py -m
${TEST_PERFETTO_METRIC} ${TEST_ARGS} -i
${PROJECT_BINARY_DIR}/omnitrace-tests-output/${TEST_NAME}/${TEST_PERFETTO_FILE}
WORKING_DIRECTORY ${PROJECT_BINARY_DIR})
endif()
foreach(_TEST validate-${TEST_NAME}-timemory validate-${TEST_NAME}-perfetto)
if(NOT TEST "${_TEST}")
continue()
endif()
set_tests_properties(
${_TEST}
PROPERTIES ENVIRONMENT
"${TEST_ENVIRONMENT}"
TIMEOUT
${TEST_TIMEOUT}
LABELS
"${TEST_LABELS}"
DEPENDS
"${TEST_DEPENDS};${TEST_NAME}"
PASS_REGULAR_EXPRESSION
"${TEST_PASS_REGEX}"
FAIL_REGULAR_EXPRESSION
"${TEST_FAIL_REGEX}"
SKIP_REGULAR_EXPRESSION
"${TEST_SKIP_REGEX}"
REQUIRED_FILES
"${TEST_FILE}"
${TEST_PROPERTIES})
endforeach()
endfunction()
-2
@@ -274,7 +274,6 @@ def compute_speedups(_data, args):
def get_validations(args):
data = []
_len = len(args.validate)
if _len == 0:
@@ -297,7 +296,6 @@ def get_validations(args):
def main():
import argparse
parser = argparse.ArgumentParser()
+16
@@ -42,6 +42,9 @@ if __name__ == "__main__":
parser.add_argument(
"-d", "--depths", nargs="+", type=int, help="Expected depths", default=[]
)
parser.add_argument(
"-p", "--print", action="store_true", help="Print the processed timemory data"
)
parser.add_argument("-i", "--input", type=str, help="Input file", required=True)
args = parser.parse_args()
@@ -54,6 +57,19 @@ if __name__ == "__main__":
ret = 0
with open(args.input) as f:
data = json.load(f)
# demo display of data
if args.print:
for itr in data["timemory"][args.metric]["ranks"][0]["graph"]:
_prefix = itr["prefix"]
_depth = itr["depth"]
_count = itr["entry"]["laps"]
_idx = _prefix.find(">>>")
if _idx >= 0:
_prefix = _prefix[(_idx + 4) :]
print("| {:40} | {:6} | {:6} |".format(_prefix, _count, _depth))
try:
validate_json(
data["timemory"][args.metric]["ranks"][0]["graph"],