Rework sampling and colorized logs (#140)

## Overview

This is a significant PR with three notable changes:

1. Omnitrace colorizes most of its logging
2. Completely reworked the sampling
  - Samples now record the current instruction pointers instead of strings
    - This _dramatically_ decreases the overhead of taking a sample
  - The collection of metrics during a sample is split out into another component, enabling that data collection to be disabled -- which decreases the sampling overhead even further
  - When both `OMNITRACE_SAMPLING_CPUTIME` and `OMNITRACE_SAMPLING_REALTIME` are ON:
    - `OMNITRACE_SAMPLING_CPUTIME_FREQ` and `OMNITRACE_SAMPLING_REALTIME_FREQ` can be used to individually control the sampling frequency 
  - `OMNITRACE_SAMPLING_CPUTIME_DELAY` and `OMNITRACE_SAMPLING_REALTIME_DELAY` can be used to individually control the delay time before starting
  - Now, omnitrace does not start a real-time sampler on the main thread unless `OMNITRACE_SAMPLING_REALTIME` is ON
    - In the future, `OMNITRACE_SAMPLING_TIDS` configuration variables (with real-time and cpu-time variants) will allow you to select which threads are sampled
3. Files produced by the `omnitrace` exe (`available-instr.txt`, `instrumented-instr.txt`, etc.) no longer have the `-instr` suffix and are placed in an `instrumentation/` subfolder, i.e. `available-instr.txt` -> `instrumentation/available.txt`
  - This helped de-clutter the output folder
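
A minimal sketch of how the sampling variables above might be combined when launching an instrumented binary from a script. The variable names come from this PR; the value `"ON"`, the numeric frequency and delay values, and the helper name `sampling_env` are assumptions for illustration only.

```python
import os


def sampling_env(cputime_freq=None, realtime_freq=None,
                 cputime_delay=None, realtime_delay=None, base=None):
    """Return a copy of the environment with both samplers switched on.

    Units for the *_FREQ and *_DELAY values are not stated in this PR,
    so the numbers passed in are placeholders (assumption).
    """
    env = dict(os.environ if base is None else base)
    # Both samplers must be ON for the per-sampler settings to apply.
    env["OMNITRACE_SAMPLING_CPUTIME"] = "ON"
    env["OMNITRACE_SAMPLING_REALTIME"] = "ON"
    optional = {
        "OMNITRACE_SAMPLING_CPUTIME_FREQ": cputime_freq,
        "OMNITRACE_SAMPLING_REALTIME_FREQ": realtime_freq,
        "OMNITRACE_SAMPLING_CPUTIME_DELAY": cputime_delay,
        "OMNITRACE_SAMPLING_REALTIME_DELAY": realtime_delay,
    }
    # Only export the settings the caller actually chose.
    for name, value in optional.items():
        if value is not None:
            env[name] = str(value)
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` together with whatever instrumented executable is being profiled.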

Most of the other edits were reorganization (e.g. internal namespace changes), cleanup, and splitting up functionality.

## Bug Fixes

Fixed a bug in the HSA callbacks that disabled sampling on child threads whenever an HSA API call was made.

## Details

- created thread_info struct for mapping different thread IDs
- reorganized file structure significantly
- added categories.hpp, concepts.hpp
- moved around name trait definitions
- moved all omnitrace components into `omnitrace::component` namespace
  - there was a lot of inconsistency between using `tim::component` in some places and `omnitrace::component` in others
  - added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
- OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
- roctracer and critical_trace use same thread pool
- critical_trace functions do not lock anymore because of the thread-local TaskGroup
- added `component::local_category_region` to support using `component::category_region` without explicitly passing in name
- removed `component::omnitrace` (unused)
- migrated KokkosP and OMPT to use `component::local_category_region`
  - removed `component::user_region` as a result
- migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
  - removed `component::functors` as a result
- migrated some ppdefs
- `api::omnitrace` -> `project::omnitrace`
- `api::(...)` -> `category::(...)`
- improved recording the execution time of threads
  - migrated this functionality out of pthread_create_gotcha and into thread_info
- moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
- split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
- sampling.cpp handles setup and post-processing that was formerly in backtrace
- updated logging to use colors
- `OMNITRACE_COLORIZED_LOG` config variable
- updated docs on JSON output from timemory
- instrumentation info in instrumentation subfolder
- added testing for KokkosP entries
- added testing for ompt entries
- add_critical_trace function defined in critical_trace.hpp
- disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
- add comp::page_rss to main bundle
- thread_data supports std::optional instead of std::unique_ptr
- thread_data supports tim::identity<T> to avoid unique_ptr or optional
- tracing::record_thread_start_time()
- tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
- removed anonymous namespace from omnitrace::utility
- sampling backtrace stores instruction pointers instead of strings
- component::category_region updates
  - handle disabled thread state
  - handle finalized state
  - fewer debug messages
  - invoke thread_init()
  - invoke thread_init_sampling()
  - handle push/pop count based on category
  - push/pop count only modified when used
- component::cpu_freq
- components/ensure_storage.hpp
- reworked the pthread_create replacement function
- updated parallel-overhead example to report # of times locked
- OMNITRACE_MAX_UNWIND_DEPTH build option
- update timemory submodule

[ROCm/rocprofiler-systems commit: 808ea7dfa7]
Jonathan R. Madsen
2022-08-31 01:24:31 -05:00
committed by GitHub
parent f642813ad1
commit 473f452d39
113 changed files with 4284 additions and 2880 deletions
@@ -210,8 +210,8 @@ format:
markup:
bullet_char: '*'
enum_char: .
first_comment_is_literal: false
literal_comment_pattern: null
first_comment_is_literal: true
literal_comment_pattern: ^#
fence_pattern: ^\s*([`~]{3}[`~]*)(.*)$
ruler_pattern: ^\s*[^\w\s]{3}.*[^\w\s]{3}$
explicit_trailing_pattern: '#<'
@@ -94,11 +94,11 @@ jobs:
ldd $(which omnitrace)
omnitrace --help
omnitrace -e -v 1 -o ls.inst --simulate -- ls
for i in omnitrace-ls.inst-output/*; do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
for i in $(find omnitrace-ls.inst-output -type f); do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
omnitrace -e -v 1 -o ls.inst -- ls
./ls.inst
omnitrace -e -v 1 --simulate -- ls
for i in omnitrace-ls-output/*; do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
for i in $(find omnitrace-ls-output -type f); do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
omnitrace -e -v 1 -- ls
- name: Test User API
@@ -123,11 +123,11 @@ jobs:
ldd $(which omnitrace)
omnitrace --help
omnitrace -e -v 1 -o ls.inst --simulate -- ls
for i in omnitrace-ls.inst-output/*; do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
for i in $(find omnitrace-ls.inst-output -type f); do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
omnitrace -e -v 1 -o ls.inst -- ls
./ls.inst
omnitrace -e -v 1 --simulate -- ls
for i in omnitrace-ls-output/*; do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
for i in $(find omnitrace-ls-output -type f); do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
omnitrace -e -v 1 -- ls
- name: Test User API
@@ -139,11 +139,11 @@ jobs:
ldd $(which omnitrace)
omnitrace --help
omnitrace -e -v 1 -o ls.inst --simulate -- ls
for i in omnitrace-ls.inst-output/*; do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
for i in $(find omnitrace-ls.inst-output -type f); do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
omnitrace -e -v 1 -o ls.inst -- ls
./ls.inst
omnitrace -e -v 1 --simulate -- ls
for i in omnitrace-ls-output/*; do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
for i in $(find omnitrace-ls-output -type f); do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
omnitrace -e -v 1 -- ls
- name: Test User API
@@ -509,11 +509,11 @@ jobs:
ldd $(which omnitrace)
omnitrace --help
omnitrace -e -v 1 -o ls.inst --simulate -- ls
for i in omnitrace-tests-output/ls.inst/*; do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
for i in $(find omnitrace-tests-output/ls.inst -type f); do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
omnitrace -e -v 1 -o ls.inst -- ls
./ls.inst
omnitrace -e -v 1 --simulate -- ls
for i in omnitrace-tests-output/ls/*; do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
for i in $(find omnitrace-tests-output/ls -type f); do echo -e "\n\n --> ${i} \n\n"; cat ${i}; done
omnitrace -e -v 1 -- ls
- name: Test User API
@@ -214,6 +214,16 @@ omnitrace_add_feature(
OMNITRACE_MAX_THREADS
"Maximum number of total threads supported in the host application (default: max of 128 or 16 * nproc)"
)
set(OMNITRACE_MAX_UNWIND_DEPTH
"64"
CACHE
STRING
"Maximum call-stack depth to search during call-stack unwinding. Decreasing this value will result in sampling consuming less memory"
)
omnitrace_add_feature(
OMNITRACE_MAX_UNWIND_DEPTH
"Maximum call-stack depth to search during call-stack unwinding. Decreasing this value will result in sampling consuming less memory"
)
# default visibility settings
set(CMAKE_C_VISIBILITY_PRESET
@@ -82,6 +82,10 @@ add_flag_if_avail(
"-W" "-Wall" "-Wno-unknown-pragmas" "-Wno-unused-function" "-Wno-ignored-attributes"
"-Wno-attributes" "-Wno-missing-field-initializers")
if(OMNITRACE_BUILD_DEBUG)
add_flag_if_avail("-g" "-gdwarf-3" "-fno-omit-frame-pointer")
endif()
if(WIN32)
# suggested by MSVC for spectre mitigation in rapidjson implementation
add_cxx_flag_if_avail("/Qspectre")
@@ -87,7 +87,7 @@ if(OMNITRACE_CLANG_FORMAT_EXE
if(OMNITRACE_BLACK_FORMAT_EXE)
add_custom_target(
format-omnitrace-python
${OMNITRACE_BLACK_FORMAT_EXE} ${PROJECT_SOURCE_DIR}
${OMNITRACE_BLACK_FORMAT_EXE} -q ${PROJECT_SOURCE_DIR}
COMMENT
"[omnitrace] Running Python formatter ${OMNITRACE_BLACK_FORMAT_EXE}...")
if(NOT TARGET format-python)
@@ -265,8 +265,21 @@ if(OMNITRACE_BUILD_DYNINST)
OFF
CACHE BOOL "Enable LTO for dyninst libraries")
omnitrace_save_variables(PIC VARIABLES CMAKE_POSITION_INDEPENDENT_CODE)
if(NOT DEFINED CMAKE_INSTALL_RPATH)
set(CMAKE_INSTALL_RPATH "")
endif()
if(NOT DEFINED CMAKE_BUILD_RPATH)
set(CMAKE_BUILD_RPATH "")
endif()
omnitrace_save_variables(
PIC VARIABLES CMAKE_POSITION_INDEPENDENT_CODE CMAKE_INSTALL_RPATH
CMAKE_BUILD_RPATH CMAKE_INSTALL_RPATH_USE_LINK_PATH)
set(CMAKE_POSITION_INDEPENDENT_CODE ON)
set(CMAKE_INSTALL_RPATH_USE_LINK_PATH OFF)
set(CMAKE_BUILD_RPATH "\$ORIGIN:\$ORIGIN/omnitrace")
set(CMAKE_INSTALL_RPATH "\$ORIGIN:\$ORIGIN/omnitrace")
set(DYNINST_TPL_INSTALL_PREFIX
"omnitrace"
CACHE PATH "Third-party library install-tree install prefix" FORCE)
@@ -274,7 +287,9 @@ if(OMNITRACE_BUILD_DYNINST)
"omnitrace"
CACHE PATH "Third-party library install-tree install library prefix" FORCE)
add_subdirectory(external/dyninst EXCLUDE_FROM_ALL)
omnitrace_restore_variables(PIC VARIABLES CMAKE_POSITION_INDEPENDENT_CODE)
omnitrace_restore_variables(
PIC VARIABLES CMAKE_POSITION_INDEPENDENT_CODE CMAKE_INSTALL_RPATH
CMAKE_BUILD_RPATH CMAKE_INSTALL_RPATH_USE_LINK_PATH)
add_library(Dyninst::Dyninst INTERFACE IMPORTED)
foreach(_LIB common dyninstAPI parseAPI instructionAPI symtabAPI stackwalk)
@@ -299,7 +314,8 @@ if(OMNITRACE_BUILD_DYNINST)
TARGETS ${_LIB}
DESTINATION ${CMAKE_INSTALL_LIBDIR}/omnitrace
COMPONENT dyninst
PUBLIC_HEADER DESTINATION ${CMAKE_INSTALL_LIBDIR}/omnitrace/include)
PUBLIC_HEADER DESTINATION ${PROJECT_BINARY_DIR}/.discard/omnitrace/include
)
endif()
endforeach()
@@ -502,7 +518,7 @@ endif()
#
# ----------------------------------------------------------------------------------------#
target_compile_definitions(omnitrace-timemory-config INTERFACE TIMEMORY_PAPI_ARRAY_SIZE=16
target_compile_definitions(omnitrace-timemory-config INTERFACE TIMEMORY_PAPI_ARRAY_SIZE=12
TIMEMORY_USE_ROOFLINE=0)
if(OMNITRACE_BUILD_STACK_PROTECTOR)
@@ -510,6 +526,11 @@ if(OMNITRACE_BUILD_STACK_PROTECTOR)
"-Wstack-protector")
endif()
if(OMNITRACE_BUILD_DEBUG)
add_target_flag_if_avail(omnitrace-timemory-config "-fno-omit-frame-pointer" "-g"
"-gdwarf-3")
endif()
set(TIMEMORY_EXTERNAL_INTERFACE_LIBRARY
omnitrace-timemory-config
CACHE STRING "timemory configuration interface library")
@@ -563,6 +584,9 @@ set(TIMEMORY_USE_PAPI
set(TIMEMORY_USE_LIBUNWIND
ON
CACHE BOOL "Enable libunwind support in timemory")
set(TIMEMORY_USE_VISIBILITY
OFF
CACHE BOOL "Enable/disable using visibility decorations")
if(DEFINED TIMEMORY_BUILD_GOTCHA AND NOT TIMEMORY_BUILD_GOTCHA)
omnitrace_message(
@@ -581,6 +605,9 @@ set(TIMEMORY_BUILD_LIBUNWIND
set(TIMEMORY_BUILD_EXTRA_OPTIMIZATIONS
${OMNITRACE_BUILD_EXTRA_OPTIMIZATIONS}
CACHE BOOL "Enable building GOTCHA library from submodule" FORCE)
set(TIMEMORY_BUILD_ERT
OFF
CACHE BOOL "Disable building ERT support" FORCE)
# timemory build settings
set(TIMEMORY_TLS_MODEL
@@ -595,6 +622,10 @@ set(TIMEMORY_SETTINGS_PREFIX
set(TIMEMORY_PROJECT_NAME
"omnitrace"
CACHE STRING "Name for configuration")
set(TIMEMORY_CXX_LIBRARY_EXCLUDE
"kokkosp.cpp;pthread.cpp;timemory_c.cpp;trace.cpp;weak.cpp;library.cpp"
CACHE STRING "Timemory C++ library implementation files to exclude from compiling")
mark_as_advanced(TIMEMORY_SETTINGS_PREFIX)
mark_as_advanced(TIMEMORY_PROJECT_NAME)
@@ -2,6 +2,14 @@ cmake_minimum_required(VERSION 3.16 FATAL_ERROR)
project(omnitrace-examples LANGUAGES C CXX)
if("${CMAKE_BUILD_TYPE}" STREQUAL "")
set(CMAKE_BUILD_TYPE
"RelWithDebInfo"
CACHE STRING "Build type" FORCE)
endif()
string(TOUPPER "${CMAKE_BUILD_TYPE}" BUILD_TYPE)
set(CMAKE_VISIBILITY_INLINES_HIDDEN OFF)
set(CMAKE_CXX_VISIBILITY_PRESET "default")
set(CMAKE_CXX_STANDARD 17)
@@ -9,6 +17,10 @@ set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_CLANG_TIDY)
set(CMAKE_INSTALL_DEFAULT_COMPONENT_NAME examples)
if(OMNITRACE_BUILD_DEBUG)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -gdwarf-3 -fno-omit-frame-pointer")
endif()
option(BUILD_SHARED_LIBS "Build dynamic libraries" ON)
if(CMAKE_PROJECT_NAME STREQUAL "omnitrace")
@@ -10,7 +10,7 @@ option(LULESH_BUILD_KOKKOS "Build Kokkos from submodule" ON)
if(LULESH_BUILD_KOKKOS)
add_subdirectory(external)
if(LULESH_USE_CUDA)
kokkos_compilation(PROJECT)
kokkos_compilation(PROJECT COMPILER ${Kokkos_NVCC_WRAPPER})
elseif(LULESH_USE_HIP AND NOT "${CMAKE_CXX_COMPILER}" MATCHES "hipcc")
if(NOT HIPCC_EXECUTABLE)
find_package(hip QUIET HINTS ${ROCmVersion_DIR} PATHS ${ROCmVersion_DIR})
@@ -158,8 +158,8 @@ run(MPI_Comm _comm, int nitr)
MPI_Comm_rank(_comm, &_rank);
MPI_Comm_size(_comm, &_size);
printf("[%s][%i] running %i iterations on %i ranks...\n", _name.c_str(), _rank, nitr,
_size);
printf("[%s][%i][%s] running %i iterations on %i ranks... \n", _name.c_str(), _rank,
__FUNCTION__, nitr, _size);
MPI_Barrier(_comm);
for(int i = 0; i < nitr; ++i)
@@ -175,6 +175,9 @@ run(MPI_Comm _comm, int nitr)
all2all<double, 6>(_rank, _comm);
}
MPI_Barrier(_comm);
printf("[%s][%i][%s] running %i iterations on %i ranks... Done\n", _name.c_str(),
_rank, __FUNCTION__, nitr, _size);
}
void
@@ -235,8 +238,6 @@ run_main(int argc, char** argv)
print_info(MPI_COMM_WORLD, true, "MPI_COMM_WORLD");
printf("[%s]\n", _name.c_str());
if(size > 1)
{
MPI_Comm dup;
@@ -313,6 +314,8 @@ run_main(int argc, char** argv)
print_info(dup, false);
}
printf("[%s][%i of %i] %s... Done", _name.c_str(), rank, size, __FUNCTION__);
}
int
@@ -325,13 +328,12 @@ main(int argc, char** argv)
auto _prom = std::promise<void>{};
auto _fut = _prom.get_future();
std::thread _thr{ [&]() {
std::thread{ [&]() {
_prom.set_value_at_thread_exit();
run_main(argc, argv);
_prom.set_value();
} };
} }.join();
_fut.wait();
_thr.join();
MPI_Finalize();
return EXIT_SUCCESS;
@@ -14,6 +14,9 @@ add_executable(openmp-lu ${CMAKE_CURRENT_SOURCE_DIR}/LU/lu.cpp
if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
find_package(OpenMP REQUIRED)
target_link_libraries(openmp-common PUBLIC OpenMP::OpenMP_CXX)
set(OMNITRACE_OPENMP_USING_LIBOMP_LIBRARY
ON
CACHE INTERNAL "Used by omnitrace testing" FORCE)
else()
find_program(CLANGXX_EXECUTABLE NAMES clang++)
find_library(
@@ -27,9 +30,15 @@ else()
omnitrace_custom_compilation(COMPILER ${CLANGXX_EXECUTABLE} TARGET openmp-common)
omnitrace_custom_compilation(COMPILER ${CLANGXX_EXECUTABLE} TARGET openmp-cg)
omnitrace_custom_compilation(COMPILER ${CLANGXX_EXECUTABLE} TARGET openmp-lu)
set(OMNITRACE_OPENMP_USING_LIBOMP_LIBRARY
ON
CACHE INTERNAL "Used by omnitrace testing" FORCE)
else()
find_package(OpenMP REQUIRED)
target_link_libraries(openmp-common PUBLIC OpenMP::OpenMP_CXX)
set(OMNITRACE_OPENMP_USING_LIBOMP_LIBRARY
OFF
CACHE INTERNAL "Used by omnitrace testing" FORCE)
endif()
endif()
@@ -14,11 +14,13 @@
#if USE_LOCKS > 0
# include <mutex>
using auto_lock_t = std::unique_lock<std::mutex>;
long total = 0;
using auto_lock_t = std::unique_lock<std::mutex>;
long total = 0;
long lock_count = 0;
std::mutex mtx{};
#else
std::atomic<long> total{ 0 };
long lock_count = 0;
#endif
long
@@ -52,6 +54,7 @@ run(size_t nitr, long n)
auto _v = fib(_get_n());
auto_lock_t _lk{ mtx };
total += _v;
++lock_count;
}
#else
long local = 0;
@@ -110,6 +113,7 @@ main(int argc, char** argv)
printf("[%s] fibonacci(%li) x %lu = %li\n", _name.c_str(), nfib, nthread,
static_cast<long>(total));
printf("[%s] number of mutex locks = %li\n", _name.c_str(), lock_count);
return 0;
}
@@ -2,19 +2,22 @@ cmake_minimum_required(VERSION 3.16 FATAL_ERROR)
project(omnitrace-python)
set(PYTHON_FILES builtin.py external.py source.py noprofile.py)
set(PYTHON_FILES builtin.py external.py source.py noprofile.py fill.py)
if(OMNITRACE_INSTALL_EXAMPLES)
find_package(Python3 COMPONENTS Interpreter)
if(Python3_FOUND)
set(PYTHON_EXECUTABLE "${Python3_EXECUTABLE}")
foreach(_FILE ${PYTHON_FILES})
configure_file(${PROJECT_SOURCE_DIR}/${_FILE} ${PROJECT_BINARY_DIR}/${_FILE}
@ONLY)
find_package(Python3 COMPONENTS Interpreter)
if(Python3_FOUND)
set(PYTHON_EXECUTABLE "${Python3_EXECUTABLE}")
foreach(_FILE ${PYTHON_FILES})
configure_file(${PROJECT_SOURCE_DIR}/${_FILE} ${PROJECT_BINARY_DIR}/${_FILE}
@ONLY)
if(OMNITRACE_INSTALL_EXAMPLES)
install(
PROGRAMS ${PROJECT_BINARY_DIR}/${_FILE}
DESTINATION bin
COMPONENT omnitrace-examples)
endforeach()
endif()
endif()
endforeach()
endif()
@@ -0,0 +1,39 @@
#!@PYTHON_EXECUTABLE@
import os
import sys
import time
import omnitrace
from omnitrace.user import region as omni_user_region
from omnitrace.profiler import config as omni_config
_prefix = ""
def loop(n):
pass
@omnitrace.profile()
def run(i, n, v):
for l in range(n * n):
loop(v + l)
return v + (n * n)
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("-n", "--num-iterations", help="Number", type=int, default=100)
parser.add_argument("-v", "--value", help="Starting value", type=int, default=10)
args = parser.parse_args()
omni_config.include_args = True
_prefix = os.path.basename(__file__)
print(f"[{_prefix}] Executing {args.num_iterations} iterations...\n")
ans = 0
for i in range(args.num_iterations):
beg = ans
ans = run(i, args.value, beg)
print(f"[{_prefix}] [{i}] result of run({args.value}, {beg}) = {ans}")
@@ -1,5 +0,0 @@
[tool.black]
line-length = 80
target-version = ['py38']
include = '\.py'
Submodule projects/rocprofiler-systems/external/timemory updated: 48f4735fb7...2f209b7dff
@@ -24,15 +24,13 @@
#include "enumerated_list.hpp"
#include "get_availability.hpp"
#include "library/api.hpp"
#include "api.hpp"
#include "library/components/backtrace.hpp"
#include "library/components/fork_gotcha.hpp"
#include "library/components/mpi_gotcha.hpp"
#include "library/components/omnitrace.hpp"
#include "library/components/pthread_gotcha.hpp"
#include "library/components/rocprofiler.hpp"
#include "library/components/roctracer.hpp"
#include "library/components/user_region.hpp"
#include <timemory/components/definition.hpp>
#include <timemory/enum.h>
@@ -22,7 +22,7 @@
#include "critical-trace.hpp"
#include "library/api.hpp"
#include "api.hpp"
#include "library/config.hpp"
#include "library/perfetto.hpp"
@@ -54,7 +54,7 @@ main(int argc, char** argv)
// config::set_setting_value("OMNITRACE_CRITICAL_TRACE_DEBUG", true);
config::set_setting_value<int64_t>("OMNITRACE_CRITICAL_TRACE_COUNT", 500);
config::set_setting_value<int64_t>("OMNITRACE_CRITICAL_TRACE_PER_ROW", 100);
config::set_setting_value<int64_t>("OMNITRACE_CRITICAL_TRACE_NUM_THREADS",
config::set_setting_value<int64_t>("OMNITRACE_THREAD_POOL_SIZE",
std::thread::hardware_concurrency());
config::set_setting_value("OMNITRACE_CRITICAL_TRACE_SERIALIZE_NAMES", true);
@@ -856,8 +856,7 @@ compute_critical_trace()
try
{
PTL::ThreadPool _tp{ get_critical_trace_num_threads(), []() { copy_hash_ids(); },
[]() {} };
PTL::ThreadPool _tp{ get_thread_pool_size(), []() { copy_hash_ids(); }, []() {} };
_tp.set_verbose(-1);
PTL::TaskGroup<void> _tg{ &_tp };
@@ -22,9 +22,13 @@ target_sources(
target_link_libraries(
omnitrace-exe
PRIVATE omnitrace::omnitrace-headers omnitrace::omnitrace-dyninst
omnitrace::omnitrace-compile-options omnitrace::omnitrace-compile-definitions
omnitrace::omnitrace-sanitizer timemory::timemory-headers)
PRIVATE omnitrace::omnitrace-headers
omnitrace::omnitrace-dyninst
omnitrace::omnitrace-compile-options
omnitrace::omnitrace-compile-definitions
omnitrace::omnitrace-sanitizer
timemory::timemory-headers
timemory::timemory-extensions)
set_target_properties(
omnitrace-exe
@@ -27,6 +27,7 @@
#include <timemory/mpl/policy.hpp>
#include <timemory/settings.hpp>
#include <timemory/settings/types.hpp>
#include <timemory/tpls/cereal/cereal.hpp>
#include <timemory/utility/delimit.hpp>
#include <timemory/utility/filepath.hpp>
@@ -60,7 +61,9 @@ dump_info(const string_t& _label, string_t _oname, const string_t& _ext,
namespace cereal = tim::cereal;
namespace policy = tim::policy;
_oname = tim::settings::compose_output_filename(_oname, _ext);
auto _cfg = tim::settings::compose_filename_config{};
_cfg.subdirectory = "instrumentation";
_oname = tim::settings::compose_output_filename(_oname, _ext, _cfg);
auto _handle_error = [&]() {
std::stringstream _msg{};
_msg << "[dump_info] Error opening '" << _oname << " for output";
@@ -195,7 +195,7 @@ main(int argc, char** argv)
};
std::set<std::string> dyninst_defs = { "TypeChecking", "SaveFPR", "DelayedParsing",
"MergeTramp" };
"DebugParsing", "MergeTramp" };
int _argc = argc;
int _cmdc = 0;
@@ -302,7 +302,7 @@ main(int argc, char** argv)
.add_argument({ "--simulate" },
"Exit after outputting diagnostic "
"{available,instrumented,excluded,overlapping} module "
"function lists, e.g. available-instr.txt")
"function lists, e.g. available.txt")
.max_count(1)
.dtype("bool")
.action([](parser_t& p) { simulate = p.get<bool>("simulate"); });
@@ -310,7 +310,7 @@ main(int argc, char** argv)
.add_argument({ "--print-format" },
"Output format for diagnostic "
"{available,instrumented,excluded,overlapping} module "
"function lists, e.g. {print-dir}/available-instr.txt")
"function lists, e.g. {print-dir}/available.txt")
.min_count(1)
.max_count(3)
.dtype("string")
@@ -320,7 +320,7 @@ main(int argc, char** argv)
.add_argument({ "--print-dir" },
"Output directory for diagnostic "
"{available,instrumented,excluded,overlapping} module "
"function lists, e.g. {print-dir}/available-instr.txt")
"function lists, e.g. {print-dir}/available.txt")
.count(1)
.dtype("string")
.action([](parser_t& p) {
@@ -1040,7 +1040,7 @@ main(int argc, char** argv)
bpatch->setTypeChecking(true);
bpatch->setSaveFPR(true);
bpatch->setDelayedParsing(true);
bpatch->setDebugParsing(false);
bpatch->setDebugParsing(true);
bpatch->setInstrStackFrames(false);
bpatch->setLivenessAnalysis(false);
bpatch->setBaseTrampDeletion(false);
@@ -1276,9 +1276,9 @@ main(int argc, char** argv)
std::cout << '\n' << std::endl;
}
dump_info("available-instr", available_module_functions, 1, werror, "available-instr",
dump_info("available", available_module_functions, 1, werror, "available",
print_formats);
dump_info("overlapping-instr", overlapping_module_functions, 1, werror,
dump_info("overlapping", overlapping_module_functions, 1, werror,
"overlapping_module_functions", print_formats);
//----------------------------------------------------------------------------------//
@@ -1943,16 +1943,16 @@ main(int argc, char** argv)
//
//----------------------------------------------------------------------------------//
dump_info("available-instr", available_module_functions, 0, werror,
dump_info("available", available_module_functions, 0, werror,
"available_module_functions", print_formats);
dump_info("instrumented-instr", instrumented_module_functions, 0, werror,
dump_info("instrumented", instrumented_module_functions, 0, werror,
"instrumented_module_functions", print_formats);
dump_info("excluded-instr", excluded_module_functions, 0, werror,
dump_info("excluded", excluded_module_functions, 0, werror,
"excluded_module_functions", print_formats);
if(coverage_mode != CODECOV_NONE)
dump_info("coverage-instr", coverage_module_functions, 0, werror,
dump_info("coverage", coverage_module_functions, 0, werror,
"coverage_module_functions", print_formats);
dump_info("overlapping-instr", overlapping_module_functions, 0, werror,
dump_info("overlapping", overlapping_module_functions, 0, werror,
"overlapping_module_functions", print_formats);
auto _dump_info = [](const std::string& _label, const string_t& _mode,
@@ -174,10 +174,10 @@ omnitrace_add_bin_test(
omnitrace_add_bin_test(
NAME omnitrace-exe-simulate-ls-check
DEPENDS omnitrace-exe-simulate-ls
COMMAND ls omnitrace-tests-output/omnitrace-exe-simulate-ls
COMMAND ls omnitrace-tests-output/omnitrace-exe-simulate-ls/instrumentation
TIMEOUT 60
PASS_REGEX
".*available-instr.json.*available-instr.txt.*available-instr.xml.*excluded-instr.json.*excluded-instr.txt.*excluded-instr.xml.*instrumented-instr.json.*instrumented-instr.txt.*instrumented-instr.xml.*overlapping-instr.json.*overlapping-instr.txt.*overlapping-instr.xml.*"
".*available.json.*available.txt.*available.xml.*excluded.json.*excluded.txt.*excluded.xml.*instrumented.json.*instrumented.txt.*instrumented.xml.*overlapping.json.*overlapping.txt.*overlapping.xml.*"
)
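
For scripts that still look for the old file names, the renaming checked by this test can be sketched as below. The helper name `new_instrumentation_path` is hypothetical; the `-instr` suffix and the `instrumentation` subfolder are taken from this PR.

```python
from pathlib import PurePosixPath


def new_instrumentation_path(old_name):
    """Map e.g. 'available-instr.txt' to 'instrumentation/available.txt'."""
    path = PurePosixPath(old_name)
    stem = path.stem
    # Drop the '-instr' suffix that the output files no longer carry.
    if stem.endswith("-instr"):
        stem = stem[: -len("-instr")]
    # The files now live in an 'instrumentation' subfolder of the output dir.
    return str(path.parent / "instrumentation" / (stem + path.suffix))
```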
omnitrace_add_bin_test(
@@ -265,7 +265,8 @@ omnitrace_add_bin_test(
--advanced
LABELS "omnitrace-avail"
TIMEOUT 45
PASS_REGEX "ENVIRONMENT VARIABLE,[ \n]+OMNITRACE_USE_PID,[ \n]+"
PASS_REGEX
"ENVIRONMENT VARIABLE,[ \n]+OMNITRACE_THREAD_POOL_SIZE,[ \n]+OMNITRACE_USE_PID,[ \n]+"
FAIL_REGEX "OMNITRACE_USE_PERFETTO")
string(REPLACE "+" "\\\+" _AVAIL_CFG_PATH
@@ -524,125 +524,299 @@ component explicitly sets type-traits which specify that the data is only releva
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
```
### Timemory Flat JSON Output
### Timemory JSON Output
> ***Hint: the generation of flat JSON output is configurable via `OMNITRACE_JSON_OUTPUT`***
> ***Hint: the generation of flat JSON output is configurable via `OMNITRACE_JSON_OUTPUT`.***
> ***The generation of hierarchical JSON data is configurable via `OMNITRACE_TREE_OUTPUT`.***
Timemory provides two JSON output formats. The flat JSON output files are similar to the text files: the hierarchical information
is represented by the indentation of the `"prefix"` field and the `"depth"` field. All the data entries are in a single JSON array,
e.g. the `["timemory"]["wall_clock"]["ranks"][0]["graph"][<N>]["prefix"]` entry in the below:
Timemory represents the data within the JSON output in two forms: a flat structure and a hierarchical structure.
The flat JSON data represents the data similar to the text files: the hierarchical information
is represented by the indentation of the `"prefix"` field and the `"depth"` field.
The hierarchical JSON contains additional information with respect to inclusive and exclusive values; however,
its structure requires processing through recursion. This section of the JSON supports analysis
by [hatchet](https://github.com/hatchet/hatchet).
All the data entries for the flat structure are in a single JSON array.
This format is easier than the hierarchical format to write a simple Python script for post-processing.
#### Timemory JSON Output Sample
In the JSON below, the flat data starts at `["timemory"]["wall_clock"]["ranks"]`
and the hierarchical data starts at `["timemory"]["wall_clock"]["graph"]`.
E.g., accessing the name (prefix) of the nth entry in the flat data layout is:
`["timemory"]["wall_clock"]["ranks"][0]["graph"][<N>]["prefix"]`. When full MPI
support is enabled, the per-rank data in the flat layout is represented
as an entry in the "ranks" array; in the hierarchical data structure,
the per-rank data is represented as an entry in the "mpi" array (but "graph"
is used in lieu of "mpi" when full MPI support is enabled).
In the hierarchical layout, all data for the process is a child of a (dummy)
root node (which has the name `unknown-hash=0`).
```json
{
"timemory": {
"wall_clock": {
"description": "Real-clock timer (i.e. wall-clock timer)",
"thread_count": 12,
"process_count": 1,
"properties": {
"cereal_class_version": 0,
"value": 78,
"enum": "WALL_CLOCK",
"id": "wall_clock",
"value": 78,
"ids": [
"real_clock",
"virtual_clock",
"wall_clock"
]
},
"mpi_size": 0,
"num_ranks": 1,
"concurrency": 12,
"upcxx_size": 1,
"unit_value": 1000000000,
"thread_scope_only": false,
"type": "wall_clock",
"description": "Real-clock timer (i.e. wall-clock timer)",
"unit_value": 1000000000,
"unit_repr": "sec",
"thread_scope_only": false,
"thread_count": 2,
"mpi_size": 1,
"upcxx_size": 1,
"process_count": 1,
"num_ranks": 1,
"concurrency": 2,
"ranks": [
{
"graph_size": 173,
"rank": 0,
"graph_size": 112,
"graph": [
{
"hash": 17481650134347108265,
"prefix": "|0>>> main",
"depth": 0,
"stats": {
"count": 1,
"min": 13.360264917,
"sqr": 178.49667865242102,
"sum": 13.360264917,
"stddev": 0.0,
"max": 13.360264917,
"cereal_class_version": 0,
"mean": 13.360264917
},
"prefix": "|00>>> main",
"rolling_hash": 17481650134347108265,
"entry": {
"repr_display": 13.360264917,
"value": 13360264917,
"repr_data": 13.360264917,
"cereal_class_version": 0,
"accum": 13360264917,
"laps": 1
"laps": 1,
"value": 894743517,
"accum": 894743517,
"repr_data": 0.894743517,
"repr_display": 0.894743517
},
"hash": 17481650134347108265
"stats": {
"cereal_class_version": 0,
"sum": 0.894743517,
"count": 1,
"min": 0.894743517,
"max": 0.894743517,
"sqr": 0.8005659612135293,
"mean": 0.894743517,
"stddev": 0.0
},
"rolling_hash": 17481650134347108265
},
{
"hash": 3455444288293231339,
"prefix": "|0>>> |_read_input",
"depth": 1,
"stats": {
"count": 1,
"min": 10.924160502,
"max": 10.924160502,
"sum": 10.924160502,
"stddev": 0.0,
"sqr": 119.33728267345688,
"mean": 10.924160502
},
"prefix": "|00>>> |_ompt_thread_initial",
"rolling_hash": 5142782188440775656,
"entry": {
"repr_display": 10.924160502,
"laps": 1,
"accum": 10924160502,
"repr_data": 10.924160502,
"value": 10924160502
"value": 9808,
"accum": 9808,
"repr_data": 9.808e-06,
"repr_display": 9.808e-06
},
"hash": 6107876127803219007
"stats": {
"sum": 9.808e-06,
"count": 1,
"min": 9.808e-06,
"max": 9.808e-06,
"sqr": 9.6196864e-11,
"mean": 9.808e-06,
"stddev": 0.0
},
"rolling_hash": 2490350348930787988
},
{
"depth": 2,
"stats": {
"count": 1,
"min": 10.923050237,
"max": 10.923050237,
"sum": 10.923050237,
"stddev": 0.0,
"sqr": 119.31302648002575,
"mean": 10.923050237
},
"prefix": "|00>>> |_ompt_implicit_task",
"rolling_hash": 2098840206724841601,
"hash": 8456966793631718807,
"prefix": "|0>>> |_setcoeff",
"depth": 1,
"entry": {
"repr_display": 10.923050237,
"laps": 1,
"accum": 10923050237,
"repr_data": 10.923050237,
"value": 10923050237
"value": 922,
"accum": 922,
"repr_data": 9.22e-07,
"repr_display": 9.22e-07
},
"hash": 15402802091993617561
"stats": {
"sum": 9.22e-07,
"count": 1,
"min": 9.22e-07,
"max": 9.22e-07,
"sqr": 8.50084e-13,
"mean": 9.22e-07,
"stddev": 0.0
},
"rolling_hash": 7491872854269275456
},
{
"hash": 6107876127803219007,
"prefix": "|0>>> |_ompt_thread_initial",
"depth": 1,
"entry": {
"laps": 1,
"value": 896506392,
"accum": 896506392,
"repr_data": 0.896506392,
"repr_display": 0.896506392
},
"stats": {
"sum": 0.896506392,
"count": 1,
"min": 0.896506392,
"max": 0.896506392,
"sqr": 0.8037237108968578,
"mean": 0.896506392,
"stddev": 0.0
},
"rolling_hash": 5142782188440775656
},
{
"hash": 15402802091993617561,
"prefix": "|0>>> |_ompt_implicit_task",
"depth": 2,
"entry": {
"laps": 1,
"value": 896479111,
"accum": 896479111,
"repr_data": 0.896479111,
"repr_display": 0.896479111
},
"stats": {
"sum": 0.896479111,
"count": 1,
"min": 0.896479111,
"max": 0.896479111,
"sqr": 0.8036747964593504,
"mean": 0.896479111,
"stddev": 0.0
},
"rolling_hash": 2098840206724841601 },
{
"..." : "... etc. ..."
}
]
}
],
"graph": [
[
{
"cereal_class_version": 0,
"node": {
"hash": 0,
"prefix": "unknown-hash=0",
"tid": [
0
],
"pid": [
2539175
],
"depth": 0,
"is_dummy": false,
"inclusive": {
"entry": {
"laps": 0,
"value": 0,
"accum": 0,
"repr_data": 0.0,
"repr_display": 0.0
},
"stats": {
"sum": 0.0,
"count": 0,
"min": 0.0,
"max": 0.0,
"sqr": 0.0,
"mean": 0.0,
"stddev": 0.0
}
},
"exclusive": {
"entry": {
"laps": 0,
"value": -894743517,
"accum": -894743517,
"repr_data": -0.894743517,
"repr_display": -0.894743517
},
"stats": {
"sum": 0.0,
"count": 0,
"min": 0.0,
"max": 0.0,
"sqr": 0.0,
"mean": 0.0,
"stddev": 0.0
}
}
},
"children": [
{
"node": {
"hash": 17481650134347108265,
"prefix": "main",
"tid": [
0
],
"pid": [
2539175
],
"depth": 1,
"is_dummy": false,
"inclusive": {
"entry": {
"laps": 1,
"value": 894743517,
"accum": 894743517,
"repr_data": 0.894743517,
"repr_display": 0.894743517
},
"stats": {
"sum": 0.894743517,
"count": 1,
"min": 0.894743517,
"max": 0.894743517,
"sqr": 0.8005659612135293,
"mean": 0.894743517,
"stddev": 0.0
}
},
"exclusive": {
"entry": {
"laps": 1,
"value": -1773605,
"accum": -1773605,
"repr_data": -0.001773605,
"repr_display": -0.001773605
},
"stats": {
"sum": -0.001773605,
"count": 1,
"min": 9.22e-07,
"max": 0.896506392,
"sqr": -0.0031577497803754,
"mean": -0.001773605,
"stddev": 0.0
}
}
},
"children": [
{
"..." : "... etc. ..."
}
]
}
]
}
]
]
}
}
}
```
This flat format is easier to post-process with a simple Python script than the hierarchical format, e.g.:
#### Timemory JSON Output Python Post-Processing Example
```python
#!/usr/bin/env python3
@@ -708,11 +882,3 @@ This script applied to the corresponding JSON output from [Text Output Example](
[openmp-cg.inst-wall_clock.json] Found metric: wall_clock
[openmp-cg.inst-wall_clock.json] Maximum value: 'conj_grad' at depth 6 was called 76x :: 10.641 sec (mean = 1.400e-01 sec)
```
### Timemory Hierarchical JSON Output
> ***Hint: the generation of hierarchical JSON output is configurable via `OMNITRACE_TREE_OUTPUT`***
The hierarchical JSON output (extension: `.tree.json`) contains very similar data to the flat JSON output; however,
its structure requires processing through recursion. The main use of these files is their analysis support
by [hatchet](https://github.com/hatchet/hatchet).
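
For readers who want to inspect such a file without hatchet, the recursion can be sketched as follows. This is a minimal illustration, not part of omnitrace: it assumes only the `node` / `children` nesting and the `prefix`, `depth`, and `inclusive.entry.repr_data` fields visible in the JSON excerpt above. The `SAMPLE` data and the helper names `walk` and `find_max_inclusive` are hypothetical, and the keys that wrap the tree in a real `.tree.json` file are deliberately not assumed.

```python
#!/usr/bin/env python3
"""Sketch: recursively walk a hierarchical (tree) JSON fragment."""

import json

# Hypothetical, hand-written fragment that mimics the "node"/"children" layout.
# The values are made up for illustration and are not real measurements.
SAMPLE = json.loads("""
[
  {
    "node": {"prefix": "main", "depth": 1,
             "inclusive": {"entry": {"repr_data": 0.9}}},
    "children": [
      {"node": {"prefix": "setcoeff", "depth": 2,
                "inclusive": {"entry": {"repr_data": 0.1}}},
       "children": []},
      {"node": {"prefix": "conj_grad", "depth": 2,
                "inclusive": {"entry": {"repr_data": 0.7}}},
       "children": []}
    ]
  }
]
""")


def walk(entries, visit):
    """Depth-first traversal: call visit(node) for every node in the tree."""
    for entry in entries:
        node = entry.get("node")
        # Entries without a "node" key (e.g. elided placeholders) are skipped.
        if node is not None:
            visit(node)
        walk(entry.get("children", []), visit)


def find_max_inclusive(entries):
    """Return (prefix, depth, value) for the largest inclusive value."""
    best = None

    def visit(node):
        nonlocal best
        value = node["inclusive"]["entry"]["repr_data"]
        if best is None or value > best[2]:
            best = (node["prefix"], node["depth"], value)

    walk(entries, visit)
    return best


if __name__ == "__main__":
    print(find_max_inclusive(SAMPLE))
```

To apply this to a real file, load it with `json.load` and pass the list that holds the top-level `node` / `children` entries to `walk`; locating that list depends on the wrapping keys of the file being read.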
@@ -191,7 +191,7 @@ OMNITRACE_CRITICAL_TRACE = false
OMNITRACE_CRITICAL_TRACE_BUFFER_COUNT = 2000
OMNITRACE_CRITICAL_TRACE_COUNT = 0
OMNITRACE_CRITICAL_TRACE_DEBUG = false
OMNITRACE_CRITICAL_TRACE_NUM_THREADS = 8
OMNITRACE_THREAD_POOL_SIZE = 8
OMNITRACE_CRITICAL_TRACE_PER_ROW = 0
OMNITRACE_CRITICAL_TRACE_SERIALIZE_NAMES = false
OMNITRACE_DEBUG = false
@@ -287,7 +287,7 @@ $ omnitrace-avail -S -bd
| OMNITRACE_CRITICAL_TRACE_BUFFER_COUNT | Number of critical trace records to ... |
| OMNITRACE_CRITICAL_TRACE_COUNT | Number of critical trace to export (... |
| OMNITRACE_CRITICAL_TRACE_DEBUG | Enable debugging for critical trace |
| OMNITRACE_CRITICAL_TRACE_NUM_THREADS | Number of threads to use when genera... |
| OMNITRACE_THREAD_POOL_SIZE | Number of threads to use when genera... |
| OMNITRACE_CRITICAL_TRACE_PER_ROW | How many critical traces per row in ... |
| OMNITRACE_CRITICAL_TRACE_SERIALIZE_N... | Include names in serialization of cr... |
| OMNITRACE_DEBUG | Enable debug output |
@@ -1200,21 +1200,20 @@ OMNITRACE_USE_PERFETTO = $ENABLE
OMNITRACE_USE_TIMEMORY = $ENABLE
OMNITRACE_USE_SAMPLING = $SAMPLE
OMNITRACE_USE_PROCESS_SAMPLING = $SAMPLE
OMNITRACE_CRITICAL_TRACE = OFF
# debug
OMNITRACE_DEBUG = OFF
OMNITRACE_VERBOSE = 1
# output fields
OMNITRACE_OUTPUT_PATH = omnitrace-example-output
OMNITRACE_OUTPUT_PATH = omnitrace-output
OMNITRACE_OUTPUT_PREFIX = %tag%/
OMNITRACE_TIME_OUTPUT = OFF
OMNITRACE_USE_PID = OFF
# timemory fields
OMNITRACE_PAPI_EVENTS = PAPI_TOT_INS PAPI_FP_INS
OMNITRACE_TIMEMORY_COMPONENTS = wall_clock trip_count
OMNITRACE_TIMEMORY_COMPONENTS = wall_clock peak_rss trip_count
OMNITRACE_MEMORY_UNITS = MB
OMNITRACE_TIMING_UNITS = sec
@@ -1226,7 +1225,6 @@ OMNITRACE_SAMPLING_GPUS = $env:HIP_VISIBLE_DEVICES
# misc env variables (see metadata JSON file after run)
$env:OMNITRACE_SAMPLING_KEEP_DYNINST_SUFFIX = OFF
$env:OMNITRACE_SAMPLING_KEEP_INTERNAL = OFF
```
### Sample JSON Configuration File
@@ -26,7 +26,9 @@ get_filename_component(COMMON_BINARY_INCLUDE_DIR "${CMAKE_CURRENT_BINARY_DIR}" D
target_include_directories(
omnitrace-common-library
INTERFACE $<BUILD_INTERFACE:${COMMON_SOURCE_INCLUDE_DIR}>
INTERFACE $<BUILD_INTERFACE:${COMMON_BINARY_INCLUDE_DIR}>)
$<BUILD_INTERFACE:${COMMON_BINARY_INCLUDE_DIR}>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/external/timemory/source>
$<BUILD_INTERFACE:${PROJECT_BINARY_DIR}/external/timemory/source>)
target_compile_definitions(omnitrace-common-library
INTERFACE $<BUILD_INTERFACE:OMNITRACE_INTERNAL_BUILD=1>)
@@ -45,6 +45,28 @@
((10000 * OMNITRACE_HIP_VERSION_MAJOR) + (100 * OMNITRACE_HIP_VERSION_MINOR) + \
OMNITRACE_HIP_VERSION_PATCH)
// clang-format off
#if !defined(OMNITRACE_MAX_THREADS)
# define OMNITRACE_MAX_THREADS @OMNITRACE_MAX_THREADS@
#endif
#if !defined(OMNITRACE_MAX_UNWIND_DEPTH)
# define OMNITRACE_MAX_UNWIND_DEPTH @OMNITRACE_MAX_UNWIND_DEPTH@
#endif
// clang-format on
#if !defined(OMNITRACE_MAX_COUNTERS)
# define OMNITRACE_MAX_COUNTERS 25
#endif
#if !defined(OMNITRACE_ROCM_LOOK_AHEAD)
# define OMNITRACE_ROCM_LOOK_AHEAD 128
#endif
#if !defined(OMNITRACE_MAX_ROCM_QUEUES)
# define OMNITRACE_MAX_ROCM_QUEUES OMNITRACE_MAX_THREADS
#endif
#define OMNITRACE_ATTRIBUTE(...) __attribute__((__VA_ARGS__))
#define OMNITRACE_VISIBILITY(MODE) OMNITRACE_ATTRIBUTE(visibility(MODE))
#define OMNITRACE_PUBLIC_API OMNITRACE_VISIBILITY("default")
@@ -35,6 +35,14 @@
# error OMNITRACE_COMMON_LIBRARY_NAME must be defined
#endif
#if !defined(OMNITRACE_COMMON_LIBRARY_LOG_START)
# define OMNITRACE_COMMON_LIBRARY_LOG_START
#endif
#if !defined(OMNITRACE_COMMON_LIBRARY_LOG_END)
# define OMNITRACE_COMMON_LIBRARY_LOG_END
#endif
namespace omnitrace
{
inline namespace common
@@ -98,11 +106,13 @@ invoke(const char* _name, int _verbose, bool& _toggle, FuncT&& _func, Args... _a
if(_verbose >= 3)
{
fflush(stderr);
OMNITRACE_COMMON_LIBRARY_LOG_START
fprintf(stderr,
"[omnitrace][" OMNITRACE_COMMON_LIBRARY_NAME
"][%li][%i] %s(%s)\n",
get_thread_index(), _lk, _name,
join(QuoteStrings{}, ", ", _args...).c_str());
OMNITRACE_COMMON_LIBRARY_LOG_END
fflush(stderr);
}
return std::invoke(std::forward<FuncT>(_func), _args...);
@@ -110,20 +120,24 @@ invoke(const char* _name, int _verbose, bool& _toggle, FuncT&& _func, Args... _a
else if(_verbose >= 2)
{
fflush(stderr);
OMNITRACE_COMMON_LIBRARY_LOG_START
fprintf(stderr,
"[omnitrace][" OMNITRACE_COMMON_LIBRARY_NAME
"][%li] %s(%s) was guarded :: value = %i\n",
get_thread_index(), _name,
join(QuoteStrings{}, ", ", _args...).c_str(), _lk);
OMNITRACE_COMMON_LIBRARY_LOG_END
fflush(stderr);
}
}
else if(_verbose >= 0)
{
OMNITRACE_COMMON_LIBRARY_LOG_START
fprintf(stderr,
"[omnitrace][" OMNITRACE_COMMON_LIBRARY_NAME
"][%li] %s(%s) ignored :: null function pointer\n",
get_thread_index(), _name, join(QuoteStrings{}, ", ", _args...).c_str());
OMNITRACE_COMMON_LIBRARY_LOG_END
}
using return_type = decltype(std::invoke(std::forward<FuncT>(_func), _args...));
@@ -42,12 +42,35 @@
# endif
#endif
#if !defined(OMNITRACE_SETUP_LOG_START)
# if defined(OMNITRACE_COMMON_LIBRARY_LOG_START)
# define OMNITRACE_SETUP_LOG_START OMNITRACE_COMMON_LIBRARY_LOG_START
# elif defined(TIMEMORY_LOG_COLORS_AVAILABLE)
# define OMNITRACE_SETUP_LOG_START \
fprintf(stderr, "%s", ::tim::log::color::info());
# else
# define OMNITRACE_SETUP_LOG_START
# endif
#endif
#if !defined(OMNITRACE_SETUP_LOG_END)
# if defined(OMNITRACE_COMMON_LIBRARY_LOG_END)
# define OMNITRACE_SETUP_LOG_END OMNITRACE_COMMON_LIBRARY_LOG_END
# elif defined(TIMEMORY_LOG_COLORS_AVAILABLE)
# define OMNITRACE_SETUP_LOG_END fprintf(stderr, "%s", ::tim::log::color::end());
# else
# define OMNITRACE_SETUP_LOG_END
# endif
#endif
#define OMNITRACE_SETUP_LOG(CONDITION, ...) \
if(CONDITION) \
{ \
fflush(stderr); \
OMNITRACE_SETUP_LOG_START \
fprintf(stderr, "[omnitrace]" OMNITRACE_SETUP_LOG_NAME "[%i] ", getpid()); \
fprintf(stderr, __VA_ARGS__); \
OMNITRACE_SETUP_LOG_END \
fflush(stderr); \
}
@@ -26,13 +26,19 @@
#define OMNITRACE_COMMON_LIBRARY_NAME "dl"
#include "dl.hpp"
#include <timemory/log/color.hpp>
#define OMNITRACE_COMMON_LIBRARY_LOG_START \
fprintf(stderr, "%s", ::tim::log::color::info());
#define OMNITRACE_COMMON_LIBRARY_LOG_END fprintf(stderr, "%s", ::tim::log::color::end());
#include "common/defines.h"
#include "common/delimit.hpp"
#include "common/environment.hpp"
#include "common/invoke.hpp"
#include "common/join.hpp"
#include "common/setup.hpp"
#include "dl.hpp"
#include <cassert>
#include <gnu/libc-version.h>
@@ -45,13 +51,17 @@
*(void**) (&VARNAME) = dlsym(HANDLE, FUNCNAME); \
if(VARNAME == nullptr && _omnitrace_dl_verbose >= _warn_verbose) \
{ \
OMNITRACE_COMMON_LIBRARY_LOG_START \
fprintf(stderr, "[omnitrace][dl][pid=%i]> %s :: %s\n", getpid(), FUNCNAME, \
dlerror()); \
OMNITRACE_COMMON_LIBRARY_LOG_END \
} \
else if(_omnitrace_dl_verbose > _info_verbose) \
{ \
OMNITRACE_COMMON_LIBRARY_LOG_START \
fprintf(stderr, "[omnitrace][dl][pid=%i]> %s :: success\n", getpid(), \
FUNCNAME); \
OMNITRACE_COMMON_LIBRARY_LOG_END \
} \
}
@@ -146,12 +156,14 @@ struct OMNITRACE_HIDDEN_API indirect
{
if(_omnitrace_dl_verbose >= 1)
{
OMNITRACE_COMMON_LIBRARY_LOG_START
fprintf(stderr, "[omnitrace][dl][pid=%i] %s resolved to '%s'\n", getpid(),
::basename(_omnilib.c_str()), m_omnilib.c_str());
fprintf(stderr, "[omnitrace][dl][pid=%i] %s resolved to '%s'\n", getpid(),
::basename(_dllib.c_str()), m_dllib.c_str());
fprintf(stderr, "[omnitrace][dl][pid=%i] %s resolved to '%s'\n", getpid(),
::basename(_userlib.c_str()), m_userlib.c_str());
OMNITRACE_COMMON_LIBRARY_LOG_END
}
auto _search_paths = common::join(':', common::path::dirname(_omnilib),
@@ -173,8 +185,10 @@ struct OMNITRACE_HIDDEN_API indirect
{
if(_omnitrace_dl_verbose >= 2)
{
OMNITRACE_COMMON_LIBRARY_LOG_START
fprintf(stderr, "[omnitrace][dl][pid=%i] dlopen(\"%s\", %s) :: success\n",
getpid(), _lib.c_str(), _omnitrace_dl_dlopen_descr);
OMNITRACE_COMMON_LIBRARY_LOG_END
}
}
else
@@ -182,8 +196,10 @@ struct OMNITRACE_HIDDEN_API indirect
if(_omnitrace_dl_verbose >= 0)
{
perror("dlopen");
OMNITRACE_COMMON_LIBRARY_LOG_START
fprintf(stderr, "[omnitrace][dl][pid=%i] dlopen(\"%s\", %s) :: %s\n",
getpid(), _lib.c_str(), _omnitrace_dl_dlopen_descr, dlerror());
OMNITRACE_COMMON_LIBRARY_LOG_END
}
}
@@ -451,7 +467,9 @@ bool _omnitrace_dl_fini = (std::atexit([]() {
if(::omnitrace::dl::_omnitrace_dl_verbose >= LEVEL) \
{ \
fflush(stderr); \
OMNITRACE_COMMON_LIBRARY_LOG_START \
fprintf(stderr, "[omnitrace][" OMNITRACE_COMMON_LIBRARY_NAME "] " __VA_ARGS__); \
OMNITRACE_COMMON_LIBRARY_LOG_END \
fflush(stderr); \
}
@@ -4,6 +4,10 @@
#
# ------------------------------------------------------------------------------#
if(CMAKE_VERSION VERSION_GREATER_EQUAL 3.20)
cmake_policy(SET CMP0115 NEW)
endif()
add_library(omnitrace-interface-library INTERFACE)
add_library(omnitrace::omnitrace-interface-library ALIAS omnitrace-interface-library)
@@ -49,124 +53,12 @@ target_link_libraries(
add_library(omnitrace-object-library OBJECT)
add_library(omnitrace::omnitrace-object-library ALIAS omnitrace-object-library)
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/library/defines.hpp.in
${CMAKE_CURRENT_BINARY_DIR}/library/defines.hpp @ONLY)
target_sources(
omnitrace-object-library
PRIVATE ${CMAKE_CURRENT_LIST_DIR}/library.cpp ${CMAKE_CURRENT_LIST_DIR}/api.cpp
${CMAKE_CURRENT_LIST_DIR}/api.hpp)
set(library_sources
${CMAKE_CURRENT_LIST_DIR}/library.cpp
${CMAKE_CURRENT_LIST_DIR}/library/api.cpp
${CMAKE_CURRENT_LIST_DIR}/library/config.cpp
${CMAKE_CURRENT_LIST_DIR}/library/coverage.cpp
${CMAKE_CURRENT_LIST_DIR}/library/cpu_freq.cpp
${CMAKE_CURRENT_LIST_DIR}/library/critical_trace.cpp
${CMAKE_CURRENT_LIST_DIR}/library/debug.cpp
${CMAKE_CURRENT_LIST_DIR}/library/dynamic_library.cpp
${CMAKE_CURRENT_LIST_DIR}/library/kokkosp.cpp
${CMAKE_CURRENT_LIST_DIR}/library/gpu.cpp
${CMAKE_CURRENT_LIST_DIR}/library/mproc.cpp
${CMAKE_CURRENT_LIST_DIR}/library/ompt.cpp
${CMAKE_CURRENT_LIST_DIR}/library/perfetto.cpp
${CMAKE_CURRENT_LIST_DIR}/library/process_sampler.cpp
${CMAKE_CURRENT_LIST_DIR}/library/ptl.cpp
${CMAKE_CURRENT_LIST_DIR}/library/runtime.cpp
${CMAKE_CURRENT_LIST_DIR}/library/sampling.cpp
${CMAKE_CURRENT_LIST_DIR}/library/state.cpp
${CMAKE_CURRENT_LIST_DIR}/library/thread_data.cpp
${CMAKE_CURRENT_LIST_DIR}/library/timemory.cpp
${CMAKE_CURRENT_LIST_DIR}/library/tracing.cpp
${CMAKE_CURRENT_LIST_DIR}/library/components/backtrace.cpp
${CMAKE_CURRENT_LIST_DIR}/library/components/comm_data.cpp
${CMAKE_CURRENT_LIST_DIR}/library/components/exit_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/library/components/fork_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/library/components/mpi_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/library/components/omnitrace.cpp
${CMAKE_CURRENT_LIST_DIR}/library/components/pthread_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/library/components/pthread_create_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/library/components/pthread_mutex_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/library/components/user_region.cpp)
set(library_headers
${CMAKE_CURRENT_LIST_DIR}/library.hpp
${CMAKE_CURRENT_LIST_DIR}/library/api.hpp
${CMAKE_CURRENT_LIST_DIR}/library/config.hpp
${CMAKE_CURRENT_LIST_DIR}/library/common.hpp
${CMAKE_CURRENT_LIST_DIR}/library/coverage.hpp
${CMAKE_CURRENT_LIST_DIR}/library/cpu_freq.hpp
${CMAKE_CURRENT_LIST_DIR}/library/critical_trace.hpp
${CMAKE_CURRENT_LIST_DIR}/library/debug.hpp
${CMAKE_CURRENT_LIST_DIR}/library/dynamic_library.hpp
${CMAKE_CURRENT_LIST_DIR}/library/gpu.hpp
${CMAKE_CURRENT_LIST_DIR}/library/mproc.hpp
${CMAKE_CURRENT_LIST_DIR}/library/ompt.hpp
${CMAKE_CURRENT_LIST_DIR}/library/perfetto.hpp
${CMAKE_CURRENT_LIST_DIR}/library/process_sampler.hpp
${CMAKE_CURRENT_LIST_DIR}/library/ptl.hpp
${CMAKE_CURRENT_LIST_DIR}/library/rcclp.hpp
${CMAKE_CURRENT_LIST_DIR}/library/rocm.hpp
${CMAKE_CURRENT_LIST_DIR}/library/rocprofiler.hpp
${CMAKE_CURRENT_LIST_DIR}/library/roctracer.hpp
${CMAKE_CURRENT_LIST_DIR}/library/runtime.hpp
${CMAKE_CURRENT_LIST_DIR}/library/sampling.hpp
${CMAKE_CURRENT_LIST_DIR}/library/state.hpp
${CMAKE_CURRENT_LIST_DIR}/library/thread_data.hpp
${CMAKE_CURRENT_LIST_DIR}/library/timemory.hpp
${CMAKE_CURRENT_LIST_DIR}/library/tracing.hpp
${CMAKE_CURRENT_LIST_DIR}/library/utility.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/fwd.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/backtrace.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/category_region.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/comm_data.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/exit_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/fork_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/functors.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/mpi_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/omnitrace.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/rcclp.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/rocm_smi.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/rocprofiler.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/roctracer.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/pthread_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/pthread_create_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/pthread_mutex_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/library/components/user_region.hpp)
target_sources(omnitrace-object-library PRIVATE ${library_sources} ${library_headers})
if(OMNITRACE_USE_ROCTRACER OR OMNITRACE_USE_ROCPROFILER)
target_sources(
omnitrace-object-library
PRIVATE ${CMAKE_CURRENT_LIST_DIR}/library/rocprofiler.cpp
${CMAKE_CURRENT_LIST_DIR}/library/rocm.cpp
${CMAKE_CURRENT_LIST_DIR}/library/components/rocprofiler.cpp)
endif()
if(OMNITRACE_USE_ROCTRACER)
target_sources(
omnitrace-object-library
PRIVATE ${CMAKE_CURRENT_LIST_DIR}/library/components/roctracer.cpp
${CMAKE_CURRENT_LIST_DIR}/library/roctracer.cpp)
endif()
if(OMNITRACE_USE_RCCL)
target_sources(
omnitrace-object-library
PRIVATE ${CMAKE_CURRENT_LIST_DIR}/library/components/rcclp.cpp
${CMAKE_CURRENT_LIST_DIR}/library/rcclp.cpp)
endif()
if(OMNITRACE_USE_ROCPROFILER)
target_sources(
omnitrace-object-library
PRIVATE ${CMAKE_CURRENT_LIST_DIR}/library/rocprofiler.cpp
${CMAKE_CURRENT_LIST_DIR}/library/rocprofiler.hpp
${CMAKE_CURRENT_LIST_DIR}/library/rocprofiler/hsa_rsrc_factory.hpp
${CMAKE_CURRENT_LIST_DIR}/library/rocprofiler/hsa_rsrc_factory.cpp)
endif()
if(OMNITRACE_USE_ROCM_SMI)
target_sources(omnitrace-object-library
PRIVATE ${CMAKE_CURRENT_LIST_DIR}/library/components/rocm_smi.cpp)
endif()
add_subdirectory(library)
target_link_libraries(omnitrace-object-library
PRIVATE omnitrace::omnitrace-interface-library)
@@ -20,7 +20,7 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include "library/api.hpp"
#include "api.hpp"
#include "library/debug.hpp"
#include <exception>
@@ -20,17 +20,14 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include "library.hpp"
#include "api.hpp"
#include "common/setup.hpp"
#include "library/api.hpp"
#include "library/components/category_region.hpp"
#include "library/components/exit_gotcha.hpp"
#include "library/components/fork_gotcha.hpp"
#include "library/components/functors.hpp"
#include "library/components/fwd.hpp"
#include "library/components/mpi_gotcha.hpp"
#include "library/components/pthread_create_gotcha.hpp"
#include "library/components/pthread_gotcha.hpp"
#include "library/components/pthread_mutex_gotcha.hpp"
#include "library/components/rocprofiler.hpp"
#include "library/config.hpp"
#include "library/coverage.hpp"
@@ -43,11 +40,17 @@
#include "library/ptl.hpp"
#include "library/rcclp.hpp"
#include "library/rocprofiler.hpp"
#include "library/runtime.hpp"
#include "library/sampling.hpp"
#include "library/thread_data.hpp"
#include "library/thread_info.hpp"
#include "library/timemory.hpp"
#include "library/tracing.hpp"
#include <timemory/hash/types.hpp>
#include <timemory/operations/types/file_output_message.hpp>
#include <timemory/sampling/signals.hpp>
#include <timemory/utility/backtrace.hpp>
#include <timemory/utility/procfs/maps.hpp>
#include <atomic>
@@ -59,29 +62,21 @@ using namespace omnitrace;
//======================================================================================//
namespace
{
struct omni_regions
{};
struct user_regions
{};
} // namespace
using omni_functors = omnitrace::component::functors<omni_regions>;
using user_functors = omnitrace::component::functors<user_regions>;
TIMEMORY_INVOKE_PREINIT(omni_functors)
TIMEMORY_INVOKE_PREINIT(user_functors)
//======================================================================================//
namespace
{
auto
ensure_finalization(bool _static_init = false)
{
(void) threading::get_id();
(void) utility::get_thread_index();
const auto& _info = thread_info::init();
auto _tid = _info->index_data;
OMNITRACE_CI_THROW(_tid->internal_value != threading::get_id(),
"Error! internal tid != %li :: %li", threading::get_id(),
_tid->internal_value);
OMNITRACE_CI_THROW(_tid->system_value != threading::get_sys_tid(),
"Error! system tid != %li :: %li", threading::get_sys_tid(),
_tid->system_value);
if(!get_env("OMNITRACE_COLORIZED_LOG", true)) tim::log::colorized() = false;
if(!_static_init)
{
@@ -119,105 +114,13 @@ using Phase = critical_trace::Phase;
extern "C" void
omnitrace_push_trace_hidden(const char* name)
{
++tracing::push_count();
// unconditionally return if finalized
if(get_state() == State::Finalized)
{
OMNITRACE_CONDITIONAL_BASIC_PRINT(
tracing::debug_push, "omnitrace_push_trace(%s) called during finalization\n",
name);
return;
}
OMNITRACE_CONDITIONAL_BASIC_PRINT(tracing::debug_push, "omnitrace_push_trace(%s)\n",
name);
// the expectation here is that if the state is not active then the call
// to omnitrace_init_tooling_hidden will activate all the appropriate
// tooling one time and as it exits set it to active and return true.
if(get_state() != State::Active && !omnitrace_init_tooling_hidden())
{
static auto _debug = get_debug_env() || get_debug_init();
OMNITRACE_CONDITIONAL_BASIC_PRINT(
_debug, "omnitrace_push_trace(%s) ignored :: not active. state = %s\n", name,
std::to_string(get_state()).c_str());
return;
}
OMNITRACE_DEBUG("omnitrace_push_trace(%s)\n", name);
static auto _sample_rate = std::max<size_t>(get_instrumentation_interval(), 1);
static thread_local size_t _sample_idx = 0;
auto& _interval = tracing::get_interval_data();
auto _enabled = (_sample_idx++ % _sample_rate == 0);
_interval->emplace_back(_enabled);
if(_enabled) omni_functors::start(name);
if(get_use_critical_trace())
{
uint64_t _cid = 0;
uint64_t _parent_cid = 0;
uint32_t _depth = 0;
std::tie(_cid, _parent_cid, _depth) = create_cpu_cid_entry();
auto _ts = comp::wall_clock::record();
add_critical_trace<Device::CPU, Phase::BEGIN>(
threading::get_id(), _cid, 0, _parent_cid, _ts, 0, 0, 0,
critical_trace::add_hash_id(name), _depth);
}
component::category_region<category::host>::start(name);
}
//======================================================================================//
///
///
///
//======================================================================================//
extern "C" void
omnitrace_pop_trace_hidden(const char* name)
{
++tracing::pop_count();
OMNITRACE_CONDITIONAL_BASIC_PRINT(tracing::debug_pop, "omnitrace_pop_trace(%s)\n",
name);
// only execute when active
if(get_state() == State::Active)
{
OMNITRACE_DEBUG("omnitrace_pop_trace(%s)\n", name);
auto& _interval_data = tracing::get_interval_data();
if(!_interval_data->empty())
{
if(_interval_data->back()) omni_functors::stop(name);
_interval_data->pop_back();
}
if(get_use_critical_trace())
{
if(get_cpu_cid_stack() && !get_cpu_cid_stack()->empty())
{
auto _cid = get_cpu_cid_stack()->back();
if(get_cpu_cid_parents()->find(_cid) != get_cpu_cid_parents()->end())
{
uint64_t _parent_cid = 0;
uint32_t _depth = 0;
auto _ts = comp::wall_clock::record();
std::tie(_parent_cid, _depth) = get_cpu_cid_parents()->at(_cid);
add_critical_trace<Device::CPU, Phase::END>(
threading::get_id(), _cid, 0, _parent_cid, _ts, _ts, 0, 0,
critical_trace::add_hash_id(name), _depth);
}
}
}
}
else
{
static auto _debug = get_debug_env();
OMNITRACE_CONDITIONAL_BASIC_PRINT(
_debug, "omnitrace_pop_trace(%s) ignored :: state = %s\n", name,
std::to_string(get_state()).c_str());
}
component::category_region<category::host>::stop(name);
}
//======================================================================================//
@@ -229,52 +132,13 @@ omnitrace_pop_trace_hidden(const char* name)
extern "C" void
omnitrace_push_region_hidden(const char* name)
{
// unconditionally return if finalized
if(get_state() == State::Finalized)
{
OMNITRACE_CONDITIONAL_BASIC_PRINT(
tracing::debug_user, "omnitrace_push_region(%s) called during finalization\n",
name);
return;
}
OMNITRACE_CONDITIONAL_BASIC_PRINT(tracing::debug_push, "omnitrace_push_region(%s)\n",
name);
// the expectation here is that if the state is not active then the call
// to omnitrace_init_tooling_hidden will activate all the appropriate
// tooling one time and as it exits set it to active and return true.
if(get_state() != State::Active && !omnitrace_init_tooling_hidden())
{
static auto _debug = get_debug_env() || get_debug_init();
OMNITRACE_CONDITIONAL_BASIC_PRINT(
_debug, "omnitrace_push_region(%s) ignored :: not active. state = %s\n", name,
std::to_string(get_state()).c_str());
return;
}
OMNITRACE_DEBUG("omnitrace_push_region(%s)\n", name);
user_functors::start(name);
component::category_region<category::user>::start(name);
}
//======================================================================================//
extern "C" void
omnitrace_pop_region_hidden(const char* name)
{
// only execute when active
if(get_state() == State::Active)
{
OMNITRACE_DEBUG("omnitrace_pop_region(%s)\n", name);
user_functors::stop(name);
}
else
{
static auto _debug = get_debug_env();
OMNITRACE_CONDITIONAL_BASIC_PRINT(
_debug, "omnitrace_pop_region(%s) ignored :: state = %s\n", name,
std::to_string(get_state()).c_str());
}
component::category_region<category::user>::stop(name);
}
//======================================================================================//
@@ -393,7 +257,7 @@ omnitrace_init_library_hidden()
"glibc's backtrace() occurs...\n");
{
std::stringstream _ss{};
tim::print_backtrace<16>(_ss);
timemory_print_backtrace<16>(_ss);
(void) _ss;
}
@@ -471,7 +335,6 @@ omnitrace_init_tooling_hidden()
auto _dtor = scope::destructor{ []() {
// if set to finalized, don't continue
if(get_state() > State::Active) return;
if(config::get_trace_thread_locks()) pthread_mutex_gotcha::validate();
if(get_use_process_sampling())
{
pthread_gotcha::push_enable_sampling_on_child_threads(false);
@@ -527,7 +390,7 @@ omnitrace_init_tooling_hidden()
}
else
{
tim::trait::runtime_enabled<api::omnitrace>::set(false);
tim::trait::runtime_enabled<project::omnitrace>::set(false);
}
}
@@ -581,67 +444,6 @@ omnitrace_init_tooling_hidden()
perfetto::TrackEvent::Register();
}
auto _exe = get_exe_name();
if(get_use_perfetto() && get_use_timemory())
{
omni_functors::configure(
[](const char* name) {
tracing::thread_init();
tracing::push_perfetto(category::host{}, name);
tracing::push_timemory(name);
tracing::thread_init_sampling();
},
[](const char* name) {
tracing::pop_timemory(name);
tracing::pop_perfetto(category::host{}, name);
});
user_functors::configure(
[](const char* name) {
tracing::thread_init();
tracing::push_perfetto(category::user{}, name);
tracing::push_timemory(name);
},
[](const char* name) {
tracing::pop_timemory(name);
tracing::pop_perfetto(category::user{}, name);
});
}
else if(get_use_perfetto())
{
omni_functors::configure(
[](const char* name) {
tracing::thread_init();
tracing::push_perfetto(category::host{}, name);
tracing::thread_init_sampling();
},
[](const char* name) { tracing::pop_perfetto(category::host{}, name); });
user_functors::configure(
[](const char* name) {
tracing::thread_init();
tracing::push_perfetto(category::user{}, name);
tracing::thread_init_sampling();
},
[](const char* name) { tracing::pop_perfetto(category::user{}, name); });
}
else if(get_use_timemory())
{
omni_functors::configure(
[](const char* name) {
tracing::thread_init();
tracing::push_timemory(name);
tracing::thread_init_sampling();
},
[](const char* name) { tracing::pop_timemory(name); });
user_functors::configure(
[](const char* name) {
tracing::thread_init();
tracing::push_timemory(name);
tracing::thread_init_sampling();
},
[](const char* name) { tracing::pop_timemory(name); });
}
if(get_use_ompt())
{
OMNITRACE_VERBOSE_F(1, "Setting up OMPT...\n");
@@ -683,8 +485,6 @@ omnitrace_init_tooling_hidden()
if(dmp::rank() == 0 && get_verbose() >= 0) fprintf(stderr, "\n");
pthread_create_gotcha::get_execution_time()->first = comp::wall_clock::record();
return true;
}
@@ -761,8 +561,6 @@ omnitrace_init_hidden(const char* _mode, bool _is_binary_rewrite, const char* _a
{
get_gotcha_bundle()->start();
}
pthread_create_gotcha::get_execution_time()->first = comp::wall_clock::record();
}
//======================================================================================//
@@ -784,7 +582,10 @@ omnitrace_finalize_hidden(void)
}
OMNITRACE_VERBOSE_F(0, "finalizing...\n");
pthread_create_gotcha::get_execution_time()->second = comp::wall_clock::record();
thread_info::set_stop(comp::wall_clock::record());
tim::sampling::block_signals(get_sampling_signals(),
tim::sampling::sigmask_scope::process);
// some functions called during finalization may alter the push/pop count so we need
// to save them here
@@ -803,9 +604,6 @@ omnitrace_finalize_hidden(void)
set_state(State::Finalized);
omni_functors::configure([](const char*) {}, [](const char*) {});
user_functors::configure([](const char*) {}, [](const char*) {});
pthread_gotcha::push_enable_sampling_on_child_threads(false);
pthread_gotcha::set_sampling_on_all_future_threads(false);
@@ -847,6 +645,13 @@ omnitrace_finalize_hidden(void)
}
}
// stop the main bundle which shuts down the pthread gotchas
if(get_main_bundle())
{
OMNITRACE_DEBUG_F("Stopping main bundle...\n");
get_main_bundle()->stop();
}
if(get_use_rcclp())
{
OMNITRACE_VERBOSE_F(1, "Shutting down RCCLP...\n");
@@ -862,14 +667,22 @@ omnitrace_finalize_hidden(void)
OMNITRACE_DEBUG_F("Stopping and destroying instrumentation bundles...\n");
for(size_t i = 0; i < max_supported_threads; ++i)
{
auto& itr = instrumentation_bundles::instances().at(i);
auto& itr = instrumentation_bundles::instances().at(i);
const auto& _info = thread_info::get(i, InternalTID);
while(!itr.bundles.empty())
{
OMNITRACE_VERBOSE_F(1,
int _lvl = 1;
if(_info->is_offset)
{
++_pop_count;
_lvl = 4;
}
OMNITRACE_VERBOSE_F(_lvl,
"Warning! instrumentation bundle on thread %zu (TID=%li) "
"with label '%s' was not stopped.\n",
i, itr.bundles.back()->tid(),
itr.bundles.back()->key().c_str());
itr.bundles.back()->stop();
itr.bundles.back()->pop();
itr.allocator.destroy(itr.bundles.back());
@@ -878,25 +691,15 @@ omnitrace_finalize_hidden(void)
}
}
if(get_use_sampling())
{
OMNITRACE_VERBOSE_F(1, "Shutting down sampling...\n");
sampling::shutdown();
sampling::block_signals();
}
// stop the gotcha bundle
if(get_gotcha_bundle())
{
OMNITRACE_VERBOSE_F(1, "Shutting down miscellaneous gotchas...\n");
get_gotcha_bundle()->stop();
get_gotcha_bundle().reset();
mpi_gotcha::shutdown();
component::mpi_gotcha::shutdown();
}
OMNITRACE_VERBOSE_F(1, "Shutting down pthread gotchas...\n");
pthread_gotcha::shutdown();
if(get_use_process_sampling())
{
OMNITRACE_VERBOSE_F(1, "Shutting down background sampler...\n");
@@ -921,13 +724,16 @@ omnitrace_finalize_hidden(void)
rocprofiler::rocm_cleanup();
}
if(dmp::rank() == 0) fprintf(stderr, "\n");
if(get_use_sampling())
{
OMNITRACE_VERBOSE_F(1, "Shutting down sampling...\n");
sampling::shutdown();
}
OMNITRACE_DEBUG_F("Stopping main bundle...\n");
// stop the main bundle and report the high-level metrics
OMNITRACE_VERBOSE_F(3, "Reporting the process- and thread-level metrics...\n");
// report the high-level metrics for the process
if(get_main_bundle())
{
get_main_bundle()->stop();
std::string _msg = JOIN("", *get_main_bundle());
auto _pos = _msg.find(">>> ");
if(_pos != std::string::npos) _msg = _msg.substr(_pos + 5);
@@ -940,7 +746,6 @@ omnitrace_finalize_hidden(void)
// if they are still running (e.g. thread-pool still alive), the
// thread-specific data will be wrong if try to stop them from
// the main thread.
OMNITRACE_VERBOSE_F(3, "Destroying thread bundle data...\n");
for(auto& itr : thread_data<omnitrace_thread_bundle_t>::instances())
{
if(itr && itr->get<comp::wall_clock>() &&
@@ -957,19 +762,16 @@ omnitrace_finalize_hidden(void)
if(get_use_sampling())
{
OMNITRACE_VERBOSE_F(1, "Post-processing the sampling backtraces...\n");
for(size_t i = 0; i < max_supported_threads; ++i)
{
sampling::backtrace::post_process(i);
sampling::get_sampler(i).reset();
}
sampling::post_process();
}
if(get_use_critical_trace() || (get_use_rocm_smi() && get_use_roctracer()))
{
OMNITRACE_VERBOSE_F(1, "Generating the critical trace...\n");
// increase the thread-pool size
tasking::critical_trace::get_thread_pool().initialize_threadpool(
get_critical_trace_num_threads());
// (potentially) increase the thread-pool size since application
// shouldn't be using threads during finalization
tasking::initialize_threadpool(std::min<uint64_t>(
{ std::thread::hardware_concurrency(), get_thread_pool_size(), 8 }));
for(size_t i = 0; i < max_supported_threads; ++i)
{
@@ -1021,8 +823,9 @@ omnitrace_finalize_hidden(void)
if(get_verbose() >= 0) fprintf(stderr, "\n");
if(get_verbose() >= 0 || get_debug())
fprintf(stderr, "[%s][%s]|%i> Flushing perfetto...\n", TIMEMORY_PROJECT_NAME,
OMNITRACE_FUNCTION, dmp::rank());
fprintf(stderr, "%s[%s][%s]|%i> Flushing perfetto...%s\n",
tim::log::color::info(), TIMEMORY_PROJECT_NAME, OMNITRACE_FUNCTION,
dmp::rank(), tim::log::color::end());
// Make sure the last event is closed for this example.
perfetto::TrackEvent::Flush();
@@ -1066,35 +869,33 @@ omnitrace_finalize_hidden(void)
if(!trace_data.empty())
{
operation::file_output_message<tim::project::omnitrace> _fom{};
// Write the trace into a file.
if(get_verbose() >= 0)
fprintf(stderr,
"[%s][%s]|%i> Outputting '%s' (%.2f KB / %.2f MB / %.2f GB)... ",
TIMEMORY_PROJECT_NAME, OMNITRACE_FUNCTION, dmp::rank(),
get_perfetto_output_filename().c_str(),
static_cast<double>(trace_data.size()) / units::KB,
static_cast<double>(trace_data.size()) / units::MB,
static_cast<double>(trace_data.size()) / units::GB);
_fom(get_perfetto_output_filename(), std::string{ "perfetto" },
" (%.2f KB / %.2f MB / %.2f GB)... ",
static_cast<double>(trace_data.size()) / units::KB,
static_cast<double>(trace_data.size()) / units::MB,
static_cast<double>(trace_data.size()) / units::GB);
std::ofstream ofs{};
if(!tim::filepath::open(ofs, get_perfetto_output_filename(),
std::ios::out | std::ios::binary))
{
OMNITRACE_VERBOSE_F(0, "Error opening '%s'...\n",
get_perfetto_output_filename().c_str());
_fom.append("Error opening '%s'...",
get_perfetto_output_filename().c_str());
_perfetto_output_error = true;
}
else
{
// Write the trace into a file.
ofs.write(&trace_data[0], trace_data.size());
if(get_verbose() >= 0) fprintf(stderr, "Done\n");
if(get_verbose() >= 0) _fom.append("%s", "Done"); // NOLINT
auto _manager = tim::manager::instance();
if(_manager)
_manager->add_file_output("protobuf", "perfetto",
get_perfetto_output_filename());
}
ofs.close();
if(get_verbose() >= 0) fprintf(stderr, "\n");
}
else if(dmp::rank() == 0)
{
@@ -1,128 +0,0 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#pragma once
// this always needs to be included first
// clang-format off
#include "library/perfetto.hpp"
// clang-format on
#include "library/timemory.hpp"
#include "library/components/roctracer.hpp"
#include "library/api.hpp"
#include "library/components/fork_gotcha.hpp"
#include "library/components/mpi_gotcha.hpp"
#include "library/api.hpp"
#include "library/common.hpp"
#include "library/state.hpp"
#include "library/config.hpp"
#include "library/thread_data.hpp"
#include "library/ptl.hpp"
#include "library/debug.hpp"
#include "library/critical_trace.hpp"
#include "library/runtime.hpp"
#include <timemory/backends/process.hpp>
#include <timemory/macros/language.hpp>
#include <timemory/utility/utility.hpp>
#include <mutex>
namespace omnitrace
{
template <critical_trace::Device DevID, critical_trace::Phase PhaseID,
bool UpdateStack = true>
inline void
add_critical_trace(int32_t _targ_tid, size_t _cpu_cid, size_t _gpu_cid,
size_t _parent_cid, int64_t _ts_beg, int64_t _ts_val, int32_t _devid,
uintptr_t _queue, size_t _hash, uint32_t _depth, uint16_t _prio = 0)
{
// clang-format off
// these are used to create unique type mutexes
struct critical_insert {};
struct cpu_cid_stack {};
// clang-format on
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
using tim::type_mutex;
using auto_lock_t = tim::auto_lock_t;
static constexpr auto num_mutexes = max_supported_threads;
static auto _update_freq = critical_trace::get_update_frequency();
static auto _pid = process::get_id();
auto _self_tid = threading::get_id();
if constexpr(PhaseID != critical_trace::Phase::NONE)
{
auto& _self_mtx =
type_mutex<critical_insert, api::omnitrace, num_mutexes>(_self_tid);
auto_lock_t _self_lk{ _self_mtx, std::defer_lock };
// unique lock per thread
if(!_self_lk.owns_lock()) _self_lk.lock();
auto& _critical_trace = critical_trace::get(_self_tid);
_critical_trace->emplace_back(critical_trace::entry{
DevID, PhaseID, _prio, _depth, _devid, _pid, _targ_tid, _cpu_cid, _gpu_cid,
_parent_cid, _ts_beg, _ts_val, _queue, _hash });
}
if constexpr(UpdateStack)
{
auto& _self_mtx = get_cpu_cid_stack_lock(_self_tid);
auto& _targ_mtx = get_cpu_cid_stack_lock(_targ_tid);
auto_lock_t _self_lk{ _self_mtx, std::defer_lock };
auto_lock_t _targ_lk{ _targ_mtx, std::defer_lock };
// unique lock per thread
auto _lock = [&_self_lk, &_targ_lk, _self_tid, _targ_tid]() {
if(!_self_lk.owns_lock() && _self_tid != _targ_tid) _self_lk.lock();
if(!_targ_lk.owns_lock()) _targ_lk.lock();
};
if constexpr(PhaseID == critical_trace::Phase::NONE)
{
_lock();
get_cpu_cid_stack(_targ_tid)->emplace_back(_cpu_cid);
}
else if constexpr(PhaseID == critical_trace::Phase::BEGIN)
{
_lock();
get_cpu_cid_stack(_targ_tid)->emplace_back(_cpu_cid);
}
else if constexpr(PhaseID == critical_trace::Phase::END)
{
_lock();
get_cpu_cid_stack(_targ_tid)->pop_back();
if(_gpu_cid == 0 && _cpu_cid % _update_freq == (_update_freq - 1))
critical_trace::update(_targ_tid);
}
tim::consume_parameters(_lock);
}
tim::consume_parameters(_pid, _targ_tid, _cpu_cid, _gpu_cid, _parent_cid, _ts_beg,
_ts_val, _devid, _queue, _hash, _depth, _prio, num_mutexes);
}
} // namespace omnitrace
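
The removed `add_critical_trace` above guards the per-thread call-stack with one mutex per thread ID and takes the "self" lock only when it differs from the "target" lock. A minimal sketch of that idea is below, under stated assumptions: `sketch::max_threads`, `sketch::stack_mutex`, `sketch::cid_stack`, and `sketch::push_cid` are hypothetical names, and `std::lock` is swapped in for the sequential locking in the original so that the two mutexes are acquired without a lock-order deadlock.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

namespace sketch
{
constexpr std::size_t max_threads = 8;  // hypothetical upper bound

// One mutex per thread ID (stand-in for get_cpu_cid_stack_lock).
inline std::mutex&
stack_mutex(std::size_t tid)
{
    static std::array<std::mutex, max_threads> _m{};
    return _m.at(tid);
}

// One correlation-ID stack per thread ID (stand-in for get_cpu_cid_stack).
inline std::vector<std::size_t>&
cid_stack(std::size_t tid)
{
    static std::array<std::vector<std::size_t>, max_threads> _s{};
    return _s.at(tid);
}

// Push a correlation ID onto the target thread's stack while holding both the
// calling thread's lock and the target thread's lock.
inline void
push_cid(std::size_t self_tid, std::size_t targ_tid, std::size_t cid)
{
    std::unique_lock<std::mutex> self_lk{ stack_mutex(self_tid), std::defer_lock };
    std::unique_lock<std::mutex> targ_lk{ stack_mutex(targ_tid), std::defer_lock };
    if(self_tid != targ_tid)
        std::lock(self_lk, targ_lk);  // acquire both without lock-order deadlock
    else
        targ_lk.lock();  // same mutex: lock it exactly once
    cid_stack(targ_tid).emplace_back(cid);
}
}  // namespace sketch
```

Constructing the locks with `std::defer_lock` is what allows the decision about which mutexes to take to be made after the lock objects exist, which is the same role it plays in the diff.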
@@ -0,0 +1,87 @@
#
configure_file(${CMAKE_CURRENT_SOURCE_DIR}/defines.hpp.in
${CMAKE_CURRENT_BINARY_DIR}/defines.hpp @ONLY)
set(library_sources
${CMAKE_CURRENT_LIST_DIR}/config.cpp
${CMAKE_CURRENT_LIST_DIR}/coverage.cpp
${CMAKE_CURRENT_LIST_DIR}/cpu_freq.cpp
${CMAKE_CURRENT_LIST_DIR}/critical_trace.cpp
${CMAKE_CURRENT_LIST_DIR}/debug.cpp
${CMAKE_CURRENT_LIST_DIR}/dynamic_library.cpp
${CMAKE_CURRENT_LIST_DIR}/kokkosp.cpp
${CMAKE_CURRENT_LIST_DIR}/gpu.cpp
${CMAKE_CURRENT_LIST_DIR}/mproc.cpp
${CMAKE_CURRENT_LIST_DIR}/ompt.cpp
${CMAKE_CURRENT_LIST_DIR}/perfetto.cpp
${CMAKE_CURRENT_LIST_DIR}/process_sampler.cpp
${CMAKE_CURRENT_LIST_DIR}/ptl.cpp
${CMAKE_CURRENT_LIST_DIR}/runtime.cpp
${CMAKE_CURRENT_LIST_DIR}/sampling.cpp
${CMAKE_CURRENT_LIST_DIR}/state.cpp
${CMAKE_CURRENT_LIST_DIR}/thread_data.cpp
${CMAKE_CURRENT_LIST_DIR}/thread_info.cpp
${CMAKE_CURRENT_LIST_DIR}/timemory.cpp
${CMAKE_CURRENT_LIST_DIR}/tracing.cpp)
set(library_headers
${CMAKE_CURRENT_LIST_DIR}/categories.hpp
${CMAKE_CURRENT_LIST_DIR}/config.hpp
${CMAKE_CURRENT_LIST_DIR}/common.hpp
${CMAKE_CURRENT_LIST_DIR}/concepts.hpp
${CMAKE_CURRENT_LIST_DIR}/coverage.hpp
${CMAKE_CURRENT_LIST_DIR}/cpu_freq.hpp
${CMAKE_CURRENT_LIST_DIR}/critical_trace.hpp
${CMAKE_CURRENT_LIST_DIR}/debug.hpp
${CMAKE_CURRENT_LIST_DIR}/dynamic_library.hpp
${CMAKE_CURRENT_LIST_DIR}/gpu.hpp
${CMAKE_CURRENT_LIST_DIR}/mproc.hpp
${CMAKE_CURRENT_LIST_DIR}/ompt.hpp
${CMAKE_CURRENT_LIST_DIR}/perfetto.hpp
${CMAKE_CURRENT_LIST_DIR}/process_sampler.hpp
${CMAKE_CURRENT_LIST_DIR}/ptl.hpp
${CMAKE_CURRENT_LIST_DIR}/rcclp.hpp
${CMAKE_CURRENT_LIST_DIR}/rocm.hpp
${CMAKE_CURRENT_LIST_DIR}/rocm_smi.hpp
${CMAKE_CURRENT_LIST_DIR}/rocprofiler.hpp
${CMAKE_CURRENT_LIST_DIR}/roctracer.hpp
${CMAKE_CURRENT_LIST_DIR}/runtime.hpp
${CMAKE_CURRENT_LIST_DIR}/sampling.hpp
${CMAKE_CURRENT_LIST_DIR}/state.hpp
${CMAKE_CURRENT_LIST_DIR}/thread_data.hpp
${CMAKE_CURRENT_LIST_DIR}/thread_info.hpp
${CMAKE_CURRENT_LIST_DIR}/timemory.hpp
${CMAKE_CURRENT_LIST_DIR}/tracing.hpp
${CMAKE_CURRENT_LIST_DIR}/utility.hpp)
target_sources(omnitrace-object-library PRIVATE ${library_sources} ${library_headers}
${CMAKE_CURRENT_BINARY_DIR}/defines.hpp)
if(OMNITRACE_USE_ROCTRACER OR OMNITRACE_USE_ROCPROFILER)
target_sources(
omnitrace-object-library PRIVATE ${CMAKE_CURRENT_LIST_DIR}/rocprofiler.cpp
${CMAKE_CURRENT_LIST_DIR}/rocm.cpp)
endif()
if(OMNITRACE_USE_ROCTRACER)
target_sources(omnitrace-object-library
PRIVATE ${CMAKE_CURRENT_LIST_DIR}/roctracer.cpp)
endif()
if(OMNITRACE_USE_RCCL)
target_sources(omnitrace-object-library PRIVATE ${CMAKE_CURRENT_LIST_DIR}/rcclp.cpp)
endif()
if(OMNITRACE_USE_ROCPROFILER)
target_sources(
omnitrace-object-library PRIVATE ${CMAKE_CURRENT_LIST_DIR}/rocprofiler.cpp
${CMAKE_CURRENT_LIST_DIR}/rocprofiler.hpp)
endif()
if(OMNITRACE_USE_ROCM_SMI)
target_sources(omnitrace-object-library
PRIVATE ${CMAKE_CURRENT_LIST_DIR}/rocm_smi.cpp)
endif()
add_subdirectory(components)
add_subdirectory(rocprofiler)
@@ -0,0 +1,168 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#pragma once
#include "common/join.hpp"
#include "library/defines.hpp"
#if defined(TIMEMORY_PERFETTO_CATEGORIES)
# error "TIMEMORY_PERFETTO_CATEGORIES is already defined. Please include \"" __FILE__ "\" before including any timemory files"
#endif
#include <timemory/api.hpp>
#include <timemory/api/macros.hpp>
#include <timemory/mpl/macros.hpp>
#include <timemory/mpl/types.hpp>
#define OMNITRACE_DEFINE_NAME_TRAIT(NAME, ...) \
namespace tim \
{ \
namespace trait \
{ \
template <> \
struct perfetto_category<__VA_ARGS__> \
{ \
static constexpr auto value = NAME; \
}; \
} \
}
#define OMNITRACE_DECLARE_CATEGORY(NS, VALUE, NAME) \
TIMEMORY_DECLARE_NS_API(NS, VALUE) \
OMNITRACE_DEFINE_NAME_TRAIT(NAME, NS::VALUE)
#define OMNITRACE_DEFINE_CATEGORY(NS, VALUE, NAME) \
TIMEMORY_DEFINE_NS_API(NS, VALUE) \
OMNITRACE_DEFINE_NAME_TRAIT(NAME, NS::VALUE)
// these are defined by omnitrace
OMNITRACE_DEFINE_CATEGORY(project, omnitrace, "omnitrace")
OMNITRACE_DEFINE_CATEGORY(category, host, "host")
OMNITRACE_DEFINE_CATEGORY(category, user, "user")
OMNITRACE_DEFINE_CATEGORY(category, device, "device")
OMNITRACE_DEFINE_CATEGORY(category, device_hip, "device_hip")
OMNITRACE_DEFINE_CATEGORY(category, device_hsa, "device_hsa")
OMNITRACE_DEFINE_CATEGORY(category, rocm_hip, "rocm_hip")
OMNITRACE_DEFINE_CATEGORY(category, rocm_hsa, "rocm_hsa")
OMNITRACE_DEFINE_CATEGORY(category, rocm_roctx, "rocm_roctx")
OMNITRACE_DEFINE_CATEGORY(category, rocm_smi, "rocm_smi")
OMNITRACE_DEFINE_CATEGORY(category, rocm_rccl, "rccl")
OMNITRACE_DEFINE_CATEGORY(category, roctracer, "roctracer")
OMNITRACE_DEFINE_CATEGORY(category, rocprofiler, "rocprofiler")
OMNITRACE_DEFINE_CATEGORY(category, pthread, "pthread")
OMNITRACE_DEFINE_CATEGORY(category, kokkos, "kokkos")
OMNITRACE_DEFINE_CATEGORY(category, mpi, "mpi")
OMNITRACE_DEFINE_CATEGORY(category, ompt, "ompt")
OMNITRACE_DEFINE_CATEGORY(category, process_sampling, "process_sampling")
OMNITRACE_DEFINE_CATEGORY(category, critical_trace, "critical-trace")
OMNITRACE_DEFINE_CATEGORY(category, host_critical_trace, "host-critical-trace")
OMNITRACE_DEFINE_CATEGORY(category, device_critical_trace, "device-critical-trace")
OMNITRACE_DEFINE_CATEGORY(cpu_freq, cpu_page, "process_page_fault")
OMNITRACE_DEFINE_CATEGORY(cpu_freq, cpu_virt, "process_virtual_memory")
OMNITRACE_DEFINE_CATEGORY(cpu_freq, cpu_peak, "process_memory_hwm")
OMNITRACE_DEFINE_CATEGORY(cpu_freq, cpu_context_switch, "process_context_switch")
OMNITRACE_DEFINE_CATEGORY(cpu_freq, cpu_page_fault, "process_page_fault")
OMNITRACE_DEFINE_CATEGORY(cpu_freq, cpu_user_mode_time, "process_user_cpu_time")
OMNITRACE_DEFINE_CATEGORY(cpu_freq, cpu_kernel_mode_time, "process_kernel_cpu_time")
OMNITRACE_DECLARE_CATEGORY(category, sampling, "sampling")
namespace tim
{
namespace trait
{
template <typename... Tp>
using name = perfetto_category<Tp...>;
}
} // namespace tim
#define OMNITRACE_PERFETTO_CATEGORIES \
perfetto::Category(tim::trait::name<tim::category::host>::value) \
.SetDescription("Host-side function tracing"), \
perfetto::Category("user").SetDescription("User-defined regions"), \
perfetto::Category("sampling").SetDescription("Host-side function sampling"), \
perfetto::Category("device_hip") \
.SetDescription("Device-side functions submitted via HIP API"), \
perfetto::Category("device_hsa") \
.SetDescription("Device-side functions submitted via HSA API"), \
perfetto::Category("rocm_hip").SetDescription("Host-side HIP functions"), \
perfetto::Category("rocm_hsa").SetDescription("Host-side HSA functions"), \
perfetto::Category("rocm_roctx").SetDescription("Host-side ROCTX labels"), \
perfetto::Category("device_busy") \
.SetDescription("Busy percentage of a GPU device"), \
perfetto::Category("device_temp") \
.SetDescription("Temperature of GPU device in degC"), \
perfetto::Category("device_power") \
.SetDescription("Power consumption of GPU device in watts"), \
perfetto::Category("device_memory_usage") \
.SetDescription("Memory usage of GPU device in MB"), \
perfetto::Category("thread_peak_memory") \
.SetDescription( \
"Peak memory usage on thread in MB (derived from sampling)"), \
perfetto::Category("thread_context_switch") \
.SetDescription("Context switches on thread (derived from sampling)"), \
perfetto::Category("thread_page_fault") \
.SetDescription("Memory page faults on thread (derived from sampling)"), \
perfetto::Category("hardware_counter") \
.SetDescription("Hardware counter value on thread (derived from sampling)"), \
perfetto::Category("cpu_freq") \
.SetDescription("CPU frequency in MHz (collected in background thread)"), \
perfetto::Category("process_page_fault") \
.SetDescription( \
"Memory page faults in process (collected in background thread)"), \
perfetto::Category("process_memory_hwm") \
.SetDescription("Memory High-Water Mark i.e. peak memory usage (collected " \
"in background thread)"), \
perfetto::Category("process_virtual_memory") \
.SetDescription("Virtual memory usage in process in MB (collected in " \
"background thread)"), \
perfetto::Category("process_context_switch") \
.SetDescription( \
"Context switches in process (collected in background thread)"), \
perfetto::Category("process_page_fault") \
.SetDescription( \
"Memory page faults in process (collected in background thread)"), \
perfetto::Category("process_user_cpu_time") \
.SetDescription("CPU time of functions executing in user-space in process " \
"in seconds (collected in background thread)"), \
perfetto::Category("process_kernel_cpu_time") \
.SetDescription("CPU time of functions executing in kernel-space in " \
"process in seconds (collected in background thread)"), \
perfetto::Category("pthread").SetDescription("Pthread functions"), \
perfetto::Category("kokkos").SetDescription("Kokkos regions"), \
perfetto::Category("mpi").SetDescription("MPI regions"), \
perfetto::Category("ompt").SetDescription("OpenMP Tools regions"), \
perfetto::Category("rccl").SetDescription( \
"ROCm Communication Collectives Library (RCCL) regions"), \
perfetto::Category("comm_data") \
.SetDescription( \
"MPI/RCCL counters for tracking amount of data sent or received"), \
perfetto::Category("critical-trace").SetDescription("Combined critical traces"), \
perfetto::Category("host-critical-trace") \
.SetDescription("Host-side critical traces"), \
perfetto::Category("device-critical-trace") \
.SetDescription("Device-side critical traces"), \
perfetto::Category("timemory").SetDescription("Events from the timemory API")
#if defined(TIMEMORY_USE_PERFETTO)
# define TIMEMORY_PERFETTO_CATEGORIES OMNITRACE_PERFETTO_CATEGORIES
#endif
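
The new `categories.hpp` above attaches a string name to each category tag type by specializing `tim::trait::perfetto_category` through `OMNITRACE_DEFINE_NAME_TRAIT`. The following is a minimal, standalone sketch of that trait-specialization pattern. Everything in the `sketch` namespace, including `category_name` and `SKETCH_DEFINE_NAME_TRAIT`, is hypothetical and only mirrors the shape of the macros in the diff.

```cpp
#include <cassert>
#include <string>

namespace sketch
{
// Hypothetical tag types standing in for tim::category::host, etc.
namespace category
{
struct host
{};
struct critical_trace
{};
}  // namespace category

// The primary template is declared but not defined, so asking for the name of
// an unregistered tag type is a compile-time error.
template <typename Tp>
struct category_name;
}  // namespace sketch

// Specialize the trait for one tag type. The variadic parameter lets the type
// contain commas, the same trick OMNITRACE_DEFINE_NAME_TRAIT uses.
#define SKETCH_DEFINE_NAME_TRAIT(NAME, ...)                                    \
    namespace sketch                                                           \
    {                                                                          \
    template <>                                                                \
    struct category_name<__VA_ARGS__>                                          \
    {                                                                          \
        static constexpr auto value = NAME;                                    \
    };                                                                         \
    }

SKETCH_DEFINE_NAME_TRAIT("host", category::host)
SKETCH_DEFINE_NAME_TRAIT("critical-trace", category::critical_trace)
```

With this in place, code that is templated on a tag type can obtain its category string at compile time, which is how the diff uses `tim::trait::name<tim::category::host>::value` in place of a string literal.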
@@ -23,13 +23,19 @@
#pragma once
#include "common/join.hpp"
#include "library/categories.hpp"
#include "library/concepts.hpp"
#include "library/defines.hpp"
#include <timemory/api.hpp>
#include <timemory/backends/dmp.hpp>
#include <timemory/api/macros.hpp>
#include <timemory/backends/process.hpp>
#include <timemory/backends/threading.hpp>
#include <timemory/environment/types.hpp>
#include <timemory/mpl/types.hpp>
#include <timemory/utility/demangle.hpp>
#include <timemory/utility/filepath.hpp>
#include <timemory/utility/locking.hpp>
#include <cassert>
#include <cstdint>
@@ -44,20 +50,75 @@
#include <utility>
#include <vector>
TIMEMORY_DEFINE_NS_API(api, omnitrace)
TIMEMORY_DEFINE_NS_API(api, sampling)
TIMEMORY_DEFINE_NS_API(api, rocm_smi)
TIMEMORY_DEFINE_NS_API(api, rccl)
#define OMNITRACE_DECLARE_COMPONENT(NAME) \
namespace omnitrace \
{ \
namespace component \
{ \
struct NAME; \
} \
} \
namespace tim \
{ \
namespace trait \
{ \
template <> \
struct is_component<omnitrace::component::NAME> : true_type \
{}; \
} \
} \
namespace tim \
{ \
namespace component \
{ \
using ::omnitrace::component::NAME; \
} \
}
#define OMNITRACE_COMPONENT_ALIAS(NAME, ...) \
namespace omnitrace \
{ \
namespace component \
{ \
using NAME = __VA_ARGS__; \
} \
} \
namespace tim \
{ \
namespace component \
{ \
using ::omnitrace::component::NAME; \
} \
}
#define OMNITRACE_DEFINE_CONCRETE_TRAIT(TRAIT, TYPE, VALUE) \
namespace tim \
{ \
namespace trait \
{ \
template <> \
struct TRAIT<::omnitrace::TYPE> : VALUE \
{}; \
} \
}
namespace omnitrace
{
namespace api = ::tim::api; // NOLINT
namespace category = ::tim::category; // NOLINT
namespace filepath = ::tim::filepath; // NOLINT
namespace api = ::tim::api; // NOLINT
namespace category = ::tim::category; // NOLINT
namespace filepath = ::tim::filepath; // NOLINT
namespace project = ::tim::project; // NOLINT
namespace process = ::tim::process; // NOLINT
namespace threading = ::tim::threading; // NOLINT
namespace scope = ::tim::scope; // NOLINT
namespace policy = ::tim::policy; // NOLINT
namespace trait = ::tim::trait; // NOLINT
using ::tim::auto_lock_t; // NOLINT
using ::tim::demangle; // NOLINT
using ::tim::get_env; // NOLINT
using ::tim::try_demangle; // NOLINT
using ::tim::type_mutex; // NOLINT
} // namespace omnitrace
// same sort of functionality as python's " ".join([...])
@@ -0,0 +1,48 @@
#
set(component_sources
${CMAKE_CURRENT_LIST_DIR}/backtrace.cpp
${CMAKE_CURRENT_LIST_DIR}/backtrace_metrics.cpp
${CMAKE_CURRENT_LIST_DIR}/backtrace_timestamp.cpp
${CMAKE_CURRENT_LIST_DIR}/comm_data.cpp
${CMAKE_CURRENT_LIST_DIR}/cpu_freq.cpp
${CMAKE_CURRENT_LIST_DIR}/exit_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/fork_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/mpi_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/pthread_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/pthread_create_gotcha.cpp
${CMAKE_CURRENT_LIST_DIR}/pthread_mutex_gotcha.cpp)
set(component_headers
${CMAKE_CURRENT_LIST_DIR}/fwd.hpp
${CMAKE_CURRENT_LIST_DIR}/backtrace.hpp
${CMAKE_CURRENT_LIST_DIR}/backtrace_metrics.hpp
${CMAKE_CURRENT_LIST_DIR}/backtrace_timestamp.hpp
${CMAKE_CURRENT_LIST_DIR}/category_region.hpp
${CMAKE_CURRENT_LIST_DIR}/comm_data.hpp
${CMAKE_CURRENT_LIST_DIR}/cpu_freq.hpp
${CMAKE_CURRENT_LIST_DIR}/ensure_storage.hpp
${CMAKE_CURRENT_LIST_DIR}/exit_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/fork_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/mpi_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/rcclp.hpp
${CMAKE_CURRENT_LIST_DIR}/rocprofiler.hpp
${CMAKE_CURRENT_LIST_DIR}/roctracer.hpp
${CMAKE_CURRENT_LIST_DIR}/pthread_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/pthread_create_gotcha.hpp
${CMAKE_CURRENT_LIST_DIR}/pthread_mutex_gotcha.hpp)
target_sources(omnitrace-object-library PRIVATE ${component_sources} ${component_headers})
if(OMNITRACE_USE_ROCTRACER OR OMNITRACE_USE_ROCPROFILER)
target_sources(omnitrace-object-library
PRIVATE ${CMAKE_CURRENT_LIST_DIR}/rocprofiler.cpp)
endif()
if(OMNITRACE_USE_ROCTRACER)
target_sources(omnitrace-object-library
PRIVATE ${CMAKE_CURRENT_LIST_DIR}/roctracer.cpp)
endif()
if(OMNITRACE_USE_RCCL)
target_sources(omnitrace-object-library PRIVATE ${CMAKE_CURRENT_LIST_DIR}/rcclp.cpp)
endif()
@@ -20,17 +20,14 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include "library/components/ensure_storage.hpp"
#include "library/components/fwd.hpp"
#include "library/components/pthread_create_gotcha.hpp"
#include "library/components/pthread_gotcha.hpp"
#include "library/components/rocm_smi.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/perfetto.hpp"
#include "library/ptl.hpp"
#include "library/runtime.hpp"
#include "library/sampling.hpp"
#include "library/tracing.hpp"
#include <timemory/backends/papi.hpp>
#include <timemory/backends/threading.hpp>
@@ -49,8 +46,6 @@
#include <timemory/mpl/quirks.hpp>
#include <timemory/mpl/type_traits.hpp>
#include <timemory/operations.hpp>
#include <timemory/sampling/allocator.hpp>
#include <timemory/sampling/sampler.hpp>
#include <timemory/storage.hpp>
#include <timemory/units.hpp>
#include <timemory/utility/backtrace.hpp>
@@ -73,152 +68,49 @@
#include <pthread.h>
#include <signal.h>
namespace tracing
{
using namespace ::omnitrace::tracing;
}
namespace
{
template <typename... Tp>
struct ensure_storage
{
TIMEMORY_DEFAULT_OBJECT(ensure_storage)
void operator()() const { TIMEMORY_FOLD_EXPRESSION((*this)(tim::type_list<Tp>{})); }
private:
template <typename Up, std::enable_if_t<tim::trait::is_available<Up>::value, int> = 0>
void operator()(tim::type_list<Up>) const
{
using namespace tim;
static thread_local auto _storage = operation::get_storage<Up>{}();
static thread_local auto _tid = threading::get_id();
static thread_local auto _dtor =
scope::destructor{ []() { operation::set_storage<Up>{}(nullptr, _tid); } };
tim::operation::set_storage<Up>{}(_storage, _tid);
if(_tid == 0 && !_storage) tim::trait::runtime_enabled<Up>::set(false);
}
template <typename Up,
std::enable_if_t<!tim::trait::is_available<Up>::value, long> = 0>
void operator()(tim::type_list<Up>) const
{
tim::trait::runtime_enabled<Up>::set(false);
}
};
} // namespace
namespace omnitrace
{
namespace component
{
using hw_counters = typename backtrace::hw_counters;
using signal_type_instances = thread_data<std::set<int>, api::sampling>;
using backtrace_init_instances = thread_data<backtrace, api::sampling>;
using sampler_running_instances = thread_data<bool, api::sampling>;
using papi_vector_instances = thread_data<hw_counters, api::sampling>;
using papi_label_instances = thread_data<std::vector<std::string>, api::sampling>;
namespace
{
struct perfetto_rusage
{};
unique_ptr_t<std::vector<std::string>>&
get_papi_labels(int64_t _tid)
{
static auto& _v =
papi_label_instances::instances(papi_label_instances::construct_on_init{});
return _v.at(_tid);
}
unique_ptr_t<hw_counters>&
get_papi_vector(int64_t _tid)
{
static auto& _v = papi_vector_instances::instances();
if(_tid == threading::get_id()) papi_vector_instances::construct();
return _v.at(_tid);
}
unique_ptr_t<backtrace>&
get_backtrace_init(int64_t _tid)
{
static auto& _v = backtrace_init_instances::instances();
return _v.at(_tid);
}
unique_ptr_t<bool>&
get_sampler_running(int64_t _tid)
{
static auto& _v = sampler_running_instances::instances(
sampler_running_instances::construct_on_init{}, false);
return _v.at(_tid);
}
} // namespace
bool
backtrace::operator<(const backtrace& rhs) const
{
return (m_ts == rhs.m_ts) ? (m_tid < rhs.m_tid) : (m_ts < rhs.m_ts);
}
std::vector<std::string_view>
std::vector<std::string>
backtrace::get() const
{
std::vector<std::string_view> _v = {};
if(m_size == 0) return _v;
size_t _size = 0;
for(const auto* itr : m_data)
_size += (strlen(itr) > 0) ? 1 : 0;
_v.reserve(_size);
for(const auto* itr : m_data)
std::vector<std::string> _v = {};
if(size() == 0) return _v;
_v.reserve(m_data.size());
for(const auto& itr : m_data.call_stack)
{
if(strlen(itr) > 0) _v.emplace_back(itr);
if(!itr) continue;
#if defined(OMNITRACE_CI) && OMNITRACE_CI > 0
std::string _name = {};
_name.reserve(1024);
const char* _addr = _name.data();
_name = itr->get_name(m_data.context, _name);
OMNITRACE_CONDITIONAL_PRINT(
_name.data() != _addr,
"[backtrace::get()] processing unw_get_proc_name_from_ip for '%s' "
"caused a reallocation. Before=%p, After=%p\n",
_name.c_str(), _addr, _name.data());
#else
auto _name = itr->get_name(m_data.context);
#endif
if(!_name.empty()) _v.emplace_back(_name);
}
// put the bottom of the call-stack on top
std::reverse(_v.begin(), _v.end());
while(!_v.empty() && _v.back() == "funlockfile")
//
auto _known_excludes =
std::set<std::string>{ "funlockfile", "killpg", "__restore_rt" };
// remove some known functions which are by-products of interrupts
while(!_v.empty() && _known_excludes.find(_v.back()) != _known_excludes.end())
_v.pop_back();
return _v;
}
void
backtrace::preinit()
{
sampling_wall_clock::label() = "sampling_wall_clock";
sampling_wall_clock::description() = "Wall clock time (via sampling)";
sampling_cpu_clock::label() = "sampling_cpu_clock";
sampling_cpu_clock::description() = "CPU clock time (via sampling)";
sampling_percent::label() = "sampling_percent";
sampling_percent::description() = "Percentage of samples";
sampling_gpu_busy::label() = "sampling_gpu_busy_percent";
sampling_gpu_busy::description() = "Utilization of GPU(s)";
sampling_gpu_busy::set_precision(0);
sampling_gpu_busy::set_format_flags(sampling_gpu_busy::get_format_flags() &
std::ios_base::showpoint);
sampling_gpu_memory::label() = "sampling_gpu_memory_usage";
sampling_gpu_memory::description() = "Memory usage of GPU(s)";
sampling_gpu_power::label() = "sampling_gpu_power";
sampling_gpu_power::description() = "Power usage of GPU(s)";
sampling_gpu_power::unit() = units::watt;
sampling_gpu_power::display_unit() = "watts";
sampling_gpu_power::set_precision(2);
sampling_gpu_power::set_format_flags(sampling_gpu_power::get_format_flags());
sampling_gpu_temp::label() = "sampling_gpu_temperature";
sampling_gpu_temp::description() = "Temperature of GPU(s)";
sampling_gpu_temp::unit() = 1;
sampling_gpu_temp::display_unit() = "degC";
sampling_gpu_temp::set_precision(1);
sampling_gpu_temp::set_format_flags(sampling_gpu_temp::get_format_flags());
}
std::string
backtrace::label()
{
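
In the hunk above, the reworked `backtrace::get()` reverses the resolved call-stack so the outermost frame comes first, then pops trailing frames that are known by-products of signal delivery (`funlockfile`, `killpg`, `__restore_rt`). A minimal sketch of just that ordering and trimming step is shown below. The `sketch::order_and_trim` helper is a hypothetical name, and it operates on already-resolved function names and does not use instruction pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

namespace sketch
{
// Takes frames ordered innermost-first (as an unwinder typically reports them)
// and returns them outermost-first with trailing interrupt artifacts removed.
inline std::vector<std::string>
order_and_trim(std::vector<std::string> frames)
{
    // put the bottom of the call-stack on top
    std::reverse(frames.begin(), frames.end());

    // functions which are by-products of interrupts (list taken from the diff)
    static const std::set<std::string> known_excludes{ "funlockfile", "killpg",
                                                       "__restore_rt" };

    // only trailing entries are removed; an excluded name in the middle of the
    // stack is left untouched
    while(!frames.empty() && known_excludes.count(frames.back()) > 0)
        frames.pop_back();
    return frames;
}
}  // namespace sketch
```

Trimming only from the back matches the `while(... _v.back() ...)` loop in the diff, so a legitimate call to one of these functions deeper in the stack still appears in the output.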
@@ -231,252 +123,9 @@ backtrace::description()
return "Records backtrace data";
}
void
backtrace::start()
{}
void
backtrace::stop()
{}
bool
backtrace::empty() const
std::vector<std::string>
backtrace::filter_and_patch(const std::vector<std::string>& _data)
{
return (m_size == 0);
}
size_t
backtrace::size() const
{
return m_size;
}
uint64_t
backtrace::get_timestamp() const
{
return m_ts;
}
int64_t
backtrace::get_thread_cpu_timestamp() const
{
return m_thr_cpu_ts;
}
void
backtrace::sample(int signum)
{
if(signum != -1 && get_state() != State::Active)
{
OMNITRACE_CONDITIONAL_PRINT(
get_debug_sampling(),
"request to sample (signal %i) ignored because omnitrace is not active\n",
signum);
return;
}
if(get_debug_sampling())
{
static auto _timestamp_str = [](const auto& _tp) {
char _repr[64];
std::memset(_repr, '\0', sizeof(_repr));
std::time_t _value = system_clock::to_time_t(_tp);
// alternative: "%c %Z"
if(std::strftime(_repr, sizeof(_repr), "%a %b %d %T %Y %Z",
std::localtime(&_value)) > 0)
return std::string{ _repr };
return std::string{};
};
static thread_local size_t _tot = 0;
static thread_local auto _last = system_clock::now();
auto _now = system_clock::now();
auto _diff = (_now - _last).count();
_last = _now;
_tot += _diff;
OMNITRACE_PRINT(
"Sample on signal %i taken at %s after interval %zu :: total %zu\n", signum,
_timestamp_str(_now).c_str(), _diff, _tot);
}
if(!*get_sampler_running(0)) return;
m_size = 0;
m_tid = threading::get_id();
m_ts = comp::wall_clock::record();
m_thr_cpu_ts = tim::get_clock_thread_now<int64_t, std::nano>();
auto _cache = tim::rusage_cache{ RUSAGE_THREAD };
m_mem_peak = _cache.get_peak_rss();
m_ctx_swch = _cache.get_num_priority_context_switch() +
_cache.get_num_voluntary_context_switch();
m_page_flt = _cache.get_num_major_page_faults() + _cache.get_num_minor_page_faults();
m_data = tim::get_unw_backtrace<stack_depth, 3, false>();
m_size = m_data.size();
if constexpr(tim::trait::is_available<hw_counters>::value)
{
if(tim::trait::runtime_enabled<hw_counters>::get())
{
assert(get_papi_vector(m_tid).get() != nullptr);
m_hw_counter = get_papi_vector(m_tid)->record();
}
}
}
std::set<int>
backtrace::configure(bool _setup, int64_t _tid)
{
auto& _sampler = sampling::get_sampler(_tid);
auto& _running = get_sampler_running(_tid);
bool _is_running = (!_running) ? false : *_running;
auto& _signal_types = sampling::get_signal_types(_tid);
ensure_storage<comp::trip_count, sampling_wall_clock, sampling_cpu_clock, hw_counters,
sampling_percent>{}();
if(_setup && !_sampler && !_is_running)
{
(void) get_debug_sampling(); // make sure query in sampler does not allocate
assert(_tid == threading::get_id());
sampling::block_signals(*_signal_types);
if constexpr(tim::trait::is_available<hw_counters>::value)
{
perfetto_counter_track<hw_counters>::init();
OMNITRACE_DEBUG("HW COUNTER: starting...\n");
if(get_papi_vector(_tid))
{
using common_type_t = typename hw_counters::common_type;
get_papi_vector(_tid)->start();
*get_papi_labels(_tid) = comp::papi_common::get_events<common_type_t>();
}
}
auto _alrm_freq = std::min<double>(get_sampling_freq(), 20.0);
auto _prof_freq = get_sampling_freq();
auto _delay = std::max<double>(1.0e-3, get_sampling_delay());
auto _verbose = std::min<int>(get_verbose() - 2, 2);
if(get_debug_sampling()) _verbose = 2;
OMNITRACE_DEBUG("Configuring sampler for thread %lu...\n", _tid);
sampling::sampler_instances::construct("omnitrace", _tid, *_signal_types);
_sampler->set_signals(*_signal_types);
_sampler->set_flags(SA_RESTART);
_sampler->set_delay(_delay, *_signal_types, (_verbose > 1));
_sampler->set_verbose(_verbose);
if(_signal_types->count(get_realtime_signal()) > 0)
_sampler->set_frequency(_alrm_freq, { get_realtime_signal() },
(_verbose > 1));
if(_signal_types->count(get_cputime_signal()) > 0)
_sampler->set_frequency(_prof_freq, { get_cputime_signal() }, (_verbose > 1));
static_assert(tim::trait::buffer_size<sampling::sampler_t>::value > 0,
"Error! Zero buffer size");
OMNITRACE_CONDITIONAL_THROW(
_sampler->get_buffer_size() <= 0,
"dynamic sampler requires a positive buffer size: %zu",
_sampler->get_buffer_size());
for(auto itr : *_signal_types)
{
const char* _type = (itr == get_realtime_signal()) ? "wall" : "CPU";
OMNITRACE_VERBOSE(1,
"[%i] Sampler for thread %lu will be triggered %.1fx per "
"second of %s-time (every %.3f milliseconds)...\n",
itr, _tid, _sampler->get_frequency(units::sec, itr), _type,
_sampler->get_period(units::msec, itr));
}
*_running = true;
backtrace_init_instances::construct();
get_backtrace_init(_tid)->sample();
_sampler->configure(false);
_sampler->start();
}
else if(!_setup && _sampler && _is_running)
{
OMNITRACE_DEBUG("Destroying sampler for thread %lu...\n", _tid);
*_running = false;
if(_tid == threading::get_id())
{
sampling::block_signals(*_signal_types);
}
// remove the timer delivering the signal
_sampler->reset(false, *_signal_types);
if(_tid == 0)
{
// this propagates to all threads
_sampler->ignore(*_signal_types);
for(int64_t i = 1; i < OMNITRACE_MAX_THREADS; ++i)
{
if(sampling::get_sampler(i))
{
sampling::get_sampler(i)->reset(false,
*sampling::get_signal_types(i));
*get_sampler_running(i) = false;
}
}
}
_sampler->stop();
if constexpr(tim::trait::is_available<hw_counters>::value)
{
if(_tid == threading::get_id())
{
if(get_papi_vector(_tid)) get_papi_vector(_tid)->stop();
OMNITRACE_DEBUG("HW COUNTER: stopped...\n");
}
}
OMNITRACE_DEBUG("Sampler destroyed for thread %lu\n", _tid);
}
return (_signal_types) ? *_signal_types : std::set<int>{};
}
backtrace::hw_counter_data_t&
backtrace::get_last_hwcounters()
{
static thread_local auto _v = hw_counter_data_t{ 0 };
return _v;
}
void
backtrace::post_process(int64_t _tid)
{
namespace quirk = tim::quirk;
configure(false, _tid);
auto& _sampler = sampling::sampler_instances::instances().at(_tid);
if(!_sampler)
{
// this should be relatively common
OMNITRACE_CONDITIONAL_PRINT(
get_debug() && get_verbose() >= 2,
"Post-processing sampling entries for thread %lu skipped (no sampler)\n",
_tid);
return;
}
auto& _init = backtrace_init_instances::instances().at(_tid);
if(!_init)
{
// this is not common
OMNITRACE_PRINT(
"Post-processing sampling entries for thread %lu skipped (not initialized)\n",
_tid);
return;
}
_init->m_ts = std::max<uint64_t>(
_init->m_ts, pthread_create_gotcha::get_execution_time(_tid)->first);
// check whether the call-stack entry should be used. -1 means break, 0 means continue
auto _use_label = [](std::string_view _lbl) -> short {
// debugging feature
@@ -491,7 +140,6 @@ backtrace::post_process(int64_t _tid)
if(_lbl.find("rocprofiler_") != _npos) return -1;
if(_lbl.find("roctracer_") != _npos) return -1;
if(_lbl.find("perfetto::") != _npos) return -1;
if(_lbl == "funlockfile") return 0;
return 1;
};
@@ -509,327 +157,55 @@ backtrace::post_process(int64_t _tid)
return std::string{ _lbl }.replace(_pos, _dyninst.length(), "");
};
auto _hw_cnt_labels = *get_papi_labels(_tid);
auto _process_perfetto_counters = [&](const std::vector<sampling::bundle_t*>& _data) {
auto _tid_name = JOIN("", '[', _tid, ']');
if(!perfetto_counter_track<perfetto_rusage>::exists(_tid))
{
perfetto_counter_track<perfetto_rusage>::emplace(
_tid, JOIN(' ', "Thread Peak Memory Usage", _tid_name, "(S)"), "MB");
perfetto_counter_track<perfetto_rusage>::emplace(
_tid, JOIN(' ', "Thread Context Switches", _tid_name, "(S)"));
perfetto_counter_track<perfetto_rusage>::emplace(
_tid, JOIN(' ', "Thread Page Faults", _tid_name, "(S)"));
}
if(!perfetto_counter_track<hw_counters>::exists(_tid) &&
tim::trait::runtime_enabled<hw_counters>::get())
{
for(auto& itr : _hw_cnt_labels)
{
std::string _desc = tim::papi::get_event_info(itr).short_descr;
if(_desc.empty()) _desc = itr;
OMNITRACE_CI_THROW(_desc.empty(), "Empty description for %s\n",
itr.c_str());
perfetto_counter_track<hw_counters>::emplace(
_tid, JOIN(' ', "Thread", _desc, _tid_name, "(S)"));
}
}
uint64_t _mean_ts = 0;
const backtrace* _last_bt = nullptr;
for(const auto& ditr : _data)
{
const auto* _bt = ditr->get<backtrace>();
if(_bt->m_tid != _tid) continue;
auto _ts = _bt->m_ts;
if(!pthread_create_gotcha::is_valid_execution_time(_tid, _ts)) continue;
_last_bt = _bt;
_mean_ts += _ts;
TRACE_COUNTER("thread_peak_memory",
perfetto_counter_track<perfetto_rusage>::at(_tid, 0), _ts,
_bt->m_mem_peak / units::megabyte);
TRACE_COUNTER("thread_context_switch",
perfetto_counter_track<perfetto_rusage>::at(_tid, 1), _ts,
_bt->m_ctx_swch);
TRACE_COUNTER("thread_page_fault",
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts,
_bt->m_page_flt);
if(tim::trait::runtime_enabled<hw_counters>::get())
{
for(size_t i = 0; i < perfetto_counter_track<hw_counters>::size(_tid);
++i)
{
if(i < _bt->m_hw_counter.size())
{
TRACE_COUNTER("hardware_counter",
perfetto_counter_track<hw_counters>::at(_tid, i),
_ts, _bt->m_hw_counter.at(i));
}
}
}
}
if(_tid > 0 && _last_bt)
{
auto _ts = pthread_create_gotcha::get_execution_time(_tid)->second;
uint64_t _zero = 0;
TRACE_COUNTER("thread_peak_memory",
perfetto_counter_track<perfetto_rusage>::at(_tid, 0), _ts,
_zero);
TRACE_COUNTER("thread_context_switch",
perfetto_counter_track<perfetto_rusage>::at(_tid, 1), _ts,
_zero);
TRACE_COUNTER("thread_page_fault",
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts,
_zero);
if(tim::trait::runtime_enabled<hw_counters>::get())
{
for(size_t i = 0; i < perfetto_counter_track<hw_counters>::size(_tid);
++i)
{
if(i < _last_bt->m_hw_counter.size())
{
TRACE_COUNTER("hardware_counter",
perfetto_counter_track<hw_counters>::at(_tid, i),
_ts, _zero);
}
}
}
}
};
auto _process_perfetto = [&](const std::vector<sampling::bundle_t*>& _data,
bool _rename) {
if(_rename)
threading::set_thread_name(TIMEMORY_JOIN(" ", "Thread", _tid, "(S)").c_str());
uint64_t _beg_ns = pthread_create_gotcha::get_execution_time(_tid)->first;
uint64_t _end_ns = pthread_create_gotcha::get_execution_time(_tid)->second;
uint64_t _last_wall_ts = _init->get_timestamp();
tracing::push_perfetto_ts(category::sampling{}, "samples [omnitrace]", _beg_ns,
"begin_ns", _beg_ns);
for(const auto& ditr : _data)
{
const auto* _bt = ditr->get<backtrace>();
if(_bt->m_tid != _tid) continue;
static std::set<std::string> _static_strings{};
for(const auto& itr : _bt->get())
{
auto _name = tim::demangle(_patch_label(itr));
auto _use = _use_label(_name);
if(_use == -1) break;
if(_use == 0) continue;
auto sitr = _static_strings.emplace(_name);
uint64_t _beg = _last_wall_ts;
uint64_t _end = _bt->m_ts;
if(_end <= _beg) continue;
if(!pthread_create_gotcha::is_valid_execution_time(_tid, _beg)) continue;
if(!pthread_create_gotcha::is_valid_execution_time(_tid, _end)) continue;
tracing::push_perfetto_ts(category::sampling{}, sitr.first->c_str(), _beg,
"begin_ns", _beg);
tracing::pop_perfetto_ts(category::sampling{}, sitr.first->c_str(), _end,
"end_ns", _end);
}
_last_wall_ts = _bt->m_ts;
}
tracing::pop_perfetto_ts(category::sampling{}, "samples [omnitrace]", _end_ns,
"end_ns", _end_ns);
};
_sampler->stop();
auto _raw_data = _sampler->get_data();
OMNITRACE_CI_THROW(
_sampler->get_sample_count() != _raw_data.size(),
"Error! sampler recorded %zu samples but %zu samples were returned\n",
_sampler->get_sample_count(), _raw_data.size());
// single sample that is useless (backtrace to unblocking signals)
if(_raw_data.size() == 1 && _raw_data.front().size() <= 1) _raw_data.clear();
std::vector<sampling::bundle_t*> _data{};
for(auto& itr : _raw_data)
auto _ret = std::vector<std::string>{};
for(const auto& itr : _data)
{
_data.reserve(_data.size() + itr.size());
auto* _bt = itr.get<backtrace>();
if(!_bt)
{
OMNITRACE_PRINT("Warning! Nullptr to backtrace instance for thread %lu...\n",
_tid);
continue;
}
if(_bt->empty()) continue;
if(!pthread_create_gotcha::is_valid_execution_time(_tid, _bt->m_ts)) continue;
_data.emplace_back(&itr);
auto _name = tim::demangle(_patch_label(itr));
auto _use = _use_label(_name);
if(_use == -1) break;
if(_use == 0) continue;
_ret.emplace_back(_name);
}
if(_data.empty()) return;
OMNITRACE_VERBOSE(0 || get_debug_sampling(),
"Post-processing %zu sampling entries for thread %lu...\n",
_data.size(), _tid);
std::sort(_data.begin(), _data.end(),
[](const sampling::bundle_t* _lhs, const sampling::bundle_t* _rhs) {
return _lhs->get<backtrace>()->m_ts < _rhs->get<backtrace>()->m_ts;
});
if(get_use_perfetto())
{
_process_perfetto_counters(_data);
pthread_gotcha::push_enable_sampling_on_child_threads(false);
std::thread{ _process_perfetto, _data, true }.join();
pthread_gotcha::pop_enable_sampling_on_child_threads();
}
if(!get_use_timemory()) return;
std::map<int64_t, std::map<int64_t, int64_t>> _depth_sum = {};
auto _scope = tim::scope::config{};
if(get_timeline_sampling()) _scope += scope::timeline{};
if(get_flat_sampling()) _scope += scope::flat{};
backtrace* _last_bt = _init.get();
for(auto& ditr : _data)
{
using bundle_t = tim::lightweight_tuple<comp::trip_count, sampling_wall_clock,
sampling_cpu_clock, hw_counters>;
auto* _bt = ditr->get<backtrace>();
if(!pthread_create_gotcha::is_valid_execution_time(_tid, _bt->m_ts)) continue;
if(_bt->m_ts < _last_bt->m_ts) continue;
double _elapsed_wc = (_bt->m_ts - _last_bt->m_ts);
double _elapsed_cc = (_bt->m_thr_cpu_ts - _last_bt->m_thr_cpu_ts);
std::vector<bundle_t> _tc{};
_tc.reserve(_bt->size());
// generate the instances of the tuple of components and start them
for(const auto& itr : _bt->get())
{
auto _lbl = tim::demangle(_patch_label(itr));
auto _use = _use_label(_lbl);
if(_use == -1) break;
if(_use == 0) continue;
_tc.emplace_back(tim::string_view_t{ _lbl }, _scope);
_tc.back().push(_bt->m_tid);
_tc.back().start();
}
// stop the instances and update the values as needed
for(size_t i = 0; i < _tc.size(); ++i)
{
auto& itr = _tc.at(_tc.size() - i - 1);
size_t _depth = 0;
_depth_sum[_bt->m_tid][_depth] += 1;
itr.stop();
if constexpr(tim::trait::is_available<sampling_wall_clock>::value)
{
auto* _sc = itr.get<sampling_wall_clock>();
if(_sc)
{
auto _value = _elapsed_wc / sampling_wall_clock::get_unit();
_sc->set_value(_value);
_sc->set_accum(_value);
}
}
if constexpr(tim::trait::is_available<sampling_cpu_clock>::value)
{
auto* _cc = itr.get<sampling_cpu_clock>();
if(_cc)
{
_cc->set_value(_elapsed_cc / sampling_cpu_clock::get_unit());
_cc->set_accum(_elapsed_cc / sampling_cpu_clock::get_unit());
}
}
if constexpr(tim::trait::is_available<hw_counters>::value)
{
auto _hw_cnt_vals = _bt->m_hw_counter;
if(_last_bt && _bt->m_hw_counter.size() == _last_bt->m_hw_counter.size())
{
for(size_t k = 0; k < _bt->m_hw_counter.size(); ++k)
{
if(_last_bt->m_hw_counter[k] > _hw_cnt_vals[k])
_hw_cnt_vals[k] -= _last_bt->m_hw_counter[k];
}
}
auto* _hw_counter = itr.get<hw_counters>();
if(_hw_counter)
{
_hw_counter->set_value(_hw_cnt_vals);
_hw_counter->set_accum(_hw_cnt_vals);
}
}
itr.pop();
}
_last_bt = _bt;
}
for(auto&& ditr : _data)
{
using bundle_t =
tim::lightweight_tuple<sampling_percent, quirk::config<quirk::tree_scope>>;
auto* _bt = ditr->get<backtrace>();
std::vector<bundle_t> _tc{};
_tc.reserve(_bt->size());
// generate the instances of the tuple of components and start them
for(const auto& itr : _bt->get())
{
auto _lbl = tim::demangle(_patch_label(itr));
auto _use = _use_label(_lbl);
if(_use == -1) break;
if(_use == 0) continue;
_tc.emplace_back(tim::string_view_t{ _lbl });
_tc.back().push(_bt->m_tid);
_tc.back().start();
}
// stop the instances and update the values as needed
for(size_t i = 0; i < _tc.size(); ++i)
{
auto& itr = _tc.at(_tc.size() - i - 1);
size_t _depth = 0;
double _value = (1.0 / _depth_sum[_bt->m_tid][_depth]) * 100.0;
itr.store(std::plus<double>{}, _value);
itr.stop();
itr.pop();
}
}
return _ret;
}
void
backtrace::start()
{}
void
backtrace::stop()
{}
bool
backtrace::empty() const
{
return (size() == 0);
}
size_t
backtrace::size() const
{
return m_data.size();
}
void
backtrace::sample(int)
{
using namespace tim::backtrace;
constexpr bool with_signal_frame = false;
constexpr size_t ignore_depth = 3;
// ignore depth based on:
// 1. this frame
// 2. tim::sampling::sampler<...>::sample(...) [always inline]
// 3. tim::sampling::sampler<...>::execute(...)
// 4a. funlockfile [common but not explicitly in call-stack]
// 4b. __resume_rt [common but not explicitly in call-stack]
// 4c. killpg [common but not explicitly in call-stack]
m_data = get_unw_backtrace_raw<stack_depth, ignore_depth, with_signal_frame>();
}
} // namespace component
} // namespace omnitrace
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_wall_clock>), true,
double)
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_cpu_clock>), true,
double)
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_fraction>), true,
double)
TIMEMORY_INITIALIZE_STORAGE(omnitrace::component::backtrace)
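Review note: the reworked `backtrace::filter_and_patch` keeps the three-way convention of the `_use_label` lambda (-1 stops the walk, 0 skips the frame, 1 keeps it). The sketch below restates that convention with only the standard library. The names `use_label` and `filter_labels` are hypothetical stand-ins, and the demangling and Dyninst-suffix patching steps from the diff are omitted.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the `_use_label` lambda in the diff:
// -1 = stop walking the call-stack, 0 = skip this frame, 1 = keep it.
inline short
use_label(std::string_view lbl)
{
    constexpr auto npos = std::string_view::npos;
    if(lbl.find("rocprofiler_") != npos) return -1;
    if(lbl.find("roctracer_") != npos) return -1;
    if(lbl.find("perfetto::") != npos) return -1;
    if(lbl == "funlockfile") return 0;
    return 1;
}

// Hypothetical reduction of filter_and_patch: filtering only, no patching.
inline std::vector<std::string>
filter_labels(const std::vector<std::string>& frames)
{
    auto ret = std::vector<std::string>{};
    ret.reserve(frames.size());
    for(const auto& itr : frames)
    {
        auto use = use_label(itr);
        if(use == -1) break;     // everything after this frame is tool-internal
        if(use == 0) continue;   // drop this frame, keep walking
        ret.emplace_back(itr);
    }
    assert(ret.size() <= frames.size());
    return ret;
}
```

With this shape, a frame such as `perfetto::Track` discards itself and every frame after it, whereas `funlockfile` is dropped on its own.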
@@ -29,8 +29,6 @@
#include "library/timemory.hpp"
#include <timemory/components/base.hpp>
#include <timemory/components/papi/papi_array.hpp>
#include <timemory/components/papi/types.hpp>
#include <timemory/macros/language.hpp>
#include <timemory/mpl/concepts.hpp>
#include <timemory/variadic/types.hpp>
@@ -50,74 +48,37 @@ struct backtrace
: tim::component::empty_base
, tim::concepts::component
{
static constexpr size_t num_hw_counters = TIMEMORY_PAPI_ARRAY_SIZE;
static constexpr size_t buffer_width = 512;
static constexpr size_t stack_depth = 128;
static constexpr size_t stack_depth = OMNITRACE_MAX_UNWIND_DEPTH;
using data_t = std::array<char[buffer_width], stack_depth>;
using data_t = tim::unwind::stack<stack_depth>;
using clock_type = std::chrono::steady_clock;
using value_type = void;
using hw_counters = tim::component::papi_array<num_hw_counters>;
using hw_counter_data_t = typename hw_counters::value_type;
using system_clock = std::chrono::system_clock;
using system_time_point = typename system_clock::time_point;
static void preinit();
static std::string label();
static std::string description();
backtrace() = default;
~backtrace() = default;
backtrace(backtrace&&) = default;
backtrace(const backtrace&) = default;
backtrace() = default;
~backtrace() = default;
backtrace(const backtrace&) = default;
backtrace(backtrace&&) noexcept = default;
backtrace& operator=(const backtrace&) = default;
backtrace& operator=(backtrace&&) = default;
backtrace& operator=(backtrace&&) noexcept = default;
bool operator<(const backtrace& rhs) const;
static std::vector<std::string> filter_and_patch(const std::vector<std::string>&);
static std::set<int> configure(bool, int64_t _tid = threading::get_id());
static void post_process(int64_t _tid = threading::get_id());
static hw_counter_data_t& get_last_hwcounters();
static void start();
static void stop();
static void start();
static void stop();
void sample(int = -1);
bool empty() const;
size_t size() const;
std::vector<std::string_view> get() const;
uint64_t get_timestamp() const;
int64_t get_thread_cpu_timestamp() const;
void sample(int = -1);
bool empty() const;
size_t size() const;
std::vector<std::string> get() const;
private:
int64_t m_tid = 0;
int64_t m_thr_cpu_ts = 0;
int64_t m_mem_peak = 0;
int64_t m_ctx_swch = 0;
int64_t m_page_flt = 0;
uint64_t m_ts = {};
size_t m_size = 0;
data_t m_data = {};
hw_counter_data_t m_hw_counter = {};
data_t m_data = {};
};
} // namespace component
} // namespace omnitrace
#if !defined(OMNITRACE_EXTERN_COMPONENTS) || \
(defined(OMNITRACE_EXTERN_COMPONENTS) && OMNITRACE_EXTERN_COMPONENTS > 0)
# include <timemory/operations.hpp>
TIMEMORY_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_wall_clock>), true,
double)
TIMEMORY_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_cpu_clock>), true,
double)
TIMEMORY_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_fraction>), true,
double)
#endif
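Review note: the header change above swaps `std::array<char[buffer_width], stack_depth>` for `tim::unwind::stack<stack_depth>`. The layout of `tim::unwind::stack` is not shown in this diff, so the sketch below assumes, purely for illustration, that each frame reduces to one pointer-sized address and that the depth stays at the old value of 128. Under those assumptions it compares the per-sample storage of the two representations.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t buffer_width = 512;  // from the old header
constexpr std::size_t stack_depth  = 128;  // old value; the new depth is a macro

// Old representation: one fixed-width character buffer per frame.
using string_frames = std::array<char[buffer_width], stack_depth>;
// Assumed stand-in for the new representation: one address per frame.
using address_frames = std::array<std::uintptr_t, stack_depth>;

static_assert(sizeof(string_frames) == buffer_width * stack_depth,
              "string frames occupy 64 KiB per sample");
static_assert(sizeof(address_frames) == sizeof(std::uintptr_t) * stack_depth,
              "address frames occupy one pointer per frame");

// How many times smaller the address-based sample is.
constexpr std::size_t
size_ratio()
{
    return sizeof(string_frames) / sizeof(address_frames);
}

// Bytes no longer copied per sample under the stated assumptions.
inline std::size_t
bytes_saved_per_sample()
{
    assert(sizeof(string_frames) > sizeof(address_frames));
    return sizeof(string_frames) - sizeof(address_frames);
}
```

The real saving depends on what `tim::unwind::stack` stores per entry, so treat the ratio as an order-of-magnitude indication only.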
@@ -0,0 +1,316 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include "library/components/backtrace_metrics.hpp"
#include "library/components/ensure_storage.hpp"
#include "library/components/fwd.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/perfetto.hpp"
#include "library/ptl.hpp"
#include "library/runtime.hpp"
#include "library/sampling.hpp"
#include "library/thread_info.hpp"
#include "library/tracing.hpp"
#include <timemory/backends/papi.hpp>
#include <timemory/backends/threading.hpp>
#include <timemory/components/data_tracker/components.hpp>
#include <timemory/components/macros.hpp>
#include <timemory/components/papi/extern.hpp>
#include <timemory/components/papi/papi_array.hpp>
#include <timemory/components/papi/papi_vector.hpp>
#include <timemory/components/rusage/components.hpp>
#include <timemory/components/rusage/types.hpp>
#include <timemory/components/timing/backends.hpp>
#include <timemory/components/trip_count/extern.hpp>
#include <timemory/macros.hpp>
#include <timemory/math.hpp>
#include <timemory/mpl.hpp>
#include <timemory/mpl/quirks.hpp>
#include <timemory/mpl/type_traits.hpp>
#include <timemory/operations.hpp>
#include <timemory/storage.hpp>
#include <timemory/units.hpp>
#include <timemory/utility/backtrace.hpp>
#include <timemory/utility/demangle.hpp>
#include <timemory/utility/types.hpp>
#include <timemory/variadic.hpp>
#include <array>
#include <cstring>
#include <ctime>
#include <initializer_list>
#include <mutex>
#include <regex>
#include <sstream>
#include <string>
#include <string_view>
#include <type_traits>
#include <vector>
#include <pthread.h>
#include <signal.h>
namespace tracing
{
using namespace ::omnitrace::tracing;
}
namespace omnitrace
{
namespace component
{
using hw_counters = typename backtrace_metrics::hw_counters;
using signal_type_instances = thread_data<std::set<int>, category::sampling>;
using backtrace_metrics_init_instances =
thread_data<backtrace_metrics, category::sampling>;
using sampler_running_instances = thread_data<bool, category::sampling>;
using papi_vector_instances = thread_data<hw_counters, category::sampling>;
using papi_label_instances = thread_data<std::vector<std::string>, category::sampling>;
namespace
{
struct perfetto_rusage
{};
unique_ptr_t<std::vector<std::string>>&
get_papi_labels(int64_t _tid)
{
static auto& _v =
papi_label_instances::instances(papi_label_instances::construct_on_init{});
return _v.at(_tid);
}
unique_ptr_t<hw_counters>&
get_papi_vector(int64_t _tid)
{
static auto& _v = papi_vector_instances::instances();
if(_tid == threading::get_id()) papi_vector_instances::construct();
return _v.at(_tid);
}
unique_ptr_t<backtrace_metrics>&
get_backtrace_metrics_init(int64_t _tid)
{
static auto& _v = backtrace_metrics_init_instances::instances();
return _v.at(_tid);
}
unique_ptr_t<bool>&
get_sampler_running(int64_t _tid)
{
static auto& _v = sampler_running_instances::instances(
sampler_running_instances::construct_on_init{}, false);
return _v.at(_tid);
}
} // namespace
std::string
backtrace_metrics::label()
{
return "backtrace_metrics";
}
std::string
backtrace_metrics::description()
{
return "Records sampling data";
}
void
backtrace_metrics::start()
{}
void
backtrace_metrics::stop()
{}
void
backtrace_metrics::sample(int)
{
auto _tid = threading::get_id();
auto _cache = tim::rusage_cache{ RUSAGE_THREAD };
m_cpu = tim::get_clock_thread_now<int64_t, std::nano>();
m_mem_peak = _cache.get_peak_rss();
m_ctx_swch = _cache.get_num_priority_context_switch() +
_cache.get_num_voluntary_context_switch();
m_page_flt = _cache.get_num_major_page_faults() + _cache.get_num_minor_page_faults();
if constexpr(tim::trait::is_available<hw_counters>::value)
{
if(tim::trait::runtime_enabled<hw_counters>::get())
{
assert(get_papi_vector(_tid).get() != nullptr);
m_hw_counter = get_papi_vector(_tid)->record();
}
}
}
void
backtrace_metrics::configure(bool _setup, int64_t _tid)
{
auto& _running = get_sampler_running(_tid);
bool _is_running = (!_running) ? false : *_running;
ensure_storage<comp::trip_count, sampling_wall_clock, sampling_cpu_clock, hw_counters,
sampling_percent>{}();
if(_setup && !_is_running)
{
(void) get_debug_sampling(); // make sure query in sampler does not allocate
assert(_tid == threading::get_id());
if constexpr(tim::trait::is_available<hw_counters>::value)
{
perfetto_counter_track<hw_counters>::init();
OMNITRACE_DEBUG("HW COUNTER: starting...\n");
if(get_papi_vector(_tid))
{
using common_type_t = typename hw_counters::common_type;
get_papi_vector(_tid)->start();
*get_papi_labels(_tid) = comp::papi_common::get_events<common_type_t>();
}
}
}
else if(!_setup && _is_running)
{
OMNITRACE_DEBUG("Destroying sampler for thread %lu...\n", _tid);
*_running = false;
if constexpr(tim::trait::is_available<hw_counters>::value)
{
if(_tid == threading::get_id())
{
if(get_papi_vector(_tid)) get_papi_vector(_tid)->stop();
OMNITRACE_DEBUG("HW COUNTER: stopped...\n");
}
}
OMNITRACE_DEBUG("Sampler destroyed for thread %lu\n", _tid);
}
}
void
backtrace_metrics::init_perfetto(int64_t _tid)
{
auto _hw_cnt_labels = *get_papi_labels(_tid);
auto _tid_name = JOIN("", '[', _tid, ']');
if(!perfetto_counter_track<perfetto_rusage>::exists(_tid))
{
perfetto_counter_track<perfetto_rusage>::emplace(
_tid, JOIN(' ', "Thread Peak Memory Usage", _tid_name, "(S)"), "MB");
perfetto_counter_track<perfetto_rusage>::emplace(
_tid, JOIN(' ', "Thread Context Switches", _tid_name, "(S)"));
perfetto_counter_track<perfetto_rusage>::emplace(
_tid, JOIN(' ', "Thread Page Faults", _tid_name, "(S)"));
}
if(!perfetto_counter_track<hw_counters>::exists(_tid) &&
tim::trait::runtime_enabled<hw_counters>::get())
{
for(auto& itr : _hw_cnt_labels)
{
std::string _desc = tim::papi::get_event_info(itr).short_descr;
if(_desc.empty()) _desc = itr;
OMNITRACE_CI_THROW(_desc.empty(), "Empty description for %s\n", itr.c_str());
perfetto_counter_track<hw_counters>::emplace(
_tid, JOIN(' ', "Thread", _desc, _tid_name, "(S)"));
}
}
}
void
backtrace_metrics::fini_perfetto(int64_t _tid)
{
auto _hw_cnt_labels = *get_papi_labels(_tid);
const auto& _thread_info = thread_info::get(_tid, InternalTID);
OMNITRACE_CI_THROW(!_thread_info, "Error! missing thread info for tid=%li\n", _tid);
if(!_thread_info) return;
uint64_t _ts = _thread_info->get_stop();
TRACE_COUNTER("thread_peak_memory",
perfetto_counter_track<perfetto_rusage>::at(_tid, 0), _ts, 0);
TRACE_COUNTER("thread_context_switch",
perfetto_counter_track<perfetto_rusage>::at(_tid, 1), _ts, 0);
TRACE_COUNTER("thread_page_fault",
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts, 0);
if(tim::trait::runtime_enabled<hw_counters>::get())
{
for(size_t i = 0; i < perfetto_counter_track<hw_counters>::size(_tid); ++i)
{
if(i < _hw_cnt_labels.size())
{
TRACE_COUNTER("hardware_counter",
perfetto_counter_track<hw_counters>::at(_tid, i), _ts, 0.0);
}
}
}
}
void
backtrace_metrics::post_process_perfetto(int64_t _tid, uint64_t _ts) const
{
TRACE_COUNTER("thread_peak_memory",
perfetto_counter_track<perfetto_rusage>::at(_tid, 0), _ts,
m_mem_peak / units::megabyte);
TRACE_COUNTER("thread_context_switch",
perfetto_counter_track<perfetto_rusage>::at(_tid, 1), _ts, m_ctx_swch);
TRACE_COUNTER("thread_page_fault",
perfetto_counter_track<perfetto_rusage>::at(_tid, 2), _ts, m_page_flt);
if(tim::trait::runtime_enabled<hw_counters>::get())
{
for(size_t i = 0; i < perfetto_counter_track<hw_counters>::size(_tid); ++i)
{
if(i < m_hw_counter.size())
{
TRACE_COUNTER("hardware_counter",
perfetto_counter_track<hw_counters>::at(_tid, i), _ts,
m_hw_counter.at(i));
}
}
}
}
} // namespace component
} // namespace omnitrace
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_wall_clock>), true,
double)
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_cpu_clock>), true,
double)
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_fraction>), true,
double)
TIMEMORY_INITIALIZE_STORAGE(omnitrace::component::backtrace_metrics)
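Review note: `backtrace_metrics::sample` gathers per-thread CPU time, peak memory, context switches, and page faults through timemory helpers. As a rough sketch of the same quantities without timemory, the block below calls the underlying Linux interfaces directly. It assumes Linux with glibc (`RUSAGE_THREAD` is a Linux extension), and `thread_metrics` and `sample_thread_metrics` are hypothetical names, not part of this PR.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <sys/resource.h>

// Hypothetical plain-data mirror of the fields in backtrace_metrics.
struct thread_metrics
{
    std::int64_t cpu_ns       = 0;  // thread CPU time in nanoseconds
    std::int64_t peak_rss_kb  = 0;  // peak resident set size (KiB on Linux)
    std::int64_t ctx_switches = 0;  // voluntary + involuntary
    std::int64_t page_faults  = 0;  // major + minor
};

inline thread_metrics
sample_thread_metrics()
{
    thread_metrics m{};

    timespec ts{};
    if(clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) == 0)
        m.cpu_ns = static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;

    rusage ru{};
    if(getrusage(RUSAGE_THREAD, &ru) == 0)
    {
        m.peak_rss_kb  = ru.ru_maxrss;
        m.ctx_switches = ru.ru_nvcsw + ru.ru_nivcsw;
        m.page_faults  = ru.ru_majflt + ru.ru_minflt;
    }

    assert(m.cpu_ns >= 0);
    return m;
}
```

Keeping these calls in a component separate from the call-stack capture is what lets the metrics collection be switched off independently, as the PR description states.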
@@ -0,0 +1,119 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#pragma once
#include "library/common.hpp"
#include "library/components/backtrace.hpp"
#include "library/components/fwd.hpp"
#include "library/defines.hpp"
#include "library/thread_data.hpp"
#include "library/timemory.hpp"
#include <timemory/components/base.hpp>
#include <timemory/components/papi/papi_array.hpp>
#include <timemory/components/papi/types.hpp>
#include <timemory/macros/language.hpp>
#include <timemory/mpl/concepts.hpp>
#include <timemory/variadic/types.hpp>
#include <array>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>
namespace omnitrace
{
namespace component
{
struct backtrace_metrics
: tim::component::empty_base
, tim::concepts::component
{
static constexpr size_t num_hw_counters = TIMEMORY_PAPI_ARRAY_SIZE;
using clock_type = std::chrono::steady_clock;
using value_type = void;
using hw_counters = tim::component::papi_array<num_hw_counters>;
using hw_counter_data_t = typename hw_counters::value_type;
using system_clock = std::chrono::system_clock;
using system_time_point = typename system_clock::time_point;
static std::string label();
static std::string description();
backtrace_metrics() = default;
~backtrace_metrics() = default;
backtrace_metrics(const backtrace_metrics&) = default;
backtrace_metrics(backtrace_metrics&&) noexcept = default;
backtrace_metrics& operator=(const backtrace_metrics&) = default;
backtrace_metrics& operator=(backtrace_metrics&&) noexcept = default;
static void configure(bool, int64_t _tid = threading::get_id());
static void init_perfetto(int64_t _tid);
static void fini_perfetto(int64_t _tid);
static void start();
static void stop();
void sample(int = -1);
void post_process(int64_t _tid, const backtrace* _bt,
const backtrace_metrics* _last) const;
auto get_cpu_timestamp() const { return m_cpu; }
auto get_peak_memory() const { return m_mem_peak; }
auto get_context_switches() const { return m_ctx_swch; }
auto get_page_faults() const { return m_page_flt; }
const auto& get_hw_counters() const { return m_hw_counter; }
void post_process_perfetto(int64_t _tid, uint64_t _ts) const;
private:
int64_t m_cpu = 0;
int64_t m_mem_peak = 0;
int64_t m_ctx_swch = 0;
int64_t m_page_flt = 0;
hw_counter_data_t m_hw_counter = {};
};
} // namespace component
} // namespace omnitrace
#if !defined(OMNITRACE_EXTERN_COMPONENTS) || \
(defined(OMNITRACE_EXTERN_COMPONENTS) && OMNITRACE_EXTERN_COMPONENTS > 0)
# include <timemory/operations.hpp>
OMNITRACE_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_wall_clock>), true,
double)
OMNITRACE_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_cpu_clock>), true,
double)
OMNITRACE_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_fraction>), true,
double)
#endif
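Review note: `backtrace_metrics::post_process` receives the previous sample as `_last`, but its body is not part of this excerpt. One plausible use, assumed here rather than taken from the PR, is turning cumulative hardware-counter readings into per-interval values. The helper `counter_delta` below is hypothetical and only sketches that idea.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper: per-interval counter values from two cumulative readings.
// If a counter appears to have gone backwards (e.g. it was reset), the current
// reading is kept as-is instead of producing a negative or wrapped value.
template <typename Tp, std::size_t N>
std::array<Tp, N>
counter_delta(const std::array<Tp, N>& curr, const std::array<Tp, N>& prev)
{
    auto out = curr;
    for(std::size_t k = 0; k < N; ++k)
    {
        if(curr[k] >= prev[k]) out[k] = curr[k] - prev[k];
    }
    for(std::size_t k = 0; k < N; ++k)
        assert(out[k] <= curr[k] || curr[k] < prev[k]);
    return out;
}
```

The guard direction matters: subtracting only when the current reading is at least the previous one keeps every delta non-negative for non-negative inputs.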
@@ -20,33 +20,35 @@
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include "library/components/user_region.hpp"
#include "library/api.hpp"
#include "library/components/fwd.hpp"
#include "library/components/backtrace_timestamp.hpp"
#include "library/thread_info.hpp"
#include <timemory/components/timing/backends.hpp>
namespace omnitrace
{
namespace component
{
void
user_region::start()
bool
backtrace_timestamp::operator<(const backtrace_timestamp& rhs) const
{
if(m_prefix) omnitrace_push_region_hidden(m_prefix);
return std::tie(m_tid, m_real) < std::tie(rhs.m_tid, rhs.m_real);
}
bool
backtrace_timestamp::is_valid() const
{
const auto& _info = thread_info::get(m_tid, InternalTID);
return (_info) ? _info->is_valid_time(m_real) : false;
}
void
user_region::stop()
backtrace_timestamp::sample(int)
{
if(m_prefix) omnitrace_pop_region_hidden(m_prefix);
}
void
user_region::set_prefix(const char* _prefix)
{
m_prefix = _prefix;
m_tid = tim::threading::get_id();
m_real = tim::get_clock_real_now<uint64_t, std::nano>();
}
} // namespace component
} // namespace omnitrace
TIMEMORY_INITIALIZE_STORAGE(omnitrace::component::user_region)
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(omnitrace_user_region, false, void)
TIMEMORY_INITIALIZE_STORAGE(omnitrace::component::backtrace_timestamp)
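Review note: `backtrace_timestamp::operator<` compares `(m_tid, m_real)` lexicographically through `std::tie`, so sorting groups samples by thread and orders them by wall-clock time within each thread. The standalone sketch below shows that behaviour. `timestamp_sample` and `sorted_samples` are hypothetical stand-ins with public fields, not the component itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for backtrace_timestamp with public members.
struct timestamp_sample
{
    std::int64_t  tid  = 0;  // internal thread id
    std::uint64_t real = 0;  // wall-clock timestamp in nanoseconds

    bool operator<(const timestamp_sample& rhs) const
    {
        // lexicographic: thread id first, then timestamp
        return std::tie(tid, real) < std::tie(rhs.tid, rhs.real);
    }
};

inline std::vector<timestamp_sample>
sorted_samples(std::vector<timestamp_sample> v)
{
    std::sort(v.begin(), v.end());
    assert(std::is_sorted(v.begin(), v.end()));
    return v;
}
```

Because the thread id is the leading key, samples from different threads never interleave after sorting, which suits per-thread post-processing.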
@@ -22,35 +22,53 @@
#pragma once
#include "library/common.hpp"
#include "library/components/fwd.hpp"
#include "library/defines.hpp"
#include "library/timemory.hpp"
#include <timemory/components/base.hpp>
#include <timemory/macros/language.hpp>
#include <timemory/mpl/concepts.hpp>
#include <chrono>
#include <cstdint>
namespace omnitrace
{
namespace component
{
// timemory component which calls omnitrace functions
// (used in gotcha wrappers)
struct user_region : comp::base<user_region, void>
struct backtrace_timestamp
: tim::component::empty_base
, tim::concepts::component
{
static std::string label() { return "user_region"; }
void start();
void stop();
void set_prefix(const char*);
using value_type = void;
static std::string label() { return "backtrace_timestamp"; }
static std::string description() { return "Timestamp for backtrace"; }
backtrace_timestamp() = default;
~backtrace_timestamp() = default;
backtrace_timestamp(const backtrace_timestamp&) = default;
backtrace_timestamp(backtrace_timestamp&&) noexcept = default;
backtrace_timestamp& operator=(const backtrace_timestamp&) = default;
backtrace_timestamp& operator=(backtrace_timestamp&&) noexcept = default;
bool operator<(const backtrace_timestamp& rhs) const;
static void start() {}
static void stop() {}
void sample(int = -1);
auto get_tid() const { return m_tid; }
auto get_timestamp() const { return m_real; }
bool is_valid() const;
private:
const char* m_prefix = nullptr;
int64_t m_tid = 0;
uint64_t m_real = 0;
};
} // namespace component
} // namespace omnitrace
TIMEMORY_COMPONENT_ALIAS(omnitrace_user_region, omnitrace::component::user_region)
#if !defined(OMNITRACE_EXTERN_COMPONENTS) || \
(defined(OMNITRACE_EXTERN_COMPONENTS) && OMNITRACE_EXTERN_COMPONENTS > 0)
# include <timemory/operations.hpp>
TIMEMORY_DECLARE_EXTERN_COMPONENT(omnitrace_user_region, false, void)
#endif
@@ -23,7 +23,9 @@
#pragma once
#include "library/config.hpp"
#include "library/critical_trace.hpp"
#include "library/defines.hpp"
#include "library/runtime.hpp"
#include "library/timemory.hpp"
#include "library/tracing.hpp"
@@ -48,8 +50,6 @@ struct timemory : concepts::quirk_type
namespace omnitrace
{
namespace audit = ::tim::audit;
namespace component
{
// timemory component which calls omnitrace functions
@@ -90,36 +90,23 @@ template <typename... OptsT, typename... Args>
void
category_region<CategoryT>::start(std::string_view name, Args&&... args)
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
// unconditionally return if thread is disabled or finalized
if(get_thread_state() == ThreadState::Disabled) return;
if(get_state() == State::Finalized) return;
// unconditionally return if finalized
if(get_state() == State::Finalized)
{
OMNITRACE_CONDITIONAL_BASIC_PRINT(
tracing::debug_user, "omnitrace_push_region(%s) called during finalization\n",
name.data());
return;
}
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
// the expectation here is that if the state is not active then the call
// to omnitrace_init_tooling_hidden will activate all the appropriate
// tooling one time and as it exits set it to active and return true.
if(get_state() != State::Active && !omnitrace_init_tooling_hidden())
{
static auto _debug = get_debug_env() || get_debug_init();
OMNITRACE_CONDITIONAL_BASIC_PRINT(
_debug, "[%s] omnitrace_push_region(%s) ignored :: not active. state = %s\n",
category_name, name.data(), std::to_string(get_state()).c_str());
return;
}
if(get_state() != State::Active && !omnitrace_init_tooling_hidden()) return;
OMNITRACE_CONDITIONAL_PRINT(tracing::debug_push,
"[%s][PID=%i][state=%s] omnitrace_push_region(%s)\n",
category_name, process::get_id(),
std::to_string(get_state()).c_str(), name.data());
tracing::thread_init();
auto _use_timemory = get_use_timemory();
auto _use_perfetto = get_use_perfetto();
// thread initialization may have disabled the thread
if(get_thread_state() == ThreadState::Disabled) return;
tracing::thread_init_sampling();
constexpr bool _ct_use_timemory =
(sizeof...(OptsT) == 0 ||
@@ -129,23 +116,49 @@ category_region<CategoryT>::start(std::string_view name, Args&&... args)
(sizeof...(OptsT) == 0 ||
tim::is_one_of<quirk::perfetto, tim::type_list<OptsT...>>::value);
if(_use_timemory || _use_perfetto) tracing::thread_init();
OMNITRACE_CONDITIONAL_PRINT(tracing::debug_push,
"[%s][PID=%i][state=%s] omnitrace_push_region(%s)\n",
category_name, process::get_id(),
std::to_string(get_state()).c_str(), name.data());
if(_use_perfetto)
if constexpr(tim::is_one_of<CategoryT, tim::type_list<category::host>>::value)
{
if constexpr(_ct_use_perfetto)
++tracing::push_count();
}
if constexpr(_ct_use_perfetto)
{
if(get_use_perfetto())
{
tracing::push_perfetto(CategoryT{}, name.data(), std::forward<Args>(args)...);
}
}
if(_use_timemory)
if constexpr(_ct_use_timemory)
{
if constexpr(_ct_use_timemory)
if(get_use_timemory())
{
tracing::push_timemory(name.data(), std::forward<Args>(args)...);
tracing::push_timemory(CategoryT{}, name.data(), std::forward<Args>(args)...);
}
}
if constexpr(tim::is_one_of<CategoryT, tim::type_list<category::host>>::value)
{
using Device = critical_trace::Device;
using Phase = critical_trace::Phase;
if(get_use_critical_trace())
{
uint64_t _cid = 0;
uint64_t _parent_cid = 0;
uint32_t _depth = 0;
std::tie(_cid, _parent_cid, _depth) = create_cpu_cid_entry();
auto _ts = comp::wall_clock::record();
add_critical_trace<Device::CPU, Phase::BEGIN>(
threading::get_id(), _cid, 0, _parent_cid, _ts, 0, 0, 0,
critical_trace::add_hash_id(name.data()), _depth);
}
}
if(_use_timemory || _use_perfetto) tracing::thread_init_sampling();
}
template <typename CategoryT>
@@ -153,6 +166,8 @@ template <typename... OptsT, typename... Args>
void
category_region<CategoryT>::stop(std::string_view name, Args&&... args)
{
if(get_thread_state() == ThreadState::Disabled) return;
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
constexpr bool _ct_use_timemory =
@@ -171,21 +186,52 @@ category_region<CategoryT>::stop(std::string_view name, Args&&... args)
// only execute when active
if(get_state() == State::Active)
{
if(get_use_timemory())
if constexpr(tim::is_one_of<CategoryT, tim::type_list<category::host>>::value)
{
if constexpr(_ct_use_timemory)
++tracing::pop_count();
}
if constexpr(_ct_use_timemory)
{
if(get_use_timemory())
{
tracing::pop_timemory(name.data(), std::forward<Args>(args)...);
tracing::pop_timemory(CategoryT{}, name.data(),
std::forward<Args>(args)...);
}
}
if(get_use_perfetto())
if constexpr(_ct_use_perfetto)
{
if constexpr(_ct_use_perfetto)
if(get_use_perfetto())
{
tracing::pop_perfetto(CategoryT{}, name.data(),
std::forward<Args>(args)...);
}
}
if constexpr(tim::is_one_of<CategoryT, tim::type_list<category::host>>::value)
{
using Device = critical_trace::Device;
using Phase = critical_trace::Phase;
if(get_use_critical_trace())
{
if(get_cpu_cid_stack() && !get_cpu_cid_stack()->empty())
{
auto _cid = get_cpu_cid_stack()->back();
if(get_cpu_cid_parents()->find(_cid) != get_cpu_cid_parents()->end())
{
uint64_t _parent_cid = 0;
uint32_t _depth = 0;
auto _ts = comp::wall_clock::record();
std::tie(_parent_cid, _depth) = get_cpu_cid_parents()->at(_cid);
add_critical_trace<Device::CPU, Phase::END>(
threading::get_id(), _cid, 0, _parent_cid, _ts, _ts, 0, 0,
critical_trace::add_hash_id(name.data()), _depth);
}
}
}
}
}
else
{
@@ -249,5 +295,51 @@ category_region<CategoryT>::audit(quirk::config<OptsT...>, Args&&... _args)
{
audit<OptsT...>(std::forward<Args>(_args)...);
}
template <typename CategoryT>
struct local_category_region : comp::base<local_category_region<CategoryT>, void>
{
using impl_type = category_region<CategoryT>;
static constexpr auto category_name = impl_type::category_name;
static std::string label() { return impl_type::label(); }
template <typename... OptsT, typename... Args>
auto start(Args&&... args)
{
if(m_prefix.empty()) return;
return impl_type::template start<OptsT...>(m_prefix, std::forward<Args>(args)...);
}
template <typename... OptsT, typename... Args>
auto stop(Args&&... args)
{
if(m_prefix.empty()) return;
return impl_type::template stop<OptsT...>(m_prefix, std::forward<Args>(args)...);
}
template <typename... OptsT, typename... Args>
auto audit(Args&&... args)
-> decltype(impl_type::template audit<OptsT...>(std::declval<std::string_view>(),
std::forward<Args>(args)...))
{
if(m_prefix.empty()) return;
return impl_type::template audit<OptsT...>(m_prefix, std::forward<Args>(args)...);
}
template <typename... OptsT, typename... Args>
auto audit(quirk::config<OptsT...>, Args&&... args)
{
if(m_prefix.empty()) return;
return impl_type::template audit<OptsT...>(quirk::config<OptsT...>{}, m_prefix,
std::forward<Args>(args)...);
}
void set_prefix(std::string_view _v) { m_prefix = _v; }
private:
std::string_view m_prefix = {};
};
} // namespace component
} // namespace omnitrace
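The reworked `start`/`stop` above nest a runtime check (`get_use_perfetto()`, `get_use_timemory()`) inside a compile-time `if constexpr` check on the option pack. A minimal sketch of that pattern follows. The `type_list`, `is_one_of`, `timemory_quirk`, `perfetto_quirk`, `backend_selected`, and `count_pushes` names are hypothetical stand-ins written for this illustration, not the timemory or omnitrace definitions, and the sketch assumes C++17.

```cpp
#include <type_traits>

// Minimal stand-ins for tim::type_list and tim::is_one_of (assumptions for this sketch)
template <typename... Tp>
struct type_list
{};

template <typename Tp, typename ListT>
struct is_one_of : std::false_type
{};

template <typename Tp, typename... Up>
struct is_one_of<Tp, type_list<Up...>>
: std::bool_constant<(std::is_same_v<Tp, Up> || ...)>
{};

// Hypothetical tags standing in for quirk::timemory and quirk::perfetto
struct timemory_quirk
{};
struct perfetto_quirk
{};

// A backend is compiled in when no options are given or when its tag is listed
template <typename BackendT, typename... OptsT>
constexpr bool
backend_selected()
{
    return sizeof...(OptsT) == 0 || is_one_of<BackendT, type_list<OptsT...>>::value;
}

// Counts the backends that would receive a push. The bool arguments mimic the
// runtime settings; the if constexpr branches mimic the compile-time options.
template <typename... OptsT>
int
count_pushes([[maybe_unused]] bool use_timemory, [[maybe_unused]] bool use_perfetto)
{
    int n = 0;
    if constexpr(backend_selected<perfetto_quirk, OptsT...>())
    {
        if(use_perfetto) ++n;
    }
    if constexpr(backend_selected<timemory_quirk, OptsT...>())
    {
        if(use_timemory) ++n;
    }
    return n;
}
```

With this ordering, a backend excluded by the option pack generates no code at all, and the runtime setting is only consulted for backends that were compiled in.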
@@ -31,7 +31,7 @@
#include <timemory/units.hpp>
#include <timemory/utility/locking.hpp>
namespace tim
namespace omnitrace
{
namespace component
{
@@ -415,9 +415,9 @@ comm_data::audit(const gotcha_data& _data, audit::incoming, const void*, const v
}
#endif
} // namespace component
} // namespace tim
} // namespace omnitrace
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<float, tim::api::omnitrace>), true, float)
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<float, tim::project::omnitrace>), true, float)
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(comm_data, false, void)
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(comm_data, false, void)
@@ -30,6 +30,7 @@
#include "library/timemory.hpp"
#include <timemory/api/macros.hpp>
#include <timemory/components/gotcha/backends.hpp>
#include <timemory/components/macros.hpp>
#include <timemory/operations/types/set.hpp>
#include <timemory/utility/types.hpp>
@@ -55,11 +56,14 @@
#include <string>
#include <utility>
namespace tim
OMNITRACE_COMPONENT_ALIAS(comm_data_tracker_t,
::tim::component::data_tracker<float, project::omnitrace>)
namespace omnitrace
{
namespace component
{
using comm_data_tracker_t = data_tracker<float, api::omnitrace>;
using gotcha_data = ::tim::component::gotcha_data;
struct comm_data : base<comm_data, void>
{
@@ -231,7 +235,7 @@ private:
}
};
} // namespace component
} // namespace tim
} // namespace omnitrace
#if !defined(OMNITRACE_EXTERN_COMPONENTS) || \
(defined(OMNITRACE_EXTERN_COMPONENTS) && OMNITRACE_EXTERN_COMPONENTS > 0)
@@ -240,8 +244,8 @@ private:
# include <timemory/components/data_tracker/components.hpp>
# include <timemory/operations.hpp>
TIMEMORY_DECLARE_EXTERN_COMPONENT(TIMEMORY_ESC(data_tracker<float, tim::api::omnitrace>),
true, float)
OMNITRACE_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<float, tim::project::omnitrace>), true, float)
TIMEMORY_DECLARE_EXTERN_COMPONENT(comm_data, false, void)
OMNITRACE_DECLARE_EXTERN_COMPONENT(comm_data, false, void)
#endif
@@ -0,0 +1,221 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include "library/components/cpu_freq.hpp"
#include "library/common.hpp"
#include "library/components/fwd.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
#include "library/perfetto.hpp"
#include "library/timemory.hpp"
#include <timemory/components/macros.hpp>
#include <timemory/components/rusage/backends.hpp>
#include <timemory/mpl/types.hpp>
#include <timemory/units.hpp>
#include <timemory/utility/procfs/cpuinfo.hpp>
#include <timemory/utility/type_list.hpp>
namespace cpuinfo = tim::procfs::cpuinfo;
namespace omnitrace
{
namespace component
{
cpu_freq::cpu_id_set_t&
cpu_freq::get_enabled_cpus()
{
static auto _v = cpu_id_set_t{};
return _v;
}
std::string
cpu_freq::label()
{
return "cpu_freq";
}
std::string
cpu_freq::description()
{
return "Records the current CPU frequencies";
}
int64_t
cpu_freq::unit()
{
return tim::units::MHz;
}
std::string
cpu_freq::display_unit()
{
return tim::units::freq_repr(unit());
}
void
cpu_freq::configure()
{
auto _ncpu = cpuinfo::freq::size();
auto _enabled_freqs = std::set<uint64_t>{};
auto _enabled_val = get_sampling_cpus();
for(auto& itr : _enabled_val)
itr = tolower(itr);
if(_enabled_val == "off")
_enabled_val = "none";
else if(_enabled_val == "on")
_enabled_val = "all";
if(_enabled_val != "none" && _enabled_val != "all")
{
auto _enabled = tim::delimit(_enabled_val, ",; \t");
if(_enabled.empty())
{
for(size_t i = 0; i < _ncpu; ++i)
_enabled_freqs.emplace(i);
}
for(auto&& _v : _enabled)
{
if(_v.find_first_not_of("0123456789-") != std::string::npos)
{
OMNITRACE_VERBOSE_F(
0,
"Invalid CPU specification. Only numerical values (e.g., 0) or "
"ranges (e.g., 0-7) are permitted. Ignoring %s...",
_v.c_str());
continue;
}
if(_v.find('-') != std::string::npos)
{
auto _vv = tim::delimit(_v, "-");
OMNITRACE_CONDITIONAL_THROW(
_vv.size() != 2,
"Invalid CPU range specification: %s. Required format N-M, e.g. 0-4",
_v.c_str());
for(size_t i = std::stoull(_vv.at(0)); i <= std::stoull(_vv.at(1)); ++i)
_enabled_freqs.emplace(i);
}
else
{
_enabled_freqs.emplace(std::stoull(_v));
}
}
}
else if(_enabled_val == "all")
{
for(size_t i = 0; i < _ncpu; ++i)
_enabled_freqs.emplace(i);
}
else if(_enabled_val == "none")
{
_enabled_freqs.clear();
}
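// erase any CPU index that exceeds the number of CPUs reported by cpuinfo
for(auto itr = _enabled_freqs.begin(); itr != _enabled_freqs.end();)
{
if(*itr < cpuinfo::freq::size())
++itr;
else
{
OMNITRACE_VERBOSE(
0, "[cpu_freq::config] Warning! Removing invalid cpu %zu...\n",
static_cast<size_t>(*itr));
itr = _enabled_freqs.erase(itr);
}
}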
if(!cpuinfo::freq{})
{
OMNITRACE_VERBOSE(0, "[cpu_freq::config] Warning! CPU frequencies are disabled "
":: unable to open /proc/cpuinfo");
_enabled_freqs.clear();
}
OMNITRACE_CI_FAIL(!cpuinfo::freq{}, "[cpu_freq::config] CPU frequencies are disabled "
":: unable to open /proc/cpuinfo");
get_enabled_cpus() = _enabled_freqs;
}
std::string
cpu_freq::as_string() const
{
return tim::operation::base_printer<cpu_freq>{}(std::stringstream{}, *this).str();
}
cpu_freq::value_type
cpu_freq::record()
{
auto& enabled_cpu_freqs = get_enabled_cpus();
std::vector<uint64_t> _freqs{};
if(!enabled_cpu_freqs.empty())
{
_freqs.reserve(enabled_cpu_freqs.size());
auto&& _freq = cpuinfo::freq{};
for(const auto& itr : enabled_cpu_freqs)
{
_freqs.emplace_back(_freq(itr) * tim::units::MHz);
}
}
return _freqs;
}
void
cpu_freq::start()
{
value = record();
}
void
cpu_freq::stop()
{
using namespace tim::stl;
value = (record() - value);
}
cpu_freq&
cpu_freq::sample()
{
value = record();
return *this;
}
float
cpu_freq::at(size_t _idx, int64_t _unit) const
{
return (value.at(_idx) / static_cast<float>(_unit));
}
std::vector<float>
cpu_freq::get(int64_t _unit) const
{
std::vector<float> _v{};
_v.reserve(value.size());
for(const auto& itr : value)
_v.emplace_back(itr / static_cast<float>(_unit));
return _v;
}
} // namespace component
} // namespace omnitrace
TIMEMORY_INITIALIZE_STORAGE(omnitrace::component::cpu_freq)
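`cpu_freq::configure()` accepts a CPU specification made of single indices and `N-M` ranges. The standalone sketch below illustrates that kind of parsing using only the standard library. `parse_cpu_spec` is a hypothetical helper, not part of omnitrace. As simplifying assumptions, it splits on commas only (the source also splits on `;`, space, and tab), silently skips malformed entries instead of logging or throwing, and drops indices at or above `ncpu`.

```cpp
#include <cstdint>
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: expands a spec such as "0-2,5" into {0, 1, 2, 5}.
// Indices >= ncpu are dropped; malformed entries are skipped.
inline std::set<uint64_t>
parse_cpu_spec(const std::string& spec, uint64_t ncpu)
{
    std::set<uint64_t> cpus{};
    std::stringstream  ss{ spec };
    std::string        token{};
    while(std::getline(ss, token, ','))
    {
        // only digits and '-' are permitted in an entry
        if(token.empty() || token.find_first_not_of("0123456789-") != std::string::npos)
            continue;

        auto dash = token.find('-');
        if(dash == std::string::npos)
        {
            // single index
            auto idx = std::stoull(token);
            if(idx < ncpu) cpus.emplace(idx);
            continue;
        }

        // range: exactly one '-' with digits on both sides
        auto lhs = token.substr(0, dash);
        auto rhs = token.substr(dash + 1);
        if(lhs.empty() || rhs.empty() || rhs.find('-') != std::string::npos) continue;

        auto beg = std::stoull(lhs);
        auto end = std::stoull(rhs);
        for(uint64_t i = beg; i <= end && i < ncpu; ++i)
            cpus.emplace(i);
    }
    return cpus;
}
```

For example, `parse_cpu_spec("6-9", 8)` keeps only CPUs 6 and 7 because the upper part of the range exceeds the CPU count.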
@@ -0,0 +1,113 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#pragma once
#include "library/common.hpp"
#include "library/defines.hpp"
#include "library/timemory.hpp"
#include <timemory/mpl/concepts.hpp>
#include <timemory/units.hpp>
namespace omnitrace
{
namespace component
{
struct cpu_freq
: tim::concepts::component
, tim::component::empty_base
, tim::component::base_format<cpu_freq>
, tim::component::base_data<std::vector<uint64_t>, 1>
{
using base_type = tim::component::empty_base;
using this_type = cpu_freq;
using value_type = std::vector<uint64_t>;
using storage_type = tim::storage<cpu_freq, value_type>;
using cpu_id_set_t = std::set<uint64_t>;
TIMEMORY_DEFAULT_OBJECT(cpu_freq)
// string id for component
static std::string label();
static std::string description();
static int64_t unit();
static std::string display_unit();
static void configure();
static cpu_id_set_t& get_enabled_cpus();
static value_type record();
// start() records the current frequencies; stop() stores the change since start()
void start();
void stop();
cpu_freq& sample();
std::string as_string() const;
float at(size_t _idx, int64_t _unit = unit()) const;
std::vector<float> get(int64_t _unit = unit()) const;
public:
static auto get_label() { return label(); }
static auto get_description() { return description(); }
static auto get_unit() { return unit(); }
static auto get_display_unit() { return display_unit(); }
static int64_t get_laps() { return 0; }
static storage_type* get_storage() { return nullptr; }
auto get_display() const { return as_string(); }
friend std::ostream& operator<<(std::ostream& _os, const cpu_freq& _v)
{
return (_os << _v.as_string());
}
template <typename ArchiveT>
void serialize(ArchiveT& _ar, const unsigned _version)
{
if constexpr(tim::concepts::is_output_archive<ArchiveT>::value)
operation::serialization<cpu_freq>{}(*this, _ar, _version);
else
_ar(tim::cereal::make_nvp("value", value));
}
this_type& operator+=(const this_type& _rhs)
{
using namespace tim::stl;
value += _rhs.value;
return *this;
}
this_type& operator-=(const this_type& _rhs)
{
using namespace tim::stl;
value -= _rhs.value;
return *this;
}
private:
using tim::component::base_data<value_type, 1>::value;
};
} // namespace component
} // namespace omnitrace
OMNITRACE_DEFINE_NAME_TRAIT("cpu_freq", omnitrace::component::cpu_freq);
@@ -0,0 +1,69 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#pragma once
#include "library/defines.hpp"
#include <timemory/backends/threading.hpp>
#include <timemory/mpl/type_traits.hpp>
#include <timemory/operations/types.hpp>
#include <timemory/utility/macros.hpp>
#include <timemory/utility/type_list.hpp>
namespace omnitrace
{
namespace component
{
namespace
{
template <typename... Tp>
struct ensure_storage
{
TIMEMORY_DEFAULT_OBJECT(ensure_storage)
void operator()() const { OMNITRACE_FOLD_EXPRESSION((*this)(tim::type_list<Tp>{})); }
private:
template <typename Up, std::enable_if_t<tim::trait::is_available<Up>::value, int> = 0>
void operator()(tim::type_list<Up>) const
{
using namespace tim;
static thread_local auto _storage = operation::get_storage<Up>{}();
static thread_local auto _tid = threading::get_id();
static thread_local auto _dtor =
scope::destructor{ []() { operation::set_storage<Up>{}(nullptr, _tid); } };
tim::operation::set_storage<Up>{}(_storage, _tid);
if(_tid == 0 && !_storage) tim::trait::runtime_enabled<Up>::set(false);
}
template <typename Up,
std::enable_if_t<!tim::trait::is_available<Up>::value, long> = 0>
void operator()(tim::type_list<Up>) const
{
tim::trait::runtime_enabled<Up>::set(false);
}
};
} // namespace
} // namespace component
} // namespace omnitrace
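`ensure_storage` selects between two `operator()` overloads based on an availability trait, giving the `enable_if_t` dummy parameters different types (`int` vs. `long`) so the two templates have distinct signatures. A reduced sketch of that dispatch is shown below. The `is_available` trait, the `cpu_metric` and `gpu_metric` types, and the `setup` function are all hypothetical names for this illustration and do not reflect the timemory trait definitions.

```cpp
#include <type_traits>

// Hypothetical availability trait: defaults to true, specialized to false below
template <typename Tp>
struct is_available : std::true_type
{};

struct cpu_metric
{};
struct gpu_metric
{};

// Assumption for this sketch: gpu_metric is compiled out
template <>
struct is_available<gpu_metric> : std::false_type
{};

// Chosen when the type is available: this is where storage would be set up
template <typename Tp, std::enable_if_t<is_available<Tp>::value, int> = 0>
bool
setup()
{
    return true;
}

// Chosen when the type is unavailable: this is where it would be disabled
template <typename Tp, std::enable_if_t<!is_available<Tp>::value, long> = 0>
bool
setup()
{
    return false;
}
```

Exactly one overload survives substitution for any given type, so `setup<cpu_metric>()` and `setup<gpu_metric>()` each resolve without ambiguity.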
@@ -29,8 +29,6 @@
#include "library/timemory.hpp"
#include <timemory/backends/threading.hpp>
#include <timemory/components/timing/wall_clock.hpp>
#include <timemory/sampling/allocator.hpp>
#include <timemory/utility/types.hpp>
#include <cstddef>
@@ -38,13 +36,15 @@
namespace omnitrace
{
namespace component
{
void
exit_gotcha::configure()
{
exit_gotcha_t::get_initializer() = []() {
exit_gotcha_t::template configure<0, void>("abort");
exit_gotcha_t::template configure<1, void, int>("exit");
exit_gotcha_t::template configure<2, void, int>("quick_exit");
exit_gotcha_t::configure<0, void>("abort");
exit_gotcha_t::configure<1, void, int>("exit");
exit_gotcha_t::configure<2, void, int>("quick_exit");
};
}
@@ -54,10 +54,29 @@ template <typename FuncT, typename... Args>
void
invoke_exit_gotcha(const exit_gotcha::gotcha_data& _data, FuncT _func, Args... _args)
{
OMNITRACE_VERBOSE(0, "%s called %s(%s)...\n", get_exe_name().c_str(),
_data.tool_id.c_str(), JOIN(", ", _args...).c_str());
if(config::settings_are_configured())
{
OMNITRACE_VERBOSE(0, "%s called %s(%s)...\n", get_exe_name().c_str(),
_data.tool_id.c_str(), JOIN(", ", _args...).c_str());
}
else
{
OMNITRACE_BASIC_VERBOSE(0, "%s called %s(%s)...\n", get_exe_name().c_str(),
_data.tool_id.c_str(), JOIN(", ", _args...).c_str());
}
if(get_state() != omnitrace::State::Finalized) omnitrace_finalize_hidden();
if(get_state() != State::Finalized) omnitrace_finalize_hidden();
if(config::settings_are_configured())
{
OMNITRACE_VERBOSE(0, "%s called %s(%s)...\n", get_exe_name().c_str(),
_data.tool_id.c_str(), JOIN(", ", _args...).c_str());
}
else
{
OMNITRACE_BASIC_VERBOSE(0, "%s called %s(%s)...\n", get_exe_name().c_str(),
_data.tool_id.c_str(), JOIN(", ", _args...).c_str());
}
(*_func)(_args...);
}
@@ -77,4 +96,5 @@ exit_gotcha::operator()(const gotcha_data& _data, abort_func_t _func) const
{
invoke_exit_gotcha(_data, _func);
}
} // namespace component
} // namespace omnitrace
@@ -34,6 +34,8 @@
namespace omnitrace
{
namespace component
{
struct exit_gotcha : tim::component::base<exit_gotcha, void>
{
using gotcha_data = tim::component::gotcha_data;
@@ -49,11 +51,15 @@ struct exit_gotcha : tim::component::base<exit_gotcha, void>
static void configure();
static void shutdown();
static inline void start() {}
static inline void stop() {}
// exit
void operator()(const gotcha_data&, exit_func_t, int) const;
// abort
void operator()(const gotcha_data&, abort_func_t) const;
};
} // namespace component
using exit_gotcha_t = tim::component::gotcha<3, std::tuple<>, exit_gotcha>;
using exit_gotcha_t = tim::component::gotcha<3, std::tuple<>, component::exit_gotcha>;
} // namespace omnitrace
@@ -32,6 +32,8 @@
namespace omnitrace
{
namespace component
{
void
fork_gotcha::configure()
{
@@ -57,8 +59,9 @@ fork_gotcha::audit(const gotcha_data_t&, audit::outgoing, pid_t _pid)
if(_pid != 0)
{
OMNITRACE_VERBOSE(1, "fork() called on PID %i created PID %i\n", getppid(), _pid);
tim::settings::use_output_suffix() = true;
tim::settings::default_process_suffix() = process::get_id();
settings::use_output_suffix() = true;
settings::default_process_suffix() = process::get_id();
}
}
} // namespace component
} // namespace omnitrace
@@ -28,6 +28,8 @@
namespace omnitrace
{
namespace component
{
// this is used to wrap fork()
struct fork_gotcha : comp::base<fork_gotcha, void>
{
@@ -51,6 +53,8 @@ struct fork_gotcha : comp::base<fork_gotcha, void>
static inline void start() {}
static inline void stop() {}
};
} // namespace component
using fork_gotcha_t = comp::gotcha<4, tim::component_tuple<fork_gotcha>, api::omnitrace>;
using fork_gotcha_t =
comp::gotcha<4, tim::component_tuple<component::fork_gotcha>, project::omnitrace>;
} // namespace omnitrace
@@ -1,177 +0,0 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#pragma once
#include "library/components/fwd.hpp"
#include "library/defines.hpp"
#include "library/runtime.hpp"
#include "library/state.hpp"
#include "library/timemory.hpp"
#include <timemory/mpl/concepts.hpp>
#include <timemory/mpl/function_traits.hpp>
#include <timemory/utility/macros.hpp>
#include <type_traits>
#include <utility>
namespace omnitrace
{
namespace component
{
template <bool Cond, typename Tp>
using enable_if_t = typename std::enable_if<Cond, Tp>::type;
template <typename... Tp>
static auto get_default_functor(tim::type_list<Tp...>)
{
return [](Tp...) {};
};
// timemory component which calls omnitrace functions
// (used in gotcha wrappers)
template <typename ApiT, typename StartFuncT, typename StopFuncT>
struct functors : comp::base<functors<ApiT, StartFuncT, StopFuncT>, void>
{
using this_type = functors<ApiT, StartFuncT, StopFuncT>;
using base_type = comp::base<functors<ApiT, StartFuncT, StopFuncT>, void>;
using pair_type = std::pair<StartFuncT, StopFuncT>;
static constexpr bool begin_supports_cstr =
std::is_invocable<StartFuncT, const char*>::value;
static constexpr bool end_supports_cstr =
std::is_invocable<StopFuncT, const char*>::value;
static constexpr bool begin_supports_void = std::is_invocable<StartFuncT>::value;
static constexpr bool end_supports_void = std::is_invocable<StopFuncT>::value;
static void preinit();
static void configure(StartFuncT&& _beg, StopFuncT&& _end);
static std::string label();
template <typename... Args, enable_if_t<((sizeof...(Args) > 0) &&
std::is_invocable_v<StartFuncT, Args...>),
int> = 0>
static auto start(Args&&... _args)
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
get_functors().first(std::forward<Args>(_args)...);
}
template <typename... Args, enable_if_t<((sizeof...(Args) > 0) &&
std::is_invocable_v<StopFuncT, Args...>),
int> = 0>
static auto stop(Args&&... _args)
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
get_functors().second(std::forward<Args>(_args)...);
}
TIMEMORY_DEFAULT_OBJECT(functors)
template <typename Tp = this_type, enable_if_t<Tp::begin_supports_cstr, int> = 0>
void start()
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
get_functors().first(m_prefix);
}
template <typename Tp = this_type, enable_if_t<Tp::end_supports_cstr, int> = 0>
void stop()
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
get_functors().second(m_prefix);
}
template <typename Tp = this_type, enable_if_t<Tp::begin_supports_void, int> = 0>
void start()
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
get_functors().first();
}
template <typename Tp = this_type, enable_if_t<Tp::end_supports_void, int> = 0>
void stop()
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
get_functors().second();
}
template <typename Tp = this_type,
enable_if_t<Tp::begin_supports_cstr || Tp::end_supports_cstr, int> = 0>
void set_prefix(const char* _v)
{
m_prefix = _v;
}
private:
static bool& is_configured();
static pair_type& get_functors();
private:
const char* m_prefix = nullptr;
};
template <typename ApiT, typename StartFuncT, typename StopFuncT>
void
functors<ApiT, StartFuncT, StopFuncT>::preinit()
{
using start_args_t = typename tim::mpl::function_traits<StartFuncT>::args_type;
using stop_args_t = typename tim::mpl::function_traits<StopFuncT>::args_type;
get_functors().first = get_default_functor(start_args_t{});
get_functors().second = get_default_functor(stop_args_t{});
}
template <typename ApiT, typename StartFuncT, typename StopFuncT>
void
functors<ApiT, StartFuncT, StopFuncT>::configure(StartFuncT&& _beg, StopFuncT&& _end)
{
is_configured() = true;
get_functors().first = std::forward<StartFuncT>(_beg);
get_functors().second = std::forward<StopFuncT>(_end);
}
template <typename ApiT, typename StartFuncT, typename StopFuncT>
std::string
functors<ApiT, StartFuncT, StopFuncT>::label()
{
return trait::name<this_type>::value;
}
template <typename ApiT, typename StartFuncT, typename StopFuncT>
bool&
functors<ApiT, StartFuncT, StopFuncT>::is_configured()
{
static bool _v = false;
return _v;
}
template <typename ApiT, typename StartFuncT, typename StopFuncT>
typename functors<ApiT, StartFuncT, StopFuncT>::pair_type&
functors<ApiT, StartFuncT, StopFuncT>::get_functors()
{
static auto _v = pair_type{};
return _v;
}
} // namespace component
} // namespace omnitrace
@@ -22,11 +22,13 @@
#pragma once
#include "library/categories.hpp"
#include "library/common.hpp"
#include "library/defines.hpp"
#include <timemory/api.hpp>
#include <timemory/api/macros.hpp>
#include <timemory/components/base/types.hpp>
#include <timemory/components/data_tracker/types.hpp>
#include <timemory/components/macros.hpp>
#include <timemory/components/user_bundle/types.hpp>
@@ -37,78 +39,24 @@
#include <type_traits>
TIMEMORY_DEFINE_NS_API(project, omnitrace)
TIMEMORY_DEFINE_NS_API(category, process_sampling)
OMNITRACE_DECLARE_COMPONENT(roctracer)
OMNITRACE_DECLARE_COMPONENT(rocprofiler)
OMNITRACE_DECLARE_COMPONENT(rcclp_handle)
OMNITRACE_DECLARE_COMPONENT(comm_data)
TIMEMORY_DECLARE_COMPONENT(roctracer)
TIMEMORY_DECLARE_COMPONENT(rocprofiler)
TIMEMORY_DECLARE_COMPONENT(rcclp_handle)
TIMEMORY_COMPONENT_ALIAS(rccl_api_t, api::rccl)
TIMEMORY_COMPONENT_ALIAS(comm_data_tracker_t, data_tracker<float, api::omnitrace>)
TIMEMORY_DECLARE_COMPONENT(comm_data)
/// \struct tim::trait::name
/// \brief provides a constexpr string in ::value
TIMEMORY_DECLARE_TYPE_TRAIT(name, typename Tp)
#define TIMEMORY_DEFINE_NAME_TRAIT(NAME, ...) \
namespace tim \
{ \
namespace trait \
{ \
template <> \
struct name<__VA_ARGS__> \
{ \
static constexpr auto value = NAME; \
}; \
template <> \
struct name<type_list<__VA_ARGS__>> : name<__VA_ARGS__> \
{}; \
} \
}
TIMEMORY_DEFINE_NS_API(category, host)
TIMEMORY_DEFINE_NS_API(category, user)
TIMEMORY_DEFINE_NS_API(category, device)
TIMEMORY_DEFINE_NS_API(category, device_hip)
TIMEMORY_DEFINE_NS_API(category, device_hsa)
TIMEMORY_DEFINE_NS_API(category, rocm_hip)
TIMEMORY_DEFINE_NS_API(category, rocm_hsa)
TIMEMORY_DEFINE_NS_API(category, rocm_smi)
TIMEMORY_DEFINE_NS_API(category, rocm_roctx)
TIMEMORY_DEFINE_NS_API(category, pthread)
TIMEMORY_DEFINE_NS_API(category, kokkos)
TIMEMORY_DEFINE_NS_API(category, mpi)
TIMEMORY_DEFINE_NS_API(category, ompt)
TIMEMORY_DEFINE_NS_API(category, rccl)
TIMEMORY_DEFINE_NS_API(category, critical_trace)
TIMEMORY_DEFINE_NS_API(category, host_critical_trace)
TIMEMORY_DEFINE_NS_API(category, device_critical_trace)
TIMEMORY_DEFINE_NAME_TRAIT("host", category::host);
TIMEMORY_DEFINE_NAME_TRAIT("device", category::device);
TIMEMORY_DEFINE_NAME_TRAIT("device_hip", category::device_hip);
TIMEMORY_DEFINE_NAME_TRAIT("device_hsa", category::device_hsa);
TIMEMORY_DEFINE_NAME_TRAIT("user", category::user);
TIMEMORY_DEFINE_NAME_TRAIT("rocm_hip", category::rocm_hip);
TIMEMORY_DEFINE_NAME_TRAIT("rocm_hsa", category::rocm_hsa);
TIMEMORY_DEFINE_NAME_TRAIT("rocm_smi", category::rocm_smi);
TIMEMORY_DEFINE_NAME_TRAIT("rocm_roctx", category::rocm_roctx);
TIMEMORY_DEFINE_NAME_TRAIT("sampling", category::sampling);
TIMEMORY_DEFINE_NAME_TRAIT("thread_sampling", category::thread_sampling);
TIMEMORY_DEFINE_NAME_TRAIT("pthread", category::pthread);
TIMEMORY_DEFINE_NAME_TRAIT("kokkos", category::kokkos);
TIMEMORY_DEFINE_NAME_TRAIT("mpi", category::mpi);
TIMEMORY_DEFINE_NAME_TRAIT("ompt", category::ompt);
TIMEMORY_DEFINE_NAME_TRAIT("rccl", category::rccl);
TIMEMORY_DEFINE_NAME_TRAIT("critical-trace", category::critical_trace);
TIMEMORY_DEFINE_NAME_TRAIT("host-critical-trace", category::host_critical_trace);
TIMEMORY_DEFINE_NAME_TRAIT("device-critical-trace", category::device_critical_trace);
OMNITRACE_COMPONENT_ALIAS(comm_data_tracker_t,
::tim::component::data_tracker<float, project::omnitrace>)
namespace omnitrace
{
namespace policy = ::tim::policy; // NOLINT
namespace comp = ::tim::component; // NOLINT
namespace component
{
template <typename Tp, typename ValueT>
using base = ::tim::component::base<Tp, ValueT>;
template <typename... Tp>
using data_tracker = tim::component::data_tracker<Tp...>;
@@ -117,9 +65,9 @@ using functor_t = std::function<void(Tp...)>;
using default_functor_t = functor_t<const char*>;
struct omnitrace;
struct user_region;
struct backtrace;
struct backtrace_metrics;
struct backtrace_timestamp;
struct backtrace_wall_clock
{};
struct backtrace_cpu_clock
@@ -141,8 +89,6 @@ using sampling_gpu_busy = data_tracker<double, backtrace_gpu_busy>;
using sampling_gpu_temp = data_tracker<double, backtrace_gpu_temp>;
using sampling_gpu_power = data_tracker<double, backtrace_gpu_power>;
using sampling_gpu_memory = data_tracker<double, backtrace_gpu_memory>;
using roctracer = tim::component::roctracer;
using rocprofiler = tim::component::rocprofiler;
template <typename ApiT, typename StartFuncT = default_functor_t,
typename StopFuncT = default_functor_t>
@@ -151,49 +97,40 @@ struct functors;
} // namespace omnitrace
#if !defined(OMNITRACE_USE_ROCTRACER)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, component::roctracer, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::roctracer, false_type)
#endif
#if !defined(OMNITRACE_USE_ROCPROFILER)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, component::rocprofiler, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::rocprofiler, false_type)
#endif
#if !defined(OMNITRACE_USE_RCCL)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, api::rccl, false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, component::rcclp_handle, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, category::rocm_rccl, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::rcclp_handle, false_type)
#endif
#if !defined(OMNITRACE_USE_RCCL) && !defined(OMNITRACE_USE_MPI)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, component::comm_data_tracker_t, false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, component::comm_data, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::comm_data_tracker_t, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::comm_data, false_type)
#endif
#if !defined(TIMEMORY_USE_LIBUNWIND)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::api::sampling, false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::backtrace, false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_wall_clock,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_cpu_clock,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_percent,
false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, category::sampling, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::backtrace, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::backtrace_metrics, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::backtrace_timestamp, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::sampling_wall_clock, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::sampling_cpu_clock, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::sampling_percent, false_type)
#endif
#if !defined(TIMEMORY_USE_LIBUNWIND) || !defined(OMNITRACE_USE_ROCM_SMI)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_gpu_busy,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_gpu_temp,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_gpu_power,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, omnitrace::component::sampling_gpu_memory,
false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::sampling_gpu_busy, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::sampling_gpu_temp, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::sampling_gpu_power, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::sampling_gpu_memory, false_type)
#endif
TIMEMORY_SET_COMPONENT_API(omnitrace::component::omnitrace, project::omnitrace,
category::dynamic_instrumentation, os::supports_linux)
TIMEMORY_SET_COMPONENT_API(omnitrace::component::user_region, project::omnitrace,
os::supports_linux)
TIMEMORY_SET_COMPONENT_API(omnitrace::component::roctracer, project::omnitrace,
tpls::rocm, device::gpu, os::supports_linux,
category::external)
@@ -223,10 +160,6 @@ TIMEMORY_SET_COMPONENT_API(omnitrace::component::sampling_gpu_temp, project::omn
category::temperature, category::sampling,
category::process_sampling)
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::omnitrace, OMNITRACE_COMPONENT,
"omnitrace", "omnitrace_component")
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::user_region, OMNITRACE_USER_REGION,
"user_region", "omnitrace_user_region")
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::roctracer, OMNITRACE_ROCTRACER,
"roctracer", "omnitrace_roctracer")
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::rocprofiler, OMNITRACE_ROCPROFILER,
@@ -248,15 +181,6 @@ TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::sampling_gpu_power,
TIMEMORY_PROPERTY_SPECIALIZATION(omnitrace::component::sampling_gpu_temp,
OMNITRACE_SAMPLING_GPU_TEMP, "sampling_gpu_temp", "")
TIMEMORY_METADATA_SPECIALIZATION(
omnitrace::component::omnitrace, "omnitrace",
"Invokes instrumentation functions 'omnitrace_push_trace' and 'omnitrace_pop_trace'",
"Used by gotcha wrappers")
TIMEMORY_METADATA_SPECIALIZATION(
omnitrace::component::user_region, "user_region",
"Invokes instrumentation functions 'omnitrace_user_push_region' and "
"'omnitrace_user_pop_region'",
"Used by OMPT")
TIMEMORY_METADATA_SPECIALIZATION(omnitrace::component::roctracer, "roctracer",
"High-precision ROCm API and kernel tracing", "")
TIMEMORY_METADATA_SPECIALIZATION(omnitrace::component::rocprofiler, "rocprofiler",
@@ -292,44 +216,55 @@ TIMEMORY_STATISTICS_TYPE(omnitrace::component::sampling_gpu_busy, double)
TIMEMORY_STATISTICS_TYPE(omnitrace::component::sampling_gpu_temp, double)
TIMEMORY_STATISTICS_TYPE(omnitrace::component::sampling_gpu_power, double)
TIMEMORY_STATISTICS_TYPE(omnitrace::component::sampling_gpu_memory, double)
TIMEMORY_STATISTICS_TYPE(component::comm_data_tracker_t, float)
TIMEMORY_STATISTICS_TYPE(omnitrace::component::comm_data_tracker_t, float)
// enable timing units
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category,
omnitrace::component::sampling_wall_clock, true_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category,
omnitrace::component::sampling_cpu_clock, true_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category, omnitrace::component::sampling_percent,
true_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(uses_timing_units,
omnitrace::component::sampling_wall_clock, true_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(uses_timing_units,
omnitrace::component::sampling_cpu_clock, true_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_timing_category, component::sampling_wall_clock,
true_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_timing_category, component::sampling_cpu_clock,
true_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_timing_category, component::sampling_percent,
true_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(uses_timing_units, component::sampling_wall_clock,
true_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(uses_timing_units, component::sampling_cpu_clock,
true_type)
// enable percent units
TIMEMORY_DEFINE_CONCRETE_TRAIT(uses_percent_units,
omnitrace::component::sampling_gpu_busy, true_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(uses_percent_units, component::sampling_gpu_busy,
true_type)
// enable memory units
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_memory_category,
omnitrace::component::sampling_gpu_memory, true_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(uses_memory_units,
omnitrace::component::sampling_gpu_memory, true_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_memory_category, component::sampling_gpu_memory,
true_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(uses_memory_units, component::sampling_gpu_memory,
true_type)
// reporting categories (sum)
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_sum, omnitrace::component::sampling_gpu_busy,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_sum, omnitrace::component::sampling_gpu_temp,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_sum, omnitrace::component::sampling_gpu_power,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_sum, omnitrace::component::sampling_gpu_memory,
false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(report_sum, component::sampling_gpu_busy, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(report_sum, component::sampling_gpu_temp, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(report_sum, component::sampling_gpu_power, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(report_sum, component::sampling_gpu_memory, false_type)
// reporting categories (mean)
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_mean, omnitrace::component::sampling_percent,
false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(report_mean, component::sampling_percent, false_type)
// reporting categories (stats)
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_statistics, omnitrace::component::sampling_percent,
false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(report_statistics, component::sampling_percent,
false_type)
#define OMNITRACE_DECLARE_EXTERN_COMPONENT(NAME, HAS_DATA, ...) \
TIMEMORY_DECLARE_EXTERN_TEMPLATE( \
struct tim::component::base<TIMEMORY_ESC(omnitrace::component::NAME), \
__VA_ARGS__>) \
TIMEMORY_DECLARE_EXTERN_OPERATIONS(TIMEMORY_ESC(omnitrace::component::NAME), \
HAS_DATA) \
TIMEMORY_DECLARE_EXTERN_STORAGE(TIMEMORY_ESC(omnitrace::component::NAME))
#define OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(NAME, HAS_DATA, ...) \
TIMEMORY_INSTANTIATE_EXTERN_TEMPLATE( \
struct tim::component::base<TIMEMORY_ESC(omnitrace::component::NAME), \
__VA_ARGS__>) \
TIMEMORY_INSTANTIATE_EXTERN_OPERATIONS(TIMEMORY_ESC(omnitrace::component::NAME), \
HAS_DATA) \
TIMEMORY_INSTANTIATE_EXTERN_STORAGE(TIMEMORY_ESC(omnitrace::component::NAME))
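Review note: the `is_available` specializations in this file switch a component off at compile time when its backend is missing. The sketch below shows the general pattern only. `SKETCH_DEFINE_CONCRETE_TRAIT`, `SKETCH_USE_LIBUNWIND`, `can_use`, and the `sketch` namespace are hypothetical stand-ins, and the real `OMNITRACE_DEFINE_CONCRETE_TRAIT` macro may expand differently.

```cpp
#include <type_traits>

namespace sketch
{
struct backtrace {};   // stand-in for a component that needs libunwind
struct wall_clock {};  // stand-in for a component with no extra dependency

namespace trait
{
// Primary template: a component is treated as available unless specialized.
template <typename Tp>
struct is_available : std::true_type
{};
}  // namespace trait
}  // namespace sketch

// Hypothetical macro: specializes TRAIT for TYPE so that it derives from
// std::VALUE (std::true_type or std::false_type).
#define SKETCH_DEFINE_CONCRETE_TRAIT(TRAIT, TYPE, VALUE) \
    namespace sketch                                     \
    {                                                    \
    namespace trait                                      \
    {                                                    \
    template <>                                          \
    struct TRAIT<TYPE> : std::VALUE                      \
    {};                                                  \
    }                                                    \
    }

// Mirrors the "#if !defined(...)" blocks in the diff: without the backend,
// the component is marked unavailable.
#if !defined(SKETCH_USE_LIBUNWIND)
SKETCH_DEFINE_CONCRETE_TRAIT(is_available, backtrace, false_type)
#endif

namespace sketch
{
// Callers can branch on the trait at compile time.
template <typename Tp>
constexpr bool can_use()
{
    return trait::is_available<Tp>::value;
}
}  // namespace sketch
```

With `SKETCH_USE_LIBUNWIND` undefined, `sketch::can_use<sketch::backtrace>()` is `false` while `sketch::can_use<sketch::wall_clock>()` stays `true`.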
@@ -21,7 +21,7 @@
// SOFTWARE.
#include "library/components/mpi_gotcha.hpp"
#include "library/api.hpp"
#include "api.hpp"
#include "library/components/category_region.hpp"
#include "library/components/comm_data.hpp"
#include "library/components/fwd.hpp"
@@ -32,6 +32,7 @@
#include <timemory/backends/mpi.hpp>
#include <timemory/backends/process.hpp>
#include <timemory/mpl/types.hpp>
#include <timemory/sampling/signals.hpp>
#include <timemory/utility/locking.hpp>
#include <cstdint>
@@ -41,11 +42,12 @@
namespace omnitrace
{
namespace component
{
namespace
{
using mpip_bundle_t =
tim::component_tuple<omnitrace::component::category_region<category::mpi>,
comp::comm_data>;
tim::component_tuple<category_region<category::mpi>, comp::comm_data>;
struct comm_rank_data
{
@@ -110,8 +112,11 @@ omnitrace_mpi_set_attr()
};
static auto _mpi_fini = [](MPI_Comm, int, void*, void*) {
OMNITRACE_DEBUG("MPI Comm attribute finalize\n");
auto _blocked = get_sampling_signals();
if(!_blocked.empty())
tim::sampling::block_signals(_blocked, tim::sampling::sigmask_scope::process);
if(mpip_index != std::numeric_limits<uint64_t>::max())
comp::deactivate_mpip<mpip_bundle_t, api::omnitrace>(mpip_index);
comp::deactivate_mpip<mpip_bundle_t, project::omnitrace>(mpip_index);
omnitrace_finalize_hidden();
return MPI_SUCCESS;
};
@@ -124,6 +129,10 @@ omnitrace_mpi_set_attr()
PMPI_Comm_set_attr(MPI_COMM_SELF, _comm_key, nullptr);
#endif
}
using strset_t = std::set<std::string>;
auto permit_bindings = strset_t{};
auto reject_bindings = strset_t{};
} // namespace
void
@@ -134,9 +143,14 @@ mpi_gotcha::configure()
mpi_gotcha_t::template configure<1, int, int*, char***, int, int*>(
"MPI_Init_thread");
mpi_gotcha_t::template configure<2, int>("MPI_Finalize");
reject_bindings.emplace("MPI_Init");
reject_bindings.emplace("MPI_Init_thread");
reject_bindings.emplace("MPI_Finalize");
#if defined(OMNITRACE_USE_MPI_HEADERS) && OMNITRACE_USE_MPI_HEADERS > 0
mpi_gotcha_t::template configure<3, int, comm_t, int*>("MPI_Comm_rank");
mpi_gotcha_t::template configure<4, int, comm_t, int*>("MPI_Comm_size");
reject_bindings.emplace("MPI_Comm_rank");
reject_bindings.emplace("MPI_Comm_size");
#endif
};
}
@@ -199,8 +213,6 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming, int*, char***)
{
OMNITRACE_BASIC_DEBUG_F("%s(int*, char***)\n", _data.tool_id.c_str());
if(get_state() < ::omnitrace::State::Init) set_state(::omnitrace::State::PreInit);
omnitrace_push_trace_hidden(_data.tool_id.c_str());
#if !defined(TIMEMORY_USE_MPI) && defined(TIMEMORY_USE_MPI_HEADERS)
tim::mpi::is_initialized_callback() = []() { return true; };
@@ -213,8 +225,6 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming, int*, char***, in
{
OMNITRACE_BASIC_DEBUG_F("%s(int*, char***, int, int*)\n", _data.tool_id.c_str());
if(get_state() < ::omnitrace::State::Init) set_state(::omnitrace::State::PreInit);
omnitrace_push_trace_hidden(_data.tool_id.c_str());
#if !defined(TIMEMORY_USE_MPI) && defined(TIMEMORY_USE_MPI_HEADERS)
tim::mpi::is_initialized_callback() = []() { return true; };
@@ -227,8 +237,12 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::incoming)
{
OMNITRACE_BASIC_DEBUG_F("%s()\n", _data.tool_id.c_str());
auto _blocked = get_sampling_signals();
if(!_blocked.empty())
tim::sampling::block_signals(_blocked, tim::sampling::sigmask_scope::process);
if(mpip_index != std::numeric_limits<uint64_t>::max())
comp::deactivate_mpip<mpip_bundle_t, api::omnitrace>(mpip_index);
comp::deactivate_mpip<mpip_bundle_t, project::omnitrace>(mpip_index);
#if !defined(TIMEMORY_USE_MPI) && defined(TIMEMORY_USE_MPI_HEADERS)
tim::mpi::is_initialized_callback() = []() { return false; };
@@ -278,15 +292,11 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::outgoing, int _retval)
{
OMNITRACE_BASIC_VERBOSE_F(2, "Activating MPI wrappers...\n");
if(!get_use_timemory())
{
trait::runtime_enabled<comp::comm_data>::set(false);
trait::runtime_enabled<comp::comm_data_tracker_t>::set(false);
}
// use env vars OMNITRACE_MPIP_PERMIT_LIST and OMNITRACE_MPIP_REJECT_LIST
// to control the gotcha bindings at runtime
comp::configure_mpip<mpip_bundle_t, api::omnitrace>();
mpip_index = comp::activate_mpip<mpip_bundle_t, api::omnitrace>();
comp::configure_mpip<mpip_bundle_t, project::omnitrace>(permit_bindings,
reject_bindings);
mpip_index = comp::activate_mpip<mpip_bundle_t, project::omnitrace>();
}
auto_lock_t _lk{ type_mutex<mpi_gotcha>() };
@@ -344,6 +354,7 @@ mpi_gotcha::audit(const gotcha_data_t& _data, audit::outgoing, int _retval)
}
omnitrace_pop_trace_hidden(_data.tool_id.c_str());
}
} // namespace component
} // namespace omnitrace
TIMEMORY_INITIALIZE_STORAGE(omnitrace::mpi_gotcha)
TIMEMORY_INITIALIZE_STORAGE(omnitrace::component::mpi_gotcha)
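Review note: `configure()` now records the MPI functions already wrapped by `mpi_gotcha_t` in `reject_bindings` and passes both sets to `configure_mpip`, presumably so those functions are not wrapped a second time. The sketch below is one plausible reading of how a permit/reject pair could be applied. `should_wrap` is a hypothetical helper, and the precedence rule (reject wins, an empty permit list means "permit everything") is an assumption, not something the diff confirms.

```cpp
#include <set>
#include <string>

namespace sketch
{
using strset_t = std::set<std::string>;

// Hypothetical filter: decide whether a function name should get a wrapper.
// Assumed rules: a rejected name is never wrapped; otherwise the name is
// wrapped if the permit list is empty or explicitly contains it.
inline bool
should_wrap(const std::string& name, const strset_t& permit, const strset_t& reject)
{
    if(reject.count(name) > 0) return false;
    return permit.empty() || permit.count(name) > 0;
}

// Apply the filter to a set of candidate function names.
inline strset_t
select_bindings(const strset_t& candidates, const strset_t& permit,
                const strset_t& reject)
{
    auto selected = strset_t{};
    for(const auto& name : candidates)
    {
        if(should_wrap(name, permit, reject)) selected.emplace(name);
    }
    return selected;
}
}  // namespace sketch
```

Under these assumed rules, rejecting `MPI_Init` and `MPI_Finalize` leaves a candidate such as `MPI_Send` wrapped while the two rejected names are skipped.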
@@ -30,6 +30,8 @@
namespace omnitrace
{
namespace component
{
// this is used to wrap MPI_Init and MPI_Init_thread
struct mpi_gotcha : comp::base<mpi_gotcha, void>
{
@@ -76,6 +78,8 @@ private:
int* m_size_ptr = nullptr;
uintptr_t m_comm_val = null_comm();
};
} // namespace component
using mpi_gotcha_t = comp::gotcha<5, tim::component_tuple<mpi_gotcha>, api::omnitrace>;
using mpi_gotcha_t =
comp::gotcha<5, tim::component_tuple<component::mpi_gotcha>, project::omnitrace>;
} // namespace omnitrace
@@ -1,50 +0,0 @@
// MIT License
//
// Copyright (c) 2022 Advanced Micro Devices, Inc. All Rights Reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include "library/components/omnitrace.hpp"
#include "library/api.hpp"
namespace omnitrace
{
namespace component
{
void
omnitrace::start()
{
if(m_prefix) omnitrace_push_trace_hidden(m_prefix);
}
void
omnitrace::stop()
{
if(m_prefix) omnitrace_pop_trace_hidden(m_prefix);
}
void
omnitrace::set_prefix(const char* _prefix)
{
m_prefix = _prefix;
}
} // namespace component
} // namespace omnitrace
TIMEMORY_INITIALIZE_STORAGE(omnitrace::component::omnitrace)
@@ -22,16 +22,19 @@
#include "library/components/pthread_create_gotcha.hpp"
#include "library/components/category_region.hpp"
#include "library/components/omnitrace.hpp"
#include "library/components/pthread_gotcha.hpp"
#include "library/components/roctracer.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/runtime.hpp"
#include "library/sampling.hpp"
#include "library/state.hpp"
#include "library/thread_data.hpp"
#include "library/thread_info.hpp"
#include "library/utility.hpp"
#include <timemory/backends/threading.hpp>
#include <timemory/components/macros.hpp>
#include <timemory/components/timing/wall_clock.hpp>
#include <timemory/sampling/allocator.hpp>
#include <timemory/utility/types.hpp>
@@ -49,13 +52,10 @@ std::set<int>
shutdown();
} // namespace sampling
namespace mpl = tim::mpl;
using bundle_t = tim::lightweight_tuple<comp::wall_clock, comp::roctracer_data>;
using wall_pw_t = mpl::piecewise_select<comp::wall_clock>; // only wall-clock
using main_pw_t = mpl::piecewise_ignore<comp::wall_clock>; // exclude wall-clock
using category_region_t =
tim::lightweight_tuple<omnitrace::component::category_region<category::pthread>>;
namespace component
{
using bundle_t = tim::lightweight_tuple<comp::wall_clock, comp::roctracer_data>;
using category_region_t = tim::lightweight_tuple<category_region<category::pthread>>;
namespace
{
@@ -63,7 +63,7 @@ auto* is_shutdown = new bool{ false }; // intentional data leak
auto* bundles = new std::map<int64_t, std::shared_ptr<bundle_t>>{};
auto* bundles_mutex = new std::mutex{};
auto bundles_dtor = scope::destructor{ []() {
omnitrace::pthread_create_gotcha::shutdown();
pthread_create_gotcha::shutdown();
delete bundles;
delete bundles_mutex;
bundles = nullptr;
@@ -105,12 +105,12 @@ stop_bundle(bundle_t& _bundle, int64_t _tid, Args&&... _args)
_bundle.key().c_str(), _tid);
if(get_use_timemory())
{
_bundle.stop(wall_pw_t{}); // stop wall-clock so we can get the value
auto _wc = *_bundle.get<comp::wall_clock>();
_wc.stop();
// update roctracer_data
_bundle.store(std::plus<double>{},
_bundle.get<comp::wall_clock>()->get() * units::sec);
// stop all other components including roctracer_data after update
_bundle.stop(main_pw_t{});
_bundle.store(std::plus<double>{}, _wc.get() * _wc.unit());
// stop all
_bundle.stop();
// exclude popping wall-clock
_bundle.pop(_tid);
}
@@ -154,15 +154,16 @@ pthread_create_gotcha::wrapper::operator()() const
return m_routine(m_arg);
}
push_thread_state(omnitrace::ThreadState::Internal);
push_thread_state(ThreadState::Internal);
int64_t _tid = -1;
void* _ret = nullptr;
auto _is_sampling = false;
auto _bundle = std::shared_ptr<bundle_t>{};
auto _signals = std::set<int>{};
auto _coverage = (get_mode() == omnitrace::Mode::Coverage);
auto _dtor = [&]() {
auto _coverage = (get_mode() == Mode::Coverage);
// const auto& _parent_info = thread_info::get(m_parent_tid, LookupTID);
auto _dtor = [&]() {
set_thread_state(ThreadState::Internal);
if(_is_sampling)
{
@@ -172,11 +173,11 @@ pthread_create_gotcha::wrapper::operator()() const
if(_tid >= 0)
{
auto _active = (get_state() == omnitrace::State::Active &&
auto _active = (get_state() == ::omnitrace::State::Active &&
bundles != nullptr && bundles_mutex != nullptr);
if(!_active) return;
get_execution_time(_tid)->second = comp::wall_clock::record();
auto& _thr_bundle = thread_bundle_data_t::instance();
thread_info::set_stop(comp::wall_clock::record());
auto& _thr_bundle = thread_bundle_data_t::instance();
if(_thr_bundle && _thr_bundle->get<comp::wall_clock>() &&
_thr_bundle->get<comp::wall_clock>()->get_is_running())
_thr_bundle->stop();
@@ -185,18 +186,19 @@ pthread_create_gotcha::wrapper::operator()() const
}
};
auto _active = (get_state() == omnitrace::State::Active && bundles != nullptr &&
auto _active = (get_state() == ::omnitrace::State::Active && bundles != nullptr &&
bundles_mutex != nullptr);
if(_active && !_coverage)
{
_tid = threading::get_id();
const auto& _tid_index = thread_info::init();
_tid = _tid_index->index_data->internal_value;
threading::set_thread_name(TIMEMORY_JOIN(" ", "Thread", _tid).c_str());
if(!thread_bundle_data_t::instances().at(_tid))
{
thread_data<omnitrace_thread_bundle_t>::construct(
TIMEMORY_JOIN('/', "omnitrace/process", process::get_id(), "thread",
threading::get_id()),
_tid),
quirk::config<quirk::auto_start>{});
thread_bundle_data_t::instances().at(_tid)->start();
}
@@ -207,12 +209,9 @@ pthread_create_gotcha::wrapper::operator()() const
.first->second;
}
if(_bundle) start_bundle(*_bundle);
get_execution_time(_tid)->first = comp::wall_clock::record();
get_cpu_cid_stack(_tid, m_parent_tid);
if(m_enable_sampling)
{
// initialize thread-local statics
(void) tim::get_unw_backtrace<12, 1, false>();
_is_sampling = true;
pthread_gotcha::push_enable_sampling_on_child_threads(false);
_signals = sampling::setup();
@@ -220,6 +219,10 @@ pthread_create_gotcha::wrapper::operator()() const
sampling::unblock_signals();
}
}
else
{
thread_info::init(true);
}
// notify the wrapper that all internal work is completed
if(m_promise) m_promise->set_value();
@@ -227,7 +230,7 @@ pthread_create_gotcha::wrapper::operator()() const
// Internal -> Enabled
pop_thread_state();
push_thread_state(omnitrace::ThreadState::Enabled);
push_thread_state(ThreadState::Enabled);
// execute the original function
_ret = m_routine(m_arg);
@@ -237,7 +240,7 @@ pthread_create_gotcha::wrapper::operator()() const
// execute the destructor actions
_dtor();
set_thread_state(omnitrace::ThreadState::Completed);
set_thread_state(ThreadState::Completed);
return _ret;
}
@@ -324,48 +327,50 @@ int
pthread_create_gotcha::operator()(pthread_t* thread, const pthread_attr_t* attr,
void* (*start_routine)(void*), void* arg) const
{
auto _initial_thread_state = get_thread_state();
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
bundle_t _bundle{ "pthread_create" };
auto _enable_sampling = pthread_gotcha::sampling_enabled_on_child_threads();
auto _coverage = (get_mode() == omnitrace::Mode::Coverage);
auto _active = (get_state() == omnitrace::State::Active);
int64_t _tid = (_active) ? threading::get_id() : 0;
auto _disabled = (get_thread_state() == ThreadState::Disabled);
auto _enabled = (get_thread_state() == ThreadState::Enabled);
auto _bundle = std::optional<bundle_t>{};
if(_active)
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
auto _active = (get_state() == ::omnitrace::State::Active && !_disabled);
auto _coverage = (get_mode() == Mode::Coverage);
auto _use_sampling = get_use_sampling();
auto _sample_child = pthread_gotcha::sampling_enabled_on_child_threads();
auto _tid = utility::get_thread_index();
auto _use_bundle = (_active && !_coverage);
const auto& _info = thread_info::init(!_active || !_sample_child || _disabled);
auto _enable_sampling =
(!_disabled && _enabled && _sample_child && _use_sampling && !_info->is_offset);
if(_active && !_disabled && !_info->is_offset)
{
OMNITRACE_VERBOSE(1, "Creating new thread on PID %i (rank: %i), TID %li\n",
process::get_id(), dmp::rank(), _tid);
}
// ensure that cpu cid stack exists on the parent thread if active
if(!_coverage && _active) get_cpu_cid_stack();
if(_active && !_coverage) get_cpu_cid_stack();
if(!get_use_sampling() || !_enable_sampling)
{
auto* _obj = new wrapper(start_routine, arg, _enable_sampling, _tid, nullptr);
if(_active && !_coverage && _enable_sampling &&
_initial_thread_state == ThreadState::Enabled)
start_bundle(_bundle, audit::incoming{}, thread, attr, start_routine, arg);
// create the thread
auto _ret = (*m_wrappee)(thread, attr, &wrapper::wrap, static_cast<void*>(_obj));
if(_active && !_coverage && _enable_sampling &&
_initial_thread_state == ThreadState::Enabled)
stop_bundle(_bundle, _tid, audit::outgoing{}, _ret);
return _ret;
}
// block the signals in the entire process
OMNITRACE_DEBUG("blocking signals...\n");
auto _blocked_signals = get_sampling_signals();
tim::sampling::block_signals(_blocked_signals, tim::sampling::sigmask_scope::process);
start_bundle(_bundle, audit::incoming{}, thread, attr, start_routine, arg);
// promise set by thread when signal handler is configured
set_thread_state(ThreadState::Disabled);
auto _blocked = get_sampling_signals();
auto _promise = std::promise<void>{};
auto _fut = _promise.get_future();
auto* _wrap = new wrapper(start_routine, arg, _enable_sampling, _tid, &_promise);
set_thread_state(ThreadState::Internal);
// block the signals in the entire process
if(_enable_sampling && !_blocked.empty())
{
OMNITRACE_DEBUG("blocking signals...\n");
tim::sampling::block_signals(_blocked, tim::sampling::sigmask_scope::process);
}
if(_use_bundle)
{
_bundle = bundle_t{ "pthread_create" };
start_bundle(*_bundle, audit::incoming{}, thread, attr, start_routine, arg);
}
// create the thread
auto _ret = (*m_wrappee)(thread, attr, &wrapper::wrap, static_cast<void*>(_wrap));
@@ -374,21 +379,19 @@ pthread_create_gotcha::operator()(pthread_t* thread, const pthread_attr_t* attr,
OMNITRACE_DEBUG("waiting for child to signal it is setup...\n");
_fut.wait();
stop_bundle(_bundle, threading::get_id(), audit::outgoing{}, _ret);
if(_use_bundle) stop_bundle(*_bundle, threading::get_id(), audit::outgoing{}, _ret);
// unblock the signals in the entire process
OMNITRACE_DEBUG("unblocking signals...\n");
tim::sampling::unblock_signals(_blocked_signals,
tim::sampling::sigmask_scope::process);
if(_enable_sampling && !_blocked.empty())
{
OMNITRACE_DEBUG("unblocking signals...\n");
tim::sampling::unblock_signals(_blocked, tim::sampling::sigmask_scope::process);
}
OMNITRACE_DEBUG("returning success...\n");
return _ret;
}
bool
pthread_create_gotcha::is_valid_execution_time(int64_t _tid, uint64_t _ts)
{
return (_ts >= get_execution_time(_tid)->first &&
_ts <= get_execution_time(_tid)->second);
}
} // namespace component
} // namespace omnitrace
TIMEMORY_INITIALIZE_STORAGE(component::roctracer_data)
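Review note: the reworked `pthread_create_gotcha::operator()` blocks the sampling signals, creates the thread, waits on a future until the child reports that its setup is done, and only then unblocks. The sketch below shows that handshake with plain POSIX and standard-library calls. It uses `pthread_sigmask` on the calling thread rather than timemory's `block_signals` with process scope, and `is_blocked` and `create_thread_with_handshake` are hypothetical names.

```cpp
#include <csignal>
#include <future>
#include <thread>

#include <pthread.h>
#include <signal.h>

namespace sketch
{
// Returns true if `signo` is currently blocked on the calling thread.
inline bool
is_blocked(int signo)
{
    sigset_t current;
    sigemptyset(&current);
    // A null "set" argument only queries the current mask.
    pthread_sigmask(SIG_BLOCK, nullptr, &current);
    return sigismember(&current, signo) == 1;
}

// Blocks `signo`, starts a child thread, waits until the child reports that
// its setup is finished, then restores the parent's original mask.
// Returns what the child observed: whether `signo` was blocked at startup.
inline bool
create_thread_with_handshake(int signo)
{
    sigset_t blocked;
    sigset_t previous;
    sigemptyset(&blocked);
    sigaddset(&blocked, signo);
    pthread_sigmask(SIG_BLOCK, &blocked, &previous);

    auto ready  = std::promise<bool>{};
    auto future = ready.get_future();

    // The child inherits the creating thread's signal mask, so no signal can
    // arrive before the child has finished configuring itself.
    auto child = std::thread{ [&ready, signo]() {
        // ... per-thread setup (e.g. installing a sampler) would go here ...
        ready.set_value(is_blocked(signo));
    } };

    // Wait for the child to signal that it is set up.
    bool child_saw_blocked = future.get();

    // Restore the parent's mask only after the handshake completes.
    pthread_sigmask(SIG_SETMASK, &previous, nullptr);
    child.join();
    return child_saw_blocked;
}
}  // namespace sketch
```

Calling `sketch::create_thread_with_handshake(SIGPROF)` reports whether the child started with `SIGPROF` blocked. In the actual change the child unblocks the signals itself once its sampler is configured.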
@@ -32,6 +32,8 @@
namespace omnitrace
{
namespace component
{
struct pthread_create_gotcha : tim::component::base<pthread_create_gotcha, void>
{
using routine_t = void* (*) (void*);
@@ -68,26 +70,13 @@ struct pthread_create_gotcha : tim::component::base<pthread_create_gotcha, void>
int operator()(pthread_t* thread, const pthread_attr_t* attr,
void* (*start_routine)(void*), void* arg) const;
static auto& get_execution_time(int64_t _tid = threading::get_id());
static bool is_valid_execution_time(int64_t _tid, uint64_t _ts);
void set_data(wrappee_t);
private:
wrappee_t m_wrappee = &pthread_create;
};
inline auto&
pthread_create_gotcha::get_execution_time(int64_t _tid)
{
struct omnitrace_thread_exec_time
{};
using data_t = std::pair<uint64_t, uint64_t>;
using thread_data_t = thread_data<data_t, omnitrace_thread_exec_time>;
static auto& _v = thread_data_t::instances(thread_data_t::construct_on_init{});
return _v.at(_tid);
}
using pthread_create_gotcha_t =
tim::component::gotcha<2, std::tuple<>, pthread_create_gotcha>;
} // namespace component
} // namespace omnitrace
@@ -21,7 +21,6 @@
// SOFTWARE.
#include "library/components/pthread_gotcha.hpp"
#include "library/components/omnitrace.hpp"
#include "library/components/pthread_create_gotcha.hpp"
#include "library/components/pthread_mutex_gotcha.hpp"
#include "library/components/roctracer.hpp"
@@ -33,7 +32,7 @@
#include "library/utility.hpp"
#include <timemory/backends/threading.hpp>
#include <timemory/sampling/allocator.hpp>
#include <timemory/utility/macros.hpp>
#include <timemory/utility/types.hpp>
#include <pthread.h>
@@ -41,11 +40,34 @@
#include <array>
#include <vector>
namespace tim
{
namespace operation
{
template <>
struct stop<omnitrace::component::pthread_create_gotcha_t>
{
using type = omnitrace::component::pthread_create_gotcha_t;
TIMEMORY_DEFAULT_OBJECT(stop)
template <typename... Args>
explicit stop(type&, Args&&...)
{}
template <typename... Args>
void operator()(type&, Args&&...)
{}
};
} // namespace operation
} // namespace tim
namespace omnitrace
{
namespace
{
using bundle_t = tim::lightweight_tuple<pthread_create_gotcha_t, pthread_mutex_gotcha_t>;
using bundle_t = tim::lightweight_tuple<component::pthread_create_gotcha_t,
component::pthread_mutex_gotcha_t>;
auto&
get_sampling_on_child_threads_history(int64_t _idx = utility::get_thread_index())
@@ -62,6 +84,8 @@ get_bundle()
if(!_v) _v = std::make_unique<bundle_t>("pthread_gotcha");
return _v;
}
bool is_configured = false;
} // namespace
//--------------------------------------------------------------------------------------//
@@ -69,15 +93,23 @@ get_bundle()
void
pthread_gotcha::configure()
{
pthread_create_gotcha::configure();
pthread_mutex_gotcha::configure();
if(!is_configured)
{
::omnitrace::component::pthread_create_gotcha::configure();
::omnitrace::component::pthread_mutex_gotcha::configure();
is_configured = true;
}
}
void
pthread_gotcha::shutdown()
{
pthread_create_gotcha::shutdown();
pthread_mutex_gotcha::shutdown();
if(is_configured)
{
::omnitrace::component::pthread_mutex_gotcha::shutdown();
// ::omnitrace::component::pthread_create_gotcha::shutdown();
is_configured = false;
}
}
bool
@@ -89,10 +121,10 @@ pthread_gotcha::sampling_enabled_on_child_threads()
bool
pthread_gotcha::push_enable_sampling_on_child_threads(bool _v)
{
auto& _hist = get_sampling_on_child_threads_history();
bool _last = sampling_on_child_threads();
_hist.emplace_back(_last);
bool _last = sampling_on_child_threads();
sampling_on_child_threads() = _v;
auto& _hist = get_sampling_on_child_threads_history();
_hist.emplace_back(_last);
return _last;
}
@@ -128,6 +160,7 @@ pthread_gotcha::sampling_on_child_threads()
void
pthread_gotcha::start()
{
configure();
get_bundle()->start();
}
@@ -135,6 +168,5 @@ void
pthread_gotcha::stop()
{
get_bundle()->stop();
get_bundle().reset();
}
} // namespace omnitrace
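Review note: `push_enable_sampling_on_child_threads` saves the previous flag value in a per-thread history before overwriting it, so a later pop can restore it. The sketch below models that as a small stack. The `pop` function, the default value of `true`, and all names in the `sketch` namespace are assumptions for illustration, since the matching pop implementation is not part of this hunk.

```cpp
#include <vector>

namespace sketch
{
// Current per-thread setting (the default of true is an assumption).
inline bool&
sampling_on_child_threads()
{
    static thread_local bool value = true;
    return value;
}

// Per-thread history of the values that were overwritten.
inline std::vector<bool>&
history()
{
    static thread_local std::vector<bool> values;
    return values;
}

// Set a new value, remember the old one, and return the old one.
inline bool
push(bool v)
{
    bool last                   = sampling_on_child_threads();
    sampling_on_child_threads() = v;
    history().emplace_back(last);
    return last;
}

// Hypothetical counterpart: restore the most recently saved value.
// Returns the value in effect after the pop.
inline bool
pop()
{
    auto& hist = history();
    if(!hist.empty())
    {
        sampling_on_child_threads() = hist.back();
        hist.pop_back();
    }
    return sampling_on_child_threads();
}
}  // namespace sketch
```

A typical pairing would be `sketch::push(false)` before creating internal helper threads and `sketch::pop()` afterwards, which leaves the caller's original setting intact.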
@@ -21,7 +21,6 @@
// SOFTWARE.
#include "library/components/pthread_mutex_gotcha.hpp"
#include "library.hpp"
#include "library/components/category_region.hpp"
#include "library/components/pthread_gotcha.hpp"
#include "library/config.hpp"
@@ -41,6 +40,8 @@
namespace omnitrace
{
namespace component
{
using Device = critical_trace::Device;
using Phase = critical_trace::Phase;
@@ -98,7 +99,7 @@ pthread_mutex_gotcha::configure()
pthread_mutex_gotcha_t::get_initializer() = []() {
if(config::get_trace_thread_locks())
{
pthread_mutex_gotcha::validate();
validate();
pthread_mutex_gotcha_t::configure(
comp::gotcha_config<0, int, pthread_mutex_t*>{ "pthread_mutex_lock" });
@@ -186,23 +187,39 @@ pthread_mutex_gotcha::validate()
}
}
pthread_mutex_gotcha::pthread_mutex_gotcha(const gotcha_data_t& _data)
: m_data{ &_data }
{}
template <typename... Args>
auto
pthread_mutex_gotcha::operator()(uintptr_t&& _id, const comp::gotcha_data& _data,
int (*_callee)(Args...), Args... _args) const
pthread_mutex_gotcha::operator()(uintptr_t&& _id, int (*_callee)(Args...),
Args... _args) const
{
using bundle_t = omnitrace::component::category_region<category::pthread>;
using bundle_t = category_region<category::pthread>;
if(is_disabled())
{
if(!_callee)
{
OMNITRACE_PRINT("Warning! nullptr to %s\n", _data.tool_id.c_str());
if(m_data)
{
OMNITRACE_PRINT("Warning! nullptr to %s\n", m_data->tool_id.c_str());
}
return EINVAL;
}
return (*_callee)(_args...);
}
struct local_dtor
{
explicit local_dtor(bool& _v)
: _protect{ _v }
{}
~local_dtor() { _protect = false; }
bool& _protect;
} _dtor{ m_protect = true };
uint64_t _cid = 0;
uint64_t _parent_cid = 0;
uint32_t _depth = 0;
@@ -216,15 +233,15 @@ pthread_mutex_gotcha::operator()(uintptr_t&& _id, const comp::gotcha_data& _data
_ts = comp::wall_clock::record();
}
bundle_t::audit(_data, audit::incoming{}, _args...);
bundle_t::audit(std::string_view{ m_data->tool_id }, audit::incoming{}, _args...);
auto _ret = (*_callee)(_args...);
bundle_t::audit(_data, audit::outgoing{}, _ret);
bundle_t::audit(std::string_view{ m_data->tool_id }, audit::outgoing{}, _ret);
if(_id < std::numeric_limits<uintptr_t>::max() && get_use_critical_trace())
{
add_critical_trace<Device::CPU, Phase::DELTA>(
threading::get_id(), _cid, 0, _parent_cid, _ts, comp::wall_clock::record(), 0,
_id, get_hashes().at(_data.index), _depth);
_id, get_hashes().at(m_data->index), _depth);
}
tim::consume_parameters(_id, _cid, _parent_cid, _depth, _ts);
@@ -232,51 +249,69 @@ pthread_mutex_gotcha::operator()(uintptr_t&& _id, const comp::gotcha_data& _data
}
int
pthread_mutex_gotcha::operator()(const gotcha_data_t& _data,
int (*_callee)(pthread_mutex_t*),
pthread_mutex_gotcha::operator()(int (*_callee)(pthread_mutex_t*),
pthread_mutex_t* _mutex) const
{
return (*this)(reinterpret_cast<uintptr_t>(_mutex), _data, _callee, _mutex);
if(m_protect) return (*_callee)(_mutex);
return (*this)(reinterpret_cast<uintptr_t>(_mutex), _callee, _mutex);
}
int
pthread_mutex_gotcha::operator()(const gotcha_data_t& _data,
int (*_callee)(pthread_spinlock_t*),
pthread_mutex_gotcha::operator()(int (*_callee)(pthread_spinlock_t*),
pthread_spinlock_t* _lock) const
{
return (*this)(reinterpret_cast<uintptr_t>(_lock), _data, _callee, _lock);
if(m_protect) return (*_callee)(_lock);
return (*this)(reinterpret_cast<uintptr_t>(_lock), _callee, _lock);
}
int
pthread_mutex_gotcha::operator()(const gotcha_data_t& _data,
int (*_callee)(pthread_rwlock_t*),
pthread_mutex_gotcha::operator()(int (*_callee)(pthread_rwlock_t*),
pthread_rwlock_t* _lock) const
{
return (*this)(reinterpret_cast<uintptr_t>(_lock), _data, _callee, _lock);
if(m_protect) return (*_callee)(_lock);
return (*this)(reinterpret_cast<uintptr_t>(_lock), _callee, _lock);
}
int
pthread_mutex_gotcha::operator()(const gotcha_data_t& _data,
int (*_callee)(pthread_barrier_t*),
pthread_mutex_gotcha::operator()(int (*_callee)(pthread_barrier_t*),
pthread_barrier_t* _barrier) const
{
return (*this)(reinterpret_cast<uintptr_t>(_barrier), _data, _callee, _barrier);
if(m_protect) return (*_callee)(_barrier);
return (*this)(reinterpret_cast<uintptr_t>(_barrier), _callee, _barrier);
}
int
pthread_mutex_gotcha::operator()(const gotcha_data_t& _data,
int (*_callee)(pthread_t, void**), pthread_t _thr,
pthread_mutex_gotcha::operator()(int (*_callee)(pthread_t, void**), pthread_t _thr,
void** _tinfo) const
{
return (*this)(static_cast<uintptr_t>(threading::get_id()), _data, _callee, _thr,
_tinfo);
if(m_protect) return (*_callee)(_thr, _tinfo);
return (*this)(static_cast<uintptr_t>(threading::get_id()), _callee, _thr, _tinfo);
}
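
The `m_protect` flag and the `local_dtor` helper in this hunk appear to guard against re-entering the instrumented path while a wrapped call is still in flight. Below is a minimal, self-contained sketch of that idea. The names `scoped_flag`, `wrapper`, and `m_instrumented` are hypothetical and exist only for illustration; they are not part of omnitrace or timemory, and the sketch makes no attempt at thread safety.

```cpp
#include <cassert>  // only needed for the checks in the usage note

// Hypothetical RAII helper: sets a flag on construction, clears it on scope exit.
struct scoped_flag
{
    explicit scoped_flag(bool& _v)
    : m_ref{ _v }
    {
        m_ref = true;
    }
    ~scoped_flag() { m_ref = false; }

    scoped_flag(const scoped_flag&) = delete;
    scoped_flag& operator=(const scoped_flag&) = delete;

private:
    bool& m_ref;
};

// Hypothetical wrapper around a callee taking and returning an int.
struct wrapper
{
    int operator()(int (*_callee)(int), int _arg) const
    {
        // already inside an instrumented call: forward without instrumenting
        if(m_protect) return (*_callee)(_arg);

        scoped_flag _guard{ m_protect };  // cleared when this scope exits
        ++m_instrumented;                 // stand-in for the real bookkeeping
        return (*_callee)(_arg);
    }

    mutable bool m_protect      = false;
    mutable int  m_instrumented = 0;
};
```

With a default-constructed `wrapper w`, a call such as `w(fn, 21)` increments `m_instrumented` and leaves `m_protect` cleared afterwards. If `m_protect` is already set when the call arrives, the callee runs but nothing is recorded.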
bool
pthread_mutex_gotcha::is_disabled()
{
return (omnitrace::get_state() != omnitrace::State::Active ||
omnitrace::get_thread_state() != omnitrace::ThreadState::Enabled ||
return (get_state() != ::omnitrace::State::Active ||
get_thread_state() != ThreadState::Enabled ||
(get_use_sampling() && !pthread_gotcha::sampling_enabled_on_child_threads()));
}
} // namespace component
} // namespace omnitrace
namespace tim
{
namespace policy
{
template <size_t N>
pthread_mutex_gotcha&
static_data<pthread_mutex_gotcha, pthread_mutex_gotcha_t>::operator()(
std::integral_constant<size_t, N>, const component::gotcha_data& _data) const
{
using thread_data_t =
omnitrace::thread_data<pthread_mutex_gotcha, std::integral_constant<size_t, N>>;
static thread_local auto& _v =
thread_data_t::instance(omnitrace::construct_on_init{}, _data);
return *_v;
}
} // namespace policy
} // namespace tim
@@ -26,12 +26,17 @@
#include "library/defines.hpp"
#include "library/timemory.hpp"
#include <timemory/components/gotcha/backends.hpp>
#include <timemory/mpl/macros.hpp>
#include <array>
#include <cstddef>
#include <string>
namespace omnitrace
{
namespace component
{
// this is used to wrap pthread locking functions (e.g. pthread_mutex_lock)
struct pthread_mutex_gotcha : comp::base<pthread_mutex_gotcha, void>
{
@@ -41,6 +46,8 @@ struct pthread_mutex_gotcha : comp::base<pthread_mutex_gotcha, void>
TIMEMORY_DEFAULT_OBJECT(pthread_mutex_gotcha)
explicit pthread_mutex_gotcha(const gotcha_data_t&);
// string id for component
static std::string label() { return "pthread_mutex_gotcha"; }
@@ -49,25 +56,44 @@ struct pthread_mutex_gotcha : comp::base<pthread_mutex_gotcha, void>
static void shutdown();
static void validate();
int operator()(const gotcha_data_t&, int (*)(pthread_mutex_t*),
pthread_mutex_t*) const;
int operator()(const gotcha_data_t&, int (*)(pthread_spinlock_t*),
pthread_spinlock_t*) const;
int operator()(const gotcha_data_t&, int (*)(pthread_rwlock_t*),
pthread_rwlock_t*) const;
int operator()(const gotcha_data_t&, int (*)(pthread_barrier_t*),
pthread_barrier_t*) const;
int operator()(const gotcha_data_t&, int (*)(pthread_t, void**), pthread_t,
void**) const;
int operator()(int (*)(pthread_mutex_t*), pthread_mutex_t*) const;
int operator()(int (*)(pthread_spinlock_t*), pthread_spinlock_t*) const;
int operator()(int (*)(pthread_rwlock_t*), pthread_rwlock_t*) const;
int operator()(int (*)(pthread_barrier_t*), pthread_barrier_t*) const;
int operator()(int (*)(pthread_t, void**), pthread_t, void**) const;
private:
static bool is_disabled();
static hash_array_t& get_hashes();
template <typename... Args>
auto operator()(uintptr_t&&, const gotcha_data_t&, int (*)(Args...), Args...) const;
auto operator()(uintptr_t&&, int (*)(Args...), Args...) const;
mutable bool m_protect = false;
const gotcha_data_t* m_data = nullptr;
};
using pthread_mutex_gotcha_t = comp::gotcha<pthread_mutex_gotcha::gotcha_capacity,
quirk::fast, pthread_mutex_gotcha>;
std::tuple<>, pthread_mutex_gotcha>;
} // namespace component
} // namespace omnitrace
OMNITRACE_DEFINE_CONCRETE_TRAIT(fast_gotcha, component::pthread_mutex_gotcha_t, true_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(static_data, component::pthread_mutex_gotcha_t, true_type)
namespace tim
{
namespace policy
{
using pthread_mutex_gotcha = ::omnitrace::component::pthread_mutex_gotcha;
using pthread_mutex_gotcha_t = ::omnitrace::component::pthread_mutex_gotcha_t;
template <>
struct static_data<pthread_mutex_gotcha, pthread_mutex_gotcha_t> : std::true_type
{
template <size_t N>
pthread_mutex_gotcha& operator()(std::integral_constant<size_t, N>,
const component::gotcha_data& _data) const;
};
} // namespace policy
} // namespace tim
@@ -33,7 +33,7 @@ operator<<(std::ostream& _os, const ncclUniqueId& _v)
return _os;
}
namespace tim
namespace omnitrace
{
namespace component
{
@@ -59,7 +59,7 @@ activate_rcclp()
std::stringstream ss;
ss << "timemory-rcclp-" << demangle<rccl_toolset_t>() << "-"
<< demangle<api::rccl>();
<< demangle<category::rocm_rccl>();
tim::manager::instance()->add_cleanup(ss.str(), cleanup_functor);
return 1;
}
@@ -75,7 +75,7 @@ deactivate_rcclp(uint64_t id)
{
std::stringstream ss;
ss << "timemory-rcclp-" << demangle<rccl_toolset_t>() << "-"
<< demangle<api::rccl>();
<< demangle<category::rocm_rccl>();
tim::manager::instance()->cleanup(ss.str());
return 0;
}
@@ -90,7 +90,7 @@ configure_rcclp(const std::set<std::string>& permit, const std::set<std::string>
static constexpr size_t rcclp_wrapper_count = OMNITRACE_NUM_RCCLP_WRAPPERS;
using rcclp_gotcha_t =
tim::component::gotcha<rcclp_wrapper_count, rccl_toolset_t, api::rccl>;
tim::component::gotcha<rcclp_wrapper_count, rccl_toolset_t, category::rocm_rccl>;
static bool is_initialized = false;
if(!is_initialized)
@@ -197,4 +197,4 @@ rcclp_handle::get_tool_count()
return get_persistent_data().m_count;
}
} // namespace component
} // namespace tim
} // namespace omnitrace
@@ -49,18 +49,20 @@
# define OMNITRACE_NUM_RCCLP_WRAPPERS 25
#endif
TIMEMORY_COMPONENT_ALIAS(
OMNITRACE_COMPONENT_ALIAS(
rccl_toolset_t,
component_bundle<rccl_api_t, omnitrace::component::category_region<category::rccl>,
comm_data>)
TIMEMORY_COMPONENT_ALIAS(rcclp_gotcha_t,
gotcha<OMNITRACE_NUM_RCCLP_WRAPPERS, rccl_toolset_t, rccl_api_t>)
::tim::component_bundle<category::rocm_rccl,
omnitrace::component::category_region<category::rocm_rccl>,
comm_data>)
OMNITRACE_COMPONENT_ALIAS(rcclp_gotcha_t,
::tim::component::gotcha<OMNITRACE_NUM_RCCLP_WRAPPERS,
rccl_toolset_t, category::rocm_rccl>)
#if !defined(OMNITRACE_USE_RCCL)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, component::rcclp_gotcha_t, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::rcclp_gotcha_t, false_type)
#endif
namespace tim
namespace omnitrace
{
namespace component
{
@@ -106,4 +108,4 @@ private:
static std::atomic<int64_t>& get_tool_count();
};
} // namespace component
} // namespace tim
} // namespace omnitrace
@@ -22,8 +22,6 @@
#include "library/components/rocprofiler.hpp"
#include "library/common.hpp"
#include "library/components/pthread_create_gotcha.hpp"
#include "library/components/pthread_gotcha.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
@@ -45,7 +43,7 @@
#include <string_view>
#include <type_traits>
namespace tim
namespace omnitrace
{
namespace component
{
@@ -59,10 +57,10 @@ rocprofiler_activity_count()
}
} // namespace
omnitrace::unique_ptr_t<rocm_data_t>&
unique_ptr_t<rocm_data_t>&
rocm_data(int64_t _tid)
{
using thread_data_t = omnitrace::thread_data<rocm_data_t, rocm_event>;
using thread_data_t = thread_data<rocm_data_t, rocm_event>;
static auto& _v = thread_data_t::instances(thread_data_t::construct_on_init{});
return _v.at(_tid);
}
@@ -179,153 +177,6 @@ rocprofiler::shutdown()
{
omnitrace::rocprofiler::post_process();
omnitrace::rocprofiler::rocm_cleanup();
/*
using storage_type = typename rocprofiler_data::storage_type;
using bundle_t = rocprofiler_data;
using tag_t = api::omnitrace;
auto _data = omnitrace::rocprofiler::get_data();
auto _labels = omnitrace::rocprofiler::get_data_labels();
auto _info = omnitrace::rocprofiler::rocm_metrics();
int64_t _idx = 0;
auto _scope = tim::scope::get_default();
auto _get_metric_desc = [_info](std::string_view _v) {
for(auto itr : _info)
{
if(itr.symbol().find(_v) == 0 || itr.short_description().find(_v) == 0)
return std::make_pair(itr.short_description(), itr.long_description());
}
return std::make_pair(std::string{}, std::string{});
};
auto _debug = settings::debug();
settings::debug() = true;
struct hw_counters
{};
using rocm_counter = omnitrace::rocprofiler::rocm_counter;
struct perfetto_rocm_event
{
rocm_counter entry = {};
rocm_counter exit = {};
rocprofiler_value value = {};
bool operator<(const perfetto_rocm_event& _v) const
{
return (entry.at(0) == _v.entry.at(0)) ? exit.at(0) < _v.exit.at(0)
: entry.at(0) < _v.entry.at(0);
}
};
// contains the necessary info for export to perfetto
auto _perfetto_raw_data =
std::map<int64_t, std::map<int64_t, std::vector<perfetto_rocm_event>>>{};
// contains the time-stamp regions for the counter tracks
auto _perfetto_time_regions =
std::map<int64_t, std::map<int64_t, std::set<uint64_t>>>{};
// create a layout compatible for exporting to perfetto
for(const auto& itr : _labels)
{
auto _dev_id = itr.first;
auto _dev_name = JOIN("", '[', _dev_id, ']');
for(size_t i = 0; i < itr.second.size(); ++i)
{
auto _metric_name = itr.second.at(i);
auto _idx = perfetto_counter_track<hw_counters>::emplace(
_dev_id, JOIN(' ', "Device", _metric_name, _dev_name));
auto& _raw = _perfetto_raw_data[_dev_id][_idx];
auto& _reg = _perfetto_time_regions[_dev_id][_idx];
for(const auto& ditr : _data)
{
_raw.emplace_back(
perfetto_rocm_event{ ditr.entry, ditr.exit, ditr.data.at(i) });
}
std::sort(_raw.begin(), _raw.end());
for(auto ritr : _raw)
{
if(pthread_create_gotcha::is_valid_execution_time(0, ritr.entry.at(0)))
_reg.emplace(ritr.entry.at(0));
if(pthread_create_gotcha::is_valid_execution_time(0, ritr.exit.at(0)))
_reg.emplace(ritr.exit.at(0));
}
}
}
for(auto& ditr : _perfetto_time_regions)
for(auto& citr : ditr.second)
{
for(auto _ts = citr.second.begin(); _ts != citr.second.end(); ++_ts)
{
rocprofiler_value _v = {};
auto _curr = _ts;
auto _next = std::next(_ts);
if(_next == citr.second.end()) continue;
auto _min_ts = *_curr;
auto _max_ts = (_next == citr.second.end()) ? *_curr : *_next;
for(auto itr : _perfetto_raw_data[ditr.first][citr.first])
{
if(itr.entry[0] >= _min_ts && itr.exit[0] <= _max_ts)
{
using namespace tim::stl;
_v += itr.value;
}
}
auto _write_counter = [&](auto _v) {
if(_min_ts == _max_ts)
{
using value_type = std::remove_reference_t<
std::remove_cv_t<decay_t<decltype(_v)>>>;
_v = static_cast<value_type>(0);
}
TRACE_COUNTER(
"hardware_counter",
perfetto_counter_track<hw_counters>::at(ditr.first, citr.first),
_min_ts, _v);
};
std::visit(_write_counter, _v);
}
}
for(const auto& itr : _labels)
{
for(size_t i = 0; i < itr.second.size(); ++i)
{
auto _metric_name = itr.second.at(i);
auto _metric_desc = _get_metric_desc(_metric_name).second;
rocprofiler_data::label() = _metric_name;
if(!_metric_desc.empty())
rocprofiler_data::description() = JOIN(" - ", "rocprof", _metric_desc);
auto _dev_id = itr.first;
auto _label = JOIN('-', "rocprofiler", _metric_name, "device", _dev_id);
storage_type _storage{ standalone_storage{}, ++_idx, _label };
std::vector<bundle_t> _bundles = {};
_bundles.reserve(_data.size());
for(const auto& ditr : _data)
{
auto _hash = add_hash_id(ditr.name);
auto _v = ditr.data.at(i);
auto _obj = std::tie(_bundles.emplace_back(bundle_t{}));
invoke::reset<tag_t>(_obj);
invoke::push<tag_t>(_obj, _scope, _hash, &_storage, _dev_id);
invoke::start<tag_t>(_obj);
invoke::store<tag_t>(_obj, _v);
invoke::stop<tag_t>(_obj);
invoke::pop<tag_t>(_obj, &_storage, _dev_id);
}
_storage.write(_label);
}
}
settings::debug() = _debug;
*/
OMNITRACE_VERBOSE_F(1, "rocprofiler is shutdown\n");
}
@@ -336,8 +187,8 @@ rocprofiler::protect_flush_activity()
[]() { ++rocprofiler_activity_count(); });
}
} // namespace component
} // namespace tim
} // namespace omnitrace
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(rocprofiler, false, void)
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(rocprofiler_data, true,
tim::component::rocprofiler_value)
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(rocprofiler, false, void)
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(rocprofiler_data, true,
tim::component::rocprofiler_value)
@@ -47,19 +47,7 @@
#include <variant>
#include <vector>
#if !defined(OMNITRACE_MAX_COUNTERS)
# define OMNITRACE_MAX_COUNTERS 25
#endif
#if !defined(OMNITRACE_ROCM_LOOK_AHEAD)
# define OMNITRACE_ROCM_LOOK_AHEAD 128
#endif
#if !defined(OMNITRACE_MAX_ROCM_QUEUES)
# define OMNITRACE_MAX_ROCM_QUEUES OMNITRACE_MAX_THREADS
#endif
namespace tim
namespace omnitrace
{
namespace component
{
@@ -159,7 +147,21 @@ rocprofiler::is_setup()
}
#endif
} // namespace component
} // namespace omnitrace
namespace tim
{
namespace component
{
using ::omnitrace::component::rocm_data_tracker;
using ::omnitrace::component::rocm_feature_value;
using ::omnitrace::component::rocprofiler_data;
using ::omnitrace::component::rocprofiler_value;
} // namespace component
} // namespace tim
namespace tim
{
namespace operation
{
template <>
@@ -214,25 +216,26 @@ struct get_storage<component::rocm_data_tracker>
} // namespace tim
#if !defined(OMNITRACE_USE_ROCTRACER)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, component::rocprofiler_data, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::rocprofiler_data, false_type)
#endif
TIMEMORY_SET_COMPONENT_API(component::rocprofiler_data, project::timemory,
category::timing, os::supports_unix)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category, component::rocprofiler_data,
false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(uses_timing_units, component::rocprofiler_data, false_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_units, component::rocprofiler_data, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_timing_category, component::rocprofiler_data,
false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(uses_timing_units, component::rocprofiler_data,
false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(report_units, component::rocprofiler_data, false_type)
TIMEMORY_STATISTICS_TYPE(component::rocprofiler_data, component::rocprofiler_value)
TIMEMORY_STATISTICS_TYPE(component::rocm_data_tracker, component::rocm_feature_value)
TIMEMORY_DEFINE_CONCRETE_TRAIT(report_units, component::rocm_data_tracker, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(report_units, component::rocm_data_tracker, false_type)
#if !defined(OMNITRACE_EXTERN_COMPONENTS) || \
(defined(OMNITRACE_EXTERN_COMPONENTS) && OMNITRACE_EXTERN_COMPONENTS > 0)
# include <timemory/operations.hpp>
TIMEMORY_DECLARE_EXTERN_COMPONENT(rocprofiler, false, void)
TIMEMORY_DECLARE_EXTERN_COMPONENT(rocprofiler_data, true, double)
OMNITRACE_DECLARE_EXTERN_COMPONENT(rocprofiler, false, void)
OMNITRACE_DECLARE_EXTERN_COMPONENT(rocprofiler_data, true, double)
#endif
@@ -21,6 +21,7 @@
// SOFTWARE.
#include "library/components/roctracer.hpp"
#include "library/common.hpp"
#include "library/components/pthread_gotcha.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
@@ -31,9 +32,7 @@
#include "library/sampling.hpp"
#include "library/thread_data.hpp"
using namespace omnitrace;
namespace tim
namespace omnitrace
{
namespace component
{
@@ -243,7 +242,7 @@ roctracer::protect_flush_activity()
[]() { ++roctracer_activity_count(); });
}
} // namespace component
} // namespace tim
} // namespace omnitrace
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(roctracer, false, void)
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(roctracer_data, true, double)
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(roctracer, false, void)
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(roctracer_data, true, double)
@@ -22,6 +22,7 @@
#pragma once
#include "library/common.hpp"
#include "library/components/fwd.hpp"
#include "library/defines.hpp"
@@ -35,12 +36,13 @@
#include <timemory/mpl/types.hpp>
#include <timemory/utility/transient_function.hpp>
namespace tim
OMNITRACE_COMPONENT_ALIAS(roctracer_data,
::tim::component::data_tracker<double, roctracer>)
namespace omnitrace
{
namespace component
{
using roctracer_data = data_tracker<double, roctracer>;
struct roctracer
: base<roctracer, void>
, private policy::instance_tracker<roctracer, false>
@@ -87,23 +89,25 @@ roctracer::is_setup()
}
#endif
} // namespace component
} // namespace tim
} // namespace omnitrace
#if !defined(OMNITRACE_USE_ROCTRACER)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_available, component::roctracer_data, false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_available, component::roctracer_data, false_type)
#endif
TIMEMORY_SET_COMPONENT_API(component::roctracer_data, project::timemory, category::timing,
os::supports_unix)
TIMEMORY_DEFINE_CONCRETE_TRAIT(is_timing_category, component::roctracer_data, true_type)
TIMEMORY_DEFINE_CONCRETE_TRAIT(uses_timing_units, component::roctracer_data, true_type)
TIMEMORY_SET_COMPONENT_API(omnitrace::component::roctracer_data, project::timemory,
category::timing, os::supports_unix)
OMNITRACE_DEFINE_CONCRETE_TRAIT(is_timing_category, component::roctracer_data, true_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(uses_timing_units, component::roctracer_data, true_type)
#if !defined(OMNITRACE_EXTERN_COMPONENTS) || \
(defined(OMNITRACE_EXTERN_COMPONENTS) && OMNITRACE_EXTERN_COMPONENTS > 0)
#if defined(OMNITRACE_USE_ROCTRACER) && OMNITRACE_USE_ROCTRACER > 0
# if !defined(OMNITRACE_EXTERN_COMPONENTS) || \
(defined(OMNITRACE_EXTERN_COMPONENTS) && OMNITRACE_EXTERN_COMPONENTS > 0)
# include <timemory/operations.hpp>
# include <timemory/operations.hpp>
TIMEMORY_DECLARE_EXTERN_COMPONENT(roctracer, false, void)
TIMEMORY_DECLARE_EXTERN_COMPONENT(roctracer_data, true, double)
OMNITRACE_DECLARE_EXTERN_COMPONENT(roctracer, false, void)
OMNITRACE_DECLARE_EXTERN_COMPONENT(roctracer_data, true, double)
# endif
#endif
@@ -23,23 +23,46 @@
#pragma once
#include "library/defines.hpp"
#include "library/timemory.hpp"
#include <timemory/mpl/concepts.hpp>
#include <memory>
#include <optional>
namespace omnitrace
{
namespace component
{
// timemory component which calls omnitrace functions
// (used in gotcha wrappers)
struct omnitrace : comp::base<omnitrace, void>
{
static std::string label() { return "omnitrace"; }
void start();
void stop();
void set_prefix(const char*);
namespace concepts = ::tim::concepts; // NOLINT
private:
const char* m_prefix = nullptr;
};
} // namespace component
template <typename Tp>
struct thread_deleter;
// unique ptr type for omnitrace
template <typename Tp>
using unique_ptr_t = std::unique_ptr<Tp, thread_deleter<Tp>>;
} // namespace omnitrace
namespace tim
{
namespace concepts
{
template <typename Tp>
struct is_unique_pointer : std::false_type
{};
template <typename Tp>
struct is_unique_pointer<::omnitrace::unique_ptr_t<Tp>> : std::true_type
{};
template <typename Tp>
struct is_unique_pointer<std::unique_ptr<Tp>> : std::true_type
{};
template <typename Tp>
struct is_optional : std::false_type
{};
template <typename Tp>
struct is_optional<std::optional<Tp>> : std::true_type
{};
} // namespace concepts
} // namespace tim
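
The `is_unique_pointer` and `is_optional` traits added in this hunk make it possible to branch on the wrapper type at compile time. The following is a rough sketch of one way such traits could be used. It lives in a hypothetical `sketch` namespace, the `has_value` helper is an illustrative assumption rather than something from the PR, and it uses a single partial specialization over the deleter type instead of the two specializations shown in the diff.

```cpp
#include <cassert>  // only needed for the checks in the usage note
#include <memory>
#include <optional>
#include <type_traits>

namespace sketch
{
template <typename Tp>
struct is_unique_pointer : std::false_type
{};

// matches std::unique_ptr with any deleter (a simplification of the diff)
template <typename Tp, typename Del>
struct is_unique_pointer<std::unique_ptr<Tp, Del>> : std::true_type
{};

template <typename Tp>
struct is_optional : std::false_type
{};

template <typename Tp>
struct is_optional<std::optional<Tp>> : std::true_type
{};

// Hypothetical helper: reports whether a wrapper currently holds a value.
// Types that are neither a unique_ptr nor an optional always "hold" a value.
template <typename Tp>
bool
has_value(const Tp& _v)
{
    if constexpr(is_unique_pointer<Tp>::value)
        return _v != nullptr;
    else if constexpr(is_optional<Tp>::value)
        return _v.has_value();
    else
        return true;
}
}  // namespace sketch
```

Because the dispatch uses `if constexpr`, only the branch matching the argument type is instantiated, so `has_value(5)` compiles even though `int` has no `has_value()` member.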
@@ -34,6 +34,8 @@
#include <timemory/backends/threading.hpp>
#include <timemory/environment.hpp>
#include <timemory/environment/types.hpp>
#include <timemory/log/color.hpp>
#include <timemory/log/logger.hpp>
#include <timemory/manager.hpp>
#include <timemory/sampling/allocator.hpp>
#include <timemory/settings.hpp>
@@ -174,7 +176,7 @@ configure_settings(bool _init)
if(get_state() < State::Init)
{
::tim::print_demangled_backtrace<64>();
timemory_print_demangled_backtrace<64>();
OMNITRACE_THROW("config::configure_settings() called before "
"omnitrace_init_library. state = %s",
std::to_string(get_state()).c_str());
@@ -220,6 +222,9 @@ configure_settings(bool _init)
"for continuous integration)",
false, "debugging", "advanced");
OMNITRACE_CONFIG_SETTING(bool, "OMNITRACE_COLORIZED_LOG", "Enable colorized logging",
true, "debugging", "advanced");
OMNITRACE_CONFIG_EXT_SETTING(int, "OMNITRACE_DL_VERBOSE",
"Verbosity within the omnitrace-dl library", 0,
"debugging", "libomnitrace-dl", "advanced");
@@ -300,12 +305,35 @@ configure_settings(bool _init)
"Number of software interrupts per second when OMNITRACE_USE_SAMPLING=ON", 10.0,
"sampling", "process_sampling");
OMNITRACE_CONFIG_SETTING(double, "OMNITRACE_SAMPLING_CPUTIME_FREQ",
"Number of software interrupts per second of CPU-time. "
"Defaults to OMNITRACE_SAMPLING_FREQ when <= 0.0",
-1.0, "sampling", "advanced");
OMNITRACE_CONFIG_SETTING(
double, "OMNITRACE_SAMPLING_REALTIME_FREQ",
"Number of software interrupts per second of real (wall) time. "
"Defaults to OMNITRACE_SAMPLING_FREQ when <= 0.0",
-1.0, "sampling", "advanced");
OMNITRACE_CONFIG_SETTING(
double, "OMNITRACE_SAMPLING_DELAY",
"Time (in seconds) to wait before the first sampling signal is delivered, "
"increasing this value can fix deadlocks during init",
0.5, "sampling", "process_sampling");
OMNITRACE_CONFIG_SETTING(double, "OMNITRACE_SAMPLING_CPUTIME_DELAY",
"Time (in seconds) to wait before the first CPU-time "
"sampling signal is delivered. "
"Defaults to OMNITRACE_SAMPLING_DELAY when <= 0.0",
-1.0, "sampling", "advanced");
OMNITRACE_CONFIG_SETTING(
double, "OMNITRACE_SAMPLING_REALTIME_DELAY",
"Time (in seconds) to wait before the first real (wall) time sampling signal is "
"delivered. Defaults to OMNITRACE_SAMPLING_DELAY when <= 0.0",
-1.0, "sampling", "advanced");
OMNITRACE_CONFIG_SETTING(
double, "OMNITRACE_PROCESS_SAMPLING_FREQ",
"Number of measurements per second when OMNITRACE_USE_PROCESS_SAMPLING=ON. If "
@@ -383,7 +411,7 @@ configure_settings(bool _init)
"CPU time used by the current process, "
"and CPU time expended on behalf of the process by the "
"system. This is recommended.",
true, "timemory", "sampling", "advanced");
true, "sampling", "advanced");
auto _sigrt_range = SIGRTMAX - SIGRTMIN;
@@ -472,6 +500,13 @@ configure_settings(bool _init)
"data", "advanced")
->set_choices(get_available_perfetto_categories<std::vector<std::string>>());
OMNITRACE_CONFIG_SETTING(
uint64_t, "OMNITRACE_THREAD_POOL_SIZE",
"Max number of threads for processing background tasks",
std::max<uint64_t>(std::min<uint64_t>(4, std::thread::hardware_concurrency() / 2),
1),
"parallelism", "advanced");
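
The default for `OMNITRACE_THREAD_POOL_SIZE` above is half the hardware concurrency, capped at 4 and never below 1. The sketch below isolates that expression in a hypothetical helper, `default_thread_pool_size`, which is not a function in the PR, so the clamping can be checked for specific core counts.

```cpp
#include <algorithm>
#include <cassert>  // only needed for the checks in the usage note
#include <cstdint>
#include <thread>

// Hypothetical helper mirroring the default expression in the setting above:
// half of the hardware threads, at most 4, at least 1.
inline uint64_t
default_thread_pool_size(uint64_t _hw_threads = std::thread::hardware_concurrency())
{
    return std::max<uint64_t>(std::min<uint64_t>(4, _hw_threads / 2), 1);
}
```

`std::thread::hardware_concurrency()` is permitted to return 0 when the value cannot be determined, which is why the lower bound of 1 matters: an input of 0 or 1 yields 1, 4 yields 2, and anything from 8 upward yields 4.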
OMNITRACE_CONFIG_EXT_SETTING(int64_t, "OMNITRACE_CRITICAL_TRACE_COUNT",
"Number of critical trace to export (0 == all)",
int64_t{ 0 }, "data", "critical_trace",
@@ -482,12 +517,6 @@ configure_settings(bool _init)
"memory before submitting to shared buffer",
uint64_t{ 2000 }, "data", "critical_trace", "advanced");
OMNITRACE_CONFIG_SETTING(
uint64_t, "OMNITRACE_CRITICAL_TRACE_NUM_THREADS",
"Number of threads to use when generating the critical trace",
std::min<uint64_t>(8, std::thread::hardware_concurrency()), "parallelism",
"critical_trace", "advanced");
OMNITRACE_CONFIG_EXT_SETTING(
int64_t, "OMNITRACE_CRITICAL_TRACE_PER_ROW",
"How many critical traces per row in perfetto (0 == all in one row)",
@@ -688,6 +717,9 @@ configure_settings(bool _init)
settings::suppress_config() = true;
if(!get_env("OMNITRACE_COLORIZED_LOG", _config->get<bool>("OMNITRACE_COLORIZED_LOG")))
tim::log::colorized() = false;
if(_init)
{
using argparser_t = tim::argparse::argument_parser;
@@ -758,6 +790,7 @@ configure_mode_settings()
set_default_setting_value("OMNITRACE_USE_CODE_COVERAGE", true);
_set("OMNITRACE_USE_PERFETTO", false);
_set("OMNITRACE_USE_TIMEMORY", false);
//_set("OMNITRACE_USE_CAUSAL", false);
_set("OMNITRACE_USE_ROCM_SMI", false);
_set("OMNITRACE_USE_ROCTRACER", false);
_set("OMNITRACE_USE_ROCPROFILER", false);
@@ -814,6 +847,7 @@ configure_mode_settings()
{
_set("OMNITRACE_USE_PERFETTO", false);
_set("OMNITRACE_USE_TIMEMORY", false);
//_set("OMNITRACE_USE_CAUSAL", false);
_set("OMNITRACE_USE_ROCM_SMI", false);
_set("OMNITRACE_USE_ROCTRACER", false);
_set("OMNITRACE_USE_ROCPROFILER", false);
@@ -878,7 +912,7 @@ configure_signal_handler()
"signal %s (%i) ignored (OMNITRACE_IGNORE_DYNINST_TRAMPOLINE=ON)\n",
std::get<0>(_info).c_str(), _v);
if(get_verbose_env() > 1 || get_debug_env())
::tim::print_demangled_backtrace<64>();
timemory_print_demangled_backtrace<64>();
if(_old_handler) _old_handler(_v);
}
};
@@ -1071,7 +1105,8 @@ print_banner(std::ostream& _os)
\______/ |__| |__| |__| \__| |__| |__| | _| `._____/__/ \__\ \______||_______|
)banner";
_os << _banner << std::endl;
tim::log::stream(_os, tim::log::color::info()) << _banner;
_os << std::endl;
}
void
@@ -1139,7 +1174,6 @@ print_settings(
_spacer << "#" << std::setw(tot_width + _spacer_extra) << ""
<< "#";
_os << _spacer.str() << "\n";
// _os << "# api::omnitrace settings:" << std::setw(tot_width - 8) << "#" << "\n";
for(const auto& itr : _data)
{
_os << ((_md) ? "| " : "# ");
@@ -1174,9 +1208,11 @@ print_settings(
}
_os << ((_md) ? "\n" : " #\n");
}
_os << _spacer.str() << "\n";
_ros << _os.str() << std::flush;
tim::log::stream(_ros, tim::log::color::info()) << _os.str();
_ros << std::flush;
}
void
@@ -1191,9 +1227,13 @@ print_settings(bool _include_env)
if(_include_env)
{
std::cerr << tim::log::info;
tim::print_env(std::cerr, [_is_omnitrace_option](const std::string& _v) {
return _is_omnitrace_option(_v, std::set<std::string>{});
auto _is_omni_opt = _is_omnitrace_option(_v, std::set<std::string>{});
if(settings::verbose() >= 2 || settings::debug()) return _is_omni_opt;
return (_is_omni_opt && _v.find("OMNITRACE_SIGNAL_") != 0);
});
std::cerr << tim::log::flush;
}
print_settings(std::cerr, _is_omnitrace_option);
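
The reworked predicate in this hunk hides environment variables whose names start with `OMNITRACE_SIGNAL_` unless the verbosity is at least 2 or debug is enabled. Below is a simplified sketch of that filter. It assumes an "omnitrace option" is any name starting with `OMNITRACE_`, which is only an approximation of whatever `_is_omnitrace_option` checks, and `starts_with` and `show_env_var` are hypothetical helpers.

```cpp
#include <cassert>  // only needed for the checks in the usage note
#include <string_view>

// Hypothetical helper: true when _v begins with _prefix.
inline bool
starts_with(std::string_view _v, std::string_view _prefix)
{
    return _v.substr(0, _prefix.size()) == _prefix;
}

// Assumption: an "omnitrace option" is approximated by the OMNITRACE_ prefix.
inline bool
show_env_var(std::string_view _name, int _verbose, bool _debug)
{
    const bool _is_omni_opt = starts_with(_name, "OMNITRACE_");
    // at high verbosity (or in debug mode) show every omnitrace option
    if(_verbose >= 2 || _debug) return _is_omni_opt;
    // otherwise hide the OMNITRACE_SIGNAL_* entries
    return _is_omni_opt && !starts_with(_name, "OMNITRACE_SIGNAL_");
}
```

The sketch uses an explicit prefix comparison where the diff uses `find(...) != 0`. The two are equivalent for this purpose.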
@@ -1607,10 +1647,9 @@ get_critical_trace_update_freq()
}
uint64_t
get_critical_trace_num_threads()
get_thread_pool_size()
{
static uint64_t _v =
get_config()->get<uint64_t>("OMNITRACE_CRITICAL_TRACE_NUM_THREADS");
static uint64_t _v = get_config()->get<uint64_t>("OMNITRACE_THREAD_POOL_SIZE");
return _v;
}
@@ -1671,13 +1710,49 @@ get_sampling_freq()
return static_cast<tim::tsettings<double>&>(*_v->second).get();
}
double&
double
get_sampling_cpu_freq()
{
static auto _v = get_config()->find("OMNITRACE_SAMPLING_CPUTIME_FREQ");
auto& _val = static_cast<tim::tsettings<double>&>(*_v->second).get();
if(_val <= 0.0) _val = get_sampling_freq();
return _val;
}
double
get_sampling_real_freq()
{
static auto _v = get_config()->find("OMNITRACE_SAMPLING_REALTIME_FREQ");
auto& _val = static_cast<tim::tsettings<double>&>(*_v->second).get();
if(_val <= 0.0) _val = get_sampling_freq();
return _val;
}
double
get_sampling_delay()
{
static auto _v = get_config()->find("OMNITRACE_SAMPLING_DELAY");
return static_cast<tim::tsettings<double>&>(*_v->second).get();
}
double
get_sampling_cpu_delay()
{
static auto _v = get_config()->find("OMNITRACE_SAMPLING_CPUTIME_DELAY");
auto& _val = static_cast<tim::tsettings<double>&>(*_v->second).get();
if(_val <= 0.0) _val = get_sampling_delay();
return _val;
}
double
get_sampling_real_delay()
{
static auto _v = get_config()->find("OMNITRACE_SAMPLING_REALTIME_DELAY");
auto& _val = static_cast<tim::tsettings<double>&>(*_v->second).get();
if(_val <= 0.0) _val = get_sampling_delay();
return _val;
}
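
The four per-clock getters above share one rule: a per-clock value that is `<= 0.0` falls back to the generic `OMNITRACE_SAMPLING_FREQ` or `OMNITRACE_SAMPLING_DELAY` value. The sketch below shows that rule in isolation for the frequency case. `sampling_settings`, `resolve`, `get_cpu_freq`, and `get_real_freq` are hypothetical stand-ins rather than the omnitrace configuration API. Unlike the getters in the diff, which write the fallback back into the setting, this sketch leaves the stored values untouched.

```cpp
#include <cassert>  // only needed for the checks in the usage note

// Hypothetical stand-in for the relevant settings; defaults follow the diff
// (generic frequency 10.0, per-clock overrides -1.0 meaning "not set").
struct sampling_settings
{
    double freq          = 10.0;
    double cputime_freq  = -1.0;
    double realtime_freq = -1.0;
};

// Return the override when it is positive, otherwise the generic value.
inline double
resolve(double _override, double _generic)
{
    return (_override > 0.0) ? _override : _generic;
}

inline double
get_cpu_freq(const sampling_settings& _s)
{
    return resolve(_s.cputime_freq, _s.freq);
}

inline double
get_real_freq(const sampling_settings& _s)
{
    return resolve(_s.realtime_freq, _s.freq);
}
```

With default-constructed settings both getters return 10.0. Setting `cputime_freq` to 50.0 changes only the CPU-time result, and the real-time sampler continues to use the generic frequency.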
std::string
get_sampling_cpus()
{
@@ -22,7 +22,7 @@
#pragma once
#include "library/api.hpp"
#include "api.hpp"
#include "library/common.hpp"
#include "library/defines.hpp"
#include "library/state.hpp"
@@ -262,7 +262,7 @@ uint64_t
get_critical_trace_update_freq();
uint64_t
get_critical_trace_num_threads();
get_thread_pool_size();
std::string
get_trace_hsa_api_types();
@@ -283,9 +283,21 @@ get_instrumentation_interval();
double
get_sampling_freq();
double&
double
get_sampling_cpu_freq();
double
get_sampling_real_freq();
double
get_sampling_delay();
double
get_sampling_cpu_delay();
double
get_sampling_real_delay();
std::string
get_sampling_cpus();
@@ -21,7 +21,7 @@
// SOFTWARE.
#include "library/coverage.hpp"
#include "library/api.hpp"
#include "api.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/impl/coverage.hpp"
@@ -238,8 +238,8 @@ post_process()
if(tim::filepath::open(ofs, _fname))
{
if(get_verbose() >= 0)
fprintf(stderr, "[%s][coverage]|%i> Outputting '%s'...\n",
TIMEMORY_PROJECT_NAME, dmp::rank(), _fname.c_str());
operation::file_output_message<code_coverage>{}(
_fname, std::string{ "coverage" });
for(auto& itr : _coverage_data)
{
// if(get_debug() && get_verbose() >= 2)
@@ -283,8 +283,8 @@ post_process()
if(tim::filepath::open(ofs, _fname))
{
if(get_verbose() >= 0)
fprintf(stderr, "[%s][coverage]|%i> Outputting '%s'...\n",
TIMEMORY_PROJECT_NAME, dmp::rank(), _fname.c_str());
operation::file_output_message<code_coverage>{}(
_fname, std::string{ "coverage" });
ofs << oss.str() << "\n";
}
else
@@ -22,12 +22,14 @@
#include "library/cpu_freq.hpp"
#include "library/common.hpp"
#include "library/components/cpu_freq.hpp"
#include "library/components/fwd.hpp"
#include "library/components/pthread_create_gotcha.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
#include "library/perfetto.hpp"
#include "library/thread_data.hpp"
#include "library/thread_info.hpp"
#include "library/timemory.hpp"
#include <timemory/components/rusage/backends.hpp>
@@ -44,39 +46,21 @@
#include <utility>
#include <vector>
namespace cpuinfo = tim::procfs::cpuinfo;
namespace omnitrace
{
namespace cpu_freq
{
using namespace ::tim::cpu_freq;
using cpu_freq_component = component::cpu_freq;
template <typename... Tp>
using type_list = tim::type_list<Tp...>;
namespace
{
struct cpu_freq // cpu frequency
{};
struct cpu_page // amount of memory allocated in pages
{};
struct cpu_virt // virtual memory usage
{};
struct cpu_peak // memory high-water mark
{};
struct cpu_context_switch
{};
struct cpu_page_fault
{};
struct cpu_user_mode_time // cpu time spent in userspace
{};
struct cpu_kernel_mode_time // cpu time spent in kernelspace
{};
using cpu_data_tuple_t = std::tuple<size_t, int64_t, int64_t, int64_t, int64_t, int64_t,
int64_t, int64_t, std::vector<double>>;
std::set<size_t> enabled_cpu_freqs = {};
std::deque<cpu_data_tuple_t> cpu_data = {};
int64_t ncpu = threading::affinity::hw_concurrency();
int64_t, int64_t, cpu_freq_component>;
std::deque<cpu_data_tuple_t> cpu_data = {};
template <typename... Types>
void init_perfetto_counter_tracks(type_list<Types...>)
@@ -87,18 +71,6 @@ void init_perfetto_counter_tracks(type_list<Types...>)
} // namespace cpu_freq
} // namespace omnitrace
TIMEMORY_DEFINE_NAME_TRAIT("cpu_freq", omnitrace::cpu_freq::cpu_freq);
TIMEMORY_DEFINE_NAME_TRAIT("process_page_fault", omnitrace::cpu_freq::cpu_page);
TIMEMORY_DEFINE_NAME_TRAIT("process_virtual_memory", omnitrace::cpu_freq::cpu_virt);
TIMEMORY_DEFINE_NAME_TRAIT("process_memory_hwm", omnitrace::cpu_freq::cpu_peak);
TIMEMORY_DEFINE_NAME_TRAIT("process_context_switch",
omnitrace::cpu_freq::cpu_context_switch);
TIMEMORY_DEFINE_NAME_TRAIT("process_page_fault", omnitrace::cpu_freq::cpu_page_fault);
TIMEMORY_DEFINE_NAME_TRAIT("process_user_cpu_time",
omnitrace::cpu_freq::cpu_user_mode_time);
TIMEMORY_DEFINE_NAME_TRAIT("process_kernel_cpu_time",
omnitrace::cpu_freq::cpu_kernel_mode_time);
namespace omnitrace
{
namespace cpu_freq
@@ -107,109 +79,24 @@ void
setup()
{
init_perfetto_counter_tracks(
type_list<cpu_freq, cpu_page, cpu_virt, cpu_peak, cpu_context_switch,
type_list<cpu_freq_component, cpu_page, cpu_virt, cpu_peak, cpu_context_switch,
cpu_page_fault, cpu_user_mode_time, cpu_kernel_mode_time>{});
}
void
config()
{
auto _ncpu = cpuinfo::freq::size();
auto _enabled_freqs = std::set<size_t>{};
auto _enabled_val = get_sampling_cpus();
for(auto& itr : _enabled_val)
itr = tolower(itr);
if(_enabled_val == "off")
_enabled_val = "none";
else if(_enabled_val == "on")
_enabled_val = "all";
if(_enabled_val != "none" && _enabled_val != "all")
{
auto _enabled = tim::delimit(_enabled_val, ",; \t");
if(_enabled.empty())
{
for(size_t i = 0; i < _ncpu; ++i)
_enabled_freqs.emplace(i);
}
for(auto&& _v : _enabled)
{
if(_v.find_first_not_of("0123456789-") != std::string::npos)
{
OMNITRACE_VERBOSE_F(
0,
"Invalid CPU specification. Only numerical values (e.g., 0) or "
"ranges (e.g., 0-7) are permitted. Ignoring %s...",
_v.c_str());
continue;
}
if(_v.find('-') != std::string::npos)
{
auto _vv = tim::delimit(_v, "-");
OMNITRACE_CONDITIONAL_THROW(
_vv.size() != 2,
"Invalid CPU range specification: %s. Required format N-M, e.g. 0-4",
_v.c_str());
for(size_t i = std::stoull(_vv.at(0)); i <= std::stoull(_vv.at(1)); ++i)
_enabled_freqs.emplace(i);
}
else
{
_enabled_freqs.emplace(std::stoull(_v));
}
}
}
else if(_enabled_val == "all")
{
for(size_t i = 0; i < _ncpu; ++i)
_enabled_freqs.emplace(i);
}
else if(_enabled_val == "none")
{
_enabled_freqs.clear();
}
for(auto itr : _enabled_freqs)
{
if(itr < cpuinfo::freq::size())
_enabled_freqs.emplace(itr);
else
{
OMNITRACE_VERBOSE(
0, "[cpu_freq::config] Warning! Removing invalid cpu %zu...\n", itr);
}
}
if(!cpuinfo::freq{})
{
OMNITRACE_VERBOSE(0, "[cpu_freq::config] Warning! CPU frequencies are disabled "
":: unable to open /proc/cpuinfo");
_enabled_freqs.clear();
}
OMNITRACE_CI_FAIL(!cpuinfo::freq{}, "[cpu_freq::config] CPU frequencies are disabled "
":: unable to open /proc/cpuinfo");
enabled_cpu_freqs = _enabled_freqs;
cpu_freq_component::configure();
}
void
sample()
{
std::vector<double> _freqs{};
if(!enabled_cpu_freqs.empty())
{
_freqs.reserve(enabled_cpu_freqs.size());
auto&& _freq = cpuinfo::freq{};
for(const auto& itr : enabled_cpu_freqs)
{
_freqs.emplace_back(_freq(itr));
}
}
auto _ts = tim::get_clock_real_now<size_t, std::nano>();
tim::rusage_cache _rcache{ RUSAGE_SELF };
auto _rcache = tim::rusage_cache{ RUSAGE_SELF };
auto _freqs = cpu_freq_component{}.sample();
// user and kernel mode times are in microseconds
cpu_data.emplace_back(
_ts, tim::get_page_rss(), tim::get_virt_mem(), _rcache.get_peak_rss(),
@@ -276,7 +163,12 @@ post_process()
OMNITRACE_PRINT("Post-processing %zu cpu frequency and memory usage entries...\n",
cpu_data.size());
auto _process_frequencies = [](size_t _idx, size_t _offset) {
using freq_track = perfetto_counter_track<cpu_freq>;
using freq_track = perfetto_counter_track<cpu_freq_component>;
const auto& _thread_info = thread_info::get(0, LookupTID);
OMNITRACE_CI_THROW(!_thread_info, "Missing thread info for thread 0");
if(!_thread_info) return;
if(!freq_track::exists(_idx))
{
auto addendum = [&](const char* _v) {
@@ -289,12 +181,12 @@ post_process()
{
uint64_t _ts = std::get<0>(itr);
double _freq = std::get<8>(itr).at(_offset);
if(!pthread_create_gotcha::is_valid_execution_time(0, _ts)) continue;
write_perfetto_counter_track<cpu_freq>(index{ _idx }, _ts, _freq);
if(!_thread_info->is_valid_time(_ts)) continue;
write_perfetto_counter_track<cpu_freq_component>(index{ _idx }, _ts, _freq);
}
auto _end_ts = pthread_create_gotcha::get_execution_time(0)->second;
write_perfetto_counter_track<cpu_freq>(index{ _idx }, _end_ts, 0);
auto _end_ts = _thread_info->get_stop();
write_perfetto_counter_track<cpu_freq_component>(index{ _idx }, _end_ts, 0);
};
auto _process_cpu_rusage = []() {
@@ -305,10 +197,14 @@ post_process()
"Page Faults", "User Time", "Kernel Time" },
{ "MB", "MB", "MB", "", "", "sec", "sec" });
const auto& _thread_info = thread_info::get(0, LookupTID);
OMNITRACE_CI_THROW(!_thread_info, "Missing thread info for thread 0");
if(!_thread_info) return;
for(auto& itr : cpu_data)
{
uint64_t _ts = std::get<0>(itr);
if(!pthread_create_gotcha::is_valid_execution_time(0, _ts)) continue;
if(!_thread_info->is_valid_time(_ts)) continue;
double _page = std::get<1>(itr);
double _virt = std::get<2>(itr);
@@ -326,7 +222,7 @@ post_process()
write_perfetto_counter_track<cpu_kernel_mode_time>(_ts, _kern / units::sec);
}
auto _end_ts = pthread_create_gotcha::get_execution_time(0)->second;
auto _end_ts = _thread_info->get_stop();
write_perfetto_counter_track<cpu_page>(_end_ts, 0.0);
write_perfetto_counter_track<cpu_virt>(_end_ts, 0.0);
write_perfetto_counter_track<cpu_peak>(_end_ts, 0.0);
@@ -337,13 +233,14 @@ post_process()
};
_process_cpu_rusage();
auto& enabled_cpu_freqs = cpu_freq_component::get_enabled_cpus();
for(auto itr = enabled_cpu_freqs.begin(); itr != enabled_cpu_freqs.end(); ++itr)
{
auto _idx = *itr;
auto _offset = std::distance(enabled_cpu_freqs.begin(), itr);
_process_frequencies(_idx, _offset);
}
enabled_cpu_freqs.clear();
}
} // namespace cpu_freq
@@ -547,7 +547,6 @@ void
add_hash_id(const hash_ids& _labels)
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
std::unique_lock<std::mutex> _lk{ tasking::critical_trace::get_mutex() };
if(!tasking::critical_trace::get_task_group().pool()) return;
tasking::critical_trace::get_task_group().exec([_labels]() {
static std::mutex _mtx{};
@@ -578,7 +577,6 @@ update(int64_t _tid)
{
if(!get_use_critical_trace() && !get_use_rocm_smi()) return;
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
std::unique_lock<std::mutex> _lk{ tasking::critical_trace::get_mutex() };
if(!tasking::critical_trace::get_task_group().pool()) return;
call_chain _data{};
std::swap(_data, *critical_trace::get(_tid));
@@ -590,7 +588,6 @@ compute(int64_t _tid)
{
update(_tid);
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
std::unique_lock<std::mutex> _lk{ tasking::critical_trace::get_mutex() };
if(!tasking::critical_trace::get_task_group().pool()) return;
tasking::critical_trace::get_task_group().exec(compute_critical_trace);
}
@@ -808,13 +805,13 @@ compute_critical_trace()
using perfstats_t =
tim::lightweight_tuple<comp::wall_clock, comp::peak_rss, comp::page_rss>;
perfstats_t _ct_perf{ JOIN("", "[", __FUNCTION__, "]") };
perfstats_t _ct_perf{};
_ct_perf.start();
try
{
OMNITRACE_CT_DEBUG("[%s] initial call chain: %zu entries\n", __FUNCTION__,
complete_call_chain.size());
OMNITRACE_VERBOSE_F(1, "[%s] initial call chain: %zu entries\n", __FUNCTION__,
complete_call_chain.size());
perfstats_t _perf{ get_perf_name(__FUNCTION__) };
_perf.start();
@@ -822,7 +819,7 @@ compute_critical_trace()
std::sort(complete_call_chain.begin(), complete_call_chain.end());
_perf.stop().rekey("Sorting critical trace");
OMNITRACE_CT_DEBUG("%s\n", JOIN("", _perf).c_str());
OMNITRACE_VERBOSE_F(1, "%s\n", JOIN("", _perf).c_str());
_perf.reset().start();
save_call_chain_json(
@@ -830,20 +827,16 @@ compute_critical_trace()
complete_call_chain, true, __FUNCTION__);
_perf.stop().rekey("Save call-chain");
OMNITRACE_CT_DEBUG("%s\n", JOIN("", _perf).c_str());
OMNITRACE_VERBOSE_F(1, "%s\n", JOIN("", _perf).c_str());
} catch(std::exception& e)
{
OMNITRACE_PRINT("Thread exited '%s' with exception: %s\n", __FUNCTION__,
e.what());
OMNITRACE_PRINT_F("Thread exited '%s' with exception: %s\n", __FUNCTION__,
e.what());
TIMEMORY_CONDITIONAL_DEMANGLED_BACKTRACE(true, 32);
}
_ct_perf.stop();
auto _ct_msg = JOIN("", _ct_perf);
auto _ct_pos = _ct_msg.find(">>> ");
if(_ct_pos != std::string::npos) _ct_msg = _ct_msg.substr(_ct_pos + 5);
OMNITRACE_PRINT("%s\n", _ct_msg.c_str());
OMNITRACE_PRINT_F("%s\n", _ct_perf.stop().as_string<false, false>().c_str());
}
} // namespace
@@ -869,8 +862,7 @@ get_entries(int64_t _ts, const std::function<bool(const entry&)>& _eval)
*_targ = _v;
};
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
std::unique_lock<std::mutex> _lk{ tasking::critical_trace::get_mutex() };
size_t _n = 0;
size_t _n = 0;
std::vector<std::pair<std::string, entry>> _v{};
if(!tasking::critical_trace::get_task_group().pool()) return _v;
tasking::critical_trace::get_task_group().exec(_func, &_v, &_n);
@@ -22,16 +22,23 @@
#pragma once
#include "library/common.hpp"
#include "library/config.hpp"
#include "library/defines.hpp"
#include "library/runtime.hpp"
#include "library/thread_data.hpp"
#include <timemory/backends/process.hpp>
#include <timemory/backends/threading.hpp>
#include <timemory/hash/types.hpp>
#include <timemory/macros/language.hpp>
#include <timemory/tpls/cereal/cereal.hpp>
#include <timemory/utility/demangle.hpp>
#include <timemory/utility/utility.hpp>
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <ostream>
#include <string>
#include <vector>
@@ -284,4 +291,78 @@ struct id
{};
} // namespace critical_trace
template <critical_trace::Device DevID, critical_trace::Phase PhaseID,
bool UpdateStack = true>
inline void
add_critical_trace(int32_t _targ_tid, size_t _cpu_cid, size_t _gpu_cid,
size_t _parent_cid, int64_t _ts_beg, int64_t _ts_val, int32_t _devid,
uintptr_t _queue, size_t _hash, uint32_t _depth, uint16_t _prio = 0)
{
// clang-format off
// these are used to create unique type mutexes
struct critical_insert {};
struct cpu_cid_stack {};
// clang-format on
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
static constexpr auto num_mutexes = max_supported_threads;
static auto _update_freq = critical_trace::get_update_frequency();
static auto _pid = process::get_id();
auto _self_tid = threading::get_id();
if constexpr(PhaseID != critical_trace::Phase::NONE)
{
auto& _self_mtx =
type_mutex<critical_insert, project::omnitrace, num_mutexes>(_self_tid);
auto_lock_t _self_lk{ _self_mtx, std::defer_lock };
// unique lock per thread
if(!_self_lk.owns_lock()) _self_lk.lock();
auto& _critical_trace = critical_trace::get(_self_tid);
_critical_trace->emplace_back(critical_trace::entry{
DevID, PhaseID, _prio, _depth, _devid, _pid, _targ_tid, _cpu_cid, _gpu_cid,
_parent_cid, _ts_beg, _ts_val, _queue, _hash });
}
if constexpr(UpdateStack)
{
auto& _self_mtx = get_cpu_cid_stack_lock(_self_tid);
auto& _targ_mtx = get_cpu_cid_stack_lock(_targ_tid);
auto_lock_t _self_lk{ _self_mtx, std::defer_lock };
auto_lock_t _targ_lk{ _targ_mtx, std::defer_lock };
// unique lock per thread
auto _lock = [&_self_lk, &_targ_lk, _self_tid, _targ_tid]() {
if(!_self_lk.owns_lock() && _self_tid != _targ_tid) _self_lk.lock();
if(!_targ_lk.owns_lock()) _targ_lk.lock();
};
if constexpr(PhaseID == critical_trace::Phase::NONE)
{
_lock();
get_cpu_cid_stack(_targ_tid)->emplace_back(_cpu_cid);
}
else if constexpr(PhaseID == critical_trace::Phase::BEGIN)
{
_lock();
get_cpu_cid_stack(_targ_tid)->emplace_back(_cpu_cid);
}
else if constexpr(PhaseID == critical_trace::Phase::END)
{
_lock();
get_cpu_cid_stack(_targ_tid)->pop_back();
if(_gpu_cid == 0 && _cpu_cid % _update_freq == (_update_freq - 1))
critical_trace::update(_targ_tid);
}
tim::consume_parameters(_lock);
}
tim::consume_parameters(_pid, _targ_tid, _cpu_cid, _gpu_cid, _parent_cid, _ts_beg,
_ts_val, _devid, _queue, _hash, _depth, _prio, num_mutexes);
}
} // namespace omnitrace
@@ -28,6 +28,7 @@
#include <timemory/backends/dmp.hpp>
#include <timemory/backends/process.hpp>
#include <timemory/backends/threading.hpp>
#include <timemory/log/logger.hpp>
#include <timemory/mpl/concepts.hpp>
#include <timemory/utility/backtrace.hpp>
#include <timemory/utility/locking.hpp>
@@ -73,10 +74,12 @@ namespace debug
inline void
flush()
{
fprintf(stdout, "%s", ::tim::log::color::end());
fflush(stdout);
std::cout << std::flush;
std::cout << ::tim::log::color::end() << std::flush;
fprintf(stderr, "%s", ::tim::log::color::end());
fflush(stderr);
std::cerr << std::flush;
std::cerr << ::tim::log::color::end() << std::flush;
}
//
struct lock
@@ -112,7 +115,7 @@ get_chars(T&& _c, std::index_sequence<Idx...>)
} // namespace omnitrace
#if !defined(OMNITRACE_DEBUG_BUFFER_LEN)
# define OMNITRACE_DEBUG_BUFFER_LEN 2048
# define OMNITRACE_DEBUG_BUFFER_LEN 1024
#endif
#if !defined(OMNITRACE_PROCESS_IDENTIFIER)
@@ -159,12 +162,18 @@ get_chars(T&& _c, std::index_sequence<Idx...>)
//--------------------------------------------------------------------------------------//
#define OMNITRACE_FPRINTF_STDERR_COLOR(COLOR) \
fprintf(stderr, "%s", ::tim::log::color::COLOR())
//--------------------------------------------------------------------------------------//
#define OMNITRACE_CONDITIONAL_PRINT(COND, ...) \
if((COND) && ::omnitrace::config::get_debug_tid() && \
::omnitrace::config::get_debug_pid()) \
{ \
::omnitrace::debug::flush(); \
::omnitrace::debug::lock _lk{}; \
OMNITRACE_FPRINTF_STDERR_COLOR(info); \
fprintf(stderr, "[omnitrace][%i][%li]%s", OMNITRACE_PROCESS_IDENTIFIER, \
OMNITRACE_THREAD_IDENTIFIER, \
::omnitrace::debug::is_bracket(__VA_ARGS__) ? "" : " "); \
@@ -178,6 +187,7 @@ get_chars(T&& _c, std::index_sequence<Idx...>)
{ \
::omnitrace::debug::flush(); \
::omnitrace::debug::lock _lk{}; \
OMNITRACE_FPRINTF_STDERR_COLOR(info); \
fprintf(stderr, "[omnitrace]%s", \
::omnitrace::debug::is_bracket(__VA_ARGS__) ? "" : " "); \
fprintf(stderr, __VA_ARGS__); \
@@ -190,6 +200,7 @@ get_chars(T&& _c, std::index_sequence<Idx...>)
{ \
::omnitrace::debug::flush(); \
::omnitrace::debug::lock _lk{}; \
OMNITRACE_FPRINTF_STDERR_COLOR(info); \
fprintf(stderr, "[omnitrace][%i][%li][%s]%s", OMNITRACE_PROCESS_IDENTIFIER, \
OMNITRACE_THREAD_IDENTIFIER, OMNITRACE_FUNCTION, \
::omnitrace::debug::is_bracket(__VA_ARGS__) ? "" : " "); \
@@ -203,6 +214,7 @@ get_chars(T&& _c, std::index_sequence<Idx...>)
{ \
::omnitrace::debug::flush(); \
::omnitrace::debug::lock _lk{}; \
OMNITRACE_FPRINTF_STDERR_COLOR(info); \
fprintf(stderr, "[omnitrace][%s]%s", OMNITRACE_FUNCTION, \
::omnitrace::debug::is_bracket(__VA_ARGS__) ? "" : " "); \
fprintf(stderr, __VA_ARGS__); \
@@ -221,7 +233,8 @@ get_chars(T&& _c, std::index_sequence<Idx...>)
::omnitrace::debug::is_bracket(__VA_ARGS__) ? "" : " "); \
auto len = strlen(_msg_buffer); \
snprintf(_msg_buffer + len, OMNITRACE_DEBUG_BUFFER_LEN - len, __VA_ARGS__); \
throw std::runtime_error(_msg_buffer); \
throw std::runtime_error( \
::tim::log::string(::tim::log::color::fatal(), _msg_buffer)); \
}
#define OMNITRACE_CONDITIONAL_BASIC_THROW(COND, ...) \
@@ -233,7 +246,8 @@ get_chars(T&& _c, std::index_sequence<Idx...>)
::omnitrace::debug::is_bracket(__VA_ARGS__) ? "" : " "); \
auto len = strlen(_msg_buffer); \
snprintf(_msg_buffer + len, OMNITRACE_DEBUG_BUFFER_LEN - len, __VA_ARGS__); \
throw std::runtime_error(_msg_buffer); \
throw std::runtime_error( \
::tim::log::string(::tim::log::color::fatal(), _msg_buffer)); \
}
#define OMNITRACE_CI_THROW(COND, ...) \
@@ -250,13 +264,15 @@ get_chars(T&& _c, std::index_sequence<Idx...>)
if(COND) \
{ \
::omnitrace::debug::flush(); \
OMNITRACE_FPRINTF_STDERR_COLOR(fatal); \
fprintf(stderr, "[omnitrace][%i][%li]%s", OMNITRACE_PROCESS_IDENTIFIER, \
OMNITRACE_THREAD_IDENTIFIER, \
::omnitrace::debug::is_bracket(__VA_ARGS__) ? "" : " "); \
fprintf(stderr, __VA_ARGS__); \
::omnitrace::debug::flush(); \
::omnitrace::set_state(::omnitrace::State::Finalized); \
::tim::disable_signal_detection(); \
::tim::print_demangled_backtrace<64>(); \
timemory_print_demangled_backtrace<64>(); \
METHOD; \
}
@@ -264,12 +280,14 @@ get_chars(T&& _c, std::index_sequence<Idx...>)
if(COND) \
{ \
::omnitrace::debug::flush(); \
OMNITRACE_FPRINTF_STDERR_COLOR(fatal); \
fprintf(stderr, "[omnitrace]%s", \
::omnitrace::debug::is_bracket(__VA_ARGS__) ? "" : " "); \
fprintf(stderr, __VA_ARGS__); \
::omnitrace::debug::flush(); \
::omnitrace::set_state(::omnitrace::State::Finalized); \
::tim::disable_signal_detection(); \
::tim::print_demangled_backtrace<64>(); \
timemory_print_demangled_backtrace<64>(); \
METHOD; \
}
@@ -277,13 +295,15 @@ get_chars(T&& _c, std::index_sequence<Idx...>)
if(COND) \
{ \
::omnitrace::debug::flush(); \
OMNITRACE_FPRINTF_STDERR_COLOR(fatal); \
fprintf(stderr, "[omnitrace][%i][%li][%s]%s", OMNITRACE_PROCESS_IDENTIFIER, \
OMNITRACE_THREAD_IDENTIFIER, OMNITRACE_FUNCTION, \
::omnitrace::debug::is_bracket(__VA_ARGS__) ? "" : " "); \
fprintf(stderr, __VA_ARGS__); \
::omnitrace::debug::flush(); \
::omnitrace::set_state(::omnitrace::State::Finalized); \
::tim::disable_signal_detection(); \
::tim::print_demangled_backtrace<64>(); \
timemory_print_demangled_backtrace<64>(); \
METHOD; \
}
@@ -291,12 +311,14 @@ get_chars(T&& _c, std::index_sequence<Idx...>)
if(COND) \
{ \
::omnitrace::debug::flush(); \
OMNITRACE_FPRINTF_STDERR_COLOR(fatal); \
fprintf(stderr, "[omnitrace][%s]%s", OMNITRACE_FUNCTION, \
::omnitrace::debug::is_bracket(__VA_ARGS__) ? "" : " "); \
fprintf(stderr, __VA_ARGS__); \
::omnitrace::debug::flush(); \
::omnitrace::set_state(::omnitrace::State::Finalized); \
::tim::disable_signal_detection(); \
::tim::print_demangled_backtrace<64>(); \
timemory_print_demangled_backtrace<64>(); \
METHOD; \
}
@@ -25,14 +25,12 @@
#include "common/defines.h"
#define TIMEMORY_USER_COMPONENT_ENUM \
OMNITRACE_COMPONENT_idx, OMNITRACE_USER_REGION_idx, OMNITRACE_ROCTRACER_idx, \
OMNITRACE_ROCPROFILER_idx, OMNITRACE_SAMPLING_WALL_CLOCK_idx, \
OMNITRACE_SAMPLING_CPU_CLOCK_idx, OMNITRACE_SAMPLING_PERCENT_idx, \
OMNITRACE_SAMPLING_GPU_POWER_idx, OMNITRACE_SAMPLING_GPU_TEMP_idx, \
OMNITRACE_SAMPLING_GPU_BUSY_idx, OMNITRACE_SAMPLING_GPU_MEMORY_USAGE_idx,
OMNITRACE_ROCTRACER_idx, OMNITRACE_ROCPROFILER_idx, \
OMNITRACE_SAMPLING_WALL_CLOCK_idx, OMNITRACE_SAMPLING_CPU_CLOCK_idx, \
OMNITRACE_SAMPLING_PERCENT_idx, OMNITRACE_SAMPLING_GPU_POWER_idx, \
OMNITRACE_SAMPLING_GPU_TEMP_idx, OMNITRACE_SAMPLING_GPU_BUSY_idx, \
OMNITRACE_SAMPLING_GPU_MEMORY_USAGE_idx,
#define OMNITRACE_COMPONENT OMNITRACE_COMPONENT_idx
#define OMNITRACE_USER_REGION OMNITRACE_USER_REGION_idx
#define OMNITRACE_ROCTRACER OMNITRACE_ROCTRACER_idx
#define OMNITRACE_ROCPROFILER OMNITRACE_ROCPROFILER_idx
#define OMNITRACE_SAMPLING_WALL_CLOCK OMNITRACE_SAMPLING_WALL_CLOCK_idx
@@ -21,7 +21,6 @@
// SOFTWARE.
#include "library/dynamic_library.hpp"
#include "common/defines.h"
#include "library/common.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
@@ -23,7 +23,7 @@
#include "library/gpu.hpp"
#if defined(OMNITRACE_USE_ROCM_SMI) && OMNITRACE_USE_ROCM_SMI > 0
# include "library/components/rocm_smi.hpp"
# include "library/rocm_smi.hpp"
#elif !defined(OMNITRACE_USE_ROCM_SMI)
# define OMNITRACE_USE_ROCM_SMI 0
#endif
@@ -22,8 +22,9 @@
#define TIMEMORY_KOKKOSP_POSTFIX OMNITRACE_PUBLIC_API
#include "library/api.hpp"
#include "library/components/user_region.hpp"
#include "api.hpp"
#include "library/components/category_region.hpp"
#include "library/components/fwd.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
@@ -39,7 +40,11 @@
#include <sstream>
#include <string>
namespace kokkosp = tim::kokkosp;
namespace kokkosp = ::tim::kokkosp;
namespace category = ::tim::category;
namespace comp = ::omnitrace::component;
using kokkosp_region = comp::local_category_region<category::kokkos>;
//--------------------------------------------------------------------------------------//
@@ -140,6 +145,10 @@ extern "C"
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
tim::consume_parameters(devInfoCount, deviceInfo);
OMNITRACE_BASIC_VERBOSE_F(
0, "Initializing omnitrace kokkos connector (sequence %d, version: %llu)... ",
loadSeq, (unsigned long long) interfaceVer);
if(_standalone_initialized || (!omnitrace::config::settings_are_configured() &&
omnitrace::get_state() < omnitrace::State::Active))
{
@@ -173,10 +182,6 @@ extern "C"
}
}
OMNITRACE_BASIC_VERBOSE_F(0,
"Initializing kokkos omnitrace connector "
"(standalone, sequence %d, version: %llu)...\n",
loadSeq, (unsigned long long) interfaceVer);
OMNITRACE_BASIC_VERBOSE_F(0, "Initializing omnitrace (standalone)... ");
auto _mode = tim::get_env<std::string>("OMNITRACE_MODE", "trace");
auto _arg0 = (_initialize_arguments.empty()) ? std::string{ "unknown" }
@@ -187,21 +192,17 @@ extern "C"
omnitrace_init_hidden(_mode.c_str(), false, _arg0.c_str());
omnitrace_push_trace_hidden("kokkos_main");
}
else
{
OMNITRACE_VERBOSE_F(0,
"Initializing kokkos omnitrace connector "
"(sequence %d, version: %llu)... ",
loadSeq, (unsigned long long) interfaceVer);
}
setup_kernel_logger();
tim::trait::runtime_enabled<kokkosp::memory_tracker>::set(
omnitrace::config::get_use_timemory());
if(_standalone_initialized && omnitrace::get_verbose() >= 0)
fprintf(stderr, "Done\n");
if(omnitrace::get_verbose() >= 0)
{
fprintf(stderr, "%sDone\n%s", tim::log::color::info(),
tim::log::color::end());
}
}
void kokkosp_finalize_library()
@@ -233,16 +234,16 @@ extern "C"
: TIMEMORY_JOIN(" ", TIMEMORY_JOIN("", "[kokkos][dev", devid, ']'), name);
*kernid = kokkosp::get_unique_id();
kokkosp::logger_t{}.mark(1, __FUNCTION__, name, *kernid);
kokkosp::create_profiler<omnitrace::component::user_region>(pname, *kernid);
kokkosp::start_profiler<omnitrace::component::user_region>(*kernid);
kokkosp::create_profiler<kokkosp_region>(pname, *kernid);
kokkosp::start_profiler<kokkosp_region>(*kernid);
}
void kokkosp_end_parallel_for(uint64_t kernid)
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
kokkosp::logger_t{}.mark(-1, __FUNCTION__, kernid);
kokkosp::stop_profiler<omnitrace::component::user_region>(kernid);
kokkosp::destroy_profiler<omnitrace::component::user_region>(kernid);
kokkosp::stop_profiler<kokkosp_region>(kernid);
kokkosp::destroy_profiler<kokkosp_region>(kernid);
}
//----------------------------------------------------------------------------------//
@@ -256,16 +257,16 @@ extern "C"
: TIMEMORY_JOIN(" ", TIMEMORY_JOIN("", "[kokkos][dev", devid, ']'), name);
*kernid = kokkosp::get_unique_id();
kokkosp::logger_t{}.mark(1, __FUNCTION__, name, *kernid);
kokkosp::create_profiler<omnitrace::component::user_region>(pname, *kernid);
kokkosp::start_profiler<omnitrace::component::user_region>(*kernid);
kokkosp::create_profiler<kokkosp_region>(pname, *kernid);
kokkosp::start_profiler<kokkosp_region>(*kernid);
}
void kokkosp_end_parallel_reduce(uint64_t kernid)
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
kokkosp::logger_t{}.mark(-1, __FUNCTION__, kernid);
kokkosp::stop_profiler<omnitrace::component::user_region>(kernid);
kokkosp::destroy_profiler<omnitrace::component::user_region>(kernid);
kokkosp::stop_profiler<kokkosp_region>(kernid);
kokkosp::destroy_profiler<kokkosp_region>(kernid);
}
//----------------------------------------------------------------------------------//
@@ -279,16 +280,16 @@ extern "C"
: TIMEMORY_JOIN(" ", TIMEMORY_JOIN("", "[kokkos][dev", devid, ']'), name);
*kernid = kokkosp::get_unique_id();
kokkosp::logger_t{}.mark(1, __FUNCTION__, name, *kernid);
kokkosp::create_profiler<omnitrace::component::user_region>(pname, *kernid);
kokkosp::start_profiler<omnitrace::component::user_region>(*kernid);
kokkosp::create_profiler<kokkosp_region>(pname, *kernid);
kokkosp::start_profiler<kokkosp_region>(*kernid);
}
void kokkosp_end_parallel_scan(uint64_t kernid)
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
kokkosp::logger_t{}.mark(-1, __FUNCTION__, kernid);
kokkosp::stop_profiler<omnitrace::component::user_region>(kernid);
kokkosp::destroy_profiler<omnitrace::component::user_region>(kernid);
kokkosp::stop_profiler<kokkosp_region>(kernid);
kokkosp::destroy_profiler<kokkosp_region>(kernid);
}
//----------------------------------------------------------------------------------//
@@ -302,16 +303,16 @@ extern "C"
: TIMEMORY_JOIN(" ", TIMEMORY_JOIN("", "[kokkos][dev", devid, ']'), name);
*kernid = kokkosp::get_unique_id();
kokkosp::logger_t{}.mark(1, __FUNCTION__, name, *kernid);
kokkosp::create_profiler<omnitrace::component::user_region>(pname, *kernid);
kokkosp::start_profiler<omnitrace::component::user_region>(*kernid);
kokkosp::create_profiler<kokkosp_region>(pname, *kernid);
kokkosp::start_profiler<kokkosp_region>(*kernid);
}
void kokkosp_end_fence(uint64_t kernid)
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
kokkosp::logger_t{}.mark(-1, __FUNCTION__, kernid);
kokkosp::stop_profiler<omnitrace::component::user_region>(kernid);
kokkosp::destroy_profiler<omnitrace::component::user_region>(kernid);
kokkosp::stop_profiler<kokkosp_region>(kernid);
kokkosp::destroy_profiler<kokkosp_region>(kernid);
}
//----------------------------------------------------------------------------------//
@@ -321,9 +322,9 @@ extern "C"
if(omnitrace::get_use_perfetto()) return; // perfetto doesn't support regions
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
kokkosp::logger_t{}.mark(1, __FUNCTION__, name);
kokkosp::get_profiler_stack<omnitrace::component::user_region>().push_back(
kokkosp::profiler_t<omnitrace::component::user_region>(name));
kokkosp::get_profiler_stack<omnitrace::component::user_region>().back().start();
kokkosp::get_profiler_stack<kokkosp_region>().push_back(
kokkosp::profiler_t<kokkosp_region>(name));
kokkosp::get_profiler_stack<kokkosp_region>().back().start();
}
void kokkosp_pop_profile_region()
@@ -331,10 +332,9 @@ extern "C"
if(omnitrace::get_use_perfetto()) return; // perfetto doesn't support regions
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
kokkosp::logger_t{}.mark(-1, __FUNCTION__);
if(kokkosp::get_profiler_stack<omnitrace::component::user_region>().empty())
return;
kokkosp::get_profiler_stack<omnitrace::component::user_region>().back().stop();
kokkosp::get_profiler_stack<omnitrace::component::user_region>().pop_back();
if(kokkosp::get_profiler_stack<kokkosp_region>().empty()) return;
kokkosp::get_profiler_stack<kokkosp_region>().back().stop();
kokkosp::get_profiler_stack<kokkosp_region>().pop_back();
}
//----------------------------------------------------------------------------------//
@@ -345,14 +345,14 @@ extern "C"
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
*secid = kokkosp::get_unique_id();
auto pname = TIMEMORY_JOIN(" ", "[kokkos]", name);
kokkosp::create_profiler<omnitrace::component::user_region>(pname, *secid);
kokkosp::create_profiler<kokkosp_region>(pname, *secid);
}
void kokkosp_destroy_profile_section(uint32_t secid)
{
if(omnitrace::get_use_perfetto()) return; // perfetto doesn't support regions
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
kokkosp::destroy_profiler<omnitrace::component::user_region>(secid);
kokkosp::destroy_profiler<kokkosp_region>(secid);
}
//----------------------------------------------------------------------------------//
@@ -362,7 +362,7 @@ extern "C"
if(omnitrace::get_use_perfetto()) return; // perfetto doesn't support regions
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
kokkosp::logger_t{}.mark(1, __FUNCTION__, secid);
kokkosp::start_profiler<omnitrace::component::user_region>(secid);
kokkosp::start_profiler<kokkosp_region>(secid);
}
void kokkosp_stop_profile_section(uint32_t secid)
@@ -370,7 +370,7 @@ extern "C"
if(omnitrace::get_use_perfetto()) return; // perfetto doesn't support regions
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
kokkosp::logger_t{}.mark(-1, __FUNCTION__, secid);
kokkosp::start_profiler<omnitrace::component::user_region>(secid);
kokkosp::stop_profiler<kokkosp_region>(secid);
}
//----------------------------------------------------------------------------------//
@@ -412,7 +412,7 @@ extern "C"
TIMEMORY_JOIN('=', dst_handle.name, dst_name),
TIMEMORY_JOIN('=', src_handle.name, src_name));
auto& _data = kokkosp::get_profiler_stack<omnitrace::component::user_region>();
auto& _data = kokkosp::get_profiler_stack<kokkosp_region>();
_data.emplace_back(name);
_data.back().audit(dst_handle, dst_name, dst_ptr, src_handle, src_name, src_ptr,
size);
@@ -424,7 +424,7 @@ extern "C"
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
kokkosp::logger_t{}.mark(-1, __FUNCTION__);
auto& _data = kokkosp::get_profiler_stack<omnitrace::component::user_region>();
auto& _data = kokkosp::get_profiler_stack<kokkosp_region>();
if(_data.empty()) return;
_data.back().store(std::minus<int64_t>{}, 0);
_data.back().stop();
@@ -436,7 +436,7 @@ extern "C"
void kokkosp_profile_event(const char* name)
{
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
kokkosp::profiler_t<omnitrace::component::user_region>{}.mark(name);
kokkosp::profiler_t<kokkosp_region>{}.mark(name);
}
//----------------------------------------------------------------------------------//
@@ -26,8 +26,8 @@
#if defined(OMNITRACE_USE_OMPT) && OMNITRACE_USE_OMPT > 0
# include "library/components/category_region.hpp"
# include "library/components/fwd.hpp"
# include "library/components/user_region.hpp"
# include <timemory/components/ompt.hpp>
# include <timemory/components/ompt/extern.hpp>
@@ -67,7 +67,7 @@ setup()
comp::user_ompt_bundle::global_init();
comp::user_ompt_bundle::reset();
tim::auto_lock_t lk{ tim::type_mutex<ompt_handle_t>() };
comp::user_ompt_bundle::configure<omnitrace::component::user_region>();
comp::user_ompt_bundle::configure<component::local_category_region<category::ompt>>();
f_bundle = std::make_unique<ompt_bundle_t>("omnitrace/ompt",
quirk::config<quirk::auto_start>{});
}
@@ -22,81 +22,10 @@
#pragma once
#include "library/defines.hpp"
#if defined(OMNITRACE_PERFETTO_CATEGORIES)
# error "OMNITRACE_PERFETTO_CATEGORIES is already defined. Please include \"" __FILE__ "\" before including any timemory files"
#endif
#define OMNITRACE_PERFETTO_CATEGORIES \
perfetto::Category("host").SetDescription("Host-side function tracing"), \
perfetto::Category("user").SetDescription("User-defined regions"), \
perfetto::Category("sampling").SetDescription("Host-side function sampling"), \
perfetto::Category("device_hip") \
.SetDescription("Device-side functions submitted via HIP API"), \
perfetto::Category("device_hsa") \
.SetDescription("Device-side functions submitted via HSA API"), \
perfetto::Category("rocm_hip").SetDescription("Host-side HIP functions"), \
perfetto::Category("rocm_hsa").SetDescription("Host-side HSA functions"), \
perfetto::Category("rocm_roctx").SetDescription("Host-side ROCTX labels"), \
perfetto::Category("device_busy") \
.SetDescription("Busy percentage of a GPU device"), \
perfetto::Category("device_temp") \
.SetDescription("Temperature of GPU device in degC"), \
perfetto::Category("device_power") \
.SetDescription("Power consumption of GPU device in watts"), \
perfetto::Category("device_memory_usage") \
.SetDescription("Memory usage of GPU device in MB"), \
perfetto::Category("thread_peak_memory") \
.SetDescription( \
"Peak memory usage on thread in MB (derived from sampling)"), \
perfetto::Category("thread_context_switch") \
.SetDescription("Context switches on thread (derived from sampling)"), \
perfetto::Category("thread_page_fault") \
.SetDescription("Memory page faults on thread (derived from sampling)"), \
perfetto::Category("hardware_counter") \
.SetDescription("Hardware counter value on thread (derived from sampling)"), \
perfetto::Category("cpu_freq") \
.SetDescription("CPU frequency in MHz (collected in background thread)"), \
perfetto::Category("process_page_fault") \
.SetDescription( \
"Memory page faults in process (collected in background thread)"), \
perfetto::Category("process_memory_hwm") \
.SetDescription("Memory High-Water Mark i.e. peak memory usage (collected " \
"in background thread)"), \
perfetto::Category("process_virtual_memory") \
.SetDescription("Virtual memory usage in process in MB (collected in " \
"background thread)"), \
perfetto::Category("process_context_switch") \
.SetDescription( \
"Context switches in process (collected in background thread)"), \
perfetto::Category("process_page_fault") \
.SetDescription( \
"Memory page faults in process (collected in background thread)"), \
perfetto::Category("process_user_cpu_time") \
.SetDescription("CPU time of functions executing in user-space in process " \
"in seconds (collected in background thread)"), \
perfetto::Category("process_kernel_cpu_time") \
.SetDescription("CPU time of functions executing in kernel-space in " \
"process in seconds (collected in background thread)"), \
perfetto::Category("pthread").SetDescription("Pthread functions"), \
perfetto::Category("kokkos").SetDescription("Kokkos regions"), \
perfetto::Category("mpi").SetDescription("MPI regions"), \
perfetto::Category("ompt").SetDescription("OpenMP Tools regions"), \
perfetto::Category("rccl").SetDescription( \
"ROCm Communication Collectives Library (RCCL) regions"), \
perfetto::Category("comm_data") \
.SetDescription( \
"MPI/RCCL counters for tracking amount of data sent or received"), \
perfetto::Category("critical-trace").SetDescription("Combined critical traces"), \
perfetto::Category("host-critical-trace") \
.SetDescription("Host-side critical traces"), \
perfetto::Category("device-critical-trace") \
.SetDescription("Device-side critical traces"), \
perfetto::Category("timemory").SetDescription("Events from the timemory API")
#include "library/categories.hpp"
#include "library/common.hpp"
#if defined(TIMEMORY_USE_PERFETTO)
# define TIMEMORY_PERFETTO_CATEGORIES OMNITRACE_PERFETTO_CATEGORIES
# include <timemory/components/perfetto/backends.hpp>
#else
# include <perfetto.h>
@@ -22,11 +22,12 @@
#include "library/process_sampler.hpp"
#include "library/components/pthread_gotcha.hpp"
#include "library/components/rocm_smi.hpp"
#include "library/config.hpp"
#include "library/cpu_freq.hpp"
#include "library/debug.hpp"
#include "library/rocm_smi.hpp"
#include "library/runtime.hpp"
#include "library/sampling.hpp"
#include <memory>
#include <vector>
@@ -37,7 +38,6 @@ namespace process_sampler
{
namespace
{
using auto_lock_t = tim::auto_lock_t;
using promise_t = std::promise<void>;
std::unique_ptr<promise_t> polling_finished = {};
std::vector<std::unique_ptr<instance>> instances = {};
@@ -126,8 +126,6 @@ sampler::setup()
// shutdown if already running
shutdown();
pthread_gotcha::push_enable_sampling_on_child_threads(false);
if(get_use_rocm_smi())
{
auto& _rocm_smi = instances.emplace_back(std::make_unique<instance>());
@@ -158,12 +156,12 @@ sampler::setup()
polling_finished = std::make_unique<promise_t>();
set_state(State::PreInit);
pthread_gotcha::push_enable_sampling_on_child_threads(false);
get_thread() = std::make_unique<std::thread>(&poll<msec_t>, &get_sampler_state(),
msec_t{ _msec_freq }, &_prom);
_fut.wait();
pthread_gotcha::pop_enable_sampling_on_child_threads();
set_state(State::Active);
}
@@ -26,6 +26,8 @@
#include "library/defines.hpp"
#include "library/runtime.hpp"
#include "library/sampling.hpp"
#include "library/thread_data.hpp"
#include "library/thread_info.hpp"
#include <PTL/ThreadPool.hh>
@@ -39,25 +41,60 @@ namespace tasking
namespace
{
auto _thread_pool_cfg = []() {
int64_t _nthreads = 0;
if(config::settings_are_configured())
{
_nthreads = config::get_thread_pool_size();
}
else
{
const int64_t _max_threads = std::thread::hardware_concurrency() / 2;
const int64_t _min_threads = 1;
_nthreads = get_env<int64_t>("OMNITRACE_THREAD_POOL_SIZE", -1, false);
if(_nthreads == -1)
{
_nthreads = 4;
if(_nthreads > _max_threads) _nthreads = _max_threads;
if(_nthreads < _min_threads) _nthreads = _min_threads;
tim::set_env("OMNITRACE_THREAD_POOL_SIZE", _nthreads, 0);
}
}
PTL::ThreadPool::Config _v{};
_v.init = true;
_v.use_affinity = false;
_v.use_tbb = false;
_v.verbose = -1;
_v.initializer = []() {
threading::offset_this_id(true);
set_thread_state(ThreadState::Internal);
sampling::block_signals();
thread_info::init(true);
threading::set_thread_name(
JOIN('.', "ptl", PTL::Threading::GetThreadId()).c_str());
sampling::block_signals();
};
_v.finalizer = []() {};
_v.priority = 5;
_v.pool_size = 1;
_v.pool_size = _nthreads;
return _v;
};
auto&
get_thread_pool_state()
{
static auto _v = State::PreInit;
return _v;
}();
}
PTL::ThreadPool&
get_thread_pool()
{
static auto _v =
(get_thread_pool_state() = State::Active, PTL::ThreadPool{ _thread_pool_cfg() });
return _v;
}
} // namespace
namespace roctracer
{
namespace
@@ -94,13 +131,15 @@ join()
if(roctracer::get_thread_pool_state() == State::Active)
{
OMNITRACE_DEBUG_F("waiting for all roctracer tasks to complete...\n");
tasking::roctracer::get_task_group().join();
for(size_t i = 0; i < max_supported_threads; ++i)
roctracer::get_task_group(i).join();
}
if(critical_trace::get_thread_pool_state() == State::Active)
{
OMNITRACE_DEBUG_F("waiting for all critical tasks to complete...\n");
tasking::critical_trace::get_task_group().join();
for(size_t i = 0; i < max_supported_threads; ++i)
critical_trace::get_task_group(i).join();
}
}
@@ -109,70 +148,62 @@ shutdown()
{
if(roctracer::get_thread_pool_state() == State::Active)
{
OMNITRACE_DEBUG_F("Destroying the roctracer thread pool...\n");
std::unique_lock<std::mutex> _lk{ roctracer::get_mutex() };
roctracer::get_task_group().join();
roctracer::get_task_group().clear();
roctracer::get_task_group().set_pool(nullptr);
roctracer::get_thread_pool().destroy_threadpool();
OMNITRACE_DEBUG_F("Waiting on completion of roctracer tasks...\n");
for(size_t i = 0; i < max_supported_threads; ++i)
{
roctracer::get_task_group(i).join();
roctracer::get_task_group(i).clear();
roctracer::get_task_group(i).set_pool(nullptr);
}
roctracer::get_thread_pool_state() = State::Finalized;
}
if(critical_trace::get_thread_pool_state() == State::Active)
{
OMNITRACE_DEBUG_F("Destroying the critical trace thread pool...\n");
std::unique_lock<std::mutex> _lk{ critical_trace::get_mutex() };
critical_trace::get_task_group().join();
critical_trace::get_task_group().clear();
critical_trace::get_task_group().set_pool(nullptr);
critical_trace::get_thread_pool().destroy_threadpool();
OMNITRACE_DEBUG_F("Waiting on completion of critical trace tasks...\n");
for(size_t i = 0; i < max_supported_threads; ++i)
{
critical_trace::get_task_group(i).join();
critical_trace::get_task_group(i).clear();
critical_trace::get_task_group(i).set_pool(nullptr);
}
critical_trace::get_thread_pool_state() = State::Finalized;
}
if(get_thread_pool_state() == State::Active)
{
OMNITRACE_DEBUG_F("Destroying the omnitrace thread pool...\n");
get_thread_pool().destroy_threadpool();
get_thread_pool_state() = State::Finalized;
}
}
std::mutex&
roctracer::get_mutex()
size_t
initialize_threadpool(size_t _v)
{
static std::mutex _v{};
return _v;
}
PTL::ThreadPool&
roctracer::get_thread_pool()
{
static auto _v = (roctracer::get_thread_pool_state() = State::Active,
PTL::ThreadPool{ _thread_pool_cfg });
return _v;
return get_thread_pool().initialize_threadpool(_v);
}
PTL::TaskGroup<void>&
roctracer::get_task_group()
roctracer::get_task_group(int64_t _tid)
{
static PTL::TaskGroup<void> _v{ &roctracer::get_thread_pool() };
return _v;
}
std::mutex&
critical_trace::get_mutex()
{
static std::mutex _v{};
return _v;
}
PTL::ThreadPool&
critical_trace::get_thread_pool()
{
static auto _v = (critical_trace::get_thread_pool_state() = State::Active,
PTL::ThreadPool{ _thread_pool_cfg });
return _v;
struct local
{};
using thread_data_t = thread_data<PTL::TaskGroup<void>, local>;
static auto& _v =
thread_data_t::instances(construct_on_init{}, &tasking::get_thread_pool());
return *_v.at(_tid);
}
PTL::TaskGroup<void>&
critical_trace::get_task_group()
critical_trace::get_task_group(int64_t _tid)
{
static PTL::TaskGroup<void> _v{ &critical_trace::get_thread_pool() };
return _v;
struct local
{};
using thread_data_t = thread_data<PTL::TaskGroup<void>, local>;
static auto& _v =
thread_data_t::instances(construct_on_init{}, &tasking::get_thread_pool());
return *_v.at(_tid);
}
} // namespace tasking
} // namespace omnitrace
@@ -23,6 +23,7 @@
#pragma once
#include "library/defines.hpp"
#include "library/utility.hpp"
#include <PTL/PTL.hh>
@@ -41,6 +42,8 @@ join();
void
shutdown();
size_t initialize_threadpool(size_t);
//--------------------------------------------------------------------------------------//
//
// roctracer
@@ -49,17 +52,8 @@ shutdown();
namespace roctracer
{
std::mutex&
get_mutex();
PTL::ThreadPool&
get_thread_pool();
PTL::TaskGroup<void>&
get_task_group();
bool
get_thread_pool_is_active();
get_task_group(int64_t _tid = utility::get_thread_index());
} // namespace roctracer
//--------------------------------------------------------------------------------------//
@@ -70,17 +64,8 @@ get_thread_pool_is_active();
namespace critical_trace
{
std::mutex&
get_mutex();
PTL::ThreadPool&
get_thread_pool();
PTL::TaskGroup<void>&
get_task_group();
bool
get_thread_pool_is_active();
get_task_group(int64_t _tid = utility::get_thread_index());
} // namespace critical_trace
} // namespace tasking
} // namespace omnitrace
@@ -67,17 +67,17 @@ setup()
auto _use_data = tim::get_env("OMNITRACE_RCCLP_COMM_DATA", get_use_timemory());
if(!get_use_timemory())
{
trait::runtime_enabled<comp::comm_data>::set(false);
trait::runtime_enabled<comp::comm_data_tracker_t>::set(false);
trait::runtime_enabled<component::comm_data>::set(false);
trait::runtime_enabled<component::comm_data_tracker_t>::set(false);
}
else
{
trait::runtime_enabled<comp::comm_data>::set(_use_data);
trait::runtime_enabled<comp::comm_data_tracker_t>::set(_use_data);
trait::runtime_enabled<component::comm_data>::set(_use_data);
trait::runtime_enabled<component::comm_data_tracker_t>::set(_use_data);
}
comp::configure_rcclp();
global_id = comp::activate_rcclp();
component::configure_rcclp();
global_id = component::activate_rcclp();
if(librccl_handle) dlclose(librccl_handle);
}
@@ -85,7 +85,7 @@ void
shutdown()
{
if(global_id < std::numeric_limits<uint64_t>::max())
comp::deactivate_rcclp(global_id);
component::deactivate_rcclp(global_id);
}
} // namespace rcclp
} // namespace omnitrace
@@ -21,14 +21,13 @@
// SOFTWARE.
#include "library/rocm.hpp"
#include "library.hpp"
#include "library/components/rocm_smi.hpp"
#include "library/components/rocprofiler.hpp"
#include "library/components/roctracer.hpp"
#include "library/config.hpp"
#include "library/critical_trace.hpp"
#include "library/debug.hpp"
#include "library/dynamic_library.hpp"
#include "library/gpu.hpp"
#include "library/rocm_smi.hpp"
#include "library/rocprofiler.hpp"
#include "library/rocprofiler/hsa_rsrc_factory.hpp"
#include "library/roctracer.hpp"
@@ -30,10 +30,9 @@
# undef NDEBUG
#endif
#include "library/components/rocm_smi.hpp"
#include "library/rocm_smi.hpp"
#include "library/common.hpp"
#include "library/components/fwd.hpp"
#include "library/components/pthread_create_gotcha.hpp"
#include "library/components/pthread_gotcha.hpp"
#include "library/config.hpp"
#include "library/critical_trace.hpp"
@@ -41,6 +40,7 @@
#include "library/gpu.hpp"
#include "library/perfetto.hpp"
#include "library/state.hpp"
#include "library/thread_info.hpp"
#include <timemory/backends/threading.hpp>
#include <timemory/components/timing/backends.hpp>
@@ -65,10 +65,8 @@ namespace omnitrace
{
namespace rocm_smi
{
using tim::type_mutex;
using auto_lock_t = tim::auto_lock_t;
using bundle_t = std::deque<data>;
using sampler_instances = thread_data<bundle_t, api::rocm_smi>;
using sampler_instances = thread_data<bundle_t, category::rocm_smi>;
namespace
{
@@ -243,8 +241,12 @@ data::post_process(uint32_t _dev_id)
if(device_count < _dev_id) return;
auto& _rocm_smi_v = sampler_instances::instances().at(_dev_id);
auto _rocm_smi = (_rocm_smi_v) ? *_rocm_smi_v : std::deque<rocm_smi::data>{};
auto& _rocm_smi_v = sampler_instances::instances().at(_dev_id);
auto _rocm_smi = (_rocm_smi_v) ? *_rocm_smi_v : std::deque<rocm_smi::data>{};
const auto& _thread_info = thread_info::get(0, LookupTID);
OMNITRACE_CI_THROW(!_thread_info, "Missing thread info for thread 0");
if(!_thread_info) return;
auto _process_perfetto = [&]() {
for(auto& itr : _rocm_smi)
@@ -262,7 +264,7 @@ data::post_process(uint32_t _dev_id)
counter_track::emplace(_dev_id, addendum("Memory Usage"), "megabytes");
}
uint64_t _ts = itr.m_ts;
if(!pthread_create_gotcha::is_valid_execution_time(0, _ts)) continue;
if(!_thread_info->is_valid_time(_ts)) continue;
double _busy = itr.m_busy_perc;
double _temp = itr.m_temp / 1.0e3;
@@ -289,7 +291,7 @@ data::post_process(uint32_t _dev_id)
{
using entry_t = critical_trace::entry;
auto _ts = itr.m_ts;
if(!pthread_create_gotcha::is_valid_execution_time(0, _ts)) continue;
if(!_thread_info->is_valid_time(_ts)) continue;
auto _entries = critical_trace::get_entries(_ts, [](const entry_t& _e) {
return _e.device == critical_trace::Device::GPU;
@@ -322,7 +324,7 @@ data::post_process(uint32_t _dev_id)
void
setup()
{
auto_lock_t _lk{ type_mutex<api::rocm_smi>() };
auto_lock_t _lk{ type_mutex<category::rocm_smi>() };
if(is_initialized() || !get_use_rocm_smi()) return;
@@ -407,7 +409,7 @@ setup()
void
shutdown()
{
auto_lock_t _lk{ type_mutex<api::rocm_smi>() };
auto_lock_t _lk{ type_mutex<category::rocm_smi>() };
if(!is_initialized()) return;
@@ -454,18 +456,18 @@ device_count()
} // namespace rocm_smi
} // namespace omnitrace
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_gpu_busy>), true,
double)
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_gpu_temp>), true,
double)
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_gpu_power>), true,
double)
TIMEMORY_INSTANTIATE_EXTERN_COMPONENT(
OMNITRACE_INSTANTIATE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_gpu_memory>), true,
double)
@@ -154,19 +154,19 @@ inline void set_state(State) {}
# include <timemory/components/data_tracker/components.hpp>
# include <timemory/operations.hpp>
TIMEMORY_DECLARE_EXTERN_COMPONENT(
OMNITRACE_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_gpu_busy>), true,
double)
TIMEMORY_DECLARE_EXTERN_COMPONENT(
OMNITRACE_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_gpu_temp>), true,
double)
TIMEMORY_DECLARE_EXTERN_COMPONENT(
OMNITRACE_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_gpu_power>), true,
double)
TIMEMORY_DECLARE_EXTERN_COMPONENT(
OMNITRACE_DECLARE_EXTERN_COMPONENT(
TIMEMORY_ESC(data_tracker<double, omnitrace::component::backtrace_gpu_memory>), true,
double)
@@ -174,10 +174,11 @@ rocm_dump_context_entry(context_entry_t* entry, rocprofiler_feature_t* features,
rocm_check_status(rocprofiler_get_metrics(group.context));
}
auto _evt = comp::rocm_event{ _dev_id, _thread_id, _queue_id, _kernel_name,
record->begin, record->end, feature_count, features };
auto _evt =
component::rocm_event{ _dev_id, _thread_id, _queue_id, _kernel_name,
record->begin, record->end, feature_count, features };
comp::rocm_data()->emplace_back(_evt);
component::rocm_data()->emplace_back(_evt);
}
// Profiling completion handler
@@ -232,7 +233,6 @@ rocm_dispatch_callback(const rocprofiler_callback_data_t* callback_data, void* a
unsigned
metrics_input(unsigned _device, rocprofiler_feature_t** ret)
{
// OMNITRACE_THROW("%s\n", __FUNCTION__);
// Profiling feature objects
auto _events = tim::delimit(config::get_rocm_events(), ", ;\t\n");
std::vector<std::string> _features = {};
@@ -275,8 +275,8 @@ metrics_input(unsigned _device, rocprofiler_feature_t** ret)
struct info_data
{
const AgentInfo* agent = nullptr;
std::vector<comp::rocm_info_entry>* data = nullptr;
const AgentInfo* agent = nullptr;
std::vector<component::rocm_info_entry>* data = nullptr;
};
hsa_status_t
@@ -304,7 +304,7 @@ info_data_callback(const rocprofiler_info_data_t info, void* arg)
{
auto _sym = JOIN("", info.metric.name, _device_qualifier_sym);
auto _short_desc = JOIN("", "Derived counter: ", info.metric.expr);
_data->emplace_back(comp::rocm_info_entry(
_data->emplace_back(component::rocm_info_entry(
true, tim::hardware_counters::api::rocm, _data->size(), 0, _sym,
_pysym, _short_desc, _long_desc, _units,
qualifier_vec_t{ _device_qualifier }));
@@ -316,7 +316,7 @@ info_data_callback(const rocprofiler_info_data_t info, void* arg)
auto _sym = JOIN("", info.metric.name, _device_qualifier_sym);
auto _short_desc =
JOIN("", info.metric.name, " on device ", _agent->dev_index);
_data->emplace_back(comp::rocm_info_entry(
_data->emplace_back(component::rocm_info_entry(
true, tim::hardware_counters::api::rocm, _data->size(), 0, _sym,
_pysym, _short_desc, _long_desc, _units,
qualifier_vec_t{ _device_qualifier }));
@@ -334,7 +334,7 @@ info_data_callback(const rocprofiler_info_data_t info, void* arg)
_device_qualifier_sym);
auto _short_desc = JOIN("", info.metric.name, " instance ", i,
" on device ", _agent->dev_index);
_data->emplace_back(comp::rocm_info_entry(
_data->emplace_back(component::rocm_info_entry(
true, tim::hardware_counters::api::rocm, _data->size(), 0,
_sym, _pysym, _short_desc, _long_desc, _units,
qualifier_vec_t{ _device_qualifier, _instance_qualifier }));
@@ -348,10 +348,10 @@ info_data_callback(const rocprofiler_info_data_t info, void* arg)
return HSA_STATUS_SUCCESS;
}
std::vector<comp::rocm_info_entry>
std::vector<component::rocm_info_entry>
rocm_metrics()
{
std::vector<comp::rocm_info_entry> _data = {};
std::vector<component::rocm_info_entry> _data = {};
try
{
(void) HsaRsrcFactory::Instance();
@@ -475,11 +475,11 @@ rocm_cleanup()
namespace
{
using rocm_event = comp::rocm_event;
using rocm_data_t = comp::rocm_data_t;
using rocm_metric_type = comp::rocm_metric_type;
using rocm_feature_value = comp::rocm_feature_value;
using rocm_data_tracker = comp::rocm_data_tracker;
using rocm_event = component::rocm_event;
using rocm_data_t = component::rocm_data_t;
using rocm_metric_type = component::rocm_metric_type;
using rocm_feature_value = component::rocm_feature_value;
using rocm_data_tracker = component::rocm_data_tracker;
void
post_process_perfetto()
@@ -496,7 +496,7 @@ post_process_perfetto()
for(size_t i = 0; i < OMNITRACE_MAX_THREADS; ++i)
{
auto& _v = comp::rocm_data(i);
auto& _v = component::rocm_data(i);
if(_v)
{
_data.reserve(_data.size() + _v->size());
@@ -605,7 +605,7 @@ post_process_timemory()
for(size_t i = 0; i < OMNITRACE_MAX_THREADS; ++i)
{
auto& _v = comp::rocm_data(i);
auto& _v = component::rocm_data(i);
if(_v)
{
_data.reserve(_data.size() + _v->size());
@@ -65,7 +65,7 @@ is_setup();
void
post_process();
std::vector<comp::rocm_info_entry>
std::vector<component::rocm_info_entry>
rocm_metrics();
#if !defined(OMNITRACE_USE_ROCPROFILER) || OMNITRACE_USE_ROCPROFILER == 0
@@ -77,10 +77,10 @@ inline void
rocm_cleanup()
{}
inline std::vector<comp::rocm_info_entry>
inline std::vector<component::rocm_info_entry>
rocm_metrics()
{
return std::vector<comp::rocm_info_entry>{};
return std::vector<component::rocm_info_entry>{};
}
#endif
@@ -0,0 +1,6 @@
#
if(OMNITRACE_USE_ROCPROFILER)
target_sources(
omnitrace-object-library PRIVATE ${CMAKE_CURRENT_LIST_DIR}/hsa_rsrc_factory.hpp
${CMAKE_CURRENT_LIST_DIR}/hsa_rsrc_factory.cpp)
endif()
@@ -21,7 +21,6 @@
// SOFTWARE.
#include "library/roctracer.hpp"
#include "library.hpp"
#include "library/components/fwd.hpp"
#include "library/config.hpp"
#include "library/critical_trace.hpp"
@@ -98,7 +97,7 @@ auto&
get_roctracer_hip_data(int64_t _tid = threading::get_id())
{
using data_t = std::unordered_map<uint64_t, roctracer_bundle_t>;
using thread_data_t = thread_data<data_t, api::roctracer>;
using thread_data_t = thread_data<data_t, category::roctracer>;
static auto& _v = thread_data_t::instances(thread_data_t::construct_on_init{});
return _v.at(_tid);
}
@@ -137,7 +136,7 @@ auto&
get_roctracer_cid_data(int64_t _tid = threading::get_id())
{
using thread_data_t =
thread_data<std::unordered_map<uint64_t, cid_data>, api::roctracer>;
thread_data<std::unordered_map<uint64_t, cid_data>, category::roctracer>;
static auto& _v = thread_data_t::instances(thread_data_t::construct_on_init{});
return *_v.at(_tid);
}
@@ -145,8 +144,9 @@ get_roctracer_cid_data(int64_t _tid = threading::get_id())
auto&
get_hip_activity_callbacks(int64_t _tid = threading::get_id())
{
using thread_data_t = thread_data<std::vector<std::function<void()>>, api::roctracer>;
static auto& _v = thread_data_t::instances(thread_data_t::construct_on_init{});
using thread_data_t =
thread_data<std::vector<std::function<void()>>, category::roctracer>;
static auto& _v = thread_data_t::instances(thread_data_t::construct_on_init{});
return _v.at(_tid);
}
@@ -156,8 +156,8 @@ using key_data_mutex_t = std::decay_t<decltype(get_roctracer_key_data())>;
auto&
get_hip_activity_mutex(int64_t _tid = threading::get_id())
{
return tim::type_mutex<hip_activity_mutex_t, api::roctracer, max_supported_threads>(
_tid);
return tim::type_mutex<hip_activity_mutex_t, category::roctracer,
max_supported_threads>(_tid);
}
} // namespace
@@ -230,17 +230,6 @@ hsa_api_callback(uint32_t domain, uint32_t cid, const void* callback_data, void*
OMNITRACE_SCOPED_THREAD_STATE(ThreadState::Internal);
static thread_local std::once_flag _once{};
std::call_once(_once, []() {
threading::offset_this_id(true);
if(threading::get_id() != 0)
{
sampling::block_signals();
threading::set_thread_name("roctracer.hsa");
sampling::shutdown();
}
});
(void) arg;
const hsa_api_data_t* data = reinterpret_cast<const hsa_api_data_t*>(callback_data);
OMNITRACE_CONDITIONAL_PRINT_F(
@@ -326,9 +315,8 @@ hsa_api_callback(uint32_t domain, uint32_t cid, const void* callback_data, void*
if(get_use_timemory())
{
std::unique_lock<std::mutex> _lk{ tasking::roctracer::get_mutex() };
auto _beg_ns = begin_timestamp;
auto _end_ns = end_timestamp;
auto _beg_ns = begin_timestamp;
auto _end_ns = end_timestamp;
if(tasking::roctracer::get_task_group().pool())
tasking::roctracer::get_task_group().exec(
[_name, _beg_ns, _end_ns]() {
@@ -413,7 +401,6 @@ hsa_activity_callback(uint32_t op, activity_record_t* record, void* arg)
}
};
std::unique_lock<std::mutex> _lk{ tasking::roctracer::get_mutex() };
if(tasking::roctracer::get_task_group().pool())
tasking::roctracer::get_task_group().exec(_func);
@@ -463,13 +450,15 @@ roctx_api_callback(uint32_t domain, uint32_t cid, const void* callback_data,
if(get_use_perfetto())
tracing::push_perfetto(category::rocm_roctx{}, _data->args.message);
if(get_use_timemory()) tracing::push_timemory(_data->args.message);
if(get_use_timemory())
tracing::push_timemory(category::rocm_roctx{}, _data->args.message);
break;
}
case ROCTX_API_ID_roctxRangePop:
{
if(get_use_timemory()) tracing::pop_timemory(_data->args.message);
if(get_use_timemory())
tracing::pop_timemory(category::rocm_roctx{}, _data->args.message);
if(get_use_perfetto())
tracing::pop_perfetto(category::rocm_roctx{}, _data->args.message);
break;
@@ -486,7 +475,8 @@ roctx_api_callback(uint32_t domain, uint32_t cid, const void* callback_data,
if(get_use_perfetto())
tracing::push_perfetto(category::rocm_roctx{}, _data->args.message);
if(get_use_timemory()) tracing::push_timemory(_data->args.message);
if(get_use_timemory())
tracing::push_timemory(category::rocm_roctx{}, _data->args.message);
break;
}
case ROCTX_API_ID_roctxRangeStop:
@@ -513,7 +503,8 @@ roctx_api_callback(uint32_t domain, uint32_t cid, const void* callback_data,
if(!_message.empty())
{
if(get_use_timemory()) tracing::pop_timemory(_message.data());
if(get_use_timemory())
tracing::pop_timemory(category::rocm_roctx{}, _message.data());
if(get_use_perfetto())
tracing::pop_perfetto(category::rocm_roctx{}, _message.data());
}
@@ -845,9 +836,9 @@ hip_activity_callback(const char* begin, const char* end, void*)
const char* op_name =
roctracer_op_string(record->domain, record->op, record->kind);
uint64_t _beg_ns = record->begin_ns + get_clock_skew();
uint64_t _end_ns = record->end_ns + get_clock_skew();
auto _ns_skew = get_clock_skew();
uint64_t _beg_ns = record->begin_ns + _ns_skew;
uint64_t _end_ns = record->end_ns + _ns_skew;
auto _corr_id = record->correlation_id;
static auto _scope = []() {
auto _v = scope::config{};
@@ -902,11 +893,13 @@ hip_activity_callback(const char* begin, const char* end, void*)
{
static size_t _n = 0;
OMNITRACE_CONDITIONAL_PRINT_F(
get_debug() && get_verbose() >= 2,
"%4zu :: %-20s :: %-20s :: correlation_id(%6lu) time_ns(%12lu:%12lu) "
"delta_ns(%12lu) device_id(%d) stream_id(%lu) proc_id(%u) thr_id(%lu)\n",
(get_debug() && get_verbose() >= 2) || _end_ns <= _beg_ns,
"%4zu :: %-20s :: %-20s :: cid=%lu, time_ns=(%12lu:%12lu) "
"delta=%li, device_id=%d, stream_id=%lu, pid=%u, tid=%lu\n",
_n++, op_name, _name, record->correlation_id, _beg_ns, _end_ns,
(_end_ns - _beg_ns), _devid, _queid, record->process_id, _tid);
(static_cast<int64_t>(_end_ns) - static_cast<int64_t>(_beg_ns)), _devid,
_queid, record->process_id, _tid);
if(_end_ns <= _beg_ns) continue;
}
// execute this on this thread bc of how perfetto visualization works
@@ -918,7 +911,7 @@ hip_activity_callback(const char* begin, const char* end, void*)
if(_kernel_names.find(_name) == _kernel_names.end())
_kernel_names.emplace(_name, tim::demangle(_name));
assert(_end_ns > _beg_ns);
assert(_end_ns >= _beg_ns);
tracing::push_perfetto_ts(
category::device_hip{}, _kernel_names.at(_name).c_str(), _beg_ns,
perfetto::Flow::ProcessScoped(_cid), "begin_ns", _beg_ns, "corr_id",
@@ -25,7 +25,6 @@
#include "library/components/roctracer.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/dynamic_library.hpp"
#include "library/perfetto.hpp"
#include "library/ptl.hpp"
@@ -48,9 +47,9 @@
namespace omnitrace
{
using roctracer_bundle_t =
tim::component_bundle<api::omnitrace, comp::roctracer_data, comp::wall_clock>;
tim::component_bundle<project::omnitrace, comp::roctracer_data, comp::wall_clock>;
using roctracer_hsa_bundle_t =
tim::component_bundle<api::omnitrace, comp::roctracer_data>;
tim::component_bundle<project::omnitrace, comp::roctracer_data>;
using roctracer_functions_t = std::vector<std::pair<std::string, std::function<void()>>>;
// HSA API callback function
@@ -21,7 +21,7 @@
// SOFTWARE.
#include "library/runtime.hpp"
#include "library/api.hpp"
#include "api.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/defines.hpp"
@@ -32,6 +32,7 @@
#include <timemory/backends/mpi.hpp>
#include <timemory/backends/process.hpp>
#include <timemory/backends/threading.hpp>
#include <timemory/components/rusage/backends.hpp>
#include <timemory/environment.hpp>
#include <timemory/sampling/allocator.hpp>
#include <timemory/settings.hpp>
@@ -63,21 +64,11 @@ get_cputime_signal()
return SIGPROF;
}
std::set<int>
get_sampling_signals(int64_t _tid)
std::set<int> get_sampling_signals(int64_t)
{
auto _sigreal = get_realtime_signal();
auto _sigprof = get_cputime_signal();
// on the main thread, typically use both real-time and cpu-time
// on secondary threads, typically use only cpu-time.
if(_tid < 1)
{
if(config::get_use_sampling_cputime()) return std::set<int>{ _sigreal, _sigprof };
return std::set<int>{ _sigreal };
}
if(config::get_use_sampling_realtime() && config::get_use_sampling_cputime())
return std::set<int>{ _sigreal, _sigprof };
else if(config::get_use_sampling_realtime())
@@ -168,7 +159,8 @@ get_cpu_cid_stack_lock(int64_t _tid)
{
struct cpu_cid_stack_s
{};
return tim::type_mutex<cpu_cid_stack_s, api::omnitrace, max_supported_threads>(_tid);
return tim::type_mutex<cpu_cid_stack_s, project::omnitrace, max_supported_threads>(
_tid);
}
namespace
@@ -183,19 +175,24 @@ setup_gotchas()
OMNITRACE_BASIC_DEBUG(
"Configuring gotcha wrapper around fork, MPI_Init, and MPI_Init_thread\n");
mpi_gotcha::configure();
exit_gotcha::configure();
fork_gotcha::configure();
pthread_gotcha::configure();
component::mpi_gotcha::configure();
component::exit_gotcha::configure();
component::fork_gotcha::configure();
}
} // namespace
std::unique_ptr<main_bundle_t>&
get_main_bundle()
{
static auto _v =
std::make_unique<main_bundle_t>(JOIN('/', "omnitrace/process", process::get_id()),
quirk::config<quirk::auto_start>{});
static auto _v = []() {
auto _self = RUSAGE_SELF;
std::swap(_self, tim::get_rusage_type());
auto _tmp = std::make_unique<main_bundle_t>(
JOIN('/', "omnitrace/process", process::get_id()),
quirk::config<quirk::auto_start>{});
std::swap(_self, tim::get_rusage_type());
return _tmp;
}();
return _v;
}
@@ -239,12 +236,16 @@ set_thread_state(ThreadState _n)
ThreadState
push_thread_state(ThreadState _v)
{
if(get_thread_state() >= ThreadState::Completed) return get_thread_state();
return get_thread_state_history().emplace_back(set_thread_state(_v));
}
ThreadState
pop_thread_state()
{
if(get_thread_state() >= ThreadState::Completed) return get_thread_state();
auto& _hist = get_thread_state_history();
if(!_hist.empty())
{
@@ -253,5 +254,4 @@ pop_thread_state()
}
return get_thread_state();
}
} // namespace omnitrace
@@ -22,7 +22,7 @@
#pragma once
#include "library/api.hpp"
#include "api.hpp"
#include "library/common.hpp"
#include "library/components/exit_gotcha.hpp"
#include "library/components/fork_gotcha.hpp"
@@ -47,8 +47,8 @@ namespace omnitrace
{
// bundle of components around omnitrace_init and omnitrace_finalize
using main_bundle_t =
tim::lightweight_tuple<comp::wall_clock, comp::peak_rss, comp::cpu_clock,
comp::cpu_util, pthread_gotcha>;
tim::lightweight_tuple<comp::wall_clock, comp::peak_rss, comp::page_rss,
comp::cpu_clock, comp::cpu_util, pthread_gotcha>;
using gotcha_bundle_t =
tim::lightweight_tuple<exit_gotcha_t, fork_gotcha_t, mpi_gotcha_t>;
@@ -21,11 +21,20 @@
// SOFTWARE.
#include "library/sampling.hpp"
#include "library/common.hpp"
#include "library/components/backtrace.hpp"
#include "library/components/backtrace_metrics.hpp"
#include "library/components/backtrace_timestamp.hpp"
#include "library/components/fwd.hpp"
#include "library/components/pthread_gotcha.hpp"
#include "library/config.hpp"
#include "library/debug.hpp"
#include "library/ptl.hpp"
#include "library/runtime.hpp"
#include "library/thread_data.hpp"
#include "library/thread_info.hpp"
#include "library/tracing.hpp"
#include "library/utility.hpp"
#include <timemory/backends/papi.hpp>
#include <timemory/backends/threading.hpp>
@@ -67,21 +76,30 @@ namespace omnitrace
{
namespace sampling
{
using bundle_t = tim::lightweight_tuple<backtrace>;
using sampler_t = tim::sampling::sampler<bundle_t, tim::sampling::dynamic>;
using hw_counters = typename component::backtrace_metrics::hw_counters;
using signal_type_instances = thread_data<std::set<int>, category::sampling>;
using sampler_running_instances = thread_data<bool, category::sampling>;
using bundle_t =
tim::lightweight_tuple<backtrace_timestamp, backtrace, backtrace_metrics>;
using sampler_t = tim::sampling::sampler<bundle_t, tim::sampling::dynamic>;
using sampler_instances = thread_data<sampler_t, category::sampling>;
using sampler_init_instances = thread_data<bundle_t, category::sampling>;
using tim::sampling::timer;
} // namespace sampling
} // namespace omnitrace
OMNITRACE_DEFINE_CONCRETE_TRAIT(prevent_reentry, sampling::sampler_t, std::true_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(provide_backtrace, sampling::sampler_t, std::false_type)
OMNITRACE_DEFINE_CONCRETE_TRAIT(buffer_size, sampling::sampler_t,
TIMEMORY_ESC(std::integral_constant<size_t, 1024>))
namespace omnitrace
{
namespace sampling
{
using hw_counters = typename component::backtrace::hw_counters;
using signal_type_instances = thread_data<std::set<int>, api::sampling>;
using backtrace_init_instances = thread_data<backtrace, api::sampling>;
using sampler_running_instances = thread_data<bool, api::sampling>;
using papi_vector_instances = thread_data<hw_counters, api::sampling>;
namespace
{
template <typename... Args>
@@ -119,6 +137,166 @@ get_signal_names(Tp&& _v)
" ";
return _sig_names.substr(0, _sig_names.length() - 1);
}
unique_ptr_t<sampler_t>&
get_sampler(int64_t _tid = threading::get_id())
{
static auto& _v = sampler_instances::instances();
return _v.at(_tid);
}
unique_ptr_t<bundle_t>&
get_sampler_init(int64_t _tid = threading::get_id())
{
static auto& _v = sampler_init_instances::instances();
if(!_v.at(_tid)) _v.at(_tid) = unique_ptr_t<bundle_t>{ new bundle_t{} };
return _v.at(_tid);
}
unique_ptr_t<bool>&
get_sampler_running(int64_t _tid)
{
static auto& _v = sampler_running_instances::instances(
sampler_running_instances::construct_on_init{}, false);
return _v.at(_tid);
}
std::set<int>
configure(bool _setup, int64_t _tid = threading::get_id())
{
const auto& _info = thread_info::get(_tid, InternalTID);
auto& _sampler = sampling::get_sampler(_tid);
auto& _running = get_sampler_running(_tid);
bool _is_running = (!_running) ? false : *_running;
auto& _signal_types = sampling::get_signal_types(_tid);
pthread_gotcha::push_enable_sampling_on_child_threads(false);
auto _dtor = scope::destructor{ []() {
pthread_gotcha::pop_enable_sampling_on_child_threads();
} };
if(_setup && !_sampler && !_is_running && !_signal_types->empty())
{
// if this thread has an offset ID, that means it was created internally
// and is probably here because it called a function which was instrumented.
// thus we should not start a sampler for it
if(_tid > 0 && _info && _info->is_offset) return std::set<int>{};
// if the thread state is disabled, return
if(_info && _info->index_data->internal_value == _tid &&
get_thread_state() == ThreadState::Disabled)
return std::set<int>{};
(void) get_debug_sampling(); // make sure query in sampler does not allocate
assert(_tid == threading::get_id());
if(trait::runtime_enabled<backtrace_metrics>::get())
backtrace_metrics::configure(_setup, _tid);
// NOTE: signals need to be unblocked by calling function
sampling::block_signals(*_signal_types);
auto _verbose = std::min<int>(get_verbose() - 2, 2);
if(get_debug_sampling()) _verbose = 2;
OMNITRACE_DEBUG("Configuring sampler for thread %lu...\n", _tid);
sampling::sampler_instances::construct("omnitrace", _tid, _verbose);
_sampler->set_flags(SA_RESTART);
_sampler->set_verbose(_verbose);
if(_signal_types->count(get_realtime_signal()) > 0)
{
_sampler->configure(timer{ get_realtime_signal(), CLOCK_REALTIME,
SIGEV_THREAD_ID, get_sampling_real_freq(),
get_sampling_real_delay(), _tid,
threading::get_sys_tid() });
}
if(_signal_types->count(get_cputime_signal()) > 0)
{
_sampler->configure(timer{ get_cputime_signal(), CLOCK_THREAD_CPUTIME_ID,
SIGEV_THREAD_ID, get_sampling_cpu_freq(),
get_sampling_cpu_delay(), _tid,
threading::get_sys_tid() });
}
static_assert(tim::trait::buffer_size<sampling::sampler_t>::value > 0,
"Error! Zero buffer size");
OMNITRACE_CONDITIONAL_THROW(
_sampler->get_buffer_size() !=
tim::trait::buffer_size<sampling::sampler_t>::value,
"dynamic sampler has a buffer size different from static trait: %zu instead "
"of %zu",
_sampler->get_buffer_size(),
tim::trait::buffer_size<sampling::sampler_t>::value);
OMNITRACE_CONDITIONAL_THROW(
_sampler->get_buffer_size() <= 0,
"dynamic sampler requires a positive buffer size: %zu",
_sampler->get_buffer_size());
for(auto itr : *_signal_types)
{
const char* _type = (itr == get_realtime_signal()) ? "wall" : "CPU";
const auto* _timer = _sampler->get_timer(itr);
if(_timer)
{
OMNITRACE_VERBOSE(
1,
"[SIG%i] Sampler for thread %lu will be triggered %.1fx per "
"second of %s-time (every %.3e milliseconds)...\n",
itr, _tid, _timer->get_frequency(units::sec), _type,
_timer->get_period(units::msec));
}
}
*_running = true;
sampling::get_sampler_init(_tid)->sample();
_sampler->start();
}
else if(!_setup && _sampler && _is_running)
{
OMNITRACE_DEBUG("Destroying sampler for thread %lu...\n", _tid);
*_running = false;
if(_tid == threading::get_id() && !_signal_types->empty())
{
sampling::block_signals(*_signal_types);
}
if(_tid == 0)
{
// this propagates to all threads
_sampler->ignore(*_signal_types);
for(int64_t i = 1; i < OMNITRACE_MAX_THREADS; ++i)
{
if(sampling::get_sampler(i))
{
sampling::get_sampler(i)->stop();
sampling::get_sampler(i)->reset();
*get_sampler_running(i) = false;
}
}
}
_sampler->stop();
if(trait::runtime_enabled<backtrace_metrics>::get())
backtrace_metrics::configure(_setup, _tid);
OMNITRACE_DEBUG("Sampler destroyed for thread %lu\n", _tid);
}
return (_signal_types) ? *_signal_types : std::set<int>{};
}
void
post_process_perfetto(int64_t _tid, const bundle_t* _init,
const std::vector<bundle_t*>& _data);
void
post_process_timemory(int64_t _tid, const bundle_t* _init,
const std::vector<bundle_t*>& _data);
} // namespace
unique_ptr_t<std::set<int>>&
@@ -133,13 +311,13 @@ std::set<int>
setup()
{
if(!get_use_sampling()) return std::set<int>{};
return backtrace::configure(true);
return configure(true);
}
std::set<int>
shutdown()
{
return backtrace::configure(false);
return configure(false);
}
void
@@ -176,11 +354,366 @@ unblock_signals(std::set<int> _signals)
thread_sigmask(SIG_UNBLOCK, &_v, nullptr);
}
unique_ptr_t<sampler_t>&
get_sampler(int64_t _tid)
void
post_process()
{
static auto& _v = sampler_instances::instances();
return _v.at(_tid);
omnitrace::component::backtrace::stop();
OMNITRACE_VERBOSE(2 || get_debug_sampling(), "Stopping backtrace metrics...\n");
for(size_t i = 0; i < max_supported_threads; ++i)
backtrace_metrics::configure(false, i);
OMNITRACE_VERBOSE(1 || get_debug_sampling(), "Post-processing sampling data...\n");
for(size_t i = 0; i < max_supported_threads; ++i)
{
auto& _sampler = get_sampler(i);
if(!_sampler)
{
// this should be relatively common
OMNITRACE_CONDITIONAL_PRINT(
get_debug() && get_verbose() >= 2,
"Post-processing sampling entries for thread %lu skipped (no sampler)\n",
i);
continue;
}
auto* _init = get_sampler_init(i).get();
if(!_init)
{
// this is not common
OMNITRACE_PRINT("Post-processing sampling entries for thread %lu skipped "
"(not initialized)\n",
i);
continue;
}
const auto& _thread_info = thread_info::get(i, InternalTID);
OMNITRACE_VERBOSE(3 || get_debug_sampling(),
"Getting sampler data for thread %lu...\n", i);
_sampler->stop();
auto& _raw_data = _sampler->get_data();
OMNITRACE_VERBOSE(0 || get_debug_sampling(),
"Sampler data for thread %lu has %zu initial entries...\n", i,
_raw_data.size());
OMNITRACE_CI_THROW(
_sampler->get_sample_count() != _raw_data.size(),
"Error! sampler recorded %zu samples but %zu samples were returned\n",
_sampler->get_sample_count(), _raw_data.size());
// single sample that is useless (backtrace to unblocking signals)
if(_raw_data.size() == 1 && _raw_data.front().size() <= 1) _raw_data.clear();
std::vector<sampling::bundle_t*> _data{};
for(auto& itr : _raw_data)
{
_data.reserve(_data.size() + itr.size());
auto* _bt = itr.get<backtrace>();
auto* _ts = itr.get<backtrace_timestamp>();
if(!_bt || !_ts) continue;
if(_bt->empty()) continue;
if(!_thread_info->is_valid_time(_ts->get_timestamp())) continue;
_data.emplace_back(&itr);
}
if(_data.empty())
{
OMNITRACE_VERBOSE(
3 || get_debug_sampling(),
"Sampler data for thread %lu has %zu valid entries... (skipped)\n", i,
_raw_data.size());
continue;
}
OMNITRACE_VERBOSE(0 || get_debug_sampling(),
"Sampler data for thread %lu has %zu valid entries...\n", i,
_raw_data.size());
if(get_use_perfetto()) post_process_perfetto(i, _init, _data);
if(get_use_timemory()) post_process_timemory(i, _init, _data);
}
OMNITRACE_VERBOSE(0 || get_debug_sampling(),
"Post-processing sampling entries completed\n");
for(size_t i = 0; i < max_supported_threads; ++i)
{
get_sampler(i).reset();
}
OMNITRACE_VERBOSE(0 || get_debug_sampling(), "Post-processing samplers destroyed\n");
}
namespace
{
void
post_process_perfetto(int64_t _tid, const bundle_t* _init,
const std::vector<bundle_t*>& _data)
{
if(trait::runtime_enabled<backtrace_metrics>::get())
{
OMNITRACE_VERBOSE(3 || get_debug_sampling(),
"[%li] Post-processing metrics for perfetto...\n", _tid);
backtrace_metrics::init_perfetto(_tid);
for(const auto& itr : _data)
{
const auto* _bt_metrics = itr->get<backtrace_metrics>();
const auto* _bt_time = itr->get<backtrace_timestamp>();
if(!_bt_metrics || !_bt_time) continue;
if(_bt_time->get_tid() != _tid) continue;
_bt_metrics->post_process_perfetto(_tid, _bt_time->get_timestamp());
}
backtrace_metrics::fini_perfetto(_tid);
}
auto _process_perfetto = [_tid,
_init](const std::vector<sampling::bundle_t*>& _data) {
OMNITRACE_VERBOSE(3 || get_debug_sampling(),
"[%li] Post-processing backtraces for perfetto...\n", _tid);
const auto& _thread_info = thread_info::get(_tid, InternalTID);
OMNITRACE_CI_THROW(!_thread_info, "No valid thread info for tid=%li\n", _tid);
if(!_thread_info) return;
uint64_t _beg_ns = _thread_info->get_start();
uint64_t _end_ns = _thread_info->get_stop();
uint64_t _last_ts = std::max<uint64_t>(
_init->get<backtrace_timestamp>()->get_timestamp(), _beg_ns);
tracing::push_perfetto_ts(category::sampling{}, "samples [omnitrace]", _beg_ns,
"begin_ns", _beg_ns);
for(const auto& itr : _data)
{
const auto* _bt_ts = itr->get<backtrace_timestamp>();
const auto* _bt_cs = itr->get<backtrace>();
if(!_bt_ts || !_bt_cs) continue;
if(_bt_ts->get_tid() != _tid) continue;
static std::set<std::string> _static_strings{};
for(const auto& itr : backtrace::filter_and_patch(_bt_cs->get()))
{
const auto* _name = _static_strings.emplace(itr).first->c_str();
uint64_t _beg = _last_ts;
uint64_t _end = _bt_ts->get_timestamp();
if(!_thread_info->is_valid_lifetime({ _beg, _end })) continue;
tracing::push_perfetto_ts(category::sampling{}, _name, _beg, "begin_ns",
_beg);
tracing::pop_perfetto_ts(category::sampling{}, _name, _end, "end_ns",
_end);
}
_last_ts = _bt_ts->get_timestamp();
}
tracing::pop_perfetto_ts(category::sampling{}, "samples [omnitrace]", _end_ns,
"end_ns", _end_ns);
};
auto _processing_thread = threading::get_tid();
auto _process_perfetto_wrapper = [&]() {
if(threading::get_tid() != _processing_thread)
threading::set_thread_name(TIMEMORY_JOIN(" ", "Thread", _tid, "(S)").c_str());
try
{
_process_perfetto(_data);
} catch(std::runtime_error& _e)
{
OMNITRACE_PRINT("[sampling][post_process_perfetto] Exception: %s\n",
_e.what());
OMNITRACE_CI_ABORT(true, "[sampling][post_process_perfetto] Exception: %s\n",
_e.what());
}
};
if(_tid == 0 && config::get_mode() == Mode::Sampling &&
config::get_perfetto_fill_policy() == "discard")
{
_process_perfetto(_data);
}
else
{
pthread_gotcha::push_enable_sampling_on_child_threads(false);
std::thread{ _process_perfetto_wrapper }.join();
pthread_gotcha::pop_enable_sampling_on_child_threads();
}
}
void
post_process_timemory(int64_t _tid, const bundle_t* _init,
const std::vector<bundle_t*>& _data)
{
std::map<int64_t, std::map<int64_t, int64_t>> _depth_sum = {};
auto _scope = tim::scope::config{};
if(get_timeline_sampling()) _scope += scope::timeline{};
if(get_flat_sampling()) _scope += scope::flat{};
OMNITRACE_VERBOSE(3 || get_debug_sampling(),
"[%li] Post-processing data for timemory...\n", _tid);
const auto* _last = _init;
for(const auto& itr : _data)
{
using bundle_t = tim::lightweight_tuple<comp::trip_count, sampling_wall_clock,
sampling_cpu_clock, hw_counters>;
auto* _bt_data = itr->get<backtrace>();
auto* _bt_time = itr->get<backtrace_timestamp>();
auto* _bt_metrics = itr->get<backtrace_metrics>();
if(!_bt_data || !_bt_time || !_bt_metrics) continue;
double _elapsed_wc = (_bt_time->get_timestamp() -
_last->get<backtrace_timestamp>()->get_timestamp());
double _elapsed_cc = (_bt_metrics->get_cpu_timestamp() -
_last->get<backtrace_metrics>()->get_cpu_timestamp());
std::vector<bundle_t> _tc{};
_tc.reserve(_bt_data->size());
// generate the instances of the tuple of components and start them
for(const auto& itr : backtrace::filter_and_patch(_bt_data->get()))
{
_tc.emplace_back(tim::string_view_t{ itr }, _scope);
_tc.back().push(_bt_time->get_tid());
_tc.back().start();
}
// stop the instances and update the values as needed
for(size_t i = 0; i < _tc.size(); ++i)
{
auto& itr = _tc.at(_tc.size() - i - 1);
size_t _depth = 0;
_depth_sum[_bt_time->get_tid()][_depth] += 1;
itr.stop();
if constexpr(tim::trait::is_available<sampling_wall_clock>::value)
{
auto* _sc = itr.get<sampling_wall_clock>();
if(_sc)
{
auto _value = _elapsed_wc / sampling_wall_clock::get_unit();
_sc->set_value(_value);
_sc->set_accum(_value);
}
}
if constexpr(tim::trait::is_available<sampling_cpu_clock>::value)
{
auto* _cc = itr.get<sampling_cpu_clock>();
if(_cc)
{
_cc->set_value(_elapsed_cc / sampling_cpu_clock::get_unit());
_cc->set_accum(_elapsed_cc / sampling_cpu_clock::get_unit());
}
}
if constexpr(tim::trait::is_available<hw_counters>::value)
{
auto _hw_cnt_vals = _bt_metrics->get_hw_counters();
if(_last && _bt_metrics->get_hw_counters().size() ==
_last->get<backtrace_metrics>()->get_hw_counters().size())
{
for(size_t k = 0; k < _bt_metrics->get_hw_counters().size(); ++k)
{
if(_last->get<backtrace_metrics>()->get_hw_counters()[k] >
_hw_cnt_vals[k])
_hw_cnt_vals[k] -=
_last->get<backtrace_metrics>()->get_hw_counters()[k];
}
}
auto* _hw_counter = itr.get<hw_counters>();
if(_hw_counter)
{
_hw_counter->set_value(_hw_cnt_vals);
_hw_counter->set_accum(_hw_cnt_vals);
}
}
itr.pop();
}
_last = itr;
}
for(auto&& itr : _data)
{
using bundle_t =
tim::lightweight_tuple<sampling_percent, quirk::config<quirk::tree_scope>>;
auto* _bt_data = itr->get<backtrace>();
auto* _bt_time = itr->get<backtrace_timestamp>();
if(!_bt_time || !_bt_data) continue;
if(_depth_sum.find(_bt_time->get_tid()) == _depth_sum.end()) continue;
std::vector<bundle_t> _tc{};
_tc.reserve(_bt_data->size());
// generate the instances of the tuple of components and start them
for(const auto& itr : backtrace::filter_and_patch(_bt_data->get()))
{
_tc.emplace_back(tim::string_view_t{ itr });
_tc.back().push(_bt_time->get_tid());
_tc.back().start();
}
// stop the instances and update the values as needed
for(size_t i = 0; i < _tc.size(); ++i)
{
auto& itr = _tc.at(_tc.size() - i - 1);
size_t _depth = 0;
double _value = (1.0 / _depth_sum[_bt_time->get_tid()][_depth]) * 100.0;
itr.store(std::plus<double>{}, _value);
itr.stop();
itr.pop();
}
}
}
struct sampling_initialization
{
static void preinit()
{
sampling_wall_clock::label() = "sampling_wall_clock";
sampling_wall_clock::description() = "Wall clock time (via sampling)";
sampling_cpu_clock::label() = "sampling_cpu_clock";
sampling_cpu_clock::description() = "CPU clock time (via sampling)";
sampling_percent::label() = "sampling_percent";
sampling_percent::description() = "Percentage of samples";
sampling_gpu_busy::label() = "sampling_gpu_busy_percent";
sampling_gpu_busy::description() = "Utilization of GPU(s)";
sampling_gpu_busy::set_precision(0);
sampling_gpu_busy::set_format_flags(sampling_gpu_busy::get_format_flags() &
std::ios_base::showpoint);
sampling_gpu_memory::label() = "sampling_gpu_memory_usage";
sampling_gpu_memory::description() = "Memory usage of GPU(s)";
sampling_gpu_power::label() = "sampling_gpu_power";
sampling_gpu_power::description() = "Power usage of GPU(s)";
sampling_gpu_power::unit() = units::watt;
sampling_gpu_power::display_unit() = "watts";
sampling_gpu_power::set_precision(2);
sampling_gpu_power::set_format_flags(sampling_gpu_power::get_format_flags());
sampling_gpu_temp::label() = "sampling_gpu_temperature";
sampling_gpu_temp::description() = "Temperature of GPU(s)";
sampling_gpu_temp::unit() = 1;
sampling_gpu_temp::display_unit() = "degC";
sampling_gpu_temp::set_precision(1);
sampling_gpu_temp::set_format_flags(sampling_gpu_temp::get_format_flags());
}
};
} // namespace
} // namespace sampling
} // namespace omnitrace
TIMEMORY_INVOKE_PREINIT(omnitrace::sampling::sampling_initialization)
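
In `post_process_timemory` above, each sample is charged the wall-clock and CPU time that elapsed since the previous sample: `_last` starts at `_init` (the sample taken just before the sampler starts) and is advanced at the end of each iteration. A minimal standalone sketch of that idea follows. The names `sample`, `elapsed`, and `attribute_elapsed` are hypothetical stand-ins for illustration and are not omnitrace or timemory types; the sketch also assumes timestamps are monotonically non-decreasing.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one recorded sample (not an omnitrace type).
struct sample
{
    std::uint64_t wall_ns;  // wall-clock timestamp when the sample was taken
    std::uint64_t cpu_ns;   // thread CPU-time timestamp when the sample was taken
};

// Hypothetical result type: time attributed to a single sample.
struct elapsed
{
    double wall_ns;
    double cpu_ns;
};

// Charge each sample the time elapsed since the previous one. The first
// sample is measured against `init`, mirroring how `_last` starts at `_init`.
// Assumes timestamps never decrease from one sample to the next.
std::vector<elapsed>
attribute_elapsed(const sample& init, const std::vector<sample>& data)
{
    std::vector<elapsed> out{};
    out.reserve(data.size());
    const sample* last = &init;
    for(const auto& itr : data)
    {
        elapsed e{};
        e.wall_ns = static_cast<double>(itr.wall_ns - last->wall_ns);
        e.cpu_ns  = static_cast<double>(itr.cpu_ns - last->cpu_ns);
        out.push_back(e);
        last = &itr;  // the current sample becomes the next baseline
    }
    return out;
}
```

With a baseline of `{100, 10}` and samples `{150, 30}` and `{400, 35}`, the first sample would be attributed 50 ns of wall time and 20 ns of CPU time, and the second 250 ns and 5 ns.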

Some files were not shown because too many files have changed in this diff