Commit Graph

61 Commits

Author SHA1 Message Date
Mészáros Gergely 1b8f09aa2d instrumentation: include functions with specific calls (#202)
* instrumentation: include functions with specific calls

Add the option `--caller-include <regex>` or environment variable
`OMNITRACE_REGEX_CALLER_INCLUDE` to instrument functions which contain
call to a set of functions, E.g. `--caller-include foo` instruments any
function which calls `foo`.

* Serialize caller include information

* Add test for caller include

* Tweak to the caller include test

- tweak environment
- tweak pass regexes

* Set rewrite caller example to debug

, to avoid optimizing out the call expressions that it relies on.

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2022-11-11 02:32:57 -06:00
Jonathan R. Madsen f147670a7a Various optimizations (#192)
* CDash name prefix {{ repo_owner }}-{{ ref_name }}

- remove /merge from CI name

* disable using BFD when sampling_include_inlines is OFF

- this consumes a lot of memory

* Improve finalization of rocprofiler

* update timemory submodule

- disable OMPT thread begin/end callbacks
- support hierarchies in signal handlers
- update operation::pop_node debugging
- settings_update_type + setting_supported_data_types
- fixed parsing args in timemory_init

* Improve timemory build time

* Remove kokkosp restrictions for perfetto

* omnitrace exe signal handler update

- configure signal handlers before main to allow libomnitrace to override

* Backtrace and timemory submodule updates

- Use unwind::cache w/o inline info
- update timemory submodule
  - unwind::cache updates
  - filepath updates
  - fix termination_signal_message
  - fix vsettings::report_change

* Update dyninst submodule

- updates BinaryEdit::getResolvedLibraryPath

* update timemory submodule

- update CpuArch support

* Cleanup configure warnings

* Update examples cmake and workflows

- (Mostly) eliminate configuration warnings

* omnitrace exe updates

- pass environ to BPatch::processCreate
- avoid trailing ":" in DYNINST_REWRITER_PATHS

* Update dyninst submodule

- Add flags to DyninstOptimization.cmake
- Remove strtok from BinaryEdit::getResolvedLibraryPath

* examples/mpi CMakeLists.txt update

- STATUS message about missing MPI during CI, otherwise AUTHOR_WARNING

* Dev build and linker flags

- use -gsplit-dwarf when OMNITRACE_BUILD_DEVELOPER is ON
  - disable when OMNITRACE_BUILD_NUMBER > 1
- OMNITRACE_BUILD_LINKER option
- add -fuse-ld=${OMNITRACE_BUILD_LINKER}
- omnitrace_add_cache_option function

* Update workflows to set OMNITRACE_BUILD_NUMBER

* Fix generator expressions for -fuse-ld

* Suppress some configuration warnings during CI

- helps to keep track of real warnings when they arise

* Update timemory and dyninst submodules with CMP0135

* Add -V flag to run-ci script
2022-11-01 17:28:12 -05:00
Jonathan R. Madsen b23b581563 Offload sampling data (#190)
- update timemory submodule
  - support for load/save of ring_buffers
  - new output keys, e.g. `%nid%`
  - sampling allocator offloading data
- writing sampling data to temporary file
- new advanced config option `OMNITRACE_USE_TEMPORARY_FILES`
- new advanced config option `OMNITRACE_TMPDIR`
- SIGINT signal (i.e. `Ctrl+C`) triggers backtrace + finalization
  - this behavior is common to other profilers

* update output.md docs

* Update omnitrace-avail output keys handling

* update writing metadata

* str format in perfetto_counter_track

* Fix fail regex for mpi-example

* config updates

- OMNITRACE_USE_TEMPORARY_FILES
- OMNITRACE_TMPDIR
- Enable finalization with SIGINT
- code supporting creation of temp files

* sampling offloading to temporary file

* Disable creation of empty temporary files when off
2022-10-31 22:23:10 -05:00
Jonathan R. Madsen 46b6db1a4c Submitting jobs to cdash (#124)
* Submitting jobs to cdash

* Fail on submit

* submit url env

* submit url env

* try passing submit url as arg

* fix submit url

* Updated default URL

* Add submissions for remaining ubuntu focal workflow jobs

* Replace g++ with gcc in dashboard build name

* Add --ctest-args to run-ci.sh

* Add cdash support for bionic, jammy, and opensuse workflows

* Decrease CTEST_CUSTOM_MAXIMUM_PASSED_TEST_OUTPUT_SIZE

* OMNITRACE_BUILD_CODECOV option

* Support code coverage in CDash script

* CI dyninst built with debug info

* Update ci-containers

- cron schedule moved 4 hours later to UTC+5

* Update implementation of config::configure_signal_handler

- using lambdas failed to compile with codecov flags

* Add codecov job to ubuntu focal workflow

* Fix support for --ctest-args in run-ci script

* Fix ubuntu workflows

* Fix quotation handling in run-ci script

* git safe directory for codecov

* New MPI examples

* Remove --stop-on-failure

* dynamic_library update

- find_library_path checks procfs maps
- invoke find_library_path with no additional args to resolve to mapped file

* RCCLP uses dynamic_library

* check if file exists for memory_map_files metadata

* Testing updates

- include new mpi examples in tests
- fix test labels
- test critical-trace exe

* Update MPI C examples tests (needed arg)

* Remove try/catch block from critical-trace

* Fix sampling max wait when shutting down

* Fix test env for critical-trace

* Fix settings for critical-trace

- disable time output: data is deterministic
- disable PID suffixes: not multiprocess

* Update critical-trace ctest

* Update critical-trace exe

- throw error if input cannot be opened
- throw error if input has no data

* Update lulesh example with more kokkos tools usage

* Fix tasking issue with critical_trace and roctracer

- were not setting pools to active
- also sync before critical_trace::get_entries

* Increase verbosity of critical-trace tests

* Update code coverage tests

- skip code coverage + preload
- code-coverage python example and test

* Remove duplication omnitrace.initialize function

* Skip python3.6 for ubuntu jammy

* Update MPI examples

- use MPI_Isend and MPI_Irecv
- explicitly use MPI_Bcast

* Update Formatting.cmake

- include C files in examples

* run-ci script does not check return of coverage

* mpi-allreduce link to libm

* Update ctest args in run-ci script

* Update dyninst submodule

- safety improvements in BinaryEdit::openResolvedLibraryName

* capture cmake error for ctest_coverage
2022-10-31 15:39:45 -05:00
Jonathan R. Madsen 139070a2de Use execvpe instead of execve in omnitrace-sample (#187)
* Use execvpe instead of execve in omnitrace-sample

- previous implementation preferred exe in PATH over exe in PWD
- 'p' variants of exec duplicate the actions of the shell in searching for an executable file if the specified filename does not contain a slash character

* OpenMPI oversubscribe arg in testing
2022-10-27 15:15:50 -05:00
Jonathan R. Madsen a41a5c155e Fix LD_PRELOAD (#184)
- __libc_start_main in libomnitrace-dl wasn't wrapping bc of -fvisibility=hidden
- fix OMNITRACE_STRIP_TARGET
- omnitrace_reset_preload function in main library
- defer removing libomnitrace from LD_PRELOAD
2022-10-21 17:39:08 -05:00
Jonathan R. Madsen 67f7471253 Omnitrace sample documentation (#179)
* Documentation for omnitrace-sample

* Improve omnitrace-sample

- improve the printing of the env updates
- remove env settings when something is deactivated
- restore env settings when something is deactivated
2022-10-19 03:30:00 -05:00
Jonathan R. Madsen 78a06e7a42 Signal handler backtraces provide line info (#178)
* Signal handler backtraces provide line info

- print backtrace after SIGINT during finalization

* Workflow run-name + jammy rocm CI

* fix jammy matrix indentation

* disable building dyninst in jammy

* Update jammy for rocm

* jammy rocm_agent_enumerator

* Fix rocm install for jammy

* jammy bash

* jammy workflow typo

* revert some changes

* stack-usage + omnitrace-rt symlink + ncclSocketAccept + indiv sigs

- symlink omnitrace-rt in build tree
- exclude ncclSocketAccept
- timemory submodule update accepting individual signal handlers
2022-10-18 21:45:56 -05:00
Jonathan R. Madsen ede6007f9b Support for Ubuntu 22.04 and ROCm 5.3 (#48)
* Testing and CI support for Ubuntu 22.04

* Fixes for ROCm

- Jammy does not have ROCm installers

* Name, timeout, and python updates

- renamed ubuntu-jammy-external.yml to ubuntu-jammy.yml
- increased all 5 minute timeouts to 10 minutes
- include python 3.10 in testing

* Update dyninst to remove interposed definition of _r_debug

* Rebuild Dyninst + test install script

* Revert container change

* git safe directory

* pushd -> cd

* fix MPI include

* Fix testing step

* OMPI_ALLOW_RUN_AS_ROOT

* Test script changes

* Fix mismatched malloc / delete[]

* Jammy workflow tweaks

* CPack tweak for boost deb deps

* pthread_mutex_gotcha config returns when not enabled

* fix echoing config in CI

* USE_CLANG_OMP

- option to disable using LLVM OpenMP when building OpenMP test executables
- Jammy workflow sets USE_CLANG_OMP=OFF

* Dyninst submodule boost download

- updated containers workflow to include jammy
- updated workflow to use ci

* Updates to workflows + replace test-install.sh

- test-install.sh in this branch was replaced with one in main branch

* Expand jammy test-install.sh args

* Fix openmp-cg-sampling-duration test

* update timemory submodule

- use-after-free violation in popen::pclose

* revert some tweaks to sampling-duration test

* Fix env of test-install.sh

* cmake format

* jammy bash

* CPack install for jammy

* formatting workflow action version bump

* Update timemory submodule

- libunwind submodule via timemory sets SOVERSION to 99 to avoid ABI conflicts with v8

* Fix help menu for omnitrace-sample

* Support other boolean forms in test-install.sh

* Update docker files and build-docker.sh

- consolidated cases in build-docker.sh
- support rocm version of 0.0 (no rocm install)
- support rocm v5.3
- updated centos handling

* update opensuse actions/checkout version

* Tweaks to ubuntu-focal testing

- actions/checkout@v3
- use test-install script

* update cpack

- ubuntu 22.04
- rocm 5.3
- rename os matrix field to os-version
- remove CI_ROCM_VERSION (no longer necessary)
- remove default-rocm-version matrix field (no longer necessary)
- CentOS packaging

* fix argparsing and omnitrace-sample tests in install-tests.sh

* focal rocm test install workflow fix

* Fix omnitrace-sample build

* Dockerfile.centos + build-docker.sh updates

* Update actions/upload-artifact version

* Dockerfile.ubuntu: install rocm-device-libs

* Refactor cpack

* fix cpack if quotes

* Dockerfile.ubuntu rocm < 5 installs rocm-dev

* build-release.sh defaults to boost version 1.79.0
2022-10-17 12:54:26 -05:00
Jonathan R. Madsen 2a387f9099 Fix finalization segfaults (#174)
- update timemory submodule with fixes to papi components and signals
update
2022-10-04 00:00:05 -05:00
Jonathan R. Madsen 7d7a8f2c23 Raise default min number of instructions (#173)
- Raise min instructions default to 1024 instead of 64
- Default value of 64 has demonstrated tendency to slow down real-life
applications
- Improved the memory safety during `omnitrace_finalize()`
- new modifications guarantee that when `tim::manager::instance()` on
main thread is destroyed, omnitrace will finalize before
- Improved some warning w/ roctracer
- Improved the search for `ROCP_METRICS` and
`OMNITRACE_ROCPROFILER_LIBRARY`
- disable printing env by default
- Attempted to improve the sampling shutdown
2022-09-30 19:28:43 -05:00
Jonathan R. Madsen 79a8f16646 omnitrace-sample (#169)
- `omnitrace-sample` executable which executes sampling (no
instrumentation)
- fixes bug with OMPT ignoring value of `OMNITRACE_USE_OMPT`
- fixes some issues with sampling duration
- new `OMNITRACE_SAMPLING_INCLUDE_INLINES` configuration variable
- restricts process-sampling to 100 interrupts/sec when inheriting value
from `OMNITRACE_SAMPLING_FREQ`
- `OMNITRACE_PROCESS_SAMPLING_FREQ` still supports up to 1000
interrupts/sec
- fixes bug with colorized log not truly being disabled in all instances
- adds tests for `omnitrace-sample`
- adds tests for sampling duration
- settings ROCP_TOOL_LIB to libomnitrace-dl throws error
  - rocprofiler does not configure correctly when this is done
- Quiet numa_gotcha warnings
- Fixed some shadowed variables
2022-09-30 10:47:07 -05:00
Jonathan R. Madsen 4e3527f0ed Resolve warnings/errors with extra warnings (#171) 2022-09-28 14:28:32 -05:00
Jonathan R. Madsen 8f36620e29 Fix deadlocks during initialization (#167)
- More to come in later commit, below is just tidying some stuff up
  - clang-tidy
  - mpi_gotcha quiet about not finding funcs
  - update to new papi config
  - sampling block_samples / unblock_samples
    - disable calling component's sample functions within sampler
  - release doesn't strip library
  - remove HSA and ROCP env variables from modulefile / setup-env
- preliminary support for LD_PRELOAD usage
- default sampling rate is 300 interrupts / second
- fixes various deadlock issues at startup
2022-09-26 07:52:14 -05:00
Jonathan R. Madsen 90ff7188f8 Crusher hackathon updates (#164)
- improved error handling in dyninst
- improved error handling in omnitrace exe
- new logging facility for omnitrace exe
- improved backtraces
- disable concurrent kernels in rocprofiler
- updates `setup-env.sh` and modulefile
  - set `omnitrace_ROOT`
  - set `HSA_TOOLS_LIB` if roctracer or rocprofiler enabled
  - set `ROCP_TOOL_LIB` if rocprofiler enabled
  - closes #163 
- No longer make setting `HSA_ENABLE_INTERRUPT=0` the default 
  - this has performance implications
- this was set to workaround a bug in ROCR which caused an ioctl call in
ROCm to hang when interrupted. But it was only interrupted when realtime
sampling was enabled since the CPU-clock doesn't increment when waiting
  - This bug should be fixed in ROCm 5.3
- omnitrace no longer activates a realtime sampler by default when
sampling, thus this bug is no longer encountered unless the user
explicitly triggers realtime sampling
2022-09-21 13:58:14 -05:00
Jonathan R. Madsen 15e6e6d979 Fix setup-env + hsa/rocm/ompt serialization + testing + misc (#156)
- Fix setup-env.sh
  - Closes #149 
- omnitrace exe color
- test-install.sh script
- if config variable is updated in config or env, include in generated
config
- metadata for hsa, rocm, and ompt
  - Closes #153 
  - Closes #154
2022-09-13 08:11:48 -05:00
Jonathan R. Madsen 64afa49193 GOTCHA wrappers for NUMA functions (#152)
- GOTCHA wrappers for:
  - mbind
  - migrate_pages
  - move_pages
  - numa_migrate_pages
  - numa_move_pages
  - numa_alloc
  - numa_alloc_local
  - numa_alloc_interleaved
  - numa_alloc_onnode
  - numa_realloc
  - numa_free
- bumped version to 1.6.0
- updated categories
- hardware_counters -> kernel_hardware_counters and
thread_hardware_counters
  - simplified mapping category structs to perfetto category names
- updated timemory submodule
2022-09-12 17:44:27 -05:00
Jonathan R. Madsen 808ea7dfa7 Rework sampling and colorized logs (#140)
## Overview

This is a significant PR which has 3 very notable characteristics:

1. Omnitrace colorizes most of it's logging
2. Completely reworked the sampling 
  - Samples now record the current instruction pointers instead of strings
    - This _dramatically_ decreases the overhead of taking a sample
  - The collection of metrics during a sample are split out into another component, enabling that data collection to be disabled -- which decreases the sampling overhead even further
  - When both `OMNITRACE_SAMPLING_CPUTIME` and `OMNITRACE_SAMPLING_REALTIME` are ON:
    - `OMNITRACE_SAMPLING_CPUTIME_FREQ` and `OMNITRACE_SAMPLING_REALTIME_FREQ` can be used to individually control the sampling frequency 
  - `OMNITRACE_SAMPLING_CPUTIME_DELAY` and `OMNITRACE_SAMPLING_REALTIME_DELAY` can be used to individually control the delay time before starting
  - Now, omnitrace does not start a real-time sampler on the main thread unless `OMNITRACE_SAMPLING_REALTIME` is ON
    - In the future, an `OMNITRACE_SAMPLING_TIDS` (and real-time, cpu-time variants) configuration variable(s) will allow you to select which threads will be sampled
3. Files produced by `omnitrace` exe -- `available-instr.txt`, `instrumented-instr.txt`, etc. -- now no longer has `-instr` suffix and are placed in `instrumentation/` subfolder, i.e. `available-instr.txt` -> instrumentation/available.txt`
  - This helped de-clutter the output folder

Most of the other edits were reorganization (e.g. internal namespace changes), cleanup, and splitting up functionality.

## Bug Fixes

There is a bug fix with respect to the HSA callbacks which disabled sampling on child threads when an HSA API call was made

## Details

- created thread_info struct for mapping different thread IDs
- reorganized file structure significantly
- added categories.hpp, concepts.hpp
- moved around name trait definitions
- moved all omnitrace components into `omnitrace::component` namespace
  - there was a lot of inconsistency b/t using `tim::component` in some places and `omnitrace::component`
  - added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
- OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
- roctracer and critical_trace use same thread pool
- critical_trace functions do not lock anymore bc of thread-local TaskGroup
- added `component::local_category_region` to support using `component::category_region` without explicitly passing in name
- removed `component::omnitrace` (unused)
- migrated KokkosP and OMPT to use `component::local_category_region`
  - removed `component::user_region` as a result
- migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
  - removed `component::functors` as a result
- migrated some ppdefs
- `api::omnitrace` -> `project::omnitrace`
- `api::(...)` -> `category::(...)`
- improved recording the execution time of threads
  - migrated this functionality out of pthread_create_gotcha and into thread_info
- moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
- split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
- sampling.cpp handles setup and post-processing that was formerly in backtrace
- updated logging to use colors
- `OMNITRACE_COLORIZED_LOG` config variable
- updated docs on JSON output from timemory
- instrumentation info in instrumentation subfolder
- added testing for KokkosP entries
- added testing for ompt entries
- add_critical_trace function defined in critical_trace.hpp
- disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
- add comp::page_rss to main bundle
- thread_data supports std::optional instead of std::unique_ptr
- thread_data supports tim::identity<T> to avoid unique_ptr or optional
- tracing::record_thread_start_time()
- tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
- removed anonymous namespace from omnitrace::utility
- sampling backtrace stores instruction pointers instead of strings
- component::category_region updates
  - handle disabled thread state
  - handle finalized state
  - fewer debug messages
  - invoke thread_init()
  - invoke thread_init_sampling()
  - handle push/pop count based on category
  - push/pop count only modified when used
- component::cpu_freq
- components/ensure_storage.hpp
- reworked the pthread_create replacement function
- updated parallel-overhead example to report # of times locked
- OMNITRACE_MAX_UNWIND_DEPTH build option
- update timemory submodule
2022-08-31 01:24:31 -05:00
Jonathan R. Madsen a1dcd1bc4b Static libstdcxx and python (#139)
Support python + static libstdc++
2022-08-28 03:56:13 -05:00
Jonathan R. Madsen 23fb3946c7 Handle --advanced printing for config generation (#137)
Handle --advanced printing for config
2022-08-08 15:28:34 -05:00
Jonathan R. Madsen bb400d5d61 Remove unused funcs + messages for excluding system lib (#133)
- instead of filtering system library opaquely, generate info messages
- remove unused are_file_include_exclude_lists_empty()
- remove unused module_constraint free func
- remove unused routine_counstraint free func
2022-08-08 08:37:51 -05:00
Jonathan R. Madsen ac5ce00561 Update config generation, join fix, sampler (#129)
- update timemory submodule
  - update ring_buffer to remove all dynamic allocation
  - update sampler + sampler allocator to use ring-buffers
  - fix papi_array and papi_vector labels
- fix common::join with char delimiter
- fix config generation without --advanced
2022-08-08 07:03:24 -05:00
Jonathan R. Madsen afa3df8523 Advanced category for configuration options (#125)
Adds advanced category

- advanced category hides less relevant configuration options
- omnitrace-avail has new '--advanced' option which shows these flags
- increase verbosity level to print issue with reading ppid children
- OMNITRACE_ROCTRACER_HSA_ACTIVITY defaults to ON
- OMNITRACE_ROCTRACER_HSA_API defaults to ON
2022-08-03 12:13:00 -05:00
Jonathan R. Madsen b79ce10fee RPATH to rocprofiler_LIBRARY_DIR for ROCm < v5.2 (#126)
* RPATH to rocprofiler_LIBRARY_DIR for ROCm < v5.2

- until v5.2 only librocprofiler64.so was symlinked in /opt/rocm. Thus linker using SOVERSION caused issues finding librocprofiler64.so.1

* Test ROCm w/ CMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF

* INSTALL_RPATH_USE_LINK_PATH for omnitrace exe
2022-08-02 14:19:04 -05:00
Jonathan R. Madsen 97d17a8ef8 Fix RPATH handling (#122)
Fix rpath handling

- remove explicit add to CMAKE_INSTALL_RPATH
- remove overriding INSTALL_RPATH_USE_LINK_PATH for exes
2022-08-01 14:19:19 -05:00
Jonathan R. Madsen 7e31d9f450 ROCm environment fixes + workflow updates (#117)
* Improve dlopen of ROCm libraries + rocprofiler test

- Use PROJECT_BINARY_DIR in tests
- Added rocprofiler test

* Revert OMNITRACE_FORCE_ROCPROFILER_INIT

* omnitrace-avail --all test

* Fix ROCP_METRICS for ROCm 5.2.0

* Fix ROCP_METRICS for ROCm 5.2.0

* Restrict containers workflow to AMDResearch/omnitrace

* Bump version to 1.3.1

* Update cpack workflow

- generate release draft
- upload installers as release assets

* Test rocprofiler w/o roctracer enabled

* Fix formatting

* verbose message
2022-07-27 06:36:52 -05:00
Jonathan R. Madsen 45be03906a RCCL support (#93)
* Initial support for RCCL

* OMNITRACE_USE_RCCLP + sampling tweaks

- also OMNITRACE_SAMPLING_KEEP_INTERNAL option
- minor modifications to sampling to use keep internal option + discard funlockfile

* Update docker and workflows to download RCCL

* Update CPack DEB with rocprofiler dependency

* Rework rccl into library and library/components folder

- add tpls/rccl/rccl/rccl.h

* Fix timemory includes

* rcclp inline definitions when disabled

* Tweaks to ubuntu-focal-external-rocm

- disable ompt
- enable building testing

* Tweaks to ubuntu-focal-external-rocm

- ctest exclude

* Tweak ubuntu-focal.yml

- remove source /.../setup-env.sh, replace with $GITHUB_ENV

* Fix ubuntu-focal-rocm + OMPI + root

* Improved rocm-smi error handling

- Recover from rocm-smi errors
- Disabling rocm-smi after recovering from errors
- Werror in developer mode
- Remove State::DelayedInit
- Add State::Disabled

* formatting

* Fix merge of OMNITRACE_SAMPLING_KEEP_INTERNAL

* Update RCCL include directory

- based on ROCm version we need with <rccl/rccl.h> or <rccl.h>

* RCCL Testing

- updated tests to use configuration files
- many tests generate a configuration file
- tests how have GPU option
- enable ncclCommCount, disable ncclGetVersion
- add testing for RCCLP via rccl-tests
- working directory of tests is PROJECT_BINARY_DIR
- add nccl/rccl functions to get_whole_function_names
- some clang compiler fixes

* Handle RCCL include w/o HIP

* RCCL requires HIP

* Update OMNITRACE_SAMPLING_CPUS for testing

* Update tests/CMakeLists.txt

* Debug settings

* Install MPI even when USE_MPI=OFF

* exclude printf

* skip mpi tests w/o USE_MPI or USE_MPI_HEADERS

* update ubuntu rocm workflow

* Fix configure env step for ubuntu rocm
2022-07-25 12:16:11 -05:00
Jonathan R. Madsen 99da25ea80 exit gotcha + remove DelayedInit state + rocm-smi + cleanup (#110)
* exit gotcha + remove DelayedInit state + cleanup

- exit gotcha which wraps exit, quick_exit, abort
- minor refactor of mpi gotchas
- removed some redundant code in omnitrace_finalized_hidden
- exclude instrumenting functions starting with dlopen and dlsym
- exclude instrumenting exit, quick_exit, and abort functions
- update timemory submodule with support for new gotcha_invoker with (gotcha_data, <function pointer>, args...)

* Improved rocm_smi error handling
2022-07-24 22:09:32 -05:00
Jonathan R. Madsen ea282d9301 Release 1.3.0 preparations (#109)
* v1.3.0

* ROCm 5.2 and extensions tweaks

* Container workflow + miscellaneous updates

* Misc fixes + timemory submodule update

- timemory submodule update for multiple definitions of variant_apply

* Increase timeouts

* Remove obsolete Julia docs and script

- support for rocprofiler makes rocprof merging obsolete

* Fix cpack testing and combine cpack workflows into single script

* Install components + omnitrace tpl exes

- Improved COMPONENT specification for installs
- Install PAPI executables with omnitrace- prefix and hyphens
- Install Perfetto executables with omnitrace- prefix and hyphens

* Update docs on perfetto and papi command-line tools

* remove ubuntu 22.04 from containers workflow

* remove containers workflow running on all pushes

* Fix CI_SCRIPT_ARGS

* Fix PAPI utils install

* Fixed traced test in workflow + removed return char

- validate-perfetto-proto.py had return character

* Fix test-docker-release.sh script to use correct container

* Release build bc RelWtihDebInfo using too much memory
2022-07-23 03:02:31 -05:00
Jonathan R. Madsen d27f22ea37 Sampling use SIGRTMIN + N signals (#104)
* Use SIGRTMIN instead of SIGALRM for sampling

* Config options + fully working SIGRTMIN sampling

- OMNITRACE_SAMPLING_KEEP_INTERNAL config option
- OMNITRACE_PROCESS_SAMPLING_FREQ config option
- OMNITRACE_SAMPLING_REALTIME config option
- OMNITRACE_SAMPLING_CPUTIME config option
- OMNITRACE_SAMPLING_REALTIME_OFFSET config option

* Fix omnitrace-avail-regex-negation test

- OMNITRACE_PROCESS_SAMPLING_FREQ was causing failure
2022-07-22 14:17:27 -05:00
Jonathan R. Madsen c006e542a5 Replaces OMNITRACE_CONDITIONAL_BASIC_PRINT with OMNITRACE_VERBOSE (#97)
Replaces OMNITRACE_CONDITIONAL_BASIC_PRINT with OMNITRACE_VERBOSE and similar where possible
2022-07-21 11:15:26 -05:00
Jonathan R. Madsen d04cbe862e fix omnitrace print-* with libraries (#94)
* fix omnitrace print-* with libraries

* timemory submodule update

* Update workflows to use ./bin/omnitrace instead of ./omnitrace

* cmake format

* update timemory submodule

- fix ODR violations in utility/procfs

* cmake updates

- uniform find_package for all ROCm-based libraries

* tweak transpose example

- throw exception instead of std::exit

* Inspect cmdv name before assuming not exe

- some ELF execs "think" they are libraries so only assume rewrite + simulate + all-functions if filename looks like library
- adds some test for --print-available -- <library>

* Fix _has_lib_prefix when command is < 3

* Updates and reverts to omnitrace exe

- update module_function operator< and operator==
- add function_signature operator<
- refactor module_function ctor
- revert some previous changes w.r.t. simulate and include_unninstr

* Fix source/bin/tests to use same output dir as tests

* cmake format

* Segfault mitigation + refactor + modify function iteration

- refactor module_function ctor to avoid segfaults
- string_t -> std::string
- replace std::string with std::string_view in some places
- get_name(module_t*)
- get_name(procedure_t*)
- disable using both app_modules and app_functions
- new option: --parse-all-modules to iterate over app_modules
- removed some unused code w.r.t. debug info

* Disable module_function address range for uninstrumentable functions

* Disable module_function address range for uninstrumentable functions

* Refactored getting file/line info and init/fini

- use dyninst insertInitCallback and insertFiniCallback if main not found
- fixed all issues with segmentation faults in --simulate --all-functions

* revert changes to Findrocprofiler.cmake
2022-07-21 01:15:41 -05:00
Jonathan R. Madsen 4208b5654c GPU HW Counters via rocprofiler (#84)
* Initial support for GPU hardware counters

* Update find modules for roctracer and rocprofiler

- /opt/rocm/{rocprofiler,roctracer} path is deprecated so tweak search procedure

* Improve ConfigCPack for MPI

* Update rocprofiler

- rocm_metrics()
- minor cleanup

* Update rocm find modules

* declare rocm_metrics + call in omnitrace-avail

* relocate omnitrace-launch-compiler

* REALPATH and find_modules

* Examples cmake (may drop)

* omnitrace-avail

- hw_counter categories
- init rocm

* setenv updates for rocprofiler in library.cpp and dl.cpp

* get_rocm_events config

* gpu::hip_device_count()

* rocm_metrics returns hardware_counters::info

* - relocated library/components/roctracer_callbacks.* to library/roctracer.*
- relocated library/components/rocprofiler.* to library/rocprofiler.*
- cleaned up rocprofiler.hpp
- added perfetto output of rocprofiler
- added timemory output of rocprofiler
- renamed omni.roctracer thread to roctracer.hip
- added roctracer.hsa thread name
- updated timemory submodule to support std::variant
- updated timemory submodule to support = in config value
- updated timemory submodule to support standalone storage
- updated timemory submodule to support new hw counter apis
- updated timemory submodule to prevent label/description caching in data_tracker

* update omnitrace-avail info_type generation

* Update timemory submodule

* rocprofiler component

* cmake formatting

* omnitrace-avail handle no GPUs

- Add -c command-line option for --categories
- support verbosity

* hsa_rsrc_factory throws exceptions

- throw exceptions to avoid aborting on HSA_STATUS_ERROR_NOT_INITIALIZED when advantageous
- removed duplicate specialization of is_available for component::rocprofiler

* rocprofiler symbols for when disabled

* Fix warning in omnitrace-avail

- std::stringstream from initializer list would use explicit constructor

* Fix finalization after settings are deleted

* Reorganized rocprofiler source

* Updated formatting

* Miscellaneous tweaks

- added using statements from timemory
- tweaked the main and thread bundle names
- fixed timemory header includes
2022-07-17 21:52:09 -05:00
Jonathan R. Madsen f6b6be3ddc Fix empty OMNITRACE_CONFIG_FILE and suppressing config and parsing (#81)
* Fix empty OMNITRACE_CONFIG_FILE and suppressing config and parsing
2022-07-11 18:46:12 -05:00
Jonathan R. Madsen fadcfa36da Rework submodule installation (#70)
* Rework submodule installation

- use add_subdirectory(... EXCLUDE_FROM_ALL) + explicit installation of deps
- install all library deps to lib/omnitrace
- internal builds of dyninst use libomnitrace-rt for binary rewriting
- support libdyninstAPI_RT not in LD_LIBRARY_PATH when dyninst built internally

* Update ubuntu-focal to test full dyninst install

* Use RelWithDebInfo because Dyninst segfaults with MinSizeRel

* Fix ubuntu-focal.yml install step
2022-06-28 16:32:07 -05:00
Jonathan R. Madsen 1877ebf47b omnitrace-avail generate config (#69)
* Config updates

- See PR #69 for details

- change type of OMNITRACE_DL_VERBOSE
- add "deprecated" category to OMNITRACE_ROCM_SMI_DEVICES
- reduce size of perfetto shared memory size hint
- deprecate OMNITRACE_OUTPUT_FILE in favor of OMNITRACE_PERFETTO_FILE
- set papi event choices
- read config file after reading command line
- fix update of OMNITRACE_DL_VERBOSE
- mark several settings as hidden
- timemory update support hidden attribute for settings
- rework get_perfetto_output_filename()
- Hide settings from not available backends

* Rework omnitrace-avail to support dumping configurations

* Overwrite query, tests, output flag

- Support using -O flag when dumping config
- Support checking before overwriting existing config
- Support --force to overwrite existing config
- Fix get_component_info not including omnitrace components
- Testing for dumping config

* Update documentation on omnitrace-avail

* Fix issue with timemory format + "/__w/"

* Update output prefix keys docs

* Rename --dump-config to --generate-config

* Hide MPI related options

- OMNITRACE_PERFETTO_COMBINE_TRACES and OMNITRACE_COLLAPSE_PROCESSES are hidden w/o MPI support
2022-06-28 01:36:04 -05:00
Jonathan R. Madsen 27e4e82376 Deprecate omnitrace use thread sampling (#68)
* Deprecate OMNITRACE_USE_THREAD_SAMPLING

* Reworked config based on OMNITRACE_MODE

- config::set_default_setting_value(...)
- config::get_mode() is now dynamically deduced
- moved tweaking defaults from library.cpp to config::configure_mode_settings(...)
- timemory submodule update fixing vsetting issue

* runtime.md update

* revert accidental lambda name change

* Reintroduce (deprecated) OMNITRACE_ROCM_SMI_DEVICES

- add handle_deprecated_setting(...) for this deprecated setting
2022-06-24 15:03:15 -05:00
Jonathan R. Madsen fe3d15a6c9 Fix attaching to running process, i.e. omnitrace -p <PID> (#60)
* Fix attaching to a process

- e.g. omnitrace -p <PID>

* Update /proc/sys/kernel/yama/ptrace_scope in CI

* Query /proc/sys/kernel/yama/ptrace_scope

* Use AUTHOR_WARNING instead of WARNING for ptrace_scope
2022-06-21 00:28:20 -05:00
Jonathan R. Madsen 8837b744ca Fixes excluded-instr output, fini functions, tweaks MPI (#50)
- fixes population of excluded_module_functions
- omnitrace-compile-definitions have OMNITRACE_USE_MPI and OMNITRACE_USE_MPI_HEADERS
- Do not enable mpi support if no full or partial MPI support
- New option --all-functions
2022-06-17 15:52:58 -05:00
Jonathan R. Madsen a640fbdb29 Fix loop-level instrumentation + more (#32)
- fix loop-level instrumentation
- support loop instrumentation w/o debug symbols via loop number
- improve module_function messages
- serialize num_basic_blocks
- serialize num_outer_loops
- serialize is_num_instructions_constrained
- serialize is_loop_num_instructions_constrained
- updated transpose example to use uniform_int_distribution
- added transpose loop test
- added fail regexes for tests which enable loop instrumentation
- use module->getFullName in get_loop_file_line_info
- use module->getFullName in get_func_file_line_info
- use module->getFullName in get_basic_block_file_line_info
2022-06-10 06:57:50 -05:00
Jonathan R. Madsen f93ddc1ee5 Fix category regex + new features (#25)
* Fix category regex + new features

- fixes issue with -R option
- Supports --csv option
- Supports --csv-separator option
- Signal handler to dump logs
- Tweak to component id strings display
- Support regex negation

* Tweak PASS_REGEX for new tests
2022-06-06 23:23:40 -05:00
Jonathan R. Madsen 8b97c70df8 Standalone build examples + testing workflow updates (#15)
* Update examples to support standalone builds

* Tweak to ubuntu-focal-external workflow

- disable PAPI

* ubuntu focal external workflow update

- GCC 11
- Test static libgcc + static libstdcxx + strip
- ubuntu-toolchain-r/test

* Improve build-release.sh

- command line args for lto, strip, perfetto-tools,
   static-libgcc, static-libstdcxx, hidden-visibility,
   max-threads, parallel

* Update VERSION to 1.0.1

* Fixes to LTO build

* Updates to ubuntu-focal-external workflow

* build-release.sh update

- enable static libstdcxx by default

* disable python + static libstdcxx

* ubuntu-focal-external updates

* build-release.sh disable static libstdcxx by default

* cmake-format
2022-05-31 01:51:18 -05:00
Jonathan R. Madsen 6491ce7808 omnitrace function exclude updates (#5)
- These functions cause weird call-stack behavior when instrumented
    - rocr::image::ImageRuntime::CreateImageManager
    - rocr::AMD::GpuAgent::GetInfo
    - rocr::HSA::hsa_agent_get_info
- These functions cause out-of-order call-stacks when KokkosP is enabled
    - Kokkos::Profiling::*
2022-05-24 19:26:12 -05:00
Jonathan R. Madsen 353e8eeb69 Critical trace updates (#6)
* critical trace updates

- better handling of OMNITRACE_USE_PERFETTO in omnitrace-critical-trace exe
- changed some data types in `critical_trace::entry`
- added device ids to critical trace entries
- added process ids to critical trace entries
- added packing to critical trace entries

* Update timemory submodule
2022-05-24 19:25:54 -05:00
Jonathan R. Madsen 5b2c27cccd Minor updates for transpose, timemory submodule, roctracer, and omnitrace exe (#4)
* transpose usage message

* timemory submodule update

* roctracer updates

- Changes to verbosity of roctracer::shutdown
- protect_flush_activity prevents deadlock when error in callback

* Removed linking to timemory-cxx in omnitrace

- omnitrace exe does not link to `timemory-cxx` target
2022-05-24 18:35:33 -05:00
Jonathan R. Madsen b208047741 Support for tracing mutex locking (#52)
* Parallel overhead example with locks

* Support tracing mutex locking + more

- support wrapping pthread_mutex_lock
- support wrapping pthread_mutex_unlock
- support wrapping pthread_mutex_trylock
- get_perfetto_combined_traces setting
- OMNITRACE_TRACE_THREAD_LOCKS option
- ThreadState
- critical trace includes queue id
- enabled/disabled settings in timemory
- fix OMNITRACE_TIMEMORY_COMPONENTS
- fix reading config
- fix setting categories
- applied ThreadState::Internal in various places
- utility::get_filled_array
- utility::get_reserved_vector
- utility::get_thread_index
- fork_gotcha messages about forks
- split out some pthread_gotcha functionality into pthread_create_gotcha
- handle queue id in roctracer callbacks

* Update timemory and PTL submodules

* Misc CMake updates

- Includes fix to omnitrace-static-lib{gcc,stdcxx}

* Misc cleanup to pthread_mutex_gotcha and backtrace

* Fix to duplicate field in module_function json

* Improvement to debug messages

* omnitrace-dl and common improvements

- tweak to delimit
- common::ignore message
- common::join quoting of strings
- omnitrace_set_env ignores if inited and active
- omnitrace_set_mpi ignores if inited and active

* nsync for transpose example

* Fix to thread_deleter<void> functor invoke

* Fix thread state and HIP stream enums
2022-05-08 04:40:10 -05:00
Jonathan R. Madsen 134b33320d Code coverage updates (#50)
* code coverage updates

- python support
- refactored source

* remove code_coverage::operator+ and operator+=

* impl/coverage.hpp
2022-05-08 01:40:56 -05:00
Jonathan R. Madsen 29220cba58 GOTCHA + Kokkos + tasking + more (#47)
* GOTCHA + Kokkos + tasking + more

- update gotcha with fix for dlsym(RTLD_NEXT, ...)
- support for standalone KOKKOS_PROFILE_LIBRARY
- remove extra flags for omnitrace-user
- roctracer and critical_trace namespaces in tasking
- generic tasking functions, e.g. join(), shutdown(), etc.
- omnitrace_init_tooling_hidden in api.hpp
- ompt.cpp uses OMNITRACE_USE_OMPT
- kokkosp uses user_region instead of omnitrace component
- re-enable recycling thread ids
- more generic _{push,pop}_perfetto functors
- fix for thread_data::instance(construct_on_init, ...)
- fix for omnitrace-headers interface target
- omnitrace_watch_for_change
2022-04-26 22:08:51 -05:00
Jonathan R. Madsen 791375bb24 Code Coverage Support (#46)
* Code-coverage support

* Examples update

- code-coverage example
- tweak transpose and parallel-overhead

* Coverage output + testing

- config::get_setting value(...)
- REGULAR_EXPRESSION -> REGEX in cmake func args
- coverage.hpp header
- coverage JSON
- coverage tests

* cmake formatting

* Library instrumentation w/o main + more

- fixed library instrumentation w/o main
- use TIMEMORY_PROJECT_NAME in output messages
- removed '--driver' option from omnitrace exe
- support coverage in trace mode
- OMNITRACE_KOKKOS_KERNEL_LOGGER
- support multiple calls to omnitrace_set_env after init if already called
- support multiple calls to omnitrace_set_mpi after init if same args
- support multiple calls to omnitrace_init if same mode
- unique_ptr_t for thread_data which calls finalize when thread_data is destroyed
- tweaked openmp tests
- improved finalization

* Replace CI --output-on-failure with -V

* Fix to OMNITRACE_DL_INVOKE

* omnitrace-exe and testing updates

- omnitrace::omnitrace-timemory interface library
- support for configs in omnitrace exe
- print-{available,instrumented,...} opts no longer exit w/o --simulate
- all tests apply --print-instrumented functions
- tweaked coverage tests
- print-* options print instructions not address range

* Remove OMNITRACE_DEBUG_FINALIZE=ON from CI

* Python cmake tweaks

* Tweak test ordering

* Upload CI artifacts if fail or success

* CI Python tweaks

- Use OMNITRACE_PYTHON_PREFIX and OMNITRACE_PYTHON_ENVS

* CI ELFULTILS_DOWNLOAD_VERSION

* test tweaks

- labels and more coverage tests

* tweak to omnitrace --config handling

* Update module/function constraint handling + PP

- tweak pre-processor definition handling
- removed free-standing module_constraint
- remove free-standing routine_constraint
- remove module_name.find("omnitrace") module constraint
- fully handle the output path of omnitrace *-instr files
- get_use_code_coverage config option
- print-coverage option
- coverage_module_functions

* use github.job not github.name

* Re-enable HSA_ENABLE_INTERRUPT

- remove coverage address report
2022-04-25 17:00:52 -05:00
Jonathan R. Madsen 77703ef4f1 Miscellaneous fixes (#44)
* Miscellaneous fixes

- handle HSA OnLoad called during omnitrace-avail
- disable setting HSA_ENABLE_INTERRUPT when roctracer not used
- sampler max verbose
- fix roctracer get_clock_skew
- cleanup roctracer debug output
- update timemory submodule with fence
- simplify min-instructions vs. min-address-range specification
- exclude cxx regex updates
- disable HSA_TOOLS_LIB and HSA_ENABLE_INTERRUPT when no roctracer

* git safe.directory
2022-04-21 22:59:50 -05:00