Граф коммитов

42 Коммитов

Автор SHA1 Сообщение Дата
Jonathan R. Madsen 1688a027d8 Add RedHat CI and release packaging (#251)
- additional miscellaneous tweaks to workflows and docker scripts, e.g. install perfetto python bindings
- improves the stability of MPI finalization
- reduces some debug messages within timemory when `OMNITRACE_DEBUG=ON`
- fixes issue found in RHEL where libunwind is using mutex and omnitrace was not treating this as an internal mutex call
  - this may have been affecting the causal profiling slightly (tests seem a bit more stable now)
- fix data race in timemory

* Add RedHat CI and release packaging

- additional miscellaneous tweaks to workflows and docker scripts, e.g. install perfetto python bindings

* Fix URL for ROCm packages in redhat workflow

* Fix dnf --enable-repo for ROCm perl packages

* Dockerfile.rhel and redhat.yml updates

- Fix dnf repo for ROCm PERL packages
- Disable python in CI (interpreter segfaults)
- Exclude parallel-overhead-locks tests due to inclusion of internal locks
  - This needs to be remedied in the future

* Exclude _dl_relocate_static_pie from instrumentation

* Testing updates

- OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF for parallel-overhead-locks

* Fix redhat workflow

* redhat.yml update

- remove if condition on config/build/test step

* Update timemory submodule

- tweaks to verbosity messages

* Set thread state before unw_step

- on Redhat, unw_step calls mutex

* Update timemory submodule

- verbosity changes
- gotcha uses spin_lock/spin_mutex

* Remove using gsplit-dwarf unless OMNITRACE_BUILD_NUMBER > 2

* Re-enable parallel-overhead-locks tests in redhat workflow

* Always disable timemory manager metadata auto output

* testing updates

- tweak parallel-overhead-locks-timemory to higher instruction count min
- OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF for parallel-overhead-locks-perfetto

* Update timemory submodule

- quiet realpath queries

* omnitrace exe updates

- detect text files
- improved bin/lib locating

* cmake format

* test-install.sh and redhat workflow updates

- handle testing when ls is script
- re-enable python testing on redhat workflow
- invoke test-install.sh in redhat workflow

* Misc guards for finalization

* omnitrace-exe, testing updates

- test-install.sh: LS_EXEC -> LS_NAME
- handle /usr/bin/ls being script in source/bin/tests
- improve locating the binary

* Fix mpi_gotcha compile error

* omnitrace-exe updates

- improve file locating

* formatting

* Misc fixes

- remove -static-libstdc++ for RHEL packaging (rocky-linux doesn't distribute static lib)

* omnitrace-exe paths

* Replace realpath with absolute

- using absolute path to symlink fixes issues with locating libdyninstAPI_RT at runtime

* omnitrace exe updates

- judicious use of realpath

* Update timemory submodule

- fix update main hash ids/aliases data race in merge

* bin tests update

- change working directory of omnitrace-exe-simulate-lib-basename

* omnitrace exe updates

- Update resolved exe/lib messaging

* bin tests update

- change working directory of omnitrace-exe-simulate-lib-basename
2023-03-07 06:04:19 -06:00
Jonathan R. Madsen 846301bcaf Address and thread sanitizer fixes (#250)
* Address and thread sanitizer fixes

- Fix compilation with clang
- Tweak perfetto copy to build tree
- Added suppression files to scripts
- fix LD_PRELOAD support in omnitrace-causal and omnitrace-sample
- use spin_mutex and spin_lock from timemory instead of atomic_mutex and atomic_lock
- state uses atomic
- fix some memory leaks
- tweak testing
  - mpi tests do not use preload
  - increase timeout when using sanitizers
  - add env LD_PRELOAD when using sanitizers

* Tweak perfetto build

* Update timemory submodule

* Update version to 1.8.1

* Update omnitrace-leak.supp

* Update timemory submodule

- fixed spin_mutex implementation

* Remove previously added addr_space->allowTraps(instr_traps)

- this appears to cause errors during binary rewrite

* causal testing updates

- relaxed causal validation on CI systems (to account for hyperthreading decreasing prediction)
- improved impact calculation
- other general improvements to validate-causal-json.py

* Improve fork handling for perfetto

- numerous updates changing perfetto:: to ::perfetto::
- added perfetto_fwd.hpp

* Updated fork example

- user API for validation that stopping/starting perfetto is valid

* Misc fixes to perfetto + fork support

- tweak regions in fork example
- handle disabling tmp files
- get rid of stop/start with perfetto before/after fork
- fixed sampling support during fork
- tweak env of fork test

* Fix find_package in build-tree

* Fix buildtree export

* Fix buildtree export

* Restructured ConfigInstall before adding examples

* Guard against creating tmp file in sampling when disabled

* Fix buildtree package

* formatting

* exit handlers on child processes

- quick exit to avoid perfetto cleanup

* Further tweaking of causal tests for reliability

- enable PROCESSOR_AFFINITY
- decrease to 5 iterations

* Further tweaking of causal tests for reliability

- disable PROCESSOR_AFFINITY for fast func e2e tests
- enabling affinity results in (valid) speedup predictions greater than zero

* Fixes to fork handling

- use pthread_atfork for redundancy if fork_gotcha fails

* cmake formatting

* Fix fork init settings + install components

- remove dl from PROJECT_BUILD_TARGETS

* Testing tweaks

- fix mpi-binary-rewrite-run regex when OMNITRACE_VERBOSE set > 1 in env
- increase causal e2e iterations to 8

* Fix "Test User API"

- test-find-package.sh included dl component

* Further tweaks to causal validation

- further considerations of variance
2023-02-27 12:09:03 -06:00
Jonathan R. Madsen 32b15fe7b7 Handle fork in target application (#191)
* Always print PID in log messages

* omnitrace-dl updates

- omnitrace_preload does not call omnitrace_init or omnitrace_init_tooling
- omnitrace_preload will call omnitrace_set_mpi if OMNITRACE_USE_MPI
  or OMNITRACE_USE_MPIP in the env is true but not call it otherwise
  because doing so either overrides OMNITRACE_USE_PID (when true) or
  disable mpip from initialization (when false) and the MPI
  init can be caught later and override OMNITRACE_USE_PID

* config updates

- set_setting_value sets user update type
- remove volatile from get_settings_configured
- don't override settings::default_process_suffix
- don't kill process in omnitrace_exit_action
- set_state ignores updating state if >= State::Finalized

* Handle state > State::Finalized

* fork gotcha updates

- unsets LD_PRELOAD
- sets OMNITRACE_ROOT_PROCESS
- sets OMNITRACE_CHILD_PROCESS

* libomnitrace library.cpp updates

- basic_bundle for fini metrics
- handle finalization from child process

* sampling updates

- sampling::shutdown handles when child process

* Add example and test using fork

* Update run-ci script to support not submitting

* Tweak test envs

* Update build flags when codecov enabled

* remove unnecessary includes of sampling header

* Replace mpi copy/fini static lambda with free-funcs

* Update codecov job

* Fix OMPT segfaults after finalization

* Miscellaneous updates after rebase

* fixes for causal profiling

* revert some run-ci.sh changes

* Disable storing env in sampling::shutdown

* formatting fix

* Update timemory submodule

- fixed occasional synchronization issues with allocator offloading
- exclude protozero:: from internal samples

* improve root/child process detection

- avoid omnitrace_finalize in MPI when child process
- revert some testing tweaks
2023-02-08 01:31:38 -06:00
Jonathan R. Madsen 8feb6bf8b6 Global trace delay and duration (#235)
- The primary feature of this PR is the **addition of support for scoping the collection of tracing/profiling data into one or more time-based windows**
  - Closes #222 
  - Closes #207
  - Support for a real-clock time delay and/or a duration for tracing/profiling was added, *resembling the support for this feature during sampling and process-sampling*
  - However, above paradigm was enhanced for tracing 
    - Instead of one delay and/or one duration based on real time, ***tracing supports periodic and varying delays and durations and these delay+duration sets can be controlled with different clocks***  
    - At some point, this capability will be extended to sampling and process-sampling
- A secondary feature of this PR are the improvements to the handling of categories (by-product of the primary feature)
  - For example, previously setting `OMNITRACE_ENABLE_CATEGORIES` to a specific set of categories only eliminated the disabled categories from the perfetto trace, now these are applied to timemory profiles too
  - A new configuration variable `OMNITRACE_DISABLE_CATEGORIES` was added for when disabling only a handful of categories is easier
- There are quite a few miscellaneous modifications which pollute this PR a bit

## Multiple Tracing Windows

As noted above, tracing now supports specifying multiple delays and durations _and_ with different clocks. Consider the configuration below with two entries in the format `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>`:

```console
OMNITRACE_TRACE_PERIODS = 0.5:1.0:2:realtime    10.0:5.0:3:cputime
```

The above configuration defines:
1. `0.5:1.0:2:realtime`
  - A delay of 0.5 seconds (real-time)
  - Followed by a data collection duration of 1 second (real-time)
  - This delay + duration is repeated 2x
  - Summary: tracing data is collected for 2 out of the first 3 seconds of the application's execution
2. `10.0:5.0:3:cputime`
  - A delay of 10 seconds (process _CPU-time_)
  - Followed by a data collection duration of 5 seconds (process _CPU-time_)
  - This delay + duration is repeated 3x
  - Summary: tracing data is collected for a total of 15 seconds of process CPU-time in the ensuing 75 seconds of CPU-time during the application execution. 
    - Note: the elapsed CPU-time is the aggregate of the CPU-time consumed by all the threads in the process and should be scaled accordingly, e.g., 4 threads running constantly for 1 second of real-time is ~4 seconds of CPU time. 

## `omnitrace-sample` Changes

Formerly, `--wait` and `--duration` command-line options only applied to sampling delay and duration. The value of these options are now applied to the tracing delay and duration. To retain the ability to control sampling delay/duration without setting tracing delay/duration or vice versa, `--sampling-wait`, `--sampling-duration`, `--trace-wait`, and `--trace-duration` options were added. `omnitrace-sample` also has new options for most of the new configuration options detailed below.

## New configuration options

| Option | Description |
| ------- | ----------- |
| `OMNITRACE_DISABLE_CATEGORIES` | inverse behavior from `OMNITRACE_ENABLE_CATEGORIES` -- populates list of all available categories and then removes the specified ones. |
| `OMNITRACE_TRACE_DELAY` | Single floating-point number specifying time to wait before starting data collection. Analagous to `OMNITRACE_SAMPLING_DELAY` and `OMNITRACE_PROCESS_SAMPLING_DELAY` |
| `OMNITRACE_TRACE_DURATION` | Single floating-point number specifying data collection duration. Analagous to `OMNITRACE_SAMPLING_DURATION` and `OMNITRACE_PROCESS_SAMPLING_DURATION` |
| `OMNITRACE_TRACE_PERIOD_CLOCK_ID` | Sets the default clock-type for tracing delay/duration. Always applied to above two options, can be overridden in below option. Accepts `CLOCK_REALTIME`, `CLOCK_MONOTONIC`, `CLOCK_PROCESS_CPUTIME_ID`, `CLOCK_MONOTONIC_RAW`, `CLOCK_REALTIME_COARSE`, `CLOCK_MONOTONIC_COARSE`, `CLOCK_BOOTTIME`. See `man 2 clock_gettime` for details on differences. |
| `OMNITRACE_TRACE_PERIODS` | More powerful version for specifying delay + duration. Supports formats: `<DELAY>`, `<DELAY>:<DURATION>`, `<DELAY>:<DURATION>:<REPEAT>`, and `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_ID>`.  |

 ## Miscellaneous Changes

- Expanded `critical_trace_categories_t` to include tracing data from MPI, pthread, HIP, HSA, RCCL, NUMA, and Python.
- Added categories `thread_wall_time` and `thread_cpu_time` (derived from sampling)
- Read DWARF info for breakpoints
- Relocated some source code
  - Reason: necessary to make `libomnitrace` a bit more modular. Eventually, a large chunk will be separated into `libomnitrace-core`, `libomnitrace-binary`, etc. in order to facilitate re-usability
  - Relocated some functionality from `runtime.cpp` to `config.cpp`
  - Relocated code using rocm-smi library to query number of devices to `gpu.cpp` (where the code for using HIP to query number of devices is)
  - Relocated code for perfetto config and perfetto session out of tracing namespace to reside with other perfetto code
- `OMNITRACE_COLORIZED_LOG` configuration option renamed to `OMNITRACE_MONOCHROME`
  - Backwards compatibility via a deprecated option was not retained here since the logic changed (i.e. true in former means false in latter)
- Replaced `TIMEMORY_DEFAULT_OBJECT` macro with `OMNITRACE_DEFAULT_OBJECT` macro 
- Updated some code in roctracer to use `component::category_region` instead of explicitly using `tracing::` functions
- Updated `backtrace_metrics` to better support controlling their presence in the traces/profiles via categories
- Added support for `--print` in `validate-timemory-json.py`
- Generic `OMNITRACE_ADD_VALIDATION_TEST` CMake function

## Git Log

* OMNITRACE_DEFAULT_OBJECT

- replace TIMEMORY_DEFAULT_OBJECT with TIMEMORY_DEFAULT_OBJECT

* trace-time-window example + tests

- adds cmake OMNITRACE_ADD_VALIDATION_TEST function for testing
- validate-timemory-json.py now supports printing (-p)
- update to OMNITRACE_STRIP_TARGET

* Update timemory submodule

- detailed backtrace print /proc/<PID>/maps
- operation::push_node verbosity change
- storage::insert_hierarchy use emplace + at instead of operator[]
- concepts::is_type_listing
- argparse updates for start/end group
- argparse color fixes

* perfetto updates

- Remove OMNITRACE_CUSTOM_DATA_SOURCE CMake option
- move tracing::get_perfetto_config and tracing::get_perfetto_session to perfetto.cpp

* config and runtime updates

- OMNITRACE_DISABLE_CATEGORIES option
  - get_enabled_categories() + get_disabled_categories()
  - config impl handles populating them
- OMNITRACE_TRACE_DELAY option
- OMNITRACE_TRACE_DURATION option
- OMNITRACE_TRACE_PERIODS option
- {get,set}_signal_handler
  - removes config.cpp link dependency for omnitrace_finalize
- get_realtime_signal() + get_cputime_signal() + get_sampling_signals()
  - moved from runtime.cpp to config.cpp

* utility::convert

- helper function for converting string to a type

* pthread_create_gotcha + thread_info updates

- thread_index_data::as_string()
- tweak printing info about new thread / exited thread

* binary updates

- get_binary_info has arg to disable dwarf parsing
- binary_info contains vector of breakpoint addresses
- binary_info:filename() function
- binary::get_linked_path
- binary::get_link_map has args for dlopen mode
- symbol::read_dwarf -> symbol::read_dwarf_entries
- symbol::read_dwarf_breakpoints

* library updates + categories impl

- implement config::set_signal_handler
- categories.cpp for handling trace delays
  - implement trace delay/duration/periods

* concepts + debug + defines

- tuple_element in concepts
- removed runtime header from debug header
- OMNITRACE_DEFAULT_COPY_MOVE

* gpu + rocm_smi

- moved rsmi_num_monitor_devices call to gpu.cpp
  - gpu::rsmi_device_count()

* roctracer updates

- roctracer_bundle_t -> roctracer_hip_bundle_t
- use category_region instead of explicit tracing push/pop calls

* sampling + backtrace_metrics

- rework backtrace_metrics to support categories

* tracing updates

- category stack counters (i.e. push vs. pop counter) for profiling and tracing
- push_timemory and pop_timemory accept string_view instead of const char*
- tweaked the pop_timemory hash search
- {push,pop}_perfetto theoretically supports same invocations as for {push,pop}_perfetto_ts and {push,pop}_perfetto_track
- mark_perfetto, mark_perfetto_ts, mark_perfetto_track

* category_region update

- expanded the critical trace categories
- use category_push_disabled
- use category_pop_disabled
- use category_mark_disabled

* constraint implementation

- This provides generic functionality for constraining data collection within a windows of time.
 - E.g., delay, delay + duration, (delay + duration) * nrepeat

* COLORIZED_LOG -> MONOCHROME

* constraint + omnitrace-causal + omnitrace-sample updates

- support for using different clock IDs for constraints
- OMNITRACE_TRACE_PERIOD_CLOCK_ID option
- tweak to trace-time-window example
- tweak to trace-time-window tests

* Fix formatting

* Update time-window tests

- Fix detection of validation support for perfetto
- Using the --caller-include feature + runtime instrumentation on Ubuntu 18.04 and OpenSUSE 15.2 results in a segfault in the internals of Dyninst.
  - For now, mark that these tests will fail
  - Later, determine if updating Dyninst submodule fixes this problem

* Fix OMNITRACE_OUTPUT_PATH for all tests

- Provide absolute path instead of relative

* Tweak lambda for checking whether HW counters are enabled

- causing strange build errors on older GCC compilers

* Update dyninst submodule

- fix issues with using --caller-include for Ubuntu 18.04, OpenSUSE 15.x

* cmake formatting

* fix sampling compiler issue for GCC 8

* Tweak thread create message

* Increase causal validation iterations
2023-02-03 14:10:42 -06:00
Jonathan R. Madsen 2096f1f0c2 Remove temporary files (#223)
* config updates

- remove temporary file when destroyed
- extend parse_numeric_range: support vector, increment, better delimit
- git version info in omnitrace banner
- use OMNITRACE_BASIC_VERBOSE instead of OMNITRACE_VERBOSE

* Fix array-bounds warning in openmp lu.cpp

* Bump version to 1.7.4

* Update run-ci.py

- options to repeat
  - until-pass (default: 3)
  - until-fail (default: None)
  - after-timeout (default: 2)
2023-01-10 01:06:12 -06:00
Jonathan R. Madsen 1f818054ce omnitrace-install.py (#221)
- omnitrace-install.py will be uploaded as a release asset
- script simplifies selecting the correct installer script
2022-12-16 05:31:40 -06:00
Jonathan R. Madsen 589a729702 roctracer: device and stream tracks (#209)
* roctracer: use multiple tracks for HIP streams

Use different perfetto tracks for each stream, and set the name of
these tracks to the stream pointer values. Setting the name like this
matches the args in the API traces.
This fixes overlapping work on multiple streams appearing as a call
stack.

* Fix -pedantic

* Run clang-format

* Add option to disable per stream tracks in perfetto

* Updated scheme for roctracer activity + general roctracer fixes

- Per-device tracks
- Handle HSA OPS in ROCm 5.3
  - the changes in ROCm 5.3 were causing HSA ops to get discarded
- Default for OMNITRACE_ROCTRACER_DISCARD_INVALID is now zero
  - i.e. default behavior is to flip beg_ns and end_ns when beg_ns > end_ns

* Flush perfetto at end of hip_activity_callback

- fixes unterminated regions

* GitHub Actions and run-ci script updates

- improve reliability

* Set OMNITRACE_TMPDIR in testing

- files in /tmp get occasionally deleted during CI

Co-authored-by: Gergely Meszaros <gergely@streamhpc.com>
2022-11-16 15:57:27 -06:00
Jonathan R. Madsen 2f16e2ecb1 Optional perfetto annotations (#206)
* Misc tweaks

- C API function print with warning colors
- split region/trace start/stop functions into regions.cpp file

* Config option for disabling perfetto annotations

* Missing checks in roctracer.cpp and sampling.cpp

* Verbose makefile in CI

* run-ci uses -VV

* Fix gcc-7 maybe-uninitialized warning

* Fix push/pop perfetto

- moving perfetto::EventContext was causing errors
2022-11-16 09:48:15 -06:00
Jonathan R. Madsen e1102a8ba4 CI and testing updates (#203)
* Python implementation of run-ci.sh

* Container workflow update

- retry failed container build to combat network failures

* cpack workflow update

- retry failed base container build to combat network failures

* General CI workflow updates

- retry failed "Install packages" step to combat network failures

* Miscellanous linting fixes

* Formatting workflow update

- improve regex for source formatting

* format user.h

* Add new omnitrace-avail tests

* Make run-ci.py executable

* workflow retry fix

- timeout_seconds -> retry_wait_seconds

* Fix cmake formatting glob

* source formatting

* Handle PRs in run-ci.py

* Specify timeout_minutes in retry steps

* Remove remaining --cmake-args from workflows

* CI warnings about using MPICH headers

* Remove text=True from run-ci.py

- not capturing stdout/sterr so unnecessary

* Fix OpenSUSE step label

* Update omnitrace-avail-write-config tests

- use TWD (Test Working Directory) instead of PWD since PWD might not be build directory

* paths-ignore + workflow_dispatch
2022-11-13 10:42:14 -06:00
Jonathan R. Madsen f147670a7a Various optimizations (#192)
* CDash name prefix {{ repo_owner }}-{{ ref_name }}

- remove /merge from CI name

* disable using BFD when sampling_include_inlines is OFF

- this consumes a lot of memory

* Improve finalization of rocprofiler

* update timemory submodule

- disable OMPT thread begin/end callbacks
- support hierarchies in signal handlers
- update operation::pop_node debugging
- settings_update_type + setting_supported_data_types
- fixed parsing args in timemory_init

* Improve timemory build time

* Remove kokkosp restrictions for perfetto

* omnitrace exe signal handler update

- configure signal handlers before main to allow libomnitrace to override

* Backtrace and timemory submodule updates

- Use unwind::cache w/o inline info
- update timemory submodule
  - unwind::cache updates
  - filepath updates
  - fix termination_signal_message
  - fix vsettings::report_change

* Update dyninst submodule

- updates BinaryEdit::getResolvedLibraryPath

* update timemory submodule

- update CpuArch support

* Cleanup configure warnings

* Update examples cmake and workflows

- (Mostly) eliminate configuration warnings

* omnitrace exe updates

- pass environ to BPatch::processCreate
- avoid trailing ":" in DYNINST_REWRITER_PATHS

* Update dyninst submodule

- Add flags to DyninstOptimization.cmake
- Remove strtok from BinaryEdit::getResolvedLibraryPath

* examples/mpi CMakeLists.txt update

- STATUS message about missing MPI during CI, otherwise AUTHOR_WARNING

* Dev build and linker flags

- use -gsplit-dwarf when OMNITRACE_BUILD_DEVELOPER is ON
  - disable when OMNITRACE_BUILD_NUMBER > 1
- OMNITRACE_BUILD_LINKER option
- add -fuse-ld=${OMNITRACE_BUILD_LINKER}
- omnitrace_add_cache_option function

* Update workflows to set OMNITRACE_BUILD_NUMBER

* Fix generator expressions for -fuse-ld

* Suppress some configuration warnings during CI

- helps to keep track of real warnings when they arise

* Update timemory and dyninst submodules with CMP0135

* Add -V flag to run-ci script
2022-11-01 17:28:12 -05:00
Jonathan R. Madsen 46b6db1a4c Submitting jobs to cdash (#124)
* Submitting jobs to cdash

* Fail on submit

* submit url env

* submit url env

* try passing submit url as arg

* fix submit url

* Updated default URL

* Add submissions for remaining ubuntu focal workflow jobs

* Replace g++ with gcc in dashboard build name

* Add --ctest-args to run-ci.sh

* Add cdash support for bionic, jammy, and opensuse workflows

* Decrease CTEST_CUSTOM_MAXIMUM_PASSED_TEST_OUTPUT_SIZE

* OMNITRACE_BUILD_CODECOV option

* Support code coverage in CDash script

* CI dyninst built with debug info

* Update ci-containers

- cron schedule moved 4 hours later to UTC+5

* Update implementation of config::configure_signal_handler

- using lambdas failed to compile with codecov flags

* Add codecov job to ubuntu focal workflow

* Fix support for --ctest-args in run-ci script

* Fix ubuntu workflows

* Fix quotation handling in run-ci script

* git safe directory for codecov

* New MPI examples

* Remove --stop-on-failure

* dynamic_library update

- find_library_path checks procfs maps
- invoke find_library_path with no additional args to resolve to mapped file

* RCCLP uses dynamic_library

* check if file exists for memory_map_files metadata

* Testing updates

- include new mpi examples in tests
- fix test labels
- test critical-trace exe

* Update MPI C examples tests (needed arg)

* Remove try/catch block from critical-trace

* Fix sampling max wait when shutting down

* Fix test env for critical-trace

* Fix settings for critical-trace

- disable time output: data is deterministic
- disable PID suffixes: not multiprocess

* Update critical-trace ctest

* Update critical-trace exe

- throw error if input cannot be opened
- throw error if input has no data

* Update lulesh example with more kokkos tools usage

* Fix tasking issue with critical_trace and roctracer

- were not setting pools to active
- also sync before critical_trace::get_entries

* Increase verbosity of critical-trace tests

* Update code coverage tests

- skip code coverage + preload
- code-coverage python example and test

* Remove duplication omnitrace.initialize function

* Skip python3.6 for ubuntu jammy

* Update MPI examples

- use MPI_Isend and MPI_Irecv
- explicitly use MPI_Bcast

* Update Formatting.cmake

- include C files in examples

* run-ci script does not check return of coverage

* mpi-allreduce link to libm

* Update ctest args in run-ci script

* Update dyninst submodule

- safety improvements in BinaryEdit::openResolvedLibraryName

* capture cmake error for ctest_coverage
2022-10-31 15:39:45 -05:00
Jonathan R. Madsen ede6007f9b Support for Ubuntu 22.04 and ROCm 5.3 (#48)
* Testing and CI support for Ubuntu 22.04

* Fixes for ROCm

- Jammy does not have ROCm installers

* Name, timeout, and python updates

- renamed ubuntu-jammy-external.yml to ubuntu-jammy.yml
- increased all 5 minute timeouts to 10 minutes
- include python 3.10 in testing

* Update dyninst to remove interposed definition of _r_debug

* Rebuild Dyninst + test install script

* Revert container change

* git safe directory

* pushd -> cd

* fix MPI include

* Fix testing step

* OMPI_ALLOW_RUN_AS_ROOT

* Test script changes

* Fix mismatched malloc / delete[]

* Jammy workflow tweaks

* CPack tweak for boost deb deps

* pthread_mutex_gotcha config returns when not enabled

* fix echoing config in CI

* USE_CLANG_OMP

- option to disable using LLVM OpenMP when building OpenMP test executables
- Jammy workflow sets USE_CLANG_OMP=OFF

* Dyninst submodule boost download

- updated containers workflow to include jammy
- updated workflow to use ci

* Updates to workflows + replace test-install.sh

- test-install.sh in this branch was replaced with one in main branch

* Expand jammy test-install.sh args

* Fix openmp-cg-sampling-duration test

* update timemory submodule

- use-after-free violation in popen::pclose

* revert some tweaks to sampling-duration test

* Fix env of test-install.sh

* cmake format

* jammy bash

* CPack install for jammy

* formatting workflow action version bump

* Update timemory submodule

- libunwind submodule via timemory sets SOVERSION to 99 to avoid ABI conflicts with v8

* Fix help menu for omnitrace-sample

* Support other boolean forms in test-install.sh

* Update docker files and build-docker.sh

- consolidated cases in build-docker.sh
- support rocm version of 0.0 (no rocm install)
- support rocm v5.3
- updated centos handling

* update opensuse actions/checkout version

* Tweaks to ubuntu-focal testing

- actions/checkout@v3
- use test-install script

* update cpack

- ubuntu 22.04
- rocm 5.3
- rename os matrix field to os-version
- remove CI_ROCM_VERSION (no longer necessary)
- remove default-rocm-version matrix field (no longer necessary)
- CentOS packaging

* fix argparsing and omnitrace-sample tests in install-tests.sh

* focal rocm test install workflow fix

* Fix omnitrace-sample build

* Dockerfile.centos + build-docker.sh updates

* Update actions/upload-artifact version

* Dockerfile.ubuntu: install rocm-device-libs

* Refactor cpack

* fix cpack if quotes

* Dockerfile.ubuntu rocm < 5 installs rocm-dev

* build-release.sh defaults to boost version 1.79.0
2022-10-17 12:54:26 -05:00
Jonathan R. Madsen 8f36620e29 Fix deadlocks during initialization (#167)
- More to come in later commit, below is just tidying some stuff up
  - clang-tidy
  - mpi_gotcha quiet about not finding funcs
  - update to new papi config
  - sampling block_samples / unblock_samples
    - disable calling component's sample functions within sampler
  - release doesn't strip library
  - remove HSA and ROCP env variables from modulefile / setup-env
- preliminary support for LD_PRELOAD usage
- default sampling rate is 300 interrupts / second
- fixes various deadlock issues at startup
2022-09-26 07:52:14 -05:00
Jonathan R. Madsen 15e6e6d979 Fix setup-env + hsa/rocm/ompt serialization + testing + misc (#156)
- Fix setup-env.sh
  - Closes #149 
- omnitrace exe color
- test-install.sh script
- if config variable is updated in config or env, include in generated
config
- metadata for hsa, rocm, and ompt
  - Closes #153 
  - Closes #154
2022-09-13 08:11:48 -05:00
Jonathan R. Madsen a1dcd1bc4b Static libstdcxx and python (#139)
Support python + static libstdc++
2022-08-28 03:56:13 -05:00
Jonathan R. Madsen b054b1a0ec Fix PAPI cpack packaging (#112)
* Fix PAPI cpack packaging
2022-07-25 05:03:51 -05:00
Jonathan R. Madsen 015fdfac6a Docker + build-release.sh + PAPI.cmake (#111)
- Release Docker uses CMake 3.21.4
- Configurable PAPI_C_COMPILER
- build-release.sh builds install
2022-07-25 02:36:28 -05:00
Jonathan R. Madsen ea282d9301 Release 1.3.0 preparations (#109)
* v1.3.0

* ROCm 5.2 and extensions tweaks

* Container workflow + miscellaneous updates

* Misc fixes + timemory submodule update

- timemory submodule update for multiple definitions of variant_apply

* Increase timeouts

* Remove obsolete Julia docs and script

- support for rocprofiler makes rocprof merging obsolete

* Fix cpack testing and combine cpack workflows into single script

* Install components + omnitrace tpl exes

- Improved COMPONENT specification for installs
- Install PAPI executables with omnitrace- prefix and hyphens
- Install Perfetto executables with omnitrace- prefix and hyphens

* Update docs on perfetto and papi command-line tools

* remove ubuntu 22.04 from containers workflow

* remove containers workflow running on all pushes

* Fix CI_SCRIPT_ARGS

* Fix PAPI utils install

* Fixed traced test in workflow + removed return char

- validate-perfetto-proto.py had return character

* Fix test-docker-release.sh script to use correct container

* Release build bc RelWtihDebInfo using too much memory
2022-07-23 03:02:31 -05:00
Jonathan R. Madsen 4208b5654c GPU HW Counters via rocprofiler (#84)
* Initial support for GPU hardware counters

* Update find modules for roctracer and rocprofiler

- /opt/rocm/{rocprofiler,roctracer} path is deprecated so tweak search procedure

* Improve ConfigCPack for MPI

* Update rocprofiler

- rocm_metrics()
- minor cleanup

* Update rocm find modules

* declare rocm_metrics + call in omnitrace-avail

* relocate omnitrace-launch-compiler

* REALPATH and find_modules

* Examples cmake (may drop)

* omnitrace-avail

- hw_counter categories
- init rocm

* setenv updates for rocprofiler in library.cpp and dl.cpp

* get_rocm_events config

* gpu::hip_device_count()

* rocm_metrics returns hardware_counters::info

* - relocated library/components/roctracer_callbacks.* to library/roctracer.*
- relocated library/components/rocprofiler.* to library/rocprofiler.*
- cleaned up rocprofiler.hpp
- added perfetto output of rocprofiler
- added timemory output of rocprofiler
- renamed omni.roctracer thread to roctracer.hip
- added roctracer.hsa thread name
- updated timemory submodule to support std::variant
- updated timemory submodule to support = in config value
- updated timemory submodule to support standalone storage
- updated timemory submodule to support new hw counter apis
- updated timemory submodule to prevent label/description caching in data_tracker

* update omnitrace-avail info_type generation

* Update timemory submodule

* rocprofiler component

* cmake formatting

* omnitrace-avail handle no GPUs

- Add -c command-line option for --categories
- support verbosity

* hsa_rsrc_factory throws exceptions

- throw exceptions to avoid aborting on HSA_STATUS_ERROR_NOT_INITIALIZED when advantageous
- removed duplicate specialization of is_available for component::rocprofiler

* rocprofiler symbols for when disabled

* Fix warning in omnitrace-avail

- std::stringstream from initializer list would use explicit constructor

* Fix finalization after settings are deleted

* Reorganized rocprofiler source

* Updated formatting

* Miscellaneous tweaks

- added using statements from timemory
- tweaked the main and thread bundle names
- fixed timemory header includes
2022-07-17 21:52:09 -05:00
Jonathan R. Madsen ab395f86c4 Support omnitrace-dl as kokkos profile library (#37)
- add OnLoad and OnUnload to omnitrace-dl
- disable global fence for kokkos profiling tools
- tweak omnitrace_strip_target to use wildcards
- added dl-gen.py script for generating dlopen bindings
- added support for kokkosp_request_tool_settings
- added support for kokkosp_dual_view_sync
- added support for kokkosp_dual_view_modify
2022-06-10 18:54:22 -05:00
Jonathan R. Madsen 8b97c70df8 Standalone build examples + testing workflow updates (#15)
* Update examples to support standalone builds

* Tweak to ubuntu-focal-external workflow

- disable PAPI

* ubuntu focal external workflow update

- GCC 11
- Test static libgcc + static libstdcxx + strip
- ubuntu-toolchain-r/test

* Improve build-release.sh

- command line args for lto, strip, perfetto-tools,
   static-libgcc, static-libstdcxx, hidden-visibility,
   max-threads, parallel

* Update VERSION to 1.0.1

* Fixes to LTO build

* Updates to ubuntu-focal-external workflow

* build-release.sh update

- enable static libstdcxx by default

* disable python + static libstdcxx

* ubuntu-focal-external updates

* build-release.sh disable static libstdcxx by default

* cmake-format
2022-05-31 01:51:18 -05:00
Jonathan R. Madsen 6af5b2a7e2 clang-tidy (#9)
- Fixed some clang-tidy warnings
- Fixed issue with omnitrace_launch_compiler + clang-tidy
2022-05-25 14:18:55 -05:00
Jonathan R. Madsen d2e635ed3c omnitrace find_package support (#3)
* omnitrace find_package support

- Fix to INSTALL_DESTINATION for configure_package_config_file
- Fixes to ConfigInstall.cmake and omnitrace-config.cmake.in

* Test find_package
2022-05-24 22:45:26 -05:00
Jonathan R. Madsen f26b3b81d6 Packaging test scripts + cpack fixes (#2) 2022-05-24 17:30:27 -05:00
Jonathan R. Madsen 9cba1f80ba Docker and build-release script updates [skip ci] (#61)
- Update CPack
2022-05-19 16:06:38 -05:00
Jonathan R. Madsen 346f8cd0bc Option rename + minor fixes (#57)
- Set choices of OMNITRACE_BACKEND option
- rename OMNITRACE_SHMEM_SIZE_HINT_KB option
- rename OMNITRACE_BUFFER_SIZE_KB option
- rename OMNITRACE_COMBINE_PERFETTO_TRACES
- rename OMNITRACE_BACKEND option
- default to OMNITRACE_COLLAPSE_PROCESSES for combining perfetto traces
- OMNITRACE_PERFETTO_FILL_POLICY option
- fix unused variables due to constexpr in add_critical_trace
- rename perfetto config from "track_event" to "omnitrace"
- fix build-release.sh + python
- handle config file updating OMNITRACE_DL_VERBOSE in omnitrace-dl
- rename roctrace.cfg to omnitrace.cfg
- accept "on" and "off" for get_sampling_cpus()
2022-05-10 17:30:45 -05:00
Jonathan R. Madsen 134b33320d Code coverage updates (#50)
* code coverage updates

- python support
- refactored source

* remove code_coverage::operator+ and operator+=

* impl/coverage.hpp
2022-05-08 01:40:56 -05:00
Jonathan R. Madsen 29220cba58 GOTCHA + Kokkos + tasking + more (#47)
* GOTCHA + Kokkos + tasking + more

- update gotcha with fix for dlsym(RTLD_NEXT, ...)
- support for standalone KOKKOS_PROFILE_LIBRARY
- remove extra flags for omnitrace-user
- roctracer and critical_trace namespaces in tasking
- generic tasking functions, e.g. join(), shutdown(), etc.
- omnitrace_init_tooling_hidden in api.hpp
- ompt.cpp uses OMNITRACE_USE_OMPT
- kokkosp uses user_region instead of omnitrace component
- re-enable recycling thread ids
- more generic _{push,pop}_perfetto functors
- fix for thread_data::instance(construct_on_init, ...)
- fix for omnitrace-headers interface target
- omnitrace_watch_for_change
2022-04-26 22:08:51 -05:00
Jonathan R. Madsen 4db6ba3d28 Multiple python versions (#42)
* Support multiple Python versions in single build

* RPATH + Split up config into config and runtime

* pybind11 submodule

* Docker build updates
2022-04-21 21:36:07 -05:00
Jonathan R. Madsen 945f541965 Documentation + Miscellaneous Fixes (#36)
* Added documentation markdown source

* Replaced AARInternal with AMDResearch in URLs

* Renamed cpack artifact names

* Fix to testing and lulesh submodule checkout

* Docker updates

* CMake and CPack

- force CMAKE_INSTALL_LIBDIR to lib
- CPACK_DEBIAN_PACKAGE_RELEASE uses OMNITRACE_CPACK_SYSTEM_NAME
- CPACK_RPM_PACKAGE_RELEASE uses OMNITRACE_CPACK_SYSTEM_NAME
- Tweak LIBOMP_LIBRARY find in examples/openmp
- Tweak setup-env.sh.in

* Partial update of README

- status badges
- docs link
- removed install info (covered by docs)

* OMNITRACE_SAMPLING_CPUS setting

- enables control over which CPUs are sampled for frequency

* omnitrace exe updates

- exclude transaction clone, virtual thunk, non-virtual thunk
- module_function::start_address
- module_function::instructions
- verbosity > 0 encodes instructions into JSON

* Miscellaneous fixes

- relocate setup-env.sh.in
- add modulefile.in
- Updated README.md and source/docs/about.md
- cmake fix for libomp
- fix license in miscellaneous places
- dl.hpp and dl.cpp

* Update timemory and dyninst submodules

- timemory signals updates
- dyninst Movement-adhoc updates

* cmake format
2022-04-04 15:27:38 -05:00
Jonathan R. Madsen 80e1a0d7e7 Tweaks to docker scripts [skip ci] (#28) 2022-02-25 18:30:37 -06:00
Jonathan R. Madsen 8648410309 Miscellaneous updates (#21)
* Miscellaneous updates

- Updated README
- Updated VERSION
- Header include tweaks
- get_verbose() + get_verbose_env()
- fixes to omnitrace-avail
    - exclude all cuda/cupti settings
    - apply available_only to hw counters
- config file warnings
- config displayed at verbose > 0
- fix to MPI_Finalize when only using MPI headers

* Updated LICENSE

* CPack tweak
2022-01-26 23:25:00 -06:00
Jonathan R. Madsen a546949ff4 MPI gotcha updates (#20)
* MPI gotcha updates

* Release script updates

- build for ubuntu focal and bionic
- use OpenMPI for OMNITRACE_USE_MPI_HEADERS

* build-release.sh updates
2022-01-26 15:02:53 -06:00
Jonathan R. Madsen 778af2a760 Sampling support + testing + omnitrace namespace (#19)
* omnitrace namespace

* Kokkos + Lulesh example/tests

* Sampling support + more

- OMNITRACE_BUILD_TESTING option
- sampling support
- pthread_gotcha
- fixes to labels for mpi_gotcha, fork_gotcha, omnitrace_component
- tasking::block_signals, tasking::unblock_signals
- instrumentation mode option in omnitrace exe
- argument option groups in omnitrace exe
- categories in omnitrace settings
- remove TIMEMORY_ prefixed options

* Release workflow updates

* Updated settings printing

* Fixed defaults in README

* Tweak setting defaults in README

* CMake fixes

* cmake-format

* clang-format

* LULESH_USE_MPI OFF

* LULESH_USE_MPI fix

* timemory add_secondary fix

* timemory ambiguous internal namespace fix

* Update timemory submodule

* Handle output path/prefix in omnitrace

- updated timemory
- updated test environment

* sampling + papi fix

* Fix to sampling without PAPI

* Fix for using too many processors in CI

* formatting

* Updated CI

- minor cmake tweaks
- updated timemory submodule

* Updated CI

* Updated CI

* CI + timemory updates

- data race fixes

* CI updates + debug for sampling

* Sampling updates

- moved tasking::{block,unblock}_signals to sampling namespace
- improvements to sampling w.r.t. thread-locality

* Minimum OMNITRACE_THREAD_COUNT of 128

* Handle multiple dims in sampler data

* Configure libunwind support for timemory

* Improved safeguards for sampling

- updated CI
- lulesh runtime-instrument test tweak

* formatting

* CI updates + sampler updates + misc

- fixed stack-buffer-overflow in omnitrace (get_*file_line_info)
- test labels
- steady_clock instead of system_clock in sampler
- update dyninst submodule with upgradePlaceholder fix
- disable OMNITRACE_BUILD_TESTING by default

* Updated timemory submodule

- hidden visibility for timemory
- storage finalizers do not capture this

* Update timemory submodule

- component visibility updates

* Reworked header includes

- use <...> for timemory headers
- always include <library/defines.hpp>

* Rename some config options

* Update PTL submodule

* Update kokkos submodule

* Updated sampling

* Updated CI

* Reworked instrumentation exe

- lowered min-address-range threshold to 256
- extended whole function exclude

* CI fix + timemory submodule update

- TIMEMORY_VISIBLE on component base
- RelWithDebugInfo -> RelWithDebInfo
- Info output for parallel-overhead

* Sampling flags + transpose update + CI update

- disable critical trace for parallel-overhead in CI
- SA_RESTART only in sampler
- reworked transpose example to use fewer threads

* CI update

- removed ubuntu-focal-external-debug
- reduced data artifacts upload

* CI timeouts

- updated timemory submodule
- minor tweaks to omnitrace exe logging

* LICENSE updates (partial)

* CI Test stage timeout extension

* Docker and Packaging updates

* Miscellaneous fixes/tweaks

- gpu.hpp / gpu.cpp
- disable roctracer component if no devices
- re-enable InstrStackFrames by default
- disable sampling by default
- pthread_gotcha::m_enable_sampling is false by default
- timemory submodule update w/ sampler and pop(tid) updates
- fix minor bug in sampler logic
- CMake: OMNITRACE_USE_HIP option
- roctracer + timemory fix

* Replaced OMNITRACE_USE_ROCTRACER with OMNITRACE_USE_HIP where appropriate

* cmake format

* Sampler deadlock fixes

* Removed debug messages from sampler

* Fix for MPI detection + test tweaks + misc

* Sampler deadlock fixes + misc

- removed papi_tot_ins
- pthread_gotcha blocks signals globally until sampler is setup
- metadata specialization for sampling components
- OMNITRACE_INSTRUMENTATION_MODE -> OMNITRACE_MODE
- default sampling delay increased to 0.05 from 1.0e-6
- removed {block,unblock}_signals from critical_trace and ptl
    - no longer necessary to use
- sampling delay minimum is 1.0e-3
- OMNITRACE_BUILD_HIDDEN_VISIBILITY

* omnitrace-avail + libunwind update + restructure

- restructured omnitrace components
- build custom omnitrace-avail executable
- updated libunwind to avoid malloc in get_unw_backtrace

* Fix remaining reorganization issues

- removed some duplicate code
- fixed some trait specializations after implicit instatiation
- formatting

* ensure_storage fix + avail improvements

- fix ensure_storage when component not avail
- suppress irrelevant info in omnitrace-avail

* Delay settings initialization

- slight tweak to tests w/ MPI

* Disable OpenMPI testing w/ ubuntu-bionic

- MPI testing is hanging bc of network interface issue on system:

> [[20462,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> Module: OpenFabrics (openib)
>   Host: fv-az19-371
> Another transport will be used instead, although this may result in
> lower performance.
> NOTE: You can disable this warning by setting the MCA parameter
> btl_base_warn_component_unused to 0.
2022-01-24 20:49:17 -06:00
Jonathan R. Madsen 39cf760a4e Rename hosttrace to omnitrace (#18) 2021-11-24 04:59:59 -06:00
Jonathan R. Madsen 752424efc2 Reorganization and critical trace support (#17)
* Roctracer wall clock integration (#16)

* Integrates roctracer values into wall-clock

* Fixed scoping + timemory roctracer

* Fixed data race in roctracer

* Synchronized HIP API on main thread

- Cache hip activity callbacks and execute on main thread
- Minor updates to transpose

* Debugging + MPI + transpose updates

* PTL + HSA and timemory + kernel timing

- PTL usage fixed HSA + timemory issues bc we could control the thread destruction
- Fixed laps counting in roctracer callbacks

* Ignore select HIP API types

- The ignored API types are ignored because there appears to be a bug
  which causes the "end" callback to be labeled as begin
- hipDeviceEnablePeerAccess
- hipImportExternalMemory
- hipDestroyExternalMemory

* Tweaks to PTL config

* Timemory update + pid-prefix w/ mpi headers

- %pid%- prefix with mpi headers
- timemory submodule update

* CMake + critical trace + reorganize library source

- clang-tidy tweaks
- cmake function updates to use hosttrace_ prefix
- update gitignore
- cmake HOSTTRACE_MAX_THREADS option
- Formatting.cmake
- cleaned up MacroUtilities.cmake
- PTL submodule + usage
- tweak to Findroctracer.cmake
- MT transpose
- Updated PTL submodule
- Updated timemory submodule
- fix to hosttrace return value type if type not found
- reorganized library source code
- support for critical trace

* Remove bits/stdint-uintn.h headers

* Rename + config + depth + critical path

- rename hosttrace_timemory_data to instrumentation_bundles
- rename hosttrace_bundle_t to main_bundle_t
- rename bundle_t to instrumentation_bundle_t
- rework of configuration setup
- critical_trace write directly to file option
- tweaked depth calculation
- updated timemory submodule
- improved parallel support in roctracer callbacks
- working critical_trace
- perfetto device-critical-trace and host-critical-trace categories
- made transpose example parallel
- made parallel-overhead example a bit uneven
- relocated LTO activation

* Fixed duplicates in perfetto critical-trace

* reworked critical trace support

- substantial perf improvement (30-45 min -> 30 sec)
- changes to configuration (new and removed options)

* Removed "%pid%-" output prefix in mpi_gotcha

* Update timemory submodule
2021-11-23 02:53:14 -06:00
Jonathan R. Madsen c56b49a0bd v0.0.3: MPI wrappers w/o full MPI support + setup-env.sh (#15)
* v0.0.3: MPI wrappers w/o full MPI support + setup-env.sh

- bumped to v0.0.3
- enabled gotcha wrapping of MPI functions w/o enabling MPI
- added setup-env.sh script
- minor updates to testing
- Update timemory submodule
- fixes tim::component::configure_mpip undefined symbol
- Script updates
2021-10-01 16:46:03 -05:00
Jonathan R. Madsen a061b7947f Integrated perfetto + roctracer (#5)
- hosttrace library automatically collects and merges timestamps for HIP API calls and kernels with the host-side instrumentation
  - mostly eliminates the need for using external rocprof
- added thread_instruction_count in perfetto output
- increased hosttrace min_loop_address_range to 512
- disabled instrumenting functions with dynamic callsites by default
- miscellaneous cmake updates

* roctracer support

- fully integrated perfetto + roctracer outputs
- thread_instruction_count in perfetto
- increased min_loop_address_range to 512
- disabled instrumenting functions with dynamic callsites by default
- updated timemory submodule

* hosttrace_launch_compiler

- support for using an alternative compiler as needed via launch compiler
- elfio added as submodule (not currently used)
- miscellaneous cmake updates

* README update + host/device categories + misc

- timemory fix for TIMEMORY_ROCTRACER_ENABLED
- transpose fix

* papi_tuple_t -> papi_tot_ins

- minor fix to Findroctracer.cmake
2021-09-06 22:23:24 -05:00
Jonathan R. Madsen fc15967f8f Updated documentation + misc (#3)
- tweaked some CMake option names
- moved merge-trace.jl to hosttrace-merge.jl
- removed Windows line encodings from hosttrace-merge.jl
- improved handling of !perfetto and !timemory
2021-09-02 13:14:58 -05:00
Jianbing Chen f518e09eab add systemTraceEvents to merge for perfetto ftrace data 2021-08-30 09:20:31 -05:00
Jianbing Chen 7793a1f331 use DISCARD for fill_policy since default RING_BUFFER fails when wrap around (most result table empty); add README.md and merge-trace.jl 2021-08-25 11:39:00 -05:00
Jonathan R. Madsen 9ef3800986 Hosttrace via Dyninst
- complete with ctest support
2021-08-06 13:08:57 -05:00