Commit graph

128 commits

Author SHA1 Message Date
Jonathan R. Madsen bb591c82cb HIP API and Activity Config Options + metadata JSON PID tagging (#225)
* library metadata/functions JSON

- remove always tagging metadata.json and functions.json with PID

* roctracer options for HIP API vs. HIP activity

* opensuse docker update for ROCm

- remove adding perl repo (does not exist)
2023-01-13 07:48:42 -06:00
Jonathan R. Madsen 2096f1f0c2 Remove temporary files (#223)
* config updates

- remove temporary file when destroyed
- extend parse_numeric_range: support vector, increment, better delimit
- git version info in omnitrace banner
- use OMNITRACE_BASIC_VERBOSE instead of OMNITRACE_VERBOSE

* Fix array-bounds warning in openmp lu.cpp

* Bump version to 1.7.4

* Update run-ci.py

- options to repeat
  - until-pass (default: 3)
  - until-fail (default: None)
  - after-timeout (default: 2)
2023-01-10 01:06:12 -06:00
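The `until-pass` repeat option listed in #223 above can be illustrated with a small retry loop. This is only a sketch of the general "rerun a failing step up to N times" idea, not the code in run-ci.py; `run_until_pass` and the `step` callable are hypothetical names, and the `until-fail` and `after-timeout` modes are not covered.

```python
def run_until_pass(step, attempts=3):
    """Call `step` until it returns True, at most `attempts` times.

    Returns the 1-based attempt number that passed, or 0 if every
    attempt failed. `step` is a hypothetical callable standing in for
    "run the test suite once".
    """
    for attempt in range(1, attempts + 1):
        if step():
            return attempt
    return 0
```

A default of 3 attempts mirrors the `until-pass (default: 3)` value in the commit message; a step that fails on all attempts is still reported as a failure.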
Jonathan R. Madsen 7ecc037d17 Fix release docs workflow + update documentation (#216)
* Fix release-docs workflow

* Documentation updates

- warning as errors when building docs
- fixed warnings when building docs
- fixed doxygen comments

* Miscellaneous fixes

* Fix doxygen comments
2022-12-14 07:59:53 -06:00
Mészáros Gergely 2449e7cd46 rocm: Fix compilation with rocm >= 5.3.1 (#214)
* rocm: Fix compilation with rocm >= 5.3.1

Add another version-checked initialization, because:
[roctracer has changed `hsa_ops_properties_t` again in 5.3.1][hsa_ops_properties_t]

[hsa_ops_properties_t]: https://github.com/ROCm-Developer-Tools/roctracer/compare/rocm-5.3.0...rocm-5.3.1#diff-42ffacd4e7f57213868dcd96aeb5c0d47d91d2a1121ba8d67f46caa267c1818cL41

* Fix formatting

* Fix preprocessing directive
2022-11-18 12:12:59 -06:00
Jonathan R. Madsen 2ebfe3fc30 Perfetto updates (#211)
* Retry support in build-docker-release.sh

* critical-trace perfetto update

- use ::perfetto::Track instead of threads to create rows
- refactor call_chain::generate_perfetto

* Fix backtrace_metrics for perfetto

- get_papi_labels is now properly populated

* Refactor sampling::post_process_perfetto

- include HW counter delta in sample debug annotations
- reduce the amount of debug annotation data stored in the call-stack
  - if the data is common to the entire stack, it is only annotated in the first and the last call-stack entry

* exit_gotcha::exit_info

* Improve OMPT shutdown

- previous shutdown behavior caused spurious test failures

* Update source/lib/omnitrace/library/ompt.cpp
2022-11-17 08:43:45 -06:00
Jonathan R. Madsen 589a729702 roctracer: device and stream tracks (#209)
* roctracer: use multiple tracks for HIP streams

Use different perfetto tracks for each stream, and set the name of
these tracks to the stream pointer values. Setting the name like this
matches the args in the API traces.
This fixes overlapping work on multiple streams appearing as a call
stack.

* Fix -pedantic

* Run clang-format

* Add option to disable per stream tracks in perfetto

* Updated scheme for roctracer activity + general roctracer fixes

- Per-device tracks
- Handle HSA OPS in ROCm 5.3
  - the changes in ROCm 5.3 were causing HSA ops to get discarded
- Default for OMNITRACE_ROCTRACER_DISCARD_INVALID is now zero
  - i.e. default behavior is to flip beg_ns and end_ns when beg_ns > end_ns

* Flush perfetto at end of hip_activity_callback

- fixes unterminated regions

* GitHub Actions and run-ci script updates

- improve reliability

* Set OMNITRACE_TMPDIR in testing

- files in /tmp get occasionally deleted during CI

Co-authored-by: Gergely Meszaros <gergely@streamhpc.com>
2022-11-16 15:57:27 -06:00
Jonathan R. Madsen 2f16e2ecb1 Optional perfetto annotations (#206)
* Misc tweaks

- C API function print with warning colors
- split region/trace start/stop functions into regions.cpp file

* Config option for disabling perfetto annotations

* Missing checks in roctracer.cpp and sampling.cpp

* Verbose makefile in CI

* run-ci uses -VV

* Fix gcc-7 maybe-uninitialized warning

* Fix push/pop perfetto

- moving perfetto::EventContext was causing errors
2022-11-16 09:48:15 -06:00
Jonathan R. Madsen 2135f82ab8 Improve sampling allocator (#205)
* Updated sampling

- dynamic sampler is constructed with a shared pointer to an allocator instance
- dynamic allocator handles multiple samplers
  - eliminates need for every per-thread dynamic sampler to start background allocator thread

* Fix usage of tim::popen
2022-11-13 14:37:07 -06:00
Jonathan R. Madsen e1102a8ba4 CI and testing updates (#203)
* Python implementation of run-ci.sh

* Container workflow update

- retry failed container build to combat network failures

* cpack workflow update

- retry failed base container build to combat network failures

* General CI workflow updates

- retry failed "Install packages" step to combat network failures

* Miscellaneous linting fixes

* Formatting workflow update

- improve regex for source formatting

* format user.h

* Add new omnitrace-avail tests

* Make run-ci.py executable

* workflow retry fix

- timeout_seconds -> retry_wait_seconds

* Fix cmake formatting glob

* source formatting

* Handle PRs in run-ci.py

* Specify timeout_minutes in retry steps

* Remove remaining --cmake-args from workflows

* CI warnings about using MPICH headers

* Remove text=True from run-ci.py

- not capturing stdout/stderr so unnecessary

* Fix OpenSUSE step label

* Update omnitrace-avail-write-config tests

- use TWD (Test Working Directory) instead of PWD since PWD might not be build directory

* paths-ignore + workflow_dispatch
2022-11-13 10:42:14 -06:00
Jonathan R. Madsen 642b6b95ca Support external (i.e. user-defined) trace annotations (#195)
* Support external (i.e. user-defined) trace annotations

- tweaked the python examples to be more balanced
- updated the user-api example to conform to user API changes
- moved the get/set for State and ThreadState to state.{hpp,cpp}
- introduced user-provided trace annotations
- added perfetto python category
- moved coverage impl files around
- created enumerations for mapping category enums to category types
- created enumerations for mapping annotation type enums to annotation values
- moved tracing::add_perfetto_annotations to tracing/annotation.hpp
- utility make_index_sequence_range
- libomnitrace-dl: omnitrace_push_category_region
- libomnitrace-dl: omnitrace_pop_category_region
- libomnitrace-user: omnitrace_user_push_annotated_region
- libomnitrace-user: omnitrace_user_pop_annotated_region
- libpyomnitrace: support extra annotations via annotate_trace config value
  - filename
  - line
  - last attempted instr in bytecode (lasti)
  - argcount
  - num local variables
  - stacksize
- omnitrace-python: -a / --annotate-traces option

* tweak ubuntu-focal workflow

* Fix installation of omnitrace-user headers

* ubuntu-focal-codecov workflow update

- Install texinfo

* Update timemory submodule
2022-11-11 07:31:14 -06:00
Jonathan R. Madsen f147670a7a Various optimizations (#192)
* CDash name prefix {{ repo_owner }}-{{ ref_name }}

- remove /merge from CI name

* disable using BFD when sampling_include_inlines is OFF

- this consumes a lot of memory

* Improve finalization of rocprofiler

* update timemory submodule

- disable OMPT thread begin/end callbacks
- support hierarchies in signal handlers
- update operation::pop_node debugging
- settings_update_type + setting_supported_data_types
- fixed parsing args in timemory_init

* Improve timemory build time

* Remove kokkosp restrictions for perfetto

* omnitrace exe signal handler update

- configure signal handlers before main to allow libomnitrace to override

* Backtrace and timemory submodule updates

- Use unwind::cache w/o inline info
- update timemory submodule
  - unwind::cache updates
  - filepath updates
  - fix termination_signal_message
  - fix vsettings::report_change

* Update dyninst submodule

- updates BinaryEdit::getResolvedLibraryPath

* update timemory submodule

- update CpuArch support

* Cleanup configure warnings

* Update examples cmake and workflows

- (Mostly) eliminate configuration warnings

* omnitrace exe updates

- pass environ to BPatch::processCreate
- avoid trailing ":" in DYNINST_REWRITER_PATHS

* Update dyninst submodule

- Add flags to DyninstOptimization.cmake
- Remove strtok from BinaryEdit::getResolvedLibraryPath

* examples/mpi CMakeLists.txt update

- STATUS message about missing MPI during CI, otherwise AUTHOR_WARNING

* Dev build and linker flags

- use -gsplit-dwarf when OMNITRACE_BUILD_DEVELOPER is ON
  - disable when OMNITRACE_BUILD_NUMBER > 1
- OMNITRACE_BUILD_LINKER option
- add -fuse-ld=${OMNITRACE_BUILD_LINKER}
- omnitrace_add_cache_option function

* Update workflows to set OMNITRACE_BUILD_NUMBER

* Fix generator expressions for -fuse-ld

* Suppress some configuration warnings during CI

- helps to keep track of real warnings when they arise

* Update timemory and dyninst submodules with CMP0135

* Add -V flag to run-ci script
2022-11-01 17:28:12 -05:00
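The "avoid trailing ":" in DYNINST_REWRITER_PATHS" item in #192 above comes down to joining path lists without emitting empty entries. The following is a minimal sketch of that idea, assuming a colon-separated variable; `join_search_paths` is a hypothetical helper, not a function from omnitrace or Dyninst.

```python
def join_search_paths(*groups):
    """Join colon-separated path lists, skipping empty entries.

    Naively concatenating "a:b" + ":" + "" yields "a:b:", i.e. a
    trailing ":" and an empty final entry. Filtering empty entries
    before joining avoids that.
    """
    entries = []
    for group in groups:
        for entry in group.split(":"):
            if entry:  # drop empty strings from "", "a::b", "a:"
                entries.append(entry)
    return ":".join(entries)
```

For example, `join_search_paths("/opt/lib", "")` gives `/opt/lib` rather than `/opt/lib:`.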
Jonathan R. Madsen b23b581563 Offload sampling data (#190)
- update timemory submodule
  - support for load/save of ring_buffers
  - new output keys, e.g. `%nid%`
  - sampling allocator offloading data
- writing sampling data to temporary file
- new advanced config option `OMNITRACE_USE_TEMPORARY_FILES`
- new advanced config option `OMNITRACE_TMPDIR`
- SIGINT signal (i.e. `Ctrl+C`) triggers backtrace + finalization
  - this behavior is common to other profilers

* update output.md docs

* Update omnitrace-avail output keys handling

* update writing metadata

* str format in perfetto_counter_track

* Fix fail regex for mpi-example

* config updates

- OMNITRACE_USE_TEMPORARY_FILES
- OMNITRACE_TMPDIR
- Enable finalization with SIGINT
- code supporting creation of temp files

* sampling offloading to temporary file

* Disable creation of empty temporary files when off
2022-10-31 22:23:10 -05:00
Jonathan R. Madsen 46b6db1a4c Submitting jobs to cdash (#124)
* Submitting jobs to cdash

* Fail on submit

* submit url env

* submit url env

* try passing submit url as arg

* fix submit url

* Updated default URL

* Add submissions for remaining ubuntu focal workflow jobs

* Replace g++ with gcc in dashboard build name

* Add --ctest-args to run-ci.sh

* Add cdash support for bionic, jammy, and opensuse workflows

* Decrease CTEST_CUSTOM_MAXIMUM_PASSED_TEST_OUTPUT_SIZE

* OMNITRACE_BUILD_CODECOV option

* Support code coverage in CDash script

* CI dyninst built with debug info

* Update ci-containers

- cron schedule moved 4 hours later to UTC+5

* Update implementation of config::configure_signal_handler

- using lambdas failed to compile with codecov flags

* Add codecov job to ubuntu focal workflow

* Fix support for --ctest-args in run-ci script

* Fix ubuntu workflows

* Fix quotation handling in run-ci script

* git safe directory for codecov

* New MPI examples

* Remove --stop-on-failure

* dynamic_library update

- find_library_path checks procfs maps
- invoke find_library_path with no additional args to resolve to mapped file

* RCCLP uses dynamic_library

* check if file exists for memory_map_files metadata

* Testing updates

- include new mpi examples in tests
- fix test labels
- test critical-trace exe

* Update MPI C examples tests (needed arg)

* Remove try/catch block from critical-trace

* Fix sampling max wait when shutting down

* Fix test env for critical-trace

* Fix settings for critical-trace

- disable time output: data is deterministic
- disable PID suffixes: not multiprocess

* Update critical-trace ctest

* Update critical-trace exe

- throw error if input cannot be opened
- throw error if input has no data

* Update lulesh example with more kokkos tools usage

* Fix tasking issue with critical_trace and roctracer

- were not setting pools to active
- also sync before critical_trace::get_entries

* Increase verbosity of critical-trace tests

* Update code coverage tests

- skip code coverage + preload
- code-coverage python example and test

* Remove duplicate omnitrace.initialize function

* Skip python3.6 for ubuntu jammy

* Update MPI examples

- use MPI_Isend and MPI_Irecv
- explicitly use MPI_Bcast

* Update Formatting.cmake

- include C files in examples

* run-ci script does not check return of coverage

* mpi-allreduce link to libm

* Update ctest args in run-ci script

* Update dyninst submodule

- safety improvements in BinaryEdit::openResolvedLibraryName

* capture cmake error for ctest_coverage
2022-10-31 15:39:45 -05:00
Jonathan R. Madsen a41a5c155e Fix LD_PRELOAD (#184)
- __libc_start_main in libomnitrace-dl wasn't wrapping bc of -fvisibility=hidden
- fix OMNITRACE_STRIP_TARGET
- omnitrace_reset_preload function in main library
- defer removing libomnitrace from LD_PRELOAD
2022-10-21 17:39:08 -05:00
Jonathan R. Madsen f2a0485d60 OMNITRACE_ROCTRACER_DISCARD_INVALID=N (#186)
- only print N warnings that end < begin
- if N is zero, swap(begin, end) and include data
2022-10-21 09:24:13 -05:00
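The behavior described in #186 above can be sketched as follows. The commit message only states that N limits the number of printed warnings and that N = 0 swaps the timestamps and keeps the data; that a nonzero N discards the offending record is an assumption taken from the option name. `filter_activity_records` is a hypothetical function, not omnitrace code.

```python
def filter_activity_records(records, discard_invalid=0):
    """Handle (begin_ns, end_ns) records where end < begin.

    discard_invalid == 0: swap begin and end, keep the record.
    discard_invalid == N > 0: drop the record (assumed) and emit at
    most N warning messages.
    Returns (kept_records, warnings).
    """
    kept = []
    warnings = []
    for begin_ns, end_ns in records:
        if end_ns < begin_ns:
            if discard_invalid == 0:
                # N is zero: swap(begin, end) and include the data
                begin_ns, end_ns = end_ns, begin_ns
            else:
                # N is nonzero: only report the first N occurrences
                if len(warnings) < discard_invalid:
                    warnings.append(f"end < begin: {end_ns} < {begin_ns}")
                continue
        kept.append((begin_ns, end_ns))
    return kept, warnings
```

With `discard_invalid=0`, a record `(9, 3)` is kept as `(3, 9)`; with `discard_invalid=1`, it is dropped and only the first such record produces a warning.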
Jonathan R. Madsen 9ff4b6b624 Dramatic improvement of post-processing critical trace data (#185)
- several orders of magnitude faster
2022-10-21 09:23:52 -05:00
Jonathan R. Madsen 271f851896 Disable HSA API and activity by default (#183)
- OMNITRACE_ROCTRACER_HSA_ACTIVITY is OFF by default
- OMNITRACE_ROCTRACER_HSA_API is OFF by default
- in real applications, this adds way too much tracing data to perfetto
2022-10-21 09:23:29 -05:00
Jonathan R. Madsen 1bead56ce8 Fix perfetto debug annotations of function parameters (#181)
Fix debug annotations of function parameters
2022-10-21 07:24:21 -05:00
Jonathan R. Madsen 67f7471253 Omnitrace sample documentation (#179)
* Documentation for omnitrace-sample

* Improve omnitrace-sample

- improve the printing of the env updates
- remove env settings when something is deactivated
- restore env settings when something is deactivated
2022-10-19 03:30:00 -05:00
Jonathan R. Madsen 78a06e7a42 Signal handler backtraces provide line info (#178)
* Signal handler backtraces provide line info

- print backtrace after SIGINT during finalization

* Workflow run-name + jammy rocm CI

* fix jammy matrix indentation

* disable building dyninst in jammy

* Update jammy for rocm

* jammy rocm_agent_enumerator

* Fix rocm install for jammy

* jammy bash

* jammy workflow typo

* revert some changes

* stack-usage + omnitrace-rt symlink + ncclSocketAccept + indiv sigs

- symlink omnitrace-rt in build tree
- exclude ncclSocketAccept
- timemory submodule update accepting individual signal handlers
2022-10-18 21:45:56 -05:00
Jonathan R. Madsen ede6007f9b Support for Ubuntu 22.04 and ROCm 5.3 (#48)
* Testing and CI support for Ubuntu 22.04

* Fixes for ROCm

- Jammy does not have ROCm installers

* Name, timeout, and python updates

- renamed ubuntu-jammy-external.yml to ubuntu-jammy.yml
- increased all 5 minute timeouts to 10 minutes
- include python 3.10 in testing

* Update dyninst to remove interposed definition of _r_debug

* Rebuild Dyninst + test install script

* Revert container change

* git safe directory

* pushd -> cd

* fix MPI include

* Fix testing step

* OMPI_ALLOW_RUN_AS_ROOT

* Test script changes

* Fix mismatched malloc / delete[]

* Jammy workflow tweaks

* CPack tweak for boost deb deps

* pthread_mutex_gotcha config returns when not enabled

* fix echoing config in CI

* USE_CLANG_OMP

- option to disable using LLVM OpenMP when building OpenMP test executables
- Jammy workflow sets USE_CLANG_OMP=OFF

* Dyninst submodule boost download

- updated containers workflow to include jammy
- updated workflow to use ci

* Updates to workflows + replace test-install.sh

- test-install.sh in this branch was replaced with one in main branch

* Expand jammy test-install.sh args

* Fix openmp-cg-sampling-duration test

* update timemory submodule

- use-after-free violation in popen::pclose

* revert some tweaks to sampling-duration test

* Fix env of test-install.sh

* cmake format

* jammy bash

* CPack install for jammy

* formatting workflow action version bump

* Update timemory submodule

- libunwind submodule via timemory sets SOVERSION to 99 to avoid ABI conflicts with v8

* Fix help menu for omnitrace-sample

* Support other boolean forms in test-install.sh

* Update docker files and build-docker.sh

- consolidated cases in build-docker.sh
- support rocm version of 0.0 (no rocm install)
- support rocm v5.3
- updated centos handling

* update opensuse actions/checkout version

* Tweaks to ubuntu-focal testing

- actions/checkout@v3
- use test-install script

* update cpack

- ubuntu 22.04
- rocm 5.3
- rename os matrix field to os-version
- remove CI_ROCM_VERSION (no longer necessary)
- remove default-rocm-version matrix field (no longer necessary)
- CentOS packaging

* fix argparsing and omnitrace-sample tests in install-tests.sh

* focal rocm test install workflow fix

* Fix omnitrace-sample build

* Dockerfile.centos + build-docker.sh updates

* Update actions/upload-artifact version

* Dockerfile.ubuntu: install rocm-device-libs

* Refactor cpack

* fix cpack if quotes

* Dockerfile.ubuntu rocm < 5 installs rocm-dev

* build-release.sh defaults to boost version 1.79.0
2022-10-17 12:54:26 -05:00
Jonathan R. Madsen a3439d5bf2 Trace thread config + paranoid level + preload (#176)
- OMNITRACE_TRACE_THREAD_BARRIERS config option
  - set to OFF to disable wrapping `pthread_barrier`
- OMNITRACE_TRACE_THREAD_JOIN config option
  - set to OFF to disable wrapping `pthread_join`
- allow PAPI with perf_event_paranoid at level 2
- default to no PAPI events
- setenv LD_PRELOAD to not include libomnitrace after preload
  - closes #175 
- bump version to 1.7.1
2022-10-06 19:11:08 -05:00
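The LD_PRELOAD item in #176 above amounts to rewriting the variable without the omnitrace entry once preloading has happened. A minimal sketch, assuming that every entry whose file name starts with `libomnitrace` should be dropped; `strip_preload` is a hypothetical helper and the prefix match is an assumption, not the project's actual rule.

```python
import os


def strip_preload(value, prefix="libomnitrace"):
    """Return an LD_PRELOAD value without entries matching `prefix`.

    The dynamic linker accepts spaces or colons as separators, so both
    are treated as delimiters here; the result is colon-joined.
    """
    entries = value.replace(":", " ").split()
    kept = [e for e in entries
            if not os.path.basename(e).startswith(prefix)]
    return ":".join(kept)
```

A caller would then assign the result back, e.g. `os.environ["LD_PRELOAD"] = strip_preload(os.environ.get("LD_PRELOAD", ""))`.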
Jonathan R. Madsen 2a387f9099 Fix finalization segfaults (#174)
- update timemory submodule with fixes to papi components and signals
update
2022-10-04 00:00:05 -05:00
Jonathan R. Madsen 7d7a8f2c23 Raise default min number of instructions (#173)
- Raise min instructions default to 1024 instead of 64
- Default value of 64 has demonstrated tendency to slow down real-life
applications
- Improved the memory safety during `omnitrace_finalize()`
- new modifications guarantee that when `tim::manager::instance()` on
main thread is destroyed, omnitrace will finalize before
- Improved some warnings w/ roctracer
- Improved the search for `ROCP_METRICS` and
`OMNITRACE_ROCPROFILER_LIBRARY`
- disable printing env by default
- Attempted to improve the sampling shutdown
2022-09-30 19:28:43 -05:00
Jonathan R. Madsen 79a8f16646 omnitrace-sample (#169)
- `omnitrace-sample` executable which executes sampling (no
instrumentation)
- fixes bug with OMPT ignoring value of `OMNITRACE_USE_OMPT`
- fixes some issues with sampling duration
- new `OMNITRACE_SAMPLING_INCLUDE_INLINES` configuration variable
- restricts process-sampling to 100 interrupts/sec when inheriting value
from `OMNITRACE_SAMPLING_FREQ`
- `OMNITRACE_PROCESS_SAMPLING_FREQ` still supports up to 1000
interrupts/sec
- fixes bug with colorized log not truly being disabled in all instances
- adds tests for `omnitrace-sample`
- adds tests for sampling duration
- setting ROCP_TOOL_LIB to libomnitrace-dl throws error
  - rocprofiler does not configure correctly when this is done
- Quiet numa_gotcha warnings
- Fixed some shadowed variables
2022-09-30 10:47:07 -05:00
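The process-sampling frequency rule in #169 above can be sketched as a small resolution function. The limits of 100 and 1000 interrupts/sec come from the commit message; clamping an explicit value above 1000 (rather than rejecting it) is an assumption, and `resolve_process_sampling_freq` is a hypothetical name.

```python
def resolve_process_sampling_freq(sampling_freq, process_sampling_freq=None):
    """Pick the process-sampling frequency in interrupts/sec.

    sampling_freq: value of OMNITRACE_SAMPLING_FREQ.
    process_sampling_freq: value of OMNITRACE_PROCESS_SAMPLING_FREQ,
    or None when it was not set explicitly.
    """
    if process_sampling_freq is not None:
        # explicit setting: up to 1000 interrupts/sec (clamp assumed)
        return min(process_sampling_freq, 1000.0)
    # inherited from OMNITRACE_SAMPLING_FREQ: capped at 100 interrupts/sec
    return min(sampling_freq, 100.0)
```

Under these assumptions, an inherited `OMNITRACE_SAMPLING_FREQ` of 300 results in process sampling at 100 interrupts/sec, while an explicit `OMNITRACE_PROCESS_SAMPLING_FREQ` of 500 is used as-is.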
Jonathan R. Madsen 4e3527f0ed Resolve warnings/errors with extra warnings (#171)
2022-09-28 14:28:32 -05:00
Jonathan R. Madsen 2ed818449f Support for libbfd (binary file descriptor) (#168)
- provides addr2line information for sampling
2022-09-28 00:18:05 -05:00
Jonathan R. Madsen 8f36620e29 Fix deadlocks during initialization (#167)
- More to come in later commit, below is just tidying some stuff up
  - clang-tidy
  - mpi_gotcha quiet about not finding funcs
  - update to new papi config
  - sampling block_samples / unblock_samples
    - disable calling component's sample functions within sampler
  - release doesn't strip library
  - remove HSA and ROCP env variables from modulefile / setup-env
- preliminary support for LD_PRELOAD usage
- default sampling rate is 300 interrupts / second
- fixes various deadlock issues at startup
2022-09-26 07:52:14 -05:00
Jonathan R. Madsen 90ff7188f8 Crusher hackathon updates (#164)
- improved error handling in dyninst
- improved error handling in omnitrace exe
- new logging facility for omnitrace exe
- improved backtraces
- disable concurrent kernels in rocprofiler
- updates `setup-env.sh` and modulefile
  - set `omnitrace_ROOT`
  - set `HSA_TOOLS_LIB` if roctracer or rocprofiler enabled
  - set `ROCP_TOOL_LIB` if rocprofiler enabled
  - closes #163 
- No longer make setting `HSA_ENABLE_INTERRUPT=0` the default 
  - this has performance implications
- this was set to workaround a bug in ROCR which caused an ioctl call in
ROCm to hang when interrupted. But it was only interrupted when realtime
sampling was enabled since the CPU-clock doesn't increment when waiting
  - This bug should be fixed in ROCm 5.3
- omnitrace no longer activates a realtime sampler by default when
sampling, thus this bug is no longer encountered unless the user
explicitly triggers realtime sampling
2022-09-21 13:58:14 -05:00
Jonathan R. Madsen 472e96a084 Fix building w/ hip, etc. but w/o rocprofiler (#159)
- Fixes builds with `OMNITRACE_USE_ROCPROFILER=OFF` but
`OMNITRACE_USE_ROCTRACER=ON`
- Move `rocprofiler/hsa_rsrc_factory.{hpp,cpp}` to rocm folder
2022-09-14 00:07:33 -05:00
Jonathan R. Madsen e4e3c2b7f4 Fix gotcha indexes for numa_gotcha (#161)
- In #152, `numa_alloc_interleaved` and `numa_alloc_local` both were
indexed at position 6
2022-09-13 23:03:45 -05:00
Jonathan R. Madsen 15e6e6d979 Fix setup-env + hsa/rocm/ompt serialization + testing + misc (#156)
- Fix setup-env.sh
  - Closes #149 
- omnitrace exe color
- test-install.sh script
- if config variable is updated in config or env, include in generated
config
- metadata for hsa, rocm, and ompt
  - Closes #153 
  - Closes #154
2022-09-13 08:11:48 -05:00
Jonathan R. Madsen 4ed8f8f762 Individual perfetto debug annotations (#157)
- instead of args="<list-of-string>" in perfetto, each argument is added
individually, enabling matching pointer values, etc.

## Previous behavior

Previously, all function arguments were wrapped into a single string,
e.g.:

![Screen Shot 2022-09-12 at 11 52 11
PM](https://user-images.githubusercontent.com/6001865/189811896-3b13d5a4-e884-40b3-b6b4-f84f3e16f603.png)

## New Behavior

With the exception of the HIP API (whose args are provided as a single
string via `hipApiString`), all functions from MPI, RCCL, pthreads, etc.
have individual arguments in perfetto, e.g.:

![Screen Shot 2022-09-12 at 11 52 01
PM](https://user-images.githubusercontent.com/6001865/189812137-63afba72-170d-42b3-b3bc-ae74e32ceadf.png)

In the above, previously, this would have been:

|  |  |
| - | - |
| args | `pthread_rwlock_t*=0x1c1cc50` |

The key benefit enabled is the ability to find slices with same arg
values:

<img width="753" alt="Screen Shot 2022-09-12 at 11 59 18 PM"
src="https://user-images.githubusercontent.com/6001865/189812915-0342f841-e5ce-4f8e-8169-0cb52f3425b5.png">

Previously, the entire "args" field would have had to match, which
essentially never happened if the pointer was used in two different
functions with different function signatures

## Example

![Screen Shot 2022-09-12 at 11 49 18
PM](https://user-images.githubusercontent.com/6001865/189811621-5eed0008-f340-438e-938d-a140f836b583.png)

![Screen Shot 2022-09-12 at 11 49 45
PM](https://user-images.githubusercontent.com/6001865/189811584-a983ed1c-095f-4d54-a790-9ef3adbe4e08.png)
2022-09-13 02:05:03 -05:00
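The benefit described in #157 above, finding slices that share an argument value, can be sketched with plain dictionaries. The slice names, the second pointer value, and the annotation keys below are illustrative assumptions (the real key names appear only in the screenshots); `slices_by_arg_value` is a hypothetical helper, not a perfetto API.

```python
from collections import defaultdict


def slices_by_arg_value(slices):
    """Map each annotation value to the names of slices carrying it."""
    index = defaultdict(list)
    for name, annotations in slices:
        for value in annotations.values():
            index[value].append(name)
    return index


# Previous behavior: one "args" string per slice.
joined = [
    ("pthread_rwlock_rdlock", {"args": "pthread_rwlock_t*=0x1c1cc50"}),
    ("pthread_rwlock_timedrdlock",
     {"args": "pthread_rwlock_t*=0x1c1cc50, const timespec*=0x7ffd0"}),
]

# New behavior: one annotation per argument.
individual = [
    ("pthread_rwlock_rdlock", {"pthread_rwlock_t*": "0x1c1cc50"}),
    ("pthread_rwlock_timedrdlock",
     {"pthread_rwlock_t*": "0x1c1cc50", "const timespec*": "0x7ffd0"}),
]
```

With `individual`, looking up `"0x1c1cc50"` returns both slices; with `joined`, no value is shared because the two functions have different signatures and therefore different "args" strings.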
Jonathan R. Madsen 64afa49193 GOTCHA wrappers for NUMA functions (#152)
- GOTCHA wrappers for:
  - mbind
  - migrate_pages
  - move_pages
  - numa_migrate_pages
  - numa_move_pages
  - numa_alloc
  - numa_alloc_local
  - numa_alloc_interleaved
  - numa_alloc_onnode
  - numa_realloc
  - numa_free
- bumped version to 1.6.0
- updated categories
- hardware_counters -> kernel_hardware_counters and
thread_hardware_counters
  - simplified mapping category structs to perfetto category names
- updated timemory submodule
2022-09-12 17:44:27 -05:00
Jonathan R. Madsen 2718596e5a Support tracing thread locks with perfetto (#143)
- remove sampling and roctracer flat/timeline options
  - unused/unnecessary clutter
- start pthread_gotcha before perfetto
- remove pthread_mutex_gotcha validate
- update timemory submodule with tid fix
2022-08-31 11:33:45 -05:00
Jonathan R. Madsen e67afd33eb Support sampling duration, sampling TIDs (#142)
- Sampling duration config values
  - OMNITRACE_SAMPLING_DURATION
  - OMNITRACE_PROCESS_SAMPLING_DURATION
  - Disables sampling after this time (in seconds) has elapsed 
- Sampling thread-id config values
  - OMNITRACE_SAMPLING_TIDS
  - OMNITRACE_SAMPLING_CPUTIME_TIDS
  - OMNITRACE_SAMPLING_REALTIME_TIDS
  - Allows user to select certain threads for sampling
- Miscellaneous
  - Tweaked the finalization verbosity messages
  - moved sampling-on-child-threads into runtime.hpp and runtime.cpp
  - fixed submodule dyninst header install
2022-08-31 06:29:19 -05:00
Jonathan R. Madsen 808ea7dfa7 Rework sampling and colorized logs (#140)
## Overview

This is a significant PR which has 3 very notable characteristics:

1. Omnitrace colorizes most of its logging
2. Completely reworked the sampling 
  - Samples now record the current instruction pointers instead of strings
    - This _dramatically_ decreases the overhead of taking a sample
  - The collection of metrics during a sample are split out into another component, enabling that data collection to be disabled -- which decreases the sampling overhead even further
  - When both `OMNITRACE_SAMPLING_CPUTIME` and `OMNITRACE_SAMPLING_REALTIME` are ON:
    - `OMNITRACE_SAMPLING_CPUTIME_FREQ` and `OMNITRACE_SAMPLING_REALTIME_FREQ` can be used to individually control the sampling frequency 
  - `OMNITRACE_SAMPLING_CPUTIME_DELAY` and `OMNITRACE_SAMPLING_REALTIME_DELAY` can be used to individually control the delay time before starting
  - Now, omnitrace does not start a real-time sampler on the main thread unless `OMNITRACE_SAMPLING_REALTIME` is ON
    - In the future, an `OMNITRACE_SAMPLING_TIDS` (and real-time, cpu-time variants) configuration variable(s) will allow you to select which threads will be sampled
3. Files produced by `omnitrace` exe -- `available-instr.txt`, `instrumented-instr.txt`, etc. -- no longer have the `-instr` suffix and are placed in the `instrumentation/` subfolder, i.e. `available-instr.txt` -> `instrumentation/available.txt`
  - This helped de-clutter the output folder

Most of the other edits were reorganization (e.g. internal namespace changes), cleanup, and splitting up functionality.

## Bug Fixes

There is a bug fix with respect to the HSA callbacks which disabled sampling on child threads when an HSA API call was made

## Details

- created thread_info struct for mapping different thread IDs
- reorganized file structure significantly
- added categories.hpp, concepts.hpp
- moved around name trait definitions
- moved all omnitrace components into `omnitrace::component` namespace
  - there was a lot of inconsistency b/t using `tim::component` in some places and `omnitrace::component`
  - added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
- OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
- roctracer and critical_trace use same thread pool
- critical_trace functions do not lock anymore bc of thread-local TaskGroup
- added `component::local_category_region` to support using `component::category_region` without explicitly passing in name
- removed `component::omnitrace` (unused)
- migrated KokkosP and OMPT to use `component::local_category_region`
  - removed `component::user_region` as a result
- migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
  - removed `component::functors` as a result
- migrated some ppdefs
- `api::omnitrace` -> `project::omnitrace`
- `api::(...)` -> `category::(...)`
- improved recording the execution time of threads
  - migrated this functionality out of pthread_create_gotcha and into thread_info
- moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
- split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
- sampling.cpp handles setup and post-processing that was formerly in backtrace
- updated logging to use colors
- `OMNITRACE_COLORIZED_LOG` config variable
- updated docs on JSON output from timemory
- instrumentation info in instrumentation subfolder
- added testing for KokkosP entries
- added testing for ompt entries
- add_critical_trace function defined in critical_trace.hpp
- disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
- add comp::page_rss to main bundle
- thread_data supports std::optional instead of std::unique_ptr
- thread_data supports tim::identity<T> to avoid unique_ptr or optional
- tracing::record_thread_start_time()
- tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
- removed anonymous namespace from omnitrace::utility
- sampling backtrace stores instruction pointers instead of strings
- component::category_region updates
  - handle disabled thread state
  - handle finalized state
  - fewer debug messages
  - invoke thread_init()
  - invoke thread_init_sampling()
  - handle push/pop count based on category
  - push/pop count only modified when used
- component::cpu_freq
- components/ensure_storage.hpp
- reworked the pthread_create replacement function
- updated parallel-overhead example to report # of times locked
- OMNITRACE_MAX_UNWIND_DEPTH build option
- update timemory submodule
2022-08-31 01:24:31 -05:00
Jonathan R. Madsen a1dcd1bc4b Static libstdcxx and python (#139)
Support python + static libstdc++
2022-08-28 03:56:13 -05:00
Jonathan R. Madsen 0dd8f52292 Generic comm_data component (#132)
* Generic comm_data component

- moved rccl_comm_data to comm_data
- comm_data includes communication data for MPI

* fix timemory include with quotes

* Only support MPI comm data with full MPI support

* Increase timeouts + kill perfetto

* Update timemory submodule

* Fix missing command killall

* set +e in Kill Perfetto workflow step

* Updated MPI example to include MPI_Send and MPI_Recv calls

* Update timemory submodule with storage merge fix

* Perfetto comm data

- tracing::now<T>() function

* Fix timemory header include
2022-08-25 19:48:10 -05:00
Jonathan R. Madsen 040da3fc6a Enable TRACE_THREAD_RW_LOCKS and TRACE_THREAD_SPIN_LOCKS by default (#136)
* Enable OMNITRACE_TRACE_THREAD_{RW,SPIN}_LOCKS by default

- updates timemory submodule with updated GOTCHA submodule
- fix to GOTCHA library which defaults to not wrapping dlopen and dlsym prevents deadlock

* Bump version to 1.4.0
2022-08-08 15:28:01 -05:00
Jonathan R. Madsen 34013bc539 OMNITRACE_TRACE_THREAD_SPIN_LOCKS config (#134)
- configuration setting to wrap pthread_spin_lock, pthread_spin_trylock, pthread_spin_unlock
2022-08-08 08:38:52 -05:00
Jonathan R. Madsen cbe8862ea4 Verbose messages based on ROCP_ONLOAD_TRACE env (#131)
Debugging based on ROCP_ONLOAD_TRACE env
2022-08-08 08:38:18 -05:00
Jonathan R. Madsen 49127db414 Fix some inconsistencies in debug messages w/in category_region (#135)
2022-08-08 08:38:04 -05:00
Jonathan R. Madsen 1343c6722a offset thread ids where possible (#130)
- configure known background threads to start indexing down from TIMEMORY_MAX_THREADS
- invoke sampling::shutdown() instead of just blocking signals where possible
2022-08-08 08:37:37 -05:00
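The thread-id offsetting in #130 above can be pictured as two counters sharing one index space: regular threads count up from zero, while known background threads count down from the top. This is a sketch of the idea only; `ThreadIndexer` is a hypothetical class, and starting at `max_threads - 1` (the last zero-based slot) rather than at TIMEMORY_MAX_THREADS itself is an assumption.

```python
class ThreadIndexer:
    """Assign thread indexes from both ends of a fixed range.

    Not thread-safe as written; a real implementation would need
    atomic counters or a lock.
    """

    def __init__(self, max_threads):
        self._next_up = 0                  # next index for regular threads
        self._next_down = max_threads - 1  # next index for background threads

    def assign(self, background=False):
        if self._next_up > self._next_down:
            raise RuntimeError("thread index space exhausted")
        if background:
            idx = self._next_down
            self._next_down -= 1
        else:
            idx = self._next_up
            self._next_up += 1
        return idx
```

Keeping background threads at the high end leaves the low indexes contiguous for application threads.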
Jonathan R. Madsen ac5ce00561 Update config generation, join fix, sampler (#129)
- update timemory submodule
  - update ring_buffer to remove all dynamic allocation
  - update sampler + sampler allocator to use ring-buffers
  - fix papi_array and papi_vector labels
- fix common::join with char delimiter
- fix config generation without --advanced
2022-08-08 07:03:24 -05:00
Jonathan R. Madsen afa3df8523 Advanced category for configuration options (#125)
Adds advanced category

- advanced category hides less relevant configuration options
- omnitrace-avail has new '--advanced' option which shows these flags
- increase verbosity level to print issue with reading ppid children
- OMNITRACE_ROCTRACER_HSA_ACTIVITY defaults to ON
- OMNITRACE_ROCTRACER_HSA_API defaults to ON
2022-08-03 12:13:00 -05:00
Jonathan R. Madsen b79ce10fee RPATH to rocprofiler_LIBRARY_DIR for ROCm < v5.2 (#126)
* RPATH to rocprofiler_LIBRARY_DIR for ROCm < v5.2

- until v5.2 only librocprofiler64.so was symlinked in /opt/rocm. Thus linker using SOVERSION caused issues finding librocprofiler64.so.1

* Test ROCm w/ CMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF

* INSTALL_RPATH_USE_LINK_PATH for omnitrace exe
2022-08-02 14:19:04 -05:00
Jonathan R. Madsen 97d17a8ef8 Fix RPATH handling (#122)
Fix rpath handling

- remove explicit add to CMAKE_INSTALL_RPATH
- remove overriding INSTALL_RPATH_USE_LINK_PATH for exes
2022-08-01 14:19:19 -05:00
Jonathan R. Madsen 7e31d9f450 ROCm environment fixes + workflow updates (#117)
* Improve dlopen of ROCm libraries + rocprofiler test

- Use PROJECT_BINARY_DIR in tests
- Added rocprofiler test

* Revert OMNITRACE_FORCE_ROCPROFILER_INIT

* omnitrace-avail --all test

* Fix ROCP_METRICS for ROCm 5.2.0

* Fix ROCP_METRICS for ROCm 5.2.0

* Restrict containers workflow to AMDResearch/omnitrace

* Bump version to 1.3.1

* Update cpack workflow

- generate release draft
- upload installers as release assets

* Test rocprofiler w/o roctracer enabled

* Fix formatting

* verbose message
2022-07-27 06:36:52 -05:00
Jonathan R. Madsen 45be03906a RCCL support (#93)
* Initial support for RCCL

* OMNITRACE_USE_RCCLP + sampling tweaks

- also OMNITRACE_SAMPLING_KEEP_INTERNAL option
- minor modifications to sampling to use keep internal option + discard funlockfile

* Update docker and workflows to download RCCL

* Update CPack DEB with rocprofiler dependency

* Rework rccl into library and library/components folder

- add tpls/rccl/rccl/rccl.h

* Fix timemory includes

* rcclp inline definitions when disabled

* Tweaks to ubuntu-focal-external-rocm

- disable ompt
- enable building testing

* Tweaks to ubuntu-focal-external-rocm

- ctest exclude

* Tweak ubuntu-focal.yml

- remove source /.../setup-env.sh, replace with $GITHUB_ENV

* Fix ubuntu-focal-rocm + OMPI + root

* Improved rocm-smi error handling

- Recover from rocm-smi errors
- Disabling rocm-smi after recovering from errors
- Werror in developer mode
- Remove State::DelayedInit
- Add State::Disabled

* formatting

* Fix merge of OMNITRACE_SAMPLING_KEEP_INTERNAL

* Update RCCL include directory

- based on ROCm version, we need either <rccl/rccl.h> or <rccl.h>

* RCCL Testing

- updated tests to use configuration files
- many tests generate a configuration file
- tests now have GPU option
- enable ncclCommCount, disable ncclGetVersion
- add testing for RCCLP via rccl-tests
- working directory of tests is PROJECT_BINARY_DIR
- add nccl/rccl functions to get_whole_function_names
- some clang compiler fixes

* Handle RCCL include w/o HIP

* RCCL requires HIP

* Update OMNITRACE_SAMPLING_CPUS for testing

* Update tests/CMakeLists.txt

* Debug settings

* Install MPI even when USE_MPI=OFF

* exclude printf

* skip mpi tests w/o USE_MPI or USE_MPI_HEADERS

* update ubuntu rocm workflow

* Fix configure env step for ubuntu rocm
2022-07-25 12:16:11 -05:00