* Roctracer wall clock integration (#16)
* Integrates roctracer values into wall-clock
* Fixed scoping + timemory roctracer
* Fixed data race in roctracer
* Synchronized HIP API on main thread
- Cache HIP activity callbacks and execute them on the main thread
- Minor updates to transpose
* Debugging + MPI + transpose updates
* PTL + HSA and timemory + kernel timing
- PTL usage fixed HSA + timemory issues because it gives us control over thread destruction
- Fixed laps counting in roctracer callbacks
* Ignore select HIP API types
- These API types are ignored because there appears to be a bug which causes the "end" callback to be labeled as "begin":
- hipDeviceEnablePeerAccess
- hipImportExternalMemory
- hipDestroyExternalMemory
* Tweaks to PTL config
* Timemory update + pid-prefix w/ MPI headers
- %pid%- prefix with MPI headers
- timemory submodule update
* CMake + critical trace + reorganize library source
- clang-tidy tweaks
- cmake function updates to use hosttrace_ prefix
- update gitignore
- cmake HOSTTRACE_MAX_THREADS option
- Formatting.cmake
- cleaned up MacroUtilities.cmake
- PTL submodule + usage
- tweak to Findroctracer.cmake
- MT transpose
- Updated PTL submodule
- Updated timemory submodule
- fix to hosttrace return value type if type not found
- reorganized library source code
- support for critical trace
* Remove bits/stdint-uintn.h headers
* Rename + config + depth + critical path
- rename hosttrace_timemory_data to instrumentation_bundles
- rename hosttrace_bundle_t to main_bundle_t
- rename bundle_t to instrumentation_bundle_t
- rework of configuration setup
- critical_trace write directly to file option
- tweaked depth calculation
- updated timemory submodule
- improved parallel support in roctracer callbacks
- working critical_trace
- perfetto device-critical-trace and host-critical-trace categories
- made transpose example parallel
- made parallel-overhead example a bit uneven
- relocated LTO activation
* Fixed duplicates in perfetto critical-trace
* reworked critical trace support
- substantial perf improvement (30-45 min -> 30 sec)
- changes to configuration (new and removed options)
* Removed "%pid%-" output prefix in mpi_gotcha
* Update timemory submodule
* Fixed Dyninst TBB symbolic links + bump to v0.0.2
* hosttrace exe and library updates + submodule updates
- Updated dyninst submodule with TBB build ORIGIN rpath
- Updated timemory submodule
- Dyninst build with shared libs
- Dockerfile for building packaging
- Disable hidden visibility in examples
- parallel-overhead max parallelism
- query_instr in hosttrace
- different file-line info format
- full module names
- minor fix to MPI support
- disable instrumentation stack frames by default
- disable trap instrumentation by default
- updated hosttrace output file dumps
- removed cstdlib option
- dyninst DebugParsing option
- improved instrument_module function
- fixed some MPI support
- tweaked some testing parameters
* Continuous integration
* linux-ci on
* fix for parallel-overhead
* Updated CMAKE_INSTALL_PREFIX
* Updated installed boost libraries
* Dyninst updates fixing TBB install
* timemory + dyninst submodule updates
- fixes some timemory package option handling
- fixes dyninst libiberty handling
* Update dyninst submodule with libiberty fix
* dyninst submodule update with TBB internal build fixes
* Updated linux-ci + tests + timemory + dyninst
- updated timemory submodule
- update dyninst submodule
- delay OnLoad implementation
- DYNINST_RT_API handling improvement
- CI for ubuntu-bionic in addition to focal
- CTest in CI
* Update TBB handling in CI
* Update dyninst with symLite fix
* Update dyninst submodule with ElfUtils-External fix
* Dyninst::ElfUtils fix
* Modified dyninst submodule TPL install
* Update dyninst submodule with improved interface libraries
* Fix to Dyninst::ElfUtils in dyninst submodule
* Updated CI build matrix + test install
* Updated CI
* CI stage updates
* Tweak
* Dyninst updates + RPATH for hosttrace exe
- hosttrace will rpath to dyninst
- Dyninst will statically link to boost by default
- Fixes to double initialization w/ MPI + runtime instrumentation
- minor cleanup to hosttrace
- hosttrace help exits with a zero exit code
- CI updates
* Dyninst + CI updates
* Removed ELFUTILS_BUILD_STATIC option in Dyninst
* Dyninst + visibility + roctracer + tests
- Dyninst submodule updates with dynamic pcontrol lib
- hosttrace visibility is hidden
- improved handling of DYNINST_API_RT in hosttrace
- roctracer::tear_down in finalize fixes issues with timemory
- throw an error if the perfetto output file cannot be created
- roctracer_default_pool() safety guards
- resume calling roctracer_close_pool()
- fixed working dir of tests
- fixed env of tests (cmake and CI)
- simplified CI
* Removed stray CI if condition
* Disable hidden visibility for now
* ifdef for roctracer::tear_down + reenable hidden
* Better CI environment handling + dyninst packaging
* Fix for dyninst-package CI
* Fixes for CMake 3.15 issues with aliasing imported libraries
* Fix to ubuntu-focal-dyninst-package
* Dyninst updates for packaging
- fixes issues with hard-coded paths to libraries after relocation via package installer
* Restrict CI to main and develop branches
- hosttrace library automatically collects and merges timestamps for HIP API calls and kernels with the host-side instrumentation
- mostly eliminates the need for using external rocprof
- added thread_instruction_count in perfetto output
- increased hosttrace min_loop_address_range to 512
- disabled instrumenting functions with dynamic callsites by default
- miscellaneous cmake updates
* roctracer support
- fully integrated perfetto + roctracer outputs
- thread_instruction_count in perfetto
- increased min_loop_address_range to 512
- disabled instrumenting functions with dynamic callsites by default
- updated timemory submodule
* hosttrace_launch_compiler
- support for using an alternative compiler as needed via launch compiler
- elfio added as submodule (not currently used)
- miscellaneous cmake updates
* README update + host/device categories + misc
- timemory fix for TIMEMORY_ROCTRACER_ENABLED
- transpose fix
* papi_tuple_t -> papi_tot_ins
- minor fix to Findroctracer.cmake
- tweaked some CMake option names
- moved merge-trace.jl to hosttrace-merge.jl
- removed Windows line endings from hosttrace-merge.jl
- improved handling of !perfetto and !timemory
* various tweaks
* build updates + cleanup + overlap guard + min addr range
* Library source reorg + miscellaneous tweaks
* Removed unnecessary fwd decls
* Print address range in --print-X pair mode
- hosttrace modifications
- disable instrumenting functions with overlapping sections or multiple entry points by default (control via --allow-overlapping option)
- disable instrumenting functions whose address range is < 512 bytes by default, unless the function contains a loop (control via --min-address-range option)
- disable instrumenting functions w/ loops whose address range is < 64 bytes (control via --min-loop-address-range option)
- Support for wrapping MPI function calls even in binary rewrite mode
- i.e. use gotcha to wrap MPI functions with hosttrace_push_trace and hosttrace_pop_trace
- New timemory-only mode --> HOSTTRACE_USE_TIMEMORY=ON
- New timemory + perfetto mode --> HOSTTRACE_USE_PERFETTO=ON + HOSTTRACE_USE_TIMEMORY=ON
- Full support for all timemory components
- parallel-overhead example for measuring the instrumentation overhead in a multi-threaded application with very small instrumented functions
- improvements to output directories for hosttrace exe
- improvements to output directories for hosttrace library
- new hosttrace options
- --print-instrumented <type> prints out the instrumented entities and exits
- --print-available <type> prints out the available instrumentation entities and exits
- --print-overlapping <type> prints out the overlapping entities and exits
- NOTE: <type> above refers to the information printed out, e.g. module name vs. function name vs. module and function name, etc.