f4e27d8aeec3595130362fcb0be99bf86aa74ce8
11 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
f6241af5ee |
User api updates (#32)
* Update invoke.hpp
* Update OMNITRACE_FUNCTION
* Update library debug messages
* ptl verbosity
* Update timemory submodule
* mpi_gotcha calls omnitrace_finalize_hidden
* omnitrace_{push,pop}_region returns error code
* omnitrace-user updates
- doxygen documentation
- omnitrace_get_user_callbacks
- omnitrace_user_error_string
- omnitrace-user functions return error codes
* Update user-api example
* Tweak to workflows and tests
* Fix for OMNITRACE_FUNCTION
- conditional impl if __GNUC__ < 9
* focal-external-rocm workflow update
|
||
|
|
138d16d16a |
Split workflows + docker usage (#31)
* Split workflows + docker usage
* Fix omnitrace-ci-ubuntu-focal-external
* fix env
* Update path to action
* fix entrypoint
* Updated cancelling, disabled formatting
* fix entrypoint
* rework
* try using container
* relocate container
* fix image name
* shell expand
* external and external-rocm
* install libopenmpi-dev
* remove github.workspace
* github.workspace for rocm
* Update bionic, etc. + docker CI
* Remove self-hosted + bionic fix
* GIT_DISCOVERY_ACROSS_FILESYSTEM for bionic
* TIMEMORY_INSTALL_LIBRARIES + exe RPATH updates
- fix RPATH for omnitrace, omnitrace-avail, and omnitrace-critical-trace
* ubuntu bionic update
* bionic and focal-dyninst-package updates
* Disable lulesh MPI by default + timeouts
- increase openmp CG timeout
- decrease openmp CG runtime
|
||
|
|
d80752bc69 |
User API + reorganized lib folders (#30)
* User API + reorganized lib folders
- omnitrace_user_start_trace
- omnitrace_user_stop_trace
- omnitrace_user_start_thread_trace
- omnitrace_user_stop_thread_trace
- omnitrace_user_push_region
- omnitrace_user_pop_region
* New OpenMP examples/tests
* Fix to KokkosP
* OMPT support
- fixed omnitrace instrumenting reporting
- common invoke improvements
- component::user_region
* exclude kmp_threadprivate_
* Separate omnitrace into multiple files
* PTL and timemory submodule updates
* Active guards + USE_OMPT guards in omnitrace-dl
* Tweak transpose default iterations
* omnitrace-precommit build target
* Omnitrace exe restructuring pt 2
- Never instrument functions with less than 4 instructions
- Never instrument ompt_start_tool or nanosleep
- module_function serializes heuristics
- removed hash stuff from omnitrace
- removed instr_procedures lambda
- WAITPID_DEBUG_MESSAGE
* set_state, "_hidden" fix, CI exceptions, backtrace fix
- set_state function
- fixed "_hidden" from appearing in print macros using __FUNCTION__
- OMNITRACE_CI_THROW
- more CI checks in library
- fixed backtrace init value sample issue being ignored
* Tweaks to OMPT tests
* cmake-formatting
* Removed debug output from backtrace processing
* Fix warnings and verbosity
* omnitrace-dl fix for libomp
* omnitrace-avail fixes
- remove second omnitrace_init_library call
- fix -r option not working
* Additional testing
- source/bin/tests
- tests for omnitrace-exe
- tests for omnitrace-avail
* cmake-format
* Reduce runtime of openmp-lu
* Update openmp-lu and tests timeout
* openmp-lu and CI tweaks
- decrease iterations
- OMP_NUM_THREADS=2
- install clang and libomp-dev in linux-ci
- fix data-files in linux-ci
|
||
|
|
0d5c557552 |
Stability improvements (#26)
* omnitrace verbprintf and errprintf
* avail categories fix
* omnitrace-dl namespace
* OMNITRACE_CI macro / OMNITRACE_BUILD_CI option
- always enables asserts
* Roctracer improvements
- Reworked roctracer significantly
- Added categories to settings
- create_cpu_cid_entry
- handle clock_skew in roctracer
- fixed roctracer activity names
- hip_api_callback is "host"
- perfetto::Flow for GPU
* timemory submodule update
* Tweak to redirect
* Improved recursive guards
- functors component
- created "_hidden" variants of instrumentation funcs
- omnitrace_* calls omnitrace_*_hidden
- omnitrace-dl calls non-hidden
- omnitrace-dl now strongly protects against recursion
- omnitrace-dl now is standalone w.r.t. headers
* Stability fixes
- OMNITRACE_DEBUG_PUSH env variable
- fix to HSA_TOOLS_LIB in dl.cpp
- Fixed SFINAE warning in mpi_gotcha
- Handle 64, _l, _r extensions in whole function names
* cmake formatting
* Fix for last commit + push/pop count info
- don't instrument rocr::core::Signal::WaitAny
- don't instrument rocr::core::Runtime::AsyncEventsLoop
- fixed main not being popped in runtime instrument
- updated interval data reserve
- copy hash-ids and aliases onto main thread
- warn about unclosed regions
- removed guards in libomnitrace
- added error checks for incorrect push_count vs. pop_count
- fixed missing pop_timemory in last commit
* Finalization methodology updates
- added some more rocr:: functions to whole function names
* Add event_base_loop to whole functions
* Update VERSION to 0.1.0
|
||
|
|
145a6ae06f |
omnitrace-dl-library (#25)
* timemory submodule update
* Visibility, setting categories, and task-group protection
- OMNITRACE_VISIBILITY instead of TIMEMORY_VISIBILITY
- increased task group data-race protection
- add omnitrace categories to settings
* set component_apis type-trait
* omnitrace-dl-library implementation
- this library dlopen + dlsym's libomnitrace
- significantly reduces the instrumentation time
* omnitrace-avail categories
- suppress AVAILABLE column when --available
* omnitrace-exe update
- uses omnitrace-dl
- adds --print-excluded option
- removes --jump option
- comments out --stubs option
- removes --stdlib option
- support for C++ STL functions not in libstdc++
- tweak the --print-* outputs
- significantly refactors instrument_module and instrument_entity
- removes unused c_stdlib_module_constraint
- removes unused c_stdlib_function_constraint
- decreases get_whole_function_names() coverage
* library.cpp updates
- OMNITRACE_DEBUG -> OMNITRACE_DEBUG_F
- omnitrace_finalize sets state earlier
- omnitrace_finalize clears push/pop functors
- increased tasking shutdown safety
* - fix critical-trace thread hierarchy
- signal handler calls omnitrace_finalize
- get_cpu_cid_stack supports parent tid
- interval data reserves
- omnitrace-avail serialization support for module_functions
- omnitrace --simulate option
- omnitrace --print-format option
- omnitrace --load-instr option
- omnitrace runtime-inst doesn't oneTimeCode
- updated regex
- expand get_whole_function_names()
- Test Install CI update
* fixes to last commit
- expand get_whole_function_names()
- ignore sig c modules
- kill process in signal handler
* Remove RTLD_DEEPBIND + more
- removed use of RTLD_DEEPBIND
- causes dyninst segfaults
- fixed signal handling
- updated timemory submodule
* Build/link static timemory libraries
* omnitrace --{module,function}-restrict option
- Added restrict regex options
- Reworked handling of regex options
- Reworked reporting of module/function skipping
- Handle -o w/o file specified
* timemory-avail
- category views
- backtrace::sample checks state
* get_debug_sampling()
|
||
|
|
b016c8929f |
Critical trace updates (#24)
* Source code restructuring
* Critical trace updates following restructuring
* thread_sampler, timestamps
- thread_sampler
- CPU frequency managed via thread_sampler
- rocm-smi managed via thread_sampler
- Use consistent timestamps for perfetto
- removed hsa_timer_t in favor of wall_clock::record()
- disable KokkosP by default
- re-enable critical-trace testing
* cmake-format
* Fix for defines.hpp.in
* Remove OMNITRACE_ROCM_SMI_FREQ
- thread_sampler freq is set via OMNITRACE_SAMPLING_FREQ w/ max of 1000
* Increase CI Install Dyninst timeout
* Debug macros + omnitrace_init_tooling + config
- new debug macros
- extern "C" omnitrace_init_tooling
- guard get_rocm_smi_devices
* Miscellaneous tweaks
- tweak to transpose
- critical_trace::Device::ANY
- perfetto "critical-trace" category
- OMNITRACE_VERBOSE usage
* Disable key and tid data for HIP API calls
- non-kernels are ignored in activity callback
* critical-trace exe updates
- fix perfetto generation
- improved logging
- improved readability
* timemory submodule update
- lulesh example cmake tweaks
|
||
|
|
39f17ae8b8 |
rocm-smi and KokkosTools support (#23)
* renamed omnitrace_thread_data to thread_data
* initial implementation
* Numerous fixes and updates
- Updated timemory submodule
- Updated perfetto submodule (pulls in fixes for TRACE_EVENT)
- pthread_gotcha only after omnitrace_init_tooling
- omnitrace banner
- config settings for rocm-smi freq and devices
- critical_trace::get_entries
- OMNITRACE_BASIC_PRINT
- rocm_smi perfetto category
- redirect roctracer warnings for ROCm 4.5.0
- property specializations for rocm-smi components
- units fixes data_tracker types
- roctracer entries for pthread_create and start_thread
- omnitrace-avail defaults to settings, not components
- settings have conforming names
- settings warn about duplicates
- ptl named threads
- decreased max freq for sampler SIGALRM
- rocm-smi names thread
- rocm-smi avoids call to hipGetDeviceCount
- name roctracer activity callback threads
- fixed binary rewrite test output names
* Update lulesh example
- supports non-UVM GPU
* Lulesh tweaks + formatting
* KokkosP + Mode + Roctracer sampling deadlock fix
- kokkosp support
- omnitrace_init_library
- config::print_settings()
- config::get_mode()
- omnitrace::Mode
- omnitrace-avail improvements (removes settings)
- handle get_verbose() < 0
- disable dyninst InstrStackFrames by default
- handle perf_event_paranoid > 1 by disabling PAPI
- SIGALRM max freq to 5.0
- Name threads
- rocm-smi handles get_use_perfetto() and get_use_timemory()
- HSA_ENABLE_INTERRUPT=0 when roctracer + sampling (fixes deadlock)
* Tests, API renaming, roctracer
- disable renaming of thread 0
- verbprintf_bare
- enable dyninst merge tramp
- tweaked some omnitrace exe verbose levels
- reworked roctracer::setup and roctracer::shutdown
- rocm_smi::data::poll checks get_state()
- omnitrace_trace_finalize -> omnitrace_finalize
- omnitrace_trace_init -> omnitrace_init
- omnitrace_trace_set_env -> omnitrace_set_env
- omnitrace_trace_set_mpi -> omnitrace_set_mpi
- sampling mode does not disable timemory
- disable roctracer before shutting down rocm-smi
- lulesh tests w/ and w/o kokkosp
- lulesh tests for perfetto only
- with --dynamic-callsites --traps --allow-overlapping
- lulesh tests for timemory only
- with --stdlib --dynamic-callsites --traps --allow-overlapping
* Update timemory submodule
- fix for TIMEMORY_PROPERTY_SPECIALIZATION
* get_verbose() handling + timemory submodule update
- Findroctracer.cmake uses find_package(hsakmt)
* Stability fixes + rework roctracer + perfetto
- reworked roctracer start up
- critical_trace perfetto basic values
- perfetto sampling category
- sampler checks signals
- peak_rss in sampling
- pthread_gotcha::shutdown()
- rocm_smi::device_count()
- HSA_TOOLS_LIB is set
- HSA_ENABLE_INTERRUPT in omnitrace exe
- omnitrace exe verbosity level changes
- Avoid instrumenting Impl ns in Kokkos
- gpu::device_count prefers rocm_smi instead of hip
- ptl blocks signals
- fixed pthread_gotcha roctracer_data values
- removed runtime-instrument-sampling tests
- timemory submodule update
* cmake formatting
* timemory + roctracer updates
- fix timemory issue with papi_common
- fix timemory issue with units
- define roctracer::is_setup()
* Miscellaneous tweaks
- Disable sampling during runtime instrument
- Fixed warnings about dynamic callsites
- Fixed backtrace output when timemory disabled
- Test tweaks
* cmake-format
* omnitrace_target_compile_definitions
* timemory submodule update
* config, omnitrace, State, mpi_gotcha updates
- use OMNITRACE_THROW instead of direct throw
- is_attached()
- is_binary_rewrite()
- get_is_continuous_integration()
- get_debug_init()
- get_debug_finalize()
- max_thread_bookmarks default to 1
- State::Init
- app_thread oneTimeCode
- runtime instrumentation uses waitpid
- fixed init_names
- include main in MPI runs
- fixed sampling setup when disabled
- reworked mpi_gotcha
- disabled critical trace in transpose test
* cmake-format
* handle rocm_smi::device_count() exception
* CI timeouts
* Re-enable runtime-instrument + sampling
|
||
|
|
778af2a760 |
Sampling support + testing + omnitrace namespace (#19)
* omnitrace namespace
* Kokkos + Lulesh example/tests
* Sampling support + more
- OMNITRACE_BUILD_TESTING option
- sampling support
- pthread_gotcha
- fixes to labels for mpi_gotcha, fork_gotcha, omnitrace_component
- tasking::block_signals, tasking::unblock_signals
- instrumentation mode option in omnitrace exe
- argument option groups in omnitrace exe
- categories in omnitrace settings
- remove TIMEMORY_ prefixed options
* Release workflow updates
* Updated settings printing
* Fixed defaults in README
* Tweak setting defaults in README
* CMake fixes
* cmake-format
* clang-format
* LULESH_USE_MPI OFF
* LULESH_USE_MPI fix
* timemory add_secondary fix
* timemory ambiguous internal namespace fix
* Update timemory submodule
* Handle output path/prefix in omnitrace
- updated timemory
- updated test environment
* sampling + papi fix
* Fix to sampling without PAPI
* Fix for using too many processors in CI
* formatting
* Updated CI
- minor cmake tweaks
- updated timemory submodule
* Updated CI
* Updated CI
* CI + timemory updates
- data race fixes
* CI updates + debug for sampling
* Sampling updates
- moved tasking::{block,unblock}_signals to sampling namespace
- improvements to sampling w.r.t. thread-locality
* Minimum OMNITRACE_THREAD_COUNT of 128
* Handle multiple dims in sampler data
* Configure libunwind support for timemory
* Improved safeguards for sampling
- updated CI
- lulesh runtime-instrument test tweak
* formatting
* CI updates + sampler updates + misc
- fixed stack-buffer-overflow in omnitrace (get_*file_line_info)
- test labels
- steady_clock instead of system_clock in sampler
- update dyninst submodule with upgradePlaceholder fix
- disable OMNITRACE_BUILD_TESTING by default
* Updated timemory submodule
- hidden visibility for timemory
- storage finalizers do not capture this
* Update timemory submodule
- component visibility updates
* Reworked header includes
- use <...> for timemory headers
- always include <library/defines.hpp>
* Rename some config options
* Update PTL submodule
* Update kokkos submodule
* Updated sampling
* Updated CI
* Reworked instrumentation exe
- lowered min-address-range threshold to 256
- extended whole function exclude
* CI fix + timemory submodule update
- TIMEMORY_VISIBLE on component base
- RelWithDebugInfo -> RelWithDebInfo
- Info output for parallel-overhead
* Sampling flags + transpose update + CI update
- disable critical trace for parallel-overhead in CI
- SA_RESTART only in sampler
- reworked transpose example to use fewer threads
* CI update
- removed ubuntu-focal-external-debug
- reduced data artifacts upload
* CI timeouts
- updated timemory submodule
- minor tweaks to omnitrace exe logging
* LICENSE updates (partial)
* CI Test stage timeout extension
* Docker and Packaging updates
* Miscellaneous fixes/tweaks
- gpu.hpp / gpu.cpp
- disable roctracer component if no devices
- re-enable InstrStackFrames by default
- disable sampling by default
- pthread_gotcha::m_enable_sampling is false by default
- timemory submodule update w/ sampler and pop(tid) updates
- fix minor bug in sampler logic
- CMake: OMNITRACE_USE_HIP option
- roctracer + timemory fix
* Replaced OMNITRACE_USE_ROCTRACER with OMNITRACE_USE_HIP where appropriate
* cmake format
* Sampler deadlock fixes
* Removed debug messages from sampler
* Fix for MPI detection + test tweaks + misc
* Sampler deadlock fixes + misc
- removed papi_tot_ins
- pthread_gotcha blocks signals globally until sampler is setup
- metadata specialization for sampling components
- OMNITRACE_INSTRUMENTATION_MODE -> OMNITRACE_MODE
- default sampling delay increased to 0.05 from 1.0e-6
- removed {block,unblock}_signals from critical_trace and ptl
- no longer necessary to use
- sampling delay minimum is 1.0e-3
- OMNITRACE_BUILD_HIDDEN_VISIBILITY
* omnitrace-avail + libunwind update + restructure
- restructured omnitrace components
- build custom omnitrace-avail executable
- updated libunwind to avoid malloc in get_unw_backtrace
* Fix remaining reorganization issues
- removed some duplicate code
- fixed some trait specializations after implicit instantiation
- formatting
* ensure_storage fix + avail improvements
- fix ensure_storage when component not avail
- suppress irrelevant info in omnitrace-avail
* Delay settings initialization
- slight tweak to tests w/ MPI
* Disable OpenMPI testing w/ ubuntu-bionic
- MPI testing is hanging because of a network interface issue on the system:
> [[20462,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> Module: OpenFabrics (openib)
> Host: fv-az19-371
> Another transport will be used instead, although this may result in
> lower performance.
> NOTE: You can disable this warning by setting the MCA parameter
> btl_base_warn_component_unused to 0.
|
||
|
|
39cf760a4e | Rename hosttrace to omnitrace (#18) | ||
|
|
6c93674f92 |
cmake-format + miscellaneous tweaks (#13)
* cmake-format + miscellaneous tweaks
* Formatted cmake in examples and tests
* Updated linux-ci.yml artifacts naming
* Updated clang-format
* Fixed submodule branches
|
||
|
|
35ab6b0110 |
GitHub CI (#11)
* Continuous integration
* linux-ci on
* fix for parallel-overhead
* Updated CMAKE_INSTALL_PREFIX
* Updated installed boost libraries
* Dyninst updates fixing TBB install
* timemory + dyninst submodule updates
- fixes some timemory package option handling
- fixes dyninst libiberty handling
* Update dyninst submodule with libiberty fix
* dyninst submodule update with TBB internal build fixes
* Updated linux-ci + tests + timemory + dyninst
- updated timemory submodule
- update dyninst submodule
- delay OnLoad implementation
- DYNINST_RT_API handling improvement
- CI for ubuntu-bionic in addition to focal
- CTest in CI
* Update TBB handling in CI
* Update dyninst with symLite fix
* Update dyninst submodule with ElfUtils-External fix
* Dyninst::ElfUtils fix
* Modified dyninst submodule TPL install
* Update dyninst submodule with improved interface libraries
* Fix to Dyninst::ElfUtils in dyninst submodule
* Updated CI build matrix + test install
* Updated CI
* CI stage updates
* Tweak
* Dyninst updates + RPATH for hosttrace exe
- hosttrace will rpath to dyninst
- Dyninst will statically link to boost by default
- Fixes to double init w/ MPI + runtime instrumentation
- minor cleanup to hosttrace
- hosttrace help exits with zero
- CI updates
* Dyninst + CI updates
* Removed ELFUTILS_BUILD_STATIC option in Dyninst
* Dyninst + visibility + roctracer + tests
- Dyninst submodule updates with dynamic pcontrol lib
- hosttrace visibility is hidden
- improved handling of DYNINST_API_RT in hosttrace
- roctracer::tear_down in finalize fixes issues with timemory
- throw error if cannot create perfetto output file
- roctracer_default_pool() safety guards
- resume calling roctracer_close_pool()
- fixed working dir of tests
- fixed env of tests (cmake and CI)
- simplified CI
* Removed stray CI if condition
* Disable hidden visibility for now
* ifdef for roctracer::tear_down + reenable hidden
* Better CI environment handling + dyninst packaging
* Fix for dyninst-package CI
* Fixes for cmake 3.15 issues with aliasing imported lib
* Fix to ubuntu-focal-dyninst-package
* Dyninst updates for packaging
- fixes issues with hard-coded paths to libraries after relocation via package installer
* Restrict CI to main and develop branches
|