- configure known background threads to start indexing down from TIMEMORY_MAX_THREADS
- invoke sampling::shutdown() instead of just blocking signals where possible
Adds advanced category
- advanced category hides less relevant configuration options
- omnitrace-avail has new '--advanced' option which shows these flags
- increase verbosity level to print issue with reading ppid children
- OMNITRACE_ROCTRACER_HSA_ACTIVITY defaults to ON
- OMNITRACE_ROCTRACER_HSA_API defaults to ON
* RPATH to rocprofiler_LIBRARY_DIR for ROCm < v5.2
- until v5.2 only librocprofiler64.so was symlinked in /opt/rocm. Thus linker using SOVERSION caused issues finding librocprofiler64.so.1
* Test ROCm w/ CMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF
* INSTALL_RPATH_USE_LINK_PATH for omnitrace exe
* Initial support for RCCL
* OMNITRACE_USE_RCCLP + sampling tweaks
- also OMNITRACE_SAMPLING_KEEP_INTERNAL option
- minor modifications to sampling to use keep internal option + discard funlockfile
* Update docker and workflows to download RCCL
* Update CPack DEB with rocprofiler dependency
* Rework rccl into library and library/components folder
- add tpls/rccl/rccl/rccl.h
* Fix timemory includes
* rcclp inline definitions when disabled
* Tweaks to ubuntu-focal-external-rocm
- disable ompt
- enable building testing
* Tweaks to ubuntu-focal-external-rocm
- ctest exclude
* Tweak ubuntu-focal.yml
- remove source /.../setup-env.sh, replace with $GITHUB_ENV
* Fix ubuntu-focal-rocm + OMPI + root
* Improved rocm-smi error handling
- Recover from rocm-smi errors
- Disabling rocm-smi after recovering from errors
- Werror in developer mode
- Remove State::DelayedInit
- Add State::Disabled
* formatting
* Fix merge of OMNITRACE_SAMPLING_KEEP_INTERNAL
* Update RCCL include directory
- based on ROCm version we need with <rccl/rccl.h> or <rccl.h>
* RCCL Testing
- updated tests to use configuration files
- many tests generate a configuration file
- tests how have GPU option
- enable ncclCommCount, disable ncclGetVersion
- add testing for RCCLP via rccl-tests
- working directory of tests is PROJECT_BINARY_DIR
- add nccl/rccl functions to get_whole_function_names
- some clang compiler fixes
* Handle RCCL include w/o HIP
* RCCL requires HIP
* Update OMNITRACE_SAMPLING_CPUS for testing
* Update tests/CMakeLists.txt
* Debug settings
* Install MPI even when USE_MPI=OFF
* exclude printf
* skip mpi tests w/o USE_MPI or USE_MPI_HEADERS
* update ubuntu rocm workflow
* Fix configure env step for ubuntu rocm
* exit gotcha + remove DelayedInit state + cleanup
- exit gotcha which wraps exit, quick_exit, abort
- minor refactor of mpi gotchas
- removed some redundant code in omnitrace_finalized_hidden
- exclude instrumenting functions starting with dlopen and dlsym
- exclude instrumenting exit, quick_exit, and abort functions
- update timemory submodule with support for new gotcha_invoker with (gotcha_data, <function pointer>, args...)
* Improved rocm_smi error handling
* Fix reliability when KOKKOS_PROFILE_LIBRARY is set in env
- in certain situations, an exe using kokkos may be instrumented
- this will cause libomnitrace to be dlopened via libomnitrace-dl
- if KOKKOS_PROFILE_LIBRARY is set to libomnitrace and not libomnitrace-dl, you will end up with different instances of libomnitrace trying to collect data
* Set OMNITRACE_MAX_THREADS=32 in CI
* Added new tests validating gotcha wrappers
* Update MPI example to use thread
* Tweaks to mpi-flat test and mpi_gotcha
- enabled MPI_Comm_size and MPI_Comm_rank in mpip so disabled them at runtime
- set test to collapse threads and processes
* Tweak to test and example
- mpi test sets GOTCHA_DEBUG=1 in env
- removed checking for MPI_{Comm_dup,Comm_group,Group_incl}
- tweaked tests so pthread_join is where it is expected
* tweak starting sampling on main thread
- rename omnitrace::common::utility to omnitrace::common::path
- tracing::thread_init_sampling does not start sampling on main thread
* Cancel sampling itimers + timemory submodule update
- updates timemory submodule with support for cancelling itimer and SIGRTMIN through SIGRTMAX
* fix omnitrace print-* with libraries
* timemory submodule update
* Update workflows to use ./bin/omnitrace instead of ./omnitrace
* cmake format
* update timemory submodule
- fix ODR violations in utility/procfs
* cmake updates
- uniform find_package for all ROCm-based libraries
* tweak transpose example
- throw exception instead of std::exit
* Inspect cmdv name before assuming not exe
- some ELF execs "think" they are libraries so only assume rewrite + simulate + all-functions if filename looks like library
- adds some test for --print-available -- <library>
* Fix _has_lib_prefix when command is < 3
* Updates and reverts to omnitrace exe
- update module_function operator< and operator==
- add function_signature operator<
- refactor module_function ctor
- revert some previous changes w.r.t. simulate and include_unninstr
* Fix source/bin/tests to use same output dir as tests
* cmake format
* Segfault mitigation + refactor + modify function iteration
- refactor module_function ctor to avoid segfaults
- string_t -> std::string
- replace std::string with std::string_view in some places
- get_name(module_t*)
- get_name(procedure_t*)
- disable using both app_modules and app_functions
- new option: --parse-all-modules to iterate over app_modules
- removed some unused code w.r.t. debug info
* Disable module_function address range for uninstrumentable functions
* Disable module_function address range for uninstrumentable functions
* Refactored getting file/line info and init/fini
- use dyninst insertInitCallback and insertFiniCallback if main not found
- fixed all issues with segmentation faults in --simulate --all-functions
* revert changes to Findrocprofiler.cmake
* Initial support for GPU hardware counters
* Update find modules for roctracer and rocprofiler
- /opt/rocm/{rocprofiler,roctracer} path is deprecated so tweak search procedure
* Improve ConfigCPack for MPI
* Update rocprofiler
- rocm_metrics()
- minor cleanup
* Update rocm find modules
* declare rocm_metrics + call in omnitrace-avail
* relocate omnitrace-launch-compiler
* REALPATH and find_modules
* Examples cmake (may drop)
* omnitrace-avail
- hw_counter categories
- init rocm
* setenv updates for rocprofiler in library.cpp and dl.cpp
* get_rocm_events config
* gpu::hip_device_count()
* rocm_metrics returns hardware_counters::info
* - relocated library/components/roctracer_callbacks.* to library/roctracer.*
- relocated library/components/rocprofiler.* to library/rocprofiler.*
- cleaned up rocprofiler.hpp
- added perfetto output of rocprofiler
- added timemory output of rocprofiler
- renamed omni.roctracer thread to roctracer.hip
- added roctracer.hsa thread name
- updated timemory submodule to support std::variant
- updated timemory submodule to support = in config value
- updated timemory submodule to support standalone storage
- updated timemory submodule to support new hw counter apis
- updated timemory submodule to prevent label/description caching in data_tracker
* update omnitrace-avail info_type generation
* Update timemory submodule
* rocprofiler component
* cmake formatting
* omnitrace-avail handle no GPUs
- Add -c command-line option for --categories
- support verbosity
* hsa_rsrc_factory throws exceptions
- throw exceptions to avoid aborting on HSA_STATUS_ERROR_NOT_INITIALIZED when advantageous
- removed duplicate specialization of is_available for component::rocprofiler
* rocprofiler symbols for when disabled
* Fix warning in omnitrace-avail
- std::stringstream from initializer list would use explicit constructor
* Fix finalization after settings are deleted
* Reorganized rocprofiler source
* Updated formatting
* Miscellaneous tweaks
- added using statements from timemory
- tweaked the main and thread bundle names
- fixed timemory header includes
* Improved sampling performance
* Sampling tweaks
- backtrace::get() returns vector of string_view
- further performance improvements
- tweaked _use_label
- wrapped samples in perfetto "samples [omnitrace]" block
- samples in TID=0 in sampling mode are in separate thread row
* Fix empty HW counter desc + category for sampling
- fallback to to metric name if papi event info description not found
- add perfetto sampling category
* Limit the SIGALRM frequency
- found when using ROCm-enabled OpenMPI with rocHPL
- when wrapping pthread_rwlock_rdlock, pthread_rwlock_wrlock, and pthread_rwlock_unlock, omnitrace has been found to deadlock for some unknown reason
- New configuration variable: OMNITRACE_TRACE_THREAD_RW_LOCKS which defaults to false
- handle OMNITRACE_ENABLED=OFF by disabling everything
- use set_data to get wrappee in pthread_create_gotcha
- clear roctracer_data storage if roctracer not initialized
* HIP API perfetto args + updated perfetto categories
- Support for HIP API args field in perfetto
- PERFETTO_CATEGORIES -> OMNITRACE_PERFETTO_CATEGORIES
- Changed perfetto categories for several trace events and trace counters
- migrated several TRACE_EVENT_* to use omnitrace::tracing::{push,pop}_perfetto_ts(...)
* Tweaked category_region to encode the type of args as well as value
- Affects MPI args field in perfetto
* Improved testing in ubuntu-focal.yml
- "Test Install" step sources setup-env.sh
- "Test Install" step tests python support
- "Test Install" step tests reading ~/.omnitrace.cfg
- Avoid installing boost and tbb libs when building from submodule
* validate-perfetto-proto.py accepts -m / --categories
* Remove reference from category_region typeids
* Tweak opensuse action name
* Tweak the "Test Install" Step of ubuntu-focal
* Rework submodule installation
- use add_subdirectory(... EXCLUDE_FROM_ALL) + explicit installation of deps
- install all library deps to lib/omnitrace
- internal builds of dyninst use libomnitrace-rt for binary rewriting
- support libdyninstAPI_RT not in LD_LIBRARY_PATH when dyninst built internally
* Update ubuntu-focal to test full dyninst install
* Use RelWithDebInfo because Dyninst segfaults with MinSizeRel
* Fix ubuntu-focal.yml install step
* Config updates
- See PR #69 for details
- change type of OMNITRACE_DL_VERBOSE
- add "deprecated" category to OMNITRACE_ROCM_SMI_DEVICES
- reduce size of perfetto shared memory size hint
- deprecate OMNITRACE_OUTPUT_FILE in favor of OMNITRACE_PERFETTO_FILE
- set papi event choices
- read config file after reading command line
- fix update of OMNITRACE_DL_VERBOSE
- mark several settings as hidden
- timemory update support hidden attribute for settings
- rework get_perfetto_output_filename()
- Hide settings from not available backends
* Rework omnitrace-avail to support dumping configurations
* Overwrite query, tests, output flag
- Support using -O flag when dumping config
- Support checking before overwriting existing config
- Support --force to overwrite existing config
- Fix get_component_info not including omnitrace components
- Testing for dumping config
* Update documentation on omnitrace-avail
* Fix issue with timemory format + "/__w/"
* Update output prefix keys docs
* Rename --dump-config to --generate-config
* Hide MPI related options
- OMNITRACE_PERFETTO_COMBINE_TRACES and OMNITRACE_COLLAPSE_PROCESSES are hidden w/o MPI support
* Fix to pid via mpi_gotcha
* OMNITRACE_VERSION defines
* call perfetto on hsa_activity_callback thread
* Test label tweak
* Config fixes
- Change type of OMNITRACE_DL_VERBOSE
- Update OMNITRACE_DL_VERBOSE properly
- Add OMNITRACE_ROCM_SMI_DEVICES to deprecated group
- Set default_process_suffix
* metadata for OMNITRACE_VERSION and OMNITRACE_HIP_VERSION
tracing NS + category region component
- made library.cpp impl more broadly available
- support for perfetto args
- MPI wrappers encode args and return type
- new categories / perfetto categories
- omnitrace_library category -> libomnitrace
- omnitrace_dl_library -> libomnitrace-dl
* Remove reliance on MPI_Comm_rank
- read /proc/<PID>/tasks/<PID>/children of parent process to deduce the rank
- Old format relied on user calling MPI_Comm_rank(MPI_COMM_WORLD, ...)
- if MPI_Comm_rank called with subcommunicators only, multiple ranks would write to same file
* Tweak mpi example
* Fix the main library stop routine for timemory
- the main pop_timemory function was popping too many calls
- this primarily affected recursive calls
* Lengthen the timeout for the Configure CMake step
* Fix python tests
- new validate-timemory-json.py script
* Documentation update
- Call-counts in timemory output examples in documentation were affected by the changes
* Fix the per-thread metrics during finalization
- pthread_create_mutex starts/stops the per-thread data
- removed unintentional continue statement
* Docs tweaks
* Fix lap counter on per-thread metrics
- add OnLoad and OnUnload to omnitrace-dl
- disable global fence for kokkos profiling tools
- tweak omnitrace_strip_target to use wildcards
- added dl-gen.py script for generating dlopen bindings
- added support for kokkosp_request_tool_settings
- added support for kokkosp_dual_view_sync
- added support for kokkosp_dual_view_modify
* Fix sampling counter time scales
- All perfetto trace events have "begin_ns" and "end_ns" debug fields
- data for thread start and end timestamp in pthread_create_gotcha
- discard samples outside of thread start and end timestamps
- rename "CPU User CPU Time" perfetto counter to "CPU User Time"
- rename "CPU Kernel CPU Time" perfetto counter to "CPU Kernel Time"
- ensure CPU system samples in perfetto are set to zero at end
- backtrace uses comp::wall_clock record() for timestamps (consistency)
- "Peak Memory Usage [Thread X] (S)" renamed to "Thread Peak Memory Usage [X] (S)"
- "Context Switches [Thread X] (S)" renamed to "Thread Context Switches Usage [X] (S)"
- "Page Faults [Thread X] (S)" renamed to "Thread Page Faults Usage [X] (S)"
- "<PAPI_DESC> [Thread X] (S)" renamed to "Thread <PAPI_DESC> [X] (S)"
- samples
* Fix includes
* Rework sampling trace counter names + new trace counters
- reformulate trace counter names for easier comparison (thread sampling)
- new process-level trace counters for context switches (thread sampling)
- new process-level trace counters for page faults (thread sampling)
- new process-level trace counters for CPU time (thread sampling)
- new thread-level trace counters for context switches (sampling)
- new thread-level trace counters for page faults (sampling)
* tweak header include in backtrace.cpp
Support strict settings option in timemory
- timemory settings updates
- strict config option
- improved variable support
- $env: lvalues
- support for ${VARIABLE} syntax
- support for variable expansion in substring
- chained config files
* Fix perfetto_counter_track string lifetime
- ensure the C-string pointer backing the perfetto::CounterTrack label is not still valid even after resizing container
* STL includes