Commit Graph

61 commits

Author SHA1 Message Date
Jonathan R. Madsen 57ef312d26 Option rename + minor fixes (#57)
- Set choices of OMNITRACE_BACKEND option
- rename OMNITRACE_SHMEM_SIZE_HINT_KB option
- rename OMNITRACE_BUFFER_SIZE_KB option
- rename OMNITRACE_COMBINE_PERFETTO_TRACES
- rename OMNITRACE_BACKEND option
- default to OMNITRACE_COLLAPSE_PROCESSES for combining perfetto traces
- OMNITRACE_PERFETTO_FILL_POLICY option
- fix unused variables due to constexpr in add_critical_trace
- rename perfetto config from "track_event" to "omnitrace"
- fix build-release.sh + python
- handle config file updating OMNITRACE_DL_VERBOSE in omnitrace-dl
- rename roctrace.cfg to omnitrace.cfg
- accept "on" and "off" for get_sampling_cpus()

[ROCm/rocprofiler-systems commit: 346f8cd0bc]
2022-05-10 17:30:45 -05:00
Jonathan R. Madsen 77721c2db5 Remove wikipedia links [skip ci] (#56)
[ROCm/rocprofiler-systems commit: ef202f3d86]
2022-05-10 13:16:04 -05:00
Jonathan R. Madsen facd23b7bb Docs images [skip ci] (#55)
* Added images of perfetto in docs

* README images + updates

[ROCm/rocprofiler-systems commit: ae2ea090fb]
2022-05-08 07:57:09 -05:00
Jonathan R. Madsen 14d8998ba0 Fix $HOME/.omnitrace [skip ci] (#54)
[ROCm/rocprofiler-systems commit: e60fae5361]
2022-05-08 06:21:14 -05:00
Jonathan R. Madsen 0d5f0fb9cf Support for tracing mutex locking (#52)
* Parallel overhead example with locks

* Support tracing mutex locking + more

- support wrapping pthread_mutex_lock
- support wrapping pthread_mutex_unlock
- support wrapping pthread_mutex_trylock
- get_perfetto_combined_traces setting
- OMNITRACE_TRACE_THREAD_LOCKS option
- ThreadState
- critical trace includes queue id
- enabled/disabled settings in timemory
- fix OMNITRACE_TIMEMORY_COMPONENTS
- fix reading config
- fix setting categories
- applied ThreadState::Internal in various places
- utility::get_filled_array
- utility::get_reserved_vector
- utility::get_thread_index
- fork_gotcha messages about forks
- split out some pthread_gotcha functionality into pthread_create_gotcha
- handle queue id in roctracer callbacks

* Update timemory and PTL submodules

* Misc CMake updates

- Includes fix to omnitrace-static-lib{gcc,stdcxx}

* Misc cleanup to pthread_mutex_gotcha and backtrace

* Fix to duplicate field in module_function json

* Improvement to debug messages

* omnitrace-dl and common improvements

- tweak to delimit
- common::ignore message
- common::join quoting of strings
- omnitrace_set_env ignores if inited and active
- omnitrace_set_mpi ignores if inited and active

* nsync for transpose example

* Fix to thread_deleter<void> functor invoke

* Fix thread state and HIP stream enums

[ROCm/rocprofiler-systems commit: b208047741]
2022-05-08 04:40:10 -05:00
Jonathan R. Madsen 0094a471fd Update documentation (#53)
- updated info about OMNITRACE_USE_MPI
- removed wiki links
- info about metadata.json
- update HW counters and fix typos
- fix update-docs.sh

[ROCm/rocprofiler-systems commit: bab90baf0b]
2022-05-08 02:51:35 -05:00
Jonathan R. Madsen 060da8159c Code coverage updates (#50)
* code coverage updates

- python support
- refactored source

* remove code_coverage::operator+ and operator+=

* impl/coverage.hpp

[ROCm/rocprofiler-systems commit: 134b33320d]
2022-05-08 01:40:56 -05:00
Jonathan R. Madsen 00315e1e2f Reorganize source/lib/omnitrace (#51)
- Got rid of `source/lib/omnitrace/include` and `source/lib/omnitrace/src` and merged into `source/lib/omnitrace`
- Updated perfetto submodule to v25.0
- Updated papi submodule

[ROCm/rocprofiler-systems commit: 1f66e23fdd]
2022-05-02 13:08:51 -05:00
Jonathan R. Madsen b3c5a6f048 perfetto mpi + mpi example (#49)
[ROCm/rocprofiler-systems commit: 6b7b6e46cf]
2022-04-27 16:58:45 -05:00
Jonathan R. Madsen 2bb6fd0cfb Misc updates (#48)
- reworked `add_critical_trace`
- `get_use_thread_sampling` / `"OMNITRACE_USE_THREAD_SAMPLING"` option
- `get_cpu_cid_stack_lock`
- reworked finalization messaging
- significant updates to pthread_gotcha
  - shutdown stability
  - `"start_thread"` entries
- `rocm_smi` stability 
- roctracer_callbacks add critical trace entries on the callback thread
- reworked CPU CID initialization
- thread_sampler stability

[ROCm/rocprofiler-systems commit: 9b25d4b3b5]
2022-04-27 16:56:38 -05:00
Jonathan R. Madsen d45e84b116 GOTCHA + Kokkos + tasking + more (#47)
* GOTCHA + Kokkos + tasking + more

- update gotcha with fix for dlsym(RTLD_NEXT, ...)
- support for standalone KOKKOS_PROFILE_LIBRARY
- remove extra flags for omnitrace-user
- roctracer and critical_trace namespaces in tasking
- generic tasking functions, e.g. join(), shutdown(), etc.
- omnitrace_init_tooling_hidden in api.hpp
- ompt.cpp uses OMNITRACE_USE_OMPT
- kokkosp uses user_region instead of omnitrace component
- re-enable recycling thread ids
- more generic _{push,pop}_perfetto functors
- fix for thread_data::instance(construct_on_init, ...)
- fix for omnitrace-headers interface target
- omnitrace_watch_for_change

[ROCm/rocprofiler-systems commit: 29220cba58]
2022-04-26 22:08:51 -05:00
Jonathan R. Madsen 72d0a7d08a Code Coverage Support (#46)
* Code-coverage support

* Examples update

- code-coverage example
- tweak transpose and parallel-overhead

* Coverage output + testing

- config::get_setting value(...)
- REGULAR_EXPRESSION -> REGEX in cmake func args
- coverage.hpp header
- coverage JSON
- coverage tests

* cmake formatting

* Library instrumentation w/o main + more

- fixed library instrumentation w/o main
- use TIMEMORY_PROJECT_NAME in output messages
- removed '--driver' option from omnitrace exe
- support coverage in trace mode
- OMNITRACE_KOKKOS_KERNEL_LOGGER
- support multiple calls to omnitrace_set_env after init if already called
- support multiple calls to omnitrace_set_mpi after init if same args
- support multiple calls to omnitrace_init if same mode
- unique_ptr_t for thread_data which calls finalize when thread_data is destroyed
- tweaked openmp tests
- improved finalization

* Replace CI --output-on-failure with -V

* Fix to OMNITRACE_DL_INVOKE

* omnitrace-exe and testing updates

- omnitrace::omnitrace-timemory interface library
- support for configs in omnitrace exe
- print-{available,instrumented,...} opts no longer exit w/o --simulate
- all tests apply --print-instrumented functions
- tweaked coverage tests
- print-* options print instructions not address range

* Remove OMNITRACE_DEBUG_FINALIZE=ON from CI

* Python cmake tweaks

* Tweak test ordering

* Upload CI artifacts if fail or success

* CI Python tweaks

- Use OMNITRACE_PYTHON_PREFIX and OMNITRACE_PYTHON_ENVS

* CI ELFUTILS_DOWNLOAD_VERSION

* test tweaks

- labels and more coverage tests

* tweak to omnitrace --config handling

* Update module/function constraint handling + PP

- tweak pre-processor definition handling
- removed free-standing module_constraint
- remove free-standing routine_constraint
- remove module_name.find("omnitrace") module constraint
- fully handle the output path of omnitrace *-instr files
- get_use_code_coverage config option
- print-coverage option
- coverage_module_functions

* use github.job not github.name

* Re-enable HSA_ENABLE_INTERRUPT

- remove coverage address report

[ROCm/rocprofiler-systems commit: 791375bb24]
2022-04-25 17:00:52 -05:00
Jonathan R. Madsen 28ade7fbb9 Update CI to test multiple python versions (#45)
* Update CI to test multiple python versions

* Ensure numpy is installed

* Handle lulesh with cmake < 3.16

* Fix typo

* Bump minimum CMake version to 3.16

- CMake 3.15 has issue with PTL object library

* Tweak CI test output

[ROCm/rocprofiler-systems commit: 22eaa780ec]
2022-04-22 03:05:07 -05:00
Jonathan R. Madsen 55fb69a57c Miscellaneous fixes (#44)
* Miscellaneous fixes

- handle HSA OnLoad called during omnitrace-avail
- disable setting HSA_ENABLE_INTERRUPT when roctracer not used
- sampler max verbose
- fix roctracer get_clock_skew
- cleanup roctracer debug output
- update timemory submodule with fence
- simplify min-instructions vs. min-address-range specification
- exclude cxx regex updates
- disable HSA_TOOLS_LIB and HSA_ENABLE_INTERRUPT when no roctracer

* git safe.directory

[ROCm/rocprofiler-systems commit: 77703ef4f1]
2022-04-21 22:59:50 -05:00
Jonathan R. Madsen b4b5acf0a6 omnitrace-compile-definitions (CMake) [skip ci] (#43)
[ROCm/rocprofiler-systems commit: cc9ce3a871]
2022-04-21 21:52:57 -05:00
Jonathan R. Madsen a438000c21 Multiple python versions (#42)
* Support multiple Python versions in single build

* RPATH + Split up config into config and runtime

* pybind11 submodule

* Docker build updates

[ROCm/rocprofiler-systems commit: 4db6ba3d28]
2022-04-21 21:36:07 -05:00
Jonathan R. Madsen 681678ff11 Support for building PAPI via a submodule (#41)
* Enable building PAPI via submodule

* Miscellaneous fixes

- Use TIMEMORY_PAPI_ARRAY_SIZE in backtrace
- remove pthread_gotcha init from fork_gotcha::configure
- fix HSA OnLoad being called before tooling init

* PAPI array size + PAPI.cmake updates

- updated timemory submodule with PAPI updates
- fix for backtrace _hw_cnt_labels

* Disable OMPT for focal

* format

[ROCm/rocprofiler-systems commit: d98e60a17f]
2022-04-21 20:33:51 -05:00
Jonathan R. Madsen 317240ca1c Setup and Nomenclature pages [skip ci] (#40)
[ROCm/rocprofiler-systems commit: e24c24dc56]
2022-04-12 00:49:55 -05:00
Jonathan R. Madsen 44c7c29b54 Workaround for dyninst bug with SIGTRAP (#39)
- on some systems (e.g. OLCF Crusher) it has been noted that dyninst will raise SIGTRAP (or SIGILL if DYNINST_SIGNAL_TRAMPOLINE_SIGILL is set in env)
- this fix adds an environment variable OMNITRACE_IGNORE_DYNINST_TRAMPOLINE which, when on, will try to ignore the signal

[ROCm/rocprofiler-systems commit: d3c73a5860]
2022-04-05 20:46:17 -05:00
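Note on #39 above: the opt-in behavior it describes (ignore the trampoline signal only when `OMNITRACE_IGNORE_DYNINST_TRAMPOLINE` is on) could be sketched roughly as follows. The helper names `env_flag_enabled`, `trampoline_signal`, and `maybe_ignore_trampoline_signal` are hypothetical, as are the accepted "on" spellings; the log does not show how omnitrace actually implements the workaround.

```cpp
#include <cassert>
#include <cctype>
#include <csignal>
#include <cstdlib>
#include <string>

// Hypothetical helper: true when the variable is set to a "truthy" spelling.
// The accepted spellings are an assumption, not taken from the commit.
inline bool env_flag_enabled(const char* name)
{
    const char* raw = std::getenv(name);
    if(raw == nullptr) return false;
    std::string val{ raw };
    for(auto& c : val)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return val == "1" || val == "on" || val == "true" || val == "yes";
}

// SIGTRAP by default, SIGILL when DYNINST_SIGNAL_TRAMPOLINE_SIGILL is present
// in the environment (as described in the commit message).
inline int trampoline_signal()
{
    return (std::getenv("DYNINST_SIGNAL_TRAMPOLINE_SIGILL") != nullptr) ? SIGILL
                                                                        : SIGTRAP;
}

// Hypothetical helper: ignore the trampoline signal only if the user opted in.
// Returns true when the signal disposition was changed.
inline bool maybe_ignore_trampoline_signal()
{
    if(!env_flag_enabled("OMNITRACE_IGNORE_DYNINST_TRAMPOLINE")) return false;
    return std::signal(trampoline_signal(), SIG_IGN) != SIG_ERR;
}
```

In this sketch the check would run once, early in process start-up, before any instrumented code executes.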
Jonathan R. Madsen e7546b201a Python updates (#38)
* silence SFINAE disabled for fork_gotcha

* Python updates

- Options for --{module,function}-include
- libpyomnitrace is_initialized and is_finalized
- source instrumentation auto init
- atexit finalization
- improved python testing

* Documentation Update

* Fix to 'cmake -E cat' not available < cmake v3.18

* Fix for inverse tests

* Update cancelling.yml

[ROCm/rocprofiler-systems commit: 593b3b69b8]
2022-04-05 20:40:27 -05:00
Jonathan R. Madsen 6daac0f60c Python support (#37)
* Initial python support

* Add python testing

* Increase timeout for bin tests

* cmake-format

* Valid build types + testing + formatting + more

- Enforce valid build types
- Fix to numpy install
- Increase testing timeout
- Fix to cmake format glob
- Fix to backtrace verbose

* Disable stripping libraries by default

* omnitrace exe updates

- new '--print-instructions' option
- changed format of instructions in JSON
- remove no-save-fpr tests

* Default to strip libraries when release build

[ROCm/rocprofiler-systems commit: afa3edebab]
2022-04-05 00:24:34 -05:00
Jonathan R. Madsen 127e30a4d7 Documentation + Miscellaneous Fixes (#36)
* Added documentation markdown source

* Replaced AARInternal with AMDResearch in URLs

* Renamed cpack artifact names

* Fix to testing and lulesh submodule checkout

* Docker updates

* CMake and CPack

- force CMAKE_INSTALL_LIBDIR to lib
- CPACK_DEBIAN_PACKAGE_RELEASE uses OMNITRACE_CPACK_SYSTEM_NAME
- CPACK_RPM_PACKAGE_RELEASE uses OMNITRACE_CPACK_SYSTEM_NAME
- Tweak LIBOMP_LIBRARY find in examples/openmp
- Tweak setup-env.sh.in

* Partial update of README

- status badges
- docs link
- removed install info (covered by docs)

* OMNITRACE_SAMPLING_CPUS setting

- enables control over which CPUs are sampled for frequency

* omnitrace exe updates

- exclude transaction clone, virtual thunk, non-virtual thunk
- module_function::start_address
- module_function::instructions
- verbosity > 0 encodes instructions into JSON

* Miscellaneous fixes

- relocate setup-env.sh.in
- add modulefile.in
- Updated README.md and source/docs/about.md
- cmake fix for libomp
- fix license in miscellaneous places
- dl.hpp and dl.cpp

* Update timemory and dyninst submodules

- timemory signals updates
- dyninst Movement-adhoc updates

* cmake format

[ROCm/rocprofiler-systems commit: 945f541965]
2022-04-04 15:27:38 -05:00
Jonathan R. Madsen 4ddb8405ac cpack workflow for building installers (#35)
* cpack workflow for building installers

- ConfigCPack.cmake update
  - STGZ and DEB + containers + test artifact
  - DEBIAN_FRONTEND + set -v
  - submodule fix
  - actions checkout
- OMNITRACE_ROCM_VERSION + continue-on-error
- Change CPack generators + fix path to DEB
- separate configure, build, and package steps
- use cd instead of pushd
- FindROCmVersion + fix to cpack testing
- use ${ROCM_PATH}/.info/version for ROCm version info
- Tweaks for debian installer
- Packaging fixes
- Use CMAKE_SHARED_LIBRARY_SUFFIX instead of .so
- Split cpack.yml into 4 workflows
- Replace source with export in cpack
- Dyninst boost uses tar.gz instead of zip on Unix

* Fix to common join

* Update VERSION to 1.0.0

[ROCm/rocprofiler-systems commit: 5c4d5c394f]
2022-03-27 22:52:36 -05:00
Jonathan R. Madsen 5f08854a3a Relaxed module/function restrictions (#33)
* Relaxed module/function restrictions

* Updated tests

[ROCm/rocprofiler-systems commit: 4a18f55d34]
2022-03-23 00:28:25 -05:00
Jonathan R. Madsen e8819abae1 Fixes for ROCM-SMI + MPI (#34)
[ROCm/rocprofiler-systems commit: f4e27d8aee]
2022-03-23 00:28:13 -05:00
Jonathan R. Madsen 9206792846 User api updates (#32)
* Update invoke.hpp

* Update OMNITRACE_FUNCTION

* Update library debug messages

* ptl verbosity

* Update timemory submodule

* mpi_gotcha calls omnitrace_finalize_hidden

* omnitrace_{push,pop}_region returns error code

* omnitrace-user updates

- doxygen documentation
- omnitrace_get_user_callbacks
- omnitrace_user_error_string
- omnitrace-user functions return error codes

* Update user-api example

* Tweak to workflows and tests

* Fix for OMNITRACE_FUNCTION

- conditional impl if __GNUC__ < 9

* focal-external-rocm workflow update

[ROCm/rocprofiler-systems commit: f6241af5ee]
2022-03-22 15:51:57 -05:00
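Note on #32 above: the change to have `omnitrace_{push,pop}_region` return an error code can be illustrated with a small region stack that reports misuse through its return value. This is a sketch only; `region_stack` and the `region_*` codes are hypothetical and do not reflect the real omnitrace-user values, which this log does not list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical status codes for illustration.
enum region_status : int
{
    region_ok           = 0,
    region_err_empty    = 1,  // pop without any open region
    region_err_mismatch = 2,  // pop name differs from the innermost open region
};

class region_stack
{
public:
    // Open a region; always succeeds in this sketch.
    int push(const std::string& name)
    {
        m_open.push_back(name);
        return region_ok;
    }

    // Close the innermost region, reporting misuse instead of aborting.
    int pop(const std::string& name)
    {
        if(m_open.empty()) return region_err_empty;
        if(m_open.back() != name) return region_err_mismatch;
        m_open.pop_back();
        return region_ok;
    }

    // Number of regions still open (useful for a warning at finalization).
    std::size_t unclosed() const { return m_open.size(); }

private:
    std::vector<std::string> m_open;
};
```

Returning a code rather than throwing lets a C caller decide whether a mismatched pop is fatal.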
Jonathan R. Madsen 6b51dbccf8 Split workflows + docker usage (#31)
* Split workflows + docker usage

* Fix omnitrace-ci-ubuntu-focal-external

* fix env

* Update path to action

* fix entrypoint

* Updated cancelling, disabled formatting

* fix entrypoint

* rework

* try using container

* relocate container

* fix image name

* shell expand

* external and external-rocm

* install libopenmpi-dev

* remove github.workspace

* github.workspace for rocm

* Update bionic, etc. + docker CI

* Remove self-hosted + bionic fix

* GIT_DISCOVERY_ACROSS_FILESYSTEM for bionic

* TIMEMORY_INSTALL_LIBRARIES + exe RPATH updates

- fix RPATH for omnitrace, omnitrace-avail, and omnitrace-critical-trace

* ubuntu bionic update

* bionic and focal-dyninst-package updates

* Disable lulesh MPI by default + timeouts

- increase openmp CG timeout
- decrease openmp CG runtime

[ROCm/rocprofiler-systems commit: 138d16d16a]
2022-03-22 12:30:07 -05:00
Jonathan R. Madsen 083035dd8b User API + reorganized lib folders (#30)
* User API + reorganized lib folders

- omnitrace_user_start_trace
- omnitrace_user_stop_trace
- omnitrace_user_start_thread_trace
- omnitrace_user_stop_thread_trace
- omnitrace_user_push_region
- omnitrace_user_pop_region

* New OpenMP examples/tests

* Fix to KokkosP

* OMPT support

- fixed omnitrace instrumenting reporting
- common invoke improvements
- component::user_region

* exclude kmp_threadprivate_

* Separate omnitrace into multiple files

* PTL and timemory submodule updates

* Active guards + USE_OMPT guards in omnitrace-dl

* Tweak transpose default iterations

* omnitrace-precommit build target

* Omnitrace exe restructuring pt 2

- Never instrument functions with fewer than 4 instructions
- Never instrument ompt_start_tool or nanosleep
- module_function serializes heuristics
- removed hash stuff from omnitrace
- removed instr_procedures lambda
- WAITPID_DEBUG_MESSAGE

* set_state, "_hidden" fix, CI exceptions, backtrace fix

- set_state function
- fixed "_hidden" from appearing in print macros using __FUNCTION__
- OMNITRACE_CI_THROW
- more CI checks in library
- fixed backtrace init value sample issue being ignored

* Tweaks to OMPT tests

* cmake-formatting

* Removed debug output from backtrace processing

* Fix warnings and verbosity

* omnitrace-dl fix for libomp

* omnitrace-avail fixes

- remove second omnitrace_init_library call
- fix -r option not working

* Additional testing

- source/bin/tests
- tests for omnitrace-exe
- tests for omnitrace-avail

* cmake-format

* Reduce runtime of openmp-lu

* Update openmp-lu and tests timeout

* openmp-lu and CI tweaks

- decrease iterations
- OMP_NUM_THREADS=2
- install clang and libomp-dev in linux-ci
- fix data-files in linux-ci

[ROCm/rocprofiler-systems commit: d80752bc69]
2022-03-07 20:40:48 -06:00
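Note on #30 above: two of the exclusion rules listed under "Omnitrace exe restructuring pt 2" (skip functions below 4 instructions; never instrument `ompt_start_tool` or `nanosleep`) might look like the filter below. `function_info` and `should_instrument` are hypothetical names for illustration; the real checks in the omnitrace exe are not shown in this log and likely involve more heuristics.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical summary of a candidate function.
struct function_info
{
    std::string name;
    std::size_t num_instructions;
};

// Returns true when the function passes both exclusion rules.
inline bool should_instrument(const function_info& func)
{
    // Names taken from the commit message.
    static const std::set<std::string> never_instrument = { "ompt_start_tool",
                                                            "nanosleep" };
    constexpr std::size_t min_instructions = 4;  // threshold from the commit message

    if(func.num_instructions < min_instructions) return false;
    return never_instrument.count(func.name) == 0;
}
```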
Jonathan R. Madsen a23bf28aaa Fix compilation for ROCm 4.0 (#29)
[ROCm/rocprofiler-systems commit: 2acaa7aa9f]
2022-03-07 13:16:41 -06:00
Jonathan R. Madsen 78ae7d1e37 Tweaks to docker scripts [skip ci] (#28)
[ROCm/rocprofiler-systems commit: 80e1a0d7e7]
2022-02-25 18:30:37 -06:00
Jonathan R. Madsen 1ad5529697 Created push/pop system for whether sampling is enabled (#27)
- also permitted turning off sampling in sampling mode
- also fixed ambiguous rocm_smi namespace issue in roctracer

[ROCm/rocprofiler-systems commit: 3151dd3aeb]
2022-02-25 05:33:59 -06:00
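Note on #27 above: a push/pop system for an enabled flag is commonly built as a small stack of saved values, so nested scopes can disable sampling temporarily and then restore whatever was active before. The sketch below assumes that design; `sampling_state` is a hypothetical name and the actual omnitrace implementation is not shown in this log.

```cpp
#include <cassert>
#include <vector>

// Hypothetical state holder (one instance per thread would be typical).
class sampling_state
{
public:
    explicit sampling_state(bool initial)
    : m_current{ initial }
    {}

    bool enabled() const { return m_current; }

    // Save the current value and switch to the new one.
    void push(bool value)
    {
        m_saved.push_back(m_current);
        m_current = value;
    }

    // Restore the previously saved value; returns false on unbalanced pop.
    bool pop()
    {
        if(m_saved.empty()) return false;
        m_current = m_saved.back();
        m_saved.pop_back();
        return true;
    }

private:
    bool              m_current;
    std::vector<bool> m_saved;
};
```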
Jonathan R. Madsen 2403bbde49 Stability improvements (#26)
* omnitrace verbprintf and errprintf

* avail categories fix

* omnitrace-dl namespace

* OMNITRACE_CI macro / OMNITRACE_BUILD_CI option

- always enables asserts

* Roctracer improvements

- Reworked roctracer significantly
- Added categories to settings
- create_cpu_cid_entry
- handle clock_skew in roctracer
- fixed roctracer activity names
- hip_api_callback is "host"
- perfetto::Flow for GPU

* timemory submodule update

* Tweak to redirect

* Improved recursive guards

- functors component
- created "_hidden" variants of instrumentation funcs
  - omnitrace_* calls omnitrace_*_hidden
  - omnitrace-dl calls non-hidden
- omnitrace-dl now strongly protects against recursion
- omnitrace-dl now is standalone w.r.t. headers

* Stability fixes
- OMNITRACE_DEBUG_PUSH env variable
- fix to HSA_TOOLS_LIB in dl.cpp
- Fixed SFINAE warning in mpi_gotcha
- Handle 64, _l, _r extensions in whole function names

* cmake formatting

* Fix for last commit + push/pop count info

- don't instrument rocr::core::Signal::WaitAny
- don't instrument rocr::core::Runtime::AsyncEventsLoop
- fixed main not being popped in runtime instrument
- updated interval data reserve
- copy hash-ids and aliases onto main thread
- warn about unclosed regions
- removed guards in libomnitrace
- added error checks for incorrect push_count vs. pop_count
- fixed missing pop_timemory in last commit

* Finalization methodology updates

- added some more rocr:: functions to whole function names

* Add event_base_loop to whole functions

* Update VERSION to 0.1.0

[ROCm/rocprofiler-systems commit: 0d5c557552]
2022-02-25 03:56:41 -06:00
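Note on #26 above: the "Improved recursive guards" item describes protecting instrumentation entry points against re-entering themselves. One common way to do this is a thread-local flag managed by an RAII object, sketched below. `recursion_guard` and `traced_call` are hypothetical; this is not the omnitrace-dl code.

```cpp
#include <cassert>

// Hypothetical RAII guard: the first guard on a thread takes ownership of a
// thread-local flag; nested guards see the flag set and evaluate to false.
class recursion_guard
{
public:
    recursion_guard()
    : m_owner{ !flag() }
    {
        if(m_owner) flag() = true;
    }
    ~recursion_guard()
    {
        if(m_owner) flag() = false;
    }
    recursion_guard(const recursion_guard&) = delete;
    recursion_guard& operator=(const recursion_guard&) = delete;

    explicit operator bool() const { return m_owner; }

private:
    static bool& flag()
    {
        static thread_local bool value = false;
        return value;
    }
    bool m_owner;
};

// Hypothetical entry point that deliberately re-enters itself; returns how
// many levels actually ran past the guard.
inline int traced_call()
{
    recursion_guard guard{};
    if(!guard) return 0;        // re-entered from inside the tool: skip
    return 1 + traced_call();   // nested call is rejected by the guard
}
```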
Jonathan R. Madsen 8b058902a2 omnitrace-dl-library (#25)
* timemory submodule update

* Visibility, setting categories, and task-group protection

- OMNITRACE_VISIBILITY instead of TIMEMORY_VISIBILITY
- increased task group data-race protection
- add omnitrace categories to settings

* set component_apis type-trait

* omnitrace-dl-library implementation

- this library dlopen + dlsym's libomnitrace
- significantly reduces the instrumentation time

* omnitrace-avail categories

- suppress AVAILABLE column when --available

* omnitrace-exe update

- uses omnitrace-dl
- adds --print-excluded option
- removes --jump option
- comments out --stubs option
- removes --stdlib option
- support for C++ STL functions not in libstdc++
- tweak the --print-* outputs
- significantly refactors instrument_module and instrument_entity
- removes unused c_stdlib_module_constraint
- removes unused c_stdlib_function_constraint
- decreases get_whole_function_names() coverage

* library.cpp updates

- OMNITRACE_DEBUG -> OMNITRACE_DEBUG_F
- omnitrace_finalize sets state earlier
- omnitrace_finalize clears push/pop functors
- increased tasking shutdown safety

* - fix critical-trace thread hierarchy
- signal handler calls omnitrace_finalize
- get_cpu_cid_stack supports parent tid
- interval data reserves
- omnitrace-avail serialization support for module_functions
- omnitrace --simulate option
- omnitrace --print-format option
- omnitrace --load-instr option
- omnitrace runtime-inst doesn't oneTimeCode
- updated regex
- expand get_whole_function_names()
- Test Install CI update

* fixes to last commit

- expand get_whole_function_names()
- ignore sig c modules
- kill process in signal handler

* Remove RTLD_DEEPBIND + more

- removed use of RTLD_DEEPBIND
  - causes dyninst segfaults
- fixed signal handling
- updated timemory submodule

* Build/link static timemory libraries

* omnitrace --{module,function}-restrict option

- Added restrict regex options
- Reworked handling of regex options
- Reworked reporting of module/function skipping
- Handle -o w/o file specified

* timemory-avail

- category views
- backtrace::sample checks state

* get_debug_sampling()

[ROCm/rocprofiler-systems commit: 145a6ae06f]
2022-02-23 06:59:32 -06:00
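Note on #25 above: the commit says omnitrace-dl "dlopen + dlsym's libomnitrace". The general pattern (open the heavy library lazily, then resolve entry points by name) can be sketched as below. `lazy_library` is a hypothetical wrapper, not the omnitrace-dl implementation; only the POSIX `dlopen`, `dlsym`, and `dlclose` calls are real APIs.

```cpp
#include <cassert>
#include <dlfcn.h>
#include <string>
#include <utility>

// Hypothetical wrapper: the library is opened on the first resolve() call.
class lazy_library
{
public:
    explicit lazy_library(std::string path)
    : m_path{ std::move(path) }
    {}
    ~lazy_library()
    {
        if(m_handle != nullptr) dlclose(m_handle);
    }
    lazy_library(const lazy_library&) = delete;
    lazy_library& operator=(const lazy_library&) = delete;

    bool loaded() const { return m_handle != nullptr; }

    // Look up a symbol and cast it to the requested function-pointer type.
    // Returns nullptr if the library or the symbol cannot be found.
    template <typename FuncT>
    FuncT resolve(const char* symbol)
    {
        if(m_handle == nullptr)
            m_handle = dlopen(m_path.c_str(), RTLD_LAZY | RTLD_LOCAL);
        if(m_handle == nullptr) return nullptr;
        return reinterpret_cast<FuncT>(dlsym(m_handle, symbol));
    }

private:
    std::string m_path;
    void*       m_handle = nullptr;
};
```

A caller would resolve a function pointer once and forward to it afterwards. On older glibc versions this needs `-ldl` at link time.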
Jonathan R. Madsen b99b153030 Critical trace updates (#24)
* Source code restructuring

* Critical trace updates following restructuring

* thread_sampler, timestamps

- thread_sampler
- CPU frequency managed via thread_sampler
- rocm-smi managed via thread_sampler
- Use consistent timestamps for perfetto
- removed hsa_timer_t in favor of wall_clock::record()
- disable KokkosP by default
- re-enable critical-trace testing

* cmake-format

* Fix for defines.hpp.in

* Remove OMNITRACE_ROCM_SMI_FREQ

- thread_sampler freq is set via OMNITRACE_SAMPLING_FREQ w/ max of 1000

* Increase CI Install Dyninst timeout

* Debug macros + omnitrace_init_tooling + config

- new debug macros
- extern "C" omnitrace_init_tooling
- guard get_rocm_smi_devices

* Miscellaneous tweaks

- tweak to transpose
- critical_trace::Device::ANY
- perfetto "critical-trace" category
- OMNITRACE_VERBOSE usage

* Disable key and tid data for HIP API calls

- non-kernels are ignored in activity callback

* critical-trace exe updates

- fix perfetto generation
- improved logging
- improved readability

* timemory submodule update

- lulesh example cmake tweaks

[ROCm/rocprofiler-systems commit: b016c8929f]
2022-02-19 02:00:59 -06:00
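Note on #24 above: the cap described for the thread_sampler frequency (set via `OMNITRACE_SAMPLING_FREQ`, maximum 1000) amounts to clamping the requested value before converting it to a delay. The sketch below assumes the frequency is in samples per second and adds an illustrative lower bound; `sampling_interval` and both constants are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>

// Upper bound taken from the commit message; the lower bound is an
// assumption made here to keep the division well-defined.
constexpr double max_sampling_freq = 1000.0;
constexpr double min_sampling_freq = 1.0;

// Hypothetical helper: clamp the requested frequency (assumed samples per
// second) and convert it to the delay between two samples.
inline std::chrono::nanoseconds sampling_interval(double requested_freq)
{
    const double freq =
        std::min(std::max(requested_freq, min_sampling_freq), max_sampling_freq);
    return std::chrono::nanoseconds(static_cast<std::int64_t>(1.0e9 / freq));
}
```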
Jonathan R. Madsen 4ae26e2d08 rocm-smi and KokkosTools support (#23)
* renamed omnitrace_thread_data to thread_data

* initial implementation

* Numerous fixes and updates

- Updated timemory submodule
- Updated perfetto submodule (pulls in fixes for TRACE_EVENT)
- pthread_gotcha only after omnitrace_init_tooling
- omnitrace banner
- config settings for rocm-smi freq and devices
- critical_trace::get_entries
- OMNITRACE_BASIC_PRINT
- rocm_smi perfetto category
- redirect roctracer warnings for ROCm 4.5.0
- property specializations for rocm-smi components
- units fixes for data_tracker types
- roctracer entries for pthread_create and start_thread
- omnitrace-avail defaults to settings, not components
- settings have conforming names
- settings warn about duplicates
- ptl named threads
- decreased max freq for sampler SIGALRM
- rocm-smi names thread
- rocm-smi avoids call to hipGetDeviceCount
- name roctracer activity callback threads
- fixed binary rewrite test output names

* Update lulesh example

- supports non-UVM GPU

* Lulesh tweaks + formatting

* KokkosP + Mode + Roctracer sampling deadlock fix

- kokkosp support
- omnitrace_init_library
- config::print_settings()
- config::get_mode()
- omnitrace::Mode
- omnitrace-avail improvements (removes settings)
- handle get_verbose() < 0
- disable dyninst InstrStackFrames by default
- handle perf_event_paranoid > 1 by disabling PAPI
- SIGALRM max freq to 5.0
- Name threads
- rocm-smi handles get_use_perfetto() and get_use_timemory()
- HSA_ENABLE_INTERRUPT=0 when roctracer + sampling (fixes deadlock)

* Tests, API renaming, roctracer

- disable renaming of thread 0
- verbprintf_bare
- enable dyninst merge tramp
- tweaked some omnitrace exe verbose levels
- reworked roctracer::setup and roctracer::shutdown
- rocm_smi::data::poll checks get_state()
- omnitrace_trace_finalize -> omnitrace_finalize
- omnitrace_trace_init -> omnitrace_init
- omnitrace_trace_set_env -> omnitrace_set_env
- omnitrace_trace_set_mpi -> omnitrace_set_mpi
- sampling mode does not disable timemory
- disable roctracer before shutting down rocm-smi
- lulesh tests w/ and w/o kokkosp
- lulesh tests for perfetto only
    - with --dynamic-callsites --traps --allow-overlapping
- lulesh tests for timemory only
    - with --stdlib --dynamic-callsites --traps --allow-overlapping

* Update timemory submodule

- fix for TIMEMORY_PROPERTY_SPECIALIZATION

* get_verbose() handling + timemory submodule update

- Findroctracer.cmake uses find_package(hsakmt)

* Stability fixes + rework roctracer + perfetto

- reworked roctracer start up
- critical_trace perfetto basic values
- perfetto sampling category
- sampler checks signals
- peak_rss in sampling
- pthread_gotcha::shutdown()
- rocm_smi::device_count()
- HSA_TOOLS_LIB is set
- HSA_ENABLE_INTERRUPT in omnitrace exe
- omnitrace exe verbosity level changes
- Avoid instrumenting Impl ns in Kokkos
- gpu::device_count prefers rocm_smi instead of hip
- ptl blocks signals
- fixed pthread_gotcha roctracer_data values
- removed runtime-instrument-sampling tests
- timemory submodule update

* cmake formatting

* timemory + roctracer updates

- fix timemory issue with papi_common
- fix timemory issue with units
- define roctracer::is_setup()

* Miscellaneous tweaks

- Disable sampling during runtime instrument
- Fixed warnings about dynamic callsites
- Fixed backtrace output when timemory disabled
- Test tweaks

* cmake-format

* omnitrace_target_compile_definitions

* timemory submodule update

* config, omnitrace, State, mpi_gotcha updates

- use OMNITRACE_THROW instead of direct throw
- is_attached()
- is_binary_rewrite()
- get_is_continuous_integration()
- get_debug_init()
- get_debug_finalize()
- max_thread_bookmarks default to 1
- State::Init
- app_thread oneTimeCode
- runtime instrumentation uses waitpid
- fixed init_names
- include main in MPI runs
- fixed sampling setup when disabled
- reworked mpi_gotcha
- disabled critical trace in transpose test

* cmake-format

* handle rocm_smi::device_count() exception

* CI timeouts

* Re-enable runtime-instrument + sampling

[ROCm/rocprofiler-systems commit: 39f17ae8b8]
2022-02-08 17:42:17 -06:00
Jonathan R. Madsen b4a82711d1 Sampler improvements (#22)
* Sampler improvements

- roctracer_flush_activity
- papi_array in backtrace
- fixed sampler trait specializations
- split main_bundle into main and gotcha bundles
- cmake option display

* timemory update

* EINTR handling + debug_{pid,tid}

- sampler handles EINTR for sem_init and sem_destroy
- OMNITRACE_DEBUG_{TIDS,PIDS} env variables

* Increase waitForStatusChange

[ROCm/rocprofiler-systems commit: eccba14f00]
2022-01-27 21:31:08 -06:00
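Note on #22 above: handling `EINTR` around a call usually means retrying while the call fails with `errno == EINTR`. A generic retry wrapper, under that assumption, is sketched below. `retry_on_eintr` and its attempt limit are hypothetical and not taken from the sampler source.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical helper: call func() until it stops failing with EINTR or the
// attempt limit is reached. func must return -1 on failure and set errno.
template <typename FuncT>
int retry_on_eintr(FuncT&& func, int max_attempts = 8)
{
    int ret = -1;
    for(int i = 0; i < max_attempts; ++i)
    {
        errno = 0;
        ret   = func();
        if(ret != -1 || errno != EINTR) break;  // success or a different error
    }
    return ret;
}
```

With a POSIX semaphore `sem`, a call site could look like `retry_on_eintr([&] { return sem_init(&sem, 0, 0); })`.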
Jonathan R. Madsen ebb29ac54a Miscellaneous updates (#21)
* Miscellaneous updates

- Updated README
- Updated VERSION
- Header include tweaks
- get_verbose() + get_verbose_env()
- fixes to omnitrace-avail
    - exclude all cuda/cupti settings
    - apply available_only to hw counters
- config file warnings
- config displayed at verbose > 0
- fix to MPI_Finalize when only using MPI headers

* Updated LICENSE

* CPack tweak

[ROCm/rocprofiler-systems commit: 8648410309]
2022-01-26 23:25:00 -06:00
Jonathan R. Madsen adb6e94de5 MPI gotcha updates (#20)
* MPI gotcha updates

* Release script updates

- build for ubuntu focal and bionic
- use OpenMPI for OMNITRACE_USE_MPI_HEADERS

* build-release.sh updates

[ROCm/rocprofiler-systems commit: a546949ff4]
2022-01-26 15:02:53 -06:00
Jonathan R. Madsen f17ff12a66 Sampling support + testing + omnitrace namespace (#19)
* omnitrace namespace

* Kokkos + Lulesh example/tests

* Sampling support + more

- OMNITRACE_BUILD_TESTING option
- sampling support
- pthread_gotcha
- fixes to labels for mpi_gotcha, fork_gotcha, omnitrace_component
- tasking::block_signals, tasking::unblock_signals
- instrumentation mode option in omnitrace exe
- argument option groups in omnitrace exe
- categories in omnitrace settings
- remove TIMEMORY_ prefixed options

* Release workflow updates

* Updated settings printing

* Fixed defaults in README

* Tweak setting defaults in README

* CMake fixes

* cmake-format

* clang-format

* LULESH_USE_MPI OFF

* LULESH_USE_MPI fix

* timemory add_secondary fix

* timemory ambiguous internal namespace fix

* Update timemory submodule

* Handle output path/prefix in omnitrace

- updated timemory
- updated test environment

* sampling + papi fix

* Fix to sampling without PAPI

* Fix for using too many processors in CI

* formatting

* Updated CI

- minor cmake tweaks
- updated timemory submodule

* Updated CI

* Updated CI

* CI + timemory updates

- data race fixes

* CI updates + debug for sampling

* Sampling updates

- moved tasking::{block,unblock}_signals to sampling namespace
- improvements to sampling w.r.t. thread-locality

* Minimum OMNITRACE_THREAD_COUNT of 128

* Handle multiple dims in sampler data

* Configure libunwind support for timemory

* Improved safeguards for sampling

- updated CI
- lulesh runtime-instrument test tweak

* formatting

* CI updates + sampler updates + misc

- fixed stack-buffer-overflow in omnitrace (get_*file_line_info)
- test labels
- steady_clock instead of system_clock in sampler
- update dyninst submodule with upgradePlaceholder fix
- disable OMNITRACE_BUILD_TESTING by default

* Updated timemory submodule

- hidden visibility for timemory
- storage finalizers do not capture this

* Update timemory submodule

- component visibility updates

* Reworked header includes

- use <...> for timemory headers
- always include <library/defines.hpp>

* Rename some config options

* Update PTL submodule

* Update kokkos submodule

* Updated sampling

* Updated CI

* Reworked instrumentation exe

- lowered min-address-range threshold to 256
- extended whole function exclude

* CI fix + timemory submodule update

- TIMEMORY_VISIBLE on component base
- RelWithDebugInfo -> RelWithDebInfo
- Info output for parallel-overhead

* Sampling flags + transpose update + CI update

- disable critical trace for parallel-overhead in CI
- SA_RESTART only in sampler
- reworked transpose example to use fewer threads

* CI update

- removed ubuntu-focal-external-debug
- reduced data artifacts upload

* CI timeouts

- updated timemory submodule
- minor tweaks to omnitrace exe logging

* LICENSE updates (partial)

* CI Test stage timeout extension

* Docker and Packaging updates

* Miscellaneous fixes/tweaks

- gpu.hpp / gpu.cpp
- disable roctracer component if no devices
- re-enable InstrStackFrames by default
- disable sampling by default
- pthread_gotcha::m_enable_sampling is false by default
- timemory submodule update w/ sampler and pop(tid) updates
- fix minor bug in sampler logic
- CMake: OMNITRACE_USE_HIP option
- roctracer + timemory fix

* Replaced OMNITRACE_USE_ROCTRACER with OMNITRACE_USE_HIP where appropriate

* cmake format

* Sampler deadlock fixes

* Removed debug messages from sampler

* Fix for MPI detection + test tweaks + misc

* Sampler deadlock fixes + misc

- removed papi_tot_ins
- pthread_gotcha blocks signals globally until sampler is setup
- metadata specialization for sampling components
- OMNITRACE_INSTRUMENTATION_MODE -> OMNITRACE_MODE
- default sampling delay increased to 0.05 from 1.0e-6
- removed {block,unblock}_signals from critical_trace and ptl
    - no longer necessary to use
- sampling delay minimum is 1.0e-3
- OMNITRACE_BUILD_HIDDEN_VISIBILITY

* omnitrace-avail + libunwind update + restructure

- restructured omnitrace components
- build custom omnitrace-avail executable
- updated libunwind to avoid malloc in get_unw_backtrace

* Fix remaining reorganization issues

- removed some duplicate code
- fixed some trait specializations after implicit instantiation
- formatting

* ensure_storage fix + avail improvements

- fix ensure_storage when component not avail
- suppress irrelevant info in omnitrace-avail

* Delay settings initialization

- slight tweak to tests w/ MPI

* Disable OpenMPI testing w/ ubuntu-bionic

- MPI testing is hanging because of a network interface issue on the system:

> [[20462,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> Module: OpenFabrics (openib)
>   Host: fv-az19-371
> Another transport will be used instead, although this may result in
> lower performance.
> NOTE: You can disable this warning by setting the MCA parameter
> btl_base_warn_component_unused to 0.

[ROCm/rocprofiler-systems commit: 778af2a760]
2022-01-24 20:49:17 -06:00
Jonathan R. Madsen 9d5ebf9c3b Rename hosttrace to omnitrace (#18)
[ROCm/rocprofiler-systems commit: 39cf760a4e]
2021-11-24 04:59:59 -06:00
Jonathan R. Madsen efb6d766af Reorganization and critical trace support (#17)
* Roctracer wall clock integration (#16)

* Integrates roctracer values into wall-clock

* Fixed scoping + timemory roctracer

* Fixed data race in roctracer

* Synchronized HIP API on main thread

- Cache hip activity callbacks and execute on main thread
- Minor updates to transpose

* Debugging + MPI + transpose updates

* PTL + HSA and timemory + kernel timing

- PTL usage fixed HSA + timemory issues because we could control the thread destruction
- Fixed laps counting in roctracer callbacks

* Ignore select HIP API types

- The ignored API types are ignored because there appears to be a bug
  which causes the "end" callback to be labeled as begin
- hipDeviceEnablePeerAccess
- hipImportExternalMemory
- hipDestroyExternalMemory

* Tweaks to PTL config

* Timemory update + pid-prefix w/ mpi headers

- %pid%- prefix with mpi headers
- timemory submodule update

* CMake + critical trace + reorganize library source

- clang-tidy tweaks
- cmake function updates to use hosttrace_ prefix
- update gitignore
- cmake HOSTTRACE_MAX_THREADS option
- Formatting.cmake
- cleaned up MacroUtilities.cmake
- PTL submodule + usage
- tweak to Findroctracer.cmake
- MT transpose
- Updated PTL submodule
- Updated timemory submodule
- fix to hosttrace return value type if type not found
- reorganized library source code
- support for critical trace

* Remove bits/stdint-uintn.h headers

* Rename + config + depth + critical path

- rename hosttrace_timemory_data to instrumentation_bundles
- rename hosttrace_bundle_t to main_bundle_t
- rename bundle_t to instrumentation_bundle_t
- rework of configuration setup
- critical_trace write directly to file option
- tweaked depth calculation
- updated timemory submodule
- improved parallel support in roctracer callbacks
- working critical_trace
- perfetto device-critical-trace and host-critical-trace categories
- made transpose example parallel
- made parallel-overhead example a bit uneven
- relocated LTO activation

* Fixed duplicates in perfetto critical-trace

* reworked critical trace support

- substantial perf improvement (30-45 min -> 30 sec)
- changes to configuration (new and removed options)

* Removed "%pid%-" output prefix in mpi_gotcha

* Update timemory submodule

[ROCm/rocprofiler-systems commit: 752424efc2]
2021-11-23 02:53:14 -06:00
Jonathan R. Madsen cdd2707058 v0.0.3: MPI wrappers w/o full MPI support + setup-env.sh (#15)
* v0.0.3: MPI wrappers w/o full MPI support + setup-env.sh

- bumped to v0.0.3
- enabled gotcha wrapping of MPI functions w/o enabling MPI
- added setup-env.sh script
- minor updates to testing
- Update timemory submodule
- fixes tim::component::configure_mpip undefined symbol
- Script updates

[ROCm/rocprofiler-systems commit: c56b49a0bd]
2021-10-01 16:46:03 -05:00
Jonathan R. Madsen 2cc8005680 Release 0.0.2 (#14)
* Fixed Dyninst TBB symbolic links + bump to v0.0.2

* hosttrace exe and library updates + submodule updates

- Updated dyninst submodule with TBB build ORIGIN rpath
- Updated timemory submodule
- Dyninst build with shared libs
- Dockerfile for building packaging
- Disable hidden visibility in examples
- parallel-overhead max parallelism
- query_instr in hosttrace
- different file-line info format
- full module names
- minor fix to MPI support
- disable instrumentation stack frames by default
- disable trap instrumentation by default
- updated hosttrace output file dumps
- removed cstdlib option
- dyninst DebugParsing option
- improved instrument_module function
- fixed some MPI support
- tweaked some testing parameters

[ROCm/rocprofiler-systems commit: 02f59ec9dc]
2021-09-29 18:16:13 -05:00
Jonathan R. Madsen f3e7a1664a cmake-format + miscellaneous tweaks (#13)
* cmake-format + miscellaneous tweaks
* Formatted cmake in examples and tests
* Updated linux-ci.yml artifacts naming
* Updated clang-format
* Fixed submodule branches

[ROCm/rocprofiler-systems commit: 6c93674f92]
2021-09-20 11:12:06 -05:00
Jonathan R. Madsen 8bda450acf Updates timemory and dyninst submodules (#12)
* Updates timemory and dyninst submodules

[ROCm/rocprofiler-systems commit: e6ead5ab3f]
2021-09-17 21:02:30 -05:00
Jonathan R. Madsen 60145cd5c4 GitHub CI (#11)
* Continuous integration

* linux-ci on

* fix for parallel-overhead

* Updated CMAKE_INSTALL_PREFIX

* Updated installed boost libraries

* Dyninst updates fixing TBB install

* timemory + dyninst submodule updates

- fixes some timemory package option handling
- fixes dyninst libiberty handling

* Update dyninst submodule with libiberty fix

* dyninst submodule update with TBB internal build fixes

* Updated linux-ci + tests + timemory + dyninst

- updated timemory submodule
- update dyninst submodule
- delay OnLoad implementation
- DYNINST_RT_API handling improvement
- CI for ubuntu-bionic in addition to focal
- CTest in CI

* Update TBB handling in CI

* Update dyninst with symLite fix

* Update dyninst submodule with ElfUtils-External fix

* Dyninst::ElfUtils fix

* Modified dyninst submodule TPL install

* Update dyninst submodule with improved interface libraries

* Fix to Dyninst::ElfUtils in dyninst submodule

* Updated CI build matrix + test install

* Updated CI

* CI stage updates

* Tweak

* Dyninst updates + RPATH for hosttrace exe

- hosttrace will rpath to dyninst
- Dyninst will statically link to boost by default
- Fixes to double init w/ MPI + runtime instrumentation
- minor cleanup to hosttrace
- hosttrace help exits with zero
- CI updates

* Dyninst + CI updates

* Removed ELFUTILS_BUILD_STATIC option in Dyninst

* Dyninst + visibility + roctracer + tests

- Dyninst submodule updates with dynamic pcontrol lib
- hosttrace visibility is hidden
- improved handling of DYNINST_API_RT in hosttrace
- roctracer::tear_down in finalize fixes issues with timemory
- throw error if cannot create perfetto output file
- roctracer_default_pool() safety guards
- resume calling roctracer_close_pool()
- fixed working dir of tests
- fixed env of tests (cmake and CI)
- simplified CI

* Removed stray CI if condition

* Disable hidden visibility for now

* ifdef for roctracer::tear_down + reenable hidden

* Better CI environment handling + dyninst packaging

* Fix for dyninst-package CI

* Fixes for cmake 3.15 issues with aliasing imported lib

* Fix to ubuntu-focal-dyninst-package

* Dyninst updates for packaging

- fixes issues with hard-coded paths to libraries after relocation via package installer

* Restrict CI to main and develop branches

[ROCm/rocprofiler-systems commit: 35ab6b0110]
2021-09-15 16:45:49 -05:00
Jonathan R. Madsen 1e34cb9b27 provide copy function for MPI_Comm_create_keyval for OpenMPI (#10)
[ROCm/rocprofiler-systems commit: ea60acc4b6]
2021-09-09 16:42:20 -05:00
Jonathan R. Madsen 20fe25006b Use TBB_INCLUDE_DIR instead of TBB_INCLUDE_DIRS (#9)
- fix to const int issue with mpi::comm_type_shared_v

[ROCm/rocprofiler-systems commit: e70a9e994d]
2021-09-09 16:34:01 -05:00
Jonathan R. Madsen cee9d21a57 dyninst submodule + non-papi build fix (#8)
* dyninst submodule + non-papi build fix
* Fix to checkout_git_submodule
* Append CMAKE_PREFIX_PATH with /opt/rocm

[ROCm/rocprofiler-systems commit: d2d1a80255]
2021-09-09 15:51:07 -05:00
Jonathan R. Madsen 98c37cdd3e Improve finding TBB (#7)
* Improve finding TBB
* Quiet TBB find

[ROCm/rocprofiler-systems commit: f23bc8fc19]
2021-09-09 14:10:42 -05:00