Граф коммитов

87 Коммитов

Автор SHA1 Сообщение Дата
Jonathan R. Madsen 219b2e988e Workflow, submodules, and thread info Updates (#352)
* Update CI workflows

- use node20 workflow packages

* Update tests/source/CMakeLists.txt

- Use OMNITRACE_TRACE and OMNTRACE_PROFILE instead of perfetto/timemory

* Update timemory submodule

- argparse: requires -> required
- parse callbacks

* Update thread_info.cpp

- fix causal::delay::get_local usage

* Update timemory submodule

* Update kokkos submodule

- release 3.7.02

* Revert opensuse.yml and ubuntu-bionic.yml to use node16 workflows

* Update docs.yml
2024-06-20 17:47:31 -05:00
Jonathan R. Madsen 9499e2f521 Remove Critical Trace Support (#327)
* Delete core critical-trace files

* Update docs and README

* Update workflows

* Update testing

* Update cmake

* Remove critical trace usage in source code

* Update source/docs/critical_trace.md

- fix spelling

* Formatting

* Update bin/omnitrace-avail/avail.cpp

- statically allocate shared pointers for timemory manager and hash id/aliases to prevent use-after-free errors
2024-04-23 09:35:44 -05:00
David Galiffi b81db80926 Updated links to point to the ROCm organization. (#337)
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2024-04-23 01:44:02 -05:00
Jonathan R. Madsen 1df597e049 Installers for RHEL 8.8 and RHEL 8.9 (#334)
* Installers for RHEL 8.8 and RHEL 8.9

- RHEL 8.8: Supports ROCm 5.6, 5.7, 6.0
- RHEL 8.9: Supports ROCm 6.0

* Update build-docker.sh

- fix PERL_REPO for OpenSUSE

* Update some workflows to use Node.js 20

* Fix Dockerfile.opensuse*
2024-04-01 13:31:10 -05:00
Jonathan R. Madsen cfaace38a8 ROCm 6.0 packaging (#325)
* Update build-docker.sh

- support rocm 6.0

* Update cpack workflow

- support rocm 6.0

* Update CI testing workflow paths-ignore

- changes to {docs,cpack,containers,formatting}.yml and docker do not require testing

* Update docker for OpenSUSE

- always use --non-interactive with zypper
- tweak to PERL_REPO when OS version >= 15.4
2024-01-10 17:29:47 -06:00
Ben Richard 5de4163d66 Deprecate OMNITRACE_USE_PERFETTO, OMNITRACE_USE_TIMEMORY (#306)
* Rename OMNITRACE_USE_PERFETTO to OMNITRACE_TRACE

* Rename OMNITRACE_USE_TIMEMORY to OMNITRACE_PROFILE

* Revert change to Perfetto.cmake

* Fix formatting

clang-format-11 was complaining about formatting
2024-01-10 07:20:54 -06:00
Jonathan R. Madsen 2e581e2a10 Installers for ROCm 5.7, Python 3.12 + Remove DEB and RPM installers (#313)
* Dockerfile update

- Python 3.12 support

* Bump version to 1.10.4

* Update docker scripts

- Support Python 3.12
- Set RETRY to 1 if less than 1
- Support ROCm 5.7

* Update scripts/build-release.sh

- Default to python 3.6-3.12 (i.e. add Python 3.12)

* Update cpack workflow

- Packaging for ROCm 5.7
  - Ubuntu 20.04
  - Ubuntu 22.04
  - OpenSUSE 15.4
  - RHEL 8.7
  - RHEL 9.1
- Packaging for older ROCms (by request)
  - RHEL 8.7 + ROCm 5.3
  - OpenSUSE 15.3 + ROCm 5.2
  - OpenSUSE 15.4 + ROCm 5.2
  - OpenSUSE 15.4 + ROCm 5.3
- Remove DEB and RPM installers
  - Only generate STGZ installers

* Update cpack workflow

- disable uploading DEB and RPM artifacts
2023-10-16 10:39:18 -05:00
Jonathan R. Madsen 0d84a357e5 Clean package manager caches in CI (#305)
* Clean package manager caches in CI

* remove cpack docker prune

* Apply suggestions from code review

* Apply suggestions from code review

* Update .github/workflows/cpack.yml

- disable removing large packages and swap storage to see if that fixes the jobs losing communication with server

* Update ubuntu-focal workflow

- disable debug flags in code coverage run
2023-09-26 15:44:49 -05:00
Jonathan R. Madsen 1216fd99a7 Reduce release packaging (#300)
* Bump version to 1.10.3

* Drop releases for ROCm < 5.3

- ROCm is no longer providing release for Ubuntu 18.04 starting with 5.3 so omnitrace is dropping support for Ubuntu 18.04 + ROCm
- Dropping ROCm 5.2 releases for Ubuntu 20.04
- Dropping ROCm 5.2 releases for OpenSUSE 15.4

* Update redhat workflow

- Test RedHat 9.1 + ROCm 5.5
- Test RedHat 9.1 + ROCm 5.6

* Update ubuntu-focal workflow

- drop ROCm 5.2 testing
- add ROCm 5.6 testing

* Update Findroctracer.cmake

- provide /opt/amdgpu to HINTS/PATHS for drm and drm_amdgpu libraries

* Update Findrocprofiler.cmake

- prefer librocprofiler64.so.1

* Update librocprofiler64.so to librocprofiler64.so.1

- search for the SOVERSION 1 library of librocprofiler64.so if ROCm > 5.5.0

* Update Findrocprofiler.cmake

- link to libpciaccess for ROCm 5.5.0

* Update redhat CI workflow

- install libpciaccess for rocm CI

* Update cpack workflow

- Remove all RHEL 9.0 packaging
- Remove all packaging for ROCm 5.3 on OSes supporting where releases are provided for 5.4, 5.5, and 5.6

* Update ubuntu focal workflow

- remove rocm 5.3 jobs
2023-09-12 20:19:26 -05:00
Jonathan R. Madsen 0b751d2aef Packaging for ROCm 5.6 (#299)
* Packaging for ROCm 5.6

- Bump version to 1.10.2
- build rocm 5.6 containers for ubuntu 20.04, 22.04
- build rocm 5.6 containers for opensuse 15.4
- build rocm 5.5 and 5.6 for rhel 8.7, 9.0, 9.1
- cpack rocm 5.6 for ubuntu 20.04, ubuntu 22.04, opensuse 15.4, rhel 8.7, rhel 9.0, rhel 9.1

* Update omnitrace.cfg

- remove file_write_period_ms
- remove flush_period_ms

* Remove ROCm 5.6 for RHEL 9.0

- no packaging support
2023-08-09 18:59:45 -05:00
Jonathan R. Madsen 693f753a9e GitHub Actions workflow and docker updates (#290)
* Support ROCm 5.5 in docker

* Update containers workflow

- add Ubuntu and OpenSUSE container builds for ROCm 5.4 and 5.5
- add RHEL builds

* Update cpack workflow

- build on PR against main when cpack.yml or docker files updated
- removed packaging for ROCm < 5.2 for many OSes
- added packaging for ROCm 5.5

* Update OpenSUSE workflow

- add python 3.11 to OMNITRACE_PYTHON_ENVS
- upload-artifacts name includes strategy.job-index (prevent overwrite)
- only upload artifacts on failure
- continue on error if upload artifacts fails

* Update RedHat workflow

- provide run-name
- add python 3.11 to OMNITRACE_PYTHON_ENVS
- upload-artifacts name includes strategy.job-index (prevent overwrite)
- only upload artifacts on failure
- continue on error if upload artifacts fails

* Update Ubuntu (Bionic) workflow

- add python 3.11 to OMNITRACE_PYTHON_ENVS
- upload-artifacts name includes strategy.job-index (prevent overwrite)
- only upload artifacts on failure
- continue on error if upload artifacts fails

* Update Ubuntu (Focal) workflow

- add python 3.11 to OMNITRACE_PYTHON_ENVS
- upload-artifacts name includes strategy.job-index (prevent overwrite)
- only upload artifacts on failure
- continue on error if upload artifacts fails
- remove testing of ROCm 4.3, 5.0, 5.1
- add testing of ROCm 5.5

* Update Ubuntu (Jammy) workflow

- add python 3.11 to OMNITRACE_PYTHON_ENVS
- upload-artifacts name includes strategy.job-index (prevent overwrite)
- only upload artifacts on failure
- continue on error if upload artifacts fails
- add testing of ROCm latest

* Dockerfile.{rhel,opensuse} update

- remove use of amdgpu-install in favor of installing rocm-dev package
  - In ROCm 5.5, amdgpu-install changed meaning of --usecase=rocm (added rocmdev use case)

* redhat workflow update

- remove use of amdgpu-install in favor of installing rocm-dev package
  - In ROCm 5.5, amdgpu-install changed meaning of --usecase=rocm (added rocmdev use case)

* build-docker.sh update

- add '--progress plain' to docker build commands

* Ubuntu (jammy) workflow update

- fix rocm installation

* Update Dockerfile.rhel

- add LIBRARY_PATH for /opt/amdgpu/lib64 for redhat

* Update Dockerfile.rhel

- install libpciaccess for rocm
2023-06-20 06:26:17 -05:00
Jonathan R. Madsen 2c684de367 Install rocm-dev for hipcc in ubuntu-jammy workflow (#285)
Install rocm-dev for hipcc in ubuntu-jammy workflow
2023-06-15 15:58:32 -05:00
Jonathan R. Madsen 2ce0cb4a19 Remove docker hiplibsdk from amdgpu-install (#283)
Remove docker hiplibsdk from amdgpu-install

- amdgpu-install use case hiplibsdk is not necessary and bloats the install
- same as above for package rocm-hip-sdk
2023-06-14 15:09:35 -05:00
Jonathan R. Madsen ad51223960 Fix RHEL docker containers (#282)
* Fix RHEL docker containers

- avoid `yum update` since that can update the distro minor version
2023-06-14 13:18:43 -05:00
Jonathan R. Madsen 9de3a6b0b4 Linux Perf Support + Causal Profiling Updates (#276)
* causal backtrace updates

- fix initial causal sampling period value

* causal delay updates

- tweak handling of sleep_for_overhead

* Fix experiment global scaling for prog pts

- results in drastically improved predictions

* pthread_mutex_gotcha updates

- disable all wrappers during causal profiling

* validate-causal-json.py updates

- support decimal stddev
- fix setting stddev from command-line

* causal perform_experiment_impl update

- handle start failing because finalizing

* deprecate causal::component::sample_rate

- appears to not help at all

* Rework sample info

* Increase causal unwind_depth

- use OMNITRACE_MAX_UNWIND_DEPTH

* validate-causal-json updates

- min experiments
  - exclude reporting predictions with less than X experiments at a given speedup
- percent samples
  - only print samples within X% of the peak (default: 95%)

* Update timemory submodule

- extensions to sampling for signals delivered via non-timer method
  - e.g. via HW counter overflow

* dwarf_entry::operator< updates

- sort via file

* causal profiling docs updates

- info about backends
- info about installing/enabling perf

* config updates: causal backend

- CausalBackend enum
- OMNITRACE_CAUSAL_BACKEND: perf, timer, auto
- omnitrace-causal option: --backend

* debug update

- use spin_mutex instead of std::mutex

* address_range::contains update

- range from 0-100 contains range from 10-100 but was returning false because high was == 100 not < 100

* symbol::operator< update

- handle load address differences

* sampling updates (non-causal)

- update get_timer to get_trigger + dynamic_cast

* container::static_vector updates

- support construction from container::c_array
- update_size private member func for handling atomic m_size

* Move perf files

- moved library/causal/perf.{hpp,cpp} to library/perf.{hpp,cpp}

* causal example update

- created impl.hpp (forward decls)
- renamed {cpu,rng}_func_impl to {cpu,rng}_impl_func
- only create two threads which run N iterations instead of two threads each iteration

* Update timemory submodule

- updates to unwind::processed_entry
- updates to procfs::maps

* Updated causal documentation

- fixed line numbers changed by modifications to causal example

* omnitrace-causal exe updates

- set OMNITRACE_THREAD_POOL_SIZE to zero by default

* core/containers updates

- static_vector: provide data() member function
- c_array pop_front() and pop_back() member functions

* core: config and argparse updates + perf

- core/perf.{hpp,cpp}
  - forward decl of enums
  - config-related capabilities
- argparse: --sample-overflow
- renamed some config functions
  - e.g. get_sampling_cpu_freq -> get_sampling_cputime_freq
- added config settings related to overflow sampling via perf
- added timer_sampling and overflow_sampling categories

* Update timemory submodule

- sampling allocator flushing

* binary updates

- lookup_ipaddr_entry
- use bfd_find_nearest_line instead of bfd_find_nearest_line_discriminator
  - discriminators are not used
- explicit instantiations of inlined_symbol::serialize

* Bump VERSION to 1.10.0

* sampling and perf updates

- support overflow sampling via Linux Perf
- update perf namespace
- update perf::perf_event
  - update record ctor: pointer instead of const ref
  - update open member func: return optional string
  - add m_batch_size member variable
- sampling updates
  - support overflow sampling
  - flush allocators
  - increase buffer size from 1024 to 2048
  - restructure post-processing in light of perf overflow supports
  - improve offload memory usage only load buffers for thread
  - load_offload_buffer(tid) uses thread-specific filepos
- component updates
  - backtrace_metrics::operator-=
  - backtrace_metrics::operator-
  - backtrace::sample does not record for overflow signal
  - callchain: perf overflow sample

* core updates

- component::sampling_percent does not report self + uses_percent_units

* causal updates

- tweak get_line_info
- overloads for set_current_selection (uint64_t, c_array, std::array)
- delay
  - use sampling::pause/sampling::resume
- experiment
  - experiment::sample derives from unwind::processed_entry
  - experiment::samples is vector instead of set
  - fixed samples
  - overloads for is_selected (uint64_t, c_array, std::array)
  - scaling factor defaults to 100 instead of 50
  - serialize updates follow change to experiment::sample
  - modify algorithm for increasing/decreasing experiment length
- sample_data
  - use map<uintptr, uint64_t> instead of set<sample_data>
  - get_samples returns vector<sample_data> instead of set<sample_data>
- sampling
  - support overflow via Linux Perf
  - update causal_offload_buffer
  - flush sampling allocator
- backtrace
  - overflow component

* libomnitrace-dl updates

- handle dl::InstrumentMode::PythonProfile

* testing updates (causal)

- causal line 155 -> causal line 100
- causal line 165 -> causal line 110

* formatting

* exit_gotcha updates

- exit_info for abort()
- message about non-zero exit code

* testing updates

- fail regex for causal tests
- validate-causal-json: >= min_experiments instead of > min_experiments
- handle OMNITRACE_DEBUG_SETTINGS in omnitrace_write_test_config

* causal sampling updates

- add new lines where appropriate

* causal data updates

- reorder diagnostic info when experiment fails to start

* binary updates

- symbol address range from address to address + symsize + 1
  - add 1 based on debug info

* causal data updates

- sample_selection wait_ns defaults to 1,000 instead of 10,000
- sample_selection wait scaled by iteration number
- save_line_info_impl verbosity
- print latest_eligible_pc when experiment does not start

* causal sampling + component updates

- perf backend disables component::backtrace
- ensure get_sampling_(realtime|cputime|overflow)_signal do not malloc

* causal: remove period stats

* validate-causal-json update

- fix --help

* causal data updates

- improve eligible pc history reporting when experiment fails to start

* causal data updates

- fix compute_eligible_lines_impl
  - eligible address ranges returning too many ranges
  - occasionally, overwrite all *true* eligible address ranges

* causal data updates

- reduce scoped ranges to symbol ranges
- is_eligible_address() returns true contains (not just coarse)
- revert some sample_selection behavior

* binary address_multirange updates

- make coarse_range private
- fix operator+=(pair<coarse, uintptr_t>)

* causal example update

- fix nsync to default to once per iteration

* binary analysis updates

- tweak header file includes

* causal updates

- remove factoring in sleep_for_overhead
- invoke delay::process() even if experiment is not active

* causal data updates

- update latest_eligible_pc structure

* update omnitrace-install.py.in

- fix support for fedora
  - /etc/os-release does not have ID_LIKE
  - fallback to RHEL 8.7 if version not specified

* update omnitrace-install.py.in

- fix support for debian
  - /etc/os-release does not have ID_LIKE
  - version mapping

* Update documentation

- update docs on installation

* causal data and experiment updates

- data: reset_sample_selection

* causal set_current_selection debugging

- debug messages for failed e2e runs

* causal data and backtrace component updates

- data: set_current_selection returns the number of eligible addresses added
- backtrace: if cputime signal has selected zero IPs > 5x, then realtime signal starts contributing call-stacks

* core library updates

- move config::parse_numeric_range to utility namespace
- add core/utility.cpp
- support range:increment, e.g. 5-25:10 expands to '5 15 25' instead of '5 10 15 20 25'

* omnitrace-causal update

- end-to-end expands all speedups
- support range:increment in speedups

* causal backtrace updates

- remove select_ival (realtime signal always contributes when select_count == 0)

* containers: static_vector update

- explicit c_array constructor
- explicit std::array constructor

* causal data updates

- remove set_current_selection(uint64_t)
- remove set_current_selection(std::array)
- sample_selection increase default wait time
- report eligible PC candidates
- move reset_sample_selection to perform_experiment_impl
- decrease latest_eligible_pc array size
- set_current_selection does not guard for experiment::active

* core debug updates

- OMNITRACE_PRINT_COLOR macros

* causal data updates

- tweak to experiment never started message

* causal gotcha updates

- remove unused code

* critical trace updates

- remove unused code

* omnitrace-causal

- OMNITRACE_LAUNCHER

* causal data updates

- don't fail on end-to-end + omnitrace-causal

* causal backtrace updates

- reintroduce select_ival behavior

* causal data updates

- tweak verbose messages about number of PC candidates

* core mproc updates

- utilities for waiting on child PID and diagnosing status
  - omnitrace::mproc::wait_pid
  - omnitrace::mproc::diagnose_status

* omnitrace-run updates

- support --fork argument for executing via fork in current process + execvpe on child instead of execvpe in current process

* omnitrace-causal updates

- wait_pid and diagnose_status just call equivalent functions in omnitrace::mproc

* ubuntu-focal workflow update

- attempt to launch ubuntu-focal-codecov job with CAP_SYS_ADMIN and use perf backend

* tests reorg and updates

- remove binary-rewrite-sampling and runtime-instrument-sampling tests
- rename *-preload tests (which use omnitrace-sample exe) to *-sampling
- split tests/CMakeLists.txt into several tests/omnitrace-<category>-tests.cmake files
- tweak to causal-both-omni-func test
  - add args: -n 2 -b timer

* update validate-causal-json.py

- better reasoning info for adjusting tolerance
- always apply tolerance adjustments in CI mode

* causal e2e tests update

- add label "causal-e2e" label
- tweak params
  - old: 80 12 432525 500000000
  - new: 80 50 432525 100000000
- disable processor affinity for slow-func/line-100 tests
  - artificially inflates some speedups with perf

* unblocking_gotcha updates

- overload operator() according to gotcha function index

* blocking_gotcha updates

- overload operator() according to gotcha function index
- fix bug where potentially post block functors (e.g. pthread_mutex_trylock) throw error if lock is not acquired.

* parse_numeric_range update

- support unordered_set

* config update

- OMNITRACE_DEBUG_{TIDS,PIDS} use parse_numeric_range
2023-04-13 02:14:35 -05:00
Jonathan R. Madsen cc14b52584 Casual Profiling GUI (#265)
* Long workload name layout fix (#269)

* changed layout to fit experiment names

- added span to show full name
- shortened name to fit dropwdown
- changed layout for added consistency

* layout Fixes

- refresh button is to the right
- header is more consistent across different width screens

* header layout update

- center div makes turns into multiple lines if not all items fit

* slight improvement for header/graph spacing

* Fixed refresh button shape and function

- moved find_causal_files to parser so that main and gui can access
- resized refresh width to allow for same shape across different screens

* all graphs now have the same width

- graphs now have same width

- chart headers start well below the header with filters

---------

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* Causal GUI: Linting, synced y-range, remove unused imports/variables, and bug fixes (#274)

* GUI: python linting workflow

- runs flake8 on code in source/python/gui

* GUI: flake8 settings in source/python/gui/setup.cfg

- ignore E501 errors (line too long)
- ignore W503 errors (line break before binary operator)

* GUI: setup.py updates

- remove unused imports

* GUI: __main__.py updates

- stddev is float value
- remove unused imports
- effectively propagate the --stddev argument

* GUI: gui.py updates

- remove unused imports
- fix light mode
- sync initial y range of all causal plots
- fix error bars for causal data
- set x-ticks to 5
- set y-ticks to 10
- only display top 99% of samples
- separate global declarations and assignments to global values
- remove unused variable assignments
- fix mislabeled function_regex and exp_regex
- change if X == False to if not X

* GUI: header.py updates

- remove unused imports
- fix mislabeled function_regex and exp_regex

* GUI: parser.py updates

- add set_num_stddev function for manipulating global num_stddev value
- remove unused variables
- fix latency point object (duplicated __init__ function)
- fix handling latency in JSON
- fix formatting of validation format error message
- replace if X == False with if not X
- fix unused dataframe creation in add_latency
- fix flake8 do not assign lambda for name_wo_ext (use def)

* GUI: gui.py updates

- replace misnamed "func_list" with "experiment_list"
- replace misnamed "exp_list" with "progpt_list"

* GUI: fix python workflow

- quote python versions to avoid truncating 3.10 to 3.1

---------

Co-authored-by: JoseSantosAMD <87447437+JoseSantosAMD@users.noreply.github.com>
2023-04-11 23:36:24 -05:00
Jonathan R. Madsen abe35de43a omnitrace-run executable - required for running binary writes (#257)
* omnitrace-run exe

- ensure LD_PRELOAD for libomnitrace-dl.so
- convert config options into command-line options

* Update timemory submodule

- updates to tsettings
- updates to argparser

* common environment update

- throw error if get_env<bool> has empty string

* config updates

- minor tweaks to categories of settings

* core lib update

- add argparse for common handling of argument parsers

* omnitrace-sample update

- fix handling of --trace-file (OMNITRACE_PERFETTO_FILE)

* omnitrace-run update

- updated to use omnitrace::argparse functions

* Tests for omnitrace-run

* argparse core update

- remove choices for --cpu-events and --gpu-events

* remove some debugging prints

* fix timemory include in argparse.cpp

* always provide --hsa-interrupt option

* Update source/lib/core/argparse.cpp

- fix pedantic warning

* Update testing

- remove testing args that may not be there in some builds

* roctracer/pthread_create fix

- disable roctracer_data when roctracer not enabled

* omnitrace-causal tweak

* omnitrace-instrument: module_function tweak

- allow DEFAULT_MODULE and LIBRARY_MODULE

* common environment update

- support get_env for enums

* core: config update

- Add "mode" category to OMNITRACE_MODE

* Update timemory submodule

- remove debug print statement

* omnitrace-sample tweak

- change var init

* omnitrace-run testing update

- use --help instead of -?

* core: common.hpp

- tweak header include style

* core: argparser update

- add_ld_preload func
- launcher and command member variables in parser_data
- support launcher

* omnitrace-run update

- clean up and reworked

* libomnitrace-dl updates

- require LD_PRELOAD with binary rewrite
- dl::InstrumentMode
- dl::get_instrumented()
- verify_instrumented_preloaded()
- omnitrace_set_instrumented(int)
- relocated omnitrace_main from main.c to dl.cpp
- omnitrace_set_env does not dlopen libomnitrace
- omnitrace_set_main(func_ptr) [internal API]
- OMNITRACE_HIDDEN_API -> OMNITRACE_INTERNAL_API

* Update testing to new LD_PRELOAD requirements

* omnitrace-instrument updates

- adhere to LD_PRELOAD requirementsa
- invoke omnitrace_set_instrumented
- binary rewrite does not instrument main
- binary rewrite does not instrument call to omnitrace_init
- runtime instr does not instrument main
- runtime instr does not instrument call to omnitrace_init

* Bump to v1.9.0

- LD_PRELOAD requirement necessitates minor version increment

* common: environment

- fix ambiguous get_env calls

* omnitrace-instrument update

- fix issue with temporaries

* omnitrace-instrument and libomnitrace-dl updates

- runtime instrumentation does not work if libomnitrace-dl is preloaded

* libomnitrace-dl and libpyomnitrace updates

- define dl::InstrumentMode in dl.hpp
- handle instrumentation via setprofile libpyomnitrace
  - do not push trace in omnitrace_init

* omnitrace-instrument and libomnitrace-dl updates

- move header to dl subdirectory
- omnitrace::omnitrace-headers include omnitrace-dl folder
- use InstrumentMode in omnitrace-instrument

* Update workflows and scripts

- Use omnitrace-run on instrumented exes

* Update docs

- add omnitrace-run to examples of running binary rewritten exes
2023-03-14 19:48:29 -05:00
Jonathan R. Madsen ab0e5d9b44 omnitrace -> omnitrace-instrument (#256)
* omnitrace-exe -> omnitrace-instrument

- Renamed omnitrace executable to omnitrace-instrument
- Provided dummy omnitrace exe which forwards onto omnitrace-instrument
- updated all docs to reflect the name change of the executable
  - however, it is possible some were missed

* Update dyninst submodule

- correctly handle BOOST_LINK_STATIC in DyninstBoost.cmake

* Disable IPO for omnitrace-instrument
2023-03-09 18:18:34 -06:00
Jonathan R. Madsen 83f9ed8696 Python 3.11 support + update RedHat CPack (#254)
* Fixes for Python 3.11

* Add python 3.11 to scripts

- also tweak to to{upper,lower} bash functions

* Fix PAPI RPM packaging in RedHat

- fix error from #!/usr/bin/python in papi_hl_output_writer.py
  - requires either python2 or python3 instead of python

* cpack updates

- only generate STGZ for RedHat
- support `--generators` arg in build-release.sh
- support 7z, zip, and other zip generators
- fix build-release.sh with `--mpi`
- support setting CONDA_ROOT

* Support rhel/fedora/centos in omnitrace-install.py

* RedHat status badge

* Fix support for Python 3.11 + tweak ubuntu ci

- Remove installing clang and mpich in Ubuntu CI container
- Fallback on conda-forge for Python 3.11
- Enable entrypoint-rhel.sh for RHEL CI
- Pull latest container by default

* Update ElfUtils and PAPI builds

- quieter build output
- disable-nls for ElfUtils
- use -s flag for make

* Development Guide Docs
2023-03-08 00:19:29 -06:00
Jonathan R. Madsen 1688a027d8 Add RedHat CI and release packaging (#251)
- additional miscellaneous tweaks to workflows and docker scripts, e.g. install perfetto python bindings
- improves the stability of MPI finalization
- reduces some debug messages within timemory when `OMNITRACE_DEBUG=ON`
- fixes issue found in RHEL where libunwind is using mutex and omnitrace was not treating this as an internal mutex call
  - this may have been affecting the causal profiling slightly (tests seem a bit more stable now)
- fix data race in timemory

* Add RedHat CI and release packaging

- additional miscellaneous tweaks to workflows and docker scripts, e.g. install perfetto python bindings

* Fix URL for ROCm packages in redhat workflow

* Fix dnf --enable-repo for ROCm perl packages

* Dockerfile.rhel and redhat.yml updates

- Fix dnf repo for ROCm PERL packages
- Disable python in CI (interpreter segfaults)
- Exclude parallel-overhead-locks tests due to inclusion of internal locks
  - This needs to be remedied in the future

* Exclude _dl_relocate_static_pie from instrumentation

* Testing updates

- OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF for parallel-overhead-locks

* Fix redhat workflow

* redhat.yml update

- remove if condition on config/build/test step

* Update timemory submodule

- tweaks to verbosity messages

* Set thread state before unw_step

- on Redhat, unw_step calls mutex

* Update timemory submodule

- verbosity changes
- gotcha uses spin_lock/spin_mutex

* Remove using gsplit-dwarf unless OMNITRACE_BUILD_NUMBER > 2

* Re-enable parallel-overhead-locks tests in redhat workflow

* Always disable timemory manager metadata auto output

* testing updates

- tweak parallel-overhead-locks-timemory to higher instruction count min
- OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF for parallel-overhead-locks-perfetto

* Update timemory submodule

- quiet realpath queries

* omnitrace exe updates

- detect text files
- improved bin/lib locating

* cmake format

* test-install.sh and redhat workflow updates

- handle testing when ls is script
- re-enable python testing on redhat workflow
- invoke test-install.sh in redhat workflow

* Misc guards for finalization

* omnitrace-exe, testing updates

- test-install.sh: LS_EXEC -> LS_NAME
- handle /usr/bin/ls being script in source/bin/tests
- improve locating the binary

* Fix mpi_gotcha compile error

* omnitrace-exe updates

- improve file locating

* formatting

* Misc fixes

- remove -static-libstdc++ for RHEL packaging (rocky-linux doesn't distribute static lib)

* omnitrace-exe paths

* Replace realpath with absolute

- using absolute path to symlink fixes issues with locating libdyninstAPI_RT at runtime

* omnitrace exe updates

- judicious use of realpath

* Update timemory submodule

- fix update main hash ids/aliases data race in merge

* bin tests update

- change working directory of omnitrace-exe-simulate-lib-basename

* omnitrace exe updates

- Update resolved exe/lib messaging

* bin tests update

- change working directory of omnitrace-exe-simulate-lib-basename
2023-03-07 06:04:19 -06:00
Jonathan R. Madsen 32b15fe7b7 Handle fork in target application (#191)
* Always print PID in log messages

* omnitrace-dl updates

- omnitrace_preload does not call omnitrace_init or omnitrace_init_tooling
- omnitrace_preload will call omnitrace_set_mpi if OMNITRACE_USE_MPI
  or OMNITRACE_USE_MPIP in the env is true but not call it otherwise
  because doing so either overrides OMNITRACE_USE_PID (when true) or
  disable mpip from initialization (when false) and the MPI
  init can be caught later and override OMNITRACE_USE_PID

* config updates

- set_setting_value sets user update type
- remove volatile from get_settings_configured
- don't override settings::default_process_suffix
- don't kill process in omnitrace_exit_action
- set_state ignores updating state if >= State::Finalized

* Handle state > State::Finalized

* fork gotcha updates

- unsets LD_PRELOAD
- sets OMNITRACE_ROOT_PROCESS
- sets OMNITRACE_CHILD_PROCESS

* libomnitrace library.cpp updates

- basic_bundle for fini metrics
- handle finalization from child process

* sampling updates

- sampling::shutdown handles when child process

* Add example and test using fork

* Update run-ci script to support not submitting

* Tweak test envs

* Update build flags when codecov enabled

* remove unnecessary includes of sampling header

* Replace mpi copy/fini static lambda with free-funcs

* Update codecov job

* Fix OMPT segfaults after finalization

* Miscellaneous updates after rebase

* fixes for causal profiling

* revert some run-ci.sh changes

* Disable storing env in sampling::shutdown

* formatting fix

* Update timemory submodule

- fixed occasional synchronization issues with allocator offloading
- exclude protozero:: from internal samples

* improve root/child process detection

- avoid omnitrace_finalize in MPI when child process
- revert some testing tweaks
2023-02-08 01:31:38 -06:00
Jonathan R. Madsen e7d3125459 restructure libomnitrace + tasking and omnitrace-causal updates (#237)
* restructured libomnitrace

- this is necessary to incorporate some of the binary analysis capabilities into omnitrace exe
- created libomnitrace-core (static)
- created libomnitrace-binary (static)
- created libomnitrace (static)
- omnitrace-avail links to libomnitrace.a
- omnitrace-critical-trace links to libomnitrace.a
- tweaked the testing
  - reduced verbosity on some of MPI tests
  - excluded trace-time-window from tests on Ubuntu 18.04
  - reduced causal e2e iterations
- minor tweak to tasking
  - manually create `PTL::UserTaskQueue` instance instead of relying on `PTL::ThreadPool` to create it

* Update formatting workflow

- source formatting uses ubuntu-22.04
- check-includes doesn't generate false positive for 'include "timemory.hpp"'

* omnitrace-causal --generate-configs

- fix config generation in omnitrace causal
- add test for omnitrace-causal + generating configs

* Fix omnitrace-object-library build

- accidentally included rocm sources in non-rocm builds

* Fix rocm compilation w/o rocprofiler

* update timemory submodule with mpi_get warning messages

* sampling offload file updates

- more verbose messages
- disable offload before stopping

* testing updates

- increase causal e2e iterations to 12
- increase lock_environment verbose to 2 (for sampling offload messages)
- fix return for omnitrace_add_validation_test
2023-02-04 10:59:50 -06:00
Jonathan R. Madsen 9618ddefba Causal profiling (#229)
* Addition of basic structure

* Reworked categories

* More causal integration additions

* Causal implementation

* Update examples

* delete virtual_speedup files

* Update perfetto submodule to v31.0

* Update dyninst submodule

* Update timemory submodule

* ElfUtils build for libdw

* OMNITRACE_LIKELY and OMNITRACE_UNLIKELY

* Update common lib join

* Examples updates for causal profiling

* config updates with causal options

- OMNITRACE_CAUSAL_FIXED_LINE
- OMNITRACE_CAUSAL_FIXED_SPEEDUP
- OMNITRACE_CAUSAL_FILE
- OMNITRACE_CAUSAL_BINARY_SCOPE
- OMNITRACE_CAUSAL_SOURCE_SCOPE
- version info in banner
- support increments in parse_numeric_range
- fix occasional deadlock in first call to get_config

* PTL general task group

* Always include PID in debug/verbose messages

* Add blocking/unblocking gotchas to runtime init bundle

* CausalState

* thread_data updates

- generic component_bundle_cache

* Improve handling of causal in category_region

* components updates

- backtrace_causal component
- backtrace::get_data member func
- decrease ignore_depth in backtrace::sample(int)
- handle "omnitrace_main" in backtrace::filter_and_patch(...)
- tweak internal thread state scope for pthread_mutex_gotcha wrappers

* simplify tracing get_instrumentation_bundles usage

* sampling updates

- include backtrace_causal component
- disable backtrace_metrics if using causal and not using perfetto
- disable backtrace and backtrace_timestamp when using causal
- post_process_causal

* causal updates

- more checks in blocking_gotcha and unblocking_gotcha start/stop
- miscellaneous overhaul of data
- experiment update

* Remove virtual speedup

* libomnitrace code_object

* causal-profiling test

* libomnitrace library.cpp updates

- handle causal profiling
- fini_bundle

* Disable causal profiling by default

* Updated causal code and example

- example: three execution variants: cpu + rng, cpu, rng
- example: three instrumentation variants: none, omni, coz
- fix blocking gotcha credit
- rework perform_experiment_impl
- get_eligible_address_ranges
- compute_eligible_lines
- support fixed lines/speedups/functions
- update selected_entry to support function mode
- fix causal::delay
- experiment updates

* omnitrace_progress / omnitrace_user_progress

- with accompanying omnitrace_annotated_progress / omnitrace_user_annotated_progress

* Update timemory submodule

* CausalMode

- mode indicated whether causal predictions source be at line-level or function-level

* code_object, config, runtime, sampling, thread_data

- code_object: address_range
- code_object: basic::line_info serialize(), name(), hash()
- config updates
- two signals for causal sampling
- thread_data init fixes

* pthread updates

- pthread_create_gotcha processes delays
- pthread_mutex_gotcha does not wrap pthread_join in causal mode

* backtrace_causal update

- dynamic delay period stats

* main wrapper uses basename of argv[0]

* update elfio submodule

* perf support (currently unused)

* Fix experiment JSON serialization

- static_vector.hpp (unused)

* causal executable + config options updates

- omnitrace-causal exe simplifies running multiple causal configs
- changed the causal config option names

* Support both throughput and latency points

* process-causal-json.py script

- will be used later for testing

* stable_vector

* Rework thread_data

* Improve omnitrace-causal exe

- better verbosity handling
- correct diagnosis of status for child process
- execvpe when only one iteration (debugging)

* Update timemory submodule

* exe --version

- omnitrace, omnitrace-avail, and omnitrace-sample all support --version on command-line

* OMNITRACE_INTERNAL_API + OMNITRACE_{LIKELY,UNLIKELY}

* omnitrace-causal cmake format

* omnitrace config update

- OMNITRACE_CAUSAL_FILE_CLOBBER

* custom exception

- wraps STL exception and gets stacktrace during construction

* exit_gotcha supports _Exit

* use global construct_on_init + max threads

- add some safety when exceeding max # of threads

* update code_object binary filter

- exclude dyninst and tbbmalloc library

* containers: c_array, static_vector, stable_vector

- moved utility::c_array to container::c_array
- created static_vector: std::vector bound to std::array
- created stable_vector: vector with stable references

* grow thread_data when new thread created

* causal updates

- data: improve compute_eligible_lines to ignore lambdas
- data: use new thread_data
- delay: use new thread_data
- experiment: properly support latency points
- experiment: support file clobber
- experiment: ensure non-zero experiment time
- progress_point: use new thread_data
- backtrace_causal: use new thread_data

* Update causal-profiling tests

* fix omnitrace-causal backslash escaping

* process-causal-json script

* restructure causal implementation

- update verbose messages for omnitrace-causal diagnose_status
- migrated causal implementation in sampling.cpp to causal/sampling.cpp
- OMNITRACE_USE_CAUSAL does not require OMNITRACE_USE_SAMPLING
- added Mode::Causal
- causal sampling uses same signals as regular sampling
- moved tracing::thread_init to implementation file
- combined tracing::thread_init and tracing::thread_init_sampling
- added causal/components folder
- pthread_create_gotcha::wrapper_config
- omnitrace_preload checks OMNITRACE_USE_CAUSAL
  - updates mode accordingly

* update timemory submodule

* update timemory submodule

* causal example updates

- causal for lulesh

* perf code + utility - helpers

- relocated causal perf code
- placement new when generating unique ptr trait for potentially allocating during sampling
- additions to utility header
- removed previously added helpers.hpp

* update timemory submodule

* Default env variables for omnitrace-causal

- activate OMNITRACE_USE_KOKKOSP, etc.

* update stable_vector and static_vector

- static vector can use atomic for size tracking for thread-safe situations

* update causal example header

- CAUSAL_PROGRESS_NAMED
- use CAUSAL_ prefix for some macros

* Tweak lulesh example

- use CAUSAL_PROGRESS instead of CAUSAL_BEGIN and CAUSAL_END

* omnitrace-sample support for causal mode

- set OMNITRACE_USE_SAMPLING to off when OMNITRACE_MODE=causal

* refactor and cleanup code_object

- scope filter
- fixes to address_range

* overhaul causal data + causal config options

- full support for function and line mode
- support static vector of instruction pointers
- improve line info mapping resolution
- remove thread-locality from miscellanous functions where unnecessary
- causal options for {binary,source,function,fileline} exclusion

* causal experiment, sampling, and backtrace updates

- is_selected + unwind address array
- experiment warning about progress points
- increased buffer size for backtrace_casual sampler
- backtrace_causal only stores IP addresses instead of full unwind info

* category_region updates

- minor refactor
- local_category_region::mark

* Update causal tests

* Bump version to 1.8.0

* omnitrace-causal args + CLOBBER -> RESET

- renamed OMNITRACE_CAUSAL_FILE_CLOBBER to OMNITRACE_CAUSAL_FILE_RESET
- updated omnitrace-causal exe to support recently added configuration options
- other miscellaneous tweaks to data.cpp, experiment.cpp, and sampling.cpp

* Refactor causal and code_object

- code_object.hpp and code_object.cpp moved into binary folder
- causal components namespaced into omnitrace::causal::component
- moved sample_data out of backtrace_causal and into own file
- renamed backtrace_causal to causal::component::backtrace

* preload omnitrace_init + OMNITRACE_DEBUG_MARK

- env OMNITRACE_DEBUG_MARK
- fix omnitrace_init call when LD_PRELOAD-ing omnitrace

* Fix fileline support + line-info output names + experiment log

- line-info log files are prefixed with experiment name
- don't print experiment duration when E2E
- account for fileline scope in analysis

* KokkosP: OMNITRACE_KOKKOSP_NAME_LENGTH_MAX

- config option to limit the name of kokkos tool callbacks
- remove [kokkos] from KokkosP names

* Update causal example

- minor tweaks to decrease probability of overlapping regions in binary

* omnitrace-causal update

- prefix N / Ntot in environment printout

* Miscellaneous updates

- causal::finish_experimenting()
- OMNITRACE_CAUSAL_RANDOM_SEED
- KokkosP causal updates
  - exclude some callbacks, make some callbacks unique, etc.
- address_range::operator+=(address_range)
- combine contiguous ranges in binary/analysis.cpp when file, func, line is same and address range is contiguous
- bfd_line_info reads inline info
- wait for perform_experiment_impl to complete
- causal::delay updates
  - delay::process checks if experiment is active
  - uses threading::get_id()
- experiment scales duration up for larger speedup experiments
- line info samples includes excluded lines
- sampler uses CLOCK_REALTIME
- blocking_gotcha updates
  - is no longer fully static
  - adds audit routine which sets the postblock value to zero if try/timed routine fails
- category::host was added to causal_throughput_categories_t
- pthread_create_gotcha sets new threads local parent delay
  - was using internal value, now uses sequent value

* Causal improvements to KokkosP

* Updates to experiment time scaling

- use stats instead of just max

* binary/link_map.{hpp,cpp}

* update process-causal-json.py

* Folded fileline scope into source scope

* Update documentation

- Add documentation for causal profiling
- Replace 'Omnitrace' with 'OmniTrace' everywhere

* Update causal-helpers.cmake + omnitrace-testing.cmake

- split tests/CMakeLists.txt partially into omnitrace-testing.cmake

* omnitrace/causal.h

- OMNITRACE_CAUSAL_PROGRESS
- OMNITRACE_CAUSAL_PROGRESS_NAMED
- OMNITRACE_CAUSAL_BEGIN
- OMNITRACE_CAUSAL_END

* selected_entry + remove default filters for lambdas and operator()

- selected entry stores range and binary load address

* update process-causal-json.py

* format examples/lulesh/CMakeLists.txt

* causal-helpers find_package(Threads)

* OMNITRACE_KOKKOSP_KERNEL_LOGGER

- was OMNITRACE_KOKKOS_KERNEL_LOGGER

* quiet find of coz-profiler

* Fix rocm_smi exception handling

* Update timemory submodule (binutils)

- fix binutls compile error on some systems
- bump binutils to v2.40

* Fix miscellaneous tests

* OMNITRACE_KOKKOSP_PREFIX

* revert rocm_smi handling

* ElfUtils updates

- default to download version 0.188
- add -Wno-error=null-dereference due to GCC 12 compiler error

* Update causal example

* Remove OMNITRACE_VERBOSE from global workflow envs

* Reliable causal test

* disable compilation of causal perf files

* Remove set_current_selection with unwind stack

* update timemory submodule

* fix for segfault on bionic

- locking in TLS dtor was causing segfault

* remove experiment::is_selected(unwind_stack_t)

* update default init of selected_entry

* Fix for when IP is not offset by load address

* Update CMakeLists.txt

* Miscellaneous updates

- OMNITRACE_WARNING_OR_CI_THROW
- OMNITRACE_REQUIRE
- OMNITRACE_PREFER
- fixed issues with no ASLR
-  added load address variable and ipaddr() func to basic/bfd line info
- removed get_basic() from dwarf_line_info
- TIMEMORY_PREFER -> OMNITRACE_PREFER
- removed previously added binary_address and range variables from selected_entry

* Removed superfluous CausalState

* Additional causal tests (lulesh + kokkos)

* filter, prefer, analysis ASLR handling

- removed default filter on cold functions
- fixed OMNITRACE_PREFER
- fixed analysis ASLR handling

* Tweak line-info output

* Removed some superfluous code

- causal/delay
- causal/selected_entry

* Exclude main.cold in function mode

* Update validate-perfetto-proto.py

- account for occasional http errors

* Add sampling test disabling tmp files

* argparser for process-causal-json

- support validation
- support filtering

* Avoid pthread_{lock,unlock} in sampling offload

- use homemade atomic_mutex/atomic_lock since contention will be low and using pthread tools might trigger our wrappers

* Rename process-causal-json.py

- validate-causal-json.py

* rework omnitrace_add_causal_test

- capable of performing validation
- added validation tests

* Fix kokkosp_begin_deep_copy + causal

* Tweak address range in bfd_line_info::read_pc

* Tweak analysis and data IP handling

- look for gaps

* Disable scaling experiment time by speedup

* Revert change in max threads during CI

* binary updates

- significant overhaul of binary analysis implementation
- removed "basic_line_info" and "bfd_line_info" in lieu of "symbol" class
  - symbol class has basic BFD info + vector of inlines + vector of dwarf info

* Updated causal to use new binary analysis

- Fix symbol.cpp includes

* Updated formatting target

- include *.cmake files

* Updated causal tests

- causal tests should be stable now

* Update timemory and dyninst submodules

- TPLs are stripped + built w/o debug info

* Increase tolerance for causal validation speedups

- higher speedups have more variance (increased to +/- 5 from 3)

* Support causal output for MPI

- i.e. tag with MPI rank

* omnitrace-causal launcher argument

* improve experiment sampling output

* causal data updates

- call compute lines once
- fixed filtered cached binary info
- debugging info when experiment fails to start

* Tweaked causal validation tests

* dwarf_entry ranges

* CI updates

- increase max threads to 64

* Tweak causal E2E validation tests

- more threads
- shorter thread runtime
- more iterations

* Fix shadowed variable

* fix symbol read_bfd last PC calculation

* fix maybe-uninitialized warning

* omnitrace-causal launcher update

- only inject "omnitrace-causal --" once
- throw error if no matches found

* Update causal profiling docs for launcher

* fix address range boundaries
2023-01-24 18:53:23 -06:00
Jonathan R. Madsen 12001d9633 Fix release workflow (#228)
- remove docs updates and release step from cpack.yml workflow
- create docs.yml workflow
- create release.yml workflow
- update source/docs/environment.yml
- add status badge for documentation
2023-01-15 23:04:15 -06:00
Jonathan R. Madsen 1f35d58f47 CPack support for ROCm 5.4 (#227)
- add ROCm 5.4 to CPack variants
2023-01-15 19:20:50 -06:00
Jonathan R. Madsen 1f818054ce omnitrace-install.py (#221)
- omnitrace-install.py will be uploaded as a release asset
- script simplifies selecting the correct installer script
2022-12-16 05:31:40 -06:00
Jonathan R. Madsen 7ecc037d17 Fix release docs workflow + update documentation (#216)
* Fix release-docs workflow

* Documentation updates

- warning as errors when building docs
- fixed warnings when building docs
- fixed doxygen comments

* Miscellaneous fixes

* Fix doxygen comments
2022-12-14 07:59:53 -06:00
Jonathan R. Madsen 589a729702 roctracer: device and stream tracks (#209)
* roctracer: use multiple tracks for HIP streams

Use different perfetto tracks for each stream, and set the name of
these tracks to the stream pointer values. Setting the name like this
matches the args in the API traces.
This fixes overlapping work on multiple streams appearing as a call
stack.

* Fix -pedantic

* Run clang-format

* Add option to disable per stream tracks in perfetto

* Updated scheme for roctracer activity + general roctracer fixes

- Per-device tracks
- Handle HSA OPS in ROCm 5.3
  - the changes in ROCm 5.3 were causing HSA ops to get discarded
- Default for OMNITRACE_ROCTRACER_DISCARD_INVALID is now zero
  - i.e. default behavior is to flip beg_ns and end_ns when beg_ns > end_ns

* Flush perfetto at end of hip_activity_callback

- fixes unterminated regions

* GitHub Actions and run-ci script updates

- improve reliability

* Set OMNITRACE_TMPDIR in testing

- files in /tmp get occasionally deleted during CI

Co-authored-by: Gergely Meszaros <gergely@streamhpc.com>
2022-11-16 15:57:27 -06:00
Jonathan R. Madsen ab0a3e9d7d CI CPack fix (#204)
- Fixes error in CPack workflow from PR #203
2022-11-13 10:47:09 -06:00
Jonathan R. Madsen e1102a8ba4 CI and testing updates (#203)
* Python implementation of run-ci.sh

* Container workflow update

- retry failed container build to combat network failures

* cpack workflow update

- retry failed base container build to combat network failures

* General CI workflow updates

- retry failed "Install packages" step to combat network failures

* Miscellanous linting fixes

* Formatting workflow update

- improve regex for source formatting

* format user.h

* Add new omnitrace-avail tests

* Make run-ci.py executable

* workflow retry fix

- timeout_seconds -> retry_wait_seconds

* Fix cmake formatting glob

* source formatting

* Handle PRs in run-ci.py

* Specify timeout_minutes in retry steps

* Remove remaining --cmake-args from workflows

* CI warnings about using MPICH headers

* Remove text=True from run-ci.py

- not capturing stdout/sterr so unnecessary

* Fix OpenSUSE step label

* Update omnitrace-avail-write-config tests

- use TWD (Test Working Directory) instead of PWD since PWD might not be build directory

* paths-ignore + workflow_dispatch
2022-11-13 10:42:14 -06:00
Jonathan R. Madsen 642b6b95ca Support external (i.e. user-defined) trace annotations (#195)
* Support external (i.e. user-defined) trace annotations

- tweaked the python examples to be more balanced
- updated the user-api example to conform to user API changes
- moved the get/set for State and ThreadState to state.{hpp,cpp}
- introduced user-provided trace annotations
- added perfetto python category
- moved coverage impl files around
- created enumerations for mapping category enums to category types
- created enumerations for mapping annotation type enums to annotation values
- moved tracing::add_perfetto_annotations to tracing/annotation.hpp
- utility make_index_sequence_range
- libomnitrace-dl: omnitrace_push_category_region
- libomnitrace-dl: omnitrace_pop_category_region
- libomnitrace-user: omnitrace_user_push_annotated_region
- libomnitrace-user: omnitrace_user_pop_annotated_region
- libpyomnitrace: support extra annotations via annotate_trace config value
  - filename
  - line
  - last attempted instr in bytecode (lasti)
  - argcount
  - num local variables
  - stacksize
- omnitrace-python: -a / --annotate-traces option

* tweak ubuntu-focal workflow

* Fix installation of omnitrace-user headers

* ubuntu-focal-codecov workflow update

- Install texinfo

* Update timemory submodule
2022-11-11 07:31:14 -06:00
Jonathan R. Madsen 654beef6ab Dyninst fix for BinaryEdit::getResolvedLibraryPath (#193)
* Dyninst fix for BinaryEdit::getResolvedLibraryPath

- parsing of /sbin/ldconfig -p did not handle "Cache generated by: ..." line in Ubuntu 22.04

* Update ubuntu-focal-codecov workflow

- install texinfo
- set MPI_HEADERS_ALLOW_MPICH=ON
2022-11-02 01:25:26 -05:00
Jonathan R. Madsen f147670a7a Various optimizations (#192)
* CDash name prefix {{ repo_owner }}-{{ ref_name }}

- remove /merge from CI name

* disable using BFD when sampling_include_inlines is OFF

- this consumes a lot of memory

* Improve finalization of rocprofiler

* update timemory submodule

- disable OMPT thread begin/end callbacks
- support hierarchies in signal handlers
- update operation::pop_node debugging
- settings_update_type + setting_supported_data_types
- fixed parsing args in timemory_init

* Improve timemory build time

* Remove kokkosp restrictions for perfetto

* omnitrace exe signal handler update

- configure signal handlers before main to allow libomnitrace to override

* Backtrace and timemory submodule updates

- Use unwind::cache w/o inline info
- update timemory submodule
  - unwind::cache updates
  - filepath updates
  - fix termination_signal_message
  - fix vsettings::report_change

* Update dyninst submodule

- updates BinaryEdit::getResolvedLibraryPath

* update timemory submodule

- update CpuArch support

* Cleanup configure warnings

* Update examples cmake and workflows

- (Mostly) eliminate configuration warnings

* omnitrace exe updates

- pass environ to BPatch::processCreate
- avoid trailing ":" in DYNINST_REWRITER_PATHS

* Update dyninst submodule

- Add flags to DyninstOptimization.cmake
- Remove strtok from BinaryEdit::getResolvedLibraryPath

* examples/mpi CMakeLists.txt update

- STATUS message about missing MPI during CI, otherwise AUTHOR_WARNING

* Dev build and linker flags

- use -gsplit-dwarf when OMNITRACE_BUILD_DEVELOPER is ON
  - disable when OMNITRACE_BUILD_NUMBER > 1
- OMNITRACE_BUILD_LINKER option
- add -fuse-ld=${OMNITRACE_BUILD_LINKER}
- omnitrace_add_cache_option function

* Update workflows to set OMNITRACE_BUILD_NUMBER

* Fix generator expressions for -fuse-ld

* Suppress some configuration warnings during CI

- helps to keep track of real warnings when they arise

* Update timemory and dyninst submodules with CMP0135

* Add -V flag to run-ci script
2022-11-01 17:28:12 -05:00
Jonathan R. Madsen 46b6db1a4c Submitting jobs to cdash (#124)
* Submitting jobs to cdash

* Fail on submit

* submit url env

* submit url env

* try passing submit url as arg

* fix submit url

* Updated default URL

* Add submissions for remaining ubuntu focal workflow jobs

* Replace g++ with gcc in dashboard build name

* Add --ctest-args to run-ci.sh

* Add cdash support for bionic, jammy, and opensuse workflows

* Decrease CTEST_CUSTOM_MAXIMUM_PASSED_TEST_OUTPUT_SIZE

* OMNITRACE_BUILD_CODECOV option

* Support code coverage in CDash script

* CI dyninst built with debug info

* Update ci-containers

- cron schedule moved 4 hours later to UTC+5

* Update implementation of config::configure_signal_handler

- using lambdas failed to compile with codecov flags

* Add codecov job to ubuntu focal workflow

* Fix support for --ctest-args in run-ci script

* Fix ubuntu workflows

* Fix quotation handling in run-ci script

* git safe directory for codecov

* New MPI examples

* Remove --stop-on-failure

* dynamic_library update

- find_library_path checks procfs maps
- invoke find_library_path with no additional args to resolve to mapped file

* RCCLP uses dynamic_library

* check if file exists for memory_map_files metadata

* Testing updates

- include new mpi examples in tests
- fix test labels
- test critical-trace exe

* Update MPI C examples tests (needed arg)

* Remove try/catch block from critical-trace

* Fix sampling max wait when shutting down

* Fix test env for critical-trace

* Fix settings for critical-trace

- disable time output: data is deterministic
- disable PID suffixes: not multiprocess

* Update critical-trace ctest

* Update critical-trace exe

- throw error if input cannot be opened
- throw error if input has no data

* Update lulesh example with more kokkos tools usage

* Fix tasking issue with critical_trace and roctracer

- were not setting pools to active
- also sync before critical_trace::get_entries

* Increase verbosity of critical-trace tests

* Update code coverage tests

- skip code coverage + preload
- code-coverage python example and test

* Remove duplication omnitrace.initialize function

* Skip python3.6 for ubuntu jammy

* Update MPI examples

- use MPI_Isend and MPI_Irecv
- explicitly use MPI_Bcast

* Update Formatting.cmake

- include C files in examples

* run-ci script does not check return of coverage

* mpi-allreduce link to libm

* Update ctest args in run-ci script

* Update dyninst submodule

- safety improvements in BinaryEdit::openResolvedLibraryName

* capture cmake error for ctest_coverage
2022-10-31 15:39:45 -05:00
Jonathan R. Madsen 5bb7359b81 Update containers workflow (#182)
* Bump version to 1.7.2
* Update containers workflow
2022-10-21 06:21:00 -05:00
Jonathan R. Madsen 78a06e7a42 Signal handler backtraces provide line info (#178)
* Signal handler backtraces provide line info

- print backtrace after SIGINT during finalization

* Workflow run-name + jammy rocm CI

* fix jammy matrix indentation

* disable building dyninst in jammy

* Update jammy for rocm

* jammy rocm_agent_enumerator

* Fix rocm install for jammy

* jammy bash

* jammy workflow typo

* revert some changes

* stack-usage + omnitrace-rt symlink + ncclSocketAccept + indiv sigs

- symlink omnitrace-rt in build tree
- exclude ncclSocketAccept
- timemory submodule update accepting individual signal handlers
2022-10-18 21:45:56 -05:00
Jonathan R. Madsen ede6007f9b Support for Ubuntu 22.04 and ROCm 5.3 (#48)
* Testing and CI support for Ubuntu 22.04

* Fixes for ROCm

- Jammy does not have ROCm installers

* Name, timeout, and python updates

- renamed ubuntu-jammy-external.yml to ubuntu-jammy.yml
- increased all 5 minute timeouts to 10 minutes
- include python 3.10 in testing

* Update dyninst to remove interposed definition of _r_debug

* Rebuild Dyninst + test install script

* Revert container change

* git safe directory

* pushd -> cd

* fix MPI include

* Fix testing step

* OMPI_ALLOW_RUN_AS_ROOT

* Test script changes

* Fix mismatched malloc / delete[]

* Jammy workflow tweaks

* CPack tweak for boost deb deps

* pthread_mutex_gotcha config returns when not enabled

* fix echoing config in CI

* USE_CLANG_OMP

- option to disable using LLVM OpenMP when building OpenMP test executables
- Jammy workflow sets USE_CLANG_OMP=OFF

* Dyninst submodule boost download

- updated containers workflow to include jammy
- updated workflow to use ci

* Updates to workflows + replace test-install.sh

- test-install.sh in this branch was replaced with one in main branch

* Expand jammy test-install.sh args

* Fix openmp-cg-sampling-duration test

* update timemory submodule

- use-after-free violation in popen::pclose

* revert some tweaks to sampling-duration test

* Fix env of test-install.sh

* cmake format

* jammy bash

* CPack install for jammy

* formatting workflow action version bump

* Update timemory submodule

- libunwind submodule via timemory sets SOVERSION to 99 to avoid ABI conflicts with v8

* Fix help menu for omnitrace-sample

* Support other boolean forms in test-install.sh

* Update docker files and build-docker.sh

- consolidated cases in build-docker.sh
- support rocm version of 0.0 (no rocm install)
- support rocm v5.3
- updated centos handling

* update opensuse actions/checkout version

* Tweaks to ubuntu-focal testing

- actions/checkout@v3
- use test-install script

* update cpack

- ubuntu 22.04
- rocm 5.3
- rename os matrix field to os-version
- remove CI_ROCM_VERSION (no longer necessary)
- remove default-rocm-version matrix field (no longer necessary)
- CentOS packaging

* fix argparsing and omnitrace-sample tests in install-tests.sh

* focal rocm test install workflow fix

* Fix omnitrace-sample build

* Dockerfile.centos + build-docker.sh updates

* Update actions/upload-artifact version

* Dockerfile.ubuntu: install rocm-device-libs

* Refactor cpack

* fix cpack if quotes

* Dockerfile.ubuntu rocm < 5 installs rocm-dev

* build-release.sh defaults to boost version 1.79.0
2022-10-17 12:54:26 -05:00
Jonathan R. Madsen 79a8f16646 omnitrace-sample (#169)
- `omnitrace-sample` executable which executes sampling (no
instrumentation)
- fixes bug with OMPT ignoring value of `OMNITRACE_USE_OMPT`
- fixes some issues with sampling duration
- new `OMNITRACE_SAMPLING_INCLUDE_INLINES` configuration variable
- restricts process-sampling to 100 interrupts/sec when inheriting value
from `OMNITRACE_SAMPLING_FREQ`
- `OMNITRACE_PROCESS_SAMPLING_FREQ` still supports up to 1000
interrupts/sec
- fixes bug with colorized log not truly being disabled in all instances
- adds tests for `omnitrace-sample`
- adds tests for sampling duration
- settings ROCP_TOOL_LIB to libomnitrace-dl throws error
  - rocprofiler does not configure correctly when this is done
- Quiet numa_gotcha warnings
- Fixed some shadowed variables
2022-09-30 10:47:07 -05:00
Jonathan R. Madsen 15e6e6d979 Fix setup-env + hsa/rocm/ompt serialization + testing + misc (#156)
- Fix setup-env.sh
  - Closes #149 
- omnitrace exe color
- test-install.sh script
- if config variable is updated in config or env, include in generated
config
- metadata for hsa, rocm, and ompt
  - Closes #153 
  - Closes #154
2022-09-13 08:11:48 -05:00
Jonathan R. Madsen 64afa49193 GOTCHA wrappers for NUMA functions (#152)
- GOTCHA wrappers for:
  - mbind
  - migrate_pages
  - move_pages
  - numa_migrate_pages
  - numa_move_pages
  - numa_alloc
  - numa_alloc_local
  - numa_alloc_interleaved
  - numa_alloc_onnode
  - numa_realloc
  - numa_free
- bumped version to 1.6.0
- updated categories
- hardware_counters -> kernel_hardware_counters and
thread_hardware_counters
  - simplified mapping category structs to perfetto category names
- updated timemory submodule
2022-09-12 17:44:27 -05:00
Jonathan R. Madsen 808ea7dfa7 Rework sampling and colorized logs (#140)
## Overview

This is a significant PR which has 3 very notable characteristics:

1. Omnitrace colorizes most of it's logging
2. Completely reworked the sampling 
  - Samples now record the current instruction pointers instead of strings
    - This _dramatically_ decreases the overhead of taking a sample
  - The collection of metrics during a sample are split out into another component, enabling that data collection to be disabled -- which decreases the sampling overhead even further
  - When both `OMNITRACE_SAMPLING_CPUTIME` and `OMNITRACE_SAMPLING_REALTIME` are ON:
    - `OMNITRACE_SAMPLING_CPUTIME_FREQ` and `OMNITRACE_SAMPLING_REALTIME_FREQ` can be used to individually control the sampling frequency 
  - `OMNITRACE_SAMPLING_CPUTIME_DELAY` and `OMNITRACE_SAMPLING_REALTIME_DELAY` can be used to individually control the delay time before starting
  - Now, omnitrace does not start a real-time sampler on the main thread unless `OMNITRACE_SAMPLING_REALTIME` is ON
    - In the future, an `OMNITRACE_SAMPLING_TIDS` (and real-time, cpu-time variants) configuration variable(s) will allow you to select which threads will be sampled
3. Files produced by `omnitrace` exe -- `available-instr.txt`, `instrumented-instr.txt`, etc. -- now no longer has `-instr` suffix and are placed in `instrumentation/` subfolder, i.e. `available-instr.txt` -> instrumentation/available.txt`
  - This helped de-clutter the output folder

Most of the other edits were reorganization (e.g. internal namespace changes), cleanup, and splitting up functionality.

## Bug Fixes

There is a bug fix with respect to the HSA callbacks which disabled sampling on child threads when an HSA API call was made

## Details

- created thread_info struct for mapping different thread IDs
- reorganized file structure significantly
- added categories.hpp, concepts.hpp
- moved around name trait definitions
- moved all omnitrace components into `omnitrace::component` namespace
  - there was a lot of inconsistency b/t using `tim::component` in some places and `omnitrace::component`
  - added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
- OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
- roctracer and critical_trace use same thread pool
- critical_trace functions do not lock anymore bc of thread-local TaskGroup
- added `component::local_category_region` to support using `component::category_region` without explicitly passing in name
- removed `component::omnitrace` (unused)
- migrated KokkosP and OMPT to use `component::local_category_region`
  - removed `component::user_region` as a result
- migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
  - removed `component::functors` as a result
- migrated some ppdefs
- `api::omnitrace` -> `project::omnitrace`
- `api::(...)` -> `category::(...)`
- improved recording the execution time of threads
  - migrated this functionality out of pthread_create_gotcha and into thread_info
- moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
- split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
- sampling.cpp handles setup and post-processing that was formerly in backtrace
- updated logging to use colors
- `OMNITRACE_COLORIZED_LOG` config variable
- updated docs on JSON output from timemory
- instrumentation info in instrumentation subfolder
- added testing for KokkosP entries
- added testing for ompt entries
- add_critical_trace function defined in critical_trace.hpp
- disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
- add comp::page_rss to main bundle
- thread_data supports std::optional instead of std::unique_ptr
- thread_data supports tim::identity<T> to avoid unique_ptr or optional
- tracing::record_thread_start_time()
- tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
- removed anonymous namespace from omnitrace::utility
- sampling backtrace stores instruction pointers instead of strings
- component::category_region updates
  - handle disabled thread state
  - handle finalized state
  - fewer debug messages
  - invoke thread_init()
  - invoke thread_init_sampling()
  - handle push/pop count based on category
  - push/pop count only modified when used
- component::cpu_freq
- components/ensure_storage.hpp
- reworked the pthread_create replacement function
- updated parallel-overhead example to report # of times locked
- OMNITRACE_MAX_UNWIND_DEPTH build option
- update timemory submodule
2022-08-31 01:24:31 -05:00
Jonathan R. Madsen 0dd8f52292 Generic comm_data component (#132)
* Generic comm_data component

- moved rccl_comm_data to comm_data
- comm_data includes communication data for MPI

* fix timemory include with quotes

* Only support MPI comm data with full MPI support

* Increase timeouts + kill perfetto

* Update timemory submodule

* Fix missing command killall

* set +e in Kill Perfetto workflow step

* Updated MPI example to include MPI_Send and MPI_Recv calls

* Update timemory submodule with storage merge fix

* Perfetto comm data

- tracing::now<T>() function

* Fix timemory header include
2022-08-25 19:48:10 -05:00
Jonathan R. Madsen b79ce10fee RPATH to rocprofiler_LIBRARY_DIR for ROCm < v5.2 (#126)
* RPATH to rocprofiler_LIBRARY_DIR for ROCm < v5.2

- until v5.2 only librocprofiler64.so was symlinked in /opt/rocm. Thus linker using SOVERSION caused issues finding librocprofiler64.so.1

* Test ROCm w/ CMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF

* INSTALL_RPATH_USE_LINK_PATH for omnitrace exe
2022-08-02 14:19:04 -05:00
Jonathan R. Madsen 641225f883 Fix uploading release assets (#118)
Fix uploading release assets
2022-07-27 15:10:19 -05:00
Jonathan R. Madsen 7e31d9f450 ROCm environment fixes + workflow updates (#117)
* Improve dlopen of ROCm libraries + rocprofiler test

- Use PROJECT_BINARY_DIR in tests
- Added rocprofiler test

* Revert OMNITRACE_FORCE_ROCPROFILER_INIT

* omnitrace-avail --all test

* Fix ROCP_METRICS for ROCm 5.2.0

* Fix ROCP_METRICS for ROCm 5.2.0

* Restrict containers workflow to AMDResearch/omnitrace

* Bump version to 1.3.1

* Update cpack workflow

- generate release draft
- upload installers as release assets

* Test rocprofiler w/o roctracer enabled

* Fix formatting

* verbose message
2022-07-27 06:36:52 -05:00
Jonathan R. Madsen 45be03906a RCCL support (#93)
* Initial support for RCCL

* OMNITRACE_USE_RCCLP + sampling tweaks

- also OMNITRACE_SAMPLING_KEEP_INTERNAL option
- minor modifications to sampling to use keep internal option + discard funlockfile

* Update docker and workflows to download RCCL

* Update CPack DEB with rocprofiler dependency

* Rework rccl into library and library/components folder

- add tpls/rccl/rccl/rccl.h

* Fix timemory includes

* rcclp inline definitions when disabled

* Tweaks to ubuntu-focal-external-rocm

- disable ompt
- enable building testing

* Tweaks to ubuntu-focal-external-rocm

- ctest exclude

* Tweak ubuntu-focal.yml

- remove source /.../setup-env.sh, replace with $GITHUB_ENV

* Fix ubuntu-focal-rocm + OMPI + root

* Improved rocm-smi error handling

- Recover from rocm-smi errors
- Disabling rocm-smi after recovering from errors
- Werror in developer mode
- Remove State::DelayedInit
- Add State::Disabled

* formatting

* Fix merge of OMNITRACE_SAMPLING_KEEP_INTERNAL

* Update RCCL include directory

- based on ROCm version we need with <rccl/rccl.h> or <rccl.h>

* RCCL Testing

- updated tests to use configuration files
- many tests generate a configuration file
- tests how have GPU option
- enable ncclCommCount, disable ncclGetVersion
- add testing for RCCLP via rccl-tests
- working directory of tests is PROJECT_BINARY_DIR
- add nccl/rccl functions to get_whole_function_names
- some clang compiler fixes

* Handle RCCL include w/o HIP

* RCCL requires HIP

* Update OMNITRACE_SAMPLING_CPUS for testing

* Update tests/CMakeLists.txt

* Debug settings

* Install MPI even when USE_MPI=OFF

* exclude printf

* skip mpi tests w/o USE_MPI or USE_MPI_HEADERS

* update ubuntu rocm workflow

* Fix configure env step for ubuntu rocm
2022-07-25 12:16:11 -05:00
Jonathan R. Madsen bcdec188eb Fix reliability when KOKKOS_PROFILE_LIBRARY is set in env (#103)
* Fix reliability when KOKKOS_PROFILE_LIBRARY is set in env

- in certain situations, an exe using kokkos may be instrumented
- this will cause libomnitrace to be dlopened via libomnitrace-dl
- if KOKKOS_PROFILE_LIBRARY is set to libomnitrace and not libomnitrace-dl, you will end up with different instances of libomnitrace trying to collect data

* Set OMNITRACE_MAX_THREADS=32 in CI
2022-07-24 22:01:53 -05:00
Jonathan R. Madsen ea282d9301 Release 1.3.0 preparations (#109)
* v1.3.0

* ROCm 5.2 and extensions tweaks

* Container workflow + miscellaneous updates

* Misc fixes + timemory submodule update

- timemory submodule update for multiple definitions of variant_apply

* Increase timeouts

* Remove obsolete Julia docs and script

- support for rocprofiler makes rocprof merging obsolete

* Fix cpack testing and combine cpack workflows into single script

* Install components + omnitrace tpl exes

- Improved COMPONENT specification for installs
- Install PAPI executables with omnitrace- prefix and hyphens
- Install Perfetto executables with omnitrace- prefix and hyphens

* Update docs on perfetto and papi command-line tools

* remove ubuntu 22.04 from containers workflow

* remove containers workflow running on all pushes

* Fix CI_SCRIPT_ARGS

* Fix PAPI utils install

* Fixed traced test in workflow + removed return char

- validate-perfetto-proto.py had return character

* Fix test-docker-release.sh script to use correct container

* Release build bc RelWtihDebInfo using too much memory
2022-07-23 03:02:31 -05:00
Jonathan R. Madsen ab4f1e3f77 Increase build timeouts (#107)
- Increase build timeouts to 60 min (formerly 45 in most workflows)
2022-07-22 15:25:40 -05:00
Jonathan R. Madsen d04cbe862e fix omnitrace print-* with libraries (#94)
* fix omnitrace print-* with libraries

* timemory submodule update

* Update workflows to use ./bin/omnitrace instead of ./omnitrace

* cmake format

* update timemory submodule

- fix ODR violations in utility/procfs

* cmake updates

- uniform find_package for all ROCm-based libraries

* tweak transpose example

- throw exception instead of std::exit

* Inspect cmdv name before assuming not exe

- some ELF execs "think" they are libraries so only assume rewrite + simulate + all-functions if filename looks like library
- adds some test for --print-available -- <library>

* Fix _has_lib_prefix when command is < 3

* Updates and reverts to omnitrace exe

- update module_function operator< and operator==
- add function_signature operator<
- refactor module_function ctor
- revert some previous changes w.r.t. simulate and include_unninstr

* Fix source/bin/tests to use same output dir as tests

* cmake format

* Segfault mitigation + refactor + modify function iteration

- refactor module_function ctor to avoid segfaults
- string_t -> std::string
- replace std::string with std::string_view in some places
- get_name(module_t*)
- get_name(procedure_t*)
- disable using both app_modules and app_functions
- new option: --parse-all-modules to iterate over app_modules
- removed some unused code w.r.t. debug info

* Disable module_function address range for uninstrumentable functions

* Disable module_function address range for uninstrumentable functions

* Refactored getting file/line info and init/fini

- use dyninst insertInitCallback and insertFiniCallback if main not found
- fixed all issues with segmentation faults in --simulate --all-functions

* revert changes to Findrocprofiler.cmake
2022-07-21 01:15:41 -05:00