Commit graph

218 Commits

Author SHA1 Message Date
Jonathan R. Madsen 1cdf67257e Remove temporary files (#223)
* config updates

- remove temporary file when destroyed
- extend parse_numeric_range: support vector, increment, better delimit
- git version info in omnitrace banner
- use OMNITRACE_BASIC_VERBOSE instead of OMNITRACE_VERBOSE

* Fix array-bounds warning in openmp lu.cpp

* Bump version to 1.7.4

* Update run-ci.py

- options to repeat
  - until-pass (default: 3)
  - until-fail (default: None)
  - after-timeout (default: 2)

[ROCm/rocprofiler-systems commit: 2096f1f0c2]
2023-01-10 01:06:12 -06:00
Jonathan R. Madsen 7f73cf67bb omnitrace-install.py (#221)
- omnitrace-install.py will be uploaded as a release asset
- script simplifies selecting the correct installer script

[ROCm/rocprofiler-systems commit: 1f818054ce]
2022-12-16 05:31:40 -06:00
Jonathan R. Madsen 75e1e8bf37 Fix release docs workflow + update documentation (#216)
* Fix release-docs workflow

* Documentation updates

- warning as errors when building docs
- fixed warnings when building docs
- fixed doxygen comments

* Miscellaneous fixes

* Fix doxygen comments

[ROCm/rocprofiler-systems commit: 7ecc037d17]
2022-12-14 07:59:53 -06:00
Mészáros Gergely 35f639a2b5 rocm: Fix compilation with rocm >= 5.3.1 (#214)
* rocm: Fix compilation with rocm >= 5.3.1

Add another version-checked initialization, because:
[roctracer has changed `hsa_ops_properties_t` again in 5.3.1][hsa_ops_properties_t]

[hsa_ops_properties_t]: https://github.com/ROCm-Developer-Tools/roctracer/compare/rocm-5.3.0...rocm-5.3.1#diff-42ffacd4e7f57213868dcd96aeb5c0d47d91d2a1121ba8d67f46caa267c1818cL41

* Fix formatting

* Fix preprocessing directive

[ROCm/rocprofiler-systems commit: 2449e7cd46]
2022-11-18 12:12:59 -06:00
Jonathan R. Madsen a52254abf3 Perfetto updates (#211)
* Retry support in build-docker-release.sh

* critical-trace perfetto update

- use ::perfetto::Track instead of threads to create rows
- refactor call_chain::generate_perfetto

* Fix backtrace_metrics for perfetto

- get_papi_labels is now properly populated

* Refactor sampling::post_process_perfetto

- include HW counter delta in sample debug annotations
- reduce the amount of debug annotation data stored in the call-stack
  - if the data is common to the entire stack, it is only annotated in the first and the last call-stack entry

* exit_gotcha::exit_info

* Improve OMPT shutdown

- the previous shutdown was causing spurious test failures

* Update source/lib/omnitrace/library/ompt.cpp

[ROCm/rocprofiler-systems commit: 2ebfe3fc30]
2022-11-17 08:43:45 -06:00
Jonathan R. Madsen a8cec1ea17 roctracer: device and stream tracks (#209)
* roctracer: use multiple tracks for HIP streams

Use different perfetto tracks for each stream, and set the name of
these tracks to the stream pointer values. Setting the name like this
matches the args in the API traces.
This fixes overlapping work on multiple streams appearing as a call
stack.

* Fix -pedantic

* Run clang-format

* Add option to disable per stream tracks in perfetto

* Updated scheme for roctracer activity + general roctracer fixes

- Per-device tracks
- Handle HSA OPS in ROCm 5.3
  - the changes in ROCm 5.3 were causing HSA ops to get discarded
- Default for OMNITRACE_ROCTRACER_DISCARD_INVALID is now zero
  - i.e. default behavior is to flip beg_ns and end_ns when beg_ns > end_ns

* Flush perfetto at end of hip_activity_callback

- fixes unterminated regions

* GitHub Actions and run-ci script updates

- improve reliability

* Set OMNITRACE_TMPDIR in testing

- files in /tmp get occasionally deleted during CI

Co-authored-by: Gergely Meszaros <gergely@streamhpc.com>

[ROCm/rocprofiler-systems commit: 589a729702]
2022-11-16 15:57:27 -06:00
Jonathan R. Madsen b7e504e938 Optional perfetto annotations (#206)
* Misc tweaks

- C API function print with warning colors
- split region/trace start/stop functions into regions.cpp file

* Config option for disabling perfetto annotations

* Missing checks in roctracer.cpp and sampling.cpp

* Verbose makefile in CI

* run-ci uses -VV

* Fix gcc-7 maybe-uninitialized warning

* Fix push/pop perfetto

- moving perfetto::EventContext was causing errors

[ROCm/rocprofiler-systems commit: 2f16e2ecb1]
2022-11-16 09:48:15 -06:00
Jonathan R. Madsen a325f26c61 Improve sampling allocator (#205)
* Updated sampling

- dynamic sampler is constructed with a shared pointer to an allocator instance
- dynamic allocator handles multiple samplers
  - eliminates need for every per-thread dynamic sampler to start background allocator thread

* Fix usage of tim::popen

[ROCm/rocprofiler-systems commit: 2135f82ab8]
2022-11-13 14:37:07 -06:00
Jonathan R. Madsen 84c14233b9 CI CPack fix (#204)
- Fixes error in CPack workflow from PR #203

[ROCm/rocprofiler-systems commit: ab0a3e9d7d]
2022-11-13 10:47:09 -06:00
Jonathan R. Madsen 7042f85927 CI and testing updates (#203)
* Python implementation of run-ci.sh

* Container workflow update

- retry failed container build to combat network failures

* cpack workflow update

- retry failed base container build to combat network failures

* General CI workflow updates

- retry failed "Install packages" step to combat network failures

* Miscellaneous linting fixes

* Formatting workflow update

- improve regex for source formatting

* format user.h

* Add new omnitrace-avail tests

* Make run-ci.py executable

* workflow retry fix

- timeout_seconds -> retry_wait_seconds

* Fix cmake formatting glob

* source formatting

* Handle PRs in run-ci.py

* Specify timeout_minutes in retry steps

* Remove remaining --cmake-args from workflows

* CI warnings about using MPICH headers

* Remove text=True from run-ci.py

- not capturing stdout/stderr so unnecessary

* Fix OpenSUSE step label

* Update omnitrace-avail-write-config tests

- use TWD (Test Working Directory) instead of PWD since PWD might not be build directory

* paths-ignore + workflow_dispatch

[ROCm/rocprofiler-systems commit: e1102a8ba4]
2022-11-13 10:42:14 -06:00
Jonathan R. Madsen 91627797a0 Support external (i.e. user-defined) trace annotations (#195)
* Support external (i.e. user-defined) trace annotations

- tweaked the python examples to be more balanced
- updated the user-api example to conform to user API changes
- moved the get/set for State and ThreadState to state.{hpp,cpp}
- introduced user-provided trace annotations
- added perfetto python category
- moved coverage impl files around
- created enumerations for mapping category enums to category types
- created enumerations for mapping annotation type enums to annotation values
- moved tracing::add_perfetto_annotations to tracing/annotation.hpp
- utility make_index_sequence_range
- libomnitrace-dl: omnitrace_push_category_region
- libomnitrace-dl: omnitrace_pop_category_region
- libomnitrace-user: omnitrace_user_push_annotated_region
- libomnitrace-user: omnitrace_user_pop_annotated_region
- libpyomnitrace: support extra annotations via annotate_trace config value
  - filename
  - line
  - last attempted instr in bytecode (lasti)
  - argcount
  - num local variables
  - stacksize
- omnitrace-python: -a / --annotate-traces option

* tweak ubuntu-focal workflow

* Fix installation of omnitrace-user headers

* ubuntu-focal-codecov workflow update

- Install texinfo

* Update timemory submodule

[ROCm/rocprofiler-systems commit: 642b6b95ca]
2022-11-11 07:31:14 -06:00
Mészáros Gergely 0e963e47c8 instrumentation: include functions with specific calls (#202)
* instrumentation: include functions with specific calls

Add the option `--caller-include <regex>` or environment variable
`OMNITRACE_REGEX_CALLER_INCLUDE` to instrument functions which contain
calls to a set of functions, e.g. `--caller-include foo` instruments any
function which calls `foo`.

* Serialize caller include information

* Add test for caller include

* Tweak to the caller include test

- tweak environment
- tweak pass regexes

* Set rewrite caller example to debug, to avoid optimizing out the call expressions that it relies on.

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-systems commit: 1b8f09aa2d]
2022-11-11 02:32:57 -06:00
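As an illustration of the `--caller-include` selection rule from #202, the sketch below picks every function whose callees match a regex. It is not omnitrace's implementation: the `call_graph` dictionary, the `select_callers` helper, and the use of `re.search` (so the pattern may match anywhere in a callee name) are assumptions made for the example.

```python
import re


def select_callers(call_graph, pattern):
    """Return the functions whose callees include a name matching `pattern`."""
    regex = re.compile(pattern)
    return sorted(
        caller
        for caller, callees in call_graph.items()
        if any(regex.search(callee) for callee in callees)
    )


# Hypothetical call graph: function name -> names of the functions it calls.
call_graph = {
    "main": {"setup", "run"},
    "run": {"foo", "bar"},
    "setup": {"bar"},
    "helper": {"foobar"},
}

# "run" calls foo; "helper" calls foobar, which also matches an unanchored "foo".
print(select_callers(call_graph, "foo"))
# Anchoring the pattern restricts the match to calls of exactly `foo`.
print(select_callers(call_graph, "^foo$"))
```

Whether the real option anchors the pattern is not stated in the commit message, so both forms are shown.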
Jonathan R. Madsen 7c9534e601 Dyninst fix for BinaryEdit::getResolvedLibraryPath (#193)
* Dyninst fix for BinaryEdit::getResolvedLibraryPath

- parsing of /sbin/ldconfig -p did not handle "Cache generated by: ..." line in Ubuntu 22.04

* Update ubuntu-focal-codecov workflow

- install texinfo
- set MPI_HEADERS_ALLOW_MPICH=ON

[ROCm/rocprofiler-systems commit: 654beef6ab]
2022-11-02 01:25:26 -05:00
Jonathan R. Madsen 5a1cec92e8 Various optimizations (#192)
* CDash name prefix {{ repo_owner }}-{{ ref_name }}

- remove /merge from CI name

* disable using BFD when sampling_include_inlines is OFF

- this consumes a lot of memory

* Improve finalization of rocprofiler

* update timemory submodule

- disable OMPT thread begin/end callbacks
- support hierarchies in signal handlers
- update operation::pop_node debugging
- settings_update_type + setting_supported_data_types
- fixed parsing args in timemory_init

* Improve timemory build time

* Remove kokkosp restrictions for perfetto

* omnitrace exe signal handler update

- configure signal handlers before main to allow libomnitrace to override

* Backtrace and timemory submodule updates

- Use unwind::cache w/o inline info
- update timemory submodule
  - unwind::cache updates
  - filepath updates
  - fix termination_signal_message
  - fix vsettings::report_change

* Update dyninst submodule

- updates BinaryEdit::getResolvedLibraryPath

* update timemory submodule

- update CpuArch support

* Cleanup configure warnings

* Update examples cmake and workflows

- (Mostly) eliminate configuration warnings

* omnitrace exe updates

- pass environ to BPatch::processCreate
- avoid trailing ":" in DYNINST_REWRITER_PATHS

* Update dyninst submodule

- Add flags to DyninstOptimization.cmake
- Remove strtok from BinaryEdit::getResolvedLibraryPath

* examples/mpi CMakeLists.txt update

- STATUS message about missing MPI during CI, otherwise AUTHOR_WARNING

* Dev build and linker flags

- use -gsplit-dwarf when OMNITRACE_BUILD_DEVELOPER is ON
  - disable when OMNITRACE_BUILD_NUMBER > 1
- OMNITRACE_BUILD_LINKER option
- add -fuse-ld=${OMNITRACE_BUILD_LINKER}
- omnitrace_add_cache_option function

* Update workflows to set OMNITRACE_BUILD_NUMBER

* Fix generator expressions for -fuse-ld

* Suppress some configuration warnings during CI

- helps to keep track of real warnings when they arise

* Update timemory and dyninst submodules with CMP0135

* Add -V flag to run-ci script

[ROCm/rocprofiler-systems commit: f147670a7a]
2022-11-01 17:28:12 -05:00
Jonathan R. Madsen 7b9a527b7e Offload sampling data (#190)
- update timemory submodule
  - support for load/save of ring_buffers
  - new output keys, e.g. `%nid%`
  - sampling allocator offloading data
- writing sampling data to temporary file
- new advanced config option `OMNITRACE_USE_TEMPORARY_FILES`
- new advanced config option `OMNITRACE_TMPDIR`
- SIGINT signal (i.e. `Ctrl+C`) triggers backtrace + finalization
  - this behavior is common to other profilers

* update output.md docs

* Update omnitrace-avail output keys handling

* update writing metadata

* str format in perfetto_counter_track

* Fix fail regex for mpi-example

* config updates

- OMNITRACE_USE_TEMPORARY_FILES
- OMNITRACE_TMPDIR
- Enable finalization with SIGINT
- code supporting creation of temp files

* sampling offloading to temporary file

* Disable creation of empty temporary files when off

[ROCm/rocprofiler-systems commit: b23b581563]
2022-10-31 22:23:10 -05:00
Jonathan R. Madsen c87e69e522 Submitting jobs to cdash (#124)
* Submitting jobs to cdash

* Fail on submit

* submit url env

* submit url env

* try passing submit url as arg

* fix submit url

* Updated default URL

* Add submissions for remaining ubuntu focal workflow jobs

* Replace g++ with gcc in dashboard build name

* Add --ctest-args to run-ci.sh

* Add cdash support for bionic, jammy, and opensuse workflows

* Decrease CTEST_CUSTOM_MAXIMUM_PASSED_TEST_OUTPUT_SIZE

* OMNITRACE_BUILD_CODECOV option

* Support code coverage in CDash script

* CI dyninst built with debug info

* Update ci-containers

- cron schedule moved 4 hours later to UTC+5

* Update implementation of config::configure_signal_handler

- using lambdas failed to compile with codecov flags

* Add codecov job to ubuntu focal workflow

* Fix support for --ctest-args in run-ci script

* Fix ubuntu workflows

* Fix quotation handling in run-ci script

* git safe directory for codecov

* New MPI examples

* Remove --stop-on-failure

* dynamic_library update

- find_library_path checks procfs maps
- invoke find_library_path with no additional args to resolve to mapped file

* RCCLP uses dynamic_library

* check if file exists for memory_map_files metadata

* Testing updates

- include new mpi examples in tests
- fix test labels
- test critical-trace exe

* Update MPI C examples tests (needed arg)

* Remove try/catch block from critical-trace

* Fix sampling max wait when shutting down

* Fix test env for critical-trace

* Fix settings for critical-trace

- disable time output: data is deterministic
- disable PID suffixes: not multiprocess

* Update critical-trace ctest

* Update critical-trace exe

- throw error if input cannot be opened
- throw error if input has no data

* Update lulesh example with more kokkos tools usage

* Fix tasking issue with critical_trace and roctracer

- were not setting pools to active
- also sync before critical_trace::get_entries

* Increase verbosity of critical-trace tests

* Update code coverage tests

- skip code coverage + preload
- code-coverage python example and test

* Remove duplicate omnitrace.initialize function

* Skip python3.6 for ubuntu jammy

* Update MPI examples

- use MPI_Isend and MPI_Irecv
- explicitly use MPI_Bcast

* Update Formatting.cmake

- include C files in examples

* run-ci script does not check return of coverage

* mpi-allreduce link to libm

* Update ctest args in run-ci script

* Update dyninst submodule

- safety improvements in BinaryEdit::openResolvedLibraryName

* capture cmake error for ctest_coverage

[ROCm/rocprofiler-systems commit: 46b6db1a4c]
2022-10-31 15:39:45 -05:00
Jonathan R. Madsen 268f94be4b Use execvpe instead of execve in omnitrace-sample (#187)
* Use execvpe instead of execve in omnitrace-sample

- previous implementation preferred exe in PATH over exe in PWD
- 'p' variants of exec duplicate the actions of the shell in searching for an executable file if the specified filename does not contain a slash character

* OpenMPI oversubscribe arg in testing

[ROCm/rocprofiler-systems commit: 139070a2de]
2022-10-27 15:15:50 -05:00
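The lookup rule quoted in #187 for the 'p' exec variants can be mimicked in a few lines. This is a sketch of the rule only, not the code in `omnitrace-sample`: the `resolve_like_execvp` helper and the scratch executable named `tool` are hypothetical.

```python
import os
import tempfile


def resolve_like_execvp(filename, path_dirs):
    """Mimic the lookup rule of the 'p' exec variants for `filename`."""

    def is_executable_file(path):
        return os.path.isfile(path) and os.access(path, os.X_OK)

    # A name containing a slash is used as given; the search list is ignored.
    if "/" in filename:
        return filename if is_executable_file(filename) else None
    # Otherwise the first executable match, in search-list order, wins.
    for directory in path_dirs:
        candidate = os.path.join(directory, filename)
        if is_executable_file(candidate):
            return candidate
    return None


# Demo with a hypothetical executable in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    exe = os.path.join(scratch, "tool")
    with open(exe, "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    os.chmod(exe, 0o755)

    # Bare name: located through the PATH-like list.
    found_by_search = resolve_like_execvp("tool", ["/nonexistent", scratch])
    # Name with a slash: taken literally, even with an empty search list.
    found_by_path = resolve_like_execvp(exe, [])
    print(found_by_search == exe, found_by_path == exe)
```

Under this rule, an invocation such as `./tool` always refers to the file in the working directory, while a bare `tool` is resolved through the search path.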
Jonathan R. Madsen 58c7c71af6 Fix LD_PRELOAD (#184)
- __libc_start_main in libomnitrace-dl wasn't wrapping bc of -fvisibility=hidden
- fix OMNITRACE_STRIP_TARGET
- omnitrace_reset_preload function in main library
- defer removing libomnitrace from LD_PRELOAD

[ROCm/rocprofiler-systems commit: a41a5c155e]
2022-10-21 17:39:08 -05:00
Jonathan R. Madsen 18128adfd3 OMNITRACE_ROCTRACER_DISCARD_INVALID=N (#186)
- only print N warnings that end < begin
- if N is zero, swap(begin, end) and include data

[ROCm/rocprofiler-systems commit: f2a0485d60]
2022-10-21 09:24:13 -05:00
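The `OMNITRACE_ROCTRACER_DISCARD_INVALID=N` behavior from #186 can be sketched as follows. The `filter_activity` helper and the `(begin_ns, end_ns)` tuples are hypothetical, and the sketch assumes that a non-zero `N` means invalid records are dropped, which the option name suggests but the commit message does not spell out.

```python
def filter_activity(records, discard_invalid, warn=print):
    """Apply the begin/end sanity check to (begin_ns, end_ns) pairs."""
    kept = []
    warnings_left = discard_invalid
    for begin_ns, end_ns in records:
        if end_ns < begin_ns:
            if discard_invalid == 0:
                # N == 0: keep the record with its timestamps swapped.
                begin_ns, end_ns = end_ns, begin_ns
            else:
                # N != 0 (assumed): drop the record, warning at most N times.
                if warnings_left > 0:
                    warn(f"discarding record: end {end_ns} < begin {begin_ns}")
                    warnings_left -= 1
                continue
        kept.append((begin_ns, end_ns))
    return kept


# Hypothetical activity records; the last two have end < begin.
records = [(10, 20), (40, 30), (60, 50)]
print(filter_activity(records, discard_invalid=0))
print(filter_activity(records, discard_invalid=1))
```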
Jonathan R. Madsen 25d2a0e359 Dramatic improvement of post-processing critical trace data (#185)
- several orders of magnitude faster

[ROCm/rocprofiler-systems commit: 9ff4b6b624]
2022-10-21 09:23:52 -05:00
Jonathan R. Madsen 3073a5095a Disable HSA API and activity by default (#183)
- OMNITRACE_ROCTRACER_HSA_ACTIVITY is OFF by default
- OMNITRACE_ROCTRACER_HSA_API is OFF by default
- in real applications, this adds way too much tracing data to perfetto

[ROCm/rocprofiler-systems commit: 271f851896]
2022-10-21 09:23:29 -05:00
Jonathan R. Madsen 665b0fa471 Fix perfetto debug annotations of function parameters (#181)
Fix debug annotations of function parameters

[ROCm/rocprofiler-systems commit: 1bead56ce8]
2022-10-21 07:24:21 -05:00
Jonathan R. Madsen 8f9dcffae4 Update containers workflow (#182)
* Bump version to 1.7.2
* Update containers workflow

[ROCm/rocprofiler-systems commit: 5bb7359b81]
2022-10-21 06:21:00 -05:00
Jonathan R. Madsen 7234e8cf45 Omnitrace sample documentation (#179)
* Documentation for omnitrace-sample

* Improve omnitrace-sample

- improve the printing of the env updates
- remove env settings when something is deactivated
- restore env settings when something is deactivated

[ROCm/rocprofiler-systems commit: 67f7471253]
2022-10-19 03:30:00 -05:00
Jonathan R. Madsen 8f8ead76b5 Signal handler backtraces provide line info (#178)
* Signal handler backtraces provide line info

- print backtrace after SIGINT during finalization

* Workflow run-name + jammy rocm CI

* fix jammy matrix indentation

* disable building dyninst in jammy

* Update jammy for rocm

* jammy rocm_agent_enumerator

* Fix rocm install for jammy

* jammy bash

* jammy workflow typo

* revert some changes

* stack-usage + omnitrace-rt symlink + ncclSocketAccept + indiv sigs

- symlink omnitrace-rt in build tree
- exclude ncclSocketAccept
- timemory submodule update accepting individual signal handlers

[ROCm/rocprofiler-systems commit: 78a06e7a42]
2022-10-18 21:45:56 -05:00
Jonathan R. Madsen 713d08de6d Support for Ubuntu 22.04 and ROCm 5.3 (#48)
* Testing and CI support for Ubuntu 22.04

* Fixes for ROCm

- Jammy does not have ROCm installers

* Name, timeout, and python updates

- renamed ubuntu-jammy-external.yml to ubuntu-jammy.yml
- increased all 5 minute timeouts to 10 minutes
- include python 3.10 in testing

* Update dyninst to remove interposed definition of _r_debug

* Rebuild Dyninst + test install script

* Revert container change

* git safe directory

* pushd -> cd

* fix MPI include

* Fix testing step

* OMPI_ALLOW_RUN_AS_ROOT

* Test script changes

* Fix mismatched malloc / delete[]

* Jammy workflow tweaks

* CPack tweak for boost deb deps

* pthread_mutex_gotcha config returns when not enabled

* fix echoing config in CI

* USE_CLANG_OMP

- option to disable using LLVM OpenMP when building OpenMP test executables
- Jammy workflow sets USE_CLANG_OMP=OFF

* Dyninst submodule boost download

- updated containers workflow to include jammy
- updated workflow to use ci

* Updates to workflows + replace test-install.sh

- test-install.sh in this branch was replaced with one in main branch

* Expand jammy test-install.sh args

* Fix openmp-cg-sampling-duration test

* update timemory submodule

- use-after-free violation in popen::pclose

* revert some tweaks to sampling-duration test

* Fix env of test-install.sh

* cmake format

* jammy bash

* CPack install for jammy

* formatting workflow action version bump

* Update timemory submodule

- libunwind submodule via timemory sets SOVERSION to 99 to avoid ABI conflicts with v8

* Fix help menu for omnitrace-sample

* Support other boolean forms in test-install.sh

* Update docker files and build-docker.sh

- consolidated cases in build-docker.sh
- support rocm version of 0.0 (no rocm install)
- support rocm v5.3
- updated centos handling

* update opensuse actions/checkout version

* Tweaks to ubuntu-focal testing

- actions/checkout@v3
- use test-install script

* update cpack

- ubuntu 22.04
- rocm 5.3
- rename os matrix field to os-version
- remove CI_ROCM_VERSION (no longer necessary)
- remove default-rocm-version matrix field (no longer necessary)
- CentOS packaging

* fix argparsing and omnitrace-sample tests in install-tests.sh

* focal rocm test install workflow fix

* Fix omnitrace-sample build

* Dockerfile.centos + build-docker.sh updates

* Update actions/upload-artifact version

* Dockerfile.ubuntu: install rocm-device-libs

* Refactor cpack

* fix cpack if quotes

* Dockerfile.ubuntu rocm < 5 installs rocm-dev

* build-release.sh defaults to boost version 1.79.0

[ROCm/rocprofiler-systems commit: ede6007f9b]
2022-10-17 12:54:26 -05:00
Jonathan R. Madsen 0ec0d18ac8 Trace thread config + paranoid level + preload (#176)
- OMNITRACE_TRACE_THREAD_BARRIERS config option
  - set to OFF to disable wrapping `pthread_barrier`
- OMNITRACE_TRACE_THREAD_JOIN config option
  - set to OFF to disable wrapping `pthread_join`
- allow PAPI with perf_event_paranoid at level 2
- default to no PAPI events
- setenv LD_PRELOAD to not include libomnitrace after preload
  - closes #175 
- bump version to 1.7.1

[ROCm/rocprofiler-systems commit: a3439d5bf2]
2022-10-06 19:11:08 -05:00
Jonathan R. Madsen 45e5450bf2 Fix finalization segfaults (#174)
- update timemory submodule with fixes to papi components and signals
update

[ROCm/rocprofiler-systems commit: 2a387f9099]
2022-10-04 00:00:05 -05:00
Jonathan R. Madsen e1de27dde7 Raise default min number of instructions (#173)
- Raise min instructions default to 1024 instead of 64
- Default value of 64 has demonstrated tendency to slow down real-life
applications
- Improved the memory safety during `omnitrace_finalize()`
- new modifications guarantee that when `tim::manager::instance()` on
main thread is destroyed, omnitrace will finalize before
- Improved some warning w/ roctracer
- Improved the search for `ROCP_METRICS` and
`OMNITRACE_ROCPROFILER_LIBRARY`
- disable printing env by default
- Attempted to improve the sampling shutdown

[ROCm/rocprofiler-systems commit: 7d7a8f2c23]
2022-09-30 19:28:43 -05:00
Jonathan R. Madsen 5973299ccd omnitrace-sample (#169)
- `omnitrace-sample` executable which executes sampling (no
instrumentation)
- fixes bug with OMPT ignoring value of `OMNITRACE_USE_OMPT`
- fixes some issues with sampling duration
- new `OMNITRACE_SAMPLING_INCLUDE_INLINES` configuration variable
- restricts process-sampling to 100 interrupts/sec when inheriting value
from `OMNITRACE_SAMPLING_FREQ`
- `OMNITRACE_PROCESS_SAMPLING_FREQ` still supports up to 1000
interrupts/sec
- fixes bug with colorized log not truly being disabled in all instances
- adds tests for `omnitrace-sample`
- adds tests for sampling duration
- setting ROCP_TOOL_LIB to libomnitrace-dl throws error
  - rocprofiler does not configure correctly when this is done
- Quiet numa_gotcha warnings
- Fixed some shadowed variables

[ROCm/rocprofiler-systems commit: 79a8f16646]
2022-09-30 10:47:07 -05:00
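The process-sampling frequency limits listed in #169 amount to a small selection rule. The sketch below is one reading of that rule: the `effective_process_freq` helper is hypothetical, an unset `OMNITRACE_PROCESS_SAMPLING_FREQ` is modeled as `None`, and clamping (rather than rejecting) out-of-range values is an assumption.

```python
def effective_process_freq(sampling_freq, process_freq=None):
    """Pick the process-sampling frequency in interrupts per second."""
    if process_freq is None:
        # Inherited from OMNITRACE_SAMPLING_FREQ: capped at 100 interrupts/sec.
        return min(sampling_freq, 100.0)
    # Set explicitly via OMNITRACE_PROCESS_SAMPLING_FREQ: up to 1000 interrupts/sec.
    return min(process_freq, 1000.0)


print(effective_process_freq(300.0))         # inherited value, capped
print(effective_process_freq(300.0, 500.0))  # explicit value, within the limit
```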
Jonathan R. Madsen 07e3cf256a Resolve warnings/errors with extra warnings (#171)
[ROCm/rocprofiler-systems commit: 4e3527f0ed]
2022-09-28 14:28:32 -05:00
Jonathan R. Madsen 6c8570f610 Support for libbfd (binary file descriptor) (#168)
- provides addr2line information for sampling

[ROCm/rocprofiler-systems commit: 2ed818449f]
2022-09-28 00:18:05 -05:00
Jonathan R. Madsen 6ec1c639bd Fix spack builds when no ROCm .info/version files (#170)
- closes #166

[ROCm/rocprofiler-systems commit: 19edd0d421]
2022-09-27 07:40:02 -05:00
Jonathan R. Madsen 073ab3882b Fix deadlocks during initialization (#167)
- More to come in later commit, below is just tidying some stuff up
  - clang-tidy
  - mpi_gotcha quiet about not finding funcs
  - update to new papi config
  - sampling block_samples / unblock_samples
    - disable calling component's sample functions within sampler
  - release doesn't strip library
  - remove HSA and ROCP env variables from modulefile / setup-env
- preliminary support for LD_PRELOAD usage
- default sampling rate is 300 interrupts / second
- fixes various deadlock issues at startup

[ROCm/rocprofiler-systems commit: 8f36620e29]
2022-09-26 07:52:14 -05:00
Jonathan R. Madsen d371fdb149 Crusher hackathon updates (#164)
- improved error handling in dyninst
- improved error handling in omnitrace exe
- new logging facility for omnitrace exe
- improved backtraces
- disable concurrent kernels in rocprofiler
- updates `setup-env.sh` and modulefile
  - set `omnitrace_ROOT`
  - set `HSA_TOOLS_LIB` if roctracer or rocprofiler enabled
  - set `ROCP_TOOL_LIB` if rocprofiler enabled
  - closes #163 
- No longer make setting `HSA_ENABLE_INTERRUPT=0` the default 
  - this has performance implications
- this was set to workaround a bug in ROCR which caused an ioctl call in
ROCm to hang when interrupted. But it was only interrupted when realtime
sampling was enabled since the CPU-clock doesn't increment when waiting
  - This bug should be fixed in ROCm 5.3
- omnitrace no longer activates a realtime sampler by default when
sampling, thus this bug is no longer encountered unless the user
explicitly triggers realtime sampling

[ROCm/rocprofiler-systems commit: 90ff7188f8]
2022-09-21 13:58:14 -05:00
Jonathan R. Madsen 62fdf2720f Fix building w/ hip, etc. but w/o rocprofiler (#159)
- Fixes builds with `OMNITRACE_USE_ROCPROFILER=OFF` but
`OMNITRACE_USE_ROCTRACER=ON`
- Move `rocprofiler/hsa_rsrc_factory.{hpp,cpp}` to rocm folder

[ROCm/rocprofiler-systems commit: 472e96a084]
2022-09-14 00:07:33 -05:00
Jonathan R. Madsen 31c1f5bdf0 Update timemory submodule (#160)
- Improve sampling allocator startup - semaphore init/destroy handled by
ctor/dtor - avoid future deadlocks
- Support runtime_enabled set to off in sampler::execute
- Fix sampling timer checks for finite delay/freq/period
- Fixes bug when sampling frequency was set to 300 and sampling::timer
deduced it as not-finite, causing an error to be thrown
- Update operation::generic_operator::check()
- lightweight_tuple update
- Includes numerous safety checks on `substr` calls in timemory

[ROCm/rocprofiler-systems commit: d4d44e744b]
2022-09-14 00:07:18 -05:00
Jonathan R. Madsen f8bdb3c325 Bump version to 1.7.0 (#162)
[ROCm/rocprofiler-systems commit: e671a6975a]
2022-09-13 23:05:26 -05:00
Jonathan R. Madsen df5c4f2333 Fix gotcha indexes for numa_gotcha (#161)
- In #152, `numa_alloc_interleaved` and `numa_alloc_local` both were
indexed at position 6

[ROCm/rocprofiler-systems commit: e4e3c2b7f4]
2022-09-13 23:03:45 -05:00
Jonathan R. Madsen 6227c25220 Fix setup-env + hsa/rocm/ompt serialization + testing + misc (#156)
- Fix setup-env.sh
  - Closes #149 
- omnitrace exe color
- test-install.sh script
- if config variable is updated in config or env, include in generated
config
- metadata for hsa, rocm, and ompt
  - Closes #153 
  - Closes #154

[ROCm/rocprofiler-systems commit: 15e6e6d979]
2022-09-13 08:11:48 -05:00
Jonathan R. Madsen 63a999481c Individual perfetto debug annotations (#157)
- instead of args="<list-of-string>" in perfetto, each argument is added
individually, enabling matching pointer values, etc.

## Previous behavior

Previously, all function arguments were wrapped into a single string,
e.g.:

![Screen Shot 2022-09-12 at 11 52 11
PM](https://user-images.githubusercontent.com/6001865/189811896-3b13d5a4-e884-40b3-b6b4-f84f3e16f603.png)

## New Behavior

With the exception of the HIP API (whose args are provided as a single
string via `hipApiString`), all functions from MPI, RCCL, pthreads, etc.
have individual arguments in perfetto, e.g.:

![Screen Shot 2022-09-12 at 11 52 01
PM](https://user-images.githubusercontent.com/6001865/189812137-63afba72-170d-42b3-b3bc-ae74e32ceadf.png)

In the above, previously, this would have been:

|  |  |
| - | - |
| args | `pthread_rwlock_t*=0x1c1cc50` |

The key benefit enabled is the ability to find slices with same arg
values:

<img width="753" alt="Screen Shot 2022-09-12 at 11 59 18 PM"
src="https://user-images.githubusercontent.com/6001865/189812915-0342f841-e5ce-4f8e-8169-0cb52f3425b5.png">

Previously, the entire "args" field would have had to match, which
essentially never happened if the pointer was used in two different
functions with different function signatures

## Example

![Screen Shot 2022-09-12 at 11 49 18
PM](https://user-images.githubusercontent.com/6001865/189811621-5eed0008-f340-438e-938d-a140f836b583.png)

![Screen Shot 2022-09-12 at 11 49 45
PM](https://user-images.githubusercontent.com/6001865/189811584-a983ed1c-095f-4d54-a790-9ef3adbe4e08.png)

[ROCm/rocprofiler-systems commit: 4ed8f8f762]
2022-09-13 02:05:03 -05:00
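The benefit described in #157 can be reproduced on toy data: once each argument is its own annotation, a shared pointer value can be matched across functions with different signatures. The slice names, annotation keys, and the `shared_values` helper below are hypothetical and do not reflect the actual keys omnitrace emits to perfetto.

```python
from collections import defaultdict


def shared_values(slices):
    """Return annotation values that appear on more than one slice."""
    groups = defaultdict(list)
    for name, annotations in slices:
        for value in annotations.values():
            groups[value].append(name)
    return {value: names for value, names in groups.items() if len(names) > 1}


# Old style: every argument packed into one "args" string.
old_style = [
    ("pthread_mutex_lock", {"args": "pthread_mutex_t*=0x1c1cc50"}),
    ("pthread_cond_wait",
     {"args": "pthread_cond_t*=0x1c1cd00, pthread_mutex_t*=0x1c1cc50"}),
]

# New style: one annotation per argument (key names are made up here).
new_style = [
    ("pthread_mutex_lock", {"mutex": "0x1c1cc50"}),
    ("pthread_cond_wait", {"cond": "0x1c1cd00", "mutex": "0x1c1cc50"}),
]

print(shared_values(old_style))  # the packed strings differ, so nothing is shared
print(shared_values(new_style))  # the mutex pointer links both slices
```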
Jonathan R. Madsen 3e69264431 GOTCHA wrappers for NUMA functions (#152)
- GOTCHA wrappers for:
  - mbind
  - migrate_pages
  - move_pages
  - numa_migrate_pages
  - numa_move_pages
  - numa_alloc
  - numa_alloc_local
  - numa_alloc_interleaved
  - numa_alloc_onnode
  - numa_realloc
  - numa_free
- bumped version to 1.6.0
- updated categories
- hardware_counters -> kernel_hardware_counters and
thread_hardware_counters
  - simplified mapping category structs to perfetto category names
- updated timemory submodule

[ROCm/rocprofiler-systems commit: 64afa49193]
2022-09-12 17:44:27 -05:00
Jonathan R. Madsen 1b6cbf7d65 Support tracing thread locks with perfetto (#143)
- remove sampling and roctracer flat/timeline options
  - unused/unnecessary clutter
- start pthread_gotcha before perfetto
- remove pthread_mutex_gotcha validate
- update timemory submodule with tid fix

[ROCm/rocprofiler-systems commit: 2718596e5a]
2022-08-31 11:33:45 -05:00
Jonathan R. Madsen 2ef9dfd002 Support sampling duration, sampling TIDs (#142)
- Sampling duration config values
  - OMNITRACE_SAMPLING_DURATION
  - OMNITRACE_PROCESS_SAMPLING_DURATION
  - Disables sampling after this time (in seconds) has elapsed 
- Sampling thread-id config values
  - OMNITRACE_SAMPLING_TIDS
  - OMNITRACE_SAMPLING_CPUTIME_TIDS
  - OMNITRACE_SAMPLING_REALTIME_TIDS
  - Allows user to select certain threads for sampling
- Miscellaneous
  - Tweaked the finalization verbosity messages
  - moved sampling-on-child-threads into runtime.hpp and runtime.cpp
  - fixed submodule dyninst header install

[ROCm/rocprofiler-systems commit: e67afd33eb]
2022-08-31 06:29:19 -05:00
Jonathan R. Madsen cbdc7cad4b bump-version (#141)
[ROCm/rocprofiler-systems commit: c2d6589b93]
2022-08-31 01:26:49 -05:00
Jonathan R. Madsen 473f452d39 Rework sampling and colorized logs (#140)
## Overview

This is a significant PR which has 3 very notable characteristics:

1. Omnitrace colorizes most of its logging
2. Completely reworked the sampling 
  - Samples now record the current instruction pointers instead of strings
    - This _dramatically_ decreases the overhead of taking a sample
  - The collection of metrics during a sample is split out into another component, enabling that data collection to be disabled -- which decreases the sampling overhead even further
  - When both `OMNITRACE_SAMPLING_CPUTIME` and `OMNITRACE_SAMPLING_REALTIME` are ON:
    - `OMNITRACE_SAMPLING_CPUTIME_FREQ` and `OMNITRACE_SAMPLING_REALTIME_FREQ` can be used to individually control the sampling frequency 
  - `OMNITRACE_SAMPLING_CPUTIME_DELAY` and `OMNITRACE_SAMPLING_REALTIME_DELAY` can be used to individually control the delay time before starting
  - Now, omnitrace does not start a real-time sampler on the main thread unless `OMNITRACE_SAMPLING_REALTIME` is ON
    - In the future, an `OMNITRACE_SAMPLING_TIDS` (and real-time, cpu-time variants) configuration variable(s) will allow you to select which threads will be sampled
3. Files produced by `omnitrace` exe -- `available-instr.txt`, `instrumented-instr.txt`, etc. -- no longer have the `-instr` suffix and are placed in the `instrumentation/` subfolder, i.e. `available-instr.txt` -> `instrumentation/available.txt`
  - This helped de-clutter the output folder
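The idea behind item 2, recording instruction pointers at sample time and resolving names later, can be sketched as below. This is a toy model rather than omnitrace's sampler: the `take_sample` and `symbolize` helpers, the addresses, and the symbol table are all hypothetical.

```python
import bisect


def take_sample(stack_addresses):
    """Hot path: copy the raw addresses only; no string work happens here."""
    return tuple(stack_addresses)


def symbolize(address, symbols):
    """Post-processing: map an address to the nearest preceding symbol."""
    starts = [start for start, _ in symbols]
    index = bisect.bisect_right(starts, address) - 1
    return symbols[index][1] if index >= 0 else "??"


# Hypothetical symbol table, sorted by start address.
symbols = [(0x1000, "main"), (0x1400, "compute"), (0x1800, "reduce")]

# Two samples, innermost frame first.
samples = [
    take_sample([0x1410, 0x1008]),
    take_sample([0x1810, 0x1420, 0x1008]),
]

# Names are resolved once, after sampling has finished.
resolved = [[symbolize(address, symbols) for address in sample] for sample in samples]
print(resolved)
```

Keeping the sampling path free of string formatting is what makes each sample cheap; the lookup cost is paid once during post-processing.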

Most of the other edits were reorganization (e.g. internal namespace changes), cleanup, and splitting up functionality.

## Bug Fixes

This PR fixes a bug in the HSA callbacks that disabled sampling on child threads whenever an HSA API call was made.

## Details

- created thread_info struct for mapping different thread IDs
- reorganized file structure significantly
- added categories.hpp, concepts.hpp
- moved around name trait definitions
- moved all omnitrace components into `omnitrace::component` namespace
  - there was a lot of inconsistency b/t using `tim::component` in some places and `omnitrace::component`
  - added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
- OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
- roctracer and critical_trace use same thread pool
- critical_trace functions do not lock anymore because of thread-local TaskGroup
- added `component::local_category_region` to support using `component::category_region` without explicitly passing in name
- removed `component::omnitrace` (unused)
- migrated KokkosP and OMPT to use `component::local_category_region`
  - removed `component::user_region` as a result
- migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
  - removed `component::functors` as a result
- migrated some ppdefs
- `api::omnitrace` -> `project::omnitrace`
- `api::(...)` -> `category::(...)`
- improved recording the execution time of threads
  - migrated this functionality out of pthread_create_gotcha and into thread_info
- moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
- split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
- sampling.cpp handles setup and post-processing that was formerly in backtrace
- updated logging to use colors
- `OMNITRACE_COLORIZED_LOG` config variable
- updated docs on JSON output from timemory
- instrumentation info in instrumentation subfolder
- added testing for KokkosP entries
- added testing for ompt entries
- add_critical_trace function defined in critical_trace.hpp
- disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
- add comp::page_rss to main bundle
- thread_data supports std::optional instead of std::unique_ptr
- thread_data supports tim::identity<T> to avoid unique_ptr or optional
- tracing::record_thread_start_time()
- tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
- removed anonymous namespace from omnitrace::utility
- sampling backtrace stores instruction pointers instead of strings
- component::category_region updates
  - handle disabled thread state
  - handle finalized state
  - fewer debug messages
  - invoke thread_init()
  - invoke thread_init_sampling()
  - handle push/pop count based on category
  - push/pop count only modified when used
- component::cpu_freq
- components/ensure_storage.hpp
- reworked the pthread_create replacement function
- updated parallel-overhead example to report # of times locked
- OMNITRACE_MAX_UNWIND_DEPTH build option
- update timemory submodule
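
The change to store instruction pointers instead of strings in sampling backtraces can be pictured with the conceptual Python sketch below. It is not omnitrace's implementation: the symbol table, the addresses, and the `take_sample` / `symbolize` names are all hypothetical. It only shows the general idea of keeping the sampling path cheap (raw integers) and deferring name lookup to post-processing.

```python
import bisect

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [(0x1000, "main"), (0x1400, "compute"), (0x1900, "write_output")]
STARTS = [start for start, _ in SYMBOLS]


def take_sample(stack_addresses):
    """Hot path: store raw integer addresses only, with no string work."""
    return tuple(stack_addresses)


def symbolize(sample):
    """Post-processing: map each address to the symbol starting at or before it."""
    names = []
    for addr in sample:
        idx = bisect.bisect_right(STARTS, addr) - 1
        names.append(SYMBOLS[idx][1] if idx >= 0 else "??")
    return names


# Two samples recorded during the run, resolved only afterwards.
samples = [take_sample([0x1410, 0x1004]), take_sample([0x1905, 0x1004])]
resolved = [symbolize(s) for s in samples]
```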

[ROCm/rocprofiler-systems commit: 808ea7dfa7]
2022-08-31 01:24:31 -05:00
Jonathan R. Madsen f642813ad1 Static libstdcxx and python (#139)
Support python + static libstdc++

[ROCm/rocprofiler-systems commit: a1dcd1bc4b]
2022-08-28 03:56:13 -05:00
Jonathan R. Madsen 7ceb1e0bee Generic comm_data component (#132)
* Generic comm_data component

- moved rccl_comm_data to comm_data
- comm_data includes communication data for MPI

* fix timemory include with quotes

* Only support MPI comm data with full MPI support

* Increase timeouts + kill perfetto

* Update timemory submodule

* Fix missing command killall

* set +e in Kill Perfetto workflow step

* Updated MPI example to include MPI_Send and MPI_Recv calls

* Update timemory submodule with storage merge fix

* Perfetto comm data

- tracing::now<T>() function

* Fix timemory header include

[ROCm/rocprofiler-systems commit: 0dd8f52292]
2022-08-25 19:48:10 -05:00
Tamima Rashid 78fc9b9244 Python validation-external & builtin (#123)
* adding python-builtin-validation

* adding python-builtin-test check

* adding validation for external.py including and excluding inefficient

* validation for python-external(including & excluding inefficient) & python-builtin

* fixing label mismatch for builtin; added perfetto-only validation for external (no timemory validation for external)

* python-validation-builtin

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

[ROCm/rocprofiler-systems commit: a1afd69a02]
2022-08-17 10:30:18 -05:00
Jonathan R. Madsen e40f11b1fc Python noprofile (#138)
* Fixed noprofile / FakeProfiler

- omnitrace.libpyomnitrace.profiler.profiler_pause()
- omnitrace.libpyomnitrace.profiler.profiler_resume()
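
A pause/resume pair like the one listed above is often wrapped so that resuming cannot be forgotten. The sketch below is one possible pattern, not part of omnitrace: `paused` is a hypothetical helper, and it assumes the pause and resume functions take no arguments. Stand-in callables are used so that it runs without omnitrace installed.

```python
from contextlib import contextmanager


@contextmanager
def paused(pause, resume):
    """Call `pause()` on entry and `resume()` on exit, even if the body raises."""
    pause()
    try:
        yield
    finally:
        resume()


# Stand-in callables that record the order of events.
events = []
with paused(lambda: events.append("pause"), lambda: events.append("resume")):
    events.append("untraced work")
```

With omnitrace available, the two functions named above would presumably be passed in place of the lambdas.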

* Python tests for noprofile

* Remove static imported module

[ROCm/rocprofiler-systems commit: 3f3ef7ddf9]
2022-08-16 19:28:58 -05:00