Grafico dei commit

16 Commit

Autore SHA1 Messaggio Data
Jonathan R. Madsen 223536896b Fix rocprofiler usage in ROCm >= 5.5.x (#288)
Fix rocprofiler usage in ROCm >= 5.5.x

- starting with ROCm 5.5.0, rocprofiler throws exception if OnLoad + dlopen librocprofiler
- CI skipped for this PR since CI does not support GPU usage (tested locally)
2023-06-15 23:28:45 -05:00
Jonathan R. Madsen 3e2fa69a14 CI timeout + line-info in releases (#279)
* Update perfetto args.gn.in

- remove enable_perfetto_tools_trace_to_text (unused)

* core timeout implementation

- requires OMNITRACE_CI=ON
- requires OMNITRACE_CI_TIMEOUT=<sec>
- adds pthread_self and std::this_thread::get_id to thread info
- pthread_create_gotcha stores native handles (pthread_self)

* Testing updates

- improve detection of segfault/failures with PASS_REGEX exists
- add OMNITRACE_CI_TIMEOUT env variable to all tests

* Line-info in releases

- e.g. -g1 + more options to minimize size of debug info

* Fix typo in config exit action message

* OMNITRACE_UNLIKELY around debug/verbose messages

* format fixes

* Overflow tests + capability check

* transpose example update

- link to threads library

* roctracer/rocprofiler update

- in ROCm 5.5.0, cannot include rocprofiler.h and roctracer.h in same file due to conflicting enum defs
- Moved HSA tracing setup/shutdown to component::roctracer

* roctracer update

- fix definition of roctracer::setup when disabled

* Update fork example

- detach threads on main PID
- flush io outputs when printing info

* Update overflow tests

- pass regular expressions
- overflow on PERF_COUNT_SW_CPU_CLOCK event

* fork gotcha update

- use getpid() instead of getppid()

* update fork example

- wait on threads calling fork

* timeout update

- wait on timeout thread to launch before proceeding
2023-06-14 11:55:22 -05:00
Jonathan R. Madsen 4ed5f3e67b rocprofler_iterate_info workaround + omnitrace-avail update (#270)
* rocprofler_iterate_info workaround + omnitrace-avail update

- provides workaround for rocprofiler_iterate_info behavior change in ROCm 5.4.0-3
- update timemory submodule with argparse tweaks
- updates hsa_rsrc_factory.{hpp,cpp}
- colorized log in omnitrace-avail
- Bump version to 1.9.2

* Fix empty_base inheritance

- timemory's component::empty_base inherits from concepts::component so direct inheritance was removed

* Fix OMNITRACE_HIP_VERSION_COMPAT_STRING

- defined as "" when OMNITRACE_HIP_VERSION_MAJOR==0

* new defines + extra info

- define OMNITRACE_LIBRARY_ARCH (via CMAKE_LIBRARY_ARCHITECTURE)
- define OMNITRACE_SYSTEM_NAME (via CMAKE_SYSTEM_NAME)
- define OMNITRACE_SYSTEM_PROCESSOR (via CMAKE_SYSTEM_PROCESSOR)
- define OMNITRACE_SYSTEM_VERSION (via OMNITRACE_SYSTEM_VERSION)
- define OMNITRACE_COMPILER_ID (via CMAKE_CXX_COMPILER_ID)
- define OMNITRACE_COMPILER_VERSION (via CMAKE_CXX_COMPILER_VERSION)
- include this info in metadata
- include subset of this info in --version for bin tools
- tweak to perfetto verbose messages
2023-03-30 04:21:43 -05:00
Jonathan R. Madsen 32b15fe7b7 Handle fork in target application (#191)
* Always print PID in log messages

* omnitrace-dl updates

- omnitrace_preload does not call omnitrace_init or omnitrace_init_tooling
- omnitrace_preload will call omnitrace_set_mpi if OMNITRACE_USE_MPI
  or OMNITRACE_USE_MPIP in the env is true but not call it otherwise
  because doing so either overrides OMNITRACE_USE_PID (when true) or
  disable mpip from initialization (when false) and the MPI
  init can be caught later and override OMNITRACE_USE_PID

* config updates

- set_setting_value sets user update type
- remove volatile from get_settings_configured
- don't override settings::default_process_suffix
- don't kill process in omnitrace_exit_action
- set_state ignores updating state if >= State::Finalized

* Handle state > State::Finalized

* fork gotcha updates

- unsets LD_PRELOAD
- sets OMNITRACE_ROOT_PROCESS
- sets OMNITRACE_CHILD_PROCESS

* libomnitrace library.cpp updates

- basic_bundle for fini metrics
- handle finalization from child process

* sampling updates

- sampling::shutdown handles when child process

* Add example and test using fork

* Update run-ci script to support not submitting

* Tweak test envs

* Update build flags when codecov enabled

* remove unnecessary includes of sampling header

* Replace mpi copy/fini static lambda with free-funcs

* Update codecov job

* Fix OMPT segfaults after finalization

* Miscellaneous updates after rebase

* fixes for causal profiling

* revert some run-ci.sh changes

* Disable storing env in sampling::shutdown

* formatting fix

* Update timemory submodule

- fixed occasional synchronization issues with allocator offloading
- exclude protozero:: from internal samples

* improve root/child process detection

- avoid omnitrace_finalize in MPI when child process
- revert some testing tweaks
2023-02-08 01:31:38 -06:00
Jonathan R. Madsen e7d3125459 restructure libomnitrace + tasking and omnitrace-causal updates (#237)
* restructured libomnitrace

- this is necessary to incorporate some of the binary analysis capabilities into omnitrace exe
- created libomnitrace-core (static)
- created libomnitrace-binary (static)
- created libomnitrace (static)
- omnitrace-avail links to libomnitrace.a
- omnitrace-critical-trace links to libomnitrace.a
- tweaked the testing
  - reduced verbosity on some of MPI tests
  - excluded trace-time-window from tests on Ubuntu 18.04
  - reduced causal e2e iterations
- minor tweak to tasking
  - manually create `PTL::UserTaskQueue` instance instead of relying on `PTL::ThreadPool` to create it

* Update formatting workflow

- source formatting uses ubuntu-22.04
- check-includes doesn't generate false positive for 'include "timemory.hpp"'

* omnitrace-causal --generate-configs

- fix config generation in omnitrace causal
- add test for omnitrace-causal + generating configs

* Fix omnitrace-object-library build

- accidentally included rocm sources in non-rocm builds

* Fix rocm compilation w/o rocprofiler

* update timemory submodule with mpi_get warning messages

* sampling offload file updates

- more verbose messages
- disable offload before stopping

* testing updates

- increase causal e2e iterations to 12
- increase lock_environment verbose to 2 (for sampling offload messages)
- fix return for omnitrace_add_validation_test
2023-02-04 10:59:50 -06:00
Mészáros Gergely 2449e7cd46 rocm: Fix compilation with rocm >= 5.3.1 (#214)
* rocm: Fix compilation with rocm >= 5.3.1

Add another version checked initialiation, because:
[roctracer has changed `hsa_ops_properties_t` again in 5.3.1][hsa_ops_properties_t]

[hsa_ops_properties_t]: https://github.com/ROCm-Developer-Tools/roctracer/compare/rocm-5.3.0...rocm-5.3.1#diff-42ffacd4e7f57213868dcd96aeb5c0d47d91d2a1121ba8d67f46caa267c1818cL41

* Fix formatting

* Fix preprocessing directive
2022-11-18 12:12:59 -06:00
Jonathan R. Madsen 2a387f9099 Fix finalization segfaults (#174)
- update timemory submodule with fixes to papi components and signals
update
2022-10-04 00:00:05 -05:00
Jonathan R. Madsen 8f36620e29 Fix deadlocks during initialization (#167)
- More to come in later commit, below is just tidying some stuff up
  - clang-tidy
  - mpi_gotcha quiet about not finding funcs
  - update to new papi config
  - sampling block_samples / unblock_samples
    - disable calling component's sample functions within sampler
  - release doesn't strip library
  - remove HSA and ROCP env variables from modulefile / setup-env
- preliminary support for LD_PRELOAD usage
- default sampling rate is 300 interrupts / second
- fixes various deadlock issues at startup
2022-09-26 07:52:14 -05:00
Jonathan R. Madsen 90ff7188f8 Crusher hackathon updates (#164)
- improved error handling in dyninst
- improved error handling in omnitrace exe
- new logging facility for omnitrace exe
- improved backtraces
- disable concurrent kernels in rocprofiler
- updates `setup-env.sh` and modulefile
  - set `omnitrace_ROOT`
  - set `HSA_TOOLS_LIB` if roctracer or rocprofiler enabled
  - set `ROCP_TOOL_LIB` if rocprofiler enabled
  - closes #163 
- No longer make setting `HSA_ENABLE_INTERRUPT=0` the default 
  - this has performance implications
- this was set to workaround a bug in ROCR which caused an ioctl call in
ROCm to hang when interrupted. But it was only interrupted when realtime
sampling was enabled since the CPU-clock doesn't increment when waiting
  - This bug should be fixed in ROCm 5.3
- omnitrace no longer activates a realtime sampler by default when
sampling, thus this bug is no longer encountered unless the user
explicitly triggers realtime sampling
2022-09-21 13:58:14 -05:00
Jonathan R. Madsen 472e96a084 Fix building w/ hip, etc. but w/o rocprofiler (#159)
- Fixes builds with `OMNITRACE_USE_ROCPROFILER=OFF` but
`OMNITRACE_USE_ROCTRACER=ON`
- Move rocprofiler/hsa_rsrc_factory.{hpp,cpp}` to rocm folder
2022-09-14 00:07:33 -05:00
Jonathan R. Madsen 15e6e6d979 Fix setup-env + hsa/rocm/ompt serialization + testing + misc (#156)
- Fix setup-env.sh
  - Closes #149 
- omnitrace exe color
- test-install.sh script
- if config variable is updated in config or env, include in generated
config
- metadata for hsa, rocm, and ompt
  - Closes #153 
  - Closes #154
2022-09-13 08:11:48 -05:00
Jonathan R. Madsen e67afd33eb Support sampling duration, sampling TIDs (#142)
- Sampling duration config values
  - OMNITRACE_SAMPLING_DURATION
  - OMNITRACE_PROCESS_SAMPLING_DURATION
  - Disables sampling after this time (in seconds) has elapsed 
- Sampling thread-id config values
  - OMNITRACE_SAMPLING_TIDS
  - OMNITRACE_SAMPLING_CPUTIME_TIDS
  - OMNITRACE_SAMPLING_REALTIME_TIDS
  - Allows user to select certain threads for sampling
- Miscellaneous
  - Tweaked the finalization verbosity messages
  - moved sampling-on-child-threads into runtime.hpp and runtime.cpp
  - fixed submodule dyninst header install
2022-08-31 06:29:19 -05:00
Jonathan R. Madsen 808ea7dfa7 Rework sampling and colorized logs (#140)
## Overview

This is a significant PR which has 3 very notable characteristics:

1. Omnitrace colorizes most of it's logging
2. Completely reworked the sampling 
  - Samples now record the current instruction pointers instead of strings
    - This _dramatically_ decreases the overhead of taking a sample
  - The collection of metrics during a sample are split out into another component, enabling that data collection to be disabled -- which decreases the sampling overhead even further
  - When both `OMNITRACE_SAMPLING_CPUTIME` and `OMNITRACE_SAMPLING_REALTIME` are ON:
    - `OMNITRACE_SAMPLING_CPUTIME_FREQ` and `OMNITRACE_SAMPLING_REALTIME_FREQ` can be used to individually control the sampling frequency 
  - `OMNITRACE_SAMPLING_CPUTIME_DELAY` and `OMNITRACE_SAMPLING_REALTIME_DELAY` can be used to individually control the delay time before starting
  - Now, omnitrace does not start a real-time sampler on the main thread unless `OMNITRACE_SAMPLING_REALTIME` is ON
    - In the future, an `OMNITRACE_SAMPLING_TIDS` (and real-time, cpu-time variants) configuration variable(s) will allow you to select which threads will be sampled
3. Files produced by `omnitrace` exe -- `available-instr.txt`, `instrumented-instr.txt`, etc. -- now no longer has `-instr` suffix and are placed in `instrumentation/` subfolder, i.e. `available-instr.txt` -> instrumentation/available.txt`
  - This helped de-clutter the output folder

Most of the other edits were reorganization (e.g. internal namespace changes), cleanup, and splitting up functionality.

## Bug Fixes

There is a bug fix with respect to the HSA callbacks which disabled sampling on child threads when an HSA API call was made

## Details

- created thread_info struct for mapping different thread IDs
- reorganized file structure significantly
- added categories.hpp, concepts.hpp
- moved around name trait definitions
- moved all omnitrace components into `omnitrace::component` namespace
  - there was a lot of inconsistency b/t using `tim::component` in some places and `omnitrace::component`
  - added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
- OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
- roctracer and critical_trace use same thread pool
- critical_trace functions do not lock anymore bc of thread-local TaskGroup
- added `component::local_category_region` to support using `component::category_region` without explicitly passing in name
- removed `component::omnitrace` (unused)
- migrated KokkosP and OMPT to use `component::local_category_region`
  - removed `component::user_region` as a result
- migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
  - removed `component::functors` as a result
- migrated some ppdefs
- `api::omnitrace` -> `project::omnitrace`
- `api::(...)` -> `category::(...)`
- improved recording the execution time of threads
  - migrated this functionality out of pthread_create_gotcha and into thread_info
- moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
- split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
- sampling.cpp handles setup and post-processing that was formerly in backtrace
- updated logging to use colors
- `OMNITRACE_COLORIZED_LOG` config variable
- updated docs on JSON output from timemory
- instrumentation info in instrumentation subfolder
- added testing for KokkosP entries
- added testing for ompt entries
- add_critical_trace function defined in critical_trace.hpp
- disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
- add comp::page_rss to main bundle
- thread_data supports std::optional instead of std::unique_ptr
- thread_data supports tim::identity<T> to avoid unique_ptr or optional
- tracing::record_thread_start_time()
- tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
- removed anonymous namespace from omnitrace::utility
- sampling backtrace stores instruction pointers instead of strings
- component::category_region updates
  - handle disabled thread state
  - handle finalized state
  - fewer debug messages
  - invoke thread_init()
  - invoke thread_init_sampling()
  - handle push/pop count based on category
  - push/pop count only modified when used
- component::cpu_freq
- components/ensure_storage.hpp
- reworked the pthread_create replacement function
- updated parallel-overhead example to report # of times locked
- OMNITRACE_MAX_UNWIND_DEPTH build option
- update timemory submodule
2022-08-31 01:24:31 -05:00
Jonathan R. Madsen cbe8862ea4 Verbose messages based on ROCP_ONLOAD_TRACE env (#131)
Debugging based on ROCP_ONLOAD_TRACE env
2022-08-08 08:38:18 -05:00
Jonathan R. Madsen 7e31d9f450 ROCm environment fixes + workflow updates (#117)
* Improve dlopen of ROCm libraries + rocprofiler test

- Use PROJECT_BINARY_DIR in tests
- Added rocprofiler test

* Revert OMNITRACE_FORCE_ROCPROFILER_INIT

* omnitrace-avail --all test

* Fix ROCP_METRICS for ROCm 5.2.0

* Fix ROCP_METRICS for ROCm 5.2.0

* Restrict containers workflow to AMDResearch/omnitrace

* Bump version to 1.3.1

* Update cpack workflow

- generate release draft
- upload installers as release assets

* Test rocprofiler w/o roctracer enabled

* Fix formatting

* verbose message
2022-07-27 06:36:52 -05:00
Jonathan R. Madsen 4208b5654c GPU HW Counters via rocprofiler (#84)
* Initial support for GPU hardware counters

* Update find modules for roctracer and rocprofiler

- /opt/rocm/{rocprofiler,roctracer} path is deprecated so tweak search procedure

* Improve ConfigCPack for MPI

* Update rocprofiler

- rocm_metrics()
- minor cleanup

* Update rocm find modules

* declare rocm_metrics + call in omnitrace-avail

* relocate omnitrace-launch-compiler

* REALPATH and find_modules

* Examples cmake (may drop)

* omnitrace-avail

- hw_counter categories
- init rocm

* setenv updates for rocprofiler in library.cpp and dl.cpp

* get_rocm_events config

* gpu::hip_device_count()

* rocm_metrics returns hardware_counters::info

* - relocated library/components/roctracer_callbacks.* to library/roctracer.*
- relocated library/components/rocprofiler.* to library/rocprofiler.*
- cleaned up rocprofiler.hpp
- added perfetto output of rocprofiler
- added timemory output of rocprofiler
- renamed omni.roctracer thread to roctracer.hip
- added roctracer.hsa thread name
- updated timemory submodule to support std::variant
- updated timemory submodule to support = in config value
- updated timemory submodule to support standalone storage
- updated timemory submodule to support new hw counter apis
- updated timemory submodule to prevent label/description caching in data_tracker

* update omnitrace-avail info_type generation

* Update timemory submodule

* rocprofiler component

* cmake formatting

* omnitrace-avail handle no GPUs

- Add -c command-line option for --categories
- support verbosity

* hsa_rsrc_factory throws exceptions

- throw exceptions to avoid aborting on HSA_STATUS_ERROR_NOT_INITIALIZED when advantageous
- removed duplicate specialization of is_available for component::rocprofiler

* rocprofiler symbols for when disabled

* Fix warning in omnitrace-avail

- std::stringstream from initializer list would use explicit constructor

* Fix finalization after settings are deleted

* Reorganized rocprofiler source

* Updated formatting

* Miscellaneous tweaks

- added using statements from timemory
- tweaked the main and thread bundle names
- fixed timemory header includes
2022-07-17 21:52:09 -05:00