Граф коммитов

125 Коммитов

Автор SHA1 Сообщение Дата
Jonathan R. Madsen e4e3c2b7f4 Fix gotcha indexes for numa_gotcha (#161)
- In #152, `numa_alloc_interleaved` and `numa_alloc_local` both were
indexed at position 6
2022-09-13 23:03:45 -05:00
Jonathan R. Madsen 15e6e6d979 Fix setup-env + hsa/rocm/ompt serialization + testing + misc (#156)
- Fix setup-env.sh
  - Closes #149 
- omnitrace exe color
- test-install.sh script
- if config variable is updated in config or env, include in generated
config
- metadata for hsa, rocm, and ompt
  - Closes #153 
  - Closes #154
2022-09-13 08:11:48 -05:00
Jonathan R. Madsen 4ed8f8f762 Individual perfetto debug annotations (#157)
- instead of args="<list-of-string>" in perfetto, each argument is added
individually, enabling matching pointer values, etc.

## Previous behavior

Previously, all function arguments were wrapped into a single string,
e.g.:

![Screen Shot 2022-09-12 at 11 52 11
PM](https://user-images.githubusercontent.com/6001865/189811896-3b13d5a4-e884-40b3-b6b4-f84f3e16f603.png)

## New Behavior

With the exception of the HIP API (whose args are provided as a single
string via `hipApiString`), all functions from MPI, RCCL, pthreads, etc.
have individual arguments in perfetto, e.g.:

![Screen Shot 2022-09-12 at 11 52 01
PM](https://user-images.githubusercontent.com/6001865/189812137-63afba72-170d-42b3-b3bc-ae74e32ceadf.png)

In the above, previously, this would have been:

|  |  |
| - | - |
| args | `pthread_rwlock_t*=0x1c1cc50` |

The key benefit enabled is the ability to find slices with same arg
values:

<img width="753" alt="Screen Shot 2022-09-12 at 11 59 18 PM"
src="https://user-images.githubusercontent.com/6001865/189812915-0342f841-e5ce-4f8e-8169-0cb52f3425b5.png">

Previously, the entire "args" field would have had to match, which
essentially never happened if the pointer was used in two different
functions with different function signatures

## Example

![Screen Shot 2022-09-12 at 11 49 18
PM](https://user-images.githubusercontent.com/6001865/189811621-5eed0008-f340-438e-938d-a140f836b583.png)

![Screen Shot 2022-09-12 at 11 49 45
PM](https://user-images.githubusercontent.com/6001865/189811584-a983ed1c-095f-4d54-a790-9ef3adbe4e08.png)
2022-09-13 02:05:03 -05:00
Jonathan R. Madsen 64afa49193 GOTCHA wrappers for NUMA functions (#152)
- GOTCHA wrappers for:
  - mbind
  - migrate_pages
  - move_pages
  - numa_migrate_pages
  - numa_move_pages
  - numa_alloc
  - numa_alloc_local
  - numa_alloc_interleaved
  - numa_alloc_onnode
  - numa_realloc
  - numa_free
- bumped version to 1.6.0
- updated categories
- hardware_counters -> kernel_hardware_counters and
thread_hardware_counters
  - simplified mapping category structs to perfetto category names
- updated timemory submodule
2022-09-12 17:44:27 -05:00
Jonathan R. Madsen 2718596e5a Support tracing thread locks with perfetto (#143)
- remove sampling and roctracer flat/timeline options
  - unused/unnecessary clutter
- start pthread_gotcha before perfetto
- remove pthread_mutex_gotcha validate
- update timemory submodule with tid fix
2022-08-31 11:33:45 -05:00
Jonathan R. Madsen e67afd33eb Support sampling duration, sampling TIDs (#142)
- Sampling duration config values
  - OMNITRACE_SAMPLING_DURATION
  - OMNITRACE_PROCESS_SAMPLING_DURATION
  - Disables sampling after this time (in seconds) has elapsed 
- Sampling thread-id config values
  - OMNITRACE_SAMPLING_TIDS
  - OMNITRACE_SAMPLING_CPUTIME_TIDS
  - OMNITRACE_SAMPLING_REALTIME_TIDS
  - Allows user to select certain threads for sampling
- Miscellaneous
  - Tweaked the finalization verbosity messages
  - moved sampling-on-child-threads into runtime.hpp and runtime.cpp
  - fixed submodule dyninst header install
2022-08-31 06:29:19 -05:00
Jonathan R. Madsen 808ea7dfa7 Rework sampling and colorized logs (#140)
## Overview

This is a significant PR which has 3 very notable characteristics:

1. Omnitrace colorizes most of it's logging
2. Completely reworked the sampling 
  - Samples now record the current instruction pointers instead of strings
    - This _dramatically_ decreases the overhead of taking a sample
  - The collection of metrics during a sample are split out into another component, enabling that data collection to be disabled -- which decreases the sampling overhead even further
  - When both `OMNITRACE_SAMPLING_CPUTIME` and `OMNITRACE_SAMPLING_REALTIME` are ON:
    - `OMNITRACE_SAMPLING_CPUTIME_FREQ` and `OMNITRACE_SAMPLING_REALTIME_FREQ` can be used to individually control the sampling frequency 
  - `OMNITRACE_SAMPLING_CPUTIME_DELAY` and `OMNITRACE_SAMPLING_REALTIME_DELAY` can be used to individually control the delay time before starting
  - Now, omnitrace does not start a real-time sampler on the main thread unless `OMNITRACE_SAMPLING_REALTIME` is ON
    - In the future, an `OMNITRACE_SAMPLING_TIDS` (and real-time, cpu-time variants) configuration variable(s) will allow you to select which threads will be sampled
3. Files produced by `omnitrace` exe -- `available-instr.txt`, `instrumented-instr.txt`, etc. -- now no longer has `-instr` suffix and are placed in `instrumentation/` subfolder, i.e. `available-instr.txt` -> instrumentation/available.txt`
  - This helped de-clutter the output folder

Most of the other edits were reorganization (e.g. internal namespace changes), cleanup, and splitting up functionality.

## Bug Fixes

There is a bug fix with respect to the HSA callbacks which disabled sampling on child threads when an HSA API call was made

## Details

- created thread_info struct for mapping different thread IDs
- reorganized file structure significantly
- added categories.hpp, concepts.hpp
- moved around name trait definitions
- moved all omnitrace components into `omnitrace::component` namespace
  - there was a lot of inconsistency b/t using `tim::component` in some places and `omnitrace::component`
  - added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
- OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
- roctracer and critical_trace use same thread pool
- critical_trace functions do not lock anymore bc of thread-local TaskGroup
- added `component::local_category_region` to support using `component::category_region` without explicitly passing in name
- removed `component::omnitrace` (unused)
- migrated KokkosP and OMPT to use `component::local_category_region`
  - removed `component::user_region` as a result
- migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
  - removed `component::functors` as a result
- migrated some ppdefs
- `api::omnitrace` -> `project::omnitrace`
- `api::(...)` -> `category::(...)`
- improved recording the execution time of threads
  - migrated this functionality out of pthread_create_gotcha and into thread_info
- moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
- split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
- sampling.cpp handles setup and post-processing that was formerly in backtrace
- updated logging to use colors
- `OMNITRACE_COLORIZED_LOG` config variable
- updated docs on JSON output from timemory
- instrumentation info in instrumentation subfolder
- added testing for KokkosP entries
- added testing for ompt entries
- add_critical_trace function defined in critical_trace.hpp
- disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
- add comp::page_rss to main bundle
- thread_data supports std::optional instead of std::unique_ptr
- thread_data supports tim::identity<T> to avoid unique_ptr or optional
- tracing::record_thread_start_time()
- tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
- removed anonymous namespace from omnitrace::utility
- sampling backtrace stores instruction pointers instead of strings
- component::category_region updates
  - handle disabled thread state
  - handle finalized state
  - fewer debug messages
  - invoke thread_init()
  - invoke thread_init_sampling()
  - handle push/pop count based on category
  - push/pop count only modified when used
- component::cpu_freq
- components/ensure_storage.hpp
- reworked the pthread_create replacement function
- updated parallel-overhead example to report # of times locked
- OMNITRACE_MAX_UNWIND_DEPTH build option
- update timemory submodule
2022-08-31 01:24:31 -05:00
Jonathan R. Madsen a1dcd1bc4b Static libstdcxx and python (#139)
Support python + static libstdc++
2022-08-28 03:56:13 -05:00
Jonathan R. Madsen 0dd8f52292 Generic comm_data component (#132)
* Generic comm_data component

- moved rccl_comm_data to comm_data
- comm_data includes communication data for MPI

* fix timemory include with quotes

* Only support MPI comm data with full MPI support

* Increase timeouts + kill perfetto

* Update timemory submodule

* Fix missing command killall

* set +e in Kill Perfetto workflow step

* Updated MPI example to include MPI_Send and MPI_Recv calls

* Update timemory submodule with storage merge fix

* Perfetto comm data

- tracing::now<T>() function

* Fix timemory header include
2022-08-25 19:48:10 -05:00
Jonathan R. Madsen 3f3ef7ddf9 Python noprofile (#138)
* Fixed noprofile / FakeProfiler

- omnitrace.libpyomnitrace.profiler.profiler_pause()
- omnitrace.libpyomnitrace.profiler.profiler_resume()

* Python tests for noprofile

* Remove static imported module
2022-08-16 19:28:58 -05:00
Jonathan R. Madsen 23fb3946c7 Handle --advanced printing for config generation (#137)
Handle --advanced printing for config
2022-08-08 15:28:34 -05:00
Jonathan R. Madsen 040da3fc6a Enable TRACE_THREAD_RW_LOCKS and TRACE_THREAD_SPIN_LOCKS by default (#136)
* Enable OMNITRACE_TRACE_THREAD_{RW,SPIN}_LOCKS by default

- updates timemory submodule with updated GOTCHA submodule
- fix to GOTCHA library which defaults to not wrapping dlopen and dlsym prevents deadlock

* Bump version to 1.4.0
2022-08-08 15:28:01 -05:00
Jonathan R. Madsen 34013bc539 OMNITRACE_TRACE_THREAD_SPIN_LOCKS config (#134)
- configuration setting to wrap pthread_spin_lock, pthread_spin_trylock, pthread_spin_unlock
2022-08-08 08:38:52 -05:00
Jonathan R. Madsen 95913c7135 Update python install + build-tree setup (#128)
- When one python version is used, install to proper lib/pythonX.Y/site-packages
- config files, etc. in build tree resemble the install-tree
2022-08-08 08:38:38 -05:00
Jonathan R. Madsen cbe8862ea4 Verbose messages based on ROCP_ONLOAD_TRACE env (#131)
Debugging based on ROCP_ONLOAD_TRACE env
2022-08-08 08:38:18 -05:00
Jonathan R. Madsen 49127db414 Fix some inconsistencies in debug messages w/in category_region (#135) 2022-08-08 08:38:04 -05:00
Jonathan R. Madsen bb400d5d61 Remove unused funcs + messages for excluding system lib (#133)
- instead of filtering system library opaquely, generate info messages
- remove unused are_file_include_exclude_lists_empty()
- remove unused module_constraint free func
- remove unused routine_counstraint free func
2022-08-08 08:37:51 -05:00
Jonathan R. Madsen 1343c6722a offset thread ids where possible (#130)
- configure known background threads to start indexing down from TIMEMORY_MAX_THREADS
- invoke sampling::shutdown() instead of just blocking signals where possible
2022-08-08 08:37:37 -05:00
Jonathan R. Madsen ac5ce00561 Update config generation, join fix, sampler (#129)
- update timemory submodule
  - update ring_buffer to remove all dynamic allocation
  - update sampler + sampler allocator to use ring-buffers
  - fix papi_array and papi_vector labels
- fix common::join with char delimiter
- fix config generation without --advanced
2022-08-08 07:03:24 -05:00
Jonathan R. Madsen afa3df8523 Advanced category for configuration options (#125)
Adds advanced category

- advanced category hides less relevant configuration options
- omnitrace-avail has new '--advanced' option which shows these flags
- increase verbosity level to print issue with reading ppid children
- OMNITRACE_ROCTRACER_HSA_ACTIVITY defaults to ON
- OMNITRACE_ROCTRACER_HSA_API defaults to ON
2022-08-03 12:13:00 -05:00
Jonathan R. Madsen b79ce10fee RPATH to rocprofiler_LIBRARY_DIR for ROCm < v5.2 (#126)
* RPATH to rocprofiler_LIBRARY_DIR for ROCm < v5.2

- until v5.2 only librocprofiler64.so was symlinked in /opt/rocm. Thus linker using SOVERSION caused issues finding librocprofiler64.so.1

* Test ROCm w/ CMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF

* INSTALL_RPATH_USE_LINK_PATH for omnitrace exe
2022-08-02 14:19:04 -05:00
Jonathan R. Madsen 97d17a8ef8 Fix RPATH handling (#122)
Fix rpath handling

- remove explicit add to CMAKE_INSTALL_RPATH
- remove overriding INSTALL_RPATH_USE_LINK_PATH for exes
2022-08-01 14:19:19 -05:00
Jonathan R. Madsen 7e31d9f450 ROCm environment fixes + workflow updates (#117)
* Improve dlopen of ROCm libraries + rocprofiler test

- Use PROJECT_BINARY_DIR in tests
- Added rocprofiler test

* Revert OMNITRACE_FORCE_ROCPROFILER_INIT

* omnitrace-avail --all test

* Fix ROCP_METRICS for ROCm 5.2.0

* Fix ROCP_METRICS for ROCm 5.2.0

* Restrict containers workflow to AMDResearch/omnitrace

* Bump version to 1.3.1

* Update cpack workflow

- generate release draft
- upload installers as release assets

* Test rocprofiler w/o roctracer enabled

* Fix formatting

* verbose message
2022-07-27 06:36:52 -05:00
Jonathan R. Madsen 4dd144a32c User regions in Python (#57)
* User regions in Python

* User-region testing + common submodule

- Updated examples/python/source.py to use user-regions
- Python-level user submodule
- Python-level common submodule
- clean-up of profiler python code
- extended source.py testing to include the user-regions
2022-07-25 20:24:34 -05:00
Jonathan R. Madsen 45be03906a RCCL support (#93)
* Initial support for RCCL

* OMNITRACE_USE_RCCLP + sampling tweaks

- also OMNITRACE_SAMPLING_KEEP_INTERNAL option
- minor modifications to sampling to use keep internal option + discard funlockfile

* Update docker and workflows to download RCCL

* Update CPack DEB with rocprofiler dependency

* Rework rccl into library and library/components folder

- add tpls/rccl/rccl/rccl.h

* Fix timemory includes

* rcclp inline definitions when disabled

* Tweaks to ubuntu-focal-external-rocm

- disable ompt
- enable building testing

* Tweaks to ubuntu-focal-external-rocm

- ctest exclude

* Tweak ubuntu-focal.yml

- remove source /.../setup-env.sh, replace with $GITHUB_ENV

* Fix ubuntu-focal-rocm + OMPI + root

* Improved rocm-smi error handling

- Recover from rocm-smi errors
- Disabling rocm-smi after recovering from errors
- Werror in developer mode
- Remove State::DelayedInit
- Add State::Disabled

* formatting

* Fix merge of OMNITRACE_SAMPLING_KEEP_INTERNAL

* Update RCCL include directory

- based on ROCm version we need with <rccl/rccl.h> or <rccl.h>

* RCCL Testing

- updated tests to use configuration files
- many tests generate a configuration file
- tests how have GPU option
- enable ncclCommCount, disable ncclGetVersion
- add testing for RCCLP via rccl-tests
- working directory of tests is PROJECT_BINARY_DIR
- add nccl/rccl functions to get_whole_function_names
- some clang compiler fixes

* Handle RCCL include w/o HIP

* RCCL requires HIP

* Update OMNITRACE_SAMPLING_CPUS for testing

* Update tests/CMakeLists.txt

* Debug settings

* Install MPI even when USE_MPI=OFF

* exclude printf

* skip mpi tests w/o USE_MPI or USE_MPI_HEADERS

* update ubuntu rocm workflow

* Fix configure env step for ubuntu rocm
2022-07-25 12:16:11 -05:00
Jonathan R. Madsen 80a9039239 Minor fixes (#113)
- Improved error handling in getenv
- write_perfetto_counter_track
2022-07-25 07:25:43 -05:00
Jonathan R. Madsen 99da25ea80 exit gotcha + remove DelayedInit state + rocm-smi + cleanup (#110)
* exit gotcha + remove DelayedInit state + cleanup

- exit gotcha which wraps exit, quick_exit, abort
- minor refactor of mpi gotchas
- removed some redundant code in omnitrace_finalized_hidden
- exclude instrumenting functions starting with dlopen and dlsym
- exclude instrumenting exit, quick_exit, and abort functions
- update timemory submodule with support for new gotcha_invoker with (gotcha_data, <function pointer>, args...)

* Improved rocm_smi error handling
2022-07-24 22:09:32 -05:00
Jonathan R. Madsen bcdec188eb Fix reliability when KOKKOS_PROFILE_LIBRARY is set in env (#103)
* Fix reliability when KOKKOS_PROFILE_LIBRARY is set in env

- in certain situations, an exe using kokkos may be instrumented
- this will cause libomnitrace to be dlopened via libomnitrace-dl
- if KOKKOS_PROFILE_LIBRARY is set to libomnitrace and not libomnitrace-dl, you will end up with different instances of libomnitrace trying to collect data

* Set OMNITRACE_MAX_THREADS=32 in CI
2022-07-24 22:01:53 -05:00
Jonathan R. Madsen b1191b2831 Added new tests validating gotcha wrappers (#105)
* Added new tests validating gotcha wrappers

* Update MPI example to use thread

* Tweaks to mpi-flat test and mpi_gotcha

- enabled MPI_Comm_size and MPI_Comm_rank in mpip so disabled them at runtime
- set test to collapse threads and processes

* Tweak to test and example

- mpi test sets GOTCHA_DEBUG=1 in env
- removed checking for MPI_{Comm_dup,Comm_group,Group_incl}
- tweaked tests so pthread_join is where it is expected
2022-07-24 20:33:52 -05:00
Jonathan R. Madsen ea282d9301 Release 1.3.0 preparations (#109)
* v1.3.0

* ROCm 5.2 and extensions tweaks

* Container workflow + miscellaneous updates

* Misc fixes + timemory submodule update

- timemory submodule update for multiple definitions of variant_apply

* Increase timeouts

* Remove obsolete Julia docs and script

- support for rocprofiler makes rocprof merging obsolete

* Fix cpack testing and combine cpack workflows into single script

* Install components + omnitrace tpl exes

- Improved COMPONENT specification for installs
- Install PAPI executables with omnitrace- prefix and hyphens
- Install Perfetto executables with omnitrace- prefix and hyphens

* Update docs on perfetto and papi command-line tools

* remove ubuntu 22.04 from containers workflow

* remove containers workflow running on all pushes

* Fix CI_SCRIPT_ARGS

* Fix PAPI utils install

* Fixed traced test in workflow + removed return char

- validate-perfetto-proto.py had return character

* Fix test-docker-release.sh script to use correct container

* Release build bc RelWtihDebInfo using too much memory
2022-07-23 03:02:31 -05:00
Jonathan R. Madsen 0729e1737b Pthread category region (#102)
* Perfetto args support for pthread functions

* Add pthread category
2022-07-22 16:45:35 -05:00
Jonathan R. Madsen 45acadf756 Updated documentation for hardware counters (#108)
Updated documentation for hardware counters [skip ci]
2022-07-22 16:44:21 -05:00
Jonathan R. Madsen d27f22ea37 Sampling use SIGRTMIN + N signals (#104)
* Use SIGRTMIN instead of SIGALRM for sampling

* Config options + fully working SIGRTMIN sampling

- OMNITRACE_SAMPLING_KEEP_INTERNAL config option
- OMNITRACE_PROCESS_SAMPLING_FREQ config option
- OMNITRACE_SAMPLING_REALTIME config option
- OMNITRACE_SAMPLING_CPUTIME config option
- OMNITRACE_SAMPLING_REALTIME_OFFSET config option

* Fix omnitrace-avail-regex-negation test

- OMNITRACE_PROCESS_SAMPLING_FREQ was causing failure
2022-07-22 14:17:27 -05:00
Jonathan R. Madsen cf7052d919 Fix warnings + Werror (#101)
- Fix warnings via OMNITRACE_BASIC_VERBOSE and OMNITRACE_BASIC_VERBOSE_F
2022-07-21 16:24:04 -05:00
Jonathan R. Madsen 8bea1a995d Updated features docs (#98) 2022-07-21 12:56:10 -05:00
Jonathan R. Madsen c006e542a5 Replaces OMNITRACE_CONDITIONAL_BASIC_PRINT with OMNITRACE_VERBOSE (#97)
Replaces OMNITRACE_CONDITIONAL_BASIC_PRINT with OMNITRACE_VERBOSE and similar where possible
2022-07-21 11:15:26 -05:00
Jonathan R. Madsen 400e5078ed Sampling Tweaks: disable sampling itimer (#95)
* tweak starting sampling on main thread

- rename omnitrace::common::utility to omnitrace::common::path
- tracing::thread_init_sampling does not start sampling on main thread

* Cancel sampling itimers + timemory submodule update

- updates timemory submodule with support for cancelling itimer and SIGRTMIN through SIGRTMAX
2022-07-21 03:55:41 -05:00
Jonathan R. Madsen d04cbe862e fix omnitrace print-* with libraries (#94)
* fix omnitrace print-* with libraries

* timemory submodule update

* Update workflows to use ./bin/omnitrace instead of ./omnitrace

* cmake format

* update timemory submodule

- fix ODR violations in utility/procfs

* cmake updates

- uniform find_package for all ROCm-based libraries

* tweak transpose example

- throw exception instead of std::exit

* Inspect cmdv name before assuming not exe

- some ELF execs "think" they are libraries so only assume rewrite + simulate + all-functions if filename looks like library
- adds some test for --print-available -- <library>

* Fix _has_lib_prefix when command is < 3

* Updates and reverts to omnitrace exe

- update module_function operator< and operator==
- add function_signature operator<
- refactor module_function ctor
- revert some previous changes w.r.t. simulate and include_unninstr

* Fix source/bin/tests to use same output dir as tests

* cmake format

* Segfault mitigation + refactor + modify function iteration

- refactor module_function ctor to avoid segfaults
- string_t -> std::string
- replace std::string with std::string_view in some places
- get_name(module_t*)
- get_name(procedure_t*)
- disable using both app_modules and app_functions
- new option: --parse-all-modules to iterate over app_modules
- removed some unused code w.r.t. debug info

* Disable module_function address range for uninstrumentable functions

* Disable module_function address range for uninstrumentable functions

* Refactored getting file/line info and init/fini

- use dyninst insertInitCallback and insertFiniCallback if main not found
- fixed all issues with segmentation faults in --simulate --all-functions

* revert changes to Findrocprofiler.cmake
2022-07-21 01:15:41 -05:00
Jonathan R. Madsen 6bc86d1111 Remove get_perfetto_output_filename().clear() (#89)
- unnecessary and causing excess overhead
2022-07-18 09:41:03 -05:00
Jonathan R. Madsen 16bd20121e Support for disabling perfetto categories (#72)
* Updates

- get_perfetto_categories() impl
- rework perfetto category usage

* TIMEMORY_DEFINE_NAME_TRAIT for cpu_freq::cpu_peak

* Add process_memory_hwm perfetto category
2022-07-18 08:25:48 -05:00
Jonathan R. Madsen 8f8aa43d1a Fixes missing call to mpi_gotcha::update() (#88)
- mpi_gotcha::shutdown() calls mpi_gotcha::update()
  - fixes the MPI rank not being available to output files when `OMNITRACE_USE_PID=ON`
2022-07-18 07:00:38 -05:00
Jonathan R. Madsen d22725e830 Support ACTIVITY_DOMAIN_ROCTX (#87)
- New configuration variable: OMNITRACE_USE_ROCTX
- Enable support for roctxRangePushA, roctxRangePop, roctxRangeStartA, roctxRangeStop
2022-07-18 02:06:40 -05:00
Jonathan R. Madsen a2dcc7381b Unified setup_environ b/t libomni and libomni-dl (#86)
- omnitrace::common::setup_environ provides a standardized exporting of relevant env variables
2022-07-18 01:52:21 -05:00
Jonathan R. Madsen 7745c12417 Fix statistics type and use feature name indexes (#85)
- fix reporting units and statistics type for rocm_data_tracker
- use indexes for feature names instead of strings
2022-07-18 01:52:03 -05:00
Jonathan R. Madsen 4208b5654c GPU HW Counters via rocprofiler (#84)
* Initial support for GPU hardware counters

* Update find modules for roctracer and rocprofiler

- /opt/rocm/{rocprofiler,roctracer} path is deprecated so tweak search procedure

* Improve ConfigCPack for MPI

* Update rocprofiler

- rocm_metrics()
- minor cleanup

* Update rocm find modules

* declare rocm_metrics + call in omnitrace-avail

* relocate omnitrace-launch-compiler

* REALPATH and find_modules

* Examples cmake (may drop)

* omnitrace-avail

- hw_counter categories
- init rocm

* setenv updates for rocprofiler in library.cpp and dl.cpp

* get_rocm_events config

* gpu::hip_device_count()

* rocm_metrics returns hardware_counters::info

* - relocated library/components/roctracer_callbacks.* to library/roctracer.*
- relocated library/components/rocprofiler.* to library/rocprofiler.*
- cleaned up rocprofiler.hpp
- added perfetto output of rocprofiler
- added timemory output of rocprofiler
- renamed omni.roctracer thread to roctracer.hip
- added roctracer.hsa thread name
- updated timemory submodule to support std::variant
- updated timemory submodule to support = in config value
- updated timemory submodule to support standalone storage
- updated timemory submodule to support new hw counter apis
- updated timemory submodule to prevent label/description caching in data_tracker

* update omnitrace-avail info_type generation

* Update timemory submodule

* rocprofiler component

* cmake formatting

* omnitrace-avail handle no GPUs

- Add -c command-line option for --categories
- support verbosity

* hsa_rsrc_factory throws exceptions

- throw exceptions to avoid aborting on HSA_STATUS_ERROR_NOT_INITIALIZED when advantageous
- removed duplicate specialization of is_available for component::rocprofiler

* rocprofiler symbols for when disabled

* Fix warning in omnitrace-avail

- std::stringstream from initializer list would use explicit constructor

* Fix finalization after settings are deleted

* Reorganized rocprofiler source

* Updated formatting

* Miscellaneous tweaks

- added using statements from timemory
- tweaked the main and thread bundle names
- fixed timemory header includes
2022-07-17 21:52:09 -05:00
Jonathan R. Madsen 0ee32b3755 Improved sampling performance (#74)
* Improved sampling performance

* Sampling tweaks

- backtrace::get() returns vector of string_view
- further performance improvements
- tweaked _use_label
- wrapped samples in perfetto "samples [omnitrace]" block
- samples in TID=0 in sampling mode are in separate thread row

* Fix empty HW counter desc + category for sampling

- fallback to to metric name if papi event info description not found
- add perfetto sampling category

* Limit the SIGALRM frequency
2022-07-12 18:08:17 -05:00
Jonathan R. Madsen e099c84640 pthread_rwlock deadlock fix (#82)
- found when using ROCm-enabled OpenMPI with rocHPL
  - when wrapping pthread_rwlock_rdlock, pthread_rwlock_wrlock, and pthread_rwlock_unlock, omnitrace has been found to deadlock for some unknown reason
- New configuration variable: OMNITRACE_TRACE_THREAD_RW_LOCKS which defaults to false
2022-07-11 20:59:57 -05:00
Jonathan R. Madsen f6b6be3ddc Fix empty OMNITRACE_CONFIG_FILE and suppressing config and parsing (#81)
* Fix empty OMNITRACE_CONFIG_FILE and suppressing config and parsing
2022-07-11 18:46:12 -05:00
Jonathan R. Madsen f82845388a Handle OMNITRACE_ENABLED + minor updates (#78)
- handle OMNITRACE_ENABLED=OFF by disabling everything
- use set_data to get wrappee in pthread_create_gotcha
- clear roctracer_data storage if roctracer not initialized
2022-06-29 19:39:55 -05:00
Jonathan R. Madsen 2e1fd5a3c4 HIP API args in perfetto + new perfetto categories (#76)
* HIP API perfetto args + updated perfetto categories

- Support for HIP API args field in perfetto
- PERFETTO_CATEGORIES -> OMNITRACE_PERFETTO_CATEGORIES
- Changed perfetto categories for several trace events and trace counters
- migrated several TRACE_EVENT_* to use omnitrace::tracing::{push,pop}_perfetto_ts(...)

* Tweaked category_region to encode the type of args as well as value

- Affects MPI args field in perfetto

* Improved testing in ubuntu-focal.yml

- "Test Install" step sources setup-env.sh
- "Test Install" step tests python support
- "Test Install" step tests reading ~/.omnitrace.cfg
- Avoid installing boost and tbb libs when building from submodule

* validate-perfetto-proto.py accepts -m / --categories

* Remove reference from category_region typeids

* Tweak opensuse action name

* Tweak the "Test Install" Step of ubuntu-focal
2022-06-29 16:26:02 -05:00