develop
23 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
9f014db6a4 |
[rocprofiler-systems] Update install path for examples (#2625)
* Update install path for examples to `share/rocprofiler-systems/examples`
----
Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
|
||
|
|
8157437273 |
[Palamida scan] SWDEV-553054 Adding missing copyrights information (#900)
* Add missing copyright headers in rocprofiler-systems
* Update python-tests
* Update causal test
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
|
||
|
|
847580dd9e |
Update minimum_cmake_required to match version used in CI (#679)
- Update minimum_cmake_required to match version used in CI
- We should match the minimum version that we test against
- Ensure ".S" files are treated as assembly.
|
||
|
|
8fcf3a50b0 |
Use gersemi for CMake formatting (#257)
* Replace `cmake-format` with `gersemi`
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Remove .cmake-format.yaml
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Update workflow to use gersemi
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Update CONTRIBUTING.md
* Update helper scripts
* Don't include `*/external/*` in workflows
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
d8c98d2d4d |
OMPT Target Offload Support (#17)
- Porting from https://github.com/ROCm/omnitrace/pull/411
- Improve OMPT support
- Add OpenMP target example to testing
- Update Timemory submodule to use ROCm/Timemory rather than NERSC/Timemory
- Update `actions/upload-artifacts` to v4
- Standardize the `cmake_minimum_required` to 3.18.4 across workflows, project, and examples
- Updated Ubuntu 20.04 workflows
[ROCm/rocprofiler-systems commit:
|
||
|
|
489eda995d |
Rename Omnitrace to ROCm Systems Profiler (#4)
The Omnitrace program is being renamed.
Full name: "ROCm Systems Profiler"
Package name: "rocprofiler-systems"
Binary / Library names: "rocprof-sys-*"
---------
Co-authored-by: Xuan Chen <xuchen@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
3c7e6902e0 |
Causal profiling (#229)
* Addition of basic structure
* Reworked categories
* More causal integration additions
* Causal implementation
* Update examples
* delete virtual_speedup files
* Update perfetto submodule to v31.0
* Update dyninst submodule
* Update timemory submodule
* ElfUtils build for libdw
* OMNITRACE_LIKELY and OMNITRACE_UNLIKELY
* Update common lib join
* Examples updates for causal profiling
* config updates with causal options
- OMNITRACE_CAUSAL_FIXED_LINE
- OMNITRACE_CAUSAL_FIXED_SPEEDUP
- OMNITRACE_CAUSAL_FILE
- OMNITRACE_CAUSAL_BINARY_SCOPE
- OMNITRACE_CAUSAL_SOURCE_SCOPE
- version info in banner
- support increments in parse_numeric_range
- fix occasional deadlock in first call to get_config
* PTL general task group
* Always include PID in debug/verbose messages
* Add blocking/unblocking gotchas to runtime init bundle
* CausalState
* thread_data updates
- generic component_bundle_cache
* Improve handling of causal in category_region
* components updates
- backtrace_causal component
- backtrace::get_data member func
- decrease ignore_depth in backtrace::sample(int)
- handle "omnitrace_main" in backtrace::filter_and_patch(...)
- tweak internal thread state scope for pthread_mutex_gotcha wrappers
* simplify tracing get_instrumentation_bundles usage
* sampling updates
- include backtrace_causal component
- disable backtrace_metrics if using causal and not using perfetto
- disable backtrace and backtrace_timestamp when using causal
- post_process_causal
* causal updates
- more checks in blocking_gotcha and unblocking_gotcha start/stop
- miscellaneous overhaul of data
- experiment update
* Remove virtual speedup
* libomnitrace code_object
* causal-profiling test
* libomnitrace library.cpp updates
- handle causal profiling
- fini_bundle
* Disable causal profiling by default
* Updated causal code and example
- example: three execution variants: cpu + rng, cpu, rng
- example: three instrumentation variants: none, omni, coz
- fix blocking gotcha credit
- rework perform_experiment_impl
- get_eligible_address_ranges
- compute_eligible_lines
- support fixed lines/speedups/functions
- update selected_entry to support function mode
- fix causal::delay
- experiment updates
* omnitrace_progress / omnitrace_user_progress
- with accompanying omnitrace_annotated_progress / omnitrace_user_annotated_progress
* Update timemory submodule
* CausalMode
- mode indicates whether causal predictions should be at line-level or function-level
* code_object, config, runtime, sampling, thread_data
- code_object: address_range
- code_object: basic::line_info serialize(), name(), hash()
- config updates
- two signals for causal sampling
- thread_data init fixes
* pthread updates
- pthread_create_gotcha processes delays
- pthread_mutex_gotcha does not wrap pthread_join in causal mode
* backtrace_causal update
- dynamic delay period stats
* main wrapper uses basename of argv[0]
* update elfio submodule
* perf support (currently unused)
* Fix experiment JSON serialization
- static_vector.hpp (unused)
* causal executable + config options updates
- omnitrace-causal exe simplifies running multiple causal configs
- changed the causal config option names
* Support both throughput and latency points
* process-causal-json.py script
- will be used later for testing
* stable_vector
* Rework thread_data
* Improve omnitrace-causal exe
- better verbosity handling
- correct diagnosis of status for child process
- execvpe when only one iteration (debugging)
* Update timemory submodule
* exe --version
- omnitrace, omnitrace-avail, and omnitrace-sample all support --version on command-line
* OMNITRACE_INTERNAL_API + OMNITRACE_{LIKELY,UNLIKELY}
* omnitrace-causal cmake format
* omnitrace config update
- OMNITRACE_CAUSAL_FILE_CLOBBER
* custom exception
- wraps STL exception and gets stacktrace during construction
* exit_gotcha supports _Exit
* use global construct_on_init + max threads
- add some safety when exceeding max # of threads
* update code_object binary filter
- exclude dyninst and tbbmalloc library
* containers: c_array, static_vector, stable_vector
- moved utility::c_array to container::c_array
- created static_vector: std::vector bound to std::array
- created stable_vector: vector with stable references
* grow thread_data when new thread created
* causal updates
- data: improve compute_eligible_lines to ignore lambdas
- data: use new thread_data
- delay: use new thread_data
- experiment: properly support latency points
- experiment: support file clobber
- experiment: ensure non-zero experiment time
- progress_point: use new thread_data
- backtrace_causal: use new thread_data
* Update causal-profiling tests
* fix omnitrace-causal backslash escaping
* process-causal-json script
* restructure causal implementation
- update verbose messages for omnitrace-causal diagnose_status
- migrated causal implementation in sampling.cpp to causal/sampling.cpp
- OMNITRACE_USE_CAUSAL does not require OMNITRACE_USE_SAMPLING
- added Mode::Causal
- causal sampling uses same signals as regular sampling
- moved tracing::thread_init to implementation file
- combined tracing::thread_init and tracing::thread_init_sampling
- added causal/components folder
- pthread_create_gotcha::wrapper_config
- omnitrace_preload checks OMNITRACE_USE_CAUSAL
- updates mode accordingly
* update timemory submodule
* update timemory submodule
* causal example updates
- causal for lulesh
* perf code + utility - helpers
- relocated causal perf code
- placement new when generating unique ptr trait for potentially allocating during sampling
- additions to utility header
- removed previously added helpers.hpp
* update timemory submodule
* Default env variables for omnitrace-causal
- activate OMNITRACE_USE_KOKKOSP, etc.
* update stable_vector and static_vector
- static vector can use atomic for size tracking for thread-safe situations
* update causal example header
- CAUSAL_PROGRESS_NAMED
- use CAUSAL_ prefix for some macros
* Tweak lulesh example
- use CAUSAL_PROGRESS instead of CAUSAL_BEGIN and CAUSAL_END
* omnitrace-sample support for causal mode
- set OMNITRACE_USE_SAMPLING to off when OMNITRACE_MODE=causal
* refactor and cleanup code_object
- scope filter
- fixes to address_range
* overhaul causal data + causal config options
- full support for function and line mode
- support static vector of instruction pointers
- improve line info mapping resolution
- remove thread-locality from miscellaneous functions where unnecessary
- causal options for {binary,source,function,fileline} exclusion
* causal experiment, sampling, and backtrace updates
- is_selected + unwind address array
- experiment warning about progress points
- increased buffer size for backtrace_causal sampler
- backtrace_causal only stores IP addresses instead of full unwind info
* category_region updates
- minor refactor
- local_category_region::mark
* Update causal tests
* Bump version to 1.8.0
* omnitrace-causal args + CLOBBER -> RESET
- renamed OMNITRACE_CAUSAL_FILE_CLOBBER to OMNITRACE_CAUSAL_FILE_RESET
- updated omnitrace-causal exe to support recently added configuration options
- other miscellaneous tweaks to data.cpp, experiment.cpp, and sampling.cpp
* Refactor causal and code_object
- code_object.hpp and code_object.cpp moved into binary folder
- causal components namespaced into omnitrace::causal::component
- moved sample_data out of backtrace_causal and into own file
- renamed backtrace_causal to causal::component::backtrace
* preload omnitrace_init + OMNITRACE_DEBUG_MARK
- env OMNITRACE_DEBUG_MARK
- fix omnitrace_init call when LD_PRELOAD-ing omnitrace
* Fix fileline support + line-info output names + experiment log
- line-info log files are prefixed with experiment name
- don't print experiment duration when E2E
- account for fileline scope in analysis
* KokkosP: OMNITRACE_KOKKOSP_NAME_LENGTH_MAX
- config option to limit the name of kokkos tool callbacks
- remove [kokkos] from KokkosP names
* Update causal example
- minor tweaks to decrease probability of overlapping regions in binary
* omnitrace-causal update
- prefix N / Ntot in environment printout
* Miscellaneous updates
- causal::finish_experimenting()
- OMNITRACE_CAUSAL_RANDOM_SEED
- KokkosP causal updates
- exclude some callbacks, make some callbacks unique, etc.
- address_range::operator+=(address_range)
- combine contiguous ranges in binary/analysis.cpp when file, func, line is same and address range is contiguous
- bfd_line_info reads inline info
- wait for perform_experiment_impl to complete
- causal::delay updates
- delay::process checks if experiment is active
- uses threading::get_id()
- experiment scales duration up for larger speedup experiments
- line info samples includes excluded lines
- sampler uses CLOCK_REALTIME
- blocking_gotcha updates
- is no longer fully static
- adds audit routine which sets the postblock value to zero if try/timed routine fails
- category::host was added to causal_throughput_categories_t
- pthread_create_gotcha sets new threads local parent delay
- was using internal value, now uses sequent value
* Causal improvements to KokkosP
* Updates to experiment time scaling
- use stats instead of just max
* binary/link_map.{hpp,cpp}
* update process-causal-json.py
* Folded fileline scope into source scope
* Update documentation
- Add documentation for causal profiling
- Replace 'Omnitrace' with 'OmniTrace' everywhere
* Update causal-helpers.cmake + omnitrace-testing.cmake
- split tests/CMakeLists.txt partially into omnitrace-testing.cmake
* omnitrace/causal.h
- OMNITRACE_CAUSAL_PROGRESS
- OMNITRACE_CAUSAL_PROGRESS_NAMED
- OMNITRACE_CAUSAL_BEGIN
- OMNITRACE_CAUSAL_END
* selected_entry + remove default filters for lambdas and operator()
- selected entry stores range and binary load address
* update process-causal-json.py
* format examples/lulesh/CMakeLists.txt
* causal-helpers find_package(Threads)
* OMNITRACE_KOKKOSP_KERNEL_LOGGER
- was OMNITRACE_KOKKOS_KERNEL_LOGGER
* quiet find of coz-profiler
* Fix rocm_smi exception handling
* Update timemory submodule (binutils)
- fix binutils compile error on some systems
- bump binutils to v2.40
* Fix miscellaneous tests
* OMNITRACE_KOKKOSP_PREFIX
* revert rocm_smi handling
* ElfUtils updates
- default to download version 0.188
- add -Wno-error=null-dereference due to GCC 12 compiler error
* Update causal example
* Remove OMNITRACE_VERBOSE from global workflow envs
* Reliable causal test
* disable compilation of causal perf files
* Remove set_current_selection with unwind stack
* update timemory submodule
* fix for segfault on bionic
- locking in TLS dtor was causing segfault
* remove experiment::is_selected(unwind_stack_t)
* update default init of selected_entry
* Fix for when IP is not offset by load address
* Update CMakeLists.txt
* Miscellaneous updates
- OMNITRACE_WARNING_OR_CI_THROW
- OMNITRACE_REQUIRE
- OMNITRACE_PREFER
- fixed issues with no ASLR
- added load address variable and ipaddr() func to basic/bfd line info
- removed get_basic() from dwarf_line_info
- TIMEMORY_PREFER -> OMNITRACE_PREFER
- removed previously added binary_address and range variables from selected_entry
* Removed superfluous CausalState
* Additional causal tests (lulesh + kokkos)
* filter, prefer, analysis ASLR handling
- removed default filter on cold functions
- fixed OMNITRACE_PREFER
- fixed analysis ASLR handling
* Tweak line-info output
* Removed some superfluous code
- causal/delay
- causal/selected_entry
* Exclude main.cold in function mode
* Update validate-perfetto-proto.py
- account for occasional http errors
* Add sampling test disabling tmp files
* argparser for process-causal-json
- support validation
- support filtering
* Avoid pthread_{lock,unlock} in sampling offload
- use homemade atomic_mutex/atomic_lock since contention will be low and using pthread tools might trigger our wrappers
* Rename process-causal-json.py
- validate-causal-json.py
* rework omnitrace_add_causal_test
- capable of performing validation
- added validation tests
* Fix kokkosp_begin_deep_copy + causal
* Tweak address range in bfd_line_info::read_pc
* Tweak analysis and data IP handling
- look for gaps
* Disable scaling experiment time by speedup
* Revert change in max threads during CI
* binary updates
- significant overhaul of binary analysis implementation
- removed "basic_line_info" and "bfd_line_info" in lieu of "symbol" class
- symbol class has basic BFD info + vector of inlines + vector of dwarf info
* Updated causal to use new binary analysis
- Fix symbol.cpp includes
* Updated formatting target
- include *.cmake files
* Updated causal tests
- causal tests should be stable now
* Update timemory and dyninst submodules
- TPLs are stripped + built w/o debug info
* Increase tolerance for causal validation speedups
- higher speedups have more variance (increased to +/- 5 from 3)
* Support causal output for MPI
- i.e. tag with MPI rank
* omnitrace-causal launcher argument
* improve experiment sampling output
* causal data updates
- call compute lines once
- fixed filtered cached binary info
- debugging info when experiment fails to start
* Tweaked causal validation tests
* dwarf_entry ranges
* CI updates
- increase max threads to 64
* Tweak causal E2E validation tests
- more threads
- shorter thread runtime
- more iterations
* Fix shadowed variable
* fix symbol read_bfd last PC calculation
* fix maybe-uninitialized warning
* omnitrace-causal launcher update
- only inject "omnitrace-causal --" once
- throw error if no matches found
* Update causal profiling docs for launcher
* fix address range boundaries
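
The commit above describes `static_vector` only as "std::vector bound to std::array". As a rough illustration of that idea, and not the omnitrace implementation, the sketch below shows a fixed-capacity container with a vector-like interface backed by `std::array`. The class name `static_vector_sketch` and all of its members are hypothetical, and it assumes the element type is default-constructible.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>

// Hypothetical sketch: vector-like interface over fixed std::array storage.
// No heap allocation ever happens, so it is usable where malloc must be avoided.
template <typename Tp, std::size_t N>
class static_vector_sketch
{
public:
    // Append a value; throws once the fixed capacity is exhausted.
    void push_back(Tp value)
    {
        if(m_size >= N) throw std::out_of_range{ "static_vector_sketch is full" };
        m_data[m_size++] = std::move(value);
    }

    // Forget all elements without touching the underlying storage.
    void clear() { m_size = 0; }

    std::size_t                  size() const { return m_size; }
    bool                         empty() const { return m_size == 0; }
    static constexpr std::size_t capacity() { return N; }

    Tp& operator[](std::size_t i)
    {
        assert(i < m_size);
        return m_data[i];
    }

    const Tp& operator[](std::size_t i) const
    {
        assert(i < m_size);
        return m_data[i];
    }

    // Iteration covers only the elements that were pushed, not the full array.
    Tp*       begin() { return m_data.data(); }
    Tp*       end() { return m_data.data() + m_size; }
    const Tp* begin() const { return m_data.data(); }
    const Tp* end() const { return m_data.data() + m_size; }

private:
    std::array<Tp, N> m_data{};
    std::size_t       m_size = 0;
};
```

A later entry in the same commit notes that the size counter can be atomic for thread-safe use; this sketch keeps a plain `std::size_t` and is therefore single-threaded only.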
[ROCm/rocprofiler-systems commit:
|
||
|
|
5a1cec92e8 |
Various optimizations (#192)
* CDash name prefix {{ repo_owner }}-{{ ref_name }}
- remove /merge from CI name
* disable using BFD when sampling_include_inlines is OFF
- this consumes a lot of memory
* Improve finalization of rocprofiler
* update timemory submodule
- disable OMPT thread begin/end callbacks
- support hierarchies in signal handlers
- update operation::pop_node debugging
- settings_update_type + setting_supported_data_types
- fixed parsing args in timemory_init
* Improve timemory build time
* Remove kokkosp restrictions for perfetto
* omnitrace exe signal handler update
- configure signal handlers before main to allow libomnitrace to override
* Backtrace and timemory submodule updates
- Use unwind::cache w/o inline info
- update timemory submodule
- unwind::cache updates
- filepath updates
- fix termination_signal_message
- fix vsettings::report_change
* Update dyninst submodule
- updates BinaryEdit::getResolvedLibraryPath
* update timemory submodule
- update CpuArch support
* Cleanup configure warnings
* Update examples cmake and workflows
- (Mostly) eliminate configuration warnings
* omnitrace exe updates
- pass environ to BPatch::processCreate
- avoid trailing ":" in DYNINST_REWRITER_PATHS
* Update dyninst submodule
- Add flags to DyninstOptimization.cmake
- Remove strtok from BinaryEdit::getResolvedLibraryPath
* examples/mpi CMakeLists.txt update
- STATUS message about missing MPI during CI, otherwise AUTHOR_WARNING
* Dev build and linker flags
- use -gsplit-dwarf when OMNITRACE_BUILD_DEVELOPER is ON
- disable when OMNITRACE_BUILD_NUMBER > 1
- OMNITRACE_BUILD_LINKER option
- add -fuse-ld=${OMNITRACE_BUILD_LINKER}
- omnitrace_add_cache_option function
* Update workflows to set OMNITRACE_BUILD_NUMBER
* Fix generator expressions for -fuse-ld
* Suppress some configuration warnings during CI
- helps to keep track of real warnings when they arise
* Update timemory and dyninst submodules with CMP0135
* Add -V flag to run-ci script
[ROCm/rocprofiler-systems commit:
|
||
|
|
473f452d39 |
Rework sampling and colorized logs (#140)
## Overview
This is a significant PR which has 3 very notable characteristics:
1. Omnitrace colorizes most of its logging
2. Completely reworked the sampling
- Samples now record the current instruction pointers instead of strings
- This _dramatically_ decreases the overhead of taking a sample
- The collection of metrics during a sample is split out into another component, enabling that data collection to be disabled -- which decreases the sampling overhead even further
- When both `OMNITRACE_SAMPLING_CPUTIME` and `OMNITRACE_SAMPLING_REALTIME` are ON:
- `OMNITRACE_SAMPLING_CPUTIME_FREQ` and `OMNITRACE_SAMPLING_REALTIME_FREQ` can be used to individually control the sampling frequency
- `OMNITRACE_SAMPLING_CPUTIME_DELAY` and `OMNITRACE_SAMPLING_REALTIME_DELAY` can be used to individually control the delay time before starting
- Now, omnitrace does not start a real-time sampler on the main thread unless `OMNITRACE_SAMPLING_REALTIME` is ON
- In the future, an `OMNITRACE_SAMPLING_TIDS` (and real-time, cpu-time variants) configuration variable(s) will allow you to select which threads will be sampled
3. Files produced by `omnitrace` exe -- `available-instr.txt`, `instrumented-instr.txt`, etc. -- no longer have the `-instr` suffix and are placed in the `instrumentation/` subfolder, i.e. `available-instr.txt` -> `instrumentation/available.txt`
- This helped de-clutter the output folder
Most of the other edits were reorganization (e.g. internal namespace changes), cleanup, and splitting up functionality.
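
To illustrate why recording instruction pointers instead of strings makes a sample cheaper, here is a minimal sketch of deferred symbolization: the sample stores only raw addresses in fixed storage, and names are looked up in a post-processing step. Every name here (`ip_sample`, `symbol_table`, `resolve`, `symbolize`) is hypothetical and is not taken from the omnitrace sources; the symbol table is assumed to map function start addresses to names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch, not the omnitrace implementation.
constexpr std::size_t max_depth = 8;

// A sample holds only raw addresses, so taking one never allocates
// memory or formats strings.
struct ip_sample
{
    std::array<std::uintptr_t, max_depth> ips{};
    std::size_t                           depth = 0;

    // Append an address; frames beyond the fixed capacity are dropped.
    void push(std::uintptr_t ip)
    {
        if(depth < max_depth) ips[depth++] = ip;
    }
};

// Assumed layout: key is the start address of a function, value is its name.
using symbol_table = std::map<std::uintptr_t, std::string>;

// Post-processing: find the function whose start address is the closest
// one at or below the given instruction pointer.
inline std::string
resolve(const symbol_table& symbols, std::uintptr_t ip)
{
    auto itr = symbols.upper_bound(ip);
    if(itr == symbols.begin()) return "<unknown>";
    return std::prev(itr)->second;
}

// Convert a whole sample to names once sampling has finished.
inline std::vector<std::string>
symbolize(const symbol_table& symbols, const ip_sample& sample)
{
    assert(sample.depth <= max_depth);
    std::vector<std::string> names;
    names.reserve(sample.depth);
    for(std::size_t i = 0; i < sample.depth; ++i)
        names.push_back(resolve(symbols, sample.ips[i]));
    return names;
}
```

The string work happens once per unique address after the run rather than inside the signal handler, which is the cost the overview says was removed from the sampling path.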
## Bug Fixes
There is a bug fix with respect to the HSA callbacks which disabled sampling on child threads when an HSA API call was made.
## Details
- created thread_info struct for mapping different thread IDs
- reorganized file structure significantly
- added categories.hpp, concepts.hpp
- moved around name trait definitions
- moved all omnitrace components into `omnitrace::component` namespace
- there was a lot of inconsistency b/t using `tim::component` in some places and `omnitrace::component`
- added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
- OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
- roctracer and critical_trace use same thread pool
- critical_trace functions do not lock anymore bc of thread-local TaskGroup
- added `component::local_category_region` to support using `component::category_region` without explicitly passing in name
- removed `component::omnitrace` (unused)
- migrated KokkosP and OMPT to use `component::local_category_region`
- removed `component::user_region` as a result
- migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
- removed `component::functors` as a result
- migrated some ppdefs
- `api::omnitrace` -> `project::omnitrace`
- `api::(...)` -> `category::(...)`
- improved recording the execution time of threads
- migrated this functionality out of pthread_create_gotcha and into thread_info
- moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
- split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
- sampling.cpp handles setup and post-processing that was formerly in backtrace
- updated logging to use colors
- `OMNITRACE_COLORIZED_LOG` config variable
- updated docs on JSON output from timemory
- instrumentation info in instrumentation subfolder
- added testing for KokkosP entries
- added testing for ompt entries
- add_critical_trace function defined in critical_trace.hpp
- disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
- add comp::page_rss to main bundle
- thread_data supports std::optional instead of std::unique_ptr
- thread_data supports tim::identity<T> to avoid unique_ptr or optional
- tracing::record_thread_start_time()
- tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
- removed anonymous namespace from omnitrace::utility
- sampling backtrace stores instruction pointers instead of strings
- component::category_region updates
- handle disabled thread state
- handle finalized state
- fewer debug messages
- invoke thread_init()
- invoke thread_init_sampling()
- handle push/pop count based on category
- push/pop count only modified when used
- component::cpu_freq
- components/ensure_storage.hpp
- reworked the pthread_create replacement function
- updated parallel-overhead example to report # of times locked
- OMNITRACE_MAX_UNWIND_DEPTH build option
- update timemory submodule
[ROCm/rocprofiler-systems commit:
|
||
|
|
2699095190 |
CMake updates/fixes + parallel-overhead updates (#16)
- OMNITRACE_INSTALL_EXAMPLES option
- Fix lulesh standalone HIP compilation
- Fix transpose standalone HIP compilation
- Tweaks to parallel-overhead
[ROCm/rocprofiler-systems commit:
|
||
|
|
d009fc24a6 |
Standalone build examples + testing workflow updates (#15)
* Update examples to support standalone builds
* Tweak to ubuntu-focal-external workflow
- disable PAPI
* ubuntu focal external workflow update
- GCC 11
- Test static libgcc + static libstdcxx + strip
- ubuntu-toolchain-r/test
* Improve build-release.sh
- command line args for lto, strip, perfetto-tools,
static-libgcc, static-libstdcxx, hidden-visibility,
max-threads, parallel
* Update VERSION to 1.0.1
* Fixes to LTO build
* Updates to ubuntu-focal-external workflow
* build-release.sh update
- enable static libstdcxx by default
* disable python + static libstdcxx
* ubuntu-focal-external updates
* build-release.sh disable static libstdcxx by default
* cmake-format
[ROCm/rocprofiler-systems commit:
|
||
|
|
8cc87ca6b8 |
MPI headers + mutex gotcha + roctracer + kokkosp (#11)
* MPI headers, mutex gotcha + roctracer + kokkosp
- relocate internal MPI headers
- pthread_barrier in parallel-overhead
- doc fixes to DYNINST options
- minor tweaks to dynamic_library
- dlopen libamdhip64.so
- scoped thread state in kokkos
- extended pthread_mutex_gotcha
* Fix for unused-but-set-variables
[ROCm/rocprofiler-systems commit:
|
||
|
|
0d5f0fb9cf |
Support for tracing mutex locking (#52)
* Parallel overhead example with locks
* Support tracing mutex locking + more
- support wrapping pthread_mutex_lock
- support wrapping pthread_mutex_unlock
- support wrapping pthread_mutex_trylock
- get_perfetto_combined_traces setting
- OMNITRACE_TRACE_THREAD_LOCKS option
- ThreadState
- critical trace includes queue id
- enabled/disabled settings in timemory
- fix OMNITRACE_TIMEMORY_COMPONENTS
- fix reading config
- fix setting categories
- applied ThreadState::Internal in various places
- utility::get_filled_array
- utility::get_reserved_vector
- utility::get_thread_index
- fork_gotcha messages about forks
- split out some pthread_gotcha functionality into pthread_create_gotcha
- handle queue id in roctracer callbacks
* Update timemory and PTL submodules
* Misc CMake updates
- Includes fix to omnitrace-static-lib{gcc,stdcxx}
* Misc cleanup to pthread_mutex_gotcha and backtrace
* Fix to duplicate field in module_function json
* Improvement to debug messages
* omnitrace-dl and common improvements
- tweak to delimit
- common::ignore message
- common::join quoting of strings
- omnitrace_set_env ignores if inited and active
- omnitrace_set_mpi ignores if inited and active
* nsync for transpose example
* Fix to thread_deleter<void> functor invoke
* Fix thread state and HIP stream enums
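
The commit above adds tracing around `pthread_mutex_lock`, `pthread_mutex_unlock`, and `pthread_mutex_trylock`. The sketch below shows the general idea of measuring time spent acquiring a lock, using a RAII guard around `std::mutex` rather than the function wrapping that omnitrace performs. The names `lock_stats` and `traced_lock` are hypothetical, and the statistics object is assumed to be protected by the same mutex it describes.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>

// Hypothetical sketch: omnitrace intercepts the pthread functions directly;
// this guard only illustrates timing the wait for a lock.
struct lock_stats
{
    std::uint64_t            acquisitions = 0;
    std::chrono::nanoseconds wait_time{ 0 };
};

class traced_lock
{
public:
    using clock_type = std::chrono::steady_clock;

    // Blocks until the mutex is held, then records how long the wait took.
    // The stats are updated while the lock is held, so sharing one
    // lock_stats per mutex is safe.
    traced_lock(std::mutex& mtx, lock_stats& stats)
    : m_mutex{ mtx }
    {
        auto start = clock_type::now();
        m_mutex.lock();
        auto waited = clock_type::now() - start;
        stats.wait_time +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(waited);
        ++stats.acquisitions;
    }

    // Releases the mutex when the guard leaves scope.
    ~traced_lock() { m_mutex.unlock(); }

    traced_lock(const traced_lock&) = delete;
    traced_lock& operator=(const traced_lock&) = delete;

private:
    std::mutex& m_mutex;
};
```

A monotonic clock is used so that wall-clock adjustments cannot produce negative wait times.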
[ROCm/rocprofiler-systems commit:
|
||
|
|
72d0a7d08a |
Code Coverage Support (#46)
* Code-coverage support
* Examples update
- code-coverage example
- tweak transpose and parallel-overhead
* Coverage output + testing
- config::get_setting value(...)
- REGULAR_EXPRESSION -> REGEX in cmake func args
- coverage.hpp header
- coverage JSON
- coverage tests
* cmake formatting
* Library instrumentation w/o main + more
- fixed library instrumentation w/o main
- use TIMEMORY_PROJECT_NAME in output messages
- removed '--driver' option from omnitrace exe
- support coverage in trace mode
- OMNITRACE_KOKKOS_KERNEL_LOGGER
- support multiple calls to omnitrace_set_env after init if already called
- support multiple calls to omnitrace_set_mpi after init if same args
- support multiple calls to omnitrace_init if same mode
- unique_ptr_t for thread_data which calls finalize when thread_data is destroyed
- tweaked openmp tests
- improved finalization
* Replace CI --output-on-failure with -V
* Fix to OMNITRACE_DL_INVOKE
* omnitrace-exe and testing updates
- omnitrace::omnitrace-timemory interface library
- support for configs in omnitrace exe
- print-{available,instrumented,...} opts no longer exit w/o --simulate
- all tests apply --print-instrumented functions
- tweaked coverage tests
- print-* options print instructions not address range
* Remove OMNITRACE_DEBUG_FINALIZE=ON from CI
* Python cmake tweaks
* Tweak test ordering
* Upload CI artifacts if fail or success
* CI Python tweaks
- Use OMNITRACE_PYTHON_PREFIX and OMNITRACE_PYTHON_ENVS
* CI ELFULTILS_DOWNLOAD_VERSION
* test tweaks
- labels and more coverage tests
* tweak to omnitrace --config handling
* Update module/function constraint handling + PP
- tweak pre-processor definition handling
- removed free-standing module_constraint
- remove free-standing routine_constraint
- remove module_name.find("omnitrace") module constraint
- fully handle the output path of omnitrace *-instr files
- get_use_code_coverage config option
- print-coverage option
- coverage_module_functions
* use github.job not github.name
* Re-enable HSA_ENABLE_INTERRUPT
- remove coverage address report
[ROCm/rocprofiler-systems commit:
|
||
|
|
28ade7fbb9 |
Update CI to test multiple python versions (#45)
* Update CI to test multiple python versions
* Ensure numpy is installed
* Handle lulesh with cmake < 3.16
* Fix typo
* Bump minimum CMake version to 3.16
- CMake 3.15 has issue with PTL object library
* Tweak CI test output
[ROCm/rocprofiler-systems commit:
|
||
|
|
f17ff12a66 |
Sampling support + testing + omnitrace namespace (#19)
* omnitrace namespace
* Kokkos + Lulesh example/tests
* Sampling support + more
- OMNITRACE_BUILD_TESTING option
- sampling support
- pthread_gotcha
- fixes to labels for mpi_gotcha, fork_gotcha, omnitrace_component
- tasking::block_signals, tasking::unblock_signals
- instrumentation mode option in omnitrace exe
- argument option groups in omnitrace exe
- categories in omnitrace settings
- remove TIMEMORY_ prefixed options
* Release workflow updates
* Updated settings printing
* Fixed defaults in README
* Tweak setting defaults in README
* CMake fixes
* cmake-format
* clang-format
* LULESH_USE_MPI OFF
* LULESH_USE_MPI fix
* timemory add_secondary fix
* timemory ambiguous internal namespace fix
* Update timemory submodule
* Handle output path/prefix in omnitrace
- updated timemory
- updated test environment
* sampling + papi fix
* Fix to sampling without PAPI
* Fix for using too many processors in CI
* formatting
* Updated CI
- minor cmake tweaks
- updated timemory submodule
* Updated CI
* Updated CI
* CI + timemory updates
- data race fixes
* CI updates + debug for sampling
* Sampling updates
- moved tasking::{block,unblock}_signals to sampling namespace
- improvements to sampling w.r.t. thread-locality
* Minimum OMNITRACE_THREAD_COUNT of 128
* Handle multiple dims in sampler data
* Configure libunwind support for timemory
* Improved safeguards for sampling
- updated CI
- lulesh runtime-instrument test tweak
* formatting
* CI updates + sampler updates + misc
- fixed stack-buffer-overflow in omnitrace (get_*file_line_info)
- test labels
- steady_clock instead of system_clock in sampler
- update dyninst submodule with upgradePlaceholder fix
- disable OMNITRACE_BUILD_TESTING by default
* Updated timemory submodule
- hidden visibility for timemory
- storage finalizers do not capture this
* Update timemory submodule
- component visibility updates
* Reworked header includes
- use <...> for timemory headers
- always include <library/defines.hpp>
* Rename some config options
* Update PTL submodule
* Update kokkos submodule
* Updated sampling
* Updated CI
* Reworked instrumentation exe
- lowered min-address-range threshold to 256
- extended whole function exclude
* CI fix + timemory submodule update
- TIMEMORY_VISIBLE on component base
- RelWithDebugInfo -> RelWithDebInfo
- Info output for parallel-overhead
* Sampling flags + transpose update + CI update
- disable critical trace for parallel-overhead in CI
- SA_RESTART only in sampler
- reworked transpose example to use fewer threads
* CI update
- removed ubuntu-focal-external-debug
- reduced data artifacts upload
* CI timeouts
- updated timemory submodule
- minor tweaks to omnitrace exe logging
* LICENSE updates (partial)
* CI Test stage timeout extension
* Docker and Packaging updates
* Miscellaneous fixes/tweaks
- gpu.hpp / gpu.cpp
- disable roctracer component if no devices
- re-enable InstrStackFrames by default
- disable sampling by default
- pthread_gotcha::m_enable_sampling is false by default
- timemory submodule update w/ sampler and pop(tid) updates
- fix minor bug in sampler logic
- CMake: OMNITRACE_USE_HIP option
- roctracer + timemory fix
* Replaced OMNITRACE_USE_ROCTRACER with OMNITRACE_USE_HIP where appropriate
* cmake format
* Sampler deadlock fixes
* Removed debug messages from sampler
* Fix for MPI detection + test tweaks + misc
* Sampler deadlock fixes + misc
- removed papi_tot_ins
- pthread_gotcha blocks signals globally until sampler is setup
- metadata specialization for sampling components
- OMNITRACE_INSTRUMENTATION_MODE -> OMNITRACE_MODE
- default sampling delay increased to 0.05 from 1.0e-6
- removed {block,unblock}_signals from critical_trace and ptl
- no longer necessary to use
- sampling delay minimum is 1.0e-3
- OMNITRACE_BUILD_HIDDEN_VISIBILITY
* omnitrace-avail + libunwind update + restructure
- restructured omnitrace components
- build custom omnitrace-avail executable
- updated libunwind to avoid malloc in get_unw_backtrace
* Fix remaining reorganization issues
- removed some duplicate code
- fixed some trait specializations after implicit instantiation
- formatting
* ensure_storage fix + avail improvements
- fix ensure_storage when component not avail
- suppress irrelevant info in omnitrace-avail
* Delay settings initialization
- slight tweak to tests w/ MPI
* Disable OpenMPI testing w/ ubuntu-bionic
- MPI testing is hanging because of a network interface issue on the system:
> [[20462,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> Module: OpenFabrics (openib)
> Host: fv-az19-371
> Another transport will be used instead, although this may result in
> lower performance.
> NOTE: You can disable this warning by setting the MCA parameter
> btl_base_warn_component_unused to 0.
[ROCm/rocprofiler-systems commit:
|
||
|
|
9d5ebf9c3b |
Rename hosttrace to omnitrace (#18)
[ROCm/rocprofiler-systems commit:
|
||
|
|
efb6d766af |
Reorganization and critical trace support (#17)
* Roctracer wall clock integration (#16)
* Integrates roctracer values into wall-clock
* Fixed scoping + timemory roctracer
* Fixed data race in roctracer
* Synchronized HIP API on main thread
- Cache hip activity callbacks and execute on main thread
- Minor updates to transpose
* Debugging + MPI + transpose updates
* PTL + HSA and timemory + kernel timing
- PTL usage fixed HSA + timemory issues because we can control the thread destruction
- Fixed laps counting in roctracer callbacks
* Ignore select HIP API types
- These API types are ignored because a bug appears to
cause the "end" callback to be labeled as "begin"
- hipDeviceEnablePeerAccess
- hipImportExternalMemory
- hipDestroyExternalMemory
* Tweaks to PTL config
* Timemory update + pid-prefix w/ mpi headers
- %pid%- prefix with mpi headers
- timemory submodule update
* CMake + critical trace + reorganize library source
- clang-tidy tweaks
- cmake function updates to use hosttrace_ prefix
- update gitignore
- cmake HOSTTRACE_MAX_THREADS option
- Formatting.cmake
- cleaned up MacroUtilities.cmake
- PTL submodule + usage
- tweak to Findroctracer.cmake
- MT transpose
- Updated PTL submodule
- Updated timemory submodule
- fix to hosttrace return value type if type not found
- reorganized library source code
- support for critical trace
* Remove bits/stdint-uintn.h headers
* Rename + config + depth + critical path
- rename hosttrace_timemory_data to instrumentation_bundles
- rename hosttrace_bundle_t to main_bundle_t
- rename bundle_t to instrumentation_bundle_t
- rework of configuration setup
- critical_trace write directly to file option
- tweaked depth calculation
- updated timemory submodule
- improved parallel support in roctracer callbacks
- working critical_trace
- perfetto device-critical-trace and host-critical-trace categories
- made transpose example parallel
- made parallel-overhead example a bit uneven
- relocated LTO activation
* Fixed duplicates in perfetto critical-trace
* reworked critical trace support
- substantial perf improvement (30-45 min -> 30 sec)
- changes to configuration (new and removed options)
* Removed "%pid%-" output prefix in mpi_gotcha
* Update timemory submodule
[ROCm/rocprofiler-systems commit:
|
||
|
|
2cc8005680 |
Release 0.0.2 (#14)
* Fixed Dyninst TBB symbolic links + bump to v0.0.2
* hosttrace exe and library updates + submodule updates
- Updated dyninst submodule with TBB build ORIGIN rpath
- Updated timemory submodule
- Dyninst build with shared libs
- Dockerfile for building packaging
- Disable hidden visibility in examples
- parallel-overhead max parallelism
- query_instr in hosttrace
- different file-line info format
- full module names
- minor fix to MPI support
- disable instrumentation stack frames by default
- disable trap instrumentation by default
- updated hosttrace output file dumps
- removed cstdlib option
- dyninst DebugParsing option
- improved instrument_module function
- fixed some MPI support
- tweaked some testing parameters
[ROCm/rocprofiler-systems commit:
|
||
|
|
f3e7a1664a |
cmake-format + miscellaneous tweaks (#13)
* cmake-format + miscellaneous tweaks
* Formatted cmake in examples and tests
* Updated linux-ci.yml artifacts naming
* Updated clang-format
* Fixed submodule branches
[ROCm/rocprofiler-systems commit:
|
||
|
|
60145cd5c4 |
GitHub CI (#11)
* Continuous integration
* linux-ci on
* fix for parallel-overhead
* Updated CMAKE_INSTALL_PREFIX
* Updated installed boost libraries
* Dyninst updates fixing TBB install
* timemory + dyninst submodule updates
- fixes some timemory package option handling
- fixes dyninst libiberty handling
* Update dyninst submodule with libiberty fix
* dyninst submodule update with TBB internal build fixes
* Updated linux-ci + tests + timemory + dyninst
- updated timemory submodule
- update dyninst submodule
- delay OnLoad implementation
- DYNINST_RT_API handling improvement
- CI for ubuntu-bionic in addition to focal
- CTest in CI
* Update TBB handling in CI
* Update dyninst with symLite fix
* Update dyninst submodule with ElfUtils-External fix
* Dyninst::ElfUtils fix
* Modified dyninst submodule TPL install
* Update dyninst submodule with improved interface libraries
* Fix to Dyninst::ElfUtils in dyninst submodule
* Updated CI build matrix + test install
* Updated CI
* CI stage updates
* Tweak
* Dyninst updates + RPATH for hosttrace exe
- hosttrace will rpath to dyninst
- Dyninst will statically link to boost by default
- Fixes to double init w/ MPI + runtime instrumentation
- minor cleanup to hosttrace
- hosttrace help exits with zero
- CI updates
* Dyninst + CI updates
* Removed ELFUTILS_BUILD_STATIC option in Dyninst
* Dyninst + visibility + roctracer + tests
- Dyninst submodule updates with dynamic pcontrol lib
- hosttrace visibility is hidden
- improved handling of DYNINST_API_RT in hosttrace
- roctracer::tear_down in finalize fixes issues with timemory
- throw error if cannot create perfetto output file
- roctracer_default_pool() safety guards
- resume calling roctracer_close_pool()
- fixed working dir of tests
- fixed env of tests (cmake and CI)
- simplified CI
* Removed stray CI if condition
* Disable hidden visibility for now
* ifdef for roctracer::tear_down + reenable hidden
* Better CI environment handling + dyninst packaging
* Fix for dyninst-package CI
* Fixes for cmake 3.15 issues with aliasing imported lib
* Fix to ubuntu-focal-dyninst-package
* Dyninst updates for packaging
- fixes issues with hard-coded paths to libraries after relocation via package installer
* Restrict CI to main and develop branches
[ROCm/rocprofiler-systems commit:
|
||
|
|
244b308cb5 |
Integrated perfetto + roctracer (#5)
- hosttrace library automatically collects and merges timestamps for HIP API calls and kernels with the host-side instrumentation
- mostly eliminates the need for using external rocprof
- added thread_instruction_count in perfetto output
- increased hosttrace min_loop_address_range to 512
- disabled instrumenting functions with dynamic callsites by default
- miscellaneous cmake updates
* roctracer support
- fully integrated perfetto + roctracer outputs
- thread_instruction_count in perfetto
- increased min_loop_address_range to 512
- disabled instrumenting functions with dynamic callsites by default
- updated timemory submodule
* hosttrace_launch_compiler
- support for using an alternative compiler as needed via launch compiler
- elfio added as submodule (not currently used)
- miscellaneous cmake updates
* README update + host/device categories + misc
- timemory fix for TIMEMORY_ROCTRACER_ENABLED
- transpose fix
* papi_tuple_t -> papi_tot_ins
- minor fix to Findroctracer.cmake
[ROCm/rocprofiler-systems commit:
|
||
|
|
6825578603 |
Improved analysis of functions to instrument + MPI support + timemory support (#2)
* various tweaks
* build updates + cleanup + overlap guard + min addr range
* Library source reorg + miscellaneous tweaks
* Removed unnecessary fwd decls
* Print address range in --print-X pair mode
- hosttrace modifications
- disable instrumenting functions with overlapping sections or multiple entry points by default (control via --allow-overlapping option)
- disable instrumenting functions whose address range < 512 bytes unless a loop is present by default (control via --min-address-range option)
- disable instrumenting functions w/ loops whose address range < 64 bytes (control via --min-loop-address-range)
- Support for wrapping MPI function calls even in binary rewrite mode
- e.g. use gotcha to wrap MPI functions with hosttrace_push_trace and hosttrace_pop_trace
- New timemory only mode --> HOSTTRACE_USE_TIMEMORY=ON
- New timemory + perfetto mode --> HOSTTRACE_USE_PERFETTO=ON + HOSTTRACE_USE_TIMEMORY=ON
- Full support for all timemory components
- parallel-overhead example for measuring the overhead in a MT-parallelized application with very small instrumentation functions
- improvements to output directories for hosttrace exe
- improvements to output directories for hosttrace library
- new hosttrace options
- --print-instrumented <type> prints out the instrumented entities and exits
- --print-available <type> prints out the available instrumentation entities and exits
- --print-overlapping <type> prints out the overlapping entities and exits
- NOTE: <type> above refers to the information printed out, e.g. module name vs. function name vs. module and function name, etc.
[ROCm/rocprofiler-systems commit:
|