develop
128 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
318d13870f |
[rocprofiler-systems] Update logging to use spdlog library (#2428)
## Motivation
- Structured logging with proper log levels (TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL)
- Better performance through compile-time formatting
- Consistent formatting using the fmt library
- Runtime log level control via arguments and environment variables
- Easier maintenance and debugging
## Technical Details
- Added spdlog as a submodule and integrated it into the CMake build system
- Created a new `rocprofiler-systems-logger` library wrapping spdlog functionality
- Replaced custom logging macros (`ROCPROFSYS_VERBOSE`, `ROCPROFSYS_DEBUG`, `ROCPROFSYS_FATAL`, `ROCPROFSYS_REQUIRE`, `ROCPROFSYS_CI_THROW`, etc.) with spdlog equivalents (`LOG_DEBUG`, `LOG_WARNING`, `LOG_CRITICAL`, etc.)
- Implemented log level control through command-line arguments and environment variables
- Converted assertion macros to proper error handling with exceptions and `std::abort()` |
||
|
|
c35a7dd8cb |
[rocprofiler-systems] Update timemory submodule (#2440)
- Fixes SWDEV-559349
- Fix build failure caused by the correct libunwind not being found in some environments.
- Updated the `timemory` submodule to commit `24407d37ab85c46ba6c18fba9498320f825ee4e4`. |
||
|
|
b4e04b07ed |
test: add unit tests for common utilities from PR #1249 (#2237)
* test: add unit tests for common utilities from PR #1249
* incorporate review comments specific to tests formatting
* use filesystem API instead of std::system for safer cleanup
* Add ghc/filesystem submodule v1.5.14 for portable C++17 filesystem support
* fix: add cmake/GhcFilesystem.cmake for CI submodule auto-checkout
* incorporate review comment
* incorporate review comment |
||
|
|
09a9f9e31d | [rocprofiler-systems] Improve rocpd writing speed (#2061) | ||
|
|
a5d554b85a |
[rocprofiler-systems] Implement GTest/GMock integration for unit testing (#1777)
* googletest project set up
---------
Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>
Co-authored-by: Milan Radosavljevic <milan.radosavljevic@amd.com> |
||
|
|
d77b245730 |
[Rocprofiler-systems] : Refactor papi enumeration to fix a hang on Intel systems (#1672)
* Refactor papi enumeration to fix a hang on Intel systems
- Add an exclude argument to available_events_info() for perf_event_uncore, which caused a hang-like case on Intel systems with a large number of uncore events.
- Enumerate papi available events only when papi events are specified by users inside early initialization logic
- Move papi available event query for the ROCPROFSYS_SAMPLING_OVERFLOW_EVENT config setting to the avail component, to move the heavy logic outside initialization.
- Make the category option for rocprof-sys-avail -H -c case insensitive
- Provide a new option to query available overflow events that can be specified for ROCPROFSYS_SAMPLING_OVERFLOW_EVENT using the new command option rocprof-sys-avail -H -c overflow
* Update projects/rocprofiler-systems/source/bin/rocprof-sys-avail/common.cpp
Co-authored-by: Milan Radosavljevic <milan.radosavljevic@amd.com>
* Update timemory submodule pointer
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Fix errors on compile
* Change 1: Optimization for the category matching lambda
Optimization changes.
* Modify the rocprof-sys-avail -c option for overflow
Overflow should not be displayed as a device in rocprof-sys-avail -H -c CPU.
Users can instead run a regex on the summary, where overflow is appended to the description:
rocprof-sys-avail -H -c CPU -d -r overflow
* Revert change to column width
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Milan Radosavljevic <milan.radosavljevic@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com> |
||
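The commit above makes the `-c` category option of `rocprof-sys-avail` case insensitive. A minimal sketch of such a match, assuming plain ASCII case folding, is shown below. `iequals` is a hypothetical helper; the project's actual category matching lambda is not reproduced here and may work differently (for example, through regex matching).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: compares a user-supplied category (e.g. "cpu") with a
// known category name (e.g. "CPU"), ignoring ASCII case.
bool iequals(const std::string& lhs, const std::string& rhs)
{
    return lhs.size() == rhs.size() &&
           std::equal(lhs.begin(), lhs.end(), rhs.begin(), [](char a, char b) {
               // Cast to unsigned char: std::tolower is undefined for negative values.
               return std::tolower(static_cast<unsigned char>(a)) ==
                      std::tolower(static_cast<unsigned char>(b));
           });
}
```

Checking the sizes first lets `std::equal` safely use the three-iterator overload.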
|
|
e22a8e865e |
Update Timemory submodule (#1539)
- Fixes clang build failure
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Aleksandar Janicijevic <Aleksandar.Janicijevic@amd.com> |
||
|
|
28c2728b6b |
Update Dyninst module (#1540)
- Fix nullptr check
------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Aleksandar Janicijevic <Aleksandar.Janicijevic@amd.com> |
||
|
|
1eb4bd26a6 | Bump rocprofiler-systems/external/papi to papi-7-2-0b2-t. (#785) | ||
|
|
5a767cf272 |
Remove an undefined submodule from the GOTCHA submodule (#285)
A recursive submodule update, `git submodule update --recursive --init`,
would fail due to an improperly defined and empty submodule in GOTCHA.
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
d4da72bf2d |
Update gotcha submodule from timemory (#277)
* Update gotcha submodule from timemory
* Fix build failure and add copilot suggestions
* Fix formatting errors
[ROCm/rocprofiler-systems commit:
|
||
|
|
45605cf645 |
SWDEV-525474 - Fixed CMake 4 configuration error (#244)
Update Timemory / GOTCHA modules
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
9d94e11e2b |
Update dyninst to v13 (#190)
Update Dyninst submodule
Refactoring of build scripts to build TBB, Boost, ElfUtils, and LibIberty, since Dyninst build scripts no longer do.
Workflows are now building Dyninst and its dependencies.
---------
Co-authored-by: marantic-amd <marantic@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
d629a890a1 |
Fix RPM generation on RHEL 10 (#162)
* Set version for RPM_PACKAGE_OBSOLETES
* Not setting a version of omnitrace which rocprofiler-systems obsoletes will result in a warning
* Patch RPATH in libunwind files
* Update Timemory pointer
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
a2d567a467 |
Update dyninst submodule (#153)
Fix GCC 13 build failure caused by a missing include of <array>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
4695973cb8 |
Timemory and Dyninst were updated to use BinUtil v2.42 (#146)
Timemory and Dyninst were updated to use BinUtil v2.42 by default
Update Dyninst submodule
Update Timemory submodule
[ROCm/rocprofiler-systems commit:
|
||
|
|
d148e94161 |
Fix hang in config file generation (#101)
- Updated Timemory module.
- Fixes a crash when running rocprof-sys-avail -G without explicitly providing -F <format>. The default value of "txt" was not being used.
- Define "choices" before "default" when defining the "--config-format" argument in the parser.
[ROCm/rocprofiler-systems commit:
|
||
|
|
aab094f1db |
Added libamd_comgr.so to internal modules and fix argument parsing module in Timemory (#96)
Updates Timemory submodule
[ROCm/rocprofiler-systems commit:
|
||
|
|
78168bca8d |
Allow ElfUtils_CONFIG_OPTIONS to provide additional configuration options (#58)
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
0c579dcc43 |
Changed libdir for external libraries built with autotools (#24)
[ROCm/rocprofiler-systems commit:
|
||
|
|
d8c98d2d4d |
OMPT Target Offload Support (#17)
- Porting from https://github.com/ROCm/omnitrace/pull/411
- Improve OMPT support
- Add OpenMP target example to testing
- Update Timemory submodule to use ROCm/Timemory rather than NERSC/Timemory
- Update `actions/upload-artifacts` to v4
- Standardize the `cmake_minimum_required` to 3.18.4 across workflows, project, and examples
- Updated Ubuntu 20.04 workflows
[ROCm/rocprofiler-systems commit:
|
||
|
|
484dba05ce |
Correct perfetto submodule pointer
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
218f8bcbea |
Update Perfetto and fix tests (#378)
Fix for "SWDEV-479652" - Perfetto-based tests are failing.
Updated version of perfetto submodule to v46.0.
Modified Omnitrace code that uses Perfetto, so it can compile.
Modified the testing code, so it can run the version of trace_processor_shell provided (v46.0).
---------
Signed-off-by: Aleksandar Janicijevic <Aleksandar.Janicijevic@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
a705483ebc |
Fix compilation with GCC 13 and Ubuntu 24.04 (#362)
* Modified submodules dyninst and timemory.
* Modified PAPI submodule to fix GCC 13 build for Ubuntu 24.04.
* Updated PAPI submodule URL.
* Fixed papi submodule URL.
* Using latest tag (papi-7-1-0-t) for papi submodule.
* Update submodule dyninst to new version
* Added a mirror to elfutils
[ROCm/rocprofiler-systems commit:
|
||
|
|
a416c7d817 |
Modified submodules dyninst and timemory. (#361)
[ROCm/rocprofiler-systems commit:
|
||
|
|
80dc03eb99 |
Add ElfUtils and BinUtils Download URL Overrides (#358)
* Add CMake CACHE Variable ElfUtils_DOWNLOAD_URL
Used to override the default URL to download ElfUtils from.
Useful for internal builds
Also, include a mirror to fallback to if the override URL fails.
* Update timemory submodule
Updating to include the BINUTIL_DOWNLOAD_URL override cmake
variable.
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
8ad58c5d28 |
Build omnitrace-rt library (#355)
* Build omnitrace-rt library
- Explicitly build dyninstAPI_RT as omnitrace-rt so that the SONAME in the ELF is omnitrace-rt instead of dyninstAPI_RT
- Create symbolic link lib/omnitrace/libdyninstAPI_RT.so which points to lib/libomnitrace-rt.so
- Simplify build tree location of libomnitrace-rt.so since it is ../lib from the bin directory even in the build tree
- Update dyninst submodule with minor tweaks to dyninstAPI_RT/CMakeLists.txt
* Update source/lib/omnitrace-rt/cmake/platform.cmake
* Use ftpmirror.gnu.org instead of ftp.gnu.org
- in timemory and dyninst submodules
- minor .clang-tidy tweak
[ROCm/rocprofiler-systems commit:
|
||
|
|
65f46fde8c |
Update timemory submodule (#354)
- fix argparse::argument::required template deduction
[ROCm/rocprofiler-systems commit:
|
||
|
|
1dbb923682 |
Workflow, submodules, and thread info Updates (#352)
* Update CI workflows
- use node20 workflow packages
* Update tests/source/CMakeLists.txt
- Use OMNITRACE_TRACE and OMNITRACE_PROFILE instead of perfetto/timemory
* Update timemory submodule
- argparse: requires -> required
- parse callbacks
* Update thread_info.cpp
- fix causal::delay::get_local usage
* Update timemory submodule
* Update kokkos submodule
- release 3.7.02
* Revert opensuse.yml and ubuntu-bionic.yml to use node16 workflows
* Update docs.yml
[ROCm/rocprofiler-systems commit:
|
||
|
|
b5bdba12e4 |
Roctracer flush activity fix + perfetto.cfg (#317)
* Fix roctracer_flush_activity
- invoke roctracer_flush_activity() before disabling domains
* create comp::roctracer::flush()
- real issue was the global state when roctracer_flush_activity() was called
* formatting
* Update lib/omnitrace/library/components/roctracer.hpp
- provide definition of comp::roctracer::flush when OMNITRACE_USE_ROCTRACER is not defined
* omnitrace.cfg -> perfetto.cfg
- rename provided perfetto config file (omnitrace.cfg) to perfetto.cfg to avoid confusion
* Update lib/core
- gpu.hpp: defines for OMNITRACE_USE_{HIP,ROCTRACER,ROCPROFILER,ROCM_SMI}
- gpu.cpp
- include core/hip_runtime.hpp
- fix serialization of hipDeviceProp_t
- add hip_runtime.hpp
- ensure proper inclusion of hip_runtime.h
- add rccl.hpp
- ensure proper inclusion of rccl.h
* Update lib/omnitrace/library
- rcclp.cpp
- update includes for rccl
- roctracer.hpp
- update includes for hip_runtime
- components/comm_data.hpp
- update includes for rccl
- components/rcclp.hpp
- update includes for rccl
* Update bin/omnitrace-avail/avail.cpp
- update includes for hip_runtime
* Update examples/rccl/CMakeLists.txt
- fix find_package for rccl when CI enabled
* Update CMakeLists.txt
- set cmake policy CMP0135 to NEW for cmake >= 3.24
- Enable DOWNLOAD_EXTRACT_TIMESTAMP with ExternalProject_Add + URL download method
* Update timemory submodule
* Update pybind11 submodule
* Update pybind11 submodule
* Update lib/core/rccl.hpp
- include rccl.h only if OMNITRACE_USE_RCCL > 0
* Update lib/core/{gpu,hip_runtime}.hpp
* Update lib/core/gpu.cpp
- reintroduce some ppdefs
* Update lib/core/gpu.cpp
- fix ifdef on OMNITRACE_HIP_VERSION
* Update lib/core/gpu.cpp
- fix static assert for OMNITRACE_HIP_VERSION_MINOR when HIP version 4.x or older (unreliable minor versions)
* Update lib/core/gpu.cpp
- fix ifdef on OMNITRACE_HIP_VERSION
* Update lib/core/config.cpp
- disable OMNITRACE_PERFETTO_COMBINE_TRACES by default
* Update lib/core/perfetto.cpp
- if unable to open perfetto temp file, return the ReadTraceBlocking()
* Update lib/core/config.*
- flush tmpfile before closing
[ROCm/rocprofiler-systems commit:
|
||
|
|
82c9cdd9b5 |
Update GOTCHA submodule (via timemory submodule) (#295)
Update timemory submodule
- updated GOTCHA submodule
- support for wrapping the latest symbol version
- CI updates
- macos-ci in GitHub actions
- misc fixes for macOS
- fixed yaml-cpp install
[ROCm/rocprofiler-systems commit:
|
||
|
|
c6929f545d |
Perfetto annotation from timemory components (#289)
* Annotate perfetto with timemory component data
- support perfetto annotations via timemory component data, e.g. use PAPI component for exact HW counter annotations
* Tests for perfetto annotation via timemory data
* Update omnitrace-instrument
- remove --default-components argument as this overrides any components set in configuration file
- required by perfetto annotation via timemory data tests
* filter unavailable timemory components
- filter out unavailable timemory components before attempting to invoke the annotate operation on the bundle
* update annotate tests
- account for no PAPI support
* update lulesh-timemory test
- replace '-d wall_clock peak_rss' with '--env OMNITRACE_TIMEMORY_COMPONENTS="wall_clock peak_rss"'
* annotate tests update
- fix misnamed test
* annotate tests update
- restrict binary rewrite to run function to force instrumentation despite heuristics
* annotate tests update
- print {available,overlapping,excluded,instrumented} functions during binary rewrite
* annotate tests update
- add allow-overlapping flag
* Support PAPI with CAP_SYS_ADMIN
- do not disable PAPI if perf_event_paranoid > 2 but has CAP_SYS_ADMIN capability
[ROCm/rocprofiler-systems commit:
|
||
|
|
97011ea642 |
Fix thread index values (#287)
* Update PTL
- PTL submodule waits for threads to start before proceeding
* Initialize perfetto after init_bundle
- perfetto thread creation after pthread_create wrapped
* backtrace component update
- exclude gotcha call-tree
* callchain component update
- callchain::get sorts based on timestamp
- callchain::sample supports duplicate IPs (recursion)
* Bump version to 1.10.1
[ROCm/rocprofiler-systems commit:
|
||
|
|
b65f8e7605 |
CI timeout + line-info in releases (#279)
* Update perfetto args.gn.in
- remove enable_perfetto_tools_trace_to_text (unused)
* core timeout implementation
- requires OMNITRACE_CI=ON
- requires OMNITRACE_CI_TIMEOUT=<sec>
- adds pthread_self and std::this_thread::get_id to thread info
- pthread_create_gotcha stores native handles (pthread_self)
* Testing updates
- improve detection of segfaults/failures when PASS_REGEX exists
- add OMNITRACE_CI_TIMEOUT env variable to all tests
* Line-info in releases
- e.g. -g1 + more options to minimize size of debug info
* Fix typo in config exit action message
* OMNITRACE_UNLIKELY around debug/verbose messages
* format fixes
* Overflow tests + capability check
* transpose example update
- link to threads library
* roctracer/rocprofiler update
- in ROCm 5.5.0, cannot include rocprofiler.h and roctracer.h in same file due to conflicting enum defs
- Moved HSA tracing setup/shutdown to component::roctracer
* roctracer update
- fix definition of roctracer::setup when disabled
* Update fork example
- detach threads on main PID
- flush io outputs when printing info
* Update overflow tests
- pass regular expressions
- overflow on PERF_COUNT_SW_CPU_CLOCK event
* fork gotcha update
- use getpid() instead of getppid()
* update fork example
- wait on threads calling fork
* timeout update
- wait on timeout thread to launch before proceeding
[ROCm/rocprofiler-systems commit:
|
||
|
|
557adea45a |
Linux Perf Support + Causal Profiling Updates (#276)
* causal backtrace updates
- fix initial causal sampling period value
* causal delay updates
- tweak handling of sleep_for_overhead
* Fix experiment global scaling for prog pts
- results in drastically improved predictions
* pthread_mutex_gotcha updates
- disable all wrappers during causal profiling
* validate-causal-json.py updates
- support decimal stddev
- fix setting stddev from command-line
* causal perform_experiment_impl update
- handle start failing because finalizing
* deprecate causal::component::sample_rate
- appears to not help at all
* Rework sample info
* Increase causal unwind_depth
- use OMNITRACE_MAX_UNWIND_DEPTH
* validate-causal-json updates
- min experiments
- exclude reporting predictions with less than X experiments at a given speedup
- percent samples
- only print samples within X% of the peak (default: 95%)
* Update timemory submodule
- extensions to sampling for signals delivered via non-timer method
- e.g. via HW counter overflow
* dwarf_entry::operator< updates
- sort via file
* causal profiling docs updates
- info about backends
- info about installing/enabling perf
* config updates: causal backend
- CausalBackend enum
- OMNITRACE_CAUSAL_BACKEND: perf, timer, auto
- omnitrace-causal option: --backend
* debug update
- use spin_mutex instead of std::mutex
* address_range::contains update
- range from 0-100 contains range from 10-100 but was returning false because high was == 100 not < 100
* symbol::operator< update
- handle load address differences
* sampling updates (non-causal)
- update get_timer to get_trigger + dynamic_cast
* container::static_vector updates
- support construction from container::c_array
- update_size private member func for handling atomic m_size
* Move perf files
- moved library/causal/perf.{hpp,cpp} to library/perf.{hpp,cpp}
* causal example update
- created impl.hpp (forward decls)
- renamed {cpu,rng}_func_impl to {cpu,rng}_impl_func
- only create two threads which run N iterations instead of two threads each iteration
* Update timemory submodule
- updates to unwind::processed_entry
- updates to procfs::maps
* Updated causal documentation
- fixed line numbers changed by modifications to causal example
* omnitrace-causal exe updates
- set OMNITRACE_THREAD_POOL_SIZE to zero by default
* core/containers updates
- static_vector: provide data() member function
- c_array pop_front() and pop_back() member functions
* core: config and argparse updates + perf
- core/perf.{hpp,cpp}
- forward decl of enums
- config-related capabilities
- argparse: --sample-overflow
- renamed some config functions
- e.g. get_sampling_cpu_freq -> get_sampling_cputime_freq
- added config settings related to overflow sampling via perf
- added timer_sampling and overflow_sampling categories
* Update timemory submodule
- sampling allocator flushing
* binary updates
- lookup_ipaddr_entry
- use bfd_find_nearest_line instead of bfd_find_nearest_line_discriminator
- discriminators are not used
- explicit instantiations of inlined_symbol::serialize
* Bump VERSION to 1.10.0
* sampling and perf updates
- support overflow sampling via Linux Perf
- update perf namespace
- update perf::perf_event
- update record ctor: pointer instead of const ref
- update open member func: return optional string
- add m_batch_size member variable
- sampling updates
- support overflow sampling
- flush allocators
- increase buffer size from 1024 to 2048
- restructure post-processing in light of perf overflow supports
- improve offload memory usage: only load buffers for the thread
- load_offload_buffer(tid) uses thread-specific filepos
- component updates
- backtrace_metrics::operator-=
- backtrace_metrics::operator-
- backtrace::sample does not record for overflow signal
- callchain: perf overflow sample
* core updates
- component::sampling_percent does not report self + uses_percent_units
* causal updates
- tweak get_line_info
- overloads for set_current_selection (uint64_t, c_array, std::array)
- delay
- use sampling::pause/sampling::resume
- experiment
- experiment::sample derives from unwind::processed_entry
- experiment::samples is vector instead of set
- fixed samples
- overloads for is_selected (uint64_t, c_array, std::array)
- scaling factor defaults to 100 instead of 50
- serialize updates follow change to experiment::sample
- modify algorithm for increasing/decreasing experiment length
- sample_data
- use map<uintptr, uint64_t> instead of set<sample_data>
- get_samples returns vector<sample_data> instead of set<sample_data>
- sampling
- support overflow via Linux Perf
- update causal_offload_buffer
- flush sampling allocator
- backtrace
- overflow component
* libomnitrace-dl updates
- handle dl::InstrumentMode::PythonProfile
* testing updates (causal)
- causal line 155 -> causal line 100
- causal line 165 -> causal line 110
* formatting
* exit_gotcha updates
- exit_info for abort()
- message about non-zero exit code
* testing updates
- fail regex for causal tests
- validate-causal-json: >= min_experiments instead of > min_experiments
- handle OMNITRACE_DEBUG_SETTINGS in omnitrace_write_test_config
* causal sampling updates
- add new lines where appropriate
* causal data updates
- reorder diagnostic info when experiment fails to start
* binary updates
- symbol address range from address to address + symsize + 1
- add 1 based on debug info
* causal data updates
- sample_selection wait_ns defaults to 1,000 instead of 10,000
- sample_selection wait scaled by iteration number
- save_line_info_impl verbosity
- print latest_eligible_pc when experiment does not start
* causal sampling + component updates
- perf backend disables component::backtrace
- ensure get_sampling_(realtime|cputime|overflow)_signal do not malloc
* causal: remove period stats
* validate-causal-json update
- fix --help
* causal data updates
- improve eligible pc history reporting when experiment fails to start
* causal data updates
- fix compute_eligible_lines_impl
- eligible address ranges returning too many ranges
- occasionally, overwrite all *true* eligible address ranges
* causal data updates
- reduce scoped ranges to symbol ranges
- is_eligible_address() returns true contains (not just coarse)
- revert some sample_selection behavior
* binary address_multirange updates
- make coarse_range private
- fix operator+=(pair<coarse, uintptr_t>)
* causal example update
- fix nsync to default to once per iteration
* binary analysis updates
- tweak header file includes
* causal updates
- remove factoring in sleep_for_overhead
- invoke delay::process() even if experiment is not active
* causal data updates
- update latest_eligible_pc structure
* update omnitrace-install.py.in
- fix support for fedora
- /etc/os-release does not have ID_LIKE
- fallback to RHEL 8.7 if version not specified
* update omnitrace-install.py.in
- fix support for debian
- /etc/os-release does not have ID_LIKE
- version mapping
* Update documentation
- update docs on installation
* causal data and experiment updates
- data: reset_sample_selection
* causal set_current_selection debugging
- debug messages for failed e2e runs
* causal data and backtrace component updates
- data: set_current_selection returns the number of eligible addresses added
- backtrace: if cputime signal has selected zero IPs > 5x, then realtime signal starts contributing call-stacks
* core library updates
- move config::parse_numeric_range to utility namespace
- add core/utility.cpp
- support range:increment, e.g. 5-25:10 expands to '5 15 25' instead of '5 10 15 20 25'
* omnitrace-causal update
- end-to-end expands all speedups
- support range:increment in speedups
* causal backtrace updates
- remove select_ival (realtime signal always contributes when select_count == 0)
* containers: static_vector update
- explicit c_array constructor
- explicit std::array constructor
* causal data updates
- remove set_current_selection(uint64_t)
- remove set_current_selection(std::array)
- sample_selection increase default wait time
- report eligible PC candidates
- move reset_sample_selection to perform_experiment_impl
- decrease latest_eligible_pc array size
- set_current_selection does not guard for experiment::active
* core debug updates
- OMNITRACE_PRINT_COLOR macros
* causal data updates
- tweak to experiment never started message
* causal gotcha updates
- remove unused code
* critical trace updates
- remove unused code
* omnitrace-causal
- OMNITRACE_LAUNCHER
* causal data updates
- don't fail on end-to-end + omnitrace-causal
* causal backtrace updates
- reintroduce select_ival behavior
* causal data updates
- tweak verbose messages about number of PC candidates
* core mproc updates
- utilities for waiting on child PID and diagnosing status
- omnitrace::mproc::wait_pid
- omnitrace::mproc::diagnose_status
* omnitrace-run updates
- support --fork argument for executing via fork in current process + execvpe on child instead of execvpe in current process
* omnitrace-causal updates
- wait_pid and diagnose_status just call equivalent functions in omnitrace::mproc
* ubuntu-focal workflow update
- attempt to launch ubuntu-focal-codecov job with CAP_SYS_ADMIN and use perf backend
* tests reorg and updates
- remove binary-rewrite-sampling and runtime-instrument-sampling tests
- rename *-preload tests (which use omnitrace-sample exe) to *-sampling
- split tests/CMakeLists.txt into several tests/omnitrace-<category>-tests.cmake files
- tweak to causal-both-omni-func test
- add args: -n 2 -b timer
* update validate-causal-json.py
- better reasoning info for adjusting tolerance
- always apply tolerance adjustments in CI mode
* causal e2e tests update
- add "causal-e2e" label
- tweak params
- old: 80 12 432525 500000000
- new: 80 50 432525 100000000
- disable processor affinity for slow-func/line-100 tests
- artificially inflates some speedups with perf
* unblocking_gotcha updates
- overload operator() according to gotcha function index
* blocking_gotcha updates
- overload operator() according to gotcha function index
- fix bug where potentially post block functors (e.g. pthread_mutex_trylock) throw error if lock is not acquired.
* parse_numeric_range update
- support unordered_set
* config update
- OMNITRACE_DEBUG_{TIDS,PIDS} use parse_numeric_range
[ROCm/rocprofiler-systems commit:
|
||
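The `range:increment` syntax added to `parse_numeric_range` in the commit above (e.g. `5-25:10` expands to `5 15 25` instead of `5 10 15 20 25`) can be illustrated with a minimal sketch. `expand_numeric_range` is a hypothetical name, not the project's implementation in `core/utility.cpp`; it assumes comma-separated tokens of the form `N`, `A-B`, or `A-B:STEP` with non-negative integers.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: expands "5-25:10" into {5, 15, 25}.
// Assumes comma-separated tokens (N, A-B, or A-B:STEP) of non-negative integers.
std::vector<int64_t> expand_numeric_range(const std::string& spec)
{
    std::vector<int64_t> values;
    std::stringstream    tokens{ spec };
    std::string          token;
    while(std::getline(tokens, token, ','))
    {
        if(token.empty()) continue;
        int64_t step = 1;
        // Split off an optional ":STEP" suffix.
        auto colon = token.find(':');
        if(colon != std::string::npos)
        {
            step  = std::stoll(token.substr(colon + 1));
            token = token.substr(0, colon);
        }
        if(step <= 0) throw std::invalid_argument{ "increment must be positive" };
        // Split "A-B"; a lone "N" is treated as the range N-N.
        auto    dash = token.find('-');
        int64_t lo   = std::stoll(token.substr(0, dash));
        int64_t hi   = (dash == std::string::npos) ? lo : std::stoll(token.substr(dash + 1));
        for(int64_t v = lo; v <= hi; v += step)
            values.push_back(v);
    }
    return values;
}
```

For example, `expand_numeric_range("0-2,8-16:4")` yields `0 1 2 8 12 16`: the first token uses the default increment of 1, the second steps by 4.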
|
|
70c8d1229c |
rocprofiler_iterate_info workaround + omnitrace-avail update (#270)
* rocprofiler_iterate_info workaround + omnitrace-avail update
- provides workaround for rocprofiler_iterate_info behavior change in ROCm 5.4.0-3
- update timemory submodule with argparse tweaks
- updates hsa_rsrc_factory.{hpp,cpp}
- colorized log in omnitrace-avail
- Bump version to 1.9.2
* Fix empty_base inheritance
- timemory's component::empty_base inherits from concepts::component so direct inheritance was removed
* Fix OMNITRACE_HIP_VERSION_COMPAT_STRING
- defined as "" when OMNITRACE_HIP_VERSION_MAJOR==0
* new defines + extra info
- define OMNITRACE_LIBRARY_ARCH (via CMAKE_LIBRARY_ARCHITECTURE)
- define OMNITRACE_SYSTEM_NAME (via CMAKE_SYSTEM_NAME)
- define OMNITRACE_SYSTEM_PROCESSOR (via CMAKE_SYSTEM_PROCESSOR)
- define OMNITRACE_SYSTEM_VERSION (via OMNITRACE_SYSTEM_VERSION)
- define OMNITRACE_COMPILER_ID (via CMAKE_CXX_COMPILER_ID)
- define OMNITRACE_COMPILER_VERSION (via CMAKE_CXX_COMPILER_VERSION)
- include this info in metadata
- include subset of this info in --version for bin tools
- tweak to perfetto verbose messages
[ROCm/rocprofiler-systems commit:
|
||
|
|
a1213480e0 |
Roctracer perfetto flow fixes (#267)
* testing label updates
- automatically add "gpu", "roctracer", "rocm-smi", and "rocprofiler" test labels when appropriate
* Bump version to v1.9.1
* roctracer and config updates
- fix perfetto::Flow
- use roctracer correlation ID instead of critical trace correlation ID
- renamed ambiguous _cid, _parent_cid, _corr_id variables to _crit_cid, _parent_crit_cid, _roct_cid
- use atomic_{mutex,lock} instead of STL mutex/lock
- support for individual perfetto annotations for HIP API args
- OMNITRACE_PERFETTO_COMPACT_ROCTRACER_ANNOTATIONS option for controlling compact vs. individual perfetto annotations for HIP API args
* Update timemory submodule
- argparser updates
- help prints to std::cout by default now
- supports setting custom ostream
* cmake formatting
* config::get_setting_value updates
- config::get_setting_value returns std::optional instead of std::pair<bool, Tp>
[ROCm/rocprofiler-systems commit:
|
||
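The commit above changes `config::get_setting_value` to return `std::optional` instead of `std::pair<bool, Tp>`. The sketch below contrasts the two styles. The `settings_map` store and both accessor templates are hypothetical stand-ins; the project's real settings come from timemory and are typed differently.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical settings store: setting name -> raw string value.
using settings_map = std::map<std::string, std::string>;

// Old-style accessor (illustrative): the caller must check .first before
// trusting .second, which holds a default-constructed Tp on failure.
template <typename Tp>
std::pair<bool, Tp> get_setting_value_pair(const settings_map& settings,
                                           const std::string&  name)
{
    auto itr = settings.find(name);
    if(itr == settings.end()) return { false, Tp{} };
    std::istringstream iss{ itr->second };
    Tp                 value{};
    bool               ok = static_cast<bool>(iss >> value);
    return { ok, value };
}

// New-style accessor (illustrative): "no value" is encoded in the return type.
template <typename Tp>
std::optional<Tp> get_setting_value(const settings_map& settings, const std::string& name)
{
    auto itr = settings.find(name);
    if(itr == settings.end()) return std::nullopt;  // setting not present
    std::istringstream iss{ itr->second };
    Tp                 value{};
    if(!(iss >> value)) return std::nullopt;  // present but not parseable as Tp
    return value;
}
```

With `std::optional`, a call site can write `if(auto v = get_setting_value<int>(s, "name"))` and use `*v`, rather than unpacking `.first` and `.second` and risking use of an invalid value.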
|
|
a8c505c91d |
omnitrace-run executable - required for running binary writes (#257)
* omnitrace-run exe
- ensure LD_PRELOAD for libomnitrace-dl.so
- convert config options into command-line options
* Update timemory submodule
- updates to tsettings
- updates to argparser
* common environment update
- throw error if get_env<bool> has empty string
* config updates
- minor tweaks to categories of settings
* core lib update
- add argparse for common handling of argument parsers
* omnitrace-sample update
- fix handling of --trace-file (OMNITRACE_PERFETTO_FILE)
* omnitrace-run update
- updated to use omnitrace::argparse functions
* Tests for omnitrace-run
* argparse core update
- remove choices for --cpu-events and --gpu-events
* remove some debugging prints
* fix timemory include in argparse.cpp
* always provide --hsa-interrupt option
* Update source/lib/core/argparse.cpp
- fix pedantic warning
* Update testing
- remove testing args that may not be there in some builds
* roctracer/pthread_create fix
- disable roctracer_data when roctracer not enabled
* omnitrace-causal tweak
* omnitrace-instrument: module_function tweak
- allow DEFAULT_MODULE and LIBRARY_MODULE
* common environment update
- support get_env for enums
* core: config update
- Add "mode" category to OMNITRACE_MODE
* Update timemory submodule
- remove debug print statement
* omnitrace-sample tweak
- change var init
* omnitrace-run testing update
- use --help instead of -?
* core: common.hpp
- tweak header include style
* core: argparser update
- add_ld_preload func
- launcher and command member variables in parser_data
- support launcher
* omnitrace-run update
- clean up and reworked
* libomnitrace-dl updates
- require LD_PRELOAD with binary rewrite
- dl::InstrumentMode
- dl::get_instrumented()
- verify_instrumented_preloaded()
- omnitrace_set_instrumented(int)
- relocated omnitrace_main from main.c to dl.cpp
- omnitrace_set_env does not dlopen libomnitrace
- omnitrace_set_main(func_ptr) [internal API]
- OMNITRACE_HIDDEN_API -> OMNITRACE_INTERNAL_API
* Update testing to new LD_PRELOAD requirements
* omnitrace-instrument updates
- adhere to LD_PRELOAD requirements
- invoke omnitrace_set_instrumented
- binary rewrite does not instrument main
- binary rewrite does not instrument call to omnitrace_init
- runtime instr does not instrument main
- runtime instr does not instrument call to omnitrace_init
* Bump to v1.9.0
- LD_PRELOAD requirement necessitates minor version increment
* common: environment
- fix ambiguous get_env calls
* omnitrace-instrument update
- fix issue with temporaries
* omnitrace-instrument and libomnitrace-dl updates
- runtime instrumentation does not work if libomnitrace-dl is preloaded
* libomnitrace-dl and libpyomnitrace updates
- define dl::InstrumentMode in dl.hpp
- handle instrumentation via setprofile libpyomnitrace
- do not push trace in omnitrace_init
* omnitrace-instrument and libomnitrace-dl updates
- move header to dl subdirectory
- omnitrace::omnitrace-headers include omnitrace-dl folder
- use InstrumentMode in omnitrace-instrument
* Update workflows and scripts
- Use omnitrace-run on instrumented exes
* Update docs
- add omnitrace-run to examples of running binary rewritten exes
[ROCm/rocprofiler-systems commit:
|
||
|
|
61a050fd1d |
omnitrace -> omnitrace-instrument (#256)
* omnitrace-exe -> omnitrace-instrument
- Renamed omnitrace executable to omnitrace-instrument
- Provided dummy omnitrace exe which forwards onto omnitrace-instrument
- updated all docs to reflect the name change of the executable
- however, it is possible some were missed
* Update dyninst submodule
- correctly handle BOOST_LINK_STATIC in DyninstBoost.cmake
* Disable IPO for omnitrace-instrument
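The dummy `omnitrace` executable only has to pass its arguments to `omnitrace-instrument`. Below is a minimal sketch of such a forwarder. It assumes the target is found through `PATH`; the real executable's path resolution is not shown. `build_forward_argv` and `forward_to` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>
#include <vector>

// Hypothetical helper: copy argv, replacing argv[0] with the forwarding target.
inline std::vector<std::string> build_forward_argv(const std::string& target, int argc,
                                                   char** argv)
{
    assert(argc >= 1);
    std::vector<std::string> out{ target };
    for(int i = 1; i < argc; ++i)
        out.emplace_back(argv[i]);
    return out;
}

// Hypothetical forwarder: replace the current process image with `target`.
// Only returns (with -1, errno set) if execvp fails, e.g. target not on PATH.
inline int forward_to(const std::string& target, int argc, char** argv)
{
    auto               args = build_forward_argv(target, argc, argv);
    std::vector<char*> cargs;
    for(auto& a : args)
        cargs.push_back(&a[0]);
    cargs.push_back(nullptr);  // execvp expects a null-terminated argv
    execvp(cargs.front(), cargs.data());
    return -1;
}
```

With this helper, the forwarder's `main` reduces to `return forward_to("omnitrace-instrument", argc, argv);`.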
[ROCm/rocprofiler-systems commit:
446fd36a93 |
Add RedHat CI and release packaging (#251)
- additional miscellaneous tweaks to workflows and docker scripts, e.g. install perfetto python bindings
- improves the stability of MPI finalization
- reduces some debug messages within timemory when `OMNITRACE_DEBUG=ON`
- fixes an issue found on RHEL where libunwind uses a mutex and omnitrace was not treating this as an internal mutex call
- this may have been affecting the causal profiling slightly (tests seem a bit more stable now)
- fix data race in timemory
* Add RedHat CI and release packaging
- additional miscellaneous tweaks to workflows and docker scripts, e.g. install perfetto python bindings
* Fix URL for ROCm packages in redhat workflow
* Fix dnf --enable-repo for ROCm perl packages
* Dockerfile.rhel and redhat.yml updates
- Fix dnf repo for ROCm PERL packages
- Disable python in CI (interpreter segfaults)
- Exclude parallel-overhead-locks tests due to inclusion of internal locks
- This needs to be remedied in the future
* Exclude _dl_relocate_static_pie from instrumentation
* Testing updates
- OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF for parallel-overhead-locks
* Fix redhat workflow
* redhat.yml update
- remove if condition on config/build/test step
* Update timemory submodule
- tweaks to verbosity messages
* Set thread state before unw_step
- on RedHat, unw_step acquires a mutex
* Update timemory submodule
- verbosity changes
- gotcha uses spin_lock/spin_mutex
* Do not use gsplit-dwarf unless OMNITRACE_BUILD_NUMBER > 2
* Re-enable parallel-overhead-locks tests in redhat workflow
* Always disable timemory manager metadata auto output
* testing updates
- tweak parallel-overhead-locks-timemory to higher instruction count min
- OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF for parallel-overhead-locks-perfetto
* Update timemory submodule
- quiet realpath queries
* omnitrace exe updates
- detect text files
- improved bin/lib locating
* cmake format
* test-install.sh and redhat workflow updates
- handle testing when ls is a script
- re-enable python testing on redhat workflow
- invoke test-install.sh in redhat workflow
* Misc guards for finalization
* omnitrace-exe, testing updates
- test-install.sh: LS_EXEC -> LS_NAME
- handle /usr/bin/ls being a script in source/bin/tests
- improve locating the binary
* Fix mpi_gotcha compile error
* omnitrace-exe updates
- improve file locating
* formatting
* Misc fixes
- remove -static-libstdc++ for RHEL packaging (rocky-linux doesn't distribute static lib)
* omnitrace-exe paths
* Replace realpath with absolute
- using absolute path to symlink fixes issues with locating libdyninstAPI_RT at runtime
* omnitrace exe updates
- judicious use of realpath
* Update timemory submodule
- fix update main hash ids/aliases data race in merge
* bin tests update
- change working directory of omnitrace-exe-simulate-lib-basename
* omnitrace exe updates
- Update resolved exe/lib messaging
* bin tests update
- change working directory of omnitrace-exe-simulate-lib-basename
[ROCm/rocprofiler-systems commit:
49851b05ae |
Address and thread sanitizer fixes (#250)
* Address and thread sanitizer fixes
- Fix compilation with clang
- Tweak perfetto copy to build tree
- Added suppression files to scripts
- fix LD_PRELOAD support in omnitrace-causal and omnitrace-sample
- use spin_mutex and spin_lock from timemory instead of atomic_mutex and atomic_lock
- state uses atomic
- fix some memory leaks
- tweak testing
- mpi tests do not use preload
- increase timeout when using sanitizers
- add env LD_PRELOAD when using sanitizers
* Tweak perfetto build
* Update timemory submodule
* Update version to 1.8.1
* Update omnitrace-leak.supp
* Update timemory submodule
- fixed spin_mutex implementation
* Remove previously added addr_space->allowTraps(instr_traps)
- this appears to cause errors during binary rewrite
* causal testing updates
- relaxed causal validation on CI systems (to account for hyperthreading decreasing prediction)
- improved impact calculation
- other general improvements to validate-causal-json.py
* Improve fork handling for perfetto
- numerous updates changing perfetto:: to ::perfetto::
- added perfetto_fwd.hpp
* Updated fork example
- user API for validating that stopping/starting perfetto is valid
* Misc fixes to perfetto + fork support
- tweak regions in fork example
- handle disabling tmp files
- get rid of stop/start with perfetto before/after fork
- fixed sampling support during fork
- tweak env of fork test
* Fix find_package in build-tree
* Fix buildtree export
* Fix buildtree export
* Restructured ConfigInstall before adding examples
* Guard against creating tmp file in sampling when disabled
* Fix buildtree package
* formatting
* exit handlers on child processes
- quick exit to avoid perfetto cleanup
* Further tweaking of causal tests for reliability
- enable PROCESSOR_AFFINITY
- decrease to 5 iterations
* Further tweaking of causal tests for reliability
- disable PROCESSOR_AFFINITY for fast func e2e tests
- enabling affinity results in (valid) speedup predictions greater than zero
* Fixes to fork handling
- use pthread_atfork for redundancy if fork_gotcha fails
* cmake formatting
* Fix fork init settings + install components
- remove dl from PROJECT_BUILD_TARGETS
* Testing tweaks
- fix mpi-binary-rewrite-run regex when OMNITRACE_VERBOSE set > 1 in env
- increase causal e2e iterations to 8
* Fix "Test User API"
- test-find-package.sh included dl component
* Further tweaks to causal validation
- further considerations of variance
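The `pthread_atfork` fallback in "Fixes to fork handling" works because the C library runs the registered prepare, parent, and child handlers around every `fork()`, even when no `fork` wrapper was installed. Below is a minimal sketch of that registration. The handler bodies are counters standing in for real work. The `fork_fallback` namespace and `child_handler_count_after_fork` are hypothetical, not the omnitrace implementation.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

namespace fork_fallback
{
// Hypothetical counters standing in for real work (pausing sampling, resetting state, ...).
static std::atomic<int> prepare_calls{ 0 };
static std::atomic<int> parent_calls{ 0 };
static std::atomic<int> child_calls{ 0 };

inline void on_prepare() { ++prepare_calls; }  // parent, just before fork
inline void on_parent() { ++parent_calls; }    // parent, after fork returns
inline void on_child() { ++child_calls; }      // child, after fork returns

// Register the handlers once; returns 0 on success, like pthread_atfork.
inline int install()
{
    static const int rc = pthread_atfork(&on_prepare, &on_parent, &on_child);
    return rc;
}

// Demo helper: fork once; the child exits with its own child_calls value.
inline int child_handler_count_after_fork()
{
    pid_t pid = fork();
    if(pid == 0) _exit(child_calls.load());
    int status = 0;
    if(pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
}  // namespace fork_fallback
```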
[ROCm/rocprofiler-systems commit:
5eb895fc7d |
Causal profiling fixes (#241)
- corrections in the calculations for latency and throughput points in `validate-causal-json.py`
- `omnitrace-causal` LD_PRELOAD libpthread
- ensures omnitrace is always wrapping libpthread.so pthread symbols
- minimal experiment delay
- always sleep 10 milliseconds before starting experiments
- ensures ~10 samples are taken to determine the sampling rate
- fixes issue with deadlocks on condition variables
- overhaul of `causal::component::blocking_gotcha` and `causal::component::unblocking_gotcha` components
- these components enforce the processing/crediting of delays before/after a thread is suspended
- these components wrap functions `pthread_cond_wait`, `pthread_cond_signal`, `pthread_mutex_lock`, etc.
- Fully implemented correct handling of processing/crediting delays based on return values and arguments
- E.g. skip crediting delay if `pthread_mutex_trylock` fails to acquire the lock
- E.g. `kill`, `sigwait`, etc. check that they are only applied when the target PID matches the calling process's PID
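The return-value rule for `pthread_mutex_trylock` can be shown with a wrapper that runs the post-block step only when the lock was actually acquired. Below is a minimal sketch. `credited_delays` and `apply_postblock` are hypothetical stand-ins for omnitrace's delay bookkeeping. A real gotcha would call the original (wrapped) symbol rather than libc directly.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical stand-in for "credit the accumulated delay after acquiring the lock".
static std::atomic<int> credited_delays{ 0 };

inline void apply_postblock() { ++credited_delays; }

// Credit the delay only if trylock returned 0 (lock acquired).
// On EBUSY or any other error, nothing is credited and rc is passed through.
inline int wrapped_trylock(pthread_mutex_t* m)
{
    int rc = pthread_mutex_trylock(m);
    if(rc == 0) apply_postblock();
    return rc;
}
```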
## Condition Variable Deadlock Fix
In parallel applications using condition variables, causal profiling was found to be virtually guaranteed to deadlock. Although it was difficult to prove, evidence suggested that the work done while taking a sample was causing notifications to the condition variable to be lost. This was alleviated by the following updates:
- Separate out the part of `causal::backtrace::sample(int)` which calculates the sampling rate into small `sample_rate` component
- This component is essentially "always on" during sampling
- Added bundle of components invoked by `causal_sampler_t` during sampling
- Added two function calls to support disabling and re-enabling calls to `causal::backtrace::sample(int)` on a per-thread basis
- `causal::sampling::block_backtrace_samples()`
- `causal::sampling::unblock_backtrace_samples()`
- These two functions now surround the wrappee functions of `blocking_gotcha` and `unblocking_gotcha`
**This solution was experimentally validated with a Geant4 application which uses a tasking model that makes _numerous_ calls to wait on condition variables** (it was this application which exposed the bug)
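The per-thread suppression described above can be pictured as a thread-local flag that the sampling signal handler checks, set by an RAII guard around the wrapped blocking call. Below is a minimal sketch. The two function names mirror `block_backtrace_samples()` and `unblock_backtrace_samples()` from the description. The `sketch` namespace, the depth counter, the handler, and `call_wrappee` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <utility>

namespace sketch
{
// Depth counter (hypothetical) so nested blocking calls compose correctly.
thread_local int         backtrace_block_depth = 0;
static std::atomic<long> samples_taken{ 0 };

inline void block_backtrace_samples() { ++backtrace_block_depth; }
inline void unblock_backtrace_samples() { --backtrace_block_depth; }

// Sampling signal handler: skip the expensive work while this thread is blocked.
inline void on_sample_signal(int)
{
    if(backtrace_block_depth > 0) return;
    samples_taken.fetch_add(1, std::memory_order_relaxed);  // placeholder for a backtrace
}

// RAII guard placed around the call to the wrapped function.
struct scoped_sample_block
{
    scoped_sample_block() { block_backtrace_samples(); }
    ~scoped_sample_block() { unblock_backtrace_samples(); }
};

template <typename Func, typename... Args>
auto call_wrappee(Func&& f, Args&&... args)
{
    scoped_sample_block guard{};
    return f(std::forward<Args>(args)...);
}
}  // namespace sketch
```

In a gotcha, the body would be something like `return call_wrappee(real_pthread_cond_wait, cond, mutex);`, where `real_pthread_cond_wait` is a hypothetical pointer to the original symbol.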
* Fix validate-causal-json.py
- corrections in the calculations for latency and throughput points
* Update timemory submodule
- support for thread-local trait::runtime_enabled
* omnitrace-causal: LD_PRELOAD pthread library
- ensures omnitrace is always wrapping libpthread.so pthread symbols
* initial experiment delay
- always sleep 10 milliseconds before starting experiments
- ensures ~10 samples are taken to determine the sampling rate
* sample_rate component + block_backtrace_samples
- separate out the part of backtrace::sample which calculates the sampling rate into small sample_rate component
- add sample_rate component to causal_bundle_t used by causal_sampler_t
- causal::sampling::block_backtrace_samples() disables backtrace samples from being taken on a thread
- causal::sampling::unblock_backtrace_samples() enables backtrace samples from being taken on a thread
- the above two functions surround calls to functions wrapped by blocking_gotcha and unblocking_gotcha
- the work happening in backtrace::sample within these calls produced deadlocks for condition variables (notifications to condition variables were lost)
* blocking/unblocking gotcha updates
- overhaul of blocking_gotcha and unblocking_gotcha
- added fast_gotcha trait: replace function calls instead of wrapping
- when wrappees are called, backtrace samples are suppressed (thread-local)
- properly handle kill, sigwait, sigwaitinfo, sigtimedwait
- properly handle all instances of applying postblock based on return value
* Fix calculation of OMNITRACE_MAX_THREADS
* removed unnecessary checks in causal::delay
* Updated timemory with internal compiler error fix
[ROCm/rocprofiler-systems commit:
ecc794276c |
Misc fixes before v1.8.0 release (#239)
* Update timemory submodule for OMPT
- Updated OMPT support for OpenMP 5.2
* omnitrace exe cleanup
- fixed "omnitrace --" segfault
- added nullptr checks
* CMake updates
- moved omnitrace-interface-library definition up a directory
- general cleanup
- fixed branch/tag/ref for git submodule checkouts
* Improve shutdown of causal profiling after duration limit
* Fix dyninst minimum version number
* Removed debug print from binary::get_link_map
* Remove use of thread-pool in causal
* Relax causal testing when variance is high
* causal_gotcha utilities for blocking signals
* Tweak to causal example
* Install validate-causal-json as omnitrace-causal-print
* simplify address_multirange
* improve causal line saving
[ROCm/rocprofiler-systems commit:
1c6aaafe96 |
Handle fork in target application (#191)
* Always print PID in log messages
* omnitrace-dl updates
- omnitrace_preload does not call omnitrace_init or omnitrace_init_tooling
- omnitrace_preload will call omnitrace_set_mpi if OMNITRACE_USE_MPI
or OMNITRACE_USE_MPIP in the env is true, but will not call it otherwise,
because doing so would either override OMNITRACE_USE_PID (when true) or
disable mpip during initialization (when false), and the MPI
init can be caught later and override OMNITRACE_USE_PID
* config updates
- set_setting_value sets user update type
- remove volatile from get_settings_configured
- don't override settings::default_process_suffix
- don't kill process in omnitrace_exit_action
- set_state ignores updating state if >= State::Finalized
* Handle state > State::Finalized
* fork gotcha updates
- unsets LD_PRELOAD
- sets OMNITRACE_ROOT_PROCESS
- sets OMNITRACE_CHILD_PROCESS
* libomnitrace library.cpp updates
- basic_bundle for fini metrics
- handle finalization from child process
* sampling updates
- sampling::shutdown handles when child process
* Add example and test using fork
* Update run-ci script to support not submitting
* Tweak test envs
* Update build flags when codecov enabled
* remove unnecessary includes of sampling header
* Replace mpi copy/fini static lambda with free-funcs
* Update codecov job
* Fix OMPT segfaults after finalization
* Miscellaneous updates after rebase
* fixes for causal profiling
* revert some run-ci.sh changes
* Disable storing env in sampling::shutdown
* formatting fix
* Update timemory submodule
- fixed occasional synchronization issues with allocator offloading
- exclude protozero:: from internal samples
* improve root/child process detection
- avoid omnitrace_finalize in MPI when child process
- revert some testing tweaks
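The fork gotcha's environment handling (unset `LD_PRELOAD`, set `OMNITRACE_ROOT_PROCESS` and `OMNITRACE_CHILD_PROCESS`) can be sketched as a child-side routine. Below is a minimal sketch. The variable names come from the list above. Storing the parent PID in `OMNITRACE_ROOT_PROCESS` and `"true"` in `OMNITRACE_CHILD_PROCESS` are assumptions about their values. `mark_forked_child` and `is_child_process` are hypothetical names.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical child-side hook, run right after fork() returns 0.
// Assumption: ROOT_PROCESS holds the parent PID and CHILD_PROCESS is "true";
// the values omnitrace actually writes may differ.
inline void mark_forked_child(pid_t parent_pid)
{
    // Programs exec'd from this child will no longer inherit the preload.
    unsetenv("LD_PRELOAD");
    setenv("OMNITRACE_ROOT_PROCESS", std::to_string(parent_pid).c_str(), 1);
    setenv("OMNITRACE_CHILD_PROCESS", "true", 1);
}

// Hypothetical query, e.g. to skip root-only finalization in a child.
inline bool is_child_process()
{
    const char* v = std::getenv("OMNITRACE_CHILD_PROCESS");
    return v != nullptr && std::string{ v } == "true";
}
```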
[ROCm/rocprofiler-systems commit:
b2bedda138 |
restructure libomnitrace + tasking and omnitrace-causal updates (#237)
* restructured libomnitrace
- this is necessary to incorporate some of the binary analysis capabilities into omnitrace exe
- created libomnitrace-core (static)
- created libomnitrace-binary (static)
- created libomnitrace (static)
- omnitrace-avail links to libomnitrace.a
- omnitrace-critical-trace links to libomnitrace.a
- tweaked the testing
- reduced verbosity on some of MPI tests
- excluded trace-time-window from tests on Ubuntu 18.04
- reduced causal e2e iterations
- minor tweak to tasking
- manually create `PTL::UserTaskQueue` instance instead of relying on `PTL::ThreadPool` to create it
* Update formatting workflow
- source formatting uses ubuntu-22.04
- check-includes doesn't generate false positive for 'include "timemory.hpp"'
* omnitrace-causal --generate-configs
- fix config generation in omnitrace causal
- add test for omnitrace-causal + generating configs
* Fix omnitrace-object-library build
- accidentally included rocm sources in non-rocm builds
* Fix rocm compilation w/o rocprofiler
* update timemory submodule with mpi_get warning messages
* sampling offload file updates
- more verbose messages
- disable offload before stopping
* testing updates
- increase causal e2e iterations to 12
- increase lock_environment verbose to 2 (for sampling offload messages)
- fix return for omnitrace_add_validation_test
[ROCm/rocprofiler-systems commit:
7e63db9441 |
Global trace delay and duration (#235)
- The primary feature of this PR is the **addition of support for scoping the collection of tracing/profiling data into one or more time-based windows**
- Closes #222
- Closes #207
- Support for a real-clock time delay and/or a duration for tracing/profiling was added, *resembling the support for this feature during sampling and process-sampling*
- However, the above paradigm was enhanced for tracing
- Instead of one delay and/or one duration based on real time, ***tracing supports periodic and varying delays and durations and these delay+duration sets can be controlled with different clocks***
- At some point, this capability will be extended to sampling and process-sampling
- A secondary feature of this PR are the improvements to the handling of categories (by-product of the primary feature)
- For example, previously setting `OMNITRACE_ENABLE_CATEGORIES` to a specific set of categories only eliminated the disabled categories from the perfetto trace; now they are applied to timemory profiles too
- A new configuration variable `OMNITRACE_DISABLE_CATEGORIES` was added for when disabling only a handful of categories is easier
- There are quite a few miscellaneous modifications which pollute this PR a bit
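`OMNITRACE_DISABLE_CATEGORIES` is described as the inverse of `OMNITRACE_ENABLE_CATEGORIES`: start from every available category and remove the listed ones. Below is a minimal sketch of that set arithmetic. It assumes comma- or whitespace-separated lists. It also assumes that, when both options are set, disabled entries are removed from the enabled set. The function names and the category names in the test are illustrative, not omnitrace's.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

using category_set = std::set<std::string>;

// Split a list such as "pthread,mpi host" on commas and whitespace (assumed format).
inline category_set parse_categories(std::string value)
{
    for(auto& c : value)
        if(c == ',') c = ' ';
    std::istringstream iss{ value };
    category_set       out;
    for(std::string tok; iss >> tok;)
        out.insert(tok);
    return out;
}

// If ENABLE is empty, start from all available categories; otherwise keep only the
// listed (and available) ones. Assumption: DISABLE entries are removed in both cases.
inline category_set effective_categories(const category_set& available,
                                         const std::string&  enable_env,
                                         const std::string&  disable_env)
{
    category_set result = enable_env.empty() ? available : category_set{};
    for(const auto& c : parse_categories(enable_env))
        if(available.count(c) > 0) result.insert(c);
    for(const auto& c : parse_categories(disable_env))
        result.erase(c);
    return result;
}
```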
## Multiple Tracing Windows
As noted above, tracing now supports specifying multiple delays and durations _and_ with different clocks. Consider the configuration below with two entries in the format `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>`:
```console
OMNITRACE_TRACE_PERIODS = 0.5:1.0:2:realtime 10.0:5.0:3:cputime
```
The above configuration defines:
1. `0.5:1.0:2:realtime`
- A delay of 0.5 seconds (real-time)
- Followed by a data collection duration of 1 second (real-time)
- This delay + duration is repeated 2x
- Summary: tracing data is collected for 2 out of the first 3 seconds of the application's execution
2. `10.0:5.0:3:cputime`
- A delay of 10 seconds (process _CPU-time_)
- Followed by a data collection duration of 5 seconds (process _CPU-time_)
- This delay + duration is repeated 3x
- Summary: tracing data is collected for a total of 15 seconds of process CPU-time in the ensuing 45 seconds of CPU-time during the application's execution.
- Note: the elapsed CPU-time is the aggregate of the CPU-time consumed by all the threads in the process and should be scaled accordingly, e.g., 4 threads running constantly for 1 second of real-time is ~4 seconds of CPU time.
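The `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_TYPE>` format can be explored with a small parser. The parser also computes each entry's span, `REPEAT × (DELAY + DURATION)`, and its collected time, `REPEAT × DURATION`. Below is a minimal sketch. The defaults for omitted fields (zero duration, one repeat, `realtime`) are assumptions not specified above. `trace_period`, `parse_trace_periods`, `period_span`, and `period_collected` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One entry of OMNITRACE_TRACE_PERIODS, e.g. "0.5:1.0:2:realtime".
struct trace_period
{
    double      delay    = 0.0;         // seconds to wait before collecting
    double      duration = 0.0;         // assumed default when omitted
    int         repeat   = 1;           // assumed default when omitted
    std::string clock    = "realtime";  // assumed default when omitted
};

// Parse whitespace-separated entries, each with colon-delimited fields.
inline std::vector<trace_period> parse_trace_periods(const std::string& value)
{
    std::vector<trace_period> out;
    std::istringstream        entries{ value };
    for(std::string entry; entries >> entry;)
    {
        std::vector<std::string> fields;
        std::istringstream       iss{ entry };
        for(std::string f; std::getline(iss, f, ':');)
            fields.push_back(f);

        trace_period p{};
        if(fields.size() > 0) p.delay = std::stod(fields[0]);
        if(fields.size() > 1) p.duration = std::stod(fields[1]);
        if(fields.size() > 2) p.repeat = std::stoi(fields[2]);
        if(fields.size() > 3) p.clock = fields[3];
        out.push_back(p);
    }
    return out;
}

// Elapsed time on the entry's clock until its last window closes.
inline double period_span(const trace_period& p) { return p.repeat * (p.delay + p.duration); }

// Total collection time on the entry's clock.
inline double period_collected(const trace_period& p) { return p.repeat * p.duration; }
```

For the configuration above, this gives a 3 s span with 2 s collected for the first entry. It gives a 45 s span with 15 s collected for the second.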
## `omnitrace-sample` Changes
Formerly, the `--wait` and `--duration` command-line options only applied to the sampling delay and duration. The values of these options are now applied to the tracing delay and duration. To retain the ability to control sampling delay/duration without setting tracing delay/duration, or vice versa, the `--sampling-wait`, `--sampling-duration`, `--trace-wait`, and `--trace-duration` options were added. `omnitrace-sample` also has options for most of the new configuration options detailed below.
## New configuration options
| Option | Description |
| ------- | ----------- |
| `OMNITRACE_DISABLE_CATEGORIES` | Inverse of `OMNITRACE_ENABLE_CATEGORIES`: populates the list of all available categories and then removes the specified ones. |
| `OMNITRACE_TRACE_DELAY` | Single floating-point number specifying the time to wait before starting data collection. Analogous to `OMNITRACE_SAMPLING_DELAY` and `OMNITRACE_PROCESS_SAMPLING_DELAY`. |
| `OMNITRACE_TRACE_DURATION` | Single floating-point number specifying the data collection duration. Analogous to `OMNITRACE_SAMPLING_DURATION` and `OMNITRACE_PROCESS_SAMPLING_DURATION`. |
| `OMNITRACE_TRACE_PERIOD_CLOCK_ID` | Sets the default clock type for tracing delay/duration. Always applied to the two options above; can be overridden per entry in `OMNITRACE_TRACE_PERIODS`. Accepts `CLOCK_REALTIME`, `CLOCK_MONOTONIC`, `CLOCK_PROCESS_CPUTIME_ID`, `CLOCK_MONOTONIC_RAW`, `CLOCK_REALTIME_COARSE`, `CLOCK_MONOTONIC_COARSE`, `CLOCK_BOOTTIME`. See `man 2 clock_gettime` for details on the differences. |
| `OMNITRACE_TRACE_PERIODS` | More powerful version for specifying delay + duration. Supports formats: `<DELAY>`, `<DELAY>:<DURATION>`, `<DELAY>:<DURATION>:<REPEAT>`, and `<DELAY>:<DURATION>:<REPEAT>:<CLOCK_ID>`. |
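`OMNITRACE_TRACE_PERIOD_CLOCK_ID` accepts the `clock_gettime` clock names listed in the table. Below is a minimal sketch that maps those names to `clockid_t` and reads the chosen clock in seconds. The constants are Linux-specific. `clock_id_from_name` and `clock_seconds` are hypothetical helpers. omnitrace may also accept other spellings, such as the `realtime`/`cputime` shorthand used in `OMNITRACE_TRACE_PERIODS`.

```cpp
#include <cassert>
#include <ctime>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical lookup: map the accepted option values onto Linux clock IDs.
inline clockid_t clock_id_from_name(const std::string& name)
{
    static const std::map<std::string, clockid_t> ids = {
        { "CLOCK_REALTIME", CLOCK_REALTIME },
        { "CLOCK_MONOTONIC", CLOCK_MONOTONIC },
        { "CLOCK_PROCESS_CPUTIME_ID", CLOCK_PROCESS_CPUTIME_ID },
        { "CLOCK_MONOTONIC_RAW", CLOCK_MONOTONIC_RAW },
        { "CLOCK_REALTIME_COARSE", CLOCK_REALTIME_COARSE },
        { "CLOCK_MONOTONIC_COARSE", CLOCK_MONOTONIC_COARSE },
        { "CLOCK_BOOTTIME", CLOCK_BOOTTIME },
    };
    auto itr = ids.find(name);
    if(itr == ids.end()) throw std::invalid_argument("unknown clock: " + name);
    return itr->second;
}

// Current reading of `id` in seconds; the difference of two readings is elapsed time.
inline double clock_seconds(clockid_t id)
{
    timespec ts{};
    clock_gettime(id, &ts);
    return static_cast<double>(ts.tv_sec) + 1.0e-9 * static_cast<double>(ts.tv_nsec);
}
```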
## Miscellaneous Changes
- Expanded `critical_trace_categories_t` to include tracing data from MPI, pthread, HIP, HSA, RCCL, NUMA, and Python.
- Added categories `thread_wall_time` and `thread_cpu_time` (derived from sampling)
- Read DWARF info for breakpoints
- Relocated some source code
- Reason: necessary to make `libomnitrace` a bit more modular. Eventually, a large chunk will be separated into `libomnitrace-core`, `libomnitrace-binary`, etc. in order to facilitate re-usability
- Relocated some functionality from `runtime.cpp` to `config.cpp`
- Relocated code using rocm-smi library to query number of devices to `gpu.cpp` (where the code for using HIP to query number of devices is)
- Relocated code for perfetto config and perfetto session out of tracing namespace to reside with other perfetto code
- `OMNITRACE_COLORIZED_LOG` configuration option renamed to `OMNITRACE_MONOCHROME`
- Backwards compatibility via a deprecated option was not retained here since the logic changed (i.e. true in former means false in latter)
- Replaced `TIMEMORY_DEFAULT_OBJECT` macro with `OMNITRACE_DEFAULT_OBJECT` macro
- Updated some code in roctracer to use `component::category_region` instead of explicitly using `tracing::` functions
- Updated `backtrace_metrics` to better support controlling their presence in the traces/profiles via categories
- Added support for `--print` in `validate-timemory-json.py`
- Generic `OMNITRACE_ADD_VALIDATION_TEST` CMake function
## Git Log
* OMNITRACE_DEFAULT_OBJECT
- replace TIMEMORY_DEFAULT_OBJECT with OMNITRACE_DEFAULT_OBJECT
* trace-time-window example + tests
- adds cmake OMNITRACE_ADD_VALIDATION_TEST function for testing
- validate-timemory-json.py now supports printing (-p)
- update to OMNITRACE_STRIP_TARGET
* Update timemory submodule
- detailed backtrace print /proc/<PID>/maps
- operation::push_node verbosity change
- storage::insert_hierarchy use emplace + at instead of operator[]
- concepts::is_type_listing
- argparse updates for start/end group
- argparse color fixes
* perfetto updates
- Remove OMNITRACE_CUSTOM_DATA_SOURCE CMake option
- move tracing::get_perfetto_config and tracing::get_perfetto_session to perfetto.cpp
* config and runtime updates
- OMNITRACE_DISABLE_CATEGORIES option
- get_enabled_categories() + get_disabled_categories()
- config impl handles populating them
- OMNITRACE_TRACE_DELAY option
- OMNITRACE_TRACE_DURATION option
- OMNITRACE_TRACE_PERIODS option
- {get,set}_signal_handler
- removes config.cpp link dependency for omnitrace_finalize
- get_realtime_signal() + get_cputime_signal() + get_sampling_signals()
- moved from runtime.cpp to config.cpp
* utility::convert
- helper function for converting string to a type
* pthread_create_gotcha + thread_info updates
- thread_index_data::as_string()
- tweak printing info about new thread / exited thread
* binary updates
- get_binary_info has arg to disable dwarf parsing
- binary_info contains vector of breakpoint addresses
- binary_info::filename() function
- binary::get_linked_path
- binary::get_link_map has args for dlopen mode
- symbol::read_dwarf -> symbol::read_dwarf_entries
- symbol::read_dwarf_breakpoints
* library updates + categories impl
- implement config::set_signal_handler
- categories.cpp for handling trace delays
- implement trace delay/duration/periods
* concepts + debug + defines
- tuple_element in concepts
- removed runtime header from debug header
- OMNITRACE_DEFAULT_COPY_MOVE
* gpu + rocm_smi
- moved rsmi_num_monitor_devices call to gpu.cpp
- gpu::rsmi_device_count()
* roctracer updates
- roctracer_bundle_t -> roctracer_hip_bundle_t
- use category_region instead of explicit tracing push/pop calls
* sampling + backtrace_metrics
- rework backtrace_metrics to support categories
* tracing updates
- category stack counters (i.e. push vs. pop counter) for profiling and tracing
- push_timemory and pop_timemory accept string_view instead of const char*
- tweaked the pop_timemory hash search
- {push,pop}_perfetto theoretically supports the same invocations as {push,pop}_perfetto_ts and {push,pop}_perfetto_track
- mark_perfetto, mark_perfetto_ts, mark_perfetto_track
* category_region update
- expanded the critical trace categories
- use category_push_disabled
- use category_pop_disabled
- use category_mark_disabled
* constraint implementation
- This provides generic functionality for constraining data collection within windows of time.
- E.g., delay, delay + duration, (delay + duration) * nrepeat
* COLORIZED_LOG -> MONOCHROME
* constraint + omnitrace-causal + omnitrace-sample updates
- support for using different clock IDs for constraints
- OMNITRACE_TRACE_PERIOD_CLOCK_ID option
- tweak to trace-time-window example
- tweak to trace-time-window tests
* Fix formatting
* Update time-window tests
- Fix detection of validation support for perfetto
- Using the --caller-include feature + runtime instrumentation on Ubuntu 18.04 and OpenSUSE 15.2 results in a segfault in the internals of Dyninst.
- For now, mark that these tests will fail
- Later, determine if updating Dyninst submodule fixes this problem
* Fix OMNITRACE_OUTPUT_PATH for all tests
- Provide absolute path instead of relative
* Tweak lambda for checking whether HW counters are enabled
- causing strange build errors on older GCC compilers
* Update dyninst submodule
- fix issues with using --caller-include for Ubuntu 18.04, OpenSUSE 15.x
* cmake formatting
* fix sampling compiler issue for GCC 8
* Tweak thread create message
* Increase causal validation iterations
[ROCm/rocprofiler-systems commit:
f3c37baab7 |
Revert perfetto submodule to v28.0 (#233)
- versions > 28.0 have multiple build errors on Ubuntu 18.04
- gold linker issues
- Werror warnings: In the GNU C Library, "minor" is defined by <sys/sysmacros.h>
[ROCm/rocprofiler-systems commit:
3c7e6902e0 |
Causal profiling (#229)
* Addition of basic structure
* Reworked categories
* More causal integration additions
* Causal implementation
* Update examples
* delete virtual_speedup files
* Update perfetto submodule to v31.0
* Update dyninst submodule
* Update timemory submodule
* ElfUtils build for libdw
* OMNITRACE_LIKELY and OMNITRACE_UNLIKELY
* Update common lib join
* Examples updates for causal profiling
* config updates with causal options
- OMNITRACE_CAUSAL_FIXED_LINE
- OMNITRACE_CAUSAL_FIXED_SPEEDUP
- OMNITRACE_CAUSAL_FILE
- OMNITRACE_CAUSAL_BINARY_SCOPE
- OMNITRACE_CAUSAL_SOURCE_SCOPE
- version info in banner
- support increments in parse_numeric_range
- fix occasional deadlock in first call to get_config
* PTL general task group
* Always include PID in debug/verbose messages
* Add blocking/unblocking gotchas to runtime init bundle
* CausalState
* thread_data updates
- generic component_bundle_cache
* Improve handling of causal in category_region
* components updates
- backtrace_causal component
- backtrace::get_data member func
- decrease ignore_depth in backtrace::sample(int)
- handle "omnitrace_main" in backtrace::filter_and_patch(...)
- tweak internal thread state scope for pthread_mutex_gotcha wrappers
* simplify tracing get_instrumentation_bundles usage
* sampling updates
- include backtrace_causal component
- disable backtrace_metrics if using causal and not using perfetto
- disable backtrace and backtrace_timestamp when using causal
- post_process_causal
* causal updates
- more checks in blocking_gotcha and unblocking_gotcha start/stop
- miscellaneous overhaul of data
- experiment update
* Remove virtual speedup
* libomnitrace code_object
* causal-profiling test
* libomnitrace library.cpp updates
- handle causal profiling
- fini_bundle
* Disable causal profiling by default
* Updated causal code and example
- example: three execution variants: cpu + rng, cpu, rng
- example: three instrumentation variants: none, omni, coz
- fix blocking gotcha credit
- rework perform_experiment_impl
- get_eligible_address_ranges
- compute_eligible_lines
- support fixed lines/speedups/functions
- update selected_entry to support function mode
- fix causal::delay
- experiment updates
* omnitrace_progress / omnitrace_user_progress
- with accompanying omnitrace_annotated_progress / omnitrace_user_annotated_progress
* Update timemory submodule
* CausalMode
- mode indicates whether causal predictions are made at the line level or the function level
* code_object, config, runtime, sampling, thread_data
- code_object: address_range
- code_object: basic::line_info serialize(), name(), hash()
- config updates
- two signals for causal sampling
- thread_data init fixes
* pthread updates
- pthread_create_gotcha processes delays
- pthread_mutex_gotcha does not wrap pthread_join in causal mode
* backtrace_causal update
- dynamic delay period stats
* main wrapper uses basename of argv[0]
* update elfio submodule
* perf support (currently unused)
* Fix experiment JSON serialization
- static_vector.hpp (unused)
* causal executable + config options updates
- omnitrace-causal exe simplifies running multiple causal configs
- changed the causal config option names
* Support both throughput and latency points
* process-causal-json.py script
- will be used later for testing
* stable_vector
* Rework thread_data
* Improve omnitrace-causal exe
- better verbosity handling
- correct diagnosis of status for child process
- execvpe when only one iteration (debugging)
* Update timemory submodule
* exe --version
- omnitrace, omnitrace-avail, and omnitrace-sample all support --version on command-line
* OMNITRACE_INTERNAL_API + OMNITRACE_{LIKELY,UNLIKELY}
* omnitrace-causal cmake format
* omnitrace config update
- OMNITRACE_CAUSAL_FILE_CLOBBER
* custom exception
- wraps STL exception and gets stacktrace during construction
* exit_gotcha supports _Exit
* use global construct_on_init + max threads
- add some safety when exceeding max # of threads
* update code_object binary filter
- exclude dyninst and tbbmalloc library
* containers: c_array, static_vector, stable_vector
- moved utility::c_array to container::c_array
- created static_vector: std::vector bound to std::array
- created stable_vector: vector with stable references
* grow thread_data when new thread created
* causal updates
- data: improve compute_eligible_lines to ignore lambdas
- data: use new thread_data
- delay: use new thread_data
- experiment: properly support latency points
- experiment: support file clobber
- experiment: ensure non-zero experiment time
- progress_point: use new thread_data
- backtrace_causal: use new thread_data
* Update causal-profiling tests
* fix omnitrace-causal backslash escaping
* process-causal-json script
* restructure causal implementation
- update verbose messages for omnitrace-causal diagnose_status
- migrated causal implementation in sampling.cpp to causal/sampling.cpp
- OMNITRACE_USE_CAUSAL does not require OMNITRACE_USE_SAMPLING
- added Mode::Causal
- causal sampling uses same signals as regular sampling
- moved tracing::thread_init to implementation file
- combined tracing::thread_init and tracing::thread_init_sampling
- added causal/components folder
- pthread_create_gotcha::wrapper_config
- omnitrace_preload checks OMNITRACE_USE_CAUSAL
- updates mode accordingly
* update timemory submodule
* update timemory submodule
* causal example updates
- causal for lulesh
* perf code + utility - helpers
- relocated causal perf code
- placement new when generating unique ptr trait for potentially allocating during sampling
- additions to utility header
- removed previously added helpers.hpp
* update timemory submodule
* Default env variables for omnitrace-causal
- activate OMNITRACE_USE_KOKKOSP, etc.
* update stable_vector and static_vector
- static vector can use atomic for size tracking for thread-safe situations
* update causal example header
- CAUSAL_PROGRESS_NAMED
- use CAUSAL_ prefix for some macros
* Tweak lulesh example
- use CAUSAL_PROGRESS instead of CAUSAL_BEGIN and CAUSAL_END
* omnitrace-sample support for causal mode
- set OMNITRACE_USE_SAMPLING to off when OMNITRACE_MODE=causal
* refactor and cleanup code_object
- scope filter
- fixes to address_range
* overhaul causal data + causal config options
- full support for function and line mode
- support static vector of instruction pointers
- improve line info mapping resolution
- remove thread-locality from miscellaneous functions where unnecessary
- causal options for {binary,source,function,fileline} exclusion
* causal experiment, sampling, and backtrace updates
- is_selected + unwind address array
- experiment warning about progress points
- increased buffer size for backtrace_causal sampler
- backtrace_causal only stores IP addresses instead of full unwind info
* category_region updates
- minor refactor
- local_category_region::mark
* Update causal tests
* Bump version to 1.8.0
* omnitrace-causal args + CLOBBER -> RESET
- renamed OMNITRACE_CAUSAL_FILE_CLOBBER to OMNITRACE_CAUSAL_FILE_RESET
- updated omnitrace-causal exe to support recently added configuration options
- other miscellaneous tweaks to data.cpp, experiment.cpp, and sampling.cpp
* Refactor causal and code_object
- code_object.hpp and code_object.cpp moved into binary folder
- causal components namespaced into omnitrace::causal::component
- moved sample_data out of backtrace_causal and into own file
- renamed backtrace_causal to causal::component::backtrace
* preload omnitrace_init + OMNITRACE_DEBUG_MARK
- env OMNITRACE_DEBUG_MARK
- fix omnitrace_init call when LD_PRELOAD-ing omnitrace
* Fix fileline support + line-info output names + experiment log
- line-info log files are prefixed with experiment name
- don't print experiment duration when E2E
- account for fileline scope in analysis
* KokkosP: OMNITRACE_KOKKOSP_NAME_LENGTH_MAX
- config option to limit the name of kokkos tool callbacks
- remove [kokkos] from KokkosP names
* Update causal example
- minor tweaks to decrease probability of overlapping regions in binary
* omnitrace-causal update
- prefix N / Ntot in environment printout
* Miscellaneous updates
- causal::finish_experimenting()
- OMNITRACE_CAUSAL_RANDOM_SEED
- KokkosP causal updates
- exclude some callbacks, make some callbacks unique, etc.
- address_range::operator+=(address_range)
- combine contiguous ranges in binary/analysis.cpp when file, func, line is same and address range is contiguous
- bfd_line_info reads inline info
- wait for perform_experiment_impl to complete
- causal::delay updates
- delay::process checks if experiment is active
- uses threading::get_id()
- experiment scales duration up for larger speedup experiments
- line info samples includes excluded lines
- sampler uses CLOCK_REALTIME
- blocking_gotcha updates
- is no longer fully static
- adds audit routine which sets the postblock value to zero if try/timed routine fails
- category::host was added to causal_throughput_categories_t
- pthread_create_gotcha sets new threads local parent delay
- was using internal value, now uses sequent value
* Causal improvements to KokkosP
* Updates to experiment time scaling
- use stats instead of just max
* binary/link_map.{hpp,cpp}
* update process-causal-json.py
* Folded fileline scope into source scope
* Update documentation
- Add documentation for causal profiling
- Replace 'Omnitrace' with 'OmniTrace' everywhere
* Update causal-helpers.cmake + omnitrace-testing.cmake
- split tests/CMakeLists.txt partially into omnitrace-testing.cmake
* omnitrace/causal.h
- OMNITRACE_CAUSAL_PROGRESS
- OMNITRACE_CAUSAL_PROGRESS_NAMED
- OMNITRACE_CAUSAL_BEGIN
- OMNITRACE_CAUSAL_END
* selected_entry + remove default filters for lambdas and operator()
- selected entry stores range and binary load address
* update process-causal-json.py
* format examples/lulesh/CMakeLists.txt
* causal-helpers find_package(Threads)
* OMNITRACE_KOKKOSP_KERNEL_LOGGER
- was OMNITRACE_KOKKOS_KERNEL_LOGGER
* quiet find of coz-profiler
* Fix rocm_smi exception handling
* Update timemory submodule (binutils)
- fix binutils compile error on some systems
- bump binutils to v2.40
* Fix miscellaneous tests
* OMNITRACE_KOKKOSP_PREFIX
* revert rocm_smi handling
* ElfUtils updates
- default to download version 0.188
- add -Wno-error=null-dereference due to GCC 12 compiler error
* Update causal example
* Remove OMNITRACE_VERBOSE from global workflow envs
* Reliable causal test
* disable compilation of causal perf files
* Remove set_current_selection with unwind stack
* update timemory submodule
* fix for segfault on bionic
- locking in TLS dtor was causing segfault
* remove experiment::is_selected(unwind_stack_t)
* update default init of selected_entry
* Fix for when IP is not offset by load address
* Update CMakeLists.txt
* Miscellaneous updates
- OMNITRACE_WARNING_OR_CI_THROW
- OMNITRACE_REQUIRE
- OMNITRACE_PREFER
- fixed issues with no ASLR
- added load address variable and ipaddr() func to basic/bfd line info
- removed get_basic() from dwarf_line_info
- TIMEMORY_PREFER -> OMNITRACE_PREFER
- removed previously added binary_address and range variables from selected_entry
* Removed superfluous CausalState
* Additional causal tests (lulesh + kokkos)
* filter, prefer, analysis ASLR handling
- removed default filter on cold functions
- fixed OMNITRACE_PREFER
- fixed analysis ASLR handling
* Tweak line-info output
* Removed some superfluous code
- causal/delay
- causal/selected_entry
* Exclude main.cold in function mode
* Update validate-perfetto-proto.py
- account for occasional http errors
* Add sampling test disabling tmp files
* argparser for process-causal-json
- support validation
- support filtering
* Avoid pthread_mutex_{lock,unlock} in sampling offload
- use homemade atomic_mutex/atomic_lock since contention will be low and calling pthread primitives might trigger our own wrappers
* Rename process-causal-json.py
- validate-causal-json.py
* rework omnitrace_add_causal_test
- capable of performing validation
- added validation tests
* Fix kokkosp_begin_deep_copy + causal
* Tweak address range in bfd_line_info::read_pc
* Tweak analysis and data IP handling
- look for gaps
* Disable scaling experiment time by speedup
* Revert change in max threads during CI
* binary updates
- significant overhaul of binary analysis implementation
- removed "basic_line_info" and "bfd_line_info" in favor of a new "symbol" class
- symbol class has basic BFD info + vector of inlines + vector of dwarf info
* Updated causal to use new binary analysis
- Fix symbol.cpp includes
* Updated formatting target
- include *.cmake files
* Updated causal tests
- causal tests should be stable now
* Update timemory and dyninst submodules
- TPLs are stripped + built w/o debug info
* Increase tolerance for causal validation speedups
- higher speedups have more variance (increased to +/- 5 from 3)
* Support causal output for MPI
- i.e. tag with MPI rank
* omnitrace-causal launcher argument
* improve experiment sampling output
* causal data updates
- call compute lines once
- fixed filtered cached binary info
- debugging info when experiment fails to start
* Tweaked causal validation tests
* dwarf_entry ranges
* CI updates
- increase max threads to 64
* Tweak causal E2E validation tests
- more threads
- shorter thread runtime
- more iterations
* Fix shadowed variable
* fix symbol read_bfd last PC calculation
* fix maybe-uninitialized warning
* omnitrace-causal launcher update
- only inject "omnitrace-causal --" once
- throw error if no matches found
* Update causal profiling docs for launcher
* fix address range boundaries
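The sampling-offload bullet above replaces pthread locking with a homemade `atomic_mutex`/`atomic_lock` pair. A minimal sketch of that idea, assuming a simple test-and-set spinlock, is shown below; the class names come from the commit message, but their bodies are illustrative assumptions rather than the actual omnitrace implementation, and `count_with_threads` is a hypothetical demo helper.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Illustrative test-and-set spinlock standing in for atomic_mutex.
// It never calls pthread_mutex_lock/unlock, so it cannot re-enter
// wrappers that intercept those functions.
class atomic_mutex
{
public:
    void lock() noexcept
    {
        // Try to flip the flag from false to true; acquire ordering makes the
        // previous holder's writes visible once the lock is obtained.
        while(m_flag.exchange(true, std::memory_order_acquire))
        {
            // Spin on a plain load so waiting threads do not keep writing
            // to the shared cache line.
            while(m_flag.load(std::memory_order_relaxed))
            {
            }
        }
    }

    // Release ordering publishes writes made inside the critical section.
    void unlock() noexcept { m_flag.store(false, std::memory_order_release); }

private:
    std::atomic<bool> m_flag{ false };
};

// Illustrative RAII guard standing in for atomic_lock.
class atomic_lock
{
public:
    explicit atomic_lock(atomic_mutex& m)
    : m_mutex{ m }
    {
        m_mutex.lock();
    }
    ~atomic_lock() { m_mutex.unlock(); }

    atomic_lock(const atomic_lock&) = delete;
    atomic_lock& operator=(const atomic_lock&) = delete;

private:
    atomic_mutex& m_mutex;
};

// Hypothetical helper: several threads increment a shared counter,
// each increment protected by the spinlock.
inline long count_with_threads(int nthreads, int iters)
{
    atomic_mutex             mtx;
    long                     counter = 0;
    std::vector<std::thread> threads;
    threads.reserve(static_cast<std::size_t>(nthreads));
    for(int t = 0; t < nthreads; ++t)
        threads.emplace_back([&mtx, &counter, iters] {
            for(int i = 0; i < iters; ++i)
            {
                atomic_lock lk{ mtx };
                ++counter;
            }
        });
    for(auto& th : threads)
        th.join();
    return counter;
}
```

Busy-waiting only pays off because, as the bullet notes, contention is expected to be low; under heavy contention a blocking mutex would waste fewer CPU cycles.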
[ROCm/rocprofiler-systems commit:
b7e504e938
Optional perfetto annotations (#206)
* Misc tweaks
- C API function printouts use warning colors
- split region/trace start/stop functions into regions.cpp file
* Config option for disabling perfetto annotations
* Missing checks in roctracer.cpp and sampling.cpp
* Verbose makefile in CI
* run-ci uses -VV
* Fix gcc-7 maybe-uninitialized warning
* Fix push/pop perfetto
- moving perfetto::EventContext was causing errors
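The bullets above make perfetto annotations optional and add missing checks at annotation sites. A hedged sketch of that gating pattern follows; `trace_event`, `parse_bool_option`, and `maybe_annotate` are hypothetical names, and no perfetto types are used.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a trace event; not a perfetto type.
struct trace_event
{
    std::string                                      name;
    std::vector<std::pair<std::string, std::string>> annotations;
};

// Interpret an environment-style string as a boolean option.
// A null pointer (variable unset) or an unrecognized value yields the default.
inline bool parse_bool_option(const char* val, bool default_value)
{
    if(val == nullptr) return default_value;
    std::string s{ val };
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if(s == "0" || s == "false" || s == "off" || s == "no") return false;
    if(s == "1" || s == "true" || s == "on" || s == "yes") return true;
    return default_value;
}

// Every annotation site checks the flag first, so nothing is recorded
// on the event when annotations are disabled.
inline void maybe_annotate(trace_event& evt, bool enabled, const std::string& key,
                           const std::string& value)
{
    if(!enabled) return;
    evt.annotations.emplace_back(key, value);
}
```

At startup one might compute `bool enabled = parse_bool_option(std::getenv("OMNITRACE_PERFETTO_ANNOTATIONS"), true);` (the variable name is an assumption here) and pass `enabled` to every annotation site, which is the kind of check the "missing checks" bullet refers to.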
[ROCm/rocprofiler-systems commit:
a325f26c61
Improve sampling allocator (#205)
* Updated sampling
- dynamic sampler is constructed with a shared pointer to an allocator instance
- dynamic allocator handles multiple samplers
- eliminates the need for every per-thread dynamic sampler to start its own background allocator thread
* Fix usage of tim::popen
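The allocator change above can be illustrated with a sketch in which each sampler holds a `std::shared_ptr` to a single allocator. `shared_allocator` and `dynamic_sampler` here are hypothetical stand-ins rather than the omnitrace classes, and this version allocates synchronously, omitting the background allocator thread.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical allocator shared by all samplers. This sketch hands out
// buffers synchronously; the real design uses a background thread.
class shared_allocator
{
public:
    std::vector<double> acquire(std::size_t n)
    {
        std::lock_guard<std::mutex> lk{ m_mutex };
        ++m_requests;
        return std::vector<double>(n);
    }

    std::size_t requests() const
    {
        std::lock_guard<std::mutex> lk{ m_mutex };
        return m_requests;
    }

private:
    mutable std::mutex m_mutex;
    std::size_t        m_requests = 0;
};

// Hypothetical per-thread sampler: it is constructed with a shared pointer
// to the allocator instead of creating an allocator of its own.
class dynamic_sampler
{
public:
    explicit dynamic_sampler(std::shared_ptr<shared_allocator> alloc)
    : m_alloc{ std::move(alloc) }
    {}

    // Replace the current buffer with a fresh one from the shared allocator.
    void        refill(std::size_t n) { m_buffer = m_alloc->acquire(n); }
    std::size_t capacity() const { return m_buffer.size(); }

private:
    std::shared_ptr<shared_allocator> m_alloc;
    std::vector<double>               m_buffer;
};
```

Because every sampler shares ownership, the allocator lives as long as the last sampler referencing it, and only one allocator (and so, in the real design, one background thread) serves all threads.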
[ROCm/rocprofiler-systems commit: