develop
27 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
8fcf3a50b0 |
Use gersemi for CMake formatting (#257)
* Replace `cmake-format` with `gersemi`
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Remove .cmake-format.yaml
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Update workflow to use gersemi
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Update CONTRIBUTING.md
* Update helper scripts
* Don't include `*/external/*` in workflows
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
489eda995d |
Rename Omnitrace to ROCm Systems Profiler (#4)
The Omnitrace program is being renamed.
Full name: "ROCm Systems Profiler"
Package name: "rocprofiler-systems"
Binary / Library names: "rocprof-sys-*"
---------
Co-authored-by: Xuan Chen <xuchen@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit:
|
||
|
|
a1b11b94f0 |
Dynamic expansion of thread data (#294)
* Tests for exceeding OMNITRACE_MAX_THREADS
- tests which exceed the OMNITRACE_MAX_THREADS value during thread creation
* CMake Formatting.cmake update
- include source files in /tests/source directory
* Add unknown-hash= to OMNITRACE_ABORT_FAIL_REGEX
- fail if a timemory hash is not resolved to a name
* Tests for exceeding OMNITRACE_MAX_THREADS
- update
* omnitrace-sample update
- remove env disabling of critical-trace and process-sampling
* core library update
- make_unique in concepts.hpp
- add OMNITRACE_USE_ROCM_SMI to "process_sampling" category
- remove forced disabling of critical-trace in sampling mode
- parentheses for OMNITRACE_PREFER
- use tim::get_hash_id instead of tim::get_combined_hash_id
* core library update (containers)
- added aligned_static_vector.hpp
- similar to static_vector.hpp but attempts to align to cache line size
- alignment template parameter for stable_vector
- added missing aliases in static_vector
- consistent with aligned_static_vector aliases
* thread_info update
- track the peak number of threads created
- thread_info::get_peak_num_threads() returns the peak number of threads
* thread_data update
- generic thread_data inherits from base_thread_data
- thread_data reworked to support dynamic expansion
- base_thread_data updated to invoke private_instance() function
- thread_data<optional<T>> uses stable_vector aligned to cache line width
- thread_data<identity<T>> uses stable_vector aligned to cache line width
- thread_data for optional and identity provide private private_instance function + friend to base_thread_data
- component_bundle_cache<T> is now thread_data<component_bundle_cache_impl<T>>
* causal update
- thread_data<T>::instances -> thread_data<T>::instance(construct_on_thread{ ... })
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
- tim::get_combined_hash_id -> tim::get_hash_id
- update progress_bundle usage to new thread_data API
* backtrace/backtrace_metrics component update
- backtrace_metrics update
- update to new thread_data API
- add thread CPU time row in perfetto
- fix potential bug when rusage categories are disabled
- fix bug in operator-= not subtracting cpu time of rhs
- backtrace update
- skip all child call-stack below 'tim::openmp::' if sampling_keep_internal = false
* pthread_gotcha component update
- pthread_gotcha::shutdown() invokes pthread_create_gotcha::shutdown()
* pthread_create_gotcha component update
- minor tweak to {start,stop}_bundle functions: pass in thread id
- update to new thread_data API
- track native handles of internal threads
- implement system with pthread_kill to stop dangling bundles
* rocprofiler/roctracer component update
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
* critical trace (library) update
- update to new thread_data API
- tim::get_combined_hash_id -> tim::get_hash_id
* coverage update
- update to new thread_data API
* tasking update
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
* roctracer update
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
* rocm_smi update
- update to new thread_data API
* runtime.cpp update
- update to new thread_data API
* sampling.cpp update
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
* ompt.cpp update
- invoke pthread_gotcha::shutdown before invoking OMPT finalize function
- this prevents signals from being delivered to OpenMP threads
* tracing.hpp and tracing.cpp update
- replace get_timemory_hash_{ids,aliases} functions with copy_timemory_hash_ids function
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
- tim::get_combined_hash_id -> tim::get_hash_id
- improvements to + error checking in thread_init function
* library.cpp update
- move copying timemory hash id/aliases to tracing.cpp
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
* Update BuildSettings.cmake
- add -Wno-interference-size to suppress warning about use of std::hardware_destructive_interference_size
* Update fork example
- improve scheme for waiting on child processes via waitpid instead of wait
- support running main routine multiple times
- push/pop regions in child process
* Update lib/common/defines.h.in
- allow user to specify misc values via -D <name>=<value>
- OMNITRACE_CACHELINE_SIZE
- OMNITRACE_CACHELINE_SIZE_MIN
- OMNITRACE_ROCM_MAX_COUNTERS
- remove unused defines
- OMNITRACE_ROCM_LOOK_AHEAD
- OMNITRACE_MAX_ROCM_QUEUES
* Update rocprofiler.hpp
- OMNITRACE_MAX_ROCM_COUNTERS -> OMNITRACE_ROCM_MAX_COUNTERS
* Update aligned_static_vector
- set cacheline_align_v from max of OMNITRACE_CACHELINE_SIZE and OMNITRACE_CACHELINE_SIZE_MIN
* Update tracing.cpp
- acquire locks for updating main hash ids/aliases
- only propagate ids/aliases when finalizing
* Update pthread_create_gotcha.cpp
- make sure hash for "start_thread" exists on main thread
* Update causal end to end tests
- if OMNITRACE_BUILD_NUMBER is 1, set OMNITRACE_VERBOSE=0
[ROCm/rocprofiler-systems commit:
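The `thread_data update` bullets above describe a move from fixed-size storage to dynamic expansion backed by a `stable_vector`. A minimal sketch of the underlying idea follows, assuming chunked storage so that references handed out earlier stay valid while the container grows. The name `chunked_stable_vector` and its members are hypothetical and are not taken from the project; the element type is assumed to be default-constructible and copy-assignable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical illustration: growth allocates a new fixed-size chunk instead of
// reallocating, so references to existing elements are never invalidated.
template <typename T, std::size_t ChunkSize = 4>
class chunked_stable_vector
{
public:
    // Appends a value and returns a reference that stays valid after later growth.
    T& push_back(const T& value)
    {
        // A new chunk is needed whenever the current ones are exactly full.
        if(m_size % ChunkSize == 0)
            m_chunks.push_back(std::make_unique<std::array<T, ChunkSize>>());
        T& slot = (*m_chunks[m_size / ChunkSize])[m_size % ChunkSize];
        slot    = value;
        ++m_size;
        return slot;
    }

    T& operator[](std::size_t idx)
    {
        assert(idx < m_size);  // bounds check in debug builds
        return (*m_chunks[idx / ChunkSize])[idx % ChunkSize];
    }

    std::size_t size() const { return m_size; }

private:
    // Only the pointers move when this vector reallocates; the chunks stay put.
    std::vector<std::unique_ptr<std::array<T, ChunkSize>>> m_chunks;
    std::size_t                                            m_size = 0;
};
```

With this layout, a loop bounded by the number of elements actually created (the role `thread_info::get_peak_num_threads()` plays in the log) visits only the slots in use rather than a compile-time maximum.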
|
||
|
|
b65f8e7605 |
CI timeout + line-info in releases (#279)
* Update perfetto args.gn.in
- remove enable_perfetto_tools_trace_to_text (unused)
* core timeout implementation
- requires OMNITRACE_CI=ON
- requires OMNITRACE_CI_TIMEOUT=<sec>
- adds pthread_self and std::this_thread::get_id to thread info
- pthread_create_gotcha stores native handles (pthread_self)
* Testing updates
- improve detection of segfault/failures when PASS_REGEX exists
- add OMNITRACE_CI_TIMEOUT env variable to all tests
* Line-info in releases
- e.g. -g1 + more options to minimize size of debug info
* Fix typo in config exit action message
* OMNITRACE_UNLIKELY around debug/verbose messages
* format fixes
* Overflow tests + capability check
* transpose example update
- link to threads library
* roctracer/rocprofiler update
- in ROCm 5.5.0, cannot include rocprofiler.h and roctracer.h in same file due to conflicting enum defs
- Moved HSA tracing setup/shutdown to component::roctracer
* roctracer update
- fix definition of roctracer::setup when disabled
* Update fork example
- detach threads on main PID
- flush io outputs when printing info
* Update overflow tests
- pass regular expressions
- overflow on PERF_COUNT_SW_CPU_CLOCK event
* fork gotcha update
- use getpid() instead of getppid()
* update fork example
- wait on threads calling fork
* timeout update
- wait on timeout thread to launch before proceeding
[ROCm/rocprofiler-systems commit:
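The `core timeout implementation` bullet states that the timeout requires both `OMNITRACE_CI=ON` and `OMNITRACE_CI_TIMEOUT=<sec>`. The sketch below shows one way such a two-variable gate could be evaluated. The helper names and the set of spellings accepted as "enabled" are assumptions for illustration; only the two environment variable names come from the log.

```cpp
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: returns the timeout in seconds only when CI mode is
// enabled and a positive integer timeout was supplied.
inline std::optional<long>
ci_timeout_seconds(const char* ci_value, const char* timeout_value)
{
    if(ci_value == nullptr || timeout_value == nullptr) return std::nullopt;

    // Assumption: only these spellings count as "enabled".
    const std::string ci{ ci_value };
    if(ci != "ON" && ci != "1" && ci != "true") return std::nullopt;

    // Reject empty strings, trailing garbage, and non-positive values.
    char*      end     = nullptr;
    const long seconds = std::strtol(timeout_value, &end, 10);
    if(end == timeout_value || *end != '\0' || seconds <= 0) return std::nullopt;
    return seconds;
}

// Reads the two variables named in the commit message from the environment.
inline std::optional<long>
ci_timeout_from_env()
{
    return ci_timeout_seconds(std::getenv("OMNITRACE_CI"),
                              std::getenv("OMNITRACE_CI_TIMEOUT"));
}
```

Keeping the parsing separate from `std::getenv` makes the gate easy to test without touching the process environment.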
|
||
|
|
446fd36a93 |
Add RedHat CI and release packaging (#251)
- additional miscellaneous tweaks to workflows and docker scripts, e.g. install perfetto python bindings
- improves the stability of MPI finalization
- reduces some debug messages within timemory when `OMNITRACE_DEBUG=ON`
- fixes issue found in RHEL where libunwind is using mutex and omnitrace was not treating this as an internal mutex call
- this may have been affecting the causal profiling slightly (tests seem a bit more stable now)
- fix data race in timemory
* Add RedHat CI and release packaging
- additional miscellaneous tweaks to workflows and docker scripts, e.g. install perfetto python bindings
* Fix URL for ROCm packages in redhat workflow
* Fix dnf --enable-repo for ROCm perl packages
* Dockerfile.rhel and redhat.yml updates
- Fix dnf repo for ROCm PERL packages
- Disable python in CI (interpreter segfaults)
- Exclude parallel-overhead-locks tests due to inclusion of internal locks
- This needs to be remedied in the future
* Exclude _dl_relocate_static_pie from instrumentation
* Testing updates
- OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF for parallel-overhead-locks
* Fix redhat workflow
* redhat.yml update
- remove if condition on config/build/test step
* Update timemory submodule
- tweaks to verbosity messages
* Set thread state before unw_step
- on Redhat, unw_step calls mutex
* Update timemory submodule
- verbosity changes
- gotcha uses spin_lock/spin_mutex
* Remove using gsplit-dwarf unless OMNITRACE_BUILD_NUMBER > 2
* Re-enable parallel-overhead-locks tests in redhat workflow
* Always disable timemory manager metadata auto output
* testing updates
- tweak parallel-overhead-locks-timemory to higher instruction count min
- OMNITRACE_SAMPLING_KEEP_INTERNAL=OFF for parallel-overhead-locks-perfetto
* Update timemory submodule
- quiet realpath queries
* omnitrace exe updates
- detect text files
- improved bin/lib locating
* cmake format
* test-install.sh and redhat workflow updates
- handle testing when ls is script
- re-enable python testing on redhat workflow
- invoke test-install.sh in redhat workflow
* Misc guards for finalization
* omnitrace-exe, testing updates
- test-install.sh: LS_EXEC -> LS_NAME
- handle /usr/bin/ls being script in source/bin/tests
- improve locating the binary
* Fix mpi_gotcha compile error
* omnitrace-exe updates
- improve file locating
* formatting
* Misc fixes
- remove -static-libstdc++ for RHEL packaging (rocky-linux doesn't distribute static lib)
* omnitrace-exe paths
* Replace realpath with absolute
- using absolute path to symlink fixes issues with locating libdyninstAPI_RT at runtime
* omnitrace exe updates
- judicious use of realpath
* Update timemory submodule
- fix update main hash ids/aliases data race in merge
* bin tests update
- change working directory of omnitrace-exe-simulate-lib-basename
* omnitrace exe updates
- Update resolved exe/lib messaging
* bin tests update
- change working directory of omnitrace-exe-simulate-lib-basename
[ROCm/rocprofiler-systems commit:
|
||
|
|
49851b05ae |
Address and thread sanitizer fixes (#250)
* Address and thread sanitizer fixes
- Fix compilation with clang
- Tweak perfetto copy to build tree
- Added suppression files to scripts
- fix LD_PRELOAD support in omnitrace-causal and omnitrace-sample
- use spin_mutex and spin_lock from timemory instead of atomic_mutex and atomic_lock
- state uses atomic
- fix some memory leaks
- tweak testing
- mpi tests do not use preload
- increase timeout when using sanitizers
- add env LD_PRELOAD when using sanitizers
* Tweak perfetto build
* Update timemory submodule
* Update version to 1.8.1
* Update omnitrace-leak.supp
* Update timemory submodule
- fixed spin_mutex implementation
* Remove previously added addr_space->allowTraps(instr_traps)
- this appears to cause errors during binary rewrite
* causal testing updates
- relaxed causal validation on CI systems (to account for hyperthreading decreasing prediction)
- improved impact calculation
- other general improvements to validate-causal-json.py
* Improve fork handling for perfetto
- numerous updates changing perfetto:: to ::perfetto::
- added perfetto_fwd.hpp
* Updated fork example
- user API for validation that stopping/starting perfetto is valid
* Misc fixes to perfetto + fork support
- tweak regions in fork example
- handle disabling tmp files
- get rid of stop/start with perfetto before/after fork
- fixed sampling support during fork
- tweak env of fork test
* Fix find_package in build-tree
* Fix buildtree export
* Fix buildtree export
* Restructured ConfigInstall before adding examples
* Guard against creating tmp file in sampling when disabled
* Fix buildtree package
* formatting
* exit handlers on child processes
- quick exit to avoid perfetto cleanup
* Further tweaking of causal tests for reliability
- enable PROCESSOR_AFFINITY
- decrease to 5 iterations
* Further tweaking of causal tests for reliability
- disable PROCESSOR_AFFINITY for fast func e2e tests
- enabling affinity results in (valid) speedup predictions greater than zero
* Fixes to fork handling
- use pthread_atfork for redundancy if fork_gotcha fails
* cmake formatting
* Fix fork init settings + install components
- remove dl from PROJECT_BUILD_TARGETS
* Testing tweaks
- fix mpi-binary-rewrite-run regex when OMNITRACE_VERBOSE set > 1 in env
- increase causal e2e iterations to 8
* Fix "Test User API"
- test-find-package.sh included dl component
* Further tweaks to causal validation
- further considerations of variance
[ROCm/rocprofiler-systems commit:
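This commit switches to `spin_mutex` and `spin_lock` from timemory. The timemory implementation is not shown in the log, so the following is only a sketch of the general technique: a lock built on a single atomic flag that never calls into pthread. The class name `demo_spin_mutex` is hypothetical.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical minimal spin mutex. It provides lock/try_lock/unlock, so it can
// be used with std::lock_guard and std::unique_lock.
class demo_spin_mutex
{
public:
    void lock()
    {
        // Busy-wait until the previous value was false, i.e. we acquired it.
        while(m_locked.exchange(true, std::memory_order_acquire))
        {}
    }

    // Single attempt: succeeds only if the mutex was not already held.
    bool try_lock() { return !m_locked.exchange(true, std::memory_order_acquire); }

    void unlock() { m_locked.store(false, std::memory_order_release); }

private:
    std::atomic<bool> m_locked{ false };
};
```

Spinning suits short critical sections with low contention; under heavy contention a blocking mutex usually wastes less CPU time.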
|
||
|
|
1c6aaafe96 |
Handle fork in target application (#191)
* Always print PID in log messages
* omnitrace-dl updates
- omnitrace_preload does not call omnitrace_init or omnitrace_init_tooling
- omnitrace_preload will call omnitrace_set_mpi if OMNITRACE_USE_MPI
or OMNITRACE_USE_MPIP in the env is true but not call it otherwise
because doing so either overrides OMNITRACE_USE_PID (when true) or
disable mpip from initialization (when false) and the MPI
init can be caught later and override OMNITRACE_USE_PID
* config updates
- set_setting_value sets user update type
- remove volatile from get_settings_configured
- don't override settings::default_process_suffix
- don't kill process in omnitrace_exit_action
- set_state ignores updating state if >= State::Finalized
* Handle state > State::Finalized
* fork gotcha updates
- unsets LD_PRELOAD
- sets OMNITRACE_ROOT_PROCESS
- sets OMNITRACE_CHILD_PROCESS
* libomnitrace library.cpp updates
- basic_bundle for fini metrics
- handle finalization from child process
* sampling updates
- sampling::shutdown handles when child process
* Add example and test using fork
* Update run-ci script to support not submitting
* Tweak test envs
* Update build flags when codecov enabled
* remove unnecessary includes of sampling header
* Replace mpi copy/fini static lambda with free-funcs
* Update codecov job
* Fix OMPT segfaults after finalization
* Miscellaneous updates after rebase
* fixes for causal profiling
* revert some run-ci.sh changes
* Disable storing env in sampling::shutdown
* formatting fix
* Update timemory submodule
- fixed occasional synchronization issues with allocator offloading
- exclude protozero:: from internal samples
* improve root/child process detection
- avoid omnitrace_finalize in MPI when child process
- revert some testing tweaks
[ROCm/rocprofiler-systems commit:
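The `config updates` bullets note that `set_state` ignores updates once the state is `>= State::Finalized`. A minimal sketch of that rule follows, assuming an ordered enum and an atomic holder. Only the name `State::Finalized` appears in the log; the other enumerators, their order, and the `state_holder` class are hypothetical.

```cpp
#include <atomic>

// Hypothetical ordering; values at or after Finalized are treated as terminal.
enum class State : int
{
    PreInit = 0,
    Init,
    Active,
    Finalized,
    Disabled
};

class state_holder
{
public:
    State get() const { return m_state.load(); }

    // Returns true if the state was updated. Once the current state is
    // Finalized or later, the request is ignored and false is returned.
    bool set(State next)
    {
        State current = m_state.load();
        while(current < State::Finalized)
        {
            // On failure, current is reloaded and the guard is re-checked.
            if(m_state.compare_exchange_weak(current, next)) return true;
        }
        return false;
    }

private:
    std::atomic<State> m_state{ State::PreInit };
};
```

The compare-and-swap loop keeps the check and the update atomic, so a concurrent finalization cannot be overwritten by a late caller.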
|
||
|
|
3c7e6902e0 |
Causal profiling (#229)
* Addition of basic structure
* Reworked categories
* More causal integration additions
* Causal implementation
* Update examples
* delete virtual_speedup files
* Update perfetto submodule to v31.0
* Update dyninst submodule
* Update timemory submodule
* ElfUtils build for libdw
* OMNITRACE_LIKELY and OMNITRACE_UNLIKELY
* Update common lib join
* Examples updates for causal profiling
* config updates with causal options
- OMNITRACE_CAUSAL_FIXED_LINE
- OMNITRACE_CAUSAL_FIXED_SPEEDUP
- OMNITRACE_CAUSAL_FILE
- OMNITRACE_CAUSAL_BINARY_SCOPE
- OMNITRACE_CAUSAL_SOURCE_SCOPE
- version info in banner
- support increments in parse_numeric_range
- fix occasional deadlock in first call to get_config
* PTL general task group
* Always include PID in debug/verbose messages
* Add blocking/unblocking gotchas to runtime init bundle
* CausalState
* thread_data updates
- generic component_bundle_cache
* Improve handling of causal in category_region
* components updates
- backtrace_causal component
- backtrace::get_data member func
- decrease ignore_depth in backtrace::sample(int)
- handle "omnitrace_main" in backtrace::filter_and_patch(...)
- tweak internal thread state scope for pthread_mutex_gotcha wrappers
* simplify tracing get_instrumentation_bundles usage
* sampling updates
- include backtrace_causal component
- disable backtrace_metrics if using causal and not using perfetto
- disable backtrace and backtrace_timestamp when using causal
- post_process_causal
* causal updates
- more checks in blocking_gotcha and unblocking_gotcha start/stop
- miscellaneous overhaul of data
- experiment update
* Remove virtual speedup
* libomnitrace code_object
* causal-profiling test
* libomnitrace library.cpp updates
- handle causal profiling
- fini_bundle
* Disable causal profiling by default
* Updated causal code and example
- example: three execution variants: cpu + rng, cpu, rng
- example: three instrumentation variants: none, omni, coz
- fix blocking gotcha credit
- rework perform_experiment_impl
- get_eligible_address_ranges
- compute_eligible_lines
- support fixed lines/speedups/functions
- update selected_entry to support function mode
- fix causal::delay
- experiment updates
* omnitrace_progress / omnitrace_user_progress
- with accompanying omnitrace_annotated_progress / omnitrace_user_annotated_progress
* Update timemory submodule
* CausalMode
- mode indicates whether causal predictions should be at line-level or function-level
* code_object, config, runtime, sampling, thread_data
- code_object: address_range
- code_object: basic::line_info serialize(), name(), hash()
- config updates
- two signals for causal sampling
- thread_data init fixes
* pthread updates
- pthread_create_gotcha processes delays
- pthread_mutex_gotcha does not wrap pthread_join in causal mode
* backtrace_causal update
- dynamic delay period stats
* main wrapper uses basename of argv[0]
* update elfio submodule
* perf support (currently unused)
* Fix experiment JSON serialization
- static_vector.hpp (unused)
* causal executable + config options updates
- omnitrace-causal exe simplifies running multiple causal configs
- changed the causal config option names
* Support both throughput and latency points
* process-causal-json.py script
- will be used later for testing
* stable_vector
* Rework thread_data
* Improve omnitrace-causal exe
- better verbosity handling
- correct diagnosis of status for child process
- execvpe when only one iteration (debugging)
* Update timemory submodule
* exe --version
- omnitrace, omnitrace-avail, and omnitrace-sample all support --version on command-line
* OMNITRACE_INTERNAL_API + OMNITRACE_{LIKELY,UNLIKELY}
* omnitrace-causal cmake format
* omnitrace config update
- OMNITRACE_CAUSAL_FILE_CLOBBER
* custom exception
- wraps STL exception and gets stacktrace during construction
* exit_gotcha supports _Exit
* use global construct_on_init + max threads
- add some safety when exceeding max # of threads
* update code_object binary filter
- exclude dyninst and tbbmalloc library
* containers: c_array, static_vector, stable_vector
- moved utility::c_array to container::c_array
- created static_vector: std::vector bound to std::array
- created stable_vector: vector with stable references
* grow thread_data when new thread created
* causal updates
- data: improve compute_eligible_lines to ignore lambdas
- data: use new thread_data
- delay: use new thread_data
- experiment: properly support latency points
- experiment: support file clobber
- experiment: ensure non-zero experiment time
- progress_point: use new thread_data
- backtrace_causal: use new thread_data
* Update causal-profiling tests
* fix omnitrace-causal backslash escaping
* process-causal-json script
* restructure causal implementation
- update verbose messages for omnitrace-causal diagnose_status
- migrated causal implementation in sampling.cpp to causal/sampling.cpp
- OMNITRACE_USE_CAUSAL does not require OMNITRACE_USE_SAMPLING
- added Mode::Causal
- causal sampling uses same signals as regular sampling
- moved tracing::thread_init to implementation file
- combined tracing::thread_init and tracing::thread_init_sampling
- added causal/components folder
- pthread_create_gotcha::wrapper_config
- omnitrace_preload checks OMNITRACE_USE_CAUSAL
- updates mode accordingly
* update timemory submodule
* update timemory submodule
* causal example updates
- causal for lulesh
* perf code + utility - helpers
- relocated causal perf code
- placement new when generating unique ptr trait for potentially allocating during sampling
- additions to utility header
- removed previously added helpers.hpp
* update timemory submodule
* Default env variables for omnitrace-causal
- activate OMNITRACE_USE_KOKKOSP, etc.
* update stable_vector and static_vector
- static vector can use atomic for size tracking for thread-safe situations
* update causal example header
- CAUSAL_PROGRESS_NAMED
- use CAUSAL_ prefix for some macros
* Tweak lulesh example
- use CAUSAL_PROGRESS instead of CAUSAL_BEGIN and CAUSAL_END
* omnitrace-sample support for causal mode
- set OMNITRACE_USE_SAMPLING to off when OMNITRACE_MODE=causal
* refactor and cleanup code_object
- scope filter
- fixes to address_range
* overhaul causal data + causal config options
- full support for function and line mode
- support static vector of instruction pointers
- improve line info mapping resolution
- remove thread-locality from miscellaneous functions where unnecessary
- causal options for {binary,source,function,fileline} exclusion
* causal experiment, sampling, and backtrace updates
- is_selected + unwind address array
- experiment warning about progress points
- increased buffer size for backtrace_causal sampler
- backtrace_causal only stores IP addresses instead of full unwind info
* category_region updates
- minor refactor
- local_category_region::mark
* Update causal tests
* Bump version to 1.8.0
* omnitrace-causal args + CLOBBER -> RESET
- renamed OMNITRACE_CAUSAL_FILE_CLOBBER to OMNITRACE_CAUSAL_FILE_RESET
- updated omnitrace-causal exe to support recently added configuration options
- other miscellaneous tweaks to data.cpp, experiment.cpp, and sampling.cpp
* Refactor causal and code_object
- code_object.hpp and code_object.cpp moved into binary folder
- causal components namespaced into omnitrace::causal::component
- moved sample_data out of backtrace_causal and into own file
- renamed backtrace_causal to causal::component::backtrace
* preload omnitrace_init + OMNITRACE_DEBUG_MARK
- env OMNITRACE_DEBUG_MARK
- fix omnitrace_init call when LD_PRELOAD-ing omnitrace
* Fix fileline support + line-info output names + experiment log
- line-info log files are prefixed with experiment name
- don't print experiment duration when E2E
- account for fileline scope in analysis
* KokkosP: OMNITRACE_KOKKOSP_NAME_LENGTH_MAX
- config option to limit the name of kokkos tool callbacks
- remove [kokkos] from KokkosP names
* Update causal example
- minor tweaks to decrease probability of overlapping regions in binary
* omnitrace-causal update
- prefix N / Ntot in environment printout
* Miscellaneous updates
- causal::finish_experimenting()
- OMNITRACE_CAUSAL_RANDOM_SEED
- KokkosP causal updates
- exclude some callbacks, make some callbacks unique, etc.
- address_range::operator+=(address_range)
- combine contiguous ranges in binary/analysis.cpp when file, func, line is same and address range is contiguous
- bfd_line_info reads inline info
- wait for perform_experiment_impl to complete
- causal::delay updates
- delay::process checks if experiment is active
- uses threading::get_id()
- experiment scales duration up for larger speedup experiments
- line info samples includes excluded lines
- sampler uses CLOCK_REALTIME
- blocking_gotcha updates
- is no longer fully static
- adds audit routine which sets the postblock value to zero if try/timed routine fails
- category::host was added to causal_throughput_categories_t
- pthread_create_gotcha sets new thread's local parent delay
- was using internal value, now uses sequent value
* Causal improvements to KokkosP
* Updates to experiment time scaling
- use stats instead of just max
* binary/link_map.{hpp,cpp}
* update process-causal-json.py
* Folded fileline scope into source scope
* Update documentation
- Add documentation for causal profiling
- Replace 'Omnitrace' with 'OmniTrace' everywhere
* Update causal-helpers.cmake + omnitrace-testing.cmake
- split tests/CMakeLists.txt partially into omnitrace-testing.cmake
* omnitrace/causal.h
- OMNITRACE_CAUSAL_PROGRESS
- OMNITRACE_CAUSAL_PROGRESS_NAMED
- OMNITRACE_CAUSAL_BEGIN
- OMNITRACE_CAUSAL_END
* selected_entry + remove default filters for lambdas and operator()
- selected entry stores range and binary load address
* update process-causal-json.py
* format examples/lulesh/CMakeLists.txt
* causal-helpers find_package(Threads)
* OMNITRACE_KOKKOSP_KERNEL_LOGGER
- was OMNITRACE_KOKKOS_KERNEL_LOGGER
* quiet find of coz-profiler
* Fix rocm_smi exception handling
* Update timemory submodule (binutils)
- fix binutils compile error on some systems
- bump binutils to v2.40
* Fix miscellaneous tests
* OMNITRACE_KOKKOSP_PREFIX
* revert rocm_smi handling
* ElfUtils updates
- default to download version 0.188
- add -Wno-error=null-dereference due to GCC 12 compiler error
* Update causal example
* Remove OMNITRACE_VERBOSE from global workflow envs
* Reliable causal test
* disable compilation of causal perf files
* Remove set_current_selection with unwind stack
* update timemory submodule
* fix for segfault on bionic
- locking in TLS dtor was causing segfault
* remove experiment::is_selected(unwind_stack_t)
* update default init of selected_entry
* Fix for when IP is not offset by load address
* Update CMakeLists.txt
* Miscellaneous updates
- OMNITRACE_WARNING_OR_CI_THROW
- OMNITRACE_REQUIRE
- OMNITRACE_PREFER
- fixed issues with no ASLR
- added load address variable and ipaddr() func to basic/bfd line info
- removed get_basic() from dwarf_line_info
- TIMEMORY_PREFER -> OMNITRACE_PREFER
- removed previously added binary_address and range variables from selected_entry
* Removed superfluous CausalState
* Additional causal tests (lulesh + kokkos)
* filter, prefer, analysis ASLR handling
- removed default filter on cold functions
- fixed OMNITRACE_PREFER
- fixed analysis ASLR handling
* Tweak line-info output
* Removed some superfluous code
- causal/delay
- causal/selected_entry
* Exclude main.cold in function mode
* Update validate-perfetto-proto.py
- account for occasional http errors
* Add sampling test disabling tmp files
* argparser for process-causal-json
- support validation
- support filtering
* Avoid pthread_{lock,unlock} in sampling offload
- use homemade atomic_mutex/atomic_lock since contention will be low and using pthread tools might trigger our wrappers
* Rename process-causal-json.py
- validate-causal-json.py
* rework omnitrace_add_causal_test
- capable of performing validation
- added validation tests
* Fix kokkosp_begin_deep_copy + causal
* Tweak address range in bfd_line_info::read_pc
* Tweak analysis and data IP handling
- look for gaps
* Disable scaling experiment time by speedup
* Revert change in max threads during CI
* binary updates
- significant overhaul of binary analysis implementation
- removed "basic_line_info" and "bfd_line_info" in lieu of "symbol" class
- symbol class has basic BFD info + vector of inlines + vector of dwarf info
* Updated causal to use new binary analysis
- Fix symbol.cpp includes
* Updated formatting target
- include *.cmake files
* Updated causal tests
- causal tests should be stable now
* Update timemory and dyninst submodules
- TPLs are stripped + built w/o debug info
* Increase tolerance for causal validation speedups
- higher speedups have more variance (increased to +/- 5 from 3)
* Support causal output for MPI
- i.e. tag with MPI rank
* omnitrace-causal launcher argument
* improve experiment sampling output
* causal data updates
- call compute lines once
- fixed filtered cached binary info
- debugging info when experiment fails to start
* Tweaked causal validation tests
* dwarf_entry ranges
* CI updates
- increase max threads to 64
* Tweak causal E2E validation tests
- more threads
- shorter thread runtime
- more iterations
* Fix shadowed variable
* fix symbol read_bfd last PC calculation
* fix maybe-uninitialized warning
* omnitrace-causal launcher update
- only inject "omnitrace-causal --" once
- throw error if no matches found
* Update causal profiling docs for launcher
* fix address range boundaries
[ROCm/rocprofiler-systems commit:
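The `containers` bullet describes `static_vector` as a "std::vector bound to std::array". The sketch below illustrates that idea, assuming a vector-like interface over fixed inline storage with no heap allocation. The name `fixed_capacity_vector` and the choice to throw when full are hypothetical and may differ from the project's container.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical illustration: vector-like interface, capacity fixed at compile time.
template <typename T, std::size_t N>
class fixed_capacity_vector
{
public:
    // Appends a value; throws instead of growing when the storage is full.
    void push_back(const T& value)
    {
        if(m_size >= N) throw std::length_error("fixed_capacity_vector is full");
        m_data[m_size++] = value;
    }

    T&       operator[](std::size_t idx) { return m_data[idx]; }
    const T& operator[](std::size_t idx) const { return m_data[idx]; }

    std::size_t           size() const { return m_size; }
    constexpr std::size_t capacity() const { return N; }
    bool                  empty() const { return m_size == 0; }
    void                  clear() { m_size = 0; }

    // Iteration covers only the elements in use, not the full capacity.
    T* begin() { return m_data.data(); }
    T* end() { return m_data.data() + m_size; }

private:
    std::array<T, N> m_data{};
    std::size_t      m_size = 0;
};
```

Because nothing is allocated after construction, a container like this can be filled in contexts where allocation is unsafe, such as a signal handler taking samples. A later bullet mentions tracking the size with an atomic for thread-safe use, which this sketch omits.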
|
||
|
|
5a1cec92e8 |
Various optimizations (#192)
* CDash name prefix {{ repo_owner }}-{{ ref_name }}
- remove /merge from CI name
* disable using BFD when sampling_include_inlines is OFF
- this consumes a lot of memory
* Improve finalization of rocprofiler
* update timemory submodule
- disable OMPT thread begin/end callbacks
- support hierarchies in signal handlers
- update operation::pop_node debugging
- settings_update_type + setting_supported_data_types
- fixed parsing args in timemory_init
* Improve timemory build time
* Remove kokkosp restrictions for perfetto
* omnitrace exe signal handler update
- configure signal handlers before main to allow libomnitrace to override
* Backtrace and timemory submodule updates
- Use unwind::cache w/o inline info
- update timemory submodule
- unwind::cache updates
- filepath updates
- fix termination_signal_message
- fix vsettings::report_change
* Update dyninst submodule
- updates BinaryEdit::getResolvedLibraryPath
* update timemory submodule
- update CpuArch support
* Cleanup configure warnings
* Update examples cmake and workflows
- (Mostly) eliminate configuration warnings
* omnitrace exe updates
- pass environ to BPatch::processCreate
- avoid trailing ":" in DYNINST_REWRITER_PATHS
* Update dyninst submodule
- Add flags to DyninstOptimization.cmake
- Remove strtok from BinaryEdit::getResolvedLibraryPath
* examples/mpi CMakeLists.txt update
- STATUS message about missing MPI during CI, otherwise AUTHOR_WARNING
* Dev build and linker flags
- use -gsplit-dwarf when OMNITRACE_BUILD_DEVELOPER is ON
- disable when OMNITRACE_BUILD_NUMBER > 1
- OMNITRACE_BUILD_LINKER option
- add -fuse-ld=${OMNITRACE_BUILD_LINKER}
- omnitrace_add_cache_option function
* Update workflows to set OMNITRACE_BUILD_NUMBER
* Fix generator expressions for -fuse-ld
* Suppress some configuration warnings during CI
- helps to keep track of real warnings when they arise
* Update timemory and dyninst submodules with CMP0135
* Add -V flag to run-ci script
[ROCm/rocprofiler-systems commit:
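One of the `omnitrace exe updates` bullets mentions avoiding a trailing ":" in `DYNINST_REWRITER_PATHS`. A small sketch of joining path entries so that no leading, trailing, or doubled separator appears is shown below. The function name `join_paths` is hypothetical and is not taken from the project.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: joins non-empty entries with a separator placed only
// between entries, so the result never starts or ends with the separator.
inline std::string
join_paths(const std::vector<std::string>& paths, char sep = ':')
{
    std::string out;
    for(const auto& entry : paths)
    {
        if(entry.empty()) continue;      // skip blanks to avoid "::"
        if(!out.empty()) out += sep;     // separator only between entries
        out += entry;
    }
    return out;
}
```

In colon-separated search paths an empty entry is commonly interpreted as the current directory, which is one reason a stray trailing ":" is worth avoiding.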
|
||
|
|
8f8ead76b5 |
Signal handler backtraces provide line info (#178)
* Signal handler backtraces provide line info
- print backtrace after SIGINT during finalization
* Workflow run-name + jammy rocm CI
* fix jammy matrix indentation
* disable building dyninst in jammy
* Update jammy for rocm
* jammy rocm_agent_enumerator
* Fix rocm install for jammy
* jammy bash
* jammy workflow typo
* revert some changes
* stack-usage + omnitrace-rt symlink + ncclSocketAccept + indiv sigs
- symlink omnitrace-rt in build tree
- exclude ncclSocketAccept
- timemory submodule update accepting individual signal handlers
[ROCm/rocprofiler-systems commit:
|
||
|
|
5973299ccd |
omnitrace-sample (#169)
- `omnitrace-sample` executable which executes sampling (no
instrumentation)
- fixes bug with OMPT ignoring value of `OMNITRACE_USE_OMPT`
- fixes some issues with sampling duration
- new `OMNITRACE_SAMPLING_INCLUDE_INLINES` configuration variable
- restricts process-sampling to 100 interrupts/sec when inheriting value
from `OMNITRACE_SAMPLING_FREQ`
- `OMNITRACE_PROCESS_SAMPLING_FREQ` still supports up to 1000
interrupts/sec
- fixes bug with colorized log not truly being disabled in all instances
- adds tests for `omnitrace-sample`
- adds tests for sampling duration
- setting ROCP_TOOL_LIB to libomnitrace-dl throws an error
- rocprofiler does not configure correctly when this is done
- Quiet numa_gotcha warnings
- Fixed some shadowed variables
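
The two frequency limits described in this message can be illustrated with a small sketch. The limits (100 and 1000 interrupts/sec) and the variable names come from the message. The helper name, the choice to clamp rather than reject out-of-range values, and the `None` fallback are assumptions made for illustration, not the project's actual behavior.

```python
import os

# Limits quoted in the commit message; clamping (vs. erroring) is an assumption.
INHERITED_LIMIT = 100.0   # interrupts/sec when inheriting OMNITRACE_SAMPLING_FREQ
EXPLICIT_LIMIT = 1000.0   # interrupts/sec for OMNITRACE_PROCESS_SAMPLING_FREQ


def effective_process_sampling_freq(env=None):
    """Hypothetical helper: resolve the process-sampling rate from env variables."""
    env = os.environ if env is None else env
    explicit = env.get("OMNITRACE_PROCESS_SAMPLING_FREQ")
    if explicit is not None:
        # An explicitly requested rate may go up to the higher limit.
        return min(float(explicit), EXPLICIT_LIMIT)
    inherited = env.get("OMNITRACE_SAMPLING_FREQ")
    if inherited is not None:
        # A rate inherited from the general sampling setting is capped lower.
        return min(float(inherited), INHERITED_LIMIT)
    return None  # neither variable is set; the default is not stated in this message
```

For example, passing `{"OMNITRACE_SAMPLING_FREQ": "300"}` yields the capped inherited rate, while also setting `OMNITRACE_PROCESS_SAMPLING_FREQ` to `500` takes precedence.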
[ROCm/rocprofiler-systems commit:
|
||
|
|
07e3cf256a |
Resolve warnings/errors with extra warnings (#171)
[ROCm/rocprofiler-systems commit:
|
||
|
|
073ab3882b |
Fix deadlocks during initialization (#167)
- More to come in later commit, below is just tidying some stuff up
- clang-tidy
- mpi_gotcha quiet about not finding funcs
- update to new papi config
- sampling block_samples / unblock_samples
- disable calling component's sample functions within sampler
- release doesn't strip library
- remove HSA and ROCP env variables from modulefile / setup-env
- preliminary support for LD_PRELOAD usage
- default sampling rate is 300 interrupts / second
- fixes various deadlock issues at startup
[ROCm/rocprofiler-systems commit:
|
||
|
|
6227c25220 |
Fix setup-env + hsa/rocm/ompt serialization + testing + misc (#156)
- Fix setup-env.sh
- Closes #149
- omnitrace exe color
- test-install.sh script
- if config variable is updated in config or env, include in generated
config
- metadata for hsa, rocm, and ompt
- Closes #153
- Closes #154
[ROCm/rocprofiler-systems commit:
|
||
|
|
473f452d39 |
Rework sampling and colorized logs (#140)
## Overview
This is a significant PR which has 3 very notable characteristics:
1. Omnitrace colorizes most of its logging
2. Completely reworked the sampling
- Samples now record the current instruction pointers instead of strings
- This _dramatically_ decreases the overhead of taking a sample
- The collection of metrics during a sample is split out into another component, so that data collection can be disabled, which decreases the sampling overhead even further
- When both `OMNITRACE_SAMPLING_CPUTIME` and `OMNITRACE_SAMPLING_REALTIME` are ON:
- `OMNITRACE_SAMPLING_CPUTIME_FREQ` and `OMNITRACE_SAMPLING_REALTIME_FREQ` can be used to individually control the sampling frequency
- `OMNITRACE_SAMPLING_CPUTIME_DELAY` and `OMNITRACE_SAMPLING_REALTIME_DELAY` can be used to individually control the delay time before starting
- Now, omnitrace does not start a real-time sampler on the main thread unless `OMNITRACE_SAMPLING_REALTIME` is ON
- In the future, an `OMNITRACE_SAMPLING_TIDS` (and real-time, cpu-time variants) configuration variable(s) will allow you to select which threads will be sampled
3. Files produced by the `omnitrace` exe (`available-instr.txt`, `instrumented-instr.txt`, etc.) no longer have the `-instr` suffix and are placed in the `instrumentation/` subfolder, e.g. `available-instr.txt` -> `instrumentation/available.txt`
- This helped de-clutter the output folder
Most of the other edits were reorganization (e.g. internal namespace changes), cleanup, and splitting up functionality.
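
For scripts that still look for the old output names, the renaming rule from item 3 can be sketched as follows. The helper name is hypothetical, and the rule is inferred only from the description above (drop the `-instr` suffix, move the file into `instrumentation/`); it is not taken from the omnitrace sources.

```python
from pathlib import PurePosixPath


def new_instrumentation_path(old_name):
    """Hypothetical helper: map an old output name to its new location."""
    old = PurePosixPath(old_name)
    stem = old.stem  # e.g. "available-instr"
    if not stem.endswith("-instr"):
        return old  # not one of the renamed files; leave it untouched
    # Drop the "-instr" suffix and place the file in the instrumentation/ subfolder.
    return old.parent / "instrumentation" / (stem[: -len("-instr")] + old.suffix)
```

Files without the `-instr` suffix are returned unchanged, so the helper can be applied to a whole output listing.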
## Bug Fixes
Fixes a bug in the HSA callbacks that disabled sampling on child threads whenever an HSA API call was made.
## Details
- created thread_info struct for mapping different thread IDs
- reorganized file structure significantly
- added categories.hpp, concepts.hpp
- moved around name trait definitions
- moved all omnitrace components into `omnitrace::component` namespace
- usage was inconsistent between `tim::component` in some places and `omnitrace::component` in others
- added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
- OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
- roctracer and critical_trace use same thread pool
- critical_trace functions do not lock anymore bc of thread-local TaskGroup
- added `component::local_category_region` to support using `component::category_region` without explicitly passing in name
- removed `component::omnitrace` (unused)
- migrated KokkosP and OMPT to use `component::local_category_region`
- removed `component::user_region` as a result
- migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
- removed `component::functors` as a result
- migrated some ppdefs
- `api::omnitrace` -> `project::omnitrace`
- `api::(...)` -> `category::(...)`
- improved recording the execution time of threads
- migrated this functionality out of pthread_create_gotcha and into thread_info
- moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
- split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
- sampling.cpp handles setup and post-processing that was formerly in backtrace
- updated logging to use colors
- `OMNITRACE_COLORIZED_LOG` config variable
- updated docs on JSON output from timemory
- instrumentation info in instrumentation subfolder
- added testing for KokkosP entries
- added testing for ompt entries
- add_critical_trace function defined in critical_trace.hpp
- disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
- add comp::page_rss to main bundle
- thread_data supports std::optional instead of std::unique_ptr
- thread_data supports tim::identity<T> to avoid unique_ptr or optional
- tracing::record_thread_start_time()
- tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
- removed anonymous namespace from omnitrace::utility
- sampling backtrace stores instruction pointers instead of strings
- component::category_region updates
- handle disabled thread state
- handle finalized state
- fewer debug messages
- invoke thread_init()
- invoke thread_init_sampling()
- handle push/pop count based on category
- push/pop count only modified when used
- component::cpu_freq
- components/ensure_storage.hpp
- reworked the pthread_create replacement function
- updated parallel-overhead example to report # of times locked
- OMNITRACE_MAX_UNWIND_DEPTH build option
- update timemory submodule
[ROCm/rocprofiler-systems commit:
|
||
|
|
f642813ad1 |
Static libstdcxx and python (#139)
Support python + static libstdc++
[ROCm/rocprofiler-systems commit:
|
||
|
|
f2bcca54f6 |
Fix warnings + Werror (#101)
- Fix warnings via OMNITRACE_BASIC_VERBOSE and OMNITRACE_BASIC_VERBOSE_F
[ROCm/rocprofiler-systems commit:
|
||
|
|
d009fc24a6 |
Standalone build examples + testing workflow updates (#15)
* Update examples to support standalone builds
* Tweak to ubuntu-focal-external workflow
- disable PAPI
* ubuntu focal external workflow update
- GCC 11
- Test static libgcc + static libstdcxx + strip
- ubuntu-toolchain-r/test
* Improve build-release.sh
- command line args for lto, strip, perfetto-tools,
static-libgcc, static-libstdcxx, hidden-visibility,
max-threads, parallel
* Update VERSION to 1.0.1
* Fixes to LTO build
* Updates to ubuntu-focal-external workflow
* build-release.sh update
- enable static libstdcxx by default
* disable python + static libstdcxx
* ubuntu-focal-external updates
* build-release.sh disable static libstdcxx by default
* cmake-format
[ROCm/rocprofiler-systems commit:
|
||
|
|
0d5f0fb9cf |
Support for tracing mutex locking (#52)
* Parallel overhead example with locks
* Support tracing mutex locking + more
- support wrapping pthread_mutex_lock
- support wrapping pthread_mutex_unlock
- support wrapping pthread_mutex_trylock
- get_perfetto_combined_traces setting
- OMNITRACE_TRACE_THREAD_LOCKS option
- ThreadState
- critical trace includes queue id
- enabled/disabled settings in timemory
- fix OMNITRACE_TIMEMORY_COMPONENTS
- fix reading config
- fix setting categories
- applied ThreadState::Internal in various places
- utility::get_filled_array
- utility::get_reserved_vector
- utility::get_thread_index
- fork_gotcha messages about forks
- split out some pthread_gotcha functionality into pthread_create_gotcha
- handle queue id in roctracer callbacks
* Update timemory and PTL submodules
* Misc CMake updates
- Includes fix to omnitrace-static-lib{gcc,stdcxx}
* Misc cleanup to pthread_mutex_gotcha and backtrace
* Fix to duplicate field in module_function json
* Improvement to debug messages
* omnitrace-dl and common improvements
- tweak to delimit
- common::ignore message
- common::join quoting of strings
- omnitrace_set_env ignores if inited and active
- omnitrace_set_mpi ignores if inited and active
* nsync for transpose example
* Fix to thread_deleter<void> functor invoke
* Fix thread state and HIP stream enums
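
The idea of wrapping lock calls so that time spent waiting on a mutex becomes visible can be illustrated with a rough analogue. This is not how omnitrace does it: the real change wraps `pthread_mutex_lock`, `pthread_mutex_unlock`, and `pthread_mutex_trylock` in native code, whereas the `TracedLock` class below is a hypothetical Python stand-in that only demonstrates the wrapping pattern.

```python
import threading
import time


class TracedLock:
    """Illustrative analogue only: records how long each acquire() waited."""

    def __init__(self):
        self._lock = threading.Lock()
        self.wait_times = []  # seconds spent blocked in acquire()

    def acquire(self):
        start = time.perf_counter()
        self._lock.acquire()  # forward to the real lock
        self.wait_times.append(time.perf_counter() - start)

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, *exc):
        self.release()
```

Using it as `with TracedLock() as lock: ...` appends one entry to `wait_times` per acquisition, which is the kind of per-lock record a tracer would turn into trace events.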
[ROCm/rocprofiler-systems commit:
|
||
|
|
6daac0f60c |
Python support (#37)
* Initial python support
* Add python testing
* Increase timeout for bin tests
* cmake-format
* Valid build types + testing + formatting + more
- Enforce valid build types
- Fix to numpy install
- Increase testing timeout
- Fix to cmake format glob
- Fix to backtrace verbose
* Disable stripping libraries by default
* omnitrace exe updates
- new '--print-instructions' option
- changed format of instructions in JSON
- remove no-save-fpr tests
* Default to strip libraries when release build
[ROCm/rocprofiler-systems commit:
|
||
|
|
2403bbde49 |
Stability improvements (#26)
* omnitrace verbprintf and errprintf
* avail categories fix
* omnitrace-dl namespace
* OMNITRACE_CI macro / OMNITRACE_BUILD_CI option
- always enables asserts
* Roctracer improvements
- Reworked roctracer significantly
- Added categories to settings
- create_cpu_cid_entry
- handle clock_skew in roctracer
- fixed roctracer activity names
- hip_api_callback is "host"
- perfetto::Flow for GPU
* timemory submodule update
* Tweak to redirect
* Improved recursive guards
- functors component
- created "_hidden" variants of instrumentation funcs
- omnitrace_* calls omnitrace_*_hidden
- omnitrace-dl calls non-hidden
- omnitrace-dl now strongly protects against recursion
- omnitrace-dl now is standalone w.r.t. headers
* Stability fixes
- OMNITRACE_DEBUG_PUSH env variable
- fix to HSA_TOOLS_LIB in dl.cpp
- Fixed SFINAE warning in mpi_gotcha
- Handle 64, _l, _r extensions in whole function names
* cmake formatting
* Fix for last commit + push/pop count info
- don't instrument rocr::core::Signal::WaitAny
- don't instrument rocr::core::Runtime::AsyncEventsLoop
- fixed main not being popped in runtime instrument
- updated interval data reserve
- copy hash-ids and aliases onto main thread
- warn about unclosed regions
- removed guards in libomnitrace
- added error checks for incorrect push_count vs. pop_count
- fixed missing pop_timemory in last commit
* Finalization methodology updates
- added some more rocr:: functions to whole function names
* Add event_base_loop to whole functions
* Update VERSION to 0.1.0
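
The push/pop bookkeeping mentioned in this message (error checks for mismatched `push_count` vs. `pop_count`, warnings about unclosed regions) can be sketched as below. The class and method names are hypothetical and the exact checks in omnitrace may differ; this only shows the counting technique.

```python
class RegionCounter:
    """Hypothetical stand-in for per-thread push/pop bookkeeping."""

    def __init__(self):
        self.push_count = 0
        self.pop_count = 0

    def push(self, name):
        self.push_count += 1

    def pop(self, name):
        # A pop without a matching push indicates an instrumentation error.
        if self.pop_count >= self.push_count:
            raise RuntimeError(f"pop of '{name}' without a matching push")
        self.pop_count += 1

    def unclosed(self):
        """Number of regions pushed but never popped."""
        return self.push_count - self.pop_count
```

At finalization, a nonzero `unclosed()` would be the trigger for a warning about unclosed regions.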
[ROCm/rocprofiler-systems commit:
|
||
|
|
b4a82711d1 |
Sampler improvements (#22)
* Sampler improvements
- roctracer_flush_activity
- papi_array in backtrace
- fixed sampler trait specializations
- split main_bundle into main and gotcha bundles
- cmake option display
* timemory update
* EINTR handling + debug_{pid,tid}
- sampler handles EINTR for sem_init and sem_destroy
- OMNITRACE_DEBUG_{TIDS,PIDS} env variables
* Increase waitForStatusChange
[ROCm/rocprofiler-systems commit:
|
||
|
|
9d5ebf9c3b |
Rename hosttrace to omnitrace (#18)
[ROCm/rocprofiler-systems commit:
|
||
|
|
efb6d766af |
Reorganization and critical trace support (#17)
* Roctracer wall clock integration (#16)
* Integrates roctracer values into wall-clock
* Fixed scoping + timemory roctracer
* Fixed data race in roctracer
* Synchronized HIP API on main thread
- Cache hip activity callbacks and execute on main thread
- Minor updates to transpose
* Debugging + MPI + transpose updates
* PTL + HSA and timemory + kernel timing
- PTL usage fixed HSA + timemory issues bc we could control the thread destruction
- Fixed laps counting in roctracer callbacks
* Ignore select HIP API types
- These API types are ignored because of an apparent bug
that causes the "end" callback to be labeled as "begin"
- hipDeviceEnablePeerAccess
- hipImportExternalMemory
- hipDestroyExternalMemory
* Tweaks to PTL config
* Timemory update + pid-prefix w/ mpi headers
- %pid%- prefix with mpi headers
- timemory submodule update
* CMake + critical trace + reorganize library source
- clang-tidy tweaks
- cmake function updates to use hosttrace_ prefix
- update gitignore
- cmake HOSTTRACE_MAX_THREADS option
- Formatting.cmake
- cleaned up MacroUtilities.cmake
- PTL submodule + usage
- tweak to Findroctracer.cmake
- MT transpose
- Updated PTL submodule
- Updated timemory submodule
- fix to hosttrace return value type if type not found
- reorganized library source code
- support for critical trace
* Remove bits/stdint-uintn.h headers
* Rename + config + depth + critical path
- rename hosttrace_timemory_data to instrumentation_bundles
- rename hosttrace_bundle_t to main_bundle_t
- rename bundle_t to instrumentation_bundle_t
- rework of configuration setup
- critical_trace write directly to file option
- tweaked depth calculation
- updated timemory submodule
- improved parallel support in roctracer callbacks
- working critical_trace
- perfetto device-critical-trace and host-critical-trace categories
- made transpose example parallel
- made parallel-overhead example a bit uneven
- relocated LTO activation
* Fixed duplicates in perfetto critical-trace
* reworked critical trace support
- substantial perf improvement (30-45 min -> 30 sec)
- changes to configuration (new and removed options)
* Removed "%pid%-" output prefix in mpi_gotcha
* Update timemory submodule
[ROCm/rocprofiler-systems commit:
|
||
|
|
f3e7a1664a |
cmake-format + miscellaneous tweaks (#13)
* cmake-format + miscellaneous tweaks
* Formatted cmake in examples and tests
* Updated linux-ci.yml artifacts naming
* Updated clang-format
* Fixed submodule branches
[ROCm/rocprofiler-systems commit:
|
||
|
|
055a3fba87 |
Updated documentation + misc (#3)
- tweaked some CMake option names
- moved merge-trace.jl to hosttrace-merge.jl
- removed Windows line endings from hosttrace-merge.jl
- improved handling of !perfetto and !timemory
[ROCm/rocprofiler-systems commit:
|
||
|
|
581c4122aa |
Hosttrace via Dyninst
- complete with ctest support
[ROCm/rocprofiler-systems commit:
|