eaaec2cc3beea70467f2241a283a1b7e67dde471
5 Commits

d07bf508a9 Rename Omnitrace to ROCm Systems Profiler (#4)
The Omnitrace program is being renamed.
- Full name: "ROCm Systems Profiler"
- Package name: "rocprofiler-systems"
- Binary / Library names: "rocprof-sys-*"

---------

Co-authored-by: Xuan Chen <xuchen@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>

518c83e0f9 Dynamic expansion of thread data (#294)
* Tests for exceeding OMNITRACE_MAX_THREADS
- tests that exceed the OMNITRACE_MAX_THREADS value for thread creation
* CMake Formatting.cmake update
- include source files in /tests/source directory
* Add unknown-hash= to OMNITRACE_ABORT_FAIL_REGEX
- fail if a timemory hash is not resolved to a name
* Tests for exceeding OMNITRACE_MAX_THREADS
- update
* omnitrace-sample update
- remove env disabling of critical-trace and process-sampling
* core library update
- make_unique in concepts.hpp
- add OMNITRACE_USE_ROCM_SMI to "process_sampling" category
- remove forced disabling of critical-trace in sampling mode
- parentheses for OMNITRACE_PREFER
- use tim::get_hash_id instead of tim::get_combined_hash_id
* core library update (containers)
- added aligned_static_vector.hpp
- similar to static_vector.hpp but attempts to align to cache line size
- alignment template parameter for stable_vector
- added missing aliases in static_vector
- consistent with aligned_static_vector aliases
* thread_info update
- track the peak number of threads created
- thread_info::get_peak_num_threads() returns the peak number of threads
* thread_data update
- generic thread_data inherits from base_thread_data
- thread_data reworked to support dynamic expansion
- base_thread_data updated to invoke private_instance() function
- thread_data<optional<T>> uses stable_vector aligned to cache line width
- thread_data<identity<T>> uses stable_vector aligned to cache line width
- thread_data for optional and identity provide a private private_instance() function and declare base_thread_data as a friend
- component_bundle_cache<T> is now thread_data<component_bundle_cache_impl<T>>
* causal update
- thread_data<T>::instances -> thread_data<T>::instance(construct_on_thread{ ... })
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
- tim::get_combined_hash_id -> tim::get_hash_id
- update progress_bundle usage to new thread_data API
* backtrace/backtrace_metrics component update
- backtrace_metrics update
- update to new thread_data API
- add thread CPU time row in perfetto
- fix potential bug when rusage categories are disabled
- fix bug in operator-= not subtracting cpu time of rhs
- backtrace update
- skip all child call-stack frames below 'tim::openmp::' if sampling_keep_internal = false
* pthread_gotcha component update
- pthread_gotcha::shutdown() invokes pthread_create_gotcha::shutdown()
* pthread_create_gotcha component update
- minor tweak to {start,stop}_bundle functions: pass in thread id
- update to new thread_data API
- track native handles of internal threads
- implement a system using pthread_kill to stop dangling bundles
* rocprofiler/roctracer component update
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
* critical trace (library) update
- update to new thread_data API
- tim::get_combined_hash_id -> tim::get_hash_id
* coverage update
- update to new thread_data API
* tasking update
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
* roctracer update
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
* rocm_smi update
- update to new thread_data API
* runtime.cpp update
- update to new thread_data API
* sampling.cpp update
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
* ompt.cpp update
- invoke pthread_gotcha::shutdown before invoking OMPT finalize function
- this prevents signals from being delivered to OpenMP threads
* tracing.hpp and tracing.cpp update
- replace get_timemory_hash_{ids,aliases} functions with copy_timemory_hash_ids function
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
- tim::get_combined_hash_id -> tim::get_hash_id
- improvements to, and error checking in, the thread_init function
* library.cpp update
- move copying timemory hash id/aliases to tracing.cpp
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
* Update BuildSettings.cmake
- add -Wno-interference-size to suppress the warning about use of std::hardware_destructive_interference_size
* Update fork example
- improve scheme for waiting on child processes via waitpid instead of wait
- support running main routine multiple times
- push/pop regions in child process
* Update lib/common/defines.h.in
- allow users to specify miscellaneous values via -D <name>=<value>
- OMNITRACE_CACHELINE_SIZE
- OMNITRACE_CACHELINE_SIZE_MIN
- OMNITRACE_ROCM_MAX_COUNTERS
- remove unused defines
- OMNITRACE_ROCM_LOOK_AHEAD
- OMNITRACE_MAX_ROCM_QUEUES
* Update rocprofiler.hpp
- OMNITRACE_MAX_ROCM_COUNTERS -> OMNITRACE_ROCM_MAX_COUNTERS
* Update aligned_static_vector
- set cacheline_align_v from max of OMNITRACE_CACHELINE_SIZE and OMNITRACE_CACHELINE_SIZE_MIN
* Update tracing.cpp
- acquire locks for updating main hash ids/aliases
- only propagate ids/aliases when finalizing
* Update pthread_create_gotcha.cpp
- make sure hash for "start_thread" exists on main thread
* Update causal end to end tests
- if OMNITRACE_BUILD_NUMBER is 1, set OMNITRACE_VERBOSE=0
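
The cache-line alignment idea behind aligned_static_vector, with cacheline_align_v taken as the maximum of OMNITRACE_CACHELINE_SIZE and OMNITRACE_CACHELINE_SIZE_MIN, could be sketched roughly as follows. This is a hypothetical illustration, not the project's container: `kCachelineSize`, `kCachelineSizeMin`, `aligned_slot`, and `aligned_static_vector_sketch` are made-up names, and 64 bytes is an assumed value.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for OMNITRACE_CACHELINE_SIZE and
// OMNITRACE_CACHELINE_SIZE_MIN; 64 bytes is an assumed value.
constexpr std::size_t kCachelineSize    = 64;
constexpr std::size_t kCachelineSizeMin = 64;

// Alignment is the larger of the two sizes, as the commit message describes.
constexpr std::size_t cacheline_align_v = std::max(kCachelineSize, kCachelineSizeMin);

// Each element is padded out to a full cache line, so writes to
// neighboring slots from different threads do not share a line
// (avoiding false sharing).
template <typename Tp>
struct alignas(cacheline_align_v) aligned_slot
{
    Tp value{};
};

// Minimal fixed-capacity vector whose elements are cache-line aligned.
template <typename Tp, std::size_t N>
class aligned_static_vector_sketch
{
public:
    using value_type = Tp;
    using size_type  = std::size_t;

    void push_back(const Tp& v)
    {
        assert(m_size < N && "capacity exceeded");
        m_data[m_size++].value = v;
    }

    Tp&       operator[](size_type i) { return m_data[i].value; }
    const Tp& operator[](size_type i) const { return m_data[i].value; }
    size_type size() const { return m_size; }
    static constexpr size_type capacity() { return N; }

private:
    std::array<aligned_slot<Tp>, N> m_data{};
    size_type                       m_size = 0;
};
```

The trade-off is memory: each slot occupies a full cache line even for a small `Tp`, in exchange for threads not contending on shared lines.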
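
The switch from looping over the compile-time max_supported_threads to looping over thread_info::get_peak_num_threads(), together with thread_data that expands dynamically, can be illustrated with the hedged sketch below. An atomic counter records the peak thread count. Per-thread slots live in fixed-size chunks that are appended on demand, so references to existing slots stay valid when storage grows. `peak_thread_counter`, `chunked_thread_data`, and the chunk size of 8 are hypothetical and are not omnitrace's stable_vector or thread_data API.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Tracks the highest thread count seen so far (an atomic "max" update).
class peak_thread_counter
{
public:
    void record(std::size_t num_threads)
    {
        std::size_t cur = m_peak.load(std::memory_order_relaxed);
        // Retry until we publish a larger value or observe one >= ours.
        while(num_threads > cur &&
              !m_peak.compare_exchange_weak(cur, num_threads, std::memory_order_relaxed))
        {
        }
    }
    std::size_t get_peak() const { return m_peak.load(std::memory_order_relaxed); }

private:
    std::atomic<std::size_t> m_peak{ 0 };
};

// Per-thread slots stored in fixed-size chunks. Growing appends a new
// chunk instead of reallocating, so existing slot references stay valid.
template <typename Tp, std::size_t ChunkSize = 8>
class chunked_thread_data
{
public:
    // Returns the slot for thread index `tid`, expanding storage if needed.
    Tp& instance(std::size_t tid)
    {
        std::lock_guard<std::mutex> lk{ m_mutex };
        while(tid >= m_chunks.size() * ChunkSize)
            m_chunks.emplace_back(std::make_unique<std::array<Tp, ChunkSize>>());
        m_peak.record(tid + 1);
        return (*m_chunks[tid / ChunkSize])[tid % ChunkSize];
    }

    // Visits slots up to the peak thread count, not a compile-time maximum.
    template <typename FuncT>
    void for_each(FuncT&& func)
    {
        std::lock_guard<std::mutex> lk{ m_mutex };
        assert(m_peak.get_peak() <= m_chunks.size() * ChunkSize);
        for(std::size_t i = 0; i < m_peak.get_peak(); ++i)
            func(i, (*m_chunks[i / ChunkSize])[i % ChunkSize]);
    }

    std::size_t get_peak_num_threads() const { return m_peak.get_peak(); }

private:
    std::mutex                                              m_mutex;
    std::vector<std::unique_ptr<std::array<Tp, ChunkSize>>> m_chunks;
    peak_thread_counter                                     m_peak;
};
```

Chunked growth is one way to get the "stable" property; a plain `std::vector<Tp>` would invalidate references on reallocation.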
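
The fork-example change from `wait` to `waitpid` matters because `wait` reaps whichever child exits first, while `waitpid` waits on one specific PID. A minimal POSIX sketch, assuming Linux; `run_children` is a hypothetical helper, not the example's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Forks `n` children; child i exits with status i + 1. The parent waits on
// each child by PID, so every status is matched to the right child regardless
// of the order in which the children finish.
// Returns exit codes in fork order, or -1 for a child that failed.
std::vector<int> run_children(std::size_t n)
{
    assert(n < 255 && "exit statuses must fit in 8 bits");
    std::vector<pid_t> pids;
    for(std::size_t i = 0; i < n; ++i)
    {
        pid_t pid = fork();
        if(pid == 0) _exit(static_cast<int>(i + 1));  // child: exit immediately
        pids.push_back(pid);                          // parent: -1 if fork failed
    }

    std::vector<int> codes;
    for(pid_t pid : pids)
    {
        int status = 0;
        // waitpid() reaps this specific child; wait() would reap any child.
        if(pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            codes.push_back(WEXITSTATUS(status));
        else
            codes.push_back(-1);
    }
    return codes;
}
```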

3e2fa69a14 CI timeout + line-info in releases (#279)
* Update perfetto args.gn.in
- remove enable_perfetto_tools_trace_to_text (unused)
* core timeout implementation
- requires OMNITRACE_CI=ON
- requires OMNITRACE_CI_TIMEOUT=<sec>
- adds pthread_self and std::this_thread::get_id to thread info
- pthread_create_gotcha stores native handles (pthread_self)
* Testing updates
- improve detection of segfaults/failures when PASS_REGEX exists
- add OMNITRACE_CI_TIMEOUT env variable to all tests
* Line-info in releases
- e.g. -g1, plus more options to minimize the size of debug info
* Fix typo in config exit action message
* OMNITRACE_UNLIKELY around debug/verbose messages
* format fixes
* Overflow tests + capability check
* transpose example update
- link to threads library
* roctracer/rocprofiler update
- in ROCm 5.5.0, cannot include rocprofiler.h and roctracer.h in the same file due to conflicting enum definitions
- Moved HSA tracing setup/shutdown to component::roctracer
* roctracer update
- fix definition of roctracer::setup when disabled
* Update fork example
- detach threads on main PID
- flush io outputs when printing info
* Update overflow tests
- pass regular expressions
- overflow on PERF_COUNT_SW_CPU_CLOCK event
* fork gotcha update
- use getpid() instead of getppid()
* update fork example
- wait on threads calling fork
* timeout update
- wait on timeout thread to launch before proceeding
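
The core timeout (a limit in seconds from OMNITRACE_CI_TIMEOUT, plus waiting for the timeout thread to launch before proceeding) might look roughly like this. It is a sketch under assumptions: `read_timeout_seconds` and `timeout_guard` are hypothetical names, and because the commit does not say what happens on expiry, the sketch simply invokes a caller-supplied callback.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdlib>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Reads a timeout in seconds from the environment; 0 means "unset or invalid".
inline long read_timeout_seconds(const char* env_name)
{
    const char* val = std::getenv(env_name);
    if(val == nullptr) return 0;
    char* end  = nullptr;
    long  secs = std::strtol(val, &end, 10);
    return (end != val && secs > 0) ? secs : 0;
}

// Watchdog: calls `on_timeout` unless cancel() runs within `timeout`.
// The constructor blocks until the watchdog thread has started.
class timeout_guard
{
public:
    timeout_guard(std::chrono::milliseconds timeout, std::function<void()> on_timeout)
    {
        assert(timeout.count() >= 0);
        std::promise<void> launched;
        std::future<void>  started = launched.get_future();
        m_thread = std::thread(
            [this, timeout, cb = std::move(on_timeout), p = std::move(launched)]() mutable {
                std::unique_lock<std::mutex> lk{ m_mutex };
                p.set_value();  // signal that the watchdog is running
                // false only if the timeout elapsed without cancel()
                const bool cancelled =
                    m_cv.wait_for(lk, timeout, [this] { return m_cancelled; });
                lk.unlock();
                if(!cancelled) cb();
            });
        started.wait();
    }

    void cancel()
    {
        {
            std::lock_guard<std::mutex> lk{ m_mutex };
            m_cancelled = true;
        }
        m_cv.notify_all();
    }

    ~timeout_guard()
    {
        cancel();
        if(m_thread.joinable()) m_thread.join();
    }

    timeout_guard(const timeout_guard&)            = delete;
    timeout_guard& operator=(const timeout_guard&) = delete;

private:
    std::mutex              m_mutex;
    std::condition_variable m_cv;
    bool                    m_cancelled = false;
    std::thread             m_thread;
};
```

Moving the promise into the thread's lambda keeps it alive for as long as the watchdog runs, so the constructor can return safely once the thread has signaled that it started.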

846301bcaf Address and thread sanitizer fixes (#250)
* Address and thread sanitizer fixes
- Fix compilation with clang
- Tweak perfetto copy to build tree
- Added suppression files to scripts
- fix LD_PRELOAD support in omnitrace-causal and omnitrace-sample
- use spin_mutex and spin_lock from timemory instead of atomic_mutex and atomic_lock
- state uses atomic
- fix some memory leaks
- tweak testing
- mpi tests do not use preload
- increase timeout when using sanitizers
- add env LD_PRELOAD when using sanitizers
* Tweak perfetto build
* Update timemory submodule
* Update version to 1.8.1
* Update omnitrace-leak.supp
* Update timemory submodule
- fixed spin_mutex implementation
* Remove previously added addr_space->allowTraps(instr_traps)
- this appears to cause errors during binary rewrite
* causal testing updates
- relaxed causal validation on CI systems (to account for hyperthreading decreasing prediction)
- improved impact calculation
- other general improvements to validate-causal-json.py
* Improve fork handling for perfetto
- numerous updates changing perfetto:: to ::perfetto::
- added perfetto_fwd.hpp
* Updated fork example
- user API for validation that stopping/starting perfetto is valid
* Misc fixes to perfetto + fork support
- tweak regions in fork example
- handle disabling tmp files
- get rid of stop/start with perfetto before/after fork
- fixed sampling support during fork
- tweak env of fork test
* Fix find_package in build-tree
* Fix buildtree export
* Fix buildtree export
* Restructured ConfigInstall before adding examples
* Guard against creating tmp file in sampling when disabled
* Fix buildtree package
* formatting
* exit handlers on child processes
- quick exit to avoid perfetto cleanup
* Further tweaking of causal tests for reliability
- enable PROCESSOR_AFFINITY
- decrease to 5 iterations
* Further tweaking of causal tests for reliability
- disable PROCESSOR_AFFINITY for fast func e2e tests
- enabling affinity results in (valid) speedup predictions greater than zero
* Fixes to fork handling
- use pthread_atfork for redundancy if fork_gotcha fails
* cmake formatting
* Fix fork init settings + install components
- remove dl from PROJECT_BUILD_TARGETS
* Testing tweaks
- fix mpi-binary-rewrite-run regex when OMNITRACE_VERBOSE set > 1 in env
- increase causal e2e iterations to 8
* Fix "Test User API"
- test-find-package.sh included dl component
* Further tweaks to causal validation
- further considerations of variance
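
Registering fork handlers with `pthread_atfork`, as a fallback in case the fork gotcha (function wrapper) does not intercept `fork`, can be sketched as follows. The handler bodies and the `fork_state` counters are hypothetical placeholders; the commit only says pthread_atfork is used for redundancy, not what the handlers do.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical bookkeeping updated by the fork handlers.
namespace fork_state
{
std::atomic<int> prepare_calls{ 0 };
std::atomic<int> parent_calls{ 0 };
std::atomic<int> child_calls{ 0 };
}  // namespace fork_state

extern "C" void on_prepare() { ++fork_state::prepare_calls; }  // parent, before fork
extern "C" void on_parent() { ++fork_state::parent_calls; }    // parent, after fork
extern "C" void on_child() { ++fork_state::child_calls; }      // child, after fork

// Registers the handlers exactly once; returns true on success.
bool install_fork_handlers()
{
    static const bool ok = (pthread_atfork(on_prepare, on_parent, on_child) == 0);
    return ok;
}

// Forks a child that exits with status 0 if its child handler ran.
// Returns the child's exit status, or -1 on error.
int fork_and_check_child_handler()
{
    pid_t pid = fork();
    if(pid == 0) _exit(fork_state::child_calls.load() > 0 ? 0 : 1);
    if(pid < 0) return -1;
    int status = 0;
    if(waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}
```

Because `pthread_atfork` handlers run for every `fork()` call made through libc, they fire even when a symbol-level wrapper is bypassed. The flip side is that handlers cannot be unregistered, which is why the sketch registers them once.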

32b15fe7b7 Handle fork in target application (#191)
* Always print PID in log messages
* omnitrace-dl updates
- omnitrace_preload does not call omnitrace_init or omnitrace_init_tooling
- omnitrace_preload calls omnitrace_set_mpi only if OMNITRACE_USE_MPI or OMNITRACE_USE_MPIP is true in the env; calling it otherwise would either override OMNITRACE_USE_PID (when true) or disable mpip at initialization (when false), and the MPI init can be caught later to override OMNITRACE_USE_PID
* config updates
- set_setting_value sets user update type
- remove volatile from get_settings_configured
- don't override settings::default_process_suffix
- don't kill process in omnitrace_exit_action
- set_state ignores state updates once the state is >= State::Finalized
* Handle state > State::Finalized
* fork gotcha updates
- unsets LD_PRELOAD
- sets OMNITRACE_ROOT_PROCESS
- sets OMNITRACE_CHILD_PROCESS
* libomnitrace library.cpp updates
- basic_bundle for fini metrics
- handle finalization from child process
* sampling updates
- sampling::shutdown handles the child-process case
* Add example and test using fork
* Update run-ci script to support not submitting
* Tweak test envs
* Update build flags when codecov enabled
* remove unnecessary includes of sampling header
* Replace mpi copy/fini static lambda with free-funcs
* Update codecov job
* Fix OMPT segfaults after finalization
* Miscellaneous updates after rebase
* fixes for causal profiling
* revert some run-ci.sh changes
* Disable storing env in sampling::shutdown
* formatting fix
* Update timemory submodule
- fixed occasional synchronization issues with allocator offloading
- exclude protozero:: from internal samples
* improve root/child process detection
- avoid omnitrace_finalize in MPI when running in a child process
- revert some testing tweaks
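
The fork gotcha's environment adjustments (unset LD_PRELOAD, set OMNITRACE_ROOT_PROCESS and OMNITRACE_CHILD_PROCESS) could be sketched as below. The commit names the variables but not their values or exactly when they are written. Storing the root PID in OMNITRACE_ROOT_PROCESS, writing "true" to OMNITRACE_CHILD_PROCESS, and calling this in the child right after `fork()` are all assumptions for illustration. `prepare_child_environment` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper, intended to run in a child process right after fork().
// ASSUMPTION: the values written here are illustrative only; the commit
// states just that LD_PRELOAD is unset and that the two variables are set.
bool prepare_child_environment(pid_t root_pid)
{
    assert(root_pid > 0);
    // Unsetting LD_PRELOAD only affects programs this process later exec()s;
    // libraries that are already loaded stay loaded.
    if(unsetenv("LD_PRELOAD") != 0) return false;
    const std::string root = std::to_string(root_pid);
    return setenv("OMNITRACE_ROOT_PROCESS", root.c_str(), 1) == 0 &&
           setenv("OMNITRACE_CHILD_PROCESS", "true", 1) == 0;
}
```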