f4e27d8aeec3595130362fcb0be99bf86aa74ce8
3 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
39f17ae8b8 |
rocm-smi and KokkosTools support (#23)
* renamed omnitrace_thread_data to thread_data
* initial implementation
* Numerous fixes and updates
- Updated timemory submodule
- Updated perfetto submodule (pulls in fixes for TRACE_EVENT)
- pthread_gotcha only after omnitrace_init_tooling
- omnitrace banner
- config settings for rocm-smi freq and devices
- critical_trace::get_entries
- OMNITRACE_BASIC_PRINT
- rocm_smi perfetto category
- redirect roctracer warnings for ROCm 4.5.0
- property specializations for rocm-smi components
- units fixes data_tracker types
- roctracer entries for pthread_create and start_thread
- omnitrace-avail defaults to settings, not components
- settings have conforming names
- settings warn about duplicates
- ptl named threads
- decreased max freq for sampler SIGALRM
- rocm-smi names thread
- rocm-smi avoids call to hipGetDeviceCount
- name roctracer activity callback threads
- fixed binary rewrite test output names
* Update lulesh example
- supports non-UVM GPU
* Lulesh tweaks + formatting
* KokkosP + Mode + Roctracer sampling deadlock fix
- kokkosp support
- omnitrace_init_library
- config::print_settings()
- config::get_mode()
- omnitrace::Mode
- omnitrace-avail improvements (removes settings)
- handle get_verbose() < 0
- disable dyninst InstrStackFrames by default
- handle perf_event_paranoid > 1 by disabling PAPI
- SIGALRM max freq to 5.0
- Name threads
- rocm-smi handles get_use_perfetto() and get_use_timemory()
- HSA_ENABLE_INTERRUPT=0 when roctracer + sampling (fixes deadlock)
* Tests, API renaming, roctracer
- disable renaming of thread 0
- verbprintf_bare
- enable dyninst merge tramp
- tweaked some omnitrace exe verbose levels
- reworked roctracer::setup and roctracer::shutdown
- rocm_smi::data::poll checks get_state()
- omnitrace_trace_finalize -> omnitrace_finalize
- omnitrace_trace_init -> omnitrace_init
- omnitrace_trace_set_env -> omnitrace_set_env
- omnitrace_trace_set_mpi -> omnitrace_set_mpi
- sampling mode does not disable timemory
- disable roctracer before shutting down rocm-smi
- lulesh tests w/ and w/o kokkosp
- lulesh tests for perfetto only
- with --dynamic-callsites --traps --allow-overlapping
- lulesh tests for timemory only
- with --stdlib --dynamic-callsites --traps --allow-overlapping
* Update timemory submodule
- fix for TIMEMORY_PROPERTY_SPECIALIZATION
* get_verbose() handling + timemory submodule update
- Findroctracer.cmake uses find_package(hsakmt)
* Stability fixes + rework roctracer + perfetto
- reworked roctracer start up
- critical_trace perfetto basic values
- perfetto sampling category
- sampler checks signals
- peak_rss in sampling
- pthread_gotcha::shutdown()
- rocm_smi::device_count()
- HSA_TOOLS_LIB is set
- HSA_ENABLE_INTERRUPT in omnitrace exe
- omnitrace exe verbosity level changes
- Avoid instrumenting Impl ns in Kokkos
- gpu::device_count prefers rocm_smi instead of hip
- ptl blocks signals
- fixed pthread_gotcha roctracer_data values
- removed runtime-instrument-sampling tests
- timemory submodule update
* cmake formatting
* timemory + roctracer updates
- fix timemory issue with papi_common
- fix timemory issue with units
- define roctracer::is_setup()
* Miscellaneous tweaks
- Disable sampling during runtime instrument
- Fixed warnings about dynamic callsites
- Fixed backtrace output when timemory disabled
- Test tweaks
* cmake-format
* omnitrace_target_compile_definitions
* timemory submodule update
* config, omnitrace, State, mpi_gotcha updates
- use OMNITRACE_THROW instead of direct throw
- is_attached()
- is_binary_rewrite()
- get_is_continuous_integration()
- get_debug_init()
- get_debug_finalize()
- max_thread_bookmarks default to 1
- State::Init
- app_thread oneTimeCode
- runtime instrumentation uses waitpid
- fixed init_names
- include main in MPI runs
- fixed sampling setup when disabled
- reworked mpi_gotcha
- disabled critical trace in transpose test
* cmake-format
* handle rocm_smi::device_count() exception
* CI timeouts
* Re-enable runtime-instrument + sampling
|
||
|
|
752424efc2 |
Reorganization and critical trace support (#17)
* Roctracer wall clock integration (#16)
* Integrates roctracer values into wall-clock
* Fixed scoping + timemory roctracer
* Fixed data race in roctracer
* Synchronized HIP API on main thread
- Cache hip activity callbacks and execute on main thread
- Minor updates to transpose
* Debugging + MPI + transpose updates
* PTL + HSA and timemory + kernel timing
- PTL usage fixed HSA + timemory issues bc we could control the thread destruction
- Fixed laps counting in roctracer callbacks
* Ignore select HIP API types
- The ignored API types are ignored because there appears to be a bug which causes the "end" callback to be labeled as begin
- hipDeviceEnablePeerAccess
- hipImportExternalMemory
- hipDestroyExternalMemory
* Tweaks to PTL config
* Timemory update + pid-prefix w/ mpi headers
- %pid%- prefix with mpi headers
- timemory submodule update
* CMake + critical trace + reorganize library source
- clang-tidy tweaks
- cmake function updates to use hosttrace_ prefix
- update gitignore
- cmake HOSTTRACE_MAX_THREADS option
- Formatting.cmake
- cleaned up MacroUtilities.cmake
- PTL submodule + usage
- tweak to Findroctracer.cmake
- MT transpose
- Updated PTL submodule
- Updated timemory submodule
- fix to hosttrace return value type if type not found
- reorganized library source code
- support for critical trace
* Remove bits/stdint-uintn.h headers
* Rename + config + depth + critical path
- rename hosttrace_timemory_data to instrumentation_bundles
- rename hosttrace_bundle_t to main_bundle_t
- rename bundle_t to instrumentation_bundle_t
- rework of configuration setup
- critical_trace write directly to file option
- tweaked depth calculation
- updated timemory submodule
- improved parallel support in roctracer callbacks
- working critical_trace
- perfetto device-critical-trace and host-critical-trace categories
- made transpose example parallel
- made parallel-overhead example a bit uneven
- relocated LTO activation
* Fixed duplicates in perfetto critical-trace
* reworked critical trace support
- substantial perf improvement (30-45 min -> 30 sec)
- changes to configuration (new and removed options)
* Removed "%pid%-" output prefix in mpi_gotcha
* Update timemory submodule
|
||
|
|
c56b49a0bd |
v0.0.3: MPI wrappers w/o full MPI support + setup-env.sh (#15)
* v0.0.3: MPI wrappers w/o full MPI support + setup-env.sh
- bumped to v0.0.3
- enabled gotcha wrapping of MPI functions w/o enabling MPI
- added setup-env.sh script
- minor updates to testing
- Update timemory submodule
- fixes tim::component::configure_mpip undefined symbol
- Script updates
|