66 Commits

Author SHA1 Message Date
Jonathan R. Madsen 060da8159c Code coverage updates (#50)
* code coverage updates

- python support
- refactored source

* remove code_coverage::operator+ and operator+=

* impl/coverage.hpp

[ROCm/rocprofiler-systems commit: 134b33320d]
2022-05-08 01:40:56 -05:00
Jonathan R. Madsen d45e84b116 GOTCHA + Kokkos + tasking + more (#47)
* GOTCHA + Kokkos + tasking + more

- update gotcha with fix for dlsym(RTLD_NEXT, ...)
- support for standalone KOKKOS_PROFILE_LIBRARY
- remove extra flags for omnitrace-user
- roctracer and critical_trace namespaces in tasking
- generic tasking functions, e.g. join(), shutdown(), etc.
- omnitrace_init_tooling_hidden in api.hpp
- ompt.cpp uses OMNITRACE_USE_OMPT
- kokkosp uses user_region instead of omnitrace component
- re-enable recycling thread ids
- more generic _{push,pop}_perfetto functors
- fix for thread_data::instance(construct_on_init, ...)
- fix for omnitrace-headers interface target
- omnitrace_watch_for_change

[ROCm/rocprofiler-systems commit: 29220cba58]
2022-04-26 22:08:51 -05:00
Jonathan R. Madsen a438000c21 Multiple python versions (#42)
* Support multiple Python versions in single build

* RPATH + Split up config into config and runtime

* pybind11 submodule

* Docker build updates

[ROCm/rocprofiler-systems commit: 4db6ba3d28]
2022-04-21 21:36:07 -05:00
Jonathan R. Madsen 127e30a4d7 Documentation + Miscellaneous Fixes (#36)
* Added documentation markdown source

* Replaced AARInternal with AMDResearch in URLs

* Renamed cpack artifact names

* Fix to testing and lulesh submodule checkout

* Docker updates

* CMake and CPack

- force CMAKE_INSTALL_LIBDIR to lib
- CPACK_DEBIAN_PACKAGE_RELEASE uses OMNITRACE_CPACK_SYSTEM_NAME
- CPACK_RPM_PACKAGE_RELEASE uses OMNITRACE_CPACK_SYSTEM_NAME
- Tweak LIBOMP_LIBRARY find in examples/openmp
- Tweak setup-env.sh.in

* Partial update of README

- status badges
- docs link
- removed install info (covered by docs)

* OMNITRACE_SAMPLING_CPUS setting

- enables control over which CPUs are sampled for frequency

* omnitrace exe updates

- exclude transaction clone, virtual thunk, non-virtual thunk
- module_function::start_address
- module_function::instructions
- verbosity > 0 encodes instructions into JSON

* Miscellaneous fixes

- relocate setup-env.sh.in
- add modulefile.in
- Updated README.md and source/docs/about.md
- cmake fix for libomp
- fix license in miscellaneous places
- dl.hpp and dl.cpp

* Update timemory and dyninst submodules

- timemory signals updates
- dyninst Movement-adhoc updates

* cmake format

[ROCm/rocprofiler-systems commit: 945f541965]
2022-04-04 15:27:38 -05:00
Jonathan R. Madsen 78ae7d1e37 Tweaks to docker scripts [skip ci] (#28)
[ROCm/rocprofiler-systems commit: 80e1a0d7e7]
2022-02-25 18:30:37 -06:00
Jonathan R. Madsen ebb29ac54a Miscellaneous updates (#21)
* Miscellaneous updates

- Updated README
- Updated VERSION
- Header include tweaks
- get_verbose() + get_verbose_env()
- fixes to omnitrace-avail
    - exclude all cuda/cupti settings
    - apply available_only to hw counters
- config file warnings
- config displayed at verbose > 0
- fix to MPI_Finalize when only using MPI headers

* Updated LICENSE

* CPack tweak

[ROCm/rocprofiler-systems commit: 8648410309]
2022-01-26 23:25:00 -06:00
Jonathan R. Madsen adb6e94de5 MPI gotcha updates (#20)
* MPI gotcha updates

* Release script updates

- build for ubuntu focal and bionic
- use OpenMPI for OMNITRACE_USE_MPI_HEADERS

* build-release.sh updates

[ROCm/rocprofiler-systems commit: a546949ff4]
2022-01-26 15:02:53 -06:00
Jonathan R. Madsen f17ff12a66 Sampling support + testing + omnitrace namespace (#19)
* omnitrace namespace

* Kokkos + Lulesh example/tests

* Sampling support + more

- OMNITRACE_BUILD_TESTING option
- sampling support
- pthread_gotcha
- fixes to labels for mpi_gotcha, fork_gotcha, omnitrace_component
- tasking::block_signals, tasking::unblock_signals
- instrumentation mode option in omnitrace exe
- argument option groups in omnitrace exe
- categories in omnitrace settings
- remove TIMEMORY_ prefixed options

* Release workflow updates

* Updated settings printing

* Fixed defaults in README

* Tweak setting defaults in README

* CMake fixes

* cmake-format

* clang-format

* LULESH_USE_MPI OFF

* LULESH_USE_MPI fix

* timemory add_secondary fix

* timemory ambiguous internal namespace fix

* Update timemory submodule

* Handle output path/prefix in omnitrace

- updated timemory
- updated test environment

* sampling + papi fix

* Fix to sampling without PAPI

* Fix for using too many processors in CI

* formatting

* Updated CI

- minor cmake tweaks
- updated timemory submodule

* Updated CI

* Updated CI

* CI + timemory updates

- data race fixes

* CI updates + debug for sampling

* Sampling updates

- moved tasking::{block,unblock}_signals to sampling namespace
- improvements to sampling w.r.t. thread-locality

* Minimum OMNITRACE_THREAD_COUNT of 128

* Handle multiple dims in sampler data

* Configure libunwind support for timemory

* Improved safeguards for sampling

- updated CI
- lulesh runtime-instrument test tweak

* formatting

* CI updates + sampler updates + misc

- fixed stack-buffer-overflow in omnitrace (get_*file_line_info)
- test labels
- steady_clock instead of system_clock in sampler
- update dyninst submodule with upgradePlaceholder fix
- disable OMNITRACE_BUILD_TESTING by default

* Updated timemory submodule

- hidden visibility for timemory
- storage finalizers do not capture this

* Update timemory submodule

- component visibility updates

* Reworked header includes

- use <...> for timemory headers
- always include <library/defines.hpp>

* Rename some config options

* Update PTL submodule

* Update kokkos submodule

* Updated sampling

* Updated CI

* Reworked instrumentation exe

- lowered min-address-range threshold to 256
- extended whole function exclude

* CI fix + timemory submodule update

- TIMEMORY_VISIBLE on component base
- RelWithDebugInfo -> RelWithDebInfo
- Info output for parallel-overhead

* Sampling flags + transpose update + CI update

- disable critical trace for parallel-overhead in CI
- SA_RESTART only in sampler
- reworked transpose example to use fewer threads

* CI update

- removed ubuntu-focal-external-debug
- reduced data artifacts upload

* CI timeouts

- updated timemory submodule
- minor tweaks to omnitrace exe logging

* LICENSE updates (partial)

* CI Test stage timeout extension

* Docker and Packaging updates

* Miscellaneous fixes/tweaks

- gpu.hpp / gpu.cpp
- disable roctracer component if no devices
- re-enable InstrStackFrames by default
- disable sampling by default
- pthread_gotcha::m_enable_sampling is false by default
- timemory submodule update w/ sampler and pop(tid) updates
- fix minor bug in sampler logic
- CMake: OMNITRACE_USE_HIP option
- roctracer + timemory fix

* Replaced OMNITRACE_USE_ROCTRACER with OMNITRACE_USE_HIP where appropriate

* cmake format

* Sampler deadlock fixes

* Removed debug messages from sampler

* Fix for MPI detection + test tweaks + misc

* Sampler deadlock fixes + misc

- removed papi_tot_ins
- pthread_gotcha blocks signals globally until sampler is setup
- metadata specialization for sampling components
- OMNITRACE_INSTRUMENTATION_MODE -> OMNITRACE_MODE
- default sampling delay increased to 0.05 from 1.0e-6
- removed {block,unblock}_signals from critical_trace and ptl
    - no longer necessary to use
- sampling delay minimum is 1.0e-3
- OMNITRACE_BUILD_HIDDEN_VISIBILITY

* omnitrace-avail + libunwind update + restructure

- restructured omnitrace components
- build custom omnitrace-avail executable
- updated libunwind to avoid malloc in get_unw_backtrace

* Fix remaining reorganization issues

- removed some duplicate code
- fixed some trait specializations after implicit instantiation
- formatting

* ensure_storage fix + avail improvements

- fix ensure_storage when component not avail
- suppress irrelevant info in omnitrace-avail

* Delay settings initialization

- slight tweak to tests w/ MPI

* Disable OpenMPI testing w/ ubuntu-bionic

- MPI testing is hanging bc of network interface issue on system:

> [[20462,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> Module: OpenFabrics (openib)
>   Host: fv-az19-371
> Another transport will be used instead, although this may result in
> lower performance.
> NOTE: You can disable this warning by setting the MCA parameter
> btl_base_warn_component_unused to 0.

[ROCm/rocprofiler-systems commit: 778af2a760]
2022-01-24 20:49:17 -06:00
Jonathan R. Madsen 9d5ebf9c3b Rename hosttrace to omnitrace (#18)
[ROCm/rocprofiler-systems commit: 39cf760a4e]
2021-11-24 04:59:59 -06:00
Jonathan R. Madsen efb6d766af Reorganization and critical trace support (#17)
* Roctracer wall clock integration (#16)

* Integrates roctracer values into wall-clock

* Fixed scoping + timemory roctracer

* Fixed data race in roctracer

* Synchronized HIP API on main thread

- Cache hip activity callbacks and execute on main thread
- Minor updates to transpose

* Debugging + MPI + transpose updates

* PTL + HSA and timemory + kernel timing

- PTL usage fixed HSA + timemory issues bc we could control the thread destruction
- Fixed laps counting in roctracer callbacks

* Ignore select HIP API types

- The ignored API types are ignored because there appears to be a bug
  which causes the "end" callback to be labeled as begin
- hipDeviceEnablePeerAccess
- hipImportExternalMemory
- hipDestroyExternalMemory

* Tweaks to PTL config

* Timemory update + pid-prefix w/ mpi headers

- %pid%- prefix with mpi headers
- timemory submodule update

* CMake + critical trace + reorganize library source

- clang-tidy tweaks
- cmake function updates to use hosttrace_ prefix
- update gitignore
- cmake HOSTTRACE_MAX_THREADS option
- Formatting.cmake
- cleaned up MacroUtilities.cmake
- PTL submodule + usage
- tweak to Findroctracer.cmake
- MT transpose
- Updated PTL submodule
- Updated timemory submodule
- fix to hosttrace return value type if type not found
- reorganized library source code
- support for critical trace

* Remove bits/stdint-uintn.h headers

* Rename + config + depth + critical path

- rename hosttrace_timemory_data to instrumentation_bundles
- rename hosttrace_bundle_t to main_bundle_t
- rename bundle_t to instrumentation_bundle_t
- rework of configuration setup
- critical_trace write directly to file option
- tweaked depth calculation
- updated timemory submodule
- improved parallel support in roctracer callbacks
- working critical_trace
- perfetto device-critical-trace and host-critical-trace categories
- made transpose example parallel
- made parallel-overhead example a bit uneven
- relocated LTO activation

* Fixed duplicates in perfetto critical-trace

* reworked critical trace support

- substantial perf improvement (30-45 min -> 30 sec)
- changes to configuration (new and removed options)

* Removed "%pid%-" output prefix in mpi_gotcha

* Update timemory submodule

[ROCm/rocprofiler-systems commit: 752424efc2]
2021-11-23 02:53:14 -06:00
Jonathan R. Madsen cdd2707058 v0.0.3: MPI wrappers w/o full MPI support + setup-env.sh (#15)
* v0.0.3: MPI wrappers w/o full MPI support + setup-env.sh

- bumped to v0.0.3
- enabled gotcha wrapping of MPI functions w/o enabling MPI
- added setup-env.sh script
- minor updates to testing
- Update timemory submodule
- fixes tim::component::configure_mpip undefined symbol
- Script updates

[ROCm/rocprofiler-systems commit: c56b49a0bd]
2021-10-01 16:46:03 -05:00
Jonathan R. Madsen 244b308cb5 Integrated perfetto + roctracer (#5)
- hosttrace library automatically collects and merges timestamps for HIP API calls and kernels with the host-side instrumentation
  - mostly eliminates the need for using external rocprof
- added thread_instruction_count in perfetto output
- increased hosttrace min_loop_address_range to 512
- disabled instrumenting functions with dynamic callsites by default
- miscellaneous cmake updates

* roctracer support

- fully integrated perfetto + roctracer outputs
- thread_instruction_count in perfetto
- increased min_loop_address_range to 512
- disabled instrumenting functions with dynamic callsites by default
- updated timemory submodule

* hosttrace_launch_compiler

- support for using an alternative compiler as needed via launch compiler
- elfio added as submodule (not currently used)
- miscellaneous cmake updates

* README update + host/device categories + misc

- timemory fix for TIMEMORY_ROCTRACER_ENABLED
- transpose fix

* papi_tuple_t -> papi_tot_ins

- minor fix to Findroctracer.cmake

[ROCm/rocprofiler-systems commit: a061b7947f]
2021-09-06 22:23:24 -05:00
Jonathan R. Madsen 055a3fba87 Updated documentation + misc (#3)
- tweaked some CMake option names
- moved merge-trace.jl to hosttrace-merge.jl
- removed Windows line endings from hosttrace-merge.jl
- improved handling of !perfetto and !timemory

[ROCm/rocprofiler-systems commit: fc15967f8f]
2021-09-02 13:14:58 -05:00
Jianbing Chen 1ff2dfed88 add systemTraceEvents to merge for perfetto ftrace data
[ROCm/rocprofiler-systems commit: f518e09eab]
2021-08-30 09:20:31 -05:00
Jianbing Chen 869bbb4333 use DISCARD for fill_policy since default RING_BUFFER fails when wrap around (most result table empty); add README.md and merge-trace.jl
[ROCm/rocprofiler-systems commit: 7793a1f331]
2021-08-25 11:39:00 -05:00
Jonathan R. Madsen 581c4122aa Hosttrace via Dyninst
- complete with ctest support


[ROCm/rocprofiler-systems commit: 9ef3800986]
2021-08-06 13:08:57 -05:00