10 Commits

Autor SHA1 Mensaje Fecha
David Galiffi 8fcf3a50b0 Use gersemi for CMake formatting (#257)
* Replace `cmake-format` with `gersemi`

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Remove .cmake-format.yaml

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update workflow to use gersemi

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update CONTRIBUTING.md

* Update helper scripts

* Don't include `*/external/*` in workflows

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 122623a929]
2025-06-22 10:44:33 -04:00
David Galiffi 489eda995d Rename Omnitrace to ROCm Systems Profiler (#4)
The Omnitrace program is being renamed. 

Full name: "ROCm Systems Profiler"
Package name: "rocprofiler-systems"
Binary / Library names: "rocprof-sys-*"

---------
Co-authored-by: Xuan Chen <xuchen@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: d07bf508a9]
2024-10-15 11:20:40 -04:00
Jonathan R. Madsen 76f30b1c6d omnitrace find_package support (#3)
* omnitrace find_package support

- Fix to INSTALL_DESTINATION for configure_package_config_file
- Fixes to ConfigInstall.cmake and omnitrace-config.cmake.in

* Test find_package

[ROCm/rocprofiler-systems commit: d2e635ed3c]
2022-05-24 22:45:26 -05:00
Jonathan R. Madsen 4ae26e2d08 rocm-smi and KokkosTools support (#23)
* renamed omnitrace_thread_data to thread_data

* initial implementation

* Numerous fixes and updates

- Updated timemory submodule
- Updated perfetto submodule (pulls in fixes for TRACE_EVENT)
- pthread_gotcha only after omnitrace_init_tooling
- omnitrace banner
- config settings for rocm-smi freq and devices
- critical_trace::get_entries
- OMNITRACE_BASIC_PRINT
- rocm_smi perfetto category
- redirect roctracer warnings for ROCm 4.5.0
- property specializations for rocm-smi components
- units fixes data_tracker types
- roctracer entries for pthread_create and start_thread
- omnitrace-avail defaults to settings, not components
- settings have conforming names
- settings warn about duplicates
- ptl named threads
- decreased max freq for sampler SIGALRM
- rocm-smi names thread
- rocm-smi avoids call to hipGetDeviceCount
- name roctracer activity callback threads
- fixed binary rewrite test output names

* Update lulesh example

- supports non-UVM GPU

* Lulesh tweaks + formatting

* KokkosP + Mode + Roctracer sampling deadlock fix

- kokkosp support
- omnitrace_init_library
- config::print_settings()
- config::get_mode()
- omnitrace::Mode
- omnitrace-avail improvements (removes settings)
- handle get_verbose() < 0
- disable dyninst InstrStackFrames by default
- handle perf_event_paranoid > 1 by disabling PAPI
- SIGALRM max freq to 5.0
- Name threads
- rocm-smi handles get_use_perfetto() and get_use_timemory()
- HSA_ENABLE_INTERRUPT=0 when roctracer + sampling (fixes deadlock)

* Tests, API renaming, roctracer

- disable renaming of thread 0
- verbprintf_bare
- enable dyninst merge tramp
- tweaked some omnitrace exe verbose levels
- reworked roctracer::setup and roctracer::shutdown
- rocm_smi::data::poll checks get_state()
- omnitrace_trace_finalize -> omnitrace_finalize
- omnitrace_trace_init -> omnitrace_init
- omnitrace_trace_set_env -> omnitrace_set_env
- omnitrace_trace_set_mpi -> omnitrace_set_mpi
- sampling mode does not disable timemory
- disable roctracer before shutting down rocm-smi
- lulesh tests w/ and w/o kokkosp
- lulesh tests for perfetto only
    - with --dynamic-callsites --traps --allow-overlapping
- lulesh tests for timemory only
    - with --stdlib --dynamic-callsites --traps --allow-overlapping

* Update timemory submodule

- fix for TIMEMORY_PROPERTY_SPECIALIZATION

* get_verbose() handling + timemory submodule update

- Findroctracer.cmake uses find_package(hsakmt)

* Stability fixes + rework roctracer + perfetto

- reworked roctracer start up
- critical_trace perfetto basic values
- perfetto sampling category
- sampler checks signals
- peak_rss in sampling
- pthread_gotcha::shutdown()
- rocm_smi::device_count()
- HSA_TOOLS_LIB is set
- HSA_ENABLE_INTERRUPT in omnitrace exe
- omnitrace exe verbosity level changes
- Avoid instrumenting Impl ns in Kokkos
- gpu::device_count prefers rocm_smi instead of hip
- ptl blocks signals
- fixed pthread_gotcha roctracer_data values
- removed runtime-instrument-sampling tests
- timemory submodule update

* cmake formatting

* timemory + roctracer updates

- fix timemory issue with papi_common
- fix timemory issue with units
- define roctracer::is_setup()

* Miscellaneous tweaks

- Disable sampling during runtime instrument
- Fixed warnings about dynamic callsites
- Fixed backtrace output when timemory disabled
- Test tweaks

* cmake-format

* omnitrace_target_compile_definitions

* timemory submodule update

* config, omnitrace, State, mpi_gotcha updates

- use OMNITRACE_THROW instead of direct throw
- is_attached()
- is_binary_rewrite()
- get_is_continuous_integration()
- get_debug_init()
- get_debug_finalize()
- max_thread_bookmarks default to 1
- State::Init
- app_thread oneTimeCode
- runtime instrumentation uses waitpid
- fixed init_names
- include main in MPI runs
- fixed sampling setup when disabled
- reworked mpi_gotcha
- disabled critical trace in transpose test

* cmake-format

* handle rocm_smi::device_count() exception

* CI timeouts

* Re-enable runtime-instrument + sampling

[ROCm/rocprofiler-systems commit: 39f17ae8b8]
2022-02-08 17:42:17 -06:00
Jonathan R. Madsen 9d5ebf9c3b Rename hosttrace to omnitrace (#18)
[ROCm/rocprofiler-systems commit: 39cf760a4e]
2021-11-24 04:59:59 -06:00
Jonathan R. Madsen efb6d766af Reorganization and critical trace support (#17)
* Roctracer wall clock integration (#16)

* Integrates roctracer values into wall-clock

* Fixed scoping + timemory roctracer

* Fixed data race in roctracer

* Synchronized HIP API on main thread

- Cache hip activity callbacks and execute on main thread
- Minor updates to transpose

* Debugging + MPI + transpose updates

* PTL + HSA and timemory + kernel timing

- PTL usage fixed HSA + timemory issues bc we could control the thread destruction
- Fixed laps counting in roctracer callbacks

* Ignore select HIP API types

- The ignored API types are ignored because there appears to be a bug
  which causes the "end" callback to be labeled as begin
- hipDeviceEnablePeerAccess
- hipImportExternalMemory
- hipDestroyExternalMemory

* Tweaks to PTL config

* Timemory update + pid-prefix w/ mpi headers

- %pid%- prefix with mpi headers
- timemory submodule update

* CMake + critical trace + reorganize library source

- clang-tidy tweaks
- cmake function updates to use hosttrace_ prefix
- update gitignore
- cmake HOSTTRACE_MAX_THREADS option
- Formatting.cmake
- cleaned up MacroUtilities.cmake
- PTL submodule + usage
- tweak to Findroctracer.cmake
- MT transpose
- Updated PTL submodule
- Updated timemory submodule
- fix to hosttrace return value type if type not found
- reorganized library source code
- support for critical trace

* Remove bits/stdint-uintn.h headers

* Rename + config + depth + critical path

- rename hosttrace_timemory_data to instrumentation_bundles
- rename hosttrace_bundle_t to main_bundle_t
- rename bundle_t to instrumentation_bundle_t
- rework of configuration setup
- critical_trace write directly to file option
- tweaked depth calculation
- updated timemory submodule
- improved parallel support in roctracer callbacks
- working critical_trace
- perfetto device-critical-trace and host-critical-trace categories
- made transpose example parallel
- made parallel-overhead example a bit uneven
- relocated LTO activation

* Fixed duplicates in perfetto critical-trace

* reworked critical trace support

- substantial perf improvement (30-45 min -> 30 sec)
- changes to configuration (new and removed options)

* Removed "%pid%-" output prefix in mpi_gotcha

* Update timemory submodule

[ROCm/rocprofiler-systems commit: 752424efc2]
2021-11-23 02:53:14 -06:00
Jonathan R. Madsen f3e7a1664a cmake-format + miscellaneous tweaks (#13)
* cmake-format + miscellaneous tweaks
* Formatted cmake in examples and tests
* Updated linux-ci.yml artifacts naming
* Updated clang-format
* Fixed submodule branches

[ROCm/rocprofiler-systems commit: 6c93674f92]
2021-09-20 11:12:06 -05:00
Jonathan R. Madsen 244b308cb5 Integrated perfetto + roctracer (#5)
- hosttrace library automatically collects and merges timestamps for HIP API calls and kernels with the host-side instrumentation
  - mostly eliminates the need for using external rocprof
- added thread_instruction_count in perfetto output
- increased hosttrace min_loop_address_range to 512
- disabled instrumenting functions with dynamic callsites by default
- miscellaneous cmake updates

* roctracer support

- fully integrated perfetto + roctracer outputs
- thread_instruction_count in perfetto
- increased min_loop_address_range to 512
- disabled instrumenting functions with dynamic callsites by default
- updated timemory submodule

* hosttrace_launch_compiler

- support for using an alternative compiler as needed via launch compiler
- elfio added as submodule (not currently used)
- miscellaneous cmake updates

* README update + host/device categories + misc

- timemory fix for TIMEMORY_ROCTRACER_ENABLED
- transpose fix

* papi_tuple_t -> papi_tot_ins

- minor fix to Findroctracer.cmake

[ROCm/rocprofiler-systems commit: a061b7947f]
2021-09-06 22:23:24 -05:00
Jonathan R. Madsen 045f29eda6 timemory submodule (#1)
- library now wraps fork() to warn about fork + OpenMPI

[ROCm/rocprofiler-systems commit: fc7cc1e68f]
2021-08-17 17:34:34 -05:00
Jonathan R. Madsen 581c4122aa Hosttrace via Dyninst
- complete with ctest support


[ROCm/rocprofiler-systems commit: 9ef3800986]
2021-08-06 13:08:57 -05:00