37 Υποβολές

Συγγραφέας SHA1 Μήνυμα Ημερομηνία
Kian Cossettini 698ac6b8bc [rocprofiler-systems] Add build option for "examples" to specify gfx-arch (#2626)
## Motivation
 - Added `check_rocminfo` function that returns true if the provided regex was found, false otherwise. Can also use `GET_OUTPUT` to get the raw output filtered with or without a regex.
 - Moved `rocprofiler_systems_get_gfx_archs()` to `MacroUtilities.cmake` 
 - Added `rocprofiler_systems_lookup_gfx()`, which detects whether a given `gfx` is from the `instinct`, `radeon` or `apu` family.
 - Added `ROCPROFSYS_GFX_TARGETS` as a build argument. Used to specify the offloading architectures that GPU examples should compile for. If empty, defaults to whatever your system has.
 - GPU examples now check if the given `gfx` targets (from `ROCPROFSYS_GFX_TARGETS`) are supported.
 - OMPVV offload tests now only compile if `amdflang` version is `>= 20`
 - Improve link time by reducing the number of GFX targets that binaries need to support.
   - RCCL is now passed a `GPU_TARGETS` var specifying the architectures to build/link against.
2026-01-20 12:13:21 -05:00
Kian Cossettini 9f014db6a4 [rocprofiler-systems] Update install path for examples (#2625)
* Update install path for examples to `share/rocprofiler-systems/examples`

----

Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-15 21:51:16 -05:00
habajpai-amd 1c7293e6d0 Add bounds checks in transpose_a for both load and store so edge tiles dont read/write past MxN (#950) 2025-09-12 17:32:30 +05:30
habajpai-amd fb6fe518e8 fix(transpose): correct host allocation and GB/s calculation (#860) 2025-09-04 16:08:16 -04:00
David Galiffi 847580dd9e Update minimum_cmake_required to match version used in CI (#679)
- Update minimum_cmake_required to match version used in CI
  - We should match the minimum version that we test against

- Ensure ".S" files are treated as assembly.
2025-08-21 15:56:47 -04:00
David Galiffi 8fcf3a50b0 Use gersemi for CMake formatting (#257)
* Replace `cmake-format` with `gersemi`

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Remove .cmake-format.yaml

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update workflow to use gersemi

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update CONTRIBUTING.md

* Update helper scripts

* Don't include `*/external/*` in workflows

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 122623a929]
2025-06-22 10:44:33 -04:00
Sajina PK 916aac1e92 Enable MPI tracing for Fortran (#185)
- Move the MPI gotcha functionality from Timemory to the repo.
- Add the PMPI Fortran MPI functions to the existing mpi gotcha handle.

[ROCm/rocprofiler-systems commit: 4fcd8cc78d]
2025-06-04 18:06:18 -04:00
Peter Park 3f9a3861ac Update copyright year to 2025 (#83)
[ROCm/rocprofiler-systems commit: 0a15d355e0]
2025-01-29 16:53:16 -05:00
David Galiffi d8c98d2d4d OMPT Target Offload Support (#17)
- Porting from https://github.com/ROCm/omnitrace/pull/411
- Improve OMPT support
- Add OpenMP target example to testing
- Update Timemory submodule to use ROCm/Timemory rather than NERSC/Timemory
- Update `actions/upload-artifacts` to v4
- Standardize the `cmake_minimum_required` to 3.18.4 across workflows, project, and examples
- Updated Ubuntu 20.04 workflows

[ROCm/rocprofiler-systems commit: 9da7365471]
2024-11-18 14:18:21 -05:00
David Galiffi 489eda995d Rename Omnitrace to ROCm Systems Profiler (#4)
The Omnitrace program is being renamed. 

Full name: "ROCm Systems Profiler"
Package name: "rocprofiler-systems"
Binary / Library names: "rocprof-sys-*"

---------
Co-authored-by: Xuan Chen <xuchen@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: d07bf508a9]
2024-10-15 11:20:40 -04:00
Jonathan R. Madsen b65f8e7605 CI timeout + line-info in releases (#279)
* Update perfetto args.gn.in

- remove enable_perfetto_tools_trace_to_text (unused)

* core timeout implementation

- requires OMNITRACE_CI=ON
- requires OMNITRACE_CI_TIMEOUT=<sec>
- adds pthread_self and std::this_thread::get_id to thread info
- pthread_create_gotcha stores native handles (pthread_self)

* Testing updates

- improve detection of segfault/failures with PASS_REGEX exists
- add OMNITRACE_CI_TIMEOUT env variable to all tests

* Line-info in releases

- e.g. -g1 + more options to minimize size of debug info

* Fix typo in config exit action message

* OMNITRACE_UNLIKELY around debug/verbose messages

* format fixes

* Overflow tests + capability check

* transpose example update

- link to threads library

* roctracer/rocprofiler update

- in ROCm 5.5.0, cannot include rocprofiler.h and roctracer.h in same file due to conflicting enum defs
- Moved HSA tracing setup/shutdown to component::roctracer

* roctracer update

- fix definition of roctracer::setup when disabled

* Update fork example

- detach threads on main PID
- flush io outputs when printing info

* Update overflow tests

- pass regular expressions
- overflow on PERF_COUNT_SW_CPU_CLOCK event

* fork gotcha update

- use getpid() instead of getppid()

* update fork example

- wait on threads calling fork

* timeout update

- wait on timeout thread to launch before proceeding

[ROCm/rocprofiler-systems commit: 3e2fa69a14]
2023-06-14 11:55:22 -05:00
Jonathan R. Madsen 49851b05ae Address and thread sanitizer fixes (#250)
* Address and thread sanitizer fixes

- Fix compilation with clang
- Tweak perfetto copy to build tree
- Added suppression files to scripts
- fix LD_PRELOAD support in omnitrace-causal and omnitrace-sample
- use spin_mutex and spin_lock from timemory instead of atomic_mutex and atomic_lock
- state uses atomic
- fix some memory leaks
- tweak testing
  - mpi tests do not use preload
  - increase timeout when using sanitizers
  - add env LD_PRELOAD when using sanitizers

* Tweak perfetto build

* Update timemory submodule

* Update version to 1.8.1

* Update omnitrace-leak.supp

* Update timemory submodule

- fixed spin_mutex implementation

* Remove previously added addr_space->allowTraps(instr_traps)

- this appears to cause errors during binary rewrite

* causal testing updates

- relaxed causal validation on CI systems (to account for hyperthreading decreasing prediction)
- improved impact calculation
- other general improvements to validate-causal-json.py

* Improve fork handling for perfetto

- numerous updates changing perfetto:: to ::perfetto::
- added perfetto_fwd.hpp

* Updated fork example

- user API for validation that stopping/starting perfetto is valid

* Misc fixes to perfetto + fork support

- tweak regions in fork example
- handle disabling tmp files
- get rid of stop/start with perfetto before/after fork
- fixed sampling support during fork
- tweak env of fork test

* Fix find_package in build-tree

* Fix buildtree export

* Fix buildtree export

* Restructured ConfigInstall before adding examples

* Guard against creating tmp file in sampling when disabled

* Fix buildtree package

* formatting

* exit handlers on child processes

- quick exit to avoid perfetto cleanup

* Further tweaking of causal tests for reliability

- enable PROCESSOR_AFFINITY
- decrease to 5 iterations

* Further tweaking of causal tests for reliability

- disable PROCESSOR_AFFINITY for fast func e2e tests
- enabling affinity results in (valid) speedup predictions greater than zero

* Fixes to fork handling

- use pthread_atfork for redundancy if fork_gotcha fails

* cmake formatting

* Fix fork init settings + install components

- remove dl from PROJECT_BUILD_TARGETS

* Testing tweaks

- fix mpi-binary-rewrite-run regex when OMNITRACE_VERBOSE set > 1 in env
- increase causal e2e iterations to 8

* Fix "Test User API"

- test-find-package.sh included dl component

* Further tweaks to causal validation

- further considerations of variance

[ROCm/rocprofiler-systems commit: 846301bcaf]
2023-02-27 12:09:03 -06:00
Jonathan R. Madsen 5a1cec92e8 Various optimizations (#192)
* CDash name prefix {{ repo_owner }}-{{ ref_name }}

- remove /merge from CI name

* disable using BFD when sampling_include_inlines is OFF

- this consumes a lot of memory

* Improve finalization of rocprofiler

* update timemory submodule

- disable OMPT thread begin/end callbacks
- support hierarchies in signal handlers
- update operation::pop_node debugging
- settings_update_type + setting_supported_data_types
- fixed parsing args in timemory_init

* Improve timemory build time

* Remove kokkosp restrictions for perfetto

* omnitrace exe signal handler update

- configure signal handlers before main to allow libomnitrace to override

* Backtrace and timemory submodule updates

- Use unwind::cache w/o inline info
- update timemory submodule
  - unwind::cache updates
  - filepath updates
  - fix termination_signal_message
  - fix vsettings::report_change

* Update dyninst submodule

- updates BinaryEdit::getResolvedLibraryPath

* update timemory submodule

- update CpuArch support

* Cleanup configure warnings

* Update examples cmake and workflows

- (Mostly) eliminate configuration warnings

* omnitrace exe updates

- pass environ to BPatch::processCreate
- avoid trailing ":" in DYNINST_REWRITER_PATHS

* Update dyninst submodule

- Add flags to DyninstOptimization.cmake
- Remove strtok from BinaryEdit::getResolvedLibraryPath

* examples/mpi CMakeLists.txt update

- STATUS message about missing MPI during CI, otherwise AUTHOR_WARNING

* Dev build and linker flags

- use -gsplit-dwarf when OMNITRACE_BUILD_DEVELOPER is ON
  - disable when OMNITRACE_BUILD_NUMBER > 1
- OMNITRACE_BUILD_LINKER option
- add -fuse-ld=${OMNITRACE_BUILD_LINKER}
- omnitrace_add_cache_option function

* Update workflows to set OMNITRACE_BUILD_NUMBER

* Fix generator expressions for -fuse-ld

* Suppress some configuration warnings during CI

- helps to keep track of real warnings when they arise

* Update timemory and dyninst submodules with CMP0135

* Add -V flag to run-ci script

[ROCm/rocprofiler-systems commit: f147670a7a]
2022-11-01 17:28:12 -05:00
Jonathan R. Madsen e1de27dde7 Raise default min number of instructions (#173)
- Raise min instructions default to 1024 instead of 64
- Default value of 64 has demonstrated tendency to slow down real-life
applications
- Improved the memory safety during `omnitrace_finalize()`
- new modifications guarantee that when `tim::manager::instance()` on
main thread is destroyed, omnitrace will finalize before
- Improved some warning w/ roctracer
- Improved the search for `ROCP_METRICS` and
`OMNITRACE_ROCPROFILER_LIBRARY`
- disable printing env by default
- Attempted to improve the sampling shutdown

[ROCm/rocprofiler-systems commit: 7d7a8f2c23]
2022-09-30 19:28:43 -05:00
Jonathan R. Madsen 93343034bc RCCL support (#93)
* Initial support for RCCL

* OMNITRACE_USE_RCCLP + sampling tweaks

- also OMNITRACE_SAMPLING_KEEP_INTERNAL option
- minor modifications to sampling to use keep internal option + discard funlockfile

* Update docker and workflows to download RCCL

* Update CPack DEB with rocprofiler dependency

* Rework rccl into library and library/components folder

- add tpls/rccl/rccl/rccl.h

* Fix timemory includes

* rcclp inline definitions when disabled

* Tweaks to ubuntu-focal-external-rocm

- disable ompt
- enable building testing

* Tweaks to ubuntu-focal-external-rocm

- ctest exclude

* Tweak ubuntu-focal.yml

- remove source /.../setup-env.sh, replace with $GITHUB_ENV

* Fix ubuntu-focal-rocm + OMPI + root

* Improved rocm-smi error handling

- Recover from rocm-smi errors
- Disabling rocm-smi after recovering from errors
- Werror in developer mode
- Remove State::DelayedInit
- Add State::Disabled

* formatting

* Fix merge of OMNITRACE_SAMPLING_KEEP_INTERNAL

* Update RCCL include directory

- based on ROCm version we need with <rccl/rccl.h> or <rccl.h>

* RCCL Testing

- updated tests to use configuration files
- many tests generate a configuration file
- tests how have GPU option
- enable ncclCommCount, disable ncclGetVersion
- add testing for RCCLP via rccl-tests
- working directory of tests is PROJECT_BINARY_DIR
- add nccl/rccl functions to get_whole_function_names
- some clang compiler fixes

* Handle RCCL include w/o HIP

* RCCL requires HIP

* Update OMNITRACE_SAMPLING_CPUS for testing

* Update tests/CMakeLists.txt

* Debug settings

* Install MPI even when USE_MPI=OFF

* exclude printf

* skip mpi tests w/o USE_MPI or USE_MPI_HEADERS

* update ubuntu rocm workflow

* Fix configure env step for ubuntu rocm

[ROCm/rocprofiler-systems commit: 45be03906a]
2022-07-25 12:16:11 -05:00
Jonathan R. Madsen 17cf00c51d fix omnitrace print-* with libraries (#94)
* fix omnitrace print-* with libraries

* timemory submodule update

* Update workflows to use ./bin/omnitrace instead of ./omnitrace

* cmake format

* update timemory submodule

- fix ODR violations in utility/procfs

* cmake updates

- uniform find_package for all ROCm-based libraries

* tweak transpose example

- throw exception instead of std::exit

* Inspect cmdv name before assuming not exe

- some ELF execs "think" they are libraries so only assume rewrite + simulate + all-functions if filename looks like library
- adds some test for --print-available -- <library>

* Fix _has_lib_prefix when command is < 3

* Updates and reverts to omnitrace exe

- update module_function operator< and operator==
- add function_signature operator<
- refactor module_function ctor
- revert some previous changes w.r.t. simulate and include_unninstr

* Fix source/bin/tests to use same output dir as tests

* cmake format

* Segfault mitigation + refactor + modify function iteration

- refactor module_function ctor to avoid segfaults
- string_t -> std::string
- replace std::string with std::string_view in some places
- get_name(module_t*)
- get_name(procedure_t*)
- disable using both app_modules and app_functions
- new option: --parse-all-modules to iterate over app_modules
- removed some unused code w.r.t. debug info

* Disable module_function address range for uninstrumentable functions

* Disable module_function address range for uninstrumentable functions

* Refactored getting file/line info and init/fini

- use dyninst insertInitCallback and insertFiniCallback if main not found
- fixed all issues with segmentation faults in --simulate --all-functions

* revert changes to Findrocprofiler.cmake

[ROCm/rocprofiler-systems commit: d04cbe862e]
2022-07-21 01:15:41 -05:00
Jonathan R. Madsen 7d1989f5a4 GPU HW Counters via rocprofiler (#84)
* Initial support for GPU hardware counters

* Update find modules for roctracer and rocprofiler

- /opt/rocm/{rocprofiler,roctracer} path is deprecated so tweak search procedure

* Improve ConfigCPack for MPI

* Update rocprofiler

- rocm_metrics()
- minor cleanup

* Update rocm find modules

* declare rocm_metrics + call in omnitrace-avail

* relocate omnitrace-launch-compiler

* REALPATH and find_modules

* Examples cmake (may drop)

* omnitrace-avail

- hw_counter categories
- init rocm

* setenv updates for rocprofiler in library.cpp and dl.cpp

* get_rocm_events config

* gpu::hip_device_count()

* rocm_metrics returns hardware_counters::info

* - relocated library/components/roctracer_callbacks.* to library/roctracer.*
- relocated library/components/rocprofiler.* to library/rocprofiler.*
- cleaned up rocprofiler.hpp
- added perfetto output of rocprofiler
- added timemory output of rocprofiler
- renamed omni.roctracer thread to roctracer.hip
- added roctracer.hsa thread name
- updated timemory submodule to support std::variant
- updated timemory submodule to support = in config value
- updated timemory submodule to support standalone storage
- updated timemory submodule to support new hw counter apis
- updated timemory submodule to prevent label/description caching in data_tracker

* update omnitrace-avail info_type generation

* Update timemory submodule

* rocprofiler component

* cmake formatting

* omnitrace-avail handle no GPUs

- Add -c command-line option for --categories
- support verbosity

* hsa_rsrc_factory throws exceptions

- throw exceptions to avoid aborting on HSA_STATUS_ERROR_NOT_INITIALIZED when advantageous
- removed duplicate specialization of is_available for component::rocprofiler

* rocprofiler symbols for when disabled

* Fix warning in omnitrace-avail

- std::stringstream from initializer list would use explicit constructor

* Fix finalization after settings are deleted

* Reorganized rocprofiler source

* Updated formatting

* Miscellaneous tweaks

- added using statements from timemory
- tweaked the main and thread bundle names
- fixed timemory header includes

[ROCm/rocprofiler-systems commit: 4208b5654c]
2022-07-17 21:52:09 -05:00
Jonathan R. Madsen 2eb4b4981c Fix loop-level instrumentation + more (#32)
- fix loop-level instrumentation
- support loop instrumentation w/o debug symbols via loop number
- improve module_function messages
- serialize num_basic_blocks
- serialize num_outer_loops
- serialize is_num_instructions_constrained
- serialize is_loop_num_instructions_constrained
- updated transpose example to use uniform_int_distribution
- added transpose loop test
- added fail regexes for tests which enable loop instrumentation
- use module->getFullName in get_loop_file_line_info
- use module->getFullName in get_func_file_line_info
- use module->getFullName in get_basic_block_file_line_info

[ROCm/rocprofiler-systems commit: a640fbdb29]
2022-06-10 06:57:50 -05:00
Jonathan R. Madsen 2699095190 CMake updates/fixes + parallel-overhead updates (#16)
- OMNITRACE_INSTALL_EXAMPLES option
- Fix lulesh standalone HIP compilation
- Fix transpose standalone HIP compilation
- Tweaks to parallel-overhead

[ROCm/rocprofiler-systems commit: a9ff15ff8c]
2022-05-31 14:55:31 -05:00
Jonathan R. Madsen d009fc24a6 Standalone build examples + testing workflow updates (#15)
* Update examples to support standalone builds

* Tweak to ubuntu-focal-external workflow

- disable PAPI

* ubuntu focal external workflow update

- GCC 11
- Test static libgcc + static libstdcxx + strip
- ubuntu-toolchain-r/test

* Improve build-release.sh

- command line args for lto, strip, perfetto-tools,
   static-libgcc, static-libstdcxx, hidden-visibility,
   max-threads, parallel

* Update VERSION to 1.0.1

* Fixes to LTO build

* Updates to ubuntu-focal-external workflow

* build-release.sh update

- enable static libstdcxx by default

* disable python + static libstdcxx

* ubuntu-focal-external updates

* build-release.sh disable static libstdcxx by default

* cmake-format

[ROCm/rocprofiler-systems commit: 8b97c70df8]
2022-05-31 01:51:18 -05:00
Jonathan R. Madsen 0b75ce03a0 Minor updates for transpose, timemory submodule, roctracer, and omnitrace exe (#4)
* transpose usage message

* timemory submodule update

* roctracer updates

- Changes to verbosity of roctracer::shutdown
- protect_flush_activity prevents deadlock when error in callback

* Removed linking to timemory-cxx in omnitrace

- omnitrace exe does not link to `timemory-cxx` target

[ROCm/rocprofiler-systems commit: 5b2c27cccd]
2022-05-24 18:35:33 -05:00
Jonathan R. Madsen 0d5f0fb9cf Support for tracing mutex locking (#52)
* Parallel overhead example with locks

* Support tracing mutex locking + more

- support wrapping pthread_mutex_lock
- support wrapping pthread_mutex_unlock
- support wrapping pthread_mutex_trylock
- get_perfetto_combined_traces setting
- OMNITRACE_TRACE_THREAD_LOCKS option
- ThreadState
- critical trace includes queue id
- enabled/disabled settings in timemory
- fix OMNITRACE_TIMEMORY_COMPONENTS
- fix reading config
- fix setting categories
- applied ThreadState::Internal in various places
- utility::get_filled_array
- utility::get_reserved_vector
- utility::get_thread_index
- fork_gotcha messages about forks
- split out some pthread_gotcha functionality into pthread_create_gotcha
- handle queue id in roctracer callbacks

* Update timemory and PTL submodules

* Misc CMake updates

- Includes fix to omnitrace-static-lib{gcc,stdcxx}

* Misc cleanup to pthread_mutex_gotcha and backtrace

* Fix to duplicate field in module_function json

* Improvement to debug messages

* omnitrace-dl and common improvements

- tweak to delimit
- common::ignore message
- common::join quoting of strings
- omnitrace_set_env ignores if inited and active
- omnitrace_set_mpi ignores if inited and active

* nsync for transpose example

* Fix to thread_deleter<void> functor invoke

* Fix thread state and HIP stream enums

[ROCm/rocprofiler-systems commit: b208047741]
2022-05-08 04:40:10 -05:00
Jonathan R. Madsen 72d0a7d08a Code Coverage Support (#46)
* Code-coverage support

* Examples update

- code-coverage example
- tweak transpose and parallel-overhead

* Coverage output + testing

- config::get_setting value(...)
- REGULAR_EXPRESSION -> REGEX in cmake func args
- coverage.hpp header
- coverage JSON
- coverage tests

* cmake formatting

* Library instrumentation w/o main + more

- fixed library instrumentation w/o main
- use TIMEMORY_PROJECT_NAME in output messages
- removed '--driver' option from omnitrace exe
- support coverage in trace mode
- OMNITRACE_KOKKOS_KERNEL_LOGGER
- support multiple calls to omnitrace_set_env after init if already called
- support multiple calls to omnitrace_set_mpi after init if same args
- support multiple calls to omnitrace_init if same mode
- unique_ptr_t for thread_data which calls finalize when thread_data is destroyed
- tweaked openmp tests
- improved finalization

* Replace CI --output-on-failure with -V

* Fix to OMNITRACE_DL_INVOKE

* omnitrace-exe and testing updates

- omnitrace::omnitrace-timemory interface library
- support for configs in omnitrace exe
- print-{available,instrumented,...} opts no longer exit w/o --simulate
- all tests apply --print-instrumented functions
- tweaked coverage tests
- print-* options print instructions not address range

* Remove OMNITRACE_DEBUG_FINALIZE=ON from CI

* Python cmake tweaks

* Tweak test ordering

* Upload CI artifacts if fail or success

* CI Python tweaks

- Use OMNITRACE_PYTHON_PREFIX and OMNITRACE_PYTHON_ENVS

* CI ELFULTILS_DOWNLOAD_VERSION

* test tweaks

- labels and more coverage tests

* tweak to omnitrace --config handling

* Update module/function constraint handling + PP

- tweak pre-processor definition handling
- removed free-standing module_constraint
- remove free-standing routine_constraint
- remove module_name.find("omnitrace") module constraint
- fully handle the output path of omnitrace *-instr files
- get_use_code_coverage config option
- print-coverage option
- coverage_module_functions

* use github.job not github.name

* Re-enable HSA_ENABLE_INTERRUPT

- remove coverage address report

[ROCm/rocprofiler-systems commit: 791375bb24]
2022-04-25 17:00:52 -05:00
Jonathan R. Madsen 28ade7fbb9 Update CI to test multiple python versions (#45)
* Update CI to test multiple python versions

* Ensure numpy is installed

* Handle lulesh with cmake < 3.16

* Fix typo

* Bump minimum CMake version to 3.16

- CMake 3.15 has issue with PTL object library

* Tweak CI test output

[ROCm/rocprofiler-systems commit: 22eaa780ec]
2022-04-22 03:05:07 -05:00
Jonathan R. Madsen 083035dd8b User API + reorganized lib folders (#30)
* User API + reorganized lib folders

- omnitrace_user_start_trace
- omnitrace_user_stop_trace
- omnitrace_user_start_thread_trace
- omnitrace_user_stop_thread_trace
- omnitrace_user_push_region
- omnitrace_user_pop_region

* New OpenMP examples/tests

* Fix to KokkosP

* OMPT support

- fixed omnitrace instrumenting reporting
- common invoke improvements
- component::user_region

* exclude kmp_threadprivate_

* Separate omnitrace into multiple files

* PTL and timemory submodule updates

* Active guards + USE_OMPT guards in omnitrace-dl

* Tweak transpose default iterations

* omnitrace-precommit build target

* Omnitrace exe restructuring pt 2

- Never instrument functions with less than 4 instructions
- Never instrument ompt_start_tool or nanosleep
- module_function serializes heuristics
- removed hash stuff from omnitrace
- removed instr_procedures lambda
- WAITPID_DEBUG_MESSAGE

* set_state, "_hidden" fix, CI exceptions, backtrace fix

- set_state function
- fixed "_hidden" from appearing in print macros using __FUNCTION__
- OMNITRACE_CI_THROW
- more CI checks in library
- fixed backtrace init value sample issue being ignored

* Tweaks to OMPT tests

* cmake-formatting

* Removed debug output from backtrace processing

* Fix warnings and verbosity

* omnitrace-dl fix for libomp

* omnitrace-avail fixes

- remove second omnitrace_init_library call
- fix -r option not working

* Additional testing

- source/bin/tests
- tests for omnitrace-exe
- tests for omnitrace-avail

* cmake-format

* Reduce runtime of openmp-lu

* Update openmp-lu and tests timeout

* openmp-lu and CI tweaks

- decrease iterations
- OMP_NUM_THREADS=2
- install clang and libomp-dev in linux-ci
- fix data-files in linux-ci

[ROCm/rocprofiler-systems commit: d80752bc69]
2022-03-07 20:40:48 -06:00
Jonathan R. Madsen b99b153030 Critical trace updates (#24)
* Source code restructuring

* Critical trace updates following restructuring

* thread_sampler, timestamps

- thread_sampler
- CPU frequency managed via thread_sampler
- rocm-smi managed via thread_sampler
- Use consistent timestamps for perfetto
- removed hsa_timer_t in favor of wall_clock::record()
- disable KokkosP by default
- re-enable critical-trace testing

* cmake-format

* Fix for defines.hpp.in

* Remove OMNITRACE_ROCM_SMI_FREQ

- thread_sampler freq is set via OMNITRACE_SAMPLING_FREQ w/ max of 1000

* Increase CI Install Dyninst timeout

* Debug macros + omnitrace_init_tooling + config

- new debug macros
- extern "C" omnitrace_init_tooling
- guard get_rocm_smi_devices

* Miscellaneous tweaks

- tweak to transpose
- critical_trace::Device::ANY
- perfetto "critical-trace" category
- OMNITRACE_VERBOSE usage

* Disable key and tid data for HIP API calls

- non-kernels are ignored in activity callback

* critical-trace exe updates

- fix perfetto generation
- improved logging
- improved readability

* timemory submodule update

- lulesh example cmake tweaks

[ROCm/rocprofiler-systems commit: b016c8929f]
2022-02-19 02:00:59 -06:00
Jonathan R. Madsen 4ae26e2d08 rocm-smi and KokkosTools support (#23)
* renamed omnitrace_thread_data to thread_data

* initial implementation

* Numerous fixes and updates

- Updated timemory submodule
- Updated perfetto submodule (pulls in fixes for TRACE_EVENT)
- pthread_gotcha only after omnitrace_init_tooling
- omnitrace banner
- config settings for rocm-smi freq and devices
- critical_trace::get_entries
- OMNITRACE_BASIC_PRINT
- rocm_smi perfetto category
- redirect roctracer warnings for ROCm 4.5.0
- property specializations for rocm-smi components
- units fixes data_tracker types
- roctracer entries for pthread_create and start_thread
- omnitrace-avail defaults to settings, not components
- settings have conforming names
- settings warn about duplicates
- ptl named threads
- decreased max freq for sampler SIGALRM
- rocm-smi names thread
- rocm-smi avoids call to hipGetDeviceCount
- name roctracer activity callback threads
- fixed binary rewrite test output names

* Update lulesh example

- supports non-UVM GPU

* Lulesh tweaks + formatting

* KokkosP + Mode + Roctracer sampling deadlock fix

- kokkosp support
- omnitrace_init_library
- config::print_settings()
- config::get_mode()
- omnitrace::Mode
- omnitrace-avail improvements (removes settings)
- handle get_verbose() < 0
- disable dyninst InstrStackFrames by default
- handle perf_event_paranoid > 1 by disabling PAPI
- SIGALRM max freq to 5.0
- Name threads
- rocm-smi handles get_use_perfetto() and get_use_timemory()
- HSA_ENABLE_INTERRUPT=0 when roctracer + sampling (fixes deadlock)

* Tests, API renaming, roctracer

- disable renaming of thread 0
- verbprintf_bare
- enable dyninst merge tramp
- tweaked some omnitrace exe verbose levels
- reworked roctracer::setup and roctracer::shutdown
- rocm_smi::data::poll checks get_state()
- omnitrace_trace_finalize -> omnitrace_finalize
- omnitrace_trace_init -> omnitrace_init
- omnitrace_trace_set_env -> omnitrace_set_env
- omnitrace_trace_set_mpi -> omnitrace_set_mpi
- sampling mode does not disable timemory
- disable roctracer before shutting down rocm-smi
- lulesh tests w/ and w/o kokkosp
- lulesh tests for perfetto only
    - with --dynamic-callsites --traps --allow-overlapping
- lulesh tests for timemory only
    - with --stdlib --dynamic-callsites --traps --allow-overlapping

* Update timemory submodule

- fix for TIMEMORY_PROPERTY_SPECIALIZATION

* get_verbose() handling + timemory submodule update

- Findroctracer.cmake uses find_package(hsakmt)

* Stability fixes + rework roctracer + perfetto

- reworked roctracer start up
- critical_trace perfetto basic values
- perfetto sampling category
- sampler checks signals
- peak_rss in sampling
- pthread_gotcha::shutdown()
- rocm_smi::device_count()
- HSA_TOOLS_LIB is set
- HSA_ENABLE_INTERRUPT in omnitrace exe
- omnitrace exe verbosity level changes
- Avoid instrumenting Impl ns in Kokkos
- gpu::device_count prefers rocm_smi instead of hip
- ptl blocks signals
- fixed pthread_gotcha roctracer_data values
- removed runtime-instrument-sampling tests
- timemory submodule update

* cmake formatting

* timemory + roctracer updates

- fix timemory issue with papi_common
- fix timemory issue with units
- define roctracer::is_setup()

* Miscellaneous tweaks

- Disable sampling during runtime instrument
- Fixed warnings about dynamic callsites
- Fixed backtrace output when timemory disabled
- Test tweaks

* cmake-format

* omnitrace_target_compile_definitions

* timemory submodule update

* config, omnitrace, State, mpi_gotcha updates

- use OMNITRACE_THROW instead of direct throw
- is_attached()
- is_binary_rewrite()
- get_is_continuous_integration()
- get_debug_init()
- get_debug_finalize()
- max_thread_bookmarks default to 1
- State::Init
- app_thread oneTimeCode
- runtime instrumentation uses waitpid
- fixed init_names
- include main in MPI runs
- fixed sampling setup when disabled
- reworked mpi_gotcha
- disabled critical trace in transpose test

* cmake-format

* handle rocm_smi::device_count() exception

* CI timeouts

* Re-enable runtime-instrument + sampling

[ROCm/rocprofiler-systems commit: 39f17ae8b8]
2022-02-08 17:42:17 -06:00
Jonathan R. Madsen b4a82711d1 Sampler improvements (#22)
* Sampler improvements

- roctracer_flush_activity
- papi_array in backtrace
- fixed sampler trait specializations
- split main_bundle into main and gotcha bundles
- cmake option display

* timemory update

* EINTR handling + debug_{pid,tid}

- sampler handles EINTR for sem_init and sem_destroy
- OMNITRACE_DEBUG_{TIDS,PIDS} env variables

* Increase waitForStatusChange

[ROCm/rocprofiler-systems commit: eccba14f00]
2022-01-27 21:31:08 -06:00
Jonathan R. Madsen f17ff12a66 Sampling support + testing + omnitrace namespace (#19)
* omnitrace namespace

* Kokkos + Lulesh example/tests

* Sampling support + more

- OMNITRACE_BUILD_TESTING option
- sampling support
- pthread_gotcha
- fixes to labels for mpi_gotcha, fork_gotcha, omnitrace_component
- tasking::block_signals, tasking::unblock_signals
- instrumentation mode option in omnitrace exe
- argument option groups in omnitrace exe
- categories in omnitrace settings
- remove TIMEMORY_ prefixed options

* Release workflow updates

* Updated settings printing

* Fixed defaults in README

* Tweak setting defaults in README

* CMake fixes

* cmake-format

* clang-format

* LULESH_USE_MPI OFF

* LULESH_USE_MPI fix

* timemory add_secondary fix

* timemory ambiguous internal namespace fix

* Update timemory submodule

* Handle output path/prefix in omnitrace

- updated timemory
- updated test environment

* sampling + papi fix

* Fix to sampling without PAPI

* Fix for using too many processors in CI

* formatting

* Updated CI

- minor cmake tweaks
- updated timemory submodule

* Updated CI

* Updated CI

* CI + timemory updates

- data race fixes

* CI updates + debug for sampling

* Sampling updates

- moved tasking::{block,unblock}_signals to sampling namespace
- improvements to sampling w.r.t. thread-locality

* Minimum OMNITRACE_THREAD_COUNT of 128

* Handle multiple dims in sampler data

* Configure libunwind support for timemory

* Improved safeguards for sampling

- updated CI
- lulesh runtime-instrument test tweak

* formatting

* CI updates + sampler updates + misc

- fixed stack-buffer-overflow in omnitrace (get_*file_line_info)
- test labels
- steady_clock instead of system_clock in sampler
- update dyninst submodule with upgradePlaceholder fix
- disable OMNITRACE_BUILD_TESTING by default

* Updated timemory submodule

- hidden visibility for timemory
- storage finalizers do not capture this

* Update timemory submodule

- component visibility updates

* Reworked header includes

- use <...> for timemory headers
- always include <library/defines.hpp>

* Rename some config options

* Update PTL submodule

* Update kokkos submodule

* Updated sampling

* Updated CI

* Reworked instrumentation exe

- lowered min-address-range threshold to 256
- extended whole function exclude

* CI fix + timemory submodule update

- TIMEMORY_VISIBLE on component base
- RelWithDebugInfo -> RelWithDebInfo
- Info output for parallel-overhead

* Sampling flags + transpose update + CI update

- disable critical trace for parallel-overhead in CI
- SA_RESTART only in sampler
- reworked transpose example to use fewer threads

* CI update

- removed ubuntu-focal-external-debug
- reduced data artifacts upload

* CI timeouts

- updated timemory submodule
- minor tweaks to omnitrace exe logging

* LICENSE updates (partial)

* CI Test stage timeout extension

* Docker and Packaging updates

* Miscellaneous fixes/tweaks

- gpu.hpp / gpu.cpp
- disable roctracer component if no devices
- re-enable InstrStackFrames by default
- disable sampling by default
- pthread_gotcha::m_enable_sampling is false by default
- timemory submodule update w/ sampler and pop(tid) updates
- fix minor bug in sampler logic
- CMake: OMNITRACE_USE_HIP option
- roctracer + timemory fix

* Replaced OMNITRACE_USE_ROCTRACER with OMNITRACE_USE_HIP where appropriate

* cmake format

* Sampler deadlock fixes

* Removed debug messages from sampler

* Fix for MPI detection + test tweaks + misc

* Sampler deadlock fixes + misc

- removed papi_tot_ins
- pthread_gotcha blocks signals globally until sampler is setup
- metadata specialization for sampling components
- OMNITRACE_INSTRUMENTATION_MODE -> OMNITRACE_MODE
- default sampling delay increased to 0.05 from 1.0e-6
- removed {block,unblock}_signals from critical_trace and ptl
    - no longer necessary to use
- sampling delay minimum is 1.0e-3
- OMNITRACE_BUILD_HIDDEN_VISIBILITY

* omnitrace-avail + libunwind update + restructure

- restructured omnitrace components
- build custom omnitrace-avail executable
- updated libunwind to avoid malloc in get_unw_backtrace

* Fix remaining reorganization issues

- removed some duplicate code
- fixed some trait specializations after implicit instatiation
- formatting

* ensure_storage fix + avail improvements

- fix ensure_storage when component not avail
- suppress irrelevant info in omnitrace-avail

* Delay settings initialization

- slight tweak to tests w/ MPI

* Disable OpenMPI testing w/ ubuntu-bionic

- MPI testing is hanging bc of network interface issue on system:

> [[20462,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> Module: OpenFabrics (openib)
>   Host: fv-az19-371
> Another transport will be used instead, although this may result in
> lower performance.
> NOTE: You can disable this warning by setting the MCA parameter
> btl_base_warn_component_unused to 0.

[ROCm/rocprofiler-systems commit: 778af2a760]
2022-01-24 20:49:17 -06:00
Jonathan R. Madsen 9d5ebf9c3b Rename hosttrace to omnitrace (#18)
[ROCm/rocprofiler-systems commit: 39cf760a4e]
2021-11-24 04:59:59 -06:00
Jonathan R. Madsen efb6d766af Reorganization and critical trace support (#17)
* Roctracer wall clock integration (#16)

* Integrates roctracer values into wall-clock

* Fixed scoping + timemory roctracer

* Fixed data race in roctracer

* Synchronized HIP API on main thread

- Cache hip activity callbacks and execute on main thread
- Minor updates to transpose

* Debugging + MPI + transpose updates

* PTL + HSA and timemory + kernel timing

- PTL usage fixed HSA + timemory issues bc we could control the thread destruction
- Fixed laps counting in roctracer callbacks

* Ignore select HIP API types

- The ignored API types are ignored because there appears to be a bug
  which causes the "end" callback to be labeled as begin
- hipDeviceEnablePeerAccess
- hipImportExternalMemory
- hipDestroyExternalMemory

* Tweaks to PTL config

* Timemory update + pid-prefix w/ mpi headers

- %pid%- prefix with mpi headers
- timemory submodule update

* CMake + critical trace + reorganize library source

- clang-tidy tweaks
- cmake function updates to use hosttrace_ prefix
- update gitignore
- cmake HOSTTRACE_MAX_THREADS option
- Formatting.cmake
- cleaned up MacroUtilities.cmake
- PTL submodule + usage
- tweak to Findroctracer.cmake
- MT transpose
- Updated PTL submodule
- Updated timemory submodule
- fix to hosttrace return value type if type not found
- reorganized library source code
- support for critical trace

* Remove bits/stdint-uintn.h headers

* Rename + config + depth + critical path

- rename hosttrace_timemory_data to instrumentation_bundles
- rename hosttrace_bundle_t to main_bundle_t
- rename bundle_t to instrumentation_bundle_t
- rework of configuration setup
- critical_trace write directly to file option
- tweaked depth calculation
- updated timemory submodule
- improved parallel support in roctracer callbacks
- working critical_trace
- perfetto device-critical-trace and host-critical-trace categories
- made transpose example parallel
- made parallel-overhead example a bit uneven
- relocated LTO activation

* Fixed duplicates in perfetto critical-trace

* reworked critical trace support

- substantial perf improvement (30-45 min -> 30 sec)
- changes to configuration (new and removed options)

* Removed "%pid%-" output prefix in mpi_gotcha

* Update timemory submodule

[ROCm/rocprofiler-systems commit: 752424efc2]
2021-11-23 02:53:14 -06:00
Jonathan R. Madsen cdd2707058 v0.0.3: MPI wrappers w/o full MPI support + setup-env.sh (#15)
* v0.0.3: MPI wrappers w/o full MPI support + setup-env.sh

- bumped to v0.0.3
- enabled gotcha wrapping of MPI functions w/o enabling MPI
- added setup-env.sh script
- minor updates to testing
- Update timemory submodule
- fixes tim::component::configure_mpip undefined symbol
- Script updates

[ROCm/rocprofiler-systems commit: c56b49a0bd]
2021-10-01 16:46:03 -05:00
Jonathan R. Madsen f3e7a1664a cmake-format + miscellaneous tweaks (#13)
* cmake-format + miscellaneous tweaks
* Formatted cmake in examples and tests
* Updated linux-ci.yml artifacts naming
* Updated clang-format
* Fixed submodule branches

[ROCm/rocprofiler-systems commit: 6c93674f92]
2021-09-20 11:12:06 -05:00
Jonathan R. Madsen 244b308cb5 Integrated perfetto + roctracer (#5)
- hosttrace library automatically collects and merges timestamps for HIP API calls and kernels with the host-side instrumentation
  - mostly eliminates the need for using external rocprof
- added thread_instruction_count in perfetto output
- increased hosttrace min_loop_address_range to 512
- disabled instrumenting functions with dynamic callsites by default
- miscellaneous cmake updates

* roctracer support

- fully integrated perfetto + roctracer outputs
- thread_instruction_count in perfetto
- increased min_loop_address_range to 512
- disabled instrumenting functions with dynamic callsites by default
- updated timemory submodule

* hosttrace_launch_compiler

- support for using an alternative compiler as needed via launch compiler
- elfio added as submodule (not currently used)
- miscellaneous cmake updates

* README update + host/device categories + misc

- timemory fix for TIMEMORY_ROCTRACER_ENABLED
- transpose fix

* papi_tuple_t -> papi_tot_ins

- minor fix to Findroctracer.cmake

[ROCm/rocprofiler-systems commit: a061b7947f]
2021-09-06 22:23:24 -05:00
Jonathan R. Madsen 6825578603 Improved analysis of functions to instrument + MPI support + timemory support (#2)
* various tweaks
* build updates + cleanup + overlap guard + min addr range
* Library source reorg + miscellaneous tweaks
* Removed unnecessary fwd decls
* Print address range in --print-X pair mode

- hosttrace modifications
  - disable instrumenting functions with overlapping sections or multiple entry points by default (control via --allow-overlapping option)
  - disable instrumenting functions whose address range < 512 bytes unless a loop is present by default (control via --min-address-range option)
  - disable instrumenting functions w/ loops whose address range < 64 bytes (control via --min-loop-address-range)
- Support for wrapping MPI function calls even in binary rewrite mode
  - e.g. use gotcha to wrap MPI functions with hosttrace_push_trace and hosttrace_pop_trace
- New timemory only mode --> HOSTTRACE_USE_TIMEMORY=ON
- New timemory + perfetto mode --> HOSTTRACE_USE_PERFETTO=ON + HOSTTRACE_USE_TIMEMORY=ON
- Full support for all timemory components
- parallel-overhead example for measuring the overhead in a MT-parallelized application with very small instrumentation functions
- improvements to output directories for hosttrace exe
- improvements to output directories for hosttrace library
- new hosttrace options
  - --print-instrumented <type> prints out the instrumented entities and exits
  - --print-available <type> prints out the available instrumentation entities and exits
  - --print-overlapping <type> prints out the overlapping entities and exits
  - NOTE: <type> above refers to the information printed out, e.g. module name vs. function name vs. module and function name, etc.

[ROCm/rocprofiler-systems commit: 1f15b3070f]
2021-09-02 11:38:39 -05:00
Jonathan R. Madsen 045f29eda6 timemory submodule (#1)
- library now wraps fork() to warn about fork + OpenMPI

[ROCm/rocprofiler-systems commit: fc7cc1e68f]
2021-08-17 17:34:34 -05:00
Jonathan R. Madsen 581c4122aa Hosttrace via Dyninst
- complete with ctest support


[ROCm/rocprofiler-systems commit: 9ef3800986]
2021-08-06 13:08:57 -05:00