The Omnitrace program is being renamed.
Full name: "ROCm Systems Profiler"
Package name: "rocprofiler-systems"
Binary / Library names: "rocprof-sys-*"
---------
Co-authored-by: Xuan Chen <xuchen@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: d07bf508a9]
PyTorch Python fork fix part 2
- store script file in environment for robustness against restart after fork
[ROCm/rocprofiler-systems commit: 6c9b66d938]
* PyTorch Python fork fix
- fixes issue where forking process in PyTorch causes omnitrace/__main__.py to fail due to missing script argument
* Update source/python/omnitrace/__main__.py
Remove debugging "print" LOC
[ROCm/rocprofiler-systems commit: a85f141afe]
* Long workload name layout fix (#269)
* changed layout to fit experiment names
- added span to show full name
- shortened name to fit dropwdown
- changed layout for added consistency
* layout Fixes
- refresh button is to the right
- header is more consistent across different width screens
* header layout update
- center div makes turns into multiple lines if not all items fit
* slight improvement for header/graph spacing
* Fixed refresh button shape and function
- moved find_causal_files to parser so that main and gui can access
- resized refresh width to allow for same shape across different screens
* all graphs now have the same width
- graphs now have same width
- chart headers start well below the header with filters
---------
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Causal GUI: Linting, synced y-range, remove unused imports/variables, and bug fixes (#274)
* GUI: python linting workflow
- runs flake8 on code in source/python/gui
* GUI: flake8 settings in source/python/gui/setup.cfg
- ignore E501 errors (line too long)
- ignore W503 errors (line break before binary operator)
* GUI: setup.py updates
- remove unused imports
* GUI: __main__.py updates
- stddev is float value
- remove unused imports
- effectively propagate the --stddev argument
* GUI: gui.py updates
- remove unused imports
- fix light mode
- sync initial y range of all causal plots
- fix error bars for causal data
- set x-ticks to 5
- set y-ticks to 10
- only display top 99% of samples
- separate global declarations and assignments to global values
- remove unused variable assignments
- fix mislabeled function_regex and exp_regex
- change if X == False to if not X
* GUI: header.py updates
- remove unused imports
- fix mislabeled function_regex and exp_regex
* GUI: parser.py updates
- add set_num_stddev function for manipulating global num_stddev value
- remove unused variables
- fix latency point object (duplicated __init__ function)
- fix handling latency in JSON
- fix formatting of validation format error message
- replace if X == False with if not X
- fix unused dataframe creation in add_latency
- fix flake8 do not assign lambda for name_wo_ext (use def)
* GUI: gui.py updates
- replace misnamed "func_list" with "experiment_list"
- replace misnamed "exp_list" with "progpt_list"
* GUI: fix python workflow
- quote python versions to avoid truncating 3.10 to 3.1
---------
Co-authored-by: JoseSantosAMD <87447437+JoseSantosAMD@users.noreply.github.com>
[ROCm/rocprofiler-systems commit: cc14b52584]
* omnitrace-run exe
- ensure LD_PRELOAD for libomnitrace-dl.so
- convert config options into command-line options
* Update timemory submodule
- updates to tsettings
- updates to argparser
* common environment update
- throw error if get_env<bool> has empty string
* config updates
- minor tweaks to categories of settings
* core lib update
- add argparse for common handling of argument parsers
* omnitrace-sample update
- fix handling of --trace-file (OMNITRACE_PERFETTO_FILE)
* omnitrace-run update
- updated to use omnitrace::argparse functions
* Tests for omnitrace-run
* argparse core update
- remove choices for --cpu-events and --gpu-events
* remove some debugging prints
* fix timemory include in argparse.cpp
* always provide --hsa-interrupt option
* Update source/lib/core/argparse.cpp
- fix pedantic warning
* Update testing
- remove testing args that may not be there in some builds
* roctracer/pthread_create fix
- disable roctracer_data when roctracer not enabled
* omnitrace-causal tweak
* omnitrace-instrument: module_function tweak
- allow DEFAULT_MODULE and LIBRARY_MODULE
* common environment update
- support get_env for enums
* core: config update
- Add "mode" category to OMNITRACE_MODE
* Update timemory submodule
- remove debug print statement
* omnitrace-sample tweak
- change var init
* omnitrace-run testing update
- use --help instead of -?
* core: common.hpp
- tweak header include style
* core: argparser update
- add_ld_preload func
- launcher and command member variables in parser_data
- support launcher
* omnitrace-run update
- clean up and reworked
* libomnitrace-dl updates
- require LD_PRELOAD with binary rewrite
- dl::InstrumentMode
- dl::get_instrumented()
- verify_instrumented_preloaded()
- omnitrace_set_instrumented(int)
- relocated omnitrace_main from main.c to dl.cpp
- omnitrace_set_env does not dlopen libomnitrace
- omnitrace_set_main(func_ptr) [internal API]
- OMNITRACE_HIDDEN_API -> OMNITRACE_INTERNAL_API
* Update testing to new LD_PRELOAD requirements
* omnitrace-instrument updates
- adhere to LD_PRELOAD requirementsa
- invoke omnitrace_set_instrumented
- binary rewrite does not instrument main
- binary rewrite does not instrument call to omnitrace_init
- runtime instr does not instrument main
- runtime instr does not instrument call to omnitrace_init
* Bump to v1.9.0
- LD_PRELOAD requirement necessitates minor version increment
* common: environment
- fix ambiguous get_env calls
* omnitrace-instrument update
- fix issue with temporaries
* omnitrace-instrument and libomnitrace-dl updates
- runtime instrumentation does not work if libomnitrace-dl is preloaded
* libomnitrace-dl and libpyomnitrace updates
- define dl::InstrumentMode in dl.hpp
- handle instrumentation via setprofile libpyomnitrace
- do not push trace in omnitrace_init
* omnitrace-instrument and libomnitrace-dl updates
- move header to dl subdirectory
- omnitrace::omnitrace-headers include omnitrace-dl folder
- use InstrumentMode in omnitrace-instrument
* Update workflows and scripts
- Use omnitrace-run on instrumented exes
* Update docs
- add omnitrace-run to examples of running binary rewritten exes
[ROCm/rocprofiler-systems commit: abe35de43a]
* Fixes for Python 3.11
* Add python 3.11 to scripts
- also tweak to to{upper,lower} bash functions
* Fix PAPI RPM packaging in RedHat
- fix error from #!/usr/bin/python in papi_hl_output_writer.py
- requires either python2 or python3 instead of python
* cpack updates
- only generate STGZ for RedHat
- support `--generators` arg in build-release.sh
- support 7z, zip, and other zip generators
- fix build-release.sh with `--mpi`
- support setting CONDA_ROOT
* Support rhel/fedora/centos in omnitrace-install.py
* RedHat status badge
* Fix support for Python 3.11 + tweak ubuntu ci
- Remove installing clang and mpich in Ubuntu CI container
- Fallback on conda-forge for Python 3.11
- Enable entrypoint-rhel.sh for RHEL CI
- Pull latest container by default
* Update ElfUtils and PAPI builds
- quieter build output
- disable-nls for ElfUtils
- use -s flag for make
* Development Guide Docs
[ROCm/rocprofiler-systems commit: 83f9ed8696]
* Support external (i.e. user-defined) trace annotations
- tweaked the python examples to be more balanced
- updated the user-api example to conform to user API changes
- moved the get/set for State and ThreadState to state.{hpp,cpp}
- introduced user-provided trace annotations
- added perfetto python category
- moved coverage impl files around
- created enumerations for mapping category enums to category types
- created enumerations for mapping annotation type enums to annotation values
- moved tracing::add_perfetto_annotations to tracing/annotation.hpp
- utility make_index_sequence_range
- libomnitrace-dl: omnitrace_push_category_region
- libomnitrace-dl: omnitrace_pop_category_region
- libomnitrace-user: omnitrace_user_push_annotated_region
- libomnitrace-user: omnitrace_user_pop_annotated_region
- libpyomnitrace: support extra annotations via annotate_trace config value
- filename
- line
- last attempted instr in bytecode (lasti)
- argcount
- num local variables
- stacksize
- omnitrace-python: -a / --annotate-traces option
* tweak ubuntu-focal workflow
* Fix installation of omnitrace-user headers
* ubuntu-focal-codecov workflow update
- Install texinfo
* Update timemory submodule
[ROCm/rocprofiler-systems commit: 642b6b95ca]
* Submitting jobs to cdash
* Fail on submit
* submit url env
* submit url env
* try passing submit url as arg
* fix submit url
* Updated default URL
* Add submissions for remaining ubuntu focal workflow jobs
* Replace g++ with gcc in dashboard build name
* Add --ctest-args to run-ci.sh
* Add cdash support for bionic, jammy, and opensuse workflows
* Decrease CTEST_CUSTOM_MAXIMUM_PASSED_TEST_OUTPUT_SIZE
* OMNITRACE_BUILD_CODECOV option
* Support code coverage in CDash script
* CI dyninst built with debug info
* Update ci-containers
- cron schedule moved 4 hours later to UTC+5
* Update implementation of config::configure_signal_handler
- using lambdas failed to compile with codecov flags
* Add codecov job to ubuntu focal workflow
* Fix support for --ctest-args in run-ci script
* Fix ubuntu workflows
* Fix quotation handling in run-ci script
* git safe directory for codecov
* New MPI examples
* Remove --stop-on-failure
* dynamic_library update
- find_library_path checks procfs maps
- invoke find_library_path with no additional args to resolve to mapped file
* RCCLP uses dynamic_library
* check if file exists for memory_map_files metadata
* Testing updates
- include new mpi examples in tests
- fix test labels
- test critical-trace exe
* Update MPI C examples tests (needed arg)
* Remove try/catch block from critical-trace
* Fix sampling max wait when shutting down
* Fix test env for critical-trace
* Fix settings for critical-trace
- disable time output: data is deterministic
- disable PID suffixes: not multiprocess
* Update critical-trace ctest
* Update critical-trace exe
- throw error if input cannot be opened
- throw error if input has no data
* Update lulesh example with more kokkos tools usage
* Fix tasking issue with critical_trace and roctracer
- were not setting pools to active
- also sync before critical_trace::get_entries
* Increase verbosity of critical-trace tests
* Update code coverage tests
- skip code coverage + preload
- code-coverage python example and test
* Remove duplication omnitrace.initialize function
* Skip python3.6 for ubuntu jammy
* Update MPI examples
- use MPI_Isend and MPI_Irecv
- explicitly use MPI_Bcast
* Update Formatting.cmake
- include C files in examples
* run-ci script does not check return of coverage
* mpi-allreduce link to libm
* Update ctest args in run-ci script
* Update dyninst submodule
- safety improvements in BinaryEdit::openResolvedLibraryName
* capture cmake error for ctest_coverage
[ROCm/rocprofiler-systems commit: 46b6db1a4c]
## Overview
This is a significant PR which has 3 very notable characteristics:
1. Omnitrace colorizes most of it's logging
2. Completely reworked the sampling
- Samples now record the current instruction pointers instead of strings
- This _dramatically_ decreases the overhead of taking a sample
- The collection of metrics during a sample are split out into another component, enabling that data collection to be disabled -- which decreases the sampling overhead even further
- When both `OMNITRACE_SAMPLING_CPUTIME` and `OMNITRACE_SAMPLING_REALTIME` are ON:
- `OMNITRACE_SAMPLING_CPUTIME_FREQ` and `OMNITRACE_SAMPLING_REALTIME_FREQ` can be used to individually control the sampling frequency
- `OMNITRACE_SAMPLING_CPUTIME_DELAY` and `OMNITRACE_SAMPLING_REALTIME_DELAY` can be used to individually control the delay time before starting
- Now, omnitrace does not start a real-time sampler on the main thread unless `OMNITRACE_SAMPLING_REALTIME` is ON
- In the future, an `OMNITRACE_SAMPLING_TIDS` (and real-time, cpu-time variants) configuration variable(s) will allow you to select which threads will be sampled
3. Files produced by `omnitrace` exe -- `available-instr.txt`, `instrumented-instr.txt`, etc. -- now no longer has `-instr` suffix and are placed in `instrumentation/` subfolder, i.e. `available-instr.txt` -> instrumentation/available.txt`
- This helped de-clutter the output folder
Most of the other edits were reorganization (e.g. internal namespace changes), cleanup, and splitting up functionality.
## Bug Fixes
There is a bug fix with respect to the HSA callbacks which disabled sampling on child threads when an HSA API call was made
## Details
- created thread_info struct for mapping different thread IDs
- reorganized file structure significantly
- added categories.hpp, concepts.hpp
- moved around name trait definitions
- moved all omnitrace components into `omnitrace::component` namespace
- there was a lot of inconsistency b/t using `tim::component` in some places and `omnitrace::component`
- added macros like OMNITRACE_DECLARE_COMPONENT in lieu of TIMEMORY_DECLARE_COMPONENT
- OMNITRACE_CRITICAL_TRACE_NUM_THREADS -> OMNITRACE_THREAD_POOL_SIZE
- roctracer and critical_trace use same thread pool
- critical_trace functions do not lock anymore bc of thread-local TaskGroup
- added `component::local_category_region` to support using `component::category_region` without explicitly passing in name
- removed `component::omnitrace` (unused)
- migrated KokkosP and OMPT to use `component::local_category_region`
- removed `component::user_region` as a result
- migrated omnitrace_{push,pop}_{trace,region}_hidden to use component::category_region
- removed `component::functors` as a result
- migrated some ppdefs
- `api::omnitrace` -> `project::omnitrace`
- `api::(...)` -> `category::(...)`
- improved recording the execution time of threads
- migrated this functionality out of pthread_create_gotcha and into thread_info
- moved mpi_gotcha, fork_gotcha, exit_gotcha, rcclp into omnitrace::component namespace
- split backtrace up into backtrace, backtrace_metrics, backtrace_timestamp components
- sampling.cpp handles setup and post-processing that was formerly in backtrace
- updated logging to use colors
- `OMNITRACE_COLORIZED_LOG` config variable
- updated docs on JSON output from timemory
- instrumentation info in instrumentation subfolder
- added testing for KokkosP entries
- added testing for ompt entries
- add_critical_trace function defined in critical_trace.hpp
- disable push_thread_state and pop_thread_state when thread state is Disabled or Completed
- add comp::page_rss to main bundle
- thread_data supports std::optional instead of std::unique_ptr
- thread_data supports tim::identity<T> to avoid unique_ptr or optional
- tracing::record_thread_start_time()
- tracing::push_timemory and tracing::pop_timemory are templated on CategoryT
- removed anonymous namespace from omnitrace::utility
- sampling backtrace stores instruction pointers instead of strings
- component::category_region updates
- handle disabled thread state
- handle finalized state
- fewer debug messages
- invoke thread_init()
- invoke thread_init_sampling()
- handle push/pop count based on category
- push/pop count only modified when used
- component::cpu_freq
- components/ensure_storage.hpp
- reworked the pthread_create replacement function
- updated parallel-overhead example to report # of times locked
- OMNITRACE_MAX_UNWIND_DEPTH build option
- update timemory submodule
[ROCm/rocprofiler-systems commit: 808ea7dfa7]
- When one python version is used, install to proper lib/pythonX.Y/site-packages
- config files, etc. in build tree resemble the install-tree
[ROCm/rocprofiler-systems commit: 95913c7135]
* User regions in Python
* User-region testing + common submodule
- Updated examples/python/source.py to use user-regions
- Python-level user submodule
- Python-level common submodule
- clean-up of profiler python code
- extended source.py testing to include the user-regions
[ROCm/rocprofiler-systems commit: 4dd144a32c]
* adding perfetto-validation-script
* Rename validation script
* Update tests/validate_perfetto.py
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Update tests/validate_perfetto.py
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Update tests/validate_perfetto.py
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Update tests/validate_perfetto.py
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* addressed the edits in the validation script
* addressed the edits in the validation script
* Perfetto validation script (#1)
* Fixed mismatch message in validate-timemory-json
* validate_perfetto.py -> validate-perfetto-proto.py
* Add python-source-validate-perfetto
- python-source-validate-perfetto uses validate-python-proto.py to validate perfetto output
- renamed python-source-check test to python-source-validate timemory
* Moved python-source-validate tests outside of cat command if block
- these tests don't rely on OMNITRACE_CAT_COMMAND
* CMake/CTest OMNITRACE_ADD_PYTHON_VALIDATION_TEST function
- generalized function for performing validation test with validate-{timemory-json,perfetto-proto}.py scripts
* Print perfetto validation
* Install perfetto python package in workflows
* cmake format
* Python formatting
* Python formatting CI
* Install perfetto python package in workflows
* Install dataclasses for perfetto in opensuse
* Install dataclasses for perfetto in ubuntu
- uninstalled dependency for perfetto in Python 3.6
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-systems commit: cfa16cbe2f]
* Implements --label option for python profiler
- removes '-a', '-f', and '-l' options in favor of '--label args file line' for consistency with omnitrace exe
* cmake formatting
* Bump version to 1.1.0
[ROCm/rocprofiler-systems commit: 017e794d63]
* Parallel overhead example with locks
* Support tracing mutex locking + more
- support wrapping pthread_mutex_lock
- support wrapping pthread_mutex_unlock
- support wrapping pthread_mutex_trylock
- get_perfetto_combined_traces setting
- OMNITRACE_TRACE_THREAD_LOCKS option
- ThreadState
- critical trace includes queue id
- enabled/disabled settings in timemory
- fix OMNITRACE_TIMEMORY_COMPONENTS
- fix reading config
- fix setting categories
- applied ThreadState::Internal in various places
- utility::get_filled_array
- utility::get_reserved_vector
- utility::get_thread_index
- fork_gotcha messages about forks
- split out some pthread_gotcha functionality into pthread_create_gotcha
- handle queue id in roctracer callbacks
* Update timemory and PTL submodules
* Misc CMake updates
- Includes fix to omnitrace-static-lib{gcc,stdcxx}
* Misc cleanup to pthread_mutex_gotcha and backtrace
* Fix to duplicate field in module_function json
* Improvement to debug messages
* omnitrace-dl and common improvements
- tweak to delimit
- common::ignore message
- common::join quoting of strings
- omnitrace_set_env ignores if inited and active
- omnitrace_set_mpi ignores if inited and active
* nsync for transpose example
* Fix to thread_deleter<void> functor invoke
* Fix thread state and HIP stream enums
[ROCm/rocprofiler-systems commit: b208047741]
* Code-coverage support
* Examples update
- code-coverage example
- tweak transpose and parallel-overhead
* Coverage output + testing
- config::get_setting value(...)
- REGULAR_EXPRESSION -> REGEX in cmake func args
- coverage.hpp header
- coverage JSON
- coverage tests
* cmake formatting
* Library instrumentation w/o main + more
- fixed library instrumentation w/o main
- use TIMEMORY_PROJECT_NAME in output messages
- removed '--driver' option from omnitrace exe
- support coverage in trace mode
- OMNITRACE_KOKKOS_KERNEL_LOGGER
- support multiple calls to omnitrace_set_env after init if already called
- support multiple calls to omnitrace_set_mpi after init if same args
- support multiple calls to omnitrace_init if same mode
- unique_ptr_t for thread_data which calls finalize when thread_data is destroyed
- tweaked openmp tests
- improved finalization
* Replace CI --output-on-failure with -V
* Fix to OMNITRACE_DL_INVOKE
* omnitrace-exe and testing updates
- omnitrace::omnitrace-timemory interface library
- support for configs in omnitrace exe
- print-{available,instrumented,...} opts no longer exit w/o --simulate
- all tests apply --print-instrumented functions
- tweaked coverage tests
- print-* options print instructions not address range
* Remove OMNITRACE_DEBUG_FINALIZE=ON from CI
* Python cmake tweaks
* Tweak test ordering
* Upload CI artifacts if fail or success
* CI Python tweaks
- Use OMNITRACE_PYTHON_PREFIX and OMNITRACE_PYTHON_ENVS
* CI ELFULTILS_DOWNLOAD_VERSION
* test tweaks
- labels and more coverage tests
* tweak to omnitrace --config handling
* Update module/function constraint handling + PP
- tweak pre-processor definition handling
- removed free-standing module_constraint
- remove free-standing routine_constraint
- remove module_name.find("omnitrace") module constraint
- fully handle the output path of omnitrace *-instr files
- get_use_code_coverage config option
- print-coverage option
- coverage_module_functions
* use github.job not github.name
* Re-enable HSA_ENABLE_INTERRUPT
- remove coverage address report
[ROCm/rocprofiler-systems commit: 791375bb24]
* Support multiple Python versions in single build
* RPATH + Split up config into config and runtime
* pybind11 submodule
* Docker build updates
[ROCm/rocprofiler-systems commit: 4db6ba3d28]