* Integrating amd-smi into rocprofiler-systems due to rocm-smi deprecation.
* No functionality changes to users other than naming conventions.
* New tracks available in perfetto- gpu busy percentage metrics now splits gfx busy into separate gfx, umc, and mm engine measurements.
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
The Omnitrace program is being renamed.
Full name: "ROCm Systems Profiler"
Package name: "rocprofiler-systems"
Binary / Library names: "rocprof-sys-*"
---------
Co-authored-by: Xuan Chen <xuchen@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Bump version to 1.10.3
* Drop releases for ROCm < 5.3
- ROCm is no longer providing release for Ubuntu 18.04 starting with 5.3 so omnitrace is dropping support for Ubuntu 18.04 + ROCm
- Dropping ROCm 5.2 releases for Ubuntu 20.04
- Dropping ROCm 5.2 releases for OpenSUSE 15.4
* Update redhat workflow
- Test RedHat 9.1 + ROCm 5.5
- Test RedHat 9.1 + ROCm 5.6
* Update ubuntu-focal workflow
- drop ROCm 5.2 testing
- add ROCm 5.6 testing
* Update Findroctracer.cmake
- provide /opt/amdgpu to HINTS/PATHS for drm and drm_amdgpu libraries
* Update Findrocprofiler.cmake
- prefer librocprofiler64.so.1
* Update librocprofiler64.so to librocprofiler64.so.1
- search for the SOVERSION 1 library of librocprofiler64.so if ROCm > 5.5.0
* Update Findrocprofiler.cmake
- link to libpciaccess for ROCm 5.5.0
* Update redhat CI workflow
- install libpciaccess for rocm CI
* Update cpack workflow
- Remove all RHEL 9.0 packaging
- Remove all packaging for ROCm 5.3 on OSes supporting where releases are provided for 5.4, 5.5, and 5.6
* Update ubuntu focal workflow
- remove rocm 5.3 jobs
* CDash name prefix {{ repo_owner }}-{{ ref_name }}
- remove /merge from CI name
* disable using BFD when sampling_include_inlines is OFF
- this consumes a lot of memory
* Improve finalization of rocprofiler
* update timemory submodule
- disable OMPT thread begin/end callbacks
- support hierarchies in signal handlers
- update operation::pop_node debugging
- settings_update_type + setting_supported_data_types
- fixed parsing args in timemory_init
* Improve timemory build time
* Remove kokkosp restrictions for perfetto
* omnitrace exe signal handler update
- configure signal handlers before main to allow libomnitrace to override
* Backtrace and timemory submodule updates
- Use unwind::cache w/o inline info
- update timemory submodule
- unwind::cache updates
- filepath updates
- fix termination_signal_message
- fix vsettings::report_change
* Update dyninst submodule
- updates BinaryEdit::getResolvedLibraryPath
* update timemory submodule
- update CpuArch support
* Cleanup configure warnings
* Update examples cmake and workflows
- (Mostly) eliminate configuration warnings
* omnitrace exe updates
- pass environ to BPatch::processCreate
- avoid trailing ":" in DYNINST_REWRITER_PATHS
* Update dyninst submodule
- Add flags to DyninstOptimization.cmake
- Remove strtok from BinaryEdit::getResolvedLibraryPath
* examples/mpi CMakeLists.txt update
- STATUS message about missing MPI during CI, otherwise AUTHOR_WARNING
* Dev build and linker flags
- use -gsplit-dwarf when OMNITRACE_BUILD_DEVELOPER is ON
- disable when OMNITRACE_BUILD_NUMBER > 1
- OMNITRACE_BUILD_LINKER option
- add -fuse-ld=${OMNITRACE_BUILD_LINKER}
- omnitrace_add_cache_option function
* Update workflows to set OMNITRACE_BUILD_NUMBER
* Fix generator expressions for -fuse-ld
* Suppress some configuration warnings during CI
- helps to keep track of real warnings when they arise
* Update timemory and dyninst submodules with CMP0135
* Add -V flag to run-ci script
- Fix setup-env.sh
- Closes#149
- omnitrace exe color
- test-install.sh script
- if config variable is updated in config or env, include in generated
config
- metadata for hsa, rocm, and ompt
- Closes#153
- Closes#154
* RPATH to rocprofiler_LIBRARY_DIR for ROCm < v5.2
- until v5.2 only librocprofiler64.so was symlinked in /opt/rocm. Thus linker using SOVERSION caused issues finding librocprofiler64.so.1
* Test ROCm w/ CMAKE_INSTALL_RPATH_USE_LINK_PATH=OFF
* INSTALL_RPATH_USE_LINK_PATH for omnitrace exe
* Initial support for RCCL
* OMNITRACE_USE_RCCLP + sampling tweaks
- also OMNITRACE_SAMPLING_KEEP_INTERNAL option
- minor modifications to sampling to use keep internal option + discard funlockfile
* Update docker and workflows to download RCCL
* Update CPack DEB with rocprofiler dependency
* Rework rccl into library and library/components folder
- add tpls/rccl/rccl/rccl.h
* Fix timemory includes
* rcclp inline definitions when disabled
* Tweaks to ubuntu-focal-external-rocm
- disable ompt
- enable building testing
* Tweaks to ubuntu-focal-external-rocm
- ctest exclude
* Tweak ubuntu-focal.yml
- remove source /.../setup-env.sh, replace with $GITHUB_ENV
* Fix ubuntu-focal-rocm + OMPI + root
* Improved rocm-smi error handling
- Recover from rocm-smi errors
- Disabling rocm-smi after recovering from errors
- Werror in developer mode
- Remove State::DelayedInit
- Add State::Disabled
* formatting
* Fix merge of OMNITRACE_SAMPLING_KEEP_INTERNAL
* Update RCCL include directory
- based on ROCm version we need with <rccl/rccl.h> or <rccl.h>
* RCCL Testing
- updated tests to use configuration files
- many tests generate a configuration file
- tests how have GPU option
- enable ncclCommCount, disable ncclGetVersion
- add testing for RCCLP via rccl-tests
- working directory of tests is PROJECT_BINARY_DIR
- add nccl/rccl functions to get_whole_function_names
- some clang compiler fixes
* Handle RCCL include w/o HIP
* RCCL requires HIP
* Update OMNITRACE_SAMPLING_CPUS for testing
* Update tests/CMakeLists.txt
* Debug settings
* Install MPI even when USE_MPI=OFF
* exclude printf
* skip mpi tests w/o USE_MPI or USE_MPI_HEADERS
* update ubuntu rocm workflow
* Fix configure env step for ubuntu rocm
* fix omnitrace print-* with libraries
* timemory submodule update
* Update workflows to use ./bin/omnitrace instead of ./omnitrace
* cmake format
* update timemory submodule
- fix ODR violations in utility/procfs
* cmake updates
- uniform find_package for all ROCm-based libraries
* tweak transpose example
- throw exception instead of std::exit
* Inspect cmdv name before assuming not exe
- some ELF execs "think" they are libraries so only assume rewrite + simulate + all-functions if filename looks like library
- adds some test for --print-available -- <library>
* Fix _has_lib_prefix when command is < 3
* Updates and reverts to omnitrace exe
- update module_function operator< and operator==
- add function_signature operator<
- refactor module_function ctor
- revert some previous changes w.r.t. simulate and include_unninstr
* Fix source/bin/tests to use same output dir as tests
* cmake format
* Segfault mitigation + refactor + modify function iteration
- refactor module_function ctor to avoid segfaults
- string_t -> std::string
- replace std::string with std::string_view in some places
- get_name(module_t*)
- get_name(procedure_t*)
- disable using both app_modules and app_functions
- new option: --parse-all-modules to iterate over app_modules
- removed some unused code w.r.t. debug info
* Disable module_function address range for uninstrumentable functions
* Disable module_function address range for uninstrumentable functions
* Refactored getting file/line info and init/fini
- use dyninst insertInitCallback and insertFiniCallback if main not found
- fixed all issues with segmentation faults in --simulate --all-functions
* revert changes to Findrocprofiler.cmake
* Initial support for GPU hardware counters
* Update find modules for roctracer and rocprofiler
- /opt/rocm/{rocprofiler,roctracer} path is deprecated so tweak search procedure
* Improve ConfigCPack for MPI
* Update rocprofiler
- rocm_metrics()
- minor cleanup
* Update rocm find modules
* declare rocm_metrics + call in omnitrace-avail
* relocate omnitrace-launch-compiler
* REALPATH and find_modules
* Examples cmake (may drop)
* omnitrace-avail
- hw_counter categories
- init rocm
* setenv updates for rocprofiler in library.cpp and dl.cpp
* get_rocm_events config
* gpu::hip_device_count()
* rocm_metrics returns hardware_counters::info
* - relocated library/components/roctracer_callbacks.* to library/roctracer.*
- relocated library/components/rocprofiler.* to library/rocprofiler.*
- cleaned up rocprofiler.hpp
- added perfetto output of rocprofiler
- added timemory output of rocprofiler
- renamed omni.roctracer thread to roctracer.hip
- added roctracer.hsa thread name
- updated timemory submodule to support std::variant
- updated timemory submodule to support = in config value
- updated timemory submodule to support standalone storage
- updated timemory submodule to support new hw counter apis
- updated timemory submodule to prevent label/description caching in data_tracker
* update omnitrace-avail info_type generation
* Update timemory submodule
* rocprofiler component
* cmake formatting
* omnitrace-avail handle no GPUs
- Add -c command-line option for --categories
- support verbosity
* hsa_rsrc_factory throws exceptions
- throw exceptions to avoid aborting on HSA_STATUS_ERROR_NOT_INITIALIZED when advantageous
- removed duplicate specialization of is_available for component::rocprofiler
* rocprofiler symbols for when disabled
* Fix warning in omnitrace-avail
- std::stringstream from initializer list would use explicit constructor
* Fix finalization after settings are deleted
* Reorganized rocprofiler source
* Updated formatting
* Miscellaneous tweaks
- added using statements from timemory
- tweaked the main and thread bundle names
- fixed timemory header includes
* Updates installation docs + minor cmake tweaks
- OMNITRACE_BUILD_LIBUNWIND option
- Locally set OMNITRACE_USE_HIP=OFF if roctracer and rocm-smi are off
- Force TIMEMORY_BUILD_GOTCHA to avoid bug in gotcha not patched upstream
* MPI-Headers
- include copy of mpi.h from OpenMPI
- reworked FindMPI-Headers to support the internal OpenMPI headers
* cpack workflow for building installers
- ConfigCPack.cmake update
- STGZ and DEB + containers + test artifact
- DEBIAN_FRONTEND + set -v
- submodule fix
- actions checkout
- OMNITRACE_ROCM_VERSION + continue-on-error
- Change CPack generators + fix path to DEB
- separate configure, build, and package steps
- use cd instead of pushd
- FindROCmVersion + fix to cpack testing
- use ${ROCM_PATH}/.info/version for ROCm version info
- Tweaks for debian installer
- Packaging fixes
- Use CMAKE_SHARED_LIBRARY_SUFFIX instead of .so
- Split cpack.yml into 4 workflows
- Replace source with export in cpack
- Dyninst boost uses tar.gz instead of zip on Unix
* Fix to common join
* Update VERSION to 1.0.0
* Roctracer wall clock integration (#16)
* Integrates roctracer values into wall-clock
* Fixed scoping + timemory roctracer
* Fixed data race in roctracer
* Synchronized HIP API on main thread
- Cache hip activity callbacks and execute on main thread
- Minor updates to transpose
* Debugging + MPI + transpose updates
* PTL + HSA and timemory + kernel timing
- PTL usage fixed HSA + timemory issues bc we could control the thread destruction
- Fixed laps counting in roctracer callbacks
* Ignore select HIP API types
- The ignored API types are ignored because there appears to be a bug
which causes the "end" callback to be labeled as begin
- hipDeviceEnablePeerAccess
- hipImportExternalMemory
- hipDestroyExternalMemory
* Tweaks to PTL config
* Timemory update + pid-prefix w/ mpi headers
- %pid%- prefix with mpi headers
- timemory submodule update
* CMake + critical trace + reorganize library source
- clang-tidy tweaks
- cmake function updates to use hosttrace_ prefix
- update gitignore
- cmake HOSTTRACE_MAX_THREADS option
- Formatting.cmake
- cleaned up MacroUtilities.cmake
- PTL submodule + usage
- tweak to Findroctracer.cmake
- MT transpose
- Updated PTL submodule
- Updated timemory submodule
- fix to hosttrace return value type if type not found
- reorganized library source code
- support for critical trace
* Remove bits/stdint-uintn.h headers
* Rename + config + depth + critical path
- rename hosttrace_timemory_data to instrumentation_bundles
- rename hosttrace_bundle_t to main_bundle_t
- rename bundle_t to instrumentation_bundle_t
- rework of configuration setup
- critical_trace write directly to file option
- tweaked depth calculation
- updated timemory submodule
- improved parallel support in roctracer callbacks
- working critical_trace
- perfetto device-critical-trace and host-critical-trace categories
- made transpose example parallel
- made parallel-overhead example a bit uneven
- relocated LTO activation
* Fixed duplicates in perfetto critical-trace
* reworked critical trace support
- substantial perf improvement (30-45 min -> 30 sec)
- changes to configuration (new and removed options)
* Removed "%pid%-" output prefix in mpi_gotcha
* Update timemory submodule