296 Commits

Author SHA1 Message Date
Sajina PK 04fb7e4fe7 RocJpeg cmake and document fixes (#157)
- Fix for rocjpeg sample cmake due to changes in the rocJPEG project
- Fix for rocprofiler-sdk version check - change the format
- Edits to docs for jpeg and vcn activity support - mention that these values may not be supported on all ASICs.

[ROCm/rocprofiler-systems commit: fad3a0d341]
2025-04-09 16:20:02 -04:00
David Galiffi 63ec6ec48d Additional AMD-SMI Updates (#149)
- Check AMDSMI header version to fix compilation failure with v2.0 header change
- Fix ROCM-SMI references in documentation and tests
- Check AMDSMI library version at runtime and output in logs
- Fix a possible exception occurring when an in-flight sample is outstanding while the component is shutting down.

[ROCm/rocprofiler-systems commit: 7bb45aba1c]
2025-03-31 11:07:50 -04:00
David Galiffi 70b456f2a3 Fix "ROCPROFSYS_USE_ROCM" runtime config setting. (#144)
[ROCm/rocprofiler-systems commit: b6b39af011]
2025-03-27 16:03:46 -04:00
Aleksandar Djordjevic 95503cde21 Disable RCCL, load libamdhip64.so (#150)
Disable RCCL and load libamdhip64.so as a fix for sw509497.

[ROCm/rocprofiler-systems commit: 2bad0e941b]
2025-03-27 16:02:17 +01:00
David Galiffi bd0eeb9555 Reapply "Upgrade ROCm-SMI to AMD SMI (#86)" (#147)
* Reapply "Upgrade ROCm-SMI to AMD SMI (#86)"

This reverts commit 9fcea73122.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 85bbea4954]
2025-03-25 17:31:27 -04:00
ajanicijamd df729548d0 Fix a RCCL initialization to avoid a deadlock (#136)
Also fixes: 

- crash while finalizing rocprof-sys-causal

[ROCm/rocprofiler-systems commit: 26bb604215]
2025-03-19 14:48:04 -04:00
David Galiffi 663e5b0766 Update libraries' SOVERSION to match other ROCm components (#98)
set SOVERSION to ${PROJECT_VERSION_MAJOR}

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 5fc495c1e7]
2025-03-07 10:18:36 -05:00
David Galiffi 92e439627e Fix logging error (#130)
When we create profile config with rocprofiler we log the counters being registered. However, this log was being skipped in certain cases.

[ROCm/rocprofiler-systems commit: eb0a969a9c]
2025-03-06 14:30:45 -05:00
Sohaib Nadeem 95a07edf0b Fix hardware counter summary files not being generated after profiling (#124)
- Register a cleanup function in tim::manager instance to write out data in
counter storages

- The counter_storage::write() calls in tool_fini happen after the storage is destroyed
which is too late for the write to happen.

- Adjust traits for counter_data_tracker

- Add MIN, MAX, VAR, STDDEV columns
- Remove DEPTH, UNITS, %SELF columns

- Update "add_validation_test" to test for the existence of output file(s).
- Added step to test perfetto output for `transpose-rocprofiler-sampling`
and `transpose-rocprofiler-binary-rewrite`

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 42922ec851]
2025-03-05 16:05:18 -05:00
Sohaib Nadeem 2304111f79 Fix an application crash when collecting performance counters with rocprofiler (#117)
* Add check to skip counter_storage::write() if internal storage field is destroyed.
* Output warning message if counter data is not available when trying to write out to Timemory

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 43f900d01e]
2025-02-27 14:34:52 -05:00
Sajina PK 1db0539c30 Add support for rocJPEG API tracing (#116)
- Add rocJPEG API tracing support using domain `rocjpeg_api` in ROCPROFSYS_ROCM_DOMAINS.
- Modify existing `videodecode` and `jpegdecode` ctests to verify API tracing
- Print Perfetto values for easy debugging in verbose mode
- Convert CMake error to a warning and skip building the "decode" examples if requirements are not found

[ROCm/rocprofiler-systems commit: 3bea1d8eac]
2025-02-25 21:14:14 -05:00
Sajina PK 572f9532ef JPEG Activity tracing in Perfetto (#108)
- Add JPEG activity track in perfetto trace
- Add JPEG decode tests to the examples
- Change existing videodecode test to include JPEG testing
- Rename videodecode test file to decode to include jpeg tests too
- Fix a bug in the test which checks for total activity of 0
- Disable rocDecode and rocJPEG samples from the github image files

[ROCm/rocprofiler-systems commit: 59d3399901]
2025-02-21 10:25:01 -05:00
Sajina PK fba64f8acd Add support for VA-API and rocDecode tracing (#92)
- VA API tracing using Timemory gotcha wrappers.
- rocDecode API tracing integration using callback to ROCPROFILER_CALLBACK_TRACING_ROCDECODE_API
- Updated videodecode ctest to validate rocDecode APIs in perfetto trace. 

[ROCm/rocprofiler-systems commit: 697d1ac02f]
2025-02-11 13:08:23 -05:00
David Galiffi 2c9d92be33 Remove remaining roctracer references (#82)
[ROCm/rocprofiler-systems commit: e437200e9e]
2025-02-07 23:27:58 -05:00
David Galiffi 9fcea73122 Revert "Upgrade ROCm-SMI to AMD SMI (#86)" (#100)
This reverts commit 8c5db3f1d8.

[ROCm/rocprofiler-systems commit: b3eee295dd]
2025-02-07 11:45:26 -05:00
cfallows-amd 8c5db3f1d8 Upgrade ROCm-SMI to AMD SMI (#86)
* Integrating amd-smi into rocprofiler-systems due to rocm-smi deprecation.
* No functionality changes to users other than naming conventions.
* New tracks available in Perfetto: the GPU busy percentage metric is now split into separate gfx, umc, and mm engine measurements.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 0c32dfd6bc]
2025-01-30 21:32:07 -05:00
Peter Park 3f9a3861ac Update copyright year to 2025 (#83)
[ROCm/rocprofiler-systems commit: 0a15d355e0]
2025-01-29 16:53:16 -05:00
Maarten Arnst 0447cfdc58 Update to KOKKOS_TOOLS_LIBS env var (#69)
[ROCm/rocprofiler-systems commit: 043a8010a9]
2025-01-29 16:53:15 -05:00
Pranjal Swarup 64bb1ea13f Merge proto files from multiprocess run into one file. (#63)
- Added script to merge multiprocess output automatically to one file when there are multiprocess proto files written into output folder
- Execute the merge multiprocess script from the rank 0 process
- Added the scripts folder path to env path, via setup-env.sh
- Installed merge_multiprocess_output.sh to /share/rocprofiler-systems/bin dir

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 0263e951ff]
2024-12-18 17:34:02 -05:00
Sajina PK 2d6b4d9988 Enable VCN tracing in Perfetto output (#65)
Enable VCN activity tracing on different instances from the GPU metrics fetched using rsmi_dev_gpu_metrics_info_get in the ROCm-SMI library.

The tracing can be controlled with ROCPROFSYS_ROCM_SMI_METRICS by setting the value to vcn_activity. Currently, this configuration accepts the following values: busy, temp, power, mem_usage, vcn_activity.
By default, all five values are enabled.

Signed-off-by: Sajina P Kandy <Sajina.PuthalathKandy@amd.com>
Co-authored-by: Sajina Kandy <sputhala-amd@amd.com>

[ROCm/rocprofiler-systems commit: 3fa37c991e]
2024-12-18 15:56:48 -05:00
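The `ROCPROFSYS_ROCM_SMI_METRICS` setting described in #65 can be illustrated with a short Python sketch. This is not the project's parser: the function name `parse_smi_metrics`, the comma separator, and the error handling are assumptions made for illustration. Only the five metric names and the all-enabled default come from the commit message.

```python
# Hypothetical sketch; not the rocprofiler-systems implementation.
# Metric names are taken from the commit message for #65.
KNOWN_METRICS = ("busy", "temp", "power", "mem_usage", "vcn_activity")


def parse_smi_metrics(value=None):
    """Return the set of enabled metrics for a ROCPROFSYS_ROCM_SMI_METRICS value.

    Assumes a comma-separated list; an unset or empty value enables all metrics.
    """
    if value is None or not value.strip():
        return set(KNOWN_METRICS)
    requested = {item.strip() for item in value.split(",") if item.strip()}
    unknown = requested - set(KNOWN_METRICS)
    if unknown:
        raise ValueError(f"unknown metric(s): {sorted(unknown)}")
    return requested
```

Under these assumptions, `parse_smi_metrics("vcn_activity")` would enable only the VCN activity track, while calling it with no value keeps all five metrics.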
David Galiffi b29cfac106 Update to use rocprofiler-sdk (#55)
- Renames the CMake option "ROCPROFSYS_USE_HIP" to "ROCPROFSYS_USE_ROCM"
- Remove the "ROCPROFSYS_USE_ROCM_SMI" option. It is controlled by the "ROCPROFSYS_USE_ROCM" option instead.
   - Runtime configuration can still toggle ROCPROFSYS_USE_ROCM_SMI to disable the sampling.
- Rename ROCPROFSYS_HIP_VERSION macro to ROCPROFSYS_ROCM_VERSION and remove blocks for `ROCPROFSYS_ROCM_VERSION < 60000`
- Remove ROCPROFSYS_USE_ROCTRACER and ROCPROFSYS_USE_ROCPROFILER
- Update test cases
- Update docker files and workflows to install cmake 3.21, which is required for the rocprofiler-sdk findPackage script.
- Removed rocm-6.2 from workflows due to a rocprofiler-sdk API change. 

[ROCm/rocprofiler-systems commit: 88aa2d3cbe]
2024-12-13 18:48:39 -05:00
David Galiffi b73bd13a86 Adding installer for Ubuntu 24.04 (#14)
* Add installers for ubuntu 24.04

* Formatting change to the ubuntu-focal and ubuntu-jammy workflows

* Initial Ubuntu 24.04 workflow - just build test

[ROCm/rocprofiler-systems commit: 398ea62629]
2024-12-11 19:36:04 -05:00
Pran Swarup 9e72fac06d Fix GPU resource data of GPU power and temperature is not present on … (#23)
* Fix GPU power and temperature resource data not being present in MI300A traces

[ROCm/rocprofiler-systems commit: 51446f715f]
2024-11-13 15:02:43 -05:00
David Galiffi 6a6fd7f0f9 OMPT Target Offload Support (#17)
- Porting from https://github.com/ROCm/omnitrace/pull/411
- Improve OMPT support
- Add OpenMP target example to testing
- Update Timemory submodule to use ROCm/Timemory rather than NERSC/Timemory
- Update `actions/upload-artifact` to v4
- Standardize the `cmake_minimum_required` to 3.18.4 across workflows, project, and examples
- Updated Ubuntu 20.04 workflows

[ROCm/rocprofiler-systems commit: 7dce5926a7]
2024-11-07 16:49:32 -05:00
David Galiffi 7eaaa83024 Fix for proto files not being viewable in Perfetto UI (#16)
- Fix for proto files not being viewable in Perfetto UI
  - Ported from https://github.com/ROCm/omnitrace/pull/411

- Update Workflows

- Use V47 trace_processor_shell for certain OS releases.
  - RedHat 8, SUSE 15.5, and Ubuntu 20.04 are no longer compatible with the latest trace_processor_shell.
  - Incompatible version of GLIBC.

- Remove notes about Perfetto workaround in documentation.

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 21d9ab79fd]
2024-11-05 10:14:25 -05:00
David Galiffi 489eda995d Rename Omnitrace to ROCm Systems Profiler (#4)
The Omnitrace program is being renamed. 

Full name: "ROCm Systems Profiler"
Package name: "rocprofiler-systems"
Binary / Library names: "rocprof-sys-*"

---------
Co-authored-by: Xuan Chen <xuchen@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: d07bf508a9]
2024-10-15 11:20:40 -04:00
ajanicijamd 218f8bcbea Update Perfetto and fix tests (#378)
Fix for "SWDEV-479652" - Perfetto-based tests are failing.

Updated version of perfetto submodule to v46.0.
Modified Omnitrace code that uses Perfetto, so it can compile.
Modified the testing code, so it can run the version of trace_processor_shell provided (v46.0).

---------

Signed-off-by: Aleksandar Janicijevic <Aleksandar.Janicijevic@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 96d7b8f0ab]
2024-09-13 13:43:26 -04:00
jamesxu2 cdb49fc06d Fix SWDEV-473314 by avoiding empty unique_ptr dereference (#371)
- Revert part of https://github.com/ROCm/omnitrace/pull/78/commits/b134a68110b2c96ce11293d93ad56f38e211fd06
	modifying source/lib/omnitrace/library/components/roctracer.cpp

[ROCm/rocprofiler-systems commit: 395c330dad]
2024-08-27 10:20:48 -04:00
Jonathan R. Madsen e0fb6a2335 Executables append omnitrace library directory to LD_LIBRARY_PATH (#356)
- omnitrace-run, omnitrace-sample, and omnitrace-causal now automatically append the directory containing the omnitrace libraries to LD_LIBRARY_PATH
  - this helps ensure that binary rewritten exes can resolve omnitrace-rt library location

[ROCm/rocprofiler-systems commit: 18833a0a5e]
2024-07-02 18:55:37 -05:00
Jonathan R. Madsen 8ad58c5d28 Build omnitrace-rt library (#355)
* Build omnitrace-rt library

- Explicitly build dyninstAPI_RT as omnitrace-rt so that the SONAME in the ELF is omnitrace-rt instead of dyninstAPI_RT
- Create symbolic link lib/omnitrace/libdyninstAPI_RT.so which points to lib/libomnitrace-rt.so
- Simplify build tree location of libomnitrace-rt.so since it is ../lib from the bin directory even in the build tree
- Update dyninst submodule with minor tweaks to dyninstAPI_RT/CMakeLists.txt

* Update source/lib/omnitrace-rt/cmake/platform.cmake

* Use ftpmirror.gnu.org instead of ftp.gnu.org

- in timemory and dyninst submodules
- minor .clang-tidy tweak

[ROCm/rocprofiler-systems commit: 0cf017251e]
2024-06-27 16:51:43 -05:00
Jonathan R. Madsen 1dbb923682 Workflow, submodules, and thread info Updates (#352)
* Update CI workflows

- use node20 workflow packages

* Update tests/source/CMakeLists.txt

- Use OMNITRACE_TRACE and OMNITRACE_PROFILE instead of perfetto/timemory

* Update timemory submodule

- argparse: requires -> required
- parse callbacks

* Update thread_info.cpp

- fix causal::delay::get_local usage

* Update timemory submodule

* Update kokkos submodule

- release 3.7.02

* Revert opensuse.yml and ubuntu-bionic.yml to use node16 workflows

* Update docs.yml

[ROCm/rocprofiler-systems commit: 219b2e988e]
2024-06-20 17:47:31 -05:00
David Galiffi a8595dc524 Removing static version asserts. (#347)
They are causing failures on our internal builds

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: f9ae806b78]
2024-06-15 12:33:08 +05:30
ajanicijamd 15d4e8d0b9 Patch for omnitrace-sample crash with HIP API. (#344)
Fix HIP-API CTest failure

Check if stack is empty before popping data off of it.

Signed-off-by: Aleksandar Janicijevic <Aleksandar.Janicijevic@amd.com>

[ROCm/rocprofiler-systems commit: f0bd9126a5]
2024-06-07 15:23:18 -04:00
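The guard described in #344 (check whether the stack is empty before popping) can be sketched as follows. This is a minimal illustration, not the actual patch: the function name `pop_region` and the use of a plain list as the stack are assumptions.

```python
# Hypothetical sketch of the "check before pop" guard; names are illustrative.
def pop_region(stack):
    """Pop and return the most recent entry, or None if the stack is empty."""
    if not stack:
        # Nothing was pushed (or it was already popped): skip instead of failing.
        return None
    return stack.pop()
```

Returning a sentinel rather than raising lets an unbalanced pop be ignored, which is one plausible way to avoid a crash of this kind.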
Jonathan R. Madsen ab551e7eb7 Remove Critical Trace Support (#327)
* Delete core critical-trace files

* Update docs and README

* Update workflows

* Update testing

* Update cmake

* Remove critical trace usage in source code

* Update source/docs/critical_trace.md

- fix spelling

* Formatting

* Update bin/omnitrace-avail/avail.cpp

- statically allocate shared pointers for timemory manager and hash id/aliases to prevent use-after-free errors

[ROCm/rocprofiler-systems commit: 9499e2f521]
2024-04-23 09:35:44 -05:00
David Galiffi 2a42e3abf0 Updated links to point to the ROCm organization. (#337)
Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: b81db80926]
2024-04-23 01:44:02 -05:00
Jonathan R. Madsen 25ff5e3891 OMNITRACE_ROCM_SMI_METRICS (#331)
* OMNITRACE_ROCM_SMI_METRICS

- configuration variable OMNITRACE_ROCM_SMI_METRICS for specifying which rocm-smi metrics to collect
- auto-disable metric collection when rsmi_dev_X_get returns RSMI_STATUS_NOT_SUPPORTED

* Bump version to 1.11.1

* Python formatting

* Update python/libpyomnitrace.cpp

- fix usage of substr (ignored return value)

* Update python/gui/source/gui.py

- Fix E721
  - do not compare types, for exact checks use `is` / `is not`, for instance checks use `isinstance()`

[ROCm/rocprofiler-systems commit: 15127c0d43]
2024-02-08 07:06:23 -06:00
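The E721 rule mentioned in #331 can be shown with a small Python example. The classes and helper names below are illustrative and are not taken from `python/gui/source/gui.py`.

```python
# Illustrative only; the classes below are not from python/gui/source/gui.py.
class Base:
    pass


class Derived(Base):
    pass


def is_exactly_base(obj):
    # Exact type check: identity comparison with `is`, not `==`.
    return type(obj) is Base


def is_base_like(obj):
    # Instance check: also accepts subclasses.
    return isinstance(obj, Base)
```

The exact check rejects a `Derived` instance, while the `isinstance()` check accepts it, which is the distinction the lint rule asks the author to make explicit.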
Jonathan R. Madsen 8c8caaa1d9 Fix omnitrace-avail component list (#328)
* Fix omnitrace-avail component list

- remove omnitrace components from `omnitrace-avail -C` since these are no-ops in OMNITRACE_TIMEMORY_COMPONENTS

* Fix omnitrace-avail-filter-wall-clock-available test

[ROCm/rocprofiler-systems commit: 77d52814e9]
2024-01-10 20:00:46 -06:00
Jonathan R. Madsen 06c47383cc Fix thread-limit bug in roctracer (#326)
Update roctracer.cpp

- fix call to hip_exec_activity_callbacks when more runtime threads than compile time max

[ROCm/rocprofiler-systems commit: edd6f57cf3]
2024-01-10 19:10:45 -06:00
Jonathan R. Madsen d4ac1ed7ea Fix cpack on SLES (#324)
Update lib/core/gpu.cpp

- use const std::array instead of constexpr std::array due to internal compiler errors on systems with older GCC compilers

[ROCm/rocprofiler-systems commit: adefde707c]
2024-01-10 09:38:09 -06:00
Ben Richard 857eee18c2 Deprecate OMNITRACE_USE_PERFETTO, OMNITRACE_USE_TIMEMORY (#306)
* Rename OMNITRACE_USE_PERFETTO to OMNITRACE_TRACE

* Rename OMNITRACE_USE_TIMEMORY to OMNITRACE_PROFILE

* Revert change to Perfetto.cmake

* Fix formatting

clang-format-11 was complaining about formatting

[ROCm/rocprofiler-systems commit: 5de4163d66]
2024-01-10 07:20:54 -06:00
Tal Ben-Nun 5e4d7f7f84 Add option to skip barrier marker events in traces (#320)
* Add option to skip barrier marker events in traces

* Formatting

* Apply review suggestions

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* clang-format

* Formatting

---------

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-systems commit: 82cbe3f306]
2024-01-10 07:17:32 -06:00
Jonathan R. Madsen 86946e95e2 HIP API backtraces (#323)
* Update lib/core/config.cpp

- Add OMNITRACE_ROCTRACER_HIP_API_BACKTRACE option

* Update lib/omnitrace/library/roctracer.cpp

- support perfetto debug annotation of backtrace in HIP API call

* Fix backtrace resolution and ordering in UI (#1)

* Fix backtrace resolution for non-omnitrace libraries

* Nicer Perfetto UI on long backtraces

* Make Perfetto annotation consistent

* clang-format

---------

Co-authored-by: Tal Ben-Nun <tbennun@users.noreply.github.com>

[ROCm/rocprofiler-systems commit: 608287ddad]
2024-01-10 06:15:32 -06:00
Jonathan R. Madsen b5bdba12e4 Roctracer flush activity fix + perfetto.cfg (#317)
* Fix roctracer_flush_activity

- invoke roctracer_flush_activity() before disabling domains

* create comp::roctracer::flush()

- real issue was the global state when roctracer_flush_activity() was called

* formatting

* Update lib/omnitrace/library/components/roctracer.hpp

- provide definition of comp::roctracer::flush when OMNITRACE_USE_ROCTRACER is not defined

* omnitrace.cfg -> perfetto.cfg

- rename provided perfetto config file (omnitrace.cfg) to perfetto.cfg to avoid confusion

* Update lib/core

- gpu.hpp: defines for OMNITRACE_USE_{HIP,ROCTRACER,ROCPROFILER,ROCM_SMI}
- gpu.cpp
  - include core/hip_runtime.hpp
  - fix serialization of hipDeviceProp_t
- add hip_runtime.hpp
  -  ensure proper inclusion of hip_runtime.h
- add rccl.hpp
  - ensure proper inclusion of rccl.h

* Update lib/omnitrace/library

- rcclp.cpp
  - update includes for rccl
- roctracer.hpp
  - update includes for hip_runtime
- components/comm_data.hpp
  - update includes for rccl
- components/rcclp.hpp
  - update includes for rccl

* Update bin/omnitrace-avail/avail.cpp

- update includes for hip_runtime

* Update examples/rccl/CMakeLists.txt

- fix find_package for rccl when CI enabled

* Update CMakeLists.txt

- set cmake policy CMP0135 to NEW for cmake >= 3.24
  - Enable DOWNLOAD_EXTRACT_TIMESTAMP with ExternalProject_Add + URL download method

* Update timemory submodule

* Update pybind11 submodule

* Update pybind11 submodule

* Update lib/core/rccl.hpp

- include rccl.h only if OMNITRACE_USE_RCCL > 0

* Update lib/core/{gpu,hip_runtime}.hpp

* Update lib/core/gpu.cpp

- reintroduce some ppdefs

* Update lib/core/gpu.cpp

- fix ifdef on OMNITRACE_HIP_VERSION

* Update lib/core/gpu.cpp

- fix static assert for OMNITRACE_HIP_VERSION_MINOR when HIP version 4.x or older (unreliable minor versions)

* Update lib/core/gpu.cpp

- fix ifdef on OMNITRACE_HIP_VERSION

* Update lib/core/config.cpp

- disable OMNITRACE_PERFETTO_COMBINE_TRACES by default

* Update lib/core/perfetto.cpp

- if unable to open perfetto temp file, return the ReadTraceBlocking()

* Update lib/core/config.*

- flush tmpfile before closing

[ROCm/rocprofiler-systems commit: 7bc50f5a0a]
2024-01-10 05:02:22 -06:00
Jonathan R. Madsen a1b11b94f0 Dynamic expansion of thread data (#294)
* Tests for exceeding OMNITRACE_MAX_THREADS

- tests which exceed OMNITRACE_MAX_THREADS value for thread creation

* CMake Formatting.cmake update

- include source files in /tests/source directory

* Add unknown-hash= to OMNITRACE_ABORT_FAIL_REGEX

- fail if a timemory hash is not resolved to a name

* Tests for exceeding OMNITRACE_MAX_THREADS

- update

* omnitrace-sample update

- remove env disabling of critical-trace and process-sampling

* core library update

- make_unique in concepts.hpp
- add OMNITRACE_USE_ROCM_SMI to "process_sampling" category
- remove forced disabling of critical-trace in sampling mode
- parentheses for OMNITRACE_PREFER
- use tim::get_hash_id instead of tim::get_combined_hash_id

* core library update (containers)

- added aligned_static_vector.hpp
  - similar to static_vector.hpp but attempts to align to cache line size
- alignment template parameter for stable_vector
- added missing aliases in static_vector
  - consistent with aligned_static_vector aliases

* thread_info update

- track the peak number of threads created
- thread_info::get_peak_num_threads() returns the peak number of threads

* thread_data update

- generic thread_data inherits from base_thread_data
- thread_data reworked to support dynamic expansion
- base_thread_data updated to invoke private_instance() function
- thread_data<optional<T>> uses stable_vector aligned to cache line width
- thread_data<identity<T>> uses stable_vector aligned to cache line width
- thread_data for optional and identity provide private private_instance function + friend to base_thread_data
- component_bundle_cache<T> is now thread_data<component_bundle_cache_impl<T>>

* causal update

- thread_data<T>::instances -> thread_data<T>::instance(construct_on_thread{ ... })
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
- tim::get_combined_hash_id -> tim::get_hash_id
- update progress_bundle usage to new thread_data API

* backtrace/backtrace_metrics component update

- backtrace_metrics update
  - update to new thread_data API
  - add thread CPU time row in perfetto
  - fix potential bug when rusage categories are disabled
  - fix bug in operator-= not subtracting cpu time of rhs
- backtrace update
  - skip all child call-stack below 'tim::openmp::' if sampling_keep_internal = false

* pthread_gotcha component update

- pthread_gotcha::shutdown() invokes pthread_create_gotcha::shutdown()

* pthread_create_gotcha component update

- minor tweak to {start,stop}_bundle functions: pass in thread id
- update to new thread_data API
- track native handles of internal threads
- implement system with pthread_kill to stop dangling bundles

* rocprofiler/roctracer component update

- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()

* critical trace (library) update

- update to new thread_data API
- tim::get_combined_hash_id -> tim::get_hash_id

* coverage update

- update to new thread_data API

* tasking update

- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()

* roctracer update

- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()

* rocm_smi update

- update to new thread_data API

* runtime.cpp update

- update to new thread_data API

* sampling.cpp update

- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()

* ompt.cpp update

- invoke pthread_gotcha::shutdown before invoking OMPT finalize function
  - this prevents signals from being delivered to OpenMP threads

* tracing.hpp and tracing.cpp update

- replace get_timemory_hash_{ids,aliases} functions with copy_timemory_hash_ids function
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()
- tim::get_combined_hash_id -> tim::get_hash_id
- improvements to + error checking in thread_init function

* library.cpp update

- move copying timemory hash id/aliases to tracing.cpp
- update to new thread_data API
- loop over max_supported_threads (constexpr) -> loop over thread_info::get_peak_num_threads()

* Update BuildSettings.cmake

- add -Wno-interference-size to suppress warning about use of std::hardware_destructive_interference_size

* Update fork example

- improve scheme for waiting on child processes via waitpid instead of wait
- support running main routine multiple times
- push/pop regions in child process

* Update lib/common/defines.h.in

- allow user to specify misc values via -D <name>=<value>
  - OMNITRACE_CACHELINE_SIZE
  - OMNITRACE_CACHELINE_SIZE_MIN
  - OMNITRACE_ROCM_MAX_COUNTERS
- remove unused defines
  - OMNITRACE_ROCM_LOOK_AHEAD
  - OMNITRACE_MAX_ROCM_QUEUES

* Update rocprofiler.hpp

- OMNITRACE_MAX_ROCM_COUNTERS -> OMNITRACE_ROCM_MAX_COUNTERS

* Update aligned_static_vector

- set cacheline_align_v from max of OMNITRACE_CACHELINE_SIZE and OMNITRACE_CACHELINE_SIZE_MIN

* Update tracing.cpp

- acquire locks for updating main hash ids/aliases
- only propagate ids/aliases when finalizing

* Update pthread_create_gotcha.cpp

- make sure hash for "start_thread" exists on main thread

* Update causal end to end tests

- if OMNITRACE_BUILD_NUMBER is 1, set OMNITRACE_VERBOSE=0

[ROCm/rocprofiler-systems commit: 518c83e0f9]
2023-10-16 18:04:47 -05:00
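One of the backtrace_metrics fixes listed in #294 concerns an `operator-=` that did not subtract the CPU time of the right-hand side. A rough Python analogue of the corrected behavior is sketched below. The class `Metrics` and its field names are hypothetical and do not mirror the component's real members.

```python
from dataclasses import dataclass


# Hypothetical sketch; field names are illustrative, not the component's members.
@dataclass
class Metrics:
    wall_time: int = 0
    cpu_time: int = 0

    def __isub__(self, rhs):
        self.wall_time -= rhs.wall_time
        # The kind of line that is easy to omit: every field must be subtracted.
        self.cpu_time -= rhs.cpu_time
        return self
```

If the `cpu_time` line were missing, differences between two samples would report the full CPU time of the left-hand side instead of the delta.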
Jonathan R. Madsen 1502968c67 Fix roctracer data race (#309)
- roctracer_type_mutex was per-thread, causing lack of sync b/t callback and activity

[ROCm/rocprofiler-systems commit: 227980f32b]
2023-09-27 18:06:16 -05:00
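The data race fixed in #309 came from a mutex that was per-thread, so the callback and activity paths never contended for the same lock. The Python sketch below illustrates the general pitfall under that reading. All names are hypothetical and none are taken from roctracer.cpp.

```python
import threading

# Hypothetical sketch: names are illustrative, not from roctracer.cpp.
_per_thread = threading.local()
_shared_lock = threading.Lock()


def per_thread_lock():
    """Return a lock stored in thread-local storage (one lock per thread)."""
    if not hasattr(_per_thread, "lock"):
        _per_thread.lock = threading.Lock()
    return _per_thread.lock


def shared_lock():
    """Return the single lock shared by every thread."""
    return _shared_lock


def collect(lock_getter, n_threads=2):
    """Call lock_getter on n_threads threads and return the lock objects seen."""
    seen = []

    def worker():
        seen.append(lock_getter())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

With `per_thread_lock`, each thread receives a different lock object, so acquiring it excludes nobody else. With `shared_lock`, every thread receives the same object and mutual exclusion actually holds.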
Jonathan R. Madsen 7499945cb9 Reduce release packaging (#300)
* Bump version to 1.10.3

* Drop releases for ROCm < 5.3

- ROCm is no longer providing releases for Ubuntu 18.04 starting with 5.3, so omnitrace is dropping support for Ubuntu 18.04 + ROCm
- Dropping ROCm 5.2 releases for Ubuntu 20.04
- Dropping ROCm 5.2 releases for OpenSUSE 15.4

* Update redhat workflow

- Test RedHat 9.1 + ROCm 5.5
- Test RedHat 9.1 + ROCm 5.6

* Update ubuntu-focal workflow

- drop ROCm 5.2 testing
- add ROCm 5.6 testing

* Update Findroctracer.cmake

- provide /opt/amdgpu to HINTS/PATHS for drm and drm_amdgpu libraries

* Update Findrocprofiler.cmake

- prefer librocprofiler64.so.1

* Update librocprofiler64.so to librocprofiler64.so.1

- search for the SOVERSION 1 library of librocprofiler64.so if ROCm > 5.5.0

* Update Findrocprofiler.cmake

- link to libpciaccess for ROCm 5.5.0

* Update redhat CI workflow

- install libpciaccess for rocm CI

* Update cpack workflow

- Remove all RHEL 9.0 packaging
- Remove all packaging for ROCm 5.3 on OSes where releases are provided for 5.4, 5.5, and 5.6

* Update ubuntu focal workflow

- remove rocm 5.3 jobs

[ROCm/rocprofiler-systems commit: 1216fd99a7]
2023-09-12 20:19:26 -05:00
Jonathan R. Madsen e135d3c6eb Sampling post-processing Perfetto fix (#298)
sampling post-processing perfetto fix

- avoid creating overflow sampling perfetto tracks when there is no data
- fix the parent region begin/end timestamps for the sampling tracks

[ROCm/rocprofiler-systems commit: 5276c957fb]
2023-07-06 02:40:49 -05:00
Jonathan R. Madsen c6929f545d Perfetto annotation from timemory components (#289)
* Annotate perfetto with timemory component data

- support perfetto annotations via timemory component data, e.g. use PAPI component for exact HW counter annotations

* Tests for perfetto annotation via timemory data

* Update omnitrace-instrument

- remove --default-components argument as this overrides any components set in configuration file
- required by perfetto annotation via timemory data tests

* filter unavailable timemory components

- filter out unavailable timemory components before attempting to invoke the annotate operation on the bundle

* update annotate tests

- account for no PAPI support

* update lulesh-timemory test

- replace '-d wall_clock peak_rss' with '--env OMNITRACE_TIMEMORY_COMPONENTS="wall_clock peak_rss"'

* annotate tests update

- fix misnamed test

* annotate tests update

- restrict binary rewrite to run function to force instrumentation despite heuristics

* annotate tests update

- print {available,overlapping,excluded,instrumented} functions during binary rewrite

* annotate tests update

- add allow-overlapping flag

* Support PAPI with CAP_SYS_ADMIN

- do not disable PAPI if perf_event_paranoid > 2 but the process has the CAP_SYS_ADMIN capability

[ROCm/rocprofiler-systems commit: 1aca8c177b]
2023-06-19 19:18:04 -05:00
Jonathan R. Madsen a0812bfa0b Fix rocprofiler usage in ROCm >= 5.5.x (#288)
Fix rocprofiler usage in ROCm >= 5.5.x

- starting with ROCm 5.5.0, rocprofiler throws exception if OnLoad + dlopen librocprofiler
- CI skipped for this PR since CI does not support GPU usage (tested locally)

[ROCm/rocprofiler-systems commit: 223536896b]
2023-06-15 23:28:45 -05:00
Jonathan R. Madsen 97011ea642 Fix thread index values (#287)
* Update PTL

- PTL submodule waits for threads to start before proceeding

* Initialize perfetto after init_bundle

- perfetto thread creation after pthread_create wrapped

* backtrace component update

- exclude gotcha call-tree

* callchain component update

- callchain::get sorts based on timestamp
- callchain::sample supports duplicate IPs (recursion)

* Bump version to 1.10.1

[ROCm/rocprofiler-systems commit: de9f0e4c10]
2023-06-15 22:37:33 -05:00