* Rocpd part 2, caching
* Fix shadowed variables
* backward compatibility
* Fixed designated initializers
* Fix timemory include
* Remove benchmark & Fix build issues for rhel
* Add missing bracket
* Fix shadowing and pedantic
* Fix pedantic pt2
* Fix duplicated SDK calls
* Add decay in get_size_impl
* Rename sample cache to trace cache
* Add cache storage supported types
* Resolving track naming in sampling module
* fix sampling of flushing thread
* fix sampling of flushing thread 2
* throw exception upon store while buffer storage is not running
* Prevent fork crashing
* Fix rebase issue
* Applied suggestions from code review
* Change flushing thread to use PTL
* Fix agent creation order
* Fix stream id ci throw
* Remove force setup of rocprofiler-sdk
* Code cleanup
* Change initialization for agent
* Add missing namespace
* Fix the mismatch within the tool_agent->device_id
* Switch from using handle to use agent type index
* Fix pmc info comparator in metadata registry
---------
Co-authored-by: Aleksandar <aleksandar.djordjevic@amd.com>
Co-authored-by: Milan Radosavljevic <milan.radosavljevic@amd.com>
Co-authored-by: Marjan Antic <marantic@amd.com>
- openmp-target: add runtime rpath for libomptarget and update tests
- Handle events not associated with a HIP Stream
- Kernels from OpenMP target offload are not associated with a HIP stream. Fix handling when the callback record's stream_id is 0
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: c424dac261]
Adjusted the regex to filter out new "PAGE*" domains added by the
SDK. This was causing the passing regex to fail.
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: de6120daf9]
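A minimal sketch of the kind of filter described above, assuming the domains arrive as a list of names and that the new SDK domains all begin with PAGE. The helper name and the sample domain names are hypothetical; this is not the regex used in the actual test.

```python
import re

# Hypothetical pattern: matches any domain name that begins with "PAGE".
PAGE_DOMAIN = re.compile(r"^PAGE")


def filter_domains(domains):
    """Return the domain names that do not start with PAGE (illustrative helper)."""
    return [name for name in domains if not PAGE_DOMAIN.match(name)]
```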
- Correlate memory_copy and kernel_dispatch events with their HIP stream_id and add stream_id as an annotation in Perfetto.
- By default, group memory_copy and kernel_dispatch events in Perfetto output by their stream_id.
- Add option, with the configuration setting ROCPROFSYS_ROCM_GROUP_BY_QUEUE, to group by HSA queue instead.
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: 4b4a846b58]
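The grouping choice above can be sketched roughly as follows. The event dictionaries, the key format, and the `group_by_queue` parameter are assumptions made for illustration; in the tool the choice is driven by ROCPROFSYS_ROCM_GROUP_BY_QUEUE.

```python
def track_key(event, group_by_queue=False):
    """Pick the grouping key for a memory_copy or kernel_dispatch event."""
    if group_by_queue:
        # Opt-in behaviour: group by HSA queue.
        return ("queue", event["queue_id"])
    # Default behaviour: group by HIP stream.
    return ("stream", event["stream_id"])


def group_events(events, group_by_queue=False):
    """Collect event names per track key (hypothetical event layout)."""
    groups = {}
    for event in events:
        key = track_key(event, group_by_queue)
        groups.setdefault(key, []).append(event["name"])
    return groups
```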
* Fix to find MPI symbols from undefined symbols
* Moved condition checks before
* Fixing format
---------
Co-authored-by: Anuj Shukla <anujshuk@amd.com>
[ROCm/rocprofiler-systems commit: 67ec52b523]
* Conditionally include backtraces in ROCPROFSYS_THROW based on verbosity
Modify ROCPROFSYS_THROW to only include backtraces when:
debug mode is enabled, OR
verbose level is >= 2, OR
running in CI environment
* Fix formatting errors
[ROCm/rocprofiler-systems commit: b0ff07b4fe]
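The three conditions above reduce to a single predicate. This is a sketch only; the function and parameter names are hypothetical, and the real check lives inside the ROCPROFSYS_THROW macro.

```python
def include_backtrace(debug, verbose, ci):
    """Return True when a thrown error should carry a backtrace.

    debug:   whether debug mode is enabled
    verbose: integer verbosity level
    ci:      whether the process runs in a CI environment
    """
    return debug or verbose >= 2 or ci
```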
With AMD-SMI in ROCm 7.0, vcn_activity and jpeg_activity are not reported when the XCP (partition) stats, vcn_busy and jpeg_busy, are available. This causes the activity tracking to fail. The fix is to read the busy values when activity values are not supported.
For issue: SWDEV-536439
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: e3741f678b]
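A rough sketch of the fallback described above, assuming the metrics are available as a dictionary in which a missing or None entry means "not supported". The dictionary layout and the helper name are hypothetical and do not reflect the AMD-SMI API.

```python
def read_engine_usage(metrics, engine):
    """Return the activity values for 'vcn' or 'jpeg', else the busy values.

    Returns None when neither kind of value is supported.
    """
    activity = metrics.get(f"{engine}_activity")
    if activity is not None:
        return activity
    # Activity is not reported: fall back to the XCP (partition) busy values.
    return metrics.get(f"{engine}_busy")
```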
* SWDEV-535445: rocprof-sys-avail shows jpeg_activity even when unsupported
* Added vcn tracking
* jpeg and vcn description now includes supported gpus
* Add getter methods per device to check vcn and jpeg support
Add logic to check if vcn activity and vcn busy values are supported for each device.
Add logic to check if jpeg activity and jpeg busy values are supported for each device.
Co-authored-by: Sajina P Kandy <sputhala@amd.com>
* Add getter methods per device to check vcn and jpeg support (#228)
* Formatting
* Variable fix
* List of supported GPUs is now ordered
* Removed the ability to see which gpu supports jpeg and vcn activity to reduce clutter
* Formatting
* Testing for busy support
* jpeg and vcn only show if supported
* Removed commented code
* Formatting
* Applied amd_smi cpp/hpp fixes
* Added break condition for xcp loop
* Modified loops for efficiency
* Removed unnecessary macro
* Removed unnecessary includes
---------
Co-authored-by: Sajina Kandy <sputhala@amd.com>
Co-authored-by: Sajina PK <Sajina.PuthalathKandy@amd.com>
[ROCm/rocprofiler-systems commit: 0380cf58ba]
Update Dyninst submodule
Refactored the build scripts to build TBB, Boost, ElfUtils, and LibIberty, since the Dyninst build scripts no longer do so.
Workflows are now building Dyninst and its dependencies.
---------
Co-authored-by: marantic-amd <marantic@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: 96df9b6d3e]
- Add support for RCCL API tracing through rocprofiler-sdk.
- Refactored the comm_data code to use the SDK RCCL_API callbacks.
- Add a runtime version check for SDK to gate callback enablement, rather than just the compile-time check.
- Fixed: SAMPLING_TIMEOUT was not being handled correctly in add_test.
[ROCm/rocprofiler-systems commit: af77d93f75]
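The combined compile-time and runtime gate might look roughly like this. The version string format, the minimum version, and both function names are assumptions for illustration; the actual threshold is not stated in this log.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split(".")[:3])


def rccl_callbacks_enabled(compiled_with_rccl, runtime_version, minimum=(1, 0, 0)):
    """Enable callbacks only if built with support AND the runtime SDK is new enough.

    'minimum' is a placeholder value, not the real requirement.
    """
    return compiled_with_rccl and parse_version(runtime_version) >= minimum
```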
* SWDEV-507117: Unify OMP Target Offload Events into a Single Perfetto Timeline Row
* Fixed warning and format
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: c5507e3740]
- Move the MPI gotcha functionality from Timemory to the repo.
- Add the PMPI Fortran MPI functions to the existing mpi gotcha handle.
[ROCm/rocprofiler-systems commit: 4fcd8cc78d]
* SWDEV-533856: Handle dynamic event for HIP api for perfetto
* Refactor: Generalize function using template
* Format Source
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: abecaa8bf8]
* Removed advanced category from ROCPROFSYS_AMD_SMI_METRICS to have this property visible with rocprof-sys-avail.
[ROCm/rocprofiler-systems commit: 4c7560c78c]
- The path to the merge script was not found unless the user explicitly sourced "share/rocprofiler-systems/setup-env.sh" to set up PATHs.
- Instead, derive the path when the application loads and use it when executing the helper script.
- Rename script to rocprof-sys-merge-output.sh.
- Change install folder to <prefix>/libexec/rocprofiler-systems based on dev-ops feedback.
- Updated PATH variable in the modulefile and source script.
- For SWDEV-528101
[ROCm/rocprofiler-systems commit: adc66956b0]
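One way to derive the script location at load time, sketched under the assumption that the loaded library sits in `<prefix>/lib`. The library file name used in the usage note and the helper name are hypothetical; only the install folder and script name come from the change above.

```python
from pathlib import PurePosixPath


def merge_script_path(loaded_library):
    """Derive <prefix>/libexec/rocprofiler-systems/rocprof-sys-merge-output.sh.

    Assumes the library path has the form <prefix>/lib/<library file>.
    """
    prefix = PurePosixPath(loaded_library).parent.parent
    return prefix / "libexec" / "rocprofiler-systems" / "rocprof-sys-merge-output.sh"
```

For example, passing a hypothetical `/opt/rocm/lib/librocprof-sys.so` yields a path under `/opt/rocm/libexec/rocprofiler-systems`, with no dependence on PATH.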
- Fix overlapping VCN and JPEG activity values in Perfetto output.
- Modify the storage of the activity values to be more efficient.
[ROCm/rocprofiler-systems commit: 99a411fe52]
- Addresses concern that device metric tracks are still shown in Perfetto trace file even when only -H is specified to rocprof-sys-sample (and vice versa).
- Update sampling call-stack docs.
[ROCm/rocprofiler-systems commit: 8ae6651357]
Fixes a bug where all the `ROCPROFSYS_AMD_SMI_METRICS` values were being recorded by default.
Fixes a bug where the 'all' and 'none' values caused an exception when specified for `ROCPROFSYS_AMD_SMI_METRICS`.
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: 8d48048bd3]
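A sketch of how 'all' and 'none' could be handled alongside an explicit list. The metric names in KNOWN_METRICS, the comma-separated format, and the function name are assumptions for illustration and are not taken from the tool.

```python
# Hypothetical set of metric names, for illustration only.
KNOWN_METRICS = ("busy", "temp", "power", "mem_usage", "vcn_activity", "jpeg_activity")


def parse_metrics(value):
    """Parse a metrics setting into a sorted list of metric names."""
    value = value.strip().lower()
    if value == "all":
        return sorted(KNOWN_METRICS)
    if value == "none":
        return []
    requested = {item.strip() for item in value.split(",") if item.strip()}
    unknown = requested - set(KNOWN_METRICS)
    if unknown:
        raise ValueError(f"unknown metrics: {sorted(unknown)}")
    return sorted(requested)
```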
- Remove tooling initialization from rocprofiler_configure:
when rocprofiler configure is called from __hip_module_ctor
(which in turn is called as a global constructor when loading shared
libraries or before main in a hip program), initializing tooling
in it can cause problems because it is too early to do some of the tasks
that it involves (e.g. opening shared libraries, creating threads).
Instead, we rely on rocprofsys_main to initialize tooling later.
- Skip rocprofiler_configure if ROCPROFSYS_PRELOAD is not set since
preload is required for tooling (such as perfetto, which is used by
the rocprofiler callbacks) to be initialized.
- Revert RCCL initialization changes: These are no longer needed since rocprofsys_init_tooling_hidden will not
be called from rocprofiler_configure
- Force rocprofiler_configure in rocprofsys_init_tooling_hidden if it hasn't been
called through __hip_module_ctor global constructor
[ROCm/rocprofiler-systems commit: 0e535daa93]
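The ordering described above can be modelled with a small state object. This is only an illustration of the sequence (an early configure hook that does no heavy work, and a later init that forces configure if it was skipped); the class and method names are hypothetical and do not mirror the real functions.

```python
class Tooling:
    """Toy model of the deferred initialization order."""

    def __init__(self):
        self.configured = False
        self.initialized = False

    def configure(self, preload_set):
        # Early hook (may run from a global constructor): no heavy work here.
        if not preload_set:
            return False  # skipped: preload is required for tooling
        self.configured = True
        return True

    def init_tooling(self):
        # Later entry point: force configure if the early hook did not run it.
        if not self.configured:
            self.configured = True
        self.initialized = True
```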
- Fix for rocjpeg sample cmake due to changes in the rocJPEG project
- Fix for rocprofiler-sdk version check - change the format
- Edits to docs for jpeg and vcn activity support - mention that these values may not be supported on all ASICs.
[ROCm/rocprofiler-systems commit: fad3a0d341]
* Change terminal color back to normal after printing usage
* Format source
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: ee11f5b206]
- Check AMDSMI header version to fix compilation failure with v2.0 header change
- Fix ROCM-SMI references in documentation and tests
- Check AMDSMI library version at runtime and output in logs
- Fix a possible exception occurring when an in-flight sample is outstanding while the component is shutting down.
[ROCm/rocprofiler-systems commit: 7bb45aba1c]
When we create a profile config with rocprofiler, we log the counters being registered. However, this log was being skipped in certain cases.
[ROCm/rocprofiler-systems commit: eb0a969a9c]
- Register a cleanup function in tim::manager instance to write out data in
counter storages
- The counter_storage::write() calls in tool_fini happen after the storage is destroyed
which is too late for the write to happen.
- Adjust traits for counter_data_tracker
- Add MIN, MAX, VAR, STDDEV columns
- Remove DEPTH, UNITS, %SELF columns
- Update "add_validation_test" to test for the existence of output file(s).
- Added step to test perfetto output for `transpose-rocprofiler-sampling`
and `transpose-rocprofiler-binary-rewrite`
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: 42922ec851]
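For reference, the four added columns can be computed from a list of samples as sketched below. Whether the tool uses sample or population variance is not stated here; this sketch assumes sample variance, and the function name is hypothetical.

```python
import statistics


def summarize(samples):
    """Compute MIN, MAX, VAR and STDDEV for a non-empty list of numbers."""
    # Assumption: sample variance (n - 1 denominator); 0.0 for a single sample.
    var = statistics.variance(samples) if len(samples) > 1 else 0.0
    return {
        "MIN": min(samples),
        "MAX": max(samples),
        "VAR": var,
        "STDDEV": var ** 0.5,
    }
```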
* Add check to skip counter_storage::write() if internal storage field is destroyed.
* Output warning message if counter data is not available when trying to write out to Timemory
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: 43f900d01e]
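The guard described above amounts to checking the internal storage before writing. A minimal sketch, with a hypothetical class standing in for counter_storage and a plain list standing in for the output:

```python
import warnings


class CounterStorage:
    """Toy stand-in for a counter storage whose internals may be destroyed."""

    def __init__(self, data=None):
        self._data = list(data or [])

    def destroy(self):
        self._data = None

    def write(self, sink):
        # Skip the write and warn if the internal storage is already gone.
        if self._data is None:
            warnings.warn("counter data is not available; skipping write")
            return False
        sink.extend(self._data)
        return True
```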
- Add rocDecode API Tracing support using domain `rocjpeg_api` in ROCPROFSYS_ROCM_DOMAINS.
- Modify existing `videodecode` and `jpegdecode` ctests to verify API tracing
- Print Perfetto values for easy debugging in verbose mode
- Convert CMake error to a warning and skip building the "decode" examples if requirements are not found
[ROCm/rocprofiler-systems commit: 3bea1d8eac]
- Add JPEG activity track in perfetto trace
- Add JPEG decode tests to the examples
- Change existing videodecode test to include JPEG testing
- Rename videodecode test file to decode to include jpeg tests too
- Fix a bug in the test which checks for total activity of 0
- Disable rocDecode and rocJPEG samples from the github image files
[ROCm/rocprofiler-systems commit: 59d3399901]
- Updated Timemory module.
- Fixes a crash when running rocprof-sys-avail -G without explicitly providing -F <format>. The default value of "txt" was not being used.
- Define "choices" before "default" when defining the "--config-format" argument in the parser.
[ROCm/rocprofiler-systems commit: 3833c8d162]
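The intended behaviour after the fix is that omitting -F falls back to "txt". The sketch below shows that behaviour with Python's argparse, which is not the parser used by the tool and does not have the ordering issue; the long option name for -G and the choices other than "txt" are assumptions.

```python
import argparse


def build_parser():
    """Build a toy parser mirroring the -G / -F options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="avail-sketch")
    parser.add_argument("-G", "--generate-config", metavar="FILE")
    parser.add_argument(
        "-F",
        "--config-format",
        choices=["txt", "json", "xml"],  # assumed set of formats
        default="txt",
    )
    return parser
```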