- Integrate rocprofiler-systems with rocprofiler-sdk-rocpd to fetch schema
- If rocprofiler-sdk-rocpd is not availabe, use embedded schema files. With this we provide rocpd format support even if ROCm is not available
- Include detection in CMake if rocprofiler-sdk-rocpd package is available (and valid), and build database class upon that
- Update embedded schema that is used as a fallback.
- Update some validation tests to account for schema changes.
* Rename "corr_id" to "stack_id" in Perfetto annotations to match new naming in schema.
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* correlation_id.ancestor was not added until ROCPROFILER_VERSION 1.0
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Consolidate CTests to tests/ folder
* Remove comment
* Consolidate CTests to tests/ folder
* Remove comment
* Separate source code and test code for thread-limit into appropriate folders
* Remove sleeper.cpp and instead use linux sleep cmd
* Merge python-console tests into python-tests
* Change how cache manager handles child process trace cache
* Sampling and backtrace metrics to cache
* Apply cmake formatting
* Fix parsing of metadata json
* Code clean up
* Fix build nlohmann json from source
* Fix storage parsed finished callback
* Revert sampling for child process
* Change cache file name generating
* Fix thread start stop
* Fix process start end timestamp
* Applied suggestions from code review
* Try with late start of flushing task thread
* Change dockerfiles for ci
* Revert changes on github workflows
* Remove json_fwd.hpp include
* fix dump
* Build nlohmann/json by default
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Update location of build artifacts for nlohmann/json
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Revert use_output_suffix
* Remove unused logs
* Fix cache store inside counter due to structure change
* Remove decode tests from debian ci
* Fix issue where all databases have the same UUID (#1499)
Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>
* Removing the cpack and install steps to save space
* Revert "Remove decode tests from debian ci"
This reverts commit ddabf6dd142dcf438e6b8997b8abe86f2c868468.
* Revert "Removing the cpack and install steps to save space"
This reverts commit 973da3a1ba99d99d529af5269d30e177092f9bfa.
* Add prepare-runner job as dependency to clean up the space
* Fix formatting
* Free up even more space
* Remove verbose for workflows
* remove hw_counters from ext_data
* move space clean up inside container
* try to remove external folder to free up space
* Check space
* Refactor Cleanup to it's own step
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Aleksandar Djordjevic <aleksandar.djordjevic@amd.com>
Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>
* Fix for thread limit tests. Which are failing due to exceeding the number of threads allowed.
Signed-off-by: Anuj-Kumar Shukla <AnujKumar.Shukla@amd.com>
* Update CMakeLists.txt
* Stopping thread creation after max thread limit
* Adressed review comments
* Update projects/rocprofiler-systems/tests/source/CMakeLists.txt
---------
Signed-off-by: Anuj-Kumar Shukla <AnujKumar.Shukla@amd.com>
Co-authored-by: anujshuk-amd <anujshuk@amd.com>
* Add OMPT to ROCpd
* Use correct category
* Added wrapper functions for future control
* Formatting
* Fix naming
* Comment change
* Remove ompt_get_cb_args
* Switched to using region_sample for OMPT
* Remove relic function
* Remove get_use_rocpd that was used in this pr (one still remains)
* Rename ompt_get_args_string and reuse in tool_tracing_callback_stop
* Make lock init and destroy cb instant
* [Prototype] ROCPD Name fix
* [Prototype] ROCPD Name fix P1
* [Prototype] ROCPD Name fix P2
* ROCPD Name fix
* Var name changes
* Rewrite cb overwrite to single function
* [Important] Use parallel_data as key for parallel callback map
* Fix workflow failure
* Make cpp USE_ROCM consistent with hpp and use default constructor if USE_ROCM = 0
* Add missing ROCPROFILER_VERSION check
* Improve readability
* Make ompt storage maps thread local
* Part 1: Variable name fix, memory cleanup, and fixed asserts
* Part 2: Add comments
* Part 3: Add CI_THROW
* Part 4: Formatting
* Part 5: Move #include to cpp
This PR fixes a segmentation fault seen when running rocprof-sys-sample with multi-process OpenMP/HIP applications.
The crash was caused by missing libomptarget.so on the runtime loader path or incorrect LD_PRELOAD settings.
Fixes SWDEV-552804
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
* Add ROCPROFSYS_ROOT to the env for sample
* Add env for causal
* Add env for instrument
* Check for null and address memory leak
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Detect SELinux mode and fail-fast
* Detect SELinux status by reading /sys/fs/selinux/enforce during initialization.
* Fix the verbose mode for HIP Stream events
* Add more information in the logs
Add information to the user about how to change the setting
* Move amd-smi to use caching mechanism
* Add VCN and JPEG activity to rocpd
* Switch cpu_freq to use caching mechanism
* Different approach with xcp activity & applied suggestions from code review
* Applied suggestions from code review
* Fix shadowing
* Applied suggestions from code review
- Update minimum_cmake_required to match version used in CI
- We should match the minimum version that we test against
- Ensure ".S" files are treated as assembly.
* Rocpd part 2, caching
* Fix shadowed variables
* backward compatibility
* Fixed designated initializers
* Fix timemory include
* Remove benchmark & Fix build issues for rhel
* Add missing bracket
* Fix shadowing and pedantic
* Fix pedantic pt2
* Fix duplicated SDK calls
* Add decay in get_size_impl
* Rename sample cache to trace cache
* Add cache storage supported types
* Resolving track naming in sampling module
* fix sampling of flushing thread
* fix sampling of flushing thread 2
* throw exception upon store while buffer storage is not running
* Prevent fork crashing
* Fix rebase issue
* Applied suggestions from code review
* Change flushing thread to use PTL
* Fix agent creation order
* Fix stream id ci throw
* Remove force setup of rocprofiler-sdk
* Code cleanup
* Change initialization for agent
* Add missing namespace
* Fix the mismatch within the tool_agent->device_id
* Switch from using handle to use agent type index
* Fix pmc info comparator in metadata registry
---------
Co-authored-by: Aleksandar <aleksandar.djordjevic@amd.com>
Co-authored-by: Milan Radosavljevic <milan.radosavljevic@amd.com>
Co-authored-by: Marjan Antic <marantic@amd.com>
- openmp-target: add runtime rpath for libomptarget and update tests
- Handle events not associated with a HIP Stream
- Kernels from OpenMP target offload are not associated with a HIP stream. Fix handling with the callback record's stream_id is 0
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: c424dac261]
Adjusted the regex to filter out new "PAGE*" domains added by the
SDK. This was causing the passing regex to fail.
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: de6120daf9]
- Corelate memory_copy and kernel_dispatch events with their HIP stream_id and add stream_id as an annotation in Perfetto.
- By default, group memory_copy and kernel_dispatch events in Perfetto output by their stream_id.
- Add option, with the configuration setting ROCPROFSYS_ROCM_GROUP_BY_QUEUE, to group by HSA queue instead.
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: 4b4a846b58]
* Fix to find MPI symbols from undefined symbols
* Moved condition checks before
* Fixing format
---------
Co-authored-by: Anuj Shukla <anujshuk@amd.com>
[ROCm/rocprofiler-systems commit: 67ec52b523]
* Conditionally include backtraces in ROCPROFSYS_THROW based on verbosity
Modify ROCPROFSYS_THROW to only include backtraces when:
debug mode is enabled, OR
verbose level is >= 2, OR
running in CI environment
* Fix formatting errors
[ROCm/rocprofiler-systems commit: b0ff07b4fe]
On AMD-SMI, in rocm 7.0, vcn_activity and jpeg_activity will not be reported when XCP (partition) stats, vcn_busy and jpeg_busy, are available. This causes the activity tracking to fail. The fix is to read the busy values when activity values are not supported.
For issue: SWDEV-536439
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: e3741f678b]
* SWDEV-535445: rocprof-sys-avail shows jpeg_activity even when unsupported
* Added vcn tracking
* jpeg and vcn description now includes supported gpus
* Add getter methods per device to check vcn and jpeg support
Add logic to check if vcn activity and vcn busy values are supported for each device.
Add logic to check if jpeg activity and jpeg busy values are supported for each device.
Co-authored-by: Sajina P Kandy <sputhala@amd.com>
* Add getter methods per device to check vcn and jpeg support (#228)
* Formatting
* Variable fix
* List of supported GPUs are now ordered
* Removed the ability to see which gpu supports jpeg and vcn activity to reduce clutter
* Formatting
* Testing for busy support
* jpeg and vcn only show if supported
* Removed commented code
* Formatting
* Applied amd_smi cpp/hpp fixes
* Added break condition for xcp loop
* Modified loops for efficiency
* Removed unneccessary macro
* Removed unneccessary includes
---------
Co-authored-by: Sajina Kandy <sputhala@amd.com>
Co-authored-by: Sajina PK <Sajina.PuthalathKandy@amd.com>
[ROCm/rocprofiler-systems commit: 0380cf58ba]
Update Dyninst submodule
Refactoring of build scripts to build TBB, Boost, ElfUtils, and LibIberty, since Dyninst build scripts no longer do.
Workflows are now building Dyninst and its dependencies.
---------
Co-authored-by: marantic-amd <marantic@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: 96df9b6d3e]
- Add support for RCCL API tracing through rocprofiler-sdk.
- Refactored the comm_data code to use the SDK RCCL_API callbacks.
- Add a runtime version check for SDK to gate callback enablement, rather than just the compile-time check.
- Fixed: SAMPLING_TIMEOUT was not being handled correctly in add_test.
[ROCm/rocprofiler-systems commit: af77d93f75]
* SWDEV-507117: Unify OMP Target Offload Events into a Single Perfetto Timeline Row
* Fixed warning and format
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: c5507e3740]
- Move the MPI gotcha functionality from Timemory to the repo.
- Add the PMPI Fortran MPI functions to the existing mpi gotcha handle.
[ROCm/rocprofiler-systems commit: 4fcd8cc78d]
* SWDEV-533856: Handle dynamic event for HIP api for perfetto
* Refactor: Generalize function using template
* Format Source
---------
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
[ROCm/rocprofiler-systems commit: abecaa8bf8]
* Removed advanced category from ROCPROFSYS_AMD_SMI_METRICS to have this property visible with rocprof-sys-avail.
[ROCm/rocprofiler-systems commit: 4c7560c78c]