* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements
Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to
fix race conditions that caused "internal correlation id was retired
prematurely" errors.
The original design had a race where flush callbacks could be delivered
out-of-order: when buffer 0 fills and begins flushing, writes go to
buffer 1. If buffer 1 fills before buffer 0's flush completes, the
buffer index wraps back to 0 (which may still be flushing). Independent
flush tasks submitted to the thread pool can complete out of order.
The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations
New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always
delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under
high contention with multiple threads rapidly filling buffers
HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that
waits for pending flush tasks before tool finalization, preventing
"retired prematurely" errors during HSA shutdown.
Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static
initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues
Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering
* [rocprofiler-sdk] Revert buffer pool design changes
Revert buffer.cpp and buffer.hpp to the original double-buffer
design from develop branch. The pool-based redesign introduced
concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design
This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test
The underlying buffer flush ordering issue will need to be
addressed with a different approach that preserves the original
API and synchronization characteristics.
* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization
- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
- hsa/queue.cpp (lines 105, 210)
- hsa/async_copy.cpp (line 344)
- hsa/hsa_barrier.cpp (line 43)
- buffer.cpp (lines 107, 138, 185)
This ensures no correlation IDs are created once finalization starts (fini_status != 0),
preventing races between finalization and ongoing tracing operations.
* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation
Buffer records are not guaranteed to arrive in any specific order. Tests and
samples should use timestamps for temporal ordering validation instead.
Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely' arrival
order check with timestamp-based validation that retirement timestamp >=
max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
Correlation IDs are not guaranteed to be monotonically increasing when records
are sorted by timestamp. Temporal ordering should be validated using the
timestamp fields in each record.
* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal
Restore the SYSTEM keyword to target_include_directories for
rocprofiler-sdk-fmt to match develop branch.
* [rccl] Remove orphaned rocSHMEM gitlink
Remove orphaned submodule reference that was introduced during a merge
but never had a corresponding .gitmodules entry, causing CI failures
with "fatal: no submodule mapping found in .gitmodules".
* [rocprofiler-sdk] Add HSA ABI version 0x09 support
Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which
introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release
functions (added in rocr-runtime SWDEV-561708).
* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations
This commit consolidates fixes for handling the finalization status during
buffer flush operations across the SDK.
Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully
when flushing buffers, as this indicates buffers were already flushed
during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check
for fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging
to prevent correlation ID creation after finalization starts
Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp
* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling
Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls
in samples and tools. The ERROR_FINALIZED handling was overly complex
and the hsa_tool_hooks OnUnload synchronization is no longer needed.
Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files
Co-Authored-By: Claude <noreply@anthropic.com>
* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers
The default move constructor and move assignment operator for
output_stream did not null out the source's pointers after the move.
This caused double-close when the moved-from temporary was destroyed,
leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).
Co-Authored-By: Claude <noreply@anthropic.com>
* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration
- generatePerfetto.cpp: Move output_stream into shared_state to prevent
use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable
configuration for better maintainability
Co-Authored-By: Claude <noreply@anthropic.com>
* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions
The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required
for CTest to properly pass suppression files to the sanitizers during
memcheck runs.
Co-Authored-By: Claude <noreply@anthropic.com>
* Revert "[rccl] Remove orphaned rocSHMEM gitlink"
This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.
* [rocprofiler-sdk] Revert registration.cpp changes
Revert changes to registration.cpp to match develop branch.
Co-Authored-By: Claude <noreply@anthropic.com>
* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix output_stream move ctor/assignment operator
* Fix erroneous revert of registration.cpp
* Fix handling of fini status in correlation ID construction
* [rocprofiler-sdk] Fix OMPT segfault during finalization
Add nullptr checks in OMPT tracing code to handle the case where
correlation_tracing_service::construct() returns nullptr during
finalization. This fixes segfaults in openmp-target-sample and
tests.integration.execute.openmp-tools.
The correlation ID construction now returns nullptr when fini_status > 0,
but the OMPT callbacks were not checking for this, causing crashes when
dereferencing the null pointer during OpenMP runtime shutdown.
Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null
* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization
Add nullptr check in hsa_api_impl::functor after correlation ID
construction. During finalization, correlation_service::construct()
returns nullptr, and without this check the code would dereference
the null pointer when accessing corr_id->internal.
This fixes the SEGV at address 0x000000000008 (null + 8 byte offset)
that occurs when HSA async event threads call hsa_signal_destroy
during runtime shutdown after finalization has started.
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* [rocprofiler-sdk] Fix fmt::join build errors
- remedy use of fmt::join without include <fmt/ranges.h>
* include memory header
* Disable FMT build for SDK CI
* Add -DROCPROFILER_BUILD_FMT=OFF to sanitizer steps
* Add temporary workaround for rccl.h issue
* Add ROCPROFILER_INTERNAL_RCCL_API_TRACE to SDK CI builds
* disable clang-tidy for vendored includes
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
* Fix dimension mismatch for multi-GPU systems with identical architectures
This change addresses an issue where counter dimensions were incorrectly
shared across all GPU agents with the same architecture name, even when
those agents had different hardware configurations (e.g., different CU counts).
Changes:
- Updated getBlockDimensions() to accept agent ID instead of architecture name
- Made dimension cache agent-specific instead of architecture-specific
- Updated set_dimensions() in AST evaluation to use specific agent ID
- Modified all API functions to handle agent-specific dimension lookups
- Updated tests to work with agent-specific dimensions
This fix ensures that dimensions accurately reflect the actual hardware
configuration of each individual GPU agent, preventing dimension mismatches
in multi-GPU systems where GPUs share the same architecture but have
different physical configurations.
Counter ID Representation Changes:
- Modified counter_id encoding to include agent information in bits 37-32
- Agent logical_node_id is encoded as (value + 1) to ensure agent 0 is detectable
- Counter records internally store only 16-bit base metric IDs (bits 15-0)
- Tool reconstructs agent-encoded counter IDs from base metric ID & agent info
- Instance record counter_id field uses bitwise AND mask to extract base metric ID
(counter_id.handle & 0xFFFF) to fit in 16-bit storage
- Output generators (CSV, JSON, Perfetto) use agent-encoded IDs for consistency
- Updated counter_config.cpp and metrics.cpp to extract base metric ID when needed
- All counter lookups now properly handle agent-encoded vs base metric IDs
This ensures counter IDs are consistent between metadata and output records while
maintaining compact storage in instance records.
* attach: milestone: API tracing
- This pairs with another commit in rocprofiler-sdk to fully
function
- Add ptrace entry points for tool attachment
- API tracing works at this commit
- Queue tracing not supported yet
* attach: cleanup
- Remove hardcode for loading of tool library
- Make invoke registration functions public again
* attach: proxy queue first draft
- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-sdk
* attach: prestore overhaul
- Must be paired with commit in rocprofiler-sdk
* attach: add dispatch table rework
- Register will load the prestore library and provide entrypoints to sdk
* attach: formatting and cleanup
* attach: revise dispatch table scheme
* attach: formatting
* attach: milestone: API tracing
- This change must be paired with a change in rocprofiler-register to
fully function.
- API tracing works at this commit
- Queue tracing not supported yet
* attach: cleanup and comments
* attach: Formatting and crash fixes
* attach: add attach duration
- Add option attach-duration-msec for attachment
* Formatting + sglang hang fix via signal handling
* Changed FATAL_IF to DFATAL_IF for scratch_memory due to persistent crash when iterating queues
* attach: proxy queue first draft
- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-register
* Allow null agents for scratch output
* attach: improve queue library interface
- Significant changes to force exported interfaces back to C
- Fixes bug with unknown agents at attachment
- Code objects' names may still be incorrect
* attach: add code_object support
- Kernel traces will now have names and all other information for launches
- Add capture of hsa_executable to the queue library
- Various logging improvements
* attach: rename queue library to prestore
* attach: prestore overhaul
- Must be paired with commit from rocprofiler-register
- Massive overhaul of code organization in prestore library
- Separates registrations for different object types
- Sets up future changes for initialization
* attach: add prestore dispatch table
- Removes linkage to prestore library from sdk
* attach: cleanup
* attach: formatting
* attach: fix input prompt not appearing
* attach: fix component name in cmake
* attach: revert change to export level
* Make prestore API public
* attach: update sdk attachment library WIP
- This commit is NONFUNCTIONAL
- Changes around structure to remove classes
- Seperate C linkage where needed
- Still needs updates to register for correct usage
* attach: update register with dispatch table WIP
- This commit is NONFUNCTIONAL
- Changes rocprofiler_register to handle dispatch table from attach
library.
- Still needs changes in SDK with dispatch table usage
* attach: dispatch table wip
- This commit is NONFUNCTIONAL
* attach: move attach component into core
* attach: rename to rocprofv3-attach
* attach: add callbacks for new queues and code objects
* attach: finish dispatch table implementation
- Fixes kernel tracing
* attach: add cmake variable for attachment support
* feat: Add --attach alias for rocprofv3 with comprehensive attachment tests
- Add `--attach` as an alias to existing `-p/--pid` functionality in rocprofv3.py
- Create comprehensive attachment test suite with CSV and JSON output validation:
- New attachment-test application for testing dynamic profiling scenarios
- Unified test script supporting both CSV and JSON output formats
- Pytest-based validation for kernel traces, memory copies, HSA API calls, and agent info
- Add CMake integration for automated attachment testing
- Support parameterized output directory and filename specification
- Implement proper environment setup for attachment queue registration
Tests verify successful attachment to running processes and capture of:
- Kernel dispatch traces with workgroup/grid dimensions
- Memory copy operations (H2D/D2H) with size validation
- HSA API call traces across multiple domains
- GPU/CPU agent information and capabilities
* Documentation Update
* attach: make attach script callable
* Added ROCPROFILER_REGISTER_ATTACHMENT_TOOL_LIB to remove hardcoded name
* attach: revert metrics library path changes
* Generic Attachment in Register (#942)
Remove tool references in register
* Add second param to attach call in rocprof register
* Add experimental reattachment support for ROCprofiler-SDK
This commit introduces experimental reattachment functionality allowing tools
to dynamically reattach to running processes with comprehensive design changes
to support multiple attach/detach cycles:
**Core Reattachment API:**
- Add rocprofiler_tool_configure_result_experimental_t with tool_reattach/tool_detach callbacks
- Add rocprofiler_call_client_reattach and rocprofiler_call_client_detach C exports
- Implement reattachment tracking in rocprofiler_register_attach to differentiate
initial attachment from reattachment cycles
- Add rocprofiler_register_invoke_reattach for handling reattachment requests
**Design Changes - Registration System Flow:**
The registration system now supports a dual-path initialization:
1. Initial Attachment Flow:
- rocprofiler_register_attach() -> rocprofiler_register_invoke_all_registrations()
- Full tool initialization with complete context setup
- Sets prev_attached atomic flag to track state
2. Reattachment Flow:
- rocprofiler_register_attach() detects prev_attached=true -> rocprofiler_register_invoke_reattach()
- Bypasses full re-initialization, calls client reattach callbacks instead
- Preserves existing contexts and buffers, only reactivates profiling services
**Design Changes - Tool Library Loading:**
Enhanced rocprofiler-register library loading with function pointer resolution:
- Extended rocp_set_api_table_data_t tuple to include reattach/detach function pointers
- Automatic symbol resolution for rocprofiler_call_client_reattach/detach functions
- Support for both LD_PRELOAD and dlopen scenarios with consistent callback availability
**Design Changes - Context Management:**
Introduced dual context systems for attachment scenarios:
- get_contexts() - Original contexts for standard tool initialization
- get_attach_contexts() - Separate context map for attachment-specific lifecycle
- attach_init() - Creates contexts for ALL buffer tracing services using existing buffers
- attach_start() - Selectively starts contexts based on configuration options
- attach_detach() - Cleanly stops and destroys attachment contexts
**Design Changes - Buffer Management:**
Added reset_tmp_file_buffer() template for clean reattachment state:
- Properly closes and removes old temporary files
- Deletes existing file_buffer instances to prevent stale file position tracking
- Creates fresh file_buffer instances for clean reattachment cycles
- Addresses core issue where file position metadata becomes stale between cycles
**Design Changes - Environment Variable Injection:**
Added ROCP_REGISTERED_TOOL_ATTACH environment variable:
- Distinguishes attachment-loaded tools from LD_PRELOAD scenarios
- Enables registration system to apply attachment-specific logic
- Helps tools adapt behavior for attachment vs standard initialization
**Attachment Context Management:**
- Add attach_init/attach_start/attach_detach functions for dynamic context lifecycle
- Add reset_tmp_file_buffer template for clean reattachment state management
- Implement get_attach_contexts() for tracking active attachment contexts
**Test Infrastructure:**
- Add projects/rocprofiler-sdk/tests/rocprofv3/reattach/ comprehensive test suite
- Include reattachment test scripts with unified attachment/detachment cycles
- Add validate.py with trace data validation for kernel, memory copy, HSA API, and agent info
- Add conftest.py for JSON and CSV data loading utilities
**Configuration Updates:**
- Update CMakeLists.txt to include reattachment tests in build system
- Add environment variable ROCP_REGISTERED_TOOL_ATTACH for attachment state tracking
- Enhance rocprofiler-register library loading with reattach/detach function resolution
**Flow Impact Analysis:**
This design enables robust multi-cycle attachment by:
1. Preventing duplicate initialization on reattachment
2. Maintaining separate context lifecycles for attachment vs standard operation
3. Ensuring clean temporary file state between attachment cycles
4. Providing tools with explicit reattach/detach callback hooks
5. Supporting both programmatic and environment-based tool configuration
The experimental nature allows for iteration on the API while establishing
the foundation for production-ready dynamic profiling capabilities.
* Fix misc clang-tidy warnings/errors
* CMake Option and Environment Variable Updates
- CMake: ROCPROFILER_REGISTER_ALWAYS_SUPPORT_ATTACH -> ROCPROFILER_REGISTER_BUILD_DEFAULT_ATTACHMENT
- Env: ROCPROFILER_REGISTER_ATTACHMENT_ENABLED ->
* Source reorganization
* Formatting + new lines at EOF
* Fix flake8 F841: local variable is assigned to but never used
* Update attachment test
- get rid of 5 second start delay
- add roctx
* Rework implementation
- Remove rocprofiler_tool_configure_result_experimental_t in lieu of rocprofiler_configure_attach
- Add <rocprofiler-sdk/experimental/registration.h>
- TODO: Update process_attachment.rst
* Handle re-attachment options
- inherit options from previous attachment
- check previous options do not modify data collection services
* Fix support for tools w/o rocprofiler_configure_attach
- fix segfault when rocprofiler_configure_attach does not exist
- fix naming convention for functions accepting attach dispatch table
- cleanup rocprofiler_configure_attach implementation in rocprofv3 tool
* attach: remove unknown agent handling
- Change was from earlier commit, no longer needed
* attach: add error for attaching without library loaded
* attach: revise version numbering
* attach: register header revisions
* attach: clang format register
* attach: formatting
* attach: fix build failure
- Remove cross dependency into rocprofiler-sdk, fixes build on some systems
* attach: revise register library detection
* Update rocprofiler-register and attach library
- formatting
- proper signature of register_functor for rocprofiler-sdk-attach library callback
- remove get_dispatch_registration_table()
* Bump rocprofiler-register version to 0.6.0 + AnyNewerVersion
* Fix output support for rocprofiler-sdk-tool
* Fix formatting
* Fix clang tidy errors
* Misc rocprofiler-sdk-attach fixes
* attach: add sigint handling to attach python
* tool README.md formatting
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Fix buffered output issue
* attach: add errors for tool attach
* CI Fixes
* Rework tests
* attach: improve library loading in rocprofv3 attach
* formatting
* Update tests to use pytest framework
* Fix test_attachment_hsa_api_trace
* attach: catch ctypes exceptions
* attach: fix leak in registration
* attach: fix sanitizer tests
* attach: fix sanitizer tests further
* attach: disable attach asan tests
* attach: disable ubsan test
* attach: fix permissions in installed test package
* attach: formatting
---------
Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
Co-authored-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Benjamin Welton <bwelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Remove config checks for stream and kernel rename data collection
* Updated csv generation to check if kernel rename is on before calling get_kernel_name
* Update metadata to use kernel_rename bool argument
* Formatting + unconditionally store kernel name in rocpd
* Readded kernel rename parameter after rebase
* Fixed rebase conflicts
* Updated comment in line with github comments
* Added check in rocpd csv.cpp to output kernel name if region name is empty
* Add test for kernel rename
---------
Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
* Removing regex from the tool
* Adding alternative for regex regarding handling
* Adding ROCpd
* Removing regex include
* Apply suggestion from @jomadsen_amdeng
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Apply suggestion from @jomadsen_amdeng
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Apply suggestion from @jomadsen_amdeng
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Adding Standalone Regex Header File
* Fixing Regex to handle grouping and
* Fixing Regex to handle grouping and
* Fixing Regex to handle grouping and
* Formatting Fix
* Update rocprofiler-sdk-restrictions.yml
* Separating regex.hpp to source and header & Adding Tests for parity with std::regex
* Update regex.cpp
* Using snake_case for naming and addressing some comments
* Adding more tests & README for regex implementation
* Updating rocprofiler sdk restrictions workflow
* Updating more tests & README for regex implementation
* Update README_regex.md
* Rename README_regex.md to README.md
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Reverted header and field location for csv memory allocation and updated tests
* Updated example csv file and made small update
[ROCm/rocprofiler-sdk commit: 533a8329d8]
* expose dimensional info in rocprofiler_counter_info_v1_t.
* add counter_id in dim info.
* address review comments
* format.
* address comments.
* use array of pointers for dimensions_instaces.
* format and comments.
* address comments.
* new line.
* Update counter_defs.yaml
* Update counter_defs.yaml
* Update counter_defs.yaml
* counter_defs.
* format counter defs.
* format counter defs.
* format counter defs.
* show only counters being profiled in metadata.
* Format.
* use config for counters and fix warnings.
* add version for rocprofiler_counter_dimension_info_v1_t struct.
* rename rocprofiler_counter_record_dimension_instance_v1_info_t.
* account device id from pmc for counters metadata.
* move dim structs to counters.h.
* address comments to compare value.
* fix tests.
* Address comments. use pointer of arrays for ABI.
* rebase.
* fix build error.
* use separate metadata::init() for rocprofv3.
* also print not found counters.
* precompute all the perf counters needed to be in metadata.
* Misc.
* format
* Format.
* rocprofiler::sdk::container::c_array
* Address comments.
* source/lib/output/metadata.cpp
* lint.
* add unit test for c_array.
* add unit test and serialization support for c_array container.
* Misc.
* Clean files.
* Format.
* clang-tidy.
* add more checks to c_array.
* misc. typo
* Addr comments.
---------
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit: bf0fad1d54]
* Fix null handle
- use .handle=0, not .handle=numeric_limits<>::max()
* Update lib.common.hasher
* Fix ROCPROFILER_CONTEXT_NONE
* Use context operator==
* Update CHANGELOG
* Updated null handle for scratch memory and changed allocation test so that free ops account for null agent
---------
Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
[ROCm/rocprofiler-sdk commit: 4d6a61f5e5]
* [SWDEV-516561][1/2] Add MARKER_RANGE_EXTENT to capture ROCTX ranges
Range extent to capture all work between roctxpush/pop operations. Entry callback takes place during roxtxpush and exit callback takes place in roctxpop. This is primarily to allow us to keep an ancestor id on the ancestor stack such that all operations that take place within the push/pop context can be annotated as being apart of this range. With the current setup (where push and pop are two separate operations that need to be combined externally), we cannot keep an ancestor id on the stack and thus cannot tie tracing events to particular ranges.
Correlation id information is inherited from the push operation. Ancestor id needs to be added in a future commit that also outputs this ancestor to CSV.
Output:
```
[ctest] {'size': 64, 'kind': 7, 'operation': 1, 'correlation_id': {'internal': 1525, 'external': 0, 'ancestor': 1524}, 'start_timestamp': 2932551479402642, 'end_timestamp': 2932551491178449, 'thread_id': 3254861}
[ctest] {'size': 64, 'kind': 8, 'operation': 2, 'correlation_id': {'internal': 1525, 'external': 0, 'ancestor': 1524}, 'start_timestamp': 2932551479405878, 'end_timestamp': 2932551491181214, 'thread_id': 3254861}
```
Note: Kind 8 = range extent op.
* Merge fix
Revert several changes
source/lib/rocprofiler-sdk/marker/range_marker.*
- separate out range marker implementation for standard marker implementation
Update public API with marker core range
Support marker core range in sdk (source/lib/rocprofiler-sdk)
Transition rocprofiler-sdk-tool and output lib to use marker core range
Misc fixes for tests
Fix logic in lib/output/generate{CSV,Stats}.cpp
Update tests/rocprofv3/tracing-hip-in-libraries (marker validation)
Fix test_otf2_data
* Test fixes
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: 2c4e20b951]
* Modified perfetto output for HIP stream display
* Moved stream_map file location and changed perfetto output names Private_Segment_Size and Group_Segment_Size to Scratch_Size and LDS_Block_Size respectively
* Used const_cast to remove const modifier on void*
* Reverted stream_map changes, now using tool_metadata map to track mapping between stream ptrs and stream IDs
* Removed buffer tracing args in perfetto, added tool_...hip buffer record struct that stores the HIP stream ID for display purposes
* Updated rocpd perfetto.cpp to reflect stream changes. Still need to add vgpr values and stream ID for HIP API
* Changes pass-by const reference to pass-by const value
[ROCm/rocprofiler-sdk commit: 1f8b8c5e9f]
* Updated kernel rename service to use internal correaltion IDs for external correlation IDs and kernel rename values
* Updated for review comments
* Changed if-condition in generateJSON.cpp to check if string view is empty before calling get_entry
[ROCm/rocprofiler-sdk commit: afb27f3f1a]
* [rocprofv3] Add rocpd output support (part 1: prelude)
- git submodules for sqlite3, GOTCHA, and pybind11
- HIP stream data
- rocprofiler_query_intercept_table_name(...)
- serialization load
- rocprofiler::sdk::get_perfetto_category(KindT)
- rocprofiler::sdk::parse::strip
- common library updates
- md5sum
- hasher
- simple_timer
- static_tl_object
- get_process_start_time_ns(pid_t)
- output library updates
- node_info
- file_generator (generator is now virtual base class)
- stream info updates
* Added submodules
* Code review updates
* Minor unused-but-set-X warning fixes
* Update CI
- install libsqlite3-dev package
* Update CI
- install libsqlite3-dev package
* Fix static thread-local object memory leak
- also fix signal handler chaining
* Remove URL from comment
* Remove page migration exception
* Enable ROCPROFILER_BUILD_SQLITE3 by default
- try find_package(SQLite3) first and then build when ROCPROFILER_BUILD_SQLITE3=ON
* Fix gotcha installation
- make install of target optional
* Validate tracing + counter collection dispatch data
- i.e. correlation ids, thread ids, timestamps
* Make find_package(SQLite3) optional
- ROCm CI does not have SQLite3 dev package installed and cannot build from source (missing tclsh)
* Fixes to tracing + counter collection test
* get_process_start_time_ns update
- original implementation did not work
* Fix pytest-packages test_perfetto_data for counter collection
- erroneous failure when used with same PMC + multiple agents
* cmake policy: option() honors normal variables
- for GOTCHA submodule
* Improve samples/api_buffered_tracing stability
- reduce likelihood of sporadic exception throw
* Update gotcha submodule
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 7166b1ab58]
* Adding Benchmarking Stg1
* config fix
* reset
* add jpeg and decode traces in iteration
* address comments benchmark config files.
* address comments.
* address comments.
* address comments: revert cntrl ctx.
* address comments: revert csv output.
* resolve merge conflits.
* format.
* build fix.
* fix hip runtime api traces.
* loop cb services.
* format.
* bug fix.
* Fix operator>
- public C++ comparison operator
* Update configuration options
- support selected regions (--selected-regions)
- support writing output config json (--output-config)
- update serialization data
* rocprofv3 tool library misc updates
- lambda for starting context
- support for writing config json
* Tool library updates
- Finished support for all benchmarking modes
- Added build spec support to config json
* Fix ROCPROFILER_SOVERSION
- this value should not be multiplied by 10,000
* Minor tweak to rocprofv3
* Benchmarking scripts
* formatting
* Fix duplicate include
* Add reproducible-dispatch-count test app
- used in benchmarking
* registration logging
- report number of registered contexts and active contexts after client initialization
* Serialize environment in rocprofv3 output config
* ROCPROFILER_BUILD_BENCHMARK CMake option
* Update benchmark SQL schema
- hash_id is text
- add md5sum to benchmarked_app
- remove app_id from benchmarked_sdk
- add sdk_id to benchmark_config
- separate hip_trace into hip_runtime_trace and hip_compiler_trace
- use INT instead of INTEGER for MySQL compatibility
- add count column in benchmark_statistics
- allow std_dev to be NULL in benchmark_statistics
* Update rocprofv3-benchmark.py
- use md5 instead of python hash (which includes random seed)
- use args.mysql_database
- compute md5sum of executable
- fix insert_benchmark_config
- marker trace fixes
- memory allocation fixes
- split hip_trace into hip_{runtime,compiler}_trace
- remove app_id from benchmarked_sdk
- support warmup runs
- count field in benchmark_statistics
* Support launcher and environment in YAML
* Update reproducible-dispatch-count.cpp
- support mode which doesn't use hip event timing
* Misc rocprofv3-benchmark.py updates
- fix some MySQL support
- remove some unnecessary logging
* support mysql db.
* Format.
* Updated SQL input files
- moved benchmark_schema.sql to benchmark_table.sql
- added benchmark_views.sql
- uses {{metric}} syntax for variable substitution
* cmake formatting
* update rocprofv3-benchmark.py
- benchmark config labels
- overhead views
* Encode rocprofv3-benchmark PID in rocprofv3 and timem output files
* Minor tweak to benchmark_views.sql
- include count
- reorder fields for readability
* split statements and use IS if values is NONE.
* use backtick instead of double quotes and add IS before NOT NULL.:
* Adding Mandelbrot Benchmark App
* Adding Dockerfile example
* Update dockerfile
* Update dockerfile
* [SDK] rocprofiler_query_external_correlation_id_request_kind_name
* Execution-profile benchmark mode
* Execution profile SQL support
* Rename mandlebrot folder + misc clang-tidy
* [rocprofv3-benchmark] Execution profile support
* Update installation
* add work dir when setting git revision, useful when building outside src.
* Set FULL_VERSION_STRING and ROCPROFILER_SDK_GIT_REVISION
- when benchmark folder is top-level
* Remove unused python packages from requirements.txt
* Use ldd/pyelftools to include linked libs for md5sum
- also add --filter-benchmark and --filter-rocprofv3 options
- support labeling the rocprofv3 options
- use more argparse groups
- more generic application of filters
- support variable substitution in environment, e.g. PATH=/some/path:$PATH
* Environment improvements
- improve reproducibility when env set via input file vs. shell
- support "environment-ignore" to remove environment variables
* Misc formatting
* Misc. fix
* use backticks for defining new columns name
* Support shuffling the order of benchmark modes/rocprofv3 args
* Address review comments
* Update Dockerfile
- rename to Dockerfile
- reduce to one layer
* Support docker build arg BRANCH
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 6f17da7ade]
* Fix stream duplication and fixed tests
* Added comments to explain stream.cpp code, change stream nullptr check to occur in update table to prevent readding null stream, simplified hip-streams bin file code, add destroyStreams to hip-streams bin file code
* Removed roctx from CMakeLists.txt
* Updated documentation
* Fix documentation
* Removed update_table for HIP compiler table and updated stream.cpp to remove support for HIP compiler table
* Added runtime initialization check for HIP
* Changed tool name, working on fixing memory management
* Added context for counter collection kernel rename combination
* Changed name from map to set and changed description
* Fix documentation description for group-by-queue
* Merged memory copy and kernel operations onto a single track when on the same stream
* Updated perfetto output to remove hardware information from track name to merge all memory copy and kernel operations on the same stream to the same track:
* Most pr comments addressed
* Added filter for counter collection and removed kernel buffer tracing hack
* Added PR comment fixes
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit: e626df43eb]
* Fix compilation for output library
- link to targets for ATT (amd-comgr, dw, elf)
* Relax correlation ID retirement log failures
- only fail for correlation ID retirement underflow when building in CI mode
* Fix shebang for several files
- license was inserted before shebang in several places
* Update code coverage exclude folders for samples
* Tweak to agent tests
- test to make sure hsa agent is not the old value instead of testing that it is the new value
* Fix libdw include/link
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 3580478426]
* rocprofv3: LD_PRELOAD for signal and sigaction
- wrappers around `signal` and `sigaction` to prevent applications which install signal handlers to replace the rocprofv3 signal handlers
- minor tweaks to buffer sizes (use page_size instead of
KiB)
* [DO NOT COMMIT] extra logging
* Switch git submodule url for perfetto
- use GitHub URL as this is more accessible
* Update ring_buffer<Tp>
- account for alignment padding
* Update buffered_output
- track number of bytes stored
- add nullptr checks
* Update tmp_file_buffer
- track number of bytes
- read_tmp_file does not create tmp file if it does not already exist
* Update tmp_file
- add exists member function for checking whether temporary file already exists
- tweak remove() implementation
* Update config.hpp
- add option to enable/disable signal handlers
- add option for minimum_output_bytes
* Make signal, sigaction functions visible
* rocprofv3 tool updates
- chained signals
- override the signal handler(s) installed by the application
- improve cleanup of temporary files
- support minimum output bytes
* Add commandline support
* fixing test
* minor fix
* minor fix
* fix clang issue
* fix
* Adding docs
* review comments
* review changes
* review
* YUV pulldown additions to rocdecode
* More rocdecode changes
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: 87badfbd15]
* Update header file includes
* Fix includes for lib/rocprofiler-sdk/hip/hip.hpp
* Minor touch ups
* Minor include improvements
* Doxygen tweak
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: c5a3edc3fa]
* Added buffer tracing support for rocdecode and updated tests to work with buffer tracing
* Updated perfetto to output args individually rather than as a string list
* Updated docstrings and operation type, changed OTF2 code to remove warning due to change in operation type
* Updated tests for review comments
* Test args exist and return value
* Updated to use string entry
* Change function name
* Updated PR to reflect review comments
* Updated for PR review comments
* Change function name
[ROCm/rocprofiler-sdk commit: 077723337a]
* Make sure all structs/enums can be forward declared
* Updates to counter collection
- consistency updates and cleanup
* Conversion of dimension information to info struct
* Added deprecated folder
* Testing changes
* merge changes
* Fix shadowed variable
* Source code formatting
* Fix shadowed variable
* Update rocprofiler_counter_info_v1_t member names
* Split version.h into version.h and ext_version.h
- ext_version.h contains external version info, e.g. ROCPROFILER_HSA_API_TABLE_MAJOR_VERSION, ROCPROFILER_HSA_RUNTIME_VERSION
- this reduces amount of recompilation after a commit since version.h gets updated with the git revision
* profile_config -> counter_config
* EOF new line
* [Samples] Reduce header includes + reorg counter collection samples
* Misc compilation fixes
- shadowed variables
- use of [[deprecated("...")]] in C code
- unused variables
* Minor misc modifications
- use common:: instead of rocprofiler::common:: when inside rocprofiler namespace
- counters.cpp
- move local anon namespace functions into rocprofiler::counters:: anon namespace
- use std::string_view for get_static_string
- const ref for get_static_ptr
- misc namespace shortening
* [Public API] rocprofiler_get_version_triplet + rocprofiler_version_triplet_t
- struct rocprofiler_version_triplet_t containing fields for the major, minor, and patch version
- public API function: rocprofiler_get_version_triplet
- define C++ operators for rocprofiler_version_triplet_t
- C++ function compute_version_triplet
* [Tests] Improve async-copy-testing test
- relax constraints
- improve logging
* Update counter_config.h doxygen docs
* ROCPROFILER_SDK_BETA_COMPAT
- ppdef which helps with renaming when set to 1
* Remove spurious include
* Fix includes for cxx/version.hpp
* Doxygen fixes for rocprofiler_get_version and rocprofiler_get_version_triplet
* Public API Experimental Designation
- ROCPROFILER_SDK_EXPERIMENTAL added to experimental function
- "(experimental)" added to doxygen @brief entries
* Fix use of assert instead of static_assert in hip/stream.cpp
* Use typedef instead of define for rocprofiler_profile_config_id_t
* Use inline rocprofiler_{create,destroy}_profile_config instead of ppdef
- added <rocprofiler-sdk/deprecated/profile_config.h>
* Doxygen for rocprofiler_{create,destroy}_profile_config
* ROCPROFILER_SDK_DEPRECATED_WARNINGS
* Temporarily comment out ROCPROFILER_SDK_DEPRECATED_WARNINGS=1
* cmake formatting
* Misc variable renaming in samples and tests
* Fix declarations of types
* Fix hip stream tracing service struct name
- rocprofiler_callback_tracing_stream_handle_data_t renamed to rocprofiler_callback_tracing_hip_stream_api_data_t
* Rename "HIP_STREAM_API" to "HIP_STREAM"
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: 4cd121e27b]
* [SDK][rocprofv3] HIP API buffer records with args (ext)
- New buffer tracing domain(s) for HIP APIs which include the arguments and the return value in the buffer records
- Update HIP stream support for extended HIP buffer tracing
- Update rocprofv3 tool library and output library to use extended HIP buffer tracing recods
* Update stream.cpp
- handle hipStream_t address being reused for a new stream
* Update doxygen docs for rocprofiler_iterate_buffer_tracing_record_args
* Update rocprofv3 tool.cpp
- configure buffer tracing services with HIP_*_API_EXT variants
- tweak logging level for hip_stream_display_callback
* Fix validation tests
- add HIP_RUNTIME_API_EXT and HIP_COMPILER_API_EXT to valid domain names
* Serialization support for buffer tracing args
* Disable stream service for __hipPopCallConfiguration
- this is interpreted as a stream create but it doesn't create a stream
* Fix execute_buffer_record_emplace for HIP extended contexts
* Add uint64_t_retval to rocprofiler_hip_api_retval_t union
- reading in hipError_t_retval during serialization of pointer return value causes undefined behavior
* Fix compilation warning about unused but set parameter
- in hip/stream.cpp
* Add synchronization for async_copy_data
* Fix compilation error
* Fix compilation error
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: e33dff7ad0]
* MI300 Stochastic PC sampling SDK API implementation
* ROCProfV3: Stochastic PC sampling Support (#94)
* ROCProfV3: MI300 Stochastic PC sampling initial draft
* ROCProfV3: Initial Stochastic PC sampling Tests (#95)
ROCProfV3: Initial Stochastic PC sampling tests
* Update rocprofiler_pc_sampling_record_stochastic_v0_t
- update doxygen docs for members
- replace rocprofiler_correlation_id_t with rocprofiler_async_correlation_id_t
* Relax the check in JSON tests
* drain PC sampling buffer during finalize_rocprofv3
* Increase timeout for "Test Install Build" step
- 10 minutes -> 20 minutes
- "Test Installed Packages" has 20 minutes so "Test Install Build" should also
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 49ce79a5b5]
* Potential fix for code scanning alert no. 24: Use of potentially dangerous function
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* clang-format fix
* use std::localtime_r instead of localtime.
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* localtime_r is defined in global namespace.
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit: c06feccf2a]
* Add Stack IDs
* Add memcpy test
* Add async corr id record
* Async events use `rocprofiler_async_correlation_id_t`
* Sync events use `rocprofiler_correlation_id_t`
* Update ATT to use asnyc IDs
* Review comments
[ROCm/rocprofiler-sdk commit: f27f76716e]
* rocprofiler_stream_id_t: opaque handle for a stream
- e.g. HIP stream
- the same HIP stream may map to different HSA queues at different points in the application
- added to:
- rocprofiler_buffer_tracing_hip_api_record_t
- rocprofiler_buffer_tracing_memory_copy_record_t
- rocprofiler_callback_tracing_hip_api_data_t
- rocprofiler_callback_tracing_memory_copy_data_t
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Mark Meserve <mark.meserve@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Jakaraddi, Manjunath <Manjunath.Jakaraddi@amd.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>
Co-authored-by: Nagaraj, Sriraksha <Sriraksha.Nagaraj@amd.com>
Co-authored-by: U, Srihari <Srihari.U@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: ccd1e54293]
* Perfetto duration temp fix setup
* Add timestamp change amounts to ROCP Info
* Groups kernel dispatch info by agent and queue id before sorting. Midpoint interpolation is then performed on the sorted kernels
* Moved dispatch bins into the for-loop
* Fix compilation error by using const ref
* Modified for review comments
* Changed variable names
[ROCm/rocprofiler-sdk commit: 6518c5463d]
* rocDecode API Tracing support
* Test bin file added to rocdecode. Need to add validate python methods
* Added option to not make rocDecode tests
* Added rocdecode and rocprofv3 tests
* Added csv test
* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependancy. Changed cmake option to disbale tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI
* Add option to avoid building rocdecode tests
* Added option to avoid building rocdecode bin file
* Support for rocJPEG API Trace
* Added newline to rocjpeg_version.h
* json-tool code added, initial test/bin commit
* Formatting
* Resolved rocjpeg bin test compilation errors
* Tests implemented. Perfetto module currently resulting in errors, so need to retest whenever it is fixed
* Formatting and compilation errors
* Minor fixes
* Copyright year update and minor fixes
* Doc update fix
* Added rocjpeg csv file in data
* Addresses review comments: Updated fixed Findroc.. and uses root directory as a hint, fixed documentation error, changed tables to use _CORE, minor style fixes
* Added rocdecode and rocjpeg to CI
* Removed rocdecode and rocjpeg from CI and added back build tests option
* Updated Cmake Files
* Added rocDecode and rocJPEG to CI
* Remove cmake line added in error
* Temporarily modified tests to pass if rocdecode or rocjpeg tracing are not supported for CI, cmake changes
* Added find_package for test
* Added back use of system rocDecode and rocJPEG, modifies system files to include prefix path
* Updated no-link to include INCLUDE_DIR/roc(decode|jpeg), added comments for tests
* Resolve merge conflicts and formatting
* Added regex find and replace instead of include for CI
* VAAPI package causing errors on Vega20
* Removed system rocjpeg and rocdecode use temporarily until cmake issues resolved
* Removed workflows regex
* Formatting and minor test modification
* Modified test for vega20
* Update rocDecode and rocJPEG cmake and tests
* Changelog
* Fix merge conflict
* Added back if-statements around add-tests since cmake-generator-expressions are resulting in errors when the packages are missing
* Removed if found statements, replaced with TARGET:EXISTS
* Skip json file for rocjpeg and rocdecode tests if not supported
* Add os import
---------
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 31fe8858d1]
* Counter track for memory allocation is now a running sum showing total allocation
* Address review comments
* Update source/lib/output/generatePerfetto.cpp
Co-authored-by: Meserve, Mark <Mark.Meserve@amd.com>
* Updated to reflect review comments
* Fix compilation errors on CI
* remove braces on scalar
* Fix struct compilation issues
* Removed name_to_id for sanitizer
---------
Co-authored-by: Meserve, Mark <Mark.Meserve@amd.com>
[ROCm/rocprofiler-sdk commit: cc0c401615]
* rocprofv3: suppress agent info when no data collected
* Update output config serialization
- full serialization of output configuration
* Update rocprofiler-sdk-att/tests
- add version and soversion
- change output directory
- generate libatt_decoder_summary
- disable tests instead of removing them
* Update rocprofv3 command-line
- make --att-library-path hidden by default
- simplify check_att_capability
- reorder pc sampling options
- add hidden --echo option
- remove ROCPROF_LIST_AVAIL_TOOL_LIBRARY from preload
* Add new rocprofv3 tests for specify the ATT library path
* Tweak to rocprofv3-test-hsa-multiqueue-att tests
* Update rocprofv3 tool to enable output with att
* Fix standalone test installation
* Revert to fetchcontent_makeavailable to fetchcontent_populate
* Revert tests/common/CMakeLists.txt
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 59b41ab5aa]
* [DO NOT MERGE] Misc UUID updates
- this is WIP
* Agent visibility
- Support for ROCR_VISIBLE_DEVICES, HIP_VISIBLE_DEVICES, CUDA_VISIBLE_DEVICES, GPU_DEVICE_ORDINAL
* Update CHANGELOG
* tweak to rocprofiler_agent_runtime_visiblity_t
* Code object kernel address
- new fields in code_object_kernel_symbol_register_data_t
- kernel_code_entry_byte_offset
- kernel_address
* Support ROCR_VISIBLE_DEVICES reordering devices for HIP
* Addressed code review changes
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 6246ec4040]
* rocprofv3: do not abort if counter does not have dimensions
* Relax error handling further in rocprofv3 metadata
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 3071199386]