Benjamin Welton 1517a398bf [rocprofiler-sdk] Buffer finalization fixes and HSA ABI 0x09 support (#2318)
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements

Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to
fix race conditions that caused "internal correlation id was retired
prematurely" errors.

The original design had a race where flush callbacks could be delivered
out-of-order: when buffer 0 fills and begins flushing, writes go to
buffer 1. If buffer 1 fills before buffer 0's flush completes, the
buffer index wraps back to 0 (which may still be flushing). Independent
flush tasks submitted to the thread pool can complete out of order.

The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations

New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always
  delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under
  high contention with multiple threads rapidly filling buffers

HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that
waits for pending flush tasks before tool finalization, preventing
"retired prematurely" errors during HSA shutdown.

Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static
  initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues

Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering

* [rocprofiler-sdk] Revert buffer pool design changes

Revert buffer.cpp and buffer.hpp to the original double-buffer
design from develop branch. The pool-based redesign introduced
concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design

This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test

The underlying buffer flush ordering issue will need to be
addressed with a different approach that preserves the original
API and synchronization characteristics.

* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization

- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)

This ensures no correlation IDs are created once finalization starts (fini_status != 0),
preventing races between finalization and ongoing tracing operations.

* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation

Buffer records are not guaranteed to arrive in any specific order. Tests and
samples should use timestamps for temporal ordering validation instead.

Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely' arrival
  order check with timestamp-based validation that retirement timestamp >=
  max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check

Correlation IDs are not guaranteed to be monotonically increasing when records
are sorted by timestamp. Temporal ordering should be validated using the
timestamp fields in each record.

* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal

Restore the SYSTEM keyword to target_include_directories for
rocprofiler-sdk-fmt to match develop branch.

* [rccl] Remove orphaned rocSHMEM gitlink

Remove orphaned submodule reference that was introduced during a merge
but never had a corresponding .gitmodules entry, causing CI failures
with "fatal: no submodule mapping found in .gitmodules".

* [rocprofiler-sdk] Add HSA ABI version 0x09 support

Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which
introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release
functions (added in rocr-runtime SWDEV-561708).

* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations

This commit consolidates fixes for handling the finalization status during
buffer flush operations across the SDK.

Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully
  when flushing buffers, as this indicates buffers were already flushed
  during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check
  for fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging
  to prevent correlation ID creation after finalization starts

Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp

* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling

Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls
in samples and tools. The ERROR_FINALIZED handling was overly complex
and the hsa_tool_hooks OnUnload synchronization is no longer needed.

Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers

The default move constructor and move assignment operator for
output_stream did not null out the source's pointers after the move.
This caused double-close when the moved-from temporary was destroyed,
leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration

- generatePerfetto.cpp: Move output_stream into shared_state to prevent
  use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable
  configuration for better maintainability

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions

The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required
for CTest to properly pass suppression files to the sanitizers during
memcheck runs.

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert "[rccl] Remove orphaned rocSHMEM gitlink"

This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.

* [rocprofiler-sdk] Revert registration.cpp changes

Revert changes to registration.cpp to match develop branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix output_stream move ctor/assignment operator

* Fix erroneous revert of registration.cpp

* Fix handling of fini status in correlation ID construction

* [rocprofiler-sdk] Fix OMPT segfault during finalization

Add nullptr checks in OMPT tracing code to handle the case where
correlation_tracing_service::construct() returns nullptr during
finalization. This fixes segfaults in openmp-target-sample and
tests.integration.execute.openmp-tools.

The correlation ID construction now returns nullptr when fini_status > 0,
but the OMPT callbacks were not checking for this, causing crashes when
dereferencing the null pointer during OpenMP runtime shutdown.

Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null

* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization

Add nullptr check in hsa_api_impl::functor after correlation ID
construction. During finalization, correlation_service::construct()
returns nullptr, and without this check the code would dereference
the null pointer when accessing corr_id->internal.

This fixes the SEGV at address 0x000000000008 (null + 8 byte offset)
that occurs when HSA async event threads call hsa_signal_destroy
during runtime shutdown after finalization has started.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2026-01-27 13:27:54 -05:00

ROCProfiler SDK Common API Library

Custom Regex Engine

Why We Have Our Own Regex Implementation

This directory contains a custom regex engine designed specifically for ROCm profiling tools. The primary motivation for implementing our own engine instead of using std::regex is to avoid the dual ABI compatibility issues that affect std::regex in GNU libstdc++.

The Dual ABI Problem

The GNU libstdc++ library introduced a dual ABI (Application Binary Interface) system starting with GCC 5.1 to maintain backward compatibility while introducing C++11 improvements. This dual ABI system affects std::string and other standard library components, including std::regex.

Technical Background

The dual ABI allows two different implementations to coexist:

  • Old ABI (pre-C++11): Uses Copy-on-Write (COW) strings
  • New ABI (C++11+): Uses Short String Optimization (SSO)

The ABI is controlled by the _GLIBCXX_USE_CXX11_ABI macro:

  • _GLIBCXX_USE_CXX11_ABI=0: Old ABI (the only ABI available before GCC 5.1)
  • _GLIBCXX_USE_CXX11_ABI=1: New ABI (default since GCC 5.1, unless the distribution configures libstdc++ otherwise)
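To see which of the two settings a given translation unit was built with, the macro can be inspected at compile time. The following is a minimal sketch; the helper name active_string_abi is hypothetical and not part of the SDK, and the "unknown" branch covers standard libraries that do not define the macro at all.

```cpp
#include <cassert>
#include <string>       // any standard header pulls in libstdc++'s config macros
#include <string_view>

// Hypothetical helper (illustration only): reports which libstdc++ string ABI
// this translation unit was compiled against.
constexpr std::string_view active_string_abi()
{
#if defined(_GLIBCXX_USE_CXX11_ABI)
#    if _GLIBCXX_USE_CXX11_ABI
    return "new ABI (C++11, SSO strings)";
#    else
    return "old ABI (pre-C++11, COW strings)";
#    endif
#else
    // Not libstdc++, or a libstdc++ that predates the dual ABI.
    return "unknown (macro not defined)";
#endif
}

// The answer is fixed at compile time, so it can also be checked statically.
static_assert(!active_string_abi().empty());
```

Compiling the same file once with -D_GLIBCXX_USE_CXX11_ABI=0 and once with -D_GLIBCXX_USE_CXX11_ABI=1 should yield the two different answers on a libstdc++ toolchain that supports both.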

The std::regex Problem

std::regex is particularly problematic because:

  1. ABI Sensitivity: The std::regex implementation is tightly coupled to the string ABI being used
  2. Symbol Conflicts: Different ABI versions create incompatible symbols that cannot be mixed
  3. Runtime Failures: Applications linking against libraries compiled with different ABI settings experience runtime failures
  4. Distribution Issues: Different Linux distributions and package managers may use different ABI settings

Real-World Impact

As explained in the Stack Overflow discussion, this creates several problematic scenarios:

  • Applications compiled with GCC 4.x linking against libraries compiled with GCC 5+
  • Mixing libraries compiled with different _GLIBCXX_USE_CXX11_ABI settings
  • Distribution packages that assume different ABI defaults
  • Cross-compilation scenarios where ABI settings don't match

Example error scenarios:

// Library A compiled with _GLIBCXX_USE_CXX11_ABI=0
// Library B compiled with _GLIBCXX_USE_CXX11_ABI=1
// Both use std::regex -> Runtime failures or linking errors

Our Solution

To avoid these compatibility issues entirely, we implemented a custom regex engine with the following benefits:

1. ABI Independence

  • No dependency on std::regex or dual ABI settings
  • Consistent behavior across all GCC versions and distributions
  • Eliminates linking and runtime compatibility issues

2. Controlled Dependencies

  • Uses only basic standard library components (std::string_view, std::vector, etc.)
  • Minimizes external dependencies that could introduce ABI conflicts
  • Self-contained implementation

3. Targeted Feature Set

Our implementation focuses on the regex features actually needed by ROCm profiling tools:

Supported Features

  • Literals and Escapes: \n, \t, \\, etc.
  • Anchors: ^ (beginning), $ (end)
  • Character Classes: [abc], [a-z], [^0-9]
  • Shortcuts: \d, \D, \w, \W, \s, \S
  • Quantifiers: *, +, ?, {m}, {m,}, {m,n}
  • Lazy Quantifiers: *?, +?, ??, {m,n}?
  • Groups and Alternation: (), |
  • Dot Metacharacter: .

API Compatibility

The API is designed to be familiar to users of std::regex:

namespace rocprofiler::common::regex {
    bool regex_match(std::string_view text, std::string_view pattern);
    bool regex_search(std::string_view text, std::string_view pattern);
    bool regex_search(std::string_view text, std::string_view pattern,
                     size_t& begin, size_t& end);
    std::string regex_replace(std::string_view text, std::string_view pattern,
                             std::string_view replacement);
}

4. Replacement Token Support

Full support for replacement tokens in regex_replace:

  • $0 or $&: Whole match
  • $1 to $99: Capture groups
  • $`: Prefix (text before match)
  • $': Suffix (text after match)
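The token rules above can be illustrated with a small, self-contained sketch that expands a replacement string for a single match. This is not the SDK's implementation: the names Span and expand_replacement are hypothetical, and two details are assumptions made for the sketch (a second digit is consumed only if it still names an existing group, and a reference to a missing group expands to nothing).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Half-open [begin, end) offsets into the subject text; groups[0] is the whole match.
using Span = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: expands $0/$&, $1..$99, $` and $' for ONE match.
std::string expand_replacement(std::string_view         text,
                               std::string_view         fmt,
                               const std::vector<Span>& groups)
{
    assert(!groups.empty() && "groups[0] must hold the whole match");
    const Span  whole = groups[0];
    std::string out;
    auto append   = [&out](std::string_view s) { out.append(s.data(), s.size()); };
    auto is_digit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };

    for(std::size_t i = 0; i < fmt.size(); ++i)
    {
        // Ordinary character, or a '$' at the very end: copy it through.
        if(fmt[i] != '$' || i + 1 == fmt.size()) { out += fmt[i]; continue; }

        const char next = fmt[i + 1];
        if(next == '&') { append(text.substr(whole.first, whole.second - whole.first)); ++i; }
        else if(next == '`') { append(text.substr(0, whole.first)); ++i; }   // prefix
        else if(next == '\'') { append(text.substr(whole.second)); ++i; }    // suffix
        else if(is_digit(next))
        {
            std::size_t idx  = static_cast<std::size_t>(next - '0');
            std::size_t used = 1;
            // Assumption: take a second digit only if it names an existing group.
            if(i + 2 < fmt.size() && is_digit(fmt[i + 2]))
            {
                const std::size_t two = idx * 10 + static_cast<std::size_t>(fmt[i + 2] - '0');
                if(two < groups.size()) { idx = two; used = 2; }
            }
            // Assumption: a reference to a missing group expands to nothing.
            if(idx < groups.size())
                append(text.substr(groups[idx].first, groups[idx].second - groups[idx].first));
            i += used;
        }
        else { out += '$'; }  // '$' followed by anything else stays literal
    }
    return out;
}
```

For the text "file_v1.2.3.txt" with the whole match at [5, 11) and the three digit groups at [6, 7), [8, 9) and [10, 11), the format "version_$1_$2_$3" expands to "version_1_2_3", and "$`" and "$'" expand to "file_" and ".txt" respectively.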

Implementation Architecture

1. Parser (struct Parser)

  • Converts regex pattern strings into an Abstract Syntax Tree (AST)
  • Handles escape sequences, character classes, and quantifiers
  • Validates pattern syntax and reports errors

2. AST Nodes (struct Node)

  • Represents different regex components (literals, classes, quantifiers, etc.)
  • Supports recursive structure for complex patterns
  • Memory-efficient representation

3. Matchers

  • FastMatcher: Optimized for simple matching without capture groups
  • CaptureMatcher: Full-featured matcher with capture group support
  • Memoization for performance optimization

4. Algorithm Features

  • Backtracking: Supports complex patterns with alternatives
  • Greedy/Lazy Quantifiers: Proper implementation of both modes
  • Zero-length Guards: Prevents infinite loops in edge cases
  • Capture Group Tracking: Maintains group boundaries during matching
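To make the difference between greedy and lazy backtracking concrete, here is a deliberately tiny matcher. It is only a sketch and not the engine described above: it works directly on the pattern string instead of an AST, supports only literals, the dot, and the * and *? quantifiers on a single character, and the names matches_one and match_prefix are hypothetical. Because every repetition consumes one character, it does not need the zero-length guards that a full engine requires.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// True if the single pattern element `p` accepts the text character `t`.
inline bool matches_one(char p, char t) { return p == '.' || p == t; }

// Hypothetical sketch: tries to match `pat` against a prefix of `text` and
// returns the length of the matched prefix, or std::nullopt if nothing matches.
std::optional<std::size_t> match_prefix(std::string_view pat, std::string_view text)
{
    if(pat.empty()) return 0;  // an empty pattern matches the empty prefix

    const bool star = pat.size() >= 2 && pat[1] == '*';
    if(!star)
    {
        // Single element: must consume exactly one character, then match the rest.
        if(text.empty() || !matches_one(pat[0], text[0])) return std::nullopt;
        if(auto rest = match_prefix(pat.substr(1), text.substr(1))) return 1 + *rest;
        return std::nullopt;
    }

    const bool             lazy     = pat.size() >= 3 && pat[2] == '?';
    const std::string_view rest_pat = pat.substr(lazy ? 3 : 2);

    // Upper bound on how many characters the starred element can consume here.
    std::size_t max_reps = 0;
    while(max_reps < text.size() && matches_one(pat[0], text[max_reps])) ++max_reps;

    if(lazy)
    {
        // Lazy: try the fewest repetitions first, grow only when the rest fails.
        for(std::size_t n = 0; n <= max_reps; ++n)
            if(auto rest = match_prefix(rest_pat, text.substr(n))) return n + *rest;
    }
    else
    {
        // Greedy: try the most repetitions first, back off one at a time.
        for(std::size_t n = max_reps + 1; n-- > 0;)
            if(auto rest = match_prefix(rest_pat, text.substr(n))) return n + *rest;
    }
    return std::nullopt;
}
```

On the text "axxbyyb", the greedy pattern "a.*b" matches all 7 characters, while the lazy pattern "a.*?b" stops at the first "b" and matches 4. Both explore the same search space; only the order in which repetition counts are tried differs.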

Usage Examples

#include <cstddef>
#include <string>

#include "lib/common/regex.hpp"

using namespace rocprofiler::common::regex;

// Basic matching
bool matches = regex_match("hello123", "hello\\d+");

// Search with position (initialized so the values are defined even if no match is found)
size_t begin = 0, end = 0;
if (regex_search("prefix_hello123_suffix", "hello\\d+", begin, end)) {
    // Found match at positions [begin, end)
}

// Replace with captures
std::string result = regex_replace(
    "file_v1.2.3.txt",
    "v(\\d+)\\.(\\d+)\\.(\\d+)",
    "version_$1_$2_$3"
);
// result: "file_version_1_2_3.txt"

Testing and Validation

The implementation includes comprehensive tests that verify compatibility with ECMAScript regex semantics:

  • Parity Tests: Compare behavior against std::regex where possible
  • Edge Cases: Handle corner cases like zero-length matches, nested captures
  • Compatibility Tests: Verify consistent behavior across different string types and usage patterns
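A parity test of the kind listed above can be sketched as a small harness that runs a candidate search function and std::regex (default ECMAScript grammar) over the same cases and collects the disagreements. The names SearchCase, SearchFn, reference_search and find_mismatches are hypothetical and do not come from the SDK's test suite; in a real test the candidate would be the custom regex_search, which is only assumed here.

```cpp
#include <cassert>
#include <functional>
#include <regex>
#include <string>
#include <string_view>
#include <vector>

// One parity case: a subject text and a pattern (hypothetical type).
struct SearchCase
{
    std::string text;
    std::string pattern;
};

// Signature shared by the candidate and the reference (hypothetical alias).
using SearchFn = std::function<bool(std::string_view, std::string_view)>;

// Reference behavior: std::regex with its default ECMAScript grammar.
bool reference_search(std::string_view text, std::string_view pattern)
{
    const std::regex re(pattern.begin(), pattern.end());
    return std::regex_search(text.begin(), text.end(), re);
}

// Returns every case on which `candidate` disagrees with std::regex.
std::vector<SearchCase> find_mismatches(const SearchFn&                candidate,
                                        const std::vector<SearchCase>& cases)
{
    std::vector<SearchCase> mismatches;
    for(const auto& c : cases)
    {
        if(candidate(c.text, c.pattern) != reference_search(c.text, c.pattern))
            mismatches.push_back(c);
    }
    return mismatches;
}
```

Returning the mismatching cases rather than a single boolean keeps failures easy to diagnose. Note that such a harness itself depends on std::regex, so it belongs in test code only and must be built with a single, consistent ABI setting.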

Maintenance Notes

  • The implementation prioritizes correctness and ABI independence over maximum performance
  • Features are added based on actual requirements from ROCm profiling tools
  • Regular testing ensures compatibility with target environments
  • Documentation is maintained to explain design decisions and limitations

This custom implementation provides a robust, ABI-independent regex solution that eliminates the compatibility issues that would otherwise plague ROCm profiling tools when deployed across diverse environments.

Notes on ABI Independence Testing

The current test suite includes "compatibility tests" that verify consistent behavior across different string types and usage patterns. However, true ABI independence testing would additionally need:

  1. Dual-ABI builds: Building test applications with both _GLIBCXX_USE_CXX11_ABI settings (0 and 1)
  2. Binary compatibility verification: Ensuring object files compiled with different ABI settings can link together
  3. Runtime validation: Testing that regex functionality works consistently regardless of how dependent libraries were compiled

Such a test could be set up along these lines:

# Build with old ABI
g++ -D_GLIBCXX_USE_CXX11_ABI=0 -c test_old_abi.cpp

# Build with new ABI
g++ -D_GLIBCXX_USE_CXX11_ABI=1 -c test_new_abi.cpp

# Link together and verify functionality
g++ test_old_abi.o test_new_abi.o -o cross_abi_test

The current implementation achieves ABI independence by avoiding std::regex entirely, relying instead on minimal standard library components and custom string processing that remains stable across ABI versions.