

ROCProfiler SDK Common API Library

Custom Regex Engine

Why We Have Our Own Regex Implementation

This directory contains a custom regex engine implementation designed explicitly for ROCm profiling tools. The primary motivation for implementing our own regex engine instead of using std::regex is to avoid the dual ABI compatibility issues that plague std::regex in the GNU libstdc++ library.

The Dual ABI Problem

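The GNU libstdc++ library introduced a dual ABI (Application Binary Interface) starting with GCC 5.1. C++11 rules out copy-on-write std::string and requires an O(1) std::list::size(), so libstdc++ added new implementations of those types and kept the old ones for backward compatibility with existing binaries. The change affects std::string, std::list, and every standard library component built on them, including std::regex.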

Technical Background

The dual ABI allows two different implementations to coexist:

  • Old ABI (the pre-GCC 5 layout): std::string uses Copy-on-Write (COW) reference counting
  • New ABI (types in the std::__cxx11 inline namespace): std::string uses Short String Optimization (SSO) instead

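The ABI is selected per translation unit by the _GLIBCXX_USE_CXX11_ABI macro, which must be set before any standard header is included:

  • _GLIBCXX_USE_CXX11_ABI=0: Old ABI (the only ABI before GCC 5.1; still selectable afterwards)
  • _GLIBCXX_USE_CXX11_ABI=1: New ABI (the default since GCC 5.1, unless the toolchain was configured with a different default)
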
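To see which ABI a translation unit actually uses, you can compile a small probe like the one below with and without -D_GLIBCXX_USE_CXX11_ABI=0. This is an illustrative sketch and not part of the library. The function names cxx11_abi_setting and string_object_size are hypothetical.

```cpp
#include <cstddef>
#include <string>  // any libstdc++ header pulls in the config that defines the macro

// Returns the value of _GLIBCXX_USE_CXX11_ABI for this translation unit:
// 1 (new ABI) or 0 (old ABI). Returns -1 when the macro is not defined,
// e.g. with libc++ or with libstdc++ from before GCC 5.1.
int cxx11_abi_setting() {
#if defined(_GLIBCXX_USE_CXX11_ABI)
    return _GLIBCXX_USE_CXX11_ABI;
#else
    return -1;
#endif
}

// The two std::string layouts differ in size on 64-bit targets: the old
// COW string holds a single pointer (8 bytes), while the new SSO string
// stores a pointer, a length and a 16-byte inline buffer (32 bytes).
std::size_t string_object_size() { return sizeof(std::string); }
```

With GCC on x86-64, adding -D_GLIBCXX_USE_CXX11_ABI=0 should change the first value from 1 to 0 and shrink sizeof(std::string) from 32 to 8 bytes. Because the layout changes this much, the two ABIs cannot share std::string objects.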
The std::regex Problem

std::regex is particularly problematic because:

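  1. ABI Sensitivity: std::regex and its match results are built on std::string, so their layout and mangled symbol names depend on which string ABI is in effect
  2. Symbol Mismatches: new-ABI types live in std::__cxx11, and functions returning them are tagged [abi:cxx11], so their mangled names differ from the old-ABI ones and the two cannot be mixed
  3. Runtime Failures: when libraries built with different ABI settings are loaded together, mismatches that are not caught at link time can surface as crashes or incorrect behavior at run time
  4. Distribution Issues: Different Linux distributions and package managers may use different ABI settings
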
Real-World Impact

As explained in the Stack Overflow discussion, this creates several problematic scenarios:

  • Applications compiled with GCC 4.x linking against libraries compiled with GCC 5+
  • Mixing libraries compiled with different _GLIBCXX_USE_CXX11_ABI settings
  • Distribution packages that assume different ABI defaults
  • Cross-compilation scenarios where ABI settings don't match

Example error scenarios:

// Library A compiled with _GLIBCXX_USE_CXX11_ABI=0
// Library B compiled with _GLIBCXX_USE_CXX11_ABI=1
// Both use std::regex -> Runtime failures or linking errors

Our Solution

To avoid these compatibility issues entirely, we implemented a custom regex engine with the following benefits:

1. ABI Independence

  • No dependency on std::regex or dual ABI settings
  • Consistent behavior across all GCC versions and distributions
  • Eliminates the std::regex-related linking and runtime compatibility issues described above

2. Controlled Dependencies

  • Uses only basic standard library components (std::string_view, std::vector, etc.)
  • Minimizes external dependencies that could introduce ABI conflicts
  • Self-contained implementation

3. Targeted Feature Set

Our implementation focuses on the regex features actually needed by ROCm profiling tools:

Supported Features
  • Literals and Escapes: \n, \t, \\, etc.
  • Anchors: ^ (beginning), $ (end)
  • Character Classes: [abc], [a-z], [^0-9]
  • Shortcuts: \d, \D, \w, \W, \s, \S
  • Quantifiers: *, +, ?, {m}, {m,}, {m,n}
  • Lazy Quantifiers: *?, +?, ??, {m,n}?
  • Groups and Alternation: (), |
  • Dot Metacharacter: .

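Character classes and shortcuts can both be reduced to a set of inclusive character ranges plus a negation flag. The sketch below shows one way to represent and test them. CharClass, parse_bracket and shortcut are hypothetical names, not the library's types. The sketch is ASCII-only (ECMAScript's \s also covers Unicode spaces) and ignores escapes inside brackets.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// A set of inclusive [lo, hi] ranges, optionally negated as in [^...].
struct CharClass {
    bool negated = false;
    std::vector<std::pair<char, char>> ranges;

    bool matches(char c) const {
        bool in_set = false;
        for (auto [lo, hi] : ranges)
            if (lo <= c && c <= hi) { in_set = true; break; }
        return in_set != negated;
    }
};

// Parse the text between '[' and ']', e.g. "a-z", "abc" or "^0-9".
// A '-' at either end is treated as a literal.
CharClass parse_bracket(std::string_view body) {
    CharClass cc;
    std::size_t i = 0;
    if (!body.empty() && body[0] == '^') { cc.negated = true; i = 1; }
    while (i < body.size()) {
        if (i + 2 < body.size() && body[i + 1] == '-') {
            cc.ranges.emplace_back(body[i], body[i + 2]);  // range such as a-z
            i += 3;
        } else {
            cc.ranges.emplace_back(body[i], body[i]);      // single character
            i += 1;
        }
    }
    return cc;
}

// Expand \d, \w, \s; the upper-case forms \D, \W, \S are their complements.
CharClass shortcut(char letter) {
    CharClass cc;
    const bool upper = letter >= 'A' && letter <= 'Z';
    const char lower = upper ? char(letter - 'A' + 'a') : letter;
    if (lower == 'd') cc.ranges = {{'0', '9'}};
    else if (lower == 'w') cc.ranges = {{'0', '9'}, {'A', 'Z'}, {'a', 'z'}, {'_', '_'}};
    else if (lower == 's') cc.ranges = {{' ', ' '}, {'\t', '\r'}};  // \t \n \v \f \r
    cc.negated = upper;
    return cc;
}
```

For example, parse_bracket("^0-9").matches('x') is true, and shortcut('D').matches('7') is false.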
API Compatibility

The API is designed to be familiar to users of std::regex:

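#include <cstddef>
#include <string>
#include <string_view>

namespace rocprofiler::common::regex {
    // Like std::regex_match: the whole text must match the pattern
    bool regex_match(std::string_view text, std::string_view pattern);
    // Like std::regex_search: any substring of the text may match
    bool regex_search(std::string_view text, std::string_view pattern);
    // As above; on success [begin, end) is the position of the match
    bool regex_search(std::string_view text, std::string_view pattern,
                      std::size_t& begin, std::size_t& end);
    // Returns text with pattern matches replaced by the expanded replacement
    std::string regex_replace(std::string_view text, std::string_view pattern,
                              std::string_view replacement);
}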
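For readers coming from std::regex, the sketch below shows the standard calls these functions are named after. It also shows how a [begin, end) pair maps onto std::smatch::position and std::smatch::length. The wrappers ref_match, ref_search and ref_replace are hypothetical. Because this sketch uses std::regex itself, it serves only as reference behavior, the kind of oracle a parity test compares against.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Whole-text match with the default ECMAScript grammar.
bool ref_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// First match anywhere in the text, reported as a half-open [begin, end) span.
bool ref_search(const std::string& text, const std::string& pattern,
                std::size_t& begin, std::size_t& end) {
    std::smatch m;
    if (!std::regex_search(text, m, std::regex(pattern))) return false;
    begin = static_cast<std::size_t>(m.position(0));
    end = begin + static_cast<std::size_t>(m.length(0));
    return true;
}

// Replacement with $n-style tokens (std::regex's default format rules).
std::string ref_replace(const std::string& text, const std::string& pattern,
                        const std::string& replacement) {
    return std::regex_replace(text, std::regex(pattern), replacement);
}
```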

4. Replacement Token Support

Full support for replacement tokens in regex_replace:

  • $0 or $&: Whole match
  • $1 to $99: Capture groups
  • $`: Prefix (text before match)
  • $': Suffix (text after match)
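The following is a minimal sketch of how these tokens could be expanded for a single match. It assumes the matcher has already recorded where the whole match and each capture group begin and end. Span and expand_replacement are hypothetical. In this sketch, a two-digit reference such as $12 is used only when that group exists; otherwise it is read as $1 followed by a literal 2 (engines differ on this point). Unknown or out-of-range tokens are copied literally.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Where a group matched; `matched` is false for groups that did not participate.
struct Span {
    std::size_t begin = 0, end = 0;
    bool matched = false;
};

// groups[0] is the whole match, groups[n] is capture group n.
std::string expand_replacement(std::string_view fmt, std::string_view text,
                               const std::vector<Span>& groups) {
    auto piece = [&](const Span& s) {
        return s.matched ? text.substr(s.begin, s.end - s.begin) : std::string_view{};
    };
    auto is_digit = [](char c) { return c >= '0' && c <= '9'; };
    std::string out;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '$' || i + 1 == fmt.size()) { out += fmt[i]; continue; }
        const char c = fmt[i + 1];
        if (c == '&') { out += piece(groups[0]); ++i; }                      // $&
        else if (c == '`') { out += text.substr(0, groups[0].begin); ++i; }  // prefix
        else if (c == '\'') { out += text.substr(groups[0].end); ++i; }      // suffix
        else if (is_digit(c)) {                                              // $0..$99
            std::size_t n = std::size_t(c - '0');
            std::size_t used = 1;
            if (i + 2 < fmt.size() && is_digit(fmt[i + 2])) {
                const std::size_t nn = n * 10 + std::size_t(fmt[i + 2] - '0');
                if (nn < groups.size()) { n = nn; used = 2; }  // prefer existing two-digit group
            }
            if (n < groups.size()) out += piece(groups[n]);
            else out.append(fmt.substr(i, used + 1));          // no such group: keep literally
            i += used;
        } else {
            out += '$';  // not a recognized token: emit '$' as-is
        }
    }
    return out;
}
```

For "file_v1.2.3.txt" with the match "v1.2.3" at [5, 11) and groups at [6, 7), [8, 9) and [10, 11), the format "version_$1_$2_$3" expands to "version_1_2_3".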

Implementation Architecture

1. Parser (struct Parser)

  • Converts regex pattern strings into an Abstract Syntax Tree (AST)
  • Handles escape sequences, character classes, and quantifiers
  • Validates pattern syntax and reports errors
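Recursive descent is one common way to turn a pattern into an AST: each grammar rule becomes a function, and operator precedence (alternation below concatenation below quantifiers) falls out of the call structure. The sketch below handles only a subset: literals, '.', '\' escapes, groups, '|', and '*', '+', '?' with an optional lazy '?'. Its Node, Parser and dump names are hypothetical rather than the library's struct Parser and struct Node. It reports syntax errors by throwing std::runtime_error.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Grammar: alt := cat ('|' cat)* ; cat := rep* ; rep := atom [*+?] ['?'] ;
//          atom := '(' alt ')' | '.' | '\' char | literal
struct Node {
    std::string kind;  // "lit", "any", "cat", "alt", "group", "star", "plus", "opt" (+ "_lazy")
    char ch = 0;       // literal character for "lit"
    std::vector<std::unique_ptr<Node>> kids;
};
using NodePtr = std::unique_ptr<Node>;

NodePtr make(std::string kind, char ch = 0) {
    auto n = std::make_unique<Node>();
    n->kind = std::move(kind);
    n->ch = ch;
    return n;
}

class Parser {
public:
    explicit Parser(std::string_view p) : p_(p) {}
    NodePtr parse() {
        NodePtr root = alt();
        if (pos_ != p_.size()) throw std::runtime_error("unmatched ')'");
        return root;
    }

private:
    std::string_view p_;
    std::size_t pos_ = 0;
    bool peek(char c) const { return pos_ < p_.size() && p_[pos_] == c; }

    NodePtr alt() {
        NodePtr first = cat();
        if (!peek('|')) return first;
        auto n = make("alt");
        n->kids.push_back(std::move(first));
        while (peek('|')) { ++pos_; n->kids.push_back(cat()); }
        return n;
    }
    NodePtr cat() {
        auto n = make("cat");
        while (pos_ < p_.size() && !peek('|') && !peek(')')) n->kids.push_back(rep());
        if (n->kids.size() == 1) return std::move(n->kids[0]);  // no wrapper for one item
        return n;
    }
    NodePtr rep() {
        NodePtr n = atom();
        if (!peek('*') && !peek('+') && !peek('?')) return n;
        const char q = p_[pos_++];
        auto r = make(q == '*' ? "star" : q == '+' ? "plus" : "opt");
        if (peek('?')) { ++pos_; r->kind += "_lazy"; }
        r->kids.push_back(std::move(n));
        return r;
    }
    NodePtr atom() {
        const char c = p_[pos_++];
        if (c == '(') {
            auto g = make("group");
            g->kids.push_back(alt());
            if (!peek(')')) throw std::runtime_error("missing ')'");
            ++pos_;
            return g;
        }
        if (c == '*' || c == '+' || c == '?') throw std::runtime_error("nothing to repeat");
        if (c == '.') return make("any");
        if (c == '\\') {
            if (pos_ == p_.size()) throw std::runtime_error("trailing backslash");
            return make("lit", p_[pos_++]);
        }
        return make("lit", c);
    }
};

// Prefix rendering of the AST, e.g. "alt(cat(a,b),star(c))".
std::string dump(const Node& n) {
    if (n.kind == "lit") return std::string(1, n.ch);
    if (n.kind == "any") return ".";
    std::string s = n.kind + "(";
    for (std::size_t i = 0; i < n.kids.size(); ++i) s += (i ? "," : "") + dump(*n.kids[i]);
    return s + ")";
}
```

Stacked quantifiers such as "a**" are rejected, as ECMAScript does. Braces, anchors and classes would be additional cases in atom() and rep().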

2. AST Nodes (struct Node)

  • Represents different regex components (literals, classes, quantifiers, etc.)
  • Supports recursive structure for complex patterns
  • Memory-efficient representation

3. Matchers

  • FastMatcher: Optimized for simple matching without capture groups
  • CaptureMatcher: Full-featured matcher with capture group support
  • Memoization for performance optimization
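The memoization idea can be shown on a deliberately small pattern language. A backtracking matcher records, for every (pattern position, text position) pair, whether the rest of the pattern can match, so each state is solved at most once. Atom, compile_flat and MemoMatcher are hypothetical. They handle only literals, '.', '\' escapes and greedy '*', '+', '?' without groups, similar in spirit to a capture-free fast path. The library's actual memoization scheme may differ.

```cpp
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// One pattern element: a literal or '.', repeated between min and max times.
struct Atom {
    char ch = 0;
    bool any = false;  // '.' matches any character except '\n'
    long min = 1;
    long max = 1;      // max < 0 means unbounded
};

// Flat patterns only: literals, '.', '\' escapes, and greedy '*', '+', '?'.
std::vector<Atom> compile_flat(std::string_view p) {
    std::vector<Atom> atoms;
    for (std::size_t i = 0; i < p.size(); ++i) {
        Atom a;
        if (p[i] == '\\' && i + 1 < p.size()) a.ch = p[++i];
        else if (p[i] == '.') a.any = true;
        else a.ch = p[i];
        const char q = i + 1 < p.size() ? p[i + 1] : '\0';
        if (q == '*') { a.min = 0; a.max = -1; ++i; }
        else if (q == '+') { a.max = -1; ++i; }
        else if (q == '?') { a.min = 0; ++i; }
        atoms.push_back(a);
    }
    return atoms;
}

class MemoMatcher {
    enum State : unsigned char { Unknown, Yes, No };
    std::vector<Atom> atoms_;
    std::string_view text_;
    std::vector<State> memo_;  // one slot per (atom index, text position)

    bool accepts(const Atom& a, std::size_t t) const {
        return t < text_.size() && (a.any ? text_[t] != '\n' : text_[t] == a.ch);
    }

    // Can atoms_[ai..] match exactly text_[t..]? Each (ai, t) is solved once.
    bool solve(std::size_t ai, std::size_t t) {
        if (ai == atoms_.size()) return t == text_.size();
        State& slot = memo_[ai * (text_.size() + 1) + t];
        if (slot != Unknown) return slot == Yes;
        const Atom& a = atoms_[ai];
        long run = 0;  // longest run of this atom starting at t
        while ((a.max < 0 || run < a.max) && accepts(a, t + std::size_t(run))) ++run;
        bool ok = false;
        for (long k = run; k >= a.min && !ok; --k)  // greedy: longest run first
            ok = solve(ai + 1, t + std::size_t(k));
        slot = ok ? Yes : No;
        return ok;
    }

public:
    MemoMatcher(std::vector<Atom> atoms, std::string_view text)
        : atoms_(std::move(atoms)), text_(text),
          memo_((atoms_.size() + 1) * (text_.size() + 1), Unknown) {}

    // Whole-text match, like regex_match.
    bool full_match() { return solve(0, 0); }
};
```

Without the memo table, a pattern such as "a*a*a*a*a*a*a*b" against a long run of 'a' would try a very large number of ways to split the run before failing. With the table, the work is bounded by roughly atoms × text positions × run length.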

4. Algorithm Features

  • Backtracking: Supports complex patterns with alternatives
  • Greedy/Lazy Quantifiers: Proper implementation of both modes
  • Zero-length Guards: Prevents infinite loops in edge cases
  • Capture Group Tracking: Maintains group boundaries during matching
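These four features fit together naturally in a continuation-passing backtracker. Each node tries to match and then calls a continuation for the rest of the pattern, so backtracking is simply returning false. The sketch below shows greedy versus lazy repetition, an ECMAScript-style zero-length guard, and undoing captures on backtrack. Node, make, lit, star, group and Backtracker are all hypothetical; this is not the library's CaptureMatcher. For brevity, the sketch does not clear captures inside a repeated group on each iteration as ECMAScript does.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string_view>
#include <utility>
#include <vector>

struct Node {
    enum Kind { Char, Any, Concat, Alt, Repeat, Group };
    Kind kind = Char;
    char ch = 0;                                  // Char: the literal
    std::vector<std::shared_ptr<const Node>> kids;
    int min = 0, max = -1;                        // Repeat: bounds, max < 0 = unbounded
    bool lazy = false;                            // Repeat: try fewer iterations first
    int group = 0;                                // Group: 1-based capture index
};
using NodeP = std::shared_ptr<const Node>;

NodeP make(Node::Kind k, std::vector<NodeP> kids = {}, char ch = 0, int min = 0,
           int max = -1, bool lazy = false, int group = 0) {
    return std::make_shared<Node>(Node{k, ch, std::move(kids), min, max, lazy, group});
}
NodeP lit(char c) { return make(Node::Char, {}, c); }
NodeP star(NodeP child, bool lazy) { return make(Node::Repeat, {std::move(child)}, 0, 0, -1, lazy); }
NodeP group(NodeP child, int index) { return make(Node::Group, {std::move(child)}, 0, 0, -1, false, index); }

class Backtracker {
public:
    using Cont = std::function<bool(std::size_t)>;  // "match the rest from position j"
    Backtracker(std::string_view text, int groups)
        : text_(text), caps_(std::size_t(groups) + 1, std::pair<long, long>{-1, -1}) {}

    // Whole-text match; capture groups stay filled in on success.
    bool full_match(const NodeP& root) {
        return m(root, 0, [&](std::size_t j) {
            if (j != text_.size()) return false;
            caps_[0] = {0, long(j)};
            return true;
        });
    }
    std::string_view capture(int g) const {
        const auto& c = caps_[std::size_t(g)];
        if (c.first < 0) return {};
        return text_.substr(std::size_t(c.first), std::size_t(c.second - c.first));
    }

private:
    std::string_view text_;
    std::vector<std::pair<long, long>> caps_;  // [begin, end) per group, -1 if unset

    bool m(const NodeP& n, std::size_t i, const Cont& k) {
        switch (n->kind) {
        case Node::Char: return i < text_.size() && text_[i] == n->ch && k(i + 1);
        case Node::Any: return i < text_.size() && text_[i] != '\n' && k(i + 1);
        case Node::Concat: return seq(*n, 0, i, k);
        case Node::Alt:  // alternatives are tried left to right
            for (const auto& c : n->kids) if (m(c, i, k)) return true;
            return false;
        case Node::Group: {
            auto& slot = caps_[std::size_t(n->group)];
            return m(n->kids[0], i, [&, i](std::size_t j) {
                const auto before = slot;
                slot = {long(i), long(j)};
                if (k(j)) return true;
                slot = before;  // undo the capture when backtracking
                return false;
            });
        }
        case Node::Repeat: return rep(*n, 0, i, k);
        }
        return false;
    }
    bool seq(const Node& n, std::size_t idx, std::size_t i, const Cont& k) {
        if (idx == n.kids.size()) return k(i);
        return m(n.kids[idx], i, [&](std::size_t j) { return seq(n, idx + 1, j, k); });
    }
    // count = iterations already matched; i = current text position.
    bool rep(const Node& n, int count, std::size_t i, const Cont& k) {
        const bool may_stop = count >= n.min;
        auto one_more = [&] {
            if (n.max >= 0 && count >= n.max) return false;
            return m(n.kids[0], i, [&](std::size_t j) {
                // Zero-length guard: once the minimum is met, an iteration that
                // consumed nothing is rejected, so (a*)* cannot recurse forever.
                if (j == i && count >= n.min) return false;
                return rep(n, count + 1, j, k);
            });
        };
        if (n.lazy) return (may_stop && k(i)) || one_more();  // fewest iterations first
        return one_more() || (may_stop && k(i));               // most iterations first
    }
};
```

On "<a><b>", the pattern <(.*)>(.*) captures "a><b" in group 1 when the first star is greedy, and "a" when it is lazy.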

Usage Examples

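#include "lib/common/regex.hpp"

#include <cstddef>
#include <iostream>
#include <string>

using namespace rocprofiler::common::regex;

int main() {
    // Basic matching: the whole string must match the pattern
    bool matches = regex_match("hello123", "hello\\d+");

    // Search with position
    std::size_t begin = 0;
    std::size_t end = 0;
    if (regex_search("prefix_hello123_suffix", "hello\\d+", begin, end)) {
        // Found match at positions [begin, end)
        std::cout << "match at [" << begin << ", " << end << ")\n";
    }

    // Replace with captures
    std::string result = regex_replace(
        "file_v1.2.3.txt",
        "v(\\d+)\\.(\\d+)\\.(\\d+)",
        "version_$1_$2_$3"
    );
    // result: "file_version_1_2_3.txt"

    std::cout << std::boolalpha << matches << ' ' << result << '\n';
    return 0;
}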

Testing and Validation

The implementation includes comprehensive tests that verify compatibility with ECMAScript regex semantics:

  • Parity Tests: Compare behavior against std::regex where possible
  • Edge Cases: Handle corner cases like zero-length matches, nested captures
  • Compatibility Tests: Verify consistent behavior across different string types and usage patterns
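A parity test can be table-driven. Each (text, pattern) case runs through std::regex as the reference and through the engine under test, and the harness reports any case where the found/not-found result or the [begin, end) span differs. In the sketch below, Case, SearchFn and parity_failures are hypothetical names. The candidate engine is passed as a callable, so the harness assumes no particular signature in the library.

```cpp
#include <cstddef>
#include <functional>
#include <regex>
#include <string>
#include <vector>

struct Case {
    std::string text;
    std::string pattern;
};

// Engine under test: returns true and sets [begin, end) when a match is found.
using SearchFn = std::function<bool(const std::string&, const std::string&,
                                    std::size_t&, std::size_t&)>;

// Returns the indices of cases where `candidate` disagrees with std::regex_search
// (ECMAScript grammar, the std::regex default) on found/not-found or on the span.
std::vector<std::size_t> parity_failures(const std::vector<Case>& cases,
                                         const SearchFn& candidate) {
    std::vector<std::size_t> failures;
    for (std::size_t i = 0; i < cases.size(); ++i) {
        const Case& c = cases[i];
        std::smatch ref;
        const bool want = std::regex_search(c.text, ref, std::regex(c.pattern));
        std::size_t begin = 0, end = 0;
        const bool got = candidate(c.text, c.pattern, begin, end);
        bool same = (want == got);
        if (same && want) {
            const auto ref_begin = static_cast<std::size_t>(ref.position(0));
            same = begin == ref_begin &&
                   end == ref_begin + static_cast<std::size_t>(ref.length(0));
        }
        if (!same) failures.push_back(i);
    }
    return failures;
}
```

A literal substring search, for instance, agrees with std::regex on plain-text patterns but is flagged on a case such as ("xaa", "a+").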

Maintenance Notes

  • The implementation prioritizes correctness and ABI independence over maximum performance
  • Features are added based on actual requirements from ROCm profiling tools
  • Regular testing ensures compatibility with target environments
  • Documentation is maintained to explain design decisions and limitations

This custom implementation provides a robust, ABI-independent regex solution that eliminates the compatibility issues that would otherwise plague ROCm profiling tools when deployed across diverse environments.

Notes on ABI Independence Testing

The current test suite includes "compatibility tests" that verify consistent behavior across different string types and usage patterns. However, true ABI independence testing would require:

  1. Mixed-ABI builds: Building test applications with different _GLIBCXX_USE_CXX11_ABI settings (0 and 1)
  2. Binary compatibility verification: Ensuring object files compiled with different ABI settings can link together
  3. Runtime validation: Testing that regex functionality works consistently regardless of how dependent libraries were compiled

A minimal build that combines both ABI settings might look like this:

# Build with old ABI
g++ -D_GLIBCXX_USE_CXX11_ABI=0 -c test_old_abi.cpp

# Build with new ABI
g++ -D_GLIBCXX_USE_CXX11_ABI=1 -c test_new_abi.cpp

# Link together and verify functionality
g++ test_old_abi.o test_new_abi.o -o cross_abi_test

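The current implementation achieves ABI independence by avoiding std::regex entirely and relying instead on a small set of standard library components, such as std::string_view and std::vector, that are the same under both ABIs. std::string itself remains ABI-dependent, so regex_replace, which returns a std::string, is still compiled according to the caller's _GLIBCXX_USE_CXX11_ABI setting.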