ROCProfiler SDK Common API Library
Custom Regex Engine
Why We Have Our Own Regex Implementation
This directory contains a custom regex engine designed specifically for ROCm profiling tools. We implemented our own engine instead of using std::regex primarily to avoid the dual ABI compatibility issues that affect std::regex in the GNU libstdc++ library.
The Dual ABI Problem
The GNU libstdc++ library introduced a dual ABI (Application Binary Interface) system starting with GCC 5.1 to maintain backward compatibility while introducing C++11 improvements. This dual ABI system affects std::string and other standard library components, including std::regex.
Technical Background
The dual ABI allows two different implementations to coexist:
- Old ABI (pre-C++11): Uses Copy-on-Write (COW) strings
- New ABI (C++11+): Uses Short String Optimization (SSO)
The ABI is controlled by the _GLIBCXX_USE_CXX11_ABI macro:
- _GLIBCXX_USE_CXX11_ABI=0: Old ABI (default for GCC < 5.1)
- _GLIBCXX_USE_CXX11_ABI=1: New ABI (default for GCC >= 5.1)
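As a quick way to see which side of the dual ABI a given translation unit lands on, the macro can be inspected at compile time. The following is a minimal sketch; the helper name active_string_abi is hypothetical and not part of the SDK, and it assumes that some libstdc++ header has been included before the macro is tested.

```cpp
#include <string>  // including a libstdc++ header makes _GLIBCXX_USE_CXX11_ABI visible
#include <string_view>

// Reports which libstdc++ string ABI this translation unit was compiled with.
// Hypothetical helper for illustration only.
constexpr std::string_view
active_string_abi()
{
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
    return "new ABI (SSO strings)";
#elif defined(__GLIBCXX__)
    // libstdc++ with the macro set to 0, or a release that predates the dual ABI
    return "old ABI (COW strings)";
#else
    return "not libstdc++";
#endif
}
```

Printing this value from every library in a link line is a cheap way to spot a mismatch before it turns into a link or runtime error.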
The std::regex Problem
std::regex is particularly problematic because:
- ABI Sensitivity: The std::regex implementation is tightly coupled to the string ABI being used
- Symbol Conflicts: Different ABI versions create incompatible symbols that cannot be mixed
- Runtime Failures: Applications linking against libraries compiled with different ABI settings experience runtime failures
- Distribution Issues: Different Linux distributions and package managers may use different ABI settings
Real-World Impact
As explained in the Stack Overflow discussion, this creates several problematic scenarios:
- Applications compiled with GCC 4.x linking against libraries compiled with GCC 5+
- Mixing libraries compiled with different _GLIBCXX_USE_CXX11_ABI settings
- Distribution packages that assume different ABI defaults
- Cross-compilation scenarios where ABI settings don't match
Example error scenarios:
// Library A compiled with _GLIBCXX_USE_CXX11_ABI=0
// Library B compiled with _GLIBCXX_USE_CXX11_ABI=1
// Both use std::regex -> Runtime failures or linking errors
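The reason such mixes fail is that std::string is literally a different type under each setting, so any symbol whose signature mentions it gets a different name. A small sketch that makes this visible is shown below; both function names are hypothetical, and the exact string returned by typeid(...).name() is implementation-defined (on GCC with libstdc++ the new ABI typically places the type in the __cxx11 inline namespace).

```cpp
#include <string>
#include <typeinfo>

// Returns the implementation-defined name of std::string
// (a mangled name when building with GCC or Clang).
std::string
mangled_string_type()
{
    return typeid(std::string).name();
}

// True when that name mentions the __cxx11 inline namespace,
// which is how libstdc++ tags types that belong to the new ABI.
bool
uses_cxx11_namespace()
{
    return mangled_string_type().find("__cxx11") != std::string::npos;
}
```

Two libraries that disagree on the result of uses_cxx11_namespace() cannot exchange std::string objects, or anything built on them, through a shared interface.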
Our Solution
To avoid these compatibility issues entirely, we implemented a custom regex engine with the following benefits:
1. ABI Independence
- No dependency on std::regex or dual ABI settings
- Consistent behavior across all GCC versions and distributions
- Eliminates linking and runtime compatibility issues
2. Controlled Dependencies
- Uses only basic standard library components (std::string_view, std::vector, etc.)
- Minimizes external dependencies that could introduce ABI conflicts
- Self-contained implementation
3. Targeted Feature Set
Our implementation focuses on the regex features actually needed by ROCm profiling tools:
Supported Features
- Literals and Escapes: \n, \t, \\, etc.
- Anchors: ^ (beginning), $ (end)
- Character Classes: [abc], [a-z], [^0-9]
- Shortcuts: \d, \D, \w, \W, \s, \S
- Quantifiers: *, +, ?, {m}, {m,}, {m,n}
- Lazy Quantifiers: *?, +?, ??, {m,n}?
- Groups and Alternation: (), |
- Dot Metacharacter: .
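To illustrate what the shortcut classes stand for, here is a minimal sketch that maps each shortcut letter to a character predicate. It is not taken from the engine's source: the function name matches_shortcut is hypothetical, and it assumes ASCII input and the default "C" locale, following the usual ECMAScript meaning of \d, \w and \s.

```cpp
#include <cctype>

// Returns true when `c` belongs to the class named by the shortcut letter
// `kind` ('d', 'w', 's', or the negated upper-case forms 'D', 'W', 'S').
// Hypothetical helper for illustration only.
bool
matches_shortcut(char kind, char c)
{
    const auto uc = static_cast<unsigned char>(c);
    switch(kind)
    {
        case 'd': return std::isdigit(uc) != 0;               // [0-9]
        case 'w': return std::isalnum(uc) != 0 || c == '_';   // [A-Za-z0-9_]
        case 's': return std::isspace(uc) != 0;               // whitespace
        case 'D': return !matches_shortcut('d', c);
        case 'W': return !matches_shortcut('w', c);
        case 'S': return !matches_shortcut('s', c);
        default: return false;                                // unknown shortcut
    }
}
```

For example, matches_shortcut('w', '_') is true while matches_shortcut('d', 'x') is false.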
API Compatibility
The API is designed to be familiar to users of std::regex:
namespace rocprofiler::common::regex {
    bool regex_match(std::string_view text, std::string_view pattern);
    bool regex_search(std::string_view text, std::string_view pattern);
    bool regex_search(std::string_view text, std::string_view pattern,
                      size_t& begin, size_t& end);
    std::string regex_replace(std::string_view text, std::string_view pattern,
                              std::string_view replacement);
}
4. Replacement Token Support
Full support for replacement tokens in regex_replace:
- $0 or $&: Whole match
- $1 to $99: Capture groups
- $`: Prefix (text before match)
- $': Suffix (text after match)
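The token rules above can be sketched as a small expansion routine. This is an illustration under stated assumptions, not the engine's implementation: the names span_t and expand_replacement are hypothetical, capture groups are assumed to be available as half-open offsets with index 0 holding the whole match, and any $ sequence not listed above is copied through unchanged.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Half-open [begin, end) offsets into the searched text; groups[0] is the whole match.
using span_t = std::pair<std::size_t, std::size_t>;

// Expands $0/$&, $1..$99, $` and $' in `fmt` for a single match.
// Precondition: `groups` is non-empty and all offsets lie within `text`.
std::string
expand_replacement(std::string_view fmt, std::string_view text, const std::vector<span_t>& groups)
{
    auto group_text = [&](std::size_t n) {
        return text.substr(groups[n].first, groups[n].second - groups[n].first);
    };
    auto is_digit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };

    std::string out;
    for(std::size_t i = 0; i < fmt.size(); ++i)
    {
        if(fmt[i] != '$' || i + 1 == fmt.size())
        {
            out += fmt[i];  // ordinary character or a trailing '$'
            continue;
        }
        const char next = fmt[i + 1];
        if(next == '&') { out += group_text(0); ++i; }
        else if(next == '`') { out += text.substr(0, groups[0].first); ++i; }
        else if(next == '\'') { out += text.substr(groups[0].second); ++i; }
        else if(is_digit(next))
        {
            auto        n    = static_cast<std::size_t>(next - '0');
            std::size_t used = 1;
            // Prefer a two-digit group number when such a group exists.
            if(i + 2 < fmt.size() && is_digit(fmt[i + 2]))
            {
                const auto two = n * 10 + static_cast<std::size_t>(fmt[i + 2] - '0');
                if(two < groups.size()) { n = two; used = 2; }
            }
            if(n < groups.size()) out += group_text(n);  // missing groups expand to nothing
            i += used;
        }
        else
        {
            out += fmt[i];  // unrecognized token: keep the '$' literally
        }
    }
    return out;
}
```

With the text "file_v1.2.3.txt" and a whole match at [5, 11), the format "[$`|$&|$']" expands to "[file_|v1.2.3|.txt]".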
Implementation Architecture
1. Parser (struct Parser)
- Converts regex pattern strings into an Abstract Syntax Tree (AST)
- Handles escape sequences, character classes, and quantifiers
- Validates pattern syntax and reports errors
2. AST Nodes (struct Node)
- Represents different regex components (literals, classes, quantifiers, etc.)
- Supports recursive structure for complex patterns
- Memory-efficient representation
3. Matchers
- FastMatcher: Optimized for simple matching without capture groups
- CaptureMatcher: Full-featured matcher with capture group support
- Memoization for performance optimization
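The README does not describe how the matchers memoize, so the following is only a generic sketch of the idea: a backtracking matcher that records every (pattern offset, text offset) state already known to fail, so that state is never explored twice. The class name memo_matcher and its tiny syntax (literals, . and *) are assumptions made for the example.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Full-match backtracking over literals, '.' and '*', with failed states cached.
// Hypothetical class for illustration; not the SDK's FastMatcher.
class memo_matcher
{
public:
    memo_matcher(std::string_view pat, std::string_view text)
    : m_pat{pat}
    , m_text{text}
    , m_failed((pat.size() + 1) * (text.size() + 1), false)
    {}

    bool matches() { return run(0, 0); }

private:
    bool run(std::size_t pi, std::size_t ti)
    {
        if(pi == m_pat.size()) return ti == m_text.size();
        const std::size_t state = pi * (m_text.size() + 1) + ti;
        if(m_failed[state]) return false;  // already proven to be a dead end

        const bool star = pi + 1 < m_pat.size() && m_pat[pi + 1] == '*';
        const bool here = ti < m_text.size() && (m_pat[pi] == '.' || m_pat[pi] == m_text[ti]);

        bool ok = false;
        if(star)
            ok = run(pi + 2, ti) || (here && run(pi, ti + 1));  // zero or more repeats
        else
            ok = here && run(pi + 1, ti + 1);

        if(!ok) m_failed[state] = true;  // remember the failure
        return ok;
    }

    std::string_view  m_pat;
    std::string_view  m_text;
    std::vector<bool> m_failed;
};
```

Without the cache, a pattern such as a*a*a*b on a long run of a characters revisits the same states many times; with it, each state is evaluated at most once.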
4. Algorithm Features
- Backtracking: Supports complex patterns with alternatives
- Greedy/Lazy Quantifiers: Proper implementation of both modes
- Zero-length Guards: Prevents infinite loops in edge cases
- Capture Group Tracking: Maintains group boundaries during matching
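The first three items can be illustrated with a deliberately small sketch. It is not the engine's code: match_here and count_matches are hypothetical names, and the syntax is limited to literals, . and the * / *? quantifiers. Greedy repetition tries the longest run first and backs off, lazy repetition tries the shortest run first, and the search loop steps forward by one character after an empty match, which is one place a zero-length guard is needed.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Tries to match `pat` at offset `pos` of `text`.
// Returns the end offset of the match, or std::nullopt when there is none.
std::optional<std::size_t>
match_here(std::string_view pat, std::string_view text, std::size_t pos = 0)
{
    if(pat.empty()) return pos;

    const char atom         = pat[0];
    auto       atom_matches = [&](std::size_t i) {
        return i < text.size() && (atom == '.' || atom == text[i]);
    };

    const bool star = pat.size() > 1 && pat[1] == '*';
    if(!star)
    {
        if(!atom_matches(pos)) return std::nullopt;
        return match_here(pat.substr(1), text, pos + 1);
    }

    const bool lazy = pat.size() > 2 && pat[2] == '?';
    const auto rest = pat.substr(lazy ? 3 : 2);

    std::size_t max_end = pos;  // furthest offset the repeated atom can reach
    while(atom_matches(max_end)) ++max_end;

    if(lazy)
    {
        // Shortest repetition first, growing only when the rest fails.
        for(std::size_t end = pos; end <= max_end; ++end)
            if(auto r = match_here(rest, text, end)) return r;
    }
    else
    {
        // Longest repetition first, backtracking one character at a time.
        for(std::size_t end = max_end + 1; end-- > pos;)
            if(auto r = match_here(rest, text, end)) return r;
    }
    return std::nullopt;
}

// Counts non-overlapping matches across `text`.
std::size_t
count_matches(std::string_view pat, std::string_view text)
{
    std::size_t count = 0;
    for(std::size_t pos = 0; pos <= text.size();)
    {
        const auto end = match_here(pat, text, pos);
        if(!end) { ++pos; continue; }
        ++count;
        pos = (*end == pos) ? pos + 1 : *end;  // zero-length guard: always make progress
    }
    return count;
}
```

On the text "abcbc", the greedy pattern a.*c consumes all five characters, while the lazy a.*?c stops after the first three.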
Usage Examples
#include "lib/common/regex.hpp"
using namespace rocprofiler::common::regex;
// Basic matching
bool matches = regex_match("hello123", "hello\\d+");
// Search with position
size_t begin = 0;
size_t end = 0;
if (regex_search("prefix_hello123_suffix", "hello\\d+", begin, end)) {
    // Found match at positions [begin, end)
}
// Replace with captures
std::string result = regex_replace(
"file_v1.2.3.txt",
"v(\\d+)\\.(\\d+)\\.(\\d+)",
"version_$1_$2_$3"
);
// result: "file_version_1_2_3.txt"
Testing and Validation
The implementation includes comprehensive tests that verify compatibility with ECMAScript regex semantics:
- Parity Tests: Compare behavior against std::regex where possible
- Edge Cases: Handle corner cases like zero-length matches, nested captures
- Compatibility Tests: Verify consistent behavior across different string types and usage patterns
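A parity check of this kind can be sketched as a small harness that runs the same inputs through std::regex and through a candidate matcher, then reports any disagreement. The names match_fn, std_regex_match and parity_mismatches are hypothetical and do not come from the actual test suite; the candidate is passed in as a callable with the (text, pattern) signature shown in the API section, so the sketch does not depend on the SDK headers.

```cpp
#include <cstddef>
#include <functional>
#include <regex>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

using match_fn   = std::function<bool(std::string_view text, std::string_view pattern)>;
using case_list  = std::vector<std::pair<std::string, std::string>>;  // (text, pattern)

// Reference answer from std::regex, whose default grammar is ECMAScript.
bool
std_regex_match(std::string_view text, std::string_view pattern)
{
    const std::regex re{std::string{pattern}};
    return std::regex_match(text.data(), text.data() + text.size(), re);
}

// Returns every (text, pattern) pair on which `candidate` disagrees with std::regex.
case_list
parity_mismatches(const case_list& cases, const match_fn& candidate)
{
    case_list out;
    for(const auto& [text, pattern] : cases)
    {
        if(candidate(text, pattern) != std_regex_match(text, pattern))
            out.emplace_back(text, pattern);
    }
    return out;
}
```

In a real test, the candidate would be a thin wrapper around rocprofiler::common::regex::regex_match, and an empty result would indicate parity on the listed cases.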
Maintenance Notes
- The implementation prioritizes correctness and ABI independence over maximum performance
- Features are added based on actual requirements from ROCm profiling tools
- Regular testing ensures compatibility with target environments
- Documentation is maintained to explain design decisions and limitations
This custom implementation provides a robust, ABI-independent regex solution that eliminates the compatibility issues that would otherwise plague ROCm profiling tools when deployed across diverse environments.
Notes on ABI Independence Testing
The current test suite includes "compatibility tests" that verify consistent behavior across different string types and usage patterns. However, true ABI independence testing would require:
- Cross-compilation builds: Building test applications with different _GLIBCXX_USE_CXX11_ABI settings (0 and 1)
- Binary compatibility verification: Ensuring object files compiled with different ABI settings can link together
- Runtime validation: Testing that regex functionality works consistently regardless of how dependent libraries were compiled
A minimal version of such a test could look like this:
# Build with old ABI
g++ -D_GLIBCXX_USE_CXX11_ABI=0 -c test_old_abi.cpp
# Build with new ABI
g++ -D_GLIBCXX_USE_CXX11_ABI=1 -c test_new_abi.cpp
# Link together and verify functionality
g++ test_old_abi.o test_new_abi.o -o cross_abi_test
The current implementation achieves ABI independence by avoiding std::regex entirely, relying instead on minimal standard library components and custom string processing that remains stable across ABI versions.