$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
From: Eric Niebler (eric_at_[hidden])
Date: 2005-09-12 17:34:00
Thomas Witt wrote:
> What is the rationale for having the regex type bound to the iterator 
> type used when matching/searching ?
> 
> Thomas
Good question. From the regex proposal, John writes:
"There is one other restriction worth pointing out; the Boost interface 
[parameterized on Char instead of Iterator] does not easily allow an 
implementation to generate self-modifying machine code for the state 
machine. That would require the state machine to be fixed to a finite 
number of iterator types, and in practice there appear to be no regex 
implementations that make this design choice in any case."
xpressive is such an implementation. At least, static xpressive is. 
Since the the full regex expression *and* the iterator type are known at 
compile time, the compiler is used to generate machine code to match a 
regex to a data sequence. Each static regex generates its own small 
custom pattern matching engine. For these regexes, there are no 
op-codes, jump tables, virtual calls or other indirections.
If the regex type were parameterized on Char instead of Iterator, then 
the compiled form of a regex would have to be Iterator-neutral. You 
would have to separate the internal representation of the regex from the 
code that evaluates the regex against a data sequence. This requires an 
indirection somewhere -- in the case of Boost.Regex, it's op codes and a 
jump table. (John can correct me if I'm wrong here.) Arrays of function 
pointers flummox optimizers.
In short, the answer is performance. In order to take full advantage of 
the compile-time information of static regexes, the regex type must be 
parameterized on Iterator, not Char. In case you're wondering if it 
matters or not, static regex are ~13% faster than their dynamic 
equivalents on average (on VC7.1). Perf results here:
HTH,
-- Eric Niebler Boost Consulting www.boost-consulting.com