$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [boost] [regex, xpressive] interesting(?) perf benchmark
From: John Maddock (boost.regex_at_[hidden])
Date: 2010-06-07 04:26:00
> I'm always suspect of benchmarks, but I figured I'd post this anyway.
>
> http://astl.sourceforge.net/bench.7.html.
LOL, nice to see my own benchmark suite used against me!
> Anyway, it's not my intention to kick off yet another costly escalation
> in the regex war. God help me, I don't have the time. I post it here
> because I haven't come across a comparison of these four libraries by a
> third party before. Some quick observations:
>
> - I'm pleased that on the balance, xpressive fares well here relative to
> the other NFAs. In the end, though, not much separates best from worst.
>
> - All the NFA libraries seem to have large overhead that becomes very
> noticeable when matching short strings. I don't know why.
I spent some time trying mostly in vain to hammer this down lower with
current Boost.Regex - it turns out that most of it is down to initialisation
of state - Perl-style re's typically have a lot of state to initialise. The
thing to remember is for that short matches against short texts the cost of
a match is basically the cost of a function call for DFA's. Or to put it
another way, even if the NFA's are 10x slower on these short tests, 10x
almost nothing is still almost nothing (though I accept there are certain
outlyer-use-cases where this overhead becomes an issue).
> - I'm gobsmacked that my VERY OLD library greta handily beats pcre,
> boost.regex /and/ xpressive on short matches. Failing to beat yourself
> in any perf benchmark is kind of embarrassing.
Probably more features in xpressive to set up?
> - I'm tickled that for long matches, both xpressive and boost.regex
> (NFAs) beat his library (DFA). How'd that happen?
Maybe reflects the preocupations of their authors?
> - I find it interesting that both xpressive and boost.regex have large
> discontinuities in the chart at the bottom; the others don't. Huh.
Weird indeed, can't really imagine why that would happen, unless there's
some kind of cache miss effect going on?
Cheers, John.