Boost mailing page: Re: [boost] Review: Regular Expression Library

Date view	Thread view	Subject view	Author view

From: Greg Colvin (gcolvin_at_[hidden])
Date: 2000-08-15 11:57:47

Next message: Mark Rodgers: "Re: [boost] [cvs] test results log?"
Previous message: Howard Hinnant: "Re: [boost] [cvs] test results log?"
In reply to: Beman Dawes: "Re: [boost] Review: Regular Expression Library"
Next in thread: Jens Maurer: "[boost] Review: Regular Expression Library (reminder)"

From: "Beman Dawes" <beman_at_[hidden]>
> ...
> The third challenge is this:
>
> Given short input strings which contains postal addresses ("123 Main St",
> "PO Box 123", "123 Highway 12") and a bunch of patterns described as
> regular expressions, find which pattern (if any) is the "best"
> match. Ignore what "best" means for this discussion.
>
> A regex user could presumably fill a container with reg_exp objects, then
> for each input address iterate over the container applying regex_match()
> with the reg_exp object. Not particularly difficult to program, AKAIK.
>
> But what if efficiency rears its ugly head? Industrial code I have seen in
> effect re-ordered the regular expressions into groups according to common
> initial sub-expressions, determined which sub-expression applied, and then
> only made the pattern by pattern match attempts on patterns that stood a
> chance of matching. The logical extreme of this approach would be a tree
> search that did the minimum number of regex_match() calls.

The logical extreme is how I have done this in the past.
In effect you combine all the regular expressions into one
big expression and intrduce some sort of "best fit" metric.

Next message: Mark Rodgers: "Re: [boost] [cvs] test results log?"
Previous message: Howard Hinnant: "Re: [boost] [cvs] test results log?"
In reply to: Beman Dawes: "Re: [boost] Review: Regular Expression Library"
Next in thread: Jens Maurer: "[boost] Review: Regular Expression Library (reminder)"

Date view	Thread view	Subject view	Author view