From: Greg Colvin (gcolvin_at_[hidden])
Date: 2000-08-15 11:57:47


From: "Beman Dawes" <beman_at_[hidden]>
> ...
> The third challenge is this:
>
> Given short input strings which contains postal addresses ("123 Main St",
> "PO Box 123", "123 Highway 12") and a bunch of patterns described as
> regular expressions, find which pattern (if any) is the "best"
> match. Ignore what "best" means for this discussion.
>
> A regex user could presumably fill a container with reg_exp objects, then
> for each input address iterate over the container applying regex_match()
> with the reg_exp object. Not particularly difficult to program, AKAIK.
>
> But what if efficiency rears its ugly head? Industrial code I have seen in
> effect re-ordered the regular expressions into groups according to common
> initial sub-expressions, determined which sub-expression applied, and then
> only made the pattern by pattern match attempts on patterns that stood a
> chance of matching. The logical extreme of this approach would be a tree
> search that did the minimum number of regex_match() calls.

The logical extreme is how I have done this in the past.
In effect you combine all the regular expressions into one
big expression and intrduce some sort of "best fit" metric.