From: John Maddock (john_at_[hidden])
Date: 2007-03-19 06:28:45


Eric Niebler wrote:
> I have a question and a bug report regarding the format_perl flag.
> First the question ...
>
> I see that, when you specify format_perl, match_results::format()
> recognizes the escape sequences \l \L \u and \U, which do uppercasing
> or lowercasing. These are necessarily locale-dependent character
> transformations, but match_results does not have a Traits parameter.
> How should the transformations be done?
>
> I note that the basic_regex<> class template has a traits parameter,
> and that match_results<>::format() can only be called after a
> successful regex match. One reasonable approach is that
> match_results<> holds a (shared) pointer to the regex object's
> traits. It would have to be a polymorphic base pointer, since
> match_results can't know the exact type of the traits object at the
> time format() is called.
>
> That doesn't exactly work because the RegexTraits concept doesn't have
> toupper() and tolower() functions. I suggest adding them.

Right, but format_perl isn't part of TR1, so this is all in the realm of
vendor-specific extensions. I added some *optional* extra members to the
traits class to deal with this: the code detects at compile time whether the
members are there, and uses them if they are, otherwise it falls back to some
sensible defaults.
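
For anyone wanting to see the shape of that technique, here is a minimal
sketch of compile-time detection of an optional traits member. It is not the
actual Boost.Regex code: the names has_toupper, to_upper, plain_traits and
counting_traits are all made up for illustration, and the fallback used here
is simply std::toupper.

```cpp
#include <cctype>
#include <type_traits>
#include <utility>

// Helper that maps any type to void (hypothetical name, C++11 friendly).
template <class T> struct always_void { typedef void type; };

// Primary template: assume the traits class has no toupper() member.
template <class Traits, class = void>
struct has_toupper : std::false_type {};

// Chosen only when the expression t.toupper('a') is well-formed.
template <class Traits>
struct has_toupper<Traits, typename always_void<
    decltype(std::declval<const Traits&>().toupper('a'))>::type>
    : std::true_type {};

// Optional member present: forward to it.
template <class Traits>
char to_upper_impl(const Traits& t, char c, std::true_type) {
    return t.toupper(c);
}

// Optional member absent: fall back to a default (here the C locale).
template <class Traits>
char to_upper_impl(const Traits&, char c, std::false_type) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Entry point: the overload is selected at compile time.
template <class Traits>
char to_upper(const Traits& t, char c) {
    return to_upper_impl(t, c, has_toupper<Traits>());
}

// Two illustrative traits classes (not Boost types).
struct plain_traits {};
struct counting_traits {
    mutable int calls = 0;  // records that the member was actually used
    char toupper(char c) const {
        ++calls;
        return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
    }
};
```

With this, to_upper(plain_traits(), 'a') goes through the fallback, while
to_upper() on a counting_traits object calls the member and bumps its counter.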

> This isn't only a problem for format_perl, strictly speaking.
> match_results::format() also needs to know how to turn characters into
> integers (eg. to parse format strings like "$1"). That is the reason
> for RegexTraits::value()'s existence, so match_results<>::format()
> should use it.
>
> (Incidentally, I just implemented all this in xpressive, so I can
> confirm that this strategy works. It incurs a virtual call for each
> tolower(), toupper(), and value(), but there doesn't seem to be any
> other way without changing the interface in a non-TR1 compatible way.)

Yep, for regex_replace you can pass the regex object through to the code
that does the formatting, but match_results::format has no such object. I
use the default locale in this case, but your approach is probably better.
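
A rough sketch of the shared, type-erased traits pointer described in the
quoted text might look like the following. This is neither xpressive's nor
Boost.Regex's code: traits_base, traits_holder, results_sketch, make_results
and group_number are hypothetical names, std::regex_traits<char> merely
stands in for a regex's traits type, and the case conversion is assumed to go
through the std::ctype facet of the traits object's locale.

```cpp
#include <cstddef>
#include <locale>
#include <memory>
#include <regex>
#include <string>

// Hypothetical polymorphic view of a traits object: one virtual call per
// toupper(), tolower() and value().
struct traits_base {
    virtual ~traits_base() {}
    virtual char toupper(char c) const = 0;
    virtual char tolower(char c) const = 0;
    virtual int value(char c, int radix) const = 0;
};

// Wraps a concrete traits type behind the polymorphic interface.
template <class Traits>
struct traits_holder : traits_base {
    explicit traits_holder(const Traits& t) : traits_(t) {}
    char toupper(char c) const override {
        return std::use_facet<std::ctype<char> >(traits_.getloc()).toupper(c);
    }
    char tolower(char c) const override {
        return std::use_facet<std::ctype<char> >(traits_.getloc()).tolower(c);
    }
    int value(char c, int radix) const override {
        return traits_.value(c, radix);
    }
    Traits traits_;
};

// Stand-in for match_results: it only remembers the traits of the regex
// that produced it, without knowing their static type.
struct results_sketch {
    std::shared_ptr<const traits_base> traits;
};

template <class Traits>
results_sketch make_results(const Traits& t) {
    results_sketch r;
    r.traits = std::make_shared<traits_holder<Traits> >(t);
    return r;
}

// Parse the digits after '$' in a format string such as "$12".
int group_number(const results_sketch& r, const std::string& fmt) {
    int n = 0;
    for (std::size_t i = 1; i < fmt.size(); ++i) {
        int d = r.traits->value(fmt[i], 10);  // -1 when not a decimal digit
        if (d < 0) break;
        n = n * 10 + d;
    }
    return n;
}
```

The cost is the one noted in the quoted text: each conversion is a virtual
call, in exchange for leaving the match_results interface unchanged.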

> Finally, a bug report. Consider the following code:
>
> std::string str ("fOO bAr BaZ");
> regex rx ("\\w+");
>
> str = regex_replace( str, rx, "\\L\\u$&", format_perl );
> std::cout << str << std::endl;
>
> This prints:
>
> FOO BAr BaZ
>
> However, the equivalent perl:
>
> $str= 'fOO bAr BaZ';
> $str =~ s/\w+/\L\u$&/g;
> print "$str\n";
>
> Prints this:
>
> Foo Bar Baz
>
> Looks like in boost::regex, the \u is stomping the \L rather than
> merely overriding it for the next character.

Yep, fixed in cvs, thanks for the report.
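
To spell out the expected semantics: \L and \U set a case mode that persists
until \E, while \l and \u override it for the next character only. The sketch
below illustrates that rule on a plain string. It is not the fix that went
into cvs; apply_case_escapes and case_mode are made-up names, and it handles
only these five escapes in the C locale.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class case_mode { none, lower, upper };

// Convert one character according to the given mode.
char convert(char c, case_mode m) {
    unsigned char u = static_cast<unsigned char>(c);
    if (m == case_mode::lower) return static_cast<char>(std::tolower(u));
    if (m == case_mode::upper) return static_cast<char>(std::toupper(u));
    return c;
}

// Apply \L \U \E \l \u to the text that follows them.
std::string apply_case_escapes(const std::string& in) {
    std::string out;
    case_mode persistent = case_mode::none;  // set by \L or \U, cleared by \E
    case_mode one_shot = case_mode::none;    // set by \l or \u, used once
    for (std::size_t i = 0; i < in.size(); ++i) {
        char c = in[i];
        if (c == '\\' && i + 1 < in.size()) {
            bool handled = true;
            switch (in[i + 1]) {
                case 'L': persistent = case_mode::lower; break;
                case 'U': persistent = case_mode::upper; break;
                case 'E': persistent = case_mode::none; break;
                case 'l': one_shot = case_mode::lower; break;
                case 'u': one_shot = case_mode::upper; break;
                default: handled = false; break;
            }
            if (handled) { ++i; continue; }  // skip the escape letter
        }
        // The one-shot mode wins for this character only, then the
        // persistent mode takes over again.
        case_mode m = (one_shot != case_mode::none) ? one_shot : persistent;
        one_shot = case_mode::none;
        out += convert(c, m);
    }
    return out;
}
```

Under this rule, apply_case_escapes("\\L\\ufOO") gives "Foo", matching the
Perl output quoted above.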

John.