Subject: Re: [boost] [libboost_regex-1_32.dylib] Validating Email address usingRegx failing in boost
From: OvermindDL1 (overminddl1_at_[hidden])
Date: 2009-10-22 22:26:35


On Thu, Oct 22, 2009 at 1:03 PM, Arun <arun.ka_at_[hidden]> wrote:
> Also, I have tried the Perl code that was posted in this thread.
> The output looks like this:
>
> perl test.pl
> Sequence (?<a...) not recognized in regex; marked by <-- HERE in
> m/^((?>[a-zA-Z\d!#0&'*+\-/=?^_`{|}~]+\x20*|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*"\x20*)*(?<a
> <-- HERE ngle><))?((?!\.)(?>\.?[a-zA-Z\d!#0&'*+\-/=?^_`{|}~]+)+|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*")@(((?!-)[a-zA-Z\d\-]+(?<!-)\.)+[a-zA-Z]{2,}|\[(((?(?<!\[)\.)(25[0-5]|2[0-4]\d|[01]?\d?\d)){4}|[a-zA-Z\d\-]*[a-zA-Z\d]:((?=[\x01-\x7f])[^\\\[\]]|\\[\x01-\x7f])+)\])(?(angle)>)$/
> at test.pl line 2.
>
>
> John, comparing my output to yours, it looks like my error is
> reported at the first instance of "angle" rather than the second
> one.
>
> Could that be because we are using different operating systems?
>
> I am using Mac OS X 10.5.5 and
>
> perl -v
>
> This is perl, v5.8.8 built for darwin-thread-multi-2level
> (with 1 registered patch, see perl -V for more detail)
>
>
> -Arun
>
> On Thu, Oct 22, 2009 at 10:08 PM, Arun <arun.ka_at_[hidden]> wrote:
>> Hi John,
>> Thanks for your reply.
>>
>> I am using the code below.
>>
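>> bool testRegExMatch(const std::string& aRegex, const std::string& aTestString)
>> {
>>        // Throws boost::bad_expression if aRegex is not a valid pattern.
>>        boost::regex regExpr(aRegex);
>>
>>        // True only if the whole of aTestString matches the pattern.
>>        return boost::regex_match(aTestString, regExpr);
>> }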
>>
>> bool validateEmailAddressWithRegx(std::string aEmailAddressStr)
>> {
>>        std::string expr("^((?>[a-zA-Z\\d!#$%&'*+\\-/=?^_`{|}~]+\\x20*|\"((?=[\\x01-\\x7f])[^\"\\\\]|\\\\[\\x01-\\x7f])*\"\\x20*)*(?<angle><))?((?!\\.)(?>\\.?[a-zA-Z\\d!#$%&'*+\\-/=?^_`{|}~]+)+|\"((?=[\\x01-\\x7f])[^\"\\\\]|\\\\[\\x01-\\x7f])*\")@(((?!-)[a-zA-Z\\d\\-]+(?<!-)\\.)+[a-zA-Z]{2,}|\\[(((?(?<!\\[)\\.)(25[0-5]|2[0-4]\\d|[01]?\\d?\\d)){4}|[a-zA-Z\\d\\-]*[a-zA-Z\\d]:((?=[\\x01-\\x7f])[^\\\\\\[\\]]|\\\\[\\x01-\\x7f])+)\\])(?(<angle>)>)$");
>>
>>        std::cout << expr;
>>
>>        return testRegExMatch(expr, aEmailAddressStr);
>> }
>>
>> ^((?>[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+\x20*|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*"\x20*)*(?<angle><))?((?!\.)(?>\.?[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+)+|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*")@(((?!-)[a-zA-Z\d\-]+(?<!-)\.)+[a-zA-Z]{2,}|\[(((?(?<!\[)\.)(25[0-5]|2[0-4]\d|[01]?\d?\d)){4}|[a-zA-Z\d\-]*[a-zA-Z\d]:((?=[\x01-\x7f])[^\\\[\]]|\\[\x01-\x7f])+)\])(?(<angle>)>)$
>> [Session started at 2009-10-22 22:02:58 +0530.]
>> terminate called after throwing an instance of 'boost::bad_expression'
>>  what():  Invalid preceding regular expression
>>
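
Side note on the "terminate called" output above: the regex constructor
throws when it cannot compile the pattern, and nothing in the posted code
catches that. Below is a minimal sketch of guarding the construction. It
uses std::regex purely as a stand-in so that it is self-contained; with
Boost 1.32 you would presumably catch boost::bad_expression, the type named
in the output above. compilesCleanly is a made-up helper name, not part of
any library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` compiles. On failure it
// stores the library's message in `error` and returns false instead of
// letting the exception escape and terminate the program.
// std::regex is an assumption made for illustration; the thread itself
// uses boost::regex.
bool compilesCleanly(const std::string& pattern, std::string& error)
{
    try {
        std::regex compiled(pattern);
        (void)compiled;  // only the side effect of compiling matters here
        return true;
    } catch (const std::regex_error& e) {
        error = e.what();
        return false;
    }
}
```

Calling this before matching at least turns the abort into an error message
you can log, although it will not make the message itself any clearer.
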
>>
>> Is there anything I am missing?
>>
>> -Arun
>>
>> But still I am getting the output as
>>
>>
>>
>>
>> On Thu, Oct 22, 2009 at 9:35 PM, John Maddock <john_at_[hidden]> wrote:
>>>> I am using the Boost regex library (libboost_regex-1_32.dylib) for
>>>> validation of strings. The regular expression string I use is as
>>>> below.
>>>>
>>>> string expr =
>>>> "^((?>[a-zA-Z\\d!#$%&'*+\\-/=?^_`{|}~]+\\x20*|\"((?=[\\x01-\\x7f])[^\"\\\\]|\\\\[\\x01-\\x7f])*\"\\x20*)*(?<angle><))?((?!\\.)(?>\\.?[a-zA-Z\\d!#$%&'*+\\-/=?^_`{|}~]+)+|\"((?=[\\x01-\\x7f])[^\"\\\\]|\\\\[\\x01-\\x7f])*\")@(((?!-)[a-zA-Z\\d\\-]+(?<!-)\\.)+[a-zA-Z]{2,}|\\[(((?(?<!\\[)\\.)(25[0-5]|2[0-4]\\d|[01]?\\d?\\d)){4}|[a-zA-Z\\d\\-]*[a-zA-Z\\d]:((?=[\\x01-\\x7f])[^\\\\\\[\\]]|\\\\[\\x01-\\x7f])+)\\])(?(angle)>)$";
>>>>
>>>> When I make a call
>>>>
>>>> boost::regex regExpr(expr);
>>>>
>>>> it is throwing  what():  Invalid preceding regular expression.
>>>
>>> Sigh... I *really* need to improve those error messages :-(
>>>
>>> I tried the expression in Perl and got an error as well:
>>>
>>> $in = 'name.surname_at_[hidden]';
>>> $in =~
>>> /^((?>[a-zA-Z\d!#$%&'*+\-\/=?^_`{|}~]+\x20*|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*"\x20*)*(?<angle><))?((?!\.)(?>\.?[a-zA-Z\d!#$%&'*+\-\/=?^_`{|}~]+)+|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*")@(((?!-)[a-zA-Z\d\-]+(?<!-)\.)+[a-zA-Z]{2,}|\[(((?(?<!\[)\.)(25[0-5]|2[0-4]\d|[01]?\d?\d)){4}|[a-zA-Z\d\-]*[a-zA-Z\d]:((?=[\x01-\x7f])[^\\\[\]]|\\[\x01-\x7f])+)\])(?(angle)>)$/;
>>> print "\n";
>>>  print "\$& = $&\n";
>>>  print "\$1 = $1\n";
>>>  print "\$2 = $2\n";
>>>  print "\$3 = $3\n";
>>>  print "\$4 = $4\n";
>>>  print "\$5 = $5\n";
>>>  print "\$6 = $6\n";
>>>  print "\$7 = $7\n";
>>>  print "\$8 = $8\n";
>>>
>>> Prints:
>>>
>>> Unknown switch condition (?(an in regex; marked by <-- HERE in
>>> m/^((?>[a-zA-Z\d!#0&'*+\-/=?^_`{|}~]+\x20*|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*"\x20*)*(?<angle><))?((?!\.)(?>\.?[a-zA-Z\d!#0&'*+\-/=?^_`{|}~]+)+|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*")@(((?!-)[a-zA-Z\d\-]+(?<!-)\.)+[a-zA-Z]{2,}|\[(((?(?<!\[)\.)(25[0-5]|2[0-4]\d|[01]?\d?\d)){4}|[a-zA-Z\d\-]*[a-zA-Z\d]:((?=[\x01-\x7f])[^\\\[\]]|\\[\x01-\x7f])+)\])(?(
>>> <-- HERE angle)>)$/ at test.pl line 3.
>>>
>>> So Perl and Boost.Regex are both rejecting the "(?(angle)>)" part, and
>>> looking at http://perldoc.perl.org/perlre.html I believe it should be
>>> rejected.  It appears this is a .NET-specific construct :-(
>>>
>>> Changing to "(?(<angle>)>)" seems to make everything work though, in code:
>>>
>>>  boost::regex test (
>>> "^((?>[a-zA-Z\\d!#$%&'*+\\-/=?^_`{|}~]+\\x20*|\"((?=[\\x01-\\x7f])[^\"\\\\]|\\\\[\\x01-\\x7f])*\"\\x20*)*(?<angle><))?((?!\\.)(?>\\.?[a-zA-Z\\d!#$%&'*+\\-/=?^_`{|}~]+)+|\"((?=[\\x01-\\x7f])[^\"\\\\]|\\\\[\\x01-\\x7f])*\")@(((?!-)[a-zA-Z\\d\\-]+(?<!-)\\.)+[a-zA-Z]{2,}|\\[(((?(?<!\\[)\\.)(25[0-5]|2[0-4]\\d|[01]?\\d?\\d)){4}|[a-zA-Z\\d\\-]*[a-zA-Z\\d]:((?=[\\x01-\\x7f])[^\\\\\\[\\]]|\\\\[\\x01-\\x7f])+)\\])(?(<angle>)>)$"
>>> );

I thought one of the correct regexes for email (going strictly by the
standard) is at http://ex-parrot.com/~pdw/Mail-RFC822-Address.html, and
that it supports everything except email comments (which would blow it up
even more than it already is)?

And yes, I tried your regex; not even JavaScript's parser was able to
parse it. What kind of .NET-specific crap is that, and what is it
supposed to do?
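
My best guess, going by how .NET documents these constructs: (?<angle><)
is a named group that captures an opening '<', and (?(angle)>) is a
conditional that requires a closing '>' only if that group took part in the
match. In other words, the brackets around the address have to be balanced.
If that reading is right, the same effect needs neither named groups nor
conditionals, just a plain alternation. A rough sketch, assuming a
deliberately simplified address sub-pattern (NOT the full expression from
this thread) and using std::regex so it is self-contained;
hasBalancedAngles is a made-up name:

```cpp
#include <regex>
#include <string>

// Accepts "<addr>" or "addr", but never an address with only one of the
// two angle brackets. The address part is a simplified placeholder chosen
// for illustration; it is not the pattern posted in this thread.
bool hasBalancedAngles(const std::string& s)
{
    const std::string addr = "[^<>@ ]+@[^<>@ ]+";

    // Spell out both cases instead of using a conditional on a named group.
    const std::regex re("<" + addr + ">|" + addr);

    // regex_match succeeds only if the whole string matches.
    return std::regex_match(s, re);
}
```

The cost is that the address sub-pattern appears twice, which for the
monster above would make it longer still, but it should be portable across
regex engines.
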