Subject: Re: [boost] [Tokenizer]Usage and documentation
From: Max (more4less_at_[hidden])
Date: 2011-02-16 08:01:19
[Yechezkel Mett]
>
> ,\s*(),
>
> means find a ',' followed by any number of spaces followed by a ','
> and capture an empty string.
Yes, now I see. Thank you, Yechezkel.
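For the record, here is a tiny sketch of that sub-pattern on its own. It uses Python's re module rather than Boost.Regex (my assumption, for brevity; this particular syntax should behave the same in both), and the sample string is made up for illustration.

```python
import re

# ',' then any amount of whitespace, an empty capture group, then ','
m = re.search(r',\s*(),', 'a,  ,b')

matched_text = m.group(0)  # everything the pattern consumed
captured = m.group(1)      # what the empty group () captured

print(repr(matched_text), repr(captured))  # ',  ,' ''
```

The whole match spans both commas, but the captured token is the empty string.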
>
> The others are similar.
>
> >
> > r: "([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$
> >
> > empty,,,fields, , , like this
> > [empty][][fields][][like][this]
> > ,,,
> > [][]
> >
> > There are 2 empty tokens in between each 3 contiguous ',' but only one
> > for each is detected.
>
> Yes, that's a mistake. When matching ,, as an empty field the second
> ',' is eaten and can no longer be used as the beginning of the next
> field.
>
> "([^"]*)"|([^\s,"]+)|,\s*()(?=,)|^\s*()(?=,)|,\s*()$
>
> should work. (?=) is a lookahead, it checks that the pattern (',' in
> this case) matches at this point, but doesn't eat any input.
>
Yes, its behavior is exactly as you expected.
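For anyone following along, a minimal sketch comparing the two patterns side by side. It uses Python's re module instead of Boost.Regex (an assumption on my part; as far as I know the syntax used here is common to both), and tokenize is a hypothetical helper written for this example, not part of any library.

```python
import re

# The two patterns from this thread, as raw strings so that the
# backslashes reach the regex engine unchanged.
CONSUMING = r'"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$'
LOOKAHEAD = r'"([^"]*)"|([^\s,"]+)|,\s*()(?=,)|^\s*()(?=,)|,\s*()$'

def tokenize(pattern, text):
    """Hypothetical helper: one token per match, taken from whichever
    capture group took part in that match."""
    tokens = []
    for m in re.finditer(pattern, text):
        # Groups that did not participate are None; an empty field is ''.
        tokens.append(next(g for g in m.groups() if g is not None))
    return tokens

line = 'empty,,,fields, , , like this'
consuming_tokens = tokenize(CONSUMING, line)
lookahead_tokens = tokenize(LOOKAHEAD, line)

print(consuming_tokens)  # ['empty', '', 'fields', '', 'like', 'this']
print(lookahead_tokens)  # ['empty', '', '', 'fields', '', '', 'like', 'this']
```

With the consuming pattern the second ',' of each pair is eaten, so the next empty field has no ',' left to start from. With the lookahead the ',' is only checked, so it is still available to begin the next match.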
> >
> > Likewise, for (2), I get:
> >
> > r: "([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)
> >
> > empty,,,fields, , , like this
> > [empty][fields][like][this]
> >
> > This time, the behavior is no different than the 'original' version.
>
> I get the same results as the first version. Perhaps it wasn't escaped
> properly?
Yes, you are right. My different result came from my own incorrect
escaping, which was unintentional.
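As a quick check, here is a sketch of pattern (2) next to the first pattern. It is again in Python's re module rather than Boost.Regex (my assumption, for brevity), and tokenize is a hypothetical helper. The patterns are raw strings, so no extra escaping is needed; in an ordinary C++ string literal each backslash would have to be doubled, e.g. "\\s".

```python
import re

FIRST = r'"([^"]*)"|([^\s,"]+)|,\s*(),|^\s*(),|,\s*()$'
SECOND = r'"([^"]*)"|([^\s,"]+)|(?:^|,)\s*()(?:$|,)'

def tokenize(pattern, text):
    """Hypothetical helper: one token per match, taken from whichever
    capture group took part in that match."""
    return [next(g for g in m.groups() if g is not None)
            for m in re.finditer(pattern, text)]

line = 'empty,,,fields, , , like this'
first_tokens = tokenize(FIRST, line)
second_tokens = tokenize(SECOND, line)

print(first_tokens)   # ['empty', '', 'fields', '', 'like', 'this']
print(second_tokens)  # ['empty', '', 'fields', '', 'like', 'this']
```

With the backslashes intact, pattern (2) gives the same tokens as the first version.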
B/Rgds
Max
P.S.
I have found some 'complete' reference books on RE. However, it is this
thread of discussion that has really triggered a leap in my understanding
of RE.
I have also revisited Spirit.Qi, though not very deeply, following
Michael's suggestion. (Qi and its siblings are power tools that I believe
I will definitely use.)
Now I am able to comprehend quite 'complex' expressions, including those
that appeared in this thread.
Thank you Michael, Yechezkel, and Stephan for your kind help!