From: George A. Heintzelman (georgeh_at_[hidden])
Date: 2001-03-20 13:50:37
> >What I don't get is the reason why it [regex_split] does it this way, eating the 
> >input string. 
[snip]
> It's done that way, because that's the way that perl does it.  You can also
> specify an upper limit to the number of items to be split, in which case
> some text may be left in the string.  BTW the return value is the number of
> *items* split out, not the number of characters removed, so there is no way
> of knowing how much input has been processed unless you erase processed
> text from the input.
That perl does it that way is IMHO a good reason to supply a function 
which does it perl's way, but not a good reason not to also have one 
which does it the other way, especially when you can be more efficient 
in a fairly large set of circumstances. There are two points here. The 
first is a const correctness issue, something which perl doesn't even 
have the concept of. The second is a potential efficiency cost; in perl 
I can well believe that perl-style is faster than the non-destructive 
version, but that is certainly not true with our current 
implementation. I think a C++ library should be easily usable in 
const-correct C++ style, and hew as close as reasonably possible to the 
standard library principle of not paying for a feature you're not 
using. The current interface of regex_split violates both of these 
ideas.
Am I alone in this opinion? If so I'll shut up and go away. :)
> BTW it's not that hard to roll your own function that does what you want -
> really all you need is a custom functor to pass to regex_grep.
Of course it's not hard. But to do it right, one winds up duplicating 
98% of the code in regex_split. For the library writer, on the other 
hand, it 
is easy to do it instead with a helper function which returns a pair of 
(items, characters) in the library, and have that used by both eating 
and non-eating functions.
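
To make the helper idea concrete, here is a rough sketch of what I have in mind. It is only an illustration under assumptions: it uses std::regex as a stand-in for the library's regex type, it treats the expression as a separator pattern, and the names split_helper, regex_tokenize and regex_split_eating are hypothetical, not anything in the library. The helper reports both the item count and the characters consumed, so the eating version only adds an erase and the non-eating version can take a const string.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <utility>

// Hypothetical shared helper (not a library function). Splits [first, last)
// on matches of `sep`, writing at most `max_items` tokens to `out`.
// Returns (items written, characters consumed from the front of the range).
template <class OutIt>
std::pair<std::size_t, std::size_t>
split_helper(OutIt out,
             std::string::const_iterator first,
             std::string::const_iterator last,
             const std::regex& sep,
             std::size_t max_items)
{
    std::size_t items = 0;
    std::string::const_iterator pos = first;   // start of the current token
    std::sregex_iterator it(first, last, sep), end;
    for (; it != end && items < max_items; ++it) {
        const std::smatch& m = *it;
        *out++ = std::string(pos, m[0].first); // text before this separator
        ++items;
        pos = m[0].second;                     // resume after the separator
    }
    // No separators left: the remaining text is the final token.
    if (it == end && items < max_items && pos != last) {
        *out++ = std::string(pos, last);
        ++items;
        pos = last;
    }
    return {items, static_cast<std::size_t>(pos - first)};
}

// Hypothetical non-eating version: the input is const and left untouched.
template <class OutIt>
std::size_t regex_tokenize(OutIt out, const std::string& s,
                           const std::regex& sep,
                           std::size_t max_items = static_cast<std::size_t>(-1))
{
    return split_helper(out, s.begin(), s.end(), sep, max_items).first;
}

// Hypothetical eating version: same helper, then erase the processed text.
template <class OutIt>
std::size_t regex_split_eating(OutIt out, std::string& s,
                               const std::regex& sep,
                               std::size_t max_items = static_cast<std::size_t>(-1))
{
    const std::pair<std::size_t, std::size_t> r =
        split_helper(out, s.cbegin(), s.cend(), sep, max_items);
    s.erase(0, r.second);
    return r.first;
}
```

With a separator like ",\\s*" and a limit of two items, the eating version would write the first two tokens and leave the rest in the string, while regex_tokenize on the same text leaves the input alone and only pays for the copies of the tokens themselves.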
I'm a little stuck on a good distinguishable name for a non-eating 
version, though. regex_tokenize might be okay, I guess, but it doesn't 
make the difference clear in the name. OTOH, neither does regex_split, 
so maybe we could ignore the issue.
George Heintzelman
georgeh_at_[hidden]