From: Robert Zeh (razeh_at_[hidden])
Date: 2003-12-05 09:24:02
John Torjo <john.lists_at_[hidden]> writes:
> Dear boosters,
> 
> While trying to implement slice range (in rtl - range template
> library), I came across the token_iterator class.
> While examining it, I found the TokenizerFunction concept too
> complicated, basically uniting two concepts.
> 
> 
> The way I see it, implementing a tokenizer involves two concepts:
> 1. finding where each token begins and ends (this can be implemented
> incredibly simply, see below)
> 
> 2. parsing the token, and returning the result.
> 
> 
> By keeping the above separated, we get simpler code and more reusability.
> 
> A simple example could be: you want to parse each word in a file.
> As results, you might want the words themselves, (who knows?) only
> the first 10 letters of each word, the first letter of each word, or
> the word length.
> With the two concepts kept separate, the implementation is a breeze
> (and efficient as well).
> 
> Here's a possible implementation of parsing words:
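> #include <cctype>
> #include <string>
> 
> // Returns true if 'second' belongs to the same token as 'first'.
> // A token is any leading whitespace plus the word that follows it,
> // so a new token begins at the first space after a word.
> bool are_from_same_word( char first, char second) {
>     if ( !std::isspace( static_cast<unsigned char>(second))) return true;
>     return std::isspace( static_cast<unsigned char>(first)) != 0;
> }
> 
> // Shrinks [begin, end) so it has no leading or trailing whitespace.
> void ignore_space( const char *& begin, const char *& end) {
>     while ( begin != end && std::isspace( static_cast<unsigned char>(*begin)))
>         ++begin;
>     while ( begin != end && std::isspace( static_cast<unsigned char>(end[-1])))
>         --end;
> }
> 
> // Result: the word itself.
> std::string parse_word( const char * begin, const char * end) {
>     ignore_space( begin, end);
>     return std::string( begin, end);
> }
> 
> // Result: the length of the word.
> int parse_word_len( const char * begin, const char * end) {
>     ignore_space( begin, end);
>     return static_cast<int>( end - begin);
> }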
> 
> ... etc.
> 
> 
> The above is a very generic solution that is not limited to strings.
> (Also, I was thinking of a better name: slice, which slices a range
> into multiple ranges and computes something for each such range. The
> result is another range.)
> 
> I will do some coding over the next few days and post the results.
> 
> Best,
> John
> 
One of the nice features of the current TokenizerFunction concept is
that it is a single-pass algorithm, and so it works with input
iterators.  I'm not sure how to keep the algorithm single-pass if you
split the token delimitation from the token creation.
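
To make the concern concrete: with input iterators, the delimiting step
cannot hand back a [begin, end) pair for the parsing step to re-read,
because the characters are gone once they have been consumed. One
possible workaround, assuming an extra copy per token is acceptable, is
to buffer each token while delimiting and pass the buffer to the parser.
The sketch below is only an illustration. slice_single_pass and
words_from_string are hypothetical names, not part of Boost.Tokenizer or
rtl, and parse_word is adapted here to take the buffered token instead
of a pointer pair. are_from_same_word follows the quoted example.

```cpp
#include <cctype>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Boundary test from the quoted example: true if 'second' continues the
// token that 'first' belongs to (leading whitespace stays with the word).
bool are_from_same_word(char first, char second) {
    if (!std::isspace(static_cast<unsigned char>(second))) return true;
    return std::isspace(static_cast<unsigned char>(first)) != 0;
}

// Token creation, adapted (assumption) to work on the buffered copy:
// returns the token without leading or trailing whitespace.
std::string parse_word(const std::string& token) {
    std::size_t b = 0, e = token.size();
    while (b != e && std::isspace(static_cast<unsigned char>(token[b]))) ++b;
    while (b != e && std::isspace(static_cast<unsigned char>(token[e - 1]))) --e;
    return token.substr(b, e - b);
}

// Hypothetical helper: reads [first, last) exactly once, so an input
// iterator is enough. Each token is copied into 'buffer' while it is
// being delimited, then handed to 'parse'.
template <class InputIt, class SameToken, class Parse, class OutputIt>
OutputIt slice_single_pass(InputIt first, InputIt last,
                           SameToken same_token, Parse parse, OutputIt out) {
    if (first == last) return out;
    std::string buffer;
    char prev = *first;
    buffer.push_back(prev);
    for (++first; first != last; ++first) {
        char cur = *first;
        if (!same_token(prev, cur)) {  // token boundary between prev and cur
            *out++ = parse(buffer);
            buffer.clear();
        }
        buffer.push_back(cur);
        prev = cur;
    }
    *out++ = parse(buffer);            // last token
    return out;
}

// Usage: istreambuf_iterator is a true input iterator (single pass).
std::vector<std::string> words_from_string(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    slice_single_pass(std::istreambuf_iterator<char>(in),
                      std::istreambuf_iterator<char>(),
                      are_from_same_word, parse_word,
                      std::back_inserter(words));
    return words;
}
```

This keeps the two concepts separate, but at the price of copying every
token, which the current TokenizerFunction design does not force on the
user. Note also that trailing whitespace in the input forms a final
token of its own, which parse_word turns into an empty string.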
Robert