From: Robert Zeh (razeh_at_[hidden])
Date: 2003-12-05 09:24:02


John Torjo <john.lists_at_[hidden]> writes:

> Dear boosters,
>
> While trying to implement slice range (in rtl - range template
> library), I came across the token_iterator class.
> While examining it, I found the TokenizerFunction concept too
> complicated: it basically unites two concepts.
>
>
> The way I see it, implementing a tokenizer involves two concepts:
> 1. finding where each token begins and ends (this can be implemented
> incredibly simply, see below)
>
> 2. parsing the token, and returning the result.
>
>
> By keeping the above separated, we get simpler code and more reusability.
>
> A simple example: you want to parse each word in a file.
> As results, you might want the words themselves, (who knows?) only
> the first 10 letters of each word, the first letter of each word, or
> the word length.
> Keep the two concepts separated, and the implementation is a breeze
> (and efficient as well).
>
> Here's a possible implementation of parsing words:
> // are 'first' and 'second' part of the same word,
> // i.e. does no new word begin right after 'first'?
> bool are_from_same_word( char first, char second) {
>     if ( !isspace(second)) return true;
>     return isspace(first) != 0;
> }
>
> // trim leading and trailing whitespace from [begin, end)
> void ignore_space(const char *& begin, const char *&end) {
>     while ( begin != end && isspace(*begin)) ++begin;
>     while ( begin != end && isspace(end[-1])) --end;
> }
>
> // parser #1: the word itself
> std::string parse_word( const char * begin, const char *end) {
>     ignore_space(begin,end);
>     return std::string( begin, end);
> }
>
> // parser #2: just the word's length
> int parse_word_len( const char * begin, const char *end) {
>     ignore_space(begin,end);
>     return end - begin;
> }
>
> ... etc.
>
>
> The above is a very generic solution that does not apply only to strings.
> (Also, I was thinking of a better name: slice - something that slices a
> range into multiple ranges and computes something for each such range.
> The result is another range.)
>
> I will do some coding over the next few days and post the results.
>
> Best,
> John
>
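
For concreteness, here is a rough, hypothetical sketch of how the two
concepts might compose. The for_each_slice driver and print_word below
are illustrative additions (they are not part of John's post); his
helpers are repeated only so the example compiles on its own.

#include <cctype>
#include <iostream>
#include <string>

bool are_from_same_word(char first, char second) {
    if (!std::isspace(second)) return true;
    return std::isspace(first) != 0;
}

void ignore_space(const char*& begin, const char*& end) {
    while (begin != end && std::isspace(*begin)) ++begin;
    while (begin != end && std::isspace(end[-1])) --end;
}

std::string parse_word(const char* begin, const char* end) {
    ignore_space(begin, end);
    return std::string(begin, end);
}

// one possible "parser": print the word in brackets
void print_word(const char* begin, const char* end) {
    std::cout << '[' << parse_word(begin, end) << "]\n";
}

// hypothetical driver: slice [begin, end) into maximal runs of characters
// that are_from_same_word() considers part of the same word, and hand
// each slice to the parser
template <class Parser>
void for_each_slice(const char* begin, const char* end, Parser parse) {
    while (begin != end) {
        const char* token_end = begin + 1;
        while (token_end != end &&
               are_from_same_word(token_end[-1], *token_end))
            ++token_end;
        parse(begin, token_end);
        begin = token_end;
    }
}

int main() {
    const char text[] = "  the quick  brown fox";
    for_each_slice(text, text + sizeof(text) - 1, print_word);
    // prints [the] [quick] [brown] [fox], one per line
}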

One of the nice features of the current TokenizerFunction concept is
that it is a single-pass algorithm and will work with input
iterators. I'm not sure how to keep the algorithm single-pass if you
split the token delimitation from the token creation.
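
To make the single-pass point concrete, here is a hypothetical functor
following the shape of the TokenizerFunction concept (bool
operator()(next, end, tok) plus reset()); skip_space_tokenizer is an
illustrative sketch, not Boost code. It appends characters to the token
while advancing the iterator exactly once, so even an istreambuf_iterator
over a stream is enough. The split design, as sketched, hands parse_word()
the token's [begin, end) range, which a pure input iterator cannot be
walked over a second time to provide without buffering.

#include <cctype>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

struct skip_space_tokenizer {
    void reset() {}   // part of the TokenizerFunction concept's shape

    template <class InputIterator, class Token>
    bool operator()(InputIterator& next, InputIterator end, Token& tok) {
        tok = Token();
        while (next != end && std::isspace(*next)) ++next;  // skip delimiters
        if (next == end) return false;                       // no token left
        for (; next != end && !std::isspace(*next); ++next)
            tok += *next;                                    // build token as we scan
        return true;
    }
};

int main() {
    std::istringstream in("  the quick  brown fox ");
    std::istreambuf_iterator<char> next(in), end;            // input iterators
    skip_space_tokenizer tokenize;
    std::string tok;
    while (tokenize(next, end, tok))
        std::cout << '[' << tok << "]\n";                    // [the] [quick] ...
}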

Robert