From: Gennadiy E. Rozental (rogeeff_at_[hidden])
Date: 2001-08-09 21:00:47


Hi!

I have a proposal for making the tokenizer library a little more
flexible. Here is the implementation of
char_delimiters_separator::operator() that I propose:

template<class InputIterator,class Token>
bool operator()(InputIterator& next, InputIterator end, Token& tok){
  tok = Token();
         
  // skip past all nonreturnable delims
  // skip past the returnable only if we are not returning delims
  for(;next!=end && ( is_nonret(*next) || (is_ret(*next)
      && !return_delims_ ) );++next){}
         
  if(next == end){
     return false;
  }
         
  // if we are to return delims and we are on a returnable one
  // move past it and stop
  if(is_ret(*next) && return_delims_){
     tok.assign( next, 1 ); //!!!!!!!!!!!!!!!!!!!!!!!!
     ++next;
  }
  else {
    InputIterator curr = next;
    // append all the non delim characters
    while( next!=end && !is_nonret(*next) && !is_ret(*next) ) {
       ++next;
    }
    
    token.assign(curr,next); //!!!!!!!!!!!!!!!!!!!!!!
  }
  
  return true;
}

Difference:
Instead of operator+=(Char), the Token type now has to provide these
methods:
       assign( Iterator begin, length ) and
       assign( Iterator begin, Iterator end )
(I realize that the same logic can be implemented using only the
second function.)
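
As a rough sketch of that second option, the single-delimiter case can
remember where the token starts, step over one character, and then call
the two-iterator assign just like the other branch does. The names
simple_separator and split below are hypothetical stand-ins for
illustration, not part of the tokenizer library, and whitespace is
assumed to be the non-returnable delimiter set:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for char_delimiters_separator: whitespace is dropped,
// and the characters in `returnable` are delimiters that may be returned.
struct simple_separator {
    std::string returnable;
    bool return_delims;

    bool is_ret(char c) const { return returnable.find(c) != std::string::npos; }
    bool is_nonret(char c) const { return c == ' ' || c == '\t' || c == '\n'; }

    template <class InputIterator, class Token>
    bool operator()(InputIterator& next, InputIterator end, Token& tok) const {
        tok = Token();

        // Skip dropped delimiters, and returnable ones when not returning them.
        while (next != end &&
               (is_nonret(*next) || (is_ret(*next) && !return_delims))) {
            ++next;
        }
        if (next == end) return false;

        InputIterator start = next;
        if (is_ret(*next) && return_delims) {
            ++next;  // the token is exactly one delimiter character
        } else {
            while (next != end && !is_nonret(*next) && !is_ret(*next)) ++next;
        }
        // Token only has to provide assign(begin, end).
        tok.assign(start, next);
        return true;
    }
};

// Collects every token of `text` into a vector of std::string.
inline std::vector<std::string> split(const std::string& text,
                                      const simple_separator& sep) {
    std::vector<std::string> out;
    std::string::const_iterator it = text.begin();
    std::string tok;
    while (sep(it, text.end(), tok)) out.push_back(tok);
    assert(it == text.end());  // the separator only fails at end of input
    return out;
}
```

With this shape the requirement on Token shrinks to a default
constructor and assign( begin, end ), which std::string satisfies for
any iterator type.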

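  std::string will work with the new implementation (it already
provides assign( begin, end ), and assign( const char*, length ) when
the iterator is a plain pointer).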

Advantage:
  I am no longer obliged to use a "String" class that allocates memory
(which is unavoidable with append-style token construction). I can use
a class that merely points into the existing memory (I have such a
const_string class and find it very useful and efficient for
parsing-type work). In most cases you read a line, tokenize it, and
never change the tokens.
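
To make the idea concrete, here is a minimal sketch of what such a
pointer-based token class might look like. It is not my actual
const_string; the name const_string_ref and its members are made up for
illustration, and it assumes the iterator type is const char*:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical non-owning token: it records where the token lives in the
// source buffer and never copies or allocates on assign.
class const_string_ref {
public:
    const_string_ref() : begin_(nullptr), length_(0) {}

    // assign(begin, length), as used for a single returned delimiter.
    void assign(const char* begin, std::size_t length) {
        begin_ = begin;
        length_ = length;
    }

    // assign(begin, end), as used for a run of non-delimiter characters.
    void assign(const char* begin, const char* end) {
        assert(begin <= end);
        begin_ = begin;
        length_ = static_cast<std::size_t>(end - begin);
    }

    const char* data() const { return begin_; }
    std::size_t size() const { return length_; }

    // Copies only when the caller explicitly asks for an owning string.
    std::string str() const {
        return length_ ? std::string(begin_, length_) : std::string();
    }

private:
    const char* begin_;
    std::size_t length_;
};
```

The obvious restriction is that the buffer being tokenized has to
outlive every token that points into it, which is exactly the
read-a-line-then-tokenize situation described above.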

What do you think?

Gennadiy.