From: Gennadiy E. Rozental (rogeeff_at_[hidden])
Date: 2001-08-09 21:00:47
Hi!
I have a proposal for making the tokenizer library a little more
flexible. Here is the implementation of
char_delimiters_separator::operator() that I propose:
template<class InputIterator,class Token>
bool operator()(InputIterator& next, InputIterator end, Token& tok){
    tok = Token();
    // skip past all nonreturnable delims
    // skip past the returnable only if we are not returning delims
    for(;next!=end && ( is_nonret(*next) || (is_ret(*next)
        && !return_delims_ ) );++next){}
    if(next == end){
        return false;
    }
    // if we are to return delims and we are on a returnable one
    // move past it and stop
    if(is_ret(*next) && return_delims_){
        tok.assign( next, 1 ); // changed: replaces operator+=
        ++next;
    }
    else {
        InputIterator curr = next;
        // find the end of the run of non-delim characters
        while( next!=end && !is_nonret(*next) && !is_ret(*next) ) {
            ++next;
        }
        tok.assign(curr,next); // changed: replaces operator+=
    }
    return true;
}
Difference:
Instead of operator+=(Char), the token type must now provide two methods:
assign( Iterator begin, length ) and
assign( Iterator begin, Iterator end )
(I realize that the same logic can be implemented using only the
second function.)
std::string will work with the new implementation.
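As a rough illustration of the point about needing only the second function, the single-delimiter case could be written with the range form of assign alone. The helper name assign_one is hypothetical and not part of the tokenizer library, and the sketch assumes at least a forward iterator, because the saved copy of the iterator must stay valid after the increment.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the tokenizer library): store the single
// character at `next` in `tok` using only the range form of assign, then
// advance `next` past it. Assumes at least a forward iterator, since `first`
// must remain valid after `next` is incremented.
template<class Iterator, class Token>
void assign_one(Iterator& next, Iterator end, Token& tok) {
    assert(next != end);      // caller must not be at the end of the input
    Iterator first = next;    // remember where the delimiter starts
    ++next;                   // step past the delimiter
    tok.assign(first, next);  // one-character range [first, next)
}
```

With this form std::string needs no special handling, since it already provides assign(first, last) for iterator ranges.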
Advantage:
Now I am not obliged to use a "String" class that allocates memory
(which is the case with append-like token creation logic). I can use
a class that merely points into existing memory (I have such a
const_string class and have found it very useful and efficient for
parsing work). In most cases you read a line, tokenize it, and never
change the tokens.
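To make the idea concrete, here is a minimal sketch of what a pointer-based token type might look like. It is not the const_string class mentioned above: the name token_view and its members are hypothetical, and the sketch assumes the input being tokenized is a contiguous char buffer, so that iterators are plain const char* pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical non-owning token: it only records where the characters live
// and never allocates. The buffer being tokenized must outlive the token.
class token_view {
public:
    token_view() : begin_(0), length_(0) {}

    // assign( begin, length ): refer to `length` characters starting at `begin`
    void assign(const char* begin, std::size_t length) {
        begin_ = begin;
        length_ = length;
    }

    // assign( begin, end ): refer to the half-open range [begin, end)
    void assign(const char* begin, const char* end) {
        assert(begin <= end);  // both must point into the same buffer
        begin_ = begin;
        length_ = static_cast<std::size_t>(end - begin);
    }

    const char* data() const { return begin_; }
    std::size_t size() const { return length_; }

    // Make an owning copy only when the caller really needs one.
    std::string str() const {
        return length_ ? std::string(begin_, length_) : std::string();
    }

private:
    const char* begin_;
    std::size_t length_;
};
```

Both assign overloads run in constant time and copy no characters, which is the saving over building a token with operator+=. The trade-off is lifetime: a token_view is valid only as long as the line it points into.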
What do you think?
Gennadiy.