From: Pavol Droba (droba_at_[hidden])
Date: 2008-04-03 02:27:30
Hi,
If you don't want a container to store the results, you can use
split_iterator directly. The split algorithm is only a wrapper around
split_iterator.
http://www.boost.org/doc/libs/1_35_0/doc/html/boost/algorithm/split_iterator.html
http://www.boost.org/doc/libs/1_35_0/doc/html/string_algo/usage.html#id1290714
Regards,
Pavol.
Florin Trofin wrote:
> Turns out that the char_separator shamelessly constructs std::strings 
> under the cover so I gained something but not as much as I hoped. The 
> split algorithm you mention requires a container to store the results so 
> you still have to do one allocation, correct?
> 
> Frustrating! In theory one should be able to parse a sequence of tokens 
> without constructing or copying any strings.
> 
> Florin.
> 
> On Wed, Mar 26, 2008 at 12:54 AM, Pavol Droba <droba_at_[hidden]> wrote:
> 
>     Hi,
> 
>     Why don't you just use the split algorithm in the StringAlgo library?
> 
>     http://www.boost.org/doc/html/string_algo/usage.html#id1638440
> 
> 
>     Regards,
>     Pavol.
> 
>     Florin Trofin wrote:
>      > Hi,
>      >
>      >
>      > I've been using the boost tokenizer successfully in the past and I've
>      > been quite happy with it. I was using it with std::string as my token
>      > type, but now I need to use it differently because of performance
>      > reasons (the input string is a raw UTF8 buffer (const unsigned char*)
>      > and output is a specific UTF16 string class). So I thought: maybe I can
>      > just tokenize the unsigned char buffer in place using
>      > boost::iterator_range<const unsigned char*> as my token type.
>      >
>      > And it almost worked! With a hack:
>      >
>      > the tokenizer attempts to call assign on my TokenType, but
>      > boost::iterator_range doesn't have such a member function. I created a
>      > wrapper class that simply delegates to the iterator_range's assignment
>      > operator and it now works!
>      >
>      > This is great because I have no more useless string constructions: I can
>      > go directly from a raw UTF8 buffer to my output string type (UTF16
>      > based) with only one conversion and no extra allocations! I still have
>      > the nice syntax of boost tokenizer and the maximum efficiency!
>      >
>      > I think this solution should be mentioned in the tutorial docs because
>      > it might not be obvious for everybody. Also, maybe we can eliminate the
>      > hack I did by adding an assign() to the boost range interface (this
>      > seems simpler to me than modifying the tokenizer to not call assign).
>      >
>      > Thanks for the great work you guys put into this library!
>      >
>      >
>      > Best regards,
>      >
>      >
>      > Florin.
>      >
>      >
>      >
>     ------------------------------------------------------------------------
>      >
>      > _______________________________________________
>      > Boost-users mailing list
>      > Boost-users_at_[hidden]
>      > http://listarchives.boost.org/mailman/listinfo.cgi/boost-users