Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-08-14 10:12:44
Soares Chen Ruo Fei wrote:
>> with non-Unicode CJK encodings
>> like Shift-JIS or GBK there is no
>> way to go backward
> Ahh I see so that's quite nasty, but actually it still can be done
> with the sacrifice on efficiency. Basically since the iterator already
> has the begin and end boundary iterators it can simply reiterate all
> over from the beginning of the string. Although doing so is roughly
> O(N^2) it shouldn't make significant impact as developers rarely use
> this multi-byte encoding and even seldom use the reverse decoding
> function.
As a general point, I believe it's a bad idea to hide a surprise like 
O(N^2) instead of O(N) complexity in a "rare" case.  Doing so means 
that users will implement something that seems to work, and then get 
bitten later when it doesn't work in the field.  (For example, the 
first time that a customer in Japan tries to process a 1 MB file and it 
takes a million times longer than expected.)
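To put rough numbers on that, here is a small cost model. It is only a sketch: it assumes every backward step restarts decoding at offset 0, and that each character is one byte (the worst case for the number of steps). The names rescan_cost and direct_cost are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical cost model, not code from Boost.Ustr.
// Counts the bytes examined while walking a string of n single-byte
// characters from the end back to the beginning.

// Backward step implemented by restarting at offset 0: to find the
// character that ends at `pos`, the decoder examines `pos` bytes.
inline std::uint64_t rescan_cost(std::uint64_t n) {
    std::uint64_t visits = 0;
    for (std::uint64_t pos = n; pos > 0; --pos)
        visits += pos;
    return visits;  // n + (n-1) + ... + 1 = n(n+1)/2
}

// Backward step that is O(1) per character (e.g. UTF-8).
inline std::uint64_t direct_cost(std::uint64_t n) {
    return n;
}
```

Under these assumptions a 1,000,000-byte input costs 500,000,500,000 byte visits instead of 1,000,000, so the slowdown grows in proportion to the input size.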
It would be better to not provide the inefficient case at all.  Compare 
with how std::list doesn't provide random access, even though it could 
do so in O(N).  Looking at your character set iterator, it seems to me 
that you could have a forward-only iterator and a bidirectional 
iterator for UTF, but only the former for these other encodings.  Not 
storing the begin iterator when only forward iteration is needed also 
saves space.
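A minimal sketch of that split, assuming the iterator category is chosen per encoding through a trait. All names here (utf8_encoding, shift_jis_encoding, decode_iterator_category, utf8_prev, utf8_next) are hypothetical and are not taken from Boost.Ustr. UTF-8 can step backward in O(1) per character because continuation bytes are self-identifying (10xxxxxx); the default for any other encoding is forward-only.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>

// Hypothetical encoding tags; not part of Boost.Ustr.
struct utf8_encoding {};
struct shift_jis_encoding {};

// Strongest iterator category an encoding supports with O(1) steps.
// Forward-only is the safe default.
template <class Encoding>
struct decode_iterator_category {
    using type = std::forward_iterator_tag;
};

// UTF-8 is self-synchronizing, so it can also go backward.
template <>
struct decode_iterator_category<utf8_encoding> {
    using type = std::bidirectional_iterator_tag;
};

// Step back one code point: skip continuation bytes (10xxxxxx).
// Precondition: 0 < pos <= s.size() and s holds valid UTF-8.
inline std::size_t utf8_prev(const std::string& s, std::size_t pos) {
    assert(pos > 0 && pos <= s.size());
    do {
        --pos;
    } while (pos > 0 &&
             (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80);
    return pos;
}

// Step forward one code point using the length encoded in the lead byte.
// Precondition: pos < s.size() and s[pos] is a valid lead byte.
inline std::size_t utf8_next(const std::string& s, std::size_t pos) {
    assert(pos < s.size());
    const unsigned char lead = static_cast<unsigned char>(s[pos]);
    const std::size_t len =
        lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    return pos + len;
}

static_assert(
    std::is_same<decode_iterator_category<utf8_encoding>::type,
                 std::bidirectional_iterator_tag>::value,
    "UTF-8 supports backward iteration");
static_assert(
    std::is_same<decode_iterator_category<shift_jis_encoding>::type,
                 std::forward_iterator_tag>::value,
    "Shift-JIS stays forward-only");
```

With a trait like this, code that needs operator-- on a Shift-JIS or GBK string fails to compile, rather than silently running in quadratic time. A forward-only iterator would also need only the current and end positions, which is where the space saving comes from.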
Regards,  Phil.