Boost mailing page: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter


Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-08-14 10:12:44

In reply to: Soares Chen Ruo Fei: "Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter"

Soares Chen Ruo Fei wrote:

>> with non-Unicode CJK encodings
>> like Shift-JIS or GBK there is no
>> way to go backward

> Ahh I see so that's quite nasty, but actually it still can be done
> with the sacrifice on efficiency. Basically since the iterator already
> has the begin and end boundary iterators it can simply reiterate all
> over from the beginning of the string. Although doing so is roughly
> O(N^2) it shouldn't make significant impact as developers rarely use
> this multi-byte encoding and even seldom use the reverse decoding
> function.

As a general point, I believe it's a bad idea to hide a surprise like
O(N^2) instead of O(N) complexity in a "rare" case. Doing so means
that users will implement something that seems to work, and then get
bitten later when it doesn't work in the field. (For example, the
first time that a customer in Japan tries to process a 1 MB file and it
takes a million times longer than expected.)
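To make the cost concrete, here is a rough sketch (not Boost.Ustr code) that counts how many characters get re-decoded when every backward step rescans from the start of the buffer. The names is_lead_byte, char_len, prev_by_rescan and backward_walk_cost are made up for illustration, and the lead-byte ranges are a simplified Shift-JIS rule that I am assuming here, not a complete decoder.

```cpp
#include <cstddef>
#include <string>

// Simplified Shift-JIS rule (an assumption for this sketch): bytes in
// 0x81-0x9F and 0xE0-0xFC start a two-byte character; all others are
// single-byte characters.
inline bool is_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Length in bytes of the character that starts at s[pos].
// A lead byte with no trail byte after it is treated as one byte.
inline std::size_t char_len(const std::string& s, std::size_t pos) {
    bool two = is_lead_byte(static_cast<unsigned char>(s[pos])) &&
               pos + 1 < s.size();
    return two ? 2 : 1;
}

// Find the start of the character that ends at pos by decoding forward
// from offset 0. 'steps' is incremented once per character decoded.
inline std::size_t prev_by_rescan(const std::string& s, std::size_t pos,
                                  std::size_t& steps) {
    std::size_t cur = 0, prev = 0;
    while (cur < pos) {
        prev = cur;
        cur += char_len(s, cur);
        ++steps;
    }
    return prev;
}

// Walk the whole string from the end to the beginning and return the
// total number of characters decoded along the way.
inline std::size_t backward_walk_cost(const std::string& s) {
    std::size_t steps = 0;
    std::size_t pos = s.size();
    while (pos > 0) {
        pos = prev_by_rescan(s, pos, steps);
    }
    return steps;
}
```

For a string of N characters this decodes N + (N-1) + ... + 1 = N(N+1)/2 characters in total, so backward_walk_cost(std::string(1000, 'a')) comes to 500500 where a forward pass would decode 1000.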

It would be better to not provide the inefficient case at all. Compare
with how std::list doesn't provide random access, even though it could
do so in O(N). Looking at your character set iterator, it seems to me
that you could have a forward-only iterator and a bidirectional
iterator for UTF, but only the former for these other encodings. Not
storing the begin iterator when only forward iteration is needed also
saves space.
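Something along these lines is what I have in mind. This is only a sketch under my own assumptions: utf8_tag, shift_jis_tag, encoding_traits, can_decode_backward and decoder_state are hypothetical names, not anything taken from Boost.Ustr.

```cpp
#include <iterator>
#include <type_traits>

// Hypothetical encoding tags.
struct utf8_tag {};
struct shift_jis_tag {};

// Each encoding advertises the strongest iterator category that its
// decoder can support efficiently.
template <class Encoding>
struct encoding_traits;

template <>
struct encoding_traits<utf8_tag> {
    // UTF-8 continuation bytes are recognisable, so stepping back is cheap.
    typedef std::bidirectional_iterator_tag iterator_category;
};

template <>
struct encoding_traits<shift_jis_tag> {
    // No reliable way to find the previous character start: forward only.
    typedef std::forward_iterator_tag iterator_category;
};

// True when the encoding's decoder may offer operator--.
template <class Encoding>
struct can_decode_backward
    : std::is_base_of<std::bidirectional_iterator_tag,
                      typename encoding_traits<Encoding>::iterator_category> {};

// State held by a decoding iterator, selected by category.
template <class Iter, class Category>
struct decoder_state;

// Forward-only decoding needs just the current position and the end.
template <class Iter>
struct decoder_state<Iter, std::forward_iterator_tag> {
    Iter cur;
    Iter end;
};

// Bidirectional decoding also keeps 'begin' so that operator-- can stop
// at the front of the range.
template <class Iter>
struct decoder_state<Iter, std::bidirectional_iterator_tag> {
    Iter begin;
    Iter cur;
    Iter end;
};
```

With this arrangement, code that tries to decrement a Shift-JIS iterator fails to compile instead of quietly running in quadratic time, and the forward-only state is one iterator smaller.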

Regards, Phil.
