Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-08-16 18:43:02
Soares Chen Ruo Fei wrote:
> On Tue, Aug 16, 2011, Phil Endecott wrote:
>> I'm not familiar with the algorithms requiring bidirectional access that
>> Artyom mentions, but a standard way to make them work with iterators for
>> various different encodings would be to specialise the algorithms.  You
>> would have a main implementation that requires the bidirectional (or random
>> access) iterator, and a forwarding implementation that looks like this:
>>
>> template <typename FORWARD_ITER>
>> void algorithm(FORWARD_ITER begin, FORWARD_ITER end)
>> {
>>   // Make a copy of the range into a bidirectional container:
>>   std::vector< typename FORWARD_ITER::value_type > v(begin,end);
>>   // Call the other specialisation:
>>   algorithm(v.begin(),v.end());
>> }
>>
>> That is the standard time-vs-space complexity trade-off.
>
> Well I don't think forcing all generic Unicode algorithms to provide
> specialization version for forward-only iterators is any better than
> providing a less-efficient bidirectional iterator. Such a burden is
> too high for the algorithm developers. Or perhaps a better decision is
> to simply let the compiler yield a (friendly?) error when the generic
> algorithm uses the decrement/random access operator, and find a way to
> inform the user to convert the string to standard UTF strings before
> passing to the Unicode algorithms.
The "less-efficient" O(N^2) bidirectional iterator is completely
unreasonable. Algorithms are not being "forced" to do anything.
Have a look at how the standard library does things.
std::lower_bound() and std::rotate(), for example, have specialisations
that select different algorithms depending on the type of iterator that
is supplied; on the other hand, std::random_shuffle() only takes random
access iterators and it would be the user's responsibility to choose
what to do if they had some other kind of range.
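
As a rough illustration of that kind of selection, here is a minimal sketch using iterator category tags. The namespace sketch and the is_palindrome algorithm are hypothetical names chosen only for this example (they are not part of Boost.Ustr or the standard library); the point is just that one algorithm body needs operator-- and the forward-only overload pays for a copy instead.

```cpp
#include <forward_list>
#include <iterator>
#include <list>
#include <vector>

namespace sketch {  // hypothetical namespace, for illustration only

// Main implementation: walks inward from both ends, so it needs operator--.
// Random access iterators also select this overload, because
// random_access_iterator_tag derives from bidirectional_iterator_tag.
template <typename BidirIter>
bool is_palindrome_impl(BidirIter begin, BidirIter end,
                        std::bidirectional_iterator_tag)
{
    while (begin != end) {
        --end;
        if (begin == end) break;          // middle element of an odd-length range
        if (*begin != *end) return false;
        ++begin;
    }
    return true;
}

// Forwarding implementation for forward-only iterators: copy the range into
// a container with stronger iterators, then call the main implementation.
// This trades O(N) extra space for not needing operator-- on the input.
template <typename ForwardIter>
bool is_palindrome_impl(ForwardIter begin, ForwardIter end,
                        std::forward_iterator_tag)
{
    typedef typename std::iterator_traits<ForwardIter>::value_type value_type;
    std::vector<value_type> v(begin, end);
    return is_palindrome_impl(v.begin(), v.end(),
                              std::bidirectional_iterator_tag());
}

// Public entry point: picks an overload from the iterator's category.
template <typename Iter>
bool is_palindrome(Iter begin, Iter end)
{
    typedef typename std::iterator_traits<Iter>::iterator_category category;
    return is_palindrome_impl(begin, end, category());
}

}  // namespace sketch
```

With this, sketch::is_palindrome() accepts a std::forward_list, a std::list or a std::vector range; only the forward-only case makes the copy. An algorithm author who does not want to supply the forwarding overload can simply omit it, in which case passing a forward-only iterator fails to compile.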
> Or perhaps I could find a way to let template instances of
> unicode_string_adapter with MBCS encoding convert the string
> to a UTF string during construction and store the UTF-encoded string
> instead. The only problem for this is that during conversion back to
> the raw string, the string adapter would have to reconvert the
> internally stored UTF-encoded string back to the MBCS-encoded string.
> This can be expensive if the user regularly wants to access the raw
> string, unless we store two smart pointers within the string adapter -
> one for the MBCS string and one for the converted UTF string, but
> doing so would waste storage space as well.
No, don't do that. Just provide the iterators that can be provided efficiently.
Phil.