Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2011-04-26 07:38:11


On 25/04/2011 21:50, Ryou Ezoe wrote:
> On Tue, Apr 26, 2011 at 3:55 AM, Artyom<artyomtnk_at_[hidden]> wrote:
>>> From: Ryou Ezoe<boostcpp_at_[hidden]>
>>>
>>> Sort by code point is not the best solution.
>>> But at least, it's consistent if we use one encoding.
>>>
>>
>> No it is not, UCS encoding has different order
>> in different representations:
>>
>> UTF-8 and UTF-32 order is consistent i.e.
>>
>> for each a,b in utf8(a)< utf8(b) iff utf32(a)< utf32(b)
>>
>> However this is not correct for UTF-16, where codepoints
>> outside of the BMP have a different ordering, i.e.
>>
>> It may be that codepoint (a)> codepoint(b) but UTF-16(a) sorted before
>> UTF-16(b)
>
> What do you mean?
> No matter which UTF you use, the code point is the same.
> You can't compare UTF-8 string by comparing each octet.

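Actually, you can. Comparing valid UTF-8 strings octet by octet (as
unsigned values) gives the same order as comparing their code points.
And you should do it at the octet level, for efficiency.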
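
To make this concrete, here is a minimal sketch. The helpers to_utf8,
to_utf16 and utf8_less are names made up for this example (they are not
part of Boost.Locale), and the encoders assume their input is a valid
Unicode scalar value. It only illustrates the ordering argument; it is
not a proposal for how the library should implement comparison.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper: encode one code point as UTF-8.
// Assumes cp is a valid scalar value (no surrogates, <= 0x10FFFF).
std::string to_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: encode one code point as UTF-16.
// Code points outside the BMP become a surrogate pair.
std::u16string to_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        cp -= 0x10000;
        out += static_cast<char16_t>(0xD800 | (cp >> 10));
        out += static_cast<char16_t>(0xDC00 | (cp & 0x3FF));
    }
    return out;
}

// Octet-level comparison of two UTF-8 strings.
// The octets are compared as unsigned values, whatever the signedness of char.
bool utf8_less(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](char x, char y) {
            return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
        });
}
```

Take a = U+FF5E (inside the BMP) and b = U+10000 (outside it), so
a < b as code points. In UTF-8, a is EF BD 9E and b is F0 90 80 80, so
utf8_less(to_utf8(a), to_utf8(b)) is true, matching code point order.
In UTF-16, a is the single unit FF5E while b is the surrogate pair
D800 DC00, so to_utf16(a) < to_utf16(b) is false: comparing code units
puts b first, which is the inconsistency Artyom described.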