Subject: Re: [boost] Boost.Unicode (was Re: Boost.Locale)
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-12-16 07:15:08
On 16/12/2010 12:32, Artyom wrote:
> Then don't do case conversion!
I already parse the data that provides that information, I might as well
forward it to the user.
Unicode provides two levels of casing, one in its main character
mapping, and one in the SpecialCasing supplement.
> Do just case folding. For such "simple" and incorrect
> case conversion I don't need a sophisticated Unicode library; I can use the
> standard operating system API and even std::locale::ctype very successfully
> (which I do in Boost.Locale if user prefers to use non-icu based backend)
>
> Case conversion is:
>
> - context dependent: Greek letter "Σ" is converted to "σ" or to "ς", according
> to position in the word.
> - locale dependent: Turkish i goes to İ
> - not 1-to-1: German ß goes to SS in upper case.
Right, and the reason I'm not doing it right now is because I don't want
to look into the context thing before I take a look at more complex
things that I think are more immediately useful.
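To make the three points in the quoted list concrete, here is a minimal sketch in C++. The names toy_lower, toy_upper and is_letter are hypothetical, not part of Boost.Unicode or Boost.Locale. The code covers only ASCII, a few Greek letters, ß and the Turkish dotted I, and its final-sigma test is a simplification of the Final_Sigma condition in SpecialCasing (it ignores case-ignorable characters).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy letter test: ASCII letters plus the basic Greek letters only.
bool is_letter(char32_t c) {
    return (c >= U'A' && c <= U'Z') || (c >= U'a' && c <= U'z') ||
           (c >= 0x0391 && c <= 0x03C9);
}

// Hypothetical helper: lowercase with a simplified, context-dependent sigma rule.
std::u32string toy_lower(const std::u32string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        if (c == 0x03A3) {  // GREEK CAPITAL LETTER SIGMA
            bool after_letter = i > 0 && is_letter(s[i - 1]);
            bool before_letter = i + 1 < s.size() && is_letter(s[i + 1]);
            // Final form (U+03C2) only at the end of a word, else U+03C3.
            out += (after_letter && !before_letter) ? char32_t(0x03C2)
                                                    : char32_t(0x03C3);
        } else if ((c >= U'A' && c <= U'Z') ||
                   (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2)) {
            out += static_cast<char32_t>(c + 0x20);
        } else {
            out += c;
        }
    }
    return out;
}

// Hypothetical helper: uppercase with one expansion and one locale tailoring.
std::u32string toy_upper(const std::u32string& s, bool turkish = false) {
    std::u32string out;
    for (char32_t c : s) {
        if (c == 0x00DF) {
            out += U"SS";                 // not 1-to-1: sharp s expands
        } else if (turkish && c == U'i') {
            out += char32_t(0x0130);      // locale dependent: dotted capital I
        } else if (c >= U'a' && c <= U'z') {
            out += static_cast<char32_t>(c - 0x20);
        } else {
            out += c;
        }
    }
    return out;
}

// Usage: each assertion exercises one of the three properties.
void check_toy_case_conversion() {
    assert(toy_upper(U"stra\u00DFe") == U"STRASSE");
    assert(toy_upper(U"i", true) == U"\u0130");
    assert(toy_upper(U"i") == U"I");
    assert(toy_lower(U"\u039F\u0394\u039F\u03A3") == U"\u03BF\u03B4\u03BF\u03C2");
    assert(toy_lower(U"\u03A3\u039F") == U"\u03C3\u03BF");
}
```

The sketch works on UTF-32 strings so that one array element is one code point; a real implementation would be driven by the Unicode data files rather than hard-coded branches.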
> I'm not sure about case-folding but AFAIK it is not 1-to-1 as well - but I may
> be wrong.
No, case folding isn't 1-to-1 either.
It also needs special treatment of Turkish, but nothing context-dependent.
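A small sketch of what that means in practice, assuming the mappings published in Unicode's CaseFolding.txt: ß folds to "ss", both lowercase sigmas and the capital fold to σ whatever their position, and the Turkic mappings are an opt-in tailoring. The function name toy_fold and its turkic flag are hypothetical, and only a handful of code points are handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: full case folding for ASCII plus a few special cases.
std::u32string toy_fold(const std::u32string& s, bool turkic = false) {
    std::u32string out;
    for (char32_t c : s) {
        if (c == 0x00DF) {
            out += U"ss";                    // not 1-to-1: sharp s expands
        } else if (c == 0x03A3 || c == 0x03C2) {
            out += char32_t(0x03C3);         // every sigma folds to U+03C3
        } else if (turkic && c == U'I') {
            out += char32_t(0x0131);         // Turkic: I folds to dotless i
        } else if (turkic && c == 0x0130) {
            out += U'i';                     // Turkic: dotted capital I folds to i
        } else if (c == 0x0130) {
            out += U"i\u0307";               // default: i + combining dot above
        } else if (c >= U'A' && c <= U'Z') {
            out += static_cast<char32_t>(c + 0x20);
        } else {
            out += c;
        }
    }
    return out;
}

// Usage: folded strings are compared, never displayed.
void check_toy_fold() {
    assert(toy_fold(U"STRASSE") == toy_fold(U"stra\u00DFe"));
    assert(toy_fold(U"\u03A3") == toy_fold(U"\u03C2"));
    assert(toy_fold(U"I") == U"i");
    assert(toy_fold(U"I", true) == U"\u0131");
}
```

Unlike the lowercase conversion, the loop never looks at neighbouring code points, which is why folding can be applied one code point at a time.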
> For general collation that works "well" in most languages I can use strcmp...
> I don't need Unicode library for this.
strcmp doesn't let you search for a substring regardless of case, accents
or punctuation.
The thing that really interests me with collation is collation folding.
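As a rough illustration of the idea, and not of how Boost.Unicode does or will do it, the sketch below reduces both the text and the pattern to a search key and then does an ordinary substring search on the keys. The names toy_search_key and toy_contains are hypothetical. The key assumes the input is already decomposed (NFD), strips only the U+0300 to U+036F combining marks, drops ASCII punctuation and lowercases ASCII; a real collation folding would be derived from collation weights instead.

```cpp
#include <cassert>
#include <string>

// Toy punctuation test: ASCII punctuation ranges only.
bool is_ascii_punct(char32_t c) {
    return (c >= 0x21 && c <= 0x2F) || (c >= 0x3A && c <= 0x40) ||
           (c >= 0x5B && c <= 0x60) || (c >= 0x7B && c <= 0x7E);
}

// Hypothetical helper: drop accents and punctuation, fold ASCII case.
std::u32string toy_search_key(const std::u32string& s) {
    std::u32string key;
    for (char32_t c : s) {
        if (c >= 0x0300 && c <= 0x036F) continue;  // combining diacritical marks
        if (is_ascii_punct(c)) continue;           // ignore punctuation
        if (c >= U'A' && c <= U'Z') c += 0x20;     // ignore case (ASCII only)
        key += c;
    }
    return key;
}

// Hypothetical helper: substring test on the folded keys.
bool toy_contains(const std::u32string& text, const std::u32string& pattern) {
    return toy_search_key(text).find(toy_search_key(pattern)) !=
           std::u32string::npos;
}

// Usage: the text uses decomposed accents (e followed by U+0301).
void check_toy_search() {
    assert(toy_contains(U"Cafe\u0301-Creme", U"cafecreme"));
    assert(toy_contains(U"Re\u0301sume\u0301", U"RESUME"));
    assert(!toy_contains(U"hello", U"world"));
}
```

The sketch only answers whether a match exists; reporting where it sits in the original text would need a mapping from key positions back to source positions.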