Subject: Re: [boost] Boost.Unicode (was Re: Boost.Locale)
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-12-16 07:15:08


On 16/12/2010 12:32, Artyom wrote:

> Then don't do case conversion!

I already parse the data that provides that information, I might as well
forward it to the user.

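Unicode provides two levels of casing: simple one-to-one mappings in the
main character database (UnicodeData.txt), and the more complex ones
(one-to-many, context- or locale-dependent) in the SpecialCasing.txt
supplement.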
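
As a rough illustration of the difference between the two levels, here
is a minimal sketch. It is not code from the library: the function names
simple_upper and full_upper are made up for this mail, and the tables
are tiny hand-picked excerpts rather than the real data files.

```cpp
#include <cassert>
#include <string>

// Level 1: simple, one-to-one uppercase mapping (hypothetical excerpt).
char32_t simple_upper(char32_t c) {
    if (c >= U'a' && c <= U'z') return static_cast<char32_t>(c - 0x20);
    if (c == U'\u00E9') return U'\u00C9';  // e-acute -> E-acute
    return c;  // sharp s (U+00DF) has no simple uppercase mapping
}

// Level 2: full mapping, where one code point may expand to several.
std::u32string full_upper(char32_t c) {
    if (c == U'\u00DF') return U"SS";  // sharp s -> SS
    if (c == U'\uFB01') return U"FI";  // fi ligature -> FI
    return std::u32string(1, simple_upper(c));  // otherwise same as level 1
}
```

With the simple level, U+00DF stays as it is; only the second level
gives "SS", which is why its result has to be a string rather than a
single code point.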

> Do just case folding. For such "simple" and incorrect case conversion
> I don't need a sophisticated Unicode library, I can use the standard
> operating system API and even std::locale::ctype very successfully
> (which I do in Boost.Locale if the user prefers a non-ICU backend)
>
> Case conversion is:
>
> - context dependent: Greek letter "Σ" is converted to "σ" or to "ς", according
> to position in the word.
> - locale dependent: Turkish i goes to İ
> - not 1-to-1: German ß goes to SS in upper case.

Right, and the reason I'm not doing it right now is that I don't want
to look into the context-dependent rules before I take a look at more
complex things that I think are more immediately useful.
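
To give an idea of what the context-dependent part involves, here is a
deliberately simplified sketch of the final sigma rule. The names
is_greek_letter and lower_greek are hypothetical, and the "preceded by a
letter, not followed by one" test is an assumption that only looks at
basic Greek letters; the real condition in SpecialCasing is defined in
terms of cased and case-ignorable characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: only U+0391..U+03C9 count as letters (U+03A2 is unassigned).
bool is_greek_letter(char32_t c) {
    return c >= 0x0391 && c <= 0x03C9 && c != 0x03A2;
}

// Lowercase basic Greek text, choosing the sigma form from its neighbours.
std::u32string lower_greek(const std::u32string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t c = s[i];
        if (c == U'\u03A3') {  // capital sigma
            bool before = i > 0 && is_greek_letter(s[i - 1]);
            bool after = i + 1 < s.size() && is_greek_letter(s[i + 1]);
            // Final form only at the end of a word, not for a lone sigma.
            out += (before && !after) ? U'\u03C2' : U'\u03C3';
        } else if (c >= 0x0391 && c <= 0x03A9 && c != 0x03A2) {
            out += static_cast<char32_t>(c + 0x20);  // other capitals
        } else {
            out += c;
        }
    }
    return out;
}
```

The point is that the mapping for one code point cannot be computed
without looking at what comes before and after it.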

> I'm not sure about case-folding but AFAIK it is not 1-to-1 either - but I may
> be wrong.

No, it isn't 1-to-1 either.
It also needs special treatment for Turkish, but nothing context-dependent.
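
A minimal sketch of what that means, with hypothetical names (fold_char,
fold) and a hand-picked handful of entries in the spirit of
CaseFolding.txt, not the full table:

```cpp
#include <cassert>
#include <string>

// Fold one code point; the result may be longer than one code point.
// 'turkic' selects the special mappings for dotted and dotless i.
std::u32string fold_char(char32_t c, bool turkic) {
    if (turkic && c == U'I') return U"\u0131";       // I -> dotless i
    if (turkic && c == U'\u0130') return U"i";       // dotted capital I -> i
    if (c == U'\u00DF') return U"ss";                // sharp s -> ss
    if (c == U'\u0130') return U"i\u0307";           // i + combining dot above
    if (c >= U'A' && c <= U'Z')
        return std::u32string(1, static_cast<char32_t>(c + 0x20));
    return std::u32string(1, c);
}

// Fold a whole string; each code point is handled independently.
std::u32string fold(const std::u32string& s, bool turkic = false) {
    std::u32string out;
    for (char32_t c : s) out += fold_char(c, turkic);
    return out;
}
```

Each code point is folded on its own, so there is no context to track;
the only extra input is the Turkic flag.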

> For general collation that works "well" in most languages I can use strcmp...
> I don't need a Unicode library for this.

strcmp doesn't let you search for a substring regardless of case,
accents or punctuation.
The thing that really interests me with collation is collation folding.
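
Roughly, the idea is to fold both strings down to something like their
primary collation level and then do an ordinary substring search. The
sketch below only shows the principle: base_letter, primary_fold and
contains_folded are made-up names, and the accent and punctuation tables
are a few hard-coded entries, where a real implementation would take its
weights from the Unicode collation data.

```cpp
#include <cassert>
#include <string>

// Map a code point to an unaccented lowercase base (tiny excerpt only).
char32_t base_letter(char32_t c) {
    switch (c) {
        case U'\u00C9': case U'\u00E9': return U'e';  // E/e with acute
        case U'\u00C0': case U'\u00E0': return U'a';  // A/a with grave
        default: break;
    }
    if (c >= U'A' && c <= U'Z') return static_cast<char32_t>(c + 0x20);
    return c;
}

// Drop punctuation and reduce every remaining code point to its base.
std::u32string primary_fold(const std::u32string& s) {
    static const std::u32string punctuation = U"-'.,";
    std::u32string out;
    for (char32_t c : s) {
        if (punctuation.find(c) != std::u32string::npos) continue;
        out += base_letter(c);
    }
    return out;
}

// Substring search that ignores case, accents and punctuation.
bool contains_folded(const std::u32string& haystack,
                     const std::u32string& needle) {
    return primary_fold(haystack).find(primary_fold(needle))
           != std::u32string::npos;
}
```

Once both sides are folded, any plain search algorithm can be used on
the result.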