Subject: Re: [boost] RFC: interest in Unicode codecs?
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2009-07-18 01:43:33


OvermindDL1 wrote:
> On Sat, Feb 14, 2009 at 10:07 AM, Phil Endecott
> <spam_from_boost_dev_at_[hidden]> wrote:
>> /* snip */
>> Yes, a Unicode character properties library is important to those who are
>> writing text editors and similar applications. Perhaps Boost should have
>> one. I have personally used the Unicode properties tables for doing
>> "approximate matching" of e.g. accented characters with their base
>> characters when searching. But I can do that equally well in UTF-8 as in
>> UTF-32.
>
> If you are all interested in other opinions, I would love for boost to
> have a UTF8(16/32) helper library.

There is a Google Summer of Code project for a Unicode library, which
I'm working on.

It handles Unicode text in any of the UTF-8, UTF-16 or UTF-32
encodings, bundles a small-ish Unicode character database, and supports
grapheme boundaries, composition/decomposition and normalization. It
does not support "approximate matching", collation or case folding, at
least not for the time being.
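
To give a rough idea of what handling UTF-8 involves at the lowest
level, here is a minimal sketch of decoding UTF-8 into UTF-32 code
points. It is only an illustration under my own assumptions:
utf8_to_utf32 is a made-up name, not the library's interface, and a
real implementation would more likely work on iterators or ranges than
on whole strings.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of any Boost library): decode a UTF-8
// byte string into UTF-32 code points, throwing on malformed input.
std::vector<std::uint32_t> utf8_to_utf32(const std::string& in)
{
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;

        // The lead byte gives the sequence length and the top bits.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::runtime_error("truncated UTF-8 sequence");

        // Each continuation byte is 10xxxxxx and adds six bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }

        // Reject overlong forms, surrogates and values above U+10FFFF.
        static const std::uint32_t min_for_len[] = { 0, 0, 0x80, 0x800, 0x10000 };
        if (cp < min_for_len[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");

        out.push_back(cp);
        i += len;
    }
    return out;
}
```

For example, utf8_to_utf32("\xE2\x82\xAC") yields the single code point
U+20AC, while the overlong sequence "\xC0\x80" is rejected. Grapheme
boundaries and normalization are then defined on top of code points
like these, whichever encoding the text is stored in.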