Subject: Re: [boost] [rfc] Unicode GSoC project
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-05-14 18:28:52


Eric Niebler wrote:
> Mathias Gaunard wrote:

> Also needed are tables that store the
> various character properties, and (hopefully) some parsers that build
> the tables directly from the Unicode character database so we can easily
> rev it whenever the database changes.

For the record, I have scripts that generate ISO-8859-* to/from
Unicode tables from the downloaded data; I'll happily contribute them
if they are useful to anyone.
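To give an idea of what such a generator involves, here is a minimal
C++ sketch. It is not my actual scripts. It assumes the input is in the
format of the unicode.org mapping files (MAPPINGS/ISO8859/8859-*.TXT),
whose data lines look like "0xA1<TAB>0x0104<TAB># LATIN CAPITAL LETTER A
WITH OGONEK". The name load_8859_table and the use of U+FFFD for
unmapped bytes are hypothetical choices for illustration.

```cpp
#include <array>
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: builds a 256-entry byte -> code point table from a
// unicode.org-style ISO-8859 mapping file. Bytes with no entry are left as
// U+FFFD REPLACEMENT CHARACTER (an assumption, not part of the data format).
std::array<char32_t, 256> load_8859_table(std::istream& in)
{
    std::array<char32_t, 256> table;
    table.fill(U'\uFFFD');

    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#')
            continue;                      // skip blank lines and comments

        std::istringstream fields(line);
        unsigned long byte = 0, cp = 0;
        // std::hex accepts the "0x" prefix; the trailing "# name" is ignored
        if (fields >> std::hex >> byte >> cp && byte < 256)
            table[byte] = static_cast<char32_t>(cp);
    }
    return table;
}
```

A generator would then print the resulting array as a C++ initialiser,
so the table can be regenerated whenever the mapping files change.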

> The library provides the following core types in the boost namespace:
>
> uchar8_t
> uchar16_t
> uchar32_t
>
> In C++0x, these are called char, char16_t and char32_t.

I liked the idea of making them obviously unsigned; I had some nasty
bugs in my UTF-8 code where I made invalid assumptions about signedness.
But of course being consistent with C++0x is more important.
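For anyone who hasn't hit this: whether plain char is signed is
implementation-defined, so a UTF-8 lead byte such as 0xC3 can come out
negative, and a comparison like c >= 0xC0 then silently fails. A
minimal sketch of the defensive pattern, assuming the data arrives as
plain char (utf8_sequence_length is a hypothetical name, not part of
the proposal):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with lead byte c, or 0 if c cannot start a sequence (per RFC 3629).
// Converting to unsigned char first is the important step: where plain
// char is signed, '\xC3' is negative and "c >= 0xC0" is never true.
std::size_t utf8_sequence_length(char c)
{
    unsigned char b = static_cast<unsigned char>(c);
    if (b < 0x80) return 1;   // 0xxxxxxx: ASCII
    if (b < 0xC2) return 0;   // continuation byte, or overlong lead C0/C1
    if (b < 0xE0) return 2;   // 110xxxxx
    if (b < 0xF0) return 3;   // 1110xxxx
    if (b < 0xF5) return 4;   // 11110xxx, up to U+10FFFF
    return 0;                 // F5..FF never appear in UTF-8
}
```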

> I strongly disagree with requiring normalization form C for the concept
> UnicodeRange. There are many more valid Unicode sequences.

Agreed.
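As a concrete example of why NFC is too strong a requirement: "é" can
be written as the single code point U+00E9 or as U+0065 followed by
U+0301 COMBINING ACUTE ACCENT. Both are well-formed and canonically
equivalent, but only the first is in NFC. A sketch of a well-formedness
check that accepts both (the function name is hypothetical):

```cpp
#include <cassert>
#include <string>

// Hypothetical check: every element is a Unicode scalar value, i.e. at
// most U+10FFFF and not a surrogate (U+D800..U+DFFF). Well-formedness
// requires only this; it says nothing about normalization.
bool is_valid_scalar_sequence(const std::u32string& s)
{
    for (char32_t c : s)
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
            return false;
    return true;
}

// "é" spelled two canonically equivalent ways:
const std::u32string e_nfc = U"\u00E9";    // precomposed, in NFC
const std::u32string e_nfd = U"e\u0301";   // e + combining acute, NFD only
```

A UnicodeRange concept that demanded NFC would have to reject e_nfd,
even though it is perfectly valid Unicode text.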

> the concrete algorithms must come first.

Agreed. Mathias, I would love to see an "end user's perspective" of
how this library will be used, i.e. its scope and basic usage patterns.

Phil.