Subject: Re: [boost] [rfc] Unicode GSoC project
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-05-14 18:28:52
Eric Niebler wrote:
> Mathias Gaunard wrote:
> Also needed are tables that store the
> various character properties, and (hopefully) some parsers that build
> the tables directly from the Unicode character database so we can easily
> rev it whenever the database changes.
For the record, I have scripts that can generate ISO-8859-* to/from
Unicode tables from the downloaded data; I'll happily contribute them
if they are useful to anyone.
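For illustration, here is a minimal C++ sketch of the core of such a generator. It is not the scripts mentioned above. It assumes the two-column hex format of the ISO-8859 mapping files published by the Unicode Consortium (lines like `0xA0<TAB>0x00A0<TAB># NO-BREAK SPACE`, with `#` starting comment lines). `CodePageTables`, `parse_mapping` and `kUnmapped` are hypothetical names, not part of any existing Boost API.

```cpp
#include <array>
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Sentinel for bytes with no mapping (hypothetical; any non-code-point value works).
constexpr char32_t kUnmapped = 0xFFFFFFFFu;

// Hypothetical pair of tables for one ISO-8859-* code page.
struct CodePageTables {
    std::array<char32_t, 256> to_unicode;            // byte -> code point
    std::map<char32_t, unsigned char> from_unicode;  // code point -> byte
};

// Parses lines of the assumed form "0xA0<TAB>0x00A0<TAB># NO-BREAK SPACE".
// Blank lines and lines starting with '#' are skipped.
CodePageTables parse_mapping(std::istream& in) {
    CodePageTables t;
    t.to_unicode.fill(kUnmapped);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::istringstream fields(line);
        std::string byte_str, cp_str;
        if (!(fields >> byte_str >> cp_str)) continue;
        // With base 16, std::stoul accepts an optional "0x" prefix.
        unsigned long byte = std::stoul(byte_str, nullptr, 16);
        unsigned long cp = std::stoul(cp_str, nullptr, 16);
        if (byte > 0xFF) continue;  // not a single-byte code page entry
        t.to_unicode[byte] = static_cast<char32_t>(cp);
        t.from_unicode[static_cast<char32_t>(cp)] = static_cast<unsigned char>(byte);
    }
    return t;
}
```

A flat 256-entry array suits the byte-to-code-point direction; the reverse direction is sparse, hence the map. Emitting C++ source for the tables is then a loop over these two containers.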
> The library provides the following core types in the boost namespace:
>
> uchar8_t
> uchar16_t
> uchar32_t
>
> In C++0x, these are called char, char16_t and char32_t.
I liked the idea of making them obviously unsigned; I had some nasty
bugs in my UTF-8 code where I made invalid assumptions about signedness.
But of course, being consistent with C++0x is more important.
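A typical example of this kind of bug, sketched in C++ as an illustration rather than the actual code in question. `utf8_sequence_length` and `naive_is_ascii` are hypothetical helpers. Taking the lead byte as an unsigned type makes comparisons such as `lead < 0x80` mean what they say, whatever the signedness of plain `char` on the platform.

```cpp
#include <cassert>
#include <cstdint>

// Returns the length of a UTF-8 sequence given its lead byte, or 0 for a
// continuation byte or a byte that can never start a valid sequence
// (0xC0, 0xC1, 0xF5..0xFF).
int utf8_sequence_length(std::uint8_t lead) {
    if (lead < 0x80) return 1;                  // 0xxxxxxx: ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2; // 110xxxxx
    if (lead >= 0xE0 && lead <= 0xEF) return 3; // 1110xxxx
    if (lead >= 0xF0 && lead <= 0xF4) return 4; // 11110xxx, up to U+10FFFF
    return 0;
}

// The pitfall: where plain char is signed, '\xE2' is negative, so this
// "is ASCII" test wrongly returns true for a non-ASCII lead byte.
bool naive_is_ascii(char c) { return c < 0x80; }
```

When reading bytes from a `std::string`, converting each element with `static_cast<unsigned char>(s[i])` before classifying it avoids the problem.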
> I strongly disagree with requiring normalization form C for the concept
> UnicodeRange. There are many more valid Unicode sequences.
Agreed.
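To make the point concrete: U+00E9 on its own and the pair U+0065 U+0301 both spell "é" and are both valid Unicode, but only the first is in NFC. Below is a minimal sketch of a weaker check that a UnicodeRange concept could rely on instead, namely that every element is a Unicode scalar value (at most U+10FFFF and not a surrogate). `is_scalar_value` and `is_valid_unicode` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// True if c is a Unicode scalar value: at most U+10FFFF and not a surrogate.
bool is_scalar_value(char32_t c) {
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}

// Hypothetical validity check over any range of code points. It is weaker
// than "is in NFC": it accepts every normalization form.
template <typename Range>
bool is_valid_unicode(const Range& r) {
    for (char32_t c : r)
        if (!is_scalar_value(c)) return false;
    return true;
}

// Two spellings of "é": both valid, only the first is in NFC.
const std::vector<char32_t> precomposed = {0x00E9};         // LATIN SMALL LETTER E WITH ACUTE
const std::vector<char32_t> decomposed  = {0x0065, 0x0301}; // e + COMBINING ACUTE ACCENT
```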
> the concrete algorithms must come first.
Agreed. Mathias, I would love to see an "end-user perspective" view
of how this library will be used, i.e. its scope and basic usage patterns.
Phil.