Subject: Re: [boost] [rfc] Unicode GSoC project
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-05-13 06:55:17


Hi Mathias,

Mathias Gaunard wrote:
> I have been working on range adaptors to iterate over code points in
> an UTF-x string as well as converting back those code points to UTF-y
> for the past week

I would be interested to see this code. I encourage you to share what
you have done as soon as possible, to get prompt feedback.

> short documentation
> http://mathias.gaunard.emi.u-bordeaux1.fr/unicode/doc/html/

Some feedback based on that document:

     UTF-16
     ....
     This is the recommended encoding for dealing with Unicode.

Recommended by whom? It's not the encoding that I would normally recommend.

     make_utf8(Range&& range);
     Assumes range range is a properly encoded UTF-8 range in
     Normalization Form C.
     Iterating the range may throw an exception if it isn't.

     as_utf8(Range&& range);
     Return type is a model of UnicodeRange whose value type is uchar8_t.

To me, the word "make" suggests that the former is actually doing a
conversion. But it's the latter, "as", that does that. Can we think
of something better? (Can anyone suggest any precedents?)
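
To make the distinction concrete, here is a minimal sketch of the two
behaviours. The names decode_assumed_utf8 and encode_as_utf8 are
hypothetical, chosen only for illustration; they are not taken from the
proposed library. The functions are eager rather than lazy range
adaptors, and the decoder is simplified: it checks lead and continuation
bytes but does not reject overlong forms or surrogate code points, and
neither function looks at normalization.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical name. Treats `bytes` as already being UTF-8 and only
// decodes it to code points; no re-encoding takes place. Malformed
// input is reported by throwing. Simplified: overlong forms and
// surrogates are not rejected.
std::u32string decode_assumed_utf8(const std::string& bytes) {
    std::u32string out;
    for (std::size_t i = 0; i < bytes.size();) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)                { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > bytes.size())
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical name. Actually converts: takes code points and
// produces the corresponding UTF-8 byte sequence.
std::string encode_as_utf8(const std::u32string& cps) {
    std::string out;
    for (char32_t cp : cps) {
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp <= 0x10FFFF) {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            throw std::range_error("code point out of Unicode range");
        }
    }
    return out;
}
```

The first function corresponds to the "assume it is already UTF-8"
behaviour documented for make_utf8, and the second to the converting
behaviour documented for as_utf8. Names along these lines say which
one does the work.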

Regards, Phil.