Subject: Re: [boost] [rfc] Unicode GSoC project
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2009-05-15 10:12:12


Scott McMurray wrote:

> I really think UTF-8 should be the recommended one, since it forces
> people to remember that it's no longer one unit, one "character".
>
> Even in Beman Dawes's talk
> (http://www.boostcon.com/site-media/var/sphene/sphwiki/attachment/2009/05/07/filesystem.pdf)
> where slide 11 mentions UTF-32 and remembers that UTF-16 can still
> take 2 encoding units per codepoint, slide 13 says that UTF-16 is
> "desired" where "random access critical".

I don't plan on supporting random access for UTF-16.
Decoding UTF-16 is still faster than decoding UTF-8, because UTF-8
requires more complex decoding.
UTF-16 has only two cases (a single code unit or a surrogate pair),
which makes it easier to optimize the branch into a likely and an
unlikely case.
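
To illustrate the difference in branching, here is a rough sketch of
the two decoders. It is not code from the project: the names
decode_utf16, decode_utf8 and SKETCH_UNLIKELY are made up for this
example, the input is assumed to be well-formed (no validation, no
bounds checks), and the branch hint relies on the GCC/Clang
__builtin_expect builtin, falling back to a plain condition elsewhere.

```cpp
#include <cstddef>

// Hypothetical branch-hint macro for this sketch only.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define SKETCH_UNLIKELY(x) (x)
#endif

// Decodes one code point from well-formed UTF-16 at s[i] and advances i.
// Two cases only: a single code unit (likely) or a surrogate pair (unlikely).
inline char32_t decode_utf16(const char16_t* s, std::size_t& i)
{
    char32_t u = s[i++];
    if (SKETCH_UNLIKELY(u >= 0xD800 && u <= 0xDBFF)) {
        char32_t lo = s[i++];  // assumed to be a valid low surrogate
        return 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
    }
    return u;
}

// Decodes one code point from well-formed UTF-8 at s[i] and advances i.
// Four cases, selected by the lead byte, with no obvious single
// "unlikely" branch for non-ASCII text.
inline char32_t decode_utf8(const unsigned char* s, std::size_t& i)
{
    char32_t b = s[i++];
    if (b < 0x80) {                      // 1 byte: 0xxxxxxx
        return b;
    }
    if (b < 0xE0) {                      // 2 bytes: 110xxxxx 10xxxxxx
        char32_t c = (b & 0x1F) << 6;
        c |= (s[i++] & 0x3F);
        return c;
    }
    if (b < 0xF0) {                      // 3 bytes: 1110xxxx + 2 trailing
        char32_t c = (b & 0x0F) << 12;
        c |= (s[i++] & 0x3F) << 6;
        c |= (s[i++] & 0x3F);
        return c;
    }
    char32_t c = (b & 0x07) << 18;       // 4 bytes: 11110xxx + 3 trailing
    c |= (s[i++] & 0x3F) << 12;
    c |= (s[i++] & 0x3F) << 6;
    c |= (s[i++] & 0x3F);
    return c;
}
```

Neither function gives random access; both only step forward one code
point at a time. A real decoder would also have to reject ill-formed
input, which adds more checks on the UTF-8 side (overlong forms,
stray trailing bytes) than on the UTF-16 side (unpaired surrogates).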