From: Sebastian Redl (sebastian.redl_at_[hidden])
Date: 2008-02-24 08:12:57
Frank Mori Hess wrote:
> I don't have a lot of experience using non-ascii strings in my internal code,
> aside from occasional forays into utf-8 for special characters, but wouldn't
> using ucs-4 for the "core" encoding be the sane thing to do? With a ucs-4
> encoding, you could use a
>
> basic_string<wchar_t>
>
> and continue using the familiar api without worrying about the complications
> and confusion caused by variable length encodings.
The sane thing, perhaps. But take a look at Mozilla, for example, who
deal with a lot of character data. They're currently evaluating the
memory and speed effects of switching from UTF-16 to UTF-8 for
everything. The reasoning is that even on web pages that consist mostly
of non-ASCII characters, there's still a lot of ASCII around (not counting
tag names): URIs, IDs, classes, names, etc. Thus, the space savings
could be considerable. (Current benchmarks record an average of a few
percent on an unfortunately unrepresentative set of pages, if I
remember correctly.)
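To make the size argument concrete, here is a rough sketch that counts
how many bytes a piece of text needs in each encoding form. It assumes
the text is already available as a sequence of code points; the names
utf8_size, utf16_size, utf32_size, EncodedSizes and measure are made up
for illustration and are not taken from Mozilla's code or any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed to store one code point in UTF-8 (1 to 4 bytes).
std::size_t utf8_size(std::uint32_t cp) {
    if (cp < 0x80) return 1;      // ASCII
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;   // rest of the Basic Multilingual Plane
    return 4;                     // supplementary planes
}

// Bytes needed in UTF-16: one 16-bit unit, or a surrogate pair.
std::size_t utf16_size(std::uint32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

// Bytes needed in UTF-32: always one 32-bit unit.
std::size_t utf32_size(std::uint32_t) {
    return 4;
}

// Hypothetical helper type: total byte counts per encoding form.
struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// Sum the per-code-point sizes over a whole text.
EncodedSizes measure(const std::vector<std::uint32_t>& text) {
    EncodedSizes total = {0, 0, 0};
    for (std::size_t i = 0; i < text.size(); ++i) {
        total.utf8 += utf8_size(text[i]);
        total.utf16 += utf16_size(text[i]);
        total.utf32 += utf32_size(text[i]);
    }
    return total;
}
```

For example, the four ASCII letters of "href" followed by one CJK
character (U+4E2D) come to 7 bytes in UTF-8, 10 in UTF-16 and 20 in
UTF-32. Text that is mostly CJK tips the balance the other way for
UTF-8 versus UTF-16, which is why the real-world mix of markup and
content matters.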
Can you imagine what these developers would think of switching to
UTF-32, where 11 bits of every 32-bit code unit are guaranteed to be
wasted, simply because every code point in all Unicode 5 planes can be
represented with 21 bits?
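The 21-bit figure is easy to check. A small sketch, with bits_needed,
max_code_point and unused_bits_utf32 being names I made up for the
purpose:

```cpp
#include <cstdint>

// Number of bits needed to represent `value` (0 needs 0 bits here).
unsigned bits_needed(std::uint32_t value) {
    unsigned bits = 0;
    while (value != 0) {
        ++bits;
        value >>= 1;
    }
    return bits;
}

// Highest code point Unicode defines: U+10FFFF, the end of plane 16.
const std::uint32_t max_code_point = 0x10FFFF;

// Bits left unused in each 32-bit UTF-32 code unit.
unsigned unused_bits_utf32() {
    return 32 - bits_needed(max_code_point);
}
```

bits_needed(max_code_point) comes out as 21, so unused_bits_utf32()
gives the 11 wasted bits mentioned above.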
Sebastian