Boost mailing page: Re: [boost] UTF-8 conversion etc.


From: Sebastian Redl (sebastian.redl_at_[hidden])
Date: 2008-02-24 08:12:57

Next message: John Maddock: "[boost] Reminder: Floating Point Utilities Review: some more reviews needed!"
Previous message: Marco Costalba: "[boost] [announce] Dynamic object factory v3"
In reply to: Frank Mori Hess: "Re: [boost] UTF-8 conversion etc."
Next in thread: Sebastian Redl: "Re: [boost] UTF-8 conversion etc."

Frank Mori Hess wrote:
> I don't have a lot of experience using non-ascii strings in my internal code,
> aside from occasional forays into utf-8 for special characters, but wouldn't
> using ucs-4 for the "core" encoding be the sane thing to do? With a ucs-4
> encoding, you could use a
>
> basic_string<wchar_t>
>
> and continue using the familiar api without worrying about the complications
> and confusion caused by variable length encodings.
The sane thing, perhaps. But take a look at Mozilla, for example, whose
developers deal with character data a lot. They are currently evaluating
the memory and speed effects of switching from UTF-16 to UTF-8 for
everything. The reasoning is that even on web pages that consist mostly
of non-ASCII characters, there's still a lot of ASCII around (not counting
tag names): URIs, IDs, classes, names, etc. Thus, the space savings
could be considerable. (Current benchmarks record an average saving of a
few percent on an unfortunately unrepresentative set of pages, if I
remember correctly.)
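
To make the space argument concrete, here is a minimal sketch that counts how many bytes a piece of text needs in each of the three encodings. The names utf8_length, utf16_length, EncodedSizes and encoded_sizes are made up for this illustration; they are not part of Boost or of any Mozilla code. It assumes a C++11 compiler and ignores the fact that surrogate values are not valid code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed to encode one code point in UTF-8 (1 to 4 bytes).
inline std::size_t utf8_length(char32_t cp) {
    assert(cp <= 0x10FFFF);  // the Unicode code space ends at U+10FFFF
    if (cp < 0x80) return 1;     // ASCII
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                    // supplementary planes
}

// Bytes needed in UTF-16: one 16-bit unit inside the BMP,
// a surrogate pair (two units) outside it.
inline std::size_t utf16_length(char32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;
}

// Hypothetical helper type: total byte counts per encoding.
struct EncodedSizes {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// Sum the encoded size of every code point in the text.
inline EncodedSizes encoded_sizes(const std::u32string& text) {
    EncodedSizes sizes{0, 0, 0};
    for (char32_t cp : text) {
        sizes.utf8 += utf8_length(cp);
        sizes.utf16 += utf16_length(cp);
        sizes.utf32 += 4;  // UTF-32 is always four bytes per code point
    }
    return sizes;
}
```

For a string like U"id=\u65E5\u672C" (three ASCII characters followed by two CJK characters), encoded_sizes reports 9 bytes for UTF-8, 10 for UTF-16 and 20 for UTF-32. The more ASCII surrounds the non-ASCII text, the further UTF-8 pulls ahead.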

Can you imagine what these developers would think of switching to
UTF-32, where 11 bits of every 32-bit code unit are guaranteed to be
wasted, simply because all Unicode 5 planes can be represented with
21 bits?
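
The 21-bit figure follows from the highest code point, U+10FFFF. A small compile-time check, with the hypothetical names bits_needed, kMaxCodePoint and kUnusedBits chosen only for this sketch (C++11 assumed):

```cpp
#include <cstdint>

// Number of bits needed to represent value (0 for value == 0).
// Written recursively so it is a valid C++11 constexpr function.
constexpr unsigned bits_needed(std::uint32_t value) {
    return value == 0 ? 0u : 1u + bits_needed(value >> 1);
}

// Highest code point in the Unicode code space (end of plane 16).
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// Bits of a 32-bit UTF-32 code unit that can never be set.
constexpr unsigned kUnusedBits = 32u - bits_needed(kMaxCodePoint);

static_assert(bits_needed(kMaxCodePoint) == 21,
              "all code points fit in 21 bits");
static_assert(kUnusedBits == 11,
              "11 bits of each UTF-32 code unit are unused");
```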

Sebastian
