Subject: Re: [boost] [string] proposal
From: Patrick Horgan (phorgan1_at_[hidden])
Date: 2011-01-21 20:47:14
On 01/21/2011 09:50 AM, Beman Dawes wrote:
> ... elision by patrick ....
>
> IMO, Any serious Unicode string proposal has to address UTF-8 strings,
> UTF-16 strings, UTF-32 strings, and probably UTF strings where the
> particular UTF encoding is established at runtime. Applications that
> deal with Asian languages, do a lot of random access, or would pay a
> performance or storage penalty will demand more than just UTF-8
> strings. There might be other variants, too, such as a BMP-string. If
> a Unicode string library provides a strong design framework that is
> clearly articulated, then an initial implementation would only have to
> provide the most needed types; UTF-8 and UTF-16/BMP.
>
> I really doubt any proposal will get taken very seriously if it only
> supports one of the UTF encodings.
+1, with the caveat that many consider UTF-8 and UTF-32 to be the most
needed types and UTF-16 to be evil.  (Seems to be a Windows/non-Windows
split.  I like them all ;)  So all three (four if you want to
differentiate between fixed-width UTF-16/BMP, which is really UCS-2, and
the full UTF-16) would be needed to keep people from saying that it
doesn't fill their needs, so why did we bother.  A UTF string whose
encoding is established at run time would carry a lot of extra code,
though.  Wouldn't a programmer know at compile time which encoding he
wanted to use internally?
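
To make the compile-time versus run-time trade-off concrete, here is a rough sketch. Everything in it (the tag types, units_for, encoded_length, encoded_length_rt, runtime_encoding) is a hypothetical name made up for illustration and not part of any actual proposal, and it assumes the input holds only valid Unicode scalar values. It only counts code units, but it shows where the extra dispatch code for the run-time variant comes from.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoding tags, invented for this sketch.
struct utf8_tag  {};
struct utf16_tag {};
struct utf32_tag {};

// Code units needed to store one code point (assumes a valid scalar value).
inline std::size_t units_for(utf8_tag, char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}
inline std::size_t units_for(utf16_tag, char32_t cp) {
    return cp < 0x10000 ? 1 : 2;  // outside the BMP needs a surrogate pair
}
inline std::size_t units_for(utf32_tag, char32_t) {
    return 1;                     // always fixed width
}

// Compile-time choice: the encoding is a template parameter, so only the
// code for the encoding actually used gets instantiated.
template <class Encoding>
std::size_t encoded_length(const std::u32string& code_points) {
    std::size_t n = 0;
    for (char32_t cp : code_points) n += units_for(Encoding(), cp);
    return n;
}

// Run-time choice: every encoding has to be compiled in, and each
// operation needs a dispatch step like this one.
enum class runtime_encoding { utf8, utf16, utf32 };

inline std::size_t encoded_length_rt(runtime_encoding e,
                                     const std::u32string& code_points) {
    switch (e) {
        case runtime_encoding::utf8:  return encoded_length<utf8_tag>(code_points);
        case runtime_encoding::utf16: return encoded_length<utf16_tag>(code_points);
        case runtime_encoding::utf32: return encoded_length<utf32_tag>(code_points);
    }
    return 0;  // unreachable for valid enum values
}
```

With the template version a program that only ever uses UTF-8 never instantiates the UTF-16 or UTF-32 paths. The run-time version pulls in all three plus the switch, and a real string class would repeat that dispatch for every member function.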
Patrick
P.S. There is a nice quick description of the differences between, and
history of, UCS-2, UCS-4, UTF-8, UTF-16, and UTF-32 at
http://en.wikipedia.org/wiki/Universal_Character_Set