From: Daryle Walker (darylew_at_[hidden])
Date: 2004-04-25 13:14:30
On 4/13/04 3:27 PM, "Miro Jurisic" <macdev_at_[hidden]> wrote:
> In article <00c101c42167$8d0a7f40$1b440352_at_fuji>,
> "John Maddock" <john_at_[hidden]> wrote:
[SNIP]
>> However I think we're getting ahead of ourselves here:  I think a Unicode
>> library should be handled in stages:
>> 
>> 1) define the data types for 8/16/32 bit Unicode characters.
> 
> The fact that you believe this is a reasonable first step leads me to believe
> that you have not given much thought to the fact that even if you use a 32-bit
> Unicode encoding, a character can take up more than 32 bits (and likewise for
> 16-bit and 8-bit encodings). Unicode characters are not fixed-width data in any
> encoding. 
[TRUNCATE]
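Unicode code points fit in 21 bits (they run from U+0000 to U+10FFFF), so any
of them fits in a 32-bit value with room to spare.  The 8- and 16-bit forms
(UTF-8 and UTF-16) are just variable-length encodings of that same code-point
space.  We could base a Unicode string type solely on code points.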
It may be better to use abstract Unicode characters instead.  However, each
abstract character can be made up of a variable number of code points.  Worse,
there can be several ways of expressing the same abstract character (that's
why there are normalization standards).
Maybe we can have:
#include <cstddef>   // std::size_t
#include <cstdint>   // std::int_least32_t

struct unicode_code_point { std::int_least32_t c; };
struct unicode_code_point_traits { /* like char_traits */ };
struct unicode_abstract_character
{
    std::int_least32_t   main_char;     // can there be co-main characters?
    std::size_t          helper_count;  // length of following array
    std::int_least32_t  *helper_chars;  // dynamic array of combiners
};
struct unicode_abstract_character_traits
{ /* like char_traits, but much more complicated */ };
Recall that a character type used with std::basic_string must be POD, so all
the smarts have to go into the traits class.
-- 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT hotmail DOT com