From: Miro Jurisic (macdev_at_[hidden])
Date: 2004-10-19 12:58:55
In article <cl3hl9$g4e$1_at_[hidden]>, "Erik Wien" <wien_at_[hidden]> wrote:
> The basic idea I have been working around is to make an encoded_string 
> class templated on Unicode encoding types (i.e. UTF-8, UTF-16). This is made 
> possible through an encoding_traits class which contains all necessary 
> implementation details for working on strings of code units.
I generally agree with this design approach, but I don't think that code point 
iterators alone are sufficient. Iteration over encoded characters and abstract 
characters would be needed for some algorithms to function sensibly. For 
example, the simple task of:
find(begin, end, "ü")
needs to use abstract characters in order to be able to find precomposed and 
decomposed versions of ü.
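To illustrate the point, here is a minimal sketch, assuming strings are held as sequences of code points in std::u32string. The names contains_code_points, toy_compose and contains_abstract are hypothetical and are not part of the proposed library; toy_compose knows only the single pair u + U+0308, whereas real normalization would need the Unicode data tables.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// The same abstract character, spelled two ways at the code point level.
const std::u32string kPrecomposed(1, char32_t(0x00FC));               // U+00FC
const std::u32string kDecomposed = {char32_t(0x0075), char32_t(0x0308)};  // u + U+0308

// Plain search over code points: compares element by element.
bool contains_code_points(const std::u32string& haystack,
                          const std::u32string& needle) {
    return std::search(haystack.begin(), haystack.end(),
                       needle.begin(), needle.end()) != haystack.end();
}

// Toy composition (assumption: handles only u + U+0308 -> U+00FC).
std::u32string toy_compose(const std::u32string& in) {
    std::u32string out;
    for (char32_t cp : in) {
        if (cp == char32_t(0x0308) && !out.empty() && out.back() == char32_t(0x0075)) {
            out.back() = char32_t(0x00FC);  // merge base letter and combining mark
        } else {
            out.push_back(cp);
        }
    }
    return out;
}

// Search that treats both spellings as the same abstract character
// by composing both sides before comparing.
bool contains_abstract(const std::u32string& haystack,
                       const std::u32string& needle) {
    return contains_code_points(toy_compose(haystack), toy_compose(needle));
}
```

With a haystack that stores ü in decomposed form, contains_code_points fails to find kPrecomposed, while contains_abstract finds it regardless of which spelling either side uses.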
> You could use the encoded_string class like this:
> 
> // Constructor converts the ASCII string to UTF-16.
> encoded_string<utf16> some_string("Hello World");
> // Run some standard algorithm on the string:
> std::for_each(some_string.begin(), some_string.end(), do_some_operation);
Again, taking this example, let's say that do_some_operation performs 
canonicalization to some Unicode canonical form; you can't do this by iterating 
over code points.
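A rough sketch of why, again assuming code points stored in std::u32string: std::for_each hands the callback one code point at a time, but composing a decomposed ü needs to look at two adjacent code points and emit one. The helpers visits_per_code_point and compose_pair are hypothetical, and compose_pair covers only u + U+0308.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

const char32_t kLatinSmallU = 0x0075;
const char32_t kCombiningDiaeresis = 0x0308;
const char32_t kUWithDiaeresis = 0x00FC;

// What std::for_each offers: one call per code point, no lookahead,
// and no way to change the length of the sequence.
std::size_t visits_per_code_point(const std::u32string& s) {
    std::size_t visits = 0;
    std::for_each(s.begin(), s.end(), [&visits](char32_t) { ++visits; });
    return visits;
}

// What composition needs: inspect neighbouring code points and emit fewer.
// Toy version (assumption): knows only u + U+0308 -> U+00FC.
std::u32string compose_pair(const std::u32string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == kLatinSmallU && i + 1 < in.size() &&
            in[i + 1] == kCombiningDiaeresis) {
            out.push_back(kUWithDiaeresis);
            ++i;  // consume the combining mark as well
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

For a decomposed ü the callback is invoked twice, while the composed result holds a single code point, so the operation has to work on a larger unit than the one the iterator exposes.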
> I am aware that this implementation will be less than ideal for integration 
> with the current C++ standard, but it's issues like that I would like to get 
> deeper into during the development.
You should explain what problems with integration you foresee.
meeroh