From: Miro Jurisic (macdev_at_[hidden])
Date: 2004-10-19 12:58:55


In article <cl3hl9$g4e$1_at_[hidden]>, "Erik Wien" <wien_at_[hidden]> wrote:

> The basic idea I have been working on is to make an encoded_string
> class templated on Unicode encoding types (e.g. UTF-8, UTF-16). This is made
> possible through an encoding_traits class which contains all necessary
> implementation details for working on strings of code units.
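The following is a minimal sketch of what such traits might look like. The tag types `utf8`/`utf16`, the member `decode`, and the helper `to_code_points` are all hypothetical and are not part of any existing library. Each specialization knows how to turn code units into one code point, and generic code is written only against the traits. The sketch assumes well-formed input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical encoding tags and traits, modeled on the design quoted above.
struct utf8 {};
struct utf16 {};

template <class Encoding>
struct encoding_traits;  // primary template intentionally left undefined

template <>
struct encoding_traits<utf8> {
    using string_type = std::string;
    // Decode the code point starting at s[i] and advance i past its code units.
    // Assumes well-formed UTF-8; no validation is performed.
    static char32_t decode(const string_type& s, std::size_t& i) {
        unsigned char lead = static_cast<unsigned char>(s[i++]);
        if (lead < 0x80) return lead;                      // 0xxxxxxx: ASCII
        int extra = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
        char32_t cp = lead & (0x3F >> extra);              // payload bits of lead byte
        for (int k = 0; k < extra; ++k)                    // 10xxxxxx continuation bytes
            cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
        return cp;
    }
};

template <>
struct encoding_traits<utf16> {
    using string_type = std::u16string;
    // Assumes well-formed UTF-16 (no unpaired surrogates).
    static char32_t decode(const string_type& s, std::size_t& i) {
        char32_t lead = s[i++];
        if (lead >= 0xD800 && lead <= 0xDBFF) {            // high surrogate
            char32_t trail = s[i++];                       // low surrogate
            return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00);
        }
        return lead;
    }
};

// Generic code that depends only on the traits: decode a whole string.
template <class Encoding>
std::u32string to_code_points(const typename encoding_traits<Encoding>::string_type& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size();)
        out.push_back(encoding_traits<Encoding>::decode(s, i));
    return out;
}
```

A code point iterator over `encoded_string<E>` could call `encoding_traits<E>::decode` in its `operator*`/`operator++` in the same way.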

I generally agree with this design approach, but I don't think that code point
iterators alone are sufficient. Iteration over encoded characters and abstract
characters would be needed for some algorithms to function sensibly. For
example, the simple task of:

find(begin, end, "ü")

needs to use abstract characters in order to find both the precomposed
(U+00FC) and the decomposed ("u" followed by U+0308 COMBINING DIAERESIS)
versions of ü.
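To make the find example concrete, here is a small sketch over code points (`std::u32string`). `toy_decompose`, `contains_code_points` and `contains_canonical` are hypothetical helpers, and the decomposition table covers only ü; real canonical decomposition needs the full Unicode data tables plus canonical reordering of combining marks.

```cpp
#include <algorithm>
#include <string>

// Toy canonical decomposition covering one character only:
// U+00FC (u with diaeresis) -> U+0075 'u' + U+0308 COMBINING DIAERESIS.
std::u32string toy_decompose(const std::u32string& s) {
    std::u32string out;
    for (char32_t c : s) {
        if (c == U'\u00FC') {
            out.push_back(U'u');
            out.push_back(U'\u0308');
        } else {
            out.push_back(c);
        }
    }
    return out;
}

// Plain code point search: compares code points one for one.
bool contains_code_points(const std::u32string& hay, const std::u32string& needle) {
    return std::search(hay.begin(), hay.end(), needle.begin(), needle.end()) != hay.end();
}

// Search after bringing both sides to the same (toy) decomposed form.
bool contains_canonical(const std::u32string& hay, const std::u32string& needle) {
    return contains_code_points(toy_decompose(hay), toy_decompose(needle));
}
```

With a decomposed haystack `U"Mu\u0308ller"`, `contains_code_points` does not find `U"\u00FC"`, while `contains_canonical` does. Note that `contains_canonical(U"Mu\u0308ller", U"u")` also returns true: the needle matches only the first half of a decomposed ü. Normalizing is therefore not enough on its own, and the search also has to respect abstract character boundaries.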

> You could use the encoded_string class like this:
>
> // Constructor converts the ASCII string to UTF-16.
> encoded_string<utf16> some_string("Hello World");
> // Run some standard algorithm on the string:
> std::for_each(some_string.begin(), some_string.end(), do_some_operation);

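Again, taking this example, let's say that do_some_operation performs
canonicalization to some Unicode canonical form; you can't do this by iterating
over code points one at a time, because normalization has to compose, decompose
and reorder sequences that span several code points.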
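As an illustration, the callback given to std::for_each sees a single element and cannot look at its neighbour or change the number of elements, yet composing "u" + U+0308 into U+00FC needs two adjacent code points. The sketch below shows a composition pass with one code point of lookahead. `toy_compose_pair` and `toy_compose` are hypothetical, cover one pair only, and omit the canonical reordering, composition exclusions and blocking rules that real NFC (UAX #15) requires.

```cpp
#include <cstddef>
#include <string>

// Toy canonical composition table with a single entry:
// U+0075 'u' + U+0308 COMBINING DIAERESIS -> U+00FC.
// Returns 0 when the pair does not compose.
char32_t toy_compose_pair(char32_t base, char32_t mark) {
    if (base == U'u' && mark == U'\u0308') return U'\u00FC';
    return 0;
}

// Composition needs to see the next code point, so it walks the sequence
// with lookahead instead of transforming each element in isolation.
std::u32string toy_compose(const std::u32string& s) {
    std::u32string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i + 1 < s.size()) {
            if (char32_t c = toy_compose_pair(s[i], s[i + 1])) {
                out.push_back(c);
                ++i;  // two input code points became one output code point
                continue;
            }
        }
        out.push_back(s[i]);
    }
    return out;
}
```

For instance, `toy_compose(U"Mu\u0308ller")` yields `U"M\u00FCller"`, one code point shorter than its input, which a per-element operation cannot produce.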

> I am aware that this implementation will be less than ideal for integration
> with the current C++ standard, but it's issues like that I would like to get
> deeper into during the development.

You should explain what problems with integration you foresee.

meeroh