From: Sundell Software (sundell.software_at_[hidden])
Date: 2005-03-18 15:55:16


On Fri, 18 Mar 2005 16:34:24 -0300, Felipe Magno de Almeida
<felipe.almeida_at_[hidden]> wrote:
> reference counting optimization and maybe others, where there is.

And that it is already a part of the standard. That means less code
duplication and an existing, stable implementation. Dunno if the best
thing would be to make a Unicode string class privately inherit from
basic_string, or perhaps to do everything through iterators.

Each of UTF-8/16/32 has its own iterator type, but all of them output
UTF-32 when dereferenced. Look at
std::istream_iterator/std::ostream_iterator for the design. There would
probably be helper functions for the most common tasks, and I think you
should be able to do all the necessary tasks with just iterators.

typedef basic_string<utf_8> ustring8;
typedef basic_string<utf_16> ustring16;

ustring8 u8;
ustring16 u16;

// Would probably make .begin() the default.
unicode_iterator i8(u8, u8.begin());

// This would be a slow way of doing operator[]. The assignment would
// insert/remove elements from the basic_string if necessary.
// (std::advance returns void and needs an lvalue, so use a named iterator.)
unicode_iterator i16(u16, u16.begin());
std::advance(i16, 5);
*i16 = *(i8++);

Note that the client is responsible for giving a valid iterator to
unicode_iterator.
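
To make the read side of the iterator idea a bit more concrete, here is
a minimal sketch of a UTF-8 iterator that yields UTF-32 on dereference.
The names utf8_reader and decode_utf8 are made up for illustration, it
works on a plain std::string of bytes instead of basic_string<utf_8>, it
assumes well-formed input, and it leaves out the writing side
completely. It is not a proposed interface, just one way it could look.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical read-only iterator: walks a UTF-8 byte string and yields
// one UTF-32 code point per dereference. Assumes well-formed input.
class utf8_reader {
public:
    utf8_reader(const std::string& s, std::size_t pos) : s_(&s), pos_(pos) {}

    // Decode the code point that starts at the current byte offset.
    char32_t operator*() const {
        const unsigned char lead = byte(pos_);
        const std::size_t n = length(lead);
        if (n == 1) return lead;
        // Keep the payload bits of the lead byte (5, 4 or 3 of them).
        char32_t cp = lead & (0xFFu >> (n + 1));
        // Each continuation byte contributes its low 6 bits.
        for (std::size_t i = 1; i < n; ++i)
            cp = (cp << 6) | (byte(pos_ + i) & 0x3Fu);
        return cp;
    }

    // Step over the whole encoded sequence, not just one byte.
    utf8_reader& operator++() {
        pos_ += length(byte(pos_));
        return *this;
    }

    bool operator!=(const utf8_reader& other) const { return pos_ != other.pos_; }

private:
    unsigned char byte(std::size_t i) const {
        return static_cast<unsigned char>((*s_)[i]);
    }

    // Sequence length in bytes, taken from the lead byte.
    static std::size_t length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x6) return 2;
        if ((lead >> 4) == 0xE) return 3;
        return 4;
    }

    const std::string* s_;
    std::size_t pos_;
};

// Helper for a common task: decode a whole string into UTF-32.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    const utf8_reader end(s, s.size());
    for (utf8_reader it(s, 0); it != end; ++it)
        out.push_back(*it);
    return out;
}
```

A UTF-16 reader would have the same shape with a different length and
decode step, which is why client code written against the iterators
would not need to care about the storage encoding.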

BTW, is using UTF-8/16 in the container really cheaper overall than
UTF-32? If the client changes a character and the new one happens to
encode to a larger/smaller size, then all the elements behind it need to
be moved. Does that happen rarely enough? Though the client should
probably know that themselves.
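
A rough sketch of that cost for UTF-8 kept in a std::string. The
function names are made up for illustration, and it assumes pos points
at the first byte of a well-formed sequence. The return value is the
number of tail bytes that have to shift when the encoded size changes,
which is the part UTF-32 storage would never pay for.

```cpp
#include <cstddef>
#include <string>

// Encode one code point as UTF-8.
// Assumes cp <= 0x10FFFF and that cp is not a surrogate.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Byte length of the sequence that starts with this lead byte.
std::size_t sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;
    if ((lead >> 5) == 0x6) return 2;
    if ((lead >> 4) == 0xE) return 3;
    return 4;
}

// Replace the code point that starts at byte offset pos with cp.
// Returns how many bytes after it had to move: 0 when the old and new
// encodings have the same size, otherwise the length of the whole tail.
std::size_t replace_code_point(std::string& s, std::size_t pos, char32_t cp) {
    const std::size_t old_len =
        sequence_length(static_cast<unsigned char>(s[pos]));
    const std::string enc = encode_utf8(cp);
    const std::size_t tail = s.size() - (pos + old_len);
    s.replace(pos, old_len, enc);  // shifts the tail if the sizes differ
    return enc.size() == old_len ? 0 : tail;
}
```

So the move is linear in the distance to the end of the string, and it
only happens when the replacement crosses an encoding-size boundary.
Whether that is rare enough depends on the text the client works with.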

Rakshasa