From: Rogier van Dalen (rogiervd_at_[hidden])
Date: 2004-10-22 15:38:30


On Fri, 22 Oct 2004 12:46:00 -0400 (EDT), Rob Stewart <stewart_at_[hidden]> wrote:
> From: Rogier van Dalen <rogiervd_at_[hidden]>
> >
> > unicode::string should take a unicode::character for appending. A
> > unicode::character object may be constructed with a single codepoint,
> > which will be its base character. If this codepoint is invalid, it
> > should throw. If the codepoint is a combining mark, it should also
> > throw.
> > unicode::correct() should convert an invalid codepoint into U+FFFD,
> > and if it is input a combining mark, it should use U+0020 SPACE as a
> > base character.
>
> Why not have unicode::character's ctor invoke unicode::correct()?

unicode::correct() replaces every encoding error in the input with a
replacement character. That loses information, and the loss is not
recoverable. Supplying a base character for a stray combining mark is
only slightly better. When I proposed this as a policy I called it
workaround_encoding_error; maybe we need a better name than "correct".

I agree with Peter Dimov, however, that the default should be to throw
rather than to throw away information and pretend nothing happened.
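
To make the difference concrete, here is a minimal sketch of the two
behaviours, not a proposed interface. Everything in it is an
assumption for illustration: the namespace unicode_sketch, the
codepoint typedef, add_mark(), and above all is_combining_mark(),
which only recognises U+0300..U+036F instead of consulting the real
Unicode character database.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace unicode_sketch {  // hypothetical namespace, not the proposed library

using codepoint = std::uint32_t;

// A codepoint is valid if it is at most U+10FFFF and not a surrogate.
inline bool is_valid(codepoint cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Simplifying assumption: only the Combining Diacritical Marks block
// (U+0300..U+036F) counts as combining. A real implementation would
// look this up in the Unicode character database.
inline bool is_combining_mark(codepoint cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

class character {
public:
    // Strict default: refuse anything that cannot be a base character.
    explicit character(codepoint base) : base_(base) {
        if (!is_valid(base))
            throw std::invalid_argument("invalid codepoint");
        if (is_combining_mark(base))
            throw std::invalid_argument("combining mark used as base");
    }

    // Hypothetical helper for attaching a combining mark to the base.
    void add_mark(codepoint mark) {
        if (!is_valid(mark) || !is_combining_mark(mark))
            throw std::invalid_argument("not a combining mark");
        marks_.push_back(mark);
    }

    codepoint base() const { return base_; }
    const std::vector<codepoint>& marks() const { return marks_; }

private:
    codepoint base_;
    std::vector<codepoint> marks_;
};

// Lossy workaround: never throws for bad input, but the original
// invalid value cannot be recovered from the result.
inline character correct(codepoint cp) {
    if (!is_valid(cp))
        return character(0xFFFD);      // U+FFFD REPLACEMENT CHARACTER
    if (is_combining_mark(cp)) {
        character c(0x0020);           // U+0020 SPACE as base character
        c.add_mark(cp);
        return c;
    }
    return character(cp);
}

}  // namespace unicode_sketch
```

With this sketch, character(0xD800) and character(0x0301) both throw,
while correct(0xD800) quietly yields U+FFFD and correct(0x0301) yields
a space carrying the mark. The caller has to ask for the lossy
behaviour explicitly, which is the point of keeping it out of the
constructor.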

Regards,
Rogier