Subject: Re: [boost] [Locale] Preview of 3rd version
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-09-11 15:57:25
On 11/09/2010 20:34, Artyom wrote:
> Ahh I see, I do the following:
>
> When I read, for example, 4 bytes of UTF-8 that decode to a code point > 0xFFFF,
> I do the following:
>
> 1. I write the first surrogate of the pair to the output stream,
> update the state to reflect that the first half of the pair was written, and
> **I do not consume input**.
> 2. I read the same 4 UTF-8 bytes again, see that the state is marked to show
> that the first half of the pair was written, so I write the second half and
> consume the input.
>
> So do_in is actually called twice for the same input.
The code in question is in a loop that keeps going until from reaches
from_end or the conversion fails (due to insufficient input or
otherwise), so both surrogates should be written in the same do_in
invocation.
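
To illustrate the point about the loop, here is a minimal sketch of a
UTF-8 to UTF-16 conversion that emits both surrogates in one pass. It is
not the Boost.Locale code: utf8_to_utf16 and ConvResult are hypothetical
names, the from/from_end/to/to_end parameters only mimic the shape of
do_in, and overlong-encoding checks are left out for brevity. It assumes
the output has room for both code units; the case of a one-unit output
buffer, which is what the state trick addresses, is not handled here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type, loosely mirroring std::codecvt_base::result.
enum class ConvResult { ok, partial, error };

// Hypothetical helper: converts UTF-8 to UTF-16, advancing `from` and `to`.
// Input is consumed only after all code units for a code point are written.
ConvResult utf8_to_utf16(const char*& from, const char* from_end,
                         char16_t*& to, char16_t* to_end)
{
    while (from != from_end) {
        unsigned char lead = static_cast<unsigned char>(*from);
        std::size_t len;
        std::uint32_t cp;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return ConvResult::error;

        // Insufficient input: stop without consuming the partial sequence.
        if (static_cast<std::size_t>(from_end - from) < len)
            return ConvResult::partial;

        for (std::size_t i = 1; i < len; ++i) {
            unsigned char c = static_cast<unsigned char>(from[i]);
            if ((c & 0xC0) != 0x80) return ConvResult::error;
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return ConvResult::error;

        // Code points above 0xFFFF need two UTF-16 code units.
        std::size_t units = (cp > 0xFFFF) ? 2 : 1;
        if (static_cast<std::size_t>(to_end - to) < units)
            return ConvResult::partial;

        if (units == 1) {
            *to++ = static_cast<char16_t>(cp);
        } else {
            cp -= 0x10000;
            *to++ = static_cast<char16_t>(0xD800 + (cp >> 10));   // high surrogate
            *to++ = static_cast<char16_t>(0xDC00 + (cp & 0x3FF)); // low surrogate
        }
        from += len; // consume input once, after both halves are written
    }
    return ConvResult::ok;
}
```
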
> Actually the mbstate_t is a POD type that should be initialized to 0. I must
> make sure that sizeof(mbstate_t) >= 2, and then I use it as temporary storage
> for state.
I'm not talking about that; I meant the reinterpret casting between
uchar and uint_type. I suppose they're actually the same type, maybe just
with different signedness, so that should be somewhat OK.
It's still not allowed by the strict aliasing rules though.
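
For what it's worth, one common way to sidestep the aliasing question is
to copy the bytes with std::memcpy instead of dereferencing a
reinterpret_cast'ed pointer. The sketch below is only an illustration:
load_unit is a hypothetical helper, and char16_t and std::uint16_t stand
in for uchar and uint_type, whose real definitions are not shown in this
thread. Whether the cast under discussion is actually a violation depends
on those definitions.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: read one code unit as an unsigned integer without
// accessing the object through a pointer of an unrelated type.
inline std::uint16_t load_unit(const char16_t* p)
{
    std::uint16_t v;
    static_assert(sizeof(v) == sizeof(*p), "code unit sizes must match");
    std::memcpy(&v, p, sizeof v); // byte-wise copy; no aliasing access
    return v;
}
```

Compilers typically turn a fixed-size memcpy like this into a plain
load, so it should not cost anything compared with the cast.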