Subject: Re: [boost] [nowide] Library Updates and Boost's broken UTF-8 codecvt facet
From: Robert Ramey (ramey_at_[hidden])
Date: 2015-10-08 12:07:26
On 10/8/15 7:54 AM, Artyom Beilis wrote:
> ----- Original Message -----
>
> [BEGIN: Long description regarding <codecvt> ]
>
...
> So... Boost community - please do yourself a favor: don't use <codecvt> unless you really
> understand what you are doing.
Well, I use <codecvt> and boost::utf8_codecvt and I definitely don't 
know what I'm doing.  That (and the fact that I don't have any extra 
time) is the reason for using a library in the first place.
The whole locale/facet/codecvt saga is long and very difficult to 
fathom.  To make things worse, it has a tortured history of library 
writers not getting it right.  If one looks at the utf_codecvt facet, 
there are lots of workarounds for older compilers and libraries.  So it's 
high time this was rationalized.  I think the concept has merit and would 
do well with a good library and educational documentation to match.
>
> [END: Long description regarding <codecvt> ]
>
>
> If you want to convert utf8 files properly to native wide characters, for example for boost::filesystem,
> boost::serialization or std::fstream, you need to use a facet that converts to utf-16 or utf-32
> according to what wchar_t holds, and <codecvt> does not provide one (without platform-specific tricks)
I see that, but we could easily select the codecvt facet depending on 
the size of wchar_t on the specific platform.  I dislike libraries 
which do "too much" in order to "just" work. A codecvt library should be
a) a tool kit to create codecvt facets
b) some generated examples which will cover what most users need
c) a bunch of tutorial information about how codecvt can be used - 
especially outside of stream i/o
d) anything else which is useful.
Note I'm aware that this is a huge task to do right - I certainly 
wouldn't blame anyone for not taking it on.
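To illustrate what selecting the facet by wchar_t size could look like, here is a minimal sketch. The names native_utf8_facet and utf8_to_wide are hypothetical and appear in no library; the sketch uses only the standard <codecvt> facets and std::wstring_convert, which are deprecated since C++17, and it says nothing about whether those facets handle every edge case correctly.

```cpp
#include <codecvt>      // std::codecvt_utf8, std::codecvt_utf8_utf16 (deprecated since C++17)
#include <locale>       // std::wstring_convert
#include <string>
#include <type_traits>  // std::conditional

// Hypothetical alias: choose the UTF-8 facet that matches the width of wchar_t.
//   2-byte wchar_t (e.g. Windows)    -> UTF-8 <-> UTF-16
//   wider wchar_t (e.g. most POSIX)  -> UTF-8 <-> UCS-4/UTF-32
using native_utf8_facet = std::conditional<
    sizeof(wchar_t) == 2,
    std::codecvt_utf8_utf16<wchar_t>,
    std::codecvt_utf8<wchar_t>
>::type;

// Hypothetical helper: convert UTF-8 bytes to the platform's wide string.
// std::wstring_convert::from_bytes throws std::range_error on malformed input.
inline std::wstring utf8_to_wide(const std::string& utf8)
{
    std::wstring_convert<native_utf8_facet, wchar_t> conv;
    return conv.from_bytes(utf8);
}
```

The selection happens entirely at compile time, so callers only ever see utf8_to_wide and never name a platform-specific facet.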
>
> So I'm not going to implement C++11 <codecvt> because IMHO it is broken by design in the first
> place.
Hmm - I'd have to think more about this.  If <codecvt> is ill-conceived 
- I'm sure one could propose an alternative.
>
> Boost.Locale provides one but currently it is a deep internal and complex part of the library.
Hmmm - very interesting.  Maybe it's a question of factoring out this 
part and repackaging it in a more digestible form.  That would be 
interesting.
> The code I have written for Boost.Nowide, or the one I suggest putting into the header-only part of Boost.Locale,
> is a codecvt that converts between utf8 and utf-16/32 according to the size of the character type:
> boost::(nowide|or locale)::utf8_facet<wchar_t> - utf-8 to utf-16 (windows) utf-32 (posix)
> boost::(nowide|or locale)::utf8_facet<char16_t> - utf-8 to utf-16 on any platform
> boost::(nowide|or locale)::utf8_facet<char32_t> - utf-8 to utf-32 on any platform
>
> That's it. It isn't <codecvt> because C++11 <codecvt> does not actually do the job needed.
I'll have to take your word for it.
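For readers who want to see what "utf-8 to utf-16/32 according to size of character" amounts to, the following is a rough sketch of the conversion logic only. It is not the Boost.Nowide or Boost.Locale code and it is not a codecvt facet; utf8_to_utf is a hypothetical free function that decodes a complete UTF-8 string and emits UTF-16 when the target character type is 2 bytes wide and UTF-32 when it is 4 bytes wide.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 and re-encode by the width of CharT.
template <typename CharT>
std::basic_string<CharT> utf8_to_utf(const std::string& in)
{
    static_assert(sizeof(CharT) == 2 || sizeof(CharT) == 4,
                  "CharT must hold UTF-16 or UTF-32 code units");
    // Smallest code point that legitimately needs a sequence of each length.
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::basic_string<CharT> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)              { cp = lead;        len = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogate code points and out-of-range values.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point in UTF-8 input");
        i += len;

        if (sizeof(CharT) == 2 && cp >= 0x10000) {
            // UTF-16: split into a surrogate pair.
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<CharT>(0xD800 + (v >> 10)));
            out.push_back(static_cast<CharT>(0xDC00 + (v & 0x3FF)));
        } else {
            // UTF-32, or a UTF-16 code point inside the Basic Multilingual Plane.
            out.push_back(static_cast<CharT>(cp));
        }
    }
    return out;
}
```

With this sketch, utf8_to_utf<wchar_t> yields UTF-16 where wchar_t is 2 bytes and UTF-32 where it is 4 bytes, while utf8_to_utf<char16_t> and utf8_to_utf<char32_t> behave the same on every platform. A real facet would additionally have to cope with input arriving in partial chunks, which this whole-string version sidesteps.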
>
>
>
> Artyom Beilis
>