Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Matus Chochlik (chochlik_at_[hidden])
Date: 2011-01-20 03:59:51
On Wed, Jan 19, 2011 at 8:50 PM, Chad Nelson
<chad.thecomfychair_at_[hidden]> wrote:
>
> Do you see another way to provide those conversions, and automatic
> verification of proper UTF coding? (Automatic verification is a very
> good thing, without it someone won't use it or will forget to, and open
> up their programs to exploitation.)
Yes: by implementing it in std::string in some future standard.
>
> If Boost comes out with a version that breaks existing programs,
> companies just won't upgrade to it. I can keep one of the companies
> that mine works with upgrading, because the group that I work with is
> the only one there using C++ and they listen to me, but most companies
> have a lot more invested in the existing system. Believe me, any
> breaking changes have to be eased in over many versions -- the "boiling
> a frog" approach. :-)
Of course this is a valid point, and what we should do is evaluate
the potential damage. There have been breaking changes in Boost
before, and the end-users eventually accepted them (even if complaining
loudly). Boost is a cutting-edge library; such changes should
be avoided where possible, but they should not be ruled out completely.
This would require a lot of PR and announcing the changes well
in advance.
>
> If they're already using UTF-8 strings, then we provide something like
> BOOST_ALL_STD_STRINGS_ARE_UTF8 that they can define. The utf*_t classes
> configure themselves to accept std::strings as UTF-8-encoded, and any
> changes are completely transparent to those people. No punishment
> involved.
OK, this could work.
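A rough sketch of how such an opt-in switch might look. Only the macro name BOOST_ALL_STD_STRINGS_ARE_UTF8 comes from the proposal quoted above; the class layout, the implicit-versus-explicit constructor choice, and the bytes() accessor are assumptions made for illustration, and validation of the input is left out entirely.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the proposed utf8_t class; not its real interface.
class utf8_t {
public:
#ifdef BOOST_ALL_STD_STRINGS_ARE_UTF8
    // Opt-in: every std::string is assumed to hold UTF-8 already,
    // so the conversion is implicit and existing call sites keep compiling.
    utf8_t(std::string s) : bytes_(std::move(s)) {}
#else
    // Default: the caller has to spell out the conversion.
    explicit utf8_t(std::string s) : bytes_(std::move(s)) {}
#endif

    // Hypothetical accessor for the underlying UTF-8 bytes.
    const std::string& bytes() const { return bytes_; }

private:
    std::string bytes_;
};

// True only when the opt-in macro was defined before this point.
constexpr bool std_string_converts_implicitly =
    std::is_convertible<std::string, utf8_t>::value;
```

With the macro defined, a function taking utf8_t could be called with a plain std::string; without it, the caller would have to write utf8_t(s) explicitly.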
>
> For everyone else, we introduce the utf*_t API alongside the
> std::string one, for those classes and functions that are not
> encoding-agnostic. The std::string one can be deprecated in future
> versions if the library author desires. Again, no punishment involved.
>
>
> I don't expect that the utf*_t classes will make it into the standard.
> They definitely won't make it into the now-misnamed C++0x standard, and
> it'll likely be another ten years before another one is hashed out --
> by then, the UTF-8 conversion should be complete, so there will be no
> need for it, except possibly to confirm that a string isn't malformed.
>
>>
>> Besides the ugly name and that is a new class ? No :)
>
> If you can think of a more-acceptable-but-still-descriptive name for
> it, I'm all ears. :-)
I have an idea: what about boost::string, which could possibly become
the next std::string in the future?
>> And the solution is long overdue. And creating utf8_t is just putting
>> the problem away, not solving it really.
>
> I see it as merely easing the transition.
OK, if the long term plan is:
1) design and implement boost::string using UTF-8 doing all the things
like code-point iteration, character iteration, convenience stuff like
starts-with, ends-with, replace, trim, etc., etc. with as much backward
compatibility with std::string as possible without hindering progress
2) try really hard to push it to the standard
then I'm on board with that.
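To make point 1 a bit more concrete, here is a minimal sketch of two of the listed operations, code-point iteration and starts-with, written against a plain std::string that is assumed to hold UTF-8. The names decode_utf8 and utf8_starts_with are hypothetical and do not belong to any existing Boost or standard interface; a real boost::string would presumably expose iterators rather than filling a vector.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes a UTF-8 std::string into code points.
// Returns false on malformed input (invalid lead byte, truncated sequence,
// bad continuation byte, overlong form, surrogate, or value above U+10FFFF).
inline bool decode_utf8(const std::string& s, std::vector<char32_t>& out) {
    out.clear();
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        char32_t min_cp = 0;  // smallest value allowed for this length
        if (lead < 0x80)                { cp = lead;        len = 1; min_cp = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (s.size() - i < len) return false;  // sequence cut off at the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;  // overlong, out of range, or surrogate
        out.push_back(cp);
        i += len;
    }
    return true;
}

// Hypothetical helper: byte-wise prefix test. For two valid UTF-8 strings
// this agrees with a code-point prefix test, because no encoded code point
// is a proper prefix of another.
inline bool utf8_starts_with(const std::string& s, const std::string& prefix) {
    return s.size() >= prefix.size() &&
           s.compare(0, prefix.size(), prefix) == 0;
}
```

The same decoding loop doubles as the "confirm that a string isn't malformed" check mentioned earlier in the thread, since it rejects anything that is not well-formed UTF-8.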
BR,
Matus