Boost mailing page: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]

Subject: Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]
From: Patrick Horgan (phorgan1_at_[hidden])
Date: 2011-01-21 18:36:12

In reply to: Matus Chochlik: "Re: [boost] [general] What will string handling in C++ look like in the future [was Always treat ... ]"

On 01/21/2011 01:54 AM, Matus Chochlik wrote:
> ... elision by patrick...
> Why not boost::string (explicitly stating in the docs that it is UTF-8-based) ?
> the name u8string suggests to me that it is meant for some special case
> of character encoding and the (encoding agnostic/native) std::string
> is still the way
> to go.

I think that's the truth. std::string has some performance guarantees
that a UTF-8 based string wouldn't be able to keep (constant-time
indexing by character, for one). People do things with std::string
that a UTF-8 based string can't do. If you set LC_COLLATE to
en_US.utf8 or the equivalent (I hate that locale names are not better
standardized), then the locale-aware parts of the standard library,
such as collation, will take the string's encoding into account. By
switching locales, you can then operate on strings in other encodings.
A utf8_string isn't intended to operate like that. It's specialized.
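
As a rough illustration of the locale point, here is a minimal sketch of
comparing plain std::strings through the collate facet of whichever
locale you pick. The helper names (locale_or_classic, collate_less,
collate_demo) are made up for this example, and the locale name
"en_US.utf8" is an assumption that may not exist on a given platform,
which is why the sketch falls back to the classic "C" locale.

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Try to construct the named locale; fall back to the classic "C" locale
// if the platform does not provide it (locale names are not portable).
inline std::locale locale_or_classic(const char* name) {
    try {
        return std::locale(name);
    } catch (const std::runtime_error&) {
        return std::locale::classic();
    }
}

// Compare two std::strings using the collate facet of the given locale.
// The strings themselves stay plain byte sequences; only the locale
// decides how those bytes are interpreted for ordering.
inline bool collate_less(const std::string& a, const std::string& b,
                         const std::locale& loc) {
    const std::collate<char>& coll = std::use_facet<std::collate<char> >(loc);
    return coll.compare(a.data(), a.data() + a.size(),
                        b.data(), b.data() + b.size()) < 0;
}

// Small self-check; call it from main() to exercise the helpers.
inline void collate_demo() {
    // In the classic locale the comparison is by byte value,
    // so 'Z' (0x5A) sorts before 'a' (0x61).
    assert(collate_less("Zebra", "apple", std::locale::classic()));

    // Hypothetical locale name; silently degrades to "C" if unavailable.
    const std::locale loc = locale_or_classic("en_US.utf8");
    assert(collate_less("a", "b", loc));
}
```

The same std::string type works with either locale; switching the
std::locale object is all it takes to change how the bytes are ordered.
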
> IMO we should send the message that UTF-8 is
> "normal"/"(semi-)standard"/"de-facto-standard"
> and the other encodings like the native_t (or even ansi_t,
> ibm_cp_xyz_t, string16_t,
> string32_t, ...) are the special cases and they should be treated as such.
Why would people want to lose so much of the functionality of
std::string? The only advantage of a utf8_string would be automatic and
continual verification that it's a valid utf-8 encoded string that
otherwise acts as much as possible like a std::string. For that you
would give up a lot of other functionality.
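
To make the trade-off concrete, here is a minimal sketch of what such a
validating wrapper might look like. It is not an existing Boost or
standard class; the member names (bytes, size_in_bytes,
length_in_code_points) and the helper is_valid_utf8 are assumptions made
up for this example. It checks validity once at construction and shows
that counting code points becomes a linear scan, where
std::string::size() is constant time.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Return true if s is well-formed UTF-8. Rejects overlong encodings,
// UTF-16 surrogates and code points above U+10FFFF.
inline bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0, min = 0;
        if (c < 0x80) { ++i; continue; }                       // ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
        else return false;                                     // bad lead byte
        if (n - i < len) return false;                         // truncated
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;             // not a continuation
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF) return false;           // overlong / too big
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;        // surrogate
        i += len;
    }
    return true;
}

// Hypothetical wrapper: holds bytes that were validated at construction.
class utf8_string {
public:
    explicit utf8_string(std::string bytes) : bytes_(std::move(bytes)) {
        if (!is_valid_utf8(bytes_))
            throw std::invalid_argument("not valid UTF-8");
    }
    const std::string& bytes() const { return bytes_; }
    std::size_t size_in_bytes() const { return bytes_.size(); }  // O(1)
    // O(n): count every byte that is not a continuation byte.
    std::size_t length_in_code_points() const {
        std::size_t count = 0;
        for (std::size_t i = 0; i < bytes_.size(); ++i)
            if ((static_cast<unsigned char>(bytes_[i]) & 0xC0) != 0x80) ++count;
        return count;
    }
private:
    std::string bytes_;
};
```

Note that the sketch only exposes read access. Any mutating operation
(insert, replace, operator[] returning a reference) would have to
re-validate or be dropped, which is the loss of functionality described
above.
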

Patrick