Subject: Re: [boost] [strings][unicode] Proposals for Improved String Interoperability in a Unicode World
From: Daryle Walker (darylew_at_[hidden])
Date: 2012-01-31 03:57:58
----------------------------------------
> Date: Mon, 30 Jan 2012 00:24:30 -0800
> From: Artyom
>
> ----- Original Message -----
> > From: Beman Dawes <bdawes_at_[hidden]>
> >
> >> What probably should be done is that compilers should be compelled to
> >> support UTF-8 as the source character set in a unified way.
> >
> > Makes sense to me.
> >
> > Why don't you write up an issue for the C and C++ committees? My
> >
> > [snip]
> >
> > Another possibility is to start lobbying compiler vendors, or at least
> > Microsoft, to support UTF-8 both with and without BOM.
> >
>
> It is not only BOM not BOM issue. It is mostly the ability
> to define execution character set. i.e. character set for
> normal "some text" literals and the input character set
> and what is even more important that C++ compilers must
> support UTF-8 for the two of them.
This probably isn't the right post to respond to, but I don't want to spend forever figuring it out.
Not every system is an 8/16/32(/64)-bit computer using ASCII/Latin-1/UTF-8. C++ (from C) was designed so that a user with a 9/36/81-bit EBCDIC system and one with an 8/16/32/64-bit UTF-16 system can each write programs for the other (with the appropriate cross-compiler). We don't want to be obnoxiously prejudiced against systems that don't match current configuration trends.
(I was originally going to write "9/36/72", but then realized that the larger types only have to be a multiple of char, not of each other, so my new system breaks more common-programmer assumptions. BTW, that's 9-bit bytes (char), 36-bit words (short and int), and 81-bit long-words (long and long long). I wonder if anyone here can fabricate this custom hardware, to mess people up.)
Daryle W.