Subject: Re: [boost] [General] Always treat std::strings as UTF-8
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-01-14 10:27:49


>
> -1
>
> I'm opposed to this strategy simply because it differs from the way
> existing libraries treat narrow strings. Not least the STL. If you open
> an fstream with a narrow filename, for instance, this isn't treated as a
> UTF-8 string. It's treated as being in the local codepage.
>

First of all, in neither C++03 nor C++0x can you open a file stream
with a wide file name. MSVC provides a non-standard extension, but it
does not exist in other compilers like GCC/MinGW.

So using C++ you can't open a file called
"שלום-سلام-pease-Мир.txt" under Microsoft Windows.

You can use an OS-level API like _wfopen to do this job using a wide
string. But you can't do this in C++. Period.

The idea is the following:

1. Provide a replacement for the system libraries that actually use text
   and treat it as text in some encoding.

   For the STL and the standard C library that would be the filesystem
   API. So you need to provide something like boost::filesystem::fstream.

2. Make all Boost libraries use the Wide API only and never call the
   ANSI API.

3. Treat narrow strings as UTF-8 and convert them to wide strings prior
   to system calls.

>
> While this behaviour isn't great, it is standard.
>

If the standard is bad and leads to unportable, platform-incompatible
code, it should not be used!

You can always provide a fallback like boost::utf8_to_locale_encoding
if you have to use the ANSI API.

But generally you should just use something like boost::utf8_to_utf16
and always call the Wide API.

You must not use the ANSI API under Windows.

Artyom
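
For illustration, the conversion in step 3 could look roughly like the sketch below. It is a minimal, portable example and not an existing Boost function: the namespace `sketch` and the helper name `utf8_to_utf16` are hypothetical, and it assumes a C++0x compiler that provides `char16_t` and `std::u16string`. Malformed input is rejected with an exception rather than silently replaced, which is one possible policy among several.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical helper (not a real Boost API): decode a UTF-8 narrow string
// and re-encode it as UTF-16. Throws std::invalid_argument on bad input.
inline std::u16string utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;     // code point being decoded
        std::size_t extra = 0;    // number of continuation bytes expected

        if (lead < 0x80)                       { cp = lead;        extra = 0; }
        else if (lead >= 0xC2 && lead <= 0xDF) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0)        { cp = lead & 0x0F; extra = 2; }
        else if (lead >= 0xF0 && lead <= 0xF4) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i <= extra)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;

        // Reject overlong forms, surrogate code points and out-of-range values.
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP become a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}

} // namespace sketch
```

Under Windows, where wchar_t is 16 bits wide, the result could then be copied into a std::wstring and passed to a wide call such as _wfopen, so that the ANSI API is never involved.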