Subject: Re: [boost] [nowide] Request for interest (nowide unicode support for windows)
From: Artyom (artyomtnk_at_[hidden])
Date: 2010-06-16 15:50:19


> > Me too.
>
> I'm saying that Filesystem v3 on Windows doesn't interpret narrow strings
> as UTF-8 by default.  Berman said that it did but I beg to differ.  Here's
> what the comments say:
>
> //  For Windows, wchar_t strings do not undergo conversion. char strings
> //  are converted using the "ANSI" or "OEM" code pages, as determined by
> //  the AreFileApisANSI() function, or, if a conversion argument is given,
> //  using a conversion object modeled on std::wstring_convert.
>
> In other words "שלום.txt" would be interpreted as being in whatever
> encoding the local code page is set to and would, therefore, produce a path
> containing gibberish for most people.  This is standard Windows behaviour
> :P

This standard Windows behavior is exactly **the** problem.

To be honest, have you seen anybody using a "wide path" outside of Windows?
Do you actually need such a "wide path" on POSIX platforms?

The answer is no.

Actually, a POSIX OS does not care about the filename charset. For example,
I can create a file:

    std::ofstream f("\xf9\xec\xe5\xed.txt");

The name is שלום in ISO-8859-8, which is invalid UTF-8. It is still a valid
file name, even though the locale is a UTF-8 locale.

> Your problem is yet another step further than this.  Assuming fs3 correctly
> converted "שלום.txt" to the UTF-16 equivalent, how do you then open a file
> using this wide-char name?  Well, MSVC has wchar_t overloads so this works
> fine.  You're right about glibc++/MinGW though.  fs::fstream will fail
> there.  Rather than introducing a nowide library, why don't we just try to
> fix this in Boost.Filesystem?

I think that this can be fixed (the way I fixed it in nowide, by implementing
fstreambuf over stdio + _wfopen):

http://art-blog.no-ip.info/files/nowide.zip

But this is one particular problem. There are more. What about
filesystem::remove and others? From what I see in the code, it supports
only path and not wpath.

---------------------

But this is a part of one bigger problem.

When I develop cross-platform applications and want to remove, rename, or
create a file, I have the following options: write standard
platform-independent C++, write for POSIX operating systems, or write for
MS Windows.

OS \ Str | std::string | std::wstring |
-----------------------------------------------
Std C++  | Ok          | Not Defined!
POSIX    | Ok          | Not Defined!
WinAPI   | Not UTF-8   | Ok

What I see is this. Either I use wide strings, which work only on Windows
and require me to convert to another encoding for file operations
everywhere else.

Or I use normal strings, as the standard requires, and have problems on
Windows, where they are not fully supported.

Or I write two kinds of code:

- One for Windows using "wide" strings
- One for anything else using normal strings.

All this because Windows does not support a UTF-8 code page.

So why? Why do you need all this if you can just create a tiny layer that
makes Windows support a UTF-8 code page by converting std::string to
std::wstring and calling the appropriate API?

My Opinion:
-----------

- There is neither use nor need for "wide" strings in file system
  operations on any platform but Windows.
- Introducing boost::filesystem::wpath does not help, as it is meaningless
  on other OSes.
- Using wide strings is extremely error prone in cross-platform
  applications, as they are UTF-16 on Windows and UTF-32 on POSIX.

Wide path support just makes our applications more complicated and error
prone.

So... just create an API that is friendly to UTF-8 strings and forget about
this hell.

-------------

But from what I see this will never happen in Boost, as it is too Windows
centric, and Windows is too ignorant of the basic needs of programmers who
want to write portable programs.

Regards.
  Artyom

P.S.: The title of this mail is "request for interest". It is OK not to
have any.
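
To make the "tiny layer" idea above concrete, here is a rough sketch. It is
not the code from nowide.zip. The names utf8_to_utf16 and remove_utf8 are
hypothetical and made up for this illustration. The conversion is written by
hand so it compiles on any platform; only the _WIN32 branch touches the wide
CRT call (_wremove), and a real layer might use MultiByteToWideChar instead.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <wchar.h>   // _wremove
#endif

// Hypothetical helper: decode a UTF-8 string into UTF-16 code units.
// Throws std::invalid_argument on malformed input.
std::u16string utf8_to_utf16(const std::string& in)
{
    // Smallest code point allowed for each sequence length (rejects overlongs).
    static const char32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (c < 0x80)              { cp = c;        len = 1; }
        else if ((c >> 5) == 0x06) { cp = c & 0x1F; len = 2; }
        else if ((c >> 4) == 0x0E) { cp = c & 0x0F; len = 3; }
        else if ((c >> 3) == 0x1E) { cp = c & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k < len; ++k) {
            unsigned char t = static_cast<unsigned char>(in[i + k]);
            if ((t & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (t & 0x3F);
        }

        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid UTF-8 code point");

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}

// Hypothetical wrapper: callers pass a UTF-8 std::string on every platform.
int remove_utf8(const std::string& path)
{
#ifdef _WIN32
    // wchar_t holds UTF-16 code units on Windows, so copy them over.
    std::u16string w = utf8_to_utf16(path);
    std::wstring wpath(w.begin(), w.end());
    return _wremove(wpath.c_str());
#else
    // POSIX takes the bytes as they are; no conversion is needed.
    return std::remove(path.c_str());
#endif
}
```

With something like this, application code would call remove_utf8("שלום.txt")
everywhere and never see a wide string. Note that on Windows the ISO-8859-8
bytes "\xf9\xec\xe5\xed.txt" would be rejected by the conversion, while the
POSIX branch passes them through untouched.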