Subject: Re: [boost] [nowide] Request for interest (nowide unicode support for windows)
From: Artyom (artyomtnk_at_[hidden])
Date: 2010-06-16 15:50:19
>
> Me too.
>
> I'm saying that Filesystem v3 on Windows doesn't interpret narrow strings
> as UTF-8 by default. Berman said that it did but I beg to differ. Here's
> what the comments say:
>
> //  For Windows, wchar_t strings do not undergo conversion. char strings
> //  are converted using the "ANSI" or "OEM" code pages, as determined by
> //  the AreFileApisANSI() function, or, if a conversion argument is given,
> //  using a conversion object modeled on std::wstring_convert.
>
> In other words "שלום.txt" would be interpreted as being in whatever
> encoding the local code page is set to and would, therefore, produce a path
> containing gibberish for most people. This is standard Windows behaviour
> :P
This standard Windows behavior is exactly **the** problem.
To be honest, have you seen anybody using a "wide path" outside
the scope of Windows? Do you actually need such a "wide path" on POSIX
platforms?
The answer is no.
Actually, a POSIX OS does not care about the filename charset, as I can
create a file like this:
std::ofstream f("\xf9\xec\xe5\xed.txt");
This name is valid ISO-8859-8 (שלום) but invalid UTF-8. Still,
it is a valid file name (even though the locale is a UTF-8 locale).
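
To make the "invalid UTF-8" part concrete, here is a minimal sketch of a UTF-8 well-formedness check. The function name is_valid_utf8 is hypothetical (it is not part of nowide or Boost.Filesystem); it only illustrates that the bytes above are rejected by a UTF-8 decoder, while a POSIX kernel would still accept them as an opaque file name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// Rejects overlong forms, surrogates and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        if (c < 0x80) { ++i; continue; }                          // ASCII
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;                                        // invalid lead byte
        if (i + len > n) return false;                            // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;                // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 3 && cp < 0x800) return false;                 // overlong 3-byte form
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;           // UTF-16 surrogate
        i += len;
    }
    return true;
}
```

With this check, "\xf9\xec\xe5\xed.txt" fails on its first byte (0xF9 is never a valid UTF-8 lead byte), whereas the UTF-8 encoding of the same name, "\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d.txt", passes.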
>
> Your problem is yet another step further than this. Assuming fs3 correctly
> converted "שלום.txt" to the UTF-16 equivalent, how do you then open a file
> using this wide-char name? Well, MSVC has wchar_t overloads so this works
> fine. You're right about glibc++/MinGW though. fs::fstream will fail
> there. Rather than introducing a nowide library, why don't we just try to
> fix this in Boost.Filesystem?
>
I think that this can be fixed (the way I fixed it in nowide, by
implementing fstreambuf over stdio + _wfopen):
http://art-blog.no-ip.info/files/nowide.zip
But this is one particular problem.
There are more. What about filesystem::remove and others?
From what I see in the code, it supports only path and not wpath.
---------------------
But this is part of one bigger problem.
When I develop cross-platform applications, I have the following options
for operating on files.
For example, when I want to remove, rename, or create a file
in a cross-platform program, I can write standard
platform-independent C++, write for POSIX operating
systems, or write for MS Windows:
OS \ Str | std::string | std::wstring
---------+-------------+-------------
Std C++  | Ok          | Not Defined!
POSIX    | Ok          | Not Defined!
WinAPI   | Not UTF-8   | Ok
What do I see? I either need to use wide strings, which work only on
Windows and require me to convert to another encoding for file operations
everywhere else.
Or I can use normal strings, as the standard requires, and have problems
on Windows, where they are not fully supported.
Or I need to write two kinds of code:
- One for Windows using "wide" strings
- One for anything else using normal strings.
All because Windows does not support a UTF-8 code page.
So why? Why do you need all this if you can just
create a tiny layer that makes Windows support a UTF-8 code page
by converting std::string to std::wstring and calling the appropriate
API?
My Opinion:
-----------
- There is neither use nor need for "wide" strings for file system
operations on any platform but Windows.
- Introducing boost::filesystem::wpath does not help, as
it is meaningless on other OSes.
- Using wide strings is extremely error-prone in cross-platform
applications, as on Windows they are UTF-16 and on POSIX they
are UTF-32.
Wide path support just makes our applications more complicated
and error-prone.
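
A small illustration of the UTF-16 vs. UTF-32 point, assuming the usual case of a 16-bit wchar_t on Windows and a 32-bit wchar_t on typical POSIX systems. The function name units_for_non_bmp is made up for this sketch.

```cpp
#include <cstddef>
#include <string>

// Number of wchar_t code units the compiler uses for U+1F600,
// a character outside the Basic Multilingual Plane.
inline std::size_t units_for_non_bmp() {
    const std::wstring s = L"\U0001F600";
    return s.size();  // a surrogate pair with 16-bit wchar_t, one unit with 32-bit
}
```

The same literal has a different length depending on the platform, so code that indexes or counts "characters" in a std::wstring behaves differently on Windows and on POSIX.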
So... Just create an API that is friendly to UTF-8 strings and
forget about this hell.
-------------
But from what I see, this will never happen in Boost, as it is too
Windows-centric, and Windows is too ignorant of the basic needs of
programmers who want to write portable programs.
Regards.
Artyom
P.S.: The title of this mail is "Request for interest".
It is OK not to have any.