Message-ID: <CAE2XoE-C_Pi+i4YT3QKGana3oaWMKz6zUwSN94gnSamtbDxD5Q@mail.gmail.com>
Date: Sun, 10 May 2015 20:19:46 +0800
From: 罗勇刚(Yonggang Luo) <luoyonggang@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: John Sully <john@...uare.ca>, Karsten Blees <blees@...n.de>,
 musl@...ts.openwall.com, dplakosh@...t.org, austin-group-l@...ngroup.org,
 hsutter@...rosoft.com, Clang Dev <cfe-dev@...uiuc.edu>,
 James McNellis <james@...esmcnellis.com>
Subject: Re: Re: [cfe-dev] Is that getting wchar_t to be 32bit on win32 a good idea for compatible with Unix world by implement posix layer on win32 API?

2015-05-10 4:05 GMT+08:00 Rich Felker <dalias@...c.org>:
> On Sat, May 09, 2015 at 07:19:14PM +0800, 罗勇刚(Yonggang Luo) wrote:
>> 2015-05-09 18:36 GMT+08:00 Szabolcs Nagy <nsz@...t70.net>:
>> > * John Sully <john@...uare.ca> [2015-05-09 00:55:12 -0700]:
>> >> In my opinion you almost never want 32-bit wide characters once you learn
>> >> of their limitations. Most people assume that if they use them they can
>> >> return to the one character -> one glyph idiom like ASCII. But Unicode is
>> >
>> > wchar_t must be at least 21 bits on a system that supports unicode
>> > in any locale: it has to be able to represent all code points of the
>> > supported character set.
>> >
>> > in practice this means that the only conforming definition to iso c
>> > (and thus posix, c++ and other standards based on c) is a 32bit wchar_t
>> > (the signedness can be chosen freely).
>> >
>> > so the definition is not based on what "you almost never want" or what
>> > "most people assume".
>> >
>> > if the goal is to provide a posix implementation then 16bit wchar_t
>> > is not an option (assuming the system wants to be able to communicate
>> > with the external world that uses unicode text).
>> wchar_t is not the only way to communicate with the external world, and
>> it's also not suited to communicating with the external world.
>
> Of course it's not. UTF-8 is. But per both ISO C and POSIX, any
> character the locale supports has a representation as wchar_t. If
> wchar_t is only 16-bit, then you fundamentally can't support all of
> Unicode in the locale's encoding. mbrtowc has to fail with EILSEQ for
> 4-byte characters, regex functions cannot process 4-byte characters,
> etc. Such a system is conforming to the requirements for C and
> POSIX but does not support Unicode (in full) at the locale level.
>
>> The C11 standard never restricts wchar_t's width, and for POSIX, most
>> APIs are implemented in UTF-8. Indeed, Windows needs the POSIX layer
>> mainly because of those APIs that use UTF-8, not the wchar_t APIs. For
>> communication reasons, making wchar_t 32-bit on Win32 is not a good idea.
>>
>> And portable text-processing apps or libs (including on Win32) would and
>> should never depend on wchar_t being 32 bits wide.
>
> If __STDC_ISO_10646__ is defined, wchar_t must have at least 21 value
> bits. Applications which are portable only to systems where this macro
> is defined, or which have some fallback (like dropping multilingual
> text support) for systems where it's not defined, CAN make such
> assumptions.
>
>> And C11/C++11 already provide uchar.h with cross-platform char16_t and
>> char32_t, so there is no reason to make wchar_t 32-bit on Win32 in order
>> to support POSIX on Win32.
>
> If wchar_t is 16-bit, you can't represent non-BMP characters in
> char32_t because they can't be part of the locale's character set. All
> char32_t buys you then is 16 wasted zero bits.
>
>> We intend to create a usable POSIX layer on Win32, not a theoretical
>> POSIX layer that would be useless. On Win32, we should consider the de
>> facto conventions of Win32.
>
> Uselessness is a big assumption you're making that's not supported by
> data. If you actually provide a working POSIX layer, you'll have
> pretty much any application that's currently working on Linux, BSDs,
> etc. (with actual portable code, not system-specific #ifdefs) working
> on Windows with few or no changes. If you do that with 32-bit wchar_t,
> they'll support Unicode fully. If you do it with 16-bit wchar_t, then
> the ones that are using the locale system for character handling will
> have to be refitted with extra layers to support more than the BMP,
> and those patches probably (hopefully) won't be accepted upstream.
>
> The only applications that would benefit from having 16-bit wchar_t
> are existing Windows applications that are not going to have much use
> for a POSIX layer anyway, and they can be fixed very easily with
> search-and-replace (no new code layers).

Search-and-replace is not as easy as you suggest. Windows and POSIX have
many incompatibilities, and those will not change. The alternative is to
implement a virtual machine running on Win32 that is compatible with
everything in POSIX, but that would be useless.

The intention of providing a POSIX layer is to reduce the burden on
developers who intend to write cross-platform code (including Windows),
not on developers who only intend to develop apps for Linux/POSIX. So
such a layer should preserve the usable parts of POSIX and drop the parts
that only create inconvenience. A 32-bit wchar_t is obviously not suited
to Win32.

My intention is not to develop a virtual-machine-like layer such as
Cygwin, but a native Win32 layer that provides most POSIX functions with
UTF-8 support. That would solve most portability issues, and the result
would work on Win32 just like a Win32 app rather than a Unix/Linux app.

>
> Rich

-- 
Yours sincerely,
罗勇刚 (Yonggang Luo)
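
The width argument in this thread can be illustrated with a small C sketch.
It is not code from any of the posters. The helper names
(wchar_holds_all_unicode, iso10646_promised, utf16_encode) are hypothetical
and exist only for illustration. It shows, under those assumptions, why a
code point above U+FFFF cannot fit in one 16-bit unit and needs a surrogate
pair, and how a program might check what the platform promises about
wchar_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: 1 if one wchar_t can hold every Unicode code point
 * (up to U+10FFFF, which needs 21 bits), 0 otherwise. */
static int wchar_holds_all_unicode(void)
{
    return (unsigned long long)WCHAR_MAX >= 0x10FFFFULL;
}

/* Hypothetical helper: 1 if the implementation defines __STDC_ISO_10646__,
 * i.e. it promises that wchar_t values are ISO 10646 code points. */
static int iso10646_promised(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: encode one code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is not
 * a valid Unicode scalar value. */
static int utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                      /* out of range or a lone surrogate */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;         /* BMP: fits in a single unit */
        return 1;
    }
    cp -= 0x10000;                     /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, utf16_encode(0x1F600, buf) writes the two units 0xD83D and
0xDE00, while utf16_encode(0x20AC, buf) writes the single unit 0x20AC. On a
platform where wchar_holds_all_unicode() returns 0, the first case is the one
that cannot be stored in a single wchar_t.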