Message-ID: <CAPG2z09edp7keF3ZfHxKR8_2LPu=nSj4aL-XE8VXtiL_+3LuHA@mail.gmail.com>
Date: Sat, 18 Mar 2017 21:50:28 +0800
From: He X <xw897002528@...il.com>
To: musl@...ts.openwall.com
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'

OK, I think there's no further need for discussion. I got your idea, if
this is what musl wants to be. I will try to make patches to vim later!

But for the checking of `charset=`, I can't help; I did not understand
what's going on in __mo_lookup(). Hope you can make the patch.

The attached patch has everything related to dropping .charset removed.

2017-03-18 20:28 GMT+08:00 Rich Felker <dalias@...c.org>:

> On Sat, Mar 18, 2017 at 07:34:58AM +0000, He X wrote:
> > > As discussed on irc, .charset suffixes should be dropped before the
> > > loop even begins (never used in pathnames), and they occur before the
> > > @mod, not after it, so the logic for dropping them is different.
> >
> > 1. drop .charset: Sorry for proposing it again, I forgot this case after
> > around three weeks. As I said before, vim will generate three different
> > .mo files with different charsets -> zh_CN.UTF-8.po, zh_CN.cp936.po,
> > zh_CN.po. In that case, dropping would generate a lot of junk.
> >
> > I now found it's not a bug of msgfmt. The charset is converted by:
> > iconv -f UTF-8 -t cp936 zh_CN.UTF-8.po | sed -e
> > 's/charset=utf-8/charset=gbk/ > ... So that means the charset and
> > pathname are decided by the software; msgfmt does not do charset
> > conversion at all, it is just a format translator. (btw, iconv.c is
> > from alpine)
>
> There are two things you seem to be missing:
>
> 1. musl does not, and won't, support non-UTF-8 locales, so there is no
> point in trying to load translations for them. Moreover, with the
> proposed changes to setlocale/locale_map.c, it will never be possible
> for the locale name to contain a . with anything other than UTF-8 (or,
> for compatibility, some variant like utf8) after it. So I don't see
> how there's any point in iterating and trying with/without .charset
> when the only possibilities are that .charset is blank, .UTF-8, or
> some misspelling of .UTF-8. In the latter case, we'd even have to do
> remapping of the misspellings to avoid having to have multiple
> dirs/symlinks.
>
> 2. From my perspective, msgfmt's production of non-UTF-8 .mo files is
> a bug. Yes the .po file can be something else, but msgfmt should be
> transcoding it at 'compile' time. There's at least one other change
> msgfmt needs for all features to work with musl's gettext -- expansion
> of SYSDEP strings to all their possible format patterns -- so I don't
> think it's a significant additional burden to ensure that the msgfmt
> used on musl-based systems outputs UTF-8.
>
> Of course software trying to do multiple encodings like you described
> will still install duplicate files unless patched, but any of them
> should work as long as msgfmt recoded them. In the meantime, distros
> can just patch the build process for software that's still installing
> non-UTF-8 locale files. AFAIK doing that is not a recommended practice
> even by the GNU gettext project, so the patches might even make it
> upstream.
>
> One thing we could do for robustness is check the .mo header at load
> time and, if it has a charset= specification with something other than
> UTF-8, reject it. I mainly suggest this in case the program is running
> on a non-musl system where a glibc-built version of the same program
> (e.g. vi) with non-UTF-8 .mo files is present and they're using the
> same textdomain dir (actually unlikely since prefix should be
> different). But if we do this it should be a separate patch because
> it's a separate functional change.
>
> Rich
>

Content of type "text/html" skipped

View attachment "locale.diff" of type "text/plain" (3824 bytes)
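
The load-time check suggested in the quoted message (reject a .mo file whose header has a charset= specification other than UTF-8) could look roughly like the sketch below. This is only an illustration in C, not musl's code and not part of the attached patch. The function name mo_header_charset_ok is hypothetical, and it assumes the caller has already obtained the header entry (the translation of the empty msgid) as a NUL-terminated string. Treating the comparison as case-insensitive is also an assumption, made because the vim build quoted above writes charset=utf-8 in lowercase.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper (not from musl): decide whether a .mo header entry
 * is acceptable for a UTF-8-only gettext implementation.
 *
 * `header` is assumed to be the translation of the empty msgid, e.g.
 *   "Content-Type: text/plain; charset=UTF-8\n"
 *
 * Returns 1 if there is no charset= specification or if it names UTF-8
 * (compared case-insensitively, which is an assumption of this sketch),
 * and 0 if it names any other charset.
 */
int mo_header_charset_ok(const char *header)
{
    assert(header != NULL);

    const char *p = strstr(header, "charset=");
    if (!p)
        return 1; /* no charset= specification, so nothing to reject */

    p += strlen("charset=");

    /* The charset token ends at whitespace, ';' or the end of the string. */
    size_t n = strcspn(p, " \t\r\n;");

    return n == 5 && strncasecmp(p, "UTF-8", 5) == 0;
}
```

With this sketch, a header containing charset=gbk (as produced by the iconv/sed pipeline quoted above) would be rejected, while a header with no charset= field at all would still be accepted.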
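
The point from the quoted IRC summary, that a .charset suffix sits before the @mod part and is never used in pathnames, can be illustrated with a small C sketch. It is not the logic of the attached locale.diff (whose contents are not shown here), and the name strip_charset is hypothetical. It only shows the string handling: drop everything from the first '.' up to the '@' (or the end of the name) and keep the modifier.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from musl or the attached patch):
 * copy `name` to `out` with any ".charset" part removed, keeping an
 * optional "@modifier" suffix.
 *
 *   "zh_CN.UTF-8"       -> "zh_CN"
 *   "sr_RS.UTF-8@latin" -> "sr_RS@latin"
 *
 * Returns 0 on success, -1 if `out` (of size `outsz`) is too small.
 */
int strip_charset(const char *name, char *out, size_t outsz)
{
    /* Length of the language/territory part: up to the first '.' or '@'. */
    size_t base = strcspn(name, ".@");

    /* The modifier, if any, starts at the first '@' at or after that point. */
    const char *mod = strchr(name + base, '@');
    if (!mod)
        mod = "";
    size_t modlen = strlen(mod);

    if (base + modlen + 1 > outsz)
        return -1;

    memcpy(out, name, base);
    memcpy(out + base, mod, modlen + 1); /* also copies the terminating NUL */
    return 0;
}
```

Because the charset is removed before any lookup, a pathname search built on the result would only ever see names such as zh_CN or sr_RS@latin.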