Message-ID: <CAPG2z0-7JV8safi5rrYr0zeu+dKqZLGAikV549jrumy+=NLJxQ@mail.gmail.com>
Date: Sat, 4 Mar 2017 16:02:58 +0800
From: He X <xw897002528@...il.com>
To: musl@...ts.openwall.com
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'

OK, I have been busy with school these days. I read the mailing list
thread again and cleaned up the patch. These are the remaining issues
from the previous discussion:

1. About a zero (null) msgid1: I can show that glibc falls back to no
translation in this case. It is equivalent to printf(""), so this
should be OK:

@@ -120,8 +122,9 @@
+	if (!msgid1) goto notrans;

> but it should be a separate patch since it's an independent change.

(The check is added at the head of dcngettext(). I'll send a new
standalone mail for it, but it is also included in this patch, so be
careful.)

2.
> But if the locale name is explicitly non-UTF-8 like "zh_CN.GBK", we
> could opt to reject it without breaking anything, and this may give
> users better feedback about what's going wrong if they have such
> settings when ssh'ing into a musl-based system.

About .GBK (and any other non-UTF-8 charset): I ignore such names by
treating them as C.UTF-8. Do we need to be stricter?

--- musl-1.1.16/src/locale/locale_map.c 2017-01-01 03:27:17.000000000 +0000
+++ musl-1.1.16/src/locale/locale_map.c 2017-01-01 03:27:17.000000000 +0000
@@ -46,7 +46,8 @@
 	if (val[0]=='.' || val[n]) val = "C.UTF-8";
 	int builtin = (val[0]=='C' && !val[1])
 		|| !strcmp(val, "C.UTF-8")
-		|| !strcmp(val, "POSIX");
+		|| !strcmp(val, "POSIX")
+		|| strcmp(__strchrnul(val, '.'), ".UTF-8");
 
 	if (builtin) {
 		if (cat == LC_CTYPE && val[1]=='.')

3.
> The autoconf text for gettext is supposed to be getting fixed not to
> do that anymore, but I'm not sure what the progress on upstreaming it
> is.

What I described is just a workaround until they handle it. I am not
going to change anything in musl for this; it was only a description.
I only patched my own copy.

4.
> Support for non-UTF-8 .mo files won't be added.
> msgfmt just needs to be fixed not to produce non-UTF-8 output.
I agree with you, so I hope '.UTF-8' can be kept. Rather than being
stripped as I thought before, '.UTF-8' should be kept until the code
reaches dcngettext().

What if the .mo files are downloaded from the web? Or what if they are
pre-generated in a program's release? (I guess that's why vim gave me
the GBK set; it must be pre-generated.)

And even if msgfmt generates UTF-8 output, what if programs still name
the directory zh_CN.UTF-8 instead of simply zh_CN? You can't say that's
wrong, right? How they name it is their preference. Keeping the suffix
is necessary for those who use a full name like zh_CN.UTF-8 instead of
zh_CN. This is what I am trying to express now.

2017-02-14 1:12 GMT+08:00 Rich Felker <dalias@...c.org>:

> On Mon, Feb 13, 2017 at 10:06:49PM +0800, He X wrote:
> > no, it's on musl, i just tested it with my patches, with vim, stripping
> > will lead to unknown characters.
>
> That's not a matter of the locale being non-UTF-8 (it's UTF-8) but of
> the application doing something broken. The locale is UTF-8 because
> nl_langinfo(CODESET) says it is and because mb/wc conversion functions
> process UTF-8. That's what it means for the locale to be UTF-8.
>
> > I mean, .mo files under zh_CN/ of vim is GBK set, while zh_CN/ of other
> > apps is UTF-8 set, that means there may be other apps like vim, we should be
> > more cautious, add a check before map the .mo files, and fail non-UTF8 set
> > in setlocale.
>
> All musl locale files are required to be UTF-8. If an application has
> translation files that are not UTF-8, they're not usable. This could
> be fixed in the application or by using a fixed version of msgfmt that
> converts to UTF-8 before producing the .mo file.
>
> > Btw, _nl_msg_cat_cntr & _nl_domain_bindings will block apps compiling with
> > the native intl of musl, and after i added a dump for these two symbols,
>
> The autoconf text for gettext is supposed to be getting fixed not to
> do that anymore, but I'm not sure what the progress on upstreaming it
> is.
>
> > gnu tar showed me segfaults, because he passed a zero msgid1 causing
> > __mo_lookup segfault, we should add a check in dcngettext to avoid it(if
> > (!msgid1) goto notrans;):
> >
> > #2 0x00007ffff7d82a6f in dcngettext (domainname=0x6737a0 "tar",
> > msgid1=0x0, msgid2=0x0, n=1,
> > category=5) at src/locale/dcngettext.c:211
>
> Is it expecting gettext to return a null pointer in this case, or to
> return something else (like the "header", i.e. the translation of "")?
> I think it's acceptable to change this behavior as long as we do it
> right, but it should be a separate patch since it's an independent
> change.
>
> Rich
>

Content of type "text/html" skipped

View attachment "locale.diff" of type "text/plain" (3636 bytes)
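
For reference, the two checks discussed in points 1 and 2 can be
sketched in standalone C. This is only an illustration under
assumptions, not the code in locale.diff: the names charset_kind,
classify_charset, lookup_fn and translate_guarded are hypothetical and
do not exist in musl. The sketch separates three cases for the locale
name, because a bare name such as "zh_CN" has an empty suffix, and a
plain strcmp() of that suffix against ".UTF-8" reports a mismatch just
as it does for ".GBK". The second function shows the null-msgid1 guard,
assuming the untranslated path returns msgid1 or msgid2 depending on n.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical three-way result; musl itself has no such enum. */
enum charset_kind { CHARSET_NONE, CHARSET_UTF8, CHARSET_OTHER };

/* Classify a locale name by the text that follows its first '.'. */
enum charset_kind classify_charset(const char *name)
{
	assert(name != NULL);
	const char *dot = strchr(name, '.');
	if (dot == NULL)
		return CHARSET_NONE;   /* "zh_CN", "C", "POSIX" */
	if (strcmp(dot, ".UTF-8") == 0)
		return CHARSET_UTF8;   /* "zh_CN.UTF-8", "C.UTF-8" */
	return CHARSET_OTHER;          /* "zh_CN.GBK" and anything else */
}

/* Hypothetical catalog lookup: returns NULL when no translation exists. */
typedef const char *(*lookup_fn)(const char *msgid);

/*
 * Guarded translation: a null msgid1 (or a missing lookup function)
 * never reaches the catalog lookup. The fallback returns the
 * untranslated argument selected by n, which is an assumption about
 * what the "notrans" path does.
 */
const char *translate_guarded(lookup_fn lookup, const char *msgid1,
                              const char *msgid2, unsigned long n)
{
	if (msgid1 != NULL && lookup != NULL) {
		const char *t = lookup(msgid1);
		if (t != NULL)
			return t;
	}
	return n == 1 ? msgid1 : msgid2;
}
```

With this split, a caller could reject CHARSET_OTHER in setlocale()
while still accepting both "zh_CN" and "zh_CN.UTF-8".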