Message-ID: <CAPG2z0-7JV8safi5rrYr0zeu+dKqZLGAikV549jrumy+=NLJxQ@mail.gmail.com>
Date: Sat, 4 Mar 2017 16:02:58 +0800
From: He X <xw897002528@...il.com>
To: musl@...ts.openwall.com
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'

OK, I have been busy with school these days. I reread the mailing list
thread and cleaned things up. These are the remaining issues we need to
resolve from the previous discussion:

1. About a null msgid1: I can show that glibc falls back to no
translation in this case. It's equivalent to printf(""), so this should be OK:
@@ -120,8 +122,9 @@
+ if (!msgid1) goto notrans;

> but it should be a separate patch since it's an independent
> change.
(I added this check at the top of dcngettext(). I'll send a separate
standalone mail for it, but be aware that it is also included in this patch.)
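
A minimal sketch of what the guard does, outside of musl. The names
sketch_ngettext and toy_lookup are hypothetical stand-ins, not musl code;
the only part taken from the patch is the early "goto notrans" on a null
msgid1. The fallback shown (return msgid1 or msgid2 depending on n) is an
assumption about what the untranslated path returns.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a catalog lookup; a real one would search the
 * mapped .mo file. Like an unguarded lookup, it must not be given NULL. */
static const char *toy_lookup(const char *msgid)
{
    if (strcmp(msgid, "file") == 0) return "fichier";
    return NULL; /* no translation found */
}

/* Sketch of a guarded entry point: a null msgid1 skips the lookup. */
const char *sketch_ngettext(const char *msgid1, const char *msgid2,
                            unsigned long n)
{
    const char *trans;

    if (!msgid1) goto notrans;   /* the proposed check */

    trans = toy_lookup(msgid1);
    if (trans) return trans;

notrans:
    /* Untranslated fallback: hand back the caller's own string. */
    return n == 1 ? msgid1 : msgid2;
}
```

With this shape, a call like the one in the tar backtrace (msgid1 and
msgid2 both null, n=1) returns a null pointer instead of crashing inside
the lookup.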

2.
> But if the locale name is explicitly
> non-UTF-8 like "zh_CN.GBK", we could opt to reject it without breaking
> anything, and this may give users better feedback about what's going
> wrong if they have such settings when ssh'ing into a musl-based
> system.

About .GBK (and any other non-UTF-8 charset): I ignore them by treating
them as C.UTF-8. Do we need to be stricter?
--- musl-1.1.16/src/locale/locale_map.c 2017-01-01 03:27:17.000000000 +0000
+++ musl-1.1.16/src/locale/locale_map.c 2017-01-01 03:27:17.000000000 +0000
@@ -46,7 +46,8 @@
  if (val[0]=='.' || val[n]) val = "C.UTF-8";
  int builtin = (val[0]=='C' && !val[1])
  || !strcmp(val, "C.UTF-8")
- || !strcmp(val, "POSIX");
+ || !strcmp(val, "POSIX")
+ || strcmp(__strchrnul(val, '.'), ".UTF-8");

  if (builtin) {
  if (cat == LC_CTYPE && val[1]=='.')
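
For reference, a minimal sketch of one way to single out only the names
that carry an explicit non-UTF-8 codeset, which is the case asked about
above. The helper name has_non_utf8_codeset is hypothetical and not part of
musl, and it uses standard strchr instead of musl's internal __strchrnul.
Note that strcmp(__strchrnul(val, '.'), ".UTF-8") is also nonzero for a
name with no dot at all, such as "zh_CN", because __strchrnul then points
at the terminating null byte; the sketch treats that case as "not
explicitly non-UTF-8". Modifiers such as "@euro" are not handled here.

```c
#include <string.h>

/* Returns 1 only when the locale name has an explicit codeset suffix that
 * is not ".UTF-8" (for example "zh_CN.GBK"); returns 0 otherwise. */
int has_non_utf8_codeset(const char *name)
{
    const char *dot = strchr(name, '.');

    if (!dot) return 0;                 /* no codeset given, e.g. "zh_CN" */
    return strcmp(dot, ".UTF-8") != 0;  /* explicit codeset: UTF-8 or not */
}
```

Whether such names should then map to C.UTF-8 or be rejected is the open
question; the sketch only separates the cases.
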
3.
> The autoconf text for gettext is supposed to be getting fixed not to
> do that anymore, but I'm not sure what the progress on upstreaming it
> is.
This is just a workaround until they handle it. I am not going to change
anything in musl for this; it was only a description. I only patched it on
my side.

4.
> Support for non-UTF-8 .mo files won't be added.
> msgfmt just needs to be fixed not to produce
> non-UTF-8 output.

I agree with you, so I hope '.UTF-8' can be kept. Rather than stripping it
as I proposed before, '.UTF-8' should be kept until the code reaches
dcngettext(). What if the .mo files are downloaded from the web? Or what if
they are pre-generated in a program's release? (I guess that's why vim gave
me GBK-encoded files; they must be pre-generated.)

And even if msgfmt generates UTF-8 output, what if programs still name the
directory zh_CN.UTF-8 instead of simply zh_CN? You can't say that's wrong,
right? How they name it is their choice.

Keeping the suffix is necessary for those who use a full name like
zh_CN.UTF-8 instead of zh_CN. That is what I am trying to say.
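
To illustrate the point about directory names, here is a rough sketch of a
lookup order that tries the full locale name first and only then the name
with the codeset removed. This is not what musl does; mo_candidate is a
hypothetical helper, and the dir/locale/LC_MESSAGES/domain.mo layout is the
conventional gettext layout, assumed here for the LC_MESSAGES category only.
Modifiers such as "@euro" are ignored.

```c
#include <stdio.h>
#include <string.h>

/* Writes the idx-th candidate catalog path into buf.
 * idx 0: full locale name; idx 1: name with the codeset suffix removed.
 * Returns 0 on success, -1 if there is no such candidate or buf is too
 * small. */
int mo_candidate(char *buf, size_t len, const char *dir, const char *locale,
                 const char *domain, int idx)
{
    size_t loclen = strlen(locale);
    const char *dot = strchr(locale, '.');
    int r;

    if (idx == 1) {
        if (!dot) return -1;              /* nothing to strip */
        loclen = (size_t)(dot - locale);  /* keep only the part before '.' */
    } else if (idx != 0) {
        return -1;
    }

    r = snprintf(buf, len, "%s/%.*s/LC_MESSAGES/%s.mo",
                 dir, (int)loclen, locale, domain);
    return (r < 0 || (size_t)r >= len) ? -1 : 0;
}
```

A caller would try idx 0, then idx 1, and open the first path that exists,
so both a zh_CN.UTF-8/ and a zh_CN/ directory would be found.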

2017-02-14 1:12 GMT+08:00 Rich Felker <dalias@...c.org>:

> On Mon, Feb 13, 2017 at 10:06:49PM +0800, He X wrote:
> > no, it's on musl, i just tested it with my patches, with vim, stripping
> > will lead to unknown characters.
>
> That's not a matter of the locale being non-UTF-8 (it's UTF-8) but of
> the application doing something broken. The locale is UTF-8 because
> nl_langinfo(CODESET) says it is and because mb/wc conversion functions
> process UTF-8. That's what it means for the locale to be UTF-8.
>
> > I mean, .mo files under zh_CN/ of vim is GBK set, while zh_CN/ of other
> > apps is UTF-8 set, that means there may be other apps like vim, we should be
> > more cautious, add a check before map the .mo files, and fail non-UTF8 set
> > in setlocale.
>
> All musl locale files are required to be UTF-8. If an application has
> translation files that are not UTF-8, they're not usable. This could
> be fixed in the application or by using a fixed version of msgfmt that
> converts to UTF-8 before producing the .mo file.
>
> > Btw, _nl_msg_cat_cntr & _nl_domain_bindings will block apps compiling with
> > the native intl of musl, and after i added a dump for these two symbols,
>
> The autoconf text for gettext is supposed to be getting fixed not to
> do that anymore, but I'm not sure what the progress on upstreaming it
> is.
>
> > gnu tar showed me segfaults, because he passed a zero msgid1 causing
> > __mo_lookup segfault, we should add a check in dcngettext to avoid it(if
> > (!msgid1) goto notrans;):
> >
> >  #2  0x00007ffff7d82a6f in dcngettext (domainname=0x6737a0 "tar",
> > msgid1=0x0, msgid2=0x0, n=1,
> >     category=5) at src/locale/dcngettext.c:211
>
> Is it expecting gettext to return a null pointer in this case, or to
> return something else (like the "header", i.e. the translation of "")?
> I think it's acceptable to change this behavior as long as we do it
> right, but it should be a separate patch since it's an independent
> change.
>
> Rich
>


View attachment "locale.diff" of type "text/plain" (3636 bytes)
