Message-ID: <CAPG2z08yePs-6pqHcoBbMfWRPyXunuT-2Ge_JDWH8E5Y+_0wtw@mail.gmail.com>
Date: Sat, 11 Feb 2017 14:00:56 +0800
From: He X <xw897002528@...il.com>
To: musl@...ts.openwall.com
Subject: Re: Re: a bug in bindtextdomain() and strip '.UTF-8'

Fresh patch :)

1. It's easier to just stop at the dot, and I think this should be
documented in the wiki or somewhere.

2. I spent 20 minutes reading the first part of your reply, but I'm
still not sure. If I understand correctly, you mean: let the
struct __locale_map * and struct binding * pointers be the identifier
for the msgcat list instead of the long name string, which is not only
faster but also makes it easier to construct the pathname string. But I
have some questions:

+ I removed name from msgcat since I can't find any use of it there. Is
  that safe?
+ gettextdir() is replaced by a new loop, since I need the pointer to
  struct binding, not only the dirname. After that, gettextdir() is only
  called by bindtextdomain(). Is there a need to keep it? Or is there a
  better way to get the pointer to struct binding?
+ You said msgcat is indexed by
  ( struct __locale_map *, struct binding *, category ), but I found
  that lm (the locale map) is looked up by category, so if the category
  is different we can't get the same lm. So we can just compare lm,
  right?

2017-02-11 10:36 GMT+08:00 Rich Felker <dalias@...c.org>:

> On Thu, Feb 09, 2017 at 05:49:13PM +0800, He X wrote:
> > sry!
> >
> > 2017-02-08 22:31 GMT+08:00 Rich Felker <dalias@...c.org>:
> >
> > > On Wed, Feb 08, 2017 at 06:13:30PM +0800, He X wrote:
> > > > here the patch is: http://paste.ubuntu.com/23953329/
> > > > The code tested, but maybe it sucks.
> > >
> > > Patches need to be attached and sent to the list, not pastebins that
> > > might disappear. The latter don't work for discussing and preserving
> > > discussion of the patch.
>
> > --- a/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
> > +++ b/src/locale/dcngettext.c 2017-02-06 14:39:17.860482624 +0000
> > @@ -19,6 +19,7 @@
> >  };
> >
> >  static void *volatile bindings;
> > +char *__strchrnul(const char *, int);
> >
> >  static char *gettextdir(const char *domainname, size_t *dirlen)
> >  {
> > @@ -143,7 +143,7 @@
> >
> >      catname = catnames[category];
> >      catlen = catlens[category];
> > -    loclen = strlen(locname);
> > +    loclen = __strchrnul(locname, '.') - locname;
> >
> >      size_t namelen = dirlen+1 + loclen+1 + catlen+1 + domlen+3;
> >      char name[namelen+1], *s = name;
> > @@ -157,6 +157,8 @@
> > +rewrite_loc:
> >      memcpy(s, locname, loclen);
> >      s[loclen] = '/';
> >      s += loclen + 1;
> > +skip_loc:
> >      memcpy(s, catname, catlen);
> >      s[catlen] = '/';
> >      s += catlen + 1;
> > @@ -174,7 +175,22 @@
> >      void *old_cats;
> >      size_t map_size;
> >      const void *map = __map_file(name, &map_size);
> > -    if (!map) goto notrans;
> > +    if (!map) {
> > +        if (s = strchr(name + dirlen + 1, '@')) {
> > +            *s++ = '/';
> > +            goto skip_loc;
> > +        }
> > +        if (locname && (s = strchr(name + dirlen + 1, '_')) && (strchr(name + dirlen +1, '/') > s) ) {
> > +            if (locname = strchr(locname, '@')) {
> > +                loclen = __strchrnul(lm->name, '.') - locname;
> > +                goto rewrite_loc;
> > +            } else {
> > +                *s++ = '/';
> > +                goto skip_loc;
> > +            }
> > +        }
> > +        goto notrans;
> > +    }
>
> This doesn't work because it changes both the key used for the lookup
> and the filename mapped. If you try this code with a translation that
> requires a fallback, and run it under strace, you'll see that _every_
> call to gettext will try again to find the nonexistent files.
>
> It could be fixed, but I think the code should be refactored so that,
> rather than the msgcat list being indexed by pathname strings, it's
> indexed by tuples of:
>
> ( struct __locale_map *, struct binding *, category )
>
> These are all integers/pointers and thus compare very fast versus the
> current strcmp operation, and it's very quick to look them up. Then we
> only have to construct the pathname string when a new file needs to be
> loaded, not on every call, and you're free to clobber the pathname
> string while doing fallbacks.
>
> >      p = calloc(sizeof *p + namelen + 1, 1);
> >      if (!p) {
> >          __munmap((void *)map, map_size);
> > --- a/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> > +++ b/src/locale/locale_map.c 2017-02-06 14:39:17.797148750 +0000
> > @@ -32,6 +32,7 @@
> >      struct __locale_map *new = 0;
> >      const char *path = 0, *z;
> >      char buf[256];
> > +    char *dotp;
> >      size_t l, n;
> >
> >      if (!*val) {
> > @@ -40,6 +41,12 @@
> >          (val = getenv("LANG")) && *val ||
> >          (val = "C.UTF-8");
> >      }
> > +    if (dotp = strchr(val, '.')) {
> > +        char part[256];
> > +        memcpy(part, val, dotp - val);
> > +        memcpy(&part[dotp - val], ".UTF-8\0", 7);
> > +        val = part;
> > +    }
> >
> >      /* Limit name length and forbid leading dot or any slashes. */
> >      for (n=0; n<LOCALE_NAME_MAX && val[n] && val[n]!='/'; n++);
>
> I don't think this part is desirable, but if it were, it would need to
> be done differently. As-is, it has serious UB, use of part[] after the
> end of its lifetime. It also seems to have no check to see that
> dotp-val is less than 256-7 or even that it's bounded, whereas the
> code that immediately follows checks the length of the string pointed
> to by val.
>
> I think what it should be doing is the opposite, stopping when hitting
> a dot in the name and only using the part up to the dot, except in the
> one special case "C.UTF-8". The subsequent path search for the locale
> file should probably then be repeated with combinations of dropping
> @mod and _CC suffixes, but this dropping should _not_ affect the name
> that's saved and reported back. (That is, if LC_TIME=fr_CA but only a
> "fr" locale file exists, the "fr" file should get mapped but the name
> returned by setlocale, and saved for use by gettext, should still be
> the full "fr_CA" in case applications have "fr_CA" translations.)
>
> Rich
>

Content of type "text/html" skipped

View attachment "locale.diff" of type "text/plain" (3878 bytes)
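
The name handling Rich describes at the end of his reply (cut the name
at the dot except for "C.UTF-8", then retry the file search with the
@mod and _CC suffixes dropped, without touching the saved name) might be
sketched roughly as below. This is only an illustration, not musl code:
locale_candidates(), join() and LOC_MAX are hypothetical names, and the
order in which the shortened names are tried is an assumption, since the
thread does not specify one.

```c
#include <stddef.h>
#include <string.h>

#define LOC_MAX 64 /* hypothetical bound for this sketch, not musl's limit */

/* Write the first alen bytes of a, followed by the string b, into dst. */
static void join(char *dst, const char *a, size_t alen, const char *b)
{
    memcpy(dst, a, alen);
    strcpy(dst + alen, b);
}

/* Fill out[] with the file names to try for a requested locale name,
 * most specific first, and return how many were written (0 if the name
 * is rejected). The caller's name string is never modified, so the full
 * name can still be saved and reported back unchanged. */
size_t locale_candidates(const char *name, char out[4][LOC_MAX])
{
    size_t n = 0;

    /* The one special case that keeps its dot. */
    if (!strcmp(name, "C.UTF-8")) {
        strcpy(out[n++], name);
        return n;
    }

    /* Use only the part up to the first dot. Anything after the dot,
     * including a modifier that follows a codeset, is ignored here. */
    size_t len = strcspn(name, ".");
    if (!len || len >= LOC_MAX) return 0;

    char base[LOC_MAX];
    join(base, name, len, "");

    const char *mod = strchr(base, '@');             /* "@mod" or NULL */
    size_t stem = mod ? (size_t)(mod - base) : len;  /* length of ll_CC */
    size_t lang = strcspn(base, "_@");               /* length of ll */
    if (!lang) return 0;                             /* no language part */

    join(out[n++], base, len, "");                   /* ll_CC@mod */
    if (mod) join(out[n++], base, stem, "");         /* ll_CC */
    if (lang < stem) {
        if (mod) join(out[n++], base, lang, mod);    /* ll@mod */
        join(out[n++], base, lang, "");              /* ll */
    }
    return n;
}
```

For the name "fr_CA.UTF-8" this produces "fr_CA" and then "fr". A caller
would map the first candidate whose file exists while still keeping
"fr_CA" as the saved locale name, which matches the LC_TIME=fr_CA
example in the reply.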