Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Sat, 09 Dec 2023 19:44:30 +0100
From: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
To: Rich Felker <dalias@...c.org>, Alastair Houghton <ahoughton@...le.com>
Cc: musl@...ts.openwall.com
Subject: Re: setlocale() again

On vie, 2023-12-08 at 18:59 -0500, Rich Felker wrote:
> > On Fri, Dec 08, 2023 at 10:46:15AM +0000, Alastair Houghton wrote:
> > > > On 5 Dec 2023, at 15:19, Alastair Houghton
> > > > <ahoughton@...le.com>
> > > > wrote:
> > > > > > 
> > > > > > > > Maybe I’ve missed a reply somewhere along the lines;
> > > > > > > > here’s a
> > > > > > > > tentative patch that just does the simple thing of
> > > > > > > > making
> > > > > > > > setlocale(LC_ALL, "") pick the C.UTF-8 locale if it’s
> > > > > > > > unable to
> > > > > > > > find the locale specified in the environment.
> > > > > > 
> > > > > > [snip]
> > > > > > 
> > > > > > Hah. So, testing that patch, having removed my hacks to
> > > > > > avoid
> > > > > > using Musl’s locale support, I find it doesn’t actually
> > > > > > work (for
> > > > > > two reasons; one, NULL doesn’t mean not found, it means
> > > > > > “use
> > > > > > ‘C’”; and two, there is some very odd code in setlocale.c
> > > > > > that
> > > > > > causes things to go wrong if the specified name is longer
> > > > > > than
> > > > > > LOCALE_NAME_MAX).
> > > > > > 
> > > > > > I’ll come back with an updated patch in a bit.
> > > > 
> > > > Updated patch:
> > > > 
> > > > ==== Cut here ====
> > > > diff --git a/src/locale/locale_map.c b/src/locale/locale_map.c
> > > > index da61f7fc..097da1ad 100644
> > > > --- a/src/locale/locale_map.c
> > > > +++ b/src/locale/locale_map.c
> > > > @@ -31,7 +31,7 @@ static const char envvars[][12] = {
> > > >  volatile int __locale_lock[1];
> > > >  volatile int *const __locale_lockptr = __locale_lock;
> > > >  
> > > > -const struct __locale_map *__get_locale(int cat, const char
> > > > *val)
> > > > +const struct __locale_map *__get_locale(int cat, const char
> > > > *locale)
> > > >  {
> > > >   static void *volatile loc_head;
> > > >   const struct __locale_map *p;
> > > > @@ -39,6 +39,7 @@ const struct __locale_map *__get_locale(int
> > > > cat,
> > > > const char *val)
> > > >   const char *path = 0, *z;
> > > >   char buf[256];
> > > >   size_t l, n;
> > > > + const char *val = locale;
> > > >  
> > > >   if (!*val) {
> > > >   (val = getenv("LC_ALL")) && *val ||
> > > > @@ -92,22 +93,18 @@ const struct __locale_map *__get_locale(int
> > > > cat, const char *val)
> > > >   }
> > > >   }
> > > >  
> > > > - /* If no locale definition was found, make a locale map
> > > > - * object anyway to store the name, which is kept for the
> > > > - * sake of being able to do message translations at the
> > > > - * application level. */
> > > > - if (!new && (new = malloc(sizeof *new))) {
> > > > - new->map = __c_dot_utf8.map;
> > > > - new->map_size = __c_dot_utf8.map_size;
> > > > - memcpy(new->name, val, n);
> > > > - new->name[n] = 0;
> > > > - new->next = loc_head;
> > > > - loc_head = new;
> > > > - }
> > > > + /* If no locale definition was found, and we specified a
> > > > + * locale name of "", return the C.UTF-8 locale. */
> > > > + if (!new && !*locale) new = (void *)&__c_dot_utf8;
> > > >  
> > > >   /* For LC_CTYPE, never return a null pointer unless the
> > > >   * requested name was "C" or "POSIX". */
> > > >   if (!new && cat == LC_CTYPE) new = (void *)&__c_dot_utf8;
> > > >  
> > > > + /* Returning NULL means "C locale"; if we get here and
> > > > + * there's no locale, return failure instead. */
> > > > + if (!new)
> > > > + return LOC_MAP_FAILED;
> > > > +
> > > >   return new;
> > > >  }
> > > > diff --git a/src/locale/setlocale.c b/src/locale/setlocale.c
> > > > index 360c4437..9842d95d 100644
> > > > --- a/src/locale/setlocale.c
> > > > +++ b/src/locale/setlocale.c
> > > > @@ -28,12 +28,14 @@ char *setlocale(int cat, const char *name)
> > > >   const char *p = name;
> > > >   for (i=0; i<LC_ALL; i++) {
> > > >   const char *z = __strchrnul(p, ';');
> > > > - if (z-p <= LOCALE_NAME_MAX) {
> > > > + if (z-p > LOCALE_NAME_MAX)
> > > > + lm = LOC_MAP_FAILED;
> > > > + else {
> > > >   memcpy(part, p, z-p);
> > > >   part[z-p] = 0;
> > > >   if (*z) p = z+1;
> > > > + lm = __get_locale(i, part);
> > > >   }
> > > > - lm = __get_locale(i, part);
> > > >   if (lm == LOC_MAP_FAILED) {
> > > >   UNLOCK(__locale_lock);
> > > >   return 0;
> > > > ==== Cut here ====
> > 
> > Sorry to be late chiming in here. There's something I've been
> > meaning
> > to ask: back when this was first proposed, I recall there being two
> > variants we considered: one where setlocale to "" where the env
> > vars
> > don't resolve to any real locale file produces as its
> > implementation-defined result "C.UTF-8", and another where it
> > produces
> > a ghost locale with the requested name but the behavior of "C.UTF-
> > 8".
> > 
> > Is there a reason you think the former is a better choice than the
> > latter? The latter would avoid breaking things for users with
> > application translations but no libc locale files. However it
> > requires
> > more complex logic for consistency I think, and I'm not sure we
> > ever
> > worked out if that could be done in a reasonable way.

Thanks for chimming in again Rich. I think we have discussed in this
thread, that we want to avoid the second option, because it provides an
inconsistent experience to users. As a distro maintainer, my life will
be much easier if users that lack translations get affected by this. We
have 6 months for this to be included in alpine, and for us to work
together with the users to add their language to musl-locales. If we
keep some sort of inconsistent behavior (between what we claim we
support by returning the name, and what we actually support, which is
C), then I'm sure we'll bump into multiple corner-cases, where I'll
have a very hard time supporting users, and they will have less
motivation to work with us provide translations for their language.

> > 
> > Another option that wasn't raised before but that might be worth
> > considering is keeping the existing behavior if MUSL_LOCPATH is not
> > set (all names are valid and are aliases for "C.UTF-8" but doing as
> > in
> > your patch if it's set.

Same here, although probably less critical, since installing musl-
locales in alpine sets the variable by default. However, if we ever
want to have a default path that does not depend on an environment
variable (something I'd really like to have), this might get trickier?

I've also tested the patch on my GNOME setup with Spanish localization,
and everything seems to work as expected.

Best,
Pablo.

> > 
> > Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.