Message-ID: <20180302014349.GZ1436@brightrain.aerifal.cx>
Date: Thu, 1 Mar 2018 20:43:49 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: setlocale behavior with 'missing' locales

On Thu, Mar 01, 2018 at 02:25:45PM -0500, Rich Felker wrote:
> On Thu, Mar 01, 2018 at 01:10:47PM -0600, William Pitcock wrote:
> > >> One notable issue is that, right now, we rely on being able to set
> > >> LC_MESSAGES to an arbitrary name even if there's no libc locale
> > >> definition for it; this is because gettext() relies on the name of the
> > >> current LC_MESSAGES locale to find (application-specific) translation
> > >> files that might exist even without a libc translation. I'm not sure
> > >> how we would best keep this working under changes similar to the
> > >> above.
> > >
> > > Any further thoughts on this? I'd like to begin addressing these
> > > issues in this release cycle.
> > >
> > > I think the above plan works (is conforming, doesn't break things)
> > > except for the LC_MESSAGES issue mentioned at the end. I don't have
> > > any good ideas still for dealing with that. Really since gettext can
> > > be used with any category, not just LC_MESSAGES (although LC_MESSAGES
> > > is the normal choice), it applies to all categories. Maybe we could
> > > still use the ("nonexistent") requested locale name in this case, or
> > > some derivative of it that clarifies that it's synthesized...?
> > 
> > +1 to using this approach.
> > 
> > We could use a locale name such as "en_US@...tual.UTF-8".
> > 
> > glibc uses this style of locale name for locales such as UK English
> > with eurozone LC_MONETARY: en_UK@...o.UTF-8.
> 
> I was actually just in the process of trying to work out something
> very similar. Here's how I think it might work:
> 
> setlocale(cat, "") -- always succeeds, produces ll_TT@...tual (or
> ll_TT@...sing was my idea) if a locale file by the matching name is
> not found.
> 
> setlocale(cat, "ll_TT@...tual") (or whatever name) - always succeeds.
> 
> setlocale(cat, "ll_TT[@other]") - succeeds only if a file matching the
> name is found.
> 
> One thing I don't entirely like is repurposing the @ modifier for
> this; it conflicts with (and perhaps fails to preserve) an existing
> modifier if there is one, and affects how search for gettext
> translation files would happen (searching extra @virtual paths).
> Perhaps we should instead make it a separate component delimited in
> some other way so it can always be dropped by gettext.

On this topic, I did some research on GNU gettext, and just like
musl's, it ignores the codeset part of the locale name
ll[_TT][.codeset][@modifier] while trying combinations of including or
omitting _TT and @modifier. So it looks like the only way to make a
synthesized locale name that can match all the same translation files
as the original name, under either musl or GNU gettext, is by
misappropriating the codeset field as the indicator that it's a
synthesized locale. That doesn't sound particularly good.

If we're only concerned about musl gettext and not GNU gettext or
other third-party software trying to parse the resulting synthesized
locale names, we can simply adopt any notation we like and have musl's
gettext ignore it.

Also in the case where the original requested locale had no @modifier
component, adding a special @synth/@...sing/whatever modifier would
not disturb search for translations with either musl or GNU gettext.
At worst GNU gettext would search a few extra nonexistent pathnames.

One other thing to note is that synthesizing locales without adjusting
the name to indicate that they're synthesized does not seem consistent
if setlocale is going to reject unknown explicit names. The name that
the program reads back from setlocale(cat,0) or NL_LOCALE_NAME would
then fail to be valid for subsequent use as an explicit name.

One possible alternative to synthesizing names would be just reading
back the name of the locale that was actually set ("C.UTF-8" or some
fallback like "en" when "en_US" was requested but only "en" was
available). In this case GNU gettext or any third-party code would be
unable to honor the requested locale. musl's internal gettext could,
but I'm not sure this kind of hidden state would be desirable or
consistent, so I'd be a bit hesitant to do it. An alternative would be
just giving up on the ability to get message translations in a
language for which you don't have a locale installed. This would sound
a lot more acceptable if we actually had locale definition files, I
think....

Rich
