Message-ID: <CA+T2pCGKU001PYXjBqj+oQ9W6XC99KGHcKsNh0yTBE_rf3rBKQ@mail.gmail.com>
Date: Thu, 1 Mar 2018 13:10:47 -0600
From: William Pitcock <nenolod@...eferenced.org>
To: musl@...ts.openwall.com
Subject: Re: setlocale behavior with 'missing' locales

Hi,

On Wed, Feb 28, 2018 at 7:13 PM, Rich Felker <dalias@...c.org> wrote:
> On Wed, Nov 08, 2017 at 12:27:15AM -0500, Rich Felker wrote:
>> On Wed, Nov 08, 2017 at 12:03:38AM -0500, Rich Felker wrote:
>> > Unfortunately this turns out to have been something of a tradeoff,
>> > since there's no way for applications (and, as it turns out,
>> > especially tests/test suites) to query whether a particular locale is
>> > "really" available. I've been asked to change the behavior to fail on
>> > unknown locale names, but of course that's not a working option in
>> > light of the above.
>> >
>> > I think there may be a solution that makes everyone happy, but I'm not
>> > sure yet. I'm going to follow up with a description and analysis of
>> > whether it's valid/conforming.
>>
>> So here's the possible solution. ISO C leaves the default locale when
>> setlocale(cat,"") is called implementation-defined. POSIX however
>> defines it in terms of the LANG and LC_* environment variables. See
>> the CX text in:
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html
>>
>> "Setting all of the categories of the global locale is similar to
>> successively setting each individual category of the global locale,
>> except that all error checking is done before any actions are
>> performed. To set all the categories of the global locale,
>> setlocale() can be invoked as:
>>
>>     setlocale(LC_ALL, "");
>>
>> In this case, setlocale() shall first verify that the values of all
>> the environment variables it needs according to the precedence rules
>> (described in XBD Environment Variables) indicate supported locales.
>> If the value of any of these environment variable searches yields a
>> locale that is not supported (and non-null), setlocale() shall
>> return a null pointer and the global locale shall not be changed. If
>> all environment variables name supported locales, setlocale() shall
>> proceed as if it had been called for each category, using the
>> appropriate value from the associated environment variable or from
>> the implementation-defined default if there is no such value."
>>
>> and the Environment Variables text in XBD 8.2:
>>
>> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02
>>
>> The former seems to tie our hands: unless the locales determined by
>> the environment variables all exist, setlocale is required to fail and
>> leave us in the (unacceptable) "C" locale where UTF-8 doesn't work.
>> However the latter seems to offer us a way out. After describing how
>> the precedence of the variables work, how locale pathnames work if
>> localedef is supported (musl doesn't support it), and how
>> implementation-provided/defined locale names work, it specifies:
>>
>> "If the locale value is not recognized by the implementation, the
>> behavior is unspecified."
>>
>> My optimistic reading of this is that, in the event the locale name
>> provided does not correspond to something we recognize, we're free to
>> define how it's interpreted, and always interpret it as C.UTF-8.
>>
>> What this would achieve is the following:
>>
>> 1. setlocale(cat, explicit_locale_name) - succeeds if the locale
>>    actually has a definition file, fails and returns a null pointer
>>    otherwise.
>>
>> 2. setlocale(cat, "") - always succeeds, honoring the environment
>>    variable for the category if a locale definition file by that name
>>    exists, but otherwise (the unspecified behavior) treating it as if
>>    it were C.UTF-8.
>>
>> This way, applications that probe for specific locale names can do so
>> and determine if they exist, but applications that just want to use
>> the default locale the user configured will still avoid catastrophic
>> breakage (failure to support UTF-8) even if they encounter "bad" LC_*
>> variables.
>>
>> Does this approach sound acceptable? I'm fairly content with
>> interpreting it as conforming to the standard; I'm mainly concerned
>> about whether there might be unforeseen breakage.
>>
>> One notable issue is that, right now, we rely on being able to set
>> LC_MESSAGES to an arbitrary name even if there's no libc locale
>> definition for it; this is because gettext() relies on the name of the
>> current LC_MESSAGES locale to find (application-specific) translation
>> files that might exist even without a libc translation. I'm not sure
>> how we would best keep this working under changes similar to the
>> above.
>
> Any further thoughts on this? I'd like to begin addressing these
> issues in this release cycle.
>
> I think the above plan works (is conforming, doesn't break things)
> except for the LC_MESSAGES issue mentioned at the end. I don't have
> any good ideas still for dealing with that. Really since gettext can
> be used with any category, not just LC_MESSAGES (although LC_MESSAGES
> is the normal choice), it applies to all categories. Maybe we could
> still use the ("nonexistent") requested locale name in this case, or
> some derivative of it that clarifies that it's synthesized...?

+1 to using this approach. We could use a locale name such as
"en_US@...tual.UTF-8". glibc uses this style of locale name for
locales such as UK English with eurozone LC_CURRENCY:
en_UK@...o.UTF-8.

William
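
For readers following the thread, the two-case policy quoted above (explicit names fail when no definition file exists, while "" always succeeds and falls back to C.UTF-8) can be sketched as a small model. This is only an illustration of the proposal as described, not musl's code: resolve_locale(), locale_file_exists(), and the table of known names are hypothetical, and the env_value parameter stands in for whatever the LC_ALL / LC_* / LANG lookup would yield for the category. Treating an unset or empty environment value as C.UTF-8 is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for "a locale definition file by this name
   exists". A real libc would consult its locale search path; this
   table is made up for illustration. */
static int locale_file_exists(const char *name)
{
    static const char *const known[] = { "C", "C.UTF-8", "en_US.UTF-8" };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strcmp(name, known[i]) == 0)
            return 1;
    return 0;
}

/* Model of the proposed policy. Returns the locale name that would
   take effect, or NULL where setlocale() would fail.
   requested: the second argument to setlocale(), must not be NULL.
   env_value: result of the environment lookup for the category
              (may be NULL or empty when nothing is set). */
static const char *resolve_locale(const char *requested, const char *env_value)
{
    assert(requested != NULL);

    /* Case 1: explicit name. Succeed only if a definition exists,
       so applications can probe for real availability. */
    if (requested[0] != '\0')
        return locale_file_exists(requested) ? requested : NULL;

    /* Case 2: "". Honor the environment if it names a known locale. */
    if (env_value != NULL && env_value[0] != '\0'
        && locale_file_exists(env_value))
        return env_value;

    /* Unrecognized (or, by assumption, unset) value: never fail,
       fall back to C.UTF-8 so UTF-8 keeps working. */
    return "C.UTF-8";
}
```

With this model, resolve_locale("xx_YY.UTF-8", NULL) returns NULL, while resolve_locale("", "xx_YY.UTF-8") returns "C.UTF-8". The sketch deliberately leaves out the open LC_MESSAGES question raised in the thread, where the requested name (or a derivative of it) might need to be preserved for gettext().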