Message-ID: <20171108052715.GM1627@brightrain.aerifal.cx>
Date: Wed, 8 Nov 2017 00:27:15 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: setlocale behavior with 'missing' locales

On Wed, Nov 08, 2017 at 12:03:38AM -0500, Rich Felker wrote:
> Unfortunately this turns out to have been something of a tradeoff,
> since there's no way for applications (and, as it turns out,
> especially tests/test suites) to query whether a particular locale is
> "really" available. I've been asked to change the behavior to fail on
> unknown locale names, but of course that's not a working option in
> light of the above.
>
> I think there may be a solution that makes everyone happy, but I'm not
> sure yet. I'm going to follow up with a description and analysis of
> whether it's valid/conforming.

So here's the possible solution.

ISO C leaves the default locale when setlocale(cat,"") is called
implementation-defined. POSIX, however, defines it in terms of the LANG
and LC_* environment variables. See the CX text in:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/setlocale.html

"Setting all of the categories of the global locale is similar to
successively setting each individual category of the global locale,
except that all error checking is done before any actions are
performed. To set all the categories of the global locale, setlocale()
can be invoked as:

setlocale(LC_ALL, "");

In this case, setlocale() shall first verify that the values of all
the environment variables it needs according to the precedence rules
(described in XBD Environment Variables) indicate supported locales.
If the value of any of these environment variable searches yields a
locale that is not supported (and non-null), setlocale() shall return
a null pointer and the global locale shall not be changed.
If all environment variables name supported locales, setlocale() shall
proceed as if it had been called for each category, using the
appropriate value from the associated environment variable or from the
implementation-defined default if there is no such value."

and the Environment Variables text in XBD 8.2:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02

The former seems to tie our hands: unless the locales determined by the
environment variables all exist, setlocale is required to fail and
leave us in the (unacceptable) "C" locale where UTF-8 doesn't work.

However, the latter seems to offer us a way out. After describing how
the precedence of the variables works, how locale pathnames work if
localedef is supported (musl doesn't support it), and how
implementation-provided/defined locale names work, it specifies:

"If the locale value is not recognized by the implementation, the
behavior is unspecified."

My optimistic reading of this is that, in the event the locale name
provided does not correspond to something we recognize, we're free to
define how it's interpreted, so we can choose to always interpret it as
C.UTF-8.

What this would achieve is the following:

1. setlocale(cat, explicit_locale_name) - succeeds if the locale
   actually has a definition file; fails and returns a null pointer
   otherwise.

2. setlocale(cat, "") - always succeeds, honoring the environment
   variable for the category if a locale definition file by that name
   exists, but otherwise (the unspecified behavior) treating it as if it
   were C.UTF-8.

This way, applications that probe for specific locale names can do so
and determine whether they exist, but applications that just want to
use the default locale the user configured will still avoid
catastrophic breakage (failure to support UTF-8) even if they encounter
"bad" LC_* variables.

Does this approach sound acceptable?
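The two setlocale() behaviors proposed above can be modeled as a small per-category resolver. This is a minimal sketch, not musl's implementation: `resolve_locale`, `env_locale_name` and `locale_is_defined` are hypothetical names, the fixed list of "defined" locales stands in for checking whether a locale definition file exists, and returning C.UTF-8 when no variable is set is an assumed implementation-defined default. The environment lookup follows the XBD 8.2 precedence (LC_ALL, then the category's own variable, then LANG, skipping null values).

```c
#define _POSIX_C_SOURCE 200809L /* exposes setenv/unsetenv for exercising this sketch */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for "a locale definition file by that name exists".
 * A real libc would look for a definition file; this sketch uses a fixed list. */
static const char *const defined_locales[] = {
    "C", "POSIX", "C.UTF-8", "de_DE.UTF-8", NULL
};

static int locale_is_defined(const char *name)
{
    for (size_t i = 0; defined_locales[i] != NULL; i++)
        if (strcmp(defined_locales[i], name) == 0)
            return 1;
    return 0;
}

/* XBD 8.2 precedence for one category: LC_ALL, then the category's own
 * variable (e.g. "LC_CTYPE"), then LANG. Unset or empty ("null") values
 * are skipped. Returns NULL if none applies. */
static const char *env_locale_name(const char *category_var)
{
    const char *v;
    if ((v = getenv("LC_ALL")) != NULL && *v != '\0')
        return v;
    if ((v = getenv(category_var)) != NULL && *v != '\0')
        return v;
    if ((v = getenv("LANG")) != NULL && *v != '\0')
        return v;
    return NULL;
}

/* Proposed policy, modeled for a single category:
 *  1. explicit name: return it if defined, otherwise NULL, so callers can probe;
 *  2. "": honor the environment if it names a defined locale, otherwise
 *     (the unspecified case) fall back to C.UTF-8 instead of failing. */
static const char *resolve_locale(const char *category_var, const char *requested)
{
    assert(requested != NULL); /* query mode (NULL argument) is not modeled here */
    if (requested[0] != '\0')
        return locale_is_defined(requested) ? requested : NULL;

    const char *name = env_locale_name(category_var);
    if (name == NULL)
        return "C.UTF-8"; /* assumed implementation-defined default */
    return locale_is_defined(name) ? name : "C.UTF-8";
}
```

For example, with only LANG=xx_YY.UTF-8 set, `resolve_locale("LC_CTYPE", "xx_YY.UTF-8")` returns NULL, so a probe can tell the locale is missing, while `resolve_locale("LC_CTYPE", "")` yields "C.UTF-8" rather than failing. A full setlocale(LC_ALL, "") would apply this per category.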
I'm fairly content with interpreting this approach as conforming to the
standard; I'm mainly concerned about whether there might be unforeseen
breakage.

One notable issue is that, right now, we rely on being able to set
LC_MESSAGES to an arbitrary name even if there's no libc locale
definition for it; this is because gettext() relies on the name of the
current LC_MESSAGES locale to find (application-specific) translation
files that might exist even without a libc translation. I'm not sure
how we would best keep this working under changes similar to the
above.

Rich
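The LC_MESSAGES concern can be made concrete by looking at how a catalog path is derived from the locale name. This sketch assumes the conventional `<dirname>/<locale>/LC_MESSAGES/<domain>.mo` layout used with bindtextdomain() and a simplified fallback order (full name, then without `.codeset`, then without `_territory`); `catalog_candidate` is a hypothetical helper, and real gettext implementations differ in details such as `@modifier` handling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the i-th candidate catalog path for an
 * LC_MESSAGES locale name into buf, using the assumed layout
 * <dir>/<locale>/LC_MESSAGES/<domain>.mo.
 *   i == 0: full name           (pt_BR.UTF-8)
 *   i == 1: strip ".codeset"    (pt_BR)
 *   i == 2: strip "_TERRITORY"  (pt)
 * Returns 0 on success, -1 if that candidate does not apply or buf is too small. */
static int catalog_candidate(char *buf, size_t size, const char *dir,
                             const char *locale_name, const char *domain, int i)
{
    const char *cut = NULL;
    size_t len;

    assert(buf != NULL && dir != NULL && locale_name != NULL && domain != NULL);
    if (i == 0)
        len = strlen(locale_name);
    else if (i == 1 && (cut = strchr(locale_name, '.')) != NULL)
        len = (size_t)(cut - locale_name);
    else if (i == 2 && (cut = strchr(locale_name, '_')) != NULL)
        len = (size_t)(cut - locale_name);
    else
        return -1;

    int n = snprintf(buf, size, "%s/%.*s/LC_MESSAGES/%s.mo",
                     dir, (int)len, locale_name, domain);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

In this model, if setlocale(LC_MESSAGES, "") replaced an unrecognized pt_BR.UTF-8 with C.UTF-8, the name reported by setlocale(LC_MESSAGES, NULL) would be "C.UTF-8", and none of the candidates would point at an application's pt_BR catalog. That is the kind of breakage the LC_MESSAGES paragraph is worried about.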