Message-ID: <20140731193854.GA1599@brightrain.aerifal.cx>
Date: Thu, 31 Jul 2014 15:38:54 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: How should $LANGUAGE work in our gettext?

I've been trying to figure out a reasonable way to add $LANGUAGE
support to our gettext, or if it's even reasonable. In GNU gettext,
$LANGUAGE overrides the locale name coming from the locale category
(usually LC_MESSAGES) with a colon-delimited list of languages in
preference order. There are multiple problems I'm running into:

1. Unlike the locale name, which is fixed and well-defined at the time
   of setlocale, $LANGUAGE is only available by calling getenv. Doing
   a linear-search of the environment on each translation call seems
   wrong. I think GNU gettext does this, but has bloated caching of
   recent lookups that masks most of the impact.

2. The $LANGUAGE variable conflicts with uselocale and thread-local
   locales. For instance if the caller has called uselocale to request
   language Y despite the process-wide locale being language X, where
   language X is based on the user's preferences in the environment
   and language Y is based on data, it's wrong to present messages
   based on the environment ($LANGUAGE) rather than the requested
   language Y.

3. If the first choice of language is not available, it's not clear
   how to best cache the non-availability so as not to retry accessing
   the filesystem to look for a translation file each time a
   translation lookup is performed.

4. It seems semantically wrong for $LANGUAGE to override ALL locale
   category settings like it does in the GNU gettext. For instance if
   LC_TIME is a different language from LC_MESSAGES and the
   application uses dcgettext with LC_TIME, semantically this should
   provide a message translation file that matches the language in use
   for the day/month names (from LC_TIME).
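
To make the format concrete, here is a minimal sketch of splitting a
colon-delimited $LANGUAGE value into a preference-ordered list once,
rather than calling getenv on every lookup. It is not taken from musl
or GNU gettext; struct lang_list, parse_lang_list, load_lang_list, and
the two size limits are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LANGS 8      /* hypothetical cap on list length */
#define MAX_LANG_LEN 32  /* hypothetical cap on one language name */

/* Hypothetical cache: the preference list, parsed once. */
struct lang_list {
    size_t count;
    char names[MAX_LANGS][MAX_LANG_LEN];
};

/* Split a colon-delimited list such as "de_AT:de:en" into entries,
 * keeping preference order. Empty and over-long entries are skipped. */
void parse_lang_list(const char *s, struct lang_list *out)
{
    out->count = 0;
    if (!s) return;
    while (*s && out->count < MAX_LANGS) {
        size_t n = strcspn(s, ":");   /* length up to next ':' or end */
        if (n > 0 && n < MAX_LANG_LEN) {
            memcpy(out->names[out->count], s, n);
            out->names[out->count][n] = '\0';
            out->count++;
        }
        s += n;
        if (*s == ':') s++;           /* step over the delimiter */
    }
}

/* Read the environment once (assumed here: at bind time) instead of
 * searching it on every translation call. */
void load_lang_list(struct lang_list *out)
{
    parse_lang_list(getenv("LANGUAGE"), out);
}
```

A caller could run load_lang_list once and then try each entry in
order. How the non-availability of a given entry would be cached
(issue 3) is deliberately left out of this sketch.
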

Issues 1 and 3 are possibly solvable by having the fallback in
$LANGUAGE be resolved at bindtextdomain time (as part of the binding),
and updated (weak alias magic can be used here) if setlocale is called
again after any domains are already bound, but this is of course
rather complex and ugly.

Issue 2 can probably be alleviated by ignoring $LANGUAGE if uselocale
is active and the uselocale value for LC_MESSAGES was set explicitly,
but this is also an ugly hack.

Issue 4 can obviously be solved by ignoring $LANGUAGE for categories
other than LC_MESSAGES, but this makes for behavior inconsistent with
GNU gettext.

So basically, I think these issues are solvable, but as far as I can
tell, only by ugly hacks and behavior that's inconsistent with what
users might be used to. So I'm hesitant to do them and feeling more
inclined to ignore $LANGUAGE and look for another method of fallbacks
such as allowing a colon-delimited fallback list in LC_MESSAGES.

At this point I'm not going to try to resolve this in 1.1.4. It's a
complex issue and needs more discussion, which is probably better
facilitated by having a release out there for testing and feedback.
But I would like to go ahead and get the discussion started.

Rich