Message-ID: <20140731193854.GA1599@brightrain.aerifal.cx>
Date: Thu, 31 Jul 2014 15:38:54 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: How should $LANGUAGE work in our gettext?

I've been trying to figure out a reasonable way to add $LANGUAGE
support to our gettext, or whether it's even reasonable. In GNU gettext,
$LANGUAGE overrides the locale name coming from the locale category
(usually LC_MESSAGES) with a colon-delimited list of languages in
preference order.
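For concreteness, here is a minimal sketch of walking such a colon-delimited preference list (e.g. LANGUAGE=pt_BR:pt:en) in order without copying or modifying it. The name next_language and its cursor-style interface are hypothetical, not musl or GNU gettext API, and skipping empty entries is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator over a colon-delimited list such as the value
   of $LANGUAGE. Returns a pointer to the next non-empty entry and
   stores its length in *len, or returns NULL when the list is
   exhausted. The input is not modified; *cursor holds the position
   between calls. */
static const char *next_language(const char **cursor, size_t *len)
{
    const char *p = *cursor;
    while (p && *p) {
        size_t n = strcspn(p, ":");   /* length up to next ':' or NUL */
        const char *entry = p;
        p += n;
        if (*p == ':')
            p++;                      /* step over the delimiter */
        if (n > 0) {                  /* skip empty entries as in "a::b" */
            *cursor = p;
            *len = n;
            return entry;
        }
    }
    *cursor = p;
    return NULL;
}
```

Entries are returned as (pointer, length) pairs into the original string, so a caller can probe each candidate in preference order without allocating.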

There are multiple problems I'm running into:

1. Unlike the locale name, which is fixed and well-defined at the time
   of setlocale, $LANGUAGE is only available by calling getenv. Doing
   a linear search of the environment on each translation call seems
   wrong. I think GNU gettext does this, but has bloated caching of
   recent lookups that masks most of the impact.

2. The $LANGUAGE variable conflicts with uselocale and thread-local
   locales. For instance if the caller has called uselocale to request
   language Y despite the process-wide locale being language X, where
   language X is based on the user's preferences in the environment
   and language Y is based on the data being processed, it's wrong to
   present messages based on the environment ($LANGUAGE) rather than
   the requested language Y.

3. If the first choice of language is not available, it's not clear
   how to best cache the non-availability so as not to retry accessing
   the filesystem to look for a translation file each time a
   translation lookup is performed.

4. It seems semantically wrong for $LANGUAGE to override ALL locale
   category settings as it does in GNU gettext. For instance, if
   LC_TIME is a different language from LC_MESSAGES and the
   application uses dcgettext with LC_TIME, semantically this should
   select a message translation file matching the language in use
   for the day/month names (from LC_TIME).
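To make issues 3 and 4 concrete: with the GNU-style catalog layout <dirname>/<lang>/<category>/<domain>.mo, every candidate language costs one filesystem probe, and the category directory follows whatever category was passed to dcgettext. The helper below is a hypothetical sketch of building one such path; it is not musl's actual code.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: builds the catalog path for one candidate
   language using the GNU-style layout
   <dirname>/<lang>/<category>/<domain>.mo.
   lang need not be NUL-terminated (only langlen bytes are used), so an
   entry can point directly into a colon-delimited list. Returns 0 on
   success, -1 if the result would not fit in buf. */
static int catalog_path(char *buf, size_t size, const char *dirname,
                        const char *lang, size_t langlen,
                        const char *catname, const char *domain)
{
    int n = snprintf(buf, size, "%s/%.*s/%s/%s.mo",
                     dirname, (int)langlen, lang, catname, domain);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A failed open of such a path is exactly the result that would need caching so it is not repeated on every lookup.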

Issues 1 and 3 are possibly solvable by having the fallback in
$LANGUAGE be resolved at bindtextdomain time (as part of the binding),
and updated (weak alias magic can be used here) if setlocale is called
again after any domains are already bound, but this is of course
rather complex and ugly.
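A rough sketch of "resolve at bind time, recompute after setlocale" might cache the chosen language per domain together with a generation number. Everything here is hypothetical: locale_generation stands in for a counter setlocale would bump, struct binding for per-domain state, and probe_in_list is a stand-in that checks a fixed list instead of the filesystem so the caching logic can be exercised in isolation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: a counter setlocale() would increment so that bindings
   resolved under an older locale get recomputed. */
static unsigned locale_generation = 1;

/* Hypothetical per-domain binding caching the language chosen from the
   fallback list, including the negative result "none available". */
struct binding {
    unsigned gen;     /* locale_generation at resolution; 0 = never */
    char lang[64];    /* chosen language, or "" if none was found */
};

/* Returns nonzero if a catalog exists for lang[0..len). */
typedef int (*probe_fn)(const char *lang, size_t len, void *ctx);

/* Stand-in probe: treats ctx as a colon-delimited list of "installed"
   languages instead of touching the filesystem. */
static int probe_in_list(const char *lang, size_t len, void *ctx)
{
    const char *p = ctx;
    while (*p) {
        size_t n = strcspn(p, ":");
        if (n == len && memcmp(p, lang, len) == 0)
            return 1;
        p += n;
        if (*p == ':')
            p++;
    }
    return 0;
}

/* Resolves the fallback list at most once per locale generation; later
   calls reuse the cached answer (positive or negative) without
   probing again. Returns NULL if no candidate is available. */
static const char *binding_language(struct binding *b, const char *list,
                                    probe_fn probe, void *ctx)
{
    if (b->gen != locale_generation) {
        const char *p = list;
        b->lang[0] = '\0';            /* default: cache "none found" */
        while (p && *p) {
            size_t n = strcspn(p, ":");
            if (n > 0 && n < sizeof b->lang && probe(p, n, ctx)) {
                memcpy(b->lang, p, n);
                b->lang[n] = '\0';
                break;
            }
            p += n;
            if (*p == ':')
                p++;
        }
        b->gen = locale_generation;
    }
    return b->lang[0] ? b->lang : NULL;
}
```

The negative case is cached the same way as a hit, which is what keeps an unavailable first choice from triggering a filesystem probe on every lookup.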

Issue 2 can probably be alleviated by ignoring $LANGUAGE if uselocale
is active and the uselocale value for LC_MESSAGES was set explicitly,
but this is also an ugly hack.
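The portable half of that check is sketched below: uselocale((locale_t)0) returns LC_GLOBAL_LOCALE when no thread-local locale is installed. Whether LC_MESSAGES in that locale was set explicitly is not visible through the public API, so libc would have to track it internally; the helper name language_env_applies is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>

/* Hypothetical policy helper: returns nonzero if $LANGUAGE may be
   consulted for the calling thread, i.e. no thread-local locale from
   uselocale() is in effect. This is coarser than the rule described
   above: it cannot tell whether LC_MESSAGES specifically was chosen
   explicitly in the thread-local locale. */
static int language_env_applies(void)
{
    return uselocale((locale_t)0) == LC_GLOBAL_LOCALE;
}
```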

Issue 4 can obviously be solved by ignoring $LANGUAGE for categories
other than LC_MESSAGES, but this makes for behavior inconsistent with
GNU gettext.
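Restricting $LANGUAGE to LC_MESSAGES could be as simple as the following sketch. The function name language_list is hypothetical; it returns the preference list to search for one lookup, so dcgettext with LC_TIME would follow the LC_TIME locale name rather than the environment.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>

/* Hypothetical: picks the preference list for one lookup. $LANGUAGE is
   honored only for LC_MESSAGES; any other category passed to dcgettext
   (e.g. LC_TIME) uses that category's own locale name, so translated
   messages match the language of the day/month names. */
static const char *language_list(int category, const char *locale_name)
{
    if (category == LC_MESSAGES) {
        const char *env = getenv("LANGUAGE");
        if (env && *env)
            return env;
    }
    return locale_name;
}
```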

So basically, I think these issues are solvable, but as far as I can
tell, only by ugly hacks and behavior that's inconsistent with what
users might be used to. So I'm hesitant to do them and feeling more
inclined to ignore $LANGUAGE and look for another method of fallbacks
such as allowing a colon-delimited fallback list in LC_MESSAGES.

At this point I'm not going to try to resolve this in 1.1.4. It's a
complex issue and needs more discussion, which is probably better
facilitated by having a release out there for testing and feedback.
But I would like to go ahead and get the discussion started.

Rich
