Message-ID: <Z4K0xWcQ6tP30CZc@beigestar>
Date: Sat, 11 Jan 2025 18:13:25 +0000
From: Gavin Smith <gavinsmith0123@...il.com>
To: musl@...ts.openwall.com
Subject: gettext LC_MESSAGES differences from other libc

(Please CC me in any replies as I am not subscribed to the list.)

As you know, the gettext function in musl does not behave exactly like
the function in glibc and some other libc implementations. Specifically,
it does not obey the LANGUAGE variable, which can be used to specify that
translated strings should be in a certain language.

In 2014, you discussed the rationale for not supporting LANGUAGE. There
were issues with threads and caching:

Rich Felker, Thu, 31 Jul 2014, "How should $LANGUAGE work in our gettext?"
https://www.openwall.com/lists/musl/2014/07/31/2

Recently in the Texinfo project, we ran into this incompatibility with
musl when translating strings that are to be placed in output files.

The gettext API (whether in musl or in glibc and other implementations)
is not a perfect match for Texinfo's needs, as much of it assumes that
the target language is that of the user, the person sitting in front of
the computer, whereas for Texinfo the appropriate translation language is
that of the input document. For example, somebody could be generating
documentation in Italian to be posted to a website, while they don't
speak Italian themselves and do not have an Italian locale installed.

The only way we can support this with glibc is to set LC_MESSAGES and/or
LC_ALL to a locale that is not "C" or "POSIX", and then to set the
LANGUAGE variable to the actual target language. This is a nuisance, as
it is sometimes a struggle to find such a locale at all. The assumption
when this API was designed was that a user with only a "C" locale does
not need translations, but this is false when they are generating them
for somebody else.

libc appears to offer no way simply to open an arbitrary .mo file (the
file with the translated strings in it) to get the translations, forcing
you to go through the locale system.

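As a rough illustration of the glibc workaround described above, here is
a minimal sketch in C. The function names, the candidate locale list and
the use of LC_MESSAGES rather than LC_ALL are assumptions made for this
example; it is not the code Texinfo uses. It sets LANGUAGE to the target
language, tries to enter some locale other than "C" or "POSIX", and then
looks the string up with dgettext().

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv() */
#include <libintl.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical candidates: which non-"C" locales are installed is
   system-specific. "" means "whatever the environment requests". */
static const char *const candidate_locales[] = {
    "en_US.UTF-8", "en_GB.UTF-8", ""
};

/* Try each candidate for LC_MESSAGES until one is accepted whose name
   is not "C" or "POSIX". Returns 1 on success, 0 if none was found. */
static int enter_non_c_locale(void)
{
    size_t n = sizeof candidate_locales / sizeof candidate_locales[0];
    for (size_t i = 0; i < n; i++) {
        const char *got = setlocale(LC_MESSAGES, candidate_locales[i]);
        if (got && strcmp(got, "C") != 0 && strcmp(got, "POSIX") != 0)
            return 1;
    }
    return 0;
}

/* Look up msgid in `domain` (catalogs under `localedir`) for `language`.
   Returns msgid itself when no translation is available.
   Not thread-safe: it changes the environment and the global locale. */
const char *translate_for(const char *domain, const char *localedir,
                          const char *language, const char *msgid)
{
    if (setenv("LANGUAGE", language, 1) != 0)
        return msgid;
    enter_non_c_locale();               /* best effort; see note below */
    bindtextdomain(domain, localedir);
    return dgettext(domain, msgid);
}
```

A call such as translate_for("texinfo_document", "/usr/share/locale",
"it", "Table of Contents") (domain and path given only as examples)
yields the Italian string with glibc only if one of the candidate locales
happens to be installed. Otherwise the lookup silently falls back to the
untranslated msgid, which is the nuisance described above.
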
musl supports setting LC_MESSAGES to an arbitrary value that is not a
locale, so it can access arbitrary translation files in a different way.
However, we didn't think it was worth having a special case in the code
just for musl:

https://lists.gnu.org/archive/html/bug-texinfo/2024-12/msg00035.html

You also discussed changing how LC_MESSAGES works in a post in 2017, but
as far as I am aware nothing came of it:

Rich Felker, Wed, 8 Nov 2017, "Re: setlocale behavior with 'missing' locales"

> One notable issue is that, right now, we rely on being able to set
> LC_MESSAGES to an arbitrary name even if there's no libc locale
> definition for it; this is because gettext() relies on the name of the
> current LC_MESSAGES locale to find (application-specific) translation
> files that might exist even without a libc translation. I'm not sure
> how we would best keep this working under changes similar to the above.

https://www.openwall.com/lists/musl/2017/11/08/2

Could there be a possibility of a new extension to the gettext API that
works with musl, glibc and other libc implementations, and that could be
used for arbitrary languages, not just those with installed locales?

I mention the possibility because I found an old proposal (from 2016) to
extend the glibc API for translation languages that could be of interest:

Bruno Haible, 2016-05-10
"Re: [bug-gettext] RFC: move LANGUAGE check out of gettext()"
https://lists.gnu.org/archive/html/bug-gettext/2016-05/msg00009.html

> Why is this being reported for the LANGUAGE environment variable but not
> for the LANG and LC_ALL environment variables? Because for LANG and LC_*
> we have an architecture composed of three functionalities:
>
> (A) environment variables: getenv(), setenv()
>
> (B) locales: setlocale(), newlocale(), uselocale().
>
> (C) gettext() and friends.
>
> (A) is the bottom-most layer. But it has the limitation that multi-threaded
> programs must not call setenv().
>
> (B) is a layer that fetches the initial values from (A), and that allows
> mutators (setlocale(), uselocale()) in multi-threaded programs.
> So that multi-threaded applications can modify the program's locale after
> startup, there is the setlocale() function.
> So that multi-threaded programs can have a locale per thread, there is a
> uselocale() function.
>
> (C) is an application layer that happens to be in Glibc for convenience
> reasons. It is based on the layer (B).
>
> Back to the LANGUAGE environment variable. The problem is that here we
> have the layers (A) and (C), but (B) is missing. The solution ought to
> be to introduce a layer (B) for LANGUAGE. LANGUAGE is not specified by
> POSIX and does not perfectly fit into the locale system, therefore I
> believe it is best treated separately.

This was also raised in the glibc bugtracker system:

Daiki Ueno, 2016-05-31 "API for language priority list"
https://sourceware.org/bugzilla/show_bug.cgi?id=20184

It was proposed that a language preference list could be set on a
thread-specific basis, that would not involve setting environment
variables. This accords with point 2 in Rich Felker's 2014 commentary:

> 2. The $LANGUAGE variable conflicts with uselocale and thread-local
> locales. For instance if the caller has called uselocale to request
> language Y despite the process-wide locale being language X, where
> language X is based on the user's preferences in the environment and
> language Y is based on data, it's wrong to present messages based on
> the environment ($LANGUAGE) rather than the requested language Y.

I hope this possibility is interesting to you although I don't fully
understand all the issues involved.