Message-ID: <20140801052953.GA4515@brightrain.aerifal.cx>
Date: Fri, 1 Aug 2014 01:29:53 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Information on locale system in musl 1.1.4

The major new feature in musl 1.1.4 is the locale system. In
accordance with the long-term plan it was based on, it's designed to:

1. Be lightweight -- calling setlocale pulls in around 2k of code when
   static linking on i386.

2. Meet the minimum needs for applications to provide an interface in
   the user's preferred natural language using the official and de
   facto standard interfaces for doing so -- the standard C/POSIX
   locale API and gettext translation API.

3. Avoid complicating the libc or applications that call setlocale in
   ways that impact security, introduce bugs that only occur in
   unusual locales, or discourage developers of light applications
   from calling setlocale.

The version of the locale system in musl 1.1.4 is still incomplete and
experimental. However, its experimental status should not impact use
on existing deployments; locales are not loaded at all unless the
MUSL_LOCPATH environment variable is set.

The features presently supported are:

- The setting of the LC_MESSAGES locale category is recorded
  regardless of whether a libc locale file is available to be loaded.
  This will be used by the gettext interfaces if the application uses
  gettext message translation and can be retrieved by the application
  by calling setlocale(LC_MESSAGES, 0).

- Message translation for most messages produced by libc, including
  error and signal name strings, controlled by LC_MESSAGES.

- Translated day/month names and appropriate date/time format strings,
  controlled by LC_TIME.

The key missing features which will definitely be added at some time
in the future are collation rules (LC_COLLATE) and currency
information and monetary numeric formatting (LC_MONETARY).

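As an illustration of the LC_MESSAGES behavior listed above, here is a
minimal sketch of how an application might read back the recorded
setting. The helper name recorded_messages_locale is hypothetical and
is not part of musl or of any standard API; only setlocale and the
LC_* constants are standard.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper (not part of musl or any standard API):
 * adopt the locale settings from the environment, then return the
 * name currently recorded for the LC_MESSAGES category. */
static const char *recorded_messages_locale(void)
{
    /* An empty string asks setlocale to take the settings from the
     * environment (LC_ALL, LC_MESSAGES, LANG). The return value is
     * deliberately ignored in this sketch. */
    setlocale(LC_ALL, "");

    /* A null second argument queries the category without changing
     * it; this is the setlocale(LC_MESSAGES, 0) call from the text. */
    return setlocale(LC_MESSAGES, NULL);
}
```

Going by the description above, on musl 1.1.4 the returned name should
reflect the requested LC_MESSAGES setting even when no matching locale
file can be loaded. Other libcs may behave differently in that case.
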
Finding locale files:

If the MUSL_LOCPATH environment variable is set, it's treated as a
colon-delimited list of directories to search for locale files. The
locale file must have the exact same name as the locale setting being
requested. Locale names greater than 15 bytes in length, starting with
a '.', or containing the '/' character are rejected. In the future,
musl will probably ignore everything after the dot when the locale
name contains a dot, since by convention this component reflects a
character encoding, whereas musl always uses UTF-8. Other characters
may also be rejected in the future; to be safe, locale names should be
restricted to using alphanumeric characters, the underscore, and the
at sign.

In programs running with elevated privileges (setuid/setgid/etc.), the
MUSL_LOCPATH environment variable is not honored. At present, this
means there is no way to use the locale functionality with such
programs. This deficiency will be addressed in a future release.

Unrecognized locale names:

Any locale name that is not usable for any reason (file not found,
name rejected, error loading, etc.) is treated as an alias for the
built-in C.UTF-8 locale. The motivation for this behavior is to avoid
possibly breaking UTF-8 support when the application depends on
setlocale success for UTF-8 to work; this may be a bigger issue in the
future if musl adopts an abstract 8-bit C locale.

Locale file format:

A locale file for use by musl is simply a .mo format file like the
ones used by gettext, and can be created with the msgfmt utility from
the GNU gettext package, gettext-tiny, or possibly other versions.
Translations for message strings and LC_TIME strings (day names, month
names, strftime-style date/time format strings) all go in the same
translation file. The format for monetary and collation data will be
specified at a later time, but will be stored in the same type of
file.

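The name restrictions described under "Finding locale files" can be
sketched as two small checks. This is an illustration only and is not
taken from musl's source; both function names are hypothetical, and
the treatment of null and empty names is an assumption of this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not musl's code: returns 1 if `name` passes
 * the three rejection rules stated in the text, 0 otherwise. */
static int locale_name_acceptable(const char *name)
{
    /* Assumption: a null or empty name is not a file name to look up. */
    if (!name || !*name) return 0;
    if (strlen(name) > 15) return 0;   /* longer than 15 bytes */
    if (name[0] == '.') return 0;      /* starts with a dot */
    if (strchr(name, '/')) return 0;   /* contains a slash */
    return 1;
}

/* Hypothetical stricter check following the "to be safe" advice:
 * only alphanumerics, underscore, and the at sign are allowed.
 * Note that this also rejects names with an encoding suffix such
 * as "de_DE.UTF-8", which the current rules still accept. */
static int locale_name_conservative(const char *name)
{
    const unsigned char *p;

    if (!locale_name_acceptable(name)) return 0;
    for (p = (const unsigned char *)name; *p; p++)
        if (!isalnum(*p) && *p != '_' && *p != '@') return 0;
    return 1;
}
```

A packager naming locale files could use the stricter check to pick
names that are likely to remain valid if further characters are
rejected in later releases.
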
Using gettext:

The gettext translation functions are largely compatible with the
documented interfaces in the GNU gettext manual. This does not include
some more recent, undocumented, ill-designed features in GNU gettext
which are used mostly (only?) by some GNU packages so far.

The main deviation from GNU gettext in the outward behavior is that
the LANGUAGE environment variable is not honored; that topic is
covered in a separate message to the musl list. Also, there is no
default path for translation files, but this should not affect
applications since the documented usage is that calling bindtextdomain
is required.
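
Since there is no default path for translation files, an application
has to bind its text domain explicitly. The following is a minimal
sketch of the documented usage; the helper name init_translations is
hypothetical, and so are the domain name and directory shown in the
usage note after it.

```c
#include <libintl.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: select the locale from the environment, bind
 * `domain` to the catalog directory `dir`, and make it the default
 * domain for gettext(). Returns the current domain name on success
 * or a null pointer on failure. */
static const char *init_translations(const char *domain, const char *dir)
{
    setlocale(LC_ALL, "");

    /* Required here: no default search path is assumed. */
    if (!bindtextdomain(domain, dir)) return NULL;

    return textdomain(domain);
}
```

A caller would run something like
init_translations("example-app", "/usr/share/locale") once at startup
and then wrap user-visible strings as gettext("Hello"). When no catalog
is found for the message, gettext returns the original string.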