Message-ID: <20260506163028.GZ1827@brightrain.aerifal.cx>
Date: Wed, 6 May 2026 12:30:29 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: musl localedef source format - informal spec

The musl localedef format is a subset of the POSIX localedef format,
plus extensions to represent error message strings. It is specified
relative to the POSIX 2024 spec, starting from XBD 7.3 Locale
Definition. Most of the subsetting is motivated by omitting generality
that was historically intended to support a range of character
encodings, possibly even ASCII-incompatible ones.

This is part of the locale support overhaul project, funded by NLnet
and the NGI Zero Core Fund.


Basic limitations:

The comment_char and escape_char override keywords are not supported.
The comment character is always # and the escape character is always \.

In all category sections, the "copy" keyword for copying the contents
of another locale is not supported.

Symbolic character names (enclosed in <> brackets) are not supported.

In string contexts, characters which are special per POSIX localedef
can be represented by preceding them with a backslash (\\, \<, \>,
and \").

Control characters (U+0000 - U+001F) are not accepted in string
contents. [These are not rejected by any present tooling, but no
special provisions for supporting them are made, since they should
never appear in a valid locale.]

Otherwise, all characters must be represented by themselves. In
particular, octal, hex, and decimal character constants are not
supported.


LC_CTYPE:

This category is presently not defined/supported. All locales are
UTF-8 and honor the built-in character classifications based on
Unicode.


LC_COLLATE:

TBD.


LC_MONETARY:

All keywords are supported.


LC_NUMERIC:

All keywords are supported.

The contents of decimal_point must be either "." or ",". [This is
presently not checked at locale generation time by the tooling, but at
runtime, any value not equal to "," will be treated as "."]


LC_TIME:

All keywords are supported.


LC_MESSAGES:

All keywords are supported. In addition, the deprecated/removed
keywords yesstr and nostr are supported, as well as keywords for error
codes. These keywords are written in uppercase to match the error code
macro names E* from errno.h, EAI_* from netdb.h, REG_* from regex.h,
plus:

- HOST_NOT_FOUND, TRY_AGAIN, NO_RECOVERY, and NO_DATA from netdb.h.
- E0 and E_ for errno==0 and unknown errno code messages.
- H0 and H_ for h_errno==0 and unknown h_errno code messages.
- EAI_0 and EAI__ for no-error and unknown error messages.
- REG__ for unknown error message.


Other extensions:

None at present. If further extensions are added, keyword naming
should be chosen to be unlikely to conflict with anything in the
standard (analogous to how uppercase naming for error codes is
unlikely to conflict with generally-lowercase keywords for standard
LC_MESSAGES contents).

The LC_COLLATE category will need keywords for importing the base or
localized FractionalUCA data and, if base data is used, keywords for
applying the localizations to it. This will be amended later alongside
collation integration.
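As a rough illustration of the string rules under "Basic limitations" and the decimal_point fallback under LC_NUMERIC, here is a minimal Python sketch. It is not the actual musl tooling; the function names decode_string_body and effective_decimal_point are hypothetical. It assumes the input is the text between the surrounding double quotes, that a bare < or " inside the body is an error (the former would start an unsupported symbolic name), and that a bare > passes through. Unlike the present tooling described above, it rejects control characters. Line continuation is not handled.

```python
# Hypothetical sketch of the string-content rules; not musl's implementation.

# Characters that may follow the (fixed) escape character "\".
ESCAPABLE = {"\\", "<", ">", '"'}


def decode_string_body(body: str) -> str:
    """Decode the text found between the quotes of a string operand."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        # Control characters U+0000 - U+001F are not accepted.
        if ord(ch) <= 0x1F:
            raise ValueError(f"control character U+{ord(ch):04X} at offset {i}")
        if ch == "\\":
            # Only \\, \<, \> and \" are allowed; octal, hex and decimal
            # character constants (\101, \x41, \d65) are not supported.
            if i + 1 >= len(body) or body[i + 1] not in ESCAPABLE:
                raise ValueError(f"unsupported escape at offset {i}")
            out.append(body[i + 1])
            i += 2
            continue
        # Assumption: a bare "<" would begin a symbolic character name,
        # and a bare '"' would terminate the string, so both are errors.
        if ch in '<"':
            raise ValueError(f"unescaped {ch!r} at offset {i}")
        out.append(ch)
        i += 1
    return "".join(out)


def effective_decimal_point(value: str) -> str:
    """Mirror the runtime rule: anything other than "," is treated as "."."""
    return "," if value == "," else "."
```

For instance, decode_string_body(r'a\<b\>c') yields the three-letter-plus-brackets string a<b>c, while an input such as <U0041> or \x41 raises ValueError. A generation-time check could additionally reject any decimal_point other than "." or ",", rather than relying on the runtime fallback.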