Message-ID: <20260506163028.GZ1827@brightrain.aerifal.cx>
Date: Wed, 6 May 2026 12:30:29 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: musl localedef source format - informal spec

The musl localedef format is a subset of the POSIX localedef format,
plus extensions to represent error message strings. It is specified
relative to the POSIX 2024 spec, XBD 7.3 Locale Definition.

Most of the subsetting is motivated by omitting generality that was
historically intended to support a range of character encodings,
possibly even ASCII-incompatible ones.

This is part of the locale support overhaul project, funded by NLnet
and the NGI Zero Core Fund.




Basic limitations:

comment_char and escape_char override keywords are not supported. The
comment character is always # and the escape character is always \.

In all category sections, the "copy" keyword for copying the contents
of another locale is not supported.

Symbolic character names (enclosed in <> brackets) are not supported.

In string contexts, characters which are special per POSIX localedef
can be represented by preceding them with a backslash (\\, \<, \>, and
\").

Control characters (U+0000 - U+001F) are not accepted in string
contents. [These are not rejected by any present tooling, but no
special provisions for supporting them are made, since they should
never appear in a valid locale.]

Otherwise, all characters must be represented by themselves. In
particular, octal, hex, and decimal character constants are not
supported.
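
As an illustration of the string rules above, here is a minimal sketch
of a decoder for the text between the double quotes. It is not taken
from the musl tooling; the name decode_string and its error convention
are hypothetical. It also assumes that an unescaped < or " inside the
contents is an error (the former would begin a symbolic name), and that
a backslash followed by anything other than the four listed characters
is rejected. The spec text above does not state either of these
outright.

```c
#include <stddef.h>

/* Hypothetical helper: decode the contents of one localedef string
 * (the bytes between the double quotes) into out.
 * Returns the decoded length, or -1 if the input is rejected. */
int decode_string(const char *in, char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0) return -1;

    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;

        /* Control characters U+0000..U+001F are not accepted. */
        if (c < 0x20) return -1;

        /* Assumption: an unescaped quote or '<' is an error here. */
        if (c == '"' || c == '<') return -1;

        if (c == '\\') {
            c = (unsigned char)*++in;
            /* Only \\ \< \> \" are accepted. Octal, hex, and decimal
             * character constants fall through to the error path,
             * as does a trailing backslash (c == 0). */
            if (c != '\\' && c != '<' && c != '>' && c != '"')
                return -1;
        }

        /* Leave room for the terminating null byte. */
        if (n + 1 >= outsize) return -1;
        out[n++] = (char)c;
    }
    out[n] = 0;
    return (int)n;
}
```

Bytes 0x80 and above pass through unchanged, so UTF-8 text is copied
as-is. With this sketch, the source text a\"b decodes to the three
characters a"b, while \101 and <U0041> are both rejected.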




LC_CTYPE:

This category is presently not defined/supported. All locales are
UTF-8 and honor the built-in character classifications based on
Unicode.



LC_COLLATE:

TBD.



LC_MONETARY:

All keywords are supported.



LC_NUMERIC:

All keywords are supported. The contents of decimal_point must be
either "." or ",". [This is presently not checked at locale generation
time by the tooling, but at runtime, any value not equal to "," will
be treated as "."]
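
The runtime fallback described in the bracketed note can be sketched
as follows. This is only an illustration of the stated rule, not the
actual musl code; the function name effective_decimal_point is
hypothetical.

```c
#include <string.h>

/* Hypothetical helper: map the decimal_point string found in a locale
 * to the value that would take effect at runtime under the rule
 * above: anything other than "," behaves as ".". */
const char *effective_decimal_point(const char *configured)
{
    return strcmp(configured, ",") == 0 ? "," : ".";
}
```

A tool that wanted to enforce the "must be either . or ," requirement
at generation time could instead reject any value that is neither of
the two, rather than silently falling back.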



LC_TIME:

All keywords are supported.



LC_MESSAGES:

All keywords are supported. In addition, the deprecated/removed
keywords yesstr and nostr are supported, as well as keywords for error
codes. These keywords are written in uppercase to match the error code
macro names E* from errno.h, EAI_* from netdb.h, REG_* from regex.h,
plus:

- HOST_NOT_FOUND, TRY_AGAIN, NO_RECOVERY, and NO_DATA from netdb.h.
- E0 and E_ for errno==0 and unknown errno code messages.
- H0 and H_ for h_errno==0 and unknown h_errno code messages.
- EAI_0 and EAI__ for the no-error and unknown EAI_* code messages.
- REG__ for the unknown REG_* code message.
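
To make the keyword families above concrete, here is a rough sketch of
how a tool might sort an LC_MESSAGES keyword into its family by name
alone. The enum, the function name classify_keyword, and the approach
of matching on prefixes are all assumptions made for illustration; the
sketch does not check that a name corresponds to a macro that actually
exists in the relevant header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of LC_MESSAGES keywords. */
enum kw_family { KW_STANDARD, KW_ERRNO, KW_H_ERRNO, KW_GAI, KW_REGEX };

enum kw_family classify_keyword(const char *kw)
{
    /* h_errno names have no common prefix, so list them explicitly,
     * together with the H0 and H_ special cases. */
    static const char *const h_names[] = {
        "HOST_NOT_FOUND", "TRY_AGAIN", "NO_RECOVERY", "NO_DATA",
        "H0", "H_"
    };
    size_t i;

    for (i = 0; i < sizeof h_names / sizeof *h_names; i++)
        if (strcmp(kw, h_names[i]) == 0) return KW_H_ERRNO;

    /* EAI_ must be tested before the bare E prefix. */
    if (strncmp(kw, "EAI_", 4) == 0) return KW_GAI;   /* incl. EAI_0, EAI__ */
    if (strncmp(kw, "REG_", 4) == 0) return KW_REGEX; /* incl. REG__ */
    if (kw[0] == 'E') return KW_ERRNO;                /* incl. E0, E_ */

    /* Everything else is treated as a standard keyword, which is
     * lowercase in POSIX (yesexpr, noexpr, yesstr, nostr). */
    return KW_STANDARD;
}
```

Because the standard keywords are lowercase, a leading uppercase E
cannot collide with them, which is why the last prefix test is safe in
this sketch.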



Other extensions:

None at present.

If further extensions are added, keyword naming should be chosen to be
unlikely to conflict with anything in the standard (analogous to how
uppercase naming for error codes is unlikely to conflict with
generally-lowercase keywords for standard LC_MESSAGES contents).

The LC_COLLATE category will need keywords for importing the base or
localized FractionalUCA data and, if base data is used, keywords for
applying the localizations to it. This spec will be amended later,
alongside collation integration.
