Message-ID: <20260430203349.GW1827@brightrain.aerifal.cx>
Date: Thu, 30 Apr 2026 16:33:49 -0400
From: Rich Felker <dalias@...c.org>
To: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
Cc: musl@...ts.openwall.com
Subject: Re: Start of localedef tool for locales project

On Thu, Apr 30, 2026 at 03:22:05PM -0400, Rich Felker wrote:
> On Thu, Apr 30, 2026 at 02:28:38PM -0400, Rich Felker wrote:
> > On Thu, Apr 30, 2026 at 08:08:43PM +0200, Pablo Correa Gomez wrote:
> > > On Wed, 29-04-2026 at 16:08 -0400, Rich Felker wrote:
> > > > Based on previous proposal & discussion of the new locale source
> > > > format (subset of POSIX localedef, to be documented, with extensions
> > > > for error strings) and the concepts for the proposed binary runtime
> > > > format, I've put together a simple parser that reads localedef-format
> > > > input and emits what amount to a sequence of insertions into the
> > > > multi-level binary table format.
> > > 
> > > Really nice! I've just tested it with a few of the current translations, and
> > > it seems to mostly just work. The only thing that caught my attention is that
> > > the parser seems to miss the semicolon-separated eras and alt-digits, taking
> > > just the first one. You can find the source file attached if you want to use it
> > > for testing.
> > 
> > OK, AFAICT the nl_langinfo documentation is insufficient to determine
> > what the actual data in ALT_DIGITS and ERA is supposed to look like. I
> > assumed ALT_DIGITS was a single string with one character per digit,
> > but perhaps it's supposed to be a multi-string delimited by null bytes
> > and terminated by an empty string or something to that effect. I think
> > we'll need to look at what other implementations do here to figure out
> > the intent.
> 
> OK, for ALT_DIGITS and ERA, it's specified in XSH 7.3.5.2:
> 
> https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_05_02
> 
> that the values returned by nl_langinfo are single strings with
> in-band semicolons as delimiters.
> 
> I think this means that literal semicolons cannot be part of the
> content, and that we should just compile these string lists to a
> single string, treating the pattern /"[:space:]+;[:space:]+"/ as if it
> were a single semicolon character.
> 
> AFAICT this is not what glibc does though. Its localedef utility seems
> to write them as null-delimited multistrings, and I don't see where
> nl_langinfo would be able to convert it back to the POSIX-specified
> form.
> 
> I don't have a glibc vm anywhere at the moment with locales installed.
> Any volunteers to check nl_langinfo(ALT_DIGITS) and nl_langinfo(ERA)
> on a locale that uses them (ja_JP?) to see what you get?
> 
> #include <stdio.h>
> #include <langinfo.h>
> #include <locale.h>
> int main() {
> 	setlocale(LC_ALL, "");
> 	puts(nl_langinfo(ALT_DIGITS));
> 	puts(nl_langinfo(ERA));
> }
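
To make the single-string form concrete, here is a minimal sketch of both
sides: joining a list of fields into one semicolon-delimited string (roughly
what the compiler would emit) and picking out the n-th field again (what a
consumer of nl_langinfo(ALT_DIGITS) would do). The names join_fields and
nth_field are hypothetical and not part of musl or the localedef tool, and
rejecting fields that contain a literal semicolon is an assumption based on
the reading of the spec above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: join n fields into out using single ';' delimiters.
 * Returns the length written (not counting the terminating null byte), or
 * (size_t)-1 if a field contains ';' or out is too small. */
size_t join_fields(char *out, size_t outsize,
                   const char *const *fields, size_t n)
{
	size_t pos = 0;
	if (!outsize) return (size_t)-1;
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(fields[i]);
		/* A literal ';' would be indistinguishable from a delimiter. */
		if (strchr(fields[i], ';')) return (size_t)-1;
		/* Room for the delimiter (if any), the field, and the terminator. */
		size_t need = len + (i ? 1 : 0) + 1;
		if (need > outsize - pos) return (size_t)-1;
		if (i) out[pos++] = ';';
		memcpy(out + pos, fields[i], len);
		pos += len;
	}
	out[pos] = 0;
	return pos;
}

/* Hypothetical helper: find field number idx (0-based) in a ';'-delimited
 * string. Returns a pointer to its start and stores its length in *len,
 * or returns NULL if the string has fewer fields. */
const char *nth_field(const char *s, size_t idx, size_t *len)
{
	for (;;) {
		size_t l = strcspn(s, ";");
		if (idx == 0) {
			*len = l;
			return s;
		}
		if (!s[l]) return NULL; /* ran out of fields */
		s += l + 1;
		idx--;
	}
}
```

With fields "0th", "1st", "2nd" this yields "0th;1st;2nd", and
nth_field(buf, 1, &len) points at "1st" with len 3. The returned field is
not null-terminated, so callers have to honor len.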

Another important find: POSIX 2024 added ALTMON_x and ABALTMON_x
langinfo keys for %Ob/%OB strftime formats. So musl needs to get these
added -- I'll use values matching the glibc keys, even though that
makes for some annoying gaps. And they need to be added to the parser.
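
Until the new keys are available everywhere, portable callers probably want
something like the following. It is only a sketch: alt_month_name is a
hypothetical helper, it assumes ALTMON_1 is visible as a preprocessor macro
when present (on glibc that appears to require _GNU_SOURCE), and it assumes
the ALTMON_x and MON_x items are numbered consecutively, which is true in
practice but not something POSIX promises.

```c
#define _GNU_SOURCE /* assumption: needed for glibc to expose ALTMON_x */
#include <langinfo.h>

/* Hypothetical helper: alternative full month name for month m
 * (0 = first month), as %OB would use, falling back to MON_x when the
 * ALTMON_x keys are not provided by the libc headers. */
const char *alt_month_name(int m)
{
	if (m < 0 || m > 11) return "";
#ifdef ALTMON_1
	/* Assumes ALTMON_1..ALTMON_12 are consecutive item values. */
	return nl_langinfo(ALTMON_1 + m);
#else
	/* Assumes MON_1..MON_12 are consecutive item values. */
	return nl_langinfo(MON_1 + m);
#endif
}
```

In the C locale both branches should return the plain English month names,
so the fallback is invisible there; the difference only shows up in locales
that define distinct alternative forms.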

Rich
