Message-ID: <20260430192205.GV1827@brightrain.aerifal.cx>
Date: Thu, 30 Apr 2026 15:22:05 -0400
From: Rich Felker <dalias@...c.org>
To: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
Cc: musl@...ts.openwall.com
Subject: Re: Start of localedef tool for locales project

On Thu, Apr 30, 2026 at 02:28:38PM -0400, Rich Felker wrote:
> On Thu, Apr 30, 2026 at 08:08:43PM +0200, Pablo Correa Gomez wrote:
> > El Wed, 29-04-2026 a las 16:08 -0400, Rich Felker escribió:
> > > Based on previous proposal & discussion of the new locale source
> > > format (subset of POSIX localedef, to be documented, with extensions
> > > for error strings) and the concepts for the proposed binary runtime
> > > format, I've put together a simple parser that reads localedef-format
> > > input and emits what amount to a sequence of insertions into the
> > > multi-level binary table format.
> > 
> > Really nice! I've now just tested with a few of the current translations, and
> > seems to mostly just work fine. The only thing that called my attention is that
> > the parser seemed to miss the semi-colon-separated eras and alt-digits, taking
> > just the first one. You can find the source file attached if you want to use it
> > for testing.
> 
> OK, AFAICT the nl_langinfo documentation is insufficient to determine
> what the actual data in ALT_DIGITS and ERA is supposed to look like. I
> assumed ALT_DIGITS was a single string with one character per digit,
> but perhaps it's supposed to be a multi-string delimited by null bytes
> and terminated by an empty string or something to that effect. I think
> we'll need to look at what other implementations do here to figure out
> the intent.

OK, for ALT_DIGITS and ERA, it's specified in XBD 7.3.5.2:

https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_05_02

that the values returned by nl_langinfo are single strings with
in-band semicolons as delimiters.
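
To illustrate what that in-band format means for a consumer, here is a
minimal sketch of looking up the n-th item in such a string. The helper
name langinfo_field is hypothetical and not part of any libc; it only
assumes the semicolon-delimited form described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any libc: locate the n-th (0-based)
 * semicolon-delimited field of s. On success, stores the field length
 * in *len and returns a pointer to its first byte (the field is NOT
 * null-terminated). Returns NULL if s has fewer than n+1 fields. */
static const char *langinfo_field(const char *s, size_t n, size_t *len)
{
	assert(s && len);
	for (;;) {
		size_t l = strcspn(s, ";");  /* bytes up to next ';' or end */
		if (n == 0) {
			*len = l;
			return s;
		}
		if (s[l] == '\0')
			return NULL;         /* ran out of fields */
		s += l + 1;                  /* skip the field and its ';' */
		n--;
	}
}
```

For example, langinfo_field(nl_langinfo(ALT_DIGITS), 3, &len) would
point at the alternative symbol for 3, if the locale defines one.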

I think this means that literal semicolons cannot be part of the
content, and that we should just compile these string lists to a
single string, treating the pattern /"[[:space:]]*;[[:space:]]*"/ as if
it were a single semicolon character.
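
A rough sketch of that compilation step, under simplifying assumptions:
the function name join_string_list is hypothetical, this is not the
parser discussed in this thread, and escape sequences and line
continuations are deliberately left out. It rejects a literal semicolon
inside a quoted string, since that could not be told apart from a
delimiter in the output.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: join a list of double-quoted strings separated
 * by semicolons, e.g.  "a" ; "b";"c"  into the single string  a;b;c.
 * Returns the output length, or (size_t)-1 on malformed input, on a
 * literal ';' inside a string, or if out is too small. */
static size_t join_string_list(const char *in, char *out, size_t outsz)
{
	size_t n = 0;
	assert(in && out);
	while (isspace((unsigned char)*in)) in++;
	for (;;) {
		if (*in++ != '"') return (size_t)-1;   /* opening quote */
		while (*in && *in != '"') {
			if (*in == ';') return (size_t)-1; /* ambiguous */
			if (n + 1 >= outsz) return (size_t)-1;
			out[n++] = *in++;
		}
		if (*in++ != '"') return (size_t)-1;   /* closing quote */
		while (isspace((unsigned char)*in)) in++;
		if (*in == '\0') break;                /* end of list */
		if (*in++ != ';') return (size_t)-1;   /* list separator */
		while (isspace((unsigned char)*in)) in++;
		if (n + 1 >= outsz) return (size_t)-1;
		out[n++] = ';';                        /* in-band delimiter */
	}
	if (n >= outsz) return (size_t)-1;
	out[n] = '\0';
	return n;
}
```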

AFAICT this is not what glibc does though. Its localedef utility seems
to write them as null-delimited multistrings, and I don't see where
nl_langinfo would be able to convert it back to the POSIX-specified
form.
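
For reference, the conversion back would look roughly like the sketch
below. This is purely illustrative: the function name and the layout
(a known count of consecutive null-terminated strings) are assumptions
made here, not a description of glibc's actual data format or code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: ms holds `count` consecutive null-terminated strings,
 * e.g. "a\0bc\0d". Writes the semicolon-joined form "a;bc;d" to out.
 * Returns the output length, or (size_t)-1 if out is too small. */
static size_t multistring_to_semicolons(const char *ms, size_t count,
                                        char *out, size_t outsz)
{
	size_t n = 0;
	assert(ms && out);
	for (size_t i = 0; i < count; i++) {
		size_t l = strlen(ms);
		/* room for an optional ';', the item, and the final NUL */
		if (l + (i > 0) + 1 > outsz - n) return (size_t)-1;
		if (i > 0) out[n++] = ';';
		memcpy(out + n, ms, l);
		n += l;
		ms += l + 1;                 /* step over the item's NUL */
	}
	if (n >= outsz) return (size_t)-1;
	out[n] = '\0';
	return n;
}
```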

I don't have a glibc VM anywhere at the moment with locales installed.
Any volunteers to check nl_langinfo(ALT_DIGITS) and nl_langinfo(ERA)
on a locale that uses them (ja_JP?) to see what you get?

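#include <stdio.h>
#include <langinfo.h>
#include <locale.h>
int main(void) {
	/* Select the locale from the environment (LC_ALL, LANG, ...) */
	setlocale(LC_ALL, "");
	/* Print both values as returned, to see which delimiter is used */
	puts(nl_langinfo(ALT_DIGITS));
	puts(nl_langinfo(ERA));
	return 0;
}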
