Message-ID: <f038c45b63904ada07e0b274e324da6104e22b53.camel@postmarketos.org>
Date: Mon, 04 May 2026 17:25:57 +0200
From: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: Start of localedef tool for locales project
On Thu, 30-04-2026 at 16:33 -0400, Rich Felker wrote:
> On Thu, Apr 30, 2026 at 03:22:05PM -0400, Rich Felker wrote:
> > On Thu, Apr 30, 2026 at 02:28:38PM -0400, Rich Felker wrote:
> > > On Thu, Apr 30, 2026 at 08:08:43PM +0200, Pablo Correa Gomez wrote:
> > > > On Wed, 29-04-2026 at 16:08 -0400, Rich Felker wrote:
> > > > > Based on previous proposal & discussion of the new locale source
> > > > > format (subset of POSIX localedef, to be documented, with extensions
> > > > > for error strings) and the concepts for the proposed binary runtime
> > > > > format, I've put together a simple parser that reads localedef-format
> > > > > input and emits what amount to a sequence of insertions into the
> > > > > multi-level binary table format.
> > > >
> > > > Really nice! I've just tested with a few of the current
> > > > translations, and it seems to mostly just work fine. The only
> > > > thing that caught my attention is that the parser seemed to miss
> > > > the semicolon-separated eras and alt-digits, taking just the
> > > > first one. You can find the source file attached if you want to
> > > > use it for testing.
> > >
> > > OK, AFAICT the nl_langinfo documentation is insufficient to determine
> > > what the actual data in ALT_DIGITS and ERA is supposed to look like. I
> > > assumed ALT_DIGITS was a single string with one character per digit,
> > > but perhaps it's supposed to be a multi-string delimited by null bytes
> > > and terminated by an empty string or something to that effect. I think
> > > we'll need to look at what other implementations do here to figure out
> > > the intent.
> >
> > OK, for ALT_DIGITS and ERA, it's specified in XBD 7.3.5.2:
> >
> > https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_05_02
> >
> > that the values returned by nl_langinfo are single strings with
> > in-band semicolons as delimiters.
> >
> > I think this means that literal semicolons cannot be part of the
> > content, and that we should just compile these string lists to a
> > single string, treating the pattern /"[:space:]+;[:space:]+"/ as if it
> > were a single semicolon character.
> >
> > AFAICT this is not what glibc does though. Its localedef utility seems
> > to write them as null-delimited multistrings, and I don't see where
> > nl_langinfo would be able to convert it back to the POSIX-specified
> > form.
> >
> > I don't have a glibc vm anywhere at the moment with locales installed.
> > Any volunteers to check nl_langinfo(ALT_DIGITS) and nl_langinfo(ERA)
> > on a locale that uses them (ja_JP?) to see what you get?
> >
> > #include <stdio.h>
> > #include <langinfo.h>
> > #include <locale.h>
> >
> > int main(void)
> > {
> >     /* Use the locale selected by the environment (LANG, LC_*). */
> >     setlocale(LC_ALL, "");
> >     puts(nl_langinfo(ALT_DIGITS));
> >     puts(nl_langinfo(ERA));
> >     return 0;
> > }
>
> Another important find: POSIX 2024 added ALTMON_x and ABALTMON_x
> langinfo keys for %Ob/%OB strftime formats. So musl needs to get these
> added -- I'll use values matching the glibc keys, even though that
> makes for some annoying gaps. And they need to be added to the parser.
>
> Rich
Good catch! I realized the docs referenced an outdated version of POSIX for
LC_TIME; that's fixed now.
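
For reference, the compile step discussed above (turning a localedef string
list into one string with in-band semicolons) could be sketched roughly as
follows. This is only an illustration, not the parser's actual code:
join_string_list is a hypothetical name, whitespace around the semicolon is
treated as optional, escape sequences and line continuations are ignored, and
a literal semicolon inside a string is rejected as ambiguous.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the musl parser: joins a localedef
 * operand such as  "a" ; "b";"c"  into the single string  a;b;c .
 * Simplifications (assumptions): no escape sequences, no line
 * continuations. Returns the output length, or (size_t)-1 on malformed
 * input or if out (of size outsz) is too small. */
size_t join_string_list(const char *in, char *out, size_t outsz)
{
    size_t n = 0;
    int first = 1;

    for (;;) {
        /* Skip whitespace before the opening quote. */
        while (isspace((unsigned char)*in)) in++;
        if (*in != '"') return (size_t)-1;
        in++;

        /* Separate consecutive strings with one in-band semicolon. */
        if (!first) {
            if (n + 1 >= outsz) return (size_t)-1;
            out[n++] = ';';
        }
        first = 0;

        /* Copy the string body up to the closing quote. */
        while (*in && *in != '"') {
            /* A literal semicolon would be ambiguous in the output. */
            if (*in == ';') return (size_t)-1;
            if (n + 1 >= outsz) return (size_t)-1;
            out[n++] = *in++;
        }
        if (*in != '"') return (size_t)-1;
        in++;

        /* After the closing quote: end of input or a semicolon. */
        while (isspace((unsigned char)*in)) in++;
        if (!*in) break;
        if (*in != ';') return (size_t)-1;
        in++;
    }
    if (n >= outsz) return (size_t)-1;
    out[n] = 0;
    return n;
}
```

With a buffer of sufficient size, the operand  "0" ; "1";"2"  would come out
as the single string 0;1;2, which is the form nl_langinfo is specified to
return for ALT_DIGITS and ERA.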