Message-ID: <20260504180518.GY1827@brightrain.aerifal.cx>
Date: Mon, 4 May 2026 14:05:18 -0400
From: Rich Felker <dalias@...c.org>
To: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
Cc: musl@...ts.openwall.com
Subject: Re: Start of localedef tool for locales project

On Mon, May 04, 2026 at 05:16:32PM +0200, Pablo Correa Gomez wrote:
> On Thu, 30-04-2026 at 15:22 -0400, Rich Felker wrote:
> > On Thu, Apr 30, 2026 at 02:28:38PM -0400, Rich Felker wrote:
> > > On Thu, Apr 30, 2026 at 08:08:43PM +0200, Pablo Correa Gomez wrote:
> > > > On Wed, 29-04-2026 at 16:08 -0400, Rich Felker wrote:
> > > > > Based on previous proposal & discussion of the new locale source
> > > > > format (subset of POSIX localedef, to be documented, with extensions
> > > > > for error strings) and the concepts for the proposed binary runtime
> > > > > format, I've put together a simple parser that reads localedef-format
> > > > > input and emits what amount to a sequence of insertions into the
> > > > > multi-level binary table format.
> > > >
> > > > Really nice! I've now just tested with a few of the current translations,
> > > > and it seems to mostly just work fine. The only thing that caught my
> > > > attention is that the parser seemed to miss the semicolon-separated eras
> > > > and alt-digits, taking just the first one. You can find the source file
> > > > attached if you want to use it for testing.
> > >
> > > OK, AFAICT the nl_langinfo documentation is insufficient to determine
> > > what the actual data in ALT_DIGITS and ERA is supposed to look like. I
> > > assumed ALT_DIGITS was a single string with one character per digit,
> > > but perhaps it's supposed to be a multi-string delimited by null bytes
> > > and terminated by an empty string or something to that effect. I think
> > > we'll need to look at what other implementations do here to figure out
> > > the intent.
> >
> > OK, for ALT_DIGITS and ERA, it's specified in XBD 7.3.5.2:
> >
> > https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_05_02
> >
> > that the values returned by nl_langinfo are single strings with
> > in-band semicolons as delimiters.
> >
> > I think this means that literal semicolons cannot be part of the
> > content, and that we should just compile these string lists to a
> > single string, treating the pattern /"[[:space:]]*;[[:space:]]*"/ as if it
> > were a single semicolon character.
> >
> > AFAICT this is not what glibc does though. Its localedef utility seems
> > to write them as null-delimited multistrings, and I don't see where
> > nl_langinfo would be able to convert it back to the POSIX-specified
> > form.
> >
> > I don't have a glibc vm anywhere at the moment with locales installed.
> > Any volunteers to check nl_langinfo(ALT_DIGITS) and nl_langinfo(ERA)
> > on a locale that uses them (ja_JP?) to see what you get?
> >
> > #include <stdio.h>
> > #include <langinfo.h>
> > #include <locale.h>
> > int main() {
> >     setlocale(LC_ALL, "");
> >     puts(nl_langinfo(ALT_DIGITS));
> >     puts(nl_langinfo(ERA));
> > }
>
> From a Debian container:
>
>
> root@...610c7f086:~# cat alt-era.c
>
> #include <stdio.h>
> #include <langinfo.h>
> #include <locale.h>
> int main() {
>     setlocale(LC_ALL, "");
>     puts(nl_langinfo(ALT_DIGITS));
>     puts(nl_langinfo(ERA));
> }
>
> root@...610c7f086:~# gcc alt-era.c
> root@...610c7f086:~# LC_ALL=ja_JP.UTF-8 ./a.out
> 〇
> +:2:2020/01/01:+*:令和:%EC%Ey年
> root@...610c7f086:~# cat /etc/os-release
> PRETTY_NAME="Debian GNU/Linux 13 (trixie)"
> NAME="Debian GNU/Linux"
> VERSION_ID="13"
> VERSION="13 (trixie)"
> VERSION_CODENAME=trixie
> DEBIAN_VERSION_FULL=13.4
> ID=debian
> HOME_URL="https://www.debian.org/"
> SUPPORT_URL="https://www.debian.org/support"
> BUG_REPORT_URL="https://bugs.debian.org/"

Interesting. This looks like glibc is not conforming to the
requirements in POSIX.
Rich