Message-ID: <bd69d018f8e7db98e6aae4f5594fb69b1a7ab11e.camel@postmarketos.org>
Date: Mon, 04 May 2026 17:16:32 +0200
From: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: Start of localedef tool for locales project
On Thu, 30-04-2026 at 15:22 -0400, Rich Felker wrote:
> On Thu, Apr 30, 2026 at 02:28:38PM -0400, Rich Felker wrote:
> > On Thu, Apr 30, 2026 at 08:08:43PM +0200, Pablo Correa Gomez wrote:
> > > On Wed, 29-04-2026 at 16:08 -0400, Rich Felker wrote:
> > > > Based on previous proposal & discussion of the new locale source
> > > > format (subset of POSIX localedef, to be documented, with extensions
> > > > for error strings) and the concepts for the proposed binary runtime
> > > > format, I've put together a simple parser that reads localedef-format
> > > > input and emits what amount to a sequence of insertions into the
> > > > multi-level binary table format.
> > >
> > > Really nice! I've now just tested with a few of the current
> > > translations, and it seems to mostly just work fine. The only thing
> > > that caught my attention is that the parser seemed to miss the
> > > semicolon-separated eras and alt-digits, taking just the first one.
> > > You can find the source file attached if you want to use it for
> > > testing.
> >
> > OK, AFAICT the nl_langinfo documentation is insufficient to determine
> > what the actual data in ALT_DIGITS and ERA is supposed to look like. I
> > assumed ALT_DIGITS was a single string with one character per digit,
> > but perhaps it's supposed to be a multi-string delimited by null bytes
> > and terminated by an empty string or something to that effect. I think
> > we'll need to look at what other implementations do here to figure out
> > the intent.
>
> OK, for ALT_DIGITS and ERA, it's specified in XBD 7.3.5.2:
>
> https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1_chap07.html#tag_07_03_05_02
>
> that the values returned by nl_langinfo are single strings with
> in-band semicolons as delimiters.
>
> I think this means that literal semicolons cannot be part of the
> content, and that we should just compile these string lists to a
> single string, treating the pattern /"[:space:]+;[:space:]+"/ as if it
> were a single semicolon character.
>
> AFAICT this is not what glibc does though. Its localedef utility seems
> to write them as null-delimited multistrings, and I don't see where
> nl_langinfo would be able to convert it back to the POSIX-specified
> form.
>
> I don't have a glibc vm anywhere at the moment with locales installed.
> Any volunteers to check nl_langinfo(ALT_DIGITS) and nl_langinfo(ERA)
> on a locale that uses them (ja_JP?) to see what you get?
>
> #include <stdio.h>
> #include <langinfo.h>
> #include <locale.h>
> int main() {
> 	setlocale(LC_ALL, "");
> 	puts(nl_langinfo(ALT_DIGITS));
> 	puts(nl_langinfo(ERA));
> }
From a Debian container:
root@...610c7f086:~# cat alt-era.c
#include <stdio.h>
#include <langinfo.h>
#include <locale.h>
int main() {
	setlocale(LC_ALL, "");
	puts(nl_langinfo(ALT_DIGITS));
	puts(nl_langinfo(ERA));
}
root@...610c7f086:~# gcc alt-era.c
root@...610c7f086:~# LC_ALL=ja_JP.UTF-8 ./a.out
〇
+:2:2020/01/01:+*:令和:%EC%Ey年
root@...610c7f086:~# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 13 (trixie)"
NAME="Debian GNU/Linux"
VERSION_ID="13"
VERSION="13 (trixie)"
VERSION_CODENAME=trixie
DEBIAN_VERSION_FULL=13.4
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
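
Neither string printed above contains a semicolon, so each looks like
just the first entry. That would be consistent with the null-delimited
multistring storage you suspected, though I have not checked the glibc
source. To make the comparison with the POSIX form concrete, here is a
rough sketch of what "compile the list to a single string" could look
like, plus a field counter for checking nl_langinfo() results. Both
join_semicolon() and count_fields() are hypothetical names made up for
illustration; they are not part of the localedef parser or of any libc.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: join n strings into out as a single string with
 * in-band ';' delimiters. Returns the length written (not counting the
 * terminating null byte), or (size_t)-1 if out is too small or an item
 * contains a literal ';' (which the in-band format cannot represent). */
size_t join_semicolon(char *out, size_t outsz,
                      const char *const *items, size_t n)
{
	size_t pos = 0;
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(items[i]);
		if (strchr(items[i], ';')) return (size_t)-1;
		/* room for an optional ';', the item, and the final null */
		if (len + (i > 0) + 1 > outsz - pos) return (size_t)-1;
		if (i > 0) out[pos++] = ';';
		memcpy(out + pos, items[i], len);
		pos += len;
	}
	if (pos >= outsz) return (size_t)-1;
	out[pos] = 0;
	return pos;
}

/* Hypothetical helper: count the ';'-delimited fields in s.
 * An empty string is treated as having zero fields. */
size_t count_fields(const char *s)
{
	size_t n;
	if (!*s) return 0;
	for (n = 1; (s = strchr(s, ';')); s++) n++;
	return n;
}
```

Calling count_fields(nl_langinfo(ERA)) in the test program would give 1
for the output above, whereas a semicolon-joined string built from
several eras would give the number of eras.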