Message-ID: <20250618231420.GC1827@brightrain.aerifal.cx>
Date: Wed, 18 Jun 2025 19:14:21 -0400
From: Rich Felker <dalias@...c.org>
To: Thorsten Glaser <tg@...bsd.de>
Cc: musl@...ts.openwall.com,
Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
Subject: Re: Planned locale work and community thoughts
On Thu, Jun 19, 2025 at 12:42:50AM +0200, Thorsten Glaser wrote:
> On Wed, 18 Jun 2025, Rich Felker wrote:
>
> >Theoretically it's possible the textual grep missed things if there is
> >inconsistent json formatting anywhere, so if anyone familiar with jq
> >wants to conduct a search using it instead to confirm, go ahead. I
>
> My jq-foo is not very good, but I managed this:
>
> tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -e '^ "[^.,]"' -e '^ ".[^"]' | uniq
> "٫"
>
> So yes, U+066B is the only other one, and no multi-char ones.
>
> tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^ "[^.,]"' -e '^ ".[^"]'
>
> … shows all the occurrences, but a quick filter shows that we have
> both symbols-numberSystem-arabext and symbols-numberSystem-arab but
> assuming both are out of scope…
>
> tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^ "[^.,]"' -e '^ ".[^"]' | fgrep '>>' | fgrep -v -e '.symbols-numberSystem-arabext"' -e '.symbols-numberSystem-arab"'
> >>main.bgn-AE.numbers.symbols-numberSystem-latn",
> >>main.bgn-AF.numbers.symbols-numberSystem-latn",
> >>main.bgn-IR.numbers.symbols-numberSystem-latn",
> >>main.bgn-OM.numbers.symbols-numberSystem-latn",
> >>main.bgn.numbers.symbols-numberSystem-latn",
>
> … leaves us with this; bgn/numbers.json, for example:
>
> {
> "main": {
> "bgn": {
> "numbers": {
> "symbols-numberSystem-arabext": {
> "decimal": "٫",
> "group": "٬",
> "list": "؛",
> …
> },
> "symbols-numberSystem-latn": {
> "decimal": "٫",
> "group": "،",
> "list": ";",
> …
>
> So, if the bgn locales are ever going to be relevant…
> unsure what that exactly is, but my acronyms database says…
> [ISO 639-3] Western Balochi (cf. bal)
> … which seems to fit.
Thanks. My grepping seems to have overlooked that, simply because it was
the same character that would normally only be used in an alt-digits
context. I wonder whether the above is intentional or a mistake, and whether
any systems actually do that.
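For anyone who wants to cross-check the jq result without jq, here is a
minimal Python sketch of the same search. It assumes the
cldr-numbers-full/main/<locale>/numbers.json layout used above. It walks each
file, collects every string-valued "decimal" key, and skips the arab and
arabext number systems (treated as out of scope, as in the thread). It then
reports any separator that is not a single "." or ",", so multi-character
values are also caught. The function names are hypothetical illustrations,
not part of musl or any CLDR tooling.

```python
import json
import sys
from pathlib import Path


def find_decimal_symbols(tree, path=()):
    """Recursively yield (path, value) for every string-valued "decimal" key."""
    if isinstance(tree, dict):
        for key, value in tree.items():
            if key == "decimal" and isinstance(value, str):
                yield path + (key,), value
            else:
                yield from find_decimal_symbols(value, path + (key,))
    elif isinstance(tree, list):
        for i, item in enumerate(tree):
            yield from find_decimal_symbols(item, path + (str(i),))


def unusual_decimals(tree, ignore_systems=("arab", "arabext")):
    """Return (dotted_path, value) for decimal separators other than '.' or ','.

    Separators under symbols-numberSystem-<name> for any name listed in
    ignore_systems are skipped (assumption: those systems are out of scope).
    """
    ignored = {"symbols-numberSystem-" + s for s in ignore_systems}
    hits = []
    for path, value in find_decimal_symbols(tree):
        # The key just above "decimal" names the number system.
        system = path[-2] if len(path) >= 2 else ""
        if system in ignored:
            continue
        if value not in (".", ","):
            hits.append((".".join(path), value))
    return hits


def main(root):
    # Equivalent of `cat */numbers.json` run from cldr-numbers-full/main.
    for f in sorted(Path(root).glob("*/numbers.json")):
        with open(f, encoding="utf-8") as fh:
            tree = json.load(fh)
        for dotted, value in unusual_decimals(tree):
            codepoints = " ".join(f"U+{ord(c):04X}" for c in value)
            print(f"{dotted}\t{value}\t{codepoints}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else ".")
```

Run it from the main directory, or pass that directory as the first
argument. Setting ignore_systems=() would also list the arab and arabext
entries that the thread set aside.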