Message-ID: <20250618231420.GC1827@brightrain.aerifal.cx>
Date: Wed, 18 Jun 2025 19:14:21 -0400
From: Rich Felker <dalias@...c.org>
To: Thorsten Glaser <tg@...bsd.de>
Cc: musl@...ts.openwall.com,
	Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
Subject: Re: Planned locale work and community thoughts

On Thu, Jun 19, 2025 at 12:42:50AM +0200, Thorsten Glaser wrote:
> On Wed, 18 Jun 2025, Rich Felker wrote:
> 
> >Theoretically it's possible the textual grep missed things if there is
> >inconsistent json formatting anywhere, so if anyone familiar with jq
> >wants to conduct a search using it instead to confirm, go ahead. I
> 
> My jq-foo is not very good, but I managed this:
> 
> tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -e '^  "[^.,]"' -e '^  ".[^"]' | uniq
>   "٫"
> 
> So yes, U+066B is the only other one, and no multi-char ones.
> 
> tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^  "[^.,]"' -e '^  ".[^"]'
> 
> … shows all the occurrences, but a quick filter shows that we have
> both symbols-numberSystem-arabext and symbols-numberSystem-arab but
> assuming both are out of scope…
> 
> tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^  "[^.,]"' -e '^  ".[^"]' | fgrep '>>' | fgrep -v -e '.symbols-numberSystem-arabext"' -e '.symbols-numberSystem-arab"'
>   >>main.bgn-AE.numbers.symbols-numberSystem-latn",
>   >>main.bgn-AF.numbers.symbols-numberSystem-latn",
>   >>main.bgn-IR.numbers.symbols-numberSystem-latn",
>   >>main.bgn-OM.numbers.symbols-numberSystem-latn",
>   >>main.bgn.numbers.symbols-numberSystem-latn",
> 
> … leaves us with this; bgn/numbers.json as an example:
> 
> {
>   "main": {
>     "bgn": {
>       "numbers": {
>         "symbols-numberSystem-arabext": {
>           "decimal": "٫",
>           "group": "٬",
>           "list": "؛",
> …
>         },
>         "symbols-numberSystem-latn": {
>           "decimal": "٫",
>           "group": "،",
>           "list": ";",
> …
> 
> So, if the bgn locales are ever going to be relevant…
> unsure what that exactly is, but my acronyms database says…
> 	[ISO 639-3] Western Balochi (cf. bal)
> … which seems to fit.

Thanks. My grepping seems to have overlooked the bgn latn case because
it uses the same character (U+066B) that would normally only appear in
an alt-digits context. I wonder whether the above is intentional or a
mistake, and whether any systems are actually doing that.
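
For anyone who wants to cross-check without relying on jq or on the
JSON formatting being consistent, here is a rough Python sketch of the
same search. It is only an illustration under some assumptions: the
directory layout (one numbers.json per locale directory) is taken from
the commands quoted above, and the names decimal_symbols,
unusual_decimals and COMMON are made up for this example. Like the jq
filter, it treats any object with a string-valued "decimal" key as a
symbols table.

```python
import json
from pathlib import Path

# Separators considered unremarkable; anything else gets reported.
COMMON = {".", ","}


def decimal_symbols(node, path=()):
    """Yield (dotted_path, value) for every string-valued "decimal" key."""
    if isinstance(node, dict):
        value = node.get("decimal")
        if isinstance(value, str):
            yield ".".join(path), value
        for key, child in node.items():
            yield from decimal_symbols(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from decimal_symbols(child, path + (str(index),))


def unusual_decimals(root):
    """Scan <root>/*/numbers.json (layout assumed from the quoted commands).

    Yields (dotted_path, value, codepoints) for each decimal separator
    that is neither "." nor ",".
    """
    for numbers_file in sorted(Path(root).glob("*/numbers.json")):
        with open(numbers_file, encoding="utf-8") as handle:
            data = json.load(handle)
        for where, value in decimal_symbols(data):
            if value not in COMMON:
                yield where, value, [f"U+{ord(ch):04X}" for ch in value]
```

Iterating over unusual_decimals("cldr-numbers-full/main") and printing
each tuple should list the same paths as the last jq pipeline, before
the arab/arabext entries are filtered out. Multi-character separators
would show up as more than one code point in the third field.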
