Message-ID: <20250618231420.GC1827@brightrain.aerifal.cx>
Date: Wed, 18 Jun 2025 19:14:21 -0400
From: Rich Felker <dalias@...c.org>
To: Thorsten Glaser <tg@...bsd.de>
Cc: musl@...ts.openwall.com, Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
Subject: Re: Planned locale work and community thoughts

On Thu, Jun 19, 2025 at 12:42:50AM +0200, Thorsten Glaser wrote:
> On Wed, 18 Jun 2025, Rich Felker wrote:
>
> >Theoretically it's possible the textual grep missed things if there is
> >inconsistent json formatting anywhere, so if anyone familiar with jq
> >wants to conduct a search using it instead to confirm, go ahead. I
>
> My jq-foo is not very good, but I managed this:
>
> tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -e '^ "[^.,]"' -e '^ ".[^"]' | uniq
> "٫"
>
> So yes, U+066B is the only other one, and no multi-char ones.
>
> tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^ "[^.,]"' -e '^ ".[^"]'
>
> … shows all the occurrences, but a quick filter shows that we have
> both symbols-numberSystem-arabext and symbols-numberSystem-arab but
> assuming both are out of scope…
>
> tg@...p:/tmp/u/cldr-numbers-full/main $ cat */numbers.json | jq 'paths(.decimal?|scalars) as $p | [">>" + ($p | join(".")), getpath($p).decimal]' | sed 's/">>/>>/' | grep -B 1 -e '^ "[^.,]"' -e '^ ".[^"]' | fgrep '>>' | fgrep -v -e '.symbols-numberSystem-arabext"' -e '.symbols-numberSystem-arab"'
> >>main.bgn-AE.numbers.symbols-numberSystem-latn",
> >>main.bgn-AF.numbers.symbols-numberSystem-latn",
> >>main.bgn-IR.numbers.symbols-numberSystem-latn",
> >>main.bgn-OM.numbers.symbols-numberSystem-latn",
> >>main.bgn.numbers.symbols-numberSystem-latn",
>
> … leaves us with this; bgn/numbers.json examplary:
>
> {
>   "main": {
>     "bgn": {
>       "numbers": {
>         "symbols-numberSystem-arabext": {
>           "decimal": "٫",
>           "group": "٬",
>           "list": "؛",
>           …
>         },
>         "symbols-numberSystem-latn": {
>           "decimal": "٫",
>           "group": "،",
>           "list": ";",
>           …
>
> So, if the bgn locales are ever going to be relevant…
> unsure what that exactly is, but my acronyms database says…
> [ISO 639-3] Western Balochi (cf. bal)
> … which seems to fit.

Thanks. My grepping seems to have overlooked that, just because it was
the same character that would normally only be used in an alt-digits
context. I wonder whether the above is intentional or a mistake, and
whether any systems are actually doing that.
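For anyone who wants to repeat the check without relying on jq's output formatting, here is a rough Python equivalent of the quoted pipeline. It is only a sketch and makes several assumptions. It assumes the cldr-numbers-full JSON layout seen in the bgn excerpt (main, locale, numbers, symbols-numberSystem-*, decimal). It treats any separator other than "." or "," as unusual, which also catches multi-character values. Like the final `fgrep -v`, it skips only the arab and arabext numbering systems. The function names are hypothetical and are not part of musl or any CLDR tooling.

```python
import json
import sys
from pathlib import Path

# Numbering systems dropped by the final `fgrep -v` filter quoted above.
OUT_OF_SCOPE = {"symbols-numberSystem-arab", "symbols-numberSystem-arabext"}


def decimal_entries(node, path=()):
    """Yield (path, decimal) for every object with a string "decimal" key.

    Approximates jq's `paths(.decimal?|scalars) as $p | getpath($p).decimal`,
    restricted to string values (CLDR stores separators as strings).
    """
    if isinstance(node, dict):
        value = node.get("decimal")
        if isinstance(value, str):
            yield path, value
        for key, child in node.items():
            yield from decimal_entries(child, path + (key,))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from decimal_entries(child, path + (index,))


def unusual_decimals(tree, skip=OUT_OF_SCOPE):
    """Return (dotted path, separator) pairs whose separator is not "." or ","
    (so multi-character separators are reported too), ignoring objects whose
    key is listed in `skip`."""
    hits = []
    for path, sep in decimal_entries(tree):
        if sep in (".", ","):
            continue
        if path and path[-1] in skip:
            continue
        hits.append((".".join(str(part) for part in path), sep))
    return hits


def scan(root):
    """Apply unusual_decimals to every */numbers.json below `root`
    (e.g. the cldr-numbers-full/main directory)."""
    hits = []
    for file in sorted(Path(root).glob("*/numbers.json")):
        with file.open(encoding="utf-8") as fh:
            hits.extend(unusual_decimals(json.load(fh)))
    return hits


if __name__ == "__main__":
    for dotted, sep in scan(sys.argv[1] if len(sys.argv) > 1 else "."):
        # Print code points so U+066B is easy to tell apart from look-alikes.
        print(dotted, " ".join(f"U+{ord(ch):04X}" for ch in sep))
```

Pointed at the same cldr-numbers-full/main directory, it should list the bgn latn entries if the data matches the excerpt. The traversal works on the parsed JSON rather than on text, so inconsistent whitespace or formatting in the files cannot hide a match. That was the concern raised about the textual grep.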