Message-ID: <20260513164324.GP1827@brightrain.aerifal.cx>
Date: Wed, 13 May 2026 12:43:29 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: musl multi-level table format for binary locale images

On Tue, May 12, 2026 at 07:09:32PM -0400, Rich Felker wrote:
> On Sat, May 09, 2026 at 11:04:13PM -0400, Rich Felker wrote:
> > On Fri, May 08, 2026 at 11:22:28PM -0400, Rich Felker wrote:
> > > The concepts here have been presented before; what follows is an
> > > informal spec of the actual mappable image format that has emerged
> > > from earlier design proposals and discussion and from implementation
> > > of draft tooling.
> > > 
> > > The multi-level tables used here are in some sense a data-driven
> > > generalization of the multi-level tables used elsewhere in musl for
> > > character data, with headers at each level defining the range covered
> > > and bits examined at the level covered.
> > > 
> > > Some of the details here are not yet matched by the draft tooling, and
> > > all may still be subject to further change until integration and
> > > release.
> > > 
> > > This is part of the locale support overhaul project, funded by NLnet
> > > and the NGI Zero Core Fund.
> > > 
> > > FILE FORMAT
> > > [...]
> > 
> > Give me another day or so to clean this up and post it, but I've now
> > implemented both the generation and lookup for the binary locale
> > images. Generation does not make use of shift!=0 (multi-level)
> > functionality, which is presently unneeded outside of collation data;
> > this will be added later, and should be easy to do as a
> > transformation on the in-memory representation prior to serialization.
> > 
> > Right now I have the code bolted haphazardly to the existing
> > parselocale.c to build the image then query back the items in
> > parselocale.c's table to check that they round-trip. I'll probably
> > clean this up a little bit to be reasonable to commit into the
> > development history, then go on to write a simple localedef(1) entry
> > point.
> > 
> > This leaves details of collation and musl integration as the main
> > remaining parts of the locale project.
> 
> An integration of the parser, binary table generation, and
> localedef(1) frontend is up at:
> 
> https://codeberg.org/dalias/musl-locale-tools-draft
> 
> Current version at the time of this email:
> 
> https://codeberg.org/dalias/musl-locale-tools-draft/src/commit/19bf9dc524353232b03735f410490895248ee5b1
> 
> The code to perform lookups is not yet merged, much less hooked up to
> any test framework, but I'm attaching a draft to this email. It needs
> to be pointed at the start of the actual table (after the 16-byte file
> header).

Lookup code is now included in the above draft repo, with bugs fixed
and the adjusted meaning of the length/count fields applied. It is
hooked up to an extractlocale utility that can recover the text-based
source locale format from binary files.

Current revision as of this mail is:

https://codeberg.org/dalias/musl-locale-tools-draft/src/commit/653135e636c1e0a3d8a34278079c424afa7d6639

As the data structures are not self-describing but require the
consuming process to be aware of the layout, the same table used by
parselocale to populate the binary table is used to pull data back out
of it.
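
To illustrate that point only, here is a minimal sketch of one
descriptor table driving both directions. Everything in it is an
assumption made up for illustration: the item names chosen, the slot
numbers, the toy image layout (16-bit little-endian offsets followed
by NUL-terminated strings), and the functions toy_pack and
toy_extract. None of it is the actual image format or the code in the
draft repo.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor table; NOT the table from parselocale.c. */
struct item_desc {
	const char *name;  /* keyword as written in the text source */
	unsigned slot;     /* fixed slot index in the toy image */
};

static const struct item_desc items[] = {
	{ "decimal_point", 0 },
	{ "thousands_sep", 1 },
	{ "yesexpr", 2 },
};
#define NITEMS (sizeof items / sizeof items[0])

/* Toy image: NITEMS 16-bit little-endian offsets, then the strings.
 * No names are stored, so the image is not self-describing.
 * values[i] is the value for items[i]. Returns image size, 0 on error. */
static size_t toy_pack(unsigned char *img, size_t cap,
                       const char *const *values)
{
	size_t pos = 2*NITEMS;
	assert(img && values);
	if (cap < pos) return 0;
	for (size_t i = 0; i < NITEMS; i++) {
		size_t len = strlen(values[i]) + 1;
		unsigned slot = items[i].slot;
		if (pos > 0xffff || len > cap - pos) return 0;
		img[2*slot] = pos & 0xff;
		img[2*slot+1] = pos >> 8;
		memcpy(img + pos, values[i], len);
		pos += len;
	}
	return pos;
}

/* Look an item up by name using the same table the packer used.
 * Returns 0 for unknown names or offsets outside the image. */
static const char *toy_extract(const unsigned char *img, size_t len,
                               const char *name)
{
	for (size_t i = 0; i < NITEMS; i++) {
		unsigned slot = items[i].slot;
		size_t off;
		if (strcmp(items[i].name, name)) continue;
		off = img[2*slot] | (size_t)img[2*slot+1] << 8;
		if (off < 2*NITEMS || off >= len) return 0;
		if (!memchr(img + off, 0, len - off)) return 0;
		return (const char *)img + off;
	}
	return 0;
}
```

Because the image carries no names, a reader built against a
different table would silently misinterpret the slots, which is why
sharing the one table between both tools matters.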

Rich
