Message-ID: <20260513122816.GK1827@brightrain.aerifal.cx>
Date: Wed, 13 May 2026 08:28:17 -0400
From: Rich Felker <dalias@...c.org>
To: Luca Kellermann <mailto.luca.kellermann@...il.com>,
	musl@...ts.openwall.com
Subject: Re: musl multi-level table format for binary locale images

On Wed, May 13, 2026 at 01:06:56PM +0200, Szabolcs Nagy wrote:
> * Luca Kellermann <mailto.luca.kellermann@...il.com> [2026-05-13 05:07:41 +0200]:
> > On Tue, May 12, 2026 at 07:09:32PM -0400, Rich Felker wrote:
> > > [...]
> > > 
> > > The code to perform lookups is not yet merged much less hooked up to
> > > any test framework, but I'm attaching a draft to this email. It needs
> > > to be pointed at the start of the actual table (after the 16-byte file
> > > header).
> > > 
> > > [...]
> > > 
> > > static unsigned get32(const char *b0)
> > > {
> > > 	const unsigned char *b = (const void *)b0;
> > > 	return (b[0]<<24) | (b[1]<<16) | (b[2]<<8) | b[3];
> > > }
> > 
> > b[0] is promoted to int before shifting so a bit is shifted into the
> > sign position (UB) if b[0] > 0x7f.
> 
> yeah this is annoying to do in c, i thought it was fixed in c23,

I'm not sure how it could be "fixed" unless you mean by defining
signed overflow for << or in general. Any change in how the types are
interpreted would be show-stoppingly breaking.
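
For reference, a minimal sketch of one way to avoid the promotion
problem in the quoted draft without any language change: convert each
byte to an unsigned 32-bit type before shifting. The name get32_fixed
and the use of uint32_t are assumptions made for illustration, not
necessarily what will be merged.

```c
#include <stdint.h>

/* Decode a 32-bit big-endian value from four bytes.
 * Hypothetical variant of the draft get32: each byte is converted to
 * uint32_t before the shift, so the shift is done in an unsigned type
 * and no bit can land in the sign position of an int. */
static uint32_t get32_fixed(const char *b0)
{
	const unsigned char *b = (const void *)b0;
	return (uint32_t)b[0]<<24 | (uint32_t)b[1]<<16
	     | (uint32_t)b[2]<<8 | b[3];
}
```

With this form, a leading byte above 0x7f (for example 0x80) is
well-defined and simply sets the top bit of the result.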

> btw if this is a mappable format, then wouldn't little-endian repr
> be better for most cpus? so get32 is optimized to a single load
> (nowadays even unaligned loads are efficient, so compilers emit them)

My choice of big endian was purely aesthetic, in that I prefer it for
on-disk/on-the-wire formats. It's the conventional default there, and
conveys the sense that "this is serialized bytes, not a native integer
you can access directly" so that the portability error is immediately
apparent on mainstream archs if you try to do that.

If this really turns out to be the bottleneck I wouldn't be opposed to
changing it, but I think bswap is basically free and the rest of the
operation (mainly chasing pointers, cache lines, and TLB entries)
dominates the runtime.
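
To make the "single load plus bswap" point concrete, here is a rough
sketch of reading a big-endian field with one memcpy load and a
conditional byte swap. All names here (swap32, host_is_little_endian,
load_be32) are hypothetical, and the code assumes the host is either
purely little-endian or purely big-endian. Whether a given compiler
reduces this to a load and a bswap instruction is not guaranteed.

```c
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 32-bit value using only shifts and masks. */
static uint32_t swap32(uint32_t x)
{
	return x>>24 | (x>>8 & 0xff00) | (x<<8 & 0xff0000) | x<<24;
}

/* Runtime probe: nonzero if the lowest-addressed byte of an integer
 * holds its least significant bits. */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	unsigned char first;
	memcpy(&first, &probe, 1);
	return first == 1;
}

/* Load a big-endian 32-bit field from a possibly unaligned address.
 * memcpy avoids alignment and aliasing problems; the swap is only
 * applied on little-endian hosts. */
static uint32_t load_be32(const void *p)
{
	uint32_t v;
	memcpy(&v, p, sizeof v);
	return host_is_little_endian() ? swap32(v) : v;
}
```

A little-endian on-disk format would drop the swap on most current
hardware, which is the only difference between the two choices at
this level.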

Rich
