Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20260513120523.GI1827@brightrain.aerifal.cx>
Date: Wed, 13 May 2026 08:05:24 -0400
From: Rich Felker <dalias@...c.org>
To: Luca Kellermann <mailto.luca.kellermann@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: musl multi-level table format for binary locale images

On Wed, May 13, 2026 at 05:07:41AM +0200, Luca Kellermann wrote:
> On Tue, May 12, 2026 at 07:09:32PM -0400, Rich Felker wrote:
> > [...]
> > 
> > The code to perform lookups is not yet merged much less hooked up to
> > any test framework, but I'm attaching a draft to this email. It needs
> > to be pointed at the start of the actual table (after the 16-byte file
> > header).
> > 
> > [...]
> > 
> > static unsigned get32(const char *b0)
> > {
> > 	const unsigned char *b = (const void *)b0;
> > 	return (b[0]<<24) | (b[1]<<16) | (b[2]<<8) | b[3];
> > }
> 
> b[0] is promoted to int before shifting so a bit is shifted into the
> sign position (UB) if b[0] > 0x7f.

Yes, thanks for the early review and catching this.

> > const char *lookup(const char *ld, int key)
> > {
> > 	unsigned shift, scale, len, val, x, k = key;
> > 
> > 	do {
> > 		k -= get32(ld);
> > 		shift = ld[4];
> > 		scale = ld[5];
> > 		//if (shift > 31 || scale > 2) return 0;
> > 		//len = (get16(ld+6)+1) << scale;
> > 		len = get16(ld+6);
> 
> Here you could also check that len & ((1 << scale) - 1) == 0 to ensure
> that len is a multiple of 1 << scale.

I don't think this is a place where added runtime cost for validation
makes sense. The files necessarily have to be trusted to a large
extent. Maybe less so than other software since we don't have format
strings in here, but there's still plenty of mischief you can do with
a malicious-content locale file that's not outright malformed.

It might make sense to have and end-of-mapping pointer to validate
that lookups can't go outside of it, but I don't want to create the
illusion that processing them is safe against malicious files. Even
with validation, the compiler may assume no data races, so it could
use a different load of the data for the check and the actual use
(introduce TOCTOU races) that an attacker who can modify the mapped
file out from under you could exploit.

> > 		x = (k>>shift) << scale;
> > 		k &= (1U<<shift) - 1;
> > 		if (x >= len) return 0;
> 
> If scale > shift and k > (0xffffffff >> (scale - shift)), the invalid
> index would not be detected. For strerror() this would mean, for
> example, that INT_MIN + 1 results in "Operation not permitted" instead
> of NULL (which would presumably be replaced by "Unknown error")
> because: (0x80000002>>0) << 1 == 4.

Yes. I think I was thinking that valid indices after shift will always
be in a small range where you don't have to worry about that (at most
64k; anything larger requires multiple levels with shift>0). But of
course we still need to be able to catch the unmapped indices to
reject them.

> Invalid indexes should instead be detected before applying scale:
> 
> x = k>>shift;
> k &= (1U<<shift) - 1;
> if (x >= (len>>scale)) return 0;
> x <<= scale;

That works but defeats the purpose of having the scale pre-applied to
len. If we need to check this I should probably just flip back to
having len be a count rather than a size.

> > 		ld += 8;
> > 		for (int i=val=0; i<(1U<<scale); i++)
> > 			val = 256*val + get8(ld+x+i);
> > 		if (!val) return 0;
> > 		ld += len + val-1;
> 
> Because len + val-1 is evaluated as type unsigned, this will wrap
> around and give an incorrect result when the sum is greater than
> 0xffffffff (locale file has to be unreasonably large though). A
> possible fix:

This is just a malformed file (not mappable/processable on a 32-bit
host).

As mentioned above, we could put the check against an end pointer
here. I'm on the fence about whether that's a good idea and would be
happy to hear input..

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.