Message-ID: <20240110015550.GP4163@brightrain.aerifal.cx>
Date: Tue, 9 Jan 2024 20:55:50 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Protect pthreads' mutexes against use-after-destroy

On Tue, Jan 09, 2024 at 02:07:26PM -0500, Rich Felker wrote:
> On Tue, Jan 09, 2024 at 03:37:17PM +0100, jvoisin wrote:
> > Ohai,
> > 
> > as discussed on irc, Android's bionic has a check to prevent
> > use-after-destroy on pthread mutexes
> > (https://github.com/LineageOS/android_bionic/blob/e0aac7df6f58138dae903b5d456c947a3f8092ea/libc/bionic/pthread_mutex.cpp#L803),
> > and musl doesn't.
> > 
> > While odds are that this is a super-duper common bug, it would still be
> > nice to have this kind of protection, since it's cheap, and would
> > prevent/make it easy to diagnose weird states.
> > 
> > Is this something that should/could be implemented?
> > 
> > o/
> 
> I think you meant that the odds are it's not common. There's already
> enough complexity in the code paths for supporting all the different
> mutex types that my leaning would be, if we do any hardening for
> use-after-destroy, that it should probably just take the form of
> putting the object in a state that will naturally deadlock or error
> rather than adding extra checks to every path where it's used.
> 
> If OTOH we do want it to actually trap in all cases where it's used
> after destroy, the simplest way to achieve that is probably to set it
> up as a non-robust non-PI recursive or errorchecking mutex with
> invalid prev/next pointers and owner of 0x3fffffff. Then the only
> place that would actually have to have an explicit trap is trylock in
> the code path:
> 
>         if (own == 0x3fffffff) return ENOTRECOVERABLE;
> 
> where it could trap if type isn't robust. The unlock code path would
> trap on accessing invalid prev/next pointers.

Unfortunately, while researching this I discovered a problem we need
to deal with: at some point Linux quietly changed the futex ABI so
that bit 29 is no longer reserved but is potentially a tid bit. This
was documented in commit 9c40365a65d62d7c06a95fb331b3442cb02d2fd9, but
the change apparently happened at the source level a long time before
that. So we cannot assume 0x3fffffff is not a valid tid, and therefore
cannot assume 0x7fffffff is not equal to ownerdead|valid_tid.

This probably means we need to find a way to encode "not recoverable"
as 0x40000000, since 0 is now the _only_ value of the low 30 bits that
cannot be a valid tid.
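
To make the collision concrete, here is a minimal sketch of the bit
arithmetic described above. The macro and function names are made up
for illustration and are not musl's identifiers; the only values taken
from this thread are 0x3fffffff, 0x7fffffff and 0x40000000, and the
"new" encoding is just the direction proposed here, not a finished
design.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names only; these are not musl's identifiers. */
#define LOCK_TID_MASK   0x3fffffffu /* low 30 bits: owner tid */
#define LOCK_OWNER_DIED 0x40000000u /* bit 30: owner-died flag */

/* Sentinel built from owner-died plus an all-ones tid field. */
#define OLD_NOTRECOVERABLE (LOCK_OWNER_DIED | LOCK_TID_MASK) /* 0x7fffffff */
/* Proposed direction: owner-died with a tid field of 0. */
#define NEW_NOTRECOVERABLE LOCK_OWNER_DIED                   /* 0x40000000 */

/* Extract the owner tid field from a lock word. */
static uint32_t lock_tid(uint32_t word)
{
    return word & LOCK_TID_MASK;
}

/* Nonzero if "owner died, owned by tid" is indistinguishable from
 * the old not-recoverable sentinel. */
static int collides_with_old_sentinel(uint32_t tid)
{
    return (LOCK_OWNER_DIED | lock_tid(tid)) == OLD_NOTRECOVERABLE;
}

/* Same check against the proposed sentinel; only tid 0, which is
 * never a valid tid, can match it. */
static int collides_with_new_sentinel(uint32_t tid)
{
    return (LOCK_OWNER_DIED | lock_tid(tid)) == NEW_NOTRECOVERABLE;
}
```

With a tid of 0x3fffffff the first check reports a collision and the
second does not, which is the whole motivation for moving the
sentinel.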

I'll look at this more over the next day or two. It's probably fixable
but requires fiddling with delicate logic.

Note that the only in-the-wild breakage possible is on systems where
the pid/tid limit has been set extremely high, where attempts to lock
a recursive or errorchecking mutex owned by a thread with tid
0x3fffffff could malfunction.

Rich
