musl - Re: Protect pthreads' mutexes against use-after-destroy

Message-ID: <20240110015550.GP4163@brightrain.aerifal.cx>
Date: Tue, 9 Jan 2024 20:55:50 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Protect pthreads' mutexes against use-after-destroy

On Tue, Jan 09, 2024 at 02:07:26PM -0500, Rich Felker wrote:
> On Tue, Jan 09, 2024 at 03:37:17PM +0100, jvoisin wrote:
> > Ohai,
> > 
> > as discussed on irc, Android's bionic has a check to prevent
> > use-after-destroy on pthread mutexes
> > (https://github.com/LineageOS/android_bionic/blob/e0aac7df6f58138dae903b5d456c947a3f8092ea/libc/bionic/pthread_mutex.cpp#L803),
> > and musl doesn't.
> > 
> > While odds are that this is a super-duper common bug, it would still be
> > nice to have this kind of protection, since it's cheap, and would
> > prevent/make it easy to diagnose weird states.
> > 
> > Is this something that should/could be implemented?
> > 
> > o/
> 
> I think you meant that the odds are it's not common. There's already
> enough complexity in the code paths for supporting all the different
> mutex types that my leaning would be, if we do any hardening for
> use-after-destroy, that it should probably just take the form of
> putting the object in a state that will naturally deadlock or error
> rather than adding extra checks to every path where it's used.
> 
> If OTOH we do want it to actually trap in all cases where it's used
> after destroy, the simplest way to achieve that is probably to set it
> up as a non-robust non-PI recursive or errorchecking mutex with
> invalid prev/next pointers and owner of 0x3fffffff. Then the only
> place that would actually have to have an explicit trap is trylock in
> the code path:
> 
>         if (own == 0x3fffffff) return ENOTRECOVERABLE;
> 
> where it could trap if type isn't robust. The unlock code path would
> trap on accessing invalid prev/next pointers.

Unfortunately, while researching this I discovered a problem we need
to deal with: at some point Linux quietly changed the futex ABI, so
that bit 29 is no longer reserved but potentially a tid bit.
This was documented in 9c40365a65d62d7c06a95fb331b3442cb02d2fd9 but
apparently actually happened at the source level a long time before
that. So, we cannot assume 0x3fffffff is not a valid tid, and thereby
cannot assume 0x7fffffff is not equal to ownerdead|valid_tid.
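
The collision can be checked with a few lines of arithmetic. The sketch
below is only an illustration, not code from musl: the macro and
function names are made up for this example, and the bit values are
assumed to match FUTEX_OWNER_DIED and FUTEX_TID_MASK in <linux/futex.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror FUTEX_OWNER_DIED and FUTEX_TID_MASK from
 * <linux/futex.h>; the local names are illustrative only. */
#define OWNER_DIED_BIT 0x40000000u
#define TID_MASK_30    0x3fffffffu  /* bits 0-29 may all be tid bits */
#define TID_MASK_29    0x1fffffffu  /* older assumption: bit 29 reserved */
#define SENTINEL_WORD  0x7fffffffu  /* value discussed in the message */

/* Build the lock word for "owner died, owner was tid". */
static inline uint32_t owner_died_word(uint32_t tid, uint32_t tid_mask)
{
    assert((tid & ~tid_mask) == 0);  /* tid must fit in the tid field */
    return OWNER_DIED_BIT | tid;
}

/* Return 1 if some tid under tid_mask yields SENTINEL_WORD.
 * Only the all-ones tid can, so checking the largest tid suffices. */
static inline int sentinel_is_ambiguous(uint32_t tid_mask)
{
    return owner_died_word(tid_mask, tid_mask) == SENTINEL_WORD;
}
```

With a 29-bit tid field the largest owner-died word is 0x5fffffff, so
0x7fffffff stays unique; with a 30-bit field, tid 0x3fffffff produces
exactly 0x7fffffff.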

This probably means we need to find a way to encode "not recoverable"
as 0x40000000, as 0 is now the _only_ value in the low 30 bits that
can't potentially be a valid tid.
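
To make the bit layout concrete, here is a minimal sketch that splits a
lock word into its fields. The struct, macro, and function names are
hypothetical, the layout (bit 31 waiters, bit 30 owner-died, bits 0-29
tid) is assumed from the futex ABI, and nothing here says how the
surrounding lock logic would have to treat the value.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of a futex-based lock word (illustrative names). */
#define WAITERS_BIT    0x80000000u
#define OWNER_DIED_BIT 0x40000000u
#define TID_MASK       0x3fffffffu

struct lock_word_fields {
    int waiters;     /* 1 if the waiters bit is set */
    int owner_died;  /* 1 if the owner-died bit is set */
    uint32_t tid;    /* low 30 bits; 0 never names a real thread */
};

/* Pack the three fields into one 32-bit word. */
static inline uint32_t encode_lock_word(int waiters, int owner_died,
                                        uint32_t tid)
{
    assert((tid & ~TID_MASK) == 0);  /* tid must fit in 30 bits */
    return (waiters ? WAITERS_BIT : 0u)
         | (owner_died ? OWNER_DIED_BIT : 0u)
         | tid;
}

/* Split a 32-bit word back into its fields. */
static inline struct lock_word_fields decode_lock_word(uint32_t word)
{
    struct lock_word_fields f;
    f.waiters = (word & WAITERS_BIT) != 0;
    f.owner_died = (word & OWNER_DIED_BIT) != 0;
    f.tid = word & TID_MASK;
    return f;
}
```

Decoding 0x40000000 gives owner_died set with a tid field of 0, whereas
0x7fffffff decodes to owner_died set with tid 0x3fffffff, which under
the changed ABI could be a real thread.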

I'll look at this more over the next day or two. It's probably fixable
but requires fiddling with delicate logic.

Note that the only in-the-wild breakage possible is on systems where
the pid/tid limit has been set extremely high, where attempts to lock
a recursive or errorchecking mutex owned by a thread with tid
0x3fffffff could malfunction.

Rich