Message-ID: <20240110015550.GP4163@brightrain.aerifal.cx>
Date: Tue, 9 Jan 2024 20:55:50 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Protect pthreads' mutexes against use-after-destroy

On Tue, Jan 09, 2024 at 02:07:26PM -0500, Rich Felker wrote:
> On Tue, Jan 09, 2024 at 03:37:17PM +0100, jvoisin wrote:
> > Ohai,
> >
> > as discussed on irc, Android's bionic has a check to prevent
> > use-after-destroy on phtread mutexes
> > (https://github.com/LineageOS/android_bionic/blob/e0aac7df6f58138dae903b5d456c947a3f8092ea/libc/bionic/pthread_mutex.cpp#L803),
> > and musl doesn't.
> >
> > While odds are that this is a super-duper common bug, it would still be
> > nice to have this kind of protection, since it's cheap, and would
> > prevent/make it easy to diagnose weird states.
> >
> > Is this something that should/could be implemented?
> >
> > o/
>
> I think you meant that the odds are it's not common. There's already
> enough complexity in the code paths for supporting all the different
> mutex types that my leaning would be, if we do any hardening for
> use-after-destroy, that it should probably just take the form of
> putting the object in a state that will naturally deadlock or error
> rather than adding extra checks to every path where it's used.
>
> If OTOH we do want it to actually trap in all cases where it's used
> after destroy, the simplest way to achieve that is probably to set it
> up as a non-robust non-PI recursive or errorchecking mutex with
> invalid prev/next pointers and owner of 0x3fffffff. Then the only
> place that would actually have to have an explicit trap is trylock in
> the code path:
>
>     if (own == 0x3fffffff) return ENOTRECOVERABLE;
>
> where it could trap if type isn't robust. The unlock code path would
> trap on accessing invalid prev/next pointers.

Unfortunately I discovered a problem we need to deal with in
researching for this: at some point Linux quietly changed the futex
ABI, so that bit 29 is no longer reserved but potentially a tid bit.
This was documented in 9c40365a65d62d7c06a95fb331b3442cb02d2fd9 but
apparently actually happened at the source level a long time before
that. So, we cannot assume 0x3fffffff is not a valid tid, and thereby
cannot assume 0x7fffffff is not equal to ownerdead|valid_tid.

This probably means we need to find a way to encode "not recoverable"
as 0x40000000, as 0 is now the _only_ value in the low-30-bits that
can't potentially be a valid tid.

I'll look at this more over the next day or two. It's probably fixable
but requires fiddling with delicate logic.

Note that the only in-the-wild breakage possible is on systems where
the pid/tid limit has been set extremely high, where attempts to lock
a recursive or errorchecking mutex owned by a thread with tid
0x3fffffff could malfunction.

Rich
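
To make the bit arithmetic above concrete, here is a minimal sketch. It
assumes the Linux robust-futex constants FUTEX_OWNER_DIED (0x40000000)
and FUTEX_TID_MASK (0x3fffffff); the macro and helper names are
hypothetical, chosen for illustration, and are not musl's actual code.

```c
#include <stdint.h>

/* Values mirror FUTEX_OWNER_DIED and FUTEX_TID_MASK from <linux/futex.h>;
 * they are redefined here under illustrative names so the sketch is
 * self-contained. */
#define OWNER_DIED 0x40000000u
#define TID_MASK   0x3fffffffu  /* bit 29 is part of the tid field */

/* Sentinel whose low 30 bits are all ones, as discussed in the thread. */
#define OLD_SENTINEL 0x7fffffffu

/* Hypothetical helper: lock word for a mutex whose owner, with the
 * given tid, died while holding it. */
static uint32_t ownerdead_word(uint32_t tid)
{
    return OWNER_DIED | (tid & TID_MASK);
}

/* Hypothetical helper: nonzero if a dead owner with this tid produces a
 * word indistinguishable from the old sentinel. */
static int collides_with_old_sentinel(uint32_t tid)
{
    return ownerdead_word(tid) == OLD_SENTINEL;
}

/* Hypothetical helper: nonzero if a dead owner with this tid produces
 * the bare ownerdead bit with an empty tid field (0x40000000). */
static int collides_with_bare_ownerdead(uint32_t tid)
{
    return ownerdead_word(tid) == OWNER_DIED;
}
```

With tid 0x3fffffff the first check reports a collision, which is the
ambiguity described above. The second check only fires for tid 0, which
is never a valid tid, so 0x40000000 cannot be confused with
ownerdead|valid_tid.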