Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201001023018.GL17637@brightrain.aerifal.cx>
Date: Wed, 30 Sep 2020 22:30:19 -0400
From: Rich Felker <dalias@...c.org>
To: Florian Weimer <fw@...eb.enyo.de>
Cc: Carlos O'Donell via Libc-alpha <libc-alpha@...rceware.org>,
	musl@...ts.openwall.com
Subject: Re: [PATCH] Make abort() AS-safe (Bug 26275).

On Tue, Sep 29, 2020 at 10:42:07AM -0400, Rich Felker wrote:
> On Tue, Sep 29, 2020 at 08:54:59AM +0200, Florian Weimer wrote:
> > * Rich Felker:
> > 
> > > Is there a reason to take the lock across fork rather than just
> > > resetting it in the child? After seeing this I'm working on fixing the
> > > same issue in musl and was about to take the lock, but realized ours
> > > isn't actually protecting any userspace data state, just excluding
> > > sigaction on SIGABRT during abort.
> > 
> > It's also necessary to stop the fork because the subprocess could
> > otherwise observe the impossible SIG_DFL state.  In case the signal
> > handler returns, the implementation needs to produce a termination
> > status with SIGABRT as the termination signal, and the only way I can
> > see to achieve that is to remove the signal handler and send the
> > signal again.  This suggests that a lock in sigaction is needed as
> > well.
> 
> Yes, in musl we already have the lock in sigaction -- that's the whole
> point of the lock. To prevent other threads from fighting to change
> the disposition back to SIG_IGN or a signal handler while abort is
> trying to change it to SIG_DFL.
> 
> > But for the fork case, restting the lock in the new subprocess should
> > be sufficient.
> 
> I don't follow. Do you mean taking the lock in the parent, but just
> resetting it in the child? That should work but I don't see how it has
> any advantage over just releasing it in the child.

OK, this is a lot worse than you thought:

Even without fork, execve and posix_spawn can also see the SIGABRT
disposition change made by abort(), passing it on to a process that
should have started with a disposition of SIG_IGN if you hit exactly
the wrong spot in the race.

So, to fix this, these interfaces also have to take the abort lock,
and to make it AS-safe (since execve is required to be), need to block
all signals to take the lock. But execve can't leave signals blocked
or the new process image would inherit that state. So it has to
unblock them after taking the lock. But then a signal handler can
interrupt between taking the lock and the execve syscall, making abort
deadlock if called from the signal handler.

So how to solve this? Having the abort lock be recursive sounds like
it helps (avoid the deadlock above), but then the signal handler that
runs between taking the abort lock and making the execve syscall still
delays abort by other threads for an unbounded length of time, and in
fact it could even longjmp out, leaving a stale lock owner that
prevents any other thread from ever calling abort. Ultimately this
boils down to a general principle: you can't make AS-safe locks that
allow arbitrary application code to run while they're held.

I really don't see any way out without giving abort a mechanism to
"seize" other threads before changing the signal disposition. This
could for example be done with the same mechanism used for
multithreaded set*id (broadcast signal of an implementation-internal,
unblockable signal) or maybe with some seccomp hacks on a recent
enough kernel. Is there some better approach I'm missing??

All of this hell because Linux thought we didn't need a SYS_abort...

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.