musl - Re: abort() fails to terminate PID 1 process

Message-ID: <20160620194110.GM10893@brightrain.aerifal.cx>
Date: Mon, 20 Jun 2016 15:41:10 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: abort() fails to terminate PID 1 process

On Mon, Jun 20, 2016 at 02:00:42PM +0200, Igmar Palsenberg wrote:
> 
> > > First, processes can install handlers, which might
> > > instruct the kernel to ignore the signal. SIGABRT can be ignored. I don't
> > 
> > abort() should terminate the process even if SIGABRT is ignored.
> 
> That rule doesn't apply to pid 1 by default. Pid 1 should be a proper init
> system, not a full-blown application that makes the system blow up on
> every error.

abort is specified to terminate the process no matter what. For it to
ever be able to return is a serious bug since both the compiler and
the programmer can assume any code after abort() is unreachable. At
present musl avoids this worst-case failure (wrongfully returning)
with an infinite loop, but that's just a fail-safe. The intent is that
it terminate, and in particular, terminate abnormally as specified,
which we don't do enough to guarantee (SIGKILL is not "abnormal"
termination). So there's definitely work to be done to fix this. It's
an issue I've been aware of for a long time but the kernel makes it
painful to reliably produce abnormal termination without race
conditions.
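
To make "abnormal termination" concrete, here is a minimal sketch of how a
parent could observe it. The helper name abort_termination_signal is
hypothetical, and the result depends on the libc in use: a libc that
re-raises SIGABRT with the default disposition should report SIGABRT, while
one that falls back to SIGKILL would report SIGKILL instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that ignores SIGABRT and then calls
 * abort(). Returns the number of the signal that terminated the child,
 * 0 if the child exited without being signaled, or -1 on error. */
int abort_termination_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Avoid leaving a core file behind; this does not change the
         * terminating signal reported to the parent. */
        struct rlimit no_core = { 0, 0 };
        setrlimit(RLIMIT_CORE, &no_core);

        /* Ignoring SIGABRT must not stop abort() from terminating us. */
        signal(SIGABRT, SIG_IGN);
        abort();
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* WIFSIGNALED is what distinguishes abnormal termination from a
     * plain exit status. */
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The child here is an ordinary process, not pid 1, so this only shows what
the parent is supposed to see; it does not reproduce the pid 1 case from the
original report.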

> > > expect my process to be SIGILL'ed next because of this (which, can also be 
> > > ignored).
> > > Libc should NOT mess with this kind of thing, that's up to the
> > > application.
> > 
> > the glibc fallbacks are
> > 
> > change signal mask and set default handling for SIGABRT
> > raise(SIGABRT);
> > "abort instruction" (segfault, sigtrap or sigill depending on target)
> > _exit(127);
> > infinite loop
> 
> Pid 1 is an exception to all of this. 
> 
> > http://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/abort.c;h=155d70b0647e848f1d40fc0e3b15a2914d7145c0;hb=HEAD
> > 
> > on x86 glibc, pid 1 would terminate with SIGSEGV
> > (unless there is a segfault handler).
> > 
> > the musl logic is explained in
> > 
> > http://git.musl-libc.org/cgit/musl/commit/?id=2557d0ba47286ed3e868f8ddc9dbed0942fe99dc
> > 
> > neither of them is correct because it is not possible to
> > exit with the right status in general.
> > 
> > SIGKILL can only be ignored by pid 1 whose exit status is
> > not supposed to be observable so musl may want to have a
> > fallback after it since the pid namespace thing is nowadays
> > widely abused on linux.
> 
> Well, normally abort() does some signal magic, and then raises again. 
> Which is what POSIX mandates I think.

To make this work reliably I think we need to make abort() take a lock
that precludes further calls to sigaction prior to re-raising SIGABRT
and resetting the disposition. But there are all sorts of
complications to deal with. For example if another thread performs
posix_spawn or fork and exec concurrent with abort() munging the
disposition of SIGABRT, the child process could start with the wrong
disposition for SIGABRT, which would be non-conforming. Finding ways
to fix all places where the wrong behavior may be observable is a
nontrivial problem.
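
A rough sketch of the locking idea, under stated assumptions: the names
abort_lock, lock_blocking_signals, locked_sigaction, sketch_abort and
sketch_abort_child_signal are all hypothetical, a spinlock stands in for
whatever lock libc would really use, and sigprocmask is used on the
assumption (true on Linux) that it affects only the calling thread. This is
not musl's code, and it does nothing about the posix_spawn/fork race
described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical lock shared by the abort sketch and every sigaction call. */
static atomic_flag abort_lock = ATOMIC_FLAG_INIT;

/* Block all signals first, then spin for the lock, so no handler can run
 * in this thread and deadlock on the lock while it is held. */
static void lock_blocking_signals(sigset_t *saved)
{
    sigset_t all;
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, saved);
    while (atomic_flag_test_and_set_explicit(&abort_lock, memory_order_acquire))
        ;
}

/* Stand-in for sigaction: it cannot change a disposition once the abort
 * sketch has taken the lock. */
int locked_sigaction(int sig, const struct sigaction *act, struct sigaction *old)
{
    sigset_t saved;
    lock_blocking_signals(&saved);
    int r = sigaction(sig, act, old);
    atomic_flag_clear_explicit(&abort_lock, memory_order_release);
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return r;
}

_Noreturn void sketch_abort(void)
{
    sigset_t saved, abrt;
    struct sigaction dfl;

    raise(SIGABRT);                 /* an installed handler gets its chance */

    lock_blocking_signals(&saved);  /* taken and never released */
    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;
    sigaction(SIGABRT, &dfl, NULL); /* reset the disposition under the lock */

    raise(SIGABRT);                 /* stays pending: SIGABRT is blocked */
    sigemptyset(&abrt);
    sigaddset(&abrt, SIGABRT);
    sigprocmask(SIG_UNBLOCK, &abrt, NULL); /* delivered on unblock */

    raise(SIGKILL);                 /* fail-safes if we are still alive */
    for (;;)
        _exit(127);
}

/* Test helper: returns the signal that killed a child which ignored
 * SIGABRT and then called sketch_abort(), 0 if none, -1 on error. */
int sketch_abort_child_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction ign;
        memset(&ign, 0, sizeof ign);
        ign.sa_handler = SIG_IGN;
        locked_sigaction(SIGABRT, &ign, NULL);
        sketch_abort();
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Between the reset to SIG_DFL and the actual termination there is still a
window in which a child created by another thread inherits the default
disposition even though the application had set SIGABRT to be ignored; that
is exactly the kind of observable misbehavior that still needs a fix.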

> If you're pid 1 however, you should behave like one.

I tend to agree, but if you're libc you should also behave as
specified, and currently we don't in this regard.

Rich