Message-ID: <20250119031759.GP10433@brightrain.aerifal.cx>
Date: Sat, 18 Jan 2025 22:18:00 -0500
From: Rich Felker <dalias@...c.org>
To: Markus Wichmann <nullplan@....net>
Cc: musl@...ts.openwall.com
Subject: Re: [bug] Ctrl-Z when process is doing posix_spawn makes the
 process hard to kill

On Sat, Jan 18, 2025 at 09:16:58PM +0100, Markus Wichmann wrote:
> Hi all,
> 
> here is my understanding of the bug first.
> 
> 1. Foreground process calls posix_spawn without the POSIX_SPAWN_SETSID
> or POSIX_SPAWN_SETPGROUP flags (either of those prevent the bug).
> 2. User presses terminal suspend character between the parent process
> masking signals and the child process execing the target program.
> 3. Kernel sends SIGTSTP to foreground process group.
> 4. SIGTSTP is blocked in parent process, so parent process does not
> stop. Parent process is blocked in trying to read the pipe to the child,
> though.
> 5. Child process unblocks signals before calling exec(), thereby
> unblocking SIGTSTP and stopping.
> 6. User has an issue mainly because parent process never acts on SIGTSTP
> and stops (which is why the shell's wait() call never returns).
> 
> Looking at the ingredients of the problem, it seems that unblocking
> signals before reading the pipe would be the simplest way out of this
> pickle. We cannot avoid blocking signals before calling clone() to spawn
> the child with blocked signals, and they cannot be unblocked in exec(),
> because all exec() functions pass on the signal mask, but the parent
> could read the pipe with unblocked signals.

I think this is a misunderstanding of the bug. My understanding is
that, due to signals sent from a controlling terminal or to a process
group, it's possible for a process which logically does not exist yet
to enter a stopped state.

If the parent also stopped, most likely they would be resumed
together, but nothing requires that. In the worst case, the child's
stop may be queued before the child changes to a new process group;
it is then acted upon only after the process group has already
changed (because that necessarily happens before signals are
unblocked), and sending SIGCONT to the parent's process group, as a
shell would do, will not resume it.
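The sequence above can be reproduced outside posix_spawn, with a plain
fork() child standing in for the spawn child. This is a hypothetical
demo, not musl code: stale_tstp_demo is an invented name, the child
raises SIGTSTP itself instead of receiving it from the terminal, and
setpgid() plays the role of a requested process-group change. It
assumes Linux-style job control.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo, not musl code. Returns 1 if a SIGTSTP that was
 * pending while blocked stops the child only after it has moved to its
 * own process group, so SIGCONT sent to the parent's group misses it;
 * 0 if the scenario did not reproduce; -1 on error. */
int stale_tstp_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: mimic the window between clone() and exec(). */
        sigset_t tstp;
        sigemptyset(&tstp);
        sigaddset(&tstp, SIGTSTP);
        signal(SIGTSTP, SIG_DFL);
        sigprocmask(SIG_BLOCK, &tstp, NULL);
        raise(SIGTSTP);                        /* queued, not acted on */
        setpgid(0, 0);                         /* leave the parent's group */
        sigprocmask(SIG_UNBLOCK, &tstp, NULL); /* the stop happens here */
        _exit(0);
    }

    int st;
    pid_t r;
    while ((r = waitpid(pid, &st, WUNTRACED)) < 0 && errno == EINTR)
        ;
    if (r != pid)
        return -1;
    if (!WIFSTOPPED(st))
        return 0;                              /* child never stopped */

    /* What a shell does to resume a job: SIGCONT to the parent's group.
     * kill() with pid 0 targets the caller's own process group. */
    kill(0, SIGCONT);
    int stuck = waitpid(pid, &st, WCONTINUED | WNOHANG) == 0;

    /* Clean up: resume the child directly, then reap it. */
    kill(pid, SIGCONT);
    while (waitpid(pid, &st, 0) < 0 && errno == EINTR)
        ;
    return stuck;
}
```

On Linux this is expected to return 1: waitpid() reports the child as
stopped, yet the group-wide SIGCONT leaves it stopped, and only a
SIGCONT aimed at the child's pid resumes it.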

This cannot happen with a hard SIGSTOP, only with SIGTSTP. So one
could argue that my original fix for SIGTSTP suffices, if you're
willing to assume that whatever sends a hard SIGSTOP to a process
group will also send the SIGCONT to that process group.
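One reading of why the two signals differ: SIGSTOP, like SIGKILL,
cannot be blocked, and the system silently drops it from any requested
mask. A spawn child hit by it therefore presumably stops at once,
while still in the parent's process group, where a group-wide SIGCONT
reaches it. SIGTSTP, by contrast, can sit pending behind the blocked
mask. The check below (blocked_stop_signals is a hypothetical helper)
shows the asymmetry.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>

/* Ask to block both SIGTSTP and SIGSTOP, then report which ones really
 * ended up in the mask. Returns a bitmask: bit 0 set if SIGTSTP is
 * blocked, bit 1 set if SIGSTOP is blocked; -1 on error. */
int blocked_stop_signals(void)
{
    sigset_t want, saved, now;
    sigemptyset(&want);
    sigaddset(&want, SIGTSTP);
    sigaddset(&want, SIGSTOP);
    if (sigprocmask(SIG_BLOCK, &want, &saved) < 0)
        return -1;
    sigprocmask(SIG_BLOCK, NULL, &now);      /* query the current mask */
    int bits = (sigismember(&now, SIGTSTP) == 1)
             | ((sigismember(&now, SIGSTOP) == 1) << 1);
    sigprocmask(SIG_SETMASK, &saved, NULL);  /* restore the caller's mask */
    return bits;
}
```

The expected result is 1: the request to block SIGSTOP does not fail,
it is simply ignored.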

> The code for reading the pipe and waiting for the child process
> obviously would need to account for the possibility of EINTR, and there
> is a possibility the pipe FD would escape to fork-without-exec in a
> signal handler. That could be helped with FD_CLOFORK emulation in libc,
> though (keep track of CLOFORK FDs in an FD set and close them all in
> _Fork()), since FD_CLOFORK is not in the kernel, sadly.

This doesn't matter. It's always expected that libc-internal fds can
escape this way, and in this case it's completely harmless except for
the resource leak. If you _Fork from a signal handler, you're in a
permanent async-signal (AS) context and can't really do much except
exec or _exit.
So the resource usage really doesn't matter. It does not block forward
progress of anything.

> Or else you could tell applications that weird things happen if you fork
> in a signal handler without execing (that's weird usage, anyway).

This is basically what the standard already does.

I'm not really convinced that unblocking signals in the parent is
relevant to fixing this bug, but it might be better behavior, since
posix_spawn can block forward progress indefinitely if the child's file
actions do stupid things like opening a file type that blocks in open
(a FIFO with no writer, for instance). While the implementation may of
course block signals internally where needed, it should generally
follow the as-if rule, whereby the application can't see that they
were blocked except through timing differences. Blocking forward
progress that can only happen when a signal is handled seems like at
least bad QoI, if not nonconforming.
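A rough sketch of the alternative discussed above: block signals only
around the fork itself, have the parent restore its mask right away,
and retry the status-pipe read on EINTR, so that a handler running
while the parent waits is not mistaken for an error. spawn_sketch is a
hypothetical name; it uses fork() and execv() rather than musl's
clone()-based internals, omits file actions and attributes, and does
not reset the child's signal handlers before unblocking, which a real
implementation needs to handle.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical spawn helper, not musl's posix_spawn. Returns the child
 * pid on success, or -errno if fork() or the child's exec failed. */
pid_t spawn_sketch(const char *path, char *const argv[])
{
    int p[2];
    if (pipe(p) < 0)
        return -errno;
    /* The write end disappears on a successful exec, so the parent
     * reads EOF; on failure the child writes its errno instead. */
    fcntl(p[1], F_SETFD, FD_CLOEXEC);

    sigset_t all, saved;
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &saved);   /* block only around fork */
    pid_t pid = fork();
    if (pid == 0) {
        close(p[0]);
        sigprocmask(SIG_SETMASK, &saved, NULL);
        execv(path, argv);
        int err = errno;
        ssize_t w = write(p[1], &err, sizeof err);
        (void)w;                            /* nothing more we can do */
        _exit(127);
    }
    int fork_err = errno;
    sigprocmask(SIG_SETMASK, &saved, NULL); /* parent: unblock at once */
    close(p[1]);
    if (pid < 0) {
        close(p[0]);
        return -fork_err;
    }

    int err = 0;
    ssize_t n;
    do {
        n = read(p[0], &err, sizeof err);
    } while (n < 0 && errno == EINTR);      /* a handler ran: retry */
    close(p[0]);

    if (n == (ssize_t)sizeof err) {         /* exec failed: reap child */
        while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
            ;
        return -err;
    }
    return pid;
}
```

Because the parent's own mask is in effect while it waits, a SIGTSTP
from the terminal can stop the parent during that wait, just as it
would outside posix_spawn.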

Rich
