libc-coord - Re: [musl] Re: Making exit actually thread-safe

Message-ID: <20240726195621.GN10433@brightrain.aerifal.cx>
Date: Fri, 26 Jul 2024 15:56:22 -0400
From: Rich Felker <dalias@...c.org>
To: Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>
Cc: libc-coord@...ts.openwall.com, enh <enh@...gle.com>,
	musl@...ts.openwall.com, libc-alpha@...rceware.org
Subject: Re: [musl] Re: Making exit actually thread-safe

On Thu, Jul 25, 2024 at 09:48:46AM -0300, Adhemerval Zanella Netto wrote:
> 
> 
> On 25/07/24 09:39, enh wrote:
> > On Wed, Jul 24, 2024 at 8:19 PM Rich Felker <dalias@...c.org> wrote:
> >>
> >> On Wed, Jul 24, 2024 at 05:21:00PM -0400, enh wrote:
> >>> you didn't want to go with the recursive mutex variant mentioned? i'm
> >>> convinced by this change for Android too, but was leaning towards the
> >>> recursive mutex myself...
> >>
> >> The change I'm advocating for first is a minimal one, just making
> >> calls from other threads well-defined by blocking until the process
> >> terminates. This is a trivial change that any implementation can adopt
> >> without breaking anything else, and doesn't have any potential
> >> far-reaching consequences.
> >>
> >> While some implementations may want to allow (or feel they already
> >> allow, often by accident)
> > 
> > yeah, i think that was what made me lean toward the recursive mutex
> > --- the assumption that _that's_ the option least likely to break
> > anyone. (i actually thought that was the point of even mentioning it
> > in the proposal --- the assumption that someone somewhere has an
> > atexit() handler that calls exit(). normally at this point i'd say "if
> > i've learned one thing in a decade+ of dealing with Android's libc and
> > the third-party binary app ecosystem, it's that no matter how crazy a
> > thing, if you can imagine it, someone's relying on it already", but
> > since "exiting" isn't really a thing on Android -- you're either
> > backgrounded or kill -9'ed, and don't typically have any kind of
> > "quit" functionality yourself -- this is one place where it seems
> > relatively unlikely.)
> > 
> >> recursive calls to exit, imposing a
> >> requirement to do this without a deep dive into where that might lead
> >> seems like a bad idea to me. Even if it is desirable, it's something
> >> that could be considered separately without having the thread-safety
> >> issue blocked on it.
> >>
> >> By leaving the recursive case undefined as it was before, any
> >> implementations that want to do that or keep doing that are free to do
> >> so.
> > 
> > aye, but a program that calls exit() from an atexit() handler is
> > working for me right now on Android, glibc, and macOS. so there's a
> > user-visible behavior change here for any of those libcs that goes
> > with a non-recursive mutex. (i think the same is true for musl too,
> > but don't have a musl-based system to test on.)
> 
> I think it is reasonable not to add a constraint requiring recursive
> exit to work, although making this implementation-defined will most
> likely create pressure to eventually settle on the most widely used
> behavior (unless that behavior is broken by design).
> 
> At least for glibc, my plan is to keep the current support for it,
> so most likely we will use a recursive mutex.

Part of the reason I'm hesitant to suggest specifying any behavior is
that it's a lot messier than we'd probably like to think it is.

There actually is a fairly "good" motivation for wanting recursive
exit to work: it lets atexit handlers (or global dtors) override the
exit code, for example if a write error is detected during cleanup.
While that can already be done by using _exit/_Exit (this is the way
gnulib does it, noting in the comments that exit cannot be called from
an atexit handler without invoking UB), it's somewhat unsatisfying
because it precludes any further execution of other atexit handlers
and precludes leaving stdio streams in a consistent state at exit
(flushed and with underlying fd position updated to match the logical
FILE position).

One might think recursive exit is a good solution here. The natural
behavior is that the currently executing atexit handler will already
have been popped off the handler registration stack, so that when you
call exit again, things will pick up with the next handler.

The problem is that global dtors are not each their own handler. In
most real-world implementations, there is a single handler that runs
at the end of the atexit handler stack that processes all the global
dtors in the dtor_array or similar. This means that calling exit from
any one of them will skip execution of *all subsequent* dtors, not
just the currently executing one.

I don't see any *reasonable* way to specify this behavior; it's a
consequence of implementation details, not anything present in the
abstract machine.

Probably the closest we could get to a reasonable specification is
stipulating a behavior for exit from an actual abstract-machine atexit
handler (execution picks up at the next handler) but leaving the case
of exit from a global dtor undefined.

There may be other nasty surprises in this area too I haven't yet
thought of, but this is the main one that comes to mind so far.

Rich