Message-ID: <20240726195621.GN10433@brightrain.aerifal.cx>
Date: Fri, 26 Jul 2024 15:56:22 -0400
From: Rich Felker <dalias@...c.org>
To: Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>
Cc: libc-coord@...ts.openwall.com, enh <enh@...gle.com>,
	musl@...ts.openwall.com, libc-alpha@...rceware.org
Subject: Re: Re: [libc-coord] Making exit actually thread-safe

On Thu, Jul 25, 2024 at 09:48:46AM -0300, Adhemerval Zanella Netto wrote:
> 
> 
> On 25/07/24 09:39, enh wrote:
> > On Wed, Jul 24, 2024 at 8:19 PM Rich Felker <dalias@...c.org> wrote:
> >>
> >> On Wed, Jul 24, 2024 at 05:21:00PM -0400, enh wrote:
> >>> you didn't want to go with the recursive mutex variant mentioned? i'm
> >>> convinced by this change for Android too, but was leaning towards the
> >>> recursive mutex myself...
> >>
> >> The change I'm advocating for first is a minimal one, just making
> >> calls from other threads well-defined by blocking until the process
> >> terminates. This is a trivial change that any implementation can adopt
> >> without breaking anything else, and doesn't have any potential
> >> far-reaching consequences.
> >>
> >> While some implementations may want to allow (or feel they already
> >> allow, often by accident)
> > 
> > yeah, i think that was what made me lean toward the recursive mutex
> > --- the assumption that _that's_ the option least likely to break
> > anyone. (i actually thought that was the point of even mentioning it
> > in the proposal --- the assumption that someone somewhere has an
> > atexit() handler that calls exit(). normally at this point i'd say "if
> > i've learned one thing in a decade+ of dealing with Android's libc and
> > the third-party binary app ecosystem, it's that no matter how crazy a
> > thing, if you can imagine it, someone's relying on it already", but
> > since "exiting" isn't really a thing on Android -- you're either
> > backgrounded or kill -9'ed, and don't typically have any kind of
> > "quit" functionality yourself -- this is one place where it seems
> > relatively unlikely.)
> > 
> >> recursive calls to exit, imposing a
> >> requirement to do this without a deep dive into where that might lead
> >> seems like a bad idea to me. Even if it is desirable, it's something
> >> that could be considered separately without having the thread-safety
> >> issue blocked on it.
> >>
> >> By leaving the recursive case undefined as it was before, any
> >> implementations that want to do that or keep doing that are free to do
> >> so.
> > 
> > aye, but a program that calls exit() from an atexit() handler is
> > working for me right now on Android, glibc, and macOS. so there's a
> > user-visible behavior change here for any of those libcs that goes
> > with a non-recursive mutex. (i think the same is true for musl too,
> > but don't have a musl-based system to test on.)
> 
> I think it is reasonable to not add the constraint to allow recursive
> exit, although making this implementation defined will most likely
> pressure to eventually have the resolution on the most used behavior
> (unless it is broken by design).
> 
> At least for glibc, my plan is to keep current support of allowing it
> so mostly likely we will use a recursive mutex.

Part of the reason I'm hesitant to suggest specifying any behavior is
that it's a lot messier than we'd probably like to think it is.

There actually is a fairly "good" motivation for wanting recursive
exit to work: it lets atexit handlers (or global dtors) override the
exit code, for example if a write error is detected during cleanup.
While that can already be done by using _exit/_Exit (this is the way
gnulib does it, noting in the comments that exit cannot be called from
an atexit handler without invoking UB), it's somewhat unsatisfying
because it precludes any further execution of other atexit handlers
and precludes leaving stdio streams in a consistent state at exit
(flushed and with underlying fd position updated to match the logical
FILE position).

One might think recursive exit is a good solution here. The natural
behavior is that the currently executing atexit handler will already
have been popped off the handler registration stack, so that when you
call exit again, things will pick up with the next handler.

The problem is that global dtors are not each their own handler. In
most real-world implementations, there is a single handler that runs
at the end of the atexit handler stack that processes all the global
dtors in the dtor_array or similar. This means that calling exit from
any one of them will skip execution of *all subsequent* dtors, not
just the currently executing one.

I don't see any *reasonable* way to specify this behavior; it's a
consequence of implementation details, not anything present in the
abstract machine. Probably the closest we could get to a reasonable
specification is stipulating a behavior for exit from an actual
abstract-machine atexit handler (execution picks up at the next
handler) but leaving the case of exit from a global dtor undefined.

There may be other nasty surprises in this area too that I haven't
yet thought of, but this is the main one that comes to mind so far.

Rich
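
For reference, the _exit/_Exit approach described above could be
sketched roughly as follows. This is only a minimal illustration under
stated assumptions, not gnulib's actual code: cleanup_failed,
check_cleanup, and child_exit_status are hypothetical names, and the
fork/waitpid wrapper (POSIX only) exists purely so that the resulting
exit status can be observed from a parent process.

```c
#include <assert.h>     /* for callers that want to assert on the result */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical flag standing in for "a write error was detected
 * during cleanup". */
static int cleanup_failed;

/* atexit handler: override the exit status without calling exit()
 * again, since a recursive exit() is undefined. */
static void check_cleanup(void)
{
    if (cleanup_failed)
        _exit(EXIT_FAILURE);  /* terminates immediately: no further
                                 handlers run, stdio is not flushed */
}

/* Hypothetical helper: run the scenario in a child process and return
 * the child's exit status, or -1 if it could not be determined. */
int child_exit_status(int simulate_failure)
{
    pid_t pid;
    int status;

    fflush(NULL);             /* avoid duplicating buffered output in the child */
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        cleanup_failed = simulate_failure;
        if (atexit(check_cleanup) != 0)
            _exit(2);         /* handler registration failed */
        exit(EXIT_SUCCESS);   /* requested status; the handler may override it */
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With this sketch, child_exit_status(1) should report EXIT_FAILURE even
though the child called exit(EXIT_SUCCESS), and child_exit_status(0)
should report EXIT_SUCCESS. Since _exit terminates right away, any
handlers registered before check_cleanup (which would normally run
after it) are never reached, which is the drawback noted above.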