Message-ID: <20250528001332.GR1827@brightrain.aerifal.cx>
Date: Tue, 27 May 2025 20:13:33 -0400
From: Rich Felker <dalias@...c.org>
To: Markus Wichmann <nullplan@....net>
Cc: musl@...ts.openwall.com
Subject: Re: Deadlock in dynamic linker?

On Tue, May 27, 2025 at 02:26:15PM -0400, Rich Felker wrote:
> On Tue, May 27, 2025 at 06:59:12PM +0200, Markus Wichmann wrote:
> > Am Tue, May 27, 2025 at 11:20:07AM -0400 schrieb Rich Felker:
> > > On Sat, May 24, 2025 at 07:45:45AM +0200, Markus Wichmann wrote:
> > > > I'm thinking something like this: Thread A initializes liba.so. liba.so
> > > > has initializers and finalizers, so thread A adds liba.so to the fini
> > > > list before calling the initializers. The liba initializer calls
> > > > dlopen("libb.so"). libb.so also has initializers.
> > > > 
> > > > While thread A is not holding the init_fini_lock, thread B calls exit().
> > > > That progresses until __libc_exit_fini() sets shutting_down to 1. Then
> > > > it tries to destroy all the libraries, but the loop stops when it comes
> > > > to liba.
> > > > 
> > > > liba.so has a ctor_visitor, namely thread A, so thread B cannot advance.
> > > > Thread A meanwhile is hanging in the infinite wait loop trying to
> > > > initialize libb.so. The situation cannot change, and the process hangs
> > > > indefinitely.
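
A minimal reproducer along these lines might look as follows. This is
only a sketch of the scenario described above: the library names and
paths are illustrative, and whether the deadlock actually triggers
depends on the scheduler hitting the window where thread B's exit()
runs while liba.so's ctor is still in progress.

/* liba.c -- build as liba.so */
#include <dlfcn.h>
#include <unistd.h>

__attribute__((constructor))
static void liba_init(void)
{
    /* Widen the window between liba.so going on the fini list
     * and its ctor returning. */
    sleep(1);
    /* If thread B's exit() has set shutting_down by the time the
     * dlopen'd library's init runs, this never returns. */
    dlopen("libb.so", RTLD_NOW);
}

/* libb.c -- build as libb.so; only needs a constructor */
__attribute__((constructor))
static void libb_init(void)
{
}

/* main.c -- thread A dlopens liba.so, thread B (main) calls exit() */
#include <dlfcn.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static void *opener(void *arg)
{
    dlopen("liba.so", RTLD_NOW);
    return 0;
}

int main(void)
{
    pthread_t a;
    pthread_create(&a, 0, opener, 0);
    /* Let thread A get into liba.so's ctor, then exit from here
     * while that ctor is still running. */
    usleep(500000);
    exit(0);
}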
> > > 
> > > I see. In particular you're assuming the dlopen of libb happened after
> > > the exit started.
> > > 
> > 
> > I had completely neglected to look at the global ldso lock, actually.
> > But looking at it again, I am actually assuming that the dlopen() is
> > *starting* before the __libc_exit_fini() (so that thread B hangs waiting
> > for the lock), but that thread B then overtakes thread A between the
> > latter's release of the global lock and the taking of the init_fini_lock.
> > 
> > This does mean that taking the init_fini_lock before releasing the
> > global lock would entirely prevent the issue. Not sure if that's
> > acceptable, though.
> > 
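Roughly, in code terms, the window and the suggested reordering in the
dlopen path look like this (a sketch using the lock names from this
thread, not the literal source; the global-lock release is written as
a placeholder helper):

    /* tail of dlopen("libb.so") in thread A, current ordering */
    /* ... libb.so loaded and relocated under the global lock ... */
    unlock_global_ldso_lock();
    /* <-- window: thread B's __libc_exit_fini() can get in here,
     *     set shutting_down, and start waiting on liba.so's
     *     ctor_visitor (thread A) */
    pthread_mutex_lock(&init_fini_lock);
    /* do_init_fini(): sees shutting_down, waits forever */

    /* suggested ordering: take init_fini_lock first, so exit()
     * cannot slip in between */
    pthread_mutex_lock(&init_fini_lock);
    unlock_global_ldso_lock();
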
> > > > A simple way out of this pickle could be to add liba.so to the fini list
> > > > only after it was initialized. That way, thread B cannot hang on it, or
> > > > more generally, the finalizing thread cannot be halted by an incomplete
> > > > initialization in another thread. This might change the order of nodes
> > > > on the fini list, but only to account for dynamic dependencies. Isn't
> > > > that a good thing?
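
In code terms the suggestion amounts to moving the fini-list insertion
in do_init_fini() from before the ctor calls to after them (a sketch;
add_to_fini_list() and run_ctors() are illustrative stand-ins, not
actual function names):

    /* current (sketch): queued for finalization before ctors run */
    add_to_fini_list(p);
    run_ctors(p);            /* may dlopen() and block */
    p->constructed = 1;

    /* proposed: reachable from __libc_exit_fini() only once fully
     * constructed, so the exiting thread never waits on a
     * half-built DSO */
    run_ctors(p);
    p->constructed = 1;
    add_to_fini_list(p);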
> > > 
> > > No, I think it's non-conforming, and also unsafe, as it can result in
> > > failure to run a dtor for something whose ctor already ran but did not
> > > finish. This is a worse outcome than a deadlock in a situation that's
> > > arguably undefined to begin with.
> > > 
> > 
> > But __libc_exit_fini() refuses to destroy libraries that haven't been
> > constructed completely. If p->constructed is zero, a node is skipped
> > even if it is on the fini list. And that flag is set in do_init_fini()
> > only after all constructors have returned.
> 
> p->constructed being zero can only happen and mean "incompletely
> p->constructed" in the case where visitor is self (call to exit from
> p->your own ctor). It's not a condition you can encounter. from
> p->concurrency, since in that case you would not get past the condvar
> p->wait due to there being a visitor.

Ugh, something went wrong in mailing (mail-mode being really unhelpful
and messing up fill-paragraph behavior). This should read:

p->constructed being zero can only happen and mean "incompletely
constructed" in the case where visitor is self (call to exit from your
own ctor). It's not a condition you can encounter from concurrency,
since in that case you would not get past the condvar wait due to
there being a visitor.
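
For reference, the logic in question is roughly this shape (a
simplified sketch using the field, lock, and condvar names from this
thread, not the literal source; run_ctors()/run_dtors() are
illustrative stand-ins):

    /* do_init_fini(), as run by the thread doing the dlopen: */
    pthread_mutex_lock(&init_fini_lock);
    while ((p->ctor_visitor && p->ctor_visitor != self) || shutting_down)
        pthread_cond_wait(&ctor_cond, &init_fini_lock);
    p->ctor_visitor = self;
    pthread_mutex_unlock(&init_fini_lock);

    run_ctors(p);                   /* may itself call exit() */

    pthread_mutex_lock(&init_fini_lock);
    p->ctor_visitor = 0;
    p->constructed = 1;             /* set only after ctors return */
    pthread_cond_broadcast(&ctor_cond);
    pthread_mutex_unlock(&init_fini_lock);

    /* __libc_exit_fini(), as run by the exiting thread: */
    pthread_mutex_lock(&init_fini_lock);
    shutting_down = 1;              /* blocks any further ctor runs */
    for (p = fini_head; p; p = p->fini_next) {
        while (p->ctor_visitor && p->ctor_visitor != self)
            pthread_cond_wait(&ctor_cond, &init_fini_lock);
        if (!p->constructed) continue;  /* only reachable when the
                                         * visitor was self, i.e.
                                         * exit() from p's own ctor */
        run_dtors(p);
    }
    pthread_mutex_unlock(&init_fini_lock);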
