Message-ID: <20221005143730.GT29905@brightrain.aerifal.cx>
Date: Wed, 5 Oct 2022 10:37:32 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Illegal killlock skipping when transitioning to
 single-threaded state

On Wed, Oct 05, 2022 at 10:03:03AM -0400, Rich Felker wrote:
> On Wed, Oct 05, 2022 at 03:10:09PM +0300, Alexey Izbyshev wrote:
> > On 2022-10-05 04:00, Rich Felker wrote:
> > >On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
> > >>Reordering the "libc.need_locks = -1" assignment and
> > >>UNLOCK(E->killlock) and providing a store barrier between them
> > >>should fix the issue.
> > >
> > >Back to this, because it's immediately actionable without resolving
> > >the aarch64 atomics issue:
> > >
> > >Do you have something in mind for how this reordering is done, since
> > >there are other intervening steps that are potentially ordered with
> > >respect to either or both? I don't think there is actually any
> > >ordering constraint at all on the unlocking of killlock (with the
> > >accompanying assignment self->tid=0 kept with it) except that it be
> > >past the point where we are committed to the thread terminating
> > >without executing any more application code. So my leaning would be to
> > >move this block from the end of pthread_exit up to right after the
> > >point-of-no-return comment.
> > >
> > This was my conclusion as well back when I looked at it before
> > sending the report.
> > 
> > I was initially concerned about whether reordering with
> > a_store(&self->detach_state, DT_EXITED) could cause an unwanted
> > observable change (pthread_tryjoin_np() returning EBUSY after a
> > pthread function acting on tid like pthread_getschedparam() returns
> > ESRCH), but no, pthread_tryjoin_np() will block/trap if the thread
> > is not DT_JOINABLE.
> > 
> > >Unfortunately while reading this I found another bug, this time a lock
> > >order one. __dl_thread_cleanup() takes a lock while the thread list
> > >lock is already held, but fork takes these in the opposite order. I
> > >think the lock here could be dropped and replaced with an atomic-cas
> > >list head, but that's rather messy and I'm open to other ideas.
> > >
> > I'm not sure why using a lock-free list is messy, it seems like a
> > perfect fit here to me.
> 
> Just in general I've tried to reduce the direct use of atomics and use
> high-level primitives, because (as this thread is evidence of) I find
> the reasoning about direct use of atomics and their correctness to be
> difficult and inaccessible to a lot of people who would otherwise be
> successful readers of the code. But you're right that it's a "good
> match" for the problem at hand.
> 
> > However, doesn't __dl_vseterr() use the libc-internal allocator
> > after 34952fe5de44a833370cbe87b63fb8eec61466d7? If so, the problem
> > that freebuf_queue was originally solving doesn't exist anymore. We
> > still can't call the allocator after __tl_lock(), but maybe this
> > whole free deferral approach can be reconsidered?
> 
> I almost made that change when the MT-fork changes were done, but
> didn't because it was wrong. I'm not sure if I documented this
> anywhere (it might be in mail threads related to that or IRC) but it
> was probably because it would need to take malloc locks with the
> thread list lock held, which isn't allowed.
> 
> It would be nice if we could get rid of the deferred freeing here, but
> I don't see a good way. The reason we can't free the buffer until
> after the thread list lock is taken is that it's only freeable if this
> isn't the last exiting thread. If it is the last exiting thread, the
> buffer contents still need to be present for the atexit handlers to
> see. And whether this is the last exiting thread is only
> stable/determinate as long as the thread list lock is held.

Proposed patch with atomic list attached, along with a stupid test
program (to be run under a debugger to see anything happening).
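
For readers without the attachment, the atomic-cas list head idea discussed
above can be sketched roughly as follows. This is not the attached patch: it
uses C11 <stdatomic.h> rather than musl's internal atomic primitives, and the
names deferred_buf, freebuf_head, defer_free and drain_deferred are
hypothetical. It assumes each deferred buffer is large enough that its first
word can be reused as the list link.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node layout: the start of each deferred buffer
 * is reused as the link to the next deferred buffer. */
struct deferred_buf {
    struct deferred_buf *next;
};

/* Head of the deferred-free list; updated only with atomic operations. */
static _Atomic(struct deferred_buf *) freebuf_head = NULL;

/* Push a buffer onto the list without taking any lock, so it can be
 * called while another lock (e.g. a thread list lock) is held. */
static void defer_free(struct deferred_buf *buf)
{
    struct deferred_buf *old =
        atomic_load_explicit(&freebuf_head, memory_order_relaxed);
    do {
        /* On CAS failure, old is reloaded with the current head. */
        buf->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &freebuf_head, &old, buf,
                 memory_order_release, memory_order_relaxed));
}

/* Detach the whole list in one atomic exchange, then free every node.
 * Must be called from a context where calling free() is allowed.
 * Returns the number of buffers freed. */
static size_t drain_deferred(void)
{
    struct deferred_buf *p =
        atomic_exchange_explicit(&freebuf_head, NULL, memory_order_acquire);
    size_t n = 0;
    while (p) {
        struct deferred_buf *next = p->next;
        free(p);
        p = next;
        n++;
    }
    return n;
}
```

Taking the entire list with a single exchange, instead of popping nodes one
at a time, sidesteps the ABA problem that a CAS-based pop would have to
handle.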

Rich

View attachment "dlerror_free.c" of type "text/plain" (646 bytes)

View attachment "dlerror_free.diff" of type "text/plain" (2761 bytes)
