musl - Re: Illegal killlock skipping when transitioning to single-threaded state

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221004025831.GI29905@brightrain.aerifal.cx>
Date: Mon, 3 Oct 2022 22:58:32 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com, Alexey Izbyshev <izbyshev@...ras.ru>
Subject: Re: Illegal killlock skipping when transitioning to
 single-threaded state

On Mon, Oct 03, 2022 at 11:27:05PM +0200, Szabolcs Nagy wrote:
> * Szabolcs Nagy <nsz@...t70.net> [2022-10-03 15:26:15 +0200]:
> 
> > * Alexey Izbyshev <izbyshev@...ras.ru> [2022-10-03 09:16:03 +0300]:
> > > On 2022-09-19 18:29, Rich Felker wrote:
> > > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
> > ...
> > > > > Reordering the "libc.need_locks = -1" assignment and
> > > > > UNLOCK(E->killlock) and providing a store barrier between them
> > > > > should fix the issue.
> > > > 
> > > > I think this all sounds correct. I'm not sure what you mean by a store
> > > > barrier between them, since all lock and unlock operations are already
> > > > full barriers.
> > > > 
> > > 
> > > Before sending the report I tried to infer the intended ordering semantics
> > > of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't
> > > see why they would provide a full barrier (my reasoning is below), so I
> > > concluded that probably acquire/release semantics was intended in general
> > > and suggested an extra store barrier to prevent hoisting of "libc.need_locks
> > > = -1" store spelled after UNLOCK(E->killlock) back into the critical
> > > section.
> > > 
> > > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple
> > > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented
> > > via load-acquire/store-release instructions. Therefore, if we consider a
> > > LOCK/UNLOCK critical section containing only plain loads and stores, (a) any
> > > such memory access can be reordered with the initial ldaxr in UNLOCK, and
> > > (b) any plain load following UNLOCK can be reordered with stlxr (assuming
> > > the processor predicts that stlxr succeeds), and further, due to (a), with
> > > any memory access inside the critical section. Therefore, UNLOCK is not full
> > > barrier. Is this right?
> > 
> > i dont think this is right.
> 
> 
> i think i was wrong and you are right.
> 
> so with your suggested swap of UNLOCK(killlock) and need_locks=-1 and
> starting with 'something == 0' the exiting E and remaining R threads:
> 
> E:something=1      // protected by killlock
> E:UNLOCK(killlock)
> E:need_locks=-1
> 
> R:LOCK(unrelated)  // reads need_locks == -1
> R:need_locks=0
> R:UNLOCK(unrelated)
> R:LOCK(killlock)   // does not lock
> R:read something   // can it be 0 ?
> 
> and here something can be 0 (ie. not protected by killlock) on aarch64
> because
> 
> T1
> 	something=1
> 	ldaxr ... killlock
> 	stlxr ... killlock
> 	need_locks=-1
> 
> T2
> 	x=need_locks
> 	ldaxr ... unrelated
> 	stlxr ... unrelated
> 	y=something
> 
> can end with x==-1 and y==0.
> 
> and to fix it, both a_fetch_add and a_cas need an a_barrier.
> 
> i need to think how to support such lock usage on aarch64
> without adding too many dmb.

OK, after reading a lot more, I think I'm starting to get what you're
saying. Am I correct in my understanding that the problem is that the
"R:LOCK(unrelated)" as implemented does not synchronize with the
"E:UNLOCK(killlock)" because they're different objects?

If so, I think this would be fully solved by using __tl_sync in the
code path that resets need_locks to 0 after observing -1, by virtue of
providing a common object (the thread list lock) to synchronize on.
This is the "weaker memory model friendly" approach we should probably
strive to achieve some day.

However, all existing code in musl is written assuming what I call the
"POSIX memory model" where the only operation is "synchronizes memory"
and that underspecified phrase has to be interpreted as "is a full
barrier" to admit any consistent model. Said differently, the code was
written assuming every a_* synchronizes with every other a_*, without
any regard for whether they act on the same objects. This likely even
matters for how our waiter accounting works (which is probably a good
argument for removing it and switching to waiter flags). So I think,
if the issue as I understand it now exists, we do need to fix it. Then
we can revisit this at some later time as part of a big project.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.