Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221003132615.GF2158779@port70.net>
Date: Mon, 3 Oct 2022 15:26:15 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Cc: Alexey Izbyshev <izbyshev@...ras.ru>
Subject: Re: Illegal killlock skipping when transitioning to
 single-threaded state

* Alexey Izbyshev <izbyshev@...ras.ru> [2022-10-03 09:16:03 +0300]:
> On 2022-09-19 18:29, Rich Felker wrote:
> > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
...
> > > Reordering the "libc.need_locks = -1" assignment and
> > > UNLOCK(E->killlock) and providing a store barrier between them
> > > should fix the issue.
> > 
> > I think this all sounds correct. I'm not sure what you mean by a store
> > barrier between them, since all lock and unlock operations are already
> > full barriers.
> > 
> 
> Before sending the report I tried to infer the intended ordering semantics
> of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't
> see why they would provide a full barrier (my reasoning is below), so I
> concluded that probably acquire/release semantics was intended in general
> and suggested an extra store barrier to prevent hoisting of "libc.need_locks
> = -1" store spelled after UNLOCK(E->killlock) back into the critical
> section.
> 
> UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple
> a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented
> via load-acquire/store-release instructions. Therefore, if we consider a
> LOCK/UNLOCK critical section containing only plain loads and stores, (a) any
> such memory access can be reordered with the initial ldaxr in UNLOCK, and
> (b) any plain load following UNLOCK can be reordered with stlxr (assuming
> the processor predicts that stlxr succeeds), and further, due to (a), with
> any memory access inside the critical section. Therefore, UNLOCK is not full
> barrier. Is this right?

i dont think this is right.

memory operations in the critical section cannot visibly happen after the
final stlxr.

memory operations after the critical section cannot visibly happen before
the ldaxr of UNLOCK.

the only issue can be inside the ll/sc loop in UNLOCK if there are independent
memory operations there, but there arent.

> 
> As for a store following UNLOCK, I'm not sure. A plain store following stlxr
> can be reordered with it, but here that store is conditional on stlxr
> success. So even if the processor predicts stlxr success, it can't make the
> following store visible before it's sure that stlxr succeeded. But I don't
> know whether the events "the processor knows that stlxr succeeded" and "the
> result of stlxr is globally visible" are separate or not, and if they are
> separate, what comes first. Depending on the answer, UNLOCK acts as a store
> barrier or not.
> 

UNLOCK on aarch64 acts as a full seqcst barrier as far as i can see.

but looking into the arch implementations needs a detailed understanding
of the arch memory model (eg aarch64 stlxr is RCsc not RCpc like iso c
release store), but there is no need for that: the musl model is
simple seqcst synchronization everywhere.

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.