Message-ID: <bd7c341816819554877e452643d1f698@ispras.ru>
Date: Tue, 04 Oct 2022 08:16:16 +0300
From: Alexey Izbyshev <izbyshev@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: Illegal killlock skipping when transitioning to single-threaded state

On 2022-10-04 00:27, Szabolcs Nagy wrote:
> * Szabolcs Nagy <nsz@...t70.net> [2022-10-03 15:26:15 +0200]:
>
>> * Alexey Izbyshev <izbyshev@...ras.ru> [2022-10-03 09:16:03 +0300]:
>> > On 2022-09-19 18:29, Rich Felker wrote:
>> > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
>> ...
>> > > > Reordering the "libc.need_locks = -1" assignment and
>> > > > UNLOCK(E->killlock) and providing a store barrier between them
>> > > > should fix the issue.
>> > >
>> > > I think this all sounds correct. I'm not sure what you mean by a store
>> > > barrier between them, since all lock and unlock operations are already
>> > > full barriers.
>> > >
>> >
>> > Before sending the report I tried to infer the intended ordering semantics
>> > of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't
>> > see why they would provide a full barrier (my reasoning is below), so I
>> > concluded that probably acquire/release semantics was intended in general
>> > and suggested an extra store barrier to prevent hoisting of "libc.need_locks
>> > = -1" store spelled after UNLOCK(E->killlock) back into the critical
>> > section.
>> >
>> > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple
>> > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented
>> > via load-acquire/store-release instructions.
>> > Therefore, if we consider a LOCK/UNLOCK critical section containing only
>> > plain loads and stores, (a) any such memory access can be reordered with
>> > the initial ldaxr in UNLOCK, and (b) any plain load following UNLOCK can
>> > be reordered with stlxr (assuming the processor predicts that stlxr
>> > succeeds), and further, due to (a), with any memory access inside the
>> > critical section. Therefore, UNLOCK is not full barrier. Is this right?
>>
>> i dont think this is right.
>
>
> i think i was wrong and you are right.
>
> so with your suggested swap of UNLOCK(killlock) and need_locks=-1 and
> starting with 'something == 0' the exiting E and remaining R threads:
>
> E:something=1      // protected by killlock
> E:UNLOCK(killlock)
> E:need_locks=-1
>
> R:LOCK(unrelated)  // reads need_locks == -1
> R:need_locks=0
> R:UNLOCK(unrelated)
> R:LOCK(killlock)   // does not lock
> R:read something   // can it be 0 ?
>
> and here something can be 0 (ie. not protected by killlock) on aarch64
> because
>
> T1
> something=1
> ldaxr ... killlock
> stlxr ... killlock
> need_locks=-1
>
> T2
> x=need_locks
> ldaxr ... unrelated
> stlxr ... unrelated
> y=something
>
> can end with x==-1 and y==0.
>

Yes, overall this matches my understanding. However, your UNLOCK expansion
(in T1/T2) omits the branch instruction between stlxr and the following
store, and, as I mentioned, I'm not sufficiently knowledgeable to understand
the effects of this branch on the order of visibility of "stlxr killlock"
(and preceding stores) and "need_locks=-1".

> and to fix it, both a_fetch_add and a_cas need an a_barrier.
>
> i need to think how to support such lock usage on aarch64
> without adding too many dmb.
>

I'd also like to point out that our discussion so far has been focused on
reordering of "something" protected by the critical section and
"need_locks=-1", but my original bug report also mentioned the possibility
of lock corruption, and for the latter we're also interested in reordering
of "stlxr" and "need_locks=-1". It's possible that some UNLOCK
implementations can provide the first guarantee, but not the second.

>
>> memory operations in the critical section cannot visibly happen after
>> the final stlxr.
>>
>> memory operations after the critical section cannot visibly happen
>> before the ldaxr of UNLOCK.
>>
>> the only issue can be inside the ll/sc loop in UNLOCK if there are
>> independent memory operations there, but there arent.
>>

You've already said this is wrong, but I want to spell out why for any
interested parties, to hopefully clear some confusion.

Both premises (about stlxr and ldaxr) are correct, but the third point
doesn't follow and is wrong. Because ldaxr/stlxr act as one-way barriers,
the following pseudo-code

r1 = X
ldaxr LOCK
stlxr LOCK
r2 = Y

can be observed, in my understanding of the AArch64 memory model, as

ldaxr LOCK
r1 = X
r2 = Y
stlxr LOCK

and even as

ldaxr LOCK
r2 = Y
r1 = X
stlxr LOCK

The same is true for stores as well. So effectively the loop in UNLOCK can
absorb memory operations from both directions, which then can be reordered
with each other (with a possible exception of stores that follow UNLOCK, as
I said above).

Thanks,
Alexey