|
Message-ID: <2e77a700561a059e85daad5311306cfb@ispras.ru>
Date: Tue, 04 Oct 2022 16:50:00 +0300
From: Alexey Izbyshev <izbyshev@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: Illegal killlock skipping when transitioning to single-threaded state

On 2022-10-04 02:05, Rich Felker wrote:
> On Mon, Oct 03, 2022 at 06:54:17PM -0400, Rich Felker wrote:
>> On Mon, Oct 03, 2022 at 11:27:05PM +0200, Szabolcs Nagy wrote:
>> > * Szabolcs Nagy <nsz@...t70.net> [2022-10-03 15:26:15 +0200]:
>> >
>> > > * Alexey Izbyshev <izbyshev@...ras.ru> [2022-10-03 09:16:03 +0300]:
>> > > > On 2022-09-19 18:29, Rich Felker wrote:
>> > > > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote:
>> > > ...
>> > > > > > Reordering the "libc.need_locks = -1" assignment and
>> > > > > > UNLOCK(E->killlock) and providing a store barrier between them
>> > > > > > should fix the issue.
>> > > > >
>> > > > > I think this all sounds correct. I'm not sure what you mean by a store
>> > > > > barrier between them, since all lock and unlock operations are already
>> > > > > full barriers.
>> > > > >
>> > > >
>> > > > Before sending the report I tried to infer the intended ordering semantics
>> > > > of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't
>> > > > see why they would provide a full barrier (my reasoning is below), so I
>> > > > concluded that probably acquire/release semantics was intended in general
>> > > > and suggested an extra store barrier to prevent hoisting of "libc.need_locks
>> > > > = -1" store spelled after UNLOCK(E->killlock) back into the critical
>> > > > section.
>> > > >
>> > > > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple
>> > > > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented
>> > > > via load-acquire/store-release instructions. Therefore, if we consider a
>> > > > LOCK/UNLOCK critical section containing only plain loads and stores, (a) any
>> > > > such memory access can be reordered with the initial ldaxr in UNLOCK, and
>> > > > (b) any plain load following UNLOCK can be reordered with stlxr (assuming
>> > > > the processor predicts that stlxr succeeds), and further, due to (a), with
>> > > > any memory access inside the critical section. Therefore, UNLOCK is not a full
>> > > > barrier. Is this right?
>> > >
>> > > i dont think this is right.
>> >
>> >
>> > i think i was wrong and you are right.
>> >
>> > so with your suggested swap of UNLOCK(killlock) and need_locks=-1 and
>> > starting with 'something == 0' the exiting E and remaining R threads:
>> >
>> > E:something=1      // protected by killlock
>> > E:UNLOCK(killlock)
>> > E:need_locks=-1
>> >
>> > R:LOCK(unrelated)  // reads need_locks == -1
>> > R:need_locks=0
>> > R:UNLOCK(unrelated)
>> > R:LOCK(killlock)   // does not lock
>> > R:read something   // can it be 0 ?
>> >
>> > and here something can be 0 (ie. not protected by killlock) on aarch64
>> > because
>> >
>> > T1
>> > something=1
>> > ldaxr ... killlock
>> > stlxr ... killlock
>> > need_locks=-1
>> >
>> > T2
>> > x=need_locks
>> > ldaxr ... unrelated
>> > stlxr ... unrelated
>> > y=something
>> >
>> > can end with x==-1 and y==0.
>> >
>> > and to fix it, both a_fetch_add and a_cas need an a_barrier.
>> >
>> > i need to think how to support such lock usage on aarch64
>> > without adding too many dmb.
>>
>> I don't really understand this, but FWIW gcc emits
>>
>> ldxr
>> ...
>> stlxr
>> ...
>> dmb ish
>>
>> for __sync_val_compare_and_swap. So this is probably the right thing
>> we should have.
>> And it seems to match what the kernel folks discussed
>> here:
>>
>> http://lists.infradead.org/pipermail/linux-arm-kernel/2014-February/229588.html
>>
>> I wondered if there are similar issues for any other archs which need
>> review, but it looks like all the other llsc archs have explicit
>> pre/post barriers defined.
>
> Actually I don't understand what's going on with cmpxchg there. The
> patch I linked has it using ldxr/stxr (not stlxr) for cmpxchg. There's
> some follow-up in the thread I don't understand, about the case where
> the cas fails, but we already handle that by doing an explicit barrier
> in that case.
>
I think in that follow-up[1] they mean the following case (in musl terms):

volatile int x, flag;

T1:
    x = 1;
    a_store(&flag, 1);

T2:
    while (!flag);
    a_cas(&x, 0, 1); // can this fail?

They want it to never fail. But if a_cas() is implemented as
ldxr/stlxr/dmb, this is not guaranteed because ldxr can be reordered
with the load of flag.

Note that musl does *not* handle this now, because a_barrier() in the
failure path is after a_ll().

[1] https://lists.infradead.org/pipermail/linux-arm-kernel/2014-February/229693.html

Alexey
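
For concreteness, the T1/T2 case above can be written as a small
self-contained program whose generated AArch64 code can be inspected
(e.g. with gcc -O2 -S). This is only an illustrative sketch, not musl
code: the thread functions and their names are made up here, a_store is
replaced by a seq_cst __atomic_store_n as a stand-in, and a_cas is
replaced by __sync_val_compare_and_swap, the builtin mentioned earlier
in the thread.

/* Illustrative sketch of the scenario discussed above; not musl code. */
#include <pthread.h>
#include <stdio.h>

static volatile int x, flag;

static void *t1(void *arg)
{
    (void)arg;
    x = 1;
    /* stand-in for a_store(&flag, 1) */
    __atomic_store_n(&flag, 1, __ATOMIC_SEQ_CST);
    return NULL;
}

static void *t2(void *arg)
{
    (void)arg;
    while (!flag);   /* spin until T1's store to flag is observed */
    /* Stand-in for a_cas(&x, 0, 1): after flag == 1 has been observed,
     * a cas with full-barrier semantics must observe x == 1 and return 1.
     * The question raised in the kernel thread is whether an ll/sc loop
     * with only a trailing dmb can instead read a stale x == 0 here. */
    int old = __sync_val_compare_and_swap(&x, 0, 1);
    printf("old = %d\n", old);   /* 1 if the ordering holds */
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}

With a typical GCC for AArch64, the builtin in t2 should compile to an
ll/sc loop followed by dmb ish, i.e. the ldxr/stlxr/dmb ish sequence
quoted above; a single run of the program proves nothing, it only makes
the ordering question concrete.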