|
Message-ID: <b2628dbba894de61664271a42d320090@ispras.ru> Date: Tue, 04 Oct 2022 13:18:31 +0300 From: Alexey Izbyshev <izbyshev@...ras.ru> To: musl@...ts.openwall.com Cc: musl@...ts.openwall.com Subject: Re: Illegal killlock skipping when transitioning to single-threaded state On 2022-10-04 07:59, Rich Felker wrote: > On Mon, Oct 03, 2022 at 11:00:21PM -0400, Rich Felker wrote: >> On Mon, Oct 03, 2022 at 10:58:32PM -0400, Rich Felker wrote: >> > On Mon, Oct 03, 2022 at 11:27:05PM +0200, Szabolcs Nagy wrote: >> > > * Szabolcs Nagy <nsz@...t70.net> [2022-10-03 15:26:15 +0200]: >> > > >> > > > * Alexey Izbyshev <izbyshev@...ras.ru> [2022-10-03 09:16:03 +0300]: >> > > > > On 2022-09-19 18:29, Rich Felker wrote: >> > > > > > On Wed, Sep 07, 2022 at 03:46:53AM +0300, Alexey Izbyshev wrote: >> > > > ... >> > > > > > > Reordering the "libc.need_locks = -1" assignment and >> > > > > > > UNLOCK(E->killlock) and providing a store barrier between them >> > > > > > > should fix the issue. >> > > > > > >> > > > > > I think this all sounds correct. I'm not sure what you mean by a store >> > > > > > barrier between them, since all lock and unlock operations are already >> > > > > > full barriers. >> > > > > > >> > > > > >> > > > > Before sending the report I tried to infer the intended ordering semantics >> > > > > of LOCK/UNLOCK by looking at their implementations. For AArch64, I didn't >> > > > > see why they would provide a full barrier (my reasoning is below), so I >> > > > > concluded that probably acquire/release semantics was intended in general >> > > > > and suggested an extra store barrier to prevent hoisting of "libc.need_locks >> > > > > = -1" store spelled after UNLOCK(E->killlock) back into the critical >> > > > > section. >> > > > > >> > > > > UNLOCK is implemented via a_fetch_add(). On AArch64, it is a simple >> > > > > a_ll()/a_sc() loop without extra barriers, and a_ll()/a_sc() are implemented >> > > > > via load-acquire/store-release instructions. Therefore, if we consider a >> > > > > LOCK/UNLOCK critical section containing only plain loads and stores, (a) any >> > > > > such memory access can be reordered with the initial ldaxr in UNLOCK, and >> > > > > (b) any plain load following UNLOCK can be reordered with stlxr (assuming >> > > > > the processor predicts that stlxr succeeds), and further, due to (a), with >> > > > > any memory access inside the critical section. Therefore, UNLOCK is not full >> > > > > barrier. Is this right? >> > > > >> > > > i dont think this is right. >> > > >> > > >> > > i think i was wrong and you are right. >> > > >> > > so with your suggested swap of UNLOCK(killlock) and need_locks=-1 and >> > > starting with 'something == 0' the exiting E and remaining R threads: >> > > >> > > E:something=1 // protected by killlock >> > > E:UNLOCK(killlock) >> > > E:need_locks=-1 >> > > >> > > R:LOCK(unrelated) // reads need_locks == -1 >> > > R:need_locks=0 >> > > R:UNLOCK(unrelated) >> > > R:LOCK(killlock) // does not lock >> > > R:read something // can it be 0 ? >> > > >> > > and here something can be 0 (ie. not protected by killlock) on aarch64 >> > > because >> > > >> > > T1 >> > > something=1 >> > > ldaxr ... killlock >> > > stlxr ... killlock >> > > need_locks=-1 >> > > >> > > T2 >> > > x=need_locks >> > > ldaxr ... unrelated >> > > stlxr ... unrelated >> > > y=something >> > > >> > > can end with x==-1 and y==0. >> > > >> > > and to fix it, both a_fetch_add and a_cas need an a_barrier. >> > > >> > > i need to think how to support such lock usage on aarch64 >> > > without adding too many dmb. >> > >> > OK, after reading a lot more, I think I'm starting to get what you're >> > saying. Am I correct in my understanding that the problem is that the >> > "R:LOCK(unrelated)" as implemented does not synchronize with the >> > "E:UNLOCK(killlock)" because they're different objects? >> > >> > If so, I think this would be fully solved by using __tl_sync in the >> > code path that resets need_locks to 0 after observing -1, by virtue of >> > providing a common object (the thread list lock) to synchronize on. >> > This is the "weaker memory model friendly" approach we should probably >> > strive to achieve some day. >> > >> > However, all existing code in musl is written assuming what I call the >> > "POSIX memory model" where the only operation is "synchronizes memory" >> > and that underspecified phrase has to be interpreted as "is a full >> > barrier" to admit any consistent model. Said differently, the code was >> > written assuming every a_* synchronizes with every other a_*, without >> > any regard for whether they act on the same objects. This likely even >> > matters for how our waiter accounting works (which is probably a good >> > argument for removing it and switching to waiter flags). So I think, >> > if the issue as I understand it now exists, we do need to fix it. Then >> > we can revisit this at some later time as part of a big project. >> >> Forgot, I should include links to the material I've been reading: >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697 >> https://gcc.gnu.org/git?p=gcc.git;a=commitdiff;h=f70fb3b635f9618c6d2ee3848ba836914f7951c2 >> https://gcc.gnu.org/git?p=gcc.git;a=commitdiff;h=ab876106eb689947cdd8203f8ecc6e8ac38bf5ba >> >> which is where the GCC folks seem to have encountered and fixed their >> corresponding issue. Thank you for the links. This comment[1] addresses my doubts on whether a store following (the current implementation of) UNLOCK can become visible before stlxr despite the conditional branch instruction in between (yes, it can). [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65697#c23 > > Proposed patch attached. > The patch looks good to me. Alexey
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.