Message-ID: <20150728145821.GT16376@brightrain.aerifal.cx>
Date: Tue, 28 Jul 2015 10:58:21 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: What's left for 1.1.11 release?

On Tue, Jul 28, 2015 at 04:50:33PM +0200, Jens Gustedt wrote:
> Am Dienstag, den 28.07.2015, 10:18 -0400 schrieb Rich Felker:
> > On Tue, Jul 28, 2015 at 04:09:38PM +0200, Jens Gustedt wrote:
> > > Hello,
> > >
> > > Am Montag, den 27.07.2015, 23:40 -0400 schrieb Rich Felker:
> > > > In principle the a_store issue affects all libc-internal __lock/LOCK
> > > > uses,
> > >
> > > so this worries me since I assumed that UNLOCK had release consistency
> > > for the __atomic implementation.
> >
> > It does. The problem is that it lacks acquire consistency, which we
> > need in order to know whether to wake.
>
> ah, I think we are speaking of different things here. I want release
> consistency for the lock operation, in the sense to be guaranteed that
> all threads that are waiting for the lock will eventually know that it
> has been released. So you are telling me, that the current version
> doesn't warrant this?

This is no problem; you get it for free on x86, and it's properly
achieved with explicit barriers on all other archs.

> The operation for which you need acquire consistency, is in fact the
> load of l[1]. Somehow the current approach is ambiguous to which is
> the atomic object. Is it l[0], is it l[1] or is it the pair of them?

l[0] is the lock word. l[1] is the waiters count and while it's
modified atomically, the read is relaxed-order. Contrary to my
expectations, real-world x86 chips will actually reorder the read of
l[1] before the store to l[0], resulting in a failure-to-wake
deadlock.

> > > > and stdio locks too, but it's only been observed in malloc.
> > > > Since there don't seem to be any performance-relevant uses of a_store
> > > > that don't actually need the proper barrier, I think we have to just
> > > > put an explicit barrier (lock orl $0,(%esp) or mfence) after the store
> > > > and live with the loss of performance.
> > >
> > > How about using a xchg as instruction? This would perhaps "waste" a
> > > register, but that sort of optimization should not be critical in the
> > > vicinity of code that needs memory synchronization, anyhow.
> >
> > How is this better? My intent was to avoid incurring a read on the
> > cache line that's being written and instead achieve the
> > synchronization by poking at a cache line (the stack) that should not
> > be shared.
>
> In fact, I think you need a read on the cache line, here, don't you?
> You want to know the real value of l[1], no?

These specific callers do need l[1], but that's specific to the
callers, not fundamental to a_store. Also in principle l[1] need not
even be in the same cache line (ideally, it wouldn't be, but it's
likely to be anyway) since the alignment of l[] is just 32-bit.

> To be safe, I think this needs a full cmpxchg on the pair (l[0],
> l[1]), otherwise you can't know if the waiter count l[1] corresponds
> to the value just before the release of the lock.

No. As long as a_store behaves as a full barrier (including acquire
behavior) as it was intended to, you cannot read a value of l[1]
older than the value it had at the time of a_store, because there's a
globally consistent order between the a_inc and a_store.

Rich
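
For concreteness, the code shape under discussion looks roughly like
the following sketch. It paraphrases musl's internal __lock/__unlock
and the proposed i386 a_store fix; a_swap, a_inc, __wait and __wake
are musl-internal names, the __wait/__wake prototypes here are
stand-ins, and the sketch omits details such as the single-threaded
fast path, so it is illustrative rather than the exact source.

/* l[0] is the lock word; l[1] is the waiter count, maintained with
 * a_inc/a_dec inside __wait. Stand-in prototypes for the sketch: */
void __wait(volatile int *addr, volatile int *waiters, int val, int priv);
void __wake(volatile int *addr, int cnt, int priv);

/* i386 atomic swap, in the style of musl's arch/i386/atomic.h */
static inline int a_swap(volatile int *p, int v)
{
	__asm__ __volatile__( "xchg %0, %1"
		: "=r"(v), "=m"(*p) : "0"(v) : "memory" );
	return v;
}

/* Proposed a_store fix: follow the plain store with a locked RMW on
 * (%esp) -- a cache line private to this thread's stack -- so that
 * a_store is a full barrier without re-reading the stored-to line.
 * An mfence after the store would also work, as discussed above. */
static inline void a_store(volatile int *p, int x)
{
	__asm__ __volatile__( "movl %1, %0 ; lock ; orl $0,(%%esp)"
		: "=m"(*p) : "r"(x) : "memory" );
}

void __lock(volatile int *l)
{
	while (a_swap(l, 1)) __wait(l, l+1, 1, 1);
}

void __unlock(volatile int *l)
{
	/* The store must be a full barrier: if the read of l[1] below
	 * were reordered before it, a waiter that increments l[1] and
	 * blocks in that window would never be woken. */
	a_store(l, 0);
	if (l[1]) __wake(l, 1, 1);
}

With a_store acting as a full barrier, the a_inc a waiter performs
inside __wait and the release store are globally ordered, so
__unlock's read of l[1] cannot observe a stale zero.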