Message-ID: <20140814153615.GB12888@brightrain.aerifal.cx>
Date: Thu, 14 Aug 2014 11:36:15 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: My current understanding of cond var access restrictions

On Thu, Aug 14, 2014 at 10:41:10AM -0400, Rich Felker wrote:
> On Thu, Aug 14, 2014 at 10:00:04AM +0200, Jens Gustedt wrote:
> > On Thursday, 14.08.2014, at 02:10 -0400, Rich Felker wrote:
> > > I think I have an informal proof sketch that this is necessary unless
> > > we abandon requeue:
> >
> > > ...
> >
> > > With that in mind, I'd like to look for ways we can fix the bogus
> > > waiter accounting for the mutex that seems to be the source of the bug
> > > you found. One "obvious" (but maybe bad/wrong?) solution would be to
> > > put the count on the mutex at the time of waiting (rather than moving
> > > it there as part of broadcast), so that decrementing the mutex waiter
> > > count is always the right thing to do in unwait.
> >
> > sounds like a good idea, at least for correctness
> >
> > > Of course this
> > > possibly results in lots of spurious futex wakes to the mutex (every
> > > time it's unlocked while there are waiters on the cv, which could be a
> > > lot).
> >
> > If we'd be more careful in not spreading too many wakes where we
> > shouldn't, there would perhaps not be "a lot" of such wakeups.
>
> Well this is different from the wake-after-release that you dislike.
> It's a wake on a necessarily-valid object that just doesn't have any
> actual waiters right now because its potential-waiters are still
> waiting on the cv.
>
> However I think it may be costly (one syscall per unlock) in
> applications where mutex is used to protect state that's frequently
> modified but where the predicate associated with the cv only rarely
> changes (and thus signaling is rare and cv waiters wait around a long
> time). In what's arguably the common case (a reasonable number of
> waiters as opposed to thousands of waiters on a 4-core box) just
> waking all waiters on broadcast would be a lot less expensive.
>
> Thus I'm skeptical of trying an approach like this when it would be
> easier, and likely less costly on the common usage cases, just to
> remove requeue and always use broadcast wakes. I modified your test
> case for the bug to use a process-shared cv (using broadcast wake),
> and as expected, the test runs with no failure.

A really ugly hack that might solve the problem: adaptively switching
to a less efficient mode the first time a different mutex is used. It
could either switch to pre-moving wait counts to the mutex, or revert
to broadcast wakes.

Rich
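
A minimal sketch of how the adaptive switch described above might be
tracked, assuming the cv simply remembers the first mutex it was used
with. The types sketch_mutex and sketch_cond, the mode constants, and
the function cv_note_mutex are all hypothetical names made up for
illustration; none of this is musl's actual code, and only the
broadcast-wake fallback (not the pre-moved wait counts) is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical modes: requeue waiters onto the mutex on broadcast, or
 * fall back to waking every waiter. CV_REQUEUE is 0 so that a
 * zero-initialized cv starts in requeue mode. */
enum cv_mode { CV_REQUEUE = 0, CV_BROADCAST_WAKE = 1 };

/* Hypothetical stand-ins for the real mutex and cv structures. */
struct sketch_mutex { atomic_int waiters; };

struct sketch_cond {
    _Atomic(struct sketch_mutex *) first_mutex; /* NULL until first wait */
    atomic_int mode;                            /* enum cv_mode */
};

/* Called at the start of every wait on c with mutex m.
 * Records the first mutex seen; the first time a different mutex
 * shows up, the cv is switched permanently to broadcast wakes.
 * Returns the mode in effect after this call. */
int cv_note_mutex(struct sketch_cond *c, struct sketch_mutex *m)
{
    assert(c != NULL && m != NULL);

    struct sketch_mutex *expected = NULL;
    /* If no mutex was recorded yet, record m. On failure, expected is
     * updated to hold the mutex that was already recorded. */
    if (!atomic_compare_exchange_strong(&c->first_mutex, &expected, m)
        && expected != m) {
        /* A different mutex was used: give up on requeue for good. */
        atomic_store(&c->mode, CV_BROADCAST_WAKE);
    }
    return atomic_load(&c->mode);
}
```

The switch is deliberately one-way in this sketch: once a second mutex
has been seen, the cv never returns to requeue, which keeps the
bookkeeping simple at the cost of efficiency for that cv.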