Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140814153615.GB12888@brightrain.aerifal.cx>
Date: Thu, 14 Aug 2014 11:36:15 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: My current understanding of cond var access restrictions

On Thu, Aug 14, 2014 at 10:41:10AM -0400, Rich Felker wrote:
> On Thu, Aug 14, 2014 at 10:00:04AM +0200, Jens Gustedt wrote:
> > Am Donnerstag, den 14.08.2014, 02:10 -0400 schrieb Rich Felker:
> > > I think I have an informal proof sketch that this is necessary unless
> > > we abandon requeue:
> > 
> > > ...
> > 
> > > With that in mind, I'd like to look for ways we can fix the bogus
> > > waiter accounting for the mutex that seems to be the source of the bug
> > > you found. One "obvious" (but maybe bad/wrong?) solution would be to
> > > put the count on the mutex at the time of waiting (rather than moving
> > > it there as part of broadcast), so that decrementing the mutex waiter
> > > count is always the right thing to do in unwait.
> > 
> > sounds like a good idea, at least for correctness
> > 
> > > Of course this
> > > possibly results in lots of spurious futex wakes to the mutex (every
> > > time it's unlocked while there are waiters on the cv, which could be a
> > > lot).
> > 
> > I we'd be more careful in not spreading too much wakes where we
> > shouldn't, there would perhaps not be "a lot" of such wakeups.
> 
> Well this is different from the wake-after-release that you dislike.
> It's a wake on a necessarily-valid object that just doesn't have any
> actual waiters right now because its potential-waiters are still
> waiting on the cv.
> 
> However I think it may be costly (one syscall per unlock) in
> applications where mutex is used to protect state that's frequently
> modified but where the predicate associated with the cv only rarely
> changes (and thus signaling is rare and cv waiters wait around a long
> time). In what's arguably the common case (a reasonable number of
> waiters as opposed to thousands of waiters on a 4-core box) just
> waking all waiters on broadcast would be a lot less expensive.
> 
> Thus I'm skeptical of trying an approach like this when it would be
> easier, and likely less costly on the common usage cases, just to
> remove requeue and always use broadcast wakes. I modified your test
> case for the bug to use a process-shared cv (using broadcast wake),
> and as expected, the test runs with no failure.

A really ugly hack that might solve the problem: adaptively switching
to a less efficient mode the first time a different mutex is used. It
could either switch to pre-moving wait counts to the mutex, or revert
to broadcast wakes.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.