Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140814165817.GD12888@brightrain.aerifal.cx>
Date: Thu, 14 Aug 2014 12:58:17 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: My current understanding of cond var access restrictions

On Thu, Aug 14, 2014 at 06:27:21PM +0200, Jens Gustedt wrote:
> Am Donnerstag, den 14.08.2014, 10:41 -0400 schrieb Rich Felker:
> > Thus I'm skeptical of trying an approach like this when it would be
> > easier, and likely less costly on the common usage cases, just to
> > remove requeue and always use broadcast wakes. I modified your test
> > case for the bug to use a process-shared cv (using broadcast wake),
> > and as expected, the test runs with no failure.
> 
> You shouldn't draw much conclusion from the fact that it works in that
> case. This still heavily interacts with the waiters count and thus a
> signaling thread that comes after such a broadcast might wake up a
> thread that it shouldn't.
> 
> (But I didn't do a full analysis of that situation.)

In the process-shared case, broadcast just increments the sequence
number and wakes all futex waiters. It's very simple.

Formally, there's no such thing as waking up a thread you shouldn't,
since spurious wakes are always allowed. The current implementation
has a lot of potential for spurious wakes but they don't happen except
in certain situations:

- If a futex wait gets interrupted by a signal, the wait will always
  terminate after the signal handler returns if any intervening
  signals or broadcasts happened (except in the case of a full
  wraparound of the sequence number, i.e. exactly 2<<32 cv signals
  while stuck in a signal handler, which I don't know how to fix, but
  it would be easy to write a test for this) even if the signal was
  already received by another waiter.

- If the sequence number gets incremented by a signal before the
  initial futex wait, the waiter will return immediately; this can
  happen to multiple waiters even for just one signal.

Really sequence numbers are the wrong tool here, but they were
introduced because the previous approach (having each waiter write its
own tid, and futex wait comparing that tid) lead to pathologically bad
performance under heavy waiter arrival where waiters were constantly
returning because another waiter was almost always able to write its
tid before the first one could block on the futex. I'd like to have a
better solution, but I can't think of any.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.