|
Message-ID: <20140812233310.GF12888@brightrain.aerifal.cx> Date: Tue, 12 Aug 2014 19:33:10 -0400 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Re: bug in pthread_cond_broadcast On Wed, Aug 13, 2014 at 12:50:19AM +0200, Jens Gustedt wrote: > Am Dienstag, den 12.08.2014, 17:20 -0400 schrieb Rich Felker: > > On Tue, Aug 12, 2014 at 08:18:59PM +0200, Jens Gustedt wrote: > > > so yes, the broadcast operation is synchronized with all other > > > threads, that's the idea of this test > > > > OK, thanks for clarifying. I'm still trying to understand where the > > error in musl's accounting is happening -- > > I think these are the decrement operations on _c_waiters and > _c_waiters2 in "unwait". In the case of the test, for pending waiters, > these happen after other threads know that the condition variable is > "released" by the main thread and have already entered the next phase. I think you're saying the problem is this code: if (c->_c_waiters2) c->_c_waiters2--; else a_dec(&m->_m_waiters); which is wrongly decrementing a newly-added waiter from the cond var when the intent was that the waiter should be decremented from the mutex. Is this correct? One potential solution I have in mind is to get rid of this complex waiter accounting by: 1. Having pthread_cond_broadcast set the new-waiter state on the mutex when requeuing, so that the next unlock forces a futex wake that otherwise might not happen. 2. Having pthread_cond_timedwait set the new-waiter state on the mutex after relocking it, either unconditionally (this would be rather expensive) or on some condition. One possible condition would be to keep a counter in the condvar object for the number of waiters that have been requeued, incrementing it by the number requeued at broadcast time and decrementing it on each wake. However the latter requires accessing the cond var memory in the return path for wait, and I don't see any good way around this. Maybe there's a way to use memory on the waiters' stacks? > > I'd like to find a fix that > > would be acceptable in the 1.0.x branch and make that fix before > > possibly re-doing the cond var implementation (a fix which wouldn't be > > suitable for backporting). > > Some thoughts: > > Basically, in "unwait" there shouldn't be any reference to c-> . No > pending thread inside timedwait should ever have to access the > pthread_cond_t, again, it might already heavily used by other threads. I'm not sure whether I agree with this or not -- as long as destroy is properly synchronized, I don't think it's inherently a bug to access c-> here, and I'm not clear yet on how the access could be eliminated. I think this is a direction we should pursue too, but I'd like to avoid invasive design changes for the backport to 1.0.x. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.