Message-ID: <20140827214853.GV12888@brightrain.aerifal.cx>
Date: Wed, 27 Aug 2014 17:48:53 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 2/2] avoid taking _c_lock if we know it isn't
 necessary

On Wed, Aug 27, 2014 at 11:30:26PM +0200, Jens Gustedt wrote:
> > I also have some other potential changes to this
> > code based on my latest comments to:
> > 
> > http://austingroupbugs.net/view.php?id=609
> > 
> > regarding things they seem to deem as requirements, and which musl
> > does not satisfy, that are specified in non-normative text. So there's
> > likely to be more cond var work to do before the release still...
> 
> Ah, the cancelation stuff. As if condition variables wouldn't be
> complicated enough already, without cancelation. We already have two
> different ordered sequences of events, those on the cv and those on
> the mutex. The discussion (and our implementation struggles) already
> shows how difficult it is to get these two linear sequences ordered in
> a convenient way. If you add a third set of events that are neither
> ordered among themselves (cancelation to different threads are
> asynchronous) nor with any of the two sequences, the semantics aren't
> clear at all. (This is why I think that generally thread cancelation
> is not a good idea, and why it is not very widely used. It accounts
> for more than 50% of the complexity of the implementation of
> pthreads.)
> 
> But with the current implementation, I would think that it basically
> fulfills (or can be easily made to fulfill) the requirement that
> cancelation would not be "consuming" a signal when some other thread
> is available. We are marking threads as WAITING, LEAVING or SIGNALED
> and only for WAITING, a thread can be considered "blocked" on the
> cv. The transition between these is atomic, and so once a signaler
> marked a thread SIGNALED, it is not blocked and has rightly consumed
> the signal.

Yet this transition to SIGNALED can happen when the waiter is already
executing the cancellation cleanup handler, before the a_cas there. In
this case, it has "consumed the signal", but __timedwait never
returns, because the __syscall_cp inside it never returns.
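
To make the race concrete, here is a minimal sketch using C11 atomics
rather than musl's internal a_cas. The struct and the function names
try_signal and try_leave are hypothetical, and only the state names
come from this thread. It shows only that exactly one side wins the
transition out of WAITING; it does not model the syscall or the
cleanup handler itself.

```c
#include <assert.h>
#include <stdatomic.h>

/* State names taken from the discussion; the values are arbitrary. */
enum { WAITING, SIGNALED, LEAVING };

/* Hypothetical per-waiter record, not musl's actual structure. */
struct waiter {
    atomic_int state;
};

/* Signaler side: claim the waiter if it is still WAITING.
 * Returns 1 if this waiter has now consumed the signal. */
static int try_signal(struct waiter *w)
{
    int expected = WAITING;
    assert(w);
    return atomic_compare_exchange_strong(&w->state, &expected, SIGNALED);
}

/* Waiter side (timeout or cancellation cleanup): try to leave.
 * Returns 1 if the waiter left without consuming a signal, 0 if a
 * signaler already moved it to SIGNALED and won the race. */
static int try_leave(struct waiter *w)
{
    int expected = WAITING;
    assert(w);
    return atomic_compare_exchange_strong(&w->state, &expected, LEAVING);
}
```

The problematic case is try_leave returning 0 while the thread is
already inside the cleanup handler: the signal is consumed, but the
thread is on its way out instead of returning to the caller.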

I have a patch which solves this problem via setjmp in
pthread_cond_timedwait and longjmp in unwait when SIGNALED won the
a_cas race, but it has a noticeable performance cost (due to the
unconditional setjmp on each call).
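
Roughly, the shape of that approach looks like the sketch below. This
is not the actual patch: struct wait_ctx, unwait_sketch and
cond_wait_sketch are hypothetical stand-ins, the "signaled" flag
stands in for the result of the a_cas, and cancellation is simulated
by a plain function call. The point is only that setjmp runs on every
wait, whether or not cancellation ever happens.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical per-call context. */
struct wait_ctx {
    jmp_buf env;
    int signaled;   /* stand-in for "SIGNALED won the a_cas race" */
};

/* Stand-in for the cleanup path that runs when cancellation is acted
 * upon while the thread is blocked. */
static void unwait_sketch(struct wait_ctx *c)
{
    assert(c);
    if (c->signaled)
        longjmp(c->env, 1);   /* abandon cancellation, resume the waiter */
    /* otherwise cancellation goes ahead; here we simply return */
}

/* Returns 0 for a normal wakeup, 1 if cancellation proceeds. */
static int cond_wait_sketch(struct wait_ctx *c, int cancel_arrives)
{
    assert(c);
    if (setjmp(c->env))       /* cost paid on every call */
        return 0;             /* reached via longjmp: signal was consumed */
    if (cancel_arrives) {
        unwait_sketch(c);
        return 1;
    }
    return 0;
}
```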

The ideal solution would be to implement the cancellation variant I've
been wanting to add for some time now: a cancellation mode where the
cancelled function returns with ECANCELED rather than acting on
cancellation immediately. This can be implemented by having the
cancellation signal handler not just check the program counter, but
also modify it, when this mode is in effect, so that returning from
the signal handler skips the syscall and instead returns -ECANCELED.

With that done, all of the nasty libc-internal use of cancellation
cleanup handlers could be replaced with temporarily changing the
cancellation mode and simply checking return values/errno for
ECANCELED. And it allows us to implement things like the cond var
behavior where deciding whether to act on cancellation or leave it
pending should take place in userspace after the syscall returns.
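
As a sketch of what the caller side could look like under that mode:
blocking_wait_sketch below is a hypothetical stand-in for a syscall
wrapper that returns -ECANCELED instead of acting on cancellation,
and struct wait_result and its field names are likewise made up for
illustration. None of this is an existing musl interface.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome of one wait. */
struct wait_result {
    int act_on_cancel;        /* caller should now act on cancellation */
    int cancel_left_pending;  /* return normally, cancellation stays pending */
};

/* Stand-in for a cancellable syscall in the proposed mode: it returns
 * -ECANCELED rather than running cleanup handlers and exiting. */
static int blocking_wait_sketch(int cancel_requested)
{
    return cancel_requested ? -ECANCELED : 0;
}

/* The decision is made in userspace, after the call has returned. */
static struct wait_result cond_wait_decide(int cancel_requested, int signaled)
{
    struct wait_result r = { 0, 0 };
    int e = blocking_wait_sketch(cancel_requested);

    assert(e == 0 || e == -ECANCELED);
    if (e == -ECANCELED) {
        if (signaled)
            r.cancel_left_pending = 1;  /* signal consumed: behave as woken */
        else
            r.act_on_cancel = 1;        /* no signal consumed: cancel now */
    }
    return r;
}
```

No cleanup handler is involved: a waiter that has already been marked
SIGNALED just returns as if woken, and the cancellation request is
handled at the next cancellation point.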

We can also expose this behavior as an experimental public interface
and propose it for standardization, but there are a lot of corner
cases I'd want to analyze in more detail before doing so to make sure
they're done right.

Rich
