musl - Re: sem_getvalue conformance considerations

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LNX.2.00.1408292251540.5292@monopod.intra.ispras.ru>
Date: Sat, 30 Aug 2014 02:51:35 +0400 (MSK)
From: Alexander Monakov <amonakov@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: sem_getvalue conformance considerations

Hi,

To begin, I have a few questions regarding cancellation (please excuse
deviating from topic for a while):

1.  It looks non-conforming not to act on pending cancellation in uncontended
sem_wait (or any cancellation point that does not invoke a syscall, for that
matter; nsz pointed out that many instances should already be handled, but
pthread_join for instance seems to miss explicit timedwait).  A bug?

2.  I had trouble understanding cancel_handler.  What is re-raising
SIGCANCEL at the end for?

3.  My understanding is that for deferred cancellation, ideally you'd want
SIGCANCEL to be delivered only as interruptions to syscalls --- that would
eliminate some scaffolding around syscall_cp including IP range check in
cancel_handler, correct?

4.  If the above could be provided, it would be possible to guarantee that
pthread cancellation cleanup is executing in a signal handler context only for
asynchronous cancellation.  I'd like that for my sem unwait.  Too bad?

======

Trying to approach a formal description of my proposed sem ops
implementation, here's what I can do now:

1.  Sem ops begin execution by performing an atomic read or read-modify-write
on val[0].  The sequence of atomic accesses establishes an order between
threads doing the ops (with respect to a given semaphore).

2.  Returning from futex wait with timeout or interrupt also accesses
val[0] and is included in that order

3.  The following invariant is maintained: val[0] is equal to the semaphore
value minus the number of current waiters (as given by the linear sequence of
events established above)

4. sem_post begins by performing an atomic increment on val[0]; if the
previous value was negative, there was at least one waiter, which is now
discounted from val[0]; we must arrange that one waiter eventually wakes up
and proceeds

4.1. Note: when sem_post increments negative val[0], we need that one waiter
returns successfully from sem_wait; if they were to exit exceptionally (eintr,
timeout, cancel), they would need to increment val[0] yet again;
but this will allow other threads to observe incrementing sem_getvalue despite
no new posts, as Rich noted in his email

4.2. We arrange that exactly one waiter proceeds by atomically incrementing
val[1] and futex-waking it; since we are doing that non-atomically with val[0]
update, we must guarantee that no waiters can proceed to destroy the semaphore
between val[0] and val[1] updates in sem_post

5.  sem_wait begins by atomically decrementing val[0]; if the previous value
was positive, we have simply successfully decremented the semaphore;
otherwise, we have become a waiter and must suspend

5.1.  sem_wait futex-waits on val[1]; on normal wakeups, it tries to
atomically-decrement-if-positive val[1] and leaves if successful

FIXME: we are reading from val[1] even though val[0] was increased long ago and
semaphore may be now positive; is there a justification for that, why can't
other threads assume the semaphore is free to be destroyed?

5.2.  On eintr/timeout/cancel, a waiter must decide if it may just discount
itself as a waiter, or must deal with a racing post; in case there is a racing
post, the waiter must either consume it and return normally, or discount
itself as a waiter nevertheless and ensure that rather than causing a wake the
post increases semaphore value beyond zero

It seems there are two ways of doing it.  We either prefer to consume a
pending post if available (a), or to propagate eintr/cancel (b).

5.2.a. Begin by incrementing-if-negative val[0]. If successful, we have
discounted ourselves as a waiter, so we don't need to care about concurrent
posts (there are fewer pending posts than waiters) and may propagate
eintr/timeout/cancel.  If unsuccessful, there is a pending post.   Consume it
and cause normal return from sem_wait (as if interruption did not happen).
Can't easily cause normal return from sem_wait if acting upon cancellation.

5.2.b.  Begin by incrementing val[0] unconditionally.  If it was negative, we
have discounted ourselves as a waiter; otherwise, there is a pending post, and
our increment is making it look as if the post is increasing an uncontended
semaphore (out of sequence, so other threads may observe oddly increasing
value); however there is a pending increase of val[1] and we must undo it.
Wait for val[1] to become positive by futex-waiting and then decrement it.
This looks more hairy, causes non-conforming behavior, but easier to execute
from a signal handler context for cancellation.

Does that help?  Thoughts?

Alexander
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.