musl - Re: Memory lock needed for semaphores?

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240804224023.GY10433@brightrain.aerifal.cx>
Date: Sun, 4 Aug 2024 18:40:24 -0400
From: Rich Felker <dalias@...c.org>
To: Markus Wichmann <nullplan@....net>
Cc: musl@...ts.openwall.com, Alexey Izbyshev <izbyshev@...ras.ru>
Subject: Re: Memory lock needed for semaphores?

On Sun, Aug 04, 2024 at 09:31:10AM -0400, Rich Felker wrote:
> On Sun, Aug 04, 2024 at 09:29:03AM +0200, Markus Wichmann wrote:
> > Hi all,
> > 
> > in the light of the recent thread about barriers I looked at the
> > semaphore implementation under the light of the sudden unmap issue
> > again. To recap, the issue is that multiple threads may interact with
> > the same pshared semaphore, and in particular one thread might unmap a
> > semaphore that another thread is currently still interacting with.
> > 
> > Is that an issue with semaphores? I'm thinking of the following
> > scenario:
> > 
> > Start: One thread of another process is blocked at the semaphore
> > (sem->__val = {INT_MIN, 1, 0}).
> > 
> > Thread 1: Calls sem_post(), gets until just after the a_cas() before
> > being preempted (sem->__val = {1, 1, 0}, but this thread saw val < 0 and
> > waiters <= 1, so broadcast wake will issue once thread resumes).
> > 
> > Thread 2: Calls sem_trywait(), sets sem->__val = {0, 1, 0}.
> > Thread 2: Calls sem_post(), sets sem->__val = {1, 1, 0}, does not wake.
> > Thread 2: Unmaps the semaphore.
> > Thread 1: Resumes. SYS_futex returns EFAULT. Other process remains
> > sleeping.
> > 
> > Am I missing something that makes this impossible? If not, do we maybe
> > need a vmlock for posting pshared semaphores? That would probably be the
> > easiest fix.
> 
> This is not possible and would be a misuse of it even if it were.
> sem_post is required to be AS-safe, so it cannot take any sort of
> lock.
> 
> I think the issue you report is real, but provided that's the case,
> it's a flaw in the logic to be relying on the first sem_post to
> perform the wake the second logically needs to perform. I haven't yet
> looked into what the cost of leaving the waiters flag set would be.
> 
> There are kinda 3 approaches I can see exploring to fix this, and I'm
> not sure which are low-cost or even valid yet:
> 
> 1. Have post only clear the waiters flag if the count is <1 (not <=1)
> 2. Have trywait re-add a waiters flag if it sees waiters>0
> 3. Have post perform the wake if it saw waiters>0 even if waiters bit
>    was not set.
> 
> I'll look at this more later. I've CC'd Alexey Izbyshev who introduced
> the current waiter accounting.

>From the original thread:

https://www.openwall.com/lists/musl/2022/11/22/3

On Tue, Nov 22, 2022 at 08:41:44AM +0300, Alexey Izbyshev wrote:
> ...
> * WAITERS_BIT is set and waiters > 1: don't reset WAITERS_BIT since
> it's likely that some waiters will remain (and it's always fine to
> err on the side of preserving the WAITERS_BIT); wake a single
> waiter.
> 
> * WAITERS_BIT is set and waiters <= 1: reset WAITERS_BIT and wake
> all waiters. In non-racy cases only a single waiter will be woken.
> 
> * WAITERS_BIT is unset: don't wake anybody. Even if there are some
> waiters, another sem_post (that reset the WAITERS_BIT) is
> responsible for waking them.

The third case here is the one we have a problem with: it's possible
that the "another sem_post" responsible for waking them can't do so
because the object is unmapped before it gets a chance to.

I don't see any harm in breaking it down into 2 case:

3a. WAITERS_BIT is unset and waiter count is zero: don't wake anybody.

3b. WAITERS_BIT is unset and waiter count is nonzero: since newly
arriving waiters always set the waiters bit, this can only happen if
another sem_post cleared the waiters bit after the waiter arrived.
That sem_post should perform a broadcast wake as part of clearing the
waiters bit, so either it has not yet made forward progress to the
wake, or the wake was sent but the waiter has not yet woken to
decrement itself from the waiter count and attempt to decrement the
semaphore value. Re-issuing the broadcase wake will ensure, in the
former case, that the wake actually gets sent, and that it happens
before before sem_post returns. I think that solves the problem.

The only harm here is that, for patterns like repeatedly calling
sem_post to increment the semaphore value by more than 1, it's quite
plausible that a waiter has not yet taken itself off the waiter count,
and that two or more wake syscalls will be made where one would have
sufficed. This happened prior to commit
159d1f6c02569091c7a48bdb2e2e824b844a1902 anyway, where the old
condition to wake was:

	if (val<0 || waiters) __wake(sem->__val, 1, priv);

so I don't think we need to be concerned about it.

We could *probably* get by with skipping the waiters>0 condition for
non-pshared semaphores:

	if (val<0 || (!priv && waiters)) __wake(sem->__val, 1, priv);

however, I think there's a variant of the bug reported here in regards
to AS-safety, where a sem_post called from a signal handler that
executes between the atomic and the futex wake won't perform another
wake, even though it may need to in order to trigger forward progress.

Even with the extra wake, there's still an AS-safety issue of sorts
where it's possible for the post to happen (value to be updated)
without a wake, if a long-running signal handler interrupts sem_post.
However, aside from the missed-wake in the case of recursive sem_post,
the situation of a signal handler interrupting sem_post doesn't seem
to be a significant behavioral difference: it's hard to construct
cases where you notice the posted value without sem_getvalue, which
has some allowance to be racey. I don't see a way to fully fix this
without FUTEX_WAKE_OP having a CAS operation (which it sadly omitted)
or just blocking signals in the cases where we're going to issue a
wake (which I guess may be an option, since it's already making a
syscall anyway).

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.