musl - Re: Memory lock needed for semaphores?

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240810204720.GH10433@brightrain.aerifal.cx>
Date: Sat, 10 Aug 2024 16:47:20 -0400
From: Rich Felker <dalias@...c.org>
To: Markus Wichmann <nullplan@....net>
Cc: musl@...ts.openwall.com, Alexey Izbyshev <izbyshev@...ras.ru>
Subject: Re: Memory lock needed for semaphores?

On Sun, Aug 04, 2024 at 06:40:24PM -0400, Rich Felker wrote:
> On Sun, Aug 04, 2024 at 09:31:10AM -0400, Rich Felker wrote:
> > On Sun, Aug 04, 2024 at 09:29:03AM +0200, Markus Wichmann wrote:
> > > Hi all,
> > > 
> > > in the light of the recent thread about barriers I looked at the
> > > semaphore implementation under the light of the sudden unmap issue
> > > again. To recap, the issue is that multiple threads may interact with
> > > the same pshared semaphore, and in particular one thread might unmap a
> > > semaphore that another thread is currently still interacting with.
> > > 
> > > Is that an issue with semaphores? I'm thinking of the following
> > > scenario:
> > > 
> > > Start: One thread of another process is blocked at the semaphore
> > > (sem->__val = {INT_MIN, 1, 0}).
> > > 
> > > Thread 1: Calls sem_post(), gets until just after the a_cas() before
> > > being preempted (sem->__val = {1, 1, 0}, but this thread saw val < 0 and
> > > waiters <= 1, so broadcast wake will issue once thread resumes).
> > > 
> > > Thread 2: Calls sem_trywait(), sets sem->__val = {0, 1, 0}.
> > > Thread 2: Calls sem_post(), sets sem->__val = {1, 1, 0}, does not wake.
> > > Thread 2: Unmaps the semaphore.
> > > Thread 1: Resumes. SYS_futex returns EFAULT. Other process remains
> > > sleeping.
> > > 
> > > Am I missing something that makes this impossible? If not, do we maybe
> > > need a vmlock for posting pshared semaphores? That would probably be the
> > > easiest fix.
> > 
> > This is not possible and would be a misuse of it even if it were.
> > sem_post is required to be AS-safe, so it cannot take any sort of
> > lock.
> > 
> > I think the issue you report is real, but provided that's the case,
> > it's a flaw in the logic to be relying on the first sem_post to
> > perform the wake the second logically needs to perform. I haven't yet
> > looked into what the cost of leaving the waiters flag set would be.
> > 
> > There are kinda 3 approaches I can see exploring to fix this, and I'm
> > not sure which are low-cost or even valid yet:
> > 
> > 1. Have post only clear the waiters flag if the count is <1 (not <=1)
> > 2. Have trywait re-add a waiters flag if it sees waiters>0
> > 3. Have post perform the wake if it saw waiters>0 even if waiters bit
> >    was not set.
> > 
> > I'll look at this more later. I've CC'd Alexey Izbyshev who introduced
> > the current waiter accounting.
> 
> >From the original thread:
> 
> https://www.openwall.com/lists/musl/2022/11/22/3
> 
> On Tue, Nov 22, 2022 at 08:41:44AM +0300, Alexey Izbyshev wrote:
> > ...
> > * WAITERS_BIT is set and waiters > 1: don't reset WAITERS_BIT since
> > it's likely that some waiters will remain (and it's always fine to
> > err on the side of preserving the WAITERS_BIT); wake a single
> > waiter.
> > 
> > * WAITERS_BIT is set and waiters <= 1: reset WAITERS_BIT and wake
> > all waiters. In non-racy cases only a single waiter will be woken.
> > 
> > * WAITERS_BIT is unset: don't wake anybody. Even if there are some
> > waiters, another sem_post (that reset the WAITERS_BIT) is
> > responsible for waking them.
> 
> The third case here is the one we have a problem with: it's possible
> that the "another sem_post" responsible for waking them can't do so
> because the object is unmapped before it gets a chance to.
> 
> I don't see any harm in breaking it down into 2 case:
> 
> 3a. WAITERS_BIT is unset and waiter count is zero: don't wake anybody.
> 
> 3b. WAITERS_BIT is unset and waiter count is nonzero: since newly
> arriving waiters always set the waiters bit, this can only happen if
> another sem_post cleared the waiters bit after the waiter arrived.
> That sem_post should perform a broadcast wake as part of clearing the
> waiters bit, so either it has not yet made forward progress to the
> wake, or the wake was sent but the waiter has not yet woken to
> decrement itself from the waiter count and attempt to decrement the
> semaphore value. Re-issuing the broadcase wake will ensure, in the
> former case, that the wake actually gets sent, and that it happens
> before before sem_post returns. I think that solves the problem.
> 
> The only harm here is that, for patterns like repeatedly calling
> sem_post to increment the semaphore value by more than 1, it's quite
> plausible that a waiter has not yet taken itself off the waiter count,
> and that two or more wake syscalls will be made where one would have
> sufficed. This happened prior to commit
> 159d1f6c02569091c7a48bdb2e2e824b844a1902 anyway, where the old
> condition to wake was:
> 
> 	if (val<0 || waiters) __wake(sem->__val, 1, priv);
> 
> so I don't think we need to be concerned about it.
> 
> We could *probably* get by with skipping the waiters>0 condition for
> non-pshared semaphores:
> 
> 	if (val<0 || (!priv && waiters)) __wake(sem->__val, 1, priv);
> 
> however, I think there's a variant of the bug reported here in regards
> to AS-safety, where a sem_post called from a signal handler that
> executes between the atomic and the futex wake won't perform another
> wake, even though it may need to in order to trigger forward progress.
> 
> Even with the extra wake, there's still an AS-safety issue of sorts
> where it's possible for the post to happen (value to be updated)
> without a wake, if a long-running signal handler interrupts sem_post.
> However, aside from the missed-wake in the case of recursive sem_post,
> the situation of a signal handler interrupting sem_post doesn't seem
> to be a significant behavioral difference: it's hard to construct
> cases where you notice the posted value without sem_getvalue, which
> has some allowance to be racey. I don't see a way to fully fix this
> without FUTEX_WAKE_OP having a CAS operation (which it sadly omitted)
> or just blocking signals in the cases where we're going to issue a
> wake (which I guess may be an option, since it's already making a
> syscall anyway).

How does the attached look?

Rich

View attachment "0001-fix-lost-or-delayed-wakes-in-sem_post-under-certain-.patch" of type "text/plain" (2105 bytes)
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.