Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240804133110.GV10433@brightrain.aerifal.cx>
Date: Sun, 4 Aug 2024 09:31:10 -0400
From: Rich Felker <dalias@...c.org>
To: Markus Wichmann <nullplan@....net>
Cc: musl@...ts.openwall.com, Alexey Izbyshev <izbyshev@...ras.ru>
Subject: Re: Memory lock needed for semaphores?

On Sun, Aug 04, 2024 at 09:29:03AM +0200, Markus Wichmann wrote:
> Hi all,
> 
> in the light of the recent thread about barriers I looked at the
> semaphore implementation under the light of the sudden unmap issue
> again. To recap, the issue is that multiple threads may interact with
> the same pshared semaphore, and in particular one thread might unmap a
> semaphore that another thread is currently still interacting with.
> 
> Is that an issue with semaphores? I'm thinking of the following
> scenario:
> 
> Start: One thread of another process is blocked at the semaphore
> (sem->__val = {INT_MIN, 1, 0}).
> 
> Thread 1: Calls sem_post(), gets until just after the a_cas() before
> being preempted (sem->__val = {1, 1, 0}, but this thread saw val < 0 and
> waiters <= 1, so broadcast wake will issue once thread resumes).
> 
> Thread 2: Calls sem_trywait(), sets sem->__val = {0, 1, 0}.
> Thread 2: Calls sem_post(), sets sem->__val = {1, 1, 0}, does not wake.
> Thread 2: Unmaps the semaphore.
> Thread 1: Resumes. SYS_futex returns EFAULT. Other process remains
> sleeping.
> 
> Am I missing something that makes this impossible? If not, do we maybe
> need a vmlock for posting pshared semaphores? That would probably be the
> easiest fix.

This is not possible and would be a misuse of it even if it were.
sem_post is required to be AS-safe, so it cannot take any sort of
lock.

I think the issue you report is real, but provided that's the case,
it's a flaw in the logic to be relying on the first sem_post to
perform the wake the second logically needs to perform. I haven't yet
looked into what the cost of leaving the waiters flag set would be.

There are kinda 3 approaches I can see exploring to fix this, and I'm
not sure which are low-cost or even valid yet:

1. Have post only clear the waiters flag if the count is <1 (not <=1)
2. Have trywait re-add a waiters flag if it sees waiters>0
3. Have post perform the wake if it saw waiters>0 even if waiters bit
   was not set.

I'll look at this more later. I've CC'd Alexey Izbyshev who introduced
the current waiter accounting.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.