Message-ID: <CANk+eUVC-4qfB93689tzyX9sXSJDpaW20ak4SQd6PNt9wUjCvg@mail.gmail.com>
Date: Sat, 8 Feb 2025 15:40:18 +0100
From: Daniele Dario <d.dario76@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: Florian Weimer <fweimer@...hat.com>, musl@...ts.openwall.com
Subject: Re: pthread_mutex_t shared between processes with different pid namespaces
On Sat, 8 Feb 2025, 13:39 Rich Felker <dalias@...c.org> wrote:
> On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote:
> > But wouldn't this mean that robust mutexes functionality is totally
> > incompatible with pid namespaces?
>
> No, only with trying to synchronize *across* different pid namespaces.
>
> > If the kernel relies on tid stored in memory by the process this always
> > lacks the information about the pid namespace the tid belongs to.
>
> It's necessarily within the same pid namespace as the process itself.
>
> Functionally, you should consider different pid namespaces as
> different systems that happen to be capable of sharing some resources.
>
> Rich
>
Yes, I'm just saying that sharing pthread_mutex_t instances across
processes within the same pid namespace, but on a system with more than
one pid namespace, could still lead to issues if the kernel uses the
stored tid value to identify the task to act on without knowing which
pid namespace it belongs to.
I'm not saying this is true; I'm trying to understand and, if possible,
improve things.
Daniele
>
>
> > On Fri, 7 Feb 2025 at 17:19 Rich Felker <dalias@...c.org> wrote:
> >
> > > On Thu, Feb 06, 2025 at 08:45:14AM +0100, Daniele Personal wrote:
> > > > On Wed, 2025-02-05 at 11:32 +0100, Florian Weimer wrote:
> > > > > * Daniele Personal:
> > > > >
> > > > > > On Tue, 2025-02-04 at 13:53 -0500, Rich Felker wrote:
> > > > > > > On Mon, Feb 03, 2025 at 06:25:41PM +0100, Florian Weimer wrote:
> > > > > > > > * Daniele Personal:
> > > > > > > >
> > > > > > > > > On Sat, 2025-02-01 at 17:03 +0100, Florian Weimer wrote:
> > > > > > > > > > * Daniele Personal:
> > > > > > > > > >
> > > > > > > > > > > > Is this required for implementing the unlock-if-not-owner
> > > > > > > > > > > > error code on mutex unlock?
> > > > > > > > > > >
> > > > > > > > > > > No, I don't see problems related to EOWNERDEAD.
> > > > > > > > > >
> > > > > > > > > > Sorry, what I meant is that the TID is needed for efficient
> > > > > > > > > > reporting of usage errors. It's not imposed by the robust list
> > > > > > > > > > protocol as such. There could be a PID-namespace-compatible
> > > > > > > > > > robust mutex type that does not have this problem (but with
> > > > > > > > > > less error checking).
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Florian
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Are you saying that there are pthread_mutexes which can be shared
> > > > > > > > > across processes running in different pid namespaces? If so, I'm
> > > > > > > > > definitely interested in this. Can you tell me something more?
> > > > > > > >
> > > > > > > > You would have to add a new mutex type that is a mix of
> > > > > > > > PTHREAD_MUTEX_NORMAL and PTHREAD_MUTEX_ROBUST. Closer to the
> > > > > > > > latter, but without the ownership checks.
> > > > > > >
> > > > > > > This is inaccurate. Robust mutexes fundamentally depend on having
> > > > > > > the owner's tid in the owner field, and on this value not matching
> > > > > > > the tid of any other task that might hold the mutex. If these
> > > > > > > properties don't hold, the mutex may fail to unlock when the owner
> > > > > > > dies, or incorrectly unlock when another task mimicking the owner
> > > > > > > dies.
> > > > > > >
> > > > > > > The Linux robust mutex protocol fundamentally does not work across
> > > > > > > pid namespaces.
> > > > >
> > > > > Thank you, Rich, for the correction.
> > > > >
> > > > > > Looking at the code for musl 1.2.4, a pthread_mutex_t that has been
> > > > > > initialized as shared and robust, but not PI-capable, leaves only
> > > > > > the pthread_mutex_unlock() case uncovered.
> > > > >
> > > > > > As mentioned by Rich, since TIDs are not unique across different
> > > > > > namespaces, a task might unlock a mutex held by another one if they
> > > > > > have the same TID.
> > > > > >
> > > > > > I don't see other possible errors, am I missing something?
> > > > >
> > > > > The kernel code uses the owner TID to handle some special cases:
> > > > >
> > > > > /*
> > > > >  * Special case for regular (non PI) futexes. The unlock path in
> > > > >  * user space has two race scenarios:
> > > > >  *
> > > > >  * 1. The unlock path releases the user space futex value and
> > > > >  *    before it can execute the futex() syscall to wake up
> > > > >  *    waiters it is killed.
> > > > >  *
> > > > >  * 2. A woken up waiter is killed before it can acquire the
> > > > >  *    futex in user space.
> > > > >  *
> > > > >  * In the second case, the wake up notification could be generated
> > > > >  * by the unlock path in user space after setting the futex value
> > > > >  * to zero or by the kernel after setting the OWNER_DIED bit below.
> > > > >  *
> > > > >  * In both cases the TID validation below prevents a wakeup of
> > > > >  * potential waiters which can cause these waiters to block
> > > > >  * forever.
> > > > >  *
> > > > >  * In both cases the following conditions are met:
> > > > >  *
> > > > >  *      1) task->robust_list->list_op_pending != NULL
> > > > >  *         @pending_op == true
> > > > >  *      2) The owner part of user space futex value == 0
> > > > >  *      3) Regular futex: @pi == false
> > > > >  *
> > > > >  * If these conditions are met, it is safe to attempt waking up a
> > > > >  * potential waiter without touching the user space futex value and
> > > > >  * trying to set the OWNER_DIED bit. If the futex value is zero,
> > > > >  * the rest of the user space mutex state is consistent, so a woken
> > > > >  * waiter will just take over the uncontended futex. Setting the
> > > > >  * OWNER_DIED bit would create inconsistent state and malfunction
> > > > >  * of the user space owner died handling. Otherwise, the OWNER_DIED
> > > > >  * bit is already set, and the woken waiter is expected to deal with
> > > > >  * this.
> > > > >  */
> > > > > owner = uval & FUTEX_TID_MASK;
> > > > >
> > > > > if (pending_op && !pi && !owner) {
> > > > >         futex_wake(uaddr, FLAGS_SIZE_32 | FLAGS_SHARED, 1,
> > > > >                    FUTEX_BITSET_MATCH_ANY);
> > > > >         return 0;
> > > > > }
> > > > >
> > > > > As a result, it's definitely just a userspace-only change if you need
> > > > > to use the robust mutex list across PID namespaces.
> > > > >
> > > >
> > > > I tried to understand what you mean here but can't: can you please
> > > > explain to me which userspace-only change is needed?
> > >
> > > No such change is possible. Robust futexes inherently rely on the
> > > kernel being able to evaluate, on async process death, whether the
> > > dying task was the owner of a mutex in the robust list. This depends
> > > on the tid stored in memory being an accurate and unique identifier
> > > for the task. If you violate this, you can hack things to make the
> > > userspace side work, but the whole robust functionality you want will
> > > fail to work.
> > >
> > > Rich
> > >
>