musl - Re: pthread_mutex_t shared between processes with different pid namespaces

Message-ID: <602655d31ed172b8c4d65a7127d275e7973c3549.camel@gmail.com>
Date: Tue, 11 Feb 2025 10:34:30 +0100
From: Daniele Personal <d.dario76@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: Florian Weimer <fweimer@...hat.com>, musl@...ts.openwall.com
Subject: Re: pthread_mutex_t shared between processes with different
 pid namespaces

On Mon, 2025-02-10 at 13:14 -0500, Rich Felker wrote:
> On Mon, Feb 10, 2025 at 05:12:52PM +0100, Daniele Personal wrote:
> > On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote:
> > > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote:
> > > > On Sat, Feb 8, 2025, 13:39 Rich Felker <dalias@...c.org> wrote:
> > > > 
> > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote:
> > > > > > But wouldn't this mean that robust mutex functionality is
> > > > > > totally incompatible with pid namespaces?
> > > > > 
> > > > > No, only with trying to synchronize *across* different pid
> > > > > namespaces.
> > > > > 
> > > > > > If the kernel relies on the tid stored in memory by the
> > > > > > process, it always lacks the information about which pid
> > > > > > namespace the tid belongs to.
> > > > > 
> > > > > It's necessarily within the same pid namespace as the process
> > > > > itself.
> > > > > 
> > > > > Functionally, you should consider different pid namespaces as
> > > > > different systems that happen to be capable of sharing some
> > > > > resources.
> > > > > 
> > > > > Rich
> > > > > 
> > > > 
> > > > Yes, I'm just saying that sharing pthread_mutex_t instances across
> > > > processes within the same pid namespace, but on a system with more
> > > > than one pid namespace, could lead to issues anyway if the kernel
> > > > uses the stored tid value to decide whom to contact without knowing
> > > > which pid namespace it belongs to.
> > > > 
> > > > I'm not saying this is true; I'm trying to understand and, if
> > > > possible, improve things.
> > > 
> > > That's not a problem. The stored tid is used only in the context of a
> > > process exiting, where the kernel code knows the relevant pid
> > > namespace (the one the exiting process is in) and uses the tid
> > > relative to that. If it didn't work this way, it would be a fatal bug
> > > in the pid namespace implementation, which is supposed to allow
> > > essentially transparent containerization (which includes processes in
> > > the ns being able to use their tids as they could if they were outside
> > > of any container/in global ns).
> > > 
> > > Rich
> > > 
> > 
> > So, IIUC, the problem of sharing robust pthread_mutex_t instances
> > across different pid namespaces is on the user space side, which is not
> > able to distinguish clashes on TIDs. In particular, problems could
> > arise when:
> 
> No, it is not "on the user side". The user side can be modified
> arbitrarily, and, modulo some cost, could surely be made to work for
> non-robust process-shared mutexes. The problem is that the kernel --
> the part which makes them robust -- has to honor the protocol, and the
> protocol does not admit distinguishing "pid N in ns X" from "pid N in
> ns Y".

Ah, I thought your previous sentence was saying that the kernel is able
to make this distinction.

> 
> >  * an application tries to unlock a mutex owned by another application
> >    with the same TID in a different pid namespace (but this is an
> >    application design problem, and libc can't help because TIDs are
> >    not unique across different pid namespaces)
> >  * an application tries to lock a mutex owned by another application
> >    with the same TID in a different pid namespace: this is a real
> >    issue, because it can happen even when the application is correct
> > 
> > I know that pid namespace isolation usually also comes with ipc
> > namespace isolation, but it is not a violation to have one without the
> > other. Wouldn't it be a good idea to figure out a safe way to use
> > robust mutexes shared across different pid namespaces?
> 
> I do not consider this a reasonable expenditure of complexity
> whatsoever. It would require at least having a new robust list
> protocol, with userspace having to support both the old and new ones
> adapting at runtime, and may even require larger-than-wordsize
> atomics, which are not something you can assume exists. All of this
> for the explicit purpose of *violating* the whole intended purpose of
> namespaces: the isolation.
> 
> For cases where you really need cross-ns locking, you could use sysv
> semaphores if the sysvipc namespace is shared. If it's not, you could
> use fcntl OFD locks on a shared file descriptor, which should have
> your needed robustness properties.
> 
> Rich

Unfortunately, it is not possible to say which variables need cross-ns
locking and which do not. This means that we would have to treat them all
the same way and replace all the mutexes with sysv semaphores, but this
has some costs: locking a sysv semaphore always requires a syscall and a
context switch between user and kernel space, even when there is no
contention, and moreover they imply the presence of accessible files.

We basically use a chunk of shared memory as a storage area where
variables can be added, read and written by the various applications.
Since the mutexes protecting the variables are embedded in the same chunk
of shared memory, a single mmap is all an application needs to access the
storage.

Up to now, all applications were running in the same pid namespace, but
now, for some products, we need to integrate a third-party application
that requires a certain degree of isolation, so we opted to containerize
it, and that is why I asked for clarification.

I get your point that sharing robust pthread_mutex_t instances violates
pid namespace isolation, but you choose the degree of isolation by
balancing risks and benefits. Even with a new mount namespace you can
decide to bind mount some parts of the filesystem, for instance to allow
access to parts of the host flash, and the same can happen with the
network.

Long story short, I'm admittedly biased, but I think it would be good to
have POSIX robust shared mutexes working across different pid
namespaces. It would let users apply a really powerful tool to the
containerized applications that need it (again, I'm biased).

If there's any idea on how to achieve this, I'd really like to work on
it. For instance, could we limit the maximum number of pids that can run
in a pid namespace, so that some bits of the tid stored in the robust
list could be used to identify the namespace?

In the meantime, I'll certainly try what you suggested.

Thanks,
Daniele.