Message-ID: <433bc1021c8bdcc1c2b17c5fd58d6e19ec144624.camel@gmail.com>
Date: Tue, 11 Feb 2025 14:53:22 +0100
From: Daniele Personal <d.dario76@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: Florian Weimer <fweimer@...hat.com>, musl@...ts.openwall.com
Subject: Re: pthread_mutex_t shared between processes with different pid namespaces

On Tue, 2025-02-11 at 06:38 -0500, Rich Felker wrote:
> On Tue, Feb 11, 2025 at 10:34:30AM +0100, Daniele Personal wrote:
> > On Mon, 2025-02-10 at 13:14 -0500, Rich Felker wrote:
> > > On Mon, Feb 10, 2025 at 05:12:52PM +0100, Daniele Personal wrote:
> > > > On Sat, 2025-02-08 at 09:52 -0500, Rich Felker wrote:
> > > > > On Sat, Feb 08, 2025 at 03:40:18PM +0100, Daniele Dario wrote:
> > > > > > On Sat, 8 Feb 2025, 13:39 Rich Felker <dalias@...c.org> wrote:
> > > > > >
> > > > > > > On Sat, Feb 08, 2025 at 10:20:45AM +0100, Daniele Dario wrote:
> > > > > > > > But wouldn't this mean that robust mutexes functionality
> > > > > > > > is totally incompatible with pid namespaces?
> > > > > > >
> > > > > > > No, only with trying to synchronize *across* different pid
> > > > > > > namespaces.
> > > > > > >
> > > > > > > > If the kernel relies on tid stored in memory by the
> > > > > > > > process this always lacks the information about the pid
> > > > > > > > namespace the tid belongs to.
> > > > > > >
> > > > > > > It's necessarily within the same pid namespace as the
> > > > > > > process itself.
> > > > > > >
> > > > > > > Functionally, you should consider different pid namespaces
> > > > > > > as different systems that happen to be capable of sharing
> > > > > > > some resources.
> > > > > > >
> > > > > > > Rich
> > > > > > >
> > > > > >
> > > > > > Yes, I'm just saying that sharing pthread_mutex_t instances
> > > > > > across processes within the same pid namespace, but on a
> > > > > > system with more than one pid namespace, could lead to issues
> > > > > > anyway if the kernel uses the stored tid value to decide whom
> > > > > > to contact without knowing which pid namespace it belongs to.
> > > > > >
> > > > > > I'm not saying this is true, I'm trying to understand and, if
> > > > > > possible, improve things.
> > > > >
> > > > > That's not a problem. The stored tid is used only in the context
> > > > > of a process exiting, where the kernel code knows the relevant
> > > > > pid namespace (the one the exiting process is in) and uses the
> > > > > tid relative to that. If it didn't work this way, it would be a
> > > > > fatal bug in the pid namespace implementation, which is supposed
> > > > > to allow essentially transparent containerization (which
> > > > > includes processes in the ns being able to use their tids as
> > > > > they could if they were outside of any container/in global ns).
> > > > >
> > > > > Rich
> > > > >
> > > >
> > > > So, IIUC, the problem of sharing robust pthread_mutex_t instances
> > > > across different pid namespaces is on the user space side which is
> > > > not able to distinguish clashes on TIDs. In particular, problems
> > > > could arise when:
> > >
> > > No, it is not "on the user side". The user side can be modified
> > > arbitrarily, and, modulo some cost, could surely be made to work for
> > > non-robust process-shared mutexes. The problem is that the kernel --
> > > the part which makes them robust -- has to honor the protocol, and
> > > the protocol does not admit distinguishing "pid N in ns X" from
> > > "pid N in ns Y".
> >
> > Ah, I thought your previous sentence was saying that the kernel is
> > able to make this distinction.
>
> No, it's able to make the *assumption* that the namespace the tid is
> relative to is that of the dying process. That's what lets it work
> (and a large part of why namespaces were practical to add to Linux to
> begin with -- all of the existing interfaces that use pids/tids need
> to know which namespace you're talking about, but they work because
> the kernel can assume "same namespace as the executing task").
>
> > Unfortunately it is not possible to say which variables need
> > cross-ns locking and which do not. This means that we should treat
> > them all in the same way and so replace all the mutexes with sysv
> > semaphores, but this has some costs: locking a sysv semaphore always
> > requires a syscall and a context switch between user and kernel
> > space even if there's no contention and, moreover, sysv semaphores
> > imply the presence of accessible files.
> >
> > We basically use a chunk of shared memory as a storage where
> > variables could be added/read/written by the various applications.
> > Since the mutexes used to protect the variables are embedded in the
> > same chunk of shared memory, only one mmap is needed for an
> > application to access the storage.
> >
> > Up to now, applications were running in the same pid namespace but
> > now, for some products, we needed to integrate a 3rd party
> > application and this requires a certain degree of isolation, so we
> > opted to containerize this application, and here we come to why I
> > asked for clarifications.
> >
> > I get your point when you say that sharing robust pthread_mutex_t
> > instances violates the pid namespace isolation, but you choose the
> > degree of isolation balancing the risks and the benefits. Even if
> > you have a new mount namespace you can decide to bind mount some
> > parts of the filesystem to allow access to parts of the host flash,
> > for instance; the same could happen with the network.
> >
> > Long story short, I'm pulling water to my mill, but I think that
> > it's not bad to have posix robust shared mutexes working across
> > different pid namespaces. It will allow users to use a really
> > powerful tool also with containerized applications (again pulling
> > water to my mill) which need it.
>
> Generally we implement nonstandard functionality only on the basis of
> strong historical precedent, need by multiple major real-world
> applications, lack of cost imposed onto everyone else who doesn't
> want/need the functionality, and other similar conditions. On all of
> these axes, the thing you're asking for is completely in the opposite
> direction.
>
> > If there's any idea on how to gain this I'd really work on it:
> > limiting the max number of pids which could run on a pid namespace
> > to allow the use of some bits for the ns in the tid stored in the
> > robust list, for instance?
>
> This is something where you're on your own either writing it or
> hiring someone to do so and maintaining your forks of musl and the
> kernel. There is just no way this kind of hack ever belongs upstream.
>
> Rich

Thanks for the time you spent on this, I really appreciate it.

Daniele.
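
For readers following the thread, here is a minimal sketch of the setup discussed above: a robust, process-shared pthread_mutex_t embedded in the same mmap'd chunk as the data it protects, exercised within a single pid namespace (the supported case). The names `shared_store`, `store_create`, `store_lock` and `robust_shm_demo` are hypothetical and not taken from the thread, and an anonymous shared mapping plus fork() stands in for the real shared-memory object. It shows the EOWNERDEAD recovery path only; it does not demonstrate, and cannot fix, the cross-namespace case.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical layout: the mutex lives in the same shared chunk as the data. */
struct shared_store {
    pthread_mutex_t lock;
    int value;
};

/* Map a shared chunk and initialise a robust, process-shared mutex in it. */
static struct shared_store *store_create(void)
{
    struct shared_store *s = mmap(NULL, sizeof *s, PROT_READ | PROT_WRITE,
                                  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (s == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(s, sizeof *s);
        return NULL;
    }
    int rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(&s->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(s, sizeof *s);
        return NULL;
    }
    s->value = 0;
    return s;
}

/* Lock the store. Returns 0 on a normal lock, 1 if the previous owner died
 * and the mutex was marked consistent again, -1 on any other error. */
static int store_lock(struct shared_store *s)
{
    assert(s != NULL);
    int rc = pthread_mutex_lock(&s->lock);
    if (rc == EOWNERDEAD) {
        /* A real application would validate or repair the data here. */
        if (pthread_mutex_consistent(&s->lock) != 0)
            return -1;
        return 1;
    }
    return rc == 0 ? 0 : -1;
}

/* Fork a child that exits while holding the lock, then lock from the parent.
 * Returns 1 if the parent saw the owner-died recovery path, 0 if it did not,
 * -1 on setup failure. */
int robust_shm_demo(void)
{
    struct shared_store *s = store_create();
    if (s == NULL)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(s, sizeof *s);
        return -1;
    }
    if (pid == 0) {
        if (pthread_mutex_lock(&s->lock) != 0)
            _exit(1);
        s->value = 42;
        _exit(0); /* exit without unlocking: the kernel walks the robust list */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(s, sizeof *s);
        return -1;
    }

    int recovered = store_lock(s);
    int value = s->value;
    if (recovered >= 0)
        pthread_mutex_unlock(&s->lock);
    pthread_mutex_destroy(&s->lock);
    munmap(s, sizeof *s);
    return (recovered == 1 && value == 42) ? 1 : 0;
}
```

Calling `robust_shm_demo()` from a `main` (linking with `-pthread` where the C library requires it) exercises the recovery path. Parent and child share a pid namespace here, so the tid stored in the mutex is interpreted in the namespace of the dying process, which is the assumption Rich describes the kernel as making.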