Message-ID: <20250824021820.GI1827@brightrain.aerifal.cx>
Date: Sat, 23 Aug 2025 22:18:20 -0400
From: Rich Felker <dalias@...c.org>
To: Demi Marie Obenour <demiobenour@...il.com>
Cc: musl@...ts.openwall.com, libc-alpha@...rceware.org
Subject: Re: Running code on all other threads (for sandboxing)

On Fri, Aug 22, 2025 at 09:34:55PM -0400, Demi Marie Obenour wrote:
> There are cases where it is highly desirable for a process
> to start out with full user rights (or at least close to them),
> initialize, and then drop these privileges using Linux kernel
> features like seccomp. Unfortunately, this breaks if the
> process uses third-party libraries that create threads during
> initialization. In particular, Mesa can do this, and there is
> no realistic alternative to it as Mesa is ~2 million lines of
> GPU compiler and driver code. Loading Mesa later is undesirable
> as it prevents removing all filesystem access.
>
> There are two ways to fix this problem:
>
> 1. Fix the problem in the Linux kernel.
> 2. Work around it in userspace, as is already done for setuid()
>    and friends.
>
> For the second, it should be sufficient to provide a function
> that runs a caller-provided function on each thread, while
> ensuring that the process is atomic with respect to other
> threads in the process. This function only needs to make
> system calls and crashes the process if there is an error.
> If the function uses anything that isn't a syscall or
> compiler builtin, it gets to keep both pieces.
>
> Is this something that would make sense to implement? I know
> that this problem has been an issue for Chromium on Linux.

I'm not sure what the right solution to this specific problem is, but
I don't think exposing a "run arbitrary code in each thread" as a
public API is a good choice. Such code would run in a context which is
worse/more-restrictive even than "async signal" context, making it
really difficult to define any reasonable class of "what you're
allowed to do here". I know you said "syscalls", but even that
requires defining what you mean by syscalls (raw via asm? via
syscall()? any function that's "traditionally just a syscall"?) and
further specifying which syscalls are actually allowed (any which
break the __synccall context assumptions would need to be forbidden).

I think there are potentially semi-portable solutions to your problem
that don't require such a big hammer as arbitrary __synccall. One that
comes to mind is installing a SECCOMP_RET_USER_NOTIF or
SECCOMP_RET_TRAP filter before loading Mesa. This could allow the
filesystem access to load Mesa libraries only until you set a flag
that loading has finished, then cause filesystem access syscalls to
fail once the flag has been set.

Another approach is doing what I'd call "manual __synccall" with your
own signal, which is better than exposing actual __synccall because
the application code does not run in an invalid-libc context, but this
would only work if Mesa's hidden threads don't mask signals. A library
creating its own threads behind the scenes *should* be masking all
signals, so this probably doesn't work. Even if Mesa botched it, you
wouldn't want to preclude them fixing it.

There is probably also a way to do this with ptrace, which blocked
signals wouldn't interfere with, but that gets really nasty really
quick.

Unfortunately there don't seem to be any ways to inject new seccomp
filters into another task (even a thread of your own process)
directly. This is what Linux really should be offering here.

Rich
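
For illustration only, the "manual __synccall" idea described above
might be sketched in C roughly as follows. This is not code from the
thread and not a musl or glibc API. The names run_on_all_threads,
on_broadcast and acked are hypothetical, as are the choice to list
threads through /proc/self/task and the one-second wait. The handler
only bumps a counter, where a real one would make its sandboxing
syscall. As noted above, any thread that blocks the signal never runs
the handler, and a thread created during the scan is missed.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical names throughout; not part of any libc. */
static atomic_int acked;

/* Runs on each thread that receives the signal. Only async-signal-safe
 * work belongs here; a real handler would make its sandboxing syscall
 * before acknowledging. */
static void on_broadcast(int sig)
{
    (void)sig;
    atomic_fetch_add(&acked, 1);
}

/* Signals every thread listed in /proc/self/task, then waits up to
 * about one second for acknowledgements. Returns the number of threads
 * that ran the handler, or -1 if setup failed. Threads that block `sig`
 * never acknowledge, so the result can be lower than the thread count. */
static int run_on_all_threads(int sig)
{
    struct sigaction sa = {0};
    sa.sa_handler = on_broadcast;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0)
        return -1;

    DIR *dir = opendir("/proc/self/task");
    if (!dir)
        return -1;

    atomic_store(&acked, 0);
    int sent = 0;
    pid_t pid = getpid();
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0')
            continue; /* skips "." and ".." */
        /* tgkill targets one specific thread of this process. */
        if (syscall(SYS_tgkill, pid, (pid_t)tid, sig) == 0)
            sent++;
    }
    closedir(dir);

    struct timespec ts = {0, 1000000}; /* 1 ms per poll */
    for (int i = 0; i < 1000 && atomic_load(&acked) < sent; i++)
        nanosleep(&ts, NULL);
    return atomic_load(&acked);
}
```

Calling run_on_all_threads(SIGUSR1) from a single-threaded process
should return 1, since the calling thread signals itself. A return
value smaller than the number of entries in /proc/self/task would
indicate threads that have the signal blocked, which is the failure
mode described above.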