|
|
Message-ID: <20230601200806.GM4163@brightrain.aerifal.cx>
Date: Thu, 1 Jun 2023 16:08:07 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] [RFC] make clone() usable
On Thu, Jun 01, 2023 at 01:12:57PM +0300, Alexey Izbyshev wrote:
> On 2023-05-31 02:35, Rich Felker wrote:
> >As discussed before (see the 2021 thread "Incorrect thread TID
> >caching") clone() has been effectively unusable because it produces a
> >child process in invalid state, with wrong tid in its thread
> >structure, among other problems.
> >
> >The attached proposed patch attempts to make clone() usable by having
> >it share the _Fork logic for establishing a consistent process state
> >after forking, and also blocks use of flags which produce invalid
> >state.
> >
> Tangentially, while thinking about this, I realized that because the
> kernel clears the address of the exit futex on clone and _Fork
> doesn't re-establish it, the following blocks forever:
>
> void *thr(void *arg) {
> // Blocks in __tl_sync because __thread_list_lock is never
> unlocked
> pthread_join(*(pthread_t *)arg, NULL));
> return NULL;
> }
>
> int main() {
> if (!fork()) {
> static pthread_t pt;
> pt = pthread_self();
> pthread_create(&(pthread_t){0}, NULL, thr, &pt);
> pthread_exit(NULL);
> }
> wait(NULL);
> }
>
> This could be fixed by the following change in _Fork.c:
>
> - self->tid = __syscall(SYS_gettid);
> + self->tid = __syscall(SYS_set_tid_address,
> &__thread_list_lock);
Wow, I had no idea Linux's fork cleared the exit futex address. That's
a big bug we've had around for nobody to have noticed...
> >With CLONE_VM, the old behavior is kept, with a caveat that the child
> >context is extremely restrictive, ala vfork.
> >
> >It was raised previously (pardon the pun) that raise() should perhaps
> >be modified to make a SYS_gettid syscall rather than using the thread
> >structure tid, so that it's possible to call raise() in a vfork
> >context without signaling the wrong process. It might make sense to do
> >that now too.
> >
> Yes, it would be nice to have raise and abort (which also reads the
> cached tid separately) working in a vfork child.
>
> >diff --git a/src/linux/clone.c b/src/linux/clone.c
> >index 8c1af7d3..e8f76bbc 100644
> >--- a/src/linux/clone.c
> >+++ b/src/linux/clone.c
> >@@ -4,18 +4,55 @@
> > #include <sched.h>
> > #include "pthread_impl.h"
> > #include "syscall.h"
> >+#include "lock.h"
> >+#include "fork_impl.h"
> >+
> >+struct clone_start_args {
> >+ int (*func)(void *);
> >+ void *arg;
> >+ sigset_t sigmask;
> >+};
> >+
> >+static int clone_start(void *arg)
> >+{
> >+ struct clone_start_args *csa = arg;
> >+ __post_Fork(0);
> >+ __restore_sigs(&csa->sigmask);
> >+ return csa->func(csa->arg);
> >+}
> >
> > int clone(int (*func)(void *), void *stack, int flags, void *arg, ...)
> > {
> >+ struct clone_start_args csa;
> > va_list ap;
> >- pid_t *ptid, *ctid;
> >- void *tls;
> >+ pid_t *ptid = 0, *ctid = 0;
> >+ void *tls = 0;
> >+
> >+ /* Flags that produce an invalid thread/TLS state are disallowed. */
> >+ int badflags = CLONE_THREAD | CLONE_SETTLS | CLONE_CHILD_CLEARTID;
> >+ if (flags & badflags)
> >+ return __syscall_ret(-EINVAL);
> >
> Maybe also check for NULL stack, as proposed in
> https://www.openwall.com/lists/musl/2022/08/02/1 ?
I guess that can be done. Normally we don't check for null pointers,
but I think the syscall would silently keep the existing stack pointer
if passed null rather than blowing up, which we probably don't want --
even without CLONE_VM, where it'd blow up, we'd be giving
appears-to-work behavior without a contract for it to work.
> > va_start(ap, arg);
> >- ptid = va_arg(ap, pid_t *);
> >- tls = va_arg(ap, void *);
> >- ctid = va_arg(ap, pid_t *);
> >+ if (flags & (CLONE_PIDFD | CLONE_PARENT_SETTID | CLONE_CHILD_SETTID))
> >+ ptid = va_arg(ap, pid_t *);
> >+ if (flags & CLONE_CHILD_SETTID) {
> >+ tls = va_arg(ap, void *);
> >+ ctid = va_arg(ap, pid_t *);
> >+ }
> > va_end(ap);
> >
> >- return __syscall_ret(__clone(func, stack, flags, arg, ptid, tls,
> >ctid));
> >+ if (flags & CLONE_VM)
> >+ return __syscall_ret(__clone(func, stack, flags, arg, ptid,
> >tls, ctid));
> >+
> >+ __block_all_sigs(&csa.sigmask);
> >+ LOCK(__abort_lock);
> >+
> >+ csa.func = func;
> >+ csa.arg = arg;
> >+ int ret = __clone(clone_start, stack, flags, &csa, ptid, tls, ctid);
> >+
> >+ __post_Fork(ret);
> >+ __restore_sigs(&csa.sigmask);
> >+ return __syscall_ret(ret);
> > }
>
> Another thing that might be somewhat relevant if we expect to see
> wider usage of clone is that syscall (the public function) currently
> blindly extracts six arguments via va_arg, which on some
> architectures requires sufficient stack space not to crash. So on
> i386 the following silly function will crash if passed to clone,
> provided that "stack" points to the beginning of unmapped space past
> the stack page end.
That's changed in the patch. It only calls va_arg for arguments whose
existence the flags implies.
Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.