|
Message-ID: <20230601200806.GM4163@brightrain.aerifal.cx> Date: Thu, 1 Jun 2023 16:08:07 -0400 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Re: [PATCH] [RFC] make clone() usable On Thu, Jun 01, 2023 at 01:12:57PM +0300, Alexey Izbyshev wrote: > On 2023-05-31 02:35, Rich Felker wrote: > >As discussed before (see the 2021 thread "Incorrect thread TID > >caching") clone() has been effectively unusable because it produces a > >child process in invalid state, with wrong tid in its thread > >structure, among other problems. > > > >The attached proposed patch attempts to make clone() usable by having > >it share the _Fork logic for establishing a consistent process state > >after forking, and also blocks use of flags which produce invalid > >state. > > > Tangentially, while thinking about this, I realized that because the > kernel clears the address of the exit futex on clone and _Fork > doesn't re-establish it, the following blocks forever: > > void *thr(void *arg) { > // Blocks in __tl_sync because __thread_list_lock is never > unlocked > pthread_join(*(pthread_t *)arg, NULL)); > return NULL; > } > > int main() { > if (!fork()) { > static pthread_t pt; > pt = pthread_self(); > pthread_create(&(pthread_t){0}, NULL, thr, &pt); > pthread_exit(NULL); > } > wait(NULL); > } > > This could be fixed by the following change in _Fork.c: > > - self->tid = __syscall(SYS_gettid); > + self->tid = __syscall(SYS_set_tid_address, > &__thread_list_lock); Wow, I had no idea Linux's fork cleared the exit futex address. That's a big bug we've had around for nobody to have noticed... > >With CLONE_VM, the old behavior is kept, with a caveat that the child > >context is extremely restrictive, ala vfork. > > > >It was raised previously (pardon the pun) that raise() should perhaps > >be modified to make a SYS_gettid syscall rather than using the thread > >structure tid, so that it's possible to call raise() in a vfork > >context without signaling the wrong process. It might make sense to do > >that now too. > > > Yes, it would be nice to have raise and abort (which also reads the > cached tid separately) working in a vfork child. > > >diff --git a/src/linux/clone.c b/src/linux/clone.c > >index 8c1af7d3..e8f76bbc 100644 > >--- a/src/linux/clone.c > >+++ b/src/linux/clone.c > >@@ -4,18 +4,55 @@ > > #include <sched.h> > > #include "pthread_impl.h" > > #include "syscall.h" > >+#include "lock.h" > >+#include "fork_impl.h" > >+ > >+struct clone_start_args { > >+ int (*func)(void *); > >+ void *arg; > >+ sigset_t sigmask; > >+}; > >+ > >+static int clone_start(void *arg) > >+{ > >+ struct clone_start_args *csa = arg; > >+ __post_Fork(0); > >+ __restore_sigs(&csa->sigmask); > >+ return csa->func(csa->arg); > >+} > > > > int clone(int (*func)(void *), void *stack, int flags, void *arg, ...) > > { > >+ struct clone_start_args csa; > > va_list ap; > >- pid_t *ptid, *ctid; > >- void *tls; > >+ pid_t *ptid = 0, *ctid = 0; > >+ void *tls = 0; > >+ > >+ /* Flags that produce an invalid thread/TLS state are disallowed. */ > >+ int badflags = CLONE_THREAD | CLONE_SETTLS | CLONE_CHILD_CLEARTID; > >+ if (flags & badflags) > >+ return __syscall_ret(-EINVAL); > > > Maybe also check for NULL stack, as proposed in > https://www.openwall.com/lists/musl/2022/08/02/1 ? I guess that can be done. Normally we don't check for null pointers, but I think the syscall would silently keep the existing stack pointer if passed null rather than blowing up, which we probably don't want -- even without CLONE_VM, where it'd blow up, we'd be giving appears-to-work behavior without a contract for it to work. > > va_start(ap, arg); > >- ptid = va_arg(ap, pid_t *); > >- tls = va_arg(ap, void *); > >- ctid = va_arg(ap, pid_t *); > >+ if (flags & (CLONE_PIDFD | CLONE_PARENT_SETTID | CLONE_CHILD_SETTID)) > >+ ptid = va_arg(ap, pid_t *); > >+ if (flags & CLONE_CHILD_SETTID) { > >+ tls = va_arg(ap, void *); > >+ ctid = va_arg(ap, pid_t *); > >+ } > > va_end(ap); > > > >- return __syscall_ret(__clone(func, stack, flags, arg, ptid, tls, > >ctid)); > >+ if (flags & CLONE_VM) > >+ return __syscall_ret(__clone(func, stack, flags, arg, ptid, > >tls, ctid)); > >+ > >+ __block_all_sigs(&csa.sigmask); > >+ LOCK(__abort_lock); > >+ > >+ csa.func = func; > >+ csa.arg = arg; > >+ int ret = __clone(clone_start, stack, flags, &csa, ptid, tls, ctid); > >+ > >+ __post_Fork(ret); > >+ __restore_sigs(&csa.sigmask); > >+ return __syscall_ret(ret); > > } > > Another thing that might be somewhat relevant if we expect to see > wider usage of clone is that syscall (the public function) currently > blindly extracts six arguments via va_arg, which on some > architectures requires sufficient stack space not to crash. So on > i386 the following silly function will crash if passed to clone, > provided that "stack" points to the beginning of unmapped space past > the stack page end. That's changed in the patch. It only calls va_arg for arguments whose existence the flags implies. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.