|
|
Message-ID: <20201027211735.GV534@brightrain.aerifal.cx>
Date: Tue, 27 Oct 2020 17:17:35 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Status report and MT fork
On Sun, Oct 25, 2020 at 08:59:20PM -0400, Rich Felker wrote:
> On Sun, Oct 25, 2020 at 08:50:29PM -0400, Rich Felker wrote:
> > I just pushed a series of changes (up through 0b87551bdf) I've had
> > queued for a while now, some of which had minor issues that I think
> > have all been resolved now. They cover a range of bugs found in the
> > process of reviewing the possibility of making fork provide a
> > consistent execution environment for the child of a multithreaded
> > parent, and a couple unrelated fixes.
> >
> > Based on distro experience with musl 1.2.1, I'm working on getting the
> > improved fork into 1.2.2. Despite the fact that 1.2.1 did not break
> > anything that wasn't already broken (apps invoking UB in MT-forked
> > children), prior to it most of the active breakage was hit with very
> > low probability, so there were a lot of packages people *thought* were
> > working, that weren't, and feedback from distros seems to be that
> > getting everything working as reliably as before (even if it was
> > imperfect and dangerous before) is not tractable in any reasonable
> > time frame. And in particular, I'm concerned about language runtimes
> > like Ruby that seem to have a contract with applications they host to
> > support MT-forked children. Fixing these is not a matter of fixing a
> > finite set of bugs but fixing a contract, which is likely not
> > tractable.
> >
> > Assuming it goes through, the change here will be far more complete
> > than glibc's handling of MT-forked children, where most things other
> > than malloc don't actually work, but fail sufficiently infrequently
> > that they seem to work. While there are a lot of things I dislike
> > about this path, one major thing I do like is that it really makes
> > internal use of threads by library code (including third party libs)
> > transparent to the application, rather than "transparent, until you
> > use fork".
> >
> > Will follow up with draft patch for testing.
>
> Patch attached. It should suffice for testing and review of whether
> there are any locks/state I overlooked. It could possibly be made less
> ugly too.
>
> Note that this does not strictly conform to past and current POSIX
> that specify fork as AS-safe. POSIX-future has removed fork from the
> AS-safe list, and seems to have considered its original inclusion
> erroneous due to pthread_atfork and existing implementation practice.
> The patch as written takes care to skip all locking in single-threaded
> parents, so it does not break AS-safety property in single-threaded
> programs that may have made use of it. -Dfork=_Fork can also be used
> to get an AS-safe fork, but it's not equivalent to old semantics;
> _Fork does not run atfork handlers. It's also possible to static-link
> an alternate fork implementation that provides its own pthread_atfork
> and calls _Fork, if really needed for a particular application.
>
> Feedback from distro folks would be very helpful -- does this fix all
> the packages that 1.2.1 "broke"?
Another bug:
> diff --git a/src/process/fork.c b/src/process/fork.c
> index a12da01a..ecf7f376 100644
> --- a/src/process/fork.c
> +++ b/src/process/fork.c
> @@ -1,13 +1,81 @@
> #include <unistd.h>
> #include "libc.h"
> +#include "lock.h"
> +#include "pthread_impl.h"
> +#include "fork_impl.h"
> +
> +static volatile int *const dummy_lockptr = 0;
> +
> +weak_alias(dummy_lockptr, __at_quick_exit_lockptr);
> +weak_alias(dummy_lockptr, __atexit_lockptr);
> +weak_alias(dummy_lockptr, __dlerror_lockptr);
> +weak_alias(dummy_lockptr, __gettext_lockptr);
> +weak_alias(dummy_lockptr, __random_lockptr);
> +weak_alias(dummy_lockptr, __sem_open_lockptr);
> +weak_alias(dummy_lockptr, __stdio_ofl_lockptr);
> +weak_alias(dummy_lockptr, __syslog_lockptr);
> +weak_alias(dummy_lockptr, __timezone_lockptr);
> +weak_alias(dummy_lockptr, __bump_lockptr);
> +
> +weak_alias(dummy_lockptr, __vmlock_lockptr);
> +
> +static volatile int *const *const atfork_locks[] = {
> + &__at_quick_exit_lockptr,
> + &__atexit_lockptr,
> + &__dlerror_lockptr,
> + &__gettext_lockptr,
> + &__random_lockptr,
> + &__sem_open_lockptr,
> + &__stdio_ofl_lockptr,
> + &__syslog_lockptr,
> + &__timezone_lockptr,
> + &__bump_lockptr,
> +};
>
> static void dummy(int x) { }
> weak_alias(dummy, __fork_handler);
> +weak_alias(dummy, __malloc_atfork);
> +weak_alias(dummy, __ldso_atfork);
> +
> +static void dummy_0(void) { }
> +weak_alias(dummy_0, __tl_lock);
> +weak_alias(dummy_0, __tl_unlock);
>
> pid_t fork(void)
> {
> + sigset_t set;
> __fork_handler(-1);
> + __block_app_sigs(&set);
> + int need_locks = libc.need_locks > 0;
> + if (need_locks) {
> + __ldso_atfork(-1);
> + __inhibit_ptc();
> + for (int i=0; i<sizeof atfork_locks/sizeof *atfork_locks; i++)
> + if (atfork_locks[i]) LOCK(*atfork_locks[i]);
^^^^^^^^^^^^^^^
Always non-null because it's missing a level of indirection; causes
static linked program to crash. Should be if (*atfork_locks[i]).
> + __malloc_atfork(-1);
> + __tl_lock();
> + }
> + pthread_t self=__pthread_self(), next=self->next;
> pid_t ret = _Fork();
> + if (need_locks) {
> + if (!ret) {
> + for (pthread_t td=next; td!=self; td=td->next)
> + td->tid = -1;
> + if (__vmlock_lockptr) {
> + __vmlock_lockptr[0] = 0;
> + __vmlock_lockptr[1] = 0;
> + }
> + }
> + __tl_unlock();
> + __malloc_atfork(!ret);
> + for (int i=0; i<sizeof atfork_locks/sizeof *atfork_locks; i++)
> + if (atfork_locks[i])
^^^^^^^^^^^^^^^
And same here.
> + if (ret) UNLOCK(*atfork_locks[i]);
> + else **atfork_locks[i] = 0;
> + __release_ptc();
> + __ldso_atfork(!ret);
> + }
> + __restore_sigs(&set);
> __fork_handler(!ret);
> return ret;
> }
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.