musl - Re: TLS (thread-local storage) support

Message-ID: <20121016225438.GT254@brightrain.aerifal.cx>
Date: Tue, 16 Oct 2012 18:54:39 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: TLS (thread-local storage) support

On Tue, Oct 16, 2012 at 11:47:52PM +0200, boris brezillon wrote:
> 2012/10/16 boris brezillon <b.brezillon.musl@...il.com>:
> > Hi,
> >
> > First I'd like to thank Rich for adding TLS support (I started to work
> > on it a few weeks ago but never had time to finish it).
> >
> > 2012/10/6 Daniel Cegiełka <daniel.cegielka@...il.com>:
> >> 2012/10/5 Rich Felker <dalias@...ifal.cx>:
> >>> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote:
> >>>> great news! Finally able to compile Go (lang)...
> >>>
> >>> Did Go fail with gcc's emulated TLS in libgcc?
> >>
> >> I tested Go with sabotage (with fresh musl). I'll try to do it again...
> >> gcc in sabotage was compiled without support for TLS, so I didn't
> >> expect it to be successful:
> >>
> >> https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4
> >>
> > There's at least one thing (maybe more) missing for Go support with
> > musl: gcc 'split-stack' support (see http://blog.nella.org/?p=849 and
> > http://gcc.gnu.org/wiki/SplitStacks).
> >
> > I'm also interested in split stack support in musl but for other
> > reasons (thread and coroutine stack automatic expansion).
> >
> > For x86/x86_64, split stack is implemented using a field inside the
> > pthread struct, which is accessed via %gs (or %fs for x86_64) and an
> > offset.
> >
> > Currently this offset is defined as 0x30 (0x70 for x86_64) by
> > TARGET_THREAD_SPLIT_STACK_OFFSET, but only if TARGET_LIBC_PROVIDES_SSP
> > is defined (see gcc/config/i386/gnu-user.h or
> > gcc/config/i386/gnu-user64.h).
> >
> > As far as I know musl does not support stack protection, but we could
> > at least patch gcc to define TARGET_THREAD_SPLIT_STACK_OFFSET when
> > using musl.
> >
> > We also need to reserve a field in the musl pthread struct. There are
> > currently two fields named 'unused1' and 'unused2' but I'm not sure
> > they're really unused in every supported arch.
> >
> >
> > BTW, I'd like to work on more integrated support for split stack in musl:

I'm not a fan of split-stack for various reasons, but I have no
objection to adding support to make it work as long as it's an
optional feature that does not impair non-split-stack usage.
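
[A minimal sketch of the reserved-field scheme described in the quoted
text, for illustration only. The struct demo_tcb and both helper
functions are hypothetical; this is not musl's struct pthread and not
gcc's generated code. It only shows how padding a thread descriptor
with pointer-sized slots puts a stack-limit field at the offsets quoted
above (0x30 on 32-bit x86, 0x70 on x86_64), and roughly what a
function-entry check against that field amounts to.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets quoted in the thread: 0x30 on i386, 0x70 on x86_64. */
#define SPLIT_STACK_OFFSET (sizeof(void *) == 8 ? 0x70 : 0x30)

/* Hypothetical thread descriptor, NOT musl's real struct pthread. */
struct demo_tcb {
    /* Pointer-sized slots standing in for whatever precedes the field. */
    void *reserved[SPLIT_STACK_OFFSET / sizeof(void *)];
    /* The field the compiler-generated prologue would read. */
    void *split_stack_limit;
    void *more_fields[4];
};

/* Fail the build if the layout drifts away from the expected offset. */
static_assert(offsetof(struct demo_tcb, split_stack_limit) == SPLIT_STACK_OFFSET,
              "split_stack_limit is not at the offset gcc expects");

/* Report the offset of the reserved field within the descriptor. */
size_t split_stack_field_offset(void)
{
    return offsetof(struct demo_tcb, split_stack_limit);
}

/* Roughly what a prologue check does on a downward-growing stack:
 * more stack is needed once sp has dropped below the stored limit. */
int needs_more_stack(const struct demo_tcb *tcb, const void *sp)
{
    return (uintptr_t)sp < (uintptr_t)tcb->split_stack_limit;
}
```

[The compile-time assertion is the point of the sketch: a libc that
reserves such a field would want the build to break if the struct
layout ever changed underneath the compiler's hard-coded offset.]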

> > 1) support in dynamic linker (see the last point of
> > http://gcc.gnu.org/wiki/SplitStacks) : check split stack notes in
> > shared libs (and the program?)

It could be done, but is it really useful? There are infinitely many
ways you can crash a program with libraries that were not built
correctly for use with it. Checking for one of them seems like
gratuitous complexity with little benefit.

> > 2) support in thread implementation : currently when a thread is
> > created the stack limit is set afterward (see
> > https://github.com/mirrors/gcc/blob/master/libgcc/generic-morestack-thread.c
> > and https://github.com/mirrors/gcc/blob/master/libgcc/config/i386/morestack.S)
> > and the stack size is assumed to be 16K (which is the minimum stack
> > size). This means we may allocate a new stack chunk even if the
> > previous one (the first one) is not fully used.
> > If the stack limit is set by the thread implementation, it can be set
> > appropriately according to the stack size requested by the thread
> > creator.

That's perfectly reasonable to support.
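
[A hedged sketch of the difference point 2 describes, using plain
address arithmetic. All three function names and the headroom
parameter are hypothetical; neither libgcc nor musl exposes them. The
first function models the fallback of assuming 16K below the current
stack pointer; the second models a thread implementation that knows
the real stack mapping and can place the limit near its low end.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Initial stack size assumed when the real size is unknown (per the
 * quoted text). */
#define ASSUMED_INITIAL_STACK ((size_t)16 * 1024)

/* Fallback: set the limit a fixed 16K below the current stack pointer. */
uintptr_t limit_assuming_16k(uintptr_t sp)
{
    return sp - ASSUMED_INITIAL_STACK;
}

/* Thread implementation knows the mapping: place the limit just above
 * the low end of the stack, keeping `headroom` bytes in reserve for
 * code that was not built with split-stack checks. */
uintptr_t limit_from_mapping(uintptr_t stack_low, size_t stack_size,
                             size_t headroom)
{
    assert(headroom < stack_size);
    return stack_low + headroom;
}

/* Bytes the thread can use before a new stack chunk would be requested. */
size_t usable_before_growth(uintptr_t sp, uintptr_t limit)
{
    return sp > limit ? (size_t)(sp - limit) : 0;
}
```

[For a 1 MiB thread stack with 8K of headroom, the second approach
lets the thread use nearly the whole mapping before any new chunk is
allocated, whereas the fallback triggers after 16K regardless of how
large the stack really is.]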

> > 3) more optimizations I haven't thought about yet...
> >
> 4) Compile musl with '-fsplit-stack' and add the no_split_stack attribute
> to appropriate functions (at least all functions called before
> pthread_self_init, because the %gs or %fs register is unusable before
> this call).

This is definitely not desirable, at least not by default. It hurts
performance, possibly a lot, and destroys async-signal-safety. Also I
doubt it's needed. As long as split stack mode leaves at least ~8k
when calling a new function, most if not all functions in musl should
run fine without needing support for enlarging the stack.

> 5) set the main thread stack limit to 0 (pthread_self_init): growth of
> the main thread's stack is handled by the kernel.
> 
> 6) add no-split-stack note to every asm file.

I'm against this, or any boilerplate clutter. If it's really needed,
it should be possible with CFLAGS (or "ASFLAGS"), rather than
modifying every file, and if there's no way to do it with command line
options, that's a bug in gas.

With that said, why would it be needed? I don't think there are any
asm files that use more than 32 bytes of stack...

> 7) make split stack support optional (either by checking for the
> -fsplit-stack option in CFLAGS or with a specific option:
> --enable-split-stack): split stack adds overhead to every function
> (except for those with the 'no_split_stack' attribute).
> 
> > Do you have any concerns about adding those features to musl?

Basically, the whole idea of split-stack is antithetical to the QoI
guarantees of musl. A program using split-stack can crash at any time
due to out-of-memory, and there is no reliable/portable way to recover
from this condition. It's much like the following low-quality aspects
of glibc and default Linux config:

- overcommit
- lazy allocation of libc-internal storage
- lazy/on-demand allocation of TLS
- dynamic loading of libgcc_s.so at runtime in pthread_cancel
- etc.

On 64-bit machines, split-stack is 100% useless. You can get the same
behavior (crashing on OOM, but not having to know your stack size
ahead of time) by just turning on overcommit and using huge thread
stack sizes; the enormous 64-bit virtual address space makes it so you
don't have to worry about running out of virtual memory.
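
[A minimal sketch of the 64-bit alternative described above: request a
large stack up front with the standard pthread_attr_setstacksize call
and let the kernel back pages on demand. The helper name
run_with_stack, the struct job, and the sizes in the usage note are
illustrative assumptions, and whether untouched pages stay unbacked
depends on the system's overcommit configuration.]

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct job {
    size_t depth;    /* how many frames to recurse */
    size_t reached;  /* filled in by the thread */
};

/* Each frame touches a 1 KiB buffer so the stack really grows. */
static size_t recurse(size_t depth)
{
    volatile char pad[1024];
    pad[0] = (char)depth;
    if (depth == 0)
        return 0;
    size_t below = recurse(depth - 1);
    /* Read pad after the call so the frame cannot be discarded early. */
    return below + 1 + (size_t)(pad[0] != (char)depth);
}

static void *thread_main(void *arg)
{
    struct job *j = arg;
    j->reached = recurse(j->depth);
    return NULL;
}

/* Run the recursion on a thread whose stack is `stack_size` bytes.
 * Returns 0 on success or a pthread error number. */
int run_with_stack(size_t stack_size, size_t depth, size_t *reached)
{
    pthread_attr_t attr;
    pthread_t tid;
    struct job j = { depth, 0 };

    assert(reached != NULL);
    int err = pthread_attr_init(&attr);
    if (err)
        return err;
    err = pthread_attr_setstacksize(&attr, stack_size);
    if (!err)
        err = pthread_create(&tid, &attr, thread_main, &j);
    pthread_attr_destroy(&attr);
    if (err)
        return err;
    err = pthread_join(tid, NULL);
    if (!err)
        *reached = j.reached;
    return err;
}
```

[For example, run_with_stack((size_t)64 << 20, 16384, &n) recurses
through roughly 17 MiB of frames on a 64 MiB stack. The caller never
has to know the real stack requirement in advance, as long as the
reservation is generous, and only the pages actually touched need
physical memory.]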

On 32-bit machines where virtual addresses are a precious resource,
split-stack is a clever hack that essentially allows you to
over-commit not just physical memory but virtual memory too. But it's
inherently non-robust, and even worse than physical memory overcommit.
At least in the latter case, the kernel can be intelligent about
choosing an "abusive" process to kill. But if you run out of virtual
memory, nothing can be done except terminate the whole process (you
can't just terminate a single thread because it will leave resources
in an inconsistent state).

As such, I'm willing to add whatever inexpensive support framework is
needed so that people who want to use split-stack can use it, but I'm
very wary of invasive or costly changes to support a feature which I
believe is fundamentally misguided (and, for 64-bit targets, utterly
useless).

Rich