Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180527001706.GF1392@brightrain.aerifal.cx>
Date: Sat, 26 May 2018 20:17:06 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: TLS issue on aarch64

On Fri, May 25, 2018 at 04:50:59PM +0200, Szabolcs Nagy wrote:
> 
> yeah i think on aarch64 (which is TLS_ABOVE_TP) musl expects a layout
> 
>   -------------------------- ----------- - -
>  |  struct pthread | 16     |       tlsvar
>   -------------------------- ----------- - -
>  ^                 ^         <----->
>  self              tp         tls offset
>  (aligned)

I don't think this is correct; if so it's a bug. The intent is only
that musl expects the point where TLS begins (the beginning of your
tlsvar area above) to be aligned, tp to point 16 below it, and self to
point sizeof(struct pthread) below that.

> while it should be
> 
>   -------------------------- ----------- - -
>  |  struct pthread | 16     |       tlsvar
>   -------------------------- ----------- - -
>  ^                 ^<-------------->
>  self              tp    tls offset
>                    (aligned)

The "tls offset" described here (including the 16 reserved bytes)
doesn't seem to match the "tls offset" in the sense of offset from the
beginning of the PT_TLS image. That seems to be the core issue here.

> i think the constraints for tp are:
> 
> - tp must be aligned to 'tls_align'

I don't see this as an explicit requirement anywhere, but it falls out
implicitly if the linker assigns offsets beginning at
max(16,tls_align) (and adjusted by that much vs the PT_TLS segment).

> - tp must be at a small fixed offset from the end
> of pthread struct (so asm code can access the dtv)
> 
> - tp + off must be usable memory for tls for off >= 16
> (this is aarch64 specific)

The constraint is not "usable memory" but that it lines up with where
we're pasting the PT_TLS image.

> i'm not yet sure what's the best fix.
> 
> > Hence the code tries to access the TLS variables at the wrong location.
> > 
> > The following patch fixes the issue, but only if musl is then compiled with
> > optimizations off. With optimizations, the compiler emits the *same* code for
> > both variants. Also, the patch probably has some unexpected side-effects, too -
> > I'm just adding it here as a starting point for further debugging.
> > 
> > Any help is greatly appreciated :-)
> > 
> > - Phillip
> > 
> > ----
> > 
> > diff --git a/arch/aarch64/pthread_arch.h b/arch/aarch64/pthread_arch.h
> > index b2e2d8f..c69f6f1 100644
> > --- a/arch/aarch64/pthread_arch.h
> > +++ b/arch/aarch64/pthread_arch.h
> > @@ -2,10 +2,10 @@ static inline struct pthread *__pthread_self()
> >  {
> >         char *self;
> >         __asm__ __volatile__ ("mrs %0,tpidr_el0" : "=r"(self));
> > -       return (void*)(self + 16 - sizeof(struct pthread));
> > +       return (void*)(self - sizeof(struct pthread));
> >  }
> > 
> >  #define TLS_ABOVE_TP
> > -#define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread) - 16)
> > +#define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread))
> > 
> 
> this is ok, but wastes 16 bytes after the pthread struct
> where will be no tls data (and i guess the allocated tls
> size should be adjusted accordingly as well as tlsdesc.s).
> 
> >  #define MC_PC pc
> > diff --git a/src/env/__init_tls.c b/src/env/__init_tls.c
> > index b125eb1..3a3c307 100644
> > --- a/src/env/__init_tls.c
> > +++ b/src/env/__init_tls.c
> > @@ -42,7 +42,7 @@ void *__copy_tls(unsigned char *mem)
> > 
> >         mem += -((uintptr_t)mem + sizeof(struct pthread)) & (libc.tls_align-1);
> >         td = (pthread_t)mem;
> > -       mem += sizeof(struct pthread);
> > +       mem += sizeof(struct pthread) + libc.tls_align;
> > 
> 
> i don't think this works with the above TP_ADJ, but
> yes something has to be done differently here to make
> aarch64 happy.

As noted later in the thread, the issue does not seem to be
aarch64-specific, but rather applies whenever the TLS ABI has some
reserved space between the thread pointer and beginning of TLS
(nonzero offset in TP_ADJ? not exactly since on mips, etc. the offset
is not reserved space but just accounting for ridiculous signed 16-bit
offsets).

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.