Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180525145059.GG4418@port70.net>
Date: Fri, 25 May 2018 16:50:59 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Subject: Re: TLS issue on aarch64

* Phillip Berndt <phillip.berndt@...glemail.com> [2018-05-25 14:40:14 +0200]:
> I'm experiencing a TLS-related error with musl on aarch64. This is my
> test program:
> 
> ----8<--------------
> #include <stdio.h>
> 
> __thread int foo = 1234;
> __thread int bar __attribute__((aligned(0x100))) = 5678;
> 
> int main(int argc, char **argv)
> {
>     printf("0x%p: %d\n", &foo, foo);
>     printf("0x%p: %d\n", &bar, bar);
> 
>     return 0;
> }
> ---->8---------------
> 
> I'm compiling this into a static binary with -O2 -static. With glibc and
> gcc 5.4.0 (tried with 7.2 as well), this gives me the expected output. With
> musl libc 1.1.19, I instead see
> 
> ----8<--------------
> 0x0x7fa7adf2f0: 0
> 0x0x7fa7adf3f0: 0
> ---->8---------------
> 
> Note that this is the wrong address (not aligned), and that the memory has
> unexpected content as well.
> 
> I did some initial debugging, but now I'm stuck and need some help. What I've
> found so far:
> 
> * GCC apparently emits code that expects the tpidr_el0 register to contain a
>   pointer to the TLS memory, and it expects that the loader unconditionally
>   offsets the first variable by the TLS alignment into said memory:
> 
>   Disassembly of the code that loads &foo:
>   ----8<--------------
>   4001a4:       d53bd053        mrs     x19, tpidr_el0
>   4001a8:       91400273        add     x19, x19, #0x0, lsl #12
>   4001ac:       91040273        add     x19, x19, #0x100
>   ----8<--------------
> 
>   (If I align the variable by 0x1000 instead then the code changes
>    acoordingly.)
> 
> * Musl, on the other hand, in __copy_tls, initializes tpidr_el0 with a
>   pointer 16 bytes from the end of struct pthread, and copies the TLS
>   initializer code directly behind that struct, without adding extra
>   padding.
> 


yeah i think on aarch64 (which is TLS_ABOVE_TP) musl expects a layout

  -------------------------- ----------- - -
 |  struct pthread | 16     |       tlsvar
  -------------------------- ----------- - -
 ^                 ^         <----->
 self              tp         tls offset
 (aligned)

while it should be

  -------------------------- ----------- - -
 |  struct pthread | 16     |       tlsvar
  -------------------------- ----------- - -
 ^                 ^<-------------->
 self              tp    tls offset
                   (aligned)

i think the constraints for tp are:

- tp must be aligned to 'tls_align'

- tp must be at a small fixed offset from the end
of pthread struct (so asm code can access the dtv)

- tp + off must be usable memory for tls for off >= 16
(this is aarch64 specific)

i'm not yet sure what's the best fix.

> Hence the code tries to access the TLS variables at the wrong location.
> 
> The following patch fixes the issue, but only if musl is then compiled with
> optimizations off. With optimizations, the compiler emits the *same* code for
> both variants. Also, the patch probably has some unexpected side-effects, too -
> I'm just adding it here as a starting point for further debugging.
> 
> Any help is greatly appreciated :-)
> 
> - Phillip
> 
> ----
> 
> diff --git a/arch/aarch64/pthread_arch.h b/arch/aarch64/pthread_arch.h
> index b2e2d8f..c69f6f1 100644
> --- a/arch/aarch64/pthread_arch.h
> +++ b/arch/aarch64/pthread_arch.h
> @@ -2,10 +2,10 @@ static inline struct pthread *__pthread_self()
>  {
>         char *self;
>         __asm__ __volatile__ ("mrs %0,tpidr_el0" : "=r"(self));
> -       return (void*)(self + 16 - sizeof(struct pthread));
> +       return (void*)(self - sizeof(struct pthread));
>  }
> 
>  #define TLS_ABOVE_TP
> -#define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread) - 16)
> +#define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread))
> 

this is ok, but wastes 16 bytes after the pthread struct
where will be no tls data (and i guess the allocated tls
size should be adjusted accordingly as well as tlsdesc.s).

>  #define MC_PC pc
> diff --git a/src/env/__init_tls.c b/src/env/__init_tls.c
> index b125eb1..3a3c307 100644
> --- a/src/env/__init_tls.c
> +++ b/src/env/__init_tls.c
> @@ -42,7 +42,7 @@ void *__copy_tls(unsigned char *mem)
> 
>         mem += -((uintptr_t)mem + sizeof(struct pthread)) & (libc.tls_align-1);
>         td = (pthread_t)mem;
> -       mem += sizeof(struct pthread);
> +       mem += sizeof(struct pthread) + libc.tls_align;
> 

i don't think this works with the above TP_ADJ, but
yes something has to be done differently here to make
aarch64 happy.

>         for (i=1, p=libc.tls_head; p; i++, p=p->next) {
>                 dtv[i] = mem + p->offset;

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.