|
Message-ID: <20180525145059.GG4418@port70.net> Date: Fri, 25 May 2018 16:50:59 +0200 From: Szabolcs Nagy <nsz@...t70.net> To: musl@...ts.openwall.com Subject: Re: TLS issue on aarch64 * Phillip Berndt <phillip.berndt@...glemail.com> [2018-05-25 14:40:14 +0200]: > I'm experiencing a TLS-related error with musl on aarch64. This is my > test program: > > ----8<-------------- > #include <stdio.h> > > __thread int foo = 1234; > __thread int bar __attribute__((aligned(0x100))) = 5678; > > int main(int argc, char **argv) > { > printf("0x%p: %d\n", &foo, foo); > printf("0x%p: %d\n", &bar, bar); > > return 0; > } > ---->8--------------- > > I'm compiling this into a static binary with -O2 -static. With glibc and > gcc 5.4.0 (tried with 7.2 as well), this gives me the expected output. With > musl libc 1.1.19, I instead see > > ----8<-------------- > 0x0x7fa7adf2f0: 0 > 0x0x7fa7adf3f0: 0 > ---->8--------------- > > Note that this is the wrong address (not aligned), and that the memory has > unexpected content as well. > > I did some initial debugging, but now I'm stuck and need some help. What I've > found so far: > > * GCC apparently emits code that expects the tpidr_el0 register to contain a > pointer to the TLS memory, and it expects that the loader unconditionally > offsets the first variable by the TLS alignment into said memory: > > Disassembly of the code that loads &foo: > ----8<-------------- > 4001a4: d53bd053 mrs x19, tpidr_el0 > 4001a8: 91400273 add x19, x19, #0x0, lsl #12 > 4001ac: 91040273 add x19, x19, #0x100 > ----8<-------------- > > (If I align the variable by 0x1000 instead then the code changes > acoordingly.) > > * Musl, on the other hand, in __copy_tls, initializes tpidr_el0 with a > pointer 16 bytes from the end of struct pthread, and copies the TLS > initializer code directly behind that struct, without adding extra > padding. > yeah i think on aarch64 (which is TLS_ABOVE_TP) musl expects a layout -------------------------- ----------- - - | struct pthread | 16 | tlsvar -------------------------- ----------- - - ^ ^ <-----> self tp tls offset (aligned) while it should be -------------------------- ----------- - - | struct pthread | 16 | tlsvar -------------------------- ----------- - - ^ ^<--------------> self tp tls offset (aligned) i think the constraints for tp are: - tp must be aligned to 'tls_align' - tp must be at a small fixed offset from the end of pthread struct (so asm code can access the dtv) - tp + off must be usable memory for tls for off >= 16 (this is aarch64 specific) i'm not yet sure what's the best fix. > Hence the code tries to access the TLS variables at the wrong location. > > The following patch fixes the issue, but only if musl is then compiled with > optimizations off. With optimizations, the compiler emits the *same* code for > both variants. Also, the patch probably has some unexpected side-effects, too - > I'm just adding it here as a starting point for further debugging. > > Any help is greatly appreciated :-) > > - Phillip > > ---- > > diff --git a/arch/aarch64/pthread_arch.h b/arch/aarch64/pthread_arch.h > index b2e2d8f..c69f6f1 100644 > --- a/arch/aarch64/pthread_arch.h > +++ b/arch/aarch64/pthread_arch.h > @@ -2,10 +2,10 @@ static inline struct pthread *__pthread_self() > { > char *self; > __asm__ __volatile__ ("mrs %0,tpidr_el0" : "=r"(self)); > - return (void*)(self + 16 - sizeof(struct pthread)); > + return (void*)(self - sizeof(struct pthread)); > } > > #define TLS_ABOVE_TP > -#define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread) - 16) > +#define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread)) > this is ok, but wastes 16 bytes after the pthread struct where will be no tls data (and i guess the allocated tls size should be adjusted accordingly as well as tlsdesc.s). > #define MC_PC pc > diff --git a/src/env/__init_tls.c b/src/env/__init_tls.c > index b125eb1..3a3c307 100644 > --- a/src/env/__init_tls.c > +++ b/src/env/__init_tls.c > @@ -42,7 +42,7 @@ void *__copy_tls(unsigned char *mem) > > mem += -((uintptr_t)mem + sizeof(struct pthread)) & (libc.tls_align-1); > td = (pthread_t)mem; > - mem += sizeof(struct pthread); > + mem += sizeof(struct pthread) + libc.tls_align; > i don't think this works with the above TP_ADJ, but yes something has to be done differently here to make aarch64 happy. > for (i=1, p=libc.tls_head; p; i++, p=p->next) { > dtv[i] = mem + p->offset;
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.