Date: Sat, 26 May 2018 02:54:16 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Subject: Re: TLS issue on aarch64

* Phillip Berndt <phillip.berndt@...glemail.com> [2018-05-26 00:20:04 +0200]:
> 2018-05-25 16:50 GMT+02:00 Szabolcs Nagy <nsz@...t70.net>:
> > i think the constraints for tp are:
> >
> > - tp must be aligned to 'tls_align'
> >
> > - tp must be at a small fixed offset from the end
> > of pthread struct (so asm code can access the dtv)
> >
> > - tp + off must be usable memory for tls for off >= 16
> > (this is aarch64 specific)
> >
> 
> Hmm.. but these constraints do not explain the extra offset of one
> alignment I'm seeing in the GCC output, do they? If I compile a

tp must be aligned and tp + offset must be aligned too,
but offset >= 16 has to hold, so offset ends up being 16
rounded up to the alignment.

> program with a single TLS variable with
> __attribute__((aligned(n))) that does nothing but try to reference and
> print said variable, I get the following assembler code from GCC:
> 
> For n = 0x1000:
> 
>   400194:       d53bd041        mrs     x1, tpidr_el0
>   400198:       b0000040        adrp    x0, 409000 <__subtf3+0xd18>
>   40019c:       91400421        add     x1, x1, #0x1, lsl #12
>   4001a0:       91000021        add     x1, x1, #0x0
> 
> 
> For n = 0x100:
> 
>   400194:       d53bd041        mrs     x1, tpidr_el0
>   400198:       b0000040        adrp    x0, 409000 <__subtf3+0xd18>
>   40019c:       91400021        add     x1, x1, #0x0, lsl #12
>   4001a0:       91040021        add     x1, x1, #0x100
> 
> For n = 0x10:
> 
>   400194:       d53bd041        mrs     x1, tpidr_el0
>   400198:       b0000040        adrp    x0, 409000 <__subtf3+0xd18>
>   40019c:       91400021        add     x1, x1, #0x0, lsl #12
>   4001a0:       91004021        add     x1, x1, #0x10
> 
> That's how I came up with the mem += libc.tls_align hack in the first place.
> 
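
the immediates in the three listings (0x1000, 0x100, 0x10) are consistent
with the offset being 16 rounded up to the variable's alignment. a minimal
sketch of that arithmetic, assuming a 16 byte reserved area at tp and a
power-of-two alignment (TCB_RESERVED, first_tls_offset and
check_listed_offsets are made-up names, not musl or gcc identifiers):

```c
#include <assert.h>
#include <stdint.h>

#define TCB_RESERVED 16u /* assumed: bytes reserved at tp on aarch64 */

/* Smallest offset >= 16 that is a multiple of align (a power of two). */
static uintptr_t first_tls_offset(uintptr_t align)
{
	return (TCB_RESERVED + align - 1) & -align;
}

/* The three alignments from the quoted listings. */
static void check_listed_offsets(void)
{
	assert(first_tls_offset(0x1000) == 0x1000);
	assert(first_tls_offset(0x100) == 0x100);
	assert(first_tls_offset(0x10) == 0x10);
}
```

for alignments below 16 the same formula gives 16, so the reserved area is
never overlapped.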

indeed you need another alignment there, i came up with the
following fix:

(on mips/ppc i expect it not to change anything: tp is
at a page aligned offset from the end of struct pthread,
so one alignment is enough there, but on aarch64/arm/sh4
this makes a difference, and seems to pass my simple tests)

diff --git a/src/env/__init_tls.c b/src/env/__init_tls.c
index 1c5d98a0..8e70024d 100644
--- a/src/env/__init_tls.c
+++ b/src/env/__init_tls.c
@@ -41,9 +41,12 @@ void *__copy_tls(unsigned char *mem)
 #ifdef TLS_ABOVE_TP
 	dtv = (void **)(mem + libc.tls_size) - (libc.tls_cnt + 1);
 
-	mem += -((uintptr_t)mem + sizeof(struct pthread)) & (libc.tls_align-1);
+	/* Ensure TP is aligned.  */
+	mem += -(uintptr_t)TP_ADJ(mem) & (libc.tls_align-1);
 	td = (pthread_t)mem;
 	mem += sizeof(struct pthread);
+	/* Ensure TLS is aligned after struct pthread.  */
+	mem += -(uintptr_t)mem & (libc.tls_align-1);
 
 	for (i=1, p=libc.tls_head; p; i++, p=p->next) {
 		dtv[i] = mem + p->offset;
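
to illustrate the two alignment steps in the hunk above, here is a small
model that works on plain integers instead of real memory. it is only a
sketch: PTHREAD_SIZE is a made-up stand-in for sizeof(struct pthread), and
TP_GAP assumes that TP_ADJ(p) on aarch64 is the end of struct pthread
minus 16; place and check_layout are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

#define PTHREAD_SIZE 200u /* hypothetical sizeof(struct pthread) */
#define TP_GAP 16u        /* assumed: tp = end of struct pthread - 16 */

struct layout { uintptr_t td, tp, tls; };

/* Model of the patched TLS_ABOVE_TP branch; align is a power of two. */
static struct layout place(uintptr_t mem, uintptr_t align)
{
	struct layout l;

	/* Ensure TP is aligned. */
	mem += -(mem + PTHREAD_SIZE - TP_GAP) & (align - 1);
	l.td = mem;
	l.tp = mem + PTHREAD_SIZE - TP_GAP;
	mem += PTHREAD_SIZE;
	/* Ensure TLS is aligned after struct pthread. */
	mem += -mem & (align - 1);
	l.tls = mem;
	return l;
}

/* Check the three constraints on tp for one starting address. */
static void check_layout(uintptr_t mem, uintptr_t align)
{
	struct layout l = place(mem, align);

	assert(l.td >= mem);            /* only ever moves forward */
	assert(l.tp % align == 0);      /* tp is aligned */
	assert(l.tls % align == 0);     /* tls block is aligned */
	assert(l.tls - l.tp >= TP_GAP); /* tls starts at tp + off, off >= 16 */
}
```

with these assumed sizes, place(0x10008, 0x100) gives tp = 0x10100 and
tls = 0x10200, i.e. a distance of 0x100, which matches the offset gcc
emits for n = 0x100.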
