Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241222132331.GZ10433@brightrain.aerifal.cx>
Date: Sun, 22 Dec 2024 08:23:31 -0500
From: Rich Felker <dalias@...c.org>
To: Alex Rønne Petersen <alex@...xrp.com>
Cc: Fangrui Song <i@...kray.me>, musl@...ts.openwall.com,
	Alexander Monakov <amonakov@...ras.ru>
Subject: Re: [PATCH] s390x: Mark __tls_get_addr hidden before invoking
 it.

On Fri, Dec 13, 2024 at 08:04:22PM +0100, Alex Rønne Petersen wrote:
> On Fri, Dec 13, 2024 at 12:18 PM Rich Felker <dalias@...c.org> wrote:
> >
> > On Thu, Dec 12, 2024 at 06:45:39PM +0100, Alex Rønne Petersen wrote:
> > > On Sat, Nov 30, 2024 at 6:51 PM Fangrui Song <i@...kray.me> wrote:
> > > > (I am not versed in s390x assembly, but I have some notes about __tls_get_offset
> > > >
> > > > https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model
> > > >
> > > > The 32-bit ABI had to use __tls_get_offset because some nice
> > > > general-instructions-extension was unavailable when the ABI was
> > > > codified.
> > > > The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate..
> > >
> > > From your notes, it sounds like __tls_get_addr has to be hidden, even
> > > if we don't actually make use of it in __tls_get_offset. Is my
> > > understanding correct?
> > >
> > > If yes, what would be the preferred way to achieve this in musl?
> >
> > There is no requirement for a symbol to be hidden unless it violates
> > namespace, which is not the case here. The problem is that the code in
> > __tls_get_offset is performing a call to __tls_get_addr in a manner
> > that's not valid unless the call target is local.
> >
> > My preferred fix would be getting rid of the call and inlining
> > __tls_get_addr into __tls_get_offset. This was not possible back when
> > the port was added because __tls_get_addr had a complex code path for
> > installing new TLS on first-access. That was changed long ago, so now
> > it's a fairly trivial instruction sequence.
> 
> Before I send a patch, just to confirm, is this what you have in mind?
> 
>         .global __tls_get_offset
>         .type __tls_get_offset,%function
> __tls_get_offset:
>         stmg  %r14, %r15, 112(%r15)
>         aghi  %r15, -160
> 
>         ear   %r0, %a0
>         sllg  %r0, %r0, 32
>         ear   %r0, %a1
> 
>         la    %r1, 0(%r2, %r12)
> 
>         lg    %r3, 0(%r1)
>         sllg  %r4, %r3, 3
>         lg    %r5, 8(%r0)
>         lg    %r2, 0(%r4, %r5)
>         ag    %r2, 8(%r1)
>         sgr   %r2, %r0
> 
>         lmg   %r14, %r15, 272(%r15)
>         br    %r14

I'm not clear on what you're setting up the stack frame (munging r14
and r15) for. My disasm of existing __tls_get_addr doesn't do that,
and it doesn't seem to be useful for a leaf function -- or desirable
for one that's a critical hot path.

The 3 lines before loading the actual argument from r2+r12 also look
wrong. I don't understand s390x asm very well but I don't see how they
could do anything meaningful there.

The disasm I have is:

0000000000000000 <__tls_get_addr>:
   0:	e3 10 20 00 00 04 	lg	%r1,0(%r2)
   6:	eb 11 00 03 00 0d 	sllg	%r1,%r1,3
   c:	b2 4f 00 30       	ear	%r3,%a0
  10:	eb 33 00 20 00 0d 	sllg	%r3,%r3,32
  16:	b2 4f 00 31       	ear	%r3,%a1
  1a:	e3 30 30 08 00 04 	lg	%r3,8(%r3)
  20:	e3 11 30 00 00 04 	lg	%r1,0(%r1,%r3)
  26:	e3 10 20 08 00 08 	ag	%r1,8(%r2)
  2c:	b9 04 00 21       	lgr	%r2,%r1
  30:	07 fe             	br	%r14

and AIUI the only changes that should be made are adding at the
beginning:

	la	%r2,0(%r2,%r12)

and subtracting off the thread pointer before return. (This latter
part wouldn't be needed if we switch to storing tp-rel addressed in
dtv, but that would be a separate change.)

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.