Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAH9TF6Nm1asGtCx9OquUbpA6D3scipvbZ8m-Y2ddJVx2+R3wOg@mail.gmail.com>
Date: Sun, 22 Dec 2024 14:53:33 +0100
From: Alex Rønne Petersen <alex@...xrp.com>
To: Rich Felker <dalias@...c.org>
Cc: Fangrui Song <i@...kray.me>, musl@...ts.openwall.com, 
	Alexander Monakov <amonakov@...ras.ru>
Subject: Re: [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.

On Sun, Dec 22, 2024 at 2:23 PM Rich Felker <dalias@...c.org> wrote:
>
> On Fri, Dec 13, 2024 at 08:04:22PM +0100, Alex Rønne Petersen wrote:
> > On Fri, Dec 13, 2024 at 12:18 PM Rich Felker <dalias@...c.org> wrote:
> > >
> > > On Thu, Dec 12, 2024 at 06:45:39PM +0100, Alex Rønne Petersen wrote:
> > > > On Sat, Nov 30, 2024 at 6:51 PM Fangrui Song <i@...kray.me> wrote:
> > > > > (I am not versed in s390x assembly, but I have some notes about __tls_get_offset
> > > > >
> > > > > https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model
> > > > >
> > > > > The 32-bit ABI had to use __tls_get_offset because some nice
> > > > > general-instructions-extension was unavailable when the ABI was
> > > > > codified.
> > > > > The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate..
> > > >
> > > > From your notes, it sounds like __tls_get_addr has to be hidden, even
> > > > if we don't actually make use of it in __tls_get_offset. Is my
> > > > understanding correct?
> > > >
> > > > If yes, what would be the preferred way to achieve this in musl?
> > >
> > > There is no requirement for a symbol to be hidden unless it violates
> > > namespace, which is not the case here. The problem is that the code in
> > > __tls_get_offset is performing a call to __tls_get_addr in a manner
> > > that's not valid unless the call target is local.
> > >
> > > My preferred fix would be getting rid of the call and inlining
> > > __tls_get_addr into __tls_get_offset. This was not possible back when
> > > the port was added because __tls_get_addr had a complex code path for
> > > installing new TLS on first-access. That was changed long ago, so now
> > > it's a fairly trivial instruction sequence.
> >
> > Before I send a patch, just to confirm, is this what you have in mind?
> >
> >         .global __tls_get_offset
> >         .type __tls_get_offset,%function
> > __tls_get_offset:
> >         stmg  %r14, %r15, 112(%r15)
> >         aghi  %r15, -160
> >
> >         ear   %r0, %a0
> >         sllg  %r0, %r0, 32
> >         ear   %r0, %a1
> >
> >         la    %r1, 0(%r2, %r12)
> >
> >         lg    %r3, 0(%r1)
> >         sllg  %r4, %r3, 3
> >         lg    %r5, 8(%r0)
> >         lg    %r2, 0(%r4, %r5)
> >         ag    %r2, 8(%r1)
> >         sgr   %r2, %r0
> >
> >         lmg   %r14, %r15, 272(%r15)
> >         br    %r14
>
> I'm not clear on what you're setting up the stack frame (munging r14
> and r15) for. My disasm of existing __tls_get_addr doesn't do that,
> and it doesn't seem to be useful for a leaf function -- or desirable
> for one that's a critical hot path.

That was already there in the __tls_get_offset asm. I wasn't sure if
removing it is fine ABI-wise. If it is, then I'll definitely do so.

>
> The 3 lines before loading the actual argument from r2+r12 also look
> wrong. I don't understand s390x asm very well but I don't see how they
> could do anything meaningful there.

%r0 is used twice: Once in the code inlined from __tls_get_addr and
once more to subtract the thread pointer before return as you mention
below.

I guess it might be a bit confusing in a side-by-side comparison
because I changed the register number usage to be (IMO) more
understandable when just reading the new __tls_get_offset in a vacuum
(vs inlining __tls_get_addr's register number usage unchanged).

>
> The disasm I have is:
>
> 0000000000000000 <__tls_get_addr>:
>    0:   e3 10 20 00 00 04       lg      %r1,0(%r2)
>    6:   eb 11 00 03 00 0d       sllg    %r1,%r1,3
>    c:   b2 4f 00 30             ear     %r3,%a0
>   10:   eb 33 00 20 00 0d       sllg    %r3,%r3,32
>   16:   b2 4f 00 31             ear     %r3,%a1
>   1a:   e3 30 30 08 00 04       lg      %r3,8(%r3)
>   20:   e3 11 30 00 00 04       lg      %r1,0(%r1,%r3)
>   26:   e3 10 20 08 00 08       ag      %r1,8(%r2)
>   2c:   b9 04 00 21             lgr     %r2,%r1
>   30:   07 fe                   br      %r14
>
> and AIUI the only changes that should be made are adding at the
> beginning:
>
>         la      %r2,0(%r2,%r12)
>
> and subtracting off the thread pointer before return. (This latter
> part wouldn't be needed if we switch to storing tp-rel addressed in
> dtv, but that would be a separate change.)
>
> Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.