Message-ID: <20241223070858.GB10433@brightrain.aerifal.cx>
Date: Mon, 23 Dec 2024 02:08:58 -0500
From: Rich Felker <dalias@...c.org>
To: Alex Rønne Petersen <alex@...xrp.com>
Cc: Fangrui Song <i@...kray.me>, musl@...ts.openwall.com,
	Alexander Monakov <amonakov@...ras.ru>
Subject: Re: [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.

On Sun, Dec 22, 2024 at 09:14:25AM -0500, Rich Felker wrote:
> On Sun, Dec 22, 2024 at 02:53:33PM +0100, Alex Rønne Petersen wrote:
> > On Sun, Dec 22, 2024 at 2:23 PM Rich Felker <dalias@...c.org> wrote:
> > >
> > > On Fri, Dec 13, 2024 at 08:04:22PM +0100, Alex Rønne Petersen wrote:
> > > > On Fri, Dec 13, 2024 at 12:18 PM Rich Felker <dalias@...c.org> wrote:
> > > > >
> > > > > On Thu, Dec 12, 2024 at 06:45:39PM +0100, Alex Rønne Petersen wrote:
> > > > > > On Sat, Nov 30, 2024 at 6:51 PM Fangrui Song <i@...kray.me> wrote:
> > > > > > > > I am not versed in s390x assembly, but I have some notes about __tls_get_offset:
> > > > > > >
> > > > > > > https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model
> > > > > > >
> > > > > > > The 32-bit ABI had to use __tls_get_offset because some nice
> > > > > > > general-instructions-extension was unavailable when the ABI was
> > > > > > > codified.
> > > > > > > > The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate.
> > > > > >
> > > > > > From your notes, it sounds like __tls_get_addr has to be hidden, even
> > > > > > if we don't actually make use of it in __tls_get_offset. Is my
> > > > > > understanding correct?
> > > > > >
> > > > > > If yes, what would be the preferred way to achieve this in musl?
> > > > >
> > > > > There is no requirement for a symbol to be hidden unless it violates
> > > > > namespace, which is not the case here. The problem is that the code in
> > > > > __tls_get_offset is performing a call to __tls_get_addr in a manner
> > > > > that's not valid unless the call target is local.
> > > > >
> > > > > My preferred fix would be getting rid of the call and inlining
> > > > > __tls_get_addr into __tls_get_offset. This was not possible back when
> > > > > the port was added because __tls_get_addr had a complex code path for
> > > > > installing new TLS on first-access. That was changed long ago, so now
> > > > > it's a fairly trivial instruction sequence.
> > > >
> > > > Before I send a patch, just to confirm, is this what you have in mind?
> > > >
> > > >         .global __tls_get_offset
> > > >         .type __tls_get_offset,%function
> > > > __tls_get_offset:
> > > >         stmg  %r14, %r15, 112(%r15)
> > > >         aghi  %r15, -160
> > > >
> > > >         ear   %r0, %a0
> > > >         sllg  %r0, %r0, 32
> > > >         ear   %r0, %a1
> > > >
> > > >         la    %r1, 0(%r2, %r12)
> > > >
> > > >         lg    %r3, 0(%r1)
> > > >         sllg  %r4, %r3, 3
> > > >         lg    %r5, 8(%r0)
> > > >         lg    %r2, 0(%r4, %r5)
> > > >         ag    %r2, 8(%r1)
> > > >         sgr   %r2, %r0
> > > >
> > > >         lmg   %r14, %r15, 272(%r15)
> > > >         br    %r14
> > >
> > > I'm not clear on what you're setting up the stack frame (munging r14
> > > and r15) for. My disasm of existing __tls_get_addr doesn't do that,
> > > and it doesn't seem to be useful for a leaf function -- or desirable
> > > for one that's a critical hot path.
> > 
> > That was already there in the __tls_get_offset asm. I wasn't sure if
> > removing it is fine ABI-wise. If it is, then I'll definitely do so.
> 
> Must be different gcc version or cflags...
> 
> > > The 3 lines before loading the actual argument from r2+r12 also look
> > > wrong. I don't understand s390x asm very well but I don't see how they
> > > could do anything meaningful there.
> > 
> > %r0 is used twice: Once in the code inlined from __tls_get_addr and
> > once more to subtract the thread pointer before return as you mention
> > below.
> 
> Ahh, that's just loading the thread pointer. I forgot the weird insn
> sequence for that. So I think it looks right.

Can you go ahead with a patch to do this, omitting the stack frame
setup (the saving and restoring of r14 and r15)? Also please let me
know whether you've tested it, or whether I should ask someone to test
it before applying.
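
For reference, the computation that the proposed sequence performs can be
modeled in C roughly as below. This is only a sketch under stated
assumptions, not musl's source: `struct fake_tcb` and
`model_tls_get_offset` are hypothetical names, and the position of the dtv
pointer at offset 8 from the thread pointer is inferred from the
`lg %r5, 8(%r0)` line in the quoted asm on a 64-bit target. In the asm, `v`
corresponds to the address formed from r2+r12.

```c
#include <stdint.h>

/* Hypothetical stand-in for the start of the thread control block.
 * Assumption: on a 64-bit target the dtv pointer sits at offset 8. */
struct fake_tcb {
    uintptr_t self;   /* offset 0 */
    uintptr_t *dtv;   /* offset 8: per-module TLS block addresses */
};

/* Models the inlined sequence: v[0] is the module ID and v[1] the offset
 * within that module's TLS block. The result is relative to the thread
 * pointer, which is what __tls_get_offset returns. */
static intptr_t model_tls_get_offset(const struct fake_tcb *tp,
                                     const uintptr_t v[2])
{
    /* The part inlined from __tls_get_addr: dtv[module] + offset. */
    uintptr_t addr = tp->dtv[v[0]] + v[1];

    /* Subtract the thread pointer before returning. */
    return (intptr_t)(addr - (uintptr_t)tp);
}
```

The model has no calls and needs no stack frame, which matches the point
above that a leaf function on a hot path should not save and restore r14
and r15.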

Rich
