|
Message-ID: <CAH9TF6Nm1asGtCx9OquUbpA6D3scipvbZ8m-Y2ddJVx2+R3wOg@mail.gmail.com> Date: Sun, 22 Dec 2024 14:53:33 +0100 From: Alex Rønne Petersen <alex@...xrp.com> To: Rich Felker <dalias@...c.org> Cc: Fangrui Song <i@...kray.me>, musl@...ts.openwall.com, Alexander Monakov <amonakov@...ras.ru> Subject: Re: [PATCH] s390x: Mark __tls_get_addr hidden before invoking it. On Sun, Dec 22, 2024 at 2:23 PM Rich Felker <dalias@...c.org> wrote: > > On Fri, Dec 13, 2024 at 08:04:22PM +0100, Alex Rønne Petersen wrote: > > On Fri, Dec 13, 2024 at 12:18 PM Rich Felker <dalias@...c.org> wrote: > > > > > > On Thu, Dec 12, 2024 at 06:45:39PM +0100, Alex Rønne Petersen wrote: > > > > On Sat, Nov 30, 2024 at 6:51 PM Fangrui Song <i@...kray.me> wrote: > > > > > (I am not versed in s390x assembly, but I have some notes about __tls_get_offset > > > > > > > > > > https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model > > > > > > > > > > The 32-bit ABI had to use __tls_get_offset because some nice > > > > > general-instructions-extension was unavailable when the ABI was > > > > > codified. > > > > > The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate.. > > > > > > > > From your notes, it sounds like __tls_get_addr has to be hidden, even > > > > if we don't actually make use of it in __tls_get_offset. Is my > > > > understanding correct? > > > > > > > > If yes, what would be the preferred way to achieve this in musl? > > > > > > There is no requirement for a symbol to be hidden unless it violates > > > namespace, which is not the case here. The problem is that the code in > > > __tls_get_offset is performing a call to __tls_get_addr in a manner > > > that's not valid unless the call target is local. > > > > > > My preferred fix would be getting rid of the call and inlining > > > __tls_get_addr into __tls_get_offset. This was not possible back when > > > the port was added because __tls_get_addr had a complex code path for > > > installing new TLS on first-access. That was changed long ago, so now > > > it's a fairly trivial instruction sequence. > > > > Before I send a patch, just to confirm, is this what you have in mind? > > > > .global __tls_get_offset > > .type __tls_get_offset,%function > > __tls_get_offset: > > stmg %r14, %r15, 112(%r15) > > aghi %r15, -160 > > > > ear %r0, %a0 > > sllg %r0, %r0, 32 > > ear %r0, %a1 > > > > la %r1, 0(%r2, %r12) > > > > lg %r3, 0(%r1) > > sllg %r4, %r3, 3 > > lg %r5, 8(%r0) > > lg %r2, 0(%r4, %r5) > > ag %r2, 8(%r1) > > sgr %r2, %r0 > > > > lmg %r14, %r15, 272(%r15) > > br %r14 > > I'm not clear on what you're setting up the stack frame (munging r14 > and r15) for. My disasm of existing __tls_get_addr doesn't do that, > and it doesn't seem to be useful for a leaf function -- or desirable > for one that's a critical hot path. That was already there in the __tls_get_offset asm. I wasn't sure if removing it is fine ABI-wise. If it is, then I'll definitely do so. > > The 3 lines before loading the actual argument from r2+r12 also look > wrong. I don't understand s390x asm very well but I don't see how they > could do anything meaningful there. %r0 is used twice: Once in the code inlined from __tls_get_addr and once more to subtract the thread pointer before return as you mention below. I guess it might be a bit confusing in a side-by-side comparison because I changed the register number usage to be (IMO) more understandable when just reading the new __tls_get_offset in a vacuum (vs inlining __tls_get_addr's register number usage unchanged). > > The disasm I have is: > > 0000000000000000 <__tls_get_addr>: > 0: e3 10 20 00 00 04 lg %r1,0(%r2) > 6: eb 11 00 03 00 0d sllg %r1,%r1,3 > c: b2 4f 00 30 ear %r3,%a0 > 10: eb 33 00 20 00 0d sllg %r3,%r3,32 > 16: b2 4f 00 31 ear %r3,%a1 > 1a: e3 30 30 08 00 04 lg %r3,8(%r3) > 20: e3 11 30 00 00 04 lg %r1,0(%r1,%r3) > 26: e3 10 20 08 00 08 ag %r1,8(%r2) > 2c: b9 04 00 21 lgr %r2,%r1 > 30: 07 fe br %r14 > > and AIUI the only changes that should be made are adding at the > beginning: > > la %r2,0(%r2,%r12) > > and subtracting off the thread pointer before return. (This latter > part wouldn't be needed if we switch to storing tp-rel addressed in > dtv, but that would be a separate change.) > > Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.