Message-ID: <CAN30aBGKpn_KMe+zn_OTb6RhDyA5hXH+yg_BmKdh2VDc8znO9Q@mail.gmail.com>
Date: Sat, 30 Nov 2024 09:51:24 -0800
From: Fangrui Song <i@...kray.me>
To: musl@...ts.openwall.com
Cc: Alex Rønne Petersen <alex@...xrp.com>, Alexander Monakov <amonakov@...ras.ru>
Subject: Re: [PATCH] s390x: Mark __tls_get_addr hidden before invoking it.

On Fri, Nov 29, 2024 at 7:20 PM Rich Felker <dalias@...c.org> wrote:
>
> On Fri, Nov 29, 2024 at 08:49:00PM +0100, Alex Rønne Petersen wrote:
> > On Fri, Nov 29, 2024 at 2:48 PM Rich Felker <dalias@...c.org> wrote:
> > >
> > > On Sat, Nov 23, 2024 at 01:57:16PM +0100, Alex Rønne Petersen wrote:
> > > > On Sat, Nov 23, 2024 at 1:36 PM Alexander Monakov <amonakov@...ras.ru> wrote:
> > > > >
> > > > > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> > > > >
> > > > > > On Sat, Nov 23, 2024 at 9:30 AM Alexander Monakov <amonakov@...ras.ru> wrote:
> > > > > > >
> > > > > > > On Sat, 23 Nov 2024, Alex Rønne Petersen wrote:
> > > > > > >
> > > > > > > > Similar to what's done for __syscall_ret, __sigsetjmp_tail, etc.. This fixes a
> > > > > > > > linker error when building musl libc.so with zig cc.
> > > > > > >
> > > > > > > Hm, on s390 __tls_get_addr is not used for TLS ABI, so it's fine that it ends up
> > > > > > > hidden in libc.so. Unusual.
> > > > > > >
> > > > > > > (linkers must take the most restrictive visibility from all mentions of a symbol)
> > > > > > >
> > > > > > > I'm curious, what kind of error with zig cc were you seeing?
> > > > > >
> > > > > > This:
> > > > > >
> > > > > > ld.lld: error: relocation R_390_PC32DBL cannot be used against symbol
> > > > > > '__tls_get_addr'; recompile with -fPIC
> > > > > > >>> defined in obj/src/thread/__tls_get_addr.lo
> > > > > > >>> referenced by __tls_get_offset.s:8 (src/thread/s390x/__tls_get_offset.s:8)
> > > > > > >>> obj/src/thread/s390x/__tls_get_offset.lo:(.text+0x10)
> > > > > >
> > > > > > (-fPIC is actually in use.)
> > > > > >
> > > > > > Presumably this could be fixed in lld, considering GNU ld seems fine
> > > > > > with it. But I figured that, since glibc also marks __tls_get_addr
> > > > > > hidden for s390x, musl should probably just do the same anyway.
> > > > >
> > > > > I see, thanks. Your commit message was confusing to me, because unlike
> > > > > __syscall_ret and the like, __tls_get_addr is not an internal helper,
> > > > > it may not have hidden visibility anywhere except s390. So it felt like
> > > > > the commit message was drawing a false parallel.
> > > > >
> > > > > I would love this to land with a clearer commit message, but that's up
> > > > > to Rich and yourself to sort out.
> > > >
> > > > Yeah, I think that's fair. I wrote the commit message before I
> > > > actually investigated in detail how __tls_get_addr is supposed to be
> > > > handled for s390x.
> > > >
> > > > Should I re-send the patch with an updated commit message, or how is
> > > > this usually handled?
> > >
> > > While s390x doesn't need __tls_get_addr to be a public symbol, I'd
> > > kinda prefer not to have an arch-specific hack to make it hidden.
> > > Looking at the code, it's got to be significantly gratuitously slow
> > > having __tls_get_offset making a second function call to
> > > __tls_get_addr, setting up a stack frame and all.
> > >
> > > The __tls_get_offset code dates back to 2016 when it was actually
> > > necessary to call into C code in case new TLS needed to be installed.
> > > Since 2019 (9d44b6460a) that's not necessary, so I think we could just
> > > open code the asm for __tls_get_offset entirely and have it be
> > > decently fast.
> >
> > That sounds reasonable. I don't have a ton of experience with writing
> > s390x assembly, though. I can do the obvious thing and extract the
> > compiled logic from __tls_get_addr without the calling convention
> > fluff. Would that be sufficient?
>
> That's what I was looking at doing. Basically just compiling a
> modified version of __tls_get_addr that subtracts the thread pointer,
> then prepending the code to load the index address from the GOT
> pointer argument in r12.
>
> A further optimization later could be storing the address with tp
> pre-subtracted in the dtv. This would also be optimal for archs with
> TLSDESC support, at the expense of an extra addition in legacy
> __tls_get_addr access. On some archs it may even save a temp register
> in the TLSDESC function.
>
> Rich

(I am not versed in s390x assembly, but I have some notes about __tls_get_offset
https://maskray.me/blog/2024-02-11-toolchain-notes-on-z-architecture#general-dynamic-tls-model
The 32-bit ABI had to use __tls_get_offset because some nice
general-instructions-extension was unavailable when the ABI was
codified.
The 64-bit ABI following the 32-bit __tls_get_offset was just unfortunate.
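
To make the relationship between the two entry points concrete, here is a
rough C model of the arithmetic discussed above. It is only a sketch under
stated assumptions: the struct layouts and every model_* name are
hypothetical and are not musl's real definitions, the GOT/r12 handling of
the real s390x __tls_get_offset is left out, and the "presub" variants
merely illustrate the idea of keeping dtv entries with tp already
subtracted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the (module, offset) pair stored in the GOT. */
typedef struct {
    size_t module;
    size_t offset;
} tls_index;

/* Hypothetical per-thread state: thread pointer plus a dtv where
 * dtv[m] holds the absolute base address of module m's TLS block. */
typedef struct {
    uintptr_t tp;
    const uintptr_t *dtv;
} thread_model;

/* __tls_get_addr-style lookup: absolute address of the variable. */
static uintptr_t model_tls_get_addr(const thread_model *t, const tls_index *ti)
{
    return t->dtv[ti->module] + ti->offset;
}

/* __tls_get_offset-style lookup: same address, minus the thread
 * pointer. The caller adds tp back to get the absolute address. */
static uintptr_t model_tls_get_offset(const thread_model *t, const tls_index *ti)
{
    return t->dtv[ti->module] + ti->offset - t->tp;
}

/* Assumed variant: dtv_rel[m] already has tp subtracted, so the
 * tp-relative lookup needs no subtraction at all... */
static uintptr_t model_tls_get_offset_presub(const uintptr_t *dtv_rel,
                                             const tls_index *ti)
{
    return dtv_rel[ti->module] + ti->offset;
}

/* ...while the absolute-address lookup pays one extra addition. */
static uintptr_t model_tls_get_addr_presub(uintptr_t tp,
                                           const uintptr_t *dtv_rel,
                                           const tls_index *ti)
{
    return tp + dtv_rel[ti->module] + ti->offset;
}
```

In this model the tp-relative result plus tp always equals the absolute
address, which is why the pre-subtracted layout moves the cost from the
offset-returning path to the legacy __tls_get_addr path.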