Message-ID: <20211208153657.GU7074@brightrain.aerifal.cx>
Date: Wed, 8 Dec 2021 10:36:57 -0500
From: Rich Felker <dalias@...c.org>
To: Stijn Tintel <stijn@...ux-ipv6.be>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH] ppc64: check for AltiVec in setjmp/longjmp

On Wed, Dec 08, 2021 at 08:37:13AM -0500, Rich Felker wrote:
> On Wed, Dec 08, 2021 at 10:43:05AM +0200, Stijn Tintel wrote:
> > On 7/12/2021 02:59, Rich Felker wrote:
> > > On Tue, Dec 07, 2021 at 01:37:12AM +0100, Florian Weimer wrote:
> > >> * Stijn Tintel:
> > >>
> > >>> diff --git a/src/setjmp/powerpc64/setjmp.s b/src/setjmp/powerpc64/setjmp.s
> > >>> index 37683fda..32853693 100644
> > >>> --- a/src/setjmp/powerpc64/setjmp.s
> > >>> +++ b/src/setjmp/powerpc64/setjmp.s
> > >>> @@ -69,7 +69,17 @@ __setjmp_toc:
> > >>>  	stfd 30, 38*8(3)
> > >>>  	stfd 31, 39*8(3)
> > >>>
> > >>> -	# 5) store vector registers v20-v31
> > >>> +	# 5) store vector registers v20-v31 if hardware supports AltiVec
> > >>> +	mflr 0
> > >>> +	bl 1f
> > >>> +	.hidden __hwcap
> > >>> +	.long __hwcap-.
> > >>> +1:	mflr 4
> > >> This de-balances the return stack and probably has quite severe
> > >> performance impact. The ISA manual says to use
> > >>
> > >>   bcl 20,31,$+4
> > >>
> > >> and you'll have to store the __hwcap offset somewhere else.
> > > To begin with, let's change the .s files to .S files and put the whole
> > > branch logic inside #ifndef __ALTIVEC__ so that it does not impact
> > > normal builds with an ISA level where Altivec can be assumed to be
> > > present.
> > >
> > > I'm not sufficiently familiar with the PowerPC ISA to know how bcl
> > > works, but if there's a less expensive solution along those lines
> > > that's compatible with all ISA levels, by all means let's use it. The
> > > same could be done for powerpc-sf (32-bit) and its SPE branches, too.
> > >
> > > Also the add and lwz can be used into lwzx (indexed load).
> > >
> > The code for ppc64 uses ld after add, not lwz. This is required to make
> > it work on both big and little endian systems. We therefore cannot use
> > lwzx, but have to use ldx.
>
> OK, I don't understand why endianness would matter, but I do see a
> problem here: ld expects to load a 64-bit value, but the value is only
> 32-bit (.long). Unless I'm missing something, we need to either make
> it 64-bit (.llong, and with proper alignment) or use a sign-extending
> 32-bit load. The latter would assume a model where the whole program
> (for static linking) or libc.so (for dynamic) fits in ±2GB. This is
> clearly valid for dynamic but dubious for static (although maybe GCC
> already assumes this with how it loads the GOT address and DSO-local
> globals?).

OK, I see now -- I was mixing up the load of __hwcap (which is
necessarily 64-bit) and the load of the relative address constant
(which could be done either way). The comment about lwzx not being
appropriate was just because __hwcap is 64-bit not 32-bit, and that's
fixed by using ldx.

But I'm still unclear on whether we should use a full 64-bit relative
address constant or a 32-bit one like we've been using (which assumes
everything in ±2GB).

Rich
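
For illustration only, and not part of the patch or of musl: a rough C
model of the two options for the relative address constant discussed
above. It assumes a 32-bit constant (.long __hwcap-.) is fetched with a
sign-extending load (on 64-bit PowerPC that would be something like
lwa/lwax) and a 64-bit constant (.llong) with ld. All function names
here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* True if the distance from anchor to target fits in a signed 32-bit
   constant, i.e. the target lies within roughly ±2GB of the anchor. */
static inline int fits_rel32(uint64_t anchor, uint64_t target)
{
	int64_t d = (int64_t)(target - anchor);
	return d >= INT32_MIN && d <= INT32_MAX;
}

/* Models what ".long target-." can hold: only the low 32 bits of the
   distance. The narrowing cast is implementation-defined in C; GCC and
   Clang reduce modulo 2^32, which is what is assumed here. */
static inline int32_t encode_rel32(uint64_t anchor, uint64_t target)
{
	return (int32_t)(uint32_t)(target - anchor);
}

/* Models a sign-extending 32-bit load followed by an add. */
static inline uint64_t resolve_rel32(uint64_t anchor, int32_t rel)
{
	return anchor + (uint64_t)(int64_t)rel;
}

/* Models a 64-bit load (".llong target-.") followed by an add. */
static inline uint64_t resolve_rel64(uint64_t anchor, int64_t rel)
{
	return anchor + (uint64_t)rel;
}

/* Small self-check: a nearby target in either direction round-trips
   through the 32-bit form. */
static inline void rel_demo(void)
{
	uint64_t anchor = 0x10000000u;
	assert(resolve_rel32(anchor, encode_rel32(anchor, anchor + 0x1000)) == anchor + 0x1000);
	assert(resolve_rel32(anchor, encode_rel32(anchor, anchor - 0x1000)) == anchor - 0x1000);
}
```

In this model the 32-bit form is exact whenever fits_rel32() holds, and
silently resolves to the wrong address once the target is more than
about 2GB away. The 64-bit form has no such limit but costs 8 bytes
and needs 8-byte alignment for the constant.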