Message-ID: <20211208133712.GT7074@brightrain.aerifal.cx>
Date: Wed, 8 Dec 2021 08:37:13 -0500
From: Rich Felker <dalias@...c.org>
To: Stijn Tintel <stijn@...ux-ipv6.be>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH] ppc64: check for AltiVec in setjmp/longjmp

On Wed, Dec 08, 2021 at 10:43:05AM +0200, Stijn Tintel wrote:
> On 7/12/2021 02:59, Rich Felker wrote:
> > On Tue, Dec 07, 2021 at 01:37:12AM +0100, Florian Weimer wrote:
> >> * Stijn Tintel:
> >>
> >>> diff --git a/src/setjmp/powerpc64/setjmp.s b/src/setjmp/powerpc64/setjmp.s
> >>> index 37683fda..32853693 100644
> >>> --- a/src/setjmp/powerpc64/setjmp.s
> >>> +++ b/src/setjmp/powerpc64/setjmp.s
> >>> @@ -69,7 +69,17 @@ __setjmp_toc:
> >>>  	stfd 30, 38*8(3)
> >>>  	stfd 31, 39*8(3)
> >>>  
> >>> -	# 5) store vector registers v20-v31
> >>> +	# 5) store vector registers v20-v31 if hardware supports AltiVec
> >>> +	mflr 0
> >>> +	bl 1f
> >>> +	.hidden __hwcap
> >>> +	.long __hwcap-.
> >>> +1:	mflr 4
> >> This de-balances the return stack and probably has quite severe
> >> performance impact.  The ISA manual says to use
> >>
> >>   bcl 20,31,$+4
> >>
> >> and you'll have to store the __hwcap offset somewhere else.
> > To begin with, let's change the .s files to .S files and put the whole
> > branch logic inside #ifndef __ALTIVEC__ so that it does not impact
> > normal builds with an ISA level where Altivec can be assumed to be
> > present.
> >
> > I'm not sufficiently familiar with the PowerPC ISA to know how bcl
> > works, but if there's a less expensive solution along those lines
> > that's compatible with all ISA levels, by all means let's use it. The
> > same could be done for powerpc-sf (32-bit) and its SPE branches, too.
> >
> > Also the add and lwz can be fused into lwzx (indexed load).
> >
> The code for ppc64 uses ld after add, not lwz. This is required to make
> it work on both big and little endian systems. We therefore cannot use
> lwzx, but have to use ldx.

OK, I don't understand why endianness would matter, but I do see a
problem here: ld expects to load a 64-bit value, but the value is only
32-bit (.long). Unless I'm missing something, we need to either make
it 64-bit (.llong, and with proper alignment) or use a sign-extending
32-bit load. The latter would assume a model where the whole program
(for static linking) or libc.so (for dynamic) fits in ±2GB. This is
clearly valid for dynamic but dubious for static (although maybe GCC
already assumes this with how it loads the GOT address and DSO-local
globals?).

Rich
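
Editorial sketch (not part of the original message): the width mismatch described above can be modelled in portable C. This assumes a hypothetical `struct slot` standing in for the 4 bytes emitted by `.long __hwcap-.` followed by 4 bytes of whatever comes next in .text; the function names and the example values are illustrative only. `load_like_ld` models an 8-byte `ld`, and `load_like_lwa` models a sign-extending 32-bit load (such as `lwa` on ppc64).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the bytes at the `.long __hwcap-.` slot:
 * a 32-bit PC-relative offset followed by unrelated bytes (here,
 * an arbitrary 32-bit word standing in for the next instruction). */
struct slot {
    int32_t  offset;     /* what `.long` emits: only 4 bytes */
    uint32_t next_word;  /* whatever follows in the section */
};

/* Models `ld`: reads 8 bytes starting at the slot, so the
 * neighbouring word ends up in the result on either endianness. */
static int64_t load_like_ld(const struct slot *s)
{
    int64_t v;
    memcpy(&v, s, sizeof v);
    return v;
}

/* Models a sign-extending 32-bit load: reads only the 4 offset
 * bytes and widens them to 64 bits, preserving negative offsets. */
static int64_t load_like_lwa(const struct slot *s)
{
    int32_t v;
    memcpy(&v, &s->offset, sizeof v);
    return (int64_t)v;
}

/* Small self-check with an illustrative negative offset. */
static void check_slot_loads(void)
{
    struct slot s = { -4096, 0x7c8802a6u };
    assert(load_like_lwa(&s) == -4096);  /* offset recovered intact */
    assert(load_like_ld(&s) != -4096);   /* polluted by next_word */
}
```

On a little-endian host the stray word lands in the upper half of the 64-bit result, and on a big-endian host in the lower half; either way the 8-byte read does not yield the stored offset. Widening the slot to 64 bits, the other option mentioned above, would make the 8-byte read correct at the cost of alignment requirements.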
