Message-ID: <20211207132509.GO7074@brightrain.aerifal.cx>
Date: Tue, 7 Dec 2021 08:25:09 -0500
From: Rich Felker <dalias@...c.org>
To: David Edelsohn <dje.gcc@...il.com>
Cc: musl@...ts.openwall.com, Florian Weimer <fweimer@...hat.com>,
	Stijn Tintel <stijn@...ux-ipv6.be>
Subject: Re: [PATCH] ppc64: check for AltiVec in setjmp/longjmp

On Mon, Dec 06, 2021 at 08:44:47PM -0500, David Edelsohn wrote:
> On Mon, Dec 6, 2021 at 8:39 PM Rich Felker <dalias@...c.org> wrote:
> >
> > On Mon, Dec 06, 2021 at 08:15:48PM -0500, David Edelsohn wrote:
> > > On Mon, Dec 6, 2021 at 7:59 PM Rich Felker <dalias@...c.org> wrote:
> > > >
> > > > On Tue, Dec 07, 2021 at 01:37:12AM +0100, Florian Weimer wrote:
> > > > > * Stijn Tintel:
> > > > >
> > > > > > diff --git a/src/setjmp/powerpc64/setjmp.s b/src/setjmp/powerpc64/setjmp.s
> > > > > > index 37683fda..32853693 100644
> > > > > > --- a/src/setjmp/powerpc64/setjmp.s
> > > > > > +++ b/src/setjmp/powerpc64/setjmp.s
> > > > > > @@ -69,7 +69,17 @@ __setjmp_toc:
> > > > > >     stfd 30, 38*8(3)
> > > > > >     stfd 31, 39*8(3)
> > > > > >
> > > > > > -   # 5) store vector registers v20-v31
> > > > > > +   # 5) store vector registers v20-v31 if hardware supports AltiVec
> > > > > > +   mflr 0
> > > > > > +   bl 1f
> > > > > > +   .hidden __hwcap
> > > > > > +   .long __hwcap-.
> > > > > > +1: mflr 4
> > > > >
> > > > > This de-balances the return stack and probably has quite severe
> > > > > performance impact.  The ISA manual says to use
> > > > >
> > > > >   bcl 20,31,$+4
> > > > >
> > > > > and you'll have to store the __hwcap offset somewhere else.
> > > >
> > > > To begin with, let's change the .s files to .S files and put the whole
> > > > branch logic inside #ifndef __ALTIVEC__ so that it does not impact
> > > > normal builds with an ISA level where Altivec can be assumed to be
> > > > present.
> > > >
> > > > I'm not sufficiently familiar with the PowerPC ISA to know how bcl
> > > > works, but if there's a less expensive solution along those lines
> > > > that's compatible with all ISA levels, by all means let's use it. The
> > > > same could be done for powerpc-sf (32-bit) and its SPE branches, too.
> > >
> > > bl = branch and link
> > > bcl = branch conditional and link
> > >
> > > link means place the next instruction address in the link register.
> > > Normally a branch and link would be used for a matching "return"
> > > instruction, but in this case it is being used to compute a position
> > > independent code address.  As Florian correctly points out, the "bl"
> > > will corrupt the link stack in the processor used to predict return
> > > addresses and the recommended sequence is the one that he suggests.
> > >
> > > bcl 20,31,addr
> > >
> > > which means branch always and, because the condition register bits are
> > > irrelevant, a special value that instructs the processor to not push
> > > the address onto the link stack so that the "calls" and "returns"
> > > remain matched.
> >
> > Thanks. Am I correct in understanding then that we don't need $+4, but
> > can instead use the 1f just as now, with inline .long __hwcap-. -- in
> > other words that "bcl 20,31," is a drop-in replacement for "bl"
> > without the link stack impact?
> 
> It should work, but it's slightly preferred to use $+4 because one
> explicitly wants the address of the next instruction and labels of the

In this case we don't want the address of the next instruction. We
want the address of the constant __hwcap-.
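
To make that concrete, here is a rough, untested sketch of the sequence
under discussion. The register choices (r0, r4, r5), the 64-bit load of
__hwcap, and the 2: skip label are assumptions for illustration only,
not the final patch. Whether the link stack hint still applies when the
bcl target is a label rather than $+4 is exactly the open question
above.

	mflr 0               # save caller's LR; bcl overwrites it
	bcl 20,31,1f         # branch always and link, skipping the data word
	.hidden __hwcap
	.long __hwcap-.      # 32-bit offset from this word to __hwcap
1:	mflr 4               # r4 = address of the .long (the word after bcl)
	mtlr 0               # restore caller's LR
	lwa 5, 0(4)          # r5 = sign-extended offset
	add 4, 4, 5          # r4 = address of __hwcap
	ld 4, 0(4)           # r4 = __hwcap value (assumed to be 64 bits wide)
	andis. 4, 4, 0x1000  # test 0x10000000 (PPC_FEATURE_HAS_ALTIVEC)
	beq 2f               # no AltiVec: skip the v20-v31 stores
	# ... store v20-v31 here ...
2:

Since the constant sits immediately after the bcl, the value left in LR
is the address of the constant, so a single mflr gives the base to
which the stored offset is added.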

Rich
