Message-ID: <CAGWvnykjn79ncxW6o3-ugp7-ESV5pmgYFwj6NJTC-W4q=+NLhQ@mail.gmail.com>
Date: Tue, 7 Dec 2021 09:48:31 -0500
From: David Edelsohn <dje.gcc@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com, Florian Weimer <fweimer@...hat.com>, 
	Stijn Tintel <stijn@...ux-ipv6.be>
Subject: Re: [PATCH] ppc64: check for AltiVec in setjmp/longjmp

On Tue, Dec 7, 2021 at 9:43 AM Rich Felker <dalias@...c.org> wrote:
>
> On Tue, Dec 07, 2021 at 08:39:20AM -0500, David Edelsohn wrote:
> > On Tue, Dec 7, 2021 at 8:25 AM Rich Felker <dalias@...c.org> wrote:
> > >
> > > On Mon, Dec 06, 2021 at 08:44:47PM -0500, David Edelsohn wrote:
> > > > On Mon, Dec 6, 2021 at 8:39 PM Rich Felker <dalias@...c.org> wrote:
> > > > >
> > > > > On Mon, Dec 06, 2021 at 08:15:48PM -0500, David Edelsohn wrote:
> > > > > > On Mon, Dec 6, 2021 at 7:59 PM Rich Felker <dalias@...c.org> wrote:
> > > > > > >
> > > > > > > On Tue, Dec 07, 2021 at 01:37:12AM +0100, Florian Weimer wrote:
> > > > > > > > * Stijn Tintel:
> > > > > > > >
> > > > > > > > > diff --git a/src/setjmp/powerpc64/setjmp.s b/src/setjmp/powerpc64/setjmp.s
> > > > > > > > > index 37683fda..32853693 100644
> > > > > > > > > --- a/src/setjmp/powerpc64/setjmp.s
> > > > > > > > > +++ b/src/setjmp/powerpc64/setjmp.s
> > > > > > > > > @@ -69,7 +69,17 @@ __setjmp_toc:
> > > > > > > > >     stfd 30, 38*8(3)
> > > > > > > > >     stfd 31, 39*8(3)
> > > > > > > > >
> > > > > > > > > -   # 5) store vector registers v20-v31
> > > > > > > > > +   # 5) store vector registers v20-v31 if hardware supports AltiVec
> > > > > > > > > +   mflr 0
> > > > > > > > > +   bl 1f
> > > > > > > > > +   .hidden __hwcap
> > > > > > > > > +   .long __hwcap-.
> > > > > > > > > +1: mflr 4
> > > > > > > >
> > > > > > > > This de-balances the return stack and probably has quite severe
> > > > > > > > performance impact.  The ISA manual says to use
> > > > > > > >
> > > > > > > >   bcl 20,31,$+4
> > > > > > > >
> > > > > > > > and you'll have to store the __hwcap offset somewhere else.
> > > > > > >
> > > > > > > To begin with, let's change the .s files to .S files and put the whole
> > > > > > > branch logic inside #ifndef __ALTIVEC__ so that it does not impact
> > > > > > > normal builds with an ISA level where Altivec can be assumed to be
> > > > > > > present.
> > > > > > >
> > > > > > > I'm not sufficiently familiar with the PowerPC ISA to know how bcl
> > > > > > > works, but if there's a less expensive solution along those lines
> > > > > > > that's compatible with all ISA levels, by all means let's use it. The
> > > > > > > same could be done for powerpc-sf (32-bit) and its SPE branches, too.
> > > > > >
> > > > > > bl = branch and link
> > > > > > bcl = branch conditional and link
> > > > > >
> > > > > > link means place the next instruction address in the link register.
> > > > > > Normally a branch and link would be used for a matching "return"
> > > > > > instruction, but in this case it is being used to compute a position
> > > > > > independent code address.  As Florian correctly points out, the "bl"
> > > > > > will corrupt the link stack in the processor used to predict return
> > > > > > addresses and the recommended sequence is the one that he suggests.
> > > > > >
> > > > > > bcl 20,31,addr
> > > > > >
> > > > > > which means branch always and, because the condition register bits are
> > > > > > irrelevant, a special value that instructs the processor not to push
> > > > > > the address onto the link stack so that the "calls" and "returns"
> > > > > > remain matched.
> > > > >
> > > > > Thanks. Am I correct in understanding then that we don't need $+4, but
> > > > > can instead use the 1f just as now, with inline .long __hwcap-. -- in
> > > > > other words that "bcl 20,31," is a drop-in replacement for "bl"
> > > > > without the link stack impact?
> > > >
> > > > It should work, but it's slightly preferred to use $+4 because one
> > > > explicitly wants the address of the next instruction and labels of the
> > >
> > > In this case we don't want the address of the next instruction. We
> > > want the address of the constant __hwcap-.
> >
> > .hidden __hwcap
> >
> > is not an instruction.  It will not emit any data.
>
> Of course it won't. .long __hwcap-. is the directive that does, on the
> next line, which you seem to have missed.

I'm sorry that you don't understand what I am expressing.  Snide
comments are not productive.  Do what you want.

Thanks, David
