Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211208050203.GD8506@voyager>
Date: Wed, 8 Dec 2021 06:02:03 +0100
From: Markus Wichmann <nullplan@....net>
To: Rich Felker <dalias@...c.org>
Cc: Florian Weimer <fweimer@...hat.com>, musl@...ts.openwall.com
Subject: Re: [PATCH] ppc64: check for AltiVec in setjmp/longjmp

On Tue, Dec 07, 2021 at 03:29:21PM -0500, Rich Felker wrote:
> In general I would prefer the "obvious what it's doing" form over the
> "special cased for performance" form in places where performance can't
> matter -- for example, the ones you cited that execute once per
> program invocation. But if it's easy to read either way, fine -- and
> it probably can be made so.
>

I foresee no issue with readability. Indeed most avid PPC assembly
readers will recognize "bcl 20,31" as "just getting the instruction
pointer" sooner than "bl", but the functions in question are so small it
doesn't really matter either way.

> Note that if the __hwcap-. constant is moved out of line, I think it's
> possible to avoid any added cost. Something along the lines of the
> following:
>
> 	bcl 20,31,1f
> 1:	mflr 4
> 	lwz 5,2f-1b(4)
> 	lwzx 4,4,5
> 	...
> 2:	.long __hwcap-1b
>
> Does this look right?

Seems right to me.

David's warning made me remember an article I read once about branch
prediction and cache instructions: Basically, cache instructions have no
execution phase, I mean, architecturally they have no effect. They
change no memory and no registers, they change an implementation detail
that ought to be transparent to the programmer.

So if a branch is mispredicted to hit a given cache instruction, that
cache instruction will be executed to the fullest even if the pipeline
is flushed (pipeline flush simply skips execution phase, which cache
instructions don't have). Now, the XBox 360 CPU had a special cache
instruction (I believe it was "xdcbl" or so), which could circumvent the
L2 cache. Unfortunately, all access synchronization between CPUs
happens through the L2 cache. Therefore this instruction should not be
used on memory that can be shared between CPUs, which is pretty much all
memory in user space (any thread might be preempted and migrated at any
time, so not even stack is safe).

Unfortunately, with the above mentioned branch prediction drama, the
instruction can cause issues if it merely shows up in the instruction
stream, even if it is ultimately never executed. They had to remove any
instance of this instruction from their programs to get the issues to
disappear.

Now with your hwcap pointer, you have no idea what instruction it will
end up looking like. But if we put the pointer into .rodata, the offset
between labels 2 and 1 might be larger than 32kB, making the code more
complicated. You could put "b ." in front of it, to stop any branch
misprediction before it. I don't know, you figure it out.

Ciao,
Markus

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.