Message-ID: <20211208050203.GD8506@voyager>
Date: Wed, 8 Dec 2021 06:02:03 +0100
From: Markus Wichmann <nullplan@....net>
To: Rich Felker <dalias@...c.org>
Cc: Florian Weimer <fweimer@...hat.com>, musl@...ts.openwall.com
Subject: Re: [PATCH] ppc64: check for AltiVec in setjmp/longjmp

On Tue, Dec 07, 2021 at 03:29:21PM -0500, Rich Felker wrote:
> In general I would prefer the "obvious what it's doing" form over the
> "special cased for performance" form in places where performance can't
> matter -- for example, the ones you cited that execute once per
> program invocation. But if it's easy to read either way, fine -- and
> it probably can be made so.
>

I foresee no issue with readability. Indeed most avid PPC assembly
readers will recognize "bcl 20,31" as "just getting the instruction
pointer" sooner than "bl", but the functions in question are so small it
doesn't really matter either way.

> Note that if the __hwcap-. constant is moved out of line, I think it's
> possible to avoid any added cost. Something along the lines of the
> following:
>
> 	bcl 20,31,1f
> 1:	mflr 4
> 	lwz 5,2f-1b(4)
> 	lwzx 4,4,5
> 	...
> 2:	.long __hwcap-1b
>
> Does this look right?

Seems right to me. David's warning made me remember an article I read
once about branch prediction and cache instructions:

Basically, cache instructions have no execution phase, I mean,
architecturally they have no effect. They change no memory and no
registers, they change an implementation detail that ought to be
transparent to the programmer. So if a branch is mispredicted to hit a
given cache instruction, that cache instruction will be executed to the
fullest even if the pipeline is flushed (pipeline flush simply skips
execution phase, which cache instructions don't have).

Now, the XBox 360 CPU had a special cache instruction (I believe it was
"xdcbl" or so), which could circumvent the L2 cache. Unfortunately, all
access synchronization between CPUs happens through the L2 cache.
Therefore this instruction should not be used on memory that can be
shared between CPUs, which is pretty much all memory in user space (any
thread might be preempted and migrated at any time, so not even stack is
safe). Unfortunately, with the above mentioned branch prediction drama,
the instruction can cause issues if it merely shows up in the
instruction stream, even if it is ultimately never executed. They had to
remove any instance of this instruction from their programs to get the
issues to disappear.

Now with your hwcap pointer, you have no idea what instruction it will
end up looking like. But if we put the pointer into .rodata, the offset
between labels 2 and 1 might be larger than 32kB, making the code more
complicated. You could put "b ." in front of it, to stop any branch
misprediction before it. I don't know, you figure it out.

Ciao,
Markus
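
A side note on the 32kB figure above: lwz takes its displacement as a
signed 16-bit field (D-form), so a label distance like 2f-1b has to fit
in -32768..32767 bytes to be usable directly in "lwz 5,2f-1b(4)". A
minimal sketch of that range check in C follows. The helper name
fits_lwz_displacement is made up for illustration and is not part of
musl or any toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from musl): report whether a byte offset
 * can be encoded as the displacement of a D-form load such as lwz.
 * The displacement field is a signed 16-bit value, so the reachable
 * range around the base register is -32768..32767 bytes.
 */
static bool fits_lwz_displacement(int64_t offset)
{
    return offset >= INT16_MIN && offset <= INT16_MAX;
}
```

A constant kept next to the code stays well inside that range, while a
constant in .rodata may not, which is where the extra complexity would
come from.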