Message-ID: <CAK1hOcPQ=mADeAUP3i-Xt3rvHmgUrVVoz2yUEOkUEYQ2xRVN2g@mail.gmail.com>
Date: Sun, 15 Feb 2015 15:07:06 +0100
From: Denys Vlasenko <vda.linux@...glemail.com>
To: musl <musl@...ts.openwall.com>
Subject: Re: [PATCH] x86_64/memset: use "small block" code for blocks up to 30 bytes long

On Sun, Feb 15, 2015 at 5:06 AM, Rich Felker <dalias@...c.org> wrote:
>> The main change whose value I really question is the conditional
>> widen_rax. If the value isn't used until a few cycles after the imul
>> instruction, doing it unconditionally is probably cheaper than testing
>> and branching even when the branch is predictable.
>
> To elaborate, simply replacing the unconditional imul with an
> unconditional xor %eax,%eax in my best variant so far, I was only able
> to save one cycle. So I don't see any way a test, branch, and
> conditional imul could be less expensive than the unconditional imul.

So imul elimination is a (tiny) win even on our CPUs, which happen to be
the _fastest_ CPUs in regards to 64x64 imul (3 cycles).

Just because we don't personally see a hit from 6-cycle imul of AMD CPUs,
it does not mean people who do use those CPUs don't exist. Have heart...
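
For readers following along, here is a rough C sketch of the two "widen" strategies being compared. It is not the patch's assembly. The function names are made up for illustration, and the assumption that the conditional variant skips the multiply when the fill byte is zero is a guess from the xor %eax,%eax remark above; the message itself does not spell out the test.

```c
#include <stdint.h>

/* Unconditional variant: a single multiply replicates the low byte of c
 * into all eight bytes of a 64-bit word (the role imul plays here). */
static uint64_t widen_unconditional(int c)
{
    return (uint64_t)(unsigned char)c * 0x0101010101010101ULL;
}

/* Conditional variant (assumed condition: fill byte is zero).
 * A zero byte is already "widened", so the multiply can be skipped
 * at the cost of a test and a branch. */
static uint64_t widen_conditional(int c)
{
    uint64_t v = (unsigned char)c;
    if (v != 0)
        v *= 0x0101010101010101ULL;
    return v;
}
```

Both functions return the same value for every input byte. The disagreement in the thread is only about which one is cheaper: the test-and-branch, or an imul whose latency differs between CPU families.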