Message-ID: <CAK1hOcPQ=mADeAUP3i-Xt3rvHmgUrVVoz2yUEOkUEYQ2xRVN2g@mail.gmail.com>
Date: Sun, 15 Feb 2015 15:07:06 +0100
From: Denys Vlasenko <vda.linux@...glemail.com>
To: musl <musl@...ts.openwall.com>
Subject: Re: [PATCH] x86_64/memset: use "small block" code for blocks
 up to 30 bytes long

On Sun, Feb 15, 2015 at 5:06 AM, Rich Felker <dalias@...c.org> wrote:
>> The main change whose value I really question is the conditional
>> widen_rax. If the value isn't used until a few cycles after the imul
>> instruction, doing it unconditionally is probably cheaper than testing
>> and branching even when the branch is predictable.
>
> To elaborate, simply replacing the unconditional imul with an
> unconditional xor %eax,%eax in my best variant so far, I was only able
> to save one cycle. So I don't see any way a test, branch, and
> conditional imul could be less expensive than the unconditional imul.

So imul elimination is a (tiny) win even on our CPUs, which happen
to be the _fastest_ CPUs with regard to 64x64 imul (3 cycles).

Just because we don't personally see a hit from the 6-cycle imul of AMD CPUs,
it does not mean that people who use those CPUs don't exist. Have a heart...
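
For readers following the thread without the patch at hand, here is a
rough C sketch of the two approaches being compared. It assumes that
"widen_rax" means replicating the fill byte into all eight bytes of a
64-bit register by multiplying with 0x0101010101010101, and that the
conditional variant skips the multiply when the fill byte is zero (the
common memset(p, 0, n) case). The function names are hypothetical and
are not taken from the musl source or from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Unconditional widening: always pay for one 64x64 multiply. */
uint64_t widen_always(unsigned char c)
{
    return (uint64_t)c * 0x0101010101010101ULL;
}

/* Conditional widening (assumed shape of the patch): test the byte and
 * skip the multiply when it is zero, since 0 widened is still 0. */
uint64_t widen_if_nonzero(unsigned char c)
{
    uint64_t v = c;
    if (v != 0)
        v *= 0x0101010101010101ULL;
    return v;
}

/* Both variants must agree for every possible fill byte. */
void widen_self_check(void)
{
    for (unsigned i = 0; i < 256; i++)
        assert(widen_always((unsigned char)i) ==
               widen_if_nonzero((unsigned char)i));
}
```

The two functions return the same value for every input; the
disagreement in the thread is only about whether the test and branch
cost more or less than the multiply they sometimes avoid, which depends
on imul latency and on how soon the widened value is needed.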
