Message-ID: <75bd5da8a0ed0ad50e047745f89447ba@smtp.hushmail.com>
Date: Fri, 14 Sep 2012 19:21:27 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: intrinsics: speed up for linux-x86-64-native

On 14 Sep, 2012, at 15:08 , Aleksey Cherepanov <aleksey.4erepanov@...il.com> wrote:
> Looking over sse-intrinsics.c I noticed weird thing: multiple
> MD5_PARA_DO cycles when it is possible to write one cycle over
> everything and avoid use of tmp variable. I tried to avoid some cycles
> and got a speed up. But when I merged them into one cycle per MD5_STEP
> I got a significant slowdown.

It's not very intuitive but AFAIK it was made that way on purpose: The intention is to get code that hides latency, much like GPU coding. This is pretty compiler-dependent and that is why icc can make such a great difference.

That is about all I know so I'll leave the details to the experts. I hope someone can give a more thorough explanation - I'd read it with interest.

magnum
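
For readers following the thread, here is a minimal sketch of the two loop styles under discussion. It is not the code from sse-intrinsics.c: plain uint32_t values stand in for SSE vectors, and PARA, step_split, step_merged and steps_agree are hypothetical names chosen for illustration. The split form issues the same operation for every interleaved lane before moving on to the next operation, which gives the compiler independent work to schedule while an earlier result is still in flight. The merged form does all operations for one lane at a time. The sketch only shows that both compute the same MD5-style step; which one runs faster depends on the compiler and CPU, as noted above.

```c
#include <stdint.h>

#define PARA 3  /* hypothetical interleave factor, not taken from the thread */

/* Rotate left; s must be in 1..31. */
static uint32_t rotl32(uint32_t v, unsigned s)
{
    return (v << s) | (v >> (32 - s));
}

/* MD5 round-1 boolean function F. */
static uint32_t md5_f(uint32_t b, uint32_t c, uint32_t d)
{
    return (b & c) | (~b & d);
}

/* Split style: one short loop per operation, using a tmp array. */
void step_split(uint32_t a[PARA], const uint32_t b[PARA],
                const uint32_t c[PARA], const uint32_t d[PARA],
                const uint32_t x[PARA], uint32_t k, unsigned s)
{
    uint32_t tmp[PARA];
    int i;

    for (i = 0; i < PARA; i++) tmp[i] = md5_f(b[i], c[i], d[i]);
    for (i = 0; i < PARA; i++) a[i] += tmp[i] + x[i] + k;
    for (i = 0; i < PARA; i++) a[i] = rotl32(a[i], s);
    for (i = 0; i < PARA; i++) a[i] += b[i];
}

/* Merged style: one loop per step, no tmp array. */
void step_merged(uint32_t a[PARA], const uint32_t b[PARA],
                 const uint32_t c[PARA], const uint32_t d[PARA],
                 const uint32_t x[PARA], uint32_t k, unsigned s)
{
    int i;

    for (i = 0; i < PARA; i++)
        a[i] = b[i] + rotl32(a[i] + md5_f(b[i], c[i], d[i]) + x[i] + k, s);
}

/* Returns 1 if both styles produce identical results on sample data. */
int steps_agree(void)
{
    uint32_t a1[PARA], a2[PARA], b[PARA], c[PARA], d[PARA], x[PARA];
    int i;

    for (i = 0; i < PARA; i++) {
        a1[i] = a2[i] = 0x67452301u + (uint32_t)i;
        b[i] = 0xefcdab89u ^ (uint32_t)i;
        c[i] = 0x98badcfeu + 17u * (uint32_t)i;
        d[i] = 0x10325476u - (uint32_t)i;
        x[i] = 0x01020304u * (uint32_t)(i + 1);
    }
    step_split(a1, b, c, d, x, 0xd76aa478u, 7);
    step_merged(a2, b, c, d, x, 0xd76aa478u, 7);

    for (i = 0; i < PARA; i++)
        if (a1[i] != a2[i])
            return 0;
    return 1;
}
```

To compare speed, one would call each step function in a tight loop from a small benchmark and build it with different compilers and optimization levels; the results are expected to vary between gcc and icc.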