Message-ID: <513a4b9826006c9659f32134e87004a6@smtp.hushmail.com>
Date: Wed, 10 Jun 2015 19:19:58 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

On 2015-06-10 17:59, Lei Zhang wrote:
> I further did some investigation into the asm code generated under x1
> & x2 (SIMD_PARA_SHA256) by icc on my laptop (AVX). In SSESHA256body,
> there're about 200 vmovdqu instructions generated under x1, and the
> number is 260 under x2. Most of the vmovdqu instructions seem to be
> used for loading & storing xmm registers, only a few for
> inter-register moving. I think it's likely those additional vmovdqu
> instructions under x2 are for register spilling.

So we get 30% more load/store for 100% more work done. That should be a
win! But this assumes we're not having actual loops in the code.

magnum
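
The arithmetic behind the "30% more load/store for 100% more work" remark can be spelled out with a small sketch. It uses only the approximate counts quoted above (about 200 vmovdqu at x1, 260 at x2). The function name and its parameters are made up for illustration and are not part of the John the Ripper source. It also treats the static instruction count as the executed count, which only holds if the code is straight-line, i.e. the "no actual loops" caveat above.

```python
def mov_overhead(movs_x1, movs_x2, para_x1=1, para_x2=2):
    """Compare vmovdqu counts between two interleaving factors.

    Hypothetical helper: inputs are static instruction counts read from
    the generated asm, so the result only reflects executed instructions
    if the code contains no loops.
    """
    # Relative growth in total load/store instructions going from x1 to x2.
    extra = movs_x2 / movs_x1 - 1.0
    # Moves per unit of work (one interleaved SHA-256 instance).
    per_unit_x1 = movs_x1 / para_x1
    per_unit_x2 = movs_x2 / para_x2
    # Fraction of moves saved per unit of work at the higher factor.
    saving = 1.0 - per_unit_x2 / per_unit_x1
    return extra, per_unit_x1, per_unit_x2, saving


# Approximate counts quoted in the message: ~200 at x1, ~260 at x2.
extra, per_x1, per_x2, saving = mov_overhead(200, 260)
print(f"extra moves overall: {extra:.0%}")
print(f"moves per unit of work: x1={per_x1:.0f}, x2={per_x2:.0f}")
print(f"saving per unit of work: {saving:.0%}")
```

With these inputs the total move count grows by about 30%, while the moves per unit of work drop from 200 to 130. That says nothing about actual throughput, which would still have to be benchmarked.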