Message-ID: <20150701100505.GA9071@openwall.com>
Date: Wed, 1 Jul 2015 13:05:05 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: optimizing bcrypt cracking on x86

On Thu, Jun 25, 2015 at 07:33:21AM +0300, Solar Designer wrote:
> Regarding the 2x2 MMX2 code on i7-4770K:
>
> On Wed, Jun 24, 2015 at 07:10:07AM +0300, Solar Designer wrote:
> > On 64-bit builds, though, I only got this to run at cumulative speeds
> > like 780*8 = 6240 c/s, which is worse than 6595 c/s previously seen with
> > OpenMP (and even worse than the slightly better speeds that can be seen
> > with separate independent processes).
>
> I managed to improve this to 796*8 = 6368 c/s by removing some of the
> large displacements on loads, and instead keeping them in base registers
> (using the extra GPRs that we have in 64-bit mode for this). For the
> 288 bytes of P, an offset into the middle of this range may be put into
> a register, and then 256 out of the 288 bytes may be accessed via 1-byte
> displacements (or alternatively 248 out of 288, but then we can also
> access the first S-box via the same base register with 0x78 in the
> 1-byte displacement). Also, remembering that R13 is special just like
> RBP (no without-displacement encoding) can sometimes be helpful.

Another related trick, which I haven't tried yet, is to interleave pairs
of S-boxes (from the same bcrypt instance or from different instances).
Then the same base register could be used to access two such S-boxes at
once, with "4" in the displacement field (fits in 1 byte, obviously) for
the second S-box in a pair. The index scaling would be by 8 rather than
by 4, but it's the same cost. This way, only 4 base registers would be
needed to access everything for 2 bcrypt instances, with at most 1-byte
displacements (thus avoiding 4-byte displacements, which appear to cost
extra on Haswell).
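To make the displacement arithmetic above concrete, here is a minimal C sketch. It assumes P is a flat 288-byte region immediately followed by the first S-box (a layout consistent with the 248-byte / 0x78 figures, but an assumption, not a description of the actual John the Ripper code), and it models the proposed interleaving as two 256-entry S-boxes stored as adjacent 32-bit pairs. The names `disp8_reachable`, `sbox_pair_t`, `interleave` and `load_pair` are hypothetical illustrations.

```c
#include <stdint.h>
#include <string.h>

/* x86 ModRM/SIB addressing can encode a sign-extended 8-bit displacement. */
#define DISP8_MIN (-128)
#define DISP8_MAX 127

/*
 * Count the byte offsets in [0, size) that can be addressed as
 * [base + disp8] when the base register points at offset 'base'.
 * Assumes (hypothetically) that P occupies offsets 0..size-1.
 */
static int disp8_reachable(int size, int base)
{
    int count = 0;
    for (int off = 0; off < size; off++) {
        int disp = off - base;
        if (disp >= DISP8_MIN && disp <= DISP8_MAX)
            count++;
    }
    return count;
}

/*
 * Hypothetical interleaved layout of two 256-entry S-boxes:
 * entry i of box A is at byte offset 8*i, entry i of box B at 8*i + 4.
 */
typedef struct {
    uint32_t pair[256][2];
} sbox_pair_t;

static void interleave(sbox_pair_t *dst, const uint32_t a[256],
                       const uint32_t b[256])
{
    for (int i = 0; i < 256; i++) {
        dst->pair[i][0] = a[i];
        dst->pair[i][1] = b[i];
    }
}

/*
 * Emulates a load from [base + idx*8 + disp]: disp 0 selects box A,
 * disp 4 (fits in a 1-byte displacement) selects box B.
 */
static uint32_t load_pair(const sbox_pair_t *base, uint8_t idx,
                          unsigned disp)
{
    uint32_t v;
    memcpy(&v, (const unsigned char *)base + (size_t)idx * 8 + disp,
           sizeof v);
    return v;
}
```

With the base at offset 128, `disp8_reachable(288, 128)` gives 256; moving the base to offset 168 gives 248, and the S-box that follows P then starts at 288 - 168 = 120 = 0x78. In the interleaved layout, `load_pair(&p, i, 0)` and `load_pair(&p, i, 4)` return entry i of box A and box B respectively, so a single base register serves both boxes with scale 8.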
Another advantage of such interleaving is that we're guaranteed to have
no cache bank conflict between lookups from these two S-boxes then. Per
my testing, this is irrelevant for Haswell, but it might be relevant on
other CPUs (and not only CPUs).

> This is still not good enough, though.

Alexander