john-dev - Re: optimizing bcrypt cracking on x86



Message-ID: <20150625043321.GA1020@openwall.com>
Date: Thu, 25 Jun 2015 07:33:21 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: optimizing bcrypt cracking on x86

Regarding the 2x2 MMX2 code on i7-4770K:

On Wed, Jun 24, 2015 at 07:10:07AM +0300, Solar Designer wrote:
> On 64-bit builds, though, I only got this to run at cumulative speeds
> like 780*8 = 6240 c/s, which is worse than 6595 c/s previously seen with
> OpenMP (and even worse than the slightly better speeds that can be seen
> with separate independent processes).

I managed to improve this to 796*8 = 6368 c/s by removing some of the
large displacements on loads and instead keeping the corresponding
addresses in base registers (using the extra GPRs that we have in 64-bit
mode for this).  For the 288 bytes of P, a pointer into the middle of
this range may be put into a register, and then 256 out of the 288 bytes
may be accessed via 1-byte displacements (or alternatively 248 out of
288, but then we can also access the first S-box via the same base
register with 0x78 in the 1-byte displacement).  Also, it can sometimes
help to remember that R13 is special just like RBP: as a base register,
it has no encoding without a displacement.
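To make the displacement arithmetic concrete, here is a small Python sketch (not part of the original message). It assumes P is accessed as 8-byte (MMX register sized) entries and that the first S-box starts immediately after P in memory; `mem_operand_bytes` and `reachable_bytes` are hypothetical helpers that only model the ModRM/SIB/displacement bytes of an x86-64 memory operand, not full instruction lengths.

```python
# Sketch only: models the addressing-mode bytes of an x86-64 memory operand
# [base + disp] or [base + index*scale + disp]; opcode, prefixes, REX and
# immediates are not counted.

def mem_operand_bytes(base, disp, index=False):
    """Bytes of ModRM + SIB + displacement for a memory operand."""
    size = 1  # ModRM byte
    if index or base in ("rsp", "r12"):
        size += 1  # SIB byte: needed for an index, and always for rsp/r12 bases
    if disp == 0 and base not in ("rbp", "r13"):
        return size  # mod=00: no displacement at all
    if -128 <= disp <= 127:
        return size + 1  # mod=01: disp8 (rbp/r13 need disp8=0 even for [reg])
    return size + 4  # mod=10: disp32


def reachable_bytes(buf_size, base_offset, elem=8):
    """Bytes of a buffer whose elem-sized entries can be addressed as
    [base + disp8], with base = buffer start + base_offset."""
    count = 0
    for start in range(0, buf_size, elem):
        if -128 <= start - base_offset <= 127:
            count += elem
    return count


P_SIZE = 288  # bytes of P, as in the message

# Base pointer in the middle of P: 256 of the 288 bytes fit in disp8.
print(reachable_bytes(P_SIZE, 144))  # 256

# Shift the base so the first S-box (assumed to start right after P)
# sits at disp8 = 0x78; 248 bytes of P remain reachable.
BASE = 168
print(reachable_bytes(P_SIZE, BASE), hex(P_SIZE - BASE))  # 248 0x78

# [r13] and [rbp] cost a byte more than [rbx]; [r12] needs a SIB byte.
print(mem_operand_bytes("rbx", 0), mem_operand_bytes("r13", 0),
      mem_operand_bytes("r12", 0))  # 1 2 2
```

With the base at offset 144 into P, 256 bytes are reachable with 1-byte displacements; moving it to offset 168 leaves 248 bytes of P reachable and places the start of the S-box at displacement 0x78, so S-box lookups can share the same base register.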

This is still not good enough, though.

Alexander
