Message-ID: <20150625043321.GA1020@openwall.com>
Date: Thu, 25 Jun 2015 07:33:21 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: optimizing bcrypt cracking on x86

Regarding the 2x2 MMX2 code on i7-4770K:

On Wed, Jun 24, 2015 at 07:10:07AM +0300, Solar Designer wrote:
> On 64-bit builds, though, I only got this to run at cumulative speeds
> like 780*8 = 6240 c/s, which is worse than 6595 c/s previously seen with
> OpenMP (and even worse than the slightly better speeds that can be seen
> with separate independent processes).

I managed to improve this to 796*8 = 6368 c/s by removing some of the
large displacements on loads, and instead keeping the corresponding
addresses in base registers (using the extra GPRs that we have in 64-bit
mode for this).  For the 288 bytes of P, a pointer into the middle of
this range may be put into a register, and then 256 out of the 288 bytes
may be accessed via 1-byte displacements (or alternatively 248 out of
288, but then we can also access the first S-box via the same base
register with 0x78 in the 1-byte displacement).  Also, remembering that
R13 is special just like RBP (there is no encoding without a
displacement, so even a zero offset costs a displacement byte) can
sometimes be helpful.
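
To make the displacement arithmetic above concrete, here is a small
Python sketch (not part of the actual assembly code).  The function
names are made up for illustration, and the "shifted" case assumes that
the first S-box starts immediately after the 288 bytes of P, which is
what the 0x78 figure suggests but is an assumption about the layout.

```python
P_SIZE = 288  # bytes of P, as stated above


def p_bytes_in_disp8_reach(base_offset, size=P_SIZE):
    """Count bytes of P addressable as [base + disp8], disp8 in -128..127.

    base_offset is where the base register points, relative to the
    start of P.
    """
    return sum(1 for off in range(size) if -128 <= off - base_offset <= 127)


def disp_bytes(base_reg, disp):
    """Displacement bytes needed for a [base_reg + disp] operand.

    Ignores the SIB byte that RSP/R12 require.  RBP and R13 have no
    displacement-free encoding, so they pay one byte even for disp 0.
    """
    if disp == 0 and base_reg not in ("rbp", "r13"):
        return 0
    if -128 <= disp <= 127:
        return 1
    return 4


centered = P_SIZE // 2    # base register points at the middle of P
shifted = P_SIZE - 0x78   # assumption: first S-box sits right after P

print(p_bytes_in_disp8_reach(centered))  # bytes of P within disp8 reach
print(p_bytes_in_disp8_reach(shifted))   # fewer, but S-box start is at +0x78
print(disp_bytes("rax", 0), disp_bytes("r13", 0))
```

With the base at the middle of P, 256 bytes are within reach; with the
base moved so that the assumed S-box start lands at +0x78, 248 bytes of
P remain reachable.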

This is still not good enough, though.

Alexander
