john-dev - Re: optimizing bcrypt cracking on x86

Message-ID: <20150624224502.GA29543@openwall.com>
Date: Thu, 25 Jun 2015 01:45:02 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: optimizing bcrypt cracking on x86

On Wed, Jun 24, 2015 at 06:06:16PM -0400, Alain Espinosa wrote:
> ...I got speedup for 1 thread/core, but
> significant slowdown for 2 threads/core.
> 
> This is another thing that differs in my tests (maybe my asm code is suboptimal). On a Core i3-2120 I get a 4% speedup from interleaving 3 keys instead of 2. This is using 4 threads.

Of course, on an HT-less CPU you need to interleave 3 or 4 instances
rather than just 2.

In fact, with my 2x2 MMX2 code I am experimenting with 4 parallel
instances (2x crippled SIMD, 2x interleaving) on i7-4770K as well.
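
For readers new to the term: interleaving here means advancing several
independent Blowfish (bcrypt) instances in one instruction stream, so that
the S-box lookups of one instance can overlap the latency of another's.
Below is a minimal C sketch of plain 2x interleaving.  It is not the asm
under discussion; the names bf_ctx, bf_f, bf_encrypt1, bf_encrypt2,
fill_ctx and interleave_matches are hypothetical, and the table contents
are filler rather than real bcrypt state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-instance state: Blowfish P-array and four S-boxes. */
typedef struct {
    uint32_t P[18];
    uint32_t S[4][256];
} bf_ctx;

/* Blowfish F function: four S-box lookups indexed by the bytes of x. */
static uint32_t bf_f(const bf_ctx *c, uint32_t x)
{
    return ((c->S[0][x >> 24] + c->S[1][(x >> 16) & 0xff])
            ^ c->S[2][(x >> 8) & 0xff]) + c->S[3][x & 0xff];
}

/* One instance: every round depends on the previous one. */
static void bf_encrypt1(const bf_ctx *c, uint32_t *l, uint32_t *r)
{
    uint32_t L = *l, R = *r, t;
    for (int i = 0; i < 16; i++) {
        L ^= c->P[i];
        R ^= bf_f(c, L);
        t = L; L = R; R = t;
    }
    t = L; L = R; R = t;            /* undo the final swap */
    R ^= c->P[16];
    L ^= c->P[17];
    *l = L; *r = R;
}

/* Two instances interleaved: the a and b chains are independent, so an
 * out-of-order core (or hand-scheduled asm) can overlap their lookups. */
static void bf_encrypt2(const bf_ctx *a, const bf_ctx *b,
                        uint32_t *la, uint32_t *ra,
                        uint32_t *lb, uint32_t *rb)
{
    uint32_t La = *la, Ra = *ra, Lb = *lb, Rb = *rb, t;
    for (int i = 0; i < 16; i++) {
        La ^= a->P[i];
        Lb ^= b->P[i];
        Ra ^= bf_f(a, La);
        Rb ^= bf_f(b, Lb);
        t = La; La = Ra; Ra = t;
        t = Lb; Lb = Rb; Rb = t;
    }
    t = La; La = Ra; Ra = t;
    t = Lb; Lb = Rb; Rb = t;
    Ra ^= a->P[16]; La ^= a->P[17];
    Rb ^= b->P[16]; Lb ^= b->P[17];
    *la = La; *ra = Ra; *lb = Lb; *rb = Rb;
}

/* Fill a context with deterministic filler (an LCG), for testing only. */
static void fill_ctx(bf_ctx *c, uint32_t seed)
{
    for (int i = 0; i < 18; i++)
        c->P[i] = seed = seed * 1664525u + 1013904223u;
    for (int j = 0; j < 4; j++)
        for (int i = 0; i < 256; i++)
            c->S[j][i] = seed = seed * 1664525u + 1013904223u;
}

/* Returns 1 if the interleaved code matches two sequential runs. */
int interleave_matches(void)
{
    static bf_ctx a, b;
    uint32_t l1 = 1, r1 = 2, l2 = 3, r2 = 4;   /* sequential */
    uint32_t l3 = 1, r3 = 2, l4 = 3, r4 = 4;   /* interleaved */

    fill_ctx(&a, 12345u);
    fill_ctx(&b, 67890u);
    bf_encrypt1(&a, &l1, &r1);
    bf_encrypt1(&b, &l2, &r2);
    bf_encrypt2(&a, &b, &l3, &r3, &l4, &r4);
    assert(l1 == l3 && r1 == r3 && l2 == l4 && r2 == r4);
    return 1;
}
```

The results are identical either way; only the scheduling differs, which
is why the best interleave factor depends on how many hardware threads
share a core.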

Replacing those SHLD with MOV+SHR gave me a slowdown for 2 threads/core
even at 4 instances/thread.  (But that's with the 2x2 code, not simple 4x
interleaving.)

Replacing those SHLD with SHRX similarly speeds things up for 1 thread/core
(just like MOV+SHR), but keeps performance the same for 2 threads/core
(unlike the slowdown seen with MOV+SHR).  Unfortunately, it wastes a
register to hold the shift count, which may prevent other optimizations.
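
To make the register trade-off concrete, here is a small C model of the
three shift forms.  This message does not show the actual instruction
sequences, so the use case is an assumption: the sketch supposes the shift
extracts the top byte of a 32-bit word as an S-box index.  All function
names are hypothetical.

```c
#include <stdint.h>

/* Model of SHLD dst, src, n for 32-bit operands, 0 < n < 32:
 * dst is shifted left by n and refilled from the top bits of src.
 * The count is an immediate, and src is left unmodified. */
static uint32_t shld32(uint32_t dst, uint32_t src, unsigned n)
{
    return (dst << n) | (src >> (32 - n));
}

/* SHLD $8: the top byte of x lands in the low byte of the destination;
 * whatever the destination held before must be masked off. */
static uint32_t top_byte_shld(uint32_t x, uint32_t scratch)
{
    return shld32(scratch, x, 8) & 0xff;
}

/* MOV + SHR $24: two instructions, but the count is still an immediate. */
static uint32_t top_byte_mov_shr(uint32_t x)
{
    uint32_t t = x;     /* MOV: copy so that x survives */
    t >>= 24;           /* SHR with immediate count */
    return t;
}

/* SHRX (BMI2): one non-destructive instruction, but it has no immediate
 * form, so the count (24 here) must sit in a register. */
static uint32_t top_byte_shrx(uint32_t x, uint32_t count_reg)
{
    return x >> (count_reg & 31);   /* hardware masks the count to 5 bits */
}

/* Returns 1 if all three forms extract the same byte from x. */
int shift_forms_agree(uint32_t x)
{
    uint32_t a = top_byte_shld(x, 0xffffffffu);
    uint32_t b = top_byte_mov_shr(x);
    uint32_t c = top_byte_shrx(x, 24);
    return a == b && b == c;
}
```

In this model the only extra live value in the SHRX variant is count_reg,
which is the register cost mentioned above.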

Alexander
