Message-ID: <20150624224502.GA29543@openwall.com>
Date: Thu, 25 Jun 2015 01:45:02 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: optimizing bcrypt cracking on x86

On Wed, Jun 24, 2015 at 06:06:16PM -0400, Alain Espinosa wrote:
> ...I got speedup for 1 thread/core, but
> significant slowdown for 2 threads/core.
> 
> This is another thing that differs in my tests (maybe my asm code is suboptimal). On a Core i3-2120 I get a 4% speedup interleaving 3 keys instead of 2. This is using 4 threads.

Of course, on an HT-less CPU you need to interleave 3 or 4 instances
rather than just 2.
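
To illustrate what "interleaving instances" means here, below is a minimal
plain C sketch, not the asm under discussion.  It assumes the standard
Blowfish round structure; the names bf_ctx, bf_f, bf_encrypt1 and
bf_encrypt2 are hypothetical and not taken from the John the Ripper
sources.  The point is only that two independent computations advance in
the same loop, so one instance's S-box loads can overlap with the other's.

```c
#include <stdint.h>

/* Hypothetical per-instance state: Blowfish P-array and four S-boxes. */
typedef struct {
    uint32_t P[18];
    uint32_t S[4][256];
} bf_ctx;

/* Standard Blowfish F function: four byte-indexed S-box lookups. */
static inline uint32_t bf_f(const bf_ctx *c, uint32_t x)
{
    return ((c->S[0][x >> 24] + c->S[1][(x >> 16) & 0xFF])
            ^ c->S[2][(x >> 8) & 0xFF]) + c->S[3][x & 0xFF];
}

/* One instance: each round depends on the previous one, so the loads
 * form a single serial dependency chain. */
static void bf_encrypt1(const bf_ctx *c, uint32_t *l, uint32_t *r)
{
    uint32_t L = *l, R = *r, t;
    for (int i = 0; i < 16; i++) {
        L ^= c->P[i];
        R ^= bf_f(c, L);
        t = L; L = R; R = t;
    }
    t = L; L = R; R = t;        /* undo the last swap */
    R ^= c->P[16];
    L ^= c->P[17];
    *l = L; *r = R;
}

/* Two interleaved instances: the chains for a and b are independent,
 * so the CPU can work on one while the other waits on a load. */
static void bf_encrypt2(const bf_ctx *a, uint32_t *la, uint32_t *ra,
                        const bf_ctx *b, uint32_t *lb, uint32_t *rb)
{
    uint32_t La = *la, Ra = *ra, Lb = *lb, Rb = *rb, t;
    for (int i = 0; i < 16; i++) {
        La ^= a->P[i];          Lb ^= b->P[i];
        Ra ^= bf_f(a, La);      Rb ^= bf_f(b, Lb);
        t = La; La = Ra; Ra = t;
        t = Lb; Lb = Rb; Rb = t;
    }
    t = La; La = Ra; Ra = t;
    t = Lb; Lb = Rb; Rb = t;
    Ra ^= a->P[16];             Rb ^= b->P[16];
    La ^= a->P[17];             Lb ^= b->P[17];
    *la = La; *ra = Ra; *lb = Lb; *rb = Rb;
}
```

A 3x or 4x version would follow the same pattern with more independent
states, at the cost of more live registers per thread.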

In fact, with my 2x2 MMX2 code I am experimenting with 4 parallel
instances (2x crippled SIMD, 2x interleaving) on i7-4770K as well.

Replacing those SHLD with MOV+SHR gave me a slowdown for 2 threads/core
even at 4 instances/thread.  (But that's the 2x2 thing, not simple 4x
interleaving.)

Replacing those SHLD with SHRX similarly speeds things up for 1 thread/core
(just like MOV+SHR), but keeps performance the same for 2 threads/core
(unlike the slowdown seen with MOV+SHR).  Unfortunately, it wastes a
register to hold the shift count, which may prevent other optimizations.
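
For reference, here is a rough C model of the three alternatives.  It
assumes (this is not stated above) that the SHLD in question is used to
pull the top bits of a 32-bit word into another register without
clobbering the source.  The function names are made up for illustration,
and the explicit mask in the SHLD variant stands in for whatever the real
code does to ignore the stale upper bits.

```c
#include <stdint.h>

/* Model of a 32-bit SHLD by n (0 < n < 32): dst is shifted left and its
 * low n bits are filled with the top n bits of src.  src is unchanged. */
static uint32_t shld32(uint32_t dst, uint32_t src, unsigned n)
{
    return (dst << n) | (src >> (32 - n));
}

/* SHLD variant: the low n bits of the result are the top n bits of src;
 * the rest is leftover from the old dst value, so mask it off here. */
static uint32_t top_bits_shld(uint32_t dst_old, uint32_t src, unsigned n)
{
    return shld32(dst_old, src, n) & ((1u << n) - 1);
}

/* MOV+SHR variant: copy first (two instructions), then shift the copy
 * right by an immediate count. */
static uint32_t top_bits_mov_shr(uint32_t src, unsigned n)
{
    uint32_t tmp = src;     /* MOV */
    tmp >>= 32 - n;         /* SHR by immediate */
    return tmp;
}

/* SHRX variant (BMI2): one non-destructive instruction, but the shift
 * count must live in a register, modeled here by the count argument. */
static uint32_t top_bits_shrx(uint32_t src, uint32_t count)
{
    return src >> count;    /* count == 32 - n, held in a register */
}
```

All three return the same value for the same src and n; they differ only
in instruction count, and in the case of SHRX in needing a register for
the shift count.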

Alexander
