john-dev - Re: Parallella: bcrypt

Message-ID: <20130730143311.GA17291@openwall.com>
Date: Tue, 30 Jul 2013 18:33:11 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Parallella: bcrypt

Katja,

On Tue, Jul 30, 2013 at 02:18:04PM +0200, Katja Malvoni wrote:
> On Tue, Jul 30, 2013 at 2:47 AM, Solar Designer <solar@...nwall.com> wrote:
> 
> > Perhaps you should change your code to transferring just one struct?
> > I wouldn't be surprised if this gives us a few c/s extra.
> 
> Done.

Any change in c/s rate?

BTW, you can probably do better: when you have hashes with multiple
salts loaded for cracking, usually the candidate passwords stay the same
across crypt_all() calls until they've been tested for all salts (so the
salt changes across those calls).  You can optimize for this special
case, e.g. by maintaining a keys_changed variable in your format (e.g.,
DES_bs* files use a variable like this) and only transferring the
candidate passwords to Epiphany when they have changed.

There's another special case: when cracking hashes with just one salt
(which often means that you have just one hash loaded, although this is
not necessarily so), the salt stays the same across crypt_all() calls
(only the candidate passwords change), so you can save time by not
transferring the unchanged salt.

So it may be better to have exactly two structs: one for the salt and
one for the candidate passwords - and transfer only whichever of them
has changed since the previous transfer.  You set the keys_changed flag
in set_key() and the salt_changed flag in set_salt(), and you reset both
in crypt_all() once the corresponding transfer is done.
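
Roughly like this - a minimal sketch only, not your actual code.  The
struct layouts, the sizes, and the dev_* buffers (plain memory standing
in for Epiphany memory) are made up for illustration, the function
signatures are simplified compared to the real format methods, and the
two counters exist only to show how many transfers happen:

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 32   /* assumed: 16 cores * 2 */
#define KEY_LEN  72   /* assumed maximum key length */

/* Hypothetical layouts; the real structs will differ. */
struct keys_buf { char key[MAX_KEYS][KEY_LEN + 1]; };
struct salt_buf { unsigned char salt[16]; };

static struct keys_buf host_keys, dev_keys; /* dev_* stand in for Epiphany memory */
static struct salt_buf host_salt, dev_salt;
static int keys_changed, salt_changed;
static int key_transfers, salt_transfers;   /* for illustration only */

static void set_key(const char *key, int index)
{
    assert(index >= 0 && index < MAX_KEYS);
    strncpy(host_keys.key[index], key, KEY_LEN);
    host_keys.key[index][KEY_LEN] = '\0';
    keys_changed = 1;
}

static void set_salt(const void *salt)
{
    memcpy(host_salt.salt, salt, sizeof(host_salt.salt));
    salt_changed = 1;
}

static void crypt_all(void)
{
    if (keys_changed) {
        /* placeholder for the real host-to-device write */
        memcpy(&dev_keys, &host_keys, sizeof(dev_keys));
        key_transfers++;
        keys_changed = 0;
    }
    if (salt_changed) {
        /* placeholder for the real host-to-device write */
        memcpy(&dev_salt, &host_salt, sizeof(dev_salt));
        salt_transfers++;
        salt_changed = 0;
    }
    /* ... start the cores and read back the results here ... */
}
```

With many salts loaded, only the salt struct is sent on most calls; with
one salt, only the keys struct is.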

I'm sorry it did not occur to me to suggest this to you before.

> When I do test with BF_tst.in speed is 727 c/s. It seems that interleaving
> is not used (but even without interleaving it shouldn't be this slow).

Your code can't magically turn into a non-interleaved version.  Rather,
when there are too few inputs to fill all 32 "slots", only some of the
slots do useful work (and only those are counted for c/s), so a speed
worse than you had without interleaving is to be expected for the last
set of candidate passwords to be tested (as long as the number of
candidate passwords is not a multiple of 32).  This does mean that
increasing interleaving hurts performance for very short runs of the
program, while improving performance for long runs.
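
To put rough numbers on this, here is a small sketch.  It assumes that a
crypt_all() call takes about the same time no matter how many of the 32
slots are filled, which is a simplification; the function names are
made up for illustration:

```c
#define SLOTS 32 /* 16 cores * 2 interleaved instances */

/* Number of crypt_all() calls needed to test this many candidates. */
static unsigned int calls_needed(unsigned int candidates)
{
    return (candidates + SLOTS - 1) / SLOTS;
}

/* Fraction of the slots doing useful work over the whole run. */
static double slot_utilization(unsigned int candidates)
{
    unsigned int calls = calls_needed(candidates);

    if (calls == 0)
        return 0.0;
    return (double)candidates / ((double)calls * SLOTS);
}
```

Under that assumption, 40 candidates need 2 calls and fill 40 of 64
slots, so the reported c/s would be about 62.5% of the full-batch rate,
while for millions of candidates the partially filled last call hardly
matters.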

How many candidate passwords are you testing?

> Self
> test on same code gives speed of 1175 or 1177 c/s. MAX_KEYS_PER_CRYPT is
> defined as EPIPHANY_CORES*2 so every crypt_all() call should compute 32
> hashes?

Yes.

BTW, when you #define something to an expression (rather than a literal
constant), it is considered good style to enclose the entire expression
in parentheses.  That way, you will avoid potential operator precedence
bugs when you or someone else later happens to use your #define'd
non-literal constant in a larger expression.
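
A minimal illustration of the pitfall, with the expression wrapped in
( ) in the safe variant.  The macro and function names other than
EPIPHANY_CORES are made up for this example, and EPIPHANY_CORES is taken
to be 16 since EPIPHANY_CORES*2 gives 32:

```c
#define EPIPHANY_CORES 16

#define KEYS_UNSAFE EPIPHANY_CORES*2   /* no parentheses */
#define KEYS_SAFE   (EPIPHANY_CORES*2) /* whole expression parenthesized */

/* Expands to total / 16 * 2, i.e. (total / 16) * 2. */
static int batches_unsafe(int total)
{
    return total / KEYS_UNSAFE;
}

/* Expands to total / (16 * 2), which is what was meant. */
static int batches_safe(int total)
{
    return total / KEYS_SAFE;
}
```

For total = 64 the first function returns 8 and the second returns 2.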

Alexander
