Message-ID: <20111217224622.GA29846@openwall.com>
Date: Sun, 18 Dec 2011 02:46:22 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: DES BS + OMP improvements, MPI direction

On Sat, Dec 17, 2011 at 09:21:19AM -0700, RB wrote:
> I've not been terribly active here lately, but certainly keep up with
> the state of things and "evangelize" the use of JtR where appropriate.

Wow.

> That said, I wanted to congratulate and thank you guys for the recent
> speed improvements.  Two specific items of note: the key setup for DES
> bitslice and the proliferation of OpenMP enablement for various
> hashes.  By itself, the DES improvement speeds up LM on benchmarks by
> as much as 2x on one of my systems (W3565).

It'd be nice if you added some benchmark results to:

http://openwall.info/wiki/john/benchmarks

> In concert with OpenMP, I
> see between 120m and 180m c/s over four threads in live cracking
> (using HT appears to dampen my specific performance),

Yes, LM/OpenMP scales poorly.  A 4-thread build is usually only about
1.5 times faster than a non-OpenMP build.  But all of these builds are
very fast anyway - almost as fast as dummy.

You might want to use a non-OpenMP build, which should do somewhere
around 60M c/s per process on your system.  With 8 processes (e.g. with
MPI), you'll get around 250M c/s total.
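To illustrate the multi-process approach (just a generic sketch, not
the actual code from the -jumbo MPI patch; crack_share() is made up):
each rank is an ordinary single-threaded process that claims its own
share of the keyspace and never needs to talk to the others while
running:

#include <mpi.h>

int main(int argc, char **argv)
{
	int rank, size;

	MPI_Init(&argc, &argv);
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);
	MPI_Comm_size(MPI_COMM_WORLD, &size);

	/* Hypothetical: crack every size-th candidate starting at rank,
	 * so the ranks stay fully independent while cracking. */
	crack_share(rank, size);

	MPI_Finalize();
	return 0;
}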

> with completion estimates in the 36-hour (!!!) range.

Actually, 120M c/s would result in completion in 17 hours (for printable
US-ASCII), so perhaps you're not actually getting that speed.
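For reference, my rough arithmetic here (assuming the effective LM
charset of 69 characters, i.e. printable US-ASCII with lowercase folded
into uppercase, and length-7 halves):

	69^7 ~= 7.45e12 candidates
	7.45e12 / 120M c/s ~= 62000 seconds ~= 17 hours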

> As bad as LM is, it's still
> useful to those who have access to the SAM database.  OMP is just a
> huge help in general.  I'm not seeing a linear increase in speed on my
> systems, but it's enough to stick with 4/8/16-core machines and not
> futz about trying to set up an MPI network any more.

Like I said, LM/OpenMP scales very poorly - almost to the point where
it's unreasonable to use OpenMP for it at all.  For other hash types,
it scales much better (90% efficiency compared to multiple separate
processes is common).

> All that said, after reading through the OMP additions, they don't
> appear to be terribly invasive.  I'm sure there's some necessary code
> grooming prior to the seemingly small insertion of the pragma, but
> overall it seems small.  I've not looked at the MPI stuff since magnum
> took that over (thanks!), but that type of positioning is exactly
> where I'd have wanted to go eventually.  Being partly lazy, are those
> of you looking at the MPI implementation considering using that same
> approach?  Assuming the network latency and throughput don't
> interfere, that could certainly help solve the scaling issues john-MPI
> had at the sunset of my maintainership.

Are you suggesting that we'd apply MPI at the same low level where we
currently apply OpenMP?  No, that doesn't sound like a good idea to me,
except maybe for the slowest hash/cipher types only (BF, MSCash2, RAR,
crypt when run against SHA-crypt), where it might be acceptable.  This
is not currently planned, though.
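To clarify what "that same low level" means, here's a simplified sketch
(not actual JtR code; compute_hash() is made up) - the OpenMP pragma
goes around the loop over one batch of candidate keys, right next to
the crypto:

/* Simplified illustration only: "low level" parallelization splits one
 * batch of candidate keys across threads right around the crypto, which
 * is why per-batch overhead hurts very fast hashes such as LM. */
static void crypt_batch(const char **keys, unsigned int *out, int count)
{
	int i;

#ifdef _OPENMP
#pragma omp parallel for
#endif
	for (i = 0; i < count; i++)
		out[i] = compute_hash(keys[i]); /* hypothetical hash function */
}

Doing the same with MPI would add message-passing latency to every such
batch, which is why it could only possibly pay off for the slowest
hashes.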

Rather, I think we should be moving in the opposite direction:
introducing easy-to-use parallelization at higher levels, where it can
be more generic and more efficient.  Somewhat similar to what the MPI
support in -jumbo does, but without MPI.
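Conceptually something like this (just a sketch; node_id, node_count,
get_next_candidate(), and try_key() are placeholders): each node filters
the common candidate stream down to its own share and otherwise runs
completely independently, so there's almost nothing to synchronize:

/* Sketch of "high level" splitting: each node takes every node_count-th
 * candidate, so no per-key or per-batch communication is needed. */
unsigned long long seq = 0;
char key[128];

while (get_next_candidate(key)) {	/* placeholder generator */
	if (seq++ % node_count != node_id)
		continue;		/* some other node's share */
	try_key(key);			/* placeholder */
}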

Alexander
