john-users - Re: JtR vs. hashcat on /r/crypto

Message-ID: <20120831221515.GC15594@openwall.com>
Date: Sat, 1 Sep 2012 02:15:15 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: JtR vs. hashcat on /r/crypto

On Thu, Aug 30, 2012 at 02:32:52AM +0400, Alexander Cherepanov wrote:
> On 2012-08-28 23:21, jfoug wrote:
> >> 2. It turns out (was news to me) that hashcat added SunMD5 support
> >> recently (on CPU).  According to atom, it does not use SIMD, yet is
> >> faster than ours with SIMD (JimF's unreleased code in magnum-jumbo).
> >> I've asked atom for specific speed numbers, but we might want to do our
> >> own benchmarks as well (Jim?), if we don't mind running the closed-
> >> source hashcat for that. ;-)
> > 
> > I strongly believe that the coin flip logic we have (the original Sun logic)
> > is where the speedup can be found. Yes, we did remove a %5 in one of the
> > loops.  But there still has to be a LOT of optimization left. There is a lot
> > of temporary memory usage and memory movement.
> 
> Indeed all those crazy arrays can easily be ditched. A patch is posted
> to john-dev.

Thank you!  (For those not on john-dev: you wrote this gives +18%.)

Meanwhile, the mystery may have been solved.  r4d1x of team hashcat has
kindly benchmarked hashcat's vs. magnum-jumbo's SunMD5 on his dual Xeon
E5645 machine (12 cores, 24 logical CPUs), JtR built as linux-x86-64i.
(This does not include the "+18%" speedup mentioned above yet.)  Here
are some speed numbers, posted with r4d1x' permission (thanks, r4d1x!)

JtR using one core:

*r4d1x* 413 c/s @ 120 seconds
*r4d1x* single thread

hashcat, ditto:

*r4d1x* Speed/sec.: - plains, 103 words
*r4d1x* @ 120 seconds

Both were for length 7, lowercase letters + numbers, running against one
SunMD5 hash (the same one).

JtR MPI build (24 processes):

*r4d1x* # mpirun -n 24 ./john --test --format=sunmd5
*r4d1x* Benchmarking: SunMD5 [128/128 SSE2 intrinsics 12x x576]... (24xMPI) DONE
*r4d1x* Raw:    5276 c/s real, 5276 c/s virtual

hashcat not limited to 1 thread (thus, should be 24 threads):

*r4d1x* Speed/sec.: - plains, 1.74k words

So magnum-jumbo is about 4x faster when using one core, and about 3x
faster when using all logical CPUs (HT partially compensates for non-use
of SIMD in hashcat).
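The "about 4x" and "about 3x" figures are simple ratios of the numbers above. The C sketch below recomputes them and also computes how much each tool gained going from one core to all 24 logical CPUs. The function names `ratio` and `print_comparison` are just for illustration. One caveat: JtR's one-core number comes from a 120-second cracking run, while its 24-process number comes from `--test`, so the conditions are mixed and the results are rough.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of two throughput figures (c/s or words/s). */
double ratio(double faster, double slower)
{
    assert(slower > 0.0);
    return faster / slower;
}

/* Figures quoted from r4d1x's runs on a dual Xeon E5645
 * (12 cores, 24 logical CPUs).  Single-core numbers are from
 * 120-second cracking runs; the JtR 24-process number is from
 * --test, so the comparison is approximate. */
void print_comparison(void)
{
    const double jtr_1 = 413.0, hc_1 = 103.0;      /* one core */
    const double jtr_24 = 5276.0, hc_24 = 1740.0;  /* all logical CPUs */

    printf("JtR/hashcat, one core: %.2fx\n", ratio(jtr_1, hc_1));
    printf("JtR/hashcat, 24 CPUs:  %.2fx\n", ratio(jtr_24, hc_24));
    /* How much each tool gained from 1 core to 24 logical CPUs */
    printf("JtR scaling:           %.2fx\n", ratio(jtr_24, jtr_1));
    printf("hashcat scaling:       %.2fx\n", ratio(hc_24, hc_1));
}
```

Called from a `main`, this prints 4.01x, 3.03x, 12.77x and 16.89x. hashcat gains more from the extra logical CPUs than JtR does. That fits the remark about HT partially compensating for hashcat's lack of SIMD, though with the mixed test conditions it is only a rough indication.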

To be fair, I need to note that this is a released version of hashcat
vs. unreleased JtR code (yet publicly available via git).  It is
possible that by the time we get around to including our SunMD5 code in
a release, atom will have put out a new version of hashcat with a similar
or better speedup. ;-)  Or maybe not, because SunMD5 is pretty uncommon
in the wild.  IIRC, so far only one person posting to john-users has
mentioned cracking these hashes during a security audit.

Also, hashcat's built-in multi-threading works very nicely, as compared
to JtR's cumbersome MPI support (e.g., status reporting with MPI is
nastier).  We need to implement OpenMP support for SunMD5 hashes (right
now it only exists for the old SunMD5 support on Solaris via the
"generic crypt(3)" format, but that's system-specific and slow).  Jim? :-)

Anyhow, SIMD does provide the expected speedup.  Thanks, Jim!

To me, our successful SIMD'ing of SunMD5 primarily proves that
data-dependent branching is not an effective way to defeat this sort of
attack (nor to defeat GPUs).  If done differently, it could actually
mitigate the attacks, but even then it would come with a risk of
side-channel leaks - not a good tradeoff, in my opinion, considering that
there are other ways to defeat GPUs (and defeating SIMD is not even a
good goal; it is better to use SIMD for defense).
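As a generic illustration of why a data-dependent branch does not stop SIMD, the sketch below shows the standard masking technique. In the scalar version, each candidate branches on its own "coin" bit. In the SIMD-style version, every lane computes both paths and a per-lane mask selects the result, so no branch depends on the data. `path_a`, `path_b`, `LANES` and the `step_*` functions are hypothetical. They are not the real SunMD5 round or JtR's code. The lanes are modeled as a plain array so the code stays portable. With SSE2 intrinsics, the select would typically be `_mm_and_si128`, `_mm_andnot_si128` and `_mm_or_si128`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4  /* e.g. four 32-bit lanes of a 128-bit SSE2 register */

/* Hypothetical round functions standing in for the two paths a
 * data-dependent "coin flip" can choose between. */
static uint32_t path_a(uint32_t x) { return x * 0x9E3779B1u + 1u; }
static uint32_t path_b(uint32_t x) { return (x << 7) ^ (x >> 3) ^ 0xA5A5A5A5u; }

/* Scalar style: each candidate branches on its own data.  If the
 * two paths differ in cost, timing can depend on that data. */
void step_branchy(uint32_t st[LANES])
{
    for (int i = 0; i < LANES; i++)
        st[i] = (st[i] & 1u) ? path_a(st[i]) : path_b(st[i]);
}

/* SIMD style: every lane computes both paths, then a mask that is
 * all ones (coin bit 1) or all zeros (coin bit 0) picks one. */
void step_masked(uint32_t st[LANES])
{
    for (int i = 0; i < LANES; i++) {
        uint32_t a = path_a(st[i]);
        uint32_t b = path_b(st[i]);
        uint32_t mask = 0u - (st[i] & 1u);  /* 0xFFFFFFFF or 0 */
        st[i] = (a & mask) | (b & ~mask);
    }
}

/* Run both versions from the same start and check that they agree. */
void check_equivalent(uint32_t seed, int rounds)
{
    uint32_t x[LANES], y[LANES];
    for (int i = 0; i < LANES; i++)
        x[i] = y[i] = seed + (uint32_t)i;
    for (int r = 0; r < rounds; r++) {
        step_branchy(x);
        step_masked(y);
    }
    assert(memcmp(x, y, sizeof x) == 0);
}
```

The attacker pays for both paths, but there is no divergence between lanes. If one path were much more expensive, every lane would pay for the worst case. That may be one way data-dependent branching "done differently" could slow attackers down. The defender's own branchy code would then be the version whose timing depends on the data.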

Alexander