Message-ID: <20151206144044.GA28804@openwall.com>
Date: Sun, 6 Dec 2015 17:40:44 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: hashcat CPU vs. JtR

Hi,

Most of hashcat's value is in oclHashcat, and I greatly appreciate atom's
generosity in making it open source along with the CPU hashcat.  We have
more to learn from it.  However, this posting is only about the CPU
hashcat.

What are some reasons why someone may prefer to use hashcat over JtR,
both on CPU?  Is it some cracking modes we don't have equivalents for in
JtR?  What are those?

hashcat appears to support a subset of hash types that we have in jumbo,
and in my testing today is typically 2 to 3 times slower than JtR, with
few exceptions.  (This is consistent with what I heard from others
before.  I just didn't test this myself until now.)

The most notable exception, where hashcat is much faster than JtR, is
with its multi-threading support for fast hashes.  When using JtR on
fast hashes, currently --fork should be used instead of multiple threads,
and it can be cumbersome (multiple status lines instead of one, the
child processes terminating not exactly at the same time, etc.)

Another exception is bcrypt, where hashcat delivers about the best speed
we can get out of JtR, and in fact better than a default build of JtR
does on our 2x E5-2670 machine (which I am testing this on):

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 3200 
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: bcrypt, Blowfish(OpenBSD)
Speed/sec: 16.82k words

JtR is slightly slower by default (built with the same gcc 4.9.1 as
hashcat above):

[solar@...er src]$ ../run/john -test -form=bcrypt
Will run 32 OpenMP threads
Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 32
Raw:    16128 c/s real, 506 c/s virtual

Its performance on this machine can be improved to 16900 c/s (same as
hashcat) by forcing BF_X2 = 3 in arch.h.  The current logic in jumbo
uses that setting only on HT-less Intel CPUs (and these Xeons are
HT-capable), since that logic appears to work slightly better on many
other CPUs (just not on this particular machine).

Another exception I noticed is scrypt, where hashcat is only moderately
slower than JtR:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 8900
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: scrypt
Speed/sec: 639 words

[solar@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=scrypt
Will run 32 OpenMP threads
Benchmarking: scrypt (16384, 8, 1) [Salsa20/8 128/128 AVX]... (32xOMP) DONE
Speed for cost 1 (N) of 16384, cost 2 (r) of 8, cost 3 (p) of 1
Raw:    878 c/s real, 27.6 c/s virtual

(BTW, I think this used to be ~960 c/s.  Looks like we got a performance
regression we need to look into, or just get the latest yescrypt code in
first and then see.)

hashcat is at 639/878 = 73% of JtR's speed at scrypt here.

Yet another exception is SunMD5, where I am puzzled about what hashcat
is actually benchmarking:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 3300
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: MD5(Sun)
Speed/sec: 223.64M words

[solar@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=sunmd5
Will run 32 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw:    10593 c/s real, 332 c/s virtual

223.64M vs. 10.6K?!  This can't be right.  SunMD5 with typical settings
is known to be slow.

For most other hash types I checked, JtR is a lot faster, e.g.:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 500
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: md5crypt, MD5(Unix), FreeBSD MD5, Cisco-IOS MD5
Speed/sec: 269.21k words

[solar@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=md5crypt
Will run 32 OpenMP threads
Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Raw:    729600 c/s real, 22750 c/s virtual

729600/269210 = 2.71 times faster
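
The ratios in this message can be reproduced with a small helper.  This
is only a sketch, not something either tool provides: parse_speed and
speedup are hypothetical names, and it assumes that "k"/"K" means 1000
and "M" means 10^6, which is how the arithmetic in this message treats
them.

```python
import re

# ASSUMPTION: k/K = 1000 and M = 10**6, matching the arithmetic in the message.
_SUFFIX = {"": 1, "k": 1_000, "K": 1_000, "M": 1_000_000}


def parse_speed(text):
    """Convert a speed such as '269.21k' or '38898K' to candidates per second."""
    match = re.fullmatch(r"\s*([0-9]+(?:\.[0-9]+)?)\s*([kKM]?)\s*", text)
    if match is None:
        raise ValueError(f"unrecognized speed: {text!r}")
    number, suffix = match.groups()
    return float(number) * _SUFFIX[suffix]


def speedup(jtr, hashcat):
    """Return how many times faster the JtR figure is than the hashcat one."""
    return parse_speed(jtr) / parse_speed(hashcat)


# md5crypt figures from the two benchmark runs above.
print(f"md5crypt: {speedup('729600', '269.21k'):.2f}x")  # md5crypt: 2.71x
```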

sha512crypt:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 1800
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: sha512crypt, SHA512(Unix)
Speed/sec: 5.35k words

[solar@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=sha512crypt
Will run 32 OpenMP threads
Benchmarking: sha512crypt, crypt(3) $6$ (rounds=5000) [SHA512 128/128 AVX 2x]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw:    11299 c/s real, 354 c/s virtual

11299/5350 = 2.11 times faster

Raw MD5:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 0
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: MD5
Speed/sec: 268.55M words

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 0 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1

Hash type: MD5
Speed/sec: 12.71M words

Good multi-threaded efficiency (unlike JtR's at fast hashes like this),
but poor per-thread speed.  JtR's is:

[solar@...er src]$ ../run/john -test -form=raw-md5
Benchmarking: Raw-MD5 [MD5 128/128 AVX 4x3]... DONE
Raw:    38898K c/s real, 38898K c/s virtual

OpenMP is compile-time disabled for fast hashes (which is the current
default in bleeding-jumbo), so this is for 1 thread (and --fork should
be used - yes, with its drawbacks).

38898/12710 = 3.06 times faster
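
To put the two hashcat MD5 runs above in perspective, here is a rough
sketch of the scaling arithmetic.  The function names are hypothetical,
and the break-even estimate assumes that JtR --fork processes scale
linearly, which these benchmarks did not measure.

```python
import math


def thread_scaling(multi_cps, single_cps, threads):
    """Return (speedup over one thread, efficiency per thread)."""
    speedup = multi_cps / single_cps
    return speedup, speedup / threads


def processes_to_match(target_cps, per_process_cps):
    """Processes needed to reach target_cps.

    ASSUMPTION: every extra process adds the full single-process speed.
    """
    return math.ceil(target_cps / per_process_cps)


# hashcat raw MD5: 268.55M at 32 threads vs. 12.71M at 1 thread.
speedup, efficiency = thread_scaling(268.55e6, 12.71e6, 32)
print(f"hashcat: {speedup:.1f}x speedup, {efficiency:.0%} per-thread efficiency")

# JtR raw MD5: 38898K c/s for a single process.
print("JtR processes to match hashcat:", processes_to_match(268.55e6, 38898e3))
```

Since 32 threads on 2x E5-2670 are logical (hyperthreaded) CPUs rather
than physical cores, per-thread efficiency well below 100% is expected.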

Raw SHA-1:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 100 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1

Hash type: SHA1
Speed/sec: 10.12M words

[solar@...er src]$ ../run/john -test -form=raw-sha1
Benchmarking: Raw-SHA1 [SHA1 128/128 AVX 4x]... DONE
Raw:    19075K c/s real, 19075K c/s virtual

19075/10120 = 1.88 times faster

Not that bad.  I guess hashcat has optimizations here that we don't, but
lacks interleaving.  Still, I wouldn't use hashcat over john --fork.

NTLM:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 1000 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1

Hash type: NTLM
Speed/sec: 14.21M words

[solar@...er src]$ ../run/john -test -form=nt
Benchmarking: NT [MD4 128/128 AVX 4x3]... DONE
Raw:    44687K c/s real, 44687K c/s virtual

44687/14210 = 3.14 times faster

Raw SHA-256:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 1400 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1

Hash type: SHA256
Speed/sec: 5.10M words

[solar@...er src]$ OMP_NUM_THREADS=1 ../run/john -test -form=raw-sha256
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Benchmarking: Raw-SHA256 [SHA256 128/128 AVX 4x]... DONE
Raw:    9068K c/s real, 9068K c/s virtual

9068/5100 = 1.78 times faster

We also have OpenMP support enabled by default for raw SHA-256, but it
doesn't scale well for 32 threads:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 1400 
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: SHA256
Speed/sec: 80.85M words

[solar@...er src]$ ../run/john -test -form=raw-sha256
Will run 32 OpenMP threads
Benchmarking: Raw-SHA256 [SHA256 128/128 AVX 4x]... (32xOMP) DONE
Raw:    39976K c/s real, 3774K c/s virtual

[solar@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=raw-sha256
Will run 32 OpenMP threads
Benchmarking: Raw-SHA256 [SHA256 128/128 AVX 4x]... (32xOMP) DONE
Raw:    40370K c/s real, 3731K c/s virtual

hashcat is about 2 times faster than JtR here with multi-threading
(80.85M vs. 40.37M), but JtR --fork would be faster yet.
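
One way to see the poor OpenMP scaling in the JtR output above is to
compare the "real" and "virtual" figures.  The sketch below assumes
that "real" is measured against wall-clock time and "virtual" against
total processor time across threads, so that their ratio approximates
how many logical CPUs were kept busy.  busy_cpus is a hypothetical
helper, not part of JtR.

```python
def busy_cpus(real_cps, virtual_cps):
    """Approximate number of logical CPUs kept busy.

    ASSUMPTION: real is per wall-clock second, virtual per CPU second.
    """
    return real_cps / virtual_cps


# (format, real c/s, virtual c/s) taken from the JtR runs quoted above.
runs = [
    ("bcrypt", 16128, 506),
    ("raw-sha256", 39976e3, 3774e3),
    ("raw-sha256 with GOMP_CPU_AFFINITY", 40370e3, 3731e3),
]

for name, real, virtual in runs:
    print(f"{name}: about {busy_cpus(real, virtual):.1f} of 32 CPUs busy")
```

Under that assumption, bcrypt keeps nearly all 32 logical CPUs busy,
while raw SHA-256 keeps only about a third of them busy.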

Raw SHA-512:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 1700 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1

Hash type: SHA512
Speed/sec: 1.32M words

[solar@...er src]$ OMP_NUM_THREADS=1 ../run/john -test -form=raw-sha512
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Benchmarking: Raw-SHA512 [SHA512 128/128 AVX 2x]... DONE
Raw:    3856K c/s real, 3856K c/s virtual

3856/1320 = 2.92 times faster

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 1700
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: SHA512
Speed/sec: 26.80M words

[solar@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=raw-sha512
Will run 32 OpenMP threads
Benchmarking: Raw-SHA512 [SHA512 128/128 AVX 2x]... (32xOMP) DONE
Raw:    23330K c/s real, 1577K c/s virtual

SHA-512 is slow enough that JtR's (poor) multi-threading support is
almost on par with hashcat's even at 32 threads (23.33M vs. 26.80M).
Yet --fork would be 2 to 3 times faster than hashcat.
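
A back-of-the-envelope projection for the upper end of that range,
using the raw SHA-512 figures above.  This is a sketch and not a
measurement: it assumes that JtR --fork processes would scale from 1 to
32 as well as hashcat's threads did, and projected_fork_speed is a
hypothetical name.

```python
def projected_fork_speed(jtr_single, hashcat_single, hashcat_multi):
    """Project a JtR --fork speed from single-process speed.

    ASSUMPTION: --fork scales by the same factor hashcat's threads achieved.
    """
    scale = hashcat_multi / hashcat_single
    return jtr_single * scale


# Raw SHA-512: JtR 3856K c/s (1 thread), hashcat 1.32M (1) and 26.80M (32).
projected = projected_fork_speed(3856e3, 1.32e6, 26.80e6)
print(f"projected JtR --fork: {projected / 1e6:.1f}M c/s")
print(f"vs. hashcat at 32 threads: {projected / 26.80e6:.2f}x")
```

Under this assumption the multi-process ratio is simply the
single-thread ratio (2.92), so any less favorable scaling for --fork
would land lower in the 2 to 3 times range.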

My JtR benchmarks are with yesterday's bleeding-jumbo.  It could be
better to (also) use actual cracking runs to compare the tools - maybe
someone else will.

Alexander
