Message-ID: <20151206144044.GA28804@openwall.com>
Date: Sun, 6 Dec 2015 17:40:44 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: hashcat CPU vs. JtR

Hi,

Most value of hashcat is in oclHashcat, and I greatly appreciate atom's
generosity in making it open source along with the CPU hashcat.  We have
more stuff to learn from there.  However, this one posting is about the
CPU hashcat.

What are some reasons why someone may prefer to use hashcat over JtR,
both on CPU?  Is it some cracking modes we don't have equivalents for in
JtR?  What are those?

hashcat appears to support a subset of hash types that we have in jumbo,
and in my testing today is typically 2 to 3 times slower than JtR, with
few exceptions.  (This is consistent with what I heard from others
before.  I just didn't test this myself until now.)

The most notable exception, where hashcat is much faster than JtR, is
with its multi-threading support for fast hashes.  When using JtR on
fast hashes, currently --fork should be used instead of multiple
threads, and it can be cumbersome (multiple status lines instead of one,
the child processes terminating not exactly at the same time, etc.)

Another exception is bcrypt, where hashcat delivers about the best speed
we can get out of JtR, and in fact better than a default build of JtR
does on our 2x E5-2670 machine (which I am testing this on):

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 3200
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: bcrypt, Blowfish(OpenBSD)
Speed/sec: 16.82k words

JtR is slightly slower by default (built with the same gcc 4.9.1 as
hashcat above):

[solar@...er src]$ ../run/john -test -form=bcrypt
Will run 32 OpenMP threads
Benchmarking: bcrypt ("$2a$05", 32 iterations) [Blowfish 32/64 X2]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 32
Raw: 16128 c/s real, 506 c/s virtual

Its performance on this machine can be improved to 16900 c/s (same as
hashcat) by forcing BF_X2 = 3 in arch.h, but the current logic in jumbo
is to only use that setting on HT-less Intel CPUs (and these Xeons are
HT-capable) as that appears to work slightly better on many other CPUs
(just not on this particular machine).
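(In case someone wants to experiment with that on a similar machine,
the tweak is simply to make the BF_X2 definition in arch.h read as
below and rebuild.  This is only a sketch: in jumbo that value is
normally chosen by conditional logic rather than one hard-coded line.)

/* Force the 3x-interleaved Blowfish code path instead of the
   HT-based auto-selection described above. */
#define BF_X2 3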
Another exception I noticed is scrypt, where hashcat is only moderately
slower than JtR:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 8900
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: scrypt
Speed/sec: 639 words

[solar@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=scrypt
Will run 32 OpenMP threads
Benchmarking: scrypt (16384, 8, 1) [Salsa20/8 128/128 AVX]... (32xOMP) DONE
Speed for cost 1 (N) of 16384, cost 2 (r) of 8, cost 3 (p) of 1
Raw: 878 c/s real, 27.6 c/s virtual

(BTW, I think this used to be ~960 c/s.  Looks like we got a performance
regression we need to look into, or just get the latest yescrypt code in
first and then see.)

hashcat is at 639/878 = 73% of JtR's speed at scrypt here.

Yet another exception is SunMD5, where I am puzzled about what hashcat
is actually benchmarking:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 3300
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: MD5(Sun)
Speed/sec: 223.64M words

[solar@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=sunmd5
Will run 32 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw: 10593 c/s real, 332 c/s virtual

223.64M vs. 10.6K?!  This can't be right.  SunMD5 with typical settings
is known to be slow.

For most other hash types I checked, JtR is a lot faster, e.g.:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 500
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: md5crypt, MD5(Unix), FreeBSD MD5, Cisco-IOS MD5
Speed/sec: 269.21k words

[solar@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=md5crypt
Will run 32 OpenMP threads
Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Raw: 729600 c/s real, 22750 c/s virtual

729600/269210 = 2.71 times faster

sha512crypt:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 1800
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: sha512crypt, SHA512(Unix)
Speed/sec: 5.35k words

[solar@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=sha512crypt
Will run 32 OpenMP threads
Benchmarking: sha512crypt, crypt(3) $6$ (rounds=5000) [SHA512 128/128 AVX 2x]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw: 11299 c/s real, 354 c/s virtual

11299/5350 = 2.11 times faster

Raw MD5:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 0
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: MD5
Speed/sec: 268.55M words

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 0 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1

Hash type: MD5
Speed/sec: 12.71M words

Good multi-threaded efficiency (unlike JtR's at fast hashes like this),
but poor per-thread speed.  JtR's is:

[solar@...er src]$ ../run/john -test -form=raw-md5
Benchmarking: Raw-MD5 [MD5 128/128 AVX 4x3]... DONE
Raw: 38898K c/s real, 38898K c/s virtual

OpenMP is compile-time disabled for fast hashes (which is the current
default in bleeding-jumbo), so this is for 1 thread (and --fork should
be used - yes, with its drawbacks).

38898/12710 = 3.06 times faster

Raw SHA-1:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 100 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1

Hash type: SHA1
Speed/sec: 10.12M words

[solar@...er src]$ ../run/john -test -form=raw-sha1
Benchmarking: Raw-SHA1 [SHA1 128/128 AVX 4x]... DONE
Raw: 19075K c/s real, 19075K c/s virtual

19075/10120 = 1.88 times faster

Not that bad.  I guess hashcat has optimizations here that we don't,
but lacks interleaving.  Still, I wouldn't use hashcat over john --fork.
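(To be concrete, by "john --fork" I mean an invocation of roughly this
shape - the format, wordlist, and hash file names below are just
placeholders, not from the benchmarks above:

./john --fork=32 --format=raw-sha1 --wordlist=wordlist.lst hashes.txt

that is, 32 independent processes splitting the candidate passwords
between them, rather than 32 threads in one process.)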
NTLM:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 1000 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1

Hash type: NTLM
Speed/sec: 14.21M words

[solar@...er src]$ ../run/john -test -form=nt
Benchmarking: NT [MD4 128/128 AVX 4x3]... DONE
Raw: 44687K c/s real, 44687K c/s virtual

44687/14210 = 3.14 times faster

Raw SHA-256:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 1400 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1

Hash type: SHA256
Speed/sec: 5.10M words

[solar@...er src]$ OMP_NUM_THREADS=1 ../run/john -test -form=raw-sha256
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Benchmarking: Raw-SHA256 [SHA256 128/128 AVX 4x]... DONE
Raw: 9068K c/s real, 9068K c/s virtual

9068/5100 = 1.78 times faster

We also have OpenMP support enabled by default for raw SHA-256, but it
doesn't scale well to 32 threads:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 1400
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: SHA256
Speed/sec: 80.85M words

[solar@...er src]$ ../run/john -test -form=raw-sha256
Will run 32 OpenMP threads
Benchmarking: Raw-SHA256 [SHA256 128/128 AVX 4x]... (32xOMP) DONE
Raw: 39976K c/s real, 3774K c/s virtual

[solar@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=raw-sha256
Will run 32 OpenMP threads
Benchmarking: Raw-SHA256 [SHA256 128/128 AVX 4x]... (32xOMP) DONE
Raw: 40370K c/s real, 3731K c/s virtual

hashcat is 2 times faster with multi-threading, but JtR --fork would be
faster yet.

Raw SHA-512:

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 1700 -n 1
Initializing hashcat v2.00 with 1 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 1

Hash type: SHA512
Speed/sec: 1.32M words

[solar@...er src]$ OMP_NUM_THREADS=1 ../run/john -test -form=raw-sha512
Warning: OpenMP is disabled; a non-OpenMP build may be faster
Benchmarking: Raw-SHA512 [SHA512 128/128 AVX 2x]... DONE
Raw: 3856K c/s real, 3856K c/s virtual

3856/1320 = 2.92 times faster

[solar@...er hashcat-build]$ ./hashcat-cli64.bin -b -m 1700
Initializing hashcat v2.00 with 32 threads and 32mb segment-size...

Device...........: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
Instruction set..: x86_64
Number of threads: 32

Hash type: SHA512
Speed/sec: 26.80M words

[solar@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form=raw-sha512
Will run 32 OpenMP threads
Benchmarking: Raw-SHA512 [SHA512 128/128 AVX 2x]... (32xOMP) DONE
Raw: 23330K c/s real, 1577K c/s virtual

SHA-512 is slow enough that JtR's (poor) multi-threading support is
almost on par with hashcat's even at 32 threads.  Yet --fork would be
2 to 3 times faster than hashcat.

My JtR benchmarks are with yesterday's bleeding-jumbo.  It could be
better to (also) use actual cracking runs to compare the tools - maybe
someone else will.

Alexander