Message-ID: <20181012185956.GA23135@openwall.com>
Date: Fri, 12 Oct 2018 20:59:57 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Cc: Denis Burykin <apingis@...nwall.net>
Subject: md5crypt & phpass password cracking on FPGA

Hi,

As many of you are aware, we support descrypt, bcrypt, sha512crypt, sha256crypt, and Drupal7 password hash cracking on the old ZTEX 1.15y quad-FPGA boards. Threads:

https://www.openwall.com/lists/john-users/2016/11/06/1
https://www.openwall.com/lists/john-users/2017/06/25/1
https://www.openwall.com/lists/john-users/2018/07/23/1
https://www.openwall.com/lists/john-users/2018/08/27/11

Now Denis has also added support for md5crypt and for phpass "portable hashes" on those same boards. (While phpass as released by Openwall primarily uses bcrypt, it also includes a "last resort fallback" to MD5-based "portable hashes", which many popular web apps forced the use of for portability.)

Similarly to sha512crypt, sha256crypt, and Drupal7, this addition is not so much to compete with GPUs as it is to provide a way to put those FPGA boards to more uses. Also, just like our implementations of sha512crypt, sha256crypt, and Drupal7 hashes on FPGA, this is, to the best of my knowledge, the very first time md5crypt and phpass "portable hashes" are implemented on FPGA.

Denis wrote a good description of the design with some ASCII diagrams, currently found here:

https://github.com/magnumripper/JohnTheRipper/tree/bleeding-jumbo/src/ztex/fpga-md5crypt

Similarly to Denis' designs for sha512crypt + Drupal7 and sha256crypt, the new one for md5crypt + phpass uses specialized soft CPU cores along with cryptographic cores. However, the specific parameters of those cores changed once again: this time, it's 16-bit 12-way SMT CPU cores along with sets of 3 MD5 cores, each capable of up to 4 in-flight hashes. Three MD5 cores, one soft CPU core, and memory and glue logic form a unit. 32 units fit in one Spartan-6 LX150 FPGA.
This means 32 soft CPU cores, 384 hardware threads, 96 MD5 cores, and up to 384 in-flight MD5 hashes per FPGA. Four times that - meaning 1536 in-flight hashes - per board.

Also included are an on-device candidate password generator (for mask mode, including in hybrid modes along with a wordlist coming from host, etc.) and a hash comparator (capable of up to 512 loaded hashes per salt; no limit on total loaded hashes, as that's handled on host). This is the same as in the designs for sha512crypt + Drupal7 and sha256crypt.

Per Xilinx tools, this design was supposed to work at 202 MHz. In our testing on actual boards, the design works reliably for us at 180 MHz, which we set as the default and made configurable in john.conf.

Some other implementations of md5crypt and phpass "portable hashes" that we have in JtR have password length limitations of 15 and 39 characters, respectively - as an optimization. This FPGA implementation of these hashes is capable of password lengths up to 64.

Actual performance varies by salt and password length. For the benchmarks below, I'll use a salt length of 8 and a password length of 7 to match Hashcat benchmarks.

Here's a test run against one md5crypt hash on one board (4 FPGAs) at 180 MHz:

$ perl -e 'print crypt("passMD5", "\$1\$saltsalt"), "\n";' > pw-md5crypt-1
$ cat pw-md5crypt-1
$1$saltsalt$TwZH0EJ82F8jZZW.s.uLn/
$ ./john -form=md5crypt-ztex -mask='pas?a?a?a?a' pw-md5crypt-1
[...]
Loaded 1 password hash (md5crypt-ztex, crypt(3) $1$ [md5crypt ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
passMD5 (?)
1g 0:00:00:18 DONE (2018-10-12 19:31) 0.05506g/s 944245p/s 944245c/s 944245C/s passMD5..pas##|5

A higher frequency might work, but isn't reliable across the boards we tested.
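To make it concrete what is computed per candidate password, here is a minimal Python sketch of both algorithms, written from the well-known public descriptions (FreeBSD-style md5crypt and phpass's crypt_private()). It is only an illustration and has nothing to do with Denis' FPGA design or JtR's C code; the function names are hypothetical. It should reproduce the md5crypt test hash generated with Perl above and the phpass test hash used in the phpass tests later in this post.

```python
import hashlib

# Base-64 alphabet used by both md5crypt and phpass (not RFC 4648 order)
ITOA64 = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def b64_le(value: int, nchars: int) -> str:
    """Emit nchars base-64 digits of value, least significant 6 bits first."""
    return "".join(ITOA64[(value >> (6 * k)) & 0x3F] for k in range(nchars))


def md5crypt(password: bytes, salt: bytes) -> str:
    """Sketch of md5crypt ("$1$"); salt is given without the "$1$" prefix."""
    magic = b"$1$"
    salt = salt.split(b"$")[0][:8]  # at most 8 salt chars, stop at '$'
    alt = hashlib.md5(password + salt + password).digest()
    ctx = hashlib.md5(password + magic + salt)
    # Mix in len(password) bytes of alt, repeating alt every 16 bytes
    for remaining in range(len(password), 0, -16):
        ctx.update(alt[:min(remaining, 16)])
    # For each bit of the length: a NUL byte if the bit is set, else password[0]
    bits = len(password)
    while bits:
        ctx.update(b"\x00" if bits & 1 else password[:1])
        bits >>= 1
    final = ctx.digest()
    # 1000 rounds; what gets hashed, and in which order, depends on the round
    for r in range(1000):
        step = hashlib.md5(password if r & 1 else final)
        if r % 3:
            step.update(salt)
        if r % 7:
            step.update(password)
        step.update(final if r & 1 else password)
        final = step.digest()
    # 16 bytes -> 22 chars, with md5crypt's byte transposition
    out = ""
    for a, b, c in ((0, 6, 12), (1, 7, 13), (2, 8, 14), (3, 9, 15), (4, 10, 5)):
        out += b64_le((final[a] << 16) | (final[b] << 8) | final[c], 4)
    out += b64_le(final[11], 2)
    return "$1$" + salt.decode() + "$" + out


def phpass_portable(password: bytes, setting: str) -> str:
    """Sketch of a phpass "portable hash"; setting is e.g. "$P$9saltstri"."""
    count = 1 << ITOA64.index(setting[3])  # '9' -> 2**11 = 2048 iterations
    salt = setting[4:12].encode()
    h = hashlib.md5(salt + password).digest()
    for _ in range(count):
        h = hashlib.md5(h + password).digest()
    # 16 bytes -> 22 chars, plain little-endian 3-byte groups
    out = ""
    for i in range(0, 16, 3):
        chunk = h[i:i + 3]
        out += b64_le(int.from_bytes(chunk, "little"), len(chunk) + 1)
    return setting[:12] + out


print(md5crypt(b"passMD5", b"saltsalt"))
print(phpass_portable(b"abcdefg", "$P$9saltstri"))
```

The phpass loop body is a single MD5 of the previous hash followed by the password, while md5crypt's 1000 rounds vary their input from round to round. That difference in structure is the "much simpler algorithm" referred to in the phpass section.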
That said, here's a lucky run on a lucky board at 210 MHz, just to hit and exceed 1M c/s:

$ ./john -form=md5crypt-ztex -mask='pas?a?a?a?a' pw-md5crypt-1 -dev=04A3466XXX
ZTEX 04A3466XXX bus:2 dev:8 Frequency:210 210 210 210
Using default input encoding: UTF-8
Loaded 1 password hash (md5crypt-ztex, crypt(3) $1$ [md5crypt ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
passMD5 (?)
1g 0:00:00:15 DONE (2018-10-12 19:43) 0.06389g/s 1095Kp/s 1095Kc/s 1095KC/s passMD5..pas##|5

For all further tests, I'll use 180 MHz.

Let's pretend more of the password is unknown, for a longer test on one board (4 FPGAs) and also to test a hybrid mode:

$ ./john -form=md5crypt-ztex -inc -mask='?w?a?a?a' -min-len=7 -max-len=7 pw-md5crypt-1
[...]
Loaded 1 password hash (md5crypt-ztex, crypt(3) $1$ [md5crypt ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
passMD5 (?)
1g 0:00:02:23 DONE (2018-10-12 19:55) 0.006945g/s 952837p/s 952837c/s 952837C/s passMD5..pass##|

Four boards (16 FPGAs):

$ ./john -form=md5crypt-ztex -inc -mask='?w?a?a?a' -min-len=7 -max-len=7 pw-md5crypt-1
[...]
Loaded 1 password hash (md5crypt-ztex, crypt(3) $1$ [md5crypt ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
passMD5 (?)
1g 0:00:00:36 DONE (2018-10-12 19:55) 0.02751g/s 3773Kp/s 3773Kc/s 3773KC/s passMD5..pass##|

Scaling efficiency: 3773000/952837/4 = 99.0%.

Modern high-end GPUs are several times faster than that. However, this speed is on par with what was achieved with Hashcat on AMD HD 7970 aka Tahiti, the fastest GPU contemporary to these FPGA boards (circa 2012), and ours is achieved at moderately lower power consumption. Denis says the boards running this design consume around 2.8A each at 12V at 190 MHz (we don't readily have a figure for 180 MHz and I don't want to delay posting this), which means about 135W for the four boards.
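Both derived figures in the paragraph above are plain arithmetic on the reported numbers. A quick Python check, assuming (as the 135W total implies) that the 2.8A figure is per board; note that this current was measured at 190 MHz, not at the 180 MHz used for the speed figures:

```python
# Figures taken from the runs and the power note above
one_board_cps = 952_837      # c/s, one board (4 FPGAs), -inc -mask='?w?a?a?a' run
four_boards_cps = 3_773_000  # c/s, four boards, same attack (reported as "3773Kc/s")
boards = 4
amps_per_board = 2.8         # A at 12 V; assumed per board, measured at 190 MHz
volts = 12.0

# Perfect scaling would make the four-board speed exactly 4x the one-board speed
efficiency = four_boards_cps / (one_board_cps * boards)
watts = amps_per_board * volts * boards

print(f"scaling efficiency: {efficiency:.1%}")   # 99.0%
print(f"power, {boards} boards: {watts:.1f} W")  # 134.4 W, i.e. "about 135W"
```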
HD 7970 was a 250W TDP GPU card; its actual power usage could be less, and underclocking could provide better power efficiency, but probably not to the extent of reaching 135W at this performance level.

Now to some multi-hash runs for reliability testing. Four boards, mask:

$ perl -e 'for ($i = 100; $i < 612; $i++) { print crypt("pass$i", "\$1\$saltsalt"), "\n"; }' > pw-md5crypt
$ ./john -form=md5crypt-ztex -mask='pas?a?a?a?a' -verb=1 pw-md5crypt
[...]
Loaded 512 password hashes with no different salts (md5crypt-ztex, crypt(3) $1$ [md5crypt ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
512g 0:00:00:06 DONE (2018-10-11 20:55) 74.41g/s 3672Kp/s 3672Kc/s 1263MC/s pass477..pas##Mj

Four boards, larger mask for a longer run:

$ ./john -form=md5crypt-ztex -mask='pa?l?a?a?a?a' -verb=1 pw-md5crypt
[...]
Loaded 512 password hashes with no different salts (md5crypt-ztex, crypt(3) $1$ [md5crypt ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:06 1.06% (ETA: 21:19:48) 0g/s 3711Kp/s 3711Kc/s 1922MC/s paaaaee..paaa9ee
52g 0:00:00:18 3.37% (ETA: 21:19:19) 2.744g/s 3764Kp/s 3764Kc/s 1968MC/s paaaa5i..paaa95i
155g 0:00:01:00 10.81% (ETA: 21:19:38) 2.555g/s 3776Kp/s 3776Kc/s 1779MC/s pass372..pass242
359g 0:00:01:57 20.92% (ETA: 21:19:43) 3.065g/s 3783Kp/s 3783Kc/s 1414MC/s paaaa%5..paaa9%5
507g 0:00:02:28 26.59% (ETA: 21:19:41) 3.404g/s 3781Kp/s 3781Kc/s 1188MC/s pass367..pass587
512g 0:00:02:30 DONE (2018-10-11 21:12) 3.413g/s 3779Kp/s 3779Kc/s 1172MC/s pass477..pa###R7

(I pressed a key a few times.)

This shows roughly the same speed as we get for a large enough mask when running against one hash, meaning the comparator against 512 loaded hashes (sharing this same salt for testing) doesn't slow things down.

Four boards, wordlist and mask:

$ ./john -form=md5crypt-ztex -w=rtop1m -mask='?w?d' -verb=1 pw-md5crypt
[...]
Loaded 512 password hashes with no different salts (md5crypt-ztex, crypt(3) $1$ [md5crypt ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
280g 0:00:00:03 DONE (2018-10-11 21:01) 80.92g/s 3283Kp/s 3283Kc/s 1468MC/s 1alyssak#..Tom_Ere_2k9#

The number 280 is correct - you can also see it for a similar test in the posting about sha256crypt referenced above. But 3 seconds is too quick for a speed measurement, and transferring a word from host over USB for every 10 hashes computed is probably slow. Let's pretend we didn't know the last character is a digit, so we can increase our "mask amplifier":

$ ./john -form=md5crypt-ztex -w=rtop1m -mask='?w?a' -verb=1 pw-md5crypt
[...]
Loaded 512 password hashes with no different salts (md5crypt-ztex, crypt(3) $1$ [md5crypt ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
280g 0:00:00:29 DONE (2018-10-11 21:04) 9.582g/s 3693Kp/s 3693Kc/s 1271MC/s 004811010124i..-----

Now we get performance figures closer to the maximum we've seen before. A similarly amplifying mask is two digits:

$ ./john -form=md5crypt-ztex -w=rtop1m -mask='?w?d?d' -verb=1 pw-md5crypt
[...]
Loaded 512 password hashes with no different salts (md5crypt-ztex, crypt(3) $1$ [md5crypt ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
300g 0:00:00:30 DONE (2018-10-12 20:27) 9.708g/s 3676Kp/s 3676Kc/s 1282MC/s 1alyssak##..-----

Again, 300 is the correct number here, as confirmed by a run on the original HD 7970 (925 MHz):

$ ./john -form=md5crypt-opencl -w=rtop1m -mask='?w?d?d' -verb=1 pw-md5crypt
Using default input encoding: UTF-8
Loaded 512 password hashes with no different salts (md5crypt-opencl, crypt(3) $1$ [MD5 OpenCL])
Press 'q' or Ctrl-C to abort, almost any other key for status
300g 0:00:00:55 DONE (2018-10-12 20:35) 5.404g/s 2043Kp/s 2043Kc/s 691495KC/s !ejr8!69.. nam77

Incidentally, " nam" is in fact the last line in the "rtop1m" wordlist.
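The "mask amplifier" is how many candidates the FPGA derives on-device from each word the host sends over USB, so dividing the reported c/s by it estimates how many words per second the host had to supply. A small Python sketch; the helper name is hypothetical, and it only understands ?w plus the ?d (10 digits), ?l (26 lowercase letters), and ?a (95 printable ASCII characters) placeholders used in this post:

```python
# Candidates per placeholder for the masks used above (JtR built-in classes)
CLASS_SIZE = {"?d": 10, "?l": 26, "?a": 95}  # digits, lowercase, printable ASCII


def amplifier(mask: str) -> int:
    """Candidates generated on-device per host word, for '?w' plus simple classes.

    Hypothetical helper: handles only the placeholders listed in CLASS_SIZE.
    """
    body = mask.replace("?w", "", 1)
    n = 1
    for i in range(0, len(body), 2):
        n *= CLASS_SIZE[body[i:i + 2]]
    return n


# (mask, c/s reported by the four-board wordlist + mask runs above)
runs = [("?w?d", 3_283_000), ("?w?a", 3_693_000), ("?w?d?d", 3_676_000)]
for mask, cps in runs:
    amp = amplifier(mask)
    print(f"{mask:<7} x{amp:<3} -> about {cps / amp:,.0f} words/s over USB")
```

With ?w?d the host has to feed roughly an order of magnitude more words per second than with ?w?a or ?w?d?d, which is consistent with the suspicion that the ?w?d run was held back by host-to-device transfer.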
Unfortunately, the reporting of the range of candidate passwords is wrong when using on-device mask (which for this hash type we have on FPGA, but not on GPU).

Now to phpass tests. Four boards, mask only:

$ cat pw-phpass
$P$9saltstriXeNc.xV8N.K9cTs/XEn13.
$ ./john -form=phpass-ztex -mask='a?l?l?l?l?l?l' pw-phpass
[...]
Loaded 1 password hash (phpass-ztex [phpass ($P$ or $H$)])
Cost 1 (iteration count) is 2048 for all loaded hashes
Press 'q' or Ctrl-C to abort, almost any other key for status
abcdefg (?)
1g 0:00:01:53 DONE (2018-10-11 21:42) 0.008816g/s 1881Kp/s 1881Kc/s 1881KC/s abcdefg..a###eqg

As expected, this is roughly one half of md5crypt's speed because the number of iterations of MD5 is increased from 1000 to 2048. (It can vary between different phpass hashes. It's just that 2048 is used for benchmarks for historical reasons. This is also the value that phpBB3 uses, whereas WordPress uses 8192 - thus, cracking of WordPress phpass hashes is 4 times slower yet - but is also supported on FPGA now.)

It's possible to implement phpass "portable hashes" on FPGA slightly more efficiently, without bothering with the soft CPUs and reclaiming their area for more MD5 cores, since it's a much simpler algorithm than md5crypt. However, we preferred to get the implementation almost for free on top of the md5crypt design, by having phpass implemented as a different entry point into the soft CPU program running on the exact same hardware design (same bitstream). (It's the exact same approach we used for having sha512crypt and Drupal7 share the hardware design.)

Hybrid with somewhat low mask amplifier (one letter):

$ ./john -form=phpass-ztex -inc=lower -mask='?w?l' -min-len=7 -max-len=7 pw-phpass
[...]
Loaded 1 password hash (phpass-ztex [phpass ($P$ or $H$)])
Cost 1 (iteration count) is 2048 for all loaded hashes
Press 'q' or Ctrl-C to abort, almost any other key for status
abcdefg (?)
1g 0:00:00:05 DONE (2018-10-11 21:43) 0.1904g/s 1797Kp/s 1797Kc/s 1797KC/s abcdefg..llabat#

No mask, every candidate password is transferred over USB from host:

$ ./john -form=phpass-ztex -inc=lower -min-len=7 -max-len=7 pw-phpass
[...]
Loaded 1 password hash (phpass-ztex [phpass ($P$ or $H$)])
Cost 1 (iteration count) is 2048 for all loaded hashes
Note: This format may be a lot faster with --mask acceleration (see doc/MASK).
Warning: Slow communication channel to the device. Increase mask or expect performance degradation.
Press 'q' or Ctrl-C to abort, almost any other key for status
abcdefg (?)
1g 0:00:00:01 DONE (2018-10-11 21:43) 0.7812g/s 1228Kp/s 1228Kc/s 1228KC/s abcdefg..lyziesi

The c/s rate is lower by a third, but the attack duration is a lot lower due to the more optimal ordering of candidate passwords. In fact, the attack duration is still low even if we don't specify the password length (letting incremental mode try different lengths):

$ ./john -form=phpass-ztex -inc=lower pw-phpass
[...]
Loaded 1 password hash (phpass-ztex [phpass ($P$ or $H$)])
Cost 1 (iteration count) is 2048 for all loaded hashes
Note: This format may be a lot faster with --mask acceleration (see doc/MASK).
Warning: Slow communication channel to the device. Increase mask or expect performance degradation.
Press 'q' or Ctrl-C to abort, almost any other key for status
abcdefg (?)
1g 0:00:00:02 DONE (2018-10-11 21:45) 0.4149g/s 1305Kp/s 1305Kc/s 1305KC/s abcdefg..shunnas

And it's essentially zero if we let JtR use its default wordlist, as this is a common password:

$ ./john -form=phpass-ztex pw-phpass
[...]
Loaded 1 password hash (phpass-ztex [phpass ($P$ or $H$)])
Cost 1 (iteration count) is 2048 for all loaded hashes
Note: This format may be a lot faster with --mask acceleration (see doc/MASK).
Warning: Slow communication channel to the device. Increase mask or expect performance degradation.
Press 'q' or Ctrl-C to abort, almost any other key for status
abcdefg (?)
1g 0:00:00:00 DONE 2/3 (2018-10-11 21:45) 1.694g/s 265840p/s 265840c/s 265840C/s abcdefg..Sssing

This shows that while raw performance is important, being smart is even more important.

We'd appreciate more testing of this and other ZTEX formats by the community. Please post your results as follow-ups to the appropriate threads, or start a new thread if your posting isn't hash type specific.

Alexander