Message-ID: <20190329122548.GA14460@openwall.com>
Date: Fri, 29 Mar 2019 13:25:48 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Cc: apingis@...nwall.net
Subject: Re: DES-based crypt(3) cracking on ZTEX 1.15y FPGA boards (descrypt-ztex)

Hi,

After almost 2 years, we have an update to descrypt-ztex:

On Thu, Jun 29, 2017 at 08:39:16PM +0200, Solar Designer wrote:
> On Sun, Nov 06, 2016 at 03:06:53PM +0100, Solar Designer wrote:
> > This implements descrypt aka traditional DES-based crypt(3) hash
> > cracking on ZTEX 1.15y quad Spartan-6 LX150 FPGA boards. Some of you in
> > here have these or compatibles (such as the US clones of the originally
> > German ZTEX boards). Most of these boards previously worked as Bitcoin
> > miners, and were then resold on eBay and such at a fraction of the
> > original price. Those we bought for development cost us between 100 EUR
> > (lately) and 250 EUR each (earlier). They became rare on eBay now, but
> > I guess some asking around in cryptocurrency forums will do the trick
> > since there were a lot of those boards around, and only a fraction ever
> > reached eBay. ZTEX itself does not sell them anymore.
> >
> > As implemented by Denis, the "descrypt-ztex" format supports "mask mode"
> > (with on-device mask), hybrid modes (where you add a mask on top of
> > another mode, referring to the previous mode's generated portions of
> > candidate passwords with the "?w" mask), up to 2047 hashes per salt
> > (with on-device comparator) so up to a few million hashes loaded total
> > perhaps (given a good salt distribution), and it can work with one or
> > multiple ZTEX boards at once.
>
> Besides his recently committed work on bcrypt-ztex Denis has also been
> trying to redesign descrypt-ztex. While his attempts were promising
> (with ~50% greater expected speeds), they mostly failed so far with
> difficult to debug issues.
> Given the low demand for any of this (with
> it being mostly an experiment), I asked Denis that rather than keep
> trying to get much better speeds he gathers whatever minor optimizations
> he could get working quickly and commits those - and he did. The result
> is a design that should run approx. 19/17 times = ~12% faster, and can
> be overclocked slightly (5% or so) on top of that.

> > Performance is up to about 740M hash computations per second (with room
> > for further improvement).
>
> I am now getting ~806M c/s at standard clocks, ~840M at 5% overclocking
> (which appears stable on this board, but YMMV). This is with the same
> Qubes USB pass-through as I described for my bcrypt-ztex testing here:
>
> http://www.openwall.com/lists/john-users/2017/06/25/1
>
> Performance should be higher without the virtualization (or with USB
> controller pass-through rather than individual device proxying).

Denis has now made another attempt at getting further descrypt-ztex
optimizations to work, armed with the experience we gained during Denis'
experiments with other designs. Specifically, we learned that designs
with fewer clock domains work more reliably at high device utilization
and power draw.

Previously, the descrypt-ztex design ran DES cores at one clock rate
(220 MHz by default) and comparators at another (160 MHz by default,
which was sufficient). Denis has now optimized the comparators to run
at the full DES cores' clock rate and brought them into the same clock
domain. With this, he was able to get his actual optimizations to work.

There's revised documentation of the design here:

https://github.com/magnumripper/JohnTheRipper/tree/bleeding-jumbo/src/ztex/fpga-descrypt

Previously, descrypt-ztex had one shared on-device candidate password
generator feeding 24 descrypt cores. More cores could easily fit in the
device, but the generator's capacity was sufficient to feed only ~23.5
cores, so adding more cores than the 24 made no sense.
In the revised design, there are now two big units each with its own
candidate password generator feeding its own set of 16 descrypt cores,
for a total of 32 cores. The design tools' reported frequency is
221 MHz, which is similar to what we had before, so the expected speedup
would be 32/23.5 or +36%.

Unfortunately, as we've already seen with other designs trying to
utilize the devices more fully (with bcrypt-ztex being the only
exception), the boards become unreliable at full clock rate. In our
testing, the new design is reliable across many boards at 190 MHz. The
luckiest board I have seems to run the new design OK at 215 MHz.

Nevertheless, even at 190 MHz the new design is ~17% faster than the
previous one was at its 220 MHz.

As Denis wrote in his GitHub pull request, at 190 MHz the "theoretical
performance is 973 Mc/s, measured 950-960 Mc/s regardless of number of
hashes to compare" and "current consumption: 2.8A, at idle 0.4A" at 12V.
This corresponds to power consumption of 34W, which is similar to what
we had before. And yes, this design update also adds clock gating (all
of the *-ztex designs have this now), bringing the idle power
consumption to under 5W.

I ran many tests of the new descrypt-ztex yesterday and today. For this
posting, I'll include tests at length 7 passwords as I think that's what
Hashcat benchmarks use, although descrypt-ztex speeds don't actually
vary by length (they may in equivalent attacks on CPU and GPU) and I've
also confirmed that things work right for all other password lengths.

One board (4 FPGAs), 190 MHz:

$ ./john -form=descrypt-ztex -inc=lower -min-len=7 -max-len=7 -mask='?w?l?l?l?l' pw-fake-len7
Warning! Section [list.ztex:devices] overridden by john-local.conf
SN 2: firmware uploaded
SN 2: uploading bitstreams.. ok
ZTEX 2 bus:2 dev:86 Frequency:190 190 190 190
Using default input encoding: UTF-8
Loaded 464 password hashes with 442 different salts (descrypt-ztex, traditional crypt(3) [DES ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
buttons (u560-des)
cowboys (u1009-des)
[...]
208g 0:00:05:04 0.6825g/s 0p/s 972444Kc/s 1026MC/s kenneth..benaikz
[...]
iforget (u2181-des)
Warning: Only 1449 candidates left, minimum 1792 needed for performance.
awesome (u539-des)
tequila (u2775-des)
464g 0:00:13:55 N/A 0.5555g/s 9619Kp/s 970537Kc/s 1001MC/s tequila..xqj####
Use the "--show" option to display all of the cracked passwords reliably
Session completed

This is 970M+ c/s actual speed, which is closer than what Denis observed
to the theoretical peak speed of 973M. I guess this might be due to the
faster host system (desktop vs. laptop) or running against many salts.

Four boards (16 FPGAs), 190 MHz:

$ ./john -form=descrypt-ztex -inc=lower -min-len=7 -max-len=7 -mask='?w?l?l?l?l' pw-fake-len7
Warning! Section [list.ztex:devices] overridden by john-local.conf
ZTEX 3 bus:2 dev:79 Frequency:190 190 190 190
ZTEX 1 bus:2 dev:80 Frequency:190 190 190 190
ZTEX 4 bus:2 dev:81 Frequency:190 190 190 190
ZTEX 2 bus:2 dev:78 Frequency:190 190 190 190
Using default input encoding: UTF-8
Loaded 464 password hashes with 442 different salts (descrypt-ztex, traditional crypt(3) [DES ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
buttons (u560-des)
cowboys (u1009-des)
[...]
428g 0:00:04:21 40.78% (ETA: 00:19:45) 1.637g/s 12534Kp/s 3802Mc/s 3959MC/s oahaaaa..oaharsf
[...]
vermont (u2845-des)
457g 0:00:04:38 61.17% (ETA: 00:16:39) 1.643g/s 17674Kp/s 3799Mc/s 3947MC/s vermont..phyarsf
zxcvbnm (u345-des)
airwolf (u1654-des)
zepplin (u2912-des)
phoenix (u223-des)
Warning: Only 3241 candidates left, minimum 3584 needed for performance.
awesome (u539-des)
tequila (u2775-des)
iforget (u2181-des)
464g 0:00:04:41 N/A 1.650g/s 28584Kp/s 3798Mc/s 3944MC/s iforget..xqj####
Use the "--show" option to display all of the cracked passwords reliably
Session completed

Scaling efficiency: 3798000/970537/4 = 97.8%

The running time reduction seen here is much less than 4x because both
runs were lucky to crack all passwords before reaching 100% of the
keyspace, but the single-board run was luckier. The running time is
also reduced by salts being eliminated as more passwords get cracked.
(To search this full keyspace against the 442 salts without eliminating
any, it'd have taken 1 hour on one board or 15 minutes on four boards.)

One luckiest board (4 FPGAs), 215 MHz:

$ ./john -form=descrypt-ztex -inc=lower -min-len=7 -max-len=7 -mask='?w?l?l?l?l' pw-fake-len7
Warning! Section [list.ztex:devices] overridden by john-local.conf
ZTEX 2 bus:2 dev:72 Frequency:215 215 215 215
Using default input encoding: UTF-8
Loaded 464 password hashes with 442 different salts (descrypt-ztex, traditional crypt(3) [DES ZTEX])
[...]
59g 0:00:01:23 0.7051g/s 0p/s 1104Mc/s 1191MC/s benaaaa..benaibg
[...]
Warning: Only 1737 candidates left, minimum 1760 needed for performance.
awesome (u539-des)
tequila (u2775-des)
464g 0:00:12:19 N/A 0.6270g/s 10869Kp/s 1098Mc/s 1132MC/s tequila..xqj####

We're able to get nearly 1100M c/s here, which is close to the
theoretical maximum for this clock rate.

Alexander
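
For readers who want to cross-check the figures quoted in this message,
here is a minimal Python sketch of the arithmetic. It is not part of
John the Ripper, and all function names are made up for illustration.
The throughput model (each descrypt core completing one DES encryption
per clock cycle, with 25 DES iterations per descrypt hash) is an
assumption, chosen only because it reproduces the quoted 973 Mc/s. The
design documentation linked above is the authority on how the cores
actually work.

```python
# Hypothetical helpers for re-deriving the numbers quoted in this post.

DES_ITERATIONS = 25   # traditional DES-based crypt(3) applies DES 25 times
FPGAS_PER_BOARD = 4   # "One board (4 FPGAs)"


def theoretical_cps(clock_mhz, cores_per_fpga, fpgas=FPGAS_PER_BOARD):
    """Peak hash computations per second.

    ASSUMPTION: every core finishes one DES encryption per clock cycle,
    so one descrypt hash costs DES_ITERATIONS core-cycles.
    """
    return clock_mhz * 1e6 * cores_per_fpga * fpgas / DES_ITERATIONS


def scaling_efficiency(multi_cps, single_cps, boards):
    """Fraction of ideal linear scaling achieved by `boards` boards."""
    return multi_cps / single_cps / boards


def power_watts(volts, amps):
    """Electrical power drawn by one board."""
    return volts * amps


def full_keyspace_seconds(charset_size, length, salts, cps):
    """Time to test every candidate against every salt, with no early exit."""
    return charset_size ** length * salts / cps


if __name__ == "__main__":
    print(f"190 MHz peak: {theoretical_cps(190, 32) / 1e6:.1f} Mc/s")  # 972.8
    print(f"215 MHz peak: {theoretical_cps(215, 32) / 1e6:.1f} Mc/s")  # 1100.8
    eff = scaling_efficiency(3798000, 970537, 4)
    print(f"4-board scaling efficiency: {eff * 100:.1f}%")             # 97.8
    print(f"load power: {power_watts(12, 2.8):.1f} W")                 # 33.6
    print(f"idle power: {power_watts(12, 0.4):.1f} W")                 # 4.8
    secs = full_keyspace_seconds(26, 7, 442, 970537e3)
    print(f"full keyspace, one board: about {secs / 60:.0f} minutes")
```

Under this assumed model, the measured 970537 Kc/s at 190 MHz is within
about 0.3% of the computed peak, and the one-board full-keyspace
estimate comes out at roughly one hour, in line with the figure given
above.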