Message-ID: <20170625170752.GA5048@openwall.com>
Date: Sun, 25 Jun 2017 19:07:53 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Cc: apingis@...nwall.net
Subject: bcrypt cracking on ZTEX 1.15y FPGA boards (bcrypt-ztex)

Hi,

After last year's work on descrypt-ztex:

http://www.openwall.com/lists/john-users/2016/11/06/1

Denis proceeded to work on bcrypt-ztex this year. We had listed this as planned future work on Katja's project in 2014:

http://www.openwall.com/presentations/Passwords14-Energy-Efficient-Cracking/

but unfortunately didn't resume that project until this year. I guess better late than never, especially given that the results achieved are still good even by modern standards (relative to current GPUs), despite those ZTEX 1.15y boards being rather old by now. As far as I can tell, Denis' implementation is brand new, not building upon Katja's, although our past experience was of some indirect help.

We finally got the bcrypt-ztex format into bleeding-jumbo this week. For technical detail on the implementation, you may read:

https://github.com/magnumripper/JohnTheRipper/commit/4c37300e32c5b8c47e34be3a0b28a94ecd30da2a#diff-af56e15c23e8e70150ed23cb93cbae6fR1

The speed is ~106k c/s at bcrypt cost 5 on ZTEX 1.15y without overclocking, ~114k with overclocking. It should scale almost linearly with multiple boards (e.g. Denis reported ~103k c/s/board with 3 boards on the same host).

I can't easily measure the power consumption right now, but I estimate it's ~20W as both the board (with a large but slowly rotating cooling fan) and the 12V, 5A power adapter (brick) stay barely warm to the touch. These used to get much warmer in Bitcoin mining tests (known to be ~40W).
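As a back-of-the-envelope check of the speed and power figures above, here is a minimal Python sketch. It assumes the ~20W value (an estimate, not a measurement) and uses the rounded speeds quoted in this message; the function and constant names are made up for illustration.

```python
# Figures quoted in the message above. The power value is a rough
# estimate ("barely warm to the touch"), not a measurement.
SINGLE_BOARD_CPS = 106_000      # ~106k c/s, bcrypt cost 5, one board, stock clock
MULTI_BOARD_CPS_EACH = 103_000  # ~103k c/s/board reported with 3 boards on one host
ESTIMATED_BOARD_WATTS = 20.0    # assumption: the ~20W estimate, not a measured value


def scaling_efficiency(per_board_multi: float, single_board: float) -> float:
    """Fraction of single-board speed retained per board in a multi-board setup."""
    return per_board_multi / single_board


def cps_per_watt(cps: float, watts: float) -> float:
    """Hash computations per second delivered per watt of (estimated) power."""
    return cps / watts


if __name__ == "__main__":
    eff = scaling_efficiency(MULTI_BOARD_CPS_EACH, SINGLE_BOARD_CPS)
    print(f"3-board total: {3 * MULTI_BOARD_CPS_EACH} c/s")
    print(f"per-board scaling efficiency: {eff:.1%}")
    watts_rate = cps_per_watt(SINGLE_BOARD_CPS, ESTIMATED_BOARD_WATTS)
    print(f"c/s per watt (estimated): {watts_rate:.0f}")
```

With these inputs the per-board efficiency with 3 boards comes out at roughly 97%, which is what "almost linearly" amounts to here. The c/s-per-watt value is only as good as the ~20W guess.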
For comparison, according to Jeremi M Gosney's testing, hashcat achieves ~23k c/s at bcrypt cost 5 on GTX 1080 Ti:

https://gist.github.com/epixoip/ace60d09981be09544fdd35005051505

Hashtype: bcrypt $2*$, Blowfish (Unix)

Speed.Dev.#1.....: 23223 H/s (37.63ms)
Speed.Dev.#2.....: 22953 H/s (38.08ms)
Speed.Dev.#3.....: 22958 H/s (38.05ms)
Speed.Dev.#4.....: 22821 H/s (38.30ms)
Speed.Dev.#5.....: 23025 H/s (37.89ms)
Speed.Dev.#6.....: 23266 H/s (37.60ms)
Speed.Dev.#7.....: 23342 H/s (37.41ms)
Speed.Dev.#8.....: 23209 H/s (37.62ms)
Speed.Dev.#*.....: 184.8 kH/s

Thus, these FPGAs from several years back perform slightly faster than this year's top GPUs at bcrypt, per chip. The four-chip ZTEX 1.15y is slightly faster at bcrypt than four GTX 1080 Ti cards, while consuming 10+ times less power. (I suspect the GPUs don't reach their peak power usage on this test, by far, which is why the conservative 10+ figure.)

This doesn't mean these FPGAs are so fast and those GPUs are so slow. Rather, it means that bcrypt is a better fit for FPGAs than for GPUs.

Now to the setup and testing:

To build JtR bleeding-jumbo with ZTEX 1.15y board support, install libusb (e.g., the libusb-devel package on Fedora) in addition to jumbo's usual dependencies. Then use "./configure --enable-ztex". The rest of the build is as usual for jumbo.

To access a ZTEX board as non-root (and you shouldn't build or run JtR as root) on a Linux system with udev, add this:

ATTRS{idVendor}=="221a", ATTRS{idProduct}=="0100", SUBSYSTEMS=="usb", ACTION=="add", MODE="0660", GROUP="ztex"

e.g. to /etc/udev/rules.d/99-local.rules (create this file). Then issue these commands as root:

groupadd ztex
usermod -a -G ztex user # where "user" is your non-root username
systemctl restart systemd-udevd # or "service udev restart" if without systemd

In order to trigger udev to set the new permissions, (re)connect the device after this point.

If you use a common Linux distro like Ubuntu or Fedora, the above should be sufficient.
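To check that the udev rule took effect after reconnecting the board, something like the following Python sketch may help. It is not part of JtR. It assumes the standard Linux sysfs layout (/sys/bus/usb/devices/*/idVendor and friends) and the usual /dev/bus/usb/BBB/DDD device nodes; the function names are hypothetical.

```python
import glob
import os

ZTEX_VENDOR = "221a"   # idVendor from the udev rule above
ZTEX_PRODUCT = "0100"  # idProduct from the udev rule above


def read_attr(dev_dir, name):
    """Return a sysfs attribute as a stripped string, or None if unreadable."""
    try:
        with open(os.path.join(dev_dir, name)) as f:
            return f.read().strip()
    except OSError:
        return None


def find_ztex_nodes(sysfs_root="/sys/bus/usb/devices"):
    """List /dev/bus/usb nodes of devices matching the vendor/product IDs."""
    nodes = []
    for dev_dir in sorted(glob.glob(os.path.join(sysfs_root, "*"))):
        if read_attr(dev_dir, "idVendor") != ZTEX_VENDOR:
            continue
        if read_attr(dev_dir, "idProduct") != ZTEX_PRODUCT:
            continue
        busnum = read_attr(dev_dir, "busnum")
        devnum = read_attr(dev_dir, "devnum")
        if busnum and devnum:
            nodes.append("/dev/bus/usb/%03d/%03d" % (int(busnum), int(devnum)))
    return nodes


def can_access(node):
    """True if the current user may open the device node for read and write."""
    return os.access(node, os.R_OK | os.W_OK)


if __name__ == "__main__":
    for node in find_ztex_nodes():
        print(node, "accessible" if can_access(node) else "NOT accessible")
```

If a node shows up as not accessible, keep in mind that a newly added group membership normally applies only to new login sessions, so logging out and back in may be needed in addition to reconnecting the device.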
In my case this time, the system is Fedora in a Qubes OS VM, so I have to use USB passthrough. Moreover, I didn't want to pass the entire USB controller into the VM, so the data is being proxied through two userspace processes: one in the VM with JtR, and the other in sys-usb. It's a setup supported by Qubes. No customizations other than enabling the passthrough:

https://www.qubes-os.org/doc/usb/#attaching-a-single-usb-device-to-a-qube-usb-passthrough

There's significant CPU load caused in both of these VMs by such proxying of the candidate passwords stream, and there must be increased latency too. Speeds would probably be slightly higher if I ran the same tests without use of VMs. In a way, it's amazing this works at all and shows decent speeds.

Denis' implementation works around our current synchronous crypt_all() API by buffering a large number of candidate passwords - many times larger than the number of cores. The current design has 124 bcrypt cores per chip, so 496 per board. My tests are with "TargetSetting = 5" (tuning for bcrypt cost 5) in the "[ZTEX:bcrypt]" section in john.conf, and this results in:

0:00:00:00 - Candidate passwords will be buffered and tried in chunks of 63488

appearing in john.log. The number 63488 is 496*128.

This buffering is similar to what GPUs commonly require, albeit for different reasons (greater concurrency and pipelining on GPUs vs. hiding communication latency to these FPGA boards). Either way, this has usability and efficiency drawbacks when you interrupt/restore a session (especially with large salt count), but it results in nearly optimal c/s rate despite the synchronous API and the USB latency (especially in my testing in a VM).
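The chunk arithmetic above can be spelled out in a few lines of Python. This is only a sketch: the names are made up, the ~106k c/s rate is the rounded figure from this message, and the assumption that each buffered chunk is tried against every loaded salt before the next chunk is taken is mine, not something read out of the bcrypt-ztex code.

```python
CORES_PER_CHIP = 124   # bcrypt cores per FPGA in the current design
CHIPS_PER_BOARD = 4    # ZTEX 1.15y is a four-chip board
CHUNK_SIZE = 63488     # from the john.log line above


def cores_per_board() -> int:
    """Total bcrypt cores on one board."""
    return CORES_PER_CHIP * CHIPS_PER_BOARD


def candidates_per_core(chunk_size: int = CHUNK_SIZE) -> int:
    """How many buffered candidates each core gets from one chunk."""
    cores = cores_per_board()
    if chunk_size % cores:
        raise ValueError("chunk size is not a multiple of the core count")
    return chunk_size // cores


def chunk_seconds(chunk_size: int, salts: int, cps: float) -> float:
    """Approximate time to try one chunk against all salts (assumed model)."""
    return chunk_size * salts / cps


if __name__ == "__main__":
    print(cores_per_board(), "cores per board")
    print(candidates_per_core(), "candidates per core per chunk")
    print(f"{chunk_seconds(CHUNK_SIZE, 1, 106_000):.2f} s per chunk with one salt")
```

Under these assumptions one chunk takes about 0.6 seconds with a single salt, and the time grows in proportion to the salt count. That is one way to see why interrupting a session with many salts loaded can discard or repeat a noticeable amount of work.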
Here is a test run:

$ ./john -form=bcrypt-ztex -mask='tes?l?l?l?l?l' -u=u2781-bf pw-fake-unix
ZTEX XXXXXXXXXX bus:2 dev:19 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:01 1.18% (ETA: 22:05:58) 0g/s 105720p/s 105720c/s 105720C/s tesaaata..tesaaota
0g 0:00:00:05 4.73% (ETA: 22:06:19) 0g/s 106521p/s 106521c/s 106521C/s tesaaale..tesaaole
0g 0:00:00:17 15.38% (ETA: 22:06:24) 0g/s 106583p/s 106583c/s 106583C/s tesaaaan..tesaaoan
0g 0:00:00:34 30.77% (ETA: 22:06:24) 0g/s 106614p/s 106614c/s 106614C/s tesaaaat..tesaaoat
testtest (u2781-bf)
1g 0:00:00:35 DONE (2017-06-24 22:05) 0.02807g/s 106581p/s 106581c/s 106581C/s testtest..tes###st
Use the "--show" option to display all of the cracked passwords reliably
Session completed

This is at 141 MHz, which per the design tools is guaranteed to work. As you can see, the speed is about 106.6k c/s.

Now hybrid mode, combining mask (in this case simply having it give the known 3 characters verbatim) with incremental mode (thus, necessarily feeding the candidate passwords from host):

$ ./john -form=bcrypt-ztex -mask='tes?w' -inc=lower -min-len=8 -max-len=8 -u=u2781-bf pw-fake-unix
ZTEX XXXXXXXXXX bus:2 dev:19 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:02 2.14% (ETA: 22:07:51) 0g/s 102814p/s 102814c/s 102814C/s tesnivfm..tesjrkto
testtest (u2781-bf)
1g 0:00:00:04 DONE (2017-06-24 22:06) 0.2331g/s 103593p/s 103593c/s 103593C/s testtest..tesfedal
Use the "--show" option to display all of the cracked passwords reliably
Session completed

Much quicker running time (4 seconds instead of 35) due to incremental mode's more optimal ordering of candidate passwords, even though the c/s rate has reduced to 103.6k c/s (but 4 seconds is too little to measure this precisely).

Another variation, running against many hashes (and salts) and using mask mode to double the "words" generated by incremental mode:

$ ./john -form=bcrypt-ztex -mask='?w?w' -inc=lower -min-len=8 -max-len=8 pw-fake-unix
ZTEX XXXXXXXXXX bus:2 dev:19 Frequency:141 141 141 141
Using default input encoding: UTF-8
Loaded 3107 password hashes with 3107 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:02 0g/s 0p/s 104078c/s 104078C/s lovelove..lvvllvvl
0g 0:00:00:20 0g/s 0p/s 105297c/s 105297C/s lovelove..lvvllvvl
0g 0:00:00:53 0g/s 0p/s 105398c/s 105398C/s lovelove..lvvllvvl
asdfasdf (u915-bf)
1g 0:00:01:22 0.01211g/s 0p/s 105415c/s 105415C/s lovelove..lvvllvvl
1g 0:00:03:18 0.005044g/s 0p/s 105375c/s 105375C/s lovelove..lvvllvvl
1g 0:00:04:40 0.003567g/s 0p/s 105307c/s 105307C/s lovelove..lvvllvvl
Use the "--show" option to display all of the cracked passwords reliably
Session aborted

I interrupted this one, but it does show that 105.3k c/s is possible even with incremental mode and a mask on top of it.
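The timings in these runs can be sanity-checked with some quick arithmetic, sketched below in Python. The mask parser is a deliberately tiny illustration that understands only literal characters and ?l (26 lowercase letters), not JtR's full mask syntax. The many-salts estimate further assumes the chunk size is still 63488 and that a whole chunk is tried against all salts before the next one; both are assumptions on my part.

```python
PLACEHOLDER_SIZES = {"?l": 26}  # only ?l is handled in this sketch


def mask_keyspace(mask: str) -> int:
    """Count candidates for a mask made of literals and ?l placeholders."""
    count = 1
    i = 0
    while i < len(mask):
        if mask[i] == "?":
            token = mask[i:i + 2]
            if token not in PLACEHOLDER_SIZES:
                raise ValueError("unsupported placeholder: %r" % token)
            count *= PLACEHOLDER_SIZES[token]
            i += 2
        else:
            i += 1  # a literal character contributes a factor of 1
    return count


def seconds_needed(candidates: int, salts: int, cps: float) -> float:
    """Time to try the given candidates against the given number of salts."""
    return candidates * salts / cps


if __name__ == "__main__":
    space = mask_keyspace("tes?l?l?l?l?l")
    print(space, "candidates in the first mask")
    print(f"{seconds_needed(space, 1, 106581):.0f} s to exhaust it at 106581 c/s")
    # Many-salts run: one assumed 63488-candidate chunk against 3107 salts.
    print(f"{seconds_needed(63488, 3107, 105307) / 60:.0f} min per chunk")
```

The first mask covers 26^5 = 11881376 candidates, or about 111 seconds at the measured rate, which agrees with the crack landing at roughly 31% progress after 35 seconds. For the 3107-salt run, one chunk would take on the order of half an hour under the stated assumptions, which would be consistent with the status lines showing 0p/s and the same candidate range for the whole 4:40.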
Now extreme overclocking, setting "Frequency = 163" in the section in john.conf (it is also possible to set individual frequencies per FPGA - see the comments in john.conf - but I did not use this here):

$ ./john -form=bcrypt-ztex -mask='tes?l?l?l?l?l' -u=u2781-bf pw-fake-unix
ZTEX XXXXXXXXXX bus:2 dev:19 Frequency:163 163 163 163
Using default input encoding: UTF-8
Loaded 1 password hash (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:02 2.37% (ETA: 22:25:57) 0g/s 121213p/s 121213c/s 121213C/s tesaaaka..tesaaoka
0g 0:00:00:08 8.28% (ETA: 22:26:09) 0g/s 122572p/s 122572c/s 122572C/s tesaaani..tesaaoni
0g 0:00:00:18 18.93% (ETA: 22:26:08) 0g/s 122868p/s 122868c/s 122868C/s tesaaaxn..tesaaoxn
0g 0:00:00:26 27.22% (ETA: 22:26:08) 0g/s 122871p/s 122871c/s 122871C/s tesaaais..tesaaois
testtest (u2781-bf)
1g 0:00:00:31 DONE (2017-06-24 22:25) 0.03224g/s 122425p/s 122425c/s 122425C/s testtest..tes###st
Use the "--show" option to display all of the cracked passwords reliably
Session completed

This worked here (and 163 MHz is actually the maximum that does, with higher values failing even this quick test) achieving 122.4k c/s, but more thorough testing shows this design and board are unstable at this high frequency, so I didn't quote it above.
The highest that works reliably for me so far is 152 MHz, where the below tests are supposed to and do crack all of the 239 short passwords, 7 times in a row:

$ egrep '^([^:]*:){4}[a-z]{4}:' pw-fake-unix > pw-fake-len4
$ for n in `seq 1 7`; do rm john.pot; ./john -form=bcrypt-ztex -mask='?l?l?l?l' -verb=1 pw-fake-len4; done
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:39 N/A 0.5981g/s 1143p/s 114625c/s 114625C/s alex..###q
Session completed
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:40 N/A 0.5972g/s 1141p/s 114458c/s 114458C/s alex..###q
Session completed
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:39 N/A 0.5980g/s 1143p/s 114613c/s 114613C/s alex..###q
Session completed
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:39 N/A 0.5976g/s 1142p/s 114527c/s 114527C/s alex..###q
Session completed
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:39 N/A 0.5976g/s 1142p/s 114542c/s 114542C/s alex..###q
Session completed
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:39 N/A 0.5977g/s 1142p/s 114550c/s 114550C/s alex..###q
Session completed
ZTEX XXXXXXXXXX bus:2 dev:25 Frequency:152 152 152 152
Using default input encoding: UTF-8
Loaded 239 password hashes with 239 different salts (bcrypt-ztex [Blowfish ZTEX])
Press 'q' or Ctrl-C to abort, almost any other key for status
239g 0:00:06:40 N/A 0.5971g/s 1141p/s 114444c/s 114444C/s alex..###q
Session completed

So that's 114.5k c/s at maximum overclocking here.

I must admit this board is 10% overvolted (extra resistors soldered on by the previous owner), but per testing at Bitcoin mining this only provided a 1% increase in maximum reasonable clock rates (vs. other non-overvolted boards), so it's probably similar here. Denis' boards are not overvolted, but he mentioned getting similar maximum stable clocks and speeds. YMMV.

If you test our *-ztex formats as well, please share your feedback.

In case you'd like to reproduce these results, our pw-fake-unix is available at:

http://openwall.info/wiki/john/sample-hashes#Sample-password-hash-files

Also see this recent reply on what else we could implement on FPGAs:

http://www.openwall.com/lists/john-users/2017/05/31/2

And this Twitter poll/thread:

https://twitter.com/solardiz/status/876087192573104128

PBKDF2-HMAC-SHA* won, and we'll likely have it in a few months from now. This means things like WPA and dmg.

Another target we intend to explore is AWS F1, but we don't have anything ready yet. F1 turned out to be reasonably priced - $1.65/hour per FPGA, spot price now is ~$0.18/hour (I guess not much demand yet):

https://aws.amazon.com/ec2/instance-types/f1/
https://aws.amazon.com/ec2/pricing/on-demand/
https://aws.amazon.com/ec2/spot/pricing/ (choose N. Virginia)

Alexander