Message-ID: <20170531173333.GA10584@openwall.com>
Date: Wed, 31 May 2017 19:33:33 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: other algorithms on ZTEX 1.15y?

On Wed, May 31, 2017 at 06:07:56AM -0800, Royce Williams wrote:
> Beyond the algorithms either already supported in john or implemented
> elsewhere (descrypt, bcrypt, DES), what other algorithms are feasible
> or worthwhile on ZTEX?

Are you aware of bcrypt already implemented on ZTEX elsewhere? Where
exactly? Have you tested?

Regarding DES, are you referring to Gifts' implementation? Have you
tried using it, or anything else? Maybe we need to add a plain DES
cracker mode to JtR, like I think hashcat has now (but not on FPGAs
yet).

As to our developments so far, after the descrypt-ztex format Denis has
also been working on bcrypt-ztex, citing speeds of ~105k c/s per board
at bcrypt cost 5 - but this work is yet to be completed and merged.
Actual speeds will vary by cracking mode since the current synchronous
crypt_all() API combined with the not-so-fast USB interface results in
significant idle time when the candidate passwords are fed from the
host. On-FPGA mask mode mostly avoids that (and so will an API revision
for asynchronous processing, but we haven't gotten around to that yet).

> This project is working on WPA2 support, which seems interesting:
>
> https://github.com/JarrettR/FPGA-Cryptoparty
>
> From a brief review of the project's files, I infer that SHA1 and
> PBKDF2 would be possible on ZTEX. Would they be worth the effort?

For PBKDF2 with MD*/SHA-1/SHA-2, it should be possible to obtain
GPU-like speeds on ZTEX, roughly like these boards worked for Bitcoin
mining (thus, one quad-FPGA board is roughly the same as one high-end
GPU from 2015 or so). The purpose would be to put these boards to more
general use and to achieve better energy efficiency (compared to GPUs).
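
For readers unfamiliar with why PBKDF2 maps well onto hardware, here is
a minimal Python sketch of the computation, not JtR or ZTEX code. It
writes PBKDF2 out as the chain of HMAC calls it really is: each call
depends only on the previous output, so the work per candidate is a long
fixed loop over one primitive. The function names pbkdf2_sketch and
wpa2_pmk are hypothetical, and the WPA2 parameters (SSID as salt, 4096
iterations, 32-byte output) are taken from the WPA2 standard, not from
this thread.

```python
import hashlib
import hmac


def pbkdf2_sketch(hash_name, password, salt, iterations, dklen):
    """PBKDF2 (RFC 8018) written out to expose the underlying HMAC chain."""
    digest_size = hashlib.new(hash_name).digest_size
    blocks_needed = -(-dklen // digest_size)  # ceiling division
    derived = b""
    for block_index in range(1, blocks_needed + 1):
        # U_1 = HMAC(password, salt || 32-bit big-endian block index)
        u = hmac.new(password, salt + block_index.to_bytes(4, "big"),
                     hash_name).digest()
        acc = int.from_bytes(u, "big")
        # U_i = HMAC(password, U_{i-1}); the output block is the XOR of all U_i
        for _ in range(iterations - 1):
            u = hmac.new(password, u, hash_name).digest()
            acc ^= int.from_bytes(u, "big")
        derived += acc.to_bytes(digest_size, "big")
    return derived[:dklen]


def wpa2_pmk(passphrase, ssid):
    """WPA2 pairwise master key (assumed parameters: SHA-1, 4096 iterations)."""
    return pbkdf2_sketch("sha1", passphrase.encode(), ssid.encode(), 4096, 32)


if __name__ == "__main__":
    print(wpa2_pmk("password", "IEEE").hex())
```

The sketch can be cross-checked against hashlib.pbkdf2_hmac() from the
Python standard library, which computes the same function.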

For fast unsalted hashes, good speeds may only be achieved for up to a
few thousand hashes loaded for cracking. This is a lot worse than with
GPUs, which handle millions. So focusing on PBKDF2 makes more sense.

We didn't come up with a good enough idea for a generic password hashing
soft CPU yet. My current thinking is that, to avoid bumping into BRAM
port count for the register file as we would with instructions doing
little work each, maybe we should have different bitstreams for
different crypto primitives like MD5, SHA-1, etc. (one at a time) and
have those available through very high latency instructions in the soft
CPU to allow for full pipelining - thus, 64 cycles latency for MD5, etc.
We'd also have a handful of simpler instructions (same or similar in the
different bitstreams) for implementing higher-level crypto schemes
around the current bitstream's crypto primitive (this way, the same
bitstream will be usable for multiple higher-level schemes sharing the
same crypto primitive). These would include data copying and control
transfer instructions.

A tough question is how to combine the extreme high-latency crypto
instructions with control flow transfer - do we have like 63 delay
slots? SPARC has 1, some DSPs have a few, but I've never heard of an
ISA having tens of delay slots. Yet maybe this is the way to go.

Meanwhile, or alternatively, maybe we need PBKDF2-SHA* bitstreams.
There are many JtR formats that use PBKDF2, so it would have been a
primary candidate for implementation on the soft CPU anyway.

For NTLM, we could use a soft CPU having an MD4 primitive, but then do
we have anything else needing MD4? Perhaps just raw-MD4? That's very
rare, and other MD4-based things are probably even more rare. So
perhaps a separate bitstream for NTLM as well, or maybe one usable for
NTLM and for raw-MD4 (different placement of characters into the current
block in on-FPGA mask mode; the rest of the difference can probably be
handled on host).
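
To make the "different placement of characters" point concrete, here is
a small Python sketch of how the same candidate would be laid out in the
64-byte MD4 input block for raw-MD4 versus NTLM. It is an illustration
only, not the planned FPGA design: the function names are hypothetical,
it covers single-block inputs only, and it relies on two standard facts,
that MD4 pads with 0x80, zeros, and a little-endian 64-bit bit length,
and that NTLM is MD4 over the UTF-16LE encoding of the password.

```python
import struct


def md4_first_block(message):
    """Pad a short message into one 64-byte MD4 block (at most 55 bytes)."""
    if len(message) > 55:
        raise ValueError("this sketch handles single-block messages only")
    padded = message + b"\x80"                      # mandatory 1 bit, then zeros
    padded += b"\x00" * (56 - len(padded))
    padded += struct.pack("<Q", 8 * len(message))   # bit length, little-endian
    return padded


def raw_md4_block(candidate):
    # raw-MD4: candidate bytes sit at consecutive offsets 0, 1, 2, ...
    return md4_first_block(candidate.encode("latin-1"))


def ntlm_block(candidate):
    # NTLM: UTF-16LE input, so ASCII characters sit at even offsets 0, 2, 4, ...
    return md4_first_block(candidate.encode("utf-16-le"))


if __name__ == "__main__":
    print(raw_md4_block("abc")[:8].hex())
    print(ntlm_block("abc")[:8].hex())
```

Everything after block construction (the MD4 rounds themselves) is the
same for both layouts, which is why a shared bitstream looks plausible;
with this layout an NTLM candidate of up to 27 characters still fits in
one block.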

LM will need to be its own bitstream, although it could be a revision of
the descrypt design. Denis probably has specific thoughts on it.
Technically, we could share a bitstream between descrypt and LM, as
that's basically different IV (0 vs. non-0), iterations (25 vs. 1), and
salt size (12 vs. 0 bits, but we can simply set the 12 bits to all 0's),
but this would be suboptimal.

Overall, most JtR formats (perhaps 90%+, with the exception of scrypt
and the like) could be reasonably implemented for ZTEX, but a speedup
over GPU is expected for only a few (bcrypt, maybe Lotus/Domino), the
required effort is substantial, and there's almost no demand.

Alexander