|
Message-ID: <20051016231343.GA1895@openwall.com> Date: Mon, 17 Oct 2005 03:13:43 +0400 From: Solar Designer <solar@...nwall.com> To: john-users@...ts.openwall.com Subject: Re: Using Hardwareaccelerators to speed up John Frank, Thank you for providing references to David Hulton's work in this area. I was not aware of it. I've invited David to join us on this mailing list, which he just did. David, -- any comments you could provide would be most appreciated. My comments are inline: > Solar Designer wrote: > >1. General-purpose FPGA-based boards. These would need to be programmed > >for the very specific task. I briefly evaluated this possibility back > >in 1998-1999 and it appeared that FPGAs would deliver roughly 5 times > >better DES performance for the money, compared against the most suitable > >CPUs (at the time, that was Alpha 21164PC - affordable and really good > >at bitslice DES). I used retail prices; the improvement could be a lot > >better for large quantities. On Fri, Oct 14, 2005 at 12:20:31AM +0200, Frank Dittrich wrote: > I think the technology has improved a lot. Indeed, it has. However, my estimate from 6+ years ago ("5 times better DES performance for the money") appears to still hold true for low-end FPGAs. > Also have a look at the slides: > http://www.ccc.de/congress/2004/fahrplan/files/340-fpga-slides.pdf On slide 35, "Password File Cracker", the following performance numbers are given: "PC (3.0Ghz P4 \w john)" - 300,000 c/s "Hardware (Low end FPGA \w jawn)" - 4,000,000 c/s I am assuming that these are for traditional DES-based crypt(3), which is 25 iterations of modified-DES. My guess is that the 3 GHz P4 benchmark was done with John 1.6, which did not yet use bitslice DES on x86 processors. Current publicly available development versions of John do around 700k c/s on 3 GHz P4s. Current non-public development versions of John do around 900k c/s with SSE code on the same P4s. (And as I have already mentioned, PPC G5s are even faster than that - up to 1.6M c/s - but they're more expensive.) So this gives us a 5x performance increase. As it relates to prices, low-end computers based on P4 Celerons (which are not any slower than "full" P4s for John) are likely cheaper than low-end general-purpose FPGA-based cards, both in retail quantities. > David Hulton claimed you'd be able to "crack password hashes as fast as > 100+ PCs using FPGA PCMCIA cards on your laptop". This claim could become the reality, but we're not quite there yet. Slide 36, "Up & Coming", gives an estimate of 60M c/s for Picomon, the most powerful card (of those listed) that would be usable in a laptop. Given that the cards currently available from Pico Computing are priced at around $2,500, my guess is that this new card, when released, is not going to be cheaper. (I'm sure it's a lot cheaper to produce, but companies such as Pico Computing need to cover their development costs and make a profit.) If we're comparing this against desktop PCs, a similarly priced one would be Apple's PowerMac G5 with dual 2.7 GHz processors. It can do over 3M c/s. So the FPGA-based card is "only" 20 times faster (which is still a lot!), not 100+ times faster. > See http://www.ccc.de/congress/2004/fahrplan/event/244.en.html > (IIRC, "basic functionality of john the ripper" merely implemented > brute forcing a part of the key space. This is a very important observation you make. It's not only about c/s rates, but also about the order in which candidate passwords are tried. Much of John's success is due to its ability to try candidate passwords in an optimal order. With a PCMCIA card capable of hashing candidate passwords at a rate of 60 million per second, either the card itself will have to generate the candidate passwords to try (in a far less optimal order) or the laptop's CPU would become the bottleneck since it wouldn't be able to feed the card with candidate passwords at this high a speed. A similar problem exists with testing the computed hashes against a large number of those loaded for cracking. Perhaps the hash tables (used to quickly locate potentially matching hashes) will have to be loaded onto the FPGA card. > But it should also be possible to let john create the password > candidates, and calculate (and compare) the hashes using FPGA > hardware, using an order of magnitude larger MAX_KEYS_PER_CRYPT > value than for general purpose CPUs.) Actually, fully-pipelined implementations of DES are not small, so you can only fit a handful of them onto current FPGAs. If I interpret the numbers from David's slides correctly, he has been assuming 1 to 5 instances of DES per chip. So MAX_KEYS_PER_CRYPT would need to be rather small, unless a larger value is determined to help reduce the communication overhead, etc. David, Some more comments on your publications: http://www.picocomputing.com/press/KeyRecoveryServer.pdf "World's Fastest Lanman/NTLM Key Recovery Server Shipped." This press release says that the server can try over 500M LM keys per second. Very impressive indeed. However, the claims that this is "250x faster than a top of the line CPU" and the "12 hours vs. 136 days" once again assume unoptimal software. (I am sure this is unintentional.) The current publicly available development version of John can do around 7M c/s at LM on modern P4s (2.8+ GHz) and 9.5M c/s at LM on G5 2.7 GHz. So your special-purpose server (with 10 FPGA cards) appears to be 50 to 70 times faster than individual general-purpose CPUs. (Curiously enough, my "5x speedup" estimate from 6+ years ago still holds true.) Also, I believe John's performance at LM hashes could be made 2-3 times better if I would re-design it to try candidate passwords in an order that is optimal for the low-level routines (essentially eliminating key setup overhead). Currently, John tries more likely passwords first, which is highly desirable when using it to detect weak passwords. Exhaustive key searches with no requirement to get the weakest passwords cracked early on are quite a different task. Please don't get me wrong, I find that you're doing the right thing and I'd be interested in possible cooperation. I just couldn't resist the temptation to defend my software. ;-) Thanks, -- Alexander Peslyak <solar at openwall.com> GPG key ID: B35D3598 fp: 6429 0D7E F130 C13E C929 6447 73C3 A290 B35D 3598 http://www.openwall.com - bringing security into open computing environments Was I helpful? Please give your feedback here: http://rate.affero.net/solar
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.