|
Message-ID: <20051017213736.GA8488@openwall.com> Date: Tue, 18 Oct 2005 01:37:36 +0400 From: Solar Designer <solar@...nwall.com> To: john-users@...ts.openwall.com Subject: Re: Using Hardwareaccelerators to speed up John On Mon, Oct 17, 2005 at 10:17:49AM -0700, h1kari wrote: > Solar Designer wrote: > > If so, starting with a few thousand LM hashes to crack, John running > > on a single modern CPU would outperform your implementation on a single > > Pico E-12 LO. Is that correct? > > Yeah, I would think so. If you're comparing to a few thousand hashes > after each computation of crypt() Actually, for bitslice DES, John obviously only does comparisons once for every N hashes computed, with N being the machine's word size (or vector size, e.g. 128 for AltiVec). For small numbers of hashes loaded (for a given salt, if applicable), the following algorithm is used: /* * The trick I used here allows to compare one ciphertext against all the * DES_bs_crypt() outputs in just O(log2(ARCH_BITS)) operations, assuming * that DES_BS_VECTOR is 0 or 1. This routine isn't vectorized, yet. */ int DES_bs_cmp_all(ARCH_WORD *binary) [...] (that's in DES_bs.c). For AltiVec on 32-bit PowerPC, it would require 4*5 = 20 operations (on average) to check a loaded password hash against 128 computed ones. For a plain 64-bit architecture such as Alpha, it would require only 6 operations for the 64 "comparisons". For larger numbers of hashes loaded (per salt, if applicable), hash tables are used instead. > would it slow you down a little bit > compared to your performance info you gave earlier? "A little bit" would be the answer. I can run some benchmarks if you're interested in exact numbers. I previously did such measurements for large numbers of crypt(3) hashes per salt, up to several thousand. The slowdown was barely measurable. Of course, LM hashes are a lot quicker and would be impacted more. > > so are you suggesting to run an embedded Linux and John on one or both > > of the embedded PowerPCs? > > Yeah, I was suggesting that mainly because of the latency and overhead > involved with talking to a host processor. Right now if we're generating > 20M c/s, that's roughly 1.2Gbps fof passwords going to the card, and > another 1.28Gbps coming back. Yes, and that's a bit excessive. However, we might be able to halve those numbers by using a trivial compression (or rather, encoding) algorithm on the candidate passwords, such as DAWG as implemented for wordlists in Crack 5 (and have the embedded PowerPC undo that?), and we don't really need full 64 bits of hashes on the way back. 32 bits (or even less) would be sufficient to rule out most candidate passwords and the rest could be tested on the host's CPU. Then it would fit in PCI or in Gigabit Ethernet. > When we're talking about Lanman/NTLM it's 25x that. This approach won't work well for non-iterated hashes. The speedup from it would be minimal, -- like getting the same 20M c/s for LM that we would for crypt(3). For those hashes, we have to let the FPGA at least produce batches of candidate passwords, maybe implementing an equivalent of John the Ripper "incremental" mode's inner loop. The outer loops could be left on the host system. The stream of computed hashes would remain a problem, though. > The other thing is we could use the APU bus for > specific operations that could be sped up inside of some of the > algorithms, similar to your SSE or Altivec optimizations, but more > geared towards crypto. Yes. Actually, we might be able to implement sort of a vector computer, increasing vector size way beyond AltiVec's 128 bits. Then the existing framework in John would work for it. > Our higher end cards do have 256MB of DDR2 ram > that runs up to 600mhz I believe, and 64MB of flash. We do have plans > for building one in the near future with 2GB+ of user flash, which may > be useful for wordlist storage and such. Yes, this definitely could run John in its entirety, but it could be somewhat tricky to manage for someone not familiar with the system. > > Perhaps we could start by having the FPGA card do the bare minimum and > > having John running on the host system communicate candidate passwords > > and salts to the card and computed hashes back. It won't be very fast, > > but it's likely the quickest way to get us started. We can do it for > > just one hash type initially (perhaps one of the Unix hashes since we > > need it to not be very fast). > > Yeah, that sounds like a good starting point which would be fairly > trivial right now. I could get something set up for you and send you a > card + driver in the next couple of weeks if you're really interested. > We're still working on getting linux running fully on our boards, but I > expect APU john optimizations would be trivial once that's completed in > the next couple of months. Yes, I am interested. Finding the time for this is, however, not going to be easy. I know I won't be able to approach this any earlier than December. Let's discuss this off-list. Thanks, -- Alexander Peslyak <solar at openwall.com> GPG key ID: B35D3598 fp: 6429 0D7E F130 C13E C929 6447 73C3 A290 B35D 3598 http://www.openwall.com - bringing security into open computing environments
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.