Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20051013023349.GA26577@openwall.com>
Date: Thu, 13 Oct 2005 06:33:49 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Using Hardwareaccelerators to speed up John

On Thu, Oct 13, 2005 at 02:14:16AM +0200, sebastian.rother@...erlin.de wrote:
> At first: Sorry Solar but I start from zero.
> I can't know what you maybe forwarded or not.
> So I'm sorry if something here (or maybe even everything) is already known.
> 
> I already talked (a littlebit *looks into the sky*) with Solar direclty.

Sebastian, -- actually, I am grateful to you for including "full
context" in this posting. :-)

> Using OpenSSL could speed up John if a crypto-accelerator is present and
> if the OpenSSL-Version supports these Cards.
> I'm using OpenBSD wich supports such functions for about 2 years now (but
> the official OpenSSL should provide it now also).

That's only a part of the theory.  There are many things which can make
this fail in practice:

1. As you mention, many (most?) crypto accelerator cards are actually
quite slow or optimized for operations which are of no use for John
presently (e.g., public key crypto).

2. With password cracking, we're encrypting or hashing only very small
bits of data, each time with a new key (candidate password).  Crypto
hardware, however, is generally optimized for an entirely different
task, where a lot of data is encrypted or decrypted with a given key
and/or a message-digest is calculated for a non-negligible amount of
data (e.g., an IP packet).  The published Mbps rates are usually those
for intended uses and do not take into account any key setup overhead.
If the implementations would be abused for password cracking, that
overhead could easily amount to most of the processing time.

3. Speaking of DES-based crypt(3) in particular (including all of its
flavors that I am aware of), I have not seen any crypto accelerator card
vendor declare support for the modified DES that is used in crypt(3).
It means that most (all?) of such cards can't be used for computing
those hashes, except for the special case of "salt == 0" (encoded
as ".."), -- that would be one unlucky password hash in about 4096 for
the traditional crypt(3).

4. The stronger password hash types all use multiple iterations of
underlying cryptographic primitives.  For example, the traditional
crypt(3) uses 25 iterations of modified-DES, and the FreeBSD-style
MD5-based crypt(3) uses 1000 iterations of MD5 compression function
(for passwords no longer than 15 characters).  I have not seen any
crypto accelerator card vendor declare support for iterating a
cryptographic primitive over the same data (with the output becoming the
new input) without involvement from the host system.  Additionally,
often non-trivial processing is required between the iterations, such as
for the MD5-based crypt(3).  Let's use it as an example:

The current implementation of the MD5-based crypt(3) in John (that does
not yet use MMX/SSE/AltiVec and the like, -- great speedups are possible
here!) achieves 5k c/s on a typical Pentium 4 processor (and up to 10k
c/s on the fastest ones available).  The 5k c/s correspond to 5 million
invocations of the MD5 compression function a second, plus a lot of
"high level" overhead.  The compression function takes a 64-byte data
block and a 16-byte vector as its input, and produces another 16-byte
vector as its output.  That's 96 bytes of data to transfer per
invocation.  (In practice, it is likely that a crypto card would not
offer the compression function on its own, resulting in more overhead.)
Ignoring any protocol overhead, that would amount to 480 Mbytes/second
of data transfer to/from the card.  That's almost 4 times the PCI
bandwidth.  Of course, faster buses do exist, but didn't we want an
economical solution and also one allowing to use multiple cards in a
system (with all sharing the same bus)?

5. As I have already mentioned, some high-level overhead can't be
off-loaded from the main CPU unless the crypto card would happen to
implement the exact high-level function we require.  Such overhead is
very far from negligible.  Additionally, off-loading the actual crypto
to the card introduces the communications overhead.  Some software needs
to speak a communications protocol to the card.  The complexity for such
work is comparable and can even exceed that of computing the
cryptographic primitives right on the main CPU like John does currently.

6. Some simpler/quicker hash types, such as LM hashes, are likely
possible to implement almost fully on existing crypto cards, however
they're so quick that the point in doing so is moot.  The CPU would once
again have to exchange data with the card at such a high rate where the
overall performance might not be any better (and is likely to be worse)
than it is without the card.  The main CPU would also be the one to
continue to generate candidate passwords to try at this high rate, so
even if the bandwidth and communication protocol overhead issues were
somehow resolved, the CPU would quickly become the bottleneck.  So much
for adding more crypto cards to a system.

> Offen you'll find just some realy lame Chips on VPN-Hardware but if you
> don#t buy such a Cisco-Junk solution you could also get such a device here
> (not sold yet):
> 
> http://www.soekris.com/vpn1461.htm
> 
> This card could, depends to the algorithm, do e.g. up to 920Mbps of DES.

Now this is not that bad, however, John already achieves better than
that on modern CPUs.  In particular, it achieves 1M c/s for traditional
crypt(3) on PPC G5 1.8 GHz or P4 3.6 GHz (the latter with non-public SSE
code, I must admit).  This roughly corresponds to 1.6 Gbps at DES.
PPC G5 2.7 GHz does over 1.6M c/s, which roughly corresponds to 2.5 Gbps
at DES.

More importantly, please see above for why this rate likely does not
apply to password cracking.

> I thought about this idea because i've some Pentium2-Motherboards here.
> If I would e.g. buy 5 Crypto-Cards they would work simultaniusly (like
> SMP) on OpenBSD. The algorithm isn't the best Theo told me but it should
> work.

Your CPU would be the bottleneck.  I doubt its performance would be
sufficient to fully make use of a single card like that.

> because these cards are cheap AND fast it would be logical to "abuse"
> these cards to speed up john a littlebit.

This may seem logical at first, but it really won't work well.  I think
I've given enough reasons why not, above.

However, not all is lost.  There are things which actually would work:

1. General-purpose FPGA-based boards.  These would need to be programmed
for the very specific task.  I briefly evaluated this possibility back
in 1998-1999 and it appeared that FPGAs would deliver roughly 5 times
better DES performance for the money, compared against the most suitable
CPUs (at the time, that was Alpha 21164PC - affordable and really good
at bitslice DES).  I used retail prices; the improvement could be a lot
better for large quantities.

2. A special-purpose accelerator card for John, if such a beast would
ever be developed. ;-)  (Or any card designed with password cracking in
mind, not necessarily for John.)

Neither has anything to do with OpenSSL.  As explained in John the
Ripper documentation, I found crypt(3) to be too limited an API to allow
for most efficient implementations, -- which is why John uses other
interfaces internally.  The OpenSSL API is similarly too limited.

> OpenBSD on e.g. a new Via Epia Motherboard would enable John to crack AES
> with nearly 20Gbit/s because of the VIa "Pad-Lock"-Engine wich is also
> fully supported by OpenBSD.

Of course, encryption speed does not translate to cracking speed, as I
think I have explained above.  (It works better the other way around,
which is why I dare to make some statements about John's performance
above.)  I'd bet the 20 Gbps involves no key setup.  With cracking,
we've got primarily key setup.

Having this said, I'd be interested to review the specifications for
this motherboard, -- got an URL?

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments

Was I helpful?  Please give your feedback here: http://rate.affero.net/solar

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.