Message-ID: <20120415001747.GA518@openwall.com>
Date: Sun, 15 Apr 2012 04:17:47 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: statistics -openssl vs john

Hi Deepika,

I'm sorry we failed to reply to your question on john-dev sooner.
Anyway, this fits john-users as well (or better).

First, magnum is right: you're comparing apples to pears.  Yet such
comparisons are sometimes useful if you know how to interpret the
results and don't assume that you have a direct comparison.

Then, these performance numbers and those you posted before suggest that
you might be on a virtual machine or at least on a system with other
load.  Benchmark results are very often incorrect when you run those
benchmarks inside a VM: the VM's timers might not behave well enough.
In your case, OpenSSL's performance numbers might be inflated (I am
getting about half those speeds on a non-virtualized 2.5 GHz Core 2'ish
CPU), and John's may be affected in some other way.  Large c/s real vs.
c/s virtual
differences are not normal when you're benchmarking things on a
supposedly otherwise idle system.  You need to actually make the system
idle first - and avoid VMs.

I've included some further comments inline:

On Sun, Apr 15, 2012 at 12:59:51AM +0530, Deepika Dutta Mishra wrote:
> Hi, I was doing speed test between openssl des and john des. I get
> following statistics for openssl
> 
> type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
> des cbc         100225.76k    89521.76k    89778.20k    95060.70k    96158.84k

Here's what I am getting with OpenSSL 1.0.0d on a Xeon E5420 2.5 GHz
(using one core in it):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des cbc          47318.13k    49515.67k    50002.01k    49758.55k    49883.82k

OpenSSL 1.0.1 on FX-8120 o/c 4.5 GHz (turbo):

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des cbc          72202.15k    75636.31k    75584.09k    75591.00k    76682.58k

So your numbers look inflated to me.  Possibly your VM's timer that
OpenSSL's benchmark happened to use (via your guest OS kernel) ran
slower than real time.  Well, or maybe you just used a faster or/and
more suitable CPU (like overclocked Sandy Bridge)?

...Oh, I think I've just figured it out: OpenSSL uses virtual (CPU) time
for its benchmarks (confirmed with a quick test with 48 parallel
invocations with a script on the FX-8120, which only has 8 logical CPUs),
and from your John benchmarks we already know that you have a large
discrepancy between real and virtual time (other system load or/and VM).

So your benchmark results are not to be relied upon for any purpose.
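
To illustrate the real vs. virtual time distinction, here is a minimal
Python sketch (not taken from OpenSSL or JtR; the helper names measure,
busy_loop and rates are made up for this example).  It times the same work
against the wall clock and against the process's CPU time, and derives
both rates the way John reports "c/s real" and "c/s virtual":

```python
import time

def measure(work, *args):
    """Run work(*args) once; return (wall_seconds, cpu_seconds)."""
    wall0 = time.perf_counter()   # real (wall-clock) time
    cpu0 = time.process_time()    # virtual time: CPU time charged to this process
    work(*args)
    return time.perf_counter() - wall0, time.process_time() - cpu0

def busy_loop(n):
    """Pure CPU work, standing in for a benchmark's inner loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def rates(ops, wall, cpu):
    """Operations per second against each clock: (real, virtual)."""
    return ops / wall, ops / cpu

if __name__ == "__main__":
    n = 500_000
    wall, cpu = measure(busy_loop, n)
    real, virtual = rates(n, wall, cpu)
    print(f"{real:.0f} ops/s real, {virtual:.0f} ops/s virtual")
```

On an idle, non-virtualized system the two rates should come out close to
each other.  When the process is starved of CPU (other load, or a VM that
is not scheduled all the time), less CPU time is charged than wall time
elapses, so the virtual rate comes out higher than the real one.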

> and for john
> 
> Benchmarking: Traditional DES [32/32 BS]... DONE
> Many salts:    434566 c/s real, 997527 c/s virtual
> Only one salt:    426208 c/s real, 568277 c/s virtual
> 
> Benchmarking: LM DES [32/32 BS]... DONE
> Raw:    9306K c/s real, 12086K c/s virtual

These are pretty low speeds for John.  Did you deliberately build it with
a non-optimal make target for a "fair" comparison against OpenSSL
(assuming that OpenSSL won't use SSE2 or the like for DES)?  That might
not actually be fair.  The primary advantage of bitslicing is that it
lets you use arbitrarily wide machine words or SIMD vectors efficiently.
With only 32-bit machine words, that advantage is much smaller, but with
128-bit SSE2 vectors it is significant.
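
As a toy illustration of why word width matters for bitslicing, here is a
Python sketch.  It is not JtR's code: the gate() circuit is an arbitrary
stand-in for a DES S-box gate network, and Python integers merely
simulate machine words of a chosen width.  Each bit position of a word
holds one independent instance ("lane"), so one pass of bitwise
operations evaluates the circuit for as many instances as the word has
bits:

```python
def pack(bits):
    """Pack a list of 0/1 values into one integer, lane j in bit j."""
    word = 0
    for lane, bit in enumerate(bits):
        word |= (bit & 1) << lane
    return word

def unpack(word, lanes):
    """Inverse of pack(): extract the low `lanes` bits as a list."""
    return [(word >> lane) & 1 for lane in range(lanes)]

def gate(a, b, c):
    """Toy boolean circuit; works on single bits and on packed words alike."""
    return (a & b) ^ c

def bitslice_eval(inputs, width):
    """Evaluate gate() for every (a, b, c) triple, `width` lanes per pass.

    Returns (outputs, passes), where passes counts how many times the
    circuit had to be run over packed words.
    """
    outputs = []
    passes = 0
    for start in range(0, len(inputs), width):
        chunk = inputs[start:start + width]
        a = pack([t[0] for t in chunk])
        b = pack([t[1] for t in chunk])
        c = pack([t[2] for t in chunk])
        outputs.extend(unpack(gate(a, b, c), len(chunk)))
        passes += 1
    return outputs, passes

if __name__ == "__main__":
    inputs = [(i & 1, (i >> 1) & 1, (i >> 2) & 1) for i in range(256)]
    for width in (32, 128):
        _, passes = bitslice_eval(inputs, width)
        print(f"width {width}: {passes} passes for {len(inputs)} instances")
```

The same 256 instances take four times fewer passes through the circuit
at width 128 than at width 32.  The packing and unpacking overhead shown
here is an artifact of the sketch, not a claim about how JtR arranges its
data.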

Then, speaking of apples and oranges^Wpears, with OpenSSL you have a
fixed key and you encrypt one stream of data with it.  With the bitslice
DES code in JtR, you have a set of ever-changing keys and you encrypt a
constant value with those.  These two tasks are quite different - not
only in terms of parallelism (you need to have 32 separate keys or/and
blocks at once for your build of JtR above), but also in terms of work
performed (with JtR, you're doing a lot of key setup, whereas in OpenSSL
it is not benchmarked - it is out of the benchmark's loop there).

> Now considering openssl, it can process 100225.76 x 1000 = 100225760
> bytes/sec which should account to 100225760 /8 = 12528220 encryptions/sec
> (since DES block size is 8 bytes)

Yes (if your benchmark results were correct, which they are not).

> With john, considering LM DES (which according to what I read does 2 DES
> encryption),

No, it is just one DES encryption, but the key changes every time you do
it (JtR tries different candidate passwords), and there's also the hash
comparison step (to detect cracked passwords).

> the result is  9306 x 1000 = 9306000 x 2 = 18612000
> encryption/sec

It'd be just 9306 x 1000 = 9306000 encryptions/sec, but that's wrong
because OpenSSL uses virtual time, so you have to pick c/s virtual here,
so it'd be 12086 x 1000 = 12086000 encryptions/sec.  But that's still
wrong because we have no idea how your real to virtual time ratio
changed between the two benchmarks (clearly, it changes significantly
over time - this shows in your different JtR benchmarks)
and, more importantly, because in one case you're benchmarking DES
encryption alone and in the other key setup and encryption and hash
comparisons at once.
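
The unit conversions discussed above can be sketched in Python as
follows.  The function names are made up for this example, the inputs
are the figures quoted in this thread, and the resulting ratio is still
not a meaningful comparison, for the reasons just given:

```python
DES_BLOCK_BYTES = 8  # DES block size in bytes

def openssl_blocks_per_sec(kbytes_per_sec):
    """OpenSSL speed output is in thousands of bytes per second."""
    return kbytes_per_sec * 1000 / DES_BLOCK_BYTES

def lm_encryptions_per_sec(kilo_cps):
    """LM hashing is one DES encryption per candidate password."""
    return kilo_cps * 1000

openssl = openssl_blocks_per_sec(100225.76)   # 16-byte column, as quoted
lm_real = lm_encryptions_per_sec(9306)        # c/s real
lm_virtual = lm_encryptions_per_sec(12086)    # c/s virtual

print(f"OpenSSL DES blocks/sec:       {openssl:.0f}")
print(f"LM encryptions/sec (real):    {lm_real}")
print(f"LM encryptions/sec (virtual): {lm_virtual}")
print(f"virtual-to-virtual ratio:     {lm_virtual / openssl:.2f}")
```

With one DES encryption per LM hash and both sides measured in virtual
time, the claimed speedup disappears, even before accounting for the
extra key setup and hash comparison work that JtR performs.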

> This provided 1.48 times speedup with john des (non sse or other
> optimizations). Am I right in my calculation?

No.

Anyway, to get an idea of how fast John can really get, see:

http://www.openwall.com/lists/announce/2011/06/22/1

Using a similar apples to pears comparison, this gives (for a Core
i7-2600K 3.4 GHz + turbo):

---
Benchmarking: Traditional DES [128/128 BS AVX-16]... DONE
Many salts:     20668K c/s real, 2593K c/s virtual
Only one salt:  8724K c/s real, 1094K c/s virtual

That's for 8 threads on this quad-core CPU with SMT.

(By the way, this corresponds to over 500 million of DES block
encryptions per second, or a data encryption speed of 33 Gbps, if we
were encrypting data.  Of course, in practice there would be other
limitations, such as data transfer bandwidth.  But the crypto code and
the CPU are this fast.)
---

Newer versions of JtR built with newer gcc achieve higher speeds on the
same machine:

Benchmarking: Traditional DES [128/128 BS AVX-16]... DONE
Many salts:     22773K c/s real, 2843K c/s virtual
Only one salt:  18284K c/s real, 2291K c/s virtual

Since every DES-based crypt(3) computation involves 25 modified-DES
encryptions (slower than normal DES), that's over 4.5 Gbytes/sec or
36 Gbps data encryption speed.  (In the multi-salt case, the key setup
is out of the loop.)
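
The conversion from c/s to an equivalent data encryption speed can be
written out as a short Python sketch.  The function name is made up for
this example; the constants are the 25 DES encryptions per crypt(3)
computation and the 8-byte DES block mentioned above:

```python
DES_PER_CRYPT = 25    # DES encryptions per DES-based crypt(3) computation
DES_BLOCK_BYTES = 8   # DES block size in bytes

def crypt_throughput(kilo_cps):
    """Convert a 'K c/s' figure to (DES blocks/sec, bytes/sec, Gbps)."""
    des_per_sec = kilo_cps * 1000 * DES_PER_CRYPT
    bytes_per_sec = des_per_sec * DES_BLOCK_BYTES
    gbps = bytes_per_sec * 8 / 1e9
    return des_per_sec, bytes_per_sec, gbps

for label, kilo_cps in (("older build", 20668), ("newer build", 22773)):
    des, nbytes, gbps = crypt_throughput(kilo_cps)
    print(f"{label}: {des} DES/s, {nbytes / 1e9:.2f} Gbytes/s, {gbps:.1f} Gbps")
```

This treats each modified-DES encryption as if it were one plain 8-byte
block encryption, which is the same simplification the figures above
make.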

For a more direct comparison (yet still apples to pears indeed) to the
OpenSSL benchmarks I posted above, here's what John achieves on one core
in the FX-8120 o/c 4.5 GHz (turbo):

Benchmarking: Traditional DES [128/128 BS XOP-16]... DONE
Many salts:     5275K c/s real, 5275K c/s virtual
Only one salt:  4993K c/s real, 4993K c/s virtual

(Non-OpenMP build this time, to use just one CPU core.)

That's 1055000 x 1000 bytes per second (about 1 Gbyte/sec), which is
about 14 times faster than the OpenSSL speed.  And that's not
considering that JtR also implements DES-based crypt(3) salts in this
benchmark (roughly a 7% performance hit).

For a pure 32-bit build, if you must, I expect JtR to be faster than
OpenSSL's DES - in this apples to pears comparison - by a factor of
1.2 (x86 in 32-bit mode, register-starved) to 4 (decent architectures).

Here's a low speedup example (almost worst case for JtR), Pentium 3
1.0 GHz, deliberately non-optimal build of JtR ("make generic"):

Benchmarking: Traditional DES [32/32 BS]... DONE
Many salts:     125632 c/s real, 124388 c/s virtual
Only one salt:  124448 c/s real, 124448 c/s virtual

That's about 25 million bytes per second.  OpenSSL on the same machine:

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
des cbc          20319.54k    21552.33k    21682.94k    21918.41k    21882.78k

So we have a speedup of only about 1.15x to 1.25x here, depending on
which OpenSSL column we compare against.  As soon as we
switch to an optimal build, things change dramatically (same machine):

Benchmarking: Traditional DES [64/64 BS MMX]... DONE
Many salts:     376320 c/s real, 376320 c/s virtual
Only one salt:  367040 c/s real, 367040 c/s virtual

That's about 75 million bytes per second, or a speedup of 3.5x.
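
The Pentium 3 comparison can be reproduced with a short Python sketch.
The function names are made up for this example, the inputs are the
"Many salts" c/s figures and the OpenSSL des cbc row quoted above, and
the comparison remains apples to pears:

```python
DES_PER_CRYPT = 25    # DES encryptions per DES-based crypt(3) computation
DES_BLOCK_BYTES = 8   # DES block size in bytes

# OpenSSL des cbc on the Pentium 3, in thousands of bytes per second,
# for block sizes 16, 64, 256, 1024 and 8192 bytes.
OPENSSL_KBYTES = [20319.54, 21552.33, 21682.94, 21918.41, 21882.78]

def crypt_bytes_per_sec(cps):
    """Equivalent data rate for a DES-based crypt(3) c/s figure."""
    return cps * DES_PER_CRYPT * DES_BLOCK_BYTES

def speedup_range(cps, openssl_kbytes=OPENSSL_KBYTES):
    """Lowest and highest JtR/OpenSSL ratio over the OpenSSL columns."""
    john = crypt_bytes_per_sec(cps)
    ratios = [john / (k * 1000) for k in openssl_kbytes]
    return min(ratios), max(ratios)

for label, cps in (("make generic, 32/32 BS", 125632),
                   ("MMX build, 64/64 BS", 376320)):
    low, high = speedup_range(cps)
    print(f"{label}: {crypt_bytes_per_sec(cps)} bytes/s, "
          f"speedup {low:.2f}x to {high:.2f}x")
```

The exact ratio depends on which OpenSSL column is used as the baseline,
which is why the speedups in the text are given as rounded figures.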

I hope this answers your question more than exhaustively. ;-)

Alexander
