Date: Mon, 7 Nov 2011 05:50:24 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: comparing two benchmarks

Hi,

Attached is a Perl script I wrote to compare pairs of John the Ripper
benchmark outputs - e.g., to see the overall effect of changes that are
not limited to individual hash or cipher types, such as different
compilers, optimization options, computers, or versions of John.  With
so many "formats" supported in -jumbo these days, it was difficult to
tell whether such changes had an overall positive or negative effect.
Of course, for typical uses of John you would care about individual
benchmarks, but for some uses an overall comparison is needed - e.g.
when creating binary packages of John for others to use or when using
John as a benchmark for C compilers or CPUs rather than as a password
cracker. ;-)

For example, it was non-obvious to me whether gcc 4.6.2 produced faster
x86-64 code as measured on Core 2 when compiling with -O2 or -Os.  Some
of John's benchmarks were faster with -O2, some with -Os.  In
1.7.8-jumbo-7, there are as many as 158 individual benchmark outputs.
Here's what I did to find out which of these options was faster overall:

1. Compiled John the Ripper 1.7.8-jumbo-7 as-is using the linux-x86-64
make target (it has -O2 in CFLAGS, among other things).

2. Ran "../run/john --test > asis".

3. Edited the Makefile, replacing -O2 with -Os on the CFLAGS line, and
ran "make clean linux-x86-64".

4. Ran "../run/john --test > Os".

5. Ran the comparison script on the two files:

$ ./relbench.pl asis Os
Geometric mean of 158:  0.914688 real, 0.915206 virtual
Standard deviation:     0.966662 real, 0.962432 virtual

Thus, it appears that going from -O2 to -Os made things slower by 8.5%
overall: a geometric mean of 0.914688 means the -Os build runs at about
91.5% of the -O2 build's speed, i.e., roughly 8.5% slower.  However,
the standard deviation is large, meaning that the ratio varies a lot
between individual benchmarks.
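
For the curious, here is a rough sketch of the kind of computation
involved.  This is a simplified illustration only, not the attached
script itself, which may well parse the files and define the standard
deviation differently.  It assumes "john --test" result lines of the
form "Many salts: 2616K c/s real, 2616K c/s virtual" (the labels and
magnitudes vary by hash type):

#!/usr/bin/perl
# Simplified illustration of comparing two "john --test" outputs;
# see the attached relbench.pl for the real thing.
use strict;
use warnings;

# Parse one benchmark output file into a hash mapping
# "benchmark name:line label" to [real c/s, virtual c/s].
sub parse {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "$file: $!\n";
    my %speed;
    my $name = '';
    while (<$fh>) {
        $name = $1 if /^Benchmarking: (.+?)\.\.\./;
        if (/^(Raw|Many salts|Only one salt):\s+([\d.]+)([KM]?) c\/s real, +([\d.]+)([KM]?) c\/s virtual/) {
            my %mul = ('' => 1, K => 1e3, M => 1e6);
            $speed{"$name:$1"} = [$2 * $mul{$3}, $4 * $mul{$5}];
        }
    }
    return \%speed;
}

# Geometric mean: exponential of the mean of the logs of the ratios.
sub gmean {
    my $sum = 0;
    $sum += log($_) for @_;
    return exp($sum / @_);
}

# Plain standard deviation of the ratios.
sub stddev {
    my ($sum, $var) = (0, 0);
    $sum += $_ for @_;
    my $mean = $sum / @_;
    $var += ($_ - $mean) ** 2 for @_;
    return sqrt($var / @_);
}

my ($old, $new) = (parse($ARGV[0]), parse($ARGV[1]));
my (@real, @virtual);
for my $key (keys %$old) {
    next unless $new->{$key};  # skip benchmarks present in only one file
    push @real, $new->{$key}[0] / $old->{$key}[0];
    push @virtual, $new->{$key}[1] / $old->{$key}[1];
}

printf "Geometric mean of %d:  %f real, %f virtual\n",
    scalar(@real), gmean(@real), gmean(@virtual);
printf "Standard deviation:     %f real, %f virtual\n",
    stddev(@real), stddev(@virtual);

The geometric mean is the appropriate average for ratios: a 2x speedup
on one benchmark and a 2x slowdown on another cancel out exactly, which
would not be the case with an arithmetic mean.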

6. To double-check these results, I ran "../run/john --test > Os-2",
reverted the change to the Makefile, did "make clean linux-x86-64"
again, and ran "../run/john --test > asis-2".

7. Comparing these additional files, I get:

$ ./relbench.pl asis asis-2
Geometric mean of 158:  1.000566 real, 1.002138 virtual
Standard deviation:     0.172785 real, 0.170517 virtual
$ ./relbench.pl Os Os-2
Geometric mean of 158:  1.002091 real, 1.001839 virtual
Standard deviation:     0.112707 real, 0.111287 virtual

Notice the consistent overall performance (geometric means very close
to 1.0) and the much lower standard deviation when comparing what are
supposed to be equivalent benchmarks.  Now let's compare -O2 vs. -Os
again, but using the extra files this time:

$ ./relbench.pl asis-2 Os-2
Geometric mean of 158:  0.916083 real, 0.914933 virtual
Standard deviation:     0.931147 real, 0.934134 virtual
$ ./relbench.pl asis Os-2
Geometric mean of 158:  0.916601 real, 0.916890 virtual
Standard deviation:     0.962342 real, 0.960655 virtual
$ ./relbench.pl asis-2 Os
Geometric mean of 158:  0.914171 real, 0.913253 virtual
Standard deviation:     0.938063 real, 0.939076 virtual

These are similar to what we saw originally, confirming that result:
yes, we have an 8.5% overall slowdown for -Os.

A drawback of the current approach is that all benchmark outputs are
assigned the same weight, even though some are related to each other
(e.g., many hashes are MD5-based) and the hashes and ciphers differ in
popularity (so some are more important than others).  But any other
approach to this would be imperfect as well.
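
If weighting were desired, a weighted geometric mean would be one way
to do it - a hypothetical extension, not something the attached script
implements; the weighted_gmean name and the %weight hash below are made
up for illustration:

# Hypothetical extension (not in relbench.pl): weight each benchmark's
# log-ratio by its relative importance before averaging.
sub weighted_gmean {
    my ($ratio, $weight) = @_;  # hash refs: key => ratio, key => weight
    my ($sum, $total) = (0, 0);
    for my $key (keys %$ratio) {
        # benchmarks not listed in %weight get a default weight of 1
        my $w = exists $weight->{$key} ? $weight->{$key} : 1;
        $sum += $w * log($ratio->{$key});
        $total += $w;
    }
    return exp($sum / $total);
}

Choosing the weights is exactly the hard, subjective part, which is why
equal weights are a defensible default.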

Alexander

Attachment: relbench.pl (text/plain, 1969 bytes)
