Message-ID: <BLU0-SMTP176E4B576F71CED2D39D6B6FD0C0@phx.gbl>
Date: Thu, 24 Oct 2013 14:01:44 +0200
From: Frank Dittrich <frank_dittrich@...mail.com>
To: john-dev@...ts.openwall.com
Subject: benchmark-unify and relbench adjustments

magnum, Solar, all,

changed format names and changed --test output require some adjustments
to increase the number of benchmarks which can be compared (between
bleeding and earlier versions, e.g., unstable).

Output for dynamic formats has been changed: commit 197f067 moved the
expression from the format name into the algorithm section, i.e., after
the '['. The expressions for the new formats dynamic_1400 and
dynamic_1401 contain "[...]", so that the "Benchmarking: " output now
contains nested "[...[...]...]".

Some formats have been renamed.

Several of the issues mentioned above are addressed by the attached
patches (intended for bleeding, not for unstable). Because I'd like
these patches to be discussed here, I didn't create pull requests.

0001-benchmark-unify-sort-format-name-mappings.patch just sorts the
format name mappings (the __DATA__ section) alphabetically (ignoring
case), so that it is easier to see which mappings had/have to be added,
adjusted, or removed (the dynamic changes allowed a few of the dynamic
mappings to be removed). So, this patch contains no functional changes.
(A similar patch for unstable might be useful if we need to adjust
mappings for unstable as well; so far I didn't check this.)

0002-benchmark-unify-adjustments-for-dynamic-formats.patch deals with
the changes required for dynamic formats. Because commit 197f067 moved
the expression out of the format name and into the algorithm section,
it was easy to even convert the md5_gen(n) format names from older
jumbo versions (I tested 1.7.6-jumbo-12) to dynamic_n.

0003-minor-relbench-improvements-and-adjustments.patch contains some
relbench adjustments, also described in the commit message.

I suppressed some "Could not parse" errors:
-ignore "Benchmarking: .*\.\.\. FAILED"
-older john versions had "All \d+ tests have passed self-tests", now
 it is "All \d+ formats passed self-tests" or "\d+ out of \d+ tests
 have FAILED"
 (If at least 1 test failed in one of the input files, I'll print that
 information before printing the "Only in File [1|2]: ..." information.)

Instead of printing the individual benchmarks (those that exist only in
1 file, or the ratio in case -v has been used) in "random" order, they
are now sorted.

The previous relbench version suggested running benchmark-unify even if
there was no way this would increase the number of matching benchmarks
(e.g., the old john version just had a "Raw" benchmark for a particular
format, while the new version has "Many salts" and "Only one salt" for
the same format). This has now been fixed.

Format names that changed between unstable and bleeding have not yet
been addressed. The main reasons for not dealing with this right now:
-there are still several formats in unused (which should really be in
 a separate subdirectory, e.g., broken) because they fail self-test
-openssl-enc still fails self-test (promiscuous valid ($openssl$))
-some formats might still need to be renamed; I'd suggest renaming
 "Office ..." to "MS Office ...", "M$ Cache Hash (DCC)" to
 "MS Cache Hash (DCC)", and "M$ Cache Hash 2 (DCC2)" to
 "MS Cache Hash 2 (DCC2)"

Finally, I don't know yet how to handle this change: in addition to
the format name, the format label is now included in the --test
output, at least if format label and format name differ.
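For illustration, the new output has this shape (a made-up sample
line; the algorithm string in brackets is elided here, only the
"<label>, <name>" part in front of it matters):

Benchmarking: Fortigate, FortiOS [...]... DONE

Older versions printed only the format name in that position.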
This change has 2 major consequences:
-while relbench used to recognize different implementations of the
 same hash algorithm in the same input file (and picked the fastest
 one for comparison), this no longer works
-almost all format names of older john --test runs would need to be
 mapped to the new output (<label>, <name>); this is neither an
 elegant solution, nor does it work for formats with more than one
 implementation

I was experimenting with another solution (this patch just checks an
environment variable instead of introducing a new benchmark-unify
option, but I can provide a patch which introduces a --drop-labels
option):

-----------------------------------------------------------------
diff --git a/run/benchmark-unify b/run/benchmark-unify
index febd4ea..d266b38 100755
--- a/run/benchmark-unify
+++ b/run/benchmark-unify
@@ -56,6 +56,9 @@ sub parse
 		if (defined($renamed{$name})) {
 			$name = $renamed{$name};
 		}
+		if ($drop_labels == 1) {
+			$name =~ s/^[^\s]*, (.*)/$1/;
+		}
 		print "Benchmarking: $name $end\n";
 	}
 	else {
@@ -65,6 +68,15 @@
 
 $_ = '';
 
+# For now, just an environment variable, I might add a
+# ./benchmark-unify option --drop-labels[=0|=1] later
+if(defined $ENV{'JOHN_BENCHMARK_UNIFY_DROP_LABELS'}) {
+	$drop_labels = 1;
+}
+else {
+	$drop_labels = 0;
+}
+
 while(<DATA>) {
 	chomp;
 	($old_format, $new_format) = /^(.*) (.*)$/;
-----------------------------------------------------------------

If JOHN_BENCHMARK_UNIFY_DROP_LABELS is defined, e.g., by using

$ JOHN_BENCHMARK_UNIFY_DROP_LABELS=1 ./benchmark-unify ...

then everything that looks like a format label will be dropped (see
the small standalone sketch further below).

This will increase the number of benchmarks in bleeding that can be
compared with benchmarks in unstable, but it has unwanted side effects:
-the user would need to run benchmark-unify even on the --test output
 of the latest john version (so far I always aimed at converting older
 output to the newest, and keeping the newest output unchanged)
-in some cases, the format name will become (completely) meaningless
 and/or ambiguous:

Fortigate, FortiOS  ->  FortiOS
MSCHAPv2, C/R  ->  C/R
mschapv2-naive, MSCHAPv2 C/R  ->  MSCHAPv2 C/R
(so, these two implementations aren't recognized as the same hash
format, and just "C/R" is completely meaningless)
OpenVMS, Purdy  ->  Purdy
WoWSRP, Battlenet  ->  Battlenet
Clipperz, SRP  ->  SRP
Drupal7, $S$ (x16385)  ->  $S$ (x16385)
IKE, PSK  ->  PSK
MongoDB, system / network  ->  system / network
Office, 2007/2010 (SHA-1) / 2013 (SHA-512), with AES  ->  2007/2010 (SHA-1) / 2013 (SHA-512), with AES
PBKDF2-HMAC-SHA256, rounds=12000  ->  rounds=12000
PBKDF2-HMAC-SHA512, GRUB2 / OS X 10.8  ->  GRUB2 / OS X 10.8
PST, custom CRC-32  ->  custom CRC-32
RAdmin, v2.x  ->  v2.x
LastPass, sniffed sessions  ->  sniffed sessions
STRIP, Password Manager  ->  Password Manager
Raw-SHA, "SHA-0"  ->  "SHA-0"
Raw-SHA1-ng, (pwlen <= 15)  ->  (pwlen <= 15)
Mozilla, key3.db  ->  key3.db

Some of this has been discussed in the past:
http://thread.gmane.org/gmane.comp.security.openwall.john.devel/9522/focus=9688
http://www.openwall.com/lists/john-dev/2013/08/19/
But so far, no solution has been found.

I think the best option is to (optionally) remove the format labels in
benchmark-unify, even if this requires the user to run benchmark-unify
on the newest john --test output (which wasn't necessary for older
john versions), and to adjust the few format names which become
meaningless if we do so.

IMHO, we should try to have the same format name output (possibly
after removing the format label) for different implementations of the
same algorithm. That includes OpenCL and CUDA implementations.
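To make the label-dropping substitution easy to try out, here is the
small standalone sketch mentioned above (not part of any of the
attached patches; the sample names are taken from the list above):

-----------------------------------------------------------------
#!/usr/bin/perl
use strict;
use warnings;

# Apply the same substitution as the experimental patch above to a
# few of the new "<label>, <name>" format names:
foreach my $name ('Fortigate, FortiOS',
		  'MSCHAPv2, C/R',
		  'mschapv2-naive, MSCHAPv2 C/R') {
	# Copy the name, then drop the leading "<label>, " part:
	(my $stripped = $name) =~ s/^[^\s]*, (.*)/$1/;
	print "$name  ->  $stripped\n";
}
-----------------------------------------------------------------

For "MSCHAPv2, C/R" this prints just "C/R", which is exactly the
ambiguity described above.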
If someone wants to compare the performance of CPU formats and GPU
formats separately, this can be done by running

./john --test --format=cpu
./john --test --format=dynamic
./john --test --format=gpu
./john --test --format=opencl
./john --test --format=cuda

We need to come up with a solution before we can release bleeding as
the next john-1.8 jumbo version.

Frank

View attachment "0001-benchmark-unify-sort-format-name-mappings.patch" of type "text/x-patch" (2882 bytes)
View attachment "0002-benchmark-unify-adjustments-for-dynamic-formats.patch" of type "text/x-patch" (1698 bytes)
View attachment "0003-minor-relbench-improvements-and-adjustments.patch" of type "text/x-patch" (5863 bytes)