john-users - Creating Graphs from john.log

Message-ID: <CAJ9ii1G4C4gPBP_dTQw_5AT6TtTh6xewZ-iMewL10-pJD351Wg@mail.gmail.com>
Date: Wed, 19 Dec 2012 23:34:24 -0500
From: Matt Weir <cweir@...edu>
To: john-users@...ts.openwall.com
Subject: Creating Graphs from john.log

It really helps to be able to graph and compare the effectiveness of
various cracking techniques. For example, being able to evaluate the
rulesets Simon created vs the default JtR single mode may help us
create a better default ruleset. (As a sidenote, I can't wait until
the videos of Simon's talks are online. I highly recommend checking
out the other talks from the Passwords^12 conference as well.)

In the past I've used a custom "checker" program to track how many
passwords were cracked vs the number of guesses made. The problem is,
it's really slow. It would be a huge help to use JtR's logs instead.
Unfortunately I've run into a couple of problems doing that. Let me
start with where I'm at, and then move on to what I'd like to do.

Right now, John's "dummy" format is very nice. As a comparison, I get
42726K c/s real, 42726K c/s virtual for the dummy format on my
MacBook, vs 14129K c/s real, 14129K c/s virtual when I use raw-md5.
Here is a script that I use to translate my target passwords to the
dummy format:

od -A n -t x1 -v input.txt | tr -s ' \n' '\n\n' | awk '
$0 == "0a" { if (h != "") print ++i ":$dummy$" h; h = ""; next }
{ h = h $0 }
END { if (h != "") print ++i ":$dummy$" h }' > dummy_fmt.txt

I'm sure there are more elegant ways to do the conversion, but it seems
to work ok.

I'm running into two problems when using JtR's logs though:

1) I'd really like to output the number of guesses that had been
generated when a password is cracked. Right now it outputs the time
instead. While you can get a rough idea of the number of guesses based
on the time, it creates a lot of difficulties when sharing/comparing
data with other people. For example, if I have a crazy fast computer
and someone else has an old 486, they might have a better ruleset but
when I compare it to my dumbforce run I wouldn't know it. Number of
guesses made is a platform-agnostic measurement.

2) I need to be able to count duplicate passwords. This is a bit of a
contentious point, but when modeling a password cracking session I
strongly believe we need to be able to represent that some passwords
are much more common than others. An attacker should be rewarded for
guessing '123456' first and I want to be able to model that. Right now
JtR (rightfully so) removes duplicate hashes for performance
reasons. It would be nice to be able to set a flag in john.conf so
that duplicate hashes are not removed.
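
Until something like this exists in the log, the two requests above
can be approximated outside of JtR. The following is only a sketch,
not anything JtR provides: guess_curve is a hypothetical helper name,
and it assumes the targets are available as a plaintext list (one
password per line, duplicates kept) and that the candidate guesses
arrive on stdin in the order they would be tried. It prints the guess
number and the running total of cracked passwords, crediting a
password that appears N times in the target list as N cracks.

```shell
# guess_curve TARGETS  (hypothetical helper, not part of JtR)
# Reads candidate guesses on stdin, one per line, in the order tried.
# Prints "guess_number cumulative_cracked" whenever a guess matches.
# TARGETS is a plaintext password list; duplicates are counted, so a
# password that appears N times is worth N cracks.
guess_curve() {
    awk -v targets="$1" '
        BEGIN {
            # Load the target list and count how often each password occurs.
            while ((getline line < targets) > 0)
                if (line != "") count[line]++
            close(targets)
        }
        {
            guesses++
            if ($0 in count) {
                cracked += count[$0]
                delete count[$0]    # credit each distinct password only once
                print guesses, cracked
            }
        }
    '
}
```

One possible way to feed it is from John's --stdout mode, for example
"john --wordlist=words.lst --rules --stdout | guess_curve input.txt >
curve.dat" (words.lst and curve.dat are placeholder names). The two
columns can then be plotted directly. Note that this still compares
every guess as plain text, so it shares the speed limits of an
external checker.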

Of course there are a million other things that would be nice (such as
having an output that was Excel/Gnuplot ready/friendly), but the above
two are my biggest requests. If they are already available I'd really
appreciate it if you could remedy my ignorance. Also, if implementing
them would cause slowdown for normal use please don't do it. This is
just something that I think would be nice to have as it would make it
easier to develop better rulesets.
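
As a stopgap for the Excel/Gnuplot-friendly output, the time-based
data that john.log already contains can be reduced to two columns.
This is a rough sketch under one assumption that should be checked
against your own log: that each crack is logged as a line of the form
"D:HH:MM:SS + Cracked NAME". The helper name log_curve is made up for
this example. Lines that do not match that shape are skipped.

```shell
# log_curve LOGFILE  (hypothetical helper, not part of JtR)
# Prints "elapsed_seconds cumulative_cracked" for each log line that
# looks like "D:HH:MM:SS + Cracked NAME". That line format is an
# assumption; verify it against your own john.log before relying on it.
log_curve() {
    awk '
        $2 == "+" && $3 == "Cracked" {
            # Split the timestamp into days, hours, minutes, seconds.
            if (split($1, t, ":") != 4) next
            secs = ((t[1] * 24 + t[2]) * 60 + t[3]) * 60 + t[4]
            print secs, ++cracked
        }
    ' "$1"
}
```

Running "log_curve john.log > time_curve.dat" gives whitespace-
separated columns that a spreadsheet or Gnuplot can read as is. It
only gives cracks over time, so it does not solve the platform
dependence described in 1), and duplicates are still collapsed as
described in 2).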

Thanks,
Matt