john-users - Re: password ranking

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20091207022407.GA20528@openwall.com>
Date: Mon, 7 Dec 2009 05:24:07 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: password ranking

On Mon, Dec 07, 2009 at 01:08:40AM +0100, Luke O'Connor wrote:
> What I was wondering was whether there is a document which describes the
> search strategy that is followed by JtR. I see from the documentation
> pages that many search options can be configured but I was hoping someone
> could give me a simple answer based on standard settings (if they exist).

Default settings and default data files do exist, but there can be no
answer that would be simple, correct, and complete at the same time.

> Imagine that I start JtR to search for a password like 8h2wt6ghw - expressed
> as windows hash for example - how many guesses will JtR make before the
> password will be found? Let's for the moment assume that JtR runs
> indefinitely.

This is either tricky or time-consuming to compute.  You could in fact
write a program that would compute this reasonably quickly for the
"incremental" mode (with given settings and .chr file), but such a
program does not readily exist (as far as I'm aware) and it is not a
trivial one to write.

> I think an interesting measure of password complexity would be some function
> JtR(Password) which returns the position of Password in the list of
> candidate passwords generated by JtR according to its search strategy.

JtR's "incremental" mode tries candidate passwords roughly in order of
decreasing estimated probability of each being "the" password.  (With
multiple hashes loaded for cracking this would have to be worded a bit
differently, but the main idea holds.)  The probability estimate is
roughly the product of conditional probabilities of the characters in
each character position in the candidate password.  The conditional
probabilities take two preceding characters into account.  In turn, the
conditional probabilities of each character are estimated as being equal
to frequencies of occurrences of the corresponding character triplet at
the corresponding position in passwords of the corresponding length as
seen in john.pot at the time the .chr file is generated.  Because of the
need to generate candidate passwords real fast and without consuming an
enormous amount of memory (such as for a sort buffer), the actual
ordering is slightly different.  Specifically, the .chr files do not
even encode the probability values, but they encode ordered lists of
characters and other parameters needed for the fast candidate password
generating algorithm instead.

Thus, instead of focusing on JtR's approximation, you could calculate
the product of estimated probabilities directly.  Well, you would not be
able to do that working from a .chr file (at best, you'd get the same
slightly distorted value that JtR "uses"), but you could do it from a
set of passwords (a john.pot file or the like) that you would otherwise
generate a .chr file from.  This could be easier to do, faster, and more
accurate than your proposed "JtR(Password)" function.  (Well, it would
not be more accurate if you're focusing on the difficulty of getting a
password cracked with JtR specifically, but it would be more accurate if
you are not focusing on JtR's specific attack.  It would continue to
assume an attack based on just frequencies of character sequences,
though, without other input information, whereas in practice a password
can be weak for other reasons.)

I must admit that probability of a candidate password is quite a
different metric than the number of other candidate passwords that would
be tried before reaching this one.  Yet either can be useful.

Then, in practice, with default settings, JtR actually runs through
"single crack" and wordlist modes, both with word mangling rules, before
it proceeds with "incremental".  You could want to simulate that, too -
and for this it is in fact practical to simply enumerate all candidate
passwords until yours is possibly hit.  For "single crack" mode, you'd
have to provide additional information along with the password (such as
the username), so in a sense the strength of the password depends on
what related information is available to a cracker.  This is quite true
for the real world.

> Is there any analysis along these lines, a document which describes how a
> given search strategy works its way through the set of all passwords (if
> it was able to run indefinitely)?

I am not aware of a decent and detailed description of JtR's "search
strategies", beyond various postings I made to this mailing list (much
like this one).  There are brief descriptions in the JtR documentation,
but those are meant for end-users.

However, there is a wiki page on the Markov mode, which is an unofficial
addition to JtR (found in the jumbo patch):

http://openwall.info/wiki/john/markov

As you can see, the Markov mode is quite similar to JtR's "incremental"
mode.  Perhaps the most obvious user-visible difference is that the
Markov mode is not intended to run indefinitely, until success, or until
being interrupted (it may also terminate on its own, yet fail to crack
the password), whereas the "incremental" mode is (assuming a large
enough charset and maximum length, resulting in a practically "infinite"
password space).

I hope this helps.

Alexander

P.S. It appears that you have inconsistent configuration of your mail
client software, where lines were first wrapped at something like 100
characters, but then wrapped for a second time at under 80 characters.
I corrected this manually for the quoting.  If you can, please correct
this on your end before you make your next posting (say, to wrap at 72
characters right away).  Thanks!
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.