john-users - Re: Rules for realistic words

Message-ID: <20111231183046.GA17012@openwall.com>
Date: Sat, 31 Dec 2011 22:30:46 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Rules for realistic words

On Sat, Dec 31, 2011 at 02:17:55PM +0000, Alex Sicamiotis wrote:
> Currently it's something like
> 
> 1) single
> 2) dictionary
> 3) dictionary with rules

Steps 2 and 3 together form a single "pass 2" when you run John with no
options.  The first rule is normally ":", which means to try each word
as-is.
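
To illustrate why the plain wordlist and the wordlist with rules end up
in one pass, here is a toy sketch in Python.  It is not John's rule
engine: it handles only four rule commands (":" no-op, "l" lowercase,
"u" uppercase, "c" capitalize), and the names RULES and
wordlist_with_rules are made up for this example.  It assumes each rule
is applied to the whole wordlist before the next rule is tried.

```python
# Toy subset of wordlist rule commands; an illustration only.
RULES = {
    ":": lambda w: w,      # no-op: try the word as-is
    "l": str.lower,        # convert to lowercase
    "u": str.upper,        # convert to uppercase
    "c": str.capitalize,   # capitalize first letter, lowercase the rest
}

def wordlist_with_rules(words, rules):
    """Yield candidates rule by rule (assumed rule-major order)."""
    for rule in rules:
        transform = RULES[rule]
        for word in words:
            yield transform(word)

words = ["password", "Letmein"]
out = list(wordlist_with_rules(words, [":", "c", "u"]))
print(out)
```

Because ":" comes first, the unmodified words are tried before any
mangled variants, so a separate "plain dictionary" step is not needed.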

> 4) incremental with digits, Alpha, Lanman, All from lower characters to more characters.

This is normally "pass 3", and it normally uses all.chr (except that for
LM hashes it uses lanman.chr).  There's normally little or no need to
start with the more limited .chr files.

> Now for the 26 letters of Alpha, it goes like 26x26x26x26x26x26x26x26 = 208.8 billion combos
> For the Alpha+Digits it goes 36x36x36x36x36x36x36x36 = 2.82 trillion combos

Yes, this is one reason why you normally don't want to run these .chr
files and want to simply use all.chr, which tries the different kinds of
passwords in a semi-optimal order.
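
The arithmetic quoted above can be checked with a few lines of Python.
This is only a sketch of the keyspace sizes at a fixed length of 8; the
95-character row is an added comparison that assumes the printable
ASCII set, and the helper name keyspace is made up for this example.

```python
def keyspace(charset_size: int, length: int) -> int:
    """Number of distinct strings of exactly `length` characters."""
    return charset_size ** length

alpha = keyspace(26, 8)       # lowercase letters only
alnum = keyspace(36, 8)       # letters plus digits
printable = keyspace(95, 8)   # assumption: 95 printable ASCII characters

print(f"alpha:     {alpha:,}")
print(f"alnum:     {alnum:,}")
print(f"printable: {printable:,}")
print(f"alnum / alpha ratio: {alnum / alpha:.1f}")
```

Exhausting the smaller sets first would spend a large fixed amount of
work before any candidate with a character outside the set is tried at
all.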

> What if there were intermediate character sets of frequently used letters as an intermediate step between dictionaries with rules and incremental with full character sets?

Incremental mode already focuses on frequently used characters (in fact,
even on trigraphs).

> For example the top 16 letters and 4 numbers = 20 characters in total.

Incremental mode gradually increases the number of different character
indices being tested.  It starts with just one, then two, and so on -
and it mixes in length switches as well, also according to statistics
(for passwords that were used to generate the .chr file).
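
A much simplified sketch of the "one index, then two, and so on" idea
follows.  It is not John's actual algorithm: real incremental mode
orders characters per position and per preceding characters, and it
interleaves lengths according to statistics.  Here the character order
is a single fixed list, lengths are simply looped over, and the function
name toy_incremental is hypothetical.

```python
from itertools import product

def toy_incremental(chars_by_freq, max_len):
    """Yield candidates while widening the set of allowed character ranks.

    chars_by_freq: characters sorted most frequent first (made-up input).
    At step k only the k most frequent characters are allowed, and each
    candidate must contain the k-th one, so nothing is yielded twice.
    """
    for count in range(1, len(chars_by_freq) + 1):
        allowed = chars_by_freq[:count]
        newest = chars_by_freq[count - 1]
        for length in range(1, max_len + 1):
            for combo in product(allowed, repeat=length):
                if newest in combo:
                    yield "".join(combo)

candidates = list(toy_incremental("ae1", max_len=2))
print(candidates)
```

Candidates built only from the most frequent characters come out first,
yet the full keyspace is still covered eventually, which is the property
a fixed "top 20 characters" set lacks.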

> I think incremental mode already applies some sort of "more frequent" type of cracking, but I don't know how optimized it is in relation to this. If it already covers this sector, ignore this comment.

Yes, this is pretty much the case.  What it does is better than what
you describe.

> Another aspect that can take improvement, (not in cracking speed, but in cracking the easier ones out) is to emulate how language is constructed. For example greek & italian languages, use a lot of alternation between consonant and vowels. This means that you can have a rule which goes like this:

Incremental mode takes care of that by its use of trigraph frequencies.
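
As a toy illustration of how trigraph statistics can capture
vowel-consonant alternation without any explicit rule, consider the
Python sketch below.  It is not how .chr files are built or used; the
training words are made up, and the helper names trigraph_counts and
score are hypothetical.

```python
from collections import Counter

def trigraph_counts(words):
    """Count every run of three consecutive characters in the words."""
    counts = Counter()
    for w in words:
        for i in range(len(w) - 2):
            counts[w[i:i + 3]] += 1
    return counts

def score(word, counts):
    """Sum of training counts over the word's trigraphs (higher = more typical)."""
    return sum(counts[word[i:i + 3]] for i in range(len(word) - 2))

training = ["banana", "canada", "panama", "tomato", "salami"]  # made-up sample
counts = trigraph_counts(training)

alternating = score("manana", counts)   # consonant-vowel alternation
clustered = score("mnnaaa", counts)     # same letters, no alternation
print(alternating, clustered)
```

A word that follows the alternation seen in the training data scores
higher than a rearrangement of the same letters, even though no
"consonant, then vowel" rule was ever written down.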

For my opinion on vowel-consonant patterns specifically, see:

http://www.openwall.com/lists/john-users/2010/08/17/2

If you like, also see other messages in that thread by clicking
thread-prev and thread-next.

> By splicing words in human-like syllables, I achieved a hefty increase in effective cracking speed.

Really?  What exactly did you compare?  Did you possibly feed knowledge
of already cracked passwords into your patterns - that is, is your test
in-sample or out-of-sample?  If your test was an in-sample one, a
semi-fair comparison would be against a .chr file similarly generated
from your previously cracked passwords.  And I say "semi-" because the
incremental mode was optimized for the out-of-sample case; it would be
easy to achieve better effective performance for in-sample tests, but
those have no practical relevance.
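
The difference between the two kinds of test can be sketched in Python
as follows.  Everything here is made up for illustration: the password
lists, the crude fixed-size "syllable" splitting, and the helper names
syllables and crack_rate.  It is not a benchmark of any real method.

```python
from itertools import product

def syllables(words, size=2):
    """Crude stand-in for syllable splitting: fixed-size chunks."""
    return sorted({w[i:i + size] for w in words for i in range(0, len(w), size)})

def crack_rate(candidates, targets):
    """Fraction of target passwords present among the candidates."""
    return sum(t in candidates for t in targets) / len(targets)

known = ["marina", "nicola", "pepino", "salami"]     # already cracked
held_out = ["manila", "tomato", "kimono", "verona"]  # never seen by the patterns

parts = syllables(known)
candidates = {"".join(p) for p in product(parts, repeat=3)}

in_sample = crack_rate(candidates, known)         # tested on the training data
out_of_sample = crack_rate(candidates, held_out)  # tested on unseen data
print(in_sample, out_of_sample)
```

Patterns derived from the known passwords recover all of them by
construction, so only the held-out figure says anything about how the
method would do on passwords not yet cracked.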

Alexander