john-users - Re: KoreLogic Contest.. first set of stats/rules

Message-ID: <20100802034332.GA25802@openwall.com>
Date: Mon, 2 Aug 2010 07:43:32 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: KoreLogic Contest.. first set of stats/rules

First of all, Minga (and others at KoreLogic) - thank you for the contest!

On Sun, Aug 01, 2010 at 07:32:07PM -0500, Minga Minga wrote:
> We have released the first set of stats/information from the contest:
> 
> How the passwords were created:
> http://contest.korelogic.com/how_passes_created.html
> 
> Here are the "answers":
> http://contest.korelogic.com/plaintexts.html

Thank you for publishing this.  I was indeed curious about some of it.

> Most relevant - how they were created.
> http://contest.korelogic.com/rules.html
> Those are the john the ripper rules used to CREATE the contest.

Great.  This is a bit disappointing to me, though, in two ways:

1. Didn't you promise to release a ruleset for actual use when attacking
real-world passwords?  So far, you've only released these snippets that
you used to generate the passwords.  Only a subset of them are candidates
for inclusion in an actual-use ruleset, and this needs more work (as you
correctly propose).

2. Your use of the preprocessor is minimal.  The ruleset can be made a
lot shorter by more extensive use of the preprocessor.  OK, maybe this
tells me that this stuff is hard to grasp, although I don't see how I
could have made the syntax significantly simpler or the documentation
significantly better (more examples? but there are plenty in the default
john.conf's ruleset).  Any suggestions are welcome.

For example, the 90-line KoreLogicRulesReplaceNumbers ruleset:

[List.Rules:KoreLogicRulesReplaceNumbers]
/0s01
/0s02
/0s03
...
/9s96
/9s97
/9s98

can be shortened to 10 lines without any efficiency loss (and no
readability loss either):

/0 s0[1-9]
/1 s1[02-9]
/2 s2[013-9]
/3 s3[0-24-9]
/4 s4[0-35-9]
/5 s5[0-46-9]
/6 s6[0-57-9]
/7 s7[0-68-9]
/8 s8[0-79]
/9 s9[0-8]

In fact, it can be shortened even further to one line:

/[0-9] s\0[0-9] Q

but then there's slight efficiency loss (some candidate passwords are
almost generated, then rejected with the "Q" command).  (Maybe I need to
enhance JtR with detection and dropping of effective-no-op and
always-reject rules, then it'd be OK to "inadvertently" produce them
with the preprocessor.)

I've verified that both of my proposed replacements produce the exact
same results as yours (same MD5 hash of output on a test wordlist).
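
A rough way to double-check this kind of equivalence outside of JtR is a
small Python model of the three commands involved. This is only a sketch
under assumptions, not JtR's code: it assumes "/X" rejects words that do not
contain X, "sXY" replaces every X with Y, "Q" rejects unchanged words, and
every rule is applied to the whole wordlist before the next rule is tried.
The helper names and the sample words are made up for illustration.

```python
from itertools import product

DIGITS = "0123456789"

def expand_class(body):
    # Expand a class body such as "0-24-9" into "012456789".
    out = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            out.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            out.append(body[i])
            i += 1
    return "".join(out)

def apply_rule(word, find, repl, require_change=False):
    # "/X": reject the word unless it contains X.
    if find not in word:
        return None
    # "sXY": replace every X with Y.
    out = word.replace(find, repl)
    # "Q" (optional): reject the word unless it has changed.
    if require_change and out == word:
        return None
    return out

def run(rules, words, require_change=False):
    # Assumed order: each rule is applied to all words in turn.
    results = []
    for find, repl in rules:
        for word in words:
            cand = apply_rule(word, find, repl, require_change)
            if cand is not None:
                results.append(cand)
    return results

# The 90 explicit rules: every digit replaced by every other digit.
explicit = [(x, y) for x in DIGITS for y in DIGITS if x != y]

# The 10-line version, using the class bodies from the rules above.
CLASSES = ["1-9", "02-9", "013-9", "0-24-9", "0-35-9",
           "0-46-9", "0-57-9", "0-68-9", "0-79", "0-8"]
ten_line = [(str(d), y) for d, body in enumerate(CLASSES)
            for y in expand_class(body)]

# The one-line version: 100 rules, 10 of which are no-ops dropped by "Q".
one_line = list(product(DIGITS, DIGITS))

words = ["pass1", "2010aug", "letmein", "007"]  # made-up test words
same = (run(explicit, words) == run(ten_line, words)
        == run(one_line, words, require_change=True))
print(len(explicit), len(ten_line), len(one_line), same)  # 90 90 100 True
```

The 10 extra rules in the one-line variant are where the slight efficiency
loss comes from: each is evaluated and then thrown away.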

Another example of preprocessor non-use (note: these lines start with
the greater-than character, which I think many mail readers may display
improperly since this syntax is typically used for quoting in e-mail):

[List.Rules:KoreLogicRulesAdd1234_Everywhere]
Az"1234"
>0A[0]"1234"
>1A[1]"1234"
>2A[2]"1234"
>3A[3]"1234"
>4A[4]"1234"
>5A[5]"1234"
>6A[6]"1234"
>7A[7]"1234"
>8A[8]"1234"
>9A[9]"1234"

This is literally the same as:

Az"1234"
>[0-9]A\0"1234"

Also, both produce some duplicates.  This is solved with:

<1Az"1234"
>[0-9]A\0"1234"

which generates all the same candidate passwords but no duplicates.
This highlights another problem: the first one of these two lines is
only useful when applied to an empty string, thereby producing the
candidate password "1234".  Probably not what you meant (better to have
"1234" in your wordlist).  Thus, we further reduce this to one line:

>[0-9]A\0"1234"
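
To make the expansion of such a line concrete, here is a minimal Python
sketch of what the preprocessor does with a single range and a "\0"
back-reference. It is a deliberately simplified model, not JtR's
preprocessor: the function name expand is hypothetical, and it handles only
one "[x-y]" range per line (no multiple lists, no "\p" or "\r").

```python
import re

def expand(rule):
    # Find the first simple range of the form [x-y].
    m = re.search(r"\[(.)-(.)\]", rule)
    if m is None:
        return [rule]
    lo, hi = m.group(1), m.group(2)
    lines = []
    for code in range(ord(lo), ord(hi) + 1):
        ch = chr(code)
        # Substitute the current character for the range itself...
        line = rule[:m.start()] + ch + rule[m.end():]
        # ...and for every \0 back-reference to it.
        lines.append(line.replace("\\0", ch))
    return lines

for line in expand(r'>[0-9]A\0"1234"'):
    print(line)
```

This prints ten lines, from >0A0"1234" through >9A9"1234", which correspond
to the ten explicit ">" lines of the original ruleset (those wrote the
position as a one-character list, as in A[0], which expands to the same
thing).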

Your KoreLogicRulesReplaceLettersCaps, which is 26 lines:

# This is a lamer/faster version of --rules:nt
[List.Rules:KoreLogicRulesReplaceLettersCaps]
/asaA
/bsbB
...
/ysyY
/zszZ

can be replaced with one line (verified, exact same output):

/[a-z] s\0\p[A-Z]
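
The "\p" here makes the second list advance in step with the first one
instead of producing all 26x26 combinations. A small Python sketch of that
pairing, assuming this reading of "\p" (the function name parallel_rules is
made up, and str.replace stands in for the "s" command):

```python
import string

def parallel_rules(first, second):
    # Parallel lists are walked in lockstep, so they must be equally long.
    if len(first) != len(second):
        raise ValueError("parallel lists must have the same length")
    return [f"/{x} s{x}{y}" for x, y in zip(first, second)]

rules = parallel_rules(string.ascii_lowercase, string.ascii_uppercase)
print(len(rules), rules[0], rules[-1])  # 26 /a saA /z szZ

# Effect of the first rule on a word that contains "a":
print("banana".replace("a", "A"))  # bAnAnA
```

The generated lines differ from the original 26 only by the space between
the two commands.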

BTW, I like the idea behind this one (I mean your idea of toggling the
case of just one type of character at a time). :-)

Then, a block like:

/asa@[:c]
/asa4[:c]
/AsA4[:c]
/AsA@[:c]
/bsb8[:c]
/BsB8[:c]
/ese3[:c]
/EsE3[:c]

may be rewritten as one line:

/\r[aaAAbBeE] s\0\r\p[@44@8833] [:c]

These two are exactly the same.  And the single line can be extended to
include the rest of your rules in that block (46 lines), or this can be
split over, say, 4 lines for readability.

As to your comment about "[:c]" producing duplicates, yes, but it did
not produce as many duplicates in JtR's supplied ruleset because the
lines use "l" (lowercase) in there.  They do it primarily to avoid
having to generate separate rules for lower- and uppercase letters to
replace.  Then "c" is far more likely to produce a different string.
To avoid generating duplicates, you can use:

/\r[aaAAbBeE] s\0\r\p[@44@8833] [:M] \p[:c] \p[:Q]
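
To illustrate why the "M ... Q" form drops the duplicates, here is a small
Python model of the two variants for a single substitution (a -> @). It is a
sketch under assumptions: "c" is taken to behave like Python's
str.capitalize(), "M" to memorize the word as it is after the substitution,
and "Q" to reject the word if it equals the memorized one. The function
names are hypothetical.

```python
def substitute(word, find, repl):
    # "/X sXY": reject unless the word contains X, then replace every X.
    if find not in word:
        return None
    return word.replace(find, repl)

def with_colon_c(word, find, repl):
    # "... [:c]" expands to two rules: ":" (no-op) and "c" (capitalize).
    base = substitute(word, find, repl)
    if base is None:
        return []
    return [base, base.capitalize()]

def with_m_c_q(word, find, repl):
    # "... [:M] \p[:c] \p[:Q]" expands to ": : :" and "M c Q".
    base = substitute(word, find, repl)
    if base is None:
        return []
    out = [base]                    # the ": : :" rule
    capitalized = base.capitalize() # "M" remembers base, "c" capitalizes
    if capitalized != base:         # "Q" rejects an unchanged word
        out.append(capitalized)
    return out

print(with_colon_c("Pass", "a", "@"))  # ['P@ss', 'P@ss']
print(with_m_c_q("Pass", "a", "@"))    # ['P@ss']
print(with_m_c_q("pass", "a", "@"))    # ['p@ss', 'P@ss']
```

A word that is already capitalized yields the same candidate twice with
"[:c]" but only once with the "M ... Q" form.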

OK, that's more than enough for now.  Basically, this needs more work.

BTW, spaces and colons are OK to have/generate - they have no
performance cost in recent versions of JtR (are squeezed out early on).

> In the next few weeks, I'd like us, as a community, to try to improve these rules
> and hopefully incorporate them into everyone's rules. They represent years
> of password analysis of over 3 million passwords from environments
> that enforce password complexity.

Sounds great, but frankly as it relates to many of the "rule blocks", I
seriously doubt that they're effective "enough" on real-world passwords.
Even in this contest, even when we (our team) figured out many patterns
and could encode them into rules, we then only ran the "slower" ones of
the rules (those producing more combinations) against the fastest hashes
only - we did not run them against slow hashes, it was just not worth
the time.  That's even when we _knew_ we'd get a few hits.  In the real
world, we would not know.

That said, I agree that we should analyze these rules, identify those
that are generally useful, optimize them, and put them into some ruleset(s)
recommended for use.  Maybe produce two such rulesets: one for fast
hashes and/or short wordlists, the other for slow/salted hashes and/or
for large wordlists.

Thanks again,

Alexander