Date: Fri, 8 Jun 2012 07:41:48 -0500
From: "jfoug" <jfoug@....net>
To: <john-users@...ts.openwall.com>
Subject: RE: JtR to process the LinkedIn hash dump

Pre-warning for most users: this is a long and somewhat technical, under-the-hood post.


>See this old thread about dupes recognition as well:
>
>http://thread.gmane.org/gmane.comp.security.openwall.john.user/50/focus=60
>
>The logic is still the same as 7 years ago (except that jumbo got the
>--field-separator-char=C option):
>
>In cracker.c, you'll find
>
>        log_guess(crk_db->options->flags & DB_LOGIN ? replogin : "?",
>                dupe ? NULL : pw->source, repkey, key,
>                crk_db->options->field_sep_char);
>
>
>Removing "dupe ? NULL :" will cause john to write duplicate hash
>representations to the pot file.
>
>Removing the dupes recognition logic from john should only be a
>temporary solution for cracking these LinkedIn hashes.

Yes, I saw that. However, that change also causes all sorts of other issues in
that version of JtR, which is why I chose to report the behavior to the users
group, along with HOW to properly work around the issue.  I would much rather
do that than produce a very problematic version of JtR.

It could be handled in several ways:

1. Removing the "dupe ? NULL :" as you mentioned.
2. Re-running the found passwords again, as I mentioned (the workaround).
3. Turning the format into a salted format with 2 salts: one for the 00000
   hashes, and one for the others.

Option 1 makes a buggy JtR.  Option 2 requires a little hand work and a few
seconds of re-running, but everything else works properly.  Option 3 finds all
values, but runs twice as slow.

I went with option #2.

>There should be a possible permanent solution for this "special" format:
>During prepare, or during split, always initialize the first 32 bits.
>
>Then, john should see only one hash, and write hashes with '00000' to
>the pot file.

I do not believe this will help at all.  If you do get it to work, then you
lose the non-00000 hashes altogether.

The problem is that you really DO have duplicates in there.  The loader code
does NOT see the duplicates, because it is not purposely broken like the
format (the format is 'broken' in that it ignores the first 32 bits).  Thus,
when you run JtR and it finds a dupe, it writes the first one to the pot
file, then ignores any others (but still lists the password to screen and to
the log file).  Then, upon loading from the pot file, a different
dupe-detection logic is used.  Within the loader, the actual strings are
compared (after prepare() and split()).  Thus, whichever hash was stored in
the .pot file WILL be removed from the current run.  However, any other
duplicates that have a different full hash signature will be left intact.
Thus, if you re-run against the already-found passwords, these duplicates
WILL now be properly written to the .pot file, and removed.  Pretty simple
solution.

>The other solution is similar, but involves you new get_source.
>It should be trivial to restore the correct sha1 hex string, even if you
>loaded the one with '00000'.

It does load this.  However, this code is NOT called from within the
dupe-detection logic in cracker.  NOTE: this 'could' be an issue that needs
to be addressed.  I believe the logic there simply works on the binary hash,
or uses compare_one().

Now, re-reading your comments above (and below), I do see what you are
saying, but I have to see just how this would properly integrate.  This is
like prepare(), but in reverse.  prepare() is used to 'fix up' the ciphertext
before JtR runs (split() also, but it has no information about GECOS, which
is often needed).  This would be fixing up the ciphertext on the back side,
KNOWING the missing bits.  I do not think that get_source() would work for
this, because it gets used for more than just writing the .pot file lines.  I
would need to see whether it can be used for this or not.  It also depends
upon how it interacts with the loader code.

For the listed example, we have these input hashes:

0000003ced2802e237e597f6a9d14e963206d6c3
122b603ced2802e237e597f6a9d14e963206d6c3
 
And this password: camp1985hill

I really think that having JtR list 2 cracks while writing 1 line to the .pot
file:

$dynamic_26$122b603ced2802e237e597f6a9d14e963206d6c3:camp1985hill

and then, when JtR re-runs, having it strip both of the original hashes out
(using raw-sha1_LI), is probably about the best behavior we can get for these
hashes.  I will work to see if I can get JtR working with that behavior.

A question, however: what if only this hash was in the input file?

0000003ced2802e237e597f6a9d14e963206d6c3

It will still be saved as 

$dynamic_26$122b603ced2802e237e597f6a9d14e963206d6c3:camp1985hill

in the .pot file, even though no input hash directly matches that value.  The
stored hash IS the proper raw-sha1 (where the one with 00000 is not).  Again,
on a re-run, JtR would not re-load the single hash; it would be removed by
.pot detection against what was previously written to the .pot file.

>This could make the cracked LinkedIn hashes stored in john.pot valid for
>the normal raw-sha1 format.
>(Assuming that the '00000' don't cause false positives, but that should
>be a valid assumption.)

With several million candidates, and only 20 bits of 0's being overlaid, I
would expect several 'real' 00000 prefixes to be in the data.  However, with
only 20 bits of 0's and 140 bits of 'valid' data (of which I compare 128
bits), there is still only about a 1 in 2^128 chance per candidate that we
will find a false positive.  I do not think that is an issue at all.  It
should be the same chance (or less) as a false positive for an MD5 hash.

One 'other' way to fix this issue is to simply write a pre-processor that
drops any duplicates from the original input list: keep the real hash, and
drop the 00000-smashed value that matches it.

Jim.

