Message-ID: <002001cd4574$1273bc30$375b3490$@net>
Date: Fri, 8 Jun 2012 07:41:48 -0500
From: "jfoug" <jfoug@....net>
To: <john-users@...ts.openwall.com>
Subject: RE: JtR to process the LinkedIn hash dump

Pre-warning for most users: this is a long and somewhat technical under-the-hood post.

>See this old thread about dupes recognition as well:
>
>http://thread.gmane.org/gmane.comp.security.openwall.john.user/50/focus=60
>
>The logic is still the same as 7 years ago (except that jumbo got the
>--field-separator-char=C option):
>
>In cracker.c, you'll find
>
>  log_guess(crk_db->options->flags & DB_LOGIN ? replogin : "?",
>    dupe ? NULL : pw->source, repkey, key, crk_db->options->field_sep_char);
>
>Removing "dupe ? NULL :" will cause john to write duplicate hash
>representations to the pot file.
>
>Removing the dupes recognition logic from john should only be a
>temporary solution for cracking these LinkedIn hashes.

Yes, I saw that. However, it also causes all sorts of other issues with that version of JtR, which is why I chose to report to the users group what the behavior is and HOW to properly work around it. I would much rather do that than produce a very problematic version of JtR.

It could be handled in several ways:

1. Removing the "dupe ? NULL :", as you mentioned.
2. The workaround (re-run the found passwords again), as I mentioned.
3. Turning the format into a salted format with 2 salts: one for the 00000's and one for the others.

Option 1 makes a buggy JtR. Option 2 requires a little hand work and a few seconds of re-run, but everything else works properly. Option 3 finds all values, except it runs twice as slow. I went with option #2.

>There should be a possible permanent solution for this "special" format:
>During prepare, or during split, always initialize the first 32 bits.
>
>Then, john should see only one hash, and write hashes with '00000' to
>the pot file.

I do not believe this will help at all. If you do get this to work, then you lose the non-00000 hashes altogether. The problem is that you really DO have duplicates in there. The loader code does NOT see the duplicates, because it is not purposely broken like the format (the format is "broken" in that it ignores the first 32 bits). Thus, when you run JtR and it finds a dupe, it will write the first one to the pot file and then ignore any others (but it lists the password to the screen and to the log file).

Then, upon loading from the pot file, a different dupe detection logic is used. Within the loader, the actual strings are compared (after prepare() and split()). Thus, within the loader, whichever hash was stored in the .pot file WILL be removed from the current run. However, if there were others which are duplicates BUT have a different full hash signature, those will be left intact. So, if you re-run against the already found passwords, these duplicates WILL now be properly written to the .pot file, and removed. Pretty simple solution.

>The other solution is similar, but involves your new get_source.
>It should be trivial to restore the correct sha1 hex string, even if you
>loaded the one with '00000'.

It does load this. However, this code is NOT called from within the dupe detection logic in cracker. NOTE, this 'could' be an issue that needs to be addressed. I believe the logic there simply works on the binary hash, or uses compare_one(). Now, re-reading your comments above (and below), I do see what you are saying, but I have to see just how this would properly integrate. This is like prepare, but in reverse.
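To make the cracker-versus-loader mismatch described above concrete, here is a minimal stand-alone sketch. It is illustrative only, not actual JtR code: the real format compares decoded binary hashes (not hex strings), the real loader compares prepared/split ciphertext lines, and the function names here are made up. The 8-hex-digit skip reflects the "ignores the first 32 bits" behavior mentioned above, and the two hashes are the example pair given further below.

/*
 * Illustration only (NOT actual JtR source): why the cracker's dupe
 * detection and the loader's dupe detection disagree on a masked
 * LinkedIn hash and its real counterpart.
 */
#include <stdio.h>
#include <string.h>

#define IGNORED_HEX 8  /* first 32 bits = 8 hex digits ignored by the format */

/* format-style view: only the bits the format actually tests */
static int format_sees_dupe(const char *a, const char *b)
{
	return !strcmp(a + IGNORED_HEX, b + IGNORED_HEX);
}

/* loader-style view: the whole ciphertext string must match */
static int loader_sees_dupe(const char *a, const char *b)
{
	return !strcmp(a, b);
}

int main(void)
{
	const char *masked    = "0000003ced2802e237e597f6a9d14e963206d6c3";
	const char *real_hash = "122b603ced2802e237e597f6a9d14e963206d6c3";

	printf("format sees a dupe: %s\n",
	       format_sees_dupe(masked, real_hash) ? "yes" : "no");  /* yes */
	printf("loader sees a dupe: %s\n",
	       loader_sees_dupe(masked, real_hash) ? "yes" : "no");  /* no */
	return 0;
}

Compiled and run, it prints "yes" for the format view and "no" for the loader view, which is exactly the situation that leaves one of the two input lines behind after the first run.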
Prepare is used to 'fixup' the ciphertext prior to JtR running (split also, but it has no information about GECOS, which is often needed). This would be fixing up the ciphertext on the back side, KNOWING the missing bits. I do not think that get_source() would work for this, because it gets used for more than just writing the .pot file lines. I would need to see whether it can be used for this or not. It also depends upon how it interacts with the loader code.

For the listed example, we have these input hashes:

0000003ced2802e237e597f6a9d14e963206d6c3
122b603ced2802e237e597f6a9d14e963206d6c3

and this password:

camp1985hill

I really think that having JtR list 2 cracks while writing 1 line to the .pot file:

$dynamic_26$122b603ced2802e237e597f6a9d14e963206d6c3:camp1985hill

and then, when JtR later re-runs, stripping both of the original hashes out of there (using raw-sha1_LI), is probably about the best behavior we can get for these hashes. I will work to see if I can get JtR working with that behavior.

A question, however: what if only this hash was in the input file?

0000003ced2802e237e597f6a9d14e963206d6c3

It will still be saved as

$dynamic_26$122b603ced2802e237e597f6a9d14e963206d6c3:camp1985hill

in the .pot file, even though there is no directly represented hash of that value in the input. The hash IS proper raw-sha1 (where the one with 00000 is not). Again, if re-run, JtR would not re-load the single hash; it would be removed by the .pot detection against what was previously written to the .pot file.

>This could make the cracked LinkedIn hashes stored in john.pot valid for
>the normal raw-sha1 format.
>(Assuming that the '00000' don't cause false positives, but that should
>be a valid assumption.)

With several million candidates, and only 20 bits of 0's being overlaid, I would expect several 'real' 00000's to be in the data. However, with only 20 bits of 0's and 140 bits of 'valid' data (of which I look at 128 bits), it is still a 1 in 2^128 (or is it 2^64?) chance that we will find a false positive. I do not think that is an issue at all. It should be the same chance (or less) as a false positive for an MD5 hash.

One 'other' way to fix this issue is to simply write a pre-processor that drops any duplicates from the original input list: keep the real hash, and dump the 00000-smashed value that is otherwise the same (a rough sketch of such a pre-processor follows below).

Jim.
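A minimal sketch of that pre-processor idea (again illustrative only, not JtR code). It assumes one 40-character hex SHA-1 per line on stdin, and that a masked entry has exactly the leading '00000' zeroed out; for each masked/real pair it keeps the real hash and writes the de-duplicated list to stdout.

/*
 * Illustration only: drop the "00000"-smashed duplicate whenever the
 * real hash is also present in the input list.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 40
#define MASK_LEN 5              /* the "00000" prefix zeroed in the dump */

typedef char hexhash[HASH_LEN + 1];

/* sort on everything after the masked prefix, so a masked hash lands
 * next to its unmasked counterpart */
static int cmp_tail(const void *a, const void *b)
{
	return strcmp((const char *)a + MASK_LEN, (const char *)b + MASK_LEN);
}

int main(void)
{
	hexhash *h = NULL;
	size_t n = 0, cap = 0, i, j;
	char line[256];

	while (fgets(line, sizeof(line), stdin)) {
		if (strlen(line) < HASH_LEN)
			continue;                       /* skip junk lines */
		if (n == cap) {
			cap = cap ? cap * 2 : 1024;
			h = realloc(h, cap * sizeof(*h));
			if (!h) {
				perror("realloc");
				return 1;
			}
		}
		memcpy(h[n], line, HASH_LEN);
		h[n][HASH_LEN] = 0;
		n++;
	}

	qsort(h, n, sizeof(*h), cmp_tail);

	/* within each group sharing the same tail, print the real hash if
	 * one exists, otherwise print the masked one (exact duplicate
	 * lines collapse too; distinct real hashes sharing a 35-hex-char
	 * tail are astronomically unlikely) */
	for (i = 0; i < n; i = j) {
		size_t keep = i;
		for (j = i; j < n && !cmp_tail(h[i], h[j]); j++)
			if (strncmp(h[j], "00000", MASK_LEN) != 0)
				keep = j;               /* prefer the unmasked form */
		puts(h[keep]);
	}

	free(h);
	return 0;
}

Saved as, say, dedupe.c (a name made up here), something along the lines of "cc dedupe.c -o dedupe && ./dedupe < linkedin.txt > linkedin-dedup.txt" run before loading the list would sidestep the dupe question for masked/real pairs, at the cost of the masked representations no longer appearing in the input file.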