Date: Tue, 12 May 2015 11:38:52 +0300
From: Aleksey Cherepanov <lyosha@...nwall.com>
To: john-users@...ts.openwall.com
Subject: raw-md5 vs raw-md5u, one hash with 2 different passwords

In 2012, Alexander Cherepanov noticed that a raw-md5u hash can be
cracked as raw-md5 in some cases: 2 spaces represent the dagger symbol
"(U+2020), which exists in windows code pages and, [he] thinks, can
easily be entered from the keyboard. If a unicode password consists of
only such symbols, then it can be found by trying various printable
ascii characters in a non-unicode way. But such cross-matches seem small
and exotic."
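
The observation above can be checked at the byte level. Here is a
minimal sketch in Python, assuming raw-md5u hashes the utf-16le form
of the password; ascii_alias() is a hypothetical helper written for
this illustration and is not part of john. It returns the printable
ascii string whose bytes equal the utf-16le encoding of a unicode
password, or None when there is no such string.

```python
# Sketch only: ascii_alias() is a hypothetical helper, not part of john.
# Assumption: the "unicode" form that raw-md5u hashes is utf-16le.

def ascii_alias(password):
    """Return the printable-ascii string with the same bytes as the
    utf-16le encoding of password, or None if there is none."""
    raw = password.encode("utf-16-le")
    # A cross-match needs every byte to be printable ascii (0x20..0x7e).
    if all(0x20 <= b <= 0x7E for b in raw):
        return raw.decode("ascii")
    return None

print(repr(ascii_alias("\u2020")))  # the dagger: bytes 20 20
print(repr(ascii_alias("\u4142")))  # a CJK character: bytes 42 41
print(repr(ascii_alias("a")))       # bytes 61 00: the NUL byte rules it out
```

Ordinary ascii passwords never qualify, because their utf-16le form
contains zero bytes. Only characters whose two bytes are both
printable can take part in such a cross-match.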


Here is a practical session with john in a utf-8 environment that
demonstrates the case:

$ printf '  ' | md5sum -
23b58def11b45727d3351702515f86af  -
$ cat dagger.pw
23b58def11b45727d3351702515f86af
$ john --pot=t.pot dagger.pw
[... a lot of suggestions about format ...]
Loaded 2 password hashes with no different salts (LM [DES 128/128 SSE2-16])
[...]
$ printf '  ' | john --stdin --pot=t.pot --format=raw-md5 dagger.pw
Loaded 1 password hash (Raw-MD5 [MD5 128/128 SSE4.1 4x3])
[...]
                 (?)
[...]
$ cat t.pot
$dynamic_0$23b58def11b45727d3351702515f86af:  
(There are 2 spaces at the end.)

OK. As raw-md5, 23b58def11b45727d3351702515f86af is '  ' (2 spaces).


Let's try the dagger:

$ perl -C0 -e 'print "\x{2020}"'
Wide character in print at -e line 1.
†
$ perl -CSDA -e 'print "\x{2020}"'
†
$ perl -CSDA -e 'print "\x{2020}"' | hd
00000000  e2 80 a0                                          |...|

The dagger is not the bytes 20 20 in utf-8. It would be 20 20 in
utf-16be (and in utf-16le too, since both bytes are equal).

$ perl -CSDA -e 'binmode(STDOUT, ":encoding(utf-16be)"); print "\x{2020}"' | hd
00000000  20 20                                             |  |

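So the utf-16 form of the dagger reads as just 2 spaces in a utf-8
terminal. We don't need that form here. To use the dagger as input for
john, we pipe it in as utf-8 and have to tell john the encoding.
First, a run without specifying it: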

$ perl -CSDA -e 'print "\x{2020}"' | john --stdin --pot=t2.pot --format=raw-md5u dagger.pw
Loaded 1 password hash (Raw-MD5u [md5(unicode($p)) 128/128 SSE4.1 4x3])
0g 0:00:00:00  0g/s 9.090p/s 9.090c/s 9.090C/s †

Without the --encoding=utf-8 option, john does not crack the hash.
With the option, it does:

$ perl -CSDA -e 'print "\x{2020}"' | john-dirty/run/john --stdin --pot=t2.pot --encoding=utf-8 --format=raw-md5u dagger.pw
Loaded 1 password hash (Raw-MD5u [md5(unicode($p)) 128/128 SSE4.1 4x3])
†                (?)

$ cat t2.pot
$dynamic_29$23b58def11b45727d3351702515f86af:†

$ hd t2.pot
00000000  24 64 79 6e 61 6d 69 63  5f 32 39 24 32 33 62 35  |$dynamic_29$23b5|
00000010  38 64 65 66 31 31 62 34  35 37 32 37 64 33 33 35  |8def11b45727d335|
00000020  31 37 30 32 35 31 35 66  38 36 61 66 3a e2 80 a0  |1702515f86af:...|
00000030  0a                                                |.|

As you can see, the dagger in the .pot file is stored in utf-8; it is
not 2 spaces.
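
The two cracks can be reproduced without john. Here is a minimal
sketch in Python, assuming that raw-md5 hashes the password bytes as
given and that raw-md5u hashes the utf-16le form of the password;
md5_hex() is a small helper defined only for this example.

```python
import hashlib

# The hash from dagger.pw in the session above.
TARGET = "23b58def11b45727d3351702515f86af"

def md5_hex(data):
    """Hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()

two_spaces = b"  "
dagger_utf16le = "\u2020".encode("utf-16-le")  # assumed raw-md5u input
dagger_utf8 = "\u2020".encode("utf-8")         # how the .pot stores it

# Same bytes, so necessarily the same digest.
print(two_spaces == dagger_utf16le)
print(md5_hex(two_spaces) == TARGET)
# The utf-8 form is 3 different bytes and hashes to something else.
print(md5_hex(dagger_utf8) == TARGET)
```

So the utf-8 bytes in the .pot file are only how the plaintext is
stored; the bytes that were actually hashed are the 2-byte utf-16
form.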


Let's try the --show option. For the first .pot file, with raw-md5 and 2 spaces:

$ john --pot=t.pot --show dagger.pw
0 password hashes cracked, 2 left
$ john --pot=t.pot --show --format=raw-md5 dagger.pw
?:  
1 password hash cracked, 0 left
$ john --pot=t.pot --show --format=raw-md5u dagger.pw
0 password hashes cracked, 1 left

For the second .pot file, with raw-md5u and the dagger:

$ john --pot=t2.pot --show dagger.pw
0 password hashes cracked, 2 left
$ john --pot=t2.pot --show --format=raw-md5 dagger.pw
0 password hashes cracked, 1 left
$ john --pot=t2.pot --show --format=raw-md5u dagger.pw
?:†
1 password hash cracked, 0 left

Without the --format= option, john treats the line as 2 lm hashes and
shows no password. When we force the format, we get the right password,
provided the respective entry is in the .pot file. The .pot files may
be combined together:

$ cat t.pot t2.pot > combo.pot
$ john --pot=combo.pot --show --format=raw-md5u dagger.pw
?:†
1 password hash cracked, 0 left
$ john --pot=combo.pot --show --format=raw-md5 dagger.pw
?:  
1 password hash cracked, 0 left

It works because john writes the hash to the .pot file with the tag of
the format that cracked it ($dynamic_29$ for raw-md5u and $dynamic_0$
for raw-md5). (It should be noted that such a representation is not
always unambiguous.)
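
To illustrate why the combined .pot file keeps the two cracks apart,
here is a minimal sketch in Python that splits the two .pot lines from
the session into tag, digest and plaintext. parse_pot_line() is a
hypothetical helper; it is not john's actual lookup code, and it
assumes the ciphertext itself contains no ':' (which is not true for
every format).

```python
import re

# The two lines from combo.pot in the session above.
POT_LINES = [
    "$dynamic_0$23b58def11b45727d3351702515f86af:  ",
    "$dynamic_29$23b58def11b45727d3351702515f86af:\u2020",
]

def parse_pot_line(line):
    """Split 'ciphertext:plaintext' and pull out a leading $tag$, if any.
    Simplification: the first ':' is taken as the separator."""
    ciphertext, _, plaintext = line.partition(":")
    m = re.match(r"\$([^$]+)\$(.*)", ciphertext)
    if m:
        return m.group(1), m.group(2), plaintext
    return None, ciphertext, plaintext

# Index plaintexts by (tag, digest): the same digest appears twice,
# but under different tags.
cracked = {}
for line in POT_LINES:
    tag, digest, plaintext = parse_pot_line(line)
    cracked[(tag, digest)] = plaintext

for (tag, digest), plaintext in cracked.items():
    print(tag, digest, repr(plaintext))
```

Keyed by digest alone, the second entry would overwrite the first; the
tag is what lets both plaintexts coexist.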

Conclusion: one ciphertext may be crackable as different formats and
may then represent different passwords. So when we are not sure of the
format and we get a crack, it does not reliably mean that we guessed
the format right. It is a rare case, though.

Thanks!

-- 
Regards,
Aleksey Cherepanov
