Message-ID: <4D7D6017.6060308@bredband.net>
Date: Mon, 14 Mar 2011 01:23:51 +0100
From: magnum <rawsmooth@...dband.net>
To: john-users@...ts.openwall.com
Subject: UTF-8 patch

I just uploaded a "UTF-8 awareness" patch to the wiki 
(http://openwall.info/wiki/john/patches).

It adds the new option flag --utf8. Without this flag, John behaves as 
usual - that is, for any format (for example NT) that internally 
converts to Unicode (UTF-16 or UCS-2), the conversion assumes ISO-8859-1 
input. This means you can't crack passwords containing characters not 
present in ISO-8859-1.
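
To make the limitation concrete, here is a small Python sketch of what assuming ISO-8859-1 input does to a UTF-8 encoded candidate before the UTF-16 conversion. This is not John's actual code; the function names are made up for illustration, and UTF-16LE is assumed as the target encoding.

```python
def to_utf16le_assuming_latin1(raw: bytes) -> bytes:
    """Treat each input byte as one ISO-8859-1 character, then encode as UTF-16LE."""
    return raw.decode("iso-8859-1").encode("utf-16-le")


def to_utf16le_assuming_utf8(raw: bytes) -> bytes:
    """Decode the input as UTF-8, then encode as UTF-16LE."""
    return raw.decode("utf-8").encode("utf-16-le")


# A wordlist entry containing a Euro sign (U+20AC), stored as UTF-8.
candidate = "pa€s".encode("utf-8")

# ISO-8859-1 assumption: the three UTF-8 bytes of the Euro sign become
# three separate characters, so the hashed string is the wrong one.
print(to_utf16le_assuming_latin1(candidate).hex())  # 70006100e2008200ac007300

# UTF-8 assumption: the Euro sign survives as a single UTF-16 code unit.
print(to_utf16le_assuming_utf8(candidate).hex())    # 70006100ac207300
```

Since U+20AC does not exist in ISO-8859-1 at all, no choice of input bytes can produce the second result under the first assumption.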

Using this flag makes John assume UTF-8 input instead. That is, you 
should feed it wordlists encoded in UTF-8, and possibly hash files 
with user names and info encoded in UTF-8, for --single mode to work 
best. For unaffected formats, the option is ignored unless you use the 
new rejection rules.

Two new rejection rules are introduced:
-u  reject rule unless the --utf8 option is used
-U  reject rule if the --utf8 option is used

The former can be prepended to rules that are tailored for UTF-8, and 
the latter can be used for rules that are specific to ISO-8859-1. For 
most other rules, neither of them should be used.
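
The effect of the two flags can be modelled with a few lines of Python. This is only a toy model of the behaviour described above, not the patch's rule parser: the function name is hypothetical, and it assumes the flags appear as separate tokens at the front of a rule.

```python
def rule_is_rejected(rule: str, utf8_enabled: bool) -> bool:
    """Return True if a rule should be skipped because of a leading -u or -U flag."""
    for token in rule.split():
        if not token.startswith("-"):
            break  # reached the rule body; only leading flags are considered
        if token == "-u" and not utf8_enabled:
            return True  # -u: reject unless --utf8 is in effect
        if token == "-U" and utf8_enabled:
            return True  # -U: reject if --utf8 is in effect
    return False


# "l" (lowercase) stands in for an arbitrary rule body.
for rule in ("-u l", "-U l", "l"):
    for utf8 in (False, True):
        verdict = "rejected" if rule_is_rejected(rule, utf8) else "applied"
        print(f"{rule!r:8} utf8={utf8!s:5} -> {verdict}")
```

A rule with neither flag is applied in both modes, which is what you want for most rules.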

The SAPg format does use UTF-8 internally, and with this patch you can 
turn off the incomplete ISO-8859-1 conversion that the original code 
uses and feed it UTF-8 directly.

Other affected formats: mscash, mscash2, mschapv2, mssql, mssql05, 
netlmv2, netntlm, netntlmv2. There is also a new format included, 
raw-md5-unicode, which is md5(unicode($p)) with optional UTF-8 support.
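
As a rough illustration of md5(unicode($p)), the following Python sketch computes such a hash under either input assumption. It assumes that "unicode" here means UTF-16LE with no BOM and no terminator; the function name and its utf8 parameter are made up for this example and are not taken from the patch.

```python
import hashlib


def raw_md5_unicode(candidate: bytes, utf8: bool = False) -> str:
    """MD5 over the UTF-16LE form of a candidate read as ISO-8859-1 or UTF-8."""
    source_encoding = "utf-8" if utf8 else "iso-8859-1"
    text = candidate.decode(source_encoding)
    return hashlib.md5(text.encode("utf-16-le")).hexdigest()


# The same wordlist bytes give different hashes depending on the assumption,
# because "ö" is two bytes in UTF-8 and would be read as two characters
# under ISO-8859-1.
word = "möte".encode("utf-8")
print(raw_md5_unicode(word, utf8=False))
print(raw_md5_unicode(word, utf8=True))
```

For pure ASCII candidates the two modes produce the same hash, since ASCII bytes mean the same thing in both encodings.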


This is somewhat EXPERIMENTAL and I haven't tested it on any other 
platform than Linux-x86-64. I know of no bugs though, except for the 
following which I believe is not my fault:

The NT format seems to have a bug that makes it fail if the second 
character of the plaintext is U+2000 or higher (for example a Euro 
sign). From all I can tell this is an old bug, but we could never trigger 
it until now because we could only use U+00FF at most.

You can do "john --test --utf8" to benchmark just the formats that are 
affected. Note that we lack UTF-8 / ISO-8859-1 specific self-tests for 
some formats; any help adding them would be great.

cheers
magnum
