Message-ID: <4D7D6017.6060308@bredband.net>
Date: Mon, 14 Mar 2011 01:23:51 +0100
From: magnum <rawsmooth@...dband.net>
To: john-users@...ts.openwall.com
Subject: UTF-8 patch

I just uploaded a "UTF-8 awareness" patch to the wiki (http://openwall.info/wiki/john/patches).

It adds the new option flag --utf8. Without this flag, John behaves as usual - that is, for any format (for example NT) that internally converts to Unicode (UTF-16 or UCS-2), the conversion assumes ISO-8859-1 input. This means you can't crack passwords containing characters not present in ISO-8859-1.

Using this flag makes John assume UTF-8 input instead. That is, you should feed it wordlists encoded in UTF-8, and possibly hash files with user names and info encoded in UTF-8, for --single mode to work best. For unaffected formats, the option is ignored unless you use the new rejection rules.

Two new rejection rules are introduced:

  -u  reject rule unless the --utf8 option is used
  -U  reject rule if the --utf8 option is used

The former can be prepended to rules that are tailored for UTF-8, and the latter can be used for rules that are specific to ISO-8859-1. For most other rules, neither should be used.

The SAPg format does use UTF-8 internally, and with this patch you can turn off the incomplete ISO-8859-1 conversion that is originally used and feed it UTF-8 directly.

Other affected formats: mscash, mscash2, mschapv2, mssql, mssql05, netlmv2, netntlm, netntlmv2.

There is also a new format included, raw-md5-unicode, which is md5(unicode($p)) with optional UTF-8 support.

This is somewhat EXPERIMENTAL and I haven't tested it on any platform other than Linux-x86-64. I know of no bugs though, except for the following, which I believe is not my fault: the NT format seems to have a bug that makes it fail if the second character of the plaintext is U+2000 or higher (for example a Euro sign). From all I can tell this is an old bug, but we would never trigger it until now as we could only use U+00FF at most.

You can do "john --test --utf8" to benchmark just the formats that are affected. Note that we lack UTF-8 / ISO-8859-1 specific self-tests for some formats; any help adding them would be great.

cheers
magnum
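
To illustrate the difference between the two input assumptions described above, here is a minimal Python sketch. It is not code from the patch: the function names to_utf16le and md5_unicode are hypothetical, and it assumes the internal Unicode form is UTF-16 little-endian (as used by NT hashes) and that raw-md5-unicode hashes that same encoding, which the post only describes as md5(unicode($p)).

```python
import hashlib


def to_utf16le(raw: bytes, utf8: bool) -> bytes:
    """Convert candidate password bytes to UTF-16LE.

    utf8=False mimics the default behaviour described in the post:
    every input byte is taken as one ISO-8859-1 character.
    utf8=True mimics --utf8: the bytes are decoded as UTF-8 first.
    (Hypothetical helper, not the patch's actual code.)
    """
    source_encoding = "utf-8" if utf8 else "iso-8859-1"
    return raw.decode(source_encoding).encode("utf-16-le")


def md5_unicode(raw: bytes, utf8: bool) -> str:
    """md5(unicode($p)), assuming unicode() means UTF-16LE here."""
    return hashlib.md5(to_utf16le(raw, utf8)).hexdigest()


# A wordlist entry containing a Euro sign (U+20AC), stored as UTF-8.
word = "p\u20acss".encode("utf-8")

legacy = to_utf16le(word, utf8=False)  # 3 Euro bytes become 3 characters
aware = to_utf16le(word, utf8=True)    # Euro sign becomes one code unit

print(legacy.hex())
print(aware.hex())
print(md5_unicode(word, utf8=False) == md5_unicode(word, utf8=True))
```

With the ISO-8859-1 assumption, the three UTF-8 bytes of the Euro sign turn into three separate UTF-16 code units, so the hashed data can never match a password that really contains U+20AC. Note that with utf8=True this sketch raises UnicodeDecodeError on input that is not valid UTF-8; how the patch itself handles such input is not stated in the post.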