Message-ID: <cd9632ca7943cfc13167f1eb6fe11fa6@smtp.hushmail.com>
Date: Sun, 6 Jan 2013 03:23:51 +0100
From: magnum <john.magnum@...hmail.com>
To: "john-dev@...ts.openwall.com" <john-dev@...ts.openwall.com>
Subject: Markov UTF-8 magic (Was: [john-users] Incremental attack properties questions)

On 5 Jan, 2013, at 14:29 , Frank Dittrich <frank_dittrich@...mail.com> wrote:
> On 01/05/2013 01:11 PM, Frank Dittrich wrote:
>> Since Markov mode generates words based on 2-byte frequencies, and since
>> it generates passwords shorter than maximum length, there will be a
>> non-negligible number of words with invalid UTF-8 characters,
>> especially at the end of the word. So you might need to combine --markov
>> with an --external filter.
>
> If you don't want to write a general-purpose UTF-8 validity check, but
> just one which checks --markov output based on stats files which have
> been generated using a word list encoded in (valid) UTF-8, then this
> task is quite simple:
>
> If the last byte is < 0x80, the word is valid.
> Else if the last byte is > 0xbf, the word is invalid.
> Else if the second-to-last byte is >= 0xc0 and <= 0xdf, the word is valid.
> Else if the third-to-last byte is >= 0xe0 and <= 0xef, the word is valid.
> Else if the fourth-to-last byte is >= 0xf0 and <= 0xf7, the word is valid.
> Else the word is invalid.

I'm thinking I could include this in the Markov mode itself, provided we run with --enc=utf8. Would that be sane?

magnum
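
For reference, the tail check Frank describes could be sketched in C roughly as follows. This is only an illustration of the quoted rules, not code from John the Ripper: the names markov_utf8_tail_ok and in_range are made up, treating an empty word as valid is an assumption, and the length guards are added so that short words never read before the start of the buffer. Like the rules themselves, it only inspects the end of the word and is not a general-purpose UTF-8 validator.

```c
#include <stddef.h>

/* Hypothetical helper: is byte b within [lo, hi]? */
static int in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
    return b >= lo && b <= hi;
}

/*
 * Hypothetical function (not part of John the Ripper).
 * Returns 1 if the word does not end in a truncated UTF-8 sequence
 * according to the rules quoted above, 0 otherwise.
 * Assumption: an empty word counts as valid.
 */
int markov_utf8_tail_ok(const unsigned char *word, size_t len)
{
    unsigned char last;

    if (len == 0)
        return 1;

    last = word[len - 1];

    if (last < 0x80)        /* ends in plain ASCII */
        return 1;
    if (last > 0xbf)        /* ends in a lead byte with nothing after it */
        return 0;

    /* Last byte is a continuation byte (0x80..0xbf): look for its lead. */
    if (len >= 2 && in_range(word[len - 2], 0xc0, 0xdf))
        return 1;           /* complete 2-byte sequence */
    if (len >= 3 && in_range(word[len - 3], 0xe0, 0xef))
        return 1;           /* complete 3-byte sequence */
    if (len >= 4 && in_range(word[len - 4], 0xf0, 0xf7))
        return 1;           /* complete 4-byte sequence */

    return 0;
}
```

A candidate would be passed in as raw bytes, e.g. markov_utf8_tail_ok((const unsigned char *)word, strlen(word)), and skipped when the result is 0. The same logic should also be expressible as an --external filter, though that is not shown here.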