Message-ID: <20240515205701.GA17047@openwall.com>
Date: Wed, 15 May 2024 22:57:01 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

On Thu, May 09, 2024 at 07:49:11AM +0200, Albert Veli wrote:
> To make it a bit clearer. Think of this famous comic strip
> https://xkcd.com/936/
>
> The phrase is made of four words
>
> correct
> horse
> battery
> staple
>
> With a bit of luck, depending on what corpus text you use, these four words
> could be found in maybe the top 2000 words. Now this is not a normal
> sentence so it would not benefit from Markov statistics. But the idea is to
> be able to form these kinds of combinations with a feature in john. A
> feature that preferably tries the word combinations in order from most
> likely to least likely, like it does for masks with individual characters.
> Also you could use different delimiters.
>
> correct_horse_battery_staple
> correct-horse-battery-staple
> CorrectHorseBatteryStaple
>
> Or maybe even spaces as delimiters. But it would also be good if a
> delimiter could be specified with an option.
>
> So my question is just if you think this would be a useful feature or if it
> is possible to do this with current features in john, with some twist.

Would be a useful feature, and is possible to do now with some twists
and without consideration for word frequencies.

PRINCE mode can combine words.  Unfortunately, it lacks support for word
separator characters (other than by including separators as separate
wordlist lines, but then we also get repeated separators with no word
between them).  Yet we can hack that by using rules:

$ cat w
Correct
Horse
Battery
Staple
$ ./john --prince=w --prince-elem-cnt-min=4 --prince-elem-cnt-max=4 --max-length=28 --rules=': %2?u ip %3?u ip %4?u ip l' --rules-stack=phrase --stdout
[...]
battery#battery#staple#horse
battery~battery~staple~horse
batterybatterybatteryhorse
4512p 0:00:00:15 100.00% (2024-05-15 22:21) 285.9p/s batterybatterybatteryhorse

Unfortunately, PRINCE is somehow too slow at high lengths, which is why
I had to limit the above to --max-length=28.  To output all combinations
with separators, I think it'd need to be --max-length=31, but that
appears to take ages.  (Incidentally, I think that's in part because
this greater maximum length gets passed into PRINCE code as-is, unaware
that it would only be reachable with the rule stacked on top of it.)

The rule I specified on the command line finds the capital letters and
inserts spaces before them, then converts to all-lowercase.  Then the
Phrase ruleset (in current default john.conf) optionally inserts
separators in an optimized order - see the comment before it.

In fact, as stated in that comment, ideally we'd run another ruleset
first:

$ ./john --prince=w --prince-elem-cnt-min=4 --prince-elem-cnt-max=4 --max-length=28 --rules=': %2?u ip %3?u ip %4?u ip ' --rules-stack=PhrasePreprocess --stdout > w-preprocessed
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Enabling duplicate candidate password suppressor
304p 0:00:00:15 100.00% (2024-05-15 22:50) 19.04p/s battery battery staple horse
$ ./john -w=w-preprocessed --rules=phrase --stdout | tail -4
Using default input encoding: UTF-8
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Enabling duplicate candidate password suppressor
8816p 0:00:00:00 100.00% (2024-05-15 22:50) 62971p/s battery~battery~staple~horse
Correct~Battery~Staple~Horse
correct~battery~staple~horse
Battery~Battery~Staple~Horse
battery~battery~staple~horse

Where it says 304p, it'd need to be 512 if we didn't have to limit the
length to 28.  Probably there's something in PRINCE to optimize to
handle greater lengths, or/and to add built-in word separator support.

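As a rough cross-check of the 304 and 512 figures, here is a minimal
Python sketch.  It is not part of john and does not use john's rule
engine; it only emulates the effect described above (a space before
each capital letter after the first) and assumes the --max-length=28
limit is applied to the phrase after the spaces are inserted.  The
function names are made up for illustration.

```python
from itertools import product

WORDS = ["Correct", "Horse", "Battery", "Staple"]  # contents of the file "w"
MAX_LENGTH = 28  # the value given to --max-length in the commands above


def split_on_capitals(candidate):
    """Insert a space before every capital letter except the first one.

    For a four-word candidate this has the same effect as inserting a
    space before the 2nd, 3rd and 4th capital letters.
    """
    out = []
    capitals_seen = 0
    for ch in candidate:
        if ch.isupper():
            capitals_seen += 1
            if capitals_seen > 1:
                out.append(" ")
        out.append(ch)
    return "".join(out)


def count_phrases(words, elems=4, max_length=MAX_LENGTH):
    """Return (all combinations, combinations fitting within max_length)."""
    total = kept = 0
    for combo in product(words, repeat=elems):
        phrase = split_on_capitals("".join(combo))
        total += 1
        if len(phrase) <= max_length:
            kept += 1
    return total, kept


if __name__ == "__main__":
    print(split_on_capitals("CorrectHorseBatteryStaple").lower())
    print(count_phrases(WORDS))                 # limit of 28
    print(count_phrases(WORDS, max_length=31))  # limit of 31
```

Under these assumptions, 152 of the 256 four-word combinations fit
within 28 characters, and all 256 fit within 31.  That is consistent
with 304 = 2 * 152 and 512 = 2 * 256 if PhrasePreprocess emits two
candidates per phrase, which is an inference from the counts rather
than something stated in the output above.
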
I used capital letters to indicate word boundaries through PRINCE, but
another way would be to include e.g. a space character before or after
each word in the input list.  Then remove the extra leading or trailing
separator character from the "phrase" with a rule applied after PRINCE.

Alexander