Message-ID: <20240515205701.GA17047@openwall.com>
Date: Wed, 15 May 2024 22:57:01 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

On Thu, May 09, 2024 at 07:49:11AM +0200, Albert Veli wrote:
> To make it a bit clearer, think of this famous comic strip:
> https://xkcd.com/936/
> 
> The phrase is made of four words
> 
> correct
> horse
> battery
> staple
> 
> With a bit of luck, depending on what corpus text you use, these four words
> could be found in maybe the top 2000 words. Now, this is not a normal
> sentence, so it would not benefit from Markov statistics. But the idea is to
> be able to form these kinds of combinations with a feature in john, one
> that preferably tries the word combinations in order from most likely to
> least likely, like it does for masks with individual characters. You could
> also use different delimiters.
> 
> correct_horse_battery_staple
> correct-horse-battery-staple
> CorrectHorseBatteryStaple
> 
> Or maybe even spaces as delimiters. But it would also be good if a
> delimiter could be specified with an option.
> 
> So my question is just whether you think this would be a useful feature, or
> whether it is possible to do this with current features in john, with some
> twist.

It would be a useful feature, and it is possible to do now with some
twists and without consideration for word frequencies.
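For illustration, here is a brute-force Python sketch of the kind of generator the question describes, ignoring word frequencies: every 4-word sequence (repetition allowed, as PRINCE also does) joined with each of the three delimiter styles from the question. The function name `phrases` and the separator set are hypothetical, and the order is plain itertools.product order, not most-likely-first.

```python
from itertools import product

# Words from the xkcd example; in practice this would be the top N words
# of some corpus.
WORDS = ["correct", "horse", "battery", "staple"]


def phrases(words, count, separators=("_", "-", "")):
    """Yield every sequence of `count` words (repetition allowed) once per
    separator. The empty separator is rendered as CamelCase so that the
    word boundaries stay visible."""
    for combo in product(words, repeat=count):
        for sep in separators:
            if sep:
                yield sep.join(combo)
            else:
                yield "".join(word.capitalize() for word in combo)


if __name__ == "__main__":
    for phrase in phrases(WORDS, 4):
        print(phrase)
```

With 4 words and 3 styles this yields 4^4 * 3 = 768 candidates. A top-2000 list would give 2000^4 sequences per style, which is why an ordering by likelihood would matter for such a feature.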

PRINCE mode can combine words.  Unfortunately, it lacks support for word
separator characters (other than by including separators as separate
wordlist lines, but then we also get repeated separators with no word
between them).  Yet we can hack around that by using rules:

$ cat w
Correct
Horse
Battery
Staple
$ ./john --prince=w --prince-elem-cnt-min=4 --prince-elem-cnt-max=4 --max-length=28 --rules=': %2?u ip %3?u ip %4?u ip l' --rules-stack=phrase --stdout
[...]
battery#battery#staple#horse
battery~battery~staple~horse
batterybatterybatteryhorse
4512p 0:00:00:15 100.00% (2024-05-15 22:21) 285.9p/s batterybatterybatteryhorse
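As a minimal sketch, assuming this reading of john's rule syntax (`%N?u` rejects the word unless it has at least N uppercase letters and sets position `p` to the N-th one, `ip ` inserts a space at `p`, `l` lowercases), the rule given with --rules above can be emulated in Python for one PRINCE output word. The function names are hypothetical and only ASCII letters are assumed.

```python
def nth_upper_index(word, n):
    """Return the index of the n-th (1-based) uppercase letter, or None."""
    seen = 0
    for i, ch in enumerate(word):
        if ch.isupper():
            seen += 1
            if seen == n:
                return i
    return None


def split_on_capitals(word, nths=(2, 3, 4), lowercase=True):
    """Emulate ': %2?u ip %3?u ip %4?u ip l' on one candidate.

    Returns the transformed word, or None where john would reject it."""
    for n in nths:
        p = nth_upper_index(word, n)      # %N?u: find the N-th capital, set p
        if p is None:
            return None                   # fewer than N capitals: reject
        word = word[:p] + " " + word[p:]  # ip<space>: insert a space at p
    return word.lower() if lowercase else word  # l: convert to lowercase


if __name__ == "__main__":
    print(split_on_capitals("CorrectHorseBatteryStaple"))
```

The second command below omits the trailing `l`; `lowercase=False` models that variant.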

Unfortunately, PRINCE is for some reason too slow at high lengths, which is
why I had to limit the first command above to --max-length=28.  To output
all combinations with separators, I think it'd need to be --max-length=31,
but that appears to take ages.  (Incidentally, I think that's in part
because this greater maximum length gets passed into the PRINCE code as-is,
which is unaware that lengths above 28 would only be reachable with the
separators inserted by the rules stacked on top of it.)
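The 28 and 31 limits follow from the sample wordlist `w`: its longest words, "Correct" and "Battery", are 7 characters, so four PRINCE elements are at most 28 characters, and the three separators between them add 3 more. A small arithmetic check:

```python
WORDS = ["Correct", "Horse", "Battery", "Staple"]
ELEMENTS = 4  # --prince-elem-cnt-min=4 --prince-elem-cnt-max=4

longest = max(len(word) for word in WORDS)      # "Correct", "Battery": 7
max_prince_len = ELEMENTS * longest             # 4 * 7 = 28, no separators
max_phrase_len = max_prince_len + ELEMENTS - 1  # plus 3 separators = 31

print(max_prince_len, max_phrase_len)
```

So every raw PRINCE output already fits in 28 characters; assuming the limit is applied to the final candidates, --max-length=28 only drops phrases once separators are added.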

The rule I specified on the command line finds the second, third, and
fourth capital letters and inserts a space before each, then converts the
result to all-lowercase.  Then the Phrase ruleset (in the current default
john.conf) optionally inserts separators in an optimized order; see the
comment before it.  In fact, as stated in that comment, ideally we'd run
another ruleset first:

$ ./john --prince=w --prince-elem-cnt-min=4 --prince-elem-cnt-max=4 --max-length=28 --rules=': %2?u ip %3?u ip %4?u ip ' --rules-stack=PhrasePreprocess --stdout > w-preprocessed
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Enabling duplicate candidate password suppressor
304p 0:00:00:15 100.00% (2024-05-15 22:50) 19.04p/s battery battery staple horse
$ ./john -w=w-preprocessed --rules=phrase --stdout | tail -4
Using default input encoding: UTF-8
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
Enabling duplicate candidate password suppressor
8816p 0:00:00:00 100.00% (2024-05-15 22:50) 62971p/s battery~battery~staple~horse
Correct~Battery~Staple~Horse
correct~battery~staple~horse
Battery~Battery~Staple~Horse
battery~battery~staple~horse

Where it says 304p, it would be 512p if we didn't have to limit the
length to 28.
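A rough check of those counts, assuming --max-length=28 is enforced on the space-separated phrase produced by the rule rather than on the raw PRINCE output: of the 4^4 = 256 four-word sequences, 152 fit in 28 characters once 3 separators are added, and all 256 fit in 31. That is the same 152/256 proportion as 304/512, which would be consistent with each phrase yielding two candidates after PhrasePreprocess, though this sketch does not model that ruleset. The helper name `count_fitting` is hypothetical.

```python
from itertools import product

WORDS = ["Correct", "Horse", "Battery", "Staple"]


def count_fitting(words, count, max_len, sep_len=1):
    """Count `count`-word sequences whose separated form is <= max_len."""
    return sum(
        1
        for combo in product(words, repeat=count)
        if sum(len(word) for word in combo) + sep_len * (count - 1) <= max_len
    )


total = len(WORDS) ** 4                # 256 four-word sequences
fit_28 = count_fitting(WORDS, 4, 28)   # 152
fit_31 = count_fitting(WORDS, 4, 31)   # 256

print(fit_28, total, fit_28 / total)   # 152 256 0.59375
```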

There's probably something in PRINCE that could be optimized to handle
greater lengths, and/or built-in word separator support could be added.

I used capital letters to indicate word boundaries through PRINCE, but
another way would be to include e.g. a space character before or after
each word in the input list, and then remove the extra leading or trailing
separator character from the "phrase" with a rule applied after PRINCE.
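A sketch of that alternative, contrasted with adding the separator as its own wordlist line, using a plain Cartesian product as a simplified stand-in for PRINCE chaining (PRINCE's real ordering and length filtering differ). In john's rule syntax, `[` deletes the first character and `]` the last, so one of those would serve as the post-PRINCE rule; here `phrase[1:]` plays that role. The helper name `chain` is hypothetical.

```python
from itertools import product

PLAIN = ["correct", "horse", "battery", "staple"]


def chain(elements, count):
    """Concatenate every sequence of `count` elements (simplified model of
    PRINCE chaining; no ordering by keyspace, no length limits)."""
    for combo in product(elements, repeat=count):
        yield "".join(combo)


# Separator as its own wordlist line: doubled separators can appear, and a
# full 4-word phrase would need 7 elements instead of 4.
as_lines = set(chain(PLAIN + [" "], 4))

# Separator attached in front of each word, then the first character dropped,
# as a rule deleting the first character would do after PRINCE.
attached = {phrase[1:] for phrase in chain([" " + w for w in PLAIN], 4)}

print(len(as_lines), len(attached))
```

Attaching the separator keeps exactly one separator between adjacent words and lets the element count stay at 4.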

Alexander
