john-users - Re: Markov phrases in john

Message-ID: <CAEo4CeOWr+9QLX2b8xaW+FeYpz2ycb81db3E9FGMurygDyAQKw@mail.gmail.com>
Date: Thu, 9 May 2024 07:49:11 +0200
From: Albert Veli <albert.veli@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

To make it a bit clearer, think of this famous comic strip:
https://xkcd.com/936/

The phrase is made of four words:

correct
horse
battery
staple

With a bit of luck, depending on what corpus text you use, all four of
these words could be found among roughly the 2000 most common words. This
is not a normal sentence, so it would not benefit from Markov statistics.
But the idea is to be able to form these kinds of combinations with a
feature in john, one that preferably tries the word combinations in order
from most likely to least likely, like it does for masks with individual
characters. You could also use different delimiters:

correct_horse_battery_staple
correct-horse-battery-staple
CorrectHorseBatteryStaple

Or maybe even spaces as delimiters. It would also be good if the
delimiter could be specified with an option.
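
To illustrate what I mean, here is a rough Python sketch. It is not an
existing john feature, and the names top_words and phrases are made up for
this example. It assumes the words are independent of each other (a plain
word-frequency model, no Markov statistics), so the likelihood of a phrase
is just the product of its word frequencies. A priority queue then emits
the phrases from most likely to least likely.

```python
import heapq
import math
from collections import Counter


def top_words(corpus_text, limit):
    """Return the `limit` most frequent words as (word, relative frequency)."""
    counts = Counter(corpus_text.lower().split())
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common(limit)]


def phrases(words, length, delimiter="", capitalize=False):
    """Yield `length`-word candidates, most likely first.

    `words` must be sorted by descending frequency, as top_words() returns.
    Assumption: words are independent, so a phrase costs the sum of the
    negative log-frequencies of its words.
    """
    if not words or length < 1:
        return
    cost = [-math.log(p) for _, p in words]  # ascending, cheapest word first
    start = (0,) * length
    heap = [(cost[0] * length, start)]
    seen = {start}
    while heap:
        _, idx = heapq.heappop(heap)
        parts = [words[i][0] for i in idx]
        if capitalize:
            parts = [p.capitalize() for p in parts]
        yield delimiter.join(parts)
        # Successors: move one position to the next less frequent word.
        for pos in range(length):
            if idx[pos] + 1 < len(words):
                nxt = idx[:pos] + (idx[pos] + 1,) + idx[pos + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (sum(cost[i] for i in nxt), nxt))
```

Calling phrases(top_words(text, 2000), 4, delimiter="_") would produce
candidates in the correct_horse_battery_staple style, and capitalize=True
with an empty delimiter gives the CorrectHorseBatteryStaple style. The
output could be printed and piped into john with --stdin.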

So my question is simply whether you think this would be a useful feature,
or whether it is already possible with john's current features, with some
twist.

Thanks.

On Wed, May 8, 2024 at 12:34 PM Albert Veli <albert.veli@...il.com> wrote:

> Hi, as many of you know a mask will not try combinations of characters
> in alphabetical order but rather in the most likely to least likely order
> using something like Markov chains:
>
> ./john --stdout --mask='?l?l'
> aa
> ea
> ia
> oa
> na
> ra
> la
> sa
> ...
>
>
> This is useful to find human-created passwords early. Nowadays it is more
> and more popular to use combinations of words to create passwords. Would
> it be possible to use Markov or similar to traverse entire words from a
> wordlist and use the most common pair of adjacent words from the list
> first, then the second most common and so on?
>
> Like Markov does for individual characters, but on entire words instead?
> I hope you understand what I mean. Then maybe extend this to three
> words. It is possible with the '?l?l?l' mask so in some way it should be
> possible to do for entire words too. Ideally there would be an option to
> specify word delimiter too. Maybe even an option to provide a corpus text
> to train the chains on. Then an option to specify how many words to
> include in the guesses, the top 100 words, the top 500 words or the top
> 2000 words and so on. For two word combinations you can use a larger
> number and for three or four words, smaller numbers.
>
> What do you think? Would this be useful, or is it possible now already?
>
>
> Regards,
>
> Albert
>
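
The word-pair ordering described in the quoted message could be sketched
roughly as follows. This is only an illustration under simple assumptions:
the function name ranked_pairs is made up, words are split on whitespace
only, and pairs are counted across sentence boundaries. It counts adjacent
word pairs in a corpus text, restricted to the top N words, and lists them
from most common to least common.

```python
from collections import Counter


def ranked_pairs(corpus_text, top_n, delimiter=" "):
    """Return adjacent word pairs from the corpus, most common pair first.

    Only pairs whose two words are both among the `top_n` most frequent
    words are kept. Tokenization is a plain whitespace split (assumption).
    """
    tokens = corpus_text.lower().split()
    vocab = {w for w, _ in Counter(tokens).most_common(top_n)}
    pairs = Counter(
        (a, b) for a, b in zip(tokens, tokens[1:]) if a in vocab and b in vocab
    )
    return [delimiter.join(pair) for pair, _ in pairs.most_common()]
```

Pairs that never occur in the corpus are not generated at all here, so a
real feature would probably need some fallback to cover them.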
