Message-ID: <20241201023808.GA16926@openwall.com>
Date: Sun, 1 Dec 2024 03:38:08 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

On Wed, Nov 20, 2024 at 03:14:31AM +0100, Solar Designer wrote:
> Anyway, it is interesting that OMEN alone performed better for you than
> incremental with tokenizer did.  My guess as to why is that incremental
> does too much (when extended in other ways, like with the tokenizer) in
> terms of separation by length and character position.
> 
> I also had this guess when I had tried extending incremental to
> 3rd-order Markov (4-grams) from its current 2nd-order (3-grams) while
> preserving the length and position separation.  This resulted in only
> slight and inconclusive improvement (at huge memory usage increase
> and/or reduction in character set), so I didn't release that version.

I've created/closed a GitHub issue to record that experiment:

https://github.com/openwall/john/issues/5584

The patch is included in there, so please feel free to give it a try.
I have not tried it together with the tokenizer yet - that would be interesting.

> If I had more time, I'd try selectively removing that separation or/and
> adding more fallbacks (like if a certain pair of characters never occurs
> in that position for that length, see if it does for others and use that
> before falling back to considering only one character instead of two).
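The fallback chain described in the quote could look roughly like this in Python. This is a hypothetical sketch - the lookup order and data layout are my illustration of the idea, not code from john:

```python
def next_char_counts(stats, length, pos, pair):
    # stats[(length, pos)][pair] -> {next_char: count}; names are illustrative
    by_slot = stats.get((length, pos), {})
    if pair in by_slot:
        return by_slot[pair]            # exact (length, position) hit
    # fallback 1: same character pair at any other length/position
    merged = {}
    for slot_stats in stats.values():
        for ch, n in slot_stats.get(pair, {}).items():
            merged[ch] = merged.get(ch, 0) + n
    if merged:
        return merged
    # fallback 2: degrade to a one-character context (drop the pair's
    # first character) before giving up entirely
    for slot_stats in stats.values():
        for p, followers in slot_stats.items():
            if p[1] == pair[1]:
                for ch, n in followers.items():
                    merged[ch] = merged.get(ch, 0) + n
    return merged

stats = {(8, 4): {"ss": {"w": 3}}, (7, 3): {"me": {"i": 1}}}
# an unseen (length, position) still finds the "ss" pair elsewhere
```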

Excerpt from an e-mail I wrote in late 2021:

> For incremental mode, I got inconsistent results for a possible upgrade
> from the current 3-grams to 4-grams, which I spent a couple of days on
> last week.  In my tests so far, results vary from -11.5% to +24.9%, and
> are commonly at around +5%.  This is by number of passwords cracked in
> comparison to the currently released code trained in the same way.
> 
> The variance is for different training sets, test sets, prior exclusion
> or not of passwords crackable by wordlist+rules, and different attack
> duration (such as 1 vs. 10 billion candidates tested).  While the
> results are mostly positive, it is not entirely obvious which ones
> reflect future real-world usage best.  Since there's significant extra
> processing and memory consumption for 4-grams vs. 3-grams, we might want
> to justify it with a greater improvement than what I'm getting so far.
> 
> Compared to the current publicly released .chr files, the improvement is
> more obvious - up to +39.4% in my tests so far - but much of it is also
> possible without code change (with more extensive training sets).
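As a back-of-the-envelope illustration of the extra memory the quote mentions: moving from 3-grams to 4-grams multiplies the number of possible contexts by the charset size, on top of the length/position multiplication. The numbers below are illustrative, not incremental mode's actual table layout:

```python
charset = 95        # printable ASCII, as an example
max_len = 16        # illustrative maximum password length

contexts_3gram = charset ** 2   # two characters of context
contexts_4gram = charset ** 3   # three characters of context

# one counter per possible next character per context, further
# multiplied by the number of (length, position) slots
slots = sum(range(1, max_len + 1))
entries_3 = contexts_3gram * charset * slots
entries_4 = contexts_4gram * charset * slots
print(f"3-gram counters: {entries_3:,}")
print(f"4-gram counters: {entries_4:,} ({entries_4 // entries_3}x more)")
```

Under these assumptions the 4-gram tables are a factor of the charset size larger, which matches the trade-off described above: a charset reduction or a big accuracy win would be needed to justify it.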

Alexander
