Message-ID: <20241201023808.GA16926@openwall.com>
Date: Sun, 1 Dec 2024 03:38:08 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

On Wed, Nov 20, 2024 at 03:14:31AM +0100, Solar Designer wrote:
> Anyway, it is interesting that OMEN alone performed better for you than
> incremental with tokenizer did.  My guess as to why is that incremental
> does too much (when extended in other ways, like with the tokenizer) in
> terms of separation by length and character position.
> 
> I also had this guess when I had tried extending incremental to
> 3rd-order Markov (4-grams) from its current 2nd-order (3-grams) while
> preserving the length and position separation.  This resulted in only
> slight and inconclusive improvement (at huge memory usage increase
> and/or reduction in character set), so I didn't release that version.

I've created/closed a GitHub issue to record that experiment:

https://github.com/openwall/john/issues/5584

The patch is included in there, so please feel free to give it a try.
I have not tried it together with the tokenizer yet - that would be interesting.

> If I had more time, I'd try selectively removing that separation or/and
> adding more fallbacks (like if a certain pair of characters never occurs
> in that position for that length, see if it does for others and use that
> before falling back to considering only one character instead of two).
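The fallback chain described in the quote could look roughly like this in Python. This is a hypothetical sketch - the lookup order and data layout are my illustration of the idea, not code from john:

```python
def next_char_counts(stats, length, pos, pair):
    # stats[(length, pos)][pair] -> {next_char: count}; names are illustrative
    by_slot = stats.get((length, pos), {})
    if pair in by_slot:
        return by_slot[pair]            # exact (length, position) hit
    # fallback 1: same character pair at any other length/position
    merged = {}
    for slot_stats in stats.values():
        for ch, n in slot_stats.get(pair, {}).items():
            merged[ch] = merged.get(ch, 0) + n
    if merged:
        return merged
    # fallback 2: degrade to a one-character context (drop the pair's
    # first character) before giving up entirely
    for slot_stats in stats.values():
        for p, followers in slot_stats.items():
            if p[1] == pair[1]:
                for ch, n in followers.items():
                    merged[ch] = merged.get(ch, 0) + n
    return merged

stats = {(8, 4): {"ss": {"w": 3}}, (7, 3): {"me": {"i": 1}}}
# an unseen (length, position) still finds the "ss" pair elsewhere
```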

Excerpt from an e-mail I wrote in late 2021:

> For incremental mode, I got inconsistent results for a possible upgrade
> from the current 3-grams to 4-grams, which I spent a couple of days on
> last week.  In my tests so far, results vary from -11.5% to +24.9%, and
> are commonly at around +5%.  This is by number of passwords cracked in
> comparison to the currently released code trained in the same way.
> 
> The variance is for different training sets, test sets, prior exclusion
> or not of passwords crackable by wordlist+rules, and different attack
> duration (such as 1 vs. 10 billion candidates tested).  While the
> results are mostly positive, it is not entirely obvious which ones
> reflect future real-world usage best.  Since there's significant extra
> processing and memory consumption for 4-grams vs. 3-grams, we might want
> to justify it with a greater improvement than what I'm getting so far.
> 
> Compared to the current publicly released .chr files, the improvement is
> more obvious - up to +39.4% in my tests so far - but much of it is also
> possible without code change (with more extensive training sets).
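As a back-of-the-envelope illustration of the extra memory the quote mentions: moving from 3-grams to 4-grams multiplies the number of possible contexts by the charset size, on top of the length/position multiplication. The numbers below are illustrative, not incremental mode's actual table layout:

```python
charset = 95        # printable ASCII, as an example
max_len = 16        # illustrative maximum password length

contexts_3gram = charset ** 2   # two characters of context
contexts_4gram = charset ** 3   # three characters of context

# one counter per possible next character per context, further
# multiplied by the number of (length, position) slots
slots = sum(range(1, max_len + 1))
entries_3 = contexts_3gram * charset * slots
entries_4 = contexts_4gram * charset * slots
print(f"3-gram counters: {entries_3:,}")
print(f"4-gram counters: {entries_4:,} ({entries_4 // entries_3}x more)")
```

Under these assumptions the 4-gram tables are a factor of the charset size larger, which matches the trade-off described above: a charset reduction or a big accuracy win would be needed to justify it.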

Alexander
