Message-ID: <20241201023808.GA16926@openwall.com>
Date: Sun, 1 Dec 2024 03:38:08 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

On Wed, Nov 20, 2024 at 03:14:31AM +0100, Solar Designer wrote:
> Anyway, it is interesting that OMEN alone performed better for you than
> incremental with tokenizer did. My guess as to why is that incremental
> does too much (when extended in other ways, like with the tokenizer) in
> terms of separation by length and character position.
>
> I also had this guess when I had tried extending incremental to
> 3rd-order Markov (4-grams) from its current 2nd-order (3-grams) while
> preserving the length and position separation. This resulted in only
> slight and inconclusive improvement (at huge memory usage increase
> and/or reduction in character set), so I didn't release that version.

I've created/closed a GitHub issue to record that experiment:

https://github.com/openwall/john/issues/5584

The patch is included in there, so please feel free to give it a try.
I did not try it along with the tokenizer yet - would be interesting.

> If I had more time, I'd try selectively removing that separation or/and
> adding more fallbacks (like if a certain pair of characters never occurs
> in that position for that length, see if it does for others and use that
> before falling back to considering only one character instead of two).

Excerpt from an e-mail I wrote in late 2021:

> For incremental mode, I got inconsistent results for a possible upgrade
> from the current 3-grams to 4-grams, which I spent a couple of days on
> last week. In my tests so far, results vary from -11.5% to +24.9%, and
> are commonly at around +5%. This is by number of passwords cracked in
> comparison to the currently released code trained in the same way.
>
> The variance is for different training sets, test sets, prior exclusion
> or not of passwords crackable by wordlist+rules, and different attack
> duration (such as 1 vs. 10 billion candidates tested). While the
> results are mostly positive, it is not entirely obvious which ones
> reflect future real-world usage best. Since there's significant extra
> processing and memory consumption for 4-grams vs. 3-grams, we might want
> to justify it with a greater improvement than what I'm getting so far.
>
> Compared to the current publicly released .chr files, the improvement is
> more obvious - up to +39.4% in my tests so far - but much of it is also
> possible without code change (with more extensive training sets).

Alexander
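
The fallback idea quoted above (if a pair of characters never occurs at a
given position for a given length, check whether it occurs elsewhere before
dropping to a single character of context) can be sketched roughly as
follows. This is a minimal Python illustration only, not code from John the
Ripper's incremental mode. The names train() and lookup(), the three-table
layout, and the choice to pool the pair over all lengths and positions are
assumptions made for the example.

```python
from collections import Counter, defaultdict


def train(passwords):
    """Count which character follows each 2-character context, at three levels."""
    exact = defaultdict(Counter)   # key: (length, position, c1, c2)
    pooled = defaultdict(Counter)  # key: (c1, c2), any length or position (assumption)
    single = defaultdict(Counter)  # key: (length, position, c2), one char of context
    for pw in passwords:
        n = len(pw)
        for pos in range(2, n):
            c1, c2, nxt = pw[pos - 2], pw[pos - 1], pw[pos]
            exact[(n, pos, c1, c2)][nxt] += 1
            pooled[(c1, c2)][nxt] += 1
            single[(n, pos, c2)][nxt] += 1
    return exact, pooled, single


def lookup(model, length, pos, c1, c2):
    """Return (level, counts) from the most specific level that has data."""
    exact, pooled, single = model
    candidates = (
        ("exact", exact.get((length, pos, c1, c2))),
        ("pooled", pooled.get((c1, c2))),
        ("single", single.get((length, pos, c2))),
    )
    for level, counts in candidates:
        if counts:
            return level, counts
    return "none", Counter()


# Tiny hypothetical training set, just to exercise each fallback level.
model = train(["abcd", "xxabz", "qb1"])
for query in [(4, 2, "a", "b"), (3, 2, "a", "b"), (3, 2, "z", "b")]:
    print(query, lookup(model, *query))
```

With this toy training set, the first query is answered from the fully
separated table, the second only after pooling the pair "ab" across other
lengths and positions, and the third only from the single-character
context. A real implementation would also need to weigh or smooth counts
across levels rather than pick the first non-empty one.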