Message-ID: <CAJ9ii1HTw7f7hHD0JLjyN60Hdr+3_gcmK8KNpbVNPoUdi+SvLw@mail.gmail.com>
Date: Wed, 4 Dec 2024 19:24:57 -0500
From: Matt Weir <cweir@...edu>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

I wrote a new blog post about additional analysis of the Tokenizer and
OMEN attacks.

Link: https://reusablesec.blogspot.com/2024/12/analyzing-tokenizer-part-2-omen.html

TLDR: I ran into significant problems training on the output of
tokenize.pl. The control characters it inserts into the training data
cause problems both when reading the data in and when writing the OMEN
rulesets to disk. As a result, I was unable to successfully combine the
two attack techniques. I plan on continuing to look into this, but it's
quickly turning into a much bigger project than I originally expected.

Cheers,
Matt/Lakiw

On Sat, Nov 30, 2024 at 9:40 PM Solar Designer <solar@...nwall.com> wrote:
> On Wed, Nov 20, 2024 at 03:14:31AM +0100, Solar Designer wrote:
> > Anyway, it is interesting that OMEN alone performed better for you than
> > incremental with tokenizer did. My guess as to why is that incremental
> > does too much (when extended in other ways, like with the tokenizer) in
> > terms of separation by length and character position.
> >
> > I also had this guess when I had tried extending incremental to
> > 3rd-order Markov (4-grams) from its current 2nd-order (3-grams) while
> > preserving the length and position separation. This resulted in only
> > slight and inconclusive improvement (at huge memory usage increase
> > and/or reduction in character set), so I didn't release that version.
>
> I've created/closed a GitHub issue to record that experiment:
>
> https://github.com/openwall/john/issues/5584
>
> The patch is included in there, so please feel free to give it a try.
> I did not try it along with the tokenizer yet - would be interesting.
> > If I had more time, I'd try selectively removing that separation and/or
> > adding more fallbacks (like if a certain pair of characters never occurs
> > in that position for that length, see if it does for others and use that
> > before falling back to considering only one character instead of two).
>
> Excerpt from an e-mail I wrote in late 2021:
>
> > For incremental mode, I got inconsistent results for a possible upgrade
> > from the current 3-grams to 4-grams, which I spent a couple of days on
> > last week. In my tests so far, results vary from -11.5% to +24.9%, and
> > are commonly at around +5%. This is by number of passwords cracked in
> > comparison to the currently released code trained in the same way.
> >
> > The variance is for different training sets, test sets, prior exclusion
> > or not of passwords crackable by wordlist+rules, and different attack
> > durations (such as 1 vs. 10 billion candidates tested). While the
> > results are mostly positive, it is not entirely obvious which ones
> > reflect future real-world usage best. Since there's significant extra
> > processing and memory consumption for 4-grams vs. 3-grams, we might want
> > to justify it with a greater improvement than what I'm getting so far.
> >
> > Compared to the current publicly released .chr files, the improvement is
> > more obvious - up to +39.4% in my tests so far - but much of it is also
> > possible without code change (with more extensive training sets).
>
> Alexander
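[Editor's note] One general way around the control-character problem Matt describes is to remap control bytes in the tokenizer's output to a reversible printable escape before handing the data to another tool, and undo the mapping afterwards. The sketch below illustrates that idea only; the function names are hypothetical, and it is not part of john, tokenize.pl, or OMEN.

```python
import re

def escape_control(data: bytes) -> bytes:
    """Replace control bytes, DEL, and backslash with printable \\xNN
    escapes, so downstream text tools never see raw control characters.
    Escaping backslash too keeps the mapping reversible."""
    out = bytearray()
    for b in data:
        if b < 0x20 or b == 0x7f or b == 0x5c:
            out += b"\\x%02x" % b
        else:
            out.append(b)
    return bytes(out)

def unescape_control(data: bytes) -> bytes:
    """Invert escape_control(), restoring the original bytes."""
    return re.sub(rb"\\x([0-9a-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), data)
```

Whether this helps in practice depends on whether OMEN's ruleset format can carry the longer escaped strings; the escaping itself is lossless either way.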
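[Editor's note] The fallback idea discussed above - if a character pair was never seen in a given context, consult broader statistics before dropping to single-character counts - can be sketched as a tiny back-off model. This is an illustration of the concept only, with hypothetical names; it is not how john's incremental mode is actually implemented (incremental additionally separates by length, which is omitted here for brevity).

```python
from collections import defaultdict

class BackoffModel:
    """Trigram character model that falls back to position-independent
    bigram counts, then to unigram counts, when a context is unseen."""

    def __init__(self):
        # tri[(pos, c1, c2)][c3]: trigram counts, separated by position
        self.tri = defaultdict(lambda: defaultdict(int))
        # bi[c1][c2]: bigram counts with position separation dropped
        self.bi = defaultdict(lambda: defaultdict(int))
        self.uni = defaultdict(int)  # single-character counts

    def train(self, words):
        for w in words:
            for i, c in enumerate(w):
                self.uni[c] += 1
                if i >= 1:
                    self.bi[w[i-1]][c] += 1
                if i >= 2:
                    self.tri[(i, w[i-2], w[i-1])][c] += 1

    def next_chars(self, prefix):
        """Candidate next characters, most frequent first."""
        i = len(prefix)
        if i >= 2 and self.tri.get((i, prefix[-2], prefix[-1])):
            counts = self.tri[(i, prefix[-2], prefix[-1])]
        elif i >= 1 and self.bi.get(prefix[-1]):
            counts = self.bi[prefix[-1]]  # fallback: ignore position
        else:
            counts = self.uni             # last resort: unigrams
        return sorted(counts, key=counts.get, reverse=True)
```

For example, after training on a wordlist, `next_chars("pa")` uses full positional trigram statistics, while a prefix whose context never occurred in training silently degrades to the coarser counts instead of producing nothing.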