Message-ID: <20241030040147.GA26754@openwall.com>
Date: Wed, 30 Oct 2024 05:01:47 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

On Tue, Oct 29, 2024 at 10:21:39PM -0400, Matt Weir wrote:
> I published a blog post explaining how the new tokenizer attack works as
> well as detailing instructions on how to configure and run it. Link:
> https://reusablesec.blogspot.com/2024/10/running-jtrs-tokenizer-attack.html

Good stuff.  I like your introduction.  Indeed, this isn't to be called
"Markov phrases" - that's just the Subject line on this thread.  It is an
alternative to applying a Markov model to entire words (which is what the
thread was originally about), instead applying it to arbitrary substrings.

Where you suggest copy-pasting into john.conf, I instead suggest simply:

./tokenize.pl TRAINING_PASSWORDS.txt > john-local.conf

This file is automatically included, and the sed line in there is no
problem - it is treated as a comment.

Where you observe the first 25(ish) guesses becoming visibly worse, I
guess that's because your training set is worse (even before the
tokenizer).  You train on whatever 1 million passwords, but the .chr
files supplied with JtR were trained on the full RockYou (32 million
including dupes).  If you want to show the effect of the tokenizer
alone, you need to re-train both with and without the tokenizer on the
same input (I did).

Here are the first 25 I am getting for my tokenizer-enabled
RockYou-trained file (as used in the previous tests I posted about):

$ ./john -inc=custom -ext=untokenize -stdo -max-candidates=25
Warning: only 253 characters available
123456
12345
loveme
marian
12345a
mario
lovely
lovelove
justin
maria
superman
12341234
123412345
marie1
marie123
lovers
lover1
123457
12341231
mariel
marie2
lovely1
lovers1
12345j
12341235

This looks much better than what you observed.

> The tests I'm interested in running are comparing the tokenizer attack
> vs. standard incremental against different datasets and paired with
> different attacks. Aka you ran a quick wordlist attack first (using
> RockYou), so it'll be interesting to see how tokenizer works in
> conjunction with other attacks vs. it being a stand-alone attack. Also
> I have concerns about using HIBP as a test list. That's probably worth
> a whole other post/email, but long story short I'm really interested to
> see how tokenizer does against a site specific password dump vs. a more
> generic "combined leak list".

For combining with other attacks, it is possible that training on
stronger passwords (excluding those not matching a "policy") may yield
better results (more new cracks on top of what other attacks find), see:

https://github.com/openwall/john/issues/5220

I wonder how this fits in with tokenization - probably it's orthogonal,
but I'm not sure.

> Cheers and thanks for all the great work. I'm really looking forward to
> better understanding this tool!

Thank you very much.  I'm looking forward to your test results.

BTW, I notice you link to your previous blog posts on incremental mode
from 2009-2010.  If you compare against those old results now, please be
aware that I improved the incremental mode itself significantly in 2013
("such that the counts of character indices grow independently for each
position", as my commit message says).

Alexander
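
P.S. In case it helps with reproducing an apples-to-apples comparison,
here is a rough sketch of the re-training workflow.  The .chr and .pot
file names below are placeholders of my choosing, and the actual command
for encoding the training set is the commented-out sed line inside the
generated john-local.conf, so treat this as an outline rather than exact
commands:

# 1. Generate the tokenizer config: it defines the custom incremental
#    mode, the "untokenize" external mode, and (as a comment) a sed
#    command for encoding the training passwords.
./tokenize.pl TRAINING_PASSWORDS.txt > john-local.conf

# 2. Run that sed command over TRAINING_PASSWORDS.txt to get, say,
#    tokenized-training.txt, then turn it into a fake pot file and
#    train a .chr file on it.
sed 's/^/:/' tokenized-training.txt > tokenized.pot
./john --pot=tokenized.pot --make-charset=custom.chr

# 3. For a fair baseline, train on the very same input with no
#    tokenization.
sed 's/^/:/' TRAINING_PASSWORDS.txt > plain.pot
./john --pot=plain.pot --make-charset=baseline.chr

# 4. Generate candidates through the tokenizer mapping.
./john -inc=custom -ext=untokenize -stdout -max-candidates=25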
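
P.P.S. As a purely hypothetical illustration of the "policy" idea above
(the 8+ characters, letters-plus-digits rule here is an arbitrary
stand-in, not the policy discussed in that issue), pre-filtering the
training set could be as simple as:

grep -E '.{8,}' TRAINING_PASSWORDS.txt | grep '[0-9]' | grep -i '[a-z]' \
    > POLICY_PASSWORDS.txt
./tokenize.pl POLICY_PASSWORDS.txt > john-local.conf

and then training on POLICY_PASSWORDS.txt as above.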