Message-ID: <20241031160347.GA4391@openwall.com>
Date: Thu, 31 Oct 2024 17:03:47 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

On Thu, Oct 31, 2024 at 01:27:25PM +0100, magnum wrote:
> On 2024-10-30 03:21, Matt Weir wrote:
> >I published a blog post explaining the new tokenizer attack works as well
> >as detailing instructions on how to configure and run it. Link:
> >https://reusablesec.blogspot.com/2024/10/running-jtrs-tokenizer-attack.html
>
> Good stuff (not only the blog post but this whole thread). Perhaps
> stating the obvious, you need to ensure the original wordlist is pure
> ascii, or any parts of UTF-8 and/or legacy codepage stuff will be
> erroneously detokenized.
>
> BTW shouldn't the sed stuff all be /g? As in "s/me/\xa1/g;". If not,
> words like "meme" or "james+me" would only have the first instance
> tokenized, which I assume is not what we want.

Oh, you're absolutely correct. I've just pushed an update to tokenize.pl,
so that the generated sed expression takes care of both of these, as well
as of producing pot format output. I've also added a usage example:

grep -v '^#!comment:' password.lst | ./tokenize.pl > john-local.conf
sed -n 's/^# //p' john-local.conf > tokenize.sh
grep -v '^#!comment:' password.lst | sh tokenize.sh > fake.pot
./john --pot=fake.pot --make-charset=custom.chr
./john --incremental=custom --external=untokenize --stdout --max-candidates=10
./john --incremental=custom --external=untokenize hashfile

And this reminded me - in the test results I posted, I had actually run
tokenize.pl like the above, so it was trained on our password.lst (a
subset of RockYou overlapping with top HIBP), even though for further
incremental mode training I used the full RockYou (with dupes, to match
what we did for the released .chr files). Then in the message I posted,
I wrongly wrote that I had trained both the tokenizer and incremental
mode on the same input. Oops. Sorry.
I think this doesn't invalidate my results, but it does make them
inconsistent with the way I described them in that message - now
corrected with this paragraph. Of course, we need to run more and
proper tests.

Alexander
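Here is a small shell sketch of magnum's two points. It is an illustration under stated assumptions, not tokenize.pl's actual output. It assumes a POSIX sh, sed and grep. "X" stands in for the non-ASCII token byte (such as \xa1) that the generated sed expression would emit, and words.txt is a hypothetical two-word list, not password.lst.

```sh
#!/bin/sh
# Hypothetical sample wordlist (not password.lst), one word per line.
printf '%s\n' meme 'james+me' > words.txt

# "X" is a readable stand-in (assumption) for a token byte such as \xa1.
# Without /g, sed replaces only the first "me" on each line.
first_only=$(sed 's/me/X/' words.txt)
# With /g, every occurrence on the line is replaced.
all=$(sed 's/me/X/g' words.txt)

echo "without /g:"; echo "$first_only"
echo "with /g:";    echo "$all"

# Token bytes lie outside ASCII, so a wordlist that already contains such
# bytes (UTF-8, legacy codepages) would be untokenized incorrectly.
# In the C locale, [^ -~] matches any byte outside printable ASCII
# (0x20-0x7E), which also flags control characters such as tabs.
if LC_ALL=C grep -q '[^ -~]' words.txt; then
  echo "words.txt contains non-ASCII or control bytes"
else
  echo "words.txt is printable ASCII only"
fi
```

Without /g, "meme" becomes "Xme" and "james+me" becomes "jaXs+me". With /g they become "XX" and "jaXs+X". You could run the same grep against a real wordlist before tokenizing to check that it is pure ASCII.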