john-users - Re: Markov phrases in john

Message-ID: <CAJ9ii1HuFQE41qZHPqW7Z6fOiBcRCtr6phPqvXzqaXcu14PJ+g@mail.gmail.com>
Date: Tue, 29 Oct 2024 22:21:39 -0400
From: Matt Weir <cweir@...edu>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

Thanks for the explanation!

I published a blog post explaining how the new tokenizer attack works, as
well as detailing instructions on how to configure and run it. Link:
https://reusablesec.blogspot.com/2024/10/running-jtrs-tokenizer-attack.html

That post won't contain any new information that isn't already present in
this e-mail chain, but I find pictures help. I'm reluctantly splitting up
any analysis into a different blog post since it's late and I'm not sure if
I'll have time to run any serious tests in the next couple of days. But
hopefully this will be helpful for others who want to run their own tests.

The tests I'm interested in running compare the tokenizer attack vs.
standard incremental against different datasets and paired with different
attacks. For example, you ran a quick wordlist attack first (using RockYou),
so it'll be interesting to see how tokenizer works in conjunction with other
attacks vs. as a stand-alone attack. I also have concerns about using
HIBP as a test list. That's probably worth a whole other post/email, but
long story short, I'm really interested to see how tokenizer does against a
site-specific password dump vs. a more generic "combined leak list".

Cheers and thanks for all the great work. I'm really looking forward to
better understanding this tool!

Matt/Lakiw

On Tue, Oct 29, 2024 at 12:11 AM Solar Designer <solar@...nwall.com> wrote:

> On Mon, Oct 28, 2024 at 11:46:45PM -0400, Matt Weir wrote:
> > Looking through this more, my guess is that the output of this Sed script
> > needs to be put into potfile format so I can use --make-charset on it (vs.
> > using it to generate a .chr file directly).
>
> Yes, exactly.  Something like:
>
> sed '...lots of stuff here...' TRAINING_PASSWORDS.txt | sed 's/^/:/' > fake.pot
> ./john --make-charset=custom.chr --pot=fake.pot
>
> > I can then have
> > incremental=tokenize to generate "encoded" guesses which I then need to run
> > through the JtR external mode to convert into actual password guesses.
>
> Yes, which is normally done in that same invocation, like this:
>
> ./john --incremental=custom --external=untokenize --format=nt pwfile
>
> You can redirect the output of tokenize.pl right into john-local.conf
> for the above command to work.
>
> > Side note: This is a weird edge case so very low priority request, but one
> > thing this made me realize is that it would be nice to use the
> > --make-charset option on a set of training passwords vs. a potfile to remove
> > a step in the generation process. That's just me being lazy though, and
> > I'll admit this is a task that is rare enough that optimizing it doesn't
> > provide much value.
>
> Yes, I agree this is something for us to improve.
>
> > > Step 3) Create entry in John.conf for the new charset. Example:
> > > [Incremental:Tokenize]
> > > File = $JOHN/tokenize.chr
>
> You may, or you may simply use the pre-defined Custom mode, so no edits
> are needed.
>
> Alexander
>
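For anyone following along, the "fake potfile" step quoted above can be sketched on its own. This is only an illustration of the `sed 's/^/:/'` part: the function name `make_fake_pot` and the sample passwords are made up, and the tokenizing sed script that would normally run first is left out because it is elided in the thread.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not part of JtR):
# prefix every input line with ':' so each line looks like a potfile
# entry with an empty hash field followed by the plaintext.
make_fake_pot() {
    sed 's/^/:/'
}

# Demo with made-up sample passwords; real input would be the output of
# the tokenizing sed script run over the training passwords.
printf '%s\n' 'password1' 'letmein' | make_fake_pot
```

In practice the output would be redirected to a file such as fake.pot and then passed to the --make-charset command quoted above.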