Message-ID: <CAJ9ii1GijmknTkRMWBNXLAPWE+xC4uiwZZKecoKRg=ayaqvEVA@mail.gmail.com>
Date: Sun, 17 Nov 2024 18:20:51 -0500
From: Matt Weir <cweir@...edu>
To: john-users@...ts.openwall.com
Subject: Re: Markov phrases in john

I just published a blog post comparing Tokenizer against other attack
types. Link:

https://reusablesec.blogspot.com/2024/11/analyzing-jtrs-tokenizer-attack-round-1.html

As a disclaimer: due to falling down a number of "non-tokenizer related"
rabbit holes, as well as only being able to work on this in short
bursts, I started writing this blog entry a couple of weeks ago, and I
didn't want to pivot and lose even more time. So all the tests utilize
the original version of tokenizer and don't include the improvements
discussed since then. Still, I hope this research is helpful!

The short summary of the results is:

- Tokenizer performs better than Incremental mode in the first 5
billion guesses.

- OMEN performs better than Tokenizer in the first 5 billion guesses.
But OMEN has a number of implementation challenges where an
Incremental based attack can still be more practical.

- When trying to simulate multi-stage cracking attacks, I really need
a better way to record much longer cracking sessions (aka trillions of
guesses). While Tokenizer appears to be a respectable attack to run
after a large rules/wordlist attack, the fact that I only ran it for 5
billion guesses didn't make the results trustworthy/statistically
significant. Basically, the result is that the test needs to be
redesigned vs. learning much about the actual attacks ;p

Cheers,
Matt/Lakiw

On Thu, Oct 31, 2024 at 10:15 PM Solar Designer <solar@...nwall.com> wrote:
> On Fri, Nov 01, 2024 at 12:19:00AM +0100, Solar Designer wrote:
> > On Thu, Oct 31, 2024 at 11:36:07PM +0100, Solar Designer wrote:
> > > What's more interesting, though, is that it's a way to get different
> > > passwords cracked.
> > > For example, with token length forced to 4 (for all
> > > 158 tokens, many of which are full words or years), training on RockYou
> > > without dupes, at 1 billion candidates I got 1770275 or +670876.
> > > Combining this with the above result of "1870645 or +771246" (which was
> > > for token lengths 2 to 4), I get 2123847 or +1024448. That's for 1+1=2
> > > billion candidates total. Simply continuing the first (token length 2
> > > to 4) run to 2 billion instead gives merely 2016222 or +916823.
> > >
> > > So we get 12% more combined incremental mode cracks by splitting the 2
> > > billion candidate budget into two differently tokenized 1 billion runs.
> >
> > I was also interested in how wasteful or not such split is in terms of
> > duplicate candidates.
> >
> > For the token length 2 to 4 run, we have 997250925 unique (99.7%).
> > For the token length 4 run, we have 998700856 unique (99.9%).
> > For these two combined, we have 1885325771 unique (94.3%).
> >
> > So it's only moderately wasteful (and for such counts it's practical to
> > deduplicate when hashes are slow), but could get worse for longer runs.
>
> Upon a closer look, I realize that the token length 4 run is actually a
> mix of lots of token-less passwords and also many with tokens. So it's
> an interesting and useful result, but it's not what it seemed at first -
> not so much of a focus on longer passwords in the second billion.
>
> To actually focus on longer passwords, I just processed the length 4
> token fake pot file through:
>
> sed -n '/[^ -~]/p'
>
> This leaves only lines with non-ASCII characters, which is what we use
> for tokens. Then the corresponding 1 billion run cracks only +378031,
> but the ratio of longer passwords increases (359 are length 13+, up from
> 124 before the above sed). Combined with the token length 2 to 4 run,
> it's 2018976 or +919577, which is still slightly higher than a 2 billion
> run for token length 2 to 4.
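The duplicate-candidate accounting quoted above can be sketched in a few lines (a toy illustration with made-up candidate streams, not the actual measurement):

```python
# Sketch of the duplicate-candidate accounting: given the candidate
# streams of two runs, count how many candidates are unique across the
# combination. The tiny in-memory lists are stand-ins for the real
# 1-billion-candidate runs.
def unique_count(*streams):
    seen = set()
    total = 0
    for stream in streams:
        for cand in stream:
            total += 1
            seen.add(cand)
    return len(seen), total

run_a = ["sweet1", "love123", "jo1234"]      # token length 2 to 4 (toy)
run_b = ["love123", "ma1234", "sweetgirl1"]  # token length 4 (toy)

uniq, total = unique_count(run_a, run_b)
print(uniq, total, f"{100 * uniq / total:.1f}% unique")  # 5 6 83.3% unique
```

With the real runs this is the computation behind the 99.7%, 99.9%, and 94.3% unique figures; in practice one would stream and deduplicate on disk rather than hold a set of two billion candidates in memory.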
>
> To fully exclude token-less passwords from this second run, I modified
> the external mode:
>
> - word[k] = 0;
> +
> + if (i == k)
> + word = 0;
> + else
> + word[k] = 0;
>
> (This filters out candidate passwords for which the length was left
> unchanged by token substitution, which means they had no tokens.)
>
> Then it cracks only +156803, which obviously leaves it behind a simple 2
> billion run for token length 2 to 4. The number of cracked length 13+
> passwords increases only a bit further (387, up from 359 above). First
> 25 candidates from this run are:
>
> master1
> malove
> minnie1
> melove
> jameslove
> jolove
> samanda
> sweetygirl
> sweets1
> ming1234
> ma1234
> me1234
> james1234
> jo1234
> masters
> mara123
> minnie2
> sweety1
> sweets3
> may1234
> miamor1
> miamore
> sara123
> sweetgirl1
> sweetgirl9
>
> Length 16+ cracked are:
>
> mariannamarianna
> ilovemyfamily123
> lovelovelove1994
> angelinaangelina
> alexalexalex2007
> bellababygirl2007
> sexygurl4eva1992
> cherryberry2cute
> 1989198919891989
> danceamandadance
> bearbearbearbear
> moneyoverbitches1
> 2005200520052005
> 0000000000002008
> ilovestephanie11
>
> (Those mostly with repetitions should of course also be crackable with
> wordlist+rules.)
>
> Modifying the external mode to insist on at least 2 tokens (length
> increase greater than 4) results in the below first 25 candidates:
>
> jameslove
> sweetygirl
> james1234
> sweetgirl1
> sweetgirl9
> amberlove
> amber1234
> jamesbaby
> amberbaby
> moneylove
> money1234
> moneybaby
> jerry1234
> jerrybaby
> jerrygirl
> sweetgirl2006
> sweetlove1
> sweetlove4
> sweetlover
> moneygirl
> jamesgirl
> jerryange
> ambergirl
> sweetlove
> sweet1234
>
> This gets closer to "Markov phrases", although words longer than 4 are
> formed from the tokens plus individual letters. Unfortunately, this
> cracks only +14391 in 1 billion, out of which 427 are length 13+ (still
> an increase compared to previous runs).
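The two filter variants described above can be modeled outside of external mode. This is a rough Python sketch: the token table and function names are invented for illustration; in the real setup each token is a single non-ASCII byte that the external mode expands, and the filter compares the candidate's length before and after token substitution.

```python
# Toy model of the external mode filter: a raw candidate is a string of
# single-byte tokens and literal characters; expansion replaces each
# token byte with its multi-character string. If expansion leaves the
# length unchanged, the candidate contained no tokens.
TOKENS = {"\x80": "love", "\x81": "1234", "\x82": "sweet"}  # hypothetical

def expand(raw):
    return "".join(TOKENS.get(ch, ch) for ch in raw)

def keep(raw, min_growth=1):
    # min_growth=1: reject token-less candidates (the first patch).
    # min_growth=5: require a length increase greater than 4, i.e. at
    # least 2 tokens when every token expands to 4 characters (the
    # "insist on at least 2 tokens" variant).
    return len(expand(raw)) - len(raw) >= min_growth

print(keep("master1"))                 # False: no tokens
print(keep("jo\x81"))                  # True: one token ("jo1234")
print(keep("jo\x81", min_growth=5))    # False: only one token
print(keep("\x82\x80", min_growth=5))  # True: two tokens ("sweetlove")
```

Note this models only the length comparison; the actual patch does the equivalent check in external mode's filter() by comparing the pre-substitution index against the final length.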
> It may be worth retesting this kind of filtering with shorter tokens,
> as I guess at length 4 the low number of available tokens becomes too
> much of a limiting factor for which passwords may be formed.
>
> Besides longer passwords still being relatively rare and these attacks
> not being as effective as wordlist+rules at cracking them, yet another
> factor is that longer passwords - and especially non-wordlist-crackable
> ones - may be under-represented in HIBP compared to real-world usage.
> That's because HIBP is compiled largely from previously-cracked
> passwords (many of them from long ago), not only from plaintext leaks.
> So whatever passwords others couldn't crack before are simply not in
> there, unless the specific leak was plaintext. In this context, Matt's
> suggested testing "against a site specific password dump" makes even
> more sense.
>
> Alexander