Message-ID: <20210503141815.GA6683@openwall.com>
Date: Mon, 3 May 2021 16:18:15 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: source of information for John's charset files

On Sun, May 02, 2021 at 11:00:34PM -0400, Matt Weir wrote:
> I apologize in advance if I misunderstood your testing procedure or your
> results, but using the HIBP list as a test set is really problematic when
> applying that to normal password cracking sessions.
>
> Duplicates matter and our techniques should reflect that. Making guesses of
> '123456' and 'password' before 'ajger' should be rewarded, but using the
> HIBP list all three guesses are awarded the same value. I could see
> excluding the top 10k password guesses from an incremental training set,
> (since '123456' and 'password' will be almost certainly cracked by a
> dictionary attack), to optimize how incremental plays with brute-force, but
> even that approach while it seems like it makes sense, has backfired on me
> every time I have tried it, resulting in worse results when applying it to
> new datasets.

FWIW, the current HIBP hash lists do include the counts, so we can repeat
each password the specified number of times and use that in our training
or/and test sets, if we want to.

My expectation is that training on unique passwords only will in fact
reduce the number of cracked accounts when only incremental mode is used.
However, after having run through RockYou as a wordlist, it's not obvious
whether it's beneficial for incremental mode to also favor the repeated
passwords like 123456.

What you suggest about excluding e.g. just the top 10k makes sense.
Another approach I thought of, but I don't recall trying, is to apply a
logarithmic scale to the counts. For example, for passwords appearing
1000+ times include them 4 times, for 100+ include them 3 times, etc.

> On a different point, I am totally ok with updating the training set from
> RockYou.
> I could go on and on about the weirdness of that dataset, not to
> mention that it really is showing its age. The gold standard right now of
> public datasets would probably be the LinkedIn list, which also is showing
> its age, but is a bit more comparable to current web passwords.

An advantage of RockYou is that it's easily available to everyone.
Another advantage is that it's plaintext, so not biased toward what was
crackable, or toward what a person having downloaded LinkedIn hashes would
crack if they wanted to (re)generate .chr files from that.

> Side note, I just saw your most recent results of training/running against
> RockYou. I'm willing to admit I'm wrong if you are getting better results
> training without dupes. That's just contrary to what I've seen in the past.
> I might need to run some tests of my own to look into this.

Note: the results are better when the test set is also without dupes.
However, I think that's what matters in real-world usage of our tools,
after most dupes have been eliminated using a wordlist anyway.

Even newer results below:

> On Sun, May 2, 2021 at 5:39 PM Solar Designer <solar@...nwall.com> wrote:
> > On Sun, May 02, 2021 at 11:21:34PM +0200, Solar Designer wrote:
> > > Anyway, I just ran some tests the other way around - "cracking" RockYou
> > > passwords. I didn't try excluding RockYou itself from the training sets
> > > here - can't do that while including our current .chr files in the
> > > comparison. So this is in-sample testing, which is generally a wrong
> > > thing to do, but with that in mind here are the results for different
> > > training sets (all are for incremental mode and 1 billion candidates):
> > >
> > > RockYou with dupes - 20.2%
> > > RockYou unique - 21.9%
> > > HIBPv7 cracked - 17.9%
> > >
> > > The percentages cracked are those of RockYou unique.
> > >
> > > Not surprisingly, RockYou is best fit for itself. HIBP is an acceptable
> > > fit as well.
> > > It could have potentially performed better than RockYou
> > > on this test due to its larger size, but as we can see that was not
> > > enough to overcome it not being such a perfect fit as RockYou itself.
> >
> > FWIW, RockYou unique being best fit for itself persists after I shuffled
> > it and split it into a 1M test set and 13.3M training set (no matching
> > passwords in the sets, but both sets are parts of RockYou). Got 21.5%.

I decided to test these not only at 1 billion candidates, but also at
other points. I use three training sets: RockYou with dupes (same as was
used to generate our currently bundled .chr files - in fact, I just reuse
ascii.chr from there), RockYou unique shuffled with the 1M test set
removed from it (so a 13.3M training set), and HIBP v7 458M cracked (after
removal of the fbobh_* pattern). The test set is always the mentioned 1M
from RockYou unique shuffled.

Here are the percentages of the test set cracked at 10M, 100M, 1G, 10G,
and 100G candidates:

Training set           10M  100M    1G   10G  100G
RockYou with dupes    4.6% 10.2% 20.2% 33.3% 48.0%
RockYou -1M unique    4.7% 11.2% 21.5% 35.0% 48.3%
HIBP v7 cracked       3.2%  8.7% 17.8% 30.0% 44.5%

So despite "RockYou -1M unique" being the only 100% out-of-sample test
(no password appears in both the training and the test set) and also
having the smallest training set (at 13.3M), it outperforms the two other
tests across this whole range.

Of course, HIBP performing worse doesn't necessarily mean it's a worse
choice in general - just that it's a worse fit for RockYou. We've also
seen that when using a portion of HIBP as the test set, things are the
other way around - training on the rest of HIBP produces better results
than training on RockYou does.
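The shuffle-and-split step described above (RockYou unique, shuffled, with a 1M test set removed and the remaining 13.3M used for training) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually used for these tests: the function names, the fixed seed, and reading the wordlist as raw bytes are all hypothetical choices. Deduplicating before splitting is what guarantees that no password appears in both sets.

```python
import random
from typing import Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T", bound=Hashable)


def read_wordlist(path: str) -> List[bytes]:
    """Read one password per line as raw bytes (hypothetical helper).

    Bytes are used because RockYou is not valid UTF-8 throughout.
    """
    with open(path, "rb") as f:
        return [line.rstrip(b"\r\n") for line in f]


def unique_shuffle_split(passwords: Iterable[T], test_size: int,
                         seed: int = 0) -> Tuple[List[T], List[T]]:
    """Deduplicate, shuffle, and split into (test, training) lists.

    Because duplicates are dropped first, the two lists share no password.
    """
    unique = list(dict.fromkeys(passwords))  # drop dupes, keep first-seen order
    if not 0 <= test_size <= len(unique):
        raise ValueError("test_size out of range")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(unique)
    return unique[:test_size], unique[test_size:]


if __name__ == "__main__":
    # Toy input. For RockYou this would be something like
    # unique_shuffle_split(read_wordlist("rockyou.txt"), 1_000_000),
    # where the file name is hypothetical.
    words = ["123456", "password", "123456", "iloveyou", "ajger", "password"]
    test, train = unique_shuffle_split(words, 2, seed=1)
    print("test:", test)
    print("train:", train)
    assert not set(test) & set(train)
```

Seeding a private `random.Random` instance, rather than the module-level generator, keeps the split reproducible across runs, which matters when several training sets are compared against the same test set.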
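The logarithmic weighting of counts suggested earlier in this message (passwords seen 1000+ times included 4 times, 100+ times 3 times, and so on) amounts to repeating each password once per decimal digit of its count. Reading the example literally, 4 copies is treated here as a cap. The sketch below assumes (password, count) pairs are already available, for example cracked HIBP plaintexts joined with the counts from the hash lists. The function names and the cap parameter are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def log_repeats(count: int, cap: int = 4) -> int:
    """Copies to include for a password seen `count` times.

    1-9 -> 1, 10-99 -> 2, 100-999 -> 3, 1000+ -> 4 (with the default cap).
    The decimal digit count equals floor(log10(count)) + 1; computing it on
    integers avoids floating-point edge cases at exact powers of 10.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return min(len(str(count)), cap)


def expand_weighted(pairs: Iterable[Tuple[str, int]],
                    cap: int = 4) -> Iterator[str]:
    """Yield each password log_repeats(count, cap) times, in input order."""
    for password, count in pairs:
        for _ in range(log_repeats(count, cap)):
            yield password


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    sample = [("123456", 5000), ("password", 250), ("ajger", 1)]
    # Prints 123456 four times, password three times, ajger once.
    for line in expand_weighted(sample):
        print(line)
```

Compared to repeating each password its full count, this keeps very common passwords like 123456 from dominating the training set while still weighting them above one-off passwords.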
BTW, each of these being the best fit for itself (even without overlap in
actual passwords between the test and training sets) could be due not only
(or not so much) to password patterns, but also to the password length
distribution (as incremental mode switches lengths back and forth based on
what it was trained on).

Also curious is how many different passwords the different training sets
crack between them. At the 100G mark, the three runs above cracked a total
of 52.5%. Grouping by two:

RockYou dupes + HIBP   - 51.4%
RockYou unique + HIBP  - 51.0%
RockYou dupes + unique - 50.3%

Combining one 10G run with one 100G run yields 47.0% (RockYou unique 10G
with HIBP 100G) to 48.8% (HIBP 10G with RockYou unique 100G).

Alexander
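The combined figures above (52.5% for all three runs at 100G, 51.4% for RockYou dupes + HIBP, and so on) are the size of the union of test passwords cracked by the runs in a group, divided by the test set size. A minimal sketch of that computation over every combination of runs follows, assuming each run's cracked test-set passwords have already been collected into a set; the function name, run names, and data are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def union_coverage(cracked: Dict[str, Set[str]],
                   test_size: int) -> Dict[Tuple[str, ...], float]:
    """Percent of the test set cracked by each combination of runs.

    `cracked` maps a run name to the set of test-set passwords it cracked.
    A password cracked by several runs in a combination is counted once.
    """
    names = sorted(cracked)
    coverage: Dict[Tuple[str, ...], float] = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            union: Set[str] = set().union(*(cracked[n] for n in combo))
            coverage[combo] = 100.0 * len(union) / test_size
    return coverage


if __name__ == "__main__":
    # Hypothetical toy data: a 10-password test set and three runs.
    runs = {
        "hibp": {"p1", "p2", "p3"},
        "rockyou_dupes": {"p1", "p4"},
        "rockyou_unique": {"p1", "p2", "p5"},
    }
    for combo, pct in union_coverage(runs, test_size=10).items():
        print(" + ".join(combo), f"- {pct:.1f}%")
```

Mixed cutoffs such as RockYou unique at 10G combined with HIBP at 100G fit the same scheme if each run is keyed by both its training set and its candidate count.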