Date: Mon, 25 Jan 2021 19:54:33 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: source of information for John's charset files

On Mon, Jan 25, 2021 at 07:35:48PM +0100, Johny Krekan wrote:
> Do I understand correctly that the rockyou list you used had duplicate
> words in it, for example the word "example" appearing twice? My
> question is: what is the reason for, or advantage of, using such a
> wordlist with duplicates compared with a wordlist with no duplicates?
> If I create one .pot file from rockyou with no duplicates, would it
> give a worse probability of finding the password in the same time as
> yours?

A reasonable expectation is that including duplicates in the training
set increases the number of cracked accounts, rather than the number of
cracked unique passwords, in subsequent password security audits.
Conversely, omitting the duplicates would possibly optimize for cracking
more unique passwords but perhaps fewer accounts.  An alternative
hypothesis is that including duplicates might also help crack more
unique passwords that are based on frequent substrings, even if those
substrings came from fully duplicate passwords, since otherwise they
would be under-represented in the training set.  You and/or others are
welcome to research whether these hypotheses are true.  I no longer
recall the results of my own testing from back when I made this choice
(IIRC, in 2013).
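
A toy sketch of the two metrics and of the substring effect follows.
It is only an illustration under stated assumptions: the sample
passwords are made up, the function names are hypothetical, and plain
trigram counting stands in for whatever statistics the real charset
generation collects.  It is not John's training code.

```python
from collections import Counter


def trigram_counts(passwords):
    """Count character trigrams over every entry in the list."""
    counts = Counter()
    for pw in passwords:
        for i in range(len(pw) - 2):
            counts[pw[i:i + 3]] += 1
    return counts


def audit_score(guesses, accounts):
    """Return (cracked accounts, cracked unique passwords)."""
    guessed = set(guesses)
    hits = [pw for pw in accounts if pw in guessed]
    return len(hits), len(set(hits))


# Hypothetical leak: one entry per account, so duplicates are kept.
leak = ["love123", "love123", "love123", "zebra9", "iloveyou"]

# Training statistics with and without duplicate passwords.
with_dupes = trigram_counts(leak)
without_dupes = trigram_counts(sorted(set(leak)))

# "lov" is weighted more heavily when duplicates are kept.
print(with_dupes["lov"], without_dupes["lov"])

# One popular guess versus two rare ones.
print(audit_score(["love123"], leak))
print(audit_score(["zebra9", "iloveyou"], leak))
```

Here the single popular guess cracks more accounts, the two rare
guesses crack more unique passwords, and the substring "lov" gets a
higher count only when duplicates stay in the training set.  Whether
either effect carries over to real charset files is exactly what
would need testing.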

Alexander
