john-users - Re: Loading a large password hash file

Message-ID: <20160713140750.GA24793@openwall.com>
Date: Wed, 13 Jul 2016 17:07:50 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Loading a large password hash file

On Wed, Jul 13, 2016 at 09:11:24AM +0200, Albert Veli wrote:
> For the record, on my computer it's faster to split the hashfile and loop
> than waiting for the whole file to load. About 10 million lines per
> hashfile seems to be a good value for my computer:
> 
> split -l 10000000 huge_hashlist.txt
> 
> that creates split files with filenames xaa, xab etc. Then loop:
> 
> for hl in x*; do ./john --fork=4 --format=Raw-SHA1 <more arguments> $hl;
> done

Having to split at only 10 million lines is unreasonably low for the
current code.  What version of JtR are you using?  Relevant improvements
were made in September 2015; maybe yours is older than that?  How much
RAM do you have?  How long does JtR take to load those 10 million lines?
And how long does it take to load the whole input file (and how large is
that file in your case)?

You might want to set "NoLoaderDupeCheck = Y" in john.conf, especially
if your hash file is already unique'd.  This is also a reason to run the
file through unique just once instead of having JtR eliminate dupes
again on every load.  Moreover, when you split the file, JtR's dupe
elimination isn't fully effective (it can't detect dupes across the
different split portions), so you may well prefer to unique first and
optionally split next.  I mean:

./unique -mem=25 new-hash-file < old-hash-file
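
To illustrate why the order matters, here is a toy sketch that needs only
standard tools, not JtR.  It assumes awk '!seen[$0]++' as a stand-in for
./unique (it is not the same program), and the file names, chunk size and
sample "hashes" are made up for the demonstration.  It compares the
number of lines left to load when splitting before vs. after dupe
removal:

```shell
# Toy demonstration only: awk '!seen[$0]++' stands in for ./unique,
# and all file names and sample lines below are hypothetical.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Six lines, four distinct; the duplicates are placed far apart on purpose
# so that they end up in different chunks.
printf '%s\n' hashA hashB hashC hashA hashD hashB > old-hash-file

# Order 1: split first, then remove dupes inside each chunk separately.
split -l 3 old-hash-file chunk_
per_chunk_total=0
for f in chunk_*; do
    n=$(awk '!seen[$0]++' "$f" | wc -l)
    per_chunk_total=$((per_chunk_total + n))
done

# Order 2: remove dupes across the whole file once, then split.
awk '!seen[$0]++' old-hash-file > new-hash-file
whole_file_total=$(($(wc -l < new-hash-file)))
split -l 3 new-hash-file uniq_chunk_

echo "split then dedupe: $per_chunk_total lines to load"
echo "dedupe then split: $whole_file_total lines to load"
```

With this sample input the first order leaves 6 lines and the second
leaves 4, since per-chunk dupe removal never sees that hashA and hashB
recur in the other chunk.  On a real hash file you would use ./unique as
above and a much larger split size.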

Alexander
