john-users - Re: Splitting workload on multiple hosts

Message-ID: <20180409190300.GA22773@openwall.com>
Date: Mon, 9 Apr 2018 21:03:00 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Splitting workload on multiple hosts

On Mon, Apr 09, 2018 at 01:37:30PM -0400, Rich Rumble wrote:
> Sorry to dredge this subject back up, I'm not convinced Fork is fully using
> all 24 CPUs in my single machine to the best of its ability, on an
> "incremental" run I'm doing. Will some modes work better in fork than
> others? I know certain algorithms do, and mine is one of them (raw-sha1). I
> have a few (other) issues, one being the hashes I'm going after are enormous
> and I can't fit them all in ram at once (HaveIBeenPwnd v2) so I've split
> them up into 20 1Gb slices. Perhaps a new thread may be needed for the
> incremental issue I'm not sure, but using -fork=24 seems to only see 6-8
> threads of 100% util, and status updates are also between 6-8 when pressing
> a key. So I have found I can load four 1Gb slices in ram (save-mem=2), and
> run fork=6 on those. In doing that I appear to have some overlap, in that
> some threads are being used twice for work, but I'm not 100% sure. But if I
> stop one of the four runs, as soon as it's stopped one or two of the
> remaining three start churning out passwords like crazy. I do not think
> this is a problem fork/node are there to solve, but was curious if there
> was a way to make sure work in cpu/threads 1-6 are only done by this john
> instance, and work for the other john instance 1-6 are only done by
> cpu/threads 7-12. Since I'm doing different work, I didn't think node would
> be the answer for that, I figured the potential for overlap would be the
> same even if I specified node=0-5 for each instance.

How much RAM do you have?  It sounds like some of the child processes
are simply getting killed on out-of-memory.  Unfortunately, when JtR
cracks a password the child processes deviate from each other in their
memory contents, and their combined memory usage grows.  This is not
ideal, but that's how it currently is with "--fork".

You'll want to get most of those HaveIBeenPwnd v2 passwords cracked
while running fewer processes (e.g., initially just one or two so that
you can possibly load all of the hashes at once) before you proceed to
attempt using all 24 processes, one per CPU in your machine.
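
One way to picture the staged approach is as a list of fork counts to
step through between runs. The sketch below is only an illustration:
the doubling rule and the function name fork_schedule are assumptions,
not something JtR provides or the advice above prescribes. Move to the
next stage only once most hashes are cracked and RAM usage looks safe.

```python
def fork_schedule(max_forks, start=2):
    """Return a list of fork counts to try, from `start` up to `max_forks`.

    The doubling step is an arbitrary assumption for illustration;
    any increasing sequence that ends at max_forks would do.
    """
    if start < 1 or max_forks < start:
        raise ValueError("need 1 <= start <= max_forks")
    counts = []
    n = start
    while n < max_forks:
        counts.append(n)
        n *= 2  # hypothetical growth rule, not a JtR recommendation
    counts.append(max_forks)
    return counts
```

For a 24-CPU machine this gives 2, 4, 8, 16 and finally 24 processes.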

Helpful john.conf settings:

NoLoaderDupeCheck = Y

This is the default anyway, but maybe worth double-checking:

ReloadAtCrack = N

These are not obviously an improvement (with these at "N", the pot file
may grow larger from more duplicate entries, but cracking will be faster
and the memory usage increase from copy-on-write across --fork'ed
processes should be less, so more of them may be run):

ReloadAtDone = N
ReloadAtSave = N

Helpful command-line options:

-verb=1 -nolog -save-mem=1

"-save-mem=1" should actually speed things up by not wasting memory on
pointers to (non-existent) login names, which also improves the locality
of reference.  "-save-mem=2" has a performance impact and is probably
not worth it in this case.
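
As a rough sketch, the options above can be assembled into a full
command line. The helper name build_john_command and the file name in
the usage note are hypothetical. The raw-sha1 format and incremental
mode are taken from the quoted question, and the option spellings are
the abbreviated ones used in this thread. The helper only builds the
argument list, it does not run anything.

```python
def build_john_command(hash_file, forks, john="john"):
    """Build an argument list for an incremental raw-sha1 run.

    `hash_file` and `john` (the path to the binary) are supplied by
    the caller; nothing here checks that they exist.
    """
    if forks < 1:
        raise ValueError("forks must be at least 1")
    cmd = [
        john,
        "--format=raw-sha1",  # format named in the quoted question
        "--incremental",      # mode named in the quoted question
        "-verb=1",            # options suggested in this message
        "-nolog",
        "-save-mem=1",
    ]
    if forks > 1:
        # A single process needs no fork option at all.
        cmd.append(f"-fork={forks}")
    cmd.append(hash_file)
    return cmd
```

For example, build_john_command("slice01.txt", 6) yields a list that
can be passed to subprocess.run() or joined with spaces for a shell.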

You may also want to increase PASSWORD_HASH_SIZE_FOR_LDR in params.h by
one (from 4 to 5) to speed up loading of large hash files like this, and
rebuild.  (The same change slows down loading of small files, which is
why it's not the default.)

FWIW, I previously experimented with HaveIBeenPwnd v1, which was 320M
hashes.  I loaded those all at once (without splitting) and was able to
run a few forks at first (4 or so) and all 40 forks eventually on a
machine with 128 GB RAM with 40 logical CPUs.

You really need to watch your RAM usage when you do things like this.
If you see less than half of RAM free, chances are it will be eaten up
and some children will die as they crack more passwords.  So try to keep
your fork count such that you leave half of RAM free when cracking
just starts.
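
The half-of-RAM rule of thumb can be checked mechanically on Linux.
The sketch below parses text in /proc/meminfo format and compares
available memory against a threshold. Treating the MemAvailable field
as "free" is an assumption (it is the kernel's estimate and only exists
on reasonably recent kernels), and the function names are made up for
this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most values are in kB; the unit suffix is ignored.
    """
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def free_fraction(fields):
    """Fraction of total RAM the kernel reports as available."""
    return fields["MemAvailable"] / fields["MemTotal"]


def enough_headroom(fields, min_free=0.5):
    """True if at least `min_free` of RAM is still available."""
    return free_fraction(fields) >= min_free
```

On a Linux host, passing the contents of /proc/meminfo to
parse_meminfo() shortly after the forked processes have started and
then calling enough_headroom() shows whether the current fork count
still leaves half of RAM available.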

Alexander