john-users - Re: automating --fork with ZTEX and variable device counts

Message-ID: <20200815125011.GA18736@openwall.com>
Date: Sat, 15 Aug 2020 14:50:11 +0200
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: automating --fork with ZTEX and variable device counts

On Fri, Aug 14, 2020 at 12:16:10PM -0800, Royce Williams wrote:
> Any tips for automating the use of --fork when the number of ZTEX boards
> can vary with device stability? (I've read OPTIONS and FAQ and don't see
> anything I can leverage.)

This problem was new to me, and I didn't have any tips.  Let's discuss:

> Since a board or two sometimes drifts in and out, hard-coding the fork
> count (so that it matches the number of devices specified on the
> commandline, etc) works for a while, but then if a board reappears or
> drops, the device count isn't cleanly divisible by the number of forks
> anymore, so john exits. This is manageable when doing individual
> interactive runs of john, but does not lend itself well to loops or
> automation.
> 
> Since the fork count is validated at runtime after starting john, even if I
> had a way to reliably check device count externally, the device count can
> (and does) change between when the check is performed and when john is
> started. So it seems that a pre-runtime check would not be useful to handle
> this case.

We're in fact checking that "Number of ZTEX devices must be a multiple
of forks."  We only do this at startup.  If the number changes while
john is running, it does not exit, but some of the forked processes may
work with fewer boards than intended until those boards come back up.
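To make the startup rule concrete, here is a minimal Python sketch of the
check and of an even split of boards among forks.  Only the divisibility
rule and its error text come from john.  The contiguous assignment and the
board names are assumptions for illustration, and john's real order of
assigning boards to forks may differ.

```python
from typing import List


def check_fork_count(num_devices: int, forks: int) -> None:
    """Mirror john's startup rule: the ZTEX device count must be a multiple of forks."""
    if forks < 1:
        raise ValueError("fork count must be at least 1")
    if num_devices % forks != 0:
        raise ValueError("Number of ZTEX devices must be a multiple of forks")


def split_devices(devices: List[str], forks: int) -> List[List[str]]:
    """Hypothetical even split: fork i gets the i-th contiguous slice of boards."""
    check_fork_count(len(devices), forks)
    per_fork = len(devices) // forks
    return [devices[i * per_fork:(i + 1) * per_fork] for i in range(forks)]


if __name__ == "__main__":
    boards = ["board0", "board1", "board2", "board3"]  # hypothetical board names
    print(split_devices(boards, 2))
    # One board was knocked out before the restart: only 3 boards are left,
    # so both --fork=2 and --fork=4 are rejected at startup.
    for forks in (2, 4):
        try:
            split_devices(boards[:3], forks)
        except ValueError as err:
            print(f"--fork={forks}: {err}")
```

The check runs once; a board that drops after the split simply leaves its
fork with fewer boards, matching the behavior described above.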

So you're saying the race window between an external check you might
have in a script and john's check at startup is too wide for your
cluster or use case.  Or maybe the check you perform externally is
somewhat different from what john performs; e.g., perhaps it doesn't
check whether a board is possibly still in use by another john.

I ran some experiments of my own with four boards.  By deliberately
using a clock rate high enough to knock one of the boards out (160 MHz
for bcrypt), I was able to get john to recognize only 3 boards when it
was restarted for a new attack.  It then did in fact bail out on both
"--fork=2" and "--fork=4", since neither divides 3.

This specific instance of the problem went away after "killall john" and
waiting a few seconds.  (It turned out a child process corresponding to
the timing-out board was still around from a previous run, which I had
forcibly interrupted instead of letting it sit at "Waiting for 1 child
to terminate".)  So a possible workaround is to repeat "killall john;
sleep 1; killall john; sleep 10" until the next scripted attack finally
starts OK with its pre-specified fork count.  (John treats a subsequent
signal differently from the first one, which is why I suggest two
instances of "killall" there.  You can also add a "killall -9 john" at
the end for greater assurance.)
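For a scripted loop, the workaround above might be wrapped roughly as
follows.  This is a sketch, not something shipped with john: the function
names are hypothetical, and it assumes that a john rejecting its fork count
exits with a non-zero status within a short "startup_grace" period, while a
run that passed the check is still alive after it.  Note that "killall john"
also hits any unrelated john sessions on the same host.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def kill_stale_johns() -> None:
    """The manual workaround: two plain killalls, a pause, then SIGKILL for assurance."""
    # check=False: killall exits non-zero when no john process is running.
    subprocess.run(["killall", "john"], check=False)
    time.sleep(1)
    subprocess.run(["killall", "john"], check=False)  # john handles a second signal differently
    time.sleep(10)
    subprocess.run(["killall", "-9", "john"], check=False)


def start_with_retries(
    cmd: Sequence[str],
    cleanup: Callable[[], None] = kill_stale_johns,
    max_attempts: int = 5,
    startup_grace: float = 30.0,
) -> Optional[subprocess.Popen]:
    """Start cmd, retrying after cleanup() whenever it fails early.

    Assumption: a john that rejects its fork count exits with a non-zero
    status within startup_grace seconds.  Returns the process if it is still
    running after that (or exited with status 0), else None after max_attempts.
    """
    for _ in range(max_attempts):
        proc = subprocess.Popen(cmd)
        try:
            status = proc.wait(timeout=startup_grace)
        except subprocess.TimeoutExpired:
            return proc  # survived startup: let the attack run
        if status == 0:
            return proc  # finished on its own (e.g. nothing left to do)
        cleanup()  # early failure: clear stale children, then try again
    return None
```

A hypothetical call would be start_with_retries(["john", "--fork=2",
"hashes.txt"]) with your ZTEX format and other options added, followed by
wait() on the returned process if it is not None.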

Other than that, maybe we need source code changes such that you
wouldn't need to manually specify a fork count that cleanly divides the
number of boards.  Maybe we should introduce e.g. "--fork=0" to request
one fork per board.  This can be too many forks for a large cluster, but
unfortunately there's no other generic choice: the board count might
happen to be a prime number (e.g., 7 boards can only be split into 1 or
7 forks), and we do not currently support uneven distribution of work
across forks.
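Until something like "--fork=0" exists (it is only a proposal here), a
wrapper could compute a fork count from the currently detected board count.
The sketch below lists the fork counts that pass the check (the divisors of
the board count) and picks the largest one under a cap; "pick_fork_count"
and its cap are hypothetical.  With the cap equal to the board count it
reduces to the proposed one-fork-per-board behavior.  It still suffers from
the race window described above, since the board count may change before
john starts.

```python
from typing import List


def valid_fork_counts(num_boards: int) -> List[int]:
    """Fork counts that pass the "multiple of forks" check: divisors of num_boards."""
    return [f for f in range(1, num_boards + 1) if num_boards % f == 0]


def pick_fork_count(num_boards: int, max_forks: int) -> int:
    """Largest valid fork count not above max_forks (a hypothetical wrapper policy).

    Requires num_boards >= 1 and max_forks >= 1.  A result of 1 means "no --fork".
    """
    return max(f for f in valid_fork_counts(num_boards) if f <= max_forks)


if __name__ == "__main__":
    # A prime board count above the cap leaves only 1 fork as an option.
    for boards in (4, 3, 6, 7):
        print(boards, valid_fork_counts(boards), pick_fork_count(boards, 4))
```

A result of 1 is best treated as "run without --fork".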

The most generic solution is simply not to use "--fork".  As you know,
we have decent multi-board support built-in, not depending on "--fork".
"--fork" is an extra hack to provide some speedup through making data
transfers and computation asynchronous across the groups of boards.
This hack does have its drawbacks.

Alexander