Message-ID: <20180213164701.GA10666@openwall.com>
Date: Tue, 13 Feb 2018 17:47:02 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov Sampling

On Tue, Feb 13, 2018 at 04:44:03PM +0100, Matlink wrote:
> > The pre-defined --external=Parallel mode will do what you ask for.
> > You'll just need to customize the "node" and "total" numbers in its
> > init() in john.conf.
> Well, I guess it's only 'not printing' generated candidates? Does it
> really speed up the process, since generating a password candidate is
> more costly than printing it?

It doesn't speed up the processing inside JtR; it actually adds extra
processing.
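For reference, the Parallel external mode is essentially a round-robin
filter: each node keeps every total-th candidate, offset by its node
number.  A stand-in sketch, with seq playing the role of john's
candidate stream (the real thing is the filter() in john.conf's
[List.External:Parallel] section; the exact implementation may differ
across JtR versions):

```shell
# Stand-in: 'seq' plays the role of john's candidate stream.
# With node=1, total=100, the filter keeps candidates 1, 101, 201, ...
# and discards the other 99 out of every 100.
seq 1000 | awk '(NR - 1) % 100 == 0' | wc -l   # 10 candidates survive
```

Every candidate is still generated and counted; only the printing is
skipped, which is why this adds work rather than saving it.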

> Concretely, is --markov --stdout --external=Parallel with node 1/100,
> 100 times faster than with node 1/1?

No.  It's probably roughly the same speed: the external mode adds overhead
internally to JtR, but then those skipped candidates don't need to be
printed to the Unix pipe.

> > However, note that "every 10th" doesn't necessarily produce a
> > representative sample: the underlying cracking mode (in this case,
> > Markov) might happen to have some periodicity in its output, and one of
> > its period lengths might just happen to be a multiple of 10 or whatever.
> > So ideally you'd want to randomize the order (if the order somehow
> > doesn't matter for your research) over a larger number of candidate
> > passwords - say, pass a million of them through GNU coreutils' shuf(1) -
> > and then take every 10th out of that randomized list.
> 
> My issue is that I can't get the whole output because it is too costly
> for me to gather due to the UNIX pipe.  I would like
> 
>     john --stdout --markov --sample=100 | my_sublime_post-process
> 
> to be roughly 100 times faster than
> 
>     john --stdout --markov --sample=1 | my_sublime_post-process

You could use the built-in --node=1/100 feature, which probably will
speed things up a lot, but it almost certainly doesn't produce a
representative sample.  It's just a way to split the work between
multiple nodes, with no guarantee that each node gets a representative
sample and would be expected to crack a similar percentage of real-world
passwords as the other nodes.  Most likely that won't be the case, which
makes this approach unsuitable for use in research.

The same applies to incremental mode.
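A toy illustration of the non-representativeness (my own sketch, not
anything from JtR): assume the candidate at rank i carries weight 1/i, a
Zipf-like assumption purely for illustration, and compare the weight
covered by the front of a 100-candidate keyspace against the back:

```shell
# Toy model: candidate at rank i has weight 1/i (Zipf-like assumption,
# for illustration only).  Sum the weight of the first 10 candidates
# vs. the last 10 out of 100.
seq 100 | head -n 10 | awk '{ s += 1/$1 } END { printf "%.2f\n", s }'  # 2.93
seq 100 | tail -n 10 | awk '{ s += 1/$1 } END { printf "%.2f\n", s }'  # 0.10
```

Under this model a node handed the front of the keyspace covers roughly
28 times the probability mass of one handed the back: an even split by
candidate count is nowhere near an even split by expected cracks.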

> Your solution requires getting the whole output of john and then
> post-processing it, but I can't find a satisfactory way to capture its
> whole output (since john generates candidates really fast).

A question is whether you actually need to get this many candidates (or
a sample from this many), or whether fewer would suffice.  That depends
on what your ultimate goal is.
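To make the shuf(1) suggestion from earlier in this thread concrete,
here's a sketch of such a pipeline, with seq standing in for your actual
"john --stdout --markov" invocation:

```shell
# Stand-in: 'seq' plays the role of "john --stdout --markov" here.
# Take a bounded prefix, randomize its order, then keep every 10th line,
# so the 1-in-10 sample is decoupled from any periodicity in the
# generator's output order.
seq 1000000 | head -n 1000000 | shuf | awk 'NR % 10 == 1' > sample.txt
wc -l < sample.txt   # 100000 lines: a randomized 10% sample
```

The head -n bound matters: shuf has to read its entire input before it
can emit anything, so you can't run it on an unbounded candidate stream.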

Alexander
