Message-ID: <20180213164701.GA10666@openwall.com>
Date: Tue, 13 Feb 2018 17:47:02 +0100
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Markov Sampling

On Tue, Feb 13, 2018 at 04:44:03PM +0100, Matlink wrote:
> > The pre-defined --external=Parallel mode will do what you ask for.
> > You'll just need to customize the "node" and "total" numbers in its
> > init() in john.conf.
> Well, I guess it's only 'not printing' generated candidates? Does it
> really speed up the process, since generating a password candidate is
> more costly than printing it?

It doesn't speed up the processing inside JtR; it actually adds extra
processing.

> Concretely, is --markov --stdout --external=Parallel with node 1/100,
> 100 times faster than with node 1/1?

No.  It's probably roughly same speed: the external mode adds overhead
internally to JtR, but then those skipped candidates don't need to be
printed to the Unix pipe.

> > However, note that "every 10th" doesn't necessarily produce a
> > representative sample: the underlying cracking mode (in this case,
> > Markov) might happen to have some periodicity in its output, and one of
> > its period lengths might just happen to be a multiple of 10 or whatever.
> > So ideally you'd want to randomize the order (if the order somehow
> > doesn't matter for your research) over a larger number of candidate
> > passwords - say, pass a million of them through GNU coreutils' shuf(1) -
> > and then take every 10th out of that randomized list.
>
> My issue is that I can't get the whole output because it is too costly
> for me to gather them due to UNIX pipe. I would like to my
>
> john --stdout --markov --sample=100 | my_sublime_post-process
>
> be somewhat 100 times faster than
>
> john --stdout --markov --sample=1 | my_sublime_post-process

You could use the built-in --node=1/100 feature, which probably will
speed things up a lot, but then it almost certainly doesn't result in a
representative sample - it's just a way to split the work between
multiple nodes, without regard as to whether each node would get a
representative sample and be expected to crack a similar percentage of
real-world passwords that other nodes crack or not (so this probably
won't be the case, making this approach unsuitable for use in research).
The same applies to incremental mode.

> Your solution requires to get the whole output of john and then
> post-process it, but I can't find a satisfiable way to get its whole
> output (since john is really fast to generate candidates).

A question is whether you actually need to get this many candidates (or
a sample from this many), or whether fewer would suffice.  That depends
on what your ultimate goal is.

Alexander
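
The two sampling approaches discussed in this thread can be contrasted
with a minimal Python sketch. It is an illustration only and not part
of JtR: the function names every_nth and shuffled_every_nth, and the
block_size and seed parameters, are hypothetical. The first function
keeps a plain stride of the candidate stream ("every 10th"); the second
follows the suggestion to randomize a larger block of candidates first
and only then take every n-th, which assumes the original order does
not matter for the research.

```python
import random
from itertools import islice


def every_nth(candidates, n, offset=0):
    """Plain stride: keep the items at positions offset, offset+n, ...

    If the generator's output has a period that is a multiple of n,
    this sample can be biased.
    """
    return islice(candidates, offset, None, n)


def shuffled_every_nth(candidates, n, block_size=1_000_000, seed=None):
    """Read up to block_size candidates, shuffle them, keep every n-th.

    Shuffling first breaks any alignment between the stride and a
    periodicity in the generator's output order.
    block_size and seed are illustrative parameters (assumptions).
    """
    rng = random.Random(seed)
    block = list(islice(candidates, block_size))  # holds the block in memory
    rng.shuffle(block)
    return block[::n]


# Small synthetic stand-in for a stream of candidate passwords.
demo = [f"w{i}" for i in range(100)]
stride_sample = list(every_nth(demo, 10))
random_sample = shuffled_every_nth(iter(demo), 10, block_size=100, seed=1)
```

In practice the candidates would come from the output of
"john --stdout" on standard input rather than from a synthetic list.
Note that the shuffled variant still has to receive the whole block
through the pipe, so it addresses representativeness, not speed.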