Message-ID: <20051121093430.GA7685@openwall.com>
Date: Mon, 21 Nov 2005 12:34:30 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Speed up John

On Thu, Nov 17, 2005 at 09:57:24AM +1100, David Luyer wrote:
> Considering long runs of john (where you're running for weeks to years
> over the one set of passwords), john can be run with a large wordlist in a 
> reasonable amount of time (weeks at most).  No need to parallelize.  The
> same goes for running incremental:digits or using an external mode to
> do other sets like 9 or 10 digits.

That's true, except for the slowest hash types such as bcrypt.

> The one thing which is slow is incremental:all.
> 
> The way incremental:all works is in passes, as you can see in the .rec 
> files.  In about a month running on about 100 crypts you'd probably do
> around the first 2000 passes on a fast system.

Yes.  You might be surprised that with 1 or 100,000 hashes loaded for
cracking, you would be at around 2000 "passes" after a month, too.  The
reason for this is that each subsequent pass takes longer to process.
The first thousand passes are done quickly, the second thousand takes
a while, and the third thousand is almost never completed.

The meaning of these passes is revealed by this comment in charset.h:

/*
 * Cracking order.
 *
 * This is a list of current {length, fixed position, character count}.
 * There are CHARSET_LENGTH different lengths, and fixed position is up
 * to the current length, which means we have exactly (CHARSET_LENGTH *
 * (CHARSET_LENGTH + 1) / 2) different {length, fixed position} pairs;
 * for each such pair we need to try all charsets from 1 character and
 * up to CHARSET_SIZE characters large.
 */
	unsigned char order
		[CHARSET_LENGTH * (CHARSET_LENGTH + 1) / 2 * CHARSET_SIZE * 3]
		CC_PACKED;

With the default settings this gives us a total of:

	8 * 9 / 2 * 95 = 3420 passes
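
To make that arithmetic concrete, here is a tiny stand-alone C program
that counts one pass per {length, fixed position, character count}
triplet, nested the way the charset.h comment describes.  The loop
bounds are my reading of that comment, and the two constants are the
defaults implied by the formula above rather than values taken from
params.h:

	#include <stdio.h>

	#define CHARSET_LENGTH 8	/* assumed default */
	#define CHARSET_SIZE 95		/* assumed default */

	int main(void)
	{
		int length, fixed, count, passes = 0;

		/* One pass per {length, fixed position, character count};
		 * the fixed position ranges over the current length only. */
		for (length = 1; length <= CHARSET_LENGTH; length++)
			for (fixed = 0; fixed < length; fixed++)
				for (count = 1; count <= CHARSET_SIZE; count++)
					passes++;

		printf("%d passes\n", passes);	/* prints: 3420 passes */
		return 0;
	}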

> A way to distribute the load sensibly would seem to be to have a server
> keep track of the passes as they are done and communicate the hashes
> as they are cracked;
> 
> 	server data set:
> 		[crypt string, plaintext string, source client id, crypt id]
> 		[client id, last sent crypt id]
> 		[sequence number, client id, start time, end time]
> 
> 	client starts, requests sequence number, works on that pass of
>       incremental:all; server stores sequence number, client id, start
>       time
> 
> 	client cracks a password, sends it to the server, server sends it
> 	to all online clients to add to their john.pot and updates the
>       data set to know it has been sent (any offline clients get it as
>       they connect)
> 
>       client ends pass and sends notice to server, server stores end
>       time and allocates the first non-started pass to the client
> 
> If a client fails the server should be able to have an entry deleted
> so that a new client will be allocated the same sequence number.
> 
> Comments?  Would this be seen as a reasonable distribution of workload
> and not a particularly bad cracking sequence?

Yes, this approach certainly works better than the one implemented in
dJohn and presumably in JohnNet.  This is essentially what Ryan Lim's
jtr-mpi does, although it does not handle any failures and I don't
think it would be able to correctly recover an interrupted session
either.  A non-public John hack I've seen also does this, in a way
more similar to what you describe above.
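
For illustration, the server data set you list could be expressed as
records along these lines in C (a sketch only; every struct and field
name here is hypothetical, not taken from any of the implementations
mentioned):

	#include <time.h>

	/* [crypt string, plaintext string, source client id, crypt id] */
	struct cracked {
		char crypt[128];	/* hash encoding; size is a guess */
		char plain[128];	/* recovered plaintext */
		int source_client;	/* which client cracked it */
		int crypt_id;		/* monotonic id so offline clients
					   can catch up on reconnect */
	};

	/* [client id, last sent crypt id] */
	struct client_state {
		int client_id;
		int last_sent_crypt_id;
	};

	/* [sequence number, client id, start time, end time] */
	struct pass_assignment {
		int sequence;		/* which incremental:all pass */
		int client_id;
		time_t start_time;
		time_t end_time;	/* zero while still in progress */
	};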

A problem with this approach is that the chunks of keyspace assigned
to slave nodes become larger and larger; a single such chunk may take
days to process.  There are also not that many passes in total, which
limits scalability, albeit not too severely (this won't work well
with more than a few hundred nodes).

These problems are relatively easy to resolve, although I have not
seen that done other than in a non-public hack of my own from 1997.
It is sufficient to further split each pass into as many ranges of
candidate password indices as there are slave nodes, then assign
those ranges to the nodes.  Whenever a node is done with its range,
another node's range is shrunk by removing half of its uncompleted
part (and that node is notified), which puts the idle node back to
work.  Note that this approach does not cause increasing keyspace
fragmentation (which would result in growing memory consumption on
the master).  Similarly, if the master determines that a node went
offline, it reassigns that node's entire range to whatever other node
becomes idle first (there's room for further optimization to handle
transient network errors here).  The master advances to the next pass
only after all online nodes are done with their ranges.
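
A minimal sketch of that range-halving step in C (the names are
hypothetical; the 1997 hack is non-public, so this only illustrates
the idea, not its actual code):

	#include <stdint.h>

	/* One node's assignment within the current pass: candidate
	 * password indices [next, end). */
	struct range {
		uint64_t next;	/* first index not yet completed */
		uint64_t end;	/* one past the last assigned index */
	};

	/* Put an idle node back to work by giving it the upper half
	 * of the uncompleted part of a busy node's range.  The busy
	 * node must then be notified that its range ends sooner. */
	static void split_range(struct range *busy, struct range *idle)
	{
		uint64_t mid = busy->next + (busy->end - busy->next) / 2;

		idle->next = mid;
		idle->end = busy->end;
		busy->end = mid;
	}

Since each split turns one contiguous range into two contiguous
ranges, the master keeps exactly one such pair per node no matter how
many splits occur, which is why the keyspace never fragments and the
master's memory use stays flat.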

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments

Was I helpful?  Please give your feedback here: http://rate.affero.net/solar
