Message-ID: <20120419000351.GA26167@debian>
Date: Thu, 19 Apr 2012 04:03:51 +0400
From: Aleksey Cherepanov <aleksey.4erepanov@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: automation equipped working place of hash cracker,
 proposal

On Wed, Apr 18, 2012 at 11:35:23PM +0200, Frank Dittrich wrote:
> On 04/18/2012 10:27 PM, Aleksey Cherepanov wrote:
> > On Mon, Apr 16, 2012 at 10:52:30AM +0200, Simon Marechal wrote:
> >> If I was to design this, I would do it this way:
> >> * the server converts high level demands into low level job units
> >> * the server has at least a network API, and possibly a web interface
> >> * the server handles dispatching
> > 
> > I think the easiest way to split a cracking task into parts for
> > distribution is to split the candidates list, to granulate it: we run our
> > underlying attack command with '--stdout', split its output into packs,
> > and distribute those packs to nodes that just use them as wordlists.
> > Pros: it is easy to implement, it is flexible and upgradable, it supports
> > modes that we don't want to run to the end, like incremental mode, and
> > all attacks can be parallelized this way (if I am not wrong). Cons: it
> > seems to be suboptimal, and it does not scale well (candidate generation
> > could become a bottleneck, though it could be distributed too),
> 
> I'm afraid network bandwidth will soon become a bottleneck, especially
> for fast saltless hashes.

If we take bigger packs of candidates, they can be compressed well, so we
trade network bandwidth for CPU time.
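
For illustration, here is a rough sketch of such a splitter (Python; the
pack size, the file naming and the choice of gzip are all made up):

#!/usr/bin/env python
# Rough sketch: read candidates from "john --stdout" on stdin and write
# fixed-size gzip-compressed packs that nodes can use as wordlists.
# PACK_SIZE and the file naming are illustrative, not a real design.
import gzip
import sys

PACK_SIZE = 1000000  # candidates per pack; bigger packs compress better

def flush(pack, buf):
    # each pack is a plain wordlist, gzipped to save network bandwidth
    with gzip.open('pack-%06d.gz' % pack, 'wt') as f:
        f.writelines(buf)

def write_packs(stream):
    pack, buf = 0, []
    for line in stream:
        buf.append(line)
        if len(buf) == PACK_SIZE:
            flush(pack, buf)
            pack, buf = pack + 1, []
    if buf:
        flush(pack, buf)

if __name__ == '__main__':
    write_packs(sys.stdin)

The server would run something like "./john --incremental --stdout |
python split-packs.py", and a node would decompress a pack and use it
with '--wordlist'.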

> Here's another idea for splitting tasks:
> 
> The task will already be split into several parts, because there will
> probably be password hashes of different formats.
> Some hash formats will be better cracked using GPUs, while others will
> probably be better distributed for CPU cracking, to make the best use of
> the available hardware.
> For fast hashes, the strategy will probably not be the same as for slow
> hashes.
> 
> If we do have more clients than available hash formats, the tasks must
> be split further, either by splitting the files containing hashes into
> smaller parts, so that several clients try to crack different hashes of
> the same format, or by letting different clients run different cracking
> sessions.
> 
> Splitting input files with password hashes only makes sense for salted
> hashes, and maybe it shouldn't even be done for fast salted hashes.
> If we split the files, we have to make sure that different hashes for
> the same salt will not be spread across different files.
> If some salts appear more frequently than others, we should split the
> salts into different files according to the number of hashes per salt.
> This way, we can try more rules, or the same set of rules but on larger
> word lists, for those salts which occur many times.
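
To illustrate the grouping, a rough sketch (Python; the input format and
the salt extraction are format-specific, extract_salt() below is just a
placeholder for something like descrypt, where the salt is the first two
characters of the hash):

# Rough sketch: keep all hashes sharing a salt in the same file, and put
# the frequently occurring salts into a file of their own so they can
# get more rules or larger wordlists. The threshold is made up.
from collections import defaultdict

def extract_salt(hash_field):
    return hash_field[:2]  # placeholder: descrypt-style two-char salt

def split_by_salt(lines, big_threshold=10):
    by_salt = defaultdict(list)
    for line in lines:
        user, hash_field = line.rstrip('\n').split(':', 1)
        by_salt[extract_salt(hash_field)].append(line)
    big, small = [], []
    for salt, entries in by_salt.items():
        (big if len(entries) >= big_threshold else small).extend(entries)
    return big, small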
> 
> Distributing tasks to different clients without transferring password
> candidates probably requires that the clients use the same john version,
> and also use a common set of word lists which can be distributed prior
> to the contest.
> If later on we realize that we need additional word lists or new chr
> files (or stats files for markov mode), we could either implement a way
> to distribute those new files as well, or the tasks which use these new
> files have to be distributed among a smaller set of clients with read
> access to a directory on the central server.
> 
> Then, you could distribute tasks by generating small config files just
> for a particular task, and by transferring the config file and the
> command line to be used by the client.
> That way, the amount of data that has to be transferred from the server
> to the clients should be much smaller compared to generating and
> distributing lists of password candidates.
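
To illustrate, a rough sketch of such a task description (Python; the
field names and the rule section are made up, and I am assuming the
'--config' option is used to point john at the generated file):

# Rough sketch of a task the server could send instead of candidate
# lists: a generated config fragment plus the command line to run.
import json

task = {
    'id': 42,
    'config': '[List.Rules:Task42]\nc $1 $2 $3\n',  # generated config
    'command': ['./john', '--config=task42.conf',
                '--wordlist=common.lst', '--rules=Task42',
                'hashes.txt'],
}
print(json.dumps(task))

The client would write task['config'] to task42.conf and then execute
task['command'].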
> 
> If you want to distribute a task which requires running incremental mode
> for a certain time, even that should work.
> IIRC, magnum implemented an option to specify how long a cracking
> session should be run before it gets interrupted automatically.
> Just make sure the client also returns the resulting .rec file to the
> server, so that it can be reused by the next client which continues the
> incremental mode session should we run out of other tasks to try.
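
The client side of such a task could look like this sketch (Python;
upload() is a placeholder, and I am assuming the option magnum added is
'--max-run-time'):

# Rough sketch: run a time-limited incremental session, then return the
# .rec file (so another node can resume) and the pot file (cracked
# passwords) to the server.
import subprocess

def run_task(session, max_seconds, hash_file):
    subprocess.call(['./john', '--session=' + session, '--incremental',
                     '--max-run-time=%d' % max_seconds, hash_file])
    upload(session + '.rec')  # resume file for the next client
    upload('john.pot')        # cracked passwords, if any

def upload(path):
    pass  # placeholder: transfer the file back to the server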
> 
> > while it
> > would be easy to implement rechecking of results from untrusted nodes
> > (contest environment)
> 
> I think the check we did during the last contest (verify that the
> passwords in john.pot files transferred to the central server really
> crack the corresponding hashes) is OK, more is probably not needed.
> (If we detect a malicious client which reported wrong results, we can
> still schedule the tasks that were executed on this client on another
> client.)

So if a malicious client does not report bad results but reports only half
of its real results, then we lose half of the passwords this node could
have cracked if it were honest. I think if we want to be able to include
anonymous clients during the contest, we should approach this with a
sufficient amount of paranoia.

Also, it could be useful to do rechecks because of bugs in clients: while
we know (and probably test) the environments of most members, untrusted
nodes could have setups crazy enough to trigger bugs.
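
Such a recheck could stay cheap, something like this sketch (Python;
plain hex raw-MD5 pot entries are assumed purely as an example, real
formats need the matching hash algorithm):

# Rough sketch: verify that a reported john.pot line really cracks one
# of the hashes we handed out.
import hashlib

def verify_pot_line(pot_line, known_hashes):
    hash_part, password = pot_line.rstrip('\n').split(':', 1)
    if hash_part not in known_hashes:
        return False  # a hash we never distributed
    return hashlib.md5(password.encode()).hexdigest() == hash_part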

> > it would be hard to hide sensitive data from them (real life),
> 
> I think that for a real-life scenario we can assume secure network
> connections and secure clients. If you have to worry about how well
> secured your clients are, better don't distribute any tasks to them.

I mostly refer to http://www.openwall.com/lists/john-dev/2012/04/02/11 .
Though it seems to be out of scope now.

> > it does not respect work that is already done (by other distribution
> > projects).
> 
> Avoiding duplicate work that has been done by clients which just
> cooperate with the server on a lower level (e.g., by just dumping
> john.pot files with cracked hashes into a directory) will be very hard,
> if not impossible. Better don't waste your energy here.

I mean there are other projects aimed at distribution, like MPI support
for John. So it may be more effective, and still easy, to use them at
least for small clusters while we do the high-level parallelization on
our own.

Regards,
Aleksey Cherepanov
