john-users - Re: Is JTR MPIrun can be optimized for more cores ?


Message-ID: <20100310165545.GA22719@openwall.com>
Date: Wed, 10 Mar 2010 19:55:45 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Is JTR MPIrun can be optimized for more cores ?

magnum, RB - thank you for your recent postings.  I appreciated those.

On Wed, Mar 10, 2010 at 09:03:17AM -0700, RB wrote:
> On Wed, Mar 10, 2010 at 01:26, websiteaccess@...il.com <websiteaccess@...il.com> wrote:
> > ------------------------------------------------------
> > 1  core,  duration 40 mn : found 8676
> > ------------------------------------------------------
> > 2 cores, duration  20 mn : found 8207
> > ------------------------------------------------------
> > 4 cores, duration  10 mn : found 7886
> > ------------------------------------------------------
> > 8 cores, duration   5  mn : found 7189
> > ------------------------------------------------------
> 
> This does serve to roughly illustrate the point,

Surely it does.  The slight reduction in the number of passwords cracked
is attributable not only to the property of the current MPI patch that
was mentioned in this thread earlier, but also to properties of the
Core i7 CPU - namely, that the clock rate decreases when more CPU cores
are put to use, and that the CPU is not in fact 8-core (it is
quad-core with HT).  Also, with a lot of fast-to-compute hashes loaded
for cracking, a noticeable amount of time is spent on lookups over the
large hash table (should be 1M entries), which involves accesses to the
CPU's L3 cache and to RAM over buses shared between the cores.
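
As a rough worked check of the figures quoted above: every run used the same
total of 40 core-minutes, so the counts can be compared directly against the
single-core run.  The sketch below is only arithmetic on W.A.'s numbers; the
names `runs` and `relative_yield` are made up for illustration.

```python
# (processes, minutes, passwords found), copied from the table quoted above
runs = [(1, 40, 8676), (2, 20, 8207), (4, 10, 7886), (8, 5, 7189)]

baseline = runs[0][2]  # passwords found by the single-core run


def relative_yield(found, baseline=baseline):
    """Passwords found, as a percentage of the single-core result."""
    return 100.0 * found / baseline


for procs, minutes, found in runs:
    print(f"{procs} x {minutes} mn = {procs * minutes} core-minutes: "
          f"{found} found, {relative_yield(found):.1f}% of the 1-core run")
```

That works out to roughly 95%, 91%, and 83% of the single-core result for 2,
4, and 8 processes, respectively.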

> but is also a clear indicator of the lack of communication between the
> individual processes.

Is it?  Assuming that only the "incremental" mode was in use, any
communication between the processes wouldn't make much of a difference.
There's no overlap in candidate passwords tried by the different
processes, so no overlap in successful guesses either.  It is true that
successful guesses could be propagated to all processes to get the
corresponding hashes removed from all, which would reduce the number of
hashes being cracked at a given time and thereby potentially speed up
checks of computed hashes against those loaded for cracking - but for
this specific test any such speedup would be negligible (if not
negative, because removing the hashes has its "cost" too).  W.A.
mentioned that the total number of hashes loaded was 2 million, so
removing 8 thousand would really not make a difference.
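
To put rough numbers on this, here is a back-of-the-envelope sketch.  It
assumes the loaded hashes are spread uniformly over the hash table and that
"1M entries" means 2**20 buckets; both are assumptions for illustration, and
`avg_bucket_load` is a made-up helper, not JtR code.

```python
def avg_bucket_load(num_hashes, num_buckets):
    """Average number of loaded hashes per bucket, assuming a uniform spread."""
    return num_hashes / num_buckets


LOADED = 2_000_000   # total hashes loaded, as mentioned by W.A.
CRACKED = 8_000      # roughly the number cracked during the test
BUCKETS = 1 << 20    # assumption: "1M entries" means 2**20 buckets

before = avg_bucket_load(LOADED, BUCKETS)
after = avg_bucket_load(LOADED - CRACKED, BUCKETS)
reduction_pct = 100.0 * (before - after) / before

print(f"before removal: {before:.3f} hashes per bucket")
print(f"after removal:  {after:.3f} hashes per bucket")
print(f"reduction:      {reduction_pct:.1f}%")
```

Whatever the bucket count, the relative reduction is 8,000 / 2,000,000, that
is 0.4%, which is why propagating the guesses would hardly matter here.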

> The most accurate test would be to do an incremental run of a
> character set you know will complete on a single core in a reasonable
> timeframe.  Then, test the completion time (using 'time john ...' for
> accuracy) of 2, 3, and 4-core runs.

I agree.  An 8-process run would make sense, too, and would likely
complete sooner than the 4-process one, because Core i7's HT works
fairly well with JtR's code (in my testing).
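
Once such timings are collected, speedup and parallel efficiency are the
usual figures to compare.  A minimal sketch follows; the timings in it are
placeholders chosen for illustration only, not measurements, and the function
names are made up.

```python
def speedup(t_single, t_parallel):
    """How many times faster the parallel run finished than the single run."""
    return t_single / t_parallel


def efficiency(t_single, t_parallel, processes):
    """Speedup per process; 1.0 would be perfect linear scaling."""
    return speedup(t_single, t_parallel) / processes


# HYPOTHETICAL wall-clock times in seconds, keyed by process count.
# Replace these with the "real" values reported by 'time john ...'.
timings = {1: 1200.0, 2: 620.0, 4: 330.0, 8: 260.0}

t1 = timings[1]
for procs in sorted(timings):
    tn = timings[procs]
    print(f"{procs} process(es): speedup {speedup(t1, tn):.2f}, "
          f"efficiency {efficiency(t1, tn, procs):.2f}")
```

On a quad-core CPU with HT, the 8-process efficiency computed this way should
be expected to fall well below 1.0 even when the 8-process run finishes
sooner than the 4-process one, because there are only four physical cores.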

> Please be aware that the MPI patch by itself induces (as I recall)
> 10-15% overhead in a single-core run.

Huh?  For "incremental" mode, it should have no measurable overhead.

Alexander
