Message-ID: <20100310165545.GA22719@openwall.com>
Date: Wed, 10 Mar 2010 19:55:45 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Is JTR MPIrun can be optimized for more cores ?

magnum, RB - thank you for your recent postings. I appreciated those.

On Wed, Mar 10, 2010 at 09:03:17AM -0700, RB wrote:
> On Wed, Mar 10, 2010 at 01:26, websiteaccess@...il.com <websiteaccess@...il.com> wrote:
> > ------------------------------------------------------
> > 1 core, duration 40 mn : found 8676
> > ------------------------------------------------------
> > 2 cores, duration 20 mn : found 8207
> > ------------------------------------------------------
> > 4 cores, duration 10 mn : found 7886
> > ------------------------------------------------------
> > 8 cores, duration 5 mn : found 7189
> > ------------------------------------------------------
>
> This does serve to roughly illustrate the point,

Surely it does. The slight reduction in the number of passwords cracked
is attributable not only to the property of the current MPI patch that
was mentioned in this thread earlier, but also to properties of the
Core i7 CPU - namely, that the clock rate decreases when more CPU cores
are brought to use, and that the CPU is not in fact 8-core (it is
quad-core with HT). Also, with a lot of fast-to-compute hashes loaded
for cracking, a noticeable amount of time is spent on lookups over the
large hash table (should be 1M entries), which involves accesses to the
CPU's L3 cache and to RAM over buses shared between the cores.

> but is also a clear indicator of the lack of communication between the
> individual processes.

Is it? Assuming that only the "incremental" mode was in use, any
communication between the processes wouldn't make much of a difference.
There's no overlap in candidate passwords tried by the different
processes, so no overlap in successful guesses too.
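
[Illustration] The "no overlap" point can be shown with a toy model. The sketch below is only an assumption-laden example: it does not reproduce how the MPI patch actually divides the "incremental" keyspace, and the names `candidates` and `share` as well as the round-robin rule are hypothetical. It merely shows that when each process takes a disjoint slice of one fixed candidate order, no candidate (and therefore no successful guess) can appear in two processes.

```python
from itertools import product


def candidates(charset, max_len):
    """Yield every string over `charset` of length 1..max_len in a fixed order."""
    for length in range(1, max_len + 1):
        for combo in product(charset, repeat=length):
            yield "".join(combo)


def share(rank, nprocs, charset, max_len):
    """Candidates given to process `rank` under a hypothetical round-robin split."""
    return [c for i, c in enumerate(candidates(charset, max_len))
            if i % nprocs == rank]


if __name__ == "__main__":
    nprocs = 4
    shares = [set(share(r, nprocs, "abc", 3)) for r in range(nprocs)]
    # No candidate is assigned to two different processes ...
    overlap_free = all(shares[a].isdisjoint(shares[b])
                       for a in range(nprocs) for b in range(a + 1, nprocs))
    # ... and together the shares cover the whole toy keyspace.
    complete = set().union(*shares) == set(candidates("abc", 3))
    print("overlap-free:", overlap_free, "complete:", complete)
```

With a split of this kind, exchanging guesses between processes cannot prevent duplicated work, because there is none to prevent.
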
It is true that successful guesses could be propagated to all processes
to get the corresponding hashes removed from all, which would reduce the
number of hashes being cracked at a given time and thereby potentially
speed up checks of computed hashes against those loaded for cracking -
but for this specific test any such speedup would be negligible (if not
negative, because removing the hashes has its "cost" too). W.A.
mentioned that the total number of hashes loaded was 2 million, so
removing 8 thousand would really not make a difference.

> The most accurate test would be to do an incremental run of a
> character set you know will complete on a single core in a reasonable
> timeframe. Then, test the completion time (using 'time john ...' for
> accuracy) of 2, 3, and 4-core runs.

I agree. An 8-process run would make sense, too, and would likely
complete sooner than the 4-process one, because Core i7's HT works
fairly well with JtR's code (in my testing).

> Please be aware that the MPI patch by itself induces (as I recall)
> 10-15% overhead in a single-core run.

Huh? For "incremental" mode, it should have no measurable overhead.

Alexander
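
[Illustration] The figures discussed above can be put side by side with a few lines of arithmetic. This is a minimal sketch using only numbers quoted in the thread: the four test runs, the 2 million loaded hashes, and the rounded "8 thousand" cracked. The function and constant names are made up for the example.

```python
# (processes, minutes per process, passwords found), as quoted in the thread
RUNS = [(1, 40, 8676), (2, 20, 8207), (4, 10, 7886), (8, 5, 7189)]

LOADED = 2_000_000  # total hashes loaded, as reported by W.A.
CRACKED = 8_000     # "8 thousand", the rounded figure used in the reply


def process_minutes(runs):
    """Total compute budget of each run: processes times minutes per process."""
    return [n * minutes for n, minutes, _ in runs]


def relative_yield(runs):
    """Passwords found in each run relative to the single-process run."""
    baseline = runs[0][2]
    return {n: found / baseline for n, _, found in runs}


def fraction_removed(cracked, loaded):
    """Share of the loaded hashes that cracking would remove from the table."""
    return cracked / loaded


if __name__ == "__main__":
    print("process-minutes per run:", process_minutes(RUNS))
    for n, ratio in relative_yield(RUNS).items():
        print(f"{n} process(es): {ratio:.1%} of the single-process count")
    print(f"hashes removable: {fraction_removed(CRACKED, LOADED):.2%}")
```

Every run has the same budget of 40 process-minutes, so the counts are directly comparable; the 8-process run finds roughly 83% of what the single process finds. Removing the cracked hashes would shrink the loaded set by only 0.4%.
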