Message-ID: <20111024220404.GA24895@openwall.com>
Date: Tue, 25 Oct 2011 02:04:04 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: new DES key setup

On Mon, Oct 24, 2011 at 05:38:47PM -0400, Erik Winkler wrote:
> I also tested it on Quad Corei7 (2.8 Ghz).  Note the difference in performance with OpenMP vs. Standard.  These were compiled using gcc 4.6.1.
> 
> Standard (no OpenMP):
> ../run/john -test -format:DES
> Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
> Many salts:	4985K c/s real, 4985K c/s virtual
> Only one salt:	4780K c/s real, 4780K c/s virtual
> 
> ../run/john -test -format:LM 
> Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
> Raw:	66462K c/s real, 66462K c/s virtual

This is hand-written assembly code now.  Judging by the performance
numbers, I guess your CPU is AVX-capable, so you'd get it to run even
faster by using an -x86-64-avx make target.  However, IIRC, the binutils
in Xcode are not AVX-capable?  Also, for better speed with AVX (for which
there's no hand-written assembly code), you might need gcc 4.5.x rather
than 4.6.x.

> OpenMP (default is 8 threads):
> ../run/john -test -format:DES
> Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
> Many salts:	13284K c/s real, 2128K c/s virtual
> Only one salt:	11862K c/s real, 2011K c/s virtual
> 
> OMP_NUM_THREADS=1 ../run/john -test -format:DES
> Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
> Many salts:	4569K c/s real, 4569K c/s virtual
> Only one salt:	4318K c/s real, 4327K c/s virtual
> 
> OMP_NUM_THREADS=2 ../run/john -test -format:DES
> Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
> Many salts:	8554K c/s real, 4302K c/s virtual
> Only one salt:	7875K c/s real, 4072K c/s virtual
> 
> OMP_NUM_THREADS=4 ../run/john -test -format:DES
> Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
> Many salts:	13385K c/s real, 3505K c/s virtual
> Only one salt:	11714K c/s real, 3303K c/s virtual

So it does not scale to ~4500K*4 = 18000K, let alone to 4985K*4 = 19940K.
It only reaches 13385K instead, which is 67% of 19940K.  On my systems,
I am getting between 85% and 90%, but these don't have Turbo Boost.  In
your case, the CPU clock speed is probably 10% or so higher than 2.8 GHz
when only one core is in use.  So to get a true measure of your system's
100% performance (without AVX for now), you need to run 8 separate
non-OpenMP processes simultaneously (one per logical CPU, matching the
8 OpenMP threads you get by default), then add up their c/s rates.
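The percentages above are measured multi-threaded c/s divided by the ideal of single-process c/s times the number of cores.  A minimal sketch of that arithmetic, using the "Many salts" DES figures quoted above; the function and constant names are just for illustration.  It uses the exact one-thread OpenMP figure (4569K) instead of the rounded 4500K, so that ratio comes out at about 73% rather than 74%.

```python
# Scaling efficiency = measured multi-threaded rate / (single-process rate * cores).
# All rates are in K c/s, taken from the "Many salts" DES benchmark lines quoted above.

def scaling_efficiency(multi_rate, single_rate, cores):
    """Fraction of ideal linear scaling actually achieved."""
    return multi_rate / (single_rate * cores)

CORES = 4
NON_OPENMP = 4985  # non-OpenMP build, single process
OPENMP_1T = 4569   # OpenMP build, OMP_NUM_THREADS=1
OPENMP_4T = 13385  # OpenMP build, OMP_NUM_THREADS=4

vs_openmp_1t = scaling_efficiency(OPENMP_4T, OPENMP_1T, CORES)
vs_non_openmp = scaling_efficiency(OPENMP_4T, NON_OPENMP, CORES)

print(f"ideal from 1-thread OpenMP: {OPENMP_1T * CORES}K, achieved {vs_openmp_1t:.0%}")
print(f"ideal from non-OpenMP:      {NON_OPENMP * CORES}K, achieved {vs_non_openmp:.0%}")
```

This prints roughly 73% and 67%; the lower figure is the one that falls well short of the 85% to 90% seen on systems without Turbo Boost.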

Hmm, apparently the Core i7-2600S is rated at 2.8 GHz standard and 3.8 GHz
turbo, so that's way more than a 10% difference (about 36%), which would
explain what you're seeing.  On the other hand, some other i7s at 2.8 GHz
have much more limited turbo modes (such as 3.1 GHz max) or none at all.
You did not mention which one you have.
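A rough way to see how much turbo could account for is to rescale the single-process baseline to the clock the cores run at under full load.  This sketch assumes c/s is proportional to clock speed and, as a worst case, that the single-process run got the full 3.8 GHz turbo while all four busy cores dropped to the 2.8 GHz base clock.  The actual all-core turbo bins are not given in this thread, so the real figure lies somewhere between this and the unadjusted 67%.

```python
BASE_GHZ = 2.8   # rated standard clock quoted for the i7-2600S
TURBO_GHZ = 3.8  # rated max turbo quoted for the i7-2600S

def clock_adjusted_efficiency(multi_rate, single_rate, cores, single_ghz, multi_ghz):
    """Scaling efficiency after rescaling the single-core baseline by clock.

    Assumption (not measured): c/s is proportional to clock speed, so a
    core running at multi_ghz would do single_rate * multi_ghz / single_ghz.
    """
    per_core_under_load = single_rate * multi_ghz / single_ghz
    return multi_rate / (per_core_under_load * cores)

turbo_ratio = TURBO_GHZ / BASE_GHZ
# Worst case: single process at full turbo, all four cores at base clock.
eff = clock_adjusted_efficiency(13385, 4985, 4, TURBO_GHZ, BASE_GHZ)

print(f"turbo/base clock ratio: {turbo_ratio:.2f}")
print(f"worst-case clock-adjusted efficiency: {eff:.0%}")
```

Under these assumptions the ratio is about 1.36 and the adjusted efficiency about 91%, in line with what non-turbo systems achieve.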

> ../run/john -test -format:LM 
> Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
> Raw:	71899K c/s real, 28759K c/s virtual
> 
> OMP_NUM_THREADS=1 ../run/john -test -format:LM 
> Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
> Raw:	55342K c/s real, 55342K c/s virtual
> 
> OMP_NUM_THREADS=2 ../run/john -test -format:LM
> Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
> Raw:	69042K c/s real, 47354K c/s virtual
> 
> OMP_NUM_THREADS=4 ../run/john -test -format:LM
> Benchmarking: LM DES [128/128 BS SSE2-16]... DONE
> Raw:	75694K c/s real, 38777K c/s virtual

Yes, this is what I mean: LM is about as fast with and without OpenMP.
So you may want to run separate processes for now.  Then you'll get a
combined speed of something like 200M c/s on your CPU.
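Running several separate processes at once and adding up their rates can be scripted.  Below is a minimal sketch, assuming `../run/john` is a non-OpenMP build and that each benchmark line has the `Label:<tab>NNNNK c/s real, ...` form shown in the quoted output.  The helper names and the regex are hypothetical, not part of John the Ripper.  For reference, four single-thread LM runs at 55342K would total about 221M c/s if they scaled perfectly.

```python
import re
import subprocess

# Benchmark command as quoted in this thread; assumes a non-OpenMP build.
JOHN_CMD = ["../run/john", "-test", "-format:LM"]

# Matches lines like "Raw:\t55342K c/s real, ..." or "Many salts:\t4985K c/s real, ...".
_RATE_RE = re.compile(r"^([^:\n]+):[ \t]+([\d.]+)([KM]?) c/s real", re.MULTILINE)
_SCALE = {"": 1, "K": 1_000, "M": 1_000_000}

def parse_real_rates(output):
    """Map each benchmark label ("Raw", "Many salts", ...) to its real c/s."""
    return {label.strip(): float(num) * _SCALE[suffix]
            for label, num, suffix in _RATE_RE.findall(output)}

def combined_rates(outputs):
    """Sum the real c/s figures of several concurrent runs, per label."""
    totals = {}
    for out in outputs:
        for label, rate in parse_real_rates(out).items():
            totals[label] = totals.get(label, 0) + rate
    return totals

def run_concurrently(n, cmd=JOHN_CMD):
    """Start n benchmark processes at once, wait for all, return their stdout."""
    procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
             for _ in range(n)]
    return [p.communicate()[0] for p in procs]
```

For example, `combined_rates(run_concurrently(8))` would start 8 LM benchmarks together and return the summed real c/s per label; swapping `-format:LM` for `-format:DES` gives the DES figure asked for earlier.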

Thanks,

Alexander
