Date: Sat, 31 Dec 2011 22:50:54 +0400
From: Solar Designer <>
Subject: Re: DES with OpenMP

On Sat, Dec 31, 2011 at 02:09:13PM +0000, Alex Sicamiotis wrote:
> I've benchmarked DES (openMP) with GCC 4.6 / 4.7 and ICC 12.1...

Thank you for contributing those benchmark results to the wiki.  It's an
impressive overclock you got (a really cheap CPU at 4 GHz).

> In my case, (dual core Celeron E3200 - Wolfdale 45nm core), the second core scaled +80% for GCC, and over 99.5% for ICC. So there's some kind of problem (?) in GCC-OpenMP I suppose for DES.

+80% is reasonable (that's 90% efficiency: 180% out of 200%), but
+99.5% is too high.  In my testing, the efficiency of the bitslice DES
parallelization with OpenMP is at around 90% for DES-based crypt(3)
with "many salts" on current multi-core CPUs.  +99.5% indicates that
there is another source of speedup besides the use of the second core.

Please take into consideration that in non-OpenMP builds for x86-64
(as well as for some other x86-* targets) assembly code is used for
bitslice DES.  When you enable OpenMP, that assembly code is disabled
in favor of C code with SSE2 intrinsics.  It is possible that ICC
tunes the C + SSE2 intrinsics code for your specific CPU model,
whereas the supplied SSE2 assembly code was tuned for Core 2 in
general.  The compiler is also given an opportunity to make some
cross-S-box optimizations, which are not made in the supplied assembly
code.  These extra optimizations might account for 3% or so, which
would explain the unbelievably high parallelization efficiency (for
this code).  (96% would be believable, albeit still very high for this
code.)

With GCC 4.6, there is a performance regression (compared to 4.5 and
4.4), which was especially bad without OpenMP.  This is one reason why
JtR 1.7.9 forces the use of the supplied assembly code (whenever
available) for non-OpenMP builds.  When you build with GCC 4.6 and
OpenMP, you may be hit by this performance regression to some extent.
You may want to try GCC 4.5 to avoid it.

> Performance with GCC+openMP was better in blowfish and MD5 for GCC compared to ICC (not necessarily because ICC had scaling problems - rather it was slower in MD5 + Blowfish, even in single session).

BTW, 99% parallelization efficiency is quite realistic for these slower
hash types.

