Date: Tue, 22 Mar 2011 06:03:40 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: bitslice DES on AVX

Hi,

So I tested the AVX code on Rembrandt's "Intel(R) Core(TM) i7-2600 CPU @
3.40GHz" (quad-core with HT, 8 logical CPUs).  Thanks, Rembrandt!

The code, previously tested only under Intel's Software Development
Emulator, worked flawlessly.  I tried compiling with the gcc 4.4.5 that
was installed on Rembrandt's Ubuntu, with a gcc 4.5.0 build I uploaded,
and finally with a fresh build of gcc 4.7.0-20110319 (development
snapshot).  All worked fine.

The best performance is achieved with 128-bit AVX operations.  256-bit
is slightly slower.  SSE2 is slower yet.
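
For readers new to bitslicing, here is a minimal sketch of why vector
width matters for this kind of code.  It is not John the Ripper's
implementation: a Python integer stands in for a vector register, a
3-input majority function stands in for the DES S-box gate networks, and
all names (LANES, to_bitplanes, and so on) are hypothetical.

```python
LANES = 128  # assumed vector width in bits; one lane per candidate key


def to_bitplanes(values, nbits):
    """Transpose: planes[i] collects bit i of every value, one value per lane."""
    planes = [0] * nbits
    for lane, value in enumerate(values):
        for i in range(nbits):
            planes[i] |= ((value >> i) & 1) << lane
    return planes


def from_bitplane(plane, lanes):
    """Unpack one bit plane back into a list of per-lane bits."""
    return [(plane >> lane) & 1 for lane in range(lanes)]


def majority_scalar(value):
    """Reference: majority of the low three bits of a single value."""
    a, b, c = value & 1, (value >> 1) & 1, (value >> 2) & 1
    return (a & b) | (a & c) | (b & c)


def majority_bitsliced(planes):
    """Same gates, but each bitwise op now acts on all lanes at once."""
    a, b, c = planes
    return (a & b) | (a & c) | (b & c)


inputs = [(lane * 5 + 3) % 8 for lane in range(LANES)]
planes = to_bitplanes(inputs, 3)
out = from_bitplane(majority_bitsliced(planes), LANES)
print(out == [majority_scalar(v) for v in inputs])
```

Each bitwise operation in majority_bitsliced handles LANES independent
inputs, so a wider register helps only if the CPU can actually issue the
wider operations as fast as the narrower ones.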

Here's some relevant info on Sandy Bridge vs. Bulldozer:
http://www.realworldtech.com/page.cfm?ArticleID=RWT091810191937&p=10
256-bit could have worked better, but the article suggests that it
shouldn't be expected to provide much of an advantage over 128-bit as
long as the instruction stream allows for parallel execution of two ops
almost all of the time.

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     4819K c/s real, 4867K c/s virtual
Only one salt:  4080K c/s real, 4080K c/s virtual

Benchmarking: Traditional DES [256/256 BS AVX]... DONE
Many salts:     4627K c/s real, 4674K c/s virtual
Only one salt:  3930K c/s real, 3930K c/s virtual

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     4143K c/s real, 4185K c/s virtual
Only one salt:  3583K c/s real, 3583K c/s virtual

These are for a single thread, no OpenMP.  I've also tried weird
combinations, such as AVX+MMX and many others.  Of these, 128-bit AVX
plus MMX plus 64-bit native (256-bit total achieved in this weird way)
got very close to plain AVX speed.  Others were substantially slower.
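
To put the three single-thread results above side by side, here is a
small sketch that normalizes the "many salts" real c/s figures to the
SSE2 build.  The numbers are copied from the benchmark output above; the
function and variable names are only for illustration.

```python
# Single-thread "many salts" real c/s figures from above, in thousands.
many_salts_kcs = {
    "128-bit AVX": 4819,
    "256-bit AVX": 4627,
    "128-bit SSE2": 4143,
}


def relative_speed(results, baseline):
    """Divide every result by the baseline entry."""
    base = results[baseline]
    return {name: value / base for name, value in results.items()}


ratios = relative_speed(many_salts_kcs, "128-bit SSE2")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x of SSE2")
```

On these figures 128-bit AVX comes out roughly 16% ahead of SSE2 and
256-bit AVX roughly 12% ahead.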

The *-xop builds failed as expected (they should work on future AMD
CPUs), for example:

Benchmarking: Traditional DES [128/256 BS XOP]... Illegal instruction

Some other performance numbers to note (with latest gcc):

Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw:    14530 c/s real, 14676 c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    936 c/s real, 936 c/s virtual

Benchmarking: dummy [N/A]... DONE
Raw:    136552K c/s real, 136552K c/s virtual

("dummy" is a new feature that will be in 1.7.7.)

LM hash with new DES key setup (yes, I ported that patch):

Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw:    63195K c/s real, 63195K c/s virtual

OpenMP benchmarks:

-omp-des-4 (ported to the current code), SSE2 (reference):

Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts:     15949K c/s real, 2003K c/s virtual
Only one salt:  7962K c/s real, 1001K c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    4752 c/s real, 598 c/s virtual

Another run:

Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    4848 c/s real, 604 c/s virtual

Upgrade to AVX:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     19095K c/s real, 2401K c/s virtual
Only one salt:  8613K c/s real, 1091K c/s virtual

GOMP_SPINCOUNT=10000:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     19070K c/s real, 2420K c/s virtual
Only one salt:  9560K c/s real, 2051K c/s virtual

Another run:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     19243K c/s real, 2429K c/s virtual
Only one salt:  9682K c/s real, 2060K c/s virtual
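
A rough way to read the real vs. virtual columns in the OpenMP runs
above: assuming "virtual" is measured against CPU time summed over all
threads, the ratio of real to virtual c/s approximates how many logical
CPUs were kept busy.  The sketch below applies that assumption to the
-omp-des-4 AVX figures; the labels are mine, not program output.

```python
# (real, virtual) c/s in thousands, -omp-des-4 with AVX, from above.
runs = {
    "default, many salts": (19095, 2401),
    "default, one salt": (8613, 1091),
    "GOMP_SPINCOUNT=10000, one salt": (9560, 2051),
}


def busy_cpus(real_kcs, virtual_kcs):
    """Estimate of busy logical CPUs, assuming virtual is per CPU-second."""
    return real_kcs / virtual_kcs


estimates = {name: busy_cpus(*pair) for name, pair in runs.items()}
for name, cpus in estimates.items():
    print(f"{name}: about {cpus:.2f} logical CPUs busy")
```

Under that assumption the default runs consume close to all 8 logical
CPUs, while the lower spin count cuts the one-salt CPU usage to under 5
and the real speed is higher at the same time.  This would be consistent
with threads sleeping instead of busy-waiting at synchronization points.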

OMP_NUM_THREADS=4:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     17817K c/s real, 4476K c/s virtual
Only one salt:  9270K c/s real, 2346K c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw:    3302 c/s real, 834 c/s virtual

-omp-des-7 (ported to the current code):

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     17448K c/s real, 2228K c/s virtual
Only one salt:  13577K c/s real, 1797K c/s virtual

Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw:    68861K c/s real, 9032K c/s virtual

GOMP_SPINCOUNT=10000:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     17006K c/s real, 2358K c/s virtual
Only one salt:  14404K c/s real, 2211K c/s virtual

Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw:    65126K c/s real, 19211K c/s virtual

OMP_NUM_THREADS=4:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     16108K c/s real, 4087K c/s virtual
Only one salt:  14258K c/s real, 3609K c/s virtual

Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw:    96436K c/s real, 24169K c/s virtual

OMP_NUM_THREADS=3:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     12681K c/s real, 4227K c/s virtual
Only one salt:  11403K c/s real, 3813K c/s virtual

Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw:    91029K c/s real, 30444K c/s virtual

OMP_NUM_THREADS=2:

Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts:     8945K c/s real, 4495K c/s virtual
Only one salt:  8208K c/s real, 4104K c/s virtual

Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw:    79233K c/s real, 39616K c/s virtual
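
The -omp-des-7 "many salts" results above can be turned into a scaling
efficiency per thread.  This sketch uses the non-OpenMP single-thread
128-bit AVX figure (4819K c/s) as the baseline and assumes the run
without OMP_NUM_THREADS used all 8 logical CPUs; both choices are
assumptions made for illustration.

```python
SINGLE_THREAD_KCS = 4819  # non-OpenMP 128-bit AVX build, many salts

# Thread count -> "many salts" real c/s in thousands, -omp-des-7 with AVX.
many_salts_by_threads = {2: 8945, 3: 12681, 4: 16108, 8: 17448}


def efficiency(total_kcs, threads, single_kcs=SINGLE_THREAD_KCS):
    """Fraction of ideal linear scaling achieved with this many threads."""
    return total_kcs / (threads * single_kcs)


scaling = {n: efficiency(kcs, n) for n, kcs in many_salts_by_threads.items()}
for n, eff in scaling.items():
    print(f"{n} threads: {eff:.0%} of linear scaling")
```

Efficiency stays above 80% up to 4 threads and drops to under 50% at 8,
as would be expected when the extra 4 logical CPUs are hyperthreads
sharing the same 4 cores.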

That's all for now.  AVX support itself will be in 1.7.7.  The updated
OpenMP patches will be released "against 1.7.7".

Alexander
