Message-ID: <20110322030340.GA1475@openwall.com>
Date: Tue, 22 Mar 2011 06:03:40 +0300
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: bitslice DES on AVX
Hi,
So I tested the AVX code on Rembrandt's "Intel(R) Core(TM) i7-2600 CPU @
3.40GHz" (quad-core with HT, 8 logical CPUs). Thanks, Rembrandt!
The code, previously tested only under Intel's Software Development
Emulator, worked flawlessly. I tried compiling with gcc 4.4.5 that was
installed on Rembrandt's Ubuntu, with a gcc 4.5.0 build I uploaded, and
finally with a fresh build of gcc 4.7.0-20110319 (development snapshot).
All worked fine.
The best performance is achieved with 128-bit AVX operations. 256-bit
is slightly slower. SSE2 is slower yet.
Here's some relevant info on Sandy Bridge vs. Bulldozer:
http://www.realworldtech.com/page.cfm?ArticleID=RWT091810191937&p=10
256-bit could have worked better, but from the above it appears that it
should not provide much of an advantage over 128-bit as long as the
instruction stream allows two ops to execute in parallel almost all of
the time.
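For readers new to the term, here is a minimal sketch of what the vector
width buys in a bitslice implementation. It is not code from John the
Ripper: the bitslice() helper, the toy gate and the test inputs are all
hypothetical, and Python integers merely stand in for a 128-bit register.
It only illustrates that one bitwise operation on a W-bit word evaluates
the same logic gate for W independent candidates.

```python
# Toy illustration of bitslicing (not John the Ripper code).
# Bit i of every "slice" word belongs to candidate i, so a single
# bitwise operation evaluates one logic gate for WIDTH candidates at once.

WIDTH = 128  # e.g. one 128-bit vector register; a Python int stands in for it
MASK = (1 << WIDTH) - 1


def bitslice(values, nbits):
    """Transpose a list of nbits-wide integers into nbits slice words."""
    slices = [0] * nbits
    for lane, value in enumerate(values):
        for bit in range(nbits):
            if (value >> bit) & 1:
                slices[bit] |= 1 << lane
    return slices


def toy_gate(a, b, c):
    """An arbitrary 3-input Boolean function: (a XOR b) AND (NOT c)."""
    return (a ^ b) & (~c & MASK)


# 128 three-bit inputs, one per lane (hypothetical test data).
inputs = [lane % 8 for lane in range(WIDTH)]
a, b, c = bitslice(inputs, 3)

# One pass through the gate computes the result for all 128 lanes.
sliced_result = toy_gate(a, b, c)

# Cross-check against evaluating each lane separately.
for lane, value in enumerate(inputs):
    bit_a, bit_b, bit_c = value & 1, (value >> 1) & 1, (value >> 2) & 1
    expected = (bit_a ^ bit_b) & (1 - bit_c)
    assert (sliced_result >> lane) & 1 == expected
```

Doubling WIDTH to 256 would double the lanes handled per operation, which
is why 256-bit looks attractive on paper. If the CPU can already issue two
128-bit ops in parallel, though, the net gain may be small.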
Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts: 4819K c/s real, 4867K c/s virtual
Only one salt: 4080K c/s real, 4080K c/s virtual
Benchmarking: Traditional DES [256/256 BS AVX]... DONE
Many salts: 4627K c/s real, 4674K c/s virtual
Only one salt: 3930K c/s real, 3930K c/s virtual
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts: 4143K c/s real, 4185K c/s virtual
Only one salt: 3583K c/s real, 3583K c/s virtual
These are for a single thread, no OpenMP. I've also tried weird
combinations, such as AVX+MMX and many others. Of these, 128-bit AVX
plus MMX plus 64-bit native (256-bit total achieved in this weird way)
got very close to plain AVX speed. Others were substantially slower.
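To put the three single-thread results in relative terms, the arithmetic
can be sketched as follows. The numbers are the "Many salts" real c/s
figures quoted above (in thousands); the variable names are mine, and
taking SSE2 as the baseline is just a convenient choice.

```python
# "Many salts" real c/s from the single-thread benchmarks, in thousands.
many_salts = {
    "128/256 BS AVX": 4819,
    "256/256 BS AVX": 4627,
    "128/128 BS SSE2-16": 4143,
}

baseline = many_salts["128/128 BS SSE2-16"]

# Percentage gain of each build over the SSE2 build.
speedup_pct = {
    name: (rate / baseline - 1) * 100 for name, rate in many_salts.items()
}

for name, pct in speedup_pct.items():
    print(f"{name}: {pct:+.1f}% vs. SSE2")
```

By this measure 128-bit AVX is roughly 16% faster than SSE2 and 256-bit
AVX roughly 12% faster, for this one run on this one machine.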
*-xop builds failed as expected (should work on future AMD CPUs), like:
Benchmarking: Traditional DES [128/256 BS XOP]... Illegal instruction
Some other performance numbers to note (with latest gcc):
Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw: 14530 c/s real, 14676 c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw: 936 c/s real, 936 c/s virtual
Benchmarking: dummy [N/A]... DONE
Raw: 136552K c/s real, 136552K c/s virtual
("dummy" is a new feature that will be in 1.7.7.)
LM hash with new DES key setup (yes, I ported that patch):
Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw: 63195K c/s real, 63195K c/s virtual
OpenMP benchmarks:
-omp-des-4 (ported to the current code), SSE2 (reference):
Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
Many salts: 15949K c/s real, 2003K c/s virtual
Only one salt: 7962K c/s real, 1001K c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw: 4752 c/s real, 598 c/s virtual
Another run:
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw: 4848 c/s real, 604 c/s virtual
Upgrade to AVX:
Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts: 19095K c/s real, 2401K c/s virtual
Only one salt: 8613K c/s real, 1091K c/s virtual
GOMP_SPINCOUNT=10000:
Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts: 19070K c/s real, 2420K c/s virtual
Only one salt: 9560K c/s real, 2051K c/s virtual
Another run:
Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts: 19243K c/s real, 2429K c/s virtual
Only one salt: 9682K c/s real, 2060K c/s virtual
OMP_NUM_THREADS=4:
Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts: 17817K c/s real, 4476K c/s virtual
Only one salt: 9270K c/s real, 2346K c/s virtual
Benchmarking: OpenBSD Blowfish (x32) [32/64 X2]... DONE
Raw: 3302 c/s real, 834 c/s virtual
-omp-des-7 (ported to the current code):
Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts: 17448K c/s real, 2228K c/s virtual
Only one salt: 13577K c/s real, 1797K c/s virtual
Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw: 68861K c/s real, 9032K c/s virtual
GOMP_SPINCOUNT=10000:
Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts: 17006K c/s real, 2358K c/s virtual
Only one salt: 14404K c/s real, 2211K c/s virtual
Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw: 65126K c/s real, 19211K c/s virtual
OMP_NUM_THREADS=4:
Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts: 16108K c/s real, 4087K c/s virtual
Only one salt: 14258K c/s real, 3609K c/s virtual
Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw: 96436K c/s real, 24169K c/s virtual
OMP_NUM_THREADS=3:
Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts: 12681K c/s real, 4227K c/s virtual
Only one salt: 11403K c/s real, 3813K c/s virtual
Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw: 91029K c/s real, 30444K c/s virtual
OMP_NUM_THREADS=2:
Benchmarking: Traditional DES [128/256 BS AVX]... DONE
Many salts: 8945K c/s real, 4495K c/s virtual
Only one salt: 8208K c/s real, 4104K c/s virtual
Benchmarking: LM DES [128/256 BS AVX]... DONE
Raw: 79233K c/s real, 39616K c/s virtual
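The thread-count runs above can be summarized as speedup and per-thread
efficiency relative to the non-OpenMP 128-bit AVX build. This is only a
sketch of the arithmetic using the -omp-des-7 "Many salts" real c/s
figures. It assumes that the run without OMP_NUM_THREADS set used all 8
logical CPUs, which is the usual OpenMP default but is not stated above.

```python
SINGLE_THREAD = 4819  # non-OpenMP 128-bit AVX build, many salts, K c/s real

# threads -> K c/s real, "Many salts", -omp-des-7 AVX runs.
# The 8-thread entry ASSUMES the default run used all 8 logical CPUs.
omp_many_salts = {2: 8945, 3: 12681, 4: 16108, 8: 17448}

scaling = {}
for threads, rate in sorted(omp_many_salts.items()):
    speedup = rate / SINGLE_THREAD   # versus the single-thread build
    efficiency = speedup / threads   # 1.0 would be perfect scaling
    scaling[threads] = (speedup, efficiency)
    print(f"{threads} threads: {speedup:.2f}x, {efficiency:.0%} efficiency")
```

Efficiency stays above 80% through 4 threads and drops to under 50% at 8,
which is consistent with the machine having four physical cores with HT.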
That's all for now. AVX support itself will be in 1.7.7. The updated
OpenMP patches will be released "against 1.7.7".
Alexander