Message-ID: <20060511050048.GA27597@openwall.com>
Date: Thu, 11 May 2006 09:00:48 +0400
From: Solar Designer <solar@...nwall.com>
To: announce@...ts.openwall.com, john-users@...ts.openwall.com
Subject: John the Ripper 1.7.1

Hi,

I've proceeded with further development of John the Ripper after the 1.7 release. A new development version is out - numbered 1.7.1:

http://www.openwall.com/john/

JtR 1.7.1 adds bitslice DES code for x86 with SSE2 for better performance at DES-based crypt(3) hashes on Pentium 4 and SSE2-capable AMD processors, as well as assorted high-level changes to improve performance on current x86-64 processors (both AMD and Intel).

On a related note, the SecurityFocus interview with me on John the Ripper 1.7 is now also available off the Openwall website:

http://www.openwall.com/john/interviews/SF-20060222-p1

For those who are interested in some benchmarks of the new code, here they are.

I've used two systems, one with an Intel P4 Xeon (3.2 GHz) and the other with an AMD Athlon 64 ("3200+", 2.0 GHz). Although the Xeon is capable of Hyper-Threading, I only ran one process, thereby not taking advantage of HT for these benchmarks. Both CPUs are SSE2 and 64-bit capable. The OS on both systems was Linux and the same builds of John were used (I copied my pre-compiled executables to both systems).

I've omitted the "BSDI DES" and "Kerberos AFS DES" benchmarks to make it easier to see the really important ones. The "BSDI DES" results are in all cases proportional to the "Traditional DES" ones (as expected) and the "Kerberos AFS DES" implementation is suboptimal and unimportant to most users of John.

I'll start with the Xeon:

vendor_id : GenuineIntel
cpu family : 15
model : 4
model name : Intel(R) Xeon(TM) CPU 3.20GHz
stepping : 3

Native 64-bit (pure C, built on Owl-current for x86-64, gcc 3.4.5):

Benchmarking: Traditional DES [64/64 BS]... DONE
Many salts: 949593 c/s real, 949593 c/s virtual
Only one salt: 875699 c/s real, 877454 c/s virtual

Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw: 10106 c/s real, 10106 c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 450 c/s real, 450 c/s virtual

Benchmarking: NT LM DES [64/64 BS]... DONE
Raw: 8848K c/s real, 8848K c/s virtual

The DES performance is rather good and Blowfish is OK, but it's the performance at FreeBSD-style MD5-based crypt(3) that stands out. Most CPUs don't cross 10k c/s at this benchmark. This one does due to the high clock rate and the availability of 16 registers with x86-64, which enables John to do two MD5 hashes in parallel, even with pure C code.

32-bit with SSE2 build on the Xeon:

Benchmarking: Traditional DES [128/128 BS SSE2]... DONE
Many salts: 924518 c/s real, 924518 c/s virtual
Only one salt: 814592 c/s real, 814592 c/s virtual

Benchmarking: NT LM DES [128/128 BS SSE2]... DONE
Raw: 7069K c/s real, 7069K c/s virtual

Although SSE2 is effectively 128-bit, this is a little bit slower than the native 64-bit build, but it has the advantage of not requiring a 64-bit capable CPU or OS. Similar performance is expected on non-Xeon P4s and on P4 Celerons that are not 64-bit capable.

32-bit with MMX build on the Xeon:

Benchmarking: Traditional DES [64/64 BS MMX]... DONE
Many salts: 654080 c/s real, 654080 c/s virtual
Only one salt: 599385 c/s real, 599385 c/s virtual

Benchmarking: NT LM DES [64/64 BS MMX]... DONE
Raw: 6521K c/s real, 6521K c/s virtual

As you can see, both DES-based hashes were faster with SSE2. In the case of the traditional DES-based crypt(3), the difference is 35% to 40% in favor of the new SSE2 implementation. (On older Pentium 4 CPUs, the MMX code is faster than the above per-MHz, so the advantages of the use of SSE2 may be smaller.)

For the sake of completeness, the other two benchmarks from the 32-bit builds (they are the same since these use neither SSE2 nor MMX):

Benchmarking: FreeBSD MD5 [32/32]... DONE
Raw: 9159 c/s real, 9159 c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/32]... DONE
Raw: 453 c/s real, 454 c/s virtual

Here MD5 became a little bit slower compared to the 64-bit build because there are only 8 registers available in 32-bit mode and only one hash is being computed at a time.

Now the Athlon 64:

vendor_id : AuthenticAMD
cpu family : 15
model : 47
model name : AMD Athlon(tm) 64 Processor 3200+
stepping : 2

Native 64-bit:

Benchmarking: Traditional DES [64/64 BS]... DONE
Many salts: 791219 c/s real, 791219 c/s virtual
Only one salt: 720435 c/s real, 720435 c/s virtual

Benchmarking: FreeBSD MD5 [32/64 X2]... DONE
Raw: 7419 c/s real, 7419 c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/64]... DONE
Raw: 330 c/s real, 330 c/s virtual

Benchmarking: NT LM DES [64/64 BS]... DONE
Raw: 6638K c/s real, 6638K c/s virtual

This is rather good considering that the real clock rate is only 2.0 GHz, but it is slower than the Xeon. So the "3200+" rating does not hold for this benchmark.

However, with SSE2 things are better:

Benchmarking: Traditional DES [128/128 BS SSE2]... DONE
Many salts: 951193 c/s real, 951193 c/s virtual
Only one salt: 827776 c/s real, 827776 c/s virtual

Benchmarking: NT LM DES [128/128 BS SSE2]... DONE
Raw: 6474K c/s real, 6474K c/s virtual

Now we're at the same level of performance that the Xeon provides for DES-based crypt(3).

For comparison against previous versions of John, the MMX build:

Benchmarking: Traditional DES [64/64 BS MMX]... DONE
Many salts: 785318 c/s real, 785318 c/s virtual
Only one salt: 703667 c/s real, 703667 c/s virtual

Benchmarking: NT LM DES [64/64 BS MMX]... DONE
Raw: 6503K c/s real, 6503K c/s virtual

As you can see, this is around 20% slower than SSE2 at DES-based crypt(3), achieving about the same performance that the native 64-bit build does. However, the performance at LM hashes is similar for all three builds (unlike on the Xeon).
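
The MMX-versus-SSE2 percentages quoted for both CPUs can be rechecked from the "real" c/s figures above. The following is a minimal Python sketch for doing so; it is not part of John, and the names `BENCHMARKS`, `speedup_percent`, `slowdown_percent` and `compare_all` are made up for illustration. Note that "X% faster" is measured relative to the slower build and "X% slower" relative to the faster one, so the two directions give somewhat different numbers for the same pair of results.

```python
# "real" c/s figures copied from the Traditional DES benchmarks in this message.
# The dictionary layout and all function names are illustrative assumptions.
BENCHMARKS = {
    "Xeon 3.2 GHz": {
        "many salts": {"SSE2": 924518, "MMX": 654080},
        "one salt": {"SSE2": 814592, "MMX": 599385},
    },
    "Athlon 64 3200+": {
        "many salts": {"SSE2": 951193, "MMX": 785318},
        "one salt": {"SSE2": 827776, "MMX": 703667},
    },
}


def speedup_percent(new_cps, old_cps):
    """How much faster the new build is, as a percentage of the old rate."""
    return (new_cps / old_cps - 1.0) * 100.0


def slowdown_percent(old_cps, new_cps):
    """How much slower the old build is, as a percentage of the new rate."""
    return (1.0 - old_cps / new_cps) * 100.0


def compare_all(benchmarks=BENCHMARKS):
    """Return (cpu, case, SSE2 speedup %, MMX slowdown %) for every entry."""
    rows = []
    for cpu, cases in benchmarks.items():
        for case, cps in cases.items():
            up = speedup_percent(cps["SSE2"], cps["MMX"])
            down = slowdown_percent(cps["MMX"], cps["SSE2"])
            rows.append((cpu, case, up, down))
    return rows


if __name__ == "__main__":
    for cpu, case, up, down in compare_all():
        print(f"{cpu}, {case}: SSE2 is {up:.1f}% faster; MMX is {down:.1f}% slower")
```

The same two helpers can be pointed at any other pair of c/s results, for example the native 64-bit and SSE2 builds, by adding entries to the dictionary.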

Finally, for the sake of completeness, the other two benchmarks for the 32-bit builds:

Benchmarking: FreeBSD MD5 [32/32]... DONE
Raw: 5935 c/s real, 5935 c/s virtual

Benchmarking: OpenBSD Blowfish (x32) [32/32]... DONE
Raw: 360 c/s real, 360 c/s virtual

Overall, the new SSE2 code may provide a speedup of up to 40% on current CPUs for DES-based crypt(3) (both traditional and BSDI-style), but its effect on LM hashes is not always positive. Future versions of JtR might provide support for SSE2 with 64-bit builds and improvements for LM hashes.

Comments are welcome on the john-users mailing list.

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598 fp: 6429 0D7E F130 C13E C929 6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments