Message-ID: <20100704231124.GA31023@openwall.com>
Date: Mon, 5 Jul 2010 03:11:24 +0400
From: Solar Designer <solar@...nwall.com>
To: announce@...ts.openwall.com
Subject: [openwall-announce] Owl-current on CD; JtR DES crypt(3) and LM hash speedup

Hi,

As usual, this is a cumulative announcement for several things at once.
These were previously tweeted about - http://twitter.com/openwall - and
posted on the news page - http://www.openwall.com/news

For this announcement, I'll group them into two categories:

1. It is now possible to get Openwall GNU/*/Linux -current snapshots on
CD (with delivery worldwide) - 32-bit and/or 64-bit (your choice). The
pricing starts at $9.35 (which just covers our costs), but you're
encouraged to pick a more expensive option (which supports our project):

http://www.openwall.com/Owl/order

The intent is to keep recent -current snapshots available for purchase
on CD along with releases, although that will depend on demand or lack
thereof. Previously, only the last release was available for purchase
on CD.

2. John the Ripper's bitslice DES code is being re-worked much further,
resulting in greater ease of use on multi-core systems, as well as in
major per-core speedups for LM hashes.

This includes optional OpenMP parallelization, which allows a single
"john" process, invoked in the usual manner, to take advantage of
multiple CPU cores when auditing DES-based Unix crypt(3) hashes. (The
JtR 1.7.6 release only supported this kind of parallelization for
certain other/slower hash types.)

This also includes a new vectorization- and parallelization-friendly key
setup algorithm, which makes LM hash computations more than twice as
fast per-core (as tested on x86-64) and allows for parallelization of
DES-based crypt(3) hash computations even for the single-salt case
(including parallelization of most of the key setup "overhead").

The current in-development yet publicly released patches for JtR 1.7.6
achieve the following performance numbers on a single Core i7 920
2.67 GHz CPU (quad-core capable of running 2 threads per core):

LM hashes, single process, single thread (no OpenMP), "--test" - 39M c/s
... ditto, actual "incremental" mode run (more "overhead") - 30M+ c/s
... 8 simultaneous processes, combined "--test" speeds - 173M c/s
LM hashes, single process, 8 threads (OpenMP), "--test" - 65M c/s
... ditto, actual "incremental" mode run (more "overhead") - 45M+ c/s
DES crypt(3), 1 process, 8 threads (OpenMP), "--test", multi-salt - 10.2M c/s
... ditto, actual "incremental" mode run (more "overhead") - 10.0M+ c/s
DES crypt(3), 1 process, 8 threads (OpenMP), "--test", single salt - 8.6M c/s
... ditto, actual "incremental" mode run (more "overhead") - 8.1M+ c/s

These numbers for DES crypt(3) correspond to an OpenMP parallelization
efficiency of 80% to 90% (vs. multiple separate processes running the
non-OpenMP build with separate candidate password streams) - e.g., the
same system would do 11.5M c/s combined for multi-salt with separate
processes. This slight efficiency loss may be compensated for by the
greater ease of use (just one JtR invocation to manage instead of 8) and
by likely more optimal order in which candidate passwords are tried when
there's just one stream of those.

Finally, here are some more exciting performance numbers for a dual Xeon
X5460 3.16 GHz server (8 CPU cores total) under light unrelated load:

LM hashes, single process, single thread (no OpenMP), "--test" - 45M c/s
... 8 simultaneous processes, combined "--test" speeds - 356M c/s
LM hashes, single process, 8 threads (OpenMP), "--test" - 64M c/s

The 356M c/s figure is pretty exciting. Previously, one would expect
this kind of performance from a GPU, but here it is achieved with two
CPUs found in a single system, and even under light unrelated load.
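
As a rough cross-check of the figures above, the efficiency of the OpenMP
build relative to separate processes can be recomputed from the quoted
c/s numbers. This is only an illustrative sketch: the efficiency()
helper and the list layout are made up for this example, and the inputs
are the rounded figures from this message, so the results are
approximate.

```python
def efficiency(openmp_rate, multiprocess_rate):
    """Return OpenMP throughput as a fraction of the combined throughput
    of separate non-OpenMP processes (hypothetical helper)."""
    return openmp_rate / multiprocess_rate

# (label, 8-thread OpenMP c/s, combined c/s of separate processes),
# all in millions of c/s, copied from the rounded figures quoted above.
figures = [
    ("Core i7 920, DES crypt(3) multi-salt", 10.2, 11.5),
    ("Core i7 920, LM hashes", 65.0, 173.0),
    ("2x Xeon X5460, LM hashes", 64.0, 356.0),
]

for label, omp, procs in figures:
    print(f"{label}: {efficiency(omp, procs):.0%}")
```

With these inputs, the DES crypt(3) multi-salt case comes out at about
89%, while the LM hash cases come out at about 38% (Core i7) and 18%
(dual Xeon).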

The 64M c/s figure for the OpenMP build is pretty good, but not
exciting - we've already seen better speed for a single Core i7.
Unrelated system load truly kills OpenMP performance in many cases, and
the efficiency of LM hash parallelization with OpenMP is not great
anyway.

Now to DES-based crypt(3) on the dual Xeon:

DES crypt(3), 1 process, 8 threads (OpenMP), "--test", multi-salt - 21M c/s
DES crypt(3), 1 process, 8 threads (OpenMP), "--test", single salt - 15.5M c/s

The OpenMP parallelization efficiency is 67% to 86% - that is, even
better speeds may be achieved with 8 simultaneous processes - such as
24M c/s for multi-salt and 23M c/s for single salt.

The patches may be found at:

http://openwall.info/wiki/john/patches

Here are some john-users postings with even more detail and
substantiation for the performance numbers given above:

http://www.openwall.com/lists/john-users/2010/07/03/1
http://www.openwall.com/lists/john-users/2010/06/30/2

As usual, feedback is welcome - on the john-users list, please.

Alexander