|
Message-ID: <20110624234846.GB30978@openwall.com> Date: Sat, 25 Jun 2011 03:48:46 +0400 From: Solar Designer <solar@...nwall.com> To: john-users@...ts.openwall.com Subject: Re: John the Ripper 1.7.8: DES speedup On Thu, Jun 23, 2011 at 09:59:27PM +0100, Larry Bonner wrote: > For anyone interested, Brian published his algorithm on the following > page: http://gladman.plushost.co.uk/oldsite/cryptography_technology/serpent/index.php Thanks! On Thu, Jun 23, 2011 at 8:14 AM, Simon Marechal <simon@...quise.net> wrote: > This sound pretty interesting. Is there a writeup coming I'm not sure. We have no specific plans on this. > or some description of the approach ? I'll try to explain it briefly: Roman's primary idea was to start with breadth-first search - that is, find Boolean functions for each possible truth table. Doing it for the full 6 inputs is impractical (we'd need 2**64 memory), but for 5 inputs it is practical. So we allocate a 2**32 entry array, indexed by truth table of 5-to-1 functions. Each filled entry holds the number of logic operations needed to reach that truth table from a certain initial state (typically from the 5 inputs supplied as-is). Since our actual S-boxes are 6-to-4 rather than 5-to-1, this is followed by depth-first search to choose 8 1-bit outputs (and their corresponding 32-bit truth tables, considering function complexities) that can be combined to produce our desired 4 output bits (with pre-defined target 64-bit truth tables), with one of the 6 input bits being the selector bit. There's also the task of choosing which bit to use as the selector and which two operations (OR and XOR, AND and XOR, or AND-NOT and XOR) to use to apply it to a pair of 5-to-1 outputs (the selector bit is chosen for all 4 outputs at once, but the two operations may be chosen differently per output). Luckily, that's a small enough number of alternatives to try. Additionally, in some cases Roman chose to provide other than 5 direct inputs of the S-box as the initial state to measure function complexities against. Finally, there's the task of reconstructing the chosen 5-to-1 functions and generating the proper expressions with intermediate results reuse (for the 8 partial outputs). Much of this requires a lot of manual tuning and re-tuning per S-box. I'm sure I have missed many details, some of them for simplicity and some because Roman and not I was the one to work on this. (So I don't recall or even don't know some of the details, even though I do have a copy of the program code.) As I had mentioned, my involvement was code generation and feedback to Roman to make sure we're able to generate decent code. Roman was running my Perl scripts to estimate the number of registers and 2-op instructions needed for different S-box expressions. Then I also ran them, and improved them further, to choose the most suitable ones of the remaining S-box expressions (thousands of them). My final revisions also estimated available parallelism, separately for 3-op and 2-op architectures. You can see some selection criteria in somewhat cryptic comments in the S-box files in JtR 1.7.8, like this: /* s8-019630, 41 gates, 14 regs, 11 andn, 4/21/60/101/143 stalls, 62 biop */ Not surprisingly, the goal to require fewer registers sometimes conflicted with the desire to have more parallelism. There were other conflicting criteria as well. Thus, the final S-box files in 1.7.8 have several versions of almost every S-box, with #if's. I also suggested a way to halve the big array (to 2**31 entries), but we never implemented that. This would be desirable if we decided to get a community involved in searching for even more optimal S-box expressions - such that people with 32-bit systems could participate too. (As currently implemented, Roman's program requires a 64-bit system with at least 5 GB RAM, and this might not always be enough.) Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.