|
Message-ID: <20110512105140.GA12713@openwall.com> Date: Thu, 12 May 2011 14:51:40 +0400 From: Solar Designer <solar@...nwall.com> To: crypt-dev@...ts.openwall.com Subject: Re: alternative approach David, Yuri - On Thu, May 12, 2011 at 01:51:54PM +0400, Solar Designer wrote: > I actually thought of modifying Blowfish in specific ways for this > project. I am likely to write about this separately. The primary idea is to use smaller S-boxes. Normally, Blowfish uses four 8-to-32 S-boxes. We could reduce their size maybe to 4-to-16. This would make software implementations twice less efficient (fewer bits of data handled per instruction) while reducing the amount of logic (or BlockRAM) a lot. The size of S-boxes would reduce from 4 KB to 128 bytes (a 32x reduction). So we'd fit a lot more of these reduced Blowfish cores into a chip. I think 4-bit indices are close to the minimum we can use. With even smaller indices, we allow for bitslice implementations in software that keep the arrays in registers (or load them into registers when needed) and use not-too-many bitwise operations to extract the right elements. For 4-bit, it takes 16 registers (or loads) per S-box and at least 4 instructions (when there's a bit select instruction like selb, vsel, vpcmov, BFI_INT) or 12 (with and-not) or 16 instructions. Oh, there will also be plenty of instructions needed to spread/duplicate our index bits over the SIMD vectors before they're usable for bit selects. This is probably enough to kill any advantage of using a bitslice implementation unless the vectors are very wide yet fast. The S-boxes themselves will be ever-changing, like they are in Blowfish normally. This prevents DES-like bitslice implementations. Also, Blowfish uses XORs (no-carry addition) and ADDs (all-carry addition). In hardware, we can instead use partial-carry addition - e.g., do carry for every other bit. A non-bitslice software implementation would need to use an extra instruction there, whereas we save logic. Finally, we can add bit permutations. These are free in hardware, but are costly for non-bitslice software implementations. (Some modern instruction sets include arbitrary byte permutation instructions, but not arbitrary bit permutations.) We'd need to do some testing of the resulting component, such as for collision resistance, but we'd call it "non-crypto". We may use a software implementation for most of such testing (slow, but that's OK). As usual, comments are welcome. Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.