Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110512105140.GA12713@openwall.com>
Date: Thu, 12 May 2011 14:51:40 +0400
From: Solar Designer <solar@...nwall.com>
To: crypt-dev@...ts.openwall.com
Subject: Re: alternative approach

David, Yuri -

On Thu, May 12, 2011 at 01:51:54PM +0400, Solar Designer wrote:
> I actually thought of modifying Blowfish in specific ways for this
> project.  I am likely to write about this separately.

The primary idea is to use smaller S-boxes.  Normally, Blowfish uses
four 8-to-32 S-boxes.  We could reduce their size maybe to 4-to-16.
This would make software implementations twice less efficient (fewer
bits of data handled per instruction) while reducing the amount of logic
(or BlockRAM) a lot.  The size of S-boxes would reduce from 4 KB to 128
bytes (a 32x reduction).  So we'd fit a lot more of these reduced
Blowfish cores into a chip.

I think 4-bit indices are close to the minimum we can use.  With even
smaller indices, we allow for bitslice implementations in software that
keep the arrays in registers (or load them into registers when needed)
and use not-too-many bitwise operations to extract the right elements.
For 4-bit, it takes 16 registers (or loads) per S-box and at least 4
instructions (when there's a bit select instruction like selb, vsel,
vpcmov, BFI_INT) or 12 (with and-not) or 16 instructions.  Oh, there
will also be plenty of instructions needed to spread/duplicate our
index bits over the SIMD vectors before they're usable for bit selects.
This is probably enough to kill any advantage of using a bitslice
implementation unless the vectors are very wide yet fast.

The S-boxes themselves will be ever-changing, like they are in Blowfish
normally.  This prevents DES-like bitslice implementations.

Also, Blowfish uses XORs (no-carry addition) and ADDs (all-carry
addition).  In hardware, we can instead use partial-carry addition -
e.g., do carry for every other bit.  A non-bitslice software
implementation would need to use an extra instruction there, whereas
we save logic.

Finally, we can add bit permutations.  These are free in hardware, but
are costly for non-bitslice software implementations.  (Some modern
instruction sets include arbitrary byte permutation instructions, but
not arbitrary bit permutations.)

We'd need to do some testing of the resulting component, such as for
collision resistance, but we'd call it "non-crypto".  We may use a
software implementation for most of such testing (slow, but that's OK).

As usual, comments are welcome.

Alexander

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.